Why Logistic Regression DOESN'T return probabilities?!

Model Calibration - EXPLAINED! Fun!
SPONSOR
Kite is a free AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you’re typing. I've been using Kite. Love it! Learn more here:
www.kite.com/get-kite/?...
CODE: github.com/ajhalthor/model-ca...

Comments: 36

  • @asrjy · 3 years ago

    Great vid. Want to add a few more points: Platt Scaling is used when the calibration plot looks like a sigmoid curve. It uses a modified sigmoid function and solves an optimization problem to get the parameters A and B. Isotonic Regression is used when the calibration plot does not look like a sigmoid curve; it breaks the curve into multiple linear pieces, so it needs more points than Platt Scaling.
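
    A minimal sketch of both methods with scikit-learn's CalibratedClassifierCV, for anyone who wants to try this (the synthetic dataset and base model are illustrative assumptions, not the video's notebook):

        from sklearn.calibration import CalibratedClassifierCV
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression

        # Imbalanced synthetic data, roughly 90/10.
        X, y = make_classification(n_samples=5000, weights=[0.9, 0.1], random_state=0)

        base = LogisticRegression(max_iter=1000)

        # Platt Scaling: fits p = 1 / (1 + exp(A*f(x) + B)), solving for A and B.
        platt = CalibratedClassifierCV(base, method="sigmoid", cv=5).fit(X, y)

        # Isotonic Regression: a non-parametric, piecewise monotone map;
        # more flexible, but needs more data to avoid overfitting.
        iso = CalibratedClassifierCV(base, method="isotonic", cv=5).fit(X, y)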

  • @TheShadyStudios · 3 years ago

    The combination of your status as a working data scientist and your decision to dive into the technical details of your chosen topics makes your videos exactly what I see as the ideal content for AI YouTubers... thanks!

  • @hbchaitanyabharadwaj3509 · 3 years ago

    Great explanation. Good to learn something new every day.

  • @deeplearning5408 · 1 year ago

    I watched several videos explaining Model Calibration and this is the best one. You've shown the practice, not just the theory, and that is exactly what most Data Scientists need. Thank you, and thanks to KZread for recommending this video. I subscribed to your channel immediately.

  • @CodeEmporium · 1 year ago

    Thanks a lot! In more recent videos, I am doing a lot more math. But I'll pivot back to programming soon enough :)

  • @melodyzhan1942 · 2 months ago

    Thank you so much for making such great videos. It really helps someone new to DS quickly understand all the concepts. I appreciate you explaining with actual code and going through each step!

  • @shardulpingale2431 · 1 year ago

    Great explanation! ❤️ Question: how do we calibrate probabilities for multi-class classification using OVO or OVA SVMs?
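
    A hedged sketch of one way to do this in scikit-learn (illustrative, not from the video): CalibratedClassifierCV handles multi-class targets by calibrating each class one-vs-rest and then normalizing the probabilities, so an SVM without predict_proba can be wrapped directly:

        from sklearn.calibration import CalibratedClassifierCV
        from sklearn.datasets import load_iris
        from sklearn.svm import LinearSVC

        X, y = load_iris(return_X_y=True)  # 3 classes

        # LinearSVC has no predict_proba; the wrapper calibrates its decision
        # scores per class (one-vs-rest), then renormalizes each row to sum to 1.
        clf = CalibratedClassifierCV(LinearSVC(max_iter=5000), method="sigmoid", cv=5)
        clf.fit(X, y)
        print(clf.predict_proba(X[:3]).sum(axis=1))  # rows sum to ~1.0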

  • @logicboard7746 · 1 year ago

    Well explained, buddy. Short and to the point.

  • @CodeEmporium · 1 year ago

    Thanks!

  • @90benj · 3 years ago

    I wanted to ask you: how do you go about educating yourself on ML? Do you read papers regularly, or do you research specifically the topics you need for your work? I find it difficult to find projects to learn from that aren't so contrived that they don't teach much. Thanks for your video; I liked the part where you showed your code. It would be really great if you could comment your Jupyter notebook and put it in the description. If you don't mind, I would like to test and learn with the code myself.

  • @pravalipravali9989 · 3 years ago

    I'll tell you how I came across this topic. I was solving a basic classification problem on Kaggle and wanted to make my predictions as accurate as possible to avoid opportunity costs. My data set was also imbalanced. So I was looking at various ways to deal with imbalanced data and adjust thresholds, and finally came across model calibration. I guess the key is to expose yourself to various problems and make your model work better.

  • @deepanshudashora5887 · 3 years ago

    Great, man. You explained this very easily.

  • @CodeEmporium · 3 years ago

    Thank you!

  • @fatiharisma9376 · 3 years ago

    Hello sir, thank you for your excellent explanation. I have an inquiry regarding calibration models for transition probabilities: I don't understand how they work, and I would like to learn more. I would really appreciate it if you could assist me on this matter. Thank you.

  • @amos259 · 3 years ago

    Can you show that in Stata? Thanks.

  • @user-or7ji5hv8y · 2 years ago

    The motivation was really well explained

  • @CodeEmporium · 2 years ago

    Thanks 😊

  • @francoisplessier9913 · 3 years ago

    I think the effect of the calibration in 2.2 is clearer when looking at the mean: it's now 10%, which corresponds to the 0.9 ratio you used to define the unbalanced dataset (it was 23% for the uncalibrated model). Or am I missing something? Anyway, thank you for this video; I wasn't aware of this calibration concept and it's quite clear now!

  • @CodeEmporium · 3 years ago

    Yeah, you could look at the mean. But we are typically dealing with long-tailed distributions (lots of samples have small values and a few samples have large values), which skews the mean upwards. Hence I talk in medians instead. But either way, you can see the effect of the "reduction" in probability values.
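
    A tiny illustration of the mean-vs-median point, using a synthetic right-skewed score distribution as a stand-in (not the video's data):

        import numpy as np

        rng = np.random.default_rng(0)
        y_pred = rng.beta(a=1, b=9, size=10_000)  # long right tail of scores

        print(f"mean:   {y_pred.mean():.3f}")      # pulled upwards by the tail
        print(f"median: {np.median(y_pred):.3f}")  # smaller, more robust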

  • @francoisplessier9913 · 3 years ago

    @@CodeEmporium I understand, thanks for your reply!

  • @Han-ve8uh · 2 years ago

    1. At 0:46 you mentioned probabilities may be skewed higher as a result of balancing; any resources explaining why this happens? (This video didn't demonstrate balancing.)
    2. At 4:24, why does half below 47% and half above 47% make sense, and why say "that's correct"? Is there an ideal y_pred.describe() output you have in mind for any problem that comes up? What would an incorrect .describe() output look like? (At this point we have no knowledge of calibration plots yet, just an interpretation based on the .describe() output.)
    3. Do we fit the calibration on the validation set and plot reliability curves on the test set, or vice versa? (kzread.info/dash/bejne/c6mi0daapbabqaw.html did it the opposite way from your video: he used the test set to train the calibration layer and plotted curves on validation.)
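
    On question 3: one common convention (an assumption here, not confirmed by the video) is to train the model on the train split, fit the calibrator on the validation split, and keep the test split untouched for the reliability curve:

        from sklearn.calibration import CalibratedClassifierCV, calibration_curve
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split

        X, y = make_classification(n_samples=10_000, weights=[0.9, 0.1], random_state=0)
        X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
        X_valid, X_test, y_valid, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

        model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

        # cv="prefit": the base model is already trained, so only the
        # calibration map is fit, on the validation data.
        calibrated = CalibratedClassifierCV(model, method="sigmoid", cv="prefit")
        calibrated.fit(X_valid, y_valid)

        # Reliability curve evaluated on the held-out test set.
        prob_true, prob_pred = calibration_curve(
            y_test, calibrated.predict_proba(X_test)[:, 1], n_bins=10)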

  • @offchan · 3 years ago

    Great channel.

  • @CodeEmporium · 3 years ago

    Thanks a lot

  • @amortalbeing · 3 years ago

    What do you mean by "more representative probabilities" at 13:50?

  • @CodeEmporium · 3 years ago

    When you do undersampling/oversampling, the "probabilities" from a binary classifier are inflated toward the minority class, i.e. they don't represent true probabilities.
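
    A small sketch of that inflation on synthetic data (the naive oversampling below is an illustrative assumption):

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression

        X, y = make_classification(n_samples=20_000, weights=[0.95, 0.05], random_state=0)

        # Naive oversampling: repeat minority rows until the classes are balanced.
        rng = np.random.default_rng(0)
        minority = np.where(y == 1)[0]
        extra = rng.choice(minority, size=(y == 0).sum() - len(minority))
        X_bal = np.vstack([X, X[extra]])
        y_bal = np.concatenate([y, y[extra]])

        model = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)

        print(f"true base rate:      {y.mean():.3f}")  # ~0.05
        # The mean predicted "probability" lands far above the base rate.
        print(f"mean predicted prob: {model.predict_proba(X)[:, 1].mean():.3f}")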

  • @amortalbeing · 3 years ago

    @@CodeEmporium Aha! Thanks a lot, man. Really appreciate it and love your content. Keep up the great job :)

  • @heavybreathing6696 · 2 years ago

    If you didn't put the class_weight parameter into your LR for the imbalanced dataset, the scores would still be pretty much in a "straight" line. What created your skewed calibration curve is the class_weight parameter. You only care about class_weight if you want to use 0.5 as a threshold to label your predictions as 1 or 0. If you're only looking at probabilities, you should NOT touch the class_weight parameter.
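
    A sketch of this point on synthetic data (illustrative assumptions throughout): the same logistic regression with and without class_weight='balanced', compared via reliability curves:

        import numpy as np
        from sklearn.calibration import calibration_curve
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split

        X, y = make_classification(n_samples=20_000, weights=[0.9, 0.1], random_state=0)
        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

        for cw in (None, "balanced"):
            model = LogisticRegression(max_iter=1000, class_weight=cw).fit(X_train, y_train)
            prob_true, prob_pred = calibration_curve(
                y_test, model.predict_proba(X_test)[:, 1], n_bins=10)
            # With cw=None the points hug the diagonal; "balanced" pushes the
            # predicted probabilities well above the observed fractions.
            print(cw, np.round(prob_pred - prob_true, 2))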

  • @Corpsecreate · 1 year ago

    Absolutely spot on! Completely agree.

  • @TheBjjninja · 2 years ago

    I believe the strong use case would be in the medical industry, where you need to "support" the model's estimated probability, say for having cancer as an example. I am not sure calibration is always required, even for an imbalanced dataset.

  • @Corpsecreate · 1 year ago

    It's not required

  • @sankettilekar8650 · 11 months ago

    Calibration is also required for credit risk models. In that case the actual event rate is used to calculate the unit economics, which is why it can be important that predicted and actual probabilities are close to each other.
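
    A toy illustration of that unit-economics point (all numbers made up): if the predicted probability of default (PD) feeds an expected-value decision, an inflated PD can flip the decision:

        INTEREST_INCOME = 120.0      # profit if the loan is repaid
        LOSS_GIVEN_DEFAULT = 900.0   # loss if the borrower defaults

        def expected_profit(pd_hat: float) -> float:
            return (1 - pd_hat) * INTEREST_INCOME - pd_hat * LOSS_GIVEN_DEFAULT

        print(expected_profit(0.05))  # +69.0 -> approve the loan
        print(expected_profit(0.15))  # -33.0 -> an inflated PD flips it to reject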

  • @AshokKumar_2216 · 3 years ago

    🔴 Nice

  • @125errorz · 3 years ago

    Are you related to Marques Brownlee, the tech guy?

  • @CodeEmporium · 3 years ago

    I am not. But that's a compliment :)

  • @subtlehyperbole4362 · 1 year ago

    I feel like I run into this issue all the time: what do you even mean by the "true" probability in a situation like classifying something as fraud or not fraud? The "true" probability of any specific example is either 100% fraud or 100% not fraud; whether something is fraudulent isn't an inherently probabilistic thing. This is in contrast to something like whether the outcome of a given plate appearance in baseball will be a home run or not: you could repeat the identical plate appearance many, many times and only some of them would result in a home run, most would not. That, at least, is what comes to my mind when we talk about predicting the "true" probability of unbalanced data events that could be calibrated to. What am I missing?

  • @Corpsecreate · 1 year ago

    Well done, but calibration is completely useless. In Case 2, turn off class_weight='balanced' and you won't need to calibrate anything. The only time you need to calibrate a model is when you play with the class weights or up-sample the minority class, both of which are things you should NEVER do in any circumstance.