Probability Calibration Workshop - Lesson 1

This is the first interactive lesson of the Probability Calibration Workshop, presented by Brian Lucena at PyData Global. Model calibration in machine learning is the process of ensuring that the probabilities output by your model accurately reflect the true likelihood of the outcomes they predict. An introduction precedes this lesson. The notebooks associated with this workshop can be found in the repo github.com/numeristical/resou... in the folder "CalibrationWorkshop".
The workshop roughly covers the following topics:

Why calibration?
- What it means for model outputs to be *well-calibrated*.
- Why and when it is important (or not).
- Specific scenarios where calibration may be valuable.

Assessing the model
- How to determine whether the model is well-calibrated.
- Reliability diagrams and how to use them.
- Issues with calibration for values close to 0 or 1.

Calibrating the model
- Illustration of the various techniques (see the brief sketch after this outline):
  - Isotonic Regression
  - Beta Calibration
  - Platt Scaling
  - Spline Calibration
- Demonstrating their use and results on real data.
- Tradeoffs between the approaches.
- Calibrating multi-class models.

Assessing the calibration
- Did the calibration improve model performance?
- Are there flaws in the calibration?
- How to adjust the calibration and improve it further.
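
As a rough preview of the techniques listed above, here is a minimal sketch (my illustration, not the workshop's notebook code) of two of them, Isotonic Regression and Platt Scaling, applied to held-out scores with scikit-learn. The dataset and model are placeholders; Beta Calibration and Spline Calibration are demonstrated in the workshop notebooks themselves.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data and model: fit on a training set, calibrate on a held-out set.
X, y = make_classification(n_samples=5000, weights=[0.8], random_state=0)
X_train, X_cal, y_train, y_cal = train_test_split(X, y, test_size=0.3, random_state=0)

model = GradientBoostingClassifier().fit(X_train, y_train)
p_cal = model.predict_proba(X_cal)[:, 1]          # uncalibrated scores

# Isotonic Regression: monotone, piecewise-constant mapping from score to probability.
iso = IsotonicRegression(out_of_bounds="clip").fit(p_cal, y_cal)

# Platt Scaling: a logistic regression fit on the log-odds of the scores.
eps = 1e-6
logit = np.log(np.clip(p_cal, eps, 1 - eps) / np.clip(1 - p_cal, eps, 1 - eps))
platt = LogisticRegression().fit(logit.reshape(-1, 1), y_cal)

p_iso = iso.predict(p_cal)
p_platt = platt.predict_proba(logit.reshape(-1, 1))[:, 1]
```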

Comments: 24

  • @tylervandermate6818 · 2 years ago

    This workshop is an absolute goldmine! Everything is very clear. But I have the same question as Matt: Is there ever a situation where we wouldn't want to calibrate a model?

  • @numeristical · 2 years ago

    There are some scenarios where you might make a decision based on a threshold that doesn't depend on a probability. For example, in fraud detection you might decide to flag the 10 riskiest transactions each day. Or you might choose a threshold based on precision / recall considerations. So it may not always be necessary. And calibrating very low probabilities (like 1 in 10,000 or 1 in 100,000) can be very difficult. Hope that helps clarify!
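
A minimal sketch (my illustration, not from the workshop) of the kind of threshold decision described above, where only the ranking of the scores matters and calibration is not strictly needed; the scores are randomly generated placeholders.

```python
import numpy as np

# Hypothetical model scores for one day's transactions.
rng = np.random.default_rng(0)
scores = rng.random(1000)

# Flag the 10 riskiest transactions: only the ordering of the scores matters,
# not whether they are well-calibrated probabilities.
flagged = np.argsort(scores)[-10:]
print(flagged)
```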

  • @ajbleas · 3 years ago

    This playlist is fantastic! Very useful information! :)

  • @numeristical · 3 years ago

    Glad it was helpful!

  • @alibagheri411 · a year ago

    Great video, thanks very much

  • @numeristical · a year ago

    Glad you liked it!

  • @dakingrai4235 · a year ago

    Many thanks for your video. A question on how to calculate the predicted confidence: are we looking at the softmax score of the predicted label to get the predicted probability (confidence)? For example, suppose I have three classes (cat, tiger, and dog) and I feed my model a cat image, but the model predicts dog with a 0.8 softmax score and cat with a 0.2 softmax score. Which softmax score do I use to assign the example to a specific bin and to calculate the average confidence of that bin? Thank you!

  • @numeristical · a year ago

    OK, so the Coarsage algorithm outputs a vector of 17 numbers for each case. These numbers represent a probability distribution across 17 "classes", the score being 0, 1, 2, 3, ..., up to 16 (or more). So to get the probability of, say, exactly 3, you would look at the 4th number (i.e. Python index 3).
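
A minimal sketch of the indexing described above (my illustration, with a randomly generated stand-in for the model's 17-number output vector):

```python
import numpy as np

# Hypothetical output for one case: a probability distribution over the
# 17 score "classes" 0, 1, 2, ..., 16 (or more).
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(17))   # stand-in for the model's output vector

p_exactly_3 = probs[3]               # 4th entry = P(score == 3)
p_at_least_3 = probs[3:].sum()       # tail probability P(score >= 3)
print(p_exactly_3, p_at_least_3)
```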

  • @andreamarkos · 2 years ago

    Which algorithms are better candidates for predicting probabilities for binary outcomes in complex multivariate models?

  • @numeristical · 2 years ago

    Gradient Boosting is still my "go-to" all-purpose algorithm for any kind of structured data.

  • @fatiharisma9376 · 3 years ago

    A very good explanation, thank you very much. I have an inquiry regarding probability calibration: I've read that we can find transition probabilities between states by using a calibration technique, but I don't understand how it works. I would like to learn more about this technique, and I would really appreciate it if you could assist me on this matter. Thank you again.

  • @numeristical · 10 months ago

    Thanks for your message. Calibration is used to "correct" probabilities when you have data. So if you have transition probabilities and then actual data, you could potentially use that data as a calibration set.

  • @mattsamelson4975 · 2 years ago

    In one of your examples with an imbalanced data set, you make a bigger bin toward the right-hand side of the reliability curve to account for fewer observations. The bin average might show as calibrated even though the individual predictions within that bin might be all over the place. How can one conclude in that case that the bin is well-calibrated? Any given prediction may be far off the average, which would suggest that the model isn't well-calibrated in that range. Am I looking at this incorrectly? Thanks in advance.

  • @numeristical · 2 years ago

    Right, so this is the fundamental problem with binning - you are averaging the results of predictions with different probabilities. The wider the bin, the more granularity you are losing. So you're right - if you don't have a lot of examples of predictions with a particular value (or in a range of values), you can't really conclude that the bin is well-calibrated. To make an analogy to hypothesis testing, the best you can say is that you "fail to reject the null hypothesis that the probabilities in the bin are well-calibrated", but your test will not have much power.
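
A minimal sketch (my illustration, with synthetic data) of how a binned reliability check works, and why a wide bin can hide miscalibration of the individual predictions inside it:

```python
import numpy as np

# Hypothetical predicted probabilities and binary outcomes.
rng = np.random.default_rng(0)
p_pred = rng.beta(2, 5, size=5000)          # model scores in [0, 1]
y_true = rng.binomial(1, p_pred)            # outcomes drawn to be calibrated

# Variable-width bins: finer where data is dense, one wide bin near 1.
bin_edges = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 1.0])
bin_idx = np.digitize(p_pred, bin_edges[1:-1])

for b in range(len(bin_edges) - 1):
    mask = bin_idx == b
    if mask.sum() == 0:
        continue
    # A bin looks "well-calibrated" only in the sense that these two averages
    # agree; a wide bin averages over many different predicted values.
    print(f"bin [{bin_edges[b]:.1f}, {bin_edges[b + 1]:.1f}): "
          f"mean predicted = {p_pred[mask].mean():.3f}, "
          f"observed rate = {y_true[mask].mean():.3f}, "
          f"n = {mask.sum()}")
```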

  • @yuliahnurfadlilah3990 · a year ago

    Can it be applied to real-time data such as IoT sensors? I have difficulty knowing which equations to use when coding this for the sensors.

  • @numeristical · a year ago

    The source of the data doesn't matter, just as long as you have scores that you want to calibrate and an appropriate calibration set.

  • @flaviobrienza6081 · a year ago

    Many thanks for your video. A question: can I first find a model's best hyperparams using RandomizedSearchCV, then create a new model with those hyperparams, without fitting it, and use it for probability calibration? Are the hyperparams found with RandomizedSearchCV still valid if I do this?

  • @numeristical · a year ago

    Hi - thanks for the question. I'm not sure what you mean by "creating a new model with those hyperparams, *without fitting it*..." You can't do much with a model if it is not fit. But I'm probably missing something. I've got a discord server called "numeristical" (just getting it started) but a question like this would be perfect for that venue. If you could join and post your question there, we can have a longer discussion. Here's a link to join: discord.gg/HagqzZa8 (will expire in 7 days). Thanks!

  • @flaviobrienza6081 · a year ago

    @@numeristical Thanks for your reply. I was told that there are two ways to use CalibratedClassifierCV:
    1. model = XGBClassifier(); model.fit(X_train, y_train); then create the CalibratedClassifierCV with cv='prefit' and fit it on a validation set.
    2. model = XGBClassifier(); then create the CalibratedClassifierCV with cv=5 (for example) and fit it on the training set.
    In the second case, can I use the hyperparams found with RandomizedSearchCV?
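
A minimal sketch of the two workflows described in the comment above (my illustration, using placeholder data and parameters; note that depending on your scikit-learn version the first argument is named estimator or base_estimator, and cv='prefit' is deprecated in recent releases):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Placeholder data split into a training set and a validation/calibration set.
X, y = make_classification(n_samples=5000, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Way 1: fit the model first, then calibrate on the held-out validation set.
model = XGBClassifier(n_estimators=200, max_depth=3)
model.fit(X_train, y_train)
cal1 = CalibratedClassifierCV(model, method="isotonic", cv="prefit")
cal1.fit(X_val, y_val)

# Way 2: let CalibratedClassifierCV do the cross-fitting on the training set.
cal2 = CalibratedClassifierCV(XGBClassifier(n_estimators=200, max_depth=3),
                              method="isotonic", cv=5)
cal2.fit(X_train, y_train)

p1 = cal1.predict_proba(X_val)[:, 1]
p2 = cal2.predict_proba(X_val)[:, 1]
```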

  • @numeristical · a year ago

    Hmmm - I am not too familiar with the sklearn CalibratedClassifierCV. I remember it was unnecessarily complicated to use (looks like you are finding that as well). Instead, I would just fit your model normally (using whatever hyperparameter search works best), and then calibrate it with one of the methods I illustrate in this lesson. You can use the code in the notebook as a template. Hope this helps!

  • @flaviobrienza6081 · a year ago

    @@numeristical ok, many thanks.

  • @mattsamelson4975 · 2 years ago

    Why wouldn't you ALWAYS calibrate probabilities for models?

  • @numeristical · 2 years ago

    There are some scenarios where you might make a decision based on a threshold that doesn't depend on a probability. For example, in fraud detection you might decide to flag the 10 riskiest transactions each day. Or you might choose a threshold based on precision / recall considerations. So it may not always be necessary. And calibrating very low probabilities (like 1 in 10,000 or 1 in 100,000) can be very difficult. Hope that helps clarify!

  • @mattsamelson4975 · 2 years ago

    @@numeristical Yes very helpful. Thank you.