8.3 Bias-Variance Decomposition of the Squared Error (L08: Model Evaluation Part 1)

Science & Technology

In this video, we decompose the squared error loss into its bias and variance components.
-------
This video is part of my Introduction to Machine Learning course.
Next video: • 8.4 Bias and Variance ...
The complete playlist: • Intro to Machine Learn...
A handy overview page with links to the materials: sebastianraschka.com/blog/202...
-------
If you want to be notified about future videos, please consider subscribing to my channel: / sebastianraschka

Comments: 37

  • @elnuisance · 2 years ago

    This was life-saving. Thank you so much, Sebastian, especially for explaining why 2ab = 0 while deriving the decomposition.
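
For reference, a short sketch of the step mentioned above, in the video's setup where y is a fixed target and the prediction y_hat is the random quantity (varying over training sets). With a = y - E[y_hat], a constant, and b = E[y_hat] - y_hat, the cross-term vanishes:

    % a is constant, so it factors out of the expectation,
    % and what remains inside has expectation zero.
    E[2ab] = 2\,(y - E[\hat{y}])\; E\!\bigl[E[\hat{y}] - \hat{y}\bigr]
           = 2\,(y - E[\hat{y}])\,\bigl(E[\hat{y}] - E[\hat{y}]\bigr) = 0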

  • @bluepenguin5606 · 1 year ago

    Hi Professor, thank you so much for the excellent explanation!! I learned the bias-variance decomposition a long time ago but never fully understood it until I watched this video! The detailed explanation of each definition helps a lot. Also, the code implementation helps me not only understand the concepts but also implement them in a real application, which is the part I always struggle with! I'll definitely find time to watch the other videos to make my ML foundation more solid.

  • @SebastianRaschka · 1 year ago

    Wohoo, glad this was so useful! 😊

  • @kairiannah · 1 year ago

    This is how you teach machine learning. Respectfully, the prof at my university needs to take notes!

  • @user-st6sl4zo7p · 7 months ago

    Thank you so much for the intuitive explanation! The notation is clear, and it just instantly clicked.

  • @whenmathsmeetcoding1836 · 2 years ago

    This was wonderful, Sebastian. After looking, I found no other video on YouTube with such an explanation.

  • @SebastianRaschka · 2 years ago

    Wohoo, thanks so much for the kind comment!

  • @PriyanshuSingh-hm4tn · 2 years ago

    The best explanation of bias & variance I've encountered so far. It would be helpful if you could include the "noise" term too.

  • @SebastianRaschka · 2 years ago

    Thanks! Haha, I would defer the noise term to my statistics class, but yeah, maybe I should do a bonus video on that. A director's cut. :)

  • @ashutoshdave1 · 2 years ago

    Thanks for this! One of the best explanations 👏

  • @SebastianRaschka · 2 years ago

    Thanks! Glad to hear!

  • @ashutoshdave1 · 2 years ago

    @SebastianRaschka Hi Sebastian, I visited your awesome website resource for ML/DL. Thanks again. Can't wait for the Bayesian part to be completed.

  • @gurudevilangovan · 2 years ago

    Thank you so much for the bias-variance videos. Though I intuitively understood the concept, these equations never made sense to me before I watched the videos. Truly appreciated!!

  • @SebastianRaschka · 2 years ago

    Awesome, I am really glad to hear that I was able to explain it well :)

  • @khuongtranhoang9197 · 3 years ago

    Do you know that you are doing truly good work! Clear down to every single detail.

  • @SebastianRaschka · 3 years ago

    Thanks, this is very nice to hear!

  • @krislee9296 · 2 years ago

    Thank you so much. This helped me understand the bias-variance decomposition mathematically.

  • @SebastianRaschka · 2 years ago

    Awesome! Glad to hear!

  • @imvijay1166 · 1 year ago

    Thank you for this great lecture series!

  • @SebastianRaschka · 1 year ago

    Glad to hear that you are liking it!

  • @siddhesh119369 · 9 months ago

    Hi, thanks for teaching, really helpful 😊

  • @justinmcgrath753 · 5 months ago

    At 10:20, the bias comes out backward because the error should be y_hat - y, not y - y_hat. The "true value" in an error is subtracted from the estimate, not the other way around. This is easily remembered by thinking of a simple random variable with mean mu and error e: y = mu + e. Thus, e = y - mu.
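
A note on the sign convention raised above: it flips the sign of the bias but leaves the squared-error decomposition unchanged, because the bias enters the decomposition only through its square:

    \mathrm{Bias}(\hat{y}) = E[\hat{y}] - y \quad\text{(estimate minus truth)},
    \qquad (y - E[\hat{y}])^2 = (E[\hat{y}] - y)^2

so the Bias^2 term, and hence the decomposition, is the same under either convention.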

  • @tykilee9683 · 2 years ago

    So helpful 😭😭😭

  • @andypandy1ify · 3 years ago

    This is an absolutely brilliant video, Sebastian - thank you. I have no problem deriving the bias-variance decomposition mathematically, but no one seems to explain what the variance or expectation is with respect to: is it just one value? Over multiple training sets? Over different values within one training set? You explained it excellently.

  • @SebastianRaschka · 3 years ago

    Thanks for the kind words! Glad it was useful!

  • @Rictoo · 1 month ago

    I have a couple of questions. Regarding the variance: is this calculated across different parameter estimates given the same functional form of the model? Also, these parameter estimates depend on the optimization algorithm used, right? That is, the model predictions come from "empirically derived" models rather than some theoretically optimal parameter combination for a given functional form. If so, would this mean that, technically speaking, there is an additional source of error in the loss calculation, something like "implementation variance" due to our model likely not having the most optimal parameters compared to some theoretical optimum? Hope this makes sense; I'm not a mathematician. Thanks!
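
The question above is a natural place for a concrete sketch of what the variance term averages over: the same model class re-fit to many training sets, simulated here with bootstrap samples. The dataset, model, and n_rounds below are illustrative choices, not from the video:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeRegressor

    # Illustrative data; any regression dataset works.
    X, y = make_regression(n_samples=600, n_features=10, noise=10.0, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    n_rounds = 200          # number of simulated training sets
    rng = np.random.RandomState(0)
    preds = np.empty((n_rounds, X_test.shape[0]))

    for i in range(n_rounds):
        # A bootstrap sample plays the role of "a different training set."
        idx = rng.randint(0, X_train.shape[0], size=X_train.shape[0])
        model = DecisionTreeRegressor().fit(X_train[idx], y_train[idx])
        preds[i] = model.predict(X_test)

    mean_pred = preds.mean(axis=0)                # estimate of E[y_hat] per test point
    variance = ((preds - mean_pred) ** 2).mean()  # variance term, averaged over test points
    bias_sq = ((mean_pred - y_test) ** 2).mean()  # squared-bias term
    print(f"bias^2 = {bias_sq:.2f}, variance = {variance:.2f}")

Any randomness in the fitting procedure itself (e.g., a stochastic optimizer) would also surface in this variance term, since the expectation runs over everything that varies between refits.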

  • @kevinshao9148 · 2 years ago

    Thanks for the great video! One question: at 8:42, why is y constant? y = f(x) here also has a distribution, i.e., it is a random variable, is that correct? And when you say "apply the expectation on both sides," is this expectation over y or over x?

  • @SebastianRaschka · 2 years ago

    Good point. For simplicity, I assumed that y is not a random variable but a fixed target value instead.

  • @kevinshao9148 · 2 years ago

    @SebastianRaschka Thank you so much for the reply! Yeah, that's where my confusion lies. So what is the expectation taken over? If the expectation is over all values of x, then you cannot make this assumption, right?
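
To make the reply above concrete: the expectation in the video's derivation is taken over training sets D, each yielding a fitted prediction \hat{y}_D at a fixed query point x, while y = f(x) is held fixed; it is not an expectation over x. In that notation:

    E_D\bigl[(y - \hat{y}_D)^2\bigr]
      = \underbrace{\bigl(y - E_D[\hat{y}_D]\bigr)^2}_{\text{Bias}^2}
      + \underbrace{E_D\bigl[(\hat{y}_D - E_D[\hat{y}_D])^2\bigr]}_{\text{Variance}}

Averaging additionally over x, or over noisy observations of y, would add the irreducible-noise term that the video sets aside.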

  • @DeepS6995 · 3 years ago

    Professor, does your bias_variance_decomp work in Google Colab? It did not for me, though it worked just fine in Jupyter. The problem with Jupyter is that bagging is way slower (that's my computer) than what I could get in Colab.

  • @SebastianRaschka · 3 years ago

    I think Google Colab has a very old version of MLxtend as the default. I recommend the following: !pip install mlxtend --upgrade

  • @DeepS6995 · 3 years ago

    @SebastianRaschka It works now. Thanks for the prompt response.
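
For reference, a minimal usage sketch of bias_variance_decomp from mlxtend.evaluate; the synthetic data and model choice are illustrative. With loss='mse', recent MLxtend versions return the average loss, the squared-bias term, and the variance term:

    from mlxtend.evaluate import bias_variance_decomp
    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeRegressor

    X, y = make_regression(n_samples=600, n_features=10, noise=10.0, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    avg_loss, avg_bias, avg_var = bias_variance_decomp(
        DecisionTreeRegressor(random_state=0),
        X_train, y_train, X_test, y_test,
        loss='mse', num_rounds=50, random_seed=1)

    # For squared error, the average loss decomposes (up to simulation noise)
    # into the squared-bias and variance terms.
    print(f"MSE {avg_loss:.2f} ~= bias^2 {avg_bias:.2f} + variance {avg_var:.2f}")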

  • @1sefirot9 · 3 years ago

    Any good sources or hints on dataset stratification for regression problems?

  • @SebastianRaschka · 3 years ago

    Not sure if this is the best way, but personally I approached that by manually specifying bins for the target variable and then proceeding with stratification like for classification. There may be more sophisticated techniques out there, though, e.g., based on KL divergence or so.

  • @1sefirot9 · 3 years ago

    @SebastianRaschka Hm, given a sufficiently large number of bins this should be a sensible approach, and easy to implement. I will play around with that. I am trying some of the things taught in this course on the Walmart Store Sales dataset (available from Kaggle); a naive training of LightGBM already returns marginally better results than what the instructor on Udemy had (he used XGBoost with hyperparameters returned by the AWS SageMaker auto-tuner).
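
A minimal sketch of the binning approach described in this thread (the data and bin count are illustrative): discretize the continuous target into quantile bins, then pass the bin labels to the stratify argument of scikit-learn's train_test_split:

    import numpy as np
    from sklearn.model_selection import train_test_split

    rng = np.random.RandomState(0)
    X = rng.randn(1000, 5)
    y = rng.lognormal(size=1000)    # skewed continuous target

    n_bins = 10
    # Interior quantile edges; np.digitize maps each y to a bin label 0..n_bins-1.
    edges = np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1])
    y_binned = np.digitize(y, edges)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y_binned, random_state=0)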

  • @bashamsk1288 · 1 year ago

    When you say bias^2 + variance, is that for a single model? In the beginning, you said bias and variance are computed over different models trained on different datasets, so which one is it? If we consider a single model, then is the bias just the mean error and the variance the mean squared error?

  • @jayp123 · 1 month ago

    I don't understand why you can't multiply "E", the expectation, by "y", the constant.
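
On the question above: because y is treated as a fixed constant in this derivation, the expectation passes through it by linearity; this is exactly the property used when the cross-term is simplified:

    E[y] = y, \qquad E[\,y\,\hat{y}\,] = y\,E[\hat{y}]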
