Lecture 08 - Bias-Variance Tradeoff

Science & Technology

Bias-Variance Tradeoff - Breaking down the learning performance into competing quantities. The learning curves. Lecture 8 of 18 of Caltech's Machine Learning Course - CS 156 by Professor Yaser Abu-Mostafa. View course materials in iTunes U Course App - itunes.apple.com/us/course/ma... and on the course website - work.caltech.edu/telecourse.html
Produced in association with Caltech Academic Media Technologies under the Attribution-NonCommercial-NoDerivs Creative Commons License (CC BY-NC-ND). To learn more about this license, creativecommons.org/licenses/b...
This lecture was recorded on April 26, 2012, in Hameetman Auditorium at Caltech, Pasadena, CA, USA.

Comments: 57

  • @clarkupdike6518
    2 years ago

    Prof Yaser has the rare ability to hide unnecessary complexity until it is truly needed, and then when needed, very gently layers it on with good explanation. It makes it much easier to understand even complex formulae and notation that would lose a lot of the audience when delivered by the typical instructor.

  • @manjuhhh
    10 years ago

    Thank you Caltech and Prof Yaser Abu Mostafa in particular for the lecture.

  • @kadaj2k7
    4 years ago

    It is so much fun to learn by watching his lectures. Thank you Caltech and Prof. Yaser!

  • @isaacguerreiro3869
    5 years ago

    When he demonstrated that h_0 is better than h_1 in this learning situation, I had to applaud and smile. Prof. Yaser is incredible.
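
For anyone who wants to reproduce that moment, here is a minimal numerical sketch of the experiment as I understand it (my own code, not the course's; it assumes the lecture's setup of a sin(πx) target on [-1, 1] and two-point datasets, and the helper names are mine):

```python
import numpy as np

# Fit each hypothesis set to many two-point datasets drawn from
# f(x) = sin(pi*x) on [-1, 1], then estimate bias^2 and variance.
rng = np.random.default_rng(0)
f = lambda x: np.sin(np.pi * x)
x_test = np.linspace(-1, 1, 401)           # grid for the expectation over x

def bias_variance(fit, n_datasets=10000, n_points=2):
    """Return (bias^2, variance) for a fitting routine over random datasets."""
    preds = np.empty((n_datasets, x_test.size))
    for i in range(n_datasets):
        x = rng.uniform(-1, 1, n_points)
        preds[i] = fit(x, f(x))(x_test)    # learn on the dataset, evaluate on the grid
    g_bar = preds.mean(axis=0)             # average hypothesis g_bar(x)
    bias2 = np.mean((g_bar - f(x_test)) ** 2)
    var = np.mean(preds.var(axis=0))       # E_x[ E_D[(g^D(x) - g_bar(x))^2] ]
    return bias2, var

def fit_constant(x, y):                    # h_0: h(x) = b, the mean of the two targets
    b = y.mean()
    return lambda t: np.full_like(t, b)

def fit_line(x, y):                        # h_1: h(x) = a*x + b through the two points
    a, b = np.polyfit(x, y, 1)
    return lambda t: a * t + b

for name, fit in [("constant h_0", fit_constant), ("line h_1", fit_line)]:
    bias2, var = bias_variance(fit)
    print(f"{name:12s}  bias^2 = {bias2:.2f}   var = {var:.2f}   sum = {bias2 + var:.2f}")
```

Run as-is, the constant model should come out with the noticeably smaller bias² + variance, which is the counterintuitive result being applauded.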

  • @chyldstudios
    4 years ago

    The most lucid explanation of bias vs variance I've observed. Bravo!

  • @insanecrazy2
    11 years ago

    Thank you so much for this excellent, descriptive lecture! Worth every minute of it!

  • @BL4ckViP3R
    11 years ago

    Excellent! Thanks for this great demonstration of trade-off between variance and bias.

  • @aviraljanveja5155
    5 years ago

    1:10:33 - When Prof. Abu-Mostafa is so awesome that he teaches you the meaning of bootstrapping along with bias and variance! :D

  • @bogdansalyp2834
    6 years ago

    Excellent lecture, thanks to Prof. Abu-Mustafa and Caltech!

  • @qrubmeeaz
    5 years ago

    Great course. Thank you for posting it.

  • @mehdimashayekhi1675
    10 years ago

    Excellent course and teacher, thanks for sharing.

  • @Nbecom
    11 years ago

    The same mathematical concept may occur in different contexts and be useful. Here, bias and variance constitute two components of E_out, and breaking E_out into these two components is a thing of beauty.
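
For readers skimming the thread, the decomposition being praised can be written compactly, in the lecture's notation as I read it (noiseless target f, average hypothesis g_bar):

```latex
\mathbb{E}_{\mathcal{D}}\!\left[ E_{\text{out}}\!\left( g^{(\mathcal{D})} \right) \right]
  = \mathbb{E}_{x}\!\left[
      \underbrace{\big(\bar{g}(x) - f(x)\big)^{2}}_{\text{bias}(x)}
      + \underbrace{\mathbb{E}_{\mathcal{D}}\!\big[\big(g^{(\mathcal{D})}(x) - \bar{g}(x)\big)^{2}\big]}_{\text{var}(x)}
    \right],
\qquad \bar{g}(x) \equiv \mathbb{E}_{\mathcal{D}}\!\big[ g^{(\mathcal{D})}(x) \big].
```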

  • @demon0192
    4 years ago

    The past few lectures have been super theoretical and thus difficult for me, but I am glad this was a good one.

  • @SinanNoureddine
    4 years ago

    I agree, Tony. The past three lectures were not easy at all. According to Caltech's website (work.caltech.edu/telecourse.html), the last three lectures are pure theory and mathematics.

  • @jjpp1993
    5 years ago

    What an elegant derivation

  • @nova2577
    1 year ago

    Best explanation of Bias and Variance

  • @mayukhmalidas
    3 years ago

    This video is 🔥

  • @KapilGuptathelearner
    11 years ago

    On slide 15, is the bias calculated by integrating (sin x - 0)^2 from -pi to pi?
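
Roughly yes, up to normalization: the bias is an expectation over x, so the squared difference gets averaged rather than merely integrated. Assuming the lecture's setup (x uniform on [-1, 1], target sin(πx), and g_bar(x) ≈ 0 for the constant model), the computation is

```latex
\text{bias}
  = \mathbb{E}_{x}\!\left[ \big(\bar{g}(x) - f(x)\big)^{2} \right]
  = \frac{1}{2}\int_{-1}^{1} \big(0 - \sin(\pi x)\big)^{2}\, dx
  = \frac{1}{2} \approx 0.5,
```

and the same value results from (1/2π)∫ sin²x dx over [-π, π], which is presumably the integral the question has in mind.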

  • @mikewy1203
    5 years ago

    Really well explained. You can watch the sin-function example at the 31-minute mark first, then come back to the theoretical derivation.

  • @andysilv
    8 years ago

    It seems rather strange that the bias hardly depends on N on slide 20. The professor says it doesn't matter much whether we take 2 or 10 points at a time if we have an infinite number of datasets, but the problem is that this can greatly affect the probability distribution of g^(D)(x), and hence its expectation over D, which would make the bias vary greatly. I didn't get the point.

  • @denisdaletski7807
    7 years ago

    The probability distribution will vary, but its expected value will be almost the same. What does vary is the line's ability to have an extreme slope (that goes down as N goes up). But since we have an infinite number of datasets of size N, the average line over infinitely many fitted lines tends toward the one we would get by fitting the target function directly (assuming for a moment that it is known). The average line g_hat(2), constructed from an infinite number of datasets of size 2, gets close to the same "true" line as the average line g_hat(100), constructed from an infinite number of datasets of size 100.
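
A quick numerical check of this reply (my own sketch, assuming the same sin(πx) target on [-1, 1] as in the lecture's example):

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(np.pi * x)              # target, as in the lecture's example
x_test = np.linspace(-1, 1, 1000)            # grid for the expectation over x

def bias_of_average_line(n_points, n_datasets=20000):
    """Average the least-squares line over many datasets of size n_points; return its bias."""
    coefs = np.zeros(2)
    for _ in range(n_datasets):
        x = rng.uniform(-1, 1, n_points)
        coefs += np.polyfit(x, f(x), 1)      # accumulate [slope, intercept]
    a_bar, b_bar = coefs / n_datasets        # coefficients of the average hypothesis g_bar
    return np.mean((a_bar * x_test + b_bar - f(x_test)) ** 2)

for n in (2, 100):
    print(f"N = {n:3d}:  bias = {bias_of_average_line(n):.2f}")
# Individual fits vary wildly for N = 2 and very little for N = 100,
# yet the bias (a property of the average hypothesis) barely changes.
```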

  • @loscilla
    1 year ago

    1:00:26 - Shouldn't the expected generalization error be without sigma? If it is the difference between E_out and E_in, sigma^2 cancels out, so 2(d+1)/N.
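
Working the subtraction through with the lecture's linear-regression learning-curve expressions (also quoted further down in this thread), the σ² multiplies both terms, so it scales the gap rather than cancelling:

```latex
\mathbb{E}\!\left[ E_{\text{out}} \right] - \mathbb{E}\!\left[ E_{\text{in}} \right]
  = \sigma^{2}\!\left( 1 + \frac{d+1}{N} \right) - \sigma^{2}\!\left( 1 - \frac{d+1}{N} \right)
  = \frac{2\,\sigma^{2}\,(d+1)}{N}.
```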

  • @janasandeep
    6 years ago

    What is the meaning of "zooming in" on a good h?

  • @mohammedzidan1203
    10 years ago

    What exactly do you mean by data resources?

  • @desitravellers2023
    5 years ago

    You can think of it as the amount of information available to you as training samples. This includes both quantity (the number of samples) and quality (how well they represent the input distribution).

  • @thangbom4742
    5 years ago

    I don't understand why it is possible to exchange the order of E_x and E_D (16:22). The x variable depends on the specific dataset D; each dataset D defines a different set of x values. How can E_D be calculated before E_x?

  • @laurin1510
    5 years ago

    In case it is still relevant: you can interchange them because of the Fubini-Tonelli theorem, which states the assumptions needed to swap the order of integration; non-negativity of the integrand is the main one. As for your concern: x does not depend on the specific dataset, the learned function does. Having a specific D does not mean you integrate only over the x values that appear in D. You always integrate over the whole domain of x and, independently, over the whole space of datasets D of a given size (say N, to stay consistent with the lecture).
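
In symbols, the swap being justified (x and D are drawn from fixed, independent distributions, and the integrand is nonnegative, so Fubini-Tonelli applies):

```latex
\mathbb{E}_{\mathcal{D}}\!\left[ \mathbb{E}_{x}\!\left[ \big( g^{(\mathcal{D})}(x) - f(x) \big)^{2} \right] \right]
  = \mathbb{E}_{x}\!\left[ \mathbb{E}_{\mathcal{D}}\!\left[ \big( g^{(\mathcal{D})}(x) - f(x) \big)^{2} \right] \right].
```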

  • @AndyLee-xq8wq
    1 year ago

    I want to know: when we say effective number of parameters, VC dimension, and degrees of freedom, are they actually the same thing?

  • @lindercandidodasilva1276
    2 months ago

    yes.

  • @alexanderostrikov
    8 years ago

    Bigger VC dimension (better approximation) => smaller bias, larger variance, smaller in-sample error, larger generalization error. Bigger N => bias unchanged, smaller variance, larger in-sample error, smaller generalization error. Right?

  • @codyneil97
    8 years ago

    +Sasha Ostrikov Correct. The only thing I'd add is that a larger N gives you the ability to use a more complex hypothesis set (bigger VC dimension), which will lead to a smaller E_out.

  • @-long-
    4 years ago

    24:24 - Is it true that g_bar(x) is the best hypothesis, or is that just his way of putting it so people understand more easily? Thank you.

    Update: after 3 years, I accidentally came back to this video. The answer is yes: g_bar(x) is the "best possible hypothesis" one can find given all the data points. Of course we'll still have deterministic noise, since g_bar(x) is just an approximation of f(x). The details are discussed in Lecture 11 - Overfitting.

  • @ajayram198
    6 years ago

    What does the professor mean by "inability to zoom in on the right hypothesis"?

  • @desitravellers2023
    5 years ago

    When would you say you have found the right hypothesis g, given a dataset D? It is when Eout(g) ~ 0. But since you only have access to the sample D and not the entire input distribution, you hope that if Ein(g^(D)) ~ 0 then Eout will behave the same way. As the hypothesis complexity increases, the probability of that hope failing also increases: intuitively, if you have a large number of choices to look at, the probability of finding a hypothesis g such that Ein(g^(D)) ~ 0 but |Eout - Ein| is large also increases. This is what he meant by the "inability to zoom in on the right hypothesis".

  • @mohammedzidan1203
    10 years ago

    What do you mean by a simple model and a complex model?

  • @denisdaletski7807
    7 years ago

    It depends on the particular case. In the sinusoid example, the linear and constant models are pretty simple, while a degree-4 polynomial is, I'd say, "complex enough to approximate well". But the price to pay for this complexity may be the inability of such a polynomial to extrapolate well.

  • @markh1462
    5 years ago

    Generally, more parameters = complex, fewer parameters = simple.

  • @tradingmogador9171
    5 years ago

    In the last section of this video he made a mistake by confusing E_in and E_out: E_in = (sigma^2)*(1 + (d+1)/N) and E_out = (sigma^2)*(1 - (d+1)/N), not the contrary.

  • @JackSPk
    4 years ago

    Why are you saying this? If d+1 = 10 and N = 2, it should be very easy for my model to (over)fit those two points. So E_in = s^2 * (1 - 10/2) = s^2 * (-4), and E_out = s^2 * (1 + 10/2) = s^2 * 6 > 0, which is what we expect: complex models overfit few points (low E_in, high E_out). The other way around would give large values of E_in in this case, which shouldn't happen.
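
For reference, the learning-curve expressions from the lecture's noisy linear regression example, which this reply is defending, read

```latex
\mathbb{E}\!\left[ E_{\text{in}} \right]  = \sigma^{2}\!\left( 1 - \frac{d+1}{N} \right),
\qquad
\mathbb{E}\!\left[ E_{\text{out}} \right] = \sigma^{2}\!\left( 1 + \frac{d+1}{N} \right),
```

so the expected in-sample error sits below the noise level σ² (the fit absorbs part of the noise) and the expected out-of-sample error sits above it.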

  • @varunmahanot5766
    5 years ago

    I do actually have a learning problem 😥😞

  • @VincentZhouPlus
    5 years ago

    I like that the professor is so passionate. My eyes are wet.

  • @marcogelsomini7655
    2 years ago

    44:44 gold

  • @SinanNoureddine
    4 years ago

    At minute 40:28, the professor meant to say the height of the line of g_bar(x) is the variance.

  • @Shnauzzy
    10 years ago

    What does the E[] function mean again? And P[]? It's been 6 years since I've done statistics...

  • @solsticetwo3476
    5 years ago

    The expected value, nothing important to consider. Take another 6 years off.

  • @markh1462
    5 years ago

    @solsticetwo3476 If you just want to be a user of ML (running someone else's code), then sure, maybe you don't need to know the math (especially the statistical foundations) behind it.

  • @markh1462
    5 years ago

    Actually Bayesian methods have an inherent benefit of naturally improving generalization. One can easily prove that the prior term behaves like added data, i.e., increasing N.
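
A standard textbook illustration of the "prior behaves like added data" point (my example, not the commenter's): with a Beta(a, b) prior on a coin's heads probability θ and k heads in N flips, the posterior mean is

```latex
\mathbb{E}\!\left[ \theta \mid \text{data} \right] = \frac{k + a}{N + a + b},
```

so the prior acts exactly like a + b extra pseudo-flips, a of them heads, i.e. like an increase in N.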

  • @MrCmon113
    4 years ago

    I always choose a prior that gives a probability of 1 to the correct hypothesis only.

  • @solsticetwo3476
    5 years ago

    This is the difference between teaching insights and teaching equations

  • @markh1462
    5 years ago

    On slide 8, g_bar need not be the "best". It's just the expected value, i.e., what you "expect" to get "on average". Analyzing accuracy with respect to the expected value is the definition of bias and variance in measurement theory, estimation, and ML alike. This is nothing new; in fact, it's one of the oldest tricks, and it is certainly not new to ML. The discussion of what should count as the "best" is a whole other can of worms.
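
In the lecture's notation, the point is that g_bar is an average over datasets rather than an optimum over the hypothesis set (and it need not even lie in H):

```latex
\bar{g}(x) \;=\; \mathbb{E}_{\mathcal{D}}\!\left[ g^{(\mathcal{D})}(x) \right]
          \;\approx\; \frac{1}{K}\sum_{k=1}^{K} g^{(\mathcal{D}_{k})}(x)
          \qquad \text{for many datasets } \mathcal{D}_{1}, \dots, \mathcal{D}_{K}.
```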

  • @walkingon2001
    9 years ago

    He speaks with a rolling tongue; it's annoying, and it's hard to understand him.

  • @exmachina767
    6 years ago

    walkingon2001 I’m afraid your brain needs a bigger “language accents” training set to generalize properly :)

  • @solsticetwo3476
    5 years ago

    walkingon2001 Dummy, learn other languages like...

  • @markh1462
    5 years ago

    @exmachina767 This is not a linguistics course.

  • @exmachina767
    5 years ago

    @markh1462 Well, if you or your friends want "perfect", "native" accents that don't bother your delicate ears, stop watching this and go check out other courses. But you should get used to the fact that some of the best researchers in ML/AI are foreigners with accents. That's just what the real world is like.

  • @markh1462
    5 years ago

    @exmachina767 I never said I want a "perfect" accent. I'm just annoyed by your original comment, which stated that people who have difficulty catching various accents have smaller brains (which is why I told you that most people watching this video likely use their brains mainly for math, not linguistics). I myself am a foreigner and an ML researcher, and I have a lot of difficulty catching various accents; it does not mean that having an accent is bad or wrong, or that people who have a hard time understanding foreign accents are dumb. It's just difficult, no more, no less. Why are you trying to interpret these comments otherwise? Small brain, delicate ears... jeez, seems like you love personal attacks, huh? These attacks make you sound more delicate and immature than anyone else here, IMO.
