Difference between the error term and the residual in regression models
Errors and residuals are not the same thing in regression. The confusion is not surprising, given the way textbooks seem to use the two words interchangeably. Let me, then, introduce you to residuals and the error term.
Comments: 72
Wow, 8 years have passed but this video still is the best/simplest explanation on KZread. Cheers mate! Thanks!
0:20 residual term, error term (= disturbance term); 4:29 Y = value predicted by the line + error (disturbance); 6:14 residual. Thank you!
Finally someone who completely solved my confusions! Thanks!
Learning something is a skill. Few people have it; they are called good learners. Teaching something is also a skill, and only a minuscule number of people can do it right. And you, my good sir, are an absolute legend of a teacher.
Phil, you're my personal hero... please never stop making these kinds of videos.
Thank you very much. It was helpful to review this basic idea, as I had started mixing them up for some reason. You talk very slowly, but I put the video on 2x speed and your explanation was straight and to the point. Thanks!
So an error is the difference between an observation and the ground-truth model, whereas a residual is the difference between an observation and the model we estimated.
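A small simulation makes this distinction concrete. It is only a sketch under assumed values: the "true" line Y = 2 + 0.5x, the x values 1 to 20 and the standard normal noise are all invented for illustration. Because we generate the data ourselves, we know the errors, which is never the case with real data; the residuals, by contrast, are computed from the sample alone.

```python
import random


def ols_fit(xs, ys):
    """Simple-regression OLS: returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical "true" (population) line: in real data these values are unknown.
TRUE_B0, TRUE_B1 = 2.0, 0.5

rng = random.Random(42)
xs = [float(i) for i in range(1, 21)]
errors = [rng.gauss(0.0, 1.0) for _ in xs]  # known here only because we simulated them
ys = [TRUE_B0 + TRUE_B1 * x + u for x, u in zip(xs, errors)]

b0, b1 = ols_fit(xs, ys)  # estimated line from the sample
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]  # computable from data alone

print(f"true line:      Y = {TRUE_B0:.3f} + {TRUE_B1:.3f}x")
print(f"estimated line: Y = {b0:.3f} + {b1:.3f}x")
for x, u, e in list(zip(xs, errors, residuals))[:5]:
    print(f"x={x:4.1f}  error={u:+.3f}  residual={e:+.3f}")
print("sum of errors:   ", round(sum(errors), 6))
print("sum of residuals:", round(sum(residuals), 6))
```

The residuals track the errors closely but not exactly. Note also that with an intercept in the model, OLS residuals always sum to zero (up to floating-point rounding), whereas the errors in a given sample generally do not.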
I've had this doubt for about a century now!! Thanks for finally resolving it. 😅
Thank you so much Phil Chan... Your way of explanation is so good
Mate you are an ABSOLUTE LEGEND!! Thanks for this explanation.
Perfect! Thank you Phil!!!!
Very clear and highly entertaining.
My econ textbook constantly reiterates that errors and residuals are not the same, but at the same time does not give any intuition behind it. Thank you for the video and keep up the good work!
This is fantastic. The video is very helpful
Perfect man. Crystal clear.
Thanks a lot! A very clear explanation!
Thank you so much for sharing. You made my study easier.
Really nice. Thanks for uploading.
Really nicely and simply explained :)
Great. Thanks a lot for such a nice video.
phenomenal explanation
Please elaborate on this. We never have the 'true' line because typically we don't have the entire population.
@hippolyte2175
2 years ago
Thanks, this comment helped me a lot.
@damirb6294
2 years ago
Thanks for this comment. That is something that is missing from the video. The rest is just "simple" math.
Thanks, nice explanation!
I now hopefully understand the difference between the residual and the error term! But my new problem has to do with the Gauss-Markov assumptions. One assumption says "the expected value of the error term is zero", but how do you check this when the error terms are unobserved? I'm so confused...
@PhilChanstats
6 years ago
This condition is more technical and depends on assumptions about X. In observational data, the Xs, like the Ys, are random, so the zero-mean error assumption is really E(u|X) = 0 (the zero conditional mean assumption). It implies zero correlation between the error and the Xs. If you plot the residuals against the fitted Ys and see a systematic pattern in the scatterplot, that points to this assumption not being met.
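One caveat, which is a standard property of OLS rather than something stated in the video: when the model includes an intercept, the residuals are uncorrelated with the fitted values by construction, so a straight-line trend cannot appear in that plot. What a residual plot typically reveals instead is a curve or a fan shape. The sketch below assumes a hypothetical true relationship with a squared term, fits a straight line anyway, and uses correlations as a crude numerical stand-in for eyeballing the scatterplot. In simple regression the fitted values are a linear function of x, so a U-shape against x is the same U-shape against the fitted values.

```python
import random


def ols_fit(xs, ys):
    """Simple-regression OLS: returns (intercept, slope)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    cov = sum((p - a_bar) * (q - b_bar) for p, q in zip(a, b))
    var_a = sum((p - a_bar) ** 2 for p in a)
    var_b = sum((q - b_bar) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(0)
xs = [i / 10 for i in range(-30, 31)]
# Hypothetical truth with curvature that a straight-line model leaves out.
ys = [1.0 + 0.5 * x + 0.8 * x**2 + rng.gauss(0.0, 0.3) for x in xs]

b0, b1 = ols_fit(xs, ys)
fitted = [b0 + b1 * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

# ~0 by construction: OLS residuals have no *linear* association with fitted values.
print("corr(residual, fitted):", round(pearson(residuals, fitted), 6))
# Clearly non-zero: the residuals follow a U-shape, i.e. a curve in the plot.
print("corr(residual, x^2):   ", round(pearson(residuals, [x * x for x in xs]), 3))
```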
great work mate!!!
So is the true line Y, and the estimated line Y-hat?
Thanks! well explained
SO useful Thanks
Thank you Sir!
I have a question. I'm doing an OLS analysis on trade policy and I have a problem: when it comes to including the disturbance term, I get confused. I know it has a mean of 0 and refers to things outside our measurement abilities that can affect the dependent variable, but do I just include it without giving it any value? Do I just write an epsilon at the end and that's all? I hope you understand what I mean, because it seems there must be a numerical value or something.
@PhilChanstats
7 years ago
Simple answer: "yes".
@mm22sapphire50
7 years ago
OK, I guess you're sure of that, right?
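To make the answer concrete, it can help to write the population model and the estimated equation side by side. A minimal sketch in generic notation (β for the unknown parameters, hats for estimates, ε for the disturbance, which some textbooks call u; the symbols are illustrative, not taken from the video):

```latex
\begin{aligned}
\text{Population model:}\quad & Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \\
\text{Estimated line:}\quad   & \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i \\
\text{Residual:}\quad         & \hat{\varepsilon}_i = Y_i - \hat{Y}_i
\end{aligned}
```

The ε in the model is written symbolically and never gets a numerical value; what you report from the fit are the estimates, and, if needed, the residuals, which are computed numbers.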
amazing video thanks
So we don't know the true line. Is that because the true line is the real relationship between Y and X and we can only estimate the relationship? I was a bit confused about that part - just need clarification.
@DMaTTh32
9 years ago
anarki777 Yes, that's what he's saying.
great thank you very much !! :)
very useful sir
What software did you use to make this video? It is excellent!
@PhilChanstats
8 years ago
+Oscar Wilde PowerPoint. Took ages.
great video
Thanks a lot .
nice video. thanks
What can I do to avoid the error (disturbance) or reduce it?
@DMaTTh32
9 years ago
Samar A.Taha One of the assumptions in statistics is that the expected value of the error term is zero. So in theory the errors average out (to 0).
lovely!
Wow, this is awesome, this video solved all my problems. But I learned that errors are not associated with the estimated regression line. Can you tell me what that means?
So can we say that u is the equivalent of û (or the other way round)? In my college, u is defined as "any other factor that influences y except for x", which always sounds so abstract. I would like to see u on a graph or with numbers, but when I watch your video it seems to be like û: the difference between the graph and the observations. Am I right?
@PhilChanstats
4 years ago
The error term is not the same as the residual. The error term is not observable (it cannot be computed); the residual can be computed. Yes, you can view the error term as containing all other relevant X variables + noise. This noise can come from different sources depending on where your data comes from. It could be due in part to measurement errors, or just natural randomness.
@PhilChanstats
4 years ago
But quite often in texts and lectures I see residual used in place of error. So long as you understand the meaning that's ok.
@masaru444
4 years ago
Phil Chan, thank you for the answer :) I didn't say that the error term and the residual are the same. They just seem similar in meaning: one is the difference between an observation and the true line, and the other is the difference between an observation and the estimated line. Now I have my confirmation :)
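Phil's description of the error term as "all other relevant X variables + noise" can be illustrated with a simulation. Assumed setup (all coefficients hypothetical): Y truly depends on x and on a second variable z, but we regress Y on x only, so z ends up inside the error term. If z is unrelated to x, the slope on x is still estimated without bias. If z moves with x, the error term is correlated with x, the zero conditional mean assumption E(u|X) = 0 fails, and the slope is biased; this omitted-variable case is one textbook source of endogeneity.

```python
import random


def ols_slope(xs, ys):
    """Slope from a simple regression of ys on xs (with intercept)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def simulate(z_on_x, n=5000, seed=1):
    """Y = 1 + 2x + 3z + noise, but we regress Y on x only, so u = 3z + noise.

    z_on_x controls how strongly the omitted variable z moves with x
    (all coefficients here are hypothetical).
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    zs = [z_on_x * x + rng.gauss(0.0, 1.0) for x in xs]
    ys = [1.0 + 2.0 * x + 3.0 * z + rng.gauss(0.0, 1.0) for x, z in zip(xs, zs)]
    return ols_slope(xs, ys)


slope_uncorrelated = simulate(z_on_x=0.0)  # z in the error, but unrelated to x
slope_correlated = simulate(z_on_x=0.7)    # z in the error AND correlated with x

print("true slope on x:               2.0")
print("estimate, z unrelated to x:   ", round(slope_uncorrelated, 2))
print("estimate, z correlated with x:", round(slope_correlated, 2))
```

In the second case the estimate drifts to roughly 2 + 3 × 0.7 = 4.1, because the part of z that moves with x is wrongly credited to x.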
Difference between Error and Residual?
But the true line does not exist. I mean, it's never a perfectly linear relationship in reality, so what does the true line really mean? And if the true line does not exist, what does the error term represent? I still don't get it :)
8 years ago
Estimated regression line: the best-fit line we can compute from the points we have in our sample. Our sample is limited, so we do not know all the points in the whole population. That means our estimated line (most probably) does not represent the whole population; said differently, it is not the TRUE regression line. We would get the TRUE LINE only if we calculated it from all the points in the population, which is not possible in almost all cases.
If you understand the difference between the ESTIMATED regression line and the TRUE line, it is now very easy. RESIDUALS = distances between the OBSERVED points and the ESTIMATED line. ERRORS = distances between the OBSERVED points and the TRUE line (which is unknown). Note that the ERROR is a theoretical, abstract, unknown value: we cannot calculate it, because we do not have all the points of the population, only the points from our sample.
What's the relation? Why is it made so complicated in theory? The answer is really simple: we expect the RESIDUALS (known values) to "APPROXIMATE" the ERRORS (unknown values).
@RPDBY
8 years ago
Štefan Šimík Okay, thank you for the effort. I think I get it better now.
@tqri9795
7 years ago
Thanks for your further explanation!
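The claim that residuals "approximate" the errors can be checked numerically. In the sketch below (hypothetical true line Y = 2 + 0.5x, standard normal errors, x drawn uniformly on [0, 10]), the average gap between each residual and its corresponding error shrinks as the sample grows, because the estimated line gets closer to the true line.

```python
import random


def ols_fit(xs, ys):
    """Simple-regression OLS: returns (intercept, slope)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def mean_abs_gap(n, seed=0):
    """Average |residual - error| for one simulated sample of size n.

    Hypothetical true line Y = 2 + 0.5x with N(0, 1) errors and x ~ Uniform(0, 10).
    """
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    errors = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [2.0 + 0.5 * x + u for x, u in zip(xs, errors)]
    b0, b1 = ols_fit(xs, ys)
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    return sum(abs(e - u) for e, u in zip(residuals, errors)) / n


for n in (10, 100, 1_000, 10_000):
    print(f"n={n:>6}: mean |residual - error| = {mean_abs_gap(n):.4f}")
```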
@rafiullahkhan4622
6 years ago
The true line shows the exact relationship between the variables. That exact relationship is known only to ALLAH and is beyond the scope of human knowledge. One of the factors is the use of sample data instead of population data. There are many other reasons why we can't construct the true line.
@vasilis_fr
6 years ago
Yeah, totally agree with Stefan, and I would like to add one last thing. The Gauss-Markov (GM) assumptions (needed for the OLS estimator to be BLUE) concern the disturbances, and because we cannot observe the disturbances, they are called assumptions. Finally, residuals don't necessarily have to be normally distributed, though in many cases it is convenient.
Let $e_1, e_2, \ldots, e_n$ be the residual values for the simple linear regression model $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ for $i = 1, 2, \ldots, n$. Using the above model equation, explain why residuals can be used to estimate the unobserved values of the errors $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$.
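One way to answer this is to substitute the model equation into the definition of the residual. Writing $\hat{\beta}_0, \hat{\beta}_1$ for the OLS estimates of the intercept and slope (this notation is assumed here, since the question does not name the estimates):

```latex
\begin{aligned}
e_i &= Y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i) \\
    &= (\beta_0 + \beta_1 x_i + \varepsilon_i) - (\hat{\beta}_0 + \hat{\beta}_1 x_i) \\
    &= \varepsilon_i + (\beta_0 - \hat{\beta}_0) + (\beta_1 - \hat{\beta}_1)\, x_i
\end{aligned}
```

So each residual equals the corresponding error plus a term that depends only on how far the estimated intercept and slope are from the true ones. Under the usual assumptions the OLS estimates are unbiased and become more precise as n grows, so that extra term tends to be small, and $e_i$ serves as an observable stand-in for $\varepsilon_i$.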
thx
Phil, can I have your email to ask you a question I have? Thanks.
@PhilChanstats
8 years ago
+Michael Richardson Michael, you can try posting your question on YouTube.
@michaelrichardson2693
8 years ago
Could you give me a real world example of a theoretical model that is endogenous in OLS? I am studying for my undergraduate degree and am struggling with the concept of endogeneity and exogeneity
I read Wikipedia and it says the exact opposite!!! What is going on?? "A statistical error (or disturbance) is the amount by which an observation differs from its expected value, the latter being based on the whole population from which the statistical unit was chosen randomly. For example, if the mean height in a population of 21-year-old men is 1.75 meters, and one randomly chosen man is 1.80 meters tall, then the 'error' is 0.05 meters; if the randomly chosen man is 1.70 meters tall, then the 'error' is −0.05 meters. The expected value, being the mean of the entire population, is typically unobservable, and hence the statistical error cannot be observed either. A residual (or fitting deviation), on the other hand, is an observable estimate of the unobservable statistical error. Consider the previous example with men's heights and suppose we have a random sample of n people. The sample mean could serve as a good estimator of the population mean. Then we have: the difference between the height of each man in the sample and the unobservable population mean is a statistical error, whereas the difference between the height of each man in the sample and the observable sample mean is a residual." www.wikiwand.com/en/Errors_and_residuals
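The quoted passage is consistent with the video rather than opposite to it: the population mean plays the role of the true line, and the sample mean plays the role of the estimated line. A minimal sketch using the 1.75 m population mean from the quote; the population spread (0.07 m) and the sample size of 10 are made-up values for illustration.

```python
import random

POP_MEAN = 1.75  # from the Wikipedia example; in practice unknown
POP_SD = 0.07    # hypothetical spread, chosen only for this sketch

# The single-man example from the quote: 1.80 m tall -> error of 0.05 m.
print("error for a 1.80 m man:", round(1.80 - POP_MEAN, 2))

rng = random.Random(7)
heights = [rng.gauss(POP_MEAN, POP_SD) for _ in range(10)]
sample_mean = sum(heights) / len(heights)

errors = [h - POP_MEAN for h in heights]         # needs the unobservable population mean
residuals = [h - sample_mean for h in heights]   # needs only the sample

print(f"sample mean: {sample_mean:.4f} m")
for h, err, res in zip(heights, errors, residuals):
    print(f"height={h:.3f}  error={err:+.4f}  residual={res:+.4f}")
print("sum of residuals:", round(sum(residuals), 10))  # ~0 by construction
```

As in the regression case, the residuals sum to zero (up to rounding) because they are measured from the sample's own mean, while the errors need not.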
The key question is not explained entirely: what is the difference between "True" and "Estimated" line?
@PhilChanstats
2 years ago
The true line has the parameter values (in the example it's the intercept term and slope parameter). These values are not known, so we can't draw the "true" line. Using the data we can get estimates of the parameters. Chances are the estimates are close to but not equal to the true values.
@damirb6294
2 years ago
Thanks a lot! It is completely clear now.
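The point that the estimates are "close to but not equal to" the true values can be seen by drawing several samples from the same hypothetical population line (intercept 2, slope 0.5, 30 points per sample, all invented for this sketch) and fitting each one: every sample gives a slightly different estimated line, while the true line never changes.

```python
import random


def ols_fit(xs, ys):
    """Simple-regression OLS: returns (intercept, slope)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical true parameters; with real data we never get to see these.
TRUE_B0, TRUE_B1 = 2.0, 0.5
rng = random.Random(123)

estimates = []
for sample in range(5):
    xs = [rng.uniform(0.0, 10.0) for _ in range(30)]
    ys = [TRUE_B0 + TRUE_B1 * x + rng.gauss(0.0, 1.0) for x in xs]
    b0, b1 = ols_fit(xs, ys)  # a new estimated line for each sample
    estimates.append((b0, b1))
    print(f"sample {sample + 1}: intercept={b0:.3f}, slope={b1:.3f}")

print(f"true line: intercept={TRUE_B0}, slope={TRUE_B1}")
```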
Putting this video on 1.5x speed is a lifesaver; you talk soooo unnecessarily slowly, like you're detailing a murder case.
The explanation is very slow; however, the video was helpful.