R demo | Robust Regression (don't depend on influential data)

Linear regression can be very sensitive to unusual data, such as outliers, high-leverage observations, or a combination of both. Robust regression is supposed to provide a solution for that. So, let's build both an ordinary and a robust regression, compare them to find out whether outliers are a serious problem, and see whether the robust model performs better than the usual linear model.
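
A minimal sketch of that comparison, not the code from the video: it simulates data with a single influential observation and fits an ordinary and a robust model side by side. The simulated data, the variable names and the choice of `MASS::rlm()` with `method = "MM"` are assumptions made for illustration (a comment below mentions `robustbase::lmrob()`, which is a comparable MM-type estimator).

```r
# Illustrative sketch on simulated data (assumed, not taken from the video)
library(MASS)  # recommended package shipped with R; provides rlm()

set.seed(1)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20, sd = 0.5)  # true slope is 0.5
y[20] <- -40                            # one high-leverage outlier

ols    <- lm(y ~ x)                  # ordinary least squares
robust <- rlm(y ~ x, method = "MM")  # robust MM-type fit

# Compare intercepts and slopes of both models
round(rbind(ols = coef(ols), robust = coef(robust)), 2)
```

With data like these, the ordinary least squares slope can flip its sign because of one point, while the robust slope stays close to the value used in the simulation.
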
If you only want the code (or want to support me), consider joining the channel (join button below any of the videos), because I provide the code upon members' requests.
Enjoy! 🥳
Welcome to my VLOG! My name is Yury Zablotski & I love to use R for Data Science = "yuzaR Data Science" ;)

Comments: 40

  • @MrNummularius
    10 months ago

    I've been away from statistical programming for months. Now I'm back and can't express how grateful I am to be checking this channel again. One of the biggest sources of knowledge for anyone learning R. Thank you for everything.

  • @yuzaR-Data-Science
    10 months ago

    Welcome back! Thanks again for the nice feedback and for watching!

  • @jamiyana4969
    1 year ago

    Literally the best explanation out there! Amazing explanation in the simplest way and great video quality as well. A whole semester of lectures at uni only makes sense now. You're really saving our student lives.

  • @yuzaR-Data-Science
    1 year ago

    I am so glad to hear that! :) Thank you for such warm feedback and for watching! Feel free to provide feedback any time for any video, even (or especially) negative feedback. This way I can make my content even better ;)

  • @eyadha1
    1 year ago

    Very helpful. Thank you very much.

  • @yuzaR-Data-Science
    1 year ago

    Glad it helped! Thanks for the feedback!

  • @knutjagersberg381
    1 year ago

    You keep creating the very best #rstats content.

  • @yuzaR-Data-Science
    1 year ago

    Thanks a lot for the feedback, Knut! :) Glad you like it!

  • @JP-ee5iv
    1 year ago

    very clear and informative

  • @yuzaR-Data-Science
    1 year ago

    Glad you liked it! Thanks for watching!

  • @muhammedhadedy4570
    4 months ago

    Excellent and brilliant, as usual. You are a true legend, my dear professor. Just one question. Can I use this robust approach in logistic and Cox regression as well? Thanks in advance.

  • @yuzaR-Data-Science
    4 months ago

    Thanks for the feedback! I don't see any need to use robust approaches there. Logistic regression uses a 0/1 response and Cox calculates median survival. You need to check the data for outliers though.

  • @biologicalstatistics3320
    1 year ago

    These are great packages. It would be good if we could get an idea of how to perform these procedures manually, without the use of extra packages.

  • @yuzaR-Data-Science
    1 year ago

    Thanks! What do you mean by "manually"? I actually like packages exactly because they do the work which we don't have to do manually any more. Doing it by hand is surely good for a deeper understanding of the processes, but I think of it as not reinventing the wheel.

  • @hemantjoshi5034
    1 year ago

    Thank you! Could you post another tutorial on how to interpret the output of robust regression? Thanks again!

  • @yuzaR-Data-Science
    1 year ago

    Hey, generally you can interpret the results similarly to OLS, using averages per group or average slopes. The only difference is that those averages are robust to outliers and some other contamination. Thus, I don't think a whole video is necessary there, unless I go deep into the math of how the weights are calculated (which does not seem too important for the majority of people). Thanks for watching!

  • @WilForDataScience
    2 months ago

    Hi Mr. Zablotski, it's me again. Great video!!! This is a kind of regression/ML that I didn't even know could exist. Thanks to this I am going to investigate more and study it in depth. In fact I am kind of "stealing" your code to apply it in my own R project 😀 PS: While studying with this video I noticed that at 01:41 observation #5 has the highest weight but not the smallest residual. Can you please explain it so I can get a better understanding? Or maybe it is just me who didn't get the topic right: I am open to any corrections. PS2: Would you consider creating a deeper video regarding influential observations (i.e. outliers and leverage points) and how to treat/handle them? PS3: Would you consider a PCA in R video? That's it! Thank you very much sir, greetings from Colombia. 🤜🤛

  • @yuzaR-Data-Science
    2 months ago

    Thanks again mate for your continuous support! And I am actually glad you steal my R code, that's why I create content, hoping that it will be useful for more people than just me! :) PS: if you go to 3:31 you'll see that #5 lies directly on the robust regression line, similar to #1, so their weights are very high, with the weight of #5 being the highest. PS2: I already did some older videos on exploratory data analysis, for instance on the dlookr package and on Deep Exploratory Data Analysis. They might help. PS3: I'll put a PCA on the list! It's definitely one of the vids I wanna do. If I only had more time :) Cheers from Munich, bro!
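
The point about weights can be sketched in code. This is an illustration on simulated data, not the example from the video: the data, the names and the use of `MASS::rlm()` are assumptions. It tabulates, for every observation, the OLS residual, the residual from the robust fit and the robustness weight (the `w` component of the fitted `rlm` object), so one can see that a weight reflects the distance from the robust line rather than from the OLS line.

```r
# Illustrative sketch on simulated data (assumed, not the video's data set)
library(MASS)  # provides rlm()

set.seed(1)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20, sd = 0.5)
y[20] <- -40  # single influential observation

ols    <- lm(y ~ x)
robust <- rlm(y ~ x, method = "MM")

diagnostics <- data.frame(
  obs        = seq_along(y),
  ols_resid  = round(residuals(ols), 2),     # distance from the OLS line
  rob_resid  = round(residuals(robust), 2),  # distance from the robust line
  rob_weight = round(robust$w, 2)            # weights from the robust fit
)

# Highest weights first: these rows sit closest to the robust line
diagnostics[order(-diagnostics$rob_weight), ]
```

An observation can therefore have a sizeable OLS residual and still receive a high weight, as long as it lies close to the robust line.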

  • @kwizeralambert1316
    1 year ago

    Amazing

  • @yuzaR-Data-Science
    1 year ago

    Thanks 🙏 glad you enjoyed

  • @kwizeralambert1316
    1 year ago

    @yuzaR-Data-Science Greatly! I think it would be a good idea to use practical examples, perhaps using data from the World Bank, IMF, WTO or WHO, to demonstrate robust regression analysis. I did study regression analysis, but was not introduced to the issue of outliers and how to address them. What amazes me is that I constantly see robust regression analysis in the scientific papers I read.

  • @yuzaR-Data-Science
    1 year ago

    You are right, a practical example would be better! I did use examples from real data in other videos before. I will work on that. Well, in my opinion, robust regression is not used enough in papers in veterinary medicine (that's my field); instead, people use ordinary least squares without checking assumptions... that's horrible. So, I really needed an example where the result becomes the opposite due to a single outlier. Cheers

  • @chacmool2581
    1 year ago

    How is 'robustbase::lmrob()' different from 'ltsreg()' in base R? I note that both are iterative, potentially computationally expensive and non-convergent and binary in their weighting of observations.

  • @yuzaR-Data-Science
    1 year ago

    Wow, that's deep. I honestly don't know the details. I just know that they come from the same package, and that lmrob weights the values, while (I think) ltsreg just trims the extremes. My knockout criterion for ltsreg was that I couldn't easily plot the results or get a nice gt-like table of them, while the lmrob output can be used further.

  • @shaunaheron4448
    1 year ago

    This was excellent--one question: can you recommend an R package for robust analyses of multilevel binomial models (glmer)? I had hoped that robustlmm could do the job, but unfortunately it only applies to gaussian distributions :( Thanks so much for all of your excellent videos!

  • @yuzaR-Data-Science
    1 year ago

    Thanks Shauna! :) I actually can't recommend any package, because to my knowledge none exists (or I didn't look carefully enough). I asked myself why, and the reason might be that logistic regression (also the mixed one) does not have many assumptions. Here is a resource on that from people other than me: www.statisticssolutions.com/free-resources/directory-of-statistical-analyses/assumptions-of-logistic-regression/ . So, if you have a binomial or even ordinal response, there won't be any outliers in your response. You could check the assumptions with the check_model() function from the performance package or with some other influence-diagnostics tools, but if you wanna remove outliers from the data, you can just look at the raw data, remove the most unusual observations, and then run glmer(..., family = binomial). robustlmm is amazing though! I use it in my work often. Another idea might be to use bootstrapped regressions (like 1000 of them) and then get the distribution of the parameters... it's kind of non-parametric... or in other words, kind of robust. Hope that helps! Cheers
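
The bootstrap idea can be sketched roughly like this. Everything here is an assumption for illustration: the simulated data, the variable names and the plain `glm()` standing in for `glmer()`. For a multilevel model one would typically resample whole groups rather than single rows, which this sketch does not do.

```r
# Illustrative sketch: simulated binary outcome, base R only
set.seed(42)
n <- 200
d <- data.frame(x = rnorm(n))
d$y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1 * d$x))

# Logistic regression on the original data
fit <- glm(y ~ x, family = binomial, data = d)

# Refit the model on 1000 data sets resampled with replacement (by row)
boot_slopes <- replicate(1000, {
  idx <- sample(nrow(d), replace = TRUE)
  coef(glm(y ~ x, family = binomial, data = d[idx, ]))[["x"]]
})

# Percentile summary of the bootstrap distribution of the slope
ci <- quantile(boot_slopes, c(0.025, 0.5, 0.975))
ci
```

The spread of `boot_slopes` describes the uncertainty of the slope without relying on the usual standard-error formula.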

  • @shaunaheron4448
    1 year ago

    @yuzaR-Data-Science Thank you so much! This was such a helpful reply. Exactly as you state, a binomial outcome shouldn't really have outliers, so I'm thinking they're due to the groupings? check_outliers is telling me I have some, but I reallllly don't want to remove them as I don't want to lose all of that information. Everything else in check_model looks great. When I run a similar glmer model, but with aggregated scores (now a count outcome), I don't have outliers. SO, maybe it is due to my item-level random effect... The wheels are turning, will go see what I can figure out! As to bootstrapping: yes, I think I may go that route. I just have SOOOO little patience waiting for it to run :)) Thanks again for your help! Oh, PS: if you could do a tutorial sometime on working with Bayesian models (brms) for complex mixed models, that would be great!

  • @yuzaR-Data-Science
    1 year ago

    You are welcome! There will always be some contamination in the data. Sometimes it is the most interesting data. In your case, it seems to be OK to have some semi-outliers. brms is great and I will definitely dive into it. But I would first cover some more popular frequentist techniques and tidymodels before going Bayesian. By the way, I let my laptop calculate bootstrapping and Bayesian models overnight ;) Cheers

  • @fishfish20
    1 year ago

    As always, precise and impactful. I was thinking, can you do something on non-parametric methods of ordination and ANOVA (PERMANOVA, ANOSIM and the Mantel test) for ecological/biological studies? Thank you.

  • @yuzaR-Data-Science
    1 year ago

    Great suggestion, Jonathan! And thanks for the feedback, I appreciate it! I'll definitely put it on the list. It might take some time though, because I plan to do other non-parametric or semi-parametric techniques first, like bootstrapped regression or median-based regression, but generally, I am very interested in non-parametric methods!

  • @fishfish20
    1 year ago

    @yuzaR-Data-Science Noted sir. Looking forward to more videos from you. Thank you

  • @yuzaR-Data-Science
    1 year ago

    @fishfish20 more to come ;) cheers

  • @oluwafemioyedele
    1 year ago

    Thank you, it was very useful. Do you have a blog post on this?

  • @yuzaR-Data-Science
    1 year ago

    I usually do have a blog post for every video, but I recently ran into a problem with my blog, so it might take some time to resolve. In the meantime you can find all the other articles with code here: yuzar-blog.netlify.app/

  • @oluwafemioyedele
    1 year ago

    @yuzaR-Data-Science Thank you!!!

  • @yuzaR-Data-Science
    1 year ago

    Hey mate, I fixed the blog problem. It was a big file which could not be pushed to GitHub. So, I needed to reset my local blog from the internet, remove the big files, and then commit and push again. Anyway, here we go: yuzar-blog.netlify.app/posts/2022-09-02-robustregression/

  • @abdulmusa6162
    1 year ago

    Thanks so much sir, this is very helpful. Sir, I am working with survey data. Would you recommend an R package to analyze multiple response variables?

  • @yuzaR-Data-Science
    1 year ago

    Thanks, Abdul! If your analysis is not too complex, maybe a MANOVA in R would work. I have an old blog post of mine with some R code: yury-zablotski.netlify.app/post/manova-analyse-multiple-responces/ . Then, I would also do nested (grouped) models, where you stack the names of the responses below each other in the first column, the response values below each other in the next column, and then blocks of the same predictors below each other. Then you nest by the response names (not values) and use the "map()" function to fit many models. Hmm, I could do a video about it, if you think it's useful?
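
A rough base R sketch of the stacked-data idea. The `mtcars` data set, the chosen responses and the single predictor `wt` are stand-ins, and `split()` plus `lapply()` are used here instead of the `nest()` and `map()` workflow mentioned above, so that the example runs without extra packages.

```r
# Illustrative sketch: three responses from the built-in mtcars data (assumed example)
responses <- c("mpg", "hp", "qsec")

# Long format: response names stacked in one column, response values in the
# next, and the same predictor block repeated under each response
long <- data.frame(
  response = rep(responses, each = nrow(mtcars)),
  value    = unlist(mtcars[responses], use.names = FALSE),
  wt       = rep(mtcars$wt, times = length(responses))
)

# One linear model per response name
models <- lapply(split(long, long$response), function(d) lm(value ~ wt, data = d))

# Slope of wt from every model
sapply(models, function(m) coef(m)[["wt"]])
```

Each element of `models` is an ordinary `lm` object, so the usual summaries and diagnostics can be applied per response.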

  • @abdulmusa6162
    1 year ago

    Thanks a million, sir. I will be glad to see a new video on multiple response variables in R. Once again, I massively appreciate your efforts, sir.

  • @yuzaR-Data-Science
    1 year ago

    @abdulmusa6162 you are very welcome! 🙏 cheers