4 Reasons Non-Parametric Bootstrapped Regression (via tidymodels) is Better than Ordinary Regression

If the assumptions of parametric models can be satisfied, parametric models are the way to go. However, there are often many assumptions, and satisfying them all is rarely possible. Data transformation and non-parametric methods are two ways around this. In this post we'll learn about Non-Parametric Bootstrapped Regression as an alternative to Ordinary Linear Regression for cases where the assumptions are violated.
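
A minimal sketch of the approach, assuming the usual {tidymodels} bootstrap workflow (resample the rows with replacement, refit the model on every resample, summarise the distribution of coefficients); the mtcars data and the mpg ~ wt model are placeholders for illustration only, not the data used in the video.

    library(tidymodels)   # loads rsample, broom, purrr, dplyr, tidyr, ...

    set.seed(1)

    boot_fits <- bootstraps(mtcars, times = 1000) %>%               # resample rows with replacement
      mutate(
        model = map(splits, ~ lm(mpg ~ wt, data = analysis(.x))),   # refit lm() on every resample
        coefs = map(model, tidy)                                     # tidy coefficients of each fit
      )

    # summarise the bootstrapped slope: median and a 95% percentile interval
    boot_fits %>%
      unnest(coefs) %>%
      filter(term == "wt") %>%
      summarise(
        median_slope = median(estimate),
        ci_low       = quantile(estimate, 0.025),
        ci_high      = quantile(estimate, 0.975)
      )
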
If you only want the code (or want to support me), consider joining the channel (join button below any of the videos), because I provide the code upon members' requests.
Enjoy! 🥳
Welcome to my VLOG! My name is Yury Zablotski & I love to use R for Data Science = "yuzaR Data Science" ;)
This channel is dedicated to data analytics, data science, statistics, machine learning and computational science! Join me as I dive into the world of data analysis, programming & coding. Whether you're interested in business analytics, data mining, data visualization, or pursuing an online degree in data analytics, I've got you covered. If you are curious about Google Data Studio, data centers & certified data analyst & data scientist programs, you'll find the necessary knowledge right here. You'll greatly increase your odds of earning an online master's degree in data science & data analytics. Boost your knowledge & skills in data science and analytics with my engaging content. Subscribe to stay up-to-date with the latest & most useful data science programming tools. Let's embark on this data-driven journey together!

Comments: 51

  • @utubeleo5037 · a year ago

    This was a great watch. It was really well put together, with a good mix of visuals, code and narrative. Thank you for putting it together and sharing!

  • @yuzaR-Data-Science · a year ago

    Glad you enjoyed it, Leo! Thanks for your feedback!

  • @SergioUribe · a year ago

    Thanks for sharing! I will start using this model.

  • @yuzaR-Data-Science · a year ago

    👍 you can use any type of model with bootstrap 😉

  • @oousmane · a year ago

    Always excellent ❤️

  • @yuzaR-Data-Science · a year ago

    Thank you very much! 🙏

  • @jeffbenshetler · a month ago

    Excellent demonstration in R.

  • @yuzaR-Data-Science · a month ago

    thanks a lot Jeff, glad you enjoyed it! :)

  • @eyadha1 · a year ago

    great video. Thank you

  • @yuzaR-Data-Science · a year ago

    Thanks 🙏 glad you enjoyed it

  • @CCL-ew7pl · 11 months ago

    Great video, thanks Yury ( Munchausen cartoon was an unexpected special treat :))

  • @yuzaR-Data-Science · 11 months ago

    😂 I wasn't sure anyone would recognise Baron Münchausen 😁 Glad you enjoyed it!

  • @heshamkrr669 · a year ago

    WORKING thx bro

  • @yuzaR-Data-Science · a year ago

    Cool 😎

  • @ambhat3953 · a year ago

    Thanks for this... I think now I have a direction for tackling the data set at work which is not normally distributed.

  • @yuzaR-Data-Science · a year ago

    You are welcome 🙏 If a non-normal distribution is your only problem, look into non-parametric statistical tests, like Mann-Whitney or Kruskal-Wallis (quick examples below).
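
    For context, base-R versions of those tests on built-in example data (purely illustrative, not from the video):

      # Mann-Whitney (Wilcoxon rank-sum) test: compare two independent groups
      wilcox.test(mpg ~ am, data = mtcars)

      # Kruskal-Wallis test: compare three or more independent groups
      kruskal.test(weight ~ group, data = PlantGrowth)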

  • @ambhat3953 · a year ago

    @@yuzaR-Data-Science Will do, thanks!

  • @alelust7170 · a year ago

    Nice, Tks!

  • @yuzaR-Data-Science · a year ago

    Any time!

  • @zane.walker · a year ago

    I recently discovered bootstrapped prediction intervals working with mixed-effects models and was quite impressed (thank goodness for modern computing power!). You present a persuasive argument for always using bootstrapped regression when any of the linear regression assumptions are violated. Are there any situations where you would use alternative methods, such as log transforms of the data, or weighted regression, to deal with issues such as heteroscedasticity rather than bootstrapping?

  • @yuzaR-Data-Science · a year ago

    Surely there are different methods to solve problems! Many roads lead to Rome ;). Bootstrapping is one of them, if you have lots of data (> 10,000, the more the better) and you can't fix the assumptions no matter what you do. Besides, it's personal preference. Non-normality from the Shapiro test is always there when you have lots of data, even if the residuals look perfectly normal. I personally don't like to log-transform data if the data itself is interpretable, like the weight of animals. I would never use log-weight. But I would use log-virus-load, because the spread is huge and the log shows the trend, while you would not see anything without the log. Another thing is: I'd rather trust an averaged model from the distribution of coefficients than a single coefficient from a normal "lm". I would not use the bootstrap on small datasets. Finally, it's a question of context and how you can get the closest to the truth out there.

  • @chacmool2581 · a year ago

    What does this resemble? Random Forests, RF. Except that RF bootstraps/samples observations as well as bootstrapping predictors. Am I seeing this correctly? Of course, one loses interpretability with RF. Great stuff as always!

  • @yuzaR-Data-Science · a year ago

    Sure, you lose interpretability with RF. No coefficients. And coefficients are exactly what normal models give you. But they have assumptions. So we bootstrap/resample the data and fit 1000 models, which relaxes most assumptions, especially the distributional ones.

  • @johnsonahiamadzor7404 · 11 months ago

    Great work. How do I get this code for practice? I'm very new to R.

  • @yuzaR-Data-Science · 11 months ago

    In the description of the video there is a link to a blog post where you can get all the R code and the explanations. If you are very new to R, don't be discouraged if not everything is clear and working right away. Bootstrapping is a somewhat advanced topic. Thanks for watching!

  • @Maxwaener · a year ago

    Can you use this approach if you have a numeric predictor (change in percent) for a categorical outcome (2-4 levels)?

  • @yuzaR-Data-Science · a year ago

    Hey, sorry for the late reply, I was on holiday. Yes, you can! The model you apply is up to you; you just need to specify it in the "map" function. It will then be run over the bootstrapped data, so you can use any model. In your case it would be multinomial, I guess. But if it is only one predictor, I would turn it upside down and use a quasibinomial model of the percentage as an outcome with a categorical predictor. It's easier to interpret than a multinomial, in my opinion (see the sketch below). Cheers
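
    A hedged sketch of that suggestion: the data frame df, with a proportion column pct_change (values between 0 and 1) and a categorical column group, is hypothetical; only the model passed to map() changes relative to the workflow shown in the video.

      library(tidymodels)

      # quasibinomial model: proportion outcome, categorical predictor
      fit <- glm(pct_change ~ group, family = quasibinomial(), data = df)
      summary(fit)

      # the same model refitted on every bootstrap resample
      set.seed(1)
      boots <- bootstraps(df, times = 1000) %>%
        mutate(model = map(splits,
                           ~ glm(pct_change ~ group,
                                 family = quasibinomial(),
                                 data = analysis(.x))))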

  • @festusattah8612 · 3 months ago

    Thanks for this insightful video. However, I have one question. If I want to use this approach in a research paper, do you know of some papers I can cite to back up my choice of this model?

  • @yuzaR-Data-Science · 3 months ago

    In my opinion you only need some reasons to do that, for example that many assumptions are not met. I am sure there are papers, but I don't have any off the top of my head. But even if nobody has cited one yet, somebody should start. I certainly will, after I am done with my current paper on quantile regression.

  • @desaiha · a year ago

    How do you apply this technique to temporal data which has trend and/or seasonality?

  • @yuzaR-Data-Science · a year ago

    "strata" argument might help in the bootstrap funktion. ask R this: ?bootstraps. Or google of people who might have done something similar in tidymodels. I still didn't

  • @jonascruz6562 · 11 months ago

    Great video! Is there any way to conduct a bootstrap regression but using robust (Huber) regression instead of the conventional linear model, for data with many outliers?

  • @yuzaR-Data-Science · 11 months ago

    Sure, there is a way: just exchange the "lm" with the "lmrob" function from library(robustbase); a sketch is below. I actually did a video on robust regression. However, I don't think it will be necessary, because the bootstrapping will smooth out the influence of outliers. But if you still have too many, maybe they are not outliers; maybe the data has a weird distribution and you need some other type of model, like Poisson or similar. Thanks for your feedback and thank you for watching!
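
    A hedged sketch of that swap, with mtcars standing in for the commenter's data; whether a robust fit is needed on top of the bootstrap is, as the reply says, debatable.

      library(tidymodels)
      library(robustbase)   # for lmrob(), a robust alternative to lm()

      set.seed(1)
      boot_robust <- bootstraps(mtcars, times = 1000) %>%
        mutate(
          model = map(splits, ~ lmrob(mpg ~ wt, data = analysis(.x))),   # robust fit on every resample
          coefs = map(model, tidy)                                        # broom provides tidiers for lmrob
        )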

  • @jonascruz6562 · 11 months ago

    Thank you for the answer. I work with environmental contaminants, so I have a lot of outliers even after log-transforming the data. I am testing some new models. I just found the boot.pval package, which is a low-code package for bootstrap regression, including rlm. By the way, I love your low-code videos. Greetings from Brazil!

  • @yuzaR-Data-Science · 11 months ago

    Hey, Jonas, thanks for the recommendation. I'll check out the boot.pval package, because I model every day with real-world data and need robust options. Thanks also for the feedback and for watching!

  • @rolfjohansen5376 · a year ago

    How do I calculate a simple maximum likelihood for a simple non-parametric regression: y_i = b_i + e_i (number of data points = number of parameters)? Thanks

  • @yuzaR-Data-Science · a year ago

    Sorry, I can't really say that with certainty, because I've never needed that until today. But if you somehow figure this out, please let me know! Thanks for watching!

  • @ariancorrea2711 · a year ago

    Hi, how can I extract the r.squared for each model?

  • @yuzaR-Data-Science · a year ago

    hey, from the "glance" fucntion library(broom) # for tidy(), glance() & augment() functions nested_models % mutate(models = map(data, ~ lm(wage ~ age, data = .)), coefs = map(models, tidy, conf.int = TRUE), quality = map(models, glance), preds = map(models, augment)) I did a demo about it in a video on "many models"

  • @gonzalodequesada1981 · a year ago

    Is it possible to do a bootstrap for a non-parametric multiple regression model?

  • @yuzaR-Data-Science · a year ago

    That's a great question! :) The short answer is yes, but it's not necessary, because the method I describe is non-parametric by itself. But my scientific curiosity says: let's do it! What kind of non-parametric regression do you mean? Write a function, like "lm()", or try it out please and post it here so everyone in the community can benefit. Thanks!

  • @EdoardoMarcora · 3 months ago

    I don't understand how bootstrapping dispenses you from the distributional assumptions of the linear model (normality of residuals etc). What bootstrapping is doing is generating the sampling distribution free of its usual asymptotic assumptions, but the assumptions of the likelihood distribution are still there, right?

  • @yuzaR-Data-Science · 3 months ago

    It certainly can, I am 100% sure, but please don't believe some random YouTube video; there is a lot of trash out there (most likely some of my videos are partly incorrect too). Thus, please check it online or in a stats book yourself. For example, here is a reference from a stats book which might explain more, but even the first half-page will do it, I think: www.sagepub.com/sites/default/files/upm-binaries/21122_Chapter_21.pdf

  • @joaoalexissantibanezarment4766 · 22 days ago

    This is an excellent video!! I was thinking, a nonparametric alternative to linear regression could be LOESS regression, and the bootstrap could be applied without problem. But because LOESS is nonparametric, could the means be used properly instead of the medians, or should the medians still be used in this case?

  • @yuzaR-Data-Science · 21 days ago

    While resampling allows for a better use of means, I am a big fan of medians, because if the distribution of something after bootstrapping does not become normal, as in the case of p-values, I would trust the median but not the mean. So I would use the median as much as I can (a LOESS example is sketched below).
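
    A hedged sketch of bootstrapped LOESS summarised by medians; mtcars and the mpg ~ wt + hp smoother are placeholders for the commenter's actual data and model.

      library(tidymodels)

      set.seed(1)
      boot_loess <- bootstraps(mtcars, times = 200) %>%
        mutate(
          model = map(splits, ~ loess(mpg ~ wt + hp, data = analysis(.x))),   # LOESS fit per resample
          preds = map(model, ~ predict(.x, newdata = mtcars))                 # predict on the original rows
        )

      # median prediction per observation across all bootstrap fits
      pred_matrix <- do.call(rbind, boot_loess$preds)
      apply(pred_matrix, 2, median, na.rm = TRUE)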

  • @joaoalexissantibanezarment4766 · 21 days ago

    @@yuzaR-Data-Science OK, thank you very much for the answer!

  • @yuzaR-Data-Science · 21 days ago

    you are very welcome!

  • @joaoalexissantibanezarment4766 · 19 days ago

    @@yuzaR-Data-Science I had another question. Although bootstrapping is not exactly a tool for handling outliers, could it be the case that the more resamples are used, the more robust the model is to outliers?

  • @yuzaR-Data-Science · 18 days ago

    Yes, because then you would resample the most frequent cases more often, so their density would be higher, and the outliers ... hmm, we would not get rid of them, but they would be resampled very rarely. Hope that helps. Cheers