4 Reasons Non-Parametric Bootstrapped Regression (via tidymodels) is Better than Ordinary Regression

If the assumptions of parametric models can be satisfied, parametric models are the way to go. However, there are often many assumptions, and satisfying them all is rarely possible. Data transformation and non-parametric methods are two ways around this. In this post we'll learn about Non-Parametric Bootstrapped Regression as an alternative to Ordinary Linear Regression for cases where the assumptions are violated.
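
A minimal sketch of the approach, assuming the usual {tidymodels} bootstrap workflow (resample the rows with replacement, refit the model on every resample, summarise the distribution of coefficients); the mtcars data and the mpg ~ wt model are placeholders for illustration only, not the data used in the video.

    library(tidymodels)   # loads rsample, broom, purrr, dplyr, tidyr, ...

    set.seed(1)

    boot_fits <- bootstraps(mtcars, times = 1000) %>%               # resample rows with replacement
      mutate(
        model = map(splits, ~ lm(mpg ~ wt, data = analysis(.x))),   # refit lm() on every resample
        coefs = map(model, tidy)                                     # tidy coefficients of each fit
      )

    # summarise the bootstrapped slope: median and a 95% percentile interval
    boot_fits %>%
      unnest(coefs) %>%
      filter(term == "wt") %>%
      summarise(
        median_slope = median(estimate),
        ci_low       = quantile(estimate, 0.025),
        ci_high      = quantile(estimate, 0.975)
      )
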
If you only want the code (or want to support me), consider joining the channel (join button below any of the videos), because I provide the code upon members' requests.
Enjoy! 🥳
Welcome to my VLOG! My name is Yury Zablotski & I love to use R for Data Science = "yuzaR Data Science" ;)
This channel is dedicated to data analytics, data science, statistics, machine learning and computational science! Join me as I dive into the world of data analysis, programming & coding. Whether you're interested in business analytics, data mining, data visualization, or pursuing an online degree in data analytics, I've got you covered. If you are curious about Google Data Studio, data centers & certified data analyst & data scientist programs, you'll find the necessary knowledge right here. You'll greatly increase your odds of earning an online master's degree in data science & data analytics. Boost your knowledge & skills in data science and analytics with my engaging content. Subscribe to stay up-to-date with the latest & most useful data science programming tools. Let's embark on this data-driven journey together!

Comments: 51

  • @utubeleo5037 · a year ago

    This was a great watch. It was really well put together, with a good mix of visuals, code and narrative. Thank you for putting it together and sharing!

  • @yuzaR-Data-Science · a year ago

    Glad you enjoyed it, Leo! Thanks for your feedback!

  • @SergioUribe · a year ago

    Thanks for sharing! I will start using this model.

  • @yuzaR-Data-Science · a year ago

    👍 you can use any type of model with bootstrap 😉

  • @oousmane · a year ago

    Always excellent ❤️

  • @yuzaR-Data-Science · a year ago

    Thank you very much! 🙏

  • @jeffbenshetler · a month ago

    Excellent demonstration in R.

  • @yuzaR-Data-Science · a month ago

    thanks a lot Jeff, glad you enjoyed it! :)

  • @eyadha1 · a year ago

    great video. Thank you

  • @yuzaR-Data-Science · a year ago

    Thanks 🙏 glad you enjoyed it

  • @CCL-ew7pl · 11 months ago

    Great video, thanks Yury ( Munchausen cartoon was an unexpected special treat :))

  • @yuzaR-Data-Science · 11 months ago

    😂 I wasn't sure anyone would recognise Baron Münchausen 😁 Glad you enjoyed it!

  • @heshamkrr669 · a year ago

    WORKING thx bro

  • @yuzaR-Data-Science · a year ago

    Cool 😎

  • @ambhat3953 · a year ago

    Thanks for this... I think now I have a direction for tackling the data set at work which is not normally distributed.

  • @yuzaR-Data-Science · a year ago

    You are welcome 🙏 If a non-normal distribution is your only problem, look into non-parametric statistical tests, like Mann-Whitney or Kruskal-Wallis (quick examples below).
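
    For context, base-R versions of those tests on built-in example data (purely illustrative, not from the video):

      # Mann-Whitney (Wilcoxon rank-sum) test: compare two independent groups
      wilcox.test(mpg ~ am, data = mtcars)

      # Kruskal-Wallis test: compare three or more independent groups
      kruskal.test(weight ~ group, data = PlantGrowth)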

  • @ambhat3953 · a year ago

    @@yuzaR-Data-Science Will do, thanks!

  • @alelust7170 · a year ago

    Nice, Tks!

  • @yuzaR-Data-Science · a year ago

    Any time!

  • @zane.walker · a year ago

    I recently discovered bootstrapped prediction intervals working with mixed-effects models and was quite impressed (thank goodness for modern computing power!). You present a persuasive argument for always using bootstrapped regression when any of the linear regression assumptions are violated. Are there any situations where you would use alternative methods, such as log transforms of the data, or weighted regression, to deal with issues such as heteroscedasticity rather than bootstrapping?

  • @yuzaR-Data-Science · a year ago

    Surely there are different methods to solve problems! Many roads lead to Rome ;). Bootstrapping is one of them, if you have lots of data (> 10,000, the more the better) and you can't fix the assumptions no matter what you do. Besides, it's personal preference. Non-normality from the Shapiro test is always there when you have lots of data, even if the residuals look perfectly normal. I personally don't like to log-transform data if the data itself is interpretable, like the weight of animals. I would never use log-weight. But I would use log-virus-load, because the spread is huge and the log shows the trend, while you would not see anything without the log. Another thing is: I'd rather trust an averaged model from the distribution of coefficients than a single coefficient from a normal "lm". I would not use the bootstrap on small datasets. Finally, it's a question of context and how you can get the closest to the truth out there.

  • @chacmool2581 · a year ago

    What does this resemble? Random Forests, RF. Except that RF bootstraps/samples observations as well as bootstrapping predictors. Am I seeing this correctly? Of course, one loses interpretability with RF. Great stuff as always!

  • @yuzaR-Data-Science · a year ago

    Sure, you lose interpretability with RF. No coefficients. And coefficients are exactly what normal models give you. But they have assumptions. So we bootstrap/resample the data and fit 1000 models, which relaxes most assumptions, especially the distributional ones.

  • @johnsonahiamadzor7404 · 11 months ago

    Great work. How do I get this code for practice? I'm very new to R.

  • @yuzaR-Data-Science · 11 months ago

    In the description of the video there is a link to a blog post where you can get all the R code and the explanations. If you are very new to R, don't be discouraged if not everything is clear and working right away. Bootstrapping is a somewhat advanced topic. Thanks for watching!

  • @Maxwaener · a year ago

    Can you use this approach if you have a numeric predictor (change in percent) for a categorical outcome (2-4 levels)?

  • @yuzaR-Data-Science · a year ago

    Hey, sorry for the late reply, I was on holiday. Yes, you can! The model you apply is up to you; you just need to specify it in the "map" function. It will then be run over the bootstrapped data, so you can use any model. In your case it would be multinomial, I guess. But if it is only one predictor, I would turn it upside down and use a quasibinomial model of the percentage as an outcome with a categorical predictor. It's easier to interpret than a multinomial, in my opinion (see the sketch below). Cheers
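
    A hedged sketch of that suggestion: the data frame df, with a proportion column pct_change (values between 0 and 1) and a categorical column group, is hypothetical; only the model passed to map() changes relative to the workflow shown in the video.

      library(tidymodels)

      # quasibinomial model: proportion outcome, categorical predictor
      fit <- glm(pct_change ~ group, family = quasibinomial(), data = df)
      summary(fit)

      # the same model refitted on every bootstrap resample
      set.seed(1)
      boots <- bootstraps(df, times = 1000) %>%
        mutate(model = map(splits,
                           ~ glm(pct_change ~ group,
                                 family = quasibinomial(),
                                 data = analysis(.x))))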

  • @festusattah8612 · 3 months ago

    Thanks for this insightful video. However, I have one question. If I want to use this approach in a research paper, do you know of some papers I can cite to back up my choice of this model?

  • @yuzaR-Data-Science · 3 months ago

    In my opinion you only need some reasons to do that, for example that many assumptions are not met. I am sure there are papers, but I don't have any off the top of my head. But even if nobody has cited one yet, somebody should start. I certainly will, after I am done with my current paper on quantile regression.

  • @desaiha · a year ago

    How do you apply this technique to temporal data which has trend and/or seasonality?

  • @yuzaR-Data-Science · a year ago

    "strata" argument might help in the bootstrap funktion. ask R this: ?bootstraps. Or google of people who might have done something similar in tidymodels. I still didn't

  • @jonascruz6562 · 11 months ago

    Great video! Is there any way to conduct a bootstrap regression but using robust (Huber) regression instead of the conventional linear model, for data with many outliers?

  • @yuzaR-Data-Science · 11 months ago

    Sure, there is a way: just exchange the "lm" with the "lmrob" function from library(robustbase); a sketch is below. I actually did a video on robust regression. However, I don't think it will be necessary, because the bootstrapping will smooth out the influence of outliers. But if you still have too many, maybe they are not outliers; maybe the data has a weird distribution and you need some other type of model, like Poisson or similar. Thanks for your feedback and thank you for watching!
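
    A hedged sketch of that swap, with mtcars standing in for the commenter's data; whether a robust fit is needed on top of the bootstrap is, as the reply says, debatable.

      library(tidymodels)
      library(robustbase)   # for lmrob(), a robust alternative to lm()

      set.seed(1)
      boot_robust <- bootstraps(mtcars, times = 1000) %>%
        mutate(
          model = map(splits, ~ lmrob(mpg ~ wt, data = analysis(.x))),   # robust fit on every resample
          coefs = map(model, tidy)                                        # broom provides tidiers for lmrob
        )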

  • @jonascruz6562 · 11 months ago

    Thank you for the answer. I work with environmental contaminants, so I have a lot of outliers even after log-transforming the data. I am testing some new models. I just found the boot.pval package, which is a low-code package for bootstrap regression, including rlm. By the way, I love your low-code videos. Greetings from Brazil!

  • @yuzaR-Data-Science · 11 months ago

    Hey, Jonas, thanks for the recommendation. I'll check out the boot.pval package, because I model every day with real-world data and need robust options. Thanks also for the feedback and for watching!

  • @rolfjohansen5376 · a year ago

    How do I calculate a simple maximum likelihood for a simple non-parametric regression: y_i = b_i + e_i (number of data points = number of parameters)? Thanks

  • @yuzaR-Data-Science · a year ago

    Sorry, I can't really say that with certainty, because I've never needed that until today. But if you somehow figure this out, please let me know! Thanks for watching!

  • @ariancorrea2711 · a year ago

    Hi, how can I extract the r.squared for each model?

  • @yuzaR-Data-Science · a year ago

    hey, from the "glance" fucntion library(broom) # for tidy(), glance() & augment() functions nested_models % mutate(models = map(data, ~ lm(wage ~ age, data = .)), coefs = map(models, tidy, conf.int = TRUE), quality = map(models, glance), preds = map(models, augment)) I did a demo about it in a video on "many models"

  • @gonzalodequesada1981 · a year ago

    Is it possible to do a bootstrap for a non-parametric multiple regression model?

  • @yuzaR-Data-Science · a year ago

    That's a great question! :) The short answer is yes, but it's not necessary, because the method I describe is non-parametric by itself. But my scientific curiosity says: let's do it! What kind of non-parametric regression do you mean? Write a function, like "lm()", or try it out please and post it here so everyone in the community can benefit. Thanks!

  • @EdoardoMarcora · 3 months ago

    I don't understand how bootstrapping dispenses you from the distributional assumptions of the linear model (normality of residuals etc). What bootstrapping is doing is generating the sampling distribution free of its usual asymptotic assumptions, but the assumptions of the likelihood distribution are still there, right?

  • @yuzaR-Data-Science · 3 months ago

    It certainly can, I am 100% sure, but please don't believe some random YouTube video; there is a lot of trash out there (most likely some of my videos are partly incorrect too). Thus, please check it online or in a stats book yourself. For example, here is a reference from a stats book which might explain more, but even the first half-page will do it, I think: www.sagepub.com/sites/default/files/upm-binaries/21122_Chapter_21.pdf

  • @joaoalexissantibanezarment4766 · 22 days ago

    This is an excellent video!! I was thinking, a nonparametric alternative to linear regression could be LOESS regression, and the bootstrap could be applied without problem. But because LOESS is nonparametric, could the means be used properly instead of the medians, or should the medians still be used in this case?

  • @yuzaR-Data-Science · 21 days ago

    While resampling allows for a better use of means, I am a big fan of medians, because if the distribution of something after bootstrapping does not become normal, as in the case of p-values, I would trust the median but not the mean. So I would use the median as much as I can (a LOESS example is sketched below).
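
    A hedged sketch of bootstrapped LOESS summarised by medians; mtcars and the mpg ~ wt + hp smoother are placeholders for the commenter's actual data and model.

      library(tidymodels)

      set.seed(1)
      boot_loess <- bootstraps(mtcars, times = 200) %>%
        mutate(
          model = map(splits, ~ loess(mpg ~ wt + hp, data = analysis(.x))),   # LOESS fit per resample
          preds = map(model, ~ predict(.x, newdata = mtcars))                 # predict on the original rows
        )

      # median prediction per observation across all bootstrap fits
      pred_matrix <- do.call(rbind, boot_loess$preds)
      apply(pred_matrix, 2, median, na.rm = TRUE)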

  • @joaoalexissantibanezarment4766 · 21 days ago

    @@yuzaR-Data-Science OK, thank you very much for the answer!

  • @yuzaR-Data-Science · 21 days ago

    you are very welcome!

  • @joaoalexissantibanezarment4766 · 19 days ago

    @@yuzaR-Data-Science I had another question. Although bootstrapping is not exactly a tool for handling outliers, could it be the case that the more resamples are used, the more robust the model is to outliers?

  • @yuzaR-Data-Science · 18 days ago

    Yes, because then you would resample the most frequent cases more often, so their density would be higher, and the outliers ... hmm, we would not get rid of them, but they would be resampled very rarely. Hope that helps. Cheers