yuzaR Data Science
2 years ago

R demo | Kruskal-Wallis test + Post-Hoc | How to conduct, visualize, interpret & more 😉

In this video, we'll:
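# Install once, then load in every session
install.packages("ggstatsplot")
library(ggstatsplot)

# d: data frame with a grouping column `education` and a numeric column `wage`
ggbetweenstats(
  data = d,
  x    = education,
  y    = wage,
  type = "nonparametric"  # Kruskal-Wallis test with Dunn post-hoc comparisons
)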
If you only want the code (or want to support me), consider joining the channel (join button below any of the videos), because I provide the code upon members' request.
Enjoy! 🥳
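The snippet in the description assumes a data frame `d` that is not defined there. A minimal, self-contained sketch is shown below; the toy data, the group labels and the wage values are made up for illustration and are not the data from the video. The base R `kruskal.test()` call gives the omnibus test, and the plot is only drawn if ggstatsplot is installed.

```r
# Toy data standing in for `d` (hypothetical values, not the video's data).
set.seed(1)
d <- data.frame(
  education = factor(
    rep(c("school", "college", "graduate"), each = 20),
    levels = c("school", "college", "graduate")
  ),
  wage = c(rlnorm(20, 4.0), rlnorm(20, 4.3), rlnorm(20, 4.8))
)

# Omnibus Kruskal-Wallis test with base R.
kw <- kruskal.test(wage ~ education, data = d)
print(kw)

# The plot from the description; skipped when ggstatsplot is not installed.
if (requireNamespace("ggstatsplot", quietly = TRUE)) {
  p <- ggstatsplot::ggbetweenstats(
    data = d,
    x    = education,
    y    = wage,
    type = "nonparametric"
  )
  print(p)
}
```

With three groups the test has 2 degrees of freedom, which is a quick sanity check that the grouping column was read as intended.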

Comments: 74

@jwebbnature 11 months ago
Very clear and calm directions, thank you for making these videos to help us :)
@yuzaR-Data-Science
11 months ago
Glad you like them! Thank you for watching!
@jackx7382 2 years ago
Every part is so well explained!
@yuzaR-Data-Science
2 years ago
Thanks! I am glad you liked it!
@saygindiler3928 2 years ago
Perfect, your videos are amazing. Thanks
@yuzaR-Data-Science
2 years ago
Glad you like them!
@emredunder9108 2 years ago
Excellent!
@yuzaR-Data-Science
2 years ago
Glad you liked it!
@juniorsouza4826 11 months ago
Amazing!
@yuzaR-Data-Science
11 months ago
Thanks! If you liked this one, you might enjoy the gtsummary or emmeans package reviews. I found them so useful that I could not resist making videos on them. I use them every day.
@MrNummularius 1 year ago
Amazing
@yuzaR-Data-Science
1 year ago
Thank you! Cheers!
@ogollafredrickotieno 10 months ago
easily explained
@yuzaR-Data-Science
10 months ago
Thanks 🙏
@kydaviddoyle1969 2 years ago
Great video!! Could you explain a little more about how to read and interpret the Dunn test results, to tell which is significant? For example it looks like
@yuzaR-Data-Science
2 years ago
Only significant Dunn tests are displayed. That means - between 0.05) relationship. And only p-values from Dunn tests are displayed, which means p-values are the only thing you need to interpret. Please just google "how to interpret p-values" and "why do we need p-value correction for multiple comparisons". Hmm, eta-squared interpretation is actually part of the video, so I don't really know what is unclear. In RStudio, type the following: "?interpret_eta_squared()". You'll get the table with the interpretation. You might also check in general what an effect size is and why it is useful. Again, just google it and read a bit. That will help. Thanks for watching and cheers!
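The p-value correction mentioned in the reply can be illustrated with base R alone. This is a sketch with three made-up raw p-values (the comparison names are hypothetical); it uses the Holm method, which to my knowledge is also the default `p.adjust.method` in `ggbetweenstats()`.

```r
# Hypothetical raw p-values from three pairwise comparisons.
p_raw <- c(a_vs_b = 0.01, a_vs_c = 0.04, b_vs_c = 0.03)

# Holm correction: the smallest p-value is multiplied by 3, the next by 2,
# the largest by 1, and the results are forced to be non-decreasing.
p_holm <- p.adjust(p_raw, method = "holm")
print(p_holm)

# Comparisons that stay below 0.05 after the correction.
significant <- names(p_holm)[p_holm < 0.05]
print(significant)
```

Here only one of the three comparisons survives the correction, although all three raw p-values were below 0.05. That is the reason a plot can show fewer significant brackets than a series of uncorrected tests would suggest.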
@NextGenAge 27 days ago
Great video! Is it possible to show only the pairwise comparisons between one group, e.g. 'Original', and Synthetic1, Synthetic2, Synthetic3, etc.? It also does pairwise comparisons between those synthetic groups, which I don't want to show and also don't want to conduct tests for, except with the original one. Having a separate one-by-one figure takes up a lot of space, so I'm wondering if this is possible?
@yuzaR-Data-Science
27 days ago
I think it’s difficult with this function, although not impossible. It’s much more practical to model it, for example with quantile regression, and use the tab_model function from the sjPlot package 📦. I have videos on both if you need some help getting started.
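A different route from the quantile regression suggested above, shown only as a sketch: run one rank-sum test per "reference versus other" pair in base R and correct for just those tests. The data, the group names and the effect sizes below are invented for illustration, and this is not something `ggbetweenstats()` does for you.

```r
# Hypothetical data: one reference group and three synthetic groups.
set.seed(42)
d <- data.frame(
  group = rep(c("Original", "Synthetic1", "Synthetic2", "Synthetic3"), each = 15),
  value = c(rnorm(15, 10), rnorm(15, 10.5), rnorm(15, 12), rnorm(15, 9))
)

reference <- "Original"
others <- setdiff(unique(d$group), reference)

# One Wilcoxon rank-sum test per "reference vs other" pair.
p_raw <- vapply(others, function(g) {
  wilcox.test(d$value[d$group == reference], d$value[d$group == g])$p.value
}, numeric(1))

# Correct only for the comparisons actually made (3 instead of 6).
p_adj <- p.adjust(p_raw, method = "holm")
print(round(p_adj, 4))
```

Restricting the family of tests to the comparisons of interest keeps the multiplicity penalty smaller than with all pairwise comparisons.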
@Alex-gw6pm 4 months ago
Thank you for the video! If you have 4 groups with 5 animals each (in general fewer than 10 animals per group) and the distribution is normal, which is better to use: parametric or nonparametric tests?
@yuzaR-Data-Science
4 months ago
Parametric tests are fine in my opinion. Try ANOVA. I also have similar videos on ANOVA and repeated-measures ANOVA.
@Maxwaener 1 year ago
Great video. When I try to change the y axis to log10, the multiple comparisons disappear from the graph. Any suggestions on how to change the y axis to log10 and keep the multiple comparisons? (Couldn't find it on SO)
@yuzaR-Data-Science
1 year ago
Hi, thanks! I am not sure about the axis, but when you change the data to log, the multiple comparisons might disappear naturally. There is an option in ggbetweenstats, pairwise.display = "all"; try that one.
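One thing that can be checked directly: the Kruskal-Wallis test works on ranks, and log10 keeps the order of positive values, so the test result itself should not change when the variable is log-transformed. The sketch below uses made-up data and a hypothetical `log_wage` column. Plotting that pre-transformed column, instead of rescaling the axis afterwards, is one possible workaround, not a confirmed fix for the disappearing brackets.

```r
# Hypothetical skewed data in three groups.
set.seed(7)
d <- data.frame(
  education = rep(c("a", "b", "c"), each = 12),
  wage = c(rlnorm(12, 3), rlnorm(12, 3.5), rlnorm(12, 4))
)

# Transform the variable itself rather than the plot axis.
d$log_wage <- log10(d$wage)

kw_raw <- kruskal.test(wage ~ education, data = d)
kw_log <- kruskal.test(log_wage ~ education, data = d)

# Same ranks, therefore the same statistic and p-value.
print(c(raw = unname(kw_raw$p.value), log10 = unname(kw_log$p.value)))
```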
@sanjitchandradebnath4916 4 months ago
Great video, it really helped me to easily do my test and add the p-value to the plot. However, I have a small question. Instead of a violin plot, I want to make a boxplot without the points. How can I change the default violin plot into just a boxplot? Can you please advise? Thanks a lot.
@yuzaR-Data-Science
4 months ago
Sure, just check out the options of the function and you’ll find almost everything you need to adjust, certainly the type of the plot.
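As a sketch of what "check the options" could look like: recent ggstatsplot versions expose `violin.args` and `point.args`, and setting the violin width to zero and the point alpha to zero should leave only the boxplot. This is an assumption about the installed version (older releases used a different argument for the plot type), so check `?ggbetweenstats` first. The built-in `iris` data stands in for the commenter's data.

```r
# Argument lists that shrink the violin to nothing and make points invisible.
hide_violin <- list(width = 0, linewidth = 0)
hide_points <- list(alpha = 0)

# Only runs when ggstatsplot is installed; argument names assume a recent version.
if (requireNamespace("ggstatsplot", quietly = TRUE)) {
  p <- ggstatsplot::ggbetweenstats(
    data        = iris,
    x           = Species,
    y           = Sepal.Length,
    type        = "nonparametric",
    violin.args = hide_violin,
    point.args  = hide_points
  )
  print(p)
}
```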
@zane.walker 1 year ago
Have you had any experience with using the extract_stats function with purrr::map to extract the stats from the ggbetweenstats function for multiple data sets?
@yuzaR-Data-Science
1 year ago
Not yet, but it seems like a nice function. I used the report package for a while and did a review on it. I think it's a better option.
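For readers with the same question, a rough sketch of the idea: build one plot per data set with `purrr::map()`, then map `extract_stats()` over the plots. The two "data sets" here are just alternating rows of the built-in `iris` data, and the structure of what `extract_stats()` returns may differ between ggstatsplot versions, so inspect it with `str()` before relying on specific elements.

```r
# Two hypothetical data sets: alternating rows of iris.
datasets <- split(iris, rep(c("batch1", "batch2"), length.out = nrow(iris)))

# Only runs when both packages are installed.
if (requireNamespace("ggstatsplot", quietly = TRUE) &&
    requireNamespace("purrr", quietly = TRUE)) {
  # One plot per data set.
  plots <- purrr::map(datasets, function(df) {
    ggstatsplot::ggbetweenstats(
      data = df, x = Species, y = Sepal.Length, type = "nonparametric"
    )
  })
  # One list of statistics per plot.
  stats <- purrr::map(plots, ggstatsplot::extract_stats)
  str(stats, max.level = 2)
}
```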
@AuthenticMusicalInsight 9 months ago
OMGGGG THIS IS AMAZING. Thanks.... What alternative do you offer for performing a two-way ANOVA?
@yuzaR-Data-Science
9 months ago
Thanks! 🙏 For two-way ANOVA I would recommend the {emmeans} package. I have two videos on it. I use it every day and think it's a more general approach to any kind of model (not only linear regression or two-way ANOVA) with two (or more) predictors and an interaction between them. Hope you find emmeans useful too! Cheers
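A minimal sketch of the emmeans route, using the `warpbreaks` data that ships with R (my choice of example, not the data from the videos). The model has two predictors and their interaction; `emmeans()` then compares one factor within each level of the other.

```r
# warpbreaks: number of breaks by wool type (A/B) and tension (L/M/H).
m <- lm(breaks ~ wool * tension, data = warpbreaks)

# Classical two-way ANOVA table for the fitted model.
aov_table <- anova(m)
print(aov_table)

# Pairwise wool comparisons within each tension level (needs emmeans).
if (requireNamespace("emmeans", quietly = TRUE)) {
  emm <- emmeans::emmeans(m, pairwise ~ wool | tension)
  print(emm)
}
```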
@MsTenseiga 1 year ago
Just found you when I was desperate to get my statistics in order. I understand what you did, I just can't replicate it for my own data yet for some reason... I have like 18 rows of data, all named by species. Every species has a different amount of data. So x would be the species, and y would be the distance they travelled in my agarose gel. I know I need the Kruskal-Wallis test. I have managed to create a boxplot for my data and to conduct the test for my data separately. Now I know which species is significantly different from which. I just can't figure out how to get that data visualized. I used to do it by hand, but that's borderline impossible with so much damn data. I'll try your approach with ggstatsplot. Thanks so much for giving me a point to start
@yuzaR-Data-Science
1 year ago
You are welcome! Most of the mistakes are minor... like the data might be untidy somewhere or you made a typing mistake... happens to me all the time. It should work with no problem, unless you have only one observation per species.
@MsTenseiga
1 year ago
@yuzaR-Data-Science so cool of you to respond ^^ I don't know why, but R apparently doesn't like the = sign. It keeps telling me there's an unexpected one, but I typed it exactly as you did. It's unbelievable that we biologists are expected to use R just like that when we're doing our thesis... but no one actually teaches us. Frustrating. And every tutorial is completely different. I'll just keep trying. Probably something I missed before you specified the x and y axis
@yuzaR-Data-Science
1 year ago
:) I am a biologist myself, cheers mate! I taught myself R! Keep going, it's worth it! Now to your question: are you sure you need = and not ==? They differ in R, and in the beginning I confused them too.
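The difference between `=` and `==` can be shown in a few lines of base R. The last part reproduces one known way to get the "unexpected '='" message, namely a single `=` inside `if()`; whether that is what happened in the commenter's script is unknown, and a missing comma or bracket on an earlier line is another common cause of parse errors.

```r
x = 5               # a single = assigns, like x <- 5
is_five <- x == 5   # a double == compares and returns TRUE or FALSE
print(is_five)

# Inside a function call, a single = names an argument.
avg <- mean(x = c(1, 2, 3))
print(avg)

# A single = inside if() cannot be parsed; the error text is captured here
# instead of stopping the script.
msg <- tryCatch(
  {
    parse(text = "if (x = 5) 'five'")
    "parsed without error"
  },
  error = function(e) conditionMessage(e)
)
cat(msg, "\n")
```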
@yuzaR-Data-Science
1 year ago
Oh, by the way, here is a link to the blog post for this video, where you can copy and paste the R code: yuzar-blog.netlify.app/posts/2022-04-13-kw/ My blog in general could be a good start, because I teach my students with it, and so far they are progressing.
@Maxwaener 1 year ago
Do you know a way to change the p-values in the plot so that they don't show in scientific notation? That would make the p-values more readable.
@yuzaR-Data-Science
1 year ago
Not possible, to my current knowledge. However, the package develops quickly, and if you check the options, it might be possible now. If not, you can request a feature on the GitHub page of the package. If yes, please let me know in the comments. Thanks for watching!
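Outside the plot, p-values can always be formatted by hand in base R, for example for a table or a figure caption. This sketch does not change what ggstatsplot prints on the figure; the p-values are invented and the "< 0.001" cutoff is just a common reporting convention.

```r
# Hypothetical p-values, some of which R would print in scientific notation.
p <- c(0.04321, 0.00012, 3.2e-08)

# Fixed notation with three decimals, and a floor for very small values.
p_label <- ifelse(p < 0.001, "< 0.001", sprintf("%.3f", p))
print(p_label)
```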
@staedtler8479 4 months ago
If you don't mind, I have a small question concerning the applicability of the Kruskal-Wallis test. When measuring heavy metal concentrations across different strata (e.g. sediment, water, fish organs), the units used are different. In this case, does it affect the applicability of the test?
@yuzaR-Data-Science
4 months ago
It sounds like it would. We can't compare kg to grams, right? So if the concentrations are so different that you need different units, why do you need the test at all? Once you bring them to the same unit, you'll see the difference immediately. Hope that helps.
@staedtler8479
4 months ago
@yuzaR-Data-Science Yes, I first thought the same, but it's not simple to convert HM concentrations without having the density, which brings an issue when conducting this kind of test. Surprisingly, when I asked different AIs about it, the explanation I got was only about how the assumptions should be met, and that it doesn't really affect the test as it's robust enough to handle those differences.
@yuzaR-Data-Science
3 months ago
Well, in any case, we are supposed to compare apples to apples. You should ask your scientific supervisor about it and read some papers that did similar research, so that you can see how they did that comparison and cite them directly.
@bogdanandjelic2200 3 months ago
Thanks for the content once again! I've got an issue: is it possible to disable scientific notation for p-values? Much appreciated.
@yuzaR-Data-Science
3 months ago
Not that I know of. In fact, I also hate it. I already contacted the author about it. But if more people tell him, he might solve this issue sooner. So please open a new issue on his GitHub profile, so that he sees that most folks don't like it. Cheers mate
@bogdanandjelic2200
3 months ago
@yuzaR-Data-Science Makes sense! Thanks, will do it. Cheers
@yuzaR-Data-Science
3 months ago
👍
@JibHyourinmaru 1 year ago
What if I have more than 2 variables to test, like 5 variables? Can I do it all together and see the result in one figure? Do you have a line of code for that? Thank you
@yuzaR-Data-Science
1 year ago
Yes you can! But it depends on your question. Check out the grouped_ggbetweenstats() function, or you have to do regression. One of my favorites is quantile regression; I have a tutorial on that on my channel too.
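A sketch of the `grouped_ggbetweenstats()` route mentioned above, using the built-in `warpbreaks` data as a stand-in: the same comparison (breaks by tension) is repeated for every level of a second variable (wool) and the panels are combined into one figure. Whether this matches the commenter's "5 variables" depends on the question being asked.

```r
# Check that every wool x tension cell has observations before plotting.
cell_counts <- table(warpbreaks$wool, warpbreaks$tension)
print(cell_counts)

# One panel per wool type, each comparing breaks across tension levels.
if (requireNamespace("ggstatsplot", quietly = TRUE)) {
  p <- ggstatsplot::grouped_ggbetweenstats(
    data         = warpbreaks,
    x            = tension,
    y            = breaks,
    grouping.var = wool,
    type         = "nonparametric"
  )
  print(p)
}
```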
@sergiom.querido9645 11 months ago
How do I solve the following error? I have installed all dependency packages but the error remains: Error: package or namespace load failed for ‘ggstatsplot’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]): there is no package called ‘statsExpressions’
@yuzaR-Data-Science
11 months ago
The error message tells you how ;) Install the “statsExpressions” package 📦
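A small base R sketch for this kind of error: check which of the needed packages are missing before loading anything. The package list here only contains the two names from the error message; the install line is commented out so that the snippet has no side effects.

```r
# Packages named in the error message above.
needed <- c("ggstatsplot", "statsExpressions")

# TRUE for every package that can be loaded from the current library paths.
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install them
} else {
  message("All packages are installed.")
}
```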
@Rewuik 1 year ago
Amazing content, thank you! But when I try to use par(mfrow = ...), the plots are not working. I'm trying to plot side-by-side plots with ggstatsplot.
@yuzaR-Data-Science
1 year ago
Thanks for the feedback, Rogério! There are a few ways to make it work: 1) grouped_ggbetweenstats() indrajeetpatil.github.io/ggstatsplot/articles/web_only/ggbetweenstats.html#grouped-analysis-with-grouped_ggbetweenstats
@yuzaR-Data-Science
1 year ago
2) ggarrange(a, b, c, ncol = 1, nrow = 3, labels = "AUTO", common.legend = TRUE) from the ggpubr package
@yuzaR-Data-Science
1 year ago
3) and the last, the patchwork R package: patchwork.data-imaginist.com/
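Some background on why `par(mfrow = ...)` has no effect: it only controls base R graphics, while ggplot2-based plots (including those from ggstatsplot) are drawn with the grid system. A minimal sketch of the patchwork option follows, using two plain ggplot2 boxplots of the built-in `iris` data as placeholders for the real plots.

```r
# Only runs when both packages are installed.
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("patchwork", quietly = TRUE)) {
  library(ggplot2)
  library(patchwork)

  # Two placeholder plots; ggstatsplot plots can be combined the same way.
  p1 <- ggplot(iris, aes(Species, Sepal.Length)) + geom_boxplot()
  p2 <- ggplot(iris, aes(Species, Petal.Length)) + geom_boxplot()

  # patchwork overloads + to place plots next to each other.
  combined <- p1 + p2 + plot_layout(ncol = 2)
  print(combined)
}
```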
@Rewuik
1 year ago
@yuzaR-Data-Science Thank you, the patchwork package is so simple :)
@yuzaR-Data-Science
1 year ago
You are welcome! Hmmm, I might do a package review of patchwork one day ;)
@yuliaegorova589 1 year ago
Is it possible to put effect sizes instead of p-values in the pairwise comparisons on a plot?
@yuzaR-Data-Science
1 year ago
That's actually a great idea! I think not, but I just asked the author of the package and will get back to you when it's possible.
@yuliaegorova589
1 year ago
Thanks a lot! That would be great, since we only show significant comparisons on a plot already (so I already know it is significant). What I am more interested in is whether the effect size between the groups is large or small
@yuzaR-Data-Science
1 year ago
There is another package from the same guy: pairwise_comparisons. It should have all the effects
@yuliaegorova589
1 year ago
@yuzaR-Data-Science Yep, this one I use myself, but can it be combined with plots from this package? I also wonder if you know where to find functions that calculate the effect sizes
@yuzaR-Data-Science
1 year ago
You can ask Indrajeet, the guy who created the ggstatsplot package. Hmmm, the effectsize package is a useful one. I might do a video on "effectsize" in the future.
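For a single pair of groups, a rank-based effect size can also be computed by hand in base R. The sketch below calculates the rank-biserial correlation from the Wilcoxon W statistic for two made-up samples; it is an illustration of the idea, not the method ggstatsplot uses, and the effectsize package mentioned above offers ready-made functions for this kind of measure.

```r
# Two hypothetical groups without ties.
x <- c(12, 15, 11, 18, 20)
y <- c(9, 10, 14, 8, 7)

# W counts the (x, y) pairs in which the x value is larger.
w <- unname(wilcox.test(x, y)$statistic)

# Rank-biserial correlation: share of pairs favouring x minus share favouring y.
r_rb <- 2 * w / (length(x) * length(y)) - 1
print(r_rb)
```

Values near 1 or -1 mean the groups barely overlap, and values near 0 mean neither group tends to have larger values.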
@hamadalonazi723 1 year ago
Hi, yuzar! Please help me with this error. I have been trying to do the same test and am getting this error message. Error in `mutate()`: ℹ In argument: `isanoutlier = (.) %$% ...`. ℹ In group 1: `job.category = Doctor`. Caused by an error in `x$terms`: ! $ operator is invalid for atomic vectors Could you let me know how I can solve it? Please help
@yuzaR-Data-Science
1 year ago
Well, I can't help without data and code. But sometimes googling the error message helps. You are surely not the first person to get such an error ;) Thanks for watching!
@dinhluongnguyen3610 2 months ago
I couldn't visit "For more details and R code go to....."
@yuzaR-Data-Science
2 months ago
Sorry for that, man! Netlify shut down my blog since they want me to pay for increased traffic. I refuse to pay for doing something useful for the world (without earning anything), especially since R is open source. But I want to reopen it as soon as I find an alternative to Netlify. It'll take some time though, because I am not an IT guy. FYI: my blog is actually the script for the video, word by word, code by code. Thanks for understanding! Since you are a member, I created a community post with the R code for the whole video. Please let me know whether you could see it and get the code. If you wish, I could send you an HTML version of that video with both code and explanations, and if you like the other videos too, I’ll create others until I fix the blog. Cheers and thank you for joining! I highly appreciate that!
@rubyanneolbinado95 3 months ago
Why does it say "cannot find function" when ggstatsplot has already been installed? So sad.
@yuzaR-Data-Science
3 months ago
Did you load it with library(ggstatsplot)? Installing is done once, but you have to load the package every time you use it. Cheers
@rubyanneolbinado95
3 months ago
This is very helpful. Thank you so much. God bless. ❤
@yuzaR-Data-Science
3 months ago
You are very welcome! :)
@rubyanneolbinado95
3 months ago
Hi sir, can you give me tips on how to present the results of my GLM? I have made 3 models, but two of them are not significant. Should I present them all in my thesis?
@yuzaR-Data-Science
3 months ago
You can use the sjPlot package for visualisation, the gtsummary package for creating amazing tables and the emmeans package for extracting the most from your models. I reviewed all these packages on my YouTube channel. The rest depends on your research question, so ask your supervisors.