How to find the best sampling depth for rarefaction (CC202)

Science and Technology

One critique of rarefaction is that the sampling depth people pick is arbitrary. Is that true? In this Code Club, I'll show you my thought process for picking the best sampling depth for rarefying my data. Along the way, we'll look at approaches for visualizing the number of sequences per sample. We'll also see how to calculate Good's coverage to assess how well we have sampled our communities at a desired sequencing depth.
You can find my blog post for this episode at www.riffomonas.org/code_club/.... The data were generated in our Kozich et al. 2013 paper (doi.org/10.1128/AEM.01043-13) using samples from the Schloss et al. 2012 paper (doi.org/10.4161/gmic.21008).
#rarefaction #coverage #tidyverse #R #Rstudio #Rstats
Want more practice on the concepts covered in Code Club? You can sign up for my weekly newsletter at shop.riffomonas.org/youtube to get practice problems, tips, and insights.
If you're interested in taking an upcoming 3-day R workshop, be sure to check out our schedule at riffomonas.org/workshops/
You can also find complete tutorials for learning R with the tidyverse using...
Microbial ecology data: www.riffomonas.org/minimalR/
General data: www.riffomonas.org/generalR/
0:00 Determining the threshold for rarefaction
3:46 Visualizing the number of sequences per sample
10:54 Making a decision
16:39 Calculating Good's coverage
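
For readers who want to see the arithmetic behind the last chapter, here is a minimal sketch. Good's coverage for a sample is 1 minus the number of OTUs seen exactly once (singletons) divided by the total number of sequences. The episode works in R with the tidyverse; the Python below is not the code from the video, and the function names (goods_coverage, rarefy) and the example counts are hypothetical, chosen only for illustration.

```python
import random
from collections import Counter


def goods_coverage(otu_counts):
    """Good's coverage: 1 - (number of singleton OTUs / total sequences)."""
    total = sum(otu_counts)
    if total == 0:
        raise ValueError("sample has no sequences")
    singletons = sum(1 for count in otu_counts if count == 1)
    return 1 - singletons / total


def rarefy(otu_counts, depth, rng):
    """Draw `depth` sequences without replacement; return per-OTU counts."""
    total = sum(otu_counts)
    if depth > total:
        raise ValueError("depth exceeds the number of sequences in the sample")
    # Expand the counts into one OTU label per sequence, then subsample.
    pool = [otu for otu, count in enumerate(otu_counts) for _ in range(count)]
    drawn = Counter(rng.sample(pool, depth))
    return [drawn.get(otu, 0) for otu in range(len(otu_counts))]


# Hypothetical sample: 6 OTUs, 100 sequences, two of them singletons.
sample = [60, 25, 10, 3, 1, 1]
coverage_full = goods_coverage(sample)  # 1 - 2/100

# Coverage at a candidate rarefaction depth of 50 sequences (seeded for repeatability).
subsample = rarefy(sample, 50, random.Random(1))
coverage_rarefied = goods_coverage(subsample)
```

Running the same two steps for every sample at a candidate depth gives one coverage value per sample, which is the kind of summary the episode uses to judge whether a depth is adequate. A single random draw is shown here; averaging over many draws would give a steadier estimate.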

Comments: 30

  • @ericagardner8249
    23 hours ago

    Thank you, this is so helpful :)

  • @lilianjose9154
    2 years ago

    Thank you, I have enjoyed studying from your videos a lot. They are very detailed, and the thing I really like about them is that you talk through what we, as researchers, should be thinking about the data we have. I have learned a lot. Thanks! The explanations you give as you type the code are also really helpful.

  • @Riffomonas
    2 years ago

    Thanks for watching Lilian! 🤓

  • @eduardoacostasolisdeovando3706
    2 years ago

    It's just great to find such valuable information on YouTube, thanks a lot Pat.

  • @Riffomonas
    2 years ago

    It’s really my pleasure. Thanks! 🤓

  • @sven9r
    2 years ago

    Wow - ty Pat! I will probably write a longer e-mail regarding this topic. I held back because I knew you would talk about it. People following this channel can become SOOOOOOOOOOO much better in coding and biology. I visited 3 universities in Germany and I swear no one except my current PI has taught me as much as you 😅🤣 P.S. the Mexican bots are hitting and boosting your YouTube algorithm :D

  • @Riffomonas
    2 years ago

    Ha! Thanks so much for watching and being such a loyal viewer 🤓

  • @lisakelly4921
    2 years ago

    Thank you, thank you! I have been thinking about this topic for a couple of weeks now!

  • @Riffomonas
    2 years ago

    Wonderful- thanks for watching!🤓

  • @brantainman
    1 year ago

    Your videos are awesome. Clear explanations, excellent coding.

  • @Riffomonas
    1 year ago

    Thanks Brant! Glad to have you watching 🤓

  • @kristinamichl268
    2 years ago

    Thanks a lot for your videos, they are amazing, you can explain very well and address exactly the questions which were popping up as soon as I started working on my actual data!

  • @Riffomonas
    2 years ago

    Wonderful! Thanks for watching 🤓

  • @abdullahimuhammad8020
    2 years ago

    Great video. Thank you for this, as it comes at a point when I am struggling with determining that threshold.

  • @Riffomonas
    2 years ago

    Fantastic- thanks for watching!🤓

  • @abdullahimuhammad8020
    2 years ago

    @Riffomonas can you please help with the script for calculating Good's coverage? I would like to apply the same to my data.

  • @belgarath73g
    6 months ago

    Thanks for the explanation. If I have few reads, but that makes sense because of a treatment, is Good's coverage still a good estimator?

  • @kelmermartins123
    3 months ago

    Thank you for sharing the video, Patrick. I found it really interesting and helpful. Your codes are so clean! I recently read your 2024 paper in mSphere which discussed rarefaction and touched on Good's coverage. The paper left me wondering about coverage thresholds. For instance, is there a threshold of, say, >85% that indicates a reliable capture of community diversity/composition?

  • @Riffomonas
    3 months ago

    Eh, I don't think it really matters. The deeper you go, the more resolution you'll be able to detect

  • @kelmermartins123
    3 months ago

    @Riffomonas Thanks for the fast response Patrick! This kind of more direct interaction with researchers is so nice. Well, at the end of the day the only thing that can be done is more sequencing, right? Hahah

  • @bridget9926
    2 years ago

    This was super helpful! Thanks a lot. One quick question: Say you have a sample with a low number of sequences, but high good's coverage. Can you "trust" this sample? Or should there be a minimum sequencing depth you still need to decide on?

  • @Riffomonas
    2 years ago

    Always rarefy to the smallest sample size. I use Good's coverage to tell reviewers to back off if I have a low sequencing depth

  • @JS-lp2bc
    2 years ago

    Would love to hear your thoughts on relative abundance versus absolute count. I’m working with 16S from swabs, so we can’t normalize starting material. I’ve had people ask about absolute values but I can’t really figure out a way to do that since each sample started with different amounts and aren’t consistent! Thanks!

  • @Riffomonas
    2 years ago

    I think you’d need a spike-in control to back out the abundance. Regardless, I think that if you want absolute abundance you would be better off using qPCR for the specific populations you are interested in

  • @JS-lp2bc
    2 years ago

    @Riffomonas thanks for the reply!

  • @user-kp9xh1mp5b
    8 months ago

    Thank you!! The video is great, and I have learned a lot from it. I have one question. My sequencing data are very imbalanced: sample 1 had 300k sequences while sample 2 had only 127. I want to rarefy the OTUs to 5000 sequences, so I used the same method to check the coverage of my data. However, the coverage is not good: several samples have coverage below 90% even though they have about 20k-30k sequences. How do I deal with such problems?

  • @Riffomonas
    4 months ago

    I don't worry about the coverage. Use the same rarefaction depth for everything and then you can compare things on the same basis.

  • @pinitphon1
    2 years ago

    I work for a microbiome testing company. Our data show that we need more than 25,000 good-quality reads on the human gut microbiome...

  • @sihanbu9063
    2 years ago

    I'm also doing a human gut microbiome study and mine needs 100,000 reads. However, the rarefaction curve looks even worse when I rarefy to 15,000 reads. Confused...

  • @Riffomonas
    2 years ago

    You need that many reads to do what? If you’re looking for a super rare racing perhaps, but I find that hard to believe for typical analyses
