Drawing and Interpreting Heatmaps

This StatQuest is about heatmaps. We see them all the time, but lots of arbitrary decisions go into drawing them. Here, I show you what those decisions are and how they affect the results.
For a complete index of all the StatQuest videos, check out:
statquest.org/video-index/
If you'd like to support StatQuest, please consider...
Buying The StatQuest Illustrated Guide to Machine Learning!!!
PDF - statquest.gumroad.com/l/wvtmc
Paperback - www.amazon.com/dp/B09ZCKR4H6
Kindle eBook - www.amazon.com/dp/B09ZG79HXC
Patreon: www.patreon.com/statquest
...or...
KZread Membership: kzread.info/dron/tYLUTtgS3k1Fg4y5tAhLbw.htmljoin
...a cool StatQuest t-shirt or sweatshirt:
shop.spreadshirt.com/statquest-with-josh-starmer/
...buying one or two of my songs (or go large and get a whole album!)
joshuastarmer.bandcamp.com/
...or just donating to StatQuest!
www.paypal.me/statquest
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on Twitter:
joshuastarmer
#statquest #rnaseq #heatmap

Comments: 131

  • @statquest (2 years ago)

    Support StatQuest by buying my book, The StatQuest Illustrated Guide to Machine Learning, or a Study Guide or Merch!!! statquest.org/statquest-store/

  • @syednajeebashraf4101 (8 years ago)

    These are some of the best statistics presentations I have ever seen. Very helpful.

  • @KlizmaTime (2 years ago)

    This is so good. I am not a statistician, but I needed to understand what a heatmap of RNA-seq results meant in one of the papers I was reading. I have a better understanding now. Thank you!

  • @statquest (2 years ago)

    Glad it was helpful!

  • @pedrovallim9170 (5 years ago)

    Nice presentation, particularly because it shows how the choice of how to measure the distance from a point to different clusters affects how the heatmap looks. I suggest exploring, in another presentation, how to select the most appropriate method.

  • @nikitrianta9896 (3 years ago)

    Thank you for making statistics look so simple!!! These are the best videos I've ever seen!

  • @statquest (3 years ago)

    Thank you! :)

  • @mea97905 (6 years ago)

    Great effort was put into this! Thank you so much for sharing.

  • @sumitpaliwal1540 (8 years ago)

    Thanks. You make these things look very simple.

  • @PriscilaSantos-vs2po (4 years ago)

    I only discovered this channel yesterday. Thank you very much!!!!!!!!!!!

  • @statquest (4 years ago)

    Hooray! I'm glad you like the videos. :)

  • @bizikur (6 years ago)

    Your videos are so clear and helpful! Thank you!

  • @statquest (6 years ago)

    Thanks for the compliment! They are fun to make and it's nice to know people like them. :)

  • @zi3417 (1 year ago)

    I am going to watch all the videos in this playlist. Very useful for a new grad student (also in neural research). Thank you ;)

  • @statquest (1 year ago)

    Glad you like them!

  • @himalayanplanespotter2021 (5 years ago)

    Thanks Josh, this helped me a lot in understanding heatmaps and hierarchical clustering. Greetings from Japan.

  • @statquest (5 years ago)

    Hooray!!! I'm glad to hear that the video was helpful. I will be visiting Japan in 2 months! I can't wait.

  • @himalayanplanespotter2021 (5 years ago)

    That's awesome Josh! Hope you have a great time in Japan.

  • @felipegutierre7037 (1 year ago)

    I love it when your videos show up in my YouTube search results! *.*

  • @statquest (1 year ago)

    BAM! :)

  • @danieldimitrov3146 (5 years ago)

    Great video! Thank you so much!

  • @fidelialau (6 years ago)

    Thank you so much! Your video helped a lot.

  • @sagek7949 (3 years ago)

    Thank you for such a lucid explanation!

  • @statquest (3 years ago)

    Glad it was helpful!

  • @akhilmahajan1417 (5 years ago)

    Thanks Josh! I love your videos.

  • @statquest (5 years ago)

    Thank you so much!!!! :)

  • @uzmazehra9372 (4 years ago)

    Can you please do a video on PERMANOVA? I find your videos easy to understand.

  • @MegaGravitas (3 years ago)

    It would be really great to get a dedicated episode on scaling data the way it is described in this video.

  • @statquest (3 years ago)

    I'll keep that in mind.

  • @stillsummer (8 years ago)

    This video is great! One teeny tiny part that could be improved is the naming of genes and samples at 8:15. Using "gene #1" instead of "gene 1" could be misleading at first glance, since you also call the samples #1, #2, #3. However, it's just a minor point for your future reference. Other than that, your video is flawless.

  • @andrzejzielezinski1489 (6 years ago)

    OMG!! :D In my 300+ years as a vampire I've never seen such an excellent and clean explanation of heatmaps and hierarchical clustering. Thank you, you are a great teacher! Is there any chance you could share the PowerPoint presentation? (I'm okay if you say no, it's a lot of hard work.)

  • @andrzejzielezinski1489 (6 years ago)

    Thank you very much, Joshua. I'll just use the slides to explain heatmaps to a small group of my non-English-speaking friends. Of course I will recommend your channel and the website. Please keep up this great work!

  • @behnamassadi5885 (3 months ago)

    Thank you StatQuest, it was extremely helpful.

  • @statquest (3 months ago)

    Glad it helped!

  • @CWunderA (5 years ago)

    Fantastic one-stop shop for clustering basics!

  • @statquest (5 years ago)

    Hooray!!!

  • @lkhmaj (4 years ago)

    Hey Josh, I started following you from France and I find your videos really great. I have a small question, though: when you say "for each sample", do you mean each row? Thanks :)

  • @statquest (4 years ago)

    Unfortunately I am very sloppy when I say "sample": in earlier videos I use the word to indicate a row of data, but it also means a collection of rows... In newer videos I call a row of data an "observation" and a collection of observations a "dataset". I wish I had done that from the beginning... oh well. :(

  • @zudma (4 years ago)

    The start was really clear and helpful, but on the second heatmap I instantly got lost and couldn't even tell what each row was. I wish you'd given some more references for it, made it smaller, and maybe given one or two examples of how to interpret it. That said, thanks for the video, it was helpful!

  • @statquest (4 years ago)

    I'm sorry you had trouble following the concepts in this video.

  • @debleenamukherjee2410 (1 year ago)

    This was so helpful! Thank you so much!!!❤

  • @statquest (1 year ago)

    Thanks!

  • @songthanh896 (1 year ago)

    Thank you very much for your wonderful explanation.

  • @statquest (1 year ago)

    Thanks!

  • @MountainVibesTX (4 years ago)

    Hi Josh, huge subscriber here, love you bro. I'm trying to understand what kind of data these numbers are. Is it something like the "presence" of each gene in a cell? And what does that tell us? What are we looking for? I know these are big questions; you can just provide an example. My stats background is mostly in linear regression, where samples are rows and variables are columns, but it always seems like your rows and columns are reversed.

  • @statquest (4 years ago)

    The data in these examples are the number of mRNA transcripts per gene in a pool of cells. This number roughly correlates with how "active" a gene is in a cell. For example, in liver cells, genes associated with liver functions have relatively large numbers of mRNA transcripts associated with them. In contrast, in skin cells, genes associated with skin function (like pigments, etc.) have a relatively large number of mRNA transcripts associated with them.

  • @257Silvia (3 years ago)

    Such a great explanation!

  • @statquest (3 years ago)

    Thank you! :)

  • @mohammedmonzurmorshed2896 (4 years ago)

    Thanks so much... you are making my life easier.

  • @statquest (4 years ago)

    Hooray!!!

  • @leixiao169 (2 years ago)

    Thanks for the great lecture! For hierarchical clustering in R, do we only need to provide a count matrix of the log-transformed, normalized counts of all samples as input (generated from Rsubread or DESeq)? The hierarchical clustering script will calculate the Z-scaling based on the input count matrix, am I right?

  • @statquest (2 years ago)

    I believe so.

  • @ComputerScienceMaster (3 years ago)

    Great tutorial... Love it.

  • @statquest (3 years ago)

    Thank you! :)

  • @zaheral-masqari4827 (4 years ago)

    Thanks a lot for this rich explanation. I wish you had videos on PCoA, LDA, alpha diversity and the R language.

  • @statquest (4 years ago)

    I have videos on PCoA and LDA. Here's a link to all of my videos: statquest.org/video-index/

  • @aftabnadim (4 years ago)

    Thanks, Josh. Could you please share a tutorial on how to draw a heatmap of RNA-seq data in RStudio or any other online database?

  • @statquest (4 years ago)

    I'll keep that in mind.

  • @jonathanz4322 (2 years ago)

    nobody: StatQuest: statquest, statqueeessst .............. STATQUEST!

  • @statquest (2 years ago)

    :)

  • @snehavalabailu (3 years ago)

    Burst out laughing at 12:30 😂😂😂 Can't believe stats made me laugh; I used to only cry because of stats before 😂 you are goooood

  • @statquest (3 years ago)

    Hooray!!! :)

  • @addisonmcghee9190 (3 years ago)

    Hi Josh, just a quick clarifying question: when you say "sample", do you mean a collection of observations? I'm trying to conceptualize how the data is stored. Is each sample a single participant, or is it a vector?

  • @statquest (3 years ago)

    Unfortunately I was relatively sloppy when I used the word "sample" in this video. At 0:32 I say that a "sample" refers to a column of measurements (the rows are individual genes, or measurements taken within each sample). To clarify, imagine a "sample" being a single person, and on each person we measure weight, height and age. That said, usually the word "sample" refers to a bunch of people or a bunch of whatever.

  • @apulunuj (4 years ago)

    The values used for clustering with the Euclidean distance are scaled read counts, right?

  • @statquest (4 years ago)

    Yep! :)
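
A minimal sketch of why scaled values matter for Euclidean distance. It uses Python rather than R and made-up counts. Two hypothetical genes follow the same up-and-down pattern, but one is ten times more abundant. On raw counts they look far apart; after per-gene Z-scaling they are essentially identical. `z_scale` and `euclidean` are illustrative helper names, not functions from any package.

```python
import math
from statistics import mean, stdev


def z_scale(row):
    """Z-scale one gene's values: subtract its mean, divide by its sample SD."""
    m, s = mean(row), stdev(row)
    return [(x - m) / s for x in row]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical read counts for two genes across the same four samples.
# Both genes rise in the same way; gene_2 is just 10x more abundant.
gene_1 = [10, 20, 30, 40]
gene_2 = [100, 200, 300, 400]

raw_distance = euclidean(gene_1, gene_2)                       # about 492.95
scaled_distance = euclidean(z_scale(gene_1), z_scale(gene_2))  # about 0
print(raw_distance, scaled_distance)
```

With raw counts, clustering would group genes mostly by overall expression level. With scaled values, it groups them by the shape of their pattern across samples.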

  • @PraveenKumar-pd9sx (4 years ago)

    Can you please do a video on genetic algorithms?

  • @munaalhammadi4237 (3 years ago)

    Thank you for the nice presentation and great explanation. Is global scaling similar to scaling by column?

  • @statquest (3 years ago)

    No, it uses all of the columns to find a single scaling factor for all of the data instead of finding one scaling factor per column.

  • @munaalhammadi4237 (3 years ago)

    @statquest What is the function that performs global scaling in R?

  • @statquest (3 years ago)

    @munaalhammadi4237 x
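
The difference between per-column and global scaling can be sketched in a few lines of Python. This sketch assumes Z-scaling (subtract a mean, divide by a standard deviation) is the scaling being applied, and it uses a hypothetical genes-by-samples matrix. Per-column scaling computes one mean and standard deviation for each column. Global scaling computes a single mean and standard deviation from every value. The helper names are made up for this example; this is not the R function asked about in the thread.

```python
from statistics import mean, stdev

# Hypothetical matrix: rows are genes, columns are samples.
# The third sample has much larger values than the other two.
data = [
    [1.0, 4.0, 10.0],
    [2.0, 5.0, 20.0],
    [3.0, 6.0, 30.0],
]


def scale_per_column(matrix):
    """Z-scale each column with its own mean and standard deviation."""
    stats = [(mean(col), stdev(col)) for col in zip(*matrix)]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in matrix]


def scale_global(matrix):
    """Z-scale every value with one mean and SD computed from all values."""
    values = [x for row in matrix for x in row]
    m, s = mean(values), stdev(values)
    return [[(x - m) / s for x in row] for row in matrix]


per_col = scale_per_column(data)
glob = scale_global(data)
for row in per_col:
    print(row)  # each column reads -1, 0, 1 from top to bottom
for row in glob:
    print([round(x, 2) for x in row])  # the third column stays highest
```

Per-column scaling erases the fact that the third sample has larger values. Global scaling keeps it, because every value is measured against the same mean and standard deviation.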

  • @birrabukhari3836 (1 year ago)

    It was really informative.

  • @statquest (1 year ago)

    Thanks! :)

  • @adelutzaification (6 years ago)

    Kick it up a notch! :)

  • @worldofinformation815 (3 years ago)

    Thank you Sir 🌹✨

  • @statquest (3 years ago)

    :)

  • @sandeshacharya553 (3 years ago)

    Hi! Quick question: when we are scaling data, do we scale by sample or by gene? Or do we find the mean and standard deviation of all the genes (expressed in all samples) and scale them all at once? The second makes more sense to me. I'd be glad if you could answer. :)

  • @sandeshacharya553 (3 years ago)

    If we scale the data per gene, can it be used to compare expression among different genes in a single sample? Similarly, if we scale the data per sample, can we use it to compare expression patterns across different samples?

  • @statquest (3 years ago)

    The scaling can go either way. The goal is to use something that makes the relationships clear. Sometimes that means scaling by gene, other times it means scaling by sample, and you can also do a global scaling.

  • @lifeandbeyond7279 (3 years ago)

    Very nice video... thanks

  • @statquest (3 years ago)

    :)

  • @steveb.s.baleba5944 (6 years ago)

    Can you share the R code with us???

  • @juanete69 (2 months ago)

    In another video you spoke about dividing by the length of the sequence and then dividing by the sum of the column counts. Is it better to do it like that, or to normalize the way you explain in this video?

  • @statquest (2 months ago)

    It really depends on your data.
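
The normalization described in this question (divide by gene length, then by a per-sample total) resembles TPM, as TPM is usually defined. Here is a minimal sketch for a single sample (one column). It assumes the total is taken over the length-adjusted values and rescaled to one million. The counts and gene lengths are hypothetical, and `tpm` is an illustrative function name, not a library call.

```python
def tpm(counts, lengths_bp):
    """TPM-style normalization for one sample (one column of the matrix).

    counts: read counts per gene; lengths_bp: gene lengths in base pairs.
    """
    # Step 1: divide each count by its gene length in kilobases.
    per_kb = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    # Step 2: divide by the column total of those values, scaled to one million.
    total = sum(per_kb)
    return [v / total * 1_000_000 for v in per_kb]


# Hypothetical sample: three genes with different lengths.
counts = [10, 20, 30]
lengths = [2000, 4000, 1000]
values = tpm(counts, lengths)
print(values)  # [125000.0, 125000.0, 750000.0]
```

Gene 2 has twice as many reads as gene 1 but is also twice as long, so both end up with the same value. Z-scaling for a heatmap, as in this video, is a separate step that could be applied afterwards.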

  • @user-mw2pf8tw6c (6 years ago)

    A great video, it helped me a lot.

  • @statquest (6 years ago)

    Hooray! :)

  • @elaherezaee5672 (2 years ago)

    Thank you, very helpful.

  • @statquest (2 years ago)

    Thanks!

  • @Tiago211287 (8 years ago)

    Can I make a heatmap of only the differentially expressed genes? Or is the right thing to do to use all expressed genes? I made a heatmap, and the differentially expressed genes are mixed with genes that are not differentially expressed. Some rows and columns are hard to look at because of some outliers.

  • @Tiago211287 (8 years ago)

    +Joshua Starmer Thanks.

  • @stillsummer (8 years ago)

    One more follow-up question about heatmaps. Suppose I have multiple groups of samples, say Treatment 1, Treatment 2, and Control. Would you suggest making a heatmap based on the genes differentially expressed in "Treatment 1 vs Control" or in "Treatment 2 vs Control"? I am asking because the DE genes in the first comparison will not be the same as those in the second comparison. In this case, if you had to make a heatmap to describe your results, would you pick the top 100 genes with the highest reads instead of the top 100 DE genes? Does that make any sense?

  • @stillsummer (8 years ago)

    Hi Joshua, thank you for your detailed explanation! I aimed to discover the differentially expressed genes between different treatment groups, so I think the second option would be more suitable for that purpose. Now I just have to figure out how to color-code the heatmap... which will take a while. Thanks again :)

  • @mikhailalexeyev636 (3 years ago)

    Thank you!

  • @statquest (3 years ago)

    Thanks!

  • @user-nv4ug9oy3i (7 months ago)

    Thank you so much.

  • @statquest (7 months ago)

    Welcome 😊!

  • @drajaygupta7908 (1 year ago)

    Wonderful

  • @statquest (1 year ago)

    Thank you!

  • @Mipetz38 (3 years ago)

    15:10 Is it just me, or does comparing points to the cluster average result in a more neatly colored heatmap?

  • @statquest (3 years ago)

    I'm partial to the furthest.

  • @ISK_VAGR (11 months ago)

    That is crazy good. Now I have to mention that I don't believe one should go with the option that looks good. There is always a reason to choose one method over another. Arguing, for example, that centroids make the result more robust to outliers is a better justification if you want to keep a conservative interpretation. Another advisable approach is to try different clustering methods and see which ones converge on similar results. At least that is my opinion. In any case, super good StatQuest.

  • @statquest (11 months ago)

    Glad you like the video. However, regardless of how you justify your method, you will always end up with some relatively arbitrary cutoff, because there is no way to prove that you've made the correct decision.
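
The options discussed in the two threads above (comparing to the cluster average, to the furthest point, or to a centroid) are usually called linkage methods. "Average" can mean either the centroid or the mean of all pairwise distances, so this minimal sketch shows both, along with single (closest) and complete (furthest) linkage. It uses Euclidean distance and two hypothetical 2-D clusters. The function names are illustrative, not a library API.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise(c1, c2):
    """All distances between a point in c1 and a point in c2."""
    return [euclidean(a, b) for a in c1 for b in c2]


def single_linkage(c1, c2):
    """Closest pair of points."""
    return min(pairwise(c1, c2))


def complete_linkage(c1, c2):
    """Furthest pair of points."""
    return max(pairwise(c1, c2))


def average_linkage(c1, c2):
    """Mean of all pairwise distances."""
    d = pairwise(c1, c2)
    return sum(d) / len(d)


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return [sum(coord) / len(cluster) for coord in zip(*cluster)]


def centroid_linkage(c1, c2):
    """Distance between the two cluster centroids."""
    return euclidean(centroid(c1), centroid(c2))


# Two hypothetical clusters of 2-D points.
cluster_a = [[0.0, 0.0], [0.0, 2.0]]
cluster_b = [[3.0, 0.0], [5.0, 0.0]]

for name, fn in [("single", single_linkage), ("complete", complete_linkage),
                 ("average", average_linkage), ("centroid", centroid_linkage)]:
    print(name, round(fn(cluster_a, cluster_b), 3))
```

For these points the four methods give about 3.0, 5.385, 4.248 and 4.123. Hierarchical clustering repeatedly merges the pair of clusters with the smallest linkage value, so swapping the linkage function can change the tree and therefore the row order of the heatmap.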

  • @factsfigures2740 (3 years ago)

    Sir, I understand why local scaling is helpful when there is an outlier in the data. But can you please tell me how it's done, or the name of a popular local scaling method that I can Google and understand? Nonetheless, thanks a lot.

  • @statquest (3 years ago)

    Z scaling
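
A minimal sketch of why per-gene ("local") Z-scaling helps when one gene dominates. It assumes the alternative is global scaling with a single mean and standard deviation for the whole table. The counts are hypothetical: `gene_b` is expressed more than 100 times higher than `gene_a`. Helper names are illustrative only.

```python
from statistics import mean, stdev

# Hypothetical counts for two genes across four samples.
genes = {
    "gene_a": [5.0, 6.0, 7.0, 8.0],
    "gene_b": [1000.0, 1100.0, 900.0, 1200.0],
}


def z_scale(values, m, s):
    """Subtract a mean and divide by a standard deviation."""
    return [(x - m) / s for x in values]


def spread(values):
    """Range of a row's values (max minus min)."""
    return max(values) - min(values)


# Global scaling: one mean and SD from every value in the table.
all_values = [x for row in genes.values() for x in row]
g_mean, g_sd = mean(all_values), stdev(all_values)
global_scaled = {g: z_scale(v, g_mean, g_sd) for g, v in genes.items()}

# Per-gene ("local") Z-scaling: each gene uses its own mean and SD.
local_scaled = {g: z_scale(v, mean(v), stdev(v)) for g, v in genes.items()}

print(round(spread(global_scaled["gene_a"]), 4))  # tiny: one flat color
print(round(spread(local_scaled["gene_a"]), 2))   # about 2.32
```

With global scaling, `gene_a`'s values span less than 0.01 standard deviations, so its row would be essentially one color. With per-gene scaling, its values span about 2.3 standard deviations.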

  • @kwang-hwicho1298 (4 years ago)

    At 1:23, you said the relative abundances are scaled on a per-gene basis. If so, should all genes have dark red and dark blue in their row?

  • @statquest (4 years ago)

    We "z-scale" all of these genes by subtracting the mean value (per gene) and dividing by the standard deviation (per gene). That said, this does not mean that all genes will have the same maximum and minimum values. For example, if gene "a" has these values: 100, 10, -10, -100, then the scaled values = 1.2, 0.1, -0.1, -1.2. In contrast, if gene "b" has these values: 1, 1, -1, -1, then the scaled values = 0.9, 0.9, -0.9, -0.9. In this case, gene "a" will have a darker blue and a darker red than gene "b".

  • @kwang-hwicho1298 (4 years ago)

    @statquest Thank you very much for your kind answer.
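
The arithmetic in the reply above can be checked with a short Python sketch. Using the sample standard deviation (dividing by n - 1, as R's `sd()` does) reproduces the quoted values. With the population standard deviation, the largest value for gene "a" would be about 1.4 instead. `z_scale` is an illustrative helper name.

```python
from statistics import mean, stdev


def z_scale(values):
    """Subtract the gene's mean, then divide by its sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # n - 1 in the denominator, like R's sd()
    return [(x - m) / s for x in values]


gene_a = [100, 10, -10, -100]
gene_b = [1, 1, -1, -1]

print([round(v, 1) for v in z_scale(gene_a)])  # [1.2, 0.1, -0.1, -1.2]
print([round(v, 1) for v in z_scale(gene_b)])  # [0.9, 0.9, -0.9, -0.9]
```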

  • @rebeccaeliscu3460 (5 years ago)

    Does applying a log transformation to the count data accomplish the same thing as normalizing the data across genes?

  • @statquest (5 years ago)

    Great question! The answer is that although the log transformation can make the data easier to look at (by preventing the highly expressed genes from making it impossible to see differences in moderately expressed genes), it's not the same as normalizing: it doesn't center the mean of the measurements on 0 and it doesn't standardize the range of values.

  • @rebeccaeliscu3460 (5 years ago)

    @statquest Is it then appropriate to normalize the rlog data? Thanks for your response!

  • @statquest (5 years ago)

    @rebeccaeliscu3460 Yes. However, there are no fixed rules for how to transform the data; the goal is to have something that is relatively easy to interpret, and that means there is a lot of flexibility. Try stuff and see what works best.
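
A minimal sketch of the difference described in this thread, using hypothetical counts for one gene. Plain `log2(count + 1)` stands in for a log-like transform here. DESeq2's rlog is a different, regularized transform and is not reproduced. The logged values are compressed but not centered; Z-scaling them afterwards centers them on 0 with a standard deviation of 1.

```python
import math
from statistics import mean, stdev

# Hypothetical raw counts for one gene across four samples.
counts = [10, 100, 1000, 10000]

# log2(count + 1) compresses the range, but the values are still all
# positive and are not centered on 0.
logged = [math.log2(c + 1) for c in counts]

# Z-scaling the logged values centers them on 0 with a standard deviation of 1.
m, s = mean(logged), stdev(logged)
scaled = [(x - m) / s for x in logged]

print([round(x, 2) for x in logged])  # [3.46, 6.66, 9.97, 13.29]
print(round(mean(scaled), 6), round(stdev(scaled), 6))
```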

  • @marble2725 (3 years ago)

    Made me say "OHHHHHH!!" out loud, 10/10

  • @statquest (3 years ago)

    BAM! :)

  • @upalabdhadey6156 (3 years ago)

    @statquest Double BAM :> :>

  • @danielaquijano5872 (2 years ago)

    I love you, thank you

  • @statquest (2 years ago)

    :)

  • @zeebazahrasultana9368 (3 years ago)

    Great for a beginner 😁

  • @statquest (3 years ago)

    Thanks! :)

  • @debajitbhowmick7079 (3 years ago)

    Would it be better to use the median rather than the mean? (4 min 45 sec)

  • @statquest (3 years ago)

    The decision is arbitrary. Try them both and see what looks better.

  • @debajitbhowmick7079 (3 years ago)

    @statquest Well, I'd love to see, but I don't know R or any computer language. But it is good to know.
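
The comment does not say exactly which step at 4:45 is meant, so this sketch assumes the question is about centering a gene's values before coloring them. With a hypothetical outlier in the last sample, mean-centering pushes every ordinary sample below zero. Median-centering leaves the ordinary samples close to zero.

```python
from statistics import mean, median

# Hypothetical values for one gene; the last sample is an outlier.
values = [4.0, 5.0, 6.0, 5.0, 50.0]

m = mean(values)    # 14.0, pulled up by the outlier
med = median(values)  # 5.0, unaffected by the outlier

mean_centered = [x - m for x in values]
median_centered = [x - med for x in values]

print(mean_centered)    # [-10.0, -9.0, -8.0, -9.0, 36.0]
print(median_centered)  # [-1.0, 0.0, 1.0, 0.0, 45.0]
```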

  • @debajitbhowmick7079 (3 years ago)

    Great discussion

  • @statquest (3 years ago)

    Thanks!

  • @adelutzaification (6 years ago)

    I am somewhat surprised and a tad vexed that, in doing clustering, there are so many choices in terms of calculating the distance and choosing the clustering method. It's very... empirical... I somehow expected that certain parameters would best fit certain scenarios/types of problems. Has anybody systematically checked the various permutations of parameters for the "best fit" to a problem (say, gene expression)? Perhaps the scenarios need to be better defined in order to find this correspondence. It seems that there are cases in which a certain combination of parameters best fits the essence of the system, according to the researcher's experience. Maybe scrutinizing such cases can reveal a certain feature of the system that is not found in seemingly similar cases for which the combination doesn't work so well...? Maybe some empirical "constant", a la the theorems encountered in physics, could do the trick? ;)

  • @adelutzaification (6 years ago)

    Gotcha. Each of these techniques is just another weapon in our arsenal. Hypothesis testing through experimentation is the best. But there is also model testing in statistics, isn't there? Can you make a video about that? Methods, parameters, best practices. Thank you.

  • @adelutzaification (6 years ago)

    Thanks. I also meant to reply to your message on Statquest.org, but I can't find it any more to save my life... Today I learned about "lasso regression". Apparently, it can "do variable selection and prediction", which seems like the bee's knees. Can you expand on that in a video, por favor? Here's something you might enjoy: another "Bam" aficionado (kzread.info/dash/bejne/iqqV3LOOiaqXm6g.html)

  • @majs7037 (3 years ago)

    strijderssss

  • @statquest (3 years ago)

    ?

  • @RyonBang (4 years ago)

    It's hard to be a scientist, fck!! I was wondering if I should've gone to medical school instead.

  • @taotaotan5671 (4 years ago)

    Josh ran out of songs...

  • @statquest (4 years ago)

    A sad day. :(
