Mutual Information, Clearly Explained!!!

Mutual Information is a metric that quantifies how similar or different two variables are. It's a lot like R-squared, but R-squared only works for continuous variables. What's cool about Mutual Information is that it works for both continuous and discrete variables. So, in this video, we walk you through how to calculate Mutual Information step-by-step. BAM!
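For readers who want to try the calculation themselves, here is a minimal Python sketch (the data and function name are illustrative, not from the video) that computes Mutual Information for two discrete variables from their joint and marginal probabilities:

    import numpy as np

    def mutual_information(x, y):
        # I(X;Y) = sum over (x,y) of p(x,y) * ln( p(x,y) / (p(x) * p(y)) ),
        # using the natural log, as in the video.
        x, y = np.asarray(x), np.asarray(y)
        mi = 0.0
        for xv in np.unique(x):
            for yv in np.unique(y):
                p_xy = np.mean((x == xv) & (y == yv))  # joint probability
                p_x = np.mean(x == xv)                 # marginal probability of x
                p_y = np.mean(y == yv)                 # marginal probability of y
                if p_xy > 0:                           # treat 0 * ln(0) as 0
                    mi += p_xy * np.log(p_xy / (p_x * p_y))
        return mi

    # Toy data: two identical columns with a 4-to-1 split, chosen so the
    # result reproduces the 0.5004 value discussed in the comments below.
    likes_popcorn = ["no", "no", "no", "no", "yes"]
    likes_troll2  = ["no", "no", "no", "no", "yes"]
    print(mutual_information(likes_popcorn, likes_troll2))  # ~0.5004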
English
This video has been dubbed using an artificial voice via aloud.area120.google.com to increase accessibility. You can change the audio track language in the Settings menu.
If you'd like to support StatQuest, please consider...
Patreon: / statquest
...or...
YouTube Membership: / @statquest
...buying my book, a study guide, a t-shirt or hoodie, or a song from the StatQuest store...
statquest.org/statquest-store/
...or just donating to StatQuest!
www.paypal.me/statquest
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on Twitter:
/ joshuastarmer
0:00 Awesome song and introduction
2:39 Joint and Marginal Probabilities
6:19 Calculating the Mutual Information for Discrete Variables
13:00 Calculating the Mutual Information for Continuous Variables
14:10 Understanding Mutual Information as a way to relate the Entropy of two variables
#StatQuest #MutualInformation #DubbedWithAloud

Comments: 146

  • @statquest · a year ago

    To learn more about one common way to create histograms of continuous variables, see: journals.plos.org/plosone/article?id=10.1371/journal.pone.0087357 To learn more about Lightning: lightning.ai/ Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

  • @SelinDrawz · a year ago

    Thank u daddy stat quest for carrying me through my university course

  • @statquest · a year ago

    Ha! :)

  • @faizalrafi · 8 months ago

    I am binge-watching this series. Very clear and concise explanations for every topic, given in the most interesting way!

  • @statquest · 8 months ago

    Glad you like them!

  • @PunmasterSTP · 2 months ago

    Same here!

  • @Fan-vk9gx · a year ago

    Super! I have been struggling with copulas, mutual information, etc. for a while, and this is exactly what I was looking for! Thank you, Josh! This video is really helpful!

  • @statquest · a year ago

    Glad it was helpful!

  • @user-rf8jf1ot3t · 14 days ago

    I love this video. Simple and clear.

  • @statquest · 14 days ago

    Thanks!

  • @kenmayer9334 · a year ago

    Awesome stuff, Josh. Thank you!

  • @statquest · a year ago

    My pleasure!

  • @dragoncurveenthusiast · a year ago

    Your explanations are awesome!

  • @statquest · a year ago

    Glad you like them!

  • @raizen74 · 11 months ago

    Superb explanation! Your channel is great!

  • @statquest · 11 months ago

    Glad you think so!

  • @arash2229 · a year ago

    Thank youuuu. You explain everything clearly.

  • @statquest · a year ago

    Glad it was helpful!

  • @PunmasterSTP · 2 months ago

    Mutual information, clearly explained? More like "Magnificent demonstration, you deserve more fame!" 👍

  • @statquest · 2 months ago

    Thanks! 😃

  • @KatanyaTrader · a year ago

    OMG, I had never seen this channel before; it would have saved me so many hours... new sub here, thanks a lot for your vids

  • @statquest · a year ago

    Welcome!

  • @MegaNightdude · a year ago

    Great stuff. As always.

  • @statquest · a year ago

    Thank you very much! :)

  • @stepavancouver · a year ago

    An interesting explanation and a nice sense of humor 👍

  • @statquest · a year ago

    Thank you!

  • @smilefaxxe2557 · a month ago

    Great explanation, thank you! ❤🔥

  • @statquest · a month ago

    Glad it was helpful!

  • @Maciek17PL · a year ago

    Amazing as always!!!

  • @statquest · a year ago

    Thank you!

  • @zachchairez4568 · a year ago

    Great job! Love it!

  • @zachchairez4568 · a year ago

    Liking my own comment to double like your video :)

  • @statquest · a year ago

    Double bam! :)

  • @VaibhaviDeo · a year ago

    you are the best, a godsend really, stay blessed

  • @statquest · a year ago

    Thank you!

  • @bernardtiongingsheng85 · a year ago

    Thank you so much! It is really helpful. I really hope you can explain KL divergence in the next video.

  • @statquest · a year ago

    I'll keep that in mind.

  • @sasha297603ha · 2 months ago

    Love it, thanks!

  • @statquest · 2 months ago

    Thank you!

  • @Geneu97 · 3 months ago

    Thank you for being a content creator

  • @statquest · 3 months ago

    Thanks!

  • @PunmasterSTP · 2 months ago

    Not just a creator of any content either. A creator of *exceptional* content!

  • @pablovivas5234 · a year ago

    Keep it up. Great content

  • @statquest · a year ago

    Thank you!

  • @felipevaldes7679 · a year ago

    I love this channel

  • @statquest · a year ago

    BAM! :)

  • @felipevaldes7679 · a year ago

    @@statquest lol, very on-brand too.

  • @isaacfernandez2243 · 11 months ago

    Dude, you don't even know me, and I don't really know you either, but oh boyy, I fucking love you. Thank you. One day I will teach people just like you do.

  • @statquest · 11 months ago

    Thanks! :)

  • @ian-haggerty · a month ago

    Entropy === The expectation of the surprise!!! I'll never look at this concept the same again

  • @statquest · a month ago

    bam! :)
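In symbols, with "surprise" defined as log(1/p(x)), as in the StatQuest entropy video, entropy is exactly the expected surprise:

$$H(X) = \mathbb{E}\!\left[\log\frac{1}{p(X)}\right] = -\sum_x p(x)\,\log p(x)$$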

  • @pranabsarmaiitm2487 · a year ago

    awesome!!! Now waiting for a video on Chi2 Test of Independence.

  • @statquest · a year ago

    I'll keep that in mind.

  • @Lynxdom · 7 months ago

    You got a like just for the musical numbers!

  • @statquest · 7 months ago

    bam!

  • @user-sn4ni3np8h · 4 months ago

    Two sigmas are like two for loops: for every index of the outer sigma, the inner sigma makes a complete iteration.

  • @statquest · 4 months ago

    bam!
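To make the two-sigmas-as-loops analogy above concrete, here is a minimal Python sketch (purely illustrative; the summand i * j is just a placeholder for f(i, j)):

    # Double sigma: sum over i of ( sum over j of f(i, j) ).
    # For every index of the outer sigma, the inner sigma
    # completes a full pass over its own index.
    total = 0.0
    for i in range(3):          # outer sigma
        for j in range(4):      # inner sigma: full iteration per outer index
            total += i * j      # placeholder for f(i, j)
    print(total)                # 18.0, i.e. (0+1+2) * (0+1+2+3)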

  • @wowZhenek · a year ago

    Josh, thank you for the awesome easily digestible video. One question. Is there any specific guideline about binning the continuous variable? I'm fairly certain that depending on how you split it (how many bins you choose and how spread they are) the result might be different.

  • @statquest · a year ago

    To learn more about one common way to create histograms of continuous variables, see: journals.plos.org/plosone/article?id=10.1371/journal.pone.0087357

  • @wowZhenek · a year ago

    @@statquest Josh, thank you for the link, but I guess I formulated my question incorrectly. The question was not about creating the histogram but about actually choosing the bins. You split your set into 3 bins. Why 3? Why not 4 or 5? Would the result change drastically if you split into 5 bins? What if the distribution of the variable you are splitting is not normal or uniform? Etc.

  • @statquest · a year ago

    @@wowZhenek When building a histogram, choosing the bins is the hard part, and that is what that article describes - a special way to choose the number and width of bins specifically for Mutual Information. So take a look. Also, because we are using a histogram approach, it doesn't matter what the underlying distribution is. The histogram doesn't make any assumptions.

  • @wowZhenek · a year ago

    @@statquest Oh yeah, I didn't look inside the URL you gave because you described it as "one common way to create histograms of continuous variables", which seemed very distant from what I was actually asking about. Now that I checked the link, damn, what a comprehensive abstract. Thank you very much!
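Following up on this thread, here is a minimal sketch of binning a continuous variable before computing Mutual Information (the heights are made up, and the choice of 3 equal-width bins is just an assumption mirroring the video; the linked PLOS ONE paper describes a principled way to choose the number and width of the bins):

    import numpy as np

    heights = np.array([150.1, 155.3, 160.2, 165.8, 170.4, 175.9, 180.0, 185.5])

    # Split the continuous variable into 3 equal-width bins.
    bin_edges = np.histogram_bin_edges(heights, bins=3)
    bin_ids = np.digitize(heights, bin_edges[1:-1])  # 0 = short, 1 = medium, 2 = tall

    print(bin_ids)  # each height is now a discrete bin label: [0 0 0 1 1 2 2 2]

The resulting bin labels can then be fed to a discrete Mutual Information routine like the one sketched under the video description above.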

  • @666shemhamforash93 · a year ago

    Amazing as always! Any update on the transformer video?

  • @statquest · a year ago

    Still working on it.

  • @murilopalomosebilla2999 · a year ago

    Excellent content as always!

  • @statquest · a year ago

    Much appreciated!

  • @buckithed · 4 months ago

    Fire🔥🔥🔥

  • @statquest · 3 months ago

    BAM! :)

  • @eltonsantos4724 · a year ago

    How cool! Dubbed in Portuguese.

  • @statquest · a year ago

    Thank you very much! :)

  • @user-yx5rj2jv2d · 9 months ago

    awesome

  • @statquest · 9 months ago

    Thanks!

  • @adityaagrawal2397 · 7 months ago

    Just started learning ML; I'm now assured that the journey will be smooth with this channel.

  • @statquest · 7 months ago

    Good luck! :)

  • @marahakermi-nt7lc · 10 months ago

    Thanks Josh 😍😍 At 1:30, since the response variable is not continuous and takes on 0 or 1 (yes/no), can we model it with logistic regression?

  • @statquest · 10 months ago

    Yep!

  • @ruiqili1818 · a month ago

    Your explanations are always awesome! I wonder how to explain Normalized Mutual Information?

  • @statquest · a month ago

    I believe it's just a normalized version of mutual information (scaled to be a value between 0 and 1).
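For reference, one common convention (an assumption here; the video doesn't define NMI) divides the mutual information by an average of the two entropies, so that two identical variables score 1 and two independent variables score 0:

$$\mathrm{NMI}(X;Y) = \frac{I(X;Y)}{\sqrt{H(X)\,H(Y)}}$$

Other variants divide by the minimum, maximum, or arithmetic mean of H(X) and H(Y); scikit-learn's normalized_mutual_info_score, for example, exposes this choice as an average_method parameter. Because the result is a ratio of information quantities rather than a share of group members, "7% of the information the variables could possibly share" is a closer reading than "7% of group members overlap".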

  • @Lara-qo5dc · a month ago

    This is great! Do you know if you can interpret a NMI value in percentages, something like 7% of information overlaps, or 7% of group members overlap?

  • @BorisNVM · 5 months ago

    this is cool

  • @statquest · 5 months ago

    Thanks!

  • @ian-haggerty · a month ago

    Seriously though, I think the KL divergence is worth a mention here. Mutual information appears to be the KL divergence between the actual (empirically derived) joint probability mass function, and the (empirically derived) probability mass function assuming independence. I know that's a lot of words, but my brain can't help seeing these relationships.
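In symbols, the identity described in the comment above is:

$$I(X;Y) = D_{\mathrm{KL}}\big(\,p(x,y)\;\|\;p(x)\,p(y)\,\big) = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}$$

which is exactly the double sum used in the video, so the two views agree.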

  • @statquest · a month ago

    One day I hope to do a video on the KL divergence.

  • @aleksandartta · a year ago

    1) On what basis do we choose the number of bins? Does a larger number of bins give less mutual information? 2) What if the label (output value) is numerical? Thanks in advance.

  • @statquest · a year ago

    1) Here's how a lot of people find the best number (and width) of the bins: journals.plos.org/plosone/article?id=10.1371/journal.pone.0087357 2) Then you make a histogram of the label data.

  • @dhanrajm6537 · 4 months ago

    Hi, what will be the base of the logarithm when calculating entropy? I believe it was mentioned in the entropy video that for 2 outputs (yes/no or heads/tails) the base of the logarithm will be two. Is there any generalization of this statement?

  • @statquest · 4 months ago

    Unless there is a specific reason to use a specific base for the log function, we use log base 'e'.

  • @viranchivedpathak4231 · 9 months ago

    DOUBLE BAM!!

  • @statquest · 9 months ago

    Thanks!

  • @AI_Financier · 5 months ago

    3 more things: 1) it would have been great if you could make a comparison with correlation here too, 2) discuss the minimum and maximum values of MI, and 3) the intuition behind this specific formula.

  • @statquest · 5 months ago

    Thanks! I'm not really sure you can compare Mutual Information to correlation because correlation doesn't work at all with discrete data. I mention this at 1:20.

  • @noazamstein5795 · 5 months ago

    is there a good and stable way to calculate mutual information for numeric variables *where the binning is not good*, e.g. highly skewed distributions where the middle bins are very different from the edge bins?

  • @statquest · 5 months ago

    Hmm... off the top of my head, I don't know, but I wouldn't be surprised if there was someone out there publishing research papers on this topic.

  • @archithiwrekar4021 · 10 months ago

    Hey, so what if our dependent variable (here, loves Troll 2) is continuous? Can we use Mutual Information in that case? By binning, aren't we just converting it into a categorical variable?

  • @statquest · 10 months ago

    You could definitely try that.

  • @RaviPrakash-dz9fm · a year ago

    Can we have videos about all the gazillion hypothesis tests available!!

  • @statquest · a year ago

    I'll keep that in mind.

  • @GMD023 · a year ago

    Off-topic question... but will ChatGPT replace us as data scientists/analysts/statisticians? I just discovered it tonight and it blew me away. I basically learned HTML and CSS in a day with it. I'm worried it will massively reduce jobs in our field. I did a project that would normally take all day in a few minutes... scary stuff.

  • @insomniacookie2315 · a year ago

    Well, if you really want his opinion, watch the AI Buzz #1 video Josh uploaded three weeks ago. It's on this channel. As for my opinion, obviously nobody knows yet, but it will soon be a new ground level for everybody. Those who can only do the basic things ChatGPT does far better are in danger; those who can create more value out of ChatGPT (or any tools to come) are in far better shape. Which do you think you and your fellow data scientists are? And even for the basic stuff, there should be at least someone to check whether ChatGPT has done some absurd work or not, right? Maybe at least for a few years or so.

  • @ayeshavlogsfun · a year ago

    Just out of curiosity, how did you learn HTML and CSS in a day? And what's the specific task that you solved?

  • @toom2141 · a year ago

    I didn't think ChatGPT was that impressive after all. It makes so many mistakes and is not able to do really complicated stuff. Totally overhyped!

  • @statquest · a year ago

    See: kzread.info/dash/bejne/nWeWm6-vpNecnLg.html

  • @GMD023 · a year ago

    @@statquest thank you! This is great. I'm also starting my first job today, post college, as a research data specialist! Your videos always helped me throughout my data science bachelor's, so thank you!

  • @romeo72899 · a year ago

    Can you please make a video on Latent Dirichlet Allocation

  • @statquest · a year ago

    I'll keep that in mind! :)

  • @6nodder6 · a year ago

    Is it weird that my prof. gave me the mutual information equation as one that uses entropy? We were given "I(A; B) = H(B) - sum_b P(B = b) * H(A | B = b)" with no mention of the equation you showed in this video

  • @statquest · a year ago

    That is odd. Mutual information can be derived from the entropies of two variables; it is the average of how the surprise in one variable is related to the surprise in the other. However, the formula shown in the video is the standard one. See: en.wikipedia.org/wiki/Mutual_information
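For reference, the standard decomposition that connects the entropy form to the double-sum form is (note that the leading term is H(A), not H(B); the version quoted above looks like a transcription slip):

$$I(A;B) = H(A) - \sum_b P(B=b)\,H(A \mid B=b) = H(A) - H(A \mid B) = H(A) + H(B) - H(A,B)$$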

  • @AI_Financier · 5 months ago

    maybe next video on this: KL divergence

  • @statquest · 5 months ago

    It's on the list.

  • @9erik1 · 8 months ago

    6:18 not small bam, big bam... thank you very much...

  • @statquest · 8 months ago

    BAM!!! :)

  • @harishankarkarthik3570 · a month ago

    The calculation at 8:27 seems incorrect. I plugged it into a calculator and got 0.32. The log is base 2 right?

  • @statquest · a month ago

    At 8:07 I say that we are using log base 'e'.

  • @Tufelkind · 2 months ago

    It's like FoodWishes for stats

  • @statquest · 2 months ago

    :)

  • @andrewdouglas9559 · a year ago

    It seems information gain (defined via entropy) and mutual information are the same thing?

  • @statquest · a year ago

    They are related, but not the same thing. For details, see: en.wikipedia.org/wiki/Information_gain_(decision_tree)

  • @andrewdouglas9559 · a year ago

    @@statquest Thanks, I'll check it out. And also thanks for all the videos. It's an incredible resource you've produced.

  • @liam_42 · a month ago

    Hello, that's a great video and it has helped me understand a lot about Mutual Information, as has your other video about entropy. I do have a question. At 11:13 the answer you get after the calculation is 0.5004, and it is explained that it is close to 0.5. However, when I do the math ((4/5) × log(5/4) + (1/5) × log(5)), the answer I get is 0.217322... Am I missing something? Because from what I understood, the closer you get to 0.5, the better, but this is not confirmed by my other examples. Is there a maximum to mutual information? Thank you for your video.

  • @statquest · a month ago

    The problem is that you are using log base 10 instead of the natural log (log base 'e'). I talk about this at 8:07 and in this other video: kzread.info/dash/bejne/m6merrBtaMrbnc4.html
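A quick worked check of both bases for the 11:13 calculation:

$$\tfrac{4}{5}\ln\tfrac{5}{4} + \tfrac{1}{5}\ln 5 \approx 0.8(0.2231) + 0.2(1.6094) \approx 0.5004$$

$$\tfrac{4}{5}\log_{10}\tfrac{5}{4} + \tfrac{1}{5}\log_{10} 5 \approx 0.8(0.0969) + 0.2(0.6990) \approx 0.2173$$

so the natural log reproduces the video's 0.5004, and log base 10 reproduces the 0.217322 in the question.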

  • @liam_42 · a month ago

    @@statquest Thank you for your answer. That explains a lot.

  • @rosss6989 · a month ago

    I have the same doubt: when both columns are equal it says the mutual info is 0.5, so what is the maximum value of mutual info, and in which scenario?

  • @yourfutureself4327 · a year ago

    i'm more of a 'Goblin 3: the frolicking' man myself

  • @statquest · a year ago

    bam!

  • @Chuckmeister3 · a year ago

    What does it mean if mutual information is above 0.5? If 0.5 is perfectly shared information...

  • @statquest · a year ago

    As you can see in the video, perfectly shared information can have MI > 0.5. So 0.5 is not the maximum value.

  • @Chuckmeister3 · a year ago

    @@statquest Is MI then somehow influenced by the size of the data or the number of categories? The video seems to suggest it should be around 0.5 for perfectly shared information (at least in this example). With discrete data using 15 bins I get some values close to 1. Thanks for these great videos.

  • @statquest · a year ago

    @@Chuckmeister3 Yes, the size of the dataset matters.

  • @yurigansmith · a year ago

    @@Chuckmeister3 Interpretation from coding theory (natural log replaced by log to base 2): Mutual information I(X;Y) is the number of bits wasted if X and Y are encoded separately instead of jointly encoded as the vector (X,Y). The statement holds on average and only asymptotically, i.e. for optimal entropy coding (e.g. an arithmetic encoder) with large alphabets (asymptotically for size -> oo). It's the amount of information shared by X and Y, measured in bits.

    Mutual information can become arbitrarily large, depending on the sizes of the alphabets of X and Y (and the distribution p(x,y), of course). But it can't be greater than the separate entropies H(X) and H(Y), i.e. the minimum of the two. You can think of I(X;Y) as the intersection of H(X) and H(Y).

    ps: I think the case of perfectly shared information is when there's a (bijective) function connecting each symbol of X to each symbol of Y, so that the relation between X and Y becomes deterministic. In that case H(X) = H(Y) = I(X;Y). The other extreme is X and Y being statistically independent: in that case I(X;Y) = 0.
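In symbols, the bounds described in this thread are:

$$0 \le I(X;Y) \le \min\big(H(X),\,H(Y)\big), \qquad I(X;X) = H(X)$$

so the maximum for perfectly shared information is the entropy of the variable itself (0.5004 in the video's example), not a universal 0.5.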

  • @AlexanderYap · a year ago

    If I want to calculate the correlation between Likes Popcorn and Likes Troll 2, can I use something like Chi2? Similarly between Height bins and Likes Troll 2. What's the advantage of calculating the Mutual Information?

  • @statquest · a year ago

    The advantage is that we have a single metric that works on continuous, discrete, and mixed variables, and we don't have to make any assumptions about the underlying distributions.

  • @AxDhan · a year ago

    small bam = "bamsito"

  • @statquest · a year ago

    Ha! :)

  • @rogerc23 · a year ago

    Ummm, I know I have a cold right now, but did anyone else only hear an Italian girl speaking?

  • @statquest · a year ago

    ?

  • @TommyMN · a year ago

    If I could I'd kiss you on the mouth, wish you did a whole playlist about data compression

  • @statquest · a year ago

    Ha! I'll keep that topic (data compression) in mind.

  • @FREELEARNING · a year ago

    Great content. But just don't sing, you're not up to that.

  • @statquest · a year ago

    Noted! :)

  • @VaibhaviDeo · a year ago

    i will fite you if you tell daddy stat quest what to do what not to do

  • @igorg4129 · a year ago

    I was always interested in how we should think if we want to invent such a technique. I mean, OK, let's say I "suspect" that probabilities should do the job, and say my goal is to get, at the end of the day, some "flag" from 0 to 1 that indicates the strength of a relationship. But how should I go on to decide what goes in the denominator vs. the numerator, when to use the log, etc.? There should be something like a "thinking algorithm". P.S. Understanding this would be very helpful in understanding the existing fancy formulas.

  • @statquest · a year ago

    I talk more about the reason for the equation in my video on Entropy: kzread.info/dash/bejne/i6iZxKmPqJCsqNo.html

  • @joshuasirusstara2044 · a year ago

    that small bam

  • @statquest · a year ago

    :)