The Fisher Information

The machine learning consultancy: truetheta.io
Want to work together? See here: truetheta.io/about/#want-to-w...
Article on the topic: truetheta.io/concepts/machine...
The Fisher Information quantifies how well an observation of a random variable locates a parameter value. It's an essential tool for measuring parameter uncertainty, a problem that repeats itself throughout machine learning and statistics. In this video, I explain the Fisher Information rigorously and visually, starting in the one-dimensional case and ending in the general case.
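To make the definition concrete, here is a minimal sketch (assuming NumPy; the normal model and numbers are illustrative, not from the video) that estimates the Fisher Information of a normal mean as the variance of the score at the true parameter:

    import numpy as np

    # For x ~ N(mu, sigma^2), the score is d/dmu log N(x | mu, sigma^2) = (x - mu) / sigma^2.
    # The Fisher Information w.r.t. mu is the variance of this score at the true mu;
    # analytically, it equals 1 / sigma^2.
    rng = np.random.default_rng(0)
    mu_true, sigma = 0.0, 2.0
    x = rng.normal(mu_true, sigma, size=100_000)  # draws from the true model

    scores = (x - mu_true) / sigma**2
    print(scores.var())  # ~0.25, matching 1 / sigma^2 = 1 / 4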
SOCIAL MEDIA
LinkedIn : / dj-rich-90b91753
Twitter : / duanejrich
Enjoy learning this way? Want me to make more videos? Consider supporting me on Patreon: / mutualinformation
Sources and Learning More
[1] provides a complete and deep explanation of the Fisher Information. It captures the abstract/general perspective while making the idea concrete with examples. As is typically the case, the Wikipedia article [2] was helpful. Also, section 8.2.2 of [3] explains the use of a theorem on the asymptotic normality of the MLE via the Fisher Information (stated after the references below), which I didn't cover here but which certainly informed how I think it connects to parameter uncertainty.
[1] Ly, A., Marsman, M., Verhagen, J., Grasman, R., & Wagenmakers, E.-J. (2017). A Tutorial on Fisher Information. Department of Psychological Methods, University of Amsterdam, The Netherlands.
[2] Fisher information, Wikipedia, en.wikipedia.org/wiki/Fisher_...
[3] Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York: Springer.
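For completeness, the asymptotic normality result referenced above, which holds under standard regularity conditions, with I(θ*) denoting the Fisher Information at the true parameter θ*:

    \sqrt{n}\,(\hat{\theta}_n - \theta^*) \;\xrightarrow{d}\; \mathcal{N}\!\left(0,\; \mathcal{I}(\theta^*)^{-1}\right)

In words: for large n, the MLE is approximately normal around the true parameter, with variance shrinking like the inverse Fisher Information divided by n.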

Comments: 230

  • @HowardMullings (2 years ago)

    I highly recommend jonathanpober's video on "Intro to Fisher Matrices" as a complement to this one. I feel you need this video and Jonathan's to make sense of this topic. The visuals of this video are intuitive, but Jonathan explains why the log of the likelihood is used and how the Taylor expansion of the log-likelihood relates to the Hessian.

  • @Mutual_Information (2 years ago)

    Yea I've seen that video - it covers the topic quite well. I agree it's also worth checking out. As much as I like my video, *really* understanding FI requires seeing it from a few angles. Also, this video is not comprehensive. So +1 to the recommendation

  • @howtoeasy5882 (10 months ago)

    Indeed, the Fisher Information can tell us what the Cramér-Rao bound is (stated just below). Researchers, like Dr. Ahmad Bazzi, use this to benchmark interesting signal processing estimators.
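    For reference, the Cramér-Rao bound mentioned here says that any unbiased estimator \hat{\theta} built from n i.i.d. observations obeys

        \operatorname{Var}(\hat{\theta}) \;\ge\; \frac{1}{n\,\mathcal{I}(\theta)}

    so the Fisher Information sets a hard floor on estimator variance.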

  • @Diego-nw4rt (2 years ago)

    This is one of the best math/statistics videos that I have ever watched, if not the best. I don't have a background in statistics, but I understood the intuition behind it, since your explanation and the tools you used make the topic easier to understand.

  • @Mutual_Information (2 years ago)

    Wow man that’s so nice! I’ll try to keep the good stuff coming!

  • @Boringpenguin (1 year ago)

    After all these years I have finally understood the intuition behind fisher information, thank you so much!

  • @definesigint2823 (3 years ago)

    I feel like...I almost grasp this / like I need to study more. The discomfort's just about right (i.e., not intimidating) and is a nice reminder to keep working.

  • @Mutual_Information (3 years ago)

    It’s one of the trickier topics I cover. I remember not getting this for the longest time, but eventually I had this visual in my head, which helped a lot. I think my stumbling block was the two roles of theta: it's both an evaluation point and the “true” data-generating value. It’s tricky! But if there’s something specific you aren’t sure of, feel free to ask.

  • @Stopinvadingmyhardware (10 months ago)

    MAKE tHINGS for people to steal and make money off of you!!!

  • @benvonhunerbein1865 (2 years ago)

    Really amazing video! Great step-by-step introduction of concepts. I also really like these movements across curves to give better intuition before revealing the solution. Thank you!

  • @sawmill035 (1 year ago)

    I had to pause the video every 5 seconds to re-listen to every phrase because it was just so dense with information (no pun intended). Thanks!

  • @alexmonfis9305 (2 years ago)

    Thanks man!! I'm doing a master's in Data Science and you just saved me for my test :) Great animations and clarity!

  • @faronray5903 (2 years ago)

    This is hands down one of the best math videos I've ever watched on YouTube. Thank you so much.

  • @Mutual_Information (2 years ago)

    What a compliment! Thank you, my intention is to keep it up.

  • @mrx42 (2 years ago)

    Best teacher ever! Keep up the good work! You've just made my day brighter.

  • @javierferrer450 (11 months ago)

    Great summary and well explained with motivational dynamic graphs. Thanks!

  • @user-jp6cc4qw2z (2 years ago)

    Brilliant explanation and graphics! One of the best Math Videos on YT. Every sentence is well thought out and carries information.

  • @Mutual_Information (2 years ago)

    Glad you noticed the script! :)

  • @rockapedra1130 (24 days ago)

    Fantastic educator! I've been avoiding learning this for quite a long time! Thx thx thx

  • @Mutual_Information (23 days ago)

    Happy to!

  • @lonamoch971 (3 years ago)

    When I saw the video cover, I was pumped for it. As expected, this was a fantastic intuitive explanation! Thank you

  • @Mutual_Information (3 years ago)

    Thank you very much!

  • @alanamerkhanov6040 (11 months ago)

    Best video on Fisher Information on the web! Thanks, thanks a lot.

  • @Mutual_Information (11 months ago)

    Appreciate it Alan!

  • @gordongoodwin6279 (2 years ago)

    This is by far the best video on Fisher Information and it's not even close. Hope you put out more videos

  • @heitorcarvalhopinheiro608 (3 years ago)

    Awesome work, DJ! Loved it. As some have said, you fill, in a superb way, the statistics gap left by 3b1b. I'm currently enrolled in a BSc in Statistics and Data Science in Brazil and would love to hear from you which books, in your opinion, were essential for you to build that knowledge. That would be a great video, btw: "Essential books for those who aspire to a career in Statistics and Data Science"

  • @Mutual_Information (3 years ago)

    Hey Heitor, appreciate the comment - glad you’re enjoying the channel. Regarding that vid, I probably won't make it, just because it’s not in line with my style of video, which is all mathematical concepts. But! That doesn’t mean I won’t provide that info. I can tell you directly that my absolute favorite, most influential books are: 1) Machine Learning: A Probabilistic Perspective, by Kevin Murphy. This was a huge book for me. Super important. It covers so much in really good depth, and there is a second-edition draft free online too. It’s my absolute favorite book. 2) The Elements of Statistical Learning. This is a classic, written by some titans of the field. Every page read is a worthwhile investment. 3) Deep Learning by Goodfellow, Bengio, and Courville. Excellent book for navigating the Wild West of deep learning. Great intuition and very well written. Those are the big ones for me.

  • @nabibunbillah1839 (1 year ago)

    @@Mutual_Information Thanks so much!

  • @lichungtsai (2 years ago)

    Hey, guy. It’s the best video about Fisher Information ever!

  • @yessirge (1 year ago)

    Shout out to you man, I can really tell how much thought went into the didactic decisions of this video. Thank you so much!

  • @Mutual_Information (1 year ago)

    And thank you for watching!

  • @rohansinghthelord (3 months ago)

    Going to grad school for ML and realized I need to brush up on stats; this helps a lot!

  • @Mutual_Information (3 months ago)

    Nice! Excellent choice in grad ;)

  • @jpap676 (2 years ago)

    An impressive video. The quality of your visualizations is very high. Thank you for the insights.

  • @tejasvichannagiri2490 (2 years ago)

    Great video, very clear and easy to follow, but precise also. Thanks!

  • @CristalMediumBlue (10 months ago)

    Thank you very much for sharing this valuable information. I am planning a binge watch on your channel in the next months.

  • @karanshah1698 (3 years ago)

    The vagueness of the goal. Finally. Someone I can relate to.

  • @TuemmlerTanne11 (3 years ago)

    Impressive work! You should have way more subscribers, because you deserve it and for the people's sake ;) I feel quite privileged to have found your channel so early. Keep up the good work, already looking forward to the next video!

  • @Mutual_Information (3 years ago)

    Thank you very much! I’m getting a good response, so I think the growth is on its way. I appreciate the support!

  • @semduvida3243 (3 years ago)

    Don't stop doing videos, your work is amazing!

  • @Mutual_Information (3 years ago)

    Ha don't worry, I have no plans on stopping

  • @user-wr4yl7tx3w (3 months ago)

    Regardless of the number of views, given the nature of the subject, content like this is a great service and will be relevant for years to come.

  • @piergiorgiolanza4572 (2 years ago)

    Excellent video that helped me to grasp this concept quickly and in a neat way. Thank you

  • @xLyndo (2 years ago)

    You can really tell a lot of time and effort went into this. Thanks a lot. Definitely subscribing and am looking forward to more videos.

  • @Mutual_Information (2 years ago)

    Thank you! Comments like these mean a lot

  • @karanshah1698 (3 years ago)

    Was tired of seeing many takes on this topic, and randomly decided to give this video a shot. Just as #define SIGINT 2 mentioned, this is on the verge of comfort-discomfort! Neither crossed the brain fully nor did it intimidate... I'll watch this on repeat to digest bit by bit. Great work!

  • @Mutual_Information (3 years ago)

    Happy to hear it! If there’s something specific you don’t quite understand, feel free to ask.

  • @karanshah1698 (3 years ago)

    @@Mutual_Information Really appreciate the content quality. Can you relate the three concepts (the Fisher Information, the Hessian, and KL divergence) with visuals like these? Edit: There also happens to be a misplaced usage of empirical vs. non-empirical Fisher. Can you touch upon that as well?

  • @DuaneRich321 (3 years ago)

    @@karanshah1698 These concepts come together nicely with an explanation of natural gradient methods. I can try to cover all those in that video.

  • @smartboyvijey (2 years ago)

    Your videos are amazing. Looking forward to more of your videos in Information Theory.

  • @isobarkley (4 months ago)

    you're so passionate, engaging, and a talented educator. thanks for all of your content, new and old :)

  • @Mutual_Information (4 months ago)

    Thank you, much appreciated :)

  • @abhishek.goudar (2 years ago)

    The plot at 3:30 nailed the idea for me! Thanks!

  • @hdryesiltepe1844 (2 years ago)

    Amazing job, DJ! It is very intuitive and the visualizations are on 🔥. Can I kindly ask which visualization tool you use? Thank you.

  • @Mutual_Information (2 years ago)

    Thanks Hidir! I use Altair (altair-viz.github.io/getting_started/overview.html), a Python plotting library similar to matplotlib. Then I have a personal library I use to stitch the pictures into videos.

  • @tchedoumenou1165 (10 months ago)

    I'm constantly feeling like you're going to announce some exciting news, man! Great content btw.

  • @ashwinkotgire2303 (1 year ago)

    Man, you just paved a concrete road to my future. Thanks!

  • @Manu-gy6tq (2 years ago)

    As a stats student: Thank you so much - amazing explanation!

  • @spitfirerulz (2 years ago)

    Hey, thanks! I came here from MITx 18.6501x. That bit in the middle about the highly correlated 2-D case filled in the missing intuitive link for (sort of) grasping why Fisher Information matters. And I can now also see why it is used in the Jeffreys prior. The water-bending hand gestures are a bonus. Cheers.

  • @Mutual_Information (2 years ago)

    lol I may chill out the hand gestures. I'm still getting my YouTube legs. And I'm going to do a separate video on the Jeffreys prior. That's a tricky one to understand.

  • @littlebigphil (3 years ago)

    This video, combined with thinking about performing gradient ascent, was helpful. Our objective is to maximize the likelihood of our current parameterization given our samples. Maximizing the log-likelihood is similar to maximizing the likelihood, but with a harsh loss for outliers. The score uses this loss to perform gradient ascent: larger scores give larger step sizes. Because the score at the optimal value is 0, for any score to be large, there must have been an interval where the slope of the score was also large. All of that gives the average (negative) Hessian of the log-likelihood.

  • @Mutual_Information (3 years ago)

    Interesting stuff! This reminds me of natural gradient methods, which I’ll be covering later on.

  • @kimchi_taco (10 months ago)

    You are the best DJ I've ever listened to

  • @toobabb3613 (2 years ago)

    Thank you, I wish I watched this video before searching lots of articles.

  • @anonymousalligator7500 (2 years ago)

    Bro, this is top-notch education. You are an Educator, man.

  • @Mutual_Information (2 years ago)

    Thanks brotha

  • @junninghuang4343 (2 years ago)

    Nice video; I hadn't thought about the likelihood, score function, and Fisher Information matrix in this way. Very intuitive and straightforward. The nicest part is that the visualization is based on evaluation at the true parameter, which explains the tricky identity of the expected likelihood gradient. One minor suggestion: because I learned the FIM before, from my knowledge and the wiki, the FIM is the variance of the log-likelihood gradient evaluated at any parameter theta, but in your video you stated that the FIM is evaluated at the true parameter.

  • @Mutual_Information (2 years ago)

    Yea, the FIM can refer to the function, as you say. But in that case, the inputted parameter still acts like the true parameter, because it’s only at the true parameter that the expected gradient is zero, and that’s always true of the FIM, regardless of the given theta. Still, I see your point: the terminology is better applied to the function than to the matrix of values.
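    A small sketch of the "function form" discussed in this thread (my own, assuming NumPy; a normal-mean model for illustration): at any input theta, the data are drawn at that same theta, which is the sense in which the input still acts like the true parameter:

        import numpy as np

        def fisher_info_mu(mu, sigma, n_samples=200_000, seed=0):
            # Variance of the score, under data generated at the same mu
            rng = np.random.default_rng(seed)
            x = rng.normal(mu, sigma, size=n_samples)  # x ~ p(. | mu)
            score = (x - mu) / sigma**2                # d/dmu log N(x | mu, sigma^2)
            return score.var()

        print(fisher_info_mu(0.0, 1.0))  # ~1.0, i.e. 1 / sigma^2
        print(fisher_info_mu(5.0, 1.0))  # ~1.0 again: independent of mu for this model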

  • @junninghuang4343 (2 years ago)

    @@Mutual_Information Exactly. From my side, the statement that the FIM is evaluated at the true parameter could be misleading to someone; that's why I suggest keeping the function form of the FIM in mind, which is more mathematically rigorous. And yes, you are right: the fact that the expected gradient is zero indicates the true parameter and the true FIM. Anyway, thanks for the nice visualization; making such a nice video takes far more effort than writing a blog, I guess. Salute!!!

  • @Mutual_Information (2 years ago)

    Glad to have your comments here, Junning. Please stick around! :)

  • @SO-wg4yb (1 year ago)

    How great is this video! Thank you for making this great content. Please keep up the great work :)

  • @Mutual_Information (1 year ago)

    No plans on stopping :)

  • @alexanderk5835 (2 years ago)

    thanks, such a great explanation video with an amazing visualisation

  • @kirankulkarni2396 (2 years ago)

    Excellent explanation. Thank you very much!!

  • @psl_schaefer (6 months ago)

    Amazing Video! Thanks for taking the time to produce such awesome content :)

  • @aiart3453 (3 years ago)

    Fantastic work, mate. I added you on LinkedIn to get your help one-on-one. Thank you for the video. Can't get better.

  • @dayibey9700 (1 year ago)

    Woow! Very good explanation with useful, spot-on visuals. It will surely help with developing intuition about this sophisticated concept. Subscribed to see you keep up such good work.

  • @Mutual_Information (1 year ago)

    Fortunately I have zero plans of slowing down

  • @sdsa007 (11 months ago)

    Wow! the visuals are even better than on Ian Explains...

  • @visualish (2 years ago)

    That was fantastic, thank you. keep up the good work

  • @douglasmason6067 (2 years ago)

    Amazing work!

  • @zactron1997 (2 years ago)

    Came from mCoding's shout-out. Nice video!

  • @ZarakJamalMirdadKhan (2 years ago)

    Can you explain the Hessian matrix and the multidimensional expression of the FI in more detail? Please. P.S.: I'm saving the playlist. Your visuals make the econometrics concepts so easy. Thanks a lot.

  • @stathius (1 year ago)

    First of all, thanks a lot for taking the time to create such a great visual explanation; a very refreshing way of presenting things! I was wondering: if we are not at the true μ, is the variance of the scores not called the Fisher Information anymore? Because in real life we usually aren't aware of the true μ anyway.

  • @Mutual_Information (1 year ago)

    Yes! Frequentist statistics has this radical.. irrelevance for that reason. Yet it doesn't stop people from using the MLE as a plug-in for the "true parameter" and charging forward as though there's no issue :)

  • @alessiotonello9666 (1 year ago)

    Amazing video! Thank you so much!

  • @HelloWorlds__JTS (4 months ago)

    (8:12) I think you neglected to change the plot labels, since they are no longer for normal distributions. Thanks for this video; it's a great effort!

  • @Mutual_Information (4 months ago)

    Ahhhh yes, good point. Oh well, sounds like you knew what I was going for

  • @randynasty6036 (2 years ago)

    you are the next 3blue1brown. Very elegant animation and super interpretative explanation!

  • @Mutual_Information (2 years ago)

    Lol that is quite a high bar; I'll be happy with way less than that. But thank you, it means a lot that this effort gets noticed.

  • @marcuschiu8615 (2 years ago)

    This video helped a lot!!! What software did you use to create those visuals at 6:14? My previous comments were deleted, probably since I wanted to share a website that recreated an interactive visualization of the Fisher Information (WIP)

  • @Mutual_Information (2 years ago)

    Hey Marcus, glad it helped. The visuals are created with Altair, which creates static plots (like Matplotlib). Then I use a personal library to stitch them together into vids.

  • @njitnom (2 years ago)

    Hello! At 6:03, when you start your intuition, you zoom in on a value of a single score function, right? So when there is only one observation, a positive value recommends shifting mu to the right, and a negative value recommends shifting mu to the left. So in the high-variance case, more scores are closer to zero, but isn't it also the case that the low-variance case recommends more extreme, differing shifts? Because some of those score functions are much more negative and some others are much more positive, therefore recommending a huge shift to the right and to the left, in contrast to the high-variance score functions. If this is correct, how come the high-variance case, and not the low-variance one, still provides a bigger set of possible mu values?

  • @Mutual_Information (2 years ago)

    Hm, let me try to clarify. In the high-variance case, the scores would have large magnitudes… so if you wanted to increase the log likelihood by 1, you wouldn't have to move far at all (in either the left or right direction). If it's the low-variance case, then you get the "wildly different recommendations" as to where mu is. I think you might be getting a smidge confused on low variance vs. high variance: low variance means scores are like -0.001, 0.002, -0.001, 0.0005; in the high-variance case, the numbers would be like 10.2, -12.4, 8.7, … Hopefully that helps
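    A quick numeric check of the two cases (my own sketch, assuming NumPy): a narrow data distribution yields scores with large magnitudes (high score variance), while a wide one yields scores near zero (low score variance):

        import numpy as np

        rng = np.random.default_rng(1)
        mu_true = 0.0
        for sigma in (0.5, 5.0):
            x = rng.normal(mu_true, sigma, size=5)  # five observations
            scores = (x - mu_true) / sigma**2       # one score per observation
            print(sigma, np.round(scores, 3))
        # sigma=0.5 gives scores on the order of +/-2 (score variance 1/0.25 = 4);
        # sigma=5.0 gives scores on the order of +/-0.2 (score variance 1/25 = 0.04).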

  • @njitnom (2 years ago)

    @@Mutual_Information Oohhh oops, by high variance I meant low variance, sorry about that. I'll try to rephrase my question :D. It's very visual, the way I formulate it; I hope you understand. The idea is to draw, for each of the 2 variance cases, its true CDF on a (-inf, inf) x [0,1] plane. Then a random sample of n is created by taking a random sample of n from Unif(0,1) and looking at their images. Then make a third axis (dlog p) that shows the score functions of each of these data points. Then, if I'm correct, the distribution of score functions can be derived by finding the intersection of these score functions with the [0,1] x [dlog p] plane, evaluated at a certain mu in (-inf, inf). And in this case this distribution approximates a normal distribution when the random sample tends to infinity, right? Is the reason that, in the low-variance case, the variance of the distribution of scores evaluated at the true parameter value is higher than that of the high-variance case, the following: when we take one data point from the Unif, look at the corresponding high-variance data point and low-variance data point, and fix the plane at the true parameter value, the intersection point of the low-variance data point is guaranteed to be closer to zero than that of the high-variance case? And because this holds for all data points, the distribution of score functions of the low-variance case has a higher variance. If so, do you know why this is guaranteed to happen? Why are the slopes of the low-variance score functions sufficiently small to guarantee this? Sorry for the long text :D

  • @shskwkfvekqlevjwkwv (1 year ago)

    so awesome, thanks for the effort!

  • @guillermosainzzarate5110 (11 months ago)

    Wow, you really should make a video about statistical manifolds. Thanks for your videos, they are really amazing!!

  • @Mutual_Information (11 months ago)

    Not a bad idea..

  • @brandomiranda6703 (2 years ago)

    One thing I noticed is that a high Fisher Information could be used to select the true parameter (or between different models, NNs, architectures, functions, etc.)... but it must be super easy to artificially construct a function such that, for a given data set, the Fisher Information is extremely high (and the gradient w.r.t. w is zero, of course)... but will that work well on the test set? It seems that in the end the Fisher Information is a nice heuristic for choosing a model (if it's easy to compute, which I doubt, since it depends on the Hessian; the variance of the scores should be fine to compute, I hope), but the validation set (and the test set, without cheating) are the "ultimate truth".

  • @jinyunghong (1 year ago)

    Thank you so much for your great explanation!

  • @Mutual_Information (1 year ago)

    You are very welcome Jinyung!

  • @mohitwankhede9372 (2 years ago)

    You are fire.. 🔥 You explained this in a much easier way

  • @chen-yuwei8793 (7 months ago)

    Thanks for the great video! I wonder why the "log" is in front of the density function. I mean, if I replace all log P by just P, does the quantity still make sense?

  • @daughterofunicorns3873 (2 years ago)

    That was really beautifully explained, thank you very much :) However, what I would like to know is (at 5:10): why is the mean of the score functions going to be 0 at the true value of mu?

  • @Mutual_Information (2 years ago)

    Glad you enjoyed it! The way I like to think about it is this. Say we're dealing with the score function of one observation at one parameter value. If the score is positive, that's saying you could increase the log prob by moving in one direction (the definition of a slope). If it's negative, it's saying you could increase it by moving in the other direction. But what if we had a set of data? Then you look at the average: if the average is positive or negative, you can increase the average log prob by moving in one direction. But what if the data is generated from the true parameter and you are evaluating at the true parameter? We already know we are maximizing the (expected) likelihood at this point, so the average score can't be anything other than zero. If it were, it would be recommending a way to change the parameter to increase the likelihood. But that's not possible - we're at the max!
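    In symbols, the standard one-line argument (assuming the regularity needed to swap differentiation and integration):

        \mathbb{E}_{x \sim p(\cdot \mid \theta^*)}\!\left[ \partial_\theta \log p(x \mid \theta) \big|_{\theta^*} \right] = \int p(x \mid \theta^*)\, \frac{\partial_\theta p(x \mid \theta)\big|_{\theta^*}}{p(x \mid \theta^*)}\, dx = \partial_\theta \int p(x \mid \theta)\, dx \,\Big|_{\theta^*} = \partial_\theta\, 1 = 0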

  • @heinrichvandeventer357 (2 years ago)

    I paused at 25 seconds in and nearly choked to death on my coffee. I like the content. Keep it up! :)

  • @Mutual_Information (2 years ago)

    Lol as long as you’re not in fact dead, I’ll take it as a compliment!

  • @jadecheng7483 (1 year ago)

    Hi, beautiful video! I wonder if I could ask what tools you used to make the first PDF plots, where you compared log(N(x|mu, 25)) to log(N(x|mu, 1))? They looked so pretty, as the intensity of the colour also indicates the density of lines.

  • @Mutual_Information (1 year ago)

    Thanks! I use Altair, the python plotting library. It's for static plots and I use a personal library to convert them into short videos.

  • @jadecheng7483 (1 year ago)

    @@Mutual_Information thanks!!

  • @eduardodiaz5459 (2 months ago)

    Good video. Which software do you use to make the math animations???

  • @radityadanu (2 years ago)

    WOW! JUST WOW! This is the clearest information about Fisher Information! Wait, that sounds redundant... But anyway, it is the best video about this topic!

  • @PiyushVerma-em6wq (11 months ago)

    Question: can I consider two normal distributions as distributions from two different ML models (as if we are trying to compare which model has the highest Fisher Information)?

  • @Mutual_Information (11 months ago)

    Yea, that's the idea here.

  • @nikoskonstantinou3681 (3 years ago)

    Keep up this great job! One day your channel will be big... I can sense it from the amazing quality of your videos and your passion for the subject!

  • @Mutual_Information (3 years ago)

    Thank you! That means a lot. These early days will be a bit of a slog, but I'm confident there's an appetite for this level of detail.

  • @abdullahsheriff_ (14 days ago)

    This. Is. Beautiful. Intuition 100.

  • @waterseethrow9481 (1 year ago)

    Great video!! Amazing animations! Is there any way to quantify the Fisher information? Is there any rule of thumb?

  • @Mutual_Information (1 year ago)

    The best we can do is to substitute the MLE for the true parameter, and then we can start working with numbers. But that version of the FI can disappoint: not being at the true parameter means several of the properties we like so much don't technically apply.
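    A minimal sketch of that plug-in idea (my own, assuming NumPy; a normal-mean model for illustration): estimate mu with the MLE, then evaluate the scores there instead of at the unknown true mean:

        import numpy as np

        rng = np.random.default_rng(2)
        sigma = 2.0
        x = rng.normal(3.0, sigma, size=10_000)  # the "true" mean 3.0 is unknown in practice

        mu_hat = x.mean()                         # MLE of mu when sigma is known
        scores_at_mle = (x - mu_hat) / sigma**2   # scores at the plug-in estimate
        print(scores_at_mle.var())                # ~0.25, estimating I(mu*) = 1 / sigma^2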

  • @orsike192 (10 months ago)

    @Mutual_Information In case this lecture is difficult to understand, which books and/or videos would you suggest I read/watch before rewatching this video? Thanks

  • @Mutual_Information (10 months ago)

    The Elements of Statistical Learning covers this topic well. I forget which chapter exactly, but it should be easy to find. If you're interested in learning about the whole field and you're relatively new, there's a related book, "An Introduction to Statistical Learning", from an overlapping group of authors.

  • @des6309 (1 year ago)

    amazing stuff thanks!

  • @6DAMMK9 (1 month ago)

    I come from the AI art community. MSc in CS here, but not a math pro. I was stunned that "Fisher merging" was just a single line of an equation. Now I know what the "Fisher" under the hood is 😂

  • @HuyNguyen-fp7oz (3 years ago)

    Great! I hope you will keep the high standard of your videos, like your great answers on Quora.

  • @Mutual_Information (3 years ago)

    Ah a Quora reader! Glad to see you made it over here. And will do!

  • @kalebbennaveed3704 (7 months ago)

    I have a slightly different question. What software tool do you use to create animated plots?

  • @Mutual_Information (7 months ago)

    Altair, to create images of static plots, and then I paste them together with a little library I've written.

  • @GumRamm (1 year ago)

    Great video and explanation! One thing that wasn’t clear to me was that we take the expectation over theta*, where throughout the video we treated it as an unknown but fixed variable. How would one take the expectation when theta* is fixed or has an unknown distribution?

  • @Mutual_Information (1 year ago)

    In practice, you can't. That's why this is a little weird. In practice, you have to substitute in some estimate for the true parameter, and that's where a lot of the nice properties fall away. But, when we're speaking theoretically, we can do whatever we want! Like talk about a fixed, true parameter and derive results using it. Think of this video as making this statement: If you knew the true parameter value, you'd get this nice thing (the FI matrix) which tells you how certain you should be about estimates of the true parameter. That's a weird statement to make! But, it's a mathematical fact. People will utilize it in practice by substituting estimates in and hoping the math still holds.. well enough.

  • @outtaspacetime (1 year ago)

    I struggled a bit with the part on the covariance matrix, but I feel like I could get it if I did some hard math on it with some numerical examples, with the intuition of this video in my mind! Thanks, this was really helpful.

  • @Mutual_Information (1 year ago)

    The covariance matrix is a tricky concept. Took me a while to get used to.

  • @amaarquadri (22 days ago)

    Does the statement at @5:03 still apply in the case of bimodal probability distributions?

  • @a_alex_l2041 (1 year ago)

    Wow, great, it really helped!

  • @lukasstein6231 (2 years ago)

    Is there a paper that goes into more depth on those beautiful illustrations? (for example at 8:11)

  • @Mutual_Information (2 years ago)

    Thanks! And, to answer your question: no, not that I'm aware of. When I first learned them, this is what I had in my head. It was the only way for me to make sense of it.

  • @BiologyIsHot (2 years ago)

    Can you do a video on canonical correlation analysis (CCA)? I get PCA but can't wrap my head around CCA and there aren't any great videos on it.

  • @Mutual_Information (2 years ago)

    Maybe one day, but for now I have no concrete plans for it. Do you know of any cases where it is used in real applications? I've only come across it in textbooks.

  • @pluviophilexing2580 (1 year ago)

    Thank you so much 😘very intuitive

  • @xingyanglan6836 (2 years ago)

    Sometimes I wonder if my professors on Zoom get curious and go see what video presentations on YouTube look like, and feel a lil sad deep down

  • @marcinelantkowski662 (1 year ago)

    Like all the other videos, this one provides a great explanation, but tbh a key piece is missing: why would we ever care about the FI? When is it useful? Why is it popular? What problem can it solve for me? E.g., I already knew that if I want to measure some quantity, it's better if the underlying random variable has a low stdev instead of a large one :D

  • @Mutual_Information (1 year ago)

    Good point!

  • @njitnom (2 years ago)

    HELLO!!!!!! At 7:56, how can there be a non-degenerate distribution of the 2nd derivatives if it's always -1? How are these distributions derived? THANK U SO MUCH SIR FOR UR NICE VIDEOS U HELP ME LOTS LOTS LOTS!!!12!!!

  • @Mutual_Information (2 years ago)

    Yea, good observation. These are merely estimated distributions, using a kernel-based density estimation method on some samples. So it's approximating the truth, which is, as you mention, a point mass at -1. There's a little note that flashes which mentions this.

  • @zoesoohyunlee7209 (3 years ago)

    Love the visualization and clear explanation!! Finally, I can understand intuitively the log-likelihood function and Fisher information matrix. Thank you so much for creating this video!! One small thing I'd like to mention: I really enjoyed the liveliness of your explanation but found the hand gestures a bit eye-catching while I was trying to concentrate on the written information on the left. Maybe a slower movement could help?

  • @Mutual_Information (3 years ago)

    I haven’t heard feedback like this before - very useful. Did not think of that but totally makes sense. I’ll try to chill the hands out next time. I’ve already recorded a few vids without this feedback, but the ones beyond that should reflect that. Thanks for the advice!

  • @xiaoweidu4667 (3 years ago)

    well done!

  • @pengbo87 (5 months ago)

    thanks for making the world better

  • @marceloenciso6665 (2 years ago)

    Fucking genius! Keep going this way; this kind of unique material focused on intuition helps more than you can imagine.

  • @Mutual_Information (2 years ago)

    Thank ya - more coming!

  • @AliRaza786 (2 years ago)

    Man you are amazing. Keep doing the good work.

  • @Mutual_Information (2 years ago)

    Thanks, I will!

  • @brandomiranda6703 (2 years ago)

    Btw, why do you say frequentist is a bad term? Isn't that what nearly 100% of deep learning is nowadays?! Thanks for the video! Seems you have a legit channel. :)

  • @Mutual_Information (2 years ago)

    Thank you! Very happy to have you as a viewer. To answer your question: from my very narrow view of the whole DL space, no, I don't think it relies heavily on frequentist statistics. Sure, p-values are reported sometimes (though I can't recall seeing them recently) in statistical analyses of DL model performance, but the models themselves don't share the most defining assumptions of frequentist statistics. I don't see anyone speculating that there are some 'true' parameters of a DL architecture. One reason in particular is that we know we almost always arrive at some local optimum, so we could never arrive at those 'true' parameter values. I think ideas from frequentist stats are treated more like a buffet. Some things get borrowed (the Fisher Information), but no one is subscribing to the whole of frequentist stats... and that's because it wouldn't be effective.

  • @fuziyang7502 (5 months ago)

    I am quite confused about the first figure: why is the x axis mu? I can't quite understand it. Because there are several different values at the same mu, what does that mean? Does it mean the x_i?

  • @Mutual_Information (5 months ago)

    Yes, that's the unusual aspect of these graphs. Each curve corresponds to a specific x_i (like x_i = 5.8121...). In particular, the whole curve gives you the log likelihood for this x_i value as the *parameter, mu, is varied*. And we create many of these curves. Now if we're considering a specific mu value (like mu = 6), that's like considering a vertical line and all the points where it intersects the curves. This answers: "what are the log likelihoods for all the data points if mu is chosen as 6?" Does this help?
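    A rough stand-in for that figure (my own sketch using matplotlib and SciPy rather than the video's Altair pipeline): one log-likelihood curve per data point, each a function of mu, with a vertical line picking out a specific mu:

        import numpy as np
        import matplotlib.pyplot as plt
        from scipy.stats import norm

        rng = np.random.default_rng(3)
        x_data = rng.normal(6.0, 1.0, size=20)  # one curve per sampled x_i
        mu_grid = np.linspace(2, 10, 300)

        for x_i in x_data:
            # log N(x_i | mu, 1) as a function of the parameter mu
            plt.plot(mu_grid, norm.logpdf(x_i, loc=mu_grid, scale=1.0), alpha=0.3, color="C0")

        plt.axvline(6.0, color="k", ls="--")  # "considering a specific mu value"
        plt.xlabel("mu"); plt.ylabel("log p(x_i | mu)")
        plt.show()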

  • @brandomiranda6703 (2 years ago)

    What textbook do you have there at the beginning of the video?

  • @Mutual_Information (2 years ago)

    Probability Theory by Edwin Jaynes - a classic!

  • @brandomiranda6703 (2 years ago)

    @@Mutual_Information Thanks! Cool channel btw! It's appreciated. Hope to see more of your stuff! I like your conceptual approach. Too many people are either too informal or too rigorous. Thanks!

  • @scar6073 (1 month ago)

    Bro literally has Jaynes's book on the desk and is talking about frequentist ideas lol

  • @jacopobandoni4858 (1 year ago)

    What does it mean (at 3:49), "if we plug mu naught in we'll get back a big list of numbers"? And then a Cartesian plane appears with log(p) on the x axis. I'm having trouble really understanding what is going on: what kind of distribution is it? What list of numbers is he referring to? Thank you to anyone who might help me understand.

  • @Mutual_Information (1 year ago)

    Hey Jacopo, maybe I can help. The idea is to imagine many functions, each associated with a different data point that was generated by sampling from the true distribution. These functions are likelihood functions which accept a parameter, mu, as input. The output is "log p", the log probability of the data point according to the parameter. Since we have many functions, we can plug mu into all of them, giving us a "big list of numbers": the outputted values from all the functions. Also, the 'kind' of distribution is a normal distribution, but that doesn't matter; this would work with any distribution.
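    And the "big list of numbers" in code form (a tiny sketch of my own, assuming NumPy and SciPy): one log probability per data point for a single plugged-in mu:

        import numpy as np
        from scipy.stats import norm

        rng = np.random.default_rng(4)
        xs = rng.normal(6.0, 1.0, size=10)  # each x_i carries its own likelihood function of mu
        mu_0 = 6.0
        print(norm.logpdf(xs, loc=mu_0, scale=1.0))  # the list: log p(x_i | mu_0) for every i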