Understanding Shannon entropy: (1) variability within a distribution

In this series of videos we'll try to bring some clarity to the concept of entropy. We'll specifically take the Shannon entropy and:
* show that it represents the variability of the elements within a distribution, i.e. how different they are from each other (a general characterization that works across all disciplines)
* show that this variability is measured in terms of the minimum number of questions needed to identify an element in the distribution (link to information theory)
* show that this is related to the logarithm of the number of permutations over large sequences (link to combinatorics)
* show that it is not in general coordinate independent (and that the KL divergence does not fix this)
* show that it is coordinate independent on physical state spaces - classical phase space and quantum Hilbert space (that is why those spaces are important in physics)
* show the link between the Shannon entropy and the Boltzmann, Gibbs and von Neumann entropies (link to physics)
Most of these ideas are from our paper:
arxiv.org/abs/1912.02012
which is part of our bigger project Assumptions of Physics:
assumptionsofphysics.org/
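
For reference, the quantity discussed throughout the series and the comments below is H(p) = -Σ_i p_i log p_i. A minimal numerical sketch of the "variability" reading (the function name and the example distributions are illustrative, not taken from the video):

    import math

    def shannon_entropy(probs, base=2):
        # H = -sum_i p_i * log_b(p_i); zero-probability terms contribute nothing
        return -sum(p * math.log(p, base) for p in probs if p > 0)

    # Uniform over 4 outcomes: maximal variability among them -> 2 bits
    print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0
    # Sharply peaked distribution: almost no variability -> ~0.24 bits
    print(shannon_entropy([0.97, 0.01, 0.01, 0.01]))

The more evenly spread the distribution, the larger the value, matching the "variability" characterization in the first bullet above.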

Comments: 59

  • @gcarcassi (3 years ago)

    DISCLAIMER: If you see ads, these are put in by YouTube; I do not get any money from them, YouTube does. I'd like to turn them off, but it seems it's out of my control!

  • @jacoboribilik3253 (3 months ago)

    Great video. I am looking forward to watching the rest of the videos on information theory.

  • @koppanydodony8769 (10 months ago)

    Finally a great video on the topic. Thank you, I have been searching for this for a long time.

  • @gcarcassi (10 months ago)

    Thanks!

  • @AlbertFlor (2 years ago)

    Incredibly clear and insightful explanation, thank you!

  • @gcarcassi (2 years ago)

    Thanks!

  • @MikhailBarabanovA (3 years ago)

    What a clean explanation!

  • @gcarcassi (3 years ago)

    Thanks!

  • @hariprasadyalla (4 months ago)

    Wonderful explanation. I struggled for a day to fully comprehend how the logarithmic part of the entropy formula works when the probabilities are not exactly powers of a base b. The third property in this explanation made my day. Thank you so much.

  • @gcarcassi (4 months ago)

    Thanks for sharing!

  • @alijassim7015 (3 years ago)

    It's the first time I hear that there are different types/definitions of entropy... I always found entropy to be a challenging concept in itself; I think studying these different definitions might actually help me understand the concept better... Thanks!

  • @gcarcassi (3 years ago)

    I have seen people, in fact communities, talk past each other because they have different definitions in their heads... To me, the confusion all comes from there. The test is explaining it to others and seeing if it helps them too. So, if at the end of the series you feel it actually helped, then I am going in the right direction... :-)

  • @rmorris5604 (1 year ago)

    There aren't different concepts of entropy. The chemical entropy is the Shannon entropy of a physically defined information: the entropy used by chemists is computed from the probability distribution of the possible microstates for a given macrostate. When Claude Shannon was looking at communication theory he came up with his equation for the entropy of a communication process. He was wondering what to call it, and a friend who knew the equations of chemistry said he should call it entropy because it's the same thing. He also said, tongue in cheek, that there was another reason to call it entropy: because no one will understand it. :)

    Entropy is hard because it is seemingly based on knowledge of what we don't know. Take rolling a die. If I get a 6 you can't calculate the entropy, but if I tell you that there are 6 possible outcomes, and that they are equiprobable, then you can calculate it.

    Physically we have two kinds of information: information about what could happen and information about what does happen. Actually, we can know both. You can know what could happen based on some physical model and then measure what does happen in a particular experiment. So entropy is based on the actual knowledge of what could happen, of what the possibilities are: knowledge, in advance, of what might happen. In a sense, nature sends us information when we do an experiment. So experimentation can be thought of as a signaling process in which nature is the transmitter and we are the receiver. The physical laws define the possible outcomes and their probabilities.
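
    As a small check of the die example above (under the stated assumption of six equiprobable outcomes), the Shannon entropy is

        H = -\sum_{k=1}^{6} \tfrac{1}{6} \log_2 \tfrac{1}{6} = \log_2 6 \approx 2.585 \text{ bits},

    whereas once a particular outcome is known the distribution collapses to a certainty and the entropy is 0. The value depends on the stated set of possibilities and their probabilities, not on the particular roll.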

  • @dhimanbhowmick9558 (9 months ago)

    Fantastic explanation, thanks a lot 🙏🏽

  • @EugeneEmile (22 days ago)

    Great stuff.

  • @sukursukur3617 (2 years ago)

    Thank you very much. Excellent

  • @themagnificentdebunker2644 (1 year ago)

    Please continue to make content.

  • @geoffrygifari3377 (2 years ago)

    Ohhh, I hear about Shannon entropy being described as "information", "uncertainty", "surprise", and yeah, they sound *vague*. The sources I got these from are mostly popular-science content creators, meant to attract and introduce people to the field... so it's valuable to have deeper, meatier content like this as well.

  • @gcarcassi (2 years ago)

    Those are the common characterizations, which you will find in advanced books as well. This one is the result of our research... I doubt you will find it anywhere else. Our general goal is to have clear concepts at the physical/scientific level so that you can rederive the math... so no more "you can think of it as this" but "it _is_ this".

  • @maltrho (1 year ago)

    The "surprise" connotation is really the worst, it is positively misleading/false (There is a good exact defiition of "surprise" with crucial differences from Shannons entropy)."Disorder" is just asking to be misunderstood (Fysicist has as little right to speak of order as biologist of a soul). "Uncertainty" really is by far the better choice of the ones you mentioned, closing quite well in on what the formula incapsulates... but saying 'variability" like in this video is probably more helpfull, that concept being more clinical.

  • @medaphysicsrepository2639 (3 years ago)

    Wow... this was an incredible video! Entropy is something that I am always finding out new things about... harder to understand than quantum mechanics, if I am being honest.

  • @gcarcassi (3 years ago)

    Thanks! I am indeed finding that the level of confusion on the foundations of statistical mechanics is similar to the one in quantum mechanics... Some of the issues are incredibly similar (e.g. what do the probabilities represent?), but the main problem is that there are many different "things" that are called entropy. In the series I only go through some... :-/

  • @ismael72528 (2 years ago)

    Excellent explanation.

  • @gcarcassi (2 years ago)

    Thanks!

  • @vicvic553 (3 years ago)

    Thank you so much. It really brightened my imagination a lot! But how is it actually related to information theory? Could you make a video, or simply explain it using an example, please?

  • @gcarcassi (3 years ago)

    Have you looked at the second video in the series? It explains the connection to bits (information theory) and has a fully worked out example.

  • @geoffrygifari3377 (2 years ago)

    I'm trying to make an example: suppose we have a set O of three types of elements (•, ×, Δ), with 4 elements of each type. If we have other sets A, B, C with

    A = {8•, 2×, 2Δ}
    B = {2•, 8×, 2Δ}
    C = {2•, 2×, 8Δ}

    then will A, B, and C all have the same Shannon entropy, and will all of them have a Shannon entropy larger than the original set O?
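
    A quick numerical check of this example (a sketch: each set is read as a distribution over the three symbol types by dividing the counts by 12; the helper name is illustrative):

        import math

        def H(probs):
            # Shannon entropy in bits; zero-probability terms contribute nothing
            return -sum(p * math.log2(p) for p in probs if p > 0)

        O = [4/12, 4/12, 4/12]   # {4•, 4×, 4Δ}
        A = [8/12, 2/12, 2/12]   # {8•, 2×, 2Δ}
        B = [2/12, 8/12, 2/12]   # {2•, 8×, 2Δ}
        C = [2/12, 2/12, 8/12]   # {2•, 2×, 8Δ}
        for name, dist in [("O", O), ("A", A), ("B", B), ("C", C)]:
            print(name, round(H(dist), 3))
        # O ≈ 1.585 bits; A, B, C ≈ 1.252 bits each

    So A, B, and C do share the same Shannon entropy (the formula depends only on the multiset of probabilities), but it comes out lower than O's, not larger: the uniform set O has the maximal variability.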

  • @sukursukur3617 (2 years ago)

    5:12 The z-score helps to group numbers independently of their particular values in the set. But if the data is continuous, it doesn't work.

  • @EinSofQuester (2 years ago)

    What I don't understand is why we're using the full H(p_i) when breaking down the Shannon entropy into its parts. Why are we writing H(r_k) = H(p_i) + p_a H(q_j) instead of only using the portion of H(p_i) which is not also included in H(q_j)?

  • @gcarcassi (2 years ago)

    Hi! You are right: I could have added a numerical example to show how the expression satisfies all requirements, especially the third! I have numerical examples in the second video, to explain what the numeric value means, and a Huffman coding example to show the connection to information theory. I wish YouTube would allow me to modify the video and add it! I hate that I can't improve things... Anyway, here's how it works in your case. The third requirement states:

        H(3/8, 2/8, 3/8) = H(3/8, 5/8) + 5/8 H(2/5, 3/5)

    The first distribution is the expanded one, the second is the one you are expanding, and the third is the one you are using to expand. If you substitute the expression for H, you have:

        -3/8 Log 3/8 - 2/8 Log 2/8 - 3/8 Log 3/8 = -3/8 Log 3/8 - 5/8 Log 5/8 + 5/8 (-2/5 Log 2/5 - 3/5 Log 3/5)

    You can work on the right-hand side:

        = -3/8 Log 3/8 - 2/8 Log 5/8 - 3/8 Log 5/8 - 2/8 Log 2/5 - 3/8 Log 3/5
        = -3/8 Log 3/8 - 2/8 (Log 5/8 + Log 2/5) - 3/8 (Log 5/8 + Log 3/5)
        = -3/8 Log 3/8 - 2/8 Log (5/8 * 2/5) - 3/8 Log (5/8 * 3/5)
        = -3/8 Log 3/8 - 2/8 Log 2/8 - 3/8 Log 3/8

    and show that it is indeed equal to the left-hand side. I wrote it quickly, I hope the math checks out! Thanks again!
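
    The same check can be done numerically rather than algebraically (a minimal sketch; the helper name is illustrative):

        import math

        def H(probs):
            return -sum(p * math.log2(p) for p in probs if p > 0)

        lhs = H([3/8, 2/8, 3/8])                     # the expanded distribution
        rhs = H([3/8, 5/8]) + 5/8 * H([2/5, 3/5])    # coarse distribution + weighted refinement
        print(lhs, rhs)                              # both ≈ 1.561 bits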

  • @nidhibhawna2176 (2 years ago)

    Please make a video on how to use Shannon entropy to detect land-use change.

  • @gcarcassi (2 years ago)

    Hi Nidhi! I am afraid it's completely outside my expertise: I didn't even know such a thing existed! My research is on the foundations of physics and related topics... sorry!

  • @davidhand9721 (4 months ago)

    To me, Shannon entropy will always be the number of bits or degrees of freedom. If I have a series of numbers, and they're all generated by the repeated application of some formula, then I only really have one degree of freedom, not a whole series of numbers. Since everything can be encoded as a number, you can always encode a physical system into series or functions, and it's not at all different.

  • @shrutidivilkar8140 (1 year ago)

    How is information theory related to Shannon entropy?

  • @kumarichhaya845 (2 years ago)

    While calculating the Shannon entropy with the given formula in Word, my value of entropy is coming out negative. All I need to know is: should we ignore the negative sign in the entropy value?

  • @gcarcassi (2 years ago)

    If you are calculating the entropy on a discrete distribution, then the value must be positive. Note that the expression is minus the sum of p log p.... I often forget the minus.
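
    A tiny sketch of the sign issue, with a made-up distribution:

        import math

        p = [0.5, 0.3, 0.2]
        print(sum(x * math.log2(x) for x in p))    # ≈ -1.49 : the minus sign was dropped
        print(-sum(x * math.log2(x) for x in p))   # ≈  1.49 : the Shannon entropy, non-negative as expected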

  • @ElizaberthUndEugen (3 years ago)

    9:03 When H ∘ U (where ∘ denotes function composition) is f, with f(xy) = f(x) + f(y) and hence f = log, why does the equality not say H(U(NM)) = ... = k log(NM), and instead only k log N (without the M)?

  • @gcarcassi (3 years ago)

    Hi! Note that the slide says "=> H(U(N)) = k log(N)": there is an implication sign leading to that expression, not an equality sign. So the last term is not equal to H(U(NM)). As you say, H(U(NM)) would be equal to k log(NM), and also H(U(M)) = k log(M). Does that make sense?

  • @ElizaberthUndEugen (3 years ago)

    @@gcarcassi Oh, I see, thanks!
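
    For context, the step this exchange refers to (reconstructed from the comments rather than from the slide itself): writing f(N) = H(U(N)) for the entropy of the uniform distribution over N elements, the requirements give f(NM) = f(N) + f(M) with f monotonically increasing, whose only solutions are f(N) = k log N; in particular H(U(NM)) = k log(NM) and H(U(N)) = k log N, which is the implication on the slide.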

  • @maltrho (1 year ago)

    I liked the video, but can someone tell me: what would be the problem with using just Σ_i p_i^2? Would it not behave similarly, and thus measure the same thing on a slightly changed scale? (This is concerning the argument saying that the original expression 'is the only...')

  • @gcarcassi (1 year ago)

    If I understand the notation, that's the sum of the squares of the probabilities. It satisfies requirement 1 (it is a function of the probabilities alone). Let's check requirement 2! Suppose a uniform distribution with n elements: Sum_{1..n} (1/n)^2 = n/n^2 = 1/n. So, if you double the number of cases (n -> 2n), the indicator halves (1/n -> 1/2n). This breaks requirement 2. What you could do is take the inverse, 1/(Sum p_i^2), which would be monotonic. But then you would break requirement 3 (which you can probably verify with a simple case... but it's too complicated to put in the comments). At the end of the day, there _are_ multiple indicators: see the Rényi entropy, which is essentially a family of indicators. The proof tells you that the Shannon entropy is the only one with those particular features. Exactly how those features are broken, and why, will depend on the case... but the proof tells you they will be broken.
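
    Both failure modes can be checked numerically (a sketch; the function names are illustrative and the requirement numbers follow the comment above):

        import math

        def shannon(probs):
            return -sum(p * math.log2(p) for p in probs if p > 0)

        def sum_of_squares(probs):
            return sum(p * p for p in probs)

        # Requirement 2: more equally likely cases should mean more variability.
        print(shannon([1/4]*4), shannon([1/8]*8))                # 2.0 -> 3.0    (increases)
        print(sum_of_squares([1/4]*4), sum_of_squares([1/8]*8))  # 0.25 -> 0.125 (decreases)

        # Requirement 3 (grouping), reusing the 3/8, 2/8, 3/8 example from another thread here.
        inv = lambda probs: 1 / sum_of_squares(probs)            # the monotonic "fix" mentioned above
        print(shannon([3/8, 2/8, 3/8]),
              shannon([3/8, 5/8]) + 5/8 * shannon([2/5, 3/5]))   # equal, ≈ 1.561
        print(inv([3/8, 2/8, 3/8]),
              inv([3/8, 5/8]) + 5/8 * inv([2/5, 3/5]))           # ≈ 2.91 vs ≈ 3.08: not equal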

  • @maltrho (1 year ago)

    Yeah, you got my notation :-) (Actually, though, I had wanted to have 1 - p_i^2 inside the summation; I just forgot the "1 -".) But to be honest I haven't tested breaking requirement 3. I made some plots in Python and saw that the two behave practically identically (as expected) when fed a bunch of various distributions. And then I wonder what justifies the further complication, outside of maybe mapping the values nicely onto the real line. Do you think the third requirement is an arbitrary preference on Shannon's side, or that it does something important? (Just if you have any opinion.)

  • @gcarcassi (1 year ago)

    ​@@maltrho Well, it's linearity in probability... which is itself a linear space. For example, it's what makes the Shannon entropy of independent distribution sum. If you make a series of choices, it's what makes information sum with the probability of making those choices. Basically, every "nice" property of the Shannon entropy with respect to probability theory comes from that... Given that there are other entropies, some other properties are going to be more important in other cases... Though I am not knowledgeable enough (my research is on the foundations of physics, not really in information theory).

  • @maltrho (1 year ago)

    @@gcarcassi I will sit down and play around a bit with those sums some day. Thank you so much for answering!

  • @jameschen2308 (2 years ago)

    Where can I read up on entropy in the style of how you presented it?

  • @gcarcassi (2 years ago)

    We wrote an article ( arxiv.org/abs/1912.02012 ), but apart from that, I am not sure. The style/approach is particular to our project Assumptions of Physics, which aims to find better conceptual starting points that can be used to rederive the math. For entropy in particular, we looked at many different sources and found there is a lot of confusion/contradictory information. Part of the problem is that it is used in many different contexts, so the presentation is often associated with particular views that are only valid in that context. Which makes it a problem for me to suggest something... :-/

  • @jameschen2308 (2 years ago)

    @@gcarcassi You are doing God's work, sir. I've never seen a cleaner introduction.

  • @gcarcassi (2 years ago)

    @@jameschen2308 Thank you so much!

  • @kruan2661 (3 years ago)

    Excellent pronunciation and very decent detail. 5:37 is a little confusing: you said the "variance" for the two cases is different, then you say the "variability" is the same, then you say "variance" is not a good indicator... so what exactly do you want to say?

  • @gcarcassi (3 years ago)

    We have two cases. The variability is the same in both cases. Therefore a good indicator of variability should return the same value in both cases. The variance does not give the same value in both cases. Therefore the variance is not a good indicator. Is it more clear?

  • @kruan2661 (3 years ago)

    @@gcarcassi Thanks. Also, at 1:01, should it be H(p) rather than H(p_i)?

  • @gcarcassi (3 years ago)

    @@kruan2661 It's H(p_i), short for H(p_1, p_2, ...). If you are pointing out that the p_i on the left side is different from the p_i on the right, you are absolutely right: on the left, p_i indicates a set of numbers, while on the right it is each number taken individually. Yet it is a "standard" convention to use p_i to mean the whole "vector" as well as each component through an enumeration. So, for example, in relativity one talks about the metric tensor g_{\mu\nu}. It's a common "abuse of notation" and is useful to keep track of the dimensionality of the objects. Similarly, x^i sometimes means the coordinates as functions from the space to the real values, and sometimes means the values of the coordinates themselves.

  • @kruan2661 (3 years ago)

    Thank you for clarifying!

  • @joeboxter3635 (1 year ago)

    You are a genius - I hate you. Lol. Alright, I don't hate you. I'm jealous. Alright, my higher brain function has brought me to my senses. I admire you. Lol. Your property approach is very nice. But I've seen this Shannon entropy done with the calculus of variations. It would be nice to see you hit this and other uses of the calculus of variations.

  • @gcarcassi (1 year ago)

    Thanks! :-D For the calculus of variations, do you mean when it is used to maximize entropy? Or do you mean that you can derive the Shannon entropy formula itself (I haven't seen that done)?
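
    For the first reading (maximizing entropy), the textbook calculation is a constrained variation; shown here for reference, though it may or may not be what the book mentioned below was doing:

        \text{maximize } -\sum_i p_i \ln p_i \quad \text{subject to} \quad \sum_i p_i = 1,
        \qquad \mathcal{L} = -\sum_i p_i \ln p_i + \lambda \Big( \sum_i p_i - 1 \Big),
        \qquad \frac{\partial \mathcal{L}}{\partial p_i} = -\ln p_i - 1 + \lambda = 0
        \;\Rightarrow\; p_i = e^{\lambda - 1} = \frac{1}{n},

    i.e. the uniform distribution over the n outcomes; with an additional mean-value constraint such as \sum_i p_i E_i = \langle E \rangle one instead gets p_i \propto e^{-\beta E_i}, the Gibbs distribution.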

  • @joeboxter3635 (1 year ago)

    @@gcarcassi I believe the derivation of entropy. I saw it in a text as an example of the calculus of variations. But we are talking almost 10 years ago. Let me dig and I'll try to get it for you.

  • @joeboxter3635 (1 year ago)

    @@gcarcassi Alright, I have tried, but I cannot find it. It was a book, and I believe it was titled Engineering Systems, but I'm not sure. It had a variety of interesting topics. At the end was an appendix on the derivation of the EL equation. It started off with a comparison of electrical circuits and how mechanical systems could be modeled using similar differential equations. Then it took the idea of the lumped-circuit model to distributed components, which got you to PDEs. It then showed how to solve these PDEs through generalized Laplace transforms, and that introduced you to the idea of kernel functions. Then it showed how these generalized kernels could be tailored to specific problems. In fact, if I recall, there was an example of how they could be used to solve nonlinear ODEs and PDEs with a problem-specific kernel. But that became an optimization problem, which brought you to the calculus of variations.

    That's when I flipped to the appendix and saw EL. It was quite fascinating. Not that I understood it; I had just been skimming. Somehow there was an application of EL to information theory. I thought I recalled the result was -p log p, and thus Shannon, but I'm not sure. I put it down, and nearby was what at the time I thought was a better book, on the calculus of variations. Years later I saw it again: it turns out it's Calculus of Variations by Gelfand, translated by Silverman. But I have never found the other book. I have been doing deep dives. Somehow I do remember a mention of Kolmogorov, and he does do complexity and information. But still no calculus of variations to Shannon.

    Try to understand, the book I saw was 20 years ago. I only flipped through it for maybe an hour. But it made a lasting impression, apparently. Lol. My guess is you have Gelfand's Calculus of Variations. If not, I highly recommend it.

  • @gcarcassi (1 year ago)

    @@joeboxter3635 Too bad, it sounds really interesting! Thanks for trying! I too have things that I have read but can't reconstruct where. The one for which I really wish I had written down the source was a thought experiment about light polarization from the late eighteen hundreds that already showed that light had to be quantized. It was astonishingly simple, but I don't remember the argument or the book. 😞 Maybe we should start a project that collects all these simple arguments/derivations in a single place... 🙂

  • @joeboxter3635 (1 year ago)

    @@gcarcassi Good idea!