Zipf's Law

Science and Technology

Do most words in a corpus occur with average frequency? Absolutely not! This video discusses a surprising regularity about word frequencies in corpora. And at the end, we'll make a trip to Hogwarts and see if Zipf's Law also applies in the world of wizards.
If you want to follow along, here are the word list files:
drive.google.com/file/d/1p5DT...
Request the Potter corpus:
docs.google.com/forms/d/e/1FA...
Vsauce on Zipf:
• The Zipf Mystery

Comments: 29

  • @yingyusu3529 · 4 years ago

    Hello :) I watched your abralin talk live on Wednesday. I study generative syntax, and I was very inspired by your discussion of negative evidence in the Q&A session! Thank you for all the wonderful videos!

  • @MartinHilpert · 4 years ago

    Thanks a lot, Yingyu Su, that's very kind of you to say!

  • @BassmanTh · 3 years ago

    Thanks for that extensive video! It added great value to my master's thesis. Even though I'm dealing with distributions in geographical data, it was a great and easy way to understand Zipf's law.

  • @LeonaDarkwind · 1 year ago

    I LOVE that you've linked to Michael Stevens' video. I'm playing around with predictive language models and I'm really happy you're talking about WORD TOKENS in this video!

  • @coffecoding · 2 years ago

    This is great. You teach so clearly.

  • @shamsuddeenhassanmuhammad2143 · 3 years ago

    Excellent video. You teach excellently, your students must be happy with you.

  • @MartinHilpert · 3 years ago

    Thank you for your kind words. I teach linguistics to my students, but my students taught me how to do that, if that makes any sense. ;)

  • @Mustafghan · 2 years ago

    Never seen the normal distribution explained so clearly and in such an easy-to-understand way.

  • @ChildSarcophagus · 1 year ago

    This blew my mind.

  • @duck2608 · 4 years ago

    Hi, thank you. I will follow all of uncle's videos.

  • @zerobit778 · 2 years ago

    Great professor

  • @cidiladamourasemedo1805 · 2 years ago

    Hi Martin, thank you for the wonderful and very helpful video. I am applying Zipf's law to my task of creating a dictionary of words that are specific to a particular category. However, I wonder if I could use the curve to determine a threshold number for the most significant words for the dictionary? For instance, could I use the intercept to determine this?

  • @topsiterings · 3 years ago

    awesome!

  • @languagetv4756 · 2 years ago

    thanks a lot

  • @thinaradesilva9351 · 2 years ago

    I'm doing a project on this same thing. Would there be any chance for me to get in contact with you for a possible interview? Awesome video, by the way.

  • @Melnish · 2 years ago

    Thank you for the video :D I'm trying to install the newest version of AntConc on a Mac, but it can't be opened because "Apple cannot check it for malicious software." Also, when I forced it to open, there was no way to open files in it. I was wondering whether there is any way to fix these problems?

  • @MartinHilpert · 2 years ago

    It's hard to diagnose these issues from afar, but Laurence Anthony has a great series of tutorials on his webpage: www.laurenceanthony.net/software/antconc/ Good luck!

  • @Pakanahymni · 4 years ago

    Have you ever tried plotting the product "position × n"? It would be interesting to see how much it varies. (If it was in the video, I missed it.)

  • @MartinHilpert · 4 years ago

    Hi Järvi! The common way of visualizing Zipf's Law is the scatterplot of rank and frequency with logged axes. I adopted that format in order to match up with other explanations that are out there.
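
For readers who want to try both views discussed in this exchange, here is a minimal Python sketch. It builds a rank-frequency table, computes the coordinates for the usual plot with logged axes, and also computes the product of rank and frequency. The function names and the toy token list are hypothetical and not taken from the video; a real analysis would read the tokens from a corpus or word list file.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, word, frequency) triples, most frequent word first."""
    counts = Counter(tokens)
    # Sort by descending frequency; ties are broken alphabetically.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq) for rank, (word, freq) in enumerate(ordered, start=1)]


def log_log_points(table):
    """Coordinates for the usual plot: log10(rank) against log10(frequency)."""
    return [(math.log10(rank), math.log10(freq)) for rank, _, freq in table]


def rank_times_frequency(table):
    """The product rank * frequency, which Zipf's Law predicts to be roughly constant."""
    return [rank * freq for rank, _, freq in table]


# Toy data (hypothetical), built so that frequency is about 12 / rank.
toy_tokens = ["the"] * 12 + ["of"] * 6 + ["and"] * 4 + ["to"] * 3 + ["wand"] * 2
table = rank_frequency(toy_tokens)
points = log_log_points(table)
products = rank_times_frequency(table)
print(products)
```

On real corpus data the products will not be this flat; how much they drift across ranks is one way to look at the variation asked about above.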

  • @carolynknight4233 · 4 years ago

    Hi, thank you for your wonderful videos. Does this law hold true for words uttered or written by non-native speakers of a language? or uttered by children before having mastered the language?

  • @MartinHilpert · 4 years ago

    Hey Carolyn! Both L2 language and child language in first language acquisition show Zipfian distributions. Here is an interesting lecture by Nick C. Ellis on Zipf and L2 language use: kzread.info/dash/bejne/aZd_w7puZ9eriMY.html Better video & audio, similar content: www.uttv.ee/naita?id=25911 Here is a study about Zipf and child language: journals.plos.org/plosone/article?id=10.1371/journal.pone.0053227

  • @carolynknight4233 · 4 years ago

    @MartinHilpert Thank you so much, Dr. Hilpert! I'm very excited about learning more about this, and I always look forward to your videos 🙂

  • @MartinHilpert · 4 years ago

    @carolynknight4233 Thank you, Carolyn!

  • @TheRealGnolti · 3 years ago

    Martin, Zipf's law makes me wonder about the value of MI scores. Not that they aren't meaningful, but when you review collocation results for a word, you find that MI seems to have nothing to do with absolute frequency; it is just mutual attraction continuing to exert its pull regardless of frequency. Collocation is a function of context, and it's the frequency of contexts that varies, analogous to the way certain climatic circumstances can promote the health of, say, vegetation and insects. Plug "miserable" into COCA and you get "creature" at rank 15 and an MI of 7.38 after a long line of MIs in the 3.0 range, because "miserable creature" is a construction that occurs on certain rhetorical occasions. Am I overthinking this?
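
The observation that MI is decoupled from absolute frequency can be illustrated with a small Python sketch. It uses the textbook pointwise definition, the base-2 logarithm of observed co-occurrences divided by the number expected by chance. This is an assumption for illustration: it is not necessarily the exact formula COCA applies, and all counts below are hypothetical rather than taken from COCA.

```python
import math


def mutual_information(pair_count, node_count, collocate_count, corpus_size):
    """Pointwise MI: log2 of observed co-occurrences over the count expected by chance."""
    expected = node_count * collocate_count / corpus_size
    return math.log2(pair_count / expected)


# Hypothetical counts in a corpus of 1,000,000 tokens (not taken from COCA).
# A frequent collocate that co-occurs with the node word 400 times.
frequent_pair = mutual_information(
    pair_count=400, node_count=2_000, collocate_count=25_000, corpus_size=1_000_000
)
# A rare collocate that co-occurs with the node word only 8 times.
rare_pair = mutual_information(
    pair_count=8, node_count=2_000, collocate_count=50, corpus_size=1_000_000
)
print(round(frequent_pair, 2), round(rare_pair, 2))
```

Under these made-up counts, the pair that occurs only 8 times receives the higher score, because its collocate is so rare that even a handful of co-occurrences far exceeds chance.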

  • @daquarlow · 3 years ago

    A mathematician's paradise right here

  • @Temerold_se · 2 years ago

    But what if you make a language with "aaa" before every word? Does Zipf's law apply then?

  • @MartinHilpert · 2 years ago

    Mathematically, adding "aaa" to each word does not change the distribution. In the real world, languages like that don't exist, though. Speakers would be too lazy to pronounce extra vowels that don't mean anything, and so some of the "aaa"s would disappear very soon.

  • @Temerold_se · 2 years ago

    @MartinHilpert ehm ok, but there's this Asian language where they say something like "Praise God" before every sentence. Also, real language or not, how does it apply?

  • @Temerold_se · 2 years ago

    @MartinHilpert btw, how does it not change the distribution? If you take an existing text and add "aaa" to the beginning of each word, it wouldn't work, right?
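
The point made earlier in this thread, that prefixing every word with "aaa" leaves the distribution unchanged, can be checked with a short Python sketch. The sample sentence and the function name are hypothetical. The sketch also covers a second reading of the question, where "aaa" is inserted as a separate word before every word.

```python
from collections import Counter


def frequency_profile(tokens):
    """Frequencies sorted from highest to lowest, ignoring which word has which count."""
    return sorted(Counter(tokens).values(), reverse=True)


text = "the wand chose the wizard and the wizard chose the wand".split()

# Case 1: "aaa" is glued onto every word, so each word type is simply renamed.
prefixed = ["aaa" + word for word in text]

# Case 2: "aaa" is inserted as a separate word before every word.
separate = [token for word in text for token in ("aaa", word)]

print(frequency_profile(text))
print(frequency_profile(prefixed))
print(frequency_profile(separate))
```

In the first case every word type is renamed but keeps its count, so ranks and frequencies are identical. In the second case one new, very frequent type takes rank 1 and all other words move down by one rank.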
