Zipf's Law
Science & Technology
Do most words in a corpus occur with average frequency? Absolutely not! This video discusses a surprising regularity about word frequencies in corpora. And at the end, we'll make a trip to Hogwarts and see if Zipf's Law also applies in the world of wizards.
If you want to follow along, here are the word list files:
drive.google.com/file/d/1p5DT...
Request the Potter corpus:
docs.google.com/forms/d/e/1FA...
Vsauce on Zipf:
• The Zipf Mystery
Comments: 29
Hello :) I watched your abralin talk live on Wednesday. I study generative syntax, and I was very inspired by your discussion of negative evidence in the Q&A session! Thank you for all the wonderful videos!
@MartinHilpert
4 years ago
Thanks a lot, Yingyu Su, that's very kind of you to say!
Thanks for that extensive video! It added great value to my master's thesis. Even though I'm dealing with distributions in geographical data, it was a great and easy way to understand Zipf's law.
I LOVE that you've linked to Michael Stevens' video. I'm playing around with predictive language models and I'm really happy you're talking about WORD TOKENS in this video!
This is great. You teach so clearly.
Excellent video. You teach excellently, your students must be happy with you.
@MartinHilpert
3 years ago
Thank you for your kind words. I teach linguistics to my students, but my students taught me how to do that, if that makes any sense. ;)
I've never seen the normal distribution explained so clearly and in such an easy-to-understand way.
This blew my mind.
Hi, thank you, I will follow all of uncle's videos
Great professor
Hi Martin, thank you for the wonderful and very helpful video. I am applying Zipf's law in my task of creating a dictionary of words that are specific to a particular category. However, I wonder if I could use the curve to determine a threshold number for the most significant words for the dictionary? For instance, could I use the intercept to determine this?
awesome!
thanks a lot
I'm doing a project on this same thing. Would there be any chance for me to get in contact with you for a possible interview? Awesome video, by the way.
Thank you for the video :D I'm trying to install the newest version of AntConc on a Mac, but it can't be opened because "Apple cannot check it for malicious software." Also, when I forced it to open, there was no way to open files in it. Is there any way to fix these problems?
@MartinHilpert
2 years ago
It's hard to diagnose these issues from afar, but Laurence Anthony has a great series of tutorials on his webpage: www.laurenceanthony.net/software/antconc/ Good luck!
Have you ever tried plotting the product "position × n"? It would be interesting to see how much it varies. (If it was in the video, I missed it.)
@MartinHilpert
4 years ago
Hi Järvi! The common way of visualizing Zipf's Law is the scatterplot of rank and frequency with logged axes. I adopted that format in order to match up with other explanations that are out there.
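For readers who want to try both views mentioned in this exchange, here is a minimal sketch in Python. It assumes the corpus is already available as a list of word tokens; the toy token list and the function names `rank_frequency` and `loglog_slope` are made up for illustration and do not come from the video. It computes the rank × frequency product for each word and the slope of the rank/frequency relation on logged axes, which is close to -1 when a text is roughly Zipfian.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, frequency) pairs, most frequent word first."""
    counts = Counter(tokens)
    freqs = sorted(counts.values(), reverse=True)
    return [(rank, freq) for rank, freq in enumerate(freqs, start=1)]


def loglog_slope(pairs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(freq) for _, freq in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Toy data (hypothetical): built so that frequency = 120 / rank exactly.
toy_tokens = []
for rank, word in enumerate(["the", "of", "and", "to", "in", "a"], start=1):
    toy_tokens.extend([word] * (120 // rank))

pairs = rank_frequency(toy_tokens)
products = [rank * freq for rank, freq in pairs]  # constant under an ideal Zipf curve
slope = loglog_slope(pairs)                       # slope of the line on logged axes
```

On real word lists the products will drift rather than stay constant, and the slope will only be approximately -1; the toy data is idealized on purpose.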
Hi, thank you for your wonderful videos. Does this law hold true for words uttered or written by non-native speakers of a language? or uttered by children before having mastered the language?
@MartinHilpert
4 years ago
Hey Carolyn! Both L2 language and child language in first language acquisition show Zipfian distributions. Here is an interesting lecture by Nick C. Ellis on Zipf and L2 language use: kzread.info/dash/bejne/aZd_w7puZ9eriMY.html Better video & audio, similar content: www.uttv.ee/naita?id=25911 Here is a study about Zipf and child language: journals.plos.org/plosone/article?id=10.1371/journal.pone.0053227
@carolynknight4233
4 years ago
@@MartinHilpert Thank you so much, Dr. Hilpert! I'm very excited about learning more about this, and I always look forward to your videos 🙂
@MartinHilpert
4 years ago
@@carolynknight4233 Thank you, Carolyn!
Martin, Zipf's law makes me wonder about the value of MI scores. Not that they aren't meaningful, but when you review collocation results for a word, you find that MI seems to have nothing to do with absolute frequency; it is just mutual attraction continuing to exert its pull regardless of frequency. Collocation is a function of context, and it's the frequency of contexts that varies, analogous to the way certain climatic circumstances can promote the health of, say, vegetation and insects. Plug "miserable" into COCA and you get "creature" at rank 15 with an MI of 7.38 after a long line of MIs in the 3.0 range, because "miserable creature" is a construction that occurs on certain rhetorical occasions. Am I overthinking this?
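The point about MI and absolute frequency can be illustrated with a small sketch. It assumes the plain textbook definition of pointwise mutual information, log2 of observed over expected co-occurrence; corpus interfaces may use a variant (for example one that adjusts for the search span), so this is not necessarily the formula behind the COCA figures above. The function name `mutual_information` and all counts are hypothetical.

```python
import math


def mutual_information(pair_count, node_count, collocate_count, corpus_size):
    """Pointwise MI in bits: log2 of observed over expected co-occurrence."""
    expected = node_count * collocate_count / corpus_size
    return math.log2(pair_count / expected)


# Hypothetical counts, not taken from COCA.
# A rare collocate that co-occurs 8 times with the node word:
rare_pair = mutual_information(
    pair_count=8, node_count=1_000, collocate_count=2_000, corpus_size=1_000_000
)
# A collocate 100 times as frequent that co-occurs 800 times with the same node:
frequent_pair = mutual_information(
    pair_count=800, node_count=1_000, collocate_count=200_000, corpus_size=1_000_000
)
```

Both pairs receive the same score because each co-occurs four times as often as chance would predict, even though one pair is a hundred times more frequent than the other. Under this definition MI measures attraction relative to expectation, not raw frequency.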
A mathematician's paradise right here
But what if you make a language with "aaa" before every word? Does Zipf's law apply then?
@MartinHilpert
2 years ago
Mathematically, adding "aaa" to each word does not change the distribution. In the real world, languages like that don't exist, though. Speakers would be too lazy to pronounce extra vowels that don't mean anything, and so some of the "aaa"s would disappear very soon.
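A minimal sketch of why the prefix makes no difference mathematically: a rank/frequency distribution depends only on how often each distinct word occurs, not on what the words look like. The sample sentence and the function name `frequency_profile` are made up for illustration.

```python
from collections import Counter


def frequency_profile(tokens):
    """Sorted list of token counts, highest first; the word forms are discarded."""
    return sorted(Counter(tokens).values(), reverse=True)


sample = "the cat saw the dog and the dog saw the cat run".split()
prefixed = ["aaa" + word for word in sample]  # "aaathe", "aaacat", ...

original_profile = frequency_profile(sample)
prefixed_profile = frequency_profile(prefixed)
```

Both profiles are the same list of counts. Because the prefix is added to every word, two different words never collapse into one, so every count, and with it every rank, stays where it was.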
@Temerold_se
2 years ago
@@MartinHilpert ehm ok, but there's this Asian language where they say something like "Praise God" before every sentence. Also, real language or not, how does it apply?
@Temerold_se
2 years ago
@@MartinHilpert btw, how does it not change the distribution? If you take an existing text and add "aaa" to the beginning of each word, it wouldn't work, right?