How Computers Compress Text: Huffman Coding and Huffman Trees
Computers store text (or, at least, English text) as eight bits per character. There are plenty of more efficient ways that could work: so why don't we use them? And how can we fit more text into less space? Let's talk about Huffman coding, Huffman trees, and Will Smith.
Thanks to the Cambridge Centre for Computing History: www.computinghistory.org.uk/
Thanks to Chris Hanel at Support Class for the graphics: supportclass.net
Filmed by Tomek: / tomek
And thanks to my proofreading team!
🟥 MORE FROM TOM: www.tomscott.com/
(you can find contact details and social links there too)
📰 WEEKLY NEWSLETTER with good stuff from the rest of the internet: www.tomscott.com/newsletter/
❓ LATERAL, free weekly podcast: lateralcast.com/ / lateralcast
➕ TOM SCOTT PLUS: / tomscottplus
👥 THE TECHNICAL DIFFICULTIES: / techdif
Comments: 1,300
This is the last of the three trial Basics videos! This pushed my quick-explanation skills to the limit, but I figure that "slow down the video and replay if necessary" is better than "let people get bored"...
@superhands290
6 years ago
Are you bored of this realm?
@SiddheshNan
6 years ago
Tom Scott 2 days ago??
@timtjtim
6 years ago
These are really great - this one taught me something new, and I really do feel like I understand it now! If you do this again, and you go to the Computing Museum, will you let people know, or will it be a surprise? :)
@JohnR31415
6 years ago
I know some people can't cope with the high speed delivery, and thus just stop watching - if YT had an easy 'slow this video down' button....
@terimarymags
6 years ago
We need more like this.
Would love to learn more about compression! This was fascinating :-)
@bonzaipineapple3143
4 years ago
xisumavoid 😱 I love when someone I admire on yt also admires someone else I admire, I wish you both the best.
@zaimwaqar2788
4 years ago
Did not expect a Minecraft youtuber here!
@macaroon_nuggets8008
4 years ago
Second comment I have seen from you on this channel!
@onionbot2
4 years ago
wow uh hi
@TheScramblerTV
4 years ago
ecks eye zooma void
Dang, that last line got me. I want a series on more than just the basics now!
@miles4711
6 years ago
WΔY ΔWΔY The channel Computerphile (where Tom also featured a few times) has something on Huffman coding. And a lot more interesting stuff.
@WAYAWAYWithAsh
6 years ago
+miles4711 I also watch there. But Tom's approach here was well done in a way I hadn't quite seen before. That said, I'll look up those specific videos you mentioned. Thanks
@roderik1990
6 years ago
So basically, this method is the best given the frequency distribution of those single characters. But... if you have more information on how your data is distributed/patterned, you can get further compression by using those patterns.
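A minimal sketch of the point in the comment above, assuming we only know how often each single character occurs: repeatedly merge the two least frequent entries, so rare characters end up with long codes and common ones with short codes. The function names `huffman_codes` and `encode` are made up for this illustration, and the output is a string of "0"/"1" characters rather than packed bits.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return a {character: bit string} table built from character counts."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:
        # A single distinct character still needs a one-bit code.
        return {next(iter(counts)): "0"}
    # Heap entries are (weight, tie_breaker, {char: partial code}).
    # The unique tie_breaker stops Python from ever comparing the dicts.
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # least frequent subtree
        w2, _, right = heapq.heappop(heap)  # second least frequent subtree
        # Merging pushes both subtrees one level down: prepend a bit to each code.
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(text, codes):
    """Concatenate the code of every character in text."""
    return "".join(codes[ch] for ch in text)
```

For the text "aaaabbc" this gives "a" a one-bit code and "b" and "c" two-bit codes, so the seven characters take 10 bits instead of 56 at eight bits each. Patterns longer than one character are not exploited here, which is exactly the limitation the comment describes.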
@DeanyKong
6 years ago
Right? I've got total blue brains now.
@Nadia1989
6 years ago
Then I suggest you check out Crash Course Computer Science. It's a bit more than just the basics.
It's amazing how every video that Tom puts out always has an incredible amount of attention to detail. Take for instance the increase in video compression when he talks about it (0:43), him saying "worms" instead of "words" when talking about (0:52) lossy text compression, and the fact that at (4:40) he does his gestures mirrored so it is our left and right and not his left and right. On top of that he does it all in one take. I'm always amazed when a new episode comes out.
@fredoverflow
6 years ago
Speaking of one take, have you seen the ending to "Will YouTube Ever Run Out Of Video IDs?"? :)
@I_killed_that_beard_guy
1 year ago
Hello sir tutankhamon
@theodorlager4024
1 year ago
frletmiflm with their old kit
Tom, if you keep this up people will realize it's not magic. If I have to stop wearing my wizard's hat to work, I'll spell you so hard.
@traewatkins931
5 years ago
Naa we just have to make doing the same thing more esoteric therefore appearing as advanced wizardry... just like we have for the last 20 years.
@thegamingsidedk9699
4 years ago
Ömåå Å
@usejasiri
4 years ago
Few people even watch YouTube videos like Tom Scott's and other educational stuff. Don't worry, most people are on music and entertainment.
@WillKemp
3 years ago
@@traewatkins931 40 years. I started working as a programmer in 1979 😱, and that's how it was then 😂
@SerialElfYT
3 years ago
@@usejasiri for some of us this is entertainment
"... or you'll send the wrong worms" damn Matt, that joke was awful!
@DogsRNice
6 years ago
I live how many of the comments are pointing that out and also changing one word in the comment as well
@Culmaerija
6 years ago
at which point translators just gave up subbing the video ಠ_ಠ
@mckseal
6 years ago
"Send Words" changing one letter becomes "Send Worms", becomes "Sand Worms", becomes THE SPICE MUST FLOW
@yyny0
6 years ago
'Matt'
@JimPlaysGames
6 years ago
Are you kidding? That was grape!
3:00 God how I love the square sneaking off the screen
I've worked with various compression algorithms derived from Huffman and this is the best description of it I've seen.
During my encoding class I had to write a program that compresses files with huffman coding. As I was turning back my program the professor asked me: -Where is the source code? -Oh here, you just have to decompress it with my program.
@rmsgrey
6 years ago
How many quines did you submit?
@Mikomen97
6 years ago
I'm not sure if it counts as a quine as it takes an external file
@iabervon
6 years ago
"You know how I was supposed to be picking a version control system? Well, I decided I didn't like any of them, so I wrote my own and checked it into itself. I put the repository on my web site..."
@Mikomen97
6 years ago
Yeah, once I was like "Wait, does git have its own repo where it keeps track of itself? (check) Oh, actually it does". Another thing is that gcc is written in C, so it can compile itself... moreover, it can compile a newer version of itself.
@AndersJackson
6 years ago
There are many programming languages that are implemented in themselves and compiled by their own compilers. Nothing strange there. Look up bootstrapping.
Everyone's talking about Tom sending the wrong worms, but there's precious little talk about the video quality going drastically downwards as he talked about video/images being lossy compression right before that.
@krashd
6 years ago
Thus proving his point that some things like images or audio can be compressed using a lossy technique without losing any of the actual meaning (we could still see and understand a lower quality video); you only lose detail.
@NoriMori1992
6 years ago
I think that's because everybody noticed that and it was a pretty obvious and predictable joke, whereas "sending the wrong worms" makes people do a double take.
@willmcconnellsimpson1411
4 years ago
I don’t think it was obvious, I thought it was inspired.
@artysanmobile
4 years ago
Smittens Yes, I noticed that and wondered at the coincidence.
Can you do a video on zip files?
@GoldenSun3DS
6 years ago
Do one on 7Zip and RAR, too.
@qwertyTRiG
6 years ago
Karate Girl91 There are a lot of sophisticated compression algorithms. Zip, 7z, rar, gzip, deflate, bz2, brotli, etc. Some of these are stream algorithms, and tend to work in concert with tar to compress folders.
@samdouglas32
6 years ago
It's not Tom, but Prof. Brailsford did a video explaining LZ77, which is still a very popular compression algorithm; variants are used in ZIP, Gzip, RAR etc. It's easy to understand and a good place to get started if you want to learn more: kzread.info/dash/bejne/maODw5V9d8jghaQ.html
@sophieloren9454
5 years ago
Ethan Pender has
"You end up sending the wrong worms" Wow, that jake was funking awful
@pacha1500
6 years ago
im dying now
@flashylite
6 years ago
It reminded me of this! kzread.info/dash/bejne/k35k0KiKl7WnoLw.html
@JohnyComeLately
6 years ago
Hierophant yarn response was as goo as the original jack?
@seanthebluesheep
6 years ago
Taikamuna It took me a couple of minutes, but I got it in the end. The wrong worms is "the wrong words" but worms is the wrong word.
@Draugo
6 years ago
Before opening I assumed you would link this kzread.info/dash/bejne/on2hubh8fcqribQ.html
I like the dry humor, "Otherwise you say the wrong worms" is just objectively funny
I never thought that Wild Wild West would ever be used to such a profound effect. Well done, Tom.
@pepper669
6 years ago
To me, "Wild Wild West" is the least memorable movie in the history of cinema. OK, except maybe for "Ishtar".
@frickyou1045
6 years ago
You have obviously never heard the masterpiece Wow Wow by the legend Neil Cicierega
@pepper669
6 years ago
Thanks for the tip! Maybe I'll even watch it (or try to do so :) ).
@uninstallyourheart
6 years ago
Walter Lowe I KN E W I WASNT THE ONLY ONE WHO CAUGHT THE REFERENCE
@cirlu_bd
4 years ago
try Wow Wow by Neil Cicierega, I can assure you Wild Wild West (and All Star if you fall into the rabbithole that is Neil) will have a brand new type of impact on you !
Toms red T-Shirt probably has it's own insurance.
@ScarceAnari
6 years ago
Im proud
@DerekHartley
6 years ago
Which one of the hundreds?
@oliilo1
6 years ago
And he's not even the guy known as "red shirt guy". Though I bet Tom uses a green or blue shirt in public to avoid people recognizing him.
@rosiemae8023
6 years ago
oliilo1 Some of the guys at my local hackspace know Tom, and apparently he only really wears his red T-shirt when he knows/thinks he'll be filming. I only found this out when the guys at the hackspace found out I only wear green T-shirts, every single day. It saves a decision in the morning, and makes washing so much easier.
@williambrennan104
6 years ago
*its
I've been binge watching your videos for the last few days procrastinating on my homework. Now my CS class has a question on a Huffman tree, and I end right back up here. Thank you, Tom. You made it super easy.
I love how you just seamlessly blend in jokes to your script.
Yo, the YT algorithm is giving me a Tom Scott renaissance right now and it's all so good. Love this guy!
Wow! I can't believe you just explained something that you would learn at a university lecture, but in a way that was so thoroughly entertaining!
This is an absolutely amazing series. I hope there will be more in the future!
THANK YOU TOM SCOTT!! I have wondered how compression like this worked for the better part of a decade, and I've even put a decent amount of time into researching it, but it never clicked until this video. You have the best content on this entire platform, both in subject matter and quality.
Basics videos are great. You're a good communicator, and it's fantastic for children doing computer science at GCSE.
0:46 I actually thought my wifi was acting up again
Love that you're getting back to the educational stuff. Breaking down complicated topics in computing or explaining how a hack was done was always a fave.
Would love to see more of these. Even when I already know about what you're explaining, your videos are so superbly made that they're still fascinating to watch.
Always needed a clear and concise way of explaining Huffman Coding. Thank you!
Wow, every day I see that Huffman reference in my daily job and I use it to pack files, and I never knew it was a masterpiece of software engineering. Thanks for explaining it so well.
This guy really seems like he LOVES to teach what he knows and is passionate about! I'm really enjoying the enthusiasm as well as the knowledge being shared. This is so good❤
"The wrong worms" Godlike writing, nice job tom
I love your videos. You make complicated things look so easy (although I don't understand the technique and logic behind it anyway :D)
@kalebbruwer
6 years ago
brayzbeats. Doing this means the most commonly used characters end up at the top with the shortest binary codes.
@Ins4n1ty_
6 years ago
It's pretty simple really. Huffman Trees are very straightforward. Think about a short tree, like this one:

        o
       / \
      o   B
     / \
    o   A
   / \
  C   D

This means our alphabet has 4 letters: A, B, C and D. Now I want to compress a short text using this tree. The text is "DAB" (sorry, only thing I could think of with these letters).

So, we'll start by converting D into binary. To get to D, we gotta go to the left twice then to the right once, so here it goes: "Left Side: 0, Left Side: 0, Right Side: 1". So a D corresponds to 001 in binary. Now to get to A, it's 1 left and 1 right, so 01. Now to B, all I need is 1 right, so 1. This turns DAB into 001011. 6 bits of information.

When you read this file, you'll convert 001011 back into DAB by doing the opposite process. You'll read bit by bit:
0 -> Go to the left once. It's not a letter yet so read next bit.
0 -> Go to the left again, it's still not a letter so read next bit.
1 -> Go to the right. Oh it's a letter, letter D. OK so writing it down on the output: D. Now reset to the tree root and read next bit.
0 -> Left. Not a letter. Next bit.
1 -> Right. It's the letter A! Write it down to the output: DA. Reset to root and next bit!
1 -> Right. It's the letter B! OK! Write it down to the output: DAB.
There are no more bits to read so it's done! Your text is DAB!
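The walkthrough in the comment above can be sketched in a few lines of Python. This is only an illustration: the nested-tuple layout of `TREE` and the helper names are assumptions made for this example, with each pair meaning (left branch, right branch) and a plain letter meaning a leaf.

```python
# The four-letter tree from the comment: (left, right) pairs, letters are leaves.
TREE = ((("C", "D"), "A"), "B")


def build_codes(node, prefix=""):
    """Walk the tree, recording 0 for each left turn and 1 for each right turn."""
    if isinstance(node, str):
        return {node: prefix}
    left, right = node
    codes = build_codes(left, prefix + "0")
    codes.update(build_codes(right, prefix + "1"))
    return codes


def encode(text, tree=TREE):
    """Replace every letter with the path that leads to it."""
    codes = build_codes(tree)
    return "".join(codes[ch] for ch in text)


def decode(bits, tree=TREE):
    """Follow the bits down the tree; on reaching a letter, emit it and restart."""
    out = []
    node = tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if isinstance(node, str):
            out.append(node)
            node = tree  # reset to the root for the next letter
    return "".join(out)
```

Here `encode("DAB")` returns "001011" and `decode("001011")` returns "DAB", matching the six bits worked out by hand in the comment.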
@yesto9676
6 years ago
brayzbeats. I do understand these well, but math and engineering videos go past vocabulary and I have to watch them twice to understand sometimes.
@brayzbeats
6 years ago
That's it.
@caramonfire
6 years ago
That helped a lot, thank you!
Loved this. I was wondering where the other 2 episodes were but turns out I've accidentally already seen them. Love you Tom! Keep up the good stuff x
That's genuinely one of the most interesting videos I've seen in ages. More of these please, Tom.
Thank you very much Tom. I’ve had a hard time understanding Huffman trees in school so that really helps me!
"But text has to be losslessly compressed: you can't just lose a bit of detail, otherwise you end up sending the wrong worms." 🤣 Nice play there Tom!
@KaosFireMaker
6 years ago
Along with the fact that video compression artifacts were added just before that
@zebronki
4 years ago
@@KaosFireMaker such a nice little touch!
@Christian-mn8dh
2 years ago
i dont get it
This video was well-written, and the execution was exceptional. Engaging, informative, and entertaining. Very nice! I would like to see more like this.
Just when I think I’ve seen your best work, Tom, along comes this. Excellent!
Surprised we didn't learn about Huffman coding in university. Could have been mentioned right after the sorting algorithms...
@wappa6914
4 years ago
Trurl In my school we did it
@davictor24
4 years ago
@@wappa6914 my school too. Did an Information Theory course in my final year.
This is really cool. Honestly really curious as to how the tree itself is stored. That would be really cool. Tbh I want Tom to teach a whole computer science basics class.
@blomblegle
1 year ago
What I did was to store in the header of the compressed file the preorder traversal of the tree and then an array with information about which nodes are leaf nodes. You could also store combination of 2 different traversals, but "my" solution would only take up to [number of nodes] + ceil([number of nodes] / 8) bytes - not that significant difference, but still a difference :P
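One way to read the scheme in the comment above, as a rough sketch rather than the commenter's actual code: write one byte per node in preorder, then pack one "is this a leaf?" flag per node into bits. The tuple tree layout, the zero placeholder byte for internal nodes, and the names `serialize` and `deserialize` are all assumptions for illustration, and characters are assumed to fit in a single byte.

```python
def serialize(tree):
    """Return (symbols, flags): one byte per node in preorder, plus packed leaf flags."""
    symbols = bytearray()
    flags = []

    def walk(node):
        if isinstance(node, str):
            symbols.append(ord(node))  # leaf: store the character (single byte assumed)
            flags.append(1)
        else:
            symbols.append(0)          # internal node: placeholder byte
            flags.append(0)
            walk(node[0])
            walk(node[1])

    walk(tree)
    # Pack the flags eight to a byte, most significant bit first.
    packed = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            packed[i // 8] |= 1 << (7 - i % 8)
    return bytes(symbols), bytes(packed)


def deserialize(symbols, packed):
    """Rebuild the nested-tuple tree from the two arrays."""
    pos = 0

    def walk():
        nonlocal pos
        i = pos
        pos += 1
        is_leaf = (packed[i // 8] >> (7 - i % 8)) & 1
        if is_leaf:
            return chr(symbols[i])
        left = walk()
        right = walk()
        return (left, right)

    return walk()
```

For a seven-node tree such as `((("C", "D"), "A"), "B")` this produces 7 symbol bytes plus 1 flag byte, which matches the [number of nodes] + ceil([number of nodes] / 8) figure quoted in the comment.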
@rigveddesai5843
1 year ago
idt the tree itself is stored it is just used to create the translation list which is used to compress the data, then you can send the list and the compressed data
Tom Scott, you, sir, are a genius. Nobody has ever been able to explain something like this to me in such a simple fashion, I actually understand this now, and I could decode a file by hand with a Huffman tree now just from what I learned in your video.
I was able to understand more in your short video than any other attempts I have made in the past. Thanks!
3:02 I love how the pointer gets so angry it just leaves
When he talked about images and the screen went low q I thought it was my connection! Well played!
Love this series so much. Hope to see more like this in the future
Fascinating. I'll be looking out for the rest of the series.
Awesome presentation....inspired by you...I am an IT engineer...:wanna start my tech channel too after viewing your video and fan base....wish me good luck ☺️☺️☺️
@Phil8sheo
3 years ago
Amazing to find this comment from you at the beginning of your channel! Now you have nearly 200,000 subscribers and that is simply amazing. Congrats!
@Martin__
3 years ago
@@Phil8sheo Was thinking the same. Amazing!
@jimhalpert9803
3 years ago
This comment is amazing. Wow.
This is magic. I've just finished my text compression algorithm in C++ the day before yesterday, and this video was uploaded. I feel I'm being observed.
@Boog_masskway
1 year ago
The algorithm watches all
Always wanted to say that I love your voice, Tom. This is actually the reason why I subbed.
Bless you, Tom Scott! You managed to make me understand in 6 and a half minutes what my teacher couldn't in an hour.
This one of only two things I remember from my programming module - the other is a habit of typing both the open and close brackets at the same time.
0:49 this was funny on so many levels, hope no one sends me any worms, my computer's laggy enough as it is
I like it when you talk about something that combines computing science with mathematics. You seem so passionate!
As usual, darned fascinating! Please continue with the series. I know _some_ stuff but I'm learning lots!
0:50 Hilarious, reminded me of SCP-586 (Anything talking apple it has one typo.)
This video was SUPER helpful! Thank you Tom!! I can't go to IRL classes right now - pandemic and all that - and this introductory video is the next best thing. And dare I say, my teachers are not as skilled at succinct explanations. Thanks!!
Marvelous imagery! If it weren't for the intuitive visuals and your keen explanation, I would have never understood this. This is fascinating once you get it!
Thanks to this channel I know the solutions to so many problems I didn't previously know existed. Thank you?
"... or you'll send the wrong worms" *5 seconds later* Nose Exaling
"... or you'll send the wrong worms"
@uncreativename9833
6 years ago
Colten Pilgreen 😂 0:50
@SloeJuice
5 years ago
Exposed! You need to make sure your computer virus is compressed properly? :D
@dkaloger5720
4 years ago
Wong
@futrey9353
3 years ago
Ip happesad mo we onze
I believe you look much more confident compared to your previous videos, and I wasn't expecting your video to be so hypnotically understandable; you just nailed it. Thanks for sharing!
More of these types of Computer videos please!! They are the best!
"You can't just lose a little but of detail, otherwise you'll end up sending wrong *worms* " Damn, that's clever
3:05 someone give the animator a raise 😂
Well, I have my algorithms final tomorrow and I wanted to have a quick recap, Tom Scott nailing it, again. Thanks!
I enjoy this series a lot. I hope you continue it.
THAT DAMN WORM PUN!
But using the Huffman method, how does the computer separate the letters if they arrive in a constant stream?
@lcarusfp
6 years ago
Let's imagine we have some "compression" tree and a string consisting of 10111010... . Every letter has an individual bit sequence assigned (the sequence is assigned by the tree); some are 4 digits long, some are longer. The computer sees the 1 and compares it with every assigned sequence. No match. Take the next bit: 10. No match. Next bit: 101. Still no match. With 1011 we get a match, so we have arrived at a letter and display it. Then we start again with the next bit: 1, no match... Every letter has an individual path, so once 1011 is a letter there will never be a letter assigned to 10111. That's how the PC knows where to separate the string.
@MrMomoro123
6 years ago
It separates the letters by referencing the tree. As it gets 1s and 0s it goes down the tree until it hits a dead end. The letter at that dead end is what is outputted, and the program heads back to the top of the tree. The structure of the tree is how it knows where the breaks are.
@PregnantOrc
6 years ago
No branching point has a letter so if the point it comes to has a character instead of a branching path it has found the character needed and the next bit starts at the top of the tree.
@Spincervino
6 years ago
Look at the graph at 5:02 Every time the computer reads a 1 it goes right in the tree, every time it reads a 0 it goes left. In this infinite sequence of zeros and ones, you only get to a letter at the end of a branch. You can imagine letters at the end of the branch as "leaves" in a tree, where every branch must end in one and only one leaf. The computer stores the entire tree so it knows if the sequence has reached a leaf or if it is still on a branch. If, with the last bit read, you did not get to a letter (end of branch) you continue reading. If you got to a letter, write the letter and start a new one.
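The replies above all rest on one property: no letter's code is the start of another letter's code. A small sketch of that idea follows, using a made-up four-letter table (`CODES` is hypothetical, not taken from the video) and a decoder that keeps adding bits to a buffer until the buffer matches a code.

```python
# Hypothetical code table for illustration only.
CODES = {"e": "0", "t": "10", "a": "110", "o": "111"}


def is_prefix_free(codes):
    """True if no code word is the beginning of another code word."""
    words = sorted(codes.values())
    # After sorting, any word that is a prefix of another sits directly before
    # a word that starts with it, so checking neighbours is enough.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def decode_stream(bits, codes):
    """Split an unbroken bit string into letters without any separators."""
    lookup = {code: ch for ch, code in codes.items()}
    out = []
    buffer = ""
    for bit in bits:
        buffer += bit
        if buffer in lookup:      # reached the end of a branch
            out.append(lookup[buffer])
            buffer = ""           # start again from the root
    if buffer:
        raise ValueError("stream ended in the middle of a code")
    return "".join(out)
```

With this table, "tea" is sent as "100110", and `decode_stream("100110", CODES)` recovers "tea". A table like `{"a": "1", "b": "10"}` fails `is_prefix_free`, and a stream built from it could not be split unambiguously by this decoder.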
@CoolJosh3k
6 years ago
Giacomo 3003 it knows because it reached the end of a branch.
I love this series. If you decide to continue, I would be very much for it!
Tom, I wish this was out two years ago when I was doing my computer science A Levels! This is amazing !
I feel so smart now
"The wrong worms" I see what you did there... nice.
Belatedly: That was really clear and comprehensible! It didn't feel too quick at all to me - it was all straightforward and easy (for me) to understand. Good job!
Thank you for putting this in layman's terms. This is super interesting, and you make it so I can actually understand every bit you say. Cheers!
0:50 "...sending the wrong *worms* " I see what you did there =D
A minor point: Position of atoms on a disk? More like magnetic fields. Edit: Nice visual stepping tree!
@PaleandPastey
6 years ago
The magnetic fields are defined by the orientation of electrons around the atoms of the disk; so it's 6 of one, half a dozen of the other really.
I am really getting into this new series! Great writing!
When is a new series of the Basics coming? I love them; you're a cool guy. Keep going!
1:08, no animated sunglasses? Disappointed.
Tom Scott: Text has to be losslessly compressed! Xerox: Hold my copy machine!!!
@imveryangryitsnotbutter
4 years ago
*Hold my toner.
This was the best basics video so far.
This is the series I have been waiting for.
"Otherwise you end up sending the wrong worms"
@harrietgriffiths5002
6 years ago
Blah Cga it means "sending the wrong *words*" but he sent the wrong word so it says "worms"
3:32 well that aged like milk.
I'm always amazed at how many clever people are in the world, with solutions to problems that I know I could never have come up with.
Really amazing video quality. Something very different this time. I love your channel...
My teacher is trying to teach this; I hopped on YouTube and learned the concept in 7 minutes. Went back to Zoom, and the teacher hasn't finished the example yet 🤣
Shame modern game devs have given up on saving space and increasing efficiency. Brute force is now the solution.
@marcusborderlands6177
3 years ago
The issue is that people are getting higher resolution displays, needing higher resolution textures, and games are using less repeating textures to make each location feel unique.
2 computer science lessons now make sense after 6:30 of Tom Scott explaining Huffman coding! Thank you
I'm so glad you're doing computer stuff again. The travel stuff is fine, but this is why I'm subscribed.
Did he actually say "you end up sending the wrong worms"?
Sending the wrong worms eh? IC what you did there
@georgeelsham
6 years ago
Jasper H haha IC what you mean
@georgeelsham
6 years ago
Blah Cga IC stands for integrated circuit (the chips used in computers), so we were talking about ICs and "I see".
@xexpaguette
4 years ago
@Blah Cga Wrong words. Wrong worms.
I've always wondered how this works, this is amazing!
Love it when a youtube video makes me feell like I lerned something. Tbank you, Tom
everything that exists is either a stopwatch or not a stopwatch
Tom..... Did you get a tan?
ayo you better give your editor a raise; this video is bang on
I know about this stuff already but still watch it because Tom Scott is a good lad
The wrong "worms", that certainly makes up for the Quadrillion / Trillion mistake! Edit: As I typed that, my grammar checker tried to suggest it should be "words"!
@Yeldur
6 years ago
It's a play on words; or rather, characters. Tom was supposed to say "The wrong words" but instead replaced a single character to represent the incorrect character being sent.
@DarkYuan
6 years ago
On that tangent, perhaps a video explaining how spell-checkers work? I think it had something to do with binary search trees, which I only briefly covered in college.
@f4tornado450
6 years ago
A quadrillion bits is actually 125 gigabytes, so I wouldn't consider it a mistake
Those here for Cryptocracy :)
@ipsitayankakoty652
2 years ago
ayyy
@bhargavd9829
2 years ago
got the answer?
@bhargavd9829
2 years ago
@@ipsitayankakoty652 paala ni
Really interesting series, hope there is more
This is amazing, very clearly explained. Thanks!