The Trie Data Structure (Prefix Tree)

Patreon ➤ / jacobsorber
Courses ➤ jacobsorber.thinkific.com
Website ➤ www.jacobsorber.com
---
The Trie Data Structure (Prefix Tree) // Move over hash tables, you've got some competition. This video talks about the Trie data structure (also known as a prefix tree) and shows you how to implement it in C.
***
Welcome! I post videos that help you learn to program and become a more confident software developer. I cover beginner-to-advanced systems topics ranging from network programming, threads, processes, operating systems, embedded systems and others. My goal is to help you get under-the-hood and better understand how computers work and how you can use them to become stronger students and more capable professional developers.
About me: I'm a computer scientist, electrical engineer, researcher, and teacher. I specialize in embedded systems, mobile computing, sensor networks, and the Internet of Things. I teach systems and networking courses at Clemson University, where I also lead the PERSIST research lab.
More about me and what I do:
www.jacobsorber.com
people.cs.clemson.edu/~jsorber/
persist.cs.clemson.edu/
To Support the Channel:
+ like, subscribe, spread the word
+ contribute via Patreon --- [ / jacobsorber ]
Source code is also available to Patreon supporters. --- [jsorber-youtube-source.heroku...]

Пікірлер: 123

  • @oj0024
    @oj00243 жыл бұрын

    I recently learned about another awesome data structure called "random skip list".

  • @2005kpboy

    @2005kpboy

    2 жыл бұрын

    Redis uses skip-lists for internal implementation of sorted sets

  • @TheCocoaDaddy
    @TheCocoaDaddy3 жыл бұрын

    Great video! I had never heard of a Trie data structure before. Thanks for posting!

  • @JacobSorber

    @JacobSorber

    3 жыл бұрын

    You're welcome. Glad I could add something to your repertoire.

  • @manohars123
    @manohars12310 ай бұрын

    Informative and well explained, Thanks! One optimization suggestion - In trieinsert( ) function, once child node with NULL is reached, no need to further check for NULL as it would always be the case.

  • @KamillaMirabelle
    @KamillaMirabelle3 жыл бұрын

    The size of the lookuptable are the miminum size we need to represent all representable encoding. This is the same for lexer based on DFA, since all encoding bigger than 8bit pr. Codepoint are just a sequence of bytes which we can search with.

  • @trimethoxy4637
    @trimethoxy46373 жыл бұрын

    omg really helpful. I was about to ask you to make video covering this topic, but you overtook me! going to create dictionary inside trie

  • @JacobSorber

    @JacobSorber

    3 жыл бұрын

    Glad I could help. Lots of requests for this topic.

  • @R4ngeR4pidz
    @R4ngeR4pidz3 жыл бұрын

    Awesome, love learning new things from your videos :P

  • @JacobSorber

    @JacobSorber

    3 жыл бұрын

    Thanks, Thijs. I'm glad they're helpful.

  • @mohamedhussien4013
    @mohamedhussien4013 Жыл бұрын

    Thank u so much. Elegant code.

  • @itsbk6192
    @itsbk61923 жыл бұрын

    Lets gooooo, i'll trie to keep a hold of my excitement

  • @Vulto166
    @Vulto166 Жыл бұрын

    Man you really wake up early! 5pm :O Thanks for the content!!

  • @sneakypete765
    @sneakypete7653 жыл бұрын

    Thanks for another great video!

  • @JacobSorber

    @JacobSorber

    3 жыл бұрын

    Glad you enjoyed it!

  • @sumitbhosale3401
    @sumitbhosale34013 жыл бұрын

    Nice One. Thanks

  • @jordanchen23
    @jordanchen233 жыл бұрын

    I've recently learned how to write a working trie dictionary program and your videos on linked lists and mmap have been tremendously helpful. That being said, I'm currently trying to optimize my code and learning a lot about cache profiling in the process since my program's still takes a long time to load an entire dictionary file. Maybe it's just because my own implementation used 27 child nodes, and the size of my data structure is leading to a lot of cache misses that I don't know how to resolve.

  • @JacobSorber

    @JacobSorber

    3 жыл бұрын

    In addition to cache profiling, have you used a standard code profiler, to see where your program is spending most of its time? That might help you identify any bottlenecks. But, yes, the larger your trie nodes are, the worse I would expect your cache performance to be.

  • @jordanchen23

    @jordanchen23

    3 жыл бұрын

    ​@@JacobSorber​If you're mean time taken for each part of my trie implementation, then no. Do you have any specific recommendations for profilers? However, if you mean the net runtime of each discrete stage in overall execution (load, check, unload), then yes. My assignment has a prewritten timer function in the main program file. For a dictionary with 143,091 words (max length 45 characters), it takes 0.08 secs to insert everything into a trie structure that contains 27 child nodes. Thing is, it only takes 0.04 seconds to unload everything, and while I do realize that it's easier to free memory I still found the asymmetry rather strange.

  • @thejedijohn
    @thejedijohn3 жыл бұрын

    Excellent explanation. Thank you for this. I'd recommend setting a higher delay before the code completion suggestions pop up, they were very distracting and made it much harder to follow the code. But maybe that's just me and my anxiety acting up.

  • @recursion.

    @recursion.

    2 жыл бұрын

    Nah just rewind it

  • @StephenBeale
    @StephenBeale Жыл бұрын

    This was excellent, thank you. Just one bit of feedback - often you are working quite low on the screen, which can be frequently obscured on KZread by the player controls along the bottom if (like myself!) the viewer is having to pause often in order to keep up. Cheers.

  • @rohankokatanur6426
    @rohankokatanur64263 жыл бұрын

    I learnt just a week ago that tries exist and thought to learn about them ..... You drop a video on it ...🎉

  • @JacobSorber

    @JacobSorber

    3 жыл бұрын

    Nice. Glad the timing worked out well.

  • @user-tq1rr1tc2c
    @user-tq1rr1tc2c23 күн бұрын

    ممنونم استاد

  • @JesseSabatini
    @JesseSabatini2 жыл бұрын

    greate video! not sure what i did wrong, i copied verbatim from the code in the video and i segfault in the insert for loop when calling insert on the second word. I used some debugging printf's and noticed that when inserting cattle, memory exists where it should be NULL causing it to bypass the new node creation within the insert for loop's if statement. not sure how i messed up tbh

  • @matiasbpg
    @matiasbpg3 жыл бұрын

    Nice video!

  • @JacobSorber

    @JacobSorber

    3 жыл бұрын

    Thanks! Glad you enjoyed it.

  • @jordanchen23
    @jordanchen23 Жыл бұрын

    I'm trying to make a trie that uses calloc to request a large amount of memory ahead of time and allocate them as needed.. A memory bucket if you will that can be called anytime the current one gets used up. The point is to lower the overall number of memory requests in the program. But i can't figure out the proper declaration of a ptr to a single trienode that allows me to to access and manipulate an array of trienodes that's already part of structure.

  • @laugh_till_u_pee968
    @laugh_till_u_pee9683 жыл бұрын

    i just subscribed to your channel for C language related content, but im commenting here just so you know, you are making great content. keep up the great work. subscribers will come eventually.

  • @JacobSorber

    @JacobSorber

    3 жыл бұрын

    Welcome! And, thanks for the support.

  • @Elzelgator
    @Elzelgator2 жыл бұрын

    Do we need to dynamicly allocate when we create a new Node??

  • @evansngugi2238
    @evansngugi22383 жыл бұрын

    Fantastic

  • @JacobSorber

    @JacobSorber

    3 жыл бұрын

    Thanks.

  • @birajpatel2804
    @birajpatel28043 жыл бұрын

    Hi sir. Can you do a video explaining how to implement JavaScript's setTimeout & clearTimeout in C? Thank you!:)

  • @JacobSorber

    @JacobSorber

    3 жыл бұрын

    Oooh. Good idea. I'll add it to the topic list and see what I can do. Thanks!

  • @MKhan-eq6pj
    @MKhan-eq6pj3 жыл бұрын

    Man you're a great guy. Could you do an algorithms playlist? Like a course like playlist! that would help so much and of course do it in C plsss

  • @JacobSorber

    @JacobSorber

    3 жыл бұрын

    Thanks. I'll see what I can do.

  • @godfather7339

    @godfather7339

    3 жыл бұрын

    @@JacobSorber 1 second him, I love ur teaching style

  • @KFlorent13

    @KFlorent13

    3 жыл бұрын

    @@JacobSorber Do you have any plan to tackle how to properly decompose your program ? What I mean is that I constantly seem to create circular dependencies with my .h files which are quite annoying to get rid of...

  • @Cosm1nx
    @Cosm1nx2 ай бұрын

    my saviour

  • @satriohalim7213
    @satriohalim7213Ай бұрын

    Hi Jacob, just wanna say your video is amazing and really helped me to learn Trie Data Structures. If i could ask, can you help me with printing words that starting with a certain prefix word? like print words starting from "da". I'm new in this thing and sometimes getting confused. I will be very gratefull if you can aswer me. Thank You Jacob!.

  • @ekinigdr2030
    @ekinigdr20303 жыл бұрын

    We can use that structure to sort numbers right ? I was messing around with it and i realized if i put random numbers and print breadth first fashion i get them in a sorted order.thanks by the way for all the videos youve created

  • @JacobSorber

    @JacobSorber

    3 жыл бұрын

    Yes, you definitely can. The sorted nature of the trie is one of the main advantages over something like a hash table. And, you're welcome. I'm glad you're enjoying them.

  • @lllbenderlll
    @lllbenderlll3 жыл бұрын

    WoW Man!!! This is nice 👍!!!

  • @JacobSorber

    @JacobSorber

    3 жыл бұрын

    Thanks! Glad you like it.

  • @lllbenderlll

    @lllbenderlll

    3 жыл бұрын

    @@JacobSorber it Would be very nice if you do a playlist with all your "data structures" videos. I like your style of explanation, it's awesome!!!

  • @ash1992saga
    @ash1992saga2 жыл бұрын

    I didnt understand the printtrie_rec function why the memcpy was done and how did he retrive the strings in the proper ascii alphabetical order ? Someone pls helpp me understand this .

  • @gregoryfenn1462
    @gregoryfenn14623 жыл бұрын

    Doesn't line 14 at 8:15 raise an error as you are using newnode (in the right-hand express) before defining it? How does malloc know what side to allocate if newnode isn't defined yet?

  • @diceandbricks

    @diceandbricks

    3 жыл бұрын

    Newnode is defined. Compiler parses the statement left to right, even though the code evaluates malloc before the assignment. This is a common idiom.

  • @monochromeart7311
    @monochromeart73112 жыл бұрын

    Why not just use memset(newnode, 0, NUM_CHARS * sizeof(trienode*)) to fill the array with 0s?

  • @JMoloneyProjects
    @JMoloneyProjects Жыл бұрын

    Since when can you dynamically assign a buffer length at runtime? unsigned char buffer[length + 1]; (When length is an unknown to the compiler function argument)? In my experience this should hit a compiler error. Am I missing something?

  • @JacobSorber

    @JacobSorber

    Жыл бұрын

    C99, but the rules were adjusted further in C11, I think.

  • @JMoloneyProjects

    @JMoloneyProjects

    Жыл бұрын

    @@JacobSorber That's cool, C++ needs to take note. Just tried in C++17 and it says no. How exactly doe it work? It couldn't put it onto the stack because it doesn't know how large it is to pre-allocate it. Does it write the code to malloc and free in the compiler?

  • @guangtingyu5609
    @guangtingyu56093 жыл бұрын

    Question: you don't free the memory after calling malloc, does that cause a memory leak?

  • @diceandbricks

    @diceandbricks

    3 жыл бұрын

    Depends. The OS automatically frees all resources when the process ends. For a short-lived process like this, the programmer can rely on the OS to perform that clean-up. A long-running process should manage and free its own resources.

  • @BookofYAH777
    @BookofYAH777 Жыл бұрын

    1:20 "now we got it sorted out and we just call them tries" but then you can't distinguish it from "try" statement

  • @JacobSorber

    @JacobSorber

    Жыл бұрын

    True, but at least a try statement isn't a data structure, so maybe we can use context clues to figure out what we're talking about.

  • @BookofYAH777

    @BookofYAH777

    Жыл бұрын

    @@JacobSorber that makes sense

  • @junbird
    @junbird3 жыл бұрын

    8:25 Why not use calloc instead of malloc, so that the array would already have been initialized to NULL values, without the need of manually looping on it? By the way, really great video, I would guess that this data structure is basically what Java refers to as "TreeMap", right? Like, you could replace that boolean with a pointer to something and if the pointer is NULL, the map doesn't contain that symbol, otherwise it returns that pointer (or something like that).

  • @jordanchen23

    @jordanchen23

    3 жыл бұрын

    I tried doing it this way for my first trie program and it worked, but when you run valgrind it reports memory leaks. The reason is seems to be that calloc sets all the values to 000000000000s, which isn't the same as NULL. As such, you must initalize all the pointers in a node to NULL unless you're using that memory block to read what's already in a file. So although calloc gives you a nice whole chunk of memory, you still have to do the looping, and the time you'd need to initilaize all the child nodes to NULL would probably end up taking longer than if you just used malloc in the first place.

  • @junbird

    @junbird

    3 жыл бұрын

    @@jordanchen23 mhm really? I really don't get why though, isn't NULL in C just an alias for 0?

  • @jordanchen23

    @jordanchen23

    3 жыл бұрын

    @@junbird correct me if I'm wrong, but ptrs are technically only instructed to "select" one byte at a time, specifically the 0th byte of whatever datatype the ptr itself is representing. What happens when your ptr requests memory is that the compiler will return to it the specified length of addressable memory space which is supposed to be off-limits to any unrelated processes hereafter. But until you set each data-containing part of the ptr-to-struct to null, anything within those spaces will remain as garbage value. Therefore, there is no guarantee that a program won't accidentally overwrite these spaces, so the compiler will just flag these as problems unless told otherwise.

  • @viacheslav1392

    @viacheslav1392

    Жыл бұрын

    @@junbird null isn't necessary 0

  • @jamalhasan6624

    @jamalhasan6624

    Жыл бұрын

    @@junbird null is 0x0

  • @mikelong3444
    @mikelong34442 жыл бұрын

    🔥🔥🔥

  • @emanalsbeiheen5619
    @emanalsbeiheen56193 жыл бұрын

    What is the best IDE for C langauge?,,,,, and Thank you for this great content

  • @_tobii

    @_tobii

    3 жыл бұрын

    Its subjective. It depends on what you need It to do

  • @DFPercush

    @DFPercush

    3 жыл бұрын

    Visual Studio (not Code) community edition if you're on Windows, for Linux probably Code::Blocks, maybe Eclipse, I'm not sure if IntelliJ does C but that's another one. I'm sure a decent search engine will turn up a few results with pros and cons, but tbh I find Visual Studio's debugger integration hard to beat.

  • @emanalsbeiheen5619

    @emanalsbeiheen5619

    3 жыл бұрын

    @@DFPercush Thank you

  • @emanalsbeiheen5619

    @emanalsbeiheen5619

    3 жыл бұрын

    @@_tobii I just wanna learn data structure with C,, Im Android developer

  • @_tobii

    @_tobii

    3 жыл бұрын

    @@emanalsbeiheen5619 are you working on Linux?

  • @klarnorbert
    @klarnorbert2 жыл бұрын

    Can you do one about radix tree?

  • @JacobSorber

    @JacobSorber

    2 жыл бұрын

    Probably. I'll add it to the list. Thanks.

  • @benjaminshinar9509
    @benjaminshinar95093 жыл бұрын

    wait, what? in the recursive function, did you really declare an array on the stack with a variable length? how does that work? I thought it was illegal to do, doesn't that mess up the stack frame size?

  • @fgdou

    @fgdou

    3 жыл бұрын

    No, he declares an array on the heap (malloc), but he forgot to free it Sorry to disappoint you 😁

  • @benjaminshinar9509

    @benjaminshinar9509

    3 жыл бұрын

    @@fgdou I'm asking about line 48, in the print_trie recursive function.

  • @diceandbricks

    @diceandbricks

    3 жыл бұрын

    @@benjaminshinar9509 yes Variable Length Arrays are legal in C99. It dynamically allocates array on the stack and automatically cleans up. It only works in function scope, not global nor static. Jacob might have a video about it. My advice is use them sparingly and with caution. If you make the array too large it can fill up your stack space.

  • @benjaminshinar9509

    @benjaminshinar9509

    3 жыл бұрын

    @@diceandbricks oh that explains it. I learned strict Ansi89 C before moving on to other languages.

  • @Timz2U
    @Timz2U3 жыл бұрын

    Great tutorial again! BTW, could you please make your code at the center of the screen in the next video? The subtitle just overlaps the code ;)

  • @Timz2U

    @Timz2U

    3 жыл бұрын

    oops, I just realized the subtitle is movable

  • @nicholas_obert
    @nicholas_obert3 жыл бұрын

    Nice to see you always use make even for dirt simple projects.

  • @JacobSorber

    @JacobSorber

    3 жыл бұрын

    It's a good habit.

  • @NeigeFraiche
    @NeigeFraiche3 жыл бұрын

    Instead of bool trieinsert(**trienode, ..), why not pass *root by ref? ie trienode *&root

  • @virajdesai2955

    @virajdesai2955

    3 жыл бұрын

    Passing by reference like that is a concept only in C++, not C.

  • @mhamdmarch8709
    @mhamdmarch87093 жыл бұрын

    You are amazing pls dont stop uploading videos 😍😍😍🤩🤩🤩🤩😘😘😘😘😘😘😘 Love u sooo much

  • @JacobSorber

    @JacobSorber

    3 жыл бұрын

    Thanks.

  • @JohnHollowell
    @JohnHollowell3 жыл бұрын

    in createnode(), shouldn't it be mallocing a "sizeof trienode" and not "sizeof *newnode"? It should be the size of the struct (and not a pointer to that struct) and should also be the trienode type and not the newnode variable. I didn't even know you could use sizeof with a variable

  • @thetastefultoastie6077

    @thetastefultoastie6077

    3 жыл бұрын

    newnode is a pointer to a trienode Therefore dereferencing the pointer with the Dereference Operator '*' results in a trienode type So sizeof(trienode) and sizeof(*newnode) are equivalent The advantage to using the type of the variable, instead of using the type directly, is that if you can change the type of the variable, you don't need to update the typeof statement

  • @JacobSorber

    @JacobSorber

    3 жыл бұрын

    Right. What Mr. Tasteful Toastie said.

  • @JohnHollowell

    @JohnHollowell

    3 жыл бұрын

    totally makes sense. I was thinking of it as a pointer and not dereference. Gotta love C's overloading of the * symbol for multiple things

  • @user-he4ef9br7z
    @user-he4ef9br7z3 жыл бұрын

    Can somebody tell me more about stdbool? I'm new to C.

  • @auz7977

    @auz7977

    3 жыл бұрын

    essentially, C (unlike some other languages) doesn't have a boolean data type built into the language itself. most of the time, people would use an integer with a value of 1 to mean "true", and 0 to mean "false". stdbool is a part of the standard library that adds a bool data type as well as "true" and "false" macros to use with it. so instead of writing something like this: int someCondition = 1; // 1 means true with stdbool, you can do this: bool someCondition = true;

  • @thetastefultoastie6077

    @thetastefultoastie6077

    3 жыл бұрын

    @@auz7977 Actually, C99 added a builtin boolean type called _Bool, and guarantees that any two true values will compare equal (which was impossible to achieve before) So stdbool.h doesn't add a boolean data type, but it does have these defines: #define bool _Bool #define true 1 #define false 0 #define __bool_true_false_are_defined 1

  • @auz7977

    @auz7977

    3 жыл бұрын

    @@thetastefultoastie6077 this is true. i just thought it would be easier to explain stdbool without having to explain _Bool. imo, using _Bool without stdbool is almost pointless. although i suppose i could have explained a bit better.

  • @thetastefultoastie6077

    @thetastefultoastie6077

    3 жыл бұрын

    @@auz7977 oh ok. I've met some seasoned C software engineers who didn't realise C had real booleans. Just nice to know it's a thing.

  • @OlliS71
    @OlliS71 Жыл бұрын

    Better program such Things in C++, it's more descriptive if you use the OOP features and you could casually teach some generic programming in this case. I think the most common trie data structure is the mapping trie which every memory management unit of modern CPUs has. The speciality with that is that the prefix-length of each branch at the same level has the same number of bits.

  • @SeshachalamMalisetti
    @SeshachalamMalisetti2 жыл бұрын

    type struct Node { Children map[rune]*Node }

  • @SimGunther
    @SimGunther3 жыл бұрын

    Wait until you see what hash array mapped tries and its cousin CHAMP have in store for you! :D

  • @JacobSorber

    @JacobSorber

    3 жыл бұрын

    I know. The fun never ends. 😂

  • @cmdlp4178
    @cmdlp41783 жыл бұрын

    create_code could have been just a simple call to calloc, that already initializes all members to 0 Also, the stack memory used is O(k²), where k is the trie-depth

  • @TheRealLughnatic

    @TheRealLughnatic

    3 жыл бұрын

    I think malloc is better in this case, since viewers may want to port this to other languages, and it may not seem immediately obvious how to port calloc to another language.

  • @pierreabbat6157
    @pierreabbat61573 жыл бұрын

    Trie it, you'll like it.

  • @JacobSorber

    @JacobSorber

    3 жыл бұрын

    😀

  • @nephatrine
    @nephatrine2 жыл бұрын

    I always have pronounced it like "tree".

  • @argustuft
    @argustuft2 жыл бұрын

    Does anybody still use Patricia tries ?

  • @argustuft

    @argustuft

    Жыл бұрын

    When I looked at Patricia tries (many years ago) I couldn't find a robust way of deleting nodes, so this limited their use. Great method for string searching and hash tables. It would be a good subject for Jacob to address.

  • @argustuft

    @argustuft

    Жыл бұрын

    Patricia allows fixed size nodes with variable length strings node: (leftptr,rghtptr,bitindex,strgptr)

  • @KamillaMirabelle
    @KamillaMirabelle3 жыл бұрын

    Have you considered to move to rust?

  • @muhammadwasim8787
    @muhammadwasim87873 жыл бұрын

    Just wishing if we get it implemented in java.

  • @fgdou
    @fgdou3 жыл бұрын

    Ho no! You forgot to free the array in the recursive function! 😂

  • @Mozartenhimer

    @Mozartenhimer

    3 жыл бұрын

    Don't need to, it's allocated on the stack.

  • @casperes0912
    @casperes09122 жыл бұрын

    I can't get over the fact that you place the pointer-stars in the wrong place. int* name is so much cleaner than int *name. Yes it goes wonky if you have multiple assignments on one line, but I don't think that's good to do in general in the first place, at least not if you're mixing pointers and values. int* says "this is an integer pointer" int *name says "This is an integer........ oh yeah, just a pointer to it" I can even accept int * name. In that case it can be seen as a qualifier similar to long int or unsigned int. Just space add extra type info. It is also more consistent with the way you cast things (unsigned char *) - the pointer-star is itself a free-standing bit of information, not tied to the name. But that's just me. Everyone has their own view on this I guess :)

  • @rustycherkas8229

    @rustycherkas8229

    2 жыл бұрын

    Wrong place??!! typedef struct { int foo; int bar; } myStruct_t, *ptrMyStruct_t; The variable - not the datatype - is the pointer. One cannot extend "int* name;". One CAN extend "int *name;" to "int *pVar1, *pVar2;" Time to "get over" it if you want to write good 'C' code.

  • @lewiscole5193
    @lewiscole51933 жыл бұрын

    Professor Sorber is obviously trying to introduce what tries are to neophyte programmers. But as with any data structure, it's important to understand the advantages and disadvantages of the use of any particular data structure to know when it's appropriate to use one versus another. So FWIW, let me emphasis what Professor Sorber waved his arms at, namely that tries are good for data retrieval, but they get there from here usually at the cost of using more memory to gain speed. But there are some instances where hash tables can beat tries when it comes to look up speed without the higher memory cost and if you want to do something other than exact string look up, say like partial string look up or sorting strings in addition to just trying to retrieve them, then some other form of tree like a ternary tree might be a better choice than a trie. Just for giggles, if anyone is interested in ternary trees, here's a paper that describes using ternary trees for searches, sorts, and partial string searches: i