The effect of Random UUID on database performance

Ғылым және технология

In this video I whiteboard how UUIDs hurt write (and read performance) when used on secondary and primary indexes. UUIDv4 are the most popular but they are truly random. Compared to snowflakes, ULID or UUIDv7 or even UUIDv1.
0:00 UUIDv4
2:30 B+Tree Indexes and UUID
5:30 Random UUIDv4 Insert Workload
12:40 Ordered Insert Workload (UUID7/ULID, Sequence)
14:00 Shared buffer pool flushes
15:00 Shopify ULID use case
17:00 URL shortner UUIDs?
Discovering Backend Bottlenecks: Unlocking Peak Performance
performance.husseinnasser.com
Fundamentals of Backend Engineering Design patterns udemy course (link redirects to udemy with coupon)
backend.husseinnasser.com
Fundamentals of Networking for Effective Backends udemy course (link redirects to udemy with coupon)
network.husseinnasser.com
Fundamentals of Database Engineering udemy course (link redirects to udemy with coupon)
database.husseinnasser.com
Follow me on Medium
/ membership
Introduction to NGINX (link redirects to udemy with coupon)
nginx.husseinnasser.com
Python on the Backend (link redirects to udemy with coupon)
python.husseinnasser.com
Become a Member on KZread
/ @hnasr
Buy me a coffee if you liked this
www.buymeacoffee.com/hnasr
Arabic Software Engineering Channel
/ @husseinnasser
🔥 Members Only Content
• Members-only videos
🏭 Backend Engineering Videos in Order
backend.husseinnasser.com
💾 Database Engineering Videos
• Database Engineering
🎙️Listen to the Backend Engineering Podcast
husseinnasser.com/podcast
Gears and tools used on the Channel (affiliates)
🖼️ Slides and Thumbnail Design
Canva
partner.canva.com/c/2766475/6...
Stay Awesome,
Hussein

Пікірлер: 157

  • @hnasr
    @hnasr9 ай бұрын

    database fundamentals course database.husseinnasser.com

  • @MrAtomUniverse

    @MrAtomUniverse

    9 ай бұрын

    This is good , i'm on udemy business, do keep more courses open to business plan users! thank you! great videos you have there

  • @vasiliynet3425
    @vasiliynet34259 ай бұрын

    Awesome! Thanks. Dialectic of randomness and order. Very beautiful.

  • @bishalkandel9976
    @bishalkandel99769 ай бұрын

    Let's make it bigger.

  • @barebears289

    @barebears289

    9 ай бұрын

    That's what she said😂😂

  • @ahmedalaaeldin7362
    @ahmedalaaeldin73629 ай бұрын

    Actually, I was thinking about that question a couple of days ago Thank you :)

  • @berndeckenfels
    @berndeckenfels9 ай бұрын

    Yes when you age them out, they might leave gaps in the index tree but also reuse the gaps regularly. They might not “nowhere near each other” but always in between. Sp with larger pages the splits are less likely. (Of course IOT is to a good usecase)

  • @AffyisAffy
    @AffyisAffy9 ай бұрын

    The whole time I was wonder why not just insert on a new index rather than pointing to a UUID, that way it's always ordered. But then you said MySQL defaults to this at the end of the video. That's critical for the 'why' of this video

  • @tirthwork9288

    @tirthwork9288

    9 ай бұрын

    He said that in the start of the video as well

  • @berndeckenfels

    @berndeckenfels

    9 ай бұрын

    You can use a sequence but keeping them has its own scalability problem, plus the pk won’t be stable

  • @therealtuyen
    @therealtuyen9 ай бұрын

    I just randomly stumbled upon your video, and I've always had this itch about UUID performance. Your video really boosted my confidence in what I was thinking. Thanks a lot! Oh, and quick question: Do databases like MariaDB, MySQL, or PostgreSQL automatically play nice with ULIDs? And how do they know to sort 'em out for indexing?

  • @nirnullz

    @nirnullz

    9 ай бұрын

    Postgresql, mysql 8++, mariadb 10.7++ support uuid internally. Postgresql have best implementation since it’s saved as 128bit binary but can be accessed as string without custom function on client. MariaDB/MySql client need to call certain function if you want to query a binary uuid field.

  • @therealtuyen

    @therealtuyen

    9 ай бұрын

    @@nirnullz I asked about ULID, bro ?

  • @consistentlearners
    @consistentlearners9 ай бұрын

    Happy Teachers Day Husseinn

  • @gam1l
    @gam1l7 ай бұрын

    This one cracked me up 09:59 😂 Great work as usual 👏 Please consider making a video on Pinecone and vector databases 🙏

  • @usamaabubakar3061
    @usamaabubakar30618 ай бұрын

    Will this concept be applied to columns that are Varchar and indexed? Because strings are also random like username, email, url slugs, etc.. If so, what is the workaround for storing string values as indexed?

  • @RealEvangelizer
    @RealEvangelizer9 ай бұрын

    Ive used UUID generated outside the database and stored as primary key varchar within the db. Its worked a lot better for over 100 million devices that I was working with. Primary goal was to use it as hashtables when use other systems like redis/dynamo etc. For lookups, with a bit off magic in UUID generation which is not being generated by DB, you can apply Bloom filters as well. I've been out of engineering field for a decade+ but these sort of videos are always much needed to discuss the fundamentals of how things work.

  • @xD-saleem
    @xD-saleem9 ай бұрын

    thanks so much its make sense now

  • @tempaccount8256
    @tempaccount82569 ай бұрын

    Never thought, Hussein will come with a pun in the video "that's what she said"

  • @mikestaub
    @mikestaub9 ай бұрын

    Please do a video on vector databases.

  • @ddanielsandberg
    @ddanielsandberg9 ай бұрын

    MySQL 8 uses UUID v1 (kind of compatible with UUID v2). By using the UUID_TO_BIN()/BIN_TO_UUID() function with the optional second argument 'swap_flag' it will reformat the UUID before converting it to a BINARY(16). This will make them sequential (since UUID v1/v2 is based on the timestamp).

  • @someoneanonymous6707

    @someoneanonymous6707

    9 ай бұрын

    How costly is that operation tho?

  • @cedricbrisson7240

    @cedricbrisson7240

    9 ай бұрын

    uuidv1 is not secure tho

  • @joesb

    @joesb

    9 ай бұрын

    @@cedricbrisson7240 "What security and why should it apply for this case?" uuidv1 has some characteristic that make it possible for someone who want to *intentionally* clash/guess your id, they might be able to. If YOUR OWN service is creating this id, why would you intentionally do it? Don't just parrot that something is insecure. Knows if it applies to your use case and attack vector as well.

  • @cedricbrisson7240

    @cedricbrisson7240

    9 ай бұрын

    Jesus chill lol It's simply good practice to avoid using vulnerable services/tools when safer alternatives can be used. As someone on a CSEC team I'd rather pre-emptively protect myself than wait for someone to find a use case for a specific flaw :) But sure, I'm just "parroting". You're the kind that praises himself as a software engineer thinking their above "programmers" right?

  • @joesb

    @joesb

    9 ай бұрын

    @@cedricbrisson7240 do you think “integer” is secure? Do you avoid using integer? It’s nonsense statement. Uuid v1 is insecure for a certain use case. Using it as an id you generate yourself is not one of them.

  • @alegon2007
    @alegon20079 ай бұрын

    What about distributed NO SQL DBs (document, key-value, etc) where the recommendation is to use partition keys with high cardinality and avoid sequential values because that will create “hot” partitions for inserts at least? Maybe its a different use case but it would be an interesting topic too

  • 9 ай бұрын

    where did you read that? Could you share the article?

  • @mishikookropiridze

    @mishikookropiridze

    9 ай бұрын

    Different scenario.

  • @ehfoss

    @ehfoss

    9 ай бұрын

    There are 3 new uuids in rfc draft. ulid is equivalent to uuidv7. uuidv8 let's you define your own data. And uuidv6 has a machine key followed by a timestamp akin to ulid which can help with the situation you're describing.

  • @mishikookropiridze

    @mishikookropiridze

    9 ай бұрын

    Also possible problem is that ULID doesn't have RFC, so implementation can differ.

  • @avinashboppudi

    @avinashboppudi

    9 ай бұрын

    @alegon2007, that is true for NO SQL DB's as it does not use B+ tree datastructure to store data. If I take DynamoDB as an example, It relies on something like partition key to hash and figure out the partition. This makes sure your single partition is not going to be overloaded with requests. While for sort key, it's recommended to keep it sequential if possible as internally in the shard it must have been using a B+ tree to sort on the sort keys

  • @user-fl6up9jo1w
    @user-fl6up9jo1w9 ай бұрын

    so what about storing uuid in hashmap index?

  • @buddy.abc123
    @buddy.abc1239 ай бұрын

    He made it bigger.

  • @julianpurse956
    @julianpurse9569 ай бұрын

    What about database performance of using a uuid stored in varchar column vs new db versions which support a native uuid type

  • @jacob_90s

    @jacob_90s

    9 ай бұрын

    Doesn't change the problem, and in fact you'll probably get even worse performance since you'll have a less compact data type.

  • @utkarshsaxena8034
    @utkarshsaxena80349 ай бұрын

    How about when we go ahead and distribute the data. Won't this make the last shard hot all the time?

  • @PabloGnesutta
    @PabloGnesutta9 ай бұрын

    What about NOSql databases like mongo db?

  • @usun_development3947
    @usun_development39479 ай бұрын

    So what are we supposed to use as a primary key identifier in MySQL, if we use distributed database setup?

  • @ferrysuhandri

    @ferrysuhandri

    9 ай бұрын

    using UUID for distributed database is fine. the video is about using identifier in general case. using distributed database is a special case.

  • 9 ай бұрын

    does it mean that, generally, indexing on a random value column is hurt?

  • @fullstackdotnet

    @fullstackdotnet

    9 ай бұрын

    it is, but dont start avoiding it and enforcing new guidelines yet, wait till you reach shopify level scale where you can benefit from that sort of performance improvement. its all about ROI

  • @RZhuAmpere
    @RZhuAmpere9 ай бұрын

    10:00 would have never expected that joke from you 😂

  • @CaleMcCollough
    @CaleMcCollough9 ай бұрын

    Great info. I need a tutorial on how to do this with PlanetScale. I ordered your Udemy class. You should add it to that.

  • @Iandavidandrino
    @Iandavidandrino9 ай бұрын

    I was just thinking about this 😵

  • @OzoneGrif
    @OzoneGrif9 ай бұрын

    The worst is using UUID with SQL Server; since the PK is Clustered; performance becomes awful very quickly. I use Postgresql and never had performance issues with UUIDs.

  • @ShinigamiZone

    @ShinigamiZone

    8 ай бұрын

    Can you tell me a bit more about this? What means clustered in that case for SQL Server? I just structured a whole DB with PK UUIDv4 in SQL Server but it’s not shipped to prod yet 👀

  • @ShinigamiZone

    @ShinigamiZone

    8 ай бұрын

    At least the prod system is not high-performance required I expect around 2000 writes per day and 1_000_000 reads per day And let’s say 10_000_000 rows in total with around 10 tables that are read mainly, 5 more tables for not so much read tables and 2 views (maybe need to be materialized views)

  • @OzoneGrif

    @OzoneGrif

    8 ай бұрын

    @@ShinigamiZone Clustered means physically ordered on the storage. Random UUID used as a Clustered PK (the default on SQL Server) means your records will be fragmented everywhere on your storage, leading to awful performance.

  • @anuragbhagsain9724
    @anuragbhagsain97249 ай бұрын

    lowkey "That's what she said" reference at 9:57 😂

  • @alfrendosilalahi6652
    @alfrendosilalahi66529 ай бұрын

    Any one can explain about what is page in that explanation? i don't really understand

  • @muratdemirturk2847

    @muratdemirturk2847

    9 ай бұрын

    it is like a file that DB put indexes in. Lets say you want to get a record from DB with id number 80. if indexing in pages were not used it would have to scan entire DB to find where a record with id=80 saved. This wont be an eficient way of getting data. this is actualy how file system work. In DB it uses indexes that points which sector that record are saved. And pages are basicly where those indexesare saved and its pointers are where actual data/record are saved. So DB first goes to pages and finds index. then looks for the exact record from where index is pointing. Long story short pages are phonebook for DB...

  • @tmbarral664
    @tmbarral6649 ай бұрын

    Hum…Hussein, I’m a kind of surprised here…. To a point I maybe doubtful…. Here you’re showing the ordering of the whole rows ( as stating item as key value) when we’re talking about indexes (indices). Indexes are ordered (by nature ;)) and are pointing to the value, no matter where the value is. But here, you’re saying the primary index, the one for the primary key, is not behaving as a normal index…. If that’s the case, that’s a major (I should use uppercase here) problem. That means that we shouldn’t be allowed, in any case(uppercase again) to use anything but autoincrement number ! See, that’s so big it makes me doubt.

  • @berndeckenfels

    @berndeckenfels

    9 ай бұрын

    For clustered index, also called IOT the value is in the index - however even if the value is not there you need to shuffle them

  • @redpillsatori3020
    @redpillsatori30209 ай бұрын

    Pro tip: watch at 1.5x speed. You're welcome

  • @Notion615
    @Notion6159 ай бұрын

    set page fill to 71% and periodically rebuild your index at low fragmentation percent and problem solved for indexes on uuid4

  • @durgeshchoudhary

    @durgeshchoudhary

    9 ай бұрын

    can you please point to documentation for these? would be helpful.

  • @ehfoss

    @ehfoss

    9 ай бұрын

    What shower of a db requires that you rebuild it all the time haha

  • @Notion615

    @Notion615

    9 ай бұрын

    @@ehfoss set up a service that runs every so often that rebuild your index programatically during periods of low load.

  • @Notion615

    @Notion615

    9 ай бұрын

    ​@@durgeshchoudhary To my fellow neckbeards who wish to know the black arts. kzread.info/dash/bejne/nKxhqNeHoJfMf6Q.html

  • @ankurdutta3277
    @ankurdutta32778 ай бұрын

    I learned this the hard way. In mongodb I keep getting 'document not found' even though it does.

  • @simon3121
    @simon31219 ай бұрын

    Missing the point. A UUID allows an application to create the ID rather having a two step approach of asking the DB for an ID first.

  • @AnOmegastick

    @AnOmegastick

    9 ай бұрын

    Applications can create a ULID.

  • @shammahrashad
    @shammahrashad8 ай бұрын

    His videos are so interesting but so slow. I can easily watch it at 2x speed.

  • @jameswestbrook5709
    @jameswestbrook57099 ай бұрын

    how about ULID thought?

  • @mardix

    @mardix

    9 ай бұрын

    ULID leaks timestamp. For sensitive data, you may not want to expose this info.

  • 9 ай бұрын

    if a B-tree index on UUID is hurt, will the hash index on UUID be better?

  • @ferrysuhandri

    @ferrysuhandri

    9 ай бұрын

    hash index for secondary index is possible, but for primary key not all systems can set it with hash. also, uuid is large, we should only use it as primary key when we really need.

  • @amrojjeh

    @amrojjeh

    8 ай бұрын

    My guess is that it'd be better but would still suffer because inserts would be stored far apart, and so you'd still have the caching issue

  • @michawhite7613

    @michawhite7613

    Ай бұрын

    Updates are slower on hash indices in the worst case. When a bucket becomes full, you now have to update every single bucket, since ID % (number of buckets) is now resulting in a different value for every single ID

  • @saggitt
    @saggitt8 ай бұрын

    If your IDs are public and you make them sequential, be aware that third parties will know the number of entities created over a time period.

  • @malibudancer8472
    @malibudancer84729 ай бұрын

    This is very narrowly scoped on dbs that don't optimize for distributed cases...

  • @bernard-ng
    @bernard-ng9 ай бұрын

    I messed up 😂😂😂 gotta talk to my CTO as soon as possible

  • @AnkurVermachords
    @AnkurVermachords9 ай бұрын

    I am confused at the statement that indexes are b+ tree and it gets indexed and it gets balanced but if gets balanced isnt the order gets messed up...i am a bit confused 😐 🤔

  • @Aspartame12

    @Aspartame12

    9 ай бұрын

    The tree gets balanced by shifting which node is treated as root , balanced binary search tree concept.

  • @AnkurVermachords

    @AnkurVermachords

    9 ай бұрын

    @@Aspartame12 but if change the root isnt the root becomes the first node then where is the order still same confuision

  • @Aspartame12

    @Aspartame12

    9 ай бұрын

    @@AnkurVermachords order in a balanced BST is never there, it's for search only , in a balanced bst search takes less time compared to unbalanced that's the whole premise, it also supports less than and greater than queries easily.

  • @AnkurVermachords

    @AnkurVermachords

    9 ай бұрын

    ​@@Aspartame12he mentioned the ds to be ordered at 3.40 .. aaaaahhhhhhh i am confused now, had the idea of tree now doesnt have the idea of plant as well😂

  • @Aspartame12

    @Aspartame12

    9 ай бұрын

    @@AnkurVermachords Okay, so a BST is ordered in the sense that there is a order in which the nodes are arranged left child is smaller right child is bigger and so on, if you balance the tree your root value changes , that means you do not have ready access to smallest value or largest value, but the middle value ( approx ) , so while searching for something you start from middle value ( root ) and go left or right to find it. You can also write a simple recursive/stack based code to print contents of a BST in ascending sorted order or descending sorted order.

  • @vmaxxk
    @vmaxxk9 ай бұрын

    I've seen DBs with millions of records across various tables perform just fine with UUIDs leveraged everywhere. Either the performance difference is negligible or it only affects extremely large datasets.

  • @willi1978

    @willi1978

    9 ай бұрын

    i think so too. usually data has no correlation between each other for example customers in a table. for timeseries data uid v7 can be used

  • @IvanRandomDude

    @IvanRandomDude

    9 ай бұрын

    Databases are much more powerful than we think.

  • @experimentalhypothesis1137
    @experimentalhypothesis11377 ай бұрын

    does this apply also for uuid5?

  • @philheathslegalteam

    @philheathslegalteam

    6 ай бұрын

    Yes. UUIDv6 is the implementation which is proposed to fix this

  • @michawhite7613
    @michawhite7613Ай бұрын

    If you need an index on usernames, won't you have the same problem?

  • @VTOLfreak
    @VTOLfreak9 ай бұрын

    And now you get locking contention on that last page. Fix one problem, create another.

  • @SmartK8
    @SmartK89 ай бұрын

    You never had to distribute databases into multiple locations, that insert independently and then access global data. I use UUIDs for all the data tables. You have 100 databases with millions of IDs. Just insert independently whenever you want, the worst thing that can happen is a rare collision hypothetically (never happened to me yet). They will still be OK with UUIDs. Now imagine this with auto-increment integer. Insanity. How would you keep track of this auto-increment ID and ensure it's unique across all databases, other then synchronizing them. Auto-increment integers as a key are for script kiddie databases, when you have one local database. I still wouldn't use integers even then. Because you never know, whether you'll need distributed database later.

  • @ConsuelaPlaysRS

    @ConsuelaPlaysRS

    9 ай бұрын

    "Auto-increment integers as a key are for script kiddie databases" I'm willing to bet the house that most tables in most databases powering most applications use auto-incrementing PKs lol

  • @SmartK8

    @SmartK8

    9 ай бұрын

    @@ConsuelaPlaysRS That doesn't automatically mean there's a good reason to have such keys. Also most databases are probably local file databases. Everyone can do whatever they want. I'm just saying after some gathered experience, you realize that little gain in being sequential is worth sacrificing for a more robust and secure database.

  • @shubhambravo

    @shubhambravo

    9 ай бұрын

    I work for a major b2c org and we use auto increment IDs to store order info 😅

  • @Ryan-gf1sz

    @Ryan-gf1sz

    9 ай бұрын

    @@ConsuelaPlaysRS While I agree that's true and it's convenient in many case,it just won't work in large distributed system.

  • @omniphage9391

    @omniphage9391

    9 ай бұрын

    for the longest time you wouldnt distribute your databases if you could avoid it, i know that sql sharding exists but that is something you would have to build your persistence layer around. With the rise of nosql databases this has changed, although that in turn also has implications for design. Its engineering. The perfect solution is the one that works.

  • @MrAtomUniverse
    @MrAtomUniverse9 ай бұрын

    For postgresql, why would anyone use UUID , SERIAL is the default method for incremental ID right?

  • @dz5483

    @dz5483

    9 ай бұрын

    Perhaps you don't want to reveal the IDs of your records such that it becomes easy to query your records incrementally. This could be a security risk in API available publicly

  • @gamemoves2415

    @gamemoves2415

    9 ай бұрын

    ​@@dz5483I'm thinking when you get the record from the DB on the front end rather you show the UUID.

  • @gamemoves2415

    @gamemoves2415

    9 ай бұрын

    ​@@dz5483I'm thinking when you get the record from the DB on the front end rather you show the UUID.

  • @sasonbraha4599

    @sasonbraha4599

    9 ай бұрын

    I remember reading about an incident in GithHub involving auto-incremented counters: "For example, in one incident at GitHub [13], an out-of-date MySQL follower was promoted to leader. The database used an autoincrementing counter to assign primary keys to new rows, but because the new leader’s counter lagged behind the old leader’s, it reused some primary keys that were previously assigned by the old leader. These primary keys were also used in a Redis store, so the reuse of primary keys resulted in inconsistency between MySQL and Redis, which caused some private data to be disclosed to the wrong users." Kleppmann, Martin. Designing Data-Intensive Applications (p. 250). O'Reilly Media. Kindle Edition.

  • @MrAtomUniverse

    @MrAtomUniverse

    9 ай бұрын

    @@dz5483 in this case, people use slug which is unique , otherwise it's like youtube watch?v=qRv5g5MEF2k , let me know i anyone has a better method.

  • @AbhishekSingh-pu6dg
    @AbhishekSingh-pu6dg8 ай бұрын

    The silent joke at 10:00 That's what she said. Haha, classic.

  • @mitchdigitalnetwork
    @mitchdigitalnetwork9 ай бұрын

    😂😂😂😂 Seems to be a little bit small, so let's make it bigger............ that's what she said 9:56

  • @rollinOnCode
    @rollinOnCode9 ай бұрын

    how do you detect and troubleshoot bad page splits?

  • @banterfc2073
    @banterfc20737 ай бұрын

    Why not use auto incremented IDs instead

  • @michawhite7613

    @michawhite7613

    Ай бұрын

    My understanding is that's not a problem in-of-itself. But when you have another vulnerability, this will make it way worse. For example, say I found a way to see a user's phone number. I do this for user 1, then user 2, etc. I can run the attack on every single user rather quickly. If the IDs are random, then I have to guess and check user IDs, and I won't be able to affect as many people.

  • @shubhambravo
    @shubhambravo9 ай бұрын

    Use escalidraw

  • @adarshkr532
    @adarshkr5329 ай бұрын

    did he just make a that's what she said joke at 10:00 😂

  • @CaimAstraea
    @CaimAstraea9 ай бұрын

    They never made sense to me. There's no force powerful enough in the universe to convience me I need them over autoincremented integers.

  • @jwbonnett
    @jwbonnett9 ай бұрын

    UULD is not 100% accurate, the issue is the datetime ticks, not every generation, generates a tick. So you can still have this shuffling, though it won't be to the same extent.

  • @ttst3
    @ttst38 ай бұрын

    10:01 we said it at the same time 😂

  • @yadneshkhode3091
    @yadneshkhode30919 ай бұрын

    👍

  • @DirkSteinkamp
    @DirkSteinkamp8 ай бұрын

    Sorry, but having a B+ tree with a page size of 2 is very close to degenerate the B+ tree to a binary tree -- and binary trees either need to be rebalanced regularly, or might degenerate into a linear list. Rebalancing is an expensive operation, especially if it involved a secondary storage like HDD / SSD. And that's what B+ trees are optimized for: to minimize HDD access, leveraging having some leeway for inserts and deletes without "reshuffling", and generally keep the tree depth low. This requires the page size to be considerably larger than two, and be harmonized with the block size on the disk. Besides that, different DBMS might offer different variants of indexes, e.g. hash indexes, that are not implemented as trees, but as hash tables. Depending on the use case this might be the index type to choose, and your argument becomes void. I personally don't like UUIDs very much in many use cases, but for other reasons: first is they are not easily graspable for humans, they don't tell a story, don't have some inherent use case information in their structure, ... yet they are very useful when randomness is required, because the systems involved only synchronize their data eventually. But besides that: too much cognitive load ... Anyhow: I don't buy your argument, sorry.

  • @amrojjeh

    @amrojjeh

    8 ай бұрын

    Hash table would still suffer from performance as it would have to load from disk if inserts aren't stored close to each other

  • @DavisTibbz
    @DavisTibbz9 ай бұрын

    What about turning the uuid to binary(16) ?

  • @ArunGordon
    @ArunGordon9 ай бұрын

    That's what she said 🤭

  • @BrianWoodruff-Jr
    @BrianWoodruff-Jr8 ай бұрын

    I get the sense you hate indexed random UUIDs in databases.

  • @kraigochieng6395
    @kraigochieng63959 ай бұрын

    10:02

  • @ByteBite_Tech
    @ByteBite_Tech9 ай бұрын

    the speed of the video is 1.25x

  • @hipertracker
    @hipertracker9 ай бұрын

    Using numbers as the example is misleading. UUID are not ordered by design so you cannot sort them. I do not understand why the index has to be sorted? What about hash indexes?

  • @berndeckenfels

    @berndeckenfels

    9 ай бұрын

    You can sort them by comparing their raw byte or hex string representations, both is used depending on your db type

  • @hipertracker

    @hipertracker

    8 ай бұрын

    @@berndeckenfels yes, but it doesn't make sense, you have no control over the sort order.

  • @berndeckenfels

    @berndeckenfels

    8 ай бұрын

    @@hipertracker for an index which is supposed to find a specific UUID the sort order does not matter as long as it can be used to find it.

  • @michawhite7613

    @michawhite7613

    Ай бұрын

    Hash indices are even slower to update. Instead of updating a few nodes, you now have to update every single bucket, since ID % (number of buckets) is now resulting in a different value for every single ID.

  • @Cthulch
    @Cthulch9 ай бұрын

    Do you need to have your index ordered? A linked list? TERRIBLE. A tree? gosh! I always imagined random uuids would be utilizing hash table indexes, which is completely useless to sort. Using random numbers with a sorted index seems like a pretty dumb mistake. I feel like when you say "index", you don't even consider that there's more than one index type.

  • @michawhite7613

    @michawhite7613

    Ай бұрын

    Hash indices are really fast for select queries, but they make the updates even worse. If a page gets full, now instead of just updating a few nodes, you have to update every single bucket, since the ID % (number of buckets) will result in a different value now.

  • @Cthulch

    @Cthulch

    Ай бұрын

    @@michawhite7613 If it was a student who wrote the hashing lib then yes. In SQL, however, it's so well optimized that the page overflow happens extremely rarely and it's not as dramatic as it may seem. Plus in vast majority of cases it's totally worth it to have inserts longer if it means having O(1) for reading. And the complexity of the inserts is also linear or at least log(n), so insignificant.

  • @jamesh4129
    @jamesh41299 ай бұрын

    First!

  • @JohnCDSMB

    @JohnCDSMB

    9 ай бұрын

    Congratulations

  • @gradientO

    @gradientO

    9 ай бұрын

    Incredible achievement

  • @karasira2696

    @karasira2696

    9 ай бұрын

    Wow

  • @franciscos.2301

    @franciscos.2301

    9 ай бұрын

    ​@@gradientObrought to you by the Honda Gang

  • @andrewbrown8463
    @andrewbrown84636 ай бұрын

    There is a big difference between a database that understands a uuid type and is storing it as a number compared with a uuid stored as a string (varchar). Ultimately though using either a int64/bigint or uuid is the right way to create a primary key for a table. Never should a user entered value be a primary key. Otherwise the rest of this video is mostly garbage or irrelevant in the scheme of things when it comes to performance in a database.

  • @silicalnz
    @silicalnz9 ай бұрын

    you've never worked with distributed databases 😂

  • @roy20021

    @roy20021

    9 ай бұрын

    can you elaborate?

  • @voicevy3210

    @voicevy3210

    9 ай бұрын

    ​@@roy20021you need uuid to handle data migrations, data should be independent of where it's located

  • @malibudancer8472

    @malibudancer8472

    9 ай бұрын

    @@roy20021 he is correct.

  • @roy20021

    @roy20021

    9 ай бұрын

    @@malibudancer8472 ok, can you elaborate a bit more?

  • @malibudancer8472

    @malibudancer8472

    9 ай бұрын

    @@roy20021 I don't really have time to spend time on a stranger on youtube, and you can't really post links on youtube without getting your content deleted. So I can only guide you what to google for. Read about the partitioner of Cassandra and the hash function used. Search for partitioning hot spots and educate yourself about it. Also all relevant databases optimize their indexes for truly random UUIDs and allow tuples to be indexed. That is all I can do for you, if I could I would have posted you direct links instead. Thank youtube for being a P**e of s***.

  • @mmm-ie5ws
    @mmm-ie5ws6 ай бұрын

    why do you have to make this a 20 min video???

  • @solitary200
    @solitary2008 ай бұрын

    ULID > UUID

  • @RaduCruceru
    @RaduCruceru9 ай бұрын

    All talk without showing numbers. You can theorize on how the index performs all you want, but you should take the time to run some performance tests before-hand and present the data. Otherwise this video is just basic guessing and a waste of time.

  • @nbme-answers

    @nbme-answers

    7 ай бұрын

    I appreciate the conceptual overview

  • @AtomicCodeX

    @AtomicCodeX

    5 ай бұрын

    What you expect? Watch him calculate how long it takes to conclude which method they use? You do realise he’s talking about how it works, a performance test won’t help with that. If you don’t have time to learn and just need to know which is faster, then just do that. Instead of shutting on other peoples work.

  • @opticalmouse2
    @opticalmouse29 ай бұрын

    You're like my college professor who lives in a fantasy world.

Келесі