Th30z Code
Жыл бұрын
28,049
1

Database Auto increment vs UUID - Which is Right for You?

Ғылым және технология

Discover the pros and cons to use an Auto-Increment ID vs a UUID, ULID or a Snowflake ID for your database table, and when to use one or another.
If you struggling to choose the right ID format for your database? Look no further! In this video, we dive into the world of database IDs, exploring the differences between auto-increment, UUIDs (Universally Unique Identifiers), ULIDs (Universally Unique Lexicographically Sortable Identifiers), and Twitter Snowflake IDs.
Discover the inner workings of each ID format to leverage their characteristics like simplicity and sortable nature for auto-increment IDs.
The globally unique generation and distribution for UUIDv4, the timestamp information of UUIDv7. The compactness of ULIDs and the structural advantages of Twitter Snowflake IDs.
#database #backend #sql #codingtips #programming #databases #databasedesign #backenddeveloper

Пікірлер: 47

@th30z-code10 ай бұрын
If you are interested in ULID vs UUID from a storage point of view check out this video: kzread.info/dash/bejne/ZIGWqKaLZ9TaZbg.html
@JohnWasinger Жыл бұрын
Nice summary about the differences between Auto Increment UUID ULID and Snowflake ID. I never heard of the last one and the timestamped feature of the ULID makes an easy way to check for expired tags that should be replaced.
@th30z-code
Жыл бұрын
If you want to read more about the Twitter Snowflake ID and the Instagram variant, check out their blog posts: blog.twitter.com/engineering/en_us/a/2010/announcing-snowflake instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c And if you want the timestamp feature of ULID but you need the UUID compatibility UUIDv7 are the way to go: www.ietf.org/archive/id/draft-peabody-dispatch-new-uuid-format-04.html#name-uuid-version-7
@jpboy196211 ай бұрын
I do both. Each table has an ID and a UID. ID is completely isolated to the backend. It is never exposed to the client, but is used for primary and foreign keys. The data type is a 64 bit Int. The UID is a Base62 encoded GUID. The client only uses the UID. All the pros, few of the cons. Thanks for the excellent video.
@gabrielspeed5464
11 ай бұрын
Well.. considering one of the main pros of a UUID is concurrent ID generation, pretty sure having both makes that impossible... As you need a sequencer/serializer to create new IDs...
@th30z-code
11 ай бұрын
true that everything gets queued on insertion, so you lose a bit of concurrency. but if the workload is mostly read and there are a lot of join or lookups on the ID you may save some CPU. also my guess is that the UUIDs here are not generated on the db side but on the application side
@ByronScottJones
11 ай бұрын
@@gabrielspeed5464not at all. In my system, the UUID is generated by the client when they first run the app. When they create their account, the database creates an autoincremented ID, and stores the UUID in the database. There's no reason they have to be created at the same time.
@ryanbtc
9 ай бұрын
So you filter by using UUID? I have worked for two enterprise software that use uuid just like the way you do, when the database is big enough, it will be slower than the ID
@ByronScottJones
9 ай бұрын
@@ryanbtc no, the client receives back the primary key. When they make future requests, they provide both the UUID and the PK. We lookup by PK, and confirm it's the right UUID.
@skorp567711 ай бұрын
Amazing video, thanks a lot! The thorough and dwtailed discussion gave a great overview over the issue! :) Just one small suggestion: maybe increase the font size to make the text easier to read on mobile devices. There's plenty of whitespace :)
@th30z-code
11 ай бұрын
Thanks for the tip, and glad the video was useful
@tesla177211 ай бұрын
when using db like mysql where Secondary index stores primary key as value. Using uuid (128 bits) will increase the size of secondary indexes.
@drumforhim3 ай бұрын
Very well presented. Thanks!
@anonymous_anonymity11 ай бұрын
Very helpful and well explained. Thanks a lot.
@th30z-code
11 ай бұрын
Thank you, glad it was helpful!
@syntheticfox_Ай бұрын
When choosing between server vs client side ID generation largely depends on security requirements of your application. For any external facing API, you will need to generate IDs on the server in order to keep the integrity of the data healthy. UUIDs can be generated on client or server.
@timothyconnally216711 ай бұрын
I like the idea of client side id’s. This could be timestamp + session ID, making it sortable. Server (database) sequence id’s are typically not guaranteed to be chronologically ordered as sequences are often cached for efficiency. Oracle, for example, defaults to 20 sequences in a cache. Two sessions can and often do mix their cached sequences out of chronological order. This well known and documented in several Oracle forums. I want to add my own concern for what I call the “identity crisis”. An autogenerated I’d has no meaning. It is *not* relational per Cobb. I see too many table designs ignore the need for a natural key to properly define what is being stored. This leads to sloppy inefficient designs. Primary keys should be exogenic data, not entropic autogenerated id’s. ‘Nuff said. 😅
@DomskiPlays11 ай бұрын
Finally I had this question for years
@th30z-code
11 ай бұрын
I'm glad it was useful!
@khomo1211 ай бұрын
Excellent! Thanks!
@th30z-code
11 ай бұрын
Thank you! glad it was helpful.
@uchennaofoma4624 Жыл бұрын
Whoa. This video is amazing
@th30z-code
Жыл бұрын
Thank you, glad to hear that! is there anything in particular that you liked? so I can expand more on that kind of topics or keep doing more of something that you liked
@adrianbool456811 ай бұрын
Very nicely presented video; but I feel the conclusions section let it down. I'd never use a client supplied "random" value as a key in my database. Doing so opens the system to a whole bunch of attacks. You're expecting an ID from the client that, by pure chance, can't practically collide with an existing value but then what happens to your system when someone is intentionally injecting into your system values that already exist? The Snowflake ID was obviously defined to allow for a server generated ID without most of the downsides you list in the conclusion section for Server Generated IDs. (e.g. Relying on a single generation source.)
@th30z-code
11 ай бұрын
Yeah, sorry you are not the first one reporting the confusion between "client side" and "server side" even if quoted. what I meant was more like IDs generated and passed to the storage/service (client side) vs IDs generated by a single source which is probably the storage itself (server side). so things like Snowflake Ids to me are "client side" since they are not coordinate or created by a central point, but yeah on the comparison I left them a bit out and assumed the larger IDs. you are also right that is not that simple as passing the ID and use that directly, but it's easier to explain the concept of coordinated vs uncoordinated operations without having to go through all the security stuff. For example when you work with mobile apps, and you have to work offline you always have things like (user_id, device_id, uuid) which means that an attacker must have the device session to be able to play with the uids directly. thanks for the feedback!
@LettersAndNumbers30011 ай бұрын
Are dbms’s smart enough to use the timestamp in v7 to sort / filter quickly? Mainly interested in SQL server but also open to graph DBs.
@th30z-code
11 ай бұрын
Even if the database doesn't know about the type (UUIDv7), the layout of the UUIDv7 (the timestamp in the beginning) is made specifically to make the IDs sort by timestamp, and you can leverage btree indexes. so, you can make an efficient between, less-than, greater-than query even if the uuid is stored as binary or text (hex retain the numeric order) '01893bdf-37a3-0000-0000-000000000000' If the database implementation is aware about the UUIDv7 type, and gives you the ability to filter only by timestamp. it can just look at the first 48bit, and discarding the rest. the sort order given by the uuid layout is already correct.
@LettersAndNumbers300
11 ай бұрын
@@th30z-code awesome, thanks
@dariuszlenartowicz8 ай бұрын
Using blindly the same big type id for all datasets (tables) is a big misunderstanding. In our company we reduced our databases in half migrating to appropriate I'd types.
@JohnVandivier11 ай бұрын
Followed!
@th30z-code
11 ай бұрын
thank you!
@clementseiller125511 ай бұрын
You can generate all types UUID you want on the server side too. Your list of pros and cons is a bit disingenuous as it could lead some to believe that you can only generate auto increment on the server side. Which is completely false. As a general recommendation, do the most you can on the server side as client side is not something you own nor control. This video present good use case to generate id on the client side but again, that is something you want to avoid.
@th30z-code
11 ай бұрын
yeah you are right, the "client side" and "server side" even if quoted may be misleading. what I meant was the difference between an ID generated automatically by the database or the service versus an ID generated by the application itself and passed directly to the database or the service. also I agree that if you can avoid to generate IDs on the "client side" (application) is better. but there is a whole category of application (mobile apps, batch jobs, ...) that must be able to work offline. In that case "UUIDs" saves you a lot of trouble on the reconciliation phase.
@BigFattyNat11 ай бұрын
Did i miss the plot here? I send a new row to my backend db from the clientside, just the meaningful fields from the client, then the db itself creates the auto incr id as an auto generated field on its own... Where is the problem?
@th30z-code
11 ай бұрын
There is no problem with that approach, and most application can and should work in that way. The limitation of waiting "a coordinator" to generate the id only applies to system that works offline or system with verify high traffic load where the reconciliation phase gets too complicated or the round-trip time is too expensive. Programmers tends to adopt techniques used by other large companies even if they don't have the same problems. so, if your code is simple and you haven't felt the necessity to use anything different from an auto-increment, keep using them. there is nothing wrong.
@tesla1772
11 ай бұрын
When using a master-master cluster auto incrementing ids cause problems. It can lead to id duplications
@tomaszn92018 ай бұрын
UUIDS are hundreds, thousands time slower to query (big tables) than integers.
@sdfsdf421df11 ай бұрын
correction: uuids can be/should be stored as 128b number, so it's not _that_ worse than other numeric data. from my point of view: - never, ever, ever trust client to 'generate' unique id for your system. Client side generated IDs is bear trap and your system should just take it as: "yeah, this value user wan't to send as his note but we don't work with it, and it cannot be used to uniquelly identify record, so if he asks for data base on it, we always return list." - server generated ids are predictable, and if you care about it (like in user id maybe), you might not want to use it -- you said it. - server generated ids from sequence in db(or other source of single truth) is only way IIUC to have predictable value-based pagination, uuid types cannot be used for predictable&reliable sorting in multi-node system, IIUC. If I'm setting up new system: - I dont allow user to force me specific ID to use at my side as ID - I have ID from sequence for internal use like value-based pagination, user never sees this value - I have uuid PK user works with.
@th30z-code
11 ай бұрын
It's true that UUID can be stored as 128bit (16 bytes), but unfortunately most system don't really leverage 128bit registers or use vectorization, so it will be still 2 64bit comparison operations. at some point all systems will be there and more and more databases have UUID has a special type so you'll get optimization for free once they improve the code. but that's just part of the problem with larger keys. You'll have more disk I/O (it's double the space usage compared to int64), your cache will have less space available for the data and so on... As someone else pointed out the "Client side" generation probably is a bit misleading even if quoted. what I meant was the difference between an ID generated automatically by the database or an ID generator instance versus an ID generated by the "application" itself (which may be a service or cron batch processor, ...) and passed directly to the database or the service. the inclusion of the Twitter Snowflake ID was to emphasise this aspect where for scalability you have to find tricks to avoid having the ID generator as a bottleneck. UUIDv7 or ULID or any time-based Id, can probably give you a good enough insertion sort order. If you think about it unless you have a single client you'll always have concurrent insertions, so even with an auto increment which one really started before the other? and I also agree that if you don't need to work offline, the end user application should not generate IDs, but the server should do that, and if you want the safety of exposing non predictable IDs the UUID aside is a good choice.
@LettersAndNumbers30011 ай бұрын
2:54 why does a disconnected client need order IDs? This is silly…
@th30z-code
11 ай бұрын
It's a trick to avoid data reconciliation. If on the client you already have the data you don't need to fetch it again from the server just to get an id, and you don't need to map a client id to a server id. Even if you are storing everything in a single row with all the information in a json you end up with the same problem.
@LettersAndNumbers300
11 ай бұрын
@@th30z-code The order is born on the client, without an ID. When you send it to the server, it gets an ID. It does not need an ID until it becomes part of the datastore on the server. Nobody says it's json or whatever, just in general. I just don't get it I guess sorry.
@th30z-code
11 ай бұрын
assuming you are offline. how do you modify (and identify) the order on the client? how can you add related data to that order. You'll probably have a client id, but then when you finally have connectivity you have to deal with reconciliation (which is difficult, not impossible). and if your server logic is asynchronous you may not have received the server id yet, but on the client side you have produced other modifications. so you keep going back and forth trying to match the client id to the server id. I'm not saying that you should always use this approach. If your application does not have the ability to work offline, or at least does not allow to edit things offline. You should go with the simpler approach of sending the data and getting back the id from the server if you need it. but if working offline is the common operation this approach saves you a lot of troubles and a lot of debugging to try to figure out which operation relates to the other.
@awksedgreepАй бұрын
If you have a tiny 3nf database that will never grow bigger than say 1G then UUID is slower, but fine. If you think your database will ever grow in size stick with auto inc.
@chrishabgood890010 ай бұрын
The problem with uuid is it’s not sequential.
@th30z-code
10 ай бұрын
It's not a problem it's a characteristic of UUID v4. If you need an ID that can be generated offline (similar to UUIDs) but with some sequentiality ULID, UUID v7 or Snowflake ids are what you looking for. Checkout this video that talks about the different behaviour, from a storage point of view, of the two type of IDs kzread.info/dash/bejne/ZIGWqKaLZ9TaZbg.html.