EFFICIENT COUNTING USING BITMAPS FOR SYSTEM DESIGN

Bitmaps are old but the solutions are new, in this session you will learn how to count the distinct values using bitmaps!!
#counters #distributedsystems #bitmaps
#microservice #learnmicroservices #totorialssystemdesign #microservicestutorials
#systemdesigntips #systemdesign #computerscience #learnsystemdesign #interviewpreperation #amazoninterview #googleinterview #uberinterview #micrsoftinterview

Пікірлер: 42

  • @treylitefm
    @treylitefm3 жыл бұрын

    This channel is a gold mine

  • @10clicks-freesoftwareengin51
    @10clicks-freesoftwareengin514 жыл бұрын

    Good One!

  • @ibrahimshaikh3642
    @ibrahimshaikh3642 Жыл бұрын

    Great video , keep doing it

  • @przemal86
    @przemal863 жыл бұрын

    I giggle everytime I see thumbnail for this video xD. Great sense of humor and content.

  • @user-oy4kf5wr8l
    @user-oy4kf5wr8l4 жыл бұрын

    Thank you for this amazing video. :D This bitmap looks like bloom filter :D

  • @dataguy7013
    @dataguy70134 жыл бұрын

    Naren, DAU and MAU are key metrics online copmanies calculate using Data warehouses. You just simplified the calculations for that significantly! Nice explanation.

  • @umalaxmi972
    @umalaxmi9722 жыл бұрын

    Thankyou Narendra, very good explanation. You always explain DS as though it's a cake walk. Why have you stopped posting new videos? Hope you are doing safe and healthy. I was just thinking, why don't we store the data to day-on-day information in database with 4 bit integer for logged in per day basis or it can even be an entry into a row type DB for the last logged in date with a bit 0/1 per user. Anyways we would need to think of a purging system as we can't retain this for long even in case of bitmap array

  • @marriagedance8657
    @marriagedance86574 жыл бұрын

    Great Video Sir, But I have few questions. Where do we store this bitmap? In Db or Memory? If in Db, do DBMS providers provide this functionality inbuilt? If in memory, are there any client libraries providing this? Also, what about persistance/fault tolerance if we are storing in memory?

  • @b.ravirao1655
    @b.ravirao16554 жыл бұрын

    Hi Narendra , That was very helpful. I was thinking of an improvement in there. How about using multiple bit map arrays instead of one single bitmap array ? Say , we can use 200 bitmap arrays for 200 M users and another array of 200 size which will store count of active users in ith bitmap array. This way, we can reduce time complexity by not traversing ith bitmap array if the active users for that bitmap array is 0. Correct me if I am missing any edge case. Thanks.

  • @daleffi
    @daleffi10 ай бұрын

    hmmm i don't know about that. Imagine the concurrency problem you would have to update this map as the logins are happening like crazy! Also, How would you efficiently traverse the array to update the bit correspondent to the user logging in?

  • @mitsukiorichimaru4511
    @mitsukiorichimaru45113 жыл бұрын

    correct me if i got it wrong, this bitmap array index went with the assumption that we would've users ids as integers right? But that's not the case in most dbs, we use mostly the uuid's then how would we correctly map/say array bitmap index corresponds to this user id? do we use some kind of hash function? if yes, then how we'd go write the unique hash function as it has collision problems ? Thanks

  • @shiwang789
    @shiwang7892 жыл бұрын

    1. our user ids might not be auto incrementing integers. in that case, I will need to store the mapping of a user id with auto incrementing integer first and for every request have to check the mapping first to make updates in bitmaps. I guess we can use something like in memory datastore like redis for that. 2. Also, we could have used count min sketch to get approximate numbers for these analytics. Thoughts?

  • @_modiX
    @_modiX4 жыл бұрын

    The last ID of your Bitmap is 999.999.999, not 1 Billion. For archive purposes this is storage efficient, but a simple SQL-DB wouldn't take much more. You can make relations and store its statistics this way, you don't have to store the entire user data every day. Also how is the system able to determine if the user is the same user or a different user? Where is the information stored? I still need a solid SQL DB, because I need to store key information to identify the user non the less. But other than that, yes, bit shifting is very fast to compute active users from specific days. As soon as you have to plot those, it gets slower, though.

  • @TechDummiesNarendraL

    @TechDummiesNarendraL

    4 жыл бұрын

    Good catch about 999..., Yes you are right we need a persistent db to store who is who, this is about collecting metrics in lighting fast with less overhead. I am sure not many people will use this in there daily life. But some systems do implement. Eg: Routers to find the unique combination of source vs destination IP address for various security implementation. citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.11.724&rep=rep1&type=pdf

  • @_modiX

    @_modiX

    4 жыл бұрын

    @@TechDummiesNarendraL Thanks for reply, yes this is a good reason and would've been a good example in your video as well as the clarification that this is not a replacement for a database to address interested beginners in this environment. Good video nonetheless. 👍🏻

  • @Emmanuel-px9lk
    @Emmanuel-px9lk4 жыл бұрын

    Perhaps I missed this, but how would you actually traverse the array to find all active users? Would you have to do a linear time scan and count how many '1's are there ?

  • @Chikaros

    @Chikaros

    3 жыл бұрын

    Yes.

  • @bishnuagrawal828
    @bishnuagrawal8282 жыл бұрын

    how would we handle case where userid is alpha numeric? How would we map a userid into a particular index uniquely?

  • @chidov
    @chidov3 жыл бұрын

    One question about the userId, as we know some system will use UUID as user id, in this case how could we set the order of the user in the bitMap?

  • @shrutikamboj4607

    @shrutikamboj4607

    2 жыл бұрын

    We can have a separate mapping for that.

  • @prakashsharma91
    @prakashsharma913 жыл бұрын

    How come iteration over 1 million bits is taking only 100ms?

  • @praveenjain183
    @praveenjain1833 жыл бұрын

    Liked the video, 1 question: what if my userIds are not integers but guids? any efficient way for the same?

  • @patrick7932003

    @patrick7932003

    2 жыл бұрын

    You can perform 2 things: 1. Use Row Number if there is any (recommended) 2. Use a Hash Function

  • @krishnaMurari48
    @krishnaMurari483 жыл бұрын

    we will need to load of all the 125 MB data of last n days in memory to count active users?? can we do it more efficiently??

  • @patrick7932003

    @patrick7932003

    2 жыл бұрын

    Hyperloglog can do this in a more space-efficient way (1-64 KB in size). However, it would sacrifice a little bit of accuracy.

  • @girig6421
    @girig64214 жыл бұрын

    Thanks for the video. Can a system like this account for users deleting their account? If the user was removed from the bitmap, it would mess with the indexing.

  • @TechDummiesNarendraL

    @TechDummiesNarendraL

    4 жыл бұрын

    Not really, even if the user is deleted the IDs of the existing users remains same.

  • @girig6421

    @girig6421

    4 жыл бұрын

    @@TechDummiesNarendraL If you had 10 users in your bitmap and you wanted to remove the 5th user, your bitmap would a length of 9. However the IDs of user 6,7,8,9 would then become 5,6,7,8.

  • @a.yashwanth

    @a.yashwanth

    4 жыл бұрын

    @@girig6421 🧐 why do you change user id for users whenever some user deleted his account?

  • @subbucs5674

    @subbucs5674

    4 жыл бұрын

    @@girig6421 It is expected that the user-id is auto-incremented. We can keep separate bitmap of user_deleted. So, total-active-users = total-logged-in-users - total-deleted-users. The use-case mentioned by Tech Dummies is for statistics purpose.

  • @ResonantFractal

    @ResonantFractal

    4 жыл бұрын

    @@girig6421 User ids should all be unique or strictly monotonically increasing. Else the design becomes practically unusable, viz., wrt (historical) metrics.

  • @sumonmal009
    @sumonmal0093 жыл бұрын

    THIS COMMENT IS FOR MY PERSONAL REFERENCE. TO UNDERSTAND PROPERLY WATCH THE FULL VIDEO -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- concept 2:45 bitmaps array 5:23 active user count 6:56 user login everyday 8:17 different device count 10:07

  • @bowang1825
    @bowang18254 жыл бұрын

    this is useful but how come it can be used in real world example?

  • @marlegagaming1274

    @marlegagaming1274

    3 жыл бұрын

    Analytics probably, "Get top K items purchased today", you could do all this in Redis, instead of an MR/Spark Job.

  • @Rxlochan
    @Rxlochan4 жыл бұрын

    Do bitmap have only 0 & 1 ?

  • @TechDummiesNarendraL

    @TechDummiesNarendraL

    4 жыл бұрын

    Bitmaps are made of bits and a bit can only have 1 or 0

  • @a.yashwanth

    @a.yashwanth

    4 жыл бұрын

    @@TechDummiesNarendraL *binary.

  • @paulhetherington3854
    @paulhetherington38542 жыл бұрын

    Count = French for: I'm already, a damn terroriste! How are we, dummies then? ie = #4, in XP- Orient!