Distributed Sorting - Google Interview Question - Algorithm & System Design - Full 2 Hour Interview

If you were given 1 TB of data and asked to sort it using 1000 computers, how would you do it. This is a Google senior interview question, and you are currently watching the optimum solution. In this video, we will do the full 2-hour interview stages together and get hired!
Outline of this interview question video:
* 0:00 Overview
* 2:22 Problem Visualization
* 4:40 Design Document
* 5:22 Problem Definition
* 5:35 Requirements Analysis
* 9:20 System Design
* 14:35 Complexity Analysis
* 15:17 Implementation
* 18:23 Tests
* 19:12 Discussion
* 22:02 Conclusion
First off, let's check out the agenda for this video. We will start by visualizing the problem. Then we will continue with a concise design document. This is probably going to be on a whiteboard in a real interview, so we will keep it short. In the document, we will define the problem, do requirements analysis, do system design using a simple diagram, and finally do the complexity analysis. Then we will actually code and implement this system. Then we'll add some tests. And we will close it off with a discussion with the interviewers.
I will take you through the entire 2-hour interview in sections and do every section with an in-depth analysis. You are about to see the most in-depth late-stage senior interview analysis on the internet. So, sit back, relax, and turn up your brain to the max.
Now let's move onto the problem in hand. In this question, you are given 1 TB of data sitting in a database, and 1000 computers each with 1.5 GB of RAM. And you are asked to sort this data as fast as possible.
This is a distributed sorting question asked by Google in a senior engineer interview. This was asked during the final interview round, and it is a hard question. In addition to the question being hard, the discussion held with the interviewer around other possible solutions, possible improvements, and external factors that might affect the performance, will test your deeper understanding of the topic. The source of this question is Hacker News, and you can find several versions of this question being discussed by ex-Googlers on HN. Obviously, I cannot reveal the source directly, but I am keeping a record of these questions as I find them to see how frequently they are asked. Distributed computing questions seem very frequent at senior Google interviews. On a side note, the original question was asked to be implemented using Python or C++ on a Kubernetes cluster. Finally, 2 hours is allocated for this interview, including design, implementation, and discussion. We will look into multiple approaches to solve this problem and investigate different requirements along with other variants that you might be given in an interview situation.
My simplified distributed sorting algorithm implementation:
* github.com/soygul/QuanticDev/blob/master/algorithms/distributed-computing/distributed-sorting/distributed-sorting.js
My k-way merge and tournament tree implementations and their tests:
* github.com/soygul/QuanticDev/tree/master/algorithms/merge
* github.com/soygul/QuanticDev/tree/master/algorithms/trees/tournament-tree
Chromium Project Design Document Template:
* docs.google.com/document/d/14YBYKgk-uSfjfwpKFlp_omgUq5hwMVazy_M965s_1KA/edit
My "K-Way Merge" video, which explains k-way merge which is used extensively in this video:
* kzread.info/dash/bejne/iqNpltClgKu5oMo.html
My "Software Engineering Compensation Guide" video to help you estimate what you should be paid:
* kzread.info/dash/bejne/ZqVlqamQYJfOmto.html
If you want to read or contribute, you can find this guide on:
* quanticdev.com/algorithms/distributed-computing/distributed-sorting
My Algorithms Playlist:
* kzread.info/head/PLlPRnMzqjADqDZFBqdwzbIjf71h8xYs4x
- - - - - - - - - - -
quanticdev
quantic_dev
quanticdev.com

Пікірлер: 32

  • @QuanticDev
    @QuanticDev3 жыл бұрын

    I will publish my free courses on more advanced topics on quanticdev.com soon. Check them out when I publish them. I greatly appreciate any feedback. Edit: I realized that I failed to mention inter-node communication. A message queue like RabbitMQ or even better, streaming message passing software like Apache Kafka would fit into our use case very well. They both have options for high-availability and message persistence. Hence we can sacrifice 3-4 nodes to be our dedicated message queue handlers.

  • @RAHULSINGH-zg9hp

    @RAHULSINGH-zg9hp

    3 жыл бұрын

    Kafka would be a better option since it gives default storage of 7 days which can be tuned as well

  • @QuanticDev

    @QuanticDev

    3 жыл бұрын

    @@RAHULSINGH-zg9hp I agree. It's more work but is a better long-term solution.

  • @ShubhamSingh-ku2ow
    @ShubhamSingh-ku2ow2 жыл бұрын

    Amazing content!! Thanks for bringing such high quality material. Looking forward to many more like this!!

  • @SergeyZarin
    @SergeyZarin3 жыл бұрын

    great lesson!

  • @hannnah689
    @hannnah6893 жыл бұрын

    Thanks for the vid, super useful!

  • @lidajones6983
    @lidajones69833 жыл бұрын

    Awesome content and explanation

  • @mrrank
    @mrrank2 жыл бұрын

    Good one

  • @sangrammallick3304
    @sangrammallick33043 жыл бұрын

    Awesome videos..will be waiting for more. Wish I had option to add more likes

  • @svenschmidt1735
    @svenschmidt17353 жыл бұрын

    Thanks for the video, definitely dense with lots of important information. I didn't get the distributed partitioning with 1000 value ranges at all, so a follow-up on d.p. would be greatly appreciated. Thumbs up

  • @QuanticDev

    @QuanticDev

    3 жыл бұрын

    Yes I skimmed through it to keep the vid short. I will prepare an animated vid for it as it is best explained visually.

  • @akashagarwal6390
    @akashagarwal63904 ай бұрын

    why not thinking of external merge sort in the first part?

  • @ducatidiavel4021
    @ducatidiavel40213 жыл бұрын

    Do the remaining 997 nodes read data packets from the database in parallel ? How do you ensure that each of those 997 nodes read their segment of the data correctly? Or do the control group issue these triggers sequentially ? I'm asking this cos assume the control group issued parallel triggers with exactly some way mentioning which segment of the data that specific node must read, what happens when it results in a read failure or the Node goes down at the time of the read. Do you receive some form of acknowlegement from the node ? Very nice video btw. Your videos are amazing. Its a wealth of info.

  • @QuanticDev

    @QuanticDev

    3 жыл бұрын

    Good point. They would read the data in parallel to keep things simple. In this case, just checking for heartbeat is not enough. The nodes must acknowledge the successful read. Otherwise controller group can handle that node as a failed one, and redistribute its range to all other nodes as usual. I didn't think of read failures. There are probably many other smaller things that I failed to think about. I'll add them to the article version of this video when I finish it. Thanks for bringing it up!

  • @ducatidiavel4021

    @ducatidiavel4021

    3 жыл бұрын

    @@QuanticDev Thanks for explaining this :)

  • @SiddharthKulkarniN
    @SiddharthKulkarniN3 жыл бұрын

    This video is very interesting. Have a like and a Sub.

  • @QuanticDev

    @QuanticDev

    3 жыл бұрын

    My pleasure.

  • @lquixada
    @lquixada3 жыл бұрын

    great content indeed! it's hard to find video explaining how to scale algorithms with multiple machines. Do you know any other good resources to find content like this one?

  • @QuanticDev

    @QuanticDev

    3 жыл бұрын

    I'm yet to come across even a single video like this one that is complete with the code, implementations, and animated descriptions. Mostly what I find is generic whiteboard systems design stuff, which is probably sufficient if you never actually do design but only acquire partial information as interview prep.

  • @lquixada

    @lquixada

    3 жыл бұрын

    @@QuanticDev thanks for the quick response. Is there any good sites, repos or books that you would recommend?

  • @QuanticDev

    @QuanticDev

    3 жыл бұрын

    @@lquixada "Designing Data-Intensive Applications" is the holy grail of all systems design books. The systems design section on the "Elements of Programming Interviews" is also pretty good, if you just want to learn stuff as an interview prep.

  • @Tweakimp
    @Tweakimp3 жыл бұрын

    Additional question I would ask: When can I start

  • @QuanticDev

    @QuanticDev

    3 жыл бұрын

    If you can get 3 quarters of this interview right, I would directly say "as it suits you".

  • @user-oy4kf5wr8l
    @user-oy4kf5wr8l3 жыл бұрын

    Thanks so much, buddy.......senior level interview, what does this mean , how many years of experience is considered as senior ? i am gonna do a sde 2 interview. i really hope that for system design question won't be so hard.. could u please give me any suggestion plz ? Thank u in advance.

  • @QuanticDev

    @QuanticDev

    3 жыл бұрын

    I can't define seniority by years but if you are going for sde 2, solving the ever generic systems design questions should suffice (design a tinyurl clone, etc.). I don't actually know a great YTer with great resources on systems design actually. If you find any, let me know so I can recommend in the future.

  • @user-oy4kf5wr8l

    @user-oy4kf5wr8l

    3 жыл бұрын

    @@QuanticDev there is an Indian guy making great video called Gaurav sen 😊

  • @user-oy4kf5wr8l

    @user-oy4kf5wr8l

    3 жыл бұрын

    @@QuanticDev thank you buddy 🎊

  • @QuanticDev

    @QuanticDev

    3 жыл бұрын

    @@user-oy4kf5wr8l my pleasure

  • @QuanticDev

    @QuanticDev

    3 жыл бұрын

    @@user-oy4kf5wr8l I'll check his channel out. Thanks.

Келесі