Why Google Stores Billions of Lines of Code in a Single Repository

Science & Technology

This talk will outline the scale of Google’s codebase, describe Google’s custom-built monolithic source repository, and discuss the reasons behind choosing this model of source control management. It will include background on the systems and workflows used at Google that make managing and working productively with a large repository feasible, in addition to a discussion of the advantages and trade-offs of this approach.
Presenter: Rachel Potvin

Comments: 176

  • @tivrfoa
    5 years ago

    As she said at the end, this is not for everyone. You need a lot of infrastructure engineers to make it work. Some good things I thought about a monorepo: 1. It's easy to see if you are breaking someone else's code; 2. It makes everybody use the latest code, avoiding technical debt and legacy code.

  • @davidbalakirev5963
    2 years ago

    "everybody use the latest code" is the part I don't get, I'm afraid. Do I depend on the code of the other component, or on a published artifact made from it? It really comes across as a dependency on the source. Do they build the artifact of the dependent library?

  • @laughingvampire7555
    10 months ago

    but if you are using a lot of infrastructure anyway and a lot of custom tooling, then you can also use custom tooling with separate repos and get visibility into when your changes will break someone else's code. This can be part of the CI tooling: rebuild ALL repos in dependency order, as in the sketch below.
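
A minimal sketch of that rebuild-in-dependency-order idea, assuming each repo declares its direct internal dependencies; the repo names and the graph here are hypothetical:

```python
from graphlib import TopologicalSorter

# Hypothetical map: repo -> internal repos it directly depends on.
repo_deps = {
    "libcore": set(),
    "auth": {"libcore"},
    "search": {"libcore", "auth"},
    "frontend": {"search", "auth"},
}

def rebuild_order(changed_repo: str) -> list[str]:
    """Repos to rebuild, dependencies first, after `changed_repo` changes."""
    # Invert the graph: repo -> repos that directly depend on it.
    dependents = {r: set() for r in repo_deps}
    for repo, deps in repo_deps.items():
        for dep in deps:
            dependents[dep].add(repo)

    # Everything transitively affected by the change, including itself.
    affected, stack = {changed_repo}, [changed_repo]
    while stack:
        for d in dependents[stack.pop()]:
            if d not in affected:
                affected.add(d)
                stack.append(d)

    # Topologically sort just the affected subgraph.
    sub = {r: repo_deps[r] & affected for r in affected}
    return list(TopologicalSorter(sub).static_order())

print(rebuild_order("libcore"))  # ['libcore', 'auth', 'search', 'frontend']
```

The same reverse-closure walk works whether the units are repos in a multi-repo setup or directories in a monorepo; the hard part at scale is keeping the dependency graph accurate.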

  • @roytries
    9 years ago

    I cannot decide whether this is a great talk about a new way to manage complex code bases, or some sort of way for Google to convince themselves that working with a gargantuan multi-terabyte repository is the right thing to do.

  • @opinali
    9 years ago

    +Roy Triesscheijn There are other advantages Rachel doesn't even touch... a few months ago I made a change to a really core library, getting a message from TAP that my change affected ~500K build targets. This would need to run so many unit tests (all tests from all recursive dependencies) that I had to use a special mechanism we have that runs those tests in batch during off-peak hours; otherwise it takes too long even with massive parallelism. The benefit is that it's much harder to break things; if lots of people depend on your code then you get massive test coverage as a bonus (and you can't commit any CL without passing all tests). Imagine if every time the developers of Hibernate made some change, they had to pass all unit tests in every application on the planet that uses Hibernate - that's what we have with the unified repo.
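
TAP's exact mechanics aren't described beyond what this comment says, but the selection step it implies is a reverse transitive closure over the build graph. A rough sketch, with made-up target names:

```python
# Hypothetical build graph: target -> direct dependencies.
graph = {
    "//core/strings:lib": set(),
    "//net/rpc:lib": {"//core/strings:lib"},
    "//net/rpc:rpc_test": {"//net/rpc:lib"},
    "//search/index:lib": {"//core/strings:lib", "//net/rpc:lib"},
    "//search/index:index_test": {"//search/index:lib"},
}

def affected_tests(changed: str) -> set[str]:
    """All *_test targets that transitively depend on `changed`."""
    rdeps = {t: set() for t in graph}   # target -> direct dependents
    for target, deps in graph.items():
        for dep in deps:
            rdeps[dep].add(target)

    affected, stack = set(), [changed]
    while stack:
        for t in rdeps[stack.pop()]:
            if t not in affected:
                affected.add(t)
                stack.append(t)
    return {t for t in affected if t.endswith("_test")}

# One change to the core string library selects both test targets.
print(affected_tests("//core/strings:lib"))
```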

  • @roytries
    9 years ago

    Osvaldo Doederlein Isn't it strange that the responsibility that program X works when library Y gets updated lies with the maintainer of library Y? Why not publish libraries as packages (like NuGet for C#)? You can freely update library Y, knowing 100% sure that it will break nobody's code (since nobody is forced to upgrade). The maintainers of program X can freely choose when they are ready to update to the new version of library Y; they can run their own unit tests after updating, and fix problems accordingly. Of course I also see the benefits of having everything in one repository. Sometimes you want to make a small change to library Y so that you can use it better in program X, which is a bit of a hassle since you need to publish a new package. But these days that's only a few clicks. :)

  • @roytries
    9 years ago

    Osvaldo Doederlein I guess it all comes down to this: I understand that there are a lot of benefits, but of course also a lot of drawbacks. I'd guess that pushing this single-repository model to such an extreme, the drawbacks would outweigh the benefits. But of course I have never worked with such an extreme variant :)

  • @opinali
    9 years ago

    +Roy Triesscheijn Some burden shifts to the library's author indeed, but there are remedies: you can keep old APIs around as deprecated so dependency owners eventually update their part; you can use amazing refactoring tools that are also enabled by the unified repo. And the burden on owners of reusable components is a good thing, because it forces you to do a better job there: limiting API surface area, designing so you don't need breaking changes too often, etc.

  • @opinali
    9 years ago

    +Roy Triesscheijn There's some truth in that, but honestly, the pros are way bigger than the cons. For one thing, this model is a great enabler of Agile development because 1) changes are much safer, 2) there is no waste of time maintaining N versions of libraries / reusable components because some apps are still linking to older versions. (Ironically, the day-to-day routine looks less agile because builds/tests are relatively slow and the code review process heavyweight, but it pays off.) The real cost of this model is that it requires lots of infrastructure; we write or heavily customize our entire toolchain, something very few companies can afford to do. But this tends to change as open source tools acquire similar capabilities, public cloud platforms enable things like massive distributed parallel builds, etc.

  • @LarsRyeJeppesen
    7 years ago

    Wonder what the numbers are as of the time of writing (April 2017)?

  • @Borlik
    8 years ago

    Very good talk, very interesting solution. Also scary. I'd love to see more of the real usage, especially whether the exponential growth in usage really corresponds to growth in need. Still, with some more future pruning and a code-extinction mechanism it may survive until the first "Look, we are just too big and have to split" moment :-)

  • @DIYRandomHackery
    8 months ago

    I don't think it existed 7 years ago, but "code extinction" tools exist today; they find "unused" stuff and slowly remove it.

  • @pawelmagnowski2014
    8 years ago

    1. your 1st day at google 2. git clone 3. retire 4. clone finished

  • @timgelter
    7 years ago

    There's no clone. They're using filesystems in userspace (e.g. Linux FUSE). The only files stored on their local workstations are the files being modified.
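
A toy sketch of that idea: a CitC-like client view where only locally modified files live on disk and everything else is served from a repo-wide snapshot. The class and the fetch function are hypothetical stand-ins, not Google's actual API:

```python
from pathlib import Path

def fetch_from_piper(repo_path: str, snapshot_id: int) -> bytes:
    # Stand-in for an RPC to the central repo; purely hypothetical.
    return f"<contents of {repo_path} @ CL {snapshot_id}>".encode()

class OverlayClient:
    """Toy CitC-style view: reads hit a local overlay of modified files
    first and fall back to the repo-wide snapshot otherwise."""

    def __init__(self, overlay_dir: str, snapshot_id: int):
        self.overlay = Path(overlay_dir)  # only edited files live here
        self.snapshot_id = snapshot_id    # version of the repo being viewed

    def read(self, repo_path: str) -> bytes:
        local = self.overlay / repo_path
        if local.exists():                # file was modified in this client
            return local.read_bytes()
        return fetch_from_piper(repo_path, self.snapshot_id)

    def write(self, repo_path: str, data: bytes) -> None:
        local = self.overlay / repo_path
        local.parent.mkdir(parents=True, exist_ok=True)
        local.write_bytes(data)           # edits never touch the snapshot

client = OverlayClient("/tmp/citc_client", snapshot_id=1234567)
client.write("search/index.cc", b"// edited locally\n")
print(client.read("search/index.cc"))   # served from the local overlay
print(client.read("core/strings.cc"))   # "fetched" from the snapshot
```

In the real system this overlay is exposed as a FUSE filesystem, so ordinary tools see one giant checkout without a clone.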

  • @LarsRyeJeppesen
    7 years ago

    Man, way to kill a great joke :)

  • @kimchi_taco
    5 years ago

    Google doesn't use git tho;;

  • @MrMangkokoo
    4 years ago

    @@kimchi_taco what do they use tho?

  • @vijay.arunkumar
    4 years ago

    Claudia Sastrowardojo At 10:40 she talks about Piper and CitC. They come with both a Git-style and a Perforce-style set of CLI commands to interact with.

  • @apparperumal
    4 years ago

    Great presentation, thank you. At IBM, we have been inspired by the monorepo concept and are in the process of adopting trunk-based development with a monorepo.

  • @simonpettersson6788
    6 years ago

    1. We had problems with code duplication, so we moved all our shit into one 84TB repo and created our own version control system. Estimate: 2000 man-hours, plus a 500 man-hour per year per employee overhead. 2. We had problems with code duplication, so we moved the duplicated code into its own repository and imported that into both projects. Estimate: 5 man-hours.

  • @wepranaga
    3 years ago

    oh monorepo

  • @gabrielpiltzer6188
    3 years ago

    I love it. This is the real answer for non-Google-sized companies.

  • @hejiaji
    2 years ago

    99.9% of companies should go for number 2

  • @voidvector
    2 years ago

    2000 man-hours is a small price to pay so the other 10,000 engineers don't need to deal with version/branch mismatches between repos. I worked at a finance company with multiple repos and between 10-100 million LOC. It was a shit show, because they basically had "DLL hell" with internal projects. And due to legal reasons, we had to debug those problems (e.g. reproduce them) to find the root cause of some of those bugs. Suffice to say, you probably don't need to worry about multi-repo/mono-repo unless your codebase exceeds 1 million LOC. Linux runs on a monorepo git with 2 million LOC.

  • @durin1912
    9 months ago

    Does anyone know what the current strategy at Google is, now that 8 years have passed since this talk?

  • @aelamw
    3 years ago

    so you should use SVN?

  • @repii
    9 years ago

    Very impressive work! Thanks for presenting and talking about it, quite an inspiration!

  • @skylvid
    6 years ago

    This video is my happy place.

  • @MartinMadsen92
    2 years ago

    I would love to hear how they manage to build 45,000 commits a day (that's one commit every two seconds) without either allowing faulty code to enter the trunk that thousands of developers are instantly using, or creating huge bottlenecks due to code reviews and build pipelines.

  • @great-garden-watch
    2 years ago

    I can’t even describe how incredibly great the entire system is. The search is insane. The workflow and build tools she mentions are just amazing.

  • @dijoxx
    4 months ago

    Code changes cannot be submitted until they pass all the tests. Not all changes trigger a full build either.

  • @MartinMadsen92
    4 months ago

    @@dijoxx So checks are run locally before pushing?

  • @freeboson
    1 month ago

    @@MartinMadsen92 No, there's a set of tests run by the infra, defined by the projects owning the modified files, at submit time. Then there's a larger, completely comprehensive set of tests that runs on batches of commits. It is possible for changes to pass the first set and fail the second, but for most projects it's rare.
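
A toy sketch of the second stage of that scheme: when the comprehensive suite fails on a batch of commits, bisect over prefixes of the batch to find the culprit. The names and the fake suite are made up:

```python
def find_culprit(batch, suite_passes) -> int:
    """The comprehensive suite failed on a batch of commits; bisect to the
    first commit that breaks it. `suite_passes(prefix)` builds and tests
    the repo state after applying `prefix` commits of the batch."""
    lo, hi = 0, len(batch)            # invariant: prefix lo passes, hi fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if suite_passes(batch[:mid]):
            lo = mid
        else:
            hi = mid
    return hi - 1                     # index of the culprit commit

# Fake batch of 8 commits where the commit at index 5 breaks the suite.
batch = list(range(8))
print(find_culprit(batch, lambda prefix: len(prefix) <= 5))  # -> 5
```

Batching amortizes the cost of the expensive suite across many commits, at the price of an occasional log-time bisection when a batch goes red.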

  • @JoeSteele
    8 years ago

    I am curious how Google handles code that should not be shared between teams (for legal or business reasons). Rachel calls it out as a concern at the end, but I imagine that Google already has this problem today. For example, portions of the Widevine codebase would seem to fall into this category. How do they handle that case?

  • @AnjiRajesh
    8 years ago

    I once read on Quora that "for some internal projects they maintain private repositories up to some point in time, but once that development is completed they merge those repos into the main code base. This is the only case where they have repositories other than the main code base."

  • @prakhar0912
    4 years ago

    Google engineer here. Each team gets to decide which packages have visibility into their code base.

  • @dijoxx
    4 months ago

    Code for highly sensitive trade secrets (e.g. page ranking etc) is private. Everything else can be seen by the engineers and it's encouraged to explore and learn.

  • @rjalili
    8 years ago

    Security around this code base must be the tightest there is, I imagine.

  • @anirudhsharma2877
    5 years ago

    are you thinking of breaking in?

  • @External_Bastion
    4 years ago

    @@anirudhsharma2877 Shouldn't you always think like that?

  • @MaggieMorenoWestCoastSwing
    8 years ago

    Has anyone been able to find the paper that the presenter is referring to?

  • @jeteon
    8 years ago

    +Maggie Moreno It doesn't seem to have been published yet

  • @patrikhagglund4387
    6 years ago

    I assume the paper referred to is research.google.com/pubs/pub45424.html.

  • @gabrielpiltzer6188
    3 years ago

    I'm not sure that I agree with all of the advantages listed. Extensive code sharing and reuse is also called tight coupling. Simplified dependency management is made difficult once 100 teams start using a version of a 3rd party and you want to upgrade it. That leads to large scale refactoring, which is extremely risky. I'm not saying that Google hasn't made this pattern work for them but to be honest, no other software company on the planet can develop internal tooling and custom processes like they can. I don't think that a monorepo is for any company under the size of gargantuan.

  • @willysoesanto2046
    10 months ago

    > Extensive code sharing and reuse is also called tight coupling.
    The thing is, when code sharing is needed, such a dependency will be established regardless of the repository type, i.e. monorepo or multirepo. The idea is to make sure that when such a dependency is needed, there is no technical reason it couldn't happen.
    > Simplified dependency management is made difficult once 100 teams start using a version of a 3rd party and you want to upgrade it.
    Yes, this is intentional. The reason behind it is that they would like each team to play nicely with the others. Dependency hell is a hot-potato problem that can be passed to other teams. Put harshly, the dependency hell problem can be summarized as: I have upgraded my third-party dependency X to version Y, and I don't care how my dependants deal with that. They can either copy my pre-change code into their codebase or spend numerous hours making sure they can upgrade their dependency X to version Y too.

  • @michaelmoser4537
    7 years ago

    My uninformed impression is 'we don't quite understand our internal dependencies, and even more, we don't quite understand our automated build/test/release processes, so it's better to keep everything in the same branch/repository so that all the scripts can potentially find their data if they need it'.

  • @Muhammad-sx7wr
    3 years ago

    You are so harsh.

  • @mnchester
    1 year ago

    Can someone please comment on whether this content (from 2015) is still relevant now (2022-2023), i.e., does Google still use all of these tools? Amazing video btw!

  • @dijoxx
    4 months ago

    Yes, they do.

  • @GegoXaren
    9 years ago

    This is why we use the upstream/downstream model. If code is used in many downstream projects it should be pushed upstream. Much better to have small modules than a monolithic code base. And what about dead code?

  • @amaxwell01
    8 years ago

    Such a sweet insight into how Google handles their codebase.

  • @Sawon90
    2 years ago

    Are they still using a monolithic codebase in 2022?

  • @dijoxx
    4 months ago

    Yes

  • @AIMMOTH
    9 years ago

    Diamond problem 19:40

  • @yash1152
    1 year ago

    11:36 citc file system ... without needing to explicitly clone or sync any state locally waooooowww... awesome.

  • @Savatore83
    3 years ago

    What works for Google is not necessarily the best solution for every IT company.

  • @chrise202
    5 years ago

    How does the IDE cope with this many files?

  • @redkite2970
    3 years ago

    1 repository doesn't mean 1 solution.

  • @-Jason-L
    2 years ago

    @@redkite2970 "solution" is a Microsoft thing

  • @MichaelTokar
    8 years ago

    Really interesting ideas. If there are any Googlers out there, I'd be curious to know how you use release branches with the monolithic repository. If everything is in the one repo, does that mean a release branch actually pertains to the entire set of code? Or does it apply to 'sub-folders'? If the latter, how do you determine that things are in a releasable state?

  • @SrdjanRosic
    8 years ago

    +Michael Tokar Yes. Similar mechanisms to those that give an engineer a unified view of the whole source code at a particular version, with their changes overlaid on top, can be used to let build tools overlay the changes belonging to a branch on top of the entire source tree at some version. It is only sensible for this to be versioned as well, especially when you couple it with hermetic, deterministic, repeatable builds (see Bazel).

  • @DIYRandomHackery
    8 months ago

    The general idea is that most of the time, you just build "//...@1234567" (the main code line at changelist 1234567, where changelist ~= a commit), i.e. you just build from head at a "fixed" changelist. Only when you need to cherry-pick fixes will you create a release branch, to allow a one-of-a-kind mutation away from the main codeline. Decades ago you used to have to always create (Perforce) release branches with hundreds of thousands of files, but modern tooling lets you virtualize that process now, since 99.9% of the code in question is unmodified. This made the process far more lightweight. Perforce could be used to do this by manipulating the client view (I've tried it), but there's a limit as to how far you can push that idea; hundreds of lines in your client view slow things down too far to be useful - p4 commands take minutes instead of seconds. For smaller environments it could be a viable method, if you build the tools to do it for you (maintaining branch and client views based on the cherry-picks you need).

  • @dijoxx
    4 months ago

    It applies to 'sub-folders'. There is a release process with build environments for staging etc.

  • @swyxTV
    4 years ago

    Isn't marking every API as private by default basically cordoning off parts of your monorepo into... multiple repos?

  • @dijoxx
    4 months ago

    No.

  • @gatsbylee2773
    3 years ago

    I really doubt the mentioned advantages.

  • @me99771
    4 years ago

    That comment at the end about organisations where parts of the code are private is interesting. Is Google not one of those organisations? They have a lot of contractors writing code. Are they all free to browse Google's entire source code?

  • @BosonCollider
    2 years ago

    I guess it is so big that they can try but won't get much out of it, if they generate the equivalent of a Linux kernel every single week.

  • @f0xm2k
    6 years ago

    I think they are right. Source code is a form of information. Big data methods, self-learning AI algorithms... they profit massively from easily accessible data. I guess the long-term goal is having new code generated completely automatically. Split repos would slow down those efforts, I guess.

  • @MattSiegel
    9 years ago

    Amazing! :D

  • @twitchyarby
    6 years ago

    This talk seems to violate every source control best practice I've ever heard.

  • @VictorMartinez-zf6dt
    4 months ago

    Because they're not in fact best practices, but usually workarounds for bad developers or for working with open source.

  • @dijoxx
    4 months ago

    Maybe you should hear from better sources.

  • @shaiksaifuddin
    4 years ago

    I can only imagine how much time it would take to clone such a repo 😅

  • @dijoxx
    4 months ago

    Nobody clones the repo.

  • @ryanb509
    4 years ago

    1 billion files but only 2 billion lines of code. So each file averages 2 lines of code? For that to make sense over 90% of the files must be non-source files.

  • @chaitrakeshav
    3 years ago

    9 million source files, 2 billion lines of code: ~220 lines per file. Decent!

  • @twoka
    8 years ago

    I felt like she was trying to convince herself that this approach is a good one. I'm sure that many Google sub-projects are organized in a different, proper way.

  • @Ed-yw2fq
    1 year ago

    14:35 Trunk-based development with a centralised source control system. I'm glad we have git.

  • @adipratapsinghaps
    2 months ago

    We didn't talk about the biggest tradeoff. Deployments/Releases are very very slow. Correct me if I am wrong.

  • @zachyu2130
    3 days ago

    You don't build/release the entire repo in one go, but only a tiny part of it, usually compiled down to only a few files. So the size of the repo is generally irrelevant. Bigger services are composed of microservice nodes which are owned by different teams and released separately.

  • @r3jk8
    3 years ago

    can someone reply here with the cliff notes please.... also, are they still doing this?

  • @willysoesanto2046
    2 years ago

    Yes

  • @MagicBoterham
    2 years ago

    They have since moved to git flow and Uncle Bob's Clean Code practices.

  • @CerdasIN
    7 years ago

    I can't imagine how to work with that much code... Amazing...

  • @skiptavakkolian
    7 years ago

    Assuming Google had 29,000 developers at the time, 15,000,000 lines of code changes per week is over 500 per developer. That seems high. Is it due to cascading of changes?

  • @carlerikkopseng7172
    7 years ago

    If you write some new code you can easily pump out 500-1000 lines of code per day, but I think I read somewhere that the average developer on average (heh) outputs something like 40-50 lines of code per day. Given all those meetings and the time spent modifying existing code, that seems reasonable, and 500 lines per week is not that far off (at least not off by an order of magnitude).

  • @graphics_dev5918
    2 years ago

    Also, a simple change in a more normal-sized code base might have 100 cascading effects, but in a massive repository, perhaps thousands. Those all count as changed lines, so it inflates the numbers.

  • @RichardPeterShon
    3 years ago

    how in da world they operate this?....

  • @rafalkowalczyk5027
    2 years ago

    Impressive scale, but the cyber-attack exposure is high.

  • @transfire
    9 years ago

    All these things could also be done with the proper tools working across multiple repositories. In fact Google has had to go out of its way to create tools that mimic separate repos within the monolithic repo, e.g. area ownership. The big downside I see is the lack of SOC. It becomes too easy to make messy APIs with far too many dependencies. Google's solution to the dependency DAG problem is to force everyone to use the latest version of everything at all times. That's a huge man-hour drain (though clearly they have automated lots of it for this reason). It also means no code is ever long-term stable - nothing like TeX, for instance, which is so stable they use Pi as a version number.

  • @GegoXaren
    9 years ago

    +TR NS We need LaTeX3 now... I have been waiting for years, and still no stable version. :-/

  • @ZT1ST
    8 years ago

    +TR NS The lack of SOC is useful for a singular company, where SOC mostly means solving the same problem multiple times - they don't want 8 versions of search ranking code if they can avoid it, for example, when they want to be able to apply it to Google searches, YouTube searches, Google Photos/Mail/Documents, etc. They'll have ownership SOC with directories separated by projects, but when you're trying to integrate multiple elements together for a business advantage, knowing that you can easily integrate a well-tested version of a solution to the problem you want to solve, and don't have to spend the manpower making sure you update your code along with it, significantly helps.

  • @fritzobermeyer
    8 years ago

    +TR NS What does SOC stand for?

  • @transfire
    8 years ago

    Separation Of Concerns

  • 9 years ago

    @ 15:18 "old and new code paths in the same code base controlled by conditional flags" - isn't this configuration hell?
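
For readers unfamiliar with the pattern being asked about, here is a minimal sketch of flag-guarded old/new code paths; the flag registry and the ranker functions are hypothetical stand-ins:

```python
# Hypothetical flag registry; real systems read flags from a config
# service or the command line rather than a module-level dict.
FLAGS = {"use_new_ranker": False}

def legacy_ranker(query: str, docs: list[str]) -> list[str]:
    return sorted(docs)                              # stand-in behavior

def new_ranker(query: str, docs: list[str]) -> list[str]:
    return sorted(docs, key=lambda d: (len(d), d))   # stand-in behavior

def rank_results(query: str, docs: list[str]) -> list[str]:
    # Both code paths live at head; the flag picks one at runtime.
    if FLAGS["use_new_ranker"]:
        return new_ranker(query, docs)   # new path, dark-launched
    return legacy_ranker(query, docs)    # old path stays until cleanup

# Roll out by flipping the flag (per job, per experiment, per user...),
# compare behavior, then delete the old path and the flag itself.
FLAGS["use_new_ranker"] = True
print(rank_results("q", ["bb", "a", "ccc"]))  # ['a', 'bb', 'ccc']
```

The discipline that keeps this from becoming configuration hell is deleting the flag and the old path as soon as the rollout finishes, which the thread below debates.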

  • @comsunjava
    9 years ago

    +Alexander Hörnlein No, not really. And usually it is combined with other techniques, like adding a new REST endpoint (as an example) that is controlled by a flag. This is how Facebook works also. Of course, there was the case where someone inadvertently turned all the flags on, and thus barraged Facebook customers with semi-complete features. Oops.

  • @chuckkarish1932
    9 years ago

    +Alexander Hörnlein Being in one big codebase means that all servers have to use the same versions of their dependencies. For third-party dependencies they have to be the same or the programs won't work. Remember, all the C++ code is statically linked. The only supported version of the internal dependencies is the one that's at head. If your server hasn't been released for a while, you have to bring it up to date before you can push it. The up side is that developers don't have to maintain legacy code. Google puts much more effort into extending the leading edge of its services than into keeping the trailing edge alive. And since there's little choice of which version of code to use, there's not much configuration needed to specify this.

  • 9 years ago

    +Chuck Karish I know what this big codebase means, but the bit I referred to was about "no branching", instead having all features in the trunk (and then switching them on and off with - I guess LOTS of - flags). And with this I figured that you'd have configuration hell maintaining all these flags, à la "we need feature B but not A, but B depends somewhat on A, so we have to activate feature A(v2) which has some of the core of A but not all of it" and so on and so on.

  • @kohlerm113
    8 years ago

    +Alexander Hörnlein I also wonder how Google makes sure that people don't just copy a whole lot of code and create a new "component" just to avoid having to update everyone. Running a code duplicate checker?

  • @SrdjanRosic
    8 years ago

    +Markus Kohler While an individual team can decide to fork a component, it usually has negative implications for that team in the long term; maintaining your own fork becomes more and more costly over time, so it's rarely done. However, let's say you wanted to move bigtable from /bigtable to /storage/bigtable, and change the C++ namespace name along the way, and there are tens of thousands of source files that depend on it at its current path. You could a) factor out the code to the new path, leave wrappers in the old place, use bazel and rosie to identify dependants and migrate them, then drop the redirect code; or b) make a giant copy, use bazel to identify dependants and migrate them, then drop the original copy. It's non-trivial, but I suspect doable within a couple of days with some planning. Systems like TAP would help ensure your changes (possibly mostly automated) don't break things, even before they're submitted. There are a few more details to think about there - it takes a little thought, maybe some experimentation, to make sure this kind of thing works before using so much of other people's time with this change. Also, code that someone started working on that was not submitted at the time you do this will need to be fixed by the people working on it. I hope this answers your question.

  • @hansheng654
    2 years ago

    so you telling me that i can join google and dig up the source code for Google Search? 👀

  • @NebrassLamouchi
    9 years ago

    Amazing! Wonderful!

  • @Kingromstar
    5 years ago

    Wow

  • @anytcl
    1 year ago

    Well, something is off, not sure how to describe it, but I think Piper = GitHub and CitC = git. One big repository vs a collection of connected repositories: I don't really think there is much difference. I think for most users CitC is the source control tool and Piper is the cloud hosting solution.

  • @willysoesanto2046
    10 months ago

    Piper is the source control at the server. CitC is how you connect to it. CitC doesn't clone the codebase; it provides a network filesystem backed by Piper. Think of CitC as a Dropbox, Google Drive, or iCloud Drive client.

  • @manchikantiramchandravarsh4742
    8 years ago

    3:01

  • @StanislavKozlovsk
    6 years ago

    I also share the feeling that this approach brings more problems than the one it solves (I actually don't see what it solves that multi repos don't). Then again, Google might have the biggest codebase in the world and it's probably not technical debt that is making them stick with this.

  • @laughingvampire7555
    1 year ago

    The irony that Google needs to listen again to Linus Torvalds' talk about Git, on their own channel, at their own Google Talks event.

  • @mohamedfouad2304
    5 years ago

    pipepiper

  • @enhex
    7 years ago

    All the arguments given against multi-repo and in favor of a single repo are wrong, usually failing to identify the real cause of the problem. 1:00 - The problem isn't multi-repo, the problem is forking the game engine. You can fork the game engine in a single repo too, by copying it into a new folder. 16:30 - a list of non-advantages: - You get one source of truth in a multi-repo approach too. - You can share & reuse repos. - It doesn't simplify anything unless you check out the whole repo (impractical); otherwise you'll have to check out specific folders, just like checking out specific repos. - Huge single commits, AKA atomic changes: it does solve committing to all the projects at once, but that doesn't solve conflicts. - It doesn't help with collaboration. - Multi-repos also have ownership, which can change. - Tree structure doesn't implicitly define teams (unless each team forks everything it needs into its own folder). It may implicitly define projects, which repos define explicitly. And what I watched in the rest of the talk is basically the same thing: the fallacy of attributing to a single repo solutions for things it has nothing to do with. The only thing a single repo gives you is the equivalent of pulling all repos at once in a multi-repo approach. Basically they just ended up spending a lot of effort emulating multi-repo in a single repo, with ownership of specific directories and such.

  • @ihatenumberinemail
    7 years ago

    How would you do atomic commits across repos?

  • @JamesMiller5
    7 years ago

    You need to use a consensus algorithm, but it's totally possible. Check out Google Ketch.

  • @ihatenumberinemail
    7 years ago

    James Miller Cool project, but that's still 1 logical repo. Just distributed.

  • @enhex
    7 years ago

    It would probably require creating a higher-level tool, some sort of "super repository" in which your commits are collections of commit IDs in its sub-repos (not actual files).
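
A toy sketch of that "super repository" idea: a meta-commit holds no file contents, just a pin of each sub-repo to a commit ID, so one cross-repo change is a single atomic manifest update. The repo names and hashes are made up:

```python
import hashlib
import json

def meta_commit(parent_id, pins):
    """A 'super repo' commit: no file contents, just sub-repo -> commit ID."""
    body = {"parent": parent_id, "pins": pins}
    body["id"] = hashlib.sha1(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    return body

# State before the change (hashes made up):
c1 = meta_commit(None, {
    "game-engine": "ab12ef",
    "game-x": "cd34ab",
    "game-y": "ef56cd",
})

# An engine API change plus fixes to both games lands as ONE meta-commit:
# all three pins move together, or not at all.
c2 = meta_commit(c1["id"], {
    "game-engine": "9f88aa",
    "game-x": "7a21bb",
    "game-y": "3b45cc",
})

print(c2["id"], c2["pins"])
```

Pinned-manifest tools (git submodules, Android's repo manifests) record a similar structure, which is why the reply below says it sounds a lot like a monorepo.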

  • @ihatenumberinemail
    7 years ago

    Enhex That sounds a lot like a mono-repo :D

  • @RudhinMenon
    2 years ago

    Well, if Google says so 😅

  • @ajayboseac01
    2 years ago

    Will Google give a tech talk when they decide to finally break down the huge repo, about how that enabled them to ship code faster? Or will they keep maintaining the large repo for the sake of their ego :D

  • @willysoesanto2046
    10 months ago

    They will never break down the huge repo. There is no reason to. The thing is, they use the Bazel build system, which does not dictate the structure of your codebase. If you think of your codebase directory structure as a normal directory tree storing normal files, i.e. you can reorganize it however you would like it to be, the sky is the limit.

  • @ronen.
    4 years ago

    There is one bigger code repository than Google's... it's called GitHub.

  • @GaneshSatputeAtPlus
    4 years ago

    GitHub is not a repository.

  • @nsubugakasozi7101
    2 years ago

    To be honest, the only reason she gave that made sense is that they want to use one repo no matter what. It's like they began with the end result and then worked backwards, i.e. someone big at Google said it has to be one repo... then the poor engineers had to work backwards. What's the purpose of a huge repo whose parts you never interact with... what's the purpose of a repo that you only partially clone? Seems like they are justifying a dumb decision with the Google-scale excuse.

  • @ahmxtb
    9 years ago

    title is a bit trimmed. It ends like "... Stores Billions of L"

  • @zoids1526
    8 years ago

    +Ahmet Alp Balkan The full title seems to be: "The Motivation for a Monolithic Codebase: Why Google Stores Billions of Lines of Code in a Single Repository"

  • @laughingvampire7555
    1 year ago

    How about this, Google: make your own package manager for all your internal code, like your own npm/cargo/asdf/rubygems/maven/etc., or even better, your own GitHub.

  • @sujitkumarsingh3200
    5 years ago

    If you're deciding to put everything in one repo, try git's submodules first.

  • @qaisjp
    5 years ago

    Git submodules are the worst thing to use in this case and completely counteract all of the benefits mentioned here.

  • @SB-rf2ye
    2 years ago

    "we solved it by statically linking everything." this ain't it chief.

  • @DIYRandomHackery
    2 years ago

    The diamond dependency problem isn't just a compile-time problem, but a runtime problem too, unless you use static linking. Dependency problems are insidious and can be incredibly hard to find. Solving the dependency problem is well worth the added pain, such as needing to re-release all binaries should a critical bug be found in a core library. Static linking decouples the binaries being released from the base OS installation (hint: there is no "single OS" image, because every planetary-wide OS release iteration takes months; developers can't wait that long for an OS update).
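
A minimal illustration of the diamond the comment refers to: A depends on B and C, which were built against different versions of D. With one dynamically linked copy of D per process the requirements must agree; static linking lets B and C each carry their own D. Package names and versions are made up:

```python
# Hypothetical dependency declarations: package -> {dep: required version}.
requires = {
    "A": {"B": "1.0", "C": "1.0"},
    "B": {"D": "1.x"},   # B was built against D v1
    "C": {"D": "2.x"},   # C was built against D v2
}

def shared_d_version() -> str:
    """With dynamic linking there is one copy of D per process, so every
    requirement on D must agree."""
    needs = {pkg: deps["D"] for pkg, deps in requires.items() if "D" in deps}
    versions = set(needs.values())
    if len(versions) > 1:
        raise RuntimeError(f"diamond conflict on D: {needs}")
    return versions.pop()

# Static linking sidesteps the conflict (B and C each embed their own D);
# building everything at head removes it differently: there is only one D.
try:
    shared_d_version()
except RuntimeError as err:
    print(err)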

  • @somakkamos
    1 year ago

    So git pull... pulls 86TB of data 😳😳😳😳😳

  • @douglasgoldfarb3421
    10 months ago

    Can we have sentient artificial intelligence

  • @Eric_McBrearty
    5 months ago

    Hmmm... I feel like this is a deep question. We will probably see a published paper on arXiv about this very topic soon (if it's not already there). A lot of blurry lines when you try to pin down a contextually specific definition of "sentient, artificial, and intelligence." Topic 1: Sentient - Are you aware of yourself, and that you are not the only self in the environment in which you find yourself? When you talk to these language models, they do appear to know that they are a program known as a language model, and that there are others like them. Topic 2: Artificial - Is it man-made or did nature make it? Now that we have started modifying our own genome, I am not sure that we don't fit the definition of artificial. Topic 3: Intelligence - A book contains words that represent knowledge, but a book isn't intelligent. So, if you are aware that knowledge explains how something works, and you are aware that you possess this information... I guess that would make you intelligent. Conclusion - Sentient Artificial Intelligence does exist. Humans fit the criteria, as do Large Language Models. Cynical extrapolation - Humans become less and less necessary as they appear to be more and more burdensome, needy, and resource-hungry.

  • @douglasgoldfarb3421
    10 months ago

    Can artificial intelligence sentient systems self learn better

  • @ProGamer1115
    4 years ago

    Oh god, imagine the millions of lines of spaghetti code.

  • @1998goodboy
    3 years ago

    Thank you for your Ted talk on why Google will inevitably crash and burn. I can't wait

  • @TightyWhities94
    11 months ago

    Trunk-based development, especially at Google's scale, fucking sucks lmao. Never thought I'd say this, but I feel for Google devs.

  • @laughingvampire7555
    1 year ago

    this explains why Google is so authoritarian

  • @laughingvampire7555
    10 months ago

    Is this why Google cancels so many products? Do they become a mess in the monorepo?

  • @laughingvampire7555
    1 year ago

    this explains why Google products feel lesser with time.

  • @MhmdAsie
    9 years ago

    why is she talking like she wants to cry or something

  • @HalfdanIngvarsson
    9 years ago

    +Hamodi A'ase I don't know if you noticed, but she's heavily pregnant. In fact, a week away from her due date at the time of this talk, as mentioned at the beginning. It makes breathing harder than normal, what with a tiny human kicking your diaphragm from the inside. Or were you just being facetious?

  • @MhmdAsie
    9 years ago

    Ohhh, is that what pregnant women go through? That must hurt a lot...
