Advantages and Disadvantages of a Monolithic Repository
A case study at Google

Ciera Jaspan, Matthew Jorde, Andrea Knight, Caitlin Sadowski, Edward K. Smith, Collin Winter
Google
ciera,majorde,aknight,supertri,edwardsmith,[email protected]

Emerson Murphy-Hill∗
NC State University
[email protected]

∗Work completed while on sabbatical at Google

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
ICSE-SEIP ’18, Gothenburg, Sweden
© 2018 Copyright held by the owner/author(s). 978-1-4503-5659-6/18/05. $15.00
DOI: 10.1145/3183519.3183550

ICSE-SEIP ’18, May 27-June 3, 2018, Gothenburg, Sweden. C. Jaspan et al.

ABSTRACT

Monolithic source code repositories (repos) are used by several large tech companies, but little is known about their advantages or disadvantages compared to multiple per-project repos. This paper investigates the relative tradeoffs by utilizing a mixed-methods approach. Our primary contribution is a survey of engineers who have experience with both monolithic repos and multiple, per-project repos. This paper also backs up the claims made by these engineers with a large-scale analysis of developer tool logs. Our study finds that the visibility of the codebase is a significant advantage of a monolithic repo: it enables engineers to discover APIs to reuse, find examples for using an API, and automatically have dependent code updated as an API migrates to a new version. Engineers also appreciate the centralization of dependency management in the repo. In contrast, multiple-repository (multi-repo) systems afford engineers more flexibility to select their own toolchains and provide significant access control and stability benefits. In both cases, the related tooling is also a significant factor; engineers favor particular tools and are drawn to repo management systems that support their desired toolchain.

CCS CONCEPTS

• Software and its engineering → Software configuration management and version control systems;

1 INTRODUCTION

Companies today are producing more source code than ever before. Given the increasingly large codebases involved, it is worth examining the software engineering experience provided by the various approaches for source code management. Large companies with multiple products typically have many internal libraries and frameworks, and a vast number of dependencies between projects from entirely separate parts of the organization. Successfully organizing these dependencies and frameworks is crucial for development velocity.

One approach to scaling development practices is the monolithic repo, a model of source code organization where engineers have broad access to source code, a shared set of tooling, and a single set of common dependencies. This standardization and level of access is enabled by having a single, shared repo that stores the source code for all the projects in an organization. Several large software companies have already moved to this organizational model, including Facebook, Google, and Microsoft [10, 12, 17, 21]; however, there is little research addressing the possible advantages or disadvantages of such a model. Does broad access to source code let software engineers better understand APIs and libraries, or overwhelm engineers with use cases that aren’t theirs? Do projects benefit from shared dependency versioning, or would engineers prefer more stability for their dependencies? How often do engineers take advantage of the workflows that monolithic repos enable? Do engineers prefer having consistent, shared toolchains or the flexibility of selecting a toolchain for their project?

In this paper, we investigate the experience of engineers working within a monolithic repo and the tradeoffs between using a monolithic repo and a multi-repo codebase. Specifically, this paper seeks to answer two research questions:

(1) What do developers perceive as the benefits and drawbacks to working in a monolithic versus multi-repo environment?
(2) To what extent do developers make use of the unique advantages that monolithic repos provide?

To answer these questions, we ran a mixed-methods case study within a single company with a monolithic repo. We surveyed software engineers to understand their perceptions about working in monolithic repos. For engineers that also had experience working in multi-repo systems, we asked further questions to understand the benefits of each and why they might prefer one model over another. We also analyzed the logs from developer tools to study the extent to which engineers utilize their ability to view and edit all of the code in the codebase. We examined how often engineers view and edit code far afield from their team and organization, and we examined whether these views are simply to popular APIs.

Our survey results show that engineers at Google strongly prefer our monolithic repo, and that visibility of the codebase and simple dependency management were the primary factors for this preference. Engineers also cite as important the ability to find example uses of API and the ability to automatically receive API updates. Logs data confirms that engineers do take advantage of both the visibility of the codebase and the ability to edit code from other teams. Contrary to expectations, viewing popular APIs was not the primary reason engineers view code outside of their team; this provides further [18] evidence that viewing code to find examples of using an API is important, possibly more so than viewing the implementation of the API.

We also discovered many interesting tradeoffs between using monolithic and multi-repo codebases; they each had benefits that were not possible in the other system. One such tradeoff was around dependencies. Engineers note that a primary benefit of multi-repo codebases is the ability to maintain stable, versioned dependencies. This is particularly interesting because it is in direct contrast to two of the primary benefits of a monolithic repo: ease of both dependency management and of receiving API updates. Another tradeoff appeared around flexibility of the toolchain. Engineers who prefer multiple repos also prefer the freedom and flexibility to select their own toolchain. Interestingly, the forced consistency of a monolithic repo was also cited as a benefit of monolithic repos.

Finally, we saw evidence that for some engineers, the development tools were more important than the style of repo. Engineers called out favored development tools by name as a reason to use one repo over another, even though in theory, these tools could be available for any type of repo.

2 MONOLITHIC REPOSITORIES

For purposes of this paper, we define a monolithic source repo to have several properties:

(1) Centralization: The codebase is contained in a single repo encompassing multiple projects.
(2) Visibility: Code is viewable and searchable by all engineers in the organization.
(3) Synchronization: The development process is trunk-based; engineers commit to the head of the repo.

[...] system, it may still be true that code is viewable by all engineers, as is the case for open-source projects on GitHub or BitBucket. In theory, a multi-repo system could also have a shared set of developer tools; in practice this is rare as there is no enforcement for this to happen across repos. In a multi-repo setup, commits are not to a single head, so version skew and diamond dependencies (where each project may depend on a different version of a library) do occur.

At Google, almost all code exists in a single large, central repo, in which almost all code² is visible to almost all engineers. The repo is used by over 20,000 engineers and contains over 2 billion lines of code. All engineers that work in this monolithic repo use a shared set of tools, including a single build system, common testing infrastructure, a single code browsing tool, a single code review tool, and a custom source control system. The build system depends on compilers that are also checked into the codebase; this allows a centralized tooling team to update the compiler version across the company.

While engineers can view and edit nearly the entire codebase, all code is committed only after the approval of a code owner. Code ownership is path-based, and directory owners implicitly own all subdirectories as well. Engineers are limited to using a small set of programming languages, and there is a tool-enforced style for each language.

3 METHODOLOGY

To understand how engineers perceive the advantages and disadvantages of a monolithic repo, we surveyed a sample of engineers at Google. Rather than only measuring engineer satisfaction with the monolithic repo, we asked them to compare Google’s monolithic repo to their prior experiences with other repo systems and to one hypothetical example. The goal was to identify the relative tradeoffs between a monolithic repo and multi-repos.

We also took advantage of our ability to log engineers’ interactions with the codebase. Our common developer tools allow us to instrument not only commits to the codebase, but also file views. We used these logs to confirm some of the survey responses by showing that engineers not only say they