Code Coverage at Google
Marko Ivanković, Google Switzerland GmbH, Zurich, Switzerland
Goran Petrović, Google Switzerland GmbH, Zurich, Switzerland
René Just, University of Washington, Seattle, WA, USA
Gordon Fraser, University of Passau, Passau, Germany

ABSTRACT
Code coverage is a measure of the degree to which a test suite exercises a software system. Although coverage is well established in software engineering research, deployment in industry is often inhibited by the perceived usefulness and the computational costs of analyzing coverage at scale. At Google, coverage information is computed for one billion lines of code daily, for seven programming languages. A key aspect of making coverage information actionable is to apply it at the level of changesets and code review.

This paper describes Google's code coverage infrastructure and how the computed code coverage information is visualized and used. It also describes the challenges and solutions for adopting code coverage at scale. To study how code coverage is adopted and perceived by developers, this paper analyzes adoption rates, error rates, and average code coverage ratios over a five-year period, and it reports on 512 responses received from surveying 3000 developers. Finally, this paper provides concrete suggestions for how to implement and use code coverage in an industrial setting.

CCS CONCEPTS
• Software and its engineering → Software configuration management and version control systems; Software testing and debugging; Empirical software validation; Collaboration in software development.

KEYWORDS
coverage, test infrastructure, industrial study

ACM Reference Format:
Marko Ivanković, Goran Petrović, René Just, and Gordon Fraser. 2019. Code Coverage at Google. In Proceedings of the 27th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE '19), August 26–30, 2019, Tallinn, Estonia. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3338906.3340459

© 2019 Copyright held by the owner/author(s). ACM ISBN 978-1-4503-5572-8/19/08.

1 INTRODUCTION
Code coverage is a well established concept in software engineering, and code coverage criteria such as statement coverage, branch coverage, and modified condition/decision coverage (MC/DC [11]) are well-known measures for test suite adequacy. For example, MC/DC-adequacy is required for safety-critical systems (RTCA DO-178). Code coverage has been known and applied for decades. For example, Piwowarski et al. [25] mention that IBM performed code coverage measurements in the late 1960s, and Elmendorf [13] provided a reasonably robust strategy for using branch coverage in testing operating systems as early as 1969.

While code coverage is commonly used in software engineering research, its effectiveness at improving testing in practice is still a matter of debate [14, 18] and its adoption in industry is not universal. A large survey at IBM on their use of code coverage [25] revealed that once code coverage was adopted, it led to an increase in test suite quality; however, ease of use and scalability are primary factors in whether code coverage is adopted in the first place. Existing industry reports on code coverage are typically based on code bases of manageable size with one or two programming languages. However, even at that scale, these reports raise concerns about the cost-effectiveness of code coverage [20]. In particular, they raise concerns that both the machine resources required to compute coverage as well as the developer time required to process the results are too costly compared to the increase in test suite quality.

Addressing and overcoming these concerns, Google has spent more than a decade refining its coverage infrastructure and implementing and validating the academic approaches. Google's infrastructure supports seven programming languages and scales to a codebase of one billion lines of code that receives tens of thousands of commits per day. The paper details Google's code coverage infrastructure and discusses the many technical challenges and design decisions. This coverage infrastructure is integrated at multiple points in the development workflow. This paper describes this integration, and reports on the adoption and perceived usefulness of code coverage at Google, by analyzing five years of historical data and the 512 responses received from surveying 3000 developers.

In summary, this paper makes the following contributions:
• It demonstrates that it is possible to implement a code coverage infrastructure based on existing, well-established libraries, seamlessly integrate it into the development workflow, and scale it to an industry-scale code base.
• It details Google's infrastructure for continuous code coverage computation and how code coverage is visualized.
• It details adoption rates, error rates, and average code coverage ratios over a five-year period.
• It reports on the perceived usefulness of code coverage, analyzing 512 responses from surveyed developers at Google.

The remainder of this paper is structured as follows. Section 2 provides background information and terminology. Section 3 details Google's code coverage infrastructure. Section 4 describes the adoption of code coverage and its usefulness, as perceived by Google developers. Section 5 discusses related work. Finally, Section 6 concludes and discusses future work.


Figure 1: The life cycle of a changelist (CL).
[Diagram: a new changelist CL 2 progresses through snapshots S1–S5 via the change, mail, and submit commands, with pass/fail outcomes, against the submitted states CL 1, CL 3, CL 4, and CL 5.]

2 BACKGROUND
Code coverage is a well established concept, but also a broad term that requires a more precise definition when applied in practice. This section defines important terms and concepts used throughout the paper.

2.1 Projects and Changelists
The following two concepts are required to understand the development workflow at Google and its integration of coverage:
• Project: A project is a collection of code that the project owners declare in a project definition file. The project definition contains a set of code paths (i.e., patterns describing a set of files), a set of test suites, and organizational metadata such as a contact email address.
• Changelist: A changelist is an atomic update to Google's centralized version control system. It consists of a list of files, the operations to be performed on these files, and possibly the file contents to be modified or added.

Figure 1 illustrates the life cycle of a changelist using an example. CL 1 represents the starting state of the codebase. Suppose that at this point a developer issues a change command, which creates a new changelist, CL 2. Initially, this new changelist is in a "pending" state and the developer can edit its contents. As the developer edits CL 2, it progresses through so-called "snapshots" (represented in the diagram as S1 to S5). Once the developer is satisfied with the contents of a changelist, they issue a mail command, which initiates the code review process. At this point, automated analyses including code coverage are started and a notification is emailed to the reviewers. The changelist is now said to be "in review". Through the review process the contents may change, leading to different snapshots. When the reviewers and the author approve of a changelist, the author issues a submit command. In the example above, this command first merges CL 2 with the current submitted state of the codebase (CL 4) and, if the merge is successful, runs more automated tests. Only if the merge is successful and all tests pass is the changelist submitted to the codebase. To ensure that changelist numbers in the codebase are monotonically increasing, CL 2 is also renamed to CL 5 on submit. (CL 2 becomes a pointer and anyone accessing it is forwarded to CL 5.) If the submit command fails, the changelist returns to the "in review" state and the CL author can continue to change its contents, usually addressing the failure.

Figure 2: Distribution of the number of statements per line for C++, Java, Go, and Python in Google's codebase.
Multi-statement lines are predominantly for loops and idiomatic returns. Lines with more than 5 statements are removed for clarity; in total, they contribute less than 1%.

2.2 Code Coverage
Google's code coverage infrastructure measures line coverage, which is the percentage of lines of code executed by a set of tests. Note that blank lines and comments do not count as lines of code, and hence do not contribute towards the denominator. Because Google enforces style guides for all major languages [15], line coverage strongly correlates with statement coverage in Google's codebase. Figure 2 gives the distribution of the number of statements per line. Most lines contain a single statement, and lines with more than one statement predominantly contain loops (initialization, condition, update) and non-trivial, nested return statements.

For C++, Java, and Python, Google's code coverage infrastructure also measures branch coverage, which is the percentage of branches of conditional statements (e.g., if statements) executed by a set of tests. Branch coverage is equivalent to edge coverage for all conditional edges in the control flow graph [27].

This paper distinguishes between two coverage scopes:
• Project coverage: Line coverage of a project's test suites over that project's code paths. Project coverage is computed once per day for a given project.
• Changelist coverage: The union of line coverage of all test suites of all projects that are affected by the changelist—that is, all projects whose code paths appear in the changelist diff. Changelist coverage is computed one or more times for a given changelist. Within the changelist coverage scope, this paper distinguishes between:
  – Overall coverage: The code coverage of the whole contents of the files in a changelist.
  – Delta coverage: The code coverage of only the lines that were modified or added in a changelist. (Deleted lines no longer exist, and hence cannot be covered.)
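To make these definitions concrete, the following Python sketch computes overall and delta coverage for a single file touched by a changelist. The line-number sets are invented for illustration; this is not Google's actual API.

```python
def line_coverage(covered: set, instrumented: set) -> float:
    """Line coverage: the fraction of instrumented (executable) lines
    that were executed. Blank lines and comments are never in
    `instrumented`, so they do not contribute to the denominator."""
    if not instrumented:
        return 0.0
    return len(covered & instrumented) / len(instrumented)

# Hypothetical data for one file in a changelist.
instrumented = {10, 11, 12, 20, 21, 30}  # executable lines in the file
covered = {10, 11, 20, 30}               # lines executed by the tests
changed = {20, 21, 40}                   # lines added/modified by the CL
                                         # (line 40 is a comment, so it is
                                         # not instrumented)

overall = line_coverage(covered, instrumented)          # whole file
delta = line_coverage(covered, instrumented & changed)  # changed lines only
print(f"overall={overall:.2f} delta={delta:.2f}")  # overall=0.67 delta=0.50
```

Delta coverage restricts the denominator to the changed lines that are instrumented, which is why deleted lines (no longer present) and changed comment lines cannot affect it.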


Table 1: Percentage of developers manually invoking blaze coverage at least once in the given time period. All numbers are the average value in 2018.

Frequency                                       %
Daily manual invocation of blaze coverage       0.5%
Quarterly manual invocation of blaze coverage   10%
Yearly manual invocation of blaze coverage      30%

Figure 3: Google's code coverage infrastructure.
[Layers, top to bottom: Visualization / Data analytics; Coverage automation; Build system integration (Blaze); Coverage instrumentation (gcov, JaCoCo, etc.).]

2.3 Unit Tests
This paper considers only code coverage of tests that execute on a single machine, usually in a single process. At Google, these are colloquially referred to as unit tests. This paper does not consider code coverage of more complex (i.e., integration or system) tests that span multiple machines. Since integration tests focus on integration points between (sub)systems and not the internals of each (sub)system, line coverage is not the best suited coverage measure for these tests.

3 INFRASTRUCTURE
This section describes Google's code coverage infrastructure, which has been actively developed since 2012. It has since evolved to improve compatibility and address many challenges. Figure 3 gives a high-level architectural overview, showing the four main layers of the infrastructure, which are described in the subsequent sections.

3.1 Coverage Instrumentation
Google's code coverage infrastructure leverages a combination of well established code coverage libraries and internally developed solutions to support coverage computation for a variety of programming languages. Specifically, the infrastructure uses:
• C++: gcov [7].
• Java: JaCoCo [5] and additional, internal support infrastructure for Android.
• Python: Coverage.py [2].
• Go: Go has built-in coverage support (go test -cover) [6].
• JavaScript: Istanbul [4] and an internally developed coverage library, plus significant support infrastructure built around them.
• Dart: Dart-lang has built-in coverage support [3].
• TypeScript: Re-uses much of the JavaScript infrastructure.

To address the structural challenges of integrating all of these libraries and unifying their output, the external libraries were modified, or infrastructure was built on top of them. In particular, all coverage libraries were either modified to produce lcov-formatted output, or custom infrastructure was developed to transform the output of a library into lcov format. The lcov format is the text-based tracefile format that gcov [7] and its frontend lcov use to store coverage information. Google's code coverage infrastructure uses a slightly modified version of the format with slightly different branch coverage notation that does not require internal GCC IDs to identify the branch.

On the conceptual side, lcov supports line and branch coverage, but other types of code coverage can be simulated; for example, function coverage can be simulated by only marking the first line of each function as instrumented. Focusing on line coverage proved to be a good practical choice, because it strongly correlates with statement coverage (recall Figure 2) and is easy to visualize and comprehend; Section 3.4 provides details.

3.2 Build System Integration
Google uses Blaze as its core build system. Blaze is an internal version of the Bazel build system [17] and supports building binaries, running tests, and collecting code coverage information. One way to trigger coverage computation is to invoke the coverage command for a set of code paths, using the Blaze command-line interface. Blaze automatically handles the coverage computation, abstracting over the implementation details of the underlying instrumentation libraries. Table 1 shows that manual invocations are rare. For example, on a daily basis only about 0.5% of developers manually invoke blaze coverage.

3.3 Coverage Automation
Google's code coverage infrastructure provides further automation on top of Blaze to integrate coverage computation into the regular development workflow. A developer can easily configure automated coverage computation for a project by setting a single boolean option (enable_coverage=true) in the project definition.

Once coverage is enabled for a project, an automated system computes project coverage for it once per day. The automated system also computes changelist coverage for all changelists that directly change the code in that project. For each changelist, at least one coverage computation is run the very first time the changelist is sent for review. If the changelist contents change during the review process, at least one more coverage computation is run after the changelist is finally submitted to the codebase.

Any developer can request a coverage computation for a changelist at any point with a single push of a button in the code review UI. In practice, the author or reviewer of a changelist requests a new computation if its content changes significantly during the review process. If a changelist modifies code in multiple projects, the results are combined.

The coverage automation infrastructure consumes the lcov-formatted output of the coverage libraries and individual test runs, and produces a merged coverage report. Instead of storing the individual lcov outputs, only two (run-length encoded) sequences are stored for each file—one that encodes whether a line was instrumented and one that encodes whether an instrumented line was covered.

957 ESEC/FSE ’19, August 26–30, 2019, Tallinn, Estonia Marko Ivanković, Goran Petrović, René Just, and Gordon Fraser

The raw lcov outputs are then discarded. This compression is an important resource optimization and addresses a major scalability challenge. Any other more verbose format would be prohibitively expensive to store at scale. In the seven years that the two-sequence encoding has been used, it has never limited the usability of the infrastructure; more expressive formats are likely not necessary.

A further technical challenge lies in ensuring successful coverage computations. To improve the success rate, project coverage is computed only on submitted changelists for which the Google Test Automation Platform (TAP [12]) already determined that all tests pass. For changelist coverage, the infrastructure waits a preconfigured amount of time (currently 10 minutes) for TAP to determine if any test fails, in which case coverage will not be computed. While this avoids futile attempts to compute code coverage in the presence of failing tests, it does not guarantee that the coverage computation succeeds. Executing tests with coverage instrumentation enabled requires more resources and changes the environment. Addressing these differences to further improve the success rate required hundreds of specific fixes that target individual corner cases within the project code or configuration.

3.4 Visualization and Data Analytics
Google's code coverage infrastructure visualizes coverage information in several different ways: Changelist coverage is visualized in Critique, Google's code review system. Project coverage is visualized in CodeSearch [1] and in the developers' IDEs. Finally, the infrastructure provides a trend visualization for project coverage, and the coverage data can be queried for data analytics.

Visualization. In Critique, changelist coverage is shown for each project that is affected by the change and reported in a summary. Line coverage is surfaced visually for each line, as shown in Figure 4, and aggregated coverage is reported numerically for each file and for the whole changelist, as shown in Figure 5. Branch coverage is provided for languages for which the underlying libraries support it, visually in the code and as an aggregate.

Figure 4: Visualizing line coverage during code review.
The line numbers (highlighted with the red rectangle) are colored to visualize coverage information. A line number is colored (1) green if that line is covered, (2) orange if that line is not covered, and (3) white if that line is not instrumented. Note that the lines themselves are also colored. This is, however, not related to coverage but rather to code changes. A line is colored with a different shade of green if it was added in this changelist; it is colored red if it was deleted. Visualizing both coverage and code-change information using colors is tricky. We found that using green and red for both coverage and code changes confused developers—using green and orange for coverage worked much better.

Figure 5: Visualizing coverage aggregates during code review.
For each file, |Cov.| gives the overall coverage and ∆ Cov. gives the delta coverage (i.e., coverage only for added or modified lines in that file).

An internal version of CodeSearch [1] helps navigate the repository, which contains over 1 billion lines of code. CodeSearch makes it easy to navigate files, cross-reference symbols, and see the version history. It is also convenient for rendering line coverage, as shown in Figure 6. The underlying data comes from the daily project coverage computations and is visualized only for files that have not changed since the last computation.

Figure 6: Visualizing line coverage during code search.
A line is colored (1) green if it is covered, (2) red if it is not covered, and (3) white if it is not instrumented. Note that unlike Figure 4, this visualization does not provide any other color-coded information, so green and red for coverage works well.

Code editors and IDEs are another natural choice for visualizing code coverage. Using the same data source as for CodeSearch, we provide custom support for many editors. For example, our metaplugin for Vim is open sourced [8].

Because line numbers are present in almost all tools that deal with code, line coverage is a good choice for visualization. Although the coverage integration is conceptually the same for all programming languages, there are nuanced differences between programming languages. For example, whether the opening curly brace of a block counts towards line coverage varies between languages, sometimes even within a single language based on the code before the brace. However, in our experience these conceptual differences do not seem to be important: developers interpret the visualization in context and simply ignore such minor idiosyncrasies.

Data Analytics. In addition to visualizing changelist and project coverage, the coverage infrastructure supports trend visualization and data analysis. For example, a developer can track project coverage over a longer period of time, using a pre-built trend visualization UI. Likewise, a developer can issue specific queries, using the Dremel [22] query system. All coverage data is stored in a central database for interactive analysis. This is mostly used by developers or researchers who are analyzing large batches of coverage data, similar to the numbers reported in this paper.
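The merging and two-sequence storage described in Section 3.3 can be sketched in Python. The `DA:<line>,<hit count>` record is part of the real lcov tracefile format; everything else here (function names, the exact storage layout) is an illustrative assumption, not Google's actual implementation.

```python
def parse_da_records(tracefile_section: str) -> dict:
    """Collect per-line hit counts from the DA records of one file's
    section in an lcov tracefile. A real parser would also handle
    SF:, FN:, BRDA:, and the other record types."""
    hits = {}
    for record in tracefile_section.splitlines():
        if record.startswith("DA:"):
            line, count = record[3:].split(",")[:2]
            hits[int(line)] = hits.get(int(line), 0) + int(count)
    return hits

def run_length_encode(bits):
    """Compress a boolean sequence into (value, run_length) pairs."""
    runs = []
    for bit in bits:
        if runs and runs[-1][0] == bit:
            runs[-1][1] += 1
        else:
            runs.append([bit, 1])
    return [(value, length) for value, length in runs]

# Merge the lcov output of two hypothetical test runs for one file.
run_a = parse_da_records("DA:1,1\nDA:2,0\nDA:3,5\nend_of_record")
run_b = parse_da_records("DA:2,2\nDA:3,0\nDA:4,0\nend_of_record")
merged = run_a.copy()
for line, count in run_b.items():
    merged[line] = merged.get(line, 0) + count

# Store only two run-length encoded boolean sequences per file.
lines = range(1, max(merged) + 1)
instrumented_rle = run_length_encode([line in merged for line in lines])
covered_rle = run_length_encode([merged.get(line, 0) > 0 for line in lines])
print(instrumented_rle)  # [(True, 4)]
print(covered_rle)       # [(True, 3), (False, 1)]
```

Long stretches of consecutively instrumented (or covered) lines compress to a single pair, which is what makes this representation cheap to store at the scale of a billion-line codebase.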


Figure 7: Projects actively using coverage automation.
If a project that has coverage automation enabled does not actively use it (e.g., no daily project coverage computation is successful), it does not contribute towards the active count.

Figure 8: Median weekly project coverage.
The weekly project coverage is the median of the daily project coverages in a given week. If a project had no successful coverage computation in a given week, it is excluded from the aggregation for that week. Note that we exclude data older than 2015 because fewer than 20% of projects were using coverage before 2015 (see Figure 7).

4 COVERAGE ADOPTION AND USEFULNESS
This section reports on the results of analyzing multi-year, historical data and surveying 3000 developers at Google. Specifically, this section shows (1) how Google's developers adopted and integrated code coverage into their workflow, (2) how code coverage ratios evolved over time at Google, and (3) what the perceived usefulness of code coverage is.

Code coverage has been used at Google for at least 13 years. To the best of our knowledge, the oldest internally recorded plans for an automated code coverage infrastructure date back to 2006. The infrastructure described in Section 3, which is the basis for the reported results, is the latest version and has been in continuous use since 2012. Between 2012 and 2018, it collected approximately 13,000,000 daily project coverage measurements and 14,000,000 changelist coverage measurements.

4.1 Adoption of Code Coverage
Code coverage computation is not mandatory across Google. However, projects (or groups of projects) can choose to mandate the use of code coverage in their teams. Further, individual teams can opt into automated code coverage computation, and an individual developer can interactively invoke a code coverage computation. Recall Table 1, which shows the percentage of developers manually invoking blaze coverage interactively. A typical use case for this is to compute code coverage before a changelist is created. Only 0.5% of developers manually invoke code coverage computation on a daily basis. In contrast, 33.5% of developers invoke blaze test (i.e., running tests without coverage computation) on the command line at least once on any given day. This shows that developers treat coverage and binary pass/fail information differently in their workflow; in particular, they check coverage with much lower frequency.

For every changelist, code coverage is computed automatically. Google's code review tool surfaces the results to the benefit of the changelist author and reviewers. Reviewers can use code coverage to ask the author to improve the tests for the change, or authors can do so proactively.

Figure 7 shows the percentage¹ of projects that actively use Google's Test Automation Platform and actively measure project and changelist coverage. The graph shows all data since the introduction of the code coverage automation. The spike in the first quarter of 2015 is a result of an advertisement in Google's famous internal "Testing on the Toilet" [16]. We conjecture that if the infrastructure had been better at that time, the slope of the adoption curve would have continued to be steep after the first quarter.

February 2015 until now is a period of progressively refining and improving the coverage infrastructure to allow more projects to use it. Many projects that were not using code coverage expressed interest, but their code base was not supported by the infrastructure (e.g., coverage instrumentation causing test failures). From our experience (as owners and maintainers of the coverage infrastructure), an unsupported project usually has only a few test failures caused by the coverage instrumentation, but each project has a different root cause. Section 4.3 provides more details.

We do not expect changelist coverage usage to reach 100% for technical reasons: As a resource optimization, the coverage infrastructure only computes changelist coverage if all tests for that changelist pass. This means that some particularly passive projects might not get a successful changelist coverage computation at all in a given time period. Therefore, while the maximum is theoretically 100%, a realistic maximum is likely lower. In contrast, project coverage usage is more likely to eventually reach 100%. Note that the drop in changelist coverage usage in the second quarter of 2018 is not currently understood.

4.2 Code Coverage Ratios
Figure 8 shows the median project coverage as well as the interquartile range (IQR) for all projects for the time period between 2015 and 2018. The chart shows the aggregated weekly project coverage (i.e., the median daily project coverage for a given week).

¹We give percentages because the absolute number of projects varies greatly over the period of time, and it is a confidential value.
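The weekly aggregation behind Figure 8 can be sketched as follows; the project names and daily values are invented, and `None` stands in for a day with no successful coverage computation.

```python
import statistics

def weekly_project_coverage(daily_coverages):
    """Median of a project's successful daily coverages in one week.
    Returns None if the project had no successful computation, in which
    case the project is excluded from that week's aggregation."""
    successful = [c for c in daily_coverages if c is not None]
    return statistics.median(successful) if successful else None

# One week of daily project coverage (percent) for four projects.
week = {
    "project_a": [81, 82, None, 80, 83],  # one failed daily computation
    "project_b": [74, 75, 75, 76, 74],
    "project_c": [90, 90, 91, 89, 90],
    "dormant":   [None] * 5,              # excluded for this week
}
weekly = [cov for cov in map(weekly_project_coverage, week.values())
          if cov is not None]

# Across-project median and interquartile range for the week.
q1, median, q3 = statistics.quantiles(weekly, n=4)
print(f"median={median} IQR=({q1}, {q3})")
```

Aggregating per-project first and across projects second is what smooths out the intermittent test and infrastructure failures mentioned below Figure 8.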


Table 2: Coverage levels and corresponding thresholds. Many projects voluntarily set these thresholds as their goal.

Level     Threshold
Level 1   Coverage automation disabled
Level 2   Coverage automation enabled
Level 3   Project coverage at least 60%; changelist coverage at least 70%
Level 4   Project coverage at least 75%; changelist coverage at least 80%
Level 5   Project coverage at least 90%; changelist coverage at least 90%
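The thresholds in Table 2 translate into a straightforward lookup. The following Python function is an illustration of the level rules only; the signature and names are assumptions, not the actual alerting-system code.

```python
def coverage_level(automation_enabled: bool,
                   project_cov=None,
                   changelist_cov=None) -> int:
    """Return the Table 2 level (1-5) for a project's coverage state.
    Coverage values are percentages; a level is reached only when both
    the project and changelist thresholds are met."""
    if not automation_enabled:
        return 1
    level = 2  # automation enabled, but no threshold met (or no data yet)
    if project_cov is None or changelist_cov is None:
        return level
    for lvl, proj_min, cl_min in [(3, 60, 70), (4, 75, 80), (5, 90, 90)]:
        if project_cov >= proj_min and changelist_cov >= cl_min:
            level = lvl
    return level

print(coverage_level(True, project_cov=65, changelist_cov=72))  # 3
print(coverage_level(True, project_cov=92, changelist_cov=88))  # 4
```

In the second example the project coverage alone would qualify for Level 5, but the changelist coverage (88% < 90%) caps the project at Level 4, since both thresholds must be met.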

Reporting on weekly project coverage smooths out outliers caused by intermittent test or infrastructure failures.

Google does not enforce any code coverage thresholds across the entire codebase. Projects (or groups of projects) are free to define their own thresholds and goals. Many projects opt into a centralized voluntary alerting system that defines five levels of code coverage thresholds. Table 2 shows the criteria for each level. We suspect that the gradual increase in project coverage since mid 2016 reflects a gradual adoption of these levels, which we publicized in 2016. Safety-critical systems, of course, have stricter requirements. To ensure high quality, these projects use, e.g., mutation testing [23, 24], which provides a more rigorous adequacy criterion [19].

4.3 Reasons for Failed Code Coverage Computations
The coverage infrastructure computes project coverage once per day and changelist coverage for each changelist (Section 3.3 gives more details). Figure 9 shows the success rates of the coverage infrastructure for both project and changelist coverage computations. This section describes the most common reasons for failed coverage computations.

Figure 9: Coverage infrastructure success rates.
The success rate indicates the percentage of all automated coverage computations that successfully completed per week. Note that we exclude data older than 2015 because fewer than 20% of projects were using coverage before 2015 (see Figure 7).

A common reason for a failed coverage computation is a test failure that only manifests when coverage instrumentation of the tested code is enabled. Coverage instrumentation is relatively expensive, and performance failures are the most common cause of failed coverage computations. In most programming languages, coverage instrumentation prevents at least some optimizations. This results in longer test run times, which leads to more timeouts. Coverage instrumentation also results in larger binaries, which leads to more out-of-memory errors. The second most common cause is test flakiness. Coverage instrumentation can worsen the flakiness of non-deterministic tests. A simple example would be a test that uses a sleep command to allow the code under test to complete its work; such a test can suddenly become flaky when instrumented, simply because the code runs slower. The progressive increase in the project coverage success rate is a result of a long series of infrastructure improvements to counter failed coverage computations.

Note that the changelist coverage computation was initially implemented alongside TAP, which explains the lower success rate. The current coverage infrastructure awaits TAP's results to avoid attempting coverage computation in the presence of failing tests. The noticeable, sharp drops in Figure 9 for the changelist coverage success rate are coverage infrastructure failures, usually resolved within a week. These failures show the need for maintaining the coverage infrastructure, and resources should be provisioned for such maintenance.

4.4 Perceived Usefulness of Code Coverage
Since the coverage computation and visualization is completely automated and integrated into the developer workflow, we wished to understand to what extent developers consume the provided data and what the perceived usefulness of code coverage is. To that end, we conducted a survey in February 2019, which consisted of three Likert-scale questions and one open-ended question. Figure 10 shows a screenshot of the survey.

Figure 10: Screenshot of the coverage usefulness survey.

The survey asked the following question: "How often do you find coverage useful, when you are..." with three scenarios (Authoring, Reviewing, and Browsing a CL), each using the same five-point Likert scale.


Authoring Reviewing Browsing Authoring Reviewing Browsing

200

20 150

100 10 Number of responses 50 Number of responses

0 0

Never Often Rarely Not Sure Never Rarely Often Sometimes Very Often Sometimes Very OftenNot Sure

Figure 11: Self-reported usefulness of changelist coverage. The survey was sent to 3000 contributors to Google’s codebase, out of which 512 (17%) responded. Each participant was asked to rate the usefulness in the 3 most common use cases: when authoring a code change, when reviewing a code change, and when browsing a code change (i.e., neither reviewing nor authoring).

Figure 12: Self-reported usefulness of changelist coverage by the 46 survey participants who expressed generally negative sentiment towards code coverage.

Table 3: Sentiment towards coverage expressed in the write-in text box in the changelist coverage usefulness survey.

Sentiment   Responses   Percentage
Positive    173         60%
Neutral     67          24%
Negative    46          16%
Total       286         100%

Reviewing, and Browsing a CL). Each scenario used the same five-point Likert scale. We sent the survey to 3000 randomly chosen developers at Google, out of which 512 responded (17%). We randomly selected participants from the set of all people who have committed at least one line to Google’s codebase. While this set contains mostly developers, it also includes people in non-engineering roles.

Figure 11 shows the self-reported usefulness of changelist coverage for the three different scenarios. When authoring a changelist, 45% of respondents reported that they use coverage “Often” or “Very Often” and 25% use it “Sometimes”. When reviewing a changelist, 40% of respondents use coverage “Often” or “Very Often” and 28% use it “Sometimes”. 10% of respondents never use coverage. Overall, a substantial number of developers do use code coverage on a regular basis and find value in it.

Anecdotal evidence from the open-ended text box provides additional insights. A small number of respondents were not developers but still contribute changes to the codebase, e.g., technical writers. Such respondents chose “Never” or “Not sure”. On the opposite side of the scale, some respondents chose to explain the “Very Often” response with statements such as “I use coverage numbers when authoring to make sure I have remembered tests. I find it more important when reviewing as it is hard to see if edge cases are covered quickly without coverage”.

We analyzed the sentiment of all open-ended text box answers and categorized them into three categories:
• Positive: Answers that were generally positive towards the concept of code coverage, even if they raised some problems with the infrastructure.
• Negative: Answers that were generally negative towards the concept of code coverage.
• Neutral: Answers that did not convey a general attitude towards coverage. These were mostly reporting problems with the infrastructure or comments on the survey itself, e.g., “I suspect I’m too senior for the purposes of your survey. I haven’t written or seriously reviewed code in >12 months.”

Table 3 shows the distribution of the answers in these three categories: 60% of responses were generally positive towards coverage, 24% were neutral, and 16% were negative. One interesting observation is that the percentage of respondents who expressed negative sentiment about code coverage is larger than the percentage of projects not using code coverage. Figure 12 shows that some respondents who expressed generally negative sentiment towards coverage still self-reported that they use it “Sometimes” or “Rarely”.

5 RELATED WORK
Code coverage has been known since at least the 1960s. Elmendorf [13] gives a reasonably robust strategy for using branch coverage in testing operating systems in 1969. Piwowarski, Ohba and Caruso [25] mention that coverage measurements were performed within IBM in the late 1960s.
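As an illustrative aside (this is not part of the authors' tooling), the tally behind Table 3 can be sketched in a few lines of Python; the category labels and counts are taken from the table itself, and the helper name `sentiment_table` is our own:

```python
from collections import Counter

def sentiment_table(labels):
    """Given one sentiment label per response, return per-category
    (count, rounded percentage) pairs and the total response count."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cat: (n, round(100 * n / total)) for cat, n in counts.items()}, total

# Reproducing Table 3: 173 positive, 67 neutral, 46 negative write-in answers.
labels = ["Positive"] * 173 + ["Neutral"] * 67 + ["Negative"] * 46
table, total = sentiment_table(labels)
print(total)              # 286
print(table["Positive"])  # (173, 60)
```

Note that independently rounded percentages need not sum to exactly 100; published tables often adjust one category so that they do.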

ESEC/FSE ’19, August 26–30, 2019, Tallinn, Estonia — Marko Ivanković, Goran Petrović, René Just, and Gordon Fraser

Yang et al. [26] conducted a survey of 17 different coverage tools. The survey provides a good overview of the area and contrasts the tools based on features. It does not, however, analyze how to best integrate different tools in a coherent workflow.

Piwowarski et al. [25] describe a large survey of IBM test organizations and their use of coverage in an industrial context. They found that ease of use was the primary factor in coverage measurement adoption, and that once coverage measurement was adopted, it led to an increase in test suite quality. Kim [20] describes using code coverage to test a 19.8 kLOC product in an industrial setting. They conclude that code coverage is useful, but that detailed code coverage analysis is not cost effective, because most defects are localized in a relatively small percentage of error-prone modules, and because there is strong correlation between complex and simple coverage criteria. While these industry reports are very valuable, they often report results based on limited code base size; most importantly, they tend to analyze only one or two programming languages.

Adler et al. at IBM proposed substring hole analysis to analyze code coverage data, which makes it cost-effective for large system tests [9]. This approach enables reasoning about large amounts of coverage information intuitively. To avoid problems inherent in capturing the system test coverage, Chen et al. proposed an automated approach to estimate code coverage with high accuracy using execution logs [10].

Li et al. studied existing Java branch coverage tools, finding that none measure all the branching structures correctly and that bytecode instrumentation, which is used by most tools, is not a valid approach to measure branch coverage on source code [21].

6 CONCLUSIONS AND FUTURE WORK
This paper describes Google’s code coverage infrastructure, how the computed code coverage information is visualized, and how it is integrated into the developer workflow. The code coverage infrastructure is designed to overcome technical challenges such as scale and diversity, but our experience also shows that a further important technical challenge in practice is dealing with failed coverage computations. A considerable multi-year effort was required to sufficiently address these issues at Google. Usage data and a survey of the developers suggest that developers are generally positive towards the idea of code coverage. They view it as a valuable addition to their daily workflows and use it if it is available. The key points to achieve this positive sentiment are usability and low overhead from the developer point of view. We hope that the ideas and experiences described in this paper motivate and help others to develop similar infrastructure. Based on the lessons reported in this paper, we recommend the following:
• Measure coverage automatically at critical points in the development workflow.
• Display coverage information within the tools developers commonly use.
• Expect to invest effort in dealing with errors caused by coverage instrumentation.
• Rely on existing, well-established libraries for computing coverage; these are powerful enough for daily use.
• Integrate coverage computation for all programming languages used under a single, uniform interface.

We plan to further investigate usage data and developer opinions in order to better understand how coverage is used and how it could be better used to the benefit of developers. In particular, we will broaden the scope of our survey and ask for more self-reported data related to coverage, and will correlate this data with objective behavioral data. Ultimately, it would be nice to determine whether adoption of code coverage leads to better software quality, but there is no single, reliable proxy measurement of code quality. For example, the number of changelists may be influenced by code quality, but it might just as well be a result of general development habits. Furthermore, code quality is not the only benefit that should arise from using code coverage. Therefore, some open questions we would like to address in future work are:
• Does the perceived usefulness differ from actual usefulness? For example, does showing coverage during code review actually speed up the review process?
• Do the perceived and actual usefulness depend on the size of the changelist? For example, we conjecture that when authoring full features, like a new class accompanied by a test suite, developers care more about code coverage than when authoring a very small change.
• Does showing coverage during code review have an effect on the final code coverage ratio—that is, will tests be added more often if coverage is shown than when it is not?

ACKNOWLEDGMENTS
We would like to thank Matt Staats, Roger Fleig, Jeff Cox and Mikhail Rybalkin for their comments and suggestions.

REFERENCES
[1] Codesearch. https://cs.chromium.org/.
[2] Coverage.py. https://coverage.readthedocs.io/en/v4.5.x/.
[3] Dart - Coverage. https://github.com/dart-lang/coverage.
[4] Istanbul Code Coverage. https://github.com/istanbuljs.
[5] JaCoCo Java Code Coverage Library. https://www.eclemma.org/jacoco/.
[6] The Go Blog - The cover story. https://blog.golang.org/cover.
[7] Using the GNU Compiler Collection (GCC): Gcov. https://gcc.gnu.org/onlinedocs/gcc/Gcov.html.
[8] vim-coverage. https://github.com/google/vim-coverage.
[9] Adler, Y., Behar, N., Raz, O., Shehory, O., Steindler, N., Ur, S., and Zlotnick, A. Code coverage analysis in practice for large systems. In 2011 33rd International Conference on Software Engineering (ICSE) (2011), IEEE, pp. 736–745.
[10] Chen, B., Song, J., Xu, P., Hu, X., and Jiang, Z. M. J. An automated approach to estimating code coverage measures via execution logs. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering (New York, NY, USA, 2018), ASE 2018, ACM, pp. 305–316.
[11] Chilenski, J. J., and Miller, S. P. Applicability of modified condition/decision coverage to software testing. Software Engineering Journal 9, 5 (September 1994), 193–200.
[12] Elbaum, S., Rothermel, G., and Penix, J. Techniques for improving regression testing in continuous integration development environments. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (New York, NY, USA, 2014), FSE 2014, ACM, pp. 235–245.
[13] Elmendorf, W. R. Controlling the functional testing of an operating system. IEEE Transactions on Systems Science and Cybernetics 5, 4 (1969), 284–290.
[14] Gopinath, R., Jensen, C., and Groce, A. Code coverage for suite evaluation by developers. In Proceedings of the 36th International Conference on Software Engineering (2014), ACM, pp. 72–82.
[15] Google Inc. Google C++ Style Guide. https://google.github.io/styleguide/cppguide.html.
[16] Google Inc. Introducing "Testing on the Toilet". https://testing.googleblog.com/2007/01/introducing-testing-on-toilet.html, Jan. 2007.
[17] Google Inc. Bazel build system. https://bazel.io/, 2015.
[18] Inozemtseva, L., and Holmes, R. Coverage is not strongly correlated with test suite effectiveness. In Proceedings of the 36th International Conference on Software Engineering (2014), ACM, pp. 435–445.


[19] Just, R., Jalali, D., Inozemtseva, L., Ernst, M. D., Holmes, R., and Fraser, G. Are mutants a valid substitute for real faults in software testing? In Proceedings of the Symposium on the Foundations of Software Engineering (FSE) (Nov. 2014), pp. 654–665.
[20] Kim, Y. W. Efficient use of code coverage in large-scale software development. In Proceedings of the 2003 conference of the Centre for Advanced Studies on Collaborative research (2003), IBM Press, pp. 145–155.
[21] Li, N., Meng, X., Offutt, J., and Deng, L. Is bytecode instrumentation as good as source code instrumentation: An empirical study with industrial tools (experience report). In 2013 IEEE 24th International Symposium on Software Reliability Engineering (ISSRE) (2013), IEEE, pp. 380–389.
[22] Melnik, S., Gubarev, A., Long, J. J., Romer, G., Shivakumar, S., Tolton, M., and Vassilakis, T. Dremel: Interactive analysis of web-scale datasets. In Proc. of the 36th Int’l Conf on Very Large Data Bases (2010), pp. 330–339.
[23] Petrović, G., and Ivanković, M. State of mutation testing at Google. In Proceedings of the International Conference on Software Engineering—Software Engineering in Practice (ICSE SEIP) (May 2018).
[24] Petrović, G., Ivanković, M., Kurtz, B., Ammann, P., and Just, R. An industrial application of mutation testing: Lessons, challenges, and research directions. In Proceedings of the International Workshop on Mutation Analysis (Mutation) (Apr. 2018), pp. 47–53.
[25] Piwowarski, P., Ohba, M., and Caruso, J. Coverage measurement experience during function test. In Proceedings of the 15th international conference on Software Engineering (1993), IEEE Computer Society Press, pp. 287–301.
[26] Yang, Q., Li, J. J., and Weiss, D. M. A survey of coverage-based testing tools. The Computer Journal 52, 5 (2009), 589–597.
[27] Zhu, H., Hall, P. A., and May, J. H. Software unit test coverage and adequacy. ACM Computing Surveys (CSUR) 29, 4 (1997), 366–427.
