The Impact of Code Review Coverage and Code Review Participation on Software Quality
A Case Study of the Qt, VTK, and ITK Projects

Shane McIntosh1, Yasutaka Kamei2, Bram Adams3, and Ahmed E. Hassan1
1Queen's University, Canada   2Kyushu University, Japan   3Polytechnique Montréal, Canada
1{mcintosh, ahmed}@cs.queensu.ca   [email protected]   [email protected]

ABSTRACT
Software code review, i.e., the practice of having third-party team members critique changes to a software system, is a well-established best practice in both open source and proprietary software domains. Prior work has shown that the formal code inspections of the past tend to improve the quality of software delivered by students and small teams. However, the formal code inspection process mandates strict review criteria (e.g., in-person meetings and reviewer checklists) to ensure a base level of review quality, while the modern, lightweight code reviewing process does not. Although recent work explores the modern code review process qualitatively, little research quantitatively explores the relationship between properties of the modern code review process and software quality. Hence, in this paper, we study the relationship between software quality and: (1) code review coverage, i.e., the proportion of changes that have been code reviewed, and (2) code review participation, i.e., the degree of reviewer involvement in the code review process. Through a case study of the Qt, VTK, and ITK projects, we find that both code review coverage and participation share a significant link with software quality. Low code review coverage and participation are estimated to produce components with up to two and five additional post-release defects respectively. Our results empirically confirm the intuition that poorly reviewed code has a negative impact on software quality in large systems using modern reviewing tools.

Categories and Subject Descriptors
D.2.5 [Software Engineering]: Testing and Debugging - Code inspections and walk-throughs

General Terms
Management, Measurement

Keywords
Code reviews, software quality

1. INTRODUCTION
Software code reviews are a well-documented best practice for software projects. In Fagan's seminal work, formal design and code inspections with in-person meetings were found to reduce the number of errors detected during the testing phase in small development teams [8]. Rigby and Bird find that the modern code review processes that are adopted in a variety of reviewing environments (e.g., mailing lists or the Gerrit web application1) tend to converge on a lightweight variant of the formal code inspections of the past, where the focus has shifted from defect-hunting to group problem-solving [34]. Nonetheless, Bacchelli and Bird find that one of the main motivations of modern code review is to improve the quality of a change to the software prior to or after integration with the software system [2].

1 https://code.google.com/p/gerrit/

Prior work indicates that formal design and code inspections can be an effective means of identifying defects so that they can be fixed early in the development cycle [8]. Tanaka et al. suggest that code inspections should be applied meticulously to each code change [39]. Kemerer and Paulk indicate that student submissions tend to improve in quality when design and code inspections are introduced [19]. However, there is little quantitative evidence of the impact that modern, lightweight code review processes have on software quality in large systems.

In particular, to truly improve the quality of a set of proposed changes, reviewers must carefully consider the potential implications of the changes and engage in a discussion with the author. Under the formal code inspection model, time is allocated for preparation and execution of in-person meetings, where reviewers and author discuss the proposed code changes [8]. Furthermore, reviewers are encouraged to follow a checklist to ensure that a base level of review quality is achieved. However, in the modern reviewing process, such strict reviewing criteria are not mandated [36], and hence, reviews may not foster a sufficient amount of discussion between author and reviewers. Indeed, Microsoft developers complain that reviews often focus on minor logic errors rather than discussing deeper design issues [2].

We hypothesize that a modern code review process that neglects to review a large proportion of code changes, or suffers from low reviewer participation, will likely have a negative impact on software quality. In other words:

    If a large proportion of the code changes that are integrated during development are either: (1) omitted from the code review process (low review coverage), or (2) have lax code review involvement (low review participation), then defect-prone code will permeate through to the released software product.
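To make the two review properties in this hypothesis concrete, the following minimal sketch shows one way they could be operationalized from change-review links. The data layout, field names, and the particular participation signal are hypothetical illustrations, not the measurement tooling or metric definitions used in this study.

    # Illustrative only: hypothetical data layout for VCS changes linked to reviews.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Change:
        component: str
        reviewed: bool              # is the VCS change linked to a code review?
        reviewer_comments: int = 0  # number of reviewer messages on that review

    def review_coverage(changes: List[Change]) -> float:
        """Proportion of changes that have been code reviewed."""
        return sum(c.reviewed for c in changes) / len(changes) if changes else 0.0

    def no_discussion_rate(changes: List[Change]) -> float:
        """A simple participation signal: reviewed changes with no reviewer discussion."""
        reviewed = [c for c in changes if c.reviewed]
        if not reviewed:
            return 0.0
        return sum(c.reviewer_comments == 0 for c in reviewed) / len(reviewed)

    changes = [Change("libFoo", True, 3), Change("libFoo", True, 0), Change("libFoo", False)]
    print(review_coverage(changes))     # 0.67 -> two of three changes were reviewed
    print(no_discussion_rate(changes))  # 0.5  -> one reviewed change had no discussion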
Tools that support the modern code reviewing process, such as Gerrit, explicitly link changes to a software system recorded in a Version Control System (VCS) to their respective code review. In this paper, we leverage these links to calculate code review coverage and participation metrics and add them to Multiple Linear Regression (MLR) models that are built to explain the incidence of post-release defects (i.e., defects in official releases of a software product), which is a popular proxy for software quality [5, 13, 18, 27, 30]. Rather than using these models for defect prediction, we analyze the impact that code review coverage and participation metrics have on them while controlling for a variety of metrics that are known to be good explainers of code quality. Through a case study of the large Qt, VTK, and ITK open source systems, we address the following two research questions:

(RQ1) Is there a relationship between code review coverage and post-release defects?
Review coverage is negatively associated with the incidence of post-release defects in all of our models. However, it only provides significant explanatory power to two of the four studied releases, suggesting that review coverage alone does not guarantee a low incidence rate of post-release defects.

(RQ2) Is there a relationship between code review participation and post-release defects?
Developer participation in code review is also associated with the incidence of post-release defects. In fact, when controlling for other significant explanatory variables, our models estimate that components with lax code review participation will contain up to five additional post-release defects.
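For readers unfamiliar with explanatory regression modelling, the sketch below shows the general shape of such a model: review metrics enter as explanatory variables alongside known defect explainers, and the fitted coefficients are inspected rather than used for prediction. The column names, control variables, and data are hypothetical and do not reproduce the authors' model specification or dataset.

    # Illustrative only: hypothetical component-level data with defect counts,
    # control metrics, and the two review metrics of interest.
    import pandas as pd
    import statsmodels.formula.api as smf

    components = pd.DataFrame({
        "post_release_defects": [0, 2, 1, 5, 0, 3, 4, 1],
        "size":                 [1200, 5400, 800, 9100, 600, 7000, 8800, 1500],
        "prior_defects":        [1, 3, 0, 6, 0, 4, 5, 1],
        "review_coverage":      [1.0, 0.4, 0.9, 0.2, 1.0, 0.5, 0.3, 0.8],
        "no_discussion_rate":   [0.0, 0.6, 0.1, 0.8, 0.0, 0.5, 0.7, 0.2],
    })

    # Multiple linear regression: explain post-release defect counts with the
    # review metrics while controlling for known explainers of defect-proneness.
    model = smf.ols(
        "post_release_defects ~ size + prior_defects + review_coverage + no_discussion_rate",
        data=components,
    ).fit()
    print(model.params)    # sign and magnitude of each explanatory variable
    print(model.rsquared)  # overall explanatory power

The paper's models additionally apply transformations and control for several product and process metrics described in its study design; the snippet only illustrates the explanatory (rather than predictive) use of such models.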
Paper organization. The remainder of the paper is organized as follows. Section 2 describes the Gerrit-driven code review process that is used by the studied systems. Section 3 describes the design of our case study, while Section 4 presents the results of our two research questions. Section 5 discloses the threats to the validity of our study. Section 6 surveys related work. Finally, Section 7 draws conclusions.

2. GERRIT CODE REVIEW
Gerrit is a modern code review tool that facilitates a traceable code review process for git-based software projects [4]. Gerrit tightly integrates with test automation and code integration tools.

Figure 1: An example Gerrit code review.

Reviewers examine the proposed changes and provide comments for the author to address or discuss. The author can reply to comments or address them by producing a new revision of the patch for the reviewers to consider. Reviewers can also give the changes proposed by a patch revision a score, which indicates: (1) agreement or disagreement with the proposed changes (positive or negative value), and (2) their level of confidence (1 or 2). The second column of the bottom-most table in Figure 1 shows that the change has been reviewed and the reviewer is in agreement with it (+). The text in the fourth column ("Looks good to me, approved") is displayed when the reviewer has a confidence level of two.

Verifiers. In addition to reviewers, verifiers are also invited to evaluate patches in the Gerrit system. Verifiers execute tests to ensure that: (1) patches truly fix the defect or add the feature that the authors claim to, and (2) do not cause regression of system behaviour. Similar to reviewers, verifiers can provide comments to describe verification issues that they have encountered during testing. Furthermore, verifiers can also provide a score of 1 to indicate successful verification, and -1 to indicate failure.

While team personnel can act as verifiers, so too can Continuous Integration (CI) tools that automatically build and test patches. For example, CI build and testing jobs can be automatically generated each time a new review request or patch revision is uploaded to Gerrit. The reports generated by these CI jobs can be automatically appended as a verification report to the code review discussion. The third column of the bottom-most table in Figure 1 shows that the "Qt Sanity Bot" has successfully verified the change.

Automated integration. Gerrit allows teams to codify code review and verification criteria that must be satisfied before changes are integrated into upstream VCS repositories. For example, a team policy may specify that at least one reviewer and one verifier provide positive scores prior to integration. Once the criteria are satisfied, patches are automatically integrated into upstream repositories. The "Merged" status shown in the upper-most table of Figure 1 indicates that the example change has been integrated.
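Gerrit expresses such criteria declaratively in its project configuration; purely as an illustration of the policy logic described above (not Gerrit's actual configuration syntax, data model, or default blocking semantics, which are project-specific), the sketch below checks whether a patch's votes satisfy an example policy of one positive review score and one positive verification score.

    # Illustrative only: a toy model of the integration policy described above.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Vote:
        label: str   # e.g., "Code-Review" (scores -2..+2) or "Verified" (scores -1..+1)
        value: int

    def can_integrate(votes: List[Vote]) -> bool:
        """Example policy: at least one positive review score and one positive
        verification score, with no blocking negative votes."""
        review = [v.value for v in votes if v.label == "Code-Review"]
        verify = [v.value for v in votes if v.label == "Verified"]
        blocked = any(v <= -2 for v in review) or any(v <= -1 for v in verify)
        return not blocked and any(v >= 1 for v in review) and any(v >= 1 for v in verify)

    # A patch with one "+2" review and a successful automated verification passes.
    print(can_integrate([Vote("Code-Review", 2), Vote("Verified", 1)]))  # True
    # A patch that was never verified does not.
    print(can_integrate([Vote("Code-Review", 2)]))                       # False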