
Feedback Topics in Modern Code Review: Automatic Identification and Impact on Changes

Janani Raghunathan, Lifei Liu, Huzefa Kagdi
Department of Electrical Engineering and Computer Science
Wichita State University
Wichita, Kansas 67260, USA
Email: {jxraghunathan, lxliu2, huzefa.kagdi}@wichita.edu

Abstract—Recent empirical studies show that the practice of peer-code-review improves quality. Therein, the code quality is examined from the external perspective of reducing the defects/failures, i.e., bugs, in reviewed software. There is very little to no investigation on the impact of peer-code-review on improving the internal quality of software, i.e., what exactly is affected in code due to this process. To this end, we conducted an empirical study on the human-to-human discourse about code changes, which is recorded in modern code review tools in the form of review comments. The objective of this study was to investigate the topics that are typically addressed via these textual comments. Although there is an existing taxonomy of topics, there is no automatic approach to categorize code reviews. We present a machine-learning-based approach to automatically classify reviewer-to-reviewer and reviewer-to-developer comments on proposed code changes. We applied this approach on 468 code review comments of four open source systems, namely eclipse, mylyn, android and openstack. The results show that Evolvability categories are the dominating topics. In an attempt to verify these observations, we analyzed the code changes that developers performed on receiving these comments. We identified several refactorings that are congruent with the topics of the review comments. Refactorings are mechanisms to improve the internal structure of software. Therefore, our work provides initial empirical evidence on the effectiveness of peer-code-review on improving internal software quality.

DOI reference number: 10.18293/SEKE2018-097

I. INTRODUCTION

Different types of maintenance help improve the sustainability and quality of large-scale software systems. Corrective maintenance helps eliminate or reduce defects in the software, thereby improving the external software quality. Preventive or perfective types of maintenance may not necessarily address the defects or features at hand directly; however, they improve the design or code structure, thereby improving the internal software quality. Testing is primarily used to identify defects and is often considered a ubiquitous mechanism to improve the external quality. Code review is a rejuvenated phenomenon with benefits in improving the software quality. Extensive research shows the usefulness of code review in improving the external quality [1], [2], [3], [4], which is an important result. Unfortunately, there is little to no work on investigating how code review improves the internal software quality. Previous studies [5], [6], [7], [8], [9], [10], [11], [12] show that the internal quality is also critical and equally (or more) important. For example, evolvability or internal design issues could lead to code decay and/or premature degradation.

Peer-code review is the process of reviewers critiquing the changes that developers submit to decide if those changes are of acceptable quality and can be integrated into the main code base of a software system. Nowadays, it is often lightweight, informal and tool-based, which is termed Modern Code Review (MCR) [13]. MCR is popularly used in industrial and open source software-development paradigms [14], [2], [15]. The primary discourse between reviewers and developers is in the form of textual feedback about the code changes at the line level or collectively within an enabling tool, e.g., Gerrit.

The primary objective of this paper is to investigate how code review is effective in improving the internal software quality, which we substantiated with three research questions:

RQ1: What are the common topics, related to both internal and external qualities, discussed during code review?

RQ2: To what extent can these common topics be automatically classified?

RQ3: How do the developers address the review comments that reviewers provide to revise their code changes?

With respect to RQ1, we analyzed the code review repositories of four open source projects, namely eclipse, mylyn, android and openstack, which are archived in Gerrit. We manually investigated the three most relevant attributes of information from Gerrit: the patch description, as it contains the reason the patch was submitted; the lines of code changed by the developers; and the review comments from the reviewers for the proposed code changes. We classified each of the review comments into the most appropriate topic based on the taxonomy of Mantyla et al. [16]. Our results indicate that topics related to both external and internal qualities are found during code review; however, those related to the internal quality are dominating. As a prime example, 75% of the defects identified during code review are evolvability types of defects, and hence the majority of the comments raised by the reviewers are focused on improving the internal software quality.
With respect to RQ2, we present an automatic approach to identify the topics found in code reviews. Our approach builds a machine-learning-based classifier to automatically categorize the code change into its appropriate topic. The results of our automatic classifier suggest that evolvability types of issues in code can be predicted reasonably well, with an average precision and recall of 0.45 and 0.41, respectively. Although Mantyla et al. [16] proposed the taxonomy, they did not offer any automatic solutions to classify the topics or issues typically identified during code review. We also discuss application scenarios of this automatic classification (see Section IV).

With respect to RQ3, we investigated the review comments from mylyn, eclipse and android. We manually studied the changes in the code for the review comments raised by the reviewers. As every review comment corresponded to a specific topic, we were able to correlate the action of a developer to a particular topic. We observed that developers adopted different refactorings to address different defect topics in code review. We investigated the structural changes that took place in different revisions of a code review as a result of review comments and observed that developers did perform refactorings to address the review comments. By comparing those refactoring techniques with the respective topics, we observed a substantial congruence between the topics of feedback from reviewers and the specific (categories of) refactorings developers performed to address the review feedback in revising their code changes. Refactoring is a mechanism to improve the internal design of the software. Therefore, it is evident from our results that code review is effective in improving the internal software quality.

II. BACKGROUND ON MCR AND TAXONOMY OF TOPICS

We define the key concepts involved in the modern code review process, which is driven by supporting infrastructure and tools, e.g., Gerrit.

Code Change: A code change is a set of modified source code files submitted in order to fix a bug or add a feature.

Patch Description: Brief information about the patch and the reason it was submitted. For instance, it contains information about the bug id from Bugzilla, if the patch is submitted to address a particular bug.

Review: A code review is a record of the interactions between the owner of a change and the reviewers of the change, including comments on the code and sign-offs from reviewers.

Owner: An owner is the developer who makes the change in the source code and submits it for review.

Reviewer: A reviewer on a particular review is a developer who is assigned to and/or contributes to that review.

Review Comment: A review comment is textual feedback written by a reviewer about the code change during the review process. A review comment may be about the change in general or may be explicitly tied to a particular part of the change, called an in-line comment.

The life-cycle of a review is as follows: Initially, a developer (owner) makes changes to the source code in response to a bug report or feature request. Once complete, they submit the code change for review. The owner may indicate the intended reviewers, who are subsequently notified about the review invitation. It should be noted that the invited reviewers do not necessarily accept the invitation and contribute to the review. Reviewers then inspect the change through the code review tool (a web page in the case of Gerrit) and provide feedback in the form of review comments to the owner. The owner may update the change and submit the update to the review as a result of such feedback. The code change is typically depicted by showing the difference of the code before and after the change. Eventually, a reviewer signs off on the review, once they believe the code change is of sufficient quality to be checked into the code repository. If a change never receives a sign-off, it is abandoned. It is critical that code review is both effective (actually improves code changes and blocks poor code from being checked into the repository) and timely (does not act as a bottleneck by slowing down changes). Therefore, automation and tool support are key to its success.
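These concepts map naturally onto a small data model. The following sketch is only a convenience encoding with assumed field names; Gerrit's actual data schema is richer and differs in detail:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ReviewComment:
    """Textual feedback from a reviewer; file/line are set only for in-line comments."""
    reviewer: str
    text: str
    file: Optional[str] = None   # None means a general comment on the whole change
    line: Optional[int] = None

@dataclass
class Review:
    """A record of the interactions between the owner and the reviewers of a change."""
    owner: str
    patch_description: str       # reason the patch was submitted, e.g., a Bugzilla id
    changed_files: List[str] = field(default_factory=list)
    reviewers: List[str] = field(default_factory=list)
    comments: List[ReviewComment] = field(default_factory=list)
    status: str = "open"         # eventually "merged" (signed off) or "abandoned"
```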
Mantyla et al. [16] divided the defects detected during code review into the following parent groups: evolvability, functional, and false positives. Evolvability defects are sub-divided into Documentation, Visual Representation, and Structure. The Functional category has seven groups: Resource, Check, Interface, Logic, Timing, Support and Larger defects. Each of these defect categories has a definition, as given by Mantyla et al. [16] and Bosu et al. [17]. We have used the taxonomy proposed by them to classify the reviews in our dataset. Mantyla et al. [16] broadly referred to all types of issues found in code during code review as defects, irrespective of whether an issue is an external defect affecting the functionality of the software or an internal design flaw affecting the internal quality of the software. For consistency and simplicity, we adopted the same terminology: we refer to all types of issues found by the reviewers as different types of defects or topics found in the code.

We define an evolvability defect as a defect in the code that makes the code less compliant with standards, more error-prone, or more difficult to modify, extend, or understand. The functional defects are those that cause a system failure or fail in their business logic. False positives are the class of defects that were initially suspected to be defects but were later discovered to be non-defects during team meetings. Each category is further divided into sub-categories.
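For reference, the parent groups and their sub-categories can be encoded as a simple lookup table; a minimal sketch of such an encoding:

```python
# Parent groups and sub-categories of the defect taxonomy of Mantyla et al. [16].
TAXONOMY = {
    "Evolvability": ["Documentation", "Visual Representation", "Structure"],
    "Functional": ["Resource", "Check", "Interface", "Logic",
                   "Timing", "Support", "Larger defects"],
    "False positive": [],  # suspected defects later judged to be non-defects
}

def parent_group(subcategory: str) -> str:
    """Return the parent group of a sub-category, e.g., 'Check' -> 'Functional'."""
    for group, subs in TAXONOMY.items():
        if subcategory in subs:
            return group
    raise KeyError(subcategory)
```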
III. EMPIRICAL STUDY: FORMULATION OF BENCHMARK

The purpose of this study was twofold: 1) to determine which specific categories were prevalent in the open-source systems under study, and 2) to curate a benchmark to assess our automatic approach for classification (see Section IV). There is no established dataset nor benchmark for our context in the literature. Additionally, our effort can be considered an independent empirical verification or replication of the categories of Mantyla et al. [16]. That is, we address:

RQ1: What are the common topics, related to both internal and external qualities, discussed during code review?

Dataset and Methodology: For our study, we collected the patches of four open source systems, namely eclipse, mylyn, android and openstack, between Jan 2014 and Feb 2016, and classified the review comments into the categories proposed by Mantyla et al. [16] and Bosu et al. [17]. Each patch has a brief textual description called the patch description, the owner who submitted it, the files that are modified as part of the patch, the list of reviewers selected to review the patch, the lines of code that changed, and the review comments that the reviewers made on the changed lines of code. In order to create the benchmark, we considered the following patch selection criteria: 1) patches that were merged or abandoned, and 2) patches that had either in-line or general review comments from the reviewers. We also considered those patches that had a relevant bug id in a bug tracking system like Bugzilla, to better understand the patch and thereby classify the review more accurately.

Review #22722 from Mylyn (https://git.eclipse.org/r/#/c/22722/4) is an example of how we classified an individual review into its appropriate defect category. The reviewer Sam Davis commented on the changes in the file BugzillaRestPostNewTask.java: "This is creative but I'd rather use ImmutableList from Guava." The owner Frank Becker had used an ArrayList, and the feedback suggested the use of ImmutableList instead. It is evident from the changed line of code and the keywords in the review comment, namely "use ImmutableList", that the reviewer is proposing an alternate approach to the owner's solution. As Mantyla et al. [16] mentioned in their work, comments that suggest function call changes or a complete rethinking of the current implementation belong to the Solution Approach defect category. We therefore classified this review into the Solution Approach category. In another example, review #52373, the reviewer commented: "When you copy a big chunk of code like this, it would help to add a comment saying where it's copied from because it's a sign that we might want to create a common implementation in the future." Similarly, in this case, after inspecting the changed line of code, the keywords from the review comment, namely "add comment", suggest that the reviewer is concerned about the documentation. Hence, we categorized this review into the Documentation category.

Results: While consolidating the results of our manual analysis, we observed a total of 17 defect categories, which covered the majority of the defects identified during code review. They are: Check Function, Check User Input, Check Variable, Compare, Compute, Data and Resource Manipulation, Wrong Location, Algorithm/Performance, Organization, Parameter, Solution Approach, Support, Supported by Language, Textual, Variable Initialization, Visual Representation and Compiler Error. Table I shows the distribution of defect categories within and across projects. The results show that topics across both the evolvability and functional categories are found in code review; they cover both external and internal aspects of the reviewed code. It is also evident from these results that 75% (352 out of 468 review comments) of the defects identified during code review are evolvability type defects. Overall, we see that the internal quality aspects are dominant.

TABLE I
DEFECT CATEGORIES/TOPICS COMMONLY FOUND IN EACH PROJECT
Each cell gives the number of reviews and the percentage within the project.

| Defect categories              | Eclipse Platform (86) | Mylyn (108) | Android Platform (98) | OpenStack (177) |
|--------------------------------|-----------------------|-------------|-----------------------|-----------------|
| Evolvability Defects           |                       |             |                       |                 |
| Textual                        | 39 (45.35%)           | 25 (23.36%) | 38 (38.8%)            | 51 (28.8%)      |
| Supported by Language          | 8 (9.3%)              | 7 (6.54%)   | 10 (10.2%)            | 8 (4.54%)       |
| Organization                   | 6 (7.0%)              | 14 (13.08%) | 8 (8.16%)             | 36 (20.45%)     |
| Solution Approach              | 9 (10.50%)            | 27 (25.23%) | 9 (9.18%)             | 18 (10.22%)     |
| Visual Representation          | 8 (9.3%)              | 3 (2.8%)    | 18 (18.37%)           | 9 (5.11%)       |
| Functional Defects             |                       |             |                       |                 |
| Compare                        | 3 (3.48%)             | 4 (3.73%)   | 1 (1.02%)             | 3 (1.7%)        |
| Compute                        | 6 (6.97%)             | 2 (1.86%)   | 5 (5.1%)              | 6 (3.4%)        |
| Check Function                 | –                     | 9 (8.41%)   | 3 (3.06%)             | 33 (18.8%)      |
| Check Variable                 | 4 (4.65%)             | 11 (10.28%) | 6 (6.12%)             | 8 (4.54%)       |
| Check User Input               | –                     | –           | –                     | –               |
| Algorithm/Performance          | –                     | –           | –                     | 1 (0.56%)       |
| Wrong Location                 | 1 (1.16%)             | 3 (2.8%)    | –                     | –               |
| Data and Resource Manipulation | 1 (1.16%)             | –           | –                     | –               |
| Variable Initialization        | –                     | 1 (0.93%)   | –                     | 2 (1.13%)       |
| Parameter                      | –                     | 1 (0.93%)   | –                     | –               |
| Support                        | –                     | 1 (0.93%)   | –                     | –               |
| Timing                         | –                     | –           | –                     | 1 (0.56%)       |
| Compiler Error                 | –                     | –           | 1                     | –               |
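The labels in our benchmark were assigned by manual inspection. Purely as an illustration of the keyword cues used in the two examples above, a naive first-pass matcher might look as follows; the cue lists are hypothetical, not the study's coding guide:

```python
# Illustrative only: hypothetical keyword cues echoing the manual examples above.
KEYWORD_CUES = {
    "Solution Approach": ["rather use", "instead of", "rethink", "alternative"],
    "Documentation":     ["add a comment", "add comment", "document", "javadoc"],
    "Textual":           ["rename", "naming", "typo", "misleading name"],
}

def first_pass_topic(review_comment: str) -> str:
    """Naive cue matcher: return the first topic whose cue appears in the comment."""
    text = review_comment.lower()
    for topic, cues in KEYWORD_CUES.items():
        if any(cue in text for cue in cues):
            return topic
    return "Unclassified"

print(first_pass_topic("This is creative but I'd rather use ImmutableList from Guava."))
# -> Solution Approach
```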
IV. APPROACH: AUTOMATIC CLASSIFIER

As discussed in the previous section, the review topics pervade both external and internal software qualities; however, identifying them manually is non-trivial, tedious, and non-scalable, among other things. Therefore, we need to consider their automatic identification. That is, we address:

RQ2: To what extent can we automatically classify the common topics?

Methodology: We developed a classifier to automatically categorize the reviews into appropriate defect categories or topics, using natural language processing and machine learning. The patch description contains information about the code changes (i.e., the patch) the developer submitted for review. The lines of code capture the changes in the source files that were submitted as part of the patch. The review comments contain the textual feedback the reviewers provide on the code changes. We considered these three features from the code review repository to train our model.

Each review comment, along with its patch description, line of code and respective category (label), was considered a document. The patch description and review comment of each document were preprocessed by removing stop words and stemming. We did not perform any processing of the line-of-code feature because it consists of programming syntax, and we had to preserve that information as it is required to accurately identify the defect category. After preprocessing, we performed term weighting: we produced a dictionary from all of the terms in our documents, assigned a unique integer id to each term appearing in it, and weighted the terms using the tf-idf metric.

The model was trained on the 7 most common defect categories, namely Visual Representation, Supported by Language, Solution Approach, Textual, Logic, Check and Organization, because our dataset did not have enough samples for all of the 17 defect categories to train our model. The labels (topics) for each review were derived from our manual investigation (see Section III). Of the 468 review comments from our manual investigation across all subject systems combined, we considered 240 review comments from three open source projects, namely eclipse, mylyn and android, to train our model.

After the model was trained, we tested it with our test data, which consisted of 50 test cases. The test data consisted of only the patch description and the line of code that changed. The reason for this was to not include any forward-looking information, as it would invalidate the results. For example, review comments are available only once the code review is already under way; therefore, there might be very little benefit in predicting the review topics at that stage. We would like to predict the topics as early as possible and with as little information as possible. Similar to our training data, the test data also underwent preprocessing, such as removing stop words and stemming of the patch description. We fed the processed training data to our classifiers and evaluated their performance on the test data. Once the automatic classification for the test data was performed, the predicted label was compared with the manually identified label and the accuracy was estimated. We adopted three different machine learning algorithms that are commonly used for text classification: KNN (K Nearest Neighbors) with a K value of 7, Naive Bayes, and Support Vector Classification. We observed that KNN performed the best.
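A minimal sketch of this pipeline, using scikit-learn and an NLTK stemmer; the placeholder training rows, helper names, and exact preprocessing details are assumptions, not the paper's actual implementation:

```python
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC

stemmer = PorterStemmer()

def preprocess(text: str) -> str:
    """Stem the natural-language features; stop words are removed by the vectorizer."""
    return " ".join(stemmer.stem(tok) for tok in text.split())

def to_document(patch_description: str, review_comment: str, line_of_code: str) -> str:
    """One document per review: preprocessed text features plus the raw line of code."""
    return " ".join([preprocess(patch_description), preprocess(review_comment), line_of_code])

# Toy placeholders; the real benchmark has 240 manually labeled training comments.
train_rows = [
    ("fix NPE in task editor", "I'd rather use ImmutableList here",
     "List<String> ids = new ArrayList<>();"),
    ("copy REST client code", "please add a comment saying where this is copied from",
     "// copied from BugzillaRestClient"),
]
train_labels = ["Solution Approach", "Textual"]

vectorizer = TfidfVectorizer(stop_words="english")   # dictionary + tf-idf term weighting
X_train = vectorizer.fit_transform(to_document(*row) for row in train_rows)

classifiers = {
    "knn": KNeighborsClassifier(n_neighbors=7),      # K value of 7, as in the study
    "naive_bayes": MultinomialNB(),
    "svc": SVC(),
}
for clf in classifiers.values():
    clf.fit(X_train, train_labels)
```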
Results: With KNN, we observed an accuracy of 20% and an average precision and recall of 0.45 and 0.41, respectively. Table II shows the results of our automatic recommendation model. Our results indicate that we were able to predict both evolvability and functional types of defects. Also, we observe that the evolvability categories, namely Textual, Organization, Visual Representation and Solution Approach, have much more promising levels of precision and recall. Therefore, we surmise that automatic topic classification holds much better promise for internal quality topics than for external quality topics. Our effort is a first step in automating topic identification as soon as a code change is submitted for review.

TABLE II
RESULTS OF AUTOMATIC DEFECT CLASSIFICATION

| Defect categories     | Precision | Recall |
|-----------------------|-----------|--------|
| Textual               | 0.73      | 0.50   |
| Supported by Language | 0.00      | 0.00   |
| Organization          | 0.31      | 0.50   |
| Solution Approach     | 0.27      | 0.80   |
| Visual Representation | 1.00      | 0.50   |
| Check                 | 0.00      | 0.00   |
| Logic                 | 0.60      | 0.38   |
| Avg/Total             | 0.45      | 0.41   |
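Given predicted and manually assigned labels, per-category precision and recall of the kind reported in Table II can be computed directly; a minimal sketch with made-up label vectors standing in for the 50-case test set:

```python
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical label vectors; the real evaluation used the 50 held-out test cases.
y_true = ["Textual", "Organization", "Solution Approach", "Check", "Logic"]
y_pred = ["Textual", "Solution Approach", "Solution Approach", "Logic", "Logic"]

print("accuracy:", accuracy_score(y_true, y_pred))
# Per-category precision/recall plus averages, analogous to Table II.
print(classification_report(y_true, y_pred, zero_division=0))
```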
Application Scenario: This investigation also gave us insight into the reviewers' expertise in identifying specific types of defects. Table III shows the list of reviewers that participated in reviewing the 108 review comments from the mylyn project and the number of defects they identified under each category. It is evident that Sam Davis actively participated in the code review process and that he is more experienced in identifying defect categories like Solution Approach and Textual. On the other hand, with Sam Davis as the owner of two large patches, #47888 (https://git.eclipse.org/r/#/c/47888/) and #61091 (https://git.eclipse.org/r/#/c/61091/), of the total 20 patches that were investigated in mylyn, only 2 defects were identified on his patches, which were of the types Visual Representation and Check. This suggests that active participation in code review helps in building knowledge, improves the quality of the developer and thereby significantly reduces the defect likelihood in one's code. The table also shows that many reviewers did not contribute much during the code review process. However, this can be mitigated by recommending appropriate reviewers, i.e., reviewers who are capable of identifying the specific defect types a patch is prone to.

Other application scenarios include prioritizing the code changes (patches) based on the topics of concern, predicting their acceptance likelihood and completion time, topic-specific knowledge transfer, and assessing the overall project's development and maintenance status and overall maturity.

TABLE III
REVIEWERS AND CONTRIBUTED TOPICS IN MYLYN

| Reviewer           | Textual | Organization | Visual Representation | Solution Approach | Supported by Language | Check | Logic |
|--------------------|---------|--------------|-----------------------|-------------------|-----------------------|-------|-------|
| Sam Davis          | 17      | 13           | –                     | 27                | 4                     | 16    | 9     |
| Steffen Pingel     | 2       | 1            | 3                     | –                 | 1                     | –     | –     |
| Landon Butterworth | 3       | –            | –                     | –                 | –                     | –     | –     |
| Frank Becker       | 1       | –            | –                     | –                 | –                     | –     | –     |
| Doug Janzen        | 1       | –            | –                     | –                 | –                     | –     | –     |
| Colin Ritchie      | –       | –            | –                     | –                 | 2                     | 2     | –     |
| Blaine Lewis       | 2       | –            | –                     | –                 | –                     | 2     | 1     |

Discussion: We discuss the evolution of our automatic classifier; we made several attempts to build a reasonably strong model. In our first attempt, we considered the data collected from the eclipse, mylyn and android projects from our manual investigation and considered all 17 defect categories that were identified during the manual analysis. The training data set consisted of 240 review comments and the test data set consisted of 50 review comments. To build our classifier, we chose the following features from Gerrit: the patch description; the file name, as it would allow analyzing the defect type with respect to the file; the owner name, as it would give information about the quality of the developer; the line of code that was changed in order to address the bug or implement a new feature; and the review comment written for the specific changed line in the code. We preprocessed the dataset (stop-word removal and stemming) and then trained our classifier to predict the defect categories of the test dataset. We used KNN, Naive Bayes and Support Vector Classification for our classifier to automatically recommend the defect category. Our first attempt was not successful: the accuracy came out as low as 4%. The main reason was the insufficient dataset; the samples for each defect category were not sufficient, hence the model could not be trained well.

Because the size of the dataset was small and the number of defect categories was large, for our second trial we reduced the total number of defect categories by combining a few similar categories and eliminating some of the rare ones. We eliminated those defect categories that had very few occurrences in the entire dataset. For instance, we combined the category Check User Input with Check Function, as they are similar: Check Function checks for the return value of a function, and Check User Input asks for a test case to verify the return value of a function. Similarly, we combined Data and Resource Manipulation with Variable Initialization, as both are related to resource management. The Compiler Error category is a very rare scenario because it is highly unlikely for one to submit a file with compilation errors for code review, given the automatic pre-checks typically in place; only one review comment belonged to the Compiler Error category of all the 468 review comments that were investigated. Hence, we eliminated this category in the second round. In total, we considered a list of 13 defect categories for the second trial to train our model; they are as follows: Check Function, Check Variable, Compare, Compute, Organization, Parameter, Resource, Solution Approach, Supported by Language, Textual, Timing, Visual Representation, and Wrong Location. We considered the same set of features as in our first trial (namely patch description, owner name, file name, line of code and review comment). The training and test datasets were also the same as in the first trial (i.e., 240 and 50 review comments, respectively); the only difference was the number of target defect categories on which the model was trained. This time we obtained an accuracy of 6%. The accuracy was still low because of the small dataset and also the noise introduced into the dataset in the form of file and owner names.
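The consolidation performed in the second trial amounts to a relabeling map plus a frequency cutoff. A minimal sketch follows; the merge targets mirror the text above, while the cutoff value is a hypothetical stand-in:

```python
from collections import Counter

# Merges described above; categories in DROPPED are removed from the label set.
MERGE = {
    "Check User Input": "Check Function",      # both check a function's return value
    "Variable Initialization": "Resource",     # resource-management related
    "Data and Resource Manipulation": "Resource",
}
DROPPED = {"Compiler Error"}                   # 1 of 468 comments; pre-checks make it rare
MIN_SAMPLES = 5                                # hypothetical rarity cutoff

def consolidate(labels):
    """Apply the merge map, drop eliminated categories, then drop rare ones."""
    merged = [MERGE.get(label, label) for label in labels if label not in DROPPED]
    counts = Counter(merged)
    return [label for label in merged if counts[label] >= MIN_SAMPLES]
```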
After these two unsuccessful trials, we realized that there was a need to further reduce the total number of defect categories. This time we considered only the most common defect categories for our classifier. We chose only those categories that were identified to be more prevalent in code review; in other words, we chose those target categories that had enough samples in our dataset. We also removed the features file name and owner name, which were mostly acting as noise in our dataset. Thus, in our third trial, we eliminated those features and considered only the patch description, line of code and review comment to train our classifier. This was the trial that gave a significant improvement in accuracy, precision and recall, and it was discussed in detail above.

For our next trial, we considered the data from all four projects. The training data consisted of 360 records and the test data consisted of 108 records. We considered the same features as in our last trial (patch description, line of code and review comment). This time, however, the average precision and recall fell from our previous attempt. Upon analyzing the results, we realized that the low accuracy in general was due to the following reasons. Firstly, the dataset was insufficient, with too few samples for each defect category. Secondly, the results of the manual investigation could not be verified by the concerned developers or reviewers; hence the accuracy of the manual investigation results, on which the automatic classifier was built, could not be verified. Thirdly, our model provides good precision and recall for identifying evolvability defects; however, the code review tool does not capture sufficient features to help us perform automatic classification of functional types of defects effectively. Another important reason is the difference in the line-of-code feature, which is an important feature in our automatic model: openstack is a completely different project with Python as its programming language; eclipse and mylyn use the same programming language, which is Java; whereas android has both C++ and Java code.
V. IMPACT ON REVISING CODE CHANGES

We wanted to investigate how developers receive the review feedback and what actions they take to improve their code changes. Was there any relationship between the topics they were critiqued on and the corresponding actions they took to resolve them? That is, we address:

RQ3: How do the developers address the review comments that reviewers provide to revise their code changes?

By analyzing the code review comments, we identified that reviewers are more inclined towards identifying the internal design flaws in the code: as evident from our manual investigation, 75% of the defects identified by the reviewers are about the internal quality of the code. As a next step, to validate these results, we wanted to study the steps that were taken to address those review comments. We carefully studied each revision of the patch, analysed the review comment and therefore the underlying defect category, and observed the developer's code change addressing that defect, for the following projects: eclipse, mylyn and android.

From our empirical study, we observed that developers incorporated specific refactorings that are congruent with the defect category of the review comments. For instance, in review #22719 (https://git.eclipse.org/r/#/c/22719/1), the reviewer Steffen Pingel suggested in his review comment that a class be split into a separate class. This comment is clearly about an issue in the organization of the code, as it is about rearranging the code such that the software is more comprehensible and maintainable, and hence it can be categorized as Organization. If we observe the next revision of the patch, we see that the developer extracted that part of the code and created a separate class, as suggested by the reviewer. In this case, the developer incorporated the Extract Class refactoring technique.

Similarly, in another review, #60407 (https://git.eclipse.org/r/#/c/60407/3), the reviewer pointed out a naming issue in the developer's code. This defect corresponds to the Textual defect category, where emphasis is given to proper naming and comments in the code, which otherwise can convey misleading information. In the next revision of the changed code (patch), we observed that the developer had changed the name of the method as per the reviewer's comments. This action corresponds to the Rename Method refactoring, as it is about renaming a class, a variable or a method in order to make its purpose clear.

It can be seen from our observations that the majority of the defects identified during code review are evolvability types of defects, and that in order to address those, developers adopted refactoring techniques. Since refactoring is a mechanism to improve the internal structure of the code, it is evident that code review is useful in improving the internal software quality, or the evolvability, of the software. Table IV shows the mapping between the various defect topics and the refactorings adopted by the developers for each of those topics.

TABLE IV
MAPPING BETWEEN TOPICS AND REFACTORINGS

| Defect categories     | Refactorings                                |
|-----------------------|---------------------------------------------|
| Textual               | Rename Method                               |
| Organization          | Extract Method, Extract Class, Move Method  |
| Solution Approach     | Substitute Algorithm                        |
| Supported by Language | Hide Method                                 |
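The reviewed code in these examples is Java, but the two refactorings are language-agnostic. Below is a schematic before/after in Python with invented class and method names (not the actual mylyn code), showing Extract Class for the Organization feedback and Rename Method for the Textual feedback:

```python
# Before: one class mixes task construction with HTTP transport concerns,
# the kind of structure that prompts Organization feedback from a reviewer.
class TaskSubmitter:
    def submit(self, task):
        payload = {"summary": task.summary, "product": task.product}
        self._post("/rest/task", payload)   # transport detail buried in the class

    def _post(self, path, payload): ...

# After: Extract Class moves the transport into its own class, and Rename
# Method gives the entry point an intention-revealing name (Textual feedback).
class RestClient:
    def post(self, path, payload): ...

class RefactoredTaskSubmitter:
    def __init__(self, client: RestClient):
        self._client = client

    def post_new_task(self, task):
        payload = {"summary": task.summary, "product": task.product}
        self._client.post("/rest/task", payload)
```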

VI. THREATS TO VALIDITY

We discuss internal, construct, and external threats to validity.

Misclassification of review comments: The results of the manual investigation could not be verified by the original developers/reviewers or other proficient software engineers, thereby raising the risk of researcher bias.

Heterogeneous Dataset: Although all the open source projects that we considered for our research use the same code review tool and mechanism, the projects themselves are different in nature (e.g., their main programming languages). The line of code is a primary feature in our classifier, which could have impacted our results.

Insufficient Dataset: Since manual investigation is a tedious process, we could not gather enough data for all the defect categories. Hence our dataset was relatively small.

Insufficient Features: The code review tool possibly captures insufficient features for our automatic recommendation model to classify functional defects effectively.

Generalization: Although we investigated four open source systems, we do not claim that our results would generalize to every single software system.

VII. RELATED WORK

There have been many efforts on studying the effectiveness of code review in improving the external quality; however, very few efforts have emphasized the importance of code review in improving the internal software quality. Siy and Votta [15] proposed that 75 percent of the defects found during code reviews are evolvability defects that affect the evolution of the software instead of its runtime behavior. Bosu et al. [17] identified the factors that lead to useful code reviews: they investigated the usefulness of code reviews by performing an empirical study on Microsoft projects, and built and verified a classification model that can distinguish between useful and not useful code review feedback. Recently, McIntosh et al. [2] empirically showed that poor code review negatively affects software quality. In another study, McIntosh et al. [18] reported that the percentage of reviewed changes a code component underwent correlates inversely with its chance of being involved in post-release fixes.

Rigby et al. [19] examined two peer review techniques, review-then-commit and commit-then-review, used by the Apache server project. They measured the frequency of reviews, the level of participation in reviews, and the size of the artifacts under review. Beller et al. [20] found that the types of changes due to modern code review in open source software are similar to those in the industrial and academic systems from the literature, featuring a similar ratio of maintainability-related to functional problems. Kemerer and Paulk [3] showed that code review reduces the amount of defects in student projects. With the available data, they were also able to study the impact of the review rate on inspection performance. They found high review rates (i.e., a high number of reviewed LOC/hour) to be associated with a decrease in inspection effectiveness.

VIII. CONCLUSIONS AND FUTURE WORK

We conducted an empirical study on the types of topics in the reviewers' feedback provided to developers on their code changes. Four open source systems were the subject of this investigation: eclipse, mylyn, android and openstack. Furthermore, we presented an automated approach to predict the potential topics of reviewers' feedback as soon as a developer submits their code changes for review. Lastly, we also examined the impact of these review comments on the revisions that developers perform on their changes. We found that topics relevant to both external and internal code qualities are discussed in code review; however, those on the internal quality are dominant. Also, developers use refactorings to address those topics, and the specific refactorings seem to align with the specific nature of the review feedback topics. In summary, we provide evidence of the benefits of code review on internal code quality. Our future work will be directed at improving the automatic detection of these topics (e.g., accuracy) and developing their applications to further empower the peer-code-review process. To facilitate replication, among other things, we provide access to our online appendix: http://serl.cs.wichita.edu/codereview/topicmodel.
REFERENCES

[1] B. Boehm and V. R. Basili, "Software defect reduction top 10 list," Computer, vol. 34, pp. 135–137, Jan. 2001.
[2] S. McIntosh, Y. Kamei, B. Adams, and A. E. Hassan, "The impact of code review coverage and code review participation on software quality: A case study of the Qt, VTK, and ITK projects," in Proceedings of the 11th Working Conference on Mining Software Repositories (MSR 2014), pp. 192–201, 2014.
[3] C. Kemerer and M. Paulk, "The impact of design and code reviews on software quality: An empirical study based on PSP data," IEEE Transactions on Software Engineering, vol. 35, pp. 534–550, July 2009.
[4] O. Laitenberger, "Studying the effects of code inspection and structural testing on software quality," in Proceedings of the Ninth International Symposium on Software Reliability Engineering (ISSRE), pp. 237–246, IEEE, 1998.
[5] R. S. Arnold, "Software restructuring," Proceedings of the IEEE, vol. 77, pp. 607–617, Apr. 1989.
[6] M. Fowler, Refactoring: Improving the Design of Existing Code. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999.
[7] N. Gorla, A. C. Benander, and B. A. Benander, "Debugging effort estimation using software metrics," IEEE Transactions on Software Engineering, vol. 16, pp. 223–231, Feb. 1990.
[8] W. Li and S. Henry, "Object-oriented metrics that predict maintainability," Journal of Systems and Software, vol. 23, pp. 111–122, 1993.
[9] R. J. Miara, J. A. Musselman, J. A. Navarro, and B. Shneiderman, "Program indentation and comprehensibility," Communications of the ACM, vol. 26, pp. 861–867, Nov. 1983.
[10] P. W. Oman and C. R. Cook, "Typographic style is more than cosmetic," Communications of the ACM, vol. 33, pp. 506–520, May 1990.
[11] H. D. Rombach, "A controlled experiment on the impact of software structure on maintainability," IEEE Transactions on Software Engineering, vol. 13, pp. 344–354, Mar. 1987.
[12] T. Tenny, "Program readability: Procedures versus comments," IEEE Transactions on Software Engineering, vol. 14, pp. 1271–1279, Sep. 1988.
[13] A. Bacchelli and C. Bird, "Expectations, outcomes, and challenges of modern code review," in Proceedings of the International Conference on Software Engineering, IEEE, May 2013.
[14] P. C. Rigby and C. Bird, "Convergent contemporary software peer review practices," in Proceedings of the Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), ACM, 2013.
[15] V. Balachandran, "Reducing human effort and improving quality in peer code reviews using automatic static analysis and reviewer recommendation," in Proceedings of the 2013 International Conference on Software Engineering (ICSE '13), pp. 931–940, 2013.
[16] M. Mantyla and C. Lassenius, "What types of defects are really discovered in code reviews?," IEEE Transactions on Software Engineering, vol. 35, pp. 430–448, May 2009.
[17] A. Bosu, M. Greiler, and C. Bird, "Characteristics of useful code reviews: An empirical study at Microsoft," in 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories, pp. 146–156, May 2015.
[18] R. Morales, S. McIntosh, and F. Khomh, "Do code review practices impact design quality? A case study of the Qt, VTK, and ITK projects," in Proceedings of the 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER), pp. 171–180, 2015.
[19] P. C. Rigby, D. M. German, and M.-A. Storey, "Open source software peer review practices: A case study of the Apache server," in Proceedings of the 30th International Conference on Software Engineering, pp. 541–550, ACM, 2008.
[20] M. Beller, A. Bacchelli, A. Zaidman, and E. Juergens, "Modern code reviews in open-source projects: Which problems do they fix?," in Proceedings of the 11th Working Conference on Mining Software Repositories (MSR 2014), pp. 202–211, ACM, 2014.