Does Code Review Really Remove Coding Convention Violations?

Does code review really remove coding convention violations? DongGyun Han§ Chaiyong Ragkhitwetsagul Jens Krinke Matheus Paixao Giovanni Rosa Amazon SERU, Faculty of ICT University College London University of Fortaleza University of Molise Germany Mahidol University UK Brazil Italy Thailand Abstract—Many software developers perceive technical debt Moreover, the same suvery of 682 developers also showed that as the biggest problems in their projects. They also perceive the adoption of code review had the biggest impact to code code reviews as the most important process to increase code quality [4]. quality. As inconsistent coding style is one source of technical debt, it is no surprise that coding convention violations can lead Tao et al. [12] reported that developers believe coding to patch rejection during code review. However, as most research conventions are important criteria to apply in evaluating a has focused on developer’s perception, it is not clear whether code patch, i.e., a code change, during code review. If a patch reviews actually prevent the introduction of coding convention does not follow coding conventions, the patch will be rejected violations and the corresponding technical debt. during the review. Furthermore, Tao et al. [12] observed that Therefore, we investigated how coding convention violations are introduced, addressed, and removed during code review 21.7% of patches in Eclipse and Mozilla projects are rejected if by developers. To do this, we analysed 16,442 code review they violate coding conventions (e.g., poor naming) or contain requests from four projects of the Eclipse community for the missing documentation). Their results show that developers are introduction of convention violations. Our result shows that still struggling with checking coding conventions manually, convention violations accumulate as code size increases despite although automated tools for checking coding conventions are changes being reviewed. We also manually investigated 1,268 code review requests in which convention violations disappear and available such as Checkstyle [13] and PMD [14]. Previous observed that only a minority of them have been removed because studies have reported that developers ignore about 90% of a convention violation has been flagged in a review comment. The warnings generated by automated tools [15], [16]. investigation results also highlight that one can speed up the code Panichella et al. [16] investigated how coding convention review process by adopting tools for code convention violation violations flagged by automated tools disappear during code detection. Index Terms—code review, coding conventions, coding style review. They mainly focused on how a patch under review affects the density of convention violations at the beginning I. INTRODUCTION and end of a code review. They found that the number of The adoption of coding conventions, or programming style convention violations usually drops. However, they also found guidelines, is one of the most widely accepted best practices in that the density of violations (i.e., number of violations per software development. It is assumed that adherence to coding line) does not significantly decrease in each code review. conventions increases not only readability [1], [2], [3] but Although Panichella et al. perform a manual investigation of also maintainability of software. Moreover, inconsistent coding 10% of code reviews for which a violation disappeared, their style is one source of technical debt, specifically code debt. paper does not investigate whether the violations disappear A survey of 682 developers [4] showed that technical debt is because of the reviewers’ comments. Instead, all disappear- the biggest concern of developers and they spend 17% of their ing violations are assumed to be resolved. Moreover, their time addressing technical debt. approach compared the number and the density of violations Code review, a popular method for maintaining high-quality in the files affected by the patch under review in the version software, is the process of conducting reviews of source code after the initial and after the final patch. However, due to the changes by developers other than the change author. During effect of rebasing, the number and density of violations can code review, a change in the form of a patch may undergo be different also in the parts of the files that have not been multiple revisions until it is finally accepted (or abandoned). changed by the patch. Rebasing a patch is necessary if the Diverse empirical studies report benefits of code review, such branch it is relative to has progressed in the development, so as detecting defects [5], improving code quality [6], and that the patch is relative to the current state of the branch. sharing knowledge among team members [7]. Code review Paixao et al. [17] demonstrate that empirical studies which has widely spread in both open source projects and industrial do not account for rebasing may report skewed, biased and projects [8], [9], [10], and has several supporting code re- inaccurate observations, as on average 34% of all reviews are view platforms such as Gerrit [11], CodeFlow and Fabricator. affected by modifications in the reviewed files due to rebasing. Since Panichella et al. do not take rebasing into account, their §DongGyun Han participated in this work prior to joining Amazon. observations and conclusions may be affected. ©2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. To be published in the Proceedings 20th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM) in Adelaide, Australia. The purpose of this paper is to complement Panichella’s code review comment that points out a convention violation. results by (a) taking the effects of rebasing into account and A code snippet which contains a convention violation can (b) distinguishing violations that are resolved in response to disappear during code review for several reasons. For example, reviewers’ comments from violations that disappear coinci- it may be deleted to fix a defect, not because of a coding dentally. We used the code review open platform (CROP) convention issue. Thus, it is important to perform a detailed data [18], a data set created from project repositories and investigation and pin point the convention violations that are their corresponding Gerrit code review repositories [11]. The removed only because of code review comments. This research CROP data contains code review data, the changes that were question identifies convention violations that are important to reviewed, and the entire source code for every patch revision developers and flagged during code review. Then, they are which is usually not recorded in a Git repository. In total, fixed in the subsequent patches under review. In other words, we analysed 16,442 code review requests from four Java sub- we also want to identify the violation types that were never projects of Eclipse with a further detailed investigation of flagged by a reviewer. 1,268 review requests and 2,172 reported convention viola- To answer this research question, we need to extract the set tions. It is important to note that every change to the Eclipse of coding conventions that are perceived as being important projects undergoes code review, i.e., only approved changes in to developers during code review and study such convention their final version are present in the project’s Git repositories. violations in more detail. Previous studies have reported that The contribution of this paper is an in-depth investigation of developers ignore about 90% of warnings generated from coding convention violations during code review. We investi- automated tools [15], [16], and the answer to this research gate what kind of coding convention violations are introduced, question can help to identify important violations so that removed, and addressed during code review. Furthermore, we unimportant violations can be filtered out, preventing ‘Static investigate whether manual coding convention checks done Analysis Fatigue’ [19]. by reviewers delay the code review process. We also discuss RQ3. Do convention violations delay the code review pro- possible explanations for our observations, based on our exam- cess? As can be seen in Section IV and as also reported ination of coding conventions in the four open source projects by others [15], [16], Checkstyle and other automated coding and how automatic tools are used. The discussion highlights convention checking tools are not widely adopted in practice. how developers adopt coding conventions and what difficulties However, if such tools were used only to check for violations they face. that are important to developers, giving the tools a low false positive rate, their use could eliminate the need for human II. EXPERIMENTAL DESIGN developers to check and fix violations and speed up the code A. Research Questions review process. We investigate the delay caused by manual The goal of this paper is to investigate in detail how checking for convention violations that might be mitigated by developers deal with coding convention violations in real- adopting automated tools. world software projects during code

Does Code Review Really Remove Coding Convention Violations?

Java Programming Standards & Reference Guide

Command Line Interface

Squore Acceptance Provides a Fast and High Return on Investment by Efficiently

Devsecops in Reguated Industries Capgemini Template.Indd

Write Your Own Rules and Enforce Them Continuously

Open Source Tools for Measuring the Internal Quality of Java Software Products

Adhering to Coding Standards and Finding Bugs in Code Automatically

Inf 212 Analysis of Prog. Langs Elements of Imperative Programming Style

Jalopy User's Guide V. 1.9.4

C Style and Coding Standards

Advanced Tcl E D

Defining and Assessing Software Quality by Quality Models