
A Large-Scale Study on Source Code Reviewer Recommendation

Jakub Lipčák and Bruno Rossi, Faculty of Informatics, Masaryk University, Brno, Czech Republic. Email: jakub.lipcak@.com, [email protected]

Abstract—Context: code reviews are an important part of the development process, leading to better software quality and reduced overall costs. However, finding appropriate code reviewers is a complex and time-consuming task. Goals: In this paper, we propose a large-scale study to compare the performance of two main source code reviewer recommendation algorithms (RevFinder and a Naïve Bayes-based approach) in identifying the best code reviewers for opened pull requests. Method: We mined data from Github and Gerrit repositories, building a large dataset of 51 projects, with more than 293K pull requests analyzed, 180K owners and 157K reviewers. Results: Based on the large analysis, we can state that i) no model can be generalized as best for all projects, ii) the usage of a different repository (Gerrit, GitHub) can have an impact on the recommendation results, iii) exploiting sub-projects information available in Gerrit can improve the recommendation results.

Index Terms—Source Code Reviewer Recommendation, Distributed Software Development, Mining Software Repositories

I. INTRODUCTION

The source code review process was formalized in 1976 by M. E. Fagan, as a highly structured process based on line-by-line group reviews in the form of inspections [1]. The code review practices have rapidly changed in recent years [2], and the modern code review process has become informal (in contrast to Fagan's definition [1]), tool-based [3], more light-weight, continuous and asynchronous [4].

Software code reviews play nowadays a major role in improving the quality of software in terms of defects identified [5], [6]. However, code reviews are a time-consuming process, with a significant amount of human effort involved [7]. Managers might have concerns that code reviews will slow the projects down [6]. Code reviews could also lead to negative moods in teams if team members fear public criticism caused by others reviewing their source code [8].

At a process level, source code review processes in companies nowadays are similar to the processes adopted in open source projects [6], [9], [10], with a lot of variation in the process steps [6], [11]. Furthermore, source code reviews can be expensive: knowledgeable understanding of large code changes by reviewers is a time-consuming process, and finding the most knowledgeable code reviewers for the source code parts to be reviewed can also be very labor-intensive for developers. Thongtanunam et al. [12] examined comments from more than 1,400 representative review samples of four open source projects. The authors discovered that 4%-30% of reviews have a code reviewer assignment problem, taking approximately 12 days longer to approve a code change with a code reviewer assignment problem, increasing project costs.

The reduction of such costs was the main reason for the emergence of source code reviewer recommendation approaches [12]–[18] and for the appearance of automation in software engineering, beneficial also for industry [19], [20]. The automated recommendation of code reviewers can reduce the effort necessary to find the best code reviewers, cutting the time spent by code reviewers on understanding large code changes [13]. This can improve the effectiveness of the pull-based model, with source code reviewers assigned to the pull requests immediately after their creation.

In previous studies about source code reviewer recommendation algorithms, the compared projects were limited in size and number, with typically 8K-45K pull requests considered ([12], [15], [21]–[23]). In this paper, we look into a large-scale evaluation (293K pull requests, 51 projects) based on Gerrit and Github repositories, that can give more insights about the performance of reviewer recommendation approaches. Our goal is to study the impact of the different repository type used for the source code recommendation. For this reason we chose two main baseline algorithms: RevFinder, a heuristic model based on file paths [12], and a Naïve Bayes-based approach (NB), a probabilistic model based on additional features such as owners. More advanced algorithms exist; however, they require additional features to be built (e.g., collaboration networks between developers), making the mining process more complex.

We have two main contributions in this paper:
• provision of a large dataset of 51 projects mined from Gerrit (14) and Github (37) with 293,377 total pull requests analyzed, considering 180,111 owners and 157,885 reviewers. The dataset is available on Figshare [24].
• comparison of the results in the two repositories (Gerrit, GitHub) using both RevFinder and Naïve Bayes-based approaches in the context of the 51-projects dataset.

The article is structured as follows. In section II, we propose the background on several algorithms for code reviewer recommendation. In section III we have the experimental study design, with research questions, context, data analysis and replicability information. In section IV, we answer the research questions with discussions and threats to validity. Section V proposes the related works and section VI the conclusions.

II. BACKGROUND

Over the years, many approaches for the automation of source code reviewers recommendation emerged [12], [14]–[18]. We categorize the most relevant existing recommendation algorithms into four groups based on the main features they process and the main techniques they use to recommend code reviewers: i) Heuristic-based approaches, ii) Machine Learning-based, iii) Social Networks-based, iv) Hybrid approaches.

A. Heuristic-based Approaches

Traditional recommendation approaches process historical project review data and use heuristic-based algorithms to find the most relevant code reviewers. The main algorithms are ReviewBot [17], RevFinder [12] (that we will explain later, as part of our empirical analysis), and CORRECT [14].

ReviewBot is a technique proposed by Balachandran [17]. It is a code reviewer recommendation approach based on the assumption that lines of code changed in the pull request should be reviewed by the same code reviewers who had previously discussed or modified the same lines of code (familiarity assumption).

CORRECT (Code Reviewer Recommendation based on Cross-project and Technology experience) is another code reviewers recommendation technique [14]. The baseline idea is that if a previous pull request used some similar external library or technology to the current pull request, then the reviewers of the past pull request are also good candidates for the current one (expertise assumption).

B. Machine Learning

Algorithms in this group use different Machine Learning techniques for the recommendation of code reviewers. They are mainly different from the previous group, as they first need to build a model based on a training set. One typical approach is by Jeong et al. [18]. Predicting Reviewers and Acceptance of Patches is an approach by Jeong et al. [18] that uses the Bayesian Network technique to predict reviewers and patch acceptance based on a series of features such as patch meta-data, patch content and bug report information.

C. Social Networks

Social networks have also been used to determine similarities in communication between developers, suggesting more similar candidates for source code reviews. The Comment Network (CN) approach was proposed by Yu et al. [16] as a code reviewer recommendation approach analyzing social relations between contributors and developers. CN-based recommendation is based on the idea that the interest of developers can be extracted from their commenting interaction. Developers sharing common interests with the originator of a pull request are considered to be appropriate code reviewers [16].

D. Hybrid Approaches

Algorithms in this group use different approaches (machine learning, social network analysis) for the recommendation of code reviewers. An example is CoreDevRec [15]. Automatic Core Member Recommendation for Contribution Evaluation (CoreDevRec) builds a prediction model from historical pull requests using file path similarities, developers' GitHub relationship features, and activeness features, using a Support Vector Machine (SVM) for prediction [15].

Table I shows a summary of all the features utilized by the reviewed algorithms, while Table II summarizes the availability of source code, datasets, and metrics used for empirical validation of the proposed algorithms.

TABLE I. SUMMARY OF FEATURES USED IN MOST COMMON SOURCE CODE REVIEWERS RECOMMENDATION ALGORITHMS. V=USED, X=NOT USED

Feature                  RevFinder [12]  ReviewBot [17]  CORRECT [14]  CoreDevRec [15]  Pred. Rev. [18]  Comm. Net. [16]  Total
File paths               V               X               X             V                X                X                2
Social interactions      X               X               X             V                X                V                2
Line change history      X               V               X             X                X                X                1
Reviewer expertise       X               X               V             X                X                X                1
Activeness of reviewers  X               X               X             V                X                X                1
Patch meta-data          X               X               X             X                V                X                1
Patch content            X               X               X             X                V                X                1
Bug report information   X               X               X             X                V                X                1
Total                    1               1               1             3                3                1

TABLE II. SOURCE CODE & DATASETS AVAILABILITY, METRICS USED IN THE EMPIRICAL STUDIES. V=USED, X=NOT USED
(Rows: ReviewBot [17], RevFinder [12], CORRECT [14], CoreDevRec [15], Pred. Rev. [18], Comm. Net. [16]. Columns: Source code available, Dataset available, Top-k, MRR, Precision, Recall, F-Measure.)

III. EMPIRICAL STUDY DESIGN

The goal of this paper is to analyze source code recommendation algorithms, with the purpose of investigating i) performance on a large set of projects, ii) the impact of the different repository type used for the source code recommendation. We focus on two main algorithms: RevFinder and a novel Naïve Bayes-based implementation, as these can give a good baseline for comparison. We have three main research questions:

• RQ1: What is the comparison between RevFinder and NB algorithms in terms of performance on both Gerrit and Github Repositories? In this research question, we investigate two major algorithms for source code recommendation on 51 projects, based on top-k accuracy and Mean Reciprocal Rank (MRR) metrics.

• RQ2: What is the impact of using data from GitHub or Gerrit for source code reviewers recommendation? In this research question, we look at the impact of different repositories and their characteristics in source code recommendation results.

• RQ3: Is there any performance improvement when considering the additional sub-projects feature (Gerrit-specific)? In this research question, we look at exploiting the additional sub-projects feature available in the Gerrit repositories, looking at whether such feature can be useful to improve source code recommendation performance.

A. Context

For the empirical evaluation, we compare two alternative approaches: RevFinder [12], based on file paths, and a Naïve Bayes implementation, based on two main features: file paths and owners of reviews.

1) RevFinder: RevFinder [12] is an approach based on the location of files included in pull requests. The idea of this method is that files located in similar file paths contain similar functionality and therefore should be reviewed by similar code reviewers.

The first part of the RevFinder approach is the Code Reviewers Ranking Algorithm. It compares file paths included in a new pull request with all previously reviewed file paths, using four string comparison techniques (Longest Common Prefix, Longest Common Suffix, Longest Common Substring and Longest Common Subsequence). Code reviewer candidates are assigned points in this step. The more similar the file paths are, the higher is the number of points assigned to the code reviewers who previously reviewed them.

The second important part of the RevFinder approach is the combination technique. The results of each of the four string comparison techniques are combined using the Borda count combination method. A sorted list of candidates with their scores is then returned as the output of the RevFinder algorithm [12].

We reimplemented the algorithm based on the information that was available in the research paper [12] and in the GitHub repository for the string comparison part (https://github.com/patanamon/revfinder).

2) Naïve Bayes Recommendation: The pseudo-code of our code reviewer recommendation algorithm is shown in Fig. 1. The algorithm takes a new review as input (R_n). This review has to contain information about all modified file paths, the project name and the owner of the review. A sorted list of recommended code reviewer candidates (C) is returned as the output. The prediction model has to be built at the beginning from all previously closed reviews (lines 5 and 6).

 1  Input:
 2    R_n: A new review
 3  Output:
 4    C: Sorted list of code reviewer candidates
 5  pastReviews ← List all previously closed reviews
 6  bayesRec ← buildModel(pastReviews)
 7  Files_n ← getFiles(R_n)
 8  for f_n ∈ Files_n do
 9    # Get sorted list of recommended reviewers
10    # for every file to be reviewed
11    reviewers ← bayesRec.recom(f_n, R_n.own, R_n.prjName)
12    # Assign points to reviewers
13    scoreCount ← 0
14    for r ∈ reviewers do
15      scoreCount++
16      C[r].score ← C[r].score + reviewers.len − scoreCount
17    end for
18  end for
19  C ← removeRetiredReviewers(C)
20  return C.sortBy(score)

Fig. 1. Naïve Bayes-based Code Reviewer Recommendation Algorithm.

Lines 8 to 18 describe the main loop where code reviewer candidates are recommended. It iterates over all files modified in the review. Feature probabilities are computed separately for every file and therefore code reviewers are recommended separately. Lines 14 to 17 calculate the scores achieved by code reviewers. Every code reviewer is assigned a whole number based on his position in the recommendation list. All the code reviewers who reviewed at least one change request in the past will appear in the recommendation list thanks to probability smoothing [25]. Score calculation is done for every file and the achieved scores are added together.

Finally, retired reviewers are concatenated to the end of the result list by the removeRetiredReviewers function (line 19). This function iterates over all code reviewers and moves down those who have not done any code reviews in the recent n months. Finally (line 20), code reviewers are sorted by their scores and a sorted list of reviewer candidates is returned as the result.

3) RevFinder+ and NB+: As pull requests in the same sub-project are often reviewed by similar code reviewers, we added the sub-project name as a feature of the RevFinder and NB algorithms for the recommendation process: we named these versions RevFinder+ and NB+. Unfortunately, this information is only available in the Gerrit repository, so we could only apply it in the context of the projects hosted in Gerrit.

B. Approach

For the experiments, we gathered data from 51 open source software projects through the Gerrit and GitHub systems APIs (Table III, which provides information about the number of pull requests, reviewers, and owners in the dataset). We chose 14 large and active Gerrit projects and 37 GitHub projects, which belonged to the most popular GitHub repositories according to the gained stars.
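Before turning to the dataset, the ranking and combination steps of RevFinder described in Section III-A1 can be illustrated with a minimal sketch. The Python code below is ours (the authors' implementation is a Java/Spring application, and the original string-comparison code is in the linked RevFinder repository): it compares path components of a new pull request with previously reviewed paths using the four string comparison techniques, builds one candidate ranking per technique, and merges the rankings with a Borda count. Details such as the similarity normalization are simplified.

```python
# Illustrative RevFinder-style reviewer ranking (a sketch, not the authors' code).
from collections import defaultdict

def _components(path):
    return path.split("/")

def longest_common_prefix(a, b):
    a, b = _components(a), _components(b)
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def longest_common_suffix(a, b):
    return longest_common_prefix("/".join(reversed(_components(a))),
                                 "/".join(reversed(_components(b))))

def longest_common_substring(a, b):
    a, b = _components(a), _components(b)
    best = 0
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
                best = max(best, table[i][j])
    return best

def longest_common_subsequence(a, b):
    a, b = _components(a), _components(b)
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]

TECHNIQUES = [longest_common_prefix, longest_common_suffix,
              longest_common_substring, longest_common_subsequence]

def recommend(new_files, past_reviews, top_k=5):
    """past_reviews: list of (file_paths, reviewers) tuples from closed reviews."""
    per_technique_scores = [defaultdict(float) for _ in TECHNIQUES]
    for t, technique in enumerate(TECHNIQUES):
        for past_files, reviewers in past_reviews:
            pairs = [(nf, pf) for nf in new_files for pf in past_files]
            if not pairs:
                continue
            # Average path similarity between the new and the past review.
            score = sum(technique(nf, pf) for nf, pf in pairs) / len(pairs)
            for reviewer in reviewers:
                per_technique_scores[t][reviewer] += score
    # Borda count: merge the four per-technique rankings into one.
    borda = defaultdict(int)
    for scores in per_technique_scores:
        ranking = sorted(scores, key=scores.get, reverse=True)
        for position, reviewer in enumerate(ranking):
            borda[reviewer] += len(ranking) - position
    return sorted(borda, key=borda.get, reverse=True)[:top_k]
```

A call such as recommend(["src/ui/button.js"], past_reviews) returns the top candidates; the published RevFinder additionally normalizes each similarity by the number of path components, which is omitted here for brevity.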

TABLE III. MINED GITHUB AND GERRIT REPOSITORIES

Repository  # Projects  Pull requests (avg / max / min / tot)  Reviewers (avg / max / min / tot)  Owners (avg / max / min / tot)
GitHub      37          4,323 / 29,807 / 431 / 159,953         583 / 1,949 / 85 / 21,599          1,061 / 2,970 / 313 / 39,267
Gerrit      14          9,530 / 31,582 / 344 / 133,424         204 / 651 / 30 / 4,558             325 / 766 / 27 / 4,558
Overall     51          5,752 / 31,582 / 344 / 293,377         479 / 1,949 / 30 / 24,461          859 / 1,061 / 27 / 43,825
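The GitHub part of the collection summarized in Table III and Fig. 2, and described in the next paragraphs, essentially paginates over closed pull requests and records the users who commented on them. The following is a hedged sketch of such a crawler: it uses public GitHub REST API endpoints, but the helper names, the page limit and the error handling are ours, and the fields actually stored by the authors are those listed below in Section III-B.

```python
# Simplified pull-request mining sketch (illustrative, not the authors' Java miner).
import requests

API = "https://api.github.com"

def mine_closed_pull_requests(owner, repo, token, max_pages=10):
    headers = {"Authorization": f"token {token}",
               "Accept": "application/vnd.github+json"}
    mined = []
    for page in range(1, max_pages + 1):
        resp = requests.get(f"{API}/repos/{owner}/{repo}/pulls",
                            params={"state": "closed", "per_page": 100, "page": page},
                            headers=headers)
        resp.raise_for_status()
        pulls = resp.json()
        if not pulls:
            break
        for pr in pulls:
            number = pr["number"]
            pr_owner = pr["user"]["login"]
            # Users who commented on the pull request are treated as reviewers;
            # accounts containing "bot" are skipped, as in the study.
            comments = requests.get(
                f"{API}/repos/{owner}/{repo}/issues/{number}/comments",
                headers=headers).json()
            reviewers = {c["user"]["login"] for c in comments
                         if c["user"]["login"] != pr_owner
                         and "bot" not in c["user"]["login"].lower()}
            files = requests.get(
                f"{API}/repos/{owner}/{repo}/pulls/{number}/files",
                headers=headers).json()
            # Pull requests without reviewers or files are dropped.
            if reviewers and files:
                mined.append({"change_number": number,
                              "owner": pr_owner,
                              "reviewers": sorted(reviewers),
                              "files": [f["filename"] for f in files]})
    return mined
```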

The REST API provided by the Gerrit system was used to collect data from 14 projects involving 133,424 pull requests (Fig. 2, data mining process). Although the API slightly differs between different versions of Gerrit, it was always possible to mine the data necessary for our experiments. For each project, we extracted the data from all pull requests with the status 'merged', where all the users who marked the pull request with a positive Code-Review label were considered as reviewers of the given pull request.

Fig. 2. Data mining process.

The GitHub REST API was used to mine data from 37 projects having 159,953 pull requests in total (Fig. 2, data mining process). We only processed pull requests in a closed state, with those users considered as reviewers who created at least one Issue Comment or Pull Request Review Comment for the given pull request.

The final dataset contains 293,377 pull requests stored in the form of JSON files. Each JSON file contains the list of pull requests from one project with the following features: sub-project (always an empty string for GitHub projects), change id, change number, timestamp, owner of the pull request, list of reviewers and list of modified files. The owner of a review is the developer that submitted the code review request, while reviewers represent developers that participated in the review process. If the owner of the pull request also appeared as a reviewer of the pull request, he/she was removed from the list of reviewers, because such cases do not need any recommendation. Pull requests with an empty list of reviewers or an empty list of files were removed from the dataset as well.

Evaluation Metrics. Top-k Accuracy, Mean Reciprocal Rank (MRR), Precision and Recall (see also back to Table II) are used for the evaluation of source code reviewers recommendations. However, we do not use Precision and Recall (and the combined F-Measure), as they evaluate the intersection of the top-k recommended code reviewers with all actual code reviewers, so they might not be appropriate for datasets containing reviews where the size of the set of actual code reviewers is smaller than the k value [12].

1) Top-k Accuracy: The result of top-k accuracy is a percentage describing how many code reviewers were correctly recommended within the top k reviewers of the result list.

Top-k accuracy = \frac{\sum_{r \in R} isCorrect(r, k)}{|R|}   (1)

where: R is a set of reviews, isCorrect(r, k) is a function returning 1 if at least one code reviewer recommended within the top k reviewers approved the review r; otherwise, 0 is returned.

2) Mean Reciprocal Rank: Mean Reciprocal Rank (MRR) is a value describing the average position of the actual code reviewer in the list returned by the recommendation algorithm.

MRR = \frac{1}{|R|} \sum_{r \in R} \frac{1}{rank(r, recommend(r))}   (2)

where: R is a set of reviews, recommend(r) is a function returning a sorted list of code reviewers recommended for review r, rank(r, l) is a function returning the rank of the code reviewer who approved review r in a sorted list l. The value of 1/rank(r, l) will be 0 if there is no such code reviewer in the list l.

C. Data Analysis

Assignment of code reviewers to pull requests is time dependent. Therefore, we had to use a setup which ensures that only data from past pull requests will be used to recommend code reviewers for future pull requests. For testing the Naïve Bayes-based approach we chose 11-fold validation inspired by Bettenburg et al. [26] and Jeong et al. [18], dividing the training set into 11 equally sized folds. The experiments are run in 10 iterations using different folds as training and test sets (Fig. 3). The results of all iterations are then averaged.

Fig. 3. Eleven folds experimental set-up.

The RevFinder algorithm always considers all pull requests from the past for recommendation, without the need to build a model from training folds. As we wanted to evaluate the accuracy of both algorithms for the same set of pull requests, we only executed the RevFinder recommendation for pull requests in folds 2-11.

To compare differences between groups to answer the research questions, we used different statistical tests. For RQ1 and RQ3 we used the Wilcoxon Signed-Rank Test [27], [28], a paired difference test to evaluate the mean ranks differences, which we applied to the MRR metric. For RQ2, we used the Mann-Whitney U Test (also Wilcoxon Rank-Sum Test, or just Wilcoxon-Mann-Whitney Test) [28], [29] for MRR, due to the fact that our samples for this RQ are not paired, coming from different repositories: Gerrit (14), GitHub (37). We use the more conservative assumption of two-tailed tests, as we do not have any a priori hypothesis that one of the compared approaches is better; furthermore, all tests are non-parametric, not assuming normal data distribution [30]. While the p-value can show us if the effect is statistically significant, we also report the effect size, which gives an indication of the size of the observed effect, independently from the sample size [31], [32]. For Wilcoxon-Mann-Whitney tests, we calculate the effect size as r = Z/√N, where N = #cases * 2, to take into account non-independent paired samples [33], using Cohen's definition to discriminate between small (0.0-0.3), medium (0.3-0.6), and large effects (> 0.6) [31].

Replicability. Throughout the study, we followed the concept of reproducible research [34], [35]. We implemented a Spring Boot application. We used the MySql relational database as data storage and the Hibernate framework for object-relational mapping. We used the Jayes library for all the computations involving Naïve Bayes. The dataset is available on Figshare [24], the source code is available at https://github.com/XLipcak/rev-rec.

IV. EMPIRICAL RESULTS

In this section, we discuss the study results based on the three central research questions that were set. Information about all projects considered is available in the appendix Table V.

A. RQ1: What is the comparison between RevFinder and NB algorithms in terms of performance on both Gerrit and Github Repositories?

In this research question, we investigate RevFinder and NB on all 51 projects, based on top-k and MRR metrics.

In terms of average MRR from both the algorithms, results are similar (MRR_RF=60.6, MRR_NB=59.1). However, running a paired comparison with a Wilcoxon Signed-Rank Test, a paired difference test to evaluate the mean ranks differences for MRR, showed significant results (p-value 0.0466; the results are significant at p ≤ 0.05, two-tailed, with small effect size (r = 0.19)).

Fig. 4. RQ1. Delta MRR between RevFinder and NB. Each bar represents one project (positive = gain of RevFinder, negative = gain by NB).

Such differences can be seen also when plotting the delta MRR (RevFinder-NB) for each of the 51 projects (Fig. 4). We observed that for the projects Eclipse (+31.1), Lineageos (+28.8) and a third project (+21.8), the NB approach is consistently better, while RevFinder is, in general, better in the prediction results. We did not find a clear explanation for the difference in the results for these three projects: one possibility is that considering owners for these projects is more relevant than file paths (used in RevFinder) for the choice of the final reviewer. Another explanation is that, as we will discuss in the limitations, all these three projects are from Gerrit, and we could not remove bots in such repository as there is no automated way for identification. So in all these projects, the presence of bots might have made the task of NB easier. Another explanation could be issues in the mining process: Qt was for example removed from the dataset provided in Yang et al. [36] due to issues in the official Gerrit API.

Interpretation of the results. While the two distributions show similar results, looking at paired differences between predictions on the same projects showed significant results. While generally RevFinder behaves better, three projects showed substantial differences in results in favor of NB, which raises concerns about the selection of projects in smaller scale comparisons. Effect size is small, meaning that differences between RevFinder and NB are small and significance can be due to the sample size used in this study.

Findings. Within the 51 projects (Gerrit+GitHub), generally RevFinder is better, but effect size is small and differences might be trivial. A simple algorithm based on file paths (RevFinder), without building a model, reaches 60% MRR.

B. RQ2: What is the impact of using data from GitHub or Gerrit for source code reviewers recommendation?

In this research question, we look at the impact of different repositories and their characteristics in source code recommendation results.

We run the Wilcoxon U-Test for the MRR, top-1, and top-3 measures comparing GitHub and Gerrit repositories (Table IV). With six comparisons, we applied the Bonferroni correction (0.05/6), which requires p ≤ 0.0083 to be significant. Results were significant for RevFinder for two out of three measures, while they were not significant for NB. In all cases, we can report a small effect size.
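As an illustration of the statistical protocol described in Section III-C and applied here (paired Wilcoxon signed-rank test on per-project MRR values for RQ1 and RQ3, unpaired Mann-Whitney U test for the GitHub-vs-Gerrit comparison, Bonferroni-corrected thresholds), a possible sketch with SciPy follows; the function names and example parameters are ours, and obtaining the Z statistic for the effect size depends on the normal approximation used.

```python
# Sketch of the statistical comparisons (assuming per-project MRR lists as input).
import math
from scipy.stats import wilcoxon, mannwhitneyu

def paired_comparison(mrr_a, mrr_b):
    """Wilcoxon signed-rank test on paired per-project MRR values (RQ1, RQ3)."""
    stat, p = wilcoxon(mrr_a, mrr_b)
    return stat, p

def unpaired_comparison(mrr_github, mrr_gerrit):
    """Mann-Whitney U test for the unpaired GitHub vs Gerrit comparison (RQ2)."""
    stat, p = mannwhitneyu(mrr_github, mrr_gerrit, alternative="two-sided")
    return stat, p

def bonferroni_threshold(alpha=0.05, comparisons=6):
    # Six comparisons in Table IV: p must fall below 0.05/6 ~= 0.0083.
    return alpha / comparisons

def effect_size_r(z, n_cases):
    # r = Z / sqrt(N), with N = #cases * 2 (Cohen: <0.3 small, <0.6 medium, >0.6 large).
    return z / math.sqrt(n_cases * 2)
```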

TABLE IV. WILCOXON U-TEST PERFORMANCE ON GITHUB VS GERRIT. ** = SIGNIFICANT AFTER BONFERRONI CORRECTION

         RevFinder                   NB
MRR      p = 0.0071, r = 0.26 **     p = 0.5619, r = 0.05
Top-1    p = 0.0139, r = 0.24        p = 0.8103, r = 0.02
Top-3    p = 0.0080, r = 0.25 **     p = 0.3788, r = 0.08
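The MRR and top-1/top-3 values compared in Table IV (and reported throughout Section IV) follow the definitions in Eq. (1) and (2). A compact, illustrative implementation of the two metrics (ours, not taken from the replication package) is shown below; when several actual reviewers exist, the best-ranked one is used for the reciprocal rank.

```python
# Top-k accuracy and MRR as defined in Eq. (1) and (2).
# Each evaluated review is a pair (recommended, actual):
#  - recommended: list of reviewers sorted by decreasing score
#  - actual: set of reviewers who really reviewed/approved the pull request

def top_k_accuracy(results, k):
    hits = sum(1 for recommended, actual in results
               if set(recommended[:k]) & set(actual))
    return hits / len(results)

def mean_reciprocal_rank(results):
    total = 0.0
    for recommended, actual in results:
        ranks = [recommended.index(r) + 1 for r in actual if r in recommended]
        # Contributes 0 when no actual reviewer appears in the recommendation list.
        total += 1.0 / min(ranks) if ranks else 0.0
    return total / len(results)

# Example: one review whose correct reviewer is ranked second.
results = [(["alice", "bob", "carol"], {"bob"})]
assert top_k_accuracy(results, 1) == 0.0
assert top_k_accuracy(results, 3) == 1.0
assert mean_reciprocal_rank(results) == 0.5
```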

Fig. 6. RQ2. Comparison of Gerrit and Github datasets (log-scale): GE=Gerrit, GH=GitHub, RevsPR=Reviewers per Pull Request, FilesPR=Files per Pull Request, PRRevs=Pull Requests per Reviewer, PROwns=Pull Requests per Owner.

Fig. 5. RQ2. Comparison of Gerrit and Github datasets for % top-k and MRR: RF=RevFinder, NB=Naïve Bayes, GE=Gerrit, GH=GitHub.

If we look at the performance on the different repositories, we notice that results on GitHub are better than those on Gerrit (Fig. 5). Independently from the algorithm used, just analyzing a GitHub project can bring better results than in the case of Gerrit (MRR_mean = 53.87 vs 63.03, which is a quite notable difference). If we look at single algorithms, RevFinder reaches MRR_mean = 50.41 on Gerrit and 55 on GitHub, while NB has MRR_mean = 57.32 on Gerrit and 61.62 on GitHub.

Interpretation of the results. As seen, there are differences between the results obtained on Gerrit and GitHub repositories in terms of recommendation performance. Such variations are statistically significant in the case of RevFinder. Comparisons of different studies need to take this aspect into account, as the application on one repository can get some gains just for this reason. One interpretation we can give to the results (section IV-D inspects specific pull requests) is about the differences between the mined repositories. GitHub repositories have a higher number of reviewers per pull request and a lower number of files per pull request (Fig. 6). Aspects in which the difference is consistent are a much lower number of pull requests per reviewer and pull requests per owner. Our interpretation is that these factors can have an impact on the final recommendation results. Also in this case, effect size is small, showing that differences can be limited in the practical context.

Findings. Using different repositories for data analysis can have an impact on the recommendation results in case of the RevFinder algorithm. Running on GitHub can bring better results than on Gerrit projects. However, effect size is small. Our interpretation is that this difference is mainly due to the divergence of the two communities: more general GitHub, more specialized Gerrit.

C. RQ3: Is there any performance improvement when considering the additional sub-projects feature (Gerrit-specific)?

In this research question, we look at exploiting the sub-project information feature available in the Gerrit repository. We show MRR and top-k performance for RevFinder, RevFinder+, NB and NB+. The analysis is run on the 14 Gerrit projects (Fig. 7). Running a Wilcoxon Signed-Rank Test, a paired difference test to evaluate the mean ranks differences for MRR results, showed non-significant results for the RevFinder+ improvements (p-value 0.167; the results are not significant at p_bonf ≤ 0.025 (0.05/2), two-tailed, with small effect size (r = 0.26)). The same test for the NB+ improvements was instead statistically significant (p-value 0.004; the results are significant at p_bonf ≤ 0.025 (0.05/2), two-tailed, with medium effect size (r = 0.53)).

Fig. 7. RQ3. Using Gerrit additional features for recommendation (RevFinder vs RevFinder+, NB vs NB+).

Interpretation of the results. There is only one project in which NB+ worsens the results slightly compared to NB: in all other projects, considering the sub-projects information with NB improves the outcomes (Fig. 8). The main suggestion that derives from this analysis is that large distributed projects hosted on repositories that support sub-project information can benefit from recommendation automation by using this as a feature. As source code reviewers recommendation algorithms are more useful in case of large distributed system development, such information should be available for improvement of the recommendation results. The medium effect size shows that the differences can be quite noticeable.

Fig. 8. RQ3. Delta MRR between RevFinder+ vs RevFinder and NB+ vs NB.

Findings. Considering the 14 Gerrit projects, sub-project information brings significant improvements to the recommendation results for the NB-based approach, with medium effect size.

D. Evaluation by Inspection

To understand the differences between the review process in Gerrit and GitHub, we also manually checked several pull requests to get an overall sense of the whole process. The most significant differences can be seen in the number of people involved in the review and development process of projects hosted in these systems (Table III). Although the average size of our datasets mined from GitHub is lower than the size of the Gerrit datasets, the average number of reviews, reviewers and owners is much higher for projects mined from GitHub. There are usually fewer people involved in Gerrit projects, but these people tend to be more active. Whereas an average reviewer of our GitHub dataset reviewed only 7.36 pull requests and an average owner created 3.68 pull requests, 74.91 pull requests were reviewed and 41.1 pull requests were created by an average reviewer/owner in Gerrit.

The pull request number 2810 (https://github.com/vuejs/vue/pull/2810) shows a typical example of the GitHub review process. User tejitak found some issue in the vue project and fixed it. He created a new pull request with his solution for this project, which was then discussed by two other community members and was finally merged to the vuejs:next branch. It was his first and only pull request in the vue project and we were able to find many similar cases.

On the other hand, pull request number 71010 (https://go-review.googlesource.com/c/go/+/71010) shows an example of the typical review process in the Gerrit system. It was created for the project go by the user Rob Pike, who has many other contributions to this project. His pull request was reviewed by the user Emmanuel Odeke, who has also already created or reviewed many other pull requests in this project. Such cases are more common in projects hosted by Gerrit, where each project is hosted on a different server with a much smaller and more specialized community compared to GitHub.

E. Threats to Validity

External validity: Threats to external validity are related to the generalizability of our study [37], [38]. Our empirical evaluation was executed on 51 open source projects, which use both Gerrit and GitHub systems for code reviews. We cannot claim that the same results would be achieved for other projects. Smaller open source projects, commercial projects or projects using other systems for code reviews might show different results.

Internal validity: Threats to internal validity are related to experimental errors and biases [37], [38]. The main internal validity threats are about the implementation and the data collection process. We manually inspected results that looked like outliers to fix eventual issues. The dataset only contains pull requests with at least one reviewer. About bots, no steps were done to reduce them in Gerrit, as there is no automated way for identification. In GitHub, bots usually contain the string "bot" in their name, so those were automatically removed while data mining. The RevFinder implementation was based on the information available in the article and software repository of Thongtanunam et al. [12]. Still, we might have unintentionally introduced some slight variation. However, experimental results were consistent with previous research results.

A limitation of the GitHub REST API that should be taken into account is that responses include a maximum of 300 files per pull request. Pull requests with more than 300 files appear rarely and thus we consider this as a minor limitation. Furthermore, we considered as active only developers that had at least one review performed in the last 12 months.

Construct validity: Threats to construct validity relate to how much the measures used represent what researchers aim to investigate [38], [39]. Our threats are mainly related to the suitability of our evaluation metrics [37]. We used top-k accuracy and MRR for empirical evaluation, which are widely used metrics in the relevant literature and should not cause threats to construct validity [12], [14], [15]. In our experiments, we evaluate whether one of the top-k recommended reviewers actually evaluated the pull request. However, we do not know whether this reviewer was the best candidate for the specific task. Historical data might include wrong reviewer choices.

Source code reviewer recommendation algorithms are primarily expected to be used for large projects where code reviewer assignment problems are more likely to be present [15]; utilization of sub-project information is consistent with this view of larger distributed systems. In a real environment, we would also have to consider some other aspects as well. The most active code reviewers would probably be assigned to a considerable number of pull requests, which could lead to overburdening of certain reviewers. Therefore, workload balancing should also be taken into account [12].

V. RELATED WORKS

In this work, we focus on automation of source code recommendation algorithms ([12], [13], [15], [17]), with models often based on the knowledge gathered from previous empirical studies.

The scale of our review is larger than previous studies, in which the maximum number of pull requests considered was roughly an order of magnitude lower (293K in our case, versus 8K [21], 18K [15], 42K [12], 42K [22], 45K [23]). The only study comparable to ours is Yu et al. [16], in which 84 GitHub projects were analyzed, a subset of 650K pull-requests (919 projects, based on the GHTorrent project [40]).

Due to the scale of the dataset, a comparison with related works concerning MRR and top-k measures is appealing, to see how the results in previous papers generalize. In Thongtanunam et al. [12], RevFinder was able to reach a top-10 recommendation accuracy of 79% on four projects (Android, Qt, OpenStack, LibreOffice) mined from Gerrit and Rietveld. In our case, on a larger dataset, RevFinder had a top-10 accuracy of 87.39% (51 projects). However, by the findings of the current paper, this dataset also included GitHub projects, which leads to better recommendation results. If we look at RevFinder on just the 14 Gerrit projects, we get a top-10 accuracy of 79.40%, very similar to the one in the original paper, while RevFinder+ (considering sub-project information) reached 82.17%.

CoreDevRec [15], tested on five GitHub projects, provides a top-5 average accuracy of 87.82%; in our case RevFinder gets 83.08% on 37 GitHub projects. The CORRECT approach [14] is reported on an industrial project with top-5 accuracy higher than 90%. However, tests on six GitHub projects report an accuracy of 85.20%, not far from the 83.08% we report.

The cHRev approach, based on building an expertise model from historical past code changes [21], was tested on Android (MRR=0.65), Eclipse (MRR=0.63), Mylyn (MRR=0.72), and a private dataset (MRR=0.70). In our case, Android (MRR_NB+=0.68) was mined from a different period.

The RevRec hybrid approach was tested on the Android (MRR=0.69), OpenStack (MRR=0.63), and Qt (MRR=0.54) projects [23]. These projects were also included in our dataset, and while the authors report large improvements over RevFinder, in our case NB+ tested on the same projects yielded similar results: Android (MRR_NB+=0.68), OpenStack (MRR_NB+=0.57), and Qt (MRR_NB+=0.54).

It is challenging to compare the results with other studies. Jeong et al. [18] report 51%-80% top-5 accuracy on Firefox and Mozilla Core projects; however, these constitute a small sample. Yu et al. [16] use precision, recall and f-measure for different top-k levels, making the comparison difficult. Finally, the ReviewBot approach, based on the familiarity of developers with the lines of code they developed/reviewed [17], was applied experimentally to two proprietary projects (top-5=80.85%, top-5=92.31%), making direct comparison difficult. However, as reported in the RevFinder comparison, ReviewBot was found to provide sub-optimal performance [12] and is generally considered a more limited approach in terms of results [15].
VI. CONCLUSIONS

Software code reviews are a key part of modern development processes, increasing the overall software quality and reducing overall costs [5], [6]. The main issue in large distributed systems development is to find the appropriate code reviewers, as idle time and wrong assignments can be costly for projects [12]. In this paper, we proposed a large-scale study to examine the performance of different source code reviewer recommendation algorithms (RevFinder, Naïve Bayes-based) to identify the best code reviewers for opened pull requests. We mined and compared data from Github and Gerrit repositories, building a large dataset of 51 projects, with more than 293K pull requests analyzed, 180K owners and 157K reviewers.

Overall, we found that neither of the two algorithms implemented performs better in reviewers recommendation for all projects. However, a key finding is that the performance of the models is very different on the two types of repositories analyzed (Gerrit, GitHub). Evaluation by manual inspection of the pull requests showed that the difference might be due to the characteristics of the Gerrit and GitHub communities: while Gerrit is a more specialized community, GitHub is more based on reviewers posting one-time source code reviews. On Gerrit, each reviewer has on average a much higher number of pull requests he/she contributed to. Another finding is that exploiting additional sub-project details allowed to improve the recommendation results, bringing statistically significant results for the Naïve Bayes-based approach. However, such information is only available for Gerrit projects, making it difficult to have a generalized approach for all repository types.

ACKNOWLEDGMENT

The work was supported from European Regional Development Fund Project CERIT Scientific Cloud (No. CZ.02.1.01/0.0/0.0/16_013/0001802). Access to the CERIT-SC computing and storage facilities provided by the CERIT-SC Center, provided under the programme "Projects of Large Research, Development, and Innovations Infrastructures" (CERIT Scientific Cloud LM2015085), is greatly appreciated.

REFERENCES

[1] M. Fagan, "Design and code inspections to reduce errors in program development," in Software Pioneers. Springer, 2002, pp. 575–607.
[2] T. Baum, O. Liskin, K. Niklas, and K. Schneider, "A faceted classification scheme for change-based industrial code review processes," in Software Quality, Reliability and Security (QRS), 2016 IEEE International Conference on. IEEE, 2016, pp. 74–85.
[3] A. Bacchelli and C. Bird, "Expectations, outcomes, and challenges of modern code review," in Proceedings of the 2013 International Conference on Software Engineering. IEEE Press, 2013, pp. 712–721.
[4] P. C. Rigby and C. Bird, "Convergent contemporary software peer review practices," in Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering. ACM, 2013, pp. 202–212.
[5] C. F. Kemerer and M. C. Paulk, "The impact of design and code reviews on software quality: An empirical study based on PSP data," IEEE Transactions on Software Engineering, vol. 35, no. 4, pp. 534–550, 2009.
[6] T. Baum, O. Liskin, K. Niklas, and K. Schneider, "Factors influencing code review processes in industry," in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. ACM, 2016, pp. 85–96.
[7] P. C. Rigby and M.-A. Storey, "Understanding broadcast based peer review on open source software projects," in Proceedings of the 33rd International Conference on Software Engineering. ACM, 2011, pp. 541–550.
[8] K. E. Wiegers, Peer Reviews in Software: A Practical Guide. Addison-Wesley, Boston, 2002.
[9] B. Rossi, B. Russo, and G. Succi, "Analysis of open source software development iterations by means of burst detection techniques," in IFIP International Conference on Open Source Systems. Springer, 2009, pp. 83–93.
[10] ——, "Modelling failures occurrences of open source software with reliability growth," in IFIP International Conference on Open Source Systems. Springer, 2010, pp. 268–280.
[11] L. Harjumaa, I. Tervonen, and A. Huttunen, "Peer reviews in real life - motivators and demotivators," in Quality Software, 2005 (QSIC 2005), Fifth International Conference on. IEEE, 2005, pp. 29–36.
[12] P. Thongtanunam, C. Tantithamthavorn, R. G. Kula, N. Yoshida, H. Iida, and K.-i. Matsumoto, "Who should review my code? A file location-based code-reviewer recommendation approach for modern code review," in Software Analysis, Evolution and Reengineering (SANER), 2015 IEEE 22nd International Conference on. IEEE, 2015, pp. 141–150.
[13] T. Baum and K. Schneider, "On the need for a new generation of code review tools," in Product-Focused Software Process Improvement: 17th International Conference, PROFES 2016, Trondheim, Norway, November 22-24, 2016, Proceedings 17. Springer, 2016, pp. 301–308.
[14] M. M. Rahman, C. K. Roy, and J. A. Collins, "CORRECT: Code reviewer recommendation in GitHub based on cross-project and technology experience," in Software Engineering Companion (ICSE-C), IEEE/ACM International Conference on. IEEE, 2016, pp. 222–231.
[15] J. Jiang, J.-H. He, and X.-Y. Chen, "CoreDevRec: Automatic core member recommendation for contribution evaluation," Journal of Computer Science and Technology, vol. 30, no. 5, pp. 998–1016, 2015.
[16] Y. Yu, H. Wang, G. Yin, and T. Wang, "Reviewer recommendation for pull-requests in GitHub: What can we learn from code review and bug assignment?" Information and Software Technology, vol. 74, pp. 204–218, 2016.
[17] V. Balachandran, "Reducing human effort and improving quality in peer code reviews using automatic static analysis and reviewer recommendation," in Software Engineering (ICSE), 2013 35th International Conference on. IEEE, 2013, pp. 931–940.
[18] G. Jeong, S. Kim, T. Zimmermann, and K. Yi, "Improving code review by predicting reviewers and acceptance of patches," Research on Software Analysis for Error-free Computing Center Tech-Memo (ROSAEC MEMO 2009-006), pp. 1–18, 2009.
[19] V. Dedík and B. Rossi, "Automated bug triaging in an industrial context," in Software Engineering and Advanced Applications (SEAA), 2016 42nd Euromicro Conference on. IEEE, 2016, pp. 363–367.
[20] N. K. S. Roy and B. Rossi, "Towards an improvement of bug severity classification," in Software Engineering and Advanced Applications (SEAA), 2014 40th Euromicro Conference on. IEEE, 2014, pp. 269–276.
[21] M. B. Zanjani, H. Kagdi, and C. Bird, "Automatically recommending peer reviewers in modern code review," IEEE Transactions on Software Engineering, vol. 42, no. 6, pp. 530–543, 2016.
[22] X. Xia, D. Lo, X. Wang, and X. Yang, "Who should review this change?: Putting text and file location analyses together for more accurate recommendations," in Software Maintenance and Evolution (ICSME), 2015 IEEE International Conference on. IEEE, 2015, pp. 261–270.
[23] A. Ouni, R. G. Kula, and K. Inoue, "Search-based peer reviewers recommendation in modern code review," in Software Maintenance and Evolution (ICSME), 2016 IEEE International Conference on. IEEE, 2016, pp. 367–377.
[24] J. Lipčák and B. Rossi, "Rev-rec Source Code Reviews Dataset," 2018. [Online]. Available: https://doi.org/10.6084/m9.figshare.6462380.v1
[25] D. Hiemstra, "Probability smoothing," in Encyclopedia of Database Systems. Springer, 2009, pp. 2169–2170.
[26] N. Bettenburg, R. Premraj, T. Zimmermann, and S. Kim, "Duplicate bug reports considered harmful really?" in Software Maintenance, 2008 (ICSM 2008), IEEE International Conference on. IEEE, 2008, pp. 337–345.
[27] D. Rey and M. Neuhäuser, "Wilcoxon-signed-rank test," in International Encyclopedia of Statistical Science. Springer, 2011, pp. 1658–1659.
[28] M. Kraska-Miller, Nonparametric Statistics for Social and Behavioral Sciences. CRC Press, 2013.
[29] T. W. MacFarland and J. M. Yates, "Mann–Whitney U test," in Introduction to Nonparametric Statistics for the Biological Sciences Using R. Springer, 2016, pp. 103–132.
[30] B. Kitchenham, L. Madeyski, D. Budgen, J. Keung, P. Brereton, S. Charters, S. Gibbs, and A. Pohthong, "Robust statistical methods for empirical software engineering," Empirical Software Engineering, vol. 22, no. 2, pp. 579–630, 2017.
[31] J. Cohen, "A power primer," Psychological Bulletin, vol. 112, no. 1, p. 155, 1992.
[32] G. M. Sullivan and R. Feinn, "Using effect size - or why the p value is not enough," Journal of Graduate Medical Education, vol. 4, no. 3, pp. 279–282, 2012.
[33] J. Pallant, SPSS Survival Manual. McGraw-Hill Education (UK), 2013.
[34] L. Madeyski and B. Kitchenham, "Reproducible research - what, why and how," Wroclaw University of Technology, PRE W, vol. 8, 2015.
[35] ——, "Would wider adoption of reproducible research be beneficial for empirical software engineering research?" Journal of Intelligent & Fuzzy Systems, vol. 32, no. 2, pp. 1509–1521, 2017.
[36] X. Yang, R. G. Kula, N. Yoshida, and H. Iida, "Mining the modern code review repositories: A dataset of people, process and product," in Proceedings of the 13th International Conference on Mining Software Repositories, 2016, pp. 460–463.
[37] Y. Tian, D. Lo, and J. Lawall, "Automated construction of a software-specific word similarity database," in Software Maintenance, Reengineering and Reverse Engineering (CSMR-WCRE), 2014 Software Evolution Week - IEEE Conference on. IEEE, 2014, pp. 44–53.
[38] P. Runeson and M. Höst, "Guidelines for conducting and reporting case study research in software engineering," Empirical Software Engineering, vol. 14, no. 2, p. 131, 2009.
[39] C. Wohlin, P. Runeson, M. Höst, M. C. Ohlsson, B. Regnell, and A. Wesslén, Experimentation in Software Engineering. Springer Science & Business Media, 2012.
[40] G. Gousios, B. Vasilescu, A. Serebrenik, and A. Zaidman, "Lean GHTorrent: GitHub data on demand," in Proceedings of the 11th Working Conference on Mining Software Repositories. ACM, 2014, pp. 384–387.

TABLE V. APPENDIX: INFORMATION ABOUT MINED PROJECTS: GITHUB (#1-#37), GERRIT (#38-#51)

#   Project              From        To          Pull Reqs  Reviewers  Owners  Subprj  Files
1   -                    2014-09-26  2017-12-08  5500       682        727     1       14,168
2   angular.js           2010-09-08  2017-11-02  6078       1283       2581    1       3,004
3   atom                 2012-03-06  2017-12-07  2244       658        566     1       3,702
4   bitcoin              2011-02-21  2017-12-15  7347       674        808     1       3,629
5   bootstrap            2011-08-19  2017-12-08  6317       1949       2970    1       4,138
6   brackets             2011-12-14  2017-12-07  4264       260        507     1       4,916
7   cgm-remote-monitor   2014-06-07  2017-12-03  1442       88         531     1       491
8   django               2012-04-28  2017-12-08  6912       723        2248    1       5,815
9   homebrew-core        2016-05-13  2017-12-10  10013      937        2364    1       4,550
10  ionic                2013-11-16  2017-11-29  1248       417        711     1       1,985
11  jekyll               2010-09-01  2017-12-08  2498       600        904     1       1,397
12  jquery               2010-09-05  2017-10-27  1760       215        746     1       557
13  -                    2014-06-07  2017-12-10  29807      1533       1980    1       47,490
14  laravel              2011-06-10  2017-12-06  1827       762        1014    1       1,475
15  material-ui          2014-10-18  2017-12-08  3196       530        1122    1       6,625
16  meteor               2012-04-11  2017-12-05  1848       572        650     1       3,692
17  moby                 2016-01-29  2017-12-12  7026       736        984     1       10,156
18  moment               2011-06-16  2017-11-28  1182       358        759     1       1,061
19  oh-my-zsh            2010-08-31  2017-12-06  2224       1234       1638    1       1,321
20  opencv               2012-07-26  2017-12-13  6309       386        1136    1       9,887
21  react                2013-05-29  2017-11-11  5048       945        1735    1       5,572
22  react-native         2015-01-30  2017-11-09  5014       1261       2043    1       5,341
23  redis                2010-10-03  2017-12-13  840        222        423     1       1,535
24  redux                2015-06-02  2017-12-05  1109       317        737     1       1,245
25  requests             2011-02-19  2017-11-28  1542       272        763     1       617
26  revealjs             2011-12-21  2017-12-02  431        103        338     1       802
27  rxjava               2013-01-23  2017-12-06  2353       148        313     1       5,853
28  scikit-learn         2010-09-01  2017-12-15  4809       427        1294    1       2,495
29  spark                2016-04-20  2017-12-13  7007       552        869     1       6,837
30  spring-boot          2013-05-22  2017-12-11  1676       212        634     1       5,068
31  spring-framework     2011-02-09  2017-12-09  1056       85         502     1       5,364
32  swift                2015-11-03  2017-12-09  7255       527        581     1       11,351
33  tensorflow           2015-11-09  2017-12-09  4980       590        1608    1       13,029
34  threejs              2010-09-03  2017-12-07  4802       422        1212    1       5,122
35  vue                  2014-02-04  2017-12-06  658        156        325     1       679
36  webpack              2012-05-01  2017-12-08  1178       295        528     1       2,288
37  yarn                 2016-02-04  2017-12-08  1153       468        416     1       4,667
38  android              2008-10-24  2012-01-26  5029       93         346     111     26,768
39  -                    2011-05-05  2017-11-04  3225       631        766     72      23,429
40  eclipse              2012-02-10  2017-10-29  9159       315        436     222     108,366
41  -                    2017-01-10  2017-11-02  344        30         27      4       1,774
42  gerrit               2012-10-25  2017-12-06  3156       90         89      75      6,178
43  go                   2015-05-15  2017-10-28  4895       131        518     29      7,113
44  gwt                  2012-10-13  2017-10-27  2870       47         153     8       13,415
45  kitware              2010-08-25  2017-11-03  11493      210        286     23      49,467
46  libreoffice          2013-05-05  2017-12-06  5870       90         241     21      19,156
47  -                    2015-05-16  2017-11-04  10977      256        241     558     14,018
48  -                    2011-07-18  2012-05-30  6545       82         324     35      11,409
49  qt                   2011-05-17  2012-05-25  23665      200        444     57      77,767
50  -                    2010-04-07  2017-12-04  14614      36         65      2       58,690
51  -                    2011-02-23  2017-11-03  31582      651        622     177     61,118