Intelligent Code Reviews Using Deep Learning
Total Page:16
File Type:pdf, Size:1020Kb
Intelligent code reviews using deep learning Anshul Gupta Neel Sundaresan Microsoft Corp. Microsoft Corp. [email protected] [email protected] ABSTRACT Software and its Engineering → Software functional properties Peer code review is a best practice in Software Engineering where source code is reviewed manually by one or more KEYWORDS peers(reviewers) of the code author. It is widely acceptable Deep neural networks, automatic peer reviews, code analysis, both in industry and open-source software (OSS) systems as a natural language processing, data mining, recommendation process for early detection and reduction of software defects. A larger chunk of reviews given during peer reviews are related to common issues such as coding style, 1 INTRODUCTION documentations, and best practices. This makes the code review process less effective as reviewers focus less on finding Peer code review [1, 2] is a critical part of software important defects. Hence, there is a need to automatically find development lifecycle where source code is reviewed such common issues and help reviewers perform focused code manually by one or more peers of the code author. The source reviews. Some of this is solved by rule based systems called code is inspected to discover errors, to ensure that the source linters but they are rigid and needs a lot of manual effort to code complies with best practice standards, and to discover adapt them for a new issue. vulnerabilities such as race conditions, malware, memory In this work, we present an automatic, flexible, and leaks, buffer overflows, and format string exploits. Code adaptive code analysis system called DeepCodeReviewer review is used to find these problems which may have been (DCR). DCR learns how to recommend code reviews related overlooked by a developer before the software is released. to common issues using historical peer reviews and deep Parts of this process are being automated by systems 1 learning. DCR uses deep learning to learn review relevance to called code linters/static analysis tools . Linters are made up a code snippet and recommend the right review from a of rules related to coding best practices. They get triggered repository of common reviews. DCR is trained on histroical whenever these best practices are violated, mostly during the peer reviews available from internal code repositories at build time in IDEs. The rules used by linters work well for only Microsoft. Experiments demonstrate strong performance of a few variations of a violation [3] and needs a lot of manual developed deep learning model in classifying relevant and customization to adapt them for specific projects, teams, or , non-relevant reviews w.r.t to a code snippet, and ranking industry. One needs to maintain right set of rules that suits reviews given a code snippet. We have also evaluated DCR one’s projects and add new analyse rules for modifications in recommentations using a user study and survey. The results best practices. This rigidness of static analysis tools makes of our user study show good acceptance rate and answers of them hard to use [4] and unscalable for large and diverse code our survey questions are strongly correlated with our repositories present in open source software (OSS), and system’s goal of making code reviews focused on finding industry. defects. Even in presence of these systems, peer or code reviews result in common issues related to documentation, style, and, CCS CONCEPTS structure. Mäntylä and Lassenius [5] showed that 75 % of code review comments are related to such common issues. • Machine learning → Supervised learning; Neural networks • Bacchelli and Bird [6] performed an empirical analysis to see the motivation behind the code review performed by Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are reviewers. They found that even with strong motivation to not made or distributed for profit or commercial advantage and that copies finding defect (bugs or functional issues in code), reviewers bear this notice and the full citation on the first page. Copyrights for third- party components of this work must be honored. For all other uses, contact tend to focus on reviews which are mostly related to obvious the owner/author(s). bugs, or best practices. This makes code reviews less effective KDD’18 Deep Learning Day, August 2018, London, UK © 2018 Copyright held by the owner/author(s). ACM ISBN 978-x-xxxx-xxxx-x/YY/MM. 1 https://en.wikipedia.org/wiki/Static_program_analysis https://doi.org/10.1145/nnnnnnn.nnnnnnn KDD’18 Deep Learning Day, August 2018, London, UK A. Gupta et al. because reviewers are distracted by common issues, and they evaluate overall performance of DCR. Finally, we will focus less on important aspects such as defects, architectural conclude with discussions and future direction for this work. issues, and testing for code. To address challenges mentioned above, we developed an adaptive, scalable, and easy to configure automatic code 2 BACKGROUND AND RELATED WORK analysis system called DeepCodeReviewer (DCR). The system can automatically review code for common issues so that a 2.1 Background on peer reviews developer can proactively correct the code, and reviewers can perform code reviews focused towards finding defects. DCR Peer or code review is considered a highly recommended makes use of deep neural networks to automatically learn practice in software engineering [8]. It is widely acceptable code analyses corresponding to the human reviews given to both in industry and open-source software (OSS) systems as a code snippets in historical peer reviews. It uses Long Short process for early detection and reduction of software defects. Term Memory (LSTM) networks [7] to understand source Fagan [9] formalized a highly structured process for code code snippets and reviews, and learns to apply relevant reviewing based on line-by-line group reviews, done in review to a code snippet. This makes our system: 1. Flexible, extended meetings for code inspection. This was effective but as its code analyses can generalize to new variations of an was time consuming and a heavyweight process. In recent issue; 2. Easy to configure, as it learns to prioritize its analyses years organizations and OSS use a lightweight code review from a project or organization’s historical code reviews; and 3. process with less formal requirements along with supporting Adaptive, as it just needs historical code reviews for a specific tools to perform code review. Some examples of tools used in coding language. industry are Google’s Mondrian [10], Facebook’s 3 Phabricator[11] and the open-source Gerrit initiative. Git (code,review) pairs Our code review system is based on Pull requests(PRs) and Repositories are very similar to Github’s PR review system4. A PR asks Crawl Historical Code and review Training data peer preprocessing generation collaborators5 of a project to review or vote code changes reviews associated with a bug fix or a new feature request in a Review clustering Relevant, software product. Collaborators then review the submitted Multi-Encoder deep Model Non- learning model Training relevant code change to make sure that it is free of defects. If they find Vectorized pairs (code,review) Training phase (code,review) pairs any, they leave a comment explaining the defect that they have Production phase found. The creator of the PR then adds follow-up code changes Repository of New common reviews pullrequest related to the review comment. Once all the collaborators and Review with Code Review candidate (code, candidate Multi-Encoder deep maximum the creator are happy with the changes, the PR is then merged selection review) pairs learning model model to the master code repository or mainline. To train DCR, we confidence used historical PRs from our internal code repository and Figure 1 Overall architecture of DeepCodeReviewer with management system. two main phases: training phase processes historical peer reviews and trains a deep learning code review model; production phase makes use of trained model to perform 2.2 Related work code analysis and apply relevant review. Most of the work related to automating the process of code review is centered around static analysis of code[12]. Static Figure 1 describes DCR in action. For a new code snippet, analysis tools (commonly called linters) aims to examine the DCR finds most relevant review/s from a repository of text of a program statically, without attempting to execute it. common reviews generated using historical peer reviews. For They commonly work on source code representations i.e. scalability, it first finds a set of candidate reviews from abstract syntax tree, or some form of object code. These repository and then it makes use of a Deep learning model to analyses are shown in IDEs and usually consist of suggestions find most relevant review/s for a code snippet. For this work, related to design, security, performance improvement, style, we trained DCR for peer reviews in C# language projects. We and code readability. They make use of rules and allow end collected historical data from internal corporate git user to add any specific rules related to his/her project or repositories. In the following section, we present related team guidelines. For C#, Microsoft .NET Framework has two work for automatic code reviews, and background of code static analysis tools: 1. FxCop6; 2. StyleCop7. reviews using Git PRs2. Then we will explain in detail all the components of DCR. We will then discuss the evaluation of our 3 Gerrit: http://en.wikipedia.org/wiki/Gerrit_(software) proposed deep learning model, and results from user study to 4 https://github.com 5 Collaborators are developers working on a project. One or more collaborators are usually added to a PR as reviewers.