Large-Scale Analysis of Modern Code Review Practices and Software Security in Open Source Software
Large-Scale Analysis of Modern Code Review Practices and Software Security in Open Source Software

Christopher Thompson
Electrical Engineering and Computer Sciences
University of California at Berkeley

Technical Report No. UCB/EECS-2017-217
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-217.html

December 14, 2017

Copyright © 2017, by the author(s). All rights reserved.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.

Acknowledgement

I want to thank the people who have supported and guided me along my path to eventually finishing this dissertation. My parents, who encouraged me to explore my various wild fascinations while I was growing up, and who supported me through my undergraduate studies. Nick Hopper, who mentored me throughout my undergraduate studies, and who gave me a chance to learn my love of computer security and research. David Wagner, my doctoral advisor, who has been a constant sounding board for my ideas (including what led to this dissertation), for always having something interesting to teach me, and for always wanting to learn from me as I explored new ideas. And finally, thank you to Hina, who has supported me through graduate school with love and lots of understanding.
Large-Scale Analysis of Modern Code Review Practices and Software Security in Open Source Software

by Christopher Thompson

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate Division of the University of California, Berkeley

Committee in charge:
Professor David Wagner, Chair
Professor Vern Paxson
Associate Professor Coye Cheshire

Fall 2017

Large-Scale Analysis of Modern Code Review Practices and Software Security in Open Source Software
Copyright 2017 by Christopher Thompson

Abstract

Large-Scale Analysis of Modern Code Review Practices and Software Security in Open Source Software

by Christopher Thompson
Doctor of Philosophy in Computer Science
University of California, Berkeley
Professor David Wagner, Chair

Modern code review is a lightweight and informal process for integrating changes into a software project, popularized by GitHub and pull requests. However, a rich empirical understanding of modern code review and its effects on software quality and security can help development teams make intelligent, informed decisions, weigh the costs and benefits of implementing code review for their projects, and gain insight into how to support and improve its use.

This dissertation presents the results of our analyses of the relationships between modern code review practice and software quality and security, across a large population of open source software projects. First, we describe our neural network-based quantification model, which allows us to efficiently estimate the number of security bugs reported to a software project. Our model builds on prior quantification-optimized models with a novel regularization technique we call random proportion batching.
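The core idea of random proportion batching can be sketched in a few lines: instead of letting each training minibatch inherit the (typically low) base rate of security-related issues, every batch is resampled to a class proportion drawn at random, so the quantifier cannot simply memorize the training prevalence. The sketch below is an illustrative reconstruction, not the dissertation's actual implementation; the function name, the uniform proportion distribution, and the toy feature data are all assumptions.

```python
import numpy as np

def random_proportion_batch(X_pos, X_neg, batch_size, rng):
    """Draw one minibatch whose positive-class proportion is sampled at
    random, rather than inherited from the training set's base rate.
    Illustrative sketch only; details may differ from the dissertation."""
    p = rng.uniform(0.0, 1.0)                 # target positive proportion
    n_pos = int(round(p * batch_size))
    n_neg = batch_size - n_pos
    # Sample with replacement from each class pool to hit the target mix.
    pos = X_pos[rng.integers(0, len(X_pos), n_pos)]
    neg = X_neg[rng.integers(0, len(X_neg), n_neg)]
    X = np.concatenate([pos, neg])
    y = np.concatenate([np.ones(n_pos), np.zeros(n_neg)])
    idx = rng.permutation(batch_size)         # shuffle within the batch
    return X[idx], y[idx]

# Toy data standing in for issue-report feature vectors (hypothetical).
rng = np.random.default_rng(0)
X_pos = rng.normal(1.0, 1.0, size=(500, 8))   # "security"-tagged issues
X_neg = rng.normal(0.0, 1.0, size=(2000, 8))  # all other issues
Xb, yb = random_proportion_batch(X_pos, X_neg, 64, rng)
```

Averaged over many batches the positive proportion is centered near 0.5 rather than the pool's 20%, which is the regularizing effect the abstract describes: the network must learn to track the proportion in its input rather than a fixed prior.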
We use our quantification model to perform association analysis of very large samples of code review data, confirming and generalizing prior work on the relationship between code review and software security and quality. We then leverage timeseries changepoint detection techniques to mine for repositories that have implemented code review in the middle of their development. We use this dataset to explore the causal treatment effect of implementing code review on software quality and security. We find that implementing code review may significantly reduce security issues for projects that are already prone to them, but may significantly increase overall issues filed against projects. Finally, we expand our changepoint detection to find and analyze the effect of using automated code review services, finding that their use may significantly decrease issues reported to a project. These findings give evidence for modern code review being an effective tool for improving software quality and security. They also suggest that the development of better tools supporting code review, particularly for software security, could magnify this benefit while decreasing the cost of integrating code review into a team's development process.

Contents

Contents
List of Figures
List of Tables
1 Introduction
2 Background
2.1 Related Research
3 Quantification of Security Issues
3.1 Basic Quantification Techniques
3.2 Quantification Error Optimization
3.3 NNQuant and Randomized Proportion Batching
3.4 Feature Extraction
3.5 Methodology
3.6 Evaluation
4 Association Analysis
4.1 Data Processing
4.2 Regression Design
4.3 Results
4.4 Threats to Validity
4.5 Discussion
5 Causal Analysis
5.1 Changepoint Detection
5.2 Data Collection
5.3 Experimental Design
5.4 Results
5.5 Threats to Validity
5.6 Discussion
Conclusions
Bibliography
A Full regression models

List of Figures

2.1 Example of a pull request on GitHub.
3.1 A reduced-size version of our neural network quantification architecture.
3.2 Quantifier prediction error plots on reserved test set.
3.3 Quantifier KLD plots on reserved test set.
3.4 Quantifier prediction error plots on reserved test set for low proportions.
4.1 Relationships between unreviewed pull requests and overall issues and security issues.
4.2 Relationship between participation factors and overall issues.
5.1 An example of a changepoint in a timeseries.
5.2 Distributions of review coverage in repositories with and without changepoints.
5.3 Scatterplots of explanatory variables and post-test security issues.
5.4 Scatterplots of explanatory variables and post-test overall issues.
5.5 Effect of code review on overall issues.
5.6 Effect of code review on number of security issues.
5.7 Effect of code review on overall issues and security issues with a single combined treatment group.
5.8 Maturation of covariates across groups.

List of Tables

3.1 Top repositories in our set of “security”-tagged issues.
4.1 Summary of our sample of GitHub repositories.
4.2 Top primary languages in our repository sample.
4.3 Control metrics used for regression analysis.
4.4 Code review coverage metrics used for regression analysis.
4.5 Code review participation metrics used for regression analysis.
4.6 Negative binomial association analysis model risk ratios.
5.1 Summary of our sample of GitHub repositories, by group.
5.2 Top primary languages in our repository sample.
5.3 Summary of our sample of GitHub repositories for automated code review usage, by group.
5.4 Models of treatment effect of code review.
5.5 Marginal effect of treatment on overall issues.
5.6 Marginal effect of treatment on security issues.
5.7 Treatment effect models for a single combined treatment group.
5.8 Treatment effect models for automated code review usage.
5.9 Distribution of days in the pre-test and post-test periods, by group.
5.10 Distribution of the error of our split-half-and-combine measurements compared to the full period estimates.
A.1 Full association analysis models.
A.2 Full treatment analysis models for three groups.
A.3 Full treatment analysis models for two groups.
A.4 Full treatment analysis models for automated code review usage.

Chapter 1

Introduction

Code inspections are costly and formal, and frequently not performed as often as they perhaps should be. In comparison, modern code review is lightweight, informal, and frequently