Software Maintenance at Commit-Time

Mathieu Louis Nayrolles

A Thesis

in

The Department

of

Electrical & Computer Engineering

Presented in Partial Fulfillment of the Requirements

For the Degree of

Doctor of Philosophy (Electrical and Computer Engineering) at

Concordia University

Montréal, Québec, Canada

June 2018

© Mathieu Louis Nayrolles, 2018

Concordia University
School of Graduate Studies

This is to certify that the thesis prepared

By: Mathieu Louis Nayrolles
Entitled: Software Maintenance At Commit-Time
and submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy (Electrical & Computer Engineering)

complies with the regulations of this University and meets the accepted standards with respect to originality and quality.

Signed by the final examining committee:
Dr. Mohammad Mannan, Chair
Dr. Bram Adams, External Examiner
Dr. Juergen Rilling, External to Program
Dr. Otmane Ait Mohamed, Examiner
Dr. Ferhat Khendek, Examiner
Dr. Abdelwahab Hamou-Lhadj, Thesis Supervisor

Approved by Dr. Mustafa K. Mehmet Ali, Graduate Program Director

26 June 2018

Date of Defense

Dr. Amir Asif, Dean, Faculty of Engineering and Computer Science

Abstract

Software Maintenance At Commit-Time

Mathieu Louis Nayrolles, Ph.D. Concordia University, 2018

Software maintenance activities such as debugging and feature enhancement are known to be challenging and costly, which explains an ever-growing line of research in software maintenance areas including mining software repositories, defect prevention, clone detection, and bug reproduction. The main goal is to improve the productivity of software developers as they undertake maintenance tasks. Existing tools, however, operate in an offline fashion, i.e., after the changes to the systems have been made. Studies have shown that software developers tend to be reluctant to use these tools as part of a continuous development process. This is because they require installation and training, hindering their integration with developers' workflow, which in turn limits their adoption. In this thesis, we propose novel approaches to support software developers at commit-time. As part of the developer's workflow, a commit marks the end of a given task. We show how commits can be used to catch unwanted modifications to the system, and prevent the introduction of clones and bugs, before these modifications reach the central code repository. We also propose a bug reproduction technique that is based on model checking and crash traces. Furthermore, we propose a new way of classifying bugs based on the location of fixes that can serve as the basis for future research in this field of study. The techniques proposed in this thesis have been tested on over 400 open and closed (industrial) systems, resulting in high levels of precision and recall. They are also scalable and non-intrusive.

Acknowledgments

My deepest gratitude goes to Dr. Abdelwahab Hamou-Lhadj for accepting to enrol me as a Ph.D. student, believing in my ideas, and helping me grow as a person and a researcher. Your guidance and availability never failed me during these challenging years. Thanks to Yves Jacquier, Olivier Pomarez, Julien Kauffmann and Nicolas Fleury from Ubisoft, who first saw value in my work and allowed me to study their systems. Also, I would like to thank all the employees at Ubisoft who supported me during this adventure, especially Florent Jousset, Luc Bouchard, and Chadi Lebbos. Thanks to the members of the Intelligent System-Logging and Monitoring research lab (formerly known as the Software Behaviour Analysis research lab) at ENCS Concordia University and the La Forge research lab at Ubisoft Montréal for the brainstorming sessions and their feedback on my work. I am also grateful to the Concordia School of Graduate Studies, the Fonds de recherche du Québec Nature et technologies (FRQNT), and the Natural Sciences and Engineering Research Council of Canada (NSERC) for their continuous financial support during my graduate studies. I would also like to thank Ms. Sheryl Tablan, the Graduate Program Coordinator - Ph.D. Program, who allowed me to navigate various administrative ambushes. Finally, Lakmé Gremillet, for her unwavering support that carried me through ups and downs, successes and failures. There is no doubt in my mind that none of this would have been possible without her.

Table of Contents

4.2.2 BUMPER Metadata ...... 32
4.2.3 Bumper Query Language and API ...... 34
4.2.4 Bumper Data Repository ...... 35
4.3 Experimental Setup ...... 36
4.4 Empirical Validation ...... 38
4.5 Threats to Validity ...... 39
4.6 Chapter Summary ...... 39

Chapter 5. Preventing Code Clone Insertion At Commit-Time 41
5.1 Introduction ...... 41
5.2 Approach ...... 42
5.2.1 Commit ...... 43
5.2.2 Pre-Commit Hook ...... 44
5.2.3 Extract and Save Blocks ...... 44
5.2.4 Compare Extracted Blocks ...... 48
5.2.5 Output and Decision ...... 49
5.3 Experimental Setup ...... 50
5.4 Empirical Validation ...... 53
5.5 Threats to Validity ...... 56
5.6 Chapter Summary ...... 57

Chapter 6. Preventing Bug Insertion Using Clone Detection At Commit-Time 59
6.1 Introduction ...... 59
6.2 Approach ...... 61
6.2.1 Clustering Project Repositories ...... 62
6.2.2 Building a Database of Code Blocks of Defect-Commits and Fix-Commits ...... 63
6.2.3 Analysing New Commits Using Pre-Commit Hooks ...... 64
6.3 Experimental Setup ...... 65
6.3.1 Project Repository Selection ...... 66
6.3.2 Project Dependency Analysis ...... 66
6.3.3 Building a Database of Defect-Commits and Fix-Commits for Performances Evaluation ...... 68

6.3.4 Process of Comparing New Commits ...... 69
6.3.5 Evaluation Measures ...... 70
6.4 Empirical Validation ...... 71
6.4.1 Baseline Classifier Comparison ...... 73
6.4.2 Performance of BIANCA ...... 74
6.4.3 Analysis of the Quality of the Fixes Proposed by BIANCA ...... 76
6.5 Threats to Validity ...... 83
6.6 Chapter Summary ...... 86

Chapter 7. Combining Code Metrics with Clone Detection for Just-In-Time Fault Prevention and Resolution 87
7.1 Introduction ...... 87
7.2 Approach ...... 89
7.2.1 Clustering Projects ...... 91
7.2.2 Building a Database of Code Blocks of Defect-Commits and Fix-Commits ...... 91
7.2.3 Building a Metric-Based Model ...... 93
7.2.4 Comparing Code Blocks ...... 94
7.2.5 Classifying Incoming Commits ...... 94
7.2.6 Proposing Fixes ...... 94
7.3 Experimental Setup ...... 95
7.3.1 Project Repository Selection ...... 95
7.3.2 Project Dependency Analysis ...... 96
7.3.3 Building a Database of Defect-Commits and Fix-Commits ...... 96
7.3.4 Process of Comparing New Commits ...... 97
7.4 Empirical Validation ...... 97
7.4.1 Performance of CLEVER ...... 97
7.4.2 Analysis of the Quality of the Fixes Proposed by CLEVER ...... 98
7.5 University-Industry Research Collaboration ...... 101
7.5.1 Deep understanding of the project requirements ...... 101
7.5.2 Understanding the benefits of the project to both parties ...... 101
7.5.3 Focusing in the Beginning on Low-Hanging Fruits ...... 102
7.5.4 Communicating effectively ...... 103
7.5.5 Managing change ...... 103

7.6 Threats to Validity ...... 103
7.7 Chapter Summary ...... 104

Chapter 8. Bug Reproduction Using Crash Traces and Directed Model Checking 106
8.1 Introduction ...... 106
8.2 Preliminaries ...... 108
8.3 Approach ...... 111
8.3.1 Collecting Crash Traces ...... 111
8.3.2 Preprocessing ...... 113
8.3.3 Building the Backward Static Slice ...... 114
8.3.4 Directed Model Checking ...... 119
8.3.5 Validation ...... 121
8.3.6 Generating Test Cases for Bug Reproduction ...... 122
8.4 Experimental Setup ...... 124
8.4.1 Targeted Systems ...... 125
8.4.2 Bug Selection and Crash Traces ...... 126
8.5 Empirical Validation ...... 128
8.5.1 Successfully Reproduced ...... 129
8.5.2 Partially Reproduced ...... 131
8.5.3 Not Reproduced ...... 134
8.6 Threats to Validity ...... 136
8.7 Chapter Summary ...... 137

Chapter 9. Towards a Classification of Bugs Based on the Location of the Corrections: An Empirical Study 138
9.1 Introduction ...... 138
9.2 Experimental Setup ...... 141
9.2.1 Context Selection ...... 141
9.2.2 Dataset Analysis ...... 143
9.3 Empirical Validation ...... 150
9.3.1 Are T4 bugs predictable at submission time? ...... 150
9.3.2 What are the best predictors of type 4 bugs? ...... 159
9.4 Threats to Validity ...... 165

9.5 Chapter Summary ...... 165

Chapter 10. Conclusion and Future Work 167
10.1 Summary of the Findings ...... 167
10.2 Future Work ...... 169
10.2.1 Current Limitations ...... 169
10.2.2 Other Possible Opportunities for Future Research ...... 169
10.3 Closing Remarks ...... 170

Bibliography 172

Appendices 191

List of Figures

1 Data structure of a commit ...... 12
2 Data structure of two commits ...... 12
3 Two branches pointing on one commit ...... 13
4 Two branches pointing on two commits ...... 13
5 Life cycle of a report ...... 14
6 Bumper Architecture ...... 32
7 BUMPER Metamodel ...... 33
8 Web Interface of BUMPER ...... 37
9 Overview of the PRECINCT approach ...... 43
10 PRECINCT output when replaying commit 710b6b4 of the Monit system used in the case study ...... 50
11 PRECINCT Branching ...... 52
12 Monit clone detection over revisions ...... 53
13 JHotDraw clone detection over revisions ...... 54
14 Dnsjava clone detection over revisions ...... 54
15 Managing events happening on project tracking systems to extract defect-introducing commits and commits that provided the fixes ...... 61
16 Clustering by dependency ...... 61
17 Classifying incoming commits and proposing fixes ...... 62
18 Simplified Dependency Graph for com.badlogicgames.gdx ...... 63
19 Dependency Graph ...... 67

20 Precision, Recall and F1-measure variations according to α ...... 69
21 okhttp commit #0ca4c82dd1032625831a5814ea2ddcf165029bdc ...... 78
22 Druid commit #1091861bb15876131653191ae409a523aa8ec0c5 ...... 79
23 netty commit #085a61a310187052e32b4a0e7ae9700dbe926848 ...... 80
24 okhttp commit #a96c3a8007d8e1a166f7aec423c7add1ea0e3522 ...... 81

25 OrientDB commit #444db817ee9404b17c1208df51781ce9cb6a2666 ...... 83
26 Jsoup commit #6c4f16f233cdfd7aedef33374609e9aa4ede255c ...... 84
27 Managing events happening on project tracking systems to extract defect-introducing commits and commits that provided the fixes ...... 90
28 Clustering by dependency ...... 90
29 Classifying incoming commits and proposing fixes ...... 90
30 Dependency Graph ...... 92
31 Clusters ...... 92
32 Process of Comparing New Commits ...... 96

33 System with four steps S0 to S3, which have five atomic properties p1 to p5 ...... 108
34 A toy program under testing ...... 110
35 A toy program under model checking ...... 110
36 A toy program under directed model checking ...... 110
37 Overview of JCHARMING ...... 112
38 InvalidActivityException is thrown in the Bar.Foo loop if the control variable is greater than 2 ...... 113
39 Hypothetical example of static program slicing ...... 114
40 Hypothetical example of backward static program slicing ...... 115
41 Hypothetical example representing bslice[a←d] (left) vs. ∪_{i=d}^{a} bslice[f_{i+1}←f_i] (right) for a crash trace T = d, f, e ...... 116

42 Steps of Algorithm 2 where f3 is corrupted ...... 118
43 High level architecture of the JUnit test case generation ...... 123
44 Simplified Unit Test template ...... 124
45 Crash trace reported for bug ArgoUML #311 ...... 130
46 Crash trace reported for bug Mahout #486 ...... 132
47 Crash trace reported for bug JFreeChart #664 ...... 133
48 Class diagram showing the relationship between bugs and fixes ...... 140
49 Proposed Taxonomy of Bugs ...... 140
50 Ambari heatmap ...... 160

List of Tables

1 Resolved/Fixed Bug (R/F BR), Changesets (CS), and projects by dataset ...... 36
2 Pretty-Printing Example ...... 49
3 List of Target Systems in Terms of Files and Kilo Line of Code (KLOC) at current version and Language ...... 51

4 Overview of PRECINCT's results in terms of precision, recall, F1-measure, execution time and output reduction ...... 55
5 Communities in terms of ID, Color code, Centroids, Betweenness and number of members ...... 68
6 BIANCA results in terms of organization, project name, a short description, number of classes, number of commits, number of defect introducing commits, number of risky commits detected, precision (%), recall (%), F1-measure (%), and the average similarity of the first 3 and 5 proposed fixes with the actual fix ...... 72
7 Workshop results ...... 98
8 List of target systems in terms of Kilo line of code (KLoC), number of classes (NoC) and Bug #ID ...... 125
9 Effectiveness of JCHARMING using directed model checking (DMC) in minutes, length of the generated JUnit tests (CE length) and model checking (MC) in minutes ...... 127
10 Datasets ...... 142
11 Contingency table and Pearson's chi-squared tests ...... 143
12 Apache Ecosystem Complexity Metrics Comparison and Mann-Whitney test results. µ: mean, Σ: sum, x̂: median, σ: standard deviation, %: percentage ...... 145
13 Netbeans Ecosystem Complexity Metrics Comparison and Mann-Whitney test results. µ: mean, Σ: sum, x̂: median, σ: standard deviation, %: percentage ...... 146

14 Apache and Netbeans Ecosystems Complexity Metrics Comparison and Mann-Whitney test results. Σ: sum, x̂: median, σ: standard deviation, %: percentage ...... 147
15 Support Vector Machine classifier performances while using 1 gram. TP: True Positive, TN: True Negative, FN: False Negative, FP: False Positive ...... 153
16 Linear Regression classifier performances while using 1 gram. TP: True Positive, TN: True Negative, FN: False Negative, FP: False Positive ...... 153
17 Random Forest classifier performances while using 1 gram. TP: True Positive, TN: True Negative, FN: False Negative, FP: False Positive ...... 154
18 Support Vector Machine classifier performances while using 2 grams. TP: True Positive, TN: True Negative, FN: False Negative, FP: False Positive ...... 154
19 Linear Regression classifier performances while using 2 grams. TP: True Positive, TN: True Negative, FN: False Negative, FP: False Positive ...... 155
20 Random Forest based classifier performances while using 2 grams. TP: True Positive, TN: True Negative, FN: False Negative, FP: False Positive ...... 155
21 Support Vector Machine classifier performances while using 3 grams. TP: True Positive, TN: True Negative, FN: False Negative, FP: False Positive ...... 156
22 Linear Regression classifier performances while using 3 grams. TP: True Positive, TN: True Negative, FN: False Negative, FP: False Positive ...... 157
23 Random Forest classifier performances while using 3 grams. TP: True Positive, TN: True Negative, FN: False Negative, FP: False Positive ...... 157
24 Random classifier. TP: True Positive, TN: True Negative, FN: False Negative, FP: False Positive ...... 158

25 Cohen's Kappa for each classifier ...... 158

Chapter 1

Introduction

1.1 Problem and Motivations

Software maintenance activities such as debugging and feature enhancement are known to be challenging and costly [158]. Studies have shown that the cost of software maintenance can reach up to 70% of the overall cost of the software development life cycle [70]. This is due to many factors including the increase in software complexity, the lack of traceability between the various artifacts of the software development process, the lack of proper documentation, and the unavailability of the original developers of the systems. The last decades have seen increased research attention in various software maintenance fields including mining software repositories, defect prevention, clone detection, and program comprehension. The main goal is to improve the productivity of software developers as they undertake maintenance tasks. There exist several tools to help with important software development tasks that can ease the maintenance burden. These include tools for clone detection (e.g., [14, 49, 90]), bug prevention (e.g., [68, 92]), and bug reproduction (e.g., [32, 163, 220]). Although these tools have been shown to be useful, they operate in an offline fashion (i.e., after the changes to the systems have been made). Software developers might be reluctant to use them as part of a continuous development process, unless they are involved in a major refactoring effort. Johnson et al. [86] showed that one of the main challenges with these tools lies in their lack of integration with the workflow of developers. Lewis et al. and Foss et al. [59, 118] added that developers tend to

be reluctant to install external tools, which typically require extensive setup and have a steep learning curve. Another issue with existing techniques, especially bug prevention methods, is that they do not provide recommendations to developers on how to fix the detected bugs. They simply return measurements that are often difficult to interpret by developers. Finally, developers also expressed concerns regarding the number of warnings, the general heaviness of the information provided by software maintenance tools, and the lack of clear corrective actions to fix a given warning. In this thesis, we propose novel approaches to support software developers at commit-time. As part of the developer's workflow, a commit marks the end of a given task or subtask as the developer is ready to version the source code. We show how commits can be used to catch unwanted modifications to the system before these modifications reach the central code repository. By doing so, we do not only propose solutions that integrate well with developers' workflow, but also eliminate the need for software developers to use any other external tools. In addition, should these approaches fail to prevent defect introduction, we propose an approach to reproduce on-field crashes in a controlled environment. Reproducing a bug is the first step towards fixing it. Finally, we provide a bug classification scheme to help reason about bugs using the location of fixes.
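Before detailing these contributions, the following minimal sketch illustrates the mechanism that commit-time approaches rely on: a pre-commit hook that runs locally before a change is versioned. The hook location and the risky() heuristic shown here are illustrative assumptions only; they merely stand in for the analyses developed in Chapters 5 to 7.

#!/usr/bin/env python3
# Illustrative Git pre-commit hook (a sketch, not the thesis implementation).
# Saved as .git/hooks/pre-commit and made executable, it runs before every
# commit; a non-zero exit status aborts the commit.
import subprocess
import sys

def staged_diff() -> str:
    # Unified diff of the staged changes, i.e., what is about to be committed.
    result = subprocess.run(["git", "diff", "--cached", "--unified=0"],
                            capture_output=True, text=True, check=True)
    return result.stdout

def risky(diff: str) -> bool:
    # Placeholder heuristic: flag very large changes or leftover debug output.
    # A real commit-time analysis (clone or defect detection) would go here.
    added = [line for line in diff.splitlines()
             if line.startswith("+") and not line.startswith("+++")]
    return len(added) > 500 or any("System.out.println" in line for line in added)

if __name__ == "__main__":
    if risky(staged_diff()):
        print("pre-commit: this change looks risky; review it or bypass the "
              "check with 'git commit --no-verify'.")
        sys.exit(1)  # abort the commit
    sys.exit(0)      # let the commit proceed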

1.2 Research Contributions

In this thesis, we make the following contributions:

1. A Bug Metarepository for researchers to manipulate millions of bug reports and fixes (Chapter 4): In this work, we introduce BUMPER (BUg Metarepository for dEvelopers and Researchers), a web-based infrastructure that can be used by software developers and researchers to access data from diverse repositories using natural language queries, regardless of where the data was created and hosted [137]. The idea behind BUMPER is that it can connect to any bug tracking and version control system and download the data into a single database. We created a common schema that represents data stored in various bug tracking and version control systems. BUMPER uses a web-based interface to allow users to search the aggregated database by expressing queries through a single point of access. This way, users can focus on the analysis itself and not on

the way the data is represented or located. BUMPER supports many features including: (1) the ability to use multiple bug tracking and version control systems, (2) the ability to efficiently search very large data repositories using both natural language and a specialized query language, (3) the mapping between the bug reports and the fixes, and (4) the ability to export the search results in JSON, CSV and XML formats. In addition, BUMPER differs from other approaches such as Boa [51] because (a) it updates itself every day with the new closed reports, and (b) it proposes a clear and concise JSON API that anyone can use to support their approaches or tools.

2. Online and incremental clone detection at commit-time (Chapter 5): Code clones appear when developers reuse code with little to no modification to the original code. Studies have shown that clones can account for about 7% to 50% of the code in a given software system [14, 49]. Nevertheless, the introduction of new clones should be controlled, as they may introduce new bugs in the code [90]. If a bug is discovered in one segment of code that has been copied and pasted several times, then the developers will have to remember all the places where this segment has been reused in order to fix the bug in each of them. In this research, we present PRECINCT (PREventing Clones INsertion at Commit-Time), which focuses on preventing the insertion of clones at commit-time, i.e., before they reach the central code repository. PRECINCT is an online clone detection technique that relies on the pre-commit hook capabilities of modern source code version control systems.

3. An approach for preventing bug insertion at commit-time using clone detection and dependency analysis (Chapter 6): We propose an approach for preventing the introduction of bugs at commit-time. There exist tools that prevent developers from shipping bad code [76]. Our approach, called BIANCA (Bug Insertion ANticipation by Clone Analysis at commit-time), is different from existing approaches because it mines and analyses the change patterns in commits and matches them against past commits known to have introduced defects in the code (or that have simply been replaced by a better implementation).

4. An approach for preventing bug insertion at commit-time using metrics and code matching (Chapter 7): Clone-based bug detection approaches suffer from scalability issues, hindering their application in industrial settings, where several repositories receive hundreds of commits per day. We created a two-step classifier, called CLEVER, that leverages the performance of metric-based detection and the expressiveness of clone-based detection and resolution [136]. This work was conducted in collaboration with Ubisoft, one of the largest video game companies in the world.

5. An approach for crash reproduction using crash traces and model checking (Chapter 8): Crash reproduction is an expensive task because the data provided by end users is often scarce [33]. It is, therefore, important to invest in techniques and tools for automatic bug reproduction to ease the maintenance process and accelerate the rate of bug fixes and patches. We propose an approach, called JCHARMING (Java CrasH Automatic Reproduction by directed Model checkING), that uses a combination of crash traces and model checking to automatically reproduce bugs that caused field failures [138, 139]. Unlike existing techniques, JCHARMING does not require instrumentation of the code. It does not need access to the content of the heap either. Instead, JCHARMING uses the list of functions output when an uncaught exception occurs in Java (i.e., the crash trace) to guide a model checking engine to uncover the statements that caused the crash. Such outputs are often found in bug reports.

6. A classification of bugs based on the location of fixes (Chapter 9): In recent years, there has been increasing attention to techniques and tools that mine large bug repositories to help software developers understand the causes of bugs and speed up the fixing process. These techniques, however, treat all bugs in the same way. Bugs that are fixed by changing a single location in the code are examined the same way as those that require complex changes. After examining more than 100 thousand bug reports of 380 projects, we found that bugs can be classified into four types based on the location of their fixes. Type 1 bugs are the ones that are fixed by modifying a single location in the code, while Type 2 refers to bugs that are fixed in more than one location. Type 3 refers to multiple bugs that are fixed in the exact same location. Type 4 is an extension of Type 3, where multiple bugs are resolved by modifying the same set of locations. This classification can help companies put the resources where they are needed the most. It also provides useful insight into the quality of the code. Knowing, for example, that a system contains a large number of bugs of Type 4 suggests high maintenance efforts. This classification can also be used for other tasks such as predicting the type of incoming bugs for an improved bug handling process. For example, if a bug is found to be of Type 4, then it should be directed to experienced developers.
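As an illustration only, the following sketch groups bugs by the set of locations their fixes touched and assigns the four types accordingly. The bug-to-location mapping shown is hypothetical, and grouping by identical fix sets is a simplification of the analysis performed in Chapter 9.

from collections import defaultdict

def classify_bugs(fix_locations: dict) -> dict:
    # fix_locations maps a bug id to the (frozen) set of code locations its fix modified.
    # Bugs sharing the exact same fix set are grouped together:
    #   single location, not shared   -> Type 1
    #   several locations, not shared -> Type 2
    #   single location, shared       -> Type 3
    #   several locations, shared     -> Type 4
    bugs_per_fixset = defaultdict(list)
    for bug, locations in fix_locations.items():
        bugs_per_fixset[locations].append(bug)

    types = {}
    for locations, bugs in bugs_per_fixset.items():
        shared = len(bugs) > 1
        single = len(locations) == 1
        for bug in bugs:
            if single:
                types[bug] = "Type 3" if shared else "Type 1"
            else:
                types[bug] = "Type 4" if shared else "Type 2"
    return types

# Hypothetical example: BUG-2 and BUG-3 are resolved by the same set of files.
print(classify_bugs({
    "BUG-1": frozenset({"a.java"}),
    "BUG-2": frozenset({"a.java", "b.java"}),
    "BUG-3": frozenset({"a.java", "b.java"}),
}))  # {'BUG-1': 'Type 1', 'BUG-2': 'Type 4', 'BUG-3': 'Type 4'}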

1.3 Thesis Organization

The thesis is organized as follows. In Chapter 2, we provide background information about version control systems and project tracking systems. In Chapter 3, we present the work related to ours. Chapters 4, 5, 6, 7, 8, and 9 are dedicated to the main contributions of this thesis, as outlined in the previous section. Finally, we conclude the thesis in Chapter 10 with future directions and closing remarks.

1.4 Related Publications

Earlier versions of the work done in this thesis have been published in the following papers:

1. Abdelwahab Hamou-Lhadj, Mathieu Nayrolles: A Project on Software Defect Prevention at Commit-Time: A Success Story of University-Industry Research Collaboration. Proceedings of the 5th International Workshop on Software Engineering Research and Industrial Practice, co-located with the International Conference on Software Engineering 2018, pp. 24-25.

2. Mathieu Nayrolles, Abdelwahab Hamou-Lhadj: CLEVER: Combining Code Metrics with Clone Detection for Just-In-Time Fault Prevention and Resolution in Large Industrial Projects. Proceedings of the International Conference on Mining Software Repositories 2018.

3. Mathieu Nayrolles, Abdelwahab Hamou-Lhadj: Towards a Classification of Bugs to Facilitate Software Maintainability Tasks. Proceedings of the 1st International Workshop on Software Qualities and Their Dependencies, co-located with the International Conference on Software Engineering 2018, pp. 25-32.

4. Mathieu Nayrolles, Abdelwahab Hamou-Lhadj, Sofine Tahar, Alf Larsson: A bug reproduction approach based on directed model checking and crash traces. Journal of Software: Evolution and Process 29(3) (2017).

5. Mathieu Nayrolles, Abdelwahab Hamou-Lhadj: BUMPER: A Tool for Coping with Natural Language Searches of Millions of Bugs and Fixes. International Conference on Software Analysis, Evolution and Reengineering 2016, pp. 649-652.

6. Mathieu Nayrolles, Abdelwahab Hamou-Lhadj, Sofine Tahar, Alf Larsson: JCHARMING: A bug reproduction approach using crash traces and directed model checking. International Conference on Software Analysis, Evolution and Reengineering 2015, pp. 101-110. Best Paper Award.

The following papers were published in parallel to the aforementioned publications. While they are not directly related to this thesis, they are not completely irrelevant either, as their topics include crash report handling and quality-oriented refactoring of service-based applications.

7. Abdou Maiga, Abdelwahab Hamou-Lhadj, Mathieu Nayrolles, Korosh Koochekian Sabor, Alf Larsson: An empirical study on the handling of crash reports in a large software company: An experience report. International Conference on Software Maintenance and Evolution 2015, pp. 342-351.

8. Mathieu Nayrolles, Eric Beaudry, Naouel Moha, Petko Valtchev, Abdelwahab Hamou-Lhadj: Towards Quality-Driven SOA Systems Refactoring Through Planning. International Multidisciplinary Conference on e-Technologies 2015, pp. 269-284.

9. Korosh Koochekian Sabor, Mathieu Nayrolles, Abdelaziz Trabelsi, Abdelwahab Hamou-Lhadj: An Approach for Predicting Bug Report Fields Using a Neural Network Model. Accepted to the International Workshop on Debugging and Repair (IDEAR) co-located with the International Symposium on Software Reliability Engineering (ISSRE) 2018.

In addition, we seized the opportunity to disseminate the best practices discovered from our extensive investigation of software ecosystems in several books aimed at practitioners. The appendices of this thesis list the open-source systems that were studied in our work.

10. Mathieu Nayrolles (2018). Angular Design Patterns. (pp. 178). Packt Publishing.

11. Mathieu Nayrolles, Rajesh Gunasundaram, Sridhar Rao (2017). Expert Angular. (pp. 454). Packt Publishing.

12. Mathieu Nayrolles (2015). Magento Site Performance Optimization. (pp. 92). Packt Publishing.

13. Mathieu Nayrolles (2015). Xamarin Studio for Android Programming: A C# Cookbook. (pp. 298). Packt Publishing.

14. Mathieu Nayrolles (2014). Mastering . (pp. 152). Inkstall Publishing.

15. Mathieu Nayrolles (2013). Instant Magento Performance Optimization How-to. (pp. 56). Packt Publishing.

Finally, the work presented in this thesis also attracted media coverage for its impact at Ubisoft, one of the world's largest video game publishers. A Google search for “commit+assistant+ubisoft” yielded more than 114,000 results at the time of writing. Here is a curated list of the most interesting press articles.

16. Sinclair, B. (2018). Ubisoft's “Minority Report of programming” - GamesIndustry. https://www.gamesindustry.biz/articles/2018-02-22-ubisofts-minority-report-of-programming.

17. Maxime Johnson. (2018). Jeux vidéos : réunir les chercheurs et les créateurs - Techno - L'actualité. http://lactualite.com/techno/2018/02/23/jeu-video-reunir-les-chercheurs-et-les-createurs/

18. Condliffe, J. (2018). AI can help spot coding mistakes before they happen. - MIT Technology Review. https://www.technologyreview.com/the-download/610416/ai-can-help-spot-coding-mistakes-before-they-happen/

19. Matt Kamen. (2018). Ubisoft's AI in Far Cry 5 and Watch Dogs could change gaming - WIRED UK. http://www.wired.co.uk/article/ubisoft-commit-assist-ai

20. Kenneth Gibson. (2018). STEM SIGHTS: The Concordian who uses AI to fix software bugs - Concordia News. http://www.concordia.ca/cunews/main/stories/2018/04/10/stem-sights-concordian-who-makes-bug-free-software.html

21. Ryan Remiorz. (2018). Concordia develops tool with Ubisoft to detect glitches in gaming software - Financial Post. http://business.financialpost.com/pmn/business-pmn/concordia-develops-tool-with-ubisoft-to-detect-glitches-in-gaming-software

22. The Canadian Press. (2018). Concordia partners with Ubisoft to detect glitches in gaming software - The Globe and Mail. https://www.theglobeandmail.com/business/technology/article-concordia-partners-with-ubisoft-to-detect-glitches-in-gaming-software/

23. Cyrille Baron. (2018). Commit Assistant, l'IA qui aide les développeurs de jeux - iQ France. https://iq.intel.fr/commit-assistant-lia-qui-aide-les-developpeurs-de-jeux/?sf184907379=1

In these publications, the work presented in this thesis is referred to as commit-assistant, which is the internal implementation of CLEVER (Chapter 7).

Chapter 2

Background

2.1 Definitions

In this thesis, we use the following definitions that are based on [10, 30, 157, 159, 202].

• Software bug: A software bug is an error, flaw, failure, defect or fault in a computer program or system that causes it to violate at least one of its functional or non-functional requirements.
• Error: An error is a mistake, misconception, or misunderstanding on the part of a software developer.
• Fault/defect: A fault (defect) is defined as an abnormal condition or defect at the component, equipment, or subsystem level which may lead to a failure. A fault (defect) is not final (the system still works) and does not prevent a given feature from being accomplished. A fault (defect) is a deviation (anomaly) from the healthy system that can be caused by an error or by external factors (hardware, third parties, etc.).
• Failure: The inability of a software system or component to perform its required functions within specified requirements.
• Crash: The software system encountered a fault (defect) that triggered a fatal failure from which the system could not recover. As a result, the system stops.
• Bug report: A bug report describes a behaviour observed in the field and considered abnormal by the reporter. Bug reports are submitted manually to bug

report systems (e.g., Bugzilla, Jira). There is no mandatory format to report a bug. Nevertheless, a bug report should include the version of the software system, the OS and platform, the steps to reproduce the bug, screenshots, a stack trace, and anything else that could help a developer assess the internal state of the software system.
• Crash report: A crash report is issued as the last thing that a software system does before crashing. Crash reports are usually reported automatically (crash reporting systems are implemented as part of the software application). A crash report contains data (that can be proprietary) to help developers understand the causes of the crash (e.g., a memory dump).

In the remainder of this section, we introduce the two types of software repositories that are used in this thesis: version control systems and project tracking systems.

2.2 Version control systems

Version control consists of maintaining the versions of various artifacts such as source code files [208]. This activity is a complex task and cannot be performed manually in real-world projects. To this end, several tools have been created to help practitioners manage the versions of their software artifacts. Each evolution of a software system is considered a version (also called a revision), and each version is linked to the one before through modifications of software artifacts. These modifications consist of updating, adding or deleting software artifacts. They can be referred to as a diff, patch or commit1. A diff, patch or commit has the following characteristics:

• Number of files: The number of software files that have been modified, added or deleted.
• Number of hunks: The number of consecutive code blocks of modified, added or deleted lines in textual files. Hunks are used to determine, in each file, how many different places the developer has modified.
• Number of churns: The number of modified lines. However, the churn value for a line change should be at least two, as the modified line is counted once as deleted and once as added (see the sketch below).
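As an illustrative aid, the three characteristics above can be computed from a unified diff (as produced by tools such as git diff); this is a minimal sketch assuming the standard unified format, not the exact counting rules used later in the thesis.

def diff_stats(unified_diff: str) -> dict:
    # Count files, hunks, and churned lines in a unified diff.
    # A hunk starts with an '@@ ... @@' header; churn counts every added or
    # removed line, so a single modified line contributes a churn of two.
    files = hunks = churn = 0
    for line in unified_diff.splitlines():
        if line.startswith("diff --git"):
            files += 1
        elif line.startswith("@@"):
            hunks += 1
        elif line.startswith("+") and not line.startswith("+++"):
            churn += 1
        elif line.startswith("-") and not line.startswith("---"):
            churn += 1
    return {"files": files, "hunks": hunks, "churn": churn}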

Modern version control systems also support branching. A branch is a derivation in the evolution that contains a duplication of the source code so that both versions

1 These names are not to be used interchangeably as differences exist.

can be modified in parallel. Branches can be reconciled with a merge operation that merges the modifications of two or more branches. This operation is completely automated with the exception of merge conflicts, which arise when both branches contain modifications of the same line. Such conflicts cannot be reconciled automatically and have to be dealt with by the developers. This allows for greater agility among developers, as changes in one branch do not affect the work of developers on other branches. Branching has been used for more than testing hazardous refactoring or testing framework upgrades. Task branching is an agile branching strategy where a new branch is created for each task [125]. It is common to see a branch named 123_implement_X, where 123 is the #id of task X given by the project tracking system. Project tracking systems are presented in Section 2.3.1. In modern versioning systems, when maintainers make modifications to the source code, they have to commit their changes for the modifications to be effective. The commit operation versions the modifications applied to one or many files. Figure 1 presents the data structure used to store a commit. Each commit is represented as a tree. The root leaf (green) contains the commit, tree and parent hashes, as well as the author and the description associated with the commit. The second leaf (blue) contains the leaf hash and the hashes of the files of the project. In this example, we can see that author "Mathieu" has created the file file1.java with the message "project init". Figure 2 represents an external modification. In this second example, file1.java is modified while file2.java is created. The second commit 98ca9 has 34ac2 as its parent. Branches point to a commit. In a task-branching environment, a branch is created via a checkout operation for each task. Tasks can be created to fix the root cause of a crash or bug report, or to implement new features. In Figure 3, the master branch and the 1_fix_overflow branch point to commit 98ca9. Both branches can evolve separately and be merged when the task branch is ready. In Figure 4, the master branch points to a13ab2 while 1_fix_overflow points to ahj23k.
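To summarize the structure just described, a minimal model of commits and branches might look like the following sketch. The hashes 34ac2 and 98ca9 and the first commit message come from the figures; the file content hashes and the second commit message are made up for illustration.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Commit:
    sha: str                    # hash identifying the commit
    parent: Optional["Commit"]  # previous commit, None for the very first one
    author: str
    message: str
    files: dict                 # file path -> hash of the file content

# First commit: "Mathieu" creates file1.java with the message "project init".
c1 = Commit("34ac2", None, "Mathieu", "project init", {"file1.java": "a1b2c"})
# Second commit (98ca9, parented on 34ac2): file1.java modified, file2.java created.
c2 = Commit("98ca9", c1, "Mathieu", "modify file1, add file2",
            {"file1.java": "f00ba", "file2.java": "c4fe1"})

# Branches are simply named pointers to commits; here both point to 98ca9,
# as in Figure 3, before the task branch diverges from master.
branches = {"master": c2, "1_fix_overflow": c2}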


A resolved report is assigned one of the following resolutions: FIXED, DUPLICATE, WONTFIX, WORKSFORME and INVALID [106].

• RESOLVED/FIXED: A modification to the source code has been pushed, i.e., a changeset (also called a patch) has been committed to the source code management system and fixes the root problem described in the report.
• RESOLVED/DUPLICATE: A previously submitted report is being processed. The report is marked as a duplicate of the original report.
• RESOLVED/WONTFIX: This is applied in the case where developers decide that a given report will not be fixed.
• RESOLVED/WORKSFORME: If the root problem described in the report cannot be reproduced on the reported OS/hardware.
• RESOLVED/INVALID: If the report is not related to the software itself.

Finally, the report is CLOSED after it is resolved. A report can be reopened (sent to the REOPENED state) and then assigned again if the initial fix was not adequate (the fix did not resolve the problem). The elapsed time between the moment the report is marked as new and the moment it reaches the resolved status is known as the fixing time, usually measured in days. In the case of task branching, the branch associated with the report is marked as ready to be merged. Then, the person in charge (quality assurance team, manager, etc.) will be able to merge the branch with the mainline. If the report is reopened, the days between the time the report is reopened and the time it is marked again as RESOLVED/FIXED are added to the fixing time. Reports can be reopened many times. Tasks follow a similar life cycle with the exception of the UNCONFIRMED and RESOLVED states. Tasks are created by management and do not need to be confirmed to be OPEN and ASSIGNED to developers. When a task is complete, it does not go to the RESOLVED state, but to the IMPLEMENTED state. Bug and crash reports are considered problems to eradicate from the program, whereas tasks are new features or improvements to include in the program. Reports and tasks can have a severity associated with them [22]. The severity indicates the degree of impact on the software system. The possible severities are:

• blocker: blocks development and/or testing work.
• critical: crashes, loss of data, severe memory leak.
• major: major loss of function.
• normal: regular report, some loss of functionality under specific circumstances.

• minor: minor loss of function, or other problem where an easy workaround is present.
• trivial: cosmetic problems like misspelled words or misaligned text.

The relationship between a report or a task and the actual modification can be hard to establish, and it has been the subject of various research studies (e.g., [2, 12, 203]). The reason is that they live in two different systems: the version control system and the project tracking system. While it is considered a good practice to link each report to the versioning system by indicating the report #id in the commit message, more than half of the reports are not linked to a modification [203].
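The usual starting point for recovering such links is a heuristic scan of commit messages for report identifiers. The pattern below is a simplified, hypothetical illustration of that idea and not the exact heuristic used in the cited studies.

import re

# Matches identifiers such as "#123", "bug 123" or "fixes 123" in commit messages.
ISSUE_ID = re.compile(r"(?:#|bug[\s\-:]*|fix(?:es|ed)?\s+)(\d+)", re.IGNORECASE)

def linked_reports(commit_message: str) -> set:
    # Return the report ids that a commit message explicitly refers to.
    return set(ISSUE_ID.findall(commit_message))

print(linked_reports("Fix overflow in parser, closes #123"))  # {'123'}
print(linked_reports("refactor logging"))                     # set(), an unlinked commit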

2.4 Project Tracking Systems Providers

We have collected data from four different project tracking systems: Bugzilla, Jira, Github and Sourceforge. Bugzilla belongs to the Mozilla Foundation and was first released in 1998. Jira, provided by Atlassian, was released 16 years ago, in 2002. Bugzilla is 100% open source, and it is difficult to estimate how many projects use it. However, we can assume that it holds a large share of the market, as major organizations such as Mozilla, Eclipse and the Apache Software Foundation use it. Jira, on the other hand, is a commercial software tool with a freemium business model, and Atlassian claims to have 25,000 customers around the world. Github and Sourceforge are different from Bugzilla and Jira in the sense that they were created as source code revision systems and later evolved to add project tracking capabilities to their software tools. This common particularity has the advantage of easing the link between bug reports and the source code.

Chapter 3

Related work

3.1 Clone Detection

Clone detection is an important and difficult task. Throughout the years, researchers and practitioners have developed a considerable number of methods and tools to efficiently detect source code clones. In this section, we first describe the classical clone detection approaches and then present works that focus on local and remote detection.

3.1.1 Traditional Clone Detection Techniques

Tree-matching and metric-based methods are two sub-categories of syntactic analysis for clone detection. Syntactic analyses consist of building abstract syntax trees (ASTs) and analyzing them with a set of dedicated metrics or searching them for identical sub-trees. Many existing AST-based approaches rely on sub-tree comparison to detect clones, including the work of Baxter et al. [18], Wahler et al. [196], and the work of Jiang et al. with Deckard [83]. A metric-based approach compares metrics computed on the AST, rather than the code itself, to identify clones [15, 155]. Text-based techniques use the code and compare sequences of code blocks to each other to identify potential clones. Johnson was perhaps the first one to use fingerprints to detect clones [87, 88]. Blocks of code are hashed, producing fingerprints that can be compared. If two blocks share the same fingerprint, they are considered clones. Manber et al. [122] and Ducasse et al. [49] refined the fingerprint technique by using leading keywords and dot-plots.
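The fingerprinting idea can be sketched in a few lines: blocks are lightly normalized and hashed, and blocks sharing a hash become clone candidates. This is only a rough illustration of the general principle, not the specific algorithms of the works cited above.

import hashlib
from collections import defaultdict

def fingerprint(block: str) -> str:
    # Normalize whitespace so purely cosmetic differences do not change the
    # hash (roughly what is needed to catch identical, Type 1, clones).
    normalized = " ".join(block.split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

def clone_candidates(blocks: dict) -> list:
    # Group code blocks (block id -> source text) sharing the same fingerprint.
    groups = defaultdict(list)
    for block_id, source in blocks.items():
        groups[fingerprint(source)].append(block_id)
    return [ids for ids in groups.values() if len(ids) > 1]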

Another approach for detecting clones is to use static analysis and to leverage the semantics of a program to improve the detection. These techniques rely on program dependency graphs, where nodes are statements and edges are dependencies. The problem of finding clones is then reduced to the problem of finding identical sub-groups in the program dependency graph. Examples of existing techniques that fall into this category are the ones presented by Krinke et al. [108] and Gabel et al. [61]. Many clone detection tools resort to lexical approaches for clone detection. Here, the code is transformed into a series of tokens. If sub-series repeat themselves, it means that a potential clone is in the code. Some popular tools that use this technique include Dup [14], CCFinder [93], and CP-Miner [119]. In 2010, Hummel et al. proposed an approach that is both incremental and scalable using index-based clone detection [77]. Incremental clone detection is a technique where only the changes from one version to another are analysed. Thus, the required computational time is greatly reduced. Using more than 100 machines in a cluster, they managed to drop the computation time for Type 1 and 2 clones to less than a second when comparing a new version. The time required to find all the clones on a 73 MLOC system was 36 minutes. We reach similar performance, for one revision, using a single machine. While being extremely fast and reliable, Hummel et al.'s approach required an industrial cluster to achieve such performance. In our opinion, it is unlikely that standard practitioners have access to such computational power. Moreover, the authors' approach only targets Type 1 and 2 clones. Higo et al. proposed an incremental clone detection approach based on program dependency graphs (PDG) [73]. Using PDGs is arguably more complex than text comparison and allows the detection of clone structures that are scattered in the program. They were able to analyze 5,903 revisions in 15 hours in .

3.1.2 Remote Detection of Clones

Yuki et al. conducted one of the few studies on the application of clone management to industrial systems [205]. They implemented a tool named Clone Notifier at NEC with the help of experienced practitioners. They specifically focus on clone insertion notification, very much like PRECINCT. Unlike PRECINCT, however, their approach is a remote one in which the changes are committed (i.e., they reach the central repository, and anyone can pull them into their machines) and a central server

analyses the changes. If the committed changes contain newly inserted clones, then an email notification is sent. Zhang et al. proposed CCEvents (Code Cloning Events) [212]. Their approach monitors code repositories continuously and allows stakeholders to use a domain-specific language called CCEML to specify which email notifications they wish to receive. In addition, many commercial tools now include clone detection as part of continuous integration. Codeclimate, Codacy, Scrutinizer and Coveralls are some examples. These tools perform various tasks such as executing unit test suites, computing quality metrics, performing clone detection and providing a report by email. We argue that remotely detecting clones is not practical because clones can be synchronized by other team members, which may lead to challenging merges when the clones are removed. In addition, the authors did not report performance measurements, and the longer it takes for the notification to be sent to the developer, the harder it can be to reconstruct the mental map required for clone removal.

3.1.3 Local Detection of Clones

Göde and Koschke [63] proposed an incremental clone detector that relies on the results of analyses from past versions of a system to only analyze the new changes. Their clone detector takes the form of an IDE plugin that alerts developers as soon as a clone is inserted into the program. Zibran and Roy [215, 216] proposed another IDE-based clone management system to detect and refactor near-miss clones for Eclipse. Their approach uses a k-difference hybrid suffix tree algorithm. It can detect clones in real time and propose a semi-automated refactoring process. Robert et al. [181] proposed another IDE plugin for Eclipse called CloneDR based on ASTs that introduced novel visualizations for clone detection, such as scatter plots. IDE-based methods tend to issue many warnings to developers that may interrupt their work, hence hindering their productivity [104]. In addition, Latoza et al. [113] found that there exist six different reasons that trigger the use of clones (e.g., copy and paste of code examples, reimplementations of the same functionality in different languages, etc.). Developers are aware that they are creating clones in five out of six situations. In such cases, warnings provided by IDE-based local detection techniques

can be quite disturbing. Nguyen et al. [147] proposed an advanced clone-aware source code management system. Their approach uses abstract syntax trees to detect, update, and manage clones. While efficient, their approach does not prevent the introduction of clones, and it is not incremental. Developers have to run a project-wide detection for each version of the program. The same team [146] conducted a follow-up study by making Clever incremental. Their new tool, JSync, is an incremental clone detector that only performs the detection of clones on the new changes. Niko et al. [148] proposed techniques revolving around hashing to obtain a quick answer while detecting Type 1, Type 2, and Type 3 clones in Squeaksource. While their approach works on a single system (i.e., detecting clones on one version of one system), they found that more than 14% of all clones are copied from project to project, stressing the need for fast and scalable approaches that detect clones across a large number of projects. On the performance side, Niko et al. were able to perform clone detection on 74,026 classes in 14:45 hours (4,747 classes per hour) with an eight-core Xeon at 2.3 GHz and 16 GB of RAM. While these results are promising, especially because the approach detects clones across projects and versions, the computing power required is still considerable. Similarly, Saini et al. [170] and Sajnani et al. [171] proposed an approach called SourcererCC. SourcererCC targets fast clone detection on developers' workstations (12 GB of RAM). SourcererCC is a token-based clone detector that uses an optimized inverted index. It was tested on 25K projects cumulating 250 MLOC. The technique achieves a precision of 86% and a recall of 86%-100% for clones of Type 1, 2 and 3. Toomey et al. [190] also proposed an efficient token-based approach for detecting clones called ctcompare. Their tokenization is, however, different from that of most approaches, as they used lexical analysis to produce sequences of tokens that can be transformed into token tuples. ctcompare is accurate, scalable and fast but does not detect Type 3 clones.

3.2 Reports and source code relationships

Mining bug repositories is perhaps one of the most active research fields today. The reason is that the analysis of bug reports (BRs) provides useful insights that can help

with many maintenance activities such as bug fixing [169, 199], bug reproduction [9, 33, 84], fault analysis [143], etc. This increase in attention can be further justified by the emergence of many open source bug tracking systems, allowing software teams to make their bug reports available online to researchers. These studies, however, treat all bugs as the same. For example, a bug that requires only one fix is analyzed the same way as a bug that necessitates multiple fixes. Similarly, if multiple bugs are fixed by modifying the exact same locations in the code, then we should investigate how these bugs are related in order to predict them in the future. Researchers have been studying the relationships between bug and source code repositories for more than two decades. To the best of our knowledge, the first ones who conducted this type of study on a significant scale were Perry and Stieg [156]. In these two decades, many aspects of these relationships have been studied at length. For example, researchers were interested in improving the bug reports themselves by proposing guidelines [22], and by further simplifying existing bug reporting models [72]. Another field of study consists of assigning these bug reports, automatically if possible, to the right developers during triaging [3, 25, 82, 182]. Another set of approaches focuses on how long it takes to fix a bug [24, 169, 213] and where it should be fixed [209, 214]. With the rapidly increasing number of bugs, the community was also interested in prioritizing bug reports [98], and in predicting the severity of a bug [111]. Finally, researchers proposed approaches to predict which bugs will get reopened [120, 218], which bug report is a duplicate of another one [23, 80, 187], and which locations are likely to yield new bugs [100, 102].

3.3 Fault Prediction

The majority of previous file/module-level prediction work uses code or process metrics. Approaches using code metrics rely only on the information from the code itself and do not use any historical data. Chidamber and Kemerer published the well-known CK metrics suite [35] for object-oriented designs and inspired Moha et al. to publish similar metrics for service-oriented programs [129]. Another famous metric suite for assessing the quality of a given software design is Briand's coupling metrics [26].

The CK and Briand's metrics suites have been used, for example, by Basili et al. [16], El Emam et al. [54], Subramanyam et al. [177] and Gyimothy et al. [65] for object-oriented designs. Service-oriented designs have been far less studied than object-oriented designs as they are relatively new, but Nayrolles et al. [141, 142], Demange et al. [48] and Palma et al. [153] used Moha et al.'s metric suites to detect software defects. All these approaches proved software metrics to be useful at detecting software faults for object-oriented and service-oriented designs, respectively. In addition, Nagappan et al. [130, 132] and Zimmerman et al. [217, 219] further refined metrics-based detection by using static analysis and call-graph analysis. Other approaches use historical development data, often referred to as process metrics. Nagappan and Ball [131] studied the feasibility of using relative churn metrics to predict buggy modules in Windows Server 2003. Other work by Hassan et al. and Ostrand et al. used past changes and defects to predict buggy locations (e.g., [67], [152]). Hassan and Holt proposed an approach that highlights the ten locations most susceptible to contain a bug using heuristics based on file-level metrics [67]. They find that locations that have been recently modified and fixed are the most defect-prone. Similarly, Ostrand et al. [152] predict future crash locations by combining the data from changed and past defect locations. They validated their approach on industrial systems at AT&T and showed that data from prior changes and defects can effectively predict defect-prone locations for open-source and industrial systems. Kim et al. [102] proposed the bug cache approach, which is an improved technique over Hassan and Holt's approach [67]. Rahman and Devanbu found that, in general, process-based metrics perform as well as or better than code-based metrics [160]. Other work focused on the prediction of risky changes. Kim et al. proposed the change classification problem, which predicts whether a change is buggy or clean [179]. Hassan [68] used the entropy of changes to predict risky changes, finding that the more complex a change is, the more likely it is to introduce a defect. Kamei et al. performed a large-scale empirical study on change classification [92]. The studies above find that the size of a change and the history of the files being changed (i.e., how buggy they were in the past) are the best indicators of risky changes.
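For instance, the entropy measure mentioned above can be computed over the distribution of a change (or a period of changes) across the files it touches. The sketch below follows the usual Shannon entropy formulation and is only a simplified reading of Hassan's measure [68].

import math

def change_entropy(churn_per_file: dict) -> float:
    # Shannon entropy of how a change is spread over files: 0 when all the
    # churn is in one file, log2(n) when it is spread evenly over n files.
    # Following the findings above, a higher value means a more complex,
    # and therefore riskier, change.
    total = sum(churn_per_file.values())
    if total == 0:
        return 0.0
    probabilities = [churn / total for churn in churn_per_file.values() if churn > 0]
    return sum(-p * math.log2(p) for p in probabilities)

print(change_entropy({"a.java": 10}))              # 0.0
print(change_entropy({"a.java": 5, "b.java": 5}))  # 1.0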

3.4 Automatic Patch Generation

Pan et al. [154] identified 27 bug fixing patterns that can be applied to fix software bugs in Java programs. They showed that between 45.7% and 63.6% of the bugs could be fixed with their patterns. Later, Kim et al. [97] generated patches from human-written patches and showed that their tool, PAR, successfully generated patches for 27 of 119 bugs. Tao et al. [183] also showed that automatically generated patches can assist developers in debugging tasks. Other work focused on determining how to best generate acceptable and high-quality patches, e.g., [43, 115], and on determining which bugs are the best fit for automatic patch generation [116]. Our work differs from the work on automated patch generation in that we do not generate patches; rather, we use clone detection to determine the similarity of a modification to a previous risky change and suggest to the developer the fixes of the prior risky changes.

3.5 Crash Reproduction

In this section, we put the emphasis on how crash traces are used in crash reproduction tasks. Existing studies can be divided into two distinct categories: (A) on-field record and in-house replay techniques [9, 134, 163, 176], and (B) in-house crash understanding [32, 84, 85, 138, 220]. These two categories yield varying results depending on the selected approach and are mainly differentiated by the need for instrumentation. The first category of techniques oversees, by means of instrumentation, the execution of the target system on the field in order to reproduce the crashes in-house, whereas tools and approaches belonging to the second category only use data produced by the crash, such as the crash stack or the core dump at crash time. In the first category, tools record different types of data such as the invoked methods [134], try-catch exceptions [165], or objects [81]. In the second category, existing tools and approaches are aimed at understanding the causes of a crash using data produced by the crash itself, such as a crash stack [32], previous and controlled executions [220], etc. Tools and approaches that rely on instrumentation face common limitations such as the need to instrument the source code in order to introduce logging mechanisms [9, 81, 134], which is known to slow down the subject system. In addition, recording

system behavior by means of instrumentation may yield privacy concerns. Tools and approaches that only use data about a crash, such as a core dump or an exception stack trace, face a different set of limitations. They have to reconstruct the timeline of events that have led to the crash [32, 138]. Computing all the paths from the initial state of the software to the crash point is an NP-complete problem, and may cause state space explosion [32, 37]. In order to overcome these limitations, some researchers have proposed to use various SMT (satisfiability modulo theories) solvers [50] and model checking techniques [194]. However, these techniques require knowledge that goes beyond traditional software engineering, which hinders their adoption [195]. It is worth mentioning that both categories share a common limitation. It is possible for the conditions required to reproduce a crash to be purely external, such as the reading of a file that is only present on the hard drive of the customer or the reception of a faulty network packet [32, 138]. It is almost impossible to reproduce the bug without this input.

3.5.1 On-field Record and In-house Replay

Jaygarl et al. created OCAT (Object Capture based Automated Testing) [81]. The authors' approach starts by capturing objects created by the program when it runs in the field in order to provide them to an automated test process, since the coverage of automated tests is often low due to the lack of correctly constructed objects. The captured objects can also be mutated by means of evolutionary algorithms. These mutations target primitive fields in order to create even more objects and, therefore, improve the code coverage. While not directly targeting the reproduction of a bug, OCAT has been used as the main mechanism of bug reproduction systems. Narayanasamy et al. [134] proposed BugNet, a tool that continuously records program execution for deterministic replay debugging. According to the authors, the size of the recorded data needed to reproduce a bug with high accuracy is around 10 MB. This recording is then sent to the developers and allows the deterministic replay of the bug. The authors argued that, with today's Internet bandwidth, the size of the recording is not an issue during the transmission of the recorded data. Another approach in this category was proposed by Clause et al. [37]. The approach records the execution of the program on the client side and compresses

the generated data. Moreover, the approach keeps compressed traces of all the documents accessed in the operating system. This data is sent to the developers to replay the execution of the program in a sandbox simulating the client's environment. This feature of the approach proposed by Clause et al. addresses the limitation where crashes are caused by external causes. While the authors broaden the scope of reproducible bugs, their approach records a lot of data that may be deemed private, such as files used for the proper operation of the operating system. Timelapse [29] also addresses the problem of reproducing bugs using external data. The tool focuses on web applications and allows developers to browse and visualize the execution traces recorded by Dolos. Dolos captures and reuses user inputs and network responses to deterministically replay a field crash. Also, both Timelapse and Dolos allow developers to use conventional tools such as breakpoints and classical debuggers. Similar to the approach proposed by Clause et al. [37], private data is recorded without obfuscation of any sort. Another approach, named ReCrash, was proposed by Artzi et al. ReCrash records the object states of the targeted programs [9]. The authors use an in-memory stack, which contains every argument and object clone of the real execution, in order to reproduce a crash via the automatic generation of unit test cases. Unit test cases are used to provide hints to the developers about the buggy code. This approach particularly suffers from the limitation related to slowing down the execution: the overhead for full monitoring is considerably high (between 13% and 64% in some cases). The authors propose an alternative solution in which they record only the methods surrounding the crash. For this to work, the crash has to occur at least once so that the information causing the crash can be used to identify the methods surrounding it. ReCrash was able to reproduce 100% (11/11) of the submitted bugs. Similar to ReCrash, JRapture [176] is a capture/replay tool for observation-based testing. The tool captures the execution of Java programs to replay it in-house. To capture the execution of a Java program, the creators of JRapture used their own version of the Java Virtual Machine (JVM) and a lightweight, transparent capture process. Using a customized JVM allows capturing any interaction between a Java program and the system, including GUI, file, and console inputs. These interactions can be replayed later with exactly the same input sequence as seen during the capture phase. However, using a custom JVM is not a practical solution. This is because the

authors' approach requires users to install a JVM that might have discrepancies with the original one and yield bugs if used with other software applications. In our view, JRapture fails to address the limitations caused by instrumentation because it imposes the installation of another JVM that can also monitor software systems other than the intended ones. RECORE (REconstructing CORE dumps) is a tool proposed by Rößler et al. that instruments Java bytecode to wrap every method in a try-catch block while keeping a quasi-null overhead [165]. RECORE starts from the core dump and tries (with evolutionary algorithms) to reproduce the same dump by executing the subject program many times. When the generated dump matches the collected one, the approach has found the set of inputs responsible for the failure; RECORE was able to reproduce 85% (6/7) of the submitted bugs. The approaches presented so far operate at the code level. There also exist techniques that focus on recording user-GUI interactions [71, 163]. Roehm et al. extract the recorded data using delta debugging [210], sequential pattern mining, and their combination to reproduce between 75% and 90% of the submitted bugs while pruning 93% of the actions. Among the approaches presented here, only the ones proposed by Clause et al. and Burg et al. address the limitations incurred by the need for external data, at the cost, however, of privacy. To address the limitations caused by instrumentation, the RECORE approach proposes to let users choose where to put the bar between the speed of the subject program, privacy, and bug reproduction efficiency. As an example, users can choose whether or not to contribute to improving the software, a policy employed by major players such as Microsoft in Visual Studio or Mozilla in Firefox, and select different types of monitoring where the cost in terms of speed, privacy leaks, and efficiency for reproducing the bug is clearly explained.

3.5.2 In-house Crash Explanation

On the other side of the spectrum, we find tools and approaches belonging to in-house crash explanation (or understanding), which are fewer but newer than on-field record and replay tools. Jin et al. proposed BugRedux for reproducing field failures for in-house debugging [84]. The tool aims to synthesize in-house executions that mimic field failures. To do so, the authors use several types of data collected in the field such as stack traces, the crash

stack at the point of failure, and call sequences. The data that successfully reproduces the field crash is sent to software developers to fix the bug. BugRedux relies on several in-house executions that are synthesized so as to narrow down the search scope, find the crash location, and finally reproduce the bug. However, these in-house executions have to be conducted before the work on the bug really begins. Also, the in-house executions suffer from the same limitation as unit testing, i.e., the executions are based on the developer's knowledge and ability to write exceptional scenarios in addition to the normal ones. Based on the success of BugRedux, the authors built F3 (Fault localization for Field Failures) [85] and MIMIC [220]. F3 performs many executions of a program on top of BugRedux in order to cover different paths leading to the fault. It then generates many passing and failing paths, which can lead to a better understanding of the bug. The authors also use grouping, profiling and filtering to improve the fault localization process. MIMIC further extends F3 by comparing a model of correct behavior to failing executions and identifying violations of the model as potential explanations for failures. While being close to our approach, BugRedux and F3 may require the call sequence and/or the complete execution trace in order to achieve bug reproduction. When using only the crash traces (referred to as the call stack at crash time in their paper), the success rate of BugRedux drops significantly to 37.5% (6/16). The call sequence and the complete execution trace required to reach 100% accuracy can only be obtained through instrumentation, with an overhead ranging from 10% to 1066%. Chronicle [20] is an approach that supports remote debugging by capturing inputs to the application through code instrumentation. The approach seems to have a low overhead on the instrumented application, around 10%. Likewise, Zamfir et al. proposed ESD [207], an execution synthesis approach that automatically synthesizes failure executions using only the stack trace information. However, this stack trace is extracted from the core dump and may not always contain the components that caused the crash. To the best of our knowledge, the most complete work in this category is that of Chen in his PhD thesis [32]. Chen proposed an approach named STAR (Stack Trace based Automatic crash Reproduction). Using only the crash stack, STAR starts from the crash line and goes backward towards the entry point of the program. During this backward process, STAR computes the required conditions using an SMT solver

named Yices [50]. The objects that satisfy the required conditions are generated and orchestrated inside a JUnit test case. The test is run, and the resulting crash stack is compared to the original one. If both match, the bug is said to be reproduced. STAR aims to tackle the state explosion problem of reproducing a bug by reconstructing the events in a backward fashion, therefore saving numerous states to explore. STAR was able to reproduce 38 bugs out of 64 (54.6%). Also, STAR is relatively easy to implement as it uses Yices [50] and potentially Z3 [46] (stated in their future work), which are well-supported SMT solvers. Except for STAR, existing approaches that target the reproduction of field crashes require the instrumentation of the code or the running platform in order to save the call stack or the objects needed to successfully reproduce bugs. As discussed earlier, such approaches yield good results (37.5% to 100%), but the instrumentation can cause a massive overhead (1% to 1066%) while running the system. In addition, the data generated at run-time using instrumentation may contain sensitive information.

3.6 Bug Classification

Another field of study consists of assigning bug reports, automatically if possible, to the right developers during triaging [3, 25, 82, 182]. There also exist approaches that focus on how long it takes to fix a bug [24, 169, 213] and where it should be fixed [209, 214]. With the rapidly increasing number of bugs, the community has also been interested in prioritizing bug reports [98] and in predicting the severity of a bug [111]. Finally, researchers proposed approaches to predict which bugs will get reopened [120, 218], which bug reports are duplicates of others [23, 80, 187], and which locations are likely to yield new bugs [100, 103, 192]. However, to the best of our knowledge, there are not many attempts to classify bugs the way we present in this thesis. In her PhD thesis [55], Sigrid Eldh discussed the classification of trouble reports with respect to a set of fault classes that she identified. Fault classes include computational faults, logical faults, resource faults, function faults, etc. She conducted studies on Ericsson systems and showed the distributions of trouble reports with respect to these fault classes. A research paper was published on the topic [55]. Hamill et al. [66] proposed a classification of faults and failures in critical safety systems. They proposed several types of faults and showed how failures in critical safety systems relate to these

classes. They found that only a few fault types were responsible for the majority of failures. They also compared pre-release and post-release faults and showed that the distributions of fault types differ between pre-release and post-release failures. Another finding is that coding faults are the most predominant ones. Our study differs from these studies in that we focus on bugs and their fixes across a wide range of systems, programming languages, and purposes. This is done independently from a specific class of faults (such as coding faults, resource faults, etc.), because our aim is not to improve testing as is the case in the work of Eldh [55] and Hamill et al. [66]. Our objective is to propose a classification that allows researchers in the field of mining bug repositories to use the taxonomy as a new criterion in triaging, prediction, and reproduction of bugs. By analogy, we can look at the proposed bug taxonomy similarly to the clone taxonomy presented by Kapser and Godfrey [96]. The authors proposed seven types of source code clones and then conducted a case study, using their classification, on the file system module of an operating system. This clone taxonomy continues to be used by researchers to build better approaches for detecting a given clone type and to compare approaches with each other effectively.

Chapter 4

An Aggregated Bug Repository for Developers and Researchers

4.1 Introduction

Program debugging, an important software maintenance activity, is known to be challenging and labor-intensive. Studies have shown that the cost of software maintenance can reach up to 70% of the overall cost of the software development process [158]. When facing a new bug, one might want to leverage decades of open source software history to find a suitable solution. The chances are that a similar bug or crash has already been fixed somewhere in another open source project. The problem is that each open source project hosts its data in a different data repository, using different bug tracking and version control systems. Moreover, these systems have different interfaces to access data, and the data is not represented in a uniform way either. This is further complicated by the fact that bug tracking tools and version control systems are not necessarily connected: the former follows the life of the bug, while the latter manages the fixes. As a general practice, developers create a link between the bug report system and the version control tool by either writing the bug #ID in their commit message or adding a link to the changeset as a comment in the bug report. As a result, one would have to search the version control system repository to find candidate solutions. Moreover, developers mainly use classical search engines that index specialized

sites such as StackOverflow1. These sites are organized in a question-and-answer format where a developer submits a problem and receives answers from the community. While the answers are often accurate and precise, they do not leverage the history of open source software, which has been shown to provide useful insights to help with many maintenance activities such as bug fixing [169], bug reproduction [138], fault analysis [143], etc. In this chapter, we introduce BUMPER (BUg Metarepository for dEvelopers and Researchers), a web-based infrastructure that can be used by software developers and researchers to access data from diverse repositories transparently using natural language queries, regardless of where the data was originally created and hosted. The idea behind BUMPER is that it can connect to any bug tracking and version control system and download the data into a single database. We created a common schema that represents data stored in various bug tracking and version control systems. BUMPER uses a web-based interface to allow users to search the aggregated database by expressing queries through a single point of access. This way, users can focus on the analysis itself and not on the way the data is represented or located. BUMPER supports many features including: (1) the ability to use multiple bug tracking and version control systems, (2) the ability to search very large data repositories efficiently using both natural language and a specialized query language, (3) the mapping between the bug reports and the fixes, and (4) the ability to export the search results in JSON, CSV and XML formats.
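To illustrate the linking convention mentioned above (writing the bug #ID in the commit message), the following minimal Java sketch extracts issue references from a commit message with a regular expression. This is our own illustration, not part of BUMPER; the class name, the pattern, and the sample message are hypothetical.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IssueLinker {

    // Matches references such as "#1234" or JIRA-style keys such as "AXIS2-567".
    private static final Pattern ISSUE_REF =
            Pattern.compile("(#\\d+|[A-Z][A-Z0-9]+-\\d+)");

    /** Returns every issue reference found in a commit message. */
    public static List<String> extractIssueIds(String commitMessage) {
        List<String> ids = new ArrayList<>();
        Matcher m = ISSUE_REF.matcher(commitMessage);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(extractIssueIds("Fix NPE in CsvFileUtils, closes #42 and AXIS2-567"));
        // prints: [#42, AXIS2-567]
    }
}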

4.2 Approach

4.2.1 Architecture

Figure 6 shows the overall architecture of BUMPER. BUMPER relies on a highly scalable architecture composed of two Virtual Private Servers (VPS) hosted on a physical server. The first server handles the web requests and runs three distinct components: Pound, Varnish, and NginX. Pound is a lightweight open source reverse proxy and application firewall. It also serves to decode HTTPS requests into HTTP. Translating an HTTPS request to HTTP allows us to save the HTTPS decryption time required on each

1http://stackoverflow.com/


match the query, especially if there is a union or disjunction of records. Unlike a traditional RDBMS, BUMPER relies on compressed bitsets to store its indexes. Bitsets are one of the simplest, and oldest, data structures: they contain only 0s and 1s. BUMPER supports binary operations such as AND (intersection), OR and XOR that can be performed in less than a second, even for millions of records. As an example, if we wish to retrieve bug reports that contain the words null pointer exception and have a changeset containing a try/catch, a binary intersection is performed between the two sets of documents, which is much faster than selecting the bug reports that match null pointer exception first and then checking whether they have a changeset containing a try/catch, as would be the case with an RDBMS. This technique comes with a high overhead for index updates compared to an RDBMS but, in practice, information retrieval tends to be much faster. In our case, we want to provide fast access to decades of open source historical data. We periodically update our indexes (at night) when a sufficient amount of new data has been downloaded from the bug tracking and version control systems that BUMPER supports.
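To illustrate why compressed bitsets make such intersections fast, here is a minimal, self-contained Java sketch (ours, not BUMPER code) in which each bit position stands for a document id and intersecting two criteria is a single bitwise AND; the document ids are made up for the example.

import java.util.BitSet;

public class BitsetIntersectionDemo {
    public static void main(String[] args) {
        int numberOfDocuments = 1_000_000;

        // Bit i is set if document i matches the corresponding criterion.
        BitSet matchesNullPointerException = new BitSet(numberOfDocuments);
        BitSet changesetContainsTryCatch = new BitSet(numberOfDocuments);

        // Hypothetical postings: documents 10, 42 and 99 contain the exception text,
        // documents 42 and 77 have a changeset with a try/catch block.
        matchesNullPointerException.set(10);
        matchesNullPointerException.set(42);
        matchesNullPointerException.set(99);
        changesetContainsTryCatch.set(42);
        changesetContainsTryCatch.set(77);

        // The intersection of the two criteria is a single bitwise AND.
        BitSet both = (BitSet) matchesNullPointerException.clone();
        both.and(changesetContainsTryCatch);

        System.out.println("Matching documents: " + both); // prints: {42}
    }
}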

4.2.3 Bumper Query Language and API

BUMPER supports two query modes: basic and advanced. The basic query mode inserts the user's input (YOUR TERMS) into the following query:

(type:"BUG" AND report_t:"YOUR TERMS" AND -churns:0)                      (1)

The first part of the query matches all bug reports (type:"BUG") that contain YOUR TERMS in the report_t index (report_t:"YOUR TERMS"). Finally, it excludes bug reports that do not have a fix (-churns:0). The result of this subquery is merged with the following:

OR ({!parent which="type:BUG"} type:"CHANGESET" AND fix_t:"YOUR TERMS")   (2)

This query selects all the bugs' changesets, using a parent-child relationship ({!parent which="type:BUG"} type:"CHANGESET"), that contain YOUR TERMS

in the fix_t index (fix_t:"YOUR TERMS"). Finally, the results of the two previous queries are added to the following subquery:

OR ({!parent which="type:BUG"} {!parent which="type:CHANGESET"}
    type:"HUNKS" AND fix_t:"YOUR TERMS")                                  (3)

This query selects all the hunks that are children of changesets and grandchildren of bug reports ({!parent which="type:BUG"} {!parent which="type:CHANGESET"} type:"HUNKS") and that contain YOUR TERMS in the fix_t index. This composed query searches efficiently for YOUR TERMS in bug reports, commit messages and source code all together. The advanced query mode allows users to write their own queries using the indexes they want and the unions or disjunctions they need. As an example, using the advanced query mode, one could write the following:

(type:"BUG" AND report_t:"Exception"
 AND (project:"Axis2" OR project:"ide")
 AND (reporter:"Rich" OR resolution:"fixed")
 AND (severity:"Major" OR fixing_time:[10 TO *])
 AND -churns:0)                                                           (4)

This query finds all bug reports that contain Exception in the report_t index (first line), belong to the Axis2 or the ide project (second line), have been reported by someone named Rich or have fixed as a resolution (third line), have a Major severity or a fixing time greater than 10 days (fourth line), and have a fix (fifth line).
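As an illustration of how the basic mode could assemble the composed query of subqueries (1), (2) and (3), here is a small sketch of ours (not BUMPER's actual implementation); the index names follow the queries above and escaping of special characters in the terms is ignored.

public class BasicQueryBuilder {

    /** Assembles the composed basic-mode query, i.e., subqueries (1), (2) and (3), for the given terms. */
    public static String build(String terms) {
        String bugs = "(type:\"BUG\" AND report_t:\"" + terms + "\" AND -churns:0)";
        String changesets = "({!parent which=\"type:BUG\"} "
                + "type:\"CHANGESET\" AND fix_t:\"" + terms + "\")";
        String hunks = "({!parent which=\"type:BUG\"} {!parent which=\"type:CHANGESET\"} "
                + "type:\"HUNKS\" AND fix_t:\"" + terms + "\")";
        return bugs + " OR " + changesets + " OR " + hunks;
    }

    public static void main(String[] args) {
        System.out.println(build("null pointer exception"));
    }
}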

4.2.4 Bumper Data Repository

Currently, BUMPER supports four bug report management systems, namely Gnome, Eclipse, Netbeans and the Apache Software Foundation, which are composed of 512, 190, 39 and 349 projects, respectively, bringing the total number of projects supported by BUMPER to 1,930. These projects cover 16 years of development, from 1999 to 2015.

Table 1: Resolved/Fixed Bug Reports (R/F BR), Changesets (CS), Files, and Projects by dataset

Dataset     R/F BR     CS          Files      Projects
Gnome       550,869    1,231,354   367,245    512
Netbeans    53,258     122,632     30,595     39
Apache      49,449     106,366     38,111     349
Eclipse     78,830     184,900     21,712     190
Total       732,406    1,645,252   457,663    1,930

Gnome is a free desktop environment mainly developed in C and C++. Eclipse and Netbeans are integrated development environments (IDEs) for developing with many programming languages, including Java, PHP, and C/C++. Finally, the Apache Software Foundation (ASF) is a non-profit public charity established in 1999 that provides services and support for many like-minded software project communities of individuals who choose to join the ASF. The extracted data is consolidated in one database where we associate each bug report with its fix. The fixes are mined from different types of version control systems. Gnome, Eclipse and Apache Software Foundation projects are based on Git (or have Git-based mirrors), whereas Netbeans uses Mercurial. The characteristics of the four datasets aggregated in BUMPER are presented in Table 1. As we can see from the table, our consolidated dataset contains 732,406 bugs, 1,645,252 changesets, 457,663 files that have been modified to fix the bugs, and 1,930 distinct software projects belonging to four major organizations. We also collected more than two billion lines of code impacted by the changesets and identified tens of thousands of sub-projects and unique contributors to these bug report systems.

4.3 Experimental Setup

An example of a real-life scenario where BUMPER can be useful is that of a developer trying to fix a bug. He or she could copy and paste the faulty code, or the obtained error, into the search bar of BUMPER. BUMPER then returns, in seconds, every bug report in our dataset that contains references to the error at hand, either in the report or in the code developed to fix the bug. The developer can then leverage BUMPER's search results to find out whether someone has already fixed a similar bug and examine the corresponding fix.


Participants were given the following bug report:

"I got the following Exception: Exception in thread "main" java.lang.NullPointerException at CsvFileUtils.readOneLine(CsvFileUtils.java:22) at Main.main(Main.java:11). Can you please fix CsvFileUtils?"

The web interface of BUMPER, presented in Figure 8, was to be used for this task. One possible solution to this bug is to apply the following diff to the CsvFileUtils file. Lines preceded by a minus are removed while lines preceded by a plus are added.

  public String[] readOneLine() throws IOException {
      if (reader == null) {
          reader = new BufferedReader(new FileReader(file));
      }
-     String[] result = reader.readLine().split(";");
+     String line = null;
+     if ((line = reader.readLine()) != null) {
+         return line.split(";");
+     }
-     return result;
+     return null;
  }

4.4 Empirical Validation

Participants were asked to find a code snippet that can be slightly modified in order to fix a bug, using BUMPER and Google. The goal of this preliminary study was to compare how fast a suitable fix can be found using BUMPER and Google. We sent our survey to 20 participants, received eight responses (40%), and asked the respondents to report on their experience in terms of the time taken to find a fix and the number of Web pages that were browsed. Half of the participants began by using Google and then BUMPER, while the other half did the opposite. Participants reported that, when using Google, it took on average 6.94 minutes and required examining, on average, 7.57 online sources

to find a fix. When using BUMPER, however, it only took around 4.5 minutes, and the participants only needed to use BUMPER and no other websites. The feedback we received from the participants was very encouraging. They mainly emphasized the ability of BUMPER to group many repositories in one place, making the search for similar bugs and fixes practical. We intend, in the future, to conduct a larger-scale (and more formal) experiment with BUMPER for a thorough evaluation of its effectiveness in helping software developers fix bugs.

4.5 Threats to Validity

The main threat to validity is our somewhat limited validation, as we only received eight responses. While the responses from these eight developers are encouraging, a wider human study could invalidate them. We also see BUMPER as an important tool for facilitating research in the area of mining bug repositories. Studying software repositories to gain insights into the quality of the code is a common practice; this task requires time and skills in order to download and link all the pieces of information needed for adequate mining. BUMPER provides a straightforward interface to export bugs and their fixes into CSV, XML and JSON. In short, BUMPER offers a framework that can be used by software practitioners and researchers to analyse bugs and their fixes efficiently without having to go from one repository to another, worry about the way data is represented and saved, or create tools for parsing and retrieving various data attributes. We hope that the community will contribute by adding more repositories to BUMPER. This way, BUMPER can become a unified environment that facilitates bug analysis and mining tasks.

4.6 Chapter Summary

In this chapter, we presented a web-based infrastructure called BUMPER (BUg Metarepository for dEvelopers and Researchers). BUMPER allows natural language searches in bug reports, commit messages and source code all together, while supporting complex queries. Currently, BUMPER is populated with 1,930 projects, more

than 732,406 resolved/fixed bug reports, and 1,645,252 changesets from Eclipse, Gnome, Netbeans and the Apache Software Foundation. The speed of BUMPER allows developers to use it as a way to leverage decades of history scattered over hundreds of software projects in order to find existing solutions to their problems. There exist tools such as Codemine [41] and Boa [51] that enable the mining and reporting of code repositories. These tools, however, require downloading all the data and running the mining for each installation. To the best of our knowledge, no attempt has been made towards building unified and online datasets, from multiple sources, where researchers and engineers can easily access all the information related to a bug or a fix in order to leverage decades of historical information. Moreover, the feedback we received from the users of BUMPER in a preliminary study shows that BUMPER holds real potential in becoming a standard search engine for bugs and fixes in multiple repositories. This online dataset was used extensively in the remainder of this thesis.

Chapter 5

Preventing Code Clone Insertion At Commit-Time

5.1 Introduction

Code clones appear when developers reuse code with little to no modification to the original code. Studies have shown that clones can account for up to 50% of the code in a given software system [14, 49]. Developers often reuse code on purpose [99], but code clones are considered a bad practice in software development if they are not managed. Indeed, if a bug is found in one instance of a clone, it is challenging to fix the bug in all the other occurrences without clone-detection and clone-management tools [90, 95, 119]. In the last two decades, there have been many studies and tools that aim at detecting clones. They can be grouped into two categories depending on whether they operate locally on a developer's workstation (e.g., [171, 215]) or remotely on a server (e.g., [205, 212]). Local clone detection approaches are typically implemented as IDE plugins or external tools. IDE-based methods tend to issue many warnings to developers that may interrupt their work, hence hindering their productivity [104]. Developers may be reluctant to use external tools unless they are involved in a major refactoring effort, as these tools cause overhead because of the context switching they provoke [19, 161, 162]. This problem is somewhat similar to the problem of adopting bug identification

tools. Studies have shown that these tools are challenging to use because they do not integrate well with the day-to-day workflow of developers [11, 59, 86, 114, 118]. The problem with remote approaches is that the detection occurs too late in the development process. Once the clones reach the central repository, they can be pulled by other members of the development team, further complicating the removal and management of clones. In this chapter, we present PRECINCT (PREventing Clones INsertion at Commit-Time), which focuses on preventing the insertion of clones at commit-time (i.e., before they reach the central code repository). PRECINCT is a trade-off between local and remote approaches. The approach relies on the pre-commit hook capabilities of modern source code version control systems. A pre-commit hook is a process that one can implement to receive the latest modification to the source code made by a given developer just before the code reaches the central repository. PRECINCT intercepts this modification and analyses its content to see whether a suspicious clone has been introduced or not. A flag is raised if a code fragment is suspected to be a clone of an existing code segment. Only a fraction of the code is analysed, incrementally, making PRECINCT computationally efficient; in other words, PRECINCT only operates on the parts of the program that changed. To the best of our knowledge, PRECINCT is the first clone detection technique that operates at commit-time. In this study, we focus on Type 3 clones as they are more challenging to detect [21, 94, 105]. Since Type 3 clones include Type 1 and 2 clones, these types can be detected by PRECINCT as well. We evaluated the effectiveness of PRECINCT on three systems, written in C and Java. The results show that PRECINCT prevents Type 3 clones from reaching the final source code repository with an average accuracy of 97.7%.
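To make the clone types concrete, the two fragments below form an illustrative Type 3 clone pair of our own (not taken from the studied systems): the second fragment renames identifiers and adds a statement, so it is neither an exact copy (Type 1) nor a copy with renaming only (Type 2).

public class CloneExample {

    // Original fragment.
    static int sum(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Type 3 clone of sum(): identifiers renamed and a statement added.
    static int sumPositive(int[] numbers) {
        int acc = 0;
        for (int n : numbers) {
            if (n > 0) {      // the added guard makes this a Type 3, not a Type 1/2, clone
                acc += n;
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[]{1, -2, 3}) + " " + sumPositive(new int[]{1, -2, 3})); // 2 4
    }
}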

5.2 Approach

The PRECINCT approach is composed of six steps. The first and last steps are typical steps that a developer goes through when committing code. The first step is the commit step, where developers send their latest changes to the central repository, and the last step is the reception of the commit by the central repository. The second step is the pre-commit hook, which kicks in as the first operation when one wants to commit.


5.2.2 Pre-Commit Hook

Hooks are custom scripts set to fire off when critical actions occur. There are two groups of hooks: client-side and server-side. Client-side hooks are triggered by operations such as committing and merging, whereas server-side hooks run on network operations such as receiving pushed commits. These hooks can be used for all sorts of reasons, such as checking compliance with coding rules or automatically running unit test suites. The pre-commit hook is run first, before one even types in a commit message. It is used to inspect the snapshot that is about to be committed. Depending on the exit status of the hook, the commit will be aborted and not pushed to the central repository. Also, developers can choose to bypass the pre-commit hook. In Git, for example, they need to use the command git commit --no-verify instead of git commit. This can be useful in case of an urgent need to fix a bug where the code has to reach the central repository as quickly as possible. Developers can use hooks to check for code style, check for trailing white spaces (the default hook does exactly this), or check for appropriate documentation on new methods. PRECINCT is a set of bash scripts whose entry point lies in the pre-commit hook. Pre-commit hooks are easy to create and implement. Note that even though we use Git as the main version control system to present PRECINCT, the techniques presented in this chapter are readily applicable to other version control systems.

5.2.3 Extract and Save Blocks

A block is a set of consecutive lines of code that will be compared to all other blocks to identify clones. To achieve this critical part of PRECINCT, we rely on TXL [39], a first-order functional programming language over linear term rewriting developed by Cordy et al. [39]. For TXL to work, one has to write a grammar describing the syntax of the source language and the transformations needed. TXL has three main phases: parse, transform, and unparse. In the parse phase, the grammar controls not only the input but also the output form. The code sample below, extracted from the official documentation, shows a grammar matching an if-then-else statement in C with some special keywords: [IN] (indent), [EX] (exdent) and [NL] (newline) that will be used for the output form.

define if_statement
    if ( [expr] ) [IN] [NL]
        [statement] [EX]
    [opt else_statement]
end define

define else_statement
    else [IN] [NL]
        [statement] [EX]
end define

Then, the transform phase, as the name suggests, applies transformation rules that can, for example, normalize or abstract the source code. Finally, the third phase of TXL, called unparse, unparses the transformed parsed input in order to output it. TXL also supports what its creators call agile parsing [47], which allows developers to redefine the rules of the grammar and, therefore, apply different rules than the original ones. PRECINCT takes advantage of this by redefining the blocks that should be extracted for the purpose of clone comparison, leaving out the blocks that are out of scope. More precisely, before each commit, we only extract the blocks belonging to the modified parts of the source code. Hence, we only process, in an incremental manner, the latest modification of the source code instead of the source code as a whole. We selected TXL for several reasons. First, TXL is easy to install and to integrate with the standard workflow of a developer. Second, it was relatively easy to create a grammar that accepts commits as input. This is because TXL is shipped with C, Java, C#, Python and WSDL grammars that define all the particularities of these languages, with the ability to customize these grammars to accept changesets (chunks of modified source code that include the added, modified, and deleted lines) instead of the whole code. Algorithm 1 presents an overview of the "extract" and "save" blocks operations of PRECINCT. This algorithm receives as arguments the changesets, the blocks that have been previously extracted, and a boolean named compare_history. Lines 1 to 9 contain the for loop that iterates over the changesets. For each changeset

(Line 1), we extract the blocks by calling the extract_blocks(Changeset cs) function. In this function, we expand the changeset to the left and to the right in order to obtain a complete block, as in the following example:

@@ -315,36 +315,6 @@ int initprocesstree_sysdep
(ProcessTree_T **reference) {
        mach_port_deallocate(mytask,
        task);
      }
    }
-   if (task_for_pid(mytask, pt[i].pid,
-       &task) == KERN_SUCCESS) {
-     mach_msg_type_number_t count;
-     task_basic_info_data_t taskinfo;

As depicted above, changesets contain only the modified chunks of code and not necessarily complete blocks. Indeed, we have a block from Line 2 to Line 6 and deleted lines from Lines 7 to 10. However, the deleted lines starting at Line 7 belong to a block whose beginning we do not have. Therefore, we need to expand the changeset to the left to obtain syntactically correct blocks. We do so by checking the block's beginning and ending (using { and } in C, for example). Then, we send these expanded changesets to TXL for block extraction and formalization. For each extracted block, we check if the current block overrides (replaces) a previous block (Line 4). A block is a piece of code that contains the opening and closing attributes of a code block (i.e., { and } for Java) and is at least 5 lines long. In such a case, we delete the previous block as it no longer represents the current version of the program (Line 5). We also have an optional step in PRECINCT, defined in Line 4: the compare_history flag is a condition for deleting overridden blocks. We believe that deleted blocks have been removed for a good reason (bug, defect, removed feature, etc.), and if a newly inserted block matches an old one, it could be worth knowing in order to improve the quality of the system at hand. In summary, this step receives the files and lines modified by the latest changes made by the developer and produces, in an incremental way, an up-to-date block representation of the system at hand. The blocks are analysed in the next step to discover potential clones.

Data: Changeset[] changesets; Block[] prior_blocks; Boolean compare_history
Result: Up-to-date blocks of the system
1  for i ← 0 to size of changesets do
2      Block[] blocks ← extract_blocks(changesets[i]);
3      for j ← 0 to size of blocks do
4          if not compare_history AND blocks[j] overrides one of prior_blocks then
5              delete the overridden prior block;
6          end
7          write blocks[j];
8      end
9  end

   Function extract_blocks(Changeset cs)
       if cs is unbalanced right then
           cs ← expand_left(cs);
       else if cs is unbalanced left then
           cs ← expand_right(cs);
       end
       return txl_extract_blocks(cs);

Algorithm 1: Overview of the Extract Blocks Operation
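To illustrate the balancedness test that extract_blocks relies on, here is a rough Java sketch of our own (PRECINCT itself is implemented as bash scripts and TXL); it naively counts braces, ignoring braces inside strings and comments, to decide in which direction a changeset must be expanded.

public class ChangesetBalance {

    /** Result of a naive brace count over a changeset. */
    static final class Balance {
        final boolean needsExpandLeft;   // a '}' appeared before its matching '{'
        final boolean needsExpandRight;  // some '{' was never closed
        Balance(boolean left, boolean right) {
            this.needsExpandLeft = left;
            this.needsExpandRight = right;
        }
        @Override public String toString() {
            return "expandLeft=" + needsExpandLeft + ", expandRight=" + needsExpandRight;
        }
    }

    static Balance check(String changeset) {
        int depth = 0;
        boolean closingBeforeOpening = false;
        for (char c : changeset.toCharArray()) {
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                if (depth == 0) {
                    closingBeforeOpening = true; // this '}' closes a block opened before the changeset
                } else {
                    depth--;
                }
            }
        }
        return new Balance(closingBeforeOpening, depth > 0);
    }

    public static void main(String[] args) {
        System.out.println(check("    }\n  }\n}\n- if (task_for_pid(t, p, &x) == KERN_SUCCESS) {"));
        // prints: expandLeft=true, expandRight=true
    }
}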

5.2.4 Compare Extracted Blocks

To compare the extracted blocks and detect potential clones, we can only resort to text-based techniques. This is because lexical and syntactic analysis approaches (alternatives to text-based comparison) would require a complete program that compiles. Among the relatively wide range of tools and techniques that detect clones by considering code as text [49, 87, 88, 122, 124, 201], we selected NICAD as the main text-based method for comparing clones [40], for several reasons. First, NICAD is built on top of TXL, which we also use in the previous step. Second, NICAD can detect Type 1, 2 and 3 clones. NICAD works in three phases: Extraction, Comparison and Reporting. During the Extraction phase, all potential clones are identified, pretty-printed, and extracted. We do not use the Extraction phase of NICAD as it has been built to work on programs that are syntactically correct, which is not the case for changesets; we replaced NICAD's Extraction phase with our own, described in the previous section. In the Comparison phase, extracted blocks are transformed, clustered and compared to find potential clones. Using TXL sub-programs, blocks go through a process called pretty-printing where they are stripped of formatting and comments. When code fragments are cloned, comments, indentation or spacing are often changed to fit the new context where the code is used. The pretty-printing process ensures that all code has the same spacing and formatting, which makes the comparison of code fragments easier. Furthermore, in the pretty-printing process, statements can be broken down into several lines. Table 2 [167] shows how this can improve the accuracy of clone detection with three for statements: for (i=0; i<10; i++), for (i=1; i<10; i++) and for (j=2; j<100; j++). The pretty-printing allows NICAD to detect Segments 1 and 2 as a clone pair because only the initialization of i is changed. This specific example would not have been marked as a clone by other tools we tested, such as Duploc [49]. In addition to pretty-printing, the code can be normalized and filtered to detect different classes of clones and match user preferences. Finally, the extracted, pretty-printed, normalized and filtered blocks are marked as potential clones using a Longest Common Subsequence (LCS) algorithm [78]. Then, a percentage of unique statements can be computed and, depending on a given threshold (see Section 5.4), the blocks are marked as clones.

Table 2: Pretty-Printing Example

                   Segment 1   Segment 2   Segment 3   S1 & S2   S1 & S3   S2 & S3
                   for (       for (       for (       1         1         1
                   i = 0;      i = 1;      j = 2;      0         0         0
                   i < 10;     i < 10;     j < 100;    1         0         0
                   i++)        i++)        j++)        1         0         0
Total Matches                                          3         1         1
Total Mismatches                                       1         3         3
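To make the comparison step concrete, the following sketch (ours, not NICAD's actual code) computes the longest common subsequence of two pretty-printed blocks, line by line, and flags them as a clone pair when the proportion of unmatched lines is below a dissimilarity threshold; the 30% value used here is only an example.

import java.util.Arrays;
import java.util.List;

public class LcsCloneCheck {

    /** Length of the longest common subsequence of two lists of pretty-printed lines. */
    static int lcsLength(List<String> a, List<String> b) {
        int[][] dp = new int[a.size() + 1][b.size() + 1];
        for (int i = 1; i <= a.size(); i++) {
            for (int j = 1; j <= b.size(); j++) {
                dp[i][j] = a.get(i - 1).equals(b.get(j - 1))
                        ? dp[i - 1][j - 1] + 1
                        : Math.max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
        return dp[a.size()][b.size()];
    }

    /** Two blocks form a clone pair if the fraction of unmatched lines is below the threshold. */
    static boolean isClonePair(List<String> a, List<String> b, double dissimilarityThreshold) {
        int lcs = lcsLength(a, b);
        int longest = Math.max(a.size(), b.size());
        double unmatched = (longest - lcs) / (double) longest;
        return unmatched <= dissimilarityThreshold;
    }

    public static void main(String[] args) {
        List<String> s1 = Arrays.asList("for (", "i = 0;", "i < 10;", "i++)");
        List<String> s2 = Arrays.asList("for (", "i = 1;", "i < 10;", "i++)");
        System.out.println(isClonePair(s1, s2, 0.3)); // true: only the initialization differs
    }
}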

The last step of NICAD, which acts as our clone comparison engine, is the reporting. However, to prevent PRECINCT from outputting a large amount of data (an issue that many clone detection techniques face), we implemented our own reporting system, which is also well embedded in the workflow of developers. This reporting system is the subject of the next section. In summary, this step receives potentially expanded and balanced blocks from the extraction step. Then, the blocks are pretty-printed, normalized, filtered and fed to an LCS algorithm to detect potential clones. Moreover, clone detection in PRECINCT is less intensive than in NICAD because we only compare the latest changes with the rest of the program instead of comparing all the blocks with each other.

5.2.5 Output and Decision

In this final step, we report the results of the clone detection performed at commit-time on the latest changes made by the developer. The process is straightforward. Every change made by the developer goes through the previous steps and is checked for the introduction of potential clones. For each file that is suspected to contain a clone, one line is printed to the command line with the following options: (I) Inspect, (D) Disregard, (R) Remove from the commit. In comparison to this straightforward and interactive output, NICAD outputs every detail of the detection result, such as the total number of potential clones, the total number of lines, the total number of unique line text characters, the total number of unique lines, and so on. We think that so many details can make it hard for developers to react to the results, a problem that was also raised by Johnson et al. [86] when examining bug detection tools. The potential clones are then stored in XML files that can be viewed using an Internet browser or a text editor.
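As a sketch of what such an interactive prompt could look like (our illustration only; PRECINCT's actual reporting is implemented in its bash scripts), the following Java snippet prints one line per suspect file and reads the developer's decision; the file names are hypothetical.

import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

public class CloneReportPrompt {
    public static void main(String[] args) {
        // Hypothetical files suspected to contain clones in the current commit.
        List<String> suspectFiles = Arrays.asList("src/CsvFileUtils.java", "src/Main.java");
        Scanner in = new Scanner(System.in);

        for (String file : suspectFiles) {
            System.out.print(file + " may introduce a clone. (I)nspect, (D)isregard, (R)emove from commit? ");
            String choice = in.nextLine().trim().toUpperCase();
            switch (choice) {
                case "I": System.out.println("Opening clone report for " + file); break;
                case "R": System.out.println("Removing " + file + " from the commit"); break;
                default:  System.out.println("Keeping " + file + " in the commit");
            }
        }
    }
}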


Table 3: List of Target Systems in Terms of Revisions, Files, Kilo Lines of Code (KLOC) at the Current Version, and Language

SUT        Revisions   Files   KLOC   Language
Monit      826         264     107    C
Jhotdraw   735         1984    44     Java
dnsjava    1637        233     47     Java

Table 3 shows the target systems in terms of revisions, files, KLOC, and the language in which the system is written. Monit is a small open source utility for managing and monitoring Unix systems. Monit can conduct automatic maintenance and repair and supports the ability to identify causal actions to detect errors. This system is written in C and composed of 826 revisions and 264 files; the latest version has 107 KLOC. We chose Monit as a target system because it was one of the systems NICAD was tested on. JHotDraw is a Java GUI framework for technical and structured graphics. It has been developed as a "design exercise", and its design is largely based on the use of design patterns. JHotDraw is composed of 735 revisions and 1,984 files, and the latest revision has 44 KLOC. It is written in Java, and researchers often use it as a test bench. JHotDraw was also used by NICAD's developers to evaluate their approach. Dnsjava is a tool implementing the DNS (Domain Name System) mechanisms in Java. This tool can be used for queries, zone transfers, and dynamic updates. It is not as large as the other two systems, but it still makes an interesting subject because it has been well maintained for the past decade and, consequently, has a large number of revisions. This tool is also used in many other popular tools such as Aspirin, Muffin and Scarab. Dnsjava is composed of 1,637 revisions and 233 files; the latest revision contains 47 KLOC. We chose this system because we are familiar with it, having used it before [138, 139]. As our approach relies on pre-commit hooks to detect possible clones during the development process (more particularly at commit-time), we had to find a way to replay past commits. To do so, we cloned our test subjects and created a new branch called PRECINCT_EXT. When created, this branch is reinitialized at the initial state of the project (the first commit), and each commit is replayed as it was originally made. At each commit, we store the time taken for PRECINCT to run as well as the number of detected clone pairs. We also compute the size of the


Figure 12: Monit clone detection over revisions (clone pairs per revision: NICAD detection, PRECINCT detection, and remaining clones)

PRECINCT detects Type 1 and 2 clones as well, and so does NICAD. For the time being, PRECINCT is not designed to detect Type 4 clones, which use different implementations of the same functionality. Detecting Type 4 clones is part of future work.

We assess the performance of PRECINCT in terms of precision, recall, and F1-measure by using NICAD's results as the ground truth. These measures are computed using TP (true positives), FP (false positives), and FN (false negatives), which are defined as follows:

• TP: the number of clones that were properly detected by PRECINCT (again, using NICAD's results as the baseline)
• FP: the number of non-clones that were classified by PRECINCT as clones
• FN: the number of clones that were not detected by PRECINCT
• Precision: TP / (TP + FP)
• Recall: TP / (TP + FN)
• F1-measure: 2 · (precision · recall) / (precision + recall)
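For concreteness, here is a small self-contained sketch (ours, with made-up counts) that computes these measures:

public class DetectionMetrics {

    static double precision(int tp, int fp) { return tp / (double) (tp + fp); }

    static double recall(int tp, int fn) { return tp / (double) (tp + fn); }

    static double f1(double precision, double recall) {
        return 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // Hypothetical counts for one revision.
        int tp = 95, fp = 5, fn = 0;
        double p = precision(tp, fp);   // 0.95
        double r = recall(tp, fn);      // 1.0
        System.out.printf("precision=%.2f recall=%.2f F1=%.2f%n", p, r, f1(p, r));
    }
}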

5.4 Empirical Validation

Figures 12, 13 and 14 show the results of our study in terms of clone pairs detected per revision for our three subject systems: Monit, JHotDraw and Dnsjava. We used the clone pairs detected by NICAD as the baseline for comparison. The blue line shows the clone detection performed by NICAD. The red line shows the clone pairs detected by PRECINCT. The brown line shows the clone pairs that have been

Figure 13: JHotDraw clone detection over revisions (clone pairs per revision: NICAD detection, PRECINCT detection, and remaining clones)

Figure 14: Dnsjava clone detection over revisions (clone pairs per revision: NICAD detection, PRECINCT detection, and remaining clones)

missed by PRECINCT. As we can quickly see, the blue and red lines almost overlap, which indicates a good accuracy of the PRECINCT approach.

Table 4: Overview of PRECINCT's results in terms of precision, recall, F1-measure, execution time and output reduction

SUT        Detected   Precision   Recall   F1-measure   NICAD's Avg.     PRECINCT's Avg.   Overall Output
                                                        Execution Time   Execution Time    Reduction
Monit      123        96.1%       100%     98%          2.2s             0.9s              88.3%
JHotDraw   6490       98.3%       100%     99.1%        5.1s             1.7s              70.1%
DnsJava    226        82.8%       100%     90.6%        1.8s             1.1s              88.6%
Total      6839       97.7%       100%     98.8%        3s               1.2s              83.4%

Table 4 summarizes PRECINCT's results in terms of precision, recall, F1-measure, execution time and output reduction compared to NICAD for our three subject systems: Monit, JHotDraw, and Dnsjava. The first version of Monit contains 85 clone pairs, and this number stays stable until Revision 100. From Revision 100 to 472 the number of detected clone pairs varies between 68 and 88 before reaching 219 at Revision 473. The number of clone pairs goes down to 122 at Revision 491 and increases to 128 in the last revision. PRECINCT was able to detect 96.1% (123/128) of the clone pairs that are detected by NICAD, with 100% recall. It took on average around 1 second for PRECINCT to execute on a Debian 8 system with an Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz and 8 GB of DDR3 memory. It is also worth mentioning that the computer we used is equipped with an SSD (Solid State Drive), which impacts the running time as clone detection is a file-intensive operation. Finally, PRECINCT output up to 88.3% fewer lines than NICAD. JHotDraw starts with 196 clone pairs at Revision 1 and reaches a peak of 2048 at Revision 180. The number of clones continues to go up until Revisions 685 and 686, where the number of pairs is 1229, before peaking at 6538 from Revisions 687 to 721. PRECINCT was able to detect 98.3% of the clone pairs detected by NICAD (6490/6599) with 100% recall while executing on average in 1.7 seconds (compared to 5.1 seconds for NICAD). With JHotDraw, we can see the advantages of incremental approaches. Indeed, the execution time of PRECINCT is only loosely impacted by the number of files inside the system as the blocks are constructed incrementally. Also, we only compare the latest change to the rest of the program and not all the blocks to each other, as NICAD does. We were also able to reduce the number of lines output by 70.1%.

Finally, for Dnsjava, the number of clone pairs starts high, with 258 clones, and goes down until Revision 70, where it reaches 165. Another quick drop is observed at Revision 239, where we found only 25 clone pairs. The number of clone pairs then stays stable until Revision 1030, where it reaches 273. PRECINCT was able to detect 82.8% of the clone pairs detected by NICAD (226/273) with 100% recall while executing on average in 1.1 seconds, whereas NICAD took 3 seconds on average. PRECINCT outputs 83.4% fewer lines than NICAD. Overall, PRECINCT prevented 97.7% of the roughly 7,000 clones (in all systems) from reaching the central source code repository while executing more than twice as fast as NICAD (1.2 seconds compared to 3 seconds on average) and reducing the output, in terms of lines of text shown to developers, by 83.4% on average. Note that we have not evaluated the usability of PRECINCT's output compared to NICAD's; user studies are needed for this. We are, however, confident, based on our experience with many clone detection tools, that a simpler and more interactive way of presenting the results of a clone detection tool is warranted. PRECINCT aims to do just that. The difference in execution time between NICAD and PRECINCT stems from the fact that, unlike PRECINCT, NICAD is not an incremental approach. For each revision, NICAD has to extract all the code blocks and then compare all the pairs with each other. On the other hand, PRECINCT only extracts blocks when they are modified and only compares what has been changed with the rest of the program. The difference in precision between NICAD and PRECINCT (2.3%) can be explained by the fact that developers sometimes commit code that does not compile. Such commits still count as revisions, but TXL fails to extract blocks that do not comply with the target language syntax. While NICAD also fails in such cases, the disadvantage for PRECINCT comes from the fact that the failed block is saved and used as a reference until it is replaced by a correct one in another commit.

5.5 Threats to Validity

The selection of target systems is one of the common threats to validity for approaches aiming to improve the analysis of software systems. It is possible that the selected programs share common properties that we are not aware of, which would invalidate

our results. However, the systems analysed by PRECINCT are the same as the ones used in similar studies. Moreover, the systems vary in terms of purpose, size, and history. We also see a threat to validity stemming from the fact that we only used open source systems; the results may not generalize to industrial systems. We intend to undertake such studies in future work. The programs we used in this study are based on the Java and C programming languages, which can limit the generalization of the results. However, if one writes a TXL grammar for a new language, which can be a relatively hard task, then PRECINCT can support it, since PRECINCT relies on TXL. Finally, we use NICAD as the code comparison engine, so the accuracy of NICAD affects the accuracy of PRECINCT. This said, since NICAD has been tested on large systems, we are confident that it is a suitable engine for comparing code using TXL. Also, nothing prevents us from using other code comparison engines if need be. In conclusion, threats to internal and external validity have both been minimized by choosing a set of three different systems and using input data that can be found in any programming language and version control system (commits and changesets).

5.6 Chapter Summary

In this chapter, we presented PRECINCT (PREventing Clones INsertion at Commit-Time), an incremental approach for preventing clone insertion at commit-time that combines efficient block extraction and clone detection. The approach is meant to integrate well with the workflow of developers. PRECINCT takes advantage of TXL and NICAD to create a clone detection tool that runs automatically before each commit, in an average time of 1.2 seconds, with an average precision of 97.7% and a recall of 100% (when using NICAD's results as a baseline). Our approach is an efficient trade-off between local and remote approaches for clone detection, and as such, we believe that it addresses major factors that contribute to the slow adoption of clone detection tools. PRECINCT achieves similar

performance to remote approaches, which rely on more computational power, without delaying the detection results in asynchronous email notifications. Moreover, we believe that operating at commit-time makes PRECINCT more appealing to developers. Using PRECINCT, developers do not have to cope with many warnings and the problem of context-switching, which are the main limitations of IDE-based methods and the use of external tools. Finally, our approach can reduce the number of lines output by a traditional clone detection tool such as NICAD by 83.4% while keeping all the necessary information that allows developers to decide whether the detected clone is, in fact, a clone. In the remainder of this thesis, we build upon BUMPER and PRECINCT to detect risky commits and propose solutions to improve code at commit-time.

Chapter 6

Preventing Bug Insertion Using Clone Detection At Commit-Time

6.1 Introduction

Research in software maintenance has evolved over the years to include areas like mining bug repositories, bug analytics, and bug prevention and reproduction. The ultimate goal is to develop better techniques and tools to help software developers detect, correct, and prevent bugs effectively and efficiently. One particular (and growing) line of research focuses on the problem of preventing the introduction of bugs by detecting risky commits (preferably before the commits reach the central repository). Recent approaches (e.g., [120, 133]) rely on training models based on code and process metrics (e.g., code complexity, experience of the developers, etc.) that are used to classify new commits as risky or not. Metrics, however, may vary from one project to another, hindering the reuse of these models. Consequently, these techniques tend to operate within single projects only, despite the fact that many large projects share dependencies, such as the reuse of common libraries, which makes them potentially vulnerable to similar faults. A solution to a bug provided by the developers of one project may help fix a bug that occurs in another (and similar) project. Moreover, as noted by Lewis et al. [118] and Johnson et al. [86], techniques based solely on metrics are perceived by developers as black-box solutions because they do not provide any insights on the causes of the risky commits or ways of improving them. As a result, developers are less likely to trust

the output of these tools. In this chapter, we present a novel bug prevention approach at commit-time, called BIANCA (Bug Insertion ANticipation by Clone Analysis at commit-time). BIANCA does not use metrics to assess whether or not an incoming commit is risky. Instead, it relies on code clone detection techniques, extracting code blocks from incoming commits and comparing them to those of known defect-introducing commits. One particular aspect of BIANCA is its ability to detect risky commits not only by comparing them to commits of a single project but also to those belonging to other projects that share common dependencies. This is important because complex software systems are not designed monolithically; they have dependencies that make them vulnerable to similar faults. For example, Apache BatchEE [184] and GraphWalker [64] both depend on JUNG (Java Universal Network/Graph Framework) [89]. BatchEE provides an implementation of the JSR-352 (Batch Applications for the Java Platform) specification [36], while GraphWalker is an open source model-based testing tool for test automation. These two systems are designed for different purposes: BatchEE is used to do batch processing in Java, whereas GraphWalker is used to design unit tests using a graph representation of the code. Nevertheless, because both Apache BatchEE and GraphWalker rely on JUNG, the developers of these projects made similar mistakes when building upon JUNG. Issues #69 and #44 from Apache BatchEE and GraphWalker, respectively, indicate that the developers of these projects made similar mistakes when using the graph visualization component of JUNG. To detect risky commits across projects, BIANCA resorts to project dependency analysis. Note that we do not only detect bugs resulting from library usage; rather, we leverage the fact that if two systems use the same libraries, then they are likely vulnerable to the same flaws. Another advantage of BIANCA is that it uses the commits that fixed previous defect-introducing commits to guide developers on how to improve risky commits. This way, BIANCA goes one step further than existing techniques by providing developers with a potential fix for their risky commits. We validated the performance of BIANCA on 42 open source projects obtained from GitHub. The examined projects vary in size, domain and popularity. Our findings indicate that BIANCA is able to flag risky commits with an average precision, recall and F-measure of 90.75%, 37.15% and 52.72%, respectively. Although the


Figure 18: Simplified Dependency Graph for com.badlogicgames.gdx

144], used to detect communities by progressively removing edges from the original network. Instead of trying to construct a measure that identifies the edges that are the most central to communities, the Girvan–Newman algorithm focuses on edges that are most likely “between” communities. This algorithm is very effective at discovering community structure in both computer-generated and real-world network data [144]. Other clustering algorithms can also be used.
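For illustration, the clustering step described above can be sketched with networkx's implementation of the Girvan–Newman algorithm. This is a minimal sketch, not the thesis tooling; the project and library names are hypothetical placeholders.

# Illustrative sketch: community detection on a project dependency graph using the
# Girvan–Newman algorithm from networkx. Names below are hypothetical placeholders.
import networkx as nx
from networkx.algorithms.community import girvan_newman

# Build an undirected dependency graph: an edge links a project to a library it uses.
G = nx.Graph()
G.add_edges_from([
    ("project-a", "libX"), ("project-b", "libX"),   # project-a and project-b share libX
    ("project-c", "libY"), ("project-d", "libY"),
])

# Girvan–Newman progressively removes the edges with the highest betweenness;
# each iteration yields one level of the resulting community hierarchy.
communities = next(girvan_newman(G))
for i, community in enumerate(communities, start=1):
    print(f"cluster {i}: {sorted(community)}")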

6.2.2 Building a Database of Code Blocks of Defect-Commits and Fix-Commits

To build our database of code blocks that are related to defect-commits and fix-commits, we first need to identify the respective commits. Then, we extract the relevant blocks of code from the commits. Extracting Commits: BIANCA listens to bug (or issue) closing events happening in the project tracking system. Every time an issue is closed, BIANCA retrieves the commit that was used to fix the issue (the fix-commit) as well as the one that introduced the defect (the defect-commit). Retrieving fix-commits, however, is known to be a challenging task [203]. This is because the link between the project tracking system and the code version control system is not always explicit. In an ideal

situation, developers would add a reference to the issue they work on inside the description of the commit. However, this good practice is not always followed. To link fix-commits and their related issues, we turn to a modified version of the back-end of Commit-guru [164]. Commit-guru is a tool developed by Rosen et al. [164] to detect risky commits. In order to identify risky commits, Commit-guru builds a statistical model using change metrics (e.g., number of lines added, number of lines deleted, number of files modified) computed from past commits known to have introduced defects. Commit-guru's back-end has three major components: ingestion, analysis, and prediction. We reuse the ingestion and the analysis parts for BIANCA. The ingestion component is responsible for ingesting (i.e., downloading) a given repository. Once the repository is entirely downloaded on a local server, each commit is analysed. Commits are classified using the list of keywords proposed by Hindle et al. [74]. Commit-guru implements the SZZ algorithm [101] to detect risky changes: it performs the SCM blame/annotate function on all the modified lines of code for their corresponding files on the fix-commit's parents. This returns the commits that previously modified these lines of code, which are flagged as the defect-introducing commits (i.e., the defect-commits). Prior work showed that Commit-guru is effective in identifying defect-commits and their corresponding fixing commits [92], and the SZZ algorithm it uses has been shown to be effective in detecting risky commits [92, 164]. Note that we could use a simple and established approach such as Relink [203] to link the commits to their issues and re-implement the classification proposed by Hindle et al. [74] on top of it. However, Commit-guru has the advantage of being open source, making it possible to modify it to fit our needs and to fine-tune its performance. Extracting Code Blocks: To extract code blocks from fix-commits and defect-commits, we rely on PRECINCT, which was presented in the previous chapter.
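The SZZ-style step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Commit-guru's actual implementation: for each line removed by a fix-commit, the fix-commit's parent is blamed to recover the commit that last touched that line, and those commits are flagged as defect-commits.

# Minimal sketch of the SZZ-style step (not Commit-guru's code): blame the parent of a
# fix-commit for every line the fix removed, and collect the blamed commits.
import re
import subprocess

def git(repo, *args):
    return subprocess.check_output(["git", "-C", repo] + list(args), text=True)

def defect_commits(repo, fix_commit):
    suspects = set()
    # Zero-context unified diff: the body of each hunk only contains "-" and "+" lines.
    diff = git(repo, "diff", "--unified=0", f"{fix_commit}^", fix_commit)
    current_file, old_line = None, None
    for line in diff.splitlines():
        if line.startswith("--- a/"):
            current_file = line[6:]
        elif line.startswith("@@"):
            # Hunk header, e.g. "@@ -120,2 +120,3 @@": the old side starts at line 120.
            old_line = int(re.match(r"@@ -(\d+)", line).group(1))
        elif line.startswith("-") and not line.startswith("---") and current_file:
            # Blame this removed line in the parent revision of the fix-commit.
            blame = git(repo, "blame", "-l", "-L", f"{old_line},{old_line}",
                        f"{fix_commit}^", "--", current_file)
            suspects.add(blame.split()[0])
            old_line += 1
    return suspects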

6.2.3 Analysing New Commits Using Pre-Commit Hooks

Each time a developer makes a commit, BIANCA intercepts it using a pre-commit hook, extracts the corresponding code block (in a similar way as in the previous phase), and compares it to the code blocks of historical defect-commits. If there is a match, then the new commit is deemed to be risky. A threshold α is used to assess the extent

beyond which two commits are considered similar. The setting of α is discussed in the case study section. BIANCA is based on a set of bash and Python scripts, and the entry point of these scripts lies in a pre-commit hook. These scripts intercept the commit and extract the corresponding code blocks. We use a pre-commit hook, which kicks in as the first operation when one wants to commit. The pre-commit hook has access to the changes in the files that have been modified and, more specifically, to the lines that have been modified. The modified lines of the files are sent to TXL with our special grammar and algorithm (Algorithm 1, presented in the previous chapter). Then, the blocks are compared to previously extracted blocks known to introduce a defect using the comparison engine of NICAD [40]. To compare the extracted blocks to the ones in the database, we use a similar strategy as the one presented in the previous chapter. Another important aspect of the design of BIANCA is its ability to provide guidance to developers on how to improve risky commits. We achieve this by extracting from the database the fix-commit corresponding to the matching defect-commit and presenting it to the developer. We believe that this makes BIANCA a practical approach for developers, as they will know why a given modification has been reported as risky in terms of code; this is something that is not supported by techniques based on statistical models (e.g., [92, 164]). A tool that supports BIANCA should have enough flexibility to allow developers to enable or disable the recommendations made by BIANCA. Furthermore, because BIANCA acts before the commits reach the central repository, it prevents unfortunate pulls of defects by other members of the organization.
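A minimal sketch of such a pre-commit hook is shown below. This is an illustration, not BIANCA's actual scripts: the staged changes are read with git diff, a difflib-based similarity stands in for the TXL/NICAD pipeline, and the corpus file name is a hypothetical placeholder.

#!/usr/bin/env python3
# Illustrative pre-commit hook sketch (not BIANCA's actual scripts). Saved as
# .git/hooks/pre-commit and made executable, it aborts the commit when a staged
# change resembles a known defect-introducing block.
import difflib
import json
import subprocess
import sys

ALPHA = 0.35  # dissimilarity threshold discussed in the case study section

def staged_blocks():
    # Lines staged for commit, grouped per file (a rough stand-in for code blocks).
    diff = subprocess.check_output(["git", "diff", "--cached", "--unified=0"], text=True)
    blocks, current = {}, None
    for line in diff.splitlines():
        if line.startswith("+++ b/"):
            current = line[6:]
            blocks[current] = []
        elif line.startswith("+") and not line.startswith("+++") and current:
            blocks[current].append(line[1:])
    return blocks

def dissimilarity(a, b):
    return 1.0 - difflib.SequenceMatcher(None, "\n".join(a), "\n".join(b)).ratio()

def main():
    with open("defect_blocks.json") as f:     # hypothetical corpus of defect-commit blocks
        corpus = json.load(f)                 # {"commit-id": ["line", ...], ...}
    for path, block in staged_blocks().items():
        for commit_id, known in corpus.items():
            if block and dissimilarity(block, known) <= ALPHA:
                print(f"warning: {path} resembles defect-commit {commit_id}")
                sys.exit(1)                   # a non-zero exit aborts the commit
    sys.exit(0)

if __name__ == "__main__":
    main()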

6.3 Experimental Setup

In this section, we present the setup of our case study in terms of repository selection, dependency analysis, comparison process and evaluation measures.

6.3.1 Project Repository Selection

To select the projects used to evaluate our approach, we followed three simple criteria. First, the projects need to be in Java and use Maven to manage dependencies. This way, we can automatically extract the dependencies and perform the clustering of projects. The second criterion is that the projects enjoy a large community support and interest; we selected projects that have at least 2,000 followers. We opted to analyze repositories with a large number of followers because we are interested in finding high-impact faults: the number of followers is a proxy measure of how many people would be affected by a fault in the system. Finally, the projects must have a public issue repository so that we can mine their past issues and the corresponding fixes. We queried Github with these criteria and retrieved 42 projects (see Table 6 for the list of projects), including projects from some of the major open-source contributors such as Alibaba, the Apache Software Foundation, Eclipse, Google and Square.
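The selection criteria could be expressed against GitHub's search API roughly as follows. This is a sketch, not the exact script used for this study; stargazers are used here as the "followers" proxy, and the issue-tracker criterion relies on the has_issues field returned for each repository.

# Sketch of a GitHub query matching the selection criteria (Java, large community,
# public issue tracker). The thresholds and fields mirror the criteria above.
import requests

resp = requests.get(
    "https://api.github.com/search/repositories",
    params={"q": "language:java stars:>2000", "sort": "stars", "order": "desc", "per_page": 50},
    headers={"Accept": "application/vnd.github+json"},
)
resp.raise_for_status()

candidates = [
    repo["full_name"]
    for repo in resp.json()["items"]
    if repo.get("has_issues")          # public issue repository required
]
print(candidates)
# Whether a project uses Maven can then be checked by looking for a pom.xml at its root.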

6.3.2 Project Dependency Analysis

Figure 19 shows the project dependency graph. The dependency graph is composed of 592 nodes divided into five clusters shown in yellow, red, green, purple and blue. The size of the nodes in Figure 19 is proportional to the number of connections from and to the other nodes. As shown in Figure 19, these Github projects are very much interconnected. On average, the projects composing our dataset have 77 dependencies and, among these, 62 dependencies are shared with at least one other project from our dataset. Table 5 shows the result of the Girvan–Newman clustering algorithm in terms of centroids and betweenness. Storm, from the Apache Software Foundation, dominates the blue cluster. Storm is a distributed real-time computation system. Druid by Alibaba, the e-commerce company that provides consumer-to-consumer, business-to-consumer and business-to-business sales services via web portals, dominates the yellow cluster. In recent years, Alibaba has become an active member of the open-source community by making some of its projects publicly available. The red cluster has Hadoop by the Apache Software Foundation as its centroid. Hadoop is an open-source software framework for distributed storage and distributed processing of very large datasets on computer clusters built from commodity hardware. The Persistence

Figure 19: Dependency Graph

Table 5: Communities in terms of ID, color code, centroids, betweenness and number of members

#ID  Community  Centroids  Betweenness  # Members
1    Blue       Storm      24,525       479
2    Yellow     Alibaba    24,400       42
3    Red        Hadoop     16,709       37
4    Green      Openhab    3,504        22
5    Purple     Libdx      6,839        12

project of OpenHab dominates the green cluster. OpenHab proposes home automation solutions, and the Persistence project is their data access layer. Finally, the purple cluster is dominated by Libdx by Badlogicgames, which is a cross-platform framework for game development. A review of each cluster shows that this partitioning divides projects in terms of high-level functionalities. For example, the blue cluster is almost entirely composed of projects from the Apache Software Foundation, whose projects tend to build on top of one another. We also have the red cluster for Hadoop, which is by itself an ecosystem inside the Apache Software Foundation. Finally, we obtained a cluster for e-commerce applications (yellow), real-time applications for home automation (green), and game development (purple).
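Because all selected projects use Maven, the edges of this dependency graph can be extracted directly from each project's build file. The following sketch illustrates that step (it is not the exact tooling used here): it lists the dependencies declared in a pom.xml so that they can become edges of the graph.

# Sketch of the dependency-extraction step: parse a Maven pom.xml and return the
# declared dependencies as "group:artifact" strings.
import xml.etree.ElementTree as ET

def maven_dependencies(pom_path):
    ns = {"m": "http://maven.apache.org/POM/4.0.0"}
    root = ET.parse(pom_path).getroot()
    deps = []
    for dep in root.findall(".//m:dependencies/m:dependency", ns):
        group = dep.findtext("m:groupId", default="", namespaces=ns)
        artifact = dep.findtext("m:artifactId", default="", namespaces=ns)
        deps.append(f"{group}:{artifact}")
    return deps

# Example: edges from a (hypothetical) project to each of its dependencies.
# edges = [("my-project", d) for d in maven_dependencies("pom.xml")]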

6.3.3 Building a Database of Defect-Commits and Fix-Commits for Performance Evaluation

To build the database that we can use to assess the performance of BIANCA, we use the same process as discussed in Section 6.2.2. We used Commit-guru to retrieve the complete history of each project and label commits as defect-commits if they appear to be linked to a closed issue. The process used by Commit-guru to identify commits that introduce a defect is simple and reliable in terms of accuracy and computation time [92]. We use the Commit-guru labels as the baseline to compute the precision and recall of BIANCA: each time BIANCA classifies a commit as risky, we can check whether that commit is in the database of defect-introducing commits. The same evaluation process is used by related studies [24, 54, 107, 117].

Figure 20: Precision, Recall and F1-measure variations according to α

6.3.4 Process of Comparing New Commits

Because our approach relies on pre-commit hooks to detect risky commits, we had to find a way to replay past commits. To do so, we cloned our test subjects and then created a new branch called BIANCA. When created, this branch is reset to the initial state of the project (the first commit), and each commit is then replayed as it was originally made. For each commit, we store the time taken by BIANCA to run, the number of detected clone pairs, and the commits that match the current commit. As an example, let us assume that we have three commits from two projects.

At time t1, commit c1 in project p1 introduces a defect. The defect is experienced by a user who reports it via an issue i1 at t2. A developer fixes the defect introduced by c1 in commit c2 and closes i1 at t3. From t3, we know that c1 introduced a defect, using the process described in Section 6.3.3. If, at t4, c3 is pushed to p2 and c3 matches c1 after preprocessing, pretty-printing and formatting, then c3 is classified as risky by

BIANCA and c2 is proposed to the developer as a potential solution for the defect introduced in c3. To measure the similarity between pairs of commits, we need to decide on the value of α. One possibility would be to test for all possible values of α and pick the

one that provides the best accuracy (F1-measure). The ROC (Receiver Operating Characteristic) curve can then be used to display the performance of BIANCA with different values of α. α is a measure of dissimilarity between two snippets of code.

Running experiments with all possible values of α turned out to be computationally demanding given the large number of commits: testing all the different values of α amounts to 4e10 comparisons. To address this, we randomly selected a sample of 1,700 commits from our dataset and checked the results while varying α from 1 to 100%. Figure 20 shows the results. The best trade-off between precision and recall is obtained when α = 35%. This threshold is not in line with the findings of Roy et al. [40, 166], who showed through empirical studies that NICAD performs well with a threshold of around 70%. However, NICAD was built for, and evaluated on, its capacity to detect near-miss clones (i.e., clones that are almost identical), and for that purpose a very high similarity threshold (or low dissimilarity threshold) is desirable. For our purpose, however, one similar line in a block can be significant; a trivial example is the null check discussed in the next section. The same reasoning applies to fixes. For these reasons, we set α = 35% in our experiments.
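The threshold-selection procedure just described can be sketched as follows. This is a minimal illustration: the similarity function and the sampled commits (with their defect labels) are abstracted placeholders, not the actual BIANCA components.

# Sketch of the α sweep: classify a sample of commits for each candidate threshold
# and keep the value that maximises the F1-measure.
def f1(precision, recall):
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

def sweep_alpha(sample, known_defect_blocks, similarity):
    best_alpha, best_f1 = None, -1.0
    for alpha_pct in range(1, 101):
        alpha = alpha_pct / 100.0            # α is a dissimilarity threshold
        tp = fp = fn = 0
        for commit in sample:                # each sampled commit: .block and .is_defect
            flagged = any(similarity(commit.block, b) >= 1.0 - alpha
                          for b in known_defect_blocks)
            if flagged and commit.is_defect:
                tp += 1
            elif flagged:
                fp += 1
            elif commit.is_defect:
                fn += 1
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        if f1(p, r) > best_f1:
            best_alpha, best_f1 = alpha, f1(p, r)
    return best_alpha, best_f1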

6.3.5 Evaluation Measures

Similar to prior work focusing on risky commits (e.g., [92, 179]), we used precision,

recall, and F1-measure to evaluate our approach. They are computed using TP (true positives), FP (false positives), FN (false negatives), which are defined as follows:

• TP: the number of defect-commits that were properly classified by BIANCA
• FP: the number of healthy commits that were classified by BIANCA as risky
• FN: the number of defect-introducing commits that were not detected by BIANCA
• Precision: TP / (TP + FP)
• Recall: TP / (TP + FN)
• F1-measure: 2 × (Precision × Recall) / (Precision + Recall)
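These measures can be computed directly from the set of commits flagged by BIANCA and the baseline set of known defect-commits, as in the following minimal sketch.

# Minimal sketch: precision, recall and F1 from the flagged and baseline commit sets.
def evaluate(flagged_commits, known_defect_commits):
    flagged, defects = set(flagged_commits), set(known_defect_commits)
    tp = len(flagged & defects)
    fp = len(flagged - defects)
    fn = len(defects - flagged)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1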

It is worth mentioning that, in the case of defect prevention, false positives can be hard to identify as the defects could be in the code but not yet reported through a bug report (or issue). To address this, we did not include the last six months of history. Following similar studies [34, 92, 164, 174], if a defect is not reported within six months, then it is not considered.

6.4 Empirical Validation

In this section, we show the effectiveness of BIANCA in detecting risky commits using clone detection and project dependency analysis. The main research question addressed by this case study is: Can we detect risky commits using code comparison within and across related projects, and if so, what would be the accuracy? Table 6 shows the results of applying BIANCA in terms of the organization, project name, a short description of the project, the number of classes, the number of commits, the number of defect-commits, the number of defect-commits detected by

BIANCA, precision (%), recall (%), F1-measure and the average difference, in days, between the detected commit and the original commit that inserted the defect for the first time. A description of each project is available in the appendices. With α = 35%, BIANCA achieves, on average, a precision of 90.75%: 13,899 of the 15,316 commits identified as risky are true positives, i.e., commits that triggered the opening of an issue and had to be fixed later on. On the other hand, BIANCA achieves, on average, 37.15% recall

(15,316/41,225) and an average F1-measure of 52.72%. The relatively low recall is to be expected, since BIANCA classifies commits as risky only if a similar defect-introducing commit happened in one of the 42 open-source projects. Also, out of the 15,316 commits BIANCA classified as risky, only 1,320 (8.6%) were flagged because they matched a defect-commit inside the same project. This finding supports the idea that developers of a project are not likely to introduce the same defect twice, while developers of different projects that share dependencies are, in fact, likely to introduce similar defects. We believe this is an important finding for researchers aiming to achieve cross-project defect prevention, regardless of the technique employed (e.g., statistical model, AST comparison, code comparison, etc.). It is important to note that we do not claim that 37.15% of issues in open-source systems are caused by project dependencies. To support such a claim, we would need to analyse the 15,316 detected defect-commits and determine how many yield defects that are similar across projects. Studying the similarity of defects across projects is a complex task and may require analysing the defect reports manually; this is left as future work. That said, we showed, in this chapter, that software systems sharing dependencies also share common issues, irrespective of whether these issues represent similar defects or not.

Table 6: BIANCA results in terms of organization, project name, number of classes (NoC), number of commits, number of defect-introducing commits, number of risky commits detected, precision (%), recall (%), F1-measure (%), and the average similarity of the first 5 and first 3 proposed fixes with the actual fix.

Organization | Project | NoC | #Commits | Defect-Introducing | Detected | Precision | Recall | F1 | Top-5 Fix Sim. | Top-3 Fix Sim.
Alibaba | druid | 3,309 | 4,775 | 1,260 | 787 | 88.44 | 62.46 | 73.21 | 39.97 | 46.69
Alibaba | dubbo | 1,715 | 1,836 | 119 | 61 | 96.72 | 51.26 | 67.01 | 60.01 | 57.14
Alibaba | fastjson | 2,002 | 1,749 | 516 | 373 | 95.71 | 72.29 | 82.37 | 18.19 | 15.23
Alibaba | jstorm | 1,492 | 215 | 24 | 21 | 90.48 | 87.50 | 88.96 | 22.38 | 30.48
Apache | hadoop | 9,108 | 14,154 | 3,678 | 851 | 86.84 | 23.14 | 36.54 | 38.94 | 47.68
- | - | 2,209 | 7,208 | 951 | 444 | 86.26 | 46.69 | 60.58 | 53.03 | 61.10
Clojure | clojure | 335 | 2,996 | 596 | 46 | 86.96 | 7.72 | 14.18 | 53.61 | 59.52
Dropwizard | dropwizard | 964 | 3,809 | 581 | 179 | 96.65 | 30.81 | 46.72 | 47.54 | 53.56
Dropwizard | metrics | 335 | 1,948 | 331 | 129 | 95.35 | 38.97 | 55.33 | 22.53 | 31.82
Eclipse | che | 7,818 | 1,826 | 169 | 9 | 88.89 | 5.33 | 10.05 | 31.01 | 39.04
Excilys | Android Annotations | 1,059 | 2,582 | 566 | 9 | 100.00 | 1.59 | 3.13 | 25.60 | 32.13
Facebook | fresco | 1,007 | 744 | 100 | 68 | 92.65 | 68.00 | 78.43 | 64.14 | 71.03
Gocd | gocd | 16,735 | 3,875 | 499 | 297 | 91.58 | 59.52 | 72.15 | 21.62 | 30.59
Google | auto | 257 | 668 | 124 | 95 | 100.00 | 76.61 | 86.76 | 47.66 | 55.70
Google | guava | 1,731 | 3,581 | 973 | 592 | 98.48 | 60.84 | 75.22 | 23.74 | 23.59
Google | guice | 716 | 1,514 | 605 | 104 | 85.58 | 17.19 | 28.63 | 34.77 | 34.53
Google | iosched | 1,088 | 129 | 9 | 6 | 100.00 | 66.67 | 80.00 | 16.50 | 24.97
Gradle | gradle | 11,876 | 37,207 | 6,896 | 1,557 | 97.50 | 22.58 | 36.67 | 23.58 | 19.93
Jankotek | mapdb | 267 | 1,913 | 691 | 440 | 94.32 | 63.68 | 76.03 | 63.16 | 72.48
Jhy | jsoup | 136 | 917 | 254 | 153 | 87.58 | 60.24 | 71.38 | 46.41 | 44.59
Libdx | libgdx | 4,679 | 12,497 | 3,514 | 1,366 | 87.70 | 38.87 | 53.87 | 57.70 | 56.31
Netty | netty | 2,383 | 7,580 | 3,991 | 1,618 | 89.43 | 40.54 | 55.79 | 63.41 | 62.67
Openhab | openhab | 5,817 | 8,826 | 28 | 2 | 100.00 | 7.14 | 13.33 | 28.46 | 30.66
Openzipkin | zipkin | 397 | 799 | 176 | 73 | 87.67 | 41.48 | 56.31 | 55.92 | 51.90
Orfjackal | retrolambda | 171 | 447 | 97 | 35 | 94.29 | 36.08 | 52.19 | 34.69 | 42.06
OrientTechnologie | orientdb | 2,907 | 13,907 | 7,441 | 2,894 | 86.77 | 38.89 | 53.71 | 62.20 | 70.00
Perwendel | spark | 205 | 703 | 125 | 82 | 97.56 | 65.60 | 78.45 | 21.88 | 28.00
PrestoDb | presto | 4,381 | 8,065 | 2,112 | 991 | 90.62 | 46.92 | 61.83 | 23.34 | 20.64
RoboGuice | roboguice | 1,193 | 1,053 | 229 | 70 | 91.43 | 30.57 | 45.82 | 53.81 | 56.55
Lombok | lombok | 1,146 | 1,872 | 560 | 212 | 91.98 | 37.86 | 53.64 | 58.94 | 57.49
Scribejava | scribejava | 218 | 609 | 72 | 16 | 93.75 | 22.22 | 35.93 | 30.05 | 38.16
Square | dagger | 232 | 697 | 144 | 84 | 90.48 | 58.33 | 70.93 | 64.29 | 64.97
Square | javapoet | 66 | 650 | 163 | 113 | 100.00 | 69.33 | 81.88 | 51.04 | 53.20
Square | okhttp | 344 | 2,649 | 592 | 474 | 93.04 | 80.07 | 86.07 | 29.09 | 24.91
Square | okio | 90 | 433 | 40 | 24 | 100.00 | 60.00 | 75.00 | 31.51 | 35.50
Square | otto | 84 | 201 | 15 | 15 | 93.33 | 100.00 | 96.55 | 54.11 | 49.94
Square | retrofit | 202 | 1,349 | 151 | 111 | 99.10 | 73.51 | 84.41 | 49.88 | 45.46
StephaneNicolas | robospice | 461 | 865 | 113 | 39 | 87.18 | 34.51 | 49.45 | 60.90 | 65.04
ThinkAurelius | titan | 2,015 | 4,434 | 1,634 | 527 | 90.13 | 32.25 | 47.51 | 48.64 | 50.59
Xetorthio | jedis | 203 | 1,370 | 295 | 226 | 92.04 | 76.61 | 83.62 | 25.69 | 29.45
Yahoo | anthelion | 1,620 | 7 | 0 | - | - | - | - | - | -
Zxing | zxing | 3,030 | 3,253 | 791 | 123 | 94.31 | 15.55 | 26.70 | 29.35 | 37.96
Total | | 96,003 | 165,912 | 41,225 | 15,316 | 90.75 | 37.15 | 52.72 | 40.78 | 44.17

The experiments took nearly three months using 48 Amazon Virtual Private Servers running in parallel. When deployed, the most time-consuming part of BIANCA was building the model of known bug-introducing commits. Once the model was built, it took, on average, 72 seconds to analyze an incoming commit on a typical workstation (quad-core @ 3.2 GHz with 8 GB of RAM). In the following subsections, we compare BIANCA with a random classifier, analyze the best- and worst-performing projects, and assess the quality of the proposed fixes.

6.4.1 Baseline Classifier Comparison

Although our average F1-measure of 52.72% may seem low at first glance, achieving a high F1-measure on unbalanced data is very difficult [128]. Therefore, a common approach to ground detection results is to compare them to a simple baseline. To the best of our knowledge, this is the first approach that relies on code similarity instead of code metrics or process metrics for the detection of risky commits; comparing it directly to other approaches would therefore not be accurate. In addition, existing metric-based techniques (e.g., [133]) detect risky commits within single projects only, whereas BIANCA operates across projects. We therefore compared BIANCA with a random classifier to show that it performs better than a simple baseline. The baseline classifier generates a random number n between 0 and 1 for each of the 165,912 commits composing our dataset. If n is greater than 0.5, then the commit is classified as risky; otherwise, it is classified as non-risky. As expected from a random classifier, our implementation flagged about 50% of the commits (82,384) as risky. It is worth mentioning that the random classifier achieved 24.9% precision,

49.96% recall and 33.24% F1-measure. Since our data is unbalanced (i.e., there are many more healthy than risky commits), these numbers are to be expected for a random classifier: the recall is very close to 50% since a commit can take only one of two classes, risky or non-risky, while the low precision reflects the class imbalance (a random classifier would achieve a precision of 50% on a balanced dataset). It is important to note that the purpose of this analysis is not to show that we outperform a simple random classifier, but rather to shed light on the fact that our dataset is unbalanced and that achieving an average F1-measure of 52.72% is non-trivial, especially when

a baseline only achieves an F1-measure of 33.24%.
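The random baseline described above can be sketched as follows; it flags each commit as risky with probability 0.5 and is scored against the set of known defect-commits.

# Sketch of the random baseline: flag ~50% of commits as risky and measure the result.
import random

def random_baseline(all_commits, known_defect_commits, seed=0):
    rng = random.Random(seed)
    flagged = {c for c in all_commits if rng.random() > 0.5}
    defects = set(known_defect_commits)
    tp, fp = len(flagged & defects), len(flagged - defects)
    fn = len(defects - flagged)
    precision = tp / (tp + fp) if flagged else 0.0
    recall = tp / (tp + fn) if defects else 0.0
    return precision, recall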

6.4.2 Performance of BIANCA

In this section, we provide insight on the performance of BIANCA by examining the projects for which the best and worst results were obtained. BIANCA performed best when applied to three projects: Otto by Square (93.33% precision, 100.00% recall, 96.55% F1-measure), JStorm by Alibaba (90.48% precision, 87.50% recall, 88.96% F1-measure), and Auto by Google (100.00% precision,

76.61% recall, 86.76% F1-measure). It performed worst when applied to Android Annotations by Excilys (100.00% precision, 1.59% recall, 3.13% F1-measure), Che by

Eclipse (88.89% precision, 5.33% recall, 10.05% F1-measure), and Openhab by Openhab

(100.00% precision, 7.14% recall, 13.33% F1-measure). To understand the performance of BIANCA, we conducted a manual analysis of the commits classified as risky by BIANCA for these projects.

Otto by Square (F1-measure = 96.5%)

At first, the F1-measure of Otto by Square seems surprising given the specific set of features it provides. Otto provides a Guava-based event bus. While it does have dependencies that make it vulnerable to defects in related projects, the fact that it provides such specific features makes it, at first sight, unlikely to share defects with other projects. Through our manual analysis, we found that out of the 16 risky commits detected by BIANCA, 11 (68.75%) matched defect-introducing commits inside the Otto project itself. This is significantly higher than the average proportion of single-project matches (8.6%). Further investigation of the project management system revealed that very few issues have been submitted for this project (15) and that, out of the 11 matches inside the Otto project, 7 addressed the same issue, which had been submitted and fixed several times instead of re-opening the original issue.

JStorm by Alibaba (F1-measure = 88.96%)

For JStorm by Alibaba, our manual analysis of the risky commits revealed that, in addition to stream processing, JStorm also provides JSON support. The commits

detected as risky were related to the JSON encoding/decoding functionalities of JStorm. In our dataset, we have several other projects that support JSON encoding and decoding, such as FastJSON by Alibaba, Hadoop by Apache, Dropwizard by Dropwizard, Gradle by Gradle and Anthelion by Yahoo. There is, however, only one project supporting JSON in the same cluster as JStorm: FastJSON by Alibaba. FastJSON has a rather large history of defect-commits (516), and 18 out of the 21 commits marked as risky by BIANCA were marked so because they matched defect-commits in the FastJSON project.

Auto by Google (F1-measure = 86.76%)

Google Auto is a code generation engine, which is used by other Google projects in our dataset, such as Guava and Guice. Most of the Google Auto risky commits (79%) matched commits in the Guava and Guice projects. As Guice and Guava share the same code-generation engine (Auto), it makes sense that commits introducing defects in these projects share the characteristics of commits introducing defects in Auto.

Openhab by Openhab (F1-measure = 13.33%)

Openhab by Openhab provides a bus for home automation (smart homes). This is a very specific set of features. Moreover, Openhab and its dependencies are alone in the green cluster. In other words, the only project against which BIANCA could have checked for matching defects is Openhab itself. BIANCA was able to detect only 2 of the 28 defect-commits of Openhab. We believe that if we had other home-automation projects in our dataset (such as HomeAutomation, a component-based smart home system [172]), then we would have achieved a better F1-measure.

Che by Eclipse (F1-measure = 10.05%)

Eclipse Che is part of the Eclipse ecosystem. Eclipse provides development support for a wide range of programming languages such as C, C++, Java and others. Despite the fact that the Che project has a decent number of defect-commits (169) and that it is in the blue cluster (dominated by Apache), BIANCA was only able to detect nine risky commits. After a manual analysis of the 169 defect-commits, we were not able to draw any conclusion on why we could not achieve better performance. We

can only assume that Eclipse's developers are particularly careful about how they use their dependencies and about the quality of their code in general. Only about 9% (169/1,826) of their commits introduced new defects.

Annotations by Excilys (F1-measure = 3.13%)

The last project we analysed manually is Annotations by Excilys. Similar to Openhab by Openhab, it provides a very particular set of features, which consists of Java annotations for Android projects. We do not have any other project related to Java annotations, or to the Android ecosystem at large, in our dataset. This caused BIANCA to perform poorly. Our interpretation of the manual analysis of the best- and worst-performing projects is that BIANCA performs best when applied to clusters that contain projects that are similar in terms of features, domain or intent. These projects tend to be interconnected through dependencies. In the future, we intend to study the correlation between the cluster betweenness measure and the performance of BIANCA.

6.4.3 Analysis of the Quality of the Fixes Proposed by BIANCA

One of the advantages of BIANCA over other techniques is that it also proposes fixes for the risky commits it detects. In order to evaluate the quality of the proposed fixes, we compared them with the actual fixes provided by the developers. To do so, we used the same preprocessing steps we applied to incoming commits: extract, pretty-print, normalize and filter the blocks modified by the proposed and actual fixes. Then, the blocks of the actual fixes and the proposed fixes can be compared with our clone comparison engine. Similar to other studies recommending fixes, we assess the quality of the first three and first five proposed fixes [43, 97, 115, 116, 154, 183]. The average similarity of the first three fixes is 44.17%, while that of the first five fixes is 40.78%. The results are reported in Table 6. In this study, for a proposed fix to be considered of good quality, it has to reach the α = 35% similarity threshold, meaning that the proposed fix must be at least 35% similar to the actual fix. On average, the proposed fixes are above the α = 35% threshold. On a per-commit basis, BIANCA proposed 101,462 fixes for the 13,899

true positive risky commits (7.3 per commit, on average). Out of the 101,462 proposed fixes, 78.67% are above the α = 35% threshold. In other words, BIANCA is able to detect risky commits with 90.75% precision and 37.15% recall, and proposes fixes that contain, on average, 40-44% of the actual code needed to transform the risky commit into a non-risky one. To further assess the quality of the fixes proposed by BIANCA, we randomly selected 250 BIANCA-proposed fixes and manually compared them with the actual fixes provided by the developers. For each fix, we looked at the proposed modifications (i.e., the code diff) and the actual modification made by the developer of the system to fix the bug. In 84% of the cases, we were able to identify statements in the proposed fixes that could be reused to create fixes similar to the ones the developers had written. For the remaining cases, it was difficult to understand the changes that the developers made, mainly because of our lack of familiarity with the systems under study. We recognize that a better evaluation of the quality of BIANCA-proposed fixes would be to conduct a user study. We intend to do this as part of future work. In what follows, we present examples of BIANCA-proposed fixes that were detected as similar to fixes proposed by developers. In Figures 21 and 22, we show two commits that belong to the Okhttp and Druid systems, respectively. In these figures, the statements shown in red are the ones that triggered the match between the two commits. The Okhttp commit was submitted in February 2014, while the one from Druid was submitted in April 2016. The Druid commit was introduced to fix a bug, which was caused by a prior commit submitted in March 2016. The bug consisted of invoking a function on a null reference, which led to a null pointer exception, causing the system to crash. This bug could have been avoided if the Druid developers had had access to the Okhttp commit. In a second example, we present a case where BIANCA could have been used to avoid inserting a bug related to race conditions in multi-threaded code. In Figures 23 and 24, we show two commits that belong to the Netty and Okhttp systems, respectively. For Figure 23, we present an excerpt of the commit that triggered the match; the whole commit affected 44 files with 1,994 additions and 1,335 deletions. The Netty commit was submitted in June 2014, while the one from Okhttp was submitted in January 2017.

@@ -293,7 +293,9 @@ private Android(
       byte[] alpnResult = (byte[]) getAlpnSelectedProtocol.invoke(socket);
       if (alpnResult != null) return ByteString.of(alpnResult);
     }
-    return ByteString.of((byte[]) getNpnSelectedProtocol.invoke(socket));
+    byte[] npnResult = (byte[]) getNpnSelectedProtocol.invoke(socket);
+    if (npnResult == null) return null;
+    return ByteString.of(npnResult);
   } catch (InvocationTargetException e) {
     throw new RuntimeException(e);
   } catch (IllegalAccessException e) {

Figure 21: okhttp commit #0ca4c82dd1032625831a5814ea2ddcf165029bdc

@@ -760,12 +760,14 @@ protected String aliasWrap(String name) {
     Map aliasMap = getAliasMap();

     if (aliasMap != null) {
+        if (aliasMap.get(name) == null) {
+            return null;
+        }

-        if (aliasMap.containsKey(name)) {
+        if (aliasMap.containsKey(name)
+                && aliasMap.get(name) != null) {
             return aliasMap.get(name);
         }

         String name_lcase = name.toLowerCase();
-        if (name_lcase != name && aliasMap.containsKey(name_lcase)) {
+        if (name_lcase != name && aliasMap.containsKey(name_lcase)
+                && aliasMap.get(name_lcase) != null) {
             return aliasMap.get(name_lcase);
         }

Figure 22: Druid commit #1091861bb15876131653191ae409a523aa8ec0c5

+    try {
+        Object v = threadLocalMap.indexedVariable(variablesToRemoveIndex);
+        if (v != null && v != InternalThreadLocalMap.UNSET) {
+            @SuppressWarnings("unchecked")
+            Set<FastThreadLocal<?>> variablesToRemove = (Set<FastThreadLocal<?>>) v;
+            FastThreadLocal<?>[] variablesToRemoveArray =
+                    variablesToRemove.toArray(new FastThreadLocal[variablesToRemove.size()]);
+            for (FastThreadLocal<?> tlv : variablesToRemoveArray) {
+                tlv.remove(threadLocalMap);
+            }
+        }
+    } catch (IOException e) {
+    } catch (InterruptedException e) {
+    } finally {
+        InternalThreadLocalMap.remove();
+    }

Figure 23: netty commit #085a61a310187052e32b4a0e7ae9700dbe926848

@@ -682,16 +682,21 @@ private void handleWebSocketUpgrade(Socket socket, BufferedSource source, Buffer
     response.getWebSocketListener().onOpen(webSocket, fancyResponse);
     String name = "MockWebServer WebSocket " + request.getPath();
     webSocket.initReaderAndWriter(name, 0, streams);
-    webSocket.loopReader();
-
-    // Even if messages are no longer being read we need to wait for the connection close signal.
     try {
-      connectionClose.await();
-    } catch (InterruptedException ignored) {
-    }
+      webSocket.loopReader();
-    closeQuietly(sink);
-    closeQuietly(source);
+      // Even if messages are no longer being read we need to wait for the connection close signal.
+      try {
+        connectionClose.await();
+      } catch (InterruptedException ignored) {
+      }
+
+    } catch (IOException e) {
+      webSocket.failWebSocket(e, null);
+    } finally {
+      closeQuietly(sink);
+      closeQuietly(source);
+    }
   }

The bug consisted of resource leakage in a multi-threaded environment. The similarity between the two commits comes from the try and catch blocks associated with the exceptions used and, more precisely, from freeing resources in a finally block that follows the try and catch blocks. In the try block, the threads are launched and, in case an exception happens, the catch block is executed. However, if the developer closes the resources consumed by the thread at the end of the try block, then, in the case of an exception, the resources would not be freed. Instead of duplicating the resource management code in the try and catch blocks, a good practice is to place it in a finally block, which always executes regardless of whether an exception is thrown or not. In the commit presented in Figure 23, we can see that a large refactoring has been done in order to prevent crashed threads from keeping resources in use. This bug could have been avoided if the Okhttp developers had had access to the Netty commit. Another example is the one depicted in Figures 25 and 26, showing two commits that belong to the JSoup and Orientdb systems, respectively. The first commit was submitted in November 2013, while the Orientdb commit was submitted two years later, in October 2015. The Orientdb commit was used to fix a bug introduced by a commit that was submitted earlier in October 2015. This bug would have been avoided if the developer had had access to the JSoup commit, which is proposed by BIANCA as the closest match. In these fixes, we can see that the developers are working with the StringBuilder class. According to the Java documentation, the StringBuilder class "provides an API compatible with StringBuffer, but with no guarantee of synchronization. This class is designed for use as a drop-in replacement for StringBuffer in places where the string buffer was being used by a single thread. Where possible, it is recommended that this class be used in preference to StringBuffer as it will be faster under most implementations." Developers usually use the StringBuilder class to build strings using the append and insert methods. Using the StringBuilder class rather than plain string concatenation (i.e., using the + operator) is known to be a good Java practice as it improves performance. In both cases, the code has been modified to avoid appending a null string. In JSoup, this is done by the method shouldCollapseAttribute, which checks for empty values. In Orientdb, the same operation is performed by a simple null check

@@ -34,7 +35,11 @@ public Object jjtAccept(OrientSqlVisitor visitor, Object data) {
   public void toString(Map params, StringBuilder builder) {
     expression.toString(params, builder);
     builder.append(" MATCHES ");
-    builder.append(right);
+    if (right != null) {
+      builder.append(right);
+    } else {
+      rightParam.toString(params, builder);
+    }
   }

Figure 25: OrientDB commit #444db817ee9404b17c1208df51781ce9cb6a2666

on the string named right. Note that this kind of bug would not have been spotted by a static analysis tool such as PMD [44], because it is legal to pass a null string as a parameter to a function expecting a string. In both cases, however, the developers needed to avoid appending null strings.

6.5 Threats to Validity

In this section, we discuss the limitations of our approach and the threats to the validity of our study. We identified three main limitations of BIANCA that require further studies. BIANCA is designed to work on multiple related systems. Applying BIANCA to a single system will most likely be ineffective, as it is unlikely that a large number of similar bugs exist within the same system. For single systems, we recommend the use of statistical models based on process and code metrics for the detection of risky commits, such as the ones developed by Kamei et al. and Rosen et al. [92, 164]. A metric-based solution, however, may turn out to be ineffective when applied across systems because of the difficulty associated with identifying common thresholds that are applicable to a wide range of systems.

@@ -100,6 +111,15 @@ protected void html(StringBuilder accum, Document.OutputSettings out) {
-    accum
-        .append(key)
-        .append("=\"")
-        .append(Entities.escape(value, out))
-        .append("\"");
+    accum.append(key);
+    if (!shouldCollapseAttribute(out)) {
+      accum.append("=\"");
+      Entities.escape(accum, value, out, true, false, false);
+      accum.append('"');
+    }
   }

   protected boolean isDataAttribute() {
     return key.startsWith(Attributes.dataPrefix) && key.length() > Attributes.dataPrefix.length();
   }

+  /**
+   * Collapsible if it's a boolean attribute and value is empty or same as name
+   */
+  protected final boolean shouldCollapseAttribute(Document.OutputSettings out) {
+    return ("".equals(value) || value.equalsIgnoreCase(key))
+        && out.syntax() == Document.OutputSettings.Syntax.html
+        && Arrays.binarySearch(booleanAttributes, key) >= 0;
+  }

The second limitation is related to the scalability of the approach. Because BIANCA operates on multiple systems, we need to build a model that comprises all their commits, which is a time-consuming process. It took nearly three months, using 48 Amazon Virtual Private Servers running in parallel, to build and test the model for our experiments. The cost of bootstrapping the model on existing repositories can be a deterrent. However, the model can be updated incrementally by adding new signatures of bugs to our corpus, and it does not need to be retrained from scratch. The third limitation we identified has to do with the fact that BIANCA is designed to work with Java systems only. It is, however, common to have a multitude of programming languages used in an environment with many inter-related systems. We intend to extend BIANCA to process commits from other languages as well.

The selection of target systems is one of the common threats to validity for approaches aiming to improve the analysis of software systems. It is possible that the selected programs share common properties that we are not aware of and that would, therefore, invalidate our results. However, the systems analyzed by BIANCA were selected from Github based on their popularity and on the ability to mine their past issues and to retrieve their dependencies. Any project that satisfies these criteria would be included in the analysis. Moreover, the systems vary in terms of purpose, size, and history. In addition, we see a threat to validity that stems from the fact that we only used open-source systems. The results may not be generalizable to industrial systems. We intend to undertake such studies in future work. The programs we used in this study are all based on the Java programming language. This can limit the generalization of the results to projects written in other languages. However, similar to what was done for Java, one can write a TXL grammar for a new language, and BIANCA will work with it, since BIANCA relies on TXL. Moreover, we use NICAD as the code comparison engine, and the accuracy of NICAD affects the accuracy of BIANCA. This said, since NICAD has been tested on large systems, we are confident that it is a suitable engine for comparing code using TXL. Also, nothing prevents the use of other text-based code comparison engines. Another threat related to the use of NICAD is the use of 35% as a similarity threshold. A different threshold may affect the results. We chose this threshold because it resulted in the best trade-off between precision and recall when analysing a subset of the dataset.

Finally, part of the analysis of the BIANCA-proposed fixes was based on a manual comparison of the BIANCA fixes with those proposed by developers. Although we exercised great care in analyzing all the fixes, we may have misunderstood some aspects of the commits. In conclusion, threats to internal and external validity have both been minimized by choosing a set of 42 different systems and by using input data that can be found in any programming language and version control system (commits and changesets).

6.6 Chapter Summary

In this chapter, we presented BIANCA (Bug Insertion ANticipation by Clone Analysis at commit time), an approach that detects risky commits (i.e., commits that are likely to introduce a bug) with an average precision of 90.75% and an average recall of 37.15%. BIANCA uses clone detection techniques and project dependency analysis to detect risky commits within and across projects. BIANCA operates at commit-time, i.e., before the commits reach the central code repository. In addition, because it relies on code comparison, BIANCA does not only detect risky commits but also makes recommendations to developers on how to fix them. We believe that this makes BIANCA a practical approach for preventing bugs and proposing corrective measures that integrate well with the developers' workflow through the commit mechanism. In the next chapter, we present CLEVER, an approach that combines metric- and clone-based classifications in order to address the scalability issues of BIANCA.

Chapter 7

Combining Code Metrics with Clone Detection for Just-In-Time Fault Prevention and Resolution

7.1 Introduction

In the previous chapter, we presented BIANCA, an approach that relies on clone detection to detect risky commits and propose fixes. While the performance of BIANCA is satisfactory, it still has major limitations. For instance, it only supports the Java programming language, and building the model supporting it is incredibly expensive. There exist techniques that aim to detect risky commits (e.g., [26, 35, 177]), among which an interesting approach is the one proposed by Rosen et al. [164]. The authors developed an approach and a supporting tool, Commit-guru, that relies on building models from historical commits using code and process metrics (e.g., code complexity, the experience of the developers, etc.) as the main features. These models are used to classify new commits as risky or not. Commit-guru has been shown to outperform previous techniques (e.g., [92, 107]). However, Commit-guru and similar tools suffer from some limitations. First, they tend to generate high false positive rates by classifying healthy commits as risky. The second limitation is that they do not provide recommendations to developers on how to fix the detected risky commits. They simply return measurements that

are often difficult to interpret by developers. In addition, they tend to be slow, despite being based on code metrics rather than code comparison, because they are not incremental. Indeed, they require pulling the complete history of a project for each analysis; on projects with tens of thousands of commits, this is problematic. Moreover, these tools only support Git, while the source-code versioning landscape is diverse. As shown by the 2018 StackOverflow survey, companies use Subversion, Team Foundation, Mercurial and other tools at rates of 16.6%, 11.3%, 3.7% and 19.1%, respectively. Our industrial partner for this project uses Git and Mercurial, so we required a more versatile approach. Another shortcoming of Commit-guru is that it only uses the commit message, rather than the issue tracking system, to determine whether a commit is a fix. This is problematic because developers leverage shortcuts hardcoded in commit messages to close issues automatically. For example, the commit message "fix #7" would be interpreted by Commit-guru as a bug-fixing commit even though issue #7 could be categorized in the issue tracking system as a feature to implement; in this case, the word fix was merely used to trigger the automatic closing of the issue. Finally, these tools have mainly been validated using open-source systems. Their effectiveness, when applied to industrial systems, has yet to be shown. In this chapter, we propose an approach, called CLEVER (Combining Levels of Bug Prevention and Resolution techniques), that relies on a two-phase process for intercepting risky commits before they reach the central repository. The first phase consists of building a metric-based model to assess the likelihood that an incoming commit is risky or not. This is similar to existing approaches. The next phase relies on clone detection to compare code blocks extracted from suspicious risky commits, detected in the first phase, with those of known historical fault-introducing commits. This additional phase provides CLEVER with two apparent advantages over Commit-guru and other similar approaches. First, as we will show in the evaluation section, CLEVER is able to reduce the number of false positives by relying on code matching instead of metrics alone. The second advantage is that, with CLEVER, it is possible to use the commits that fixed faults introduced by previous commits to suggest recommendations to developers on how to improve the risky commits at hand. This way, CLEVER goes one step further than Commit-guru (and similar techniques) by providing developers with a potential fix for their risky commits.

Another important aspect of CLEVER is its ability to detect risky commits not only by comparing them to commits of a single project but also to those belonging to other projects that share common dependencies. This is important in the context of an industrial setting, where software systems tend to have many dependencies that make them vulnerable to the same faults. CLEVER was developed in collaboration with software developers from Ubisoft La Forge. Ubisoft is one of the world's largest video game development companies, specializing in the design and implementation of high-budget video games. Ubisoft software systems are highly coupled, containing millions of files and commits, developed and maintained by more than 8,000 developers scattered across 29 locations on six continents. We tested CLEVER on 12 major Ubisoft systems. The results show that CLEVER can detect risky commits with 79% precision and 65% recall, which outperforms the performance of Commit-guru (66% precision and 63% recall) when applied to the same dataset. In addition, 66.7% of the proposed fixes were accepted by at least one Ubisoft software developer, making CLEVER an effective and practical approach for the detection and resolution of risky commits.

7.2 Approach

Figures 27, 28 and 29 show an overview of the CLEVER approach, which consists of two parallel processes. In the first process (Figures 27 and 28), CLEVER manages events happening on project tracking systems in order to extract fault-introducing commits and their corresponding fixes. For simplicity, in the rest of this chapter, we refer to commits that are used to fix defects as fix-commits, and we use the term defect-commit to mean a commit that introduces a fault. The project tracking component of CLEVER listens to bug (or issue) closing events of Ubisoft projects. Currently, CLEVER is tested on 12 large Ubisoft projects. These projects share many dependencies. We clustered them based on their dependencies with the aim of improving the accuracy of CLEVER. This clustering step is important in order to identify faults that may exist due to dependencies, while enhancing the quality of the proposed fixes.

non-risky. If the commit is classified as non-risky, then the process stops, and the commit can be transferred from the developer's workstation to the central repository. Risky commits, on the other hand, are further analysed in order to reduce the number of false positives (healthy commits that are detected as risky). We achieve this by first extracting the code blocks that are modified by the developer and then comparing them to the code blocks of known fault-introducing commits.

7.2.1 Clustering Projects

We cluster projects using the same technique as BIANCA (i.e., the Girvan–Newman algorithm [62, 144] applied to the dependencies) and for the same reasons (i.e., projects that share dependencies are likely to contain defects caused by the misuse of these dependencies). Within the context of Ubisoft, dependencies can be external or internal depending on whether the products are created in-house or supplied by a third party. For confidentiality reasons, we cannot reveal the names of the projects involved in the project dependency graph. We show the 12 projects in yellow with their dependencies in blue in Figure 30. In total, we discovered 405 distinct dependencies. Dependencies can be internal (i.e., libraries developed at Ubisoft) or external (i.e., libraries provided by third parties). The resulting partitioning is shown in Figure 31. At Ubisoft, dependencies are managed within the framework of a single repository, which makes their automatic extraction possible. The dependencies could also be automatically retrieved if the projects used a dependency manager such as Maven.

7.2.2 Building a Database of Code Blocks of Defect-Commits and Fix-Commits

In order to build the database of code blocks of defect-commits and fix-commits, we use the same technique as for BIANCA. First, we listen to issue-closing events and extract the blocks of code belonging to incriminated commits (i.e., commits known to have introduced a bug) using our refined versions of TXL and NICAD.


7.2.3 Building a Metric-Based Model

We adapted Commit-guru [164] to build the metric-based model. Commit-guru uses a list of keywords proposed by Hindle et al. [74] to classify commits as maintenance, feature or fix. Then, it uses the SZZ algorithm to find the defect-commit linked to each fix-commit. For each defect-commit, Commit-guru computes the following change metrics: la (lines added), ld (lines deleted), nf (number of modified files), ns (number of modified subsystems), nd (number of modified directories), en (distribution of modified code across each file), lt (lines of code in each file (sum) before the commit), ndev (the number of developers that modified the files in a commit), age (the average time interval between the last and the current change), exp (number of changes previously made by the author), rexp (experience weighted by the age of the files (1 / (n + 1))), sexp (previous changes made by the author in the same subsystem), loc (total number of modified lines of code across all files), and nuc (number of unique changes to the files). Then, a statistical model is built using the metric values of the defect-commits. Using linear regression, Commit-guru is able to predict whether incoming commits are risky or not. We had to modify Commit-guru to fit the context of this study. First, we used information found in Ubisoft's internal project tracking system to classify the purpose of a commit (i.e., maintenance, feature or fix). In other words, CLEVER only classifies a commit as a defect-commit if it is the root cause of a fix linked to a crash or a bug in the internal project tracking system. Using internal pre-commit hooks, Ubisoft developers must link every commit to a given task #ID. If the task #ID entered by the developer matches a bug or crash report within the project tracking system, then we perform the SCM blame/annotate function on all the modified lines of code for their corresponding files on the fix-commit's parents. This returns the commits that previously modified these lines of code, which are flagged as defect-commits. Another modification concerns the classification algorithm: we did not use linear regression but the random forest algorithm [188, 189] instead, which turned out to be more effective, as described in Section 7.4. Finally, we had to rewrite Commit-guru in GoLang for performance and internal reasons.
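For illustration, the metric-based classifier can be sketched with scikit-learn as follows. This is a sketch only, not CLEVER's GoLang implementation; the CSV input and the label column name are hypothetical, and the feature names mirror the change metrics listed above.

# Illustrative sketch of the metric-based model: a random forest trained on the change
# metrics listed above. The data file and column names are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

METRICS = ["la", "ld", "nf", "ns", "nd", "en", "lt",
           "ndev", "age", "exp", "rexp", "sexp", "loc", "nuc"]

def train_risk_model(csv_path):
    data = pd.read_csv(csv_path)                       # one row per historical commit
    X, y = data[METRICS], data["introduced_defect"]    # label produced by the SZZ step
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    print("held-out accuracy:", model.score(X_test, y_test))
    return model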

7.2.4 Comparing Code Blocks

Each time a developer makes a commit, CLEVER intercepts it using a pre-commit hook and classifies it as risky or not. If the commit is classified as risky by the metric-based classifier, then we extract the corresponding code block (in a similar way as in the previous phase) and compare it to the code blocks of historical defect-commits. If there is a match, then the new commit is deemed to be risky. A threshold α is used to assess the extent beyond which two commits are considered similar. Once again, we reuse the block comparison approach created for BIANCA to compare blocks of code.

7.2.5 Classifying Incoming Commits

As discussed in Section 7.2.3, a new commit goes through the metric-based model first (Steps 1 to 4). If the commit is classified as non-risky, we simply let it through and stop the process. If the commit is classified as risky, however, we continue with Steps 5 to 8 of our approach. One may wonder why we needed a metric-based model in the first place; we could have resorted to clone detection as the main mechanism, as described in the previous chapter. The main reason for having the metric-based model is efficiency. If each commit had to be analysed against all known signatures using code clone similarity, CLEVER would be impractical at Ubisoft's scale. We estimate that, on an average workday (i.e., thousands of commits), if all commits had to be compared against all signatures on the same cluster we used for our experiments, it would take around 25 minutes to process a commit with the current processing power dedicated to CLEVER. In comparison, it takes, on average, 3.75 seconds with the current two-step approach.
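The two-step flow just described can be sketched as follows. The model, block extraction and clone matching are placeholders for the components described in Sections 7.2.3 and 7.2.4; this is an illustration of the control flow, not the actual implementation.

# Sketch of the two-step classification: the cheap metric-based model screens every
# commit, and only the commits it flags go through the expensive clone comparison.
def classify_commit(commit, metric_model, extract_blocks, clone_match, corpus):
    features = commit.metrics()                 # la, ld, nf, ... for this commit
    if not metric_model.predict([features])[0]:
        return "not-risky"                      # Steps 1 to 4: let the commit through
    for block in extract_blocks(commit):        # Steps 5 to 8: clone detection
        match = clone_match(block, corpus)      # matching defect-commit, if any
        if match is not None:
            return ("risky", match)             # the match also gives access to its fix
    return "not-risky"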

7.2.6 Proposing Fixes

An important aspect of the design of CLEVER is the ability to provide guidance to developers on how to improve risky commits. We achieve this by extracting from the database the fix-commit corresponding to the top matching defect-commit and presenting it to the developer. We believe that this makes CLEVER a practical approach. Developers can understand why a given modification has been reported

as risky by looking at code instead of simple metrics, as in the case of the studies reported in [92, 164]. Finally, using the fixes of past defects, we can provide a solution, in the form of a contextualised diff, to developers. A contextualised diff is a diff that is modified to match the current workspace of the developer in terms of variable types and names. In Step 8 of Figure 29, we adapt the matching fixes to the actual context of the developer by modifying the indentation depth and the variable names in an effort to reduce context switching. This is simply done by replacing the abstracted variables and types from the proposed solution with their matching names and types from the code under analysis. We believe that this makes it easier for developers to understand the proposed fixes and to see whether they apply to their situation. All the proposed fixes come from projects in the same cluster as the project receiving the risky commit. Thus, developers have access to fixes that should be easier to understand, as they come from projects similar to theirs inside the company.
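The contextualisation step can be illustrated with the following sketch: the abstracted identifiers of a stored fix are replaced with the names found in the developer's code, and the block is shifted to the target indentation. The abstraction markers ($VAR1, $TYPE1, ...) are hypothetical placeholders, not the actual markers used by CLEVER.

# Sketch of contextualising a proposed fix: substitute placeholder identifiers with the
# developer's names and adjust the indentation of the whole block.
def contextualize(fix_lines, name_map, base_indent=""):
    out = []
    for line in fix_lines:
        for placeholder, concrete in name_map.items():
            line = line.replace(placeholder, concrete)
        out.append(base_indent + line)   # shift the block to the target indentation
    return out

# Example: adapt an abstracted null-check fix to the developer's variable name.
fix = ["if ($VAR1 == null) {", "    return null;", "}"]
print("\n".join(contextualize(fix, {"$VAR1": "aliasValue"}, base_indent="        ")))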

7.3 Experimental Setup

In this section, we present the setup of our case study in terms of repository selection, dependency analysis, comparison process and evaluation measures.

7.3.1 Project Repository Selection

In collaboration with Ubisoft developers, we selected 12 major software projects (i.e., systems) developed at Ubisoft to evaluate the effectiveness of CLEVER. These systems continue to be actively maintained by thousands of developers. Ubisoft projects are organized by game engine; a game engine can be used in the development of many high-budget games, and the projects selected for this case study are related to the same game engine. For confidentiality and security reasons, neither the names nor the characteristics of these projects are provided. We can, however, disclose that these systems altogether consist of millions of files, hundreds of millions of lines of code and hundreds of thousands of commits. All 12 systems are AAA video games.

Figure 32: Process of Comparing New Commits

7.3.2 Project Dependency Analysis

Figure 30 shows the project dependency graph. As shown in Figure 30, these projects are highly interconnected. A review of each cluster shows that this partitioning divides projects in terms of their high-level functionalities. For example, one cluster is related to a given family of video games, whereas another cluster refers to a different family. We showed this partitioning to 11 experienced software developers and asked them to validate it. They all agreed that the results of this automatic clustering are accurate and reflect well the various project groups of the company. The clusters are used to decrease the rate of false positives. In addition, fixes mined across projects within the same cluster are of good quality, as shown in our experiments.

7.3.3 Building a Database of Defect-Commits and Fix-Commits

To build the database used to assess the performance of CLEVER, we follow the same process as discussed in Section 7.2.2. We retrieve the full history of each project and label commits as defect-commits if they appear to be linked to a closed issue according to the SZZ algorithm [101]. This baseline is used to compute the precision and recall of CLEVER. Each time CLEVER classifies a commit as risky, we check whether the risky commit is in the database of defect-introducing commits. The same evaluation process is used by related studies [24, 54, 92, 107, 117].
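The following condensed sketch illustrates this SZZ-style labelling; the issue-id pattern and the Blamer abstraction are assumptions for illustration, while the actual implementation relies on the SZZ algorithm [101] and Ubisoft's own trackers and version control systems.

import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of SZZ-style labelling: a commit is treated as a fix-commit when its
// message references a closed issue, and the commits that last touched the
// lines removed by the fix are labelled defect-commits. Pattern and interface
// are illustrative assumptions.
public class SzzLabeller {

    // e.g. "PROJ-1234" style issue keys; purely illustrative.
    private static final Pattern ISSUE_ID = Pattern.compile("[A-Z]+-\\d+");

    /** Abstraction over "git blame" (or its equivalent in another VCS). */
    interface Blamer {
        /** Returns the commit that introduced the given (pre-fix) line. */
        String lastCommitTouching(String file, int line, String beforeCommit);
    }

    static boolean isFixCommit(String message, Set<String> closedIssues) {
        Matcher m = ISSUE_ID.matcher(message);
        while (m.find()) {
            if (closedIssues.contains(m.group())) {
                return true;
            }
        }
        return false;
    }

    /** Labels the commits that introduced the lines removed by a fix-commit. */
    static List<String> defectCommits(Blamer blamer, String fixCommit,
                                      String file, List<Integer> removedLines) {
        List<String> defects = new ArrayList<>();
        for (int line : removedLines) {
            defects.add(blamer.lastCommitTouching(file, line, fixCommit));
        }
        return defects;
    }
}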

7.3.4 Process of Comparing New Commits

As with the evaluation of PRECINCT and BIANCA, the repository is cloned, rewound, and replayed commit by commit in order to evaluate the performance of CLEVER. While Figure 32 illustrates the process followed by CLEVER, it does not cover all cases. Indeed, the user who experiences the defect can be internal (e.g., another developer or a tester) or external (e.g., a player). In addition, many other projects receive commits in parallel, and these commits all have to be compared against all the known signatures.

7.4 Empirical Validation

In this section, we show the effectiveness of CLEVER in detecting risky commits using a combination of metric-based models and clone detection. The main research question addressed by this case study is: Can we detect risky commits by combining metrics and code comparison within and across related Ubisoft projects, and if so, what would be the accuracy? The experiments took nearly two months using a cluster of six machines, each with twelve 3.6 GHz cores and 32 GB of RAM. The most time-consuming part of the experiment consists of building the baseline, as each commit must be analysed with the SZZ algorithm. Once the baseline was established and the model built, it took, on average, 3.75 seconds to analyse an incoming commit on our cluster. In the following subsections, we provide insights on the performance of CLEVER by comparing it to Commit-guru [164] alone, i.e., an approach that relies only on metric-based models. We chose Commit-guru because it has been shown to outperform other techniques (e.g., [92, 107]). Commit-guru is also open source and easy to use.

7.4.1 Performance of CLEVER

When applied to the 12 Ubisoft projects, CLEVER detects risky commits with an average precision, recall, and F1-measure of 79.10%, 65.61%, and 71.72%, respectively. For clone detection, we used a threshold of 30%. This is because Roy et al. [166] showed through empirical studies that using NICAD with a threshold of 30% provides good

results for the detection of Type 3 clones. In addition, we performed the same experiment discussed in Section 7.3.4 and found that α = 30% was the best setting for this new dataset. When applied to the same projects, Commit-guru achieves an average precision, recall, and F1-measure of 66.71%, 63.01%, and 64.80%, respectively. We can see that with the second phase of CLEVER (clone detection) there is a considerable reduction in the number of false positives (a precision of 79.10% for CLEVER compared to 66.71% for Commit-guru) while achieving a similar recall (65.61% for CLEVER compared to 63.01% for Commit-guru).

Table 7: Workshop results

     F1        F2        F3        F4        F5      F6        F7      F8        F9        F10       F11       F12
P1   Accepted  Rejected  Accepted  Accepted  Unsure  Accepted  Unsure  Rejected  Rejected  Accepted  Accepted  Unsure
P2   Accepted  Rejected  Accepted  Unsure    Unsure  Accepted  Unsure  Rejected  Rejected  Accepted  Accepted  Unsure
P3   Accepted  Rejected  Accepted  Unsure    Unsure  Accepted  Unsure  Rejected  Rejected  Accepted  Accepted  Unsure
P4   Accepted  Rejected  Accepted  Unsure    Unsure  Accepted  Unsure  Accepted  Rejected  Accepted  Accepted  Unsure
P5   Accepted  Rejected  Accepted  Accepted  Unsure  Accepted  Unsure  Rejected  Rejected  Accepted  Accepted  Unsure
P6   Accepted  Rejected  Accepted  Unsure    Unsure  Accepted  Unsure  Accepted  Rejected  Accepted  Accepted  Unsure
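As a quick check, these averages are consistent with the harmonic mean of precision and recall:

F1 = 2 · P · R / (P + R) = (2 × 79.10 × 65.61) / (79.10 + 65.61) ≈ 71.72% for CLEVER, and (2 × 66.71 × 63.01) / (66.71 + 63.01) ≈ 64.80% for Commit-guru.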

7.4.2 Analysis of the Quality of the Fixes Proposed by CLEVER

In order to validate the quality of the fixes proposed by CLEVER, we conducted an internal workshop to which we invited a number of people from Ubisoft's development teams. The workshop was attended by six participants: two software architects, two developers, one technical lead, and one IT project manager. The participants have many years of experience at Ubisoft. The participants were asked to review 12 randomly selected fixes proposed by CLEVER. These fixes are related to one system in which the participants have excellent knowledge. We presented them with the original buggy commits, the original fixes for these commits, and the fixes that were automatically extracted by CLEVER. For each fix, we asked them the following question: “Is the proposed fix applicable in the given situation?”. The review session took around 50 minutes, not including the time it took to explain the objective of the session, the setup, the collection of their feedback, etc. We asked the participants to rank each fix proposed by CLEVER using the following scheme:

• Fix Accepted: The participant found the fix proposed by CLEVER applicable to the risky commit.

• Unsure: The participant is unsure about the relevance of the fix. There might be a need for more information to arrive at a verdict.

• Fix Rejected: The participant found the fix not applicable to the risky commit.

Table 7 shows the answers of the participants. The columns refer to the fixes proposed by CLEVER, whereas the rows refer to the participants, denoted P1, P2, ..., P6. As we can see from the table, 41.6% of the proposed fixes (F1, F3, F6, F10 and F12) have been accepted by all participants, while 25% have been accepted by at least one member (F4, F8, F11). We analysed the fixes that were rejected by some or all participants to understand the reasons.

F2 was rejected by our participants because the region of the commit that triggered a match is generated code. Although this generated code was pushed into the repositories as part of a bug-fixing commit, the root cause of the bug lies in the code generator itself. Our proposed fix suggests updating the generated code. Because the proposed fix did not apply directly to the bug and the question we asked our reviewers was “Is the proposed fix applicable in the given situation?”, they rejected it. In this occurrence, the proposed fix was indeed not applicable.

F4 was accepted by two reviewers and marked as unsure by the other participants. We believe that this was due to the lack of context surrounding the proposed fix. The participants were unable to determine whether the fix was applicable without knowing what the original intent of the buggy commit was. In our review session, we only provided the reviewers with the regions of the commits that matched existing commits and not the full commits. Full commits can be quite lengthy, as they can contain asset descriptions and generated code in addition to the actual code. In this occurrence, the full context of the commit might have helped our reviewers decide whether F4 was applicable. F5 and F7 were classified as unsure by all our participants for the same reasons.

F8 was rejected by four of the participants and accepted by two. The participants argued that the proposed fix was more a refactoring opportunity than an actual fix. F12 was marked as unsure by all the reviewers because the code had to do with a subsystem that is maintained by another team, and the participants felt that it was out of the scope of this session.

After the session, we asked the participants two additional questions: Will you use CLEVER in the future? and What aspects of CLEVER need to be improved? All the participants answered the first question favourably. They also proposed to embed CLEVER into Ubisoft's quality assurance tool suite. The participants reported that the most useful aspects of CLEVER are:

• The ability to leverage many years of historical data from inter-related projects, allowing development teams to share their experience in fixing bugs.

• The easy integration of CLEVER into developers' workflow, thanks to the tool's ability to operate at commit-time.

• The precision and recall of the tool (79% and 65%, respectively), demonstrating CLEVER's capability to catch many defects that would otherwise end up in the code repository.

For the second question, the participants proposed to add a feedback loop to CLEVER where the input of software developers is taken into account during classification. The objective is to reduce the number of false negatives (risky commits that are flagged as non-risky) and false positives (non-risky commits that are flagged as risky). The feedback loop mechanism would work as follows. When a commit is misclassified by the tool, the software developer can choose to ignore CLEVER's recommendation and report the misclassified commit. If the proposed fix is not used, we automatically give that particular pattern less strength relative to other patterns. We do not need manual input from the user because CLEVER knows the state of the commit before and after the recommendation; if both versions are identical, we can mark the recommendation as not helpful. This way, we can also compensate for human error (i.e., a developer rejecting a CLEVER recommendation when the commit was indeed introducing a defect). We would know this by using the same process that allowed us to build our database of defect-commits, as described in Section 7.2.2. This feature is currently under development.

It is worth noting that the Ubisoft developers who participated in this study did not think that the CLEVER fixes deemed irrelevant were a barrier to the deployment of CLEVER. In their view, the performance of CLEVER in terms of classification should make a significant impact, as suspicious commits will receive extended reviews and/or further investigation. We are also investigating the use of adaptive learning techniques to improve the classification mechanism of CLEVER. In addition, the participants discussed a limitation of CLEVER, namely its inability to deal with automatically generated code. We are currently working with Ubisoft's developers to address this limitation.
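A minimal sketch of such a feedback loop is shown below, assuming that each defect signature carries a weight that is decayed whenever its proposed fix goes unused; since the feature is still under development, this is only an illustration, not its actual design.

import java.util.HashMap;
import java.util.Map;

// Sketch of the feedback loop described above. Each known defect signature
// carries a weight; when a recommendation is ignored (the commit is identical
// before and after the recommendation), the weight of the matching signature
// is decayed so that it counts less in future classifications. The weighting
// scheme is an assumption.
public class FeedbackLoop {

    private final Map<String, Double> signatureWeights = new HashMap<>();
    private static final double DECAY = 0.9;

    double weightOf(String signatureId) {
        return signatureWeights.getOrDefault(signatureId, 1.0);
    }

    /** Called after a recommendation, with the commit content before and after it. */
    void recordOutcome(String signatureId, String commitBefore, String commitAfter) {
        if (commitBefore.equals(commitAfter)) {
            // The proposed fix was not used: weaken this signature.
            signatureWeights.put(signatureId, weightOf(signatureId) * DECAY);
        }
    }
}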

7.5 University-Industry Research Collaboration

In this section, we describe the lessons learned during this industrial collaboration.

7.5.1 Deep understanding of the project requirements

Throughout the design of CLEVER, it was important to have a close collaboration between the research team and the Ubisoft developers. This allowed the research team to understand well the requirements of the project. Through this collaboration, both the research and development teams quickly realized that existing work in the literature was not sufficient to address the project's requirements. In addition, existing studies were mainly tested on open source systems, which may be quite different in structure and size from large industrial systems. In our case, we found that a deep understanding of Ubisoft's ecosystem was an important enabler for many decisions we made in this project, including the fact that CLEVER operates on multiple systems and that it uses a two-phase mechanism. It was also important to come up with a solution that integrates well with the workflow of Ubisoft developers. This required developing CLEVER in a way that integrates well with the entire suite of Ubisoft's version control systems. The key lesson here is to understand well the requirements of a project and its complexity.

7.5.2 Understanding the benefits of the project to both parties

Understanding how the project benefits the company and the university helps both parties align their visions and work towards a common goal and set of objectives. From Ubisoft's perspective, the project provides sound mechanisms for building reliable systems. In addition, the time saved from detecting and fixing defects can be shifted to the development of new functionalities that add value to Ubisoft customers. For the university research team, the project provides an excellent opportunity to gain a better understanding of the complexity of industrial systems and of how research can provide effective and practical solutions. Also, working closely with software developers helps uncover the practical challenges they face within the company's context. Companies vary significantly in terms of culture, development processes, maturity levels, etc. Research effort should be directed towards developing solutions that overcome these challenges, while taking into account the organizational context.

7.5.3 Focusing in the Beginning on Low-Hanging Fruits

Low-hanging fruits are quick fixes and solutions. We found that it is a good idea to showcase some quick wins early in the project to show the potential of the proposed solutions. At the beginning of the project, we applied the two-phase process of CLEVER to some small systems with a reasonable number of commits. We showed that the approach improved over the use of metrics alone. We also showed that CLEVER was able to make suggestions on how to fix the detected risky commits. This encouraged us to continue on this path and explore additional features. We continued to follow an iterative and incremental process throughout the project, with knowledge transfer between the University and Ubisoft teams done on a regular basis.

Building a Strong Technical Team: Working on industrial projects requires all sorts of technical skills, including programming in various programming languages, the use of tools, tool integration, etc. The strong technical skills of the lead student of this project were instrumental in its success. It should be noted that Ubisoft systems are programmed using different languages, which complicated the code matching phase of CLEVER. In addition, Ubisoft uses multiple bug management and version control systems. Downloading, processing, and manipulating commits from these various environments requires excellent technical abilities.

7.5.4 Communicating effectively

During the development of CLEVER, we needed to constantly communicate the progress of our research to developers and project owners. Adopting a communication strategy suitable to each stakeholder was important. For example, in our meetings with management, we focused on the ability of CLEVER to improve code quality and reduce maintenance costs rather than on the technical details of the proposed approach. Developers, on the other hand, were interested in the potential of CLEVER and its integration with their work environment.

7.5.5 Managing change

Any new initiative brings with it important changes to the way people work. Managing these changes from the beginning of the project increases the chances of tool adoption. To achieve this, we used a communication strategy that involved all the stakeholders, including software developers and management, to make sure that the potential changes that CLEVER would bring are thoroughly and smoothly implemented, and that the benefits of the change are long-lasting.

7.6 Threats to Validity

We identified two main limitations of our approach, CLEVER, which require further studies. CLEVER is designed to work on multiple related systems. Applying CLEVER to a single system will most likely be less effective; the two-phase classification process of CLEVER would be hindered by the fact that it is unlikely to have a large number of similar bugs within the same system. For single systems, we recommend the use of metric-based models. A metric-based solution, however, may turn out to be ineffective when applied across systems because of the difficulty associated with identifying common thresholds that are applicable to a wide range of systems. The second limitation we identified has to do with the fact that CLEVER is designed to work with Ubisoft systems. Ubisoft uses C#, C, C++, Java and other internally developed languages. It is, however, common to have other languages used in an environment with many inter-related systems. We intend to extend CLEVER to process commits written in other languages as well. The selection of target systems is one of the common threats to validity for approaches aiming to improve the analysis of software systems. It is possible that the selected programs share common properties that we are not aware of, which could invalidate our results. Because of the industrial nature of this study, we had to work with the systems developed by the company. The programs we used in this study are all based on the C#, C, C++ and Java programming languages. This can limit the generalization of our results to projects written in other languages, especially since the main component of CLEVER is based on code clone matching. Finally, part of our analysis of the fixes proposed by CLEVER was based on manual comparisons of these fixes with those proposed by developers, carried out with a focus group composed of experienced engineers and software architects. Although we exercised great care in analysing all the fixes, we may have misunderstood some aspects of the commits. In conclusion, threats to internal and external validity have both been minimized by choosing a set of 12 different systems and using input data that can be found in any programming language and version control system (commits and changesets).

7.7 Chapter Summary

In this chapter, we presented CLEVER (Combining Levels of Bug Prevention and Resolution Techniques), an approach that detects risky commits (i.e., commits that are likely to introduce bugs) with an average precision of 79.10% and an average recall of 65.61%. CLEVER combines code metrics, clone detection techniques, and project dependency analysis to detect risky commits within and across projects. CLEVER operates at commit-time, i.e., before the commits reach the central code repository. Also, because it relies on code comparison, CLEVER does not only detect risky commits but also makes recommendations to developers on how to fix them. We believe that this makes CLEVER a practical approach for preventing bugs and proposing corrective measures that integrates well with the developer's workflow through the commit mechanism. CLEVER is still in its infancy, and we expect it to be made available to thousands of developers this year.

In the next chapter, we present JCHARMING, an approach for reproducing bugs using stack traces. JCHARMING is meant to be used by developers when the commit-time approaches presented in Chapters 5, 6 and 7 did not manage to prevent the introduction of defects.

Chapter 8

Bug Reproduction Using Crash Traces and Directed Model Checking

8.1 Introduction

In previous chapters, we presented three approaches (PRECINCT, BIANCA, and CLEVER) that aim to improve code quality by performing maintenance operations at commit-time. While these approaches have shown good performance, they will never be 100% accurate in preventing defects from reaching customers, and on-field crashes will happen. The first (and perhaps main) step in understanding the cause of a field crash is to reproduce the bug that caused the system to fail. A survey conducted with developers of major open source software systems such as Apache, Mozilla and Eclipse revealed that one of the most valuable pieces of information that can help locate and fix the cause of a crash is the one that can help reproduce it [22]. Bug reproduction is, however, a challenging task because of the limited amount of information provided by the end users. There exist several bug reproduction techniques. They can be grouped into two categories: (a) on-field record and in-house replay [9, 81, 134], and (b) in-house crash explanation [31, 123]. The first category relies on instrumenting the system in order to capture objects and other system components at run-time. When a faulty behavior occurs in the field, the stored objects, as well as the entire heap, are sent to the developers along with the faulty methods to reproduce the crash. These techniques tend to be simple to implement and yield good results, but they suffer from two main limitations. First, code instrumentation comes with a non-negligible overhead on the system. Second, the collected objects may contain sensitive information, causing customer privacy issues. The second category is composed of tools leveraging proprietary data in order to provide hints on potential causes. While these techniques are efficient in improving our comprehension of the bugs, they are not designed with the purpose of reproducing them.

JCHARMING (Java CrasH Automatic Reproduction by directed Model checkING) [138] is a hybrid approach that uses a combination of crash traces and model checking to reproduce bugs that caused field failures automatically. Unlike existing techniques, JCHARMING does not require instrumentation of the code. It does not need access to the content of the heap either. Instead, JCHARMING uses the list of functions that are output when an uncaught exception in Java occurs, i.e., the crash trace, to guide a model checking engine to uncover the statements that caused the crash. Note that a crash trace is sometimes referred to as a stack trace; in this chapter, we use these two terms interchangeably.

Model checking (also known as property checking) is a formal technique for automatically verifying a set of properties of finite-state systems [13]. More specifically, this technique builds a graph where each node represents one state of the program along with the set of properties that need to be verified in each state. For real-world programs, model checking is often computationally impracticable because of the state explosion problem [13]. To address this challenge and apply model checking to large programs, we direct the model checking engine towards the crash using program slicing and the content of the crash trace, and hence reduce the search space. When applied to reproducing bugs of seven open source systems in our initial study, JCHARMING achieved 85% accuracy; when applied to the 30 bugs studied in this chapter, its accuracy is 80%.

program. For example, in a typical environment, the program heap or other memory spaces cannot be modified. Without this constraint, all programs could be tagged as buggy since we could, for example, destroy objects in memory while the program continues its execution. As we are interested in verifying the absence of unhandled exceptions in the SUT, we aim to verify that, for all possible combinations of states and transitions, there is no path leading towards a crash. Note that we might find other potential crashes while analyzing the SUT; we only report our findings, however, when we find a crash that matches the one we are trying to reproduce. That is:

∀x.(SUT, x) |= ¬c (8)

If there exists a contradicting path (i.e., ∃x such that (SUT, x) |= c), then the model checking engine will output the path x (known as the counter-example), which can then be executed. The resulting Java exception crash trace is compared with the original crash trace to assess whether the bug has been reproduced. While being accurate and exhaustive in finding counter-examples, model checking suffers from the state explosion problem, which hinders its applicability to large software systems. To show the contrast between testing and model checking, we use the hypothetical example of Figures 34, 35 and 36 and sketch the possible results of each approach. These figures depict a toy program where, from the entry point, unknown calls are made (dotted points) and, at some point, two methods are called. These methods, called Foo.Bar and Bar.Foo, implement a for loop from 0 to loopCount. The only difference between the two methods is that Bar.Foo throws an exception if i becomes larger than two. Hereafter, we denote this property as p_{i>2}. Figure 34 shows the program statements that could be covered using testing approaches. Testing software is a demanding task where a set of techniques is used to test the SUT according to some input. Software testing depends on how well the tester understands the SUT in order to write relevant test cases that are likely to find errors in the program. Program testing is usually insufficient because it is not exhaustive. In our case, using testing would mean that the tester knows what to look for in order to detect the causes of the failure. We do not assume this knowledge in JCHARMING. Model checking, on the other hand, explores each and every state of the program (Figure 35), which makes it complete, but impractical for real-world and large systems.

Figure 34: A toy program under testing

Figure 35: A toy program under model checking

Figure 36: A toy program under directed model checking
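For concreteness, a minimal sketch of the toy program is shown below; the package and class names follow the bytecode listing discussed later (Figure 43), and the exact exception type is illustrative.

package jsme;

// Minimal sketch of the toy program of Figures 34-36: Bar.foo loops from 0 to
// loopCount and throws once the control variable i exceeds 2 (property p_{i>2}).
// Package and class names follow the bytecode listing of Figure 43; the exact
// exception type is illustrative.
public class Bar {

    public void foo(int loopCount) throws Exception {
        for (int i = 0; i <= loopCount; i++) {
            if (i > 2) {
                throw new Exception("loopTimes should be < 3");
            }
        }
    }

    public static void main(String[] args) throws Exception {
        new Bar().foo(3); // reaches i = 3 and crashes, as in Figure 43
    }
}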

To overcome the state explosion problem of model checking, directed (or guided) model checking has been introduced [52, 53]. Directed model checking uses insights about the SUT, generally heuristics, to reduce the number of states that need to be examined. Figure 36 shows that only the states that may lead to a specific location, in our case the location of the fault, are explored. The challenge, however, is to design techniques that can guide the model checking engine. As we describe in the next section, we use crash traces and program slicing to overcome this challenge. Unlike model checking, directed model checking is not complete. In this work, our objective is not to ensure the absolute correctness of the program, but to use directed model checking to "hunt" for a bug within the program.

8.3 Approach

Figure 37 shows an overview of JCHARMING. The first step consists of collecting crash traces, which contain the raw lines displayed to the standard output when an uncaught exception in Java occurs. In the second step, the crash traces are preprocessed by removing noise (mainly calls to Java standard library methods). The next step is to apply backward slicing using static analysis to expand the information contained in the crash trace while reducing the search space. The resulting slice, along with the crash trace, is given as input to the model checking engine. The model checker executes statements along the paths from the main function to the first line of the crash trace (i.e., the last method executed at crash time, also called the crash location point). Once the model checker finds inconsistencies in the program leading to a crash, we take the crash stack generated by the model checker and compare it to the original crash trace (after preprocessing). The last step is to build a JUnit test to be used by software engineers to reproduce the bug in a deterministic way.

8.3.1 Collecting Crash Traces

The first step of JCHARMING is to collect the crash trace caused by an uncaught exception. Crash traces are usually included in crash reports and can, therefore, be automatically retrieved using a simple regular expression. Figure 38 shows an example of a crash trace that contains the exception thrown when executing the program depicted in Figures 34 to 36. More formally, we define a Java crash trace T


1. javax.activity.InvalidActivityException: loopTimes should be < 3
2.   at Foo.bar(Foo.java:10)
3.   at GUI.buttonActionPerformed(GUI.java:88)
4.   at GUI.access$0(GUI.java:85)
5.   at GUI$1.actionPerformed(GUI.java:57)
6. caused by java.lang.IndexOutOfBoundsException: 3
7.   at jsep.Foo.buggy(Foo.java:17)
8.   and 4 more ...

Figure 38: Java InvalidActivityException is thrown in the Bar.Foo loop if the control variable is greater than 2.
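The exact pattern used by JCHARMING is not given here; the following sketch shows a plausible regular expression for extracting such frames from a bug report.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plausible sketch of the kind of regular expression mentioned above; the
// exact pattern used by JCHARMING is not specified in the thesis.
public class CrashTraceExtractor {

    // Matches frames such as "at GUI.buttonActionPerformed(GUI.java:88)".
    private static final Pattern FRAME =
            Pattern.compile("^\\s*at\\s+([\\w$.]+)\\.([\\w$<>]+)\\(([^)]*)\\)\\s*$",
                            Pattern.MULTILINE);

    public static void main(String[] args) {
        String report = "javax.activity.InvalidActivityException: loopTimes should be < 3\n"
                      + " at Foo.bar(Foo.java:10)\n"
                      + " at GUI.buttonActionPerformed(GUI.java:88)\n";
        Matcher m = FRAME.matcher(report);
        while (m.find()) {
            System.out.println("class=" + m.group(1)
                             + " method=" + m.group(2)
                             + " location=" + m.group(3));
        }
    }
}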

example of nested try-catch blocks.

8.3.2 Preprocessing

In the preprocessing step, we first reconstruct and reorganize the crash trace in order to address the problem of nested exceptions. A nested exception refers to the following structure in Java:

1  java.io.IOException: Spill failed
   ...
14 Caused by: java.lang.IllegalArgumentException
   ...
28 Caused by: java.lang.NullPointerException

In such a case, we want to reproduce the root exception (line 28) that led to the other two (lines 14 and 1). This said, we remove the lines 1 to 14. Then, with the aim of guiding the directed model checking engine optimally, we remove frames that are beyond our control. These refer to Java library methods and third-party libraries. In Figure 38, we can see that Java GUI and event management components appear in the crash trace. We assume that these methods are not the cause of the crash; otherwise, it would mean that there is something wrong with the JDK itself. If this is the case, we will not be able to reproduce the crash. Note that removing these unneeded frames also reduces the search space of the model checker.
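A minimal sketch of this preprocessing is shown below; the package prefix used to decide which frames belong to the SUT is an assumption for illustration.

import java.util.ArrayList;
import java.util.List;

// Sketch of the preprocessing described above: keep only the root exception
// (the last "Caused by:" block) and drop frames that belong to the JDK or to
// third-party code. The package prefix deciding what is "ours" is illustrative.
public class CrashTracePreprocessor {

    static List<String> preprocess(List<String> rawLines, String sutPackagePrefix) {
        // 1. Keep only the root exception: everything from the last "Caused by:".
        int rootStart = 0;
        for (int i = 0; i < rawLines.size(); i++) {
            if (rawLines.get(i).trim().startsWith("Caused by:")) {
                rootStart = i;
            }
        }
        // 2. Drop frames outside the system under test (JDK, third-party libraries).
        List<String> kept = new ArrayList<>();
        for (String line : rawLines.subList(rootStart, rawLines.size())) {
            String trimmed = line.trim();
            boolean isFrame = trimmed.startsWith("at ");
            if (!isFrame || trimmed.startsWith("at " + sutPackagePrefix)) {
                kept.add(line);
            }
        }
        return kept;
    }
}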


frames represent k, i, h, d, b and a, respectively. In the crash trace, f3 is a corrupt frame and no longer matches a location inside the SUT. This can be the result of a copy-paste error or a deliberate modification made by the reporter of the bug, as shown in the case study (see Section 8.5). In such a situation, Algorithm 2 begins by computing the backward static slice between f0 (k) and f1 (i), then between f1 (i) and f2 (h). At this point, we have passed through the for loop (lines 5 to 12) two times, and in both cases the backward static slice was not empty. Consequently, the if statement evaluated to true and we combined both backward static slices in the bSlice variable; bSlice is equal to {k, j, i, h}. Then, we want to compute the backward static slice between f2 (h) and f3 (d). Unfortunately, f3 is corrupted and does not point to a valid location in the SUT. As a result, the slice between f2 (h) and f3 (d) is empty, and we go to the else statement (line 10). Here, we simply increment offset by one in order to compute the backward static slice from f2 (h) to f4 (b) instead of from f2 (h) to f3 (d). f4 is valid, so the backward static slice from f2 (h) to f4 (b) can be computed and merged into bSlice. Finally, we compute the last slice between f4 (b) and f5 (a). The final backward static slice is k, i, h, d, b and a.

Data: Crash Stack, BCode, Entry Point
Result: BSolve
Frame[] frames ← extract frames from crash stack;
Int n ← size of frames;
Int offset ← 1;
BSlice bSlice ← ∅;
for i ← 0; (i < n && offset < n - 1); i++ do
    BSlice currentBSlice ← backward slice from frames[i] to frames[i + offset];
    if currentBSlice ≠ ∅ then
        bSlice ← bSlice ∪ currentBSlice;
        offset ← 1;
    else
        offset ← offset + 1;
    end
end

Algorithm 2: High-level algorithm computing the union of the slices
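The following Java sketch transcribes Algorithm 2 as it is walked through above (the index only advances when a non-empty slice is found, so corrupt frames are skipped via the offset); the Slicer interface is an illustrative abstraction of the backward static slicer.

import java.util.LinkedHashSet;
import java.util.Set;

// Transcription of Algorithm 2 following the walkthrough above. The Slicer
// abstraction stands for the backward static slicer; the slice representation
// (a set of program locations) is illustrative.
public class SliceUnion {

    interface Slicer {
        /** Backward static slice from frame 'to' up to frame 'from'; empty if 'to' is corrupt. */
        Set<String> backwardSlice(String from, String to);
    }

    static Set<String> computeUnion(Slicer slicer, String[] frames) {
        Set<String> bSlice = new LinkedHashSet<>();
        int n = frames.length;
        int offset = 1;
        for (int i = 0; i < n && i + offset < n; ) {
            Set<String> current = slicer.backwardSlice(frames[i], frames[i + offset]);
            if (!current.isEmpty()) {
                bSlice.addAll(current);   // valid frame: merge and move on
                i = i + offset;
                offset = 1;
            } else {
                offset++;                 // corrupt frame: skip it and try the next one
            }
        }
        return bSlice;
    }
}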


The sliced SUT is given by the following expression:

Sliced_SUT = ⟨ ⋃_{i=0}^{entry} bslice[f_{i+1} ← f_i] ⊆ SUT, S_0, T.⋃_{i=0}^{entry} bslice[f_{i+1} ← f_i] ⊆ T.SUT, L ⟩    (11)

where ⋃_{i=0}^{entry} bslice[f_{i+1} ← f_i] ⊆ SUT is the subset of states that can be reached in the computed backward slice, S_0 the set of initial states, T.⋃_{i=0}^{entry} bslice[f_{i+1} ← f_i] the subset of the transition relation between states that exist in the computed backward slice, and L the labeling function which labels a state with a set of atomic properties. Then, in the sliced SUT, we try to find:

(Sliced_SUT, x) |= p_{i>2}    (12)

That is, there exists a sequence of state transitions x that satisfies p_{i>2}. The only frame that needs to be valid for the backward static slice to be meaningful is f0. In Figure 38, f0 is at Foo.bar(Foo.java:10). If this line of the crash trace is corrupt, then JCHARMING cannot perform the slicing because it does not know where to start the backward slicing. The result is non-directed model checking, which is likely to fail.

8.3.4 Directed Model Checking

The model checking engine we use in this chapter is called JPF (Java PathFinder) [195], which is an extensible JVM for Java byte code verification. This tool was first created as a front-end for the SPIN model checker [75] in 1999 before being open-sourced in 2005. The JPF model checker can execute all the byte code instructions through a custom JVM, known as JVM_JPF. Furthermore, JPF is an explicit-state model checker, very much like SPIN [75]. This is in contrast to symbolic model checkers based on binary decision diagrams [126]. JPF designers have opted for a depth-first traversal with backtracking because of its ability to check for temporal liveness properties. More specifically, JPF's core checks for defects that can be detected without defining any property. These defects are called non-functional properties in JPF and cover deadlocks, unhandled exceptions, and assert expressions. In JCHARMING,

we leverage the non-functional properties of JPF, as we want to compare the crash trace produced by unhandled exceptions to the crash trace of the bug at hand. In other words, we do not need to define any property ourselves. This said, in JPF one can define properties by implementing listeners (very much like what we did in Section 8.3.6) that can monitor all actions taken by JPF, which enables the verification of temporal properties for sequential and concurrent Java programs. One of the popular listeners of JPF is jpf-ltl. This listener supports the verification of method invocations and of local and global program variables. jpf-ltl can verify temporal properties of method call sequences, linear relations between program variables, and the combination of both. We intend to investigate the use of jpf-ltl and the LTL logic to check multi-threading-related crashes as part of future work. JPF is organized around five simple operations: (i) generate states, (ii) forward, (iii) backtrack, (iv) restore state and (v) check. In the forward operation, the model checking engine generates the next state s_{t+1}. Each state consists of three distinct components:

• The information of each thread, more specifically a stack of frames corresponding to method calls.

• The static variables of a given class.

• The instance variables of a given object.

If s_{t+1} has successors, it is saved in a backtrack table to be restored later. The backtrack operation consists of restoring the last state in the backtrack table. The restore operation allows restoring any state; it can also be used to restore the entire program as it was the last time we chose between two branches. After each forward, backtrack, and restore state operation, the check properties operation is triggered. In order to direct JPF, we have to modify the generate states and the forward steps. The generate states step is populated with the states in ⋃_{i=0}^{entry} bslice[f_{i+1} ← f_i] ⊂ SUT, and we adjust the forward step so that it explores a state only if the target state s_{i+1} and the transition x from the current state s_i to s_{i+1} are in ⋃_{i=0}^{entry} bslice[f_{i+1} ← f_i] ⊂ SUT and x.⋃_{i=0}^{entry} bslice[f_{i+1} ← f_i] ⊂ x.SUT, respectively. In other words, JPF does not generate each possible state of the system under test. Instead, JPF generates and explores only the states that fall inside the backward static slice we computed in the previous step. As shown in Figure 41, our backward static slice can greatly reduce the search space and is able to compensate for corrupted frames. In short, the idea is that instead of going through all the states of the program as a complete model checker would do, the backward slice directs the model checker to explore only the states that might reach the targeted frame.

8.3.5 Validation

To validate the result of directed model checking, we modify the check properties step, which checks whether the current sequence of state transitions x satisfies a set of properties. If the current state transition x can throw an exception, we execute x and compare the exception thrown to the original crash trace (after preprocessing). If the two exceptions match, we conclude that the conditions needed to trigger the failure have been met and the bug is reproduced. However, as argued by Kim et al. in [97], the same failure can be reached from different paths of the program. Although the states executed to reach the defect are not exactly the same, they might be useful to enhance the developers' understanding of the bug and speed up the deployment of a fix. Therefore, in this chapter, we consider a defect to be partially reproduced if the crash trace generated by the model checker matches the original crash trace by a factor of t, where t is a threshold specified by the user. The threshold t represents the percentage of identical frames between both crash traces. For our experiments (see Section 8.5), we set the value of t to 80%. The choice of t should be guided by the need to find the best trade-off between the reproducibility of the bug and the relevance of the generated test cases (the tests should help reproduce the on-field crash). To determine the best t, we made several incremental attempts, starting from t = 10%. For each attempt, we increased t by 5% and observed the number of bugs reproduced and the quality of the generated tests. Setting t = 80% provided the best trade-off. This said, we anticipate that the tool implementing our technique should allow software engineers to vary t depending on the bugs and the systems under study. Based on our observations, we recommend, however, setting t to 80% as a baseline. It should also be noted that we deliberately prevented JCHARMING from performing directed model checking with a threshold below 50%, because the tests generated with such a low threshold during our experiments did not yield useful results.
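A minimal sketch of this similarity check is shown below; comparing frames position by position is an assumption, as the text only states that t is the percentage of identical frames between the two crash traces.

import java.util.List;

// Sketch of the frame-level similarity check: the percentage of identical
// frames (after preprocessing) must reach the threshold t for the bug to be
// considered (partially) reproduced. Positional comparison is an assumption.
public class TraceSimilarity {

    static double similarity(List<String> original, List<String> generated) {
        if (original.isEmpty() && generated.isEmpty()) {
            return 100.0;
        }
        int common = Math.min(original.size(), generated.size());
        int identical = 0;
        for (int i = 0; i < common; i++) {
            if (original.get(i).equals(generated.get(i))) {
                identical++;
            }
        }
        return 100.0 * identical / Math.max(original.size(), generated.size());
    }

    static boolean reproduced(List<String> original, List<String> generated, double t) {
        return similarity(original, generated) >= t;   // e.g. t = 80.0
    }
}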

8.3.6 Generating Test Cases for Bug Reproduction

To help software developers reproduce the crash in a lab environment, we automatically produce the JUnit test cases necessary to run the SUT in a way that triggers the bug. To build a test suite that reproduces a defect, we need to create a set of objects used as arguments for the methods that will enable us to travel from the entry point of the program to the defect location. JPF has the ability to keep track of what happens during model checking in the form of traces containing the visited states and the values of the variables. We leverage this capability to create the required objects and call the methods leading to the failure location. During the testing of the SUT, JPF emits a trace composed of the executed instructions. For large systems, this trace can contain millions of instructions. It is stored in memory and can therefore be queried while JPF is running. However, accessing the trace during the JPF execution considerably slows down the checking process, as both the querying mechanism and the JPF engine compete for resources. In order to allow JPF to use 100% of the available resources and still be able to query the executed instructions, we implemented a listener that listens to the JPF trace emitter. Each time JPF processes a new instruction, our listener catches it and saves it into a MongoDB database to be queried in a post-mortem fashion. Figure 43 presents a high-level architecture of the components of the JUnit test generation process. When the validation step produces a crash stack with a similarity larger than the factor t, the JUnit generation engine queries the MongoDB database and fetches the sequence of instructions that led to the crash of interest. Figure 43 contains a hypothetical sequence of instructions related to the example of Figures 34, 35 and 36, which reads: new jsme.Bar, invokespecial jsme.Bar(), astore_1 [bar], aload_1 [bar], iconst_3, invokevirtual jsme.Bar.foo(int), const 0, istore_2 [i], goto, iload_2 [i], iconst_2, if_icmple, iinc 2 1, new java.lang.Exception. From this sequence, we know that, to reproduce the crash of interest, we have to (1) create a new jsme.Bar object (new jsme.Bar, invokespecial jsme.Bar()), (2) store the newly created object in a variable named bar (astore_1 [bar]), and (3) invoke the method jsme.Bar.foo(int) on the bar object with 3 as its value (aload_1 [bar], iconst_3, invokevirtual jsme.Bar.foo(int)). Then, the jsme.Bar.foo(int) method will execute the for loop from i = 0 until i = 3 and throw an exception at i = 3 (const 0, istore_2 [i], goto, iload_2 [i], iconst_2, if_icmple, iinc 2 1, new java.lang.Exception).
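A sketch of such a listener is shown below; the callback signature follows JPF version 8 and the database and collection names are assumptions, so the actual listener used in JCHARMING may differ.

import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import gov.nasa.jpf.ListenerAdapter;
import gov.nasa.jpf.vm.Instruction;
import gov.nasa.jpf.vm.ThreadInfo;
import gov.nasa.jpf.vm.VM;
import org.bson.Document;

// Sketch of the trace-recording listener described above: every executed
// instruction is pushed to a MongoDB collection so that the JUnit generator
// can query it post-mortem. Callback signature (JPF v8) and database names
// are assumptions.
public class TraceToMongoListener extends ListenerAdapter {

    private final MongoCollection<Document> trace =
            MongoClients.create("mongodb://localhost:27017")
                        .getDatabase("jcharming")
                        .getCollection("trace");

    private long sequence = 0;

    @Override
    public void instructionExecuted(VM vm, ThreadInfo currentThread,
                                    Instruction nextInsn, Instruction executedInsn) {
        // Store the executed instruction with its position in the trace.
        trace.insertOne(new Document("seq", sequence++)
                                .append("insn", executedInsn.toString()));
    }
}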


public class {% SUT %}{% BUG %} extends TestCase {

    private static final String failure = {% CRASH STACK %};
    private static final int threshold = {% THRESHOLD %};
    private static int differences = Integer.MAX_VALUE;
    private static final StringTokenizer tokenizerFailure =
            new StringTokenizer(failure, "\n");

    @Test
    public void test{% SUT %}() {
        try {
            {% STEPS %}
        } catch (Exception e) {
            // Count the differences
        }
        assertTrue(differences <=
                tokenizerFailure.countTokens() / 100 * (100 - threshold));
    }
}

Figure 44: Simplified Unit Test template
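For the toy example of Figures 34 to 36, an instantiation of this template could look like the following; it assumes the jsme.Bar class sketched earlier and elides the similarity computation for brevity.

import static org.junit.Assert.assertTrue;
import org.junit.Test;

// Illustrative instantiation of the template of Figure 44 for the toy example:
// the generated steps rebuild the sequence extracted from the trace
// (new jsme.Bar(), then bar.foo(3)) and the thrown exception is compared to
// the original crash trace.
public class JsmeBarCrashTest {

    private static final double THRESHOLD = 80.0;

    @Test
    public void testJsmeBarCrash() {
        double similarity = 0.0;
        try {
            jsme.Bar bar = new jsme.Bar();   // new jsme.Bar, invokespecial jsme.Bar()
            bar.foo(3);                      // iconst_3, invokevirtual jsme.Bar.foo(int)
        } catch (Exception e) {
            similarity = compare(e);         // frame-by-frame comparison with the original trace
        }
        assertTrue(similarity >= THRESHOLD);
    }

    private double compare(Exception e) {
        // Placeholder: the generated tests compare e's stack trace with the
        // original crash trace, as in Figure 44.
        return 100.0;
    }
}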

8.4 Experimental Setup

In this section, we present the effectiveness of JCHARMING in reproducing bugs in ten open source systems. The case studies aim to answer the following question: Can we use crash traces and directed model checking to reproduce on-field bugs in a reasonable amount of time?

Table 8: List of target systems in terms of kilo lines of code (KLoC), number of classes (NoC) and Bug #ID

SUT         KLoC    NoC     Bug #ID
Ant         265     1,233   38622, 41422
ArgoUML     58      1,922   2603, 2558, 311, 1786
dnsjava     33      182     38
jfreechart  310     990     434, 664, 916
Log4j       70      363     11570, 40212, 41186, 45335, 46271, 47912, 47957
MCT         203     1,267   440ed48
pdfbox      201     957     1412, 1359
Hadoop      308     6,337   2893, 3093, 11878
Mahout      287     1,242   486, 1367, 1594, 1635
ActiveMQ    205     3,797   1054, 2880, 5035
Total       1,517   17,348  30

8.4.1 Targeted Systems

Table 8 shows the systems and their characteristics in terms of kilo lines of code (KLoC) and number of classes (NoC). Apache Ant [4] is a popular command-line build tool. While it is mainly known for building Java applications, Apache Ant can also build C and C++ applications. We chose to analyze Apache Ant because other researchers have used it in similar studies. ArgoUML [38] is one of the major players among the open source UML modeling tools. It has many years of bug management history and, similar to Apache Ant, it has been extensively used as a test subject in many studies. Dnsjava [200] is an implementation of the DNS mechanisms in Java. This tool can be used for queries, zone transfers, and dynamic updates. It is not as large as the other two, but it still makes an interesting case subject because it has been well maintained for the past decade. This tool is also used in many other popular tools such as Aspirin, Muffin and Scarab. JfreeChart [150] is a well-known library that enables the creation of professional charts. Similar to dnsjava, it has been maintained over a very long period of time.

JfreeChart was created in 2005. It is a relatively large application. Apache Log4j [185] is a logging library for Java. It is not a very large library, but thousands of programs use it extensively. As with other Apache projects, this tool is well maintained by a strong open source community and allows developers to submit bugs. The bugs in the bug reporting system of Log4j are generally well documented. In addition, the majority of bugs contain crash traces, which makes Log4j a good candidate system for this study. MCT [135] stands for Mission Control Technologies and was developed by the NASA Ames Research Center (the creators of JPF) for use in spaceflight mission operations. This tool benefits from two years of history and targets a very critical domain, space mission control. Therefore, this tool has to be particularly carefully tested and, consequently, the remaining bugs should be hard to discover and reproduce. PDFBox [5] is another tool supported by the Apache Software Foundation since 2009; it was created in 2008. PDFBox allows the creation of new PDF documents and the manipulation of existing documents. Hadoop [7] is a framework for storing and processing large datasets in a distributed environment. It contains four main modules: Common, HDFS, YARN and MapReduce. In this chapter, we study the Common module, which contains the different libraries required by the other three modules. Mahout [8] is a relatively new software application built on top of Hadoop. We used Mahout version 0.11, which was released in August 2015. Mahout supports various machine learning algorithms with a focus on collaborative filtering, clustering, and classification. Finally, ActiveMQ [175] is an open source messaging server that allows applications written in Java, C, C++, C#, Ruby, Perl or PHP to exchange messages using various protocols. ActiveMQ has been actively maintained since it became an Apache Software Foundation project in 2005.

8.4.2 Bug Selection and Crash Traces

In this study, we selected the bugs to reproduce randomly in order to avoid introducing any bias. For each SUT, we selected a random number of bugs (ranging from 1 to 10) containing the word “exception” and whose description contains a match to a regular expression designed to find the pattern of a Java exception.

Table 9: Effectiveness of JCHARMING using directed model checking (DMC) in minutes, length of the generated JUnit tests (CE length) and model checking (MC) in minutes

SUT         Bug #ID   Reprod.   Time DMC   CE length   Time MC
Ant         38622     Yes       25.4       3           -
            41422     No        42.3       -           -
ArgoUML     2558      Partial   10.6       3           -
            2603      Partial   9.4        3           -
            311       Yes       11.3       10          -
            1786      Partial   9.9        6           -
DnsJava     38        Yes       4          2           23
jFreeChart  434       Yes       27.3       2           -
            664       Partial   31.2       3           -
            916       Yes       26.4       4           -
Log4j       11570     Yes       12.1       2           -
            40212     Yes       15.8       3           -
            41186     Partial   16.7       9           -
            45335     No        3.2        -           -
            46271     Yes       13.9       4           -
            47912     Yes       12.3       3           -
            47957     No        2          -           -
MCT         440ed48   Yes       18.6       3           -
PDFBox      1412      Partial   19.7       4           -
            1359      No        7.5        -           -
Mahout      486       Partial   34.5       5           -
            1367      Partial   21.1       7           -
            1594      No        14.8       -           -
            1635      Yes       31.0       14          -
Hadoop      2893      Partial   7.4        3           32
            3093      Yes       13.1       2           -
            11878     Yes       17.4       6           -
ActiveMQ    1054      Yes       38.3       11          -
            2880      Partial   27.4       6           -
            5035      No        1          -           -

8.5 Empirical Validation

Table 9 shows the results of JCHARMING in terms of bug #ID, reproduction status, execution time (in minutes) of directed model checking (DMC), length of the counter-example (statements in the JUnit test), and execution time (in minutes) of model checking (MC). The experiments were conducted on a Linux machine with 8 GB of RAM, using Java 1.7.0_51.

• The result is noted as “Yes” if the bug has been fully reproduced, meaning that the crash trace generated by the model checker is identical to the crash trace collected during the failure of the system.

• The result is “Partial” if the similarity between the crash trace generated by the model checker and the original crash trace is above t = 80% and below 100%. Given an 80% similarity threshold, we consider partial reproduction as successful. A different threshold could be used.

• Finally, the result is reported as “No” if either the similarity is below t = 80% or the model checker failed to crash the system given the input we provided.

As we can see in Table 9, we were able to reproduce 24 out of 30 bugs, either completely or partially (an 80% success ratio). The average time to reproduce a bug was 19 minutes, while the average time in cases where JCHARMING failed to reproduce the bug was 11 minutes. The maximum failure time was 42.3 minutes, which was the time required for JCHARMING to fill all the available memory and stop; a “-” denotes that JCHARMING reached the sixty-minute timeout. Finally, we report the number of statements in the produced JUnit test, which represents the length of the counter-example. While reproducing a bug is the first step in understanding the cause of a field crash, the steps to reproduce it should be as few as possible; it is important for counter-examples to be short to help developers provide a fix effectively. On average, JCHARMING's counter-examples were composed of 5.04 Java statements, which, in our view, is reasonable for our approach to be adopted by software developers. This result demonstrates the effectiveness of our approach, more particularly the use of backward slicing to create a manageable search space that guides the model checking engine adequately. We also demonstrated that our approach is usable in practice, since it is time efficient. Among the 30 different bugs we tested, we describe two bugs (chosen randomly) for each category (successfully reproduced, partially reproduced, and not reproduced) for further analysis. The bug reports presented in the following sections are the original reports as submitted by their reporters. As such, they contain typos and spelling mistakes that we did not correct.

8.5.1 Successfully Reproduced

The first bug we describe in this discussion is bug #311 of ArgoUML, submitted against an earlier version of the system. This bug is very simple to reproduce manually thanks to the extensive description provided by the reporter, which reads: I open my first project (Untitled Model by default). I choose to draw a Class Diagram. I add a class to the diagram. The class name appears in the left browser panel. I can select the class by clicking on its name. I add an instance variable to the class. The attribute name appears in the left browser panel. I can't select the attribute by clicking on its name. Exception occurred during event dispatching: The reporter also attached the crash trace presented in Figure 45, which we used as input for JCHARMING. The cause of this bug is that the reference to the attribute of the class was lost after being displayed on the left panel of ArgoUML; therefore, selecting it through a mouse click throws a null pointer exception. In the subsequent version, ArgoUML developers added a TargetManager to keep the reference of such objects in the program. Using the crash trace, JCHARMING's preprocessing step removed lines 11 to 29 because they belong to the Java standard library, and we do not want either the static slicer or the model checking engine to analyse the Java standard library, but only the SUT. The third step then performs the static analysis following the backward slicing process described earlier, and the fourth step performs model checking on the static slice to produce the same crash trace. More specifically, the model checker identifies that the method setTargetInternal(Object o) can receive a null object, which results in a null pointer exception.

1. java.lang.NullPointerException:
2.   at uci.uml.ui.props.PropPanelAttribute.setTargetInternal(PropPanelAttribute.java)
4.   at uci.uml.ui.props.PropPanel.setTarget(PropPanel.java)
5.   at uci.uml.ui.TabProps.setTarget(TabProps.java)
6.   at uci.uml.ui.DetailsPane.setTarget(DetailsPane.java)
7.   at uci.uml.ui.ProjectBrowser.select(ProjectBrowser.java)
8.   at uci.uml.ui.NavigatorPane.mySingleClick(NavigatorPane.java)
9.   at uci.uml.ui.NavigatorPane$NavigatorMouseListener.mouseClicked(NavigatorPane.java)
10.  at java.awt.AWTEventMulticaster.mouseClicked(AWTEventMulticaster.java:211)
11.  at java.awt.AWTEventMulticaster.mouseClicked(AWTEventMulticaster.java:210)
12.  at java.awt.Component.processMouseEvent(Component.java:3168)
...
19.  java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:2068)
22.  at java.awt.Container.dispatchEventImpl(Container.java:1046)
23.  at java.awt.Window.dispatchEventImpl(Window.java:749)
24.  at java.awt.Component.dispatchEvent(Component.java:2312)
25.  at java.awt.EventQueue.dispatchEvent(EventQueue.java:301)
28.  at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:90)
29.  at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)

Figure 45: Crash trace reported for bug ArgoUML #311

The second reproduced bug we describe in this section is bug #486 of Mahout. The submitter (Robin Anil) titled the bug entry Null Pointer Exception running DictionaryVectorizer with ngram=2 on Reuters dataset. He simply copied the crash stack presented in Figure 46 without further explanation. Drew Farris², who was assigned to fix this bug, commented: Looks like this was due to an improper use of the Gram default constructor that happened as a part of the 0.20.2³ refactoring work. While this quick comment, made only two and a half hours after the bug submission, was insightful, as shown in our generated test case⁴, the fix happened in the CollocCombiner class, which is one of the Reducers⁵ available in Mahout. The fix (commit #f13833) involved creating an iterator to combine the frequencies of the Gram and a null check on the final frequency. JCHARMING's preprocessing step removed lines 1 to 14 because they belong to the second thrown exception, since the java.lang.NullPointerException occurred when writing in a ByteArrayOutputStream. JCHARMING aims to reproduce the root exception and not the exceptions that derive from it. This said, the java.io.IOException: Spill failed was ignored, and our directed model checking engine focused, with success, on reproducing the java.lang.NullPointerException.

8.5.2 Partially Reproduced

As an example of a partially reproduced bug, we explore bug #664 of the JfreeChart program. The description provided by the reporter is: In ChartPanel.mouseMoved there's a line of code which creates a new ChartMouseEvent using as first parameter the object returned by getChart(). For getChart() is legal to return null if the chart is null, but ChartMouseEvent's constructor calls the parent constructor which throws an IllegalArgumentException if the object passed in is null. The reporter provided a crash trace containing 42 lines, in which an unknown number of lines had been replaced by the statement <$deleted entry$>. While JCHARMING successfully reproduced a crash yielding almost the same trace as the original one, the <$deleted entry$> statement, which was surrounded by calls to the standard Java library, was not suppressed and stayed in the crash trace. That is, JCHARMING produced only the first 6 (out of 7) lines and reached 83% similarity, and thus a partial reproduction.

² https://issues.apache.org/jira/browse/MAHOUT-486
³ Farris certainly meant 0.10.2, which was the last refactoring of the incriminated class; the current version of Mahout is 0.11.
⁴ As a reminder, the generated test cases are made available at research.mathieu-nayrolles.com/jcharming
⁵ As in Map/Reduce

1  java.io.IOException: Spill failed
2    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:860)
...
14 Caused by: java.lang.NullPointerException
15   at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:86)
16   at java.io.DataOutputStream.write(DataOutputStream.java:90)
17   at org.apache.mahout.utils.nlp.collocations.llr.Gram.write(Gram.java:181)
18   at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize ...
19   at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize ...
20   at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:179)
21   at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:880)
22   at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write ...
23   at org.apache.hadoop.mapreduce.TaskInputOutputContext.write ...
24   at org.apache.mahout.utils.nlp.collocations.llr.CollocCombiner.reduce(CollocCombiner.java:40)
25   at org.apache.mahout.utils.nlp.collocations.llr.CollocCombiner.reduce(CollocCombiner.java:25)
26   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
27   at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1222)
28   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1265)
29   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:686)
30   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1173)

Figure 46: Crash trace reported for bug Mahout #486

1. java.lang.IllegalArgumentException: null source
2.   at java.util.EventObject.<init>(EventObject.java:38)
3.   at
4.   org.jfree.chart.ChartMouseEvent.<init>(ChartMouseEvent.java:83)
5.   at org.jfree.chart.ChartPanel.mouseMoved(ChartPanel.java:1692)
6.

Figure 47: Crash trace reported for bug JFreeChart #664

the standard Java library – was not suppressed and stayed in the crash trace. That is, JCHARMING produced only the first 6 (out of 7) lines and reached 83% similarity, and thus a partial reproduction.

The second partially reproduced bug we present here is Bug #2893, belonging to Hadoop. This bug, reported in February 2008 by Lohit Vijayarenu, was titled checksum exceptions on trunk and contained the following description: While running jobs like Sort/WordCount on trunk I see few task failures with ChecksumException. Re-running the tasks on different nodes succeeds. Here is the stack

1  Map output lost, rescheduling: getMapOutput(task_200802251721_0004_m_000237_0,29) failed:
2  org.apache.hadoop.fs.ChecksumException: Checksum error: /tmps/4/apred-tt/mapred-local/task_200802251721_0004_m_000237_0/file.out at 2085376
3    at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:276)
4    at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:238)
5    at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
6    at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:157)
7    at java.io.DataInputStream.read(DataInputStream.java:132)
8    at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2299)
...
23   at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)

Similarly to the first partially reproduced bug, the crash traces produced by our directed model checking engine and the related test case did not match 100% of the attached crash stack. While JCHARMING successfully reproduced the bug, the crash stack contains timestamp information (e.g., 200802251721) that was logically different in our produced stack trace, as we ran the experiment years later. In all bugs that were partially reproduced, we found that the differences between the crash trace generated by the model checker and the original crash trace (after preprocessing) consist of a few lines only.

8.5.3 Not Reproduced

To conclude the discussion on the case study, we present a case where JCHARMING was unable to reproduce the failure. For bug #47957, belonging to LOG4J and reported in late 2009, the author wrote: Configure SyslogAppender with a Layout class that does not exist; it throws a NullPointerException. Following is the exception trace: and attached the following crash trace:

1. 10052009 01:36:46 ERROR [Default: 1] struts.CPExceptionHandler.execute RID [(null;25KbxlK0voima4h00ZLBQFC;236Al8E60000045C3A7D74272C4B4A61)]
2. Wrapping Exception in ModuleException
3. java.lang.NullPointerException
4. at org.apache.log4j.net.SyslogAppender.append(SyslogAppender.java:250)
5. at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:230)
6. at org.apache.log4j.helper.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:65)
7. at org.apache.log4j.Category.callAppenders(Category.java:203)
8. at org.apache.log4j.Category.forcedLog(Category.java:388)
9. at org.apache.log4j.Category.info(Category.java:663)

The first three lines are not produced by the standard execution of the SUT but by an ExceptionHandler belonging to Struts [6]. Struts is an open source MVC (Model View Controller) framework for building Java web applications. JCHARMING examined the source code of Log4J for the crash location struts.CPExceptionHandler.execute and did not find it, since this method belongs to the source base of Struts – which uses log4j as a logging mechanism. As a result, the backward slice was not produced, and we failed to perform the next steps. It is noteworthy that the bug is marked as a duplicate of bug #46271, which contains a proper crash trace. We believe that JCHARMING could have successfully reproduced the crash if it had been applied to the original bug.

The second bug that we did not reproduce and that we present in this section belongs to Mahout. Jaehoon Ko reported it in July 2014. Bug #1594 is titled Example factorize-movielens-1M.sh does not use HDFS and reads: It seems that factorize-movielens-1M.sh does not use HDFS at all. All paths look local paths, not HDFS. So the example crashes because it cannot find input data from HDFS:

1  Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: /tmp/mahout-work-hoseog.lee/movielens/ratings.csv
2    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus ...
3    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus ...
...
31   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

This entry was marked as Not A Problem / WONT FIX, meaning that the reported bug was not a bug in the first place. The resolution of this bug⁶ involved the modification of a bash script that Ko (the submitter) was using to query Mahout. In other words, the cause of the failure was external to Mahout itself, and this is why JCHARMING could not reproduce it.

⁶https://github.com/apache/mahout/pull/38#issuecomment-51436303

8.6 Threats to Validity

The selection of SUTs is one of the common threats to validity for approaches aiming to improve the understanding of a program’s behavior. It is possible that the selected programs share common properties that we are not aware of and that these properties invalidate our results. However, the SUTs analyzed by JCHARMING are the same as the ones used in similar studies. Moreover, the SUTs vary in terms of purpose, size and history. Another threat to validity lies in the way we have selected the bugs used in this study. We selected the bugs randomly to avoid any bias. One may argue that a better approach would be to select bugs based on complexity or other criteria (severity, etc.). We believe that a complex bug (if complexity can at all be measured) may have an impact on the running time of the approach, but we are not convinced that the accuracy of our approach depends on the complexity or the type of bugs we use. Instead, it depends on the quality of the produced crash trace. This said, in theory, we may face situations where the crash trace is completely corrupted. In such cases, there is nothing that guides the model checker. In other words, we would end up running a full, undirected model checker. It is difficult to evaluate the number of times we may face this situation without conducting an empirical study on the quality of crash traces. We defer this to future work. In addition, we see a threat to validity that stems from the fact that we only used open source systems. The results may not be generalizable to industrial systems. Field failures can also occur due to the running environment in which the program is executed. For instance, the failure may have been caused by the reception of a network packet or the opening of a given file located on the user’s hard drive. The resulting failures will hardly be reproducible by JCHARMING. Finally, the programs we used in this study are all written in the Java programming language, and JCHARMING leverages the crash traces produced by the JVM to reproduce bugs. This can limit the generalization of the results. However, similar to Java, the .Net, Python and Ruby languages also produce crash traces. Therefore,

JCHARMING could be applied to other object-oriented languages. In conclusion, threats to both internal and external validity have been minimised by choosing a relatively large set of different systems and using input data that can be found in other programming languages.

8.7 Chapter Summary

In this chapter, we presented JCHARMING (Java CrasH Automatic Reproduction by directed Model checking), an automatic bug reproduction technique that combines crash traces and directed model checking. JCHARMING relies on crash traces and backward program slices to direct a model checker. This way, we do not need to visit all the states of the subject program, which would be computationally taxing. When applied to thirty bugs from ten open source systems, JCHARMING was able to successfully reproduce 80% of the bugs. The average time to reproduce a bug was 19 minutes, which is quite reasonable, given the complexity of reproducing bugs that cause field crashes. Given that, in most open-source systems, reported bugs are fixed several days after they are reported [198], we think that our solution is applicable despite the wait time. This said, JCHARMING suffers from three main limitations. The first one is that it cannot reproduce bugs caused by multi-threading. We can overcome this limitation by using advanced features of the JPF model checker such as the jpf-ltl listener. The jpf-ltl listener was designed to check temporal properties of concurrent Java programs. The second limitation is that JCHARMING cannot be used if external inputs cause the bug. We can always build a monitoring system to retrieve this data, but this may lead to privacy concerns. Finally, the third limitation is that the performance of JCHARMING relies on the quality of the crash traces. This limitation can be addressed by investigating techniques that can improve the reporting of crash traces. For the time being, bug reporters simply copy and paste (and modify) the crash traces into the bug description. A better practice would be to automatically append the crash trace to the bug report, for example, in a different field than the bug description.

Chapter 9

Towards a Classification of Bugs Based on the Location of the Corrections: An Empirical Study

9.1 Introduction

In previous chapters, we showed how to prevent clones and defects and how to propose fixes at commit-time. We also introduced a technique to reproduce on-field crashes [121, 136–140]. These techniques, however, treat all bugs the same way. This is also the case for other studies in the literature (e.g., [199, 213]). For example, a bug that requires only one fix is analyzed the same way as a bug that necessitates multiple fixes. Similarly, if multiple bugs are fixed by modifying the exact same locations in the code, then we should investigate how these bugs are related in order to predict them in the future. By a fix, we mean a modification (adding or deleting lines of code) to an existing file that is used to solve the bug. Note here that we do not refer to duplicate bugs. Duplicate bugs are marked as duplicates (and not fixed) and only the master bug is fixed. As a motivating example, consider Bugs #AMQ-5066 and #AMQ-5092 from the Apache Software Foundation bug report management system (used to build one of the datasets in this chapter). Bug #AMQ-5066 was reported on February 19, 2014 and solved with a patch provided by the reporter. The solution involves a relatively complex patch that modifies the MQTTProtocolConverter.java, MQTTSubscription.java

and MQTTTest.java files. The description of the bug is as follows: “When a client sends a SUBSCRIBE message with the same Topic/Filter as a previous SUBSCRIBE message but a different QoS, the Server MUST discard the older subscription, and resend all retained messages limited to the new Subscription QoS.” A few months later, another bug, Bug #AMQ-5092, was reported: “MQTT protocol converters does not correctly generate unique packet ids for retained and non-retained publish messages sent to clients. [. . . ] Although retained messages published on creation of client subscriptions are copies of retained messages, they must carry a unique packet id when dispatched to clients. ActiveMQ re-uses the retained message’s packet id, which makes it difficult to acknowledge these messages when wildcard topics are used. ActiveMQ also sends the same non-retained message multiple times for every matching subscription for overlapping subscriptions. These messages also re-use the publisher’s message id as the packet id, which breaks client acknowledgment.” This bug was assigned to and fixed by a different person than the one who fixed bug #AMQ-5066. The fix consists of slightly modifying the same lines of code in the exact same files used to fix Bug #AMQ-5066. In fact, Bug #AMQ-5092 could have been avoided altogether if the first developer had provided a more comprehensive fix to #AMQ-5066 (a task that is easier said than done). These two bugs are not duplicates since, according to their descriptions, they deal with different types of problems, and yet they can be fixed by providing a similar patch. The failures are different while the root causes (faults in the code) are more or less the same. From the bug handling perspective, if we can develop a way to detect such related bug reports during triaging, then we can achieve considerable time savings in the way bug reports are processed, for example, by assigning them to the same developers. We also conjecture that detecting such related bugs can help with other tasks such as bug reproduction. We can reuse the reproduction of an already fixed bug to reproduce an incoming and related bug. With this in mind, the relationship between bugs and fixes can be modelled using the UML diagram in Figure 48. The diagram only includes bug reports that are fixed and not, for example, duplicate reports. From this figure, we can think of four instances of this diagram, which we refer to as a bug taxonomy or simply bug types (see Figure 49).


• RQ1: Are T4 bugs predictable at submission time? In this research question, we investigate if and how we can predict the type of a bug report at submission time. Being able to build accurate classifiers predicting the bug type at submission time would improve the triaging and bug handling process.

• RQ2: What are the best predictors of T4 bugs? This second research question aims to investigate which markers allow for an accurate prediction of T4 bugs.

Our objective is to propose a classification that can allow researchers in the field of mining bug repositories to use the taxonomy as a new criterion in the triaging, prediction, and reproduction of bugs. By analogy, we can look at the proposed bug taxonomy in the same way as the clone taxonomy presented by Kapser and Godfrey [96]. The authors proposed seven types of source code clones and then conducted a case study, using their classification, on the file system module of the Linux operating system. This clone taxonomy continues to be used by researchers to build better approaches for detecting a given clone type and to compare approaches with each other effectively. Moreover, we build upon this proposed classification and predict the type

of incoming bugs with a 65.40% precision, a 94.16% recall, and an F1 measure of 77.19%.

9.2 Experimental Setup

In this section, we present our datasets and quantify the proportion of each type of bug as well as the complexity of each type.

9.2.1 Context Selection

The context of this study consists of the change history of 388 projects belonging to two software ecosystems, namely, Apache and Netbeans. Table 10 reports, for each of them, (i) the number of resolved-fixed reports, (ii) the number of commits, (iii) the overall number of files, and (iv) the number of projects analysed. All the analysed projects are hosted in Git or Mercurial repositories and have either a Jira or a Bugzilla issue tracker associated with them. The Apache ecosystem consists of 349 projects written in various programming languages (C, C++, Java, Python, ...) and uses Git with Jira. These projects represent the Apache ecosystem

in its entirety. We did not exclude any system from our study. The complete list can be found online¹. The Netbeans ecosystem consists of 39 projects, mostly written in Java. Similar to the Apache ecosystem, we selected all the projects belonging to the Netbeans ecosystem. The Netbeans community uses Bugzilla with Mercurial.

Table 10: Datasets

Dataset   R/F BR   CS       Files   Projects
Netbeans  53,258   122,632  30,595  39
Apache    49,449   106,366  38,111  349
Total     102,707  229,153  68,809  388

The choice of these two ecosystems is driven by the motivation to consider projects having (i) different sizes, (ii) different architectures, and (iii) different development bases and processes. Apache projects vary significantly in terms of the size of the development team, purpose and technical choices [17]. On the other hand, Netbeans has a relatively stable list of core developers and a common vision shared by the 39 related projects [197]. Cumulatively, these datasets span from 2001 to 2014. In summary, our consolidated dataset contains 102,707 closed issues, 229,153 changesets, 68,809 files that have been modified to fix the bugs, 462,848 comments, and 388 distinct systems. We also collected 221 million lines of code modified to fix bugs, identified 3,284 sub-projects, and 17,984 unique contributors to these bug reports and source code version management systems. The cumulated opening time for all the bugs reaches 10,661 working years (3,891,618 working days). The raw size of the cloned source code alone, excluding binaries, images, and other non-text files, is 163 GB. To assign commits to issues, we used the regular-expression-based approach proposed by Fischer et al. [57], which matches the issue ID found in the commit note to link the commit to the issue. Using this technique, we were able to link almost 40% (40,493 out of 102,707) of our resolved/fixed issues to 229,153 commits. Note that an issue can be fixed using several commits.

¹https://projects.apache.org/projects.html?name
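As an illustration of this linking step, the following minimal sketch (our own, not the exact implementation of Fischer et al. [57]) extracts issue identifiers from a commit message; the two regular expressions are assumptions chosen for Jira-style keys and Bugzilla-style numeric references.

import re

# Jira keys look like "AMQ-5066"; Bugzilla reports are often referenced
# as "Bug 123456". Both patterns are assumptions used for illustration.
JIRA_KEY = re.compile(r"\b([A-Z][A-Z0-9]+-\d+)\b")
BUGZILLA_ID = re.compile(r"\b[Bb]ug[ #:]*(\d{4,7})\b")

def linked_issues(commit_message: str):
    """Return the issue identifiers mentioned in a commit message."""
    ids = set(JIRA_KEY.findall(commit_message))
    ids.update(BUGZILLA_ID.findall(commit_message))
    return ids

# Example: linked_issues("AMQ-5092: generate unique packet ids") -> {"AMQ-5092"}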

Table 11: Contingency table and Pearson's chi-squared tests

Ecosystem  T1            T2           T3             T4             Pearson's chi-squared p-Value
Apache     1968 (14.3%)  1248 (9.1%)  3101 (22.6%)   7422 (54%)
Netbeans   776 (2.9%)    240 (0.9%)   8372 (31.3%)   17366 (64.9%)  <0.01
Overall    2744 (6.8%)   1488 (3.7%)  11473 (28.3%)  24788 (61.2%)

9.2.2 Dataset Analysis

Using our dataset, we extracted the files f_i impacted by each commit c_i for each one of our 388 projects. Then, we classified the bugs according to each type, which we formulate as follows (a small classification sketch is given after the definitions):

• Type 1: A bug is tagged T1 if it is fixed by modifying a file f_i, and f_i is not involved in any other bug fix.

• Type 2: A bug is tagged T2 if it is fixed by modifying n files f_1, ..., f_n, where n > 1, and the files f_1, ..., f_n are not involved in any other bug fix.

• Type 3: A bug is tagged T3 if it is fixed by modifying a file f_i, and the file f_i is involved in fixing other bugs.

• Type 4: A bug is tagged T4 if it is fixed by modifying several files f_1, ..., f_n, and the files f_1, ..., f_n are involved in fixing other bugs.
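The following minimal Python sketch implements these four definitions given a mapping from bug identifiers to the files modified by their fixes; the data structure and function name are ours, introduced only for illustration.

from collections import defaultdict

def classify_bugs(fixed_files):
    """fixed_files maps a bug id to the set of files modified by its fix.
    Returns a map bug id -> 'T1'..'T4' following the definitions above."""
    usage = defaultdict(set)  # file -> bugs whose fixes touched it
    for bug, files in fixed_files.items():
        for f in files:
            usage[f].add(bug)
    types = {}
    for bug, files in fixed_files.items():
        shared = any(len(usage[f]) > 1 for f in files)
        if len(files) == 1:
            types[bug] = "T3" if shared else "T1"
        else:
            types[bug] = "T4" if shared else "T2"
    return types

# classify_bugs({"B1": {"a.java"}, "B2": {"a.java", "b.java"}})
# -> {"B1": "T3", "B2": "T4"}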

Table 11 presents the contingency table and the results of the Pearson's chi-squared tests we performed on each type of bug. We can see that the proportion of T4 bugs (61.2%) is considerably higher than that of T1 (6.8%), T2 (3.7%) and T3 (28.3%) bugs, and that the difference is significant according to the Pearson's chi-squared test. Pearson's chi-squared independence test is used to analyse the relationship between two qualitative variables, in our study the bug type and the studied ecosystem. The results of Pearson's chi-squared independence tests are considered statistically significant at α = 0.05. If the p-value ≤ 0.05, we can conclude that the proportion of each type is significantly different. We analyse the complexity of each bug with respect to duplication, fixing time, number of comments, number of times a bug is reopened, files impacted, severity, changesets, hunks, and chunks. Complexity metrics are divided into two groups: (a) process and (b) code metrics. Process metrics refer to metrics that have been extracted from the project tracking system (i.e., fixing time, comments, reopening and severity). Code metrics are directly

computed using the source code used to fix a given bug (i.e., files impacted, changesets required, hunks and chunks). We acknowledge that these complexity metrics only represent an abstraction of the actual complexity of a given bug, as they cannot account for the actual thought process and expertise required to craft a fix. However, we believe that they are an accurate abstraction. Moreover, they are used in several studies in the field to approximate the complexity of a bug [3, 131, 133, 169, 199]. Tables 12, 13 and 14 present descriptive statistics about each metric for each bug type per ecosystem and for both ecosystems combined. The descriptive statistics used are µ: mean, Σ: sum, x̂: median, σ: standard deviation and %: percentage. We also show the results of the Mann-Whitney test for each metric and type. We added the X symbol to the Mann-Whitney test result columns when the difference is statistically significant (i.e., p < 0.05) and report the p-value in parentheses otherwise.
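As a minimal sketch of the two statistical tests used in this section (assuming SciPy is available; the toy fixing-time samples below are illustrative and not taken from our dataset), one can reproduce the chi-squared test on the counts of Table 11 and run a Mann-Whitney test between two metric samples as follows.

from scipy.stats import chi2_contingency, mannwhitneyu

# Counts of T1..T4 bugs per ecosystem, taken from Table 11.
observed = [[1968, 1248, 3101, 7422],   # Apache
            [776, 240, 8372, 17366]]    # Netbeans
chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2={chi2:.1f}, dof={dof}, p={p:.3g}")  # p < 0.01 for these counts

# Mann-Whitney U test between two samples of a complexity metric,
# e.g. fixing times (in days) of T1 bugs vs. T4 bugs (toy values).
t1_times, t4_times = [4, 6, 2, 10], [13, 40, 9, 120]
u, p = mannwhitneyu(t1_times, t4_times, alternative="two-sided")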

Duplicate

The duplicate metric represents the number of times a bug gets resolved using the duplicate label while referencing one of the resolved/fixed bugs of our dataset. This process metric is useful to approximate the impact of a given bug on the community. For a bug to be resolved using the duplicate label, it means that the bug has been reported before. The more a bug gets reported by the community, the more people are impacted enough to report it. Note that, for a bug a to be resolved using the duplicate label and referencing bug b, bug b does not have to be resolved itself. Indeed, bug b could be under investigation (i.e. unconfirmed) or being fixed (i.e. new or assigned). Automatically detecting duplicate bug reports is a very active research field [23, 80, 145, 168, 178, 187] and a well-known measure of bug impact. Overall, the complexity of bug types in terms of the number of duplicates is as follows: T4_dup ≫ T1_dup > T3_dup ≫ T2_dup.

Fixing time

The fixing time metric represents the time it took for the bug report to go from the new state to the closed state. If the bug report is reopened, then the time it took for the bug to go from the assigned state to the closed state is added to the first time. A bug report can be reopened several times and all the times are added. In this section, the time is expressed in days [199, 211, 213].
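The following small sketch (our own simplification; the statuses are assumptions modelled on typical Bugzilla/Jira workflows) shows how the fixing time, including reopenings, can be accumulated from a chronologically ordered status history.

from datetime import datetime  # timestamps below are datetime objects

def fixing_time_days(status_changes):
    """status_changes: chronologically ordered (timestamp, new_status) pairs.
    Sums every open interval (new/assigned -> closed), so each reopening
    adds its own interval, as described above."""
    total, opened_at = 0.0, None
    for when, status in status_changes:
        if status in ("new", "assigned") and opened_at is None:
            opened_at = when
        elif status == "closed" and opened_at is not None:
            total += (when - opened_at).total_seconds() / 86400.0
            opened_at = None
    return total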

Table 12: Apache Ecosystem Complexity Metrics Comparison and Mann-Whitney test results. µ: mean, Σ: sum, x̂: median, σ: standard deviation, %: percentage

Type  Metric  µ          Σ          x̂    σ         %     T1        T2        T3        T4
T1    Dup.    0.026      51         0    0.2       14.8  n.a       (0.53)    X(<0.05)  (0.45)
T1    Tim.    91.574     180217     4    262       21.8  n.a       X(<0.05)  X(<0.05)  X(<0.05)
T1    Com.    4.355      8571       3    4.7       9.5   n.a       X(<0.05)  (0.17)    X(<0.05)
T1    Reo.    0.062      122        0    0.3       13.8  n.a       (0.29)    X(<0.05)  X(<0.05)
T1    Fil.    0.991      1950       1    0.1       3.7   n.a       X(<0.05)  (0.28)    X(<0.05)
T1    Sev.    3.423      6737       4    1.3       13.2  n.a       (0.18)    X(<0.05)  X(<0.05)
T1    Cha.    1          1968       1    0         1.9   n.a       X(<0.05)  X(<0.05)  X(<0.05)
T1    Hun.    3.814      7506       3    2.4       0     n.a       X(<0.05)  X(<0.05)  X(<0.05)
T1    Chur.   18.761     36921      7    48.6      0     n.a       X(<0.05)  (0.09)    X(<0.05)
T2    Dup.    0.022      28         0    0.1       8.1   (0.53)    n.a       (0.16)    (0.19)
T2    Tim.    115.158    143717     8    294.1     17.4  X(<0.05)  n.a       X(<0.05)  X(<0.05)
T2    Com.    5.041      6291       4    4.7       7     X(<0.05)  n.a       X(<0.05)  X(<0.05)
T2    Reo.    0.071      89         0    0.3       10.1  (0.29)    n.a       X(<0.05)  (0.59)
T2    Fil.    4.381      5468       2    20.4      10.5  X(<0.05)  n.a       X(<0.05)  X(<0.05)
T2    Sev.    3.498      4365       4    1.2       8.6   (0.18)    n.a       X(<0.05)  X(<0.05)
T2    Cha.    4.681      5842       2    20.4      5.5   X(<0.05)  n.a       X(<0.05)  X(<0.05)
T2    Hun.    561.995    701370     14   13628.2   3.9   X(<0.05)  n.a       X(<0.05)  X(<0.05)
T2    Chur.   14184.869  17702716   88   400710.2  8     X(<0.05)  n.a       X(<0.05)  X(<0.05)
T3    Dup.    0.016      50         0    0.1       14.5  X(<0.05)  (0.16)    n.a       X(<0.05)
T3    Tim.    35.892     111300     1    151.8     13.5  X(<0.05)  X(<0.05)  n.a       X(<0.05)
T3    Com.    4.422      13712      3    4.4       15.2  (0.17)    X(<0.05)  n.a       X(<0.05)
T3    Reo.    0.033      101        0    0.2       11.5  X(<0.05)  X(<0.05)  n.a       X(<0.05)
T3    Fil.    0.994      3081       1    0.1       5.9   (0.28)    X(<0.05)  n.a       X(<0.05)
T3    Sev.    3.644      11300      4    1.1       22.2  X(<0.05)  X(<0.05)  n.a       X(<0.05)
T3    Cha.    1          3101       1    0         2.9   X(<0.05)  X(<0.05)  n.a       X(<0.05)
T3    Hun.    4.022      12472      3    3.4       0.1   X(<0.05)  X(<0.05)  n.a       X(<0.05)
T3    Chur.   16.954     52573      6    49.8      0     (0.09)    X(<0.05)  n.a       X(<0.05)
T4    Dup.    0.029      216        0    0.2       62.6  (0.45)    (0.19)    X(<0.05)  n.a
T4    Tim.    52.76      391586     4    182.2     47.4  X(<0.05)  X(<0.05)  X(<0.05)  n.a
T4    Com.    8.313      61701      5    10.2      68.3  X(<0.05)  X(<0.05)  X(<0.05)  n.a
T4    Reo.    0.077      570        0    0.3       64.6  X(<0.05)  (0.59)    X(<0.05)  n.a
T4    Fil.    5.633      41805      3    14        79.9  X(<0.05)  X(<0.05)  X(<0.05)  n.a
T4    Sev.    3.835      28466      4    1         56    X(<0.05)  X(<0.05)  X(<0.05)  n.a
T4    Cha.    12.861     95455      4    52.2      89.7  X(<0.05)  X(<0.05)  X(<0.05)  n.a
T4    Hun.    2305.868   17114149   30   58094.7   96    X(<0.05)  X(<0.05)  X(<0.05)  n.a
T4    Chur.   27249.773  202247816  204  320023.5  91.9  X(<0.05)  X(<0.05)  X(<0.05)  n.a

Table 13: Netbeans Ecosystem Complexity Metrics Comparison and Mann-Whitney test results. µ: mean, Σ: sum, x̂: median, σ: standard deviation, %: percentage

Type  Metric  µ        Σ        x̂   σ      %     T1        T2        T3        T4
T1    Dup.    0.086    67       0   0.4    2.5   n.a       (0.39)    (0.24)    (0.86)
T1    Tim.    92.759   71981    10  219.1  2.3   n.a       X(<0.05)  (0.15)    X(<0.05)
T1    Com.    4.687    3637     3   4.1    2.4   n.a       X(<0.05)  (0.83)    X(<0.05)
T1    Reo.    0.054    42       0   0.3    1.9   n.a       (0.1)     (0.58)    X(<0.05)
T1    Fil.    1.735    1346     1   13.2   0.8   n.a       X(<0.05)  X(<0.05)  X(<0.05)
T1    Sev.    4.314    3348     3   1.5    3.1   n.a       (0.66)    X(<0.05)  X(<0.05)
T1    Cha.    1.085    842      1   0.4    2     n.a       (0.99)    (0.26)    X(<0.05)
T1    Hun.    4.405    3418     3   7      0.5   n.a       X(<0.05)  (0.13)    X(<0.05)
T1    Chur.   5.089    3949     2   12.5   0.3   n.a       X(<0.05)  X(<0.05)  X(<0.05)
T2    Dup.    0.067    16       0   0.3    0.6   (0.39)    n.a       (0.73)    (0.39)
T2    Tim.    111.9    26856    16  308.6  0.9   X(<0.05)  n.a       X(<0.05)  (0.41)
T2    Com.    4.433    1064     3   4      0.7   X(<0.05)  n.a       X(<0.05)  X(<0.05)
T2    Reo.    0.079    19       0   0.3    0.9   (0.1)     n.a       (0.11)    (0.97)
T2    Fil.    8.804    2113     2   42.7   1.3   X(<0.05)  n.a       X(<0.05)  X(<0.05)
T2    Sev.    4.362    1047     3   1.5    1     (0.66)    n.a       X(<0.05)  X(<0.05)
T2    Cha.    1.075    258      1   0.3    0.6   (0.99)    n.a       (0.5)     X(<0.05)
T2    Hun.    21.887   5253     8   62.7   0.7   X(<0.05)  n.a       X(<0.05)  X(<0.05)
T2    Chur.   32.263   7743     8   125.8  0.7   X(<0.05)  n.a       X(<0.05)  X(<0.05)
T3    Dup.    0.074    620      0   0.4    23.3  (0.24)    (0.73)    n.a       X(<0.05)
T3    Tim.    87.033   728642   9   233.6  23.8  (0.15)    X(<0.05)  n.a       X(<0.05)
T3    Com.    4.73     39599    3   4.3    26.5  (0.83)    X(<0.05)  n.a       X(<0.05)
T3    Reo.    0.06     499      0   0.3    22.7  (0.58)    (0.11)    n.a       X(<0.05)
T3    Fil.    1.306    10932    1   5.1    6.8   X(<0.05)  X(<0.05)  n.a       X(<0.05)
T3    Sev.    4.021    33666    3   1.4    31.4  X(<0.05)  X(<0.05)  n.a       X(<0.05)
T3    Cha.    1.065    8917     1   0.3    21    (0.26)    (0.5)     n.a       X(<0.05)
T3    Hun.    5.15     43115    3   12.4   5.8   (0.13)    X(<0.05)  n.a       X(<0.05)
T3    Chur.   6.727    56317    2   22     4.9   X(<0.05)  X(<0.05)  n.a       X(<0.05)
T4    Dup.    0.113    1959     0   0.7    73.6  (0.86)    (0.39)    X(<0.05)  n.a
T4    Tim.    128.833  2237319  13  332.8  73    X(<0.05)  (0.41)    X(<0.05)  n.a
T4    Com.    6.058    105202   4   6.7    70.4  X(<0.05)  X(<0.05)  X(<0.05)  n.a
T4    Reo.    0.094    1639     0   0.4    74.5  X(<0.05)  (0.97)    X(<0.05)  n.a
T4    Fil.    8.408    146019   4   25.1   91    X(<0.05)  X(<0.05)  X(<0.05)  n.a
T4    Sev.    3.982    69159    3   1.4    64.5  X(<0.05)  X(<0.05)  X(<0.05)  n.a
T4    Cha.    1.871    32494    2   1.2    76.4  X(<0.05)  X(<0.05)  X(<0.05)  n.a
T4    Hun.    40.195   698022   13  98.3   93.1  X(<0.05)  X(<0.05)  X(<0.05)  n.a
T4    Chur.   61.893   1074830  15  178.6  94    X(<0.05)  X(<0.05)  X(<0.05)  n.a

Table 14: Apache and Netbeans Ecosystems Complexity Metrics Comparison and Mann-Whitney test results. µ: mean, Σ: sum, x̂: median, σ: standard deviation, %: percentage

Type  Metric  µ          Σ          x̂   σ         %     T1        T2        T3        T4
T1    Dup.    0.043      118        0   0.3       3.9   n.a       (0.09)    (0.16)    X(<0.05)
T1    Tim.    91.909     252198     6   250.6     6.5   n.a       X(<0.05)  X(<0.05)  X(<0.05)
T1    Com.    4.449      12208      3   4.5       5.1   n.a       X(<0.05)  X(<0.05)  X(<0.05)
T1    Reo.    0.06       164        0   0.3       5.3   n.a       (0.07)    X(<0.05)  X(<0.05)
T1    Fil.    1.201      3296       1   7         1.5   n.a       X(<0.05)  X(<0.05)  X(<0.05)
T1    Sev.    3.675      10085      4   1.4       6.4   n.a       (0.97)    (0.17)    X(<0.05)
T1    Cha.    1.024      2810       1   0.2       1.9   n.a       X(<0.05)  X(<0.05)  X(<0.05)
T1    Hun.    3.981      10924      3   4.3       0.1   n.a       X(<0.05)  X(<0.05)  X(<0.05)
T1    Chur.   14.894     40870      5   42.2      0     n.a       X(<0.05)  X(<0.05)  X(<0.05)
T2    Dup.    0.03       44         0   0.2       1.5   (0.09)    n.a       X(<0.05)  X(<0.05)
T2    Tim.    114.632    170573     9   296.4     4.4   X(<0.05)  n.a       X(<0.05)  (0.15)
T2    Com.    4.943      7355       3   4.6       3.1   X(<0.05)  n.a       (0.72)    X(<0.05)
T2    Reo.    0.073      108        0   0.3       3.5   (0.07)    n.a       X(<0.05)  (0.47)
T2    Fil.    5.095      7581       2   25.4      3.6   X(<0.05)  n.a       X(<0.05)  X(<0.05)
T2    Sev.    3.637      5412       4   1.3       3.4   (0.97)    n.a       (0.44)    (0.1)
T2    Cha.    4.099      6100       2   18.7      4.1   X(<0.05)  n.a       X(<0.05)  X(<0.05)
T2    Hun.    474.881    706623     12  12481.7   3.8   X(<0.05)  n.a       X(<0.05)  X(<0.05)
T2    Chur.   11902.19   17710459   62  366988    8     X(<0.05)  n.a       X(<0.05)  X(<0.05)
T3    Dup.    0.058      670        0   0.4       22.3  (0.16)    X(<0.05)  n.a       X(<0.05)
T3    Tim.    73.21      839942     6   215.8     21.6  X(<0.05)  X(<0.05)  n.a       X(<0.05)
T3    Com.    4.647      53311      3   4.3       22.2  X(<0.05)  (0.72)    n.a       X(<0.05)
T3    Reo.    0.052      600        0   0.3       19.5  X(<0.05)  X(<0.05)  n.a       X(<0.05)
T3    Fil.    1.221      14013      1   4.4       6.6   X(<0.05)  X(<0.05)  n.a       X(<0.05)
T3    Sev.    3.919      44966      3   1.4       28.4  (0.17)    (0.44)    n.a       X(<0.05)
T3    Cha.    1.048      12018      1   0.3       8.1   X(<0.05)  X(<0.05)  n.a       X(<0.05)
T3    Hun.    4.845      55587      3   10.7      0.3   X(<0.05)  X(<0.05)  n.a       X(<0.05)
T3    Chur.   9.491      108890     3   32.3      0     X(<0.05)  X(<0.05)  n.a       X(<0.05)
T4    Dup.    0.088      2175       0   0.6       72.3  X(<0.05)  X(<0.05)  X(<0.05)  n.a
T4    Tim.    106.056    2628905    9   297.9     67.6  X(<0.05)  (0.15)    X(<0.05)  n.a
T4    Com.    6.733      166903     4   8         69.6  X(<0.05)  X(<0.05)  X(<0.05)  n.a
T4    Reo.    0.089      2209       0   0.4       71.7  X(<0.05)  (0.47)    X(<0.05)  n.a
T4    Fil.    7.577      187824     3   22.4      88.3  X(<0.05)  X(<0.05)  X(<0.05)  n.a
T4    Sev.    3.938      97625      3   1.3       61.8  X(<0.05)  (0.1)     X(<0.05)  n.a
T4    Cha.    5.162      127949     2   29        85.9  X(<0.05)  X(<0.05)  X(<0.05)  n.a
T4    Hun.    718.58     17812171   16  31804.5   95.8  X(<0.05)  X(<0.05)  X(<0.05)  n.a
T4    Chur.   8202.463   203322646  28  175548.3  91.9  X(<0.05)  X(<0.05)  X(<0.05)  n.a

When combined, both ecosystems yield the following order: T2_time > T4_time ≫ T1_time ≫ T3_time. These findings contradict the findings of Saha et al.; however, they did not study the Netbeans ecosystem in their paper [169].

Comments

The “number of comments” metric refers to the comments that have been posted by the community on the project tracking system. This third process metric evaluates the complexity of a given bug in the sense that if it takes more comments (explanations) from the reporter or the assignee to provide a fix, then the bug must be more complex to understand. The number of comments has been shown to be useful in assessing the complexity of bugs [211, 213]. It is also used in bug prediction approaches [24, 42]. When combining both ecosystems, the results are: T4_comment ≫ T2_comment > T3_comment ≫ T1_comment.

Bug Reopening

The bug reopening metric counts how many times a given bug gets reopened. If a bug report is reopened, it means that the fix was arguably hard to come up with or the report was hard to understand [120, 173, 218]. When combined, however, the order does change: T4_reop > T2_reop > T1_reop ≫ T3_reop.

Severity

The severity metric reports the degree of impact of the report on the software. Predicting the severity of a given report is an active research field [69, 111, 127, 186, 193] and it helps prioritize fixes [204]. The severity is a textual value (blocker, critical, major, normal, minor, trivial) and the Mann-Whitney test only accepts numerical input. Consequently, we had to assign numerical values to each severity. We chose to assign values from 1 to 6 for the trivial, minor, normal, major, critical and blocker severities, respectively. The bug type ordering according to the severity metric is: T4_sev ≫ T3_sev ≫ T2_sev > T1_sev, T2_sev > T1_sev ≫ T3_sev ≫ T4_sev, and T4_sev ≫ T3_sev > T1_sev > T2_sev for Apache, Netbeans, and both combined, respectively.
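A minimal sketch of this ordinal encoding and of the test applied to it is given below (assuming SciPy; the mapping mirrors the 1-to-6 scale described above, and the function is our own illustration).

from scipy.stats import mannwhitneyu

# Ordinal encoding used above: trivial=1 ... blocker=6.
SEVERITY_SCALE = {"trivial": 1, "minor": 2, "normal": 3,
                  "major": 4, "critical": 5, "blocker": 6}

def severity_test(bugs_a, bugs_b):
    """bugs_a / bugs_b: lists of textual severities for two bug types.
    Returns the Mann-Whitney p-value computed on the ordinal encoding."""
    xs = [SEVERITY_SCALE[s] for s in bugs_a]
    ys = [SEVERITY_SCALE[s] for s in bugs_b]
    return mannwhitneyu(xs, ys, alternative="two-sided").pvalue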

Files impacted

The number of files impacted measures how many files have been modified for the bug report to be closed. Overall, T4 impacts more files than T2, while T1 and T3 impact only one file by definition (T4_files ≫ T2_files ≫ T3_files ⇔ T1_files).

Changesets

The changesets metric registers how many changesets (or commits/patches/fixes) were required to close the bug report. In the project tracking system, changesets to resolve the bug are proposed and analysed by the community, automated quality assurance tools and the quality assurance team itself. Each changeset can be either accepted and applied to the source code or dismissed. The number of changesets (or versions of a given changeset) it takes before integration can hint at the complexity of the fix. In case the bug report gets reopened and new changesets are proposed, the new changesets (after the reopening) are added to the old ones (before the reopening). Overall, T4 bugs are the most complex bugs regarding the number of submitted changesets (T4_changesets ≫ T2_changesets ≫ T3_changesets ≫ T1_changesets). While results have been published on bug-fix patterns [154] and smell introduction [56, 192], to the best of our knowledge, no one besides us has investigated how many iterations of a patch were required to close a bug report.

Hunks

The hunks metric counts the number of consecutive blocks of modified, added or deleted lines in textual files. Hunks are used to determine, in each file, how many different places a developer has modified. This metric is widely used for bug insertion prediction [91, 100, 164] and bug-fix comprehension [154]. In our ecosystems, there is a relationship between the number of files modified and the number of hunks. The number of modified code blocks is likely to rise with the number of modified files, as the hunks metric is at least 1 per file. We found that T2 and T4 bugs, which require many files to get fixed, are the ones that have significantly higher scores for the hunks metric; Apache ecosystem:

T4_hunks ≫ T2_hunks ≫ T3_hunks ≫ T1_hunks, Netbeans ecosystem: T4_hunks ≫ T2_hunks ≫ T3_hunks ≫ T1_hunks, and overall: T4_hunks ≫ T2_hunks ≫ T1_hunks ≫ T3_hunks.
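Both code metrics can be computed directly from the diff of a fix; the sketch below (our own illustration, assuming the fix is available as a unified diff) counts hunk headers and added or deleted lines.

import re

HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@")

def hunks_and_churn(unified_diff: str):
    """Count hunks (@@ headers) and churned lines (added plus deleted)
    in a unified diff, following the two code metrics defined above."""
    hunks = churn = 0
    for line in unified_diff.splitlines():
        if HUNK_HEADER.match(line):
            hunks += 1
        elif line.startswith("+") and not line.startswith("+++"):
            churn += 1
        elif line.startswith("-") and not line.startswith("---"):
            churn += 1
    return hunks, churn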

Churns

The last metric, churns, counts the number of lines modified. The churn value for a line change should be at least two, as the line has to be deleted first and then added back with the modifications. Once again, this is a widely used metric in the field [91, 100, 154, 164]. Once again, T4 and T2 are the ones with the most churns; Apache ecosystem: T4_churns ≫ T2_churns ≫ T1_churns > T3_churns, Netbeans ecosystem: T4_churns ≫ T2_churns ≫ T3_churns ≫ T1_churns, and overall: T4_churns ≫ T2_churns ≫ T1_churns ≫ T3_churns. To determine which type is the most complex, we counted how many times each bug type obtained each position in our nine rankings and multiplied these counts by 4 for the first place, 3 for the second, 2 for the third and 1 for the fourth place. We did the same simple analysis of the rank of each type for each metric, to take into account the frequency of bug types in our calculation, and multiplied both values. The complexity scores we calculated are as follows: 1330, 1750, 2580 and 7120 for T1, T2, T3 and T4 bugs, respectively. Considering that Type 4 bugs are (a) the most common, (b) the most complex and (c) not a type we intuitively know about, we decided to kick-start our research into the different types of bugs and their impact by predicting whether an incoming bug report is of type 4 or not.
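The weighting idea behind this score can be sketched as follows (our own illustration with placeholder rankings for two of the nine metrics; it is not meant to reproduce the exact scores of 1330, 1750, 2580 and 7120 reported above).

# Ranks of each type, 1 (highest) to 4 (lowest), for two of the nine metric
# orderings above; the remaining seven entries are omitted for brevity.
rankings = {
    "churns": {"T4": 1, "T2": 2, "T1": 3, "T3": 4},
    "files":  {"T4": 1, "T2": 2, "T3": 3, "T1": 4},
}
position_weight = {1: 4, 2: 3, 3: 2, 4: 1}
frequency = {"T1": 0.068, "T2": 0.037, "T3": 0.283, "T4": 0.612}  # Table 11

def complexity_score(bug_type):
    """Weighted rank count, scaled by how frequent the type is."""
    raw = sum(position_weight[r[bug_type]] for r in rankings.values())
    return raw * frequency[bug_type]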

9.3 Empirical Validation

In this section, we present the results of our experiments and interpret them to answer our two research questions.

9.3.1 Are T4 bugs predictable at submission time?

To answer this question, we used as features the words contained in the description of a bug report. We removed the stopwords (i.e. the, or, she, he) and truncated the remaining words to their roots (i.e. writing becomes write, failure becomes fail

and so on). We experimented with 1-gram, 2-gram, and 3-gram words weighted using tf-idf. To build the classifiers, we examined three machine learning techniques that have been shown to yield satisfactory results in related studies: SVM, random forest and linear regression [1, 133, 199].
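A minimal sketch of such a text classification pipeline is shown below, using scikit-learn as an assumed stand-in for our actual tooling (the experiments described next also stem words with WordNet and tune SVM and linear regression classifiers in the same way, which is omitted here for brevity).

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, GridSearchCV

def train_t4_classifier(descriptions, labels):
    """descriptions: bug-report texts; labels: 1 for T4, 0 otherwise."""
    x_train, x_test, y_train, y_test = train_test_split(
        descriptions, labels, train_size=0.6, random_state=42)
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer(stop_words="english", ngram_range=(1, 1))),
        ("clf", RandomForestClassifier()),
    ])
    # Ten-fold cross-validation on the 60% split to tune hyper-parameters.
    search = GridSearchCV(pipeline, {"clf__n_estimators": [100, 300]}, cv=10)
    search.fit(x_train, y_train)
    # Report the best classifier's accuracy on the held-out 40% split.
    return search.best_estimator_, search.score(x_test, y_test)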

To answer RQ1, we analyse the accuracy of predictors aiming at determining the type of a bug at submission time (i.e. when someone submits the bug report). Tables 15, 16, 17, 18, 19, 20, 21, 22 and 23 present the results obtained while building classifiers for the most complex type of bug. According to the complexity analysis conducted in section 9.2.2, the most complex type of bug, in terms of duplicates, time to fix, comments, reopening, files changed, severity, changesets, churns, and hunks, is T4. To answer our research question, we built nine different classifiers using three different machine learning techniques, namely linear regression, support vector machines and random forest, for ten different projects (5 from each ecosystem). We selected the top 5 projects of each ecosystem with regard to their bug report count (Ambari, Cassandra, Flume, HBase and Hive for Apache; Cnd, Editor, Java, JavaEE and Platform for Netbeans). For each machine learning technique, we built classifiers using the text contained in the bug report and the comments posted within the first 48 hours, as they are likely to provide additional insights on the bug itself. We eliminate the stop-words of the text and trim the words to their semantic roots using WordNet. We experimented with 1-gram, 2-gram, and 3-gram words, weighted using tf-idf. The feature vectors are fed to the different machine learning techniques in order to build a classifier. The data is separated into two parts with a 60%-40% ratio. The 60% part is used for training purposes while the 40% part is used for testing purposes. During the training process we use the ten-fold cross-validation technique iteratively and, for each iteration, we change the parameters used by the classifier building process (cost, mtry, etc.). At the end of the iterations, we select the best classifier and exercise it against the 40% part. The results we report in this section are the performances of the nine classifiers trained on 60% of the data and classifying the remaining 40%. The performance of each classifier is examined in terms of true positive, true negative, false negative and false positive classifications. The true positive and true negative numbers refer to the cases where the classifier correctly classifies a report. The false negative number represents the number of reports that are classified as non-T4 while they are T4, and the false

positive number represents the number of reports classified as T4 while they are not. These numbers allow us to derive three common metrics: precision, recall and F1 measure.

precision = TP / (TP + FP)        (13)

recall = TP / (TP + FN)        (14)

F1 = 2TP / (2TP + FP + FN)        (15)

The performance of each classifier is compared to a tenth classifier. This last classifier is a random classifier that randomly predicts the type of a bug. As we are in a two-class system (T4 and non-T4), 50% of the reports are classified as T4 by the random classifier. The performance of the random classifier itself is presented in Table 24. Finally, we compute the Cohen's Kappa metric [58] for each classifier. The Kappa metric compares the observed accuracy and the expected accuracy to provide a less misleading assessment of the classifier performance than precision alone.

kappa = (observed accuracy − expected accuracy) / (1 − expected accuracy)        (16)

The observed accuracy represents the fraction of items that were correctly classified, according to the ground truth, by our classifier. The expected accuracy represents the accuracy obtained by a random classifier.
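Equations (13) to (16) can be computed directly from the confusion-matrix counts; the small helper below is our own illustration (the default expected accuracy of 0.5 stands in for the random baseline, which is about 0.49 in Table 25).

def classification_metrics(tp, tn, fp, fn, expected_accuracy=0.5):
    """Equations (13)-(16) for the two-class (T4 vs. non-T4) setting."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    observed_accuracy = (tp + tn) / (tp + tn + fp + fn)
    kappa = (observed_accuracy - expected_accuracy) / (1 - expected_accuracy)
    return precision, recall, f1, kappa

# Sanity check with the totals of the SVM 1-gram classifier (Table 15):
# TP=3824, FP=2023, FN=237 give precision ~0.654 and recall ~0.942,
# matching the reported 65.40% and 94.16%.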

For the first three classifiers (SVM, linear regression and random forest with a 1-gram grouping of stemmed words), the best classifier is the random forest one with a 77.63% F1 measure. It is followed by SVM (77.19%) and, finally, linear regression (76.31%). Regardless of the technique used to classify the reports, there is no significant difference between ecosystems. Indeed, the p-values obtained with chi-squared tests are above 0.05, and a p-value below 0.05 is a marker of statistical significance. While random forest emerges as the most accurate classifier, the difference between the three classifiers is not significant (p-value = 0.99). For the second three classifiers (SVM, linear regression and random forest with a 2-gram grouping of stemmed words), the best classifier is once again random forest with a 77.34% F1 measure. It is followed by SVM (76.91%) and, finally, linear regression

Table 15: Support Vector Machine classifier performances while using 1 gram. TP: True Positive, TN: True Negative, FN: False Negative, FP: False Positive

Project    Reports  T4 Reports  TP    TN   FN   FP    Precision  Recall   F1
Ambari     829      540         539   4    1    285   65.41%     99.81%   79.03%
Cassandra  340      199         193   5    6    136   58.66%     96.98%   73.11%
Flume      133      80          79    9    1    44    64.23%     98.75%   77.83%
HBase      357      215         213   4    2    138   60.68%     99.07%   75.27%
Hive       272      191         191   0    0    81    70.22%     100.00%  82.51%
Cnd        1105     805         753   25   52   275   73.25%     93.54%   82.16%
Editor     666      478         455   16   23   172   72.57%     95.19%   82.35%
Java       1090     693         676   37   17   360   65.25%     97.55%   78.20%
JavaEE     585      287         258   52   29   246   51.19%     89.90%   65.23%
Platform   969      573         467   110  106  286   62.02%     81.50%   70.44%
Total      6346     4061        3824  262  237  2023  65.40%     94.16%   77.19%

Table 16: Linear Regression classifier performances while using 1 gram. TP: True Positive, TN: True Negative, FN: False Negative, FP: False Positive

Project    Reports  T4 Reports  TP    TN   FN   FP    Precision  Recall  F1
Ambari     829      540         514   14   26   275   65.15%     95.19%  77.35%
Cassandra  340      199         194   5    5    136   58.79%     97.49%  73.35%
Flume      133      80          60    17   20   36    62.50%     75.00%  68.18%
HBase      357      215         212   5    3    137   60.74%     98.60%  75.18%
Hive       272      191         103   40   88   41    71.53%     53.93%  61.49%
Cnd        1105     805         762   26   43   274   73.55%     94.66%  82.78%
Editor     666      478         459   16   19   172   72.74%     96.03%  82.78%
Java       1090     693         683   13   10   384   64.01%     98.56%  77.61%
JavaEE     575      287         271   30   16   258   51.23%     94.43%  66.42%
Platform   969      573         486   102  87   294   62.31%     84.82%  71.84%
Total      6336     4061        3744  268  317  2007  65.10%     92.19%  76.31%

Table 17: Random Forest classifier performances while using 1 gram. TP: True Positive, TN: True Negative, FN: False Negative, FP: False Positive

Project    Reports  T4 Reports  TP    TN   FN   FP    Precision  Recall  F1
Ambari     829      540         514   13   26   276   65.06%     95.19%  77.29%
Cassandra  337      199         191   12   8    126   60.25%     95.98%  74.03%
Flume      133      80          76    8    4    45    62.81%     95.00%  75.62%
HBase      357      215         212   9    3    133   61.45%     98.60%  75.71%
Hive       272      191         190   3    1    78    70.90%     99.48%  82.79%
Cnd        1105     805         803   4    2    296   73.07%     99.75%  84.35%
Editor     666      478         476   3    2    185   72.01%     99.58%  83.58%
Java       1090     693         682   26   11   371   64.77%     98.41%  78.12%
JavaEE     575      287         252   59   35   229   52.39%     87.80%  65.63%
Platform   969      573         437   154  136  242   64.36%     76.27%  69.81%
Total      6333     4061        3833  291  228  1981  65.93%     94.39%  77.63%

Table 18: Support Vector Machine classifier performances while using 2 grams. TP: True Positive, TN: True Negative, FN: False Negative, FP: False Positive

Project    Reports  T4 Reports  TP    TN   FN   FP    Precision  Recall  F1
Ambari     829      540         525   12   15   277   65.46%     97.22%  78.24%
Cassandra  323      199         189   11   10   113   62.58%     94.97%  75.45%
Flume      133      80          74    15   6    38    66.07%     92.50%  77.08%
HBase      357      215         205   23   10   119   63.27%     95.35%  76.07%
Hive       272      191         171   15   20   66    72.15%     89.53%  79.91%
Cnd        1105     805         731   34   74   266   73.32%     90.81%  81.13%
Editor     666      478         455   30   23   158   74.23%     95.19%  83.41%
Java       1090     693         664   58   29   339   66.20%     95.82%  78.30%
JavaEE     575      287         238   69   49   219   52.08%     82.93%  63.98%
Platform   969      573         461   110  112  286   61.71%     80.45%  69.85%
Total      6319     4061        3713  377  348  1881  66.37%     91.43%  76.91%

Table 19: Linear Regression classifier performances while using 2 grams. TP: True Positive, TN: True Negative, FN: False Negative, FP: False Positive

Project    Reports  T4 Reports  TP    TN   FN   FP    Precision  Recall  F1
Ambari     829      540         510   19   30   270   65.38%     94.44%  77.27%
Cassandra  340      199         140   55   59   86    61.95%     70.35%  65.88%
Flume      142      89          59    23   30   30    66.29%     66.29%  66.29%
HBase      357      215         90    100  125  42    68.18%     41.86%  51.87%
Hive       272      191         176   8    15   73    70.68%     92.15%  80.00%
Cnd        1105     805         745   26   60   274   73.11%     92.55%  81.69%
Editor     666      478         453   27   25   161   73.78%     94.77%  82.97%
Java       1090     693         606   106  87   291   67.56%     87.45%  76.23%
JavaEE     575      287         245   70   42   218   52.92%     85.37%  65.33%
Platform   815      573         449   121  124  121   78.77%     78.36%  78.57%
Total      6191     4070        3473  555  597  1566  68.92%     85.33%  76.25%

Table 20: Random Forest classifier performances while using 2 grams. TP: True Positive, TN: True Negative, FN: False Negative, FP: False Positive

Project    Reports  T4 Reports  TP    TN   FN   FP    Precision  Recall  F1
Ambari     829      540         511   20   29   269   65.51%     94.63%  77.42%
Cassandra  340      199         176   22   23   119   59.66%     88.44%  71.26%
Flume      133      80          72    21   8    32    69.23%     90.00%  78.26%
HBase      351      215         208   12   7    124   62.65%     96.74%  76.05%
Hive       272      191         190   0    1    81    70.11%     99.48%  82.25%
Cnd        1105     805         794   9    11   291   73.18%     98.63%  84.02%
Editor     666      478         471   6    7    182   72.13%     98.54%  83.29%
Java       1099     702         673   43   29   354   65.53%     95.87%  77.85%
JavaEE     575      287         238   86   49   202   54.09%     82.93%  65.47%
Platform   1002     606         444   163  162  233   65.58%     73.27%  69.21%
Total      6372     4103        3777  382  326  1887  66.68%     92.05%  77.34%

Table 21: Support Vector Machine classifier performances while using 3 grams. TP: True Positive, TN: True Negative, FN: False Negative, FP: False Positive

Project    Reports  T4 Reports  TP    TN   FN   FP    Precision  Recall  F1
Ambari     829      540         520   15   20   274   65.49%     96.30%  77.96%
Cassandra  340      199         193   11   6    130   59.75%     96.98%  73.95%
Flume      133      80          74    8    6    45    62.18%     92.50%  74.37%
HBase      357      215         208   24   7    118   63.80%     96.74%  76.89%
Hive       272      191         175   14   16   67    72.31%     91.62%  80.83%
Cnd        1105     805         725   34   80   266   73.16%     90.06%  80.73%
Editor     666      478         454   22   24   166   73.23%     94.98%  82.70%
Java       1090     693         662   61   31   336   66.33%     95.53%  78.30%
JavaEE     575      287         256   45   31   243   51.30%     89.20%  65.14%
Platform   969      573         461   111  112  285   61.80%     80.45%  69.90%
Total      6336     4061        3728  345  333  1930  65.89%     91.80%  76.72%

(76.25%). As for the first three classifiers, the difference between the classifiers and between the ecosystems is not significant. Moreover, the difference in performance between 1 and 2 grams is not significant either. Finally, for the last three classifiers (SVM, linear regression and random forest with a 3-gram grouping of stemmed words), the best classifier is once again random forest with a 77.12% F1 measure. It is followed by SVM (76.72%) and, finally, linear regression (75.89%). Again, the difference between the classifiers and between the ecosystems is not significant. Neither are the differences in results between 1, 2 and 3 grams. Each one of our nine classifiers improves upon the random one on all projects and by a large margin, ranging from 20.73% to 22.48% in terms of F-measure. The last measure of performance for our classifiers is the Cohen's Kappa metric presented in Table 25. The table presents the results of the Cohen's kappa metric for each of our nine classifiers. The metric is computed using the observed accuracy and the expected accuracy. The observed accuracy, in our bi-class system (i.e. T4 or not), is the number of correctly classified type 4 bugs added to the number of correctly classified non-T4 bugs over the total number of reports. The expected accuracy follows the same principle but uses the classification from the random classifier. The expected accuracy is

Table 22: Linear Regression classifier performances while using 3 grams. TP: True Positive, TN: True Negative, FN: False Negative, FP: False Positive

Project    Reports  T4 Reports  TP    TN   FN   FP    Precision  Recall  F1
Ambari     829      540         505   26   35   263   65.76%     93.52%  77.22%
Cassandra  340      199         176   21   23   120   59.46%     88.44%  71.11%
Flume      133      80          68    18   12   35    66.02%     85.00%  74.32%
HBase      357      215         91    99   124  43    67.91%     42.33%  52.15%
Hive       272      191         185   5    6    76    70.88%     96.86%  81.86%
Cnd        1105     805         747   22   58   278   72.88%     92.80%  81.64%
Editor     666      478         448   31   30   157   74.05%     93.72%  82.73%
Java       1090     693         667   55   26   342   66.11%     96.25%  78.38%
JavaEE     575      287         256   51   31   237   51.93%     89.20%  65.64%
Platform   969      573         468   102  105  294   61.42%     81.68%  70.11%
Total      6336     4061        3611  430  450  1845  66.18%     88.92%  75.89%

Table 23: Random Forest classifier performances while using 3 grams. TP: True Positive, TN: True Negative, FN: False Negative, FP: False Positive

Project    Reports  T4 Reports  TP    TN   FN   FP    Precision  Recall  F1
Ambari     829      540         500   22   40   267   65.19%     92.59%  76.51%
Cassandra  340      199         188   14   11   127   59.68%     94.47%  73.15%
Flume      133      80          70    23   10   30    70.00%     87.50%  77.78%
HBase      357      215         206   24   9    118   63.58%     95.81%  76.44%
Hive       272      191         189   1    2    80    70.26%     98.95%  82.17%
Cnd        1105     805         755   27   50   273   73.44%     93.79%  82.38%
Editor     666      478         453   32   25   156   74.38%     94.77%  83.35%
Java       1090     693         665   77   28   320   67.51%     95.96%  79.26%
JavaEE     575      287         241   73   46   215   52.85%     83.97%  64.87%
Platform   969      573         443   132  130  264   62.66%     77.31%  69.22%
Total      6336     4061        3710  425  351  1850  66.73%     91.36%  77.12%

Table 24: Random classifier. TP: True Positive, TN: True Negative, FN: False Negative, FP: False Positive

Project    Reports  T4 Reports  TP    TN    FN    FP    Precision  Recall  F1
Ambari     828      540         249   158   291   131   65.53%     46.11%  54.13%
Cassandra  339      199         111   68    88    73    60.33%     55.78%  57.96%
Flume      132      80          32    31    48    22    59.26%     40.00%  47.76%
HBase      356      215         105   68    110   74    58.66%     48.84%  53.30%
Hive       271      191         85    40    106   41    67.46%     44.50%  53.63%
Cnd        1104     805         393   159   412   141   73.60%     48.82%  58.70%
Editor     665      478         230   94    248   94    70.99%     48.12%  57.36%
Java       1089     693         365   205   328   192   65.53%     52.67%  58.40%
JavaEE     574      287         122   148   165   140   46.56%     42.51%  44.44%
Platform   968      573         277   194   296   202   57.83%     48.34%  52.66%
Total      6335     4061        1969  1165  2092  1110  63.95%     48.49%  55.15%

Table 25: Cohen's Kappa for each classifier

Type               Gram  TP T4  TN T4  Observed Accuracy  Expected Accuracy  Kappa  Interpretation
SVM                1     3824   262    0.64               0.49               0.30   Fair
SVM                2     3713   377    0.64               0.49               0.30   Fair
SVM                3     3728   345    0.64               0.49               0.29   Fair
Linear Regression  1     3744   268    0.63               0.49               0.27   Fair
Linear Regression  2     3473   555    0.63               0.49               0.28   Fair
Linear Regression  3     3611   430    0.64               0.49               0.28   Fair
Random Forest      1     3833   291    0.65               0.49               0.31   Fair
Random Forest      2     3777   382    0.66               0.49               0.32   Fair
Random Forest      3     3710   425    0.65               0.49               0.31   Fair

constant, as the random classifier predicts 50% of the reports as T4 and 50% as non-T4. Finally, the obtained Cohen's Kappa measures range from 0.27 to 0.32. While there is no unified way to interpret the result of the Cohen's kappa statistic, Landis and Koch consider 0-0.20 as slight, 0.21-0.40 as fair, 0.41-0.60 as moderate, 0.61-0.80 as substantial, and 0.81-1 as almost perfect [112]. Consequently, all of our classifiers show a fair improvement over a random classification regarding accuracy and a major improvement regarding F1-measure.

9.3.2 What are the best predictors of type 4 bugs?

In this section, we answer our second research question: What are the best predictors of type 4 bugs? To do so, we extracted the best predictors of type 4 bugs for each one of the extracted grams (1, 2 and 3) for each of our ten test projects (five Apache, five Netbeans). Then, we manually investigated the source code and the reports of these ten software projects to determine why a given word is a good predictor of type 4 bugs. In the remainder of this section, we present our findings by project and then provide a conclusion on the best predictors of type 4 bugs.
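One simple way to surface such predictors (a sketch under the assumption that the random forest pipeline from Section 9.3.1 was built with scikit-learn and with the step names used in our earlier sketch) is to rank the vectorizer's n-grams by the forest's feature importances.

import numpy as np

def top_t4_predictors(fitted_pipeline, k=10):
    """Return the k n-grams with the largest random forest feature
    importances from a fitted tfidf+classifier pipeline."""
    vocab = np.array(fitted_pipeline.named_steps["tfidf"].get_feature_names_out())
    importances = fitted_pipeline.named_steps["clf"].feature_importances_
    return vocab[np.argsort(importances)[::-1][:k]].tolist()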

Ambari

Ambari is aimed at making Hadoop management simpler by developing software for provisioning, managing, and monitoring clusters. One of the most acclaimed features of Ambari is the ability to visualise a cluster's health, according to user-defined metrics, with heat maps. These heat maps give a quick overview of the system. Figure 50 shows a screenshot of such a heat map. At every tested gram (i.e. 1, 2 and 3), the word "heat map" is a strong predictor of type 4 bugs. The heat map feature is a complex feature as it heavily relies on the underlying instrumentation of Hadoop and on the consumption of many log formats to, for example, extract the remaining free space on a disk or the current load on a CPU. Another word that is a strong predictor of type 4 bugs is "nagio". Nagios is a monitoring server used as an optional add-on for Ambari and, as with the heat map, it is very susceptible to log format changes and API breakage.

Figure 50: Ambari heatmap

Versions of the “nagio” and “heatmap” keywords include: “heatmap displai”, “ambari heatmap”, “fix nagio”, “nagio test”, “ambari heatmap displai”, “fix nagio test”.

Cassandra

Cassandra is a database with high scalability and high availability without compromising performance. While extracting the unique word combinations from the reports of Cassandra, one word that is a strong predictor of type 4 bugs is "snapshot". As described in the documentation, in Cassandra terms, a snapshot first flushes all in-memory writes to disk, then makes a hard link of the SSTable files for each keyspace. You must have enough free disk space on the node to accommodate making snapshots of your data files. A single snapshot requires little disk space. However, snapshots can cause your disk usage to grow more quickly over time because a snapshot prevents old obsolete data files from being deleted. After the snapshot is complete, you can move the backup files to another location if needed, or you can leave them in place. The definition gives the reader an insight into how complex this feature is regarding integration with the host system and how coupled it is to the Cassandra data model. Other versions of the "snapshot" keyword include "snapshot sequenti", "make snapshot", "snapshot sequenti repair", "make snapshot sequenti".

Flume

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. One word that is a good predictor of type 4 bugs in Flume is "upgrad", along with its 2-gram ("upgrad flume") and 3-gram ("upgrad flume to") versions. Once again for the Apache dataset, a change in the software that induces a change in the underlying data model or data store, which is often the case when upgrading Flume to a new version, is a good indicator of the report complexity and of the impact of said report on the source code in terms of the number of locations fixed. In the reports we manually analysed, Flume's developers and users have a hard time upgrading to new versions, in the sense that logs and dashboards get corrupted or disappear post-upgrade. Significant efforts are then made to prevent such losses in the subsequent version.

HBase

HBase is a Hadoop database, a distributed, scalable, big data store provided by Apache. The best predictor of type 4 bugs in HBase is "bloom", as in "bloom filters". Bloom filters are a probabilistic data structure that is used to test whether an element is a member of a set [27]. Such a feature is hard to implement and hard to test because of its probabilistic nature. Many feature commits (i.e. commits intended to add a feature) and fix commits (i.e. commits intended to fix a bug) belonging to the HBase source code are related to the bloom filters. Given the nature of the feature, it is not surprising to find the word "bloom" and its 2- and 3-gram counterparts ("on Bloom", "Bloom filter", "on Bloom filter") as good predictors of type 4 bugs.

Hive

Hive is a data warehouse software that facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Hive is different from its Apache counterparts in that the words that are the best predictors of type 4 bugs do not translate into a particular feature of the product but are directly the name of the incriminated part of the system: Thrift. Thrift is a software framework for scalable cross-language services development that combines a software stack with a code generation engine to

build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, Delphi and other languages. While Thrift is supposed to solve many compatibility issues when building clients for a product such as Hive, it is the cause of many major problems in Hive. The top predictors of type 4 bugs in Hive are "thrifthttpcliservic" and "thriftbinarycliservic". As Hive and its clients are built on top of Thrift, it makes sense that issues propagating from Thrift induce major refactorings and fixes across the whole Hive source code.

Cnd

The CND project is part of the Netbeans IDE and provides support for C/C++. The top two predictors of type 4 bugs are (1) parallelism and (2) observability of C/C++ code. In each gram, we can find references to parallel code being problematic while developed and executed via the Netbeans IDE: "parallel comput", "parallel", "parallel comput advis". The other word, related to the observability of C/C++ code inside the Netbeans IDE, is "Gizmo". "Gizmo" is the codename for the C/C++ Observability Tool built on top of the D-Light Toolkit. We can find occurrences of "Gizmo" in each gram: "gizmo" and "gizmo monitor", for example. Once again, a complex cross-cutting feature with a high impact on the end-user (i.e., the ability to code, execute and debug parallel code inside Netbeans) is the cause of most of the type 4 bugs, and a mention of said feature in a report is a good predictor of the bug's type.

Editor

The Editor component of Netbeans is the component handling all textual editing, regardless of the programming language, in Netbeans. For this component, the type 4 bugs are most likely related to the "trailing white spaces" and "spellcheck" features. While these features may not, at first sight, seem as complex as, for example, parallelism debugging, they have been the cause of the majority of type 4 bugs. Upon

manual inspection of the related code² in the Editor component of Netbeans, the complexity of these features becomes evident. Indeed, these features behave differently for almost every type of text file and text box inside Netbeans. For example, the end-user expects the spellchecking feature of the IDE to kick in while typing a comment inside a code file but not on the code itself. A similar example can be described for the identification and removal of trailing white spaces, where users wish the trailing white spaces to be deleted in C/C++ code but not, for example, while typing HTML or a commit message. Each new language or add-on supported by the Netbeans IDE and leveraging the features of the Editor component is likely to be the cause of a major refactoring to obtain a coherent behaviour regarding "trailing white spaces" and "spell checking".

Java

The Java component of Netbeans is responsible for the Java support of Netbeans, in the same fashion as CND is responsible for C/C++ support. For this particular component, the features that are good predictors of type 4 bugs are the ones related to Java autocompletion and navigation optimisation. The autocompletion has to be able to provide suggestions in a near-instantaneous manner if it is to be useful to the developer. To provide near-instantaneous suggestions on modest machines, and despite the depth of the Java API, Netbeans developers opted for a statistical autocompletion. The autocompletion remembers which of its suggestions you used before and only provides the ones you are most likely to want based on your previous usage. Also, each suggestion is accompanied by a percentage that describes how often you picked that suggestion over the others. One can envision how such a system can be tricky to implement when new APIs are added to the Java language at each upgrade. Indeed, when a new API comes to light following a Java upgrade on the developer's machine, the autocompletion has to make the new API appear in the suggestions despite its 0% chosen rate, the 0% being linked to the fact that the suggestion was not available thus far and not to the fact that the developer never picked it. When the new suggestion, related to the new API, has been ignored a given number of times, it can then be safely removed from the list of suggestions.

2https://netbeans.org/projects/editor/

Implementations of optimisations related to autocompletion and navigation are the root causes of many type 4 bugs, and we can find them among the extracted grams that are good predictors: "implement optim", "move otim", "optim import implement", "call hierarchy implement".
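A minimal sketch of the usage-based ranking policy described above is given below; the class and threshold are hypothetical and do not come from the Netbeans code base, but they capture the two delicate cases: new APIs must still be shown despite a 0% pick rate, and only after being ignored repeatedly can they be dropped.

import java.util.*;

// Hypothetical sketch of a usage-based (statistical) completion ranking.
final class StatisticalCompletion {
    private final Map<String, Integer> picks = new HashMap<>();   // times a suggestion was chosen
    private final Map<String, Integer> ignored = new HashMap<>(); // times a never-picked suggestion was shown
    private static final int MAX_IGNORED = 50;                    // assumed threshold before dropping a new API

    void recordPick(String suggestion) {
        picks.merge(suggestion, 1, Integer::sum);
        ignored.remove(suggestion);
    }

    /** Rank candidates by past usage; keep unseen (0%) entries so new APIs still appear. */
    List<String> rank(Collection<String> candidates) {
        List<String> result = new ArrayList<>();
        for (String c : candidates) {
            boolean neverPicked = !picks.containsKey(c);
            if (neverPicked && ignored.getOrDefault(c, 0) >= MAX_IGNORED) {
                continue; // ignored long enough: safe to remove from the list
            }
            if (neverPicked) {
                ignored.merge(c, 1, Integer::sum); // shown again without being picked
            }
            result.add(c);
        }
        // Most frequently picked first; never-picked entries (new APIs) sort last but are still shown.
        result.sort(Comparator.comparingInt((String c) -> picks.getOrDefault(c, 0)).reversed());
        return result;
    }
}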

JavaEE

The JavaEE component of Netbeans is responsible for the support of JavaEE in Netbeans. This module is different from the CND and JAVA modules in the sense that it uses and expands many functionalities of the JAVA component. For the JavaEE component, the best predictors of type 4 bugs are the Hibernate and WebSocket features, which can be found in many gram forms: "hibern revers", "websocket endpoint", "hibern", "websocket", "implement hibern revers", "hibern revers engin". Hibernate is an ORM that enables developers to more easily write applications whose data outlives the application process. As an Object/Relational Mapping (ORM) framework, Hibernate is concerned with data persistence as it applies to relational databases (via JDBC). The shortcoming of Netbeans leading to most of the type 4 bugs is related to the annotation-based persistence of Hibernate, where developers can annotate their class attributes with the name of the column in which they wish the value of the attribute to be persisted. While the annotation mechanism is supported by Java, it is not possible to compile annotations and make sure that they are statically sound. Consequently, much tooling around annotations has to be developed and maintained in accordance with new database updates. Such tooling, for example, is responsible for querying the database model to make sure that the annotated columns exist and can store the attribute, data type-wise.
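As an illustration of the annotation-based persistence discussed above, the fragment below uses the standard JPA annotations (here from the javax.persistence namespace) that Hibernate processes; the entity and column names are hypothetical. The Java compiler accepts this code even if the column does not exist in the database schema, which is precisely the gap the IDE tooling has to cover.

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

// Hypothetical entity: the Java compiler cannot verify that the table
// and column names below actually exist in the database schema.
@Entity
@Table(name = "users")
public class User {

    @Id
    @Column(name = "id")
    private long id;

    // If "last_name" is renamed or dropped in the database, this still compiles;
    // the mismatch only surfaces at runtime or through dedicated IDE tooling.
    @Column(name = "last_name")
    private String lastName;
}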

Platform

The last component we analyzed is the one named Platform. The NetBeans Platform is a generic framework for Swing applications. It provides the “plumbing” that, before, every developer had to write themselves—saving state, connecting actions to menu items, toolbar items and keyboard shortcuts; window management, and so on. (https://netbeans.org/features/platform/) The best predictor of type 4 bug in the platform component is the “filesystem”

word, which refers to the ability of any application built atop the Platform to use the filesystem for saving data and the like. What we can conclude for this second research question is that the best predictor of type 4 bugs is the mention of a cross-concern, complex, and widely used feature of the targeted system. Reports mentioning said feature are likely to create a type 4 structure with many bugs being fixed in the same set of files. One noteworthy observation is that the 2- and 3-gram extractions do not add much precision over the 1-gram extraction, as seen in the first research question. Upon the manual analysis required for this research question, we can deduce why. Indeed, the problematic features of a given system are identified by a single word (i.e., hibernate, filesystem, spellcheck, ...). While the 2- and 3-gram classifiers do not provide additional performance in the classification process, they still come in handy when trying to pinpoint which part of the feature is a good predictor of type 4 bugs ("implement optim", "gizmo monitor", "heatmap displai", ...).
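For clarity, the sketch below shows how 1-, 2-, and 3-grams can be extracted from an already stemmed report title (the tokenisation and stemming pipeline is assumed, and this is an illustration rather than our exact implementation); it also makes visible why a distinctive single word such as "hibern" already carries most of the signal that its 2- and 3-gram supersets merely refine.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal n-gram extraction over a stemmed report title (illustrative only).
final class Ngrams {

    static List<String> extract(String stemmedTitle, int n) {
        String[] tokens = stemmedTitle.trim().split("\\s+");
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= tokens.length; i++) {
            grams.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + n)));
        }
        return grams;
    }

    public static void main(String[] args) {
        String title = "implement hibern revers engin"; // already stemmed
        System.out.println(extract(title, 1)); // [implement, hibern, revers, engin]
        System.out.println(extract(title, 2)); // [implement hibern, hibern revers, revers engin]
        System.out.println(extract(title, 3)); // [implement hibern revers, hibern revers engin]
    }
}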

9.4 Threats to Validity

The selection of target systems is one of the common threats to validity for approaches that perform qualitative and quantitative analyses. While it is possible that the selected programs share common properties that we are not aware of and that would, therefore, invalidate our results, this is highly unlikely, as our dataset is composed of 388 open source systems. In addition, we see a threat to validity that stems from the fact that we only used open-source systems; the results may not be generalizable to industrial systems. We intend to undertake such studies in future work.

9.5 Chapter Summary

In this chapter, we proposed a taxonomy of bugs and performed an empirical study on two large open source datasets: the Netbeans IDE and the Apache Software Foundation's projects. Our study aimed to analyse: (1) the proportion of each type of bug; (2) the complexity of each type in terms of severity, reopening, and duplication; and (3) the time required to fix a bug depending on its type. The key findings

are that type 4 bugs account for 61% of the bugs and have a bigger impact on software maintenance tasks than the other three types. In the next chapter, we start a discussion covering several topics ranging from the challenges surrounding the adoption of tools to our advice for university-industry research collaboration.

Chapter 10

Conclusion and Future Work

In this chapter, we present the conclusion of this thesis and the future work that remains to be done. Software maintenance activities such as debugging and feature enhancement are known to be challenging and costly [158]. Studies have shown that the cost of software maintenance can reach up to 70% of the overall cost of the software development life cycle [70]. Much of this is attributable to several factors, including the increase in software complexity, the lack of traceability between the various artifacts of the software development process, the lack of proper documentation, and the unavailability of the original developers of the systems. In this thesis, we presented three approaches that perform software maintenance at commit-time (PRECINCT, BIANCA, and CLEVER). We also presented JCHARMING, which can reproduce on-field crashes when commit-time approaches did not catch the defect before its release. Finally, we also proposed a taxonomy of bugs that could be used by researchers to categorize the research in many areas related to software maintenance.

10.1 Summary of the Findings

• We created BUMPER, an aggregated, searchable bug-fix repository that contains 1,930 projects, 732,406 resolved/fixed bug reports, and 1,645,252 changesets from Eclipse, Gnome, Netbeans, and the Apache Software Foundation.

• We proposed PRECINCT, an incremental, online clone-detection approach that

operates at commit-time. It was able to achieve an average of 97.7% precision and 100% recall while requiring a fraction of the time, thanks to its incremental approach.

• We presented BIANCA, which detects risky commits and proposes potential fixes using clone detection and dependency clustering. It was able to detect risky commits at commit-time with an average of 90.75% precision and 37.15% recall. In addition, 78% of the proposed fixes were automatically classified as qualitative using a similarity threshold with the actual fix.

• Out of the 15,316 commits that BIANCA classified as risky, only 1,320 (8.6%) were flagged because they matched a defect-commit inside the same project. This supports the idea that, within the same project, developers are not likely to introduce the same defect twice. Across similar projects, however, similar bugs are introduced.

• We manually reviewed 250 fixes proposed by BIANCA. In 84% of the cases, we were able to identify statements in the proposed fixes that can be reused to create fixes similar to the ones that the developers had proposed.

• We built upon BIANCA with CLEVER, which combines clone- and metric-based detection of risky commits and proposes potential fixes. It significantly reduced the scalability concerns of BIANCA while obtaining an average of 79.10% precision and 65.61% recall.

• 66% of the fixes proposed by CLEVER were accepted by software developers within Ubisoft.

• We introduced JCHARMING, an automatic bug reproduction technique that combines crash traces and directed model checking. When applied to thirty bugs from ten open source systems, JCHARMING was able to successfully reproduce 80% of the bugs. The average time to reproduce a bug was 19 minutes, which is quite reasonable, given the complexity of reproducing bugs that cause field crashes.

• Finally, we proposed a taxonomy of bugs and performed an empirical study. The key findings are that type 4 bugs account for 61% of the bugs and have a bigger impact on software maintenance tasks than the other three types.

10.2 Future Work

10.2.1 Current Limitations

We should acknowledge that the most notable shortcoming of this thesis is the fact that we did not incorporate developers' opinions enough in our studies. Indeed, we only gathered developers' opinions on two separate occasions (BUMPER, Chapter 4 and CLEVER, Chapter 7). While their opinions were positive, we should continue to ask practitioners for their feedback on our work. Another limitation of our work is that most of the approaches (BUMPER, BIANCA, and CLEVER) will most likely be ineffective if applied to a single system. Indeed, these approaches rely on the large amount of data acquired from multiple software ecosystems. This also leads to the scalability issues of our work. The model required to operate BIANCA took nearly three months to build and test, using 48 Amazon virtual private servers running in parallel. While CLEVER addresses some of the scalability issues with its two-step classifier, the search for potential solutions is still computationally expensive.

10.2.2 Other Possible Opportunities for Future Research

To build on this work, we need to conduct experiments with additional (and larger) systems, with the dual aim of fine-tuning the approaches and assessing their scalability when applied to even larger systems. Also, we want to improve PRECINCT to support Type 4 clones, which would be a significant advantage over other clone detectors. In addition, user studies with developers could be conducted to assess the concrete effectiveness of PRECINCT compared to remote and local clone detection techniques. Conducting user studies with developers in order to gather their feedback on BIANCA and CLEVER would also be beneficial; the feedback obtained could help fine-tune the approaches. Another avenue is examining the relationship between project cluster measures (such as betweenness) and the performance of BIANCA. Finally,

another improvement to BIANCA would be to support Type 4 clones. For BIANCA, building a feedback loop between the users and the clusters of known buggy commits and their fixes could be a major improvement. For example, if a fix is never used by end-users, then we could remove it from the clusters and improve our accuracy. For JCHARMING, more experiments with more (and more complex) bugs could be conducted, with the dual aim of (a) improving and fine-tuning the approach, and (b) assessing its scalability when applied to even larger (and proprietary) systems. Finally, comparing JCHARMING to STAR [32], which is the closest approach to ours, by running both methods on the same set of systems and bugs could yield interesting results. This comparative study could provide insights into the differences between the use of symbolic analysis, as in STAR, and directed model checking, the core mechanism of JCHARMING.

10.3 Closing Remarks

In this thesis, we proposed several approaches to help software developers build high quality software systems. We chose to focus on techniques that operate at commit-time. By doing so, this thesis' contributions adhere to the broader concept of just-in-time software quality improvement methods, which promote the integration of software maintenance and quality assurance tasks with day-to-day development [92, 191, 206]. Today's software systems are becoming ultra large, composed of many components that interact with each other in complex ways. It is often challenging for software developers to apply existing methods for improving software quality, such as clone detection and refactoring, after the system has been built in its entirety. This is because the mental models and thought processes that led to the changes or design decisions are long gone and forgotten. Just-in-time software improvement techniques encourage an agile and lean process in which improvements are made as soon as changes happen in the system, while the decisions motivating the changes are still fresh in the minds of practitioners. Agility and lean thinking are not new concepts. They date back to the seventies, when Toyota, the car manufacturer, proposed just-in-time manufacturing techniques to produce high quality products that meet customers' needs [180]. Just-in-time manufacturing relies on several management philosophies and tools:

continuous improvement (product-oriented layout of plants, division of systems, simplicity), elimination of waste (overproduction, waiting time, inventory waste, transportation, product defects), Hausukipingu (clean workspaces), Kanbans (pulling the right number of items from the right shelves at just the right time), Jidoka (autonomous machines with judgment capabilities), and Andons (signals of problems for corrective action).1 One of the cornerstones of the just-in-time manufacturing philosophy of the seventies and the agile manifesto of the 2000s is continuous improvement [60]. Other notions cross the gap between just-in-time manufacturing and just-in-time software improvement. For example, integrated development environments (IDEs) are Jidokas in the sense that they auto-complete and auto-correct our code based on past behavior, while unit tests, bug report management systems, and quality assurance (QA) bots are Andons. PRECINCT, BIANCA, and CLEVER are designed to support just-in-time software quality improvement and maintenance techniques by continuously checking developers' code before its integration into a larger software system. We believe that commit-time is the appropriate time to perform just-in-time software maintenance, as commits mark the end of a task or sub-task that is ready to be versioned. JCHARMING adds value by proposing a mechanism for reproducing and fixing the remaining defects. We hope that software engineering researchers and practitioners find this work useful as a starting point for exciting new research.

1We kept the Japanese version of most of the words as they are used without translation in the literature.

Bibliography

[1] Alencar, D., Abebe, S.L. and Mcintosh, S. 2014. An Empirical Study of Delays in the Integration of Addressed Issues. Proceedings of the international conference on software maintenance and evolution (2014), 281–290. [2] Antoniol, G., Canfora, G., Casazza, G., De Lucia, A. and Merlo, E. 2002. Recovering traceability links between code and documentation. IEEE Transactions on Software Engineering. 28, 10 (Oct. 2002), 970–983. [3] Anvik, J., Hiew, L. and Murphy, G.C. 2006. Who should fix this bug? Pro- ceeding of the international conference on software engineering (2006), 361. [4] Apache Software Foundation Apache Ant. [5] Apache Software Foundation 2014. Apache PDFBox | A Java PDF Library. [6] Apache Software Foundation 2000. Apache Struts Project. [7] Apache Software Fundation 2011. Hadoop. [8] Apache Software Fundation 2012. Mahout: Scalable machine learning and data mining. [9] Artzi, S., Kim, S. and Ernst, M.D. 2008. Recrash: Making software failures reproducible by preserving object states. Proceedings of the european conference on object-oriented programming (2008), 542–565. [10] Avizienis, A., Laprie, J.-C., Randell, B. and Landwehr, C. 2004. Basic con- cepts and taxonomy of dependable and secure computing. IEEE Transactions on Dependable and Secure Computing. 1, 1 (Jan. 2004), 11–33. [11] Ayewah, N., Pugh, W., Morgenthaler, J.D., Penix, J. and Zhou, Y. 2007. Evaluating static analysis defect warnings on production software. Proceedings of the workshop on program analysis for software tools and engineering (2007), 1–8. [12] Bachmann, A., Bird, C., Rahman, F., Devanbu, P. and Bernstein, A. 2010. The missing links. Proceedings of the acm sigsoft international symposium on foun- dations of software engineering (2010), 97. [13] Baier, C. and Katoen, J.-P. 2008. Principles of Model Checking. MIT press Cambridge. [14] Baker, B. 1995. On finding duplication and near-duplication in large software systems. Proceedings of the working conference on reverse engineering (1995), 86–95. [15] Balazinska, M., Merlo, E., Dagenais, M., Lague, B. and Kontogiannis, K. 1999. Partial redesign of Java software systems based on clone analysis. Proceedings

172 of the sixth working conference on reverse engineering (1999), 326–336. [16] Basili, V., Briand, L. and Melo, W. 1996. A validation of object-oriented design metrics as quality indicators. IEEE Transactions on Software Engineering. 22, 10 (1996), 751–761. [17] Bavota, G., Canfora, G., Penta, M.D., Oliveto, R. and Panichella, S. 2013. The Evolution of Project Inter-dependencies in a Software Ecosystem: The Case of Apache. Proceedings of the international conference on software maintenance (2013), 280–289. [18] Baxter, I.D., Yahin, A., Moura, L., Sant’Anna, M. and Bier, L. 1998. Clone Detection Using Abstract Syntax Trees. Proceedings of the ieee international confer- ence on software maintenance. (1998), 368–377. [19] Beckwith, L., Kissinger, C., Burnett, M., Wiedenbeck, S., Lawrance, J., Black- well, A. and Cook, C. 2006. Tinkering and gender in end-user ’ debug- ging. Proceedings of the sigchi conference on human factors in computing systems (2006), 231–240. [20] Bell, J., Sarda, N. and Kaiser, G. 2013. Chronicler: Lightweight recording to reproduce field failures. Proceedings of the international conference on software engineering (2013), 362–371. [21] Bellon, S., Koschke, R., Antoniol, G., Krinke, J. and Merlo, E. 2007. Com- parison and Evaluation of Clone Detection Tools. IEEE Transactions on Software Engineering. 33, 9 (Sep. 2007), 577–591. [22] Bettenburg, N., Just, S., Schrter, A., Weiss, C., Premraj, R. and Zimmer- mann, T. 2008. What makes a good bug report? Proceedings of the 16th acm sigsoft international symposium on foundations of software engineering (2008), 308. [23] Bettenburg, N., Premraj, R. and Zimmermann, T. 2008. Duplicate bug re- ports considered harmful ... really? Proceedings of the international conference on software maintenance (2008), 337–345. [24] Bhattacharya, P. and Neamtiu, I. 2011. Bug-fix time prediction models: can we do better? Proceeding of the international conference on mining software repositories (2011), 207–210. [25] Bortis, G. and Hoek, A. van der 2013. PorchLight: A tag-based approach to bug triaging. Proceeding of the international conference on software engineering

173 (2013), 342–351. [26] Briand, L., Daly, J. and Wust, J. 1999. A unified framework for coupling mea- surement in object-oriented systems. IEEE Transactions on Software Engineering. 25, 1 (1999), 91–121. [27] Broder, A. and Mitzenmacher, M. 2004. Network Applications of Bloom Filters: A Survey. Internet Mathematics. 1, 4 (Jan. 2004), 485–509. [28] Bugzilla 2008. Life Cycle of a Bug. [29] Burg, B., Bailey, R., Ko, A.J. and Ernst, M.D. 2013. Interactive record/replay for debugging. Proceedings of the annual acm symposium on user interface software and technology (2013), 473–484. [30] Burnstein, I. 2006. Practical Software Testing: A Process-Oriented Approach. Springer Science & Business Media. [31] Chandra, S., Fink, S.J. and Sridharan, M. 2009. Snugglebug: a powerful approach to weakest preconditions. ACM sigplan notices (2009), 363–374. [32] Chen, N. 2013. Star: stack trace based automatic crash reproduction. Honk Kong University of Science; Technology. [33] Chen, N. 2013. Star: stack trace based automatic crash reproduction. The Hong Kong University of Science; Technology. [34] Chen, T.-h., Nagappan, M., Shihab, E. and Hassan, A.E. 2014. An Empiri- cal Study of Dormant Bugs Categories and Subject Descriptors. Proceedings of the international conference on mining software repository (2014), 82–91. [35] Chidamber, S. and Kemerer, C. 1994. A metrics suite for object oriented design. IEEE Transactions on Software Engineering. 20, 6 (Jun. 1994), 476–493. [36] Chris Vignola 2014. The Java Community Process(SM) Program - JSRs: Java Specification Requests - detail JSR# 352. [37] Clause, J. and Orso, A. 2007. A Technique for Enabling and Supporting Debugging of Field Failures. Proceedings of the international conference on software engineering (2007), 261–270. [38] CollabNet Tigris.org: Open Source Software Engineering. [39] Cordy, J.R. 2006. Source transformation, analysis and generation in TXL. Proceedings of the symposium on partial evaluation and semantics-based program ma- nipulation (2006), 1–11. [40] Cordy, J.R. and Roy, C.K. 2011. The NiCad Clone Detector. Proceedings of

174 the international conference on program comprehension (2011), 219–220. [41] Czerwonka, J., Nagappan, N., Schulte, W. and Murphy, B. 2013. CODEM- INE: Building a Software Development Data Analytics Platform at Microsoft. IEEE Software. 30, 4 (Jul. 2013), 64–71. [42] D’Ambros, M., Lanza, M. and Robbes, R. 2010. An extensive comparison of bug prediction approaches. Proceedings ot the working conference on mining software repositories (2010), 31–41. [43] Dallmeier, V., Zeller, A. and Meyer, B. 2009. Generating Fixes from Ob- ject Behavior Anomalies. Proceedings of the international conference on automated software engineering (2009), 550–554. [44] Dangel, A. 2000. PMD. [45] De Lucia, A. 2001. Program slicing: Methods and applications. International working conference on source code analysis and manipulation (2001), 144. [46] De Moura, L. and Bjrner, N. 2008. Z3: An efficient SMT solver. Tools and algorithms for the construction and analysis of systems. Springer. 337–340. [47] Dean, T.R., Cordy, J.R., Malton, A.J. and Schneider, K.A. 2003. Agile Parsing in TXL. Proceedings of the international conference on automated software engineering (2003), 311–336. [48] Demange, A., Moha, N. and Tremblay, G. 2013. Detection of SOA Patterns. Proceedings of the international conference on service-oriented computing (2013), 114–130. [49] Ducasse, S., Rieger, M. and Demeyer, S. 1999. A Language Independent Approach for Detecting Duplicated Code. Proceedings of the international conference on software maintenance (1999), 109–118. [50] Dutertre, B. and Moura, L.D. 2006. The yices smt solver. Technical Report #2. [51] Dyer, R., Nguyen, H.A., Rajan, H. and Nguyen, T.N. 2013. Boa: a language and infrastructure for analyzing ultra-large-scale software repositories. Proceedings of the international conference on software engineering (2013), 422–431. [52] Edelkamp, S., Leue, S. and Lluch-Lafuente, A. 2004. Directed explicit-state model checking in the validation of communication protocols. International Journal on Software Tools for Technology Transfer (STTT). 5, 2-3 (Mar. 2004), 247–267. [53] Edelkamp, S., Schuppan, V., Bosnacki, D., Wijs, A., FehnkerAnsgar and

175 Aljazzar, H. 2009. Survey on directed model checking. Springer. [54] El Emam, K., Melo, W. and Machado, J.C. 2001. The prediction of faulty classes using object-oriented design metrics. Journal of Systems and Software. 56, 1 (Feb. 2001), 63–75. [55] Eldh, S. 2001. On Test Design. Mlardalen. [56] Eyolfson, J., Tan, L. and Lam, P. 2011. Do time of day and developer experience affect commit bugginess. Proceeding of the working conference on mining software repositories (New York, New York, USA, May 2011), 153. [57] Fischer, M., Pinzger, M. and Gall, H. Populating a Release History Database from version control and bug tracking systems. Proceedings of the international con- ference on software maintenance 23–32. [58] Fleiss, J.L. and Cohen, J. 1973. The Equivalence of Weighted Kappa and the Intraclass Correlation Coefficient as Measures of Reliability. Educational and Psychological Measurement. 33, 3 (Oct. 1973), 613–619. [59] Foss, S.L. and Murphy, G.C. 2015. Do developers respond to code stability warnings? Proceedings of the annual international conference on computer science and software engineering (2015), 162–170. [60] Fowler, M. and Highsmith, J. 2001. The agile manifesto. Software Develop- ment. 9, 8 (2001), 28–35. [61] Gabel, M., Jiang, L. and Su, Z. 2008. Scalable detection of semantic clones. Proceedings of the 13th international conference on software engineering (2008), 321– 330. [62] Girvan, M. and Newman, M.E.J. 2002. Community structure in social and biological networks. Proceedings of the National Academy of Sciences. 99, 12 (2002), 7821–7826. [63] Gde, N. and Koschke, R. 2009. Incremental Clone Detection. Proceedings of the european conference on software maintenance and reengineering (2009), 219–228. [64] Graphwalker 2016. GraphWalker for testers. Http://graphwalker.github.io/ (2016). [65] Gyimothy, T., Ferenc, R. and Siket, I. 2005. Empirical validation of object- oriented metrics on open source software for fault prediction. IEEE Transactions on

176 Software Engineering. 31, 10 (Oct. 2005), 897–910. [66] Hamill, M. and Goseva-Popstojanova, K. 2014. Exploring fault types, de- tection activities, and failure severity in an evolving safety-critical software system. Software Quality Journal. 23, 2 (2014), 229–265. [67] Hassan, A. and Holt, R. 2005. The top ten list: dynamic fault prediction. Proceedings of the international conference on software maintenance (2005), 263–272. [68] Hassan, A.E. 2009. Predicting faults using the complexity of code changes. Proceedings of the international conference on software engineering (2009), 78–88. [69] Havelund, K., Holzmann, G. and Joshi, R. eds. 2015. NASA Formal Methods. Springer International Publishing. [70] Health, Social and Research, E. 2002. The Economic Impacts of Inadequate Infrastructure for Software Testing. [71] Herbold, S., Grabowski, J., Waack, S. and Bnting, U. 2011. Improved Bug Reporting and Reproduction through Non-intrusive GUI Usage Monitoring and Au- tomated Replaying. Proceedings of the international conference on software testing, verification and validation workshops (2011), 232–241. [72] Herraiz, I., German, D.M., Gonzalez-Barahona, J.M. and Robles, G. 2008. Towards a simplification of the bug report form in eclipse. Proceedings of the inter- national workshop on mining software repositories (2008), 145. [73] Higo, Y., Yasushi, U., Nishino, M. and Kusumoto, S. 2011. Incremental Code Clone Detection: A PDG-based Approach. Proceedings of the working conference on reverse engineering (2011), 3–12. [74] Hindle, A., German, D.M. and Holt, R. 2008. What do large commits tell us?: a taxonomical study of large commits. Proceedings of the international workshop on mining software repositories (2008), 99–108. [75] Holzmann, G.J. 1997. The model checker SPIN. IEEE Transactions on Soft- ware Engineering. 23, 5 (1997), 279–295. [76] Hovemeyer, D. 2007. FindBugs. [77] Hummel, B., Juergens, E., Heinemann, L. and Conradt, M. 2010. Index- based code clone detection: incremental, distributed, scalable. Proceedings of the international conference on software maintenance (2010), 1–9. [78] Hunt, J.W. and Szymanski, T.G. 1977. A fast algorithm for computing longest

177 common subsequences. Communications of the ACM. 20, 5 (1977), 350–353. [79] IBM 2006. T. J. Watson Libraries for Analysis (WALA). [80] Jalbert, N. and Weimer, W. 2008. Automated duplicate detection for bug tracking systems. Proceedings of the international conference on dependable systems and networks with ftcs and dcc (2008), 52–61. [81] Jaygarl, H., Kim, S., Xie, T. and Chang, C.K. 2010. OCAT: Object Capture based Automated Testing. Proceedings of the international symposium on software testing and analysis (2010), 159–170. [82] Jeong, G., Kim, S. and Zimmermann, T. 2009. Improving bug triage with bug tossing graphs. Proceedings of the the joint meeting of the european software engineering conference and the acm sigsoft symposium on the foundations of software engineering (2009), 111. [83] Jiang, L., Misherghi, G., Su, Z. and Glondu, S. 2007. DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones. Proceedings of the international conference on software engineering (2007), 96–105. [84] Jin, W. and Orso, A. 2012. BugRedux: Reproducing field failures for in-house debugging. Proceedings of the international conference on software engineering, iEEE (2012), 474–484. [85] Jin, W. and Orso, A. 2013. F3: fault localization for field failures. Proceedings of the international symposium on software testing and analysis (2013), 213–223. [86] Johnson, B., Song, Y., Murphy-Hill, E. and Bowdidge, R. 2013. Why don’t software developers use static analysis tools to find bugs? Proceedings of the interna- tional conference on software engineering (2013), 672–681. [87] Johnson, J.H. 1993. Identifying redundancy in source code using fingerprints. Proceedings of the conference of the centre for advanced studies on collaborative re- search (1993), 171–183. [88] Johnson, J.H. 1994. Visualizing textual redundancy in legacy source. Pro- ceedings of the conference of the centre for advanced studies on collaborative research (1994), 32. [89] Joshua O’Madadhain, Danyel Fisher, Scott White, Padhraic Smyth, Y.-b.B. 2005. Analysis and Visualization of Network Data using JUNG. Journal of Statistical Software. 10, 2 (2005), 1–35. [90] Juergens, E., Deissenboeck, F., Hummel, B. and Wagner, S. 2009. Do code

178 clones matter? Proceedings of the international conference on software engineering (2009), 485–495. [91] Jung, Y., Oh, H. and Yi, K. 2009. Identifying static analysis techniques for finding non-fix hunks in fix revisions. Proceeding of the international workshop on data-intensive software management and mining (2009), 13. [92] Kamei, Y., Shihab, E., Adams, B., Hassan, A.E., Mockus, A., Sinha, A. and Ubayashi, N. 2013. A large-scale empirical study of just-in-time quality assurance. IEEE Transactions on Software Engineering. 39, 6 (Jun. 2013), 757–773. [93] Kamiya, T., Kusumoto, S. and Inoue, K. 2002. CCFinder: a multilinguistic token-based code clone detection system for large scale source code. IEEE Transac- tions on Software Engineering. 28, 7 (Jul. 2002), 654–670. [94] Kapser, C. and Godfrey, M. 2004. Aiding comprehension of cloning through categorization. Proceedings of the international workshop on principles of software evolution. (2004), 85–94. [95] Kapser, C. and Godfrey, M. 2006. Cloning Considered Harmful Considered Harmful. Proceedings of the working conference on reverse engineering (2006), 19–28. [96] Kapser, C. and Godfrey, M.W. 2003. Toward a Taxonomy of Clones in Source Code: A Case Study. International workshop on evolution of large scale industrial software architectures (2003), 67–78. [97] Kim, D., Nam, J., Song, J. and Kim, S. 2013. Automatic patch generation learned from human-written patches. Proceedings of the international conference on software engineering (2013), 802–811. [98] Kim, D., Wang, X., Member, S., Kim, S., Zeller, A., Cheung, S.C., Member, S. and Park, S. 2011. Which Crashes Should I Fix First?: Predicting Top Crashes at an Early Stage to Prioritize Debugging Efforts. TRANSACTIONS ON SOFTWARE ENGINEERING. 37, 3 (2011), 430–447. [99] Kim, M., Sazawal, V. and Notkin, D. 2005. An empirical study of code clone genealogies. ACM SIGSOFT Software Engineering Notes. 30, 5 (Sep. 2005), 187–196. [100] Kim, S., Zimmermann, T., Pan, K. and Jr. Whitehead, E. 2006. Auto- matic Identification of Bug-Introducing Changes. Proceedings of the international

179 conference on automated software engineering (2006), 81–90. [101] Kim, S., Zimmermann, T., Pan, K. and Jr. Whitehead, E. 2006. Auto- matic Identification of Bug-Introducing Changes. Proceedings of the international conference on automated software engineering (2006), 81–90. [102] Kim, S., Zimmermann, T., Whitehead Jr., E.J. and Zeller, A. 2007. Pre- dicting Faults from Cached History. Proceedings of the international conference on software engineering (2007), 489–498. [103] Kim, S., Zimmermann, T., Whitehead Jr., E.J. and Zeller, A. 2007. Pre- dicting Faults from Cached History. Proceedings of the international conference on software engineering (May 2007), 489–498. [104] Ko, A., Myers, B., Coblenz, M. and Aung, H. 2006. An Exploratory Study of How Developers Seek, Relate, and Collect Relevant Information during Software Maintenance Tasks. IEEE Transactions on Software Engineering. 32, 12 (Dec. 2006), 971–987. [105] Kontogiannis, K. 1997. Evaluation experiments on the detection of pro- gramming patterns using software metrics. Proceedings of the working conference on reverse engineering (1997), 44–54. [106] Koponen, T. 2006. Life cycle of defects in open source software projects. Open source systems (2006), 195–200. [107] Kpodjedo, S., Ricca, F., Galinier, P., Guhneuc, Y.-G. and Antoniol, G. 2010. Design evolution metrics for defect prediction in object oriented systems. Empirical Software Engineering. 16, 1 (Dec. 2010), 141–175. [108] Krinke, J. 2001. Identifying Similar Code with Program Dependence Graphs. Proceedings of the eighth working conference on reverse engineering (2001), 301–309. [109] Kripke, S.A. 1963. Semantical Considerations on Modal Logic. Acta Philo- sophica Fennica. 16, 1963 (1963), 83–94. [110] Kropf, T. 1999. Introduction to formal hardware verification. Springer. [111] Lamkanfi, A., Demeyer, S., Giger, E. and Goethals, B. 2010. Predicting the severity of a reported bug. Proceedings of the working conference on mining software repositories (2010), 1–10. [112] Landis, J.R. and Koch, G.G. 1977. The Measurement of Observer Agreement

180 for Categorical Data. Biometrics. 33, 1 (1977), 159. [113] Latoza, T.D., Venolia, G. and DeLine, R. 2006. Maintaining mental mod- els: a study of developer work habits. Proceeding of the international conference on software engineering (2006), 492–501. [114] Layman, L., Williams, L. and Amant, R.S. 2007. Toward Reducing Fault Fix Time: Understanding Developer Behavior for the Design of Automated Fault Detection Tools. Proceedings of the international symposium on empirical software engineering and measurement (2007), 176–185. [115] Le Goues, C., Dewey-Vogt, M., Forrest, S. and Weimer, W. 2012. A sys- tematic study of automated program repair: Fixing 55 out of 105 bugs for $8 each. Proceedings of the international conference on software engineering (2012), 3–13. [116] Le, X.-B.D., Le, T.-D.B. and Lo, D. 2015. Should fixing these failures be delegated to automated program repair? Proceedings of the international symposium on software reliability engineering (2015), 427–437. [117] Lee, T., Nam, J., Han, D., Kim, S. and In, H.P. 2011. Micro interaction metrics for defect prediction. Proceedings of the european conference on foundations of software engineering (2011), 311–231. [118] Lewis, C., Lin, Z., Sadowski, C., Zhu, X., Ou, R. and Whitehead Jr., E.J. 2013. Does bug prediction support human developers? Findings from a google case study. Proceedings of the international conference on software engineering (2013), 372–381. [119] Li, Z., Lu, S., Myagmar, S. and Zhou, Y. 2006. CP-Miner: finding copy- paste and related bugs in large-scale software code. IEEE Transactions on Software Engineering. 32, 3 (Mar. 2006), 176–192. [120] Lo, D. 2013. A Comparative Study of Supervised Learning Algorithms for Re-opened Bug Prediction. Proceedings of the european conference on software maintenance and reengineering (2013), 331–334. [121] Maiga, A., Hamou-Lhadj, A., Nayrolles, M., Koochekian Sabor, K. and Larsson, A. 2015. An empirical study on the handling of crash reports in a large software company: An experience report. Proceedings of the international conference on software maintenance and evolution (2015), 342–351. [122] Manber, U. 1994. Finding similar files in a large file system. Proceedings of

181 the usenix winter (Jan. 1994), 1–10. [123] Manevich, R., Sridharan, M. and Adams, S. 2004. PSE: explaining pro- gram failures via postmortem static analysis. ACM sigsoft software engineering notes (2004), 63. [124] Marcus, A. and Maletic, J. 2001. Identification of high-level concept clones in source code. Proceedings international conference on automated software engineering (2001), 107–114. [125] Martin Fowler 2009. FeatureBranch. [126] McMillan, K.L. 1993. Symbolic model checking. Springer. [127] Menzies, T. and Marcus, A. 2008. Automated severity assessment of software defect reports. Proceedings of the international conference on software maintenance (2008), 346–355. [128] Menzies, T., Dekhtyar, A., Distefano, J. and Greenwald, J. 2007. Problems with Precision: A Response to“ Comments on’Data Mining Static Code Attributes to Learn Defect Predictors”’. IEEE Transactions on Software Engineering. 33, 9 (2007), 637. [129] Moha, N., Palma, F., Nayrolles, M. and Conseil, B.J. 2012. Specification and Detection of SOA Antipatterns. International conference on service oriented computing (2012), 1–16. [130] Nagappan, N. and Ball, T. 2005. Static analysis tools as early indicators of pre-release defect density. Proceedings of the international conference on software engineering (2005), 580–586. [131] Nagappan, N. and Ball, T. 2005. Use of relative code churn measures to predict system defect density. Proceedings of the international conference on software engineering (2005), 284–292. [132] Nagappan, N., Ball, T. and Zeller, A. 2006. Mining metrics to predict com- ponent failures. Proceeding of the international conference on software engineering (2006), 452–461. [133] Nam, J., Pan, S.J. and Kim, S. 2013. Transfer defect learning. Proceedings of the international conference on software engineering (2013), 382–391. [134] Narayanasamy, S., Pokam, G. and Calder, B. 2005. BugNet: Continuously Recording Program Execution for Deterministic Replay Debugging. Proceedings of

182 the 32nd annual international symposium on computer architecture (2005), 284–295. [135] NASA 2009. Open Mission Control Technologies. [136] Nayrolles, M. and Hamou-lhadj, A. 2018. CLEVER: Combining Code Met- rics with Clone Detection for Just-In-Time Fault Prevention and Resolution in Large Industrial Projects. Mining software repositories (2018). [137] Nayrolles, M. and Hamou-Lhadj, W. 2016. BUMPER: A Tool to Cope with Natural Language Search of Millions Bugs and Fixes. International conference on software analysis, evolution, and reengineering (2016), 649–652. [138] Nayrolles, M., Hamou-Lhadj, A., Sofiene, T. and Larsson, A. 2015. JCHARM- ING : A Bug Reproduction Approach Using Crash Traces and Directed Model Check- ing. Proceedings of the international conference on software analysis, evolution, and reengineering (2015), 101–110. [139] Nayrolles, M., Hamou-Lhadj, A., Tahar, S. and Larsson, A. 2016. A bug reproduction approach based on directed model checking and crash traces. Journal of Software: Evolution and Process. (2016). [140] Nayrolles, M., Hamou-Lhadj, A., Tahar, S. and Larsson, A. 2015. JCHARM- ING: A bug reproduction approach using crash traces and directed model checking. Proceedings of the nternational conference on software analysis, evolution, and reengi- neering (2015). [141] Nayrolles, M., Maiga, A., Hamou-lhadj, A. and Larsson, A. A Taxonomy of Bugs : An Empircial Study. 1–10. [142] Nayrolles, M., Moha, N. and Valtchev, P. 2013. Improving SOA Antipatterns Detection in Service Based Systems by Mining Execution Traces. Working conference on reverse engineering (2013), 321–330. [143] Nessa, S., Abedin, M., Wong, W.E., Khan, L. and Qi, Y. 2008. Software Fault Localization Using N -gram Analysis. WASA (2008), 548–559. [144] Newman, M.E.J. and Girvan, M. 2004. Finding and evaluating community structure in networks. Physical Review E. 69, 2 (Feb. 2004), 026113. [145] Nguyen, A.T., Nguyen, T.T., Nguyen, T.N., Lo, D. and Sun, C. 2012. Du- plicate bug report detection with a combination of information retrieval and topic modeling. Proceedings of the international conference on automated software engi- neering (2012), 70. [146] Nguyen, H.A., Nguyen, T.T., Pham, N.H., Al-Kofahi, J. and Nguyen, T.N.

183 2012. Clone Management for Evolving Software. IEEE Transactions on Software Engineering. 38, 5 (Sep. 2012), 1008–1026. [147] Nguyen, T.T., Nguyen, H.A., Pham, N.H., Al-Kofahi, J.M. and Nguyen, T.N. 2009. Clone-Aware Configuration Management. Proceedings of the international conference on automated software engineering (2009), 123–134. [148] Niko, S., Mircea, L. and Romain, R. 2012. On how often code is cloned across repositories. Proceedings of the international conference on software engineer- ing (2012), 1289–1292. [149] O’Sullivan, B. and Bryan 2009. Making sense of revision-control systems. Communications of the ACM. 52, 9 (Sep. 2009), 56. [150] Object Refinery Limited 2005. JFreeChart. [151] Oracle 2011. Throwable (Java Plateform SE6). [152] Ostrand, T., Weyuker, E. and Bell, R. 2005. Predicting the location and number of faults in large software systems. IEEE Transactions on Software Engi- neering. 31, 4 (Apr. 2005), 340–355. [153] Palma, F. 2013. Detection of SOA Antipatterns. Ecole Polytechnique de Montreal. [154] Pan, K., Kim, S. and Whitehead, E.J. 2008. Toward an understanding of bug fix patterns. Empirical Software Engineering. 14, 3 (Aug. 2008), 286–315. [155] Patenaude, J.-F., Merlo, E., Dagenais, M. and Lague, B. 1999. Extending software quality assessment techniques to Java systems. Proceedings of the interna- tional workshop on program comprehension (1999), 49–56. [156] Perry, Dewayne E. and Stieg., C.S. 1993. Software faults in evolving a large, real-time system: a case study. Software engineering—ESEC (1993), 48–67. [157] Pratt, M.J. 2001. Introduction to ISO 10303—the STEP Standard for Prod- uct Data Exchange. Journal of Computing and Information Science in Engineering. 1, 1 (Mar. 2001), 102. [158] Pressman, R.S. 2005. Software Engineering: A Practitioner’s Approach. Palgrave Macmillan. [159] Radatz, J., Geraci, A. and Katki, F. 1990. IEEE standard glossary of soft- ware engineering terminology. IEEE Std. 610121990, 121990 (1990), 3. [160] Rahman, F. and Devanbu, P. 2013. How, and why, process metrics are better. Proceedings of the international conference on software engineering (2013),

184 432–441. [161] Robertson, T.J., Lawrance, J. and Burnett, M. 2006. Impact of high- intensity negotiated-style interruptions on end-user debugging. Journal of Visual Languages and Computing. 17, 2 (2006), 187–202. [162] Robertson, T.J., Prabhakararao, S., Burnett, M., Cook, C., Ruthruff, J.R., Beckwith, L. and Phalgune, A. 2004. Impact of interruption style on end-user de- bugging. Proceedings of the sigchi conference on human factors in computing systems (2004), 287–294. [163] Roehm, T., Nosovic, S. and Bruegge, B. 2015. Automated extraction of fail- ure reproduction steps from user interaction traces. Proceedings of the international conference on software analysis, evolution, and reengineering (2015), 121–130. [164] Rosen, C., Grawi, B. and Shihab, E. 2015. Commit guru: analytics and risk prediction of software commits. Proceedings of the joint meeting on foundations of software engineering (2015), 966–969. [165] Rler, J., Zeller, A., Fraser, G., Zamfir, C. and Candea, G. 2013. Reconstruct- ing Core Dumps. Proceedings of the international conference on software testing, verification and validation (2013). [166] Roy, C. and Cordy, J. 2008. NICAD: Accurate Detection of Near-Miss Inten- tional Clones Using Flexible Pretty-Printing and Code Normalization. Proceedings of the international conference on program comprehension (2008), 172–181. [167] Roy, C.K. 2009. Detection and Analysis of Near-Miss Software Clones. Queen’s University. [168] Runeson, P., Alexandersson, M. and Nyholm, O. 2007. Detection of Dupli- cate Defect Reports Using Natural Language Processing. Proceedings of the interna- tional conference on software engineering (2007), 499–510. [169] Saha, R.K., Khurshid, S. and Perry, D.E. 2014. An empirical study of long lived bugs. Proceedings of the international conference on software maintenance, reengineering, and reverse engineering (2014), 144–153. [170] Saini, V., Sajnani, H., Kim, J. and Lopes, C. 2016. SourcererCC and SourcererCC-I: : Tools to Detect Clones in Batch mode and During Software Devel- opment. Proceedings of the international conference on software engineering (2016), 597–600. [171] Sajnani, H., Saini, V., Svajlenko, J., Roy, C.K. and Lopes, C.V. 2016.

185 SourcererCC: scaling code clone detection to big-code. Proceedings of the interna- tional conference on software engineering (2016), 1157–1168. [172] Seinturier, L., Merle, P., Rouvoy, R., Romero, D., Schiavoni, V. and Ste- fani, J.-B. 2012. A component-based middleware platform for reconfigurable service- oriented architectures. Software: Practice and Experience. 42, 5 (May 2012), 559–583. [173] Shihab, E., Ihara, A., Kamei, Y., Ibrahim, W.M., Ohira, M., Adams, B., Hassan, A.E. and Matsumoto, K.-i. 2010. Predicting Re-opened Bugs: A Case Study on the Eclipse Project. Proceedings of the working conference on reverse engineering (2010), 249–258. [174] Shivaji, S., Member, S., Member, S., Akella, R. and Kim, S. 2013. Reducing Features to Improve Code Change-Based Bug Prediction. IEEE Transactions on Software Engineering. 39, 4 (2013), 552–569. [175] Snyder, B., Bosanac, D. and Davies, R. 2011. ActiveMQ in Action. Manning Publications Co. [176] Steven, J., Chandra, P., Fleck, B. and Podgurski, A. 2000. jRapture: A Capture/Replay Tool for Observation-Based Testing. Proceedings of the international symposium on software testing and analysis. (2000), 158–167. [177] Subramanyam, R. and Krishnan, M. 2003. Empirical analysis of CK met- rics for object-oriented design complexity: implications for software defects. IEEE Transactions on Software Engineering. 29, 4 (Apr. 2003), 297–310. [178] Sun, C., Lo, D., Khoo, S.-C. and Jiang, J. 2011. Towards more accurate retrieval of duplicate bug reports. Proceedings of the international conference on automated software engineering (2011), 253–262. [179] Sunghun Kim, S., Whitehead, E. and Yi Zhang, Y. 2008. Classifying Soft- ware Changes: Clean or Buggy? IEEE Transactions on Software Engineering. 34, 2 (Mar. 2008), 181–196. [180] Suzaki, K. 1987. New manufacturing challenge: Techniques for continuous improvement. Simon; Schuster. [181] Tairas, R., Gray, J. and Baxter, I. 2006. Visualization of clone detection results. Proceedings of the oopsla workshop on eclipse technology eXchange (2006), 50–54. [182] Tamrawi, A., Nguyen, T.T., Al-Kofahi, J. and Nguyen, T.N. 2011. Fuzzy set- based automatic bug triaging. Proceeding of the nternational conference on software

186 engineering (2011), 884. [183] Tao, Y., Kim, J., Kim, S. and Xu, C. 2014. Automatically generated patches as debugging aids: a human study. Proceedings of the international symposium on foundations of software engineering (2014), 64–74. [184] The Apache Software Foundation 2015. Apache BatchEE. [185] The Apache Software Foundation 1999. Log4j 2 Guide - Apache Log4j 2. [186] Tian, Y., Lo, D. and Sun, C. 2012. Information Retrieval Based Nearest Neighbor Classification for Fine-Grained Bug Severity Prediction. Proceedings of the working conference on reverse engineering (2012), 215–224. [187] Tian, Y., Sun, C. and Lo, D. 2012. Improved Duplicate Bug Report Identi- fication. Proceedings of the european conference on software maintenance and reengi- neering (2012), 385–390. [188] Tin Kam Ho 1995. Random decision forests. Proceedings of the international conference on document analysis and recognition (1995), 278–282. [189] Tin Kam Ho 1998. The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence. 20, 8 (1998), 832–844. [190] Toomey, W. 2012. Ctcompare: Code clone detection using hashed token sequences. Proceedings of the international workshop on software clones (2012), 92– 93. [191] Tourani, P. and Adams, B. 2016. The Impact of Human Discussions on Just- in-Time Quality Assurance: An Empirical Study on OpenStack and Eclipse. Proceed- ings of the international conference on software analysis, evolution, and reengineering (2016), 189–200. [192] Tufano, M., Palomba, F., Bavota, G., Oliveto, R., Di Penta, M., De Lucia, A. and Poshyvanyk, D. 2015. When and why your code starts to smell bad. Proceedings of the international conference on software engineering (2015), 403–414. [193] Valdivia Garcia, H. and Shihab, E. 2014. Characterizing and predicting blocking bugs in open source projects. Proceedings of the working conference on mining software repositories (2014), 72–81. [194] Visser, W., Havelund, K., Brat, G., Park, S. and Lerda, F. 2003. Model Checking Programs. Automated software engineering (2003), 203–232. [195] Visser, W., Psreanu, C.S. and Khurshid, S. 2004. Test input generation with

187 java PathFinder. Proceedings of the international symposium on software testing and analysis (2004), 97. [196] Wahler, V., Seipel, D., Wolff, J. and Fischer, G. 2004. Clone detection in source code by frequent itemset techniques. Proceedings of the international workshop on source code analysis and manipulation (2004), 128–135. [197] Wang, X., Baik, E. and Devanbu, P.T. 2011. System compatibility analysis of Eclipse and Netbeans based on bug data. Proceeding of the working conference on mining software repositories (2011), 230. [198] Weiss, C., Premraj, R., Zimmermann, T. and Zeller, A. 2007. How Long Will It Take to Fix This Bug? Proceedings of the international workshop on mining software repositories (May 2007), 1–1. [199] Wei, C., Zimmermann, T. and Zeller, A. 2007. How Long will it Take to Fix This Bug ? Proceedings of the international workshop on mining software repositories (2007), 1. [200] Wellington, B. 2013. Dnsjava. [201] Wettel, R. and Marinescu, R. 2005. Archeology of code duplication: recov- ering duplication chains from small duplication fragments. Proceedings of the seventh international symposium on symbolic and numeric algorithms for scientific computing (2005), 63–71. [202] Whittaker, J.A., Arbon, J. and Carollo, J. 2012. How Google Tests Software. Addison-Wesley. [203] Wu, R., Zhang, H., Kim, S. and Cheung, S. 2011. Relink: recovering links between bugs and changes. Proceedings of the european conference on foundations of software engineering (2011), 15–25. [204] Xuan, J., Jiang, H., Ren, Z. and Zou, W. 2012. Developer prioritization in bug repositories. Proceedings of the international conference on software engineering (2012), 25–35. [205] Yamanaka, Y., Choi, E., Yoshida, N., Inoue, K. and Sano, T. 2012. Industrial application of clone change management system. Proceedings of the international workshop on software clones (2012), 67–71. [206] Yang, X., Lo, D., Xia, X., Zhang, Y. and Sun, J. 2015. Deep Learning for Just-in-Time Defect Prediction. Proceedings of the international conference on

188 software quality, reliability and security (2015), 17–26. [207] Zamfir, C. and Candea, G. 2010. Execution Synthesis: A Technique for Au- tomated Software Debugging. Proceedings of the 5th european conference on computer systems (2010), 321–334. [208] Zeller, A. 1997. Configuration management with version sets: A unified software versioning model and its applications. [209] Zeller, A. 2013. Where Should We Fix This Bug? A Two-Phase Recom- mendation Model. IEEE Transactions on Software Engineering. 39, 11 (Nov. 2013), 1597–1610. [210] Zeller, A. and Hildebrandt, R. 2002. Simplifying and isolating failure- inducing input. IEEE Transactions on Software Engineering. 28, 2 (2002), 183–200. [211] Zhang, F., Khomh, F., Zou, Y. and Hassan, A.E. 2012. An Empirical Study on Factors Impacting Bug Fixing Time. Proceedings of the working conference on reverse engineering (2012), 225–234. [212] Zhang, G., Peng, X., Xing, Z., Shihai Jiang, Hai Wang and Zhao, W. 2013. Towards contextual and on-demand code clone management by continuous monitor- ing. Proceedings of the international conference on automated software engineering (2013), 497–507. [213] Zhang, H., Gong, L. and Versteeg, S. 2013. Predicting bug-fixing time: an empirical study of commercial software projects. Proceedings of the international conference on software engineering (2013), 1042–1051. [214] Zhou, J., Zhang, H. and Lo, D. 2012. Where should the bugs be fixed? More accurate information retrieval-based bug localization based on bug reports. Proceedings of the international conference on software engineering (2012), 14–24. [215] Zibran, M.F. and Roy, C.K. 2012. IDE-based real-time focused search for near-miss clones. Proceedings of the symposium on applied computing (2012), 1235– 1242. [216] Zibran, M.F. and Roy, C.K. 2011. Towards flexible code clone detection, management, and refactoring in IDE. Proceeding of the international workshop on software clones (2011), 75–76. [217] Zimmermann, T. and Nagappan, N. 2008. Predicting defects using net- work analysis on dependency graphs. Proceedings of the international conference on

189 software engineering (2008), 531. [218] Zimmermann, T., Nagappan, N., Guo, P.J. and Murphy, B. 2012. Char- acterizing and predicting which bugs get reopened. Proceedings of the international conference on software engineering (2012), 1074–1083. [219] Zimmermann, T., Premraj, R. and Zeller, A. 2007. Predicting Defects for Eclipse. Proceedings of the international workshop on predictor models in software engineering (2007), 9. [220] Zuddas, D., Jin, W., Pastore, F., Mariani, L. and Orso, A. 2014. MIMIC: locating and understanding bugs by analyzing mimicked executions. Proceedings of the international conference on automated software engineering (2014), 815–826.

Appendices

Lists of the top-level open-source projects

This appendix lists all the top-level open-source projects we analysed for our work.

Parsers

• Mime4j: Mime4J provides a parser, MimeStreamParser, for e-mail message streams in plain RFC 822 and MIME format
• Xerces: XML parsers for C++, Java, and Perl
• Xalan: XSLT processor for transforming XML documents into HTML, text, or other XML document types.
• FOP: Print formatter driven by XSL formatting objects (XSL-FO) and an output independent formatter.
• Droids: Intelligent standalone robot framework that allows one to create and extend existing droids (robots).
• Betwit: XML introspection mechanism for mapping beans to XML

Databases

• Drill: Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud Storage
• Tez: Framework for complex directed-acyclic-graphs of tasks for processing data, built atop Apache Hadoop YARN.
• HBase: Apache HBase is the Hadoop database, a distributed, scalable, big data store.
• Falcon: Falcon is a feed processing and feed management system aimed at making it easier for end consumers to onboard their feed processing and feed management on Hadoop clusters.
• Cassandra: Database with high scalability and high availability without compromising performance
• Hive: Data warehouse software that facilitates reading, writing, and managing large datasets residing in distributed storage using SQL
• : Tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
• Accumulo: Sorted, distributed key/value store that is a robust, scalable, high performance data storage and retrieval system.
• Lucene: Full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
• CouchDB: Store your data with JSON documents. Access your documents and query your indexes with your web browser, via HTTP.
• Phoenix: OLTP and operational analytics in Hadoop for low latency applications
• OpenJPA: Java persistence project that can be used as a stand-alone POJO persistence layer or integrated into any Java EE compliant container
• Gora: Provides an in-memory data model and persistence for big data
• Optiq: Framework that allows efficient translation of queries involving heterogeneous and federated data.
• HCatalog: Table and storage management layer for Hadoop that enables users with different data processing tools
• DdlUtils: Component for working with Database Definition (DDL) files
• Derby: Relational database implemented entirely in Java
• DBCP: Supports interaction with a relational database
• JDO: Object persistence technology

Web and Services

• Wicket: Server-side Java web framework
• ServiceMix: The components project holds a set of JBI (Java Business Integration) components that can be installed in both the ServiceMix 3 and ServiceMix 4 containers.

• Shindig: OpenSocial container that helps you start hosting OpenSocial apps quickly by providing the code to render gadgets, proxy requests, and handle REST and RPC requests.
• Felix: Implementation of the OSGi Framework and Service Platform and other interesting OSGi-related technologies under the Apache License.
• Trinidad: JSF framework including a large, enterprise-quality component library.
• Axis: Web Services / SOAP / WSDL engine.
• Synapse: Lightweight and high-performance Enterprise Service Bus
• Giraph: Iterative graph processing system built for high scalability.
• Tapestry: Component-oriented framework for creating highly scalable web applications in Java.
• JSPWiki: WikiWiki engine, feature-rich and built around standard JEE components (Java, servlets, JSP).
• TomEE: Java EE 6 Web Profile certified application server that extends Apache Tomcat.
• Knox: REST API gateway for interacting with Apache Hadoop clusters.
• Flex: Framework for building expressive web and mobile applications
• Lucy: Search engine library that provides full-text search for dynamic programming languages
• Camel: Define routing and mediation rules in a variety of domain-specific languages, including a Java-based fluent API, Spring or Blueprint XML configuration files, and a Scala DSL.
• Pivot: Builds installable Internet applications (IIAs)
• Celix: Implementation of the OSGi specification adapted to C
• Traffic Server: Fast, scalable and extensible HTTP/1.1-compliant caching proxy server.
• Apache Net: Implements the client side of many basic Internet protocols; the purpose of the library is to provide fundamental protocol access, not higher-level abstractions.
• Sling: Innovative web framework
• Axis: Implementation of the SOAP (“Simple Object Access Protocol”) submission to the W3C.

• Shale: Web application framework, fundamentally based on JavaServer Faces.
• Rave: Web and social mashup engine that aggregates and serves web widgets.
• Tuscany: Simplifies the task of developing SOA solutions by providing a comprehensive infrastructure for SOA development and management based on the Service Component Architecture (SCA) standard.
• Pluto: Implementation of the Java Portlet Specification.
• ODE: Executes business processes written following the WS-BPEL standard
• Muse: Java-based implementation of the WS-ResourceFramework (WSRF), WS-BaseNotification (WSN), and WS-DistributedManagement (WSDM) specifications.
• WS-Commons: Web Services Commons projects
• Geronimo: Server runtime that integrates the best open source projects to create Java/OSGi server runtimes
• River: Network architecture for the construction of distributed systems in the form of modular co-operating services
• Commons FileUpload: Makes it easy to add robust, high-performance file upload capability to your servlets and web applications.
• Beehive: Java application framework designed to simplify the development of Java EE based applications.
• Aries: Java components enabling an enterprise OSGi application programming model.
• Empire-db: Relational database abstraction layer and data persistence component
• Commons Daemon: Java based daemons or services
• Click: JEE web application framework
• Stanbol: Provides a set of reusable components for semantic content management.
• CXF: Open-source services framework
• Sandesha2: Axis2 module that implements the WS-ReliableMessaging specification published by IBM, Microsoft, BEA and TIBCO
• Neethi: Framework for programmers to use WS-Policy
• Rampart: Provides implementations of the WS-Sec* specifications for Apache Axis2.

• AWF: Web server
• Nutch: Web crawler
• HttpAsyncClient: Designed for extension while providing robust support for the base HTTP protocol
• Portals Bridges: Portlet development using common web frameworks like Struts, JSF, PHP, Perl, Velocity and scripts such as Groovy, JRuby, Jython, BeanShell or Rhino JavaScript.
• Stonehenge: Set of example applications for Service Oriented Architecture that spans languages and platforms and demonstrates best practices and interoperability.

Cloud and Big Data

• Whirr: Set of libraries for running cloud services
• Ambari: Aimed at making Hadoop management simpler by developing software for provisioning, managing, and monitoring Apache Hadoop clusters.
• Karaf: Provides dual polymorphic container and application bootstrapping paradigms to the enterprise.
• Hadoop: Software for reliable, scalable, distributed computing.
• Hama: Framework for Big Data analytics which uses the Bulk Synchronous Parallel (BSP) computing model.
• Twill: Abstraction over Apache Hadoop YARN that reduces the complexity of developing distributed applications
• Hadoop MapReduce: Framework for easily writing applications which process vast amounts of data (multi-terabyte data-sets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.
• Tajo: Big data relational and distributed data warehouse system for Apache Hadoop
• Sentry: System for enforcing fine-grained, role-based authorization to data and metadata stored on a Hadoop cluster.
• Oozie: Workflow scheduler system to manage Apache Hadoop jobs.
• Solr: Provides distributed indexing, replication and load-balanced querying, automated failover and recovery, and centralized configuration

• Airavata: Software framework that enables you to compose, manage, execute, and monitor large-scale applications
• JClouds: Multi-cloud toolkit for the Java platform that gives you the freedom to create applications that are portable across clouds while giving you full control to use cloud-specific features.
• Impala: Native analytic database for Apache Hadoop.
• Libcloud: Python library for interacting with many of the popular cloud service providers using a unified API.
• Slider: Deploys existing distributed applications on an Apache Hadoop YARN cluster
• MRUnit: Java library that helps developers unit test Apache Hadoop MapReduce jobs.
• Stratos: Framework that helps run Apache Tomcat, PHP, and MySQL applications and can be extended to support many more environments on all major cloud infrastructures
• Mesos: Abstracts CPU, memory, storage, and other compute resources away from machines
• Helix: Cluster management framework for partitioned and replicated distributed resources
• Argus: Centralized approach to security policy definition and coordinated enforcement
• Deltacloud: API that abstracts differences between clouds
• MRQL: Query processing and optimization system for large-scale, distributed data analysis, built on top of Apache Hadoop, Hama, Spark, and Flink.
• Provisionr: Creates and manages pools of virtual machines on multiple clouds
• Curator: A ZooKeeper Keeper.
• ZooKeeper: Open-source server which enables highly reliable distributed coordination
• Bigtop: Comprehensive packaging, testing, and configuration of the leading open source big data components, aimed at infrastructure engineers and data scientists.
• YARN: Splits up the functionalities of resource management and job scheduling/monitoring into separate daemons.

Messaging and Logging

• ActiveMQ: Messaging queue
• Qpid: Messaging queue
• log4cxx: Logging framework for C++
• log4j: Logging framework for Java
• log4net: Logging framework for .NET
• Flume: Distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.
• Samza: Aims to provide a near-realtime, asynchronous computational framework for stream processing.
• Pig: Platform for analyzing large data sets, consisting of a high-level language for expressing data analysis programs.
• Chukwa: Data collection system for monitoring large distributed systems
• BookKeeper: Replicated log service which can be used to build replicated state machines.
• Apollo: Faster, more reliable, easier-to-maintain messaging broker built from the foundations of the original ActiveMQ.
• S4: Processes continuous unbounded streams of data.

Graphics

• Commons Imaging: Pure-Java image library
• PDFBox: Java tool for working with PDF documents.
• Batik: Java-based toolkit for applications or applets that want to use images in the Scalable Vector Graphics (SVG) format
• XML Graphics Commons: Consists of several reusable components used by Apache Batik and Apache FOP
• UIMA: Frameworks, tools, and annotators facilitating the analysis of unstructured content such as text, audio and video.

Dependency Management and Build Systems

• Tentacles: Downloads all the archives from a staging repo, unpacks them and creates a little report of what is there.

• Ivy: Transitive dependency manager
• Rat: Release audit tool, focused on licenses.
• Ant: Drives processes described in build files as targets and extension points dependent upon each other
• EasyAnt: Improved integration in existing build systems
• IvyDE: Eclipse plugin which integrates Apache Ivy’s dependency management into Eclipse
• NPanday: Maven for .NET
• Maven: Software project management and comprehension tool

Networking

• Mina: 100% pure Java library to support the SSH protocols on both the client and server side.
• James: Delivers a rich set of open source modules and libraries, written in Java, related to Internet mail communication, which build into an advanced enterprise mail server.
• Hupa: Rich IMAP-based webmail application written in GWT (Google Web Toolkit).
• Etch: Cross-platform, language- and transport-independent framework for building and consuming network services
• Commons IO: Library of utilities to assist with developing IO functionality.

File Systems and Repository

• Tika: Detects and extracts metadata and text from over a thousand different file types
• OODT: Apache Object Oriented Data Technology (OODT) is a smart way to integrate and archive your processes, your data, and its metadata.
• Commons Virtual File System: Provides a single API for accessing various different file systems.
• Jackrabbit Oak: Scalable and performant hierarchical content repository
• Directory: Provides directory solutions entirely written in Java.
• SANDBOX: Subversion repository for Commons committers to function as an open workspace for sharing and collaboration.

Misc

• Harmony: Modular Java runtime with class libraries and associated tools.
• Mahout: Machine learning applications.
• OpenCMIS: Collection of Java libraries, frameworks and tools around the CMIS specification.
• Commons: Apache project focused on all aspects of reusable Java components
• Shiro: Java security framework
• Cordova: Mobile apps with HTML, CSS & JS
• XMLBeans: Technology for accessing XML by binding it to Java types
• State Chart XML: Provides a generic state-machine-based execution environment based on Harel State Tables
• Excalibur: Lightweight, embeddable Inversion of Control container
• Commons Transaction: Transactional Java programming
• Velocity: Java-based template engine
• BCEL: Analyze, create, and manipulate binary Java class files
• Abdera: Functionally-complete, high-performance implementation of the IETF Atom Syndication Format
• Commons Collections: Data structures that accelerate development of most significant Java applications.
• Java Caching System: Distributed caching system written in Java
• OGNL: Object-Graph Navigation Language; an expression language for getting and setting properties of Java objects, plus other extras such as list projection and selection and lambda expressions.
• Anything To Triples: Library that extracts structured data in RDF format from a variety of Web documents.
• Axiom: Provides an XML Infoset compliant object model implementation which supports on-demand building of the object tree
• Graft: Debugging and testing tool for programs written for Apache Giraph
• Hivemind: Services and configuration microkernel
• JXPath: Defines a simple interpreter of an expression language called XPath
