UC Riverside UC Riverside Electronic Theses and Dissertations
Title: Extracting Actionable Information From Bug Reports
Permalink: https://escholarship.org/uc/item/4tf0c5x4
Author: Zhou, Bo
Publication Date: 2016
Peer reviewed | Thesis/dissertation

UNIVERSITY OF CALIFORNIA
RIVERSIDE

Extracting Actionable Information From Bug Reports

A Dissertation submitted in partial satisfaction of the requirements for the degree of

Doctor of Philosophy
in
Computer Science
by
Bo Zhou

December 2016

Dissertation Committee:
Dr. Rajiv Gupta, Chairperson
Dr. Iulian Neamtiu
Dr. Zhiyun Qian
Dr. Zhijia Zhao

Copyright by
Bo Zhou
2016

The Dissertation of Bo Zhou is approved:

Committee Chairperson

University of California, Riverside

Acknowledgments

This dissertation would not have been possible without all the people who have helped me during my Ph.D. study and my life.

Foremost, I would like to express my sincere gratitude to my advisor Dr. Iulian Neamtiu for the continuous support of my Ph.D. study and research, and for his patience, motivation, enthusiasm, and immense knowledge. I will never forget that he is the person who went through my papers with me tens of times overnight to ensure their accuracy. His guidance helped me throughout the research and writing of this dissertation. I could not have imagined having a better advisor and mentor for my Ph.D. study. Thanks, Dr. Neamtiu!

I am also deeply indebted to my co-advisor Dr. Rajiv Gupta for the collaboration over the years, and for his guidance and help in making me not only a good researcher, but also a good person. His attitude towards research has influenced me immensely and will continue to do so in the future. Thanks, Dr. Gupta!

Next, I would like to thank the members of my dissertation committee, Dr. Zhiyun Qian and Dr. Zhijia Zhao, for always being supportive. Their constructive comments made this dissertation much better.
On an organizational level, I would like to thank the Department of Computer Science and the University of California, Riverside for providing an excellent environment for an international student far away from home. I would also like to sincerely acknowledge the support of several grants that funded my graduate education: National Science Foundation grants CNS-1064646 and CCF-1149632, and Army Research Laboratory Cooperative Agreement Number W911NF-13-2-0045 (ARL Cyber Security CRA).

I would like to express my gratitude to all my lab-mates: Yan Wang, Changhui Lin, Min Feng, Sai Charan Koduru, Kishore Kumar Pusukuri, Youngjian Hu, Tanzirul Azim, Amlan Kusum, Vineet Singh, Zachary Benavides, Keval Vora and Farzad Khorasani, for helping me in many ways during my graduate study. You are priceless friends in my life.

I also want to thank my dear friends Qu, Lei, Linchao, Jian, Hua, Yingqiao, Lufei, Boxiao, Luping, Linfeng and Zhipeng for helping me and enriching my life in the US, now and in the future.

Finally, I would like to thank my father, my mother and my girlfriend Ruby. Their endless love and encouragement helped me overcome all kinds of difficulties I encountered during my study and my life. I love you forever!

To my parents and my love for all the support.

ABSTRACT OF THE DISSERTATION

Extracting Actionable Information From Bug Reports
by
Bo Zhou
Doctor of Philosophy, Graduate Program in Computer Science
University of California, Riverside, December 2016
Dr. Rajiv Gupta, Chairperson

Finding and fixing bugs is a major but time- and effort-consuming software quality assurance task in the software development process. When a bug is filed, valuable multi-dimensional information is captured by the bug report and stored in the bug tracking system.
However, developers and researchers have so far used only part of this information (e.g., the detailed description of a failure and the occasional hint at the location of the fault in the code), and only for limited purposes such as finding and fixing bugs, detecting duplicate bug reports, or improving bug triaging accuracy. We contend that this information is useful not only for software testing and debugging but also for product understanding, software evolution, and software management. This dissertation makes several advances in extracting actionable information from bug reports using data mining and natural language processing techniques. Both software developers and researchers can benefit from our approach.

We first focus on differences in bugs and bug-fixing processes between desktop and smartphone applications. Specifically, our investigation has two main thrusts: a quantitative analysis to discover similarities and differences between desktop and smartphone bug reports/processes, and a qualitative analysis where we extract topics from bug reports to understand the nature and categories of bugs, as well as differences between platforms.

Next, we present an approach that focuses on understanding the differences between concurrency and non-concurrency bugs and the differences among various concurrency bug classes, and on predicting bug quantity, type, and location from patches, bug reports and bug-fix metrics.

In addition, we found that bugs of different severities have so far been "lumped together" even though their characteristics differ significantly. Moreover, we found that the nature of issues with the same severity (e.g., high severity) differs markedly between desktops and smartphones. To understand these differences, we perform an empirical study on 72 Android and desktop projects. We study how severity changes, quantify the differences between classes in terms of bug-fixing attributes, and analyze how the topics differ across classes on each platform over time.
Finally, we aid bug reproduction and fixing: we propose a novel delta debugging technique that reduces the length of event traces by using a record&replay scheme. While the event sequence is captured during the execution of the application, an event dependency graph (EDG) is generated. We then use the EDG to guide the delta debugging algorithm by eliminating irrelevant events. Filtering out the events that are irrelevant to the crash significantly improves the debugging process.

Contents

List of Figures
List of Tables

1 Introduction
  1.1 Motivation
  1.2 Dissertation Overview
    1.2.1 Cross-platform Analysis
    1.2.2 Empirical Study on Concurrency Bugs
    1.2.3 Bug Analysis on Severity Classes
    1.2.4 Delta Debugging on Android
  1.3 Thesis Organization

2 Framework Overview
  2.1 Applications
  2.2 Collecting Data From Bug Reports
  2.3 Quantitative Analysis
  2.4 Topic Modeling

3 A Cross-platform Analysis of Bugs
  3.1 Methodology
    3.1.1 Examined Projects
    3.1.2 Quantitative Analysis
    3.1.3 Qualitative Analysis
  3.2 Quantitative Analysis
    3.2.1 Bug-fix Process Attributes
    3.2.2 Management of Bug-fixing
    3.2.3 Bug Fix Rate Comparison
    3.2.4 Case Study: Cross-platform Projects
  3.3 Qualitative Analysis
    3.3.1 Topic Extraction
    3.3.2 Bug Nature and Evolution
    3.3.3 Case Study: WordPress
    3.3.4 Smartphone-specific Bugs
  3.4 Actionable Findings
    3.4.1 Addressing Android's Concurrency Issues
    3.4.2 Improving Android's Bug Trackers
    3.4.3 Improving the Bug-fixing Process on All Platforms
    3.4.4 Challenges for Projects Migrating to GitHub
  3.5 Threats to Validity
    3.5.1 Selection Bias
    3.5.2 Data Processing
    3.5.3 IDs vs. Individuals
  3.6 Summary

4 Empirical Study of Concurrency Bugs
  4.1 Concurrency Bug Types
  4.2 Methodology
    4.2.1 Projects Examined
    4.2.2 Identifying Concurrency Bugs
    4.2.3 Collecting Bug-fix Process Data
  4.3 Quantitative Analysis of Bug-fixing Features
    4.3.1 Feature Distributions
    4.3.2 Differences Among Concurrency Bugs
    4.3.3 Discussion
  4.4 Predicting the Number of Concurrency Bugs
    4.4.1 Generalized Linear Regression
    4.4.2 Time Series-based Prediction
  4.5 Predicting the Type of Concurrency Bugs
    4.5.1 Approach
    4.5.2 Results
  4.6 Predicting Concurrency Bugs' Location
    4.6.1 Approach
    4.6.2 Results
  4.7 Threats to Validity
    4.7.1 Selection Bias
    4.7.2 Data Processing
    4.7.3 Unfixed and Unreported Bugs
    4.7.4 Short Histories
    4.7.5 Bug Classification
  4.8 Summary

5 Bug Analysis Across Severity Classes
  5.1 Methodology
    5.1.1 Examined Projects
    5.1.2 Severity Classes
    5.1.3 Quantitative Analysis
    5.1.4 Topic Analysis
  5.2 Quantitative Analysis
    5.2.1 Severity Change
    5.2.2 Bug-fix Process Attributes
    5.2.3 Management of Bug-fixing
  5.3 Topic Analysis
    5.3.1 Topic Extraction
    5.3.2 Bug Nature and Evolution
  5.4 Threats to Validity
    5.4.1 Selection Bias
    5.4.2 Severity Distribution on Android
    5.4.3 Priority on Google Code and JIRA
    5.4.4 Data Processing
  5.5 Summary

6 Minimizing Bug Reproduction Steps on Android
  6.1 Background
  6.2 Problem Overview
  6.3 Approach
    6.3.1 Creating Event Traces
    6.3.2 Generating the Event Dependency Graph