Leveraging Machine Learning to Improve Reliability

by

Song Wang

A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Doctor of Philosophy in Electrical and Computer Engineering

Waterloo, Ontario, Canada, 2018

© Song Wang 2018

Examining Committee Membership

The following served on the Examining Committee for this thesis. The decision of the Examining Committee is by majority vote.

External Examiner: Dr. Thomas Zimmermann Senior Researcher Microsoft Research

Supervisor(s): Dr. Lin Tan Associate Professor, Canada Research Chair Electrical and Computer Engineering University of Waterloo

Internal Members: Dr. Sebastian Fischmeister Associate Professor Electrical and Computer Engineering University of Waterloo

Dr. Mark Crowley Assistant Professor Electrical and Computer Engineering University of Waterloo

Internal-External Member: Dr. Joanne Atlee Professor, Director of Women in Computer Science David R. Cheriton School of Computer Science University of Waterloo

I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners.

I understand that my thesis may be made electronically available to the public.

Abstract

Finding software faults is a critical task during the lifecycle of a software system. Traditional software quality control practices such as statistical defect prediction, static bug detection, regression testing, and code review are often inefficient and time-consuming, and cannot keep up with the increasing complexity of modern software systems. We argue that machine learning, with its capability in knowledge representation, learning, natural language processing, classification, etc., can be used to extract invaluable information from software artifacts that may be difficult to obtain with other research methodologies, and thereby improve existing software reliability practices such as statistical defect prediction, static bug detection, regression testing, and code review.

This thesis presents a suite of novel machine learning based techniques that improve existing software reliability practices and help developers find software bugs more effectively and efficiently.

First, it introduces a deep learning based defect prediction technique to improve existing statistical defect prediction models. To build accurate prediction models, previous studies focused on manually designing features that encode the statistical characteristics of programs. However, these features often fail to capture the semantic differences between programs, and such a capability is needed for building accurate prediction models. To bridge the gap between programs' semantics and defect prediction features, this thesis leverages deep learning techniques to learn a semantic representation of programs automatically from source code, and further builds and trains defect prediction models using these semantic features. We examine the effectiveness of the deep learning based prediction models on both open-source and commercial projects. Results show that the learned semantic features can significantly outperform existing defect prediction models.

Second, it introduces an n-gram language model based static bug detection technique, i.e., Bugram, to detect new types of bugs with fewer false positives. Most existing static bug detection techniques are based on programming rules inferred from source code. It is known that if a pattern does not appear frequently enough, rules are not learned, and many bugs are missed. To solve this issue, this thesis proposes Bugram, which leverages n-gram language models instead of rules to detect bugs. Specifically, Bugram models program tokens sequentially, using the n-gram language model. Token sequences from the program are then assessed according to their probability in the learned model, and low probability sequences are marked as potential bugs. The assumption is that low probability token sequences in a program are unusual, which may indicate bugs, bad practices, or unusual/special uses of code of which developers may want to be aware. We examine the effectiveness of our approach on the latest versions of 16 open-source projects. Results show that Bugram detected 25 new bugs, 23 of which cannot be detected by existing rule-based bug detection approaches, which suggests that Bugram is complementary to existing bug detection approaches: it detects additional bugs and generates fewer false positives.

Third, it introduces a machine learning based regression test prioritization technique, i.e., QTEP, to find and run test cases that could reveal bugs earlier. Existing test case prioritization techniques mainly focus on maximizing coverage information between source code and test cases to schedule test cases for finding bugs earlier, but they often do not consider the likely distribution of faults in the source code. However, software faults are not evenly distributed in source code, e.g., around 80% of faults are located in about 20% of the source code. Intuitively, test cases that cover the faulty source code should have higher priorities, since they are more likely to find faults. To solve this issue, this thesis proposes QTEP, which leverages machine learning models to evaluate source code quality and then adapts existing test case prioritization algorithms by considering the weighted source code quality. Evaluation on seven open-source projects shows that QTEP can significantly outperform existing test case prioritization techniques in finding failed test cases early.

Finally, it introduces a machine learning based approach to identifying risky code review requests. Code review has been widely adopted in the development process of both proprietary and open-source software; it helps improve the maintenance and quality of software before code changes are merged into the source code repository. Our observation of code review requests from four large-scale projects reveals that around 20% of changes cannot pass the first round of code review and require non-trivial revision effort (i.e., risky changes). In addition, resolving these risky changes requires 3X more time and 1.6X more reviewers than regular changes (i.e., changes that pass the first round of code review) on average. This thesis presents the first study to characterize these risky changes and to automatically identify them with machine learning classifiers. Evaluation on one proprietary project and three large-scale open-source projects (i.e., Qt, Android, and OpenStack) shows that our approach is effective in identifying risky code review requests.

Taken together, the results of the four studies provide evidence that machine learning can help improve traditional software reliability practices such as statistical defect prediction, static bug detection, regression testing, and code review.

Acknowledgments

First of all, I would like to express my deep gratitude to my supervisor, Dr. Lin Tan: thanks for accepting me as your student, believing in me, encouraging me to pursue the research topics that I love, being always there when I need help, and helping me grow with your continuous support and mentorship. Your guidance will continue to influence me beyond my Ph.D. education. I could not have wished for a better supervisor for my Ph.D. studies!

I would like to thank my examination committee members, Dr. Joanne Atlee, Dr. Thomas Zimmermann, Dr. Sebastian Fischmeister, and Dr. Mark Crowley, for providing useful comments that helped me improve my thesis.

I would also like to thank Dr. Nachi Nagappan, Dr. Thomas Zimmermann, Dr. Harald Gall, Dr. Christian Bird, Zhen Gao, Chetan Bansal, Vincent J Hellendoorn, and Justin Smith, from whom I learned a lot and with whom I had an amazing and wonderful internship at Microsoft Research in Summer 2018.

Thanks to all the people in our research group who made my life at Waterloo fantastically joyful: Dr. Jaechang Nam, Dr. Jinqiu Yang, Edmund Wong, Thibaud Lutellier, Michael Chong, Taiyue Liu, Ming Tan, Yuefei Liu, Xinye Tang, Alexey Zhikhartsev, Devin Chollak, Yuan Xi, Moshi Wei, Tian Jiang, Lei Zhang, Hung Pham, Weizhen Qi, Yitong Li, and Tej Toor. Thank you very much for all the friendship, support, and joy.

I want to thank all my other collaborators during my Ph.D. studies: Dr. Junjie Wang, Dr. Dana Movshovitz-Attias, Dr. Qiang Cui, Dr. Ke Mao, Dr. Tim Menzies, Mingyang Li, Adithya Abraham Philip, Dr. Mingshu Li, and Yuanzhe Hu. I really enjoyed working with them, and I have learnt a lot from them.

I would like to acknowledge my Master's supervisors, Dr. Qing Wang, Dr. Ye Yang, and Dr. Wen Zhang, who introduced me to the software engineering area, taught me many research skills, helped me publish my first research papers, and gave me their strong support during my Ph.D. application process. Without them, I could hardly imagine eventually becoming a Ph.D. in the software engineering area.

Finally, I would like to thank my family, including my parents, parents-in-law, my younger brother, my two sisters and brothers-in-law, and my four lovely nephews, for their support and unconditional love. And lastly, my beloved wife, Qi: thank you very much for your support, understanding, care, and love. I dedicate this thesis to them.

Dedicated to my family.

Table of Contents

List of Tables

List of Figures

1 Introduction
  1.1 Organization
  1.2 Thesis Scope
  1.3 Related Publications

2 Background and Related Work
  2.1 Software Defect Prediction
    2.1.1 File-level Defect Prediction
    2.1.2 Change-level Defect Prediction
    2.1.3 Deep Learning and Semantic Feature Generation in Software Engineering
  2.2 Static Bug Detection
    2.2.1 Static Bug Detection Tools
    2.2.2 Statistical Language Models
  2.3 Regression Testing Techniques
  2.4 Code Review
  2.5 Other Applications of Machine Learning for Improving Software Reliability

3 Leveraging Deep Learning to Improve Defect Prediction
  3.1 Motivation
  3.2 Background
    3.2.1 Deep Belief Network
  3.3 Approach
    3.3.1 Parsing Source Code
    3.3.2 Handling Noise and Mapping Tokens
    3.3.3 Training the DBN and Generating Features
    3.3.4 Building Models and Performing Defect Prediction
  3.4 Experimental Study
    3.4.1 Research Questions
    3.4.2 Evaluation Metrics
    3.4.3 Evaluated Projects and Data Sets
    3.4.4 Baselines of Traditional Features
    3.4.5 Parameter Settings for Training a DBN
    3.4.6 File-level Within-Project Defect Prediction
    3.4.7 File-level Cross-Project Defect Prediction
    3.4.8 Change-level Within-Project Defect Prediction
    3.4.9 Change-level Cross-Project Defect Prediction
  3.5 Results and Analysis
    3.5.1 RQ1: Performance of semantic features for file-level within-project defect prediction
    3.5.2 RQ2: Performance of semantic features for file-level cross-project defect prediction
    3.5.3 RQ3: Performance of semantic features for change-level within-project defect prediction
    3.5.4 RQ4: Performance of semantic features for change-level cross-project defect prediction
    3.5.5 Time and Memory Overhead
  3.6 Discussion
    3.6.1 Issues of Using Cross-Validation to Evaluate Change-level Defect Prediction
    3.6.2 Why Do DBN-based Semantic Features Work?
    3.6.3 Efficiency of Different Types of Tokens in Change-level Defect Prediction
    3.6.4 Analysis of the Performance
    3.6.5 Performance on Open-source Commercial Projects
  3.7 Threats to Validity
  3.8 Summary

4 Leveraging N-gram Language Models to Improve Rule-based Static Bug Detection
  4.1 Motivation
    4.1.1 A Motivating Example
  4.2 Background
  4.3 Approach
    4.3.1 Tokenization
    4.3.2 N-gram Model Building
    4.3.3 Bug Detection
  4.4 Experimental Study
    4.4.1 Evaluated Software
    4.4.2 Parameter Setting and Sensitivity
    4.4.3 Comparison with Existing Techniques
    4.4.4 Evaluation Measures
    4.4.5 Manual Examination of Reported Results
  4.5 Results and Analysis
    4.5.1 Comparison with FSM and FIM
    4.5.2 Comparison with JADET, Tikanga, and GrouMiner
    4.5.3 Examples
    4.5.4 Time and Space
  4.6 Threats to Validity
  4.7 Summary

5 Leveraging Machine Learning Classifiers to Improve Regression Testing
  5.1 Motivation
  5.2 Approach
    5.2.1 Fault-prone Code Inspection with Defect Prediction Model
    5.2.2 Weighting Source Code Units
    5.2.3 Quality-Aware Test Case Prioritization
  5.3 Experimental Study
    5.3.1 Research Questions
    5.3.2 Supporting Tools
    5.3.3 Subject Systems, Test Cases, and Faults
    5.3.4 Evaluation Measure
    5.3.5 Experimental Scenarios
    5.3.6 Parameter Setting
  5.4 Results and Analysis
    5.4.1 RQ1 & RQ2: Performance of QTEP for Regression Test Cases and for All Test Cases
    5.4.2 RQ3: Comparison of bug finding times of QTEP and existing TCPs
    5.4.3 Time and Memory Overhead
  5.5 Threats to Validity
  5.6 Summary

6 Leveraging Machine Learning Classifiers to Automate the Identification of Risky Code Review Requests
  6.1 Motivation and Background
  6.2 Approach
    6.2.1 Labeling Risky Changes
    6.2.2 Intent Analysis
    6.2.3 Feature Extraction
    6.2.4 Building Models and Predicting Risky Changes
  6.3 Experimental Study
    6.3.1 Research Questions
    6.3.2 Experiment Data
    6.3.3 Evaluation Measures
  6.4 Results and Analysis
    6.4.1 RQ1: Distribution of Changes Regarding Change Intents
    6.4.2 RQ2: Feasibility of Predicting Risky Changes
    6.4.3 RQ3: Specific Models vs. General Models
    6.4.4 RQ4: Single Intent vs. Multiple Intents
  6.5 Threats to Validity
  6.6 Summary

7 Conclusion and Future Work
  7.1 Conclusion
  7.2 Future Work

Bibliography

List of Tables

2.1 Defect prediction tasks.

3.1 Three types of tokens extracted from the example change shown in Figure 3.5.
3.2 Research questions investigated for defect prediction.
3.3 Cliff's Delta and the effectiveness level [49].
3.4 Evaluated projects for file-level defect prediction. BR is the average buggy rate.
3.5 Evaluated projects for change-level defect prediction. Lang is the programming language used for the project. LOC is the number of lines of code. First Date is the date of the first commit of a project, while Last Date is the date of the latest commit. Changes is the number of changes. TrSize is the average size of training data on all runs. TSize is the average size of test data on all runs. NR is the number of runs for each subject. Ch is the number of changes. BR is the average buggy rate.
3.6 Benchmark metrics used for file-level defect prediction.
3.7 The comparison of F1 scores among change-level defect prediction with different DBN-based features generated by the seven different types of tokens. The F1 scores are measured as a percentage. The best F1 values are highlighted in bold.
3.8 Comparison between semantic features and two baselines of traditional features (PROMISE features and AST features) using ADTree. Tr denotes the training set version and T denotes the test set version. P, R, and F1 denote the precision, recall, and F1 score respectively and are measured as a percentage. The better F1 values with statistical significance (p-value < 0.05) among the three sets of features are shown with an asterisk (*). The numbers in parentheses are the effect sizes comparative to the Semantic. A positive value indicates that the semantic features improve the baseline features in terms of the effect size.
3.9 PofB20 scores of DBN-based features and traditional features for WPDP. The PofB20 scores are measured as a percentage. The best values are in bold. The better PofB20 values with statistical significance (p-value < 0.05) between the two sets of features are indicated with an asterisk (*). The numbers in parentheses are the effect sizes comparative to the Semantic.
3.10 Comparison of F1 scores between semantic features and PROMISE features using Naive Bayes and Logistic Regression. Tr denotes the training set version and T denotes the test set version. The F1 scores are measured as a percentage.
3.11 F1 scores of the file-level cross-project defect prediction for target projects explored in RQ1. The F1 scores are measured as a percentage. Better F1 values with statistical significance (p-value < 0.05) between DBN-CP and TCA+ are indicated with an asterisk (*). The numbers in parentheses are the effect sizes comparative to DBN-CP.
3.12 F1 scores of file-level cross-project defect prediction for all projects listed in Table 3.4. The F1 scores are measured as a percentage. Better F1 values with statistical significance (p-value < 0.05) between DBN-CP, TCA+, and Baseline are shown with an asterisk (*). Numbers in parentheses are effect sizes comparative to DBN-CP.
3.13 PofB20 scores of DBN-based features and traditional features for CPDP. PofB20 scores are measured as a percentage. Better PofB20 values with statistical significance (p-value < 0.05) among DBN-CP, TCA+, and Baseline are shown with an asterisk (*). Numbers in parentheses are effect sizes comparative to DBN-CP.
3.14 Overall results of the change-level within-project defect prediction. All values are measured as a percentage. The better F1 values with statistical significance (p-value < 0.05) between the two sets of features are indicated with an asterisk (*). The numbers in parentheses are the effect sizes comparative to the Semantic Features in terms of F1.
3.15 PofB20 scores of the DBN-based features and the traditional features for WCDP. The PofB20 scores are measured as a percentage. The best values are in bold. The better PofB20 values with statistical significance (p-value < 0.05) among these two sets of features are indicated with an asterisk (*). The numbers in parentheses are the effect sizes comparative to the DBN-based semantic features.
3.16 F1 scores of change-level cross-project defect prediction for all projects. The F1 scores are measured as percentages. The better F1 values with statistical significance (p-value < 0.05) between DBN-CCP, TCA+, and Baseline are indicated with an asterisk (*). The numbers in parentheses are the effect sizes comparative to DBN-CCP.
3.17 PofB20 scores of the DBN-based features and traditional features for CCDP. The PofB20 scores are measured as a percentage. The best values are in bold. The better PofB20 values with statistical significance (p-value < 0.05) among the DBN-CCP, TCA+, and Baseline are indicated with an asterisk (*). The numbers in parentheses are the effect sizes comparative to DBN-CCP.
3.18 Time and space costs of generating semantic features for file-level defect prediction (s: second).
3.19 Information gain of different types of tokens and their combinations that are used for generating DBN-based semantic features in change-level defect prediction. Spearman is the Spearman correlation value between the information gain and the prediction results of different types of tokens on a project.
3.20 The four open-source commercial projects evaluated. Lang is the programming language used for the project. LOC is the number of lines of code. First Date is the date of the first commit of a project, while Last Date is the date of the latest commit. Changes is the number of changes. TrSize is the average size of the training data on all runs. TSize is the average size of the test data on all runs. Rate is the average buggy rate for each project. NR is the number of runs for each subject.
3.21 Results of WCDP on the four projects. All values are measured as a percentage. Better F1 values with statistical significance (p-value < 0.05) between the two sets of features are shown with an asterisk (*). Numbers in parentheses are effect sizes comparative to Semantic Features in terms of F1.
3.22 F1 scores of change-level cross-project defect prediction for the four projects. Better F1 values with statistical significance (p-value < 0.05) between DBN-CCP, TCA+, and Baseline are shown with an asterisk (*). Numbers in parentheses are effect sizes comparative to DBN-CCP.
3.23 PofB20 scores of DBN-based features and traditional features for WCDP on the four projects. PofB20 scores are measured as a percentage. Better PofB20 values with statistical significance (p-value < 0.05) among these two sets of features are shown with an asterisk (*). Numbers in parentheses are effect sizes comparative to DBN-based semantic features.
3.24 PofB20 scores of DBN-based features and traditional features for CCDP on the four projects. Better PofB20 values with statistical significance (p-value < 0.05) among DBN-CCP, TCA+, and Baseline are shown with an asterisk (*). Numbers in parentheses are effect sizes comparative to DBN-CCP.

4.1 Projects used to evaluate Bugram.
4.2 Detected true bugs in the bottom 50 token sequences with different sequence lengths.
4.3 Bug detection results. Reported is the number of reported bugs, TBbugs is the number of true bugs, and Refs is the number of refactoring opportunities. We manually inspect all reported bugs except for FIM, whose 'Inspected' column shows the number of bugs inspected. Numbers in brackets are the numbers of true bugs detected by Bugram that are detected by neither FIM nor FSM.
4.4 Comparison with JADET, Tikanga, and GrouMiner. 'Fixed' denotes the number of true bugs detected by Bugram that have already been fixed in later versions. * denotes the number of unique true bugs detected by Bugram that the tools in comparison failed to detect.
4.5 Time cost of Bugram (in seconds).

5.1 Experimental subject programs. VPair denotes a version pair. RTC, RTM, and RF are the number of regression test classes, regression test methods, and regression faults respectively. NTC, NTM, and NF are the number of new test classes, new test methods, and mutation faults for new test cases respectively.
5.2 The experimental scenarios for TCPs.
5.3 The experimental independent variables of QTEP.
5.4 Comparison between the static-coverage-based variants of QTEP and the corresponding coverage-based TCPs for each project.
5.5 Time saved by QTEP. Numbers in brackets are the percentages of bug-finding time saved by QTEP compared to the best traditional TCP.
5.6 The average time cost of QTEP on each subject.

6.1 Projects used in this study. Lang is the program language of each subject. #CR is the number of code changes. Rate is the rate of risky changes, which is measured as a percentage.
6.2 Heuristics for categorizing changes.
6.3 Taxonomy of code changes. #Ch is the number of code changes. Perc is the percentage of changes with a specific change intent among all the changes. Rate is the rate of risky changes, which is measured as a percentage. Single contains changes that have only one intent. Multiple contains changes that have more than one intent.
6.4 Comparison of different classifiers on predicting risky code changes. The best F1 scores and AUC values are highlighted in bold.
6.5 Performance of predicting risky code changes with single and multiple change intents. Better F1 scores or AUC values are highlighted in bold.
6.6 Comparison between machine learning classifiers built on changes from specific change intents and machine learning classifiers built on all changes (i.e., general). Numbers in parentheses are the differences between the specific models and the general models. Better F1 scores or AUC values that are larger than those of the general models are highlighted in bold. Note that we exclude the specific model for 'Merge' on the project OpenStack, since it only has 31 instances, which is not enough for training a machine learning classifier.

List of Figures

2.1 Defect Prediction Process.

3.1 A motivating example from Lucene.
3.2 Deep belief network architecture and input instances of File1.java and File2.java. Although the token sets of these two files are identical, the different structural and contextual information between tokens enables the DBN to generate different features to distinguish these two files.
3.3 The distribution of DBN-based features of the example code snippets shown in Figure 3.1.
3.4 Overview of our DBN-based approach to generating semantic features for file-level and change-level defect prediction.
3.5 A change example from Lucene.
3.6 Change-level data collection process [256].
3.7 File-level defect prediction performance with different parameters.
3.8 Average error rates and time costs for different numbers of iterations for tuning file-level defect prediction.
3.9 Results of DBN-CP, TCA+, and Baseline for CPDP.
3.10 Results of DBN-CCP, TCA+, and Baseline for CCDP.
3.11 An example to illustrate the issues of using cross-validation to evaluate change-level defect prediction. Circles are clean changes and dots are buggy.

4.1 Bugs detected by Bugram versus bugs detected by rule-based techniques in the latest version of Hadoop.
4.2 A motivating example from the latest version 0.15.0 of the project Pig. Bugram automatically detected a real bug in (a), which has been confirmed and fixed by Pig developers after we reported it.
4.3 Overview of Bugram.
4.4 The filtering based on Minimum Token Occurrence can help Bugram avoid reporting this false bug from the latest version of Lucene.
4.5 Impact of the gram size on the number of true bugs detected.
4.6 Probability distribution of all token sequences in Hadoop, Solr, and Pig.
4.7 Detection precision and number of detected bugs in the overlaps of bottom s token sequences with low probability.
4.8 True bug examples from version 5.2 of ProGuard (a) and version 2.3.1 of Nutch (b and c).
4.9 A refactoring bug example from version 0.15.0 of Pig.

5.1 Overview of the proposed QTEP. Vn and Vn+1 are two versions. C1 to Cm are consecutive commits between these two versions that introduce changes to the source code and test cases.
5.2 The distribution of the best weight c (wc) and weight m (wm) for the variants of QTEP that are adapted from static-coverage-based TCPs (a) and dynamic-coverage-based TCPs (b).
5.3 Results of static-coverage-based variants of QTEP and static-coverage-based TCPs, i.e., random, static-coverage-based TCPs, and static-coverage-based variants of QTEP.
5.4 Results of dynamic-coverage-based TCPs, i.e., random, dynamic-coverage-based TCPs, and dynamic-coverage-based variants of QTEP.

6.1 Average time cost for reviewing regular and risky changes.
6.2 Average number of reviewers involved for reviewing changes.
6.3 The overview of our risky change prediction approach.
6.4 Comparison of prediction performance between the specific models and the general models. The model with a prefix "G" means using the trained general model to predict changes with a specific intent, and "b" represents 'Bug Fix', "re" represents 'Resource', "f" represents 'Feature', "t" represents 'Test', "r" represents 'Refactor', "m" represents 'Merge', "d" represents 'Deprecate', and "o" represents 'Others'. For example, G-b means using the trained general model to predict changes with the 'Bug Fix' intent. S-b is the specific model trained and evaluated on changes with the 'Bug Fix' intent.

Chapter 1

Introduction

Today, software is integrated into every part of our society, which makes building reliable software an increasingly critical challenge for software developers. As a consequence, the impact and cost of software bugs have increased dramatically; for example, according to a report by Tricentis, software bugs cost the worldwide economy around 1.7 trillion dollars in 2017 [1]. Thus, techniques that help developers detect software bugs and improve software reliability are highly needed.

To build reliable software, many software reliability practices have emerged over the years to help developers improve software reliability by detecting bugs: statistical defect prediction, which leverages software metrics to build classifiers and predict unknown defects in the source code [82, 104, 138, 183, 189, 220, 326]; static bug detection, which uses static code analysis to infer heuristic patterns and detect violations as software faults [4, 5, 10, 30, 41, 70, 144, 207, 278]; regression testing, which aims to find broken functionality earlier by scheduling and running test cases [102, 166, 233–235, 258, 266, 319]; and code review, which aims to reveal potential quality issues and improve the maintenance and quality of software before code changes are merged into the source code repository [62, 105, 162, 182, 228, 229, 264, 267]. Although these techniques have been widely adopted in industry and proven to help detect bugs [2, 3, 33, 34, 127, 167], most state-of-the-art techniques do not perform well in terms of effectiveness (i.e., they have low accuracy) or efficiency (i.e., they are time-consuming), and cannot keep up with the increasing complexity of modern software systems.

In this thesis, we argue that machine learning, with its capability in knowledge representation, learning, natural language processing, classification, etc., can enable software engineering researchers to extract invaluable information from software artifacts generated during software development, including code revision history, issue reports, historical bugs and patches, software documentation, source code, etc., to improve existing software reliability practices. This thesis presents four examples of how machine learning techniques can be used to mine invaluable information about the development of software systems, in aid of improved software reliability practices. To be precise, the thesis statement of this dissertation is as follows:

Thesis Statement. Machine learning, with its capability in knowledge representation, learning, natural language processing, classification, etc., can help improve traditional software reliability practices such as statistical defect prediction, static bug detection, regression testing, and code review.

We provide evidence for this thesis by presenting four novel machine learning based techniques that improve existing software reliability practices and help developers find bugs more effectively and efficiently.

First, it introduces a deep learning based defect prediction technique to improve existing statistical defect prediction models. To build accurate prediction models, previous studies focused on manually designing features that encode the characteristics of programs. However, these features often fail to capture the semantic differences between programs, and such a capability is needed for building accurate prediction models. To bridge the gap between programs' semantics and defect prediction features, this work leverages deep learning techniques to learn a semantic representation of programs automatically from source code and further builds and trains defect prediction models based on these semantic features. We examine the effectiveness of the deep learning based defect prediction approaches on both open-source and commercial projects. Results show that the learned semantic features can significantly outperform existing defect prediction models. Chapter 3 of the thesis describes this work in detail.

Second, it introduces an n-gram language model based static bug detection technique, i.e., Bugram, to detect new types of bugs with fewer false positives. Most existing static bug detection techniques are based on programming rules inferred from source code. It is known that if a pattern does not appear frequently enough, rules are not learned, and many bugs are missed. To solve this issue, this thesis proposes Bugram, which leverages n-gram language models instead of rules to detect bugs. Specifically, Bugram models program tokens sequentially, using the n-gram language model. Token sequences from the program are then assessed according to their probability in the learned model, and low probability sequences are marked as potential bugs. The assumption is that low probability token sequences in a program are unusual, which may indicate bugs, bad practices, or unusual/special uses of code of which developers may want to be aware. We examine the effectiveness of our approach on the latest versions of 16 open-source projects. Results show that Bugram detected 25 new bugs, 23 of which cannot be detected by existing rule-based bug detection approaches, which suggests that Bugram is complementary to existing bug detection approaches: it detects additional bugs and generates fewer false positives. Chapter 4 of the thesis describes this work in detail.

Third, it introduces a machine learning based regression test prioritization technique, i.e., QTEP, to find and run test cases that could reveal bugs earlier. Existing test case prioritization techniques mainly focus on maximizing coverage information between the source code and test cases to schedule test cases for finding bugs earlier, but they often do not consider the likely distribution of faults in the source code. However, software faults are not evenly distributed in source code, e.g., around 80% of faults are located in about 20% of the source code. Intuitively, test cases that cover the faulty source code should have higher priorities, since they are more likely to find faults. To solve this issue, this thesis proposes QTEP, which leverages machine learning models to evaluate source code quality and then adapts existing test case prioritization algorithms by considering the weighted source code quality. Evaluation on seven open-source projects shows that QTEP can significantly outperform existing test case prioritization techniques in finding failed test cases early. Chapter 5 of the thesis describes this work in detail.

Finally, it introduces a machine learning based approach to identifying risky code review requests. Code review has been widely adopted in the development process of both proprietary and open-source software; it helps reveal bugs and improve the maintenance and quality of software before code changes are merged into the source code repository. Our observation of code review requests from four large-scale projects reveals that around 20% of changes cannot pass the first round of code review and require non-trivial revision effort (i.e., risky changes). In addition, resolving these risky changes requires 3X more time and 1.6X more reviewers than regular changes (i.e., changes that pass the first round of code review) on average. This thesis presents the first study to characterize these risky changes and to automatically identify them with machine learning classifiers. Evaluation on one proprietary project and three large-scale open-source projects (i.e., Qt, Android, and OpenStack) shows that our approach is effective in identifying risky code review requests. Chapter 6 of the thesis describes this work in detail.

Taken together, the results of the four studies provide evidence that machine learning can help improve software reliability practices such as statistical defect prediction, static bug detection, regression testing, and code review.
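To make the ranking idea behind Bugram concrete, the following is a minimal sketch, not the actual Bugram implementation described in Chapter 4: it trains an add-one-smoothed bigram model over token sequences (here, hypothetical method-call sequences of equal length) and reports the lowest-probability sequences for inspection. Bugram itself builds higher-order models over richer token streams and calibrates how many low-probability sequences it reports.

```python
from collections import defaultdict

def bigram_counts(sequences):
    """Count bigram occurrences and bigram-prefix occurrences over token sequences."""
    uni, bi = defaultdict(int), defaultdict(int)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            uni[a] += 1
            bi[(a, b)] += 1
    return uni, bi

def sequence_probability(seq, uni, bi, vocab_size, k=1.0):
    """Probability of a token sequence under an add-k smoothed bigram model."""
    prob = 1.0
    for a, b in zip(seq, seq[1:]):
        prob *= (bi[(a, b)] + k) / (uni[a] + k * vocab_size)
    return prob

def rank_suspicious(sequences, bottom=2):
    """Return the lowest-probability sequences, mirroring Bugram's ranking step."""
    uni, bi = bigram_counts(sequences)
    scored = [(sequence_probability(s, uni, bi, len(uni)), s) for s in sequences]
    return sorted(scored)[:bottom]

# Toy, same-length method-call sequences (hypothetical call names).
seqs = (
    [["acquireLock", "read", "releaseLock"]] * 3
    + [["acquireLock", "write", "releaseLock"]] * 3
    + [["acquireLock", "releaseLock", "read"]]   # unusual ordering: should rank lowest
)
for p, s in rank_suspicious(seqs):
    print(f"{p:.4f}  {' '.join(s)}")
```

Comparing only same-length sequences keeps the probabilities comparable; the unusual ordering in the last sequence receives the lowest probability and is therefore the first candidate reported.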

1.1 Organization

The rest of this thesis is organized as follows. Chapter 2 presents related work on typical software reliability practices. Chapter 3 presents our work on leveraging deep learning techniques to generate semantic features that improve software defect prediction. Chapter 4 describes our work on leveraging n-gram language models to build probability distributions over token sequences from source code and to detect software bugs that are missed by rule-based bug detection tools. Chapter 5 describes our work on leveraging machine learning classifiers to improve test case prioritization in regression testing. Chapter 6 describes our work on leveraging machine learning classifiers to improve the traditional code review practice. Chapter 7 summarizes this thesis and provides an outlook on future work.

1.2 Thesis Scope

In this thesis, the proposed techniques for improving software reliability are only examined on projects written in object-oriented programming languages, i.e., Java and C++; they might not work for other programming languages, e.g., assembly or scripting languages. In addition, the proposed techniques to predict bugs (in Chapter 3) and detect bugs (in Chapter 4) are only examined on general software bugs collected from software history; they might not work for certain types of bugs, e.g., real-time bugs in embedded systems and concurrency bugs.

1.3 Related Publications

Earlier versions of the work presented in this thesis have been published in the following papers (listed in chronological order). My role in these studies included raw data collection and processing, conducting experiments, analyzing results, and writing. My contributions account for more than 80% of these studies.

• Song Wang, Adithya Abraham Philip, Chetan Bansal and Nachi Nagappan. Leveraging Change Intents for Characterizing and Identifying Risky Changes. 12 pages. Under Submission.

• Song Wang, Taiyue Liu, Jaechang Nam and Lin Tan. Deep Semantic Feature Learning for Software Defect Prediction. 27 pages. To appear in the IEEE Transactions on Software Engineering (TSE 2018).

• Song Wang, Jaechang Nam and Lin Tan. QTEP: quality-aware test case prioritization. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (FSE 2017), pp. 523-534. (acceptance rate=24% 72/295)

• Song Wang, Taiyue Liu and Lin Tan. Automatically learning semantic features for defect prediction. In Proceedings of the 38th International Conference on Software Engineering (ICSE 2016), pp. 297-308. (acceptance rate=19% 101/530)

• Song Wang, Devin Chollak, Dana Movshovitz-Attias and Lin Tan. Bugram: Bug Detection with N-gram Language Models. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (ASE 2016), pp. 708-719. (acceptance rate=19% 57/298)

The following papers were published in parallel to the abovementioned publications. These studies are not directly related to this thesis; however, they explored how to leverage machine learning to improve other software analytics tasks, e.g., bug triage, test selection, and crowdsourced testing.

• Jaechang Nam, Song Wang, Xi Yuan and Lin Tan. Designing Bug Detection Rules for Fewer False Alarms. In Proceedings of the 40th International Conference on Software Engineering (ICSE 2018 Poster).

• Junjie Wang, Song Wang and Qing Wang. Is There A “Golden” Feature Set for Static Warning Identification? - An Experimental Evaluation. In Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM 2018).

• Junjie Wang, Qiang Cui, Song Wang and Qing Wang. Domain Adaptation for Test Report Classification in Crowdsourced Testing. In Proceedings of the 39th International Conference on Software Engineering (ICSE-SEIP 2017).

• Qiang Cui, Song Wang, Junjie Wang, Yuanzhe Hu, Qing Wang and Mingshu Li. Multi-Objective Crowd Worker Selection in Crowdsourced Testing. In Proceedings of the 29th International Conference on Software Engineering and Knowledge Engineering (SEKE 2017).

• Junjie Wang, Song Wang, Qiang Cui and Qing Wang. Local-based Active Classification of Test Report to Assist Crowdsourced Testing. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (ASE 2016).

• Junjie Wang, Qiang Cui, Qing Wang and Song Wang. Towards Effectively Test Report Classification to Assist Crowdsourced Testing. In Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM 2016).

• Wen Zhang, Song Wang and Qing Wang. KSAP: An Approach to Bug Report Assignment Using KNN Search and Heterogeneous Proximity. Information and Software Technology (IST 2016).

• Edmund Wong, Lei Zhang, Song Wang, Taiyue Liu and Lin Tan. DASE: Document-Assisted Symbolic Execution for Improving Automated Software Testing. In Proceedings of the 37th International Conference on Software Engineering (ICSE 2015).

• Xinye Tang, Song Wang and Ke Mao. Will This Bug-fixing Change Break Regression Testing? In Proceedings of the 9th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM 2015).

Chapter 2

Background and Related Work

There is much previous work related to our goal of leveraging machine learning techniques to improve existing software reliability practices. The research problems of this thesis correspond to four research areas: software defect prediction, static bug detection, regression testing, and code review. In this chapter, we provide an overview of existing research on leveraging machine learning techniques to improve software reliability.

2.1 Software Defect Prediction

Table 2.1: Defect prediction tasks.

Prediction Level   Within-project   Cross-project
File               WPDP             CPDP
Change             WCDP             CCDP

Table 2.1 shows the investigated defect prediction tasks and their corresponding abbreviations in this thesis.

2.1.1 File-level Defect Prediction

Figure 2.1: Defect Prediction Process

Figure 2.1 presents a typical file-level defect prediction process that is adopted by existing studies [109, 138, 170, 193, 194, 211]. The first step is to label the data as buggy or clean based on post-release defects for each file. One could collect these post-release defects from a Bug Tracking System (BTS) by linking bug reports to their bug-fixing changes. Files related to these bug-fixing changes are considered buggy; otherwise, the files are labeled clean. The second step is to collect the corresponding traditional features of these files. Instances with features and labels are used to train machine learning classifiers. Finally, trained models are used to predict new instances as buggy or clean. We refer to the set of instances used for building models as a training set, whereas the set of instances used to evaluate the trained models is referred to as a test set.

As shown in Figure 2.1, when performing within-project defect prediction (following existing work [193], we call this WPDP), the training and test sets are from the same project, i.e., project A. When performing cross-project defect prediction (following existing work [193], we call this CPDP), the prediction models are trained on a training set from project A (source), and the test set comes from a different project, i.e., project B (target). In this thesis, for file-level defect prediction, we examine the performance of the learned DBN-based semantic features on both WPDP and CPDP.

There are many file-level software defect prediction techniques [82, 109, 125, 138, 168, 183, 189, 200, 220, 276, 326]. Most defect prediction techniques leverage features that are manually extracted from labeled historical defect data to train machine learning based classifiers [170]. Commonly used features can be divided into static code features and process features [169]. Code features include Halstead features [75], McCabe features [163], CK features [46], and MOOD features [81], which are widely examined and used for defect prediction. Recently, process features have been proposed and used for defect prediction.

Moser et al. [183] used the number of revisions, authors, past fixes, and ages of files as features to predict defects. Nagappan et al. [189] proposed code churn features, and showed that these features were effective for defect prediction. Hassan et al. [82] used entropy of change features to predict defects. Their evaluation on six projects showed that their proposed features can significantly improve defect prediction results in comparison to other change features. Lee et al. [138] proposed 56 micro interaction metrics to improve defect prediction. Their evaluation results on three Java projects showed that their proposed features can improve defect prediction results compared to traditional features. Other process features, including developer individual characteristics [204] and collaboration between developers [138, 170, 211, 289], were also useful for defect prediction.

Many machine learning algorithms have been adopted for within-project defect prediction, including Support Vector Machine (SVM) [60], Bayesian Belief Network [11], Naive Bayes (NB) [263], Decision Tree (DT) [67, 118, 276], Neural Network (NN) [206, 218], and Dictionary Learning [109]. Elish et al. [60] evaluated the capability of SVM in predicting defect-prone software modules, and they compared SVM against eight statistical and machine learning models on four NASA datasets. Their results showed that SVM was promising for defect prediction. Amasaki et al. [11] proposed an approach to predicting the final quality of a software product by using the Bayesian Belief Network. Tao et al. [263] proposed a Naive Bayes based defect prediction model and evaluated it on 11 datasets from the PROMISE defect data repository. Wang et al. [276] and Khoshgoftaar et al. [118] examined the performance of tree-based machine learning algorithms on defect prediction; their results suggested that tree-based algorithms could help defect prediction. Jing et al. [109] introduced the dictionary learning technique to defect prediction. They proposed a cost-sensitive dictionary learning based approach to improving defect prediction.

Due to the lack of data, it is often difficult to build accurate models for new projects. Some work [126, 272, 325] has evaluated cross-project defect prediction against within-project defect prediction and shows that cross-project defect prediction is still a challenging problem. The main issue that degrades the performance of cross-project defect prediction is that the distributions of data metrics are not shared between projects. To address this issue, cross-project defect prediction models are trained by using data from other projects. He et al. [84] showed that it is feasible to find the best cross-project models among all available models for predicting defects in a given test project. Watanabe et al. [286] proposed an approach for CPDP that transforms the target dataset to the source dataset by using the average feature values. Turhan et al. [272] proposed to use a nearest-neighbor filter to improve cross-project defect prediction. Nam et al. [193] and Jing et al. [108] used different approaches to address the heterogeneous data problem in cross-project defect prediction.

Xia et al. [299] proposed HYDRA, which leverages a genetic algorithm phase and ensemble learning (EL) to improve cross-project defect prediction. Their approach requires massive training data and a portion (5%) of labeled data from the test data to build and train the model; in real-world practice, however, it is very expensive to obtain labeled data from test projects, since it requires developers to perform manual inspection, and the ground truth might not be guaranteed. Nam et al. [194] proposed TCA+, which adopted a state-of-the-art technique called Transfer Component Analysis (TCA) and optimized TCA's normalization process to improve cross-project defect prediction. In this work, we select TCA+ as our baseline for cross-project defect prediction, since TCA+ is more practical.
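As a concrete illustration of the pipeline in Figure 2.1, the sketch below trains a classifier on hand-crafted file-level metrics from one release and evaluates it on the next. The metric names, their values, and the choice of logistic regression are illustrative assumptions only, not the setup used in later chapters; scikit-learn is assumed to be available.

```python
# Minimal within-project (WPDP) sketch: train on release N, test on release N+1.
# The metric values below are fabricated; a real pipeline would compute
# Halstead/McCabe/CK/process metrics from the project and its history.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score

# Each row: [lines_of_code, cyclomatic_complexity, num_revisions, num_authors]
X_train = np.array([[120, 4, 2, 1], [540, 18, 9, 4], [300, 7, 3, 2],
                    [890, 25, 14, 5], [60, 2, 1, 1], [410, 12, 6, 3]], dtype=float)
y_train = np.array([0, 1, 0, 1, 0, 1])   # 1 = buggy, 0 = clean (from linked bug reports)

X_test = np.array([[700, 21, 11, 4], [90, 3, 1, 1], [350, 10, 5, 2]], dtype=float)
y_test = np.array([1, 0, 1])

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = model.predict(X_test)

print("precision:", precision_score(y_test, pred))
print("recall:   ", recall_score(y_test, pred))
print("F1:       ", f1_score(y_test, pred))
```

A cross-project (CPDP) run only changes where the training rows come from: they would be drawn from a different source project, which is exactly why the distribution mismatch discussed above becomes the central difficulty.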

2.1.2 Change-level Defect Prediction

Change-level defect prediction can predict whether a change is buggy at the time of the commit, which allows developers to act on the prediction results as soon as a commit is made. In addition, since a change is typically smaller than a file, developers have much less code to examine in order to identify defects. However, for the same reason, it is more difficult to predict buggy changes accurately. Similar to file-level defect prediction, change-level defect prediction also consists of the following processes:

• Labeling process: Labeling each change as buggy or clean to indicate whether the change contains bugs.

• Feature extracting process: Extracting the features to represent the changes.

• Model building and testing process: Building a prediction model with the features and labels and then using the model to predict testing data.

Different from labeling file-level defect data, labeling change-level defect data requires further linking bug-fixing changes to bug-introducing changes. A line that is deleted or changed by a bug-fixing change is a faulty line, and the most recent change that introduced the faulty line is considered a bug-introducing change. We can identify bug-introducing changes with a blame technique provided by a Version Control System (VCS), e.g., git, or with the SZZ algorithm [124]. Such blame techniques are widely used in existing studies [104, 124, 178, 256, 311]. In this work, the bug-introducing changes are considered buggy, and other changes are labeled clean. Note that not all projects have a well maintained BTS; following existing studies [104, 256], we consider changes whose commit messages contain the keyword “fix” as bug-fixing changes. In this thesis, similar to file-level defect prediction, we also examine the performance of DBN-based features on both change-level within-project defect prediction (WCDP) and change-level cross-project defect prediction (CCDP).

For change-level defect prediction, Mockus and Weiss [178] and Kamei et al. [115] predicted the risk of a software change by using change measures, such as the number of subsystems touched, the number of files modified, the number of lines of added code, and the number of modification requests. Kim et al. [122] used the identifiers in added and deleted source code and the words in change logs to classify changes as being defect-prone or clean. Tan et al. [256] improved change classification techniques to conduct online defect prediction for imbalanced data; they applied and adapted time-sensitive change classification and online change classification to address the incorrect evaluation introduced by cross-validation, and applied resampling techniques and updatable classification to improve classification performance. Jiang et al. [104] and Xia et al. [300] built separate prediction models for each developer to predict software defects at the change level. Nagappan et al. [189] leveraged architectural dependency and churn measures to predict bugs. Change classification can also predict whether a commit is buggy or not [210, 215]. Recently, Kamei et al. [114] empirically studied the feasibility of change-level defect prediction in a cross-project context.

The main differences between our approach and existing approaches for defect prediction are as follows. First, existing approaches to defect prediction are based on manually encoded traditional features that are not sensitive to programs' semantic information, while our approach automatically learns semantic features using a DBN and uses these features to perform defect prediction tasks. Second, since our approach requires only the source code of the training and test projects, it is suitable for both within-project defect prediction and cross-project defect prediction.
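The labeling step above can be approximated with ordinary git commands. The sketch below is a simplified, assumption-laden version of that process: it treats commits whose messages contain "fix" as bug-fixing changes, parses the lines each one deletes, and blames those lines on the parent commit to recover candidate bug-introducing changes. The repository path is hypothetical, and real SZZ implementations add refinements (e.g., ignoring comment and whitespace changes) that are omitted here.

```python
import re
import subprocess

def git(*args, repo="."):
    """Run a git command in the given repository and return its stdout."""
    return subprocess.run(["git", "-C", repo, *args],
                          capture_output=True, text=True, check=True).stdout

def fix_commits(repo="."):
    """Commits whose messages mention 'fix' (the crude keyword heuristic above)."""
    return git("log", "--grep=fix", "-i", "--pretty=format:%H", repo=repo).splitlines()

def deleted_lines(commit, repo="."):
    """Yield (file, line_number_in_parent) for every line the commit deletes."""
    diff = git("show", "--unified=0", "--pretty=format:", commit, repo=repo)
    path = None
    for line in diff.splitlines():
        if line.startswith("--- a/"):
            path = line[6:]
        m = re.match(r"^@@ -(\d+)(?:,(\d+))? \+", line)
        if m and path:
            start, count = int(m.group(1)), int(m.group(2) or "1")
            for n in range(start, start + count):
                yield path, n

def bug_introducing(commit, repo="."):
    """Blame each deleted line on the parent to find candidate bug-introducing commits."""
    culprits = set()
    for path, n in deleted_lines(commit, repo=repo):
        blame = git("blame", "-L", f"{n},{n}", "--porcelain",
                    f"{commit}^", "--", path, repo=repo)
        culprits.add(blame.split()[0])   # first token of porcelain output is the SHA
    return culprits

# Example usage (hypothetical repository path):
# for fix in fix_commits(repo="/path/to/project"):
#     print(fix, "->", bug_introducing(fix, repo="/path/to/project"))
```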

2.1.3 Deep Learning and Semantic Feature Generation in Software Engineering

Recently, deep learning algorithms have been adopted to improve research tasks in software engineering. Yang et al. [311] proposed an approach that leveraged deep learning to generate features from existing features and then used these new features to build defect prediction models. This work was motivated by a weakness of logistic regression (LR): LR cannot combine features to generate new features. They used a DBN to generate features from 14 traditional change-level features, including the following: the number of modified subsystems, modified directories, modified files, code added, code deleted, lines of code before/after the change, files before/after the change, and several features related to developers' experience [311].

Our work differs from the above study mainly in three aspects. First, we use a DBN to learn semantic features directly from the source code, while features generated by their approach are relations among existing features. Since the existing features cannot distinguish between many semantic code differences, the combination of these features would still fail to capture semantic code differences. For example, if two changes add the same line at different locations in the same file, the traditional features cannot distinguish between the two changes. Thus, the generated new features, which are combinations of the traditional features, would also fail to distinguish between the two changes. Second, we evaluate the effectiveness of our generated features using different classifiers for both within-project and cross-project defect prediction, while they only use LR for within-project defect prediction. Third, we focus on both file-level and change-level defect prediction, while they only work on change-level defect prediction.

There are also many existing studies that leverage deep learning techniques to address other problems in software engineering [71, 72, 107, 110, 133, 140, 156, 185, 209, 225, 290, 304, 317]. Mou et al. [185] used deep learning to model programs and showed that deep learning can capture the programs' structural information. Deep learning has also been used for malware classification [209, 317], test report classification [110], link prediction in a developer online forum [304], software traceability [107], etc.

How to explain deep learning results is still a challenging question for the AI community. To interpret deep learning models, Karpathy et al. [116] used character-level language models as an interpretable testbed to explain the representations and predictions of a Recurrent Neural Network (RNN). Their qualitative visualization experiments demonstrate that RNN models could learn powerful and often interpretable long-range interactions from real-world data. Radford et al. [219] focused on understanding the properties of representations learned by byte-level recurrent language models for sentiment analysis. Their work reveals that there exists a sentiment unit in the well-trained RNNs (for sentiment analysis) that has a direct influence on the generative process of the model. Specifically, simply fixing its value to be positive or negative can generate samples with the corresponding positive or negative sentiment. The above studies show that to some extent deep learning models are interpretable. However, these two studies focused on interpreting RNNs for text analysis. In this work we leverage a different deep learning model, i.e., the deep belief network (DBN), to analyze the ASTs of source code. A DBN adopts different architectures and learning processes from RNNs. For example, an RNN (e.g., an LSTM) can, in principle, use its memory cells to remember long-range information that can be used to interpret the data it is currently processing, while a DBN does not have such memory cells (details are provided in Section 3.2.1). Thus, it is unknown whether DBN models share the same property (i.e., interpretability) as RNNs.

Many studies used a topic model [32] to extract semantic features for different tasks in software engineering [45, 150, 197, 200, 254, 302]. Nguyen et al. [200] leveraged a topic model to generate features from source code for within-project defect prediction. However, their topic model handled each source file as one unordered token sequence. Thus, the generated features cannot capture structural information in a source file.

This work shows that a DBN is applicable for learning semantic features to improve software bug prediction. Following this work, researchers have explored the effectiveness of other deep learning algorithms on defect prediction [52, 56, 139, 145, 288]. For example, Li et al. [139] proposed to leverage a Convolutional Neural Network (CNN) to generate semantic features for improving software defect prediction. Evaluation on seven open-source projects showed that the CNN-based defect prediction approach can outperform our DBN-based defect prediction model. Wen et al. [145] used a Recurrent Neural Network (RNN) to learn features for change-level defect prediction tasks. Their evaluation showed that the RNN-based features could outperform traditional change-level defect prediction features.
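Chapter 3 trains a DBN on token vectors extracted from ASTs and feeds the learned representation to a standard classifier. scikit-learn has no DBN, so the sketch below stacks BernoulliRBM layers as a rough stand-in for the unsupervised pre-training step; the AST node tokens, labels, and network sizes are invented for illustration and are not the settings used in Chapter 3.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical AST-node token sequences for a handful of files, padded to equal length.
vocab = {"<pad>": 0, "MethodInvocation": 1, "IfStatement": 2, "ForStatement": 3,
         "WhileStatement": 4, "TryStatement": 5}
token_seqs = [
    ["MethodInvocation", "IfStatement", "MethodInvocation", "<pad>"],
    ["ForStatement", "MethodInvocation", "IfStatement", "TryStatement"],
    ["WhileStatement", "MethodInvocation", "<pad>", "<pad>"],
    ["TryStatement", "MethodInvocation", "IfStatement", "MethodInvocation"],
]
labels = np.array([0, 1, 0, 1])  # 1 = buggy, 0 = clean (fabricated)

# Encode tokens as integers and scale into [0, 1], as the RBM layers expect.
X = np.array([[vocab[t] for t in seq] for seq in token_seqs], dtype=float)
X /= max(vocab.values())

# Two stacked RBMs play the role of the DBN's hidden layers; a logistic
# regression on top performs the actual defect prediction.
model = Pipeline([
    ("rbm1", BernoulliRBM(n_components=8, learning_rate=0.05, n_iter=20, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=4, learning_rate=0.05, n_iter=20, random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X, labels)
print(model.predict(X))
```

The point of the sketch is only the data flow (tokens, integer encoding, unsupervised feature layers, supervised classifier); the actual DBN architecture, training procedure, and parameter choices are described in Chapter 3.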

2.2 Static Bug Detection

2.2.1 Static Bug Detection Tools

Many static code analysis techniques have been developed to detect bugs based on bug patterns [4,5,10,30,38,41,70,96,134,143,144,179,207,230,243,255,269,278,285,287,291,301]. Existing static bug detection techniques can be divided into two categories according to how bug patterns are designed, i.e., manually identified bug patterns and bug patterns automatically mined from source code. Two widely used open-source bug detection tools for the Java language, FindBugs [4] and PMD [5], detect real bugs based on bug patterns manually designed by their contributors. Most manually designed patterns focus on language-specific common bugs in software projects, such as buffer overflows, race conditions, and memory allocation errors. Some other studies leverage particular types of bug patterns to detect special bugs.

For example, Chen et al. [41] proposed six anti-patterns and leveraged these rules to detect log-related bugs. Palomba et al. [207] proposed to detect five different code smells based on five well-summarized code patterns. For detecting project-specific bugs, many approaches leverage rules that are mined from specific projects. Li et al. [144] developed PR-Miner to mine programming rules from C code and detect violations using a frequent itemset mining algorithm. Benjamin et al. [30] proposed DynaMine, which uses association rule mining to extract simple rules from software revision histories and detect defects related to rule violations. Wasylkowski et al. [285] and Gruska et al. [70] proposed to detect object usage anomalies by combining frequent itemset mining and object usage graph models.

2.2.2 Statistical Language Models

Statistical language models have been successfully used for tasks including code completion [24, 225, 290], fault localization, and coding style consistency checking [10, 85, 242]. Hindle et al. [88] leveraged n-gram language models to show that source code has high repetitiveness. Han et al. [78] presented an algorithm to infer the next token by using a Hidden Markov Model. Pradel et al. [213] proposed an approach to generating object usage specifications based on a Markov Model. Yusuke et al. [202] leveraged n-gram models to generate pseudo-code from software source code. Ray et al. [224] used n-gram models to study the language statistics of buggy code, which showed that buggy lines are more unnatural than non-buggy lines. They also proposed a defect prediction model based on n-gram models. Specifically, they built n-gram models on an old version of a software project, and then used entropy from the n-gram models to estimate the naturalness of the source code lines in a later version, i.e., source code lines with higher entropy values are flagged as buggy lines. There are three main differences between their approach and Bugram. First, given a software project, Bugram directly builds n-gram models on it and detects bugs in this project, while their tool requires an old version of the project as training data. Second, they build n-gram models at the token level, while we build n-gram models at a higher level (e.g., statements, method calls, and control flows) (Section 4.3.1), aiming to detect semantic bugs more effectively. Third, their approach leverages entropy while Bugram uses probability to detect bugs. Ranking token sequences by probability and ranking them by entropy are two different approaches [158]; the entropy used in [224] combines probability and sequence length. It may be worth comparing entropy versus probability for detecting bugs in the future. White et al. [290] and Raychev et al. [225] investigated the effectiveness of language models, i.e., n-gram and recurrent neural network (RNN) models, for code completion.

Bavishi et al. [24] proposed an RNN-based framework to recover natural identifier names from minified JavaScript code. Movshovitz-Attias et al. [186] leveraged n-gram models to predict class comments for program source file documents. Campbell et al. [35] built n-gram models from historical correct source code to locate the causes of syntax errors. Some studies used n-gram token sequences instead of n-gram models to solve software engineering tasks. Nessa et al. [196] and Yu et al. [316] leveraged n-gram token sequences to help software fault localization. Lal et al. [132] combined n-grams with information retrieval techniques to improve fault localization. Sureka et al. [251] leveraged n-gram-based features to detect duplicated bug reports. Hsiao et al. [96] proposed to use n-gram token sequences and tf-idf-style measures to detect code clones and related bugs. In contrast, Bugram is not limited to detecting clone-related bugs. After our study, applications of statistical language models to other topics have been examined, e.g., program synthesis [53, 305, 322], code analysis [51, 106, 237, 275], and bug detection [214].
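To make the probability-based ranking concrete, the following Python sketch trains a simple n-gram model over per-method token sequences and reports the lowest-probability sequences as bug candidates. It is a minimal illustration of the idea described above, not Bugram's actual implementation: the add-one smoothing, the default n = 3, and the top_k cutoff are assumptions made for this sketch.

# Minimal sketch of Bugram-style ranking: train an n-gram model over token
# sequences from a project and flag the lowest-probability sequences.
from collections import defaultdict

def train_ngram(sequences, n=3):
    context_counts = defaultdict(int)   # counts of (n-1)-token contexts
    ngram_counts = defaultdict(int)     # counts of full n-grams
    vocab = set()
    for seq in sequences:
        vocab.update(seq)
        padded = ["<s>"] * (n - 1) + seq
        for i in range(len(seq)):
            context = tuple(padded[i:i + n - 1])
            ngram_counts[context + (padded[i + n - 1],)] += 1
            context_counts[context] += 1
    return ngram_counts, context_counts, len(vocab), n

def sequence_probability(seq, model):
    ngram_counts, context_counts, vocab_size, n = model
    padded = ["<s>"] * (n - 1) + seq
    prob = 1.0
    for i in range(len(seq)):
        context = tuple(padded[i:i + n - 1])
        token = padded[i + n - 1]
        # Add-one (Laplace) smoothing so unseen n-grams get a small probability.
        num = ngram_counts[context + (token,)] + 1.0
        den = context_counts[context] + vocab_size
        prob *= num / den
    return prob

def rank_suspicious(sequences, n=3, top_k=10):
    model = train_ngram(sequences, n)
    scored = [(sequence_probability(s, model), s) for s in sequences]
    return sorted(scored, key=lambda x: x[0])[:top_k]   # lowest probability first

In a realistic setting, the input sequences would be the higher-level statement/method-call/control-flow sequences mentioned above, and the lowest-ranked sequences would be reported to developers for inspection.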

2.3 Regression Testing Techniques

A typical TCP technique reorders the execution sequence of test cases based on a certain objective, e.g., fault-detection rate [234]. Specifically, TCP can be formally defined as follows: given a test suite T and the set PT of all possible permutations of T, TCP techniques aim to find a permutation P′ ∈ PT such that for all P″ ∈ PT with P″ ≠ P′, f(P′) ≥ f(P″), where f is the objective function. Most existing TCPs leverage coverage information, e.g., dynamic coverage (dynamic call graphs) from the last run of the test cases [102, 233] or static code coverage (static call graphs) obtained from static code analysis [43, 101, 166, 244, 266]. The commonly used coverage criteria include statement, method, and branch coverage. In this work, we choose to examine statement and method coverage, since previous work has shown that statement and method coverage are more effective than other coverage criteria [153, 154, 235]. For coverage-based TCP techniques, there are two widely used prioritization strategies, i.e., the total strategy and the additional strategy [142, 234, 312]. The total coverage strategy schedules the execution order of test cases based on the total number of statements or methods covered by each test case. In contrast, the additional coverage strategy reorders the execution sequence of test cases based on the number of statements or methods that are covered by a candidate test case but not yet covered by the already ordered test cases.
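As a concrete illustration of the additional strategy described above, the following Python sketch greedily picks the test with the most not-yet-covered statements and falls back to total coverage once no remaining test adds new coverage. The test names and coverage sets are hypothetical, and the tie-breaking rule is an assumption of this sketch rather than a prescribed part of the strategy.

# Sketch of the greedy "additional" coverage strategy: repeatedly pick the test
# that covers the most statements not yet covered by already-ordered tests;
# when no test adds new coverage, order the rest by total coverage.
def additional_prioritization(coverage):
    # coverage: dict mapping test name -> set of covered statement ids
    remaining = dict(coverage)
    covered = set()
    order = []
    while remaining:
        best = max(remaining,
                   key=lambda t: (len(remaining[t] - covered), len(remaining[t])))
        if not remaining[best] - covered:
            # No test adds new coverage; append the rest by total coverage.
            order.extend(sorted(remaining, key=lambda t: -len(remaining[t])))
            break
        order.append(best)
        covered |= remaining.pop(best)
    return order

# Hypothetical example: t2 runs first (3 new statements), then t1 adds s4.
tests = {"t1": {"s1", "s4"}, "t2": {"s1", "s2", "s3"}, "t3": {"s2"}}
print(additional_prioritization(tests))   # ['t2', 't1', 't3']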

Many regression testing techniques have been proposed for improving test efficiency by test case prioritization [42, 43, 69, 101, 102, 121, 142, 166, 233–235, 258, 266, 312, 313, 319], test case selection [21, 232, 241], and test case reduction [31, 240, 321]. In terms of TCP techniques, Rothermel et al. [234] first presented a family of prioritization techniques including both the total and additional test prioritization strategies using various coverage information. They also proposed the APFD metric for assessing TCPs. Along this line, TCP techniques with different prioritization strategies, coverage criteria, and constraints have been proposed. A majority of existing TCPs leverage code coverage information to rank test cases. Dynamic code coverage from the last execution is widely used in existing TCP techniques [102, 142, 233, 312]. Another widely used form of code coverage is static code coverage, which is estimated from static analysis rather than gathered through instrumentation and execution. Mei et al. [166] were the first to prioritize test cases with static coverage information. Later, Thomas et al. [266] proposed a static test case prioritization method using topic models. Some other TCP techniques [50, 99, 201, 235] leveraged the similarity between test cases and source code to prioritize test cases. Specifically, Saha et al. [235] proposed an information retrieval approach for regression testing prioritization based on program changes. Noor et al. [201] proposed a similarity-based approach for test case prioritization using historical failure data. Instead of using coverage or similarity information between source code and test cases, some approaches use other software process information as a proxy to rank test cases [15, 61, 130, 173–175, 270, 273, 315, 319]. Arafeen et al. [15] used software requirements to group and rank test cases. Mirarab et al. [175] and Zhang et al. [319] proposed to leverage source code metric (i.e., McCabe) based coverage information of test cases to prioritize regression test cases. Engstrom et al. [61] selected a small set of test cases for regression testing based on previously fixed faults. Their work requires previously revealed bugs within a given period; for projects that do not have well-maintained bug data, or for new projects with no past bugs available, their approach cannot work. Our proposed QTEP does not have this limitation, since it focuses on potentially unrevealed faults. Laali et al. [130] proposed to utilize the locations of faults revealed by the executed test cases to rank the remaining test cases. Different from QTEP, they used injected faults, and the performance of their approach on real-world faults is unknown. Yu et al. [315] proposed fault-based prioritization of test cases, which is designed by using fault-based test case generation models. Different from QTEP, their approach assumes that the fault-detecting ability of each test case is available. Miranda et al. [174] proposed scope-aided TCP for testing reused code by using possible constraints delimiting the new input domain scope. In contrast, QTEP is not limited to testing reused code.

After this study, Liang et al. [147] applied test case prioritization to continuous integration environments at Google. Their evaluation on a large data set from Google showed that test case prioritization techniques can lead to cost-effectiveness improvements in the continuous integration process.

2.4 Code Review

Code review is a manual inspection of source code by humans, which aims at identifying potential defects and quality problems in the source code before its deployment in a live environment [19, 25–27, 63, 68, 85, 131, 157, 164, 165, 172, 182, 212, 246, 248, 267, 274]. In recent years, Modern Code Review (MCR) has been developed as a tool-based code review system and has become popular and widely used in both proprietary software (e.g., Google, Cisco, and Microsoft) and open-source software (e.g., Android, Qt, and LibreOffice) [76]. Many studies have examined the practices of code review. Stein et al. [248] explored distributed, asynchronous code inspections; they studied a tool that allowed participants at separate locations to discuss faults. Porter et al. [212] reported in 1995 on a review of studies on code inspection that examined the effects of different factors on code inspections. Laitenburger [131] surveyed code inspection methods and presented a taxonomy of code inspection techniques. Votta [274] found that 20% of the interval in a "traditional inspection" is wasted due to scheduling. Rigby et al. [227–229] have done extensive work examining code review practices in OSS development. Hellendoorn et al. [85] used language models to quantitatively evaluate the influence of stylistic properties of code contributions on the code review process and outcome. Sutherland and Venolia [252] conducted a study at Microsoft regarding the use of code review data for later information needs. Bacchelli and Bird [19] found that understanding of the code and the reason for a change is the most important factor in the quality of code reviews. Some other studies focus on accelerating the code review process by recommending reviewers [267] or by decomposing code review changes into multiple changesets [23, 73, 128, 265]. In this thesis, we explore the feasibility of accelerating code review by identifying risky code changes that require multiple rounds of code review or are reverted.

2.5 Other Applications of Machine Learning for Improving Software Reliability

Besides the four topics mentioned above, machine learning technologies have also been leveraged to improve other software reliability practices, such as symbolic execution [44, 141, 296], test generation [148, 149, 159, 176, 249], automatic bug repair [119, 137, 151, 152, 181, 303, 307, 309], comment generation [97, 98, 294, 295], bug report triage [14, 100, 187, 282, 283, 306, 324], filtering of false static analysis alerts [79, 277], and fault localization [18, 94, 135, 226, 297, 298, 323]. For example, to accelerate symbolic execution, our previous work [296] leveraged NLP technologies to extract input constraints from program documents to provide better guidance during symbolic execution. Along this line, Chen et al. [44] proposed to accelerate symbolic execution with code transformation, and Li et al. [141] proposed to apply symbolic execution to complex programs by using machine learning based constraint solving. Note that this thesis does not aim to improve all existing software reliability practices; rather, we demonstrate how to take advantage of machine learning technologies in knowledge representation, learning, natural language processing, classification, etc., to improve software reliability practices.

Chapter 3

Leveraging Deep Learning to Improve Defect Prediction

This chapter presents our approach to improving software defect prediction by using semantic features learned with deep learning algorithms, which was presented at the 38th International Conference on Software Engineering (ICSE'16 [280]) and in IEEE Transactions on Software Engineering (TSE'18 [279]).

3.1 Motivation

Software defect prediction techniques [82, 104, 109, 125, 138, 168, 183, 189, 220, 276, 326] have been proposed to detect defects and reduce software development costs. Defect prediction techniques build models using software history data and use the developed models to predict whether new instances of code regions, e.g., files, changes, and methods, contain defects. The efforts of previous studies toward building accurate prediction models can be categorized into the following two approaches: the first approach is manually designing new features or new combinations of features to represent defects more effectively, and the second approach involves the application of new and improved machine learning based classifiers. Researchers have manually designed many features to distinguish defective files from non-defective files, e.g., Halstead features [75] based on operator and operand counts; McCabe features [163] based on dependencies; CK features [46] based on function and inheritance counts, etc.; MOOD features [81] based on polymorphism factor, coupling factor, etc.; process features [104, 220] (including the number of lines of code added and removed, meta features, etc.); and object-oriented features [22, 57, 161].

(a) Original buggy code snippet:

public void copy(Directory to, String src, String dest) throws IOException {
  IndexOutput os = to.createOutput(dest);
  IndexInput is = openInput(src);
  IOException priorException = null;
  try {
    is.copyBytes(os, is.length());
  } catch (IOException ioe) {
    priorException = ioe;
  } finally {
    IOUtils.closeSafely(priorException, os, is);
  }
}

(b) Code snippet after fixing the bug:

public void copy(Directory to, String src, String dest) throws IOException {
  IndexOutput os = null;
  IndexInput is = null;
  IOException priorException = null;
  try {
    os = to.createOutput(dest);
    is = openInput(src);
    is.copyBytes(os, is.length());
  } catch (IOException ioe) {
    priorException = ioe;
  } finally {
    IOUtils.closeSafely(priorException, os, is);
  }
}

Figure 3.1: A motivating example from Lucene.

Traditional features mainly focus on the statistical characteristics of programs and assume that buggy and clean programs have distinguishable statistical characteristics. However, our observations on real-world programs show that existing traditional features often cannot distinguish programs with different semantics. Specifically, program files with different semantics can have traditional features with similar or even identical values. For example, Figure 3.1 shows an original buggy version, i.e., Figure 3.1(a), and a fixed clean version, i.e., Figure 3.1(b), of a method from Lucene. In the buggy version, an IOException can occur when initializing the variables os and is before the try block. The buggy version can lead to a memory leak1 and has already been fixed by moving the initialization statements into the try block in Figure 3.1(b). If traditional features, e.g., code complexity features, are used to represent these two code snippets, their feature vectors are identical. This is because the two code snippets have the same source code characteristics in terms of complexity, function calls, raw programming tokens, etc. However, the semantic information in the two code snippets is significantly different: the contextual information of the two variables, i.e., os and is, differs between the two versions. Features that can distinguish such semantic differences are needed for building more accurate prediction models.

1https://issues.apache.org/jira/browse/LUCENE-3251

To bridge the gap between programs' semantic information and defect prediction features, this thesis proposes leveraging a powerful representation-learning algorithm, namely deep learning [93], to learn semantic representations of programs automatically. Specifically, we use the deep belief network (DBN) [92] to automatically learn features from token vectors extracted from source code, and then we utilize these features to build and train defect prediction models. To use a DBN to learn features from code snippets, we first convert the code snippets into vectors of tokens with the structural and contextual information preserved, and then we use these vectors as the input to the DBN for generating features. For the two code snippets presented in Figure 3.1, the two input vectors are [..., IndexOutput, createOutput(), IndexInput, openInput(), IOException, try, ...] and [..., IndexOutput, IndexInput, IOException, try, createOutput(), openInput(), ...] respectively (details regarding the token extraction are provided in Section 3.3.1). As the vectors of these two code snippets are different, the DBN will automatically learn features that can distinguish them. We examine our DBN-based approach to generating semantic features on both file-level defect prediction tasks (i.e., predicting which files in a release are buggy) and change-level defect prediction tasks (i.e., predicting whether a code commit is buggy), because most of the existing approaches to defect prediction work at these two levels [17, 84, 104, 194, 222, 256, 272, 299, 300]. Focusing on these two different defect prediction tasks enables us to extensively compare our proposed technique with state-of-the-art defect prediction features and techniques. For file-level defect prediction, we generate DBN-based semantic features by using the complete Abstract Syntax Trees (ASTs) of the source files, while for change-level defect prediction, we generate the DBN-based features by using tokens extracted from code changes. In addition, most defect prediction studies have been conducted in one or two settings, i.e., within-project defect prediction [104, 178, 256, 300] and/or cross-project defect prediction [84, 194, 272, 299]. Thus, we evaluate our approach in both settings as well.

3.2 Background

3.2.1 Deep Belief Network

A deep belief network is a generative graphical model that uses a multi-level neural network to learn a representation from the training data that can reconstruct the semantics and content of the training data with high probability [28]. A DBN contains one input layer and several hidden layers, and the top layer is the output layer that contains the final features used to represent the input data, as shown in Figure 3.2.

Figure 3.2: Deep belief network architecture and input instances of File1.java and File2.java. Although the token sets of these two files are identical, the different structural and contextual information between tokens enables the DBN to generate different features to distinguish these two files.

Each layer consists of several stochastic nodes. The number of hidden layers and the number of nodes in each layer vary depending on the user's demands. In this study, the size of the learned semantic feature set is the number of nodes in the top layer. The idea of the DBN is to enable the network to reconstruct the input data using the generated features by adjusting the weights between nodes in different layers. The DBN models the joint distribution between the input layer and the hidden layers as follows:

P(x, h^1, ..., h^l) = P(x | h^1) ∏_{k=1}^{l} P(h^k | h^{k+1})    (3.1)

where x is the data vector from the input layer, l is the number of hidden layers, and h^k is the data vector of the kth layer (1 ≤ k ≤ l). P(h^k | h^{k+1}) is a conditional distribution for the adjacent layers k and k+1. To calculate P(h^k | h^{k+1}), each pair of adjacent layers in the DBN is trained as a Restricted Boltzmann Machine (RBM) [28]. An RBM is a two-layer, undirected, bipartite graphical model where the first layer consists of observed data variables, referred to as visible nodes, and the second layer consists of latent variables, referred to as hidden nodes.

Figure 3.3: The distribution of DBN-based features of the example code snippets shown in Figure 3.1.

P(h^k | h^{k+1}) can be efficiently calculated as:

P(h^k | h^{k+1}) = ∏_{j=1}^{n_k} P(h^k_j | h^{k+1})    (3.2)

P(h^k_j = 1 | h^{k+1}) = sigm(b^k_j + Σ_{a=1}^{n_{k+1}} W^k_{aj} h^{k+1}_a)    (3.3)

where n_k is the number of nodes in layer k, sigm(c) = 1/(1 + e^{-c}), b is a bias matrix, b^k_j is the bias for node j of layer k, and W^k is the weight matrix between layer k and layer k+1. sigm is the sigmoid function, which serves as the activation function to update the hidden units. We use the sigmoid function because it outputs a smooth range of nonlinear values with relatively simple computation [77]. The DBN automatically learns the W and b matrices using an iterative process. W and b are updated via log-likelihood stochastic gradient descent:

W_ij(t+1) = W_ij(t) + η ∂log(P(v|h)) / ∂W_ij    (3.4)

b^o_k(t+1) = b^o_k(t) + η ∂log(P(v|h)) / ∂b^o_k    (3.5)

where t is the tth iteration, η is the learning rate, P(v|h) is the probability of the visible layer of an RBM given the hidden layer, i and j are two nodes in different layers of the RBM, W_ij is the weight between the two nodes, and b^o_k is the bias on node o in layer k. To train the network, one first initializes all W matrices between two layers via RBM and sets the biases b to 0. W and b can then be tuned with respect to a specific criterion, e.g., the number of training iterations or the error rate between the reconstructed input data and the original input data. In this study, we use the number of training iterations as the criterion for tuning W and b. The well-tuned W and b are used to set up a DBN for generating semantic features for both the training and test data. We discuss how these parameters affect the performance of the learned semantic features in Section 3.4.5. The DBN model generates features through complex network connections. These connections enable DBN models to generate features with multiple levels of abstraction and high-level semantics. DBN features are weighted combinations/vectors of input nodes, which may represent patterns of the usage of input nodes (e.g., methods, control-flow nodes, etc.). We believe such DBN-based features can help distinguish the semantics of different source code snippets, which traditional features cannot handle well. For example, Figure 3.3 shows the distribution of the DBN-based semantic features of the two code snippets shown in Figure 3.1. Specifically, we use the DBN model trained on project Lucene (details are in Section 3.4.8) to generate a feature set that contains 50 different features for each of the two code snippets. As the figure shows, the distributions of the features of the two code snippets are different. Specifically, most of the features of the code snippet shown in Figure 3.1(b) have larger values than the corresponding features of the code snippet shown in Figure 3.1(a). Thus, the new features are capable of distinguishing these two code snippets with a proper classifier.
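The following Python sketch illustrates how a trained DBN turns an input vector into semantic features: each layer applies a sigmoid affine transform analogous to Eq. (3.3), and the activations of the top layer are taken as the features. The layer sizes and weight values are made-up toy numbers, and the RBM pre-training and the gradient updates of Eqs. (3.4)–(3.5) are deliberately omitted.

# Minimal sketch of generating features from an already trained stack of layers:
# each layer computes sigm(b[j] + sum_a W[a][j] * h[a]), cf. Eq. (3.3).
import math

def sigm(c):
    return 1.0 / (1.0 + math.exp(-c))

def layer_forward(h, W, b):
    # h: activations of the previous layer; W[a][j]: weight from node a to node j.
    return [sigm(b[j] + sum(W[a][j] * h[a] for a in range(len(h))))
            for j in range(len(b))]

def dbn_features(x, layers):
    # layers: list of (W, b) pairs, from the input layer up to the top layer.
    h = x
    for W, b in layers:
        h = layer_forward(h, W, b)
    return h   # activations of the top layer = learned semantic features

# Toy usage with made-up weights: 3 input tokens -> 2 hidden nodes -> 2 features.
layers = [([[0.2, -0.1], [0.4, 0.3], [-0.5, 0.1]], [0.0, 0.1]),
          ([[0.3, -0.2], [0.1, 0.4]], [0.05, -0.05])]
print(dbn_features([0.1, 0.7, 0.3], layers))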

3.3 Approach

In this work, we use a DBN to generate semantic features automatically from source files and code changes, and we further leverage these features to improve defect prediction. Figure 3.4 illustrates the workflow of our approach to generating features for both file-level defect prediction (whose inputs are source files) and change-level defect prediction (whose inputs are source code changes). Specifically, for file-level defect prediction, our approach takes AST node tokens from the source code of the training and test source files as the input and generates semantic features from them. Then, the generated semantic features are used to build the models for predicting defects.

Figure 3.4: Overview of our DBN-based approach to generating semantic features for file-level and change-level defect prediction.

1  --- a/solr/src/java/org/apache/solr/handler/component/QueryComponent.java
2  +++ b/solr/src/java/org/apache/solr/handler/component/QueryComponent.java
3  @@ -217,14 +217,8 @@ public class QueryComponent extends SearchComponent
4     for (String groupByStr : funcs) {
5       QParser parser = QParser.getParser(groupByStr, "func", rb.req);
6       Query q = parser.getQuery();
7  -    SolrIndexSearcher.GroupCommandFunc gc;
8  -    if (groupSort != null){
9  -      SolrIndexSearcher.GroupSortCommand gcSort = new SolrIndexSearcher.GroupSortCommand();
10 -      gcSort.sort = groupSort;
11 -      gc = gcSort;
12 -    } else{
13 -      gc = new SolrIndexSearcher.GroupCommandFunc();
14 -    }
15 +    SolrIndexSearcher.GroupCommandFunc gc = new SolrIndexSearcher.GroupCommandFunc();
16 +    gc.groupSort = groupSort;
17      if (q instanceof FunctionQuery) {
18        gc.groupBy = ((FunctionQuery)q).getValueSource();

Figure 3.5: A change example from Lucene.

Note that for change-level defect prediction, the input data to our DBN-based feature generation approach are changed code snippets. Since building an AST for an incomplete code snippet is challenging, in this work we propose a heuristic approach to extracting important structural and context information from changed code snippets (details are in Section 3.3.1).

The DBN requires input data in the form of integer vectors. To satisfy this requirement, we first build a mapping between integers and tokens and then convert the token vectors into integer vectors. To generate semantic features, we first use the integer vectors of the training set to build and train a DBN. Then, we use the trained DBN to automatically generate semantic features from the integer vectors of the training and test sets. Finally, based on the generated semantic features, we build defect prediction models from the training set and evaluate their performance on the test set. Our approach consists of four major steps: 1) parsing source code (source files for file-level defect prediction and changed code snippets for change-level defect prediction) into tokens, 2) mapping tokens to integer identifiers, which are the expected inputs to the DBN, 3) leveraging the DBN to automatically generate semantic features, and 4) building defect prediction models and predicting defects using the learned semantic features of the training and test data.

3.3.1 Parsing Source Code

Parsing Source Code for Files

For file-level defect prediction tasks, we utilize the Java Abstract Syntax Tree (AST) to extract syntactic information from source code files. Specifically, three types of AST nodes are extracted: 1) nodes of method invocations and class instance creations, e.g., in Figure 3.2, the methods createOutput() and openInput() are recorded by their method names, 2) declaration nodes, i.e., method declarations, type declarations, and enum declarations, and 3) control-flow nodes such as while statements, catch clauses, if statements, throw statements, etc. Control-flow nodes are recorded by their statement types, e.g., an if statement is simply recorded as if. In summary, for each file, we obtain a vector of tokens of the three categories. We exclude AST nodes that are not in one of these three categories, such as assignments and intrinsic type declarations, because they are often method-specific or class-specific and may not be generalizable to the whole project. Adding them may dilute the importance of other nodes. Since the names of methods, classes, and types are typically project-specific, methods with identical names in different projects are either rare or of different functionalities. Thus, for cross-project defect prediction, we extract all three categories of AST nodes, but for the AST nodes in categories 1) and 2), instead of using their names, we use their AST node types such as method declaration and method invocation. Take the project xerces as an example. As an XML parser, it consists of many methods named getXXX and setXXX, where XXX refers to XML-specific keywords including charset, type, and href.

Each of these methods contains only one method invocation statement, which is in the form of either getAttribute(XXX) or setAttribute(XXX). The methods getXXX and setXXX do not exist in other projects, while getAttribute(XXX) and setAttribute(XXX) have different meanings in other projects, so using the names getAttribute(XXX) or setAttribute(XXX) is not helpful. However, it is useful to know that method declaration nodes exist and that only one method invocation node is under each of these method declaration nodes, since it might be unlikely for a method with only one method invocation inside to be buggy. In this case, compared with using the method names, using the AST node types method declaration and method invocation is more useful since they can still provide partial semantic information.
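The token-selection rules above can be summarized by the following Python sketch. It assumes that some Java AST parser has already produced a list of (node type, name) pairs in source order; the node-type strings are illustrative and do not correspond to a specific parser's API.

# Sketch of the token-selection rules: keep invocation/instantiation and
# declaration nodes (names within-project, node types cross-project) and
# control-flow nodes (always node types); drop everything else.
INVOCATION_NODES = {"MethodInvocation", "ClassInstanceCreation"}
DECLARATION_NODES = {"MethodDeclaration", "TypeDeclaration", "EnumDeclaration"}
CONTROL_FLOW_NODES = {"IfStatement", "WhileStatement", "ForStatement",
                      "CatchClause", "ThrowStatement"}

def file_tokens(ast_nodes, cross_project=False):
    tokens = []
    for node_type, name in ast_nodes:
        if node_type in INVOCATION_NODES or node_type in DECLARATION_NODES:
            # Names rarely generalize across projects, so cross-project
            # prediction keeps only the node type instead of the name.
            tokens.append(node_type if cross_project else name)
        elif node_type in CONTROL_FLOW_NODES:
            tokens.append(node_type)
        # other node kinds (assignments, local declarations, ...) are dropped
    return tokens

nodes = [("MethodDeclaration", "copy"), ("MethodInvocation", "createOutput"),
         ("MethodInvocation", "openInput"), ("CatchClause", ""), ("IfStatement", "")]
print(file_tokens(nodes))                      # names for within-project prediction
print(file_tokens(nodes, cross_project=True))  # node types for cross-project prediction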

Parsing Source Code for Changes

Different from file-level defect prediction data, i.e., program source files, for which we can build ASTs and extract AST token vectors for feature generation, change-level defect prediction data are the changes that developers made to source files, whose syntax information is often incomplete. These changes can appear at different locations and include code additions and code deletions, which are syntactically incomplete. Thus, building ASTs for these changes is challenging. In this study, instead of building ASTs, we tokenize a change by considering the code addition, the code deletion, and the context code in the change. Code additions are the added lines in a change, code deletions are the deleted lines in a change, and the code around these additions or deletions is considered the context code. For example, Figure 3.5 shows a real change example from the project Lucene. In this change, the code addition contains lines 15 and 16, the code deletion contains lines 7 to 14, and the context contains lines 4 to 6, 17, and 18. Note that the contents of the source code lines in the additions, deletions, and context code often overlap, e.g., the deleted line 7 and the added line 15 contain the same code for the class instance creation, i.e., SolrIndexSearcher.GroupCommandFunc gc. Thus, to distinguish these lines, we add different prefixes to the raw tokens extracted from the different types of changed code. Specifically, for additions we use the prefix "added_", for deletions we use the prefix "deleted_", and for context code we use the prefix "context_". The details of the three types of tokens extracted from the example change (in Figure 3.5) are shown in Table 3.1. From Table 3.1, we can observe that the different types of tokens from the changed code snippets contain different information. For example, the context tokens show that the code is changed inside a for loop, the deletion tokens show that an if statement is removed from the source code, and the addition tokens show that an instantiation of the class GroupCommandFunc is created.

27 Table 3.1: Three types of tokens extracted from the example change shown in Figure 3.5.

Added tokens:    added_SolrIndexSearcher.GroupCommandFunc, added_gc, added_SolrIndexSearcher.GroupCommandFunc, added_gc.groupSort, added_groupSort

Deleted tokens:  deleted_SolrIndexSearcher.GroupCommandFunc, deleted_gc, deleted_if, deleted_groupSort, deleted_Notnull, deleted_SolrIndexSearcher.GroupSortCommand, deleted_gcSort, deleted_SolrIndexSearcher.GroupSortCommand, deleted_gcSort.sort, deleted_groupSort, deleted_gc, deleted_gcSort, deleted_else, deleted_gc, deleted_SolrIndexSearcher.GroupCommandFunc

Context tokens:  context_for, context_QParser, context_parser, context_QParser.getParser, context_Query, context_q, context_parser.getQuery, context_if, context_q, context_FunctionQuery, context_gc.groupBy, context_FunctionQuery, context_q.getValueSource

Intuitively, DBN-based features generated from different types of tokens may have different impacts on the performance of change-level defect prediction. To extensively explore the performance of the different types of tokens, we build and evaluate change-level defect prediction models with the seven possible combinations of the three token types, i.e., added: considers only the additions; deleted: considers only the deletions; context: considers only the context information; added+deleted: considers both the additions and the deletions; added+context: considers both the additions and the context tokens; deleted+context: considers both the deletions and the context tokens; and added+deleted+context: considers the additions, deletions, and context tokens together. We discuss the effectiveness of these different combinations in Section 3.5. Note that some of the tokens extracted from the changed code snippets are project-specific, which means that they rarely or never appear in changes from a different project. Thus, for change-level cross-project defect prediction, we first filter out variable names and then use method declaration, method invocation, and class instantiation to represent a method declaration, a method call, and an instance of a class instantiation, respectively.
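The following Python sketch illustrates the change tokenization described above: tokens from added, deleted, and context lines receive the prefixes added_, deleted_, and context_, and the use argument mirrors the seven combinations evaluated in Section 3.5. The regex-based splitting of a line into raw tokens is a simplifying assumption of this sketch.

# Sketch of the change tokenizer: prefix raw tokens by the kind of changed line
# they come from, and optionally restrict which kinds are kept.
import re

TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")

def change_tokens(diff_lines, use=("added", "deleted", "context")):
    tokens = []
    for line in diff_lines:
        if line.startswith(("+++", "---", "@@")):
            continue                      # skip diff headers
        if line.startswith("+"):
            kind, code = "added", line[1:]
        elif line.startswith("-"):
            kind, code = "deleted", line[1:]
        else:
            kind, code = "context", line
        if kind in use:
            tokens += [kind + "_" + t for t in TOKEN_RE.findall(code)]
    return tokens

hunk = ["-SolrIndexSearcher.GroupCommandFunc gc;",
        "+SolrIndexSearcher.GroupCommandFunc gc = new SolrIndexSearcher.GroupCommandFunc();",
        " if (q instanceof FunctionQuery) {"]
print(change_tokens(hunk, use=("added", "deleted")))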

3.3.2 Handling Noise and Mapping Tokens

Handling Noise

Defect data are often noisy and suffer from the mislabeling problem. Studies have shown that such noise can significantly erode the performance of defect prediction [87, 123, 261]. To prune noisy data, Kim et al. proposed an effective mislabeled-data detection approach named Closest List Noise Identification (CLNI) [123]. It identifies the k-nearest neighbors of each instance and examines the labels of its neighbors. If a certain number of neighbors have the opposite label, the examined instance is flagged as noise. However, this approach cannot be directly applied to our data because it is based on the Euclidean distance of traditional numerical features. Since our features are semantic tokens, the difference between the values of two features only indicates that the two features are different tokens. To detect and eliminate mislabeled data, and to help the DBN learn the common knowledge shared by the semantic information of buggy and clean instances, we adopt the edit distance similarity computation algorithm [195] to define the distances between instances. Edit distances are sensitive to both the tokens and the order among the tokens. Given two token sequences A and B, the edit distance d(A, B) is the minimum-weight series of edit operations that transforms A into B. The smaller d(A, B) is, the more similar A and B are.

Based on edit distance similarity, we deploy CLNI to eliminate data with potentially incorrect labels. In this study, since our purpose is not to find the best training or test set, we do not spend much effort on tuning the parameters of CLNI; we use the recommended parameters and find that they work well. In our benchmark experiments with traditional features, we also perform CLNI to remove the incorrectly labeled data. In addition, we filter out infrequent tokens extracted from the source code, which might be specific to one file and not generalizable to other files. Given a project, if the total number of occurrences of a token is less than three, we filter it out. We encode only the tokens that occur three or more times, which is a common practice in the NLP research field [158]. The same filtering process is also applied to change-level prediction tasks.
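A minimal sketch of the two cleaning steps, assuming instances are represented as token sequences: a CLNI-style check flags an instance when most of its nearest neighbors (by token edit distance) carry the opposite label, and rare tokens (fewer than three occurrences) are dropped. The neighborhood size k and the 0.6 flagging threshold are illustrative, not the recommended CLNI parameters.

# Sketch of noise handling: edit-distance-based CLNI flagging plus removal
# of tokens that occur fewer than three times in the project.
from collections import Counter

def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance over token sequences.
    prev = list(range(len(b) + 1))
    for i, ta in enumerate(a, 1):
        cur = [i]
        for j, tb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ta != tb)))
        prev = cur
    return prev[-1]

def clni_flags(instances, labels, k=5, threshold=0.6):
    flagged = []
    for i, inst in enumerate(instances):
        dists = sorted((edit_distance(inst, other), labels[j])
                       for j, other in enumerate(instances) if j != i)
        neighbours = [lab for _, lab in dists[:k]]
        opposite = sum(1 for lab in neighbours if lab != labels[i])
        flagged.append(opposite / float(len(neighbours)) >= threshold)
    return flagged   # True = likely mislabeled, candidate for removal

def drop_rare_tokens(instances, min_count=3):
    counts = Counter(t for inst in instances for t in inst)
    return [[t for t in inst if counts[t] >= min_count] for inst in instances]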

Mapping Tokens

The DBN takes only numerical vectors as inputs, and the lengths of the input vectors must be the same. To use the DBN to generate semantic features, we first build a mapping between integers and tokens and encode the token vectors as integer vectors. Each token has a unique integer identifier. Since our integer vectors may have different lengths, we append 0s to the integer vectors to make all the lengths consistent and equal to the length of the longest vector. Adding zeros does not affect the results; it is simply a representation transformation to make the vectors acceptable to the DBN. Taking the code snippets in Figure 3.2 as an example, if we only consider the two versions, the token vectors for the "Buggy" and "Clean" versions would be mapped to [1, 2, 3, 4, 5, 6, ...] and [1, 3, 5, 6, 2, 4, ...] respectively. Through this encoding process, the method invocation information and inter-class information are represented as integer vectors. In addition, some program structure information is preserved since the order of tokens remains unchanged. Note that in this work we employ the same token mapping mechanism for both the file-level and change-level defect prediction tasks.
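A small Python sketch of the mapping and padding steps is shown below; it reproduces the example integer vectors given above for the two versions in Figure 3.2. Handling of tokens unseen in the training vocabulary is omitted.

# Sketch of token-to-integer mapping with zero padding. Identifiers start at 1
# so that the padding value 0 never collides with a real token.
def build_vocabulary(token_vectors):
    vocab = {}
    for vec in token_vectors:
        for tok in vec:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
    return vocab

def encode_and_pad(token_vectors, vocab):
    encoded = [[vocab[t] for t in vec] for vec in token_vectors]
    width = max(len(v) for v in encoded)
    return [v + [0] * (width - len(v)) for v in encoded]

buggy = ["IndexOutput", "createOutput", "IndexInput", "openInput", "IOException", "try"]
clean = ["IndexOutput", "IndexInput", "IOException", "try", "createOutput", "openInput"]
vocab = build_vocabulary([buggy, clean])
print(encode_and_pad([buggy, clean], vocab))   # [[1, 2, 3, 4, 5, 6], [1, 3, 5, 6, 2, 4]]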

3.3.3 Training the DBN and Generating Features

Training the DBN

As discussed in Section 3.2, to train an effective DBN for learning semantic features, we need to tune three parameters: 1) the number of hidden layers, 2) the number of nodes in each hidden layer, and 3) the number of training iterations.

Existing studies that leveraged DBN models to generate features for NLP [238, 239] and image recognition [48, 129] reported that the performance of DBN-based features is sensitive to these parameters. A DBN with only a few hidden layers can be trained in a relatively short time, but it results in poor performance because the network cannot fully capture the characteristics of the training datasets. Too many layers may result in overfitting and a slow learning time. Similarly, too few or too many hidden nodes or iterations result in either slow learning or poor performance [239]. We show how we tune these parameters in Section 3.4.5. To simplify our model, we set the number of nodes to be the same in each layer. Through these hidden layers and nodes, the DBN obtains characteristics that are difficult to observe directly but are capable of capturing semantic differences. For each node, the DBN learns the probabilities of traversing from this node to the nodes of its top level. Through back-propagation validation, the DBN reconstructs the input data using the generated features by adjusting the weights between nodes in different hidden layers. The DBN requires the values of the input data to range from 0 to 1, while the data in our input vectors can have any integer value due to our mapping approach. To satisfy the input range requirement, we normalize the values in the data vectors of the training and test sets by using min-max normalization [293]. In our mapping process, the integer values for different tokens are just identifiers; one token with a mapping value of 1 and another with a mapping value of 2 only means that these two tokens are different and independent. Thus, the normalized values can still be used as token identifiers, since the same identifiers map to the same normalized values.
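The min-max normalization step can be sketched as follows. Computing the minimum and maximum jointly over all vectors is an assumption of this sketch; what matters, as noted above, is that equal identifiers always map to equal normalized values.

# Sketch of min-max normalization of the integer vectors into [0, 1].
def min_max_normalize(vectors):
    lo = min(v for vec in vectors for v in vec)
    hi = max(v for vec in vectors for v in vec)
    span = float(hi - lo) or 1.0          # avoid division by zero for constant data
    return [[(v - lo) / span for v in vec] for vec in vectors]

print(min_max_normalize([[1, 2, 3, 4, 5, 6], [1, 3, 5, 6, 2, 4]]))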

Generating Features

After we train a DBN, both the weights W and the biases b (details are in Section 3.2) are fixed. We input the normalized integer vectors of the training data and the test data into the DBN, and then obtain the semantic features for the training and test data from the output layer of the DBN. Note that the test data (including features and labels) are only used in the evaluation process, not in the training process.

3.3.4 Building Models and Performing Defect Prediction

After we obtain the generated semantic features for each instance from both the training and test datasets, we build defect prediction models by following the standard defect prediction process described in Section 3.2.

The test data are used to evaluate the performance of the built defect prediction models. Note that, as revealed in existing work [256, 260], the widely used k-fold cross-validation technique often introduces nontrivial bias when evaluating defect prediction models, which makes the evaluation inaccurate. In addition, for change-level defect prediction, k-fold cross-validation may make the evaluation incorrect. This is because changes follow a certain order in time; randomly partitioning the dataset into k folds may cause a model to use future knowledge, which should not be known at the time of prediction, to predict changes in the past. For example, cross-validation may use information regarding a change committed in 2017 to predict whether a change committed in 2015 is buggy or clean. This scenario would not occur in practice, because at the time of prediction, which is typically soon after the change is committed in 2015 to detect bugs early, the change committed in 2017 does not yet exist. A detailed discussion is provided in Section 3.6.1. To avoid this validation problem, we do not use k-fold cross-validation in this work. Specifically, for file-level defect prediction, we evaluate the performance of our DBN-based features and of traditional features by building prediction models with data from different releases. For change-level defect prediction, we collect the training and test datasets following the time order (details are in Section 3.4.3) to build and evaluate the prediction models without k-fold cross-validation.
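A minimal sketch of the time-ordered split used instead of cross-validation for change-level data: only changes committed before a cutoff are used for training, so no future information leaks into the model. The timestamp field name is an illustrative assumption, and the gap between the training and test data (Section 3.4.3, Figure 3.6) is omitted here for brevity.

# Sketch of a time-ordered split: train on older changes, test on newer ones.
def time_ordered_split(changes, cutoff):
    train = [c for c in changes if c["timestamp"] < cutoff]
    test = [c for c in changes if c["timestamp"] >= cutoff]
    return train, test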

3.4 Experimental Study

In this section, we describe the detailed settings for our evaluation experiments. All experiments are run on a 2.5GHz i5-3210M machine with 4GB RAM.

3.4.1 Research Questions

Table 3.2: Research questions investigated for defect prediction.

Level \ Scope    Within-project    Cross-project
File             RQ1               RQ2
Change           RQ3               RQ4

Table 3.2 lists the scenarios for the investigated research questions. Specifically, we evaluate the performance of our DBN-based semantic features by comparing them with traditional defect prediction features under each of the four prediction scenarios. These questions share the following format.

RQi (1 ≤ i ≤ 4): Do DBN-based semantic features outperform traditional features in the corresponding prediction scenario, under both the non-effort-aware and effort-aware evaluation settings?

For example, in RQ1, we explore the effectiveness of the DBN-based semantic features for within-project defect prediction at the file level under both the non-effort-aware and effort-aware evaluation scenarios.

3.4.2 Evaluation Metrics

Metrics for Non-effort-aware Evaluation

Under the non-effort-aware scenario, we use three metrics: Precision, Recall, and F1. These metrics have been widely adopted to evaluate defect prediction techniques [109, 169, 170, 194, 256, 326]. They are defined as follows:

Precision = true positive / (true positive + false positive)    (3.6)

Recall = true positive / (true positive + false negative)    (3.7)

F1 = (2 × Precision × Recall) / (Precision + Recall)    (3.8)

Precision and recall are computed from three counts: true positives, false positives, and false negatives. The true positive count is the number of predicted defective files (or changes) that are truly defective, while the false positive count is the number of predicted defective ones that are actually not defective. The false negative count is the number of predicted non-defective files (or changes) that are actually defective. Higher precision is demanded by developers who do not want to waste their debugging effort on non-defective code, while higher recall is often required for mission-critical systems, e.g., to reveal additional defects [326]. However, comparing defect prediction models using only these two metrics may be incomplete. For example, one could simply predict all instances as buggy to achieve a recall score of 1.0 (which would likely result in a low precision score),

or only classify the instances with higher confidence values as buggy to achieve a higher precision score (which could result in a low recall score). To overcome these issues, we also use the F1 score, which is the harmonic mean of precision and recall, to measure the performance of defect prediction.

Metrics for Effort-aware Evaluation

For effort-aware evaluation, we employ PofB20 [104] to measure the percentage of bugs that a developer can identify by inspecting the top 20 percent of lines of code. To calculate PofB20, we first sort all the instances in the test dataset by the confidence level (i.e., the probability of being predicted as buggy) that a defect prediction model generates for each instance, because an instance with a higher confidence level is more likely to be buggy. We then simulate a developer who inspects these potentially buggy instances in order, accumulating the lines of code (LOC) inspected and the number of bugs identified. The process terminates when 20 percent of the LOC in the test data has been inspected, and the percentage of bugs identified at that point is the PofB20 score. A higher PofB20 score indicates that a developer can detect more bugs when inspecting a limited amount of LOC.
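The PofB20 computation can be sketched as follows. How to treat the instance that straddles the 20 percent LOC boundary is a simplifying assumption of this sketch (it is skipped), and the input triples are hypothetical.

# Sketch of PofB20: sort by predicted bug probability, inspect in order, and
# report the fraction of all bugs found once 20% of the total LOC is inspected.
def pofb20(instances, budget=0.20):
    # instances: list of (probability_buggy, loc, is_buggy) triples
    total_loc = sum(loc for _, loc, _ in instances)
    total_bugs = sum(1 for _, _, buggy in instances if buggy)
    inspected_loc, found = 0, 0
    for prob, loc, buggy in sorted(instances, key=lambda x: -x[0]):
        if inspected_loc + loc > budget * total_loc:
            break
        inspected_loc += loc
        found += int(buggy)
    return 100.0 * found / total_bugs if total_bugs else 0.0

data = [(0.9, 100, True), (0.8, 300, False), (0.4, 50, True), (0.1, 550, False)]
print(pofb20(data))   # inspects up to 200 LOC -> finds 1 of 2 bugs -> 50.0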

Statistical Tests

Statistical tests help determine whether there is a statistically significant difference between two results. In this work, we use the Wilcoxon signed-rank test to check whether the performance difference between prediction models with DBN-based semantic features and prediction models with traditional features is significant. For example, in RQ3, we want to compare the performance of DBN-based features and traditional features for change-level within-project defect prediction on the projects listed in Table 3.5. To conduct the Wilcoxon signed-rank test, we first run experiments with these two sets of features and obtain prediction results for each test subject. We then apply the Wilcoxon signed-rank test to the results of the test subjects. The Wilcoxon signed-rank test does not require the underlying data to follow any particular distribution. In addition, it can be applied to pairs of data and compares their difference against zero. At the 95% confidence level, p-values less than 0.05 indicate that the difference between subjects is statistically significant, while p-values of 0.05 or larger indicate that the difference is not statistically significant.

34 Table 3.3: Cliff’s Delta and the effectiveness level [49].

Cliff’s Delta (δ) Effectiveness Level |δ| < 0.147 Negligible 0.147 ≤ |δ| < 0.33 Small 0.33 ≤ |δ| < 0.474 Medium |δ| ≥ 0.474 Large

Cliff’s Delta Effect Size Analysis

To further examine the effectiveness of our DBN-based features, following existing work [191, 299], we employ Cliff's delta (δ) [49] to measure the effect size of our approach. Cliff's delta is a non-parametric effect size measure that quantifies the amount of difference between two approaches. In this work, we use Cliff's delta to compare the defect prediction models built with our DBN-based features to the defect prediction models built with traditional features. Cliff's delta is computed as δ = 2W/(mn) − 1, where W is the W statistic of the Wilcoxon rank-sum test, and m and n are the sizes of the result distributions of the two compared approaches. The delta values range from −1 to 1, where δ = −1 or 1 indicates the absence of overlap between the performances of the two compared models (i.e., all F1 values from one prediction model are higher than the F1 values of the other prediction model, and vice versa), while δ = 0 indicates that the two prediction models completely overlap. Table 3.3 describes the meanings of the different Cliff's delta values [49].
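The sketch below computes Cliff's delta directly from its pairwise definition, which is equivalent to the 2W/(mn) − 1 formula above, and maps it to the levels in Table 3.3. The F1 values in the usage example are hypothetical.

# Sketch of Cliff's delta: the probability that a value from the first group
# exceeds one from the second group, minus the reverse probability.
def cliffs_delta(xs, ys):
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / float(len(xs) * len(ys))

def effect_level(delta):
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"

f1_semantic = [0.62, 0.58, 0.71, 0.66]      # hypothetical F1 values
f1_traditional = [0.55, 0.49, 0.60, 0.57]
d = cliffs_delta(f1_semantic, f1_traditional)
print(d, effect_level(d))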

3.4.3 Evaluated Projects and Data Sets

In this work, we use different datasets for evaluating file-level and change-level defect prediction tasks. Specifically, for evaluating the performance of DBN-based features on file-level defect prediction, we use publicly available data from the PROMISE data repository, which is widely used for evaluating file-level defect prediction models [84, 109, 193, 194, 299]. For change-level defect prediction, we adopt the dataset from previous studies [104, 256, 300]. The main reason for adopting different datasets for the file-level and change-level defect prediction tasks is that using existing, widely used datasets enables us to directly compare our approach with existing defect prediction models on the same datasets, which makes the comparison more reliable.

Table 3.4: Evaluated projects for file-level defect prediction. BR is the average buggy rate.

Project   Description                                     Releases         Avg # Files   BR (%)
ant       Java based build tool                           1.5, 1.6, 1.7    463.7         21.0
camel     Enterprise integration framework                1.2, 1.4, 1.6    815           22.5
jEdit     Text editor designed for programmers            3.2, 4.0, 4.1    297           27.4
log4j     Logging library for Java                        1.0, 1.1         122           29.1
lucene    Text search engine library                      2.0, 2.2, 2.4    260.7         56.0
xalan     A library for transforming XML files            2.4, 2.5         763           32.6
xerces    XML parser                                      1.2, 1.3         446.5         15.7
ivy       Dependency management library                   1.4, 2.0         296.5         9.4
synapse   Data transport adapters                         1.0, 1.1, 1.2    211.7         25.5
poi       Java library to access Microsoft format files   1.5, 2.5, 3.0    354.7         62.9

Table 3.5: Evaluated projects for change-level defect prediction. Lang is the programming language used for the project. LOC is the number of lines of code. First Date is the date of the first commit of a project, while Last Date is the date of the latest commit. # Ch is the number of changes. TrSize is the average size of the training data over all runs. TSize is the average size of the test data over all runs. # NR is the number of runs for each subject. BR is the average buggy rate.

Project      Lang   LOC    First Date   Last Date    # Ch   TrSize   TSize   BR (%)   # NR
Linux        C      7.3M   2005-04-16   2010-11-21   429K   1,608    6,864   22.8     4
PostgreSQL   C      289K   1996-07-09   2011-01-25   89K    1,232    6,824   27.4     7
Xorg         C      1.1M   1999-11-19   2012-06-28   46K    1,756    6,710   14.7     6
JDT          Java   1.5M   2001-06-05   2012-07-24   73K    1,367    6,974   20.5     6
Lucene       Java   828K   2010-03-17   2013-01-16   76K    1,194    9,333   23.6     8
Jackrabbit   Java   589K   2004-09-13   2013-01-14   61K    1,118    8,887   37.4     10

Evaluated Projects for File-level Defect Prediction

To facilitate the replication and verification of our experiments, we use publicly available data from the PROMISE data repository. Specifically, we select all the Java projects from PROMISE2 whose version numbers are provided. We need the version numbers of each project because we need its source code archive to extract token vectors from the ASTs of the source code to feed our DBN-based feature generation approach. In total, 10 Java projects are collected. Table 3.4 lists the versions, the average number of source files (excluding test files), and the average buggy rate of each project.

2http://openscience.us/repo/defect

Figure 3.6: Change-level data collection process [256].

The average number of files in the projects ranges from 122 to 815, and the buggy rates of the projects have a minimum value of 9.4% and a maximum value of 62.9%.

Evaluated Projects for Change-level Defect Prediction

We choose six open-source projects: Linux kernel, PostgreSQL, Xorg, JDT (from Eclipse), Lucene, and Jackrabbit. They are large, representative open-source projects covering operating systems, database management systems, and other domains. These projects have sufficient change histories to build and evaluate change-level defect prediction models and are commonly used in the literature [104, 256, 300]. For Lucene and Jackrabbit, we use manually verified bug reports from Herzig et al. [87] to label the bug-fixing changes, and the keyword search approach [245] is used for the others. Table 3.5 shows the evaluated projects for change-level defect prediction. The LOC and the number of changes in Table 3.5 include only source code (C and Java) files3 and their changes, because we want to focus on classifying source code changes only. Although these projects are written in C and Java, our DBN-based feature generation approach is not limited to any particular programming language; with an appropriate token extraction step, it can easily be extended to projects in other languages. Change-level defect data are often imbalanced [83, 104, 114, 115], i.e., there are fewer buggy instances than clean instances in the training dataset. For example, as shown in Table 3.5, the average ratio of buggy to clean changes is 1.0 to 3.1. Imbalanced data can lead to poor prediction performance [256].

3We include files with these extensions: .java, .c, .cpp, .cc, .cp, .cxx, .c++, .h, .hpp, .hh, .hp, .hxx and .h++.

For the change-level data, we borrow the data collection process introduced by Tan et al. [256]. Specifically, a gap between the training set and the test set (see Figure 3.6) is used because the gap allows more time for buggy changes in the training set to be discovered and fixed. For example, the time period between time T2 and time T4 is a gap. In this manner, the training set will be more balanced, i.e., the training set will have a higher buggy rate. A reasonable setup is to make the sum of the gap and the test set, e.g., the duration from time T2 to T5, close to the typical bug-fixing time (i.e., the time from when a bug is introduced until it is fixed). We use the recommended gap values in [256] to collect multiple runs of experimental data; e.g., Linux has four different runs during the given time period (between the First Date and Last Date), as shown in Table 3.5. Note that our previous study [256] tuned and evaluated the defect prediction models based on their precision values. In this work, we do not favor either precision or recall, and we tune and evaluate the prediction models based on the harmonic mean of precision and recall, i.e., F1 (details are in Section 3.4.2). Imbalanced data issues occur in both the file-level and the change-level defect data; as shown in Table 3.4 and Table 3.5, most of the examined projects have buggy rates less than 50%. To build optimal defect prediction models, we also apply the re-sampling technique used in existing work [256], i.e., SMOTE [40], to the imbalanced projects.
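The sketch below shows the core idea of SMOTE: synthesize new minority-class (buggy) instances by interpolating between a minority instance and one of its nearest minority neighbors. The neighborhood size k, the amount of oversampling, and the toy feature vectors are illustrative; in practice, an off-the-shelf SMOTE implementation would normally be used rather than this sketch.

# Minimal sketch of SMOTE-style oversampling of the minority (buggy) class.
import random

def smote(minority, n_new, k=5, seed=0):
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (m for m in minority if m is not base),
            key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()
        # Interpolate between the base instance and the chosen neighbour.
        synthetic.append([a + gap * (b - a) for a, b in zip(base, nb)])
    return synthetic

buggy = [[0.1, 0.9], [0.2, 0.8], [0.15, 0.85]]   # hypothetical feature vectors
print(smote(buggy, n_new=2, k=2))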

3.4.4 Baselines of Traditional Features

Baselines for Evaluating File-level Defect Prediction

To evaluate the performance of semantic features for file-level defect prediction tasks, we compare the semantic features with two different sets of traditional features. Our first baseline consists of 20 traditional features. Table 3.6 shows the details of these 20 features and their descriptions. These features and data have been widely used in previous work to build effective defect prediction models [84, 109, 169, 170, 194, 326]. We choose the widely used PROMISE data so that we can directly compare our approach with previous studies. For a fair comparison, we also perform the noise removal approach described in Section 3.3.2 on the PROMISE data. The traditional features from PROMISE do not contain the AST nodes that are used as the input by our DBN models. For a fair comparison, our second baseline of traditional features is the set of AST nodes that are given to our DBN models, i.e., the AST nodes in all files after handling the noise (Section 3.3.2). Each instance is represented as a vector of term frequencies of the AST nodes.

Table 3.6: Benchmark metrics used for file-level defect prediction.

Metric   Description
WMC      the number of methods used in a given class [46]
DIT      the maximum distance from a given class to the root of an inheritance tree [46]
NOC      the number of children of a given class in an inheritance tree [46]
CBO      the number of classes that are coupled to a given class [46]
RFC      the number of distinct methods invoked by code in a given class [46]
LCOM     the number of method pairs in a class that do not share access to any class attributes [46]
LCOM3    another type of LCOM metric proposed by Henderson-Sellers [57]
NPM      the number of public methods in a given class [22]
LOC      the number of lines of code in a given class [22]
DAM      the ratio of the number of private/protected attributes to the total number of attributes in a given class [22]
MOA      the number of attributes in a given class which are of user-defined types [22]
MFA      the number of methods inherited by a given class divided by the total number of methods that can be accessed by the member methods of the given class [22]
CAM      the sum of the number of different types of method parameters in every method divided by the product of the number of different method parameter types in the whole class and the number of methods [22]
IC       the number of parent classes that a given class is coupled to [111]
CBM      the total number of new or overwritten methods that all inherited methods in a given class are coupled to [111]
AMC      the average size of methods in a given class [111]
CA       afferent coupling, which measures the number of classes that depend upon a given class [161]
CE       efferent coupling, which measures the number of classes that a given class depends upon [161]
Max CC   the maximum McCabe's cyclomatic complexity (CC) score [163] of methods in a given class
Avg CC   the arithmetic mean of the McCabe's cyclomatic complexity (CC) scores [163] of methods in a given class

Baselines for Evaluating Change-level Defect Prediction

Our baseline features for change-level defect prediction include three types of change features, i.e., bag-of-words features, characteristic features, and meta features, which have been used in previous studies [104,256].

• Bag-of-words features: The bag-of-words feature set is a vector representing the count of occurrences of each word in the text of changes. We employ the Snowball stemmer to group words with the same root, and then use Weka [74] to obtain the bag-of-words features from both the commit messages and the source code changes.

• Characteristic features: Inspired by the Deckard tool [103], we use characteristic vectors as features. Characteristic vectors represent the syntactic structure of code by counting the number of nodes of each type in the Abstract Syntax Tree (AST). Bag-of-words and characteristic vectors have different abstraction levels: although bag-of-words can capture keywords, such as if and while, it cannot capture abstract syntactic structure, such as the number of statements. Suppose we track the if and else node types; the characteristic vector of the code before the change shown in Figure 3.5 is then (1, 1). For each change, we use Deckard [103] to automatically generate two characteristic vectors: one for the source code file before the change and one for the source code file after the change. We subtract the two characteristic vectors to obtain their difference, and use this difference together with the characteristic vector of the file after the change as two sets of features (a minimal sketch of this differencing step follows this list).

• Meta features: In addition to characteristic and bag-of-words vectors, we also use a set of metadata features, which includes the basic information of changes, e.g., commit time, filename, developers, etc. It also contains code change metrics, e.g., the added line count per change, the deleted line count per change, etc.
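The following toy sketch illustrates the characteristic-vector differencing described above; the tracked node types and token sequences are hypothetical, and in our experiments the vectors are produced by Deckard over full ASTs.

```python
# Toy sketch of the characteristic-vector idea: count selected AST node types
# before and after a change and take their difference. Node-type sequences here
# are hypothetical stand-ins for real AST traversals.
from collections import Counter

NODE_TYPES = ["if", "else", "for", "while"]      # node types tracked in the vector

def characteristic_vector(node_types_in_file):
    counts = Counter(node_types_in_file)
    return [counts[t] for t in NODE_TYPES]

before = ["if", "else"]                          # file before the change
after = ["if", "else", "if", "for"]              # file after the change

vec_before = characteristic_vector(before)       # [1, 1, 0, 0]
vec_after = characteristic_vector(after)         # [2, 1, 1, 0]
diff = [a - b for a, b in zip(vec_after, vec_before)]
print(vec_before, vec_after, diff)               # difference + after-vector are the features
```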

3.4.5 Parameter Settings for Training a DBN

Many DBN applications [48, 129, 180] report that an effective DBN requires well-tuned parameters, i.e., 1) the number of hidden layers, 2) the number of nodes in each hidden layer, and 3) the number of iterations. In this section, we study the impact of the three parameters on defect prediction models.

Setting Parameters for File-level Defect Prediction

For file-level defect prediction, we tune the three parameters by conducting experiments with different values of the parameters on ant (1.5, 1.6), camel (1.2, 1.4), jEdit (4.0, 4.1), lucene (2.0, 2.2), and poi (1.5, 2.5). Each experiment has specific values for the three parameters and runs on the five projects individually. Given an experiment, for each project, we use the older version of the project to train a DBN with respect to the specific values of the three parameters. Then, we use the trained DBN to generate semantic features for both the older and newer versions of the project. After this, we use the older version to build a defect prediction model and apply it to the newer version. Finally, we evaluate the specific values of the parameters by the average F1 score of the five projects for file-level defect prediction.

Figure 3.7: File-level defect prediction performance with different parameters.

Setting the number of hidden layers and the number of nodes in each layer. Because the number of hidden layers and the number of nodes in each hidden layer interact with each other, we tune these two parameters together. For the number of hidden layers, we experiment with 11 discrete values, i.e., 2, 3, 5, 10, 20, 50, 100, 200, 500, 800, and 1,000. For the number of nodes in each hidden layer, we experiment with eight discrete values, i.e., 20, 50, 100, 200, 300, 500, 800, and 1,000. When we evaluate these two parameters, we set the number of iterations to 50 and keep it constant. Figure 3.7 illustrates the average F1 scores obtained when tuning the number of hidden layers and the number of nodes in each hidden layer together for file-level defect prediction. When the number of nodes in each layer is fixed, the average F1 score first rises and then falls as the number of hidden layers increases, and most curves peak when the number of hidden layers is 10. If the number of hidden layers remains unchanged, the best F1 score occurs when the number of nodes in each layer is 100 (the top line in Figure 3.7). As a result, we set the number of hidden layers to 10 and the number of nodes in each hidden layer to 100. Thus, the number of DBN-based features for file-level defect prediction tasks is 100.
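The sweep over these two parameters can be summarized by the skeleton below. It is only a sketch: average_f1 is a hypothetical stub standing in for the real procedure (train a DBN with the given shape on the older version of each tuning project, generate features, build an ADTree model, and average F1 over the five projects), and the dummy surrogate it returns is not our measured data.

```python
# Skeleton of the parameter sweep over (hidden layers, nodes per layer).
# The scoring function is a stub; in the real experiments it would train a
# DBN and the downstream prediction models and return the average F1.
import itertools

HIDDEN_LAYERS = [2, 3, 5, 10, 20, 50, 100, 200, 500, 800, 1000]
NODES_PER_LAYER = [20, 50, 100, 200, 300, 500, 800, 1000]
N_ITERATIONS = 50                     # held constant while tuning the other two

def average_f1(n_layers, n_nodes, n_iter):
    """Stub: stands in for training/evaluating on the five tuning projects."""
    return -abs(n_layers - 10) - abs(n_nodes - 100) * 0.01   # dummy surrogate

best = max(itertools.product(HIDDEN_LAYERS, NODES_PER_LAYER),
           key=lambda cfg: average_f1(cfg[0], cfg[1], N_ITERATIONS))
print("best (layers, nodes):", best)   # (10, 100) with the dummy surrogate
```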

Figure 3.8: Average error rates and time costs for different numbers of iterations for tuning file-level defect prediction.

Setting the number of iterations. The number of iterations is another important parameter for building an effective DBN. During the training process, the DBN adjusts its weights to reduce the error rate between the reconstructed input data and the original input data in each iteration. In general, the higher the number of iterations, the lower the error rate. However, there is a trade-off between the number of iterations and the computational time cost. For tuning the parameters for file-level defect prediction, we choose the same five projects to conduct experiments with ten discrete values for the number of iterations. The values range from 1 to 10,000. We use the error rate to evaluate this parameter. Figure 3.8 demonstrates that, as the number of iterations increases, the error rate decreases slowly while the corresponding time cost increases exponentially. In this study, we set the number of iterations to 200, with which the average error rate is approximately 0.098 and the time cost is 15 seconds.

Setting Parameters for Change-level Defect Prediction

For change-level defect prediction, we use the same parameter tuning process as for file-level defect prediction to explore the best parameter values with all the runs of each of the six projects listed in Table 3.5. For each run of a project, we use its training data to train a DBN with respect to the specific values of the DBN parameters. Then, we use the trained DBN to generate semantic features for both the training and test datasets. Afterward, we use the training dataset to build a defect prediction model and apply it to the test dataset.

Table 3.7: Comparison of F1 scores for change-level defect prediction with DBN-based features generated from the seven different types of tokens. The F1 scores are measured as a percentage. The best F1 values are highlighted in bold.

Project: added, deleted, context, added+deleted, added+context, deleted+context, added+deleted+context
Linux: 39.2, 39.8, 32.5, 39.8, 40.1, 40.6, 41.3
PostgreSQL: 48.9, 49.5, 39.6, 49.8, 51.8, 50.1, 55.0
Xorg: 41.1, 38.4, 30.2, 40.7, 41.3, 41.2, 41.4
JDT: 39.5, 30.5, 18.7, 40.1, 39.6, 33.3, 41.4
Lucene: 37.2, 38.1, 31.4, 37.8, 38.9, 38.5, 39.7
Jackrabbit: 45.3, 44.7, 39.5, 45.6, 46.6, 47.8, 49.9
Average: 41.9, 40.2, 32.0, 42.3, 43.1, 41.9, 44.8

Last, we evaluate the specific values of the parameters by using the average F1 score of the 41 runs from the six projects. Note that, for change-level defect prediction, as we described in Section 3.3.1, we have seven different approaches available to extract the source code token vector for a source code change. Our tuning process considers these different types of tokens, the number of hidden layers, and the number of nodes in each layer together. Specifically, we feed each type of tokens into our DBN model to generate features under different configurations. Similar to our tuning process for file-level defect prediction, for the number of hidden layers, we experiment with 11 discrete values, i.e., 2, 3, 5, 10, 20, 50, 100, 200, 500, 800, and 1,000. For the number of nodes in each hidden layer, we experiment with eight discrete values, i.e., 20, 50, 100, 200, 300, 500, 800, and 1,000. When we evaluate the seven different types of tokens and the two parameters, we set the number of iterations to 50 and keep it constant. Table 3.7 shows the F1 scores of the change-level defect prediction with DBN-based semantic features generated by each of the seven types of tokens. Note that among the three basic token types (i.e., added, deleted, and context), the DBN-based features generated by added and deleted deliver better performance than context on all six projects. The improvement could be up to 20.8 percentage points (on project JDT) and on average the improvement is larger than 8 percentage points. In addition, all four combinations, i.e., added+deleted, added+context, deleted+context, and added+deleted+context, generate better performance than the corresponding basic token types. This may be because the combinations provide more information to the DBN model for generating more effective features to capture buggy changes (a detailed discussion is provided in Section 3.6.3). Among the four combinations, added+deleted+context achieves the best performance.

In this work, we use the combination of added, deleted, and context tokens as input to DBN models to generate features. The corresponding best value of the number of hidden layers is 5 and the best value of the number of nodes in each hidden layer is 50. This means that the number of generated DBN-based features for change-level defect prediction is 50. Additionally, for change-level defect prediction, we also set the number of iterations to 200, with which the average error rate is less than 0.05 and the time cost for feature generation is less than 5 seconds.
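For illustration only, the sketch below shows one plausible way the added, deleted, and context tokens of a change could be concatenated and mapped to a term-frequency vector before being fed to a DBN; the vocabulary and tokens are hypothetical, and the real encoding follows Section 3.3.1.

```python
# Minimal sketch (assumed encoding): concatenate the added, deleted, and
# context tokens of a change and map them to a fixed-size frequency vector
# that a DBN could take as input. Vocabulary and tokens are hypothetical.
from collections import Counter

vocab = ["if", "return", "foo", "bar", "unlock", "null"]
added = ["if", "foo", "unlock"]
deleted = ["return", "null"]
context = ["if", "bar"]

tokens = added + deleted + context              # added+deleted+context combination
counts = Counter(tokens)
dbn_input = [counts[t] for t in vocab]          # term-frequency vector fed to the DBN
print(dbn_input)
```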

3.4.6 File-level Within-Project Defect Prediction

To examine the performance of our semantic features on file-level within-project defect prediction, we build defect prediction models using three machine learning classifiers, i.e., ADTree, Naive Bayes, and Logistic Regression, which have been widely explored in previous work [109, 169, 170, 194, 326]. We use two consecutive versions of each project listed in Table 3.4 as the training and test data sets. We use the source code of an older version to train the DBN and generate the training feature set. Then we use the trained DBN to generate features for instances from a newer version. We compare our semantic features with the traditional features as described in Section 3.4.4. For a fair comparison, we use the same classifiers on these traditional features.
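As an illustration of this setup, the hedged sketch below trains models on features from an older version and evaluates them on a newer version with three classifiers. ADTree is not available in scikit-learn (our experiments use the Weka implementation), so a DecisionTreeClassifier stands in for it here, and the feature matrices and labels are random placeholders.

```python
# Sketch: train on the older version, test on the newer version, three classifiers.
# DecisionTreeClassifier is only a stand-in for ADTree; data are placeholders.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score

rng = np.random.RandomState(0)
X_old, y_old = rng.rand(300, 100), rng.randint(0, 2, 300)   # older version (training)
X_new, y_new = rng.rand(200, 100), rng.randint(0, 2, 200)   # newer version (test)

for clf in [DecisionTreeClassifier(), GaussianNB(), LogisticRegression(max_iter=1000)]:
    clf.fit(X_old, y_old)
    print(type(clf).__name__, "F1 =", round(f1_score(y_new, clf.predict(X_new)), 3))
```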

3.4.7 File-level Cross-Project Defect Prediction

Due to a lack of defect data, it is often difficult to build accurate prediction models for new projects. To overcome this problem, cross-project defect prediction techniques train prediction models using data from mature projects (called source projects), and use the trained models to predict defects for new projects (called target projects). However, because the features of source projects and target projects often have different distributions, building an accurate cross-project defect prediction model is still challenging [193]. We believe that the semantic features can capture the common characteristics of defects, which implies that semantic features trained on one project can be used to predict defects in a different project, making them applicable to cross-project defect prediction. To measure the performance of the semantic features in cross-project defect prediction, we propose a technique called DBN Cross-Project Defect Prediction (DBN-CP). Given a source project and a target project, DBN-CP first trains a DBN by using the source project and generates semantic features for both projects. Then, DBN-CP trains an ADTree-based defect prediction model using data from the source project and uses the built model to perform defect prediction on the target project.

We choose TCA+ [194] as our baseline. To compare with TCA+, we design two different experiments. First, for each of the 16 test versions (which are the target versions in cross-project prediction) from the within-project experiments listed in Table 3.8, we randomly select two source projects that are different from the target projects. Thus, 32 test pairs are collected. Our first experiment can help evaluate the performance of DBN-CP compared to TCA+ and the corresponding within-project defect prediction. Then, to extensively examine the performance of DBN-CP, we use each version from one project as a target project and each version from the other projects as a source project. In total, 606 test pairs are formed. The reason why we use TCA+ for comparison is that TCA+ is one of the state-of-the-art techniques in cross-project defect prediction [194]. In our reproduction, we follow the processes described in [194]. We first implement all five of their proposed normalization methods and assign them the same conditions as given in the TCA+ paper. We then perform Transfer Component Analysis [208] on the source projects and the target projects together, and map them onto the same subspace while minimizing the data difference and maximizing the data variance. Finally, we use the source projects and target projects with the new features to build and evaluate the ADTree-based prediction models.

3.4.8 Change-level Within-Project Defect Prediction

To examine the effectiveness of the learned DBN-based features for change-level defect prediction tasks, we compare the performance of the DBN-based features to the three types of traditional features described in Section 3.4.4. Combining these traditional features has been shown to produce the best performance for change-level defect prediction [104, 256]; in this work, we use this combination as the benchmark for change-level defect prediction. To generate DBN-based semantic features, for each run of a project listed in Table 3.5, we use its training data to train a DBN (with the combination of all the tokens in a change as the input to the DBN). Then, we use the trained DBN to generate semantic features for both the training and test datasets. We then use the training data to build a defect prediction model and apply it to the test data. For the classification algorithm, we use ADTree in Weka [74] as the classifier, because it has delivered the best performance in previous work [104,194,256].

3.4.9 Change-level Cross-Project Defect Prediction

Similar to file-level defect models, change-level models also require a large amount of training data to train and build prediction models. However, sufficient training data are often not available when projects are in their initial development phases. To address this limitation, cross-project models for change-level prediction tasks are needed [114]. To explore the performance of the DBN-based semantic features in change-level cross-project defect prediction, we propose a technique called DBN Change-level Cross-Project defect Prediction (DBN-CCP). Specifically, given a source project and a target project, DBN-CCP first trains a DBN by using the source project and generates semantic features for both the source project and the target project. Then, DBN-CCP trains a defect prediction model using data from the source project, and uses the built model to perform defect prediction on the target project. For evaluating the performance of DBN-CCP, we also choose TCA+ [194] as our baseline. Note that TCA+ requires that the target and source projects have the same features for learning TCA+ based features. As described in Section 3.4.4, in this study we leverage three different types of features for change-level defect prediction, i.e., bag-of-words features, characteristic features, and meta features. Both the bag-of-words features and the characteristic features are project-specific and vary across projects. Thus, for TCA+ on change-level prediction, we only use the meta features. To extensively evaluate the performance of DBN-CCP, we use each test dataset in all runs from one project as a target dataset and each training dataset in all runs from the other projects as a source dataset to form change-level cross-project test pairs. For example, one test pair could be a training set from Run 1 of Project A and a test set from Run 1 of Project B, a training set from Run 2 of Project A and a test set from Run 1 of Project B, etc. In total, 1,380 test pairs are formed.
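For illustration, the sketch below enumerates cross-project test pairs in the way described above, pairing every training run of a source project with every test run of every other (target) project; the per-project run counts are hypothetical placeholders, so the printed total will differ from our 1,380 pairs.

```python
# Sketch of how the change-level cross-project test pairs could be enumerated.
# Run counts here are placeholders, not the real numbers from Table 3.5.
from itertools import product

runs_per_project = {"Linux": 4, "PostgreSQL": 9, "Xorg": 7,
                    "JDT": 7, "Lucene": 6, "Jackrabbit": 8}   # hypothetical run counts

pairs = [(src, s_run, tgt, t_run)
         for (src, n_src), (tgt, n_tgt) in product(runs_per_project.items(), repeat=2)
         if src != tgt
         for s_run in range(1, n_src + 1)        # training set of a source run
         for t_run in range(1, n_tgt + 1)]       # test set of a target run

print(len(pairs), "cross-project test pairs")
```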

3.5 Results and Analysis

3.5.1 RQ1: Performance of semantic features for file-level within-project defect prediction

Non-effort-aware evaluation scenario

We build file-level within-project defect prediction models to compare the impact of three sets of features: semantic features that are automatically learned by the DBN, PROMISE features, and AST features. The latter two are the baselines of traditional features. We conduct 16 sets of file-level within-project defect prediction experiments, each of which uses two versions from the same project (listed in Table 3.4). The older version is used to train the prediction models, and the newer version is used as the test set to evaluate the trained models.

Table 3.8 shows the performance of the file-level within-project defect prediction experiments. The highest F1 values of the three sets of features are shown in bold. For example, by using ant 1.6 as the training set and ant 1.7 as the test set, the F1 of using semantic features is 94.2%, while the F1 is only 54.2% with the first baseline of traditional features (from PROMISE), and 47.0% with the second baseline of traditional features (AST nodes). For this comparison, the only difference is the set of features; the same classification algorithm, namely ADTree, and the same training and test sets are used. The results demonstrate that by using the DBN-based semantic features instead of the PROMISE features, we can improve the F1 by 14.2 percentage points on the 16 experiment pairs on average. The average improvements in precision and recall are 14.7 percentage points and 11.5 percentage points respectively. Since the DBN algorithm involves randomness, the generated features vary between different runs. Therefore, we run our DBN-based feature generation approach five times for each experiment. Among the runs, the difference in the generated features is at the level of 1.0e-20, which is too small to propagate to precision, recall, and F1. In other words, the precision, recall, and F1 of all five runs are identical.

Effort-aware evaluation scenario

For the effort-aware scenario, we rerun the 16 pairs of file-level within-project defect prediction experiments listed in Table 3.8, and calculate the PofB20 of the test data in each experiment based on our setup described in Section 3.4.2. Table 3.9 presents the PofB20 of the file-level within-project defect prediction models with DBN-based semantic features and the PROMISE features. In all the experiments, DBN-based features achieve better PofB20 than the corresponding PROMISE features. Compared to the PROMISE features, the improvement could be as much as 26.7 percentage points (ant 1.6 ⇒ ant 1.7) and is, on average, 13.3 percentage points.
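For reference, the sketch below shows how a PofB20-style score can be computed under the usual definition we follow in Section 3.4.2 (rank files by predicted defect probability, inspect files until 20% of the total LOC is reached, and report the percentage of buggy files caught); all inputs are synthetic placeholders.

```python
# Hedged sketch of PofB20: inspect the most suspicious files until 20% of the
# total LOC budget is used, then report the fraction of buggy files caught.
import numpy as np

rng = np.random.RandomState(1)
loc = rng.randint(50, 2000, size=100)          # LOC of each test file
is_buggy = rng.rand(100) < 0.3                 # ground-truth labels
score = rng.rand(100)                          # predicted defect probability

order = np.argsort(-score)                     # inspect most suspicious files first
budget = 0.2 * loc.sum()
inspected = np.cumsum(loc[order]) <= budget
pofb20 = 100.0 * is_buggy[order][inspected].sum() / max(is_buggy.sum(), 1)
print("PofB20 = %.1f%%" % pofb20)
```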

Table 3.8: Comparison between semantic features and two baselines of traditional features (PROMISE features and AST features) using ADTree. Tr denotes the training set version and T denotes the test set version. P, R, and F1 denote the precision, recall, and F1 score respectively and are measured as a percentage. The better F1 values with statistical significance (p-value < 0.05) among the three sets of features are shown with an asterisk (*). The numbers in parentheses are the effect sizes relative to the semantic features. A positive value indicates that the semantic features improve the baseline features in terms of the effect size.

Project (Tr ⇒ T): Semantic* P/R/F1; PROMISE (0.555) P/R/F1; AST (0.656) P/R/F1
ant 1.5 ⇒ 1.6: 88.0/95.1/91.4; 44.8/51.1/47.7; 40.5/51.4/45.3
ant 1.6 ⇒ 1.7: 98.8/90.1/94.2; 41.8/77.1/54.2; 41.2/54.7/47.0
camel 1.2 ⇒ 1.4: 96.0/66.4/78.5; 24.8/75.2/37.3; 32.3/55.6/40.2
camel 1.4 ⇒ 1.6: 26.3/64.9/37.4; 28.3/63.7/39.1; 29.7/51.5/38.3
jEdit 3.2 ⇒ 4.0: 46.7/74.7/57.4; 44.7/73.3/55.6; 45.8/47.4/46.6
jEdit 4.0 ⇒ 4.1: 54.4/70.9/61.5; 46.1/67.1/54.6; 50.4/40.4/44.8
log4j 1.0 ⇒ 1.1: 67.5/73.0/70.1; 49.1/73.0/58.7; 55.4/38.6/45.5
lucene 2.0 ⇒ 2.2: 75.9/56.9/65.1; 73.3/38.2/50.2; 69.5/37.4/48.4
lucene 2.2 ⇒ 2.4: 66.5/92.1/77.3; 70.9/52.7/60.5; 65.9/53.1/58.8
xalan 2.4 ⇒ 2.5: 65.0/54.8/59.5; 64.7/43.2/51.8; 60.1/43.5/50.5
xerces 1.2 ⇒ 1.3: 40.3/42.0/41.1; 16.0/46.4/23.8; 25.5/22.0/23.6
ivy 1.4 ⇒ 2.0: 21.7/90.0/35.0; 22.6/60.0/32.9; 31.6/28.6/30.0
synapse 1.0 ⇒ 1.1: 46.0/66.7/54.4; 45.5/50.0/47.6; 51.5/45.7/48.4
synapse 1.1 ⇒ 1.2: 57.3/59.3/58.3; 51.1/55.8/53.3; 50.7/40.5/49.0
poi 1.5 ⇒ 2.5: 76.1/55.2/64.0; 73.7/44.8/55.8; 70.0/31.6/43.5
poi 2.5 ⇒ 3.0: 81.6/79.0/80.3; 75.0/75.8/75.4; 72.1/46.3/55.6
Average: 63.0/70.7/64.1; 48.3/59.2/49.9; 49.5/43.0/44.7

We further conduct the Wilcoxon signed-rank test (p < 0.05) to compare the performance of the DBN-based semantic features and PROMISE features for file-level within-project defect prediction with the 16 experiment pairs under both the non-effort-aware and effort-aware evaluation scenarios. The results suggest that the DBN-based semantic features are significantly better than the PROMISE features.
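A sketch of this comparison is shown below: it runs the Wilcoxon signed-rank test on the paired F1 values from Table 3.8 and, as one common effect-size measure, computes Cliff's delta; the exact effect-size statistic used for the numbers reported in our tables is assumed here rather than shown in this excerpt.

```python
# Sketch: Wilcoxon signed-rank test over the paired F1 scores of the 16
# experiments (values taken from Table 3.8), plus Cliff's delta as one
# common (assumed) effect-size measure.
import numpy as np
from scipy.stats import wilcoxon

f1_semantic = np.array([91.4, 94.2, 78.5, 37.4, 57.4, 61.5, 70.1, 65.1,
                        77.3, 59.5, 41.1, 35.0, 54.4, 58.3, 64.0, 80.3])
f1_promise = np.array([47.7, 54.2, 37.3, 39.1, 55.6, 54.6, 58.7, 50.2,
                       60.5, 51.8, 23.8, 32.9, 47.6, 53.3, 55.8, 75.4])

stat, p = wilcoxon(f1_semantic, f1_promise)
delta = np.mean([np.sign(a - b) for a in f1_semantic for b in f1_promise])
print("Wilcoxon p =", p, "Cliff's delta =", round(delta, 3))
```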

Our DBN-based approach is effective in automatically learning semantic features, which significantly improves the performance of file-level within-project defect prediction under both non-effort-aware and effort-aware evaluation scenarios with large effect sizes.

Table 3.9: PofB20 scores of DBN-based features and traditional features for WPDP. The PofB20 scores are measured as a percentage. The best values are in bold. The better PofB20 values with statistical significance (p-value < 0.05) between the two sets of features are indicated with an asterisk (*). The numbers in parentheses are the effect sizes relative to the semantic features.

Project (Tr ⇒ T): Semantic*, PROMISE (0.756)
ant 1.5 ⇒ 1.6: 44.3, 16.3
ant 1.6 ⇒ 1.7: 50.2, 23.5
camel 1.2 ⇒ 1.4: 33.2, 33.8
camel 1.4 ⇒ 1.6: 30.1, 23.4
jEdit 3.2 ⇒ 4.0: 40.1, 29.3
jEdit 4.0 ⇒ 4.1: 32.6, 17.7
log4j 1.0 ⇒ 1.1: 25.0, 21.6
lucene 2.0 ⇒ 2.2: 32.1, 14.6
lucene 2.2 ⇒ 2.4: 37.9, 23.2
xalan 2.4 ⇒ 2.5: 24.5, 8.3
xerces 1.2 ⇒ 1.3: 9.1, 7.2
ivy 1.4 ⇒ 2.0: 28.3, 15.1
synapse 1.0 ⇒ 1.1: 29.6, 13.3
synapse 1.1 ⇒ 1.2: 32.5, 12.8
poi 1.5 ⇒ 2.5: 38.7, 26.2
poi 2.5 ⇒ 3.0: 25.5, 13.9
Average: 32.1, 18.8

RQ1a: Do semantic features outperform traditional features with other classification algorithms?

To answer this question, we build file-level within-project defect prediction models by using two alternative classification algorithms, i.e., Naive Bayes and Logistic Regression. We conduct 16 sets of file-level within-project defect prediction tests, where the training sets and the test sets are exactly the same as those in RQ1. Table 3.10 shows the F1 scores of running Naive Bayes and Logistic Regression on semantic features and PROMISE features. Take ant as an example: when the model is built with Naive Bayes, by choosing version 1.5 as the training set and 1.6 as the test set, the semantic features produce an F1 of 63.0%, which is 7.0 percentage points higher than using PROMISE features. For the same example with Logistic Regression as the classification algorithm, the semantic features achieve an F1 of 91.6%, while using PROMISE features produces an F1 of only 50.6%. Among the

Table 3.10: Comparison of F1 scores between semantic features and PROMISE features using Naive Bayes and Logistic Regression. Tr denotes the training set version and T denotes the test set version. The F1 scores are measured as a percentage.

Project (Tr ⇒ T): Naive Bayes Semantic/PROMISE; Logistic Regression Semantic/PROMISE
ant 1.5 ⇒ 1.6: 63.0/56.0; 91.6/50.6
ant 1.6 ⇒ 1.7: 96.1/52.2; 92.5/54.3
camel 1.2 ⇒ 1.4: 45.9/30.7; 59.8/36.3
camel 1.4 ⇒ 1.6: 48.1/26.5; 34.2/34.6
jEdit 3.2 ⇒ 4.0: 58.3/48.6; 55.2/54.5
jEdit 4.0 ⇒ 4.1: 60.9/54.8; 62.3/56.4
log4j 1.0 ⇒ 1.1: 72.5/68.9; 68.2/53.5
lucene 2.0 ⇒ 2.2: 63.2/50.0; 63.0/59.8
lucene 2.2 ⇒ 2.4: 73.8/37.8; 62.9/69.4
xalan 2.4 ⇒ 2.5: 45.2/39.8; 56.5/54.0
xerces 1.2 ⇒ 1.3: 38.0/33.3; 47.5/26.6
ivy 1.4 ⇒ 2.0: 34.4/38.9; 34.8/24.0
synapse 1.0 ⇒ 1.1: 47.9/50.8; 42.3/31.6
synapse 1.1 ⇒ 1.2: 57.9/56.5; 54.1/53.3
poi 1.5 ⇒ 2.5: 77.0/32.3; 66.4/50.3
poi 2.5 ⇒ 3.0: 77.7/46.2; 78.3/74.5
Average: 60.0/45.2; 59.7/49.0

experiments with either Naive Bayes or Logistic Regression as the classification algorithm, the semantic features outperform the PROMISE features 14 out of the 16 times. On average, the Naive Bayes based defect prediction model with semantic features achieves an F1 of 60.0%, which is 14.8 percentage points higher than the Naive Bayes with PROMISE features. Similarly, the average F1 of using semantic features with Logistic Regression is 59.7%, which is 10.7 percentage points higher than Logistic Regression with PROMISE features.

The semantic features automatically learned from the DBN improve the file-level within-project defect prediction, and the improvement is not tied to a particular classification algorithm.

Figure 3.9: Results of the DBN-CP, TCA+, and Baseline for CPDP.

Table 3.11: F1 scores of the file-level cross-project defect prediction for target projects explored in RQ1. The F1 scores are measured as a percentage. Better F1 values with statistical significance (p-value < 0.05) between DBN-CP and TCA+ are indicated with an asterisk (*). The numbers in parentheses are the effect sizes relative to DBN-CP.

Source ⇒ Target: DBN-CP*, TCA+ (0.274); the within-project F1 with semantic features is listed once per target project.
camel1.4 ⇒ ant1.6: 97.9, 61.6 (within-project: 91.4)
poi3.0 ⇒ ant1.6: 47.8, 59.8
camel1.2 ⇒ ant1.7: 31.2, 35.4 (within-project: 94.2)
jEdit3.2 ⇒ ant1.7: 41.7, 45.5
ant1.6 ⇒ camel1.4: 31.6, 29.2 (within-project: 78.5)
jEdit4.1 ⇒ camel1.4: 69.3, 33.0
ant1.5 ⇒ camel1.6: 49.0, 21.3 (within-project: 37.4)
lucene2.0 ⇒ camel1.6: 49.2, 32.1
xerces1.2 ⇒ jEdit4.0: 51.9, 32.7 (within-project: 57.4)
ivy1.4 ⇒ jEdit4.0: 35.0, 50.2
camel1.4 ⇒ jEdit4.1: 61.5, 53.7 (within-project: 61.5)
log4j1.1 ⇒ jEdit4.1: 50.3, 41.9
jEdit4.1 ⇒ log4j1.1: 64.5, 57.4 (within-project: 70.1)
lucene2.2 ⇒ log4j1.1: 61.8, 57.1
xalan2.5 ⇒ lucene2.2: 59.4, 56.1 (within-project: 65.1)
log4j1.1 ⇒ lucene2.2: 69.2, 52.4
poi2.5 ⇒ lucene2.4: 64.9, 54.4 (within-project: 77.3)
xalan2.4 ⇒ lucene2.4: 61.6, 60.9
lucene2.2 ⇒ xalan2.5: 55.0, 53.0 (within-project: 59.5)
xerces1.3 ⇒ xalan2.5: 57.2, 58.1
xalan2.5 ⇒ xerces1.3: 38.6, 39.4 (within-project: 41.1)
ivy2.0 ⇒ xerces1.3: 42.6, 39.8
xerces1.3 ⇒ ivy2.0: 45.3, 40.9 (within-project: 35.0)
synapse1.2 ⇒ ivy2.0: 82.4, 38.3
ivy1.4 ⇒ synapse1.1: 48.9, 34.8 (within-project: 54.4)
poi2.5 ⇒ synapse1.1: 42.5, 37.6
ivy2.0 ⇒ synapse1.2: 43.3, 57.0 (within-project: 58.3)
poi3.0 ⇒ synapse1.2: 51.4, 54.2
synapse1.2 ⇒ poi3.0: 66.1, 65.1 (within-project: 80.3)
ant1.6 ⇒ poi3.0: 61.9, 34.3
synapse1.1 ⇒ poi2.5: 44.6, 40.6 (within-project: 64.0)
ant1.6 ⇒ poi2.5: 47.5, 44.7
Average: 53.9, 46.1 (within-project average: 64.1)

Table 3.12: F1 scores of file-level cross-project defect prediction for all projects listed in Table 3.4. The F1 scores are measured as a percentage. Better F1 values with statistical significance (p-value < 0.05) between DBN-CP, TCA+, and Baseline are shown with an asterisk (*). Numbers in parentheses are effect sizes relative to DBN-CP. For each target project, the source is the set of all other projects.

Target: DBN-CP, TCA+, Baseline, Within-Project
ant: 57.3*, 50.9 (0.638), 38.9 (0.774), 92.8
camel: 46.1*, 32.7 (0.484), 22.7 (0.685), 57.9
jEdit: 49.7*, 43.9 (0.405), 44.5 (0.261), 59.4
log4j: 56.2*, 52.9 (0.231), 38.7 (0.450), 70.1
lucene: 43.9*, 41.0 (0.522), 31.7 (0.541), 71.2
xalan: 46.2*, 44.7 (0.363), 35.2 (0.577), 59.5
xerces: 39.7*, 33.4 (0.570), 34.6 (0.513), 41.1
ivy: 41.4*, 28.6 (0.148), 31.7 (0.782), 35.0
synapse: 50.2*, 47.9 (0.336), 35.3 (0.636), 56.4
poi: 63.2*, 58.1 (0.307), 34.9 (0.592), 72.2
Average: 49.4, 43.4 (0.401), 34.8 (0.628), 61.6

3.5.2 RQ2: Performance of semantic features for file-level cross-project defect prediction

Non-effort-aware evaluation scenario

To answer this question, we compare our file-level cross-project defect prediction technique DBN-CP with TCA+ [194]. DBN-CP runs on the semantic features that are automatically generated by the DBN, while TCA+ uses the PROMISE features. For a fair comparison, we also provide a benchmark of within-project defect prediction. As described in Section 3.4.3, our preliminary experimental evaluation includes a set of 32 cross-project test pairs. Each experiment takes two versions from two different projects, with one used as the training set and the other used as the test set. The benchmark of the file-level within-project defect prediction uses the data from an older version of the target project as the training set.

Table 3.11 lists the F1 scores of the DBN-CP, TCA+, and the benchmark within-project defect prediction. The better F1 scores between DBN-CP and TCA+ are in bold. Regarding the average F1, DBN-CP achieves 53.9%, which is 7.8 percentage points higher than the 46.1% of TCA+. The statistical test also suggests that DBN-CP is overall significantly better than TCA+. As described in Section 3.4.3, to extensively evaluate the performance of DBN-CP, we use each version from one project as the target project and one version from the other projects as the source project to form a file-level cross-project experiment test pair. This experiment includes 606 test pairs. Specifically, DBN-CP runs on the semantic features and TCA+ runs on features generated from the PROMISE features. We also provide two benchmarks, i.e., Baseline and Within-Project. Baseline is the result of cross-project defect prediction with the original PROMISE features.

Table 3.12 shows the average F1 scores of the DBN-CP, TCA+, Baseline, and Within-Project defect prediction on each of the file-level projects. Overall, both DBN-CP and TCA+ deliver better performance than Baseline. Moreover, DBN-CP generates a better F1 than TCA+ on all 10 projects listed, and the improvement is as much as 12.8 percentage points (ivy) and is, on average, 6.0 percentage points. Compared with the within-project defect prediction, DBN-CP improves the cross-project defect prediction by reducing the gap to approximately 12 percentage points. The statistical tests also show that overall DBN-CP is significantly better than both TCA+ and Baseline. Figure 3.9 shows the boxplots of the F1 scores for DBN-CP, TCA+, and Baseline for the 10 projects listed in Table 3.4. Specifically, each boxplot presents the F1 distribution (median and upper/lower quartiles) of each of the three approaches for cross-project file-level defect prediction. The boxplots indicate that overall, both DBN-CP and TCA+ perform better than Baseline, and that DBN-CP performs better than TCA+ and Baseline on almost all projects.

Effort-aware evaluation scenario

For the effort-aware evaluation, we also calculate the PofB20 for the DBN-CP, TCA+, and Baseline approaches on each of the target projects. Table 3.13 shows the PofB20 of the three file-level cross-project defect prediction mod- els. The highest PofB20 values among the three approaches are shown in bold. In all the experiments, DBN-CP achieves better PofB20 than both TCA+ and Baseline. The PofB20 scores of DBN-CP vary from 21.8 to 37.6 percentage points across the 606 experiments, and the average PofB20 score of DBN-CP is 29.5 percentage points. Compared to TCA+, the improvement is as high as 21.8 percentage points (Poi) and is, on average, 10.3 per- centage points. The results of the Wilcoxon signed-rank test (p < 0.05) also indicate that DBN-CP is overall significantly better than both TCA+ and Baseline.

Table 3.13: PofB20 scores of DBN-based features and traditional features for CPDP. PofB20 scores are measured as a percentage. Better PofB20 values with statistical significance (p-value < 0.05) among DBN-CP, TCA+, and Baseline are shown with an asterisk (*). Numbers in parentheses are effect sizes relative to DBN-CP. For each target project, the source is the set of all other projects.

Target: DBN-CP, TCA+, Baseline
ant: 28.3*, 28.1 (0.434), 18.3 (0.888)
camel: 32.7*, 14.8 (0.982), 10.7 (1)
jEdit: 23.2*, 21.8 (0.302), 19.6 (0.462)
log4j: 28.6*, 19.1 (0.787), 18.4 (0.855)
lucene: 30.5*, 15.6 (1), 10.9 (1)
xalan: 37.6*, 15.5 (0.900), 14.6 (0.910)
xerces: 29.1*, 22.5 (0.479), 12.9 (0.855)
ivy: 26.5*, 20.1 (0.488), 17.9 (0.789)
synapse: 21.8*, 19.2 (0.133), 15.7 (0.450)
poi: 36.7*, 14.9 (0.994), 11.2 (1)
Average: 29.5, 19.2 (0.650), 15.0 (0.821)

DBN-CP significantly improves the performance of file-level cross-project defect prediction under both non-effort-aware and effort-aware evaluation scenarios with a nontrivial effect. This implies that the semantic features learned by the DBN are effective and are able to capture the common characteristics of defects across projects.

3.5.3 RQ3: Performance of semantic features for change-level within-project defect prediction

Non-effort-aware evaluation scenario

To answer this question, we use different features to build change-level within-project defect prediction models, i.e., the DBN-based semantic features and the three types of change features described in Section 3.4.4 (the bag-of-words features, the characteristic features, and the meta features). As described in Section 3.4.3, in the change-level dataset, each project has multiple runs. Thus, we use the training data from each run to build and train the ADTree-based prediction model and evaluate its performance on the test data of that run. To show the overall performance, we use the weighted average precision, recall, and F1, following existing work [104,256].

Figure 3.10: Results of DBN-CCP, TCA+, and Baseline for CCDP.

Table 3.14: Overall results of the change-level within-project defect prediction. All values are measured as a percentage. The better F1 values with statistical significance (p-value < 0.05) between the two sets of features are indicated with an asterisk (*). The numbers in parentheses are the effect sizes relative to the Semantic Features in terms of F1.

Linux: Change Features [104,256] (0.821): P 28.7, R 49.5, F1 36.3; Semantic Features: P 32.5, R 56.9, F1 41.3*
PostgreSQL: Change Features (1): P 44.1, R 49.0, F1 46.4; Semantic Features: P 51.6, R 58.8, F1 55.0*
Xorg: Change Features (0.653): P 29.1, R 51.5, F1 37.2; Semantic Features: P 31.7, R 59.4, F1 41.4*
JDT: Change Features (0.421): P 32.5, R 41.5, F1 36.4; Semantic Features: P 30.2, R 65.9, F1 41.4*
Lucene: Change Features (0.136): P 33.0, R 46.8, F1 38.7; Semantic Features: P 31.4, R 54.0, F1 39.7*
Jackrabbit: Change Features (0.525): P 46.0, R 44.6, F1 45.3; Semantic Features: P 49.3, R 50.4, F1 49.9*
Average: Change Features (0.593): P 36.3, R 50.6, F1 40.1; Semantic Features: P 37.6, R 56.6, F1 44.8

Table 3.14 shows the precision, recall, and F1 of the within-project change-level defect prediction experiments. Overall, the DBN-based features generate better results than the traditional change features in terms of F1. Specifically, across all the projects, the DBN-based features improve the F1 of the best existing change features by up to 8.6 percentage points, and by 4.7 percentage points on average.

Effort-aware evaluation scenario

We further evaluate the DBN-based semantic features and traditional change features for change-level within-project defect prediction with the PofB20 metric. Table 3.15 shows the PofB20 of the change-level within-project defect prediction models with DBN-based semantic features and the traditional change features. The DBN-based features achieve better PofB20 scores than the corresponding change features in all the experiment pairs. The PofB20 scores (measured as a percentage) of DBN-based features vary from 23.8 to 37.6 across the experiments, and the average PofB20 score of the defect prediction models with DBN-based features is 29.2. Compared to the change features, the improvement could be up to 21.1 percentage points (PostgreSQL) and is 9.9 percentage points, on average. In addition, the statistical test, i.e., the Wilcoxon signed-rank test (p < 0.05), also suggests that the DBN-based features are overall significantly better than the change features under both the non-effort-aware and effort-aware evaluation scenarios.

Table 3.15: PofB20 scores of the DBN-based features and the traditional features for WCDP. The PofB20 scores are measured as a percentage. The best values are in bold. The better PofB20 values with statistical significance (p-value < 0.05) among these two sets of features are indicated with an asterisk (*). The numbers in parentheses are the effect sizes relative to the DBN-based semantic features.

Project: Semantic Features, Change Features
Linux: 28.6*, 25.0 (0.324)
PostgreSQL: 29.2*, 8.1 (1)
Xorg: 37.6*, 24.8 (0.901)
JDT: 23.8*, 14.5 (1)
Lucene: 28.1*, 21.9 (0.887)
Jackrabbit: 27.9*, 21.3 (0.621)
Average: 29.2, 19.3 (0.789)

Table 3.16: F1 scores of change-level cross-project defect prediction for all projects. The F1 scores are measured as percentages. The better F1 values with statistical significance (p-value < 0.05) between DBN-CCP, TCA+, and Baseline are indicated with an asterisk (*). The numbers in parentheses are the effect sizes relative to DBN-CCP. For each target project, the source is the set of all other projects.

Target: DBN-CCP, TCA+, Baseline, Within-Project
Linux: 35.1*, 32.4 (0.295), 24.7 (0.439), 41.3
PostgreSQL: 44.2*, 43.6 (0.130), 25.7 (0.370), 55.0
Xorg: 31.8*, 30.4 (0.128), 22.8 (0.310), 41.4
JDT: 33.3*, 27.3 (0.360), 22.6 (0.566), 41.4
Lucene: 31.3*, 30.2 (0.129), 21.3 (0.520), 39.7
Jackrabbit: 44.4*, 43.3 (0.131), 26.5 (0.463), 49.9
Average: 36.7, 34.5 (0.196), 23.9 (0.445), 44.7

The semantic features automatically learned from the DBN could improve the change-level within-project defect prediction with statistical significance under both non-effort-aware and effort-aware evaluation scenarios with nontrivial effect sizes.

3.5.4 RQ4: Performance of semantic features for change-level cross-project defect prediction

Non-effort-aware evaluation scenario

To answer this question, we compare our cross-project change-level defect prediction technique DBN-CCP with TCA+. For a fair comparison, we also provide two benchmarks, i.e., Baseline and Within-Project. Baseline is the result of change-level defect prediction with the original change features. As we described in Section 3.4.9, we use the test data of one run from one project as the target project and the training data of one run from a different project as the source project to form the change-level cross-project test pairs (in total 1,380 pairs). For each test pair, we build the ADTree-based defect prediction model using the three different sets of features. Table 3.16 shows the average F1 scores of the DBN-CCP, TCA+, Baseline, and Within-Project for each of the change-level projects. Overall, both DBN-CCP and TCA+ deliver better performance than Baseline. Moreover, on average DBN-CCP generates a better F1 than TCA+ on all the projects. The improvement is as high as 6.0 percentage points and is 2.2 percentage points, on average. Compared to the within-project defect prediction, DBN-CCP improves the cross-project defect prediction by reducing the gap to only 8.0 percentage points. Figure 3.10 shows the boxplots of the F1 scores for DBN-CCP, TCA+, and Baseline for the six projects listed in Table 3.5. Specifically, each boxplot presents the F1 distribution (median and upper/lower quartiles) of each of the three approaches for the change-level cross-project defect prediction. The boxplots show that overall both DBN-CCP and TCA+ perform better than Baseline; moreover, DBN-CCP performs better than TCA+ and Baseline on almost all projects.

Effort-aware evaluation scenario

We also calculate the PofB20 score for the DBN-CCP, TCA+, and Baseline approaches on each of the target projects when conducting change-level cross-project defect prediction. Table 3.17 shows the PofB20 values of the three change-level cross-project defect prediction models. The highest PofB20 values among the three approaches are shown in bold. DBN-CCP achieves better PofB20 scores than both TCA+ and Baseline. On average, the PofB20 score (measured as a percentage) of DBN-CCP is 21.9. Compared to TCA+, the improvement can be up to 3.0 percentage points (JDT) and is 1.3 percentage points, on average. The results of the Wilcoxon signed-rank test (p < 0.05) also indicate that the performance of DBN-CCP is, overall, significantly better than TCA+ and Baseline under both the non-effort-aware and effort-aware evaluation scenarios.

Table 3.17: PofB20 scores of the DBN-based features and traditional features for CCDP. The PofB20 scores are measured as a percentage. The best values are in bold. The better PofB20 values with statistical significance (p-value < 0.05) among the DBN-CCP, TCA+, and Baseline are indicated with an asterisk (*). The numbers in parentheses are the effect sizes relative to DBN-CCP. For each target project, the source is the set of all other projects.

Target: DBN-CCP, TCA+, Baseline
Linux: 24.7*, 24.1 (0.255), 18.5 (0.500)
PostgreSQL: 20.7, 20.3 (0.019), 15.9 (0.438)
Xorg: 22.7*, 22.0 (0.110), 19.7 (0.511)
JDT: 25.6*, 22.6 (0.273), 13.5 (0.360)
Lucene: 18.1, 18.0 (0.030), 17.0 (0.371)
Jackrabbit: 19.3*, 16.4 (0.352), 16.1 (0.343)
Average: 21.9, 20.6 (0.180), 16.8 (0.421)

Table 3.18: Time and space costs of generating semantic features for file-level defect prediction (s: second).

Project: Time (s), Memory (MB)
ant: 15.5, 2.8
camel: 32.0, 5.5
jEdit: 18.1, 3.3
log4j: 10.1, 2.2
lucene: 11.1, 2.4
xalan: 29.6, 6.2
xerces: 13.9, 5.8
ivy: 8.0, 2.2
synapse: 8.5, 1.9
poi: 11.9, 4.4

DBN-CCP significantly improves the performance of the change-level cross-project defect prediction compared to the traditional change features with a nontrivial effect.

3.5.5 Time and Memory Overhead

To understand the overhead of our approach, during the file-level defect prediction experiments we keep track of the time cost and memory space cost of our DBN-based feature generation process (details are in Section 3.3.3). In addition, we also record the time cost of tuning the DBN models in our experiments. The other processes, including parsing source code, handling noise, mapping tokens, building models, and predicting defects, are all common procedures, so we do not analyze their costs. As described in Section 3.4.5, we tune the three parameters, i.e., the number of hidden layers, the number of nodes in each layer, and the number of iterations, for the randomly selected five projects. To find the best combination among the three parameters, we have 11 × 8 × 10 experiments. In total, the tuning process costs approximately 5 hours. Table 3.18 shows the time cost and the memory space cost of each project for generating semantic features. As shown in Table 3.8, ant has two sets of within-project defect prediction experiments, which are ant 1.5 ⇒ 1.6 and ant 1.6 ⇒ 1.7. On average, these two experiments take 15.5 seconds and 2.8 MB of memory for the DBN to generate the semantic features for both the training data and the test data. Among all the projects, the time cost of automatically generating the semantic features varies from 8.0 seconds (ivy) to 32.0 seconds (camel). For the memory space cost, it takes less than 6.5 MB for all the examined projects. In addition, we also keep track of the time and memory space cost of generating DBN-based features for the change-level defect prediction during our experiments. Different from file-level defect prediction, which predicts whether a file contains bugs or not, change-level defect prediction predicts whether a change is buggy or clean. Source files often contain hundreds of LOC, while changes often have fewer lines than files. Thus, both the time and memory costs of generating DBN-based features for changes are smaller than those for files. In our experiments, the average time cost and memory cost of generating DBN-based features for changes are 2.4 seconds and 0.6 MB.
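A minimal sketch of how such costs can be tracked per project is shown below; generate_semantic_features is a hypothetical placeholder for the real DBN feature-generation step, so the reported numbers are not the ones in Table 3.18.

```python
# Sketch: measure wall-clock time with time.perf_counter and peak Python memory
# with tracemalloc around a (placeholder) feature-generation step.
import time
import tracemalloc

def generate_semantic_features(tokens):
    return [len(t) for t in tokens]            # placeholder for the real DBN pass

tokens = [["if", "foo"], ["while", "bar"]] * 10000

tracemalloc.start()
t0 = time.perf_counter()
features = generate_semantic_features(tokens)
elapsed = time.perf_counter() - t0
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print("time: %.2f s, peak memory: %.1f MB" % (elapsed, peak / 1e6))
```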

Our DBN-based approach to automatically learning semantic features is applicable in practice.

Figure 3.11: An example to illustrate the issues of using cross-validation to evaluate change-level defect prediction. Circles are clean changes and dots are buggy changes.

Table 3.19: Information gain of different types of tokens and their combinations that are used for generating DBN-based semantic features in change-level defect prediction. Spearman is the Spearman correlation value between the information gain and the prediction results of different types of tokens on a project.

Project: added, deleted, context, added+deleted, added+context, deleted+context, all, Spearman
Jackrabbit: 8.2, 7.4, 7.4, 8.2, 8.6, 8.4, 8.9, 0.96
Lucene: 6.9, 6.5, 6.3, 7.1, 7.6, 7.4, 7.7, 0.89
JDT: 8.1, 7.0, 7.1, 8.0, 9.8, 8.4, 8.9, 0.64
Linux: 4.2, 3.8, 3.0, 4.3, 5.0, 4.9, 5.1, 0.90
PostgreSQL: 5.9, 5.5, 5.3, 6.0, 6.7, 6.6, 7.6, 0.96
Xorg: 4.8, 4.0, 4.9, 5.5, 6.0, 5.9, 6.1, 0.82
Average: 6.4, 5.7, 5.6, 6.5, 7.3, 6.9, 7.4, 0.86

3.6 Discussion

3.6.1 Issues of Using Cross-Validation to Evaluate Change-level Defect Prediction

As described in Section 3.3.4, this work did not use the widely used cross-validation to evaluate change-level defect prediction. The main reason is that change-level defect prediction is time sensitive, i.e., changes follow a certain order in time. For example, in Figure 3.11, changes C1–C10 are committed chronologically, where C1 is the earliest, and C10 is the latest. Dots denote the buggy changes and circles denote the clean changes. An arrow links a buggy change and the corresponding change that fixes the buggy change. For example, C6 fixes the bugs in C4; therefore, C4 is labeled as a buggy change.

Because of this time-ordered characteristic of changes, using cross-validation to evaluate change classification may introduce the following two issues. First, cross-validation will use future data for prediction. In this example, 10-fold cross-validation will make each change the test set in one iteration. For example, it will use C2–C10 to predict whether C1 is buggy or not, which does not match a real-world usage scenario where we typically want to make the prediction at the time when C1 is committed, and by then C2–C10 are not available yet. Second, cross-validation could mislabel changes. Using cross-validation, changes will be labeled as shown in Figure 3.11. For example, C5 will be labeled buggy. However, in practice, when we predict whether C6 is buggy, we would only have the information available at time t_predict. Therefore, C5 should be considered clean at time t_predict because C7 did not exist at that time. It is incorrect for cross-validation to consider C5 buggy when we predict the label of C6 at time t_predict, because we would not know that C5 is buggy at time t_predict. Due to the above two issues, cross-validation is inaccurate for evaluating defect prediction in practice. Thus, in this work, we did not adopt cross-validation to evaluate defect prediction tasks; instead, we divided the data into different folds chronologically and evaluated the overall performance on all the folds to avoid bias.
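A minimal sketch of the chronological setup we use instead of cross-validation is shown below, assuming synthetic commit timestamps; the T2/T4 boundaries correspond to the gap illustrated in Figure 3.6, and the concrete quantiles are placeholders, not our actual split points.

```python
# Sketch of a time-aware split: sort changes by commit time, train on an early
# window, leave a gap so that bugs in the training window have had time to be
# discovered, and test on a later window. Timestamps are synthetic.
import numpy as np

n = 1000
commit_time = np.sort(np.random.RandomState(3).randint(0, 10_000, n))  # ordered changes

train_end = np.quantile(commit_time, 0.6)       # T2: end of training window
gap_end = np.quantile(commit_time, 0.75)        # T4: end of the gap
train_idx = np.where(commit_time <= train_end)[0]
test_idx = np.where(commit_time > gap_end)[0]   # only future changes are predicted

print(len(train_idx), "training changes,", len(test_idx), "test changes")
```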

3.6.2 Why Do DBN-based Semantic Features Work?

Our experiments in Section 3.5 show that, compared to traditional features, the DBN-based features that are directly learned from source code deliver significantly better performance for all four defect prediction tasks investigated in this work. The probable reasons for the outstanding performance of DBN-based features are summarized as follows. First, the DBN models generate features with more complex network connections. These network connections enable the DBN models to generate features with multiple levels of abstraction and high-level semantics. In this work, the generated DBN features are weighted combinations/vectors of the original input source code, which can represent patterns of usage of the input source code, e.g., method usages, control-flow usages, etc. In contrast, traditional features often focus on statistical information about the source code, e.g., LOC, the number of function calls, etc., which cannot capture this semantic information. Although the bag-of-words features and the characteristic features (details are in Section 3.4.4) are derived from the raw programming tokens, they consider each token as an independent feature element and cannot represent the contextual and structural information among the raw tokens. Thus, these features underperform. Second, the DBN-based features are more capable of distinguishing between the semantic information of different code snippets, especially for code snippets that have similar source code characteristics. For example, as shown in Figure 3.1, the traditional features (e.g., code complexity) of the two code snippets are identical. Training prediction models on such instances will degrade the discrimination ability of classifiers and consequently hurt the prediction performance. In contrast, the DBN-based features can make a difference: as shown in Figure 3.3, the different structural and contextual information among the tokens of these two code snippets enables the DBN model to generate different features that distinguish between them.

3.6.3 Efficiency of Different Types of Tokens in Change-level Defect Prediction

As described in Section 3.4, to achieve better prediction performance for change-level defect prediction, we use the combination of the three types of tokens, i.e., added, deleted, and context, to generate DBN-based semantic features. In this section, we further examine why the combination outperforms each of the three types of tokens extracted from changes. One possible reason is that the combination contains more information than any of the three types of tokens. To explore this, we leverage the information gain [64], which is a widely used metric to measure how much information there is in a given event, to measure the information in each of the three types of tokens and their different combinations.

Specifically, given a document s = {a_1, ..., a_n} of length n, where a_1 to a_n are the tokens in the document s, the information gain of this document, H(s), is measured as follows:

$H(s) = -\sum_{i=1}^{n} p_i \log p_i$    (3.9)

where p_i is the probability of token a_i in the document s. We use the TF (term frequency) of token a_i to represent its probability in the document s. To calculate the information gain of a specific type of token in changes, we first collect all seven types of tokens from all the changes in a project. Then, we calculate the information gain of a specific type of tokens extracted from all changes of a project. Table 3.19 shows the information gains of the three types of tokens and their combinations. Overall, among the three basic token types, added and deleted contain more information than context. Combining any two of them also yields better performance than either of the two types alone. In addition, the combination of all three types of tokens contains more information than any other combination. We further compute the Spearman correlation between the value of information gain and the prediction result of different types of tokens in a project. The high correlation value (on average 0.86) indicates that the prediction result of DBN-based features generated from a specific type of token has a positive correlation with its information gain. This explains why DBN-based features generated from the combination of all three types of tokens achieve the best performance.
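As a concrete illustration of Eq. (3.9), the sketch below computes the term-frequency entropy of a few token lists and correlates it with the average F1 values from Table 3.7; the token lists are made up for illustration, so the resulting correlation is not our measured 0.86.

```python
# Sketch of Eq. (3.9): term-frequency entropy of a token "document" and the
# Spearman correlation between it and F1 across token types. Token lists are
# illustrative; the F1 values are the averages from Table 3.7.
import math
from collections import Counter
from scipy.stats import spearmanr

def information_gain(tokens):
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

token_sets = {
    "added": ["if", "foo", "bar", "unlock"],
    "deleted": ["return", "null", "foo"],
    "context": ["if", "if", "bar"],
}
token_sets["all"] = sum(token_sets.values(), [])   # added+deleted+context

gains = [information_gain(t) for t in token_sets.values()]
f1_scores = [41.9, 40.2, 32.0, 44.8]               # average F1 per token type (Table 3.7)
rho, p = spearmanr(gains, f1_scores)
print("information gains:", [round(g, 3) for g in gains], "Spearman rho:", round(rho, 3))
```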

3.6.4 Analysis of the Performance

In this section, we analyze the performance of our proposed DBN-based semantic features on both file-level and change-level defect prediction tasks. We can observe that the improvement of DBN-based semantic features on file-level defect prediction is generally larger than that on change-level defect prediction. The main reason for this phenomenon is that a file generally contains more information than a change. Thus, file-level defect prediction data often provide more context to a DBN model, allowing it to learn more accurate features. We also note that our approach achieves better performance on some projects than others for file-level within-project defect prediction, e.g., it achieves an F1 of 94.2% on ant and an F1 of approximately 80% on camel, lucene, poi, and jEdit. This is because we use these projects, i.e., ant, camel, lucene, poi, and jEdit, as data to train a DBN model for generating features. During the training process, we tune the DBN parameters based on the performance of the defect prediction models with the generated features for the five projects (details are presented in Section 3.4.5). Because the training process is an optimization task to generate features that may produce the best performance for the training dataset, the features fit the training dataset better. Thus, our approach achieves relatively higher F1 values for the five projects (ant, camel, lucene, poi, and jEdit) than for other projects. This may be a risk of overfitting. However, this may also suggest that training a DBN model by using a project's own history data is appropriate when applying our approach to the project.

3.6.5 Performance on Open-source Commercial Projects

In Section 3.4, we evaluated the DBN-based semantic features on 15 open-source projects (i.e., the projects listed in Table 3.4 and Table 3.5). To explore the performance of the

Table 3.20: The four open-source commercial projects evaluated. Lang is the programming language used for the project. LOC is the number of lines of code. First Date is the date of the first commit of a project, while Last Date is the date of the latest commit. Changes is the number of changes. TrSize is the average size of the training data on all runs. TSize is the average size of the test data on all runs. Rate is the average buggy rate for each project. NR is the number of runs for each subject.

Project: Lang, LOC, First Date, Last Date, Changes, TrSize, TSize, Rate (%), NR
Buck (Facebook): Java, 296K, 2013-04-18, 2018-03-23, 92K, 31K, 19K, 7.2, 3
Hhvm (Facebook): C++, 1M, 2010-02-03, 2018-04-06, 120K, 51K, 14K, 11.6, 3
Guava (Google): Java, 380K, 2009-06-18, 2018-03-21, 29K, 8.6K, 8.8K, 4.0, 2
Skia (Google): C++, 765K, 2006-09-20, 2018-04-07, 147K, 48K, 22K, 30.6, 5

DBN-based semantic features on commercial projects, we apply our approach to four additional open-source commercial projects, i.e., Buck4, Hhvm5, Guava6, and Skia7. Buck is a build system developed and used by Facebook. Hhvm is a virtual machine for PHP and Hack programs, which was also developed and is currently used by Facebook. Guava is a set of Google's core libraries for Java. Skia is a complete 2D graphics library for drawing text, geometries, and images developed and used by Google. These four projects were originally developed and maintained by Facebook and Google and became open-source projects recently. We selected these four projects because they are the largest Java/C++ projects (in terms of commit size). To collect the change-level data, we use the same approaches as described in Section 2.1.2 and Section 3.4.4 to label the changes and collect the features for each change. The details of the four open-source commercial projects are listed in Table 3.20. With these four additional projects, we conduct change-level within-project and change-level cross-project defect prediction tasks to compare the DBN-based semantic features to traditional features under both the non-effort-aware and effort-aware scenarios. Note that we adopt the same procedures to tune the DBN models and generate semantic features as described in Section 3.4.8 and Section 3.4.9. Table 3.21 shows the results of the change-level within-project prediction on the four projects. Overall, the DBN-based features generate better results than traditional change features in terms of F1, which is consistent with our previous experimental results on pure open-source projects (Table 3.14). Specifically, for all four projects, DBN-based features improve the best existing change features by up to 6.6 percentage points in F1, and on average

4 https://buckbuild.com/
5 https://hhvm.com/
6 https://github.com/google/guava
7 https://skia.org/

the improvement is 5.4 percentage points. These improvements are consistent with our previous experimental results on open-source projects (i.e., the best improvement is 8.6 percentage points and the average improvement is 4.7 percentage points).

Table 3.21: Results of WCDP on the four projects. All values are measured in percentage. Better F1 values with statistical significance (p-value < 0.05) between the two sets of fea- tures are showed with an asterisk (*). Numbers in parentheses are effect sizes comparative to Semantic Features in terms of F1.

Project  Features                 P     R     F1
Buck     Change Features (0.542)  10.9  39.9  17.2
         Semantic Features        14.0  46.6  21.6*
Hhvm     Change Features (0.477)  14.2  39.8  20.9
         Semantic Features        23.6  33.1  27.5*
Guava    Change Features (0.685)  7.4   58.0  13.1
         Semantic Features        10.5  63.2  18.1*
Skia     Change Features (0.710)  34.5  40.4  37.3
         Semantic Features        45.2  42.9  44.0*
Average  Change Features (0.622)  16.7  44.5  22.4
         Semantic Features        23.3  46.5  27.8

Table 3.22: F1 scores of change-level cross-project defect prediction for the four projects. Better F1 values with statistical significance (p-value < 0.05) among DBN-CCP, TCA+, and Baseline are shown with an asterisk (*). Numbers in parentheses are effect sizes relative to DBN-CCP.

Source      Target   DBN-CCP  TCA+           Baseline      Within
All Others  Buck     14.3*    11.9 (0.459)   10.8 (0.693)  21.6
All Others  Hhvm     22.6*    21.7 (0.017)   20.2 (0.112)  27.5
All Others  Guava    7.2      7.6* (-0.024)  4.2 (0.407)   18.1
All Others  Skia     47.9*    38.0 (0.613)   36.9 (0.765)  44.0
            Average  23.0*    19.8 (0.358)   18.0 (0.422)  27.8

Table 3.22 shows the results of the change-level cross-project prediction for the four projects. Overall, DBN-CCP generates a better F1 than both TCA+ and Baseline on average. Compared with TCA+, the improvement is up to 9.9 percentage points and is 3.2 percentage points on average. These results are also consistent with our previous change-level cross-project defect prediction results listed in Table 3.16.

Table 3.23: PofB20 scores of DBN-based features and traditional features for WCDP on the four projects. PofB20 scores are measured in percentage. Better PofB20 values with statistical significance (p-value < 0.05) between the two sets of features are shown with an asterisk (*). Numbers in parentheses are effect sizes relative to the DBN-based semantic features.

Project  Semantic Features  Change Features
Buck     28.2*              14.7 (1)
Hhvm     21.9*              13.3 (0.882)
Guava    17.4*              15.5 (0.351)
Skia     27.0*              21.3 (0.655)
Average  23.6*              16.2 (0.520)

Table 3.24: PofB20 scores of DBN-based features and traditional features for CCDP on the four projects. Better PofB20 values with statistical significance (p-value < 0.05) among DBN-CCP, TCA+, and Baseline are shown with an asterisk (*). Numbers in parentheses are effect sizes relative to DBN-CCP.

Source      Target   DBN-CCP  TCA+            Baseline
All Others  Buck     25.0*    17.9 (0.801)    14.2 (1)
All Others  Hhvm     13.4     17.4* (-0.653)  12.0 (0.455)
All Others  Guava    14.3*    13.0 (0.625)    10.5 (0.746)
All Others  Skia     18.2*    17.1 (0.210)    16.9 (0.437)
            Average  18.7*    16.4 (0.623)    13.4 (0.766)

We also calculate the PofB20 values for both the change-level within-project and cross-project approaches. Table 3.23 shows the PofB20 values of the change-level within-project defect prediction models with DBN-based semantic features and the traditional change features. DBN-based features achieve better PofB20 values than the corresponding change features for all experiment pairs. The improvement is up to 13.5 percentage points (Buck) and is 7.4 percentage points on average. Table 3.24 shows the PofB20 values of the three change-level cross-project defect prediction approaches. Similar to our previous results, DBN-CCP achieves better PofB20 values than both TCA+ and Baseline on average. Compared to TCA+, the improvement is as high as 7.1 percentage points and is 2.3 percentage points, on average.
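As an illustration of how such an effort-aware score can be computed, the sketch below estimates PofB20 by inspecting changes in decreasing order of predicted risk until 20% of the total LOC has been covered; this is a simplified reading of the metric, not the exact implementation used in our experiments.

def pofb20(changes, loc_budget_ratio=0.2):
    # Percentage of known buggy changes found when inspecting changes, sorted
    # by predicted risk, until 20% of the total LOC is covered. Each change is
    # a dict with keys 'risk' (model score), 'loc' (change size), and 'buggy'
    # (ground-truth label, 0 or 1). Illustrative sketch only.
    total_loc = sum(c["loc"] for c in changes)
    total_bugs = sum(c["buggy"] for c in changes)
    inspected_loc, found_bugs = 0, 0
    for c in sorted(changes, key=lambda c: c["risk"], reverse=True):
        if inspected_loc + c["loc"] > loc_budget_ratio * total_loc:
            break
        inspected_loc += c["loc"]
        found_bugs += c["buggy"]
    return 100.0 * found_bugs / max(total_bugs, 1)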

DBN-based semantic features outperform traditional features on four open-source commercial projects from Facebook and Google, which indicates that DBN-based semantic features are also applicable for improving defect prediction for open-source commercial projects.

Note that the results on the commercial open-source projects are worse than those on the open-source projects. One possible reason is that the buggy rates of the open-source projects used in this study are significantly higher than the buggy rates of the commercial open-source projects, i.e., the average buggy rate of the open-source projects is 24.4%, while the average buggy rate of the commercial open-source projects is 13.0%. A low buggy rate can hurt the precision of the predictions and result in a low F1 measure [256].

3.7 Threats to Validity

Implementation of TCA+

For the comparative analysis, we compare our cross-project defect prediction models with TCA+ [194], which is the state-of-the-art cross-project defect prediction technique based on traditional features. Since the original implementation is not released, we reimplemented our own version of TCA+. Although we strictly followed the procedures described in their work, our implementation may not reflect all the implementation details of the original TCA+. We test our implementation with the data provided in their work. Since our implementation generates the same results, we are confident that it reflects the original TCA+. In this work, we did not evaluate our DBN-based feature generation approach on the projects used for evaluating TCA+ [194]. This is because our DBN-based feature generation approach to within-project defect prediction requires data from two different versions of the same project. However, the datasets used in [194] only provide one version of defect data for each of their eight projects, which is unsuitable for evaluating our approach to within-project defect prediction. To reduce this threat, we evaluated TCA+ and our approach on the publicly available projects from PROMISE.

Project Selection

The examined projects in this work have a large variance in average buggy rates. We have tried our best to make our dataset general and representative. However, it is still possible

that the 15 projects used in our experiments are not generalizable enough to represent all software projects. Our approach might generate better or worse results for other projects that are not used in the experiments. We mitigate this threat by selecting projects with different functionalities (operating systems, servers, and desktop applications) that are developed in different programming languages (C and Java). Our approach to generating semantic features is only evaluated on open-source projects. While we believe that this approach should be generalizable to proprietary software, evaluating our approach on proprietary software is challenging, because the approach requires AST analysis of source code. We mitigate this threat by applying our approach to four open-source commercial projects that were originally developed and maintained by Google and Facebook and are now open source. The performance on these projects suggests that our proposed DBN-based semantic features can deliver better results than traditional features.

Labeling Data

Following previous work [124, 245], the labeling process is automatically completed with the annotating or blaming function of the VCS. It is known that this process can introduce noise [104, 123], and the noise in the data can potentially harm the performance of defect prediction. Manual inspection of the process shows reasonable precision and recall on open-source projects [104]. To mitigate this threat, we use the noise filtering algorithm introduced in [123].
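A simplified sketch of this style of blame-based labeling (often referred to as the SZZ approach) is shown below; it assumes that bug-fixing commits and the lines they delete have already been identified, and it is an illustration rather than the exact pipeline used in this thesis.

import subprocess

def bug_introducing_commits(repo, fix_commit, changed_file, deleted_lines):
    # For each line that a bug-fixing commit deletes or modifies, `git blame`
    # on the parent commit identifies the commit that last touched that line;
    # those commits are labeled as bug-introducing. `deleted_lines` holds line
    # numbers in `changed_file` at the fix commit's parent. Illustrative only.
    introducers = set()
    for line in deleted_lines:
        out = subprocess.run(
            ["git", "-C", repo, "blame", "--porcelain", "-L", f"{line},{line}",
             f"{fix_commit}^", "--", changed_file],
            capture_output=True, text=True, check=True).stdout
        # The first token of porcelain output is the blamed commit hash.
        introducers.add(out.split()[0])
    return introducers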

3.8 Summary

In this chapter, we introduce an approach that leverages a representation-learning algorithm, i.e., deep learning, to learn semantic representations directly from source code for defect prediction. Specifically, we deploy a deep belief network to automatically learn semantic features from programs' ASTs (for file-level defect prediction models) and from source code changes (for change-level defect prediction models), and we leverage the learned semantic features to build prediction models. We examined the effectiveness of the learned DBN-based semantic features on two file-level defect prediction tasks, i.e., file-level within-project defect prediction (WPDP) and file-level cross-project defect prediction (CPDP), and two change-level defect prediction

tasks, i.e., change-level within-project defect prediction (WCDP) and change-level cross-project defect prediction (CCDP). To conduct comprehensive performance evaluations, we employed both non-effort-aware and effort-aware evaluation metrics. For the file-level defect prediction tasks, our evaluations were conducted on 26 versions of data from 10 open-source projects. Our results show that the DBN-based semantic features improve WPDP on average by 13.3 percentage points (in F1), and outperform the state-of-the-art CPDP with traditional features on average by 6.0 percentage points. For change-level defect prediction, our evaluations were conducted on more than 1M changes from six open-source projects and four open-source commercial projects. The experimental results indicate that the DBN-based semantic features can improve WCDP on average by 5.1 percentage points, and improve upon the state-of-the-art CCDP technique with traditional change-level features, on average, by 2.9 percentage points. In addition, under the effort-aware evaluation scenario, our DBN-based semantic features outperform traditional features for both file-level and change-level defect prediction.

Chapter 4

Leveraging N-gram Language Models to Improve Rule-based Static Bug Detection

This chapter presents our approach, Bugram, which leverages n-gram language models to detect bugs that cannot be detected by rule-based bug detection tools. Specifically, the proposed approach detects bugs by calculating and ranking the probabilities of program token sequences. Bugram was presented at the 31st IEEE/ACM International Conference on Automated Software Engineering (ASE'16 [278]). This work is a collaboration with Devin Chollak; an early version of detecting bugs with n-gram language models was presented in Devin's Master's thesis [47]. The differences between Bugram and NGDetection, the approach in Devin's Master's thesis, are as follows: 1) Bugram and NGDetection are two different approaches to detecting bugs: NGDetection is a rule-based bug detector, which leverages n-grams to mine rules and detects violations of these rules as bugs, whereas Bugram is a non-rule-based bug detector, which directly detects bugs by calculating the probabilities of token sequences. 2) We further compare Bugram with five typical rule-based bug detectors, i.e., PR-Miner, improved PR-Miner, JADET, Tikanga, and GrouMiner.

4.1 Motivation

Software bug detection techniques have been shown to improve software reliability by finding previously unknown bugs in mature software projects [80, 95]. Rule-based bug

detection approaches infer likely programming rules from source code [8, 30, 38, 144, 146, 250, 268, 269, 301], version histories [30, 113, 292], and source code comments [255, 257]. These approaches detect violations of these rules as potential bugs. Frequent itemset mining techniques have been used to mine rules that capture the co-occurrence of methods and variables, and violations of these rules are reported as bugs [30, 38, 144]. Along this line, more complex graph models have been combined with frequent itemset mining techniques, focusing on mining programming rules that capture both method order and control flow information in order to detect violations of these complex rules [70, 199, 284, 285].

Let ABC denote a sequence of calls to the methods A, B, and C. Imagine a contrived program that includes 98 occurrences of the sequence ABC, two occurrences of ABD, and a single occurrence of EFG. Existing rule-based bug detection approaches, such as PR-Miner [144], JADET [285], Tikanga [284], and GrouMiner [199], infer rules based on the conditional probabilities of method calls, for example, the conditional probability P(C|AB), which denotes the likelihood of seeing a call to method C after the sequence of calls AB. In our example, P(C|AB) = 98/(98+2) = 98%. This probability is higher than the threshold used by PR-Miner, which is 90%, and therefore the potential rule that C should appear after AB, denoted as {AB => C}, is selected as a high-probability rule. The confidence of this rule is its conditional probability, which is 98%. Given this rule, the sequence ABD is flagged as a bug, because C instead of D is expected to follow AB.

It has recently been demonstrated that n-gram language models [39] can capture the regularities of software source code [88, 224]. To take advantage of the n-gram language model, which provides us with a Markov model over tokens, we propose an n-gram language model based bug detection technique, called Bugram. The assumption is that low probability token sequences in a program are unusual, which may indicate bugs, bad practices, or unusual/special uses of code of which developers may want to be aware.

While existing studies leverage n-grams for detecting clone bugs [96], localizing faults [196, 316], and code search [117], including some that use the term n-gram models [96, 117], these studies do not leverage n-gram models. Instead, they use n-grams, which are token sequences, whereas n-gram models are Markov models built on n-grams. On the other hand, n-gram models have been used for code completion and suggestion [78, 198, 225], fault localization [35], and coding style checking [10, 85]. The focus of this work is leveraging n-gram models for bug detection, which has its own challenges and requires a different design, as detailed in Section 4.3.

Instead of using conditional probabilities, Bugram highlights suspicious call sequences based on their absolute probabilities in the program. In the example above, our approach evaluates the absolute probability P(ABC) of the full sequence ABC, which is in contrast to PR-Miner, which evaluates the conditional probability P(C|AB).

Figure 4.1: Bugs detected by Bugram versus bugs detected by rule-based techniques in the latest version of Hadoop

The two sequences with the lowest probabilities in the program are EFG and ABD, which have probabilities that are markedly lower than that of ABC. By selecting sequences with low absolute probabilities, Bugram is able to recognize that ABD and EFG are both suspicious sequences. Notably, EFG is not recognized as a suspicious sequence by PR-Miner, despite having only a single occurrence in the program, because there are no high-confidence rules related to EFG. In addition, it is known that even if a rule exists but its pattern does not appear frequently enough, the rule cannot be learned, thus missing many bugs [144]. More broadly, rule-based approaches detect bugs by looking for deviations from common program patterns, while Bugram detects sequences that are overall uncommon in the program. These two approaches target different types of program abnormalities and will ultimately detect different types of bugs, as illustrated in Figure 4.1. The curve shows the probabilities of sequences in the Java project Hadoop, sorted in ascending order. The bars depict examples of bugs that can be detected by our n-gram-based approach, while the circles represent examples of bugs that can be detected by rule-based approaches.

In this work, we study whether our n-gram-based approach can detect bugs in real-world software that rule-based approaches cannot find. In addition, we study whether Bugram is more precise than rule-based approaches, i.e., whether Bugram reports a smaller proportion of false bugs than rule-based approaches.
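To make this contrast concrete, the following sketch (ours, not part of Bugram's implementation) computes both the rule confidence used by rule-based miners and the raw absolute probability of each full sequence for the contrived program above; the counts are the toy numbers from the example.

from collections import Counter

# Toy program: 98 occurrences of ABC, two of ABD, one of EFG.
sequences = ["ABC"] * 98 + ["ABD"] * 2 + ["EFG"]
counts = Counter(sequences)
total = sum(counts.values())

# Rule-based view: confidence of the candidate rule {AB => C}.
ab_total = counts["ABC"] + counts["ABD"]   # occurrences of the prefix AB
conf_ab_c = counts["ABC"] / ab_total       # 0.98 -> rule {AB => C} is learned
conf_ab_d = counts["ABD"] / ab_total       # 0.02 -> ABD is flagged as a violation
# EFG never participates in any high-confidence rule, so it is never flagged.

# Bugram's view: absolute probability of each full sequence.
absolute = {seq: counts[seq] / total for seq in counts}
suspicious = sorted(absolute, key=absolute.get)[:2]   # lowest probabilities first
print(round(conf_ab_c, 2), round(conf_ab_d, 2))       # 0.98 0.02
print(suspicious)                                     # ['EFG', 'ABD']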

(a) Method call sequence from a buggy code snippet (appears once):
[isDebugEnabled(), debug(), indent(), stringify()]

if (LOG.isDebugEnabled()) {
    LOG.debug(indent(depth) + "converting from Pig " + pigType + " "
        + value + " using " + stringify(schema));
}

(b) A similar but correct method call sequence (appears three times):
[isDebugEnabled(), debug(), indent(), toString()]

if (LOG.isDebugEnabled()) {
    LOG.debug(indent(depth) + "converting from Pig " + pigType + " "
        + toString(value) + " using " + stringify(schema));
}

Figure 4.2: A motivating example from the latest version 0.15.0 of the project Pig. Bugram automatically detected a real bug in (a), which has been confirmed and fixed by Pig developers after we reported it.

4.1.1 A Motivating Example

Existing rule-based techniques detect potential bugs by using mined rules with enough confidence and support (the number of occurrences) to avoid generating a large number of false bugs. For example, PR-Miner [144] requires the confidence of a method call sequence to be over 90% and its support to be larger than 15 before it is identified as a rule, missing opportunities to detect many bugs. For example, Figure 4.2 shows a real bug detected by our tool in the latest version of Pig, which has already been confirmed by Pig developers. The code snippet in Figure 4.2(a) contains a bug: for the purpose of logging, the code snippet should convert the object value to a string by calling the toString method, but it does not. The method call sequence of the buggy code snippet is [isDebugEnabled, debug, indent, stringify], which appears only once in the program. A similar but correct code snippet with the method call sequence [isDebugEnabled, debug, indent, toString] appears three times; one of these appearances is shown in Figure 4.2(b). Using rule-based bug detection approaches such as PR-Miner, a potential rule that may detect this bug is [isDebugEnabled, debug, indent => toString]. Since PR-Miner groups method calls into a set, i.e., it ignores the order of method calls, all other rules that could potentially detect this bug are [debug, indent => toString], [isDebugEnabled, indent => toString], [indent => toString], [debug => toString], [isDebugEnabled, debug => toString], and [isDebugEnabled =>

toString]. The confidence of each potential rule is only 75% when considering only these four code snippets. If we consider the entire Pig project, the confidences of these rules are even lower, ranging from 19.5% to 28.8%. Since PR-Miner requires rules to have confidences of at least 90% (to avoid detecting too many false bugs), it filters out all these potential rules, thus missing this real bug. While it is possible to lower the confidence requirement, the bug detection precision would likely become too low, given that already 40–86% of the bugs reported by PR-Miner are false bugs at the 90% confidence threshold [144]. Different from these rule-based bug detection approaches, Bugram does not use programming rules. Instead, it detects potential bugs by reporting method call sequences with low probabilities in a project. We detect the real bug shown in Figure 4.2 because the method call sequence [isDebugEnabled, debug, indent, stringify] has a low probability of 2.855 × 10^-5, which is the 13th lowest probability among all 31,204 sequences of length five using a 3-gram model.

4.2 Background

The n-gram language model has been widely used in modelling natural language [39] and in solving problems such as speech recognition [20], statistical machine translation, and other language-related problems [231]. The n-gram language model typically has two components, words and sentences, where each sentence is an ordered sequence of words. A dictionary D contains all possible words of a language, and each word is represented as w. The language model builds a probability distribution over all possible sentences in a language using Markov chains. The probability of a sentence in a language is estimated by generating the sequence word by word, where the probability of each word is determined by the conditional probability given the previous n − 1 words. Given a sentence s = w_1 w_2 w_3 · · · w_m, its probability is estimated as:

P(s) = ∏_{i=1}^{m} P(w_i | h_{i-1})    (4.1)

where the history h_{i-1} = w_{i-n+1} · · · w_{i-1} consists of the previous n − 1 words. In the n-gram model, the probability of the next word w_i depends only on the previous n − 1 words. For example, if the sequence length m is four, the probability of the sequence s = w_1 w_2 w_3 w_4 using a 4-gram model is:

P(s) = P(w_1) P(w_2|w_1) P(w_3|w_1 w_2) P(w_4|w_1 w_2 w_3)    (4.2)

If we use a 3-gram model, the probability of the token w_4 depends only on the previous two tokens, and the probability of s is:

P(s) = P(w_1) P(w_2|w_1) P(w_3|w_1 w_2) P(w_4|w_2 w_3)    (4.3)

In this work, we build n-gram models to learn the probabilities of using a method in different contexts. With the learned probability distribution, we further calculate the probability of each token sequence and flag low-probability token sequences as potential bugs.
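To make the scoring step concrete, the sketch below is a minimal illustration of this idea (our simplification, not the thesis implementation): it builds an unsmoothed 3-gram model over high-level token sequences, pads each sequence with a start marker of our own, and ranks sequences by their absolute probabilities.

from collections import Counter, defaultdict

def train_trigram(sequences):
    # Maximum-likelihood 3-gram model: counts of a token given its two-token context.
    context_counts = defaultdict(Counter)
    for seq in sequences:
        padded = ("<s>", "<s>") + tuple(seq)
        for i in range(2, len(padded)):
            context_counts[padded[i-2:i]][padded[i]] += 1
    return context_counts

def sequence_probability(model, seq):
    # Absolute probability of a token sequence under the 3-gram model.
    padded = ("<s>", "<s>") + tuple(seq)
    prob = 1.0
    for i in range(2, len(padded)):
        ctx, tok = padded[i-2:i], padded[i]
        total = sum(model[ctx].values())
        prob *= model[ctx][tok] / total if total else 0.0
    return prob

# Toy usage: rank sequences and treat the lowest-probability ones as suspicious.
corpus = [["isDebugEnabled", "debug", "indent", "toString"]] * 3 + \
         [["isDebugEnabled", "debug", "indent", "stringify"]]
model = train_trigram(corpus)
ranked = sorted(corpus, key=lambda s: sequence_probability(model, s))
print(ranked[0])   # the rare 'stringify' sequence ranks lowest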

4.3 Approach

(Pipeline: Source Files → Tokenization → Tokens → N-gram Model Building → Ranked Tokens → Bug Detection → Potential Bugs)

Figure 4.3: Overview of Bugram

Figure 4.3 shows the overview of Bugram. In this section, we first describe how to parse a project to convert it into tokens (Section 4.3.1), and then use the tokens to build n-gram models for the project (Section 4.3.2). Finally, we present how to leverage the n-gram models to detect bugs in the project (Section 4.3.3).

4.3.1 Tokenization

To build n-gram models, we need to tokenize the source code of a given project. A main challenge is selecting a suitable level of granularity for tokens when building the n-gram models. Existing work builds n-gram models at the syntactic level, using low-level tokens to suggest the next tokens for code completion and suggestion [78, 88]. For example, after seeing "for (int i = 0; i <", such models may suggest the low-level tokens that are likely to follow.

In contrast, Bugram builds its n-gram models on high-level tokens, i.e., method calls and control flow elements. Take a loop such as "for (...) { foo(i); }" as an example: Bugram represents it with a control-flow token for the loop together with the method call token foo(). As reported in existing work [284, 285], control flow information is important for the accuracy of bug detection. Inspired by this work, during the tokenization process Bugram also considers the control flow information of the source code by adding control flow elements to the token sequences. Therefore, we focus on method calls and control flow, i.e., method calls, constructors, and initializers; if/else branches; for/do/while/foreach loops; break/continue statements; try/catch/finally blocks; return statements; synchronized blocks; switch statements and case/default branches; and assert statements. A method call methodA() is resolved to its fully qualified name org.example.Foo.methodA() to prevent unrelated methods with an identical name from being grouped together. In addition, the types of exceptions in catch clauses are considered, as they help us infer more accurate contextual information for method sequences. Bugram uses the Eclipse JDT Core1 to tokenize the source files, construct the abstract syntax trees (ASTs), and resolve the type information for the tokens. In this work, we consider both method calls and control flow elements as tokens.
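To illustrate the granularity of these high-level tokens, the following sketch (a simplification of ours, with a hypothetical node encoding; Bugram itself works on Eclipse JDT ASTs and resolves fully qualified names) walks a toy AST and keeps only method calls and control-flow elements.

# Toy AST node: (kind, children), where kind is e.g. "MethodCall:foo",
# "If", "For", or "Catch:IOException"; catch nodes keep the exception type.
CONTROL_FLOW = {"If", "Else", "For", "While", "DoWhile", "Foreach", "Break",
                "Continue", "Try", "Catch", "Finally", "Return",
                "Synchronized", "Switch", "Case", "Default", "Assert"}

def extract_tokens(node, tokens=None):
    # Keep only method calls and control-flow elements, in source order.
    if tokens is None:
        tokens = []
    kind, children = node
    if kind.startswith("MethodCall:") or kind.split(":")[0] in CONTROL_FLOW:
        tokens.append(kind)
    for child in children:
        extract_tokens(child, tokens)
    return tokens

# A method body like: if (LOG.isDebugEnabled()) { LOG.debug(indent(depth)); }
toy_ast = ("MethodBody", [
    ("If", [
        ("MethodCall:isDebugEnabled", []),
        ("MethodCall:debug", [("MethodCall:indent", [])]),
    ]),
])
print(extract_tokens(toy_ast))
# ['If', 'MethodCall:isDebugEnabled', 'MethodCall:debug', 'MethodCall:indent']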

4.3.2 N-gram Model Building

In this work, we use n-gram models to learn a probability distribution over token sequences using all extracted sequences. For every sequence extracted from a method, we add all of its subsequences to the model. For example, given a token sequence ABC extracted from a method, we add all of its ordered subsequences, i.e., A, B, C, AB, BC, and ABC, to the model. Note that we ignore non-contiguous subsequences, such as AC in this example. Smoothing is a common technique in n-gram models that helps handle unknown sequences. However, since the entire source code of a project is being used, we have the complete language of all possible sequences, which means smoothing is unnecessary because there are no unknown sequences. When building n-gram models, one important parameter is the Gram Size (details are in Section 4.3.3), which defines the length of the token contexts considered. N-gram models assume that each token depends only on the previous n − 1 tokens. To leverage n-gram models to generate probabilities of token sequences, given a specific gram size n, we build a set of internal probabilities. For example, the probability of a token sequence ABC

1https://eclipse.org/jdt/core/index.php

calculated by a 3-gram model is P(ABC) = P(A) · P(B|A) · P(C|AB). We refer to P(A), P(B|A), and P(C|AB) as internal probabilities. These internal probabilities can be reused to calculate the probabilities of sequences that share common subsequences. Thus, we store the internal probabilities to cut down the probability calculation time when building n-gram models of different gram sizes. After obtaining all the internal probabilities for an n-gram model, we use them to calculate the probabilities of token sequences. Previous studies that leverage n-gram models for code completion [88, 225, 271] found that 3-gram, 4-gram, 5-gram, and 6-gram models generated reasonable results. However, the appropriate n-gram size for detecting bugs is unknown. To answer this question, we build n-gram models with gram sizes from two to ten to study the impact of the gram size on the effectiveness of bug detection. The algorithm that we use to build the n-gram model is standard, as described in Section 4.2.
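The bookkeeping this implies can be sketched as follows (our simplification): extract all contiguous subsequences of each method's token sequence, count them, and memoize the internal conditional probabilities so that they can be reused across sequences.

from collections import Counter

def contiguous_subsequences(tokens):
    # All contiguous (ordered) subsequences, e.g. ABC -> A, B, C, AB, BC, ABC;
    # non-contiguous subsequences such as AC are ignored.
    n = len(tokens)
    return [tuple(tokens[i:i + l])
            for l in range(1, n + 1)
            for i in range(0, n - l + 1)]

# Count every subsequence of every extracted method sequence.
counts = Counter()
for seq in [["A", "B", "C"], ["A", "B", "D"]]:
    counts.update(contiguous_subsequences(seq))

# Internal probabilities such as P(C|AB) are computed from these counts once
# and memoized so they can be reused across sequences and gram sizes.
memo = {}
def internal_prob(context, token):
    key = context + (token,)
    if key not in memo:
        memo[key] = counts[key] / counts[context] if counts[context] else 0.0
    return memo[key]

print(internal_prob(("A", "B"), "C"))   # 0.5 in this toy corpus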

4.3.3 Bug Detection

Bugram detects potential bugs by calculating and ranking the probabilities of all token sequences. After obtaining the probabilities, Bugram ranks the sequences by probability in descending order and reports the sequences with the lowest probabilities, i.e., those at the bottom of the ranked list, as potential bugs.

Configurations

A few important factors affect the effectiveness of Bugram, i.e., the number of bugs Bugram can find. The four main factors are as follows. Section 4.4.2 describes the setup, tuning, and impact of these parameters.

• Gram Size - The size of an n-gram model.

• Sequence Length - The length of token sequences to be considered when building n-gram models and detecting bugs.

• Reporting Size - The number of sequences, in the bottom of the ranked list, which will be reported as bugs.

• Minimum Token Occurrence - The minimum number of times a token must occur in the software to be included in an n-gram model.

Gram Size n. As described in Section 4.2, the gram size is the size n of an n-gram model. The probability of a sequence is estimated by generating the sequence token by token, and the probability of each token is determined by the conditional probability given a history of up to n − 1 tokens. For example, given a token sequence S = ABCD, a 2-gram model considers the probabilities of each pair of sequential tokens and calculates the probability as P(S) = P(A) · P(B|A) · P(C|B) · P(D|C), while a 4-gram model considers the probabilities of each four sequential tokens and calculates the probability as P(S) = P(A) · P(B|A) · P(C|AB) · P(D|ABC). In this work, we build n-gram models with gram sizes from two to ten to find an appropriate gram size for detecting bugs.

Sequence Length l. To build n-gram models, token sequences are extracted from all methods of a project. The length of the token sequences extracted from different methods varies; it can be as small as one and as large as 200. Breaking these long token sequences into many small sequences may help us obtain fine-grained method usage scenarios and detect more bugs. In this work, we evaluate the impact of different sequence lengths on the performance of Bugram.

Reporting Size s. Different from rule-based bug detection techniques, Bugram detects bugs by identifying token sequences with low absolute probabilities. Thus, an important question is how to set an appropriate threshold to separate sequences that indicate bugs from common sequences that are not bugs. We use this parameter to select the bottom s sequences in the ranked list and report them as bugs. In general, a larger s allows Bugram to find more bugs at the cost of examining more sequences that potentially indicate bugs. We expect that as the probabilities increase in the ranked list, the percentage of true bugs decreases. An appropriate s should help Bugram find as many bugs as possible without losing much bug detection precision. The task of selecting the parameter s in Bugram is the counterpart of selecting a rule probability threshold in rule-based bug detection approaches, such as PR-Miner [144]. As described in Section 4.1, rule-based approaches select a high-probability rule based on the conditional probability of tokens. For example, if the probability P(C|AB) is higher than the threshold, {AB => C} will be selected as a rule, and any occurrence of the sequence AB followed by a call other than C is reported as a bug. According to the definition of conditional probability, P(C|AB) = P(ABC)/P(AB), meaning that rule-based techniques consider the probability of the sequence ABC and compare it to the background probability of AB. However, when the rule {EF => G} is evaluated, the background probability is that of the sequence EF, since the conditional probability is P(G|EF). To achieve optimal performance, rule-based methods should ideally find the correct individual threshold for each background probability. This is of course not practically feasible, which is the reason a single threshold is used in practice. In Bugram, we avoid this problem by directly ranking sequences by their absolute probabilities and reporting the bottom s sequences as potential bugs.


Pruning False Bugs

Bugram identifies token sequences with low probabilities as potential bugs. However, some low-probability token sequences are unusual/special uses of code and are not bugs; they are false bugs for the purpose of bug detection. To filter out false bugs, we reduce the number of reported bugs (also called the candidate bug set) by keeping only token sequences that appear at the bottom of at least two ranked lists generated by n-gram models with different sequence lengths. The rationale is that if a bug can be detected in multiple ranked lists, there is a higher chance that it is a true bug. Recall that, given a specific gram size, we generate multiple n-gram models with different sequence lengths ranging from two to ten. Therefore, we only report token sequences that are at the bottom of at least two different n-gram models with the same gram size but different sequence lengths.

For example, if the sequences ABCDE and BCD are ranked at the bottom of the list of 5-token sequences and the list of 3-token sequences, respectively, Bugram identifies them as an overlap (the two sequences contain a common substring) and reports ABCDE and BCD as one bug. More formally, we obtain the new candidate bug set using the following formula:

C(n, t) = ⋃_{i, j ∈ M, i ≠ j} (Bottom(n, t, i) ∩ Bottom(n, t, j))    (4.4)

where C(n, t) is the candidate bug set generated by an n-gram model with a reporting size of t, M is the set of sequence lengths, and i and j are two different sequence lengths. n is the gram size. Bottom(n, t, i) is the set of the bottom t token sequences generated by an n-gram model with sequence length i. Note that ∩ denotes the overlaps between the bottom sequences of two different n-gram models, and ⋃ denotes the union of these overlaps.
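A minimal sketch of this pruning step is given below (ours; the overlap test is simplified to one sequence containing the other as a contiguous subsequence, which is an assumption about how overlaps are matched).

from itertools import combinations

def contains(longer, shorter):
    # True if `shorter` occurs in `longer` as a contiguous subsequence.
    L, S = len(longer), len(shorter)
    return any(tuple(longer[i:i + S]) == tuple(shorter) for i in range(L - S + 1))

def overlaps(a, b):
    return contains(a, b) or contains(b, a)

def prune(bottom_lists):
    # Keep only sequences that overlap with a bottom sequence of at least one
    # other sequence length (Equation 4.4, simplified). `bottom_lists` maps a
    # sequence length to its bottom-t list of token tuples.
    kept = set()
    for (_, seqs_i), (_, seqs_j) in combinations(bottom_lists.items(), 2):
        for a in seqs_i:
            for b in seqs_j:
                if overlaps(a, b):
                    kept.update((a, b))
    return kept

bottoms = {5: [("A", "B", "C", "D", "E")], 3: [("B", "C", "D"), ("X", "Y", "Z")]}
print(prune(bottoms))   # ABCDE and BCD survive; XYZ is pruned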

4.4 Experimental Study

We evaluate Bugram in terms of the number of detected bugs and detection precision, and explore appropriate parameters for Bugram. All our experiments are conducted on a 4.0GHz i7-3930K desktop with 64GB of memory.

4.4.1 Evaluated Software

We evaluate Bugram on 16 widely-used open-source Java projects ranging from 36 thousand lines of code (KLOC) to almost one million lines of code (MLOC). These projects are selected because they are widely used; we also consider their sizes, so that both small and large projects are included. Table 4.1 lists their versions, numbers of files, lines of code (LOC), and numbers of methods. We used the latest version of each project.

4.4.2 Parameter Setting and Sensitivity

To build n-gram models and detect bugs effectively, we need to tune the parameters described in Section 4.3.3. We use three widely-used and representative projects from Table 4.1, i.e., Pig, Hadoop, and Solr, to study the impact of different parameters on the performance of Bugram.

Table 4.1: Projects used to evaluate Bugram

Project        Version  Files  LOC      Methods
Elasticsearch  1.4      3,130  272,261  28,950
GeoTools       13-RC1   9,666  996,800  89,505
jEdit          5.2.0    543    110,744  5,548
Proguard       5.2      675    69,376   5,919
Vuze           5500     3,514  586,510  37,939
Xalan          2.7.2    907    165,248  8,965
Hadoop         2.7.1    4,307  596,462  46,104
Hbase          1.1.1    1,392  465,456  42,948
Pig            0.15.0   948    121,457  9,323
Solr-core      5.2.1    1,061  146,749  9,938
Lucene         5.2.1    2,065  293,825  18,078
Opennlp        1.6.0    603    36,328   2,954
Struts         2.3.24   2,022  157,499  15,254
Zookeeper      3.5.0    492    61,708   5,034
Nutch          2.3.1    409    198,560  2,309
Cassandra      2.2.0    1,616  280,716  15,233

Specifically, we tune three of the four parameters, i.e., nine different gram sizes, nine different sequence lengths, and five different reporting sizes. For the minimum token occurrence, we remove any token that appears fewer than three times [158]. In total, there are 9 × 9 × 5 = 405 possible combinations of the four parameters. Note that, for each combination, we need to manually examine the reported bugs for each project, which is prohibitively expensive. To save effort, we only pick the three representative projects to tune the four parameters (in total, we need to manually examine the reported bugs of 405 × 3 combinations). In practice, if users can afford more time, they can tune on more projects to obtain a parameter combination that allows Bugram to detect bugs more effectively.

Setting Gram Size. Different gram sizes enable Bugram to use different internal probabilities to calculate the probabilities of token sequences. We build n-gram models for each project, with the gram size ranging from two to ten. To evaluate the performance of n-gram models with different gram sizes, we calculate the probabilities of all token sequences and rank them by probability in descending order; we then examine the bottom 10, 20, 30, 50, and 100 sequences of each n-gram model, respectively, and manually verify whether a token sequence contains a bug or not.

Figure 4.5: Impact of the gram size on the number of true bugs detected

For each n-gram model, we count the number of real bugs detected in the three projects. Figure 4.5 shows the results of the three projects combined. The results show that Bugram finds the largest number of true bugs with a 3-gram model. Thus, in this work, we build 3-gram models for Bugram to detect bugs.

Setting Sequence Length. As described in Section 4.3.3, we break long sequences extracted from a function into small subsequences. Different sequence lengths enable Bugram to capture different program scenarios and further affect the performance of Bugram. To evaluate the impact of different sequence lengths, we run Bugram with sequence lengths ranging from two to ten. For each sequence length, we build a 3-gram model and then calculate the probabilities of all sequences. Based on the generated probabilities, we rank all sequences and examine the bottom 50 low-probability sequences to check how many true bugs are detected. Table 4.2 shows the number of detected true bugs in the bottom 50 sequences with different sequence lengths. As we can see, the sequence length can significantly affect the performance of Bugram. N-gram models with sequence lengths ranging from three to eight enable Bugram to detect bugs effectively. For example, when the sequence length is equal to five, we find seven true bugs in Hadoop, three in Pig, and one in Solr. When the sequence length is very small (e.g., two) or very large (e.g., nine or ten), Bugram detects no bugs in two of the three examined projects. Thus, in this work, the sequence length ranges from three to eight.

Setting Reporting Size. In this work, we use this parameter to limit the number of sequences at the bottom of the ranked sequence list that are reported as bugs. An appropriate reporting size should help us identify many true bugs with a small number of false positives.

Table 4.2: Detected true bugs in the bottom 50 token sequences with different sequence lengths

           Sequence Length
Project    2   3   4   5   6   7   8   9   10
Pig        0   1   1   3   2   3   1   1   1
Hadoop     0   3   4   7   3   3   2   0   0
Solr       0   1   1   1   2   0   0   0   0

Figure 4.6: Probability distributions of all token sequences in Hadoop, Solr, and Pig

For each examined project, we build 3-gram models with sequence lengths ranging from three to eight. We first examine the probabilities of all sequences to explore whether there is a clear cutoff between low-probability sequences and high-probability sequences. We normalized the probabilities of all sequences in a project, since the range of the sequence probability distribution varies across projects, e.g., in Hadoop the sequence probabilities range over [1.0 × 10^-11, 5.4 × 10^-4], while for Pig the range is [2.84 × 10^-5, 2.7 × 10^-3]. Figure 4.6 shows the normalized probabilities of all sequences (ranked by probability); the X-axis is the number of sequences in different projects. As we can see, the probability curves are quite smooth for the bottom ten thousand sequences. However, it is prohibitively expensive to examine all these sequences. Next, we narrow down the reporting size by only looking at the bottom 100 sequences. Specifically, for each project, we examine how many true bugs are detected when the reporting size is equal to 10, 20, 30, 50, and 100, which means we only examine the bottom 10, 20, 30, 50, and 100 sequences in the ranked list.

Figure 4.7: Detection precision and number of detected bugs in the overlaps of the bottom s token sequences with low probability: (a) results of Hadoop, (b) results of Pig, (c) results of Solr

In practice, if developers can afford more time, they can examine more sequences to find more bugs. The numbers of detected true bugs and the detection precisions for the three projects are shown in Figure 4.7a, Figure 4.7b, and Figure 4.7c. As we can see, with an increasing reporting size, the number of detected bugs increases, while the corresponding detection precision declines sharply. When the reporting size is equal to 100, the corresponding detection precision is smaller than 20%. In this study, we set the reporting size to 20, which enables us to detect 13 true bugs with an average detection precision of 49% on the three examined projects.

Setting Minimum Token Occurrence. This parameter is the minimum number of times a token is required to appear in a program to be included in sequences. An appropriate value of this parameter helps filter out token sequences that use unusual/special methods and thus have low probabilities but are not bugs. In this study, we remove any token that appears fewer than three times. This is a common practice in NLP research, aimed at improving system performance [158].
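The normalization of sequence probabilities mentioned above is not specified in more detail; the sketch below assumes a simple min-max scaling of each project's sequence probabilities (an assumption on our part) so that the curves of different projects can be compared on a common scale.

def minmax_normalize(probs):
    # Scale one project's sequence probabilities into [0, 1]; min-max scaling
    # is our assumption, not necessarily the normalization used for Figure 4.6.
    lo, hi = min(probs), max(probs)
    span = (hi - lo) or 1.0
    return [(p - lo) / span for p in probs]

hadoop_like = [1.0e-11, 3.0e-7, 5.4e-4]
print(minmax_normalize(hadoop_like))   # [0.0, ~0.00056, 1.0]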

4.4.3 Comparison with Existing Techniques

We compare Bugram with five existing graph- and rule-based approaches. First, we choose the most closely related work, PR-Miner [144]. While comparing with PR-Miner allows us to compare Bugram with an existing approach as is, we also want to study the sole impact of using n-gram models. In addition to using n-gram models, the differences between Bugram and PR-Miner include (1) Bugram uses control flow information, but PR-Miner does not, and (2) Bugram preserves the token order, while PR-Miner ignores the token order. As discussed in Section 4.3, we use control flow information because it has been shown to be beneficial for bug detection techniques [38, 285]. Therefore, our second approach for comparison is identical to PR-Miner except that it considers both the order of tokens and

control flow information. Since PR-Miner is not publicly available, we have reimplemented our own version of it. We refer to our implementation of PR-Miner as FIM, which stands for Frequent Itemset Mining. We call our implementation of the second approach described above FSM, which stands for Frequent Sequence Mining, because FSM mines rules with order preserved, which is what frequent sequence mining does. To implement PR-Miner, we follow each step described in [144]: we first parse the source code and extract the variables, method calls, and classes in each function. After that, we hash the selected elements into numbers. Next, each function is mapped to an itemset. Then, using these itemsets, we perform frequent itemset mining, as provided by Weka [74], to mine frequent itemsets with a specific support and confidence. Finally, these frequent itemsets are treated as rules, and violations of these rules are identified as bugs. Note that FIM does not consider the order of tokens. Therefore, given a frequent itemset ABC, FIM may generate many rules, e.g., {A => BC}, {B => AC}, {AB => C}, {AC => B}, {C => AB}, and {BC => A}; any violations of these rules will be reported as potential bugs. FSM, in contrast, considers the order, so given the same frequent itemset ABC, FSM generates at most two rules, i.e., {A => BC} and {AB => C}. The more rules are inferred, the more potential bugs are likely to be reported. Thus, in practice, both the number of generated rules and the number of reported bugs of FIM are significantly larger than those of FSM. To reduce false positives, PR-Miner sets the support threshold to 15 and the confidence threshold to 90%. With these thresholds, PR-Miner reports many potential bugs, e.g., PR-Miner reported 1,447 potential bugs in the Linux kernel [144]. To save effort, its authors examined only the top 60 potential bugs ranked by confidence. These thresholds were only evaluated on three C projects. We find that they produce poor results on the Java projects used in this work, e.g., we find 0 true bugs among the top 60 ranked bugs in the three Java projects, i.e., Hadoop, Solr, and Pig. Therefore, to set appropriate support and confidence thresholds for FIM and FSM, we have explored FIM and FSM with different combinations of support and confidence on the three projects. For FIM, we find that it performs best on the three projects when the support is equal to seven and the confidence is larger than 75%, examining the top 80 sequences ranked by confidence (we have examined up to the top 100 and found that the top 80 gives the highest precision and recall). For FSM, it performs best on the three projects when the support is equal to five and the confidence is larger than 85%. For a fair comparison, we tune all four parameters of Bugram on the same three projects and apply the best parameters to the rest of the evaluated projects.
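For illustration, the sketch below mines association rules with support and confidence thresholds in the spirit of FIM, using Python's mlxtend library rather than the Weka implementation used in this work; the transactions and thresholds are toy values.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Each "transaction" is the set of method calls appearing in one function
# (toy data; FIM hashes real call sites extracted from the source code).
functions = [
    ["isDebugEnabled", "debug", "indent", "toString"],
    ["isDebugEnabled", "debug", "indent", "toString"],
    ["isDebugEnabled", "debug", "indent", "toString"],
    ["isDebugEnabled", "debug", "indent", "stringify"],
]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(functions).transform(functions), columns=te.columns_)

# Mine frequent itemsets and derive rules; note that mlxtend expresses support
# as a fraction, unlike PR-Miner's absolute support count.
itemsets = apriori(onehot, min_support=0.5, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.9)
print(rules[["antecedents", "consequents", "support", "confidence"]])
# Functions whose call sets violate a mined high-confidence rule would be
# reported as potential bugs.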

Second, we compare Bugram with three graph- and rule-based bug detection approaches, i.e., JADET [285], Tikanga [284], and GrouMiner [199]. These three approaches leverage graph models and frequent itemset mining techniques to mine rules that capture both method order and control flow information to detect bugs. Different from PR-Miner, which was evaluated on C projects, all three approaches were evaluated on open-source Java projects. Thus, Bugram can be applied to these projects directly. Since two of the three tools, i.e., Tikanga and GrouMiner, are not publicly available, to compare with these three approaches we run Bugram on the 14 projects evaluated by them, instead of implementing our own versions, and compare our detection results with the results reported in the three studies. In addition, since JADET is an open-source tool, we also apply JADET to the projects listed in Table 4.1 and compare its detection results with Bugram's.

4.4.4 Evaluation Measures

We manually examine the reported potential bugs and categorize them into three types: True Bugs, Refactoring Opportunities, and False Positives. True bugs are faults and can be fixed by altering the code and correcting its behaviour. Refactoring opportunities are bad practices and can be fixed by refactoring the infrequent code snippets to make them more regular. Any reported bugs that do not fit into the above two groups are considered to be false positives. We refer to the sum of true bugs and refactoring opportunities as True Positives. To evaluate the performance of a bug detection approach, we use three measures: standard precision, relative recall [236], and F1. Note that we use relative recall rather than standard recall, because it is not practical to know all bugs in a project. The precision is (True Positives)/(Reported Potential Bugs). To calculate the relative recall, we first define the Relative Ground Truth [236] as all the unique true positives reported by Bugram, FIM, and FSM. For each of the three approaches, we calculate its relative recall as (True Positives)/(Relative Ground Truth). F1 is the harmonic mean of the precision and the relative recall.
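These measures can be computed directly from the definitions above, as in the following sketch (the variable names are ours); the example reuses Bugram's totals from Table 4.3.

def evaluate(true_positives, reported, relative_ground_truth):
    # Precision, relative recall, and F1 as defined above.
    # true_positives: true bugs + refactoring opportunities found by a tool;
    # reported: all potential bugs the tool reported;
    # relative_ground_truth: unique true positives reported by any of the tools.
    precision = true_positives / reported if reported else 0.0
    relative_recall = true_positives / relative_ground_truth
    f1 = (2 * precision * relative_recall / (precision + relative_recall)
          if precision + relative_recall else 0.0)
    return precision, relative_recall, f1

# Bugram in Table 4.3: 42 true positives out of 59 reported potential bugs,
# against a relative ground truth of 77 unique true positives.
print(evaluate(42, 59, 77))   # approximately (0.712, 0.545, 0.617)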

4.4.5 Manual Examination of Reported Results

Following prior work [30,38,70,144,223,269,284,285], we manually check whether the bugs reported by Bugram are true positives. For manual evaluation, a token sequence was only

considered buggy if both the first author and a non-author graduate student agreed. Given a reported buggy sequence, we consider it a bug if it meets one of the following conditions:

• It has obviously incorrect project-specific function calls.

• It violates common API usages, e.g., exception-handling API usages2 and log-related API usages3. For exception-handling APIs, we detect several true bugs that violate their usages, e.g., in the true bug in Figure 4.8(b), developers did not handle all potential exceptions that might be thrown by the method waitForCompletion, which may crash the system if any of these exceptions is thrown. For log-related APIs, one common usage is that the checked log level and the used log level should be the same. Several of our detected true bugs violate this usage, e.g., in the true bug in Figure 4.8(c), before calling the method info() to log messages, developers checked whether the debug level is enabled instead of checking whether the info level is enabled, which is incorrect.

We consider a reported buggy sequence a refactoring issue based on principles of code smells proposed in [160], e.g., duplicated code—when two code fragments look almost identical. Duplicated code could hinder maintenance because developers need to track and modify each repeating fragment. Developers could refactor the duplicated code by extracting it into a function.

4.5 Results and Analysis

This section presents the results of detected bugs (Section 4.5.1 and Section 4.5.2) including the comparison with existing graph- and rule-based techniques, detected bug examples (Section 4.5.3), and the execution time of Bugram (Section 4.5.4).

4.5.1 Comparison with FSM and FIM

Table 4.3 shows the number of bugs detected by Bugram, FSM, and FIM on each evaluated project. In total, Bugram reported 59 potential bugs, 42 of which are correct and useful: 25 true bugs and 17 refactoring opportunities. We have reported these true bugs to the developers; 7 of them have already been confirmed, while the rest await confirmation.

2https://docs.oracle.com/javase/tutorial/essential/exceptions
3https://logging.apache.org/log4j/1.2/manual.html

Table 4.3: Bug detection results. Reported is the number of reported bugs, TBugs is the number of true bugs, and Refs is the number of refactoring opportunities. We manually inspect all reported bugs except for FIM, whose 'Inspected' column shows the number of bugs inspected. Numbers in brackets are the numbers of true bugs detected by Bugram that are detected by neither FIM nor FSM.

                 Bugram                      FSM                      FIM
Project          Reported  TBugs   Refs     Reported  TBugs  Refs    Reported  Inspected  TBugs  Refs
Elasticsearch    3         1(1)    0        0         0      0       987       80         0      2
GeoTools         4         2(2)    1        4         0      2       1,203     80         1      2
JEdit            3         0       1        0         0      0       451       80         0      2
Proguard         1         1(1)    0        1         1      0       665       80         1      0
Vuze             2         0       2        0         0      0       435       80         0      4
Xalan            3         2(2)    0        0         0      0       378       80         0      0
Hadoop           13        7(6)    4        13        3      0       869       80         2      3
Hbase            1         1(1)    0        10        2      3       774       80         1      0
Pig              9         3(2)    4        6         0      2       605       80         1      3
Solr-core        5         3(3)    1        0         0      0       787       80         0      1
Lucene           2         0       0        10        2      2       676       80         1      0
Opennlp          6         2(2)    2        3         0      0       806       80         0      2
Struts           5         1(1)    2        9         1      0       232       80         0      0
Zookeeper        0         0       0        3         0      0       442       80         0      1
Nutch            2         2(2)    0        1         0      1       253       80         1      1
Cassandra        0         0       0        7         0      1       324       80         0      2
Total            59        25(23)  17       67        9      11      9,887     1,280      8      23
Relative Recall  54.5%                      26.0%                    40.3%
Precision        71.2%                      29.9%                    2.4%
F1               61.7%                      27.8%                    4.5%

The results suggest that Bugram is effective in finding real bugs in widely-used mature software projects and thus helps improve software reliability. As described in Section 4.4.4, the relative recall shows the ability of a technique to find new bugs, while the precision indicates its ability to avoid reporting false bugs; F1 is the harmonic mean of precision and relative recall. The relative recall of Bugram is 54.5%, which is higher than FIM's relative recall of 40.3% and FSM's relative recall of 26.0%. The detection precision of Bugram is 71.2%, which is higher than FIM's precision of 2.4% and FSM's precision of 29.9%. The F1 of Bugram is 61.7%, again higher than FIM's F1 of 4.5% and FSM's F1 of 27.8%. These results suggest that Bugram can find more bugs than the examined rule-based approaches and is more precise than them, suggesting that Bugram complements existing rule-based bug detection techniques. In addition, as shown in Table 4.3, among the 25 true bugs detected by Bugram, only two can also be detected by FIM and FSM. The majority (23) of the true bugs can only

be detected by Bugram, showing that Bugram can find real bugs in real-world software that the examined rule-based approaches cannot find. In comparison, FIM detected eight true bugs and 23 refactoring opportunities. FSM reported 67 potential bugs, nine of which are true bugs and 11 of which are refactoring opportunities. Since six true bugs are detected by both FIM and FSM, a total of 11 unique true bugs are detected by these two approaches, nine of which cannot be detected by Bugram. In total, the three approaches produce 77 unique true positives. Since FSM considers both the control flow and the order of tokens, both the number of rules and the number of bugs discovered by FSM are much smaller than those of FIM: Table 4.3 shows that FIM reported a total of 9,887 potential bugs, while FSM only reported 67 potential bugs. As we described in Section 4.1, Bugram and FIM detect bugs based on different probability distributions of token sequences. Bugram identifies token sequences with low absolute probabilities as bugs, while FIM and FSM identify token sequences with low conditional (relative) probabilities as bugs; a token sequence with a low conditional probability might still have a high absolute probability and thus a high rank among all token sequences extracted from a project. Thus, our results suggest that Bugram and rule-based bug detection techniques complement each other to detect more bugs.

4.5.2 Comparison with JADET, Tikanga, and GrouMiner

As described in Section 4.4.3, we also compare Bugram with three graph- and rule-based bug detection approaches, i.e., JADET [285], Tikanga [284], and GrouMiner [199]. These approaches have been evaluated on Java projects, and their authors have reported the numbers of detected potential bugs and manually identified true bugs. Since JADET, Tikanga, and GrouMiner each reported many potential bugs for the evaluated projects, to save effort, the authors of the three tools manually verified only a subset of the reported potential bugs, i.e., the top 10 in JADET, the top 25% in Tikanga, and the top 15 in GrouMiner. For a fair comparison, we apply Bugram to the projects evaluated by these approaches and use the same Bugram parameters that are used in the comparison with PR-Miner, meaning that Bugram's parameters are not tuned for these projects. JADET was evaluated on five projects, Tikanga on six projects, and GrouMiner on nine projects. We exclude four projects that are no longer publicly available. In total, 16 projects (14 unique) are available (Table 4.4). Table 4.4 shows the detection results of Bugram and the three bug detection approaches. JADET detected three true bugs on the three projects, while Bugram detected five true bugs on the same projects, at least one of which cannot be detected by JADET.

Table 4.4: Comparison with JADET, Tikanga, and GrouMiner. 'Fixed' denotes the number of true bugs detected by Bugram that have already been fixed in later versions. * denotes the number of unique true bugs detected by Bugram that the tools in comparison failed to detect.

                            Bugram
Project          JADET      Reported  TBugs  Fixed
AZUREUS 2.5.0    1          8         4      4
columba-1.2      0          4         1*     1
aspectj-1.5.3    2          5         0      0
Total            3          17        5      5

                            Bugram
Project          Tikanga    Reported  TBugs  Fixed
aspectj-1.5.3    9          5         0      0
tomcat-6.0.18    0          13        4*     2
argouml-0.26     1          3         1      1
Vuze 3.1.1.0     0          8         0      0
columba-1.4      1          6         1      1
Total            11         35        6      4

                            Bugram
Project          GrouMiner  Reported  TBugs  Fixed
columba-1.4      1          6         1      1
ant-1.7.1        1          3         1      1
log4j-1.2.15     0          7         2*     2
aspectjrt-1.6.3  1          12        2      1
axis-1.1         0          6         3*     1
jedit-3.0        1          5         1      1
jigsaw-2.0.5     1          1         1      1
struts-1.2.6     0          8         0      0
Total            5          48        11     8

Since these papers did not report the full lists of detected bugs, we do not know whether the bugs detected by JADET and Bugram overlap. However, since JADET detected 0 true bugs in columba while Bugram detected one, we know that Bugram detected at least one bug that JADET cannot detect. We also found that all five true bugs detected by Bugram have been fixed in a later version by developers. For the same reason as above, we do not know whether the bugs detected by JADET have been fixed in a later version. Tikanga detected 11 true bugs on the five projects, while Bugram detected six true bugs on the same projects, four of which are unique to Bugram (detected in tomcat); four of the six true bugs have already been fixed. GrouMiner detected five true bugs on the eight projects, while Bugram detected 11

4.5.3 Examples

Example Bugs. We show some of the detected true bugs in Figure 4.8. Specifically, Figure 4.8 shows three true bugs detected by our tool in ProGuard and Nutch; FIM and FSM fail to detect them. We reported the three bugs to the ProGuard and Nutch developers, and all of them have been confirmed as true bugs. In addition, the bugs in Figure 4.8(a) and Figure 4.8(c) have already been fixed by developers. The bug in Figure 4.8(a) is caused by an incorrect API usage. Specifically, the instantiated ConfigurationWriter (writer) should be closed in a finally block instead of a try block. To fix this bug, instead of closing the writer in the try block, developers added a finally block and closed the writer in it.

(a) A confirmed and fixed true bug from ProGuard (BugID: 582).

try {
    ConfigurationWriter writer = new ConfigurationWriter(file);
    writer.write(getProGuardConfiguration());
    writer.close();
} catch (Exception ex) {...}

(b) A confirmed true bug from Nutch (BugID: NUTCH-2076).

try {
    currentJob.waitForCompletion(true);
} finally {
    ...
}

(c) A confirmed and fixed true bug from Nutch (BugID: NUTCH-2256).

if (LOG.isDebugEnabled()) {
    LOG.info("Crawl delay for queue: "...);
}

Figure 4.8: True bug examples from version 5.2 of ProGuard (a) and version 2.3.1 of Nutch (b and c)

Figure 4.8(b) also shows a true bug. The method waitForCompletion might throw several exceptions, e.g., ClassNotFoundException and IOException, but in this case developers used the method without handling these potential exceptions. To fix this bug, developers should either use a catch block to handle the exceptions or raise the exceptions to be handled by the calling functions. Figure 4.8(c) shows another true bug. This bug is caused by the inconsistency between the checked log level (debug) and the used log level (info). To fix this bug, developers replaced the method info() with the method debug() to make it consistent with the checked log level.

Example Refactoring Opportunities. Figure 4.9 shows an example of a refactoring opportunity detected by our tool in the Pig project. In this example, developers first define two variables, dt1 and dt2, to keep the data obtained via the method call get() on the objects bb1 and bb2. Next, in the switch block, one of the case branches needs data from bb1 and bb2, although such data is already kept in dt1 and dt2. Instead of reusing these two variables, developers call get() on bb1 and bb2 to obtain the data again. This costs extra memory and time and should be refactored by using the variables dt1 and dt2 directly in lines 6 and 7. Some of our detected bugs may appear to be simple, but many (7) of them have been confirmed or fixed (4 confirmed and fixed, 2 confirmed with patches proposed, 1 confirmed) by the developers of these projects, suggesting the value of our approach.

    byte dt1 = bb1.get();
    byte dt2 = bb2.get();
    switch (dt1) {
        case BinInterSedes.BIGINTEGER: {
            if (...) {
                int sz1 = readSize(bb1, bb1.get());
                int sz2 = readSize(bb2, bb2.get());
            }
        }
        ...
    }

Figure 4.9: A refactoring bug example from version 0.15.0 of Pig

Table 4.5: Time cost of Bugram (in seconds).

Project        Total   Tokenization   Model Building and Bug Detection
Elasticsearch    162            160                                  2
GeoTools         878            872                                  6
jEdit             86             85                                  1
Proguard          61             60                                  1
Vuze             312            310                                  2
Xalan             80             79                                  1
Hadoop           447            443                                  4
Hbase            151            149                                  2
Pig               93             91                                  2
Solr-core         97             95                                  2
Lucene           206            203                                  3
Opennlp           50             49                                  1
Struts           223            220                                  3
Zookeeper         42             41                                  1
Nutch             36             35                                  1
Cassandra        157            155                                  2

4.5.4 Execution Time and Space

We collect the time and space costs for all 16 projects listed in Table 4.1, and the details are presented in Table 4.5. We can see that the total execution time for tokenization, model building, and bug detection varies from 50 to 878 seconds. Our largest evaluated project, GeoTools, uses 2GB of memory. As shown in the table, most of the time is spent on building ASTs with type information, with only a fraction spent on building n-gram models and detecting bugs. The results demonstrate Bugram's practical value.

4.6 Threats to Validity

Implementation of PR-Miner. To compare Bugram with rule-based bug detection approaches, we have reimplemented a rule-based approach, PR-Miner [144], since PR-Miner is not publicly available. The PR-Miner paper reported higher precision than what we obtained with our implementation of PR-Miner in this study. One possible reason is that PR-Miner has only been evaluated on C projects: its false positive pruning approach and its recommended threshold values of support and confidence may only be effective on C projects, while in this study we evaluate on Java projects. For our implementation of PR-Miner, we have tried our best to tune parameters, e.g., the threshold values of support and confidence, to obtain the best results. This is our best effort given that PR-Miner is not publicly available. Our comparison is fair since both Bugram and PR-Miner are tuned and evaluated on the same projects.

Bugs are verified by the authors. Following prior work [30,70,144,223,269,284,285], we manually check whether the potential bugs reported by the tools are true positives. Although this approach is a common practice, the process may contain bias since we are not the developers of these projects. We mitigate this threat by sending the bugs to developers for further confirmation.

4.7 Summary

In this chapter, we introduce Bugram, which leverages n-gram language models to detect bugs. Bugram detects potential bugs by calculating and ranking the probabilities of program token sequences based on the probability distribution of program tokens in a project. Low probability token sequences are flagged as potential bugs. We evaluate Bugram in two ways. First, we compare it with two rule-based bug detection approaches on 16 projects. Results show that Bugram detects 25 true bugs, 23 of which cannot be detected by PR-Miner. Second, we further apply Bugram to 14 projects evaluated by three graph- and rule-based tools, i.e., JADET, Tikanga, and GrouMiner. Bugram detects 21 true bugs, at least 10 of which cannot be detected by these three tools. Our results suggest that Bugram is complementary to existing rule-based bug detection approaches.

Chapter 5

Leveraging Machine Learning Classifiers to Improve Regression Testing

This chapter presents our new test case prioritization technique, i.e., QTEP, which leverages machine learning models to evaluate source code quality and then adapts existing test case prioritization algorithms by considering the weighted source code quality. QTEP was presented at the 11th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE'17 [281]).

5.1 Motivation

Modern software constantly evolves as developers make source code changes such as fixing bugs, adding new features, refactoring existing code, etc. To ensure that the changes do not introduce new bugs, regression testing is commonly conducted against existing functionalities. However, regression testing can be expensive, especially for large projects, where it could consume 80% of the overall testing budget and require weeks to run all test suites [37, 59, 234]. Intuitively, test cases that could reveal bugs should be run earlier so that developers have more time to fix the revealed bugs and speed up system delivery. Along this line, test case prioritization (TCP) has been proposed and intensively studied for regression testing [36,58,142,147,155,166,173,174,234,235,247, 270,312,315]. TCP techniques reorder test cases to maximize a certain objective function,

typically exposing faults earlier [234, 314]. Researchers have applied TCP techniques to projects from Google [59, 147, 314] and Salesforce.com [34] and have shown that TCPs could significantly improve the efficiency of regression testing. Most existing TCPs are coverage-based [43,86,101,102,142,153,154,166,234,312,319]. A typical coverage-based TCP technique leverages coverage information between source code and test cases, i.e., static code coverage (static call graphs from the current version) and dynamic code coverage (dynamic call graphs from the last execution), to schedule test cases by maximizing code coverage under different coverage criteria. Coverage-based TCPs assign higher priorities to test cases that have higher dynamic or static code coverage. Existing TCP techniques often do not take the likely distribution of faults in source code into consideration. In other words, they assume that faults in program source code are equally distributed. However, as reported in existing work [203, 205, 256], the fault distribution in source code is often unbalanced, i.e., around 80% of faults are located in about 20% of the source code [203]. Intuitively, test cases that cover the more fault-prone code are more likely to reveal bugs, so they should be run with a higher priority. The goal of this study is to propose a quality-aware TCP technique, QTEP, that addresses the above limitation of existing TCPs. We evaluate the quality of source code in terms of fault-proneness and then use the quality information to prioritize test cases. To achieve this goal, QTEP gives more weight to fault-prone source code so that test cases covering the fault-prone code have a higher priority to be executed. We identify fault-prone source code in a software project by using defect prediction models, which are widely used to help developers find bugs [125,221]. In addition, we apply QTEP to both regression and new test cases. Most existing TCP techniques only focus on prioritizing regression test cases [43,101,102,142,166,234,312,319]. However, in real-world testing practice, the test suite for a modified software system often consists of: (1) existing test cases, i.e., regression test cases, which are designed to verify whether the existing functionalities still perform correctly after changes, and (2) new test cases, which are added to test the modification. During software evolution, these two types of test cases are essential for testing the modified software and detecting bugs [153]. Thus, we consider both types of test cases in this study.

Figure 5.1: Overview of the proposed QTEP. V_n and V_n+1 are two versions. C_1 to C_m are consecutive commits between these two versions that introduce changes to the source code and test cases.

5.2 Approach

Figure 5.1 illustrates that our approach consists of three steps: (1) leveraging the defect prediction model based code inspection technique to detect fault-prone source code (Section 5.2.1), (2) weighting source code based on results from the code inspection approach (Section 5.2.2), and (3) adapting existing TCP techniques and evaluating the results (Section 5.2.3).

5.2.1 Fault-prone Code Inspection with Defect Prediction Model

Software defect prediction techniques (DP) leverage various software metrics to build machine learning models to predict unknown defects in the source code [82,104,170,190,192, 256, 280, 289, 318, 326]. Based on the prediction results, software quality assurance teams can focus on the most defective parts of their projects in advance. Typically, defect prediction models can be categorized as supervised or unsupervised models. Most existing defect prediction models are supervised [82, 104, 170, 190, 256, 280, 289, 326]. These models leverage past defects from software historical data to build machine learning classifiers and then use the classifiers to predict future bugs. However, not all projects have enough defect data to build a defect prediction model, so unsupervised models [192,318] have also been proposed based on the characteristics of defect prediction metrics. To directly compare QTEP to existing TCP techniques, we reuse 33 versions of 7 open-source Java projects from previous TCP studies [55,166,235]. Some of these projects do not have well-maintained past defect data, which means we do not have enough defect data to build supervised models. Thus, we adopt the state-of-the-art unsupervised defect prediction model, CLAMI [192], to detect fault-prone source code in this study. Specifically, to accommodate different test case prioritization strategies and scenarios, we build CLAMI models at both the method and class levels, and CLAMI directly outputs the lists of predicted buggy methods and classes.

5.2.2 Weighting Source Code Units

We leverage the detection results from CLAMI to weight the fault-prone source code units. A code unit could be a statement, a branch, a method, or a class, depending on the test case prioritization strategy. In this work, we focus on weighting statement-level and method-level code units since we use statement-level and method-level coverage criteria to examine coverage-based TCPs (Section 2.3). Algorithm 1 shows how to weight statement-level and method-level code units by using the detection results (i.e., detected buggy classes and methods) from CLAMI. Initially, the algorithm assigns a default weight to all code units (i.e., all statements and methods). Then, given a code unit, if the class or the method that contains this code unit is detected as buggy, the weight of this code unit is increased by the weight of the buggy class or the buggy method. Otherwise, if the class or the method that contains this code unit is identified as clean, its weight is not updated. Code units that are not covered by any buggy class or method keep the default weight. Parameters: The above algorithm has three parameters, i.e., weight_base, weight_c, and weight_m, that could affect the effectiveness of the proposed QTEP. We describe the setup, tuning, and impact of these three parameters in Section 5.3.6.

• weight_base is the base weight for all code units, i.e., the default weight for initializing the weights of all code units.

Algorithm 1 Weighting source code units

Input: Inspection results at class level C_Buggy and at method level M_Buggy. The sets of all methods M and all statements S to be weighted. Parameters weight_base, weight_c, and weight_m.
Output: Weighted method set M_Weighted and statement set S_Weighted.

    // Initialization: set default weights for the examined code units.
    for each unweighted method M_Weighted_i in M_Weighted do
        M_Weighted_i = weight_base
    end for
    for each unweighted statement S_Weighted_i in S_Weighted do
        S_Weighted_i = weight_base
    end for
    // Weight methods
    for each unweighted method M_i in M do
        if M_i in M_Buggy then
            M_Weighted_i += weight_m
        end if
        if C_Buggy contains M_i then
            M_Weighted_i += weight_c
        end if
    end for
    // Weight statements
    for each unweighted statement S_i in S do
        if M_Buggy contains S_i then
            S_Weighted_i += weight_m
        end if
        if C_Buggy contains S_i then
            S_Weighted_i += weight_c
        end if
    end for

• weight_c is the weight for buggy classes detected by code inspection techniques.

• weight_m is the weight for buggy methods detected by code inspection techniques.

For CLAMI, we use the class-level prediction results as seeds to weight code units using weight_c, and use the method-level prediction results as seeds to weight code units using weight_m.
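To make the weighting procedure concrete, the following is a minimal Java sketch of Algorithm 1, not the thesis implementation: the CodeUnitWeighter class, the map/set inputs, and the use of string identifiers for code units, classes, and methods are illustrative assumptions, and CLAMI's predictions are assumed to be given as sets of buggy class and method names.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;

    // A minimal sketch of Algorithm 1 under the assumptions described above.
    public class CodeUnitWeighter {

        private final double weightBase;  // default weight for every code unit
        private final double weightC;     // extra weight if the enclosing class is predicted buggy
        private final double weightM;     // extra weight if the enclosing method is predicted buggy

        public CodeUnitWeighter(double weightBase, double weightC, double weightM) {
            this.weightBase = weightBase;
            this.weightC = weightC;
            this.weightM = weightM;
        }

        /** Assigns a weight to each code unit based on class- and method-level predictions. */
        public Map<String, Double> weight(Set<String> codeUnits,
                                          Map<String, String> unitToClass,
                                          Map<String, String> unitToMethod,
                                          Set<String> buggyClasses,
                                          Set<String> buggyMethods) {
            Map<String, Double> weighted = new HashMap<>();
            for (String unit : codeUnits) {
                double w = weightBase;                           // initialization with the default weight
                if (buggyMethods.contains(unitToMethod.get(unit))) {
                    w += weightM;                                // enclosing method predicted buggy
                }
                if (buggyClasses.contains(unitToClass.get(unit))) {
                    w += weightC;                                // enclosing class predicted buggy
                }
                weighted.put(unit, w);
            }
            return weighted;
        }
    }

Code units whose enclosing class and method are both predicted clean simply keep weight_base, mirroring the algorithm's default case.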

5.2.3 Quality-Aware Test Case Prioritization

After weighting all the source code units of a project, we then adapt existing coverage-based TCP techniques using these weighted code units. Compared to existing coverage-based TCPs, QTEP leverages the quality-aware coverage information of test cases. As described in Section 2.3, QTEP explores two different types of coverage, i.e., method coverage and statement coverage. In this section, we show how to calculate the quality-aware statement and method coverage of a test case.

A project P has m method-level code units, i.e., {mc_1, mc_2, ..., mc_m}, and s statement-level code units, i.e., {sc_1, sc_2, ..., sc_s}. Its test suite T consists of n test cases, i.e., {t_1, t_2, ..., t_n}. M_Weighted ({mw_1, mw_2, ..., mw_m}) and S_Weighted ({sw_1, sw_2, ..., sw_s}) are the weight sets for the method-level and the statement-level code units respectively.

Given a test case t_i (1 ≤ i ≤ n), we use QMCoverage[t_i] and QSCoverage[t_i] to denote its quality-aware method coverage and statement coverage respectively.

QMCoverage[t_i] = \sum_{j=1}^{m} cover(t_i, mc_j) * mw_j    (5.1)

QSCoverage[t_i] = \sum_{j=1}^{s} cover(t_i, sc_j) * sw_j    (5.2)

where cover(t_i, mc_j) or cover(t_i, sc_j) is 1 if test case t_i covers code unit mc_j or sc_j, and 0 otherwise. mw_j and sw_j are the weights for method-level code unit mc_j and statement-level code unit sc_j respectively. Note that, to calculate the quality-aware coverages for test cases, one could leverage different coverage information, i.e., dynamic coverage and static coverage information. With the quality-aware coverages (i.e., QMCoverage and QSCoverage) of each test case, QTEP further prioritizes test cases with different prioritization strategies (i.e., total and additional).
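As an illustration of Equations (5.1) and (5.2), the sketch below computes the quality-aware coverage of a single test case from a boolean coverage vector and the code-unit weights; the class and method names and the input encoding are assumptions made for illustration, not part of QTEP's actual implementation.

    // A minimal sketch: the quality-aware coverage of a test case is the sum of the weights
    // of the code units it covers. The coverage vector and weight array are assumed inputs.
    public class QualityAwareCoverage {

        /**
         * @param covers  covers[j] is true if the test case covers code unit j
         * @param weights weights[j] is the weight of code unit j (mw_j or sw_j)
         * @return the quality-aware coverage (QMCoverage or QSCoverage) of the test case
         */
        public static double coverage(boolean[] covers, double[] weights) {
            double sum = 0.0;
            for (int j = 0; j < covers.length; j++) {
                if (covers[j]) {
                    sum += weights[j];   // cover(t_i, c_j) * w_j with cover() being 0 or 1
                }
            }
            return sum;
        }

        public static void main(String[] args) {
            boolean[] covered = {true, false, true};
            double[] unitWeights = {1.0, 18.0, 12.0};   // e.g., weight_base plus boosted fault-prone units
            System.out.println(coverage(covered, unitWeights)); // 13.0
        }
    }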

5.3 Experimental Study

5.3.1 Research Questions

We answer the following research questions to evaluate the performance of QTEP.

RQ1. Is QTEP more effective than the state-of-the-art coverage-based TCPs for regression test cases only?

RQ2. Is QTEP more effective than the state-of-the-art coverage-based TCPs for all test cases (both regression and new test cases)?

RQ3. How much time can QTEP save for finding bugs compared to existing TCPs?

Table 5.1: Experimental subject programs. VPair denotes a version pair. RTC, RTM, and RF are the number of regression test classes, regression test methods, and regression faults respectively. NTC, NTM, and NF are the number of new test classes, new test methods, and mutation faults for new test cases respectively.

No.  Project       VPair         #RTC  #RTM   #RF  #NTC  #NTM  #NF
P1   Time&Money    3.0-4.0         15    143    1     7    32  1*100
P2   Time&Money    4.0-5.0         16    159    1     8    24  1*100
P3   Mime4J        0.50-0.60       24    120    3    21   139  3*100
P4   Mime4J        0.61-0.68       57    348    4     6    72  4*100
P5   Jaxen         1.0b7-1.0b9     12     24    3     0     0  -
P6   Jaxen         1.1b6-1.1b7     41    243    1    28   250  1*100
P7   Jaxen         1.1b9-1.1b11    69    645    1     7    29  1*100
P8   Xml-Security  1.0-1.1         15     91    2     3    29  2*100
P9   XStream       1.20-1.21      115    637    1     8    38  1*100
P10  XStream       1.21-1.22      124    698    2     7    58  2*100
P11  XStream       1.22-1.30      133    768   11    19   134  11*100
P12  XStream       1.30-1.31      150    885    3     9    76  3*100
P13  XStream       1.31-1.40      140    924    7    18   180  7*100
P14  XStream       1.41-1.42      157  1,200    5     3    23  5*100
P15  Commons-Lang  3.02-3.03       83  1,698    1     7   122  1*100
P16  Commons-Lang  3.03-3.04       83  1,703    2    13   119  2*100
P17  Joda-Time     0.90-0.95       10    219    2     1    43  2*100
P18  Joda-Time     0.98-0.99       71  1,932    2     9   211  2*100
P19  Joda-Time     1.10-1.20       90  2,420    1     3   415  1*100
P20  Joda-Time     1.20-1.30       93  2,516    3    11   532  3*100

5.3.2 Supporting Tools

In this study, we focus on coverage-based TCPs, which require both dynamic and static code coverage information of test cases. To collect dynamic code coverage, following existing work [153, 154, 235], we use the ASM manipulation and analysis framework under the FaultTracer tool [320] to collect the dynamic code coverage information for test cases. To collect static code coverage, following existing work [154,166], we use the WALA framework [7] to collect the static call graphs for the test cases, and traverse the call graphs to obtain the involved methods and statements for each test method and test class. For the unsupervised defect prediction model, i.e., CLAMI, we use its publicly available implementation1. All the test prioritization techniques have been implemented in Java and all the experiments were carried out on a 4.0GHz i5-2400 desktop with 6GB of memory.

1https://github.com/lifove/CLAMI/

5.3.3 Subject Systems, Test Cases, and Faults

To facilitate the replication and verification of our experiments, we choose 33 versions from 7 open-source Java projects, which are widely utilized as benchmarks for the real-world test case prioritization problem [55,166,235]. Table 5.1 lists all the projects and their detailed statistics. The sizes of these systems vary from 5.7K LOC (Time&Money) to 114.1K LOC (Joda-Time). For regression test cases, following existing work [166,235], for each listed version-pair we use real-world regression faults. Each version-pair has at least one real-world regression fault, which makes at least one regression test case fail on the later version. For example, there are 11 regression faults (#RF) in project P11 in Table 5.1. Since not all benchmark projects have faults for new test cases, following existing work [12, 54, 86, 112, 153], we use mutation faults when considering the new test cases. We generate mutation faults using a set of carefully selected mutation operators [13], e.g., logical, arithmetic, and statement deletion operators. Specifically, we use the Major mutation tool [6] to generate these mutation faults for new test cases. Note that not all generated mutation faults can be revealed by test cases; thus, we use the subset of detected faults obtained by further running Major with all test cases. For each project, we randomly select m mutation faults killed by new test cases only. We set m equal to the number of regression faults to simulate the real-world testing scenario. To mitigate randomness, we repeat this process 100 times. For instance, we randomly select 11 mutation faults and repeat this 100 times, denoted as 11*100 (#NF of P11 in Table 5.1). Thus, we have 100 fault version-pairs for each of the experimental subjects when considering new test cases.

5.3.4 Evaluation Measure

We use the Average Percentage of Faults Detected (APFD) [233], a widely used metric for evaluating the performance of TCP techniques. APFD measures the average percentage of faults detected over the life of a test suite and is defined by the following formula:

APFD = 1 - \frac{\sum_{i=1}^{numf} TF_i}{numt \times numf} + \frac{1}{2 \times numt}    (5.3)

where numt denotes the total number of test cases, numf denotes the total number of detected faults, and TF_i (1 ≤ i ≤ numf) denotes the smallest number of test cases in sequence that need to be run in order to expose fault i. APFD values range from 0 to 1.

Table 5.2: The experimental scenarios for TCPs.

Test Granularities    Test Case Types
Method (M)            Regression test cases (R)
Class (C)             Regression + New test cases (RN)

Table 5.3: The experimental independent variables of QTEP.

IV1: Coverage Techniques    IV2: Coverage Criteria    IV3: Prioritization Strategies
Dynamic (D)                 Method (M)                Total (T)
Static-JUPTA (J)            Statement (S)             Additional (A)

For any given test suite, its numt and numf are fixed. A higher APFD value indicates that the average value of TF_i is lower and thus represents a higher fault-detection rate.
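For reference, the following is a small Java sketch of Equation (5.3); the inputs (the first-exposing positions TF_i and the total number of test cases) are assumed to be computed elsewhere from the prioritized test order.

    // A minimal sketch of the APFD metric. tf[i] is the 1-based position of the first test case
    // in the prioritized order that exposes fault i; numTests is the total number of test cases.
    public class Apfd {

        public static double apfd(int[] tf, int numTests) {
            int numFaults = tf.length;
            double sum = 0.0;
            for (int position : tf) {
                sum += position;
            }
            return 1.0 - sum / (numTests * (double) numFaults) + 1.0 / (2.0 * numTests);
        }

        public static void main(String[] args) {
            // Example: 10 test cases, 2 faults first exposed by the 1st and 3rd test case.
            System.out.println(apfd(new int[]{1, 3}, 10)); // 0.85
        }
    }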

5.3.5 Experimental Scenarios

Table 5.2 shows the TCP scenario options in our experiments. By combining these options, four different TCP scenarios can be defined. We first conduct TCP at two different granularity levels, i.e., method (M) and class (C). In addition, following existing work [153], we also conduct TCP for (1) regression test cases only (R), and (2) all test cases (regression and newly added test cases, RN). Running regression test cases only and running all the test cases are two practical testing activities during software evolution [188]. Based on the combinations of these settings, the four scenarios are defined as follows: regression test method (M-R), regression test class (C-R), all test method (regression and newly added) (M-RN), and all test class (regression and newly added) (C-RN). In Section 5.4, we report APFD values for these four scenarios. Table 5.3 shows the independent variables (IVs) used in the experiments. Before conducting TCP experiments under the four scenarios, we set three IVs that affect TCP performance in terms of APFD: IV1: Coverage Techniques. To examine the TCP performance of QTEP, we use two representative coverage techniques from the existing coverage-based TCP techniques.

• Dynamic-coverage-based TCP is based on the information of the dynamic call graph from the latest run of a subject project. We use test coverage information based on the dynamic call graph to prioritize test cases.

• Static-coverage-based TCP ranks test cases based on information from static call graphs. JUPTA is the state-of-the-art static-coverage-based TCP technique [166]. We use JUPTA as a representative static-coverage-based TCP technique for the experiments.

IV2: Coverage Criteria. Since all the studied techniques rely on code coverage information, we also investigate the influence of coverage criteria. We study two widely used coverage criteria: (1) method coverage and (2) statement coverage. IV3: Prioritization Strategies. As described in Section 2.3, the Total strategy and the Additional strategy are widely used in most existing studies to schedule the execution order of test cases [15, 153, 166, 233, 235]. Thus, we also investigate the influence of these two prioritization strategies. We can form 8 baseline combinations from the above three IVs (IV1 to IV3) of the existing TCP techniques. IV1, IV2, and IV3 in Table 5.3 represent the technical options that we can select from the existing coverage-based TCP techniques. Based on the acronyms for the IV options in Table 5.3, the 8 combinations are: DMT (i.e., dynamic method coverage with the total strategy), DMA, DST, DSA, JMT, JMA, JST, and JSA. We also use the random TCP as a baseline. The random TCP runs all the test cases in a random order, so its performance might vary across different runs. To mitigate this randomness, we run the random TCP 500 times on each subject and report the average APFD. Following existing work [16], we denote the random TCP technique as RT. In this work, we use a defect prediction model (DP) to detect buggy code and further weight source code units. By combining all IVs with the defect prediction model, we can define 8 variants of QTEP. Based on the acronyms for the IV options in Table 5.3, the 8 variants of QTEP are DMT-DP, DMA-DP, DST-DP, DSA-DP, JMT-DP, JMA-DP, JST-DP, and JSA-DP. To investigate the TCP performance of QTEP, we compare the 8 combinations from IV1–IV3 and RT with the 8 variants in Section 5.4.
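The small sketch below simply enumerates these combinations from the IV options and pairs each baseline with its "-DP" counterpart; it is an illustrative helper, not part of the QTEP tooling.

    // Enumerates the eight baseline combinations (IV1 x IV2 x IV3) and the eight QTEP
    // variants obtained by attaching the defect prediction model (DP) to each combination.
    public class TcpVariants {
        public static void main(String[] args) {
            String[] coverageTechniques = {"D", "J"};   // Dynamic, Static-JUPTA
            String[] coverageCriteria   = {"M", "S"};   // Method, Statement
            String[] strategies         = {"T", "A"};   // Total, Additional

            for (String tech : coverageTechniques) {
                for (String criterion : coverageCriteria) {
                    for (String strategy : strategies) {
                        String baseline = tech + criterion + strategy;   // e.g., DMT, JSA
                        System.out.println(baseline + " -> " + baseline + "-DP");
                    }
                }
            }
        }
    }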

5.3.6 Parameter Setting

As presented in Section 5.2.2, our algorithm for weighting source code has three parameters, i.e., weight_base (the default weight for all code units), weight_c (the weight for detected buggy classes), and weight_m (the weight for detected buggy methods). Different values of these parameters could significantly affect the performance of QTEP. In this section, we study the impact of the three parameters of QTEP on the performance of prioritizing both regression test cases and all test cases (both regression and new test cases).


Figure 5.2: The distribution of the best weight_c (wc) and weight_m (wm) for the variants of QTEP that are adapted from static-coverage-based TCPs (a) and dynamic-coverage-based TCPs (b).

Specifically, for regression test cases, we select the first version-pair from each project listed in Table 5.1 as experimental subjects. When considering both regression and new test cases, we randomly select 20 faulty versions from the first version-pair of each project as experimental subjects. We then tune the parameters for each project (in Table 5.1) using each of the 8 variants of QTEP (described in Section 5.3.5) and evaluate the specific values of the parameters by the average APFD scores at the class level and the method level (with or without new test cases). To simplify the tuning process, we set weight_base equal to 1. We then set weight_c equal to c × weight_base and weight_m equal to m × weight_base, and experiment with c and m over a range from 1 to 100. We use all combinations of the three weights in the tuning process, which amounts to 100 × 100 × 20 (#project) × 16 (#variants of QTEP) experiments for regression test cases, and 100 × 100 × 20 (#project) × 20 (#mutation fault version) × 16 (#variants of QTEP) experiments for all test cases (regression and new test cases). Figure 5.2 (a) and Figure 5.2 (b) show the distribution of the best values of parameters weight_c and weight_m for all 8 variants of QTEP on the 20 version-pairs. We can see that the best values of weight_c and weight_m vary dramatically across projects. On average, for variants of QTEP that are adapted from static-coverage-based TCPs, weight_c is 16.7 times weight_base and weight_m is 13.1 times weight_base. For variants of QTEP that are adapted from dynamic-coverage-based TCPs, weight_c is 18.7 times weight_base and weight_m is 11.6 times weight_base. In this work, we use the best values of weight_c and weight_m obtained from the first version-pair of a project as the default parameters for all the remaining version-pairs of that project. Note that since the existing benchmark dataset has only one available version-pair for the Xml-Security project, we tune the parameters and report the performance on that same version-pair.
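A minimal sketch of this grid search is shown below; the evaluateApfd callback, which would run a QTEP variant with the given weights and return its average APFD, is a hypothetical placeholder for the actual experiment pipeline.

    import java.util.function.BiFunction;

    // A minimal sketch of the tuning loop: weight_base is fixed to 1, and weight_c = c * weight_base,
    // weight_m = m * weight_base are searched over 1..100 by maximizing the average APFD.
    public class WeightTuner {

        public static double[] tune(BiFunction<Double, Double, Double> evaluateApfd) {
            double weightBase = 1.0;
            double bestC = 1.0, bestM = 1.0, bestApfd = Double.NEGATIVE_INFINITY;
            for (int c = 1; c <= 100; c++) {
                for (int m = 1; m <= 100; m++) {
                    double apfd = evaluateApfd.apply(c * weightBase, m * weightBase);
                    if (apfd > bestApfd) {
                        bestApfd = apfd;
                        bestC = c * weightBase;
                        bestM = m * weightBase;
                    }
                }
            }
            return new double[]{bestC, bestM};   // best weight_c and weight_m for this subject
        }
    }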

5.4 Results and Analysis

5.4.1 RQ1 & RQ2: Performance of QTEP for Regression Test Cases and for All Test Cases

Figure 5.3: Results of the random TCP (RT), the static-coverage-based TCPs, and the static-coverage-based variants of QTEP for regression test cases and for all test cases: (a) C-R, (b) M-R, (c) C-RN, (d) M-RN.

Figure 5.4: Results of the random TCP (RT), the dynamic-coverage-based TCPs, and the dynamic-coverage-based variants of QTEP for regression test cases and for all test cases: (a) C-R, (b) M-R, (c) C-RN, (d) M-RN.

Figure 5.3 and Figure 5.4 show the comparison results of the variants of QTEP on the 20 version-pairs. Specifically, they show the boxplots of the APFD values for the 8 variants of QTEP, the eight variants of coverage-based TCPs, and the random baseline RT in the four scenarios. Each sub-figure presents the detailed APFD results of one type of QTEP (i.e., static-coverage-based or dynamic-coverage-based variants of QTEP) and the corresponding baseline TCPs in a specific scenario. For example, Figure 5.3 (a) shows the C-R (regression test class) scenario of static-coverage-based techniques, RT, and static-coverage-based variants of QTEP, while Figure 5.4 (a) shows the C-R scenario of dynamic-coverage-based techniques, RT, and dynamic-coverage-based variants of QTEP. Each boxplot presents the APFD distribution (median and upper/lower quartiles) of the prioritization results of one variant of QTEP on the 20 version-pairs. We use gray, white, yellow, and blue boxes to represent the random, static-coverage-based, dynamic-coverage-based, and QTEP techniques respectively.

The figures show that overall QTEP outperforms the corresponding traditional coverage-based TCPs and RT for both regression test cases and new test cases at both method-level and class-level TCPs. In addition, static-coverage-based variants of QTEP are overall more effective than dynamic-coverage-based variants of QTEP. Specifically, for C-R, among all examined TCP techniques, JMT-DP produces the best APFD with a median value of 0.75, which is almost 6% higher than the best traditional coverage-based technique, i.e., JMT. For M-R, JMT-DP outperforms all other examined TCPs. When considering new test cases, JMA-DP and JST-DP produce the best performance for C-RN and M-RN respectively.

We further take a closer look at each individual program. To save space, we only show the detailed comparison between the results of static-coverage-based variants of QTEP and the results of the corresponding coverage-based TCPs, since they are overall more effective than dynamic-coverage-based variants of QTEP. Table 5.4 shows the average APFD values of all static-coverage-based variants of QTEP and the corresponding coverage-based TCPs on each project. Numbers in brackets are the improvements of QTEP compared to the corresponding coverage-based TCPs. We can see that QTEP variants improve the APFD values for all the projects. However, the improvement varies across projects. For example, on project Time&Money, JMA-DP achieves the best APFD for C-R (i.e., 0.82), which is 23 percentage points higher than the corresponding JMA (i.e., 0.59), while on XStream the improvement is only one percentage point. In the worst cases, e.g., for JMA-DP on some version-pairs, QTEP does not improve over traditional coverage-based TCPs. The same phenomenon is also observed in dynamic-coverage-based variants of QTEP. Note that the improvements vary across projects; this may be because the performance of CLAMI varies across projects. To explore this, we further compute the Spearman correlation between the false positive rates and the improvements of QTEP on all projects. Results show that the Spearman correlation between the false

Table 5.4: Comparison between the static-coverage-based variants of QTEP and the corresponding coverage-based TCPs for each project.

Subject        Scenario  JMT   JMT-DP        JMA   JMA-DP        JST   JST-DP        JSA   JSA-DP
Time&Money     C-R       0.65  0.71 (+0.06)  0.59  0.82 (+0.23)  0.82  0.82          0.77  0.77
               M-R       0.24  0.27 (+0.03)  0.49  0.50 (+0.01)  0.12  0.29 (+0.17)  0.47  0.49 (+0.02)
               C-RN      0.61  0.62 (+0.01)  0.63  0.64 (+0.01)  0.64  0.64          0.69  0.69
               M-RN      0.36  0.39 (+0.03)  0.60  0.59          0.34  0.43 (+0.09)  0.57  0.57
Mime4J         C-R       0.65  0.71 (+0.06)  0.61  0.62 (+0.01)  0.73  0.86 (+0.13)  0.59  0.59
               M-R       0.71  0.75 (+0.04)  0.55  0.55          0.69  0.69          0.55  0.55
               C-RN      0.75  0.76 (+0.01)  0.65  0.71 (+0.06)  0.70  0.71 (+0.01)  0.62  0.62
               M-RN      0.68  0.70 (+0.02)  0.48  0.48          0.63  0.63          0.48  0.48
Jaxen          C-R       0.90  0.95 (+0.05)  0.90  0.90          0.90  0.90          0.85  0.85
               M-R       0.67  0.69 (+0.02)  0.77  0.77          0.73  0.73          0.78  0.78
               C-RN      0.80  0.80          0.74  0.75 (+0.01)  0.80  0.81 (+0.01)  0.65  0.65
               M-RN      0.65  0.68 (+0.03)  0.66  0.66          0.68  0.71 (+0.03)  0.68  0.68
Xml-Security   C-R       0.71  0.73 (+0.02)  0.38  0.38          0.71  0.73 (+0.02)  0.38  0.38
               M-R       0.97  0.97          0.84  0.84          0.97  0.97          0.84  0.84
               C-RN      0.50  0.50          0.42  0.42          0.54  0.54          0.50  0.50
               M-RN      0.91  0.91          0.83  0.83          0.94  0.94          0.85  0.85
Xstream        C-R       0.72  0.76 (+0.04)  0.64  0.65 (+0.01)  0.73  0.73          0.66  0.67 (+0.01)
               M-R       0.66  0.67 (+0.01)  0.72  0.73 (+0.01)  0.66  0.68 (+0.02)  0.73  0.73
               C-RN      0.88  0.89 (+0.01)  0.82  0.82          0.86  0.87 (+0.01)  0.82  0.82
               M-RN      0.76  0.76          0.81  0.82 (+0.01)  0.76  0.77 (+0.01)  0.81  0.81
Commons-Lang   C-R       0.67  0.71 (+0.04)  0.75  0.76 (+0.01)  0.52  0.55 (+0.03)  0.70  0.70
               M-R       0.36  0.37 (+0.01)  0.37  0.38 (+0.01)  0.20  0.24 (+0.04)  0.35  0.35
               C-RN      0.67  0.67          0.69  0.69          0.70  0.70          0.72  0.72
               M-RN      0.67  0.67          0.61  0.62 (+0.01)  0.54  0.57 (+0.03)  0.62  0.62
Joda-Time      C-R       0.65  0.67 (+0.02)  0.61  0.62 (+0.01)  0.62  0.62          0.47  0.50 (+0.03)
               M-R       0.70  0.72 (+0.02)  0.76  0.78 (+0.02)  0.70  0.70          0.66  0.73 (+0.07)
               C-RN      0.71  0.71          0.66  0.67 (+0.01)  0.72  0.72          0.70  0.70
               M-RN      0.83  0.83          0.80  0.80          0.85  0.85          0.78  0.78

110 execution of the test suite, which does not contain the new test cases. Thus, the faults that can be revealed only by the new test cases are ignored since the coverage information of new test cases is always unavailable in the dynamic-coverage-based TCPs [153, 154]. While, static coverage information of both regression and new test cases could be obtained by static code analysis. Thus, the performances of static-coverage-based variants of QTEP are similar between with the new test cases and without the new test cases. For statistical tests, we also conduct the Wilcoxon signed-rank test (p < 0.05) to compare the performance of QTEP and existing TCPs. Specifically, we compare each variant of QTEP with its corresponding coverage-based TCP technique on all projects for both regression and new test cases. Results show that two of the 8 variants of QTEP could achieve significantly better performance than the corresponding coverage-based TCPs (i.e., JMT-DP and DSA-DP). For the other eight variants, their performances are slightly better or equal to the corresponding coverage-based TCPs. In summary, QTEP is overall more effective than corresponding coverage- based TCPs. While dynamic-coverage-based QTEP variants exhibit significantly better performance on regression test cases than on all test cases, static- coverage-based QTEP variants produce similarly good performance on both re- gression test cases and all test cases (both regression and new test cases).

5.4.2 RQ3: Comparison of bug finding times of QTEP and existing TCPs.

In RQ1 and RQ2, we showed that QTEP is more efficient than traditional TCPs in finding the failing test cases. In this RQ, we further evaluate QTEP and traditional TCPs regarding the time cost of finding bugs by running test cases in the orders generated by QTEP and existing TCPs. Specifically, we run test cases in the orders generated by the best variant of QTEP, i.e., JMT-DP, and the best traditional TCP, i.e., JMT, on each version-pair, during which we collect the time cost of finding all the bugs on each version-pair. The time saved by JMT-DP on each project compared to JMT is presented in Table 5.5. We can see that QTEP saves bug-finding time on six of the seven experimental projects, and the percentage of time saved varies from 2% to 45.5%.

5.4.3 Time and Memory Overhead

Compared with traditional coverage-based TCPs, the extra running overhead of QTEP depends on the applied code inspection technique, i.e., the defect prediction model, and our source-code-weighting algorithm.

Table 5.5: Time saved by QTEP. Numbers in brackets are the percentages of bug-finding time saved by QTEP compared to the best traditional TCP.

Subject        Time Saved by QTEP (s)
Time&Money     1.2 (20.1%)
Mime4J         11.1 (8.6%)
Jaxen          13.5 (2.3%)
Xml-Security   0
Xstream        3.5 (6.2%)
Commons-Lang   69.0 (10.0%)
Joda-Time      110.3 (45.5%)

Table 5.6: The average time cost of QTEP on each subject.

Subject        Defect prediction (i.e., CLAMI [192]) (s)   Weighting code (s)
Time&Money     0.4                                          < 0.1
Mime4J         0.4                                          < 0.1
Jaxen          0.6                                          < 0.1
Xml-Security   0.2                                          < 0.1
Xstream        0.9                                          < 0.1
Commons-Lang   1.2                                          < 0.1
Joda-Time      1.6                                          < 0.1

To understand the overhead of QTEP, we collect the time and space costs for all experiments, and the details are presented in Table 5.6. We can see that the execution time for weighting source code is less than 0.1 seconds, and the execution time for running the defect prediction model varies from 0.2 to 1.6 seconds. The largest project (in terms of the number of test cases), Joda-Time, uses 96.5MB of memory. As shown in the table, the code inspection technique spends more time than weighting source code units. The total time cost on each project is less than 2 seconds. Overall, the results demonstrate QTEP's practicality.

5.5 Threats to Validity

To evaluate the quality of prioritization, we choose APFD, which has been extensively used in the field of TCP. However, APFD cannot reflect time and space costs or the severity

of faults. We plan to use more metrics, e.g., APFDc [58], to reduce this threat. In this work, all the experimental subjects are open-source Java projects with JUnit test cases. Although they are popular projects and widely used in TCP research, our findings may not be generalizable to commercial projects or projects in other languages. To mitigate this threat, we plan to explore the effectiveness of QTEP on C/C++ projects in the future.

5.6 Summary

In this chapter, to address the limitations of existing TCP algorithms, we present a quality-aware TCP technique named QTEP. Specifically, we leverage code inspection techniques, i.e., statistical defect prediction models and static bug detection techniques, to detect fault-prone source code and then adapt existing coverage-based TCP algorithms by considering the weighted defectiveness of source code. Our evaluation shows that QTEP could improve existing coverage-based TCP techniques for both regression test cases and all test cases. As future work, we plan to explore the impact of the fault-revealing capability of test suites and the severities of different bugs on the performance of QTEP. We also plan to investigate different aggregations of code inspection results in QTEP.

Chapter 6

Leveraging Machine Learning Classifiers to Automate the Identification of Risky Code Review Requests

This chapter presents our approach to improving traditional code review by leveraging machine learning based classifiers to identify risky code change requests, i.e., changes that have multiple rounds of code review or are reverted. This work has been submitted to the 16th International Conference on Mining Software Repositories (MSR'19).

6.1 Motivation and Background

Code changes could be risky, e.g., they may introduce quality issues such as bugs, improper implementations, maintenance issues, etc. As a consequence, reviewing them could require much more code review effort. In this study, we use code review effort as an indicator to define regular and risky changes. Specifically, we define a risky change as a code change that has multiple rounds of code review or is reverted. All other changes are considered regular changes. Code review is a manual inspection of source code by humans that aims at identifying potential defects and quality problems in the source code before its deployment in a live environment [62,162,182,228]. However, such a manual inspection could be a time-consuming and expensive process [105, 162, 182, 228, 229, 264, 267].


Figure 6.1: Average time cost for reviewing regular and risky changes.


Figure 6.2: Average number of reviewers involved for reviewing changes.

For example, to effectively assess a code change, developers are required to read, understand, and critique the code change [229,267]. Moreover, if a code change cannot pass the first round of code review, developers have to conduct a second round of code review, and even more rounds, until they resolve all the review suggestions. In this section, we motivate this study by showing the review effort of regular and risky changes, i.e., the review time cost and the number of reviewers involved, on one proprietary project and three widely-used open-source projects, i.e., Qt, Android, and OpenStack (details are in Table 6.1). Specifically, for each regular change and risky change from the four projects, we collect the time reviewers spent on reviewing it and the number of reviewers involved. Note that we use the difference between the submission timestamp and the resolution timestamp of a change's review request to assess the review time, since the exact time cost is not recorded in the code review systems of the four projects. We then average the review time and the number of reviewers involved over all changes for each project. Figure 6.1 shows the average review time for each project. As shown in the figure, the time cost for reviewing risky changes could be 10X that for reviewing regular changes

(in project Android) and is on average 7X across the four projects. Figure 6.2 shows the average number of reviewers involved in reviewing both the regular and risky changes. As we can see, reviewing risky changes requires more reviewers to collaborate than reviewing regular changes in each of the four projects. Our manual inspection of 100 randomly selected risky changes from the four projects shows that after the first round of code review, the initial reviewers often request reviewers with more experience or higher permission to double check or approve the updated changes. Overall, the average number of reviewers for reviewing risky changes is 1.3X that for reviewing regular changes. From Figures 6.1 and 6.2, we can conclude that reviewing risky changes requires significantly more effort than reviewing regular changes. Intuitively, finding these risky changes before starting the code review process can give developers early information about their changes so that they have a chance to improve the changes and further accelerate the code review process. In this work, we first propose a feedback-driven and heuristics-based approach to obtain change intents for understanding the changes. We then characterize risky changes by using various features extracted from change metadata and the change intents. We further explore the feasibility of automatically classifying risky changes. We conduct our study on a large-scale proprietary project and three large-scale open-source projects, i.e., Qt, Android, and OpenStack. Our results show that (i) code changes with specific intents are more likely to be risky; in particular, new feature and refactoring changes have a significantly higher probability of being risky, (ii) machine learning based prediction models can efficiently help classify risky changes, where the best prediction model, using Random Forest, achieves AUC values larger than 0.71 on each of the four subjects, and (iii) prediction models built for code changes with a specific intent achieve better performance than prediction models built without considering the change intent; the improvement in AUC can be up to 19 percentage points and is 7.4 percentage points on average.

6.2 Approach

Figure 6.3 illustrates that our approach consists of four steps: (1) labeling each historical change as regular or risky to indicate whether the change has multiple rounds of code review effort or is reverted (Section 6.2.1), (2) analyzing the change intents (Section 6.2.2), (3) extracting features to represent the changes (Section 6.2.3), and (4) using the features and labels to build and train prediction models and applying the models to predict new changes (Section 6.2.4).

Table 6.1: Projects used in this study. Lang is the program language of each subject. #CR is the number of code changes. Rate is the rate of risky changes, which is measured in a percentage.

Project      Lang    First Date  Last Date   #CR     Rate
Proprietary  C#      5/05/2015   5/22/2018   >100K   ∼15
Qt           C       5/17/2011   5/25/2012   23,041  39.28
Android      Java    7/18/2011   5/31/2012   7,120   31.75
OpenStack    Python  7/18/2011   5/31/2012   6,430   43.31

Figure 6.3: The overview of our risky change prediction approach.

6.2.1 Labeling Risky Changes

The first step of our approach is to label each change as regular or risky based on its code review history. Specifically, for the proprietary project, we extracted its code review database and checked how many code review iterations each change went through. If the number is larger than two, we labeled the change as risky; otherwise we labeled it as regular. For the three open-source projects, since their code review systems do not maintain the corresponding code review iteration counts, we use a heuristic approach to collect the regular and risky changes: a code review request of a change may have multiple rounds of code review, and for each round developers are required to submit a patchset to be reviewed, so we counted the number of patchsets for the code review request of each code change. If the number of submitted patchsets is larger than one, we labeled the change as risky; otherwise we labeled it as regular. In addition, we also consider reverted changes as risky. The reason for using these thresholds is that during our early-stage correlation analysis, we found a strong correlation between having more than two code review iterations and introducing bugs or maintenance issues for changes in the code review system of the proprietary project. We have also observed a similar correlation on the open-source projects, i.e., changes with more than one patchset have a higher probability of introducing bugs or maintenance issues.
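A minimal sketch of these labeling rules is given below, assuming the review metadata (iteration or patchset counts and a reverted flag) has already been extracted from the respective code review systems.

    // Illustrative labeling rules for regular vs. risky changes, as described above.
    public class ChangeLabeler {

        /** Proprietary project: risky if the change went through more than two review iterations, or was reverted. */
        public static boolean isRiskyProprietary(int reviewIterations, boolean reverted) {
            return reviewIterations > 2 || reverted;
        }

        /** Open-source (Gerrit-based) projects: risky if more than one patchset was submitted, or the change was reverted. */
        public static boolean isRiskyOpenSource(int patchsetCount, boolean reverted) {
            return patchsetCount > 1 || reverted;
        }
    }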

Table 6.2: Heuristics for categorizing changes.

Change Intent / Description / Heuristics

Bug Fix
  Description: changes are made to fix bugs
  Heuristics: 1. the commit message contains keywords: "bug" or "fix"; AND 2. the commit message does not contain keywords: "test case" or "unit test"

Resource
  Description: changes are made to update non-source code resources, configurations, or documents
  Heuristics: 1. the commit message contains keywords: "conf" or "license" or "legal"; OR 2. if no keyword is matched in step 1, the changed files do not involve source/test files

Feature
  Description: changes are made to implement new or update existing features
  Heuristics: 1. the commit message contains keywords: "update" or "add" or "new" or "create" or "implement feature"; OR 2. changes in the 'Other' category that contain keywords: "enable" or "add" or "update" or "implement" or "improve"

Test
  Description: changes are made to add new or update existing test cases
  Heuristics: 1. the commit message contains the keyword: "test"; OR 2. the changed files contain only test files or resource files

Refactor
  Description: changes are made to refactor existing code
  Heuristics: the commit message contains the keyword: "refactor"

Merge
  Description: changes are made to merge branches
  Heuristics: the commit message contains keywords: "merge" or "merging" or "integrate" or "integrated"

Deprecate
  Description: changes are made to remove deprecated code
  Heuristics: the commit message contains keywords: "deprecat" or "delete" or "clean up"

Auto
  Description: changes that are committed by automated accounts or bots
  Heuristics: the change is submitted by automated accounts or bots

Others
  Description: changes that are not in the above categories
  Heuristics: -

6.2.2 Intent Analysis

Many approaches have been proposed to characterize and classify changes based on change intents [9, 90, 217, 253, 308]. Most of them consider high-level change intents, e.g., corrective, adaptive, or perfective [253,308]. In this study, we leverage the fine-grained change intent categories proposed by Hindle et al. [90] (shown in their Table 3) to categorize changes. Note that there are more than 20 different categories described in [90]. We started with a manual analysis of 200 randomly selected changes from the proprietary project and found that some of the categories contain very few changes, e.g., 'Legal', 'Build', and 'Branch' have fewer than 3 changes. We then group the small categories into larger ones to obtain more instances. For example, we grouped 'Legal', 'Data', 'Versioning', 'Platform Specific', and 'Documentation' into 'Resource', grouped 'Rename' and 'Token Replace' into 'Refactor', etc. Finally, we use eight types of change intents to categorize changes. Note that we also have an 'Other' category for changes that do not fall into any of the eight categories. Table 6.2 shows the nine types of changes, their descriptions, and the heuristics we used to automatically classify changes. Instead of manually labeling changes, in this work we automate the classification process by using well-refined heuristics. Thus, the accuracy of the heuristics could significantly affect the results of this study. To improve the accuracy of the classification of changes, we used a feedback-driven approach to design and refine the heuristics for each change intent, and the details are as follows:

• Step 1: With 200 randomly selected instances, the first two authors first classify them into the nine categories manually and independently. Specifically, after reading the commit message and checking the changed files of a change, they label the change based on their experience. For the classification conflicts (less than 5%), the third author inspects them independently and the first three authors make a decision for each conflict together. They then initialize the heuristics for each category.

• Step 2: With the initialized heuristics, we classify all changes into at least one of the nine categories. For each category, we randomly collect 50 instances and the authors work together to manually check its accuracy.

• Step 3: If the accuracy of a category is lower than 80%, we further refine the heuristics and then redo Step 2; otherwise we keep the heuristics for classifying changes.

Taking the 'Test' category as an example, in Step 1 we found that most changes in it have the keywords "test" or "testing" in their commit messages. Thus, the initial heuristic we designed for 'Test' is that the commit message of a change contains the keyword "test". In Step 2, we randomly checked 50 of the collected changes labeled as 'Test'. Our manual inspection revealed that almost half of them were false positives, and we also found that most of the false positives have irrelevant commit messages, e.g., "... use backup config if test fails ..." and "... send a test message ...". To improve the heuristics, in Step 3, we added another heuristic: the changed files can only be test files (e.g., file names or paths contain the keyword "test") or resource files. We then redid Step 2; with the new heuristics, the accuracy of the 'Test' category is around 90%. We use the above steps to refine the heuristics of each category to ensure that the classification of change intents has a high accuracy. Table 6.2 shows the final heuristics.
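As an illustration, the following Java sketch encodes one reading of the refined 'Test' heuristic, combining the commit-message keyword check with the changed-file check; the file and resource checks are simplified, assumed stand-ins for the suffix- and path-based rules in Table 6.2.

    import java.util.List;

    // A simplified, illustrative version of the refined 'Test' heuristic described above.
    public class TestIntentHeuristic {

        public static boolean isTestChange(String commitMessage, List<String> changedFiles) {
            boolean messageMentionsTest = commitMessage.toLowerCase().contains("test");
            boolean onlyTestOrResourceFiles = changedFiles.stream()
                    .allMatch(f -> isTestFile(f) || isResourceFile(f));
            return messageMentionsTest && onlyTestOrResourceFiles;
        }

        private static boolean isTestFile(String path) {
            return path.toLowerCase().contains("test");   // file name or path mentions "test"
        }

        private static boolean isResourceFile(String path) {
            // Treat non-source files as resources for this sketch.
            return !(path.endsWith(".java") || path.endsWith(".cs") || path.endsWith(".py")
                    || path.endsWith(".c") || path.endsWith(".cpp"));
        }
    }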

6.2.3 Feature Extraction

In this study, we use the following features for building machine learning based risky change prediction models.

• Change Intent: As code changes could be classified into different categories based on their intents, we assume that changes with different intents have different impacts on the quality of changes. We use a vector to represent the change intents of a change. Each element in the vector is a binary value, i.e., 1 or 0, representing whether the change has a specific intent or not. The change intents we considered are listed in Table 6.2.

• Revision History: As shown in previous research [125], the revision history of a file can be a factor in predicting its quality. In this study, we also explore the impact of revision history on predicting risky changes. Specifically, given a code change, we consider the number of files in this change that have been revised in the last 30 and 90 days, and the number of revisions of all the involved files of this change in the last 30 and 90 days.

• Owner Experience: This set of features represents the experience of a change's committer. We use a committer's commit history to represent her/his experience, which includes the total number of changes submitted, the total number of risky changes submitted, etc. We assume that the committer's experience affects the quality of the submitted changes.

• Word2Vec Features: Word embedding is a feature learning technique in natural language processing where individual words are no longer treated as unique features but are represented as d-dimensional vectors of real numbers that capture their contextual semantic meanings [29, 171]. We train the embedding model using all data from each project. With the trained word embedding model, each word can be transformed into a d-dimensional vector, where d is set to 100 as suggested in previous studies [304, 310]. Meanwhile, a code change can be transformed into a matrix in which each row represents a term in its commit message. We then transform the code change matrix into a vector by averaging all the word vectors the code change contains, as described in [310] and sketched after this list.

• Process Features: Various process features have been shown to help predict software bugs [104,220,256]. In this study, we also consider the following process features: code addition, code deletion, number of changed files, and the types of changed files. Note that for the types of changed files, following existing studies [90, 91], we group files into source files, test files, configuration files, scripts, documentation, and others based on their suffixes and file paths. Specifically, we consider files with the extensions .java, .cs, .py, .js, .c, .cpp, .cc, .cp, .cxx, .c++, .h, .hpp, .hh, .hp, .hxx, and .h++ as source files. Among them, files whose paths or file names contain 'test' are considered test files. Files with the extensions .script and .sh are considered scripts. Files with the extensions .xml, .conf, and .MF are considered configuration files. Files with the extensions .htm, .html, .css, and .txt are considered documentation files. The remaining files are considered others.

• Metadata: In addition to the above features, we also use metadata features. Specifically, given a code change, we collect its commit minute (0, 1, 2, ..., 59), commit hour (0, 1, 2, ..., 23), commit day in a week (Sunday, Monday, ..., Saturday), commit day in a month (0, 1, 2, ..., 30), commit month in a year (0, 1, 2, ..., 11), and source file/path names.

In total, in this study we leverage 132 features to build machine learning based risky change prediction models, i.e., eight are from change intents, four are from revision history,

121 five are from owner experience, 100 are from word2vec features, ten are from process features, and five are from metadata features.
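The sketch below illustrates the word2vec-based change representation described in the feature list above: a commit message is mapped to a 100-dimensional vector by averaging the vectors of its terms. The embedding lookup table is assumed to come from a word2vec model trained on the project's data, and the whitespace tokenization is a simplifying assumption.

    import java.util.Map;

    // Averages the word vectors of a commit message's terms into one feature vector.
    // Out-of-vocabulary terms are skipped; an all-zero vector is returned for empty input.
    public class ChangeVectorizer {

        private static final int DIMENSIONS = 100;

        public static double[] vectorize(String commitMessage, Map<String, double[]> embeddings) {
            double[] average = new double[DIMENSIONS];
            int counted = 0;
            for (String token : commitMessage.toLowerCase().split("\\s+")) {
                double[] vector = embeddings.get(token);
                if (vector == null) {
                    continue;                      // out-of-vocabulary term
                }
                for (int d = 0; d < DIMENSIONS; d++) {
                    average[d] += vector[d];
                }
                counted++;
            }
            if (counted > 0) {
                for (int d = 0; d < DIMENSIONS; d++) {
                    average[d] /= counted;         // element-wise mean over all terms
                }
            }
            return average;
        }
    }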

6.2.4 Building Models and Predicting Risky Changes

After we obtain the features for each code change, we split the data into a training dataset and a test dataset. We build and train the machine learning based prediction models on the training dataset. Then we use the test dataset to evaluate the performance of the models. Note that, different from classifying code changes in Chapter 3, classifying code review requests is time independent, for two reasons. First, code changes are committed chronologically (otherwise there can be conflicts), while code review requests are independent and do not need to be chronological. Second, the labeling process of code changes is time sensitive, e.g., the ground truth of a code change is obtained by blaming later bug-fixing changes as shown in Figure 3.11, while the label of a code review request comes from the real code review activities when the code review is submitted, which does not depend on other code review requests. Therefore, in this work we do not split the dataset following a certain order in time as we did in Section 3.4 for change-level defect prediction. Following existing studies [89, 104, 120, 122, 220, 262], we use the widely used 10-fold cross-validation to evaluate the prediction models. The process of 10-fold cross-validation is: 1) separating the data set into 10 partitions randomly; 2) using one partition as the test data and the other nine partitions as the training data; 3) repeating step 2) with a different partition as the test data until all data have a predicted label; 4) computing the evaluation results through comparison between the predicted labels and the actual labels of the data.
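The following sketch illustrates only the fold-splitting step of this protocol; model training, prediction, and evaluation on each fold are left abstract, and the random seed is an assumption added for reproducibility.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Random;

    // Splits instance indices into 10 random folds; in round f, fold f is the test set
    // and the remaining nine folds form the training set.
    public class TenFoldSplitter {

        public static List<List<Integer>> split(int numInstances, long seed) {
            List<Integer> indices = new ArrayList<>();
            for (int i = 0; i < numInstances; i++) {
                indices.add(i);
            }
            Collections.shuffle(indices, new Random(seed));   // step 1: random partitioning

            List<List<Integer>> folds = new ArrayList<>();
            for (int f = 0; f < 10; f++) {
                folds.add(new ArrayList<>());
            }
            for (int i = 0; i < indices.size(); i++) {
                folds.get(i % 10).add(indices.get(i));        // round-robin assignment to folds
            }
            return folds;
        }
    }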

6.3 Experimental Study

6.3.1 Research Questions

RQ1: What are the distributions of risky and regular changes regarding change intents?

RQ2: Is it feasible to predict risky changes by using machine learning based classifiers with features extracted from change metadata and change intents?

RQ3: Do the specific prediction models (classifiers trained on changes with a specific intent) outperform the general models (classifiers trained on all changes)?

RQ4: Does the performance of predicting risky changes with a single intent differ from predicting risky changes with multiple intents?

In RQ1, we aim at understanding the distributions of changes regarding the change intents. In RQ2, we explore the feasibility of leveraging machine learning models to predict risky changes. In RQ3, we aim to explore whether machine learning classifiers built for changes with a specific intent can generate better performance than classifiers built on all data without considering the change intents. In RQ4, we investigate the difference in predicting risky changes with a single intent and changes with multiple intents.

6.3.2 Experiment Data

In order to address our research questions, we perform an empirical study of software projects that actively adopt the code review process. We begin with the review dataset of Android, Qt, and OpenStack provided by Hamasaki et al. [76]. The three projects adopt the Gerrit1 code review system. We also expand the review dataset to include code review data from a large-scale proprietary project, which uses a custom code review system. Details of the projects are in Table 6.1. Android2 is an operating system for mobile devices that is developed by Google. Qt3 is a cross-platform application and UI framework. OpenStack4 is an open-source software platform for cloud computing. The last project is a widely used proprietary web service.

6.3.3 Evaluation Measures

To measure the performance of predicting risky changes, we use four metrics: Precision, Recall, F1, and AUC. The first three metrics are widely adopted to evaluate prediction tasks [170, 190, 326]. AUC is the area under the ROC curve, which measures the overall discrimination ability of a classifier. It has been widely used to evaluate classification algorithms in prediction tasks [65, 191, 216, 318]. The AUC is a threshold-independent performance measure that evaluates the ability of classifiers to discriminate between defective and clean modules. The AUC score for a perfect model would be 1 and for random guessing would be 0.5.

1https://www.gerritcodereview.com/ 2https://www.android.com/ 3https://www.qt.io/developers/ 4https://www.openstack.org/

For a model predicting all instances as true or false, the AUC would be 0. A machine learning model is considered applicable to classify a given dataset if the AUC score is larger than 0.7 [262].
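A brief sketch of computing these four measures with scikit-learn (an assumption about tooling; the study reports the same measures) is shown below. AUC is computed from the predicted probability of the risky class, which is what makes it threshold independent.

    from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

    def evaluate(y_true, y_pred_label, y_pred_prob):
        """Compute Precision, Recall, F1, and AUC; AUC uses the risky-class probability."""
        return {
            "precision": precision_score(y_true, y_pred_label),
            "recall": recall_score(y_true, y_pred_label),
            "f1": f1_score(y_true, y_pred_label),
            "auc": roc_auc_score(y_true, y_pred_prob),
        }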

6.4 Results and Analysis

Table 6.3: Taxonomy of code changes. #Ch is the number of code changes. Perc is the percentage of changes with a specific change intent among all the changes. Rate is the rate of risky changes, which is measured in a percentage. Single contains changes that have only one intent. Multiple contains changes that have more than one intent.

Change Intent | Proprietary (Perc / Rate) | Qt (#Ch / Perc / Rate) | Android (#Ch / Perc / Rate) | OpenStack (#Ch / Perc / Rate)
Bug Fix   | ∼20 / 19.1  | 9,042 / 39.24 / 35.43  | 2,143 / 30.10 / 35.88 | 3,380 / 52.57 / 46.95
Resource  | ∼39 / 8.98  | 5,326 / 23.12 / 28.56  | 1,561 / 21.92 / 29.66 | 2,300 / 35.77 / 34.26
Feature   | ∼12 / 33.55 | 3,526 / 15.30 / 55.05  | 1,735 / 24.37 / 46.63 | 771 / 12.00 / 60.83
Test      | ∼4 / 15.14  | 3,840 / 16.67 / 38.75  | 663 / 9.31 / 35.6     | 1,005 / 15.63 / 54.23
Refactor  | ∼2 / 41.97  | 696 / 3.02 / 49.28     | 117 / 1.64 / 50.43    | 195 / 3.03 / 65.64
Merge     | ∼4 / 14.67  | 217 / 0.94 / 35.02     | 245 / 3.44 / 13.06    | 31 / 0.48 / 48.39
Deprecate | ∼6 / 18.52  | 3,826 / 16.61 / 36.96  | 752 / 10.56 / 37.9    | 905 / 14.07 / 44.75
Auto      | ∼10 / 0     | / / /                  | / / /                 | / / /
Others    | ∼15 / 20.96 | 4,036 / 17.52 / 40.21  | 1,513 / 21.25 / 35.23 | 577 / 8.97 / 34.66
Single    | ∼87 / 17.12 | 17,200 / 74.65 / 40.69 | 5,802 / 81.50 / 38.12 | 4,209 / 65.47 / 41.29
Multiple  | ∼13 / 14.71 | 5,841 / 25.35 / 35.13  | 1,317 / 18.50 / 32.88 | 2,220 / 34.53 / 47.07

6.4.1 RQ1: Distribution of Changes Regarding Change Intents

Following the change intent taxonomy approach described in Section 6.3.1, we automatically label each change from the four projects. As reported in existing studies [66, 177], software changes can be made for multiple purposes, e.g., a change could be made for correcting bugs and refactoring existing code at the same time. Thus, we also consider labeling changes with multiple intents. Table 6.3 shows the number of changes, the percentage among all changes, and the percentage of risky changes for each change intent in the four projects. In addition, we also show the numbers of changes that have single and multiple intents. Note that the ‘Auto’ changes only exist in the proprietary project.

Table 6.4: Comparison of different classifiers on predicting risky code changes. The best F1 scores and AUC values are highlighted in bold.

Project     | ADTree (F1 / AUC) | Logistic (F1 / AUC) | NB (F1 / AUC) | SVM (F1 / AUC) | RF (F1 / AUC)
Proprietary | 0.40 / 0.65       | 0.37 / 0.72         | 0.37 / 0.72   | 0.41 / 0.63    | 0.46 / 0.76
Qt          | 0.53 / 0.62       | 0.54 / 0.72         | 0.41 / 0.66   | 0.34 / 0.59    | 0.57 / 0.71
Android     | 0.58 / 0.68       | 0.54 / 0.71         | 0.58 / 0.70   | 0.50 / 0.65    | 0.58 / 0.74
OpenStack   | 0.60 / 0.64       | 0.64 / 0.76         | 0.62 / 0.68   | 0.66 / 0.70    | 0.66 / 0.76

We find that all of them are regular changes; thus, we exclude these changes when building change prediction models. Since there exist overlaps among different change intents, the sum of the percentages of change intents is larger than 100%. As shown in Table 6.3, the distribution of changes regarding intents varies across projects, and changes are unevenly distributed among the intents. For example, changes with intents ‘Bug Fix’ and ‘Resource’ dominate across the four projects, e.g., together they take up more than 50% of all the changes, while the percentages of changes with intents ‘Refactor’ and ‘Merge’ are less than 4%. Note that the distribution in our dataset is consistent with that of manually categorized changes from an existing study [90], in which changes under the categories Corrective (i.e., addressing failures), Adaptive (i.e., changes for the data and processing environment), and Perfective (i.e., addressing inefficiency, performance, and maintainability issues) dominate with a combined percentage higher than 60%, which also confirms the effectiveness of our automated heuristic-based change classification (details are presented in Section 6.2.2). ‘Feature’ and ‘Refactor’ have significantly higher rates of risky changes (Wilcoxon signed-rank test, p < 0.05). This is reasonable since both ‘Feature’ and ‘Refactor’ changes introduce new functionalities or restructure existing code snippets, which makes them more likely to be risky. The ‘Resource’ category has a significantly lower rate of risky changes across the four projects (Wilcoxon signed-rank test, p < 0.05). This may be because, compared to all other categories, changes in the ‘Resource’ category rarely modify the source code.

Software code changes are unevenly distributed regarding change intents. Changes with specific change intents, i.e., ‘Feature’ and ‘Refactor’, have a significantly higher probability of being risky.

Table 6.5: Performance of predicting risky code changes with single and multiple change intents. Better F1 scores or AUC values are highlighted in bold.

Change Intent | Proprietary (F1 / AUC) | Qt (F1 / AUC) | Android (F1 / AUC) | OpenStack (F1 / AUC)
Single        | 0.48 / 0.78            | 0.58 / 0.75   | 0.58 / 0.73        | 0.67 / 0.80
Multiple      | 0.41 / 0.75            | 0.54 / 0.70   | 0.57 / 0.71        | 0.66 / 0.77

6.4.2 RQ2: Feasibility of Predicting Risky Changes

This question explores whether machine learning algorithms can learn models that predict new risky changes. We use off-the-shelf machine learning algorithms from Weka [74]. We use the change intent, change history, owner experience, Word2Vec, process, and metadata features (details are in Section 6.2.3) to build these prediction models. We experiment with five widely used [89, 104, 138, 170, 280, 326] classification algorithms, i.e., Alternating Decision Tree (ADTree), Logistic Regression (Logistic), Naive Bayes (NB), Support Vector Machine (SVM), and Random Forest (RF). Note that this work does not intend to find the best-fitting classifiers or models, but to explore the feasibility of predicting risky changes by using machine learning algorithms. Existing work [262] showed that selecting optimal parameter settings for machine learning algorithms could achieve better performance; thus, we tune each of the classifiers with various parameters and use the settings that achieve the best AUC value in our experiments. For each project, we build classification models and use the commonly used 10-fold cross-validation method to evaluate the prediction models [89, 104, 120, 122, 220, 262]. Taking OpenStack as an example, for tuning Random Forest we tune the parameter numIterations as suggested in [74]; specifically, we experiment with 20 discrete values, i.e., 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, and 1,000. The results show that when numIterations equals 900, Random Forest has the best prediction performance in terms of F1 on OpenStack.

Table 6.4 shows the F1 and AUC values of each machine learning algorithm on the four experimental projects. Overall, of the five classifiers, RF consistently outperforms the others on each project. The improvements of RF over the other four classifiers range from 5.0 to 13.0 percentage points in AUC and from 5.0 to 9.0 percentage points in F1. RF achieves similar AUC values on the proprietary project and the open-source projects, while it has a significantly lower F1 score (Wilcoxon signed-rank test, p < 0.05) on the proprietary project. This may be because the proprietary project has a lower rate of risky changes, e.g., as shown in Table 6.1, the risky rates of the open-source projects are around 20.0 percentage points higher than that of the proprietary project, which makes the data distribution more imbalanced in the proprietary project. Previous studies showed that the class imbalance issue can lower F1 scores [256, 259]. As revealed in existing work [259], the imbalance of a dataset does not impact the AUC measure, and the authors suggested that a machine learning model is considered applicable to classify a given dataset if the AUC is larger than 0.7. Hence, we use the AUC to compare prediction models. Among the five examined machine learning classifiers, two of them, i.e., Logistic and RF, achieve AUC values larger than 0.7 on each of the four experimental projects, which confirms the feasibility of predicting risky changes by using machine learning algorithms.
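For illustration, the following sketch mirrors this parameter sweep in scikit-learn (an assumption; the study tunes Weka's numIterations, for which n_estimators is the closest analogue), selecting the tree count that maximizes cross-validated F1.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import KFold, cross_val_score

    # Placeholder data standing in for the 132-dimensional change features and labels.
    X, y = np.random.rand(300, 132), np.random.randint(0, 2, 300)

    cv = KFold(n_splits=10, shuffle=True, random_state=1)
    best_f1, best_n = -1.0, None
    for n_trees in range(50, 1001, 50):  # 50, 100, ..., 1000 (20 candidate values)
        clf = RandomForestClassifier(n_estimators=n_trees, random_state=1)
        f1 = cross_val_score(clf, X, y, cv=cv, scoring="f1").mean()
        if f1 > best_f1:
            best_f1, best_n = f1, n_trees
    print(best_n, best_f1)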

Machine learning based prediction models can help predict risky changes. The best model (i.e., RF) achieves AUC values larger than 0.71 on each experimental project.

6.4.3 RQ3: Specific Models vs. General Models

In RQ2, we show that it is feasible to leverage machine learning classifiers to predict risky changes with features extracted from changes' metadata and intents. In this RQ, we further explore whether machine learning classifiers built and trained on changes with a specific change intent, i.e., specific models, can achieve better performance than machine learning classifiers built and trained on all changes, i.e., general models. Specifically, for each project, we build and train the RF-based specific prediction models on changes with one specific change intent. We tune each of the RF-based classifiers with various parameter values and use the settings that achieve the best AUC value in our experiments. In addition, we use the 10-fold cross-validation method to evaluate the prediction models. The general model on a project is trained and evaluated on all changes without considering the change intents. Note that we exclude the specific model for the category ‘Merge’ on the project OpenStack, because it has very few instances. Table 6.6 shows the comparison between the performance of the specific prediction models and the general models. Regarding F1, we can see that at least half of the specific models outperform the general models across the four experimental projects. For example, six out of the eight specific models on the proprietary project generate better F1 values than the general model; the improvement is up to 25.0 percentage points and is 6.0 percentage points on average. We observe a similar situation on Qt and OpenStack.

Table 6.6: Comparison between machine learning classifiers built on changes from specific change intents and machine learning classifiers built on all changes (i.e., general). Numbers in parentheses are the differences between the specific models and the general models. F1 scores and AUC values that are higher than those of the general models are highlighted in bold. Note that we exclude the specific model for ‘Merge’ on the project OpenStack, since it only has 31 instances, which is not enough for training a machine learning classifier.

Change Intent | Proprietary (P / R / F1 / AUC) | Qt (P / R / F1 / AUC) | Android (P / R / F1 / AUC) | OpenStack (P / R / F1 / AUC)
Bug Fix   | 0.68 / 0.40 / 0.51 (+0.05) / 0.84 (+0.08) | 0.70 / 0.47 / 0.57 (+0.00) / 0.77 (+0.06) | 0.64 / 0.49 / 0.55 (-0.03) / 0.76 (+0.02) | 0.72 / 0.66 / 0.69 (+0.03) / 0.79 (+0.03)
Resource  | 0.48 / 0.24 / 0.32 (-0.14) / 0.79 (+0.03) | 0.68 / 0.39 / 0.50 (-0.07) / 0.76 (+0.05) | 0.67 / 0.51 / 0.58 (+0.00) / 0.82 (+0.08) | 0.68 / 0.55 / 0.61 (-0.05) / 0.81 (+0.05)
Feature   | 0.69 / 0.55 / 0.61 (+0.15) / 0.81 (+0.05) | 0.73 / 0.73 / 0.73 (+0.16) / 0.78 (+0.07) | 0.67 / 0.67 / 0.67 (+0.09) / 0.77 (+0.03) | 0.77 / 0.85 / 0.81 (+0.15) / 0.81 (+0.05)
Test      | 0.61 / 0.22 / 0.32 (-0.14) / 0.82 (+0.06) | 0.71 / 0.55 / 0.62 (+0.05) / 0.79 (+0.08) | 0.62 / 0.53 / 0.58 (+0.00) / 0.74 (+0.00) | 0.74 / 0.77 / 0.76 (+0.10) / 0.80 (+0.04)
Refactor  | 0.70 / 0.62 / 0.65 (+0.19) / 0.79 (+0.03) | 0.70 / 0.64 / 0.67 (+0.10) / 0.76 (+0.05) | 0.70 / 0.68 / 0.69 (+0.11) / 0.76 (+0.02) | 0.75 / 0.81 / 0.78 (+0.12) / 0.78 (+0.02)
Merge     | 0.86 / 0.60 / 0.71 (+0.25) / 0.95 (+0.19) | 0.58 / 0.51 / 0.55 (-0.02) / 0.77 (+0.06) | 0.50 / 0.22 / 0.30 (-0.28) / 0.81 (+0.07) | / / / /
Deprecate | 0.66 / 0.37 / 0.47 (+0.01) / 0.84 (+0.08) | 0.68 / 0.51 / 0.58 (+0.01) / 0.77 (+0.06) | 0.66 / 0.56 / 0.61 (+0.03) / 0.78 (+0.04) | 0.72 / 0.65 / 0.69 (+0.03) / 0.80 (+0.04)
Others    | 0.68 / 0.50 / 0.58 (+0.12) / 0.83 (+0.07) | 0.65 / 0.51 / 0.57 (+0.00) / 0.74 (+0.03) | 0.60 / 0.51 / 0.55 (-0.03) / 0.76 (+0.02) | 0.62 / 0.51 / 0.56 (-0.10) / 0.78 (+0.02)
General   | 0.51 / 0.41 / 0.46 / 0.76 | 0.60 / 0.54 / 0.57 / 0.71 | 0.61 / 0.55 / 0.58 / 0.74 | 0.68 / 0.65 / 0.66 / 0.76

Figure 6.4: Comparison of prediction performance (AUC boxplots) between the specific models and the general models on (a) Proprietary, (b) Qt, (c) Android, and (d) OpenStack. The prefix “G” means using the trained general model to predict changes with a specific intent, and “S” is the specific model trained and evaluated on changes with that intent. “b” represents ‘Bug Fix’, “re” represents ‘Resource’, “f” represents ‘Feature’, “t” represents ‘Test’, “r” represents ‘Refactor’, “m” represents ‘Merge’, “d” represents ‘Deprecate’, and “o” represents ‘Others’. For example, “G-b” means using the trained general model to predict changes with the ‘Bug Fix’ intent, and “S-b” is the specific model trained and evaluated on changes with the ‘Bug Fix’ intent.

Overall, the specific models are better than the general models on Qt and OpenStack as well; the improvements are up to 16.0 and 15.0 percentage points, respectively. However, we also observe an exception on Android: although five out of the eight specific models generate better (or the same) F1 values than the general model, the overall improvement is negative. The reason is that the ‘Merge’ category has an F1 value that is 28.0 percentage points lower than that of the general model. This is because the ‘Merge’ category has a much lower risky rate (i.e., 13.1%) than all other categories (which range from 29.0% to 50.4%) in Android, which makes the ‘Merge’ data imbalanced. Previous studies showed that the class imbalance issue can lower F1 scores [256, 259]. Regarding AUC, we can observe that all the specific models outperform the general models across the four experimental projects. The improvement is up to 19.0 percentage points and is 7.4 percentage points on average. Thus, from the comparison shown in Table 6.6, we conclude that overall the specific models achieve better prediction performance than the general models.

Above, we show that the specific models (built on changes with a specific change intent) are overall better than the general models (built on all the changes). One could also argue that using the general models to predict changes with a specific intent may yield better performance than the corresponding specific model. To explore this issue, we further examine the performance of leveraging the general models to predict changes with a specific intent. Specifically, for each change intent, we randomly divide its changes into a training dataset and a test dataset (66% for training, 34% for testing). For the specific model, we use its training data to train the model and evaluate its performance on its test dataset. For the corresponding general model, we combine the training data from all specific models and evaluate its performance on the test dataset of the specific model. We repeat the data splitting, model training, and evaluation 50 times to reduce bias. Figure 6.4 shows the boxplots of the 50 runs for each specific model and its corresponding general model on each project. In addition, we also show the boxplots of the overall-general models (i.e., using 66% of all changes to train the models and evaluating them on the remaining 34% of changes without considering change intents). As we mentioned in Section 6.4.2, the AUC is more stable for evaluating the performance of machine learning classifiers, so in this RQ we only use AUC to evaluate the prediction models. Each boxplot presents the AUC distribution (median and upper/lower quartiles) of a prediction model. We use gray, light gray, and white to represent the specific models, the corresponding general models, and the overall-general models, respectively. We observe that overall the specific models outperform the general models on almost all the change intents across the four experimental projects. Specifically, for the proprietary project, the mean AUC values of the specific models are around ten percentage points higher than those of the corresponding general models. For the open-source projects, the mean AUC values of the specific models are around five percentage points higher than those of the corresponding general models. In addition, all the specific models outperform the overall-general models.
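A sketch of this repeated 66/34 comparison is given below (scikit-learn, with a hypothetical helper; X_intent/y_intent are the changes of one intent and X_other/y_other the training data contributed by the remaining intents, class 1 being risky).

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score

    def compare_specific_vs_general(X_intent, y_intent, X_other, y_other, runs=50):
        """Repeat the 66/34 split 50 times; the general model adds the other intents' data."""
        specific_aucs, general_aucs = [], []
        for seed in range(runs):
            X_tr, X_te, y_tr, y_te = train_test_split(
                X_intent, y_intent, test_size=0.34, random_state=seed, stratify=y_intent)

            specific = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X_tr, y_tr)
            general = RandomForestClassifier(n_estimators=100, random_state=seed).fit(
                np.vstack([X_tr, X_other]), np.concatenate([y_tr, y_other]))

            # Both models are evaluated on the same held-out changes of this intent.
            specific_aucs.append(roc_auc_score(y_te, specific.predict_proba(X_te)[:, 1]))
            general_aucs.append(roc_auc_score(y_te, general.predict_proba(X_te)[:, 1]))
        return specific_aucs, general_aucs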

Prediction models built on changes with specific change intents achieve better performance than the general prediction models that do not consider the change intents.

6.4.4 RQ4: Single Intent vs. Multiple Intents

As shown in RQ1 (Section 6.4.1), in this study we consider labelling changes with multiple intents. This RQ explores the performance of predicting changes with a single intent versus changes with multiple intents. Specifically, for each project, we build and train the RF-based prediction models on changes with only a single intent and on changes with multiple intents, respectively. We tune each of the RF-based classifiers with various parameter values and use the settings that achieve the best AUC value in our experiments. We also use the 10-fold cross-validation method to evaluate the models. Table 6.5 shows the F1 scores and AUC values of the prediction models for changes with single and multiple change intents in the four experimental projects. As we can see, the prediction models for changes with a single intent significantly outperform the prediction models for changes with multiple intents in both F1 and AUC across the four experimental projects (Wilcoxon signed-rank test, p < 0.05). In terms of F1, the improvement is up to 7.0 percentage points and is 3.3 percentage points on average. For AUC, the improvement is up to 5.0 percentage points and is 3.3 percentage points on average. One possible reason for this difference is that changes with a single intent are mainly made for one specific purpose, which makes them easier for machine learning classifiers to distinguish than complex changes with multiple intents.
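The significance claims in this chapter rely on the Wilcoxon signed-rank test; a minimal sketch of such a paired comparison with SciPy is shown below. The four AUC values are only illustrative of the call; the study pairs per-fold/per-run scores, which gives many more pairs.

    from scipy.stats import wilcoxon

    # Paired scores for the two conditions being compared (illustrative values only).
    single_auc = [0.78, 0.75, 0.73, 0.80]
    multiple_auc = [0.75, 0.70, 0.71, 0.77]

    stat, p_value = wilcoxon(single_auc, multiple_auc)
    print(stat, p_value)  # compare p_value against the 0.05 threshold used in this chapter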

Machine learning classifiers generate better performance on changes with a single change intent than changes with multiple change intents.

6.5 Threats to Validity

Internal Validity The main threat to internal validity concerns the annotation of change intents, i.e., the subjectivity of the annotation and possible miscategorization. The annotation relied on our manually refined heuristics, and although this approach is a common practice, the process may contain bias since we are not the developers of these projects. To mitigate this, the authors worked independently to annotate the data and refine the heuristics. In addition, we chose a setup that ensures that every heuristic is cross-validated and that classification conflicts have to pass a third inspection.

External Validity In this work, we use a proprietary project and three open-source projects to evaluate our proposed approach. They adopt different code review systems, i.e., the proprietary project adopts a custom code review system and the open-source projects adopt the Gerrit code review system. Nevertheless, the proposed approach might not work for projects that adopt other code review systems.

6.6 Summary

This chapter presents the first study of risky changes (i.e., changes that require multiple rounds of code review effort or are reverted) that considers the change intents. We conduct our study on a large-scale proprietary project and three open-source projects, i.e., Qt, Android, and OpenStack. Experiment results show that: (i) changes with specific intents are more likely to be risky, (ii) machine learning based prediction models can help identify risky changes, and (iii) prediction models built for changes with specific intents achieve better performance than prediction models that do not consider the intents.

Chapter 7

Conclusion and Future Work

7.1 Conclusion

In this thesis, we demonstrated that machine learning, with its capability in knowledge representation, learning, natural language processing, classification, etc., can be used to extract invaluable information from software artifacts to improve existing software reliability practices such as defect prediction, static bug detection, regression test, and code review. We saw four examples that support this claim. In Chapter 3, we saw that a great deal of defect prediction research relies on manually designed features that encode the statistical characteristics of programs. However, these features often fail to capture the semantic difference of programs, and such a capability is needed for building accurate prediction models. To bridge the gap between programs' semantics and defect prediction features, this thesis leverages deep learning techniques to learn a semantic representation of programs automatically from source code and further builds and trains defect prediction models based on these semantic features. We examine the effectiveness of the deep learning based defect prediction approaches on both open-source and commercial projects. Results show that the learned semantic features can significantly outperform existing defect prediction models. In Chapter 4, we saw that existing rule-based static bug detection techniques often miss bugs when patterns do not appear frequently enough and thus are not inferred. To solve this issue, this thesis proposes Bugram, which leverages n-gram language models instead of rules to detect bugs. Specifically, Bugram models program tokens sequentially, using the n-gram language model.

Token sequences from the program are then assessed according to their probability in the learned model, and low probability sequences are marked as potential bugs. The assumption is that low probability token sequences in a program are unusual, which may indicate bugs, bad practices, or unusual/special uses of code of which developers may want to be aware. Results on 16 open-source projects show that Bugram is complementary to existing bug detection approaches: it detects more bugs and generates fewer false positives. In Chapter 5, we saw that most existing test prioritization techniques rely on maximizing coverage information between source code and test cases to schedule test cases for finding bugs earlier, but they often do not consider the likely distribution of faults in the source code. Software faults are not equally distributed in source code, e.g., around 80% of faults are located in about 20% of the source code. Intuitively, test cases that cover the faulty source code should have higher priorities, since they are more likely to find faults. To solve this issue, this thesis proposes QTEP, which leverages machine learning models to evaluate source code quality and then adapts existing test case prioritization algorithms by considering the weighted source code quality. Evaluation on seven open-source projects shows that QTEP can significantly outperform existing test case prioritization techniques at finding failed test cases early. In Chapter 6, we saw that current code review practice relies on manual inspection to reveal potential quality issues in the submitted code change requests, which is inefficient and time-consuming. To solve this issue, this thesis presents the first study to understand, characterize, and automatically predict risky changes (i.e., changes that have multiple rounds of code review or are reverted) by considering their change intents and various process features. Evaluation on one proprietary project and three large-scale open-source projects (i.e., Qt, Android, and OpenStack) shows that our approach is effective and can further improve the current code review process. Taken together, these four studies provide examples of using machine learning to generate research results that can improve software reliability.
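As a toy illustration of the n-gram scoring idea behind Bugram (not the actual implementation, which operates on method-level token sequences with a proper language-model toolkit), the sketch below counts n-grams in a token stream and scores candidate sequences by their unsmoothed probability; the lowest-probability sequences would be flagged for inspection.

    from collections import Counter

    def train_ngram_counts(tokens, n):
        """Count n-grams and their (n-1)-gram contexts over a project's token stream."""
        grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        contexts = Counter(tuple(tokens[i:i + n - 1]) for i in range(len(tokens) - n + 2))
        return grams, contexts

    def sequence_probability(seq, n, grams, contexts):
        """Unsmoothed probability of a token sequence; low values indicate unusual code."""
        prob = 1.0
        for i in range(len(seq) - n + 1):
            gram = tuple(seq[i:i + n])
            prob *= grams[gram] / max(contexts[gram[:-1]], 1)
        return prob

    tokens = ["lock", "write", "unlock", "lock", "write", "unlock", "lock", "write"]
    grams, contexts = train_ngram_counts(tokens, 3)
    print(sequence_probability(["lock", "write", "unlock"], 3, grams, contexts))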

7.2 Future Work

The approaches proposed in this thesis show promising results of leveraging machine learning technologies to improve software reliability practices. Based on the findings presented in this thesis, we have identified several possible directions for future research.

Leveraging Deep Learning to Accelerate Software Analytics. Previous studies have shown that deep learning could help solve many software analytics problems, e.g., software defect prediction (Chapter 3), fault localization [133], code and API suggestion [72, 290], malware classification [209, 317], test report classification [110], software traceability [107], etc. Along this line, we plan to explore more applications of deep learning in software analytics. For example, the problem of automatic program repair, which aims at automatically finding a solution to software bugs without human intervention, has been studied for years [136, 181]. However, most existing approaches are still not applicable for fixing complex bugs [184]. The intuition of this project is that the open-source community has tens of thousands of bug-fixing history records, i.e., patches. These patches provide valuable human knowledge, and exploiting such knowledge can help automatically fix similar bugs in a new program. In this project, we plan to leverage deep learning to automatically distill the expertise and past efforts of developers to help fix new bugs. In addition, we would also like to explore the potential applications of deep learning algorithms for improving test case generation, program generation, and code comment generation.

Learning to Improve Imbalanced Defect Prediction. Software projects typically have imbalanced data for defect prediction, i.e., a low buggy rate or few buggy instances in the training dataset [193, 256], which significantly narrows down the representativeness of the training dataset and induces poor prediction performance. Existing approaches towards solving this problem fall into two main directions: a) increasing the buggy rate by duplicating buggy instances in the training dataset with various re-sampling techniques (i.e., resampling), and b) borrowing buggy data from other projects to build the prediction model (i.e., cross-project prediction). However, resampling techniques simply duplicate existing buggy data, which cannot introduce new types of bugs or improve the representativeness of the training dataset; thus, the improvement of resampling techniques on defect prediction is limited. Cross-project prediction techniques often introduce noise, which limits the expected improvement. In addition, cross-project prediction requires non-trivial effort to find a similar project with enough buggy data instances, which means the performance is not guaranteed with different source projects. To address the imbalanced data problem in defect prediction, different from the above two approaches, in this project we propose to leverage mutation generation techniques to synthesize mutation faults to increase the buggy rate and the representativeness of buggy instances in the training data.

Learning to Improve Static Bug Detection. There is a flurry of rule-based static bug detection techniques [2–4, 144], many of which have already been adopted in industry, e.g., FindBugs [4], Facebook-Infer [3], and Google Error-Prone [2]. One major challenge of static bug detection techniques is the large number of false positives they report, i.e., reported bugs that are not true bugs. Specifically, 30-90% of the warnings reported by static bug detection tools are false positives [79].

Such a large number of false positives often makes developers reluctant to use bug detection tools entirely, due to the overhead of alert inspection. In this project, we focus on the fundamental questions behind the false positives and answer the following research questions: Why do rule-based static bug finders generate so many false positives? How can we improve their patterns and avoid false positives? This project involves an empirical study to summarize the reasons behind the false positives and approaches to addressing them. The results of this project will make existing rule-based bug detection more accurate and will also provide invaluable guidance for both academic researchers and industry developers to build rule-based bug detection tools accurately.

Designing An Integrated Software Reliability Assurance Framework. Software quality assurance is gaining increasing attention throughout the software lifecycle. This thesis presents four machine learning based techniques to improve existing software reliability practices, i.e., defect prediction, static bug detection, regression test, and code review. However, most of the currently widely used software reliability practices are independent. Developers often have to switch among different tools to conduct different quality assurance tasks, which is time-consuming and requires non-trivial manual effort. A framework that integrates existing software reliability assurance tasks used in different stages of software development can accelerate quality assurance throughout the entire software development cycle. For example, a framework that integrates software bug detection and automatic bug repair tools can provide developers with an integrated bug detection and bug repair service. We believe such an integrated software reliability assurance framework can help developers deliver high-quality software with speed and agility.

Bibliography

[1] Bugs are expensive. https://www.tricentis.com/software-fail-watch. [2] Error Prone. http://errorprone.info. [3] Facebook-Infer. https://fbinfer.com. [4] FindBugs. http://findbugs.sourceforge.net. [5] PMD. https://pmd.github.io. [6] Major. http://mutation-testing.org/, 2017. [7] Wala. https://github.com/wala/WALA, 2017. [8] M. Acharya, T. Xie, J. Pei, and J. Xu. Mining API Patterns as Partial Orders from Source Code: From Usage Scenarios to Specifications. In FSE’07, pages 25–34. [9] A. Alali, H. Kagdi, and J. I. Maletic. What’s a typical commit? a characterization of open source software repositories. In ICPC’08, pages 182–191. [10] M. Allamanis, E. T. Barr, and C. Sutton. Learning Natural Coding Conventions. In FSE’14, pages 281–293. [11] S. Amasaki, Y. Takagi, O. Mizuno, and T. Kikuno. A Bayesian Belief Network for Assessing the Likelihood of Fault Content. In ISSRE’03, pages 215–226. [12] J. H. Andrews, L. C. Briand, and Y. Labiche. Is mutation an appropriate tool for testing experiments? In ICSE’05, pages 402–411. [13] J. H. Andrews, L. C. Briand, Y. Labiche, and A. S. Namin. Using mutation analysis for assessing and comparing testing coverage criteria. TSE’06, pages 608–624, 2006. [14] J. Anvik, L. Hiew, and G. C. Murphy. Who should fix this bug? In ICSE’06, pages 361–370. [15] M. J. Arafeen and H. Do. Test case prioritization using requirements-based clustering. In ICST’13, pages 312–321. [16] A. Arcuri and L. Briand. A practical guide for using statistical tests to assess randomized algorithms in software engineering. In ICSE’11, pages 1–10. [17] E. Arisholm, L. C. Briand, and M. Fuglerud. Data mining techniques for building fault-proneness models in telecom java software. In ISSRE’07, pages 215–224.

137 [18] T.-D. B Le, D. Lo, C. Le Goues, and L. Grunske. A learning-to-rank based fault localization approach using likely invariants. In ISSTA’16, pages 177–188. [19] A. Bacchelli and C. Bird. Expectations, outcomes, and challenges of modern code review. In ICSE’13, pages 712–721. [20] L. R. Bahl, P. Brown, P. V. de Souza, and R. Mercer. A Tree-Based Statistical Language Model for Natural Language Speech Recognition. Acoustics, Speech and Signal Processing, IEEE Transactions on, 37(7):1001–1008, 1989. [21] T. Ball. On the limit of control flow analysis for regression test selection. ISSTA’98, 23(2):134–142. [22] J. Bansiya and C. G. Davis. A hierarchical model for object-oriented design quality assessment. TSE’02, 28(1):4–17. [23] M. Barnett, C. Bird, J. Brunet, and S. K. Lahiri. Helping developers help themselves: Automatic decomposition of code review changesets. In ICSE’15, pages 134–144. [24] R. Bavishi, M. Pradel, and K. Sen. Context2name: A deep learning-based approach to infer natural variable names from usage contexts. arXiv preprint arXiv:1809.05193, 2018. [25] O. Baysal, O. Kononenko, R. Holmes, and M. W. Godfrey. The influence of non-technical factors on code review. In WCRE’13, pages 122–131. [26] O. Baysal, O. Kononenko, R. Holmes, and M. W. Godfrey. Investigating technical and non-technical factors influencing modern code review. Empirical Software Engineering, 21(3):932–959, 2016. [27] M. Beller, A. Bacchelli, A. Zaidman, and E. Juergens. Modern code reviews in open-source projects: Which problems do they fix? In MSR’14, pages 202–211. [28] Y. Bengio. Learning Deep Architectures for AI. Foundations and Trends in Machine Learning, 2(1):1–127, 2009. [29] Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin. A neural probabilistic language model. The Journal of Machine Learning Research, 3:1137–1155, 2003. [30] L. Benjamin and T. Zimmermann. DynaMine: Finding Common Error Patterns by Mining Software Revision Histories. In FSE’05, pages 296–305. [31] J. Black, E. Melachrinoudis, and D. Kaeli. Bi-criteria models for all-uses test suite reduction. In ICSE’04, pages 106–115. [32] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. the Journal of Machine Learning Research, 3:993–1022, 2003. [33] D. Bowes, S. Counsell, T. Hall, J. Petric, and T. Shippey. Getting defect prediction into industrial practice: the elff tool. In ISSREW’17, pages 44–47. [34] B. Busjaeger and T. Xie. Learning for test prioritization: an industrial case study. In FSE’16, pages 975–980. [35] J. C. Campbell, A. Hindle, and J. N. Amaral. Syntax Errors Just Aren’t Natural: Improving Error Reporting with Language Models. In MSR’14, pages 252–261.

138 [36] R. Carlson, H. Do, and A. Denton. A clustering approach to improving test case prioritization: An industrial case study. In ICSM’11, pages 382–391. [37] K. Cem. Improving the maintainability of automated test suites. In QW’97. [38] R.-Y. Chang, A. Podgurski, and J. Yang. Finding what’s not there: a new approach to revealing neglected conditions in software. In ISSTA’07, pages 163–173. [39] E. Charniak. Statistical Language Learning. In First MIT Press paperback edition. MIT Press, 1996. [40] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. Smote: synthetic minority over- sampling technique. Journal of Artificial Intelligence Research, 16:321–357, 2002. [41] B. Chen and Z. M. J. Jiang. Characterizing and detecting anti-patterns in the logging code. In ICSE’17, pages 71–81. [42] J. Chen, Y. Bai, D. Hao, Y. Xiong, H. Zhang, and B. Xie. Learning to prioritize test programs for compiler testing. In ICSE’17, pages 700–711. [43] J. Chen, Y. Bai, D. Hao, Y. Xiong, H. Zhang, L. Zhang, and B. Xie. Test case prioritization for : A text-vector based approach. In ICST’16, pages 266–277. [44] J. Chen, W. Hu, L. Zhang, D. Hao, S. Khurshid, and L. Zhang. Learning to accelerate symbolic execution via code transformation. In ECOOP’18, volume 109. [45] T.-H. Chen, S. W. Thomas, M. Nagappan, and A. E. Hassan. Explaining software defects using topic models. In MSR’12, pages 189–198. [46] S. R. Chidamber and C. F. Kemerer. A Metrics Suite for Object Oriented Design. TSE’94, 20(6):476– 493. [47] Chollak, Devin. Software bug detection using the n-gram language model. Master’s thesis, 2015. [48] D. Ciresan, U. Meier, and J. Schmidhuber. Multi-column deep neural networks for image classifica- tion. In CVPR’12, pages 3642–3649. [49] N. Cliff. Ordinal methods for behavioral data analysis. 2014. [50] J. Czerwonka, R. Das, N. Nagappan, A. Tarvo, and A. Teterev. Crane: Failure prediction, change analysis and test prioritization in practice–experiences from windows. In ICST’11, pages 357–366. [51] H. K. Dam, T. Tran, and A. Ghose. Explainable software analytics. In ICSE’18: New Ideas and Emerging Results, pages 53–56. [52] H. K. Dam, T. Tran, T. Pham, S. W. Ng, J. Grundy, and A. Ghose. Automatic feature learning for vulnerability prediction. arXiv preprint arXiv:1708.02368, 2017. [53] J. Devlin, J. Uesato, S. Bhupatiraju, R. Singh, A.-r. Mohamed, and P. Kohli. Robustfill: Neural program learning under noisy i/o. arXiv preprint arXiv:1703.07469, 2017. [54] H. Do and G. Rothermel. A controlled experiment assessing test case prioritization techniques via mutation faults. In ICSM’05, pages 411–420. [55] H. Do, G. Rothermel, and A. Kinneer. Empirical studies of test case prioritization in a junit testing environment. In ISSRE’04, pages 113–124.

139 [56] F. Dong, J. Wang, Q. Li, G. Xu, and S. Zhang. Defect prediction in android binary using deep neural network. Wireless Personal Communications, 102(3):2261–2285, 2018. [57] F. B. e Abreu and R. Carapu¸ca.Candidate metrics for object-oriented software within a taxonomy framework. JSS’94, 26(1):87–96. [58] S. Elbaum, A. Malishevsky, and G. Rothermel. Incorporating varying test costs and fault severities into test case prioritization. In ICSE’01, pages 329–338. [59] S. Elbaum, G. Rothermel, and J. Penix. Techniques for improving regression testing in continuous integration development environments. In FSE’14, pages 235–245. [60] K. O. Elish and M. O. Elish. Predicting defect-prone software modules using support vector machines. JSS’08, 81(5):649–660. [61] E. Engstrom, P. Runeson, and G. Wikstrand. An empirical evaluation of regression testing based on fix-cache recommendations. In ICST’10, pages 75–78. [62] M. Fagan. Design and code inspections to reduce errors in program development. In Software pioneers, pages 575–607. 2002. [63] Y. Fan, X. Xia, D. Lo, and S. Li. Early prediction of merged code changes to prioritize reviewing tasks. Empirical Software Engineering, pages 1–48, 2018. [64] W. B. Frakes and R. Baeza-Yates. Information retrieval: data structures and algorithms. 1992. [65] J. Friedman, T. Hastie, and R. Tibshirani. The elements of statistical learning, volume 1. Springer series in statistics New York, NY, USA:, 2001. [66] Y. Fu, M. Yan, X. Zhang, L. Xu, D. Yang, and J. D. Kymer. Automated classification of software change messages by semi-supervised latent dirichlet allocation. IST’15, 57:369–377. [67] N. Gayatri, S. Nickolas, A. Reddy, S. Reddy, and A. Nickolas. Feature selection using decision tree induction in class level metrics dataset for software defect predictions. In WCECS’10, pages 124–129. [68] C¸. E. GEREDE and Z. MAZAN. Will it pass? predicting the outcome of a source code review. Turkish Journal of Electrical Engineering & Computer Sciences, 26(3):1343–1353, 2018. [69] M. Gligoric, L. Eloussi, and D. Marinov. Practical regression test selection with dynamic file depen- dencies. In ISSTA’15, pages 211–222. [70] N. Gruska, A. Wasylkowski, and A. Zeller. Learning from 6,000 Projects: Lightweight Cross-Project Anomaly Detection. In ISSTA’10, pages 119–130. [71] X. Gu, H. Zhang, and S. Kim. Deep code search. In ICSE’18, pages 933–944. [72] X. Gu, H. Zhang, D. Zhang, and S. Kim. Deep api learning. In FSE’16, pages 631–642. [73] B. Guo and M. Song. Interactively decomposing composite changes to support code review and regression testing. In COMPSAC’17, volume 1, pages 118–127. [74] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. The weka data mining software: an update. ACM SIGKDD explorations newsletter, 11(1):10–18, 2009. [75] M. H. Halstead. Elements of Software Science (Operating and programming systems series). Elsevier Science Inc., 1977.

140 [76] K. Hamasaki, R. G. Kula, N. Yoshida, A. Cruz, K. Fujiwara, and H. Iida. Who does what during a code review? datasets of oss peer review repositories. In MSR’13, pages 49–52. [77] J. Han and C. Moraga. The influence of the sigmoid function parameters on the speed of backprop- agation learning. From Natural to Artificial Neural Computation, pages 195–201, 1995. [78] S. Han, D. R. Wallace, and R. C. Miller. Code Completion from Abbreviated Input. In ASE’09, pages 332–343. [79] Q. Hanam, L. Tan, R. Holmes, and P. Lam. Finding patterns in static analysis alerts: improving actionable alert ranking. In MSR’14, pages 152–161. [80] S. Hangal and M. S. Lam. Tracking Down Software Bugs Using Automatic Anomaly Detection. In ICSE’02, pages 291–301. [81] R. Harrison, S. J. Counsell, and R. V. Nithi. An evaluation of the mood set of object-oriented software metrics. TSE’98, 24(6):491–496. [82] A. E. Hassan. Predicting faults using the complexity of code changes. In ICSE’09, pages 78–88. [83] H. He and E. A. Garcia. Learning from imbalanced data. TKDE’09, 21(9):1263–1284. [84] Z. He, F. Peters, T. Menzies, and Y. Yang. Learning from open-source projects: An empirical study on defect prediction. In ESEM’13, pages 45–54. [85] V. J. Hellendoorn, P. T. Devanbu, and A. Bacchelli. Will They Like This?: Evaluating Code Contributions with Language Models. In MSR’15, pages 157–167. [86] C. Henard, M. Papadakis, M. Harman, Y. Jia, and Y. Le Traon. Comparing white-box and black-box test prioritization. In ICSE’16, pages 523–534. [87] K. Herzig, S. Just, and A. Zeller. It’s not a bug, it’s a feature: how onlinelassification impacts bug prediction. In ICSE’13, pages 392–401. [88] A. Hindle, E. T. Barr, Z. Su, M. Gabel, and P. Devanbu. On the Naturalness of Software. In ICSE’12, pages 837–847. [89] A. Hindle, D. M. German, M. W. Godfrey, and R. C. Holt. Automatic classication of large changes into maintenance categories. In ICPC’09, pages 30–39. [90] A. Hindle, D. M. German, and R. Holt. What do large commits tell us?: a taxonomical study of large commits. In MSR’08, pages 99–108. [91] A. Hindle, M. W. Godfrey, and R. C. Holt. Release pattern discovery via partitioning: Methodology and case study. In MSR’07, page 19. [92] G. E. Hinton, S. Osindero, and Y.-W. Teh. A fast learning algorithm for deep belief nets. Neural computation’06, 18(7):1527–1554. [93] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science’06, 313(5786):504–507. [94] T. V.-D. Hoang, R. J. Oentaryo, T.-D. B. Le, and D. Lo. Network-clustered multi-modal bug localization. TSE’18.

141 [95] D. Hovemeyer and W. Pugh. Finding bugs is easy. OOPSLA’04, 39(12):92–106. [96] C.-H. Hsiao, M. Cafarella, and S. Narayanasamy. Using Web Corpus Statistics for Program Analysis. In OOPSLA’14, pages 49–65. [97] X. Hu, G. Li, X. Xia, D. Lo, and Z. Jin. Deep code comment generation. In ICPC’18, pages 200–210. [98] Y. Huang, Q. Zheng, X. Chen, Y. Xiong, Z. Liu, and X. Luo. Mining version control system for automatically generating commit comment. In ESEM’17, pages 414–423. [99] V. Jagannath, Q. Luo, and D. Marinov. Change-aware preemption prioritization. In ISSTA’11, pages 133–143. [100] G. Jeong, S. Kim, and T. Zimmermann. Improving bug triage with bug tossing graphs. In FSE’09, pages 111–120. [101] B. Jiang and W. Chan. Bypassing code coverage approximation limitations via effective input-based randomized test case prioritization. In COMPSAC’13, pages 190–199. [102] B. Jiang, Z. Zhang, W. K. Chan, and T. Tse. Adaptive random test case prioritization. In ASE’09, pages 233–244. [103] L. Jiang, G. Misherghi, Z. Su, and S. Glondu. Deckard: Scalable and accurate tree-based detection of code clones. In ICSE’07, pages 96–105. [104] T. Jiang, L. Tan, and S. Kim. Personalized defect prediction. In ASE’13, pages 279–289. [105] Y. Jiang, B. Adams, and D. M. German. Will my patch make it? and how fast?: Case study on the linux kernel. In MSR’13, pages 101–110. [106] M. Jimenez, M. Cordy, Y. Le Traon, and M. Papadakis. On the impact of tokenizer and parameters on n-gram based code analysis. 2018. [107] G. Jin, C. Jinghui, and C.-H. Jane. Semantically enhanced software traceability using deep learning techniques. In ICSE’17, pages 3–14. [108] X. Jing, F. Wu, X. Dong, F. Qi, and B. Xu. Heterogeneous cross-company defect prediction by unified metric representation and cca-based transfer learning. In FSE’15, pages 496–507. [109] X.-Y. Jing, S. Ying, Z.-W. Zhang, S.-S. Wu, and J. Liu. Dictionary learning based software defect prediction. In ICSE’14, pages 414–423. [110] W. Junjie, C. Qiang, W. Song, and W. Qing. Domain adaptation for test report classification in crowdsourced testing. In ICSE’17, pages 83–92. [111] M. Jureczko and L. Madeyski. Towards identifying software project clusters with regard to defect prediction. In PROMISE’10, page 9. [112] R. Just, D. Jalali, L. Inozemtseva, M. D. Ernst, R. Holmes, and G. Fraser. Are mutants a valid substitute for real faults in software testing? In FSE’14, pages 654–665. [113] H. Kagdi, M. L. Collard, and J. I. Maletic. An Approach to Mining Call-Usage Patterns with Syntactic Context. In ASE’07, pages 457–460.

142 [114] Y. Kamei, T. Fukushima, S. McIntosh, K. Yamashita, N. Ubayashi, and A. E. Hassan. Studying just- in-time defect prediction using cross-project models. Empirical Software Engineering, 21(5):2072– 2106, 2016. [115] Y. Kamei, E. Shihab, B. Adams, A. E. Hassan, A. Mockus, A. Sinha, and N. Ubayashi. A large-scale empirical study of just-in-time quality assurance. TSE’13, 39(6):757–773. [116] A. Karpathy, J. Johnson, and L. Fei-Fei. Visualizing and understanding recurrent networks. In ICLR’16 Workshop. [117] W. M. Khoo, A. Mycroft, and R. Anderson. Rendezvous: A Search Engine for Binary Code. In MSR’13, pages 329–338. [118] T. Khoshgoftaar and N. Seliya. Tree-based software quality estimation models for fault prediction. In Software Metrics’02, pages 203–214. [119] D. Kim, J. Nam, J. Song, and S. Kim. Automatic patch generation learned from human-written patches. In ICSE’13, pages 802–811. [120] J.-H. Kim. Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap. Computational statistics & data analysis, 53(11):3735–3745, 2009. [121] M. Kim, J. Nam, J. Yeon, S. Choi, and S. Kim. REMI: defect prediction for efficient api testing. In FSE’15, pages 990–993. [122] S. Kim, E. J. Whitehead Jr, and Y. Zhang. Classifying software changes: Clean or buggy? TSE’08, 34(2):181–196. [123] S. Kim, H. Zhang, R. Wu, and L. Gong. Dealing with noise in defect prediction. In ICSE’11, pages 481–490. [124] S. Kim, T. Zimmermann, K. Pan, and E. J. J. Whitehead. Automatic identification of bug- introducing changes. In ASE’06, pages 81–90. [125] S. Kim, T. Zimmermann, E. J. Whitehead Jr, and A. Zeller. Predicting faults from cached history. In ICSE’07, pages 489–498. [126] B. Kitchenham, E. Mendes, and G. H. Travassos. Cross versus within-company cost estimation studies: A systematic review. TSE’07, 33(5):316–329. [127] A. G. Koru and H. Liu. Building effective defect-prediction models in practice. IEEE software, 22(6):23–29, 2005. [128] P. Kreutzer, G. Dotzler, M. Ring, B. M. Eskofier, and M. Philippsen. Automatic clustering of code changes. In MSR’16, pages 61–72. [129] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems’12, pages 1097–1105. [130] M. Laali, H. Liu, M. Hamilton, M. Spichkova, and H. W. Schmidt. Test case prioritization using online fault detection information. In 21th Ada-Europe International Conference on Reliable Software Technologies, pages 78–93, 2016. [131] O. Laitenberger. A survey of software inspection technologies. In Handbook of Software Engineering and Knowledge Engineering: Volume II: Emerging Technologies, pages 517–555. 2002.

143 [132] S. Lal and A. Sureka. A Static Technique for Fault Localization Using Character N-gram Based Information Retrieval Model. In ISEC’12, pages 109–118. [133] A. Lam, A. Nguyen, H. Nguyen, and T. Nguyen. Combining deep learning with information retrieval to localize buggy files for bug reports. In ASE’15, pages 476–481. [134] J. Lawall and D. Lo. An Automated Approach for Finding Variable-constant Pairing Bugs. In ASE’10, pages 103–112. [135] T.-D. B. Le, R. J. Oentaryo, and D. Lo. Information retrieval and spectrum based bug localization: better together. In FSE’15, pages 579–590. [136] C. Le Goues, M. Dewey-Vogt, S. Forrest, and W. Weimer. A systematic study of automated program repair: Fixing 55 out of 105 bugs for $8 each. In ICSE’12, pages 3–13. [137] C. Le Goues, T. Nguyen, S. Forrest, and W. Weimer. Genprog: A generic method for automatic software repair. TSE’12, 38(1):54. [138] T. Lee, J. Nam, D. Han, S. Kim, and H. P. In. Micro interaction metrics for defect prediction. In FSE’11, pages 311–321. [139] J. Li, P. He, J. Zhu, and M. R. Lyu. Software defect prediction via convolutional neural network. In QRS’17, pages 318–328. [140] L. Li, H. Feng, W. Zhuang, N. Meng, and B. Ryder. Cclearner: A deep learning-based clone detection approach. In ICSME’17, pages 249–260. [141] X. Li, Y. Liang, H. Qian, Y.-Q. Hu, L. Bu, Y. Yu, X. Chen, and X. Li. Symbolic execution of complex program driven by machine learning based constraint solving. In ASE’16, pages 554–559. [142] Z. Li, M. Harman, and R. M. Hierons. Search algorithms for regression test case prioritization. TSE’07, 33(4):225–237. [143] Z. Li, S. Lu, S. Myagmar, and Y. Zhou. CP-Miner: A Tool for Finding Copy-paste and Related Bugs in Operating System Code. In OSDI’04, pages 20–20. [144] Z. Li and Y. Zhou. PR-Miner: Automatically Extracting Implicit Programming Rules and Detecting Violations in Large Software Code. In FSE’05, pages 306–315, 2005. [145] Z. Li, D. Zou, S. Xu, X. Ou, H. Jin, S. Wang, Z. Deng, and Y. Zhong. Vuldeepecker: A deep learning-based system for vulnerability detection. arXiv preprint arXiv:1801.01681, 2018. [146] B. Liang, P. Bian, Y. Zhang, W. Shi, W. You, and Y. Cai. AntMiner: Mining More Bugs by Reducing Noise Interference. In ICSE’16, pages 333–344. [147] J. Liang, S. Elbaum, and G. Rothermel. Redefining prioritization: continuous prioritization for continuous integration. In ICSE’18, pages 688–698. [148] M. Linares-V´asquez,M. White, C. Bernal-C´ardenas,K. Moran, and D. Poshyvanyk. Mining android app usages for generating actionable gui-based execution scenarios. In MSR’15, pages 111–122. [149] P. Liu, X. Zhang, M. Pistoia, Y. Zheng, M. Marques, and L. Zeng. Automatic text input generation for mobile testing. In ICSE’17, pages 643–653.

144 [150] Y. Liu, D. Poshyvanyk, R. Ferenc, T. Gyim´othy, and N. Chrisochoides. Modeling class cohesion as mixtures of latent topics. In ICSM’09, pages 233–242. [151] F. Long, P. Amidon, and M. Rinard. Automatic inference of code transforms for patch generation. In FSE’17, pages 727–739. [152] F. Long and M. Rinard. Staged program repair with condition synthesis. In FSE’15, pages 166–178. [153] Y. Lu, Y. Lou, S. Cheng, L. Zhang, D. Hao, Y. Zhou, and L. Zhang. How does regression test prioritization perform in real-world software evolution? In ICSE’16, pages 535–546. [154] Q. Luo, K. Moran, and D. Poshyvanyk. A large-scale empirical comparison of static and dynamic test case prioritization techniques. In FSE’16, pages 559–570. [155] Q. Luo, K. Moran, D. Poshyvanyk, and M. Di Penta. Assessing test case prioritization on real faults and mutants. arXiv preprint arXiv:1807.08823, 2018. [156] S. Ma, Y. Liu, W.-C. Lee, X. Zhang, and A. Grama. Mode: Automated neural network model debugging via state differential analysis and input selection. In FSE’18. [157] L. MacLeod, M. Greiler, M.-A. Storey, C. Bird, and J. Czerwonka. Code reviewing in the trenches: Understanding challenges and best practices. IEEE Software, 2017. [158] C. D. Manning and H. Sch¨utze. Foundations of Statistical Natural Language Processing. MIT press, 1999. [159] K. Mao, M. Harman, and Y. Jia. Sapienz: multi-objective automated testing for android applications. In ISSTA’16, pages 94–105. [160] F. Martin et al. Refactoring: Improving the Design of Existing Code. 1999. [161] R. Martin. Oo design quality metrics. An analysis of dependencies, 12:151–170, 1994. [162] V. Mashayekhi, J. M. Drake, W.-T. Tsai, and J. Riedl. Distributed, collaborative software inspection. IEEE software, 10(5):66–75, 1993. [163] T. J. McCabe. A complexity measure. TSE’76, (4):308–320. [164] S. McIntosh, Y. Kamei, B. Adams, and A. E. Hassan. The impact of code review coverage and code review participation on software quality: A case study of the qt, vtk, and itk projects. In MSR’14, pages 192–201. [165] S. McIntosh, Y. Kamei, B. Adams, and A. E. Hassan. An empirical study of the impact of modern code review practices on software quality. Empirical Software Engineering, 21(5):2146–2189, 2016. [166] H. Mei, D. Hao, L. Zhang, L. Zhang, J. Zhou, and G. Rothermel. A static approach to prioritizing junit test cases. TSE’12, 38(6):1258–1275. [167] A. Memon, Z. Gao, B. Nguyen, S. Dhanda, E. Nickell, R. Siemborski, and J. Micco. Taming google- scale continuous testing. In ICSE’17, pages 233–242. [168] A. Meneely, L. Williams, W. Snipes, and J. Osborne. Predicting failures with developer networks and social network analysis. In FSE’08, pages 13–23. [169] T. Menzies, J. Greenwald, and A. Frank. Data mining static code attributes to learn defect predictors. TSE’07, 33(1):2–13.

145 [170] T. Menzies, Z. Milton, B. Turhan, B. Cukic, Y. Jiang, and A. Bener. Defect prediction from static code features: current results, limitations, new approaches. ASE’10, 17(4):375–407. [171] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS’13, pages 3111–3119. [172] J. Miller, M. Wood, and M. Roper. Further experiences with scenarios and checklists. Empirical Software Engineering, 3(1):37–64, 1998. [173] B. Miranda and A. Bertolino. Does code coverage provide a good stopping rule for operational profile based testing? In AST’16, pages 22–28. [174] B. Miranda and A. Bertolino. Scope-aided test prioritization, selection and minimization for software reuse. JSS’16, pages 1–22. [175] S. Mirarab and L. Tahvildari. A prioritization approach for software test cases on bayesian networks. FASE’07, pages 4422–0276. [176] N. Mirzaei, H. Bagheri, R. Mahmood, and S. Malek. Sig-droid: Automated system input generation for android applications. In ISSRE’15, pages 461–471. [177] A. Mockus and L. G. Votta. Identifying reasons for software changes using historic databases. In ICSM’00, page 120. [178] A. Mockus and D. M. Weiss. Predicting risk of software changes. Bell Labs Technical Journal, 5(2):169–180, 2000. [179] N. Moha, Y.-G. Gueheneuc, L. Duchien, and A.-F. Le Meur. Decor: A method for the specification and detection of code and design smells. TSE’10, 36(1):20–36. [180] A.-r. Mohamed, G. E. Dahl, and G. Hinton. Acoustic modeling using deep belief networks. Audio, Speech, and Language Processing, IEEE Transactions on, 20(1):14–22, 2012. [181] M. Monperrus. Automatic software repair: a bibliography. CSUR’18, 51(1):17. [182] R. Morales, S. McIntosh, and F. Khomh. Do code review practices impact design quality? a case study of the qt, vtk, and itk projects. In SANER’15, pages 171–180. [183] R. Moser, W. Pedrycz, and G. Succi. A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction. In ICSE’08, pages 181–190. [184] M. Motwani, S. Sankaranarayanan, R. Just, and Y. Brun. Do automated program repair techniques repair hard and important bugs? Empirical Software Engineering, 23(5):2901–2947, 2018. [185] L. Mou, G. Li, Z. Jin, L. Zhang, and T. Wang. Tbcnn: A tree-based convolutional neural network for programming language processing. Unpublished manuscript: http://arxiv. org/abs/1409.5718, 2014. [186] D. Movshovitz-Attias and W. W. Cohen. Natural Language Models for Predicting Programming Comments. In ACL’13, pages 35–40. [187] G. Murphy and D. Cubranic. Automatic bug triage using text categorization. In SEKE’04. [188] G. J. Myers, C. Sandler, and T. Badgett. The art of software testing. 2011.

[189] N. Nagappan and T. Ball. Using software dependencies and churn metrics to predict field failures: An empirical case study. In ESEM'07, pages 364–373.
[190] N. Nagappan, T. Ball, and A. Zeller. Mining metrics to predict component failures. In ICSE'06, pages 452–461.
[191] J. Nam, W. Fu, S. Kim, T. Menzies, and L. Tan. Heterogeneous defect prediction. TSE'17, 44(9):874–896.
[192] J. Nam and S. Kim. Clami: Defect prediction on unlabeled datasets. In ASE'15, pages 452–463.
[193] J. Nam and S. Kim. Heterogeneous defect prediction. In FSE'15, pages 508–519.
[194] J. Nam, S. J. Pan, and S. Kim. Transfer defect learning. In ICSE'13, pages 382–391.
[195] G. Navarro. A guided tour to approximate string matching. ACM Comput. Surv., 33(1), 2001.
[196] S. Nessa, M. Abedin, E. Wong, L. Khan, and Y. Qi. Software Fault Localization Using N-gram Analysis. In WASA'08, pages 548–559.
[197] A. T. Nguyen, T. T. Nguyen, T. N. Nguyen, D. Lo, and C. Sun. Duplicate bug report detection with a combination of information retrieval and topic modeling. In ASE'12, pages 70–79.
[198] T. T. Nguyen, A. T. Nguyen, H. A. Nguyen, and T. N. Nguyen. A Statistical Semantic Language Model for Source Code. In FSE'13, pages 532–542.
[199] T. T. Nguyen, H. A. Nguyen, N. H. Pham, J. M. Al-Kofahi, and T. N. Nguyen. Graph-based mining of multiple object usage patterns. In FSE'09, pages 383–392.
[200] T. T. Nguyen, T. N. Nguyen, and T. M. Phuong. Topic-based defect prediction. In ICSE'11, pages 932–935.
[201] T. B. Noor and H. Hemmati. A similarity-based approach for test case prioritization using historical failure data. In ISSRE'15, pages 58–68.
[202] Y. Oda, H. Fudaba, G. Neubig, H. Hata, S. Sakti, T. Toda, and S. Nakamura. Learning to Generate Pseudo-code from Source Code Using Statistical Machine Translation. In ASE'15, pages 824–829.
[203] T. J. Ostrand and E. J. Weyuker. The distribution of faults in a large industrial software system. In FSE'07, pages 55–64.
[204] T. J. Ostrand, E. J. Weyuker, and R. M. Bell. Programmer-based fault prediction. In PROMISE'10, pages 19:1–19:10.
[205] T. J. Ostrand, E. J. Weyuker, and R. M. Bell. Predicting the location and number of faults in large software systems. TSE'05, 31(4):340–355, 2005.
[206] E. Paikari, M. M. Richter, and G. Ruhe. Defect prediction using case-based reasoning: An attribute weighting technique based upon sensitivity analysis in neural networks. SEKE'12.
[207] F. Palomba, G. Bavota, M. Di Penta, R. Oliveto, A. De Lucia, and D. Poshyvanyk. Detecting bad smells in source code using change history information. In ASE'13, pages 268–278.
[208] S. J. Pan, I. Tsang, J. Kwok, and Q. Yang. Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks, pages 199–210, 2011.

[209] R. Pascanu, J. W. Stokes, H. Sanossian, M. Marinescu, and A. Thomas. Malware classification with recurrent networks. In ICASSP'15, pages 1916–1920.
[210] H. Perl, S. Dechand, M. Smith, D. Arp, F. Yamaguchi, K. Rieck, S. Fahl, and Y. Acar. Vccfinder: Finding potential vulnerabilities in open-source projects to assist code audits. In CCS'15, pages 426–437.
[211] M. Pinzger, N. Nagappan, and B. Murphy. Can developer-module networks predict failures? In FSE'08, pages 2–12.
[212] A. Porter, H. Siy, and L. Votta. A review of software inspections. Advances in Computers, 42:39–76, 1996.
[213] M. Pradel and T. R. Gross. Automatic Generation of Object Usage Specifications from Large Method Traces. In ASE'09, pages 371–382.
[214] M. Pradel and K. Sen. Deepbugs: A learning approach to name-based bug detection. arXiv preprint arXiv:1805.11683, 2018.
[215] L. Prechelt and A. Pepper. Why software repositories are not used for defect-insertion circumstance analysis more often: A case study. IST'14, 56(10):1377–1389.
[216] J. C. Pruessner, C. Kirschbaum, G. Meinlschmid, and D. H. Hellhammer. Two formulas for computation of the area under the curve represent measures of total hormone concentration versus time-dependent change. Psychoneuroendocrinology, 28(7):916–931, 2003.
[217] R. Purushothaman and D. E. Perry. Toward understanding the rhetoric of small source code changes. TSE'05, 31(6):511–526.
[218] T.-S. Quah and M. M. T. Thwin. Application of neural networks for software quality prediction using object-oriented metrics. In ICSM'03, pages 116–125.
[219] A. Radford, R. Jozefowicz, and I. Sutskever. Learning to generate reviews and discovering sentiment. arXiv preprint arXiv:1704.01444, 2017.
[220] F. Rahman and P. Devanbu. How, and why, process metrics are better. In ICSE'13, pages 432–441.
[221] F. Rahman, S. Khatri, E. T. Barr, and P. Devanbu. Comparing static bug finders and statistical prediction. In ICSE'14, pages 424–434.
[222] F. Rahman, D. Posnett, and P. Devanbu. Recalling the "imprecision" of cross-project defect prediction. In FSE'12, pages 61:1–61:11.
[223] M. K. Ramanathan, A. Grama, and S. Jagannathan. Path-Sensitive Inference of Function Precedence Protocols. In ICSE'07, pages 240–250.
[224] B. Ray, V. Hellendoorn, Z. Tu, C. Nguyen, S. Godhane, A. Bacchelli, and P. Devanbu. On the "Naturalness" of Buggy Code. In ICSE'16, pages 428–439, 2016.
[225] V. Raychev, M. Vechev, and E. Yahav. Code Completion with Statistical Language Models. In PLDI'14, pages 419–428.
[226] M. Renieres and S. P. Reiss. Fault localization with nearest neighbor queries. In ASE'03, pages 30–39.

[227] P. C. Rigby. Open source peer review–lessons and recommendations for closed source. 2012.
[228] P. C. Rigby, D. M. German, and M.-A. Storey. Open source software peer review practices: A case study of the Apache server. In ICSE'08, pages 541–550.
[229] P. C. Rigby and M.-A. Storey. Understanding broadcast based peer review on open source software projects. In ICSE'11, pages 541–550.
[230] A. Rimsa, M. d'Amorim, F. M. Q. Pereira, and R. S. Bigonha. Efficient static checker for tainted variable attacks. Science of Computer Programming, 80:91–105, 2014.
[231] R. Rosenfeld. Two Decades of Statistical Language Modeling: Where Do We Go from Here? 2000.
[232] G. Rothermel and M. J. Harrold. A safe, efficient regression test selection technique. TOSEM'97, 6(2):173–210.
[233] G. Rothermel, R. H. Untch, C. Chu, and M. J. Harrold. Prioritizing test cases for regression testing. TSE'01, 27(10):929–948.
[234] G. Rothermel, R. H. Untch, C. Chu, and M. J. Harrold. Test case prioritization: An empirical study. In ICSM'99, pages 179–188.
[235] R. K. Saha, L. Zhang, S. Khurshid, and D. E. Perry. An information retrieval approach for regression test prioritization based on program changes. In ICSE'15, pages 268–279.
[236] M. Sampson, L. Zhang, A. Morrison, N. J. Barrowman, T. J. Clifford, R. W. Platt, T. P. Klassen, and D. Moher. An Alternative to the Hand Searching Gold Standard: Validating Methodological Search Filters Using Relative Recall. BMC Medical Research Methodology, 6(1), 2006.
[237] A. L. Santos, G. Prendi, H. Sousa, and R. Ribeiro. Stepwise API usage assistance using n-gram language models. JSS'17, 131:461–474.
[238] R. Sarikaya, G. E. Hinton, and A. Deoras. Application of deep belief networks for natural language understanding. TASLP'14, 22(4):778–784.
[239] F. Seide, G. Li, and D. Yu. Conversational speech transcription using context-dependent deep neural networks. In INTERSPEECH'11.
[240] A. Shi, A. Gyori, M. Gligoric, A. Zaytsev, and D. Marinov. Balancing trade-offs in test-suite reduction. In FSE'14, pages 246–256.
[241] A. Shi, T. Yung, A. Gyori, and D. Marinov. Comparing and combining test-suite reduction and regression test selection. In FSE'15, pages 237–247.
[242] T. Shippey, D. Bowes, and T. Hall. Automatically identifying code features for software defect prediction: Using AST n-grams. Information and Software Technology, 2018.
[243] S. Shoham, E. Yahav, S. Fink, and M. Pistoia. Static Specification Mining Using Automata-Based Abstractions. In ISSTA'07, pages 174–184.
[244] M. S. Siddik and K. Sakib. Rdcc: An effective test case prioritization framework using software requirements, design and source code collaboration. In ICCIT'14, pages 75–80.
[245] J. Śliwerski, T. Zimmermann, and A. Zeller. When do changes induce fixes? In MSR'05, pages 1–5.

[246] D. Spadini, M. Aniche, M.-A. Storey, M. Bruntink, and A. Bacchelli. When testing meets code review: Why and how developers review tests. In ICSE'18, pages 677–687.
[247] A. Srivastava and J. Thiagarajan. Effectively prioritizing tests in development environment. In ISSTA'02, pages 97–106.
[248] M. Stein, J. Riedl, S. J. Harner, and V. Mashayekhi. A case study of distributed, asynchronous software inspection. In ICSE'97, pages 107–117.
[249] T. Su, G. Meng, Y. Chen, K. Wu, W. Yang, Y. Yao, G. Pu, Y. Liu, and Z. Su. Guided, stochastic model-based GUI testing of Android apps. In FSE'17, pages 245–256.
[250] B. Sun, G. Shu, A. Podgurski, and B. Robinson. Extending Static Analysis by Mining Project-specific Rules. In ICSE'12, pages 1054–1063.
[251] A. Sureka and P. Jalote. Detecting Duplicate Bug Report Using Character N-gram-based Features. In APSEC'10, pages 366–374.
[252] A. Sutherland and G. Venolia. Can peer code reviews be exploited for later information needs? In ICSE-Companion'09, pages 259–262.
[253] E. B. Swanson. The dimensions of maintenance. In ICSE'76, pages 492–497.
[254] A. Tamrawi, T. T. Nguyen, J. M. Al-Kofahi, and T. N. Nguyen. Fuzzy set and cache-based approach for bug triaging. In FSE'11, pages 365–375.
[255] L. Tan, D. Yuan, G. Krishna, and Y. Zhou. /* iComment: Bugs or Bad Comments? */. In SOSP'07, pages 145–158.
[256] M. Tan, L. Tan, S. Dara, and C. Mayeux. Online defect prediction for imbalanced data. In ICSE'15, pages 99–108.
[257] S. H. Tan, D. Marinov, L. Tan, and G. T. Leavens. @tComment: Testing Javadoc Comments to Detect Comment-code Inconsistencies. In ICST'12, pages 260–269.
[258] X. Tang, S. Wang, and K. Mao. Will this bug-fixing change break regression testing? In ESEM'15, pages 1–10.
[259] C. Tantithamthavorn, A. E. Hassan, and K. Matsumoto. The impact of class rebalancing techniques on the performance and interpretation of defect prediction models. arXiv preprint arXiv:1801.10269, 2018.
[260] C. Tantithamthavorn, S. McIntosh, A. E. Hassan, and K. Matsumoto. An empirical comparison of model validation techniques for defect prediction models. TSE'16, 43:1–18.
[261] C. Tantithamthavorn, S. McIntosh, A. E. Hassan, A. Ihara, and K. Matsumoto. The impact of mislabelling on the performance and interpretation of defect prediction models. In ICSE'15, pages 812–823.
[262] C. Tantithamthavorn, S. McIntosh, A. E. Hassan, and K. Matsumoto. Automated parameter optimization of classification techniques for defect prediction models. In ICSE'16, pages 321–332.
[263] W. Tao and L. Wei-hua. Naive Bayes software defect prediction model. In CiSE'10, pages 1–4.

[264] Y. Tao, Y. Dang, T. Xie, D. Zhang, and S. Kim. How do software engineers understand code changes? An exploratory study in industry. In FSE'12, page 51.
[265] Y. Tao and S. Kim. Partitioning composite code changes to facilitate code review. In MSR'15, pages 180–190.
[266] S. W. Thomas, H. Hemmati, A. E. Hassan, and D. Blostein. Static test case prioritization using topic models. EMSE'14, 19(1):182–212.
[267] P. Thongtanunam, C. Tantithamthavorn, R. G. Kula, N. Yoshida, H. Iida, and K.-i. Matsumoto. Who should review my code? A file location-based code-reviewer recommendation approach for modern code review. In SANER'15, pages 141–150.
[268] S. Thummalapenta and T. Xie. Mining Exception-Handling Rules as Sequence Association Rules. In ICSE'09, pages 496–506.
[269] S. Thummalapenta and T. Xie. Alattin: Mining Alternative Patterns for Defect Detection. Automated Software Engineering, 18(3-4):292–323, 2011.
[270] P. Tonella, P. Avesani, and A. Susi. Using the case-based ranking methodology for test case prioritization. In ICSM'06, pages 123–133.
[271] Z. Tu, Z. Su, and P. Devanbu. On the Localness of Software. In FSE'14, pages 269–280.
[272] B. Turhan, T. Menzies, A. B. Bener, and J. Di Stefano. On the relative value of cross-company and within-company data for defect prediction. Empirical Software Engineering, 14(5):540–578, 2009.
[273] S. Varun Kumar and M. Kumar. Test case prioritization using fault severity. IJCST'10, 1(1).
[274] L. G. Votta Jr. Does every inspection need a meeting? FSE'93, 18(5):107–114.
[275] F. Wang, T.-T. Quach, J. Wheeler, J. B. Aimone, and C. D. James. Sparse coding for n-gram feature extraction and training for file fragment classification. TSE'18, 13(10):2553–2562.
[276] J. Wang, B. Shen, and Y. Chen. Compressed C4.5 models for software defect prediction. In QSIC'12, pages 13–16.
[277] J. Wang, S. Wang, and Q. Wang. Is there a golden feature set for static warning identification? An experimental evaluation. In ESEM'18, page 17.
[278] S. Wang, D. Chollak, D. Movshovitz-Attias, and L. Tan. Bugram: Bug detection with n-gram language models. In ASE'16, pages 708–719.
[279] S. Wang, T. Liu, J. Nam, and L. Tan. Deep semantic feature learning for software defect prediction. TSE'18.
[280] S. Wang, T. Liu, and L. Tan. Automatically learning semantic features for defect prediction. In ICSE'16, pages 297–308.
[281] S. Wang, J. Nam, and L. Tan. QTEP: Quality-aware test case prioritization. In FSE'17, pages 523–534.
[282] S. Wang, W. Zhang, and Q. Wang. Fixercache: Unsupervised caching active developers for diverse bug triage. In ESEM'14, page 25.

[283] S. Wang, W. Zhang, Y. Yang, and Q. Wang. Devnet: Exploring developer collaboration in heterogeneous networks of bug repositories. In ESEM'13, pages 193–202.
[284] A. Wasylkowski and A. Zeller. Mining Temporal Specifications from Object Usage. Automated Software Engineering, 18(3-4):263–292, 2011.
[285] A. Wasylkowski, A. Zeller, and C. Lindig. Detecting object usage anomalies. In FSE'07, pages 35–44.
[286] S. Watanabe, H. Kaiya, and K. Kaijiri. Adapting a fault prediction model to allow inter language reuse. In PROMISE'08, pages 19–24.
[287] W. Weimer and G. C. Necula. Mining Temporal Specifications for Error Detection. pages 461–476, 2005.
[288] M. Wen, R. Wu, and S.-C. Cheung. How well do change sequences predict defects? Sequence learning from software changes. TSE'18.
[289] E. J. Weyuker, T. J. Ostrand, and R. M. Bell. Using developer information as a factor for fault prediction. In PROMISE'07, page 8.
[290] M. White, C. Vendome, M. Linares-Vásquez, and D. Poshyvanyk. Toward Deep Learning Software Repositories. In MSR'15, pages 334–345.
[291] C. Williams and J. Hollingsworth. Automatic Mining of Source Code Repositories to Improve Bug Finding Techniques. TSE'05, 31:466–480.
[292] C. C. Williams and J. K. Hollingsworth. Recovering System Specific Rules from Software Repositories. In MSR'05, pages 1–5.
[293] I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 2005.
[294] E. Wong, T. Liu, and L. Tan. Clocom: Mining existing source code for automatic comment generation. In SANER'15, pages 380–389.
[295] E. Wong, J. Yang, and L. Tan. Autocomment: Mining question and answer sites for automatic comment generation. In ASE'13, pages 562–567.
[296] E. Wong, L. Zhang, S. Wang, T. Liu, and L. Tan. Dase: Document-assisted symbolic execution for improving automated software testing. In ICSE'15, pages 620–631.
[297] W. E. Wong, R. Gao, Y. Li, R. Abreu, and F. Wotawa. A survey on software fault localization. TSE'16, 42(8):707–740.
[298] W. E. Wong and Y. Qi. BP neural network-based effective fault localization. SEKEJ'09, 19(04):573–597.
[299] X. Xia, D. Lo, S. J. Pan, N. Nagappan, and X. Wang. Hydra: Massively compositional model for cross-project defect prediction. TSE'16, 42:977–998.
[300] X. Xia, D. Lo, X. Wang, and X. Yang. Collective personalized change classification with multiobjective search. TR'16, 65(4):1810–1829.

[301] T. Xie and J. Pei. MAPO: Mining API Usages from Open Source Repositories. In MSR'06, pages 54–57.
[302] X. Xie, W. Zhang, Y. Yang, and Q. Wang. Dretom: Developer recommendation based on topic models for bug resolution. In PROMISE'12, pages 19–28.
[303] Y. Xiong, X. Liu, M. Zeng, L. Zhang, and G. Huang. Identifying patch correctness in test-based program repair. In ICSE'18, pages 789–799.
[304] B. Xu, D. Ye, Z. Xing, X. Xia, G. Chen, and S. Li. Predicting semantically linkable knowledge in developer online forums via convolutional neural network. In ASE'16, pages 51–62.
[305] D. Xu, S. Nair, Y. Zhu, J. Gao, A. Garg, L. Fei-Fei, and S. Savarese. Neural task programming: Learning to generalize across hierarchical tasks. arXiv preprint arXiv:1710.01813, 2017.
[306] J. Xuan, H. Jiang, Z. Ren, and W. Zou. Developer prioritization in bug repositories. In ICSE'12, pages 25–35.
[307] J. Xuan, M. Martinez, F. Demarco, M. Clement, S. L. Marcote, T. Durieux, D. Le Berre, and M. Monperrus. Nopol: Automatic repair of conditional statement bugs in Java programs. TSE'17, 43(1):34–55.
[308] M. Yan, Y. Fu, X. Zhang, D. Yang, L. Xu, and J. D. Kymer. Automatically classifying software changes via discriminative topic model: Supporting multi-category and cross-project. JSS'16, 113:296–308.
[309] J. Yang, A. Zhikhartsev, Y. Liu, and L. Tan. Better test cases for better automated program repair. In FSE'17, pages 831–841.
[310] X. Yang, D. Lo, X. Xia, L. Bao, and J. Sun. Combining word embedding with information retrieval to recommend similar bug reports. In ISSRE'16, pages 127–137.
[311] X. Yang, D. Lo, X. Xia, Y. Zhang, and J. Sun. Deep learning for just-in-time defect prediction. In QRS'15, pages 17–26.
[312] S. Yoo and M. Harman. Regression testing minimization, selection and prioritization: A survey. STVR'12, 22(2):67–120.
[313] S. Yoo, M. Harman, and D. Clark. Fault localization prioritization: Comparing information-theoretic and coverage-based approaches. TOSEM'13, 22(3):19.
[314] S. Yoo, R. Nilsson, and M. Harman. Faster fault finding at Google using multi objective regression test optimisation. In FSE'11.
[315] Y. T. Yu and M. F. Lau. Fault-based test suite prioritization for specification-based testing. IST'12, 54(2):179–202.
[316] Z. Yu, H. Hu, C. Bai, K.-Y. Cai, and W. Wong. GUI Software Fault Localization Using N-gram Analysis. In HASE'11, pages 325–332.
[317] Z. Yuan, Y. Lu, Z. Wang, and Y. Xue. Droid-sec: Deep learning in Android malware detection. In SIGCOMM'14, pages 371–372.
[318] F. Zhang, Q. Zheng, Y. Zou, and A. E. Hassan. Cross-project defect prediction using a connectivity-based unsupervised classifier. In ICSE'16, pages 309–320.

[319] L. Zhang, D. Hao, L. Zhang, G. Rothermel, and H. Mei. Bridging the gap between the total and additional test-case prioritization strategies. In ICSE'13, pages 192–201.
[320] L. Zhang, M. Kim, and S. Khurshid. Faulttracer: A change impact and regression fault analysis tool for evolving Java programs. In FSE'12, page 40.
[321] L. Zhang, D. Marinov, L. Zhang, and S. Khurshid. An empirical study of JUnit test-suite reduction. In ISSRE'11, pages 170–179.
[322] L. Zhang, G. Rosenblatt, E. Fetaya, R. Liao, W. E. Byrd, M. Might, R. Urtasun, and R. Zemel. Neural guided constraint logic programming for program synthesis. arXiv preprint arXiv:1809.02840, 2018.
[323] M. Zhang, X. Li, L. Zhang, and S. Khurshid. Boosting spectrum-based fault localization using PageRank. In ISSTA'17, pages 261–272.
[324] W. Zhang, S. Wang, Y. Yang, and Q. Wang. Heterogeneous network analysis of developer contribution in bug repositories. In ICCSC'13, pages 98–105.
[325] T. Zimmermann, N. Nagappan, H. Gall, E. Giger, and B. Murphy. Cross-project defect prediction: A large scale experiment on data vs. domain vs. process. In FSE'09, pages 91–100.
[326] T. Zimmermann, R. Premraj, and A. Zeller. Predicting defects for Eclipse. In PROMISE'07, pages 9–9.
