Automatic Program Repair Using Genetic Programming
A Dissertation Presented to the Faculty of the School of Engineering and Applied Science, University of Virginia

In Partial Fulfillment of the requirements for the Degree Doctor of Philosophy (Computer Science)

by Claire Le Goues

May 2013

© 2013 Claire Le Goues

Abstract

Software quality is an urgent problem. There are so many bugs in industrial program source code that mature software projects are known to ship with both known and unknown bugs [1], and the number of outstanding defects typically exceeds the resources available to address them [2]. This has become a pressing economic problem whose costs in the United States can be measured in the billions of dollars annually [3].

A dominant reason that software defects are so expensive is that fixing them remains a manual process. The process of identifying, triaging, reproducing, and localizing a particular bug, coupled with the task of understanding the underlying error, identifying a set of code changes that address it correctly, and then verifying those changes, costs both time [4] and money. Moreover, the cost of repairing a defect can increase by orders of magnitude as development progresses [5]. As a result, many defects, including critical security defects [6], remain unaddressed for long periods of time [7]. Moreover, humans are error-prone, and many human fixes are imperfect, in that they are either incorrect or lead to crashes, hangs, corruption, or security problems [8]. As a result, defect repair has become a major component of software maintenance, which in turn consumes up to 90% of the total lifecycle cost of a given piece of software [9].
Although considerable research attention has been paid to supporting various aspects of the manual debugging process [10, 11], and also to preempting or dynamically addressing particular classes of vulnerabilities, such as buffer overruns [12, 13], there exist virtually no previous automated solutions that address the synthesis of patches for general bugs as they are reported in real-world software. The primary contribution of this dissertation is GenProg, one of the very first automatic solutions designed to help alleviate the manual bug repair burden by automatically and generically patching bugs in deployed and legacy software. GenProg uses a novel genetic programming algorithm, guided by test cases and domain-specific operators, to effect scalable, expressive, and high quality automated repair. We present experimental evidence to substantiate our claims that GenProg can repair multiple types of bugs in multiple types of programs, and that it can repair a large proportion of the bugs that human developers address in practice (that it is expressive); that it scales to real-world system sizes (that it is scalable); and that it produces repairs that are of sufficiently high quality. Over the course of this evaluation, we contribute new benchmark sets of real bugs in real open-source software and novel experimental frameworks for quantitatively evaluating an automated repair technique. We also contribute a novel characterization of the automated repair search space, and provide analysis both of that space and of the performance and scaling behavior of our technique.

General automated software repair was unheard of in 2009. In 2013, it has its own multi-paper sessions in top-tier software engineering conferences. The research area shows no signs of slowing down. This dissertation’s description of GenProg provides a detailed report on the state of the art for early automated software repair efforts.
Approval Sheet

This dissertation is submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Science)

Claire Le Goues

This dissertation has been read and approved by the Examining Committee:

Westley R. Weimer, Advisor
Jack W. Davidson, Committee Chair
Tai Melcher
Stephanie Forrest
Anita Jones

Accepted for the School of Engineering and Applied Science:

James H. Aylor, Dean, School of Engineering and Applied Science

May 2013

“One never notices what has been done; one can only see what remains to be done.” –Marie Curie

Acknowledgments

“Scientists have calculated that the chances of something so patently absurd actually existing are millions to one. But magicians have calculated that million-to-one chances crop up nine times out of ten.” – Terry Pratchett

It is impossible for me to overstate my gratitude to Wes Weimer for teaching me everything I know about science, despite what has been charitably described as my somewhat headstrong personality. He encourages the best in me by always expecting slightly more, and I hope that he finds these results adequate. He has also been an incomparable friend.

Most graduate students are lucky to find one good adviser; I have had the tremendous fortune to find two. Stephanie Forrest taught me any number of things that Wes could not, such as the value of a simple sentence in place of a complicated one. I am very thankful for her friendship and mentorship.

I thank the members of my family for their love, guidance, support, and good examples over all 28 years, not just these six. They manage to believe in my brilliance unconditionally and rather more than I deserve without ever letting me get ahead of myself, which is a difficult balance to strike.

I have been blessed with a number of truly excellent friends who support me, challenge me, teach me, and make me laugh on a regular basis, both in and out of the office.
They show me kindness beyond what is rational, from letting me live in their homes for indeterminate periods of time to reading this document over for typos and flow, simply because I asked. I hope they know how much it has meant to me. I am equally grateful to the Dames, Crash in particular, for teaching me to be Dangerous.

Finally, I must acknowledge and thank my brilliant and loving Adam. His love is unfailing and his support ranges from the emotional to the logistical to the distinctly practical (e.g., doing of dishes, feeding of pets). I am regularly flummoxed by how lucky I am to have him in my life.

Contents

Abstract
Acknowledgments
Contents
    List of Tables
    List of Figures
List of Terms

1 Introduction
    1.1 GenProg: Automatic program repair using genetic programming
    1.2 Evaluation metrics and success criteria
        1.2.1 Expressive power
        1.2.2 Scalability
        1.2.3 Repair quality
    1.3 Contributions and outline

2 Background and Related Work
    2.1 What is a bug?
    2.2 How do developers avoid bugs?
        2.2.1 What is testing?
        2.2.2 How can testing be improved?
        2.2.3 What are formal methods for writing bug-free software?
    2.3 How are bugs identified and reported?
        2.3.1 How are bugs reported and managed?
        2.3.2 How can bugs be found automatically?
    2.4 How do humans fix bugs?
    2.5 What is metaheuristic search?
    2.6 What are automatic ways to fix bugs?
    2.7 Summary

3 GenProg: automatic program repair using genetic programming
    3.1 Illustrative example
    3.2 What constitutes a valid repair?
    3.3 Why Genetic Programming?
    3.4 Core genetic programming algorithm
        3.4.1 Program Representation
        3.4.2 Selection and population management
        3.4.3 Genetic operators
    3.5 Search space and localization
        3.5.1 Fault space
        3.5.2 Mutation space
        3.5.3 Fix space
    3.6 How are individuals evaluated for desirability?
    3.7 Minimization
    3.8 Summary and conclusions

4 GenProg is expressive, and produces high-quality patches
    4.1 Expressive power experimental setup
        4.1.1 Benchmarks
        4.1.2 Experimental parameters
    4.2 Repair Results
    4.3 Repair Descriptions
        4.3.1 nullhttpd: remote heap buffer overflow
        4.3.2 openldap: non-overflow denial of service
        4.3.3 lighttpd: remote heap buffer overflow
        4.3.4 php: integer overflow
        4.3.5 wu-ftpd: format string
    4.4 Repair