
Practical Program Repair via Bytecode Mutation

Ali Ghanbari and Lingming Zhang
The University of Texas at Dallas
{ali.ghanbari,lingming.zhang}@utdallas.edu

ABSTRACT
Software debugging is tedious, time-consuming, and even error-prone by itself. So, various automated debugging techniques have been proposed in the literature to facilitate the debugging process. Automated Program Repair (APR) is one of the most recent advances in automated debugging, and can directly produce patches for buggy programs with minimal human intervention. Although various advanced APR techniques (including those that are either search-based or semantic-based) have been proposed, the simplistic mutation-based APR technique, which simply uses pre-defined mutation operators (e.g., changing a>=b into a>b) to mutate programs for finding patches, has not yet been thoroughly studied. In this paper, we implement the first practical bytecode-level APR technique, PraPR, and present the first extensive study on fixing real-world bugs (e.g., Defects4J bugs) using bytecode mutation. The experimental results show that, surprisingly, even PraPR with only the basic traditional mutators can produce genuine patches for 18 bugs. Furthermore, with our augmented mutators, PraPR is able to produce genuine patches for 43 bugs, significantly outperforming state-of-the-art APR. It is also an order of magnitude faster, indicating a promising future for bytecode-mutation-based APR.

ACM Reference Format:
Ali Ghanbari and Lingming Zhang. 2018. Practical Program Repair via Bytecode Mutation. In Proceedings of ACM Conference (Conference'17). ACM, New York, NY, USA, 14 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

Conference'17, July 2017, Washington, DC, USA
© 2018 Association for Computing Machinery.
arXiv:1807.03512v1 [cs.SE] 10 Jul 2018

1 INTRODUCTION
Software systems are ubiquitous in today's world; most of our activities, and sometimes even our lives, depend on systems controlled by software. Unfortunately, software systems are not perfect and often come with bugs. Software debugging is a difficult activity that consumes over 50% of the development time and effort [7], and it costs the global economy billions of dollars [15]. To date, a huge body of research has been dedicated to automated debugging, to automatically localize [9, 12, 13, 37, 38, 61, 69, 75, 76] or fix [16, 18, 19, 22, 23, 28, 36, 41-44, 47, 48, 51, 55, 56, 62, 64, 67, 68, 73] software bugs. Among the various automated debugging techniques, Automated Program Repair (APR) techniques aim to directly fix software bugs with minimal human intervention. These techniques can significantly speed up the debugging process by either synthesizing genuine patches (i.e., patches semantically equivalent to developer patches) or suggesting patches that may guide the developers to fix the bugs faster. Thus, APR has been the subject of intense research in spite of being a young research area.

Based on the actions they take for fixing a bug, state-of-the-art APR techniques can be divided into two broad categories: (1) techniques that monitor the dynamic execution of a program to find deviations from certain specifications, and then heal the program by modifying its runtime state in case of any abnormal behavior [43, 56]; (2) so-called generate-and-validate techniques that modify the code representation of the programs based on various rules/techniques, and then use either test cases or some formal specification (such as code contracts) as an oracle to validate each generated candidate patch, and find plausible patches (i.e., the patches that can pass all the tests/checks) [16, 18, 19, 23, 28, 36, 41, 42, 47, 51, 55, 62, 68, 73]. Among these, generate-and-validate techniques, especially those that are based on test cases, have gained popularity, as testing is the prevalent method for detecting software bugs, while very few software systems are based on rigorous, formal specifications.

It is worth noting that, lately, multiple APR research papers get published in top-notch Software Engineering conferences and journals each year, introducing various delicately designed and/or implemented APR techniques. With such state-of-the-art APR techniques, more and more real bugs can be fixed fully automatically; e.g., the most recent APR technique published in ICSE'18 [68] has been reported to produce genuine patches for 22 bugs of Defects4J (a set of real-world Java programs widely used for evaluating APR techniques [31]). Despite the success of recent APR techniques, as also highlighted in a recent survey paper [22], currently we have a scattered collection of findings and innovations with no thorough evaluation of the techniques or any clear relationship between them. Actually, it is not even clear how the existing simplistic mutation-based APR [19], which generates program patches simply based on a limited set of mutators (also called mutation operators) [10], compares to the state-of-the-art APR techniques.

Therefore, in this work, we present the first extensive study of a simplistic program repair approach that applies mutation-like patch generation on the widely used Defects4J programs. More specifically, we build a practical APR tool named PraPR (Practical Program Repair) based on JVM bytecode mutation. We stress that, although simplistic, PraPR offers various benefits over the state-of-the-art techniques. First, PraPR is the first bytecode-level APR technique for Java, and all the generated patches can be directly validated without compilation, while existing techniques [16, 18, 19, 23, 28, 36, 41, 42, 47, 51, 55, 62, 68, 73] have to compile each candidate patch before validating it¹.
Even though some techniques curtail compilation overhead by encoding a group of patches inside a single meta-program, it can still take up to 37 hours to fix a Defects4J program due to numerous patch compilations and class loadings [16]. Second, bytecode-level repair avoids messing up the source code in unexpected ways, and can even be applicable for fixing code without source code information, e.g., buggy 3rd-party libraries that do not have official patches yet. Third, manipulating programs at the level of JVM bytecode [39] makes PraPR independent of the syntax of a specific target language, and makes it applicable to fix programs written in other JVM-based languages (notably Kotlin [27], Scala [52], and even Groovy [1]). Lastly, PraPR does not require complex patching rules [36, 41, 73], complicated computations such as symbolic execution and equation solving [16, 47, 51], or any training/mining [59, 68, 72], making it directly applicable to real-world programs and easily adoptable as the baseline for future APR techniques.

¹ Please note that compilation can be quite expensive, especially for type-safe programming languages.

We have applied PraPR to fix all the 395 bugs available in Defects4J. Surprisingly, even the basic traditional mutators can already produce plausible and genuine patches for 113 and 18 bugs, respectively. With both the traditional and our augmented simple mutators (e.g., replacing field accesses or method invocations), PraPR successfully produces genuine patches for 43 Defects4J bugs, thereby significantly outperforming the state-of-the-art APR techniques (e.g., the most recent CapGen technique published in ICSE'18 [68] can only fix 22 bugs). In addition, thanks to the bytecode-level manipulation, PraPR with only single-thread execution is already 26.1X faster than the state-of-the-art CapGen in terms of per-patch time cost. Even compared with the recent technique JAID [16], which reduces compilation overhead via grouping patches in meta-programs, PraPR is still an order of magnitude faster (i.e., 15.7X).

In summary, this paper makes the following contributions:
• Technique. We implement a simplistic, yet practically effective, program repair technique via bytecode mutation, PraPR, for JVM-based programming languages.
• Study. We perform the first extensive study of our PraPR technique on all the 395 real bugs from the Defects4J benchmark.
• Results. Our study demonstrates that PraPR with only basic traditional mutators can successfully fix 18 bugs from Defects4J, while using our augmented mutators it can significantly outperform the state-of-the-art techniques—fixing 43 bugs while being an order of magnitude faster. Our study also shows that even the non-genuine plausible fixes can help with manual debugging by giving high-quality fixing hints (e.g., precisely localizing the bug or even partially fixing it) to the debuggers.
• Guidelines. Our findings demonstrate for the first time that the simple idea of mutation-based program repair can greatly complement the state-of-the-art APR techniques in at least three aspects (namely effectiveness, efficiency, and applicability), and can inspire more work to advance APR in this direction.

2 RELATED WORK
In this section, we introduce the background on mutation testing (§2.1) and generate-and-validate automated program repair (§2.2).

2.1 Mutation Testing
Mutation testing [10] is a powerful method for assessing the quality of a given test suite in detecting potential software bugs. Mutation testing measures test suite quality by injecting "artificial bugs" into the subject program through mutating it. The basic intuition is that the more artificial bugs a test suite can detect, the more likely it is to detect potential real bugs, and hence the higher the quality of the test suite [11, 32]. Central to mutation testing is the notion of a mutation operator, aka mutator, which is used to generate artificial bugs that mimic real bugs. Applying a mutator to a program results in a mutant (or mutation) of the program—a variant of the program that differs from the original program only in the injected artificial bug, e.g., replacing a+b with a-b in one statement. In first-order mutation testing [29], the mutators can be treated as program transformers that change the meaning of the input program by making small, single-pointed changes to it. This implies that the resulting mutants should also be syntactically valid, and also typeable, for a mutant has to simulate a valid buggy version of the program under test. This also indicates that the mutators are highly dependent on the target programming language. Therefore, given an original program P, mutation testing will generate a set of program variants, M, where each element is a valid mutant m.

Given the original program P and a mutant m ∈ M of the program, a test suite T is said to kill mutant m if and only if there exists at least one test case t ∈ T such that the observable final state of P on t differs from that of m on t, i.e., ⟦P⟧t ≠ ⟦m⟧t. Similarly, a mutant is said to survive if no test case in T can kill it. It might be the case that some of the surviving mutants are (semantically) equivalent to the original program, hence the name equivalent mutants. Apparently, no test case can ever kill an equivalent mutant. By having the number of killed and equivalent mutants for a given test suite T, one may easily compute a mutation score to evaluate the quality of T, i.e., the ratio of killed mutants to all non-equivalent mutants (MS = |M_killed| / (|M| − |M_equivalent|)). Besides its original application in test suite evaluation, recently mutation testing has also been widely applied in various other areas, such as simulating real bugs for software-testing experiments [11, 32], automated test generation [54, 74], fault localization [38, 49, 53], and even automated program repair [19]. When using mutation testing for program repair, the input is a buggy program P and its corresponding test suite T with failed tests due to the bug(s). The output will be a specific mutant m ∈ M that can pass all the tests within T. Such resulting mutants can be potential patches for the original buggy program P.

2.2 Generate-and-Validate Program Repair
A bug is a fault or flaw in a program that causes it to produce an incorrect (unexpected) output. Automatic program repair (APR) [16, 18, 19, 22, 23, 28, 36, 41-44, 47, 48, 51, 55, 56, 62, 64, 67, 68, 73] refers to a mostly automated process that localizes such bug(s) inside a buggy program and edits the program such that it produces correct (expected) outputs. Modern generate-and-validate APR techniques usually first utilize existing fault localization [9, 12, 69] techniques to identify the most suspicious program elements, and then systematically change, insert, or delete various suspicious code elements to search for a new program variant that can produce expected outputs.

In practice, test cases play a central role in both localizing the bugs and also checking whether a program variant behaves as expected—i.e., test cases are also used as fix oracles. Fault localization techniques use the information obtained from both failing and passing test cases to compute degrees of suspiciousness for each element of the program. For example, so-called spectrum-based fault localization techniques [69], which identify the program elements covered by more failed tests and fewer passed tests as more suspicious, have been widely adopted by various APR techniques [22, 46, 48]. Modifying a buggy program results in various candidate patches that can be verified using the available test suite. A candidate patch that can pass all the failing and passing tests within the original test suite is called a plausible patch, while a patch that not only passes all the tests in the original test suite but is also semantically equivalent to the corresponding developer-written patch is called a genuine patch (or correct patch) in the literature.

Note that, due to the so-called APR overfitting problem [22, 48, 58], not all plausible patches might be considered genuine patches. Overfitting is a principal problem with generate-and-validate APR techniques because of their dependence on test suites to verify patches. In practice, test suites are usually not perfect, in that a patch that passes all the test cases in the test suite may not generalize to other potential tests of the program. Because of this problem, various advanced APR techniques [16, 47, 51, 70] have been proposed to mitigate the overfitting problem.

Based on different hypotheses, state-of-the-art generate-and-validate APR tools use a variety of techniques to generate or synthesize patches. Some of them are based on the hypothesis that most bugs could be solved by searching through all the potential candidate patches based on certain patching rules [19, 36], hence the name search-based APR. Alternatively, semantic-based techniques use deeper semantic analyses (such as symbolic execution) to synthesize conditions, or even more complex code snippets, that can pass all the test cases [47, 51, 73]. There are also various other studies on APR techniques: while some studies show that generating patches just by deleting the original software functionality [57, 58] can be effective, other studies [36, 68] demonstrate that fix ingredients could be adopted from somewhere in the buggy program itself, or even some other programs, based on the so-called plastic surgery hypothesis [14].

As discussed in §2.1, mutation testing has also been applied for APR [19]. The hypothesis behind mutation-based APR is that "if the mutators mimic programmer errors, mutating a defective program can, therefore, fix it." Several studies have been performed on mutation-based program repair [19, 46, 57], demonstrating its feasibility. However, the existing studies either concern mutation-based APR on a set of small programs (e.g., from the Siemens Suite [8]) with artificial bugs [19] or apply only a limited set of mutators [46]. For example, the most recent study [46] on mutation-based APR with 3 mutators shows that it can only fix 4 bugs (despite 17 plausible patches) of the widely used Defects4J benchmark. Furthermore, all the existing studies [19, 46, 57] apply mutation testing at the source code level, which can incur substantial compilation/class-loading overhead and is language-dependent. In this work, we perform an extensive study on mutation-based APR using state-of-the-art mutators and efficient bytecode-level mutation. Working at the level of JVM bytecode immediately obviates the need for compilation, and enables us to explore much larger search spaces in reasonable time on commodity hardware, thereby shedding light on the practicality of this simplistic approach. Bytecode-level APR is also beneficial in that it makes our tool easily applicable to other popular JVM-based programming languages, such as Kotlin, Groovy, and Scala.

3 PRAPR
In this section, we first present the overall approach of PraPR (§3.1), i.e., applying mutation testing to perform automated program repair. Then, we discuss the design of the mutators (§3.2), which make up the core of PraPR. Both our overall approach and mutator design are simplistic (e.g., without any mining or learning of information), so that readers can easily reproduce our experimental results and further build on top of PraPR.

3.1 Overall Approach
The overall approach of PraPR is presented in Algorithm 1.

Algorithm 1: PraPR
  Input: Original buggy program P, failing tests Tf, passing tests Tp
  Output: Plausible patch set P✓
   1  begin
   2    L ← FaultLocalization(P)        // Fault localization
   3    𝒫 ← MutGen(P, L)                // Candidate patch generation
        /* Perform validation for each candidate patch */
   4    for P′ ∈ 𝒫 do
   5      falsified ← False             // Whether the patch is falsified
   6      T′ ← Cover(Diff(P′, P))
   7      if ¬(T′ ⊇ Tf) then continue
          /* Check if originally failed tests still fail */
   8      for t ∈ Tf do
   9        if ⟦P′⟧t = failing then
  10          falsified ← True
  11          break                     // Abort current patch validation
  12      if falsified = True then continue
          /* Check if any originally passed test fails */
  13      for t ∈ Tp ∩ T′ do
  14        if ⟦P′⟧t = failing then
  15          falsified ← True
  16          break                     // Abort current patch validation
  17      if falsified = False then
  18        P✓ ← P✓ ∪ {P′}              // Store the current plausible patch
  19    return P✓                       // Return the resulting patch set

The algorithm inputs are the original buggy program P and its test suite T that can detect the bug(s). For ease of illustration, we represent the passing and failing tests in the test suite as Tp and Tf, respectively. The algorithm output is P✓, a set of plausible patches that can pass all the tests in T; the developers can further inspect P✓ to check if there is any genuine patch. As shown in the algorithm, Line 2 first computes and ranks the suspicious program locations L using off-the-shelf fault localization techniques (e.g., Ochiai [9] for this work). Line 3 then exhaustively generates candidate patches 𝒫 for all suspicious locations (i.e., the locations executed by any failed test) using our mutators presented in §3.2. Following prior APR work [16, 46, 68], patches modifying the more suspicious program locations obtain a higher rank. Then, Lines 4 to 18 iterate through each candidate patch to find potentially plausible patches.

To ensure efficient patch validation, following prior work [46, 68], each candidate patch is firstly executed against the failed tests (Lines 8-11), and will only be executed against the remaining tests once it passes all the originally failed tests. The reason is that the originally failed tests are more likely to fail again on candidate patches, whereas the patches failing any test are already falsified, and do not need to be executed against the remaining tests for the sake of efficiency. Furthermore, we also apply two additional optimizations.

A sample PraPR fix report (for Lang 10 from Defects4J) looks as follows:

  PraPR (JDK 1.7) Fix Report - Sat Jun 23 08:25:46 CDT 2018
  Number of Plausible Fixes: 2
  Total Number of Patches: 1289
  ======
  1. Mutator = METHOD CALL (removed call to java/lang/Character::isWhitespace,
     and supplied default return value false),
     File Name = org/apache/commons/lang3/time/FastDateParser.java,
     Line Number = 307.
  ------
  2. Mutator = CONDITIONAL (removed conditional - replaced equality check with false),
     File Name = org/apache/commons/lang3/time/FastDateParser.java,
     Line Number = 307.

[The report then lists the contents of org/apache/commons/lang3/time/FastDateParser.java, from the source directory of Lang 10, around line 305; the listing is truncated here.]
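The failing-tests-first validation order described above can be sketched as follows. This is a toy model—patches are integer IDs and tests are predicates over a patch; none of these names come from PraPR's actual implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class PatchValidationDemo {
    // A test is modeled as a predicate over a candidate patch:
    // true means the test passes on the patched program.
    static List<Integer> validate(List<Integer> candidates,
                                  List<Predicate<Integer>> failingTests,
                                  List<Predicate<Integer>> passingTests) {
        List<Integer> plausible = new ArrayList<>();
        for (Integer patch : candidates) {
            boolean falsified = false;
            // Run the originally failing tests first: they are the most
            // likely to falsify a candidate patch cheaply.
            for (Predicate<Integer> t : failingTests) {
                if (!t.test(patch)) { falsified = true; break; }
            }
            if (falsified) continue;
            // Only now pay for the (usually much larger) passing suite.
            for (Predicate<Integer> t : passingTests) {
                if (!t.test(patch)) { falsified = true; break; }
            }
            if (!falsified) plausible.add(patch); // a plausible patch
        }
        return plausible;
    }

    public static void main(String[] args) {
        // Toy setup: patch 42 passes everything; patch 7 still fails
        // the originally failing test and is discarded immediately.
        List<Predicate<Integer>> failing = List.of(p -> p == 42);
        List<Predicate<Integer>> passing = List.of(p -> p > 0);
        System.out.println(validate(List.of(7, 42), failing, passing));
    }
}
```

The early `continue` after the failing-test loop is the whole point: a patch falsified by a formerly failing test never touches the passing suite, which is where most of the validation cost lives.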

Table 1: Supported Mutators (each rule has the form p ⊢ e ↪ e′; notation is explained below)

AP (ARGUMENT PROPAGATION):
  τ(ei) ⪯ τ(e0.m(e1,...,en)), 0 ≤ i ≤ n ⊢ e0.m(e1,...,en) ↪ ei
RV (RETURN VALUE):
  τ(e) = boolean ⊢ return e ↪ return (e == false ? true : false)
  τ(e) = int, e′ ∈ {0, (e)+1, (e == 0 ? 1 : 0)} ⊢ return e ↪ return e′
  τ(e) = Object, e′ ∈ {null, (e == null ? fail : e)} ⊢ return e ↪ return e′
CC (CONSTRUCTOR CALL):
  ⊢ new c() ↪ null
IS (INCREMENTS):
  ⋆,⋆′ ∈ {++, --}, ⋆ ≠ ⋆′, e ∈ {var, var⋆′} ⊢ var⋆ ↪ e
  ⋆,⋆′ ∈ {++, --}, ⋆ ≠ ⋆′, e ∈ {var, ⋆′var} ⊢ ⋆var ↪ e
IC (INLINE CONSTANTS):
  n′ ∈ {0, 1, -1, (n + 1), (n - 1), (-n)} ⊢ n ↪ n′
MV (MEMBER VARIABLE):
  τ(e1.fd) = int ⊢ e1.fd = e2 ↪ e1.fd = 0
  τ(e1.fd) = boolean ⊢ e1.fd = e2 ↪ e1.fd = false
  τ(e1.fd) ⪯ Object ⊢ e1.fd = e2 ↪ e1.fd = null
SW (SWITCH):
  ⊢ switch(e) case ct1: e1 ... case ctn: en default: ed ↪ switch(e) case ct1: ed ... case ctn: ed default: e1
  1 ≤ i ≤ n ⊢ switch(e) ... case cti: ei ... default: ed ↪ switch(e) ... case cti: ed ... default: ed
MC (METHOD CALL):
  τ(e.md(e1,...,en)) = boolean ⊢ e.md(e1,...,en) ↪ false
  τ(e.md(e1,...,en)) = int ⊢ e.md(e1,...,en) ↪ 0
  τ(e.md(e1,...,en)) ⪯ Object ⊢ e.md(e1,...,en) ↪ null
IN (INVERT NEGATIVES):
  τ(var) = int ⊢ -var ↪ var
AO (ARITHMETIC OPERATOR):
  ⋆,⋆′ ∈ {+, -, *, /, %, >>, >>>, <<, &, |, ^}, ⋆ ≠ ⋆′ ⊢ e1 ⋆ e2 ↪ e1 ⋆′ e2
  ⋆ ∈ {+, -, *, /, %, >>, >>>, <<, &, |, ^} ⊢ e1 ⋆ e2 ↪ e1
  ⋆ ∈ {+, -, *, /, %, >>, >>>, <<, &, |, ^} ⊢ e1 ⋆ e2 ↪ e2
CO (CONDITIONAL):
  e′ ∈ {true, false, !e}, τ(e) = boolean ⊢ if(e) ↪ if(e′)
  ⋆,⋆′ ∈ {≤, ≥, <, >, ==, !=}, ⋆ ≠ ⋆′ ⊢ e1 ⋆ e2 ↪ e1 ⋆′ e2
DG (DEREFERENCE GUARD):
  t md(...){...e.fd...}, defVal(t) = v ⊢ e.fd ↪ (e == null ? return v : e.fd)
  t md(...){...e.fd...}, τ(var) = t ⊢ e.fd ↪ (e == null ? return var : e.fd)
  t md(...){...e.fd1...}, τ(this.fd2) = t ⊢ e.fd1 ↪ (e == null ? return this.fd2 : e.fd1)
  τ(e.fd) = t, defVal(t) = v ⊢ e.fd ↪ (e == null ? v : e.fd)
  τ(e.fd) = τ(var) ⊢ e.fd ↪ (e == null ? var : e.fd)
  τ(e.fd1) = τ(this.fd2) ⊢ e.fd1 ↪ (e == null ? this.fd2 : e.fd1)
MG (METHOD GUARD):
  τ(e.md(e1,...,en)) = t, defVal(t) = v ⊢ e.md(e1,...,en) ↪ (e == null ? return v : e.md(e1,...,en))
  τ(e.md(e1,...,en)) = τ(var) ⊢ e.md(e1,...,en) ↪ (e == null ? return var : e.md(e1,...,en))
  τ(e.md(e1,...,en)) = τ(this.fd) ⊢ e.md(e1,...,en) ↪ (e == null ? return this.fd : e.md(e1,...,en))
  τ(e.md(e1,...,en)) = t, defVal(t) = v ⊢ e.md(e1,...,en) ↪ (e == null ? v : e.md(e1,...,en))
  τ(e.md(e1,...,en)) = τ(var) ⊢ e.md(e1,...,en) ↪ (e == null ? var : e.md(e1,...,en))
  τ(e.md(e1,...,en)) = τ(this.fd) ⊢ e.md(e1,...,en) ↪ (e == null ? this.fd : e.md(e1,...,en))
PC (PRE/POST-CONDITION):
  e′1,...,e′m ∈ {ei | ti ⪯ Object ∧ 0 ≤ i ≤ n}, defVal(t) = v ⊢ t md(t1 e1,...,tn en){e} ↪ t md(t1 e1,...,tn en){(e′1==null || ... || e′m==null) ? return v : e}
  t md(...){...e.md(e1,...,en)...}, τ(e.md(e1,...,en)) ⪯ Object, defVal(t) = v ⊢ e.md(e1,...,en) ↪ (e.md(e1,...,en)==null ? return v : e.md(e1,...,en))
  t md(...){...e.md(e1,...,en)...}, τ(e.md(e1,...,en)) ⪯ Object, τ(var) = t ⊢ e.md(e1,...,en) ↪ (e.md(e1,...,en)==null ? return var : e.md(e1,...,en))
  t md(...){...e.md(e1,...,en)...}, τ(e.md(e1,...,en)) ⪯ Object, τ(this.fd) = t ⊢ e.md(e1,...,en) ↪ (e.md(e1,...,en)==null ? return this.fd : e.md(e1,...,en))
FN (FIELD NAME):
  fd1 ≠ fd2, τ(e.fd1) = τ(e.fd2) ⊢ e.fd1 ↪ e.fd2
MN (METHOD NAME):
  md ≠ md′, τ(md) = τ(md′) ⊢ e.md(e1,...,en) ↪ e.md′(e1,...,en)
AL (ARGUMENT LIST):
  e′i ∈ {e1,...,en} ∪ {var | ∃ei. τ(var) = τ(ei)} ∪ {this.fd | ∃ei. τ(this.fd) = τ(ei)} ∪ {0, false, null} ⊢ e.md(e1,...,en) ↪ e.md(e′1,...,e′m)
LV (LOCAL VARIABLE):
  var1 ≠ var2, τ(var1) = τ(var2) ⊢ var1 ↪ var2
  τ(var) = τ(this.fd) ⊢ var ↪ this.fd
AM (ACCESSOR):
  τ(e.fd) = τ(e.md()) ⊢ e.fd ↪ e.md()
  τ(e2) = t, τ(md) = tr md(t) ⊢ e1.fd=e2 ↪ e1.md(e2)
  τ(e.fd) = τ(var) ⊢ e.fd ↪ var
CB (CASE BREAKER):
  t md(...){...switch(...) ... case cti: ei ... default: ed ...}, defVal(t) = v ⊢ case cti: ei ↪ case cti: (let temp=ei in return v)
  t md(...){...switch(...) ... case cti: ei ... default: ed ...}, τ(var) = t ⊢ case cti: ei ↪ case cti: (let temp=ei in return var)
  t md(...){...switch(...) ... case cti: ei ... default: ed ...}, τ(this.fd) = t ⊢ case cti: ei ↪ case cti: (let temp=ei in return this.fd)

Table 1 presents the mutators used in this work, including all the official PIT mutators. Note further that we do not present the mutators involving some datatypes (e.g., float and double) due to the ClassicJava syntax. In the table, the white block presents all the mutators directly supported by PIT. The part highlighted with light gray presents the mutators that are partially supported by PIT, and are further augmented in PraPR to support more cases. For example, although original PIT supports negating a conditional statement, or changing > into ≥ and vice versa, it does not support changing a relational operator into any other one [34], e.g., > into ==, which is specified in the traditional mutation testing literature [30]. Therefore, we augment PIT's original mutators MathMutator, ConditionalsBoundaryMutator, and NegateConditionalsMutator following a recent study on PIT [34]. Finally, the dark gray part of the table presents all the mutators that are particular to PraPR. Note that all the augmented PraPR mutators are defined based on the syntax of ClassicJava to handle potential bugs at the expression level (since PraPR currently only supports single-edit patches). Specifically, the new DEREFERENCE GUARD, METHOD GUARD, and PRE/POST-CONDITION mutators add null-checks for object instances to handle condition-related bugs, the new CASE BREAKER mutator handles switch-related bugs, while the other new mutators simply replace field accesses, local variables, and method invocations with other type-compatible ones within the same class (i.e., the mutated class), based on the plastic surgery hypothesis [14], for handling bugs by taking advantage of other expressions.

In Table 1, each rule is represented in the form p ⊢ e ↪ e′, which denotes that when the premise p holds, a candidate patch can be generated via mutating a single instance of the expression e into an expression e′ (note that all the other portions of the input program remain unchanged). In the case of no premises, p is omitted in the rule, e.g., as in ⊢ e ↪ e′. In addition, the polymorphic operator τ(·) computes typing information if the input is an expression, and returns a type descriptor (i.e., the parameter types and return types according to the JVM specification [5]) when the input is the fully qualified name of a method. The operator defVal(·), given a type name, returns the default value corresponding to the type as described in the JVM specification [5]. τ1 ⪯ τ2 denotes that type τ1 is a subtype of τ2. For example, the first mutator specifies that when the type of parameter ei is a subtype of the return type of e0.m(e1,...,en), the method invocation can be directly mutated into its parameter ei. To make the definitions easier to follow, in Table 2 we further present concrete examples of some of the mutators. In what follows, we discuss the design challenges and rationale for each of our mutators.

ARITHMETIC OPERATOR (AO) PIT's original mutator MathMutator simply replaces each arithmetic operator with another one (e.g., + is only mutated to *). We further augment it to mutate every arithmetic operator into each of the other compatible operators. Furthermore, following a recent study [34], we augment ARITHMETIC OPERATOR to implement Arithmetic Operation Deletion [10] (AOD), which deletes one operand of a binary operator (e.g., a+b to a or b). AOD is achieved by deleting the instruction corresponding to the arithmetic operator, but before simply deleting the instruction we need to prepare the JVM stack by keeping only one of the operands. This is achieved by either popping the second operand off the stack, or first swapping the two operands and then popping the first operand off the stack. We stress that popping and swapping are done differently based on the types of the operands. For single-word operands (such as ints, references, and floats) we simply use POP and SWAP instructions, while for double-word operands (such as doubles and longs) we use POP2 and DUP2_X2; POP2 instructions, respectively.

CONDITIONAL (CO) A large part of this mutator is already implemented by the existing mutators of PIT, namely NegateConditionals and ConditionalsBoundary. There are a few cases (e.g., replacing < with !=) that are not addressed and prevent us from achieving the full effect of a more general mutator, traditionally known as Relational Operator Replacement [10]. This is achievable by simply defining a helper mutator that does the missing replacements.

DEREFERENCE GUARD (DG) This mutator mutates field dereference sites so as to inject code checking whether the base expression is null at a given site. The mutator is intended to prevent NullPointerExceptions, which are considered one of the most common bugs in Java programs [25, 26]. If the base expression is non-null, the injected code does nothing; otherwise it does one of the following: (1) returns the default value corresponding to the return type of the mutated method; (2) returns the value of a local variable visible at the mutation point whose type is compatible with the return type of the mutated method; (3) returns the value of a field whose type is compatible with the return type of the mutated method; (4) uses the default value corresponding to the type of the field being dereferenced instead of the field dereference expression; (5) uses a local variable visible at the mutation point whose type is compatible with that of the field being dereferenced instead of the field dereference expression; (6) uses a field whose type is compatible with that of the field being dereferenced instead of the field dereference expression. The JVM instructions below illustrate the general form of the checking code injected for case (2); the code is injected right before a GETFIELD instruction, where n is the index of a visible local variable to be returned, while x, depending on the type of the field being dereferenced, could be I (for int), L (for long), and so on.

      DUP
      IFNONNULL escape
      xLOAD n
      xRETURN
  escape:
      GETFIELD ...
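At the source level, the effect of a dereference-guard mutant is easy to see. The toy class below is our own illustration of case (4) above (substituting the default value of the field's type for the dereference when the base expression is null); PraPR performs the equivalent transformation directly on bytecode:

```java
public class DereferenceGuardDemo {
    static class Box { int f = 5; }

    // Original (potentially buggy) dereference: box.f throws a
    // NullPointerException when box is null.
    static int unguarded(Box box) {
        return box.f;
    }

    // Source-level view of the DG mutant for case (4): the dereference is
    // replaced by (box == null ? <default value of int, i.e. 0> : box.f).
    static int guarded(Box box) {
        return box == null ? 0 : box.f;
    }

    public static void main(String[] args) {
        System.out.println(guarded(new Box())); // non-null: guard is a no-op
        System.out.println(guarded(null));      // null: NPE avoided, default used
    }
}
```

If a test only fails because of such a spurious NullPointerException, this single-edit mutant can turn the failing run into a passing one, which is exactly how a guard mutant becomes a plausible patch.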
METHOD GUARD (MG) This mutator targets virtual method invocation instructions and adds a check similar to that of DG. Like DG, this mutator is intended to avoid NullPointerExceptions.

PRE/POST-CONDITION (PC) As the name suggests, this mutator is intended to add nullness checks for the object-typed parameters of a method and for what the method returns (provided that it is a subtype of Object), to avoid NullPointerExceptions. The intuition is that null values are returned in cases of failure, and object-typed parameters are usually expected to point to genuine objects rather than null [24].

FIELD NAME (FN) This mutator mutates all field access instructions, namely GETFIELD, PUTFIELD, GETSTATIC, and PUTSTATIC. Upon visiting a field access instruction, the mutator loads the owner class of the field to extract all the information about its fields. The mutator then selects a different visible field (e.g., among public fields) whose type is compatible with that of the current one.
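The field-selection step just described can be sketched with Java reflection. This is a simplified, source-level illustration only: PraPR itself operates on bytecode, and the `Owner` class and method names below are made up for the example. A candidate replacement field must have a different name, the same type, and (as discussed next) the same static-ness as the original field:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class FieldSelectionDemo {
    public static class Owner {
        public int count;
        public int total;         // type-compatible alternative for count
        public static int shared; // static: not a candidate for instance field count
        public String name;       // wrong type: not a candidate
    }

    // Names of fields that could replace `fieldName` in a FIELD NAME mutant:
    // different name, identical type, and static iff the original is static.
    static List<String> alternatives(Class<?> owner, String fieldName) {
        try {
            Field current = owner.getField(fieldName);
            List<String> result = new ArrayList<>();
            for (Field f : owner.getFields()) {
                boolean sameStaticness = Modifier.isStatic(f.getModifiers())
                        == Modifier.isStatic(current.getModifiers());
                if (!f.getName().equals(current.getName())
                        && f.getType().equals(current.getType())
                        && sameStaticness) {
                    result.add(f.getName());
                }
            }
            return result;
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(alternatives(Owner.class, "count"));
    }
}
```

For `Owner.count`, only `total` qualifies: `shared` is rejected for being static and `name` for having an incompatible type.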
It is worth operand off the stack or first swapping the two operands andthen noting that the newly selected field should be static if and only popping the first operand off the stack. We stress that popping and Practical Program Repair via Bytecode Mutation Conference’17, July 2017, Washington, DC, USA if the current field is declared to be static. Finally, the field ac- ID Illustration cess instruction is mutated such that it now accesses the new field. AP y=obj.m(x),→y=x This mutator is intended to compensate the programmer errors in RV return x,→return x+1 selecting fields with similar names. CC Object obj = new Object(),→Object obj = null METHOD NAME (MN) This mutator targets all kinds of method IS x++,→x-- static virtual invocation instructions (whether it is or ). The op- IC int x=0 ,→int x=1 erational details and also the rationale of this mutator is similar MV private int x = 1,→private int x = 0 to FN. Upon mutating a method invocation, the mutator selects SW case 1: s1; default:s2;,→case 1: s2; default:s1; another method with a different name but with the same type MC int y=obj.m(x),→int y=0 descriptor to replace the original one. It avoids nonsensical muta- IN return -x,→return x tions; in particular, it does not mutate constructor calls (as no other AO z=x+y,→z=x-y method with a different name can result in a legal program), and CO if(x>y),→if(x>=y) also does not replace a call to a method with a call to a constructor DG int x=obj.f,→int x=(obj=null?0:obj.f) or a class initializer [5]. MG int y=obj.m(x),→int y=(obj=null?0:obj.m(x)) ARGUMENT LIST (AL) Unlike MN, this mutator mutates a call PC int m(T n){this.map.get(n)} site to call another method with the same name, and compati- ,→int m(T n){n==null?return 0:this.map.get(n)} ble return type, but with different parameter types (i.e., another FN int x=obj.f1,→int x=obj.f2 overload of the callee). 
Similar to MN, this mutator is intended to MN obj.m1(x),→obj.m2(x) mitigate programmer mistakes in choosing the right overload. We AL obj.m(x,y),→obj.m(x) take advantage of the utility shipped with ASM bytecode LV int x=y,→int x=z manipulation framework [3] to create temporary local variables so AM int x=obj.f,→int x=obj.getF() as to store the old argument values. This mutator uses a variant of Table 2: Mutator Illustration Levenshtein’s edit distance algorithm [6] to find the minimal set of operations needed for reordering these local variables or using 4 EXPERIMENTAL SETUP some other value (such as the default value corresponding to the 4.1 Research Questions type of a given parameter, a local variable visible at the call site, or a field of appropriate type) in order to prepare the stack before Our study investigates the following research questions: calling newly selected method. LOCAL VARIABLE (LV) This mutator replaces the definition or RQ1 How does PraPR perform in terms of effectiveness on auto- use of a variable with the definition or use of another (visible) vari- matically fixing real bugs? able, or a field, with the same type. Obtaining the set of visible RQ2 How does PraPR perform in terms of efficiency on automati- variables at the mutation point is the most challenging part of im- cally fixing real bugs? plementing this mutator. We compute the set of visible variables at RQ3 How does PraPR compare with state-of-the-art automated each point of the method under mutation using a simple dataflow program repair techniques? analysis [50], before doing the actual mutation. We need this muta- tor for a reason similar to why we need FN. Note that sometimes we also need to replace the access to a local variable with an access to a field. Modern IDEs such as Eclipse and IDEA do a good jobin 4.2 Subject Systems distinguishing these two accesses by using different colors for each. We conduct our experimentation on the Defects4J [31] benchmark. 
But, in general, this kind of mistakes are unavoidable. Defects4J is a collection of six real-world programs from GitHub ACCESSOR (AM) This mutator replaces a read access to a field with known, reproducible real bugs. These programs are real-world with either a call to a method (with no parameter and a return type projects developed over an extended period of time, so they contain compatible with that of the accessed field) or a type-compatible local a variety of programming idioms and are a good representative of variable visible at the point of mutation. Furthermore, it replaces a those programs found randomly in the wild. Therefore, Defects4J write access to a field with a method that accepts an argument with programs are suitable for evaluating the effectiveness of candidate the type compatible with that of the field that has been written on. program repair techniques. Furthermore, Defects4J programs are Besides avoiding programmer mistakes in choosing between field extensively used in peer-reviewed research work [16, 35, 46, 59, 68, and local variable access, this mutator is principally intended to 72] for program repair thereby enabling us to compare our results avoid datarace bugs that are common in concurrent programs due with those related work. Table 3 lists the detailed statistics about to incorrect synchronization between different threads [66]. the Defects4J programs. In the table, Column “ID” presents the CASE BREAKER (CB) This mutator is intended to mitigate the identifiers to represent each benchmark program. Column “Name” unintended fall-through of in switch-case statements presents the original names for the Defects4J programs. Column due to missing break or return statements. The mutator injects “#Bugs” presents the number of bugs for each program. Finally, appropriate return statement at the end of each case clause. 
Columns “#Tests” and “LoC” present the number of tests (i.e., JUnit test methods) and the lines of code for the HEAD buggy version (the most recent version) of each program. Conference’17, July 2017, Washington, DC, USA Ali Ghanbari and Lingming Zhang
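To make the bytecode-level mutation concrete, the following self-contained sketch (not PraPR's actual implementation, which performs the rewrite inside ASM class/method visitors) enumerates AO-style mutants over an array of JVM opcodes; the opcode values are the standard ones from the JVM specification, and the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch of the augmented AO mutator: every int-arithmetic
// opcode is replaced, in turn, with each of the other compatible opcodes.
public class AoMutatorSketch {
    // Standard JVM opcode values for int arithmetic (JVM specification).
    static final int IADD = 0x60, ISUB = 0x64, IMUL = 0x68, IDIV = 0x6C, IREM = 0x70;
    static final int[] INT_ARITH = {IADD, ISUB, IMUL, IDIV, IREM};

    /** Returns one mutated copy of {@code method} per single-point replacement. */
    static List<int[]> mutants(int[] method) {
        List<int[]> result = new ArrayList<>();
        for (int i = 0; i < method.length; i++) {
            if (!isIntArith(method[i])) continue;
            for (int op : INT_ARITH) {
                if (op != method[i]) {
                    int[] copy = method.clone();
                    copy[i] = op;          // the single-point mutation
                    result.add(copy);
                }
            }
        }
        return result;
    }

    static boolean isIntArith(int opcode) {
        for (int op : INT_ARITH) if (op == opcode) return true;
        return false;
    }
}
```

For a method body containing a single IADD, the sketch yields four mutants (ISUB, IMUL, IDIV, IREM), matching the augmented behavior of mutating each operator into every compatible alternative rather than into one fixed operator.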

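The edit-distance computation that the AL mutator adapts for matching argument lists is, at its core, the classic dynamic program below. This is a minimal string version for illustration; PraPR's variant operates on argument/local-variable lists rather than characters, with the insert/delete/substitute operations standing in for AL's reordering and default-value stack preparations.

```java
// Classic Levenshtein distance via dynamic programming.
public class Levenshtein {
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i; // all deletions
        for (int j = 0; j <= b.length(); j++) d[0][j] = j; // all insertions
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int subst = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(d[i - 1][j - 1] + subst,            // substitute/keep
                          Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1)); // delete, insert
            }
        }
        return d[a.length()][b.length()];
    }
}
```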
ID       Name                 #Bugs  #Tests  LoC
Chart    JFreeChart           26     2,205   96K
Time     Joda-Time            27     4,130   28K
Mockito  Mockito framework    38     1,366   23K
Lang     Apache Commons-Lang  65     2,245   22K
Math     Apache Commons-Math  106    3,602   85K
Closure  Google Closure       133    7,927   90K
Total                         395    21,475  344K
Table 3: Defects4J programs

4.3 Implementation

PraPR has been implemented as a full-fledged program repair tool for JVM bytecode. Currently it supports Maven-based [2] Java and Kotlin projects with JUnit or TestNG [65] test suites. Given any such program with at least one failed test, PraPR can be applied using the single command "mvn prapr-plugin:prapr". During the repair process, PraPR uses the ASM bytecode manipulation framework [3] together with a Java Agent [4] to collect coverage information, and it supports Ochiai spectrum-based fault localization [9]. We have built PraPR on top of the state-of-the-art mutation testing engine PIT [17] for the following reasons: (1) PIT works directly at the level of JVM bytecode, i.e., it mutates compiled bytecode rather than source code and can thus save substantial compilation and class-loading time; (2) PIT performs both mutant generation and execution on-the-fly; (3) PIT supports various optimizations, e.g., coverage-based test execution reduction and multithreaded mutant execution; (4) PIT is the most robust and widely used mutation testing tool in both academia and industry [17, 34]. Since PIT was originally implemented for assessing test effectiveness through mutation testing, we performed the following optimizations, modifications, and extensions to integrate it into PraPR. First, we enabled PIT to mutate programs with failed tests, since the original PIT only accepts programs with passing test suites (called green test suites in PIT). Second, we enabled PIT to skip the mutants whose mutated statements are not covered by any failed test (Line 7 in Algorithm 1), since such mutants cannot change the behaviors of the failed tests and thus cannot make them pass. Third, we forced PIT to always execute the failed tests first against each patch (Lines 8-16 in Algorithm 1), since a patch needs no further validation if any originally failed test still fails. Fourth, we augmented the original PIT mutators and implemented the additional new mutators (Table 1) using the ASM bytecode manipulation framework.

All our experimentation was done on a Dell workstation with an Intel Xeon CPU E5-2697 v4 @ 2.30GHz and 98GB RAM, running Ubuntu 16.04.4 LTS and Oracle Java 64-bit version 1.7.0_80. It is also worth mentioning that we ran PraPR with both a single thread and 4 threads, exhaustively on all candidate patches, to precisely measure the cost of running PraPR.

5 RESULT ANALYSIS

5.1 RQ1: PraPR Effectiveness

Table 4 presents the main repair results for all the bugs for which PraPR can generate plausible patches. In the table, Columns "Sub." and "BugID" present the subject name and bug ID of each bug. Column "Original Mutators" presents the repair results using only the original PIT mutators for each bug, including the total repair time using a single thread for validating all patches (Column "Time"), the number of all validated patches (Column "#Patch"), the number of plausible patches (Column "#Plau."), the number of genuine patches (Column "#Genu."), and the mutators that produce genuine patches (Column "Mutator"). Note that we only present the number of validated patches (i.e., the patches passing the check at Line 7 in Algorithm 1), since the other patches cannot pass all the failed tests and do not need to be validated. Similarly, the column "All Mutators" presents the corresponding repair results using all the mutators (i.e., both the original PIT mutators and our augmented ones). Finally, the row "Overall" counts the total number of bugs for which the original and all mutators can produce plausible or genuine patches.

According to the table, surprisingly, even the original PIT mutators can generate plausible patches for 113 bugs and genuine patches for 18 bugs of the Defects4J benchmark, comparable to the most recent work CapGen [68], which produces genuine patches for 22 bugs. In contrast, prior work [46] showed that mutation testing could find only 17 plausible patches and 4 genuine patches for the Defects4J benchmark. One potential reason is that the prior work was based on mutation at the source-code level, which can take much longer due to the expensive recompilation and class loading required for each mutant, and thus does not scale to large programs like Closure. Another reason is that the prior work used only a limited set of mutators (e.g., only three mutators were considered) that cannot generate valid patches for many bugs.² To our knowledge, this is the first study that demonstrates the effectiveness of mutation testing for fixing real bugs.

Furthermore, all the simple PraPR mutators (including both the original PIT mutators and our augmented mutators) can produce plausible and genuine patches for 148 and 43 bugs, respectively. To our knowledge, this is the largest number of bugs reported as fixed for the Defects4J benchmark to date. We looked into the reasons and found that one main reason is PraPR's capability of exploring such a large number of potential patches within a short time, due to its bytecode-level patch generation/validation and our execution optimizations. For example, for bug Closure-70, PraPR with a single thread is able to validate 35,521 candidate patches within an hour (i.e., 10 patches per second!). This finding demonstrates the effectiveness of PraPR and shows the importance of fast (and exhaustive) patch generation and validation for automatic program repair.

We next show some example genuine patches, i.e., patches semantically equivalent to the developer patches, produced by PraPR, to qualitatively illustrate its effectiveness. As shown in Figure 4, PraPR using the mutator CONDITIONAL is able to produce a genuine patch that is identical to the patch provided by the developer. Also, PraPR using the mutator RETURN VALUE produces a genuine patch that is semantically equivalent to the actual developer patch, as shown in Figure 5. Note that those patches are as expected, since they fall directly within the capabilities of the employed mutators. Interestingly, we also observe various cases where PraPR is able to suggest complex genuine patches. For example, Figure 6 presents both the developer and PraPR patches for Closure-46.

² A quick check indicates that had jMutRepair been able to scale to all the Defects4J programs, it would generate up to 7 genuine fixes.
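PraPR ranks mutation points with Ochiai spectrum-based fault localization, mentioned in Section 4.3. For a program element, with ef and ep the numbers of failed and passed tests covering it, and totalf the total number of failed tests, the standard Ochiai suspiciousness is ef / sqrt(totalf · (ef + ep)). A minimal implementation (names are ours, not PraPR's API):

```java
// Standard Ochiai suspiciousness for spectrum-based fault localization:
// elements covered by many failed tests and few passed tests rank highest.
public class Ochiai {
    static double suspiciousness(int failedCovering, int passedCovering, int totalFailed) {
        double denom = Math.sqrt((double) totalFailed * (failedCovering + passedCovering));
        return denom == 0 ? 0.0 : failedCovering / denom;
    }
}
```

An element covered by both failed tests of a two-failed-test run and by no passed test gets the maximum score of 1.0; coverage by passed tests dilutes the score.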

Original Mutators All Mutators Original Mutators All Mutators Sub. BugID Time #Patch #Plau. #Genu. Mutator Time #Patch #Plau. #Genu. Mutator Sub. BugID Time #Patch #Plau. #Genu. Mutator Time #Patch #Plau. #Genu. Mutator Chart 1 110 866 2 1 CO 249 3555 2 1 CO Closure 130 2053 12885 1 1 CO 4752 42797 7 1 CO Chart 3 49 411 0 0 71 1225 1 0 Closure 133 730 4156 0 0 1684 16225 7 0 Chart 4 104 960 0 0 212 3503 5 0 Lang 6 95 158 0 0 132 296 1 1 LV Chart 5 36 162 2 0 45 366 3 0 Lang 7 52 745 3 0 98 1252 6 0 Chart 7 43 416 1 0 65 1634 17 0 Lang 10 60 497 2 2 MC, CO 99 1289 2 2 MC, CO Chart 8 35 163 0 0 43 515 4 1 AM Lang 22 140 229 2 0 277 335 2 0 Chart 11 33 93 0 0 35 169 2 1 LV Lang 25 19 3 0 0 20 21 8 0 Chart 12 66 512 1 0 105 1910 2 1 AM Lang 26 33 631 0 0 52 1482 1 1 AL Chart 13 51 626 12 0 88 2566 16 0 Lang 27 34 672 10 0 76 1102 13 0 Chart 15 185 2312 1 0 425 8210 10 0 Lang 31 23 76 1 0 26 122 1 0 Chart 17 40 319 1 0 54 831 1 0 Lang 33 20 24 0 0 20 31 1 1 MG Chart 24 31 43 0 0 33 133 2 1 LV Lang 39 88 300 5 0 185 771 13 0 Chart 25 480 7040 141 0 1299 24876 245 0 Lang 43 2942 81 2 0 9873 288 2 0 Chart 26 311 3368 52 0 705 11920 102 1 MG Lang 44 29 252 2 0 41 389 4 0 Closure 1 1762 8495 1 0 3130 28153 5 0 Lang 51 29 317 4 0 31 375 5 1 CB Closure 2 1634 11397 0 0 3958 39711 12 0 Lang 57 25 4 0 0 25 10 3 1 AM Closure 3 2303 14609 1 0 5370 49228 6 0 Lang 58 32 350 2 0 48 593 2 0 Closure 5 1619 11094 1 0 3871 38296 6 0 Lang 59 25 53 0 0 27 137 2 1 LV Closure 7 632 3666 2 0 1229 15762 5 0 Lang 60 33 261 0 0 46 613 1 0 Closure 8 1392 8396 2 0 3195 30353 6 0 Lang 61 44 193 0 0 68 481 1 0 Closure 10 1337 9257 5 0 3085 31350 6 1 MN Lang 63 79 710 14 0 100 1390 33 0 Closure 11 2339 15428 6 2 MC, CO 5931 52010 11 2 MC, CO Math 2 593 1201 27 0 647 2326 27 0 Closure 12 2198 14122 5 0 5451 47710 10 0 Math 5 1435 121 0 0 1491 381 3 1 FN Closure 13 3626 26721 12 0 9472 84915 16 0 Math 6 1436 153 1 0 1471 497 2 0 Closure 14 506 2279 0 0 776 8068 1 1 FN Math 8 1449 993 7 0 1491 1973 9 0 
Closure 15 2012 12405 1 0 4836 42112 5 0 Math 18 1110 6485 0 0 2309 17763 6 0 Closure 17 2173 15957 1 0 5650 53970 1 0 Math 20 1307 6104 31 0 2283 16596 50 0 Closure 18 2251 14504 2 2 CO 5252 46270 2 2 CO Math 28 864 1670 40 0 1087 4617 51 0 Closure 21 1442 9698 16 0 3517 33133 21 0 Math 29 872 917 3 0 1301 2308 4 0 Closure 22 1447 9646 0 0 3512 33018 1 0 Math 32 1003 7473 7 0 2543 25574 11 0 Closure 29 1759 10429 1 0 3666 35168 6 0 Math 33 780 1796 0 0 922 5001 1 1 AL Closure 30 1900 11006 1 0 4188 37672 5 0 Math 34 699 91 0 0 702 240 1 1 AM Closure 31 1700 9575 4 1 CO 3972 30413 9 1 CO Math 39 291 2346 2 0 1217 6058 15 0 Closure 33 2443 18014 1 0 6414 60314 1 0 Math 40 299 878 4 0 389 2220 7 0 Closure 35 2343 17148 0 0 6081 58088 1 0 Math 42 328 1622 1 0 442 4467 2 0 Closure 36 4247 34486 4 0 11252 107270 8 0 Math 49 265 713 9 0 308 1807 15 0 Closure 38 539 3272 1 0 850 9998 1 0 Math 50 257 372 11 1 CO 271 1098 30 1 CO Closure 40 1624 10438 3 0 3638 33819 3 0 Math 52 211 1031 1 0 277 3343 1 0 Closure 42 495 3466 4 0 966 13958 4 0 Math 57 217 196 2 0 287 502 2 0 Closure 45 1553 10806 6 0 3819 37053 12 0 Math 58 672 3164 1 0 2994 9002 5 1 AL Closure 46 446 2608 4 2 MC, CO 809 11464 6 2 MC, CO Math 59 213 1949 0 0 445 3370 1 1 LV Closure 48 2076 15429 3 0 5374 52122 6 0 Math 62 69 900 0 0 96 2581 2 0 Closure 50 1271 7390 1 0 2728 26148 6 0 Math 63 41 97 10 0 45 135 23 0 Closure 59 3784 29058 1 0 9561 90792 7 0 Math 64 150 2129 0 0 439 6817 3 0 Closure 62 109 130 2 1 CO 114 436 2 1 CO Math 65 144 1936 1 0 312 6098 1 0 Closure 63 108 130 2 1 CO 111 436 2 1 CO Math 70 33 80 0 0 34 223 2 1 AL Closure 64 2381 18746 1 0 5920 59197 2 0 Math 71 502 1031 6 0 1542 3302 22 0 Closure 66 1029 7556 11 0 2301 27478 16 0 Math 73 33 444 0 0 46 1174 6 0 Closure 68 736 3287 1 0 1638 13427 10 0 Math 74 904 4340 2 0 3642 12204 3 0 Closure 70 1693 10388 1 1 IC 3561 35521 1 1 IC Math 75 29 189 0 0 40 557 1 1 MN Closure 72 1289 11017 1 0 3261 35910 6 0 Math 78 57 825 8 0 152 2658 15 0 
Closure 73 444 3093 1 1 CO 766 9365 1 1 CO Math 80 293 7410 42 0 1318 18725 59 0 Closure 76 1272 8854 2 0 2991 30813 8 0 Math 81 186 5707 75 0 941 14372 187 0 Closure 81 397 2964 4 0 782 11789 4 0 Math 82 60 1075 3 1 CO 108 2721 7 1 CO Closure 84 399 3003 4 0 794 12124 4 0 Math 84 47 339 4 0 127 888 4 0 Closure 86 435 1890 3 2 IC, RV 561 6332 3 2 IC, RV Math 85 150 702 6 1 CO 489 1599 6 1 CO Closure 92 905 7196 3 0 2451 24705 4 1 MN Math 88 69 1343 1 0 133 3217 2 0 Closure 93 906 7196 3 0 2454 24706 4 1 MN Math 95 3027 841 10 0 24846 1614 14 0 Closure 101 2129 16807 2 0 6290 53814 6 0 Math 96 24 232 1 0 32 660 1 0 Closure 107 2473 15207 3 0 5452 50516 4 0 Math 101 21 155 0 0 27 458 3 0 Closure 108 2217 11317 2 0 5215 40139 4 0 Math 104 67 546 1 0 260 1046 1 0 Closure 109 863 3968 0 0 1613 15702 4 0 Mockito 8 36 171 6 0 54 458 8 0 Closure 111 964 5688 2 0 1836 21900 4 0 Mockito 15 93 1032 1 0 135 2910 1 0 Closure 113 1224 7843 2 0 2743 28355 2 0 Mockito 28 139 1374 1 0 200 3862 1 0 Closure 115 2458 12164 15 0 4280 39721 26 0 Mockito 29 129 1665 0 0 181 4261 6 1 MG Closure 119 1714 9510 3 0 3621 33683 3 0 Mockito 38 31 152 0 0 40 464 3 1 MG Closure 120 1947 12532 3 0 4340 40877 8 0 Time 4 86 1339 6 0 115 2619 10 1 AL Closure 121 1965 12532 3 0 4253 40877 8 0 Time 11 138 3139 44 1 MC 175 5046 56 1 MC Closure 122 672 3570 4 0 1092 13945 4 0 Time 14 73 1052 0 0 88 1712 1 0 Closure 123 727 5202 0 0 1555 16573 1 0 Time 17 205 4011 6 0 317 9835 6 0 Closure 125 2868 19193 1 0 7025 63756 3 0 Time 18 96 517 3 0 109 1119 3 0 Closure 126 1635 8226 8 2 MC, CO 3025 28934 16 2 MC, CO Time 19 218 3312 2 1 CO 251 5740 2 1 CO Closure 127 1792 9142 2 0 3355 31768 7 0 Time 20 308 4530 0 0 435 10699 3 0 Closure 129 3318 21923 7 0 7672 70265 17 0 Time 24 176 2951 4 0 251 5304 4 0 Overall 113 18 148 43 Table 4: Overall PraPR repair results
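The patch-validation discipline described in Section 4.3 (skip patches not covered by any failed test; run the originally failed tests first and stop at the first failure; only then run the remaining tests) can be sketched as follows. The types and method names here are illustrative stand-ins, not PraPR's actual API.

```java
import java.util.List;
import java.util.function.Predicate;

// Illustrative sketch of the patch-validation loop (Algorithm 1 in the
// paper): a patch is worth validating only if a failed test covers its
// mutation point, and it is "plausible" only if it passes every test.
public class PatchValidatorSketch {
    interface Patch { boolean coveredByFailedTest(); }

    /** Returns true iff {@code patch} is plausible; {@code runTest} runs one test on the patched program. */
    static boolean validate(Patch patch, List<String> failedTests, List<String> passedTests,
                            Predicate<String> runTest) {
        if (!patch.coveredByFailedTest()) return false;  // cannot change a failed test's behavior
        for (String t : failedTests)
            if (!runTest.test(t)) return false;          // originally failed tests run first
        for (String t : passedTests)
            if (!runTest.test(t)) return false;          // then check for regressions
        return true;                                     // plausible patch
    }
}
```

Running the originally failed tests first lets most candidate patches be discarded after a single test execution, which is one reason the exhaustive search in Table 4 stays tractable.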

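As a sanity check on the throughput claim for Closure-70 in Section 5.1, the Table 4 numbers (35,521 patches validated in 3,561 seconds with all mutators, single-threaded) indeed work out to roughly 10 patches per second:

```java
// Recomputing the Closure-70 throughput claim from the Table 4 numbers.
public class Throughput {
    static double patchesPerSecond(long patches, long seconds) {
        return (double) patches / seconds;
    }

    public static void main(String[] args) {
        System.out.printf("%.1f patches/s%n", patchesPerSecond(35_521, 3_561));
    }
}
```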
// Developer and PraPR patches
    } else if (offsetLocal > 0) {
+++ } else if (offsetLocal >= 0) {
Figure 4: Time-19 patches

// Developer patch
    return true;
+++ return false;
// PraPR patch
    return true;
+++ return true == false ? true : false;
Figure 5: Closure-86 patches

// Developer patch
@Override
public JSType getLeastSupertype(JSType that) {
  if (!that.isRecordType()) {
    return super.getLeastSupertype(that);
  }...}
// PraPR patch
@Override
public JSType getLeastSupertype(JSType that) {
  if (!that.isRecordType()) {
+++if (!false) {
    return super.getLeastSupertype(that);
  }...}
Figure 6: Closure-46 patches

According to the figure, the developer patch removes an overriding method from a subclass, which is hard to model directly using PraPR mutators. Interestingly, the PraPR patch, generated via the mutator CONDITIONAL, forces the overriding method to always directly invoke the corresponding overridden method in the superclass; i.e., it is semantically equivalent to removing the overriding method.

5.2 RQ2: PraPR Efficiency

Since Table 4 only presents the results for the bugs with plausible patches, in order to further study the efficiency of PraPR, Table 5 presents the efficiency information of the tool on all the Defects4J bugs. To better understand the efficiency of PraPR, we apply the tool both with a single thread (the default setting) and with 4 threads of execution. In the table, the column "Sub." presents the subject systems. The column "Original Mutators" presents the average number of all validated patches (Column "#Patches") and the average time cost when using 1 thread (Column "1-T") and 4 threads (Column "4-T") over all bugs of each subject system, using the original PIT mutators. Note that we also include the speedup of using 4 threads over a single thread in parentheses. Similarly, the column "All Mutators" presents the corresponding information when using all PraPR mutators.

Sub. | #Patches 1-T 4-T (Original Mutators) | #Patches 1-T 4-T (All Mutators)
Chart | 806.6 79.5 79.6 (1.0X) | 2827.6 157.8 85.5 (1.8X)
Closure | 8828.4 1359.1 978.2 (1.4X) | 29849.9 3027.3 1458.9 (2.1X)
Lang | 262.7 86.5 87.4 (1.0X) | 544.4 210.2 198.8 (1.1X)
Math | 1296.5 933.1 611.1 (1.5X) | 3333.2 1629.2 916.5 (1.8X)
Mockito | 968.5 106.2 57.3 (1.9X) | 2601.0 148.8 77.4 (1.9X)
Time | 1563.0 112.6 67.1 (1.7X) | 2968.2 144.2 87.3 (1.7X)
Table 5: Average PraPR time cost (s)

According to the table, we find that PraPR is remarkably efficient even when using only a single thread. For example, PraPR with all the mutators takes less than 1 hour to validate all the 29,850 patches for Closure when using a single thread. In addition, using 4 threads can further improve the efficiency of PraPR. For example, Closure shows a 2.1X speedup when using 4 threads, since it has more patches and can better utilize concurrent patch validation. Note that besides machine execution time, repair efficiency also involves the manual effort of inspecting the plausible patches. Thus, the ranking of the genuine patches within all the validated/plausible patches is of particular importance for truly understanding the efficiency of PraPR. In the remainder of this section, we discuss this aspect of efficiency.

Table 6 presents the ranking of the genuine patches among all validated patches and among all plausible patches. In the table, the columns "Sub." and "BugID" present the bug version information. The column "Original Mutators" presents the number of validated patches (Column "#Patch") and the rank of the first genuine patch among the validated patches (Column "Rank") when using the original PIT mutators; the rank of the first genuine patch among all plausible patches is shown in parentheses. Similarly, the column "All Mutators" presents the corresponding results when using all PraPR mutators. Note that in the case of tied patches, PraPR favors the patches generated by mutators with smaller ratios of plausible to validated patches, since mutators with larger ratios tend to be resilient to the corresponding test suite. If the tie remains, PraPR uses the worst ranking for all tied patches.

Sub. BugID | #Patch Rank (Original Mutators) | #Patch Rank (All Mutators)
Chart 1 | 866 61 (1) | 3555 306 (1)
Chart 8 | 163 N/A (N/A) | 515 103 (4)
Chart 11 | 93 N/A (N/A) | 169 168 (2)
Chart 12 | 512 N/A (N/A) | 1910 174 (2)
Chart 24 | 43 N/A (N/A) | 133 109 (2)
Chart 26 | 3368 N/A (N/A) | 11920 1420 (23)
Closure 10 | 9257 N/A (N/A) | 31350 2059 (1)
Closure 11 | 15428 2808 (4) | 52010 8777 (5)
Closure 14 | 2279 N/A (N/A) | 8068 1 (1)
Closure 18 | 14504 9450 (1) | 46270 28358 (1)
Closure 31 | 9575 4919 (2) | 30413 20734 (9)
Closure 46 | 2608 28 (1) | 11464 89 (1)
Closure 62 | 130 30 (1) | 436 70 (1)
Closure 63 | 130 30 (1) | 436 70 (1)
Closure 70 | 10388 292 (1) | 35521 996 (1)
Closure 73 | 3093 136 (1) | 9365 242 (1)
Closure 86 | 1890 1 (1) | 6332 1 (1)
Closure 92 | 7196 N/A (N/A) | 24705 429 (2)
Closure 93 | 7196 N/A (N/A) | 24706 429 (2)
Closure 126 | 8226 9 (2) | 28934 50 (5)
Closure 130 | 12885 5084 (1) | 42797 16240 (7)
Lang 6 | 158 N/A (N/A) | 296 220 (1)
Lang 10 | 497 1 (1) | 1289 1 (1)
Lang 26 | 631 N/A (N/A) | 1482 1 (1)
Lang 33 | 24 N/A (N/A) | 31 1 (1)
Lang 51 | 317 N/A (N/A) | 375 375 (5)
Lang 57 | 4 N/A (N/A) | 10 1 (1)
Lang 59 | 53 N/A (N/A) | 137 112 (2)
Math 5 | 121 N/A (N/A) | 381 116 (1)
Math 33 | 1796 N/A (N/A) | 5001 795 (1)
Math 34 | 91 N/A (N/A) | 240 35 (1)
Math 50 | 372 40 (11) | 1098 95 (30)
Math 58 | 3164 N/A (N/A) | 9002 670 (4)
Math 59 | 1949 N/A (N/A) | 3370 1 (1)
Math 70 | 80 N/A (N/A) | 223 18 (1)
Math 75 | 189 N/A (N/A) | 557 35 (1)
Math 82 | 1075 393 (3) | 997 742 (7)
Math 85 | 702 324 (6) | 1599 742 (6)
Mockito 29 | 1665 N/A (N/A) | 4261 94 (1)
Mockito 38 | 152 N/A (N/A) | 464 17 (2)
Time 4 | 1339 N/A (N/A) | 2619 545 (10)
Time 11 | 3139 31 (1) | 5046 133 (1)
Time 19 | 3312 1926 (2) | 5740 3457 (2)
Avg. | 3038.6 1420.2 (2.3) | 9696.5 2076.4 (3.6)
Table 6: Rank of PraPR genuine fixes

From the table, we can observe that the genuine patches are ranked high among validated and plausible patches when using both the original and all mutators. For example, on average, using all mutators, the genuine patches are ranked 2076.4th among all 9696.5 validated patches (i.e., within the top 21.4%). This observation demonstrates that simply using Ochiai spectrum-based fault localization can provide an effective ranking for the genuine patches, further confirming the effectiveness of Ochiai for program repair. We also observe that the ranks of the genuine patches among all validated patches when using all mutators are not significantly worse than when using only the original PIT mutators. For example, among all validated patches, the genuine patches are ranked 1420.2th on average using the original mutators and 2076.4th using all mutators. One potential reason is that using all mutators provides more genuine patches, which are ranked high in the search space. Furthermore, surprisingly, among the plausible patches, the genuine patches are ranked only 2.3rd on average using the original mutators and only 3.6th using all mutators, demonstrating that little manual effort is needed to inspect the repair results of PraPR. We looked into the code and found that one potential reason is the small number of plausible patches even when using all the mutators, since the test suites of the Defects4J subjects are strong enough to falsify the vast majority of non-genuine patches. To illustrate, the maximum number of plausible patches for a bug of Closure (the subject with the most candidate patches) is only 26, which is much smaller than for all the other programs (except Mockito, due to its small number of bugs with plausible patches). We attribute this to the stronger test suite of Closure; e.g., Closure has over 300 contributors and the largest test suite among all the studied programs.

5.3 RQ3: Comparison with the State-of-the-Art

Effectiveness. To investigate this question, we compare PraPR with the state-of-the-art APR techniques that have been evaluated on Defects4J, including CapGen [68], JAID [16], ACS [72], HD-Repair [35], xPAR [35] (a reimplementation of PAR [33]), NOPOL [73], jGenProg [45] (a faithful reimplementation of GenProg [36] for Java), jMutRepair [46] (a faithful reimplementation of source-level mutation-based repair [19] for Java), and jKali [46] (a reimplementation of Kali [58] for Java). Following [16, 68, 72], we obtained the repair results for the prior APR techniques from their original papers. Note that the prior APR studies often target different subsets of Defects4J where the techniques could be applied. As also mentioned in [16], despite this lack of information, comparing the fixed bugs among different tools remains meaningful, because the bugs that are excluded from one study might be considered beyond the capabilities of the corresponding tool/technique. In Table 7, the column "Techs" lists all the compared techniques. The column "All Positions" presents the number of genuine and non-genuine plausible patches found when inspecting all the generated plausible patches for each bug. Similarly, the columns "Top-10 Positions" and "Top-1 Position" present the number of genuine and non-genuine plausible patches found when inspecting only the Top-10 and Top-1 plausible patches. From the table, we can observe that PraPR fixes the largest number of bugs among all the studied techniques. For instance, PraPR fixes 43 bugs when considering all plausible patches (18 more than the 2nd-best technique, JAID), 41 bugs when considering the Top-10 plausible patches (19 more than the 2nd-best, CapGen), and 23 bugs when considering only the Top-1 plausible patches (2 more than the 2nd-best, CapGen). Note that the techniques that can rank a larger ratio of genuine patches within the Top-1 (i.e., ACS and CapGen) used software mining/learning information to guide the patch prioritization process, while PraPR simply used the simplistic Ochiai fault localization formula.

Another interesting observation worth discussing is that PraPR produces only non-genuine plausible patches for more bugs than the other techniques do. We found several potential reasons. First, PraPR simply uses Ochiai and does not use any mining or learning information [68, 72] for patch prioritization or reduction. Second, PraPR is able to explore a large search space within a short time due to its lightweight bytecode-level patch generation, while existing techniques usually have to terminate early due to time constraints. One can argue that this amounts to a low repair precision [72]. However, our main goal in this work is to propose a baseline repair technique that does not require any mining/learning information, for both practical application and experimental evaluation; also, various patch correctness checking techniques [63, 71] have recently been proposed and can be directly applied to further improve the PraPR patch validation process. Furthermore, in this work, we also manually inspected all the 105 bugs for which PraPR is only able to produce non-genuine plausible patches. Surprisingly, we observe that even the non-genuine plausible patches for such bugs can still provide useful debugging hints. For example, the plausible patches ranked at the 1st position for 45 bugs share the same methods with the actual developer patches; i.e., in 43% of the cases the non-genuine plausible patches can directly point out the patch locations for manual debugging. In the future, we plan to perform studies with our industrial collaborators to further investigate the effectiveness of such non-genuine plausible patches for manual debugging.

We next manually inspect whether the bugs fixed by PraPR can also be fixed by the other techniques. Figure 7 presents the distribution of the bugs that can be successfully fixed by PraPR and the two most recent APR techniques (Figure 7: Bugs fixed by most recent APR tools). From the figure, we can observe that PraPR can fix 16 bugs that have not been fixed by either technique. Also, the tools are complementary; e.g., CapGen and JAID can fix 5 and 7 bugs, respectively, that no other tool can fix. Even when compared with all the techniques studied in Table 7, PraPR in total fixes 13 bugs that have not been fixed by any other technique. Besides CapGen and JAID, all the other techniques in Table 7 are only applicable to Chart, Time, Lang, and Math. Interestingly, even on the same 4 programs, PraPR can still fix 23 bugs that ACS cannot fix, 13 bugs that HD-Repair cannot fix, 23 bugs that NOPOL/jGenProg/jKali cannot fix, and 22 bugs that jMutRepair cannot fix.

Efficiency. We also investigate the efficiency of PraPR compared to the other techniques. We executed the most recent APR tools, CapGen and JAID, on the same hardware platform as PraPR to enable a fair comparison. Table 8 shows the detailed time data for the bugs that CapGen or JAID can fix. In the table, columns 1 and 2 present the buggy version information; columns 3 to 8 present the number of patches validated and the time cost for CapGen, JAID, and PraPR; and columns 9 and 10 present the speedup achieved by PraPR over CapGen and JAID in terms of the time spent on each validated patch. Note that the highlighted row marks a buggy version for which the script for running JAID was unable to download and initialize. According to Table 8, PraPR
According to Table 8, PraPR is 26.1X and 15.7X faster than CapGen and JAID, respectively. Note that PraPR is still an order of magnitude faster than JAID even though JAID applies meta-program encoding to compile a batch of patches at a time. We attribute this substantial speedup to the fact that PraPR operates completely at the bytecode level: it does not need any recompilation or class/test loading from disk for any patch that it generates. Our experience with these tools also made clear how flexible and practical PraPR is compared with the other tools. Even though the current implementation of PraPR is just a prototype serving as a proof of concept for the practicality of our ideas, it is significantly more flexible and easier to use. For example, both CapGen and JAID require various configurations to get started and are not designed to be used with arbitrary Java projects (they are tailored to the Defects4J programs; setting them up to work with other Java projects requires a considerable amount of manual work). In contrast, PraPR, besides bringing a simple yet effective idea into the limelight, offers a clearly superior engineering contribution.

Table 7: Comparison with state-of-the-art techniques

Techs       All Positions     Top-10 Positions  Top-1 Position
            Gen.  Non-gen.    Gen.  Non-gen.    Gen.  Non-gen.
PraPR       43    105         41    107         23    125
CapGen      22    3           22    3           21    4
JAID        25    6           15    16          9     22
ACS         18    5           18    5           18    5
HD-Repair   16    N/A         N/A   N/A         10    N/A
xPAR        4     N/A         4     N/A         N/A   N/A
NOPOL       5     30          5     30          5     30
jGenProg    5     22          5     22          5     22
jMutRepair  4     13          4     13          4     13
jKali       1     21          1     21          1     21

Table 8: Comparison of timing of PraPR with reproduced timing of CapGen and JAID on the bugs that they can fix

Sub.     BugID  CapGen            JAID              PraPR             Speedup
                #Patches  Time    #Patches  Time    #Patches  Time    CapGen  JAID
Chart    1      458       1496.9  3762      2805    3555      249     47.7X   11X
Chart    8      193       550     N/A       N/A     515       43      30X     N/A
Chart    9      N/A       N/A     5991      4162    1969      81      N/A     17.4X
Chart    11     263       395.84  N/A       N/A     169       35      6.9X    N/A
Chart    24     105       122     2476      904     133       33      4.4X    1.5X
Chart    26     N/A       N/A     2018      2819    11920     705     N/A     24.1X
Closure  18     N/A       N/A     N/A       N/A     46270     5252    N/A     N/A
Closure  31     N/A       N/A     14464     96103   30413     3972    N/A     38.5X
Closure  33     N/A       N/A     4484      15109   60314     6414    N/A     31.3X
Closure  40     N/A       N/A     5243      6703    33819     3638    N/A     11.6X
Closure  62     N/A       N/A     7138      7055    436       114     N/A     3.8X
Closure  63     N/A       N/A     7138      7014    436       111     N/A     3.9X
Closure  70     N/A       N/A     2359      3671    35521     3561    N/A     16.7X
Closure  73     N/A       N/A     11472     22647   9365      766     N/A     24.4X
Closure  126    N/A       N/A     4583      35383   28934     3025    N/A     96X
Lang     6      332       996     N/A       N/A     296       132     7.3X    N/A
Lang     26     821       5634    N/A       N/A     1482      52      285X    N/A
Lang     33     N/A       N/A     792       628     31        20      N/A     1.2X
Lang     38     N/A       N/A     1363      546     1968      119     N/A     6.6X
Lang     43     183       5739    N/A       N/A     288       9873    1X      N/A
Lang     45     N/A       N/A     7173      6164    380       35      N/A     9.1X
Lang     51     N/A       N/A     8514      11148   375       31      N/A     15.1X
Lang     55     N/A       N/A     170       204     182       96      N/A     2.4X
Lang     57     2078      2603    N/A       N/A     10        25      0.5X    N/A
Lang     59     623       963     N/A       N/A     137       27      8.5X    N/A
Math     5      590       506     1426      674     381       1491    0.3X    0.1X
Math     30     390       380     N/A       N/A     1109      729     1.5X    N/A
Math     32     N/A       N/A     2997      1910    25574     2543    N/A     6.3X
Math     33     1957      6093    N/A       N/A     5001      922     18X     N/A
Math     50     N/A       N/A     37848     97247   1098      271     N/A     10.3X
Math     53     310       836     2010      971     250       160     4X      0.8X
Math     57     194       1989    N/A       N/A     502       287     17X     N/A
Math     58     508       1356    N/A       N/A     9002      2994    7.5X    N/A
Math     59     51        518     N/A       N/A     3370      445     76X     N/A
Math     63     168       443     N/A       N/A     135       45      7.5X    N/A
Math     65     1828      6058    N/A       N/A     6098      312     65X     N/A
Math     70     135       151     N/A       N/A     223       34      7.3X    N/A
Math     75     203       202     N/A       N/A     557       40      13.9X   N/A
Math     80     1279      6966    9526      8900    18725     1318    71X     12.9X
Math     82     N/A       N/A     1707      1874    2721      108     N/A     28X
Math     85     247       5540    2922      3905    1599      489     82.5X   4.7X
Avg.            587.1     2251.7  6149      14106.1 8421      1234.1  26.1X   15.7X
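The bytecode-level mutation behind this speedup can be illustrated with a toy conditionals-boundary mutator. This is our own sketch under simplifying assumptions (PraPR's real mutators are built on PIT and ASM and operate on actual bytecode instructions): because a source condition such as a >= b compiles to a conditional jump instruction, mutating the relational operator reduces to swapping one jump opcode for another, with no recompilation involved.

```java
import java.util.Map;

// Toy conditionals-boundary mutator over JVM jump-opcode names
// (illustrative only; not PraPR's actual PIT/ASM-based implementation).
public class BoundaryMutator {

    // Each comparison jump is swapped with its boundary counterpart,
    // e.g., "branch if a < b" becomes "branch if a <= b".
    private static final Map<String, String> SWAP = Map.of(
            "IF_ICMPLT", "IF_ICMPLE",
            "IF_ICMPLE", "IF_ICMPLT",
            "IF_ICMPGT", "IF_ICMPGE",
            "IF_ICMPGE", "IF_ICMPGT");

    static String mutate(String jumpOpcode) {
        // Non-comparison instructions are left untouched.
        return SWAP.getOrDefault(jumpOpcode, jumpOpcode);
    }

    public static void main(String[] args) {
        // javac compiles `if (a >= b) {...}` into an IF_ICMPLT that jumps
        // over the then-branch; mutating it to IF_ICMPLE makes the branch
        // run only when a > b, i.e., `a >= b` becomes `a > b`.
        System.out.println(mutate("IF_ICMPLT")); // IF_ICMPLE
        System.out.println(mutate("GOTO"));      // GOTO (unchanged)
    }
}
```

Since the mutated class is produced directly as bytecode, it can be handed to the JVM and validated against the test suite immediately, which is exactly why no recompilation or disk round-trip is needed per patch.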
PraPR is a 1-click Maven plugin that is publicly available on the Maven Central Repository, so it is applicable to arbitrary Java projects (not just Defects4J and a subset of IntroClassJava [20]) and even to projects in other JVM languages (Kotlin is already supported). It is well-documented and follows the conventions of Maven plugins in specifying settings, thereby making it a user-friendly APR tool.

5.4 Threats to Validity

Threats to Internal Validity are mainly concerned with the uncontrolled factors that may also affect the experimental results. The main threat to internal validity for this work lies in potential faults in the implementation of PraPR. To reduce this threat, we implemented PraPR on top of state-of-the-art libraries and frameworks, such as the ASM bytecode manipulation framework and the PIT mutation testing engine, and we carefully reviewed all our scripts and code to detect potential issues. To further reduce this threat, we plan to release our implementation as an open-source project and encourage developers/researchers to contribute.

Threats to External Validity are mainly concerned with whether the experimental findings from the studied subject systems generalize to other projects. To reduce these threats, we performed, to our knowledge, the first repair study on all the 395 real-world bugs from the widely used Defects4J benchmark. To further reduce the threats, we plan to study PraPR on other debugging benchmarks [40, 59].

Threats to Construct Validity are mainly concerned with whether the metrics used in our experimental study are well-designed and practical. To reduce these threats, we used the metrics widely used in automated program repair research, such as the number of genuine patches, the number of plausible patches, and the actual time cost. We further evaluated the usefulness of plausible patches in helping with manual debugging. To further reduce these threats, we plan to apply PraPR in the daily development of our industry collaborators and get feedback from real-world developers.

6 CONCLUSION

In this work, we propose a practical approach to automated program repair, named PraPR, based on bytecode-level mutation-like patch generation. We have implemented PraPR as a practical program repair tool on top of the state-of-the-art mutation engine PIT and the ASM bytecode manipulation framework. The experimental results on the widely used Defects4J benchmark show that PraPR, using all of its mutators, can generate genuine patches for 43 Defects4J bugs, significantly outperforming state-of-the-art APR techniques while being an order of magnitude faster. PraPR is a simplistic technique that does not require any learning/mining and can easily serve as a baseline for future repair techniques.

REFERENCES
[1] The Apache Groovy programming language. http://groovy-lang.org/. Accessed Feb-21-2018.
[2] Apache Maven. https://maven.apache.org. Accessed Jan-9-2018.
[3] ASM bytecode manipulation framework. http://asm.ow2.org/. Accessed Jan-9-2018.

[4] Java Agent. https://docs.oracle.com/javase/7/docs/api/java/lang/instrument/package-summary.html. Accessed Jan-9-2018.
[5] JVM Specifications. https://docs.oracle.com/javase/specs/. Accessed Jan-9-2018.
[6] Levenshtein Distance Wiki. https://en.wikipedia.org/wiki/Levenshtein_distance. Accessed Jan-9-2018.
[7] Increasing productivity with reversible debugging. Technical report, 2016.
[8] Siemens Suite (SIR). http://sir.unl.edu/. Accessed Jan-9-2018.
[9] R. Abreu, P. Zoeteweij, and A. J. Van Gemund. On the accuracy of spectrum-based fault localization. In Testing: Academic and Industrial Conference Practice and Research Techniques - MUTATION (TAICPART-MUTATION 2007), pages 89–98. IEEE, 2007.
[10] P. Ammann and J. Offutt. Introduction to Software Testing. Cambridge University Press, New York, NY, USA, 1st edition, 2008.
[11] J. H. Andrews, L. C. Briand, and Y. Labiche. Is mutation an appropriate tool for testing experiments? In Proceedings of the 27th International Conference on Software Engineering, ICSE '05, pages 402–411, New York, NY, USA, 2005. ACM.
[12] N. Ayewah, D. Hovemeyer, J. D. Morgenthaler, J. Penix, and W. Pugh. Using static analysis to find bugs. IEEE Software, 25(5):22–29, Sept 2008.
[13] T.-D. B. Le, D. Lo, C. Le Goues, and L. Grunske. A learning-to-rank based fault localization approach using likely invariants. In Proceedings of the 25th International Symposium on Software Testing and Analysis, pages 177–188. ACM, 2016.
[14] E. T. Barr, Y. Brun, P. T. Devanbu, M. Harman, and F. Sarro. The plastic surgery hypothesis. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE-22), Hong Kong, China, November 16-22, 2014, pages 306–317, 2014.
[15] C. Boulder. University of Cambridge study: Failure to adopt reverse debugging costs global economy $41 billion annually. https://www.roguewave.com/company/news/2013/university-of-cambridge-reverse-debugging-study, 2013. Retrieved: Feb. 8, 2013.
[16] L. Chen, Y. Pei, and C. A. Furia. Contract-based program repair without the contracts. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017, pages 637–647, Piscataway, NJ, USA, 2017. IEEE Press.
[17] H. Coles. State of the art mutation testing system for the JVM. https://github.com/hcoles/pitest. Accessed Jan-9-2018.
[18] V. Dallmeier, A. Zeller, and B. Meyer. Generating fixes from object behavior anomalies. In Proceedings of the 2009 IEEE/ACM International Conference on Automated Software Engineering, ASE '09, pages 550–554, Washington, DC, USA, 2009. IEEE Computer Society.
[19] V. Debroy and W. E. Wong. Using mutation to automatically suggest fixes for faulty programs. In 2010 Third International Conference on Software Testing, Verification and Validation, pages 65–74, April 2010.
[20] T. Durieux and M. Monperrus. IntroClassJava: A benchmark of 297 small and buggy Java programs. Research Report hal-01272126, Universite Lille 1, 2016.
[21] M. Flatt, S. Krishnamurthi, and M. Felleisen. Classes and mixins. In Proceedings of the 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '98, pages 171–183, New York, NY, USA, 1998. ACM.
[22] L. Gazzola, D. Micucci, and L. Mariani. Automatic software repair: A survey. IEEE Transactions on Software Engineering, PP(99):1–1, 2017.
[23] D. Gopinath, M. Z. Malik, and S. Khurshid. Specification-based program repair using SAT. In Proceedings of the 17th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS'11/ETAPS'11, pages 173–188, Berlin, Heidelberg, 2011. Springer-Verlag.
[24] C. A. R. Hoare. Null references: The billion dollar mistake. https://www.infoq.com/presentations/Null-References-The-Billion-Dollar-Mistake-Tony-Hoare, 2009.
[25] D. Hovemeyer and W. Pugh. Finding bugs is easy. SIGPLAN Not., 39(12):92–106, Dec. 2004.
[26] D. Hovemeyer, J. Spacco, and W. Pugh. Evaluating and tuning a static analysis to find null pointer bugs. In Proceedings of the 6th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering, PASTE '05, pages 13–19, New York, NY, USA, 2005. ACM.
[27] JetBrains. Kotlin language documentation. http://kotlinlang.org/. Accessed Jul-1-2018.
[28] T. Ji, L. Chen, X. Mao, and X. Yi. Automated program repair by using similar code containing fix ingredients. In 2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC), volume 1, pages 197–202, June 2016.
[29] Y. Jia and M. Harman. Higher order mutation testing. Inf. Softw. Technol., 51(10):1379–1393, Oct. 2009.
[30] Y. Jia and M. Harman. An analysis and survey of the development of mutation testing. IEEE Transactions on Software Engineering, 37(5):649–678, 2011.
[31] R. Just, D. Jalali, and M. D. Ernst. Defects4J: A database of existing faults to enable controlled testing studies for Java programs. In Proceedings of the 2014 International Symposium on Software Testing and Analysis, ISSTA 2014, pages 437–440, New York, NY, USA, 2014. ACM.
[32] R. Just, D. Jalali, L. Inozemtseva, M. D. Ernst, R. Holmes, and G. Fraser. Are mutants a valid substitute for real faults in software testing? In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2014, pages 654–665, New York, NY, USA, 2014. ACM.
[33] D. Kim, J. Nam, J. Song, and S. Kim. Automatic patch generation learned from human-written patches. In Proceedings of the 2013 International Conference on Software Engineering, pages 802–811. IEEE Press, 2013.
[34] T. Laurent, M. Papadakis, M. Kintis, C. Henard, Y. Le Traon, and A. Ventresque. Assessing and improving the mutation testing practice of PIT. In Software Testing, Verification and Validation (ICST), 2017 IEEE International Conference on, pages 430–435. IEEE, 2017.
[35] X. B. D. Le, D. Lo, and C. Le Goues. History driven program repair. In Software Analysis, Evolution, and Reengineering (SANER), 2016 IEEE 23rd International Conference on, volume 1, pages 213–224. IEEE, 2016.
[36] C. Le Goues, T. Nguyen, S. Forrest, and W. Weimer. GenProg: A generic method for automatic software repair. IEEE Transactions on Software Engineering, 38(1):54–72, 2012.
[37] X. Li, M. d'Amorim, and A. Orso. Iterative user-driven fault localization. In Haifa Verification Conference, pages 82–98. Springer, 2016.
[38] X. Li and L. Zhang. Transforming programs and tests in tandem for fault localization. Proc. ACM Program. Lang., 1(OOPSLA):92:1–92:30, Oct. 2017.
[39] T. Lindholm, F. Yellin, G. Bracha, and A. Buckley. The Java Virtual Machine Specification, Java SE 9 Edition. Addison-Wesley Professional, 1st edition, 2017.
[40] F. Long, P. Amidon, and M. Rinard. Automatic inference of code transforms for patch generation. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, pages 727–739. ACM, 2017.
[41] F. Long and M. Rinard. Staged program repair with condition synthesis. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015, Bergamo, Italy, August 30 - September 4, 2015, pages 166–178, 2015.
[42] F. Long and M. Rinard. Automatic patch generation by learning correct code. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, St. Petersburg, FL, USA, January 20-22, 2016, pages 298–312, 2016.
[43] F. Long, S. Sidiroglou-Douskos, and M. C. Rinard. Automatic runtime error repair and containment via recovery shepherding. In ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '14, Edinburgh, United Kingdom, June 09-11, 2014, pages 227–238, 2014.
[44] M. Martinez, T. Durieux, R. Sommerard, J. Xuan, and M. Monperrus. Automatic repair of real bugs in Java: a large-scale experiment on the Defects4J dataset. Empirical Software Engineering, 22(4):1936–1964, Aug 2017.
[45] M. Martinez, T. Durieux, J. Xuan, R. Sommerard, and M. Monperrus. Automatic repair of real bugs: An experience report on the Defects4J dataset. arXiv preprint arXiv:1505.07002, 2015.
[46] M. Martinez and M. Monperrus. Astor: A program repair library for Java (demo). In Proceedings of the 25th International Symposium on Software Testing and Analysis, ISSTA 2016, pages 441–444, New York, NY, USA, 2016. ACM.
[47] S. Mechtaev, J. Yi, and A. Roychoudhury. Angelix: scalable multiline program patch synthesis via symbolic analysis. In Proceedings of the 38th International Conference on Software Engineering, ICSE 2016, Austin, TX, USA, May 14-22, 2016, pages 691–701, 2016.
[48] M. Monperrus. Automatic software repair: A bibliography. ACM Comput. Surv., 51(1):17:1–17:24, Jan. 2018.
[49] S. Moon, Y. Kim, M. Kim, and S. Yoo. Ask the mutants: Mutating faulty programs for fault localization. In Software Testing, Verification and Validation (ICST), 2014 IEEE Seventh International Conference on, pages 153–162. IEEE, 2014.
[50] S. S. Muchnick. Advanced Compiler Design and Implementation. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1997.
[51] H. D. T. Nguyen, D. Qi, A. Roychoudhury, and S. Chandra. SemFix: program repair via semantic analysis. In 35th International Conference on Software Engineering, ICSE '13, San Francisco, CA, USA, May 18-26, 2013, pages 772–781, 2013.
[52] M. Odersky. The Scala language specification. http://www.scala-lang.org. Accessed Feb-21-2018.
[53] M. Papadakis and Y. Le Traon. Using mutants to locate "unknown" faults. In Software Testing, Verification and Validation (ICST), 2012 IEEE Fifth International Conference on, pages 691–700. IEEE, 2012.
[54] M. Papadakis and N. Malevris. Automatic mutation test case generation via dynamic symbolic execution. In Software Reliability Engineering (ISSRE), 2010 IEEE 21st International Symposium on, pages 121–130. IEEE, 2010.
[55] Y. Pei, C. A. Furia, M. Nordio, Y. Wei, B. Meyer, and A. Zeller. Automated fixing of programs with contracts. IEEE Transactions on Software Engineering, 40(5):427–449, 2014.
[56] J. H. Perkins, S. Kim, S. Larsen, S. P. Amarasinghe, J. Bachrach, M. Carbin, C. Pacheco, F. Sherwood, S. Sidiroglou, G. Sullivan, W. Wong, Y. Zibin, M. D. Ernst, and M. C. Rinard. Automatically patching errors in deployed software. In Proceedings of the 22nd ACM Symposium on Operating Systems Principles, SOSP 2009, Big Sky, Montana, USA, October 11-14, 2009, pages 87–102, 2009.
[57] Y. Qi, X. Mao, Y. Lei, Z. Dai, and C. Wang. The strength of random search on automated program repair. In Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, pages 254–265, New York, NY, USA, 2014. ACM.
[58] Z. Qi, F. Long, S. Achour, and M. Rinard. An analysis of patch plausibility and correctness for generate-and-validate patch generation systems. In Proceedings of the 2015 International Symposium on Software Testing and Analysis, ISSTA 2015, pages 24–36, New York, NY, USA, 2015. ACM.
[59] R. K. Saha, Y. Lyu, H. Yoshida, and M. R. Prasad. Elixir: effective object oriented program repair. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, pages 648–659. IEEE Press, 2017.
[60] D. Schuler and A. Zeller. Javalanche: efficient mutation testing for Java. In Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering, pages 297–298. ACM, 2009.
[61] J. Sohn and S. Yoo. FLUCCS: using code and change metrics to improve fault localization. In Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 273–283. ACM, 2017.
[62] S. H. Tan and A. Roychoudhury. relifix: Automated repair of software regressions. In 37th IEEE/ACM International Conference on Software Engineering, ICSE 2015, Florence, Italy, May 16-24, 2015, Volume 1, pages 471–482, 2015.
[63] S. H. Tan, H. Yoshida, M. R. Prasad, and A. Roychoudhury. Anti-patterns in search-based program repair. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 727–738, 2016.
[64] Y. Tao, J. Kim, S. Kim, and C. Xu. Automatically generated patches as debugging aids: A human study. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2014, pages 64–74, New York, NY, USA, 2014. ACM.
[65] TestNG. TestNG documentation. https://testng.org/doc/index.html, 2018.
[66] C. von Praun. Race conditions. In Encyclopedia of Parallel Computing, pages 1691–1697. 2011.
[67] W. Weimer. Patches as better bug reports. In Proceedings of the 5th International Conference on Generative Programming and Component Engineering, GPCE '06, pages 181–190, New York, NY, USA, 2006. ACM.
[68] M. Wen, J. Chen, R. Wu, D. Hao, and S.-C. Cheung. Context-aware patch generation for better automated program repair. In Proceedings of the 40th International Conference on Software Engineering (ICSE 2018), 2018. To appear.
[69] W. E. Wong, R. Gao, Y. Li, R. Abreu, and F. Wotawa. A survey on software fault localization. IEEE Trans. Softw. Eng., 42(8):707–740, Aug. 2016.
[70] Q. Xin and S. P. Reiss. Identifying test-suite-overfitted patches through test case generation. In Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 226–236. ACM, 2017.
[71] Y. Xiong, X. Liu, M. Zeng, L. Zhang, and G. Huang. Identifying patch correctness in test-based program repair. In Proceedings of the 40th International Conference on Software Engineering, pages 789–799, 2018.
[72] Y. Xiong, J. Wang, R. Yan, J. Zhang, S. Han, G. Huang, and L. Zhang. Precise condition synthesis for program repair. In Proceedings of the 39th International Conference on Software Engineering, pages 416–426. IEEE Press, 2017.
[73] J. Xuan, M. Martinez, F. Demarco, M. Clement, S. R. L. Marcote, T. Durieux, D. Le Berre, and M. Monperrus. Nopol: Automatic repair of conditional statement bugs in Java programs. IEEE Transactions on Software Engineering, 43(1):34–55, 2017.
[74] L. Zhang, T. Xie, L. Zhang, N. Tillmann, J. De Halleux, and H. Mei. Test generation via dynamic symbolic execution for mutation testing. In Software Maintenance (ICSM), 2010 IEEE International Conference on, pages 1–10. IEEE, 2010.
[75] M. Zhang, X. Li, L. Zhang, and S. Khurshid. Boosting spectrum-based fault localization using PageRank. In Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 261–272, 2017.
[76] X. Zhang, H. He, N. Gupta, and R. Gupta. Experimental evaluation of using dynamic slices for fault location. In Proceedings of the Sixth International Symposium on Automated Analysis-Driven Debugging, pages 33–42. ACM, 2005.