Hybrid Concolic Testing∗
Total Page:16
File Type:pdf, Size:1020Kb
Hybrid Concolic Testing∗ Rupak Majumdar Koushik Sen CS Department, UC Los Angeles, USA EECS Department, UC Berkeley, USA [email protected] [email protected] Abstract random testing [3, 22, 12, 5, 7, 23]. Random testing gen- erates a large number of inputs randomly. The program is We present hybrid concolic testing, an algorithm that in- then run on those inputs to check if programmer written as- terleaves random testing with concolic execution to obtain sertions hold, or in the absence of specifications, if a wide both a deep and a wide exploration of program state space. range of program behaviors including corner cases are exer- Our algorithm generates test inputs automatically by inter- cised. Random testing scales well in the sense that the time leaving random testing until saturation with bounded ex- taken to run the program on an input does not incur addi- haustive symbolic exploration of program points. It thus tional overhead beyond program execution. However, ran- combines the ability of random search to reach deep pro- dom testing does not guarantee correctness, and more dis- gram states quickly together with the ability of concolic test- turbingly, the range of behaviors covered for large programs ing to explore states in a neighborhood exhaustively. We is often vanishingly small in comparison to all the possible have implemented our algorithm on top of CUTE and ap- behaviors of the program. As a consequence, many bugs re- plied it to obtain better branch coverage for an editor im- main after random testing. Thus, while random testing can plementation (VIM 5.7, 150K lines of code) as well as a reach deep states of the program state space by executing data structure implementation in C. Our experiments sug- a large number of very long program paths quickly, it fails gest that hybrid concolic testing can handle large programs to be wide, that is, to capture a large variety of program and provide, for the same testing budget, almost 4× the behaviors. branch coverage than random testing and almost 2× that The inadequacy of random test input generation has led of concolic testing. to several symbolic techniques that execute a program using symbolic values in place of concrete inputs [19, 6, 30, 28, Categories and Subject Descriptors: D.2.5 [Software En- 2, 32, 33]. Precisely, the program is supplied symbolic con- gineering]: Testing and debugging. stants for inputs, and every assignment along an execution General Terms: Verification, Reliability. path updates the program state with symbolic expressions Keywords: directed random testing, concolic testing. and every conditional along the path generates a constraint in terms of the symbolic inputs. The goal is then to generate 1 Introduction concrete inputs that satisfy the constraints generated along a symbolic execution path: these inputs are guaranteed to execute along this path. Moreover, different symbolic ex- Testing is the primary way to find bugs in software. Test- ecutions generate different program behaviors, leading to ing using manually generated test cases is the primary tech- better coverage. nique used in industry to improve reliability of software—in Recently, concolic testing [14, 25, 4] has been proposed fact, manual testing accounts for 50–80% of the typical cost as a variant of symbolic execution where symbolic execu- of software development. However, manual test input gen- tion is run simultaneously with concrete executions, that eration is expensive, error-prone, and usually not exhaus- is, the program is simultaneously executed on concrete and tive. symbolic values, and symbolic constraints generated along With the increasing power of computers and advances in the path are simplified using the corresponding concrete val- theorem proving and constraint solving technologies, there ues. The symbolic constraints are then used to incremen- has been a renewed interest in automated testing. A sim- tally generate test inputs for better coverage by conjoining ple and often effective technique for automated testing is symbolic constraints for a prefix of the path with the nega- ∗This research was sponsored in part by the grants NSF-CCF-0427202 tion of a conditional taken by the execution. The primary and NSF-CCF-0546170. advantage of concolic execution over pure symbolic simu- lation is the presence of concrete (data and address) values, generate σ, concolic testing gets stuck in exploring a huge which can be used both to reason precisely about complex number of program paths before even reaching the state s. data structures as well as to simplify constraints when they We give a few examples of this behavior. For example, in go beyond the capability of the underlying constraint solver. a web server, each connection maintains a state machine In practice, however, both for symbolic and concolic ex- that moves the server between various states: disconnected, ecution, the possible number of paths that must be consid- connected, reading, etc. Random testing can provide the in- ered symbolically is so large that the methods end up ex- puts necessary to reach particular states of the machine, for ploring only small parts of the program state space, and example, when the server is processing a request, by gen- those that can be reached by “short” runs from the ini- erating inputs that exercise the “common case.” However, tial state, in reasonable time. Furthermore, maintaining from a particular state, the server can consider a specific and solving symbolic constraints along execution paths be- sequence of events to account for application specific rules comes expensive as the length of the executions grow. Thus (for example, the server must disconnect if a user name that previous applications of these techniques have been limited is not registered requests a special command) which are not to small units of code [25] and path lengths of at most about found by randomly setting the inputs. Similarly, in a text fifty thousand basic blocks [18]. That is, although wide, in editor, random inputs can get the system into a state where that different program paths are explored exhaustively, sym- there is enough data in the editor’s buffers so that certain bolic and concolic techniques are inadequate in exploring commands (for example, delete lines or format paragraphs) the deep states reached only after long program executions. are enabled. This is unfortunate, since concolic techniques hold most As the examples indicate, hybrid concolic testing is most promise for larger and complicated pieces of code for which suitable for testing reactive programs that periodically get generating test suites with good coverage of corner case be- input from their environment. Examples of such programs havior is most crucial. A natural question then is how to include editors, network servers, simple GUI based pro- combine the strengths of random testing and concolic sim- grams, event based systems, embedded systems, and sensor ulation to achieve both a deep and a wide exploration of the networks. On the other hand, transformational programs, program state space. that get some fixed input initially, are not suitable for hy- We present hybrid concolic testing, a simple algorithm brid concolic testing, since the future behavior cannot be af- that interleaves the application of random tests with con- fected by symbolic execution after the initial input has been colic testing to achieve deep and wide exploration of the set. program state space. From the initial program state, hy- In the end, hybrid concolic testing has the same limita- brid concolic testing starts by performing random testing tions of symbolic execution based test generation: the dis- to improve coverage. When random testing saturates, that covery of uncovered points depends on the scalability and is, does not produce any new coverage points after run- expressiveness of the constraint solver, and the exhaustive ning some predetermined number of steps, the algorithm search for uncovered points is limited by the number of automatically switches to concolic execution from the cur- paths to be explored. Therefore, in general, hybrid concolic rent program state to perform an exhaustive bounded depth testing may not achieve 100% coverage, although it can im- search for an uncovered coverage point. As soon as one is prove random testing considerably. Further, the algorithm is found, the algorithm reverts back to concrete mode. The not a panacea for all software quality issues. While we pro- interleaving of random testing and concolic execution thus vide an automatic mechanism for test input generation, all uses both the capacity of random testing to inexpensively the other effort required in testing, for example, test oracle generate deep program states through long program execu- generation, assertion based verification, and mock environ- tions and the capability of concolic testing to exhaustively ment creation still have to be performed as with any other and symbolically search for new paths with a limited looka- test input generation algorithm. Further, we look for code head. coverage, which may or may not be an indicator of code The interleaving of random and symbolic techniques is reliability. the crucial insight that distinguishes hybrid concolic testing We have implemented hybrid concolic testing on top of from a na¨ıve approach that simply runs random and con- the CUTE tool for concolic testing [25] and applied it to colic tests in parallel on a program. This is because many achieve high branch coverage for C programs. In our pre- programs show behaviors where the program must reach a liminary experiments, we compare random, concolic, and particular state s and then follow a precise sequence of in- hybrid concolic testing on the VIM text editor (150K lines put events σ in order to get to a required coverage point. of C code) and on an implementation of the red-black tree It is often easy to reach s using random testing, but not data structure.