focus: software development tools

Automating Software Testing Using Program Analysis

Patrice Godefroid, Peli de Halleux, Aditya V. Nori, Sriram K. Rajamani, Wolfram Schulte, and Nikolai Tillmann, Microsoft Research

Michael Y. Levin, Microsoft Center for Software Excellence

Three new tools combine techniques from static program analysis, dynamic analysis, model checking, and automated constraint solving to automate test generation in varied application domains.

During the last decade, code inspection for standard programming errors has largely been automated with static code analysis. Commercial static program-analysis tools are now routinely used in many organizations.1 These tools are popular because they find many software bugs, thanks to three main ingredients: they're automatic, they're scalable, and they check many properties. Intuitively, any tool that can automatically check millions of lines of code against hundreds of coding rules is bound to find on average, say, one bug every thousand lines of code.

Our long-term goal is to automate, as much as possible, an even more expensive part of the software development process, namely software testing. Testing usually accounts for about half the R&D budget of software development organizations. In particular, we want to automate test generation by leveraging recent advances in program analysis, automated constraint solving, and modern computers' increasing computational power. To replicate the success of static analysis tools, we need the same three key ingredients found in those tools.

A key technical challenge is automatic code-driven test generation: given a program with a set of input parameters, automatically generate a set of input values that, upon execution, will exercise as many program statements as possible. An optimal solution is theoretically impossible because this problem is generally undecidable. In practice, however, approximate solutions suffice: a tool that could automatically generate a test suite covering, say, even half the code of a million-line C program would have tremendous value.

Such a tool doesn't exist today, but in this article, we report on some recent significant progress toward that goal. Although automating test generation using program analysis is an old idea,2 practical tools have only started to emerge during the last few years. This recent progress was partly enabled by advances in dynamic test generation,3 which generalizes and is more powerful than traditional static test generation.

At Microsoft, we are developing three new tools for automatic code-driven test generation. These tools all combine techniques from static program analysis (symbolic execution), dynamic analysis (testing and runtime instrumentation), model checking (systematic state-space exploration), and automated constraint solving. However, they target different application domains and include other original techniques.

Static versus dynamic test generation

Work on automatic code-driven test generation can roughly be partitioned into two groups: static and dynamic.

Static test generation consists of analyzing a program P statically, by reading the program code and using symbolic execution techniques to simulate abstract program executions and compute inputs that drive P along specific execution paths or branches, without ever executing the program.2 The idea is to symbolically explore the tree of all computations that the program exhibits with all possible value assignments to input parameters. For each control path p (that is, a sequence of the program's control locations), symbolic execution constructs a path constraint that characterizes the input assignments for which the program executes along p. A path constraint is thus a conjunction of constraints on input values. If a path constraint is satisfiable, then the corresponding control path is feasible. We can enumerate all the control paths by considering all possible branches at conditional statements. Assuming that the constraint solver used to check the satisfiability of all path constraints is sound and complete, this use of static analysis amounts to a kind of symbolic testing.

Unfortunately, this approach doesn't work whenever the program contains statements involving constraints outside the constraint solver's scope of reasoning. The following example illustrates this limitation:

int obscure(int x, int y) {
  if (x == hash(y)) return -1; // error
  return 0;                    // ok
}

Assume the constraint solver can't "symbolically reason" about the function hash. This means that the constraint solver can't generate two values for inputs x and y that are guaranteed to satisfy (or violate) the constraint x == hash(y). (For instance, if hash is a hash or cryptographic function, it has been mathematically designed to prevent such reasoning.) In this case, static test generation can't generate test inputs to drive this program's execution through either branch of its conditional statement; static test generation is helpless for a program like this. In other words, static test generation is doomed to perform poorly whenever perfect symbolic execution is impossible. Unfortunately, this is frequent in practice owing to complex program statements (pointer manipulations, arithmetic operations, and so on) and calls to operating-system and library functions that are hard or impossible to reason about symbolically with good enough precision.

Dynamic test generation,4 on the other hand, consists of

■ executing the program P, starting with some given or random inputs;
■ gathering symbolic constraints on inputs at conditional statements along the execution; and
■ using a constraint solver to infer variants of the previous inputs to steer the program's next execution toward an alternative program branch.

This process is repeated until a specific program statement is reached.

Directed Automated Random Testing (DART)3 is a recent variant of dynamic test generation that blends it with model-checking techniques to systematically execute all of a program's feasible program paths, while checking each execution using runtime checking tools (such as Purify) for detecting various types of errors. In a DART directed search, each new input vector tries to force the program's execution through some new path. By repeating this process, such a directed search attempts to force the program to sweep through all its feasible execution paths, similarly to systematic testing and dynamic software model checking.5

In practice, a directed search typically can't explore all the feasible paths of large programs in a reasonable amount of time. However, it usually does achieve much better coverage than pure random testing and, hence, can find new program bugs. Moreover, it can alleviate imprecision in symbolic execution by using concrete values and randomization: whenever symbolic execution doesn't know how to generate a constraint for a program statement depending on some inputs, we can always simplify this constraint using those inputs' concrete values.3

Let's illustrate this point with our previous program. Even though it's statically impossible to generate two values for inputs x and y such that the constraint x == hash(y) is satisfied (or violated), it's easy to generate, for a fixed value of y, a value of x that is equal to hash(y), because the latter is known dynamically at runtime.

A directed search would proceed as follows. For a first program run, pick random values for inputs x and y: for example, x = 33, y = 42. Assuming the concrete value of hash(42) is 567, the first concrete run takes the else branch of the conditional statement (since 33 ≠ 567), and the path constraint for this first run is x ≠ 567, because the expression hash(y) isn't representable and is therefore simplified with its concrete value 567. Next, the negation of the simplified constraint, x = 567, can easily be solved and leads to a new input assignment x = 567, y = 42. Running the program a second time with inputs x = 567, y = 42 leads to the error.

Therefore, static test generation is unable to generate test inputs to control this program's execution, but dynamic test generation can easily drive that same program's executions through all its feasible program paths. In realistic programs, imprecision in symbolic execution typically arises in many places, and dynamic test generation can recover from that imprecision. Dynamic test generation extends static test generation with additional runtime information, so it's more general and powerful. It is the basis of the three tools we present in the remainder of this article, which all implement variants and extensions of a directed search.
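To make the directed search concrete, here is a minimal, self-contained C# sketch of the two runs just described. It is only an illustration: the body of Hash is arbitrary, the class and helper names are invented, and the trivial "solver" step (picking x equal to the observed value) stands in for the automated constraint solver a real tool would call.

using System;

// A minimal sketch of one directed-search iteration on the obscure example.
// Hash stands in for a function the solver can't reason about symbolically.
static class DirectedSearchSketch
{
    static int Hash(int y) => unchecked((y ^ 0x5bd1e995) * 31);   // arbitrary, illustrative body

    // The program under test: -1 marks the "error" branch.
    static int Obscure(int x, int y) => x == Hash(y) ? -1 : 0;

    static void Main()
    {
        var rng = new Random();
        int x = rng.Next(), y = rng.Next();              // run 1: random inputs

        // Executing concretely records the concrete value of hash(y), so the
        // path constraint of run 1 simplifies to: x != observed.
        int observed = Hash(y);
        Console.WriteLine($"run 1: obscure({x},{y}) = {Obscure(x, y)}; path constraint: x != {observed}");

        // Negate the simplified constraint (x == observed) and "solve" it,
        // which here just means picking x = observed.
        x = observed;
        Console.WriteLine($"run 2: obscure({x},{y}) = {Obscure(x, y)}; error branch reached");
    }
}

Running this prints the path constraint of the first run and reaches the error branch on the second run, exactly the two-step scenario described above.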

SAGE: White-box fuzz testing for security

Security vulnerabilities (like buffer overflows) are a class of dangerous software defects that can let an attacker cause unintended behavior in a software component by sending it particularly crafted inputs. Fixing security vulnerabilities after a product release is expensive because it might involve deploying hundreds of millions of patches and tying up the resources of many engineers. A single vulnerability can cost millions of dollars.

Fuzz testing is a black-box testing technique that has recently leapt to prominence as a quick and cost-effective method for uncovering security bugs. This approach involves randomly mutating well-formed inputs and testing the program on the resulting data.6 Although fuzz-testing tools can be remarkably effective, their ability to discover bugs on low-probability program paths is inherently limited.

We've recently proposed a conceptually simple but different approach, white-box fuzz testing, that extends systematic dynamic test generation.7 We implemented this approach in SAGE (scalable, automated, guided execution), a new tool using instruction-level tracing and emulation for white-box fuzzing of Windows applications.

SAGE architecture

SAGE repeatedly performs four main tasks.

1. The tester executes the test program on a given input under a runtime checker looking for various kinds of runtime exceptions, such as hangs and memory access violations.
2. The coverage collector collects instruction addresses executed during the run; instruction coverage is used as a heuristic to favor the expansion of executions with high new coverage.
3. The tracer records a complete instruction-level trace of the run using the iDNA framework.8
4. Lastly, the symbolic executor replays the recorded execution, collects input-related constraints, and generates new inputs using the constraint solver Disolver.9

The symbolic executor is implemented on top of the trace replay infrastructure TruScan,10 which consumes trace files generated by iDNA and virtually re-executes the recorded runs. TruScan offers several features that substantially simplify symbolic execution. These include instruction decoding, providing an interface to program symbol information, monitoring various I/O system calls, keeping track of heap and stack frame allocations, and tracking the data flow through the program structures.

The constraint generation approach SAGE uses differs from previous dynamic test generation implementations in two main ways. First, instead of source-based instrumentation, SAGE adopts a machine-code-based approach. This lets us use SAGE on any target program regardless of its source language or build process with little up-front cost. This is important in a large company that uses multiple source languages (both managed and unmanaged) and various incompatible build processes that make source-based instrumentation difficult. Second, SAGE deviates from previous approaches by using offline trace-based, rather than online, constraint generation. Indeed, hard-to-control nondeterminism in large target programs makes debugging online constraint generation difficult. Thanks to offline tracing, constraint generation in SAGE is completely deterministic because it works with an execution trace that captures the outcome of all nondeterministic events encountered during the recorded run.

Generational path exploration

Because SAGE targets large applications where a single execution might contain hundreds of millions of instructions, symbolic execution is its slowest component. Therefore, SAGE implements a novel directed search algorithm, dubbed generational search, that maximizes the number of new input tests generated from each symbolic execution. Given a path constraint, all the constraints in that path are systematically negated one by one, placed in a conjunction with the prefix of the path constraint leading to it, and attempted to be solved by a constraint solver. (In contrast, a standard depth-first or breadth-first search would negate only the last or first constraint in each path constraint.)
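As an illustration, the following self-contained C# sketch replays a generational search on the 4-byte program of Figure 1 (shown next). Because each branch of that program tests exactly one input byte, "solving" a negated constraint reduces to flipping that byte while keeping the preceding bytes unchanged; this shortcut, like the Seed, Magic, HitsError, and Children names, is invented for the sketch and stands in for the constraint solver a real implementation would use.

using System;
using System.Collections.Generic;
using System.Linq;

// Generational search on the example of Figure 1. The path constraint of a run
// is, for each position k, either input[k] == "bad!"[k] or input[k] != "bad!"[k],
// so negating constraint k while keeping the prefix amounts to flipping byte k.
static class GenerationalSearchSketch
{
    const string Seed = "good";   // generation 0 input
    const string Magic = "bad!";  // the characters the branches test for

    // The program under test: error when at least 3 of the 4 bytes match.
    static bool HitsError(string input) =>
        input.Zip(Magic, (a, b) => a == b ? 1 : 0).Sum() >= 3;

    // Expand one run: negate each branch constraint in turn, keep the prefix,
    // and reuse the parent's bytes for the (unconstrained) suffix.
    static IEnumerable<string> Children(string parent) =>
        Enumerable.Range(0, Magic.Length).Select(k =>
            parent.Substring(0, k)
            + (parent[k] == Magic[k] ? Seed[k] : Magic[k])   // the negated constraint
            + parent.Substring(k + 1));

    static void Main()
    {
        var seen = new HashSet<string> { Seed };
        var work = new Queue<string>(seen);
        while (work.Count > 0)
        {
            string input = work.Dequeue();
            if (HitsError(input)) Console.WriteLine($"error reached with input \"{input}\"");
            foreach (string child in Children(input))
                if (seen.Add(child)) work.Enqueue(child);    // new path: add to the worklist
        }
        Console.WriteLine($"{seen.Count} distinct inputs, one per feasible path");  // prints 16
    }
}

From the initial input good, the four Generation 1 children produced here are bood, gaod, godd, and goo!, matching Figure 1; the seen set is only bookkeeping that prevents re-exploring a path that was already covered.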

Figure 1. Example of program (a) and its search space (b) with the value of cnt at the end of each run.

(a)

void top(char input[4]) {
  int cnt = 0;
  if (input[0] == 'b') cnt++;
  if (input[1] == 'a') cnt++;
  if (input[2] == 'd') cnt++;
  if (input[3] == '!') cnt++;
  if (cnt >= 3) abort(); // error
}

(b) [Search-space tree, not reproduced: a binary tree whose 16 leaves correspond to the inputs good, goo!, godd, god!, gaod, gao!, gadd, gad!, bood, boo!, bodd, bod!, baod, bao!, badd, and bad!, annotated with the final value of cnt (0 to 4) for each run.]

Consider the program in Figure 1. This program takes 4 bytes as input and contains an error when the value of the variable cnt is greater than or equal to three. Starting with some initial input good, SAGE executes its four main tasks. The path constraint for this initial run is i0 ≠ b, i1 ≠ a, i2 ≠ d, and i3 ≠ !.

Figure 1 also shows the set of all feasible program paths for the function top. The left-most path represents the program's initial run and is labeled with 0 for Generation 0. Four Generation 1 inputs are obtained by systematically negating and solving each constraint in the Generation 0 path constraint. By repeating this process, all paths are eventually enumerated. (See related work7 for other benefits of a generational search as well as several optimizations that are key to handling long execution traces.)

Experience

On 3 April 2007, Microsoft released an out-of-band security patch for code that parses ANI-format animated cursors. The Microsoft SDL Policy weblog states that extensive black-box fuzz testing of this code failed to uncover the bug and that existing static-analysis tools aren't capable of finding the bug without excessive false positives.

In contrast, SAGE can generate a crash exhibiting this bug starting from a well-formed ANI input file, despite having no knowledge of the ANI format. We arbitrarily picked a seed file from a library of well-formed ANI files and analyzed a small test program that called user32.dll to parse ANI files. The initial run generated a path constraint with 341 branch constraints after parsing 1,279,939 instructions over 10,072 symbolic input bytes. SAGE then created a crashing ANI file after 7 hours 36 minutes and 7,706 test cases, using one core of a 2-GHz AMD Opteron 270 dual-core processor running 32-bit Windows Vista with 4 Gbytes of RAM.

SAGE is currently being used internally at Microsoft and has already found tens of previously unknown security-related bugs in various products.7

Pex: Automating unit testing for .NET

Although it's important to analyze existing programs to find and remove security vulnerabilities, automatic-analysis tools can help avoid such programming errors to begin with when developing new programs.

Unit testing is a popular way to ensure early and frequent testing while developing software. Unit tests are written to document customer requirements at the API level, reflect design decisions, protect against observable changes of implementation details, and, as part of the testing process, achieve certain code coverage. A unit test, as opposed to an integration test, should only exercise a single feature in isolation. This way, unit tests don't take long to run, so developers can run them often while writing new code. Because unit tests target only individual features, it's usually easy to locate errors from failing unit tests.

Many tools, such as JUnit and NUnit, support unit testing. These tools manage a set of unit tests and provide a way to run them and inspect the results. However, they don't automate the task of creating unit tests.

Figure 2. A glimpse of Pex in Visual Studio. [Screenshot not reproduced.]

Writing unit tests by hand is a laborious undertaking. In many projects at Microsoft, there are more lines of code for the unit tests than for the implementation being tested.

On the other hand, most fully automatic test-generation tools suffer from a common problem: they don't know when a test fails (except for obvious errors, such as a division-by-zero exception). To combine the advantages of automatic test generation with unit tests' error-detecting capabilities, we've developed a new testing methodology: the parameterized unit test (PUT).11

A PUT is simply a method that takes parameters. Developers write PUTs, just like traditional unit tests, at the level of the actual software APIs in the software project's programming language. The purpose of a PUT is to express an API's intended behavior. For example, the following PUT asserts that after adding an element to a non-null list, the element is indeed contained in the list:

void TestAdd(ArrayList list, object element) {
  Assume.IsNotNull(list);
  list.Add(element);
  Assert.IsTrue(list.Contains(element));
}

This PUT states assumptions on test inputs, performs a sequence of method calls, and asserts properties that should hold in the final state. (The initial assumptions and final assertions are similar to method preconditions and postconditions in the design-by-contract paradigm.)

Pex (for program exploration; http://research.microsoft.com/Pex) is a tool developed at Microsoft Research that helps developers write PUTs in a .NET language. For each PUT, Pex uses dynamic test-generation techniques to compute a set of input values that exercise all the statements and assertions in the analyzed program. For example, for our sample PUT, Pex generates two test cases that cover all the reachable branches:

void TestAdd_Generated1() {
  TestAdd(new ArrayList(0), new object());
}

void TestAdd_Generated2() {
  TestAdd(new ArrayList(1), new object());
}

The first test executes code (not shown here) in the array list that allocates more memory because the initial capacity 0 isn't sufficient to hold the added element. The second test initializes the array list with capacity 1, which is sufficient to add one element.

Pex comes with an add-in for Visual Studio that enables developers to perform the most frequent tasks with a few mouse clicks. Also, when Pex generates a test that fails, Pex performs a root cause analysis and suggests a code change to fix the bug.

Figure 2 illustrates Pex in Visual Studio. The upper code window shows a PUT; the lower window shows the results of Pex's analysis of this test combined with the code under test. On the lower left side is a table of generated input parameter values. The selected row indicates an assertion violation when the data argument is an array with two elements, each of which contains a zero. This violation is associated with a stack trace shown on the lower right side and the highlighted line in the upper code window.

In SAGE, the test input is usually just a sequence of bytes (a file); in contrast, Pex generates inputs that are typed according to the .NET type system. Besides primitive types (Boolean, integer, and so on), the inputs can be arrays, arrays of arrays, value types (struct in C#), or instances of other classes. When the formal type of an input is an interface or an abstract class, Pex can generate mock classes with methods that return different values that the tested code distinguishes. The result is similar to the behavior of mock objects that developers would write by hand today. The following example shows a method of a mock object that can choose to return any value:

class MFormatProvider : IFormatProvider {
  public object GetFormat(Type type) {
    return PexOracle.ChooseResult();
  }
}

However, the mock objects Pex automatically generates will often behave unexpectedly. When developers aren't interested in arbitrary mock objects, they can restrict them by specifying assumptions in a style similar to assumptions on a PUT's parameters.
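To illustrate that last point, one plausible way to express such a restriction is to state the assumption inside the mock itself, reusing only the PexOracle and Assume calls already shown above. The class name is invented for this sketch, and the exact Pex API for constraining generated mocks may differ.

using System;

// Hypothetical sketch: a mock whose chosen result is restricted by an assumption,
// so only executions in which the chosen value is non-null are considered.
class MRestrictedFormatProvider : IFormatProvider {
  public object GetFormat(Type type) {
    object result = PexOracle.ChooseResult();  // "any value," as in the example above
    Assume.IsNotNull(result);                  // restriction stated as an assumption
    return result;
  }
}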
Pex's analysis engine is based on dynamic test generation, and it uses Z3 as its constraint solver (http://research.microsoft.com/projects/Z3). After analyzing a PUT, Pex reports all the errors found. It can re-execute some of the generated tests to reproduce errors. It can also reuse the generated test suite later for regression testing.

We have applied Pex to large components of the .NET framework used by thousands of developers and millions of end users. Pex found several errors, some of which were serious and previously unknown. Pex is available under an academic license on the Microsoft Research Web site (http://research.microsoft.com/Pex), and we're actively working toward a tighter integration with Visual Studio.

Yogi: Combining testing and static analysis

Testing and static analysis have complementary strengths. Because testing executes a program concretely, it precludes false alarms, but it might not achieve high coverage. On the other hand, static analysis can cover all program behaviors at the cost of potential false alarms, because the analysis ignores several details about the program's state. For example, the SLAM project has successfully applied static analysis to check properties of Windows device drivers.12 Thus, it's natural to try to combine testing and static analysis. To combine them, the Yogi tool implements a novel algorithm, Dash13 (initially called Synergy), which was recently enhanced to handle pointers and procedures.

The Yogi tool verifies properties specified by finite-state machines representing invalid program behaviors. For example, we might want to check that along all paths in a program, for a mutex m, the calls acquire(m) and release(m) are made in strict alternation. Figure 3a shows a program that follows this rule. There is an unbounded number of paths through the loop in lines 1 through 5 if the loop count c is an unbounded input to the program. Thus, it's problematic to exercise all the feasible paths of the program using testing.

Yet Yogi can prove that acquire(m) and release(m) are correctly called along all paths by constructing a finite abstraction of the program that includes (that is, overapproximates) all its possible executions. A program's state is defined by a valuation of the program's variables. Programs might have an infinite number of states, denoted by Σ. The states of the finite abstraction, called regions, are equivalence classes of concrete program states from Σ. There is an abstract transition from region S to region S′ if there are two concrete states s ∈ S and s′ ∈ S′ such that there is a concrete transition from s to s′. Figure 3b shows a finite-state abstraction for the program in Figure 3a. This abstraction is isomorphic to the program's control-flow graph. By exploring all the abstraction's states, Yogi establishes that the calls to acquire(m) and release(m) always occur in strict alternation.

One of Yogi's unique features is that it simultaneously searches for both a test to establish that the program violates the property and an abstraction to establish that the program satisfies the property. If the abstraction has a path that leads to the property's violation, Yogi attempts to focus test-case generation along that path. If such a test case can't be generated, Yogi uses information from the unsatisfiable constraint returned by the test-case generator to refine the abstraction. Thus, the construction of test cases and abstractions proceeds hand in hand, using error traces in the abstraction to guide test-case generation and constraints from failed test-case generation attempts to guide refinement of the abstraction.
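This back and forth can be written down as a simple loop. The following C# skeleton is only a schematic rendering of it: the IAbstraction and ITestGenerator interfaces, their members, and the Verdict type are hypothetical names for the operations the text mentions, not Yogi's actual interfaces, and no implementations are supplied.

using System.Collections.Generic;

interface IAbstraction
{
    // An abstract path from the initial region to the error region, or null if none exists.
    IReadOnlyList<string> FindAbstractErrorPath();
    // Split regions using the constraint returned by a failed test-generation attempt.
    void Refine(string unsatisfiableConstraint);
}

interface ITestGenerator
{
    // Try to build a concrete input whose execution follows the abstract path.
    bool TryGenerateTestAlong(IReadOnlyList<string> abstractPath,
                              out string failingInput, out string unsatisfiableConstraint);
}

enum Verdict { PropertyProved, PropertyViolated }

static class DashSketch
{
    static Verdict CheckProperty(IAbstraction abstraction, ITestGenerator tests)
    {
        while (true)
        {
            IReadOnlyList<string> errorPath = abstraction.FindAbstractErrorPath();
            if (errorPath == null)
                return Verdict.PropertyProved;       // no abstract error path: the abstraction is a proof

            if (tests.TryGenerateTestAlong(errorPath, out string input, out string unsatCore))
                return Verdict.PropertyViolated;     // a concrete run reaches the error; input is the witness

            abstraction.Refine(unsatCore);           // the failed attempt tells the tool where to split regions
        }
    }
}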

Figure 3. Program example (a) and a finite abstraction (b). Yogi uses static analysis and the abstraction to avoid exploring infinitely many paths.

(a)

void prove-me1(mutex *m, count c, bool b[], int n)
{
   // assume that the array b has n elements
   int x = 0; y = 0;
0: acquire(m);
1: for (i = 0; i < c; i++) {
2:   int index = c%n; // assign index = c mod n
3:   if (b[index])
4:     x = x + 1;
     else
5:     y = y + 1;
   }
6: release(m);
}

(b) [Abstraction graph, not reproduced: regions 0 through 7 connected by the program's actions, namely x = 0; y = 0; acquire(m); i = 0, then assume(i < c), index = c%n, assume(b[index]) or assume(!b[index]), x = x + 1 or y = y + 1, i = i + 1 back to the loop head, and assume(!(i < c)) followed by release(m).]

Using test cases to refine abstractions is particularly handy if the program has pointer variables that potentially alias each other (that is, they might point to the same object). Consider the following program:

struct DE {
  int lock;
  int y;
};

void prove-me2(DE *p, DE *p1, DE *p2)
{
0: p1 = malloc(sizeof(DE)); p1->lock = 0;
1: p2 = malloc(sizeof(DE)); p2->lock = 0;
2: p->lock = 1;
3: if (p1->lock == 1 || p2->lock == 1)
4:   error();
5: p = p1;
6: p = p2;
}

This program has three inputs p, p1, and p2, all of which are non-null pointers to structs of type DE (with two fields DE->lock and DE->y). At lines 0 and 1, pointers p1 and p2 are pointed to newly allocated memory, and p1->lock and p2->lock are both set to 0. Thus, the assignment to p->lock at line 2 can't affect the values of p1->lock or p2->lock, and the error statement at line 4 can never be reached. Note that p might alias with p1 or p2 because of the assignments at lines 5 and 6. Thus, a flow-insensitive, may-alias static analysis, as implemented in tools such as SLAM, will have to conservatively assume that at the assignment at line 2, the variable p may alias with p1 or p2, and consider all possible alias combinations. However, Yogi can prove line 4 is unreachable while only considering the alias combination (p ≠ p1 ∧ p ≠ p2) that occurs along all concrete executions. (Because this program has only one feasible execution path, SAGE and Pex would be able to prove this, too.)

Although a simple flow-sensitive path analysis would handle this example, real C programs have lots of pointers and procedures, which make fully precise context-sensitive, path-sensitive analysis problematic for large programs. Instead, Yogi leverages test executions to get precise information about pointers.

We've used Yogi to analyze several examples of Windows device drivers. Yogi was able to prove or find bugs in several device drivers where SLAM times out due to explosions in the number of aliasing possibilities. (More details on this are available elsewhere.13)

The tools we describe here might give us a glimpse of what the future of software-defect detection could be. In a few years, many mainstream bug-finding tools might be able to generate both a concrete input exhibiting each bug found (with no false alarms) and an abstract execution trace, expressed in terms of predicates on program inputs, that would be useful for understanding the key input properties explaining the bug. Such tools would be automatic, scalable (compositional14 and incremental), and efficient (thanks to the combination with static analysis), and would check many properties at once. They would also be integrated with other emerging techniques such as mock-object creation and software contracts, and would enable and be supported by new, more productive software development and testing processes.

References
 1. J. Larus et al., "Righting Software," IEEE Software, vol. 21, no. 3, May/June 2004, pp. 92–100.
 2. J.C. King, "Symbolic Execution and Program Testing," Comm. ACM, vol. 19, no. 7, 1976, pp. 385–394.
 3. P. Godefroid, N. Klarlund, and K. Sen, "DART: Directed Automated Random Testing," Proc. Conf. Programming Language Design and Implementation (PLDI 05), ACM Press, 2005, pp. 213–223.
 4. B. Korel, "A Dynamic Approach of Test Data Generation," Proc. Conf. Software Maintenance (ICSM 90), IEEE CS Press, 1990, pp. 311–317.
 5. P. Godefroid, "Model Checking for Programming Languages Using VeriSoft," Proc. Ann. Symp. Principles of Programming Languages (POPL 97), ACM Press, 1997, pp. 174–186.
 6. J.E. Forrester and B.P. Miller, "An Empirical Study of the Robustness of Windows NT Applications Using Random Testing," Proc. 4th Usenix Windows System Symp., Usenix Assoc., 2000, pp. 59–68.
 7. P. Godefroid, M.Y. Levin, and D. Molnar, "Automated Whitebox Fuzz Testing," Proc. 15th Ann. Network and Distributed System Security Symp. (NDSS 08), Internet Society, 2008; www.isoc.org/isoc/conferences/ndss/08/papers/10_automated_whitebox_fuzz.pdf.
 8. S. Bhansali et al., "Framework for Instruction-Level Tracing and Analysis of Programs," Proc. 2nd ACM/Usenix Int'l Conf. Virtual Execution Environments (VEE 06), ACM Press, 2006, pp. 154–163.
 9. Y. Hamadi, Disolver: The Distributed Constraint Solver Version 2.44, tech. report, Microsoft Research, 2006; http://research.microsoft.com/~youssefh/DisolverWeb/disolver.pdf.
10. S. Narayanasamy et al., "Automatically Classifying Benign and Harmful Data Races Using Replay Analysis," Proc. Conf. Programming Language Design and Implementation (PLDI 07), ACM Press, 2007, pp. 22–31.
11. N. Tillmann and W. Schulte, "Parameterized Unit Tests," Proc. 10th European Software Eng. Conf. and 13th ACM Sigsoft Int'l Symp. Foundations of Software Eng. (ESEC/Sigsoft FSE), ACM Press, 2005, pp. 241–244.
12. T. Ball and S.K. Rajamani, "Automatically Validating Temporal Safety Properties of Interfaces," Proc. 8th SPIN Workshop (SPIN 01), Springer, 2001, pp. 103–122.
13. N.E. Beckman et al., "Proofs from Tests," Proc. 2008 Int'l Symp. Software Testing and Analysis (ISSTA 08), ACM Press, 2008, pp. 3–14.
14. P. Godefroid, "Compositional Dynamic Test Generation," Proc. Ann. Symp. Principles of Programming Languages (POPL 07), ACM Press, 2007, pp. 47–54.

About the Authors

Patrice Godefroid is a principal researcher at Microsoft Research. His research interests include program specification, analysis, testing, and verification. Godefroid received his PhD from the University of Liege, Belgium. Contact him at [email protected].

Peli de Halleux is an engineer at Microsoft Research. His research involves combining dynamic and static program analysis techniques for automatic test-case generation and making those accessible to the masses of developers. de Halleux received his PhD in applied mathematics from the Catholic University of Louvain. Contact him at [email protected].

Aditya V. Nori is a researcher at Microsoft Research India. His research interests include static, dynamic, and statistical analysis of programs and tools for improving software reliability and programmer productivity. Nori received his PhD in computer science from the Indian Institute of Science, Bangalore. Contact him at [email protected].

Sriram K. Rajamani is a principal researcher and manager of the Rigorous Software Engineering group at Microsoft Research India. His research interests are in tools and methodologies for building reliable systems. Rajamani received his PhD in computer science from the University of California at Berkeley. Contact him at [email protected].

Wolfram Schulte is a principal researcher at Microsoft Research. His research interests include the practical application of formal techniques to improve programs' correctness and reliability. Schulte received his habilitation degree in computer science from the University of Ulm. Contact him at [email protected].

Nikolai Tillmann is a senior research software design engineer at Microsoft Research. His research involves combining dynamic and static program analysis techniques for automatic test-case generation. Tillmann received his MS in computer science from the Technical University of Berlin. Contact him at [email protected].

Michael Y. Levin leads the Runtime Analysis group at the Microsoft Center for Software Excellence. His interests include automated test generation, anomaly detection and debugging in distributed systems, and scalable log analysis. Levin received his PhD in computer science from the University of Pennsylvania. Contact him at [email protected].

