Variably Interprocedural Program Analysis for Runtime Error Detection
Aaron Tomb, Univ. of California, Santa Cruz, Santa Cruz, CA, USA
Guillaume Brat, RIACS/NASA Ames, Moffett Field, CA, USA
Willem Visser, SEVEN Networks, Redwood City, CA, USA

∗ This author contributed while at RIACS/NASA Ames, Moffett Field, CA

ABSTRACT

This paper describes an analysis approach based on a combination of static and dynamic techniques to find run-time errors in Java code. It uses symbolic execution to find constraints under which an error (e.g., a null pointer dereference, array out of bounds access, or assertion violation) may occur and then solves these constraints to find test inputs that may expose the error. It only alerts the user to the possibility of a real error when it detects the expected exception during a program run.

The analysis is customizable in two important ways. First, we can adjust how deeply to follow calls from each top-level method. Second, we can adjust the path termination condition for the symbolic execution engine to be either a bound on the path condition length or a bound on the number of times each instruction can be revisited.

We evaluated the tool on a set of benchmarks from the literature as well as a number of real-world systems that range in size from a few thousand to 50,000 lines of code. The tool discovered all known errors in the benchmarks (as well as some not previously known) and reported on average 8 errors per 1,000 lines of code for the industrial examples. In both cases the interprocedural call depth played little role in the error detection. That is, an intraprocedural analysis seems adequate for the class of errors we detect.

Categories and Subject Descriptors

D.2.5 [Software Engineering]: Testing and Debugging: Symbolic execution

General Terms

Reliability, Verification

Keywords

Test generation, symbolic execution, defect detection

Copyright 2006 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the U.S. Government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
ISSTA'07, July 9–12, 2007, London, England, United Kingdom.
Copyright 2007 ACM 978-1-59593-734-6/07/0007 ...$5.00.

1. INTRODUCTION

As software becomes more complex, and achieves an increasingly critical role in the world's infrastructure, the ability to uncover defects becomes more and more crucial. Software flaws are estimated to cost the United States economy alone tens of billions of dollars annually [24]. In response, many approaches to automated defect detection have been explored.

Program analysis tools aimed at defect detection for procedural or object-oriented languages can be divided into path-sensitive and path-insensitive approaches. Path-insensitive approaches, such as those based on abstract interpretation, are traditionally used to show the absence of errors and can produce large numbers of spurious warnings, unless the domain of programs and/or the error classes are suitably restricted, as in the ASTRÉE tool [5]. Path-sensitive approaches tend to be focused on finding errors and either use whole program analysis (from some main method), or intraprocedural analysis, where each procedure is considered separately. Software model checkers, such as Java PathFinder [30], BOGOR [25], and SPIN [15], take the former approach and static analyzers, such as ESC/Java [12], take the latter approach.

Because we are interested in error detection, we will focus on the path-sensitive approach here. In particular we want to investigate the spectrum of analyses that fall between the extremes of whole program analysis and intraprocedural analysis. In addition, since it is often hard to determine the specification a program should adhere to, we will focus on implicit correctness properties that, when violated, result in Java runtime exceptions. Specifically, we look for null pointer dereferences, out-of-bounds array accesses, negative array size exceptions, class cast exceptions, divisions by zero, and assertion violations.

Even when doing a path-sensitive analysis, spurious warnings can still occur. The negative impact of such false warnings on the successful uptake of an error detection tool cannot be stressed enough; for every warning, a human must consider all contexts in which the code can be executed to determine if the error is real or not, and this is a daunting task. Our goal is to try and reduce this effort as much as possible. More precisely our analysis has zero spurious warnings within the context of the top-level method being analyzed. In other words, if it is possible for the top-level method to be called with the discovered error-inducing inputs, then the error reported is real. Of course it is still possible that, in the context of the rest of the program, the method cannot be called with the inputs that expose the error. However, at the very least, this shows an assumption being made by the method that is not explicitly checked in the code.

In order to reduce spurious warnings, we first ensure that we prune most infeasible paths during symbolic execution by passing the current path condition (i.e., the constraints on the input required to reach the current state) to a decision procedure to check for satisfiability. Next, when we detect a possible error we solve the path condition to obtain test inputs that may expose the error. Finally, we run the code with these inputs and only report an error if the test raises the expected exception.

The Check 'n' Crash tool [8] takes a similar approach. It uses ESC/Java [12] to generate constraints and JCrasher [7] to execute the derived tests. In contrast, we use our own symbolic execution engine to determine the effect of interprocedural analysis on the error detection capability (whereas ESC/Java uses only an intraprocedural analysis). We start by doing an intraprocedural analysis and then in the subsequent analyses follow call chains 1 level deep, 2 levels deep, and so on, while keeping track of the errors detected. We also allow the termination condition for the analysis of a path to be customized: we can either set a bound on the maximum size of the path condition or set a limit on how many times a specific bytecode instruction can be revisited.

We have evaluated our tool on the same small examples used by Check 'n' Crash [8] and also on five larger examples which range from 3,000 lines of code (LOC) to 48,000 LOC. For the small examples we discovered all the errors reported by Check 'n' Crash, plus a few more. For the larger examples, we found an average of about 8 errors per 1,000 LOC. During the evaluation we calculated a number of statistics on the performance of the technique that resulted in the following interesting observations:

• The level of interprocedural analysis played no noticeable role in the discovery of new errors. However, a more deeply interprocedural analysis did result in more constrained path conditions and thus fewer warnings. Increasing the level of interprocedural analysis greatly increased the execution time.

• For the benchmark examples, using a decision procedure to detect infeasible paths pruned away very few paths in an intraprocedural analysis (less than 10%, on average), but this percentage increased closer to 20% during interprocedural analysis.

• Null pointer errors dominated all others. In most cases they accounted for more than 85% of all errors found.

• The number of errors discovered per 1,000 LOC seems to be a good indication of the code quality. For mature and well used code, we discovered around 1 error/1,000 LOC, whereas research prototype code averaged around 10 errors/1,000 LOC.

• The type and number of errors found can indicate coding style. One example had fewer null pointer dereferences since many methods compared their inputs to null before proceeding.

The main conclusion of our work is that an intraprocedural analysis using symbolic execution and test generation seems to be sufficient to find a large number of possible errors. We conjecture, however, that to find deeper behavioral errors this simple approach may not be as effective.

The following section describes variably interprocedural analysis more formally. Section 3 then shows a simple example that illustrates the advantages of our approach. Section 4 describes the implementation of our tool, along with our experience building it, while Section 5 describes its performance on a variety of target programs. In Section 6 we discuss a few optimizations that we'd like to do in the future. Finally, Section 7 gives an overview of related work, and Section 8 contains some concluding remarks.

2. VARIABLY INTERPROCEDURAL ANALYSIS

Variably interprocedural analysis operates on a configurable set M of top-level methods. For whole-program analysis this set is the singleton set containing the entry point of the program. For intraprocedural analysis, it is usually the set of all methods in the program. Other analyses may use different sets.

We associate each method m in M with a call depth CD_m. The call depth CD_m indicates how many levels of sub-calls of m will be explicitly analyzed when beginning with m at the top level.

The process of analyzing a method m with a call depth CD_m is as follows:

• If CD_m < 0, stop.
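The set M, the per-method call depth, and the stopping rule described in this section can be sketched in Java as follows. This is only an illustration under stated assumptions: the names `AnalysisConfig`, `wholeProgram`, `intraprocedural`, and `shouldAnalyze` are hypothetical and not taken from the tool, methods are identified by plain strings, and giving every method depth 0 in the intraprocedural case is our reading of "how many levels of sub-calls" rather than something the text states.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: holds the set M of top-level methods and,
// for each method m in M, its call depth CD_m.
class AnalysisConfig {
    private final Map<String, Integer> callDepths = new LinkedHashMap<>();

    // Whole-program analysis: M is the singleton set containing the entry point.
    // Letting the caller choose the depth is an assumption of this sketch.
    public static AnalysisConfig wholeProgram(String entryPoint, int depth) {
        AnalysisConfig config = new AnalysisConfig();
        config.callDepths.put(entryPoint, depth);
        return config;
    }

    // Intraprocedural analysis: M is usually the set of all methods.
    // Depth 0 (follow no sub-calls) is an assumption of this sketch.
    public static AnalysisConfig intraprocedural(List<String> allMethods) {
        AnalysisConfig config = new AnalysisConfig();
        for (String method : allMethods) {
            config.callDepths.put(method, 0);
        }
        return config;
    }

    // Read-only view of M together with each CD_m.
    public Map<String, Integer> callDepths() {
        return Collections.unmodifiableMap(callDepths);
    }

    // Stopping rule from the text: a negative call depth means "stop".
    public static boolean shouldAnalyze(int callDepth) {
        return callDepth >= 0;
    }
}
```

Under these assumptions, a configuration between the two extremes would simply populate the map with a chosen subset of methods and a positive depth for each.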