General, Scalable Path-Sensitive Fault Detection

General, Scalable Path-Sensitive Fault Detection Wei Le Mary Lou Soffa Department of Computer Science Department of Computer Science University of Virginia University of Virginia [email protected] [email protected] ABSTRACT software developers, and code is only allowed to check in Recent research has shown that program paths are impor- after it passes the inspection of those tools. Research has tant in static analysis for detecting and reporting faults. shown that program path information is essential in static However, developing a scalable path-sensitive analysis is chal- analysis for achieving precise fault detection, especially for lenging. Often the techniques only address one particular complex faults [5, 14, 20]. In path-sensitive analysis, identi- type of fault and require much manual effort to tune for the fied infeasible paths can be excluded, and program facts col- desirable scalability and precision. In this paper, we present lected along different paths for determining faults are never a novel framework that automatically generates scalable, in- merged, avoiding a loss of precision. Furthermore, because terprocedural, path-sensitive analyses to detect user speci- a fault often results from a sequence of executions, reporting fied faults. The framework consists of a specification tech- a fault in terms of path segments instead of program points nique for expressing program properties related to faults, allows the code inspector to follow the actual code sequences a scalable path-sensitive algorithm, and a generator that that lead to the fault. unifies the two. The generated analysis identifies not only Although useful for both detecting and reporting a fault, faults but also the path segments that are relevant to the developing a path-sensitive detector that is scalable for var- faults. The generality of the framework is accomplished for ious types of faults is challenging. In current path-sensitive both data and control centric faults, so that the detection detectors, scalability is mostly addressed for a certain type of of multiple types of faults can be unified, which enables the fault. For example, ESP [5] achieves a polynomial time ver- exploitation of fault interactions for diagnosis and efficiency. ification on identifying typestate violations. The scalability We implemented our framework and generated fault detec- is achieved by applying heuristics to select information rel- tors for identifying buffer overflow, integer truncation and evant to the fault. The approach is only shown effective for signedness errors, and null-pointer dereference. We exper- typestate errors because its underlying assumption is based imentally demonstrated that the generated analysis scales on how a programmer might program typestate transitions up to at least half a million lines of code, and its detection in practice. Tools such as Saturn [20] apply a modular, in- capability is comparable to manually produced analyses. In stead of whole program analysis. That is, parts of program our experiment, a total of 53 faults of the three types from are analyzed in isolation and then the results are composed 9 benchmarks are detected, among which 37 have not been at the whole program level. This summary based technique reported previously. The results show that we are able to is not effective for faults whose detection requires a large identify faults deeply embedded in the code, and the aver- amount of summary information or the tracking of global age length of faulty path segments is 1–4 procedures, which side effects. In fact, Saturn only demonstrates its effective- provides a focus for diagnosis. ness in detecting typestate violations. Besides heuristics and modular analysis, applying annotations is also a solution for scalability. ESPx [10] annotates procedural interfaces, aim- 1. INTRODUCTION ing to convert an interprocedural analysis to intraprocedu- The use of static analyses to check for properties in pro- ral. However, supporting multiple types of faults increases grams, such as protocol violations and security vulnerabil- the use of annotations while both writing and verifying an- ities, has been growing with the recognition of their ef- notations require much human effort. ESPx only supports fectiveness in improving software robustness and security. buffer related contracts. Our previous research developed a At Microsoft, static detectors report thousands of faults demand-driven analysis to achieve the scalability in detect- monthly [16]. Static detectors are deployed on desktops for ing only buffer overflow [14]. Considering the manual effort and challenges to build in- dividual fault detectors, techniques that address scalability and also are applicable to a variety of faults are desirable. Because of the generality, such a system can enable the de- Permission to make digital or hard copies of all or part of this work for tection of interactions of different types of faults, such as personal or classroom use is granted without fee provided that copies are showing an integer fault can lead to buffer overflow. not made or distributed for profit or commercial advantage and that copies In this paper, we present a novel framework which enables bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific the automatic generation of scalable path-sensitive analyses permission and/or a fee. that detect user specified faults. The framework consists of Copyright 200X ACM X-XXXXX-XX-X/XX/XX ...$10.00. 1 a general, scalable algorithm, a specification technique, and • a scalable algorithm that detects both control and data a generator that unifies the two. The key idea is to address centric faults and returns path segments of the faults, the scalability of path-sensitive fault detection in a general demand-driven algorithm, and automatically generate the • a specification language for user specified faults, and fault-specific part of the analysis from a specification. • implementation and experimentation that demonstrate The generated analysis is path-sensitive in that program the generality and scalability of the framework. facts collected along different paths are distinguished. Stat- ically identifiable infeasible paths are also excluded. Impor- In the rest of the paper, we present the three components tantly, only path segments that are relevant to the produc- of the framework: a general analysis in Section 2, specification of faults are reported, providing both the focus and nec- tion in Section 3, and the generator in Section 4. We show essary context for the code reviewer to diagnose the faults. the experimental results in Section 5, and compare the re- The scalability of path-sensitive analysis is achieved by a lated work in Section 6. The conclusion is given in Section 7. demand-driven algorithm and a set of optimizations, all of which are encapsulated in the general algorithm. 2. SCALABLE ANALYSIS The generality of the framework is accomplished for both data and control centric faults. Data centric faults require Demand-driven is a static analysis technique that deter- the tracking of variable values and ranges for detection, mines where to collect the information from the program e.g., buffer overflow and integer faults, while control cen- source, what information to collect, and when the analysis tric faults, such as typestate violations, mainly focus on the should terminate. A demand-driven analysis detects faults order of operations. Although different types of informa- by raising queries at the statements where faults potentially tion are required, there are commonalities in fault detec- occur, inquiring whether the constraints of correctness would tion. Our insight is that 1) many types of faults are only hold. The analysis performs a backward path traversal to de- observable at certain types of program statements, and 2) termine the occurrence of the fault. In this section, we first on the paths to such observable points, only certain types use an example to intuitively explain how a demand-driven of statements can contribute to the failure. By identifying analysis works to identify different types of faults. We then such observable points, we can construct a query at those define key components in a general analysis, and present a points regarding whether the fault can occur and propagate general algorithm developed based on the definitions. the query along the paths for resolution. Similarly, given contributing points, we know where to collect information 2.1 An Example for Demand-Driven Analysis to resolve the query and thus determine the fault. That is, In Figure 1, an integer signedness error occurs at node 11. by supplying observable/contributing points and the corre- We show how our analysis identifies this fault. In the first sponding actions at the points, we are able to guide a general step, the analysis performs a linear scan and identifies node 11 analysis to locate the desired faults. as a potentially faulty statement, because at node 11, a We develop a specification language for users to specify signedness conversion occurs for integer x to match the in- such information. The specification uses code signatures to terface of malloc. We raise a query [Value(x) ≥ 0] at identify observable and contributing points, and introduces node 11, indicating for integer safety, the value of x should the basic construct attributes to map the code to desired be non-negative along all paths before the signedness con- abstractions. For example, attribute Len(s) represents the version. The query is propagated backwards to determine length of the string s and Value(i) expresses the integer the satisfaction of the constraint. At node 10, the query is value stored in the variable i. Using attributes and a set of first updated to [Value(x)*8 ≥ 0] via a symbolic substitu- operations on the attributes, we specify the actions at ob- tion. Along branch h8, 7i, the query encounters a constant servable/contributing points. To generate an analysis, the assignment and is resolved to [1024 ≥ 0] as safe.

Load more