Thomas Ball and Sriram Rajamani's the SLAM Project: Debugging System Software Via Static Analysis

The SLAM Project: Debugging System Software via Static Analysis* Thomas Ball and Sriram K. Rajamani {tball,sriram}@microsoft.com Microso~ Research http://research.microsoft.com/slam/ Abstract. The goal of the SLAM project is to check whether automaton can read (but not modify) the state of the C or not a program obeys "API usage rules" that specif[y what program that is visible at the function call/return interface, it means to be a good client of an API. The SLAM toolkit maintain a history, and signal when a bad state occurs. }V~ statically analyzes a C program to determine whether or have developed Smc specifications for a variety of Windows not it violates given usage rules. The toolkit has two unique XP driver properties, ranging from simple locking proper- aspects: it does not require the programmer to annotate ties (such as given above) to complex properties dealing with the source program (invariants are inferred); it minimizes completion routines, plug-and-play, and power management. noise (false error messages) through a process known as Given a program P and a Sac specification S, a pre- "counterexample-driven refinement". SLAM exploits and ex- processor creates an instrumented program P' such that a tends results fi'om program analysis, model checking and au- unique label ERROR is reachable in P' if-and-only-if P does tomated deduction. }V~ have successfully applied the SLAM not satisI} S. The goal then shifts to determining whether toolkit to Windows XP device drivers, to both validate be- or not the ERROR label is reachable in P', a generally unde- havior and find defects in their usage of kernel APIs. cidable problem. Context. Today, many programmers are realizing the ben- Design. The basic design of the SLAM process is to iterate efits of using languages with static type systems. By pro- the creation, analysis and refinement of program abstrac- viding simple specifications about the tbrm of program data, tions, until either a t~asible execution path in P' to ERROR programmers receive useful compile-time error messages or is tbund~ the prograln P' is validated (EEROR is shown not guarantees about the behavior of their (type-correct) pro- to be reachable), or we run out of resources or patience. grams. Getting additional checking beyond the confines of The SLAM process creates a sound boolears pr~gr~rrs ab- a particular type system generally requires programmers to straction B' of the C program p,.1 Boolean programs have use assertions and pertbrm testing. A number of projects all the control-flow constructs of C programs, but contain have started to tbcus on statically checking programs against only boolean variables. Each boolean variable in B' con- user-supplied specifications, using techniques from program servatively tracks the state of a predicate (boolean expres- analysis [18, 19], model checking [21, 17, 22], and automated sion) in the C program. Boolean programs are created au- deduction [16, 12]. tomatically using the technique of predicate abstr~ctior~ [20]. If a teachability analysis of B' determines that the label Specification. The goal of the SLAM project is to check ERROR is not reachable in B' then it is not reachable in temporal sat>ty properties of sequential C programs [7]. P'. It is possible that B' may be too coarse an abstrac- Roughly stated, temporal sat~ty properties are those proper- tion of P' (that is, ERROR is reachable in B' via a path p ties whose violation is witnessed by a finite execution trace but ERROR is not reachable in P' via p). V~ apply a method (see [24] ibr a formal definition). A simple example of a known as cour~terw'arr~ple-driver~ ~vfir~err~er~t [23, 28, 27] to sat>ty property is that a lock should be alternatingly ac- create a more precise boolean program (by adding new pred- quired and released. }¥~ encode temporal sat~ty properties icates/boolean variables) that does not contain the spurious in a language called Sac (Specification Language for Inter- path p (or other paths that are spurious tbr the same reason face Checking) [9], which allows the definition of a sat~ty p is). Termination of the SLAM process is addressed below. automaton [30, 29] that monitors the execution behavior of }V~ expect the SLAM process to work well tbr programs a program at the level of function calls and returns. The whose behavior is governed by an underlying finite state protocol. Seen in this light, the goal of SLAM is tO tease out *Presented by the first author. the underlying "protocol" state machine ti'om the code, to Permission to make digital or hard copies of all or part a level of precision that is good enough to find real errors of this work for personal or classroom use is granted with- out fee provided that copies are not made or distributed for 1Of course, whenever one hears a claim that an analysis of C code profit or commercial advantage and that copies bear this no- is %ound", one must ask %ound with respect to what assumptions'.'" tice and the full citation on the first page. 2b copy oth- because of the (potential) use of arbitrary pointer arithmetic. In erwise, to republish, to post on servers or to redistribute SLAM, %ve guarantee soundness under the assumption that the C program obeys a "logical memory model' in which the expressions *p and to lists, requires prior specific permission and/or at fee. *(p+i) refer to the same object. Another analysis (see the work on POPL '02, Jan. 16-18, 2002 Portland, OR USA CCured presented at this symposium [25]) is needed to discharge the Copyright 2002 ACM ISBN 1-58113-450-9/02/01...$5.00 "logical memory" assumption. 1 or validate the code. For example, while a video card driver In practice, the SLAM process has terminated for all may have a huge data path, most of this data has no bearing drivers within 20 iterations. Our major concern has been on the driver's interaction with the operating system. How- with the overall running time of the process. So far, we are ever, some of the driver data definitely are relevant to this able to analyze programs on the order of 10,000 lines, and interaction, and correlations between these data may need abstractions with several hundred boolean variables in the to be tracked. range of minutes to a half hour. In practice, we find that most of the predicates SLAM generates are simple equalities Implementation. Three basic tools comprise the SLAM with possible pointer deret~rences. For this class of predi- toolkit (in addition to the SLI¢ preprocessor): cates, we believe it is possible to scale the SLAM process to several 100,000 lines of code through optimizations outlined • C2Be, a tool that transtbrms a C program P into a below. boolean program /31P(P,E) with respect to a set of A major expense in the SLAM process is the reacha- predicates E [2, 3]. C2Bp translates each procedure bility step (BEgot'), which has worst case running time of the C program separately, enabling it to scale to O(N(GL)a), where N is the size of the boolean program, G large programs. Using the theory of abstract interpre- is the number of global states, and L is the maximum num- tation [13], we have characterized the precision of the ber of local states over all procedures. The number of states boolean program abstractions created by C2ge [4]. is exponential (in the worst-case) in the maximal number of • BEBOP, a too1 for performing reachability analysis of variables in scope. boolean programs [6, 8]. BEBOe combines interproce- The key to scaling for the SLAM process is in control- dural dataflow analysis in the style of [26] with Binary ling the complexity of the boolean program abstraction. Decision Diagrams [10, 11] (BDDs) to etficiently rep- Satyaki Das has implemented a predicate abstraction tech- resent the reachable states of the boolean program at nique based on successive approximations [15] in the SLAM each program point. toolkit, which has proven quite useful in this regard. Also relevant here is the paper on "lazy abstraction" in this sym- • NEWTON, a too1 that discovers additional predicates to posium [21]. refine the boolean program, by analyzing the t~asibility Additionally, there is substantial overhead in having to of paths in the C program. iterate the SLAM process many times, which can be addressed by both the "lazy abstraction" method as well as The SLAM process starts with an initial set of predicates E0 methods tbr heuristically determining a "good" initial set derived t¥om the SLI¢ specification, and iterates the follow- of predicates. }V~stley }V~imer has implemented an algo- ing steps: rithm in SLAM that, given the set of predicates present in 1. Apply C2BP to construct the boolean program the SLI¢ specification, determines what other predicates in 13p(P', Ed. the C program will very likely be needed in the future. This technique, based on the value-flow graph [14], greatly re- 2. Apply BEBOe to check if there is a path Pi in duces the number of iterations of the SLAM process. BT'(P', Ei) that reaches the ERROR label. If BEBOP determines that ERROR is not reachable, then P satisfies Challenges, ~¥~ summarize by discussing some of the chal- the SLI¢ specification and the process terminates. lenges inherent in the endeavor of checking user-supplied properties of sottware. 3. If there is such a path pi, then NEWTON checks if Pi is feasible in P'.

Load more