Abstract Interpretation: a Unified Lattice Model for Static Analysis of Programs by Construction Or Approximation of Fixpoints
Patrick Cousot* and Radhia Cousot**
Laboratoire d'Informatique, U.S.M.G., BP. 53, 38041 Grenoble cedex, France

1. Introduction

A program denotes computations in some universe of objects. Abstract interpretation of programs consists in using that denotation to describe computations in another universe of abstract objects, so that the results of abstract execution give some information on the actual computations. An intuitive example (which we borrow from Sintzoff [72]) is the rule of signs. The text -1515 * 17 may be understood to denote computations on the abstract universe {(+), (-), (±)} where the semantics of arithmetic operators is defined by the rule of signs. The abstract execution -1515 * 17 => -(+) * (+) => (-) * (+) => (-) proves that -1515 * 17 is a negative number. Abstract interpretation is concerned with a particular underlying structure of the usual universe of computations (the sign, in our example). It gives a summary of some facets of the actual executions of a program. In general this summary is simple to obtain but inaccurate (e.g. -1515 + 17 => -(+) + (+) => (-) + (+) => (±)). Despite its fundamentally incomplete results, abstract interpretation allows the programmer or the compiler to answer questions which do not need full knowledge of program executions or which tolerate an imprecise answer (e.g. partial correctness proofs of programs ignoring the termination problems, type checking, program optimizations which are not carried out in the absence of certainty about their feasibility, ...).

2. Summary

Section 3 describes the syntax and mathematical semantics of a simple flowchart language, Scott and Strachey [71]. This mathematical semantics is used in section 4 to build a more abstract model of the semantics of programs, in that it ignores the sequencing of control flow. This model is taken to be the most concrete of the abstract interpretations of programs. Section 5 gives the formal definition of the abstract interpretations of a program. Abstract program properties are modeled by a complete semilattice, Birkhoff [61]. Elementary program constructs are locally interpreted by order preserving functions which are used to associate a system of recursive equations with a program. The program global properties are then defined as one of the extreme fixpoints of that system, Tarski [55]. The abstraction process is defined in section 6. It is shown that the program properties obtained by an abstract interpretation of a program are consistent with those obtained by a more refined interpretation of that program. In particular, an abstract interpretation may be shown to be consistent with the formal semantics of the language. Levels of abstraction are formalized by showing that consistent abstract interpretations form a lattice (section 7). Section 8 gives a constructive definition of abstract properties of programs based on constructive definitions of fixpoints. It shows that various classical algorithms such as Kildall [73], Wegbreit [75] compute program properties as limits of finite Kleene [52]'s sequences. Section 9 introduces finite fixpoint approximation methods to be used when Kleene's sequences are infinite, Cousot [76]. They are shown to be consistent with the abstraction process. Practical examples illustrate the various sections. The conclusion points out that abstract interpretation of programs is a unified approach to apparently unrelated program analysis techniques.

3. Syntax and Semantics of Programs

We will use finite flowcharts as a language independent representation of programs.

3.1 Syntax of a Program

A program is built from a set "Nodes". Each node has successor and predecessor nodes:

    n-succ, n-pred : Nodes → 2^Nodes | (m ∈ n-succ(n)) <=> (n ∈ n-pred(m))

Hereafter, we note |S| the cardinality of a set S.
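The relation between n-succ and n-pred stated above can be illustrated with a minimal Python sketch. It assumes a flowchart given as a dictionary from each node to its set of successors; the function name derive_n_pred and the six node names are hypothetical and do not come from the paper.

```python
def derive_n_pred(n_succ):
    """Build n-pred from n-succ so that m in n_succ[n] <=> n in n_pred[m].

    Assumes every node of the flowchart appears as a key of n_succ.
    """
    n_pred = {node: set() for node in n_succ}
    for node, successors in n_succ.items():
        for succ in successors:
            n_pred[succ].add(node)
    return n_pred


# Hypothetical flowchart: entry -> init -> join -> test,
# test -> incr -> join (loop back) and test -> exit.
n_succ = {
    "entry": {"init"},
    "init": {"join"},
    "join": {"test"},
    "test": {"incr", "exit"},
    "incr": {"join"},
    "exit": set(),
}
n_pred = derive_n_pred(n_succ)

# The defining equivalence holds for every pair of nodes.
duality_holds = all(
    (m in n_succ[n]) == (n in n_pred[m]) for n in n_succ for m in n_succ
)
```

Storing only one of the two maps and deriving the other keeps the equivalence true by construction, which is one possible design choice among others.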
When |S| = 1 so that S = {x} we sometimes use S to denote x.

* Attaché de Recherche au C.N.R.S., Laboratoire Associé n° 7.
** This work was supported by IRIA-SESORI under grants 75-035 and 75-160.

The node subsets "Entries", "Assignments", "Tests", "Junctions" and "Exits" partition the set Nodes.

- An entry node (n ∈ Entries) has no predecess... and one successor, ((n-pred(n) = ∅) and (|n-succ(n)| = 1)).

- An assignment node (n ∈ Assignments) has one predecessor and one successor, ((|n-pred(n)| = 1) and (|n-succ(n)| = 1)). Let "Ident" and "Expr" be the distinct syntactic categories of identifiers and expressions. An assignment node n assigns the value of the right hand-side expression expr(n) to the left hand-side identifier id(n):

      expr : Assignments → Expr
      id : Assignments → Ident

- A test node (n ∈ Tests) has a predecessor and two successors, ((|n-pred(n)| = 1) and (|n-succ(n)| = 2)). The true and false successor nodes are respectively denoted n-succ-t(n) and n-succ-f(n):

      n-succ-t, n-succ-f : Tests → Nodes | (∀n ∈ Tests, n-succ(n) = {n-succ-t(n), n-succ-f(n)})

  Let "Bexpr" be the syntactic category of boolean expressions, each test node n contains a boolean expression test(n):

      test : Tests → Bexpr

- A junction node (n ∈ Junctions) has one successor and more than one predecessor, ((|n-succ(n)| = 1) and (|n-pred(n)| > 1)). Immediate predecessor nodes of a junction node are not junction nodes, (∀n ∈ Junctions, ∀m ∈ n-pred(n), not(m ∈ Junctions)).

- An exit node n has one predecessor and no successor, ((|n-pred(n)| = 1) and (n-succ(n) = ∅)).

The set "Arcs" of edges of a program is a subset of Nodes × Nodes defined by:

    Arcs = {<n,m> | (n ∈ Nodes) and (m ∈ n-succ(n))}

which may be equivalently defined by:

    Arcs = {<n,m> | (m ∈ Nodes) and (n ∈ n-pred(m))}.

We will assume that the directed graph <Nodes, Arcs> is connected.

We will use the following functions:

    origin, end : Arcs → Nodes | (∀a ∈ Arcs, a = <origin(a), end(a)>)
    a-succ : Nodes → 2^Arcs | a-succ(n) = {<n,m> | m ∈ n-succ(n)}
    a-pred : Nodes → 2^Arcs | a-pred(n) = {<m,n> | m ∈ n-pred(n)}
    a-succ-t : Tests → Arcs | a-succ-t(n) = <n, n-succ-t(n)>
    a-succ-f : Tests → Arcs | a-succ-f(n) = <n, n-succ-f(n)>

Example:

[Figure: example flowchart containing the assignment x := 1, a test node with true and false branches, and the assignment x := x + 1.]

3.2 Semantics of Programs

This section develops a simple "mathematical semantics" of programs, in the style of Scott and Strachey [71].

If S is a set we denote S° the complete lattice obtained from S by adjoining {⊥_S, ⊤_S} to it, and imposing the ordering ⊥_S ≤ x ≤ ⊤_S for all x ∈ S.

The semantic domain "Values" is a complete lattice which is the sum of the lattice Bool = {true, false}° and some other primitive domains.

Environments are used to hold the bindings of identifiers to their values:

    Env = Ident° → Values

We assume that the meaning of an expression expr ∈ Expr in the environment e ∈ Env is given by val[[expr]](e) so that:

    val : Expr → [Env → Values].

In particular the projection val | Bexpr of the function val in domain Bexpr has the functionality:

    val | Bexpr : Bexpr → [Env → Bool].

The state set "States" consists of the set of all information configurations that can occur during computations:

    States = Arcs° × Env.

A state (s ∈ States) consists in a control state (cs(s)) and an environment (env(s)), such that:

    ∀s ∈ States, s = <cs(s), env(s)>.

- We use a continuous conditional function cond(b, e1, e2) [...]; the notation if b then e1 else e2 fi denotes cond(b, e1, e2).

- If e ∈ Env, v ∈ Values, x ∈ Ident then e[v/x] = λy. cond(y = x, v, e(y)).

The state transition function defines for each state a next state (we consider deterministic programs):

    n-state : States → States
    n-state(s) =
      let n be end(cs(s)), e be env(s) within
        case n in
          Assignments => <a-succ(n), e[val[[expr(n)]](e) / id(n)]>
          Tests       => cond((val | Bexpr)[[test(n)]](e), <a-succ-t(n), e>, <a-succ-f(n), e>)
          Junctions   => <a-succ(n), e>
          Exits       => s
        esac

(Each partial function f on a set S is extended to a continuous total function on the corresponding domain S° by f(⊥) = ⊥, f(⊤) = ⊤ and f(x) = ⊥ if the partial function is undefined at x).

Let ⊥_Env be the bottom function on Env such that (∀x ∈ Ident°, ⊥_Env(x) = ⊥_Values).

Let I-states be the subset of initial states:

    I-states = {<a-succ(m), ⊥_Env> | m ∈ Entries}

- A "computation sequence" with initial state is ∈ I-states is the sequence:

      s_n = n-state^n(is) for n = 0, 1, ...

  where f^0 is the identity function and f^(n+1) = f ∘ f^n.

- The initial to final state transition function:

      n-state^∞ : States → States

  is the minimal fixpoint of the functional:

      λF. (n-state ∘ F)

  Therefore n-state^∞ = Y(λF.

Since the equation Cv(r) = n-context(r, Cv) must be valid for each arc, Cv is a solution to the system of "forward" equations:

    Cv = F-cont(Cv)

where F-cont : Context-Vectors → Context-Vectors is defined by:

    F-cont(Cv) = λr. n-context(r, Cv)

Context-Vectors is a complete lattice with union ∪ such that Cv1 ∪ Cv2 = λr. (Cv1(r) ∪ Cv2(r)). F-cont is order preserving for the ordering ⊆ of Context-Vectors which is defined by:

    {Cv1 ⊆ Cv2} <=> {∀r ∈ Arcs, Cv1(r) ⊆ Cv2(r)}
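The pointwise union and ordering on Context-Vectors can be sketched in Python as follows. This is only an illustration under stated assumptions: a context vector is represented as a dictionary from arcs to sets, the set elements are small integers standing in for environments, and the names cv_union, cv_leq and the two arcs are hypothetical.

```python
def cv_union(cv1, cv2):
    """Pointwise union: (Cv1 ∪ Cv2)(r) = Cv1(r) ∪ Cv2(r) for every arc r.

    Assumes both vectors are defined on the same set of arcs.
    """
    return {r: cv1[r] | cv2[r] for r in cv1}


def cv_leq(cv1, cv2):
    """Pointwise ordering: Cv1 ⊆ Cv2 iff Cv1(r) ⊆ Cv2(r) for every arc r."""
    return all(cv1[r] <= cv2[r] for r in cv1)


# Hypothetical arcs, written as (origin, end) pairs.
arc_a = ("entry", "init")
arc_b = ("init", "join")

# Integers stand in for environments (an assumption made for brevity).
cv1 = {arc_a: {1}, arc_b: set()}
cv2 = {arc_a: {2}, arc_b: {3}}

joined = cv_union(cv1, cv2)
```

In this representation the union of two vectors is an upper bound of both for the pointwise ordering, which is the property the lattice structure relies on.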