Abstract Interpretation
PATRICK COUSOT
DMI, École Normale Supérieure, Paris ⟨cousot@dmi.ens.fr⟩ ⟨http://www.ens.fr/~cousot⟩

Abstract interpretation [Cousot and Cousot 1977, 1979] is a general theory for approximating the semantics of discrete dynamic systems, for example, computations of programs. In particular, program analysis algorithms can be constructively derived from these abstract semantics.

PRINCIPLES OF ABSTRACT INTERPRETATION

A semantics $S$ of a programming language $L$ associates a semantic value $S[\![p]\!] \in \mathcal{D}$ in the semantic domain $\mathcal{D}$ to each program $p$ of $L$. The semantic domain $\mathcal{D}$ can be transition systems (for small-step operational semantics), pomsets, traces, relations (for big-step operational semantics), higher-order functions (for denotational semantics), and so on. $\mathcal{D}$ is usually defined compositionally by induction on the structure of run-time objects (computations, data, etc.). $S$ is defined compositionally by induction on the syntactical structure of programs using, for example, fixpoint definitions to handle iteration, recursion, and the like.

An empirical approach to abstract interpretation consists in a priori choosing a problem-specific abstract semantics domain $\mathcal{D}^\sharp$ and an abstract semantics $S^\sharp \in L \mapsto \mathcal{D}^\sharp$ that is designed intuitively for a specific language $L$. Then safety, correctness, or soundness is established by proving that a soundness relation $\sigma$ satisfies

\[ \forall p \in L : \sigma(S[\![p]\!], S^\sharp[\![p]\!]). \]

If the abstract semantics is computable ($\mathcal{D}^\sharp$ is usually assumed to be finite), we can infer that the abstract interpretation is sound in the sense that:

\[ S[\![p]\!] \in \{S \mid \sigma(S, S^\sharp[\![p]\!])\}, \]

which may be sufficient to prove program properties, for example, that some programs "cannot go wrong." However, the choice of $\mathcal{D}^\sharp$ and $\sigma$ offers no guideline for the design of the abstract semantics $S^\sharp$ with respect to the concrete semantics $S$.

The approach propounded in [Cousot and Cousot 1977, 1979] is constructive in the sense that once the standard semantics $S$ and an approximation $\alpha$ are chosen, one can derive the best choice for the corresponding abstract semantics $S^\sharp$. More precisely, let us call elements of the powerset $\mathcal{D}_{Coll} \stackrel{\mathrm{def}}{=} \wp(\mathcal{D})$ program (concrete) properties. Define $S_{Coll} \in L \mapsto \mathcal{D}_{Coll}$ to be the collecting semantics:

\[ S_{Coll}[\![p]\!] \stackrel{\mathrm{def}}{=} \{S[\![p]\!]\}. \]

(This is a conceptual step, because no other detailed specification of $S_{Coll}$ is needed but for the design of formal proof methods.) $S_{Coll}[\![p]\!]$ is the strongest program property. We have seen that an abstract property $P$, such as the preceding $\{S \mid \sigma(S, S^\sharp[\![p]\!])\}$, is weaker in that $S_{Coll}[\![p]\!] \subseteq P$. We call $\subseteq$ the approximation ordering. Now the abstraction function is a map $\alpha \in \mathcal{D}_{Coll} \mapsto \mathcal{D}_{Coll}$. We call

Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. © 1996 ACM 0360-0300/96/0600–0324 $03.50

ACM Computing Surveys, Vol. 28, No. 2, June 1996 Abstract Interpretation • 325

$\alpha[\mathcal{D}_{Coll}] \stackrel{\mathrm{def}}{=} \{\alpha(S) \mid S \in \mathcal{D}_{Coll}\}$ the abstract domain. We often use an isomorphic representation $\mathcal{D}^\sharp$ for $\alpha[\mathcal{D}_{Coll}]$ and directly define $\alpha \in \mathcal{D}_{Coll} \mapsto \mathcal{D}^\sharp$. An example would be the approximation of a trace-based semantics by a relational/denotational semantics:

\[ \alpha(P) \stackrel{\mathrm{def}}{=} \{\langle s_0, s_n\rangle \mid s_0 s_1 \ldots s_n \in P\} \cup \{\langle s_0, \bot\rangle \mid s_0 s_1 \ldots \in P \text{ is an infinite trace}\}. \]

Another approximation, ignoring the distinction between finite and infinite computations, would be the approximation of traces by sets of states:

\[ \alpha(P) \stackrel{\mathrm{def}}{=} \{s_i \mid s_0 s_1 \ldots s_i \ldots \in P\}, \]

which is adequate for safety/invariance properties.

Once such abstract domain $\mathcal{D}^\sharp$ and abstraction function $\alpha$ have been chosen, it remains to derive the abstract semantics $S^\sharp[\![p]\!]$, $p \in L$. We proceed compositionally, by induction on the syntactic structure of program $p$. The price to pay is that in general the ideal $\alpha(S_{Coll}[\![p]\!])$ has to be approximated by $S^\sharp[\![p]\!] \supseteq \alpha(S_{Coll}[\![p]\!])$. For example, consider a fixpoint definition (for some syntactic construct $p(p_1,\ldots,p_n)$ with components $p_1,\ldots,p_n$):

\[ S[\![p(p_1,\ldots,p_n)]\!] \stackrel{\mathrm{def}}{=} \operatorname{lfp}^{\sqsubseteq} F_p[S[\![p_1]\!],\ldots,S[\![p_n]\!]]. \]

For simplicity, we consider the simple case when the computational ordering $\sqsubseteq$ coincides with the approximation ordering $\subseteq$. Assuming, by the induction hypothesis, that we know a sound abstract semantics for the program components $p_1,\ldots,p_n$:

\[ \alpha(S_{Coll}[\![p_i]\!]) \subseteq S^\sharp[\![p_i]\!], \quad i = 1,\ldots,n, \]

then we look, by algebraic formula manipulation, for $F_p^\sharp$ satisfying for all $C_1,\ldots,C_n \in \mathcal{D}_{Coll}$:

\[ \alpha\bigl(\{F_p[S_1,\ldots,S_n] \mid \forall i = 1,\ldots,n : S_i \in C_i\}\bigr) \subseteq F_p^\sharp[\alpha(C_1),\ldots,\alpha(C_n)] \]

in order to conclude (under suitable hypotheses, see Cousot and Cousot [1979]) that

\[ \alpha\bigl(\operatorname{lfp}^{\sqsubseteq} F_p[C_1,\ldots,C_n]\bigr) \subseteq \operatorname{lfp}^{\sqsubseteq^\sharp} F_p^\sharp[\alpha(C_1),\ldots,\alpha(C_n)]. \]

This leads to the definition of the abstract semantics:

\[ S^\sharp[\![p(p_1,\ldots,p_n)]\!] \stackrel{\mathrm{def}}{=} \operatorname{lfp}^{\sqsubseteq^\sharp} F_p^\sharp[S^\sharp[\![p_1]\!],\ldots,S^\sharp[\![p_n]\!]], \]

which is sound, by construction, so that no a posteriori verification is necessary. When equality holds, we have a completeness property that is useful in designing hierarchies of semantics. By identifying useful abstract algebras consisting of an abstract domain $\mathcal{D}^\sharp$ and abstract operations $F^\sharp$ corresponding to common primitive operations $F$ used in the semantic definition of programming languages, one can design abstraction libraries of general scope. Finally, the abstraction can be parameterized with other abstractions in order to obtain generic semantic definitions and abstract interpreters. Contrary to, for example, type inference, soundness (and relative completeness) can be established once and for all (may be parameterized by basic abstractions for generic implementations) and not for each particular instance of the program analysis problem.

The abstract interpretation framework sketched is based on an abstraction function $\alpha$. The existence of a best (most precise) approximation among all the possible sound ones is ensured by choosing $\alpha$ as the upper-adjoint of a Galois connection. Other equivalent formulations using Moore families, closure operators, ideals, and so on, were introduced in Cousot and Cousot [1979]. The case in which the computational ordering $\sqsubseteq$ does not coincide with the approximation ordering $\subseteq$ in their concrete or abstract versions is considered in Cousot and Cousot [1994]. Other alternatives are studied in Cousot and Cousot


[1992] (using a soundness relation $\sigma$, a concretization $\gamma$, an abstraction/concretization pair $\langle\alpha, \gamma\rangle$ or widenings, which allow the degree of approximation to evolve during program analysis so that the abstract domain $\mathcal{D}^\sharp$ is not fixed once and for all but evolves during the analysis).

Under the influence of Mycroft's early application of abstract interpretation to strictness analysis of lazy functional languages, the standard semantics $S$ is often chosen to be a denotational semantics (see Jones and Nielson [1995] for a survey taking this point of view), which is generally adequate for functional languages. The original operational-based abstract interpretation [Cousot 1981] turned out to be much more satisfactory for imperative, logic [Debray 1994], and more recently concurrent, distributed, and object-oriented languages. A general framework unifies both points of view [Cousot and Cousot 1994] by lifting operational semantics to handle infinite behaviors, considering the equivalence between rule-based and fixpoint presentations of semantics specifications and viewing denotational semantics as part of a hierarchy of abstractions of operational semantics.

APPLICATION OF ABSTRACT INTERPRETATION

The most widespread use of abstract semantics $S^\sharp$ is for the specification of program analyzers as used for high-performance compilers, program transformation (e.g., program vectorization and parallelization), partial evaluation, test generation for program debugging, abstract debugging (involving abstract values/properties instead of concrete ones), polymorphic type inference, effect systems, model checking, verification of hybrid systems, and the like.

An important aspect of research in abstract interpretation is concerned with the composable design of abstractions $\alpha$ by induction on the mathematical structure of the semantic domain $\mathcal{D}$ (which, for typed languages, often coincides with the type structure of the language) for all possible data and control structures encountered in programming languages. Let us consider a small range of samples of data abstractions:

—For attribute-independent abstractions of sets of vectors of numbers, one can consider signs, intervals, parity, and simple congruences. For relational abstractions, one can consider linear equalities, linear inequalities, congruences, congruencial trapezoids, and so on.

—For the context-free abstraction of formal languages, one can consider regular expressions and grammars (thus making the link with set-based analysis). Unitary-prefix monomial decompositions of rational subsets of a free monoid (such as $\{X.tl^{m}.hd.tl^{p} \mid m = n = p\}$) provide an example of context-sensitive abstraction [Deutsch 1995]. These abstractions can be used for pointer analysis based on location-free/storeless models of computation.

—For the abstraction of sets of graphs, as used, for example, in store-based pointer analysis, refer to Deutsch [1995].

To remain succinct on control structures, let us consider the example of program loops that involve solving an equation $X = F_\ell^\sharp(X)$ originating from the fixpoint definition $\operatorname{lfp}^{\sqsubseteq^\sharp} F_\ell^\sharp$ of the abstract semantics $S^\sharp[\![\ell]\!]$ of a program $\ell$. This is a typical algorithmic problem involved in abstract interpretation. One classical solution, when $F_\ell^\sharp$ is a monotonic operator on a poset $\langle \mathcal{D}^\sharp, \sqsubseteq^\sharp\rangle$ with infimum $\bot$, is iteration: $X^0 \stackrel{\mathrm{def}}{=} \bot$, $X^{n+1} \stackrel{\mathrm{def}}{=} F_\ell^\sharp(X^n)$ until convergence. Widenings [Cousot and Cousot 1977] can be used to accelerate convergence (in large or infinite domains) and cope with the lack of least upper bounds during fixpoint iteration. Widenings can be understood as a local dynamic change of the abstract semantic domain during the analysis [Cousot and Cousot 1992].
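The iteration scheme for loops and the effect of widening can be made concrete on a small example. The following Python sketch is not taken from the article: it assumes the interval domain (one of the attribute-independent abstractions listed in this section), a hypothetical loop `i = 0; while i < 100: i = i + 1`, and the usual interval widening that pushes an unstable bound to infinity. All names (`Interval`, `widen`, `F`, `iterate`) are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float  # the empty interval (bottom) is encoded as lo > hi


BOTTOM = Interval(math.inf, -math.inf)


def is_bottom(a):
    return a.lo > a.hi


def leq(a, b):
    """Abstract ordering: a is at least as precise as b."""
    return is_bottom(a) or (not is_bottom(b) and b.lo <= a.lo and a.hi <= b.hi)


def join(a, b):
    if is_bottom(a):
        return b
    if is_bottom(b):
        return a
    return Interval(min(a.lo, b.lo), max(a.hi, b.hi))


def meet(a, b):
    if is_bottom(a) or is_bottom(b):
        return BOTTOM
    lo, hi = max(a.lo, b.lo), min(a.hi, b.hi)
    return Interval(lo, hi) if lo <= hi else BOTTOM


def widen(a, b):
    """Keep the stable bounds of a; push any bound that b exceeds to infinity."""
    if is_bottom(a):
        return b
    if is_bottom(b):
        return a
    lo = a.lo if a.lo <= b.lo else -math.inf
    hi = a.hi if a.hi >= b.hi else math.inf
    return Interval(lo, hi)


def add_const(a, k):
    return BOTTOM if is_bottom(a) else Interval(a.lo + k, a.hi + k)


def F(x):
    """Abstract transformer at the loop head of the assumed loop
    i = 0; while i < 100: i = i + 1"""
    entry = Interval(0, 0)                     # value on loop entry
    guard = meet(x, Interval(-math.inf, 99))   # states satisfying i < 100
    return join(entry, add_const(guard, 1))    # entry, or one more iteration


def iterate(f, use_widening):
    """Iterate from bottom until f(x) is below x; return (x, steps taken)."""
    x, steps = BOTTOM, 0
    while True:
        nxt = f(x)
        if leq(nxt, x):
            return x, steps
        x = widen(x, nxt) if use_widening else nxt
        steps += 1


if __name__ == "__main__":
    print(iterate(F, use_widening=False))
    print(iterate(F, use_widening=True))
```

Under these assumptions, plain iteration reaches the loop-head invariant [0, 100] after 101 steps, whereas iteration with widening stabilizes after 2 steps on the coarser post-fixpoint [0, +∞), which is still a sound over-approximation. This is the trade of precision for convergence speed that the text attributes to widenings.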

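Among the attribute-independent abstractions mentioned in this section, the sign abstraction is small enough to write out in full. The sketch below is an illustration under stated assumptions, not the article's construction: abstract values are sets of basic signs ordered by inclusion, `alpha` maps a finite set of integers to the set of signs it contains, and `abstract_add` is a hand-derived abstract operation for integer addition. The function `locally_sound` checks, on finite samples only, the condition that the abstraction of the concrete results is included in the abstract result computed from the abstracted arguments. All names are hypothetical.

```python
from itertools import product

TOP = frozenset("-0+")  # "any sign"


def sign(n: int) -> str:
    return "-" if n < 0 else "+" if n > 0 else "0"


def alpha(values) -> frozenset:
    """Abstraction: the set of signs occurring in a set of integers."""
    return frozenset(sign(v) for v in values)


def add_signs(s: str, t: str) -> frozenset:
    """Possible signs of x + y given the signs of x and y."""
    if s == "0":
        return frozenset(t)
    if t == "0":
        return frozenset(s)
    if s == t:
        return frozenset(s)
    return TOP  # opposite signs: the sum can be negative, zero or positive


def abstract_add(a: frozenset, b: frozenset) -> frozenset:
    """Abstract addition, lifted pointwise to sets of signs."""
    out = set()
    for s, t in product(a, b):
        out |= add_signs(s, t)
    return frozenset(out)


def locally_sound(A, B) -> bool:
    """alpha of the concrete sums is included in the abstract sum."""
    concrete = {x + y for x in A for y in B}
    return alpha(concrete) <= abstract_add(alpha(A), alpha(B))


if __name__ == "__main__":
    print(abstract_add(alpha({1, 2}), alpha({3})))   # only "+"
    print(abstract_add(alpha({1}), alpha({-1})))     # all three signs
    print(locally_sound({1}, {-1}))
```

The case `{1}` plus `{-1}` shows why the inclusion is in general strict: the concrete sum is exactly 0, but the abstract operation only knows the arguments have opposite signs and must answer with every sign. The result is sound but not complete.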

The complexity issues in abstract interpretation have only been touched upon. A common error is that abstract interpretation is thought to be inherently exponential (as opposed to polynomial dataflow analysis). It can be exactly the contrary! It can be polynomial or polynomial on the average but exponential in pathological cases that are rare enough to be cut off by widening (e.g., polymorphic type inference à la Hindley-Milner). In general, exponential costs can be avoided by using widenings, introducing further approximation as analysis time elapses. The common idea that program analyzers should be as fast as compilers does not necessarily take the cost/benefit tradeoff into account: is it better to spend eight (night) hours of CPU time or eight (day) hours of manpower to find a crucial programming error?

CONCLUSION AND RESEARCH PERSPECTIVES

Although the designer of program analyzers may prefer empirical approaches, abstract interpretation is often an indispensable guideline in avoiding conceptual errors because it provides a methodology for designing a formal specification. It provides a synthetic understanding of the abundant literature on semantics, type systems, logics of computations, program verification, program analysis, partial evaluation, and the like, all involving more or less refined or abstract semantics of computations.

A number of problems remain to be considered or more thoroughly studied, for example,

Semantics
—general frameworks formalizing the notion of semantic approximation;
—models of computations (e.g., true concurrency) to be used in concrete semantics;
—design of hierarchies of parameterized semantics for programming languages (such as concurrent and object-oriented languages);

Domains
—abstract domains for nonnumerical objects;
—design of abstraction functions specifying different program analysis methods so as to compare their relative power;
—decomposition/combination of existing program analyses;

Algorithms
—equation resolution and convergence acceleration methods;
—compositional design of widenings;

Abstract interpreters
—design and implementation of general-purpose libraries of abstract domains and their associated operations;
—design of language-specific generic abstract interpreters; and
—formal or experimental study of the complexity/benefit tradeoff for program analyses.

REFERENCES

COUSOT, P. 1981. Semantic foundations of program analysis. In Program Flow Analysis: Theory and Applications, S.S. Muchnick and N.D. Jones, Eds., Prentice-Hall, Englewood Cliffs, NJ, Ch. 10, 303–342.

COUSOT, P. AND COUSOT, R. 1994. Higher-order abstract interpretation (and application to comportment analysis generalizing strictness, termination, projection and PER analysis of functional languages), invited paper. In Proceedings of 1994 ICCL (Toulouse, France, May 16–19), IEEE, Los Alamitos, CA, 95–112.

COUSOT, P. AND COUSOT, R. 1992. Abstract interpretation frameworks. J. Logic Comput. 2, 4 (Aug.), 511–547.

COUSOT, P. AND COUSOT, R. 1979. Systematic design of program analysis frameworks. In Proceedings of the 6th POPL (San Antonio, TX), ACM Press, New York, 269–282.

COUSOT, P. AND COUSOT, R. 1977. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Proceedings of the 4th POPL (Los Angeles, CA), ACM Press, New York, 238–252.

DEBRAY, S. K. 1994. Formal bases for dataflow analysis of logic programs. In Advances in Logic Programming Theory, International Schools for Computer Scientists, G. Levi, Ed., Clarendon Press, New York, Sect. 3, 115–182.

DEUTSCH, A. 1995. Semantic models and abstract interpretation techniques for inductive data structures and pointers. Invited paper. In Proceedings of PEPM'95 (La Jolla, CA, June 21–23), ACM Press, New York, 226–229.

JONES, N. AND NIELSON, F. 1995. Abstract interpretation: A semantic-based tool for program analysis. In Semantic Modelling, S. Abramsky, D.M. Gabbay, and T.S.E. Maibaum, Eds., number 4 in Handbook of Logic in Computer Science. Clarendon Press, New York.
