Abstract Interpretation: a Semantics-Based Tool for Program Analysis
Neil D. Jones
DIKU, University of Copenhagen, Denmark

Flemming Nielson
Computer Science Department, Aarhus University, Denmark

June

Contents

Introduction
    Goals and Motivations
    Relation to Program Verification and Transformation
    The Origins of Abstract Interpretation
    A Sampling of Dataflow Analyses
    Outline
Basic Concepts and Problems to be Solved
    A Naive Analysis of the Simple Program
    Accumulating Semantics for Imperative Programs
    Correctness and Safety
    Scott Domains, Lattice Duality, and Meet versus Join
    Abstract Values Viewed as Relations or Predicates
    Important Points from Earlier Sections
    Towards Generalizing the Cousot Framework
    Proving Safety by Logical Relations
Abstract Interpretation Using a Two-Level Metalanguage
    Syntax of Metalanguage
    Specification of Analyses
    Correctness of Analyses
    Induced Analyses
    Expected Forms of Analyses
    Extensions and Limitations
Other Analyses, Language Properties, and Language Types
    Approaches to Abstract Interpretation
    Examples of Instrumented Semantics
    Analysis of Functional Languages
    Complex Abstract Values
    Abstract Interpretation of Logic Programs
Glossary

1 Introduction

Desirable mathematical background for this chapter includes:
- basic concepts such as lattices, complete partial orders, homomorphisms, etc.;
- the elements of domain theory, e.g. as in the chapter by Abramsky or the books (Schmidt) or (Nielson a);
- the elements of denotational semantics, e.g. as in the chapter by Tennent or the books (Schmidt) or (Nielson a);
- interpretations as used in logic.

There will be some use of structural operational semantics (Kahn; Plotkin; Nielson a), for example deduction rules for a program's semantics and type system. The use of category theory will be kept to a minimum, but would be a useful background for the domain-related parts of Section

Goals and Motivations

Our primary goal is to obtain as much information as possible about a program's possible run-time behaviour without actually having
to run it on all input data, and to do this automatically. A widely used technique for such program analysis is nonstandard execution, which amounts to performing the program's computations using value descriptions or abstract values in place of the actual computed values. The results of the analysis must describe all possible program executions, in contrast to profiling and other run-time instrumentation, which describe only one run at a time. We use the term "abstract interpretation" for a semantics-based version of nonstandard execution.

Nonstandard execution can be roughly described as follows:
- perform commands (or evaluate expressions, satisfy goals, etc.) using stores, values, ... drawn from abstract value domains instead of the actual stores, values, ... used in computations;
- deduce information about the program's computations on actual input data from the resulting abstract descriptions of stores, values, ...

One reason for using abstract stores, values, ... instead of the actual ones is for computability: to ensure that analysis results are obtained in finite time. Another is to obtain results that describe the result of computations on a set of possible inputs. The rule of signs is a simple, familiar abstract interpretation using abstract values "positive", "negative" and "?" (the latter is needed to express, for example, the result of adding a positive and a negative number).

Another classical example is to check arithmetic computations by "casting out nines", a method using abstract values to detect errors in hand computations. The idea is to perform a series of additions, subtractions and multiplications with the following twist: whenever a result exceeds 9, it is replaced by the sum of its digits (repeatedly if necessary). The result obtained this way should equal the sum modulo 9 of the digits of the result obtained by the standard arithmetic operations. For example, consider the alleged calculation 123 * 457 + 76543 = 132654. This is checked by reducing 123 to 6, 457 to 7 and 76543 to 7, and then reducing 6 * 7 to 42 and so further to 6, and finally 6 + 7 is reduced to 4. This differs
from 3, the sum modulo 9 of the digits of 132654, so the calculation was incorrect. That the method is correct follows from:

\[
\begin{aligned}
(a + b) \bmod 9 &= ((a \bmod 9) + (b \bmod 9)) \bmod 9 \\
(a - b) \bmod 9 &= ((a \bmod 9) - (b \bmod 9)) \bmod 9 \\
(a \cdot b) \bmod 9 &= ((a \bmod 9) \cdot (b \bmod 9)) \bmod 9
\end{aligned}
\]

The method abstracts the actual computation by only recording values modulo 9. Even though much information is lost, useful results are still obtained, since this implication holds: if the alleged answer modulo 9 differs from the answer got by casting out nines, there is definitely an error.

On the need for approximation

Due to the unsolvability of the halting problem (and nearly any other question concerning program behaviour), no analysis that always terminates can be exact. Therefore we have only three alternatives:

1. Consider systems with a finite number of finite behaviours (e.g. programs without loops) or decidable properties (e.g. type checking as in Pascal). Unfortunately, many interesting problems are not so expressible.
2. Ask interactively for help in case of doubt. But experience has shown that users are often unable to infer useful conclusions from the myriads of esoteric facts provided by a machine. This is one reason why interactive program proving systems have turned out to be less useful in practice than hoped.
3. Accept approximate but correct information.

Consequently, most research in abstract interpretation has been concerned with effectively finding "safe" descriptions of program behaviour, yielding answers which, though sometimes too conservative in relation to the program's actual behaviour, never yield unreliable information. In a formal sense we seek a $\sqsubseteq$ relation instead of equality. The effect is that the price paid for exact computability is loss of precision.

A natural analogy: abstract interpretation is to formal semantics as numerical analysis is to mathematical analysis. Problems with no known analytic solution can be solved numerically, giving approximate solutions, for example a numerical result $r$ and an error estimate $\epsilon$. Such a result is reliable if it is certain that the correct result lies
within the interval $[r - \epsilon, r + \epsilon]$. The solution is acceptable for practical usage if $\epsilon$ is small enough. In general, more precision can be obtained at greater computational cost.

Safety

Abstract interpretation usually deals with discrete nonnumerical objects that require a different idea of approximation than the numerical analyst's. By analogy, the results produced by abstract interpretation of programs should be considered as correct by a pure semantician, as long as the answers are safe in the following sense. A boolean question can be answered "true", "false" or "I don't know", while answers for the rule of signs could be "positive", "negative" or "?". This apparently crude approach is analogous to the numerical analyst's, and for practical usage the problem is not to give uninformative answers too often, analogous to the problem of obtaining a small $\epsilon$.

An approximate program analysis is safe if the results it gives can always be depended on. The results are allowed to be imprecise as long as they always err "on the safe side", so if boolean variable J is sometimes true, we allow it to be described as "I don't know", but not as "false". Again, in general more precision can be obtained at greater computational cost.

Defining the term "safe" is, however, a bit more subtle than it appears. In applications, e.g. code optimization in a compiler, it usually means "the result of abstract interpretation may safely be used for program transformation", i.e. without changing the program's semantics. To define safety it is essential to understand precisely how the abstract values are to be interpreted in relation to actual computations.

For an example, suppose we have a function definition $f(X_1, \ldots, X_n) = exp$, where $exp$ is an expression in $X_1, \ldots, X_n$. Two subtly different dependency analyses associate with $exp$ a subset of $f$'s arguments:

Analysis I. $\{X_{i_1}, \ldots, X_{i_m}\} = \{X_j \mid exp\text{'s value depends on } X_j \text{ in at least one computation of } f(X_1, \ldots, X_n)\}$

Analysis II. $\{X_{i_1}, \ldots, X_{i_m}\} = \{X_j \mid exp\text{'s value depends on } X_j \text{ in every computation of } f(X_1, \ldots, X_n)\}$

For the example f(W, X, Y, Z)
= if W then (X + Y) else (X + Z), analysis I yields {W, X, Y, Z}, which is the smallest variable set always sufficient to evaluate the expression. Analysis II yields {W, X}, signifying that regardless of the outcome of the test, evaluation of exp requires the values of both W and X, but not necessarily those of Y or Z.

These are both dependence analyses but have different modality. Analysis I, for possible dependence, is used in the binding time analysis phase of partial evaluation: a program transformation which performs as much as possible of a program's computation when given knowledge of only some of its inputs. Any variable depending on at least one unknown input in at least one computation might be unknown at specialization time. Thus if any among W, X, Y, Z are unknown, then the value of exp will be unknown.

Analysis II, for definite dependence, is a need analysis identifying that the values of W and X will always be needed to return the value. Such analyses are used to optimize