Abstract Interpretation: From Theory to Tools

Patrick Cousot

cims.nyu.edu/~pcousot/

pc12ou)(sot@cims$*.nyu.edu

ICSME 2014, Victoria, BC, Canada, 2014-10-02 1 © P. Cousot Bugs everywhere!

Ariane 5.01 failure Patriot failure Mars orbiter loss Russian Proton-M/DM-03 rocket (overflow error) (float rounding error) (unit error) carrying 3 Glonass-M satellites (unknown programming error :)

unsigned int payload = 18; /* Sequence number + random bytes */ unsigned int padding = 16; /* Use minimum padding */

/* Check if padding is too long, payload and padding * must not exceed 2^14 - 3 = 16381 bytes in total. */

OPENSSL_assert(payload + padding <= 16381);

/* Create HeartBeat message, we just use a sequence number * as payload to distuingish different messages and add * some random stuff. * - Message Type, 1 byte * - Payload Length, 2 bytes (unsigned int) * - Payload, the sequence number (2 bytes uint) * - Payload, random bytes (16 bytes uint) * - Padding */

buf = OPENSSL_malloc(1 + 2 + payload + padding); p = buf; /* Message Type */ *p++ = TLS1_HB_REQUEST; /* Payload length (18 bytes here) */ s2n(payload, p); /* Sequence number */ s2n(s->tlsext_hb_seq, p); /* 16 random bytes */ RAND_pseudo_bytes(p, 16); p += 16; /* Random padding */ RAND_pseudo_bytes(p, padding);

ret = dtls1_write_bytes(s, TLS1_RT_HEARTBEAT, buf, 3 + payload + padding);

Heartbleed (buffer overrun)

ICSME 2014, Victoria, BC, Canada, 2014-10-02 2 © P. Cousot Bugs everywhere!

Ariane 5.01 failure Patriot failure Mars orbiter loss Russian Proton-M/DM-03 rocket (overflow error) (float rounding error) (unit error) carrying 3 Glonass-M satellites (unknown programming error :)

unsigned int payload = 18; /* Sequence number + random bytes */ unsigned int padding = 16; /* Use minimum padding */

/* Check if padding is too long, payload and padding * must not exceed 2^14 - 3 = 16381 bytes in total. */

OPENSSL_assert(payload + padding <= 16381);

/* Create HeartBeat message, we just use a sequence number * as payload to distuingish different messages and add * some random stuff. * - Message Type, 1 byte * - Payload Length, 2 bytes (unsigned int) * - Payload, the sequence number (2 bytes uint) * - Payload, random bytes (16 bytes uint) * - Padding */ buf = OPENSSL_malloc(1 + 2 + payload + padding); These are great proofs of the p = buf; /* Message Type */ *p++ = TLS1_HB_REQUEST; • /* Payload length (18 bytes here) */ s2n(payload, p); /* Sequence number */ s2n(s->tlsext_hb_seq, p); /* 16 random bytes */ RAND_pseudo_bytes(p, 16); presence of bugs! p += 16; /* Random padding */ RAND_pseudo_bytes(p, padding);

ret = dtls1_write_bytes(s, TLS1_RT_HEARTBEAT, buf, 3 + payload + padding);

Heartbleed (buffer overrun)

ICSME 2014, Victoria, BC, Canada, 2014-10-02 3 © P. Cousot On the limits of bug finding • Giant software manufacturers can rely on gentle end- users to find myriads of bugs; • But what about:

can passengers really help? • Is dynamic/static bug finding always enough? • Proving the absence of bugs is much better!

ICSME 2014, Victoria, BC, Canada, 2014-10-02 4 © P. Cousot

ICSME 2014, Victoria, BC, Canada, 2014-10-02 5 © P. Cousot Formal Methods • Mathematical and engineering principles applied to the specification, design, construction, verification, maintenance, and evolution of very high quality software • Strongly promoted by Harlan D. Mills since the 70’s e.g. • Harlan D. Mills: The New Math of Computer Programming. Commun. ACM 18(1): 43-48 (1975) • Harlan D. Mills: Software Development. IEEE Trans. Software Eng. 2(4): 265-273 (1976) • Harlan D. Mills: Function Semantics for Sequential Programs. IFIP Congress 1980: 241-250 • ...

ICSME 2014, Victoria, BC, Canada, 2014-10-02 6 © P. Cousot Main formal methods for verification • Objective: prove automatically that a program does satisfy a specification given either explicitly or implicitly (e.g. absence of runtime errors) • Deductive methods: use a theorem prover/proof assistant to check a user-provided proof argument • Enumerative, symbolic, bounded, solver(e.g. Z3)- based, interpolation, statistical, etc model-checking: check the specification by enumerating finitely many possibilities • Abstract interpretation: use approximation ideas to consider infinitely many possiblilities

ICSME 2014, Victoria, BC, Canada, 2014-10-02 7 © P. Cousot Fundamental limitations • By Gödel’s undecidability, no perfect solution is and will ever be possible: • Deductive methods: the burden is on the end-user and the proofs are exponential in the size of programs • Model-checking: severe unsolved scalability problem • Abstract interpretation: may produce false alarms (but no false negative) • Unsound methods (Coverity, Klocwork, Purify, etc): no correctness guarantee at all.

ICSME 2014, Victoria, BC, Canada, 2014-10-02 8 © P. Cousot The Evolution of Formal Methods

ICSME 2014, Victoria, BC, Canada, 2014-10-02 9 © P. Cousot testing are returned to the development team for cor- A traditional project experiencing, say, five rection. If quality is not acceptable, the software is errors/KLOC in function testing may have encount- removed from testing and returned to the develop- ered 25 or more errors per KLOC when measured ment team for rework and reverification. from first execution in unit testing. Quality compar- The process of Cleanroom development and cer- isons between traditional and Cleanroom software tification is carried out incrementally. Integration is are meaningful when measured from first execution. continuous, and system functionality grows with the Experience has shown that there is a qualitative addition of successive increments. When the final difference in the complexity of errors found in increment is complete, the system is complete. Cleanroom and traditional code. Errors left behind Because at each stage the harmonious operation of by Cleanroom correctness verification, if any, tend to future increments at the next level of refinement is be simple mistakes easily found and fixed by statis- predefmed by increments already in execution, inter- tical testing, not decp design or interface errars. face and design errors are rare. Cleanroom errors are not only infrequent, but The Cleanroom process is being successfully usually simple as well. applied in IBM and other organizations. The tech- Highhghts of Cleanroom projects reported in nology requires some training and practice, but Table 1Change are described below:of Scale builds on existing skills and software engineering practices. It is readily applied to both new system• 1993: IBM Flight Control. A III-I60 helicopter avionics development and re-engineering and extension of component was developed on schedule in three existing systems. The IBM Cleanroom Software increments comprising 33 KLOC of JOVIAL [SI. Technology Center (CSTC) [4] provides technology A total of 79 corrections were required during statis- transfer support to Cleanroom teams through educa- tical certification for an error rate of 2.3 errors per tion and consultation. KLOC for verified software with no prior execution or debugging. Cleanroom quality results • 2013: Astrée IBM COBOL checks Structuring automatically Facility the (COBOL/SI;). absence of Table 1 summarizes quality results from anyCOBOL/SF, runtime IBM’s error first in thecommercial control/command Cleanroom Cleanroom projects. Earlier results are reported in softwareproduct, was of developedthe A380 byand a A400Msix-person by team. abstract The [SI. The projects report a ”certification testing interpretationproduct automatically i.e. > 1000 transforms KLOC of unstructuredC failure rate;” for example, the rate for the IBM Flight COBOL programs into functionally equivalent struc- Control project was 2.3 errors per KLOC, andHarlan for D. Mills: Zerotured Defect Software: form Cleanroom for improved Engineering. Advances understandability in Computers 36: 1-41 (1993) and main- the IBM COBOL Structuring Facility project,Patrick 3.4 Cousot, Radhia Cousot, Jérôme Feret, Laurent Mauborgne, Antoine Miné, Xavier Rival: Why does Astrée scale up? Formal Methods in tenance.System Design 35 (3):It 229-264 makes (2009) use of proprietary graph-theoretic errors per KLOC. These numbers represent all algorithms, and exhibits a level of complexity on the errors found in all testing, measured from first-everICSME 2014, Victoria, BC, Canada,order 2014-10-02 of a COBOL compiler. 10 © P. Cousot execution through test completion. That is, the rates The current version of the 85 KLOC PL/I represent residual errors present in the software fol- product required 52 KLOC of new code and 179 lowing correctness verification by development corrections during statistical certification of five teams. increments, for a rate of 3.4 errors per KLOC [7]. The projects in Table 1 produced over a half a Several major components completed certification million lines of Cleanroom code with a range of 0 to with no errors found. In an early support program 5.1 errors per KLOC for an average of 3.3 errors per at a major aerospace corporation, six months of KLOC found in all testing, a remarkable quality intensive use resulted in no functional equivalence achievement indeed. errors ever found [SI. Productivity, including all Traditionally developed software does not specification, design, verification, certification, user undergo correctness verification. It goes from devel- publications, and management, averaged 740 LOC opment to unit testing and debugging, then more per person-month. Challenging schedules defined for debugging in function and system testing. At entry competitive reasons were all met. A major benefit of to unit testing, traditional software typically exhibits Cleanroom products is dramatically reduced mainte- 30-50 errors/KLOC. Traditional projects often nance costs. COBOL/SF has required less than one report errors beginning with function testing (or person-year per year for all maintenance and cus- later), omitting errors found in private unit testing. tomer support.

3 Proliferation

WCET Security protocole Operational Axiomatic Systems biology semantics verification analysis semantics Abstraction Dataflow Model Database refinement Confidentiality checking analysis analysis query Type Partial Obfuscation Dependence Program evaluation inference synthesis Denotational analysis Separation Effect logic Grammar systems semantics CEGAR analysis Theories Program Termination Statistical Trace combination transformation proof semantics model-checking Interpolants Abstract Shape Code analysis Invariance Symbolic contracts Integrity model proof execution analysis checking Malware Probabilistic Quantum entanglement Bisimulation detection verification detection SMT solvers Code refactoring Parsing Type theory Steganography Tautology testers

ICSME 2014, Victoria, BC, Canada, 2014-10-02 11 © P. Cousot The Theory of Abstract Interpretation: Unifies Formal Methods

ICSME 2014, Victoria, BC, Canada, 2014-10-02 12 © P. Cousot The need for a unified account of formal methods

WCET Security protocole Operational Axiomatic Systems biology semantics verification analysis semantics Abstraction Dataflow Model Database refinement Confidentiality checking analysis analysis query Type Partial Obfuscation Dependence Program evaluation inference synthesis Denotational analysis Separation Effect logic Grammar systems semantics CEGAR analysis Theories Program Termination Statistical Trace combination transformation proof semantics model-checking Interpolants Abstract Shape Code analysis Invariance Symbolic contracts Integrity model proof execution analysis checking Malware Probabilistic Quantum entanglement Bisimulation detection verification detection SMT solvers Code refactoring Parsing Type theory Steganography Tautology testers

ICSME 2014, Victoria, BC, Canada, 2014-10-02 13 © P. Cousot Underlying unity of formal methods Abstract interpretation WCET Security protocole Operational Axiomatic Systems biology semantics verification analysis semantics Abstraction Dataflow Model Database refinement Confidentiality checking analysis analysis query Type Partial Obfuscation Dependence Program evaluation inference synthesis Denotational analysis Separation Effect logic Grammar systems semantics CEGAR analysis Theories Program Termination Statistical Trace combination transformation proof semantics model-checking Interpolants Abstract Shape Code analysis Invariance Symbolic contracts Integrity model proof execution analysis checking Malware Probabilistic Quantum entanglement Bisimulation detection verification detection SMT solvers Code refactoring Parsing Type theory Steganography Tautology testers

ICSME 2014, Victoria, BC, Canada, 2014-10-02 14 © P. Cousot Principle of Abstract Interpretation

ICSME 2014, Victoria, BC, Canada, 2014-10-02 15 © P. Cousot Concrete universe of What is abstraction in AI? discourse

ICSME 2014, Victoria, BC, Canada, 2014-10-02 16 © P. Cousot Concrete universe of What is abstraction in AI? discourse Elements

ICSME 2014, Victoria, BC, Canada, 2014-10-02 17 © P. Cousot Concrete universe of What is abstraction in AI? discourse Elements

Properties

ICSME 2014, Victoria, BC, Canada, 2014-10-02 18 © P. Cousot Concrete universe of What is abstraction in AI? discourse Abstract Elements universe of properties

Properties

ICSME 2014, Victoria, BC, Canada, 2014-10-02 19 © P. Cousot Concrete universe of What is abstraction in AI? discourse Abstract Elements universe of properties Abstract properties

Properties

ICSME 2014, Victoria, BC, Canada, 2014-10-02 20 © P. Cousot Concrete universe of What is abstraction in AI? discourse Abstract Elements universe of properties γ Abstract properties α γ α

Properties

ICSME 2014, Victoria, BC, Canada, 2014-10-02 21 © P. Cousot Concrete universe of What is abstraction in AI? discourse Abstract Elements Inclusion universe of properties γ Abstract

⊆ properties α γ α

Properties

ICSME 2014, Victoria, BC, Canada, 2014-10-02 22 © P. Cousot Concrete universe of What is abstraction in AI? discourse Abstract Elements Inclusion universe of properties γ Abstract

⊆ properties α ⊑ γ Abstract α implication

Properties

ICSME 2014, Victoria, BC, Canada, 2014-10-02 23 © P. Cousot Concrete universe of What is abstraction in AI? discourse Abstract Elements Inclusion universe of properties γ Abstract

⊆ properties α ⊑ γ Abstract α implication

Provable abstract properties Properties are true in the concrete ICSME 2014, Victoria, BC, Canada, 2014-10-02 24 © P. Cousot the concrete (Ex. 12). We restate the EMC problem in terms which is itself sound because it correctly abstracts a concrete of the primitives of the underlying abstract domain (EMC, Hoare logic describing precisely the language semantics. Sec. 12). We prove soundness, i.e., a solution to EMC is Both concrete and abstract Hoare logics can be formalized a solution for EMC (Th. 15). We present some examples in a single unified framework using algebraic Hoare triples proving that, in general, a complete EMC is impossible. and abstract interpretation to relate algebraic Hoare logics We abstract the iterated formulation of the EMC solu- operating at different levels of abstraction. tion to provide an effective static analysis to compute EMC In this context the conjunction rule is potentially problem- (Sec. 13). Let us assume to have an abstract transformer atic in the abstract (as illustrated in the forthcoming Ex. 5). (roughly, the two-directional static analysis for the method We cannotAbstract get rid of this conjunction rule because it for- interpretation: example body) safely approximating the concrete semantics of the malizes the use of reduced products [13] in static analyzers. method and a projection of the abstract states in the origi- Therefore we study sufficient conditions on the abstraction nal method (before the extraction of the method) PS ,QS . for this conjunction rule to be sound (Th. 6) which is useful h i beyond the specific problem of method refactoring (Ex. 7). Then the iterations of the abstract transformer starting. from. Roslyn exposes the (C# and VB) compiler internals /* P :pre-state,S:refactored code, ~p: variables potentially PS ,QS provide a correct solution (Th. 20). When the S (syntax trees, object model, data-flow analyses, refactoring, h i used in S, ~g:variablesdefinitelyunmodifiedbyS, Q :post- abstract. . transformer is the best approximation of the con- Theory:We first introduce some definitions and notations used in Applications: S Practice: etc.) to external developers, so that they can develp new crete transformer, the abstract forwards/backwards iterations the rest of the paper. state such that ¯ P ¯ S ¯ Q ¯ holds. */ S ~p ~g S plugins (code analyses, refactorings) on top of it. provide the most precise solution for EMC (Th. 21). When | \ automatic contract refactoring, which automatically infers CCCheck is a static contract verifier for CodeContracts. It the underlying abstract domain does not satisfy the Ascend- Galois Connections We recall from [11] that a Galois RefactorContract(P S, S, ~p, ~g, QS) { analyzes each method in isolation, assuming the precondi- connection C, A, is such that C, and good contracts for the extracted method. ing/Descending chain conditions, a fixpoint acceleration op- ↵ h i ! h vi h i 1 tion and asserting the postcondition. CCCheck can also do erator (narrowing [11]) should be used to enforce the con- A, are partial orders, ↵ C A and C A use A ~p , , 1 // precondition abstract domain h vi 2 ! 2 ! h u i backward analyses to infer a precondition from the postcon- vergence of the iterations (Algorithm EMC in Alg. 5). The satisfy x C : y A : ↵(x) y x (y).We B ~p,~p , 2 , 2 // postcondition abstract domain Forth solution: Abstract states projection An immediate 8 2 8 2 v () h u i dition. CCCheck is based on abstract interpretation and hence resulting contract is still a correct solution (Th. 22), but we write C, A, to denote that the abstraction postJ//K forward analyser with widening/narrowing idea to solve the problem consists in projecting the relevant h i ↵ h vi has more advanced inference capabilities than similar tools. may not get the most general solution — just one more gen- function ↵ is surjective, !! and hence that there are no multiple preJ// backwardK analyser with widening/narrowing For instance, it infers loop invariants and it suggests method variables from the original abstract proof so as to get the eral than the simple projection. representations for the same concrete property in the abstract. preconditions and postconditions (the P , Q in this pa- We implemented the new algorithms by integrating the If the C and A are complete lattices, and ↵ is join-preserving, h m mi required modular proof. Such a solution is unsatisfactory for // abstractf projection on potentially used variables ~p per). CCCheck contains several abstract domains for the heap, Code Contracts for .NET static analyzer (CCCheck)[20] then it exists a unique such that C, A, . h i ↵ h vi non-nullness, numerical properties, array contents, enums, three main reasons. First, it does not work when refactoring with the Microsoft Roslyn CTP (Roslyn)[37]. We use ! P S , QS = ~p ~g(P S), ~p ~g(QS) ; h i h# \ # \ i but also to track (simple) existential and quantified proper- unreachable code: the abstract state is empty, so the generated Roslyn, a new implementation of .NET languages to support Abstract domains We let S S ~v be a statement with // .infer. a correct safety abstract contract 2 ties [20]. Most of these abstract domains use widenings so the compiler-as-service paradigm, as our refactoring engine. visible variables ~v and ~v be the set of unary predicates on precondition is false. Second, too much information may Let P m be the abstract safety pre-condition for S completeness cannot be guaranteed in the theoretical sense We use CCCheck as the underlying static analyzer. Unlike ~v P variables . Predicates can be isomorphicallyJ K represented as computed by the static analysis [18]; for tortuous counter-examples and the contracts cannot tech- be lost (e.g., for relational analyzes) or too much information similar tools (e.g.,[22, 23]), CCCheck is based on abstract ~ Boolean functions P J K~v , ~v B mapping values nically be the most general. The benchmarks ran with the interpretation, and it automatically infers and propagates loop ~ 2 P V ! Qm = post S~p P m; // forward abstract static analysis may be preserved (e.g., not related to the method correctness). ~v ~v of vector values of variables ~v to Booleans: P (~v) default settings show that the inferred contracts can hardly invariants intra-procedurarly, so that annotations are needed 2 V 2 // ¯ P ¯ S ¯ Q ¯ holds For instance, in Ex. 1, the projection of the abstract state B , true, false . PredicatesJ K J areK ordered according to m ~p ~g m be improved manually for the abstraction used by the static only for the method boundaries. The inferred invariants are { } J| \ K ˙= , i.e.J K, the pointwise lifting of logical implication to produces the too strong precondition 5 x for the extracted used to validate both the user-provided contracts as well ) P , Q = P , Q ; analyzer. functions: h R Ri h S S i A screenshot of the extract method with contracts.  as the absence of runtime errors (e.g., null dereference, ~ do . . Figure 1. method. Ideally we’d love to infer the precondition 0 x. P ˙= P 0 , ~v ~v : P (~v)= P 0(~v). The implementation We preferred not to implement our-  underflow/overflow, buffer overruns, etc.). Our experience ) 8 2 V ) // compute X, Y = F R S ( P R, QR ) The suggested contract for the extracted method is valid, safe, Third, programs evolve over time so a refactoring might shows that the proposed method extraction with contracts is For example x . x =0 ˙= x . x > 0. Predicates with h i h i selves the syntactic extract method from scratch. We used ) 1 1 S // partial order ˙= form a completeJ K Boolean lattice: X = P m P R pre ~p QR; backward analysis Roslyncomplete, which and takes the care most of both general the user one. interface (e.g., work when performed but no longer work with later program quite effective (Sec. 14). ) u u J K ˙ ˙ ˙ 2 2 code selection, right click, previews, etc.) and the basic refac- ~v , ˙= , false˙ , true, , , ˙ Y = Qm QR post S~p P R; // forward analysis modifications. hP ) _ ^ ¬i u u f J K torings. Furthermore, we did not want to try our examples where false˙ is the infimum, true˙ is the supremum, ˙ is the P , Q = P X, Q Y ; // narrowing imprecise. In particular, some information present in the _ R R R 1 R 2 5. Algebraic Hoare Logic least upper boundJ K (lub), ˙ is the greatest lower bound (glb), h i h J K i on toy implementations or abstract domains, hence we (mod- Example 2. Suppose that we want to extract a method ^ while P R, QR = X, Y ; ifiedoriginal and) used codeCCCheck (programmerto implement assertions, the EMC algorithm. runtime errors, etc) is We use Hoare logic [29] to formalize Contracts [4, 35]. and ˙ is the unique complement for the partial order ˙= on cch i6 h i MakeRoom from (*)...(**) in the code below. ¬ ) cc cc CCChecklost whenruns the as arefactored background code service is in staticallyRoslyn. While analyzed separately. The concrete Hoare rules are used to specify the program the set ~v . // gfpv F R S P R, QR P S , QS holds P P S , QS vh i vh i Roslyn provides syntactic, source-level, ASTs, CCCheck an- axiomatic semantics, i.e., all possible program executions. The precondition abstract domain A ~v , 1 is an abstract h i . . This can be avoided by using the method safety contract Insert(string[] list, h vi . . alyzes bytecode. Therefore there is some (non-trivial) glue We use an abstract version of Hoare logic to formalize domain expressingJ K properties of the variables ~v where the return P R, QR ; J//K (a) validity & (b) safety hold ref int count, string newElement) { h i codesuggested connecting by theCCCheck two. . When the safety precondition is contract-based separate static analyses. The abstract Hoare partial order 1 abstracts logical implication.J K The meaning v } The extract method with contracts is implemented as a Requires(list != null && rules are used to specify how the static analyzer should work of an abstract property P A ~v is a concrete property violated, the execution of the extracted method will either 2 Visual Studio extension for C#. When the user selects a piece // in bounds for a given abstraction. In this abstract Hoare logic, predi- 1(P ) ~v where the concretization Algorithm 5. Algorithm EMC (Extract Methods with Ab- not terminate or definitely yield a run-time error [18]. So 2 P of code S, Roslyn in the background (and concurrently), 0 <= count && count <= list.Length && cates are replaced by abstract properties chosen in computer- 1 A ~v , 1 J K ~v , ˙= stract Contracts) computing an approximation of a greatest 2h vi ! hP )i invokesthe safety the extension precondition asking it to provide is necessary a refactoring,for if any. avoiding runtime representable abstract domains with computable transformers J K fixpoint with convergence acceleration. // no overflows on resize is increasing (i.e., P 1 P 0 implies 1(P )˙= 1(P 0)). Our extension first forwards the call to the refactoring engine and fixpoint approximation [12] such as intervals [10], oc- v ) errors. As shown by our example, the safety precondition is list.Length < 33554432); J K J K of Roslyn. If no method is extracted from the selection tagons [36], subpolyhedra [30], or polyhedra [16]. Example 3. Assume that ~v , x is reduced to a single in general not sufficient to guarantee the absence of runtime Th. 22 states that all the abstract contracts included between (e.g., not all the branches of S are terminated by a return The general correctness argument is that static analyzers variable x. Let A ~v be the lattice with the ordering 1 defined v the best solution (29) and the abstract projections of the errors: when the safety precondition is satisfied, the execution if (list.Length == count) are correct because they implement an abstract Hoare logic Patrickby the following Cousot, Hasse diagram: Radhia Cousot, Francesco Logozzo, Michael Barnettstatement),: An abstract the extraction fails,interpretation and we stop there. If the abstract states are a solution of our problem. A natural way to extractionof the extracted succeeds, then method we generate may aor contract may not for the fail/terminate new 2. Once { J K compute P , Q is to perform the downwards iterates of framework for refactoring with applicationh R Rtoi extract methods withmethod.the contracts necessary safety. OOPSLA precondition 2012 is inferred: 213-232 [18], it can be (*) var tmp = new string[count*2 + 1]; F R S from P , Q with narrowing. The algorithm EMC The first step of the algorithm EMC is to deduce P , Q , h S S i h S S i CopyArray(list, tmp); is given in Alg. .5. An. optimization using chaotic iterations theused starting to get point a for safety the greatest post-condition fixpoint computation. by. isolated In the-. reachability withJ memoryK [8] would have ory,analysis this information of the methodcan be obtained body by [ fetching11, 13 the]. program list = tmp; (**) ICSME 2014, Victoria, BC, Canada, 2014-10-02 © P. Cousot Y = Q 2 Q 2 post S X25; // forward analysis points corresponding to the user selection, and then asking } m u R u ~p In general, an independent separate safety static analysis This is the solution we implemented, with more details given CCCheck for the corresponding invariants and Roslyn for list[count++] = newElement; Sof the. Unfortunately extracted there method are some which practical does issues that not com- take into account in the next section. ~p ~g J K | \ return list; plicatethe pre-invariant the theoretical schema. and post-invariant First, CCCheck does of not the keep selected code is 14. Experience } antoo explicit weak. map It from might source not locations be strong to bytecode enough offsets, to but guarantee that the The underlying tools We implemented the algorithms of only the inverse map, used to report warnings and sugges- refactored code invariant is still provable separately. Our main If we simply project the abstract states, the contract for the the previous section on top of two industrial-strength tools, tions. Second, for memory consumption reasons, CCCheck new method is Roslyn and CCCheck. throwsmotivation away the for inferred this workinvariants was once that it is the done isolated with the analysis raised numerous (and self-evident) complaints from end-users of Requires(list!=null && list.Length==count (1) CCCheck. && count <= 33554432); Ensures(Result() != null); Third solution: User assistance Another way of solving The precondition is too strong for the callers. The refactored the problem is to require the user to provide the precondition method MakeRoom can only be invoked when count is ex- and the postcondition for the extracted method. This is actly the length of the array and the array is not too large the actual state of the art: programmers using any form (less than 225 elements). Furthermore, in the postcondition, of DbC (CodeContracts, Spec#, JML, Eiffel, Separation because of the imprecision of the projection, we lost the rela- Logic, etc.) need to manually insert the contracts for the tion between the length of the result array and count. With extracted methods. We think that this is overkill and that our technique, we instead infer the more general contract: this represents another barrier for a wider adoption of DbC methods. We think that method extraction should come with Requires(list != null && 0 <= count * 2 + 1); Ensures(Result() != null 2 The safety precondition is not a weakest liberal precondition that would && 2 * count - Result().Length == -1); be sufficient but maybe not necessary to guarantee the absence of runtime The precondition ensures that the internal allocation is safe errors. The difference is that this sufficient precondition might exclude valid executions while the necessary safety precondition only excludes executions (even with possible arithmetic overflows) and that we copy a which are guaranteed to be definitely invalid or will not terminate. valid list. The postcondition guarantees that the array returned A very informal introduction to abstract interpretation

Patrick Cousot, Radhia Cousot: Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints. POPL 1977: 238-252 Patrick Cousot, Radhia Cousot: Systematic Design of Program Analysis Frameworks. POPL 1979: 269-282

ICSME 2014, Victoria, BC, Canada, 2014-10-02 26 © P. Cousot 1) Define the programming language semantics Formalize the concrete executions of programs (e.g. transition system) y y (x,y)

t=0 x t=2 x t=… t=1 t

Trajectory Space/time trajectory in state space

ICSME 2014, Victoria, BC, Canada, 2014-10-02 27 © P. Cousot II) Define the program properties of interest Formalize what you are interested to know about program behaviors

We are interested in the set of possible trajectories

ICSME 2014, Victoria, BC, Canada, 2014-10-02 28 © P. Cousot III) Define which specification must be checked Formalize what you are interested to prove about program behaviors

Forbiden zone

Possible trajectories

No trajectory should hit the forbidden zone

ICSME 2014, Victoria, BC, Canada, 2014-10-02 29 © P. Cousot IV) Choose the appropriate abstraction Abstract away all information on program behaviors irrelevant to the proof

Zone interdite

Possible trajectories

Abstraction of the trajectories

Abstraction by geometric forms (rectangles, polyhedra, ellipsoids, abstraction by parts, etc)

ICSME 2014, Victoria, BC, Canada, 2014-10-02 30 © P. Cousot V) Mechanically verify in the abstract The proof is fully automatic

Concrete Concrete universe of What is abstraction in AI? universe of What is abstraction in AI? Forbiddendiscourse zone Abstract discourse Abstract Elements universe of Elements Inclusion universe of properties properties γ Abstract γ Abstract

properties ⊆ properties α α γ γ α α Possible trajectories Properties Properties

ICSME 2014, Victoria, BC, Canada, 2014-10-02 21 © P. Cousot ICSME 2014, Victoria, BC, Canada, 2014-10-02 22 © P. Cousot

Concrete Concrete universe of What is abstraction in AI? universe of What is abstraction in AI? discourse Abstract discourse Abstract Abstraction of theElements trajectoriesInclusion universe of Elements Inclusion universe of properties properties γ Abstract γ Abstract

⊆ properties ⊆ properties α α ⊑ ⊑ γ γ Abstract Abstract α implication α implication

Provable abstract properties Properties Properties are true in the concrete ICSME 2014, Victoria, BC, Canada, 2014-10-02 23 © P. Cousot ICSME 2014, Victoria, BC, Canada, 2014-10-02 24 © P. Cousot ICSME 2014, Victoria, BC, Canada, 2014-10-02 31 © P. Cousot Soundness of the abstract verification Never forget any possible case so the abstract proof is correct in the concrete

Forbidden zone

Possible trajectories

Abstraction of the trajectories

ICSME 2014, Victoria, BC, Canada, 2014-10-02 32 © P. Cousot Unsound validation: testing Try a few cases

ICSME 2014, Victoria, BC, Canada, 2014-10-02 33 © P. Cousot Unsound validation: bounded model-checking Simulate the beginning of all executions

Forbidden zone

Possible trajectories

Bounded model-checking

ICSME 2014, Victoria, BC, Canada, 2014-10-02 34 © P. Cousot Unsound validation: static analysis Many static analysis tools are unsound (e.g. Coverity, etc.) so inconclusive

ICSME 2014, Victoria, BC, Canada, 2014-10-02 35 © P. Cousot Incompleteness When abstract proofs may fail while concrete proofs would succeed

Forbidden zone Alarm !!!

Possible trajectories

Error or false alarm ?

By soundness an alarm must be raised for this overapproximation!

ICSME 2014, Victoria, BC, Canada, 2014-10-02 36 © P. Cousot True error The abstract alarm may correspond to a concrete error

Forbidden zone Alarm !!!

Possible trajectories

Error

ICSME 2014, Victoria, BC, Canada, 2014-10-02 37 © P. Cousot False alarm The abstract alarm may correspond to no concrete error (false negative)

Forbidden zone Alarm !!!

Possible trajectories

False alarm

The only solution is to refine the analysis to take more properties into account (e.g. specifically for a domain of application)! ICSME 2014, Victoria, BC, Canada, 2014-10-02 38 © P. Cousot II.P. Combination of abstract domains Abstract interpretation-based tools usually use several dierent abstract domains, since the design of a complex one is best decomposed into a combination of simpler abstract domains. Here are a few abstract domain examplesCombination used in the Astree´ static of analyzer: abstractions2 in Astrée y y y

x x x

Collecting semantics:1, 5 Intervals:20 Simple congruences:24 partial traces x [a, b] x a[b] ⌃ ⌅ y y y t x x

Octagons:25 Ellipses:26 Exponentials:27 2 2 bt bt x y ⇥ a x + by axy ⇥ d a ⇥ y(t) ⇥ a ICSME 2014, Victoria, BC,± Canada,± 2014-10-02 39 © P. Cousot Such abstract domains (and more) are described in more details in Sects. III.H–III.I. The following classic abstract domains, however, are not used in Astree´ because they are either too imprecise, not scalable, di⌅cult to implement correctly (for instance, soundness may be an issue in the event of floating-point rounding), or out of scope (determining program properties which are usually of no interest to prove the specification): y y y

x x x

Polyhedra:9 Signs:7 Linear congruences:28 too costly too imprecise out of scope

Because abstract domains do not use a uniform machine representation of the information they manip- ulate, combining them is not completely trivial. The conjunction of abstract program properties has to be performed, ideally, by a reduced product 7 for Galois connection abstractions. In absence of a Galois connec- tion or for performance reasons, the conjunction is performed using an easily computable but not optimal over-approximation of this combination of abstract domains. Assume that we have designed several abstract domains and compute lfp F1 D1,...,lfp Fn Dn in these abstract domains D ,...,D , relative to a collecting semantics t I. The⌃ combination of⌃ these 1 n C analyses is sound as t I (lfp F ) (lfp F ). However, only combining the analysis results is C ⇧ 1 1 ··· n n not very precise, as it does not permit analyses to improve each other duringJ theK computation. Consider, for instance, that intervalJ andK parity analyses find respectively that x [0, 100] and x is odd at some iteration. Combining the results would enable the interval analysis to continue⌃ with the interval x [1, 99] and, e.g., avoid a useless widening. This is not possible with analyses carried out independently. ⌃ Combining the analyses by a reduced product, the proof becomes “let F ( x ,...,x ) ⇥( F (x ),..., 1 n⌦ 1 1 Fn(xn ) and r1,...,rn =lfp F in t I 1(r1) n(rn)” where ⇥ performs the reduction between abstract⌦ domains. For example⌦ ⇥( [0,C100], odd⇧ )= [1···, 99], odd . ⌦ ⌦ J K 10 of 38

American Institute of Aeronautics and Astronautics Examples of abstract interpretation-based program verification tools

ICSME 2014, Victoria, BC, Canada, 2014-10-02 40 © P. Cousot Example 1: Astrée

ICSME 2014, Victoria, BC, Canada, 2014-10-02 41 © P. Cousot Astrée • Commercially available: www.absint.com/astree/

• Effectively used in production to qualify truly large and complex software in transportation, communications, medicine, etc

Bruno Blanchet, Patrick Cousot, Radhia Cousot, Jérôme Feret, Laurent Mauborgne, Antoine Miné, David Monniaux, Xavier Rival: A static analyzer for large safety-critical software. PLDI 2003: 196-207 ICSME 2014, Victoria, BC, Canada, 2014-10-02 42 © P. Cousot Example of domain-specificExample of analysis abstraction: by Astrée ellipses(suite) typedef enum {FALSE = 0, TRUE = 1} BOOLEAN; BOOLEAN INIT; float P, X; void filter () { static float E[2], S[2]; if (INIT) { S[0] = X; P = X; E[0] = X; } else {P=(((((0.5*X)-(E[0]*0.7))+(E[1]*0.4)) +(S[0]*1.5))-(S[1]*0.7));} E[1] = E[0]; E[0] = X; S[1] = S[0]; S[0] = P; /* S[0], S[1] in [[?????,-1327.02698354 ?????], 1327.02698354]*/ To be inferred,II.P. Combination not oftested, abstract domains } Abstract interpretation-based tools usually use several dierent abstract domains, since the design of a checked,complex or one is verified best decomposed into a combination of simpler abstract domains. Here are a few abstract domain examples used in the Astree´ static analyzer:2 void main () { X = 0.2 * X + 5; INIT = TRUE; y y y while (1) { x x x

X=0.9*X+35;/* simulated filter inputCollecting */ semantics:1, 5 Intervals:20 Simple congruences:24 partial traces x [a, b] x a[b] ⌅ ⇥ filter (); INIT = FALSE; } y y y t } x x

25 26 27 ICSME 2014, Victoria, BC, Canada, 2014-10-02 43 Octagons: ©Ellipses: P. Cousot Exponentials: x y a x2 + by2 axy d abt y(t) abt ± ± ⇥ ⇥ ⇥ ⇥ Such abstract domains (and more) are described in more details in Sects. III.H–III.I. The following classic abstract domains, however, are not used in Astree´ because they are either too imprecise, not scalable, di⇥cult to implement correctly (for instance, soundness may be an issue in the event of floating-point rounding), or out of scope (determining program properties which are usually of no interest to prove the specification): y y y

x x FICS’08, Shanghai, 3–6/6/2008 — 64 — x © P. Cousot

Polyhedra:9 Signs:7 Linear congruences:28 too costly too imprecise out of scope

Because abstract domains do not use a uniform machine representation of the information they manip- ulate, combining them is not completely trivial. The conjunction of abstract program properties has to be performed, ideally, by a reduced product 7 for Galois connection abstractions. In absence of a Galois connec- tion or for performance reasons, the conjunction is performed using an easily computable but not optimal over-approximation of this combination of abstract domains. Assume that we have designed several abstract domains and compute lfp F1 D1,...,lfp Fn Dn in these abstract domains D ,...,D , relative to a collecting semantics t I. The⌅ combination of⌅ these 1 n C analyses is sound as t I (lfp F ) (lfp F ). However, only combining the analysis results is C ⇤ 1 1 ⇧ ···⇧ n n not very precise, as it does not permit analyses to improve each other duringJ theK computation. Consider, for instance, that intervalJ andK parity analyses find respectively that x [0, 100] and x is odd at some iteration. Combining the results would enable the interval analysis to continue⌅ with the interval x [1, 99] and, e.g., avoid a useless widening. This is not possible with analyses carried out independently. ⌅ Combining the analyses by a reduced product, the proof becomes “let F ( x ,...,x ) ⇥( F (x ),..., ⌃ 1 n⌥ ⌃ 1 1 Fn(xn ) and r1,...,rn =lfp F in t I 1(r1) n(rn)” where ⇥ performs the reduction between abstract⌥ domains.⌃ For example⌥ ⇥( [0,C100], odd⇤ )= ⇧[1···, 99]⇧, odd . ⌃ ⌥ ⌃ ⌥ J K 10 of 38

American Institute of Aeronautics and Astronautics Abstract interpretation • Abstract interpretation is the only formal method able to automatically infer program properties • All others can only check your assertions

Types are abstract interpretations, see Patrick Cousot: Types as Abstract Interpretations. POPL 1997: 316-331

ICSME 2014, Victoria, BC, Canada, 2014-10-02 44 © P. Cousot Example of domain-specificExample of analysis abstraction: by Astrée ellipses(suite) typedef enum {FALSE = 0, TRUE = 1} BOOLEAN; BOOLEAN INIT; float P, X; void filter () { static float E[2], S[2]; if (INIT) { S[0] = X; P = X; E[0] = X; } else {P=(((((0.5*X)-(E[0]*0.7))+(E[1]*0.4)) +(S[0]*1.5))-(S[1]*0.7));} E[1] = E[0]; E[0] = X; S[1] = S[0]; S[0] = P; /* S[0], S[1] in [[?????,-1327.02698354 ?????], 1327.02698354]*/ To be inferred,II.P. Combination not oftested, abstract domains } Abstract interpretation-based tools usually use several dierent abstract domains, since the design of a checked,complex or one is verified best decomposed into a combination of simpler abstract domains. Here are a few abstract domain examples used in the Astree´ static analyzer:2 void main () { X = 0.2 * X + 5; INIT = TRUE; y y y while (1) { x x x

X=0.9*X+35;/* simulated filter inputCollecting */ semantics:1, 5 Intervals:20 Simple congruences:24 partial traces x [a, b] x a[b] ⌅ ⇥ filter (); INIT = FALSE; } y y y t } x x

25 26 27 ICSME 2014, Victoria, BC, Canada, 2014-10-02 45 Octagons: ©Ellipses: P. Cousot Exponentials: x y a x2 + by2 axy d abt y(t) abt ± ± ⇥ ⇥ ⇥ ⇥ Such abstract domains (and more) are described in more details in Sects. III.H–III.I. The following classic abstract domains, however, are not used in Astree´ because they are either too imprecise, not scalable, di⇥cult to implement correctly (for instance, soundness may be an issue in the event of floating-point rounding), or out of scope (determining program properties which are usually of no interest to prove the specification): y y y

x x FICS’08, Shanghai, 3–6/6/2008 — 64 — x © P. Cousot

Polyhedra:9 Signs:7 Linear congruences:28 too costly too imprecise out of scope

Because abstract domains do not use a uniform machine representation of the information they manip- ulate, combining them is not completely trivial. The conjunction of abstract program properties has to be performed, ideally, by a reduced product 7 for Galois connection abstractions. In absence of a Galois connec- tion or for performance reasons, the conjunction is performed using an easily computable but not optimal over-approximation of this combination of abstract domains. Assume that we have designed several abstract domains and compute lfp F1 D1,...,lfp Fn Dn in these abstract domains D ,...,D , relative to a collecting semantics t I. The⌅ combination of⌅ these 1 n C analyses is sound as t I (lfp F ) (lfp F ). However, only combining the analysis results is C ⇤ 1 1 ⇧ ···⇧ n n not very precise, as it does not permit analyses to improve each other duringJ theK computation. Consider, for instance, that intervalJ andK parity analyses find respectively that x [0, 100] and x is odd at some iteration. Combining the results would enable the interval analysis to continue⌅ with the interval x [1, 99] and, e.g., avoid a useless widening. This is not possible with analyses carried out independently. ⌅ Combining the analyses by a reduced product, the proof becomes “let F ( x ,...,x ) ⇥( F (x ),..., ⌃ 1 n⌥ ⌃ 1 1 Fn(xn ) and r1,...,rn =lfp F in t I 1(r1) n(rn)” where ⇥ performs the reduction between abstract⌥ domains.⌃ For example⌥ ⇥( [0,C100], odd⇤ )= ⇧[1···, 99]⇧, odd . ⌃ ⌥ ⌃ ⌥ J K 10 of 38

American Institute of Aeronautics and Astronautics Example of domain-specificExample of analysis abstraction: by Astrée ellipses(suite) typedef enum {FALSE = 0, TRUE = 1} BOOLEAN; BOOLEAN INIT; float P, X; void filter () { static float E[2], S[2]; if (INIT) { S[0] = X; P = X; E[0] = X; } else {P=(((((0.5*X)-(E[0]*0.7))+(E[1]*0.4)) +(S[0]*1.5))-(S[1]*0.7));} E[1] = E[0]; E[0] = X; S[1] = S[0]; S[0] = P; /* S[0], S[1] in [[-1418.3753,-1327.02698354 1418.3753], 1327.02698354]*/ II.P. Combination of abstract domains } Abstract interpretation-based tools usually use several dierent abstract domains, since the design of a complex one is best decomposed into a combination of simpler abstract domains. Here are a few abstract domain examples used in the Astree´ static analyzer:2 void main () { X = 0.2 * X + 5; INIT = TRUE; y y y while (1) { x x x

X=0.9*X+35;/* simulated filter inputCollecting */ semantics:1, 5 Intervals:20 Simple congruences:24 partial traces x [a, b] x a[b] ⌅ ⇥ filter (); INIT = FALSE; } y y y t } x x

25 26 27 ICSME 2014, Victoria, BC, Canada, 2014-10-02 46 Octagons: ©Ellipses: P. Cousot Exponentials: x y a x2 + by2 axy d abt y(t) abt ± ± ⇥ ⇥ ⇥ ⇥ Such abstract domains (and more) are described in more details in Sects. III.H–III.I. The following classic abstract domains, however, are not used in Astree´ because they are either too imprecise, not scalable, di⇥cult to implement correctly (for instance, soundness may be an issue in the event of floating-point rounding), or out of scope (determining program properties which are usually of no interest to prove the specification): y y y

x x FICS’08, Shanghai, 3–6/6/2008 — 64 — x © P. Cousot

Polyhedra:9 Signs:7 Linear congruences:28 too costly too imprecise out of scope

Because abstract domains do not use a uniform machine representation of the information they manip- ulate, combining them is not completely trivial. The conjunction of abstract program properties has to be performed, ideally, by a reduced product 7 for Galois connection abstractions. In absence of a Galois connec- tion or for performance reasons, the conjunction is performed using an easily computable but not optimal over-approximation of this combination of abstract domains. Assume that we have designed several abstract domains and compute lfp F1 D1,...,lfp Fn Dn in these abstract domains D ,...,D , relative to a collecting semantics t I. The⌅ combination of⌅ these 1 n C analyses is sound as t I (lfp F ) (lfp F ). However, only combining the analysis results is C ⇤ 1 1 ⇧ ···⇧ n n not very precise, as it does not permit analyses to improve each other duringJ theK computation. Consider, for instance, that intervalJ andK parity analyses find respectively that x [0, 100] and x is odd at some iteration. Combining the results would enable the interval analysis to continue⌅ with the interval x [1, 99] and, e.g., avoid a useless widening. This is not possible with analyses carried out independently. ⌅ Combining the analyses by a reduced product, the proof becomes “let F ( x ,...,x ) ⇥( F (x ),..., ⌃ 1 n⌥ ⌃ 1 1 Fn(xn ) and r1,...,rn =lfp F in t I 1(r1) n(rn)” where ⇥ performs the reduction between abstract⌥ domains.⌃ For example⌥ ⇥( [0,C100], odd⇤ )= ⇧[1···, 99]⇧, odd . ⌃ ⌥ ⌃ ⌥ J K 10 of 38

American Institute of Aeronautics and Astronautics Example II: cccheck

ICSME 2014, Victoria, BC, Canada, 2014-10-02 47 © P. Cousot Code Contract Static Checker (cccheck) • Available within MS Visual Studio

Manuel Fähndrich, Francesco Logozzo: Static Contract Checking with Abstract Interpretation. FoVeOOS 2010: 10-30 ICSME 2014, Victoria, BC, Canada, 2014-10-02 48 © P. Cousot Comments on screenshot (courtesy Francesco Logozzo) • A screenshot from Clousot/cccheck on the classic binary search. • The screenshot shows from left to right and top to bottom 1. C# code + CodeContracts with a buggy BinarySearch 2. cccheck integration in VS (right pane with all the options integrated in the VS project system) 3. cccheck messages in the VS error list • The features of cccheck that it shows are: 1. basic abstract interpretation: a. the loop invariant to prove the array access correct and that the arithmetic operation may overflow is inferred fully automatically b. different from deductive methods as e.g. ESC/Java or Boogie or Dafny where the loop invariant must be provided by the end-user 2. inference of necessary preconditions: a. Clousot finds that array may be null (message 3) b. Clousot suggests and propagates a necessary precondition invariant (message 1) 3. array analysis (+ disjunctive reasoning): a. to prove the postcondition one must infer properties of the content of the array b. please note that the postcondition is true even if there is no precondition requiring the array to be sorted. 4. verified code repairs: a. from the inferred loop invariant does not follow that index computation does not overflow b. suggest a code fix for it (message 2)

ICSME 2014, Victoria, BC, Canada, 2014-10-02 49 © P. Cousot Conclusion

ICSME 2014, Victoria, BC, Canada, 2014-10-02 50 © P. Cousot To explore abstract interpretation...

Abstract Interpretation: Past, Present and Future

Patrick Cousot Radhia Cousot

CIMS ⇤, NYU, USA CNRS Emeritus, ENS ⇤⇤, s.t uops eiu@ d.ncyo ucm o @ct eo .snu frrs A good starting point: • Abstract the system. Because reasoning on a system involves determining Abstract interpretation is a theory of abstraction and constructive or proving its properties, the central concept is the abstraction of properties of the system, starting from the strongest one, as speci- approximation of the mathematical structures used in the formal 1 description of complex or infinite systems and the inference or ver- fied by the system semantics (and called the collecting semantics ). ification of their combinatorial or undecidable properties. Devel- This is the purpose of abstract interpretation (where “interpreta- oped in the late seventies, it has been since then used, implicitly tion” stands both for “meaning” and “execution”). A few gentle Patrick Cousotor explicitly, to many and aspects of computer Radhia science (such as static introductionsCousot to abstract interpretation: [44 , 79] can be consulted analysis and verification, contract inference, type inference, ter- for a first approach, including some publicly available on the web mination inference, model-checking, abstraction/refinement, pro- (e.g. web.mit.edu/16.399/www/). gram transformation (including watermarking, obfuscation, etc), combination of decision procedures, security, malware detection, Abstract interpretation:database queries, etc) and more recently, to system biology and 2.past, Scope present and future. SAT/SMT solvers. Production-quality verification tools based on Abstract interpretation comprehends undecidable problems (hence abstract interpretation are available and used in the advanced soft- is also applicable to decidable but complex ones). This implies that ware, hardware, transportation, communication, and medical in- any tool (prover, checker, analyzer) designed by abstract interpre- dustries. tation will fail on infinitely many counter-examples. For example The talk will consist in an introduction to the basic notions of a finiteness or decidability hypothesis will only be applicable to a abstract interpretation and the induced methodology for the sys- very restricted class of programs with finite behavior, hence will tematic development of sound abstract interpretation-based tools. fail on infinitely many other ones. This is inherent to undecidable In: Examples of abstractions will be provided, from semantics to typ- problems hence inescapable. By failure we understand being un- ing, grammars to safety, reachability to potential/definite termina- sound/incorrect, non-terminating, using a human oracle to assist tion, numerical to protein-protein abstractions, as well as applica- the computer, etc. Although abstract interpretation also applies to tions (including those in industrial use) to software, hardware and these cases2, it is usually used for sound, terminating, and fully system biology. automatic program analysis/verification, including the inference of Thomas A. Henzinger,This paper is a general discussion of abstract interpretation,Dale with soundMiller inductive arguments (like invariants)(Eds.): to deal with infinite re-Joint Meeting selected publications, which unfortunately are far from exhaustive currences for unbounded/non-terminating executions, which makes both in the considered themes and the corresponding references. the problem particularly difficult, with a very high complexity. Categories and Subject Descriptors D.2.4 [Software/Program of the Twenty-ThirdVerification]; D.3.1 [Formal Definitions andEACSL Theory]; F.3.1 [Spec- 3. Annual Static analysis Conference on ifying and Verifying and Reasoning about Programs]. The origin of abstract interpretation is in General Terms Algorithms, Languages, Reliability, Security, [50, 51] where reachable states are abstracted by local interval nu- Theory, Verification. merical invariants understood as a generalization of type inference Computer ScienceKeywords Abstract interpretation, Semantics,Logic Proof, Verification, (CSL)[49, 54]. The abstraction and from the collecting the semantics was Twenty-Ninth for- Static Analysis. malized by a Galois insertion and convergence acceleration of the iterates by widening, later improved by narrowing. The main in- 1. Abstraction novations at the time were to consider infinite non-Noetherian ab- stractions of infinite systems and to prove rigorously the correct- No reasoning on complex systems, including computer systems, ness of the static analysis with respect to a formal semantics (see Annual ACM/IEEEcan be done without abstracting Symposium the behavior, i.e. the semantics, of on6 Logic in Computer more details in footnote ).

⇤ Work supported in part by the CMACS NSF award 0926166. Work supported in part by the European ARTEMIS project MBAT (grant ⇤⇤ 1 The collecting (or static in [52]) semantics is the strongest property of the agreement No. 269335). standard semantics. Science (LICS),Permission to make digitalCSL-LICS or hard copies of part or all of this work for personal or '14, Vienna, Austria, July 14 - classroom use is granted without fee provided that copies are not made or distributed 2 For example, some commercial products do consider only two iterations for profit or commercial advantage and that copies bear this notice and the full citation in loops without widening, which is an under-approximation of an over- on the first page. Copyrights for third-party components of this work must be honored. approximating abstraction of program executions which can be formalized For all other uses, contact the Owner/Author. by abstract interpretation theory. The theory also proves beyond doubt Copyright is held by the owner/author(s). that the result cannot be claimed to be a sound over-approximation of the 18, 2014. ACMCSL-LICS 2014, July2014 14–18, 2014, Vienna, Austria., ISBN program978-1-4503-2886-9 behavior, a conclusion which is not always stated clearly enough Copyright c 2014 ACM ACM 978-1-4503-2886-9/14/07. . . $15.00. for practitioners to have a precise understanding of the scope of commercial http://dx.doi.org/10.1145/2603088.2603165 static analyzers.

ICSME 2014, Victoria, BC, Canada, 2014-10-02 51 © P. Cousot Conclusion • 40 years after Harlan D. Mills pioneer ideas, abstract interpretation-based formal methods have made considerable progress both in theory and practice • May become indispensable as • safety and security become central to • programmers are held responsible for their errors • machines hence programming becomes more and more complicated (if not intractable, e.g. parallelism, cloud, etc)

ICSME 2014, Victoria, BC, Canada, 2014-10-02 52 © P. Cousot The End, Thank You

ICSME 2014, Victoria, BC, Canada, 2014-10-02 53 © P. Cousot