Static Analysis by Abstract Interpretation of Embedded Critical Software

Julien Bertrane Radhia Cousot J´erˆomeFeret Laurent Mauborgne ENS∗ ENS∗ & CIMS§ CNRS & ENS∗ INRIA & ENS∗ IMDEA∗∗ Antoine Min´e Xavier Rival CNRS & ENS∗ INRIA & ENS∗

Abstract

Formal methods are increasingly used to help ensuring the correctness of complex, critical embedded software systems. modeling We show how sound semantic static analyses based on Ab- S static analysis stract Interpretation may be used to check properties at var- A ious levels of a software design: from high level models to low O level binary code. After a short introduction to the Abstract code generation V Interpretation theory, we present a few current applications: a checking for run-time errors at the C level, translation val- l static analysis i idation from C to assembly, and analyzing SAO models of d C a communicating synchronous systems with imperfect clocks. t translation i We conclude by briefly proposing some requirements to ap- o compilation validation ply Abstract Interpretation to modeling languages such as n UML. B static analysis I keywords: Abstract interpretation, Critical software, Em- N test bedded systems, Static analysis, System design, System modeling, System verification. execution

1 Introduction

Ensuring the correctness of software systems constitutes a Figure 1: Example workflow for designing an embedded ap- large part of software development budgets. It is particu- plication. larly important for critical embedded systems, such as found in automotive, aerospace and medical applications, as the slightest programming “bug” may have catastrophic finan- based static analysis, which always terminates and covers all cial and even human cost. In this article, we build a case executions, albeit in an over-approximated way. For example for using static analysis based on Abstract Interpretation to it can detect dead code (never executed) or dead data struc- help ensuring software correctness. tures (constructed but never used) but cannot always prove We illustrate a possible use for static analysis in Fig.1. In their absence. Moreover, static analysis can be applied at this drastically simplified workflow inspired from a real in- many levels: machine-readable specification, program source dustrial case [27], an engineer (not necessarily a programmer) or binary. The higher level the better, as it provides purer in- models a control system using the SAO graphical language, formation to the tool and its feedback is easier to understand a precursor and similar tool as SCADE [14] — a SimulinkTM to the designer. However, higher levels abstract away some fragment similar to SAO is also shown in Fig.2. It is then au- aspects of computations, which makes it impossible to check tomatically translated to the programming language C and some properties of actual executions. For instance, SAO and then compiled to produce the actual binary software exe- SCADE have real arithmetics and do not specify how actual cuted by the device. Validation includes testing, which re- numerical computations are performed nor the type of num- quires executing (part of) the binary with some monitoring bers, so that a static analysis of numeric overflows (as done and is able to check a wide range of properties (including by Astree´ [1, 11,5], Sec.3) or of the precision of floating- functional ones) but is costly and never achieves a full cover- point computations (as done by Fluctuat [17]) is done at the age of all possible executions (path- and data-coverage). For- C level — a static analysis of real expressions at the SCADE mal methods can also be employed. In particular, semantic- level may however be used to determine its numerically most precise float compilation to C [20]. Likewise, neither SAO nor (∗)Ecole´ Normale Sup´erieure, 45, rue d’Ulm, Paris, , C make any guarantee about the worst-case execution time, Ls e.@tr .an rssfitF (§)Courant Institute of Mathematical Sciences, NYU, New York, NY so, such an analysis (as done by AbsInt’s aiT [18]) is done at (∗∗)Fundaci´onIMDEA Software, 28660-Boadilla del Monte, Madrid, the binary level and for a specific processor. A SAO model Spain, m u oa eu amal .eibrrnt ordg [email protected] can also be enriched with non-software elements, such as real-

1 a b 2 Abstract Interpretation

- We provide a succinct introduction to Abstract Interpreta- tion, more details are provided e.g. in [3]. ++ -1 + z z-1 t Unit delay Unit delay 2.1 Small-Step Operational Semantics Switch Switch In order to analyze the behavior of a computer system during x(n) Switch j B i execution, we start by providing a model of computations, that is, an operational semantics. An example of operational Figure 2: A second-order digital filter. semantics for UML-Statecharts is [31]. Such an operational semantics of a given program can be described as a transition system hS, I, E, ti. S is a set of states, including initial states I ⊆ S and bad or erroneous states E ⊆ S. t ⊆ S × S is a transition relation between a time clocks and communication lines with delays, to enable state s ∈ S and its possible successors: for any state s0 ∈ S, a static analysis taking time into account (Sec.5). Finally, t(s, s0) is true if, and only if, s0 is a potential successor of s. 0 static analysis can also improve the confidence in compil- The blocking states have no successor B , {s ∈ S | ∀s ∈ S : ers and code generators: translation validation (Sec.4) can ¬t(s, s0)}. check whether the source and output are equivalent, at least with respect to a class of properties, so that the analysis for 2.2 Big-Step Operational Semantics such properties at a higher level needs not be redone at the lowest level (which is often more difficult). The big-step operational semantics of hS, I, E, ti is hS, I, E, t?i where t? S tn is the reflexive transitive closure , n>0 Ideally, a static analyzer should extract automatically pre- (1) 0 of t, t , 1S , {hs, si | s ∈ S} is the identity relation on cise properties from a complete mathematical specification of n+1 n (2) S, t , t ◦ t where ◦ is the composition of relations. the analyzed system. Most properties are however undecid- ? ? We have t = T (t ) where T (r) , 1S ∪ r ◦ t since a state able, so, we resort to abstraction, i.e., the analyzer explores 0 s is reachable from s in n > 0 steps if and only if n = 0 and machine-representable supersets of actual behaviors of the s = s0 or n > 0 and s0 is reachable from a successor of s in system using tractable algorithms. As a consequence, the ? n − 1 steps. Moreover if T (r) = 1S ∪ t ◦ r ⊆ r then t ⊆ r. It analyzer may consider spurious behaviors and miss proper- ⊆ follows that t? = lfp T , by definition of the ⊆-least fixpoint ties, but the analysis is sound: all the properties that are ⊆ (3) found (absence of run-time errors, worst-case execution time, lfp T of T . Because all non-trivial properties of programs are undecid- etc.) are indeed true for all executions. A specificity of static ⊆ analysis is that it works directly on the concrete system that able, t? = lfp T is not computable for infinite state transi- is input to or code generators, and the abstracted tion systems hS, I, E, ti (except for trivial programs t and system is derived automatically according to built-in abstrac- specifications E). tion mechanisms; no abstract system need to be provided — which would pose the question of whether it indeed corre- 2.3 Specification sponds to the concrete one. We use the Abstract Interpre- tation framework [8], a general theory of the approximation A typical verification problem is to prove that no execution of semantics, to design static analyzers that are sound by starting in an initial state can reach a bad state (e.g. where construction. There is no silver bullet: each static analyzer the next execution step would raise an error). The correct- should be tailored to a specific class of properties and pro- ness condition is ∀s ∈ I : ∀s0 ∈ S : t?(s, s0) =⇒ s0 6∈ E that grams to achieve both precision (low rate of missed prop- is no state s0 reachable from the initial states s is a bad state. erties) and efficiency. Thankfully, Abstract Interpretation For example, E can be the set of blocking states in order to provides a growing library of ready-to-use abstractions, and specify the absence of deadlocks. Error-freedom can also be 0 ? 0 the mean to design new ones in a principled way. written R ⊆ S \ E where R , {s | ∃s ∈ I : t (s, s )} is the set of states reachable from the initial states and After a short formal introduction to Abstract Interpreta- X \ Y , {x ∈ X | x 6∈ Y }. tion theory in Sec.2, this article describes more informally (1) ? 0 It follows that t (s, s ) = ∃n > 0 : ∃s0, . . . , sn : s = s0 ∧ t(s0, s1) ∧ several static analysis applications: checking for run-time 0 0 ... ∧ t(sn−1, sn) ∧ sn = s , including the case s = s for n = 0. ´ (2) 00 0 errors with Astree (Sec.3), translation validation from ◦ is the composition of relations that is r1 ◦ r2 , {hx, x i | ∃x : hx, 0 0 00 C to assembly (Sec.4), and analyzing communicating syn- x i ∈ r1 ∧ hx , x i ∈ r2}. v chronous systems with imperfect clocks (Sec.5). Section6 (3)The v-least fixpoint lfp f of an increasing map f on a poset v v concludes and suggests the application of Abstract Interpre- partially ordered by v is defined by f(lfp f) = lfp f and f(x) v x v tation to modeling languages such as UML. implies lfp f v x.

2 Define F (X) , I ∪ post[t]X where post[r]X , r(X) , {s0 | ∃s ∈ X : r(s, s0)} is the image of the set X by the relation r. We have F ∈ ℘(S) → ℘(S)(4) is additive(5) hence strict(6) and increasing.(7) We have R = F (R) since a state is reachable iff it is an initial state or the successor of a reach- able state. If F (X) ⊆ X then X contains the initial states I and, transitively, any of its successors so R ⊆ X. This ⊆ implies that R = lfp F , which is not computable either. Figure 3: A set and its interval and octagonal abstractions.

2.4 Abstraction octagons [25] that infer relations of the form ±X ± Y ≤ c 2.4.1 Intervals (where c is a constant automatically inferred by the static analysis), see Fig.3. Let us start with the simple example of abstracting a set V ⊆ Z of integers (e.g. the set of possible values of an integer 2.4.3 Transformers variable) by an interval of values αi(V ) , [min V, max V ] (where min Z , max ∅ , −∞ and max Z , min ∅ , +∞). Transformers, that is, relations between states, can also be In particular for the empty set ∅, αi(∅) = [+∞, −∞]. In abstracted. Consider for example the abstraction of a rela- general, this is obviously an over-approximation since the tion r ⊆ S × S by its right image αI (r) , r(I) , post[r]I , interval αi(V ) may contain spurious values not in V . So {x0 | ∃x ∈ I : r(x, x0)} of a given set I ⊆ S (e.g. of initial from z 6∈ αi(V ) we can conclude z 6∈ V whereas knowing that states). For example, the reachable states of hS, I, E, ti are ? z ∈ αi(V ) in the abstract, we do not know whether z ∈ V R = αI (t ) so reachability is an abstraction of the big-step or z 6∈ V in the concrete since z might be a spurious value. semantics. We have: The concretization is γ ([`, h]) {z ∈ | ` z h}. Let i , Z 6 6 ˙ us define the abstract domain of intervals V ] {[`, h] | ` ∈ αI (r) ⊆ R i , 0 0 ⇔ {x | ∃x ∈ I : r(x, x )} ⊆ R def. αI Z ∪ {−∞} ∧ h ∈ Z ∪ {+∞} ∧ ` h} ∪ {[+∞, −∞]}. We have AIAA Infotech@Aerospace 2010, Atlanta, georgia, 04/20/2010 © P Cousot et Al. 6 0 0 0 2 H I 2 2 ∀V ∈ ℘( ): ∀[`, h] ∈ V ] : α (V ) ⊆ [`, h] ⇐⇒ V ⊆ γ ([`, h]) ⇔ ∀x ∈ S :(∃x ∈ I : r(x, x )) =⇒ x ∈ R def. ⊆ Z i i i H I and so, by definition, the pair hα, γi is a Galois connection,(8) ⇔ ∀x0 ∈ S : ∀x ∈ I : r(x, x0) =⇒ x0 ∈ R generalization ←−−−γi ] 0 0 0 H I written h℘(Z), ⊆i −−−→ hVi , ⊆i. ⇔ ∀x, x ∈ S : r(x, x ) =⇒ (x ∈ I =⇒ x ∈ R) def. =⇒ αi H I ⇔ r ⊆ {hx, x0i | x ∈ I =⇒ x0 ∈ R} def. ⊆ H I 2.4.2 Cartesian Abstraction ⇔ r ⊆ γI (R) 0 0 n by defining γI (R) , {hx, x i | x ∈ I =⇒ x ∈ R} A set V ⊆ Z of vectors of n > 1 integer values (e.g. the set H I of possible values of n integer variables) can be abstracted which is the characteristic property of the Galois connection Qn γI by projection along each component αC (V ) , i=1{z | h℘(S × S), ⊆i ←−−− h℘(S), ⊆i. −−−→α ∃z1, . . . , zi−1, zi+1, . . . , zn ∈ Z : hz1, . . . , zi−1, z, zi+1,..., I vni ∈ V }. The concretization is γC (hV1,...,Vni) {hz1, γ , Vn n C n ˙ 2.4.4 Fixpoint Abstraction . . . , zni | zi ∈ Vi} so that h℘(Z ), ⊆i −−−−→←−−−− h℘(Z) , ⊆i i=1 αC ˙ (9) ? ⊆ ⊆ where ⊆ is the componentwise ordering. Composed with For reachability, we have R = αI (t ) = αI (lfp T ) = lfp F the interval abstraction, this specifies the interval analysis of ⊆ so that the abstraction αI (lfp T ) of the concrete fixpoint [8] such that αi◦C (V ) , hαi(V1), . . . , αi(Vn)i where hV1,..., ⊆ ⊆ γ ◦ n lfp T can be calculated as an abstract fixpoint lfp F not n ←−−−−−i C ] ˙ Vni αC (V ) which implies h℘(Z ), ⊆i −−−−−→ hVi , ⊆i. , αi◦C referring at all to the concrete world. This follows from a This abstraction forgets about relationships between values general result sketched below and the observation that: of variables (such as whether variables have equal values). To abstract relations between values X, Y , . . . of numeri- αI ◦ T (r) cal variables, more refined abstractions must be used such as = αI (1S ∪ r ◦ t) def. ◦ and T H I = α (1 ) ∪ α (r ◦ t) def. α (4)℘(X) = {Y | Y ⊆ X} is the set of all subsets Y of a set X. I S I I S S 0 0 H I (5) ◦ F is additive iff F ( i∈∆ Xi) = i∈∆ F (Xi). = I ∪ {x | ∃x ∈ I :(r t)(x, x )} def. 1S and αI (6) 0 00 00 00 0 H I F is strict whenever F (∅) = ∅. = I ∪ {x | ∃x ∈ I : ∃x : r(x, x ) ∧ t(x , x )} def. ◦ (7) F is increasing whenever X ⊆ Y implies F (X) ⊆ F (Y ). 0 00 00 00 00 0 H I γ = I ∪ {x | ∃x ∈ {x | ∃x ∈ I : r(x, x )} : t(x , x )} def. ∃, ∈ (8)By definition, hC, i ←−−− hA, vi if and only if hC, i and hA, −−−→α 0 00 00 0 H I vi are posets, α ∈ C → A, γ ∈ A → C and ∀x ∈ C, y ∈ A : α(x) v = I ∪ {x | ∃x ∈ αI (r): t(x , x )} def. αI H I y ⇐⇒ x  γ(y). ⇐= implies soundness in that y is an abstract over- = I ∪ post[t](αI (r)) def. post[t] approximation of the concrete x and =⇒ implies that α(x) is the best H I abstraction of x in that it is more precise than any other sound abstrac- = F ◦ αI (r) def. F and ◦ H I tion y. (9) ˙ 0 0 The above calculus also shows how to calculate the ab- The componentwise ordering is: hV1,...,Vni ⊆ hV1 ,...,Vni if Vn 0 and only if i=1(Vi ⊆ Vi ). stract transformer F knowing the concrete transformer T ,

3 which is the idea of the calculational design of static analyz- verification purposes (e.g. to prove the absence of run-time ers [7]. errors). More generally, under suitable hypotheses of existence of joins in posets and fixpoints [8,7], if f ∈ C → C, hC,  γ i ←−−− hA], vi and f ] ∈ A] → A] satisfy α ◦ f = f ] ◦ α, 3 Checking Run-Time Errors with −−−→α  v then α(lfp f) = lfp f ]. Astr´ee  Faced with undecidable problems, α(lfp f) is often non- We now describe a first concrete application of Abstract In- computable, in which case it must be over-approximated  v terpretation: the static analyzer Astree´ [4] that checks for α(lfp f) v lfp f ], which follows from the local condition run-time errors in embedded C programs and has been suc- α ◦ f v˙ f ] ◦ α.(10) This is the case for example for interval cessfully used in aeronautics [13] and aerospace [6]. It started analysis [8]. in 2001 as an academic project [11] and is industrialized by AbsInt Angewandte Informatik [1] since 2009. 2.4.5 Fixpoint Approximation Under suitable hypotheses of existence of joins in posets and 3.1 Semantics fixpoints [8,7], fixpoints can be computed iteratively. For ex- 3.1.1 Operational Semantics of C Sn n (11) ample, R = i=0 F (∅) with the intuition that the reach- able states are reachable in either 0, 1, 2, . . . , n, . . . compu- Astree´ accepts a fairly large subset of C, excluding dy- tation steps. However, in general, convergence of the iterates namic memory allocation, recursivity, and parallelism, that to a fixpoint may require infinitely many iterations (for unde- are often unused (even forbidden) in embedded code. The cidable problems) or suffer a combinatorial explosion in time language syntax and concrete semantics are based on the and memory (for finite but complex problems). But for rare C99 norm [21], supplemented with the IEEE 754-1985 norm cases where the fixpoint can be computed directly (e.g. by [19] for floating-point computations. However, the C99 norm elimination), convergence must in general be accelerated, e.g. leaves many aspects of the semantics unspecified and lets using a widening O [8] at the prejudice of precision. A na¨ıve implementations decide. As embedded software are rarely i i i+1 i+1 example of widening for intervals is [` , h ]O[` , h ] , strictly conforming but rely instead on platform-specific fea- [if `i+1 < `i then −∞ else `i, if hi+1 > hi then +∞ else `i] tures, Astree´ also takes them into account and provides so that unstable bounds are pushed to infinity, which en- options for the user to tune some semantic aspects (e.g. the forces rapid although imprecise convergence. A narrowing bit-size, alignment, and byte order of data-types). can then be used to remove some of the infinite bounds [8]. 3.1.2 Run-Time Errors 2.4.6 Design of a Static Analyzer Astree´ checks for run-time errors, that is, operations that The design of a static analyzer for a (specification or pro- are either undefined on the user-defined platform (e.g. out-of- gramming) language starts with the definition of its seman- bound array accesses) or have unintended results (e.g. wrap- tics and the properties of interest for each program of the around after integer overflows). The property to verify (ab-  language, often in the form of fixpoints lfp F correspond- sence of run-time errors) is thus implicit and derived from the ing to an abstraction of the notion of computation (e.g. program’s own source. More precisely, Astree´ checks over- reachability). F is then designed by induction on the lan- flows in unsigned and signed integer and float arithmetics and guage syntax. Then, an abstraction α is defined, which casts, divisions by zero, out-of-bound array accesses, NULL, is a complex task, hence must be done by composition of dangling, out-of-bound and misaligned pointer dereferences, simpler abstractions, using standard composition methods and assertion failures (in calls to the assert C function). Vm Astree´ does not stop at the first encountered error but such as the reduced product α(X) , i=1 αi(X) combin- instead continues the analysis for all executions that have ing different abstractions αi(X)[9] and standard abstrac- tions (some already implemented in libraries [22]). The a well-defined semantics (e.g. integer overflows with wrap- static analyzer is then designed by induction on the language around). Thus, if there are no alarms, or if all executions syntax. It reads a program, computes the abstract trans- leading to alarms can be proved by other means to be spuri- former F ] (designed to satisfy α ◦ F v F ] ◦ α) for that ous, then the program is indeed free from run-time errors. v program, over-approximates the abstract fixpoint lfp F ] by an iterative computation with convergence acceleration 3.2 Analyzed Codes with widening/narrowing. This ensures that the result A Although Astree´ accepts a large variety of C codes, it can- (e.g. an abstract invariant for reachability) is sound (i.e.  v not analyze most of them precisely and efficiently. It is α(lfp F ) v lfp F ] v A). The result A is then used for mainly specialized, by its choice of abstractions, to analyze (10)The pointwise ordering is f v˙ g if and only if ∀x : f(x) v g(x). control/command synchronous programs automatically gen- (11)The powers of a function f are: f 0 is the identity, f 1 , f, and erated from higher level specifications. Once generated, such f n+1 = f ◦ f n. codes have the following structure:

4 initialize state variables errors in the main loop, and a domain to handle quaternion loop for 10 h computations.(13) read inputs from sensors In addition to numerical domains, a specific memory do- update state and compute output main [24] has been developed to abstract structures and ar- write outputs to actuators wait for next clock tick (10 ms) rays in a field-sensitive way, and to handle pointers and type- unsafe C constructions (such as type-punning, physical casts, (where the read, update and write instructions may be scat- or union types). Finally, an abstraction of execution traces tered in the source or encapsulated within functions) and fea- [29] partitions the reachable states with respect to the his- ture a large number of global variables (the state), floating- tory of computation and permits some path-sensitivity. point computations, few nested loops, and a limited number The various abstractions are independent modules, each of recurring code patterns (a few macros widely instanti- with its own private abstract representation of properties and ated). algorithms, that communicate using a sophisticated commu- Programs analyzed by Astree´ should be stand-alone, i.e., nication network [10] approximating the classic notion of re- have no undefined symbols. Nevertheless, programs are run duced product [9]. within an environment and typically fetch external data, e.g. sensor values, through memory-mapped registers. Astree´ 3.3.2 Ensuring Precision and Efficiency allows specifying these memory locations as “volatile” with a range of expected values (or possibly the full range of the A key to the efficiency of Astree´ is its parsimonious and type, including special NaN or ±∞ float values). The analy- localized use of carefully limited abstract domains. One ex- sis naturally considers all possible sequences of inputs in the ample is the use of octagons [25] which are less expressive but specified ranges. cheaper than general polyhedra [12] (cubic versus exponen- tial cost). Moreover, relational domains, such as octagons 3.3 Design of Astr´ee and boolean trees, do not relate all variables together, but are limited to small packs of variables. Likewise, trace par- Astree´ is an abstract : it traverses the program titioning is only performed on limited portions of the source by structural induction on its syntax, iterating loops and code, after which all partitions are merged together. Code stepping into functions, to collect an abstraction of all pos- portions and variable packs are determined automatically by sible execution traces. The analysis is thus fully flow- and simple, syntax-directed pre-analyses. Finally, domains only context-sensitive. Its termination is guaranteed by the use of communicate a selected portion of the information they infer widening operators O [8] to accelerate loops and the absence to other domains — a fully reduced product would not scale of recursion. up given the amount of information inferred by the many domains. 3.3.1 Abstractions A key to the precision of Astree´ is its design by refine- ment, which is made easy by its very modular design. We ´ does not use a single abstraction, but rather many Astree actually started from a simple interval-based analyzer and abstractions (more than 20) that can be switched on and off. improved it until it reached zero alarms on a first family For instance, expressing the absence of many run-time er- of realistic code [4], then considered other, more complex, rors requires information on variable bounds, and so, As- codes. When encountering a new kind of codes, the analy- tree´ implements the interval domain [8] based on interval (12) sis generally terminates quickly but with some false alarms, arithmetics [26]. However, discovering precise bounds of- which must be investigated by hand to find the cause of ten requires stronger properties, especially for loops that re- imprecision. The analysis can then be improved in various quires invariant relations between variables (e.g. between the ways, from easiest and most common, to most time consum- loop index and other variables used in the loop body). Hence, ing but thankfully seldom required. Often, it is sufficient to ´ Astree uses relational domains, such as octagons [25] that tune some parameters of the abstractions that are exposed infer relations of the form ±X ± Y ≤ c, and boolean deci- through around 150 command-line options (such as iteration sion trees that handle disjunctions precisely — for instance strategies, domain aggressiveness, pack size, etc.), which any (B ∧ X > 1) ∨ (¬B ∧ X < −1). trained end-user can do. When Astree´ already contains a Beside these classic, general-purpose domains, Astree´ domain able to infer the missing information, a solution is to features domain-specific abstractions, tailored to its use in update the pre-analysis of variable packs and code portions control/command software. Examples include a domain to where the domain is activated — this happened, in particu- handle digital filters [15], as presented in Fig.2, a domain clock lar, once while analyzing programs in the same application of geometric-arithmetic deviations [16] X ≤ α(1 + β) domain but with a new code generator [3]. The regularity of used to bound the accumulation of floating-point rounding automatically generated code is very helpful to design robust (12)However interval arithmetics is used statically (at compile-time) full-scale pre-analyses by generalisation of test cases. When with a widening to accelerate convergence of iterative fixpoint compu- tations, not at all dynamically (at run-time) as in the traditional use (13)In mechanics, quaternions extend the complex numbers to desig- of interval arithmetics to evaluate expressions of numerical programs. nate rotations in the 3-dimensional space.

5 the information is inferred but fails to be exploited, new com- that approach is hard to scale, as the analysis of low level munication channels between domains can be opened. These operations is often complex. A second technique proceeds two cases require minor modifications to the analyzer source. by an automatic proof of equivalence (also known as trans- When all fails, it is always possible to design and implement lation validation) of both programs. Note that translation a new abstract domain focusing on the missing properties, validation is closely linked to the proof of absence of run- but this is a research-grade activity. This last case happened time errors at the source level: indeed, semantic equivalence when extending Astree´ from aeronautic to space applica- can only be guaranteed on safe executions, whereas unsafe tions [6]: the later required a domain to handle quaternion runs typically have undefined semantics. computations, which were absent in the former application. Both techniques can be formalized in the Abstract Inter- pretation framework [28], since the semantic effect 3.4 Applications reduces to a classic fixpoint transfer, where C computation steps are translated into assembly computation steps. Such A first application of Astree´ was the proof of absence of run- a formalization allows for the translation certification to in- time errors in two families of industrial embedded avionic terface well with the C code analysis. control/command software generated from SAO specifica- tions [13]. Programs in the first family have around 100 K The Lcertify tool was developed as a library of data- code lines and 10 K global variables (half of which are floats) structures and algorithms for translation validation, follow- and can be analyzed in around 2 h on a 64-bit 2.66 GHz intel ing the framework exposed in [28]. It comes with a C front- workstation using a single core. Programs in the second fam- end, but it can also be used with a custom front-end, adapted ily control more modern systems and are both more complex to a user-specific imperative language, as used for some ap- and longer: up to 1 M code lines. They are analyzed in 50 h. plications. At the assembly level, it takes 32-bit PowerPC bi- Both analyses give zero false alarms. naries. It can certify the compilation of industrial-size codes in a few minutes including disassembly. A second application was the analysis of space software [6] or, more precisely, a 14 K lines C code generated from a SCADE [14] model designed by Astrium ST. After special- ization required by a change of application domain and code generator, the code could be analyzed in 1 h, with zero false 5 Imperfectly-Clocked Synchronous alarm. Since its industrialization by AbsInt, Astree´ is being Systems adapted to handle code generated by dSPACE TargetLink (a popular code generator for MATLAB, Simulink and State- 5.1 Proving Temporal Specifications flow) with encouraging preliminary results [23]. The analysis presented in Sec.4 allows binaries obtained 4 Translation Validation through the process explained in Fig.1 to be proved safe with respect to specifications expressed at the C language The analysis described in Sec.3 is performed at the source level or at the assembly language level. This is a huge part level, thus its results hold true at the assembly level only of the control units of embedded systems which are then cer- if the compilation of the C code is semantically correct. tified. On the other hand, the temporal specifications are In some application domains, such as avionics, certification rather expressed at SAO level, where the synchronous hy- rules state that the development process should be certified pothesis and the syntax make them easier to define and read. and that the final version of the software should be certi- As many systems are designed in a distributed way, this im- fied [30]. From this point of view, the analysis of the C code plies studying multi-clock, inherently asynchronous systems. cannot be considered a sufficient guarantee, as the use of an These systems are luckily not arbitrary asynchronous sys- incorrect compiler may ruin the whole certification effort. tems but rather imperfectly-clocked, meaning that we have While analysis of the target code is adequate for some some information about their physical clocks and their com- applications such as worst-case execution time [18], higher munication channels that may be enough to prove the desired level properties such as absence of run-time errors are easier specification. A typical specification for a clock is that its pe- to reason about at the C level, since many simple C oper- riod (the time between two consecutive clock ticks) is equal ations turn into complex pieces of assembly code, as is the to some fixed value δ with a possible imprecision of x%. case for floating point conversions or code in which the notion A particular case is redundant sub-systems. The whole of error has disappeared (such as memory accesses). Thus, system is made safer by detecting the failures of one of the it is preferable to exploit the results obtained at the source redundant units and by considering only the results of other level in order to cope with the target code certification in a equivalent units. The synchronous language community pro- more efficient way. A first approach relies on the translation posed safe ways to handle this case, we thus developed ab- of invariants obtained in Sec.3 into assembly level proper- stract domains that can be very precise on the code produced ties, which can be checked using an independent analysis, yet following these guidelines.

6 5.2 Continuous-Time Semantics and Tem- other one must be validated, since the different levels can poral Abstract Domains significantly differ in their description of the target system. A modeling language like UML describes nothing but dif- We consider a continuous-time semantics since it allows ab- ferent abstractions of the target system, at different levels stract domains to compute a precise fixpoint abstracting this of abstraction, and so, Abstract Interpretation is certainly semantics. This is because the mathematical properties of applicable both to formalize these abstractions and to de- continuous spaces are richer than those of discrete ones, and velop static analysis techniques at each level of abstraction. maybe also because these systems are actually designed in a However, modeling languages are usually not formalized and continuous world through differential equations. subject to multiple, if not contradictory, interpretations. As By defining an universal constraint on a time interval [a; b] with any formal method, the first task towards the use of and a Boolean x, denoted ∀ha; bi : x, we can describe signals Abstract Interpretation on UML would therefore be to pro- that take the value x during the whole time interval [a; b]. vide a rigorous mathematical definition of the meaning of the Similarly, an existential constraint, denoted ∃[a; b]: x, de- data, business, object, and component modeling, and their scribes signals that take the value x at least once during [a; b]. diagramatic representations. It will then be possible to de- This abstract information can be propagated abstractly in fine in what sense modeling languages do abstract the design an efficient way. For example, the effect of a negation oper- process and ultimately the target system. Then, the devel- ation on a ∀ha; bi : x signal turns it into ∀ha; bi : ¬x. In a opment of tools, going beyond mere syntactic checks, will be more complex way, yet at the cost of only two additions, a possible. communication channel transmitting information in a serial way with at least α and at most β delay, submitted with a Acknowledgments. We thank Isabelle Perseil for her kind ∃[a; b]: x constraint results in ∃[a + α; b + β]: x. invitation to the UML&FM 2010 workshop. By considering conjunctions of these elements, we may ex- press temporal properties. This abstract domain thus plays a similar role as the one of the interval domain presented in References Sec.2 for the analysis of the code of one unit only. [1] AbsInt, Angewandte Informatik. Astree´ run-time error Additional domains on time properties were defined in [2]. analyzer. http://www.absint.com/astree/. This temporal aspect does not only enable proving tempo- ral properties, but also allows the automatic definition of a [2] Bertrane, J. Proving the properties of communicating reduced product [9], the time becoming a common language imperfectly-clocked synchronous systems. In Proceedings of the Thirteenth International Symposium on Static Analysis between them. (SAS 06) (Seoul, 29–31 Aug. 2006), K. Yi, Ed., vol. 4134 of LNCS, Springer, pp. 370–386. 5.3 Implementation and Experiments [3] Bertrane, J., Cousot, P., Cousot, R., Feret, J., Mauborgne, L., Mine,´ A., and Rival, X. Static analysis Relaxing the synchrony hypothesis allows the certification of and verification of aerospace software by abstract interpreta- a bigger subset of the whole embedded system, at the price of tion. In AIAA Infotech@Aerospace (I@A 2010), AIAA-2010- a much more complex analysis that actually depends on the 3385. AIAA (American Institute of Aeronautics and Astro- imprecision of the clocks and the communication systems. A nautics), Apr. 2010, pp. 1–38. prototype static analyzer has been developed according to [4] these ideas and was able to prove some temporal properties Blanchet, B., Cousot, P., Cousot, R., Feret, J., Mauborgne, L., Mine,´ A., Monniaux, D., and Rival, of redundant SAO systems with a voting system arbitrat- X. Design and implementation of a special-purpose static ing among them. Furthermore, when the analyzer cannot program analyzer for safety-critical real-time embedded soft- prove the specification, looking at the abstract fixpoint is ware, invited chapter. In The Essence of Computation: Com- sometimes sufficient to devise an erroneous trace. When this plexity, Analysis, Transformation. Essays Dedicated to Neil is not the case, it may be due false alarms that might be D. Jones, T. Mogensen, D. Schmidt, and I. Sudborough, removed by creating more precise abstract domains. Inci- Eds., LNCS 2566. Springer, 2002, pp. 85–108. dentally, by trying the prototype on systems with different [5] Blanchet, B., Cousot, P., Cousot, R., Feret, J., parameters, interesting information can be obtained, such as Mauborgne, L., Mine,´ A., Monniaux, D., and Rival, the minimal synchrony for the stabilization of the values read X. A static analyzer for large safety-critical software. In by sensors such that the specification is proved. Proc. ACM SIGPLAN ’2003 Conf. PLDI (San Diego, 2003), ACM Press, pp. 196–207. [6] Bouissou, O., Conquet, E., Cousot, P., Cousot, R., 6 Conclusion Feret, J., Goubault, E., Ghorbal, K., Lesens, D., Mauborgne, L., Mine,´ A., Putot, S., Rival, X., and We have shown that static analysis by Abstract Interpreta- Turin, M. Space software validation using abstract inter- tion can be applied from the design to the implementation pretation. In Proc. of the Int. Space System Engineering of software systems. Each level of description of the system Conference, Data Systems In Aerospace (DASIA’09) (Istan- must be checked, and the translation from one level to an- bul, Turkey, 26–29 May 2009), ESA publications, pp. 1–7.

7 [7] Cousot, P. The calculational design of a generic abstract programs. In Tools for Automatic (TAPAS interpreter. In Calculational System Design, M. Broy and 2010), Perpignan, France (17 Sep. 2010). R. Steinbr¨uggen,Eds. NATO ASI Series F. IOS Press, Am- [21] ISO/IEC JTC1/SC22/WG14 Working Group. C stan- sterdam, 1999. dard. Tech. Rep. 1124, ISO & IEC, 2007. [8] Abstract interpretation: a Cousot, P., and Cousot, R. [22] Jeannet, B., and Mine,´ A. Apron: A library of numer- unified model for static analysis of programs by con- ical abstract domains for static analysis. Computer Aided struction or approximation of fixpoints. In Conf. Rec. of the Verification, CAV’2009 5643 of LNCS (2009), 661–667. 4th ACM Symp. on Principles of Programming Languages (POPL’77) (Jan. 1977), pp. 238–252. [23] Kastner,¨ D., Wilhelm, S., Nenova, S., Cousot, P., Cousot, R., Feret, J., Mauborgne, L., Mine,´ A., and [9] Cousot, P., and Cousot, R. Systematic design of program Rival, X. Astree´ : Proving the absence of rutime er- analysis frameworks. In Conference Record of the Sixth An- rors. In Proc. of Embedded Real-Time Software and Systems nual ACM SIGPLAN-SIGACT Symposium on Principles of (ERTS’10) (Toulouse, France, May 2010), pp. 1–5. (to ap- Programming Languages (San Antonio, Texas, 1979), ACM pear). Press, pp. 269–282. [24] Mine,´ A. Field-sensitive value analysis of embedded C pro- [10] Cousot, P., Cousot, R., Feret, J., Mauborgne, L., grams with union types and pointer arithmetics. In Proc. Mine,´ A., Monniaux, D., and Rival, X. Combination of the ACM SIGPLAN-SIGBED Conf. on Languages, Com- of abstractions in the Astree´ static analyzer. In Proc. pilers, and Tools for Embedded Systems (LCTES’06) (June of the 11th Annual Asian Computing Science Conference 2006), ACM Press, pp. 54–63. (ASIAN’06) (Tokyo, Japan, 6–8 Dec. 2006), M. Okada and I. Satoh, Eds., vol. 4435 of LNCS, Springer, pp. 272–300. [25] Mine,´ A. The octagon abstract domain. Higher-Order and Symbolic Computation 19 (2006), 31–100. [11] Cousot, P., Cousot, R., Feret, J., Mauborgne, L., Mine,´ A., and Rival, X. The Astree´ static analyzer. [26] Moore, R. E. Interval Analysis. Prentice Hall, Englewood http://www.astree.ens.fr. Cliffs N. J., USA, 1966. [12] Cousot, P., and Halbwachs, N. Automatic discovery of [27] Randimbivololona, F., Souyris, J., Baudin, P., linear restraints among variables of a program. In Conf. Pacalet, A., Raguideau, J., and Schoen, D. Applying Rec. of the 5th Annual ACM SIGPLAN-SIGACT Symp. on formal proof techniques to avionics software: A pragmatic Principles of Programming Languages (POPL’78) (Tucson, approach. In Proc. of the World Congress on Formal Meth- USA, 1978), ACM Press, pp. 84–97. ods (FM’99) (1999), vol. 1709 of LNCS, Springer, pp. 1798– 1815. [13] Delmas, D., and Souyris, J. Astree´ : from research to industry. In Proc. of the 14th Int. Static Analysis Sympo- [28] Rival, X. Symbolic transfer functions-based approaches sium (SAS’07), G. Fil´eand H. Riis-Nielson, Eds., vol. 4634 to certified compilation. In Conf. Rec. of the 31st Annual of LNCS. Springer, Kongens Lyngby, Denmark, 22–24 Aug. ACM SIGPLAN-SIGACT Symp. on Principles of Program- 2007, pp. 437–451. ming Languages (POPL’04) (Venice, Italy, Jan. 2004), ACM [14] Esterel Technologies. Scade suiteTM, the standard Press, pp. 1–13. for the development of safety-critical embedded software in [29] Rival, X., and Mauborgne, L. The trace partitioning the avionics industry. http://www.esterel-technologies. abstract domain. ACM Trans. Program. Lang. Syst. 29, 5 com/. (2007). [15] Feret, J. Static analysis of digital filters. In Proc. of the [30] Technical Commission on Aviation, R. DO-178B. Tech. 13th European Symp. on Programming Languages and Sys- rep., Software Considerations in Airborne Systems and tems (ESOP’04) (27 Mar.–4 Apr. 2004), D. Schmidt, Ed., Equipment Certification, 1999. vol. 2986 of LNCS, Springer, pp. 33–48. [31] von der Beeck, M. A formal semantics of uml-rt. In [16] Feret, J. The arithmetic-geometric progression abstract do- Model Driven Engineering Languages and Systems, 9th In- main. In Proc. of the 6th Int. Conf. on Verification, Model ternational Conference, MoDELS 2006, Genova, Italy, Octo- Checking and Abstract Interpretation (VMCAI’05) (Paris, ber 1-6, 2006, Proceedings (2006), O.Nierstrasz, J. Whittle, France, 17–19 Jan. 2005), R. Cousot, Ed., vol. 3385 of LNCS, D. Harel, and G. Reggio, Eds., vol. 4199 of LNCS, Springer, Springer, pp. 42–58. pp. 768–782. [17] Goubault, E. Static analyses of floating-point operations. In Proc. of the 8th Int. Static Analysis Symposium (SAS’01) (2001), vol. 2126 of LNCS, Springer, pp. 234–259. [18] Heckmann, R., and Ferdinand, C. Worst-case execu- tion time prediction by . In Proc. of the 18th Int. Parallel and Distributed Processing Symposium (IPDPS’04) (2004), IEEE Computer Society, pp. 26–30. [19] IEEE Computer Society. IEEE standard for binary floating-point arithmetic. Tech. rep., ANSI/IEEE Std. 745- 1985, 1985. [20] Ioualalen, A. SARDANA: an abstract interpretation based tool for Optimization of numerical expressions in LUSTRE

8