Static Analysis by Abstract Interpretation of Embedded Critical Software

Static Analysis by Abstract Interpretation of Embedded Critical Software Julien Bertrane Patrick Cousot Radhia Cousot JérômeFeret Laurent Mauborgne ENS∗ ENS∗ & CIMSx CNRS & ENS∗ INRIA & ENS∗ IMDEA∗∗ Antoine Miné Xavier Rival CNRS & ENS∗ INRIA & ENS∗ Abstract Formal methods are increasingly used to help ensuring the correctness of complex, critical embedded software systems. modeling We show how sound semantic static analyses based on Ab- S static analysis stract Interpretation may be used to check properties at var- A ious levels of a software design: from high level models to low O level binary code. After a short introduction to the Abstract code generation V Interpretation theory, we present a few current applications: a checking for run-time errors at the C level, translation val- l static analysis i idation from C to assembly, and analyzing SAO models of d C a communicating synchronous systems with imperfect clocks. t translation i We conclude by briefly proposing some requirements to ap- o compilation validation ply Abstract Interpretation to modeling languages such as n UML. B static analysis I keywords: Abstract interpretation, Critical software, Em- N test bedded systems, Static analysis, System design, System modeling, System verification. execution 1 Introduction Ensuring the correctness of software systems constitutes a Figure 1: Example workflow for designing an embedded ap- large part of software development budgets. It is particu- plication. larly important for critical embedded systems, such as found in automotive, aerospace and medical applications, as the slightest programming \bug" may have catastrophic finan- based static analysis, which always terminates and covers all cial and even human cost. In this article, we build a case executions, albeit in an over-approximated way. For example for using static analysis based on Abstract Interpretation to it can detect dead code (never executed) or dead data struc- help ensuring software correctness. tures (constructed but never used) but cannot always prove We illustrate a possible use for static analysis in Fig.1. In their absence. Moreover, static analysis can be applied at this drastically simplified workflow inspired from a real in- many levels: machine-readable specification, program source dustrial case [27], an engineer (not necessarily a programmer) or binary. The higher level the better, as it provides purer in- models a control system using the SAO graphical language, formation to the tool and its feedback is easier to understand a precursor and similar tool as SCADE [14] | a SimulinkTM to the designer. However, higher levels abstract away some fragment similar to SAO is also shown in Fig.2. It is then au- aspects of computations, which makes it impossible to check tomatically translated to the programming language C and some properties of actual executions. For instance, SAO and then compiled to produce the actual binary software exe- SCADE have real arithmetics and do not specify how actual cuted by the device. Validation includes testing, which re- numerical computations are performed nor the type of num- quires executing (part of) the binary with some monitoring bers, so that a static analysis of numeric overflows (as done and is able to check a wide range of properties (including by Astree´ [1, 11,5], Sec.3) or of the precision of floating- functional ones) but is costly and never achieves a full cover- point computations (as done by Fluctuat [17]) is done at the age of all possible executions (path- and data-coverage). For- C level | a static analysis of real expressions at the SCADE mal methods can also be employed. In particular, semantic- level may however be used to determine its numerically most precise float compilation to C [20]. Likewise, neither SAO nor (∗)Ecole´ Normale Supérieure, 45, rue d'Ulm, Paris, France, C make any guarantee about the worst-case execution time, Ls e.@tr .an rssfitF (x)Courant Institute of Mathematical Sciences, NYU, New York, NY so, such an analysis (as done by AbsInt's aiT [18]) is done at (∗∗)FundaciónIMDEA Software, 28660-Boadilla del Monte, Madrid, the binary level and for a specific processor. A SAO model Spain, m u oa eu amal .eibrrnt ordg [email protected] can also be enriched with non-software elements, such as real- 1 a b 2 Abstract Interpretation - We provide a succinct introduction to Abstract Interpreta- tion, more details are provided e.g. in [3]. ++ -1 + z z-1 t Unit delay Unit delay 2.1 Small-Step Operational Semantics Switch Switch In order to analyze the behavior of a computer system during x(n) Switch j B i execution, we start by providing a model of computations, that is, an operational semantics. An example of operational Figure 2: A second-order digital filter. semantics for UML-Statecharts is [31]. Such an operational semantics of a given program can be described as a transition system hS; I; E; ti. S is a set of states, including initial states I ⊆ S and bad or erroneous states E ⊆ S. t ⊆ S × S is a transition relation between a time clocks and communication lines with delays, to enable state s 2 S and its possible successors: for any state s0 2 S, a static analysis taking time into account (Sec.5). Finally, t(s; s0) is true if, and only if, s0 is a potential successor of s. 0 static analysis can also improve the confidence in compil- The blocking states have no successor B , fs 2 S j 8s 2 S : ers and code generators: translation validation (Sec.4) can :t(s; s0)g. check whether the source and output are equivalent, at least with respect to a class of properties, so that the analysis for 2.2 Big-Step Operational Semantics such properties at a higher level needs not be redone at the lowest level (which is often more difficult). The big-step operational semantics of hS; I; E; ti is hS; I; E; t?i where t? S tn is the reflexive transitive closure , n>0 Ideally, a static analyzer should extract automatically pre- (1) 0 of t, t , 1S , fhs; si j s 2 Sg is the identity relation on cise properties from a complete mathematical specification of n+1 n (2) S, t , t ◦ t where ◦ is the composition of relations. the analyzed system. Most properties are however undecid- ? ? We have t = T (t ) where T (r) , 1S [ r ◦ t since a state able, so, we resort to abstraction, i.e., the analyzer explores 0 s is reachable from s in n > 0 steps if and only if n = 0 and machine-representable supersets of actual behaviors of the s = s0 or n > 0 and s0 is reachable from a successor of s in system using tractable algorithms. As a consequence, the ? n − 1 steps. Moreover if T (r) = 1S [ t ◦ r ⊆ r then t ⊆ r. It analyzer may consider spurious behaviors and miss proper- ⊆ follows that t? = lfp T , by definition of the ⊆-least fixpoint ties, but the analysis is sound: all the properties that are ⊆ (3) found (absence of run-time errors, worst-case execution time, lfp T of T . Because all non-trivial properties of programs are undecid- etc.) are indeed true for all executions. A specificity of static ⊆ analysis is that it works directly on the concrete system that able, t? = lfp T is not computable for infinite state transi- is input to compilers or code generators, and the abstracted tion systems hS; I; E; ti (except for trivial programs t and system is derived automatically according to built-in abstrac- specifications E). tion mechanisms; no abstract system need to be provided | which would pose the question of whether it indeed corre- 2.3 Specification sponds to the concrete one. We use the Abstract Interpre- tation framework [8], a general theory of the approximation A typical verification problem is to prove that no execution of semantics, to design static analyzers that are sound by starting in an initial state can reach a bad state (e.g. where construction. There is no silver bullet: each static analyzer the next execution step would raise an error). The correct- should be tailored to a specific class of properties and pro- ness condition is 8s 2 I : 8s0 2 S : t?(s; s0) =) s0 62 E that grams to achieve both precision (low rate of missed prop- is no state s0 reachable from the initial states s is a bad state. erties) and efficiency. Thankfully, Abstract Interpretation For example, E can be the set of blocking states in order to provides a growing library of ready-to-use abstractions, and specify the absence of deadlocks. Error-freedom can also be 0 ? 0 the mean to design new ones in a principled way. written R ⊆ S n E where R , fs j 9s 2 I : t (s; s )g is the set of states reachable from the initial states and After a short formal introduction to Abstract Interpreta- X n Y , fx 2 X j x 62 Y g. tion theory in Sec.2, this article describes more informally (1) ? 0 It follows that t (s; s ) = 9n > 0 : 9s0; : : : ; sn : s = s0 ^ t(s0; s1) ^ several static analysis applications: checking for run-time 0 0 ::: ^ t(sn−1; sn) ^ sn = s , including the case s = s for n = 0. ´ (2) 00 0 errors with Astree (Sec.3), translation validation from ◦ is the composition of relations that is r1 ◦ r2 , fhx; x i j 9x : hx; 0 0 00 C to assembly (Sec.4), and analyzing communicating syn- x i 2 r1 ^ hx ; x i 2 r2g. v chronous systems with imperfect clocks (Sec.5). Section6 (3)The v-least fixpoint lfp f of an increasing map f on a poset v v concludes and suggests the application of Abstract Interpre- partially ordered by v is defined by f(lfp f) = lfp f and f(x) v x v tation to modeling languages such as UML.

Load more