Spring 2014 Program Analysis and Verification

Lecture 9: Abstract Interpretation I

Roman Manevich
Ben-Gurion University

Syllabus

• Semantics: Natural Semantics, Structural semantics, Axiomatic Verification
• Static Analysis: Automating Hoare Logic, Control Flow Graphs, Equation Systems, Collecting Semantics
• Abstract Interpretation fundamentals: Lattices, Galois Connections, Fixed-Points, Widening/Narrowing, Domain constructors
• Analysis Techniques: Numerical Domains, CEGAR, Alias analysis, Shape Analysis, Interprocedural Analysis
• Crafting your own: Soot, From proofs to abstractions, Systematically developing transformers

Previously

• Another static analysis example – constant propagation
• Basic concepts in static analysis
  – Control flow graphs
  – Equation systems
  – Collecting semantics
  – (Trace semantics)

Annotating programs

Annotate(P, S) =
  case S is x:=aexpr
    return {P} x:=aexpr {F*[x:=aexpr] P}
  case S is S1; S2
    let Annotate(P, S1) be {P} A1 {Q1}
    let Annotate(Q1, S2) be {Q1} A2 {Q2}
    return {P} A1; {Q1} A2 {Q2}
  case S is if bexpr then S1 else S2
    let Pt = F[assume bexpr] P
    let Pf = F[assume ¬bexpr] P
    let Annotate(Pt, S1) be {Pt} A1 {Q1}
    let Annotate(Pf, S2) be {Pf} A2 {Q2}
    return {P} if bexpr then {Pt} A1 {Q1} else {Pf} A2 {Q2} {Q1 ⊔ Q2}
  case S is while bexpr do S
    N := Nc := P // Initialize
    repeat
      let Pt = F[assume bexpr] Nc
      let Annotate(Pt, S) be {Nc} Abody {N}
      Nc := Nc ⊔ N
    until N = Nc
    return {P} INV= {N} while bexpr do {Pt} Abody {F[assume ¬bexpr](N)}

Collecting semantics example: input 1

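1 label0:
2   if x <= 0 goto label1
3   x := x - 1
4   goto label0
5 label1:

[Figure: control-flow graph of the program with nodes entry, 2 (if x > 0), 3 (x := x - 1) and exit, annotated with the states that reach each node. Labels in the figure: entry: … [x↦3] [x↦2] [x↦1]; node 2: [x↦-1] [x↦0] [x↦1]; node 3: [x↦1]; exit: [x↦-1] [x↦0].]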

Collecting semantics example: input 2

[Figure: the same control-flow graph after also running input 2. Labels in the figure: entry: … [x↦3] [x↦2] [x↦1]; node 2: [x↦2] [x↦-1] [x↦0] [x↦1]; node 3: [x↦1] [x↦2]; exit: [x↦-1] [x↦0].]

Collecting semantics example: input 3

[Figure: the same control-flow graph after also running input 3. Labels in the figure: entry: … [x↦3] [x↦2] [x↦1]; node 2: [x↦3] [x↦2] [x↦-1] [x↦0] [x↦1]; node 3: [x↦1] [x↦2] [x↦3]; exit: [x↦-1] [x↦0].]

ad infinitum – fixed point

[Figure: the same control-flow graph at the fixed point. Labels in the figure: entry: … [x↦3] [x↦2] [x↦1]; node 2: … [x↦3] [x↦2] [x↦-1] [x↦0] [x↦1]; node 3: [x↦1] [x↦2] [x↦3] …; exit: … [x↦-2] [x↦-1] [x↦0].]

Predicates at fixed point

[Figure: the same control-flow graph annotated with predicates instead of sets of states. entry: {true}; node 2 (if x > 0): {true}; node 3 (x := x - 1): {x>0} before the assignment and {x≥0} after it; exit: {x≤0}.]
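The frames above can be reproduced by running the program on a bounded range of inputs and recording, per control-flow node, every state that reaches it. The following is a minimal sketch only: the node names ("entry", "if", "dec", "exit") and the finite input range are assumptions made for illustration, whereas the collecting semantics itself ranges over all integers.

```python
def collect(inputs):
    """Record the values of x that reach each control-flow node of
         label0: if x <= 0 goto label1; x := x - 1; goto label0; label1:
    The node names are hypothetical labels, not taken from the slides."""
    reached = {"entry": set(), "if": set(), "dec": set(), "exit": set()}
    for x in inputs:
        reached["entry"].add(x)
        while True:
            reached["if"].add(x)       # state on arrival at the test
            if x <= 0:
                reached["exit"].add(x)
                break
            reached["dec"].add(x)      # state just before x := x - 1
            x = x - 1
    return reached

r = collect(range(-3, 4))
print(sorted(r["exit"]))  # [-3, -2, -1, 0]
print(sorted(r["dec"]))   # [1, 2, 3]
```

The sets at "exit" and "dec" match the predicates {x≤0} and {x>0}, restricted to the chosen input range.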

Equational definition example

• A vector of variables R[0, 1, 2, 3, 4]
• R[0] = {x∈Z} // established input
  R[1] = R[0] ∪ R[4]
  R[2] = R[1] ∩ {s | s(x) > 0} // semantic function for assume x>0
  R[3] = R[1] ∩ {s | s(x) ≤ 0}
  R[4] = ⟦x:=x-1⟧ R[2] // semantic function for x:=x-1 lifted to sets of states
• A (recursive) system of equations

[Figure: control-flow graph with nodes entry, if x > 0, x := x-1 and exit, labeled with the variables R[0], R[1], R[2], R[3] and R[4].]

General definition

• A vector of variables R[0, …, k], one per input/output of a node
  – R[0] is for entry
• For node n with multiple predecessors add equation R[n] = ∪{R[k] | k is a predecessor of n}
• For an atomic operation node R[m] S R[n] add equation R[n] = ⟦S⟧ R[m]
• Transform if b then S1 else S2 to (assume b; S1) or (assume ¬b; S2)
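One way to read the equation system for the loop example is as a recipe for iteration: start from empty sets and re-evaluate the right-hand sides until nothing changes. The sketch below does this in Python; the finite `universe` standing in for Z is an assumption that keeps the iteration finite, and a state is represented only by the value of x.

```python
def solve(universe):
    """Iterate the equations for R[0..4] from empty sets to a fixed point.
    `universe` is a finite stand-in for Z (an assumption of this sketch)."""
    R = [set() for _ in range(5)]
    while True:
        new = [
            set(universe),                 # R[0] = all input states
            R[0] | R[4],                   # R[1] = R[0] ∪ R[4]
            {x for x in R[1] if x > 0},    # R[2] = R[1] ∩ {s | s(x) > 0}
            {x for x in R[1] if x <= 0},   # R[3] = R[1] ∩ {s | s(x) ≤ 0}
            {x - 1 for x in R[2]},         # R[4] = ⟦x:=x-1⟧ R[2]
        ]
        if new == R:
            return R
        R = new

R = solve(range(-3, 4))
print(sorted(R[3]))  # [-3, -2, -1, 0]
print(sorted(R[4]))  # [0, 1, 2]
```

Because every right-hand side is monotone in the sets it reads and the universe is finite, the loop is guaranteed to stop.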

Current lecture

Appendix A.

• Semantic domains
  – Preorders
  – Partial orders (posets)
  – Pointed posets
  – Ascending/descending chains
  – The height of a poset
  – Join and meet operators
  – Complete lattices
  – Constructing new lattices from old

Abstract interpretation Theory [1977]

Abstract Interpretation [CC77]

• A very general mathematical framework for approximating semantics
  – Generalizes Hoare Logic
  – Generalizes weakest precondition calculus
• Allows designing sound static analysis algorithms
  – Usually compute by iterating to a fixed-point
  – Not specific to any programming language style
• Results of an abstract interpretation are (loop) invariants
  – Can be interpreted as axiomatic verification assertions and used for verification

Annotating programs

The Annotate procedure shown earlier, read as an abstract interpretation:
• Assignment case: {F*[x:=aexpr] P} approximates the concrete semantics, sp(x:=aexpr, P) ⇒ F*[x:=aexpr] P
• if case: {Q1 ⊔ Q2} approximates disjunction
• Consequence rule [consp]: from { P’ } S { Q’ } conclude { P } S { Q } if P⇒P’ and Q’⇒Q

The big picture

• Use semantic domains to define both concrete semantics and abstract semantics
• Relate semantics in a sound way
• Interpret program over abstract semantics

[Figure: diagram relating the two semantics. Top row: statement S under the abstract semantics maps an abstract representation of sets of states to an abstract representation of sets of states. Bottom row: statement S under the collecting semantics maps a set of states to a set of states. The vertical arrows between the rows are labeled abstraction and meaning.]

A theory of semantic domains

1. Approximating elements
2. Approximating sets of elements

Overall idea

• A semantic domain can be used to define properties (representations of predicates)
  – Also called abstract states
• Common representations
  – Logical formulas
  – Automata
  – Specialized graphs

A taxonomy of semantic domain types

Complete Lattice (D, ⊑, ⊔, ⊓, ⊥, ⊤)

Lattice (D, ⊑, ⊔, ⊓, ⊥, ⊤)

Join semilattice (D, ⊑, ⊔, ⊤)
Meet semilattice (D, ⊑, ⊓, ⊥)

Complete partial order (CPO) (D, ⊑, ⊥)

Partial order (poset) (D, ⊑)

Preorder (D, ⊑)

Preorders

Preorder

• Let D be a set of elements
• We say that a binary order relation ⊑ over D is a preorder if the following conditions hold for every d, d’, d’’ ∈ D
  – Reflexive: d ⊑ d
  – Transitive: d ⊑ d’ and d’ ⊑ d’’ implies d ⊑ d’’
• There may exist d, d’ such that d ⊑ d’ and d’ ⊑ d yet d ≠ d’
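For a finite domain the two conditions can be checked by brute force. The sketch below is illustrative only; the example relation (integers ordered by absolute value) is a hypothetical choice that is a preorder but not a partial order.

```python
from itertools import product

def is_preorder(D, leq):
    """Check reflexivity and transitivity of leq over the finite set D."""
    reflexive = all(leq(d, d) for d in D)
    transitive = all(leq(a, c) for a, b, c in product(D, repeat=3)
                     if leq(a, b) and leq(b, c))
    return reflexive and transitive

def is_antisymmetric(D, leq):
    """Check that mutually ordered elements are equal."""
    return all(a == b for a, b in product(D, repeat=2)
               if leq(a, b) and leq(b, a))

# Hypothetical example: order integers by absolute value.
D = [-2, -1, 0, 1, 2]
leq = lambda a, b: abs(a) <= abs(b)
print(is_preorder(D, leq))       # True
print(is_antisymmetric(D, leq))  # False: -1 and 1 are equivalent but distinct
```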

Preorder examples

• SAV-predicates
  – SAV-factoids = { x = y | x, y ∈ Var } ∪ { x = y + z | x, y, z ∈ Var }
  – SAV-predicates = 2^SAV-factoids
  – Order relation 1: P1 ⊑set P2 iff P1 ⊇ P2
  – Order relation 2: P1 ⊑imp P2 iff P1 ⇒ P2
  – Which order relation is stronger (contains more pairs)?
  – Which order relation is easier to check?
  – What if both P1 and P2 are in the image of explicate?

SAV preorder 1: P1 ⊑set P2 iff P1 ⊇ P2

Var = {x, y}

{}

{x=y} {y=x} {x=x+x} {y=y+y} {y=x+y} {y=y+x} {x=x+y} {x=y+x}

{x=y, y=x} {x=y, x=x+x} … {x=x+y, x=y+x}

{x=y, x=x+x, x=x+y} … {x=y, x=x+x, x=x+y}

{x=y, y=x, x=x+x, y=y+y, y=x+y, y=y+x, x=x+y, x=y+x}

SAV preorder 2: P1 ⊑imp P2 iff P1 ⇒ P2

Var = {x, y}

{}

{x=y} {y=x} {x=x+x} {y=y+y} {y=x+y} {y=y+x} {x=x+y} {x=y+x}

{x=y, y=x} {x=y, x=x+x} … {x=x+y, x=y+x}

{x=y, x=x+x, x=x+y} … {x=y, x=x+x, x=x+y}

{x=y, y=x, x=x+x, y=y+y, y=x+y, y=y+x, x=x+y, x=y+x}

Preorder examples

• CP-predicates
  – CP-factoids = { x = c | x ∈ Var, c ∈ Z }
  – CP-predicates = 2^CP-factoids
  – Order relation 1: P1 ⊑set P2 iff P1 ⊇ P2
  – Order relation 2: P1 ⊑imp P2 iff P1 ⇒ P2
  – Is there a difference?
    • {x=5, x=7, x=9} ⊑set {x=5, x=7}
    • {x=5, x=7, x=9} ⊑imp {x=5, x=7}
    • {x=5, x=7} ⊑imp {x=5, x=7, x=9}
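The difference between the two CP orders shows up as soon as a predicate is contradictory. The sketch below encodes a factoid x = c as a pair `(var, const)` and decides implication directly; this encoding and the helper names are assumptions made for illustration.

```python
def contradictory(P):
    """True if P binds some variable to two different constants."""
    seen = {}
    for var, c in P:
        if seen.setdefault(var, c) != c:
            return True
    return False

def leq_set(P1, P2):
    """P1 ⊑set P2 iff P1 ⊇ P2."""
    return P1 >= P2

def leq_imp(P1, P2):
    """P1 ⊑imp P2 iff P1 ⇒ P2: either P1 is unsatisfiable,
    or every factoid of P2 already occurs in P1."""
    return contradictory(P1) or P2 <= P1

A = frozenset({("x", 5), ("x", 7), ("x", 9)})
B = frozenset({("x", 5), ("x", 7)})
print(leq_set(A, B), leq_set(B, A))  # True False
print(leq_imp(A, B), leq_imp(B, A))  # True True
```

Both A and B denote false, so they are equivalent under ⊑imp while ⊑set still tells them apart.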

CP preorder example

Var = {x}

{}

… {x=-3} {x=-2} {x=-1} {x=0} {x=1} {x=2} {x=3} …

CP preorder example

Var = {x, y}

{}

… {x=-3} {x=0} {x=3} … {y=-5} {y=0} {y=36} …

{x=-3, y=-5} {x=0, y=0} {x=3, y=36}

The problem with preorders

• Equivalent elements have different representations
  – {x=y, x=a+b} S {Q}
  – {x=y, y=a+b} S {Q’}
• Example with S = assume y≠a+b
  – {x=y, x=a+b} assume y≠a+b {x=y, x=a+b}
  – {x=y, y=a+b} assume y≠a+b {false}
• Example with S = assume x≠a+b
  – {x=y, x=a+b} assume x≠a+b {false}
  – {x=y, y=a+b} assume x≠a+b {x=y, y=a+b}
• Leads to unpredictability
• Which result should our static analysis give?

In practice many static analyses still use preorders
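The unpredictability can be reproduced with a transformer that only looks at the syntax of its input. The sketch below assumes a hypothetical `assume_neq` that reports a contradiction only when the negated factoid occurs literally in the predicate; it is not necessarily the transformer used in the course.

```python
FALSE = frozenset({"false"})

def assume_neq(P, lhs, rhs):
    """Hypothetical syntactic transformer for `assume lhs != rhs`:
    a contradiction is detected only if the factoid `lhs=rhs`
    occurs literally in P."""
    return FALSE if f"{lhs}={rhs}" in P else P

P1 = frozenset({"x=y", "x=a+b"})
P2 = frozenset({"x=y", "y=a+b"})   # logically equivalent to P1

print(assume_neq(P1, "y", "a+b") == P1)     # True: contradiction missed
print(assume_neq(P2, "y", "a+b") == FALSE)  # True: contradiction found
```

P1 and P2 have the same models, yet the result depends on which representation the analysis happened to reach.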

Partial orders

Partially ordered sets (partial orders)

• A partially ordered set (Poset for short) is a pair (D, ⊑)
• D is a set of elements – a semantic domain
• ⊑ is a partial order between pairs of elements from D. That is ⊑ ⊆ D × D with the following properties, for all d, d’, d’’ in D
  – Reflexive: d ⊑ d
  – Transitive: d ⊑ d’ and d’ ⊑ d’’ implies d ⊑ d’’
  – Anti-symmetric: d ⊑ d’ and d’ ⊑ d implies d = d’ (makes it easier to choose the best element)
• If d ⊑ d’ and d ≠ d’ we write d ⊏ d’

SAV partial order

• SAV-predicates
  – SAV-factoids = { x = y | x, y ∈ Var } ∪ { x = y + z | x, y, z ∈ Var }
  – SAV-predicates = 2^SAV-factoids
• Order relation 1: P1 ⊑set P2 iff P1 ⊇ P2
  Is this a partial order?
• Order relation 2: P1 ⊑imp P2 iff P1 ⇒ P2, that is models(P1) ⊆ models(P2)
  Is this a partial order?
• Order relation 3: P1 ⊑set* P2 iff Explicate(P1) ⊑set Explicate(P2)
  Is this a partial order?

CP partial order

• CP-predicates
  – CP-factoids = { x = c | x ∈ Var, c ∈ Z }
  – CP-predicates = 2^CP-factoids
• Order relation 1: P1 ⊑set P2 iff P1 ⊇ P2
  Is it a partial order?
• Order relation 2: P1 ⊑imp P2 iff P1 ⇒ P2
  Is it a partial order?

Can we define a more precise partial order?

CP partial order

• CP-predicates
  – CP-factoids = { x = c | x ∈ Var, c ∈ Z }
  – CP-predicates = 2^CP-factoids ∪ {false}
  – Define reduce : 2^CP-factoids → 2^CP-factoids ∪ {false}
    reduce(P) = if exists {x=c1, x=c2} ⊆ P with c1 ≠ c2 then {false} else P
  – CPfalse = { P ∈ 2^CP-factoids | P = reduce(P) } ∪ {false}
• Order relation: P1 ⊑ P2 if P1 ⊇ P2 or P1 = {false}
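A small sketch of reduce and the resulting order, again encoding a factoid x = c as a pair `(var, const)` and {false} as a distinguished frozenset. The encoding is an assumption of this example; only the two definitions come from the slide.

```python
FALSE = frozenset({"false"})

def reduce(P):
    """Collapse a contradictory set of factoids (var, const) to {false}."""
    if P == FALSE:
        return FALSE
    values = {}
    for var, c in P:
        if values.setdefault(var, c) != c:
            return FALSE
    return frozenset(P)

def leq(P1, P2):
    """P1 ⊑ P2 if P1 ⊇ P2 or P1 = {false}; both arguments must be reduced."""
    return P1 == FALSE or P1 >= P2

A = reduce({("x", 5), ("x", 7)})
B = reduce({("x", 5), ("y", 7)})
C = reduce({("x", 5)})
print(A == FALSE)            # True
print(leq(A, B), leq(B, A))  # True False
print(leq(B, C), leq(C, B))  # True False
```

After reduction every contradictory predicate has the single representation {false}, which is what restores anti-symmetry.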

Pointed poset

• A poset (D, ⊑) with a least element ⊥ is called a pointed poset
  – For all d∈D we have that ⊥ ⊑ d
• The pointed poset is denoted by (D, ⊑, ⊥)
• We can always transform a poset (D, ⊑) into a pointed poset by adding a special bottom element: (D ∪ {⊥}, ⊑ ∪ {⊥ ⊑ d | d∈D}, ⊥)
• Example: CPfalse = { P ∈ 2^CP-factoids | P = reduce(P) } ∪ {false}

Chains

• If d ⊑ d’ and d ≠ d’ we write d ⊏ d’
• Similarly define d ⊐ d’
• Let (D, ⊑) be a poset
• An ascending chain is a sequence x1 ⊏ x2 ⊏ … ⊏ xk …
• A descending chain is a sequence x1 ⊐ x2 ⊐ … ⊐ xk …
• The height of a poset is the length of the maximal ascending chain
  – What is the height of the SAV poset?
  – What is the height of the CP poset?
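For a finite poset the longest ascending chain can be found by a memoized search. The sketch below uses the signs lattice for one variable, encoding each element as the set of signs ('-', '0', '+') it allows and ordering by inclusion. Both the encoding and the choice to count a chain by its number of elements are assumptions of this example.

```python
from functools import lru_cache

# false, x<0, x=0, x>0, x<=0, x>=0, true
SIGNS = [frozenset(s) for s in ("", "-", "0", "+", "-0", "0+", "-0+")]

def longest_chain(elements):
    """Number of elements in the longest strictly ascending chain,
    where the order is subset inclusion."""
    @lru_cache(maxsize=None)
    def up(d):
        # Longest chain that starts at d and only moves strictly upward.
        return 1 + max((up(e) for e in elements if d < e), default=0)
    return max(up(d) for d in elements)

print(longest_chain(SIGNS))  # 4
```

One chain of maximal length is false ⊏ x<0 ⊏ x≤0 ⊏ true.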

Ascending chain example

true

x≤0 x≥0

x<0 x=0 x>0

false

Joining elements

Bounds

• Let (D, ⊑) be a poset
• Let X ⊆ D be a set of elements from D
• An element d∈D is an upper bound (ub) of X iff for every x∈X we have that x⊑d
• An element d∈D is a lower bound (lb) of X iff for every x∈X we have that d⊑x
• An element d∈D is the least upper bound (lub) of X iff d is the minimal of all upper bounds of X
• An element d∈D is the greatest lower bound (glb) of X iff d is the maximal of all lower bounds of X
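These definitions translate directly into a brute-force computation over a finite poset. The sketch below uses the signs lattice for x; the string names and the encoding of each element as a set of allowed signs are assumptions made for illustration.

```python
SIGNS = {
    "false": frozenset(), "x<0": frozenset("-"), "x=0": frozenset("0"),
    "x>0": frozenset("+"), "x<=0": frozenset("-0"), "x>=0": frozenset("0+"),
    "true": frozenset("-0+"),
}

def leq(a, b):
    """a ⊑ b iff every sign allowed by a is allowed by b."""
    return SIGNS[a] <= SIGNS[b]

def upper_bounds(X):
    return {d for d in SIGNS if all(leq(x, d) for x in X)}

def lub(X):
    """The upper bound that is below all other upper bounds, if any."""
    ubs = upper_bounds(X)
    least = [d for d in ubs if all(leq(d, u) for u in ubs)]
    return least[0] if least else None

print(sorted(upper_bounds({"x=0", "x>0"})))  # ['true', 'x>=0']
print(lub({"x=0", "x>0"}))                   # x>=0
print(lub({"x<0", "x>0"}))                   # true
```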

Bounds example

The signs lattice (for variable x):

true

x≤0 x≥0

x<0 x=0 x>0

false

For example, for X = {x=0, x>0}:
• x≥0 and true are upper bounds
• x≥0 is the least upper bound

Join (confluence) operator

• Assume a poset (D, ⊑)
• Let X ⊆ D be a subset of D (finite/infinite)
• The join of X is defined as
  – ⊔X = the least upper bound (LUB) of all elements in X if it exists
  – ⊔X = min{ b | forall x∈X we have that x⊑b }
  – The supremum of the elements in X
  – A kind of abstract union (disjunction) operator
• Properties of a join operator
  – Commutative: x ⊔ y = y ⊔ x
  – Associative: (x ⊔ y) ⊔ z = x ⊔ (y ⊔ z)
  – Idempotent: x ⊔ x = x
• x ⊔ y = y iff x ⊑ y

Properties of join

• Can be used to define partial order: x ⊑ y iff x ⊔ y = y
• Monotone: if y ⊑ z then (x ⊔ y) ⊑ (x ⊔ z)
• ⊥ ⊔ x = x
• ⊤ ⊔ x = ⊤
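The listed properties can be checked exhaustively on a small lattice. The sketch below does so for the signs lattice, with each element encoded as the set of signs it allows (an assumption of this example). The join is computed as the smallest lattice element above both arguments.

```python
from itertools import product

# false, x<0, x=0, x>0, x<=0, x>=0, true
ELEMS = [frozenset(s) for s in ("", "-", "0", "+", "-0", "0+", "-0+")]
BOT, TOP = ELEMS[0], ELEMS[-1]

def join(a, b):
    """Least element of the signs lattice that is above both a and b."""
    ubs = [d for d in ELEMS if a <= d and b <= d]
    return min(ubs, key=len)

for x, y, z in product(ELEMS, repeat=3):
    assert join(x, y) == join(y, x)                    # commutative
    assert join(join(x, y), z) == join(x, join(y, z))  # associative
    assert join(x, x) == x                             # idempotent
    assert (join(x, y) == y) == (x <= y)               # recovers the order
    if y <= z:
        assert join(x, y) <= join(x, z)                # monotone
    assert join(BOT, x) == x and join(TOP, x) == TOP

print(join(frozenset("-"), frozenset("+")) == TOP)  # True: x<0 ⊔ x>0 = true
```

The last line shows where precision is lost: the lattice has no element for x≠0, so the join of x<0 and x>0 is true.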

Meet operator

• Assume a poset (D, ⊑)
• Let X ⊆ D be a subset of D (finite/infinite)
• The meet of X is defined as
  – ⊓X = the greatest lower bound (GLB) of all elements in X if it exists
  – ⊓X = max{ b | forall x∈X we have that b⊑x }
  – The infimum of the elements in X
  – A kind of abstract intersection (conjunction) operator
• Properties of a meet operator
  – Commutative: x ⊓ y = y ⊓ x
  – Associative: (x ⊓ y) ⊓ z = x ⊓ (y ⊓ z)
  – Idempotent: x ⊓ x = x

Complete partial orders

Complete partial order (CPO)

• A CPO is a partial order where each ascending chain has a supremum

Lattices

• A complete lattice (D, ⊑, ⊔, ⊓, ⊥, ⊤) is
• A set of elements D
• A partial order x ⊑ y
• A join operator ⊔
• A meet operator ⊓

Join semilattice

• A join semilattice (D, ⊑, ⊔, ⊤) is
• A set of elements D with ⊤
• A partial order x ⊑ y
• A join operator ⊔

Meet semilattice

• A meet semilattice (D, ⊑, ⊓, ⊥) is
• A set of elements D with ⊥
• A partial order x ⊑ y
• A meet operator ⊓

Powerset lattices

• For a set of elements X we define the powerset lattice for X as (2^X, ⊆, ∪, ∩, ∅, X)
  – Notice it is a complete lattice
• For a set of program states State, we define the collecting lattice (2^State, ⊆, ∪, ∩, ∅, State)

Composing lattices

One lattice per variable

true true

x≤0 x≥0 y≤0 y≥0

x<0 x=0 x>0 y<0 y=0 y>0

false false

How can we compose them?

Cartesian product of complete lattices

• For two complete lattices
  L1 = (D1, ⊑1, ⊔1, ⊓1, ⊥1, ⊤1)
  L2 = (D2, ⊑2, ⊔2, ⊓2, ⊥2, ⊤2)
• Define the poset Lcart = (D1×D2, ⊑cart, ⊔cart, ⊓cart, ⊥cart, ⊤cart) as follows:
  – (x1, x2) ⊑cart (y1, y2) iff x1 ⊑1 y1 and x2 ⊑2 y2
  – ⊔cart = ? ⊓cart = ? ⊥cart = ? ⊤cart = ?
• Lemma: Lcart is a complete lattice
• Define the Cartesian constructor Lcart = Cart(L1, L2)
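The standard way to fill in the question marks is component-wise: every operation of the product acts on each coordinate separately. The sketch below packages a lattice as a record of functions and builds the product; the `Lattice` record and the use of a powerset of signs for each component are assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical record bundling the operations of a lattice.
Lattice = namedtuple("Lattice", "leq join meet bot top")

def powerset(universe):
    """Powerset lattice (2^X, ⊆, ∪, ∩, ∅, X) over a finite universe."""
    U = frozenset(universe)
    return Lattice(lambda a, b: a <= b, lambda a, b: a | b,
                   lambda a, b: a & b, frozenset(), U)

def cart(L1, L2):
    """Component-wise product of two lattices."""
    return Lattice(
        leq=lambda a, b: L1.leq(a[0], b[0]) and L2.leq(a[1], b[1]),
        join=lambda a, b: (L1.join(a[0], b[0]), L2.join(a[1], b[1])),
        meet=lambda a, b: (L1.meet(a[0], b[0]), L2.meet(a[1], b[1])),
        bot=(L1.bot, L2.bot),
        top=(L1.top, L2.top),
    )

L = cart(powerset("-0+"), powerset("-0+"))
neg_neg = (frozenset("-"), frozenset("-"))   # x<0, y<0
pos_pos = (frozenset("+"), frozenset("+"))   # x>0, y>0
joined = L.join(neg_neg, pos_pos)
print(joined == (frozenset("-+"), frozenset("-+")))  # True
print(L.leq((frozenset("-"), frozenset("+")), joined))  # True
```

The second line shows the cost of the construction: the join of (x<0, y<0) and (x>0, y>0) also covers (x<0, y>0), so the correlation between x and y is lost.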

Cartesian product example

[Figure: Hasse diagram of the Cartesian product of the signs lattices for x and y, top to bottom:]

true, that is (true, true)

x≤0 x≥0 y≤0 y≥0

x≤0,y≤0 x≤0,y≥0 x≥0,y≤0 x≥0,y≥0 …

x≤0,y<0 x≥0,y<0 x≤0,y=0 x≥0,y=0 x≤0,y>0 x≥0,y>0 … x>0,y≤0 x>0,y≥0 …

x<0,y<0 x<0,y=0 x<0,y>0 x=0,y<0 x=0,y=0 x=0,y>0 x>0,y<0 x>0,y=0 x>0,y>0

false, that is (false, false)

How does it represent (x<0∧y<0) ∨ (x>0∧y>0)?

Disjunctive completion

• For a complete lattice L = (D, ⊑, ⊔, ⊓, ⊥, ⊤)
• Define the powerset lattice L∨ = (2^D, ⊑∨, ⊔∨, ⊓∨, ⊥∨, ⊤∨)
  ⊑∨ = ? ⊔∨ = ? ⊓∨ = ? ⊥∨ = ? ⊤∨ = ?
• Lemma: L∨ is a complete lattice
• L∨ contains all subsets of D, which can be thought of as disjunctions of the corresponding predicates
• Define the disjunctive completion constructor L∨ = Disj(L)

The base lattice CPfalse

true

… {x=-2} {x=-1} {x=0} {x=1} {x=2} …

false

The disjunctive completion of CPfalse

What is the height of this lattice?

true

… {x=-2} {x=-1} {x=0} {x=1} {x=2} …

… {x=-2 ∨ x=-1} {x=-2 ∨ x=0} {x=-2 ∨ x=1} … {x=1 ∨ x=2} …

… {x=-1 ∨ x=1 ∨ x=-2} … {x=0 ∨ x=1 ∨ x=2} … …

false

Relational product of lattices

• L1 = (D1, ⊑1, ⊔1, ⊓1, ⊥1, ⊤1)
  L2 = (D2, ⊑2, ⊔2, ⊓2, ⊥2, ⊤2)
• Lrel = (2^(D1×D2), ⊑rel, ⊔rel, ⊓rel, ⊥rel, ⊤rel) as follows:
  – Lrel = Disj(Cart(L1, L2))
• Lemma: Lrel is a complete lattice
• What does it buy us?

Cartesian product example

[Figure: the Hasse diagram of the Cartesian product of the signs lattices for x and y, from true at the top through pairs such as x≤0,y≥0 and x<0,y=0 down to false.]

How does it represent (x<0∧y<0) ∨ (x>0∧y>0)?
What is the height of this lattice?

Relational product example

true

x≤0 x≥0 y≤0 y≥0

(x<0∧y<0)∨(x>0∧y>0) (x<0∧y<0)∨(x>0∧y=0) (x<0∧y≥0)∨(x<0∧y≤0)

false

How does it represent (x<0∧y<0) ∨ (x>0∧y>0)?
What is the height of this lattice?

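Collecting semantics

[Figure: the control-flow graph of the loop program (label0: if x <= 0 goto label1; x := x - 1; goto label0; label1:) annotated with the collecting semantics at the fixed point. Labels in the figure: entry: … [x↦3] [x↦2] [x↦1]; node 2 (if x > 0): … [x↦3] [x↦2] [x↦-1] [x↦0] [x↦1]; node 3 (x := x - 1): [x↦1] [x↦2] [x↦3] …; exit: … [x↦-2] [x↦-1] [x↦0].]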

Defining the collecting semantics

• How should we represent the set of states at a given control-flow node by a lattice?
• How should we represent the sets of states at all control-flow nodes by a lattice?

Finite maps

• For a complete lattice L = (D, ⊑, ⊔, ⊓, ⊥, ⊤) and finite set V
• Define the poset LV→L = (V→D, ⊑V→L, ⊔V→L, ⊓V→L, ⊥V→L, ⊤V→L) as follows:
  – f1 ⊑V→L f2 iff for all v∈V f1(v) ⊑ f2(v)
  – ⊔V→L = ? ⊓V→L = ? ⊥V→L = ? ⊤V→L = ?
• Lemma: LV→L is a complete lattice
• Define the map constructor LV→L = Map(V, L)
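As with the Cartesian product, the usual answer to the question marks is pointwise: each operation is applied key by key. A minimal sketch using Python dictionaries for maps; the helper names and the example instance (control-flow nodes mapped to sets of values of x) are assumptions made for illustration.

```python
def map_leq(f1, f2, leq):
    """f1 ⊑ f2 iff f1(v) ⊑ f2(v) for every key v (both maps share the keys V)."""
    return all(leq(f1[v], f2[v]) for v in f1)

def map_join(f1, f2, join):
    """Pointwise join of two maps over the same keys."""
    return {v: join(f1[v], f2[v]) for v in f1}

def map_bottom(V, bot):
    """The least map: every key is sent to the bottom of L."""
    return {v: bot for v in V}

# Hypothetical instance: V = CFG nodes, L = powerset of states (values of x).
subset = lambda a, b: a <= b
union = lambda a, b: a | b
f1 = {"entry": {1}, "exit": {0}}
f2 = {"entry": {2}, "exit": {0}}
f = map_join(f1, f2, union)
print(f == {"entry": {1, 2}, "exit": {0}})               # True
print(map_leq(f1, f, subset), map_leq(f, f1, subset))    # True False
print(map_leq(map_bottom(f1, frozenset()), f1, subset))  # True
```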

The collecting lattice

• Lattice for a given control-flow node v: Lv = (2^State, ⊆, ∪, ∩, ∅, State)
• Lattice for entire control-flow graph with nodes V: LCFG = Map(V, Lv)
• We will use this lattice as a baseline for static analysis and define abstractions of its elements

Equational definition of the semantics

• Define variables of type set of states for each control-flow node
• Define constraints between them
  R[2] = R[entry] ∪ ⟦x:=x-1⟧ R[3]
  R[3] = R[2] ∩ {s | s(x) > 0}
  R[exit] = R[2] ∩ {s | s(x) ≤ 0}
• A system of recursive equations
• How can we approximate it using what we have learned so far?

[Figure: control-flow graph with one variable per node: R[entry] at entry, R[2] at node 2 (if x > 0), R[3] at node 3 (x := x - 1), R[exit] at exit.]

An abstract semantics

• R[2] = R[entry] ⊔ ⟦x:=x-1⟧# R[3] (⟦x:=x-1⟧# is the abstract transformer for x:=x-1)
• R[3] = R[2] ⊓ {s | s(x) > 0}# ({…}# is the abstract representation of a set of states)
• R[exit] = R[2] ⊓ {s | s(x) ≤ 0}#
• A system of recursive equations

[Figure: control-flow graph with one variable per node: R[entry] at entry, R[2] at node 2 (if x > 0), R[3] at node 3 (x := x - 1), R[exit] at exit.]
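To see the abstract equations at work, one can pick a concrete abstract domain and iterate. The sketch below assumes the signs lattice for x as the abstract domain, with each element encoded as the set of signs it allows. The slide does not fix a domain at this point, and the table used for the abstract transformer `dec` is this example's own choice.

```python
# false, x<0, x=0, x>0, x<=0, x>=0, true
ELEMS = {frozenset(s) for s in ("", "-", "0", "+", "-0", "0+", "-0+")}
BOT, TOP = frozenset(), frozenset("-0+")
POS, NONPOS = frozenset("+"), frozenset("-0")

def join(a, b):
    u = a | b
    return u if u in ELEMS else TOP   # {-,+} is not a signs element

def meet(a, b):
    return a & b                      # signs elements are closed under ∩

def dec(a):
    """Abstract transformer for x := x - 1 (assumed table, per sign)."""
    effect = {"-": "-", "0": "-", "+": "0+"}
    out = frozenset().union(*(frozenset(effect[s]) for s in a))
    return out if out in ELEMS else TOP

def solve():
    """Iterate the abstract equations from bottom until they stabilize."""
    R = {"entry": BOT, "2": BOT, "3": BOT, "exit": BOT}
    while True:
        new = {
            "entry": TOP,                          # any input
            "2": join(R["entry"], dec(R["3"])),    # R[2] = R[entry] ⊔ ⟦x:=x-1⟧# R[3]
            "3": meet(R["2"], POS),                # R[3] = R[2] ⊓ {x>0}#
            "exit": meet(R["2"], NONPOS),          # R[exit] = R[2] ⊓ {x≤0}#
        }
        if new == R:
            return R
        R = new

R = solve()
print(R["2"] == TOP, R["3"] == POS, R["exit"] == NONPOS)  # True True True
```

The result matches the predicates at the fixed point of the collecting semantics: true at the test, x>0 before the assignment, and x≤0 at exit.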

Next lecture: abstract interpretation II