• Still need volunteers to teach:
  – BDDs
  – SAT-solvers
  – SAT-based decision procedures
  – Temporal logic (and maybe other modal logics)
  – ESC/Java

• Please let me know soon

Automated Deduction - George Necula - Lecture 2

Review - More Semantics

• We have an imperative language with pointers and function calls
• We have defined the semantics of the language
  – e.g. what are appropriate meanings?
• Operational semantics
  – Relatively simple
  – Not compositional (due to loops and recursive calls)
  – Adequate guide for an implementation
• There is also denotational semantics
  – Each program has a meaning in the form of a mathematical object
  – Compositional
  – More complex formalism
• Neither is good for arguing program correctness
  – Operational semantics requires running the code
  – Denotational semantics requires complex calculations

• We do instead: Programs → Theorems → Proofs


Programs → Theorems: Axiomatic Semantics. Partial Correctness Assertions

• Consists of:
  – A language for making assertions about programs
  – Rules for establishing when assertions hold
• Typical assertions:
  – During the execution, only non-null pointers are dereferenced
  – This program terminates with x = 0
• Partial vs. total correctness assertions
  – Safety vs. liveness properties
  – Usually focus on safety (partial correctness)

• The assertions we make about programs are of the form:

  {A} c {B}

  with the meaning that:
  – Whenever we start the execution of c in a state that satisfies A, the program either does not terminate or it terminates in a state that satisfies B
• A is called the precondition and B is called the postcondition
• For example:

  { y ≤ x } z := x; z := z + 1 { y < z }

  is a valid assertion
• These are called Hoare triples or Hoare assertions

Total Correctness Assertions Languages for Assertions

• {A} c {B} is a partial correctness assertion. It does not imply termination
• [A] c [B] is a total correctness assertion meaning that:
  – Whenever we start the execution of c in a state that satisfies A, the program does terminate in a state that satisfies B
• Now let's be more formal
  – Formalize the language of assertions, A and B
  – Say when an assertion holds in a state
  – Give rules for deriving Hoare triples

• A specification language
  – Must be easy to use and expressive (conflicting needs)
  – Most often only expressions
  – Syntax: how to construct assertions
  – Semantics: what assertions mean
• Typical examples
  – First-order logic
  – Temporal logic (used in protocol specification, hardware specification)
  – Special-purpose languages: Z, Larch, JML


State-Based Assertions An Assertion Language

• Assertions that characterize the state of the execution
  – Recall: state = state of locals + state of memory
• Our assertions will need to be able to refer to
  – Variables
  – Contents of memory
• What are not state-based assertions:
  – Variable x is live, lock L will be released
  – There is no correlation between the values of x and y

• We'll use a fragment of first-order logic first:

  Formulas P ::= A | T | ⊥ | P1 ∧ P2 | ∀x.P | P1 ⇒ P2
  Atoms    A ::= f(A1, …, An) | E1 ≤ E2 | E1 = E2 | …

• We can also have an arbitrary assortment of function symbols
  – ptr(E, T) – expression E denotes a pointer to T
  – E : ptr(T) – same in a different notation
  – reachable(E1, E2) – list cell E2 is reachable from E1
  – these can be built-in or defined


Semantics of Assertions

• We introduced a language of assertions; we need to assign meanings to assertions
  – We ignore for now references to memory
• Notation ρ, σ ⊨ A says that an assertion holds in a given state
  – This is well-defined when ρ is defined on all variables occurring in A and σ is defined on all memory addresses referenced in A
• The ⊨ judgment is defined inductively on the structure of assertions
• Formal definition (we drop σ for simplicity):

  ρ ⊨ true      always
  ρ ⊨ e1 = e2   iff ρ ⊢ e1 ⇓ n1 and ρ ⊢ e2 ⇓ n2 and n1 = n2
  ρ ⊨ e1 ≥ e2   iff ρ ⊢ e1 ⇓ n1 and ρ ⊢ e2 ⇓ n2 and n1 ≥ n2
  ρ ⊨ A1 ∧ A2   iff ρ ⊨ A1 and ρ ⊨ A2
  ρ ⊨ A1 ∨ A2   iff ρ ⊨ A1 or ρ ⊨ A2
  ρ ⊨ A1 ⇒ A2   iff ρ ⊨ A1 implies ρ ⊨ A2
  ρ ⊨ ∀x.A      iff ∀n∈Z. ρ[x:=n] ⊨ A
  ρ ⊨ ∃x.A      iff ∃n∈Z. ρ[x:=n] ⊨ A
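The inductive definition can be mirrored as a recursive evaluator. A minimal sketch in Python (my own encoding, not from the lecture): assertions are nested tuples, ρ is a dict, and since the real definition quantifies over all of Z, the quantifier cases here check only a small finite range and are therefore just an approximation.

```python
# Evaluate assertions of the first-order fragment against an environment rho.
# Assertions: ('true',), ('=', e1, e2), ('>=', e1, e2), ('and', a1, a2),
# ('or', a1, a2), ('implies', a1, a2), ('forall', x, a), ('exists', x, a).
# Expressions: int literals, variable names, ('+', e1, e2), ('*', e1, e2).

def eval_exp(rho, e):
    if isinstance(e, int):
        return e
    if isinstance(e, str):
        return rho[e]
    op, e1, e2 = e
    v1, v2 = eval_exp(rho, e1), eval_exp(rho, e2)
    return v1 + v2 if op == '+' else v1 * v2

def holds(rho, a, domain=range(-10, 11)):
    # NOTE: 'domain' approximates Z; the real semantics uses all integers.
    tag = a[0]
    if tag == 'true':
        return True
    if tag in ('=', '>='):
        v1, v2 = eval_exp(rho, a[1]), eval_exp(rho, a[2])
        return v1 == v2 if tag == '=' else v1 >= v2
    if tag == 'and':
        return holds(rho, a[1], domain) and holds(rho, a[2], domain)
    if tag == 'or':
        return holds(rho, a[1], domain) or holds(rho, a[2], domain)
    if tag == 'implies':
        return (not holds(rho, a[1], domain)) or holds(rho, a[2], domain)
    if tag == 'forall':
        _, x, body = a
        return all(holds({**rho, x: n}, body, domain) for n in domain)
    if tag == 'exists':
        _, x, body = a
        return any(holds({**rho, x: n}, body, domain) for n in domain)
    raise ValueError(tag)

# rho |= (x >= 3) and (exists y. y = x + 1)
rho = {'x': 5}
print(holds(rho, ('and', ('>=', 'x', 3),
                  ('exists', 'y', ('=', 'y', ('+', 'x', 1))))))  # prints True
```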


Semantics of Assertions Why Isn't This Enough?

• Now we can define formally the meaning of a partial correctness assertion:

  ⊨ {A} c {B} iff ∀ρσ. ∀ρ'σ'. (ρ,σ ⊨ A ∧ ρ,σ ⊢ c ⇓ ρ',σ') ⇒ ρ',σ' ⊨ B

• … and the meaning of a total correctness assertion:

  ⊨ [A] c [B] iff ∀ρσ. ∀ρ'σ'. (ρ,σ ⊨ A ∧ ρ,σ ⊢ c ⇓ ρ',σ') ⇒ ρ',σ' ⊨ B
              ∧  ∀ρσ. ρ,σ ⊨ A ⇒ ∃ρ'σ'. ρ,σ ⊢ c ⇓ ρ',σ'

• Now we have the formal mechanism to decide when ⊨ {A} c {B}:
  – Start the program in all states that satisfy A
  – Run the program
  – Check that each final state satisfies B
• This is exhaustive testing
• Not enough:
  – Can't try the program in all states satisfying the precondition
  – Can't find all final states for non-deterministic programs
  – And it is also impossible to effectively verify the truth of a postcondition (by using the definition of validity)


Derivations as Proxies for Validity Derivation Rules for Assertions

• We define a symbolic technique for deriving valid assertions from others that are known to be valid
  – We start with validity of first-order formulas
• The derivation rules for ⊢ A are the usual ones from first-order logic, with natural-deduction-style axioms:

• We write ⊢ A when we can derive (prove) the assertion A
  – We wish that (∀ρσ. ρ,σ ⊨ A) iff ⊢ A
• We write ⊢ {A} c {B} when we can derive (prove) the partial correctness assertion
  – We wish that ⊨ {A} c {B} iff ⊢ {A} c {B}

  ⊢ A   ⊢ B        ⊢ [a/x]A (a is fresh)        ⊢ ∀x.A
  ----------       --------------------         ---------
   ⊢ A ∧ B              ⊢ ∀x.A                  ⊢ [E/x]A

  A ⊢ … ⊢ B        ⊢ A ⇒ B   ⊢ A       ⊢ [E/x]A       ⊢ ∃x.A   [a/x]A ⊢ … ⊢ B (a fresh)
  ----------       --------------      ---------      -------------------------------
   ⊢ A ⇒ B              ⊢ B            ⊢ ∃x.A                     ⊢ B


Derivation Rules for Hoare Triples Derivation Rules for Hoare Logic

• Similarly we write ⊢ {A} c {B} when we can derive the triple using derivation rules
• There is one derivation rule for each command in the language
• Plus, the rule of consequence:

  ⊢ A' ⇒ A   ⊢ {A} c {B}   ⊢ B ⇒ B'
  -----------------------------------
            ⊢ {A'} c {B'}

• One rule for each syntactic construct:

  ⊢ {A} skip {A}

  ⊢ {A} c1 {B}   ⊢ {B} c2 {C}
  ---------------------------
       ⊢ {A} c1; c2 {C}

  ⊢ {A ∧ b} c1 {B}   ⊢ {A ∧ ¬b} c2 {B}
  -------------------------------------
     ⊢ {A} if b then c1 else c2 {B}


Derivation Rules for Hoare Logic (II) Hoare Rules: Assignment

• The rule for while is not syntax directed
  – It needs a loop invariant
• General rule:

  ⊢ {A ∧ b} c {A}
  -------------------------------
  ⊢ {A} while b do c {A ∧ ¬b}

• Surprising how simple the rule is!
  – Exercise: try to see what is wrong if you make changes to the rule (e.g., drop "∧ b" in the premise, …)

• Example: {A} x := x + 2 {x ≥ 5}. What is A?
  – A has to imply x ≥ 3

  ⊢ {[e/x]A} x := e {A}

• But try {A} *x := 5 {*x + *y = 10}
  – A is "*y = 5 or x = y"
  – How come the rule does not work?
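The assignment rule is pure substitution: the precondition is the postcondition with e written in place of x. A small sketch (my own tuple encoding, not the lecture's) applied to the {A} x := x + 2 {x ≥ 5} example:

```python
# Backward reading of the assignment rule: for {A} x := e {B}, take A = [e/x]B,
# i.e. B with every occurrence of variable x replaced by expression e.
# Assertions and expressions are nested tuples; variables are strings.

def subst(a, x, e):
    """Return a with every occurrence of variable x replaced by expression e."""
    if isinstance(a, str):
        return e if a == x else a
    if isinstance(a, tuple):
        return (a[0],) + tuple(subst(t, x, e) for t in a[1:])
    return a  # numeric literal

# {A} x := x + 2 {x >= 5}:  A = [x+2/x](x >= 5) = (x + 2 >= 5), i.e. x >= 3
post = ('>=', 'x', 5)
pre = subst(post, 'x', ('+', 'x', 2))
print(pre)  # prints ('>=', ('+', 'x', 2), 5)
```

Note that this works only because x is an ordinary variable; as the *x := 5 example shows, substitution alone is unsound for writes through pointers.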


Example: Assignment The Assignment Axiom (Cont.)

• Assume that x does not appear in e. Prove that {true} x := e {x = e}
• We have

  ⊢ {e = e} x := e {x = e}

  because [e/x](x = e) ≡ (e = [e/x]e) ≡ (e = e)

• Assignment + consequence:

  ⊢ true ⇒ e = e   ⊢ {e = e} x := e {x = e}
  -------------------------------------------
          ⊢ {true} x := e {x = e}

• Hoare said: "Assignment is undoubtedly the most characteristic feature of programming a digital computer, and one that most clearly distinguishes it from other branches of mathematics. It is surprising therefore that the axiom governing our reasoning about assignment is quite as simple as any to be found in elementary logic."
• Caveats are sometimes needed for languages with aliasing:
  – If x and y are aliased then {true} x := 5 {x + y = 10} is true

Multiple Hoare Rules Example: Conditional

• For some constructs multiple rules are possible:

  ⊢ {A} x := e {∃x0. [x0/x]A ∧ x = [x0/x]e}

  (This was the "forward" axiom for assignment before Hoare)

  ⊢ A ∧ b ⇒ C   ⊢ {C} c {A}   ⊢ A ∧ ¬b ⇒ B
  -------------------------------------------
         ⊢ {A} while b do c {B}

  ⊢ {C} c {b ⇒ C ∧ ¬b ⇒ B}
  --------------------------------------
  ⊢ {b ⇒ C ∧ ¬b ⇒ B} while b do c {B}

• Exercise: these rules can be derived from the previous ones using the consequence rules

• Example:

  D1 :: ⊢ {true ∧ y ≤ 0} x := 1 {x > 0}    D2 :: ⊢ {true ∧ y > 0} x := y {x > 0}
  --------------------------------------------------------------------------------
          ⊢ {true} if y ≤ 0 then x := 1 else x := y {x > 0}

• D1 is obtained by consequence and assignment:

  ⊢ {1 > 0} x := 1 {x > 0}   ⊢ true ∧ y ≤ 0 ⇒ 1 > 0
  ---------------------------------------------------
          ⊢ {true ∧ y ≤ 0} x := 1 {x > 0}

• D2 is also obtained by consequence and assignment:

  ⊢ {y > 0} x := y {x > 0}   ⊢ true ∧ y > 0 ⇒ y > 0
  ---------------------------------------------------
          ⊢ {true ∧ y > 0} x := y {x > 0}

Example: Loop Another Example

• We want to derive that

  ⊢ {x ≤ 0} while x ≤ 5 do x := x + 1 {x = 6}

• Use the rule for while with invariant x ≤ 6:

  ⊢ x ≤ 6 ∧ x ≤ 5 ⇒ x + 1 ≤ 6   ⊢ {x + 1 ≤ 6} x := x + 1 {x ≤ 6}
  ----------------------------------------------------------------
  ⊢ {x ≤ 6 ∧ x ≤ 5} x := x + 1 {x ≤ 6}
  ----------------------------------------------------------------
  ⊢ {x ≤ 6} while x ≤ 5 do x := x + 1 {x ≤ 6 ∧ x > 5}

• Then finish off with consequence:

  ⊢ x ≤ 0 ⇒ x ≤ 6   ⊢ x ≤ 6 ∧ x > 5 ⇒ x = 6   ⊢ {x ≤ 6} while … {x ≤ 6 ∧ x > 5}
  --------------------------------------------------------------------------------
  ⊢ {x ≤ 0} while … {x = 6}

• Verify that ⊢ {A} while true do c {B} holds for any A, B and c
• We must construct a derivation tree:

                                       ⊢ {true ∧ true} c {true}
                                       ----------------------------------------
  ⊢ A ⇒ true   ⊢ true ∧ false ⇒ B     ⊢ {true} while true do c {true ∧ false}
  ------------------------------------------------------------------------------
  ⊢ {A} while true do c {B}

• We need an additional lemma: ∀c. ⊢ {true} c {true}
  – How do you prove this one?

GCD Example GCD Example (2)

• Crucial to select good loop invariant


GCD Example (3) GCD Example (4)


GCD Example (5) GCD Example (6)

• The above can be proved by realizing that gcd(x,y) = gcd(x-y,y)

• Q.e.d. This completes the proof
• We used a lot of arithmetic
• We had to invent the loop invariants

• What about the proof for total correctness?

Hoare Rule for Function Call

• If no recursion we can inline the function call

  f(x1, …, xn) = Cf ∈ Program   {A} Cf {B[f/x]}
  -----------------------------------------------
  ⊢ {A[e1/x1, …, en/xn]} x := f(e1, …, en) {B}

Axiomatic Semantics in Presence of Side-Effects

• In general:

1. each function f has a Pre f and Post f

⊢ {Pre f[e 1/x 1,…,e n/xn]} x := f(e 1, …, e n) {Post f [x/f] }

2. For each function we check { Pre f } Cf {Post f }


Naïve Handling of Program State Handling Program State

• We allow memory reads in assertions: *x + *y = 5
• We try: {A} *x := 5 {*x + *y = 10}
  – A ought to be "*y = 5 or x = y"
• The Hoare rule would give us:

  (*x + *y = 10)[5/*x]
  = 5 + *y = 10
  = *y = 5            (we lost one case)

• How come the rule does not work?

• We cannot have side-effects in assertions
  – While creating the theorem we must remove side-effects!
  – But how to do that when lacking precise aliasing information?
• Important technique: postpone alias analysis
• Model the state of memory as a symbolic mapping from addresses to values:
  – If E denotes an address and M a memory state then:
  – sel(M, E) denotes the contents of the memory cell at address E
  – upd(M, E, V) denotes a new memory state obtained from M by writing V at address E

More on Memory Semantics of Memory Expressions

• We allow variables to range over memory states
  – So we can quantify over all possible memory states
• And we use the special pseudo-variable µ in assertions to refer to the current state of memory
• Example:

  "∀i. i ≥ 0 ∧ i < 5 ⇒ sel(µ, A + i) > 0" = allpositive(µ, A, 0, 5)

  says that entries 0..4 in array A are positive

• We need a new kind of values (memory values):

  Values v ::= n | a | σ

  ρ, σ ⊢ µ ⇓ σ

  ρ, σ ⊢ Em ⇓ σ'   ρ, σ ⊢ E2 ⇓ a
  --------------------------------
  ρ, σ ⊢ sel(Em, E2) ⇓ σ'(a)

  ρ, σ ⊢ Em ⇓ σ'   ρ, σ ⊢ Ea ⇓ a   ρ, σ ⊢ Ev ⇓ v
  -------------------------------------------------
  ρ, σ ⊢ upd(Em, Ea, Ev) ⇓ σ'[a := v]


Hoare Rules: Side-Effects Memory Aliasing

• To correctly model writes we use memory expressions
  – A memory write changes the value of memory:

  {B[upd(µ, E1, E2)/µ]} *E1 := E2 {B}

• Important technique: treat memory as a whole
• And reason later about memory expressions with inference rules such as (McCarthy):

  sel(upd(M, E1, E2), E3) = E2           if E1 = E3
                          = sel(M, E3)   if E1 ≠ E3

• Consider again: {A} *x := 5 {*x + *y = 10}
• We obtain:

  A = (*x + *y = 10)[upd(µ, x, 5)/µ]
    = (sel(µ, x) + sel(µ, y) = 10)[upd(µ, x, 5)/µ]
    = sel(upd(µ, x, 5), x) + sel(upd(µ, x, 5), y) = 10    (*)
    = 5 + sel(upd(µ, x, 5), y) = 10
    = if x = y then 5 + 5 = 10 else 5 + sel(µ, y) = 10
    = x = y or *y = 5                                      (**)

• To (*) is theorem generation
• From (*) to (**) is theorem proving
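McCarthy's read-over-write rule can be implemented as a small symbolic simplifier. A sketch (my own tuple encoding, not the lecture's code): when aliasing is unknown it keeps the case split as an if-then-else term, exactly the situation the theorem prover must later resolve.

```python
# McCarthy's rule: sel(upd(M, E1, E2), E3) simplifies to E2 when E1 = E3,
# to sel(M, E3) when E1 != E3, and otherwise stays as a conditional term.
# Memories are tuples ('upd', base, addr, val) over an initial memory name.

def sel(m, addr, known_equal=None):
    """Simplify sel(m, addr). known_equal(a, b) -> True/False/None (unknown)."""
    if isinstance(m, tuple) and m[0] == 'upd':
        _, base, waddr, val = m
        if known_equal is not None:
            eq = known_equal(waddr, addr)
        else:
            # Without alias information, only syntactic identity is certain.
            eq = True if waddr == addr else None
        if eq is True:
            return val
        if eq is False:
            return sel(base, addr, known_equal)
        # Aliasing unknown: keep the case split for the theorem prover.
        return ('ite', ('=', waddr, addr), val, sel(base, addr, known_equal))
    return ('sel', m, addr)

m = ('upd', 'mu', 'x', 5)   # memory after *x := 5
print(sel(m, 'x'))          # prints 5
print(sel(m, 'y'))          # prints ('ite', ('=', 'x', 'y'), 5, ('sel', 'mu', 'y'))
```

Passing a `known_equal` oracle models the "approximate" rule on the next slide, where obvious (non-)aliasing facts resolve the conditional immediately.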


Alternative Handling for Memory Using Hoare Rules. Notes

• Reasoning about aliasing is expensive (NP-hard)
• Sometimes completeness is sacrificed with the following (approximate) rule:

  sel(upd(M, E1, E2), E3) = E2           if E1 (obviously) = E3
                          = sel(M, E3)   if E1 (obviously) ≠ E3
                          = p            otherwise (p is a fresh new parameter)

• The meaning of "obvious" varies:
  – The addresses of two distinct globals are ≠
  – The address of a global and the address of a local are ≠
• PREfix and GCC use such schemes

• Hoare rules are mostly syntax directed
• There are three wrinkles:
  – When to apply the rule of consequence?
  – What invariant to use for while?
  – How do you prove the implications involved in consequence?
• The last one is how theorem proving gets in the picture
  – This turns out to be doable!
  – The loop invariants turn out to be the hardest problem! (Should the programmer give them? See Dijkstra.)

Where Do We Stand? Soundness of Axiomatic Semantics

• We have a language for asserting properties of programs
• We know when such an assertion is true
• We also have a symbolic method for deriving assertions

  meaning:                          A ↦ ρ,σ ⊨ A      {A} c {B} ↦ ⊨ {A} c {B}
  symbolic derivation
  (theorem proving):                ⊢ A              ⊢ {A} c {B}
  soundness:                        ⊢ implies ⊨
  completeness:                     ⊨ implies ⊢

• Formal statement of soundness:

  If ⊢ {A} c {B} then ⊨ {A} c {B}

  or, equivalently:

  For all ρ, σ: if ρ,σ ⊨ A and D :: ρ,σ ⊢ c ⇓ ρ',σ' and H :: ⊢ {A} c {B} then ρ',σ' ⊨ B

Completeness of Axiomatic Semantics

• Is it true that whenever ⊨ {A} c {B} we can also derive ⊢ {A} c {B}?
• If it isn't, then there are valid properties of programs that we cannot verify with Hoare rules
• Good news: for our language the Hoare triples are complete
• Bad news: only if the underlying logic is complete (whenever ⊨ A we also have ⊢ A) – this is called relative completeness

Completeness of Axiomatic Semantics. Weakest Preconditions


Proof Idea Proof Idea (Cont.)

• Dijkstra's idea: to verify that {A} c {B}:
  a) Find out all predicates A' such that ⊨ {A'} c {B}; call this set Pre(c, B)
  b) Verify for one A' ∈ Pre(c, B) that A ⇒ A'
• Assertions can be ordered:

  false ⇒ … ⇒ A ⇒ WP(c, B) ⇒ … ⇒ true
  (strong)                         (weak)

  where WP(c, B) is the weakest precondition, the weakest element of Pre(c, B)
• Thus: compute WP(c, B) and prove A ⇒ WP(c, B)

• Completeness of axiomatic semantics:

  If ⊨ {A} c {B} then ⊢ {A} c {B}

• Assuming that we can compute wp(c, B) with the following properties:
  1. wp is a precondition (according to the Hoare rules):  ⊢ {wp(c, B)} c {B}
  2. wp is the weakest precondition:  if ⊨ {A} c {B} then ⊨ A ⇒ wp(c, B)

  ⊢ A ⇒ wp(c, B)   ⊢ {wp(c, B)} c {B}
  -------------------------------------
              ⊢ {A} c {B}

• We also need that whenever ⊨ A then ⊢ A!


Weakest Preconditions Weakest Preconditions for Loops

• Define wp(c, B) inductively on c, following the Hoare rules:

  {A} c1 {C}   {C} c2 {B}
  ------------------------         wp(c1; c2, B) = wp(c1, wp(c2, B))
      {A} c1; c2 {B}

  {[e/x]B} x := e {B}              wp(x := e, B) = [e/x]B

  {A1} c1 {B}   {A2} c2 {B}
  ----------------------------------------------
  {E ⇒ A1 ∧ ¬E ⇒ A2} if E then c1 else c2 {B}

  wp(if E then c1 else c2, B) = E ⇒ wp(c1, B) ∧ ¬E ⇒ wp(c2, B)

• We start from the equivalence

  while b do c  =  if b then (c; while b do c) else skip

• Let w = while b do c and W = wp(w, B)
• We have that

  W = b ⇒ wp(c, W) ∧ ¬b ⇒ B

• But this is a recursive equation!
  – We know how to solve these using domain theory
• We need a domain for assertions
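For the loop-free fragment these equations translate directly into a recursive function. A sketch in Python (my own tuple encoding for commands and assertions), applied to the earlier example {y ≤ x} z := x; z := z + 1 {y < z}:

```python
# wp for loop-free commands, following the rules above:
#   wp(skip, B)                  = B
#   wp(x := e, B)                = [e/x]B
#   wp(c1; c2, B)                = wp(c1, wp(c2, B))
#   wp(if E then c1 else c2, B)  = (E => wp(c1, B)) and (not E => wp(c2, B))

def subst(a, x, e):
    """[e/x]a over tuple-shaped assertions/expressions."""
    if isinstance(a, str):
        return e if a == x else a
    if isinstance(a, tuple):
        return (a[0],) + tuple(subst(t, x, e) for t in a[1:])
    return a

def wp(c, b):
    tag = c[0]
    if tag == 'skip':
        return b
    if tag == 'assign':                      # ('assign', x, e)
        return subst(b, c[1], c[2])
    if tag == 'seq':                         # ('seq', c1, c2)
        return wp(c[1], wp(c[2], b))
    if tag == 'if':                          # ('if', cond, c1, c2)
        return ('and', ('implies', c[1], wp(c[2], b)),
                       ('implies', ('not', c[1]), wp(c[3], b)))
    raise ValueError(tag)

# wp(z := x; z := z + 1, y < z) = (y < x + 1)
prog = ('seq', ('assign', 'z', 'x'), ('assign', 'z', ('+', 'z', 1)))
print(wp(prog, ('<', 'y', 'z')))   # prints ('<', 'y', ('+', 'x', 1))
```

The result y < x + 1 is indeed implied by the slide's precondition y ≤ x, confirming that triple.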


A Partial-Order for Assertions Weakest Precondition for WHILE

• What is the assertion that contains least information?
  – true – does not say anything about the state
• What is an appropriate information ordering?

  A ⊑ A'  iff  ⊨ A' ⇒ A

• Is this partial order complete?
  – Take a chain A1 ⊑ A2 ⊑ …
  – Let ∧Ai be the infinite conjunction of the Ai:  σ ⊨ ∧Ai iff for all i we have that σ ⊨ Ai
  – Verify that ∧Ai is the least upper bound
• Can ∧Ai be expressed in our language of assertions?
  – In many cases yes (see Winskel); we'll assume yes

• Use the fixed-point theorem:

  F(A) = b ⇒ wp(c, A) ∧ ¬b ⇒ B

  – Verify that F is both monotonic and continuous
• The least fixed point (i.e. the weakest fixed point) is

  wp(w, B) = ∧i F^i(true)


Weakest Preconditions (Cont.) Weakest Precondition. Example 1

• Define a family of wp's:
  – wp_k(while e do c, B) = weakest precondition on which the loop, if it terminates in k or fewer iterations, terminates in B

  wp_0 = ¬E ⇒ B
  wp_1 = E ⇒ wp(c, wp_0) ∧ ¬E ⇒ B
  …

• wp(while e do c, B) = ∧_{k ≥ 0} wp_k = lub {wp_k | k ≥ 0}
• Is it the case that wp_k ⇒ wp_{k-1}? The opposite?
• Weakest preconditions are:
  – Impossible to compute (in general)
  – Can we find something easier to compute yet sufficient?

• Consider the code:

  while x ≤ 5 do x := x + 1

  with postcondition P = x ≥ 7
• What is the weakest precondition?
• wp(x := x + 1, A) = A[x+1/x]
• WP_0 = ¬(x ≤ 5) ⇒ x ≥ 7 = x ∉ {6}
• WP_1 = (x ≤ 5 ⇒ x + 1 ∉ {6}) ∧ WP_0 = x ∉ {5, 6}
• WP_2 = (x ≤ 5 ⇒ x + 1 ∉ {5, 6}) ∧ WP_0 = x ∉ {4, 5, 6}
• …
• WP = x ≥ 7
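The chain WP_0, WP_1, … can be explored concretely by encoding each wp_k as a Python predicate. This is a sketch specific to this loop (not a general algorithm), truncating the infinite conjunction at a finite depth:

```python
# Approximations wp_k for: while x <= 5 do x := x + 1, postcondition x >= 7.
#   wp_0(x) = not guard(x) implies post(x)
#   wp_k(x) = (guard(x) implies wp_{k-1}(x + 1)) and (not guard(x) implies post(x))
# The weakest precondition is the conjunction of all wp_k.

def guard(x):
    return x <= 5

def post(x):
    return x >= 7

def wp_k(k):
    if k == 0:
        return lambda x: guard(x) or post(x)      # not guard => post
    prev = wp_k(k - 1)
    return lambda x: ((not guard(x)) or prev(x + 1)) and (guard(x) or post(x))

def wp(x, depth=20):
    """Truncated conjunction of the first `depth` approximations."""
    return all(wp_k(k)(x) for k in range(depth))

print([x for x in range(0, 10) if wp(x)])  # prints [7, 8, 9]
```

Among 0..9 only 7, 8, 9 survive, matching the slide's limit WP = x ≥ 7; each extra unrolling excludes one more value below 7.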


Theorem Proving and Program Analysis

• Predicates form a lattice:

  WP(s, B) = lub_⇒(Pre(s, B))

• This is not obvious at all:
  – lub {P1, P2} = P1 ∨ P2
  – lub PS = ∨_{P ∈ PS} P
  – But can we always write this with a finite number of ∨?
• Even checking implication can be quite hard
• Compare with program analysis, in which lattices are of finite height and quite simple
• Program Verification is Program Analysis on the lattice of first-order formulas

Weakest Precondition. Example 2 (again)

• Consider the code:

  while x ≥ 5 do x := x + 1

  with postcondition P = x ≥ 7
• What is the weakest precondition?
• wp(x := x + 1, A) = A[x+1/x]
• WP_0 = ¬(x ≥ 5) ⇒ x ≥ 7 = x ≥ 5
• WP_1 = (x ≥ 5 ⇒ x + 1 ≥ 5) ∧ WP_0 = x ≥ 5
• …
• WP = x ≥ 5


Not Quite Weakest Preconditions

• Recall what we are trying to do:

  false ⇒ … ⇒ A ⇒ VC(c, B) ⇒ WP(c, B) ⇒ … ⇒ true
  (strong)                                   (weak)

Verification Conditions

• We shall construct a verification condition VC(c, B):
  – The loops are annotated with loop invariants!
  – VC is guaranteed stronger than WP
  – But hopefully still weaker than A:  A ⇒ VC(c, B) ⇒ WP(c, B)


Verification Conditions Invariants Are Not Easy

• Factor out the hard work:
  – Loop invariants
  – Function specifications
• Assume programs are annotated with such specs
  – Good software engineering practice anyway
• We will assume that the new form of the while construct includes an invariant:

  while_I b do c

  – The invariant formula I must hold every time before b is evaluated

• Consider the following code from QuickSort:

  int partition(int *a, int L0, int H0, int pivot) {
    int L = L0, H = H0;
    while(L < H) {
      while(a[L] < pivot) L++;
      while(a[H] > pivot) H--;
      if(L < H) { swap a[L] and a[H]; }
    }
    return L;
  }

• Consider verifying only memory safety
• What is the loop invariant for the outer loop?


Verification Condition Generation (1) Verification Condition Generation for WHILE

• Mostly follows the definition of the wp function

  VC(skip, B)                  = B
  VC(c1; c2, B)                = VC(c1, VC(c2, B))
  VC(if b then c1 else c2, B)  = b ⇒ VC(c1, B) ∧ ¬b ⇒ VC(c2, B)
  VC(x := e, B)                = [e/x]B
  VC(let x = e in c, B)        = [e/x] VC(c, B)
  VC(*e1 := e2, B)             = [upd(µ, e1, e2)/µ]B
  VC(while b do c, B)          = ?

• For while with an invariant:

  VC(while_I e do c, B) = I ∧ (∀x1…xn. I ⇒ (e ⇒ VC(c, I) ∧ ¬e ⇒ B))

  – I holds on entry; I is preserved in an arbitrary iteration; B holds when the loop terminates in an arbitrary iteration
• I is the loop invariant (provided externally)
• x1, …, xn are all the variables modified in c
• The ∀ is similar to the ∀ in mathematical induction:

  P(0) ∧ ∀n ∈ N. P(n) ⇒ P(n+1)
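These equations translate directly into a recursive VC generator. A sketch (my own tuple encoding; the invariant is stored on the while node and the modified-variables analysis is deliberately simplistic):

```python
# VC(c, B) following the rules above. While nodes carry their invariant:
#   ('while', I, b, body) -> I and (forall xs. I => (b => VC(body, I)) and (not b => B))

def subst(a, x, e):
    if isinstance(a, str):
        return e if a == x else a
    if isinstance(a, tuple):
        return (a[0],) + tuple(subst(t, x, e) for t in a[1:])
    return a

def modified(c):
    """Over-approximate the set of variables assigned in c."""
    if c[0] == 'assign':
        return {c[1]}
    if c[0] == 'seq':
        return modified(c[1]) | modified(c[2])
    if c[0] == 'if':
        return modified(c[2]) | modified(c[3])
    if c[0] == 'while':
        return modified(c[3])
    return set()

def vc(c, b):
    tag = c[0]
    if tag == 'skip':
        return b
    if tag == 'assign':                       # ('assign', x, e)
        return subst(b, c[1], c[2])
    if tag == 'seq':                          # ('seq', c1, c2)
        return vc(c[1], vc(c[2], b))
    if tag == 'if':                           # ('if', cond, c1, c2)
        return ('and', ('implies', c[1], vc(c[2], b)),
                       ('implies', ('not', c[1]), vc(c[3], b)))
    if tag == 'while':                        # ('while', inv, cond, body)
        _, inv, cond, body = c
        step = ('and', ('implies', cond, vc(body, inv)),
                       ('implies', ('not', cond), b))
        return ('and', inv, ('forall', sorted(modified(body)),
                             ('implies', inv, step)))
    raise ValueError(tag)

# while_{x<=6} (x <= 5) do x := x + 1, with postcondition x = 6
loop = ('while', ('<=', 'x', 6), ('<=', 'x', 5),
        ('assign', 'x', ('+', 'x', 1)))
print(vc(loop, ('=', 'x', 6)))
```

The output is the earlier loop example's VC: x ≤ 6 on entry, and for arbitrary x, x ≤ 6 implies both preservation (x ≤ 5 ⇒ x + 1 ≤ 6) and usefulness (¬(x ≤ 5) ⇒ x = 6).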


Verification Conditions for Function Calls Soundness and Completeness of VCGen

• VC(x := f(e1, …, en), P) =

  Pre_f[ei/xi] ∧ ∀µ. ∀f. Post_f ⇒ P[f/x]

• We check the precondition
  – with actuals substituted for formals
• We assume that the memory and "f" are changed
• We check that the function postcondition is stronger than P

• We must prove that, for all c and B:

  ⊨ {VC(c, B)} c {B}

  or, equivalently:

  VC(c, B) ⇒ WP(c, B)

  – Try it (the case for loops is interesting)
• Completeness is trickier
• One formulation: it is possible to choose the loop invariants such that

  WP(c, B) ⇒ VC(c, B)

  – Try it!!

Forward Verification Condition Generation Symbolic Evaluation

• Traditionally VC is computed backwards
  – Works well for structured code
• But it can also be computed in a forward direction
  – Works even for un-structured languages (e.g., assembly language)
  – Uses symbolic evaluation, a technique that has broad applications in program analysis
  – e.g. the PREfix tool (Intrinsa, Microsoft) works this way

• Consider the language of instructions:

  x := e | f() | if e goto L | goto L | L: | return | inv e

• The "inv e" instruction is an annotation
  – Says that boolean expression e holds at that point
• Programs are sequences of instructions
• Notation: I_k is the instruction at address k


Symbolic Evaluation. Basic Intuition Symbolic Evaluation. The State.

• VC generation is traditionally backwards due to assignments:

  VC(x1 := e1; …; xn := en, P) = (P[en/xn])[en-1/xn-1] … [e1/x1]

• We can use the following rule:

  (P[e2/x2])[e1/x1] = P[e2[e1/x1]/x2, e1/x1]

• Symbolic evaluation computes the substitution in a forward direction, and applies it when it reaches the postcondition

• We set up a symbolic evaluation state:

  Σ : Var → SymbolicExpressions

  Σ(x) = the symbolic value of x in state Σ
  Σ[x := e] = a new state in which x's value is e

• We shall use states also as substitutions:
  – Σ(e) – obtained from e by replacing each x with Σ(x)
• So far this is pretty much like the operational semantics
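The forward direction for straight-line code fits in a few lines. A sketch (my own encoding; branches, calls, and invariants are omitted here): Σ is a dict from variables to symbolic expressions, updated at each assignment and applied to the postcondition at the end instead of substituting backwards.

```python
# Forward symbolic evaluation: maintain Sigma : Var -> SymbolicExpr, update it
# at each assignment x := e, then apply Sigma to the postcondition.

def apply_state(sigma, t):
    """Use Sigma as a substitution: replace each variable by Sigma(x)."""
    if isinstance(t, str):
        return sigma.get(t, t)
    if isinstance(t, tuple):
        return (t[0],) + tuple(apply_state(sigma, u) for u in t[1:])
    return t

def forward_vc(assignments, post):
    sigma = {}
    for x, e in assignments:                 # each instruction is x := e
        # Evaluate e in the current symbolic state, as in the rule above.
        sigma = {**sigma, x: apply_state(sigma, e)}
    return apply_state(sigma, post)

# x := x + 1; y := 2 * x, with postcondition y = 2 * x
prog = [('x', ('+', 'x', 1)), ('y', ('*', 2, 'x'))]
print(forward_vc(prog, ('=', 'y', ('*', 2, 'x'))))
```

Both sides of the printed equality are 2·(x + 1), so the VC is trivially valid; this mirrors the backward composition rule P[e2[e1/x1]/x2, e1/x1], with the substitution built left to right.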

Symbolic Evaluation. The Invariants. Symbolic Evaluation. Rules.

• The symbolic evaluator keeps track of the encountered invariants
• A new element of execution state: Inv ⊆ {1…n}
• If k ∈ Inv then
  – I_k is an invariant instruction that we have already executed
• Basic idea: execute an inv instruction only twice:
  – The first time it is encountered
  – And one more time around an arbitrary iteration

• Define a VC function as an interpreter:

  VC : 1..n × SymbolicState × InvariantState → Assertion

  VC(k, Σ, Inv) =
    VC(L, Σ, Inv)                                 if I_k = goto L
    e ⇒ VC(L, Σ, Inv) ∧ ¬e ⇒ VC(k+1, Σ, Inv)      if I_k = if e goto L
    VC(k+1, Σ[x := Σ(e)], Inv)                    if I_k = x := e
    Σ(Post_current-function)                      if I_k = return
    Σ(Pre_f) ∧ ∀a1…am. Σ'(Post_f) ⇒ VC(k+1, Σ', Inv)
                                                  if I_k = f()
      (where y1, …, ym are the variables modified by f, a1, …, am are fresh
       parameters, and Σ' = Σ[y1 := a1, …, ym := am])


Symbolic Evaluation. Invariants.

• Two cases when seeing an invariant instruction:

1. We see the invariant for the first time
  – I_k = inv e
  – k ∉ Inv
  – Let {y1, …, ym} = the variables that could be modified on a path from the invariant back to itself
  – Let a1, …, am be fresh new symbolic parameters

  VC(k, Σ, Inv) = Σ(e) ∧ ∀a1…am. Σ'(e) ⇒ VC(k+1, Σ', Inv ∪ {k})
    with Σ' = Σ[y1 := a1, …, ym := am]   (like a function call)

2. We see the invariant for the second time
  – I_k = inv e
  – k ∈ Inv

  VC(k, Σ, Inv) = Σ(e)   (like a function return)

• Some tools take a more simplistic approach
  – Do not require invariants
  – Iterate through the loop a fixed number of times
  – PREfix, versions of ESC (Compaq SRC)
  – Sacrifice completeness for usability

Symbolic Evaluation. Putting it all together VC Generation Example

• Let
  – x1, …, xn be all the variables and a1, …, an fresh parameters
  – Σ0 be the state [x1 := a1, …, xn := an]
  – ∅ be the empty Inv set
• For all functions f in your program, compute

  ∀a1…an. Σ0(Pre_f) ⇒ VC(f_entry, Σ0, ∅)

• If all of these predicates are valid then:
  – If you start the program by invoking any f in a state that satisfies Pre_f, the program will execute such that
    • At all "inv e" the e holds, and
    • If the function returns then Post_f holds
  – Can be proved w.r.t. a real interpreter (operational semantics)
  – Proof technique called co-induction (or, assume-guarantee)

• Consider the program:

  Precondition: x ≤ 0
  Loop: inv x ≤ 6
        if x > 5 goto End
        x := x + 1
        goto Loop
  End:  return
  Postcondition: x = 6

VC Generation Example (cont.) VC Generation Example (with memory)

  ∀x. x ≤ 0 ⇒
        x ≤ 6 ∧
        ∀x'. (x' ≤ 6 ⇒ (x' > 5 ⇒ x' = 6
                         ∧ x' ≤ 5 ⇒ x' + 1 ≤ 6))

• VC contains both proof obligations and assumptions about the control flow

• Consider the program:

  Precondition: B : bool ∧ A : array(bool, L)
  1: I := 0
     R := B
  3: inv I ≥ 0 ∧ R : bool
     if I ≥ L goto 9
     assert saferd(A + I)
     T := *(A + I)
     I := I + 1
     R := T
     goto 3
  9: return R
  Postcondition: R : bool


VC Generation Example (cont.) VC and Invariants

  ∀A. ∀B. ∀L. ∀µ.
    B : bool ∧ A : array(bool, L) ⇒
      0 ≥ 0 ∧ B : bool ∧
      ∀I. ∀R. I ≥ 0 ∧ R : bool ⇒
        (I ≥ L ⇒ R : bool)
        ∧ (I < L ⇒ saferd(A + I)
                   ∧ I + 1 ≥ 0
                   ∧ sel(µ, A + I) : bool)

• VC contains both proof obligations and assumptions about the control flow

• Consider the Hoare triple:

  {x ≤ 0} while x ≤ 5 do x := x + 1 {x = 6}

• The VC for this is:

  x ≤ 0 ⇒ I(x) ∧ ∀x. (I(x) ⇒ (x > 5 ⇒ x = 6 ∧ x ≤ 5 ⇒ I(x+1)))

• Requirements on the invariant:
  – Holds on entry:         ∀x. x ≤ 0 ⇒ I(x)
  – Preserved by the body:  ∀x. I(x) ∧ x ≤ 5 ⇒ I(x+1)
  – Useful:                 ∀x. I(x) ∧ x > 5 ⇒ x = 6
• Check that I(x) = x ≤ 6 satisfies all constraints

VC Can Be Large VC Can Be Large (2)

• Consider the sequence of conditionals:

  (if x < 0 then x := -x); (if x ≤ 3 then x += 3)

  – With the postcondition P(x)
• The VC is:

  x < 0 ∧ -x ≤ 3 ⇒ P(-x + 3)  ∧
  x < 0 ∧ -x > 3 ⇒ P(-x)      ∧
  x ≥ 0 ∧ x ≤ 3 ⇒ P(x + 3)    ∧
  x ≥ 0 ∧ x > 3 ⇒ P(x)

• There is one conjunct for each path ⇒ exponential number of paths!
  – Conjuncts for non-feasible paths have un-satisfiable guards!
  – Thus do not consider all paths independently!
• Try with P(x) = x ≥ 3

• VCs are exponential in the size of the source because they attempt relative completeness:
  – The correctness of the program must be argued independently for each path
• Remark:
  – It is unlikely that the programmer could write a program by considering an exponential number of cases
  – But possible. Any examples?
• Solutions:
  – Allow invariants even in straight-line code


Invariants in Straight-Line Code Dropping Paths

• Purpose: modularize the verification task
• Add the command "after c establish I"
  – Same semantics as c (I is only for verification purposes)

  VC(after c establish I, P) =def VC(c, I) ∧ ∀xi. I ⇒ P

  – where the xi are ModifiedVars(c)
• Use when c contains many paths:

  after if x < 0 then x := -x establish x ≥ 0;
  if x ≤ 3 then x += 3
  { P(x) }

• VC now is (for P(x) = x ≥ 3):

  (x < 0 ⇒ -x ≥ 0) ∧ (x ≥ 0 ⇒ x ≥ 0) ∧
  ∀x. x ≥ 0 ⇒ (x ≤ 3 ⇒ P(x+3) ∧ x > 3 ⇒ P(x))

• In absence of annotations drop some paths:

  VC(if E then c1 else c2, P) = choose one of
    – E ⇒ VC(c1, P) ∧ ¬E ⇒ VC(c2, P)
    – E ⇒ VC(c1, P)
    – ¬E ⇒ VC(c2, P)

• We sacrifice soundness!
  – No more guarantees, but possibly still a good debugging aid
• Remarks:
  – A recent trend is to sacrifice soundness to increase usability
  – The PREfix tool considers only 50 non-cyclic paths through a function (almost at random)


VCGen for Exceptions VCGen for Exceptions (2)

• We extend the source language with exceptions without arguments:
  – throw throws an exception
  – try c1 handle c2 executes c2 if c1 throws
• Problem:
  – We have non-local transfer of control
  – What is VC(throw, P)?
• Solution: use 2 postconditions
  – One for normal termination
  – One for exceptional termination

• Define: VC(c, P, Q) is a precondition that makes c either not terminate, or terminate normally with P, or throw an exception with Q
• Rules:

  VC(skip, P, Q)              = P
  VC(c1; c2, P, Q)            = VC(c1, VC(c2, P, Q), Q)
  VC(throw, P, Q)             = Q
  VC(try c1 handle c2, P, Q)  = VC(c1, P, VC(c2, P, Q))
  VC(try c1 finally c2, P, Q) = ?
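The two-postcondition discipline can be sketched as a recursive generator (my own tuple encoding, assuming the handler's normal termination yields the try's normal postcondition P):

```python
# VC(c, P, Q): P is the normal postcondition, Q the exceptional one.

def subst(a, x, e):
    if isinstance(a, str):
        return e if a == x else a
    if isinstance(a, tuple):
        return (a[0],) + tuple(subst(t, x, e) for t in a[1:])
    return a

def vc(c, p, q):
    tag = c[0]
    if tag == 'skip':
        return p
    if tag == 'assign':                    # ('assign', x, e)
        return subst(p, c[1], c[2])
    if tag == 'throw':                     # terminates exceptionally
        return q
    if tag == 'seq':                       # c2 runs only if c1 ends normally
        return vc(c[1], vc(c[2], p, q), q)
    if tag == 'try':                       # ('try', c1, c2): c2 handles throws
        return vc(c[1], p, vc(c[2], p, q))
    raise ValueError(tag)

# try (x := 1; throw) handle (x := 2), normal post x = 2, exceptional post false
prog = ('try', ('seq', ('assign', 'x', 1), ('throw',)),
               ('assign', 'x', 2))
print(vc(prog, ('=', 'x', 2), ('false',)))  # prints ('=', 2, 2)
```

The resulting VC is the trivially valid 2 = 2: the throw routes control into the handler, whose assignment establishes the normal postcondition.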


VCGen for Exceptions (3) Additional Language Extensions

• What if we have exceptions with arguments?
• Introduce a global variable ex for the exception argument
• The exceptional postcondition can now refer to ex
• Remember that we must add an exceptional postcondition to functions also
  – Like the THROWS clause in Java or Modula-3

• wrong
  – Semantics: terminate execution with an error
  – VC(wrong, P) = false
• fail
  – Semantics: terminate execution with a benign error
  – Used to denote error conditions that are allowed (e.g. index out of bounds)
  – VC(fail, P) = true
• assert E = if E then skip else wrong
  – Used to specify verification goals
• assume E = if E then skip else fail
  – Used to specify assumptions (e.g. about inputs)


Additional Language Extensions (2)

• Undefined variables
  – VC(let x in c, P) = ∀x. VC(c, P)

Call-by-Reference

• Let f(VAR x, y) { x ← 5; f ← x + y }
• with
  – Pre_f = y ≥ 10
  – Post_f = f ≥ 15
• The VC of the body is ∀x y. y ≥ 10 ⇒ 5 + y ≥ 15
  – Which is provable!
• Let {x = 10} y ← f(x, x) {y ≥ 15}
• Its VC is x = 10 ⇒ (x ≥ 10 ∧ (∀y. y ≥ 15 ⇒ y ≥ 15))
  – Also provable
  – But if we run the code, y is 10 at the end!
  – What’s going on?
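The failure above can be reproduced directly. Python has no VAR parameters, so the sketch below simulates call-by-reference with a one-element list as a mutable cell; with the aliased call f(x, x), the write to x also changes y inside the body, and the result is 10 rather than the ≥ 15 that the (unsound) VC proves.

```python
# Call-by-value for y: the VC's prediction holds.
def f(x_cell, y):
    x_cell[0] = 5
    return x_cell[0] + y     # y still holds the value at call time

assert f([10], 10) == 15     # 5 + 10, consistent with Post_f = f >= 15

# True call-by-reference with aliasing: f(x, x).
def f_by_ref(x_cell, y_cell):
    x_cell[0] = 5
    return x_cell[0] + y_cell[0]   # y_cell aliases x_cell, so this reads 5

cell = [10]
assert f_by_ref(cell, cell) == 10  # 5 + 5: the assertion y >= 15 fails at runtime
```

The VC of the body quantified over x and y as if they were distinct, which is exactly the assumption aliasing violates.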


Call-by-Reference

• Axiomatic semantics can model only call-by-value
• Not a big problem: just model call-by-reference with call-by-value
  – Pass pointers instead

Arrays

• Arrays can be modeled like we did pointers
• In a safe language (no & and no pointer arithmetic)
  – We can have a store value for each array
  – Instead of a unique store µ
• A[E] is considered as sel(A, E)
• A[E1] ← E2 is considered as A ← upd(A, E1, E2)
  – Use the sel/upd axioms instead of the assignment axioms
• What is the advantage over sel(µ, A+E)?
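The sel/upd (McCarthy) axioms the slide relies on can be sketched with arrays as immutable Python dicts; the point of the encoding is that an update produces a new array value while validating sel(upd(A, i, v), i) = v and sel(upd(A, i, v), j) = sel(A, j) for i ≠ j.

```python
# Functional arrays obeying the sel/upd axioms.

def upd(A, i, v):
    B = dict(A)   # non-destructive: A itself is unchanged
    B[i] = v
    return B

def sel(A, i):
    return A[i]

A = {0: 7, 1: 8, 2: 9}
assert sel(upd(A, 1, 42), 1) == 42          # read of the written index
assert sel(upd(A, 1, 42), 2) == sel(A, 2)   # other indices unchanged
assert sel(A, 1) == 8                       # upd did not mutate A
```

Because each array is its own store, an update to A can never be confused with an update to a distinct array B; with a single store µ and sel(µ, A+E) one must reason about whether A+E1 and B+E2 can overlap.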


Mutable Records - Two Models

• Let r : RECORD f1 : T1; f2 : T2 END
• Records are reference types
• Method 1
  – One “memory” for each record
  – One index constant for each field; we postulate f1 ≠ f2
  – r.f1 is sel(r, f1) and r.f1 := E is r := upd(r, f1, E)
• Method 2
  – One “memory” for each field
  – The record address is the index
  – r.f1 is sel(f1, r) and r.f1 := E is f1 := upd(f1, r, E)

VC Generation for Assembly Language

• Issues when extending this to real assembly language
• A fixed set of variables
  – Machine registers
• Handling of stack slots
  – Option 1: as memory locations (too inefficient)
  – Option 2: as additional registers (reverse spilling)
    • Careful with renaming at function calls and returns
    • Does not work in the presence of the address-of operator
• Two’s complement arithmetic
  – Careful to distinguish between + and ⊕
• See Necula’s Ph.D. thesis for x86 and Alpha examples
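Method 2 can be sketched with one dict per field, indexed by record address (the address value and field names below are illustrative). The benefit shows up directly: updates to f2 cannot affect f1, because the two memories are distinct objects by construction, so no f1 ≠ f2 disequality axiom is needed.

```python
# Method 2: one "memory" per field, record addresses as indices.

def upd(mem, addr, v):
    m = dict(mem)
    m[addr] = v
    return m

def sel(mem, addr):
    return mem[addr]

f1 = {}        # memory for field f1: address -> value
f2 = {}        # memory for field f2
r = 0xA0       # some record address (hypothetical)

f1 = upd(f1, r, 3)       # r.f1 := 3
f2 = upd(f2, r, "hi")    # r.f2 := "hi"
assert sel(f1, r) == 3

f2 = upd(f2, r, "bye")   # further writes to f2 ...
assert sel(f1, r) == 3   # ... provably leave f1 untouched
```

Under Method 1 the same fact requires the postulate f1 ≠ f2 plus the second sel/upd axiom; here it is free.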


VC as a “Semantic Checksum”

• Weakest preconditions are an expression of the program semantics
  – Two equivalent programs have (logically) equivalent WP
• VCs are almost the same
  – Except for the loop invariants and function specifications
• In fact, VCs abstract over some syntactic details
  – Order of unrelated expressions
  – Names of variables
  – Common subexpressions

VC for Type Checking Low-Level Code

• Type checking low-level code is hard because
  – We cannot have variable declarations
    • Each instance of a register use can have a different type
  – The type of a register depends on the data flow
    • Varies depending on how a use was reached
• Consider the program and its VC:
    x ← 4
    x ← x == 5
    assert x : bool
    x ← not x
    assert x
  VC: 4 == 5 : bool ∧ ¬(4 == 5)
• No live logical variables
• Not confused by the reuse of x
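The example program is closed, so a forward symbolic evaluator reduces each check to a concrete boolean. The sketch below (using Python's `isinstance` as a stand-in for the `: bool` typing check, an encoding choice of ours) collects the two checks and confirms both hold, matching the VC 4 == 5 : bool ∧ ¬(4 == 5).

```python
# Forward evaluation of:  x <- 4; x <- x == 5; assert x : bool;
#                         x <- not x; assert x
checks = []
x = 4
x = (x == 5)                         # x now holds the value of 4 == 5
checks.append(isinstance(x, bool))   # assert x : bool  ~>  (4 == 5) : bool
x = not x                            # x now holds ~(4 == 5)
checks.append(x)                     # assert x         ~>  ~(4 == 5)

assert checks == [True, True]
```

Note that the register name x never appears in the VC: each check mentions only the value flowing into that use, which is exactly why the reuse of x causes no confusion.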


Invariance of VC Through Optimizations

• In fact, the VC is so good at abstracting syntactic details that it is invariant through many common optimizations
  – It preserves syntactic form, not just semantic meaning!
• We’ll take a look at some optimizations and see how the VC changes/doesn’t change
• This can be used to verify the correctness of compiler optimizations (Translation Validation)

Dead-Code Elimination

• The VC is insensitive to dead code
• The symbolic evaluator won’t even look at it!
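The dead-code claim can be checked in miniature. With predicates as callables over a state dict and wp(x := e, P) = P[e/x], the sketch below compares a program containing a dead assignment to t against its optimized form; since P never mentions t, the two weakest preconditions agree on every sampled state.

```python
# wp(var := expr, P) = P with expr substituted for var.
def wp_assign(var, expr, P):
    return lambda s: P({**s, var: expr(s)})

P = lambda s: s["x"] > 0

# original:  t := x * 2; x := x + 1     (t is dead: P never reads it)
wp_orig = wp_assign("t", lambda s: s["x"] * 2,
                    wp_assign("x", lambda s: s["x"] + 1, P))
# optimized: x := x + 1
wp_opt = wp_assign("x", lambda s: s["x"] + 1, P)

states = [{"x": x, "t": t} for x in range(-3, 4) for t in range(-3, 4)]
assert all(wp_orig(s) == wp_opt(s) for s in states)
```

This checks logical equivalence on sample states; the slide's stronger claim is that the generated VC is syntactically identical, since the dead binding is simply never consulted.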


Common-Subexpression Elimination

• CSE does not change the VC

Copy Propagation

• Just like CSE, copy propagation does not change the VC
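A small check of the CSE claim, in the same callable wp encoding used above; the temporary t is the name CSE introduces (our choice of name). The original and the CSE'd program yield the same weakest precondition on every sampled state.

```python
def wp_assign(var, expr, P):
    return lambda s: P({**s, var: expr(s)})

def wp_seq(cmds, P):
    # wp(c1; ...; cn, P), each command given as (var, expr)
    for var, expr in reversed(cmds):
        P = wp_assign(var, expr, P)
    return P

P = lambda s: s["z"] == s["w"]

# original:   z := x + y; w := x + y
wp1 = wp_seq([("z", lambda s: s["x"] + s["y"]),
              ("w", lambda s: s["x"] + s["y"])], P)
# after CSE:  t := x + y; z := t; w := t
wp2 = wp_seq([("t", lambda s: s["x"] + s["y"]),
              ("z", lambda s: s["t"]),
              ("w", lambda s: s["t"])], P)

states = [{"x": x, "y": y, "t": 0, "z": 0, "w": 0}
          for x in range(-3, 4) for y in range(-3, 4)]
assert all(wp1(s) == wp2(s) for s in states)
```

A symbolic evaluator gets the syntactic identity as well: both programs leave z and w bound to the same term x + y, so the substituted postcondition is literally the same formula.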


Instruction Scheduling

• Instruction scheduling does not change the VC
  – Final logical variables are quantified over

Register Allocation

• Register allocation does not change the VC
• Even spilling can be accommodated
  – By giving register names to spill slots on the stack


Loop Invariant Hoisting

• The VC changes syntactically
  – But not semantically (since C1 ∧ C2 ∧ … ∧ Cn ⇒ C)
  – We do have to prove something here

Redundant Conditional Elimination

• The VC is not changed
  – For the same reasons as with CSE
• Same for array-bounds checking elimination
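The implication that justifies eliminating a redundant test can be illustrated by brute force: when the conjunction of facts already established on a path implies the guard C, the check on C is eliminable. The particular facts below (x > 5, y == x) and guard (x > 0) are a made-up instance, not from the lecture.

```python
def implies(a, b):
    return (not a) or b

# path context: C1 = x > 5, C2 = y == x; candidate check C = x > 0
for x in range(-10, 11):
    for y in range(-10, 11):
        assert implies(x > 5 and y == x, x > 0)   # C1 /\ C2 => C holds
```

For array-bounds elimination the shape is the same: the path facts are prior bounds checks and the guard is the new index-in-range condition.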


VC Characterizes a Safe Interpreter

• Consider a fictitious “safe” interpreter
  – As it goes along it performs checks (e.g. saferd, validString)
  – Some of these would actually be hard to implement
• The VC describes all of the checks to be performed
  – Along with their context (assumptions from conditionals)
  – Invariants and pre/postconditions are used to obtain a finite expression (through induction)
• VC is valid ⇒ the interpreter never fails
  – We enforce the same level of “correctness”
  – But better (static + more powerful checks)

Conclusion

• Verification conditions
  – Are an expression of the semantics and the specifications
  – Are completely independent of the language
  – Can be computed backward/forward on structured/unstructured code
  – Can be computed on high-level/low-level code
• Using symbolic evaluation we can hope to check the correctness of compiler optimizations
  – See “Translation Validation for an Optimizing Compiler” off the class web page
• Next time: we start proving VC predicates

