Language-independent Semantics-Based Program Verifiers for All Languages

Andrei Stefanescu Daejun Park Shijiao Yuwen Yilong Li Grigore Rosu Nov 2, 2016 @ OOPSLA’16 Problems with state-of-the-art verifiers

• Missing details of language behaviors

• e.g., VCC’s false positives/negatives, undefinedness of SV-COMP benchmarks

• Fragmentation: specific to a fixed language Missing details of language behaviors

1 unsigned x = UINT_MAX; 2 unsigned y = x + 1; 3 _(assert y == 0)

VCC incorrectly reported an overflow error Missing details of language behaviors

1 int assign(int *p, int x) 2 _(ensures *p == x) 3 _(writes p) 4 { 5 return (*p = x); 6 } 7 8 void main() { 9 int r; 10 assign(&r, 0) == assign(&r, 1); 11 _(assert r == 1) 12 }

VCC incorrectly proved it, missing non-determinism Missing details of language behaviors

* Grigore Rosu, https://runtimeverification.com/blog/?p=200 Problems with state-of-the-art verifiers

• Missing details of language behaviors

• Fragmentation: specific to a fixed language

• e.g., KLEE (LLVM), JPF (JVM), Pex (.NET), CBMC (C), SAGE (x86), …

• Implemented similar heuristics/optimizations: duplicating efforts Our solution

Clear separation, yet smooth integration, Between semantics reasoning and proof search, Using language-independent logic & proof system Idea: separation of concerns

Semantics Proof Reasoning Search

Language semantics: Verification techniques: • (c11, gcc, clang, …) • Deductive verification • Java (6, 7, 8, …) • Model checking • JavaScript (ES5, ES6, …) • Abstract interpretation • … • …

Defined/implemented once, and reused for all others Idea: separation of concerns

Semantics Proof Reasoning Search

Language semantics: Verification techniques: VCC • C (c11, gcc, clang, …) • Deductive verification • Java (6, 7, 8, …) • Model checking JPF CBMC • JavaScript (ES5, ES6, …) • Abstract interpretation • … • …

Defined/implemented once, and reused for all others Language-independent verification framework

Semantics ✏ Program & Properties

Language-independent uniform notation (logic)

Language-independent proof systems ` Proof automation

• Provides a nice interface (logic) in which both language semantics and program properties can be described. • Proof search in this logic becomes completely language- independent. Language-independent verification framework

Operational semantics Reachability properties (C/Java/JavaScript (Functional correctness of semantics) heap manipulations)

Language-independent uniform notation (Matching logic reachability)

Language-independent proof systems (Matching logic reachability proof systems)

Proof automation (Symbolic execution, SMT, Natural proofs, …) Operational semantics

• Easy to define and understand than axiomatic semantics

• Require little mathematical knowledge

• Similar to implement language

• Executable, thus testable

• Important when defining real large languages

• Shown to scale to defining full language semantics

• C, Java, JavaScript, Python, PHP, … Language-independent verification framework

Operational semantics Reachability properties (C/Java/JavaScript (Functional correctness of semantics) heap manipulations)

Language-independent uniform notation (Reachability logic)

Language-independent proof systems (Reachability logic proof systems)

Proof automation (Symbolic execution, SMT, Natural proofs, …) Reachability logic

• Unifying logic in which both language semantics and program correctness properties can be specified.

reachability between “patterns”

“pattern” formula representing a set of program states

• Pattern formula is FOL without predicate symbols. • Similar to algebraic data types for pattern matching in functional languages such as OCaml and Haskell. Expressiveness: semantics

• In OCaml:

match e with | ADD(x,y) => x + y | SUB(x,y) => x - y | MUL(x,y) => x * y | DIV(x,y) when y != 0 => x / y Expressiveness: semantics

• In OCaml:

match e with | ADD(x,y) => x + y | SUB(x,y) => x - y | MUL(x,y) => x * y | DIV(x,y) when y != 0 => x / y

• In Reachability logic:

ADD(x,y) => x + y SUB(x,y) => x - y MUL(x,y) => x * y DIV(x,y) /\ y != 0 => x / y Expressiveness: properties

• In Hoare logic:

fun insert (v: elem, t: tree) return (t’: tree) @requires bst(t) @ensures bst(t’) and keys(t’) == keys(t) \union { v } Expressiveness: properties

• In Hoare logic:

fun insert (v: elem, t: tree) return (t’: tree) @requires bst(t) @ensures bst(t’) and keys(t’) == keys(t) \union { v }

• In Reachability logic:

insert /\ bst(t) => . /\ bst(t’) /\ keys(t’) == keys(t) \union { v } Expressiveness

• Reachability formula can specify:

• Pre-/post-conditions

• Safety properties by augmenting semantics

• No liveness properties yet (ongoing work)

• Pattern formula can include:

• Recursive predicates

formula Language-independent verification framework

Operational semantics Reachability properties (C/Java/JavaScript (Functional correctness of semantics) heap manipulations)

Language-independent uniform notation (Reachability logic)

Language-independent proof systems (Reachability logic proof systems)

Proof automation (Symbolic execution, SMT, Natural proofs, …) Proof system for code with repetitive behaviors, such as loops, recursive Step : functions, jumps, etc. If there is a derivation of the claim = ' ' .' Language-independent proof system 'l 9'r FreeVars( l) l using itself as a circularity, then the claim holds. This would | ! ) 2 S 9 = ((' 'l) , Cfg) 'r '0 for each 'l 9 'r obviously be unsound if the new assumption was available for deriving sequents of the form: | ^W ? ^ ! ) 2 S , ' 8 '0 immediately, but requiring progress (taking at least on step Axiom : S A `C ) before circularities can be used) ensures that only diverging Q ' '0 is FOL formula (logical frame) executions correspond to endless invocation of a circularity. ) 2 S [ A Q Formally, we have the following result , ' '0 S A `C ^ ) ^ Reflexivity : Theorem 1. The proof system in Figure 4 is sound: if Q Q · ' '0 then = ' '0 (Q , ). Under , ' Q ' S ` ) S| ) 2 {9 8} S A ` ) some mild assumptions, it is relatively complete: given an Q Q Transitivity : oracle for , if = ' '0 then ' '0. Q Q T S| ) S ` ) , '1 '2 , '2 '3 C The proof for the all-path case is available in [12], and for S A ` ) S AQ[ C ` ) , '1 '3 the one-path case in [47]. When considering the completeness Consequence :S A `C ) of program verification logics, notice that if the logic for Q = '1 '0 , '0 '0 = '0 '2 specifying state properties (in this case, matching logic) | ! 1 S A `C 1 ) 2 | 2 ! Q , '1 '2 is undecidable, then the entire program verification logic Case Analysis : S A `C ) (in this case, reachability logic) is undecidable. By relative Q Q , '1 ' , '2 ' completeness, we prove the completeness of the proof system S A `C ) S A `C ) Q in Figure 4 assuming we can decide any matching logic , '1 '2 ' S A `C _ ) formula in , which means that any undecidability comes Abstraction : T Q from and is unavoidable. This theorem generalizes similar , ' '0 X FreeVars('0) = T C results from Hoare logic, but in a language-independent S A ` ) \ Q ; , X ' '0 setting. S A `C 9 ) Circularity : Q , ' Q' ' '0 4. Reachability Logic vs. Hoare Logic S A `C[{ ) 0} ) Q , ' '0 Here we briefly compare reachability logic (Section 3.3) with S A `C ) Hoare logic by means of a simple example, aiming to convey the message that verification using reachability logic is not Figure 4: Proof system for reachability. We assume the free harder than using Hoare logic, even when done manually. variables of ' ' in the Step proof rule are fresh (e.g., l )9 r disjoint from those of ' 8 '0). Here Q , . 4.1 The Program and the Language ) 2 {8 9} Consider the following snippet, say SUM, part of a C-like ' matches the left-hand-side ' of some rule in and thus, program summing up the natural numbers smaller than n: l S as is weakly well-defined, can take a step: if (, ⇢) = ' s = 0; S | then there is a 'l 9 'r and a valuation ⇢0 of the free while(--n) s = s + n; ) 2 S variables of 'l s.t. (, ⇢0) = 'l, and thus has at least one | Assume a simplified language whose loops cannot break/re- T -successor generated by 'l 9 'r. The second premise )S ) turn/jump, whose integers are arbitrarily large, and without ensures that each T -successor of a configuration matching )S local variables (so blocks are used for grouping only). Fig- ' matches ' : if T and matches ' then there is some 0 ) 0 ure 5 shows a reduction-style executable semantics of the ' ' S ⇢ , ⇢ = ' ' rule l 9 r and : Var such that ( ) l needed language fragment; with the notation explained in the ,)⇢ = '2 S ! T | ^' and ( 0 ) r; then the second part implies 0 matches 0. caption of Figure 5, the semantics consists of ten reduction xiom | eflexivity ransitivity A applies a trusted rule. R and T rules between configuration terms. Each of these rules can be capture the closure properties of the reachability relation. regarded as a one-path reachability rule, with side conditions Reflexivity requires empty to ensure that rules derived C as constraints on the left-hand-side pattern of the rule. For ex- with non-empty take at least one step. Transitivity enables C ample, the second rule for the conditional statement becomes the circularities as axioms for the second premise, since if the following one-path reachability rule: is not empty, the first premise is guaranteed to take a step. C Consequence, Case Analysis and Abstraction are adapted C[if(I) S else S ] I , 0 hh 1 2 icode h istateicfg ^ Int from Hoare logic. Ignoring circularities, these seven proof 9 C[S 1] code state cfg rules are the formal infrastructure for symbolic execution. ) hh i h i i Circularity has a coinductive nature, allowing us to Mathematical domain operations (+Int, etc.) are subscripted make new circularity claims. We typically make such claims with Int to distinguish them from the language constructs.

81 Proof system for code with repetitive behaviors, such as loops, recursive Step : functions, jumps, etc. If there is a derivation of the claim = ' ' .' Language-independent proof system 'l 9'r FreeVars( l) l using itself as a circularity, then the claim holds. This would | ! ) 2 S 9 = ((' 'l) , Cfg) 'r '0 for each 'l 9 'r obviously be unsound if the new assumption was available for deriving sequents of the form: | ^W ? ^ ! ) 2 S , ' 8 '0 immediately, but requiring progress (taking at least on step Axiom : S A `C ) before circularities can be used) ensures that only diverging Q semantics property ' '0 is FOL formula (logical frame) executions correspond to endless invocation of a circularity. ) 2 S [ A Q Formally, we have the following result , ' '0 S A `C ^ ) ^ Reflexivity : Theorem 1. The proof system in Figure 4 is sound: if Q Q '1 '0 · ' '0 then = ' '0 (Q , ). Under 1 , ' Q ' S ` ) S| ) 2 {9 8} ) S A ` ) some mild assumptions, it is relatively complete: given an Q Q '2 '0 ' '0 Transitivity : oracle for , if = ' '0 then ' '0. 2 Q Q T S| ) S ` ) , '1 '2 , '2 '3 ) ` ) C The proof for the all-path case is available in [12], and for S A ` ) S AQ[ C ` ) '3 '30 , '1 '3 the one-path case in [47]. When considering the completeness ). Consequence :S A `C ) of program verification logics, notice that if the logic for Q . = '1 '0 , '0 '0 = '0 '2 specifying state properties (in this case, matching logic) . | ! 1 S A `C 1 ) 2 | 2 ! Q , '1 '2 is undecidable, then the entire program verification logic Case Analysis : S A `C ) (in this case, reachability logic) is undecidable. By relative Q Q , '1 ' , '2 ' completeness, we prove the completeness of the proof system S A `C ) S A `C ) Q in Figure 4 assuming we can decide any matching logic , '1 '2 ' S A `C _ ) formula in , which means that any undecidability comes Abstraction : T Q from and is unavoidable. This theorem generalizes similar , ' '0 X FreeVars('0) = T C results from Hoare logic, but in a language-independent S A ` ) \ Q ; , X ' '0 setting. S A `C 9 ) Circularity : Q , ' Q' ' '0 4. Reachability Logic vs. Hoare Logic S A `C[{ ) 0} ) Q , ' '0 Here we briefly compare reachability logic (Section 3.3) with S A `C ) Hoare logic by means of a simple example, aiming to convey the message that verification using reachability logic is not Figure 4: Proof system for reachability. We assume the free harder than using Hoare logic, even when done manually. variables of ' ' in the Step proof rule are fresh (e.g., l )9 r disjoint from those of ' 8 '0). Here Q , . 4.1 The Program and the Language ) 2 {8 9} Consider the following snippet, say SUM, part of a C-like ' matches the left-hand-side ' of some rule in and thus, program summing up the natural numbers smaller than n: l S as is weakly well-defined, can take a step: if (, ⇢) = ' s = 0; S | then there is a 'l 9 'r and a valuation ⇢0 of the free while(--n) s = s + n; ) 2 S variables of 'l s.t. (, ⇢0) = 'l, and thus has at least one | Assume a simplified language whose loops cannot break/re- T -successor generated by 'l 9 'r. The second premise )S ) turn/jump, whose integers are arbitrarily large, and without ensures that each T -successor of a configuration matching )S local variables (so blocks are used for grouping only). Fig- ' matches ' : if T and matches ' then there is some 0 ) 0 ure 5 shows a reduction-style executable semantics of the ' ' S ⇢ , ⇢ = ' ' rule l 9 r and : Var such that ( ) l needed language fragment; with the notation explained in the ,)⇢ = '2 S ! T | ^' and ( 0 ) r; then the second part implies 0 matches 0. caption of Figure 5, the semantics consists of ten reduction xiom | eflexivity ransitivity A applies a trusted rule. R and T rules between configuration terms. Each of these rules can be capture the closure properties of the reachability relation. regarded as a one-path reachability rule, with side conditions Reflexivity requires empty to ensure that rules derived C as constraints on the left-hand-side pattern of the rule. For ex- with non-empty take at least one step. Transitivity enables C ample, the second rule for the conditional statement becomes the circularities as axioms for the second premise, since if the following one-path reachability rule: is not empty, the first premise is guaranteed to take a step. C Consequence, Case Analysis and Abstraction are adapted C[if(I) S else S ] I , 0 hh 1 2 icode h istateicfg ^ Int from Hoare logic. Ignoring circularities, these seven proof 9 C[S 1] code state cfg rules are the formal infrastructure for symbolic execution. ) hh i h i i Circularity has a coinductive nature, allowing us to Mathematical domain operations (+Int, etc.) are subscripted make new circularity claims. We typically make such claims with Int to distinguish them from the language constructs.

81 Proof system for code with repetitive behaviors, such as loops, recursive Step : functions, jumps, etc. If there is a derivation of the claim = ' ' .' Language-independent proof system 'l 9'r FreeVars( l) l using itself as a circularity, then the claim holds. This would | ! ) 2 S 9 = ((' 'l) , Cfg) 'r '0 for each 'l 9 'r obviously be unsound if the new assumption was available for deriving sequents of the form: | ^W ? ^ ! ) 2 S , ' 8 '0 immediately, but requiring progress (taking at least on step Axiom : S A `C ) before circularities can be used) ensures that only diverging Q semantics property ' '0 is FOL formula (logical frame) executions correspond to endless invocation of a circularity. ) 2 S [ A Q Formally, we have the following result , ' '0 S A `C ^ ) ^ Reflexivity : Theorem 1. The proof system in Figure 4 is sound: if Q Q '1 '0 · ' '0 then = ' '0 (Q , ). Under 1 , ' Q ' S ` ) S| ) 2 {9 8} ) S A ` ) some mild assumptions, it is relatively complete: given an Q Q '2 '0 ' '0 Transitivity : oracle for , if = ' '0 then ' '0. 2 Q Q T S| ) S ` ) , '1 '2 , '2 '3 ) ` ) C The proof for the all-path case is available in [12], and for S A ` ) S AQ[ C ` ) '3 '30 , '1 '3 the one-path case in [47]. When considering the completeness ). Consequence :S A `C ) of program verification logics, notice that if the logic for Q . = '1 '0 , '0 '0 = '0 '2 specifying state properties (in this case, matching logic) . | ! 1 S A `C 1 ) 2 | 2 ! Q , '1 '2 is undecidable, then the entire program verification logic ADD(x,y) => x + y insert /\ bst(t)Case Analysis : S A `C ) (in this case, reachability logic) is undecidable. By relative Q Q SUB(x,y) => x - y => , '1 ' , '2 ' completeness, we prove the completeness of the proof system S A `C ) S A `C ) MUL(x,y) => x * y Q in Figure 4 assuming we can decide any matching logic . /\ bst(t’) , '1 '2 ' S A `C _ ) formula in , which means that any undecidability comes . /\ keys(t’)bstraction == keys(t) \union { v } T . A : from and is unavoidable. This theorem generalizes similar ` Q T . , ' '0 X FreeVars('0) = C results from Hoare logic, but in a language-independent S A ` ) \ Q ; , X ' '0 setting. S A `C 9 ) Circularity : Q , ' Q' ' '0 4. Reachability Logic vs. Hoare Logic S A `C[{ ) 0} ) Q , ' '0 Here we briefly compare reachability logic (Section 3.3) with S A `C ) Hoare logic by means of a simple example, aiming to convey the message that verification using reachability logic is not Figure 4: Proof system for reachability. We assume the free harder than using Hoare logic, even when done manually. variables of ' ' in the Step proof rule are fresh (e.g., l )9 r disjoint from those of ' 8 '0). Here Q , . 4.1 The Program and the Language ) 2 {8 9} Consider the following snippet, say SUM, part of a C-like ' matches the left-hand-side ' of some rule in and thus, program summing up the natural numbers smaller than n: l S as is weakly well-defined, can take a step: if (, ⇢) = ' s = 0; S | then there is a 'l 9 'r and a valuation ⇢0 of the free while(--n) s = s + n; ) 2 S variables of 'l s.t. (, ⇢0) = 'l, and thus has at least one | Assume a simplified language whose loops cannot break/re- T -successor generated by 'l 9 'r. The second premise )S ) turn/jump, whose integers are arbitrarily large, and without ensures that each T -successor of a configuration matching )S local variables (so blocks are used for grouping only). Fig- ' matches ' : if T and matches ' then there is some 0 ) 0 ure 5 shows a reduction-style executable semantics of the ' ' S ⇢ , ⇢ = ' ' rule l 9 r and : Var such that ( ) l needed language fragment; with the notation explained in the ,)⇢ = '2 S ! T | ^' and ( 0 ) r; then the second part implies 0 matches 0. caption of Figure 5, the semantics consists of ten reduction xiom | eflexivity ransitivity A applies a trusted rule. R and T rules between configuration terms. Each of these rules can be capture the closure properties of the reachability relation. regarded as a one-path reachability rule, with side conditions Reflexivity requires empty to ensure that rules derived C as constraints on the left-hand-side pattern of the rule. For ex- with non-empty take at least one step. Transitivity enables C ample, the second rule for the conditional statement becomes the circularities as axioms for the second premise, since if the following one-path reachability rule: is not empty, the first premise is guaranteed to take a step. C Consequence, Case Analysis and Abstraction are adapted C[if(I) S else S ] I , 0 hh 1 2 icode h istateicfg ^ Int from Hoare logic. Ignoring circularities, these seven proof 9 C[S 1] code state cfg rules are the formal infrastructure for symbolic execution. ) hh i h i i Circularity has a coinductive nature, allowing us to Mathematical domain operations (+Int, etc.) are subscripted make new circularity claims. We typically make such claims with Int to distinguish them from the language constructs.

81 Language-independent verification framework

Operational semantics Reachability properties (C/Java/JavaScript (Functional correctness of semantics) heap manipulations)

Language-independent uniform notation (Reachability logic)

Language-independent proof systems (Reachability logic proof systems)

Proof automation (Symbolic execution, SMT, Natural proofs, …) Proof automation

• Deductive verification

• Symbolic execution for reachability space search

• Domain reasoning (e.g., integers, bit-vectors, floats, set, sequences, …) using SMT

• Natural proofs technique for quantifier instantiation for recursive heap predicates (e.g., list, tree, …) Language-independent verification framework

Operational semantics Reachability properties (C/Java/JavaScript (Functional correctness of semantics) heap manipulations)

Language-independent uniform notation (Reachability logic)

Language-independent proof systems (Reachability logic proof systems)

Proof automation (Symbolic execution, SMT, Natural proofs, …)

Does it really work? • Q1: How easy to instantiate the framework? • Q2: Is performance OK? Evaluation

• Instantiated framework by plugging-in three language semantics.

Semantics of C C verifier [POPL’12, PLDI’15]

Semantics of Java Verification Java verifier [POPL’15] Framework

Semantics of JavaScript JavaScript verifier [PLDI’15]

• Verified challenging heap-manipulating programs implementing the same algorithms in all three languages. KernelCC Java JavaScript Programs Execution Reasoning Execution Reasoning Execution Reasoning Execution Reasoning Time #StepKernel TimeCC #Query Time #Step Time #Query Time #Step Java Time #Query Time #StepJavaScript Time #Query BSTPrograms find 0.6 Execution 192 Reasoning 1.2 95 10.4 Execution 1,028 Reasoning 3.6 246 Execution 1.9 322 Reasoning 2.8 244 Execution 4.5 1,736 Reasoning 1.8 93 BST insertTime 0.8 #Step 336 Time 2.9 #Query 160 Time 23.0 #Step 2,481 Time 7.2 #Query 414 Time 4.1 #Step 691 Time 4.5 #Query 342 Time 5.4 #Step 3,394 Time 2.8 #Query 158 BSTBST delete find 1.4 0.6 582 192 5.61.2 420 95 55.110.4 1,0284,540 16.6 3.6 246 938 1.9 9.8 1,274 322 15.1 2.8 1,125 244 15.6 4.5 1,736 5,052 1.8 5.6 373 93 AVLBST findinsert 0.6 0.8 192 336 1.22.9 160 95 23.0 9.9 2,4811,028 7.23.1 414 214 4.1 2.2 691 322 4.5 2.7 342 244 5.4 4.5 3,394 1,736 2.8 1.9 158 93 AVLBST insertdelete 6.2 1.4 1,980 582 42.1 5.6 1,133 420 210.7 55.1 12,616 4,540 16.670.6 1,865 938 42.4 9.8 1,274 3,753 15.1 62.8 1,125 2,146 102.5 15.6 26,977 5,052 32.5 5.6 1,221 373 AVLAVL delete find 9.5 0.6 2,933 192 45.4 1.2 1,758 95 514.8 9.9 26,003 1,028 118.9 3.1 3,883 214 122.2 2.2 8,144 322 149.4 2.7 4,866 244 184.3 4.5 38,591 1,736 55.3 1.9 2,233 93 RBTAVL find insert 0.6 6.2 1,980 192 42.1 1.1 1,133 95 210.7 11.5 12,616 1,064 70.6 3.0 1,865 214 42.4 2.1 3,753 322 62.8 2.9 2,146 244 102.5 4.9 26,977 1,736 32.5 1.9 1,221 93 RBTAVL insert delete 7.6 9.5 2,331 2,933 48.145.4 1,3921,758 722.0514.8 26,00330,924 118.9181.8 3,883 4,394 122.2 39.9 8,144 4,240 149.4 75.7 4,866 2,547 184.3 84.9 38,591 28,082 55.3 29.6 2,233 1,381 RBTRBT delete find 10.6 0.6 3,891 192 33.7 1.1 2,033 95 1593.8 11.5 50,389 1,064 308.3 3.0 15,429 214 95.8 2.1 8,312 322 75.4 2.9 4,460 244 144.2 4.9 51,356 1,736 39.4 1.9 2,009 93 TreapRBT insert find 0.6 7.6 2,331 200 48.1 1.4 1,392 118 722.0 11.2 30,924 1,064 181.8 3.2 4,394 214 39.9 2.0 4,240 322 75.7 2.9 2,547 244 84.9 4.6 28,082 1,736 29.6 1.9 1,381 116 TreapRBT delete insert 10.6 1.4 3,891 753 33.7 4.5 2,033 247 1593.8 52.4 50,389 4,954 308.3 15.3 15,429 724 95.8 12.7 8,312 1,469 75.4 10.4 4,460 563 144.2 13.7 51,356 7,738 39.4 5.2 2,009 243 TreapTreap delete find 2.0 0.6 831 200 9.41.4 509118 73.911.2 1,0645,512 16.5 3.2 214 656 12.0 2.0 1,694 322 16.4 2.9 1,021 244 24.8 4.6 1,736 8,333 1.9 8.4 116 460 ListTreap reverse insert 0.4 1.4 142 753 0.34.5 247 21 52.4 6.6 4,954 815 15.3 4.8 724 76 12.7 1.5 1,469 222 10.4 2.6 563 46 13.7 5.0 7,738 1,162 5.2 0.5 243 20 ListTreap append delete 0.4 2.0 171 831 0.59.4 509 45 73.9 7.4 5,512 909 16.5 7.4 656 128 12.0 1.8 1,694 239 16.4 5.5 1,021 106 24.8 4.5 8,333 1,392 8.4 0.8 460 46 BubbleList reverse sort 0.9 0.4 391 142 26.8 0.3 190 21 28.4 6.6 2,401 815 38.0 4.8 357 76 1.5 3.4 222 589 35.4 2.6 345 46 5.0 5.6 1,162 2,688 25.7 0.5 145 20 InsertionList append sort 1.1 0.4 468 171 24.5 0.5 300 45 26.6 7.4 2,555 909 35.3 7.4 128 451 1.8 4.1 239 731 27.0 5.5 106 371 4.5 8.3 1,392 3,119 36.5 0.8 213 46 QuickBubble sort sort 1.1 0.9 604 391 31.626.8 269190 31.028.4 2,4013,601 38.048.2 357 518 3.4 7.1 589 958 35.4 40.0 345 413 15.0 5.6 2,688 5,046 25.7 33.1 145 252 MergeInsertion sort sort 1.7 1.1 970 468 55.024.5 478300 81.626.6 2,5556,589 35.389.0 1,070 451 14.1 4.1 1,566 731 27.0 72.9 371 737 22.8 8.3 3,119 7,021 36.5 43.2 213 480 TotalQuick sort 47.7 1.1 17,159 604 335.2 31.6 9,358 269 3470.5 31.0 158,473 3,601 970.6 48.2 31,791 518 379.3 7.1 35,170 958 604.5 40.0 20,064 413 654.9 15.0 196,895 5,046 326.3 33.1 9,629 252 Merge sort 1.7 970 55.0 478 81.6 6,589 89.0 1,070 14.1 1,566 72.9 737 22.8 7,021 43.2 480 Total 47.7 17,159 335.2 9,358 3470.5 158,473 970.6 31,791 379.3 35,170 604.5 20,064 654.9 196,895 326.3 9,629 Table 1: Summary of verification experiments: ‘Execution’ shows time (seconds) and number of operational semantic steps for symbolic execution (Section 5.1); ‘Reasoning’ shows time (seconds) and number of Z3 queries for reasoning (Section 5.2 & Efforts5.3).Table 1: Summary of verification experiments: ‘Execution’ shows time (seconds) and number of operational semantic steps for symbolic execution (Section 5.1); ‘Reasoning’ shows time (seconds) and number of Z3 queries for reasoning (Section 5.2 & 5.3). CJava JavaScript details of complex languages than in a translation or a VC Semantics development (months) 40 20 4 generator, for two reasons. First, an operational semantics CJava JavaScript details of complex languages than in a translation or a VC Semantics size (#rules) 2,572 1,587 1,378 is more amenable to visual inspection, as it is written in a Semantics development (months) 40 20 4 generator, for two reasons. First, an operational semantics Semantics size (LOC) 17,791 13,417 6,821 domain-specific language for defining semantics. Second, an Semantics size (#rules) 2,572 1,587 1,378 is more amenable to visual inspection, as it is written in a Language-specific e↵ort (days) 7 4 5 operational semantics is executable and can be thoroughly Semantics size (LOC) 17,791 13,417 6,821 domain-specific language for defining semantics. Second, an Semantics changes size (#rules) 63 38 12 tested. While this does not guarantee the absence of bugs (see SemanticsLanguage-specific changes e size↵ort (LOC) (days) 468 7 95 4 49 5 operational semantics is executable and can be thoroughly instantiatingSection 6.3), framework it greatly reduces their occurrence. SpecificationsSemantics changes size (#rules) 3663 3831 1231 tested. While this does not guarantee the absence of bugs (see (additionalEven effort) if a semantics is not already available, we believe AbstractionsSemantics changes size (LOC) 468 6 95 6 49 6 Section 6.3), it greatly reduces their occurrence. Specifications 36 31 31 that developing an operational semantics has an important Function definitions 14 14 14 Even if a semantics is not already available, we believe Abstractions 6 6 6 advantage over building a translator or a VC generator: the InstantiatingLemmas efforts include: 7 7 7 that developing an operational semantics has an important Function definitions 14 14 14 semantics is used not only for verification, but for other • advantage over building a translator or a VC generator: the LemmasFixing bugs of semantics 7 7 7 purposes as well, so overall the semantics development cost semantics is used not only for verification, but for other • SpecifyingTable heap 2: The abstractions development (e.g., costs lists and trees)is amortized. For example, the JavaScript semantics was used purposes as well, so overall the semantics development cost for bug finding in browsers [38]. Table 2: The development costs is amortized. For example, the JavaScript semantics was used Regarding number of annotations, our approach is compa- e↵ort scales with the language complexity. The e↵ort for C for bug finding in browsers [38]. is considerably larger than for Java and JavaScript due to the rable to the state-of-the-art language-specific approaches that e↵ort scales with the language complexity. The e↵ort for C Regarding number of annotations, our approach is compa- low level complexity of C. Overall, the numbers in Table 2 do not infer invariants (VCC, Frama-C). The user provides is considerably larger than for Java and JavaScript due to the rable to the state-of-the-art language-specific approaches that validate our hypothesis that program verification based on one specification for each recursive function and loop. The low level complexity of C. Overall, the numbers in Table 2 do not infer invariants (VCC, Frama-C). The user provides operational semantics and the K verification infrastructure is user also provides the definitions for heap abstractions and validate our hypothesis that program verification based on one specification for each recursive function and loop. The cost e↵ective in terms of development e↵ort. auxiliary functions used in specifications. The user does not operational semantics and the K verification infrastructure is user also provides the definitions for heap abstractions and For comparison, the state-of-the-art is to define a translator provide anything similar to ghost code or hints for the verifier. cost e↵ective in terms of development e↵ort. auxiliary functions used in specifications. The user does not to an intermediate verification language, like Boogie, or to The user may need to provide additional lemmas and those For comparison, the state-of-the-art is to define a translator provide anything similar to ghost code or hints for the verifier. define a verification condition (VC) generator. For example, lemmas apply to a class of programs rather than one particu- to an intermediate verification language, like Boogie, or to The user may need to provide additional lemmas and those the VCC translator from C to Boogie consists of approxi- lar program (e.g., the lemmas for list segments in Section 5.2 define a verification condition (VC) generator. For example, lemmas apply to a class of programs rather than one particu- mately 5000 lines of F# [1]. We believe that writing such a are shared by all sorting-related programs in all languages). the VCC translator from C to Boogie consists of approxi- lar program (e.g., the lemmas for list segments in Section 5.2 translator takes considerably more e↵ort than we reported mately 5000 lines of F# [1]. We believe that writing such a are shared by all sorting-related programs in all languages). for our approach in Table 2 (we do not include the time to translator takes considerably more e↵ort than we reported define the semantics into this comparison, since we assume 6.3 Operational Semantics Bugs for our approach in Table 2 (we do not include the time to the semantics already exist, and they serve other purposes We found bugs in all the three operational semantics used define the semantics into this comparison, since we assume 6.3 Operational Semantics Bugs as well). Moreover, we believe that one would have more for verification, despite the fact that these semantics are the semantics already exist, and they serve other purposes We found bugs in all the three operational semantics used confidence in an operational semantics to handle the tricky thoroughly tested on thousands of programs [8, 18, 25, 38]. as well). Moreover, we believe that one would have more for verification, despite the fact that these semantics are confidence in an operational semantics to handle the tricky thoroughly tested on thousands of programs [8, 18, 25, 38]. 88 88 ernel ava ava cript KernelCC J Java JJavaSScript ProgramsPrograms Execution Reasoning Execution Execution Reasoning Reasoning Execution Execution Reasoning Reasoning Execution Execution Reasoning Reasoning TimeTime #StepKernel TimeCC #Query Time Time #Step #Step Time Time #Query #Query Time Time #Step #Step Java Time Time #Query #Query Time Time #Step #StepJavaScript Time Time #Query #Query BSTBSTPrograms find find 0.6 Execution 192 Reasoning 1.2 95 10.4 10.4 Execution 1,028 1,028 Reasoning 3.6 3.6 246 246 Execution 1.9 1.9 322 322 Reasoning 2.8 2.8 244 244 4.5 Execution 4.5 1,736 1,736 Reasoning 1.8 1.8 93 93 BSTBST insert insertTime 0.8 #Step 336 Time 2.9 #Query 160 Time 23.0 23.0 #Step 2,481 2,481 Time 7.2 7.2 #Query 414 414 Time 4.1 4.1 #Step 691 691 Time 4.5 4.5 #Query 342 342 Time 5.4 5.4 #Step3,394 3,394 Time 2.8 2.8 #Query 158 158 BSTBST delete deletefind 1.4 0.6 582 192 5.61.2 420 95 55.1 55.110.4 4,540 1,0284,540 16.6 16.6 3.6 938 246 938 9.8 1.9 9.8 1,274 1,274 322 15.1 15.1 2.8 1,125 1,125 244 15.6 15.6 4.5 5,052 1,736 5,052 5.6 1.8 5.6 373 373 93 AVLAVLBST findinsert find 0.6 0.8 192 336 1.22.9 160 95 23.0 9.9 9.9 1,028 2,4811,028 3.1 7.23.1 214 414 214 2.2 4.1 2.2 322 691 322 2.7 4.5 2.7 244 342 244 4.5 5.4 4.5 1,736 3,394 1,736 1.9 2.8 1.9 158 93 93 AVLAVLBST insertdelete insert 6.2 1.4 1,980 582 42.1 5.6 1,133 420 210.7 210.7 55.1 12,616 12,616 4,540 70.6 16.670.6 1,865 1,865 938 42.4 42.4 9.8 3,753 1,274 3,753 62.8 15.1 62.8 2,146 1,125 2,146 102.5 102.5 15.6 26,977 26,977 5,052 32.5 32.5 5.6 1,221 1,221 373 AVLAVL delete deletefind 9.5 0.6 2,933 192 45.4 1.2 1,758 95 514.8 514.8 9.9 26,003 26,003 1,028 118.9 118.9 3.1 3,883 3,883 214 122.2 122.2 2.2 8,144 8,144 322 149.4 149.4 2.7 4,866 4,866 244 184.3 184.3 4.5 38,591 38,591 1,736 55.3 55.3 1.9 2,233 2,233 93 RBTRBTAVL find insertfind 0.6 6.2 1,980 192 42.1 1.1 1,133 95 210.7 11.5 11.5 12,616 1,064 1,064 70.6 3.0 3.0 1,865 214 214 42.4 2.1 2.1 3,753 322 322 62.8 2.9 2.9 2,146 244 244 102.5 4.9 4.9 26,977 1,736 1,736 32.5 1.9 1.9 1,221 93 93 RBTRBTAVL insert deleteinsert 7.6 9.5 2,331 2,933 48.145.4 1,392 1,758 722.0 722.0514.8 30,924 26,00330,924 181.8 118.9181.8 4,394 3,883 4,394 122.2 39.9 39.9 4,240 8,144 4,240 149.4 75.7 75.7 2,547 4,866 2,547 184.3 84.9 84.9 28,082 38,591 28,082 29.6 55.3 29.6 1,381 2,233 1,381 RBTRBT delete deletefind 10.6 10.6 0.6 3,891 192 33.7 1.1 2,033 95 1593.8 1593.8 11.5 50,389 50,389 1,064 308.3 308.3 3.0 15,429 15,429 214 95.8 95.8 2.1 8,312 8,312 322 75.4 75.4 2.9 4,460 4,460 244 144.2 144.2 4.9 51,356 51,356 1,736 39.4 39.4 1.9 2,009 2,009 93 TreapTreapRBT insert find find 0.6 7.6 2,331 200 48.1 1.4 1,392 118 722.0 11.2 11.2 30,924 1,064 1,064 181.8 3.2 3.2 4,394 214 214 39.9 2.0 2.0 4,240 322 322 75.7 2.9 2.9 2,547 244 244 84.9 4.6 4.6 28,082 1,736 1,736 29.6 1.9 1.9 1,381 116 116 TreapTreapRBT delete insert insert 10.6 1.4 3,891 753 33.7 4.5 2,033 247 1593.8 52.4 52.4 50,389 4,954 4,954 308.3 15.3 15.3 15,429 724 724 12.7 95.8 12.7 1,469 8,312 1,469 10.4 75.4 10.4 4,460 563 563 144.2 13.7 13.7 51,356 7,738 7,738 39.4 5.2 5.2 2,009 243 243 TreapTreap delete deletefind 2.0 0.6 831 200 9.41.4 509 118 73.9 73.911.2 5,512 1,0645,512 16.5 16.5 3.2 656 214 656 12.0 12.0 2.0 1,694 1,694 322 16.4 16.4 2.9 1,021 1,021 244 24.8 24.8 4.6 8,333 1,736 8,333 8.4 1.9 8.4 460 116 460 ListListTreap reverse reverse insert 0.4 1.4 142 753 0.34.5 247 21 52.4 6.6 6.6 4,954 815 815 15.3 4.8 4.8 724 76 76 12.7 1.5 1.5 1,469 222 222 10.4 2.6 2.6 563 46 46 13.7 5.0 5.0 1,162 7,738 1,162 0.5 5.2 0.5 243 20 20 ListListTreap append append delete 0.4 2.0 171 831 0.59.4 509 45 73.9 7.4 7.4 5,512 909 909 16.5 7.4 7.4 128 656 128 12.0 1.8 1.8 1,694 239 239 16.4 5.5 5.5 1,021 106 106 24.8 4.5 4.5 1,392 8,333 1,392 0.8 8.4 0.8 460 46 46 BubbleBubbleList reverse sort sort 0.9 0.4 391 142 26.8 0.3 190 21 28.4 28.4 6.6 2,401 2,401 815 38.0 38.0 4.8 357 357 76 3.4 1.5 3.4 589 222 589 35.4 35.4 2.6 345 345 46 5.6 5.0 5.6 2,688 1,162 2,688 25.7 25.7 0.5 145 145 20 InsertionInsertionList append sort sort 1.1 0.4 468 171 24.5 0.5 300 45 26.6 26.6 7.4 2,555 2,555 909 35.3 35.3 7.4 451 128 451 4.1 1.8 4.1 731 239 731 27.0 27.0 5.5 371 106 371 8.3 4.5 8.3 3,119 1,392 3,119 36.5 36.5 0.8 213 213 46 QuickQuickBubble sort sort sort 1.1 0.9 604 391 31.626.8 269 190 31.0 31.028.4 3,601 2,4013,601 48.2 38.048.2 518 357 518 7.1 3.4 7.1 958 589 958 40.0 35.4 40.0 413 345 413 15.0 15.0 5.6 5,046 2,688 5,046 33.1 25.7 33.1 252 145 252 MergeMergeInsertion sort sort sort 1.7 1.1 970 468 55.024.5 478 300 81.6 81.626.6 6,589 2,5556,589 89.0 35.389.0 1,070 1,070 451 14.1 14.1 4.1 1,566 1,566 731 72.9 27.0 72.9 737 371 737 22.8 22.8 8.3 7,021 3,119 7,021 43.2 36.5 43.2 480 213 480 TotalTotalQuick sort 47.7 47.7 1.1 17,159 604 335.2 31.6 9,358 269 3470.5 3470.5 31.0 158,473 158,473 3,601 970.6 970.6 48.2 31,791 31,791 518 379.3 379.3 7.1 35,170 35,170 958 604.5 604.5 40.0 20,064 20,064 413 654.9 654.9 15.0 196,895 196,895 5,046 326.3 326.3 33.1 9,629 9,629 252 Merge sort 1.7 970 55.0 478 81.6 6,589 89.0 1,070 14.1 1,566 72.9 737 22.8 7,021 43.2 480 Total 47.7 17,159 335.2 9,358 3470.5 158,473 970.6 31,791 379.3 35,170 604.5 20,064 654.9 196,895 326.3 9,629 TableTable 1: 1: SummarySummary of verification experiments: ‘Execution’ ‘Execution’ shows shows time time (seconds) (seconds) and and number number of of operational operational semantic semantic steps steps for for symbolicsymbolic execution execution (Section 5.1); ‘Reasoning’ shows shows time time (seconds) (seconds) and and number number of of Z3 Z3 queries queries for for reasoning reasoning (Section (Section 5.2 5.2 & & Table 1: Summary of verification experiments: ‘Execution’ shows time (seconds) and number of operational semantic steps for Efforts5.3).5.3). symbolic execution (Section 5.1); ‘Reasoning’ shows time (seconds) and number of Z3 queries for reasoning (Section 5.2 & 5.3). CJavaava JJavaavaSScriptcript detailsdetails of of complex complex languages languages than than in in a a translation translation or or a a VC VC SemanticsSemantics development development (months) 40 20 20 4 4 generator,generator, for for two two reasons. reasons. First, First, an an operational operational semantics semantics CJava JavaScript definingdetails semantics of complex languages than in a translation or a VC SemanticsSemantics size size (#rules) 2,572 1,587 1,587 1,378 1,378 isis more more amenable amenable to to visual visual inspection, inspection, as as it it is is written written in in a a Semantics development (months) 40 20 4 (alreadygenerator, given) for two reasons. First, an operational semantics SemanticsSemantics size size (LOC) 17,791 13,417 13,417 6,821 6,821 domain-specific language for defining semantics. Second, an Semantics size (#rules) 2,572 1,587 1,378 domain-specific language for defining semantics. Second, an Language-specificLanguage-specific e↵ort (days) 7 4 4 5 5 is more amenable to visual inspection, as it is written in a Semantics size (LOC) 17,791 13,417 6,821 operationaloperational semantics semantics is is executable executable and and can can be be thoroughly thoroughly SemanticsSemantics changes changes size (#rules) 63 38 38 12 12 domain-specific language for defining semantics. Second, an Language-specific e↵ort (days) 7 4 5 tested.tested. While While this this does does not not guarantee guarantee the the absence absence of of bugs bugs (see (see SemanticsSemantics changes changes size (LOC) 468 95 95 49 49 instantiatingoperational framework semantics is executable and can be thoroughly Semantics changes size (#rules) 63 38 12 SectionSection 6.3), 6.3), it it greatly greatly reduces reduces their their occurrence. occurrence. SpecificationsSpecifications 36 31 31 31 31 tested. While this does not guarantee the absence of bugs (see Semantics changes size (LOC) 468 95 49 (additionalEvenEven effort) if if a a semantics semantics is is not not already already available, available, we we believe believe AbstractionsAbstractions 6 6 6 6 6 Section 6.3), it greatly reduces their occurrence. Specifications 36 31 31 thatthat developing developing an an operational operational semantics semantics has has an an important important FunctionFunction definitions definitions 14 14 14 14 14 Even if a semantics is not already available, we believe Abstractions 6 6 6 advantageadvantage over over building building a a translator translator or or a a VC VC generator: generator: the the InstantiatingLemmasLemmas efforts include: 7 7 7 7 7 that developing an operational semantics has an important Function definitions 14 14 14 semanticssemantics is is used used not not only only for for verification, verification, but but for for other other • Fixing bugs of semantics advantage over building a translator or a VC generator: the Lemmas 7 7 7 purposespurposes as as well, well, so so overall overall the the semantics semantics development development cost cost semantics is used not only for verification, but for other • SpecifyingTable heap 2: The abstractions development (e.g., costs costs lists and trees)isis amortized. amortized. For For example, example, the theJJavaavaSScriptcriptsemanticssemantics was was used used purposes as well, so overall the semantics development cost forfor bug bug finding finding in in browsers browsers [38]. [38]. Table 2: The development costs is amortized. For example, the JavaScript semantics was used RegardingRegarding number number of of annotations, annotations, our our approach approach is is compa- compa- ee↵↵ortort scales scales with with the language complexity. The The e e↵↵ortort for for CC for bug finding in browsers [38]. isis considerably considerably larger than for Java and JJavaavaSScriptcript duedue to to the the rablerable to to the the state-of-the-art state-of-the-art language-specific language-specific approaches approaches that that e↵ort scales with the language complexity. The e↵ort for C Regarding number of annotations, our approach is compa- lowlow level level complexity complexity of C. Overall, the numbers numbers in in Table Table 2 2 dodo not not infer infer invariants invariants (VCC, (VCC, Frama-C). Frama-C). The The user user provides provides is considerably larger than for Java and JavaScript due to the rable to the state-of-the-art language-specific approaches that validatevalidate our our hypothesis hypothesis that program verification based based on on oneone specification specification for for each each recursive recursive function function and and loop. loop. The The low level complexity of C. Overall, the numbers in Table 2 do not infer invariants (VCC, Frama-C). The user provides operationaloperational semantics semantics and the K verification infrastructure infrastructure is is useruser also also provides provides the the definitions definitions for for heap heap abstractions abstractions and and validate our hypothesis that program verification based on one specification for each recursive function and loop. The costcost e e↵↵ectiveective in in terms of development e↵↵ort.ort. auxiliaryauxiliary functions functions used used in in specifications. specifications. The The user user does does not not operational semantics and the K verification infrastructure is user also provides the definitions for heap abstractions and ForFor comparison, comparison, the state-of-the-art is to to define define a a translator translator provideprovide anything anything similar similar to to ghost ghost code code or or hints hints for for the the verifier. verifier. cost e↵ective in terms of development e↵ort. auxiliary functions used in specifications. The user does not toto an an intermediate intermediate verification language, like like Boogie, Boogie, or or to to TheThe user user may may need need to to provide provide additional additional lemmas lemmas and and those those For comparison, the state-of-the-art is to define a translator provide anything similar to ghost code or hints for the verifier. definedefine a a verification verification condition (VC) generator. For For example, example, lemmaslemmas apply apply to to a a class class of of programs programs rather rather than than one one particu- particu- to an intermediate verification language, like Boogie, or to The user may need to provide additional lemmas and those thethe VCC VCC translator translator from C to Boogie consists consists of of approxi- approxi- larlar program program (e.g., (e.g., the the lemmas lemmas for for list list segments segments in in Section Section 5.2 5.2 define a verification condition (VC) generator. For example, lemmas apply to a class of programs rather than one particu- matelymately 5000 5000 lines lines of F# [1]. We believe that that writing writing such such a a areare shared shared by by all all sorting-related sorting-related programs programs in in all all languages). languages). the VCC translator from C to Boogie consists of approxi- lar program (e.g., the lemmas for list segments in Section 5.2 translatortranslator takes takes considerably more e↵ortort than than we we reported reported mately 5000 lines of F# [1]. We believe that writing such a are shared by all sorting-related programs in all languages). forfor our our approach approach in Table 2 (we do not include include the the time time to to translator takes considerably more e↵ort than we reported definedefine the the semantics semantics into this comparison, since since we we assume assume 6.36.3 Operational Operational Semantics Semantics Bugs Bugs for our approach in Table 2 (we do not include the time to thethe semantics semantics already exist, and they serve other other purposes purposes WeWe found found bugs bugs in in all all the the three three operational operational semantics semantics used used define the semantics into this comparison, since we assume 6.3 Operational Semantics Bugs asas well). well). Moreover, Moreover, we believe that one would would have have more more forfor verification, verification, despite despite the the fact fact that that these these semantics semantics are are the semantics already exist, and they serve other purposes We found bugs in all the three operational semantics used confidenceconfidence in in an operational semantics to to handle handle the the tricky tricky thoroughlythoroughly tested tested on on thousands thousands of of programs programs [8, [8, 18, 18, 25, 25, 38]. 38]. as well). Moreover, we believe that one would have more for verification, despite the fact that these semantics are confidence in an operational semantics to handle the tricky thoroughly tested on thousands of programs [8, 18, 25, 38]. 8888 88 Experiments

Time (secs)

Programs C Java JS Programs C Java JS

BST find 14.0 4.7 6.3 Treap find 14.4 4.9 6.5

BST insert 30.2 8.6 8.2 Treap insert 67.7 23.1 18.9

BST delete 71.7 24.9 21.2 Treap delete 90.4 28.4 33.2

AVL find 13.0 4.9 6.4 List reverse 11. 4 4.1 5.5

AVL insert 281.3 105.2 135.0 List append 14.8 7. 3 5.3

AVL delete 633.7 271.6 239.6 Bubble sort 66.4 38.8 31.3

RBT find 14.5 5.0 6.8 Insertion sort 61.9 31.1 44.8

RBT insert 903.8 115 . 6 114 . 5 Quick sort 79.2 47.1 48.1

RBT delete 1,902.1 171.2 183.6 Merge sort 170.6 87.0 66.0

Total 4,441.1 983.5 981.2

Average 246.7 54.6 54.5 Experiments

Time (secs)

Programs C Java JS Programs C Java JS

BST find 14.0 4.7 6.3 Treap find 14.4 4.9 6.5

BST insert 30.2 8.6 8.2 Treap insert 67.7 23.1 18.9

BST delete 71.7 24.9 21.2 Treap delete 90.4 28.4 33.2

AVLFull find functional 13.0correctness:4.9 6.4 List reverse 11. 4 4.1 5.5

AVL insert 281.3 105.2 135.0 List append 14.8 7. 3 5.3 insert /\ bst(t) AVL=> delete 633.7 271.6 239.6 Bubble sort 66.4 38.8 31.3 . /\ bst(t’) RBT find 14.5 5.0 6.8 Insertion sort 61.9 31.1 44.8 /\ keys(t’) == keys(t) \union { v } RBT insert 903.8 115 . 6 114 . 5 Quick sort 79.2 47.1 48.1

RBT delete 1,902.1 171.2 183.6 Merge sort 170.6 87.0 66.0

Total 4,441.1 983.5 981.2

Average 246.7 54.6 54.5 Experiments

Time (secs)

Programs C Java JS Programs C Java JS

BST find 14.0 4.7 6.3 Treap find 14.4 4.9 6.5

BST insert 30.2 8.6 8.2 Treap insert 67.7 23.1 18.9

BST delete 71.7 24.9 21.2 Treap delete 90.4 28.4 33.2

AVL find 13.0 4.9 6.4 List reverse 11. 4 4.1 5.5

AVL insert 281.3 105.2 135.0 List append 14.8 7. 3 5.3

AVL delete 633.7 271.6 239.6 Bubble sort 66.4 38.8 31.3

RBTPerformance find is14.5 comparable5.0 to6.8 a state-of-the-artInsertion sort verifier61.9 for31.1 C, 44.8

RBTVCDryad insert [PLDI’14],903.8 based115 . 6 on114 a . 5 separationQuick sort logic 79.2extension47.1 of VCC:48.1

RBT e.g., delete AVL insert1,902.1 : 260s171.2 vs 280s183.6 (ours)Merge sort 170.6 87.0 66.0

Total 4,441.1 983.5 981.2

Average 246.7 54.6 54.5 Idea: separation of concerns Language-independent verification framework

Semantics Program & Properties Semantics Proof ✏ Reasoning Semantics-BasedSearch Program Verifiers for All Languages Language-independent uniform notation (logic)

Language semantics: AndreiVerification S, tefanescu˘ techniques: Daejun Park Language-independentShijiao proof Yuwen systems VCC • C (c11, gcc, clang, …) University• Deductive of Illinois verification at University of Illinois at University of Illinois at Urbana-Champaign, USA Urbana-Champaign, USA Urbana-Champaign, USA ` • Java (6, 7, 8, …) • Model checking Proof automation JPF [email protected] [email protected] [email protected] • JavaScript (ES5, ES6, …) • Abstract interpretation • • … • … Provides a nice interface (logic) in which both language Yilong Li Grigore Ros, u semantics and program properties can be described. Runtime Verification, Inc., USA University of Illinois at Urbana-Champaign, USA • Proof search in this logic becomes completely language- Defined/implemented once, and reusedyilong.li@runtimeverification.com for all others [email protected] independent.

https://github.com/kframework/k

Abstract 1. Introduction Evaluation We present a language-independent verification framework ExperimentsOperational semantics are easy to define and understand, that can be instantiated with an operational semantics to auto- similarly to implementing an interpreter. They require little • Instantiated framework by plugging-inmatically generate three a programlanguage verifier. semantics. The framework treats formal training, scale up well, and, being executable, can Time (secs) both the operational semantics and the program correctness Programsbe tested. Thus,C operationalJava JS semanticsPrograms are typicallyC used asJava JS specifications as reachability rules between matching logic trusted reference models for the defined languages. Despite Semantics of C BST find 14.0 4.7 6.3 Treap find 14.4 4.9 6.5 patterns, and uses the soundC verifier and relatively complete reach- these advantages, they are rarely used directly for program [POPL’12, PLDI’15] ability logic proof system to prove the specifications using BST insertverification, because30.2 proofs8.6 tend8.2 to beTreap low-level, insert as they67.7 work 23.1 18.9 the semantics. We instantiate the framework with the seman- BST deletedirectly with71.7 the corresponding24.9 21.2 transitionTreap delete system. Hoare90.4 or 28.4 33.2 Semantics of Java Verification tics of one academic language,Java verifierKernelC, as well as with AVL finddynamic logics13.0 allow4.9 higher level6.4 reasoningList reverse at the11. cost 4 of 4.1 5.5 [POPL’15] Framework ava three recent semantics of real-world languages, C, J , and AVL insert(re)defining281.3 the language105.2 as135.0 a set ofList append abstract proof14.8 rules, 7. 3 5.3 Semantics of JavaScript JavaScript, developed independently of our verification infras- which are harder to understand and trust. The state-of-the-art JavaScript verifier AVL delete 633.7 271.6 239.6 Bubble sort 66.4 38.8 31.3 [PLDI’15] tructure. We evaluate our approach empirically and show that in mechanical program verification is to develop and prove the generated program verifiers can check automatically the RBT findsuch language-specific14.5 5.0 proof systems6.8 Insertion sound sort w.r.t. a61.9 trusted 31.1 44.8 full functional correctness of challenging heap-manipulating RBT insertoperational903.8 semantics115 [. 63, 26,11436 . 5], butQuick that sort needs to be79.2 done 47.1 48.1 programs implementing operations on list and tree data struc- for each language separately and is labor intensive. • Verified challenging heap-manipulating programs RBT delete 1,902.1 171.2 183.6 Merge sort 170.6 87.0 66.0 tures, like AVL trees. This is the first approach that can Defining multiple semantics for the same language and implementing the same algorithmsturn the operational in all three semantics languages. of real-world languages into proving the soundness of one semantics in terms of an- Total 4,441.1 983.5 981.2 correct-by-construction automatic verifiers. other are highly uneconomical tasks when real languages are concerned, often taking severalAverage man-years to complete.246.7 54.6 54.5 Categories and Subject Descriptors D.2.4 [Software Engi- For these reasons, many program verification tools forgo neering]: Software/Program Verification—correctness proofs; defining an operational or an axiomatic semantics altogether, F.3.1 [Logics and Meanings Of Programs]: Specifying and and instead they implement ad-hoc strongest-postcondition Verifying and Reasoning about Programs—mechanical veri- or weakest-precondition generation. For example, tools for C fication; F.3.2 [Logics and Meanings Of Programs]: Seman- like VCC [11] and Frama-C [21], and for Java like jStar [17] tics of Programming Languages—operational semantics take this approach. Sometimes this is a two step process: General Terms Languages, Theory, Verification first translate the high-level source code to a low-level inter- mediate verification language (IVL), and then perform the Keywords reachability logic, matching logic, K framework verification condition (VC) generation for the IVL. This leads to some re-usability: implementing a new program verifier for a language reduces to implementing a translator to the IVL,

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee and then reusing the VC generation already implemented provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. for the IVL. For example, VCC translates to Boogie [4] and ThisAbstracting is the with author’s credit is permitted. version To of copy the otherwise, work. It to isrepublish, posted to herepost on for servers, your or topersonal redistribute use. to lists, Not contact for Frama-C translates to Why3 [21]. redistribution.the Owner/Author(s). The Request definitive permissions version from [email protected] was published in or Publications the following Dept., ACM,publication: Inc., fax +1 (212) 869-0481. However, defining correct language translations is not easy. OOPSLA’16OOPSLA ’16, October, November 30-November 2–4, 04 2016, 2016, Amsterdam, Amsterdam, Netherlands Netherlands Copyrightc 2016© ACM.2016 held by978-1-4503-4444-9/16/11... owner/author(s). Publication rights licensed to ACM. Consider VCC. The translator consists of 5000 lines of F# [1] ACM 978-1-4503-4444-9/16/10. . . $15.00 http://dx.doi.org/10.1145/2983990.2984027DOI: http://dx.doi.org/10.1145/2983990.2984027

74