Cartesian Hoare for Verifying k-Safety Properties ∗

Marcelo Sousa Isil Dillig University of Oxford, UK The University of Texas at Austin, USA [email protected] [email protected]

Abstract produce outputs consistent with Ψ. Hence, a valid Hoare Unlike safety properties which require the absence of a “bad" triple states which input-output pairs are feasible in a single, program trace, k-safety properties stipulate the absence of but arbitrary, execution of S. a “bad" interaction between k traces. Examples of k-safety However, some important functional correctness proper- properties include transitivity, associativity, anti-symmetry, ties require reasoning about the relationship between multiple and monotonicity. This paper presents a sound and relatively program executions. As a simple example, consider deter- complete calculus, called Cartesian Hoare Logic (CHL), for minism, which requires ∀x, y. x = y ⇒ f(x) = f(y). This verifying k-safety properties. Our program logic is designed property is not a standard safety property because it cannot with automation and scalability in mind, allowing us to be violated by any individual execution trace. Instead, de- formulate a verification algorithm that automates reasoning terminism is a so-called 2-safety hyperproperty [10] since in CHL. We have implemented our verification algorithm in a we need two execution traces to establish its violation. In general, a k-safety (hyper-)property requires reasoning about fully automated tool called DESCARTES, which can be used to analyze any k-safety property of Java programs. We have relationships between k different execution traces. Even though there has been some work on verifying 2- used DESCARTES to analyze user-defined relational operators safety properties in the context of secure information flow [1, and demonstrate that DESCARTES is effective at verifying (or finding violations of) multiple k-safety properties. 3, 23] and security protocols of probabilistic systems [2, 4], we are not aware of any general-purpose program that Categories and Subject Descriptors F.3.1 [Logics and allow verifying k-safety properties for arbitrary values of k. Meanings of Programs]: Specifying and Verifying and Rea- Furthermore, even for k = 2, existing tools do not support a soning about Programs; D.2.4 [Software Engineering]: Soft- high degree of automation. ware/Program Verification This paper argues that many important functional correct- Keywords Relational hoare logic; safety hyper-properties; ness properties are k-safety properties for k ≥ 2, and we product programs; automated verification present Cartesian Hoare Logic (CHL) for verifying general k-safety specifications. 1 Our new program logic for k-safety 1. Introduction is sound, relatively complete, and has been designed to be easy to automate: We have built a verification tool called Following the success of Hoare logic as a DESCARTES that fully automates reasoning in CHL and suc- to verify program correctness, many tools can prove safety cessfully used it to analyze k-safety requirements of user- properties expressed as pre- and post-conditions. That is, defined relational operators in Java programs. given a Hoare triple {Φ} S {Ψ}, program verifiers establish that terminating executions of S on inputs satisfying Φ Motivating Example. To motivate the need for verifying ∗ This work was supported in part by NSF Award #1453386 and AFRL k-safety properties, consider the widely used Comparator Awards # 8750-14-2-0270 and 8750-15-2-0096. interface in Java. By implementing the compare method of this interface, programmers can define an ordering be- tween objects of a given type. Unfortunately, writing a correct compare method is notoriously hard, as any valid compara- tor must satisfy three different k-safety properties: • P1: ∀x, y. sgn(compare(x, y)) = −sgn(compare(y, x)) • P2: ∀x, y, z. (compare(x, y) > 0 ∧ compare(y, z) > 0) ⇒ compare(x, z) > 0

1 We call our calculus Cartesian Hoare Logic because it allows reasoning about the cartesian product of sets of input-output pairs from different runs. • P3: ∀x, y, z. (compare(x, y) = 0 ⇒ S1 ~ ... ~ Sk represents a set of programs that simulta- (sgn(compare(x, z)) = sgn(compare(y, z)))) neously execute S1,...,Sk. While every pair of programs 0 Among these properties, P1 is a 2-safety property, while P,P ∈ (S1 ~ ... ~ Sk) are semantically equivalent, our P2 and P3 are 3-safety properties. For instance, property P2 calculus exploits the fact that some of these programs are corresponds to transitivity, which can only be violated by a much easier to verify than others. Furthermore, and perhaps collection of three different runs of compare: Specifically, more crucially, our approach does not explicitly construct the two runs where compare(a,b) and compare(b,c) set of programs in (S1 ~ ... ~ Sk) but reasons directly about both return positive values, and a third run compare(a,c) the provability of Ψ on tuples of runs that satisfy Φ. that returns zero or a negative value. Contributions. To summarize, this paper makes the follow- To demonstrate that implementing comparators can be ing contributions: hard, consider the compare method shown in Figure 1, which is taken verbatim from a Stackoverflow post. This • We argue that many natural functional correctness prop- method is used for sorting poker hands according to their erties are k-safety properties, and we propose Cartesian strength and represents hands as strings of 13 characters. Hoare triples for specifying them (Section 3). In particular, the occurrence of number n at position k of • We present a sound and complete proof system called the string indicates that the hand contains n cards of type k. Cartesian Hoare Logic (CHL) for proving valid Cartesian Unfortunately, this method has a logical error when one of the Hoare triples (Section 4). hands is a full house and the other one has 3 cards of a kind; it • therefore ends up violating properties P2 and P3. As a result, We describe a practical algorithm based on CHL for programs using this comparator either suffer from run-time automatically proving k-safety properties (Section 5). exceptions or lose elements inserted into collections. • We use our techniques to analyze user-defined relational operators in Java and show that our tool can successfully Cartesian Hoare Logic. As illustrated by this example, an- verify (or find bugs in) such methods (Section 7). alyzing k-safety properties is extremely relevant for ensuring software correctness; yet there are no general purpose auto- mated tools for analyzing k-safety properties. This paper aims to rectify this situation by proposing a general framework for 2. Language and Preliminaries specifying and verifying general k-safety properties. Figure 2 presents a simple imperative language that we use In our approach, k-safety properties are specified using for formalizing Cartesian Hoare Logic. In this language, Cartesian Hoare triples of the form kΦk S kΨk, where Φ atomic statements A include skip (denoted as [), assignments and Ψ are first-order formulas that relate k different program (x := e),√ array writes x[e] := e, and assume statements runs. For instance, we can express property P2 from above written as c. Statements also include composition S; S and using the following Cartesian Hoare triple: conditionals S1 ⊕c S2, which execute S1 if c evaluates to true and S2 otherwise. In our language, loops have the syntax ky1 = x2 ∧ x1 = x3 ∧ y2 = y3k 0 0 ∗ [(c1,S1,S1),..., (cn,Sn,Sn)] which is short-hand for the compare(x, y){...} more convential loop construct: kret1 > 0 ∧ ret2 > 0 ⇒ ret3 > 0k while(true) { Here, a variable named vi refers to the value of program if(c_1) then S_1 else { S_1’; break; } variable v in the i’th program execution. Hence, the pre- ... condition Φ states that we are only interested in triples of if(c_n) then S_n else { S_n’; break; } } executions (π1, π2, π3) where (i) the value of y in π1 is equal to the value of x in π2, (ii) the values of x in π1 and π3 are In this paper, we choose to formalize loops with breaks the same, and (iii) the values of y are the same in π2 and π3. The postcondition Ψ says that, if compare returns a positive rather than the simpler loop construct while(c) do S (i.e., [(c, S, [)]∗) because break statements are ubiquitous in real value in both π1 and π2, then it should also be positive in π3. To verify such Cartesian Hoare triples, our new program programs and present a challenge for verifying k-safety. We assume an operational semantics that is specified using logic reasons about the Cartesian product of sets of input- 0 k Σ judgments of the form σ, e ⇓ int and σ, S ⇓ σ , where e is a output pairs from program runs. Specifically, if represents 0 the set of all input-output pairs of code S, then the judgment program expression, S is a statement, and σ, σ are valuations ` kΦk S kΨk is derivable in our calculus iff every k-tuple in (stores) mapping program variables to their values. Since the Σk that satisfies Φ also satisfies Ψ. operational semantics of this language is standard, we refer The key idea underlying our approach is to reduce the the interested reader to the extended version of the paper. verification of a Cartesian Hoare triple to a standard Hoare Definition 1. (Entailment) Let σ be a valuation and ϕ a triple that can be easily verified. Specifically, our calculus de- formula. We write σ |= ϕ iff σ, ϕ ⇓ true. rives judgments of the form hΦi (S1 ~ ... ~ Sk) hΨi where public int compare(String c1, String c2) { if (c1.indexOf(’4’) != -1 || c2.indexOf(’4’) != -1) { // Four of a kind if (c1.indexOf(’4’) == c2.indexOf(’4’)) { for (int i = 12; i >= 0; i--) { if (c1.charAt(i) != ’0’ && c1.charAt(i) != ’4’) { if (c2.charAt(i) != ’0’ && c2.charAt(i) != ’4’) return 0; return 1; } if (c2.charAt(i) != ’0’ && c2.charAt(i) != ’4’) return -1; } } return c1.indexOf(’4’) - c2.indexOf(’4’); } int tripleCount1 = StringFunctions.countOccurrencesOf(c1, "3"); int tripleCount2 = StringFunctions.countOccurrencesOf(c2, "3"); if (tripleCount1 > 1 || (tripleCount1 == 1 && c1.indexOf(’2’) != -1) || tripleCount2 > 1 || (tripleCount2 == 1 && c2.indexOf(’2’) != -1)) { // Full house int higherTriple = c1.lastIndexOf(’3’); if (higherTriple == c2.lastIndexOf(’3’)) { for (int i = 12; i >= 0; i--) { if (i == higherTriple) continue; if (c1.charAt(i) == ’2’ || c1.charAt(i) == ’3’) { if (c2.charAt(i) == ’2’ || c2.charAt(i) == ’3’) return 0; return 1; } if (c2.charAt(i) == ’2’ || c2.charAt(i) == ’3’) return -1; } } return higherTriple - c2.lastIndexOf(’3’); } return 0; } Figure 1. A (buggy) comparator for sorting poker hands

Hoare triple for which a counterexample consists of a single Statement S := A | S; S | S ⊕c S | execution, a counterexample for a Cartesian Hoare triple [(c ,S ,S0 ),..., (c ,S ,S0 )]∗ 1 1 1 n √n n includes k different executions. In the rest of the paper, we Atom A := [ | x := e | x[e] := e | c refer to the value k as the arity of the Cartesian Hoare triple. Expr e := int | x | x[e] | e opa e Cond c := > | ⊥ | ? | e op e | ¬c | c ∧ c | c ∨ c l Example 1. Figure 3 shows some familiar properties and their corresponding specification. Among these Cartesian Figure 2. Language for formalization. opa, opl denote arithmetic and logical operators, and ? represents non-deterministic choice. Hoare triples, transitivity and homomorphism correspond to 3-safety properties, and associativity is a 4-safety property. Definition 2. (Semantic Equivalence) Two statements S1 The remaining Cartesian Hoare triples, such as monotocity, and S2 are semantically equivalent, written S1 ≡ S2, if for injectivity, and idempotence, have arity 2. 0 0 all valuations σ, we have σ, S1 ⇓ σ iff σ, S2 ⇓ σ . Example 2. Figure 4 shows some Cartesian Hoare triples 3. Validity of Cartesian Hoare Triples and indicates whether they are valid. For instance, consider the first two rows of Figure 4. For both examples, the precon- We now introduce Cartesian Hoare triples and provide use dition tells us to consider a pair of executions π , π where cases from real-world programming idioms to motivate the 1 2 the values of x and y are swapped (i.e., the value of x in π relevance and general applicability of k-safety properties. 1 is the value of y in π2 and vice versa). Since the statement of Definition 3. (Hoare Triple) Given statement S, a Hoare interest is z := x − y, the value of z in π1 will be the additive 0 triple {Φ} S {Ψ} is valid if for all pairs of valuations (σ, σ ) inverse of the value of z in π2. Hence, the second Cartesian satisfying σ |= Φ and σ, S ⇓ σ0, we have σ0 |= Ψ. Hoare triple is valid, but the first one is not. Cartesian Hoare triples generalize standard Hoare triples by allowing us to relate different program executions: 3.1 Realistic Use Cases for Cartesian Hoare Triples Definition 4. (Cartesian Hoare Triple) Let S be a state- We now highlight some scenarios where k-safety properties ment over variables ~x, and let Φ, Ψ be formulas over are relevant in modern programming. Since it is well-known variables ~x1, . . . , ~xk. Then, the Cartesian Hoare triple that security properties such as non-interference are 2-safety kΦk S kΨk is valid, written |= kΦk S kΨk, if for every properties [3, 23], we do not include them in this discussion. 0 0 set of valuation pairs {(σ1, σ1),..., (σk, σk)} satisfying Equality. A ubiquitious programming practice is to imple- ment a custom equality operator for a given type, for ex- ]  0 σi[~xi/~x] |= Φ and ∀i ∈ [1, k]. σi,S ⇓ σi ample, by overriding the equals method in Java. A key 1≤i≤k property of equals is that it must be an equivalence relation ] 0  (i.e., reflexive, symmetric, and transitive). While reflexivity we also have: σ [~xi/~x] |= Ψ i can be expressed as a standard Hoare triple, symmetry and 1≤i≤k transitivity are 2 and 3-safety properties respectively. Fur- Intuitively, the validity of kΦk S kΨk means that, if we thermore, equals must satisfy another 2-safety property run S on k inputs whose relationship is given by Φ, then the called consistency, which requires multiple invocations of resulting outputs must respect Ψ. Hence, unlike a standard x.equals(y) to consistently return true or false. Property Specification

Monotonicity kx2 ≤ x1k f(x) kret2 ≤ ret1k

Determinism k~x1 = ~x2k f(~x) kret1 = ret2k

Injectivity k~x1 6= ~x2k f(~x) kret1 6= ret2k

Symmetric relation kx1 = y2 ∧ y1 = x2k R(x, y) kret1 = ret2k

Anti-symmetric relation kx1 = y2 ∧ y1 = x2 ∧ x1 6= y1k R(x, y) kret1 = false ∨ ret2 = falsek

Asymmetric relation kx1 = y2 ∧ y1 = x2k R(x, y) kret1 = true ⇒ ret2 = falsek

Transitive relation ky1 = x2 ∧ x1 = x3 ∧ y2 = y3k R(x, y) k(ret1 = true ∧ ret2 = true) ⇒ ret3 = truek

Total relation kx1 = y2 ∧ y1 = x2k R(x, y) kret1 = true ∨ ret2 = truek

Associativity ky1 = x2 ∧ y2 = y3 ∧ x1 = x4k f(x, y) k(x3 = ret1 ∧ y4 = ret2) ⇒ ret3 = ret4k

Homomorphism kx3 = x1 · x2k f(x) kret3 = ret1 · ret2k

Idempotence ktruek f(x) kret1 = x2 ⇒ ret1 = ret2k Figure 3. Example k-safety properties and their specification

Cartesian Hoare Triple Valid? on sets of runs that jointly satisfy Φ. Hence, unlike previous

kx1 = y2 ∧ x2 = y1k z := x − y kz1 = z2k  approaches for proving equivalence or non-interference [3, 5, kx1 = y2 ∧ x2 = y1k z := x − y kz1 = −z2k  26], our technique does not construct a product program that kx1 = x2 ∧ y1 = y2k z := 1 ⊕x>y z := 0 kz1 = z2k  is subsequently fed to an off-the-shelf verifier. Instead, we kx1 = x2 ∧ y1 = y2k z := 1 ⊕? z := 0 kz1 = z2k  combine the verification task with the construction of only ∗ kx1 = x2k [(true, x := x + 1,[)] kx1 6= x2k  the relevant parts of the product program. ∗ kx1 = x2k [(x < 10, x := x + 1,[)] kx1 6= x2k  To motivate the advantages of our approach over explicit product construction, consider the following specification: Figure 4. Example Cartesian Hoare triples kx1 > 0 ∧ x2 ≤ 0k S1 ⊕x>0 S2 ky1 = −y2k

Comparators. Another common programming pattern in- where S1 and S2 are very large code fragments. Now, let Sij volves writing a custom ordering operator, for example represent Sj[~xi/~x] , and let Sij,kl be a program that executes by implementing the Comparator interface in Java or Sij and Skl in lockstep. One way to verify our Cartesian operator<= in C++. Such comparators are required to de- Hoare triple is to construct the following product program fine a total order – i.e., they must be reflexive, anti-symmetric, and feed it to an off-the-shelf verifier: transitive, and total. Programming errors in comparators are quite common and can have a variety of serious consequences. P :(S11,21 ⊕x2>0 S11,22) ⊕x1>0 (S12,21 ⊕x2>0 S12,22) For instance, such bugs can cause collections to unexpectedly lose elements or not be properly sorted. However, this strategy is quite suboptimal: Since the pre- Map/Reduce. Map-reduce is a widely-used paradigm for writ- condition is x1 > 0 ∧ x2 ≤ 0, we are only interested in a ing parallel code [13, 24]. Here, the user-defined reduce small subset of P , which only includes S11,22. In contrast, function must be associative and stateless. Furthermore, if by combining semantic reasoning with construction of the reduce is commutative and associative, then it can be used relevant Hoare triples, our approach can avoid constructing as a combiner, a “mini-reducer" that can be executed be- and reasoning about redundant parts of the product program. map reduce tween the and phases to optimize bandwidth. 4.1 Core Cartesian Hoare Logic In concurrent data structures that support reduce operations (e.g., the Java ConcurrentHashMap), the user-defined With this intuition in mind, we now explain the core subset of function must also be commutative and associative. Cartesian Hoare Logic shown in Figure 5. Since our calculus derives judgments of the form hΦi S1 ~ ... ~ Sn hΨi, we 4. Cartesian Hoare Logic first start by defining the product space of two programs S1 and S2, which we denote as S1 ~ S2: We now present our program logic (CHL) for verifying Let S and S be two state- Cartesian Hoare triples. The key idea underlying CHL is Definition 5. (Product space) 1 2 ments such that vars(S ) ∩ vars(S ) = ∅. Then, to verify kΦk S kΨk by proving judgements of the form 1 2 hΦi (S1 ~ ... ~ Sk) hΨi where each Si corresponds to a  0 0 0 0  ∀σ1, σ2, σ1, σ2. (σ1,S1 ⇓ σ1 ∧ σ2,S2 ⇓ σ2) different execution of S. Here, the notation S ... S S1~S2 = S 0 0 1 ~ ~ k ⇔ (σ1 ] σ2,S ⇓ σ1 ] σ2) represents the set of all programs that are semantically J K equivalent to simultaneously running S1,...,Sk. However, Intuitively, S1 ~ S2 represents the set of programs that rather than explicitly constructing this set –or even any are semantically equivalent to the simultaneous execution program in this set– we only reason about the provability of Ψ of S1 and S2. We generalize this notion of product space to arbitrary values of k and consider product terms χ of the ` hΦi Sn hΨi form S1 ... Sk where: (Expand)  ~ ~ ` kΦk S kΨk S = {S} J K 0 0 ` {Φ} S {Ψ} S ~ χ = S ~ S where S ∈ χ (Lift) J K J K J K ` hΦi S hΨi Since we want to reason about different runs of the same ` hΦi S; [ χ hΨi program, we also define the self-product space as follows: ([−intro 1) ~ ` hΦi S ~ χ hΨi Definition 6. (Self product space) ` hΦi χ [ hΨi ([−intro 2) ~ 1 S ≡ S[~x1/~x] ` hΦi χ hΨi  n n−1 S ≡ ( S ) ~ S[~xn/~x] ` hΦi χ hΨi   ([−elim) In other words, Sn represents the product space of n ` hΦi χ ~ [ hΨi different α-renamed copies of S. ` hΦi χ1 ~ (χ2 ~ χ3) hΨi At a high level, the calculus rules shown in Figure 5 (Assoc) ` hΦi (χ1 ~ χ2) ~ χ3 hΨi derive judgments of the form ` hΦi χ hΨi indicating the ` hΦi χ2 χ1 hΨi provability of the standard Hoare triple {Φ} P {Ψ} for some (Comm) ~ program P ∈ χ . Since all programs in χ are semantically ` hΦi χ1 ~ χ2 hΨi 0 equivalent, thisJ meansK that the Hoare tripleJ K {Φ} P {Ψ} is 0 0 ` {Φ} S1 {Φ } ` hΦ i S2 ~ χ hΨi valid for any P 0 ∈ χ . However, since some programs are (Step) ` hΦi S1; S2 ~ χ hΨi easier to verify thanJ otherK semantically equivalent variants, V = vars(χ) our calculus enables the verification of a Cartesian Hoare (Havoc) triple kΦk S kΨk by considering different variants in the ` hΦi χ h∃V.Φi n self-product space S . Φ ⇒ Φ0 Ψ0 ⇒ Ψ ` hΦ0i χ hΨ0i  (Consq) We now explain the CHL rules in more detail and highlight ` hΦi χ hΨi how these proof rules allow us to achieve a good combination 0 ` hΦi (S1 ~ ... ~ Sn) hΦ i of flexibility, automatability, and scalability. 0 ` hΦ i (R1 ... Rn) χ hΨi (Seq) ~ ~ ~ Expand. The first rule of Figure 5 reduces the verification ` hΦi (S1; R1 ~ ... ~ Sn; Rn) ~ χ hΨi of an n-ary Cartesian Hoare triple kΦk S kΨk to the prov- ability of hΦi Sn hΨi. Since each program P ∈ Sn is ` hΦ ∧ ci S1; S ~ χ hΨi ` hΦ ∧ ¬ci S2; S χ hΨi semantically equivalent to the sequential execution of n α- (If) ~ ` hΦi (S1 ⊕c S2; S) χ hΨi renamed copies of S, the derivability of hΦi Sn hΨi im- ~ plies the validity of kΦk S kΨk.  Φ ⇒ I ∗ ` {I} [(c1,S1,[),..., (cn,Sn,[)] {I} n 0 Lift. The Lift rule allows us to prove hΦi S hΨi in the ` hposti(I) ∧ ¬cii S χ hΨi (∀i ∈ [1, n]) (Loop) i ~  0 0 ∗ degenerate case where n = 1. In this case, we resort to ` hΦi [(c1,S1,S1),..., (cn,Sn,Sn)] ~ χ hΨi standard Hoare logic to prove the validity of {Φ} S {Ψ}. Figure 5. Core Cartesian Hoare Logic (CHL) Skip intro. The next two rules, labeled [-intro, allow us to introduce no-ops after statements and at the end of other prod- uct terms. These rules are useful for eliminating redundancy Step. The Step rule allows us to decompose the verification in our calculus. of hΦi S1; S2 ~ χ hΨi in the following way: First, we find an 0 0 Skip elim. The [-elim rule, which is the analog of the [-intro auxiliary assertion Φ and prove the validity of {Φ} S1 {Φ }. 0 2 rule, eliminates skip statements from product terms. Note If we can also establish the validity of hΦ i S2 ~ χ hΨi, we that the elimination analog of the [-intro 1 rule is unnecessary obtain a proof of hΦi S1; S2 ~ χ hΨi. This rule turns out because it is derivable using the other rules. to be particularly useful when we can execute S2 and χ in lockstep, but not S1 and χ. Associativity. The Assoc rule exploits the associativity of hΦi (χ χ ) χ hΨi operator ~: To prove 1 ~ 2 ~ 3 , it suffices to Havoc. This rule allows us to prove hΦi χ h∃V.Φi for hΦi χ (χ χ ) hΨi show 1 ~ 2 ~ 3 . The associativity rule is very any χ, where V denotes free variables in χ. Note that by, important for the flexibility of our calculus because, together existentially quantifying V , we effectively assume that χ with the commutativity rule, it allows us to consider different invalidates all facts we know about variables V . However, k interleavings between program executions. facts involving variables that are not modified by χ are Commutativity. The Comm rule states the commutativity still valid. The main purpose of the Havoc rule is to prune of ~. Specifically, it says that a proof of hΦi χ1 ~ χ2 hΨi irrelevant parts of the state space by combining it with the also constitutes a proof of hΦi χ2 ~ χ1 hΨi. Consq rule. Consequence. The Consq rule is the direct analog of the Loop. At a conceptual level, the Loop rule allows us to 0 0 ∗ standard consequence rule in Hoare logic. As expected, this reason about L = [(c1,S1,S1),..., (cn,Sn,Sn)] ~ χ as rule states that we can prove the validity of hΦi χ hΨi by though χ was “embedded" inside each exit (break) point proving hΦ0i χ hΨ0i where Φ ⇒ Φ0 and Ψ0 ⇒ Ψ. of L. In particular, this rule first reasons about the first k − 1 iterations of the loop (i.e., all except the last one) and Example 3. We now illustrate why the Havoc and Consq then considers L χ where L represents the computation rules are very useful. Suppose we would like to prove: k ~ k performed during the last iteration. Let us now consider this rule in more detail. The first two htruei x1 := false ~ S2 ~ S3 h(x1 ∧ x2) ⇒ x3i lines of the premise state that I is an inductive of where S2,S3 are statements over x2, x3. After using Step the first k − 1 iterations of L, where k denotes the number of to prove {true} x1 := false {¬x1}, we end up with proof iterations of L. Hence, we know that I holds at the end of the obligation h¬x1i S2 ~ S3 hx1 ∧ x2 ⇒ x3i. Since S2,S3 do k−1’th iteration. Now, we essentially perform a case analysis: not contain variable x1, we can use Havoc to obtain a When the loop terminates, one of the ci’s must be false, but proof of h¬x1i S2 ~ S3 h¬x1i. Now, since we have ¬x1 ⇒ we don’t know which one, so we try to prove Ψ for each of the (x1 ∧ x2 ⇒ x3), the consequence rule allows us to obtain n possibilities. Specifically, suppose L terminated because the desired proof. Observe that, even though S2 and S3 are ci was the first condition to evaluate to false during the potentially very large program fragments, we were able to k’th iteration. In this case, we need to symbolically execute complete the proof without reasoning about S S at all. 0 2 ~ 3 S1,...,Si−1 before we can reason about Si ~ χ. For this purpose, we define post ( ) as follows: Sequence. The Seq rule allows us to execute statements i I from different runs in lockstep. In particular, it tells us post1(I) = I that we can prove hΦi (S ; R ... S ; R ) χ hΨi by 1 1 ~ ~ n n ~ posti(I) = post(posti−1(I) ∧ ci−1,Si−1) 0 first showing hΦi S1 ~ ... ~ Sn hΦ i and then separately 0 where post(φ, S) denotes a post-condition of S with respect constructing a proof of hΦ i R1 ~ ... ~ Rn ~ χ hΨi. to φ. Hence, posti(I) represents facts that hold right be- If. The If rule allows us to “embed" product term χ inside 0 fore executing (ci,Si,Si). Now, if the loop terminates due each branch of S1 ⊕c S2. As the next example illustrates, to condition ci, we need to prove Ψ holds after executing this rule can be useful for simplifying the verification task, 0 Si ~ χ assuming posti(I) ∧ ¬ci. Hence, if we can establish particularly when combined with Havoc and Consq. 0 hposti(I) ∧ ¬cii Si ~ χ hΨi for all i ∈ [1, n], this also gives 0 0 ∗ Example 4. Suppose we want to prove: us a proof of hΦi [(c1,S1,S1),..., (cn,Sn,Sn)] ~ χ hΨi. Example 5. Consider the following loop L: kϕ1k y := 1 ⊕x>0 y := 0 kϕ2k ∗ where ϕ1 is x1 = −x2 ∧x1 6= 0 and ϕ2 is y1 +y2 = 1. After [(i < n, [, [), (a[i] ≥ b[i], [, r:=−1), (a[i] ≤ b[i], i++, r:=1)] applying the Expand and If rules (combined with [ elim and intro), we end up with the following two proof obligations: which often arises when implementing a lexicographic order. Suppose we want to prove the following 3-safety property: (1) hϕ1 ∧ x1 > 0i y1 := 1 ~ (y2 := 1 ⊕x2>0 y2 := 0) hϕ2i (2) hϕ1 ∧ x1 ≤ 0i y1 := 0 (y2 := 1 ⊕x >0 y2 := 0) hϕ2i ~ 2 ka1 = a3 ∧ b1 = a2 ∧ b2 = b3 ∧ n1 = n2 ∧ n2 = n3k Let’s only focus on (1), since (2) is similar. Using first Comm r := 0; i := 0; L and then If, we generate the following proof obligations: kr1 ≤ 0 ∧ r2 ≤ 0 ⇒ r3 ≤ 0k

(1a) hϕ1 ∧ x1 > 0 ∧ x2 > 0i y2 := 1 ~ y1 := 1 hϕ2i We can verify this Cartesian Hoare triple using our Loop rule (1b) hϕ1 ∧ x1 > 0 ∧ x2 ≤ 0i y2 := 0 ~ y1 := 1 hϕ2i and ∀j. 0 ≤ j < i ⇒ a[j] = b[j]. However, Since the precondition of (1a) is unsatisfiable, we can imme- this invariant is not sufficient if we try to verify this code diately discharge it using Havoc and Consq. We can also dis- using self-composition. charge (1b) by computing its postcondition y2 = 0 ∧ y1 = 1, which can be obtained using Step, [-intro, elim, and Lift. 4.2 Cartesian Loop Logic Note that we could even verify this property using a path- The core calculus we have considered so far is sound and insensitive analysis (e.g., the polyhedra [11] or octagon [17] relatively complete, but it does not allow us to execute loops abstract domains) using the above proof strategy. In particu- from different runs in lockstep. Since lockstep execution lar, conceptually embedding the second program inside the can greatly simplify the verification task (e.g., by requiring then and else branches of the first program greatly simplifies simpler invariants), we have augmented our calculus with the proof. While we could also prove this simple property us- additional rules for reasoning about loops. These proof rules, ing self composition, the proof would require a path-sensitive which we refer to as “Cartesian Loop Logic" are summarized analysis. in Figure 6 and explained next. (Transform − single) √ [(c, S, S0)]∗ [(c, S, [)]∗; ¬c; S0

∗ 0 0 ∗ 00 00 0 [R] [(c ,S ,[)] ; S c = c1 ∧ wp(c ,S1) √ √ (Transform − multi) 0 ∗ 00 0 ∗ 00 0 [(c1,S1,S1),R] [(c ,S1; S ,[)] ;( c1; S1; S ) ⊕? ( ¬c1; S1)

[L]∗ [(c, B, [)]∗; R ` hΦi [(c, B, [)]∗; R χ hΨi (Flatten) ~ ` hΦi [L]∗ ~ χ hΨi V ` hI ∧ 1≤i≤n cii S1 ~ ... ~ Sn hIi Φ ⇒ I V V (I ∧ ¬ 1≤i≤n ci) ⇒ 1≤i≤n ¬ci (Fusion 1) ∗ ∗ V ` hΦi [(c1,S1,[)] ~ ... ~ [(cn,Sn,[)] hI ∧ ¬ 1≤i≤n cii V ` hI ∧ 1≤i≤n cii S1 ~ ... ~ Sn hIi ∗ ∗ ` hI ∧ ¬c1i [(c2,S2,[)] ~ ... ~ [(cn,Sn,[)] hΨi ∗ ∗ ∗ ` hI ∧ ¬c2i [(c1,S1,[)] ~ [(c3,S3,[)] ~ ... ~ [(cn,Sn,[)] hΨi ... ∗ ∗ ` hI ∧ ¬cni [(c1,S1,[)] ~ ... ~ [(cn−1,Sn−1,[)] hΨi (Fusion 2) ∗ ∗ (n ≥ 2., Φ ⇒ I) ` hΦi [(c1,S1,[)] ~ ... ~ [(cn,Sn,[)] hΨi Figure 6. Cartesian Loop Logic (CLL)

Transform. The first two rules, labeled Transform-single we can soundly reason about the last iteration of L using √ 00 √ 0 2 and Transform-multi, allow us to “rewrite" complicated loops ( c1; S1; S ) ⊕? ( ¬c1; S1). L containing break statements into code snippets of the form Example 6. Consider again loop L from Example 5. Using while(C) do {S}; S’ . These rules derive judgments Transform rules of Figure 6, we have L S where S is: of the form L S stating that any Hoare triple that is valid ∗ for S is also valid for L. In particular, while L and S may √ √ [(√i < n ∧ a[i] = b√[i], i++,[)] ; √ ( c ;( c ; ¬c ; r:= 1 ⊕ ¬c ; r:= −1)) ⊕ ¬c not be semantically equivalent, our transformation L S 1 2 3 ? 2 ? 1 guarantees that any valid Hoare triple {Φ} S {Ψ} implies the Here, c1, c2, c3 represent i < n, a[i] ≥ b[i], and a[i] ≤ b[i]. validity of {Φ} L {Ψ}. Flatten. The Flatten rule uses the previously discussed Let us first consider the Transform-single rule where the ∗ loop contains a single break statement. This rule simply Transform rules to simplify the proof of hΦi [L] hΨi. In √ ∗ “rewrites" the loop [(c, S, S0)∗] to [(c, S, [)∗; ¬c; S0]. In particular, it first “rewrites" [L] as S and then proves the 0 validity of hΦi S ~ χ hΨi. This proof rule is sound since other words, it says that we can replace S with a skip ∗ 0 {φ} S {ϕ} always implies the validity of {φ} [L] {ϕ} for statement and simply execute S once the loop terminates. ∗ The Transform-multi rule handles loops containing mul- any φ, ϕ whenever [L] S. As we will see shortly, the tiple break points. In particular, suppose we have a loop Flatten rule can make proofs significantly easier, particularly 0 ∗ ∗ when combined with the Fusion rules explained below. L of the form [(c1,S1,S1),R] and suppose that R [(c0,S0,[)]∗; S00. This means that R is semantically equiv- Fusion 1. This rule effectively allows us to perform lock- 0 0 alent to (c ,S ,[) in all iterations of L except the last one. step execution for loops L1,...,Ln that always execute the 0 0 0 ∗ Hence, the loop L : [(c1,S1,[), (c ,S ,[)] is also equiva- same number of times. Observe that this rule requires each lent to L under all valuations in which L executes at least Li to contain a single break point; hence, if this requirement 0 once more. Furthermore, if wp(c ,S1) denotes the weakest is violated, we must first apply the Flatten rule. 0 0 precondition of c with respect to S1, then L (and therefore Let us now consider this rule in more detail. Recall that L) is also equivalent to S1 ~ ... ~ Sn is the set of programs that are semantically 00 0 0 ∗ equivalent to S ; ...,S . Hence, the first two lines of the L : [(c1 ∧ wp(c ,S1),S1; S ,[)] 1 n premise effectively state that I is an inductive invariant of: under all valuations in which L executes one or more times. ∗ Now, let’s consider the last iteration of L. There are two L : [(c1 ∧ ... ∧ cn,S1; ... ; Sn,[)] possibilities: Either we broke out of the loop because c1 evaluated to false or some other condition ci in R evaluated Furthermore, the third line of the premise tells us that every to false. In the former case, we can assume ¬c1 and we need ci will evaluate to false when L terminates, which, in turn, to execute S0 . In the latter case, we can assume c , and we 1 1 2 The careful reader may wonder whether the last iteration of the loop could 00 0 still need to execute S1 as well as the “part" of R that is be accounted for using S1; S ⊕c S . This reasoning would not be sound 00 1 1 executed in the last iteration, which is given by S . Hence, since c1 may be modified in S2,...,Sn. 1: procedure K-VERIFY(Φ, χ, Ψ) indicates that all Li’s always execute the same number 2: Input: Precondition Φ, term χ, postcondition Ψ V of times. Thus, I ∧ ¬ i ci is a valid post-condition of 3: Output: Boolean indicating whether property holds L1 ~ ... ~ Ln under precondition Φ. 4: let V := Vars(χ) in Example 7. Consider loop L from Example 5, and suppose 5: if ((∃V.Φ) ⇒ Ψ) then return true; . Havoc and Consq we want to verify the following Cartesian Hoare triple: 6: match χ with kΦk (r := 0; i := 0; L) kΨk 7: | S: return VERIFY(Φ, S, Ψ); . Lift 0 0 where Φ is a = b ∧ b = a ∧ a 6= b and Ψ is 8: | [ ~ χ : return K-VERIFY(Φ, χ , Ψ) .[-elim 1 2 1 2 1 1 ∗ ∗ 9: | [L1] ; S1 ... [Ln] ; Sn: . all loops r1 < 0 ⇒ r2 ≥ 0. After applying Seq, we generate the ~ ~ 10: return K-VERIFYLOOP(Φ, χ, Ψ) following proof obligation: ∗ 0 11: | [L] ; S ~ χ hΦ ∧ r = r = 0 ∧ i = i = 0i L L hΨi 0 ∗ 1 2 1 2 1 ~ 2 12: return K-VERIFY(Φ, χ ~ [L] ; S) . Comm 0 0 We then apply Flatten to both L1 and L2, which yields the 13: | A; S ~ χ : let Φ := post(A, Φ) in 0 0 following proof obligation: 14: return K-VERIFY(Φ ,S ~ χ , Ψ) . Step 0 hΦ ∧ r = r = 0 ∧ i = i = 0i L0 ; S0 L0 ; S0 hΨi 15: | S1 ⊕c S2; S ~ χ : 1 2 1 2 1 1 ~ 2 2 0 16: K ERIFY Φ ∧ c, S ; S χ , Ψ where L0 ,L0 are α-renamed versions of the loop obtained let r := -V ( 1 ~ ) in 1 2 17: if(r != true) return false; . If in Example 6 and S1,S2 are α-renamed versions of S from 0 18: return K-VERIFY(Φ ∧ ¬c, S2; S ~ χ , Ψ ) Example 6. We now apply Seq again, grouping together L 0 0 1 19: | S ~ χ : return K-VERIFY(Φ,S; [ ~ χ , Ψ) .[-intro and L , and then use Fusion to obtain a postcondition Φ0: 2 20: | (χ1 ~ χ2) ~ χ3: 0 0 0 hΦ ∧ r1 = r2 = 0 ∧ i1 = i2 = 0i L1 ~ L2 hΦ i 21: return K-VERIFY(Φ, χ1 ~ (χ2 ~ χ3), Ψ) . Assoc Using the loop invariant Φ ∧ i1 = i2 and the Fusion Figure 7. Verification algorithm for k-safety and Consq rules, we can obtain the post-condition Φ0 = (Φ ∧ i1 = i2). This finally leaves us with the proof obligation 5. Verification Algorithm Based on CHL 0 0 0 hΦ i S1 ~ S2 hΨi, which is easy to discharge using If and We now describe how to incorporate the CHL proof rules Havoc: Note that the post-condition can only be violated into a useful verification algorithm. At a high level, there in the branches where r1 and r2 are both assigned to −1. are three important ideas: First, since the Havoc and Conse- However since r1 := −1 and r2 := −1 only execute quence rules can greatly simplify the verification task, our when a1[i1] < b1[i1] and a2[i2] < b2[i2] respectively, the algorithm applies these rules at every opportunity. Second, precondition Φ∧i1 = i2 implies ¬(a1[i1] < b1[i1]∧a2[i2] < we always prefer the If rule over other rules like Seq or Step, b2[i2]). Hence, the original Cartesian Hoare triple is valid. as this strategy avoids reasoning about parts of Sn that con- Observe that the Flatten and Fusion rules allow us to tradict the desired precondition. Third, we try to maximize verify the desired property using the simple loop invariant opportunities for lockstep execution of loops. For example, i1 = i2 without requiring any quantified array invariants. we use Associativity and Commutativity in a way that tries Fusion 2. The Fusion 2 rule is similar to Fusion 1 and to maximize possible applications of the Fusion rules. allows us to perform partial lockstep execution when we Our verification algorithm is presented in Figure 7: K- cannot prove that loops L1,...,Lk execute the same num- VERIFY takes as input pre- and post-conditions Φ, Ψ, product ber of times. Conceptually, this rule “executes" the loop term χ, and returns true iff we can prove hΦi χ hΨi. As a bodies S1,...,Sn in lockstep until one of the continua- first step, we check if hΦi χ hΨi can be verified using only tion conditions ci becomes false. Then, the rule proceeds the Havoc and Consequence rules (lines 4–5). with a case analysis: Assuming ci is the first condition to Next, the algorithm performs pattern matching on χ in evaluate to false, we potentially still need to execute loops lines 6–21. In the base case, χ is a single program S, so we L1,...,Li−1,Li+1,...,Ln. Furthermore, when we execute simply invoke a procedure VERIFY for proving the standard 0 these loops, we can assume the loop invariant I of the first Hoare triple {Φ} S {Ψ} (line 7). The next case [ ~ χ at loop as well as condition ¬ci (since we are assuming that ci line 8 is equally simple; in this case, we proceed by applying is the first condition to evaluate to false). Hence, lines 3-5 in [-elimination to verify hΦi χ0 hΨi. the premise verify the following proof obligation for each i: The next two cases concern terms χ that start with a loop. In particular, the pattern at line 9 checks if χ is of the form h ∧ ¬cii L1 ... Li−1 Li+1 ... Ln hΨi I ~ ~ ~ ~ ~ ∗ ∗ [L1] ; S1~...~[Ln] ; Sn (i.e., all programs start with a loop). Theorem 1 (Soundness). If ` kΦk S kΨk, then |= kΦk S kΨk. In this case, we invoke the helper function K-VERIFYLOOP which chooses between the proof tactics from Figure 6 and Theorem 2 (Relative Completeness). Given an oracle for which we consider in more detail later. On the other hand, if deciding the validity of any standard Hoare triple, if |= the first program starts with a loop (but not all the remaining kΦk S kΨk, then we also have ` kΦk S kΨk. ones), we then use Commutativity (line 12) with the hope The proofs of both theorems are provided in the extended that we will eventually find matching loops in all programs version of the paper. (i.e., match line 9 in a future recursive call). 1: procedure K-VERIFYLOOP(Φ, χ, Ψ) 2: match χ with all programs start with a loop). The goal of K-VERIFYLOOP ∗ ∗ is to determine which loop-related proof rule to use. 3: | [(c1,S1 [)] ; R1 ~ ... ~ [(cn,Sn [)] ; Rn: V In the case analysis of lines 2-21, we first check whether 4: let I := FINDINV(Φ, i ci,S1,...,Sn) in ∗ 0 V all terms in χ start with loops of the form [c, S, [] , and if so, 5: let Φ := I ∧ ¬ ci in i we apply one of the Fusion rules from Figure 6. Towards this 6: if(Φ0 ⇒ V ¬c ) then . Fuse 1, Seq i i goal, we invoke a function called FINDINV which returns a 0 7: return K-VERIFY(Φ ,R ... R , Ψ); 1 ~ ~ n formula I with the following properties: 8: else (1) Φ ⇒ I 9: forach(i ∈ [1,n]) . Fuse 2, Seq V (2) ` h ∧ cii S1 ... Sn h i 10: let χi := REMOVEITH(χ, i) in I i ~ ~ I 0 11: if(!K-VERIFY(Φ , χi, Ψ)) return false; Continuing with lines 6-12, we next check whether it 12: return true; is legal to apply the simpler Fusion 1 rule or whether it is ∗ 0 13: | [L] ; R ~ χ : necessary to use the more general Fusion 2 rule. If it is the ∗ ∗ 0 V V 14: if([L] [(c, B, [)] ; R ) then . Flatten case that (I ∧ ¬ i ci) ⇒ i ¬ci, all loops execute the same 0 ∗ 0 number of times, so we can apply Fusion 1. 15: return K-VERIFY(Φ, χ ~ [(c, B, [)] ; R ; R, Ψ); 16: else Otherwise, we apply the more general Fusion 2 rule. ∗ 17: let I := LOOPINV([L] ) in . Loop In particular, the recursive calls at line 11 correspond to 18: foreach (i ∈ [1,n]) discharging the proof obligations in the premises of the 0 19: let Φ := posti(I) ∧ ¬ci; Fusion 2 rule from Figure 6. Here, the call to the auxiliary 0 0 0 20: if(!K-VERIFY(Φ , Si; R ~ χ )) procedure to REMOVEITH returns the following formula: 21: then return false; ∗ ∗ [(c1,S1,[)] ; R1 ~ ... ~ [(ci−1,Si−1,[)] ; Ri−1~ 22: return true; ∗ ∗ [(ci+1,Si+1,[)] ; Ri+1 ... [(cn,Sn,[)] ; Rn Figure 8. Helper procedure for loops ~ ~ Observe that we return true if and only if we can discharge Continuing with line 13, we next pattern match on terms all proof obligations for every i ∈ [1, n]. 0 of the form A; S ~ χ where the first program starts with an Now consider the final case at line 13 of the algorithm. In atom A. Since we can compute the strongest postcondition for this case, we know that not all loops are of the form [c, S, []∗, atomic statements in an exact way, we apply the Step rule and so it may not be possible to apply the Fusion rules. Hence, 0 generate a new proof obligation hpost(A, Φ)i S ~ χ hΨi we first apply the Transform rules from Figure 6 to rewrite where post(A, Φ) denotes the (strongest) postcondition of A the first loop in the form [(c, B, [)]∗ If this rewrite is possible, with respect to Φ. we then apply the Flatten rule at line 15; otherwise, we resort Next, consider the case where χ starts with a program to the Loop rule from Figure 5. 3 Observe that recursive calls whose first statement is S1 ⊕c S2. In this case, we apply to K-VERIFY at line 20 correspond to the proof obligations the If rule rather than the Step rule, as this strategy has two in the premise of the Loop rule from Figure 5. advantages: First, as we saw in Example 4, this rule often Theorem 3 (Termination). K-VERIFY(Φ, χ, Ψ) terminates simplifies the verification task. Second, since one of Φ ∧ c for every Φ, χ, Ψ. or Φ ∧ ¬c may be unsatisfiable, we might be able to easily discharge the new proof obligations in the recursive calls to 6. Implementation K-VERIFY using the Consequence rule. The next case at line 19 applies to product terms of the We implemented the verification algorithm from Section 5 in form S χ0. Note that we only match this case if S consists a tool called DESCARTES, written in Haskell. DESCARTES ~ language-java of a single loop or conditional. Since we would like to apply uses the Haskell package to parse Java source the appropriate rules for conditionals and loops, we use [ code and the Z3 SMT solver [12] for determining satisfiability. intro and then make a recursive call — this strategy ensures Our implementation models object fields with uninterpreted that we will match one of the cases at lines 9, 11, or 15 in the functions and uses for library methods implemented next recursive call. by the Java SDK. The final case at line 20 exploits associativity: If the first The main verification procedure in DESCARTES receives term of χ is not a single program, but rather another product a Cartesian Hoare triple that is composed of a method in the IR format, the arity of the specification, and the pre- and term of the form χ1 ~ χ2, we apply associativity. This tactic 0 post-conditions. It then generates k alpha-renamed methods guarantees that χ will eventually be of the form S ~ χ in one of the future recursive calls. and applies the K-VERIFY algorithm from Section 5. If the Let us now turn our attention to the auxiliary procedure specification is not respected, DESCARTES outputs the Z3 model that can used to construct a counterexample. K-VERIFYLOOP shown in Figure 8. The interface of this function is the same as that of K-VERIFY, but it is only 3 Recall that the Transform rules from Figure 6 require computing weakest ∗ ∗ invoked on terms of the form [L1] ; S1 ~...~[Ln] ; Sn (i.e., preconditions, which may be difficult to do in the presence of nested loops. As a design choice, our loop invariant generator (i.e., the P1 P2 P3 implementation of FINDINV) is decoupled from the main Benchmark V t(s) V t(s) V t(s) verification method since we aim to experiment with several loop invariant generation techniques in the future. Currently, ARRAYINT  0.14  0.16  0.27 ARRAYINT†  0.19  0.41  0.38 our implementation of FINDINV follows standard guess-and CATBPOS  0.33  8.35  2.59 check methodology [14, 20]: CHROMOSOME  0.15  0.19  0.34 CHROMOSOME†  0.36  4.26  3.90 1. A guess procedure guesses a candidate invariant I. Cur- COLITEM  0.09  0.15  0.17 rently, we generate candidate invariants by instantiating COLITEM†  0.07  0.17  0.17 a set of built-in templates with program variables and CONTACT  0.11  1.14  1.85    constants which include (i) equalities and inequalities be- CONTAINER-V1 0.04 0.04 0.04 CONTAINER-V2  0.05  0.04  0.05 tween pairs of variables and (ii) quantified invariants of CONTAINER†  0.27  3.44  1.28 the form ∀j.• ≤ j < • ⇒ •[•] op • [•] where • indicates DATAPOINT  0.26  0.89  0.51 an unknown expression to be instantiated with variables FILEITEM  0.03  0.03  0.09 † and constants. FILEITEM  0.05  0.06  0.08 ISOSPRITE-V1  0.06  0.04  0.07 2. A check procedure verifies if candidate invariant I is in- ISOSPRITE-V2  0.40  1.82  0.18 ductive. This is performed by discharging two verification MATCH  0.05  0.04  0.06 MATCH†  0.05  0.05  0.06 conditions (VCs): NAME  0.15  0.38  0.27 † • The pre-condition implies I; NAME  0.16  0.40  0.50 NODE  0.03  0.03  0.09 • The cartesian hoare triple given by Eq (2) in Section NODE†  0.04  0.06  0.07 6 is valid. The validity of this cartesian hoare triple is NZBFILE  0.14  0.28  0.13 NZBFILE†  0.25  0.48  0.99 checked by a recursive call to K-VERIFY. POKERHAND  0.55  0.71  2.34 POKERHAND†  0.61  0.78  1.61 7. Experimental Evaluation SOLUTION  0.19  0.52  0.68 SOLUTION†  0.42  1.33  1.24 To evaluate our approach, we used DESCARTES to verify TEXTPOS  0.10  0.25  0.26 user-defined relational operators in Java programs, including TEXTPOS†  0.11  0.48  0.19 compare, compareTo, and equals. We believe this TIME  0.10  0.36  0.02 † application domain is a good testbed for our approach for TIME  0.08  0.35  0.22 WORD  0.84  6.19  0.07 multiple reasons: First, as mentioned in Section 3, these WORD†  0.24  0.52  0.28 relational operators must obey multiple k-safety properties, allowing us to consider five different k-safety properties Figure 9. Evaluation on Stackoverflow examples. in our evaluation. Second, since comparators and equals methods are ubiquitous in Java, we can test our approach on a variety of different applications. Third, comparators 7.1 Buggy Examples from Stackoverflow are notoriously tricky to get right; in fact, web forums like We manually curated a set of 34 comparator examples from Stackoverflow abound with questions from programmers who on-line forums such as Stackoverflow. In many online discus- are trying to debug their comparator implementations. sions that we looked at, a developer posts a buggy comparator The goal of our experimental evaluation is to explore the –often with quite tricky program logic– and asks other forum following questions: (1) Is DESCARTES able to successfully users for help with fixing his/her code. Hence, we were often uncover k-safety property violations in buggy programs? (2) able to extract a buggy comparator together with its repaired Is DESCARTES practical? (i.e., how long does verification version from each Stackoverflow post. Given this collection of take, and how many false positives does it report?) (3) How benchmarks containing both buggy and correct programs, we does DESCARTES compare with competing approaches such used DESCARTES to check whether each comparator obeys as self-composition [1] and product construction [3, 5]? properties P1-P3 required by the Java Comparator and To answer these three questions, we performed three Comparable interfaces. sets of experiments. In the first experiment, we consider The results of our evaluation are presented in Figure 9, and examples of buggy comparators from Stackoverflow as the benchmarks are provided under supplemental materials. well as their repaired versions. In the second experiment, For a buggy comparator called COMPARE, the benchmark we use DESCARTES to analyze equals, compare, and labeled COMPARE† describes its repaired version. For the compareTo methods that we found in industrial-strength column labeled V ,  indicates that DESCARTES was able to Java projects from Github. In our third experiment, we com- verify the property, and  means that DESCARTES reported pare DESCARTES with the self-composition and product a violation of the property. For a few benchmarks, (e.g., construction approaches and report our findings. In what CATBPOS), the post did not contain the correct version of the follows, we describe these experiments in more detail. comparator and the problem did not seem to have a simple fix; hence, Figure 9 does not include the repaired versions of Self-composition. Given Cartesian Hoare triple kΦk S kΨk a few benchmarks. of arity k, we emulate the self-composition approach by gen- From Figure 9, we see that DESCARTES automatically erating k-alpha-renamed copies of S and then verifying the verifies all correct comparators and does not report any false standard Hoare triple {Φ}S1; ... ; Sk{Ψ}. However, for the alarms. Furthermore, for each buggy program, DESCARTES benchmarks from Sections 7.1 and 7.2, this strategy fails to pinpoints the violated property and provides a counterexam- verify more than half (52%) of the examples. Furthermore, ple. Finally, the running times of the tool are quite reasonable: for the examples where self composition is able to prove the On average, DESCARTES takes 1.21 seconds per benchmark relevant k-safety properties, it is ∼ 20× slower compared to to analyze all relevant k-safety properties. DESCARTES. We believe this result demonstrates that self- composition is not a viable strategy for verifying k-safety properties in many programs. 7.2 Relational Operators from Github Projects In our second experiment, we evaluated our approach on Product programs. As mentioned earlier, another strategy equals, compare, and compareTo methods assembled for verifying k-safety is to explicitly construct a so-called from top-ranked Java projects on Github, such as Apache product program and use an off-the-shelf verifier. Unfortu- Storm, libGDX, Android and Netty. Towards this goal, we nately, there is no existing tool for product construction in wrote a script to collect relational operators satisfying certain Java, and there are several possible heuristics one can use complexity criteria, such as containing loops or at least for constructing the product program. Furthermore, previous 20 lines of code and 4 conditionals. Our script also filters work on sound product construction [3] only considers loops out comparators containing features that are currently not of the form while(C) do S, making it difficult to apply supported by our tool (e.g., bitwise operators, reflection). this technique to our benchmarks, almost all of which contain 4 In total, we ran DESCARTES over 2000 LOC from 62 rela- break or return statements inside loops. tional operators collected from real-world Java applications Hence, even though a completely reliable comparison with (provided under supplemental materials). Specifically, our the product construction approach is not feasible, we tried to benchmarks contain 28 comparators and 34 equals meth- emulate this approach by considering two different strategies ods. In terms of running time, DESCARTES takes an average for product construction: of 9.14 seconds to analyze all relevant k-safety properties of • S1: We eliminated break statements in loops in the stan- the relational operator. Furthermore, the maximum verifica- dard way by introducing additional boolean variables.5 tion time across all benchmarks is 55.27 seconds. Among the • S2: We eliminated break statements in loops by using our analyzed methods, DESCARTES was able to automatically Transform rules from Figure 6. verify all properties in 52 of the 62 benchmarks. Upon man- ual inspection of the remaining 10 relational operators, we In both strategies, our product construction tries to maxi- discovered that DESCARTES reported 5 false positives, owing mize the use of the Fusion rules from Figure 6, since this rule to weak loop invariants. However, the five remaining methods appears to be critical for successful verification. turned out to be indeed buggy — they violate at least one of Product construction with S1. In terms of the overall out- k the three required -safety properties. come, this product construction strategy yielded results very In summary, DESCARTES was able to verify complex real- similar to self-composition. In particular, we were only able world Java relational operators with a very low false positive to verify half of the benchmarks (50.7% to be exact), and rate of 8%. Furthermore, DESCARTES was able to uncover analysis time was a lot longer (143 seconds/benchmark for five real bugs in widely-used and well-tested Java projects. product construction vs. 8 seconds for DESCARTES). While We believe this experiment demonstrates that reasoning about it may be possible to decrease the false positive rate of this k -safety properties is difficult for programmers and that tools approach using a more sophisticated invariant generator, this like DESCARTES can help programmers uncover non-trivial result demonstrates that our approach is effective even when logical errors in their code. used with a relatively simple invariant generation technique. Product construction with S2. When using strategy S2 7.3 Comparison with Self-Composition and Product together with the Fusion rules of Figure 6, we were able In our last experiment, we compare DESCARTES with two to prove 85% of the correct benchmarks, but verification competing approaches, namely the self-composition [1] became significantly slower. In particular, DESCARTES is, and product construction [3, 5] methods, which are the only existing techniques for verifying k-safety properties. Recall 4 The product construction described in [5] can, in principle, handle arbitrary that both of these approaches explicitly construct a new control flow. However, the generated product program is only sound under certain restricted conditions, which must be verified using non-trivial program that is subsequently fed to an off-the-shelf verifier. verification conditions. To compare our technique with these approaches, we use the 5 In this approach, while(true){S; if(b) break;} is modeled non-buggy benchmarks considered in Sections 7.1 and 7.2. as f=true; while(f){S; if(b) f=false;} on average, ∼ 21× faster compared to the explicit product tems, our core CHL calculus is a generalization of pRHL for construction approach, and, in several cases, three orders proving general k-safety properties for k > 2, modulo the of magnitude faster. However, since the Transform rule is rules dedicated to probabilistic reasoning. Specifically, CHL a contribution of this paper, it is entirely unclear whether allows the exploration of any arbitrary interleaving of k pro- previous product construction approaches can verify the 71 grams and does not require any form of structural similarity correct benchmarks we consider here. between programs. It is unclear how to use prior work on relational program logics for proving k-safety properties for 8. Related Work k > 2. The term 2-safety property was originally introduced by Another key difference of this work from previous tech- Terauchi and Aiken [23]. They define a 2-safety property to niques is the focus on automation. Specifically, unlike prior be “a property which can be refuted by observing two finite work that uses relational hoare logics, our verification proce- traces" and show that non-interference [16, 19] is a 2-safety dure is fully automated and does not rely on interactive theo- property. Clarkson and Schneider generalize this notion to rem provers to discharge proof obligations. Another common k-safety hyperproperties and specify such properties using limitation of previous relational logics (e.g., RHL and pRHL) second-order formulas over traces [10]. However, they do not is that they do not yield satisfactory algorithmic solutions for consider the problem of automatically verifying k-safety. reasoning about asynchronous loops that execute different numbers of times. In contrast, we propose a specialized cal- Self-composition. Barthe et al. propose a method called culus to reason about both synchronous and asynchronous self-composition for verifying secure information flow [1]. loops and show that our approach can be successfully auto- In essence, this method reduces 2-safety to standard safety mated. Furthermore, our approach can reason about loops by sequentially composing two alpha-renamed copies of the that contain multiple exit points, whereas previous work only same program. Self-composition is, in theory, also applicable considers loops of the form while(C) do S. for verifying k-safety since we can create k alpha-renamed copies of the program and use an off-the-shelf verifier. While Bug finding. In this work, we focused on verifying the ab- theoretically complete, self-composition does not work well sence of k-safety property violations. While our technique is in practice (see Section 7 and [23]). sound, it may have false positives. A complementary research direction is to develop automated bug finding techniques Product programs. Our work is most closely related to a for finding violations of k-safety properties. One possible line of work on product programs, which have been used approach for automated bug detection is to construct a har- in the context of translation validation [18], proving non- ness program using self-composition and then use off-the- interference [3, 5] and program optimizations [21]. Given two shelf bug detection tools, such as dynamic symbolic exec- programs A, B (with disjoint variables), these approaches tion [8, 15] or bounded model checking [7, 9]. Based on our explicitly construct a new program, called the product of experience with the CATG tool [22] for dynamic symbolic ex- A and B, that is semantically equivalent to A; B but is ecution, this naive approach based on self-composition does somehow easier to verify. While there are several different not seem to scale for examples that contain input-dependent techniques for creating product programs, the key idea is to loops. For example, we were not able to successfully use execute A and B in lockstep whenever possible, with the dynamic symbolic execution to detect the bug in the com- goal of making verification easier. Unlike these approaches, parator from Figure 1 since CATG times-out. We believe a we do not explicitly create a full product program that is promising direction for future research could be to explore subsequently fed to an off-the-shelf verifier. In contrast, our complementary bug finding techniques for detecting viola- approach combines semantic reasoning with the construction tions of k-safety properties. of Hoare triples that are relevant for verifying the desired k-safety property. 9. Conclusion Relational program logics. Our work is also closely related We have proposed Cartesian Hoare Logic (CHL) for proving to a line of work on relational program logics [6, 25]. Ben- k-safety properties and presented a verification algorithm ton originally proposed Relational Hoare Logic (RHL) for that fully automates reasoning in CHL. Our approach can verifying program transformations performed by optimizing handle arbitrary values of k, does not require different runs compilers [6]. While Benton’s RHL provides a general frame- to follow similar paths, and combines the verification task work for relational correctness proofs, it can only be used with the exploration of k simultaneous executions. Most when two programs always follow the same control-flow path, importantly, our approach supports full automation and does which is not the case for different executions of the same pro- not require the use of interactive theorem provers. Our gram. Several extensions of Benton’s RHL have been devised evaluation shows that DESCARTES is practical and gives by Barthe et al. to verify security properties of probabilistic good results when verifying five different k-safety properties systems, where the closest to our work is pRHL [2]. While of relational operators in Java programs. Our comparison pRHL is a relational logic specialized for probabilistic sys- also demonstrates the advantages of DESCARTES over self- composition and explicit product construction approaches, the 5th ACM SIGACT-SIGPLAN symposium on Principles of both in terms of precision as well as running time. programming languages, pages 84–96. ACM, 1978. [12] L. De Moura and N. Bjørner. Z3: An efficient smt solver. In Acknowledgments Tools and Algorithms for the Construction and Analysis of The authors would like to thank Valentin Wustholz and Systems, pages 337–340. Springer, 2008. Kostas Ferles for their help with the CATG tool and Doug [13] J. Dean and S. Ghemawat. Mapreduce: simplified data pro- Lea, Hongseok Yang, Thomas Dillig, Yu Feng, Navid Yagh- cessing on large clusters. Communications of the ACM, 51(1): mazadeh, Vijay D’Silva, Ruben Martins, and the anonymous 107–113, 2008. reviewers for their helpful feedback. [14] C. Flanagan and K. R. M. Leino. Houdini, an annotation assistant for esc/java. In FME 2001: Formal Methods for References Increasing Software Productivity, pages 500–517. Springer, 2001. [1] G. Barthe, P. R. D’Argenio, and T. Rezk. Secure information flow by self-composition. In Computer Security Foundations [15] P. Godefroid, N. Klarlund, and K. Sen. Dart: directed auto- Workshop, 2004. Proceedings. 17th IEEE, pages 100–114. mated random testing. In ACM Sigplan Notices, volume 40, IEEE, 2004. pages 213–223. ACM, 2005. [2] G. Barthe, B. Grégoire, and S. Zanella Béguelin. Formal certi- [16] J. A. Goguen and J. Meseguer. Security policies and security fication of code-based cryptographic proofs. In Proceedings models. IEEE, 1982. of the 36th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 90–101. ACM, [17] A. Miné. The octagon abstract domain. Higher-Order and 2009. Symbolic Computation, 19(1):31–100, 2006. [3] G. Barthe, J. M. Crespo, and C. Kunz. Relational verification [18] A. Pnueli, M. Siegel, and E. Singerman. Translation validation. using product programs. In FM 2011: Formal Methods, pages In Tools and Algorithms for the Construction and Analysis of 200–214. Springer, 2011. Systems, pages 151–166. Springer, 1998. [4] G. Barthe, B. Köpf, F. Olmedo, and S. Zanella Béguelin. [19] A. Sabelfeld and A. C. Myers. Language-based information- Probabilistic relational reasoning for differential privacy. In flow security. Selected Areas in Communications, IEEE Proceedings of the 39th Annual ACM SIGPLAN-SIGACT Journal on, 21(1):5–19, 2003. Symposium on Principles of Programming Languages, pages 97–110. ACM, 2012. [20] R. Sharma, S. Gupta, B. Hariharan, A. Aiken, P. Liang, and A. V. Nori. A data driven approach for algebraic loop invariants. [5] G. Barthe, J. M. Crespo, and C. Kunz. Beyond 2-safety: Asym- In Programming Languages and Systems, pages 574–592. metric product programs for relational program verification. Springer, 2013. In Logical Foundations of Computer Science, pages 29–43. Springer, 2013. [21] M. Sousa, I. Dillig, D. Vytiniotis, T. Dillig, and C. Gkantsidis. [6] N. Benton. Simple relational correctness proofs for static Consolidation of queries with user-defined functions. PLDI, analyses and program transformations. In ACM SIGPLAN 49(6):554–564, 2014. Notices, volume 39, pages 14–25. ACM, 2004. [22] H. Tanno, X. Zhang, T. Hoshino, and K. Sen. Tesma and catg: [7] A. Biere, A. Cimatti, E. M. Clarke, O. Strichman, and Y. Zhu. Automated test generation tools for models of enterprise appli- Bounded model checking. Advances in computers, 58:117– cations. In Proceedings of the 37th International Conference 148, 2003. on Software Engineering - Volume 2, pages 717–720. IEEE [8] C. Cadar, D. Dunbar, D. R. Engler, et al. Klee: Unassisted Press, 2015. and automatic generation of high-coverage tests for complex [23] T. Terauchi and A. Aiken. Secure information flow as a safety systems programs. In OSDI, volume 8, pages 209–224, 2008. problem. In Static Analysis Symposium (SAS). Springer, 2005. [9] E. Clarke, D. Kroening, and F. Lerda. A tool for checking [24] T. White. Hadoop: The definitive guide. " O’Reilly Media, ansi-c programs. In Tools and Algorithms for the Construction Inc.", 2012. and Analysis of Systems, pages 168–176. Springer, 2004. [10] M. R. Clarkson and F. B. Schneider. Hyperproperties. In [25] H. Yang. Relational separation logic. Theoretical Computer Computer Security Foundations Symposium, 2008. CSF’08. Science, 375(1):308–334, 2007. IEEE 21st, pages 51–65. IEEE, 2008. [26] A. Zaks and A. Pnueli. Covac: Compiler validation by program [11] P. Cousot and N. Halbwachs. Automatic discovery of linear analysis of the cross-product. In FM 2008: Formal Methods, restraints among variables of a program. In Proceedings of pages 35–51. Springer, 2008.