A Certified Procedure for RL Verification Andrei Arusoaie, David Nowak, Vlad Rusu, Dorel Lucanu

To cite this version:

Andrei Arusoaie, David Nowak, Vlad Rusu, Dorel Lucanu. A Certified Procedure for RL Verification. SYNASC 2017 : 19th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, Sep 2017, Timisoara, Romania. pp.8, ￿10.1109/SYNASC.2017.00031￿. ￿hal-01627517￿

HAL Id: hal-01627517 https://hal.inria.fr/hal-01627517 Submitted on 1 Nov 2017

HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. A Certified Procedure for RL Verification

Andrei Arusoaie∗, David Nowak†, Vlad Rusu‡, and Dorel Lucanu∗ ∗ Alexandru Ioan Cuza, University of Ia¸si {arusoaie.andrei, dlucanu}@info.uaic.ro †CRIStAL, CNRS & Lille 1 University, France [email protected] ‡INRIA Lille Nord-Europe, France [email protected]

Abstract—Proving programs correct is hard. During the last ML formula ϕ is a configuration template accompanied by a decades computer scientists developed various logics dedicated to constraint, and it denotes the set of configurations that match program verification. One such effort is Reachability Logic (RL): the template and satisfy the constraint. RL can be used for a language-parametric generalisation of Hoare Logic. Recently, based on RL, an automatic verification procedure was given and both defining operational semantics of proved sound. In this paper we generalise this procedure and and expressing properties about program executions. An RL prove its soundness formally in the Coq proof assistant. For the formula is a pair of ML formulas, denoted ϕ ⇒ ϕ0, which says formalisation we had to deal with all the minutiae that were that all terminating executions that start from a configuration neglected in the paper proof (i.e., an insufficient assumption, in the set denoted by ϕ, eventually reach a configuration in the implicit hypotheses, and a missing case in the paper proof). 0 The Coq formalisation provides us with a certified program- set denoted by ϕ .A language-independent and sound proof verification procedure. system, which uses the operational semantics of a language to derive RL formulas, has been proposed in [3]. I.INTRODUCTION One practical disadvantage of the RL proof system [3] is that the proofs tend to be very low level. Moreover, some Proving programs correct is one of the major challenges that creative user input is required when applying the rules of the computer scientists have been struggling with during the last proof system. Those difficulties were addressed in [9], where a decades. Many techniques have been developed in order to procedure for program verification based on RL was proposed automate this non-trivial and tedious process. Nowadays there and the soundness of the procedure was proved. The intent are many software tools used for proving program correctness, behind this procedure is to overcome the lack of strategies but proving that the verification tools themselves are correct is for applying the rules of the RL proof system, and implicitly, often avoided. In this paper we address this issue for our own move forward to automation. It takes as inputs sets of RL work: more precisely, we present the formalisation in the Coq formulas representing the operational semantics of a language proof assistant of the soundness of a procedure for program and the program together with its specification to be proved; verification that we proposed in [9]. if it terminates, it returns either success or failure. The In the last decades computer scientists came up with several soundness property states that if the procedure terminates and logics meant to be used for program verification. Floyd/Hoare returns success then the program satisfies its specification. Its Logic [5], [8], [12], and Dynamic Logic [6] proof depends on several assumptions which are not precisely are probably the most well known logics dedicated to pro- formalised in [9]. In order to provide additional confidence in gram verification. Recently, Matching Logic (ML) [15] and this soundness result, we present in this paper a formalisation Reachability Logic (RL) [16], [17], [3] have been proposed of the soundness proof in Coq. as alternative approaches for dealing with this problem. ML The paper proof from [9] raised a series of technical difficul- and RL were inspired from the attempt to use rewrite-based ties. One difficulty is hidden in an intermediary lemma whose operational semantics for program verification. Although oper- proof requires intricate inductive reasoning. Then, several ational semantics are considered too low level for verification, hypotheses and assumptions are added in an ad-hoc manner they are much easier to define and have the big advantage just for the proofs to hold. Also, some helper lemmas are not of being executable (and thus testable). For example, several precisely formulated, i.e., they need additional hypotheses. For large and complete operational semantics for real languages a careful reader all these issues could raise some doubts about have been formalised with RL using the framework [18], K the validity of the paper proof. [14], [2]: [4], [7], Java [1], JavaScript [11]. A proof in a research paper should keep a good balance ML is a first-order logic for specifying and reasoning between correctness and clarity. The soundness proof in [9] about program configurations, e.g., code and infrastructure is monolithic, i.e., the soundness does not follow from the for executing it (heap, stack, registers, etc.). Intuitively, an soundness of each derivation rule, as it is commonly done in This work was supported by the PN III - Bridge Grant contract no. soundness proofs. Therefore, even if the procedure is recursive, 55BG/2016: “Cre¸sterea imunit˘a¸tiila atacuri Zero-day prin analiz˘astatic˘a”. the nature of the proof is not inductive; the entire execution of the procedure is needed to prove that the goals are valid. the proof system. This is the reason why we choose to give a This makes the proof complex and hence the task of keeping direct proof of the soundness of the procedure. An additional a good balance between correctness and clarity is difficult. In benefit of a direct proof is that, compared to the RL proof order to keep the proof concise, some details are omitted in system, our procedure is closer to an implementation. Thus, by [9] but this could have hidden side effects. formalising directly the soundness of the procedure in Coq (vs. Clearly, a proof assistant is the solution here: it has no formalising the soundness of a proof system) we bridge the problem dealing with every detail because it keeps track gap between theory and implementation. In the end we obtain rigorously of all cases, and thus, it ensures that no corner a function which can be further extracted (using specialised case is overlooked. Coq mechanisms) into a certified prover for RL. Contributions. The main contribution is a generalisation Outline. Section II provides a short overview of ML and RL of the procedure and a constructive1 formal proof in Coq of based on examples. In Section III we present the verification its soundness. We first generalise the procedure by introducing procedure proposed in [9], while in Section IV we describe non-determinism and then encode it as an inductive relation in our formalisation in Coq of the soundness of the procedure. Coq. Then, we formally prove that our generalised procedure We conclude in Section V. is sound. The soundness of the concrete procedure in [9] fol- II.BACKGROUND lows. Finally, we implement the procedure as a Coq function which can be used to build a certified prover. This section provides an overview of ML and RL. For both The formalisation led us to discovering a flaw in the paper logics, we explain their syntax and semantics by means of proof: it is supposed that a claim holds for a set of RL simple examples. The precise definitions that we use in Coq formulas, but the claim was proved only for a subset of are shown later in Section III. We assume the reader is familiar formulas. Fortunately, this flaw does not invalidate the final with (many-sorted) First-Order Logic (hereafter, FOL). result, but it makes the proof in [9] incomplete. Moreover, A. Matching Logic in order to fix the proof we realised that we need an extra Matching Logic is a first-order logic for specifying and hypothesis in the soundness theorem. In Coq, we add both the reasoning about program configurations, e.g., code and infras- required hypothesis and an additional case which handles the tructure for executing it (heap, stack, registers, etc.). RL formulas in question and completes the proof. We start diving into ML by means of a very simple example: Another issue that we discovered and fixed is related to 0 32 renaming the free variables in RL formulas. In [9], this is ϕ , hx = x + y; y = x - y; x = x - y; | x 7→ x y 7→ yi∧x+y < 2 imprecisely handled using an assumption saying that certain For someone familiar with FOL, the formula above may look sets of variables are disjoint, and if they are not, then the free strange: what is the content surrounded by ‘h’ and ‘i’ and why variables can always be renamed. However, why the variable is it used in the left hand side of a conjunction? Recall that renaming is sound is not established in [9]. Here we explicitly ML is designed to specify and reason about program configura- show that free variables in RL formulas can always be renamed tions. In ML, the construct hx = x + y; y = x - y; x = x - y; | x 7→ appropriately, and we update the proof accordingly. x y 7→ yi is a formula and represents all the configurations During the formalisation in Coq we were able to find that match it. For this particular example, the configurations and fix some other imprecisions in lemmas from [9]. In matched by the formula consist of a program (i.e., the se- order to simplify the formalisation we abstract away some quence of assignments which is supposed to swap two values) details about ML that were fully stated in [9], but are in and a state containing mappings from program variables to fact not relevant for soundness. Last but not least, the Coq their values (here, x and y are variables that can be matched formalisation provides us with a certified program-verification by values). For real-life languages (like C [4], [7] or Java [1]) procedure. Its use in practice depends on the availability in configurations typically hold more information. Coq of RL-based semantics for languages; such an effort is The formal syntax of ML formulas is shown below. Let Σ already underway in the K team [10]. be a many-sorted algebraic signature, Π a set of predicate symbols, Var a set of (sorted) variables, and Cfg a sort for Related work. The closest related work is the formalisation program configurations; then, the syntax of ML formulas is: in Coq of RL reported in [3]. Previous works [16], [17], ϕ ::= π | > | p(t1, . . . , tn) | ¬ϕ | ϕ ∧ ϕ | (∃V )ϕ, [3] include sound and relatively complete proof systems for where π is a Σ-term of sort Cfg with variables (also called various versions of RL. In [3], the authors also prove the 2 basic pattern) , p is a predicate symbol in Π, ti are Σ-terms soundness of such a proof system in Coq. Our initial attempt with variables of appropriate sorts, and V a subset of Var. to prove the soundness of the procedure from [9] consisted As usual, the other known FOL connectives (∨, →, . . . ) and in reusing the mechanised proof reported in [3]. However, the quantifier ∀ can be expressed using the ones above. soundness of the procedure is not a direct consequence of the Example 1: Recall the ML formula ϕ0 shown at the begin- soundness of the RL proof system, because there is no direct ning of this section. Let π hx = x + y; y = x - y; x = x - y; | correspondence between the inference steps and the rules of , 2Here we only consider the “topmost version" of ML from [13], that is, π 1We thank the reviewers for pointing it out. is a term of sort Cfg with variables. x 7→ x y 7→ yi. Then π is a term of sort Cfg with variables B. Reachability Logic x and y. The constructor for such terms is h_ | _i, where We have seen in the previous section that ML formulas the two occurrences of _ are placeholders for terms of a sort denote sets of program configurations (or states). An RL Pgm (corresponding to programs) and terms of a sort Map formula (also called rule) is a pair of ML formulas, written (representing the state). Note that x and y are different from ϕ ⇒ ϕ0, that expresses reachability relationships between the and : the former are logical variables included in Var x y two sets of configurations denoted by ϕ and ϕ0. For example: while the latter are constants representing program variables hx = x + y; y = x - y; x = x - y; | x 7→ x y 7→ yi ∧ x + y < 232 (or identifiers) of sort Id. Symbols + and + that occur in π ∧ x + y < 232 are also different: the first is an ⇒ operation symbol over program expressions, while the latter hskip | x 7→ y y 7→ xi is the integer addition that applies only to integers. Here we is an RL formula which expresses the fact that 4 distinguish them by different font sizes (+ vs. +). All these configurations which satisfy hskip | x 7→ y y 7→ xi operation symbols and their corresponding sorts are included will be reached from configurations which satisfy 32 in Σ, while the predicates (e.g., <) are in Π. hx = x + y; y = x - y; x = x - y; | x 7→ x y 7→ yi ∧ x + y < 2 . For ML formulas we use the same notion of free variable Informally, the formula expresses the fact that if we execute as in FOL, which we denote by FV(ϕ) in the paper. The the assignments, then the values of program variables x and y are swapped. For this particular program, this is a desired existential closure of ϕ is denoted by ϕ , (∃ FV(ϕ))ϕ. Next, let us clarify how ML formulas are interpreted. property to prove. It is important to note how expressive RL Intuitively, an ML formula is a pattern which denotes the set is: in the left hand side of the RL formula we included the of concrete configurations that match that pattern. program and a side condition (which ensures that integer Example 2: The following term: overflow does not happen for 32 bits architectures); the right γ , hx = x + y; y = x - y; x = x - y; | x 7→ 5 y 7→ 10i hand side expresses that no code is left to be executed, and is a configuration that matches the ML formula ϕ0: the variables x and y hold the swapped initial values. structural part of γ is the same as the one of ϕ0 where variables We extend the FV construct over RL formulas too; by 0 0 x and y are replaced by concrete values 5 and 10; also, for FV(ϕ ⇒ ϕ ) we denote the union FV(ϕ)∪FV(ϕ ). Also, if S 32 S 0 x = 5 and y = 10 the inequality x + y < 2 holds. is a set of RL formulas then FV(S) , ϕ⇒ϕ0∈S FV(ϕ ⇒ ϕ ). RL Formally, the satisfaction relation |=ML of ML is akin to A set of formulas S defines a transition system, denoted the satisfaction relation of FOL (denoted |=FOL). Next, we (Cfg, ⇒S ), over configurations: we say that γ0 ⇒S γ1 if there 0 consider a (Σ, Π)-model M . Then, (γ, ρ) |=ML (∃V )ϕ iff there is a rule ϕ0 ⇒ ϕ1 ∈ S and there is a valuation ρ : Var → M 0 0 0 0 is ρ : Var → M with ρ (x) = ρ(x) for all x 6∈ V such such that (γ0, ρ ) |=ML ϕ0 and (γ1, ρ ) |=ML ϕ1. We often use 0 that (γ, ρ ) |=ML ϕ. This is similar to FOL, except that |=ML only ⇒S to denote the transition system (Cfg, ⇒S ). is defined over pairs (γ, ρ), where γ is an element in M of + − sort Cfg called a concrete configuration, ρ : Var → M is a Example 4: The set S , {α , α , α} of RL formulas valuation, and ML formulas ϕ. In addition to the other FOL defines a transition system: constructs, |=ML includes the ML particular case: + (γ, ρ) |= π iff ρ(π) = γ. α , hX=Z1 + Z2; Code | Z1 7→ V1 Z2 7→ V2i ⇒ ML = Example 3: As already pointed out in Example 2, γ is a hX V1 + V2; Code | Z1 7→ V1 Z2 7→ V2i 0 configuration that matches ϕ . Formally, if ρ : Var → M is a − α , hX=Z1 - Z2; Code | Z1 7→ V1 Z2 7→ V2i ⇒ valuation such that ρ(x) = 5 and ρ(y) = 10, then (γ, ρ) |=ML hX=V1 − V2; Code | Z1 7→ V1 Z2 7→ V2i ϕ0 holds: first3, ρ(hx = x + y; y = x - y; x = x - y; | x 7→ 32 0 x y 7→ yi) = γ, and second, ρ |=FOL x + y < 2 . α , hX=V ; Code | (X 7→ V ) Statei ⇒ hCode | X 7→ V Statei It has been shown in [16] that ML can be encoded in FOL. 0 We discuss here a variant of this encoding. If ϕ is an ML- Here, X, Z1 and Z2 are variables of sort Id; V , V , V1, V2 formula then its encoding ϕ=? in FOL is the formula (∃z)ϕ0, are variables of sort Int which are supposed to match over where ϕ0 is obtained from ϕ by replacing each basic pattern integer values; Code is a variable of sort Pgm and State occurrence π with z = π, and z is a variable which does not is a variable of sort Map. Also, recall that + (-) and + appear in the free variables of ϕ. Here are a few examples of (respectively, −) are different symbols: the first one is used formulas and their encodings: in the language expressions, while the second is the integer ϕ ϕ=? addition (subtraction). The first two rules are meant to capture

(π1 ∧ φ1) ∨ (π2 ∧ φ2)(∃z)((z = π1 ∧ φ1) ∨ (z = π2 ∧ φ2)) the evaluation of addition and subtraction of expressions π1 ∧ ¬π2 (∃z)((z = π1) ∧ ¬(z = π2)) consisting only of program variables. Essentially, the first rule expresses the fact that if we have to evaluate the program The encoding of a formula ϕ has the following property [9]: =? expression Z1 + Z2, where Z1 and Z2 are variables of sort Id, for all ρ and ϕ, ρ |=FOL ϕ iff exists γ such that (γ, ρ) |=ML ϕ. then we replace it with the integer expression V1 + V2, where 3For simplicity, we apply the valution ρ directly to a Cfg term; instead, we should have used the homomorphic extension of ρ to terms of sort Cfg. 4It refers to the ML satisfaction relation. V1 and V2 hold the values of the program variables matched by C. Derivatives of an ML (and RL) formula − Z1 and Z2. The rule α does the same thing for subtraction. A notion introduced in [9] is that of derivative of an ML Finally, α corresponds to the step-by-step execution of (and RL) formula. If ϕ is an ML formula and S is a set of RL assignments: if X=V ; Code needs to be executed then the formulas, then the derivative of ϕ with S is defined as follows: current value of the program variable matched by X in the ∆ (ϕ) {(∃FV(ϕ , ϕ ))(ϕ ∧ ϕ)=? ∧ ϕ | ϕ ⇒ ϕ ∈ S}. 0 S , l r l r l r V 0 state, namely , is replaced (in the state in the right hand If ϕ ⇒ ϕ is an RL formula then side) by the new integer value V . Also, in the right hand side ∆ (ϕ ⇒ ϕ0) {ϕ ⇒ ϕ0 | ϕ ∈ ∆ (ϕ)}. Code holds the remaining assignments to be executed. Note S , 1 1 S that the rest of the state matched by State remains unchanged. Intuitively, the derivative of an ML formula ϕ encodes the The set S can be regarded as an operational semantics. concrete successors by ⇒S of configurations matching ϕ. Derivatives can be computed for sets of RL formulas G too: The satisfaction relation |= of RL is defined using paths, RL [ 0 which are (possibly infinite) sequences of transitions of the ∆S (G) , ∆S (ϕ ⇒ ϕ ). ϕ⇒ϕ0∈G form γ0 ⇒S γ1 ⇒S γ2 ⇒S ··· . + − Example 7: We extend the set S {α+, α−, α} shown in Example 5: Recall the set S , {α , α , α} from , Example 4. The following sequence represents a path: Example 4 with the following rules: t α , hwhile (R>=B){S}; Code | State B 7→b R7→ri ∧ r ≥ b ⇒ hS; while (R>B){S}; Code | State B 7→ b R 7→ ri τ , hx = x + y; y = x - y; x = x - y; | x 7→ 5 y 7→ 10i ⇒S αf h (R>=B){S}; Code | State B 7→b R7→ri ∧ r < b ⇒ hx =5 + 10; y = x - y; x = x - y; | x 7→ 5 y 7→ 10i ⇒S , while hCode | State B 7→ b R 7→ ri hy = x - y; x = x - y; | x 7→ 15 y 7→ 10i ⇒S hy =15 − 10; x = x - y; | x 7→ 15 y 7→ 10i ⇒S These rules simulate the behavior of a very particular while >= hx = x - y; | x 7→ 15 y 7→ 5i ⇒S loop, where conditions are of the form R B. The rule t hx =15 − 5; | x 7→ 15 y 7→ 5i ⇒S α essentially says that if the values r and b corresponding hskip | x 7→ 10 y 7→ 5i to variables R and B satisfy r ≥ b, then we execute the loop body S once, and then again the loop. On the other Here we considered that skip is the empty sequence; also, hand, when r < b, the loop is not executed and the execution we evaluated the integer expressions in the state. The rules continues with Code as stated by αf . We consider the + − − applied, in order, are: α , α, α , α, α , α. following patterns: If none of the rules in S can be applied to a configuration ϕ,hwhile (r>=b){d=d+1; r=r-b}|a7→a b7→b d7→d r7→ri∧ then we say that the configuration is terminating. For instance, a = b ∗ d + r ∧ a ≥ 0 ∧ b > 0 0 the last configuration of the path shown in Example 5 is ϕ , hskip | a 7→ a b 7→ b d 7→ a/b r 7→ a%bi terminating. Paths can be infinite too. A typical example where Here / and % represent the integer division and the remainder 0 infinite paths can occur are the rules for handling loops in operations. Let G0 , {ϕ ⇒ ϕ }. The set of derivatives 0 0 programs. A path τ is complete if it is either finite and the computed for G0 is ∆S (G0) = {ϕ1 ⇒ ϕ , ϕ2 ⇒ ϕ }, where: last configuration in τ is terminating, or it is infinite. Since the path shown in Example 5 is finite and terminating then it ϕ1 ,hd=d+1; r=r-b; while (r>=d){d=d+1; r=r-b}| is also complete. Moreover, we say that a path τ starts from a7→a b7→b d7→d r7→ri∧a = b∗d+r∧r ≥ b∧a ≥ 0∧b > 0 ϕ h | 7→a 7→b 7→d 7→ai∧a=b∗d+r∧r 0 from an ML formula ϕ if the first configuration in the path is 2, skip a b d r ϕ matched by . The path shown in Example 5 starts from: Both ϕ1 and ϕ2 are in fact the symbolic successors of ϕ using 0 32 ϕ , hx = x + y; y = x - y; x = x - y; | x 7→ x y 7→ yi∧x+y < 2 S. Moreover, the configurations that match ϕ1 and ϕ2 are the In this paper we are only interested in the all-path interpre- concrete successors by ⇒S of configurations matching ϕ. tation [3] of RL formulas: a pair (τ, ρ) satisfies an RL formula 0 0 ϕ ⇒ ϕ , written (τ, ρ) |= ϕ ⇒ ϕ , iff (τ, ρ) starts from ϕ III.A PROCEDUREFORVERIFYING RL PROPERTIES (recall that τ γ ⇒ γ ⇒ γ ⇒ ··· ), and either there , 0 S 1 S 2 S A procedure for verifying RL formulas (Figure 1), called i ≥ 0 (γ , ρ) |= ϕ0 τ exists such that i ML , or is infinite. prove, was introduced in [9], and is meant to overcome the Example 6: Consider τ the path shown in Example 5 and lack of strategies for applying the rules of the RL proof system 0 0 ϕ the ML formula shown above. Then, (τ, ρ) |= ϕ ⇒ from [3], and implicitly, to move forward towards automation. hskip | x 7→ y y 7→ xi, with valuation ρ chosen such that As the RL proof system, the procedure has a coinductive ρ(x) = 5 and ρ(y) = 10. Clearly, γ0 (i.e., the first config- nature. Intuitively, given an RL formula ϕ ⇒ ϕ0, one can 0 uration of τ) is an instance of ϕ (as shown in Example 2). symbolically execute ϕ and check whether every leaf node Also, there is an i = 6, such that γi , hskip | x 7→ 10 y 7→ 5i of the obtained symbolic execution tree implies ϕ0. The pro- is matched by hskip | x 7→ y y 7→ xi, using the same ρ. cedure uses derivatives to compute symbolic execution paths. 0 Finally, we say that ⇒S satisfies ϕ ⇒ ϕ , written ⇒S |=RL An obvious problem of this approach is that the symbolic 0 0 ϕ ⇒ ϕ , iff (τ, ρ) |=RL ϕ ⇒ ϕ for all (τ, ρ) starting from ϕ execution tree can be infinite due to loops or recursion. To 0 with τ complete. We often write S |=RL ϕ ⇒ ϕ instead of overcome this problem, RL allows the use of helper formulas 0 ⇒S |=RL ϕ ⇒ ϕ . called circularities, which generalise the notion of invariant. Instead of proving a single formula, prove attempts to prove procedure prove(S,G0,G) a set of formulas (goals). The procedure checks first whether 1: if G = ∅ then return success 2: else choose ϕ ⇒ ϕ0 ∈ G RL 0 0 a helper formula can be used (e.g., an formula specifying 3: if M |=ML ϕ → ϕ then return prove(S,G0,G \{ϕ ⇒ ϕ }) 0 a loop) to discharge the current goal, rather than performing 4: else if there is ϕc ⇒ ϕc ∈ G0 s. t. M |=ML ϕ → ϕc then symbolic execution blindly. A goal can be used in the proof 0 0 5: return prove(S,G ,G \{ϕ ⇒ ϕ } ∪ ∆ 0 (ϕ ⇒ ϕ )) 0 ϕc⇒ϕc of a different goal or in its own proof, provided that at least 6: else if ϕ is S-derivable then 0 0 one symbolic step has been performed from it. 7: return prove(S,G0,G \{ϕ ⇒ ϕ } ∪ ∆S (ϕ ⇒ ϕ )) The procedure takes as input three sets of RL formulas: the 8: else return failure. Fig. 1. RL verification procedure. ϕ denotes (∃FV(ϕ ))ϕ . operational semantics S, a set of (initial) goals G0, and a recur- c c c sive argument G whose initial value is ∆S (G0); and it returns paper proof in [9]. Using Coq, we were able to find a flaw in either success or failure. At each recursive call, a goal the paper proof: a claim about the goals in G0 does not hold. from G is either eliminated and/or replaced by other goals. If This flaw does not invalidate the final result, but it makes ϕ ⇒ ϕ0 ∈ G is the current goal then it is processed as follows: the proof incomplete: an additional case is needed to prove 0 if M |=ML ϕ → ϕ then the goal is eliminated from G; else, the conclusion of an important lemma for the goals in G0. the procedure checks whether there is a circularity available Note that, in the proof of the soundness theorem, the goals in G0 which is used to generate a new goal that replaces the (of interest) for which we apply that lemma are exactly those current one; if no circularity is found, but ϕ is S-derivable, from G0. During the formalisation, we also realised that we then ϕ ⇒ ϕ0 is replaced by a set of goals consisting in its need an extra hypothesis in the soundness theorem to fix the symbolic successors; otherwise, prove returns failure. The proof. All these issues are fixed in the Coq formalisation. procedure succeeds when all the goals in ∆S (G0) together Unlike in [9], in the Coq proof we explicitly handle the with those generated during the execution are eliminated. If renaming of variables in RL formulas. Also, we reformulate the procedure does not terminate or it returns failure then several lemmas from [9] (Lemmas 5, 7, and 8) such that they it means that G0 does not contain enough information to are self contained, and we update their proofs accordingly. prove the goals. This problem is similar to finding appropriate The theoretical results shown in this section are fully invariants for loops in imperative programs. formalized in the Coq proof assistant. Example 8: Recall the rules S = {α+, α−, α, αt, αf } from 0 Examples 7 and 4. Also, recall the set G0 = {ϕ ⇒ ϕ } and A. ML in Coq patterns ϕ1 and ϕ2 from Example 7. We pass to prove the A definition of ML such as in [9] would require us to 0 0 sets S, G0, and G = ∆S (G0) = {ϕ1 ⇒ ϕ , ϕ1 ⇒ ϕ }. The formalise many things (algebraic specifications, FOL, etc.). All 0 procedure picks a goal to be proved from G, say ϕ2 ⇒ ϕ and these details are in fact a complete language definition, i.e., a 0 checks whether M |=ML ϕ2 → ϕ . In this case, the implication triple ((Σ, Π, Cfg), M , S). However, for the soundness of the holds, since a = b ∗ d + r ∧ r < b ∧ a ≥ 0 ∧ b > 0 implies verification procedure all these details are irrelevant. Moreover, a/b = d and r = a%b; thus, the values of d and r in the state we want our procedure to remain language parametric. For are correct. This goal is now eliminated from G. this, we introduce a set of axioms that capture only what 0 Next, the only goal left to prove is ϕ1 ⇒ ϕ . Since the im- is needed in the soundness proof: variables, conjunction(∧), 0 plication M |=ML ϕ1 → ϕ does not hold, and no circularities implication (→), the existential quantifier (∃), a model M 0 ϕc ⇒ ϕc are available in G0 such that M |=ML ϕ1 → ϕc, together with valuations, a set of rules S, and the ML satis- then prove computes the next successor using α+: faction relation. When all these axioms are instantiated (i.e., a 0 ϕ1 ,hr=r-b; while (r>=d){d=d+1; r=r-b}| particular language is provided), the procedure remains sound. a7→a b7→b d7→d+1 r7→ri∧a = b∗d+r∧r ≥ b∧a ≥ 0∧b > 0 Variables (Var), program configurations (State), models 0 0 (Model), and valuations (Valuation) are all parameters in Coq. The same case for ϕ1 ⇒ ϕ : the implication and circularity tests fail, and α− is used to get the next successor: From the ML syntax we only keep →, ∧, and ∃, together with ML |= 00 the corresponding axioms of the satisfaction relation ML. ϕ1 ,hwhile (r>=d){d=d+1; r=r-b}|a7→a b7→b To avoid ambiguities, we distinguish the logical connectives d7→d + 1 r7→r − bi ∧ a = b ∗ d + r ∧ r ≥ b ∧ a ≥ 0 ∧ b > 0 and quantifiers of ML ( ∧, →, ∃, . . . ) from those of Coq by Now, prove reaches a point where the circularity ϕ ⇒ ϕ0 using bold font for the latter (∧, →, ∃, . . . ).The validity of an M |= ϕ00 → ϕ a = M 5 can be applied, since ML 1 . This is because ML formula ϕ in model M is denoted by |=ML . b∗d+r∧r ≥ b∧a ≥ 0∧b > 0 implies a = b∗(d+1)+(r−b)∧ To collect the free variables of an ML formula we use a a ≥ 0∧b > 0. Therefore, the circularity is applied and prove function called FV. In Coq, FV is a parameter and needs an eliminates the last goal from G’ and returns success. implementation when the ML constructs are instantiated. In the proofs of our lemmas we often have to build new IV. A FORMALPROOFOFSOUNDNESSIN COQ valuations from existing ones. For this we use a function %0 In this section we generalise the procedure (Figure 1) which exhibits the following behaviour: %(ρ, ρ0, x) returns a by introducing non-determinism and then encode it as an new valuation which applied to variable x produces the same inductive relation in Coq. In spite of this generalisation, the formal proof of soundness follows the same pattern as the 5These details are shown in the Appendix. result as valuation ρ0 applied to x; for all the other variables element None). The ith element of a path τ is τ(i) and can 0 z(6= x) the valuation %(ρ, ρ , z) is equal to ρ. We also use the be either a state γi or None. The subpath of τ which starts at 0 function % which is an extension of % to sets of variables. position i is denoted τ|i. Also, a path is well-formed, written The corresponding Coq definitions and lemmas (which are wfPath(τ), if every two consecutive states (say γi and γi+1) proved in Coq) are shown below: are in the transition relation given by S (γi ⇒S γi+1), and for Definition %0 (ρ ρ0:Valuation)(x : Var): Valuation := all i and j such that i < j, if τ(i) = None then τ(j) = None. fun z => if (x = z) then ρ0(x) else ρ(z). A path τ is infinite, written infinite(τ), if for all i, Fixpoint % (ρ ρ0:Valuation)(V : list Var): Valuation := τ(i) 6= None. A configuration γ is terminating, denoted 0 0 match V with as terminating(γ), if there is no γ such that γ ⇒S γ .A | [] => ρ well-formed path τ is complete and has n transitions, written 0 0 0 | v :: vs => % (%(ρ, ρ , v), ρ ,V ) complete(τ, n) if τ is finite and terminating(τ(n)). Also, a end. well-formed path τ is complete, denoted by complete(τ), if Lemma %6∈ : ∀x ρ ρ0 V . x 6∈ V → %(ρ, ρ0,V )(x) = ρ(x) Lemma %∈ : ∀x ρ ρ0 V . x ∈ V → %(ρ, ρ0,V )(x) = ρ0(x) infinite(τ) or complete(τ, n) for some natural number n. Note that, if the path τ is complete and well-formed then for all If an ML formula ϕ is satisfied by a pair (γ, ρ), and i (if τ is finite then i ≤ n) the subpath τ| is also complete none of its free variables are in V , then for any ρ0, the pair i and well-formed. This statement is assumed in [9] but in Coq (γ, %(ρ, ρ0,V )) also satisfies ϕ, because on those variables we prove it explicitly. Finally, we say that (τ, ρ) startsFrom ϕ %(ρ, ρ0,V ) has the same effect as ρ. Conversely, if ϕ is if (τ(0), ρ) |= ϕ. All these are defined in Coq as follows: satisfied by a pair (γ, ρ0), and all its free variables are ML included in V , then for any ρ, the pair (γ, %(ρ, ρ0,V )) also Definition Path := nat → option State Definition wfPath(τ):= ∀ i j.(i 1) are the sets of goals 0 0 0 0 Definition (ϕ, ρ) ∼ (ϕ, ρ ):=∀ γ.(γ, ρ)|=ML ϕ ↔ (γ, ρ ) |=ML ϕ, generated by each recursive call. In Coq, we define the following relation (which says when a goal g is in F): Essentially, if ϕ ∼ ϕ0 then both ϕ and ϕ0 match over the 0 0 0 same set of configurations. Next, we use an axiom to establish g ∈ G0 step(G, G ) g ∈ G\G G ⊆ F [IN-G ] [IN-STEP] the link between relations ∼ and ≈: g ∈ F 0 g ∈ F ∼ 0 0 0 0 Axiom ≈ :∀ ϕ0 ϕ0 ϕ1 ϕ1. ϕ0 ⇒ ϕ0 ≈ ϕ1 ⇒ ϕ1 → A goal g ∈ F if it is either in G0, or there is step where g was 0 0 0 0 0 ∀ ρ. ∃ ρ . (ϕ0, ρ) ∼ (ϕ1, ρ ) ∧ (ϕ0, ρ) ∼ (ϕ1, ρ ) eliminated (i.e., step(G, G0) with g ∈ G\G0) and the remaining C. The procedure as a relation goals, including the ones introduced by this step, are also in F. To ensure that this definition is consistent with the one The procedure shown in Figure 1 might not terminate. in [9], we prove in Coq the following technical lemma: Because all Coq functions are terminating, the procedure cannot be encoded as a Coq function. Also, the procedure Lemma 1: For all lists of goals G, if steps(G) then G ⊆ F. operates with a set of goals instead of a single goal. In our D. Helper lemmas formalisation, we adapt the inference steps above to work with lists of goals, since in Coq it is more convenient to work with In this section we present two helper lemmas that have been lists rather than sets. One could claim that working with lists also proved in [9], but here we reformulate them more pre- might restrict the generality of the procedure, but our encoding cisely. We also identify and explain the differences w.r.t. [9]. does not rely on list-specific features. Moreover, in our Coq The first lemma states that every goal which is generated formalisation we use the relation ≈ defined in Section IV-B by a successful execution is either eliminated by [IMPL], or its in order to handle variable renaming explicitly. left hand side is S-derivable. Unlike in [9], we disentangle the In Coq, we introduce non-determinism by encoding the precise hypotheses and formulate it as a stand-alone lemma: procedure as an inductive relation: Lemma 2: For all RL formulas ϕ ⇒ ϕ0 ∈ F, if S is total, 0 0 G S γ ρ ϕ ⇒ ϕ ∈ G M |=ML ϕ → ϕ all goals in 0 are -derivable, and there are and such [IMPL] 0 step(G, G \{ϕ ⇒ ϕ0}) that (γ, ρ) |=ML ϕ then M |=ML ϕ → ϕ or ϕ is S-derivable.

0 0 0 The main difference between Lemma 2 and the corresponding ϕc ⇒ ϕ ∈ G0 ϕc ⇒ ϕ ≈ ϕ 0 ⇒ ϕ 0 ϕ ⇒ ϕ0 ∈ G c c c c one from [9] is that here we explicitly provide in the hypoth- M |=ML ϕ → ϕc FV(ϕc0 ) ∩ FV(ϕ) = ∅ [CIRC] esis a configuration γ and a valuation ρ which satisfy ϕ. The 0 0 0 step(G, G \{ϕ ⇒ ϕ } ∪ {∆ϕ 0 ⇒ϕ (ϕ ⇒ ϕ )}) c c0 existence of (γ, ρ) is crucial in the soundness proof. γ ⇒ γ0 0 The second lemma states that for every transition S S ≈ S 0 0 which starts from ϕ there is a symbolic successor ϕ of ϕ such ϕ ⇒ ϕ ∈ G ϕ is S−derivable 0 FV(ϕ) ∩ FV(S ) = ∅ that γ0 is an instance of ϕ0. This lemma is also generalised SYMB 0 0 [ ] step(G, G \{ϕ ⇒ ϕ } ∪ ∆S0 (ϕ ⇒ ϕ )) to paths in Coq, and we prove that for a concrete execution Here, FV(S0) denotes the list of free variables that occur in path there is a symbolic one which covers it. There are two 0 0 all the RL formulas in S . G, G0, S, and S are all lists of additional conditions required by this lemma (w.r.t. [9]): all RL formulas. By abuse of notation we write ϕ ⇒ ϕ0 ∈ G rules from S have distinct free variables from variables in ϕ to denote the fact that ϕ ⇒ ϕ0 is in list G. Intuitively, step and all formulas from S and G0 are well-formed. 0 0 relates two lists of goals G and G , where G contains the Lemma 3: For all transitions γ ⇒S γ , for all valuations ρ, 0 0 current goal ϕ ⇒ ϕ , and G contains the remaining goals and for all formulas ϕ, if (γ, ρ) |=ML ϕ, f ∈ S ∪ G0 implies from G \{ϕ ⇒ ϕ0} and possibly the new goals generated wf(f), and FV(S) ∩ FV(ϕ) = ∅, then there are α ∈ S and 0 0 0 by [CIRC] and [SYMB]. At each step, only a single goal is ϕ , ∆α(ϕ) such that (γ , ρ) |=ML ϕ . E. The soundness for finite paths In terms of future work, we are interested in extracting a The proof of the soundness theorem depends on an interme- certified OCaml program from the procedure, and then use it diate technical lemma. In this section we formulate the lemma for verification within the K framework. This requires RL- by adding all the hypotheses needed by its proof so that it does based semantics of languages in OCaml, which could be not rely on hidden assumptions as in [9]. The lemma can be extracted from corresponding semantics in Coq. thought of as a version of soundness for finite paths. We prove REFERENCES it in Coq by induction on the number of transitions in τ. [1] Denis Bogdana¸s˘ and Grigore Ro¸su. K-Java: A Complete Semantics Lemma 4: For all finite and complete paths τ, for all of Java. In Proceedings of the 42nd Symposium on Principles of valuations ρ, and for all formulas ϕ ⇒ ϕ0 ∈ F, if (τ, ρ) Programming Languages (POPL’15), pages 445–456. ACM, January 2015. startsFrom ϕ, G0 6= ∅, wfPath(τ), steps(∆S (G0)), S is total, [2] Traian Florin ¸Serbanu¸t˘ a,˘ Andrei Arusoaie, David Lazar, Chucky Ellison, the left hand sides of goals in G0 are S-derivable, all formulas Dorel Lucanu, and Grigore Ro¸su. The primer (version 3.3). Electronic Notes in Theoretical Computer Science, 304:57 – 80, 2014. Proceedings in S ∪ G0 are well formed, and FV(S) ∩ FV(G0) = ∅, then 0 of the Second International Workshop on the K Framework and its (τ, ρ) |=RL ϕ ⇒ ϕ . Applications (K 2011). With respect to [9], there are several significant differences. [3] Andrei ¸Stefanescu,˘ ¸Stefan Ciobâca,˘ Radu Mereu¸ta,˘ Brandon M. Moore, Traian Florin ¸Serbanu¸t˘ a,˘ and Grigore Ro¸su. All-path reachability logic. First, the proof takes into account the fact that derivatives are In Proceedings of the Joint 25th International Conference on Rewriting computed with renamed RL formulas. Second, we find and Techniques and Applications and 12th International Conference on fix the flaw from the proof in [9], where a false assumption Typed Lambda Calculi and Applications (RTA-TLCA’14), volume 8560 of LNCS, pages 425–440. Springer, July 2014. about the goals in F is made: given a successful execution [4] Chucky Ellison and Grigore Ro¸su.An executable formal semantics of C of the procedure starting with ∆S (G0), all the goals in F are with applications. In Proceedings of the 39th Symposium on Principles eliminated by a step of the procedure. This cannot be true, of Programming Languages (POPL’12), pages 533–544. ACM, 2012. [5] Robert W. Floyd. Assigning meanings to programs. Proceedings of since F includes G0, but the procedure only processes goals Symposium on Applied Mathematics, 19:19–32, 1967. starting with ∆S (G0). The assumption does not invalidate the [6] David Harel, Dexter Kozen, and Jerzy Tiuryn. Dynamic logic. SIGACT result from [9], but it makes the proof incomplete. In fact, the News, 32(1):66–69, 2001. [7] Chris Hathhorn, Chucky Ellison, and Grigore Ro¸su.Defining the unde- goals in G0, which are of interest in the soundness theorem, finedness of c. In Proceedings of the 36th ACM SIGPLAN Conference on are not handled. This flaw was detected only when formalising Programming Language Design and Implementation (PLDI’15), pages the proof in Coq. To fix it, we add a new hypothesis: the free 336–345. ACM, June 2015. [8] Charles Antony Richard Hoare. An Axiomatic Basis for Computer variables that occur in rules in S and those that occur in rules Programming. Comm. ACM, 12(10):576–580, 583, October 1969. in G0 constitute disjoint sets (i.e., FV(S) ∩ FV(G0) = ∅). [9] Dorel Lucanu, Vlad Rusu, Andrei Arusoaie, and David Nowak. Veri- fying reachability-logic properties on rewriting-logic specifications. In F. The soundness theorem Logic, Rewriting, and Concurrency - Essays dedicated to José Meseguer on the Occasion of His 65th Birthday, volume 9200 of Lecture Notes In this section we formulate the soundness theorem. Several in Computer Science, pages 451–474. Springer, 2015. assumptions from [9] become hypotheses: all formulas in S [10] Brandon Moore and Grigore Ro¸su.Program verification by coinduction. 0 Technical Report http://hdl.handle.net/2142/73177, University of Illinois, and G0 are well formed and for all the goals ϕ ⇒ ϕ in G0 February 2015. ϕ is S-derivable. Moreover, we add the additional hypothesis [11] Daejun Park, Andrei ¸Stefanescu,˘ and Grigore Ro¸su. KJS: A complete formal semantics of JavaScript. In Proceedings of the 36th ACM required by LEMMA 4: FV(S) ∩ FV(G ) = ∅. The theorem 0 SIGPLAN Conference on Programming Language Design and Imple- below is fully formalised and proved in Coq: mentation (PLDI’15), pages 346–356. ACM, June 2015. Theorem 1: If S is total, all formulas in S ∪ G0 are well- [12] John C. Reynolds. Separation Logic: A Logic for Shared Mutable Data formed, the left hand sides of goals in G are S-derivable, Structures. In LICS, pages 55–74, 2002. 0 [13] Grigore Ro¸su. Matching logic — extended abstract. In Proceedings FV(S) ∩ FV(G0) = ∅, and steps(∆S (G0)) then ⇒S |=RL G0. of the 26th International Conference on Rewriting Techniques and Applications (RTA’15), volume 36 of Leibniz International Proceedings V. CONCLUSIONS in Informatics (LIPIcs), pages 5–21, Dagstuhl, Germany, July 2015. [14] Grigore Ro¸suand Traian Florin ¸Serbanu¸t˘ a.˘ K overview and simple case The formal proof of the soundness of the procedure was de- study. In Proceedings of International K Workshop (K’11), volume 304 veloped using version 8.4pl5 of the Coq proof assistant. The of ENTCS, pages 3–56. Elsevier, June 2014. 6 [15] Grigore Ro¸suand Andrei ¸Stefanescu.˘ Matching Logic: A New Program Coq code is organized as follows: the main file, sound.v, Verification Approach (NIER Track). In ICSE’11: Proceedings of the contains the definitions for step and steps, the main lemmas and 30th International Conference on Software Engineering, pages 868–871. the soundness theorem, all shown in Section IV. The axioms ACM, 2011. [16] Grigore Ro¸su and Andrei ¸Stefanescu.˘ Checking reachability using that we use for ML (Section IV-A) can be found in ml.v. The matching logic. In Proceedings of the 27th Conference on Object- definition of RL, derivatives, and the other related notions from Oriented Programming, Systems, Languages, and Applications (OOP- Section IV-B are located in rl.v and derivatives.v. SLA’12), pages 555–574. ACM, 2012. [17] Grigore Ro¸su, Andrei ¸Stefanescu,˘ ¸Stefan Ciobâca,˘ and Brandon M. The Coq code has over 1600 lines and the proof includes Moore. One-path reachability logic. In Proceedings of the 28th 24 lemmas, 15 axioms (all needed to preserve the language Symposium on Logic in Computer Science (LICS’13), pages 358–367. parametric feature) and 1 theorem. It took 5 months for one IEEE, June 2013. [18] Grigore Ro¸su and Traian Florin ¸Serbanu¸t˘ a.˘ An overview of the K person to improve the proof in [9] and encode it in Coq. semantic framework. Journal of Logic and Algebraic Programming, 79(6):397–434, 2010. 6Zip archive: https://profs.info.uaic.ro/~arusoaie.andrei/soundness-proof.zip