Coexecutability for Efficient Verification of Data Model Updates
Ivan Bocic´∗, Tevfik Bultan† Department of Computer Science University of California, Santa Barbara, USA ∗ [email protected] † [email protected]
Abstract—Modern applications use back-end data stores for 1 class PostsController persistent data. Automated verification of the code that updates 2 def destroy_tags the data store would prevent bugs that can cause loss or 3 ... 4 posts = Post.where(id: params[:post_ids]) corruption of data. In this paper, we focus on the most challenging 5 ... part of this problem: automated verification of code that updates 6 posts.each do |p| the data store and contains loops. Due to dependencies between 7 p.tags.destroy_all! loop iterations, verification of code that contains loops is a hard 8 end problem, and typically requires manual assistance in the form of 9 ... loop invariants. We present a fully automated technique that 10 end 11 end improves verifiability of loops. We first define coexecution, a method for modeling loop iterations that simplifies automated Fig. 1. An Example Action reasoning about loops. Then, we present a fully automated static program analysis that detects whether the behavior of In our earlier work [5], we demonstrated that one can a given loop can be modeled using coexecution. We provide check invariants about the data store by translating verification a customized verification technique for coexecutable loops that queries about actions to satisfiability queries in First Order results in more effective verification. In our experiments we Logic (FOL), and then using an automated FOL theorem observed that, in 45% of cases, modeling loops using coexecution reduces verification time between 1 and 4 orders of magnitude. prover to answer the satisfiability queries. However, due to In addition, the rate of inconclusive verification results in the undecidability of FOL, an automated theorem prover is not presence of loops is reduced from 65% down to 24%, all without guaranteed to come up with a solution every time, and some- requiring loop invariants or any manual intervention. times it may time-out without providing a conclusive result. In particular, actions that have loops in them are hardest to I.INTRODUCTION check automatically. Verification of code that contains loops Nowadays, it is common for software applications to store typically requires manual intervention where the developer their persistent data in a back-end data store in the cloud. has to provide a loop invariant in order to help the theorem For many application domains, such as social networking, prover in reasoning about the loop. This reduces the level of data is the most valuable asset of an application. Hence, the automation in the verification process and, hence, its practical correctness of the code that updates the data is of significant applicability. concern. In this paper, we present automated verification In this paper, we present a fully automated technique that techniques that improve the verifiability of code that update significantly improves the verifiability of actions with loops. the data. Our key contribution is the definition of a concept we call Typically, application programmers write object oriented coexecution, which, while intuitively similar to parallel or code for accessing and updating the data in the back-end concurrent execution, does not correspond to an execution data store. This code is automatically translated to data- on actual hardware. It is a concept we introduce specifically store queries using Object Relational Mapping (ORM). Mod- to make verification easier. We call a loop coexecutable if ern software development frameworks use the Model-View- coexecution of its iterations is equivalent to their sequential Controller (MVC) [23] architecture to separate the user inter- execution. We present an automated static analysis technique face code (View) from the code that handles the user requests that determines if a loop is coexecutable. We also developed (Controller), and the code that accesses and modifies the data a customized translation of coexecutable loops to FOL that store (Model). The inherent modularity in this architecture exploits the coexecution semantics and improves verifiability. creates opportunities for automated verification. In a typical We implemented these techniques for verification of actions RESTful [11] MVC-based application, user requests trigger that update the data model in applications written using execution of actions that read from or write to the back-end the Ruby-on-Rails (Rails) framework [26]. We applied our data store using an ORM library. Data store related bugs can be approach to verification of actions extracted from real world eliminated by verifying the implementations of these actions. Rails applications. Our experimental results demonstrate that exploiting the coexecutability property significantly improves This work is supported in part by the NSF grant CCF-1423623. the verifiability of actions with loops. {p1, p2, p3, t1, t2, t3, {p1, p2, p3, t1, t2, t3, ⟨p1, t1⟩, ⟨p2, t1⟩, ⟨p2, t2⟩, ⟨p3, t2⟩} {p1, p2, p3, t1, t2, t3, ⟨p1, t1⟩, ⟨p2, t1⟩, ⟨p2, t2⟩, ⟨p3, t2⟩} ⟨p1, t1⟩, ⟨p2, t1⟩, ⟨p2, t2⟩, ⟨p3, t2⟩} Iteration execution for p1 Delta: remove{t1, ⟨p1, t1⟩, ⟨p2, t1⟩} Iteration execution for p2 Iteration execution for p1 Iteration execution for p3 Delta: remove{t1, t2, ⟨p1, t2⟩, Delta: remove{t1, ⟨p1, t1⟩, ⟨p2, t1⟩} Delta: remove{t2, ⟨p2, t2⟩, ⟨p3, t2⟩} {p1, p2, p3, t2, t3, ⟨p2, t2⟩, ⟨p3, t2⟩} ⟨p2, t1⟩, ⟨p2, t2⟩, ⟨p3, t2⟩}
Iteration execution for p2 Delta: remove{t2, ⟨p2, t2⟩, ⟨p3, t2⟩} {p1, p2, p3, t2, t3, ⟨p2, t2⟩, ⟨p3, t2⟩} {p1, p2, p3, t3} {p1, p2, p3, t1, t3, ⟨p1, t1⟩, ⟨p2, t1⟩}
{p1, p2, p3, t3} Delta union: remove{t1, t2, Delta union Iteration execution for p3 ⟨p1, t1⟩, ⟨p2, t1⟩, ⟨p2, t2⟩, ⟨p3, t2⟩} Delta: empty
{p1, p2, p3, t3} {p1, p2, p3, t3}
(a) Sequential Execution (b) Coexecution Fig. 2. Sequential execution compared to Coexecution
The rest of the paper is organized as follows. Section II deduction process employed by the theorem prover does not provides an overview of the problem and presents coexecution terminate. This causes the theorem prover to time-out without at an intuitive level. Section III formally defines data stores a conclusive result. and sets up the foundation for Section IV, in which we define coexecution and define and prove the coexecutability A. Coexecution Overview condition. Section V presents the static program analysis Figure 1 presents an example Rails action based on an open algorithm that decides whether a given loop is coexecutable. source discussion platform called Discourse [7]. This action Section VI demonstrates the benefits of modeling loops using deletes all the Tag objects associated with a set of Post objects. coexecution on loops extracted from real world applications. The set of Post objects that are chosen by the user are assigned Section VII discusses related work, and Section VIII concludes to a variable called posts in line 4. Then, using a loop (lines the paper. 6 to 8) that iterates over each Post object in posts (with the loop variable p), all the Tag objects that are associated with II.OVERVIEW the objects in posts are deleted. Recently, we presented an automated verification frame- A data store invariant about the Discourse application could work [5] for verifying the actions that update the data, which be that each Tag object in the data store is associated with at 1) translates the action code and user-specified invariants into a least one Post object. In order to verify such an invariant, we formal model we call Abstract Data Store (ADS), 2) translates need to prove that each action that updates the data store (such verification queries about ADS specifications into first order as the one shown in Figure 1) preserves the invariant. logic (FOL), and 3) verifies them using an off-the-shelf FOL As we discussed above, actions with loops are especially theorem prover [32]. hard to verify when translated to FOL. The observation The key obstacle in automated verification of actions is that lead to the technique we present in this paper is, the verification of loops. The inclusion of even simple loops in loops can often be modeled in a way that does not require actions greatly increases verification time, and it also increases iteration interdependency. And, in such cases, it is possible to the chance that the theorem prover times out without ever translate the loops to FOL in a way that is more amenable to reaching a conclusion on whether the given action breaks verification. We call this alternate execution model for loops an invariant. Upon manual inspection of the theorem prover that removes iteration interdependency coexecution. deduction logs, we found out that the key issue that the Consider the action shown in Figure 1. Assume that the theorem prover struggles with is reasoning about the iteration initial data store state contains three Post objects p1, p2 and interdependencies in loops. p3 and three Tag objects t1, t2 and t3, with the following In order to verify loops, it is necessary for the automated associations: hp1, t1i, hp2, t1i, hp2, t2i, hp3, t2i. Also, assume theorem prover to deduce the rules that define which objects that the loop iterates on all the Post objects in the order and associations will exist after the loop has executed. To ac- p1, p2, p3 (i.e., the variable posts is the ordered collection complish this, the theorem prover has to compute the transitive of p1, p2, p3). closure of the loop body which is not expressible in FOL, Given this initial state, the standard sequential execution hence FOL theorem provers cannot make such deductions. semantics of the loop in lines 6 to 8 in Figure 1 is shown in Lacking that capability, the theorem prover attempts to deduce Figure 2(a) where gray rectangles with rounded corners denote the state of the data store after the first iteration has executed, states of the data store, and solid line arrows denote iteration then after the second iteration has executed, etc. Since we do executions. not bound the number of iterations, unless the theorem prover The first iteration, executed for Post p1, deletes all Tags of discovers a way to violate the invariant without reasoning p1 and their associations. We identify the operations executed about the loop, which is a very specific case, the automated by this iteration and summarize them as a delta of this operation (i.e., the changes in the data store state caused by this A. Data Store t1 operation). The first iteration removes and all associations Semantically, an abstract data store is a structure DS = t1 hp1, t1i hp2, t1i of ( and ). Similarly, the second iteration hC, R, A, Ii where C is a set of classes, R is a set of relations, Tags p2 t2 deletes all of , resulting in a delta that removes , A is a set of actions, and I is a set of invariants. hp2, t2i and hp3, t2i. Finally, the third iteration does not alter The set of classes C identifies the types of objects that the data store state as p3 is not associated with any Tags at can be stored in the data store. Each class can have a single that point. superclass or no superclass (superclass(c) ∈ C ∪ {⊥}) and, Figure 2(b) demonstrates the alternate coexecution seman- transitively, the superclass relation cannot contain cycles. A tics for the same loop. Instead of executing iterations sequen- relation r = hname, co, ct, cardi ∈ R contains the name of tially to reach the post-state, we first identify the delta for each the association, an origin class co ∈ C, a target class ct ∈ C iteration directly from the pre-state, in isolation from other and a cardinality constraint card (such as one-to-one, one-to- iterations. As expected, the first iteration (the one for p1) is many etc.). identical to the one from Figure 2(a). However, the iteration 1) Data Store States: Given a data store DS = for p2 deletes all Tags of p2, its delta removing t1 and t2 hC, R, A, Ii, the set of all possible data store states is denoted and all their associations. Note that this delta is different from as DS. Each data store state is a structure hO,T i ∈ DS where the delta of the sequential execution iteration for p2 shown O is a set of objects and T is a set of tuples. Together, we in Figure 2(a). Similarly, the iteration for p3 deletes t2 and refer to objects and tuples as entities. all associations whereas sequential execution for p3 produced We simplify our notation by using certain operators on ob- an empty delta. The deltas of these independent executions jects and tuples indiscriminately. For a state s = hO,T i ∈ DS, are combined together using the delta union operation, which object o and tuple t that may or may not be in s: in this case returns a union of all the delete operations (we s ∪ {o} = hO ∪ {o},T i s ∪ {t} = hO,T ∪ {t}i formally define the deltas and the delta union operation in s \{o} = hO \{o},T i s \{t} = hO,T \{t}i Section IV). In this example, the unified delta removes t1, o ∈ s ⇔ o ∈ O t ∈ s ⇔ t ∈ T t2 and their associations. Finally, we use the unified delta Objects are instances of classes, whereas tuples are instances to migrate from the pre-state to the post-state in one step, of relations. Each object o ∈ O is an instance of a class reaching the same post-state we acquired using sequential c ∈ C denoted by c = classof(o). Each tuple t ∈ T is in execution as shown in Figure 2(b). We call this one step the form t = hr, oo, oti where r = hname, co, ct, cardi ∈ R execution semantics based on the unified delta, coexecution. and classof(oo) = co and classof(ot) = ct. For a tuple For some loops, based on the dependencies among loop iter- t = hr, oo, oti we refer to oo as the origin object and ot as the ations, coexecution will yield a different result than sequential target object, and ∀t = hr, oo, oti, ∀s ∈ DS : t ∈ s ⇒ oo ∈ execution. However, coexecution is equivalent to sequential s∧ot ∈ s. Cardinality constraints of each relation r ∈ R must execution for some classes of interdependencies. Note that, be satisfied by every data store state in DS. in our example, iterations are interdependent (since the p1 Note that, in the abstract data store model, objects do not iteration prevents the p2 iteration from deleting t1 and its have basic fields. This model focuses on objects and how they associations), and yet coexecution and sequential execution are associated with one another. Basic fields can be introduced produce identical results. In Section IV-C, we formally define to the model by treating them as associations to a basic type the Coexecutability Condition that, if true for a given loop, class. For example, an age field would be an association to guarantees that coexecution of the loop is equivalent to se- one object of the Integer class (this way of treating basic quential execution of iterations. In Section V we implement fields has been used in Alloy [19], [20] for example). Observe this condition as a static program analysis, and based on this that, in the abstract data store model, all updates consist of analysis we are able to translate loops to FOL in a manner creations and deletions of entities (i.e., objects and tuples). that is more amenable to verification. 2) Actions and Invariants: Given a data store DS = In our analysis, we assume that actions that update the data hC, R, A, Ii, A denotes the set of actions. Each action store are executed as transactions (which is the case for most a ∈ A corresponds to a set of possible state transitions MVC-based web applications). Hence, the database ensures hhO,T i, hO0,T 0ii ⊆ DS × DS. Actions characterize possible that actions do not interfere with one another during runtime. updates to data store states (i.e., the transitions between the Effectively, all actions can be considered to execute within states). atomic blocks, and hence, so are the loops we are verifying. Given a data store DS = hC, R, A, Ii, I is the set of This gives us to freedom to model operations within a loop invariants. An invariant i ∈ I corresponds to a Boolean in any order, as long as the final effects of the loop execution function i: DS → {false, true} that identifies the set of data are identical to the effects of sequential execution. store states which satisfy the invariant. 3) Behaviors: Given a data store DS = hC, R, A, Ii, a III.FORMAL MODEL behavior of a data store DS is an infinite sequence of data store states s , s , s ... where In this section we present the formal model we use for 0 1 2 • ∀k ≥ 0 : sk ∈ DS ∧ ∃a ∈ A : hsk, sk + 1i ∈ a modeling data stores. • ∀k ≥ 0, ∀i ∈ I : i(sk) = true In other words, each behavior of a data store starts with an our abstraction and the behavior of ORM tools when objects initial data store state for which all invariants hold, and each are deleted before triggering iterations. pair of consecutive states corresponds to execution of a data IV. COEXECUTABILITY store action. Given a data store DS = hC, R, A, Ii, all states that appear Automated reasoning about loops is difficult since it is nec- in a behavior of DS are called the reachable states of DS essary to take into account many possible intermediate states that can appear during the loop execution. A loop invariant can and denoted as DSR. provide a compact characterization of the intermediate loop B. Statements states and help with automated reasoning, but loop invariant An action is composed of a finite number of statements. discovery is itself a difficult problem, and loop invariants A statement can be represented as a set of pairs of states are typically specified manually. The coexecution concept we hs, s0i ⊆ DS × DS which semantically represent possible introduce in this paper is a novel approach that enables us state transitions by means of that statement. For a statement to reason about the loop behavior without loop invariants and 0 0 S and two states s and s , we will use [s, s ]S to denote that reasoning about intermediate states. hs, s0i is a possible execution (state transition) of S. Below, we formally define coexecution. We also give a For example, a statement Sc that creates an object of class condition under which coexecution is equivalent to sequential C could be defined as: execution, and we call this property coexecutability. Note that 0 [s, s ]Sc ↔ ∃oc : classof(oc) = C∧ multiple iterations of a loop correspond to repeated sequential 0 0 oc ∈ s ∧ oc 6∈ s ∧ ∀e :(e ∈ s ↔ e ∈ s ) ∨ e = oc execution of the loop body. In order to simplify our presenta- As an other example, consider a Block statement B which tion, we will discuss how any two statements A and B can be coexecuted (which, for example, can represent the execution is a sequence of statements Ai, for 1 ≤ i ≤ n for some n. of the same loop body twice for different values of the iterator Statement A1 transitions between states s and s1 if and only variable). This discussion can be extended to coexecution of if [s, s1]A1 . The set of states that the sequence A1; A2 can transition to from s is equal to the union of all states that any number of statements, and, hence, is directly applicable to loops by treating iterations of a loop as separate statements. A2 can transition to from any state s1 such that [s, s1]A1 . 0 0 In this section, for brevity and simplicity, we will as- Therefore, ∀s, s ∈ DS × DS :[s, s ]A1;A2 ⇔ (∃s1 ∈ DS : 0 sume that the data store we reason about contains only one [s, s1]A1 ∧ [s1, s ]A2 ). If we extrapolate this reasoning to the whole block B: class called Class and only one relation called relation 0 0 Class ∀s, s ∈ DS × DS :[s, s ]B ⇔ (∃s1, s2 . . . sn−1 ∈ DS × · · · × DS : that associates objects of type with objects of type 0 Class with many-to-many cardinality. This allows us to use [s, s1]A1 ∧ [s1, s2]A2 ∧ · · · ∧ [sn−1, s ]An ) 1) ForEach Loop Statement: A ForEach loop statement minimal notation for data-store states, avoiding the need to (FE) is defined by two parameters: the set of objects being explicitly provide type information. For example, the state iterated over, denoted as α, and the block of code B that will {a, b, c, ha, bi, ha, ci} contains exactly three objects of type be executed for each member of α. Let |α| = n. The order Class, as well as two tuples of the relation type that associate of iteration is non-deterministic. B has access to the iterated object a with the other two. object and, therefore, the set of possible executions of B is As we discussed earlier, given two statements A and B, affected by the iterated variable. Effectively, each iteration is their sequential composition A; B is defined by the sequential a different state transition: we use notation [s, s0] to refer execution formula below: Bo ∀s, s0 ∈ DS × DS :[s, s0] ⇔ ∃s :[s, s ] ∧ [s , s0] (3) to a possible execution of an iteration executed for object o. A;B i i A i B In this case, we refer to o as the trigger object. The formula A. Execution Deltas defining the FE loop is: In order to define coexecution, we first need to define a way 0 0 ∀s, s ∈ DS × DS :[s, s ]FE ⇔ ∃o1 . . . on ∈ α, ∃s1 . . . sn ∈ DS : to express the effects of executing statements. Let us define ∀i, j ∈ [1 . . . n]: i 6= j ⇔ o 6= o ∧ (1) i j a structure hOc,Tc,Od,Tdi for that purpose: Oc and Tc are [s, s ] ∧ [s , s ] ∧ · · · ∧ [s , s ] ∧ s = s0 1 Bo1 1 2 Bo2 n−1 n Bon n (2) sets of objects and tuples, respectively, that are created by a In other words, a pair of states is an execution of a given loop given execution, and Od and Td are sets of objects and tuples, FE if and only if there exists a selection of objects from α respectively, that are deleted by a given execution. Let us call and a sequence of states such that (1) the said selection of this structure the delta of an execution. objects is a permutation of α, and (2) the said sequence of Given an execution from state s to s0, we denote the delta states is achievable by triggering iterations in the order of the of this execution as s0 s. For example, if given two states object permutation (2). s1 = {a, b, ha, bi} and s2 = {a, b, c}, then s2 s1 = There exists a corner case where an object that is about h{c}, {}, {}, {ha, bi}i and s1 s2 = h{}, {c}, {ha, bi}, {}i. to trigger an iteration gets deleted by a prior iteration. We A delta is consistent if and only if its corresponding create did not include this corner case as part of the definition as and delete sets are mutually exclusive, i.e., it introduces considerable complexity, but the semantic is as Oc ∩ Od = Tc ∩ Td = ∅ ∧ follows: such an iteration will still execute with an empty set (∀t = hoo, oti ∈ Tc : oo 6∈ Od ∧ ot 6∈ Od) ∧ iterator variable value. This behavior is in concordance with (∀t = hoo, oti ∈ Td : oo 6∈ Oc ∧ ot 6∈ Oc) In order to combine the changes done by different execu- s s tions, we introduce the union (∪) of two deltas: s si ⊖ s ∀δ1 = hOc1,Tc1,Od1,Td1i, δ2 = hOc2,Tc2,Od2,Td2i : sA ⊖ s sB ⊖ s
δ1 ∪ δ2 = hOc1 ∪ Oc2,Tc1 ∪ Tc2,Od1 ∪ Od2,Td1 ∪ Td2i si sA sB
We will use this operation to merge the changes done by s' ⊖ si (s ⊖ s) U (s ⊖ s) U A B independently executed statements. Note that the result of s' s' the union operation may not be a consistent delta even if all the arguments were individually consistent. We call deltas Fig. 3. Sequential execution vs. Coexecution conflicting if and only if their union is not consistent. 1) Delta Apply Operation: We will introduce the apply The inverse implication also holds: operation, that takes a state s and a consistent delta δ and ∀o ∈ Op :(o ∈ O ∨ o ∈ Oc1 ∪ Oc2) ∧ o 6∈ Od1 ∪ Od2 updates the state as dictated by the delta. The result is a new ⇒ ∀o ∈ Op :(o ∈ O ∧ o 6∈ Od1 ∧ o 6∈ Od2)∨ state that contains all objects and tuples that existed in s and (o ∈ Oc1 ∧ o 6∈ Od2) ∨ o ∈ Oc2 were not deleted by δ, and all objects and tuples created by ⇒ ∀o ∈ Op : o ∈ Os δ. In addition, whenever an object is deleted, all the tuples A consequence of this property is that the delta apply referring to that object are deleted as well. The apply operation operation is commutative for non-conflicting deltas (as delta maps a state and a consistent delta into a state, and we use the union is trivially commutative). ⊕ operator to denote this operation. Formally, given a state s and a consistent delta δ = hOc,Tc,Od,Tdi: B. Coexecution ∀s = hO,T i ∈ DS, s0 = hO0,T 0i ∈ DS : s0 = s ⊕ δ ⇔ Coexecution of two statements A and B, which we denote 0 (∀o : o ∈ O ⇔ (o ∈ O ∨ o ∈ Oc) ∧ o 6∈ Od) ∧ as A|B, means finding the deltas of independent executions of 0 (∀t = hoo, oti : t ∈ T ⇔ (t ∈ T ∨ t ∈ Tc)∧ both statements starting from the pre-state, finding the union 0 0 t 6∈ Td ∧ oo ∈ O ∧ ot ∈ O ) of those deltas, and applying the union to the pre-state. This is For example, given a state s = {a, b, c, ha, bi} and a delta visualized in Figure 3, similar to Figure 2 but applied to two δ = h{c}, {b}, {ha, ci}, {}i, s ⊕ δ = {a, c, ha, ci}. Notice how generic states and any two statements A and B. Formally, 0 0 the creation of object c was idempotent given that s already ∀s, s ∈ DS × DS :[s, s ]A|B ⇔ ∃sA, sB ∈ DS × DS : 0 had that object, and that deletion of object b implied that all [s, sA]A ∧ [s, sB ]B ∧ s = s ⊕ ((sA s) ∪ (sB s)) (4) tuples related to b were deleted as well. Notice that coexecution, because of the delta apply operation, We can observe that ∀s, s0 ∈ DS × DS : s0 = s ⊕ (s0 s). is defined only if no two possible deltas from the pre-state via This follows directly from the definition, as s0 s will create statements A and B are conflicting. all entities (objects and tuples) in s0 that are not in s and delete For example, if statement A adds a new object to a state, all the entities that are part of s and not s0. and statement B deletes all tuples from a state, executing these Lemma 1: Given any two non-conflicting deltas δ1 and δ2: statements from the state s = {a, b, ha, bi} independently will
∀s ∈ DS :(s ⊕ δ1) ⊕ δ2 = s ⊕ (δ1 ∪ δ2) yield the following states: This lemma follows directly from definitions of delta union sA = {a, b, c, ha, bi}, sB = {a, b} and the apply operation. For simplicity we will limit the proof Therefore, to objects, but the same proof can be extended to cover tuples. sA s =h{c}, {}, {}, {}i Given a state s = hO,T i, non-conflicting deltas δ1 = sB s =h{}, {}, {}, {ha, bi}i hOc1,Tc1,Od1,Td1i and δ2 = hOc2,Tc2,Od2,Td2i, and post- (sA s) ∪ (sB s) =h{c}, {}, {}, {ha, bi}i states ss = hOs,Tsi = (s ⊕ δ1) ⊕ δ2 and sp = hOp,Tpi = and, the coexecution A|B will result in the following state: s ⊕ (δ ∪ δ ), we proceed to show that any object in s must 1 2 s s ⊕ ((sA s) ∪ (sB s)) ={a, b, c} be in s , and that any object in s must be in s . p p s which is the same state to which sequential execution of A ∀o ∈ O :(o ∈ O ∧ o 6∈ O ∧ o 6∈ O ) ∨ s d1 d2 and B would transition from s, which means that A and B (o ∈ O ∧ o 6∈ O ) ∨ o ∈ Oc c1 d2 2 are coexecutable. ⇒ ∀o ∈ Os :(o ∈ O ∧ o 6∈ Od1 ∧ o 6∈ Od2) ∨ (o ∈ Oc1 ∨ o ∈ Oc2) C. Coexecutability Condition Because these deltas are non-conflicting, (o ∈ Oc1 ∨ o ∈ Not all statements are coexecutable since coexecution re- Oc2) ⇒ (o 6∈ Od1 ∧ o 6∈ Od2). Joining this implication with quires non-conflicting deltas, and even if their deltas are the previous formula: not conflicting, the result of coexecution may not be equal ∀o ∈ Os :(o ∈ O ∧ o 6∈ Od1 ∧ o 6∈ Od2)∨ to the result of sequential execution. Below we define a ((o ∈ Oc1 ∨ o ∈ Oc2) ∧ o 6∈ Od1 ∧ o 6∈ Od2) coexecutability condition, such that, given any two statements ⇒ ∀o ∈ Os :(o ∈ O ∨ o ∈ Oc1 ∨ o ∈ Oc2) ∧ o 6∈ Od1 ∧ o 6∈ Od2 A and B, if A and B satisfy the coexecutability condition, ⇒ ∀o ∈ Os :(o ∈ O ∨ o ∈ Oc1 ∪ Oc2) ∧ o 6∈ Od1 ∪ Od2 then their sequential execution is always equivalent to their ⇒ ∀o ∈ Os : o ∈ Op coexecution. 1) Statement Reads, Creates and Deletes: We model each 2) Coexecutability Condition Definition and Proof: We can statement as a set of (potentially non-deterministic) state tran- now define the coexecutability condition and our main result: sitions. This definition of statements is very general and widely Theorem 1: Given two statements A and B, if the following applicable, but makes it difficult to identify a statement’s read condition holds: set. We need to have access to a statement’s read set in order to ∀s ∈ DS, ∀e ∈ s : (reads(A, e) ⇒ ¬ creates(B, e) ∧ ¬ deletes(B, e)) reason about interdependencies of statements. In the remainder ∧(creates(A, e) ⇒ ¬ reads(B, e) ∧ ¬ deletes(B, e)) of this subsection we define how to infer a statement’s read, ∧(deletes(A, e) ⇒ ¬ reads(B, e) ∧ ¬ creates(B, e)) create and delete sets from its transition set. then coexecution of A and B is equivalent to their sequential First, we define what it means for a delta set ∆ to 0 0 0 execution (i.e., ∀s, s ∈ DS :[s, s ]A;B ⇔ [s, s ]A|B). In other cover a given statement A from a given set of states S = words, A and B are coexecutable. {s1, s2 . . . sn}: The proof of the above theorem is tedious due to the cover(∆, A, S) ⇔(∀s ∈ S, s0 ∈ DS :[s, s0] ⇒ (∃δ ∈ ∆ : s0 = s ⊕ δ)) A differences between objects and tuples and how they depend ∧ (∀s ∈ S, δ ∈ ∆ : [s, s ⊕ δ]A) I.e., a set of deltas ∆ covers a statement A from a set of states on one another (e.g., deleting an object deletes all associated S if and only if every state transition achievable from any state tuples, and creating a tuple that is associated with a non- in S via A is achievable from the same state via some delta existing object is impossible etc.). In order to simplify the in ∆, and any transition achievable from any s ∈ S via any proof, without loss of generality, we will outline the proof by delta in ∆ is a transition of A. focusing only on the creation and deletion of objects. A delta cover precisely describes all possible executions of First, the condition in Theorem 1 implies that no statement a statement from a set of states using a single set of deltas. can delete an object that can be created by the other. Therefore Intuitively, the existence of a delta cover shows that the given the deltas from any s via A and B are not conflicting, and coexecution is always defined. statement does not need to distinguish between the covered 0 states in order to decide how to proceed with execution. Let us take any two states s and s and assume that there Note that this does not mean that ∆ is the collection of all exists a state sA such that [s, sA]A. deltas achievable from states in S via A. We can demonstrate Let us consider any object oc that is created by sA s. this by considering a delete-all statement with S = DS, and All objects created by sA s are not read by B. Therefore, ∆ containing a single delta that creates no entities and deletes there exists a delta cover ∆ that describes all transitions from all entities that exist in any state in DS. In this particular case, s ∪ {oc} and s \{oc} via B. Since sA s is creating oc we the delta in ∆ is different than any delta achievable from any know that oc 6∈ s, so this delta cover describes all transitions finite s ∈ S via A, yet it is true that this ∆ covers A from S. from s and s ∪ {oc} via B. We can now define what it means for a statement A to read Let us inspect every member of such a delta set ∆. If any an entity e: δ ∈ ∆ creates anything in s then this operation is always reads(A, e) ⇔ ∃s ∈ DS : ¬∃∆ ⊆ DS∆ : cover(∆, A, {s ∪ {e}, s \{e}}) redundant for states s and s ∪ {oc}, so we can remove this This means that A reads e if and only if there exists a pair of operation and still have a delta cover of B over {s, s ∪ {oc}}. states s ∪ {e} and s \{e} that cannot be covered by any ∆ for We can similarly remove all deletions of all objects outside s∪ the statement A. This implies that A’s actions are dependent {oc} as redundant operations. Since oc cannot be deleted by B, on e’s existence in some way, for example if it is deciding we know that this trimmed delta cover does not delete anything whether to delete or not delete some object other than e based outside s. From the definition of a delta cover it follows that on e’s existence. I.e., if statement A reads entity e, then in the resulting trimmed delta cover is, in fact, precisely the set order to describe the behavior of A, we need to specifically of all deltas achievable from s via B. refer to e. Similar reasoning can be followed for any object od that is Based on this definition, a delete-all statement does not deleted by sA s. It follows that the set of deltas achievable read any entity e because, for any two states s ∪ {e} and from s via B covers B over {s, s \{od}}. s \{e} for any state s, there exists a ∆ that covers it: Because the set of all deltas achievable from s via B covers ∆ = {h{}, {}, objects of (s ∪ {e}), tuples of (s ∪ {e})i}. {s, s∪{oc}}, directly from the definition of delta covers (with Hence, using this definition, we are able to infer that two the prior assumption that an sA s.t. [s, sA]A exists): 0 delete-all statements that are executed back to back are co- ∀s, s ∈ DS × DS : ∃sA :[s, sA]A ⇒ 0 0 executable, although in sequential execution, behavior of the ([s ∪ {oc}, s ]B ⇔ ∃sB ∈ DS :[s, sB ]B ∧ s = sB ∪ {oc}) second delete-all statement changes (it becomes a no-op) due Let us generalize and say that s s creates objects o for to the presence of the first delete-all statement. A ci some 1 ≤ i ≤ n and deletes objects o for some 1 ≤ i ≤ n . We can also define what it means for a statement A to create c di d If we were to now enumerate all objects created and deleted or delete an entity similarly: 0 0 0 by s s one by one and apply the above reasoning to them, creates(A, e) ⇔ ∃s, s ∈ DS :[s, s ]A ∧ e 6∈ s ∧ e ∈ s A deletes(A, e) ⇔ ∃s, s0 ∈ DS :[s, s0] ∧ e ∈ s ∧ e 6∈ s0 the resulting formula would be: A 0 Recall that, since we are abstracting away the basic types, ∀s, s ∈ DS × DS : ∃sA ∈ DS :[s, sA]A ⇒ ([s ∪ {o , . . . o }\{o , . . . o }, s0] ⇔ ∃s ∈ DS : any update to the data store state consists of creation and c1 cnc d1 dnd B B [s, s ] ∧ s0 = s ∪ {o , . . . o }\{o , . . . o }) deletion of entities (i.e., objects and associations). B B B c1 cnc d1 dnd Category Node Children 1 program A n a l y s i s Block *Statement 2 var data: AnalysisData; Either *Block 3 f u n c t i o n Boolean If Formula, Block, Block 4 Analyze(loop: ForEach): 5 data.clearAllData; ObjectSetStmt Object Set 6 AnalyzeStatement(loop); Statement Assign Variable, Object Set 7 f o r type in DataStoreTypes: Delete Object Set 8 operations = data.operationsDoneOn( type ); CreateTuple Object Set, Relation, Object Set 9 i f (operations .hasTwoDifferentOpsWithOneGlobal ()) then DeleteTuple Object Set, Relation, Object Set 10 r e t u r n F al s e ; ForEach Variable, Object Set, Block 11 end ; 12 end ; Variable 13 r e t u r n True ; CreateObjectSet Class 14 end Dereference Object Set, Relation 15 AllOfClass Class 16 procedure AnalyzeStatement(stmt: Statement) Subset Object Set 17 case type ( s t m t ) of Object Set OneOf Object Set 18 Block : 19 f o r subStmt in stmt. subStatements do Union *Object Set 20 AnalyzeStatement(subStmt); Empty 21 end ; DereferenceCreate Object Set, Relation 22 D e l e t e : 23