
Serializability for Eventual Consistency: Criterion, Analysis, and Applications


Lucas Brutschy, Dimitar Dimitrov, Peter Müller, and Martin Vechev
Department of Computer Science, ETH Zurich, Switzerland
{lucas.brutschy, dimitar.dimitrov, peter.mueller, martin.vechev}@inf.ethz.ch

Abstract

Developing and reasoning about systems using eventually consistent data stores is a difficult challenge due to the presence of unexpected behaviors that do not occur under sequential consistency. A fundamental problem in this setting is to identify a correctness criterion that precisely captures intended application behaviors yet is generic enough to be applicable to a wide range of applications.

In this paper, we present such a criterion. More precisely, we generalize conflict serializability to the setting of eventual consistency. Our generalization is based on a novel dependency model that incorporates two powerful algebraic properties: commutativity and absorption. These properties enable precise reasoning about programs that employ high-level replicated data types, common in modern systems. To apply our criterion in practice, we also developed a dynamic analysis algorithm and a tool that checks whether a given program execution is serializable.

We performed a thorough experimental evaluation on two real-world use cases: debugging cloud-backed mobile applications and implementing clients of a popular eventually consistent key-value store. The experimental results indicate that our criterion reveals harmful synchronization problems in applications, is more effective at finding them than prior approaches, and can be used for the development of practical, eventually consistent applications.

Categories and Subject Descriptors D.2.5 [Software Engineering]: Testing and Debugging; C.2.4 [Computer-Communication Networks]: Distributed Systems — Distributed

However, relaxations of strong consistency come at a price, as applications may now experience unexpected behaviors not possible under strong consistency. These behaviors may lead to serious errors, and make development of such applications more challenging because the application itself is now responsible for guaranteeing strong consistency where required. Serializability then becomes an important criterion for reasoning about the correctness of the application: if serializability holds, one can reason about the application without considering the effects of weak consistency. Further, serializability violations can guide the placement of correct synchronization. While desirable, the general notion of serializability is very difficult to use in practice. A key issue is that deciding whether a concurrent execution is serializable is NP-hard [28]. This is one of the key reasons why stronger but computationally tractable serializability criteria, such as conflict serializability [28], have been explored (see also [4]).

Key Properties To be practically useful, a serializability criterion for eventual consistency must possess at least three key properties: (a) it must be strong enough so that the associated decision problem is computationally feasible, (b) it must be precise enough so that it does not rule out desired application behaviors, and (c) it should handle the weak semantics of modern data stores.

Designing a serializability criterion that addresses all three properties is very challenging. For example, general serializability satisfies conditions (b) and (c), but not condition (a). On the other hand, conflict serializability satisfies condition (a), but neither (b) nor (c). Subsequent developments on conflict serializability improved on (b) and (c) but, interestingly, not on both at the same time.
Keywords , eventual consistency, serializability For example, the work of Weihl [38] improves the precision of conflict serializability under strong consistency by reasoning about 1. Introduction commuting high-level operations. Other works, for instance [3, 14], incorporate weaker semantics, such as and causal Modern distributed systems increasingly rely on replicated data consistency, but do not improve the precision and still work with stores [12, 19, 20, 33] in order to achieve high scalability and low-level reads and writes. availability. As dictated by the CAP theorem [16], consistency, availability and partition-tolerance cannot be achieved at the same Key Challenge The main challenge then is: can we obtain a time. While various trade-offs exist, most replicated stores tend to criterion that covers all three properties above: is computationally provide relaxed correctness notions that are variants of eventual feasible, is precise enough to be used for practical applications, and consistency: updates are not immediately but eventually propagated can deal with (very) weak semantics such as eventual consistency? to other replicas, and replicas observing the same set of operations This Work To addresses the above challenge, we propose a new reflect the same state. serializability criterion that generalizes conflict serializability to eventually consistent semantics, while at the same time taking into account high-level operations. This enables precise reasoning about replicated data types (such as replicated maps and lists [9, 30]), which are commonly used in modern distributed applications. Since we assume only eventual consistency, our criterion immediately applies to all consistency levels that strengthen eventual consistency in various ways [9, 10, 23, 34]. The key technical insight of our work is that a precise criterion for eventual consistency needs to take into account not only com- mutativity, but also that some operations absorb (mask) the effects ar u1 u2 u1 u2 of others. Absorption is one of the main properties that permits an execution possible under eventual consistency to be recognized as vi vi ⊕ ⊕ equivalent to a strongly consistent one. The core technical problem vi here is finding a way to combine commutativity and absorption q1 q2 q1 q2 reasoning: a combination that is natural for reads and writes but (a) Visibility, (b) Dependency, non-trivial for abstract operations with rich semantics. arbitration anti-dependency We note that in practice, a serializability criterion need not (and typically should not) be used on the entire application. It is most Figure 1: Execution of motivating example, with operations: effective when used in a targeted manner for program parts intended u1 map(G,I).user.setIfEmpty("Alice") to be serializable (e.g., payment check-out), while not used for u map(G,I).user.setIfEmpty("Bob") parts that can tolerate weak consistency (e.g., display code) and can 2 q map(G,I).user returns "Alice" benefit from higher performance. Indeed, in our evaluation, we used 1 q map(G,I).user returns "Bob" our criterion in such a targeted way. This usage scenario is similar 2 to how standard conflict serializability is used for shared memory concurrent programming (e.g., [37]). To substantiate the usefulness of our criterion, we built a dynamic Here, Players is a distributed map from a game G and a po- analyzer that checks whether the criterion holds on program execu- sition I to a user participating in the game. 
A user identifier is tions, and evaluated our analyzer on two application domains. First, stored in field user. For now, suppose that each operation such as setIfEmpty runs atomically in its own transaction (our results we analyzed 33 small mobile apps written in TOUCHDEVELOP [35], a framework that uses weakly consistent cloud types [9]. The ex- extend to transactions with multiple operations). The intended be- perimental results indicate that our serializability criterion is able havior is that all players have a consistent view of the Players map to detect violations that lead to some difficult-to-catch errors. Sec- and, in particular, that it is not possible for two players to assume ond, we implemented the benchmark TPC-C [36] using they have obtained the same position in a game. This behavior is guaranteed for executions under strong consistency: if two compet- the eventually consistent data store RIAK [19] and show how our criterion can guide developers to derive correct synchronization for ing accesses to the same position are performed concurrently, one client implementations. of the setIfEmpty operations will remain without effect, and the corresponding client has to try the next position in the game. Contributions The main contributions of our paper are: To illustrate the issues with eventual consistency, suppose we have two users, ‘Alice’ and ‘Bob’, trying to acquire the same position • An effective serializability criterion for clients of eventually I in game G. Here, each user executes a setIfEmpty update, which consistent data stores. Our criterion generalizes conflict serializ- will eventually propagate to the replica of each user, followed by a ability to deal with weakly consistent behaviors and high-level query on the map (executed on the user’s own replica). That is, as data types. shown in Figure 1, ‘Alice’ performs update u1 and query q1, while • Polynomial-time algorithms to check whether the criterion holds ‘Bob’ performs update u2 and query q2. The figure also shows a on a given program execution. possible execution of these updates and queries. In the graph, vi • An implementation of our algorithms for two data stores: the designates whether an update is visible to a query (i.e., whether the update was applied to the replica on which the query is executed TOUCHDEVELOP cloud platform for mobile device applications before the query took effect). By ar, we denote an arbitration order and the RIAK. in which all (conflicting) updates are ordered by the system. A query • A detailed evaluation that indicates that our criterion is useful for therefore observes a database state that is the result of applying such finding previously undetected errors and can help in building cor- visible updates in the arbitration order. rect and scalable applications running on eventually consistent Returning to our example in Figure 1a, here, both u1 and u2 data stores. are visible to q1, but only u2 is visible to q2. Since u1 is ordered before u by arbitration, q will read the value written by u (u Outline In the next section, we provide an overview of the main 2 1 1 2 will not override the value since it is a setIfEmpty operation), concepts introduced in this paper and discuss how they relate to while q will read the value written by the only visible update u . previous work. Section 3 describes our formal system model, and 2 2 Consequently, both clients conclude that they have acquired the Section 4 presents our main result, a new serializability criterion. position in the game. 
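To make the anomaly concrete, the following Python sketch (ours, not from the paper) replays the schedule of Figure 1a: a query applies the updates visible to it in arbitration order, and setIfEmpty only writes if the field is still unset.

    # Minimal sketch (ours) of the schedule in Figure 1a, assuming that
    # setIfEmpty keeps the first value written in arbitration order.

    def set_if_empty(state, value):
        """Apply a single setIfEmpty update to the field."""
        return value if state is None else state

    def query(visible, arbitration):
        """A query observes the visible updates, applied in arbitration order."""
        state = None
        for value in arbitration:
            if value in visible:
                state = set_if_empty(state, value)
        return state

    u1, u2 = "Alice", "Bob"        # u1 = setIfEmpty("Alice"), u2 = setIfEmpty("Bob")
    ar = [u1, u2]                  # arbitration orders u1 before u2

    q1 = query({u1, u2}, ar)       # both updates visible to Alice's query
    q2 = query({u2}, ar)           # only u2 visible to Bob's query

    assert (q1, q2) == ("Alice", "Bob")   # each client believes it owns position I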
The resulting behavior is not serializable: there On this basis, Section 5 develops a dynamic detection algorithm, is no sequential execution of u , q , u and q in which the queries which is then evaluated in two practical settings in Section 6 and 1 1 2 2 q and q return these values. Section 7. Section 8 discusses related work and Section 9 concludes. 1 2 In this work, we use the classic notions of dependency and anti- dependency [1] to characterize serializability: intuitively, a query 2. Overview depends on a visible update in the execution if the result of the In this section, we provide an informal overview of the key chal- query would change had the update become invisible. In Figure 1b, lenges and illustrate our solution to these. Full formal details are dependencies are marked with a ⊕ arc from the update to the query. presented in later sections. Similarly, a query anti-depends on an update that is not visible if the query result would change had the update become visible. Figure 1b Motivating Example Consider the following code fragment, shows anti-dependencies as arcs labeled from the query to the adapted from a mobile gaming library1 update. Here, q2 anti-depends on u1, because u1 is ordered before Players.at(G, I).user.setIfEmpty(userID) u2 in the arbitration order and, by the semantics of setIfEmpty, q2 if Players.at(G, I).user != userID would therefore observe a different value if u1 had become visible. // try next position If the four relations of dependency, anti-dependency, program order po, and arbitration order ar form a cycle then the execution is not 1 “cloud game lobby”, written in TOUCHDEVELOP and available at http: serializable [14]. Indeed, in our example, when we put all these four //touchdevelop.com relations together, we do have a cycle, namely: u1, u2, and q2. x.add(1) x.add(1) weak execution. For example, consider an add-wins set [30], where conflicts between concurrent add and remove operations on the ⊕ ⊕ same element are resolved by considering the element to be added. The correctness of an application may then be witnessed by a serial x.get():1 x.get():1 execution, in which the add is executed after the remove. The behaviors of most replicated data types [6, 9, 18, 31] allow such serial witnesses. However, there are some exceptions to this rule. For Figure 2: A trivial serializability violation with two counters example, the multi-value register of Dynamo [12] has the following semantics: if two writes to the register happen concurrently, then Commutativity and Absorption A precise serializability criterion a later read observing the writes will retrieve both of the written requires a precise notion of dependency. For reads and writes, this values; otherwise, it will retrieve one of the values (the last one). is fairly straightforward: a read depends on the last-arbitrated write In the first case, there cannot be a serial witness for the correctness that is visible to it and accesses the same record. For operations of the application, as no serial execution of operations on the data on high-level data types such as counters, maps, sets, and tables, type results in multiple values being read. If one is to reason about however, this is not as straightforward. 
Consider for instance the such data types, this means that one can no longer provide a default case where there is a second function in the above gaming library (serial) witness execution, but requires user input to specify which that lists all games a player with a given userID participates in: witnesses are acceptable. Thus, such settings are not a natural fit for serializability, but a different condition, which allows users to Games := Players.select(_.user == userID) specify witnesses. Such manual intervention departs from the goal Suppose user ‘Alice’ reserves a position in a game via the of the paper, which is to define a widely usable generic condition. update setIfEmpty("Alice"), and concurrently ‘Bob’ lists the games that he participates in via the above query. Here, Alice’s 3. Weakly Consistent Systems update is not a dependency for Bob’s query as no matter whether it is visible or not, the result of the query will be the same: In this section, we present the model of weakly consistent systems Alice’s reservation does not influence Bob’s participation. The that we later use to reason about serializability. Our model is reason is that, even though both operations access the same data, formally equivalent to the one of Burckhardt et al. [8], but uses they actually commute. Thus, determining dependencies precisely Mazurkiewicz traces instead of totally ordered event sequences. This requires reasoning about commutativity. In this work, we show how choice makes reasoning about serializations more transparent, and to leverage commutativity properties of arbitrary operations in order is closer to the approach to weak memory by Shasha and Snir [32]. to capture the dependencies between them. In the systems we consider, several processes interact with a And while commutativity is useful, we can make our dependency weakly consistent data store, for example, a geo-replicated data store definition even more precise. Consider again the execution shown like Dynamo [12] or PNUTS [11]. Interaction between a process and in Figure 1b, but suppose that now we had used set instead of the store happens in a sequence of atomic actions that manipulate or setIfEmpty. Even though both updates do not commute with query the stored data. Our main subject of interest in such a system either of the two queries, the execution is serializable in the order is the set of possible interaction histories that it can exhibit. u1q1u2q2. Indeed, we no longer have a cycle as u2 hides the effect of u1 from q2, and therefore q2 does not depend on u1. We refer to 3.1 Actions this form of hiding as absorption. Again, if the operations are reads An action represents a primitive operation as issued by a process and writes, absorption is easy to define: updates absorb each other against the replicated data store. For example, storing 0 in a given if they write to the same record. However, for the richer operations record x is an action, denoted by x.set(0). Similarly, observing that considered in this work (e.g., those of high-level data types), the x holds the value 0 is another action, denoted by x.get():0. Formally, definition of absorption may be more involved: operations such as we treat an action as a primitive operation whose arguments and setIfEmpty may be absorbing, non-absorbing, or absorbing only results have been fixed to concrete values. under specific conditions. 
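To give a flavor of such reasoning, here is a small Python sketch (ours) of sufficient commutativity and absorption checks for a dictionary-like type with put, get, and size operations, in the spirit of the specifications shown later in Figure 4; the concrete predicates are illustrative, not part of the formal development.

    # Sketch (ours): sufficient syntactic checks for commutativity and
    # absorption between actions of a simple dictionary. Actions are tuples:
    # ("put", key, value), ("get", key, value), ("size", n).

    def commute(a, b):
        """Sufficient condition for a b == b a (order irrelevant)."""
        if a[0] == "put" and b[0] == "put":
            return a[1] != b[1] or a[2] == b[2]   # different keys, or same value
        if {a[0], b[0]} == {"put", "get"}:
            put, get = (a, b) if a[0] == "put" else (b, a)
            return put[1] != get[1]               # a get conflicts only with puts to its key
        if (a[0] == "put" and b[0] == "size") or (a[0] == "size" and b[0] == "put"):
            return False                          # a put may change the size
        return True                               # queries commute with queries

    def absorbs(later, earlier):
        """Sufficient condition for `earlier later == later` (earlier is masked)."""
        return earlier[0] == "put" and later[0] == "put" and earlier[1] == later[1]

    assert absorbs(("put", "k", 2), ("put", "k", 1))    # second put masks the first
    assert commute(("put", "k", 1), ("get", "j", 0))    # different keys commute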
To make our arguments simpler, we assume that each action is Targeted Serializability Checking Requiring serializability for all either an update or a query. An update may modify the store but operations in a program is typically too strong in practice. Consider does not indicate a return value. On the other hand, a query must for example the execution in Figure 2, where a serializability not modify the store but may return a value. Further, we assume violation occurs when a counter is incremented by two clients and that an update can always be applied, i.e., that it does not have any queried by each of them without observing the increments from the pre-condition. Our assumptions are non-restrictive as any action can other. Any application incrementing a counter and then querying be split into an query after an update, and any update can be made its value without using synchronization is susceptible to such an to do nothing if its pre-condition is not met. anomaly, but it is often harmless. For example, an application might We will model the weak semantics of the store based on the read the counter solely to display it to the user and there may sequential semantics of actions. We assume that the sequential action be no negative effects of displaying an outdated value. To avoid semantics is given by a safety specification, that is, a prefix-closed reporting violations in such cases, we allow the developer to provide set of finite action sequences, which we call legal. For example, a lightweight specification of a set of operation invocations not the sequence x.set(0) x.get():0 is legal for standard read-write subject to serializability. An execution is considered serializable registers, while the sequence x.set(0) x.get():1 is not. with respect to this specification if it is serializable after projecting We can view an action sequence as a totally ordered set of out the specified invocations. This is a slight generalization of the events labeled by actions. Each action occurrence in the sequence notion of atomic set serializability introduced by Vaziri et al. [37]. corresponds to one event, e.g., x.set(0) x.set(0) x.get():1 consists We discuss targeted serializability in more detail in Section 6. of three events. We say that a given event e ∈ α is legal if the prefix of α ending in e is legal. Thus, the first two events in the last Replicated Data Type Behavior Serializability can be used as a sequence are legal, while the third one is not. generic correctness criterion for an application, because the serial Sometimes two action sequences α and β have the same behavior, behaviors of a data type act as witnesses for the correctness of a that is, they read the same values and produce the same final state for any initial state. Then, we say that the two sequences are equivalent. Concatenation Concatenation is an operation that applies to both Formally, this means that α and β are legal in the same contexts: traces and semi-traces. The concatenation of two semi-traces τ, υ over disjoint events is a semi-trace τυ over the union of those events. α ≡ β iff ∀π, ρ. παρ is legal ⇐⇒ πβρ is legal. It is defined as the smallest partial order τυ ⊇ τ ∪ υ such that With ≡ we can easily state that two actions a, b commute (ab ≡ ba), τυ e ∈ τ, e ∈ υ, and e e 6≡ e e =⇒ e −−→ e . or that one gets absorbed (ab ≡ b). 
For example: 1 2 1 2 2 1 1 2 For example, the trace (1) is the concatenation of two traces x.set(0) y.set(1) ≡ y.set(1) x.set(0) z.set(0) z.set(1) ≡ z.set(1) x.get():0 → x.add(1) and x.add(2) → x.get():3. Via concatenation, the most important notions on sequences lift 3.2 Traces to semi-traces: prefixes, legality of events, equivalence, etc. When reasoning about the possible sequences over a set of events, 3.3 Histories and Schedules the order of commuting actions is usually irrelevant. It is easier to abstract it away and replace action sequences with what is known as A history provides an external view on the events observed by each traces (see, e.g., [26]). A trace relaxes the total order of a sequence process in the system. We model a process as a possibly infinite to a partial one, preserving the order of non-commuting actions. sequence of events. Every process itself is split into transactions: Given a partial order τ (or any binary relation), let f −→τ g denote contiguous segments that are intended to execute atomically. No that τ relates f to g. A binary relation is lower-finite if each element transaction encompasses more than a single process. is reachable along directed paths from at most finitely many others. Definition 2. A history (E, po,T ) consists of Definition 1. A trace is a lower-finite strict partial order τ on a – a countable set E of events (each labeled by an action), countable set E of events such that for all f, g ∈ E: – a partial ordering po of E into a disjoint union of processes, (1.1) if fg 6≡ gf then f −→τ g or g −→τ f, and – a partition T of the processes into transactions t1, t2, · · · ⊆ E. (1.2) if fg ≡ gf and f −→τ g then f −→τ h −→τ g for some h ∈ E. The connected components of po form the processes of the history. Moreover, because each process is lower-finite by definition, Condition (1.1) ensures that τ orders all pairs of non-commuting the whole po is lower-finite as well. Also, each transaction t ∈ T actions. This way, one can think of a trace as abstracting a set of resides on a single process, and therefore po orders t linearly. equivalent sequences: this is the set L of all sequences over E that The set of possible histories of the system defines its externally are consistent with the trace order. Indeed, starting with any α ∈ L observable behaviors. To prevent undesired behaviors, a data store we can obtain any β ∈ L by swapping adjacent commuting actions. imposes constraints on this set, typically by requiring that each Condition (1.2) ensures that τ makes no unnecessary ordering: τ history can be scheduled in a specific way. A standard choice is to applying it recursively to any two actions f −→ g such that fg ≡ gf, permit only serializable histories, i.e., those having serial schedules: we will eventually obtain a sequence (by lower-finiteness) Definition 3. A serial schedule of a history (E, po,T ) is a linear τ τ τ τ f −→ h1 −→ ... −→ hn −→ g, ordering so of E such that: where no pair of adjacent actions commute. (3.1) the union po ∪ so is lower-finite and acyclic, Below is an example of a trace where the record x gets read, then (3.2) every prefix of so is legal, and incremented twice, and finally read again: (3.3) no two transactions t1 6= t2 ∈ T overlap, i.e., either: so (a) f −→ g for all f ∈ t1, g ∈ t2, or so x.get():0 (b) g −→ f for all f ∈ t1, g ∈ t2. x.add(1) x.add(2) (1) Serializability formalizes transaction atomicity: one can assume that all transactions appear as indivisible units of execution. 
This x.get():3 simplifies reasoning about concurrent processes a lot, because as long as each individual transaction preserves the required data Legality We define a trace to be legal iff all the action sequences invariants, any serializable history will preserve them too. it abstracts are legal. It is important to note that either all of these However, serializability is typically too expensive to enforce sequences are legal or none of them is, simply because they are all in a replicated setting: the CAP theorem [16] implies that during equivalent. Therefore, the trace is legal iff any of the sequences is a network partition a store cannot be both available for updates legal, and respectively, illegal iff any of the sequences is illegal. and ensure serializability. That is why some stores choose various Note that if two actions a and b never appear adjacent in a legal forms of eventual consistency instead. They are cheaper to enforce sequence then they commute. The events x.get():0 and x.get():3 in but also provide much weaker guarantees to the store clients. (1) are an example. A trace that does not order such events admits Consequently, clients need to implement their own synchronization illegal action sequences; consequently, such a trace itself is illegal between processes in order to ensure correctness. In the present work, we consider the so-called strong eventual and does not represent a sequential execution in the model. In legal 2 traces, such events are always ordered. consistency [31]. Informally, it is a combination of two properties: first, every process observes a consistent view, but only of a subset Semi-traces In our proofs, we will need the trace analog of of the updates in the system so far; second, every update eventually forming a subsequence. This is simply the restriction of the domain propagates to the view of every process. We say that a history is of a trace to a subset of its events. In this case, property (1.1) will eventually consistent iff it has a schedule with these properties: still hold but (1.2) need not. We call such restrictions of traces semi- traces. For example, if we restrict the trace (1) to the two get() 2 The motivation behind strong eventual consistency is that it captures actions, then they remain ordered even though they commute: the guarantees provided by most data stores better than broader forms of eventual consistency. This in turn enables more precise reasoning about x.get():0 → x.get():3. client correctness. See [8] for a further discussion. Relation Type Description 4. A Serializability Criterion po E × E process ordering We will now present a sufficient criterion for checking a history’s so E × E serial ordering serializability when given one of its eventually consistent schedules. vi U × Q update visibility The main idea is to test whether the given schedule can be reordered ar U × U update arbitration into a serial one while preserving query legality. This is done by testing for acyclicity a certain directed graph, known as the ⊕ U × Q dependency dependency serialization graph, derived from the schedule. Q × U anti-dependency The dependency serialization graph (see, e.g., [14]) is based on two relations between the events in a history: dependency and anti-dependency. Before stating our criterion we need to generalize Table 1: The various model relations for a given set E = U ∪ Q of this graph to our setting. The key step is to define dependency and update events U and query events Q. 
anti-dependency for arbitrary actions, and not just for reads and writes as is done traditionally. 4.1 Dependency and Anti-dependency Definition 4. An eventually consistent schedule (vi, ar) of a history We will first give suitable axioms for the notions of dependency (E, po,T ), with updates and queries U ∪ Q = E, consists of and anti-dependency. The axiomatization decouples the correctness of the serializability criterion from the concrete instantiations of – a relation vi ⊆ U × Q indicating the update-query visibility, dependency and anti-dependency. In Sections 4.3 and 4.4, we will – a legal trace ar ⊆ U × U arbitrating the order of updates in E, instantiate the axioms with concrete relations based on two algebraic properties of actions: commutativity and absorption. such that they meet three conditions: 4.1.1 Dependency (4.1) the union po ∪ vi is lower-finite and acyclic, Informally, a query in a schedule depends on an update if this update (4.2) each query q is legal in the restriction of ar to {u | u −→vi q}, is visible and potentially influences the query legality. For example, consider the serial schedule (4.3) for any update u the set {q | u 6−→vi q} is finite. x.add(1) y.set(1) x.set(0) x.add(2) x.get():2 y.get():1 (2) Eventual consistency imposes a weaker requirement on the Here, x.get():2 depends on both x.set(0) and x.add(2): if we legality of events than serializability (conditions (4.1) and (4.2)) remove any of them then the query becomes illegal for some initial and does not require transactions to be atomic. states. However, if we remove the actions on y then the query will Condition (4.1) is there to ensure consistency in the temporal definitely remain legal. Thus, x.get():2 depends only on x.set(0) sense: the transitive closure of po ∪ vi is a weakening of Lamport’s and x.add(2). The situation with the query y.get():1 is analogous: happened-before relation [21]. If an event f is related to an event g it depends only on y.set(1). by po or vi then f causally precedes g. Therefore, no pair of events Instead of giving a single definition of when a query depends should participate in a cycle of such causal relationships, nor should on a given update, we assume that for every pre-schedule (vi, ar) any event be causally preceded by infinitely many other events. dependencies are specified as a dependency relation ⊕ ⊆ vi so that Condition (4.2) is there to ensure that each query q is consistent with the updates that it observes. This is the case if one obtains a Axiom 1. For every relation R such that ⊕ ⊆ R ⊆ vi, a query q is legal semi-trace after concatenating the part of ar visible to q and legal in (vi, ar) iff it is legal in the pre-schedule (R, ar). the query q itself. Because ar is global for the whole schedule, it In other words, dependency ⊕ is a lower bound on relaxing cannot be the case that one query considers a pair of updates ordered visibility to some R ⊆ vi so that query legality with respect to R one way, while another query the opposite way. remains the same. In particular, (vi, ar) is legal iff (R, ar) is. Given Condition (4.3) is there to ensure that updates are eventually a dependency relation, we say that q depends on u iff (u, q) ∈ ⊕. propagated. It says that an update is observed by almost all the Leaving the dependency relation as a parameter makes our queries, i.e., all but finitely many. This can happen only if the update criterion more widely applicable; one may, for instance, instantiate gets delivered to all replicas eventually. 
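The following Python sketch (ours) brute-forces these two informal definitions on the updates of serial schedule (3), treating the updates before the query x.get():2 as visible and those after it as invisible; it is meant only to illustrate the definitions on read/write/add registers.

    # Sketch (ours): dependencies and anti-dependencies of x.get():2 in (3).
    # Updates are ("set", reg, v) or ("add", reg, v).

    def read_x(updates):
        """Value that x.get() would return after applying updates in order."""
        val = 0                                   # assume initial value 0
        for op, reg, v in updates:
            if reg == "x":
                val = v if op == "set" else val + v
        return val

    visible   = [("add", "x", 1), ("set", "x", 0), ("add", "x", 2)]  # before the query
    invisible = [("set", "y", 1), ("set", "x", 1)]                   # after the query
    observed  = read_x(visible)                                      # == 2

    # Dependency: removing a visible update changes the query result.
    deps = [u for i, u in enumerate(visible)
            if read_x(visible[:i] + visible[i + 1:]) != observed]
    # deps == [("set", "x", 0), ("add", "x", 2)]; x.add(1) drops out
    # because x.set(0) masks its effect.

    # Anti-dependency: making an invisible update visible changes the result.
    antideps = [u for u in invisible if read_x(visible + [u]) != observed]
    # antideps == [("set", "x", 1)]; y.set(1) does not affect x.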
Note that serial schedules it with a dependency relation that is precise, one that is easy have this property too: the only queries not observing a given update to compute, or one that reflects a particular action semantics. are those ordered before it, and these are finitely many. In Section 4.3, we will derive a general relation based on the There is an obvious way in which one can endow a serial commutativity and absorption properties of actions. In Section 5, schedule so with visibility and arbitration relations: a query observes we will discuss how to compute this general relation, but also how a all updates that so orders before it; two non-commuting updates get stronger notion of absorption gives rise to a more efficient algorithm. ordered the same way as so orders them. Thus, we will treat serial schedules as eventually consistent. 4.1.2 Anti-dependency For brevity, we will omit “eventually consistent” and use the Anti-dependency is the natural counterpart of dependency. An schedule term . Figure 1a shows a schedule. update is an anti-dependency of a given query in a schedule if it is not visible to the query, but making it visible may change the Pre-schedules Later in Section 4.2, when we need to prove that query legality. Consider, for example, the serial schedule a pair (vi, ar) forms a schedule, condition (4.1) will often hold x.add(1) x.set(0) x.add(2) x.get():2 y.set(1):1 x.set(1) (3) trivially, while (4.2) and (4.3) will require more work. We call pairs (vi, ar) for which (4.1) holds pre-schedules. If condition (4.2) holds Here, x.set(1) is an anti-dependency of the query x.get():2 because for a specific query q in a pre-schedule, then we will say that q is if we make it visible by reordering it right before the query, then legal in that pre-schedule. the query becomes illegal. On the other hand, y.set(1) does not This concludes our model. Table 1 summarizes the notation that affect the legality on any action on x, and therefore it is not an we introduced so far, plus two relations that we will consider next. anti-dependency of x.get():2. ⊕so ⊆ vi∩ ⊆ viso. Applying Axiom 1 to (viso, ar), we conclude that vi2 the pre-schedule so is indeed a schedule. We still need to establish that the serial pre-schedule so exists. −1 Because the DSG is acyclic and there is a finite number of processes, 1 ⊕2 we could use a round-robin scheduler to produce it. But for that, we need to prove that the DSG is lower-finite. ⊕1 First, observe that no infinite path of the DSG can have a final vi1 node u0, that is, no path is of the shape · · · → t−2 → t−1 → t0. This is because such a path visits infinitely many transactions of at least one process (as there are only finitely many processes). However, by the definition of a history, every process is lower- Figure 3: Inclusion relationship between visibility, dependency, and finite, and a final node t0 in the path implies that only finitely many anti-dependency, given that vi2 extends vi1 over the same arbitration. transactions were visited. Second, the schedule (vi, ar) is eventually consistent, and so all the relations po, ar, ⊕, and are lower-finite (anti-dependency Similarly to dependency, we assume that an anti-dependency is lower-finite because of eventuality (4.3)). It follows that every relation ⊆ Q × U is specified for every pre-schedule (vi, ar), transaction has only a finite number of immediate predecessors, i.e., where the converse −1 is a subset of the complement of vi. that the graph has a finite fan-in. 
Combined with the “no infinite However, instead of imposing a condition on legality directly, we final paths” property, we conclude that the DSG is lower-finite. require that anti-dependency is compatible with dependency:

Axiom 2. If (vi1, ar) and (vi2, ar) are two pre-schedules such that Note that eventual consistency is not a safety property as (4.3) is −1 vi1 ⊆ vi2, and 1 ∩ vi2 = ∅ then ⊕1 ⊇ ⊕2. a liveness condition and makes sense only for infinite histories. That is why we consider infinite histories in the first place. Curiously The axiom postulates that if visibility gets extended without enough, serializability is not a safety property either: one can easily making anti-dependencies visible then no new dependencies are construct an example where all finite prefixes of a history are introduced, as illustrated in Figure 3. By Axiom 1, this requirement serializable, but the whole history is not (say, because it has no is sufficient to guarantee that the two pre-schedules (vi , ar) and 1 schedule satisfying (4.3)). On the other hand, our acyclicity criterion (vi , ar) have exactly the same legal queries. 2 is a safety property: if violated by a history then it is violated in 4.2 The Criterion some finite prefix of it. Therefore, we have a classic case of under- approximating a non-safety property with a safety one. In order to check whether a history is serializable, we assume that So far we have shown the correctness of our criterion for all one of its eventually consistent schedules (vi, ar) is given, and use dependency and anti-dependency relations that satisfy Axiom 1 and it to establish a witness schedule so for serializability. This serial Axiom 2. In the next subsections, we present concrete relations and witness is not arbitrary, but we require that it satisfies the following show that they indeed have the required properties. inequality over the relations of the given schedule: so ⊇ po ∪ ar ∪ ⊕ ∪ (4) 4.3 Commutativity, Absorption, Dependency Including po in so is necessary to ensure (3.1), while including While it is possible to specify dependency and anti-dependency the other three is a choice that makes it sufficient to show that so relations manually, practical applicability calls for an automatic is a pre-schedule; its legality then follows automatically from the construction. In this subsection, we propose a dependency relation legality of (vi, ar). Note that we do not require so ⊇ vi, as it is not that is based on two algebraic properties of actions f, g: needed for serializability. f commutes with g ⇐⇒ fg ≡ gf To check whether such a serial pre-schedule exists, we observe f g ⇐⇒ fg ≡ g that (4) is equivalent to a set of ordering constraints on the history’s is absorbed by transactions. We express these constraints as a graph: We consider an update u to be a dependency of a query q unless we Definition 5. The dependency serialization graph (DSG) of a given can repeatedly apply commutativity and absorption as rewrite rules pre-schedule (vi, ar) of a history (E, po,T ) is a directed graph to obtain a schedule where u is not visible to q. We do not perform whose nodes are the transactions in T , and which contains an arc a commutativity rewrite after an absorption rewrite. This restriction (s, t) iff s 6= t and po ∪ ar ∪ ⊕ ∪ relates an event f ∈ s to an simplifies the rewriting process at the cost of some generality (i.e., event g ∈ t. we may flag more actions dependent than optimal). As an example, consider again the serial schedule (2) from Every solution of inequality (4) implies that the dependency Section 4.1. There, y.set(1) commutes with all updates on x. Thus, serialization graph is acyclic. 
We will prove the converse, namely, it is not a dependency of the query x.get():2, because we can swap that acyclicity implies the existence of a solution. This way we commuting actions and obtain: reduce serializability testing to detecting cycles in a directed graph: x.add(1) x.set(0) x.add(2) x.get():2 y.set(1) y.get():1 Theorem 1. A history (E, po,T ) with a finite number of processes is serializable if it has a schedule (vi, ar) with an acyclic DSG. In turn, here we can remove x.add(1) because x.set(0) absorbs it. After the removal, x.add(1) will not be visible to x.get():2, and Proof. Suppose that the DSG is acyclic and that it has a lower-finite therefore, is not a dependency of x.get():2. topological ordering. By Definition 5, this ordering corresponds Using commutativity and absorption to find the dependencies to a serial pre-schedule so ⊇ po ∪ ar ∪ ⊕ ∪ . Let us denote its of a query q in an eventually consistent schedule (vi, ar) is a little visibility relation with viso. The intersection vi∩ = vi ∩ viso satisfies bit more involved. We will rewrite the semi-trace induced by q and the condition ⊕ ⊆ vi∩ ⊆ vi of Axiom 1, and therefore every query the updates that it observes. Namely, we restrict ar to the updates in E is legal in the pre-schedule (vi∩, ar). Because vi∩ ⊆ viso, this visible to q, and then append q itself. For example, such an induced pre-schedule and (viso, ar) satisfy the condition of Axiom 2, thus semi-trace could be the following: u1 Proposition II-2.4]). From here we will show that every step i above preserves legality in the sense that u2 v1 hRi · Q is legal ⇐⇒ hRii · Q is legal.

u3 u4 That turns to be sufficient as the above property is preserved when v2 taking limits, again a consequence of lower-finiteness. q So let us make the first step. The vi] relation splits hRi into two parts, the second one commuting with Q: Here, solid arrows indicate the trace order, and dashed arrows ] ] ] ] hRi · Q ≡ hR ∩ vi i · hR \ vi i · Q ≡ hR ∩ vi i · Q · hR \ vi i. indicate absorption. The two updates v1 and v2 are not dependencies of q because they do not precede it in the trace order, and therefore Since updates are free of pre-conditions, the right-most side is legal ] can be “moved past” q. The update u2 is not a dependency of q iff its prefix hR ∩ vi i · Q is legal. either, as it gets absorbed by the adjacent update u4. But after that, Now, the remaining steps. It is enough to show that each restric- u1 and u3 become adjacent, and so u1 gets absorbed by u3. What tion hRi+1i is equivalent to hRii. Let us first study how updates get remains are the two dependencies u3 and u4 of q. absorbed when applying the . operation. Recall definition (6.2), and We capture the above observations with two operations that consider the relation: remove non-dependencies from a given visibility relation: one for ] q vi commutativity and one for absorption. With their help, we define u ≺i v iff (u, q) ∈ Ri \ Ri+1 breaks (6.2) because v −→ q. the dependency relation of a pre-schedule: Because absorption is transitive, this relation is transitive in the Definition 6. For a given pre-schedule (vi, ar) we define two sense that for all i > j: operations ] and . over relations R ⊆ vi: q q q u ≺i v ≺j w =⇒ u ≺i w. R R (6.1) (u, q) ∈ R] iff u −→ q and there is an update v −→ q such This allows us to conclude that if we remove (u, q) at step i, then it that vq 6≡ qv, and either u −→ar v or u = v. is always because of some v visible to q at that step: R vi] (6.2) (u, q) ∈ R. iff u −→ q and also for all updates v −→ q if q Ri (u, q) ∈ Ri \ Ri+1 =⇒ u ≺i v −−→ q for some v. u −→ar v but u −→ar w −→ar v for no w −→R q then uv 6≡ v. Indeed, recursively tracing the reason for removals, we get a The dependency relation of a pre-schedule (vi, ar) is the largest sequence of steps i = in ≥ in−1 ≥ · · · ≥ i1 and updates: ] . ⊕ ⊆ vi such that ⊕ = ⊕ . u ≺q v ≺q · · · ≺q v = v, in n in−1 i1 1 The idea behind the definition is to shrink down the visibility vi Ri to the dependency relation ⊕ by using commutativity and absorption, such that v −−→ q (because otherwise, v was removed even earlier while preserving query legality. The ] operation removes any events and we can continue the sequence). due to commutativity: if an update u and the updates arbitrated We are now ready to prove that hRii ≡ hRi+1i for i ≥ 1. By after it commute with a query q then they are not dependencies of our previous argument, the updates removed from their relation Ri q. Similarly, the . operation removes updates that get absorbed by with q are the non-maximal vertices of the dag adjacent updates. Because absorbing an event may allow for others Ri q to be absorbed too (as in the example above), we look for the largest (V = {u | u −−→ q},E =≺i ). relation in which no further absorption is possible. After sorting the non-maximal vertices u1, . . . , un(q) in V topolog- We will now prove that the relation ⊕ from Definition 6 is indeed ically, consider the sequence of trace restrictions: a dependency relation, i.e., that it satisfies Axiom 1: α0 = ar  V ⊃ α1 = α0 \ u1 ⊃ · · · ⊃ αn−1 \ un(q) = αn. Theorem 2. 
Given some pre-schedule (vi, ar), for every relation R By definition (6.2), every αj is of the form βj uj vj γj , where such that ⊕ ⊆ R ⊆ vi, a query q is legal in (vi, ar) iff it is legal in q the pre-schedule (R, ar). βj vj γj = αj+1 and uj ≺i vj . Here, vj absorbs uj , and therefore αj ≡ αj+1. We conclude that α1 ≡ αn(q) for every query q, or in other words hR i ≡ hR i. Proof. To make our argument simpler, let us introduce some nota- i i+1 tion. We are looking at relations in the interval 4.4 A Corresponding Anti-dependency [⊕, vi] = {R | ⊕ ⊆ R ⊆ vi}. We now match the dependency relation ⊕ defined in the previous For brevity, let us collect all the restrictions of ar that such a relation subsection with a suitable anti-dependency relation. R determines into a single map hRi: vi Definition 7. For any pair u 6−→ q of an update and a query in a uq q 7→ ar {u | u −→R q}. given pre-schedule (vi, ar), let ⊕ denote the dependency relation  with respect to the modified visibility viuq = vi ∪ {(u, q)}. We Operations on semi-traces extend pointwisely to such maps, and if define the anti-dependency relation of the pre-schedule as we denote the identity map q 7→ q with Q, then our claim becomes: uq q −→ u iff u −−−→⊕ q. ∀R ∈ [⊕, vi]. hRi · Q is legal ⇐⇒ h⊕i · Q is legal. In other words, we consider an update u to be an anti-dependency The basis of our proof is that ⊕ can be derived from any relation of a query q if it is not visible to q, but if making it visible would R ∈ [⊕, vi] as the limit of the decreasing sequence: turn it into a dependency of q. R = R ⊇ R ∩ vi] = R ⊇ R. = R ⊇ R. = R ⊇ ... In Definition 7 we implicitly assume that ⊕ is the dependency 0 0 1 1 2 2 3 relation from Definition 6, even though it might work for others too. This is a simple consequence of vi being lower-finite plus a standard We will next prove that is indeed an anti-dependency relation, i.e., fixed-point argument applied to the . operation (see, e.g., [15, that it satisfies Axiom 2: 0 0 0 0 0 Theorem 3. If (vi1, ar) and (vi2, ar) are two pre-schedules such putx[k , v ] getx[k , v ] sizex[n ] −1 0 0 0 that vi1 ⊆ vi2, and 1 ∩ vi2 = then ⊕1 ⊇ ⊕2. ∅ putx[k, v] k 6= k or v = v k 6= k never get [k, v] k 6= k0 always always Proof. Reasoning about commutativity is mostly straightforward, x ] sizex[n] never always always so we will focus on absorption here, i.e., assume vi1 = vi1, and ] vi2 = vi2. (a) Commutativity specification (vi , ar) (vi , ar) Consider any two pre-schedules X and Y , such that 0 0 X Y viX ⊆ viY . The respective operations . and . possess a kind of putx[k , v ] 0 monotonicity property. For every pair of relations A ∈ [⊕X , viX ] putx[k, v] k = k and B ∈ [⊕Y , viY ], update u, and a query q (b) Absorption specification A(u, q) ⊇ B(u, q) =⇒ A.X (u, q) ⊇ B.Y (u, q), Figure 4: Specifications for a dictionary where R(u, q) stands for the set {v ∈ R−1(q) | u −→ar v or u = v}. By the fixed-point argument from the proof of Theorem 2, this property transfers to the respective fixed-points: 5.1 Generic Algorithm

A(u, q) ⊇ B(u, q) =⇒ ⊕X (u, q) ⊇ ⊕Y (u, q). Algorithm 1 directly implements the mathematical construction given in Section 4.3 and Section 4.4. Here, PRUNE (line 12) takes a Moving on to the two pre-schedules (vi1, ar) and (vi2, ar), uq set of dependencies V and an arbitration order E between them, and . suppose that (u, q) ∈ vi2. As vi1(v, p) ⊇ ⊕1 (v, p) for every computes the fixed-point V using a standard work-list algorithm. pair (v, p) ∈ vi1, we conclude that In line 13, the arbitration order is transitively reduced. Then, in every uq step of the work-list computation in lines 14-19, a pair of updates ⊕1(v, p) ⊇ ⊕1 (v, p). u1, u2 is removed from the work-list and it is checked whether u1 is ⊕uv 1 1 uq absorbed by u2, its successor in the transitively reduced arbitration But q −−→6 u, i.e., u −−−→6 q, and therefore ⊕1 ⊇ ⊕1 . With this order. If so, DELFROMREDUCTION in line 18 removes the update fact at hand, we will prove that for all (u, q) ∈ ⊕2 u1 from (V,E), inserts edges from all predecessors to all successors ⊕1(u, q) ⊇ ⊕2(u, q). of u1, re-computes the transitive reduction, and returns all newly inserted edges. It thereby preserves both reachability and transitive Proceeding by well-founded induction on the set ⊕ (u, q), let 2 reduction over vertex removals. v 6= u belong to it. By the inductive hypothesis: For the purpose of explaining the algorithm, we differentiate ⊕1(u, v) ⊇ ⊕1(v, q) ⊇ ⊕2(v, q) 3 v. between direct (anti-)dependencies of a query q, which are non- commutative with q, and indirect (anti-)dependencies, which are We can, therefore, conclude that ⊕1(u, q) \ u ⊇ ⊕2(u, q) \ u. Now, commutative with q but are arbitrated before a direct dependency. if we let R = ⊕ ∪ (u, q), then we obtain: 1 DEPENDENCIES uses PRUNE to compute both ⊕ and . For each

R(u, q) ⊇ ⊕2(u, q) query q, it first determines Ur, the set of all updates related to q by ] uq uq vi in Definition 6. Ur therefore contains the set of all updates non- Because the relation R belongs to the interval [vi1 , ⊕1 ], the commutative with and visible to q, as well as all updates preceding monotonicity property applies here, and so: them in the arbitration order. This set must necessarily include all uq direct or indirect dependencies, as well as indirect anti-dependencies. ⊕1(u, q) ⊇ ⊕1 (u, q) ⊇ ⊕2(u, q). It then uses PRUNE to eliminate all absorbed updates in line 5, where we use ar  Ur to denote the restriction of the relation ar to elements in Ur. All remaining updates are added as dependencies of q in line 5. Detection Algorithm 6. In the second step (line 7 onwards), it re-inserts one invisible update after the other and checks whether it is absorbed by the In this section, we present two algorithms for detecting serializ- dependencies of q. If not, it is added to the anti-dependencies. The ability violations given a history and a corresponding eventually complexity of the algorithm is O(n4m), where n = |U| + |Q| and consistent schedule. The first one is general, while the second makes m = |ar|. The polynomial complexity shows that our criterion is assumptions on the data types, but is asymptotically more efficient. strong enough to make the checking computationally feasible. Detecting serializability violations amounts to determining the de- pendencies ⊕ and anti-dependencies of the pre-schedule, and 5.2 Optimized Algorithm performing cycle detection. The latter has well-known linear-time The complexity of the generic algorithm is caused largely by the fact ⊕ solutions, and thus we discuss how to compute and in this that absorption can, in general, be used to prune dependencies only section. if the absorbed update is directly followed by the absorbing update Our algorithms assume for each data type two specifications, in the arbitration order. For example, assume a store provides a commutativity specification ] a on all pairs of actions and an swap operation that atomically swaps two fields of a record. We can absorption specification on all pairs of updates, which give conclude that two writes to a field of a record absorb each other only sufficient conditions for commutativity and absorption: if there is no swap update on the written field in between the two u ] v =⇒ uv ≡ vu, field writes. Generally, we can prune an update only if it has an edge u v =⇒ uv ≡ v to an absorbing update in the transitive reduction of the arbitration order, which forces us to recompute the transitive reduction after In practice, for each pair of operations, we provide a first-order logic every prune operation. formula that can be checked, given the arguments and return values We can strengthen the notion of absorption to be agnostic to of two actions, in time independent of the size of the graph. An updates arbitrated between the absorbed and the absorbing updates. example of a very simple absorption and commutativity specification We call this far-reaching absorption. For data types with operations for a dictionary is given in Figure 4. 
for which far-reaching absorption differs from standard absorption Algorithm 1 Generic algorithm for determining the dependencies Algorithm 2 Optimized algorithm for determining the dependencies and anti-dependencies for a schedule with updates U, queries Q, and anti-dependencies for a schedule with updates U, queries Q, visibility vi and arbitration ar visibility vi, and arbitration ar 1: function DEPENDENCIES(U, Q, vi, ar) 1: function FASTDEPENDENCIES(U, Q, vi, ar) 2: (⊕, ) ← (∅, ∅) 2: (⊕, ) ← (∅, ∅) 3: for q ∈ Q do 3: for q ∈ Q do ] ] 4: Ur ← {u ∈ U | (u, q) ∈ vi } 4: Ur ← {u ∈ U | (u, q) ∈ vi } 5: Up ← PRUNE (Ur, ar  Ur) 5: for u ∈ (U \ Ur) do vi 6: ⊕ ← ⊕ ∪ (Up × {q}) 6: if u 6−→ q ∧ u6 ] q then vi 7: for u ∈ {u ∈ U | u 6−→ q} do 7: ← ∪ {(q, u)} 0 0 8: Uu ← PRUNE (Up ∪ {u}, ar  (Up ∪ {u})) 8: for u, u ∈ Ur with (u, u ) ∈ ar do 9: if u ∈ Uu then vi 9: if u u0 ∧ u0 −→ q then 10: ← ∪ {(q, u)} 10: Ur ← Ur \{u} 11: return (⊕, )  11: for u ∈ Ur do vi 12: if u −→ q then 12: function PRUNE(V,E) 13: ⊕ ← ⊕ ∪ {(u, q)} 13: E ← TRANSITIVEREDUCTION(E) 14: else 14: W ← E 15: ← ∪ {(q, u)} 15: while W 6= ∅ do 16: ((u1, u2),W ) ← REMOVEELEMENT(W ) 16: return (⊕, ) 17: if u1 u2 then 18: (V,E,N ) ← DELFROMREDUCTION(V, E, u1) 19: W ← E ∩ (W ∪ N) While we focus on a relatively narrow type of applications 20: return V targeting a specific system, the ideas in this section are generally applicable to a large class of so-called causally consistent data stores [9, 23, 24, 29]. We will discuss an analysis for an eventually (such as swap), this strengthening will result in more dependencies consistent key-value store in the next section. and therefore a less precise serializability criterion. However, many systems, including the two discussed in the next two sections, do 6.1 Cloud Types not contain such data types. Here we can employ the strengthened notion and an algorithm based on it without any loss of precision. TOUCHDEVELOP uses the global sequence protocol [9] to imple- Formally, we have far-reaching absorption between two updates ment a replication system providing prefix consistency. In a prefix- u, v, denoted by u v, if and only if for all traces of updates χ, consistent system, a client observes a prefix of a common global we have uχv ≡ χv. For data stores for which we have = , sequence of updates, plus its own updates after the end of the pre-  fix. This property is stronger than causal consistency but weaker the algorithm we are about to present is as precise as the generic algorithm. than snapshot isolation [10]. All three are stronger than eventual consistency and therefore our criterion can be used directly. The optimized Algorithm 2 computes Ur, the set of all updates that may form direct or indirect dependencies, or indirect anti- All TOUCHDEVELOP code executes within weak transactions dependencies, as in the generic version. We then check for all that provide atomic visibility, that is, they guarantee a stable view other updates whether they are invisible and non-commutative of a prefix-consistent snapshot of the data store. Updates propagate with q, and thus form direct anti-dependencies. In the second step asynchronously to other clients at the end of each transaction. (line 8 onwards), all updates absorbed by q-visible successors in the Transaction boundaries are inserted whenever the runtime is idle, arbitration order are eliminated. 
Depending on them being visible to e.g., between the execution of event handlers or during execution of q or not, the remaining elements of the set are added as dependencies blocking operations. or anti-dependencies, resp. (line 11 onwards). The complexity of the The replication system is exposed to the programmer as cloud algorithm is O(max(nm, n2)), where n = |U|+|Q| and m = |ar|, types: data types that behave similarly to regular heap-stored data and thereby significantly faster than the generic version. structures, but are replicated automatically to other clients. Cloud types include high-level data structures such as maps and lists, but also simple data types with a richer set of atomic operations. For 6. Application: Debugging Cloud-Backed Mobile example, a cloud integer can be set to a certain value using set, but Software also supports a commutative add operation. In this section, we describe and evaluate a dynamic analysis tool To synchronize, clients can query whether their last update on a cloud type is confirmed, meaning that the update was included in for checking serializability violations in TOUCHDEVELOP [35] applications and show that our criterion is precise enough such that the global prefix, and all updates that precede it in the prefix are violations are likely to indicate actual bugs. visible to the client. TOUCHDEVELOP is a platform for mobile device applications providing direct integration of replicated cloud-backed storage. We 6.2 Prototype Implementation compare our results to the notion of commutativity races [13] and Our tool ECRACER performs dynamic offline serializability anal- show that our criterion is better suited for debugging, as it captures ysis based on the optimized algorithm in Section 5.2. First, the harmful violations more precisely: over all applications, our criterion TOUCHDEVELOP client runtime is instrumented to record the exe- flags 75% less potential serializability violations. cution history and schedule of a client program. Second, an analysis First, we describe the TOUCHDEVELOP system briefly. Then, we back-end reads the recorded information from an execution with two discuss a prototype implementation of our tool ECRACER. Finally, or more clients and detects serializability violations as discussed in we discuss the serializability violations found. the previous sections. The violations, which are embodied by cycles in dependency serialization graphs (DSGs), are then mapped back Figure 5 shows the result of our experiment. Columns Events to source code locations and reported to the user. and Trans. denote the number of events and transactions, resp., executed within the analyzed schedule. Column Time [s] contains Recording The instrumentation of the TOUCHDEVELOP runtime the time it took to analyze the schedule on a system equipped with records events and stores them locally. To reconstruct the visibil- an Intel Core i7-4600U CPU with 2.10GHz and 12GB of memory. ity relation vi between events in the system, we replicate vector We define the number of serializability violations in a program clocks [25] using the data store of TOUCHDEVELOP itself. That is, as the number of edges involved in cycles in the DSG, mapped we keep a replicated map from client identifiers to integer counters, from events down to program locations. This is a natural metric, as and every update to replicated data is instrumented with an update it overapproximates the number of operations whose order must be to the client’s logical clock. 
The arbitration order ar is not recorded, as it is only known by the server and not communicated to the client. Note, however, that Theorem 1 does not require us to use the real schedule when checking for serializability, but any schedule in which all queries of the history are legal. We can, therefore, assume arbitration by physical time of the client and check legality of the resulting schedule. In our experiments, this approach yielded legal schedules in all cases.

Analysis The analysis back-end implements the optimized algorithm from Section 5.2, instantiated with commutativity and absorption specifications of all operations in TOUCHDEVELOP. For our experiment, the boundaries of intended transactions (T in Definition 2) coincide with the boundaries of the above-mentioned weak transactions.

As explained in Section 2, our criterion is especially useful when applied for targeted checking. For the sake of the experiment, we exclude from our analysis queries issued within declared rendering sections of TOUCHDEVELOP scripts, as they are very frequently executed (every time a page is re-rendered), guaranteed to have no side effects on the program state, and are almost always harmless in practice. We check serializability for all other events.
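The commutativity and absorption specifications mentioned above are essentially small per-data-type tables. The fragment below sketches, under assumed event fields (`target`, `op`, `is_query`), what such a specification could look like for a cloud integer with set and add; it is illustrative, not the tool's actual specification.

def commutes(e1, e2):
    """Do two cloud-integer events commute?"""
    if e1.target != e2.target:
        return True                 # different integers never conflict
    if e1.is_query and e2.is_query:
        return True                 # two queries always commute
    if {e1.op, e2.op} == {"add"}:
        return True                 # add/add commutes
    return False                    # set conflicts with everything else

def absorbs(e1, e2):
    """Is an earlier update e1 absorbed by a later update e2?"""
    return e1.target == e2.target and e2.op == "set"   # set overwrites all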
6.3 Experimental Set-up

We analyzed 33 different applications, which are summarized in Figure 5. 24 of them were written by regular TOUCHDEVELOP users; 6 were written by Microsoft employees to showcase or test the cloud functionality of TOUCHDEVELOP (marked with † in Figure 5). In addition, we analyzed 3 scripts where we fixed some of the bugs that we found (marked with ‡ in the table).

Each application is exercised on two client nodes in parallel via our own random exploration tool for roughly 3 minutes. For 4 games (marked with ∗ in Figure 5) more involved interaction was required, and we executed some of the operations manually. The clients are independently restarted at random during the execution to achieve a realistic overlap in their lifetimes. Both clients are located in Europe while communicating through a data center in the US.

6.4 Analysis Results

We compare our serializability criterion to commutativity races, the most closely related criterion applicable in this setting. A commutativity race [13] is a pair of non-commutative actions unrelated by causality. Their absence is a sufficient condition for serializability under causal consistency with atomic visibility. To see this, observe that (a) under atomic visibility, every cycle in the DSG contains at least one ⊖ edge, since ar ∪ po ∪ vi is acyclic by definition (the system guarantees causal arbitration [7]), (b) ⊕ edges cannot introduce cycles, as mutual dependency of transactions is impossible under atomic visibility, and (c) every ⊖ edge forms a commutativity race under causal consistency.

Figure 5 shows the result of our experiment. Columns Events and Trans. denote the number of events and transactions, resp., executed within the analyzed schedule. Column Time [s] contains the time it took to analyze the schedule on a system equipped with an Intel Core i7-4600U CPU with 2.10GHz and 12GB of memory.

We define the number of serializability violations in a program as the number of edges involved in cycles in the DSG, mapped from events down to program locations. This is a natural metric, as it overapproximates the number of operations whose order must be fixed by synchronization to resolve the violation. Furthermore, it makes the number of commutativity races (column CR in the table) and serializability violations (column SV) comparable, since each serializability violation in this sense is also a commutativity race.

6.5 Discussion

The experiments show that significantly fewer serializability violations than commutativity races are reported for 21 of the 33 applications; the remaining 12 exhibit neither commutativity races nor serializability violations. Overall, we detect 75% fewer serializability violations than commutativity races. In particular, 21 applications contain commutativity races, but only 8 contain serializability violations. This means that the programmer has to inspect significantly fewer program locations when evaluating the serializability of a system, often none at all. In most cases where commutativity races are reported but no serializability violations, conflicting updates remain unobserved, which is correctly detected by the use of cycle detection and absorption.

We did not find a false alarm among the serializability violations, in the sense that every serializability violation actually caused the replication system to return inconsistent values to the application. In four applications, these data inconsistencies had no effect on the overall application functionality and can, therefore, be considered harmless. In the other four applications that contained serializability violations, the analysis revealed bugs that are likely to be fixed by the developers. In the following, we discuss two of them. We propose bug-fixes and show that establishing their correctness requires precise serializability checking.

Tetris One bug appears in the game "tetris", in which the following program fragment is executed when a new high score is to be saved to the replicated store:

1 if (curScore > cloud.highScore)
2   cloud.highScore := curScore

Here, the high score of the player's account is stored in a cloud integer. When a game is completed, its score is compared to the local replica of highScore. If it is larger, highScore is overwritten. The update is later propagated to other clients, potentially overwriting higher scores achieved on other clients. When the above transaction is executed concurrently by two clients, the execution schedule shows both a commutativity race and a serialization violation and is thereby detected by ECRACER (see Figure 5, id "gcane").

Implementing a fix is not trivial, as there is no atomic max-function on cloud integers in TOUCHDEVELOP. A fix can instead make use of high-level data structures to store all scores instead of only the first:

1 var scoreRec := cloud.scores.add_row
2 scoreRec.val := curScore
3 while (!scoreRec.val.confirmed) sleep(0.2)
4 var highScore := 0
5 foreach (s in cloud.scores)
6   if (s.val > highScore) highScore := s.val
7   else s.delete

The fix adds the newest score to a replicated list and then waits until the update is appended to the global prefix. Finally, it selects the highest value among all values stored in the list and deletes all others. The synchronization in line 3 is required so that we do not incorrectly determine that the new score is a high score while some other client submitted a better high score, arbitrated before ours. The fix still exhibits a commutativity race between the inserts to the list. However, there is no serialization violation, as the program is serializable.
ID | Name | Category | Events | Trans. | CR | SV | SV/CR [%] | Time [s]
sxjua | Cloud Paper Scissors † | Game | 244 | 96 | 7 | 2 | 29 | 0.066
uvlma | Color Line ∗ | Game | 5 | 4 | 1 | 0 | 0 | 0.003
ycxbc | guess multi-player demo † | Game | 293 | 66 | 3 | 1 | 33 | 0.104
kqfnc | HackER | Game | 115 | 91 | 12 | 6 | 50 | 0.181
ohgxa | keyboard hero ∗ | Game | 2 | 2 | 0 | 0 | - | 0.001
wccqepeb | Online Tic Tac Toe Multiplayer | Game | 565 | 184 | 57 | 17 | 30 | 0.324
uvjba | pentix ∗ | Game | 6 | 6 | 3 | 0 | 0 | 0.002
padg | sky locale ∗ | Game | 347 | 266 | 2 | 1 | 50 | 0.118
– | sky locale ∗ ‡ | Game | 264 | 195 | 0 | 0 | - | 0.047
gcane | tetris ∗ | Game | 8 | 4 | 2 | 1 | 50 | 0.002
– | tetris ∗ ‡ | Game | 14 | 8 | 1 | 0 | 0 | 0.003
fqaba | Chatter box | Social | 131 | 75 | 3 | 0 | 0 | 0.020
etww | Contest Voting † | Social | 57 | 57 | 0 | 0 | - | 0.065
eijba | ec2 demo chat † | Social | 72 | 36 | 0 | 0 | - | 0.009
gbtxe | Hubstar | Social | 263 | 183 | 0 | 0 | - | 0.120
nggfa | instant poll † | Social | 81 | 81 | 0 | 0 | - | 0.019
qnpge | metaverse | Social | 20 | 4 | 3 | 0 | 0 | 0.005
ruef | Relatd | Social | 118 | 65 | 7 | 0 | 0 | 0.014
cvuz | Super Chat | Social | 170 | 58 | 0 | 0 | - | 0.029
wbuei | unique poll | Social | 166 | 143 | 2 | 0 | 0 | 0.071
qzju | cloud card | Tool | 32 | 8 | 0 | 0 | - | 0.006
kzwue | Cloud Example | Tool | 178 | 170 | 2 | 1 | 50 | 0.099
blqz | cloud list † | Tool | 302 | 261 | 2 | 0 | 0 | 0.082
qwidc | Events | Tool | 1458 | 80 | 5 | 2 | 40 | 0.772
– | Events ‡ | Tool | 520 | 65 | 0 | 0 | - | 0.158
nvoha | expense recorder † | Tool | 67 | 60 | 3 | 0 | 0 | 0.007
wkvhc | Expense Splitter | Tool | 25 | 14 | 0 | 0 | - | 0.007
kmac | FieldGPS | Tool | 12 | 12 | 1 | 0 | 0 | 0.005
kjxzcgcv | NuvolaList 2 | Tool | 297 | 223 | 6 | 0 | 0 | 0.340
eddm | Save Passwords | Tool | 345 | 259 | 0 | 0 | - | 0.118
cavke | TouchDatabase | Tool | 232 | 58 | 0 | 0 | - | 0.048
qzeua | TouchDevelop Jr. | Tool | 64 | 49 | 1 | 0 | 0 | 0.029
whpgc | Vulcanization calculator | Tool | 54 | 30 | 1 | 0 | 0 | 0.009

Figure 5: Result of the dynamic analysis of 33 TOUCHDEVELOP applications

Sky Locale The "Sky Locale" quiz game allows a user to overwrite the account of an existing user because of an incorrect uniqueness check, which is detected by our analysis. The essential problem is embodied by the following code:

1 if (!Users.at(name).Created) {
2   Users.at(name).Created := true
3   // ...
4 }

The code tries to enforce a uniqueness constraint over user names. Here, it is possible that two clients reserve the same name after concurrently reading false in line 1. A fix can be derived by reserving the name, forcing synchronization with the other clients, and checking whether we have won the race for the name reservation:

1 Users.at(name).Created := client_id
2 while (!Users.at(name).Created.confirmed)
3   sleep(0.2)
4 if (Users.at(name).Created == client_id)
5   // ...

Our analysis reports correctly that the fixed version does not contain a serializability violation. In contrast, it does contain a commutativity race between two instances of the update in line 1. However, this race was not triggered during the run of the dynamic analysis and is, thus, not reported in Figure 5.

7. Application: Developing Clients of Weakly Consistent Databases

Developing clients of eventually consistent data stores is a challenging problem. Adding too much synchronization to the program will generally reduce performance, while insufficient synchronization can lead to serializability violations and, thus, unintended application behaviors. In this section, we show that our dynamic analysis can guide the developer towards a correct and efficient implementation. Starting with little or no synchronization, developers can use our analysis to detect serializability violations and then fix these violations by adding more synchronization until the required consistency constraints are met.

To illustrate this application of our analysis, we implemented a common database benchmark on top of an eventually consistent data store and used the analysis to determine the necessary synchronization. In our experiment, our analysis always reported violations that led to real synchronization problems. Moreover, the analysis correctly classifies all of our fixes as serializable. The resulting implementation is significantly faster and more scalable than a solution with naive synchronization.

In our case study we use RIAK [19], a distributed key-value data store based on a design similar to Amazon's Dynamo [12]. RIAK replicates data across a cluster of nodes and keeps it eventually consistent. Operations are typically performed in a highly available manner, where queries contact only a subset of the replicas, and updates return to the client before being confirmed by all nodes. To resolve update-update conflicts in a convergent manner, RIAK provides implementations of several conflict-free replicated data types [30] such as counters, sets, flags, maps, and last-writer-wins registers.
7.1 Dynamic Analysis of Riak-Backed Applications

We integrate our runtime instrumentation as a shim layer around the official Python client library of RIAK. This layer serves two purposes: (a) if the dynamic analysis is enabled, the layer records all executed operations of the client application to an independent database, and (b) it gives the developer the ability to provide lightweight specifications in addition to the purely operational API of RIAK.

Recorded Information As in Section 6, we require knowledge of visibility vi, arbitration ar, and program order po, as well as the ability to check commutativity and absorption between events. po can be trivially determined by sequentially numbering all operations performed by the same client and recording those numbers. To check commutativity and absorption, we record the arguments and return values of each event.

In contrast to Section 6, here visibility vi cannot be tracked using vector clocks: vector clocks can be used only for transitively closed relations, while visibility in RIAK is not. Furthermore, updates are only guaranteed to become atomically visible on a per-key basis. That is why we track visibility information for every stored value separately. Each value is embedded in a RIAK-DT-Map [6], along with a set of unique identifiers of all the updates applied to the value. These identifiers correspond directly to vi edges in the DSG. Since changes to the same map are made atomically visible, a client has observed an update if and only if that update's identifier is in the set. Deletions are not performed directly; instead, the value-embedding map also contains a flag that marks the record as deleted.

Using this instrumentation, the data stored in the database grows linearly in the number of operations performed on the database, which is permissible during testing but prohibitive for production use. This restriction can be partially lifted by making further assumptions: for example, by assuming that clients remain connected to the same node, implying that the set of observed updates is monotonically increasing, one can track observed updates for each client separately and prune observed updates from the observed sets. We do not apply such a technique in our evaluation, as short execution traces suffice for our purposes.
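The following sketch illustrates the per-key tracking just described, using a hypothetical map-CRDT handle in place of the actual Riak data-type API: every update adds a fresh identifier to a set embedded in the same map as the value, and deletion only raises a flag.

import uuid

def instrumented_update(dt_map, field, new_value):
    update_id = str(uuid.uuid4())
    dt_map.set_register(field, new_value)
    dt_map.add_to_set("applied_updates", update_id)  # later becomes a vi edge
    dt_map.store()                                   # one atomic map update
    return update_id

def instrumented_delete(dt_map):
    dt_map.set_flag("deleted", True)                 # tombstone, no real delete
    dt_map.store()

def has_observed(dt_map, update_id):
    # A reader has observed an update iff its identifier is in the set.
    return update_id in dt_map.get_set("applied_updates")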
Lightweight Specifications If the dynamic analysis is used without any developer annotations, every operation will be observed as a single-operation transaction. In that case, ECRACER essentially checks for sequential consistency [22] of the recorded execution. Our client library provides two ways of expressing the developer's intent: (a) one may designate that a set of operations forms a transaction, that is, is expected to have serializable behavior; (b) the developer may exclude query operations from the serializability checking. We then allow such operations to return inconsistent values, as described in Section 2.

Offline Analysis The offline analysis is performed in the same manner and, in fact, with the same core implementation as in Section 6, even though it targets a different system. ECRACER is extended with commutativity and absorption specifications for all operations provided by RIAK. Here, we use the semantics of RIAK's CRDTs to derive the arbitration order. For example, for an increment-only counter, all updates are unordered as they all commute; therefore the arbitration order is empty. For an add-wins set, concurrent adds are unordered, concurrent removes are unordered, and every add is ordered after all concurrent removes. Finally, for a last-writer-wins register, all updates to the register are totally ordered by their physical timestamps.

7.2 Analyzing TPC-C using ECRacer

TPC-C [36] is one of the most well-known database benchmarks. It defines a database-backed wholesale supplier application, featuring among others payment, delivery, order status, stock level, and order creation transactions. It is typically implemented by vendors of databases that provide serializable transactions, but has also been extensively used for the benchmarking of weakly synchronized distributed databases [2].

We use our analysis to derive a correctly synchronized version of a TPC-C implementation by iteratively eliminating violations detected by our analysis. Initially, we start with an implementation of TPC-C (in Python), loosely based on the sample programs given in Appendix A of the TPC-C specification [36]. These programs use a standard table-based data model and assume support for serializable transactions from the database. We deliberately do not partition the data horizontally by warehouse, as is common [2]. Our goal is to derive an implementation that can scale a single warehouse, and even a single district, to a large number of servers.

Each version of the implementation is run with the previously described runtime instrumentation for 20 seconds with 3 clients in parallel on a minimal three-node setup of RIAK on a remote server. The number of transactions and operations executed, the analysis time, and the detected violations are shown in Figure 6. The analysis was run on a system equipped with an Intel Core i7-4600U CPU with 2.10GHz and 12GB of memory.

Version | #Txns | #Events | Time [s] | #Viol.
1 | 280 | 7197 | 28.064 | 9
2 | 307 | 6907 | 24.806 | 6
3 | 365 | 7236 | 20.938 | 2
4 | 441 | 6903 | 7.678 | 1
5 | 475 | 7192 | 8.113 | 1
6 | 449 | 6903 | 6.823 | 0

Figure 6: Analysis results for each version of TPC-C. #Txns is the number of transactions, #Events is the number of events recorded, Time is the total analysis time in seconds, and #Viol. is the number of detected violations.

Version 1 In the first version, the analysis detects 9 violations. Three of those are due to increments being performed in a non-atomic way:

[Figure: two PAYMENT transactions; in each, C.get("C_YTD"):10 is followed (po) by a write, C.set("C_YTD", 20) in the left transaction and C.set("C_YTD", 30) in the right one; the two writes are related by ar.]

In this and the coming figures, the boxes represent nodes (that is, transactions) in the dependency serialization graph. Inside the boxes, we show the fragment of the events in the transaction that are central to the described serializability violation. For illustration, we show relations as arrows between events; however, in the DSG, these are edges between nodes. Consequently, the figure above shows a cycle in the DSG between the two nodes representing the two instantiations of the PAYMENT transaction and, thus, a serializability violation.

Observing the violation above, a developer can easily see that the left update to the customer's sum of all year-to-date payments is lost, as it is overwritten by the right update. They can then check whether serializability is required for this set of operations. In this particular case, we refer to the Consistency Requirements section of the TPC-C benchmark [36] to see that losing an update to C_YTD may violate Consistency Requirement 12 of the database; therefore the serialization violation above points to an actual synchronization problem. The problem can easily be solved without coordination, by replacing C_YTD with a CRDT counter with commutative increments.
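A minimal sketch of this change, with `kv` and `counters` standing in for a regular key-value bucket and a counter data type (the real Riak client API differs in detail): the read-modify-write on the left can lose updates, while the commutative increment on the right cannot.

def pay_v1(kv, customer, amount):
    record = kv.get(customer)
    record["C_YTD"] += amount          # read ...
    kv.put(customer, record)           # ... then write: concurrent updates lost

def pay_v2(counters, customer, amount):
    counters.increment((customer, "C_YTD"), amount)   # commutes, never lost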
Version 2 Replacing the non-atomic increments of counters with commutative counter increments leads to the following execution fragment:

[Figure: two PAYMENT transactions; the left performs C.add("C_YTD", 10) followed (po) by the displayed query C.get("C_YTD"):20, the right performs C.add("C_YTD", 20) followed (po) by the displayed query C.get("C_YTD"):30; ⊕ and anti-dependency edges connect events across the two transactions.]

Note the difference to the previous fragment: here, the two increments are unordered by arbitration (as they commute), and they do not absorb each other. Therefore, we get two ⊖ edges, forming a cycle in the DSG. In any serialization, one of the get queries must read the sum 40, assuming that the initial value was 10. The two queries are used only to display the year's sum of payments on the terminal; the requirements document [36] does not require those displayed values to be consistent. Therefore, we chose performance over strong consistency in this case by adding a lightweight annotation to exclude the queries in the above figure from the serializability checking. With similar reasoning, we can resolve several other violations.
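The shape of such an annotation could look as follows; `ecracer.transaction` and `ecracer.unchecked` are hypothetical names for the two kinds of lightweight specifications offered by the shim layer (intended transactions and excluded queries), not the exact API.

def payment(ecracer, counters, cust, amount):
    # Group the operations into one intended transaction ...
    with ecracer.transaction("PAYMENT"):
        counters.increment((cust, "C_YTD"), amount)
        # ... and exclude the display-only query from the checking.
        with ecracer.unchecked():
            return counters.value((cust, "C_YTD"))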

Version 3 After excluding the violations above through targeted checking, ECRACER still detects two violations. One of those is due to partial observation of transactions, as in the following cycle between NEW_ORDER and DELIVERY:

[Figure: a NEW_ORDER transaction with ORDERS.insert(33,...) followed (po) by NEW_ORDERS.insert(33), and a DELIVERY transaction with NEW_ORDERS.get_first():33 followed (po) by ORDERS.get(33):empty; a dependency (⊕) edge and an anti-dependency edge connect the two transactions.]

The left transaction inserts a new order into the ORDERS table and a foreign key to that order into the NEW_ORDERS table. The right transaction observes the foreign key but does not observe the corresponding order record. The DSG dictates that, in a serialization, the order insertion must follow the order retrieval to make its return value legal, but it also requires the foreign key insertion to be ordered before the foreign key retrieval, creating a cycle with the program order.

To solve the problem, we need to change our application such that the modifications to ORDERS and NEW_ORDERS are reflected atomically at each replica. RIAK and most other stores provide atomicity for all updates applied to the same row. This means we can solve the problem by denormalizing the data and combining tables into a common CRDT data structure, which is stored in a single row of the database. In this example, we represent the ORDERS record by a RIAK-DT-Map [6] and embed the NEW_ORDER flag into the entries of that map. Similarly, the lines of the order are embedded as a set CRDT in the ORDERS map.
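A sketch of the resulting data layout, again using a hypothetical map-CRDT handle rather than the exact Riak client calls: all fields of an order, its NEW_ORDER flag, and its order lines live in a single map and therefore become visible atomically at each replica.

def insert_order(order_map, o_id, lines):
    """order_map: a map-CRDT handle for the single row storing this order."""
    order_map.set_register("O_ID", o_id)
    order_map.set_flag("new", True)        # replaces the NEW_ORDERS row
    for line in lines:
        order_map.add_to_set("order_lines", line)
    order_map.store()                      # one atomically visible update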

Version 4 In Version 4 of the implementation, only one violation is detected: two parallel increments to D_NEXT_O_ID (the district's next, serially assigned order number) and two queries to that value in the same NEW_ORDER transaction:

[Figure: two NEW_ORDER transactions; each performs dist.add('D_NEXT_O_ID'), followed (po) by dist.get('D_NEXT_O_ID'):13, followed (po) by ORDERS.insert(13,...).]

Clearly, this behavior is non-serializable, as it should not be possible for both transaction instances to read the value 13 in the second operation. This problem is classic for TPC-C and was previously shown to be impossible to implement without coordination [2]. To resolve the problem (which is not directly possible in RIAK), we use atomic counters, externally synchronized by ZooKeeper [17], a high-performance service for distributed synchronization.
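A sketch of this change, assuming a handle `coord` to an external coordination service with an atomic `fetch_and_increment` operation (a placeholder name; we realize it with ZooKeeper): the order number is no longer read from the eventually consistent district record.

def new_order(coord, orders, district, lines):
    # Atomically allocate the next order number for this district.
    o_id = coord.fetch_and_increment("/tpcc/%s/D_NEXT_O_ID" % district)
    orders.insert(o_id, lines)       # o_id is now unique across all clients
    return o_id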

Version 5 After running Version 4 several times, the analysis reported a potential double delivery, a rare circumstance due to the infrequent execution of the delivery transaction:

[Figure: two DELIVERY transactions; each performs ORDERS.query(new=true):x followed (po) by x.set(new=false).]

Here, the transaction retrieves all new orders from the database and subsequently disables their new flag. While the implementation would behave correctly for serial schedules, in RIAK it may lead to double deliveries. We solve the problem by exploiting the rarity of the delivery transaction: we can force its execution on a single server and serialize it locally, without compromising performance.
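One way to realize this, sketched under the assumption that a single designated worker executes all deliveries and that `orders` is a handle to the denormalized orders table: the transaction is serialized by a process-local lock, so no distributed coordination is needed.

import threading

_delivery_lock = threading.Lock()   # local to the single delivery worker

def deliver(orders):
    with _delivery_lock:            # local serialization suffices
        for order in orders.query(new=True):
            order.set(new=False)    # mark as delivered exactly once
            # ... perform the actual delivery bookkeeping here ...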

Version 6 In the final version, no serializability violation is detected. While ECRACER, being a dynamic analysis, cannot provide a guarantee about the absence of violations, one can gain significant confidence by creating bad-case scenarios (network partitions, node failures, etc.) during the dynamic analysis.

7.3 Scalability

Finally, we compare the throughput of our incrementally derived, serializable implementation to that of an implementation with straightforward synchronization. The former, labeled Custom in Figure 7, corresponds to Version 6 from the previous subsection, while the latter, labeled Locked, corresponds to Version 1, extended with a somewhat naive synchronization, where clients lock the parts of the database they access using ZooKeeper primitives.

We run both versions on 4 to 10 m4.large Amazon EC2 instances with 2 virtual cores and 8 GiB of RAM each, running clustered instances of both RIAK and ZooKeeper. RIAK is run in its default configuration with triple replication and Apache SOLR providing advanced querying on top of the key-value store.

Our benchmark follows the standard usage of distributed databases: it replicates data across nodes for failure tolerance, and it does not use stored procedures to implement transactions. It is therefore not directly comparable to optimized implementations, and much higher throughputs can be achieved with further domain knowledge. However, the benchmark clearly shows that the manual synchronization derived from our analysis result, without any domain knowledge, scales much better than the naive locking approach.

[Figure 7: Performance comparison of Version 6 (Custom) and a primitively synchronized variant of Version 1 (Locked). The plot shows throughput in transactions per minute (×10⁴) on clusters of 4 to 10 servers.]

8. Related Work

Conflict serializability is a restriction of serializability that can be checked in polynomial time [28]. It is based on reasoning about conflicts between basic reads and writes, but can directly be lifted to commutativity (see, e.g., [38]). However, conflict serializability assumes sequentially consistent [22] histories and is therefore not directly applicable to weakly consistent systems.

Several works define serializability conditions on executions on data stores with various weak guarantees, for example, snapshot isolation [14], causal consistency [3], as well as a variety of weak memory models (e.g., [5, 27, 32]). As with our criterion, these are typically based on detecting cycles in graphs involving some notion of dependency and anti-dependency. The main differences to our work are that (a) they assume stronger consistency guarantees from the data store, and (b) they use low-level read and write reasoning instead of algebraic reasoning. Our work is a generalization of these previous criteria, which makes them applicable to a broader class of real-world systems.

Zellag and Kemme [39] use a criterion similar to Fekete et al. [14] to quantify the anomalies in applications running against eventually consistent data stores. However, they do not prove the criterion correct w.r.t. eventual consistency and again reason only about reads and writes.

Commutativity races [13] are sufficient for serializability if the data system provides causal consistency. However, they are less precise than our criterion (see our comparison in Section 6) and are not sufficient for serializability under the weaker eventual consistency.

Several works suggest reasoning about the preservation of integrity invariants in weakly consistent data stores (see, e.g., [2]). While invariant-based reasoning can permit more behaviors than serializability and can also lead to additional performance gains, it demands detailed specifications, which are notoriously difficult to obtain. Our work instead uses a generic correctness condition.

9. Conclusion and Future Work

We presented a new serializability criterion for eventually consistent data stores and demonstrated its usefulness via two dynamic analyses. The criterion generalizes the classic notion of conflict serializability to high-level data types by leveraging the concepts of commutativity and absorption. Our evaluation suggests that the concepts and systems presented in this work are useful for building correct and efficient applications on top of eventually consistent data stores.

We expect this work to be a starting point for further research into both the correctness of clients of eventually consistent data stores as well as other concurrent programs using high-level data types. Our criterion is also well-suited to be extended to static analyses using suitable abstraction, which we plan to develop as future work. In the reads-and-writes setting, previous work [3] has found that only certain restricted classes of cycles can occur in dependency serialization graphs under guarantees stronger than eventual consistency. It would be interesting to discover whether one can find similar restrictions for arbitrary operations.

References

[1] A. Adya, B. Liskov, and P. E. O'Neil. Generalized isolation level definitions. In ICDE '00, pages 67–78. IEEE Computer Society, 2000.
[2] P. Bailis, A. Fekete, M. J. Franklin, A. Ghodsi, J. M. Hellerstein, and I. Stoica. Coordination avoidance in database systems. Proc. VLDB Endow., 8(3):185–196, 2014.
[3] G. Bernardi and A. Gotsman. Robustness against consistency models with atomic visibility. In CONCUR 2016.
[4] P. A. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency Control and Recovery in Database Systems. Addison-Wesley, 1987.
[5] A. Bouajjani, E. Derevenetc, and R. Meyer. Checking and enforcing robustness against TSO. In ESOP '13, pages 533–553. Springer, 2013.
[6] R. Brown, S. Cribbs, C. Meiklejohn, and S. Elliott. Riak DT map: A composable, convergent replicated dictionary. In PaPEC '14. ACM, 2014.
[7] S. Burckhardt. Principles of Eventual Consistency, volume 1 of Foundations and Trends in Programming Languages. now publishers, 2014.
[8] S. Burckhardt, A. Gotsman, H. Yang, and M. Zawirski. Replicated data types: Specification, verification, optimality. In POPL '14, pages 271–284. ACM, 2014.
[9] S. Burckhardt, D. Leijen, J. Protzenko, and M. Fähndrich. Global sequence protocol: A robust abstraction for replicated shared state. In ECOOP 2015, volume 37 of LIPIcs, pages 568–590, 2015.
[10] A. Cerone, G. Bernardi, and A. Gotsman. A framework for transactional consistency models with atomic visibility. In CONCUR 2015, volume 42 of LIPIcs, pages 58–71, 2015.
[11] B. F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H.-A. Jacobsen, N. Puz, D. Weaver, and R. Yerneni. PNUTS: Yahoo!'s hosted data serving platform. Proc. VLDB Endow., 1(2):1277–1288, 2008.
[12] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's highly available key-value store. In SOSP '07, pages 205–220. ACM, 2007.
[13] D. Dimitrov, V. Raychev, M. Vechev, and E. Koskinen. Commutativity race detection. In PLDI '14, pages 305–315. ACM, 2014.
[14] A. Fekete, D. Liarokapis, E. O'Neil, P. O'Neil, and D. Shasha. Making snapshot isolation serializable. ACM Trans. Database Syst., 30(2):492–528, 2005.
[15] G. Gierz, K. Hofmann, K. Keimel, J. Lawson, M. Mislove, and D. Scott. Continuous Lattices and Domains. Cambridge University Press, 2003.
[16] S. Gilbert and N. Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News, 33(2):51–59, 2002.
[17] P. Hunt, M. Konar, F. P. Junqueira, and B. Reed. ZooKeeper: Wait-free coordination for internet-scale systems. In USENIX ATC '10. USENIX Association, 2010.
[18] P. R. Johnson and R. H. Thomas. The maintenance of duplicate databases. RFC 677, RFC Editor, 1975.
[19] R. Klophaus. Riak Core: Building distributed applications without shared state. In CUFP '10. ACM, 2010.
[20] A. Lakshman and P. Malik. Cassandra: A decentralized structured storage system. SIGOPS Oper. Syst. Rev., 44(2):35–40, 2010.
[21] L. Lamport. Time, clocks, and the ordering of events in a distributed system. Commun. ACM, 21(7):558–565, 1978.
[22] L. Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Trans. Comput., 28(9):690–691, 1979.
[23] W. Lloyd, M. J. Freedman, M. Kaminsky, and D. G. Andersen. Don't settle for eventual: Scalable causal consistency for wide-area storage with COPS. In SOSP '11, pages 401–416. ACM, 2011.
[24] W. Lloyd, M. J. Freedman, M. Kaminsky, and D. G. Andersen. Stronger semantics for low-latency geo-replicated storage. In NSDI '13, pages 313–328. USENIX Association, 2013.
[25] F. Mattern. Virtual time and global states of distributed systems. In Proc. Workshop on Parallel and Distributed Algorithms, pages 215–226. North-Holland / Elsevier, 1989.
[26] A. Mazurkiewicz. Trace theory. In Petri Nets: Applications and Relationships to Other Models of Concurrency, volume 255 of LNCS, pages 278–324. Springer, 1987.
[27] S. Owens. Reasoning about the implementation of concurrency abstractions on x86-TSO. In ECOOP '10, pages 478–503. Springer, 2010.
[28] C. H. Papadimitriou. The serializability of concurrent database updates. J. ACM, 26(4):631–653, 1979.
[29] N. Preguiça, M. Zawirski, A. Bieniusa, S. Duarte, V. Balegas, C. Baquero, and M. Shapiro. SwiftCloud: Fault-tolerant geo-replication integrated all the way to the client machine. In SRDSW '14, pages 30–33. IEEE Computer Society, 2014.
[30] M. Shapiro, N. Preguiça, C. Baquero, and M. Zawirski. A comprehensive study of Convergent and Commutative Replicated Data Types. Research Report RR-7506, Inria, 2011.
[31] M. Shapiro, N. Preguiça, C. Baquero, and M. Zawirski. Conflict-free replicated data types. In SSS '11, pages 386–400. Springer, 2011.
[32] D. Shasha and M. Snir. Efficient and correct execution of parallel programs that share memory. ACM Trans. Program. Lang. Syst., 10(2):282–312, 1988.
[33] S. Sivasubramanian. Amazon DynamoDB: A seamlessly scalable non-relational database service. In SIGMOD '12, pages 729–730. ACM, 2012.
[34] Y. Sovran, R. Power, M. K. Aguilera, and J. Li. Transactional storage for geo-replicated systems. In SOSP '11, pages 385–400. ACM, 2011.
[35] N. Tillmann, M. Moskal, J. de Halleux, and M. Fähndrich. TouchDevelop: Programming cloud-connected mobile devices via touchscreen. In Onward! 2011, pages 49–60. ACM, 2011.
[36] Transaction Processing Performance Council. TPC-C benchmark, revision 5.11, Feb. 2010.
[37] M. Vaziri, F. Tip, and J. Dolby. Associating synchronization constraints with data in an object-oriented language. In POPL '06, pages 334–345. ACM, 2006.
[38] W. E. Weihl. Commutativity-based concurrency control for abstract data types. IEEE Trans. Comput., 37(12):1488–1505, 1988.
[39] K. Zellag and B. Kemme. How consistent is your cloud application? In SoCC '12. ACM, 2012.