Consistency Tradeoffs in Modern Distributed System Design

Daniel J. Abadi, Yale University

The CAP theorem's impact on modern distributed database system design is more limited than is often perceived. Another tradeoff—between consistency and latency—has had a more direct influence on several well-known DDBSs. A proposed new formulation, PACELC, unifies this tradeoff with CAP.

Although research on distributed database systems began decades ago, it was not until recently that industry began to make extensive use of DDBSs. There are two primary drivers for this trend. First, modern applications require increased data and transactional throughput, which has led to a desire for elastically scalable database systems. Second, the increased globalization and pace of business has led to the requirement to place data near clients who are spread across the world. Examples of DDBSs built in the past 10 years that attempt to achieve high scalability or worldwide accessibility (or both) include SimpleDB/Dynamo/DynamoDB [1], Cassandra [2], Voldemort (http://project-voldemort.com), Sherpa/PNUTS [3], Riak (http://wiki.basho.com), HBase/BigTable [4], MongoDB (www.mongodb.org), VoltDB/H-Store [5], and Megastore [6].

DDBSs are complex, and building them is difficult. Therefore, any tool that helps designers understand the tradeoffs involved in creating a DDBS is beneficial. The CAP theorem, in particular, has been extremely useful in helping designers to reason through a proposed system's capabilities and in exposing the exaggerated marketing hype of many commercial DDBSs. However, since its initial formal proof [7], CAP has become increasingly misunderstood and misapplied, potentially causing significant harm. In particular, many designers incorrectly conclude that the theorem imposes certain restrictions on a DDBS during normal system operation, and therefore implement an unnecessarily limited system. In reality, CAP only posits limitations in the face of certain types of failures, and does not constrain any system capabilities during normal operation.

Nonetheless, the fundamental tradeoffs that inhibit DDBSs' capabilities during normal operation have influenced the different design choices of well-known systems. In fact, one particular tradeoff—between consistency and latency—arguably has been more influential on DDBS design than the CAP tradeoffs. Both sets of tradeoffs are important; unifying CAP and the consistency/latency tradeoff into a single formulation—PACELC—can accordingly lead to a deeper understanding of modern DDBS design.

CAP IS FOR FAILURES

CAP basically states that in building a DDBS, designers can choose two of three desirable properties: consistency (C), availability (A), and partition tolerance (P). Therefore, only CA systems (consistent and highly available, but not partition-tolerant), CP systems (consistent and partition-tolerant, but not highly available), and AP systems (highly available and partition-tolerant, but not consistent) are possible.


Many modern DDBSs—including SimpleDB/Dynamo, Cassandra, Voldemort, Sherpa/PNUTS, and Riak—do not by default guarantee consistency, as defined by CAP. (Although consistency of some of these systems became adjustable after the initial versions were released, the focus here is on their original design.) In their proof of CAP, Seth Gilbert and Nancy Lynch [7] used the definition of atomic/linearizable consistency: "There must exist a total order on all operations such that each operation looks as if it were completed at a single instant. This is equivalent to requiring requests of the distributed shared memory to act as if they were executing on a single node, responding to operations one at a time."

Given that early DDBS research focused on consistent systems, it is natural to assume that CAP was a major influence on modern system architects, who, during the period after the theorem was proved, built an increasing number of systems implementing reduced consistency models. The reasoning behind this assumption is that, because any DDBS must be tolerant of network partitions, according to CAP, the system must choose between high availability and consistency. For mission-critical applications in which high availability is extremely important, it has no choice but to sacrifice consistency.

However, this logic is flawed and not consistent with what CAP actually says. It is not merely the partition tolerance that necessitates a tradeoff between consistency and availability; rather, it is the combination of

- partition tolerance and
- the existence of a network partition itself.

The theorem simply states that a network partition causes the system to have to decide between reducing availability or consistency. The probability of a network partition is highly dependent on the various details of the system implementation: Is it distributed over a wide area network (WAN), or just a local cluster? What is the quality of the hardware? What processes are in place to ensure that changes to network configuration parameters are performed carefully? What is the level of redundancy? Nonetheless, in general, network partitions are somewhat rare, and are often less frequent than other serious types of failure events in DDBSs [8].

As CAP imposes no system restrictions in the baseline case, it is wrong to assume that DDBSs that reduce consistency in the absence of any partitions are doing so due to CAP-based decision-making. In fact, CAP allows the system to make the complete set of ACID (atomicity, consistency, isolation, and durability) guarantees alongside high availability when there are no partitions. Therefore, the theorem does not completely justify the default configuration of DDBSs that reduce consistency (and usually several other ACID guarantees).

CONSISTENCY/LATENCY TRADEOFF

To understand modern DDBS design, it is important to realize the context in which these systems were built. Amazon originally designed Dynamo to serve data to the core services in its e-commerce platform (for example, the shopping cart). Facebook constructed Cassandra to power its Inbox Search feature. LinkedIn created Voldemort to handle online updates from various write-intensive features on its website. Yahoo built PNUTS to store user data that can be read or written to on every webpage view, to store listings data for Yahoo's shopping pages, and to store data to serve its social networking applications. Use cases similar to Amazon's motivated Riak.

In each case, the system typically serves data for webpages constructed on the fly and shipped to an active website user, and receives online updates. Studies indicate that latency is a critical factor in online interactions: an increase as small as 100 ms can dramatically reduce the probability that a customer will continue to interact or return in the future [9].

Unfortunately, there is a fundamental tradeoff between consistency, availability, and latency. (Note that availability and latency are arguably the same thing: an unavailable system essentially provides extremely high latency. For purposes of this discussion, I consider systems with latencies larger than a typical request timeout, such as a few seconds, as unavailable, and latencies smaller than a request timeout, but still approaching hundreds of milliseconds, as "high latency." However, I will eventually drop this distinction and allow the low-latency requirement to subsume both cases. Therefore, the tradeoff is really just between consistency and latency, as this section's title suggests.)

This tradeoff exists even when there are no network partitions, and thus is completely separate from the tradeoffs CAP describes. Nonetheless, it is a critical factor in the design of the above-mentioned systems. (It is irrelevant to this discussion whether or not a single machine failure is treated like a special type of network partition.)

The reason for the tradeoff is that a high availability requirement implies that the system must replicate data. If the system runs for long enough, at least one component in the system will eventually fail. When this failure occurs, all data that component controlled will become unavailable unless the system replicated another version of the data prior to the failure. Therefore, the possibility of failure, even in the absence of the failure itself, implies that the availability requirement requires some degree of data replication during normal system operation. (Note the important difference between this tradeoff and the CAP tradeoffs: while the occurrence of a failure causes the CAP tradeoffs, the failure possibility itself results in this tradeoff.)

To achieve the highest possible levels of availability, a DDBS must replicate data over a WAN to protect against the failure of an entire datacenter due, for example, to a hurricane, terrorist attack, or, as in the famous April 2011 Amazon EC2 cloud outage, a single network configuration error. The five reduced-consistency systems mentioned above are designed for extremely high availability and usually for replication over a WAN.
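To see why the mere possibility of failure pushes a high-availability design toward replication, a back-of-the-envelope calculation is useful. The sketch below is illustrative only: it assumes independent node failures and a made-up per-node availability of 99 percent, neither of which comes from this article.

    # Illustrative only: probability that a data item remains reachable when it
    # is stored on n replicas, assuming each node is independently up with
    # probability node_availability (both numbers are hypothetical).
    def item_availability(node_availability, n_replicas):
        # The item is lost to readers only if every replica is down at once.
        return 1 - (1 - node_availability) ** n_replicas

    for n in (1, 2, 3):
        print(n, "replica(s):", round(item_availability(0.99, n), 6))
    # 1 replica(s): 0.99      (roughly 3.7 days of downtime per year)
    # 2 replica(s): 0.9999
    # 3 replica(s): 0.999999

Replication drives availability up quickly, but every copy added is another copy that must be kept in sync, which is exactly where the latency cost discussed next comes from.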


DATA REPLICATION

As soon as a DDBS replicates data, a tradeoff between consistency and latency arises. This occurs because there are only three alternatives for implementing data replication: the system sends data updates to all replicas at the same time, to an agreed-upon master node first, or to a single (arbitrary) node first. The system can implement each of these cases in various ways; however, each implementation comes with a consistency/latency tradeoff.

(1) Data updates sent to all replicas at the same time

If updates do not first pass through a preprocessing layer or some other agreement protocol, replica divergence—a clear lack of consistency—could ensue (assuming multiple updates to the system are submitted concurrently, for example, from different clients), as each replica might choose a different order in which to apply the updates. (Even if all updates are commutative—such that each replica will eventually become consistent, despite the fact that the replicas could possibly apply updates in different orders—Gilbert and Lynch's strict definition of consistency [7] still does not hold. However, generalized Paxos [10] can provide consistent replication in a single round-trip.)

On the other hand, if updates first pass through a preprocessing layer or all nodes involved in the write use an agreement protocol to decide on the order of operations, then it is possible to ensure that all replicas will agree on the order in which to process the updates. However, this leads to several sources of increased latency. In the case of the agreement protocol, the protocol itself is the additional source of latency.

In the case of the preprocessor, there are two sources of latency. First, routing updates through an additional system component (the preprocessor) increases latency. Second, the preprocessor consists of either multiple machines or a single machine. In the former case, an agreement protocol to decide on operation ordering is needed across the machines. In the latter case, the system forces all updates, no matter where they are initiated—potentially anywhere in the world—to route all the way to the single preprocessor first, even if another data replica is nearer to the update initiation location.
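The divergence problem described under option (1) is easy to reproduce in a few lines. The following minimal sketch (plain Python with invented replica and update names, not modeled on any particular system) applies the same two concurrent, non-commutative updates to two replicas in different orders, which is what can happen when every replica receives updates directly:

    # Two replicas of the same record, both starting from the same state.
    replica_1 = {"x": 0}
    replica_2 = {"x": 0}

    # Two concurrent, non-commutative updates submitted by different clients.
    update_a = ("x", 1)   # client A: set x = 1
    update_b = ("x", 2)   # client B: set x = 2

    # Without a preprocessing layer or agreement protocol, each replica may
    # apply the same updates in a different order.
    for key, value in (update_a, update_b):
        replica_1[key] = value
    for key, value in (update_b, update_a):
        replica_2[key] = value

    print(replica_1, replica_2)   # {'x': 2} {'x': 1} -> the replicas have diverged

A preprocessor or agreement protocol removes the divergence by fixing a single order for both replicas, at the price of the extra routing or protocol round-trips described above.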


(2) Data updates sent to an agreed-upon location first

I will refer to this agreed-upon location as a "master node" (different data items can have different master nodes). This master node resolves all requests to update the data item, and the order that it chooses to perform these updates determines the order in which all replicas perform the updates. After the master node resolves updates, it replicates them to all replica locations. There are three replication options:

a. The replication is synchronous: the master node waits until all updates have made it to the replica(s). This ensures that the replicas remain consistent, but synchronous actions across independent entities, especially over a WAN, increase latency due to the requirement to pass messages between these entities and the fact that latency is limited by the slowest entity.

b. The replication is asynchronous: the system treats the update as if it were completed before it has been replicated. Typically, the update has at least made it to stable storage somewhere before the update's initiator learns that it has completed (in case the master node fails), but there are no guarantees that the system has propagated the update. The consistency/latency tradeoff in this case depends on how the system deals with reads:

   i. If the system routes all reads to the master node and serves them from there, then there is no reduction in consistency. However, there are two latency problems with this approach:

      1. Even if there is a replica close to the read-request initiator, the system must still route the request to the master node, which potentially could be physically much farther away.

      2. If the master node is overloaded with other requests or has failed, there is no option to serve the read from a different node. Rather, the request must wait for the master node to become free or recover. In other words, lack of load balancing options increases latency potential.

   ii. If the system can serve reads from any node, read latency is much better, but this can also result in inconsistent reads of the same data item, as different locations have different versions of a data item while the system is still propagating updates, and it could send a read to any of these locations. Although the level of reduced consistency can be bounded by keeping track of update sequence numbers and using them to implement sequential/timeline consistency or read-your-writes consistency (a brief sketch of this approach appears below), these are nonetheless reduced consistency options. Furthermore, write latency can be high if the master node for a write operation is geographically distant from the write requester.

c. A combination of (a) and (b) is possible: the system sends updates to some subset of replicas synchronously, and the rest asynchronously. The consistency/latency tradeoff in this case again is determined by how the system deals with reads:

   i. If it routes reads to at least one node that has been synchronously updated—for example, when R + W > N in a quorum protocol, where R is the number of nodes involved in a synchronous read, W is the number of nodes involved in a synchronous write, and N is the number of replicas—then consistency can be preserved. However, the latency problems of (a), (b)(i)(1), and (b)(i)(2) are all present, though to somewhat lesser degrees, as the number of nodes involved in the synchronization is smaller, and more than one node can potentially serve read requests.

   ii. If it is possible for the system to serve reads from nodes that have not been synchronously updated, for example, when R + W ≤ N, then inconsistent reads are possible, as in (b)(ii).

Technically, simply using a quorum protocol is not sufficient to guarantee consistency to the level defined by Gilbert and Lynch. However, the protocol additions needed to ensure complete consistency [11] are not relevant here. Even without these additions, latency is already inherent in the quorum protocol.

(3) Data updates sent to an arbitrary location first

The system performs updates at that location, and then propagates them to the other replicas. The difference between this case and (2) is that the location the system sends updates to for a particular data item is not always the same. For example, two different updates for a particular data item can be initiated at two different locations simultaneously. The consistency/latency tradeoff again depends on two options:

a. If replication is synchronous, then the latency problems of (2)(a) are present. Additionally, the system can incur extra latency to detect and resolve cases of simultaneous updates to the same data item initiated at two different locations.

b. If replication is asynchronous, then consistency problems similar to (1) and (2)(b) are present.

TRADEOFF EXAMPLES

No matter how a DDBS replicates data, clearly it must trade off consistency and latency. For carefully controlled replication across short distances, reasonable options such as (2)(a) exist because network communication latency is small in local datacenters; however, for replication over a WAN, there is no way around the consistency/latency tradeoff.

To more fully understand the tradeoff, it is helpful to consider how four DDBSs designed for extremely high availability—Dynamo, Cassandra, PNUTS, and Riak—replicate data. As these systems were designed for low-latency interactions with active Web clients, each one sacrifices consistency for improved latency.

Dynamo, Cassandra, and Riak use a combination of (2)(c) and (3). In particular, the system sends updates to the same node and then propagates these synchronously to W other nodes—that is, case (2)(c). The system sends reads synchronously to R nodes, with R + W typically being set to a number less than or equal to N, leading to the possibility of inconsistent reads. However, the system does not always send updates to the same node for a particular data item—for example, this can happen in various failure cases, or due to rerouting by a load balancer. This leads to the situation described in (3) and potentially more substantial consistency shortfalls. PNUTS uses option (2)(b)(ii), achieving excellent latency at reduced consistency.
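The R + W arithmetic above decides whether a read can miss the latest write. The following self-contained sketch (hypothetical node list and versioning scheme, not taken from Dynamo, Cassandra, or Riak) simulates a write acknowledged by W replicas followed by reads from R replicas; when R + W > N the read quorum must overlap the write quorum, while R + W ≤ N can return a stale value:

    import random

    N = 3                                            # total number of replicas
    replicas = [{"value": "old", "version": 0} for _ in range(N)]

    def write(value, version, w):
        # A write is acknowledged once w replicas have applied it synchronously.
        for node in random.sample(replicas, w):
            node["value"], node["version"] = value, version

    def read(r):
        # A read contacts r replicas and returns the freshest copy it sees.
        contacted = random.sample(replicas, r)
        return max(contacted, key=lambda node: node["version"])["value"]

    write("new", version=1, w=2)
    print(read(r=2))   # R + W = 4 > N: read and write quorums must overlap -> "new"
    print(read(r=1))   # R + W = 3 <= N: may print "old" if it hits the lagging replica

This is the sense in which the systems above accept possibly inconsistent reads when R + W is at most N, and in which raising R and W buys consistency at the cost of waiting on more replicas.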

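Option (2)(b)(ii) notes that the staleness of asynchronous master replication can be bounded by tracking update sequence numbers. Below is a minimal sketch of read-your-writes consistency built that way; the client, replica, and field names are invented for illustration and do not describe PNUTS or any other specific system:

    # Each replica records the highest update sequence number it has applied.
    replicas = [
        {"value": "v1", "applied_seq": 1},   # lagging: still applying updates
        {"value": "v2", "applied_seq": 2},   # has applied the client's latest write
    ]

    def read_from_master():
        # Stand-in for routing the read back to the master node (option (2)(b)(i)).
        return "v2"

    class Client:
        def __init__(self):
            self.last_written_seq = 0        # highest sequence number this client wrote

        def record_write(self, seq):
            self.last_written_seq = seq

        def read(self):
            # Read-your-writes: accept only replicas that have applied at least
            # this client's latest write; otherwise fall back to the master.
            for replica in replicas:
                if replica["applied_seq"] >= self.last_written_seq:
                    return replica["value"]
            return read_from_master()

    client = Client()
    client.record_write(seq=2)               # the client's write was assigned sequence 2
    print(client.read())                     # skips the lagging replica and prints "v2"

Roughly the same bookkeeping, applied to the highest sequence number a client has observed rather than written, underlies the sequential/timeline variants mentioned above.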

A recent study by Jun Rao, Eugene Shekita, and Sandeep Tata [12] provides further evidence of the consistency/latency tradeoff in these systems' baseline implementation. The researchers experimentally evaluated two options in Cassandra's consistency/latency tradeoff. The first option, "weak reads," allows the system to service reads from any replica, even if that replica has not received all outstanding updates for a data item. The second option, "quorum reads," requires the system to explicitly check for inconsistency across multiple replicas before reading data. The second option clearly increases consistency at the cost of additional latency relative to the first option. The difference in latency between these two options can be a factor of four or more.

Another study by Hiroshi Wada and colleagues [13] seems to contradict this result. These researchers found that requesting a consistent read in SimpleDB does not significantly increase latency relative to the default (potentially inconsistent) read option. However, the researchers performed these experiments in a single Amazon region (US West), and they speculate that SimpleDB uses master-slave replication, which is possible to implement with a modest latency cost if the replication occurs over a short distance. In particular, Wada and colleagues concluded that SimpleDB forces all consistent reads to go to the master in charge of writing the same data. As long as the read request comes from a location that is physically close to the master, and as long as the master is not overloaded, then the additional latency of the consistent read is not visible (both these conditions were true in their experiments).

If SimpleDB had replicated data across Amazon regions, and the read request came from a different region than the master's location, the latency cost of the consistent read would have been more apparent. Even without replication across regions (SimpleDB does not currently support replication across regions), official Amazon documentation warns users of increased latency and reduced throughput for consistent reads.

All four DDBSs allow users to change the default parameters to exchange increased consistency for worse latency—for example, by making R + W more than N in quorum-type systems. Nonetheless, the consistency/latency tradeoff occurs during normal system operation, even in the absence of network partitions. This tradeoff is magnified if there is data replication over a WAN. The obvious conclusion is that reduced consistency is attributable to runtime latency, not CAP.

PNUTS offers the clearest evidence that CAP is not a major reason for reduced consistency levels in these systems. In PNUTS, a master node owns each data item. The system routes updates to that item to the master node, and then propagates these updates asynchronously to replicas over a WAN. PNUTS can serve reads from any replica, which puts the system into category (2)(b)(ii): it reduces consistency to achieve better latency. However, in the case of a network partition, where the master node becomes unavailable inside a minority partition, the system by default makes the data item unavailable for updates. In other words, the PNUTS default configuration is actually CP: in the case of a partition, the system chooses consistency over availability to avoid the problem of resolving conflicting updates initiated at different master nodes.

Therefore, the choice to reduce consistency in the baseline case is more obviously attributable to the continuous consistency/latency tradeoff than to the consistency/availability tradeoff in CAP that only occurs upon a network partition. Of course, PNUTS's lack of consistency in the baseline case is also helpful in the network partition case, as data mastered in an unavailable partition is still accessible for reads.

CAP arguably has more influence on the other three systems. Dynamo, Cassandra, and Riak switch more fully to data replication option (3) in the event of a network partition and deal with the resulting consistency problems using special reconciliation code that runs upon detection of replica divergence. It is therefore reasonable to assume that these systems were designed with the possibility of a network partition in mind. Because these are AP systems, the reconciliation code and ability to switch to (3) were built into the code from the beginning. However, once that code was there, it is convenient to reuse some of that consistency flexibility to choose a point in the baseline consistency/latency tradeoff as well. This argument is more logical than claims that these systems' designers chose to reduce consistency entirely due to CAP (ignoring the latency factor).

In conclusion, CAP is only one of the two major reasons that modern DDBSs reduce consistency. Ignoring the consistency/latency tradeoff of replicated systems is a major oversight, as it is present at all times during system operation, whereas CAP is only relevant in the arguably rare case of a network partition. In fact, the former tradeoff could be more influential because it has a more direct effect on the systems' baseline operations.

PACELC

A more complete portrayal of the space of potential consistency tradeoffs for DDBSs can be achieved by rewriting CAP as PACELC (pronounced "pass-elk"): if there is a partition (P), how does the system trade off availability and consistency (A and C); else (E), when the system is running normally in the absence of partitions, how does the system trade off latency (L) and consistency (C)?

Note that the latency/consistency tradeoff (ELC) only applies to systems that replicate data. Otherwise, the system suffers from availability issues upon any type of failure or overloaded node. Because such issues are just instances of extreme latency, the latency part of the ELC tradeoff can incorporate the choice of whether or not to replicate data.

The default versions of Dynamo, Cassandra, and Riak are PA/EL systems: if a partition occurs, they give up consistency for availability, and under normal operation they give up consistency for lower latency. Giving up both Cs in PACELC makes the design simpler; once a system is configured to handle inconsistencies, it makes sense to give up consistency for both availability and lower latency. However, these systems have user-adjustable settings to alter the ELC tradeoff—for example, by increasing R + W, they gain more consistency at the expense of latency (although they cannot achieve full consistency as defined by Gilbert and Lynch, even if R + W > N).

Fully ACID systems such as VoltDB/H-Store and Megastore are PC/EC: they refuse to give up consistency, and will pay the availability and latency costs to achieve it. BigTable and related systems such as HBase are also PC/EC.

MongoDB can be classified as a PA/EC system. In the baseline case, the system guarantees reads and writes to be consistent. However, MongoDB uses data replication option (2), and if the master node fails or is partitioned from the rest of the system, it stores all writes that have been sent to the master node but not yet replicated in a local rollback directory. Meanwhile, the rest of the system elects a new master to remain available for reads and writes. Therefore, the state of the old master and the state of the new master become inconsistent until the system repairs the failure and uses the rollback directory to reconcile the states, which is a manual process today. (Technically, when a partition occurs, MongoDB is not available according to the CAP definition of availability, as the minority partition is not available. However, in the context of PACELC, because a partition causes more consistency issues than availability issues, MongoDB can be classified as a PA/EC system.)

PNUTS is a PC/EL system. In normal operation, it gives up consistency for latency; however, if a partition occurs, it trades availability for consistency. This is admittedly somewhat confusing: according to PACELC, PNUTS appears to get more consistent upon a network partition. However, PC/EL should not be interpreted in this way. PC does not indicate that the system is fully consistent; rather it indicates that the system does not reduce consistency beyond the baseline consistency level when a network partition occurs—instead, it reduces availability.
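The PACELC classifications given above can be collected into a small lookup table. The sketch below simply restates the article's default-configuration assignments as data; the dictionary and helper function are mine and not part of any standard API:

    # PACELC classifications of the systems discussed in the article
    # (default configurations; several of these systems are tunable).
    PACELC = {
        "Dynamo": "PA/EL",
        "Cassandra": "PA/EL",
        "Riak": "PA/EL",
        "VoltDB/H-Store": "PC/EC",
        "Megastore": "PC/EC",
        "BigTable/HBase": "PC/EC",
        "MongoDB": "PA/EC",
        "PNUTS": "PC/EL",
    }

    def describe(system):
        partition_choice, else_choice = PACELC[system].split("/")
        during_partition = "availability" if partition_choice == "PA" else "consistency"
        otherwise = "latency" if else_choice == "EL" else "consistency"
        return f"{system}: favors {during_partition} during a partition, {otherwise} otherwise"

    print(describe("PNUTS"))   # PNUTS: favors consistency during a partition, latency otherwise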

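As one concrete example of the user-adjustable ELC settings mentioned above, Cassandra exposes a per-request consistency level. The sketch below assumes the open source DataStax Python driver; the contact point, keyspace, and table are placeholders, and driver details vary across versions, so treat it as illustrative rather than authoritative:

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(["127.0.0.1"])             # placeholder contact point
    session = cluster.connect("demo_keyspace")   # placeholder keyspace

    # ONE behaves like a "weak read": any single replica may answer (lowest latency).
    weak_read = SimpleStatement(
        "SELECT value FROM items WHERE id = %s",
        consistency_level=ConsistencyLevel.ONE)

    # QUORUM waits for a majority of replicas: more consistency, more latency.
    quorum_read = SimpleStatement(
        "SELECT value FROM items WHERE id = %s",
        consistency_level=ConsistencyLevel.QUORUM)

    row = session.execute(quorum_read, ["item-42"]).one()

Pairing QUORUM reads with QUORUM writes plays the same role as making R + W greater than N in the quorum discussion earlier in the article.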

The tradeoffs involved in building distributed database systems are complex, and neither CAP nor PACELC can explain them all. Nonetheless, incorporating the consistency/latency tradeoff into modern DDBS design considerations is important enough to warrant bringing the tradeoff closer to the forefront of architectural discussions.

Acknowledgments

This article went through several iterations and modifications thanks to extremely helpful and detailed feedback from Samuel Madden, Andy Pavlo, Evan Jones, Adam Marcus, Daniel Weinreb, and three anonymous reviewers.

References

1. G. DeCandia et al., "Dynamo: Amazon's Highly Available Key-Value Store," Proc. 21st ACM SIGOPS Symp. Operating Systems Principles (SOSP 07), ACM, 2007, pp. 205-220.
2. A. Lakshman and P. Malik, "Cassandra: Structured Storage System on a P2P Network," Proc. 28th ACM Symp. Principles of Distributed Computing (PODC 09), ACM, 2009, article no. 5; doi:10.1145/1582716.1582722.
3. B.F. Cooper et al., "PNUTS: Yahoo!'s Hosted Data Serving Platform," Proc. VLDB Endowment (VLDB 08), ACM, 2008, pp. 1277-1288.
4. F. Chang et al., "Bigtable: A Distributed Storage System for Structured Data," Proc. 7th Usenix Symp. Operating Systems Design and Implementation (OSDI 06), Usenix, 2006, pp. 205-218.
5. M. Stonebraker et al., "The End of an Architectural Era (It's Time for a Complete Rewrite)," Proc. VLDB Endowment (VLDB 07), ACM, 2007, pp. 1150-1160.
6. J. Baker et al., "Megastore: Providing Scalable, Highly Available Storage for Interactive Services," Proc. 5th Biennial Conf. Innovative Data Systems Research (CIDR 11), ACM, 2011, pp. 223-234.
7. S. Gilbert and N. Lynch, "Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services," ACM SIGACT News, June 2002, pp. 51-59.
8. M. Stonebraker, "Errors in Database Systems, Eventual Consistency, and the CAP Theorem," blog, Comm. ACM, 5 Apr. 2010; http://cacm.acm.org/blogs/blog-cacm/83396-errors-in-database-systems-eventual-consistency-and-the-cap-theorem.
9. J. Brutlag, "Speed Matters for Google Web Search," unpublished paper, 22 June 2009, Google; http://code.google.com/speed/files/delayexp.pdf.
10. L. Lamport, Generalized Consensus and Paxos, tech. report MSR-TR-2005-33, Microsoft Research, 2005; ftp://ftp.research.microsoft.com/pub/tr/TR-2005-33.pdf.
11. H. Attiya, A. Bar-Noy, and D. Dolev, "Sharing Memory Robustly in Message-Passing Systems," JACM, Jan. 1995, pp. 124-142.
12. J. Rao, E.J. Shekita, and S. Tata, "Using Paxos to Build a Scalable, Consistent, and Highly Available Datastore," Proc. VLDB Endowment (VLDB 11), ACM, 2011, pp. 243-254.
13. H. Wada et al., "Data Consistency Properties and the Trade-offs in Commercial Cloud Storage: The Consumers' Perspective," Proc. 5th Biennial Conf. Innovative Data Systems Research (CIDR 11), ACM, 2011, pp. 134-143.

Daniel J. Abadi is an assistant professor in the Department of Computer Science at Yale University. His research interests include database system architecture, cloud computing, and scalable systems. Abadi received a PhD in computer science from Massachusetts Institute of Technology. Contact him at [email protected]; he blogs at http://dbmsmusings.blogspot.com and tweets at @daniel_abadi.
