
Coordination Avoidance in Distributed Databases

By

Peter David Bailis

A dissertation submitted in partial satisfaction of the

requirements for the degree of

Doctor of Philosophy

in

Computer Science

in the

Graduate Division

of the

University of California, Berkeley

Committee in charge:

Professor Joseph M. Hellerstein, Co-Chair
Professor Ion Stoica, Co-Chair
Professor Ali Ghodsi
Professor Tapan Parikh

Fall 2015

Coordination Avoidance in Distributed Databases

Copyright 2015 by Peter David Bailis

Abstract

Coordination Avoidance in Distributed Databases

by

Peter David Bailis

Doctor of Philosophy in Computer Science

University of California, Berkeley

Professor Joseph M. Hellerstein, Co-Chair
Professor Ion Stoica, Co-Chair

The rise of Internet-scale geo-replicated services has led to upheaval in the design of modern data management systems. Given the availability, latency, and throughput penalties associated with classic mechanisms such as serializable transactions, a broad class of systems (e.g., "NoSQL") has sought weaker alternatives that reduce the use of expensive coordination during system operation, often at the cost of application integrity. When can we safely forego the cost of this expensive coordination, and when must we pay the price?

In this thesis, we investigate the potential for coordination avoidance—the use of as little coordination as possible while ensuring application integrity—in several modern data-intensive domains. We demonstrate how to leverage the semantic requirements of applications in data serving, transaction processing, and web services to enable more efficient distributed algorithms and system designs. The resulting prototype systems demonstrate regular order-of-magnitude speedups compared to their traditional, coordinated counterparts on a variety of tasks, including referential integrity and index maintenance, transaction execution under common isolation models, and database constraint enforcement. A range of open source applications and systems exhibit similar results.

To my family

Contents

List of Figures

List of Tables

Acknowledgments

1 Introduction
  1.1 Coordination Avoidance
  1.2 Primary Contributions
  1.3 Outline and Previously Published Work

2 Coordination: Concepts and Costs
  2.1 Coordination and Correctness in Database Systems
  2.2 Understanding the Costs of Coordination
    2.2.1 Latency
    2.2.2 Throughput and Scalability
    2.2.3 Availability and Failures
    2.2.4 Summary: Costs
    2.2.5 Outcome: NoSQL, Historical Context, Safety and Liveness
  2.3 System Model

3 Invariant Confluence and Coordination
  3.1 Invariant Confluence: Criteria Defined
  3.2 Invariant Confluence and Coordination-Free Execution
  3.3 Discussion and Limitations
  3.4 Summary

4 Coordination Avoidance and Weak Isolation
  4.1 ACID in the Wild
  4.2 Invariant Confluence Analysis: Isolation Levels
    4.2.1 Invariant Confluent Isolation Guarantees
    4.2.2 Sticky Availability
    4.2.3 Non-Invariant Confluent Semantics
    4.2.4 Summary
  4.3 Implications: Existing Algorithms and Empirical Impact
    4.3.1 Existing Algorithms
    4.3.2 Empirical Impact: Isolation Guarantees
  4.4 Isolation Models
  4.5 Summary

5 Coordination Avoidance and RAMP Transactions
  5.1 Overview
  5.2 Read Atomic Isolation in the Wild
  5.3 Semantics and System Model
    5.3.1 RA Isolation: Formal Specification
    5.3.2 RA Implications and Limitations
    5.3.3 RA Compared to Other Isolation Models
    5.3.4 RA and Serializability
    5.3.5 System Model and Scalability
  5.4 RAMP Transaction Algorithms
    5.4.1 RAMP-Fast
    5.4.2 RAMP-Small: Trading Metadata for RTTs
    5.4.3 RAMP-Hybrid: An Intermediate Solution
    5.4.4 Summary and Additional Details
    5.4.5 Distribution and Fault Tolerance
    5.4.6 Additional Semantics
    5.4.7 Further Optimizations
  5.5 Experimental Evaluation
    5.5.1 Experimental Setup
    5.5.2 Experimental Results: Comparison
    5.5.3 Experimental Results: CTP Overhead
    5.5.4 Experimental Results: Scalability
  5.6 Applying and Modifying the RAMP Protocols
    5.6.1 Multi-Datacenter RAMP
    5.6.2 Quorum-Replicated RAMP Operation
    5.6.3 RAMP, Transitive Dependencies, and Causal Consistency
  5.7 RSIW Proof
  5.8 RAMP Correctness and Independence
  5.9 Discussion
  5.10 Summary

6 Coordination Avoidance for Database Constraints
  6.1 Invariant Confluence of SQL Constraints
    6.1.1 Invariant Confluence for SQL Relations
    6.1.2 Invariant Confluence for SQL Data Types
    6.1.3 SQL Discussion and Limitations
  6.2 More Formal Invariant Confluence Analysis of SQL Constraints
  6.3 Empirical Impact: SQL-Based Constraints
    6.3.1 TPC-C Invariants and Execution
    6.3.2 Evaluating TPC-C New-Order
    6.3.3 Analyzing Additional Applications
  6.4 Constraints from Open Source Applications
    6.4.1 Background and Context
    6.4.2 Feral Mechanisms in Rails
    6.4.3 Rails Invariant Confluence Analysis
  6.5 Quantifying Integrity Violations in Rails
  6.6 Other Frameworks
  6.7 Implications for Databases
    6.7.1 Summary: Database Shortcomings Today
    6.7.2 Domesticating Feral Mechanisms
  6.8 Detailed Validation Behavior, Experimental Workload
    6.8.1 Uniqueness Validation Behavior
    6.8.2 Association Validation Behavior
    6.8.3 Uniqueness Validation Schema
    6.8.4 Uniqueness Stress Test
    6.8.5 Uniqueness Workload Test
    6.8.6 Association Validation Schema
    6.8.7 Association Stress Test
    6.8.8 Association Workload Test
  6.9 Summary

7 Related Work

8 Conclusions
  8.1 Design Patterns for Coordination Avoidance
  8.2 Limitations
  8.3 Future Work
    8.3.1 Automating Coordination Avoidance
    8.3.2 Comprehending Weak Isolation
    8.3.3 Emerging Application Patterns
    8.3.4 Statistical Coordination Avoidance
  8.4 Closing Thoughts

Bibliography

List of Figures

1.1  An illustration of a distributed, replicated database and its relation to application servers and end users. In modern distributed databases, data is stored on several servers that may be located in geographically distant regions (e.g., Virginia and Oregon, or even different continents) and may be accessed by multiple database clients (e.g., application servers, analytics frameworks, database administrators) simultaneously. The key challenge that we investigate in this thesis is how to minimize the amount of synchronous communication across databases while providing "always on," scalable, and high performance access to each replica.
1.2  In this thesis, we develop the principle of Invariant Confluence, a necessary and sufficient condition for safe, convergent, coordination-free execution, and apply it to a range of application domains at increasing levels of abstraction: database isolation, database constraints, and safety properties from modern database-backed applications. Each guarantee that is invariant confluent is guaranteed to have at least one coordination-free implementation; we investigate the design of several implementations in this work, which operate at the database infrastructure tier (Figure 1.1).

2.1  CDF of round-trip times for slowest inter- and intra-availability zone links compared to cross-region links.
2.2  Microbenchmark performance of coordinated and coordination-free execution of transactions of varying size writing to eight items located on eight separate multi-core servers.
2.3  Atomic commitment latency as an upper bound on conflicting serializable transaction throughput over local-area and wide-area networks.
2.4  An example coordination-free execution of two transactions, T1 and T2, on two servers. Each transaction writes to its local replica, then, after commit, the servers asynchronously exchange state and converge to a common state (D3).

3.1  An invariant confluent execution illustrated via a diamond diagram. If a set of transactions T is invariant confluent, then all database states reachable by executing and merging transactions in T starting with a common ancestor (Ds) must be mergeable (⊔) into an I-valid database state.

4.1  Partial ordering of invariant confluent, sticky (in boxes), and non-invariant confluent models (circled) from Table 4.2. Directed edges represent ordering by model strength. Incomparable models can be simultaneously achieved, and the availability of a combination of models has the availability of the least available individual model.
4.2  YCSB latency for two clusters of five servers each deployed within a single datacenter and cross-datacenters (note log scale for multi-datacenter deployment).
4.3  YCSB throughput for two clusters of five servers each deployed within a single datacenter and cross-datacenters.
4.4  Proportion of reads and writes versus throughput.
4.5  Scale-out of Eventual and RC.
4.6  Example of IMP anomaly.
4.7  DSG for Figure 4.6.
4.8  Example of OTV anomaly.
4.9  DSG for Figure 4.8.
4.10 Example of N-MR violation when wx(1) ≪ wx(2) and T4 directly session-depends on T3.
4.11 DSG for Figure 4.10. wrx dependency from T1 to T4 omitted.
4.12 Example of N-MW anomaly if T2 directly session-depends on T1.
4.13 DSG for Figure 4.12.
4.14 Example of MRWD anomaly.
4.15 DSG for Figure 4.14.
4.16 Example of MYR anomaly if T2 directly session-depends on T1.
4.17 DSG for Figure 4.16.

5.1  Comparison of RA with isolation levels from [9, 32]. RU: Read Uncommitted, RC: Read Committed, CS: Cursor Stability, MAV: Monotonic Atomic View, ICI: Item Cut Isolation, PCI: Predicate Cut Isolation, RA: Read Atomic, SI: Snapshot Isolation, RR: Repeatable Read (Adya PL-2.99), S: Serializable.
5.2  Space-time diagram for RAMP-F execution for two transactions T1 and T2 performed by clients C1 and C2 on partitions Px and Py. Lightly-shaded boxes represent current partition state (lastCommit and versions), while the single darker box encapsulates all messages exchanged during C2's execution of transaction T2. Because T1 overlaps with T2, T2 must perform a second round of reads to repair the fractured read between x and y. T1's writes are assigned timestamp 1. In our depiction, each item does not appear in its list of writes (e.g., Px sees {y} only and not {x, y}).

5.3  Space-time diagram for RAMP-S execution for two transactions T1 and T2 performed by clients C1 and C2 on partitions Px and Py. Lightly-shaded boxes represent current partition state (lastCommit and versions), while the single darker box encapsulates all messages exchanged during C2's execution of transaction T2. T1 first fetches the highest committed timestamp from each partition, then fetches the corresponding version. In this depiction, partitions only return timestamps instead of actual versions in response to first-round reads.
5.4  Throughput and latency under varying client load. We omit latencies for LWLR, which peaked at over 1.5s.
5.5  Algorithm performance across varying workload conditions. RAMP-F and RAMP-H exhibit similar performance to NWNR baseline, while RAMP-S's 2 RTT reads incur a greater performance penalty across almost all configurations. RAMP transactions consistently outperform RA isolated alternatives.
5.6  RAMP transactions scale linearly to over 7 million operations/s with comparable performance to NWNR baseline.
5.7  Control flow for operations under multi-datacenter RAMP strategies with client in Cluster A writing to partitions X and Y. In the high availability RAMP strategy (Figure 5.7a), a write must be prepared on F + 1 servers (here, F = 3) before it is committed. In the sticky RAMP strategy, a write can be prepared and committed within a single datacenter and asynchronously propagated to other datacenters, where it is subsequently prepared and committed (Figure 5.7b). The sticky strategy requires that clients maintain affinity with a single cluster in order to guarantee available and correctly isolated behavior.

6.1  TPC-C New-Order throughput across eight servers.
6.2  Coordination-avoiding New-Order scalability.
6.3  Use of mechanisms over each project's history. We plot the median value of each metric across projects and, for each mechanism, omit projects that do not contain any uses of the mechanism (e.g., if a project lacks transactions, the project is omitted from the median calculation for transactions).
6.4  CDFs of authorship of invariants (validations plus associations) and commits. Bolded line shows the average CDF across projects, while faint lines show CDFs for individual projects. The dotted line shows the 95th percentile CDF value.
6.5  Use of concurrency control mechanisms in Rails applications. We maintain the same ordering of applications for each plot (i.e., same x-axis values; identical to Table 6.3) and show the average for each plot using the dotted line.
6.6  Uniqueness stress test integrity violations.
6.7  Uniqueness workload integrity violations.
6.8  Foreign key stress association anomalies.
6.9  Foreign key workload association anomalies.

List of Tables

2.1  Mean RTT times on EC2 (min and max highlighted).
2.2  Key properties of the system model and their informal effects.

4.1  Default and maximum isolation levels for ACID and NewSQL databases.
4.2  Summary of invariant confluent, sticky, and non-invariant confluent models considered in this paper. Non-invariant confluent models are labeled by cause: preventing lost update†, preventing write skew‡, and requiring recency guarantees⊕.

5.1  Comparison of basic algorithms: RTTs required for writes (W), reads (R) without concurrent writes and in the worst case (O), stored metadata and metadata attached to read requests (in addition to a timestamp for each).

6.1  Example SQL (top) and ADT invariant confluence along with references to formal proofs in Section 6.2.
6.2  TPC-C Declared "Consistency Conditions" (3.3.2.x) and invariant confluence analysis results with respect to the workload transactions (Invariant type: MV: materialized view, SID: sequential ID assignment, FK: foreign key; Transactions: N: New-Order, P: Payment, D: Delivery).
6.3  Corpus of applications used in analysis (M: Models, T: Transactions, PL: Pessimistic Locking, OL: Optimistic Locking, V: Validations, A: Associations). Stars record number of GitHub Stars as of October 2014.
6.4  Use of and invariant confluence of built-in validations.

Acknowledgments

I would like to acknowledge my advisors: Ali Ghodsi, Joe Hellerstein, and Ion Stoica. Ali has been a thoroughly conscientious and tenacious collaborator and mentor. Ali pushed our work to greater levels of technical depth, introducing me to the tradition of distributed computing and encouraging both precision and pursuit of principle. His unceasing support and thoughtful advice bolstered both the quality of this research as well as my spirit and my faith in the research process. Joe encouraged me to find my own balance between what Melville calls "audacity and reverence." I am especially grateful for Joe's support and patience in allowing me freedom during my graduate experience as well as Joe's careful commentary on this document. Ion has proven an example of industriousness and perseverance. His persistent encouragement to "find the nugget" deepened my appreciation for simplicity in design as well as clarity of thought and writing.

I am also grateful to several individuals who made major contributions to the research in this thesis. Alan Fekete, who visited us twice on sabbatical, was essential in our focus on database safety properties, from isolation guarantees to constraints. Alan is an exemplar of modesty despite technical brilliance. I especially admire his desire to keep learning well into his career as well as his uncanny ability to comb through formalism with diligence and grace. Mike Franklin was a dear collaborator at the beginning and end of this work, and his leadership has been an inspiring example for me. Tapan Parikh, my outside dissertation committee member, provided a valuable, human-centric perspective on this work.

I have been fortunate to overlap with a multitude of wonderful researchers during my time at Berkeley. In particular, Neil Conway and Peter Alvaro were both great friends and colleagues, and our shared enthusiasm for distributed databases, cooking, and nature was a great source of inspiration for me. I especially admire Peter's quiet sense of wonder and reverence for the literature as well as Neil's admiration for and cultivation of engineering and design as high craft. Shivaram Venkataraman was an unflappably positive collaborator in our earliest days of graduate studies and became a terrific labmate and travel companion, from Santa Cruz to Turkey and Chania. Patrick Wendell was an invaluable sounding board regarding the interfaces between academia and industry as well as an unforgettable roommate. I could always count on a fun conversation (or dinner outing) with Kay Ousterhout, Aurojit Panda, Colin Scott, and the NetSys lab. Justine Sherry was instrumental in cultivating a positive culture within our cohort and was a superbly capable co-founder of @TinyToCS. Evan Sparks, Dan Haas, Daniel Crankshaw, Sanjay Krishnan, Jiannan Wang, and the many other members of the Berkeley Database Seminar were always game for a good conversation. The many members of the Berkeley Database group, Cloud seminar, CCN group, AMPLab, and BOOM project made for wonderful, joyful company.
Many others provided helpful feedback during the course of this work, including: Shel Finkelstein and Pat Helland, my guides to real-world inconsistency (and "outconsistency") in enterprise databases; Daniel Abadi, who never fails to elicit a fascinating conversation; Phil Bernstein, whose feedback is consistently top-notch and provides astute historical context; Doug Terry, whose pioneering work on weak replication and whose kind encouragement were both significant inspirations to me; Mike Stonebraker, whose unflagging energy is contagious, and who provides frequent and welcome reminders to consider the market for and impact of my research; Eric Brewer, Sam Madden, and Scott Shenker, who have provided gracious support and thoughtful guidance; and Divy Agrawal, Amr El Abbadi, and Sharad Mehrotra, for rousing conversations about semantics-based concurrency control.

I am also grateful to the many people building, operating, and managing distributed systems and databases in the field who provided feedback on and inspiration for this work, including Michael R. Bernstein, Rick Branson, Mark Callaghan, Adrian Colyer, Sean Cribbs, Jonathan Ellis, Alex Feinberg, Andy Gross, Coda Hale, Colin Jones, Evan Jones, Kyle Kingsbury, Adam Marcus, Caitie McCaffrey, Christopher Meiklejohn, Mike Miller, Jeremiah Peschka, Mark Phillips, Henry Robinson, Mehul Shah, Xavier Shay, Justin Sheehy, Ines Sombra, Kelly Sommers, and Sriram Srinivasan. It is an exciting and perhaps unparalleled time in the history of data management, and this dialogue with practice has greatly enriched this work as well as my experience and enthusiasm during my studies.

While my graduate studies have been brief, I feel that my training in research began long before I arrived at Berkeley. I am especially indebted to Vijay Janapa Reddi, Margo Seltzer, David Brooks, Radhika Nagpal, Matt Welsh, Justin Werfel, and Christopher Goodrich for taking the chance to work with me, teaching me the process and joy of research at an early stage of my career, and encouraging the development of intellectual audacity and perseverance. Over time, I have come to realize how rare such experiences are and how privileged I am to have had such a group of individuals in my life to enable them.

This work was supported in part by a National Science Foundation Graduate Fellowship under Grant DGE 1106400 and by a Berkeley Fellowship for Graduate Study from the UC Berkeley Graduate Division.

Finally, I am deeply grateful for support and encouragement from my family and friends.

Chapter 1

Introduction

This thesis examines the design of robust and efficient distributed database systems.

Over the past decade, the challenges of distributed systems design have become increasingly mainstream. The rise of new application domains such as Internet services and the continued decrease of storage and computing costs have led to massive increases in request and data volumes. To address these trends, application developers have frequently turned to distributed systems designs. Scale-out application programming frameworks, data serving systems, and data processing platforms enjoy unprecedented popularity today [31, 38, 179]. Coupled with the introduction of inexpensive and elastic cloud computing [24], these factors have led distribution and replication to become common features in modern application and system designs. As a result, a growing class of software must address the difficulties of robust operation over computer networks, which include communication delays, partial failures, and inherent uncertainty about global system state [97].

In many applications, the difficulties of distributed systems design are relegated to a database tier. Application development best practices delegate the management of application state to a database back-end: application programmers implement application logic, while a back-end data infrastructure system handles data storage, query processing, and concurrency control [215]. This separation of concerns means database systems frequently act as the keystone of reliable distributed applications (Figure 1.1). Thus, database systems must directly address the challenges of distribution, replication, and fault tolerance—while providing a user-friendly interface for application developers and end users.

[Figure 1.1 (diagram): an application tier of application servers and analytics engines issuing requests to a replicated database tier spanning datacenters in Virginia (VA) and Oregon (OR).]

Figure 1.1: An illustration of a distributed, replicated database and its relation to application servers and end users. In modern distributed databases, data is stored on several servers that may be located in geographically distant regions (e.g., Virginia and Oregon, or even different continents) and may be accessed by multiple database clients (e.g., application servers, analytics frameworks, database administrators) simultaneously. The key challenge that we investigate in this thesis is how to minimize the amount of synchronous communication across databases while providing "always on," scalable, and high performance access to each replica.

In this work, we investigate a conceptually simple class of designs for robust distributed databases: we study the design of databases that allow applications to make non-trivial progress independent of network behavior. That is, we study the design of database systems that provide coordination-free execution: whenever database clients can access at least one copy of database state, they are guaranteed to make non-trivial progress. This means that, in the presence of communication failures between servers, each client's operations may still proceed, providing "always on" functionality, or guaranteed availability. Even when database servers are able to communicate with one another, communication is not required. This allows low latency operation: each client's requests can be processed using locally accessible resources. Finally, coordination-free execution ensures scalability: as more servers are added, the database can make use of them without disrupting existing servers. This results in increased capacity. Thus, coordination-free execution offers attractive performance and availability benefits while ensuring that, subject to the availability of additional resources, additional requests can be serviced on demand.

This coordination-free database system design represents a departure from classic database system designs. Traditionally, database systems exposed a serializable transaction interface: if user operations are bundled into transactions, or groups of multiple operations over multiple data items, a database providing serializability guarantees that the result of executing the transactions is equivalent to some serial execution of the transactions [53]. Serializability is remarkably convenient for programmers, who do not need to reason about concurrency or distribution. However, serializability is inconvenient for databases: serializability (provably) requires coordination [92], negating the benefits of coordination-free execution. Intuitively, enforcing a serial ordering between transactions precludes the ability to guarantee the transactions' independent progress when executed on multiple servers.

As a result, increased application demands for availability, low latency, and scalability have led to a recent schism within mainstream database system designs [174, 179]. A proliferation of database systems, often called "NoSQL" systems, forego serializability (and other "strong" semantics) in order to provide coordination-free execution. However, in turn, these NoSQL systems provide few, if any, semantic guarantees about their query results and about database state (also called safety properties; e.g., "no two users share a username") [38, 54, 156]. In contrast with serializability, which guarantees that safety properties preserved by individual transactions will be preserved under their serial composition, these NoSQL stores leave the enforcement of application safety to the application—a challenging and error-prone proposition [20, 132]. Thus, database users and application programmers today are left with one of two options: ensure safety via serializability and coordination, or forego safety within the database but enjoy the benefits of coordination-free execution.

In this thesis, we examine this apparent tension between ensuring application safety properties within the database and enjoying the scalability, availability, and performance benefits of coordination-free execution. Is enforcing safety properties always at odds with coordination-free execution? If we rely on serializability, the answer is yes. Instead, we consider an alternative: the enforcement of non-serializable semantic guarantees in a coordination-free manner. By examining the safety properties of today's database-backed applications, we determine whether coordination is strictly necessary to enforce them and explore implementations that limit the use of coordination. Thus, this thesis explores the relationship between correctness—according to useful safety guarantees—and coordination. More precisely, what is the coordination cost of a given safety guarantee? What is the minimum communication we must perform to enforce various correctness criteria?

1.1 Coordination Avoidance

Our primary goal in this thesis is to build database systems that coordinate only when strictly required in order to guarantee application safety. To realize this goal, we examine which semantic guarantees databases can provide under coordination-free execution—and explore implementations that fulfill this potential. We study a range of guarantees from classic database systems as well as guarantees required by modern database-backed applications. Today, most of these guarantees are either implemented using coordination or are not provided by coordination-free systems. However, we demonstrate that many of these requirements can be correctly implemented without coordination and provide algorithms and system implementations for doing so. This enables what we call coordination avoidance: the use of as little coordination as possible while maintaining safety guarantees on behalf of database users.

Thesis Statement: Many semantic requirements of database-backed applications can be efficiently enforced without coordination, thus improving scalability, latency, and availability.

To achieve this goal, we first develop a new, general rule for determining whether a coordination-free implementation of a given safety property exists, called invariant confluence. Informally, invariant confluence determines whether the result of executing operations on independent copies of data can be combined (or "merged") into a single, coherent (i.e., convergent) copy of database state. Given a set of operations, a safety property that we wish to maintain over all copies of database state, and a merge function, invariant confluence tells us whether coordination-free execution is possible. Invariant confluence is both necessary and sufficient: if invariant confluence holds, a coordination-free, convergent implementation exists. If invariant confluence does not hold, no system can implement the semantics while also providing coordination-free, convergent execution.

Given this invariant confluence criterion, we examine a set of common semantic guarantees found in database systems and database-backed applications today to determine whether a coordination-free execution strategy for enforcing them exists. We perform several case studies, which we describe in detail below. In many cases, although existing implementations of these guarantees may rely on coordination, we show that coordination-free implementations of the semantics exist. This provides the scalability, performance, and availability benefits of coordination-free execution—but without compromising desirable semantic guarantees. None of these guarantees is serializable, but all of them correspond to existing or emerging demands from applications today.

Our use of invariant confluence recognizes latent potential for coordination-free execution in existing and emerging applications. In demonstrating this potential, we highlight a need for conscientious consideration of coordination in the design of database systems: in many invariant confluent scenarios, traditional and/or legacy implementations designed for non-distributed environments over-coordinate and fail to capture the potential for coordination-free execution. Conversely, when invariant confluence does not hold, understanding that a coordination-free implementation does not exist spares system designers the effort of searching for a more efficient implementation that does not exist.

Of course, simply recognizing that a coordination-free implementation exists does not by itself lead to coordination-free systems. Rather, we must also find a coordination-free implementation of invariant confluent semantics. Therefore, we present the design, implementation, and evaluation of several sets of invariant confluent guarantees. We demonstrate order-of-magnitude improvements in latency and throughput over traditional algorithms, validating the power of coordination-free systems design. In addition to these case studies, we also present more general design principles for realizing coordination avoidance in practice. We note the importance of separating visibility from progress, ensuring composability of operations, and controlling visibility via multiversioning.
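The criterion can be illustrated with a toy numeric example. The sketch below is ours and is not the formal development of Chapter 3: deposits against an account with a non-negativity invariant merge safely on independent copies, while concurrent withdrawals can merge into an invalid state and therefore require coordination.

```python
# Toy numeric illustration (ours; not the formal model of Chapter 3) of the
# invariant confluence question: if each independent copy of the data satisfies
# the invariant after executing transactions, does the merged copy satisfy it?

def merge(copy_a, copy_b, ancestor):
    # A simple merge function: apply both copies' deltas to the common ancestor.
    return ancestor + (copy_a - ancestor) + (copy_b - ancestor)

def invariant(balance):
    # The safety property every copy must preserve: a non-negative balance.
    return balance >= 0

ancestor = 100

# Deposits: each copy is valid in isolation, and so is the merged state (150),
# consistent with deposits being invariant confluent under this invariant.
assert invariant(merge(ancestor + 20, ancestor + 30, ancestor))

# Withdrawals of 70 on each copy: each copy is valid in isolation (30), but the
# merged state is -40, violating the invariant, so withdrawals against a
# non-negativity constraint are not invariant confluent and require coordination.
assert not invariant(merge(ancestor - 70, ancestor - 70, ancestor))
```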

By example. As a simple example, consider the common Read Committed (RC) isolation guarantee [9], which is the default semantics in fifteen of eighteen popular relational databases, including Oracle, SAP Hana, and Microsoft SQL Server (Chapter 4). Informally, Read Committed ensures that users never read non-final writes to data; if, within a transaction, a user sets her username to "Sally" and, subsequently, sets her username to "Sal," no other user should ever read that the user's username is "Sally." The classic strategy for implementing Read Committed isolation dates to the 1970s and relies on locking: when our user wants to update her username record, she acquires a mutually exclusive lock on the record [125]. This is a reasonable strategy for a single-node database, but the coordination required to implement this mutual exclusion can be disastrous in a modern distributed environment: while our first user holds the lock on her username record, all other users who wish to also access the same username record must wait.

While coordination is sufficient to enforce RC isolation (via locking), is it strictly necessary? We can apply the principle of invariant confluence here: informally, if each individual user never observes non-final writes (i.e., each individual read-write history is valid under RC isolation), then their collective behavior (i.e., the "merged" histories) will not exhibit any reads of non-final writes. Thus, insofar as each individual user respects RC isolation, all users will, and so it is invariant confluent. As a result, there must be a coordination-free algorithm for enforcing RC isolation; now we must find one. One strategy is to store multiple versions of each record and mark each version with a special bit recording whether the corresponding write is a final write or not. If the database only shows users records that have been marked as final writes upon transaction commit, users will never observe non-final writes. Moreover, users can create versions, mark them as final, and read versions marked as final entirely concurrently, on separate copies of state, achieving our goal of a coordination-free implementation of RC isolation.

While this example is relatively simple, it demonstrates the power of rethinking legacy implementations of important semantics. In Chapter 4, we demonstrate how a slightly modified protocol achieves orders of magnitude improvements in performance on modern cloud computing infrastructure.
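To make the multi-versioning strategy above concrete, the following minimal sketch is our own illustration (not the prototype evaluated in Chapter 4) of a single replica that buffers writes until commit and exposes only versions marked as final, so readers never observe non-final writes.

```python
# Minimal sketch (ours; not the prototype evaluated in Chapter 4) of a single
# replica implementing the multi-versioning idea above: writes are buffered and
# only versions marked final at commit become visible to readers.

class ReadCommittedReplica:
    def __init__(self):
        self.committed = {}   # key -> latest final (committed) value
        self.pending = {}     # txn_id -> {key: value}, buffered non-final writes

    def write(self, txn_id, key, value):
        # Buffer the write privately; no other reader can see it yet.
        self.pending.setdefault(txn_id, {})[key] = value

    def read(self, key):
        # Readers only ever observe versions marked as final.
        return self.committed.get(key)

    def commit(self, txn_id):
        # Mark the transaction's versions as final, making them visible.
        for key, value in self.pending.pop(txn_id, {}).items():
            self.committed[key] = value

    def abort(self, txn_id):
        self.pending.pop(txn_id, None)

replica = ReadCommittedReplica()
replica.write("t1", "username", "Sally")
replica.write("t1", "username", "Sal")
assert replica.read("username") is None     # the non-final "Sally" is never visible
replica.commit("t1")
assert replica.read("username") == "Sal"    # only the final write is observed
```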

In the remainder of this chapter, we outline the key contributions of this work and describe the structure of the remainder of this thesis.

[Figure 1.2 (diagram): enforcement levels, example guarantees, and techniques. Open source codebases (Redmine, Fedena, OpenCongress, Diaspora, ShareTribe); database constraints (uniqueness, primary key, foreign key, sequentiality, ANSI SQL, abstract data types, OLTPBenchmark); isolation guarantees over read/write traces (Read Uncommitted, Read Committed, Repeatable Read, Cut Isolation, Atomic Visibility). Techniques include feral validations, weak isolation, RAMP transactions (RAMP-Fast, RAMP-Small, RAMP-Hybrid), nested atomic transactions, write buffering, and sticky routing, all grounded in the Invariant Confluence principle over transactions, invariants, an abstract execution model, arbitrary database state, and a merge operator.]

Figure 1.2: In this thesis, we develop the principle of Invariant Confluence, a necessary and sufficient condition for safe, convergent, coordination-free execution, and apply it to a range of application domains at increasing levels of abstraction: database isolation, database constraints, and safety properties from modern database-backed applications. Each guarantee that is invariant confluent is guaranteed to have at least one coordination-free implementation; we investigate the design of several implementations in this work, which operate at the database infrastructure tier (Figure 1.1).

1.2 Primary Contributions

In this section, we summarize the primary contributions of this work.

Coordination-free execution and Invariant Confluence. We identify coordination-free execution as fundamental to available, low latency, and scalable system execution. To do so, we present a system model and show a direct correspondence between these desirable system properties and the ability to guarantee non-trivial progress in an asynchronous network.

We subsequently develop the invariant confluence property, a necessary and sufficient condition for ensuring that a safety guarantee can be maintained under coordination-free, convergent execution of a given set of transactions. This is the first necessary and sufficient condition for these properties that we have encountered. In effect, invariant confluence lifts traditional partitioning arguments from distributed systems from the domain of event traces to the domain of arbitrary application logic and constraints over data. We use this property to examine and optimize a range of semantics from database systems (Figure 1.2), which we describe in turn below.

Coordination-free Isolation Guarantees. While transactional guarantees are often associated with serializability and its necessarily coordinated implementations, most databases in practice provide weaker forms of transactional isolation, or variants of admissible read-write interleavings (Figure 1.2). In a survey of 18 "ACID" and "NewSQL" databases, we find that only three offered serializability by default and only nine offered it as an option at all. We investigate the weaker models offered by these databases and show that many are invariant confluent. For example, common isolation models such as Read Committed (default in eight databases) and ANSI Repeatable Read isolation are invariant confluent. Many guarantees, like Read Committed, arbitrate visibility but not concurrency (much like causal consistency). The resulting taxonomy is one of the first unified treatments of weak isolation, distributed register consistency, session guarantees, and coordination requirements in the literature. Using these results, we implement a coordination-free prototype providing weak isolation guarantees that achieves latency reductions of up to two orders of magnitude when deployed across datacenters.

In addition to investigating existing isolation guarantees, we examine new guarantees. A number of applications leverage database-provided functionality for enforcing referential integrity, secondary indexing, and multi-get and multi-put operations, yet there is no coordination-free mechanism for enforcing them. Accordingly, a number of practitioner reports on systems (e.g., from Facebook, Google, LinkedIn, and Yahoo!) specifically highlight these use cases as scenarios where, lacking a coordination-free algorithm, architects explicitly sacrificed correctness for latency and availability. In response, we develop a new, invariant confluent isolation model called Read Atomic (RA) isolation and a set of scalable, coordination-free algorithms called Read Atomic Multi-Partition (RAMP) Transactions for addressing the isolation requirements of these use cases. Informally, RA guarantees atomic visibility of updates: once one write from a transaction is visible, all writes will be visible. Existing protocols for achieving RA isolation (or stronger), such as distributed locking, couple atomic visibility with mutual exclusion; RAMP achieves the former without the cost of the latter. RAMP uses limited multi-versioning to allow clients to operate concurrently over the same data items while ensuring that readers can correctly detect and repair incomplete writes. Across a range of workloads (including high contention scenarios), RAMP transactions incur limited overhead and outperform existing mechanisms for achieving RA isolation while scaling linearly.
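As a rough illustration of this "detect and repair" idea, the deliberately simplified, single-copy sketch below is ours rather than the RAMP-Fast pseudocode of Chapter 5: each committed version carries its transaction timestamp and the set of sibling keys written by the same transaction, which lets a reader notice a fractured read and fetch the missing versions in a second round without blocking writers.

```python
# Simplified, single-copy illustration (ours; not the RAMP-Fast pseudocode of
# Chapter 5) of detecting and repairing fractured reads via per-version metadata.

class AtomicVisibilityStore:
    def __init__(self):
        self.versions = {}      # key -> {timestamp: (value, sibling_keys)}
        self.last_commit = {}   # key -> highest committed timestamp

    def commit_transaction(self, ts, writes):
        # writes: {key: value}. Every version carries the transaction's write
        # set; that metadata is what lets readers detect fractured reads.
        siblings = frozenset(writes)
        for key, value in writes.items():
            self.versions.setdefault(key, {})[ts] = (value, siblings)
            if ts > self.last_commit.get(key, 0):
                self.last_commit[key] = ts

    def _latest(self, key):
        ts = self.last_commit.get(key, 0)
        value, siblings = self.versions.get(key, {}).get(ts, (None, frozenset()))
        return ts, value, siblings

    def read_all(self, keys):
        # Round 1: read the latest committed version of each requested key.
        first = {k: self._latest(k) for k in keys}
        # Detect: the highest timestamp that any sibling metadata says a key
        # should reflect.
        required = {k: ts for k, (ts, _, _) in first.items()}
        for ts, _, siblings in first.values():
            for sib in siblings:
                if sib in required and ts > required[sib]:
                    required[sib] = ts
        # Round 2 (repair): fetch the specific newer versions that were missed.
        result = {}
        for k, (ts, value, _) in first.items():
            if required[k] != ts:
                value = self.versions[k][required[k]][0]
            result[k] = value
        return result

store = AtomicVisibilityStore()
store.commit_transaction(1, {"x": "x1", "y": "y1"})
store.commit_transaction(2, {"x": "x2", "y": "y2"})
print(store.read_all(["x", "y"]))   # {'x': 'x2', 'y': 'y2'}, never a mix of the two
```

In a distributed deployment the two transactions' writes would land on separate partitions, and the second round is what prevents a reader from returning x from one transaction and y from another.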

Coordination-free Database Constraints and Application Criteria. Moving from read-write traces to higher-level semantic properties (Figure 1.2), we examine the integrity constraints and invariants offered by databases today—including the foreign key constraints addressed by RAMP but also row-level check constraints, uniqueness constraints, and constraints on abstract data types—and classify each as invariant confluent or not. Many are invariant confluent, so we subsequently apply this classification to a number of database workloads from the OLTPBenchmark suite [98]. Many invariants in these workloads pass the invariant confluence test as well. For example, in the TPC-C benchmark, the gold standard for transaction processing performance, ten of twelve invariants are invariant confluent under the workload. Given this classification, we develop a database prototype and coordination-avoiding execution strategy for TPC-C that, on a cluster of 200 servers, achieves a 25-fold improvement in throughput over the prior best result (over 12.7M New-Order transactions per second).

While we are able to find invariant confluent database integrity constraints and benchmarks, are real applications invariant confluent? Moreover, are invariants a practical choice of correctness criteria? To answer these questions, we examine open source web applications to inspect their safety properties (Figure 1.2). We find that popular web programming frameworks—including Ruby on Rails, Django, and Spring/Hibernate—have introduced support for validations, or declarative, application-level invariants. We subsequently analyze the use of validations in 67 of the most popular Ruby on Rails applications on GitHub and find that, in fact, validation usage is fourteen times more common than the use of database transactions. Moreover, more than 86.9% (of over 9900 invariants) are invariant confluent. However, the remainder are not invariant confluent and therefore require coordination. For these invariants, we profile the incidence of constraint violations both with and without validations. In addition to demonstrating the applicability of invariant confluence, this study exposes a surprising practitioner trend away from transactions towards using invariants via validations.

In all, these results highlight a widespread potential for coordination-avoiding database system design within both classic and emerging database-backed applications. While coordination cannot always be avoided, in many common scenarios, we find it is possible to guarantee application safety within the database while also providing coordination-free execution. Our resulting database system prototypes and their regular order-of-magnitude speedups compared to conventional approaches evidence the power of this latent potential for coordination-avoiding execution.

1.3 Outline and Previously Published Work

The remainder of this dissertation proceeds as follows. Chapter 2 provides background on coordination and defines our system model. Chapter 3 presents the Invariant Confluence property. Chapters 4, 5, and 6 examine the coordination requirements and coordination-free implementations of transaction isolation, database functionality such as indexes, and constraints. Chapter 7 discusses related work, and Chapter 8 concludes with a discussion of lessons learned, topics for future work, and closing thoughts.

Chapter 2 includes material from several previous publications [32, 34, 38, 40, 42–44]. Chapter 3 revises material from [34]. Chapter 4 revises [32] and [36]. Chapter 5 revises [37] and includes material from [35]. Chapter 6 revises material from [34] and [33] and includes material from [39]. Chapter 8 includes material from [86, 122].

Chapter 2

Coordination: Concepts and Costs

In this chapter, we further examine the concept of coordination and why we seek to avoid it. We discuss traditional uses of coordination to maintain correct data in database systems and measure its costs in a modern distributed environment, costs that we will attempt to circumvent in the remainder of this thesis. We also present our formal system model for replicated databases.

2.1 Coordination and Correctness in Database Systems

As repositories for application state, databases are traditionally tasked with maintaining, informally, "correct" data—that is, data that obey some semantic guarantees about their integrity—on behalf of users. Thus, during concurrent access to data, a database ensuring data correctness must decide which user operations can execute simultaneously and which, if any, cannot. Informally, we say that two operations within a database must coordinate if they cannot execute concurrently on independent copies of the database state (Section 2.3 provides a more formal treatment).

By example. Consider a database-backed payroll application that maintains information about employees and departments within a small business. In the application, a.) each employee is assigned a unique ID number and b.) each employee belongs to exactly one department. A database ensuring correctness must maintain these application-level semantic guarantees (or data invariants) on behalf of the application (i.e., without application-level intervention). In our payroll application, this is non-trivial: for example, if the application attempts to simultaneously create two employees by examining the set of currently assigned IDs and choosing an unassigned ID for each new employee, then the database must ensure the employees are assigned distinct IDs.
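A small sketch of this race, using our own hypothetical values, shows why unique ID assignment is non-trivial without coordination: two clients each pick one more than the highest ID they observe on their own copy of the state, and the merged result silently violates uniqueness.

```python
# Hypothetical sketch (ours) of the ID-assignment race described above: two
# clients each choose "one more than the highest assigned ID" against their own
# copy of the employee table, and the merged result violates uniqueness.

def next_id(employee_ids):
    return max(employee_ids, default=0) + 1

assigned = {1, 2, 3}                 # IDs already assigned

# Each client works on an independent copy of the database state.
copy_a, copy_b = set(assigned), set(assigned)
copy_a.add(next_id(copy_a))          # client A assigns ID 4
copy_b.add(next_id(copy_b))          # client B also assigns ID 4

merged = copy_a | copy_b
# Both clients chose the same "unique" ID, so the merged set holds 4 IDs rather
# than the 5 distinct IDs the application intended: uniqueness was violated.
assert len(merged) == len(assigned) + 1
```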

Serializability and conflicts. The classic answer to maintaining application-level invariants is to use serializable isolation: execute each user's ordered sequence of operations, or transactions, such that the end result is equivalent to some sequential execution [53, 123, 215]. If each transaction preserves correctness in isolation, composition via serializable execution ensures correctness. In our payroll example, the database would execute the two employee creation transactions such that one transaction appears to execute after the other. The second transaction would observe the ID that the first transaction chose, thus avoiding duplicate ID assignment.

While serializability is a powerful abstraction, it comes with a cost: for arbitrary transactions (and for all implementations of serializability's more conservative variant—conflict serializability), any two operations to the same item—at least one of which is a write—will result in a read/write conflict. Under serializability, these conflicts require coordination: to provide a serial ordering, conflicts must be totally ordered across transactions, and so transactions cannot proceed entirely independently [53]. As a canonical example, given an initial database state containing two variables x and y, where {x = ⊥, y = ⊥}, if transaction T1 reads from y and writes x = 1 and T2 reads from x and writes y = 1, then a database cannot execute T1 and T2 independently on separate copies of state while maintaining serializability [32, 92].¹

Because serializable semantics require coordination, all database implementations that provide serializability will coordinate. For example, a database could use two-phase locking [125] to provide serializability: in a simplified protocol, the first time a transaction accesses a data item x, the transaction can acquire an exclusive lock on x; once the transaction has completed all of its operations on the database, it can release all of its locks. In this protocol, locks form a point of coordination between concurrent transactions: while one transaction holds an exclusive lock on an item, other transactions that wish to operate on the same item cannot make progress.
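For concreteness, the following minimal, single-process sketch is our own illustration (not the thesis's benchmark implementation) of the simplified two-phase locking protocol described above: a transaction acquires an exclusive per-item lock on first access and releases all of its locks only after it finishes.

```python
# Minimal single-process sketch (ours) of the simplified two-phase locking
# protocol described above: exclusive locks acquired on first access, released
# only at commit.

import threading

class LockManager:
    """Per-item exclusive locks shared by all transactions."""
    def __init__(self):
        self.locks = {}
        self.table_lock = threading.Lock()

    def lock_for(self, item):
        with self.table_lock:
            return self.locks.setdefault(item, threading.Lock())

class Transaction:
    def __init__(self, lock_manager, store):
        self.lm = lock_manager
        self.store = store
        self.held = []   # growing phase: locks are acquired and never released early

    def write(self, item, value):
        lock = self.lm.lock_for(item)
        if lock not in self.held:
            lock.acquire()        # blocks while another transaction holds the item
            self.held.append(lock)
        self.store[item] = value

    def commit(self):
        # Shrinking phase: release everything only after all operations finish.
        for lock in self.held:
            lock.release()
        self.held.clear()

store = {}
manager = LockManager()
txn = Transaction(manager, store)
txn.write("x", 1)   # while txn holds x, any other transaction writing x must wait
txn.write("y", 1)
txn.commit()
print(store)        # {'x': 1, 'y': 1}
```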

Semantics, Sufficiency, and Necessity. It is often convenient to reason about semantic guarantees instead of concrete implementations of those guarantees. Instead of examining concurrency control mechanisms one-by-one (e.g., multi-version concurrency control, optimistic concurrency control, pre-scheduling, and so on), we can unequivocally determine—as in the case of serializability—that all implementations of a given semantics require coordination to enforce.

¹ This read-write pattern might arise in our ID assignment scenario: T1 attempts to reserve ID 1 for its user, x, while T2 attempts to reserve ID 1 for its user, y. If the two transactions run concurrently on separate copies of the database, neither T1 nor T2 would observe the other's updates. We present this example in terms of reads and writes because it is standard and to highlight the fact that serializability is enforced by examining read/write access to variables, not by examining the program semantics.

However, the converse does not hold: just because a given implementation of a guarantee uses coordination does not mean that the guarantee necessarily requires coordination for enforcement. For example, even though a database may employ serializability to enforce an invariant, the invariant may not require coordination for correct enforcement. There may or may not be ways to enforce the invariants without coordination. In general, we can always coordinate. The core question is whether coordination is necessary for a given invariant or semantic property.

2.2 Understanding the Costs of Coordination

Why worry about coordination? Peter Deutsch starts his classic list of “Fallacies of Distributed Computing” with two concerns fundamental to distributed database systems: “1.) The network is reliable. 2.) Latency is zero” [97]. In a distributed setting, network failures may prevent database servers from communicating, and, in the absence of failures, communication is slowed by factors like physical distance, network congestion, and routing. Thus, as we discuss here, the costs of coordination can be observed across three primary dimensions: increased latency, decreased throughput, and, in the event of partial failures, unavailability. In this section, we examine these costs in detail.

2.2.1 Latency

Even with fault-free networks, distributed systems face the challenge of network communication latency. If two operations running on two separate servers must coordinate, then the latency experienced by the operations will be bounded from below by the amount of time required to exchange information between the sites. In this section, we quantify network latencies of modern cloud computing environments. These are often large and may exceed hundreds of milliseconds in a geo-replicated, multi-datacenter context. Fundamentally, the speed at which two servers can communicate is (according to modern physics) bounded by the speed of light. In the best case, two servers on opposite sides of the Earth communicating via a hypothetical link through the planet's core would require a minimum 85.1 ms round-trip time (RTT; 133.7 ms if sent at surface level). As services are replicated to multiple, geographically distinct sites, the cost of communication between replicas increases. In actual server deployments, messages travel slower than the speed of light due to routing, congestion, and computational overheads. To illustrate the behavior of intra-datacenter, inter-datacenter, and inter-planetary networks, we performed a measurement study of network behavior on Amazon's Elastic Compute Cloud (EC2), a widely used public compute cloud. We measured one week of ping times (i.e., round-trip times, or RTTs) between all seven EC2 geographic "regions," across three "availability zones" (closely co-located datacenters), and within a single "availability zone" (datacenter), at a granularity of 1s.

We summarize the results of our network measurement study in Table 2.1. On average, intra-datacenter communication (Table 2.1a) is between 1.82 and 6.38 times faster than across geographically co-located datacenters (Table 2.1b) and between 40 and 647 times faster than across geographically distributed datacenters (Table 2.1c). The cost of wide-area communication exceeds the speed of light: for example, while the speed-of-light RTT from São Paulo to Singapore is 106.7 ms, ping packets incur an average 362.8 ms RTT (95th percentile: 649 ms). As shown in Figure 2.1, the distribution of latencies varies between links, but the overall trend is clear: coordination may lead to substantial delays. Quantifying and minimizing communication delays is also an active area of research in the networking community [230].

(a) Within us-east-b availability zone:

         H2     H3
  H1     0.55   0.56
  H2            0.50

(b) Across us-east availability zones:

         C      D
  B      1.08   3.12
  C             3.57

(c) Cross-region (CA: California, OR: Oregon, VA: Virginia, TO: Tokyo, IR: Ireland, SY: Sydney, SP: São Paulo, SI: Singapore):

         OR     VA     TO     IR     SY     SP     SI
  CA     22.5   84.5   143.7  169.8  179.1  185.9  186.9
  OR            82.9   135.1  170.6  200.6  207.8  234.4
  VA                   202.4  107.9  265.6  163.4  253.5
  TO                          278.3  144.2  301.4  90.6
  IR                                 346.2  239.8  234.1
  SY                                        333.6  243.1
  SP                                               362.8

Table 2.1: Mean RTT times on EC2 (in milliseconds)

[Figure 2.1 (plot): CDF of round-trip times (ms, log scale) for the slowest intra-availability zone (east-b:east-b), inter-availability zone (east-c:east-d), and cross-region (CA:OR, SI:SP) links.]

Figure 2.1: CDF of round-trip times for slowest inter- and intra-availability zone links compared to cross-region links.
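As a back-of-the-envelope check of the latency floors quoted above, the arithmetic below is ours; the distances are approximate (Earth's equatorial diameter and half-circumference, and a roughly 16,000 km São Paulo to Singapore path), and real networks add routing and processing delays on top of these bounds.

```python
# Back-of-the-envelope check (ours) of the speed-of-light latency floors quoted
# above; distances are approximate, so treat the results as lower bounds.

SPEED_OF_LIGHT_KM_S = 299_792           # vacuum speed of light
EARTH_DIAMETER_KM = 12_756              # straight line through the core (equatorial)
EARTH_HALF_CIRCUMFERENCE_KM = 20_037    # antipodal distance along the surface
SAO_PAULO_SINGAPORE_KM = 15_990         # approximate great-circle distance

def rtt_ms(distance_km):
    # A round trip covers the distance twice.
    return 2 * distance_km / SPEED_OF_LIGHT_KM_S * 1000

print(round(rtt_ms(EARTH_DIAMETER_KM), 1))            # 85.1 ms, through the core
print(round(rtt_ms(EARTH_HALF_CIRCUMFERENCE_KM), 1))  # 133.7 ms, along the surface
print(round(rtt_ms(SAO_PAULO_SINGAPORE_KM), 1))       # 106.7 ms vs. the measured 362.8 ms
```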

2.2.2 Throughput and Scalability

Coordination also affects throughput. If a transaction takes d seconds to execute, the maximum throughput of coordinating transactions operating on the same items under a general-purpose (i.e., interactive, non-batched) transaction model is limited by 1/d. Any operations that arrive in excess of this limit will also have to wait. Within a single server, delays can be small, permitting tens to hundreds of thousands of conflicting transactions per item per second. In a partitioned database system, where different items are located on different servers, or in a replicated database system, where the same item is located (and is available for operations) on multiple servers, the cost increases: delay is lower-bounded by network latency. On a local area network, delay may vary from several microseconds (e.g., via Infiniband or RDMA) to several milliseconds on today's cloud infrastructure, permitting anywhere from a few hundred transactions to a few hundred thousand transactions per second. However, as we have seen, on a wide-area network, delay is lower-bounded by the speed of light (worst-case on Earth, around 75 ms, or about 13 operations per second [32]). Under network partitions [41], as delay tends towards infinity, these penalties lead to unavailability [32, 118]. In contrast, operations executing without coordination can proceed concurrently and will not incur these penalties.

To further understand the costs of coordination, we performed two sets of measurements—one using a database prototype and one using traces from prior studies. We first compared the throughput of coordinated and coordination-free transaction execution. Our basic workload is simple: a set of transactions read and increment a set of integers stored on separate servers.

First, we implemented two coordinated algorithms: traditional two-phase locking and an optimized variant of two-phase locking, both on in-memory data. In two-phase locking, each client acquires locks one at a time, requiring a full round trip time (RTT) for every lock request. For an N-item transaction, locks are held for 2N + 1 message delays (the +1 is due to broadcasting the unlock/commit command to the participating servers). Our optimized two-phase locking only uses one message delay (half RTT) to perform each lock request: the client specifies the entire set of items it wishes to modify at the start of the transaction (in our implementation, the number of items in the transaction and the starting item ID), and, once a server has updated its respective item, the server forwards the remainder of the transaction to the server responsible for the next write in the transaction (similar to linear commit protocols [53]). For an N-item transaction, locks are only held for N message delays (the final server both broadcasts the unlock request to all other servers and also notifies the client), while a 1-item transaction does not require distributed locking.

To avoid deadlock (which we found was otherwise common in this high-contention microbenchmark), our implementation totally orders any lock requests according to item and executes them sequentially (e.g., lock item 1 then lock item 2 and so on). Our implementation also piggybacks operation commands along with lock requests, further avoiding message delays. Since we are only locking one item per server, our microbenchmark code does not use a dynamic lock manager and instead associates a single lock with each item; this further lowers locking overheads. Our coordination-free transaction implementation is simpler: it uses no locks and simply performs the increment operation across each transaction in parallel, without acquiring locks.

We partitioned eight in-memory items (integers) across eight cr1.8xlarge Amazon EC2 instances with clients located on a separate set of cr1.8xlarge instances. Figure 2.2 depicts results for the coordination-free implementation and the optimized two-phase locking case. Unsurprisingly, two-phase locking performs worse than optimized two-phase locking, but both incur substantial penalties due to coordination delay over the network. With single-item, non-distributed transactions, the coordination-free implementation achieves, in aggregate, over 12M transactions per second and bottlenecks on physical resources—namely, CPU cycles. In contrast, the lock-based implementation achieves approximately 1.1M transactions per second: it is unable to fully utilize all 32 multi-core processor contexts due to lock contention. For distributed transactions, coordination-free throughput decreases linearly (as an N-item transaction performs N writes), while the throughput of coordinating transactions drops by over three orders of magnitude.

[Figure 2.2 plot: throughput (txns/s, log scale from 10^1 to 10^8) versus number of items per transaction (1–7) for the Coordination-Free, 2PL, and Optimized 2PL implementations.]

Figure 2.2: Microbenchmark performance of coordinated and coordination-free execution of transactions of varying size writing to eight items located on eight separate multi-core servers.

two-phase commit (C-2PC) requires N (parallel) coordinator-to-server RTTs, while decentralized two-phase commit (D-2PC) requires N (parallel) server-to-server broadcasts, or N² messages. Our simulation is straightforward, but we make several optimizations to improve the throughput of each algorithm. First, we assume that transactions are pipelined, so that each server can prepare immediately after it has committed the prior transaction. Second, our pipelines are ideal in that we do not consider deadlock: only one transaction prepares at a given time. Third, we do not consider the cost of local processing of each transaction: throughput is determined entirely by communication delay.

Figure 2.3 shows that, in the local area, with only two servers (e.g., two replicas or two coordinating operations on items residing on different servers), throughput is bounded by 1125 transactions per second (via D-2PC; 668 transactions per second via C-2PC). Across eight servers, D-2PC throughput drops to 173 transactions per second (respectively, 321 for C-2PC) due to long-tailed latency distributions. In the wide area, the effects are more severe: if coordinating from Virginia to Oregon, D-2PC message delays are 83 ms per commit, allowing 12 operations per second. If coordinating between all eight EC2 availability zones, throughput drops to slightly over 2 transactions per second in both algorithms.

These results should also be unsurprising: coordinating—especially over the network—can incur serious throughput penalties. In contrast, coordination-free operations can execute without incurring these costs. The costs of actual workloads can vary: if coordinating

operations are rare, concurrency control will not be a bottleneck. For example, a serializable database executing transactions with disjoint read and write sets can perform as well as a non-serializable database without compromising correctness [140]. However, as these results demonstrate, minimizing the amount of coordination and its degree of distribution can have a tangible impact on performance, latency, and availability [8, 32, 118]. While we study real applications in Section 6.3, these measurements highlight the worst-case coordination costs on modern hardware.

While this study is based solely on reported latencies, deployment reports corroborate our findings. For example, Google's F1 uses optimistic concurrency control over the WAN with commit latencies of 50 to 150 ms. As the authors discuss, this limits throughput to between 6 and 20 transactions per second per data item [207]. Megastore's average write latencies of 100 to 400 ms suggest similar throughputs to those that we have predicted [45]. Again, aggregate throughput may be greater, as multiple 2PC rounds for disjoint sets of data items may safely proceed in parallel. However, worst-case access patterns—in effect, serial access to data items—will greatly limit throughput and scalability. Adding more servers will not assist; parallel processing is ineffective for workloads that must proceed serially.
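As a sanity check on the arithmetic above, the serial-commit throughput ceiling is simply the reciprocal of the per-commit coordination delay. The following sketch (illustrative Python, not part of the measurement harness used in this chapter) reproduces the figures quoted in the text:

    # Back-of-the-envelope bound: if conflicting transactions on an item must
    # commit serially and each commit requires delay_s seconds of
    # coordination, per-item throughput is at most 1/delay_s.

    def max_conflicting_throughput(delay_s: float) -> float:
        """Upper bound on conflicting transactions per second for one item."""
        return 1.0 / delay_s

    # Wide-area round trip of roughly 75 ms: about 13 conflicting txns/s per item.
    print(max_conflicting_throughput(0.075))
    # Reported F1 commit latencies of 50-150 ms: roughly 6-20 txns/s per item.
    print(max_conflicting_throughput(0.150), max_conflicting_throughput(0.050))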

2.2.3 Availability and Failures

According to James Hamilton, Vice President and Distinguished Engineer on the Amazon Web Services team, "network partitions should be rare but net gear continues to cause more issues than it should" [128]. Anecdotal evidence confirms Hamilton's assertion. In April 2011, a network misconfiguration led to a twelve-hour series of outages across the Amazon EC2 and RDS services [29]. Subsequent misconfigurations and partial failures, such as another EC2 outage in October 2012, have led to full site disruptions for popular web services like Reddit, Foursquare, and Heroku [99]. At global scale, hardware failures—like the 2011 outages in Internet backbones in North America and Europe due to a router bug [204]—and misconfigurations like the BGP faults in 2008 [176] and 2010 [178] can cause widespread partitioning behavior.

Many of our discussions with practitioners—especially those operating on public cloud infrastructure—as well as reports from large-scale operators like Google [93] confirm that partition management is an important consideration for service operators today. System designs that do not account for partition behavior may prove difficult to operate at scale: for example, less than one year after its announcement, Yahoo!'s PNUTS developers explicitly added support for weaker, highly available operation. The engineers explained that "strict adherence [to strong consistency] leads to difficult situations under network partitioning or server failures... in many circumstances, applications need a relaxed approach" [181].

[Figure 2.3 plots. a.) Maximum serializable transaction throughput (txns/s) for D-2PC and C-2PC over a local-area network [230], for 2–8 servers participating in 2PC. b.) Maximum serializable transaction throughput over a wide-area network [32], with transactions originating from a coordinator in Virginia (VA; OR: Oregon, CA: California, IR: Ireland, SP: São Paulo, TO: Tokyo, SI: Singapore, SY: Sydney).]

Figure 2.3: Atomic commitment latency as an upper bound on conflicting serializable transaction throughput over local-area and wide-area networks.

Several recent studies rigorously quantify partition behavior. A 2011 study of several Microsoft datacenters observed over 13,300 network failures with end-user impact, with an estimated median of 59,000 packets lost per failure. The study found a mean of 40.8 network link failures per day (95th percentile: 136), with a median time to repair of around five minutes (and up to one week). Perhaps surprisingly, provisioning redundant networks only reduces the impact of failures by up to 40%, meaning network providers cannot easily curtail partition behavior [119]. A 2010 study of over 200 wide-area routers found an average of 16.2–302.0 failures per link per year, with an average annual downtime of 24–497 minutes per link per year (95th percentile at least 34 hours) [223]. In HP's managed enterprise networks, WAN, LAN, and connectivity problems account for 28.1% of all customer support tickets, while 39% of tickets relate to network hardware. The median incident duration for highest-priority tickets ranges from 114–188 minutes and up to a full day for all tickets [222]. Other studies confirm these results, showing a median time between connectivity failures over a WAN of approximately 3000 seconds with a median time to repair between 2 and 1000 seconds [175], as well as frequent path routing failures on the Internet [150]. A recent, informal report by Kingsbury and Bailis catalogs a host of additional practitioner reports [40]. Not surprisingly, isolating, quantifying, and accounting for these network failures is an area of active research in the networking community [165].

2.2.4 Summary: Costs

Coordination is expensive. Empirical measurements on existing infrastructure confirm its latency and throughput costs, and a host of reports describe the difficulty of dealing with failures. Given an absence of quantitative failure data, we focus primarily on the performance-related aspects of coordination in this dissertation. However, our pursuit of coordination avoidance benefits all three dimensions.

2.2.5 Outcome: NoSQL, Historical Context, Safety and Liveness

The above costs have become especially pressing over the past decade, leading to a schism among distributed database systems designers. An increasing number of modern applications demand low latency and available operation at unprecedented scale. For example, Facebook's RocksDB reportedly handles nine billion queries per second [182], while increased latency may have a marked impact on web application engagement and revenue [162, 163, 203]. As a result, the database market has been inundated by a large number of data stores promising coordination-free execution (see Chapter 7). This market shift is perhaps the most significant development in transaction processing over the past decade.

While these stores, often collectively labeled "NoSQL," provide admirable scalability and behavioral properties, they infrequently provide useful semantics for developers. The common denominator among these semantics is a particular property called eventual consistency: informally, if no additional updates are made to a given data item, all reads to that item will eventually return the same value [224]. While this is a useful property, it leaves some unfortunate holes. First, what is the eventual state of the database? A database always returning the value 42 is eventually consistent, even if 42 was never written. The database can eventually choose (i.e., converge to) an arbitrary value. Second, what values can be returned before the eventual state of the database is reached? If replicas have not yet converged, what guarantees can be made about the data returned?

To more precisely explain why eventual consistency is not strong enough, we consider two concepts from distributed systems. A safety property guarantees that "nothing bad happens": for example, every value that is read was, at some point in time, written to the database. A liveness property guarantees that "something good eventually happens": for example, all requests eventually receive a response [17]. The difficulty with eventual consistency is that it makes no safety guarantees—eventual consistency is purely a liveness property. Something good eventually happens—eventually all reads return the same value—but there are no guarantees with respect to what value is eventually returned, and any value can be returned in the meantime. For truly meaningful guarantees, safety and liveness properties need to be taken together: without one or the other, systems can provide trivial implementations that deliver less-than-satisfactory results.

In practice, eventually consistent systems often provide "strongly consistent" (e.g., linearizable) behavior quite frequently; for example, using production latency data from LinkedIn's eventually consistent database clusters, we found that, under common deployment settings, 99.9% of reads delivered the last completed write within 45.5 ms of the write's completion [42–44]. However, given the possibility of "inconsistent" behavior, programmers must either explicitly account for this possibility or otherwise "code around" these anomalies (Chapter 7). Here, we wish to guarantee safety under all circumstances. Our goal in this work is to preserve convergence and coordination-free execution while simultaneously preserving safety guarantees as found in modern applications and databases. Thus, we attempt to restore safety to this class of scalable databases without compromising their benefits. In the next section, we formally define these concepts.
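To make the "database that always returns 42" example concrete, the toy register below (a deliberately trivial sketch, not any real system's API) satisfies the liveness-style reading of eventual consistency while providing no safety whatsoever:

    # A register that is "eventually consistent" in the trivial sense described
    # above: all reads return the same value, but that value (42) may never
    # have been written, so no safety property relates reads to writes.

    class TrivialEventuallyConsistentRegister:
        def write(self, value):
            pass          # ignore all writes

        def read(self):
            return 42     # always "converged," never meaningful

    r = TrivialEventuallyConsistentRegister()
    r.write(7)
    assert r.read() == 42   # liveness (convergence) without safety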

Property                     Informal Effect
Global validity              Invariants hold over committed states
Transactional availability   Non-trivial response guaranteed
Convergence                  Updates are eventually reflected in shared state
Coordination-freedom         No synchronous coordination

Table 2.2: Key properties of the system model and their informal effects.

2.3 System Model

In this section, we present a more formal model for transaction execution and define our desirable criteria for transaction execution, including coordination-free execution. We begin with an informal description of our model and provide more formal definitions in the remainder of the section. Informally, in our model, transactions operate over logical replicas, or independent snap- shots of database state. Transaction writes are applied at one or more replicas initially when the transaction commits and then are integrated into other replicas asynchronously via a “merge” operator that incorporates those changes into the snapshot’s state. Given a set of invariants describing valid database states, as Table 2.2 outlines, we seek to understand when it is possible to ensure invariants are always satisfied (global validity) while guaran- teeing a response (transactional availability) and the existence of an eventually agreed upon common state shared between replicas (convergence), all without communication during transaction execution (coordination-freedom). This model need not directly correspond to a given implementation and may not even correspond to the operation of a distributed sys- tem, as it can be implemented via multi-versioning. Rather, it serves as a useful abstraction. The remainder of this section defines these concepts.

Databases. We represent a state of a database as a set D of unique versions of data items located on an arbitrary set of database servers, where each version is located on at least one server. Thus, each server contains a local database that is a subset of the database located

on all servers (the global database). We will denote version i of item x as xi and use D to denote the set of possible database states—that is, the set of sets of versions. The global database is initially populated by an initial state D0 (typically but not necessarily empty).

Transactions, Replicas, and Merging. Application clients submit requests to the database in the form of transactions, or ordered groups of operations on data items. Each transaction operates on a logical replica, or set of versions of the items mentioned in the transaction. At the beginning of the transaction, the replica reflects a subset of the global database state and is formed from all of the versions of the relevant items from the local databases

of one or more physical servers that are contacted during transaction execution. As the transaction executes, it may add versions (of items in its writeset) to its replica. Thus, we define a transaction T as a transformation on a replica: T : D → D. We treat transactions as opaque transformations that can contain writes (which add new versions to the replica's set of versions) or reads (which return a specific set of versions from the replica). Upon completion, each transaction can commit, signaling success, or abort, signaling failure. Upon commit, the replica state is subsequently merged (⊔ : D × D → D) into the local database of at least one server. We require that the merged effects of a committed transaction will eventually become visible to other transactions—that is, its versions will be present within those transactions' replicas—that later begin execution on the same server.2 Over time, effects eventually propagate to other servers, again through the use of the merge operator. Though not strictly necessary (see "Alternative Merges" below), we assume the merge operator is commutative, associative, and idempotent [19, 205] and that, for all

states Di, D0 ⊔ Di = Di. In our initial model, we define merge as set union of the versions contained at different servers. For example, if server Rx = {v} and Ry = {w}, then Rx ⊔ Ry = {v, w}.

In effect, each transaction can modify its replica state without modifying any other concurrently executing transactions' replica state. Replicas therefore provide transactions with partial "snapshot" views of the global database (that we will use to simulate concurrent executions, similar to revision diagrams [68]). Importantly, two transactions' replicas do not necessarily correspond to two physically separate servers; rather, a replica is simply a partial "view" over the global state of the database. For now, we assume advance knowledge of all transactions to be submitted to the system.
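The following minimal sketch renders these definitions in code under the set-union merge just described; the class and function names are illustrative devices, not part of the formal model or of any prototype:

    # A database state is a set of uniquely identified versions; a transaction
    # is an opaque transformation on a replica (a set of versions); merge is
    # set union -- commutative, associative, and idempotent, with the empty
    # initial state D0 as an identity.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Version:
        item: str       # data item, e.g., "x"
        stamp: int      # unique version identifier
        value: object   # payload

    def merge(d1: frozenset, d2: frozenset) -> frozenset:
        """Merge operator: set union of version sets."""
        return d1 | d2

    def run_transaction(replica: frozenset, txn) -> frozenset:
        """Apply a transaction: a function from replica state to replica state."""
        return txn(replica)

    D0 = frozenset()              # initial (empty) global state
    assert merge(D0, D0) == D0    # D0 acts as an identity for merge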

Invariants. To determine whether a database state is valid according to application cor- rectness criteria, we reason about a set of declared invariants, or predicates over databases: I : D → {true, false} [105]. Thus, our invariants capture safety of data stored in the database. In our payroll example, we could specify an invariant that only one user in a database has a given ID. This invariant—as well as almost all invariants we consider—is naturally expressed as a part of the database schema (e.g., via DDL); however, our approach allows us to reason about invariants even if they are known to the developer but not de- clared to the system. Invariants directly capture the notion of ACID Consistency [53, 123], and we say that a database state is valid under an invariant I (or I-valid) if it satisfies the predicate:

Definition 1. A replica state R ∈ D is I-valid iff I(R) = true.

2This implicitly disallows servers from always returning the initial database state when they have newer writes on hand. This is a relatively pragmatic assumption but also simplifies our later reasoning about admissible executions.

[Figure 2.4 diagram: under coordination-free execution, Server 1 and Server 2 each begin at D0 = {}; T1 commits on Server 1 with replica = {x1} (yielding D1 = {x1}) while T2 commits on Server 2 with replica = {x2} (yielding D2 = {x2}); an asynchronous merge of the divergent server states then brings both servers to D3 = {x1, x2}.]

Figure 2.4: An example coordination-free execution of two transactions, T1 and T2, on two servers. Each transaction writes to its local replica, then, after commit, the servers asynchronously exchange state and converge to a common state (D3).

We require that D0 be valid under invariants. Section 3.3 provides additional discussion regarding our use of invariants.

Availability. To ensure each transaction receives a non-trivial response, we adopt the following definition of availability [32]:

Definition 2. A system provides transactionally available execution iff, whenever a client executing a transaction T can access servers containing one or more versions of each item in T, then T eventually commits unless T aborts due to an explicit abort operation in T or if committing T would violate a declared invariant over T’s replica state.

Under the above definition, a transaction can only abort if it explicitly chooses to abort itself or if committing would violate invariants over the transaction’s replica state.3

Convergence. Transactional availability allows replicas to maintain valid state indepen- dently, but it is vacuously possible to maintain “consistent” database states by letting repli- cas diverge (contain different state) forever. This guarantees safety but not liveness [202]. To force state sharing, we adopt the following definition:

3This basic definition precludes fault tolerance (i.e., durability) guarantees beyond a single server failure [32]. We can relax this requirement and allow communication with a fixed number of servers (e.g., F + 1 servers for F-fault tolerance; F is often small [95]) without affecting our results. This does not affect scalability because, as more replicas are added, the communication overhead required for durability remains constant.

Definition 3. A system is convergent iff, for each pair of servers, in the absence of new writes to the servers and in the absence of indefinite communication delays between the servers, the servers eventually contain the same versions for any item they both store.

To capture the process of reconciling divergent states, we use the previously introduced merge operator: given two divergent server states, we apply the merge operator to produce convergent state. We assume the effects of merge are atomically visible: either all effects of a merge are visible or none are. This assumption is not always necessary but it simplifies our discussion and, as we later discuss, is maintainable without coordination [32, 37]. Our treatment of convergence uses a pair-wise definition (i.e., each pair converges) [171] rather than a system-wide definition (i.e., all nodes converge). This is more restrictive than system-wide convergence but allows us to make guarantees on progress despite partitions between subsets of the servers. This also precludes the use of protocols such as background consensus, which can stall indefinitely in the presence of partitions. Like many of the other decisions in our model, this too could be relaxed if system-wide convergence is sufficient.
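As an illustration of this pair-wise notion of convergence, the short sketch below (a hypothetical helper, assuming the set-union merge above) applies anti-entropy rounds between two servers until they hold the same versions:

    # One anti-entropy round: two servers exchange state and apply the merge
    # atomically; repeated rounds drive a pair of servers to the same versions.

    def anti_entropy_round(server_a: set, server_b: set) -> None:
        merged = server_a | server_b
        server_a.clear(); server_a.update(merged)
        server_b.clear(); server_b.update(merged)

    a, b = {"x1"}, {"x2"}          # divergent local databases
    while a != b:
        anti_entropy_round(a, b)
    assert a == b == {"x1", "x2"}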

Maintaining validity. To make sure that both divergent and convergent database states are valid and, therefore, that transactions never observe invalid states, we introduce the following property:

Definition 4. A system is globally I-valid iff all replicas always contain I-valid state.

Coordination. Our system model is missing one final constraint on coordination between concurrent transaction execution:

Definition 5. A system provides coordination-free execution for a set of transactions T iff the progress of executing each t ∈ T is only dependent on t’s replica’s state (i.e., the versions of the items t reads).

That is, in a coordination-free execution, each transaction’s progress towards commit/abort is independent of other operations (e.g., writes, locking, validations) being performed on be- half of other transactions. Thus, transaction execution cannot rely on blocking synchroniza- tion or communication with other concurrently running transactions. This coordination- free execution corresponds to availability under the asynchronous network model used in distributed computing [118]: progress is guaranteed despite the possibility of indefinite de- lays between servers.

A note on partial replication. We have explicitly considered a replicated model. This was natural in allowing us to reason about distributed and multi-versioned databases. Our model captures both fully replicated databases, where all data items are located on all servers, and partially replicated databases, where data items are located on a proper subset of servers. That is, as long as clients can access some copy of the data items they are looking for, they are allowed to proceed. This allows us to reason about properties over the entire set of data items (i.e., the abstraction of a replica is a copy of the whole database), without having to describe data placement. The distinction is not relevant from the perspective of reasoning about coordination requirements: if two operations can proceed independently, irrespective of any side-effects, timing, or other information produced (or not) by the other, then the question of whether the data they operate on is stored on multiple servers or one server is irrelevant.

To illustrate this point, a coordination-free algorithm designed for a partially replicated system will trivially execute on a fully replicated system. A coordination-free algorithm designed for a fully replicated system can execute on a partially replicated system by having each client operate over its own independent set of versions until writes quiesce. The latter is not practical but illustrates our point. In practice, building efficient algorithms for partitioned databases is challenging; for example, Chapter 5 is devoted to a suite of algorithms for enforcing a common semantics in partially replicated databases.

However, in a partially replicated system implementation, checking whether an invariant is satisfied over a transaction's logical replica state prior to transaction commit may require transactions to access servers containing data that they did not directly mention in their program text. In effect, each transaction must end with an inline invariant check over any data it modifies and any data referenced by invariants over the data it modifies, to avoid the transaction committing a non-I-valid state. Thus, in a partially replicated system, this checking can incur communication overhead (e.g., to ensure that the opposite end of a foreign key relation is present)—although not always (e.g., to check for a null value upon insertion). This cost is fundamental to general-purpose invariant verification in partially replicated systems and has well-studied relatives in active database systems [14, 228] but is—as a consequence of our decision not to distinguish fully replicated and partially replicated stores—not transparent in our model.

A note on convergence. We have chosen to implement convergence using anti-entropy [96, 200] via the merge operator. In our model, servers exchange versions by shipping the side effects of transactions rather than the transactions themselves. Alternatives, such as shipping closures containing the transactions [189], are possible but are less common in practice [38, 95]. Our goal here is that, under a merge function like set union, transaction effects are propagated, and transactions are not "rewritten" as in alternative extended transaction models such as compensating actions (Chapter 7).

Alternative Merges. As discussed above, merge need not necessarily be associative, commutative, and idempotent. For example, in Bayou [189], users write arbitrary merge functions (that may not be commutative, associative, or idempotent) and the server processes determine a total ordering on merge operations in the background. As long as there is eventually connectivity between all servers, Bayou guarantees convergence. Thus, the system is convergent insofar as all servers replay all relevant merge operations, even though the merges themselves are not. Nevertheless, due to their practical implementation, we focus on associative, commutative, and idempotent merges here.

By example. Figure 2.4 illustrates a coordination-free execution of two transactions T1 and T2 on two separate, fully-replicated physical servers. Each transaction commits on its local replica, and the result of each transaction is reflected in the transaction's local server state. After the transactions have completed, the servers exchange state and, after applying the merge operator, converge to the same state. Any transactions executing later on either server will obtain a replica that includes the effects of both transactions.
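Using the illustrative model sketch from earlier in this section, the execution of Figure 2.4 can be replayed directly:

    # T1 and T2 each commit against their local replica without coordination;
    # the servers then merge their divergent states and converge to D3.

    server1, server2 = frozenset(), frozenset()              # D0 = {} on both

    T1 = lambda replica: replica | {Version("x", 1, "v1")}   # writes x1
    T2 = lambda replica: replica | {Version("x", 2, "v2")}   # writes x2

    server1 = run_transaction(server1, T1)                   # D1 = {x1}
    server2 = run_transaction(server2, T2)                   # D2 = {x2}

    converged = merge(server1, server2)                      # D3 = {x1, x2}
    assert converged == {Version("x", 1, "v1"), Version("x", 2, "v2")}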

Chapter 3

Invariant Confluence and Coordination

With a system model and goals in hand, we now address the question: when do applica- tions require coordination for correctness? The answer depends not just on an application’s transactions or on an application’s invariants. Rather, the answer depends on the combi- nation of the two under consideration. Our contribution in this section is to formulate a criterion that will answer this question for specific combinations in an implementation- agnostic manner. In this section, we focus almost exclusively on providing a general answer to this ques- tion. The remainder of this thesis is devoted to practical interpretation and application of these results.

3.1 Invariant Confluence: Criteria Defined

To begin, we introduce the central property (adapted from the constraint programming literature [100]) in our main result: invariant confluence. Applied in a transactional context, the invariant confluence property informally ensures that divergent but I-valid database states can be merged into a valid database state—that is, the set of valid states reachable by executing transactions and merging their results is closed (w.r.t. validity) under merge. In the next sub-section, we show that invariant confluence analysis directly determines the potential for safe, coordination-free execution.

We say that a database state Di is an I-T-reachable state if, given an invariant I and set of transactions T (with merge function ⊔), there exists a partially ordered set of transaction and merge function invocations that yields Di, and each intermediate state produced by transaction execution or merge invocation is also I-valid. We call these previous states ancestor states. Each ancestor state is either the initial state D0 or is I-T-reachable from D0.

We can now formalize the invariant confluence property:

Definition 6 (Invariant Confluence). A set of transactions T is invariant confluent with respect to invariant I if, for all I-T-reachable states Di, Dj with a common ancestor state, Di ⊔ Dj is I-valid.

Figure 3.1 depicts an invariant confluent merge of two I-T-reachable states, each starting from a shared, I-T-reachable state Ds. Two sequences of transactions tin ... ti1 and tjm ... tj1 each independently modify Ds. Under invariant confluence, the states produced by these sequences (Din and Djm) must be valid under merge.

We require our merged states in the invariant confluence formulation to have a common ancestor to rule out the possibility of merging states that could not have arisen from transaction execution (e.g., even if no transaction assigns IDs, merging two states that each have unique but overlapping sets of IDs could be invalid). Moreover, in practice, every non-invariant confluent set of transactions and invariants we encountered had a counterexample execution in which the divergence consisted of a single pair of transactions. However, we admit the possibility that more exotic transactions and merge functions might lead to complex behavior under non-single-step divergence, so we consider arbitrary histories here. Precisely characterizing the difference in expressive power between invariant confluent transactions under single-transaction divergence versus multi-transaction divergence is an interesting question for future work.

Invariant confluence holds for specific combinations of invariants and transactions. In our payroll database example from Section 2.1, removing a user from the database is invariant confluent with respect to the invariant that user IDs are unique. However, two transactions that remove two different users from the database are not invariant confluent with respect to the invariant that there exists at least one user in the database at all times. Later chapters discuss actual combinations of transactions and invariants (and with greater precision).
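For intuition, the diamond of Definition 6 can be checked mechanically for single-transaction divergence. The harness below is an illustrative sketch, using a simplified user-insertion example under the set-union merge rather than the payroll example above: inserting users with distinct, pre-assigned IDs is invariant confluent with respect to ID uniqueness, while two replicas that each assign "the next" ID independently are not.

    # Check the single-transaction-divergence diamond: run one transaction on
    # each side of a common ancestor and test whether the merge is I-valid.

    def diamond_ok(ancestor, t_left, t_right, invariant, merge):
        left, right = t_left(ancestor), t_right(ancestor)
        if not (invariant(left) and invariant(right)):
            return True   # a side is not I-valid, so it is not I-T-reachable
        return invariant(merge(left, right))

    def union(a, b):
        return a | b                                   # set-union merge

    def unique_ids(state):
        return len({uid for uid, _ in state}) == len(state)

    ancestor = frozenset({(1, "ada")})                 # versions: (user_id, name)

    def insert(uid, name):
        return lambda state: state | {(uid, name)}

    def assign_next_id(name):
        return lambda state: state | {(max(uid for uid, _ in state) + 1, name)}

    # Distinct, pre-assigned IDs: the merge preserves uniqueness.
    assert diamond_ok(ancestor, insert(2, "bob"), insert(3, "carol"),
                      unique_ids, union)

    # Independent "next ID" assignment: both sides pick ID 2, so the merged
    # state violates uniqueness -- this pair requires coordination.
    assert not diamond_ok(ancestor, assign_next_id("bob"),
                          assign_next_id("carol"), unique_ids, union)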

3.2 Invariant Confluence and Coordination-Free Execution

We now apply invariant confluence to our goals from Section 2.3: Theorem 1. A globally I-valid system can execute a set of transactions T with coordination- freedom, transactional availability, and convergence if and only if T is invariant confluent with respect to I.

Theorem 1 establishes invariant confluence as a necessary and sufficient condition for invariant-preserving, coordination-free execution. If invariant confluence holds, there exists

[Figure 3.1 diagram: as a precondition, two sequences of transactions ti1 ... tin and tj1 ... tjm diverge (validly) from a common initial state Ds with I(Ds) = true, producing I-valid intermediate states and final states Din and Djm; the implication is that the merge Din ⊔ Djm must also be I-valid.]

Figure 3.1: An invariant confluent execution illustrated via a diamond diagram. If a set of transactions T is invariant confluent, then all database states reachable by executing and merging transactions in T starting with a common ancestor (Ds) must be mergeable (⊔) into an I-valid database state.

a correct, coordination-free execution strategy for the transactions; if not, no possible implementation can guarantee these properties for the provided invariants and transactions. That is, if invariant confluence does not hold, there exists at least one execution of transactions on separate replicas that will violate the given invariants when servers converge. To prevent invalid states from occurring, at least one of the transaction sequences will have to forego availability or coordination-freedom, or the system will have to forego convergence. Invariant confluence analysis is independent of any given implementation, and effectively "lifts" prior discussions of scalability, availability, and low latency [8, 32, 118] to the level of application (i.e., not "I/O" [20]) correctness. This provides a useful handle on the implications of coordination-free execution without requiring reasoning about low-level properties such as physical data location and the number of servers.

We provide a full proof of Theorem 1 below but first provide a sketch. The backward direction is by construction: if invariant confluence holds, each replica can check each transaction's modifications locally, and replicas can merge independent modifications to guarantee convergence to a valid state. The forward direction uses a partitioning argument [118] to derive a contradiction: we construct a scenario under which a system cannot determine whether a non-invariant confluent transaction should commit without violating one of our desired properties (either compromising validity or availability, diverging forever, or coordinating). The structure of our argument is not novel, but the ability to discuss the effects of operations without discussing the underlying system behavior (e.g., the presence of network partitions) is useful.

To begin, we demonstrate the possibility of forcing a coordination-free system into any I-T-reachable database state via a carefully crafted sequence of partitioning behavior.

Lemma 1. Given a set of transactions T and invariants I, a globally I-valid, coordination- free, transactionally available, and convergent system is able to produce any I-T-reachable state Si.

Proof of Lemma 1. Let αi represent a partially ordered sequence of transactions Ti and merge procedure invocations Mi (call this a history) starting from S0 that produces Si.

We REPLAY the history α on a set of servers as follows. Starting from the initial state S0, we traverse the partial order according to a topological sort. Initially, we mark all operations (transactions or merges) in α as not done. We begin by executing all transactions Ti in α that have no preceding operations in α. For each transaction t ∈ Ti, we execute t on a server that is unable to communicate with any other server.1 Upon transaction commit, we merge each replica's modifications into the server. (Recall that, because Si is I-T-reachable, each transaction in α is an I-valid transformation and must either eventually commit or abort itself to preserve transactional availability, and, due to coordination-freedom, the result of the execution is dependent solely on its input—in this case, S0.) We subsequently mark each executed transaction t as done and denote the server that executed t as st.2

Next, we repeatedly select an operation oi from α that is marked as not done but whose preceding operations are all marked as done.

If oi is a transaction with preceding operation oj, performed on corresponding server sj, we partition sj and a second server si (a server containing state S0) such that sj and si can communicate with each other but cannot communicate with any other server. Under convergent execution, sj and si must eventually contain the same state (given that sj ⊔ S0 is defined in our model to be sj). Following convergence, we partition sj and si so they can no longer communicate.3 We subsequently execute oi on si. Again, oi must either commit or abort itself to preserve transactional availability, and its behavior is solely dependent on

its input due to coordination-freedom. Once oi is completed, we mark it as done.

1Without loss of generality, we discuss replicated databases, where each server contains the entire set of items referenced in the history. It is trivial to extend the REPLAY procedure to a partially replicated environment by replacing each server with a set of servers that contains all data items necessary for each operation.
2Recall from Section 2.3 that we consider arbitrary groups of servers. Thus, we simply execute each operation in the history on a new server. We could be more parsimonious with our use of servers in this procedure but choose not to do so for simplicity.
3In the event that oi is the only event that immediately follows oj in α, we could simply execute oi on sj. In the event that multiple operations immediately follow oj in α, we need to ensure that each operation proceeds independently. Thus, we are conservative and give every operation its own server that contains the effects of the preceding operations in α.

If oi is a merge procedure with preceding operations oj and ok on corresponding servers sj and sk, we produce servers sj0 and sk0 containing the same contents as sj and sk, respectively, as above, by partitioning sj0 and sj and, respectively, sk and sk0, waiting until convergence, then repartitioning each. Subsequently, we place sj0 and sk0 in the same network partition, forcing the merge (oi) of these states via the convergence requirement. We subsequently mark oi as done.

When all operations in α are marked as done, the final operation we have performed

will produce a server containing state Si. We have effectively (serially) traversed the history, inducing the partially ordered sequence of transactions and merges by triggering partitions; we force transaction commits due to transactional availability and merges due to our pair-wise convergence requirement.

Given this possibility, we proceed to prove Theorem 1 from Section 3.2.

Proof of Theorem 1. (⇐) We begin with the simpler proof, which is by construction. Assume a set of transactions T is invariant confluent with respect to an invariant I. Consider a system in which each server executes the transactions it receives against a replica of its current state and checks whether or not the resulting state is I-valid. If the resulting state is I-valid, the replica commits the transaction and its mutations to the state. If not, the replica aborts the transaction. Servers opportunistically exchange copies of their local states and merge them. No individual replica will install an invalid state upon executing transactions, and, because T is invariant confluent under I, the merge of any two I-valid replica states from individual servers (i.e., I-T-reachable states) as constructed above is I-valid. Therefore, the converged database state will be I-valid. Transactional availability, convergence, and global I-validity are all maintained via coordination-free execution.

(⇒) Assume a system M guarantees globally I-valid operation for a set of transactions T and invariant I with coordination-freedom, transactional availability, and convergence, but

T is not invariant confluent. Then there exist two I-T-reachable states Sa and Sb with common ancestor I-T-reachable state So such that, by definition, I(Sa) and I(Sb) are true, but I(Sa ⊔ Sb) is false. Call the history that corresponds to Sa αa and the history that corresponds to Sb αb.4

Consider two executions of system M corresponding to histories αa and αb. In each execution, we begin by forcing M to produce a server containing So (via invoking REPLAY from Lemma 1 on M). In αa, we subsequently REPLAY the history αa from So using M. In αb, we subsequently REPLAY the history αb starting from So. Call Tfa and Tfb the final (set of) transactions that produced each of Sa and Sb (that is, the set of transactions in each

4We may be able to apply Newman's lemma and only consider single-transaction divergence (in the case of convergent and therefore "terminating" executions) [100, 144], but this is not necessary for our results.

execution that are not followed by any other transaction). During the execution of αa and αb, all transactions in each of Tfa and Tfb will have committed to maintain transactional availability, and their end results will be equivalent to the results in Sa and Sb, which are both I-valid by assumption.

We now consider a third execution, αc. αc produces a server containing So by performing REPLAY using M, as above. Then, αc independently performs REPLAY for αa and αb but does not execute or proceed further in either history than the first element of Tfa or Tfb. Instead, we consider these specially:

First, note that, if we independently REPLAY the transactions in Tfa and Tfb, and M commits them, then we will have completed all operations in αa and αb. In this case, M will produce two servers sa and sb containing states Sa and Sb. If we partition servers sa and sb such that they can communicate with each other but cannot communicate with any other servers, sa and sb must eventually converge. When sa and sb eventually converge, they will converge to Sa ⊔ Sb. However, I(Sa ⊔ Sb) is false, which would violate global I-validity. Thus, M cannot commit all of the transactions in Tfa and Tfb.

On the other hand, suppose we independently REPLAY the transactions in Tfa and Tfb and M aborts one of the transactions ta in Tfa. In this case, the server that aborts ta (call this server sac) will have exactly the same information (set of versions) available to it as the server that executed ta in the execution corresponding to αa above (call this second server saa). However, in the execution corresponding to αa, saa committed ta. Thus, in one execution (αa), M commits ta (on saa) and, in another execution (αc), M aborts ta (on sac), even though the contents of the servers are identical. Thus, despite the fact that the configurations are indistinguishable, M performs a different operation, a contradiction. If we REPLAY the transactions in Tfa and Tfb and M aborts one of the transactions tb in Tfb, we will similarly observe a contradiction: a server in the execution corresponding to αc will have aborted a transaction despite having the same information available to it as another server in the execution corresponding to αb, which committed the same transaction. Thus, to the servers executing Tfa, αa is indistinguishable from αc, and, to the servers executing Tfb, αb is indistinguishable from αc.

Therefore, to preserve transactional availability in the execution corresponding to αc, M must sacrifice one of global validity (by allowing the merge of Sa and Sb, resulting in an invalid state), convergence (by never merging the contents of the servers containing

Sa and Sb), or coordination-freedom (by requiring communication between servers during REPLAY).
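The constructive (⇐) direction suggests a correspondingly simple per-replica execution strategy: run each transaction against the local replica, commit only if the resulting state is I-valid, and merge replica states opportunistically in the background. A minimal sketch under the set-union merge from Section 2.3 (names are illustrative only):

    # Coordination-free execution per the constructive direction of Theorem 1:
    # local invariant checks at commit time plus opportunistic state merging.

    class Replica:
        def __init__(self, invariant, state=frozenset()):
            self.invariant = invariant
            self.state = state

        def execute(self, txn) -> bool:
            """Run txn locally; commit iff the resulting state is I-valid."""
            candidate = txn(self.state)
            if self.invariant(candidate):
                self.state = candidate      # commit
                return True
            return False                    # abort rather than violate I

        def merge_from(self, other: "Replica") -> None:
            """Anti-entropy step: incorporate another replica's committed state."""
            self.state = self.state | other.state

    # If the transactions are invariant confluent with respect to the invariant,
    # every merge of committed states is again I-valid, so no cross-replica
    # communication is needed during transaction execution.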

3.3 Discussion and Limitations

Invariant confluence captures a simple, informal rule: coordination can only be avoided if all local commit decisions are globally valid. (Alternatively, commit decisions are composable.) If two independent decisions to commit can result in an invalid merged (converged) state, then replicas must coordinate in order to ensure that only one of the decisions is to commit. Given the existence of an unsafe execution and the inability to detect the unsafe execution using only local information, a globally valid system must coordinate in order to prevent the invalid execution from arising.

Use of invariants. Our use of invariants in invariant confluence is key to achieving a necessary and not simply sufficient condition. By directly capturing correctness criteria via invariants, invariant confluence analysis only identifies "true" conflicts. This allows invariant confluence analysis to perform a more accurate assessment of whether coordination is needed compared to related conditions such as commutativity (Chapter 7).

However, the reliance on invariants also has drawbacks. Invariant confluence analysis only guards against violations of any specified invariants. If invariants are incorrectly or incompletely specified, an invariant confluent database system may violate correctness. If users cannot guarantee the correctness and completeness of their invariants and operations, they should opt for a more conservative analysis or mechanism, such as employing serializable transactions. In fact, without such a correctness specification, for arbitrary transaction schedules, serializability is—in a sense—the "optimal" strategy [149]. Alternatively, restricting the programming language—for example, to allow only invariant confluent operations or operations that exhibit other, possibly more conservative properties like monotonicity or commutativity (Chapter 7)—can also assist here. In either case, by casting correctness in terms of admissible application states instead of (simply) a property of read-write executions over distributed replicas, we achieve a more precise statement of coordination overheads. Finally, when full application invariants are unavailable, individual, high-value transactions may be amenable to optimization via invariant confluence coordination analysis. Accordingly, our development of invariant confluence analysis provides developers with a powerful option—but only if used correctly. If used incorrectly, invariant confluence allows incorrect results; if not used at all, developers must resort to existing alternatives.

This final point raises several questions: can we specify invariants in real-world use cases? Classic database concurrency control models assume that "the [set of application invariants] is generally not known to the system but is embodied in the structure of the transaction" [105, 220]. Nevertheless, since 1976, databases have introduced support for a finite set of invariants [52, 115, 120, 126, 145] in the form of primary key, foreign key,

uniqueness, and row-level “check” constraints [161]. We can (and, in the remainder of this dissertation, do) analyze these invariants, which can—like many program analyses [80]— lead to new insights about execution strategies. We have found the process of invariant specification to be non-trivial but feasible in practice. In many cases, specifications for certain invariants already exist but have not been examined in the context of coordination- free implementation.

Physical and logical replication. We have used the concept of replicas to reason about con- current transaction execution. However, as previously noted, our use of replicas is simply a formal device used to capture concurrency in the system implementation, independent of the actual concurrency control mechanisms at work. Specifically, reasoning about replicas allows us to separate the analysis of transactions from their implementation: just because a transaction is executed with (or without) coordination does not mean that all query plans or implementations require (or do not require) coordination [32]. Simply because an applica- tion is invariant confluent does not mean that all implementations will be coordination-free. Rather, invariant confluence ensures that a coordination-free implementation exists.

(Non-)determinism. Invariant confluence analysis effectively captures points of unsafe non-determinism [20] in transaction execution. As we have seen in many of our examples thus far, total non-determinism under concurrent execution can compromise application- level consistency [19, 144]. But not all non-determinism is bad: many desirable properties (e.g., classical distributed consensus among processes) involve forms of acceptable non- determinism (e.g., any proposed outcome is acceptable as long as all processes agree) [124]. In many cases, maximizing concurrency requires admitting the possibility of non-determinism in outcomes. Invariant confluence analysis allows this non-deterministic divergence of database states but makes two useful guarantees about those states. First, the requirement for global va- lidity ensures safety (in the form of invariants). Second, the requirement for convergence ensures liveness (in the form of convergence). Accordingly, via its use of invariants, in- variant confluence allows users to scope non-determinism while only permitting systems to produce “acceptable” states.

3.4 Summary

In this chapter, we developed a framework for determining whether a given safety property can be maintained without coordination while still guaranteeing availability of operations and convergence of data. This invariant confluence property generalizes partitioning arguments from distributed systems and allows us to reason about semantic properties at the application level. In effect, if the set of database states reachable by applying program operations and merge invocations is closed with respect to declared invariants, a coordination-free implementation of those operations and invariants is possible. In the remainder of this dissertation, we will apply invariant confluence to several semantic properties found in real database systems.

Chapter 4

Coordination Avoidance and Weak Isolation

In this section, we begin to apply the invariant confluence property to the semantics found in today's database systems. We examine the potential for coordination-free execution of transactions under a number of weak isolation guarantees. These non-serializable transaction semantics are in widespread use in today's database engines and therefore provide an attractive target for invariant confluence analysis. They are also among the lowest-level semantics we will investigate in this thesis; that is, these guarantees pertain to the admissible interleavings of individual reads and writes to opaque variables, rather than whole program behavior or invariants over database states. Thus, our primary motivation in this section is to examine a range of widely deployed—if not widely understood—guarantees. Subsequently, we will move upwards in levels of abstraction.

We provide background on the use of weak isolation in Section 4.1. We examine the invariant confluence of these guarantees and provide several coordination-free implementations in Section 4.2. In Section 4.3, we experimentally evaluate their benefits. Section 4.4 presents a more formal model for these guarantees.

4.1 ACID in the Wild

Even within a single-node database, the coordination penalties associated with serializability can be severe. In this context, coordination manifests itself in the form of decreased concurrency, performance degradation, multi-core scalability limitations, and, sometimes, aborts due to deadlock or contention [125]. Accordingly, since the early 1970s, database systems have offered a range of ACID properties weaker than serializability: the host of

so-called weak isolation models describe varying restrictions on the space of schedules that are allowable by the system [9, 22, 49]. None of these weak isolation models guarantees serializability, but, as we see below, their benefits to concurrency are frequently considered by database administrators and application developers to outweigh the costs of possible consistency anomalies that might arise from their use.

To better understand the prevalence of weak isolation, we surveyed the default and maximum isolation guarantees provided by 18 databases, often claiming to provide "ACID" or "NewSQL" functionality [36]. As shown in Table 4.1, only three out of 18 databases provided serializability by default, and eight did not provide serializability as an option at all. This is particularly surprising when we consider the widespread deployment of many of these non-serializable databases, like Oracle, which are known to power major businesses and product functionality. While we have established that serializability is unachievable in coordination-free systems, the widespread usage of these alternative, weak models indicates that this inability may be of limited importance to applications built upon database systems today. If application writers and database vendors have already decided that the benefits of weak isolation outweigh potential application inconsistencies, then, in a coordination-free system that prohibits serializability, similar decisions may be tenable.

It has been unknown which of these common isolation levels can be provided with coordination-free execution. Existing algorithms for providing weak isolation are often designed for a single-node context and often rely on coordination-based concurrency control mechanisms like locking or mutual exclusion. Moreover, we are not aware of any prior literature that provides guidance as to the relationship between weak isolation and coordination-free execution: prior work has examined the relationship between serializability and coordination-freedom [92] and has studied several variants of weak isolation [9, 49, 125], but not weak isolation and coordination-free execution together.

4.2 Invariant Confluence Analysis: Isolation Levels

In this section, we determine the invariant confluence of a number of these weak isola- tion guarantees. We also provide a treatment of several distributed consistency models that are complementary to these isolation guarantees. As Brewer states, “systems and database communities are separate but overlapping (with distinct vocabulary)” [63]. With this chal- lenge in mind, we build on existing properties and definitions from the database and dis- tributed systems literature, providing an informal explanation and example for each guar- antee.1 The database isolation guarantees require particular care, since different DBMSs

1For clarity, we call these "distributed consistency" guarantees "isolation models" as well.

Database                    Default    Maximum
Actian Ingres 10.0/10S      S          S
Aerospike                   RC         RC
Akiban Persistit            SI         SI
Clustrix CLX 4100           RR         RR
Greenplum 4.1               RC         S
IBM DB2 10 for z/OS         CS         S
IBM Informix 11.50          Depends    S
MySQL 5.6                   RR         S
MemSQL 1b                   RC         RC
MS SQL Server 2012          RC         S
NuoDB                       CR         CR
Oracle 11g                  RC         SI
Oracle Berkeley DB          S          S
Oracle Berkeley DB JE       RR         S
Postgres 9.2.2              RC         S
SAP HANA                    RC         SI
ScaleDB 1.02                RC         RC
VoltDB                      S          S

RC: read committed, RR: repeatable read, SI: snapshot isolation, S: serializability, CS: cursor stability, CR: consistent read

Table 4.1: Default and maximum isolation levels for ACID and NewSQL databases.

often use the same terminology for different mechanisms and may provide guarantees beyond our implementation-agnostic definitions. We draw largely on Adya's dissertation [9] and somewhat on its predecessor work: the ANSI SQL specification [22] and Berenson et al.'s subsequent critique [49]. In this section, we provide a short proof sketch of each guarantee and provide a more comprehensive treatment in Section 4.4.

For each invariant confluent guarantee, we also offer proof-of-concept coordination-free algorithms. These are not necessarily optimal or even efficient: the goal is to illustrate the existence of such algorithms. However, we will investigate performance implications in Section 4.3.

Informal model. In our examples, per Adya [9], we exclusively consider read and write operations, denoting a write of version vi with unique timestamp i drawn from a totally ordered domain (e.g., integers) to data item d as wd(vi) and a read from data item d returning vi as rd(vi). We assume that all data items have the null value, ⊥, at database initialization, and, unless otherwise specified, all transactions in the examples commit.

In our invariant confluence analysis, we reason about read-write histories (or simply histories): sets of transactions consisting of read and write operations and their return values. Each history can be represented as a graph of operations with various edges that we describe below; thus, in our invariant confluence model, we reason about the results of transaction executions in the database as histories—or, in effect, traces of transaction execution. In this section, we examine invariants over histories, and the set-oriented nature of histories lends itself naturally to a set-oriented merge. Compared to later analyses (e.g., those in Chapter 6), these are relatively straightforward, and we keep their discussion brief.

Our use of read/write histories is somewhat awkward given our data-centric expression of invariants. However, this model is necessary for compatibility with existing descriptions of weak isolation guarantees. One consolation is that this formalism highlights the unintuitive nature of these guarantees. We study them because they are in widespread use today, but we find them difficult to reason about. As we move towards higher layers of abstraction in later chapters, the higher-level specifications are—in our experience—much more intuitive and natural to reason about.

We also assume in this section that the final versions written to each data item within a transaction are assigned the same timestamp. This practice is standard in treatments of multi-version serializability theory [53] but is not actually enforced by Adya's original model. It does not affect the generality of our results2 but makes several of them clearer.

We begin with invariant confluent guarantees and then present non-invariant confluent guarantees.

4.2.1 Invariant Confluent Isolation Guarantees

To begin, Adya captures Read Uncommitted isolation as PL-1. In this model, writes to each object are totally ordered, corresponding to the order in which they are installed in the database. In a distributed database, different replicas may receive writes to their local copies of data at different times but should handle concurrent updates (i.e., overwrites) in accordance with the total order for each item. PL-1 requires that writes to different objects be ordered consistently across transactions, prohibiting Adya's phenomenon G0 (also called "Dirty Writes" [49]). If we build a graph of transactions with edges from one transaction to another when the former overwrites the latter's write to the same object, then, under Read

Footnote 2: Namely, if we draw timestamps from an infinite domain, we can partition the ID space among replicas. In an implementation, this may require new servers joining a cluster to obtain a subset of the ID space from at least one existing cluster member. In a practical implementation, this could require a substantial number of bits for timestamp allocation; however, even at an extreme, 256 bits suffices for very large transaction volumes.

Uncommitted, the graph should not contain cycles [9]. Consider the following example:

T1 : wx(1) wy(1)

T2 : wx(2) wy(2)

In this example, under Read Uncommitted, it is unacceptable for the database to both order

T1's wx(1) before T2's wx(2) and order T2's wy(2) before T1's wy(1).

Read Uncommitted is invariant confluent under read and write operations; we provide a sketch here. The G0 phenomenon described above only pertains to write operations, so we do not consider read operations. With unique and totally ordered timestamps, G0 graphs are acyclic by construction. To show this, consider any two transactions Ti and Tj appearing in a G0 graph G, with the final versions of each item produced by Ti timestamped k and the final versions of each item produced by Tj timestamped l; either k < l or l < k, and, because G0 edges are determined by this total order on timestamps, the edges between any pair of transactions cannot form a cycle. In either case, G is acyclic. Merging two acyclic graphs does not produce new transactions, and all transactions within the two graphs will similarly be totally ordered.

Because Read Uncommitted is invariant confluent, we can find a coordination-free implementation. (However, note that traditional implementations, such as the lock-based implementation due to Gray in the original formulation of Read Uncommitted [125], do require coordination.) Read Uncommitted is easily achieved by marking each of a transaction's writes with the same timestamp (unique across transactions; e.g., combining a client's ID with a sequence number) and applying a "last writer wins" conflict reconciliation policy at each replica. Later properties will strengthen Read Uncommitted.

Read Committed isolation is particularly important in practice as it is the default isolation level of many DBMSs (Section 4.1). Centralized implementations differ, with some based on long-duration exclusive locks and short-duration read locks [125] and others based on multiple versions. These implementations often provide properties beyond what is implied by the name "Read Committed" and what is captured by the implementation-agnostic definition. However, under the implementation-independent definition of Read Committed, transactions should not access uncommitted or intermediate (i.e., non-final) versions of data items. This prohibits both "Dirty Writes," as above, and "Dirty Reads" phenomena. This isolation is Adya's PL-2 and is formalized by prohibiting Adya's G1{a-c} (or ANSI's P1, or "broad" P1 [2.2] from Berenson et al.). For instance, in the example below, T3 should never read a = 1, and, if T2 aborts, T3 should not read a = 3:

T1 : wx(1) wx(2)

T2 : wx(3)

T3 : rx(a)

Read Committed is also invariant confluent for read and write operations. The basic idea is relatively straightforward: if two histories do not contain reads of uncommitted or intermediate versions of data items, unioning them will not introduce additional reads of uncommitted or intermediate versions, nor will unioning them change any of the values that the reads returned (preventing G1a and G1b). Adya's G1c is prevented by the fact that G0 graphs are acyclic for writes (as above) and transactions can only read-depend on other transactions that appear in their histories; thus, unioning two histories that do not exhibit G1c cannot introduce new edges that incur G1c.

Because Read Committed is invariant confluent, we can find a coordination-free implementation: if no client ever writes uncommitted data to shared copies of data, then transactions will never read each others' dirty data. As a simple solution, clients can buffer their writes until they commit, or, alternatively, can send them to servers, which will not deliver their value to other readers until notified that the writes have been committed. Unlike a lock-based implementation, this implementation does not guarantee that readers observe the most recent write to a data item, but it satisfies the implementation-agnostic definition.

Several different properties have been labeled Repeatable Read isolation. As we will show in Section 4.2.3, some of these are not achievable in a coordination-free system. However, the ANSI-standard, implementation-agnostic definition [22] is achievable and directly captures the spirit of the term: if a transaction reads the same data more than once, it sees the same value each time (preventing "Fuzzy Read," or P2). In this paper, to disambiguate between other definitions of "Repeatable Read," we will call this property "cut isolation," since each transaction reads from a non-changing cut, or snapshot, over the data items. If this property holds over reads from a set of individual data items, we call it Item Cut Isolation, and, if we also expect a cut over predicate-based reads (e.g., SELECT WHERE; preventing Phantoms [125], or Berenson et al.'s P3/A3), we have the stronger property of Predicate

Cut-Isolation. In the example below, under both levels of cut isolation, T3 must read a = 1:

T1 : wx(1)

T2 : wx(2)

T3 : rx(1) rx(a)

Item Cut Isolation is invariant confluent, by reasoning similar to Read Committed: if two histories are valid under Item Cut Isolation, unioning them will not change the return values of reads. Because Item Cut Isolation is invariant confluent, we can find a coordination-free implementation: we can have transactions store a copy of any read data at the client such that the value that they read for each item never changes unless they overwrite it themselves.

These stored values can be discarded at the end of each transaction; the same effect can alternatively be accomplished on servers via multi-versioning. Predicate Cut Isolation is also achievable in coordination-free systems via similar caching middleware or multi-versioning that tracks entire logical ranges of predicates in addition to item-based reads.
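To make these two client-side techniques concrete, the following minimal sketch (in Java, mirroring the prototype language of Section 4.3.2) buffers writes until commit and caches the first value read for each item. All names here, including the KeyValueStore interface, are illustrative assumptions rather than part of any particular system, and the sketch ignores predicates and failure handling.

import java.util.HashMap;
import java.util.Map;

// Hypothetical server interface; assumed only for illustration.
interface KeyValueStore {
    byte[] get(String key);
    void put(String key, byte[] value);
}

// Sketch of a client-side transaction context: buffered writes are invisible to
// other transactions until commit (Read Committed), and the first value read for
// each item is cached so repeated reads return the same value (Item Cut Isolation).
final class ClientSideTxn {
    private final KeyValueStore store;
    private final Map<String, byte[]> writeBuffer = new HashMap<>();
    private final Map<String, byte[]> readCache = new HashMap<>();

    ClientSideTxn(KeyValueStore store) {
        this.store = store;
    }

    byte[] read(String key) {
        if (writeBuffer.containsKey(key)) {
            return writeBuffer.get(key);          // the transaction's own writes win
        }
        if (!readCache.containsKey(key)) {
            readCache.put(key, store.get(key));   // cache even a null (⊥) result
        }
        return readCache.get(key);
    }

    void write(String key, byte[] value) {
        writeBuffer.put(key, value);              // buffered locally until commit
    }

    void commit() {
        writeBuffer.forEach(store::put);          // only now reach shared copies
        writeBuffer.clear();
        readCache.clear();
    }
}

As the text notes, the same effect can be pushed to servers via multi-versioning; the client-side variant is simply the easiest to state.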

ACID Atomicity Guarantees

Atomicity, informally guaranteeing that either all or none of a transaction's effects succeed, is core to the ACID guarantees. Although, at least by the ACID acronym, atomicity is not an "isolation" property, atomicity properties also restrict the updates visible to other transactions. Accordingly, here, we consider the isolation effects of atomicity, which we call Monotonic Atomic View (MAV) isolation. Under MAV, once some of the effects of a transaction Ti are observed by another transaction Tj, thereafter, all effects of Ti are observed by Tj. That is, if a transaction Tj reads a version of an object that transaction Ti wrote, then a later read by Tj cannot return a value whose later version is installed by Ti. Together with item cut isolation, MAV prevents Read Skew anomalies (Berenson et al.'s A5A) and is useful in several contexts such as maintaining foreign key constraints, consistent global secondary indexing, and maintenance of derived data. In the example below, under MAV, because T2 has read T1's write to y, T2 must observe b = c = 1 (or later versions for each key):

T1 : wx(1) wy(1) wz(1)

T2 : rx(a) ry(1) rx(b) rz(c)

T2 can also observe a = ⊥, a = 1, or a later version of x. In the hierarchy of existing isolation properties, we place MAV below Adya's PL-2L (as it does not necessarily enforce transitive read-write dependencies) but above Read Committed (PL-2). Notably, MAV requires disallowing reads of intermediate writes (Adya's G1b): observing all effects of a transaction implicitly requires observing the final (committed) effects of the transaction as well.

Perplexingly, discussions of MAV are absent from existing treatments of weak isolation. This is perhaps again due to the single-node context in which prior work was developed: on a single server (or a fully replicated database), MAV is achievable via lightweight locking and/or local concurrency control over data items [90, 143]. In contrast, in a distributed environment, MAV over arbitrary groups of non-co-located items is considerably more difficult to achieve with coordination-free execution.

MAV is invariant confluent for read and write operations. The reasoning is similar to Read Committed and Item Cut Isolation: if two histories obey MAV, then their union does

not change the effects of the reads in each history; every transaction T in the unioned history reads from the same transactions it read from in the original history in which T appeared, and T will not miss the effects of transactions it depends upon. Because MAV is invariant confluent, we can find a coordination-free implementation. Due to its utility, we develop an advanced set of implementations of MAV in Chapter 5.

Session Guarantees

A useful class of safety guarantees refer to real-time or client-centric ordering within a session, "an abstraction for the sequence of...operations performed during the execution of an application" [216]. These "session guarantees" have been explored in the distributed systems literature [216, 224] and sometimes in the database literature [91]. For us, a session describes a context that should persist between transactions: for example, on a social networking site, all of a user's transactions submitted between "log in" and "log out" operations might form a session. Session guarantees are often expressed in a non-transactional context, and there are several ways to extend them to transactions. Per [91], we examine them in terms of ordering across transactions. Several session guarantees can be made with coordination-free execution. We describe them informally below:

The monotonic reads guarantee requires that, within a session, subsequent reads to a given object “never return any previous values”; reads from each item progress according to a total order (e.g., the order from Read Uncommitted).

The monotonic writes guarantee requires that each session’s writes become accessible to other transactions in the order they were committed. Any order on transactions (as in Read Uncommitted isolation) should also be consistent with any precedence (e.g., Adya’s ordering on versions of each item) that a global oracle would observe.

The writes follow reads guarantee requires that, if a session observes an effect of transaction

T1 and subsequently commits transaction T2, then another session can only observe effects of T2 if it can also observe T1’s effects (or later values that supersede T1’s); this corresponds to Lamport’s “happens-before” relation [157]. Any order on transactions should respect this transitive order.

The above guarantees are invariant confluent for reads and writes and can be achieved by forcing servers to wait to reveal new writes (say, by buffering them in separate local storage) until each write's respective dependencies are visible on all replicas. This mechanism effectively ensures that all clients read from a globally agreed upon lower bound on the

versions written. This is coordination-free because a client will never block due to inability to find a server with a sufficiently up-to-date version of a data item. However, it does not imply that transactions within a session will observe previous updates within the session or, in the presence of partitions, make forward progress through the version history. The problem is that under our standard definition of transactional availability, a system must handle the possibility that, under a partition, an unfortunate client will be forced to issue its next requests against a partitioned, out-of-date server.
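A rough server-side sketch of this buffering scheme follows (Java; all names are assumptions for illustration). The predicate that decides whether a write's dependencies are visible on every replica is left abstract, since in practice it would be driven by acknowledgements exchanged during anti-entropy.

import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Sketch: a replica holds incoming writes in a pending buffer and reveals a write
// to readers only once the write's dependencies are visible on all replicas.
// Reads never block; they simply observe a (possibly stale) lower bound.
final class BufferingReplica {
    static final class Write {
        final String key;
        final long timestamp;
        final byte[] value;
        final List<Write> dependencies;   // writes that must become visible first

        Write(String key, long timestamp, byte[] value, List<Write> dependencies) {
            this.key = key;
            this.timestamp = timestamp;
            this.value = value;
            this.dependencies = dependencies;
        }
    }

    private final Map<String, Write> visible = new HashMap<>();
    private final Queue<Write> pending = new ArrayDeque<>();

    synchronized void receive(Write w) {
        pending.add(w);                   // buffered in separate local storage
    }

    // Invoked periodically, e.g., by the anti-entropy loop.
    synchronized void revealReadyWrites() {
        pending.removeIf(w -> {
            if (!visibleOnAllReplicas(w.dependencies)) {
                return false;             // keep buffering
            }
            visible.merge(w.key, w,
                (cur, cand) -> cand.timestamp > cur.timestamp ? cand : cur);
            return true;
        });
    }

    synchronized byte[] read(String key) {
        Write w = visible.get(key);
        return w == null ? null : w.value;
    }

    // Placeholder: a real implementation would consult per-replica acknowledgements
    // gathered during anti-entropy before declaring a dependency globally visible.
    private boolean visibleOnAllReplicas(List<Write> deps) {
        return deps.isEmpty();
    }
}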

4.2.2 Sticky Availability

To address the above concern, we can introduce the concept of "stickiness": clients can ensure continuity between operations (e.g., reading their prior updates to a data item) by maintaining affinity or "stickiness" with a server or set of servers [224]. In a fully replicated system, where all servers are replicas for all data items, stickiness is simple: a client can maintain stickiness by contacting the same server for each of its requests. However, to stay "sticky" in a partially replicated system, where servers are replicas for subsets of the set of data items (which we consider in this paper), a client must maintain stickiness with a single logical copy of the database, which may consist of multiple physical servers. We say that a system provides sticky availability if, whenever a client's transaction is executed against a copy of database state that reflects all of the client's prior operations, it eventually receives a response, even in the presence of indefinitely long partitions (where "reflects" is dependent on semantics). A client may choose to become sticky available by acting as a server itself; for example, a client might cache its reads and writes [39, 216, 233]. Any guarantee achievable in a coordination-free system is achievable in a sticky coordination-free system but not vice-versa.

In the above example, if the client remains sticky with the server that executed T1, then the client can read its writes. While sticky availability is implicit in prior work, we believe this is one of the first instances where it is discussed in detail. Sticky availability permits three additional guarantees, which we first define and then show are unavailable in a generic coordination-free system:

Read your writes requires that whenever a session reads a given data item d after writing a version di to it, the read returns di or another version dj, where j > i.

PRAM (Pipelined Random Access Memory) provides the illusion of serializing each of the operations (both reads and writes) within each session and is the combination of monotonic reads, monotonic writes, and read your writes [134].

Causal consistency [13] results from the combination of all session guarantees [67] (i.e., PRAM with writes-follow-reads) and is also referred to by Adya as PL-2L isolation [9].

Read your writes is not achievable in a coordination-free system. Consider a client that executes the following two transactions:

T1 : wx(1)

T2 : rx(a)

If the client executes T1 against a server that is partitioned from the other servers, then, for transactional availability, the server must allow T1 to commit. If the same client subsequently executes T2 against the same (partitioned) server in the same session, then it will be able to read its writes. However, if the network topology changes and the client can only execute T2 on a different replica that is partitioned from the replica that executed T1, then the system will have to either stall indefinitely to allow the client to read her writes (violating transactional availability) or sacrifice the read your writes guarantee. Accordingly, read your writes, and, by proxy, causal consistency and PRAM, require stickiness. Read your writes is provided by default in a sticky system. Causality and PRAM guarantees can be accomplished with well-known variants [13, 39, 168, 216, 233] of the prior session guarantee algorithms we presented earlier: only reveal new writes to clients when their (respective, model-specific) dependencies have been revealed.
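In practice, stickiness can be as simple as deterministic, session-long routing on the client. The sketch below (illustrative names only) pins each logical partition to a single replica for the lifetime of a session, so that a session's later operations are served by a copy that has already applied its earlier writes; it omits failover, which is exactly where the availability trade-offs discussed above arise.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of client-side stickiness: one replica per logical partition, reused for
// the whole session so the session observes its own prior writes (read your writes).
final class StickySession {
    private final Map<Integer, List<String>> replicasByPartition; // partition -> replica addresses
    private final Map<Integer, String> pinned = new HashMap<>();

    StickySession(Map<Integer, List<String>> replicasByPartition) {
        this.replicasByPartition = replicasByPartition;
    }

    // Every request for this key in this session goes to the same physical replica.
    String replicaFor(String key) {
        int partition = Math.floorMod(key.hashCode(), replicasByPartition.size());
        return pinned.computeIfAbsent(partition,
            p -> replicasByPartition.get(p).get(0)); // e.g., the first reachable replica
    }
}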

4.2.3 Non-Invariant Confluent Semantics

At this point, we have considered most of the previously defined (and useful) isolation guarantees that are available to coordination-free systems. Before summarizing our possibility results, we will present impossibility results, also defined in terms of previously identified isolation anomalies. Most notably, it is impossible to prevent Lost Update or Write Skew in a coordination-free system.

Unachievable ACID Isolation

In this section, we demonstrate that preventing Lost Update and Write Skew—and therefore providing Snapshot Isolation, Repeatable Read, and one-copy serializability—inherently requires foregoing high availability guarantees.

Berenson et al. define Lost Update as occurring when one transaction T1 reads a given data item, a second transaction T2 updates the same data item, then T1 modifies the data item based on its original read of the data item, "missing" or "losing" T2's newer update. Consider a database containing only the following transactions:

T1 : rx(a) wx(a + 2)

T2 : wx(2)

If T1 reads a = 1 but T2's write to x precedes T1's write operation, then the database will end up with a = 3, a state that could not have resulted from a serial execution due to T2's "Lost Update."

It is impossible to prevent Lost Update in a highly available environment. Consider two clients who submit the following T1 and T2 as part of two separate histories H1 and H2.

T1 : rx(100) wx(100 + 20 = 120)

T2 : rx(100) wx(100 + 30 = 130)

Regardless of whether x = 120 or x = 130 is chosen by a replica, the database state could not have arisen from a serial execution of H1 and H2 (see footnote 3). To prevent this, either T1 or T2 should not have committed. Each client's respective server might try to detect that another write occurred, but this requires knowing the version of the latest write to x. In our example, this reduces to a requirement for linearizability, which is, via Gilbert and Lynch's proof of the CAP Theorem, provably at odds with coordination-free execution [118].

Write Skew is a generalization of Lost Update to multiple keys. It occurs when one transaction T1 reads a given data item x, a second transaction T2 reads a different data item y, then T1 writes to y and commits and T2 writes to x and commits. As an example of Write Skew, consider the following two transactions:

T1 : ry(0) wx(1)

T2 : rx(0) wy(1)

As Berenson et al. describe, if there were an integrity constraint between x and y such that only one of x or y should have value 1 at any given time, then this write skew would violate the constraint (which is preserved in serializable executions). Write skew is a somewhat esoteric anomaly—for example, it does not appear in TPC-C [110]—but can result in improper behavior in many unexpected scenarios, such as transfers from one bank account to another [110]. As a generalization of Lost Update, Write Skew is also unavailable to coordination-free systems.

Footnote 3: In this example, we assume that, as is standard in modern databases, the database accepts values as they are written (i.e., register semantics). If we replace the write operation with richer semantics, like "increment," we can avoid this issue; however, the problem persists in the general case.

Consistent Read, Snapshot Isolation (including Parallel Snapshot Isolation [210]), and Cursor Stability guarantees are all unavailable because they require preventing Lost Update phenomena. Repeatable Read (defined by Gray [125], Berenson et al. [49], and Adya [9]) and One-Copy Serializability [26] need to prevent both Lost Update and Write Skew. Their prevention requirements mean that these guarantees are inherently unachievable in a coordination-free system.

Unachievable Recency Guarantees

Distributed data storage systems often make various recency guarantees on reads of data items. Unfortunately, an indefinitely long partition can force an available system to violate any recency bound, so recency bounds are not enforceable by coordination-free systems [118]. One of the most famous of these guarantees is linearizability [134], which states that reads will return the last completed write to a data item. There are also several other (weaker) variants such as safe and regular register semantics. When applied to transactional semantics, the combination of one-copy serializability and linearizability is called strong (or strict) one-copy serializability [9] (e.g., Spanner [85]). It is also common, particularly in systems that allow reading from masters and slaves, to provide a guarantee such as "read a version that is no more than five seconds out of date" or similar. None of these guarantees are invariant confluent.

Durability

A client requiring that its transactions' effects survive F server faults requires that the client be able to contact at least F + 1 non-failing replicas before committing. This affects availability and, according to the model we have adopted, F > 1 fault tolerance is not achievable in an invariant confluent system. However, with minor modifications to the model, we can ensure durability at a penalty to operation availability. Namely, if we modify the definition of transactional availability (Definition 2) such that a transaction terminates if it can access at least F + 1 servers for each item in its transaction, the remainder of our results hold for systems with at least F + 1 physical servers. This is a moderately subtle concept, but an important consequence is that the communication overhead associated with ensuring durability is a function of F and not of the physical cluster size. Thus, if a cluster doubles in size from fifty to one hundred physical servers, the overheads due to durability are constant, whereas the overheads for, say, a majority consensus algorithm will double (as the majority is a function of the number of nodes).
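As a concrete illustration of this point, the sketch below (assumed names) acknowledges a commit only after F + 1 replicas have synchronously persisted the write; the number of required acknowledgements depends on F alone, not on the cluster size.

import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch: a write survives F server faults once F+1 replicas have durably persisted it.
final class DurableWriter {
    interface Replica {
        void persistDurably(String key, byte[] value); // e.g., a synchronous disk write
    }

    private final List<Replica> replicasForItem;
    private final int f;  // number of tolerated server failures

    DurableWriter(List<Replica> replicasForItem, int f) {
        this.replicasForItem = replicasForItem;
        this.f = f;
    }

    // Returns true once F+1 replicas acknowledge durability within the timeout.
    boolean write(String key, byte[] value, long timeoutMs) throws InterruptedException {
        CountDownLatch acks = new CountDownLatch(f + 1);
        for (Replica r : replicasForItem) {
            new Thread(() -> {
                r.persistDurably(key, value);
                acks.countDown();
            }).start();
        }
        return acks.await(timeoutMs, TimeUnit.MILLISECONDS);
    }
}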

4.2.4 Summary

As we summarize in Table 4.2, a wide range of isolation levels are achievable in coordination-free systems. With sticky availability, a system can achieve read your writes guarantees, PRAM, and causal consistency. However, many other prominent semantics, such as Snapshot Isolation, One-Copy Serializability, and Strong Serializability cannot be achieved due to the inability to prevent Lost Update and Write Skew phenomena.

We illustrate the hierarchy of invariant confluent, sticky, and non-invariant confluent isolation models we have discussed in Figure 4.1. Many models are simultaneously achievable, but we find several particularly compelling. If we combine all coordination-free and sticky guarantees, we have transactional, causally consistent snapshot reads (i.e., Causal Transactional Predicate Cut Isolation). If we combine MAV and P-CI, we have transactional snapshot reads (see Chapter 5). We can achieve RC, MR, and RYW by simply sticking clients to servers. We can also combine unavailable models—for example, an unavailable system might provide PRAM and One-Copy Serializability [91].

To the best of our knowledge, this is the first unification of transactional isolation, distributed consistency, and session guarantee models. Interestingly, strong one-copy serializability subsumes all other models, while considering the (large) power set of all compatible models (e.g., the diagram depicts 144 possible coordination-free combinations) hints at the vast expanse of consistency models found in the literature. This taxonomy is not exhaustive, but we believe it lends substantial clarity to the relationships between a large subset of the prominent ACID and distributed consistency models.

In light of the current practice of deploying weak isolation levels (Section 4.1), it is perhaps surprising that so many weak isolation levels are achievable with coordination-freedom. Indeed, isolation levels such as Read Committed expose and are defined in terms of end-user anomalies that could not arise during serializable execution. However, the widespread usage of these models suggests that, in many cases, applications can tolerate their associated anomalies. In turn, our results suggest that—despite idiosyncrasies relating to concurrent updates and data recency—coordination-free database systems can provide sufficiently strong semantics for many applications. For non-invariant confluent semantics, coordination-free databases may expose more anomalies (e.g., linearizability violations) than a single-site database (particularly during network partitions). However, for invariant confluent isolation levels, users of single-site databases are subject to the same (worst-case) application-level anomalies as a coordination-free implementation. The necessary (indefinite) visibility penalties (i.e., the right side of Figure 4.1) and lack of support for preventing concurrent updates (via the upper left half of Figure 4.1) mean coordination-free systems are not well-suited for all applications (see Section 4.3): these limitations are fundamental. However, common practices such as ad-hoc, user-level compensation and per-statement isolation "upgrades" (e.g., SELECT FOR UPDATE under weak isolation)—commonly used to augment weak isolation—are also applicable in coordination-free systems (although they may in turn compromise availability). That is, if an application already selectively and explicitly opts in to coordination via SQL keywords like SELECT FOR UPDATE, a coordination-avoiding system can similarly use these hints as a basis for understanding when coordination is required.

Invariant Confluent:  Read Uncommitted (RU), Read Committed (RC), Monotonic Atomic View (MAV), Item Cut Isolation (I-CI), Predicate Cut Isolation (P-CI), Writes Follow Reads (WFR), Monotonic Reads (MR), Monotonic Writes (MW)
Sticky:               Read Your Writes (RYW), PRAM, Causal
Unavailable:          Cursor Stability (CS)†, Snapshot Isolation (SI)†, Repeatable Read (RR)†‡, One-Copy Serializability (1SR)†‡, Recency⊕, Safe⊕, Regular⊕, Linearizability⊕, Strong 1SR†‡⊕

Table 4.2: Summary of invariant confluent, sticky, and non-invariant confluent models considered in this paper. Non-invariant confluent models are labeled by cause: preventing lost update†, preventing write skew‡, and requiring recency guarantees⊕.

4.3 Implications: Existing Algorithms and Empirical Impact

Given our understanding of which isolation models are invariant confluent, in this section, we analyze the implications of these results for existing systems and study coordination-free implementations on public cloud infrastructure. Specifically, we revisit traditional database concurrency control with a focus on coordination costs and on coordination-free execution. We also perform an experimental evaluation of coordination-free versus non-coordination-free properties on public cloud infrastructure.

[Figure: diagram of the model hierarchy; models shown include Strong-1SR, 1SR, linearizable, RR, CS, SI, causal, regular, RC, MAV, P-CI, PRAM, safe, RU, I-CI, WFR, MR, MW, RYW, and recency.]

Figure 4.1: Partial ordering of invariant confluent, sticky (in boxes), and non-invariant confluent models (circled) from Table 4.2. Directed edges represent ordering by model strength. Incomparable models can be simultaneously achieved, and the availability of a combination of models is the availability of the least available individual model.

4.3.1 Existing Algorithms

While we have shown that the semantics of many database isolation levels are achievable with coordination-freedom, many traditional concurrency control mechanisms do not provide coordination-free execution—even for invariant confluent isolation levels. Existing mechanisms often presume (or are adapted from) single-server, non-partitioned deployments or are otherwise adapted from mechanisms that enforce serializability as a primary use case.

Most existing implementations of weak isolation are not coordination-free. Lock-based mechanisms such as those in Gray's original proposal [125] do not degrade gracefully in the presence of partial failures. (Note, however, that lock-based protocols do offer the benefit of recency guarantees.) While multi-versioned storage systems allow for a variety of transactional guarantees, few offer traditional weak isolation (e.g., non-"tentative update" schemes) in this context. Chan and Gray's read-only transactions have item-cut isolation with causal consistency and MAV (session PL-2L [9]) but are unavailable in the presence of coordinator failure and assume serializable update transactions [73]; this is similar to the read-only and write-only transactions more recently proposed in Eiger [168]. Brantner's S3 database [62] and Bayou [216] can provide variants of session PL-2L with coordination-free execution, but neither provides this coordination-free functionality without substantial modification. Accordingly, it is possible to implement many guarantees weaker than serializability—including coordination-free semantics—and still not achieve a coordination-free implementation. Thus, coordination-free execution must often be explicitly considered in concurrency control designs.

4.3.2 Empirical Impact: Isolation Guarantees

To investigate the performance implications of coordination-free weak isolation implementations in a real-world environment, we implemented a database prototype providing several of the guarantees in this chapter. We verify that, as Chapter 2's measurements suggested, "strongly consistent" algorithms incur substantial latency penalties (over WAN, 10 to 100 times higher than their coordination-free counterparts), whereas the coordination-free algorithms scale linearly. Our goal is not a complete performance analysis of coordination-free semantics but instead a demonstration of coordination-free designs on real-world infrastructure.

Implementation. Our prototype database is a partially replicated (hash-partitioned) key-value store backed by LevelDB and implemented in Java using Apache Thrift. It supports eventual consistency (hereafter, eventual) using a last-writer-wins reconciliation policy for concurrent writes, effectively providing Read Uncommitted replication, via standard all-to-all anti-entropy between replicas. We also support non-coordination-free operation whereby all operations for a given key are routed to a (randomly) designated master replica for each key (guaranteeing single-key linearizability, as in Gilbert and Lynch's CAP Theorem proof [118] and in PNUTS [83]'s "read latest" operation; hereafter, master) as well as distributed two-phase locking. Servers are durable: they synchronously write to LevelDB before responding to client requests.
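The last-writer-wins policy used for the eventual configuration reduces to a simple merge; the sketch below is illustrative only (not the prototype's actual code) and assumes timestamps formed by combining a client ID with a per-client sequence number, as in Section 4.2.1.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of last-writer-wins reconciliation as used for Read Uncommitted replication.
final class LwwStore {
    static final class Version {
        final long timestamp;   // e.g., (clientId << 32) | clientSequenceNumber
        final byte[] value;

        Version(long timestamp, byte[] value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    private final Map<String, Version> store = new ConcurrentHashMap<>();

    // Applied both for direct client writes and for anti-entropy messages from peers:
    // keep whichever version carries the higher timestamp.
    void applyWrite(String key, Version incoming) {
        store.merge(key, incoming,
            (current, candidate) ->
                candidate.timestamp > current.timestamp ? candidate : current);
    }

    Version read(String key) {
        return store.get(key);
    }
}

Because the merge is deterministic and commutative, replicas converge regardless of the order in which anti-entropy delivers writes.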

Configuration. We deploy the database in clusters—disjoint sets of database servers that each contain a single, fully replicated copy of the data—across one or more datacenters and stick all clients within a datacenter to their respective cluster (trivially providing read-your-writes and monotonic reads guarantees). By default, we deploy 5 Amazon EC2 m1.xlarge instances (15GB RAM, with 4 cores comprising 8 "EC2 Compute Units") as servers in each cluster. For our workload, we link our client library to the YCSB benchmark [84], which is well suited to LevelDB's key-value schema, grouping each set of eight YCSB operations from the default workload (50% reads, 50% writes) to form a transaction. We increase the number of keys in the workload from the default 1,000 to 100,000 with uniform random key access, keep the default value size of 1KB, and run YCSB for 180 seconds per configuration.

Geo-replication. We first deploy the database prototype across an increasing number of datacenters. Figures 4.2A and 4.3A show that, when operating two clusters within a single datacenter, mastering each data item results in approximately half the throughput and double the latency of eventual. This is because the coordination-free models are able to utilize replicas in both clusters instead of always contacting the (single) master. RC—essentially eventual with buffering—is almost identical to eventual. Latency increases linearly with the number of YCSB clients.

In contrast, when the two clusters are deployed across the continental United States (Figures 4.2B and 4.3B), the average latency of master increases to 300ms (a 278–4257% latency increase; average 37ms latency per operation). For the same number of YCSB client threads, master has substantially lower throughput than the coordination-free configurations. Increasing the number of YCSB clients does increase the throughput of master, but our Thrift-based server-side connection processing did not gracefully handle more than several thousand concurrent connections. In contrast, across two datacenters, the performance of eventual and RC is nearly identical to a single-datacenter deployment.

[Figure: three panels plot average latency (ms) versus number of YCSB clients for the Eventual, RC, and Master configurations. A.) Within us-east (VA); B.) Between us-east (VA) and us-west-2 (OR); C.) Between us-east (VA), us-west-1 (CA), us-west-2 (OR), eu-west (IR), ap-northeast (SI).]

Figure 4.2: YCSB latency for two clusters of five servers each deployed within a single datacenter and cross-datacenters (note log scale for multi-datacenter deployment).

[Figure: three panels plot total throughput (1000 txns/s) versus number of YCSB clients for the Eventual, RC, and Master configurations. A.) Within us-east (VA); B.) Between us-east (VA) and us-west-2 (OR); C.) Between us-east (VA), us-west-1 (CA), us-west-2 (OR), eu-west (IR), ap-northeast (SI).]

Figure 4.3: YCSB throughput for two clusters of five servers each deployed within a single datacenter and cross-datacenters.

When five clusters (as opposed to two, as before) are deployed across the five EC2 datacenters with lowest communication cost (Figures 4.2C and 4.3C), the trend continues: master latency increases to nearly 800ms per transaction. As an attempt at reducing this overhead, we implemented and benchmarked a variant of quorum-based replication as in [95], in which clients sent requests to all replicas, which completed as soon as a majority of servers responded (guaranteeing regular semantics [134]). This strategy (not pictured) did not substantially improve performance due to the network topology and because worst-case server load was unaffected.

Because master performed far better than our textbook implementation, we have omitted performance data for two-phase locking. In addition to incurring the same WAN round-trip latency, locking also incurred substantial overheads due to mutual exclusion. While techniques such as those recently proposed in Calvin [217] can reduce the overhead of serializable transactions by avoiding locking, our mastered implementation and the data from Section 2.2.1 are reasonable lower bounds on latency.
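For reference, the quorum-read variant amounts to the following sketch (names assumed): the client sends the read to every replica and returns after a majority respond, without waiting for stragglers.

import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of a quorum read: issue the request to all replicas and complete once a
// majority of them have responded (regular register semantics, per the text).
final class QuorumReader {
    interface Replica {
        byte[] read(String key);
    }

    private final List<Replica> replicas;
    private final ExecutorService pool = Executors.newCachedThreadPool();

    QuorumReader(List<Replica> replicas) {
        this.replicas = replicas;
    }

    byte[] read(String key) throws InterruptedException, ExecutionException {
        int majority = replicas.size() / 2 + 1;
        CompletionService<byte[]> responses = new ExecutorCompletionService<>(pool);
        for (Replica r : replicas) {
            responses.submit(() -> r.read(key));
        }
        byte[] result = null;
        for (int received = 0; received < majority; received++) {
            result = responses.take().get();  // blocks for the next completed response
            // A fuller implementation would retain the highest-timestamped value seen.
        }
        return result;
    }
}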

Read proportion. Our default (equal) proportion of reads and writes is fairly pessimistic: for example, Facebook reports 99.8% reads for its workload [168]. As Figure 4.4 demonstrates, increasing the proportion of reads increases throughput; this is due to the decreased cost of read operations on each node. RC and eventual remain closely matched in throughput.

Scale-out. One of the key benefits of our coordination-free algorithms is that they are shared-nothing [214], meaning they should not compromise scalability. Figure 4.5 shows that varying the number of servers across two clusters in Virginia and Oregon (with 15 YCSB clients per server) results in linear scale-out for eventual and RC: increasing the number of servers per cluster from 5 to 25 yields an approximately 5x throughput increase.

Summary. Our experimental prototype confirms our earlier analytical intuition. Coordination-free systems can provide useful semantics without substantial performance penalties. Our coordination-free algorithms circumvent the high WAN latencies that are inevitable with non-coordination-free implementations. Our results highlight Deutsch's observation that ignoring factors such as latency can "cause big trouble and painful learning experiences" [97]—in a single-site context, paying the cost of coordination may be tenable, but, especially as services are geo-replicated, costs increase. In the next chapter, we examine a more sophisticated set of implementations.

[Figure: total throughput (1000 txns/s) versus write proportion for the Eventual and RC configurations.]

Figure 4.4: Proportion of reads and writes versus throughput.

[Figure: total throughput (1000 txns/s) versus total number of servers for the Eventual and RC configurations.]

Figure 4.5: Scale-out of Eventual and RC.

4.4 Isolation Models

In this section, we formally define coordination-free transactional semantics as a reference for the previous section. Our formalism is based on that of Adya [9]. For the reader familiar with his formalism, this is a mostly straightforward exercise combining transactional models with distributed systems semantics. While the novel aspects of this work largely pertain to the use of these definitions, we believe it is instructive to state the definitions precisely as well.

To begin, we describe our model of weakly isolated transactions. This is, with the exception of sessions, identical to that of Adya [9]. We omit a full duplication of his formalism here but highlight several salient criteria. We refer the interested reader to Adya's Ph.D. thesis, Section 3.1 (pp. 33–43).

Users submit transactions to a database system that contains a set of objects, each of

which is represented by multiple distinct versions. Each transaction is composed of writes, which create new versions of an object, and reads, which return a written version or the initial version of the object. A transaction's last operation is either commit or abort, and hence there is exactly one invocation of either of these two operations per transaction. Transactions can either read individual items or read based on predicates (or logical "ranges") of data items.

Definition 7 (Version set of a predicate-based operation). When a transaction executes a read or write based on a predicate P, the system selects a version for each tuple in P's relations. The set of selected versions is called the Version set of this predicate-based operation and is denoted by Vset(P).

A history over transactions has two parts: first, a partial order of events, comprised of several different types of edges that we describe below, which reflects the ordering of operations with respect to each transaction, and, second, a version order (≪) on the committed versions of each object. The history forms a graph, with nodes corresponding to either transactions or operations within a transaction, and edges constructed as we describe below. As a departure from Adya's formalism, to capture the use of session guarantees, we allow transactions to be grouped into sessions. We represent sessions as a partial ordering on committed transactions such that each transaction in the history appears in at most one session.

Framework

To reason about isolation anomalies, we use Adya's concept of a conflict graph, which is composed of dependencies between transactions. The definitions in this section are directly from Adya, with two differences. First, we expand Adya's formalism to deal with per-item dependencies. Second, we define session dependencies (Definition 20).

Definition 8 (Change the Matches of a Predicate-Based Read). A transaction Ti changes the matches of a predicate-based read rj(P:Vset(P)) if Ti installs xi, xh immediately precedes xi in the version order, and xh matches P whereas xi does not or vice-versa. In this case, we also say that xi changes the matches of the predicate-based read. The above definition identifies Ti to be a transaction where a change occurs for the matched set of rj (P: Vset(P)).

Definition 9 (Directly item-read-depends by x). Tj directly item-read-depends by x on transaction Ti if Ti installs some object version xi and Tj reads xi.

Definition 10 (Directly item-read-depends). Tj directly item-read-depends on transaction Ti if Tj directly item-read-depends by x on Ti for some data item x.

Definition 11 (Directly predicate-read-depends by P). Transaction Tj directly predicate-read-depends by P on transaction Ti if Tj performs an operation rj(P: Vset(P)), xk ∈ Vset(P), i = k or xi ≪ xk, and xi changes the matches of rj(P: Vset(P)).

Definition 12 (Directly Read-Depends [9, Definition 2]). Tj directly read-depends on trans- action Ti if it directly item-read-depends or directly predicate-read-depends on Ti.

Definition 13 (Directly predicate-read-depends). Tj directly predicate-read-depends on Ti if Tj directly predicate-read-depends by P on Ti for some predicate P.

Definition 14 (Directly item-anti-depends by x). Tj directly item-anti-depends by x on trans- action Ti if Ti reads some object version xk and Tj installs x’s next version (after xk) in the version order. Note that the transaction that wrote the later version directly item-anti- depends on the transaction that read the earlier version.

Definition 15 (Directly item-anti-depends). Tj directly item-anti-depends on transaction Ti if Tj directly item-anti-depends by x on transaction Ti for some data item x.

Definition 16 (Directly predicate-anti-depends by P). Tj directly predicate-anti-depends by P on transaction Ti if Tj overwrites an operation ri(P : Vset(P)). That is, if Tj installs a later version of some object that changes the matches of a predicate based read performed by Ti.

Definition 17 (Directly Anti-Depends [9, Definition 4]). Transaction Tj directly anti-depends on transaction Ti if it directly item-anti-depends or directly predicate-anti-depends on Ti.

Definition 18 (Directly Write-Depends by x). A transaction Tj directly write-depends by x on transaction Ti if Ti installs a version xi and Tj installs x’s next version (after xi) in the version order.

Definition 19 (Directly Write-Depends [9, Definition 5]). A transaction Tj directly write-depends on transaction Ti if Tj directly write-depends by x on Ti for some item x.

Definition 20 (Session-Depends). A transaction Tj session-depends on transaction Ti if Ti and Tj occur in the same session and Ti precedes Tj in the session commit order.

The dependencies for a history H form a graph called its Directed Serialization Graph (DSG(H)). If Tj directly write-depends on Ti by x, we draw an edge Ti → Tj labeled ww_x. If Tj read-depends on Ti by x, we draw an edge Ti → Tj labeled wr_x. If Tj directly anti-depends on transaction Ti by x, we draw an edge Ti → Tj labeled rw_x. If Tj session-depends on Ti in session S, we draw an edge Ti → Tj labeled S [9, Definition 8]. We also consider the Unfolded Serialization Graph (USG(H)) that is a variation of the

DSG. The USG is specified for the transaction of interest, Ti, and a history, H, and is denoted by USG(H, Ti). For the USG, we retain all nodes and edges of the DSG except for 4.4. ISOLATION MODELS 59

Ti and the edges incident on it. Instead, we split the node for Ti into multiple nodes—one node for every read/write event in Ti. The edges are now incident on the corresponding operation of Ti.

USG(H, Ti) is obtained by transforming DSG(H) as follows. For each node p (p ≠ Ti) in DSG(H), we add a node to USG(H, Ti). For each edge from node p to node q in DSG(H), where p and q are different from Ti, we draw a corresponding edge in USG(H, Ti). Now we add a node corresponding to every read and write performed by Ti. Any edge that was incident on Ti in the DSG is now incident on the corresponding read or write operation of Ti in the USG. Finally, consecutive events in Ti are connected by order edges, e.g., if an action (e.g., an SQL statement) reads object yj and immediately follows a write on object x in transaction Ti, we add an order-edge from wi(xi) to ri(yj) [9, Section 4.2.1]. Note that creating a graph with "supernodes" replacing each set of read and write operations for each

Ti in USG(H) yields DSG(H).

Transactional Anomalies and Isolation Levels

Following Adya, we define isolation levels according to possible anomalies, typically represented by cycles in the serialization graphs. Definitions 27–43 are not found in Adya but are found (albeit not in this formalism) in Berenson et al. [49] and the literature on session guarantees [216, 224].

Definition 21 (Write Cycles (G0)). A history H exhibits phenomenon G0 if DSG(H) con- tains a directed cycle consisting entirely of write-dependency edges.

Definition 22 (Read Uncommitted). A system that provides Read Uncommitted isolation prohibits phenomenon G0.

Definition 23 (Aborted Reads (G1a)). A history H exhibits phenomenon G1a if it contains an aborted transaction T1 and a committed transaction T2 such that T2 has read some object (possibly via a predicate) modified by T1.

Definition 24 (Intermediate Reads (G1b)). A history H exhibits phenomenon G1b if it con- tains a committed transaction T2 that has read a version of object x (possibly via a predicate) written by transaction T1 that was not T1’s final modification of x.

Definition 25 (Circular Information Flow (G1c)). A history H exhibits phenomenon G1c if DSG(H) contains a directed cycle consisting entirely of dependency edges.

Definition 26 (Read Committed). A system that provides Read Committed isolation prohibits phenomena G0, G1a, G1b, and G1c.
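To make the graph-based style of these definitions concrete, the sketch below (illustrative names; not part of Adya's formalism) represents a DSG as a set of labeled edges and tests for phenomenon G0 by looking for a directed cycle made up entirely of write-dependency edges. Restricting instead to dependency edges of any kind (write- or read-dependency) gives an analogous check for G1c.

import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: detect phenomenon G0 (a cycle consisting entirely of write-dependency edges).
final class DsgChecker {
    enum EdgeType { WW, WR, RW, SESSION }

    static final class Edge {
        final int from, to;          // transaction identifiers
        final EdgeType type;

        Edge(int from, int to, EdgeType type) {
            this.from = from;
            this.to = to;
            this.type = type;
        }
    }

    // True iff the subgraph restricted to WW edges contains a directed cycle.
    static boolean exhibitsG0(Collection<Edge> edges) {
        Map<Integer, List<Integer>> wwAdjacency = new HashMap<>();
        for (Edge e : edges) {
            if (e.type == EdgeType.WW) {
                wwAdjacency.computeIfAbsent(e.from, k -> new ArrayList<>()).add(e.to);
            }
        }
        Set<Integer> finished = new HashSet<>();
        for (Integer start : wwAdjacency.keySet()) {
            if (hasCycleFrom(start, wwAdjacency, finished, new HashSet<>())) {
                return true;
            }
        }
        return false;
    }

    private static boolean hasCycleFrom(int node, Map<Integer, List<Integer>> adjacency,
                                        Set<Integer> finished, Set<Integer> onPath) {
        if (onPath.contains(node)) return true;    // back edge: directed cycle found
        if (finished.contains(node)) return false; // already fully explored
        onPath.add(node);
        for (int next : adjacency.getOrDefault(node, Collections.emptyList())) {
            if (hasCycleFrom(next, adjacency, finished, onPath)) return true;
        }
        onPath.remove(node);
        finished.add(node);
        return false;
    }
}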

Definition 27 (Item-Many-Preceders (IMP)). A history H exhibits phenomenon IMP if DSG(H) contains a transaction Ti such that Ti directly item-read-depends by x on more than one other transaction.

T1 : wx(1)

T2 : wx(2)

T3 : rx(1) rx(2)

Figure 4.6: Example of IMP anomaly.

[DSG: T1 → T3 labeled wr_x; T2 → T3 labeled wr_x.]

Figure 4.7: DSG for Figure 4.6.

Definition 28 (Item Cut Isolation (I-CI)). A system that provides Item Cut Isolation pro- hibits phenomenon IMP.

Definition 29 (Predicate-Many-Preceders (PMP)). A history H exhibits phenomenon PMP

if, for all predicate-based reads ri(Pi : Vset(Pi)) and rj(Pj : Vset(Pj)) in Tk such that the logical ranges of Pi and Pj overlap (call it Po), the set of transactions that change the matches of Po for ri and rj differ.

Definition 30 (Predicate Cut Isolation (P-CI)). A system that provides Predicate Cut Isola- tion prohibits phenomenon PMP.

Definition 31 (Observed Transaction Vanishes (OTV)). A history H exhibits phenomenon OTV if USG(H) contains a directed cycle consisting of exactly one read-dependency edge

by x from Tj to Ti and a set of edges by y containing at least one anti-dependency edge from Ti to Tj and Tj’s read from y precedes its read from x.

T1 : wx(1) wy(1)

T2 : wx(2) wy(2)

T3 : rx(2) ry(1)

Figure 4.8: Example of OTV anomaly.

[DSG: T1 → T2 labeled ww_{x,y}; T2 → T3 labeled wr_x; T1 → T3 labeled wr_y; T3 → T2 labeled rw_y.]

Figure 4.9: DSG for Figure 4.8.

Definition 32 (Monotonic Atomic View (MAV)). A system that provides Monotonic Atomic View isolation prohibits phenomenon OTV in addition to providing Read Committed iso- lation.

The following session guarantees are directly adapted from Terry et al.’s original defini- tions [216]:

Definition 33 (Non-monotonic Reads (N-MR)). A history H exhibits phenomenon N-MR if DSG(H) contains a directed cycle consisting of a transitive session-dependency between transactions Tj and Ti with an anti-dependency edge by x from Tj and a read-dependency edge by x into Ti.

Definition 34 (Monotonic Reads (MR)). A system provides Monotonic Reads if it prohibits phenomenon N-MR.

T1 : wx(1)

T2 : wx(2)

T3 : rx(2)

T4 : rx(1)

Figure 4.10: Example of N-MR violation when wx(1) ≪ wx(2) and T4 directly session-depends on T3.

[DSG: T1 → T2 labeled ww_x; T2 → T3 labeled wr_x; T4 → T2 labeled rw_x; T3 → T4 labeled si.]

Figure 4.11: DSG for Figure 4.10; the wr_x dependency from T1 to T4 is omitted.

Definition 35 (Non-monotonic Writes (N-MW)). A history H exhibits phenomenon N-MW if DSG(H) contains a directed cycle consisting of a transitive session-dependency between transactions Tj and Ti and at least one write-dependency edge.

T1 : wx(1)

T2 : wy(1)

T3 : ry(1) rx(0)

Figure 4.12: Example of N-MW anomaly if T2 directly session-depends on T1.

[DSG: T1 → T2 labeled si; T2 → T3 labeled wr_y; T3 → T1 labeled rw_x.]

Figure 4.13: DSG for Figure 4.12.

Definition 36 (Monotonic Writes (MW)). A system provides Monotonic Writes if it prohibits phenomenon N-MW.

Definition 37 (Missing Read-Write Dependency (MRWD)). A history H exhibits phenomenon MRWD if, in DSG(H), there exist committed transactions T1, T2, T3 such that T2 read-depends on T1, T3 read-depends on T2, and T3 directly anti-depends on T1.

T1 : wx(1)

T2 : rx(1) wy(1)

T3 : ry(1) rx(0)

Figure 4.14: Example of MRWD anomaly.

wrx T1 T2 rw wry x T3 Figure 4.15: DSG for Figure 4.14.

Definition 38 (Writes Follow Reads (WFR)). A system provides Writes Follow Reads if it prohibits phenomenon MRWD.

Definition 39 (Missing Your Writes (MYR)). A history H exhibits phenomenon MYR if DSG(H) contains a directed cycle consisting of a transitive session-dependency between transactions Tj and Ti, at least one anti-dependency edge, and the remainder anti-dependency or write-dependency edges.

T1 : wx(1)

T2 : rx(0)

Figure 4.16: Example of MYR anomaly if T2 directly session-depends on T1.

[DSG: T1 → T2 labeled si; T2 → T1 labeled rw_x.]

Figure 4.17: DSG for Figure 4.16.

Definition 40 (Read Your Writes (RYW)). A system provides Read Your Writes if it prohibits phenomenon MYR.

Definition 41 (PRAM Consistency). A system provides PRAM Consistency if it prohibits phenomena N-MR, N-MW, and MYR.

Definition 42 (Causal Consistency). A system provides Causal Consistency if it provides PRAM Consistency and prohibits phenomenon MRWD.

Definition 43 (Lost Update). A history H exhibits phenomenon Lost Update if DSG(H) contains a directed cycle having one or more item-anti-dependency edges and all edges are by the same data item x.

Definition 44 (Write Skew (Adya G2-item)). A history H exhibits phenomenon Write Skew if DSG(H) contains a directed cycle having one or more item-anti-dependency edges.

For Snapshot Isolation, we depart from Adya's recency-based definition (see Adya, Section 4.3). Nonetheless, implementations of this definition will still be unavailable due to their reliance on preventing Lost Update.

Definition 45 (Snapshot Isolation). A system that provides Snapshot Isolation prevents phenomena G0, G1a, G1b, G1c, PMP, OTV, and Lost Update.

For Repeatable Read, we return to Adya.

Definition 46 (Repeatable Read). A system that provides Repeatable Read isolation prohibits phenomena G0, G1a, G1b, G1c, and Write Skew.

4.5 Summary

In this chapter, we have shown that many previously defined isolation and data consistency models from the database and distributed systems communities are invariant confluent and can be implemented in a coordination-free manner. While traditional implementations of several of these semantics employ coordination, for those that we prove invariant confluent, this is not strictly necessary. Thus, existing applications that are built using one of these existing models may enjoy the benefits of coordination-free execution.

Chapter 5

Coordination Avoidance and RAMP Transactions

In the previous chapter, we identified several existing isolation and distributed consistency guarantees as coordination-free. Our goal was proof-of-concept algorithms and systems support for these levels. In this section, we go further. First, we develop a new isolation model that is tailored to a set of existing use cases for which there is no existing, sufficiently powerful invariant confluent semantics. Second, we develop high-performance algorithms for enforcing those semantics.

Specifically, in this chapter, we address a largely underserved class of applications requiring multi-partition, atomically visible (see footnote 1) transactional access: cases where all or none of each transaction's effects should be visible. In fact, this access corresponds to two semantics from the previous chapter: MAV combined with Cut Isolation. The status quo for these multi-partition atomic transactions provides an uncomfortable choice between algorithms that are fast but deliver inconsistent results and algorithms that deliver consistent results but are often slow and unavailable under failure. Many of the largest modern, real-world systems opt for protocols that guarantee fast and scalable operation but provide few—if any—transactional semantics for operations on arbitrary sets of data items [65, 74, 83, 95, 138, 192, 227]. This may lead to anomalous behavior for several use cases requiring atomic visibility, including secondary indexing, foreign key constraint enforcement, and materialized view maintenance (Section 6.4.1). In contrast, many traditional transactional mechanisms correctly ensure atomicity of updates [53, 85, 217]. However, these algorithms—such as two-phase locking and variants of optimistic concurrency

Footnote 1: Our use of "atomic" (specifically, Read Atomic isolation) concerns all-or-nothing visibility of updates (i.e., the ACID isolation effects of ACID atomicity; Section 5.3). This differs from uses of "atomicity" to denote serializability [53] or linearizability [28].

control—are often coordination-intensive, slow, and, under failure, unavailable in a distributed environment [32, 89, 141, 187]. This specific dichotomy between scalability and atomic visibility has been described as "a fact of life in the big cruel world of huge systems" [133]. Our contribution in this chapter is to demonstrate that atomically visible transactions on partitioned databases are not at odds with scalability. We provide high-performance implementations of a new, non-serializable isolation model called Read Atomic (RA) isolation, corresponding to MAV with Cut Isolation. RA ensures that all or none of each transaction's updates are visible to others and that each transaction reads from an atomic snapshot of database state (Section 5.3)—this is useful in the applications we target. We subsequently develop three new, scalable algorithms for achieving RA isolation that we collectively title Read Atomic Multi-Partition (RAMP) transactions (Section 5.4). RAMP transactions guarantee scalability and outperform existing atomic algorithms because they satisfy two key scalability constraints. First, RAMP transactions guarantee coordination-free execution: per Chapter 2, one client's transactions cannot cause another client's transactions to stall or fail. Second, RAMP transactions guarantee partition independence: clients only contact partitions that their transactions directly reference (i.e., there is no central master, coordinator, or scheduler). Together, these properties ensure guaranteed completion, limited coordination across partitions, and horizontal scalability for multi-partition access.

RAMP transactions are scalable because they appropriately control the visibility of updates without inhibiting concurrency. Rather than force concurrent reads and writes to stall, RAMP transactions allow reads to "race" writes: RAMP transactions can autonomously detect the presence of non-atomic (partial) reads and, if necessary, repair them via a second round of communication with servers. To accomplish this, RAMP writers attach metadata to each write and use limited multi-versioning to prevent readers from stalling. The three algorithms we present offer a trade-off between the size of this metadata and performance. RAMP-Small transactions require constant space (a timestamp per write) and two round-trip time delays (RTTs) for reads and writes. RAMP-Fast transactions require metadata size that is linear in the number of writes in the transaction but only require one RTT for reads in the common case and two in the worst case. RAMP-Hybrid transactions employ Bloom filters [59] to provide an intermediate solution. Traditional techniques like locking couple atomic visibility and mutual exclusion; RAMP transactions provide the benefits of the former without incurring the scalability, availability, or latency penalties of the latter.

In addition to providing a theoretical analysis and proofs of correctness, we demonstrate that RAMP transactions deliver in practice. Our RAMP implementation achieves linear scalability to over 7 million operations per second on a 100-server cluster (at overhead below 5% for a workload of 95% reads). Moreover, across a range of workload configurations, RAMP transactions incur limited overhead compared to other techniques and achieve higher performance than existing approaches to atomic visibility (Section 5.5).
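To preview the read-side idea concretely, the following highly simplified sketch (illustrative names; see Section 5.4 for the actual RAMP algorithms and their correctness arguments) shows how writer-attached metadata lets a reader detect a partial ("fractured") read and repair it with a second round of by-version requests.

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Simplified sketch of "reads race writes, then repair": each committed write carries
// its transaction timestamp and the set of sibling keys written by the same transaction.
final class RepairingReader {
    static final class Ver {
        final long txnTimestamp;
        final byte[] value;
        final Set<String> siblings;   // other keys written in the same transaction

        Ver(long txnTimestamp, byte[] value, Set<String> siblings) {
            this.txnTimestamp = txnTimestamp;
            this.value = value;
            this.siblings = siblings;
        }
    }

    interface Partition {
        Ver latest(String key);                   // first round: latest committed version
        Ver byTimestamp(String key, long ts);     // second round: a specific version
    }

    private final Map<String, Partition> partitionFor;

    RepairingReader(Map<String, Partition> partitionFor) {
        this.partitionFor = partitionFor;
    }

    Map<String, Ver> readAll(Set<String> keys) {
        // First round: read each key's latest committed version (in parallel, in practice).
        Map<String, Ver> first = new HashMap<>();
        for (String k : keys) {
            first.put(k, partitionFor.get(k).latest(k));
        }
        // Detect: for each key, the highest timestamp of any retrieved sibling set that
        // includes it tells us which version we should have observed.
        Map<String, Ver> result = new HashMap<>(first);
        for (String k : keys) {
            Ver have = first.get(k);
            long required = (have == null) ? Long.MIN_VALUE : have.txnTimestamp;
            for (Ver v : first.values()) {
                if (v != null && v.siblings.contains(k)) {
                    required = Math.max(required, v.txnTimestamp);
                }
            }
            // Repair: fetch the missing version by timestamp in a second round.
            if (have == null ? required != Long.MIN_VALUE : have.txnTimestamp < required) {
                result.put(k, partitionFor.get(k).byTimestamp(k, required));
            }
        }
        return result;
    }
}

Because the reader never waits for in-flight writers, this style of detection-plus-repair preserves non-blocking reads, which is the property the chapter exploits.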
While the literature contains an abundance of isolation models, we believe that the large number of modern applications requiring RA isolation and the excellent scalability of RAMP transactions justify the addition of yet another model. RA isolation is too weak for some applications, but, for the many that it can serve, RAMP transactions offer substantial benefits. The remainder of this chapter proceeds as follows: Section 5.1 presents an overview of RAMP transactions and describes key use cases based on industry reports. Section 5.3 defines Read Atomic isolation, presents both a detailed comparison with existing isolation guarantees and a syntactic condition, the Read-Subset-Items-Written property, that guarantees equivalence to serializable isolation, and defines two key scalability criteria for RAMP algorithms to provide. Section 5.4 presents and analyzes three RAMP algorithms, which we experimentally evaluate in Section 5.5. Section 5.6 presents modifications of the RAMP protocols to better support multi-datacenter deployments and to enforce transitive dependencies. Section 5.10 concludes with a discussion of extensions to the protocols presented here.

5.1 Overview

In this chapter, we consider the problem of making transactional updates atomically visible to readers—a requirement that, as we outline in this section, is found in several prominent use cases today. The basic property we provide is fairly simple: either all or none of each transaction’s updates should be visible to other transactions. For example, if x and y are initially null and a transaction T1 writes x = 1 and y = 1, then another transaction T2 should not read x = 1 and y = null. Instead, T2 should either read x = 1 and y = 1 or, possibly, x = null and y = null. Informally, each transaction reads from an unchanging snapshot of database state that is aligned along transactional boundaries. We call this property atomic visibility and formalize it via the Read Atomic isolation guarantee in Section 5.3.

The classic strategy for providing atomic visibility is to ensure mutual exclusion between readers and writers. For example, if a transaction like T1 above wants to update data items x and y, it can acquire exclusive locks for each of x and y, update both items, then release the locks. No other transactions will observe partial updates to x and y, ensuring atomic visibility. However, this solution has a drawback: while one transaction holds exclusive locks on x and y, no other transactions can access x and y for either reads or writes. By using mutual exclusion to enforce the atomic visibility of updates, we have also limited concurrency. In our example, if x and y are located on different servers, concurrent readers and writers will be unable to perform useful work during communication delays. These communication delays form an upper bound on throughput: effectively, 1/(message delay) operations per second.

To avoid this upper bound, we separate the problem of providing atomic visibility from the mechanism of mutual exclusion. By achieving the former but avoiding the latter, the algorithms we develop in this chapter are not subject to the scalability penalties of many prior approaches. To ensure that all servers successfully execute a transaction (or that none do), our algorithms employ an atomic commitment protocol (ACP). When coupled with a coordinating concurrency control mechanism such as locking, ACPs are harmful to scalability and availability: arbitrary failures can (provably) cause any ACP implementation to stall [53]. We instead use ACPs with non-blocking concurrency control mechanisms; this means that individual transactions can stall due to failures or communication delays without forcing other transactions to stall. In a departure from traditional concurrency control, we allow multiple ACP rounds to proceed in parallel over the same data. The end result—what we call RAMP transactions—provides excellent scalability and performance under contention (e.g., in the event of write hotspots) and is robust to partial failure. RAMP transactions’ non-blocking behavior means that they cannot provide certain guarantees like preventing concurrent updates. However, the applications we highlight—for which Read Atomic isolation is sufficient to maintain correctness—will benefit from our algorithms. The remainder of this section identifies several relevant use cases from industry that require atomic visibility for correctness.

5.2 Read Atomic Isolation in the Wild

As a simple example, consider a social networking application: if two users, Sam and Mary, become “friends” (a bi-directional relationship), other users should never see that Sam is a friend of Mary but Mary is not a friend of Sam: either both relationships should be visible, or neither should be. A transaction under Read Atomic isolation would correctly enforce this behavior. We can further classify three general use cases for Read Atomic isolation:

1. Foreign key constraints. Many database schemas contain information about relationships between records in the form of foreign key constraints. For example, Facebook’s TAO [65], LinkedIn’s Espresso [192], and Yahoo! PNUTS [83] store information about business entities such as users, photos, and status updates as well as relationships between them (e.g., the friend relationships above). Their data models often represent bi-directional edges as two distinct uni-directional relationships. For example, in TAO, a user performing a “like” action on a Facebook page produces updates to both the LIKES and LIKED_BY associations [65]. PNUTS’s authors describe an identical scenario [83]. These applications require foreign key maintenance and often, due to their unidirectional relationships, multi-entity update and access. Violations of atomic visibility surface as broken bi-directional relationships (as with Sam and Mary above) and dangling or incorrect references. For example, clients should never observe that Frank is an employee of department.id=5, but no such department exists in the department table.

Under RA isolation, when inserting new entities, applications can bundle relevant entities from each side of a foreign key constraint into a transaction. When deleting associations, users can avoid dangling pointers by creating a “tombstone” at the opposite end of the association (i.e., delete any entries with associations via a special record that signifies deletion) [234].

2. Secondary indexing. Data is typically partitioned across servers according to a primary key (e.g., user ID). This allows fast location and retrieval of data via primary key lookups but makes access by secondary attributes challenging (e.g., indexing by birth date). There are two dominant strategies for distributed secondary indexing. First, the local secondary index approach co-locates secondary indexes and primary data, so each server contains a secondary index that only references and indexes data stored on its server [45, 192]. This allows easy, single-server updates but requires contacting every partition for secondary attribute lookups (write-one, read-all), compromising scalability for read-heavy workloads [65, 85, 192]. Alternatively, the global secondary index approach locates secondary indexes (which may be partitioned, but by a secondary attribute) separately from primary data [45, 83]. This alternative allows fast secondary lookups (read-one) but requires multi-partition update (at least write-two). Real-world services employ either local secondary indexing (e.g., Espresso [192], Cassandra, and Google Megastore’s local indexes [45]) or non-atomic (incorrect) global secondary indexing (e.g., Espresso and Megastore’s global indexes, Yahoo! PNUTS’s proposed secondary indexes [83]). The former is correct but uses coordination and limits the workloads that can scale; the latter avoids coordination and is scalable for a range of workloads but is incorrect. For example, in a database partitioned by id with an incorrectly-maintained global secondary index on salary, the query ‘SELECT id, salary WHERE salary > 60,000’ might return records with salary less than $60,000 and omit some records with salary greater than $60,000.

Under RA isolation, the secondary index entry for a given attribute can be updated atomically with base data. For example, suppose a secondary index is stored as a mapping from secondary attribute values to sets of item-versions matching the secondary attribute (e.g., the secondary index entry for users with blue hair would contain a list of user IDs and last-modified timestamps corresponding to all of the users with attribute hair-color=blue).
Insertions of new primary data require additions to the corresponding index entry, deletions require removals, and updates require a “tombstone” deletion from one entry and an insertion into another.

3. Materialized view maintenance. Many applications precompute (i.e., materialize) queries over data, as in Twitter’s Rainbird service [227], Google’s Percolator [188], and LinkedIn’s Espresso systems [192]. As a simple example, Espresso stores a mailbox of messages for each user along with statistics about the mailbox messages: for Espresso’s read-mostly workload, it is more efficient to maintain (i.e., pre-materialize) a count of unread messages rather than scan all messages every time a user accesses her mailbox [192]. In this case, any unread message indicators should remain in sync with the messages in the mailbox. However, atomicity violations will allow materialized views to diverge from the base data (e.g., Susan’s mailbox displays a notification that she has unread messages but all 60 messages in her inbox are marked as read). With RAMP transactions, base data and views can be updated atomically. The maintenance of a view depends on its specification [58, 79, 139], but RAMP transactions provide appropriate concurrency control primitives for ensuring that changes are delivered to the materialized view partition. For select-project views, a simple solution is to treat the view as a separate table and perform maintenance as needed: new rows can be inserted/deleted according to the specification, and, if necessary, the view can be (re-)computed on demand (i.e., lazy view maintenance [238]). For more complex views, such as counters, users can execute RAMP transactions over specialized data structures such as the CRDT G-Counter [205].
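To make the index use case concrete, the following minimal Python sketch illustrates the pattern under an assumed client API with atomically visible put_all/get_all multi-item operations (the names, the FakeRAClient stand-in, and the schema are our own illustration, not part of any of the systems above): the base row and its secondary-index entry are bundled into one write so that readers observe both or neither.

    # Illustrative only: a hypothetical client exposing atomically visible
    # put_all/get_all calls, with an in-process stand-in so the sketch runs.
    class FakeRAClient:
        """Toy single-process stand-in: applies each put_all all-or-nothing."""
        def __init__(self):
            self.store = {}
        def put_all(self, writes):
            self.store.update(writes)      # all-or-nothing in this toy model
        def get_all(self, keys):
            return {k: self.store.get(k) for k in keys}

    def add_user(client, user_id, name, hair_color, ts):
        # Bundle the base row and its secondary-index entry in one transaction,
        # so no reader sees the row without the index entry (or vice versa).
        idx_key = ("idx:hair_color", hair_color)
        entry = client.get_all([idx_key])[idx_key] or {"members": set()}
        entry["members"].add((user_id, ts))
        client.put_all({
            ("user", user_id): {"name": name, "hair_color": hair_color},
            idx_key: entry,
        })

    def users_with_hair_color(client, hair_color):
        # Read the index entry, then fetch the referenced base rows.
        idx_key = ("idx:hair_color", hair_color)
        entry = client.get_all([idx_key])[idx_key] or {"members": set()}
        return client.get_all([("user", uid) for (uid, _ts) in entry["members"]])

    client = FakeRAClient()
    add_user(client, 7, "Sam", "blue", ts=1)
    print(users_with_hair_color(client, "blue"))

Under a real RAMP implementation, concurrent add_user calls for the same attribute would additionally need a commutative merge for the index entry, as discussed in Section 5.3.2.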

Status Quo. Despite application requirements for Read Atomic isolation, few large-scale production systems provide it. For example, the authors of TAO, Espresso, and PNUTS describe several classes of atomicity anomalies exposed by their systems, ranging from dangling pointers to the exposure of intermediate states and incorrect secondary index lookups, often highlighting these cases as areas for future research and design [65, 83, 192]. These systems are not exceptions: data stores like Bigtable [74], Dynamo [95], and many popular “NoSQL” [179] and even some “NewSQL” [32] stores do not provide transactional guarantees for multi-item operations. Unless users are willing to sacrifice scalability by opting for serializable semantics [85], they are often left without transactional semantics. The designers of these Internet-scale, real-world systems have made a conscious decision to provide scalability at the expense of multi-partition transactional semantics. Our goal with RAMP transactions is to preserve this scalability but deliver atomically visible behavior that is sufficient to maintain key consistency criteria for the use cases we have described.

5.3 Semantics and System Model

In this section, we formalize Read Atomic isolation and, to capture scalability, formulate a pair of strict scalability criteria: coordination-free execution and partition independence. Readers more interested in RAMP algorithms may wish to proceed to Section 5.4.

5.3.1 RA Isolation: Formal Specification

To formalize RA isolation, as is standard [9, 53] (and as in Chapter 4), we consider ordered sequences of reads and writes to arbitrary sets of items, or transactions. We call the set of items a transaction reads from and writes to its item read set and item write set. Each write creates a version of an item, and we identify versions of items by a timestamp taken from a totally ordered set (e.g., natural numbers) that is unique across all versions of each item. Timestamps therefore induce a total order on versions of each item, and we denote version i of item x as xi. All items have an initial version ⊥ that is located at the start of each order of versions for each item and is produced by an initial transaction T⊥. Each transaction ends in a commit or an abort operation; we call a transaction that commits a committed transaction and a transaction that aborts an aborted transaction. In our model, we consider histories comprising a set of transactions along with their read and write operations, versions read and written, and commit or abort operations. In our example histories, all transactions commit unless otherwise noted.

Definition 47 (Fractured Reads). A transaction Tj exhibits the fractured reads phenomenon if transaction Ti writes versions xa and yb (in any order, where x and y may or may not be distinct items), Tj reads version xa and version yc, and c < b.

We also define Read Atomic isolation to prevent transactions from reading uncommitted or aborted writes. This is needed to capture the notion that, under RA isolation, readers only observe the final output of a given transaction that has been accepted by the database. To do so, we draw on existing definitions from the literature on weak isolation. Our RAMP protocols provide this property by assigning the final write to each item in each transaction the same timestamp. However, to avoid further confusion between the standard practice of assigning each final write in a serializable multi-version history the same timestamp [53] and the flexibility of timestamp assignment admitted in Adya’s formulation of weak isolation, we continue with the above definitions.

These criteria prevent readers from observing uncommitted versions (i.e., those produced by a transaction that has not committed or aborted), aborted versions (i.e., those produced by a transaction that has aborted), or intermediate versions (i.e., those produced by a transaction but later overwritten by writes to the same items by the same transaction). We can finally define Read Atomic isolation: Definition 48 (Read Atomic). A system provides Read Atomic isolation (RA) if it prevents fractured reads phenomena and also prevents transactions from reading uncommitted, aborted, or intermediate versions (i.e., Adya’s G0, G1a, G1b, G1c).

Thus, RA informally provides transactions with a “snapshot” view of the database that respects transaction boundaries (see Sections 5.3.3 and 5.3.4 for more details, including a discussion of transitivity). RA is simply a restriction on read visibility—if the ACID “Atomicity” property requires that all or none of a transaction’s updates are performed, RA requires that all or none of a transaction’s updates are visible to other transactions. Importantly, RA is invariant confluent: if two read/write histories each independently do not have fractured reads, composing them will not change the values returned by any read operations. This means that there is at least one coordination-free implementation of RA, which we will develop in Section 5.4.
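Definitions 47 and 48 are mechanical enough to check directly on small histories. The following Python sketch is purely illustrative (it is not part of the RAMP protocols): it scans a set of committed transactions for the fractured reads phenomenon, modeling the initial version ⊥ as timestamp 0.

    # Illustrative fractured-reads detector for simple committed histories.
    # Each transaction: {"writes": {item: ts}, "reads": {item: ts}}; ts 0 is bottom.
    def has_fractured_reads(transactions):
        for t_i in transactions:                      # candidate writer T_i
            for t_j in transactions:                  # candidate reader T_j
                if t_j is t_i:
                    continue
                for x, wrote_x in t_i["writes"].items():
                    if t_j["reads"].get(x) != wrote_x:    # T_j must read T_i's x
                        continue
                    for y, wrote_y in t_i["writes"].items():
                        read_y = t_j["reads"].get(y)
                        if read_y is not None and read_y < wrote_y:
                            return True               # read an older y than T_i wrote
        return False

    # History 5.1: T4 reads x_bottom and y_1 while T1 wrote both x_1 and y_1.
    t1 = {"writes": {"x": 1, "y": 1}, "reads": {}}
    t4 = {"writes": {}, "reads": {"x": 0, "y": 1}}
    t3 = {"writes": {}, "reads": {"x": 1, "y": 1}}
    print(has_fractured_reads([t1, t4]))   # True: T4 must abort under RA
    print(has_fractured_reads([t1, t3]))   # False: T3 may commit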

5.3.2 RA Implications and Limitations

As outlined in Section 5.2, RA isolation matches several common use cases. However, RA is not sufficient for all applications. RA does not prevent concurrent updates or provide serial access to data items; that is, under RA, two transactions are never prevented from both producing different versions of the same data items. For example, RA is an incorrect choice for an application that wishes to maintain positive bank account balances in the event of withdrawals. RA is a better fit for our “friend” operation because the operation is write-only and correct execution (i.e., inserting both records) is not conditional on concurrent updates. From a programmer’s perspective, we have found RA isolation to be most easily understandable (at least initially) with read-only and write-only transactions; after all, because RA allows concurrent writes, any values that are read might be changed at any time. However, read-write transactions are indeed well defined under RA. To handle conflicting operations, RA isolation benefits from the use of commutative and associative merge functions. The default behavior we present here is a “last write wins” policy, with ties broken according to version. However, more sophisticated datatypes such as commutative replicated sets, counters, and maps [205] are also useful, especially for data structures such as index entries. To illustrate these points, in Section 5.3.3, we describe RA’s relation to other formally defined isolation levels, and, in Section 5.3.4, we discuss when RA provides serializable outcomes.

5.3.3 RA Compared to Other Isolation Models

In this section, we illustrate RA’s relationship to alternative weak isolation models by both example and reference to particular isolation phenomena drawn from [9] and [32]. Formal definitions of the models below can be found in Section 4.4. RA is stronger than Read Committed, as Read Committed does not prevent fractured reads. History 5.1 does not respect RA isolation: after T1 commits, T2 and T3 could both commit but, to prevent fractured reads, T4 and T5 must abort. History 5.1 respects RC isolation, and all transactions can safely commit.

T1 w(x1); w(y1) (5.1)

T2 r(x⊥); r(y⊥)

T3 r(x1); r(y1)

T4 r(x⊥); r(y1)

T5 r(x1); r(y⊥)

Lost Updates. Lost Updates phenomena informally occur when two transactions simultaneously attempt to make conditional modifications to the same data item(s). RA does not prevent Lost Updates phenomena. History 5.2 exhibits the Lost Updates phenomenon but is valid under RA. That is, T1 and T2 can both commit under RA isolation.

T1 r(x⊥); w(x1) (5.2)

T2 r(x⊥); w(x2)

History 5.2 is invalid under a stronger isolation model that prevents Lost Updates phenomena, such as Snapshot Isolation or Cursor Stability. Under either of these models, the system would abort T1, T2, or both. However, Cursor Stability does not prevent fractured reads phenomena, so RA and Cursor Stability are incomparable.

Write Skew. RA does not prevent Write Skew phenomena. History 5.3 exhibits the Write Skew phenomenon (Adya’s G2) but is valid under RA. That is, T1 and T2 can both commit under RA isolation.

T1 r(y⊥); w(x1) (5.3)

T2 r(x⊥); w(y2)

History 5.3 is invalid under a stronger isolation model that prevents Write Skew phenomena. One stronger model is Repeatable Read. Under Repeatable Read isolation, the system would abort T1, T2, or both. (Importantly, Adya’s formulation of Repeatable Read is considerably stronger than the ANSI SQL standard specification; RA is stronger than the Cut Isolation we consider in Chapter 4.)

Missing dependencies. Notably, RA does not—on its own—prevent missing dependencies—in effect, missing transitive updates. We again reproduce Adya’s definitions below:

Definition 49 (Missing Transaction Updates). A transaction Tj misses the effects of a transaction Ti if Ti writes xi and commits and Tj reads another version xk such that k < i; i.e., Tj reads a version of x that is older than the version that was committed by Ti.

Adya subsequently defines a criterion that prohibits missing transaction updates across all types of dependency edges:

Definition 50 (No-Depend-Misses). If transaction Tj depends on transaction Ti, Tj does not miss the effects of Ti.

History 5.4 does not satisfy No-Depend-Misses but is still valid under RA. That is, T1, T2, and T3 can all commit under RA isolation. Thus, fractured reads prevention is similar to No-Depend-Misses but only applies to immediate read dependencies (rather than all transitive dependencies).

T1 w(x1); w(y1) (5.4)

T2 r(y1); w(z2)

T3 r(x⊥); r(z2)

History 5.4 is invalid under a stronger isolation model that prevents missing dependencies phenomena, such as standard semantics for Snapshot Isolation (notably, not Parallel Snapshot Isolation [210]) and Repeatable Read isolation. Under one of these models, the system would abort either T3 or all of T1, T2, and T3. This behavior is particularly important to the use cases that we discuss in Sections 5.3.2 and 5.3.4: writes that should be read together should be written together. We further discuss the benefits and enforcements of transitivity in Section 5.6.3.

OTV and Many-Preceders. The Fractured Reads phenomenon subsumes the Observed Transaction Vanishes and Many-Preceders phenomena from Chapter 4. To illustrate:

If Tj exhibits the OTV phenomenon, then Tj reads a version xa produced by Ti (as in the definition of Fractured Reads above), so there is a read-dependency edge by x from Tj to Ti in USG(H); moreover, because Ti anti-depends on Tj by y, Tj reads a version yc while Ti writes yb with c < b. Thus, every instance of OTV is also an instance of fractured reads, and preventing fractured reads prevents OTV. However, not every fractured read is an instance of OTV: in our example, if Tj reads xa before it reads yc, fractured reads have occurred, but OTV has not (due to the clause that “Tj’s read from y precedes its read from x” in Definition 31). Fractured reads also subsumes the many-preceders phenomenon (in the item-specific case, Definition 27). If Ti exhibits the IMP phenomenon, it directly item-depends by x on more than one transaction—say, Tj and Tk—so Ti reads versions xj and xk produced by Tj and Tk, respectively. By definition, j < k or k < j, and thus Ti also has fractured reads (taking x and y to be the same item in Definition 47). Again, not every fractured read is an instance of IMP. Consider the following history:

T1 w(x1); w(y1) (5.5)

T2 r(x⊥); r(y1)

T2 exhibits fractured reads but does not exhibit IMP.

Predicates. Thus far, we have not extensively discussed the use of predicate-based reads. As Adya notes [9] and we describe above, predicate-based isolation guarantees can be cast as an extension of item-based isolation guarantees (see also Adya’s PL-2L, which closely resembles RA). RA isolation is no exception to this rule. We can extend each RA definition to include predicates using Adya’s predicate-based formalism.

Relating to Additional Guarantees. RA isolation subsumes several other useful guarantees. RA prohibits Item-Many-Preceders and Observed Transaction Vanishes phenomena; RA also guarantees Item Cut Isolation, and with predicate support, RA subsumes Predicate Cut Isolation.


Figure 5.1: Comparison of RA with isolation levels from [9, 32]. RU: Read Uncommitted, RC: Read Committed, CS: Cursor Stability, MAV: Monotonic Atomic View, ICI: Item Cut Isolation, PCI: Predicate Cut Isolation, RA: Read Atomic, SI: Snapshot Isolation, RR: Repeatable Read (Adya PL-2.99), S: Serializable.

Thus, RA is a combination of Monotonic Atomic View and Item Cut Isolation (Section 4.4). Summary. Figure 5.1 relates RA isolation to several existing models. RA is stronger than Read Committed, Monotonic Atomic View, and Cut Isolation; weaker than Snapshot Isolation, Repeatable Read, and Serializability; and incomparable to Cursor Stability.

5.3.4 RA and Serializability

When we began this work, we started by examining the use cases outlined in Section 5.2 and deriving a weak isolation guarantee that would be sufficient to ensure their correct execution. For general-purpose read-write transactions, RA isolation may indeed lead to non-serializable (and possibly incorrect) database states and transaction outcomes. Yet, as Section 5.3.2 hints, there appears to be a broader “natural” pattern for which RA isolation appears to provide an intuitive (even “correct”) semantics. In this section, we show that for transactions with a particular property of their item read and item write sets, RA is, in fact, serializable. We define this property, called the read-subset-items-written (RSIW) property, prove that transactions obeying the RSIW property lead to serializable outcomes, and discuss the implications of the RSIW property for the applications outlined in Section 5.2. Because our system model operates on multiple versions, we must make a small refinement to our use of the term “serializability”—namely, we draw a distinction between serial and one-copy serializable schedules, per Bernstein et al. [53]. First, we say that two histories H1 and H2 are view equivalent if they contain the same set of committed transactions and have the same operations and DSG(H1) and DSG(H2) have the same direct read dependencies. For consistency with prior work, we say that Ti reads from Tj if Ti directly read-depends on Tj. We say that a transaction is read-only if it does not contain write operations and that a transaction is write-only if it does not contain read operations. In this section, we concern ourselves with one-copy serializability [53], which we define using the previous definition of view equivalence. Definition 51 (One-Copy Serializability). A history is one-copy serializable if it is view equivalent to a serial execution of the transactions over a single logical copy of the database.

The basic intuition behind the RSIW property is straightforward: under RA isolation, if application developers use a transaction to bundle a set of writes that should be observed together, any transactions that read from the items that were written will, in fact, behave “properly”—or one-copy serializably. That is, for read-only and write-only transactions, if each reading transaction only reads a subset of the items that another write-only transaction wrote, then RA isolation is equivalent to one-copy serializable isolation. Before proving that this behavior is one-copy serializable, we can more precisely characterize this condition as follows:

Definition 52 (Read-Subset-Items-Written). A read-only transaction Tr exhibits the Read-Subset-Items-Written property if, whenever Tr reads a version produced by a write-only transaction Tw, Tr only reads items written to by Tw.

For example, consider the following History 5.6:

T1 w(x1); w(y1) (5.6)

T2 r(x1); r(y1)

T3 r(x1); r(z⊥)

Under History 5.6, T2 exhibits the RSIW property because it reads a version produced by transaction T1 and its item read set ({x, y}) is a subset of T1’s item write set ({x, y}). However, T3 does not exhibit the RSIW property because i.) T3 reads from T1 but T3’s read set ({x, z}) is not a subset of T1’s write set ({x, y}) and ii.), perhaps more subtly, T3 reads from both T1 and T⊥. We say that a history H containing read-only and write-only transactions exhibits the RSIW property (or has RSIW) if every read-only transaction in H exhibits the RSIW property. This brings us to our main result in this section: Theorem 2. If a history H containing read-only and write-only transactions has RSIW and is valid under RA isolation, then H is one-copy serializable.

The proof of Theorem 2 is by construction: given a history H that has RSIW and is valid under RA isolation, we describe how to derive an equivalent one-copy serial execution of the transactions in H. We begin with the construction procedure, provide examples of how to apply the procedure, then prove that the procedure converts RSIW histories to their one-copy serial equivalents. We provide the proof in Section 5.7.
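As an illustration of Definition 52 and Theorem 2 (our own sketch, not part of the formal development), the following Python function checks whether a history of read-only and write-only transactions has RSIW, treating timestamp 0 as the initial version written by T⊥.

    # Illustrative RSIW check over read-only/write-only transactions.
    # Writers: {"writes": {item: ts}}; readers: {"reads": {item: ts}}; ts 0 = bottom.
    def has_rsiw(transactions):
        writers = [t for t in transactions if "writes" in t]
        for r in (t for t in transactions if "reads" in t):
            sources = {}                              # writer id -> its write set
            for item, ts in r["reads"].items():
                if ts == 0:
                    sources["T_bottom"] = set()       # initial transaction
                else:
                    for w in writers:
                        if w["writes"].get(item) == ts:
                            sources[id(w)] = set(w["writes"])
            if len(sources) != 1:
                return False                          # reads from >1 transaction
            (write_set,) = sources.values()
            if write_set and not set(r["reads"]) <= write_set:
                return False                          # read set not a subset
        return True

    # History 5.6: T2 reads {x, y} from T1 (has RSIW); T3 reads from T1 and T_bottom.
    t1 = {"writes": {"x": 1, "y": 1}}
    t2 = {"reads": {"x": 1, "y": 1}}
    t3 = {"reads": {"x": 1, "z": 0}}
    print(has_rsiw([t1, t2]))   # True
    print(has_rsiw([t1, t3]))   # False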

Utility. Theorem 2 is helpful because it provides a simple syntactic condition for understanding when RA will provide one-copy serializable access. For example, we can apply this theorem to our use cases from Section 5.2. In the case of multi-entity update and read, if clients issue read-only and write-only transactions that obey the RSIW property, their result sets will be one-copy serializable. The RSIW property holds for equality-based lookup of single records from an index (e.g., fetch from the index and subsequently fetch the corresponding base tuple, each of which was written in the same transaction, e.g., auto-generated upon insertion of the tuple into the base relation). However, the RSIW property does not hold in the event of multi-tuple reads, leading to less intuitive behavior. Specifically, if two different clients trigger two separate updates to an index entry, some clients may observe one update but not the other, and other clients may observe the opposite behavior. In this case, the RAMP protocols still provide a snapshot view of the database according to the index(es)—that is, clients will never observe base data that is inconsistent with the index entries—but nevertheless surface non-serializable database states. Finally, for more general materialized view accesses, point queries and bulk insertions have RSIW. As discussed in Section 5.2, in the case of indexes and views, it is helpful to view each physical data structure (e.g., a CRDT [205] used to represent an index entry) as a collection of versions. In this case, the RSIW property applies only if clients make modifications to the entire collection at once (e.g., as in a DELETE CASCADE operation). Coupled with an appropriate algorithm ensuring RA isolation, we can ensure one-copy serializable isolation. This addresses a long-standing concern with our work: why is RA somehow “natural” for these use cases (but not necessarily all use cases)? We have encountered applications that do not require one-copy serializable access—such as the mailbox unread message maintenance from Section 5.2 and, in some cases, index maintenance for non-read-modify-write workloads—and therefore may safely violate RSIW. However, we believe the RSIW property is a handy principle (or, at the least, rule of thumb) for reasoning about applications of RA isolation and the RAMP protocols. Finally, the RSIW property is only a sufficient condition for one-copy serializable behavior under RA isolation. It is not necessary—for example, there are several alternative sufficient conditions to consider. As a natural extension, while RSIW only pertains to pairs of read-only and write-only transactions, one might consider allowing readers to observe multiple write transactions. For example, consider the following history:

T1 w(x1); w(y1) (5.7)

T2 w(u2); w(z2)

T3 r(x1); r(z2)

History 5.7 is valid under RA and is also one-copy serializable but does not have RSIW: T3 reads from two transactions’ write sets. However, consider the following history:

T1 : w(x1); w(y1) (5.8)

T2 : w(u2); w(z2)

T3 : r(x1); r(z⊥)

T4 : r(x⊥); r(z2)

History 5.8 is valid under RA, consists only of read-only and write-only transactions, yet is no longer one-copy serializable. T3 observes, in effect, a one-copy serializable prefix beginning with T⊥; T1 while T4 observes a prefix beginning with T⊥; T2. Neither T3 nor T4 observes the prefixes T⊥; T1; T2 or T⊥; T2; T1. Thus, while there may indeed be useful criteria beyond the RSIW property that we might consider as a basis for one-copy serializable execution under RA, we have observed RSIW to be the most intuitive and useful thus far. One clear criterion is to search for schedules or restrictions under RA with an acyclic Directed Serialization Graph (from Section 5.7). The reason why RSIW is so simple for read-only and write-only transactions is that each read-only transaction only reads from one other transaction and does not induce any additional anti-dependencies. Combining reads and writes complicates reasoning about the acyclicity of the graph. This exercise touches upon an important lesson in the design and use of weakly isolated systems: by restricting the set of operations accessible to a user (e.g., RSIW read-only and write-only transactions), one can often achieve more scalable implementations (e.g., using weaker semantics) without necessarily violating existing abstractions (e.g., one-copy serializable isolation). While prior work often focuses on restricting only operations (e.g., to read-only or write-only transactions [11, 168], or stored procedures [141, 142, 217], or single-site transactions [45]) or only semantics (e.g., weak isolation guarantees [32, 35, 168]), we see considerable promise in better understanding the intersection between and combinations of the two. This is often subtle and almost always challenging, but the results—as we found here—may be surprising.

5.3.5 System Model and Scalability

We consider databases that are partitioned, with the set of items in the database spread over multiple servers. Each item has a single logical copy, stored on a server—called the item’s partition—whose identity can be calculated using the item. Clients forward operations on each item to the item’s partition, where they are executed. In our examples, all data items have the null value (⊥) at database initialization. In this section, we do not model replication of data items within a partition; this can happen at a lower level of the system than our discussion (see Section 5.4.5) as long as operations on each item are linearizable [28].
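As a concrete (and assumed, not prescribed) example of how a partition's identity "can be calculated using the item," clients commonly hash the item key onto one of N partitions:

    # Illustrative hash partitioning: item key -> partition index.
    import hashlib

    def partition_for(item_key: str, n_partitions: int) -> int:
        digest = hashlib.sha1(item_key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % n_partitions

    # A client routes GET/PREPARE/COMMIT messages for item "x" accordingly.
    print(partition_for("x", 4))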

Scalability criteria. As we discussed in Section 5.1, large-scale deployments often eschew transactional functionality on the premise that it would be too expensive or unstable in the presence of failure and degraded operating modes [57, 65, 74, 83, 95, 133, 138, 192, 227]. Our goal in this chapter is to provide robust and scalable transactional functionality, so we first define criteria for “scalability”:

Per Section 2.3, Coordination-free execution ensures that one client’s transactions cannot cause another client’s transactions to block and that, if a client can contact the partition responsible for each item in its transaction, the transaction will eventually commit (or abort of its own volition). This prevents one transaction from causing another to abort—which is particularly important in the presence of partial failures—and guarantees that each client is able to make useful progress. Note that “strong” isolation models like serializability and Snapshot Isolation require coordination and thus limit scalability. Locking is an example of a non-coordination-free implementation mechanism.

Many applications can limit their data accesses to a single partition via explicit data modeling [45, 90, 133, 192] or planning [89, 187]. However, this is not always possible. In the case of secondary indexing, there is a cost associated with requiring single-partition updates (scatter-gather reads), while, in social networks like Facebook and large-scale hierarchical access patterns as in Rainbird [227], perfect partitioning of data accesses is near-impossible. Accordingly:

Partition independence ensures that, in order to execute a transaction, a client only contacts partitions for data items that its transaction directly accesses. Thus, a partition failure only affects transactions that access items contained on the partition. This also reduces load on servers not directly involved in a transaction’s execution. In the literature, partition independence for replicated data is also called replica availability [32] or genuine partial replication [201]. Using a centralized validator or scheduler for transactions is an example of a non-partition-independent implementation mechanism.

In addition to the above requirements, we limit the metadata overhead of algorithms. There are many potential solutions for providing atomic visibility that rely on storing prohibitive amounts of state. We will attempt to minimize the metadata—that is, data that the transaction did not itself write but which is required for correct execution. In our algorithms, we will specifically provide constant-factor metadata overheads (RAMP-S, RAMP-H) or else overhead linear in transaction size (but independent of data size; RAMP-F). As an example of a solution using prohibitive amounts of metadata, each transaction could send copies of all of its writes to every partition it accesses so that readers observe all of its writes by reading a single item. This provides RA isolation but requires considerable storage. Other solutions may require extra data storage proportional to the number of servers in the cluster or, worse, the database size, which we discuss in Related Work (Chapter 7).

5.4 RAMP Transaction Algorithms

Given specifications for RA isolation and scalability, we present algorithms for achieving both. For ease of understanding, we first focus on providing read-only and write-only transactions with a “last writer wins” overwrite policy, then subsequently discuss how to perform read/write transactions. Our focus in this section is on intuition and understanding; we defer all correctness and scalability proofs to Section 5.8, providing salient details inline. At a high level, RAMP transactions allow reads and writes to proceed concurrently. This provides excellent performance but, in turn, introduces certain race conditions that could cause undesirable anomalies: one transaction might only read a subset of another transaction’s writes, violating RA (i.e., fractured reads might occur). Instead of preventing this race (hampering scalability), RAMP readers autonomously detect the race (using metadata attached to each data item) and fetch any missing, in-flight writes from their respective partitions. To make sure that readers never have to wait for writes to arrive at a partition, writers use a two-phase (atomic commitment) protocol that ensures that once a write is visible to readers on one partition, any other writes in the transaction are present on and, if appropriately identified by version, readable from their respective partitions. In this section, we present three algorithms that provide a trade-off between the amount of metadata required and the expected number of extra reads to fetch missing writes. As discussed in Section 5.1, while techniques like distributed locking couple mutual exclusion with atomic visibility of writes, RAMP transactions correctly control visibility but allow concurrent and scalable execution.

[Figure 5.2 (space-time diagram) appears here.]

Figure 5.2: Space-time diagram for RAMP-F execution for two transactions T1 and T2 performed by clients C1 and C2 on partitions Px and Py. Lightly-shaded boxes represent current partition state (lastCommit and versions), while the single darker box encapsulates all messages exchanged during C2’s execution of transaction T2. Because T1 overlaps with T2, T2 must perform a second round of reads to repair the fractured read between x and y. T1’s writes are assigned timestamp 1. In our depiction, each item does not appear in its list of writes (e.g., Px sees {y} only and not {x, y}).

5.4.1 RAMP-Fast

To begin, we present a RAMP algorithm that, in the race-free case, requires one RTT for reads and two RTTs for writes, called RAMP-Fast (abbreviated RAMP-F; Algorithm 1).

RAMP-F stores metadata in the form of write sets (overhead linear in transaction size).

Overview. Each write in RAMP-F (lines 14–23) contains a timestamp (line 15) that uniquely identifies the writing transaction as well as a set of items written in the transaction (line 16). For now, combining a unique client ID and client-local sequence number is sufficient for timestamp generation (see also Section 5.4.4). RAMP-F write transactions proceed in two phases: a first round of communication places each timestamped write on its respective partition. In this PREPARE phase, each partition adds the write to its local database (versions, lines 1, 17–20). A second round of communication (lines 21–23) marks versions as committed. In this COMMIT phase, each partition updates an index containing the highest-timestamped committed version of each item (lastCommit, lines 2, 6–8). RAMP-F read transactions begin by first fetching the last (highest-timestamped) committed version for each item from its respective partition (lines 25–33). Using the results from this first round of reads, each reader can calculate whether it is “missing” any versions (that is, versions that were prepared but not yet committed on their partitions). The reader calculates a mapping from each item i to the highest-timestamped version of i that appears in the metadata of any version (of i or of any other item) in the first-round read set (lines 29–32). If the reader has read a version of an item that has a lower timestamp than indicated in the mapping for that item, the reader issues a second read to fetch the missing version (by timestamp) from its partition (lines 33–36). Once all missing versions are fetched (which can be done in parallel), the client can return the resulting set of versions—the first-round reads, with any missing versions replaced by the optional, second round of reads.

By example. Consider the RAMP-F execution depicted in Figure 5.2. T1 writes to both x and y, performing the two-round write protocol on two partitions, Px and Py. However, T2 reads from x and y while T1 is concurrently writing. Specifically, T2 reads from Px after Px has committed T1’s write to x, but T2 reads from Py before Py has committed T1’s write to y. Therefore, T2’s first-round reads return x = x1 and y = ⊥, and returning this set of reads would violate RA. Using the metadata attached to its first-round reads, T2 determines that it is missing y1 (since vlatest[y] = 1 and 1 > ⊥) and so T2 subsequently issues a second read from Py to fetch y1 by version. After completing its second-round read, T2 can safely return its result set. T1’s progress is unaffected by T2, and T1 subsequently completes by committing y1 on Py.

Why it works. RAMP-F writers use metadata as a record of intent: a reader can detect if it has raced with an in-progress commit round and use the metadata stored by the writer to fetch the missing data. Accordingly, RAMP-F readers only issue a second round of reads in the event that they read from a partially-committed write transaction (where some but not all partitions have committed a write). In this event, readers will fetch the appropriate writes from the not-yet-committed partitions. Most importantly, RAMP-F readers never have to stall waiting for a write that has not yet arrived at a partition: the two-round RAMP-F write protocol guarantees that, if a partition commits a write, all of the corresponding writes in the transaction are present on their respective partitions (though possibly not committed locally). As long as a reader can identify the corresponding version by timestamp, the reader can fetch the version from the respective partition’s set of pending writes without waiting. To enable this, RAMP-F writes contain metadata linear in the size of the writing transaction’s write set (plus a timestamp per write).
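To restate the reader-side logic in executable form, here is a compact, single-process Python sketch of RAMP-F's read repair (an illustration under simplifying assumptions, with no networking or failures, rather than the dissertation's implementation); it reproduces the Figure 5.2 scenario in which T1 has committed on Px but not yet on Py.

    # Single-process sketch of RAMP-Fast client-side read repair (illustrative).
    from collections import defaultdict

    class Partition:
        def __init__(self):
            self.versions = {}        # (item, ts) -> {"value": ..., "md": set(...)}
            self.last_commit = {}     # item -> highest committed timestamp

        def prepare(self, item, ts, value, md):
            self.versions[(item, ts)] = {"value": value, "md": md}

        def commit(self, item, ts):
            self.last_commit[item] = max(self.last_commit.get(item, 0), ts)

        def get(self, item, ts_req=None):
            ts = self.last_commit.get(item, 0) if ts_req is None else ts_req
            v = self.versions.get((item, ts), {"value": None, "md": set()})
            return {"item": item, "ts": ts, **v}

    def get_all(partitions, items):
        # First round: highest committed version of each item.
        ret = {i: partitions[i].get(i) for i in items}
        # From returned metadata, compute the highest timestamp seen per item.
        v_latest = defaultdict(int)
        for r in ret.values():
            for other in r["md"]:
                v_latest[other] = max(v_latest[other], r["ts"])
        # Second round: fetch any missing (prepared but not yet committed) versions.
        for i in items:
            if v_latest[i] > ret[i]["ts"]:
                ret[i] = partitions[i].get(i, ts_req=v_latest[i])
        return ret

    px, py = Partition(), Partition()
    px.prepare("x", 1, "x1", md={"y"}); py.prepare("y", 1, "y1", md={"x"})
    px.commit("x", 1)    # Py has prepared y1 but not committed it yet
    print(get_all({"x": px, "y": py}, ["x", "y"]))   # returns x1 and y1, never y=None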

RAMP-F requires two RTTs for writes: one for PREPARE and one for COMMIT. For reads, RAMP-F requires one RTT in the absence of concurrent writes and two RTTs otherwise. RAMP timestamps are only used to identify specific versions and in ordering concurrent writes to the same item; RAMP-F transactions do not require a “global” timestamp author- ity. For example, if lastCommit[k] = 2, there is no requirement that a transaction with timestamp 1 has committed or even that such a transaction exists.

5.4.2 RAMP-Small: Trading Metadata for RTTs

While RAMP-F requires metadata size linear in write set size but provides best-case one RTT for reads, RAMP-Small (RAMP-S) uses constant metadata but always requires two RTTs for reads (Algorithm 2). RAMP-S and RAMP-F writes are identical, but, instead of attaching the entire write set to each write, RAMP-S writers only store the transaction timestamp (line 7). Unlike RAMP-F, RAMP-S readers issue a first round of reads to fetch the highest committed timestamp for each item from its respective partition (lines 3, 9–12). Then the readers send the entire set of timestamps they received to the partitions in a second round of communication (lines 14–16). For each item in the read request, RAMP-S servers return the highest-timestamped version of the item that also appears in the supplied set of timestamps (lines 5–6). Readers subsequently return the results from the mandatory second round of requests.

By example. In Figure 5.3, under RAMP-S, Px and Py respectively return the sets {1} and {⊥} in response to T2’s first round of reads. T2 would subsequently send the set {1, ⊥} to both Px and Py, which would return x1 and y1. (Including ⊥ in the second-round request is unnecessary, but we leave it in for ease of understanding.)

Why it works. In RAMP-S, if a transaction has committed on some but not all partitions, the transaction timestamp will be returned in the first round of any concurrent read transaction accessing the committed partitions’ items. In the (required) second round of read requests,

Server-side Data Structures
1: versions: set of versions ⟨item, value, timestamp tsv, metadata md⟩
2: lastCommit[i]: last committed timestamp for item i

Server-side Methods
3: procedure PREPARE(v : version)
4:   versions.add(v)
5:   return

6: procedure COMMIT(tsc : timestamp)
7:   Its ← {w.item | w ∈ versions ∧ w.tsv = tsc}
8:   ∀i ∈ Its, lastCommit[i] ← max(lastCommit[i], tsc)

9: procedure GET(i : item, tsreq : timestamp)
10:   if tsreq = ∅ then
11:     return v ∈ versions : v.item = i ∧ v.tsv = lastCommit[item]
12:   else
13:     return v ∈ versions : v.item = i ∧ v.tsv = tsreq

Client-side Methods
14: procedure PUT_ALL(W : set of ⟨item, value⟩)
15:   tstx ← generate new timestamp
16:   Itx ← set of items in W
17:   parfor ⟨i, v⟩ ∈ W do
18:     v ← ⟨item = i, value = v, tsv = tstx, md = (Itx − {i})⟩
19:     invoke PREPARE(v) on respective server (i.e., partition)
20:   end parfor
21:   parfor server s : s contains an item in W do
22:     invoke COMMIT(tstx) on s
23:   end parfor

24: procedure GET_ALL(I : set of items)
25:   ret ← {}
26:   parfor i ∈ I do
27:     ret[i] ← GET(i, ∅)
28:   end parfor
29:   vlatest ← {} (default value: −1)
30:   for response r ∈ ret do
31:     for itx ∈ r.md do
32:       vlatest[itx] ← max(vlatest[itx], r.tsv)
33:   parfor item i ∈ I do
34:     if vlatest[i] > ret[i].tsv then
35:       ret[i] ← GET(i, vlatest[i])
36:   end parfor
37:   return ret

Algorithm 1: RAMP-Fast

[Figure 5.3 (space-time diagram) appears here.]

Figure 5.3: Space-time diagram for RAMP-S execution for two transactions T1 and T2 performed by clients C1 and C2 on partitions Px and Py. Lightly-shaded boxes represent current partition state (lastCommit and versions), while the single darker box encapsulates all messages exchanged during C2’s execution of transaction T2. T2 first fetches the highest committed timestamp from each partition, then fetches the corresponding version. In this depiction, partitions only return timestamps instead of actual versions in response to first-round reads.

any prepared-but-not-committed partitions will find the committed timestamp in the reader-provided set and return the appropriate version. In contrast with RAMP-F, where readers explicitly provide partitions with a specific version to return in the (optional) second round,

Server-side Data Structures
  same as in RAMP-F (Algorithm 1)

Server-side Methods
  PREPARE, COMMIT same as in RAMP-F

1: procedure GET(i : item, tsset : set of timestamps)
2:   if tsset = ∅ then
3:     return v ∈ versions : v.item = i ∧ v.tsv = lastCommit[k]
4:   else
5:     tsmatch = {t | t ∈ tsset ∧ ∃v ∈ versions : v.item = i ∧ v.tsv = t}
6:     return v ∈ versions : v.item = i ∧ v.tsv = max(tsmatch)

Client-side Methods
7: procedure PUT_ALL(W : set of ⟨item, value⟩)
     same as RAMP-F PUT_ALL but do not instantiate md on line 18

8: procedure GET_ALL(I : set of items)
9:   tsset ← {}
10:   parfor i ∈ I do
11:     tsset.add(GET(i, ∅).tsv)
12:   end parfor
13:   ret ← {}
14:   parfor item i ∈ I do
15:     ret[i] ← GET(i, tsset)
16:   end parfor
17:   return ret

Algorithm 2: RAMP-Small

RAMP-S readers defer the decision of which version to return to the partition, which uses the reader-provided set to decide. This saves metadata but increases RTTs, and the size of the parameters of each second-round GET request is (worst-case) linear in the read set size. Unlike RAMP-F, there is no requirement to return the value of the last committed version in the first round (returning the version timestamp, lastCommit[k], suffices in line 3).

5.4.3 RAMP-Hybrid: An Intermediate Solution

RAMP-Hybrid (RAMP-H; Algorithm 3) strikes a compromise between RAMP-F and RAMP-S. RAMP-H and RAMP-S write protocols are identical, but, instead of storing the entire write set (as in RAMP-F), RAMP-H writers store a Bloom filter [59] representing the transaction write set (line 1). RAMP-H readers proceed as in RAMP-F, with a first round of communication to fetch the last-committed version of each item from its partition (lines 3–6). Given this set of versions, RAMP-H readers subsequently compute a list of potentially higher-timestamped writes for each item (lines 8–11). Any potentially missing versions are fetched in a second round of reads (lines 12–13).

By example. In Figure 5.2, under RAMP-H, x1 would contain a Bloom filter with positives for x and y and y⊥ would contain an empty Bloom filter. T2 would check for the presence of y in x1’s Bloom filter (since x1’s version is 1 and 1 > ⊥) and, finding a match, conclude that it is potentially missing a write (y1). T2 would subsequently fetch y1 from Py.

Why it works. RAMP-H is effectively a hybrid between RAMP-F and RAMP-S. If the Bloom filter has no false positives, RAMP-H reads behave like RAMP-F reads. If the Bloom filter has all false positives, RAMP-H reads behave like RAMP-S reads. Accordingly, the number of (unnecessary) second-round reads (i.e., which would not be performed by RAMP-F) is controlled by the Bloom filter false positive rate, which is in turn (in expectation) determined by the size of the Bloom filter: larger filters yield fewer false positives. Any second-round GET requests are accompanied by a set of timestamps that is also proportional in size to the false positive rate. Therefore, RAMP-H exposes a trade-off between metadata size and expected performance. To understand why RAMP-H is safe, we simply have to show that any false positives (second-round reads) will not compromise the integrity of the result set; with unique timestamps, any reads due to false positives will return null.
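To make the metadata trade-off concrete, a toy Bloom filter (parameters and hashing scheme are our own choices, purely illustrative) can stand in for RAMP-H's write-set metadata: the writer adds the other items in its write set, and the reader's positive lookups (including occasional false positives) trigger second-round reads.

    # Toy Bloom filter for RAMP-Hybrid-style write-set metadata (illustrative).
    import hashlib

    class Bloom:
        def __init__(self, n_bits=64, n_hashes=3):
            self.n_bits, self.n_hashes, self.bits = n_bits, n_hashes, 0
        def _positions(self, key):
            for i in range(self.n_hashes):
                h = hashlib.sha1(f"{i}:{key}".encode()).digest()
                yield int.from_bytes(h[:4], "big") % self.n_bits
        def add(self, key):
            for p in self._positions(key):
                self.bits |= 1 << p
        def lookup(self, key):
            return all(self.bits & (1 << p) for p in self._positions(key))

    md = Bloom()
    md.add("y")                       # writer of {x, y} attaches this to x1
    # Reader: a positive lookup triggers a second-round GET by timestamp; a
    # false positive costs an extra read but never an incorrect result.
    print(md.lookup("y"), md.lookup("z"))   # True, (almost certainly) False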

5.4.4 Summary and Additional Details

The RAMP algorithms allow readers to safely race writers without requiring either to stall. The metadata attached to each write allows readers in all three algorithms to safely handle concurrent and/or partial writes and in turn allows a trade-off between metadata size and performance (Table 5.1): RAMP-F is optimized for fast reads, RAMP-S is optimized for small metadata, and RAMP-H is, as the name suggests, a middle ground. RAMP-F requires metadata linear in transaction size, while RAMP-S and RAMP-H require constant metadata. However, RAMP-S and RAMP-H require more RTTs for reads compared to RAMP-F when there is no race between readers and writers. When reads and writes race, in the worst case, all algorithms require two RTTs for reads. Writes always require two RTTs to prevent readers from stalling due to missing, unprepared writes. RAMP algorithms are scalable because clients only contact partitions directly accessed by their transactions (partition independence), and clients cannot stall one another (are coordination-free). More specifically, readers do not interfere with other readers, writers do not interfere with other writers, and readers and writers can proceed concurrently. When a reader races a writer to the same items, the writer’s new versions will only become visible to the reader (i.e., be committed) once it is guaranteed that the reader will be able to fetch

Server-side Data Structures
  same as in RAMP-F (Algorithm 1)

Server-side Methods
  PREPARE, COMMIT same as in RAMP-F
  GET same as in RAMP-S

Client-side Methods
1: procedure PUT_ALL(W : set of ⟨item, value⟩)
     same as RAMP-F PUT_ALL but instantiate md on line 18 with Bloom filter containing Itx

2: procedure GET_ALL(I : set of items)
3:   ret ← {}
4:   parfor i ∈ I do
5:     ret[i] ← GET(i, ∅)
6:   end parfor
7:   vfetch ← {}
8:   for version v ∈ ret do
9:     for version v′ ∈ ret : v′ ≠ v do
10:      if v.tsv > v′.tsv ∧ v.md.lookup(v′.item) → True then
11:        vfetch[v′.item].add(v.tsv)
12:   parfor item i ∈ vfetch do
13:     ret[i] ← GET(i, vfetch[i]) if GET(i, vfetch[i]) ≠ ⊥
14:   end parfor
15:   return ret

Algorithm 3: RAMP-Hybrid

all of them (possibly via a second round of communication). A reader will never have to stall waiting for writes to arrive at a partition (for details, see Invariant 1 in the Appendix); however, the reader may have to contact the servers twice in order to fetch any versions that were missing from its first set of reads. Below, we discuss relevant implementation details.

Multi-versioning and garbage collection. RAMP transactions rely on multi-versioning to allow readers to access versions that have not yet committed and/or have been overwritten. In our pseudocode, we have presented an implementation based on multi-versioned storage; in practice, multi-versioning can be implemented by using a single-versioned storage engine for retaining the last committed version of each item and using a “look-aside” store for access to both prepared-but-not-yet-committed writes and (temporarily) any overwritten versions. The look-aside store should make prepared versions durable but can—at the risk of aborting transactions in the event of a server failure—simply store any overwritten versions in memory. Thus, with some work, RAMP algorithms are portable to non-multi-versioned storage systems.

             RTTs/transaction              Metadata (+stamp)
Algorithm    W    R (stable)   R (O)       Stored         Per-Request
RAMP-F       2    1            2           txn items      -
RAMP-S       2    2            2           -              stamp/item
RAMP-H       2    1 + ε        2           Bloom filter   stamp/item

Table 5.1: Comparison of basic algorithms: RTTs required for writes (W), reads (R) without concurrent writes and in the worst case (O), stored metadata and metadata attached to read requests (in addition to a timestamp for each).

In both architectures, each partition’s data will grow without bound if old versions are not removed. If a committed version of an item is not the highest-timestamped committed version (i.e., a committed version v of item k where v < lastCommit[k]), it can be safely discarded (i.e., garbage collected, or GCed) as long as no readers will attempt to access it in the future (via second-round GET requests). It is easiest to simply limit the running time of read transactions and GC overwritten versions after a fixed amount of real time has elapsed. Any read transactions that take longer than this GC window can be restarted [167, 168]. Therefore, the maximum number of versions retained for each item is bounded by the item’s update rate, and servers can reject any client GET requests for versions that have been GCed (and the read transaction can be restarted). This violates availability under asynchronous network behavior, so, as a fallback and a more principled solution, partitions can also gossip the timestamps of items that have been overwritten and have not been returned in the first round of any ongoing read transactions. Under RAMP-F, if a second-round read request arrives at a server and the server does not have that version due to garbage collection, it can safely ignore the request or signal failure.
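A minimal sketch of this policy (with an assumed wall-clock GC window; not the dissertation's implementation) might look as follows: an overwritten committed version is discarded once it has been superseded for longer than the window.

    # Illustrative garbage collection of overwritten versions.
    import time

    GC_WINDOW_SECONDS = 5.0   # assumed bound on read-transaction duration

    def garbage_collect(versions, last_commit, overwritten_at, now=None):
        """versions: {(item, ts): value}; overwritten_at: {(item, ts): wall time}."""
        now = time.time() if now is None else now
        for (item, ts) in list(versions):
            superseded = ts < last_commit.get(item, 0)
            expired = now - overwritten_at.get((item, ts), now) > GC_WINDOW_SECONDS
            if superseded and expired:
                del versions[(item, ts)]   # readers older than the window restart

    store = {("x", 1): "old", ("x", 2): "new"}
    garbage_collect(store, {"x": 2}, {("x", 1): time.time() - 10})
    print(store)    # {("x", 2): "new"}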

Read-write transactions. Until now, we have focused on read-only and write-only transactions. However, we can extend our algorithms to provide read-write transactions. If transactions pre-declare the data items they wish to read, then the client can execute a GET_ALL transaction at the start of transaction execution to pre-fetch all items; subsequent accesses to those items can be served from this pre-fetched set. Clients can buffer any writes and, upon transaction commit, send all new versions to servers (in parallel) via a PUT_ALL request. As in Section 5.3, this may result in anomalies due to concurrent update but does not violate RA isolation. Given the benefits of pre-declared read/write sets [89, 187, 217] and write buffering [85, 207], we believe this is a reasonable strategy. For secondary index lookups, clients can first look up secondary index entries then subsequently (within the same transaction) read primary data (specifying versions from index entries as appropriate).
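A minimal sketch of this strategy appears below, assuming a client object exposing get_all and put_all calls corresponding to the RAMP protocols; the class and method names are illustrative.

class ReadWriteTransaction:
    """Read-write transactions via a pre-declared read set and buffered
    writes (sketch; client.get_all/put_all stand in for the RAMP calls)."""
    def __init__(self, client, read_set):
        self.client = client
        self.snapshot = client.get_all(read_set)  # one GET_ALL up front
        self.write_buffer = {}

    def read(self, item):
        # Later reads are served from the prefetched, atomically read set
        # (buffered writes take precedence so the transaction sees them).
        return self.write_buffer.get(item, self.snapshot.get(item))

    def write(self, item, value):
        self.write_buffer[item] = value  # buffered until commit

    def commit(self):
        # All buffered writes are installed atomically via one PUT_ALL.
        if self.write_buffer:
            self.client.put_all(self.write_buffer)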

Timestamps. Timestamps should be unique across transactions and, for "session" consistency (Section 5.4.6), increase on a per-client basis. Given unique client IDs, a client ID and sequence number form unique transaction timestamps without coordination. Without unique client IDs, servers can assign unique timestamps with high probability using UUIDs and by hashing transaction contents.
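For example, a client-side generator might pack a client ID and a per-client sequence number into a single 64-bit timestamp, as sketched below; the field widths and the UUID fallback are illustrative assumptions rather than the prototype's exact encoding.

import itertools, uuid

class TimestampGenerator:
    """Unique, per-client-monotonic transaction timestamps (sketch).
    Field widths are illustrative: 40 bits of client ID, 24 bits of sequence."""
    def __init__(self, client_id=None):
        # Without an assigned client ID, fall back to a random (UUID-derived)
        # identifier, which is unique with high probability.
        self.client_id = client_id if client_id is not None \
            else uuid.uuid4().int & ((1 << 40) - 1)
        self.seq = itertools.count()

    def next(self):
        # Note: the 24-bit sequence wraps after 2^24 transactions in this toy
        # sketch; a real encoding would widen the field or include time.
        return (self.client_id << 24) | (next(self.seq) & ((1 << 24) - 1))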

Overwrites. In our algorithms, versions are overwritten according to a highest-timestamp-wins policy. In practice, and particularly for commutative updates, users may wish to employ a different policy upon COMMIT: for example, perform set union. In this case, lastCommit[k] contains an abstract data type (e.g., a set of versions) that can be updated with a merge operation [95, 216] (instead of updateIfGreater) upon commit. This treats each committed record as a set of versions, requiring additional metadata (that can be GCed as in Section 5.4.7).
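The two commit policies can be contrasted with a small sketch; the function names below are hypothetical, last_commit is modeled as a plain dictionary, and timestamps are assumed to be integers.

def commit_lww(last_commit, item, ts):
    # Default policy: highest-timestamp-wins (updateIfGreater).
    if ts > last_commit.get(item, -1):
        last_commit[item] = ts

def commit_set_union(last_commit, item, ts):
    # Merge policy for commutative updates: lastCommit[item] holds a set of
    # committed version timestamps, merged (set union) on every COMMIT.
    # Readers then reassemble the record from all retained versions; the
    # extra per-item metadata can be GCed as in Section 5.4.7.
    last_commit.setdefault(item, set()).add(ts)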

5.4.5 Distribution and Fault Tolerance

RAMP transactions operate in a distributed setting, which poses challenges due to latency, partial failure, and network partitions. Under coordination-free execution, failed clients do not cause other clients to fail, while partition independence ensures that clients only have to contact partitions for items in their transactions. This provides fault tolerance and availability as long as clients can access relevant partitions. In this section, we address the remaining concerns that arise in this setting. First, replication can be used to increase the number of servers hosting a partition, thus increasing availability. Second, we describe the RAMP protocol behavior when clients are unable to contact servers.

Replication. RAMP protocols can benefit from a variety of mechanisms including traditional database master-slave replication with failover, quorum-based protocols, and state machine replication, which increase the number of physical servers that host a given data item [53]. To improve durability, RAMP clients can wait until the effects of their operations (e.g., modifications to versions and lastCommit) are persisted to multiple physical servers before returning from PUT_ALL calls (either via master-to-slave replication or via quorum replication and by performing two-phase commit across multiple active servers). Notably, because RAMP transactions can safely overlap in time, replicas can process different transactions' PREPARE and COMMIT requests in parallel. Availability can also benefit in many protocols, such as quorum replication. We discuss more advanced replication techniques in Section 5.6.1.

Stalled Operations. RAMP writes use a two-phase atomic commitment protocol that ensures readers never block waiting for writes to arrive. As discussed in Section 6.4.1, every ACP may block during failures [53]. However, under coordination-free execution, a blocked transaction (due to failed clients, failed servers, or network partitions) cannot cause other transactions to block. Stalled writes act only as "resource leaks" on partitions: partitions will retain prepared versions indefinitely unless action is taken.

To "free" these leaks, RAMP servers can use the Cooperative Termination Protocol (CTP) described in [53]. CTP can always complete the transaction except when every partition has performed PREPARE but no partition has performed COMMIT. In CTP, if a server Sp has performed PREPARE for transaction T but times out waiting for a COMMIT, Sp can check the status of T on any other partitions for items in T's write set. If another server Sc has received COMMIT for T, then Sp can COMMIT T. If Sa, a server responsible for an item in T, has not received PREPARE for T, Sa and Sp can promise never to PREPARE or COMMIT T in the future and Sp can safely discard its versions. Under CTP, if a client blocks mid-COMMIT, the servers will ensure that the writes will eventually COMMIT and therefore become visible on all partitions. A client recovering from a failure can read from the servers to determine if they unblocked its write.

CTP only runs when writes block (or time-outs fire) and runs asynchronously with respect to other operations. CTP requires that PREPARE messages contain a list of servers involved in the transaction (a subset of RAMP-F metadata but a superset of RAMP-H and RAMP-S) and that servers remember when they COMMIT and "abort" writes (e.g., in a log file). Compared to alternatives (e.g., replicating clients [124]), we have found CTP to be both lightweight and effective. We evaluate CTP in Section 5.5.3.
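The following sketch captures the server-side decision rule described above; peer_status stands in for an RPC that returns another participant's recorded status for the transaction, txn.servers is the participant list carried in PREPARE messages, and the status names are illustrative.

PREPARED, COMMITTED, ABORTED, UNKNOWN = "prepared", "committed", "aborted", "unknown"

def cooperative_terminate(txn, local_status, peer_status):
    """Decide the fate of a stalled write (sketch of the CTP check)."""
    assert local_status == PREPARED  # CTP runs only after a PREPARE times out
    for server in txn.servers:
        status = peer_status(server)
        if status == COMMITTED:
            return "commit"   # some participant committed: safe to commit locally
        if status == ABORTED:
            return "abort"    # some participant already aborted the write
        if status == UNKNOWN:
            # That server never prepared: both servers promise never to
            # prepare/commit txn, so the local prepared version can be discarded.
            return "abort"
    # Every participant has prepared but none has committed: keep waiting
    # (the one case CTP cannot decide on its own).
    return "retry-later"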

5.4.6 Additional Semantics

While our RAMP transactions provide RA isolation, they also provide a number of additional useful guarantees. With linearizable servers, once a user's operation completes, all other users will observe its effects (regular register semantics, applied at the transaction level); this provides a notion of real-time recency. This also ensures that each user's operations are visible in the order in which they are committed. Our RAMP implementations provide a variant of PRAM consistency, where, for each item, each user's writes are serialized [164] (i.e., "session" ordering [91]). For example, if a user updates her privacy settings and subsequently posts a new photo, the photo cannot be read without the privacy setting change [83]. However, PRAM does not respect the happens-before relation [157] across users (or missing dependencies, as discussed in Section 5.3.3). If Sam reads Mary's comment and replies to it, other users may read Sam's comment without Mary's comment. We further discuss this issue in Section 5.6.3.

5.4.7 Further Optimizations

RAMP algorithms also allow several possible optimizations:

Faster commit detection. If a server returns a version in response to a GET request, then the transaction that created the version must have issued a COMMIT on at least one server. In this case, the server can safely mark the version as committed and update lastCommit. This means that the transaction commit will be reflected in any subsequent GET requests that read from lastCommit for this item—even though the COMMIT message from the client may yet be delayed. The net effect is that the later GET requests may not have to issue second-round reads to fetch the versions that otherwise would not have been marked as committed. This scenario will occur when all partitions have performed PREPARE and at least one server but not all partitions have performed COMMIT (as in CTP). This allows faster updates to lastCommit (and therefore fewer expected RAMP-F and RAMP-H RTTs).
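A sketch of this optimization in a server-side GET handler follows; the store object (with versions and last_commit maps, as in the earlier VersionStore sketch) is a hypothetical stand-in for the partition's state, not the prototype's actual interface, and timestamps are assumed to be integers.

def get_with_commit_detection(store, item, ts_req):
    """Server-side GET with the faster-commit-detection optimization (sketch).

    A second-round request for an explicit timestamp is evidence that the
    writing transaction committed somewhere, so the server can promote the
    prepared version locally before the client's COMMIT message arrives."""
    if ts_req is None:
        ts = store.last_commit.get(item)
        return store.versions.get((item, ts))
    version = store.versions.get((item, ts_req))
    if version is not None and ts_req > store.last_commit.get(item, -1):
        # Mark committed early: later first-round GETs for this item will
        # now return this version without needing a second round.
        store.last_commit[item] = ts_req
    return version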

Metadata garbage collection. Once all of transaction T’s writes are committed on each respective partition (i.e., are reflected in lastCommit), readers are guaranteed to read T’s writes (or later writes). Therefore, non-timestamp metadata for T’s writes stored in RAMP-F and RAMP-H (write sets and Bloom filters) can be discarded. Detecting that all servers have performed COMMIT can be performed asynchronously via a third round of communication performed by either clients or servers.

One-phase writes. We have considered two-phase writes, but, if a user does not wish to read her writes (thereby sacrificing session guarantees outlined in Section 5.4.6), the client can return after issuing its PREPARE round (without sacrificing durability). The client can subsequently execute the COMMIT phase asynchronously, or, similar to optimizations presented in Paxos Commit [124], the servers can exchange PREPARE acknowledgments with one another and decide to COMMIT autonomously. This optimization is safe because multiple PREPARE phases can safely overlap. We leverage a similar observation in Section 5.6.1.
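The sketch below shows the client-side shape of a one-phase write: return once every involved partition has acknowledged PREPARE, then complete COMMIT in the background. The partition objects and their prepare/commit methods are assumed stand-ins for the RPCs, and the thread pool is just one way to express the asynchrony.

from concurrent.futures import ThreadPoolExecutor

_background = ThreadPoolExecutor(max_workers=1)  # finishes COMMIT phases asynchronously

def put_all_one_phase(placements, ts):
    # `placements` is a list of (partition, versions_for_that_partition) pairs.
    with ThreadPoolExecutor(max_workers=len(placements)) as pool:
        # Phase 1 (synchronous): return only after every involved partition
        # has durably acknowledged PREPARE.
        list(pool.map(lambda pv: pv[0].prepare(pv[1], ts), placements))
    # Phase 2 (asynchronous): the writer gives up read-your-writes, but
    # overlapping PREPARE/COMMIT rounds of other transactions remain safe.
    _background.submit(lambda: [p.commit(ts) for p, _ in placements])
    return ts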

5.5 Experimental Evaluation

We proceed to experimentally demonstrate RAMP transaction scalability as compared to existing transactional and non-transactional mechanisms. RAMP-F, RAMP-H, and often RAMP-S outperform existing solutions across a range of workload conditions while exhibiting overheads typically within 8% and no more than 48% of peak throughput. As expected from our theoretical analysis, the performance of our RAMP algorithms does not degrade substantially under contention and scales linearly to over 7.1 million operations per second on 100 servers. These outcomes validate our goal of coordination-free design.

5.5.1 Experimental Setup

To demonstrate the effect of concurrency control on performance and scalability, we implemented several concurrency control algorithms in a partitioned, multi-versioned, main-memory database prototype. Our prototype is in Java and employs a custom RPC system for serialization. Servers are arranged as a distributed hash table [211] with partition placement determined by random hashing of keys to servers. As in stores like Dynamo [95], clients can connect to any server to execute operations, which the server will perform on their behalf (i.e., each server acts as a client in our RAMP pseudocode). We implemented RAMP-F, RAMP-S, and RAMP-H and configure a wall-clock GC window of 5 seconds as described in Section 5.4.4. RAMP-H uses a 256-bit Bloom filter based on an implementation of MurmurHash2.0 [23], with four hashes per entry; to demonstrate the effects of filter saturation, we do not modify these parameters in our experiments. Our prototype utilizes the faster commit detection optimization from Section 5.4.4. We chose not to employ metadata garbage collection and one-phase writes in order to preserve session guarantees and because metadata overheads were generally minor.
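For reference, a filter with these parameters can be sketched in a few lines; this version salts Python's built-in hash for self-containedness, whereas the prototype uses MurmurHash2.0, which (unlike Python's per-process hash) is stable across machines and therefore suitable for metadata that travels between clients and servers.

class SmallBloomFilter:
    """Fixed-size Bloom filter like the RAMP-H metadata (sketch:
    256 bits, 4 hash functions)."""
    def __init__(self, nbits=256, nhashes=4):
        self.nbits, self.nhashes = nbits, nhashes
        self.bits = 0

    def _positions(self, item):
        # Salt the built-in hash to simulate independent hash functions.
        return [hash((seed, item)) % self.nbits for seed in range(self.nhashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= (1 << pos)

    def might_contain(self, item):
        # May return True for items never added (false positive), never the
        # reverse, which is why RAMP-H second-round reads remain optional.
        return all(self.bits & (1 << pos) for pos in self._positions(item))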

Algorithms for comparison. As a baseline, we do not employ any concurrency control (denoted NWNR, for no write and no read locks); reads and writes take one RTT and are executed in parallel.

We also consider three lock-based mechanisms [125]: long write locks and long read locks, providing Repeatable Read isolation (PL-2.99; denoted LWLR), long write locks with short read locks, providing Read Committed isolation (PL-2L; denoted LWSR; does not provide RA), and long write locks with no read locks, providing Read Uncommitted isolation (LWNR; also does not provide RA). While only LWLR provides RA, LWSR and LWNR provide a useful basis for comparison, particularly in measuring concurrency-related locking overheads. To avoid deadlocks, the system lexicographically orders lock requests by item and performs them sequentially. When locks are not used (as for reads in LWNR and reads and writes for NWNR), the system parallelizes operations.

We also consider an algorithm where, for each transaction, designated "coordinator" servers enforce RA isolation: effectively, the Eiger system's 2PC-PCI mechanism [168] (denoted E-PCI; Chapter 7). Writes proceed via prepare and commit rounds, but any reads that arrive at a partition while a write transaction to the same item is pending must contact a (randomly chosen, per-write-transaction) "coordinator" partition to determine whether the coordinator's prepared writes have been committed. Writes require two RTTs, while reads require one RTT during quiescence and two RTTs in the presence of concurrent updates (to a variable number of coordinator partitions, linear in the number of concurrent writes to the item). Using a coordinator violates partition independence but, in this case, is still coordination-free. We optimize 2PC-PCI reads by having clients determine a read timestamp for each transaction (eliminating an RTT) and do not include happens-before metadata.

This range of lock-based strategies (LWNR, LWSR, LWLR), a recent comparable approach (E-PCI), and a best-case baseline (NWNR; no concurrency control) provides a spectrum of strategies for comparison.
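The deadlock-avoidance rule used by the lock-based baselines is simply ordered acquisition, as in the sketch below; lock_table is a hypothetical map from item to a standard mutex.

import threading
from collections import defaultdict

lock_table = defaultdict(threading.Lock)  # item -> lock (hypothetical)

def acquire_in_order(items):
    # Acquire per-item locks one at a time in lexicographic order, so that
    # no two transactions ever wait on each other in a cycle.
    held = sorted(set(items))
    for item in held:
        lock_table[item].acquire()
    return held

def release_all(held):
    for item in reversed(held):
        lock_table[item].release()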

Environment and benchmark. We evaluate each algorithm using the YCSB benchmark [84] and deploy variably-sized sets of servers on public cloud infrastructure. We employ cr1.8xlarge instances on Amazon EC2 and, by default, deploy five partitions on five servers. We group sets of reads and sets of writes into read-only and write-only transactions (default size: 4 operations), and use the default YCSB workload (workloada, with Zipfian-distributed item accesses) but with a 95% read and 5% write proportion, reflecting read-heavy applications (Section 6.4.1, [65, 168, 227]; e.g., TAO's 500 to 1 reads-to-writes [65, 168], Espresso's 1000 to 1 Mailbox application [192], and Spanner's 3396 to 1 advertising application [85]).

By default, we use 5,000 YCSB clients distributed across 5 separate EC2 instances. As in stock YCSB, each client makes a sequence of synchronous requests to the database. When we later vary the number of clients, we keep the number of servers hosting the clients fixed. To fully expose our metadata overheads, we use a value size of 1 byte per write. We found that lock-based algorithms were highly inefficient for YCSB's default 1,000-item database, so we increased the database size to one million items by default to decrease contention. Each version contains a timestamp (64 bits), and, with YCSB keys (i.e., item IDs) of size 11 bytes and a transaction length L, RAMP-F requires 11L bytes of metadata per version, while RAMP-H requires 32 bytes. We successively vary several parameters, including number of clients, read proportion, transaction length, value size, database size, and number of servers, and report the average of three sixty-second trials.
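The per-version metadata figures quoted above follow directly from these sizes; the tiny check below reproduces them (the helper names are ours).

def ramp_f_metadata_bytes(txn_len, key_bytes=11):
    # RAMP-F stores the writing transaction's item list alongside each
    # version: roughly key_bytes per sibling item.
    return key_bytes * txn_len

def ramp_h_metadata_bytes(bloom_bits=256):
    # RAMP-H stores a fixed-size Bloom filter regardless of transaction length.
    return bloom_bits // 8

# For the default transaction length of 4 operations:
assert ramp_f_metadata_bytes(4) == 44   # 11L bytes with L = 4
assert ramp_h_metadata_bytes() == 32    # 32 bytes, independent of L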

5.5.2 Experimental Results: Comparison

Our first set of experiments focuses on two metrics: performance compared to baseline and performance compared to existing techniques. The overhead of RAMP algorithms is typically less than 8% compared to baseline (NWNR) throughput, is sometimes zero, and is never greater than 50%. RAMP-F and RAMP-H always outperform the lock-based and E-PCI techniques, while RAMP-S outperforms lock-based techniques and often outperforms E-PCI. We proceed to demonstrate this behavior over a variety of conditions:

Number of clients. RAMP performance scales well with increased load and incurs little overhead (Figure 5.4). With few concurrent clients, there are few concurrent updates and therefore few second-round reads; performance for RAMP-F and RAMP-H is close to or even matches that of NWNR. At peak throughput with 10,000 clients, RAMP-F and RAMP-H pay a throughput overhead of 4.2% compared to NWNR. RAMP-F and RAMP-H exhibit near-identical performance; the RAMP-H Bloom filter triggers few false positives and therefore few extra RTTs compared to RAMP-F. RAMP-S incurs greater overhead and peaks at almost 60% of the throughput of NWNR. Its guaranteed two-round-trip reads are expensive, and it acts as an effective lower bound on RAMP-F and RAMP-H performance. In all configurations, the algorithms achieve low latency: RAMP-F, RAMP-H, and NWNR average less than 35 ms (and less than 10 ms at 5,000 clients), while RAMP-S averages less than 53 ms (14.3 ms at 5,000 clients).

In comparison, the remaining algorithms perform less favorably. In contrast with the RAMP algorithms, E-PCI servers must check a coordinator server for each in-flight write transaction to determine whether to reveal writes to clients. For modest load, the overhead of these commit checks places E-PCI performance between that of RAMP-S and RAMP-H. Under YCSB's Zipfian workload, there is a high probability that the several "hot" keys in the workload have a pending write, requiring an E-PCI commit check. The number of in-flight writes further increases with load, increasing the number of E-PCI commit checks. This in turn decreases throughput, and, with 10,000 concurrent clients, E-PCI performs so many commit checks per read that it underperforms the LWNR lock-based scheme. Under this configuration, more than 20% of reads trigger a commit check, and, on servers with hot items, each commit check requires indirected coordinator checks for an average of 9.84 transactions. Meanwhile, multi-partition locking is expensive [187]: with 10,000 clients, the most efficient algorithm, LWNR, attains only 28.6% of the throughput of NWNR, while the least efficient, LWLR, attains only 1.6% and peaks at 3,412 transactions per second.

We subsequently varied several other workload parameters, which we discuss below and plot in Figure 5.5:

Read proportion. Increased write activity leads to a greater number of races between reads and writes and therefore additional second-round RTTs for RAMP-F and RAMP-H reads. With all write transactions, all RAMP algorithms are equivalent (two RTTs) and achieve approximately 65% of the throughput of NWNR. With all reads, RAMP-F, RAMP-H, NWNR, and E-PCI are identical, with a single RTT. Between these extremes, RAMP-F and RAMP-H scale near-linearly with the write proportion. In contrast, lock-based protocols fare poorly as contention increases, while E-PCI again incurs penalties due to commit checks.

Transaction length. Increased transaction lengths have variable impact on the relative performance of RAMP algorithms. Coordination-free execution ensures long-running transactions are not penalized, but, with longer transactions, metadata overheads increase. RAMP-F relative throughput decreases due to additional metadata, which is linear in transaction length.


Figure 5.4: Throughput and latency under varying client load. We omit latencies for LWLR, which peaked at over 1.5s.

RAMP-H relative performance also decreases as its Bloom filters saturate. (However, YCSB's Zipfian-distributed access patterns result in a non-linear relationship between transaction length and throughput.) As discussed above, we explicitly decided not to tune the RAMP-H Bloom filter size, but a logarithmic increase in filter size could improve RAMP-H performance for large transaction lengths (e.g., 1024-bit filters should lower the false positive rate for transactions of length 256 from over 92% to slightly over 2%).

Value size. Value size similarly does not seriously impact relative throughput. At a value size of 1B, RAMP-F is within 2.3% of NWNR. However, at a value size of 100KB, RAMP-F performance nearly matches that of NWNR: the overhead due to metadata decreases, and write request rates slow, decreasing concurrent writes (and subsequently second-round RTTs). Nonetheless, absolute throughput drops by a factor of 24 as value size moves from 1B to 100KB.

Database size. RAMP algorithms are robust to high contention for a small set of items: with only 1,000 items in the database, RAMP-F achieves throughput within 3.1% of NWNR. RAMP algorithms are largely agnostic to read/write contention, although, with fewer items in the database, the probability of races between readers and in-progress writers increases, resulting in additional second-round reads for RAMP-F and RAMP-H. In contrast, lock-based algorithms fare poorly under high contention, while E-PCI's indirected commit checks again incur additional overhead. By relying on clients (rather than additional partitions) to repair fractured writes, RAMP-F, RAMP-H, and RAMP-S performance is less affected by hot items.

Overall, RAMP-F and RAMP-H exhibit performance close to that of no concurrency control due to their independence properties and guaranteed worst-case performance. As the proportion of writes increases, an increasing proportion of RAMP-F and RAMP-H operations take two RTTs and performance trends towards that of RAMP-S, which provides a constant two-RTT overhead. In contrast, lock-based protocols perform poorly under contention, while E-PCI triggers more commit checks than RAMP-F and RAMP-H trigger second-round reads (but still performs well without contention and for particularly read-heavy workloads). The ability to allow clients to independently verify read sets enables good performance despite a range of (sometimes adverse) conditions (e.g., high contention).

Figure 5.5: Algorithm performance across varying workload conditions (percentage of reads, transaction size, value size, and database size). RAMP-F and RAMP-H exhibit similar performance to the NWNR baseline, while RAMP-S's two-RTT reads incur a greater performance penalty across almost all configurations. RAMP transactions consistently outperform RA isolated alternatives.

5.5.3 Experimental Results: CTP Overhead

We also evaluated the overhead of blocked writes in our implementation of the Cooperative Termination Protocol discussed in Section 5.4.5. To simulate blocked writes, we artificially dropped a percentage of COMMIT commands in PUT_ALL calls such that clients returned from writes early and partitions were forced to complete the commit via CTP. This behavior is worse than expected because "blocked" clients continue to issue new operations. The table below reports the throughput reduction as the proportion of blocked writes increases (compared to no blocked writes) for a workload of 100% RAMP-F write transactions:

Blocked %      0.01%        0.1%      25%       50%
Throughput     No change    99.86%    77.53%    67.92%

As these results demonstrate, CTP can reduce throughput because each commit check consumes resources (namely, network and CPU capacity). However, CTP only performs commit checks in the event of blocked writes (or time-outs; set to 5s in our experiments), so a modest failure rate of 1 in 1,000 writes has a limited effect. The higher failure rates produce a near-linear throughput reduction but, in practice, a blocking rate of even a few percent is likely indicative of larger systemic failures. As Figure 5.5 hints, the effect of additional metadata for the participant list in RAMP-H and RAMP-S is limited, and, for our default workload of 5% writes, we observe similar trends but with throughput degradation of 10% or less across the above configurations. This validates our initial motivation behind the choice of CTP: average-case overheads are small.

5.5.4 Experimental Results: Scalability

We finally validate our chosen scalability criteria by demonstrating linear scalability of RAMP transactions to 100 servers. We deployed an increasing number of servers within the us-west-2 EC2 region and, to mitigate the effects of hot items during scaling, configured uniform random access to items. We were unable to include more than 20 instances in an EC2 "placement group," which guarantees 10 GbE connections between instances, so, past 20 servers, servers communicated over a degraded network. At around 40 servers, we exhausted the us-west-2b "availability zone" (datacenter) capacity and had to allocate our instances across the remaining zones, further degrading network performance. To avoid bottlenecks on the client, we deploy as many instances to host YCSB clients as we do to host prototype servers. However, as shown in Figure 5.6, each RAMP algorithm scales linearly. In expectation, at 100 servers, almost all transactions span multiple servers: all but one in 100M transactions is a multi-partition operation, highlighting the importance of partition independence.


Figure 5.6: RAMP transactions scale linearly to over 7 million operations/s with comparable performance to the NWNR baseline.

With 100 servers, RAMP-F achieves slightly under 7.1 million operations per second, or 1.79 million transactions per second, on a set of 100 servers (71,635 operations per partition per second). At all scales, RAMP-F throughput was always within 10% of NWNR. With 100 servers, RAMP-F was within 2.6%, RAMP-H within 3.4%, and RAMP-S within 45% of NWNR. In light of our scalability criteria, this behavior is unsurprising.

5.6 Applying and Modifying the RAMP Protocols

In this section, we discuss modifications to RAMP to enable multi-datacenter and efficient quorum replication as well as causally consistent operation. Our goals here are two-fold. First, we believe this section will be beneficial to systems implementers integrating RAMP protocols into databases such as Cassandra [153] that support wide-area and quorum-replicated deployments. Indeed, its inclusion reflects many helpful reader comments asking for clarification on this topic. Second, we believe this material is a useful inclusion for readers who are familiar with existing and recent work on both multi-datacenter and causally consistent replication. Namely, RAMP is compatible with many of these replication scenarios, and, in some cases, enables new optimizations.

5.6.1 Multi-Datacenter RAMP

The RAMP algorithms presented in this work have assumed linearizable server operation. Hence, if RAMP is used in a system where data items are replicated, then a linearizable replication mechanism must be used, such as a primary-backup or other replicated state machine approach. While this has simplified our discussion and results in reasonable performance in many environments, the cost of linearizability is often expensive, particularly in geo-replicated environments where latency is lower-bounded by the speed of light [8, 32]. While the RAMP algorithms' lack of coordination mitigates throughput penalties due to, for example, stalls during contended multi-partition access, actually accessing the partitions themselves may take time and increase the latency of individual operations. Moreover, in the event of partial failures, it is often beneficial to provide greater availability guarantees.

In this section, we discuss strategies for lowering the latency and improving the availability of operations. Our primary target in this setting is a multi-datacenter, geo-replicated context, where servers are located in separate clusters in possibly geographically remote regions. This setting has received considerable attention in recent research and, increasingly, in some of the largest production data management systems [85, 167, 168, 210]. The actual porting of concurrency control algorithms to this context is not terribly difficult, but any inefficiencies due to synchronization and coordination are magnified in this setting, making it an ideal candidate for practical study. Thus, we couch our discussion in the context of fully-replicated, geo-distributed clusters (i.e., groups of replicas of each partition).

The key challenge in achieving higher availability and lower latency in RAMP is ensuring that partially committed writes can still be completed. In the standard RAMP algorithms, this is accomplished by waiting to commit until after all partitions have prepared. Yet, in a replicated context, this waiting is potentially expensive; over wide-area networks, this can take hundreds of milliseconds. There are two straightforward ways to circumvent these overheads: deferring the commit operation and maintaining stickiness.

Prepare-F HA RAMP. The first strategy is easier to understand but perhaps less practical. A client specifies a minimum durability for its write operations, measured in terms of the number of failures it wishes to survive, F. When writing, the client issues a prepare request to all clusters and waits until it receives a successful response from F + 1 servers. This ensures that the client's write is durable, and the client knows its intent has been logged on at least F + 1 servers. The client transaction subsequently returns success (Figure 5.7a). Once all servers have received the prepare request (which is detectable either via server-to-server communication as in the CTP protocol or via an asynchronous callback on the client), the servers can begin to commit the client's writes autonomously. This preserves RA isolation, but at a cost. Namely, there is no guarantee of visibility of writes: a client is not guaranteed to read its own writes.

(a) High availability multi-cluster RAMP

(b) Sticky multi-cluster RAMP

Figure 5.7: Control flow for operations under multi-datacenter RAMP strategies with a client in Cluster A writing to partitions X and Y. In the high availability RAMP strategy (Figure 5.7a), a write must be prepared on F + 1 servers (here, F = 3) before it is committed. In the sticky RAMP strategy, a write can be prepared and committed within a single datacenter and asynchronously propagated to other datacenters, where it is subsequently prepared and committed (Figure 5.7b). The sticky strategy requires that clients maintain affinity with a single cluster in order to guarantee available and correctly isolated behavior.

Moreover, if a single server is offline, the servers will not begin the commit step, and clients will not observe the effects of the prepared transactions for an indefinite period of time. By ensuring availability of writes (i.e., clients return early), we have sacrificed visibility: writes are not guaranteed to be promptly accessible to readers. Thus, clients will not enjoy session guarantees [216] such as Read Your Writes. Given the importance of these session guarantees for many of the industrial users we have encountered (e.g., see Facebook's TAO geo-replication [65]), we currently do not favor this approach.
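A client-side sketch of the Prepare-F write path follows; the replica objects and their prepare method are assumed stand-ins for the per-cluster PREPARE RPC, and the thread pool is just one way to issue the requests in parallel.

from concurrent.futures import ThreadPoolExecutor, as_completed

def prepare_f_write(replicas, versions, ts, f):
    """Prepare-F HA write path (sketch): return once F + 1 replicas have
    durably acknowledged PREPARE; the remaining prepares, and the commit
    step, complete asynchronously (e.g., via CTP-style server-to-server
    communication or a client callback)."""
    pool = ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(r.prepare, versions, ts) for r in replicas]
    acked = 0
    for fut in as_completed(futures):
        if fut.result():          # replica acknowledged a durable PREPARE
            acked += 1
        if acked >= f + 1:
            return ts             # durable against f failures; commit happens later
    raise RuntimeError("could not reach F + 1 replicas")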

Sticky HA RAMP. The second strategy is to ensure a degree of stickiness, or affinity, between clients and servers within a datacenter [32]. Each client is assigned its own datacenter. Instead of having a client issue its writes to the entire database replica set, the client can instead issue its prepare and commit operations to its assigned datacenter (or local replica group) and subsequently forward the writes to be prepared and committed autonomously in separate clusters (Figure 5.7b). That is, once a writer has performed the appropriate RAMP protocol in its assigned datacenter, it can return. In an N-datacenter deployment, each full write protocol is performed N separate times, once per datacenter. If the same timestamp is assigned to each write, the end state of each datacenter will be equivalent. As long as a client remains connected to the same datacenter (i.e., is "sticky" with respect to its database connections), it will read its own writes. The total number of prepare and commit operations is the same as in the first strategy, but the commit point is staggered: each cluster reaches a commit point independently, at different times. Moreover, clusters operate independently, so throughput is not improved, only latency, because each cluster must replay every other cluster's writes [35].

This is the basic strategy espoused by traditional log shipping approaches [151] as well as more recent proposals such as the COPS [167] and Eiger [168] systems. However, this stickiness has an often-neglected penalty: a client can no longer connect to arbitrary servers and expect to read its own writes. If a server is down in a client's local datacenter, the client must, in the worst case, locate an entirely different replica set to which it can connect. This negatively affects availability: the Prepare-F strategy can utilize all servers at once, but the sticky strategy requires clients to maintain affinity for availability. In cases when this "sticky availability" [32] is acceptable (e.g., each datacenter contains a set of application servers that issue the RAMP protocols against another datacenter-local set of storage servers), this may be a reasonable compromise.

5.6.2 Quorum-Replicated RAMP Operation

While RAMP Prepare-F and Sticky HA are best suited for multi-datacenter deployments, in quorum-replicated systems such as Dynamo and Cassandra [95, 153], there are several optimizations that can be used to further improve availability, even within a single datacenter.

Our key observation here is that, to guarantee a maximum of two round trips for reads, only PREPARE and second-round GET requests need to intersect on a given set of replicas. Recall that second-round GET requests are issued in order to "repair" any fractured reads from the first round of read results. In the event of these fractured reads, a reader must have access to versions corresponding to fractured reads that have been prepared but were not committed at the time of the first-round read. However, assembling the first round of committed versions can run under partial (i.e., non-intersecting) quorum operation [172] with respect to commit messages.

This means that COMMIT and first-round GET operations can proceed on effectively any server in a set of replicas, enabling two key optimizations. In these optimizations, we assume that readers issue second-round read requests and writers issue PREPARE operations using a quorum system [180] of replicas (e.g., majority quorums).

First, first-round read requests can be served from any replica of a given item. Then, if a client detects a race (in RAMP-F or RAMP-H), it can issue the optional second round of requests to a quorum of servers. RAMP-S will always issue the second round of requests. This optimization improves the latency of the first round of reads and also enables speculative retry [94]. It also decreases the load on the servers and increases availability for first-round read operations.

Second, commit operations can be performed on any replica of a given item. Similar to the optimization proposed in Prepare-F RAMP, servers can propagate commit messages between themselves asynchronously, possibly piggybacking on anti-entropy messages as in systems like Cassandra and Dynamo. This optimization improves the latency of commit. However, because all servers must commit the transaction eventually, it does not necessarily decrease the load on servers.

To quantify the potential latency improvements achievable using these optimizations, we draw on latency distributions from a recent study of Dynamo-style operation [44]. According to latency data from a Dynamo-style quorum-replicated database running on spinning disks at LinkedIn, moving from waiting for two replicas of three to respond to waiting for one replica of three to respond decreased write latency from 21.0 ms to 11.0 ms at the 99.9th percentile, and read latency from 1.63 ms to 0.66 ms. For a similar database at Yammer, the gains for writes are 427 ms to 10.8 ms and the gains for reads are 32.6 ms to 5.6 ms, an even more impressive improvement. Over a wide-area network with latency of 75 ms, the gains are as much as 148 ms. Thus, in practice, these simple optimizations may prove worthwhile.

5.6.3 RAMP, Transitive Dependencies, and Causal Consistency

In Section 5.3.3, we discussed how RA isolation does not enforce transitive read-write dependencies across transactions. For example, if Ta read-depends on Tb (i.e., Ta reads a version that Tb created), another transaction Tc might read-depend on Ta (i.e., Tc reads a version that Ta created) but anti-depend on Tb (i.e., Tb overwrites a version that Ta read). In this section, we discuss why we made this design decision as well as alternatives for enforcing dependencies and their costs. The primary challenges in enforcing transitive dependencies come in limiting metadata while preserving availability and partition independence. In the extreme, if we limited ourselves to serial access to database state, we could easily preserve information about dependencies using a single scalar: any transactions would observe versions with lower scalar values, similar to classic serializable multi-version concurrency control. However, if we wish to preserve available and coordination-free operation (and therefore concurrent creation of versions), then we must admit a partial ordering of versions. To avoid fractured reads as in RA isolation while preserving dependency information, we either need to find a way to capture this partial order or otherwise limit the degree of availability in the system.

Full causality tracking. The former approach, tracking "cuts" in a system with partially ordered events, is well studied. As a first approximation, we can consider the problem of capturing RA with dependency tracking as an instance of capturing causality in a distributed system, with each event corresponding to a transaction commit and dependencies due to reads (i.e., a causal memory with atomically visible, multi-register reads). In line with this approximation, we could replace each timestamp in the RAMP algorithms with a suitable mechanism for tracking causality; for example, instead of storing a scalar timestamp, we could store a vector clock, with one entry per client in the system. Subsequently, clients could maintain a vector clock containing the highest-committed writes they had seen, and, upon reading from servers, ensure that the server commits any writes that happen-before the client's current vector. Thus, we can use vector clocks to track dependencies across transactions.

The problem with the above approach is in the size of the metadata required. Primarily, with N concurrent clients, each vector will require O(N) space, which is potentially prohibitive in practice. Moreover, the distributed systems literature strongly suggests that, with N concurrent clients, O(N) space is required to capture full causal lineage as above [75]. Thus, while using vector clocks to enforce transitive dependencies is a correct approach, it incurs considerable overheads that we do not wish to pay and that have yet to be proven viable at scale in practical settings [35].2

The latter approach, limiting availability, is also viable, at the cost of undercutting our scalability goals from Section 5.3.5.
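For concreteness, the minimal vector clock below shows where the O(N) cost comes from: one counter per writing client. The class is a generic textbook sketch, not a proposed RAMP extension.

class VectorClock:
    """Minimal vector clock (sketch): one entry per writing client, which is
    exactly the O(N) metadata cost discussed above. Entries are implicitly
    zero when absent, and explicit zeros are never stored."""
    def __init__(self, entries=None):
        self.entries = dict(entries or {})  # client_id -> counter

    def tick(self, client_id):
        self.entries[client_id] = self.entries.get(client_id, 0) + 1

    def merge(self, other):
        # Pointwise maximum: the least clock that dominates both inputs.
        for cid, count in other.entries.items():
            self.entries[cid] = max(self.entries.get(cid, 0), count)

    def happens_before(self, other):
        # self < other iff every entry is <= and the clocks differ.
        leq = all(count <= other.entries.get(cid, 0)
                  for cid, count in self.entries.items())
        return leq and self.entries != other.entries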

Bounding writer concurrency. One simple approach, as we hinted above, is to limit the concurrency of writing clients: we can bound the overhead of vector clocks to an arbitrarily small amount by limiting the amount of concurrency in the system. For example, if we allow five clients to perform writes at a given time, we only need a vector of size five. This requires coordination between writers (but not readers). As Section 5.5.2 demonstrated, RAMP transaction performance degrades gracefully under write contention; under the decreased concurrency strategy, performance would effectively hit a cliff. Latency would increase due to queuing delays and write contention, and, for a workload like YCSB with a fixed proportion of read to write operations, throughput would be limited. Specifically, for a workload with write proportion p (p = 0.05 in our default configuration), if W writers were permitted at a given time, the effective number of active YCSB clients in the system would become W/p. Despite these limits, this is perhaps the most viable solution we have encountered and, moreover, does not affect read performance under read-heavy workloads. However, this solution has considerable coordination overheads, and managing which servers are able to perform writes (e.g., using distributed leases) requires potentially heavyweight synchronization protocols.

2 Another alternative that uses additional metadata is the strawman from Section 5.3.5, in which clients send all of the writes in their transaction to all of the partitions responsible for at least one write in the transaction. This uses even more metadata than the vector-based approach.

Sacrificing partition independence. Another approach to improving availability is to sacrifice partition independence. As we discuss and evaluate in Sections 5.5.1 and 5.5.2, it is possible to preserve transaction dependencies by electing special coordinator servers as points of rendezvous for concurrently executing transactions. If extended to a non-partition-independent context, the RAMP protocols begin to more closely resemble traditional multi-version concurrency control solutions, in particular, Chan and Gray's read-only transactions [73]. More recently, the 2PC-PCI mechanism [168] we evaluated is an elegant means of achieving this behavior if partition independence is unimportant. Nevertheless, as our experimental evaluation shows, sacrificing this partition independence can be costly under some workloads.

Sacrificing causality. A final approach to limiting the overhead of dependency tracking is to limit the number of dependencies to track. Several prior systems have used limited forms of causality, for example, application-supplied dependency information [39, 151], as a basis for dependency tracking. In this strategy, applications inform the system about which versions should precede a given write; in [35], we show that, for many modern web applications, these histories can be rather small (often one item, with a power-law distribution over sizes). In this case, we could encode the causal history in its entirety along with each write or exploit otherwise latent information within the data (such as comment reply-to fields) to mine this data automatically. This strategy breaks the current RAMP API. However, it is the only known strategy for circumventing the O(N) upper bound on dependency tracking in causally consistent storage systems ensuring availability of both readers and writers.

Experiences with system operators. While causal consistency provides a number of useful guarantees, in practice, we perceive a lack of interest in maintaining full causal consistency; database operators and users are anecdotally often unwilling to pay the metadata and implementation costs of full causality tracking. As we have seen in Section 6.4.1, many of these real-world operators exhibit an aversion to synchronization at scale, so maintaining availability is paramount to either their software offerings or business operation. In fact, we have anecdotally found coordination-free execution and partition independence to be valuable selling points for the RAMP algorithms presented in this work. Instead, we have found that many users favor guarantees such as Read Your Writes (provided by the RAMP algorithms) over full dependency tracking, opting for variants of explicit causality (e.g., via foreign key constraints or explicit dependencies) or restricted, per-item causality tracking (e.g., version vectors [95]). Despite this mild pessimism, we view further reduction of causality overhead to be an interesting area for future work, including a more conclusive answer to the availability-metadata trade-off surfaced by [75].

5.7 RSIW Proof

To begin, we first show that there exists a well-defined total ordering of write-only transactions in a history that is valid under RA isolation. This will be useful in ordering write transactions in our one-copy equivalent execution.

Lemma 2 (Well-defined Total Order on Writes). Given a history H containing read-only and write-only transactions that is valid under RA isolation, DSG(H) does not contain any directed cycles consisting entirely of write-dependency edges.

Proof. H is valid under RA isolation and therefore does not exhibit phenomenon G1c. Thus, H does not contain any directed cycles consisting entirely of dependency edges. Therefore, H does not contain any directed cycles consisting entirely of write-dependency edges.

We will also need to place read-only transactions in our history. To do so, we show that, under RA isolation and the RSIW property (i.e., the preconditions of Theorem 2), each read-only transaction will only read from one write-only transaction.

Lemma 3 (Single Read Dependency). Given a history H containing read-only and write-only transactions that obeys the RSIW property and is valid under RA isolation, each node in DSG(H) contains at most one direct read-dependency edge.

Proof. Consider a history H containing read-only and write-only transactions that has RSIW and is valid under RA isolation. Write-only transactions have no reads, so they have no read-dependency edges in DSG(H). However, suppose DSG(H) contained a node corresponding to a read-only transaction containing more than one direct read-dependency edge; call this read-only transaction Tr. For two read-dependency edges to exist, Tr must have read versions produced by at least two different write-only transactions; pick any two and call them Ti and Tj, corresponding to read versions xa and yd.

If x and y are the same item, then a < d or d < a. In either case, Tr exhibits the fractured reads phenomenon and H is not valid under RA isolation, a contradiction.

Therefore, x and y must be distinct items. Because H obeys the RSIW property, Tr must also obey the RSIW property. By the definition of the RSIW property, Tr must have only read items written to by Ti and items also written to by Tj; this implies that Ti and Tj each wrote to items x and y. We can label Ti's write to y as yb and Tj's write to x as xc. Per Lemma 2, Ti's writes to x and y must either both come before or both follow Tj's corresponding writes to x and y in the version order for each of x and y; that is, either both a < c and b < d or both c < a and d < b.

If a < c and b < d, then Tr exhibits the fractured reads phenomenon: Tr read xa and yd, but Tj, which wrote yd, also wrote xc, and a < c. If c < a and d < b, then Tr again exhibits the fractured reads phenomenon: Tr read xa and yd, but Ti, which wrote xa, also wrote yb, and d < b. In either case, H is not valid under RA isolation, a contradiction. Therefore, each node in DSG(H) must not contain more than one read-dependency edge.

We now use this ordering on reads and writes to construct a total ordering on transac- tions in a history:

Procedure 1 (Transform). Given a history H containing read-only and write-only transac- tions that has RSIW and is valid under RA isolation, construct a total ordering O of the transactions in H as follows:

1. Perform a topological sorting in O of committed write-only transactions in H ordered by the write-dependency edges in DSG(H). That is, for each pair of write-only transactions (T1, T2) in H that performed at least one write to the same item, place the transaction that wrote the higher-versioned item later in O. Lemma 2 ensures such a total ordering exists.

2. For each committed read-only transaction Tr in H, place Tr in O after the write-only transaction Tw whose writes Tr read (i.e., after the write-only transaction that Tr directly read-depends on) but before the next write-only transaction Tw′ in O (or, if no such transaction exists, at the end of O). By Lemma 3, each committed read-only transaction read-depends on only one write-only transaction, so this ordering is similarly well defined.

Return O.

As an example, consider the following history:

T1  w(x1); w(y1)          (5.9)
T2  w(x2); w(y2)
T3  r(x1); r(y1)
T4  r(y2)

History 5.9 obeys the RSIW property and is also valid under RA isolation. Applying procedure TRANSFORM to History 5.9, in the first step, we first order our write-only transactions T1 and T2. Both T1 and T2 write to x and y, but T2's writes have a later version number than T1's, so, according to Step 1 of TRANSFORM, we have O = T1; T2. Now, in Step 2 of TRANSFORM, we consider the read-only transactions T3 and T4. We place T3 after the transaction that it read from (T1) and before the next write transaction in the sequence (T2), yielding O = T1; T3; T2. We place T4 after the transaction that it read from (T2) and, because there is no later write-only transaction following T2 in O, place T4 at the end of O, yielding O = T1; T3; T2; T4. In this case, we observe that, as Theorem 2 suggests, it is possible to TRANSFORM an RSIW and RA history into a one-copy serial equivalent and that O is in fact a one-copy serializable execution.
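The walkthrough above can be reproduced mechanically. The sketch below is an illustrative Python rendering of TRANSFORM under the simplifying assumption (true of History 5.9) that version numbers come from a single global counter, so sorting write-only transactions by any written version number yields a valid topological order; a general implementation would instead topologically sort the write-dependency edges of DSG(H).

def transform(writes, reads):
    """Procedure TRANSFORM (sketch).

    `writes` maps each write-only transaction id to {item: version number};
    `reads` maps each read-only transaction id to {item: version number read}.
    Assumes the input history has RSIW and is valid under RA, so the orders
    used below are well defined. Read-only transactions that read initial
    database state are omitted for brevity."""
    # Step 1: order write-only transactions (see the simplifying assumption
    # in the lead-in; sorting by the minimum written version suffices here).
    write_order = sorted(writes, key=lambda t: min(writes[t].values()))

    # Step 2: place each read-only transaction immediately after the single
    # write-only transaction it read from (Lemma 3).
    order = []
    for w in write_order:
        order.append(w)
        for r, observed in reads.items():
            # r read from w iff every version it observed is one of w's writes.
            if all(writes[w].get(item) == ver for item, ver in observed.items()):
                order.append(r)
    return order

# History 5.9: O = T1; T3; T2; T4
writes = {"T1": {"x": 1, "y": 1}, "T2": {"x": 2, "y": 2}}
reads = {"T3": {"x": 1, "y": 1}, "T4": {"y": 2}}
assert transform(writes, reads) == ["T1", "T3", "T2", "T4"]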

Now we can prove Theorem 2. We demonstrate that executing the transactions of H in the order resulting from applying TRANSFORM to H on a single-copy database yields an equivalent history (i.e., read values and final database state) as H. Because O is a total order, H must be one-copy serializable.

Theorem 2. Consider history H containing read-only and write-only transactions that has RSIW and is valid under RA isolation. We begin by applying TRANSFORM to H to produce an ordering O.

We create a new history Ho by considering an empty one-copy database and examining each transaction Th in O in serial order as follows: if Th is a write-only transaction, execute a new transaction Tow that writes each version produced by Th to the one-copy database and commits. If Th is a read-only transaction, execute a new transaction Tor that reads from the same items as Th and commits. Because Ho is the result of a serial execution of transactions over a single logical copy of the database, Ho is one-copy serializable.

We now show that H and Ho are equivalent. First, all committed transactions and operations in H also appear in Ho because TRANSFORM operates on all transactions in H and all transactions and their operations (with single-copy operations substituted for multi-version operations) appear in the total order O used to produce Ho. Second, DSG(H) and DSG(Ho) have the same direct read dependencies. This is a straightforward consequence of Step 2 of TRANSFORM: in O, each read-only transaction Tr appears immediately following the write-only transaction Tw upon which Tr read-depends. When the corresponding read transaction is executed against the single-copy database in Ho, the serially preceding write-only transaction will produce the same values that the read transaction read in H. Therefore, H and Ho are equivalent.

Because Ho is one-copy serializable and Ho is equivalent to H, H must also be one-copy serializable.

We have opted for the above proof technique because we believe the TRANSFORM procedure provides clarity into how executions satisfying both RA isolation and the RSIW property relate to their serializable counterparts. An alternative and elegant proof approach leverages the work on multi-version serializability theory [53], which we sketch here. Given a history H that exhibits RA isolation and has RSIW, we show that MVSG(H) is acyclic. By an argument resembling Lemma 3, the in-degree for read-only transactions in SG(H) (i.e., Adya's direct read dependencies) is one. By an argument resembling Lemma 2, the edges between write-only transactions in MVSG(H) produced by the first condition of the MVSG construction (xi ≪ xj in the definition of the MVSG [53, p. 152]; i.e., Adya's write dependencies) are acyclic. Therefore, any cycles in the MVSG must include at least one of the second kind of edges in the MVSG(H) (xj ≪ xi; i.e., Adya's direct anti-dependencies). But, for such a cycle to exist, a read-only transaction Tr must anti-depend on a write-only transaction Twi that in turn write-depends on another write-only transaction Twj upon which Tr read-depends. Under the RSIW property, Twi and Twj will have written to at least one of the same items, and the presence of a write-dependency cycle will indicate a fractured reads anomaly in Tr.

5.8 RAMP Correctness and Independence

RAMP-F Correctness. To prove RAMP-F provides RA isolation, we show that the two-round read protocol returns a transactionally atomic set of versions. To do so, we formalize criteria for atomic (read) sets of versions in the form of companion sets. We will call the set of versions produced by a transaction sibling versions and say that x is a sibling item to a write yj if there exists a version xk that was written in the same transaction as yj.

Given two versions xi and yj, we say that xi is a companion version to yj if xi is a transactional sibling of yj or if x is a sibling item of yj and i > j. We say that a set of versions V is a companion set if, for every pair (xi, yj) of versions in V where x is a sibling item of yj, xi is a companion to yj. In Figure 5.2, the versions returned by T2's first round of reads ({x1, y⊥}) do not comprise a companion set because y⊥ has a lower timestamp than x1's sibling version of y (that is, x1 has sibling version y1, but ⊥ < 1, so y⊥ has too low of a timestamp). Subsets of companion sets are also companion sets, and companion sets also have a useful property for RA isolation:

Claim 1 (Companion sets are atomic). In the absence of G1c phenomena, companion sets do not contain fractured reads.

Claim 1 follows from the definitions of companion sets and fractured reads.

Proof. If V is a companion set, then every version xi ∈ V is a companion to every other version yj ∈ V where yj contains x in its sibling items. If V contained fractured reads, V would contain two versions xi, yj such that the transaction that wrote yj also wrote a version xk, i < k. However, in this case, xi would not be a companion to yj, a contradiction. Therefore, V cannot contain fractured reads.
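The companion-set condition is easy to check mechanically, which may help in reading the proofs that follow; in the sketch below, each version is assumed to carry its transaction's write set as an {item: timestamp} map (the shape of RAMP-F metadata), and the function reports whether a set containing one version per item is fracture-free.

def is_companion_set(versions):
    """Check the companion-set condition (sketch).

    `versions` maps item -> (timestamp, siblings), where `siblings` is the
    {item: timestamp} write set of the transaction that wrote that version."""
    for item_x, (ts_x, _) in versions.items():
        for item_y, (ts_y, sib_y) in versions.items():
            if item_x == item_y:
                continue
            # If y's writer also wrote item_x, the version of item_x we hold
            # must be at least as new as that sibling write.
            sibling_ts = sib_y.get(item_x)
            if sibling_ts is not None and ts_x < sibling_ts:
                return False  # fractured read: a newer sibling was missed
    return True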

To provide RA, RAMP-F clients assemble a companion set for the requested items (in vlatest), which we prove below:

Claim 2. RAMP-F provides Read Atomic isolation.

Proof. Each write in RAMP-F contains information regarding its siblings, which can be identified by item and timestamp. Given a set of RAMP-F versions, recording the highest timestamped version of each item (as recorded either in the version itself or via sibling metadata) yields a companion set of item-timestamp pairs: if a client reads two versions xi and yj such that x is in yj's sibling items but i < j, then vlatest[x] will contain j and not i. Accordingly, given the versions returned by the first round of RAMP-F reads, clients calculate a companion set containing versions of the requested items. Given this companion set, clients check the first-round versions against this set by timestamp and issue a second round of reads to fetch any companions that were not returned in the first round. The resulting set of versions will be a subset of the computed companion set and will therefore also be a companion set. This ensures that the returned results do not contain fractured reads. RAMP-F first-round reads access lastCommit, so each transaction corresponding to a first-round version is committed, and, therefore, any siblings requested in the (optional) second round of reads are also committed. Accordingly, RAMP-F never reads aborted or non-final (intermediate) writes. Moreover, RAMP-F timestamps are assigned on a per-transaction basis, preventing write-dependency cycles and therefore G1c. This establishes that RAMP-F provides RA.

RAMP-F Scalability and Independence. RAMP-F also provides the independence guarantees from Section 5.3.5. The following invariant over lastCommit is core to RAMP-F GET request completion:

Invariant 1 (Companions present). If a version xi is referenced by lastCommit (that is, lastCommit[x] = i), then each of xi’s sibling versions are present in versions on their respective partitions.

Invariant1 is maintained by RAMP-F’s two-phase write protocol. lastCommit is only updated once a transaction’s writes have been placed into versions by a first round of PREPARE messages. Siblings will be present in versions (but not necessarily lastCommit).

Claim 3. RAMP-F is coordination-free.

Recall from Section 5.3.5 that coordination-free execution ensures that one client’s transactions cannot cause another client’s to block and that, if a client can contact the partition responsible for each item in its transaction, the transaction will eventually commit (or abort of its own volition).

Proof. Clients in RAMP-F do not communicate or coordinate with one another and only contact servers. Accordingly, to show that RAMP-F provides coordination-free execution, it suffices to show that server-side operations always terminate. PREPARE and COMMIT methods only access data stored on the local partition and do not block due to external coordination or other method invocations; therefore, they complete. GET requests issued in the first round of reads have tsreq = ⊥ and therefore will return the version corresponding to lastCommit[k], which was placed into versions in a previously completed PREPARE round. GET requests issued in the second round of client reads have tsreq set to the client's calculated vlatest[k]. vlatest[k] is a sibling of a version returned from lastCommit in the first round, so, due to Invariant 1, the requested version will be present in versions. Therefore, GET invocations are guaranteed access to their requested version and can return without waiting. The success of RAMP-F operations does not depend on the success or failure of other clients' RAMP-F operations.

Claim 4. RAMP-F provides partition independence.

Proof. RAMP-F transactions do not access partitions that are unrelated to each transaction’s specified data items and servers do not contact other servers in order to provide a safe response for operations.

RAMP-S Correctness. RAMP-S writes and first-round reads proceed identically to those of RAMP-F, but the metadata written and returned is different. Therefore, the proof is similar to that of RAMP-F, with a slight modification for the second round of reads.

Claim 5. RAMP-S provides Read Atomic isolation.

Proof. To show that RAMP-S provides RA, it suffices to show that RAMP-S second-round reads (resp) are a companion set. Given two versions xi, yj ∈ resp such that x ≠ y, if x is a sibling item of yj, then xi must be a companion to yj. If xi were not a companion to yj, then it would imply that x is not a sibling item of yj (so we are done) or that j > i. If j > i, then, due to Invariant 1 (which also holds for RAMP-S writes due to identical write protocols), yj's sibling version of x is present in versions on the partition for x and would have been returned by the server (line 6), a contradiction. Each second-round GET request returns only one version, so the final set of reads does not exhibit fractured reads.

RAMP-S Scalability and Independence. RAMP-S ensures coordination-free execution and partition independence. The proofs closely resemble those of RAMP-F: Invariant 1 ensures that incomplete commits do not stall readers, and all server-side operations are guaranteed to complete.

RAMP-H Correctness. The probabilistic behavior of the RAMP-H Bloom filter admits false positives. However, given unique transaction timestamps (Section 5.4.4), requesting false siblings by timestamp and item does not affect correctness:

Claim 6. RAMP-H provides Read Atomic isolation.

Proof. To show that RAMP-H provides Read Atomic isolation, it suffices to show that any versions requested by RAMP-H second-round reads that would not have been requested by RAMP-F second-round reads (call this set vfalse) do not compromise the validity of RAMP-H's returned companion set. Any versions in vfalse do not exist: timestamps are unique, so, for each version xi, there are no versions yj of non-sibling items y with the same timestamp as xi (i.e., where j = i). Therefore, requesting versions in vfalse does not change the set of results collected in the second round.

RAMP-H Scalability and Independence. RAMP-H provides coordination-free execution and partition independence. We omit full proofs, which closely resemble those of RAMP-F. The only significant difference from RAMP-F is that second-round GET requests may return ⊥, but, as we showed above, these empty responses correspond to false positives in the Bloom filter and therefore do not affect correctness.
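To make the false-positive argument concrete, the following toy sketch (a hand-rolled Bloom filter in Python; the structure and parameters are illustrative assumptions, not the RAMP-H implementation) shows why a spurious membership answer only triggers an extra, harmless second-round request:

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter over sibling item names (illustrative only)."""
    def __init__(self, m: int = 64, k: int = 3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, key: str):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: str):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        return all(self.bits & (1 << pos) for pos in self._positions(key))

# A transaction with timestamp 42 writes {x, y} and stores a filter of its siblings.
siblings = BloomFilter()
for item in ("x", "y"):
    siblings.add(item)

versions = {("x", 42), ("y", 42)}          # versions actually installed on the partitions

# A reader probes the filter for every item it cares about; a false positive for "z"
# triggers a second-round GET(z, ts=42), which returns empty because timestamps are
# unique per transaction, so no version z_42 can exist.
for item in ("x", "y", "z"):
    if siblings.might_contain(item):
        found = (item, 42) in versions
        assert found or item == "z"        # spurious requests return nothing, harming nothing
```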

5.9 Discussion

Given our experiences designing and evaluating the RAMP transaction protocols, we believe there are a number of interesting extensions that merit further examination.

First, RAMP metadata effectively encodes a limited form of transaction intent: readers and servers are only able to repair fractured reads because the metadata encodes the remainder of the work required to complete the transaction. We believe it would be interesting to generalize this concept to arbitrary program logic: for example, in a model such as lazy transactions [106] or eventual serializability [109], with transactions expressed as stored procedures, multiple, otherwise conflicting/coordinating clients could instead cooperate in order to complete one another's transactions in the event of a failure—without resorting to the use of a centralized master (e.g., for pre-scheduling or validating transaction execution). This programming model is largely incompatible with traditional interactive transaction execution but is nevertheless exciting to consider as an extension of these protocols.

Second, and more concretely, we see several opportunities to extend RAMP to more specialized use cases. The RAMP protocol family is currently not well-suited to large scans and, as we have discussed, does not enforce transitive dependencies across transactions. We view restricting the concurrency of writers (but not readers) to be a useful step forward in this area, with predictable impact on writer performance. This strikes a middle ground between traditional MVCC and the current RAMP protocols.

Finally, as we noted in Section 5.3.4, efficient transaction processing often focuses on weakening semantics (e.g., weak isolation) or changing the programming model (e.g., stored procedures as above). As our investigation of the RSIW property demonstrates, there may exist compelling combinations of the two that yield more intuitive, high-performance, or scalable results than examining semantics or programming models in isolation. Addressing this question is especially salient for the many users of weak isolation models in practice today [32], as it can help understand when applications require stronger semantics and when, in fact, weak isolation is not simply fast but is also “correct.”

5.10 Summary

This chapter described how to achieve atomically visible multi-partition transactions without incurring the performance and availability penalties of traditional algorithms. We first identified a new isolation level—Read Atomic isolation—that provides atomic visibility and matches the requirements of a large class of real-world applications. We subsequently achieved RA isolation via scalable, contention-agnostic RAMP transactions. In contrast with techniques that use inconsistent but fast updates, RAMP transactions provide correct semantics for applications requiring secondary indexing, foreign key constraints, and materialized view maintenance while maintaining scalability and performance. By leveraging multi-versioning with a variable but small (and, in two of three algorithms, constant) amount of metadata per write, RAMP transactions allow clients to detect and assemble atomic sets of versions in one to two rounds of communication with servers (depending on the RAMP implementation). The choice of coordination-free and partition independent algorithms allowed us to achieve near-baseline performance across a variety of workload configurations and scale linearly to 100 servers. While RAMP transactions are not appropriate for all applications, the many applications for which they are appropriate will benefit significantly.

Chapter 6

Coordination Avoidance for Database Constraints

Thus far, we have considered semantics expressed in terms of isolation models and application programming patterns such as indexing. Both of these specifications were relatively low-level. The first relied on admissible read-write interleavings of transactions. The latter relied on a bespoke isolation level that we introduced to exactly capture the semantics of several existing scenarios that arise in databases and applications. In this chapter, we raise the level of abstraction further and consider the use of constraints, or arbitrary user-defined predicates over database state. We focus on constraints found in contemporary SQL and Ruby on Rails applications.

As we discuss, these constraints are found in many database-backed applications today. The constraints are not necessarily a full specification of program correctness, yet they are frequently found in modern applications (and are typically enforced by coordination). As candidates for study, we draw upon popular constraints found in languages like SQL as well as a range of constraints found in open source applications, which we subsequently analyze for invariant confluence. As before, we find that many are invariant confluent and provide coordination-avoiding implementations of each.

6.1 Invariant Confluence of SQL Constraints

We begin our study of practical constraints by considering several features of SQL, ending with abstract data types. We will apply these results to full applications in Section 6.3. In this section, we focus on providing intuition and informal explanations of our invariant confluence analysis. Interested readers can find a more formal analysis in Section 6.2, including discussion of invariants not presented here.

Invariant               Operation                    Invariant Confluent?   Proof #
Attribute Equality      Any                          Yes                    1
Attribute Inequality    Any                          Yes                    2
Uniqueness              Choose specific value        No                     3
Uniqueness              Choose some value            Yes                    4
AUTO_INCREMENT          Insert                       No                     5
Foreign Key             Insert                       Yes                    6
Foreign Key             Delete                       No                     7
Foreign Key             Cascading Delete             Yes                    8
Secondary Indexing      Update                       Yes                    9
Materialized Views      Update                       Yes                    10
>                       Increment [Counter]          Yes                    11
<                       Increment [Counter]          No                     12
>                       Decrement [Counter]          No                     13
<                       Decrement [Counter]          Yes                    14
[NOT] CONTAINS          Any [Set, List, Map]         Yes                    15, 16
SIZE=                   Mutation [Set, List, Map]    No                     17

Table 6.1: Example SQL (top) and ADT invariant confluence along with references to formal proofs in Section 6.2.

For convenience, we reference specific proofs from Section 6.2 inline.

6.1.1 Invariant Confluence for SQL Relations

We begin by considering several constraints found in SQL.

Equality. As a warm-up, what if an application wants to prevent a particular value from appearing in a database? For example, our payroll application from Section 6.4.1 might require that every user have a last name, marking the LNAME column with a NOT NULL constraint. While not particularly exciting, we can apply invariant confluence analysis to insertions and updates of databases with (in-)equality constraints (Claims 7, 8 in Section 6.2). Per-record inequality invariants are invariant confluent, which we can show by contradiction: assume two database states S1 and S2 are each I-T-reachable under per-record inequality invariant Ie but that Ie(S1 ⊔ S2) is false. Then there must be an r ∈ S1 ⊔ S2 that violates Ie (i.e., r has the forbidden value). r must appear in S1, S2, or both. But that would imply that one of S1 or S2 is not I-valid under Ie, a contradiction.

Uniqueness. We can also consider common uniqueness invariants (e.g., PRIMARY KEY and UNIQUE constraints). For example, in our payroll example, we wanted user IDs to be unique. In fact, our earlier discussion in Section 6.4.1 already provided a counterexample showing that arbitrary insertion of users is not invariant confluent under these invariants: {Stan:5} and {Mary:5} are both I-T-reachable states (Section 3.1) that can be created by a sequence of insertions (starting at S0 = {}), but their merge—{Stan:5, Mary:5}—is not I-valid. Therefore, uniqueness is not invariant confluent for inserts of unique values (Claim 9). However, reads and deletions are both invariant confluent under uniqueness invariants: reading and removing items cannot introduce duplicates. Can the database safely choose unique values on behalf of users (e.g., assign a new user an ID)? In this case, we can achieve uniqueness without coordination—as long as we have a notion of replica membership (e.g., server or replica IDs). The difference is subtle (“grant this record this specific, unique ID” versus “grant this record some unique ID”), but, in a system model with membership (as is practical in many contexts), it is powerful. If replicas assign unique IDs within their respective portion of the ID namespace, then merging locally valid states will also be globally valid (Claim 10).
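One concrete way to realize “grant this record some unique ID” without coordination is to carve the ID namespace up by replica, as the text suggests. The sketch below (Python; the modulo encoding of a replica ID plus a local sequence number is an illustrative choice, not any particular system's scheme) shows the idea:

```python
# Minimal sketch of coordination-free unique ID assignment: each replica owns the
# IDs congruent to its replica ID modulo the number of replicas, so locally chosen
# IDs can never collide across replicas and merged states remain I-valid.
class ReplicaIdGenerator:
    def __init__(self, replica_id: int, num_replicas: int):
        self.replica_id = replica_id
        self.num_replicas = num_replicas
        self.seq = 0                       # local sequence number

    def next_id(self) -> int:
        nid = self.seq * self.num_replicas + self.replica_id
        self.seq += 1
        return nid

r0, r1 = ReplicaIdGenerator(0, 2), ReplicaIdGenerator(1, 2)
ids = {r0.next_id(), r0.next_id(), r1.next_id(), r1.next_id()}
assert len(ids) == 4                       # {0, 2, 1, 3}: unique without any coordination
```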

Foreign Keys. We can consider more complex invariants, such as foreign key constraints. In our payroll example, each employee belongs to a department, so the application could specify a constraint via a schema declaration to capture this relationship (e.g., EMP.D_ID FOREIGN KEY REFERENCES DEPT.ID). Are foreign key constraints maintainable without coordination? Again, the answer depends on the actions of transactions modifying the data governed by the invariant. Insertions under foreign key constraints are invariant confluent (Claim 12). To show this, we again attempt to find two I-T-reachable states that, when merged, result in an invalid state. Under foreign key constraints, an invalid state will contain a record with a “dangling pointer”—a record missing a corresponding record on the opposite side of the association. If we assume there exists some invalid state S1 ⊔ S2 containing a record r with an invalid foreign key to record f, but S1 and S2 are both valid, then r must appear in S1, S2, or both. But, since S1 and S2 are both valid, r must have a corresponding foreign key record (f) that “disappeared” during merge. Merge (in the current model) does not remove versions, so this is impossible. From the perspective of invariant confluence analysis, foreign key constraints concern the visibility of related updates: if individual database states maintain referential integrity, a non-destructive merge function such as set union cannot cause tuples to “disappear” and compromise the constraint. This also explains why models such as read committed [9] and read atomic [9] isolation as well as causal consistency [32] are also achievable without coordination: simply restricting the visibility of updates in a given transaction's read set does not require coordination between concurrent operations.
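The set-union argument for insertions can be checked mechanically. The following small sketch (illustrative Python; the representation of a record as an (id, reference) pair is an assumption, not taken from the text) merges two valid branches and confirms that referential integrity survives the merge:

```python
# States are sets of records; merge is set union; a record is a pair
# (record_id, reference), where reference is None or the id of the referenced record.
def fk_valid(db: set) -> bool:
    ids = {rid for (rid, _ref) in db}
    return all(ref is None or ref in ids for (_rid, ref) in db)

ancestor = {("dept:eng", None)}
s1 = ancestor | {("emp:alice", "dept:eng")}   # one branch inserts an employee in eng
s2 = ancestor | {("dept:ops", None)}          # the other inserts an unrelated department

# Each branch is valid, and set union cannot remove the referenced department,
# so the merged state is valid as well.
assert fk_valid(s1) and fk_valid(s2) and fk_valid(s1 | s2)
```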

Deletions and modifications under foreign key constraints are more challenging. Arbitrary deletion of records is unsafe: a user might be added to a department that was concurrently deleted (Claim 13). However, performing cascading deletions (e.g., SQL DELETE CASCADE), where the deletion of a record also deletes all matching records on the opposite end of the association, is invariant confluent under foreign key constraints (Claim 14). We can generalize this discussion to updates (and cascading updates).

Materialized Views. Applications often pre-compute results to speed query performance via a materialized view [215] (e.g., UNREAD_CNT as SELECT COUNT(*) FROM emails WHERE read_date = NULL). We can consider a class of invariants that specify that materialized views reflect primary data; when a transaction (or merge invocation) modifies data, any relevant materialized views should be updated as well. This requires installing updates at the same time as the changes to the primary data are installed (a problem related to maintaining foreign key constraints). However, given that a view only reflects primary data, there are no “conflicts.” Thus, materialized view maintenance updates are invariant confluent (Claim 16).

6.1.2 Invariant Confluence for SQL Data Types

So far, we have considered databases that store growing sets of immutable versions. We have used this model to analyze several useful constraints, but, in practice, databases do not (often) provide these semantics, leading to a variety of interesting anomalies. For example, if we implement a user’s account balance using a “last writer wins” merge policy [205], then performing two concurrent withdrawal transactions might result in a database state reflecting only one transaction (a classic example of the Lost Update anomaly) [9, 32]. To avoid variants of these anomalies, many optimistic, coordination-free database designs have proposed the use of abstract data types (ADTs), providing merge functions for a variety of uses such as counters, sets, and maps [81, 170, 205, 226] that ensure that all updates are reflected in final database state. For example, a database can represent a simple counter ADT by recording the number of times each transaction performs an increment operation on the counter [205]. Invariant confluence analysis is also applicable to these ADTs and their associated invari- ants. For example, a row-level “greater-than” (>) threshold invariant is invariant confluent for counter increment and assign (←) but not decrement (Claims 11, 13), while a row- level “less-than” (<) threshold invariant is invariant confluent for counter decrement and assign but not increment (Claims 12, 14). This means that, in our payroll example, we can provide coordination-free support for concurrent salary increments but not concurrent salary decrements. ADTs (including lists, sets, and maps) can be combined with standard re- 6.1. INVARIANTCONFLUENCEOFSQLCONSTRAINTS 121

lational constraints like materialized view maintenance (e.g., the “total salary” row should contain the sum of employee salaries in the employee table). This analysis presumes user program explicitly use ADTs, and, as with our generic set-union merge, invariant conflu- ence ADT analysis requires a specification of the ADT merge behavior (Section 6.2 provides several examples).
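The counter analysis above can be illustrated with a small executable check. In the sketch below (Python; the op-set representation of the counter and the particular thresholds are illustrative assumptions), the counter records its increment invocations, merge is set union, and the two threshold invariants behave exactly as described: the greater-than threshold survives the merge while the less-than threshold does not.

```python
# A counter is a set of (operation, invocation-id) pairs; merge is set union,
# mirroring the ADT model used in this chapter's analysis.
def value(state: set) -> int:
    return sum(1 for op, _ in state if op == "inc") - sum(1 for op, _ in state if op == "dec")

base = {("inc", "t0")}                   # common ancestor: value 1
s1 = base | {("inc", "t1")}              # one branch increments -> value 2
s2 = base | {("inc", "t2")}              # the other branch increments -> value 2
merged = s1 | s2                         # merged value 3

greater_than_zero = lambda s: value(s) > 0     # invariant confluent under increments
less_than_three   = lambda s: value(s) < 3     # not invariant confluent under increments

assert greater_than_zero(s1) and greater_than_zero(s2) and greater_than_zero(merged)
assert less_than_three(s1) and less_than_three(s2) and not less_than_three(merged)
```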

6.1.3 SQL Discussion and Limitations

We have analyzed a number of combinations of invariants and operations (shown in Table 6.1). These results are by no means comprehensive, but they are expressive for many applications (Section 6.3). In this section, we discuss lessons from this classification process.

Analysis mechanisms. We have manually analyzed particular combinations of invariants and operations, demonstrating each to be invariant confluent or not. To study actual SQL-based applications, we can apply these labels via simple static analysis. Specifically, given invariants (e.g., captured via SQL DDL) and transactions (e.g., expressed as stored procedures), we can examine each invariant and each operation within each transaction and identify pairs that we have labeled as invariant confluent or non-invariant confluent. Any pairs labeled as invariant confluent can be marked as safe, while, for soundness (but not completeness), any unrecognized operations or invariants can be flagged as potentially non-invariant confluent. Despite its simplicity (both conceptually and in terms of implementation), this technique—coupled with the results of Table 6.1—is sufficiently powerful to automatically characterize the invariant confluence of the applications we consider in Section 6.3 when expressed in SQL (with support for multi-row aggregates like Invariant 8 in Table 6.2). By growing our recognized list of invariant confluent pairs on an as-needed basis (via manual analysis of the pair), the above technique has proven useful—due in large part to the common re-use of invariants like foreign key constraints. However, one could use more complex forms of program analysis. For example, one might analyze the invariant confluence of arbitrary invariants, leaving the task of proving or disproving invariant confluence to an automated model checker or SMT solver. While invariant confluence—like monotonicity and commutativity (Chapter 7)—is undecidable for arbitrary programs, others have recently shown this alternative approach (e.g., in commutativity analysis [80, 160] and in invariant generation for view serializable transactions [198]) to be fruitful for restricted languages. We view language design and more automated analysis as an interesting area for future work (Section 8.3).
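A minimal sketch of the pair-labeling analysis described above might look as follows (Python; the table contents are a hypothetical excerpt of Table 6.1, and the invariant and operation names are illustrative, not an actual DDL or stored-procedure parser):

```python
# Known (invariant, operation) pairs and whether they are invariant confluent,
# derived from a subset of Table 6.1. Anything not listed is conservatively treated
# as potentially non-invariant confluent, which is sound but not complete.
IC_TABLE = {
    ("foreign_key", "insert"): True,
    ("foreign_key", "delete"): False,
    ("uniqueness", "insert_chosen_value"): False,
    ("uniqueness", "insert_some_value"): True,
    ("materialized_view", "update"): True,
}

def requires_coordination(invariants, transaction_ops) -> bool:
    """Flag a transaction if any (invariant, operation) pair is not known to be safe."""
    return any(not IC_TABLE.get((inv, op), False)
               for inv in invariants for op in transaction_ops)

# A transaction that only inserts under foreign key constraints needs no coordination;
# one that also inserts caller-chosen unique IDs must coordinate (or be rewritten).
assert not requires_coordination(["foreign_key"], ["insert"])
assert requires_coordination(["foreign_key", "uniqueness"], ["insert", "insert_chosen_value"])
```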

Recency and session support. Our proposed invariants are declarative, but a class of useful semantics—recency, or real-time guarantees on reads and writes—describe properties of operations (i.e., they pertain to transaction execution rather than the state(s) of the database). For example, users often wish to read data that is up-to-date as of a given point in time (e.g., “read latest” [83] or linearizable [118] semantics). While traditional isolation models do not directly address these recency guarantees [9], they are often important to programmers. Are these models invariant confluent? We can attempt to simulate recency guarantees in invariant confluence analysis by logging the result of all reads and any writes with a timestamp and requiring that all logged timestamps respect their recency guarantees (thus treating recency guarantees as invariants over recorded read/write execution traces). However, this is a somewhat pointless exercise: it is well known (and we have already discussed) that recency guarantees are unachievable with transactional availability [32, 92, 118]. Thus, if application reads face these requirements, coordination is required. Indeed, when application “consistency” means “recency,” systems cannot circumvent speed-of-light delays. If users wish to “read their writes” or desire stronger “session” guarantees [189] (e.g., maintaining recency on a per-user or per-session basis), they must maintain affinity or “stickiness” [32] with a given (set of) replicas. These guarantees are also expressible in the invariant confluence model and do not require coordination between different users' or sessions' transactions.

6.2 More Formal Invariant Confluence Analysis of SQL Constraints

In this section, we more formally demonstrate the invariant confluence of invariants and operations discussed in Section 6.1. Our goals in this section are two-fold. First, we have found the experience of formally proving invariant confluence to be instructive in understanding these combinations (beyond the less formal arguments made in the body text for brevity and intuition). Second, we have found that invariant confluence proofs typically take one of two forms:

• To show a set of transactions are not invariant confluent with respect to an invariant I, we use proof by counterexample: we present two I-T-reachable states with a common ancestor that, when merged, are not I-valid.

• To show a set of transactions are invariant confluent with respect to an invariant I, we use proof by contradiction: we show that, if a state S is not I-valid, merging two I-T-reachable states S1 and S2 with a common ancestor state to produce S implies that one or both of S1 or S2 must not be I-valid.

These results are not exhaustive, and there are infinitely many combinations of invariants and operations to consider. Rather, the seventeen examples below serve as a demonstration of what can be accomplished via invariant confluence analysis. Notably, the negative results below use fairly simple histories consisting of a single transaction divergence. Nevertheless, we decided to preserve the more general formulation of invariant confluence (accounting for arbitrary I-T-reachable states) to account for more pathological (perhaps less realistic, or, if these results are any indication, less commonly encountered) behaviors that only arise during more complex divergence patterns.

We introduce additional formalism as necessary. To start, unless otherwise specified, we use the set union merge operator, writing S1 ⊔ S2 for the merge of two states S1 and S2. We denote version i of item x as xi and a write of version xi with value v as w(xi = v). For now, we assume each version has an integer value.

Claim 7 (Writes are invariant confluent with respect to per-item equality constraints). Assume writes are not invariant confluent with respect to some per-item equality constraint i = c, where i is an item and c is a constant. By definition, there must exist two I-T-reachable states S1 and S2 with a common ancestor state such that I(S1) → true and I(S2) → true but I(S1 ⊔ S2) → false; therefore, S1 ⊔ S2 contains a version in of item i such that in ≠ c, and, under set union, in ∈ S1, in ∈ S2, or both. However, this would imply that I(S1) → false or I(S2) → false (or both), a contradiction.

Claim 8 (Writes are invariant confluent with respect to per-item inequality constraints). The proof follows almost identically to the proof of Claim 7, but for an invariant of the form i ≠ c.

Claim 9 (Writing arbitrary values is not invariant confluent with respect to multi-item uniqueness constraints). Consider the following transactions:

T1u := w(xa = v); commit

T2u := w(xb = v); commit

and the uniqueness constraint on records:

Iu(D) = {values of versions in D are unique}

Now, an empty database trivially does not violate uniqueness constraints (Iu({}) → true), and adding individual versions to the separate empty databases is also valid:

T1u({}) = {xa = v}, Iu({xa = v}) → true

T2u({}) = {xb = v}, Iu({xb = v}) → true

However, merging these states results in an invalid state:

Iu({xa = v} ⊔ {xb = v} = {xa = v, xb = v}) → false

Therefore, {T1u, T2u} is not invariant confluent under Iu.
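The counterexample above is simple enough to run. The following sketch (Python, with versions encoded as (name, value) pairs purely for illustration) replays it under the section's set-union merge model:

```python
# Executable version of the Claim 9 counterexample: states are sets of versions,
# merge is set union, and Iu requires that version values be unique.
def Iu(db: set) -> bool:
    values = [value for (_name, value) in db]
    return len(values) == len(set(values))

empty = frozenset()
s1 = {("xa", "v")}      # T1u applied to the empty ancestor
s2 = {("xb", "v")}      # T2u applied to the empty ancestor

assert Iu(empty) and Iu(s1) and Iu(s2)     # each branch is I-valid...
assert not Iu(s1 | s2)                     # ...but the merge {xa = v, xb = v} is not
```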

For the next proof, we consider a model as suggested in Section 6.1 where replicas are able to generate unique (but not arbitrary (!)) IDs (in the main text, we suggested the use of a replica ID and sequence number). In the following proof, to account for this nondeterministic choice of unique ID, we introduce a special nonce() function and require that nonce() return unique values for each replica; this can be achieved by assigning each replica a unique ID and implementing nonce by returning the ID along with a local sequence number. That is, ⊔ is not defined for replicas on which independent invocations of nonce() return the same value.

Claim 10 (Assigning values by nonce() is invariant confluent with respect to multi-item uniqueness constraints). Assume that assigning values by nonce() is not invariant confluent with respect to some multi-item uniqueness invariant:

I(D) = {∀c ∈ dom(D), |{x ∈ D | x = c}| ≤ 1}

By definition, there must exist two I-T-reachable states with a common ancestor reached by executing nonce-generating transactions (of the form Ti = [w(xi = nonce())]), S1 and S2, such that I(S1) → true and I(S2) → true but I(S1 ⊔ S2) → false.

Therefore, there exist two versions ia, ib in S1 ⊔ S2 such that ia and ib (both generated by nonce()) are equal in value. Under set union, this means ia ∈ S1 and ib ∈ S2 (ia and ib cannot both appear in S1 or both in S2, since that would violate those states' I-validity). Because replica states grow monotonically under set merge and S1 and S2 differ, ia and ib must have been generated by different replicas. But nonce() cannot generate the same value on different replicas, a contradiction.

Claim 11 (Writing arbitrary values is not invariant confluent with respect to sequentiality constraints). Consider the following transactions:

T1s := w(xa = 1); commit

T2s := w(xb = 3); commit

and the sequentiality constraint on records:

Is(D) = {max(r ∈ D) − min(r ∈ D) = |D| − 1 ∨ |D| = 0}

Now, Is holds over the empty database (Is({}) → true), while inserting sequential new records into independent, empty replicas is also valid:

T1s({}) = {xa = 1}, Is({xa = 1}) → true

T2s({}) = {xb = 3}, Is({xb = 3}) → true

However, merging these states results in an invalid state:

Is({xa = 1} ⊔ {xb = 3} = {xa = 1, xb = 3}) → false

Therefore, {T1s, T2s} is not invariant confluent under Is.

To discuss foreign key constraints, we need some way to refer to other records within the database. There are a number of ways of formalizing this. Here, we refer to a field f within a given version xi as xi.f. In the following discussion, recall that invariant confluence analysis is performed on fully replicated sets of data. While there is considerable implementation complexity involved in actually performing multi-partition foreign key constraint maintenance (and indexing, as in RAMP), this is not germane to our model. As a simple strawman solution, we can defer all calculation of tombstoned records until writes have quiesced, guaranteeing convergence. As a slightly more advanced strawman, we can calculate tombstoned records according to a global watermark of writes that is advanced via background consensus.

Claim 12 (Insertion is invariant confluent with respect to foreign key constraints). Assume that inserting new records is not invariant confluent with respect to some foreign key constraint I(D) = {∀rf ∈ D such that rf.g ≠ null, ∃rt ∈ D such that rf.g = rt.h} (there exists a foreign key reference between fields g and h). By definition, there must exist two I-T-reachable states S1 and S2 with a common ancestor reachable by executing transactions performing insertions such that I(S1) → true and I(S2) → true but I(S1 ⊔ S2) → false; therefore, there exists some version r1 ∈ S1 ⊔ S2 such that r1.g ≠ null but ∄r2 ∈ S1 ⊔ S2 such that r1.g = r2.h. Under set union, r1 must appear in either S1 or S2 (or both), and, for each set of versions in which it appears, because S1 and S2 are both I-valid, they must contain an r3 such that r1.g = r3.h. But, under set union, r3 must also appear in S1 ⊔ S2, a contradiction.

For simplicity, in the following proof, we assume that deleted elements remain deleted under merge. In practice, this can be accomplished by tombstoning records and, if required, using counters to record the number of deletions and additions [205]. We represent a deleted version xd by ¬xd.

Claim 13 (Concurrent deletion and insertion is not invariant confluent with respect to foreign key constraints). Consider the following transactions:

T1f := w(xa.g = 1); commit

T2f := delete(xb); commit

and the foreign key constraint:

If(D) = {∀rf ∈ D such that rf.g ≠ null, ∃rt ∈ D such that ¬rt ∉ D and rf.g = rt.h}

Foreign key constraints hold over the initial database Si = {xb.h = 1} (If(Si) → true) and on independent execution of T1f and T2f:

T1f({xb.h = 1}) = {xa.g = 1, xb.h = 1}, If({xa.g = 1, xb.h = 1}) → true

T2f({xb.h = 1}) = {xb.h = 1, ¬xb}, If({xb.h = 1, ¬xb}) → true

However, merging these states results in an invalid state:

If({xa.g = 1, xb.h = 1} ⊔ {xb.h = 1, ¬xb}) → false

Therefore, {T1f, T2f} is not invariant confluent under If.

We denote a cascading delete of all records that reference field f with value v (v a constant) as cascade(f = v).

Claim 14 (Cascading deletion and insertion are invariant confluent with respect to foreign key constraints). Assume that cascading deletion and insertion of new records are not invariant confluent with respect to some foreign key constraint I(D) = {∀rf ∈ D such that rf.g ≠ null, ∃rt ∈ D such that rf.g = rt.h if cascade(h = rf.g) ∉ D} (there exists a foreign key reference between fields g and h and the corresponding value for field h has not been deleted-by-cascade). By definition, there must exist two I-T-reachable states S1 and S2 with a common ancestor reachable by performing insertions and cascading deletions such that I(S1) → true and I(S2) → true but I(S1 ⊔ S2) → false; therefore, there exists some version r1 ∈ S1 ⊔ S2 such that r1.g ≠ null but ∄r2 ∈ S1 ⊔ S2 such that r1.g = r2.h. From the proof of Claim 12, we know that insertion is invariant confluent, so the absence of r2 must be due to some cascading deletion. Under set union, r1 must appear in exactly one of S1 or S2 (if r1 appeared in both, neither state would contain the relevant cascading deletion and, since insertion is invariant confluent, r2 could not be missing, a contradiction). For the state Sj in which r1 does not appear (either S1 or S2), Sj must include cascade(h = r1.g). But, if cascade(h = r1.g) ∈ Sj, then cascade(h = r1.g) must also be in S1 ⊔ S2, so I(S1 ⊔ S2) → true, a contradiction.

We define a “consistent” secondary index invariant as requiring that, when a record is visible, its secondary index entry should also be visible. This is similar to the guarantees provided by Read Atomic isolation [37]. For simplicity, in the following proof, we only consider updates to a single indexed attribute attr, but the proof is easily generalizable to multiple index entries, insertions, and deletion via tombstones. We use last-writer wins for index entries.

Claim 15 (Updates are invariant confluent with respect to consistent secondary indexing). Assume that updates to records are not invariant confluent with respect to a secondary index constraint on attribute attr:

I(D) = {∀rf ∈ D such that rf.attr ≠ null and f is the highest version of r in D, ∃ridx ∈ D such that rf ∈ ridx.entries} (all records with non-null attr are reflected in the secondary index entry for attr).

Represent an update to record rx as {w(rx) and, if rx.attr ≠ null, also ridx.entries.add(rx), else ridx.entries.delete(rx)}.

By definition, there must exist two I-T-reachable states S1 and S2 with a common ancestor reachable by performing updates such that I(S1) → true and I(S2) → true but I(S1 ⊔ S2) → false; therefore, there exists some version r1 ∈ S1 ⊔ S2 such that r1.attr ≠ null but either ∄ridx ∈ S1 ⊔ S2 or ∃ridx ∈ S1 ⊔ S2 with r1 ∉ ridx.entries. In the former case, ridx appears in neither S1 nor S2, but r1 ∈ S1 or r1 ∈ S2, so one of the two states is not I-valid, a contradiction. The latter case also produces a contradiction: if r1 ∈ S1 or r1 ∈ S2, it must appear in that state's ridx.entries, a contradiction.

In our formalism, we can treat materialized views as functions over database state f(D) → c.

Claim 16 (Updates are invariant confluent with respect to materialized view maintenance). The proof is relatively straightforward if we treat the materialized view record(s) r as having a foreign key relationship to any records in the domain of the function (as in the proof of Claim 14) and recompute the function on update, cascading delete, and merge (⊔).

For our proofs over counter ADTs, we represent increments of a counter c by inci(c), where i is a distinct invocation, decrements of c by deci(c), and the value of c in database D as val(c, D) = |{j | incj(c) ∈ D}| − |{k | deck(c) ∈ D}|.

Claim 17 (Counter ADT increments are invariant confluent with respect to greater-than constraints). Assume increments are not invariant confluent with respect to some per-counter greater-than constraint I(D) = {val(c, D) > k}, where k is a constant. By definition, there must exist two I-T-reachable states S1 and S2 with a common ancestor reachable by executing write transactions such that I(S1) → true and I(S2) → true but I(S1 ⊔ S2) → false; therefore, val(c, S1 ⊔ S2) ≤ k. However, this implies that val(c, S1) ≤ k, val(c, S2) ≤ k, or both, a contradiction.

Claim 18 (Counter ADT increments are not invariant confluent with respect to less-than constraints). Consider the following transactions:

T1i := inc1(c); commit

T2i := inc2(c); commit

and the less-than inequality constraint:

Ii(D) = {val(c, D) < 2}

Ii holds over the empty database state (Ii({}) → true) and when T1i and T2i are independently executed:

T1i({}) = {inc1(c)}, Ii({inc1(c)}) → true

T2i({}) = {inc2(c)}, Ii({inc2(c)}) → true

However, merging these states results in an invalid state:

Ii({inc1(c)} ⊔ {inc2(c)}) → false

Therefore, {T1i, T2i} is not invariant confluent under Ii.

Claim 19 (Counter ADT decrements are not invariant confluent with respect to greater-than constraints). The proof is similar to the proof of Claim 18; substitute dec for inc and choose Ii(D) = {val(c, D) > −2}.

Claim 20 (Counter ADT decrements are invariant confluent with respect to less-than constraints). Unsurprisingly, the proof is almost identical to the proof of Claim 17, but with < instead of > and dec instead of inc.

We provide proofs for ADT lists; the remainder are remarkably similar. Our implementation of ADT lists in these proofs uses a lexicographic sorting of values to determine list order. Transactions add a version v to list l via add(v, l) and remove it via del(v, l) (where an item is considered contained in the list if it has been added more times than it has been deleted) and access the length of l in database D via size(l) = |{k | add(k, l) ∈ D}| − |{m | del(m, l) ∈ D}| (note that the size of a non-existent list is zero).

Claim 21 (Modifying a list ADT is invariant confluent with respect to containment constraints). Assume ADT list modifications are not invariant confluent with respect to some containment constraint I(D) = {add(k, l) ∈ D ∧ del(k, l) ∉ D} for some constant k. By definition, there must exist two I-T-reachable states S1 and S2 with a common ancestor reachable by list modifications such that I(S1) → true and I(S2) → true but I(S1 ⊔ S2) → false; therefore, add(k, l) ∉ {S1 ⊔ S2} or del(k, l) ∈ {S1 ⊔ S2}. In the former case, neither S1 nor S2 contains add(k, l), a contradiction. In the latter case, if either of S1 or S2 contains del(k, l), it will be invalid, a contradiction.

Claim 22 (Modifying a list ADT is invariant confluent with respect to non-containment constraints). Assume ADT list modifications are not invariant confluent with respect to some non-containment constraint I(D) = {add(k, l) ∉ D ∧ del(k, l) ∈ D} for some constant k. By definition, there must exist two I-T-reachable states S1 and S2 with a common ancestor reachable via list modifications such that I(S1) → true and I(S2) → true but I(S1 ⊔ S2) → false; therefore, add(k, l) ∈ {S1 ⊔ S2} and del(k, l) ∉ {S1 ⊔ S2}. But this would imply that add(k, l) ∈ S1, add(k, l) ∈ S2, or both (while del(k, l) is in neither), a contradiction.

Claim 23 (Arbitrary modifications to a list ADT are not invariant confluent with respect to equality constraints on the size of the list). Consider the following transactions:

T1l := del(xi, l); add(xa, l); commit

T2l := del(xi, l); add(xb, l); commit

and the list size invariant:

Il(D) = {size(l) = 1}

Now, the size invariant holds on a list of size one (Il({add(xi, l)}) → true) and on independent state modifications:

T1l({add(xi, l)}) = {add(xi, l), del(xi, l), add(xa, l)}

T2l({add(xi, l)}) = {add(xi, l), del(xi, l), add(xb, l)}

However, merging these states results in an invalid state:

Il({add(xi, l), del(xi, l), add(xa, l)} ⊔ {add(xi, l), del(xi, l), add(xb, l)}) → false

Therefore, {T1l, T2l} is not invariant confluent under Il.

Note that, in our above list ADT, modifying the list is invariant confluent with respect to constraints on the head and tail of the list but not intermediate elements of the list! That is, the head (resp. tail) of the merged list will be the head (tail) of one of the un-merged lists. However, the second element may come from either of the two lists.

6.3 Empirical Impact: SQL-Based Constraints

When achievable, coordination-free execution enables scalability limited to that of available hardware. This is powerful: an invariant confluent application can scale out without sacrificing correctness, latency, or availability. In Section 6.1, we saw combinations of invariants and transactions that were invariant confluent and others that were not. In this section, we apply these combinations to the workloads of the OLTP-Bench suite [98], with a focus on the TPC-C benchmark. Our focus is on the coordination required to correctly execute each workload and on the resulting coordination-related performance costs.

6.3.1 TPC-C Invariants and Execution

The TPC-C benchmark is the gold standard for database concurrency control [98] both in research and in industry [219], and in recent years has been used as a yardstick for distributed database concurrency control performance [142, 217, 221]. How much coordination does TPC-C actually require for a compliant execution?

The TPC-C workload is designed to be representative of a wholesale supplier's transaction processing requirements. The workload has a number of application-level correctness criteria that represent basic business needs (e.g., order IDs must be unique) as formulated by the TPC-C Council and which must be maintained in a compliant run. We can interpret these well-defined “consistency criteria” as invariants and subsequently use invariant confluence analysis to determine which transactions require coordination and which do not.

Table 6.2 summarizes the twelve invariants found in TPC-C as well as their invariant confluence analysis results as determined by Table 6.1. We classify the invariants into three broad categories: materialized view maintenance, foreign key constraint maintenance, and unique ID assignment. As we discussed in Section 6.1, the first two categories are invariant confluent (and therefore maintainable without coordination) because they only regulate the visibility of updates to multiple records. Because these (10 of 12) invariants are invariant confluent under the workload transactions, there exists some execution strategy that does not use coordination. However, simply because these invariants are invariant confluent does not mean that all execution strategies will scale well: for example, using locking would not be coordination-free.

As one coordination-free execution strategy (which we implement in Section 6.3.2) that respects the foreign key and materialized view invariants, we can use RAMP transactions from Chapter 5, which provide atomically visible transactional updates across servers without relying on coordination for correctness. As a reminder, RAMP transactions employ limited multi-versioning and metadata to ensure that readers and writers can always proceed concurrently: any client whose reads overlap with another client's writes to the same item(s) can use metadata stored in the items to fetch any “missing” writes from the respective servers.

#    Informal Invariant Description               Type     Txns   I-C
1    YTD wh sales = sum(YTD district sales)       MV       P      Yes
2    Per-district order IDs are sequential        SID+FK   N, D   No
3    New order IDs are sequentially assigned      SID      N, D   No
4    Per-district, item order count = roll-up     MV       N      Yes
5    Order carrier is set iff order is pending    FK       N, D   Yes
6    Per-order item count = line item roll-up     MV       N      Yes
7    Delivery date set iff carrier ID set         FK       D      Yes
8    YTD wh = sum(historical wh)                  MV       D      Yes
9    YTD district = sum(historical district)      MV       P      Yes
10   Customer balance matches expenditures        MV       P, D   Yes
11   Orders reference New-Orders table            FK       N      Yes
12   Per-customer balance = cust. expenditures    MV       P, D   Yes

Table 6.2: TPC-C Declared “Consistency Conditions” (3.3.2.x) and invariant confluence analysis results with respect to the workload transactions (Invariant type: MV: materialized view, SID: sequential ID assignment, FK: foreign key; Transactions: N: New-Order, P: Payment, D: Delivery).

A standard RAMP transaction over data items suffices to enforce foreign key constraints, while a RAMP transaction over commutative counters as described in [37] is sufficient to enforce the TPC-C materialized view constraints.

Two of TPC-C's invariants are not invariant confluent with respect to the workload transactions and therefore do require coordination. On a per-district basis, order IDs should be assigned sequentially (both uniquely and sequentially, in the New-Order transaction) and orders should be processed sequentially (in the Delivery transaction). If the database is partitioned by warehouse (as is standard [142, 217, 221]), the former is a distributed transaction (by default, 10% of New-Order transactions span multiple warehouses). The benchmark specification allows the latter to be run asynchronously and in batch mode on a per-warehouse (non-distributed) basis, so we, like others [217, 221], focus on New-Order. Including additional transactions like the read-only Order-Status in the workload mix would increase performance due to the transactions' lack of distributed coordination and (often considerably) smaller read/write footprints.

Avoiding New-Order Coordination. New-Order is not invariant confluent with respect to the TPC-C invariants, so we can always fall back to using serializable isolation. However, the per-district ID assignment records (10 per warehouse) would become a point of contention, limiting our throughput to effectively 100W/RTT for a W-warehouse TPC-C benchmark with the expected 10% distributed transactions. Others [221] (including us, in prior work [32]) have suggested disregarding consistency criteria 3.3.2.3 and 3.3.2.4, instead opting for unique but non-sequential ID assignment: this allows inconsistency and violates the benchmark compliance criteria.

During a compliant run, New-Order transactions must coordinate. However, as discussed above, only the ID assignment operation is non-invariant confluent; the remainder of the operations in the transaction can execute coordination-free. With some effort, we can avoid distributed coordination. A naïve implementation might grab a lock on the appropriate district's “next ID” record, perform (possibly remote) remaining reads and writes, then release the lock at commit time. Instead, as a more efficient solution, New-Order can defer ID assignment until commit time by introducing a layer of indirection. New-Order transactions can generate a temporary, unique, but non-sequential ID (tmpID) and perform updates using this ID via a RAMP transaction (which, in turn, handles the foreign key constraints) [37]. Immediately prior to transaction commit, the New-Order transaction can assign a “real” ID by atomically incrementing the current district's local “next ID” record (yielding realID) and recording the [tmpID, realID] mapping in a special ID lookup table. Any read requests for the ID column of the Order, New-Order, or Order-Line tables can be safely satisfied (transparently to the end user) by joining with the ID lookup table on tmpID. In effect, the New-Order ID assignment can use a nested atomic transaction [170] upon commit, and all coordination between any two transactions is confined to a single server.
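The deferred ID assignment described above is straightforward to sketch. The following is a simplified illustration in Python (the prototype itself is JVM/Scala-based, and names such as District and new_order are hypothetical); the RAMP writes keyed by the temporary ID are elided.

```python
# Sketch of deferring sequential ID assignment to commit time: the transaction runs
# under a unique but non-sequential temporary ID, and only the single-site increment
# of the district's "next ID" counter happens under a (local) lock at commit.
import itertools
import threading
import uuid

class District:
    def __init__(self):
        self._next_id = itertools.count(1)
        self._lock = threading.Lock()        # stand-in for the server-side spinlock
        self.id_lookup = {}                  # tmpID -> realID, used to satisfy ID reads

    def assign_real_id(self, tmp_id: str) -> int:
        with self._lock:                     # coordination confined to one server
            real_id = next(self._next_id)
            self.id_lookup[tmp_id] = real_id
            return real_id

def new_order(district: District, perform_ramp_writes) -> int:
    tmp_id = f"tmp-{uuid.uuid4()}"           # unique, non-sequential placeholder
    perform_ramp_writes(tmp_id)              # coordination-free RAMP writes (elided)
    return district.assign_real_id(tmp_id)   # sequential realID assigned at commit

d = District()
assert [new_order(d, lambda _tmp: None) for _ in range(3)] == [1, 2, 3]
```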

6.3.2 Evaluating TPC-C New-Order

We subsequently implemented the above execution strategy in a distributed database prototype to quantify the overheads associated with coordination in TPC-C New-Order. Most notably, the coordination-avoiding query plan scales linearly to over 12.7M transactions per second on 200 servers while substantially outperforming distributed two-phase locking. Our goal here is to demonstrate—beyond the microbenchmarks of Section 6.4.1—that safe but judicious use of coordination can have a meaningful positive effect on performance.

Implementation and Deployment. We employ a multi-versioned storage manager, with RAMP-Fast transactions for snapshot reads and atomically visible writes/“merge” (providing a variant of regular register semantics, with writes visible to later transactions after commit) [37] and implement the nested atomic transaction for ID assignment as a subprocedure inside RAMP-Fast's server-side commit procedure (using spinlocks). We implement transactions as stored procedures and fulfill the TPC-C “Isolation Requirements” by using read and write buffering as proposed in [32].

[Figure 6.1: TPC-C New-Order throughput across eight servers. Three panels compare the coordination-avoiding query plan against serializable (2PL) execution: throughput (txns/s) versus number of clients, throughput (txns/s) versus number of warehouses, and normalized throughput (%) versus the proportion of distributed transactions (%).]

As is common [141, 142, 195, 217], we disregard per-warehouse client limits and “think time” to increase load per warehouse. In all, our base prototype architecture is similar to that of [37]: a JVM-based partitioned, main-memory, mastered database.

For an apples-to-apples comparison with a coordination-intensive technique within the same system, we also implemented textbook two-phase locking (2PL) [53], which provides serializability but also requires distributed coordination. We totally order lock requests across servers to avoid deadlock, batching lock requests to each server and piggybacking read and write requests on lock request RPCs. As a validation of our implementation, our 2PL prototype achieves per-warehouse (and sometimes aggregate) throughput similar to (and often in excess of) several recent serializable database implementations (of both 2PL and other approaches) [141, 142, 195, 217].

By default, we deploy our prototype on eight EC2 cr1.8xlarge instances (32 cores comprising 88 Amazon Elastic Compute units, each with 244GB RAM) in the Amazon EC2 us-west-2 region with non-co-located clients and one warehouse per server (recall there are 10 “hot” district ID records per warehouse) and report the average of three 120-second runs.

Basic behavior. Figure 6.1 shows performance across a variety of configurations, which we detail below. Overall, the coordination-avoiding query plan far outperforms the serializable execution. The coordination-avoiding query plan performs some coordination, but, because coordination points are not distributed (unlike 2PL), physical resources (and not coordination) are the bottleneck.

Varying load. As we increase the number of clients, the coordination-avoiding query plan throughput increases linearly, while 2PL throughput increases to 40K transactions per second, then levels off. As in our microbenchmarks in Section 2.2.2, the former utilizes available hardware resources (bottlenecking on CPU cycles at 640K transactions per second), while the latter bottlenecks on logical contention.

Physical resource consumption. To understand the overheads of each component in the coordination-avoiding query plan, we used JVM profiling tools to sample thread execution while running at peak throughput, attributing time spent in functions to relevant modules within the database implementation (where possible):

Code Path                                             Cycles
Storage Manager (Insert, Update, Read)                45.3%
Stored Procedure Execution                            14.4%
RPC and Networking                                    13.2%
Serialization                                         12.6%
ID Assignment Synchronization (spinlock contention)   0.19%
Other                                                 14.3%

The coordination-avoiding prototype spends a large portion of execution in the storage manager, performing B-tree modifications and lookups and result set creation, and in RPC/serialization. In contrast to 2PL, the prototype spends less than 0.2% of time coordinating, in the form of waiting for locks in the New-Order ID assignment; the (single-site) assignment is fast (a linearizable integer increment and store, followed by a write and fence instruction on the spinlock), so this should not be surprising. We observed large throughput penalties due to garbage collection (GC) overheads (up to 40%)—an unfortunate cost of our highly compact (several thousand lines of Scala), JVM-based implementation. However, even in this current prototype, physical resources are the bottleneck—not coordination.

Varying contention. We subsequently varied the number of “hot,” or contended, items by increasing the number of warehouses on each server. Unsurprisingly, 2PL benefits from decreased contention, rising to over 87K transactions per second with 64 warehouses. In contrast, our coordination-avoiding implementation is largely unaffected (and, at 64 warehouses, is even negatively impacted by increased GC pressure). The coordination-avoiding query plan is effectively agnostic to read/write contention.

Varying distribution. We also varied the percentage of distributed transactions. The coordination-avoiding query plan incurred a 29% overhead moving from no distributed transactions to all distributed transactions due to increased serialization overheads and less efficient batching of RPCs. However, the 2PL implementation decreased in throughput by over 90% (in line with prior results [195, 217], albeit exaggerated here due to higher contention) as more requests stalled due to coordination with remote servers.

Scaling out. Finally, we examined our prototype's scalability, again deploying one warehouse per server. As Figure 6.2 demonstrates, our prototype scales linearly, to over 12.74 million transactions per second on 200 servers (in light of our earlier results, and for economic reasons, we do not run 2PL at this scale). Per-server throughput is largely constant after 100 servers, at which point our deployment spanned all three us-west-2 datacenters and experienced slightly degraded per-server performance. While we make use of application semantics, we are unaware of any other compliant multi-server TPC-C implementation that has achieved greater than 500K New-Order transactions per second [141, 142, 195, 217].

[Figure 6.2: Coordination-avoiding New-Order scalability. Two panels plot total throughput (txn/s) and per-server throughput (txn/s/server) against the number of servers.]

Summary. We present these quantitative results as a proof of concept that even challenging workloads like TPC-C, which contain complex integrity constraints, are not necessarily at odds with scalability if implemented in a coordination-avoiding manner. Distributed coordination need not be a bottleneck for all applications, even if conflict serializable execution indicates otherwise. Coordination avoidance ensures that physical resources—and not logical contention—are the system bottleneck whenever possible.

6.3.3 Analyzing Additional Applications

These results begin to quantify the effects of coordination-avoiding concurrency control. By considering application-level invariants, databases only have to pay the price of coordination when necessary. We were surprised that the “current industry standard for evaluating the performance of OLTP systems” [98] was so amenable to coordination-avoiding execution—at least for compliant execution as defined by the official TPC-C specification.

For greater variety, we also studied the workloads of the recently assembled OLTP-Bench suite [98], performing a similar analysis to that of Section 6.3.1. We found (and confirmed with an author of [98]) that for nine of the fourteen remaining (non-TPC-C) OLTP-Bench applications, the workload transactions did not involve integrity constraints (e.g., did not modify primary key columns), one (CH-benCHmark) matched TPC-C, and two specifications implied (but did not explicitly state) a requirement for unique ID assignment (AuctionMark's new-purchase order completion, SEATS's NewReservation seat booking; achievable like TPC-C order IDs). The remaining two benchmarks, sibench and smallbank, were specifically designed as research benchmarks for serializable isolation. Finally, the three “consistency conditions” required by the newer TPC-E benchmark are a proper subset of the twelve conditions from TPC-C considered here (and are all materialized counters). It is possible (even likely) that these benchmarks are underspecified, but according to official specifications, TPC-C contains the most coordination-intensive invariants among all but two of the OLTP-Bench workloads.

Anecdotally, our conversations and experiences with real-world application programmers and database developers have not identified invariants that are radically different from those we have studied here. A simple thought experiment identifying the invariants required for a social networking site yields a number of invariants but none that are particularly exotic (e.g., username uniqueness, foreign key constraints between updates, privacy settings [37, 83]). Nonetheless, we examine additional invariants from real-world applications in the remainder of this chapter. The results presented in this section hint at what is possible with coordination-avoidance as well as the costs of coordination if applications are not invariant confluent.

6.4 Constraints from Open Source Applications

In the remainder of this chapter, we examine constraints as found in modern, open source web applications. The rise of “Web 2.0” Internet applications delivering dynamic, highly interactive user experiences has been accompanied by a new generation of programming frameworks [218]. These frameworks simplify common tasks such as content templating and presentation, request handling, and, notably, data storage, allowing developers to focus on “agile” development of their applications. This trend embodies the most recent realization of the larger vision of object-relational mapping (ORM) systems [70], albeit at an unprecedented scale of deployment and programmer adoption. As a lens for understanding issues of data integrity in modern ORM systems, we study Ruby on Rails (or, simply, “Rails”) [117, 199], a central player among modern frameworks powering sites including (at one point) Twitter [82], Airbnb [15], GitHub [191], Hulu [72], Shopify [101], Groupon [177], SoundCloud [69], [196], [1], and Zendesk [229]. From the perspective of database systems research, Rails is interesting for at least two reasons. First, it continues to be a popular means of developing responsive web application front-end and business logic, with an active open source community and user

base. Rails recently celebrated its tenth anniversary and enjoys considerable commercial interest, both in terms of deployment and the availability of hosted “cloud” environments such as Heroku. Thus, Rails programmers represent a large class of consumers of database technology. Second, and perhaps more importantly, Rails is “opinionated software” [102]. That is, Rails embodies the strong personal convictions of its developer community, and, in particular, David Heinemeier Hansson (known as DHH), its creator. Rails is particularly opinionated towards the database systems that it tasks with data storage. To quote DHH:

“I don’t want my database to be clever! . . . I consider stored procedures and constraints vile and reckless destroyers of coherence. No, Mr. Database, you can not have my business logic. Your procedural ambitions will bear no fruit and you’ll have to pry that logic from my dead, cold object-oriented hands . . . I want a single layer of cleverness: My domain model.” [130]

Thus, in several regards, this wildly successful software framework bears an actively antagonistic relationship to database management systems, echoing a familiar refrain of the “NoSQL” movement: get the database out of the way and let the application do the work. In this chapter, we examine the implications of this impedance mismatch between databases and modern ORM frameworks in the context of application integrity. Rails largely ignores decades of work on native database concurrency control solutions by providing a set of primitives for handling application integrity that are enforced at the application tier—building, from the underlying database system’s perspective, a feral concurrency control system. Core to feral concurrency control mechanisms is the use of data invariants, as we have studied in this chapter in the context of SQL. We examine the design and use of these feral mechanisms and evaluate their effectiveness in practice by analyzing them and experimentally quantifying data integrity violations in practice. Our goal is to understand how this growing class of applications currently interacts with database systems and how we, as a database systems community, can positively engage with these criticisms to better serve the needs of these developers. We begin by surveying the state of Rails’ application-tier concurrency control primitives and examining their use in 67 open source applications representing a variety of use cases from e-Commerce to Customer Relationship Management and social networking. We find that these applications overwhelmingly use Rails’ built-in support for declarative invariants—validations and associations—to protect data integrity instead of application-defined transactions, which are used more than 37 times less frequently. Across the survey, we find over 9950 uses of application-level validations designed to ensure correctness criteria including referential integrity, uniqueness, and adherence to common data formats. Given this corpus, we subsequently ask: are these feral invariants correctly enforced? Do they work in practice? Rails may execute validation checks in parallel, so we study the potential for data corruption due to races if validation and update activity does not run within a serializable transaction in the database. This is a real concern, as many DBMS platforms use non-serializable isolation by default and in many cases (despite labeling otherwise) do not provide serializable isolation as an option at all. Accordingly, we apply invariant confluence analysis and show that, in fact, up to 86.9% of Rails validation usage by volume is actually invariant confluent and therefore safe under concurrent execution. However, the remainder—which include uniqueness violations under insertion and foreign key constraint violations under deletion—are not. Therefore, we quantify the impact of concurrency on data corruption for Rails uniqueness and foreign key constraints under both worst-case analysis and via actual Rails deployment. We demonstrate that, for pathological workloads, validations reduce the severity of data corruption by orders of magnitude but nevertheless still permit serious integrity violations. Given these results, we return to our goal of improving the underlying data management systems that power these applications and present recommendations for the database research community.
We expand our study to survey several additional web frameworks and demonstrate that many also provide a notion of feral validations, suggesting an industry-wide trend. While the success of Rails and its ilk—despite (or perhaps due to) their aversion to database technology—is firm evidence of the continued impedance mismatch between object-oriented programming and the relational model, we see considerable opportunity in improving database systems to better serve these communities—via more programmer- and ORM-friendly interfaces that ensure correctness while minimizing impacts on performance and portability.

6.4.1 Background and Context

As a primary focus of our study, we investigate the operational model, database use, and application primitives provided in Rails. In this section, we provide an overview of the Rails programming model and describe standard Rails deployment architectures.

Rails Tenets and MVC

Rails was developed in order to maximize developer productivity. This focus is captured by two core architectural principles [199]. First, Rails adopts a “Don’t Repeat Yourself” (DRY) philosophy: “every piece of knowledge should be expressed in just one place” in the code. Data modeling and schema descriptions are relegated to one portion of the system, while presentation and business logic are relegated to two others. Rails attempts to minimize the amount of boilerplate code required to achieve this functionality. Second, Rails adopts a philosophy of “Convention over Configuration,” aiming for sensible defaults and allowing easy deployment without many—if any—modifications to configuration. A natural corollary to the above principles is that Rails encourages an idiomatic style of programming. The Rails framework authors claim that “somehow, [this style] just seems right” for quickly building responsive web applications [199]. The framework’s success hints that its idioms are, in fact, natural to web developers. More concretely, Rails divides application code into a three-component architecture called Model-View-Controller [114, 147]:

• The Model acts as a basic ORM and is responsible for representing and storing busi- ness objects, including schemas, querying, and persistence functionality. For example, in a banking application, an account’s state could be represented by a model with a numeric owner ID field and a numeric balance field.

• The View acts as a presentation layer for application objects, including rendering into browser-ingestible HTML and/or other formats such as JSON. In our banking application, the View would be responsible for rendering the page displaying a user’s account balance.

• The Controller encapsulates the remainder of the application’s business logic, including actual generation of queries and transformations on the Active Record models. In our banking application, we would write logic for orchestrating withdrawal and deposit operations within the Controller. (A brief code sketch of this banking example follows the list.)
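To make the division of responsibilities concrete, the following is a minimal sketch of the banking example above. The class and column names (Account, AccountsController, owner_id, balance) are illustrative rather than drawn from any surveyed application, the View layer is omitted, and a configured Active Record database connection is assumed.

    require "active_record"

    # Model: one Active Record class per database table; here, an accounts table
    # with owner_id and balance columns.
    class Account < ActiveRecord::Base
      validates :balance, numericality: { greater_than_or_equal_to: 0 }
    end

    # Controller: business logic orchestrating deposits and withdrawals. (A real
    # Rails controller would subclass ApplicationController and read parameters
    # from the HTTP request; this sketch takes them as arguments.)
    class AccountsController
      def deposit(account_id, amount)
        account = Account.find(account_id)
        account.balance += amount
        account.save # runs declared validations, then persists via a SQL UPDATE
      end
    end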

Actually building a Rails application is a matter of instantiating a collection of models and writing appropriate controller and presentation logic for each. As we are concerned with how Rails utilizes database back-ends, we largely focus on how Rails applications interact with the Model layer. Rails natively supports a Model implementation called Active Record. Rails’s Active Record module is an implementation of the Active Record pattern originally proposed by Martin Fowler, a prominent software design consultant [113]. Per Fowler, an Active Record is “an object that wraps a row in a database or view, encapsulates the database access, and adds domain logic on that data” (further references to Active Record will correspond to Rails’s implementation). The first two tasks—persistence and database encapsulation—fit squarely in the realm of standard ORM design, and Rails adopts Fowler’s recommendation of a one-to-one correlation between object fields and database columns (thus, each declared Active Record class is stored in a separate table in the database). The third component, domain logic, is more complicated. Each Rails model may contain a number of attributes (and must include a special

primary-key-backed id field) as well as associated logic including data validation, associations, and other constraints. Fowler suggests that “domain logic that isn’t too complex” is well-suited for encapsulation in an Active Record class. We will discuss these in greater depth in the next section.

Databases and Deployment

This otherwise benign separation of data and logic becomes interesting when we consider how Rails servers process concurrent requests. In this section, we describe how, in standard Rails deployments, application logic may be executed concurrently and without synchronization within separate threads or processes. In Rails, the database is—at least for basic usages—simply a place to store model state and is otherwise divorced from the application logic. All application code is run within the Ruby virtual machine (VM), and Active Record makes appropriate calls to the database in order to materialize collections of models in the VM memory as needed (as well as to persist model state). However, from the database’s perspective (and per DHH’s passionate declaration in Section 6.4), logic remains in the application layer. Active Record natively provides support for PostgreSQL, MySQL, and SQLite, with extensions for databases including Oracle, and is otherwise agnostic to database choice. Rails deployments typically resemble traditional multi-tier web architectures [16] and consist of an HTTP server such as Apache or Nginx that acts as a proxy for a pool of Ruby VMs running the Rails application stack. Depending on the Ruby VM and Rails implementation, the Rails application may or may not be multi-threaded.1 Thus, when an end-user makes an HTTP request on a Rails-powered web site, the request is first accepted by a web server and passed to a Rails worker process (or thread within the process). Based on the HTTP headers and destination, Rails subsequently determines the appropriate Controller logic and runs it, including any database calls via Active Record, and renders a response via the View, which is returned to the HTTP server. Thus, in a Rails application, the only coordination between individual application requests occurs within the database system. Controller execution—whether in separate threads or across Ruby VMs (which may be active on different physical servers)—is entirely independent, save for the rendezvous of queries and modifications within the database tier, as triggered by Active Record operations. The independent execution of concurrent business logic should give serious pause to disciples of transaction processing systems. Is this execution strategy actually safe? Thus far, we have yet to discuss any mechanisms for maintaining correct application data, such as the use of transactions. In fact, as we will discuss in the next section, Rails has, over its lifetime, introduced several mechanisms for maintaining consistency of application data. In keeping with Rails’ focus on maintaining application logic within Rails (and not within the database), this has led to several different proposals. In the remainder of this chapter, we examine their use and whether, in fact, they correctly maintain application data.

1Ruby was not traditionally designed for highly concurrent operations: its standard reference VM—Ruby MRI—contains (like Python’s CPython) a “Global VM Lock” that prevents multiple OS threads from executing at a given time. While alternative VM implementations provide more concurrent behavior, until Rails 2.2 (released in November 2008), Rails embraced this behavior and was unable to process more than one request at a time (due to shared state including database connections and logging state) [184]. In practice today, the choice of multi-process, multi-threaded, or multi-process and multi-threaded deployment depends on the actual application server architecture. For example, three popular servers—Phusion Passenger, Puma, and Unicorn—each provide a different configuration.

6.4.2 Feral Mechanisms in Rails

As we discussed in Section 6.4.1, Rails services user requests independently, with the database acting as a point of rendezvous for concurrent operations. Given Rails’s design goals of maintaining application logic at the user level, this appears—on its face—a somewhat cavalier proposition with respect to application integrity. In response, Rails has developed a range of concurrency control strategies, two of which operate external to the database, at the application level, which we term feral concurrency control mechanisms. In this section, we outline four major mechanisms for guarding against integrity violations under concurrent execution in Rails. We subsequently begin our study of 67 open source applications to determine which of these mechanisms are used in practice. In the following section, we will determine which are sufficient to maintain correct data—and when they are not.

Rails Concurrency Control Mechanisms

Rails contains four main mechanisms for concurrency control; a brief code sketch illustrating their use follows the list.

1. Rails provides support for transactions. By wrapping a sequence of operations within a special transaction block, Rails operations will execute transactionally, backed by an actual database transaction. The database transaction either runs at the database’s configured default isolation level or, as of Rails 4.0.0, can be configured on a per-transaction basis [158].

2. Rails provides support for both optimistic and pessimistic per-record locking. Applications invoke pessimistic locks on an Active Record object by calling its lock method,

which invokes a SELECT FOR UPDATE statement in the database. Optimistic locking is invoked by declaring a special lock_version field in an Active Record model. When a Rails process performs an update to an optimistically locked model, Active Record uses a transaction to atomically check whether the corresponding record’s lock_version field has changed since the process last read the object. If the record has not changed, Rails transactionally increments lock_version and updates the database record; if the record has changed, the update fails.

3. Rails provides support for application-level validations. Each Active Record model has a set of zero or more validations, or boolean-valued functions, and a model instance may only be saved to the database if all of its declared validations return true. These validations ensure, for example, that particular fields within a record are not null or are unique within the database. Rails provides a number of built-in validations but also allows arbitrary user-defined validations (we discuss actual validations further in subsequent sections). The framework runs each declared validation sequentially and, if all succeed, the model state is updated in the database; this happens within a database-backed transaction.2 The validations supported by Rails include ones that are natively supported by many commercial databases today, as well as others.

4. Rails provides support for application-level associations. As the name suggests, “an association is a connection between two Active Record models,” effectively acting like a foreign key in an RDBMS. Associations can be declared on one or both sides of a one-to-one or one-to-many relationship, including transitive dependencies (via a :through annotation). Declaring an association (e.g., :belongs_to dept) produces a special field for the associated record ID within the model (e.g., dept_id). Coupling an association with an appropriate validation (e.g., :presence) ensures that the association is indeed valid (and is, via the validation, backed by a database transaction). Until the release of Rails 4.2 in December 2014, Rails did not provide native support for database-backed foreign key constraints. In Rails 4.2, foreign keys are supported via manual schema annotations declared separately from each model; declaring an association does not declare a corresponding foreign key constraint and vice-versa.
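The following sketch illustrates how each of the four mechanisms appears in application code. The model and column names (Dept, Employee, email) are illustrative and not drawn from the surveyed applications; a configured Active Record connection and matching tables are assumed.

    require "active_record"

    class Dept < ActiveRecord::Base
      has_many :employees
    end

    class Employee < ActiveRecord::Base
      belongs_to :dept                   # 4. association: expects a dept_id field
      validates :dept, presence: true    # 3. validation backing the association
      validates :email, uniqueness: true # 3. feral uniqueness check
      # 2. optimistic locking is enabled by adding an integer lock_version column
      #    to the table; a stale concurrent save raises ActiveRecord::StaleObjectError.
    end

    # 1. transactions: the block runs inside a single database transaction
    #    (isolation selectable per transaction as of Rails 4.0).
    Employee.transaction(isolation: :serializable) do
      employee = Employee.lock.find(1)   # 2. pessimistic lock: SELECT ... FOR UPDATE
      employee.update!(email: "[email protected]")
    end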

Overall, these four mechanisms provide a range of options for developers. The first is squarely in the realm of traditional concurrency control. The second is, in effect, a coarse-grained user-level implementation of single-record transactions via database-level “compare-and-swap” primitives (implemented via SELECT FOR UPDATE).

2The practice of wrapping validations in a transaction dates to the earliest public Rails commit (albeit, in 2004, transactions were only supported via a per-Ruby VM global lock [129]). However, as late as 2010, updates were only partially protected by transactions [208].

However, the latter two—validations and associations—operate, in effect, at the application level. Although some validations like uniqueness validations have analogs in an RDBMS, the semantics of these validations are entirely contained within the Rails code. In effect, from the database’s perspective, these validations exist external to the system and are feral concurrency control mechanisms. Rails’s feral mechanisms—validations and associations—are a prominent feature of the Active Record model. In contrast, neither transactions nor locks are actually discussed in the official “Rails Guides,” and, generally, are not promoted as a means of ensuring data integrity. Instead, the Rails documentation [6] prefers validations because they “are database agnostic, cannot be bypassed by end users, and are convenient to test and maintain.” Moreover, the Rails documentation opines that “database constraints and/or stored procedures make the validation mechanisms database-dependent and can make testing and maintenance more difficult.” As we will show shortly, these feral mechanisms accordingly dominate in terms of developer popularity in real applications.

Adoption in Practice

To understand exactly how users interact with these concurrency control mechanisms and to determine which deserved more study, we examined their usage in a portfolio of publicly available open source applications. We find that validations and associations are overwhelmingly the most popular forms of concurrency control.

Application corpus. We selected 67 open source applications built using Ruby on Rails and Active Record, representing a variety of application domains, including eCommerce, customer relationship management, retail point of sale, conference management, content management, build management, project management, personal task tracking, community management and forums, commenting, calendaring, file sharing, Git hosting, link aggregation, crowdfunding, social networking, and blogging. We sought projects with substantial code-bases (average: 26,809 lines of Ruby), multiple contributors (average: 69.1), and relative popularity (measured according to GitHub stars) on the site. Table 6.3 provides a detailed overview. To determine the occurrences and number of models, transactions, locks, validations, and associations in Rails, we wrote a set of analysis scripts that performed a very rudimentary syntactic static analysis. We do not consider the analysis techniques here a contribution; rather, our interest is in the output of the analysis. The syntactic approach proved portable between the many versions of Rails against which each application is linked; otherwise, porting between non-backwards-compatible Rails versions was difficult and, in fact,

unsupported by several of the Rails code analysis tools we considered using as alternatives. The choice to use syntax as a means of distinguishing code constructs led to some ambiguity. To compensate, we introduced custom logic to handle esoteric syntaxes that arose in particular projects (e.g., some projects extend ActiveRecord::Base with a separate, project-specific base class, while some validation usages vary between constructs like :validates_presence and :validates_presence_of). To determine code authorship, we used the output of git log and blame and did not attempt any sophisticated entity resolution.

Table 6.3: Corpus of applications used in analysis (M: Models, T: Transactions, PL: Pessimistic record Locking, OL: Optimistic Locking, V: Validations, A: Associations). Stars: number of GitHub Stars as of October 2014.
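As a rough, hypothetical illustration of the kind of syntactic counting described above (this is not the authors’ analysis script, and the patterns are only approximate), a small Ruby script might scan an application’s source tree as follows:

    require "find"

    PATTERNS = {
      models:       /class\s+\w+\s*<\s*ActiveRecord::Base/,
      transactions: /\.transaction\b/,
      locks:        /\.lock\b|lock!|lock_version/,
      validations:  /\bvalidates(_\w+)?\b/,
      associations: /\b(belongs_to|has_one|has_many|has_and_belongs_to_many)\b/
    }.freeze

    counts = Hash.new(0)
    Find.find(ARGV.fetch(0, "app")) do |path|
      next unless path.end_with?(".rb")
      source = File.read(path)
      PATTERNS.each { |name, regex| counts[name] += source.scan(regex).size }
    end

    counts.each { |name, total| puts "#{name}: #{total}" }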

Figure 6.3: Use of mechanisms over each project’s history (percentage of final occurrences of models, associations per model, validations per model, and transactions per model versus normalized application history, measured as a percentage of commits). We plot the median value of each metric across projects and, for each mechanism, omit projects that do not contain any uses of the mechanism (e.g., if a project lacks transactions, the project is omitted from the median calculation for transactions).

While several of these applications are projects undertaken by hobbyists, many are commercially supported (e.g., Canvas LMS, Discourse, Spree, GitLab) and/or have a large open source community (e.g., Radiant, Comfortable Mexican Sofa, Diaspora). A larger-scale commercial, closed-source Rails application such as Twitter, GitHub, or Airbnb might exhibit different trends than those we observe here. However, in the open source domain, we believe this set of applications contains a diverse selection of Rails use cases and is a reasonably representative sample of popular open source Rails applications as hosted on GitHub.

Mechanism usage. We performed a simple analysis of the applications to determine how each of the concurrency control mechanisms was used. Overwhelmingly, applications did not use transactions or locks (Figure 6.5 and Table 6.3). On average, applications used 0.13 transactions, 0.01 locks, 1.80 validations, and 3.19 associations per model (with an average of 29.1 models per application). While 46 (68.7%) of applications used transactions, all used some validations or associations. Only six applications used locks. Use of pessimistic locks was over twice as common as the use of optimistic locks. Perhaps most notable among these general trends, we find that validations and associations are, respectively, 13.6 and 24.2 times more common than transactions and orders of magnitude more common than locking.

Figure 6.4: CDFs of authorship of invariants (validations plus associations) and commits, as a function of the proportion of authors. The bolded line shows the average CDF across projects, while faint lines show CDFs for individual projects. The dotted line shows the 95th percentile CDF value.

These feral mechanisms are—in keeping with the Rails philosophy—favored by these application developers. That is, rather than adopting the use of traditional transactional programming primitives, Rails application writers chose to instead specify correctness criteria and have the ORM system enforce the criteria on their behalf. It is unclear and even unlikely that these declarative criteria are a complete specification of program correctness: undoubtedly, some of these programs contain errors and omissions. However, given that these criteria are nevertheless being declared by application writers and represent a departure from traditional, transaction-oriented programming, we devote much of the remainder of this work to examining exactly what they are attempting to preserve (and whether they are actually sufficient to do so).

Understanding specific applications. Over the course of our investigation, we found that application use of mechanisms varied. While our focus is largely on aggregate behavior,

studying individual applications is also interesting. For example, consider Spree, a popular eCommerce application: Spree uses only six transactions, one for each of 1.) canceling an order, 2.) approving an order (atomically setting the user ID and timestamp), 3.) transferring shipments between fulfillment locations (e.g., warehouses), 4.) transferring items between shipments, 5.) transferring stock between fulfillment locations, and 6.) updating an order’s specific inventory status. While this is a reasonable set of locations for transactions, in an eCommerce application, one might expect a larger number of scenarios to require transactions, including order placement and stock adjustment. In the case of Spree stock adjustment, the inventory count for each item is a potential hotspot for concurrency issues. Manual adjustment of the stock that is available (“adjust_count_on_hand”) is indeed protected via a pessimistic lock, but simply setting the amount of stock that is available (“set_count_on_hand”) is not. It is unclear why one operation necessitates a lock but the other does not, given that both are ostensibly sensitive to concurrent accesses. Meanwhile, the stock level field is wrapped in a validation ensuring non-negative balances, preventing negative balances but not necessarily classic Lost Update anomalies [9]. At one point, Spree’s inventory count was protected by an optimistic lock; it was removed due to optimistic lock failure during customer checkouts. On the relevant GitHub issue pertaining to this lock removal, a committer notes that “I think we should get rid of the [optimistic lock] if there’s no documentation about why it’s there...I think we can look at this issue again in a month’s time and see if there’s been any problems since you turned it off” [88]. This removal has, to our knowledge, not been revisited, despite the potential dangers of removing this point of synchronization. The remainder of the application corpus contains a number of such fascinating examples, illustrating the often ad-hoc process of deciding upon a concurrency control mechanism. Broadly, the use of each style of concurrency control varies across repositories, but our results demonstrate a clear trend towards feral mechanisms within Rails rather than traditional use of transactions.

Additional metrics. To better understand how programmers used each of these mechanisms, we performed two additional analyses. First, we analyzed the number of models, transactions, validations, and associations over each project’s lifetime. Using each project’s Git history, we repeated the above analysis at a fixed set of intervals through the project’s lifespan (measured by commits). Figure 6.3 plots the median number of occurrences across all projects. The results show that concurrency control mechanisms (of all forms) tend to be introduced after models are introduced.

Figure 6.5: Use of concurrency control mechanisms in Rails applications (per-project counts of models, transactions per model, validations per model, and associations per model). We maintain the same ordering of applications for each plot (i.e., same x-axis values; identical to Table 6.3) and show the average for each plot using the dotted line.

That is, additions to the data model precede (often by a considerable amount) additional uses of transactions, validations, and associations. It is unclear whether the bulk of concurrency control usage additions are intended to correct concurrency issues or are instead due to natural growth in Controller code and business logic. However, the gap between models and concurrency control usage shrinks over time; thus, the data model appears to stabilize faster than the controller logic, but both eventually stabilize. We view additional longitudinal analysis along these lines as worthwhile future work. Second, we analyze the distribution of authors to commits compared to the distribution of authors to validations and associations authored.3 As Figure 6.4 (see Appendix, page 149) demonstrates, 95% of all commits are authored by 42.4% of authors. However, 95% of invariants (validations plus associations) are authored by only 20.3% of authors. This is reminiscent of database schema authorship, where, traditionally, a smaller number of authors (e.g., DBAs) modify the schema than contribute to the actual application code.

Summary and Discussion

Returning to the Rails design philosophy, the applications we have encountered do indeed express their logic at the application layer. There is little actual communication of correctness criteria to the database layer. Part of this is due to limitations within Rails. As we have mentioned, there is no way to actually declare a foreign key constraint in Rails without importing additional third-party modules. Insofar as Rails is an “opinionated” framework encouraging an idiomatic programming style, if our application corpus is any indication, DHH and his co-authors advocating application-level data management appear to have succeeded en masse. Having observed the relative popularity of these mechanisms, we turn our attention to the question of their correctness. Specifically, do these application-level criteria actually enforce the constraints that they claim to enforce? We restrict ourselves to studying declared validations and associations for three reasons. First, as we have seen, these constructs are more widely used in the codebases we have studied. Second, these constructs represent a deviation from standard concurrency control techniques and are therefore perhaps more likely to contain errors. Third, while analyzing latent constraints (e.g., those that might be determined via more sophisticated techniques such as pre- and post-condition invariant mining [160, 198] and/or by interviewing each developer on each project) would be instructive, this is difficult to scale. We view these forms of analysis as highly promising avenues for future research. 3We chose to analyze commits authored rather than lines of code written because git tracks large-scale code refactoring commits as an often large set of deletions and insertions. Nevertheless, we observed a close correlation between lines of code and commits authored.

6.4.3 Rails Invariant Confluence Analysis

We now turn our attention to understanding which of Rails’ feral validations and associations are actually correct under concurrent execution as described in Section 6.4.1 and which require stronger forms of isolation or synchronization for correct enforcement.

Understanding Validation Behavior

To begin, recall that each sequence of validations (and the model update as well, if the validations pass) is wrapped within a database-backed transaction; thus, the validation’s intended integrity will be preserved provided the database is using serializable isolation. However, relational database engines often default to non-serializable isolation [32]; notably for Rails, PostgreSQL and MySQL actually default to, respectively, the weaker Read Committed and Repeatable Read isolation levels. We did not encounter evidence that applications changed the isolation level. Rails does not configure the database isolation level for validations, and none of the application code or configurations we encountered change the default isolation level, either (or mention doing so in documentation). Thus, although we cannot prove that this is indeed the case, this data suggests that validations are likely to run at default database isolation in production environments.
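For reference, one way an application could opt into stronger isolation for a validated save is Rails 4.0’s per-transaction isolation option, sketched below with an illustrative Account model and an assumed Active Record connection; we did not observe this pattern in the surveyed applications.

    require "active_record"

    class Account < ActiveRecord::Base
      validates :balance, numericality: { greater_than_or_equal_to: 0 }
    end

    account = Account.new(owner_id: 42, balance: 0)
    Account.transaction(isolation: :serializable) do
      account.save! # validation SELECTs and the INSERT now run under serializable isolation
    end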

Validations with weak isolation. Given that validations are not likely to be perfectly isolated, does this lack of serializable isolation actually affect these invariants? Just because validations effectively run concurrently does not mean that they are necessarily incorrect. To determine exactly which of these invariants are correct under concurrent execution, we employ invariant confluence analysis. In the case of Rails, we wish to determine whether, in the event of concurrent validations and model saves, the result of concurrent model saves will not violate the validation for either model. In the event that two concurrent controllers save objects that are backed by the same database record, only one will be persisted (a some-write-wins “merge”). In the event that two concurrent controllers save different models (i.e., backed by different database records), both will be persisted (a set-based “merge”). In both cases, we must ensure that validations hold after merge. Per above, our invariant confluence analysis currently relies on a combination of manual proofs and simple static analysis: given a set of invariant and operation pairs classified as providing the invariant confluence property, we can iterate through all operations and declared invariants and check whether or not they appear in the set of invariant confluent pairs. If so, we label the pair as invariant confluent. If not, we can either conservatively label the pair as unsafe under concurrent execution or prove whether or not the pair is invariant confluent. (To prove a pair is invariant confluent, we must show that the set of database states reachable by executing operations preserves the invariant under merge, as described above.) Returning to our task of classifying Rails validations and associations as safe or not, we applied this invariant confluence analysis to the invariants4 in the corpus. In our analysis, we found that only 60 out of 3505 validations were expressed as user-defined functions. The remainder were drawn from the standard set of validations supported by Rails core.5 Accordingly, we begin by considering built-in validations, then examine each of the custom validations.

Built-In Validations

We now discuss common, built-in validations and their invariant confluence. Many are invariant confluent and are therefore safe to execute concurrently. Table 6.4 presents the ten most common built-in validations by usage and their occurrences in our application corpus. The exact coordination requirements depend on their usage. The most popular invariant, presence, serves multiple purposes. Its basic behavior is to simply check for empty values in a model before saving. This is invariant confluent as, in our model, concurrent model saves cannot result in non-null values suddenly becoming null. However, presence can also be used to enforce that the opposite end of an association is, in fact, present in the database (i.e., referential integrity). Under insertions, foreign key constraints are invariant confluent [34], but, under deletions, they are not. The second most popular invariant, concerning record uniqueness, is not invariant confluent [34]. That is, if two users concurrently insert or modify records, they can introduce duplicates. Eight of the next nine invariants are largely concerned with data formatting and are invariant confluent. For example, numericality ensures that the field contains a number rather than an alphanumeric string. These invariants are indeed invariant confluent under concurrent update. Finally, the safety of associated (like presence) is contingent on whether the current updates are both insertions (invariant confluent) or mixed insertions and deletions (not invariant confluent). Thus, correctness depends on the operation.

4We focus on validations here as, while associations do represent an invariant, it is only when they are coupled with validations that they are enforced. 5It is unclear exactly why this is the case. It is possible that, because these invariants are standardized, they are more accessible to users. It is also possible that Rails developers have simply done a good job of codifying common patterns that programmers tend to use.

Name                                 Occurrences   I-Confluent?
validates_presence_of                1762          Depends
validates_uniqueness_of              440           No
validates_length_of                  438           Yes
validates_inclusion_of               201           Yes
validates_numericality_of            133           Yes
validates_associated                 39            Depends
validates_email                      34            Yes
validates_attachment_content_type    29            Yes
validates_attachment_size            29            Yes
validates_confirmation_of            19            Yes
Other                                321           Mixed

Table 6.4: Use of and invariant confluence of built-in validations.

Overall, a large number of built-in validations are safe under concurrent operation. Under insertions, 86.9% of built-in validation occurrences are invariant confluent. Under deletions, only 36.6% of occurrences are invariant confluent. However, associations and multi-record uniqueness are—depending on the workload—not invariant confluent and are therefore likely to cause problems. In the next section, we examine these validations in greater detail.

Custom Validations

We also manually inspected the coordination requirements of the 60 (1.71%) validations (from 17 projects) that were declared as UDFs. 52 of these custom validations were declared inline via Rails’s validates_each syntax, while 8 were custom classes that implemented Rails’s validation interface. 42 of the 60 validations were invariant confluent, while the remaining 18 were not. Due to space constraints, we omit a discussion of each validation but discuss several trends and notable examples of custom validations below. Among the custom validations that were invariant confluent, many consisted of simple format checks or other domain-specific validations, including credit card formatting and static username blacklisting. The validations that were not invariant confluent took on a range of forms. Three validations performed the equivalent of foreign key checking, which, as we have discussed, is unsafe under deletion. Three validations checked database-backed configuration options including the maximum allowed file upload size and default tax rate; while configuration updates are ostensibly rare, the outcome of each validation could be affected under a configuration change. Two validations were especially interesting. Spree’s AvailabilityValidator checks whether an eCommerce inventory has sufficient stock available to fulfill an order; concurrent order placement might result in negative stock. Discourse’s PostValidator checks whether a user has been spamming the forum; while not necessarily critical, a spammer could technically foil this validation by attempting to simultaneously author many posts. In summary, again, a large proportion of validations appear safe. Nevertheless, the few non-invariant confluent validations should be cause for concern under concurrent execution.

6.5 Quantifying Integrity Violations in Rails

While many of the validations we encountered were invariant confluent, not all were. In this section, we specifically investigate the effect of concurrent execution on two of the most popular non-invariant confluent validations: uniqueness and foreign key validations.

Uniqueness Constraints and Isolation

To begin, we consider Rails’s uniqueness validations, which account for 12.7% of the built-in validation uses we encountered. In this section, we discuss how Rails implements uniqueness and show that this is—at least theoretically—unsafe. When a model field is declared with a :validates_uniqueness annotation, any instance of that model is compared against all other corresponding records in the database to ensure that the field is indeed unique. ActiveRecord accomplishes this by issuing a “SELECT” query in SQL and, if no such record is found, Rails updates the instance state in the database (Appendix 6.8.1). While this user-level uniqueness validation runs within a transaction, the isolation level of the transaction affects its correctness. For correct execution, the SELECT query must effectively attain a predicate lock on the validated column for the duration of the transaction. This behavior is supported under serializable isolation. However, under Read Committed or Repeatable Read isolation, no such mutual exclusion will be performed, leading to potential inconsistency.6 Moreover, validation under Snapshot Isolation may similarly result in inconsistencies.7 Thus, unless the database is configured for serializable isolation, integrity violations may result.
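A minimal sketch of this check-then-insert pattern appears below (model and column names are illustrative; a configured Active Record connection is assumed). The interleaving in the comments shows how two Rails workers can both pass the validation under Read Committed isolation.

    require "active_record"

    class User < ActiveRecord::Base
      # Before saving, Active Record issues roughly:
      #   SELECT 1 AS one FROM users WHERE name = ? LIMIT 1
      # and proceeds with the INSERT only if no matching row is returned.
      validates_uniqueness_of :name
    end

    # Worker A                              | Worker B
    # BEGIN;                                | BEGIN;
    # SELECT ... WHERE name = 'alice';      | SELECT ... WHERE name = 'alice';
    #   -- no row found                     |   -- no row found
    # INSERT INTO users (name)              | INSERT INTO users (name)
    #   VALUES ('alice');                   |   VALUES ('alice');
    # COMMIT;                               | COMMIT;  -- duplicate rows persist

    User.create(name: "alice")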

6Using SELECT FOR UPDATE under these weaker models would be safe, but Rails does not implement its predicate-based lookups as such (i.e., it instead opts for a simple SELECT statement). 7The first reference to the potential integrity violations resulting from this implementation in the Rails code that we are aware of dates to December 2007, in Rails v.2.0.0 [146]. In September 2008, another user added additional discussion within the code comments, noting that “this could even happen if you use transactions with the ’serializable’ isolation level” [152]. The use of “’serializable’” suggests familiarity with the common, erroneous labeling of Snapshot Isolation as “serializable” (as in Oracle 12c documentation and PostgreSQL documentation prior to the introduction of SSI in version 9.1.1 in September 2011).

As we have discussed, MySQL and PostgreSQL each support serializable isolation but default to weaker isolation. Moreover, in our investigation, we discovered a bug in PostgreSQL’s implementation of Serializable Snapshot Isolation that allowed duplicate records to be created under serializable isolation when running a set of transactions derived from the Rails primary key validator. We have confirmed this anomalous behavior with the core PostgreSQL developers8 and, as of March 2015, the behavior persists. Thus, any discussion of weak isolation levels aside, PostgreSQL’s implementation of serializability is non-serializable and is insufficient to provide correct behavior for Rails’ uniqueness validations. So-called “serializable” databases such as Oracle 12c that actually provide Snapshot Isolation will similarly fall prey to duplicate validations. The Rails documentation warns that uniqueness validations may fail and admit duplicate records [6]. Yet, despite the availability of patches that remedy this behavior by the use of an in-database constraint and/or index, Rails provides this incorrect behavior by default. (One patch was rejected; a developer reports “[t]he reasons for it not being incorporated...are lost in the mists of time but I suspect it’s to do with backwards compatibility, cross database compatibility and applications varying on how they want/need to handle these kind of errors.” [66]). In another bug report complaining of duplicates due to concurrent uniqueness validation, a commenter asserts “this is not a bug but documented and inherent behavior of validates_uniqueness_of” [197]. A Rails committer follows up, noting that “the only way to handle [uniqueness] properly is at the database layer with a unique constraint on the column,” and subsequently closes the issue. The original bug reporter protests that “the problem extends beyond unique constraints and into validations that are unique to a Rails application that can’t [sic?!] be enforced on the DB level”; the Rails committer responds that “with the possible exception of [associations,] all of the other validations are constrained by the attribute values currently in memory, so aren’t susceptible to similar flaws.” This final statement is correct for many of the built-in validations but is not correct for arbitrary user-defined validations. We discuss the user-defined validation issue further in Section 6.7.

Understanding validation behavior. Given that entirely feral mechanisms can introduce duplicates, how many duplicates can be introduced? Once a record is written, any later validations will observe it via SELECT calls. However, while a record is being validated, any number of concurrent validations can unsafely proceed.

8“BUG #11732: Non-serializable outcomes under serializable isolation” at http://www.postgresql.org/message-id/[email protected]

In practice, the number of concurrent validations is dependent on the Rails environment. In a Rails deployment permitting P concurrent validations (e.g., a single-threaded, multi-process environment with P processes), each value in the domain of the model field/database column can be inserted no more than P times. Thus, validations—at least theoretically—bound the worst-case number of duplicate records for each unique value in the database table.

Quantifying Uniqueness Anomalies

Given that feral uniqueness validations are acknowledged to be unsafe under non-serializable isolation yet are widely used, we sought to understand exactly how often uniqueness anomalies occur in an experimental deployment. In this section, we demonstrate that uniqueness validations in Rails are indeed unsafe under non-serializable isolation. While they prevent some data integrity errors, we observe—depending on the workload—many duplicate records.

Experimental setup. We developed a Rails 4.1.5 application that performed insertions into a non-indexed string column and compared the incidence of violations both with and without a uniqueness validator (see also Appendix 6.8.3).9 We deployed this application on two Amazon EC2 m2.4xlarge instances, offering 68.4 GB RAM, 8 CPU cores, and 1,680 GB of local storage, running Ubuntu 14.04 LTS. On one instance, we deployed our application, using Nginx 1.6.2 as a web frontend proxied to a set of Unicorn 4.8.3 (Ruby VM pool) workers. Nginx acts as an HTTP frontend and forwards incoming requests to a variably sized pool of Rails VMs (managed by Unicorn, in a multi-process, single-threaded server) that epoll on a shared Linux file descriptor. On the other EC2 instance, we deployed PostgreSQL 9.3.5 and configured it to run on the instance local storage. We used a third EC2 instance to direct traffic to the front-end instance and drive load. We plot the average and standard deviation of three runs per experiment.
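A minimal sketch of a Unicorn configuration along these lines follows (paths and worker count are illustrative, not our exact configuration); each worker is a separate single-threaded Rails process, so the worker count bounds the number of validations that can run concurrently.

    # config/unicorn.rb
    worker_processes 16                       # number of single-threaded Rails workers
    listen "/tmp/unicorn.sock", backlog: 64   # socket that Nginx proxies requests to
    preload_app true                          # load the Rails app once, then fork workers
    timeout 30                                # kill workers stuck longer than 30 seconds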

Stress test. We began our study by issuing a simple stress test that executed a number of concurrent insertion requests against a variable number of Unicorn workers. We repeatedly issued a set of 64 concurrent model creation (SQL insertion) requests, each with the same validated key (e.g., all with field key set to value 1) against the Rails application. Across an increasing number of Unicorn workers, we repeated this set of requests 100 times (blocking in-between rounds to ensure that each round is, in fact, a concurrent set of requests), changing the validated key each round (Appendix 6.8.4).

9In our experimental evaluation, we use the custom applications described in Section 6.8 for two reasons. First, these test cases allow us to isolate ActiveRecord behavior to the relevant set of validations as they are deployed by default, independent of any specialized controller logic. Second, this reduces the complexity of automated testing. Many of the applications in our corpus indeed use the same code paths within ActiveRecord, but evaluating these custom applications simplifies programmatic triggering of validation logic.

Figure 6.6: Uniqueness stress test integrity violations (number of duplicate records, with and without validation, versus the number of Rails processes).

Figure 6.6 shows the results. With no validation, all concurrent requests succeed, resulting in 6300 duplicate records (100 rounds of 64-1 duplicate keys). With validations enabled, the number of violations depends on the degree of concurrency allowed by Unicorn. With only one process, Unicorn performs the validations serially, creating no duplicates. However, with two processes, Unicorn processes race, resulting in 70 duplicate records spread across 70 keys. With three processes, Unicorn produces 249 duplicate records across all 100 keys. The number of duplicates increases with the number of processes, peaking at 16 workers. With additional workers, duplicate counts decrease slightly, which we attribute to thrashing between workers and within PostgreSQL (recall that each instance has only 8 cores). Nevertheless, using validations, the microbenchmark duplicate count remains below 700—nearly an order of magnitude fewer duplicates than without using validations. Therefore, even though these validations are incorrectly implemented, they still result in fewer anomalies. However, when we added an in-database unique index on the key column10 and repeated the experiment, we observed no duplicates, as expected.
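For reference, the in-database unique index described in footnote 10 can be declared with an Active Record migration along the following lines (table and column names are illustrative):

    class AddUniqueIndexToKeys < ActiveRecord::Migration
      def change
        add_index :records, :key, unique: true # database-enforced uniqueness constraint
      end
    end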

Actual workloads. The preceding experiment stressed a particularly high-contention workload—in effect, a worst-case workload for uniqueness validations. In practice, such a workload is likely rare.11 Accordingly, we set up another workload meant to capture a less pathological access pattern.

10In this case, we added a unique index to the model using Active Record’s database migration, or manual schema change, functionality. Migrations are written separately from the Active Record model declarations. Adding the index was not difficult, but, nevertheless, the index addition logic is separate from the domain model. Without using third-party modules, we are unaware of a way to enforce uniqueness within Rails without first declaring an index that is also annotated with a special unique: true attribute. 11In fact, it was in the above workload that we encountered the non-serializable PostgreSQL behavior under serializable isolation. Under serializable isolation, the number of anomalies is reduced compared to the number under Read Committed isolation (as we report here), but we still detected duplicate records.

We ran another insert-only workload, with key choice distributed among a fixed set of keys. By varying the distribution and number of keys, we were able to both capture more realistic workloads and also control the amount of contention in the workload. As a basis for comparison, we ran four different distributions. First, we considered uniform key access. Second, we used YCSB’s Zipfian-distributed accesses from workloada [84]. Third and fourth, we used the item distribution access from Facebook’s LinkBench workload, which captures MySQL record access when serving Facebook’s social graph [25]. Specifically, we used—separately—the insert and update traffic from this benchmark. For each trial in this workload, we used 64 concurrent clients independently issuing a set of 100 requests each, with a fixed number of 64 Unicorn workers per process (Appendix 6.8.5). Figure 6.7 illustrates the number of duplicate records observed under each of these workloads. As we increase the number of possible keys, there are two opposing effects. With more keys, the probability of any two operations colliding decreases. However, recall that, once a key is written, all subsequent validators can read it. While the uniform workload observes an average of 2.33 duplicate records with only one possible key, it observes an average of 26 duplicate keys with 1000 possible keys. Nevertheless, with 1 million possible keys, we do not observe any duplicate records. The actual “production” workloads exhibit different trends. In general, YCSB is an extremely high contention workload, with a Zipfian constant of 0.99, resulting in one very hot key. This decreases the beneficial effect of increasing the number of keys in the database. However, LinkBench has less contention, and anomalies decrease more rapidly with increased numbers of keys.
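As a rough, hypothetical sketch of how such skewed key choice can be generated (this is not the authors’ load generator), the following draws keys from a Zipfian distribution with skew parameter theta = 0.99, as in YCSB’s workloada; low-numbered keys receive most of the traffic, concentrating contention.

    class ZipfianKeys
      def initialize(num_keys, theta = 0.99)
        weights = (1..num_keys).map { |rank| 1.0 / rank**theta }
        total = weights.sum
        acc = 0.0
        @cdf = weights.map { |w| acc += w / total }
        @cdf[-1] = 1.0 # guard against floating-point round-off
      end

      # Returns a key index in 0...num_keys; index 0 is the hottest key.
      def sample
        r = rand
        @cdf.bsearch_index { |c| c >= r }
      end
    end

    keys = ZipfianKeys.new(1000)
    draws = Array.new(10_000) { keys.sample }
    puts "fraction of draws on the hottest key: #{draws.count(0) / 10_000.0}"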

Association Validations and Isolation

Having investigated uniqueness constraints, we turn our attention to association validations. We first, again, discuss how Rails enforces these validations and describe how—at least theoretically—validations might result in integrity errors.
When a model field is declared with an association (e.g., it :belongs_to another model) and a :validates_presence validation, Rails will attempt to ensure that the declared validation holds before saving the model. Rails accomplishes this by issuing a "SELECT WHERE" query in SQL to find an associated record (e.g., to ensure the "one" end of a one-to-many relationship exists) and, if a matching association is found, Rails updates the instance state

under serializable isolation. Under serializable isolation, the number of anomalies is reduced compared to the number under Read Committed isolation (as we report here), but we still detected duplicate records.

Figure 6.7: Uniqueness workload integrity violations. (Panels: Uniform, YCSB, LinkBench-Insert, LinkBench-Update; y-axis: Number of Duplicate Records; x-axis: Number of Possible Keys; series: Without validation, With validation.)

in the database (Appendix 6.8.2). On deletion, any models with associations marked with :dependent => :destroy (or :dependent => :delete) will have any associated models destroyed (i.e., removed by instantiating in Rails and calling destroy on the model) or deleted (i.e., removed by simply calling the database's DELETE method).
This feral association validation runs within a transaction, but, again, the exact isolation level of the transaction affects its correctness. For correct execution, the SELECT query must also attain a predicate lock on the specific value of the validated column for the duration of the transaction. Similar to the uniqueness validator, concurrent deletions and insertions are unsafe under Read Committed, Repeatable Read, and Snapshot Isolation. Thus, unless the database is configured for serializable isolation, inconsistency may result and the feral validation will fail to prevent data corruption.
Unlike uniqueness validations, there is no discussion of associations and concurrency anomalies in the Rails documentation. Moreover, in Rails 4.1, there is no way to natively declare a foreign key constraint;12 it must be done via a third-party library such as foreigner [136] or schema_plus [7]. Only two applications (canvaslms and diaspora) used foreigner, and only one application (juvia) used schema_plus. One application (jobsworth) used a custom schema annotation and constraint generator.
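To make the preceding description concrete, the following is a minimal sketch (class names ours, not drawn from the surveyed applications) of the feral association declarations described above, alongside a foreigner-style migration of the kind used by the handful of applications that declared in-database foreign keys; only the migration yields a constraint that the database itself enforces.

  class User < ActiveRecord::Base
    # Feral check: before save, Active Record issues a SELECT to verify that the
    # referenced department exists; no predicate lock is taken on that row.
    belongs_to :department
    validates :department, :presence => true
  end

  class Department < ActiveRecord::Base
    # Feral cascading delete: destroying a department destroys its users in the
    # application, racing with any concurrent user insertions.
    has_many :users, :dependent => :destroy
  end

  # Database-backed alternative (via the foreigner gem on Rails 4.1, or natively
  # in Rails 4.2+): a migration that installs a real FOREIGN KEY constraint.
  class AddForeignKeyToUsers < ActiveRecord::Migration
    def change
      add_foreign_key :users, :departments
    end
  end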

Understanding association behavior. Given that entirely feral mechanisms can introduce broken associations, how many dangling records can be introduced? Once a record is deleted, any later validations will observe its absence via their SELECT calls. However, in the worst case, the feral cascading deletion on the one side of a one-to-many relation can stall indefinitely, allowing an unlimited number of concurrent insertions to the many side of the relation. Thus, validations—at least theoretically—only reduce the worst-case number of dangling records that were inserted prior to deletion; any number of concurrent insertions may occur during validation, leading to unbounded numbers of dangling records.

Quantifying Association Anomalies

Given this potential for errors, we again set out to quantify them. We demonstrate that weak isolation can indeed lead to data integrity errors in Rails' implementation of associations.
We performed another set of experiments to test association validation behavior under concurrent insertions and deletions. Using the same Unicorn and PostgreSQL deployment as above, we configured another application to test whether or not Rails validations would correctly enforce association-based integrity constraints. We consider an application with

12Rails 4.2 added support for foreign keys via migration annotation (separate from models, similar to adding a unique index) in December 2014.

two models: Users and Departments. We configure a one-to-many relationship: each user belongs_to a department, and each department has_many users (Appendix 6.8.6).
As a basic stress test, we initialize the database by creating 100 departments with no users. Subsequently, for each department in the database, we issue a single request to delete the department along with 64 concurrent requests to insert users in that department. To correctly preserve the one-to-many relationship, the database should either reject the deletion operation or perform a cascading deletion of the department and any users (while rejecting any future user creation requests for that department). We can quantify the degree of inconsistency by counting the number of users left in the database who have no corresponding department (Appendix 6.8.7).
With associations declared in Rails, the Rails process performing the deletion will attempt a cascading delete of users upon department deletion. However, this cascade is performed, again, ferally—at the application level. Thus, under non-serializable isolation, any user creation events that are processed while the search for Users to delete is underway will result in Users without departments.
Figure 6.8 shows the number of "orphaned" Users (i.e., Users without a matching Department) as a function of Rails worker processes. With no constraints declared to Rails or to the database, all User creations succeed, resulting in 6400 dangling Users. With constraints declared in Rails (via a mix of validation and association), the degree of inconsistency depends on the degree of parallelism. Under the worst case, with 64 concurrent processes, the validations are almost worthless in preventing integrity errors. In contrast, when we declare a foreign key constraint within the database13 and run the workload again, we observe no inconsistency.
The above stress test shows that inconsistency due to feral concurrency control occurs only during times of contention—parallel deletions and insertions. We subsequently varied the degree of contention within the workload. We configured the same application and performed a set of insertions and deletions, but spread across a greater number of keys and at random. A set of 64 processes concurrently each issued 100 User creation and Department deletion requests (at a ratio of 10 to 1) to a set of randomly-selected keys (again at a ratio of 10 Users to each Department). By varying the number of Users and Departments, we were able to control the amount of contention within the workload. Under this workload, inconsistency resulted only when a Department deletion proceeded concurrently with a User creation event and the feral cascading deletion "missed" the User creation (Appendix 6.8.8).
Figure 6.9 shows the results. As the number of Departments increases, we observe two trends. First, with only one Department, there is again less chance of inconsistency: all operations contend on the same data item, so the total number of inconsistent, orphaned users is limited by the number of potentially racing operations.

13In this case, we introduced the constraint via SQL using a direct connection to the database. This change was straightforward but—like the unique index addition—was not reflected in the base Active Record models.
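A minimal sketch of the kind of direct SQL constraint addition described in Footnote 13; the table and column names are our guesses based on the generated schema in Appendix 6.8.6, not the exact statement used.

  # Adding the foreign key outside the Active Record models, via raw SQL.
  # Table and column names are assumed from the schema in Appendix 6.8.6.
  ActiveRecord::Base.connection.execute(<<-SQL)
    ALTER TABLE validated_users
      ADD CONSTRAINT fk_validated_users_department
      FOREIGN KEY (validated_department_id)
      REFERENCES validated_departments (id);
  SQL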

Figure 6.8: Foreign key stress association anomalies. (y-axis: Number of Orphaned Users; x-axis: Number of Rails Workers; series: Without validation, With validation.)

Figure 6.9: Foreign key workload association anomalies. (y-axis: Number of Orphaned Users; x-axis: Number of Departments; series: Without validation, With validation.)

However, as the number of Departments increases, the chance of concurrent deletions and insertions drops.

Takeaways and Discussion

The preceding experiments demonstrate that, indeed, Active Record is unsafe as deployed by default. Validations are susceptible to data corruption due to sensitivity to weak isolation anomalies.
This raises the question: why declare validations at all? As we observe, validations protect against some data corruption. First, they correctly guard against non-concurrency-related anomalies such as data entry or input errors. For example, if a user attempts to reserve a username that was previously chosen, a validation would correctly reject the request. This basic functionality is reasonably left to the ORM engine for implementation. The failures we observe here are solely due to concurrent execution. Without concurrent execution, validations are correct. Second, validations do reduce the incidence of inconsistency. Empirically, even under worst-case workloads, these validations result in order-of-magnitude reductions in inconsistency. Under less pathological workloads, they may eliminate it with high probability. It is possible that, in fact, the degree of concurrency and data contention within Rails-backed applications simply does not lead to these concurrency races—that, in some sense, validations are "good enough" for many applications.
Nevertheless, in both cases, Rails's feral mechanisms are a poor substitute for their respective database counterparts—at least in terms of integrity. We re-examine the Rails community's reluctance to embrace these mechanisms in Section 6.7.

6.6 Other Frameworks

While our primary focus in this chapter is Rails, we investigated support for uniqueness, foreign key, and custom validations in several other ORM frameworks. We find widespread support for validations and varying susceptibility to integrity errors.

Java Persistence API (JPA; version EE 7) [2] is a standard Java object persistence interface and supports both uniqueness and primary key constraints in the database via specialized object annotations. Thus, when JPA is used to create a table, it will use the database to enforce these constraints. In 2009, JPA introduced support for UDF validations via a JavaBean interface [51]. Interestingly, both the original and current Bean Validation specifications specifically address the use of uniqueness validations in their notes:

“Question: should we add @Unique that would map to @Column(unique=true)? @Unique cannot be tested at the Java level reliably but could generate a database unique constraint generation. @Unique is not part of the [Bean Validation] spec today.” [48]

An author of a portion of the code specification notes separately:

“The reason @Unique is not part of the built-in constraints is the fact that accessing the [database] during a valiation [sic] is opening yourself up for potenital [sic] phantom reads. Think twice before you go for [an application-level] approach.” [111]

By default, JPA Validations are run upon model save and run in a transaction at the default isolation level, and therefore, as the developers above hint, are susceptible to the same kinds of integrity violations we study here.

Hibernate (version 4.3.7) [135], a Java ORM based on JPA, does not automatically enforce declared foreign key relationships: if a foreign key constraint is declared, a corresponding column is added, but that column is not backed by a database foreign key. Instead, for both uniqueness and foreign key constraints, Hibernate relies on JPA schema annotations for correctness. Therefore, without appropriate schema annotations, Hibernate's basic associations may contain dangling references. Hibernate also has an extensive user-level validation framework implementing the JPA Bean Validation specification [112] and is sensitive to weak isolation anomalies, similar to Rails validations.

CakePHP (version 2.5.5) [3], a PHP-based web framework, supports uniqueness, foreign key, and UDF validations. CakePHP does not back any of its validation checking with a database transaction and relies on the user to correctly specify any corresponding foreign keys or uniqueness constraints within the database schema. Thus, while users can declare each of these validations, there is no guarantee that they are actually enforced by the database. As a result, unless users are careful to specify constraints in both their schema and in their validations, validations may lead to integrity violations.

Laravel (version 4.2) [5], another PHP-based web framework, supports the same set of functionality as CakePHP, including application-level uniqueness, foreign key, and UDF validations. Any database-backed constraints must be specified manually in the schema. Per one set of community documentation [104], "database-level validations can efficiently handle some things (such as uniqueness of a column in heavily-used tables) that can be difficult to implement otherwise" but "testing and maintenance is more difficult...[and] your validations would be database- and schema-specific, which makes migrations or switching to another database backend more difficult in the future." In contrast, model-level validations are "the recommended way to ensure that only valid data is saved into your database. They are database agnostic, cannot be bypassed by end users, and are convenient to test and maintain."

Django (version 1.7) [4], a popular Python-based framework, backs declared uniqueness and foreign key constraints with database-level constraints. It also supports custom validations, but these validations are not wrapped in a transaction [30]. Thus, Django also appears problematic, but only for custom validations.

Waterline (version 0.10) [47], the default ORM for Sails.js (a popular MVC framework for Node.js [46]), provides support for in-DB foreign key and uniqueness constraints (when supported by the database) as well as custom validations (that are not supported via transactions; e.g., "TO-DO: This should all be wrapped in a transaction. That's coming next but for the meantime just hope we don't get in a nasty state where the operation fails!" [213]).

Summary. In all, we observe common cross-framework support for feral validations and invariants, with inconsistent use of mechanisms for enforcing them, ranging from the use of in-database constraints to transactions to no ostensible use of concurrency control in either application or database.

6.7 Implications for Databases

In light of this empirical evidence of the continued mismatch between ORM applications and databases, in this section, we reflect on the core database limitations for application writers today and suggest a set of directions for alleviating them.

6.7.1 Summary: Database Shortcomings Today

The use of feral invariants is not well-supported by today’s databases. At a high level, today’s databases effectively offer two primary options for ORM framework developers and users:

1. Use ACID transactions. Serializable transactions are sufficient to correctly enforce arbitrary application invariants, including transaction-backed feral validations. This is core to the transaction concept: isolation is a means towards preserving integrity (i.e., "I" provides "C"). Unfortunately, in practice, for application developers, transactions are problematic. Given serializability's performance and availability overheads [64], developers at scale have largely eschewed the use of serializable transactions (which are anyway not required for correct enforcement of approximately 87% of the invariants we encountered in the Rails corpus). Moreover, many databases offering "ACID" semantics do not provide serializability by default and often, even among industry-standard enterprise offerings, do not offer it as an option at all [32] (to say nothing of implementation difficulties, as in Footnote 8). Instead, developers using these systems today must manually reason about a host of highly technical, often obscure, and poorly understood weak isolation models expressed in terms of low-level read/write anomalies such as Write Skew and Lost Update [9, 20]. We have observed (e.g., Footnote 7) that ORM and expert application developers are familiar with the prevalence of weak isolation, which may also help explain the relative unpopularity of transactions within the web programming community.

2. Custom, feral enforcement. Building user-level concurrency control solutions on a per-framework or, worse, per-application basis is an expensive, error-prone, and difficult process that neglects decades of contributions from the database community. While this solution is sufficient to maintain correctness for the approximately 87% of invariants in our corpus that are invariant confluent, the remainder can—in many modern ORM implementations—lead to data corruption on behalf of applications.
However, and perhaps most importantly, this feral approach preserves a key tenet of the Rails philosophy: a recurring insistence on expressing domain logic in the application. This also enables the declaration of invariants that are not among the few natively supported by databases today (e.g., uniqueness constraints).

In summary, application writers today lack a solution that guarantees correctness while maintaining high performance and programmability. Serializability is too expensive for some applications, is not widely supported, and is not necessary for many application invariants. Feral concurrency control is often less expensive and is trivially portable but is not sufficient for many other application invariants. In neither case does the database respect and assist with application programmer desires for a clean, idiomatic means of expressing correctness criteria in domain logic. We believe there is an opportunity and pressing need to build systems that provide all three criteria: performance, correctness, and programmability.

6.7.2 Domesticating Feral Mechanisms

Constructively, to properly provide database support and thereby “domesticate” these feral mechanisms, we believe application users and framework authors need a new database interface that will enable them to:

1. Express correctness criteria in the language of their domain model, with minimal friction, while permitting their automatic enforcement. Per Section 6.4.1, a core factor behind the success of ORMs like Rails appears to be their promulgation of an idiomatic programming style that "seems right" for web programming. Rails' disregard for advanced database functionality is evidence of a continued impedance mismatch between application domain logic and current database primitives: databases today do not understand the semantics of feral validations.
We believe any solution to domestication must respect ORM application patterns and programming style, including the ability to specify invariants in each framework's native language. Ideally, database systems could enforce applications' existing feral invariants without modification. This is already feasible for a subset of invariants—like

uniqueness and foreign key constraints—but not all. An ideal solution to domestication would provide universal support with no additional overhead for application writers. ORM authors may be able to meet the database halfway by pushing constraints into the database when possible.

2. Only pay the price of coordination when necessary. Per Section 6.4.3, many invariants can be safely executed without coordination, while others cannot. The many that do not need coordination should not be unnecessarily penalized. An ideal solution to domestication would enable applications to avoid coordination whenever possible, thus maximizing both performance and operation availability. The database should facilitate this avoidance, thus evading common complaints (especially within the Internet community) about serializable transactions.

3. Easily deploy to multiple database backends. ORM frameworks today are deployed across a range of database implementations, and, when deciding which database features to exercise, framework authors often choose the least common denominator for compatibility purposes.
An ideal solution to domestication would preserve this compatibility, possibly by providing a "bolt on" compatibility layer between ORM systems and databases lacking advanced functionality (effectively, a "blessed" set of mechanisms beneath the application/ORM that correctly enforce feral mechanisms). This "bolt on" layer would act as a proxy for the ORM, implementing concurrency control on the ORM's behalf. We previously implemented such a compatibility layer for providing causal consistency atop eventually consistent data stores [39] and believe similar techniques are promising here.

Fulfilling these design requirements would enable high performance, correct execution, and programmability. However, doing so represents a considerable challenge.

Promise in the literature. The actual vehicle for implementing this interface is an open question, but the literature lends several clues. On the one hand, we do not believe the answer lies in exposing additional read/write isolation or consistency guarantees like Read Committed; these fail our requirement for an abstraction operating at the level of domain logic and, as we have noted, are challenging for developers (and researchers) to reason about. On the other hand, more recent proposals for invariant-based concurrency control [34, 160] and a litany of work from prior decades on rule-based [228] and, broadly, semantics-based concurrency control [215] appear immediately applicable and worth (re-)considering. Recent advances in program analysis for extracting invariants [198] and subroutines from imperative code [77] may allow us to programmatically suggest new invariants, perform

correspondence checking for existing applications, and apply a range of automated optimizations to legacy code [78, 209]. Finally, clean-slate language design and program analysis obviate the need for explicit invariant declaration (thus alleviating concerns of specification completeness) [18, 19, 231]; while adoption within the ORM community is a challenge, we view this exploration as worthwhile.

Summary. In all, the wide gap between research and current practice is both a pressing concern and an exciting opportunity to revisit many decades of research on alternatives to serializability with an eye towards current operating conditions, application demands, and programmer practices. Our proposal here is demanding, but so are the framework and application writers our databases serve. Given the correct primitives, database systems may yet have a role to play in ensuring application integrity.

6.8 Detailed Validation Behavior, Experimental Workload

In this section, we provide more detail regarding how validations are executed in Rails as well as our workloads from Section 6.5.

6.8.1 Uniqueness Validation Behavior

When a controller attempts to save an ActiveRecord model instance i of type M, if M has a declared :validates_uniqueness annotation on attribute a, the following actions will be performed:

1. Assuming that instances of M are stored in database table TM (with attribute a stored in column Ca), Active Record will perform the equivalent of

SELECT 1 FROM TM where Ca = i.a LIMIT 1;

(SELECT COUNT(*) would be sufficient here as well, but this is not how the query is actually implemented).

2. If this result set is empty, the validation succeeds.

3. If this result set is not empty, the validation fails. If the validation was called during save, it returns false. If the validation was called during save!, it raises an ActiveRecord::RecordInvalid exception.

This is a classic example of the phantom problem. Changing this SELECT call to SELECT FOR UPDATE would be sufficient. However, Rails is not implemented this way.
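The check-then-insert pattern that these steps amount to can be sketched as follows (this is not Rails' internal code, only the shape of the race):

  # Approximate shape of the feral uniqueness check (not Rails' actual source).
  # Two concurrent requests can both observe an empty result set and then both
  # insert the same key, because the SELECT takes no predicate lock.
  def save_with_feral_uniqueness(record)
    duplicate_exists = record.class.where(key: record.key).exists?
    return false if duplicate_exists   # validation failure
    record.save(validate: false)       # the INSERT proceeds; the check is stale
  end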

6.8.2 Association Validation Behavior

When a controller attempts to save an ActiveRecord model instance i of type M, if M has a declared :belongs_to annotation on attribute a pointing to attribute b of model N and M has a declared :validates_presence annotation on attribute a, the following actions will be performed:

1. Assuming that instances of N are stored in database table TN (with attribute b stored in column Cb), Active Record will perform the equivalent of

SELECT 1 FROM TN where Cb = i.a LIMIT 1;

2. If this result set is not empty, the validation succeeds.

3. If this result set is empty, the validation fails. If the validation was called during save, it returns false. If the validation was called during save!, it raises an ActiveRecord::RecordInvalid exception.

6.8.3 Uniqueness Validation Schema

In our uniqueness stress test, we declare two models, each containing two attributes: key, a string, and value, also a string. The generated schema for each of the models, which we call SimpleKeyValue and ValidatedKeyValue, is the same apart from the table name. The schema for ValidatedKeyValue is as follows:

  create_table "validated_key_values", force: true do |t|
    t.string   "key"
    t.string   "value"
    t.datetime "created_at"
    t.datetime "updated_at"
  end

For the non-uniqueness-validated model, we simply require that the key and value fields are not null via a presence: true annotation. For the ferally validated model, we add an additional uniqueness: true validation to the key field in the Active Record model. The remainder of the application consists of simple View and Controller logic to allow us to POST, GET, and DELETE each kind of model instance programmatically via HTTP.

6.8.4 Uniqueness Stress Test

For the uniqueness stress test (Figure 6.6), we repeatedly attempt to create duplicate records. We issue a set of 64 concurrent requests to create instances with the key field set to an increasing sequence number (k, below) and repeat 100 times. At the end of the run, we count the number of duplicate records in the table:

for model m ∈ {SimpleKeyValue, ValidatedKeyValue} do
  for k ← 1 to 100 do
    parfor 1 to 64 do
      via HTTP: create new m with key=k

dups ← execute(SELECT key, COUNT(key)-1 FROM TM GROUP BY key HAVING COUNT(key) > 1;)

Under correct validation, for each choice of k (i.e., for each key k), all but one of the model creation requests should fail.

6.8.5 Uniqueness Workload Test

For the uniqueness workload test (Figure 6.7), a set of 64 workers sequentially issues a set of 100 operations each. Each operation attempts to create a new model instance with the key field set to a random item generated according to the distributions described in Section 6.5:

for model m ∈ {SimpleKeyValue, ValidatedKeyValue} do
  parfor 1 to 64 do
    for 1 to 100 do
      k ← pick new key according to distribution
      via HTTP: create new m with key=k

dups ← execute(SELECT key, COUNT(key)-1 FROM TM GROUP BY key HAVING COUNT(key) > 1;)

6.8.6 Association Validation Schema

In the association stress test, we declare two sets of models, each containing two models: a User model and a Department model. Each User has a(n implicit) id (as generated by Rails ActiveRecord) and a corresponding integer department id. Each Department has an id. Both models have timestamps recording their creation and last update times, as auto-generated by Rails. Aside from the table names, both schemas are equivalent. Below is the schema for the non-validated users and departments:

  create_table "simple_users", force: true do |t|
    t.integer  "simple_department_id"
    t.datetime "created_at"
    t.datetime "updated_at"
  end

  create_table "simple_departments", force: true do |t|
    t.datetime "created_at"
    t.datetime "updated_at"
  end

The two pairs of models vary in their validations. One pair of models has no validations or associations. The other pair of models contains validations, including rules for cascading deletions. Specifically, we place an association has_many :users, :dependent => :destroy on the department, and, on the user, an association belongs_to :department and a validation validates :department, :presence => true (note that we only delete from Departments in our workload, below). Thus, on deletion of a model of type ValidatedDepartment, ActiveRecord will attempt to call destroy on each matching ValidatedUser.
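Expressed as Active Record classes, the validated pair described above looks roughly as follows; the :class_name and :foreign_key options (and the validated_department_id column name) are our reconstruction, matching the naming convention of the simple pair in the schema above.

  class ValidatedDepartment < ActiveRecord::Base
    # Feral cascading delete: destroying a department destroys its users.
    has_many :users, :class_name => "ValidatedUser",
             :foreign_key => "validated_department_id",
             :dependent => :destroy
  end

  class ValidatedUser < ActiveRecord::Base
    belongs_to :department, :class_name => "ValidatedDepartment",
               :foreign_key => "validated_department_id"
    validates :department, :presence => true
  end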

6.8.7 Association Stress Test

For the association stress test (Figure 6.8), we repeatedly attempt to create orphan users. We issue a set of 64 concurrent requests to create Users belonging to a particular department, while simultaneously deleting that department, and repeat 100 times. At the end of the run, we count the number of users with a department that does not exist:

for model m ∈ {Simple, Validated} do
  for i ← 1 to 100 do
    via HTTP: create mDepartment with id=i
  for i ← 1 to 100 do
    parfor w ∈ 1 to 65 do
      if w = 1 then
        via HTTP: delete mDepartment with id=i
      else
        via HTTP: create new mUser with department_id=i

orphaned ← execute(“SELECT m_department_id, COUNT(*) FROM m_users AS U
  LEFT OUTER JOIN m_departments AS D ON U.m_department_id = D.id
  WHERE D.id IS NULL GROUP BY m_department_id HAVING COUNT(*) > 0;”)

6.8.8 Association Workload Test

For the association workload test (Figure 6.9), we begin by creating a variable number of departments (Figure 6.9 x-axis; D). We next have 64 concurrent clients simultaneously attempt to create users belonging to a random department and delete random departments (in a 10:1 ratio of creations to deletions, for 100 operations each). We end by counting the number of orphaned users, as above.

for model m ∈ {Simple, Validated} do
  for d ← 1 to D do
    via HTTP: create mDepartment with id=d
  parfor w ∈ 1 to 64 do
    for 1 to 100 do
      d ← uniformRandomInt([1, D])
      if uniformRandomDouble([0, 1]) < 1/11 then
        via HTTP: delete mDepartment with id=d
      else
        via HTTP: create new mUser with department_id=d

orphaned ← as above, in stress test

6.9 Summary

In the first part of this chapter, we demonstrated that, in fact, many—but not all—common database invariants and integrity constraints are actually achievable without coordination. By applying these results to a range of actual transactional workloads, we demonstrated an opportunity to avoid coordination in many cases where traditional serializable mechanisms would otherwise require coordination. The order-of-magnitude performance improvements we demonstrated via coordination-avoiding concurrency control strategies provide compelling evidence that constraint-based coordination avoidance is a promising approach to meaningfully scaling future data management systems.
In the second part of this chapter, we examined the use of concurrency control mechanisms in a set of 67 open source Ruby on Rails applications and, to a less thorough extent, concurrency control support in a range of other web-oriented ORM frameworks. We found that, in contrast with traditional transaction processing, these applications overwhelmingly prefer to leverage application-level feral support for data integrity, typically in the form of declarative (sometimes user-defined) validation and association logic. Despite the popularity of these invariants, we found limited use of in-database support to correctly implement them, leading to a range of quantifiable inconsistencies for Rails' built-in uniqueness and association validations. While many validations are invariant confluent and therefore correct under concurrent execution given standard RDBMS weak isolation and concurrent update semantics, we see considerable opportunity to better support these users and their feral invariants in the future.

Chapter 7

Related Work

In this chapter, we discuss related work. We begin with work related to the general themes of this thesis, then examine specific areas in depth.
Database system designers have long sought to manage the trade-off between consistency and coordination. As we have discussed, serializability and its many implementations (including lock-based, optimistic, and pre-scheduling mechanisms) [53, 55, 105, 123, 142, 215, 217, 221] are sufficient for maintaining application correctness. However, serializability is not always necessary: serializable databases do not allow certain executions that are correct according to application semantics. This has led to a large class of application-level—or semantic—concurrency control models and mechanisms that admit greater concurrency. There are several surveys on this topic, such as [120, 215], and, in our solutions, we integrate many concepts from this literature.

Commutativity. One of the most popular alternatives to serializability is to exploit commutativity: if transaction return values (e.g., of reads) and/or final database states are equivalent despite reordering, the transactions can be executed simultaneously [80, 159, 226]. Commutativity is often sufficient for correctness but is not necessary. For example, if an analyst at a wholesaler creates a report on daily cash flows, any concurrent sale transactions will not commute with the report (the results will change depending on whether the sale completes before or after the analyst runs her queries). However, the report creation is invariant confluent with respect to, say, the invariant that every sale in the report references a customer from the customers table. [80, 155] provide additional examples of safe non-commutativity.

The CALM Theorem, Monotonicity, and Confluence. Hellerstein's CALM Theorem [21] shows that program outcomes are confluent, or deterministic, under coordination-free execution if and only if the program logic is monotone. CALM is a declarative result: it captures the class of computations that can be implemented deterministically without coordination. CALM can also be used as a program analysis technique: if a particular program implementation uses only monotonic operations (where "program" could include a service and its client code), then that program will be deterministic when executed without coordination; otherwise, coordination should be injected to "protect" non-monotonic operations to ensure determinism. CALM program analysis is natural to apply in logic languages like Bloom [19] where monotonicity can be assessed from syntax. It can also be applied to dataflow systems like Storm [193] with the help of program annotations [18].
CALM's notion of confluence differs from invariant confluence in several ways. First, CALM assesses the confluence, or determinism, of program logic; invariant confluence assesses whether a set of safety properties holds during and following the execution of a set of transactions over replicated or multi-versioned data given a particular merge function. Invariant confluence admits non-deterministic outcomes as long as the outcomes satisfy the provided invariants. Second, CALM does not consider transactions, while invariant confluence analyzes transactions that individually ensure invariant-preserving updates. Third, invariant confluence considers replicated or multi-versioned state (via the use of the replica abstraction). As discussed in Section 2.3, invariant confluence does not distinguish between partially-replicated and fully-replicated systems; under invariant confluence, a transaction is presented with an entire logical snapshot (replica) of the database upon which it can operate. A partially replicated implementation of a set of invariant confluent operations may need to communicate with partitions responsible for items that were not explicitly mentioned in the transaction operations but that are related to invariants over data modified by the transaction. Again, and by Theorem 1, for invariant confluent semantics, this checking can be performed in parallel by concurrently committing transactions over their respective logical replicas. However, in contrast, CALM analysis is agnostic to replication, versioning, and partitioning, which, if desired, are implemented as part of program logic to be analyzed.
CALM and invariant confluence use different mathematical foundations. CALM is based on monotonicity analysis from logic programs. Invariant confluence generalizes classic partitioning arguments from distributed systems to the domain of user-supplied invariants, transactions, and merge functions. For associative, commutative, and idempotent merge functions, an invariant confluent execution effectively defines a join semi-lattice: invariants begin true in D0 and remain true as the execution progresses. Monotone programs also compute over a join semi-lattice of relations and union. However, the analyses and proof techniques of the two concepts are quite different.
Further understanding the relationship between invariant confluence and CALM is an interesting area for exploration.
For example, it is natural to ask if there is an extension of CALM analysis that can, like invariant confluence, incorporate invariants over possibly non-deterministic outputs. A possible direction here is to view invariants as boolean-valued formulas whose results "start true" and monotonically remain true. In this direction, an invariant is a morphism mapping from potentially monotone relational inputs to a monotone boolean output lattice [81]. Additionally, as CALM is non-transactional and our formulation of invariant confluence is inherently transactional, it is interesting to consider what "transactional CALM" would mean. In our formulation of invariant confluence, transactions that violate invariants when committing to local replica state are aborted; it is unclear how to model abort logic in CALM analysis.

Convergent Data Types. On a related subject, Commutative Replicated Data Type (CRDT) objects [205] similarly ensure convergent outcomes that reflect all updates made to each object. This convergence is a useful liveness property [202] (e.g., a converged CRDT OR-Set reflects all concurrent additions and removals), but it neither prevents users from observing inconsistent data [160] nor provides safety (e.g., the CRDT OR-Set does not—by itself—enforce invariants, such as ensuring that no employee belongs to two departments); CRDTs are therefore not sufficient to guarantee correctness for all applications. Here, we use CRDTs to implement many of our merge functions, and we add safety to the intermediate states and final outcomes. Thus, each replica state is, in effect, a CRDT, and our goal is to determine which operations need coordination to ensure that the desired safety properties are upheld.

Use of Invariants. A large number of database designs—including, in restricted forms, many commercial databases today—use various forms of application-supplied invariants, constraints, or other semantic descriptions of valid database states as a specification for application correctness (e.g., [10, 52, 92, 107, 115, 120, 126, 127, 145, 159–161, 194, 198]). We draw inspiration from this prior work, and in particular we adopt its use of invariants. However, we are not aware of related work that discusses when coordination is strictly required to enforce a given set of invariants. That is, our formulation of coordination-free execution of transactions on separate replicas, which is key to capturing scalability, low latency, and availability properties, is not found in this related work; we, in effect, operate at the junction between this prior work on semantics-based concurrency control from databases and classic analyses from distributed computing [118].
To illustrate why replication is so important to our model, consider the work on relative serializability [10]. In this work, the authors generalize prior efforts, including [107, 115, 145, 170], and re-define conflicting actions within otherwise conflict serializable transaction execution in order to allow greater concurrency. That is, instead of defining conflict as "any two operations on the same item from different transactions, at least one of which is a write" as in conflict serializability, relative serializability allows users to define an abstract atomicity relation to determine conflicts—for example, two increment operations need not necessarily conflict, even if they both update the same counter. Thus, the goal of this work

is to preserve equivalence to a serial schedule, defined according to the abstract atomicity relation, and there is still a total order on operations. As a result, in relative serializability and related models [194], the "union" (or combination) of two databases is undefined if two

items have different versions (e.g., {a1} ∪ {a2}), because such databases would correspond to two separate total orders. In contrast, in our invariant confluence analysis, we explicitly consider a partial order on operations, with divergent states reconciled with a merge operator; instead of reasoning about conflicts, we allow arbitrary divergent states that i) are guaranteed to satisfy a user-specified invariant over the data and ii) are reconciled using a user-specified merge function.
Because data is replicated in our model, it is natural to reason about a "merge" function. Insofar as servers must explicitly integrate updates from others in order to guarantee convergence (in contrast with conventional shared-memory systems, where hardware automatically chooses an ordering and conflict resolution policies for updates), merge allows users to specify their own conflict resolution. As we have discussed, the merge operator is itself drawn from the literature on optimistic replication [200] and is relatively popular today in stores including Dynamo [95] and its descendants as well as systems like Git. Thus, while the goals of work on semantics-based concurrency control (including relative serializability) are similar to ours (especially in terms of increasing concurrency), our use of merge leads to a substantially different system and execution model. In effect, we can think of invariant confluence as relative serializability with a special, system-induced compensating action ("merge") to deal with divergent paths in the relative serialization graph (RSG), if the graph were extended to account for replication.
Thus, our invariant confluence analysis here is inspired by prior work on semantics-based concurrency control and adapts the practice of using application (and database) criteria as the basis of concurrency control to the replicated (and non-serializable, multi-versioned) setting. Moreover, compared to this prior work, our practical focus here is oriented towards invariants found in SQL and in modern applications.
In contrast with many of the conditions above (esp. commutativity and monotonicity), we explicitly require more information from the application in the form of invariants (Kung and Papadimitriou [149] suggest this information is required for general-purpose, non-serializable yet safe execution). In this work, we provide a necessary and sufficient condition for safe, coordination-free execution over replicated and multi-version data. When invariants are unavailable, many of these more conservative approaches may still be applicable. Our use of analysis-as-design-tool is inspired by this literature—in particular, [80].

Coordination costs. In this work, we determine when transactions can run entirely concurrently and without coordination. In contrast, a large number of alternative models (e.g., [14, 27, 115, 127, 145, 161, 169]) assume serializable or linearizable (and therefore

coordinated) updates to shared state. These assumptions are standard (but not universal [68]) in the concurrent programming literature [27, 202]. (Additionally, unlike much of this literature, we only consider a single set of invariants per database rather than per-operation invariants.) For example, transaction chopping [206] and later application-aware extensions [12, 52] decompose transactions into a set of smaller transactions, providing increased concurrency, but in turn require that individual transactions execute in a serializable (or strict serializable) manner. This reliance on coordinated updates is at odds with our goal of coordination-free execution. However, these alternative techniques are useful in reducing the duration and distribution of coordination once it is established that coordination is required.

Term rewriting. In term rewriting systems, invariant confluence guarantees that arbitrary rule application will not violate a given invariant [100], generalizing Church-Rosser confluence [144]. We adapt this concept and effectively treat transactions as rewrite rules, database states as constraint states, and the database merge operator as a special join operator (in the term-rewriting sense) defined for all states. Rewriting system concepts—including confluence [14]—have previously been integrated into active database systems [228] (e.g., in triggers and rule processing), but we are not familiar with a concept analogous to invariant confluence in the existing database literature.

Coordination-free algorithms and semantics. Our work is influenced by the distributed systems literature, where coordination-free execution across replicas of a given data item has been captured as "availability" [38, 118]. A large class of systems provides availability via "optimistic replication" (i.e., perform operations locally, then replicate) [200]. We—like others [68]—adopt the use of the merge operator to reconcile divergent database states [189] from this literature. Both traditional database systems [9] and more recent proposals [159, 160] allow the simultaneous use of "weak" and "strong" isolation; we seek to understand when strong mechanisms are needed rather than to provide an optimal implementation of either. Unlike "tentative update" models [116], we do not require programmers to specify compensatory actions (beyond merge, which we expect to typically be generic and/or system-supplied) and do not reverse transaction commit decisions. Compensatory actions could be captured under invariant confluence as a specialized merge procedure.
The CAP Theorem [8, 118] recently popularized the tension between strong semantics and coordination and pertains to a specific model (linearizability). In a recent retrospective, Brewer discusses the role of CAP in reasoning about and "repairing" more general invariants, such as those we study here [64]. The relationship between serializability and coordination requirements has also been well documented in the database literature [92]. Our research here addresses when particular database-backed applications require coordination, providing a new property, invariant confluence, for doing so.

Summary. The invariant confluence property is a necessary and sufficient condition for safe, coordination-free execution. Sufficient conditions such as commutativity and monotonicity are useful in reducing coordination overheads but are not always necessary. Here, we explore the fundamental limits of coordination-free execution. To do so, we explicitly consider a model without synchronous communication. This is key to scalability: if, by default, operations must contact a centralized validation service, perform atomic updates to shared state, or otherwise communicate, then scalability will be compromised. Finally, we only consider a single set of invariants for the entire application, reducing programmer overhead without affecting our invariant confluence results.

Isolation and RAMP Transactions

Replicated databases offer a broad spectrum of isolation guarantees at varying costs to performance and availability [53]:

Serializability. At the strong end of the isolation spectrum is serializability, which provides transactions with the equivalent of a serial execution (and therefore also provides RA). A range of techniques can enforce serializability in distributed databases [11, 53], including multi-version concurrency control (e.g., [190]), locking (e.g., [166]), and optimistic concurrency control [207]. These useful semantics come with costs in the form of decreased concurrency (e.g., contention and/or failed optimistic operations) and limited availability during partial failure [32, 92]. Many designs [90, 142] exploit cheap serializability within a single partition but face scalability challenges for distributed operations. Recent industrial efforts like F1 [207] and Spanner [85] have improved performance via aggressive hardware advances, but their reported throughput is still limited to 20 and 250 writes per item per second. Multi-partition, multi-datacenter, and, generally, distributed serializable transactions are expensive and, especially under adverse conditions, are likely to remain expensive [89, 141, 187].

Weak isolation. The remainder of the isolation spectrum is more varied. Most real-world databases offer (and often default to) non-serializable isolation models [32, 179]. These "weak isolation" levels allow greater concurrency and fewer system-induced aborts compared to serializable execution but provide weaker semantic guarantees. For example, the popular choice of Snapshot Isolation prevents Lost Update anomalies but not Write Skew anomalies [9]; by preventing Lost Update, concurrency control mechanisms providing Snapshot Isolation require coordination [32]. In recent years, many "NoSQL" designs have avoided cross-partition transactions entirely, effectively providing Read Uncommitted isolation in many industrial databases such as PNUTS [83], Dynamo [95], TAO [65], Espresso [192], Rainbird [227], and BigTable [74]. These systems avoid penalties

associated with stronger isolation but in turn sacrifice transactional guarantees (and therefore do not offer RA).

RAMP and related mechanisms. There are several algorithms that are closely related to our choice of RA and RAMP algorithm design.
COPS-GT's two-round read-only transaction protocol [167] is similar to RAMP-F reads—client read transactions identify causally inconsistent versions by timestamp and fetch them from servers. While COPS-GT provides causal consistency (requiring additional metadata), it does not support RA isolation for multi-item writes.
Eiger provides its write-only transactions [168] by electing a coordinator server for each write. As discussed in Section 5.5 (E-PCI), the number of "commit checks" performed during its read-only transactions is proportional to the number of concurrent writes. Using a coordinator violates partition independence but in turn provides causal consistency. This coordinator election is analogous to G-Store's dynamic key grouping [90] but with weaker isolation guarantees; each coordinator effectively contains a partitioned completed transaction list from [73]. Instead of relying on indirection, RAMP transaction clients autonomously assemble reads and only require constant factor (or, for RAMP-F, linear in transaction size) metadata size compared to Eiger's PL-2L (worst-case linear in database size).
We are not aware of another concurrency control mechanism for partitioned databases that ensures coordination-free execution, partition independence, and at least RA isolation.

Constraints

There is a large body of work related to our investigation of specific database and ORM constraints, which we consider in three categories: object-relational mapping systems, the quantification of isolation behavior, and empirical open source software analysis.

ORMs. Database systems and application programming frameworks have a long history [56, 71, 154]. The "impedance mismatch" between object-oriented programming and the relational model is a perennial problem in data management systems. Ruby on Rails is no exception, and the concurrency control issues we study here are endemic to this mismatch—namely, the disuse of common concurrency control mechanisms like database-backed constraints. Bridging this gap remains an active area of research [173].
The latest wave of web programming frameworks has inspired diverse research spanning databases, verification, and security. StatusQuo uses program analysis and synthesis to transform imperative ORM code into SQL, leveraging the efficiency of database-backed web applications written in the Spring framework [77]. Rails has been the subject of study

in the verification of cross-site scripting attacks [76], errors in data modeling of associations [183], and arbitrary, user-specified (non-validation) invariants [60]. Rails-style ORM validations have been used to improve systems security via client-side execution [137, 209]. Our focus here is on the concurrency control requirements and usages of applications written in Rails.

Quantifying anomalies. A range of research similarly quantifies the effect of non-serializable isolation in a variety of ways.
Perhaps closest to our examination of ORM integrity violations is a study by Fekete et al., which quantitatively analyzed data inconsistencies arising from non-serializable schedules [108]. This study used a hand-crafted benchmark for analysis but is nevertheless one of the only studies of actual application inconsistencies. Here, we focus on open source applications from the Rails community.
A larger body of work examines isolation anomalies at the read-write interface (that is, it measures deviations from properties such as serializability or linearizability but not the end effect of these deviations on actual application behavior). Wada et al. evaluated the staleness of Amazon's SimpleDB using end-user request tracing [225], while Bermbach and Tai evaluated Amazon S3 [50], each quantifying various forms of non-serializable behavior. Golab et al. provide algorithms for verifying the linearizability and sequential consistency of arbitrary data stores [121], and Zellag and Kemme provide algorithms for verifying serializability [236] and other cycle-based isolation anomalies [235]. As we have discussed, Probabilistically Bounded Staleness provides time- and version-based staleness predictions for eventually consistent data stores [42]. Our focus here is on anomalies as observed by application logic rather than read-write anomalies observed under weak isolation.

Empirical software analysis. Empirical software analysis of open source software is a topic of active interest in the software engineering research community [212]. In the parlance of that community, in this work, we perform a mixed-methods analysis, combining quantitative survey techniques with a confirmatory case study of Rails's susceptibility to validation errors [103]. In our survey, we attempt to minimize sampling bias towards validation-heavy projects by focusing our attention on popular projects, as measured by GitHub stars. Our use of quantitative data followed by supporting qualitative data from documentation and issue tracking—as well as the chronology of methodologies we employed to attain the results presented here—can be considered an instance of the sequential exploration strategy [87]. We specifically use these techniques in service of better understanding use of database concurrency control.

Chapter 8

Conclusions

In this chapter, we conclude this dissertation by reflecting on general design patterns for systems builders, limitations of our approach, and opportunities for future work in coordination avoidance. We conclude with a final discussion on the results contained herein.

8.1 Design Patterns for Coordination Avoidance

During the development of the algorithms and systems described in this dissertation, we encountered a set of recurring patterns that assisted in our design process, both in applying invariant confluence and in deriving coordination-free implementations. Here, we outline four in the hope that they act as helpful rules of thumb and guidelines for future system architects:

Separate progress from visibility. In a coordination-free system, different operations must be able to proceed independently. An inherent side-effect of this behavior is that the progress of one operation may not be visible to other concurrent operations. Thus, a coordination-free system guarantees progress of operations without guaranteeing visibility of their side effects. The side effects can eventually become visible, but coordination-free applications cannot depend on observing them. Put another way, a coordination-free system does not arbitrate whether independent operations should proceed or not—it simply arbitrates when their effects become visible.

Ensure composability of operations. The corollary to the above observation is that a coordination-free application must ensure that its operations are composable. This is the essence of the invariant confluence property, but it bears repeating: in a coordination-free execution, operations will run concurrently, so whether or not their side effects can be reconciled is key to determining whether the execution is safe. We have captured this reconciliation process via an explicit merge operator. In our analyses, merge is simple: typically just set union or a natural extension of abstract data types. However, more complicated merge operations are possible (e.g., to capture behavior such as compensating actions). The guiding question here is: if transactions produce multiple effects independently, do the effects make sense when combined?
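To make this guiding question concrete, here is a minimal sketch (in Python; the class, invariant, and values are illustrative assumptions rather than code from the systems in this dissertation) of two replicas whose only operation is element insertion, reconciled by a set-union merge:

```python
# A minimal sketch of composable operations reconciled via an explicit
# set-union merge, in the spirit of the invariant confluence analyses above.

class GrowOnlySet:
    """A replica state whose only operation is element addition."""
    def __init__(self, elements=()):
        self.elements = set(elements)

    def add(self, x):
        self.elements.add(x)          # operation: coordination-free insert

    def merge(self, other):
        """Reconcile two divergent replicas: set union is associative,
        commutative, and idempotent, so merge order does not matter."""
        return GrowOnlySet(self.elements | other.elements)

def invariant(state):
    # Example invariant: all stored IDs are positive. Union introduces no
    # new elements, so independently valid effects remain valid combined.
    return all(x > 0 for x in state.elements)

# Two replicas proceed independently...
r1, r2 = GrowOnlySet({1}), GrowOnlySet({1})
r1.add(2)
r2.add(3)
# ...and their effects "make sense when combined."
merged = r1.merge(r2)
assert invariant(r1) and invariant(r2) and invariant(merged)
print(sorted(merged.elements))        # [1, 2, 3]
```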

Control visibility via multi-versioning. In our coordination-free implementations and system designs, we have relied heavily on the use of multi-versioning. In some cases, this appears to be a necessity: to avoid revealing intermediate data while allowing updates to existing data, multiple versions are required (e.g., as in RAMP). The subtlety in each implementation is due to two related factors. First, when are new writes revealed? For example, Read Atomic isolation and causal consistency each provide a different answer that affects the algorithm design. Second, how is dependency information encoded in the database? Again, as an example, RAMP and causally consistent algorithms take different approaches. For RAMP, we have provided three options that offer a trade-off between efficiency of reads and compactness of metadata. For causal consistency, we have another set of options, from vector clocks to dependency trees (Section 5.6.3). Nevertheless, the core concepts are the same, and they are repeated in many of our algorithmic and systems contributions, from Read Committed isolation to our invariant confluent implementation of TPC-C transactions.
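As a toy illustration of these two questions (when writes are revealed, and how metadata gates visibility), the following sketch separates version installation from a pluggable visibility policy; it is an illustrative assumption, not the RAMP implementation:

```python
# Toy multi-versioned store: writers install versions without blocking
# readers, and a pluggable visibility policy decides which installed
# version a read reveals.

class MultiVersionStore:
    def __init__(self):
        self.versions = {}            # key -> list of (timestamp, value, metadata)

    def write(self, key, value, ts, metadata=None):
        # Install a new version; older versions remain readable.
        self.versions.setdefault(key, []).append((ts, value, metadata))

    def read(self, key, visible):
        """Return the newest installed version the `visible` policy allows."""
        candidates = [v for v in self.versions.get(key, []) if visible(v)]
        return max(candidates, default=None, key=lambda v: v[0])

# Example visibility policy: reveal only versions at or below a read timestamp,
# a stand-in for the "when are new writes revealed?" question above.
store = MultiVersionStore()
store.write("x", "old", ts=1)
store.write("x", "new", ts=5, metadata={"txn": "t2"})
print(store.read("x", visible=lambda v: v[0] <= 3))   # (1, 'old', None)
print(store.read("x", visible=lambda v: v[0] <= 9))   # (5, 'new', {'txn': 't2'})
```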

Limit the scope of coordination (when required). Our primary focus in this thesis is determining when coordination-free implementations of common semantics are achievable. As we have seen, sometimes coordination is required. As a simple design axiom that is intuitive but nevertheless useful, when coordination is unavoidable, it is desirable to limit its scope, both in time and space. That is, first, it is best to coordinate over as few operations as possible (e.g., in TPC-C, instead of coordinating for all operations, we only coordinate for those that require it, allowing the rest to execute without coordination). Second, coordination within a single node is much cheaper than coordinating across nodes, so minimizing the distribution of coordinating processes is also beneficial. We expand on this final theme in the remainder of this chapter.

8.2 Limitations

While we believe that the techniques in this thesis are useful, they have several limitations, which we outline in this section. We discuss avenues for addressing them in Section 8.3.

Advance knowledge. In our invariant confluence analysis, we have effectively assumed that all program text and constraints are known in advance. Of course, by restricting program operations and constraints to only those constructs that we have shown to be invariant confluent, we can construct safe “languages” from which one can dynamically specify programs. However, without further consideration of the admissible logics and complexity classes contained within the space of invariant confluent semantics, it is difficult to precisely characterize the utility of this alternative.

Complete specifications. A related concern is that invariant confluence requires a complete specification of correctness for a given task or program. If a particular invariant does not appear in a specification, invariant confluence assumes that the invariant does not matter for correctness. This leaves a considerable burden on the programmer. We have attempted to mitigate this burden by considering existing, widely-used semantics in this dissertation.

Manual process. Our application of invariant confluence analysis and development of coordination-free algorithms is primarily manual, relying on proofs, rudimentary static analysis, and a set of design principles as outlined above. We have not found this process to be exceedingly onerous in practice; nevertheless, for others, rigorously applying these ideas may require considerable familiarity with the details in this work.

Replicated and partitioned model.

Mixed execution. Our primary goal has been to determine whether coordination-free execution is possible and, when it is, how to realize coordination-free implementations of a given task. This leaves the question of how to combine coordination-free and coordinated semantics within a given application. A trivial answer is to always coordinate in the event that invariant confluence does not hold. However, in our experience, this wastes considerable opportunity for more efficient execution. Rather, fully realizing coordination avoidance requires embracing mixed-mode execution within system runtimes.

8.3 Future Work

We see several promising directions for future work on coordination avoidance, which we discuss here. A subset of these directions addresses the limitations of our existing work discussed in Section 8.2, while others represent new questions arising from this work.

8.3.1 Automating Coordination Avoidance

In this dissertation, we have developed a foundation for determining when coordination- free execution is possible; we have applied the invariant confluence principle to a range of semantics found in applications today. However, this process is primarily manual and requires human intervention at several stages of the process, from proving or disproving the invariant confluence of a set of semantics to deriving a coordination-free algorithm for implementing an invariant confluent set of semantics. This raises a natural question: which aspects of this process can be automated? We see several avenues for progress:

Automatically proving invariant confluence. Given a set of operations, an invariant, and a merge function, it would be desirable to automatically prove whether invariant confluence holds for the combination. For arbitrary logic, this is undecidable via a trivial application of Rice’s Theorem. However, given the brevity of most of our proofs in this work, we believe that, for practical programs, this is feasible. The key to realizing this possibility is to determine a sufficiently restricted set (or language) of operations and constraints that can be efficiently checked.
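As a hint of what such a checker might look like over a deliberately tiny, finite domain, the sketch below exhaustively tests whether merging any two invariant-satisfying reachable states preserves the invariant; the operations, invariant, and merge function are illustrative assumptions (echoing the non-negative balance examples used earlier in this dissertation), not a general decision procedure:

```python
# Bounded, brute-force check of invariant confluence over a small finite
# domain (a practical tool would restrict the language of operations and
# constraints so that this search stays tractable).
from itertools import product

def reachable(start, ops, depth):
    """All states reachable from `start` via at most `depth` operations."""
    states, frontier = {start}, {start}
    for _ in range(depth):
        frontier = {op(s) for s in frontier for op in ops}
        states |= frontier
    return states

def check_invariant_confluence(start, ops, invariant, merge, depth=2):
    good = [s for s in reachable(start, ops, depth) if invariant(s)]
    for s1, s2 in product(good, repeat=2):
        if not invariant(merge(s1, s2)):
            return False, (s1, s2)        # counterexample: merge violates the invariant
    return True, None

START = 100                               # a bank-style balance with a non-negativity constraint

# Deposits under "balance >= 0", merged by summing divergent deltas: confluent.
print(check_invariant_confluence(
    START, ops=[lambda b: b + 20],
    invariant=lambda b: b >= 0,
    merge=lambda b1, b2: b1 + b2 - START))        # (True, None)

# Withdrawals under the same invariant and merge: not confluent, and the
# checker returns a witnessing pair of divergent, individually valid states.
print(check_invariant_confluence(
    START, ops=[lambda b: b - 60],
    invariant=lambda b: b >= 0,
    merge=lambda b1, b2: b1 + b2 - START))        # (False, (40, 40))
```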

Synthesizing coordination-free algorithms. As we have discussed, many of our algorithms fit a common pattern—to what extent can we automatically synthesize coordination-free algorithms from their specification (similar to synthesis of concurrent data structures [131])? Could we build a synthesizer that could automatically output the RAMP protocols described in this dissertation, using only the RA isolation specification? Could we do the same for causal consistency? We believe the answer is yes to all: the base implementation of each is relatively similar (a multi-versioned database), and each implementation is effectively parametrized by a delivery policy and metadata. Thus, a synthesizer could efficiently search the space of delivery policies and metadata to find an appropriate match for a given set of semantics. For the semantics we have described here, the search space is relatively constrained.

Mining constraints. The task of specifying a complete declarative set of invariants for arbitrary applications is a considerable burden on developers. Is it possible to automate this process? One promising possibility is to use recent program analysis techniques such as the Homeostasis Protocol [198] to first generate a conservative set of invariants from application code (e.g., using serializability as a correctness specification). Subsequently, given this conservative specification, we can engage the programmer to increase its precision. Specifically, given a cost model for each invariant (e.g., determined by applying invariant confluence analysis and examining the distribution of data and which invariants require single-node versus multi-partition coordination to enforce), we can present the user with a list of “expensive” invariant-operation pairs ranked by cost. For each operation, we can present a set of cheaper alternatives to choose from that relax the specification. For example, we might explain that, while assigning IDs sequentially is expensive, using a nonce-based unique ID generator is considerably less expensive. Thus, even if program text indicates the need for coordination, involving the programmer in the program rewriting phase may allow more flexibility. By using cost models to guide the optimization, we can address the most expensive components of the application first.

Bespoke coordination plans. Given a set of operations and constraints, how should a system actually proceed to enforce them? Our current approach is relatively simple: each constraint and operation pair has an associated implementation (e.g., foreign key insertions use RAMP transactions). However, for more complex constraints and operations, this approach can quickly become onerous and even untenable. We see an opportunity for both more principled study of distributed invariant enforcement (in the spirit of active database systems, especially in a multi-partition, geo-replicated environment) as well as, in effect, “query planning” for coordination. If coordination is required, what mechanisms should be chosen? On what partitions should they be deployed? As an example, lock-based coordination is typically partitioned by item (e.g., the lock for item x belongs on the server for x). However, if we consider higher-order data types like publish-subscribe queues, we have a number of options, including coordinating at the publisher(s), coordinating at the subscriber(s), or neither. The Blazes system [18] chooses between totally ordered, partially ordered, and unordered delivery in a stream processing engine in order to guarantee output determinism; similar choices exist for invariant enforcement. Given that there are rarely unilateral “best” strategies in concurrency control, runtime adaptivity may confer serious advantages.

Maintaining constraints and operations. Applications change over time: new constraints and operations will be added to applications. How should we incrementally maintain coordination plans, and how can we incrementally check (and maintain) invariant confluence? New constraints must be compatible with existing constraints; otherwise, an unsatisfiable set of constraints will result in operation unavailability and database “inconsistency.” One simple strategy is to treat code modifications as a coordinated operation, much as schema changes are rolled out across clusters today.

8.3.2 Comprehending Weak Isolation

As discussed in Section 4.1, few relational database engines today actually provide serializability by default or even as an option at all. This was a somewhat surprising result to us, as much of the power and beauty of the transaction concept is predicated on serializable isolation. As we have seen in Section 6.5, these weak isolation guarantees can corrupt application integrity if not correctly applied. This raises a set of troubling questions for the database community. Primarily, how is it that weak isolation is so prevalent and yet transaction processing is successful in practice?

One possibility for this behavior is that, in practice, there is little concurrency. For many high value applications such as point of sale systems, volumes on the order of hundreds of transactions per second are impressive yet are easily handled by a single server runtime. Without high concurrency, there are few conflicts. However, if this is the case, increasing query and data volumes may lead to an increased incidence of consistency errors due to weak isolation.

Another possibility is that programmers are simply compensating for weak isolation in the application. This is possible via use of language constructs like SELECT FOR UPDATE (which acquires exclusive access to a record), compensating actions, and a range of alternative concurrency control strategies. Determining whether this is the case will require additional investigation of programmer behavior.

In either of these cases, we see a number of opportunities for improving programmer experiences, including:

Automated isolation analyses. In line with the automated invariant confluence analyses above, one could check a given application to determine its susceptibility to inconsistencies arising from weak isolation anomalies. This has been an active area of research for weak memory models in multi-processor systems, and we see an analogous opportunity here.

Debugging weak isolation. When inconsistencies in data occur, from where did they come? Leveraging provenance-style analyses to determine the origins of inconsistency may assist programmers in understanding why errors occurred and how to avoid them in the future.

Automated isolation repair. Given an analysis as above, can we synthesize the use of constructs like SELECT FOR UPDATE to correct for anomalies? Once an inconsistency occurs, can a system automatically generate a compensating action that will repair it without compromising correctness?
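To make the target of such synthesis and repair concrete, the sketch below shows the kind of manually written SELECT FOR UPDATE guard a repair tool might emit; the psycopg2 connection, accounts schema, and withdraw routine are hypothetical, illustrative assumptions rather than part of any system described here:

```python
# A minimal sketch of the SELECT FOR UPDATE pattern discussed above, using
# psycopg2 against a hypothetical `accounts` table with columns (id, balance).
import psycopg2

def withdraw(conn, account_id, amount):
    """Debit an account without under-running it, even under concurrency."""
    with conn:                           # commits on success, rolls back on error
        with conn.cursor() as cur:
            # Acquire an exclusive row lock so concurrent withdrawals serialize.
            cur.execute(
                "SELECT balance FROM accounts WHERE id = %s FOR UPDATE",
                (account_id,))
            (balance,) = cur.fetchone()
            if balance < amount:
                raise ValueError("insufficient funds")
            cur.execute(
                "UPDATE accounts SET balance = balance - %s WHERE id = %s",
                (amount, account_id))

conn = psycopg2.connect("dbname=bank")   # hypothetical connection string
withdraw(conn, account_id=42, amount=10)
```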

8.3.3 Emerging Application Patterns

Given the ascendancy of open source, there is unprecedented opportunity to empirically and quantitatively study how our systems are and are not serving the needs of application programmers. Lightweight program analysis has never been easier, and the corpus of readily-accessible code—especially in an academic context—has never been larger.

The applications we study here are undoubtedly dwarfed by many other commercial and enterprise-grade codebases in terms of size, quality, and complexity. However, compared to alternatives such as TPC-C, which today is almost 23 years old and is still the preferred standard for transaction processing evaluation, open source corpuses are arguably better proxies for modern applications. Recent efforts like the OLTPBenchmark suite [98] are promising but are nevertheless (and perhaps necessarily) not a substitute for real applications. The opportunity to perform both quantitative surveys across a large set of applications as well as longitudinal studies over the history of each application repository (and the behavior of a given programmer over time and across repositories) is particularly compelling. While these studies are inherently imprecise (due to limitations of the corpuses), the resulting quantitative trends are invaluable.

Thus, in this era of “Big Data” analytics, we see great promise in turning these analyses inwards, towards an empirical understanding of the usage of data management systems today, in service of better problem selection and a more quantitatively informed community dialogue. Our previous discussion in Section 6.7 hints at what is possible when we re-evaluate abstractions such as the transaction concept in light of developer trends.

8.3.4 Statistical Coordination Avoidance

In this work, we have largely concerned ourselves with the problem of deterministically maintaining safety guarantees. These guarantees are true guarantees in that they will hold in all scenarios: this is useful for application programmers, as they do not have to reason about exceptional behavior under which safety guarantees do not hold. However, many emerging analytics tasks may not require such rigid guarantees. Instead, we might consider more relaxed safety guarantees as in PBS, wherein safety is guaranteed only in expectation or in a probabilistic sense. Numerical consistency guarantees and their enforcement have a long history in the literature [148, 185, 232, 237] but have seen limited adoption for want of a practical use case; we view these statistical analytics tasks as a new killer application.

Adapting the ideas we have discussed here to this statistical context provides a number of opportunities. A modified invariant confluence analysis might account for numerical robustness by incorporating bounded drift between merge invocations or incorporating network delay in analysis. In turn, coordination-avoiding algorithms and variants of classic techniques such as escrow transactions [186] and approximate replication [185] for rebalancing could be used to increase the efficiency of statistical analysis tasks.

As an example, in the Alternating Direction Method of Multipliers (ADMM) [61], a number of processes coordinate to perform a distributed convex optimization routine. A core component of ADMM is a mathematical “consensus term” that prohibits individual processes from diverging from the global solution. We can treat this term as, in effect, a quadratic penalty that is analogous to the numerical inequalities enforced by the escrow transaction method. In ADMM, all processes typically communicate to update the consensus term. However, in the escrow transaction method, processes can communicate pair-wise to rebalance slack in the allocation of divergence to individual processes. Processes might use a heuristic to borrow slack from the least-loaded process, thus reducing overall communication. Could escrow’s pair-wise rebalancing be employed in ADMM in order to reduce communication costs without affecting convergence? We believe so; a minimal sketch of the consensus update in question appears at the end of this subsection.

Our initial experiences in this space have been promising. First, the Velox [86] system provides scalable model predictions as a service by treating models as statistically robust materialized views. In contrast with traditional materialized views, Velox defers the maintenance of models as new training data arrives and instead prioritizes model maintenance according to robustness. Second, our experiences integrating asynchronous model training into distributed dataflow engines [122] indicate that the degree of coordination required for efficient convergence of convex optimization tasks is closely linked to model skew. Both of these projects concretely highlight the potential of statistically-motivated coordination avoidance.

In general, we see great promise in systematically exploiting the numerical robustness of statistical analytics routines. If a procedure is robust to numerical inaccuracy, an execution framework can systematically introduce inaccuracy to improve execution efficiency: for example, by introducing asynchrony into processing, by batching messages, or by operating over stale data.
This is not a new idea by itself, but we see a wealth of unexplored connections in the domain of data serving and transaction processing that, in our initial investigations, have borne considerable fruit.
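For concreteness, the following sketch shows the consensus term in global-variable consensus ADMM for a toy quadratic objective; it is an illustrative assumption meant only to make the structure of the global and dual updates explicit, not an implementation of the escrow-style rebalancing proposed above:

```python
# Global-variable consensus ADMM for a toy objective: each process i holds
# f_i(x) = 0.5 * (x - a_i)^2, and all processes must agree on a global z.
# (Illustrative sketch only; the escrow-style rebalancing discussed above
# would modify how and when the z- and u-updates communicate.)
import numpy as np

a = np.array([1.0, 4.0, 7.0, 10.0])      # local data held by 4 processes
rho = 1.0                                 # penalty weight on the consensus term
x = np.zeros_like(a)                      # local primal variables
u = np.zeros_like(a)                      # scaled dual variables (the "consensus term")
z = 0.0                                   # global consensus variable

for _ in range(50):
    # Local, embarrassingly parallel x-updates (closed form for quadratics):
    # x_i = argmin f_i(x) + (rho / 2) * (x - z + u_i)^2
    x = (a + rho * (z - u)) / (1.0 + rho)
    # Global z-update: today this is an all-to-all averaging step...
    z = np.mean(x + u)
    # ...and the dual update penalizes divergence from the consensus value.
    u = u + x - z

print(round(z, 3))                        # converges to mean(a) = 5.5
```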

8.4 Closing Thoughts

ACID transactions and associated strong isolation and consistency levels dominated the field of database concurrency control for decades, due in large part to their ease of use and ability to automatically guarantee application correctness criteria. However, these powerful abstractions come with a hefty cost: concurrent transactions must coordinate in order to prevent read/write conflicts that could compromise equivalence to a serial execution. At large scale and, increasingly, in geo-replicated system deployments, the coordination costs necessarily associated with these implementations produce significant overheads in the form of penalties to throughput, latency, and availability. Today, these overheads necessitate a re-evaluation of concurrency control best practices.

In this dissertation, we developed a formal framework, called invariant confluence, in which application invariants are used as a basis for determining if and when coordination is strictly necessary to maintain correctness. With this framework, we demonstrated that, in fact, many—but not all—common database semantics and integrity constraints are actually achievable without coordination. By applying these results to a range of actual workloads, we demonstrated multiple opportunities to avoid coordination in many cases where traditional serializable mechanisms would otherwise coordinate. The order-of-magnitude performance improvements we demonstrated via novel coordination-avoiding concurrency control implementations provide compelling evidence that coordination avoidance is a promising approach to meaningfully scaling future data management systems.

As a final note, today the database and distributed computing communities are somewhat separate. While these communities have considerable shared interests in replicated data, their terminology and transfer of ideas are limited beyond the basics. This is unfortunate: the distributed computing literature has much to offer students of distributed databases. As we have highlighted here, the distributed computing community’s emphasis on formal specification enables very useful analyses, without which we might spend years looking for algorithms that do not exist. Moreover, the distributed computing community’s emphasis on modular abstractions, including atomic commitment, consensus objects, and broadcast primitives, will be familiar to systems builders. Conversely, while the distributed computing readership of this thesis may be more limited, distributed databases also have much to offer scholars of distributed computing, including but not limited to new transaction models and safety properties. Perhaps most unfortunate is the reality that an increasing number of systems designers and practitioners today are exposed to the complexities of distribution, and there is little principled and practical guidance as to when to employ each of these mechanisms. Accordingly, this thesis is an attempt to merge these worlds and use principled analyses of network behavior to improve the practical development of distributed database systems. By making judicious use of coordination, we can build systems that guarantee application safety while maximizing their scalability.

Bibliography

[1] How a quiet developer built Goodreads.com into book community of 2.6+ million members – with Otis Chandler, November 2009. http://mixergy.com/interviews/goodreads-otis-chandler/.

[2] Java EE 7 API: Package javax.persistence, 2013. http://docs.oracle.com/javaee/7/api/javax/persistence/package-summary.html.

[3] CakePHP, 2014. http://cakephp.org/ and http://book.cakephp.org/2.0/en/index.html.

[4] Django: The Web framework for perfectionists with deadlines, 2014. https://www.djangoproject.com/ and https://github.com/django/django.

[5] Laravel: The PHP Framework for Web Artisans, 2014. http://laravel.com/ and https://github.com/laravel/laravel.

[6] RailsGuide: Active Record Validations, 2014. http://guides.rubyonrails.org/active_record_validations.html.

[7] SchemaPlus, 2015. https://github.com/SchemaPlus/schema_plus.

[8] Daniel J. Abadi. Consistency tradeoffs in modern distributed database system design: CAP is only part of the story. IEEE Computer, 45(2):37–42, 2012.

[9] A. Adya. Weak consistency: a generalized theory and optimistic implementations for distributed transactions. PhD thesis, MIT, 1999.

[10] D. Agrawal, J. L. Bruno, A. El Abbadi, and V. Krishnaswamy. Relative serializability (extended abstract): An approach for relaxing the atomicity of transactions. In PODS, 1994.

[11] D. Agrawal and V. Krishnaswamy. Using multiversion data for non-interfering execution of write-only transactions. In SIGMOD, 1991.

[12] Divyakant Agrawal et al. Consistency and orderability: semantics-based correctness criteria for databases. ACM TODS, 18(3):460–486, September 1993.

[13] Mustaque Ahamad, Gil Neiger, James E. Burns, Prince Kohli, and P.W. Hutto. Causal memory: Definitions, implementation and programming. Dist. Comp., 9(1), 1995.

[14] Alexander Aiken, Jennifer Widom, and Joseph M Hellerstein. Behavior of database production rules: Termination, confluence, and observable determinism. In SIGMOD, 1992.

[15] Ross Allen. Airbnb Engineering Blog: “Upgrading Airbnb from Rails 2.3 to Rails 3.0”, October 2012. http://nerds.airbnb.com/upgrading-airbnb-from-rails-23-to-rails-30/.

[16] Gustavo Alonso, Fabio Casati, Harumi Kuno, and Vijay Machiraju. Web services. Springer, 2004.

[17] B. Alpern and F. B. Schneider. Defining liveness. Inf. Process. Lett., 21(4):181–185, 1985.

[18] Peter Alvaro, Neil Conway, Joseph M. Hellerstein, and David Maier. Blazes: Coordination analysis for distributed programs. In ICDE, 2014.

[19] Peter Alvaro, Neil Conway, Joseph M. Hellerstein, and William R. Marczak. Consistency analysis in Bloom: a CALM and collected approach. In CIDR, 2011.

[20] Peter Alvaro et al. Consistency without borders. In ACM SoCC, 2013.

[21] Tom J. Ameloot, Frank Neven, and Jan Van Den Bussche. Relational transducers for declarative networking. J. ACM, 60(2):15:1–15:38, May 2013.

[22] ISO/IEC 9075-2:2011 Information technology – Database languages – SQL – Part 2: Foundation (SQL/Foundation), 2011.

[23] Austin Appleby. Murmurhash 2.0, 2008. http://murmurhash.googlepages.com/.

[24] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, et al. A view of cloud computing. Communications of the ACM, 53(4):50–58, 2010.

[25] Timothy G Armstrong, Vamsi Ponnekanti, Dhruba Borthakur, and Mark Callaghan. Linkbench: a database benchmark based on the Facebook social graph. In SIGMOD, 2013.

[26] Rony Attar, Philip A. Bernstein, and Nathan Goodman. Site initialization, recovery, and backup in a distributed database system. IEEE TSE, 10(6):645–650, November 1984.

[27] Hagit Attiya, Rachid Guerraoui, Danny Hendler, et al. Laws of order: Expensive synchronization in concurrent algorithms cannot be eliminated. In POPL, 2011.

[28] Hagit Attiya and Jennifer Welch. Distributed Computing: Fundamentals, Simulations and Advanced Topics (2nd edition). John Wiley Interscience, March 2004.

[29] AWS. Summary of the Amazon EC2 and Amazon RDS Service Disruption in the US East Region. http://tinyurl.com/6ab6el6, April 2011.

[30] James Aylett. django-database-constraints, 2013. https://github.com/jaylett/django-database-constraints.

[31] Shivnath Babu and Herodotos Herodotou. Massively parallel databases and MapReduce systems. Foundations and Trends in Databases, 5(1):1–104, 2013.

[32] Peter Bailis, Aaron Davidson, Alan Fekete, Ali Ghodsi, Joseph M. Hellerstein, and Ion Stoica. Highly Available Transactions: Virtues and limitations. In VLDB, 2014.

[33] Peter Bailis, Alan Fekete, Michael J. Franklin, Ali Ghodsi, Joseph M. Hellerstein, and Ion Stoica. Feral Concurrency Control: An empirical investigation of modern application integrity. In SIGMOD, 2015.

[34] Peter Bailis, Alan Fekete, Michael J. Franklin, Joseph M. Hellerstein, Ali Ghodsi, and Ion Stoica. Coordination avoidance in database systems. In VLDB, 2015.

[35] Peter Bailis, Alan Fekete, Ali Ghodsi, Joseph M. Hellerstein, and Ion Stoica. The potential dangers of causal consistency and an explicit solution. In ACM SoCC, 2012.

[36] Peter Bailis, Alan Fekete, Ali Ghodsi, Joseph M. Hellerstein, and Ion Stoica. HAT, not CAP: Introducing Highly Available Transactions. In HotOS, 2013.

[37] Peter Bailis, Alan Fekete, Ali Ghodsi, Joseph M. Hellerstein, and Ion Stoica. Scalable atomic visibility with RAMP transactions. In SIGMOD, 2014.

[38] Peter Bailis and Ali Ghodsi. Eventual consistency today: Limitations, extensions, and beyond. ACM Queue, 11(3), 2013.

[39] Peter Bailis, Ali Ghodsi, Joseph M. Hellerstein, and Ion Stoica. Bolt-on causal consistency. In SIGMOD, 2013.

[40] Peter Bailis and Kyle Kingsbury. The network is reliable: An informal survey of real-world communications failures. ACM Queue, 12(7), July 2014. Also appears in Communications of the ACM 57(9):48-55, September 2014.

[41] Peter Bailis and Kyle Kingsbury. The network is reliable: An informal survey of real-world communications failures. ACM Queue, 12(7):20, 2014.

[42] Peter Bailis, Shivaram Venkataraman, Michael J. Franklin, Joseph M. Hellerstein, and Ion Stoica. Probabilistically Bounded Staleness for practical partial quorums. In VLDB, 2012.

[43] Peter Bailis, Shivaram Venkataraman, Michael J. Franklin, Joseph M. Hellerstein, and Ion Stoica. PBS at work: Advancing data management with consistency metrics. In SIGMOD, 2013. Demo.

[44] Peter Bailis, Shivaram Venkataraman, Michael J Franklin, Joseph M Hellerstein, and Ion Stoica. Quantifying Eventual Consistency with PBS. The VLDB Journal, 23(2):279–302, 2014. “Best of VLDB 2012” Special Issue.

[45] J. Baker, C. Bond, J.C. Corbett, JJ Furman, A. Khorlin, J. Larson, J.M. Léon, Y. Li, A. Lloyd, and V. Yushprakh. Megastore: Providing scalable, highly available storage for interactive services. In CIDR, 2011.

[46] Balderdash. Sails.js: Realtime MVC Framework for Node.js, 2014. https://github.com/balderdashy/sails.

[47] Balderdash. Waterline: An adapter-based ORM for Node.js with support for mysql, mongo, postgres, redis, [sic] and more, 2014. https://github.com/balderdashy/waterline.

[48] Bean Validation Expert Group. Jsr-000303 bean validation 1.0 final release specification, 2009. http://download.oracle.com/otndocs/jcp/bean_validation-1.0-fr-oth-JSpec/.

[49] H. Berenson, P. Bernstein, J. Gray, J. Melton, E. O’Neil, and P. O’Neil. A critique of ANSI SQL isolation levels. In SIGMOD, 1995.

[50] David Bermbach and Stefan Tai. Eventual consistency: How soon is eventual? An evaluation of Amazon S3’s consistency behavior. In MW4SOC, 2011.

[51] Emmanuel Bernard. Java Specification Request 349: Bean Validation 1.1, 2013. https://jcp.org/en/jsr/detail?id=349.

[52] Arthur J Bernstein and Philip M Lewis. Transaction decomposition using transaction semantics. Distributed and Parallel Databases, 4(1):25–47, 1996.

[53] P.A. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency control and recovery in database systems, volume 370. Addison-wesley New York, 1987.

[54] Phil Bernstein and Sudipto Das. Rethinking eventual consistency. In SIGMOD, 2013.

[55] Philip A. Bernstein, David W. Shipman, and James B. Rothnie, Jr. Concurrency control in a system for distributed databases (SDD-1). ACM TODS, 5(1):18–51, March 1980.

[56] Phillip A Bernstein, Alon Y Halevy, and Rachel A Pottinger. A vision for management of complex models. ACM Sigmod Record, 29(4):55–63, 2000.

[57] Ken Birman, Gregory Chockler, and Robbert van Renesse. Toward a cloud computing research agenda. SIGACT News, 40(2):68–80, June 2009.

[58] Jose A Blakeley, Per-Ake Larson, and Frank Wm Tompa. Efficiently updating materialized views. In SIGMOD, 1986.

[59] Burton H Bloom. Space/time trade-offs in hash coding with allowable errors. CACM, 13(7):422–426, 1970.

[60] Ivan Bocić and Tevfik Bultan. Inductive verification of data model invariants for web applications. In ICSE, 2014.

[61] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.

[62] M. Brantner, D. Florescu, D. Graf, D. Kossmann, and T. Kraska. Building a database on S3. In SIGMOD, 2008.

[63] Eric Brewer. Towards robust distributed systems. 2000. Keynote at PODC.

[64] Eric Brewer. CAP twelve years later: How the “rules” have changed. Computer, 45(2):23–29, 2012.

[65] Nathan Bronson, Zach Amsden, George Cabrera, Prasad Chakka, Peter Dimov, et al. TAO: Facebook’s distributed data store for the social graph. In USENIX ATC, 2013.

[66] Jordan Brough. #645: Alternative to validates_uniqueness_of using db constraints, 2011. rails/rails at https://github.com/rails/rails/issues/645.

[67] J. Brzezinski, C. Sobaniec, and D. Wawrzyniak. From session causality to causal consistency. In PDP, 2004.

[68] Sebastian Burckhardt, Daan Leijen, Manuel Fähndrich, and Mooly Sagiv. Eventually consistent transactions. In ESOP, 2012.

[69] Phil Calcado. Building products at SoundCloud – Part I: Dealing with the monolith, June 2014. https://developers.soundcloud.com/blog/building-products-at-soundcloud-part-1-dealing-with-the-monolith.

[70] Michael J Carey and David J DeWitt. Of objects and databases: A decade of turmoil. In VLDB, 1996.

[71] Michael J Carey et al. Shoring up persistent applications. In SIGMOD, 1994.

[72] Andrew Carter. Hulu Tech Blog: “At a glance: Hulu hits Rails Conf 2012”, May 2012. http://tech.hulu.com/blog/2012/05/14/347/.

[73] A. Chan and R. Gray. Implementing distributed read-only transactions. IEEE TSE, 11(2):205–212, 1985.

[74] Fay Chang, Jeffrey Dean, Sanjay Ghemawat, et al. Bigtable: A distributed storage system for structured data. In OSDI, 2006.

[75] Bernadette Charron-Bost. Concerning the size of logical clocks in distributed systems. Inf. Process. Lett., 39(1):11–16, July 1991.

[76] Avik Chaudhuri and Jeffrey S Foster. Symbolic security analysis of Ruby-on-Rails web applications. In CCS, 2010.

[77] Alvin Cheung, Owen Arden, Samuel Madden, Armando Solar-Lezama, and Andrew C Myers. StatusQuo: Making familiar abstractions perform using program analysis. In CIDR, 2013.

[78] Alvin Cheung, Samuel Madden, Owen Arden, and Andrew C Myers. Automatic partitioning of database applications. In VLDB, 2012.

[79] Rada Chirkova and Jun Yang. Materialized views. Foundations and Trends in Databases, 4(4):295–405, 2012.

[80] Austin T. Clements et al. The scalable commutativity rule: designing scalable software for multicore processors. In SOSP, 2013.

[81] Neil Conway, William R Marczak, Peter Alvaro, Joseph M Hellerstein, and David Maier. Logic and lattices for distributed programming. In ACM SoCC, 2012.

[82] Blaine Cook. Scaling Twitter, SDForum Silicon Valley Ruby Conference, 2007. http://www.slideshare.net/Blaine/scaling-twitter.

[83] B.F. Cooper et al. PNUTS: Yahoo!’s hosted data serving platform. In VLDB, 2008.

[84] Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. Benchmarking cloud serving systems with YCSB. In ACM SoCC, 2010.

[85] James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, J. J. Furman, et al. Spanner: Google’s globally-distributed database. In OSDI, 2012.

[86] Dan Crankshaw, Peter Bailis, Joseph E. Gonzalez, Haoyuan Li, Zhao Zhang, Michael J. Franklin, Ali Ghodsi, and Michael I. Jordan. The Missing Piece in Complex Analytics: Low Latency, Scalable Model Management and Serving with Velox. In CIDR, 2015.

[87] John W Creswell. Research design: Qualitative, quantitative, and mixed methods approaches. Sage, 2013.

[88] Kyle Crum. #3238: Activerecord::staleobjecterror in checkout, 2013. spree/spree at https://github.com/spree/spree/issues/3238.

[89] Carlo Curino, Evan Jones, Yang Zhang, and Sam Madden. Schism: a workload-driven approach to database replication and partitioning. In VLDB, 2010.

[90] S. Das, D. Agrawal, and A. El Abbadi. G-store: a scalable data store for transactional multi key access in the cloud. In ACM SoCC, 2010.

[91] K. Daudjee and K. Salem. Lazy database replication with ordering guarantees. In ICDE, 2004.

[92] S.B. Davidson, H. Garcia-Molina, and D. Skeen. Consistency in partitioned networks. ACM CSUR, 17(3):341–370, 1985.

[93] Jeff Dean. Designs, lessons and advice from building large distributed systems. Keynote at LADIS 2009.

[94] Jeffrey Dean and Luiz André Barroso. The tail at scale. Communications of the ACM, 56(2):74–80, 2013.

[95] Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, et al. Dynamo: Amazon’s highly available key-value store. In SOSP, 2007.

[96] Alan Demers et al. Epidemic algorithms for replicated database maintenance. In PODC, 1987.

[97] Peter Deutsch. The eight fallacies of distributed computing. http://tinyurl.com/c6vvtzg, 1994.

[98] Djellel Eddine Difallah, Andrew Pavlo, Carlo Curino, and Philippe Cudre-Mauroux. OLTP-Bench: An extensible testbed for benchmarking relational databases. In VLDB, 2014.

[99] Romain Dillet. Update: down in North Virginia — Reddit, Pinterest, Airbnb, Foursquare, Minecraft and others affected. TechCrunch http://tinyurl.com/9r43dwt, October 2012.

[100] Gregory J. Duck, Peter J. Stuckey, and Martin Sulzmann. Observable confluence for constraint handling rules. In ICLP, 2007.

[101] John Duff. How Shopify scales Rails, Big Ruby 2013, April 2013. http://www.slideshare.net/jduff/how-shopify-scales-rails-20443485.

[102] Edd Dumbill. O’Reilly: “Ruby on Rails: An interview with David Heinemeier Hansson”, August 2005. http://www.oreillynet.com/pub/a/network/2005/08/30/ruby-rails-david-heinemeier-hansson.html.

[103] Steve Easterbrook et al. Selecting empirical methods for software engineering research. In Guide to advanced empirical software engineering, pages 285–311. Springer, 2008.

[104] Max Ehsan. Input validation with Laravel, 2014. http://laravelbook.com/laravel-input-validation/.

[105] K. P. Eswaran et al. The notions of consistency and predicate locks in a database system. Communications of the ACM, 19(11):624–633, 1976.

[106] Jose Faleiro, Alexander Thomson, and Daniel J Abadi. Lazy evaluation of transactions in database systems. In SIGMOD, 2014.

[107] Abdel Aziz Farrag and M Tamer Özsu. Using semantic knowledge of transactions to increase concurrency. ACM TODS, 14(4):503–525, 1989.

[108] Alan Fekete, Shirley N Goldrei, and Jorge Pérez Asenjo. Quantifying isolation anomalies. In VLDB, 2009.

[109] Alan Fekete, David Gupta, Victor Luchangco, Nancy Lynch, and Alex Shvartsman. Eventually-serializable data services. In PODC, 1996.

[110] Alan Fekete, Dimitrios Liarokapis, Elizabeth O’Neil, Patrick O’Neil, and Dennis Shasha. Making snapshot isolation serializable. ACM TODS, 30(2):492–528, June 2005.

[111] Hardy Ferentschik. Accessing the Hibernate Session within a ConstraintValidator, May 2010. https://developer.jboss.org/wiki/AccessingtheHibernateSessionwithinaConstraintValidator.

[112] Hardy Ferentschik and Gunnar Morling. Hibernate validator JSR 349 reference implementation 5.1.3.final, 2014. https://docs.jboss.org/hibernate/stable/validator/reference/en-US/html/.

[113] Martin Fowler. Patterns of enterprise application architecture. Addison-Wesley Longman Publishing Co., Inc., 2002.

[114] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design patterns: elements of reusable object-oriented software. Pearson Education, 1994.

[115] Hector Garcia-Molina. Using semantic knowledge for transaction processing in a distributed database. ACM TODS, 8(2):186–213, June 1983.

[116] Hector Garcia-Molina and Kenneth Salem. Sagas. In SIGMOD, 1987.

[117] David Geer. Will software developers ride Ruby on Rails to success? Computer, 39(2):18–20, 2006.

[118] S. Gilbert and N. Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News, 33(2):51–59, 2002.

[119] Phillipa Gill, Navendu Jain, and Nachiappan Nagappan. Understanding network failures in data centers: measurement, analysis, and implications. In SIGCOMM, 2011.

[120] Parke Godfrey et al. Logics for databases and information systems, chapter Integrity constraints: Semantics and applications, pages 265–306. Springer, 1998.

[121] Wojciech Golab, Xiaozhou Li, and Mehul A Shah. Analyzing consistency properties for fun and profit. In PODC, 2011.

[122] Joseph E. Gonzalez, Peter Bailis, Michael J. Franklin, Joseph M. Hellerstein, Michael I. Jordan, and Ion Stoica. Asynchronous complex analytics in a distributed dataflow architecture.

[123] Jim Gray. The transaction concept: Virtues and limitations. In VLDB, 1981.

[124] Jim Gray and Leslie Lamport. Consensus on transaction commit. ACM TODS, 31(1):133–160, March 2006.

[125] J.N. Gray, R.A. Lorie, G.R. Putzolu, and I.L. Traiger. Granularity of locks and degrees of consistency in a shared data base. Technical report, IBM, 1976.

[126] Paul WPJ Grefen and Peter MG Apers. Integrity control in relational database systems–an overview. Data & Knowledge Engineering, 10(2):187–223, 1993.

[127] Ashish Gupta and Jennifer Widom. Local verification of global integrity constraints in distributed databases. In SIGMOD, 1993.

[128] James Hamilton. Stonebraker on CAP Theorem and Databases. http://tinyurl.com/d3gtfq9, April 2010.

[129] David Heinemeier Hansson. active_record/transactions.rb, 2004. rails/rails githash db045db at https://github.com/rails/rails/blob/db045dbb.

[130] David Heinemeier Hansson. Choose a single layer of cleverness, September 2005. http://david.heinemeierhansson.com/arc/2005_09.html.

[131] Peter Hawkins, Alex Aiken, Kathleen Fisher, Martin Rinard, and Mooly Sagiv. Concurrent data representation synthesis. In PLDI, 2012.

[132] P. Helland. Life beyond distributed transactions: an apostate’s opinion. In CIDR, 2007.

[133] Pat Helland. Life beyond distributed transactions: an apostate’s opinion. In CIDR, 2007.

[134] Maurice Herlihy and Nir Shavit. The art of multiprocessor programming. 2008.

[135] Hibernate Team and JBoss Visual Design Team. Hibernate reference documentation 4.3.7.final, 2014. http://docs.jboss.org/hibernate/orm/4.3/manual/en-US/html/.

[136] Matthew Higgins. Foreigner. https://github.com/matthuhiggins/foreigner.

[137] Timothy Hinrichs et al. Caveat: Facilitating interactive and secure client-side validators for ruby on rails applications. In SECURWARE, 2013.

[138] Sean Hull. 20 obstacles to scalability. Communications of the ACM, 56(9):54–59, 2013.

[139] Nam Huyn. Maintaining global integrity constraints in distributed databases. Constraints, 2(3/4):377–399, January 1998.

[140] Ryan Johnson, Ippokratis Pandis, and Anastasia Ailamaki. Eliminating unscalable communication in transaction processing. The VLDB Journal, pages 1–23, 2013.

[141] Evan P.C. Jones, Daniel J. Abadi, and Samuel Madden. Low overhead concurrency control for partitioned main memory databases. In SIGMOD, 2010.

[142] Robert Kallman et al. H-store: a high-performance, distributed main memory transaction processing system. In VLDB, 2008.

[143] Bettina Kemme. Database replication for clusters of workstations. PhD thesis, EPFL, 2000.

[144] Jan Willem Klop. Term rewriting systems. Stichting Mathematisch Centrum Amsterdam, 1990.

[145] Henry F Korth and Gregory Speegle. Formal model of correctness without serializabilty. In SIGMOD, 1988.

[146] Michael Koziarski. Warn users about the race condition in validates_uniqueness_of. [koz], 2007. rails/rails githash c01c28c at https://github.com/rails/rails/commit/c01c28c.

[147] Glenn E Krasner, Stephen T Pope, et al. A description of the model-view-controller user interface paradigm in the smalltalk-80 system. Journal of object oriented programming, 1(3):26–49, 1988.

[148] S. Krishnamurthy, W. H. Sanders, and M. Cukier. An adaptive quality of service aware middleware for replicated services. IEEE Transactions on Parallel and Distributed Systems, 14(11):1112–1125, 2003.

[149] Hsing-Tsung Kung and Christos H Papadimitriou. An optimality theory of concurrency control for databases. In SIGMOD, 1979.

[150] Craig Labovitz, Abha Ahuja, and Farnam Jahanian. Experimental study of internet stability and backbone failures. In FTCS, 1999.

[151] Rivka Ladin, Barbara Liskov, Liuba Shrira, and Sanjay Ghemawat. Providing high availability using lazy replication. ACM TOCS, 10(4):360–391, November 1992.

[152] Hongli Lai. Document concurrency issues in validates_uniqueness_of., 2008. rails/rails githash adacd94 at https://github.com/rails/rails/commit/adacd94.

[153] A. Lakshman and P. Malik. Cassandra - a decentralized structured storage system. In LADIS, pages 35–40, 2008. Project site: http://cassandra.apache.org (2012).

[154] Charles Lamb, Gordon Landis, Jack Orenstein, and Dan Weinreb. The ObjectStore database system. Communications of the ACM, 34(10):50–63, 1991.

[155] Leslie Lamport. Towards a theory of correctness for multi-user database systems. Technical report, CCA, 1976. Described in [12, 202].

[156] Leslie Lamport. Proving the correctness of multiprocess programs. IEEE Transactions on Software Engineering, 3(2):125–143, March 1977.

[157] Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7):558–565, July 1978.

[158] Jon Leighton. Support for specifying transaction isolation level, 2012. rails/rails githash 392eeec at https://github.com/rails/rails/commit/392eeec.

[159] C. Li, D. Porto, A. Clement, J. Gehrke, N. Preguiça, and R. Rodrigues. Making geo-replicated systems fast as possible, consistent when necessary. In OSDI, 2012.

[160] Cheng Li, Joao Leitao, Allen Clement, Nuno Preguiça, Rodrigo Rodrigues, et al. Automating the choice of consistency levels in replicated systems. In USENIX ATC, 2014.

[161] Yi Lin, Bettina Kemme, Ricardo Jiménez-Peris, et al. Snapshot isolation and integrity constraints in replicated databases. ACM TODS, 34(2), July 2009.

[162] Greg Linden. Marissa Mayer at Web 2.0. http://glinden.blogspot.com/2006/11/marissa-mayer-at-web-20.html. 9 November 2006.

[163] Greg Linden. Make data useful. https://sites.google.com/site/glinden/Home/StanfordDataMining.2006-11-29.ppt, 2006.

[164] R.J. Lipton and J.S. Sandberg. PRAM: A scalable shared memory. Technical Report CS-TR-180-88, Princeton University, Department of Computer Science, 1988.

[165] V Liu, D Halperin, A Krishnamurthy, and T Anderson. F10: Fault tolerant engineered networks. In NSDI, 2013.

[166] Francois Llirbat, Eric Simon, Dimitri Tombroff, et al. Using versions in update transactions: Application to integrity checking. In VLDB, 1997.

[167] Wyatt Lloyd, Michael J. Freedman, Michael Kaminsky, and David G. Andersen. Don’t settle for eventual: scalable causal consistency for wide-area storage with COPS. In SOSP, 2011.

[168] Wyatt Lloyd, Michael J. Freedman, Michael Kaminsky, and David G. Andersen. Stronger semantics for low-latency geo-replicated storage. In NSDI, 2013.

[169] Shiyong Lu, Arthur Bernstein, and Philip Lewis. Correct execution of transactions at different isolation levels. IEEE TKDE, 16(9), 2004.

[170] Nancy A Lynch, Michael Merritt, William Weihl, and Alan Fekete. Atomic Transactions: In Concurrent and Distributed Systems. Morgan Kaufmann Publishers Inc., 1993.

[171] P. Mahajan, L. Alvisi, and M. Dahlin. Consistency, availability, convergence. Technical Report TR-11-22, CS Department, UT Austin, May 2011.

[172] Dahlia Malkhi, Michael Reiter, Avishai Wool, and Rebecca Wright. Probabilistic quorum systems. Information and Computation, 170:184–206, 2001.

[173] Ankit Malpani et al. Reverse engineering models from databases to bootstrap application development. In ICDE, 2010.

[174] Adam Marcus. The nosql ecosystem. In HPTS, 2011. http://www.slideshare.net/AdamMarcus/nosql-ecosystem.

[175] Athina Markopoulou et al. Characterization of failures in an operational IP backbone network. IEEE/ACM TON, 16(4), 2008.

[176] Declan McCullagh. How Pakistan knocked YouTube offline (and how to make sure it never happens again). CNET, http://tinyurl.com/c4pffd, February 2008.

[177] Sean McCullough. Groupon Engineering Blog: “Geekon: I-Tier”, October 2013. https://engineering.groupon.com/2013/node-js/geekon-i-tier/.

[178] Robert McMillan. Research experiment disrupts internet, for some. Computerworld, http://tinyurl.com/23sqpek, August 2010.

[179] C Mohan. History repeats itself: Sensible and NonsenSQL aspects of the NoSQL hoopla. In EDBT, 2013.

[180] Moni Naor and Avishai Wool. The load, capacity, and availability of quorum systems. SIAM Journal on Computing, 27(2):423–447, 1998.

[181] P. P. S. Narayan. Sherpa update. YDN Blog, http://tinyurl.com/c3ljuce, June 2010.

[182] Stephen Nguyen. A mind boggling 9 billion queries per second at facebook @rocksdb, December 2011. Tweet by @Stephenitis, re-tweeted by @RocksDB.

[183] Jaideep Nijjar and Tevfik Bultan. Bounded verification of ruby on rails data models. In ACM ISSTA, 2011.

[184] Charles Nutter. Q/a: What thread-safe Rails means, August 2008. http://blog.headius.com/2008/08/qa-what-thread-safe-rails-means.html.

[185] Christopher Olston. Approximate Replication. PhD thesis, Stanford University, 2003.

[186] Patrick E O’Neil. The escrow transactional method. ACM TODS, 11(4):405–430, 1986.

[187] Andrew Pavlo, Carlo Curino, and Stanley Zdonik. Skew-aware automatic database partitioning in shared-nothing, parallel OLTP systems. In SIGMOD, 2012.

[188] Daniel Peng and Frank Dabek. Large-scale incremental processing using distributed transactions and notifications. In OSDI, 2010.

[189] Karin Petersen, Mike J. Spreitzer, Douglas B. Terry, Marvin M. Theimer, and Alan J. Demers. Flexible update propagation for weakly consistent replication. In SOSP, 1997.

[190] Shirish Hemant Phatak and BR Badrinath. Multiversion reconciliation for mobile databases. In ICDE, 1999.

[191] Tom Preston-Werner. How we made GitHub fast, October 2009. https://github.com/blog/530-how-we-made-github-fast.

[192] Lin Qiao et al. On brewing fresh Espresso: LinkedIn’s distributed data serving platform. In SIGMOD, 2013.

[193] Rajiv Ranjan. Streaming big data processing in datacenter clouds. Cloud Computing, IEEE, 1(1):78–83, 2014.

[194] Rajeev Rastogi, Sharad Mehrotra, Yuri Breitbart, Henry F. Korth, and Avi Silberschatz. On correctness of non-serializable executions. In PODS, 1993.

[195] Kun Ren, Alexander Thomson, and Daniel J. Abadi. Lightweight locking for main memory database systems. VLDB, 2013.

[196] John Rizzo. Twitch: The official blog “Technically Speaking – Group Chat and General Chat Engineering”, April 2014. http://blog.twitch.tv/2014/04/technically-speaking-group-chat-and-general-chat-engineering/.

[197] Dave Roberts. #13234: Rails concurrency bug on save, 2013. rails/rails at https://github.com/rails/rails/issues/13234.

[198] Sudip Roy, Lucja Kot, Gabriel Bender, Bailu Ding, Hossein Hojjat, Christoph Koch, Nate Foster, and Johannes Gehrke. The homeostasis protocol: Avoiding transaction coordination through program analysis. In SIGMOD, 2015.

[199] Sam Ruby, Dave Thomas, David Heinemeier Hansson, et al. Agile web development with Rails 4. The Pragmatic Bookshelf, Dallas, Texas, 2013.

[200] Yasushi Saito and Marc Shapiro. Optimistic replication. ACM Comput. Surv., 37(1), March 2005.

[201] Nicolas Schiper, Pierre Sutra, and Fernando Pedone. P-store: Genuine partial replication in wide area networks. In SRDS, 2010.

[202] Fred B Schneider. On concurrent programming. Springer, 1997.

[203] E. Schurman and J. Brutlag. Performance related changes and their user impact. Presented at Velocity Web Performance and Operations Conference, June 2009.

[204] Laurie Segall. Internet routing glitch kicks millions offline. CNNMoney, http://tinyurl.com/cmqqac3, November 2011.

[205] Marc Shapiro, Nuno Preguiça, Carlos Baquero, and Marek Zawirski. A comprehensive study of convergent and commutative replicated data types. INRIA TR 7506, 2011.

[206] Dennis Shasha, Francois Llirbat, Eric Simon, and Patrick Valduriez. Transaction chopping: algorithms and performance studies. ACM TODS, 20(3):325–363, September 1995.

[207] Jeff Shute et al. F1: A distributed SQL database that scales. In VLDB, 2013.

[208] Neeraj Singh. update_attributes and update_attributes! are now wrapped in a transaction, 2010. rails/rails githash f4fbc2c at https://github.com/rails/rails/commit/f4fbc2c.

[209] Nazari Skrupsky et al. Waves: Automatic synthesis of client-side validation code for web applications. In IEEE CyberSecurity, 2012.

[210] Y. Sovran, R. Power, M. K. Aguilera, and J. Li. Transactional storage for geo-replicated systems. In SOSP, pages 385–400, 2011.

[211] Ion Stoica, Robert Morris, David Karger, M Frans Kaashoek, and Hari Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In SIGCOMM, 2001.

[212] Klaas-Jan Stol et al. The use of empirical methods in open source software research: Facts, trends and future directions. In FLOSS, 2009.

[213] Code Stoltman. initial stab at creating has_many relationships, 2013. balderdashy/waterline githash b05fb1c at https://github.com/balderdashy/waterline/commit/b05fb1c. As of November 2014, this code has been moved but is still non-transactional and the comment remains unchanged.

[214] Michael Stonebraker. The case for shared nothing. IEEE Database Eng. Bull., 9, 1986.

[215] M Tamer Özsu and Patrick Valduriez. Principles of distributed database systems. Springer, 2011.

[216] Douglas B. Terry, Alan J. Demers, Karin Petersen, Mike J. Spreitzer, Marvin M. Theimer, et al. Session guarantees for weakly consistent replicated data. In PDIS, 1994.

[217] A. Thomson, T. Diamond, S.C. Weng, K. Ren, P. Shao, and D.J. Abadi. Calvin: Fast distributed transactions for partitioned database systems. In SIGMOD, 2012.

[218] Tim O’Reilly. What is web 2.0: Design patterns and business models for the next generation of software. Communications and Strategies, 65(1):17–37, 2007.

[219] TPC Council. TPC Benchmark C revision 5.11, 2010.

[220] Irving L. Traiger, Jim Gray, Cesare A. Galtieri, and Bruce G. Lindsay. Transactions and consistency in distributed database systems. ACM TODS, 7(3):323–342, 1982.

[221] Stephen Tu, Wenting Zheng, Eddie Kohler, Barbara Liskov, and Samuel Madden. Speedy transactions in multicore in-memory databases. In SOSP, 2013.

[222] Daniel Turner, Kirill Levchenko, Jeffrey C Mogul, Stefan Savage, and Alex C Snoeren. On failure in managed enterprise networks. HP Labs HPL-2012-101, 2012.

[223] Daniel Turner, Kirill Levchenko, Alex C. Snoeren, and Stefan Savage. California fault lines: understanding the causes and impact of network failures. In SIGCOMM, 2011.

[224] Werner Vogels. Eventually consistent. CACM, 52(1):40–44, January 2009.

[225] Hiroshi Wada, Alan Fekete, Liang Zhao, Kevin Lee, and Anna Liu. Data consistency properties and the trade-offs in commercial cloud storage: the consumers’ perspective. In CIDR, 2011.

[226] William Weihl. Specification and implementation of atomic data types. PhD thesis, Massachusetts Institute of Technology, 1984.

[227] Kevin Weil. Rainbird: Real-time analytics at Twitter. Strata 2011 http://slidesha.re/hjMOui.

[228] Jennifer Widom and Stefano Ceri. Active database systems: Triggers and rules for advanced database processing. Morgan Kaufmann, 1996.

[229] Alex Williams. Techcrunch: Zendesk launches a help center that combines self-service with design themes reminiscent of Tumblr, August 2013. http://on.tcrn.ch/l/XMdz.

[230] Yunjing Xu, Zachary Musgrave, Brian Noble, and Michael Bailey. Bobtail: avoiding long tails in the cloud. In NSDI, 2013.

[231] Fan Yang et al. Hilda: A high-level language for data-driven web applications. In ICDE, 2006.

[232] Haifeng Yu and Amin Vahdat. Design and evaluation of a conit-based continuous consistency model for replicated services. ACM Transactions on Computer Systems, 20(3):239–282, 2002.

[233] Marek Zawirski, Nuno Preguiça, Sérgio Duarte, Annette Bieniusa, Valter Balegas, and Marc Shapiro. Write fast, read in the past: Causal consistency for client-side applications. In Middleware, 2015.

[234] Stanley B. Zdonik. Object-oriented type evolution. In DBPL, pages 277–288, 1987.

[235] Kamal Zellag and Bettina Kemme. Real-time quantification and classification of consistency anomalies in multi-tier architectures. In ICDE, 2011.

[236] Kamal Zellag and Bettina Kemme. How consistent is your cloud application? In ACM SoCC, 2012.

[237] Chi Zhang and Zheng Zhang. Trading replication consistency for performance and availability: an adaptive approach. In ICDCS, 2003.

[238] Jingren Zhou et al. Lazy maintenance of materialized views. In VLDB, 2007.