Scalable Transactions for Scalable Distributed Systems

by

Gene Pang

A dissertation submitted in partial satisfaction of the

requirements for the degree of

Doctor of Philosophy

in

Computer Science

in the

Graduate Division

of the

University of California, Berkeley

Committee in charge:

Professor Michael J. Franklin, Chair
Professor Ion Stoica
Professor John Chuang

Summer 2015

Copyright 2015 by Gene Pang

Abstract

Scalable Transactions for Scalable Distributed Database Systems

by

Gene Pang

Doctor of Philosophy in Computer Science

University of California, Berkeley

Professor Michael J. Franklin, Chair

With the advent of the Internet and Internet-connected devices, modern applications can experience very rapid growth of users from all parts of the world. A growing user base leads to greater usage and large data sizes, so scalable database systems capable of handling the great demands are critical for applications. With the emergence of cloud computing, a major movement in the industry, modern applications depend on distributed data stores for their scalable data management solutions. Many large-scale applications utilize NoSQL systems, such as distributed key-value stores, for their scalability and availability properties over traditional relational database systems. By simplifying the design and interface, NoSQL systems can provide high scalability and performance for large data sets and high volume workloads. However, to provide such benefits, NoSQL systems sacrifice traditional consistency models and support for transactions typically available in database systems. Without transaction semantics, it is harder for developers to reason about the correctness of the interactions with the data. Therefore, it is important to support transactions for distributed database systems without sacrificing scalability.

In this thesis, I present new techniques for scalable transactions for scalable database systems. Distributed data stores need scalable transactions to take advantage of cloud computing, and to meet the demands of modern applications. Traditional techniques for transactions may not be appropriate in a large, distributed environment, so in this thesis, I describe new techniques for distributed transactions, without having to sacrifice traditional semantics or scalability.

I discuss three facets to improving transaction scalability and support in distributed database systems. First, I describe a new transaction protocol that reduces the response times for distributed transactions. Second, I propose a new transaction programming model that allows developers to better deal with the unexpected behavior of distributed transactions. Lastly, I present a new scalable maintenance algorithm for convergent join views. Together, the new techniques in this thesis contribute to providing scalable transactions for modern, distributed database systems.

To my wife and children

Contents

Contents

List of Figures

List of Tables

1 Introduction
1.1 Trends of Large-Scale Applications
1.2 Emergence of Cloud Computing
1.3 Scaling with NoSQL
1.4 Bringing 'SQL' Back to NoSQL
1.5 Scalable Transactions for Scalable Distributed Database Systems
1.6 Summary and Contributions
1.7 Dissertation Overview

2 Background
2.1 Introduction
2.2 Traditional Database Systems
2.3 Database Transactions
2.4 Scaling Out Database Management Systems
2.5 Transactions in a Distributed Setting
2.6 Materialized Views
2.7 Summary

3 A New Transaction Commit Protocol
3.1 Introduction
3.2 Architecture Overview
3.3 The MDCC Protocol
3.4 Consistency Guarantees
3.5 Evaluation
3.6 Related Work
3.7 Conclusion

4 A New Transaction Programming Model
4.1 Introduction
4.2 The Past, The Dream, The Future
4.3 PLANET Simplified Transaction Programming Model
4.4 Advanced PLANET Features
4.5 Geo-Replication
4.6 Evaluation
4.7 Related Work
4.8 Conclusion

5 A New Scalable View Maintenance Algorithm
5.1 Introduction
5.2 Motivation
5.3 Goals for Scalable View Maintenance
5.4 Existing Maintenance Methods
5.5 Possible Anomalies
5.6 Scalable View Maintenance
5.7 SCALAVIEW Algorithm
5.8 Proofs
5.9 Evaluation
5.10 Related Work
5.11 Conclusion

6 Conclusion
6.1 Contributions
6.2 Future Work
6.3 Conclusion

Bibliography

List of Figures

2.1 Typical model for a single-server database system
2.2 Typical scalable architecture for distributed database systems

3.1 MDCC architecture
3.2 Possible message order in MDCC
3.3 TPC-W write transaction response times CDF
3.4 TPC-W throughput scalability
3.5 Micro-benchmark response times CDF
3.6 Commits/aborts for varying conflict rates
3.7 Response times for varying master locality
3.8 Time-series of response times during failure (failure simulated at 125 seconds)

4.1 Round trip response times between various regions on Amazon's EC2 cluster
4.2 Client view of PLANET transactions
4.3 PLANET transaction state diagram
4.4 Sequence diagram for the MDCC classic protocol
4.5 Transaction outcomes, varying the timeout (20,000 items, 200 TPS)
4.6 Commit & abort throughput, with variable hotspot (200,000 items, 200 TPS)
4.7 Average response time, with variable hotspot (200,000 items, 200 TPS)
4.8 Commit throughput, with variable client rate (50,000 items, 100 hotspot)
4.9 Commit response time CDF (50,000 items, 100 hotspot)
4.10 Transaction types, with variable data size (200 TPS, uniform access)
4.11 Average commit latency, with variable data size (200 TPS, uniform access)
4.12 Admission control, varying policies (100 TPS, 25,000 items, 50 hotspot)
4.13 Admission control, varying policies (400 TPS, 25,000 items, 50 hotspot)

5.1 History and corresponding conflict dependency graph for example anomaly
5.2 Simple three join for the micro-benchmark
5.3 View staleness CDF
5.4 View staleness CDF (across three availability zones)
5.5 Data size amplification as a percentage of the base table data
5.6 Band join query for micro-benchmark

5.7 View staleness CDF for band joins
5.8 View staleness CDF scalability
5.9 Throughput scalability
5.10 Simple two table linear join query (Linear-2)
5.11 Simple three table linear join query (Linear-3)
5.12 Simple four table star join query (Star-4)
5.13 View staleness CDF for different join types

List of Tables

3.1 Definitions of symbols in MDCC pseudocode

Acknowledgments

I would like to express my sincere gratitude to my advisor, Michael Franklin, for his guidance throughout my research. He always provided me with great insight into my work, and really taught me to think critically about the core research problems. Also, as someone who had no prior research experience, I am extremely grateful for the opportunity he gave me to work with him in the AMPLab. Through this opportunity, I was able to collaborate with extremely bright professors and students, and this really shaped my graduate research career.

I am also grateful for the joint work and insight from my research collaborators. I worked very closely with Tim Kraska when he was a postdoc in the AMPLab, and he was instrumental in helping me cultivate the ideas in this thesis. I would like to thank Alan Fekete, for providing me with invaluable feedback on my research. I feel very fortunate to have the opportunity to work with Alan and to experience his expertise in database transactions. I am also thankful for Sam Madden, who helped me formulate the initial ideas for my work, when he was visiting the AMPLab. It is so incredible that I had been given the opportunity to interact with such great experts in the database community.

I would also like to express my appreciation for the rest of my dissertation committee, Ion Stoica and John Chuang. They contributed thoughtful comments to my research, and really helped me to improve this thesis. It is easy to be too close to my own work, so their perspectives as outside experts helped me to see the bigger picture.

Throughout my research, there have been many researchers who contributed to shaping my research, papers, and presentations. I would like to thank: Peter Alvaro, Michael Armbrust, Peter Bailis, Neil Conway, Dan Crankshaw, Ali Ghodsi, Daniel Haas, Sanjay Krishnan, Nick Lanham, Haoyuan Li, Evan Sparks, Liwen Sun, Beth Trushkowsky, Shivaram Venkataraman, Jiannan Wang, Reynold Xin.

My family was an enormous part of my journey through graduate studies, and I am deeply grateful for all of them. My parents were very supportive of my decision to leave a great job in industry, and to enter the PhD program at UC Berkeley. Their encouragement and also financial support were integral to my graduate studies. I also want to thank my in-laws, who were equally supportive of my career change, and also provided financial support. My graduate studies would have been impossible without the support and encouragement from my parents and in-laws.

My wife, Herie, and my children, Nathan and Luke, were so important to my graduate studies. Throughout the five years of my PhD, Herie's love, support, encouragement, and prayers were so consistent and were so critical to my daily work. She also had incredible patience with all the aspects of being married to a graduate student. She was my loving partner throughout all my studies, and I thank her so much for always being there for me. Nathan and Luke were both born during my PhD program, and I am very thankful for them. They bring me so much joy, and their excitement and energy are rejuvenating. They are a major source of motivation in my life.

I also want to thank God, and His overflowing love and mercy in my life. He has blessed me and answered my prayers in so many ways during my PhD studies. I am eternally thankful for His unending love, the redemptive power of the death and resurrection of His son, Jesus Christ, and the fellowship of His Holy Spirit. And last but not least, I would like to thank my family at Radiance Christian Church for their prayers and support during my graduate studies.

Chapter 1

Introduction

1.1 Trends of Large-Scale Applications

Modern applications are growing along various dimensions such as the number of users, complexity, and data size, as more and more people are connecting to the Internet through various devices. As Internet access becomes more pervasive, applications can experience exponential growth of users from all around the world. Because the user base can grow very large and very quickly, modern applications face new scalability challenges. Applications need database systems capable of handling the large size and growth of the demand. To meet the demands of their users, companies such as Amazon, Facebook and Twitter need to scale their database systems. For example, the micro-blogging service Twitter saw its usage grow from 400 tweets per second in 2010, to 5,700 tweets per second in 2013, with a peak of 143,199 tweets per second [60]. Also, the social networking site Facebook ingests over 500 TB of data each day, which includes 2.7 billion likes [45]. Handling the rapid growth in usage, and managing the sheer size of the data are some of the challenges modern applications face. Therefore, many companies develop custom database systems in order to scale with the growth and large size of the data.

1.2 Emergence of Cloud Computing

Along with the growing demands of modern applications and workloads, cloud computing has also gained prominence throughout the industry [9]. Cloud computing refers to the services and applications running in a large cluster or data center of servers. A private cloud is a cluster of servers operated for a single organization, and several large Internet companies such as Google and Facebook manage their own private cloud. A public cloud is a cluster and related services available for public use, and some examples include Amazon Elastic Compute Cloud (EC2) and Rackspace Cloud Servers. Whether it is public or private, cloud computing has gained popularity because of the numerous benefits it can provide. A major benefit of cloud computing is the cost advantage from economies of scale. A very large cluster of servers can enable a factor of 5 decrease in the cost of electricity, networking bandwidth, and maintenance compared to a smaller data center [9].

Cloud computing can also benefit by utilizing heterogeneous, cheaper, commodity hardware for resources. Instead of running applications on only a single powerful machine, a cloud, or cluster, can distribute applications over many inexpensive commodity servers. This makes it possible to increase resources incrementally, with correspondingly incremental costs. Elasticity of resources and horizontal scalability are other significant benefits of cloud computing. Elasticity is the ability to dynamically adjust the allocated resources to adapt to the current demand. Horizontal scalability (or scaling out) is the property that when additional servers, or nodes, are added to a system, the performance increases proportionally to the added resources. Because of all of these benefits, many large companies like Google, Facebook and Microsoft deploy large clusters for internal computing use. There are also many clouds like Google Compute Engine, Amazon EC2, and Microsoft Azure that provide cloud infrastructure for public use.

1.3 Scaling with NoSQL

Because of the need for scalable data management solutions and the emergence of cloud computing, modern applications have turned to scalable distributed data stores on commodity hardware as their solution. Although the term data store can be general, in this thesis I use it interchangeably with database system, defined as a software system designed for maintaining and managing data. While there are various techniques for scaling data stores, a class of systems called NoSQL data stores have gained popularity for their scalability and availability properties. NoSQL systems mainly simplify the design and limit the interface in order to provide high scalability and performance for massive data sets. One such example is the distributed key-value store. A distributed key-value store is a system that manages a collection of key-value pairs, where the value is the data to be stored, and the key is an identifier for the data. Distributed key-value stores forgo complex queries and transactions and only provide limited access to a single key or a contiguous range of keys for scalability and performance benefits. Key-value stores behave more like a hash table than a fully-featured database system.

In order to provide easy horizontal scalability and fault tolerance, these NoSQL systems sacrifice traditional models of consistency and drop support for general transactions typically found in traditional database systems. For example, NoSQL systems tend to provide eventual consistency, which guarantees that if there are no new updates for a data item, the value of the data item will eventually converge to the latest update [81]. Supporting consistent transactions in a distributed system requires network communication (more details can be found in Section 2.5), and can hinder the system's ability to scale out to additional resources. Therefore, NoSQL systems abandon transactions in favor of scalability. For example, Amazon's Dynamo [33] is a massively scalable key-value data store that sacrifices transactions and strong consistency for high availability, and Google's Bigtable [24] is a scalable distributed database system that limits transactions to contain access to a single key. These new examples of distributed data stores achieve the scalability required for massive data sets and large workloads by eliminating communication and coordination, by weakening consistency models or by not supporting general transactions. Horizontal scalability is very simple for these NoSQL systems, because coordination is not necessary between distributed servers, and each server operates independently from others. Therefore, NoSQL systems can increase performance and capacity by adding additional servers, or nodes.

While NoSQL storage systems have become very popular for their scalability, some applications need transactions and consistency. For example, Google designed Megastore [13] to support transactions, because the correctness of transactions is easier for developers to reason about, and many applications cannot tolerate the anomalies possible with eventual consistency. Therefore, even though the scalability of NoSQL systems is very desirable in the cloud computing age, giving up all the semantics that traditional databases have provided is not always worth it.

1.4 Bringing ’SQL’ Back to NoSQL

NoSQL data stores have been popular for several years because of their high scalability, availability and fault tolerance, but there has been a recent trend towards increasing the support for transactions for these scalable systems. One common technique for supporting transactions in distributed database systems is to horizontally partition the data across the servers, and to execute local transactions within each partition. Horizontal partitioning involves splitting up a single collection (or a database table) and distributing the data items, or rows, into several collections, or tables. Restricting transactions to single partitions eliminates the need to coordinate with other partitions. However, to provide more general transactions across data in multiple partitions, additional communication and coordination are required between the distributed servers, and this can affect the scalability of the system. Coordination can restrict scalability through decreased total throughput, increased latency, and possible unavailability due to failures. Common coordination techniques include the Paxos [54] consensus protocol, and the two-phase commit protocol (more details can be found in Section 2.5). For example, Google's Megastore [13] provides transactions by using the Paxos protocol within each partition to reliably store transaction operations for the data, and by using two-phase commit for transactions spanning multiple partitions. For distributed transactions across several partitions, using two-phase commit will incur additional latency and limit scalability, especially in a wide-area network environment. Two-phase commit uses two rounds of communication in order to fully commit a transaction, and during that time, locks are held, reducing the amount of concurrency possible.

This trend of introducing more traditional database concepts such as transactions and strong consistency is promising, but the cost is sometimes too great. While traditional database techniques can be used to provide transactions in distributed systems, they were designed when cloud computing and large distributed systems were not common. In the cloud computing era, distributed systems are the norm, so algorithms need to consider the costs of network communication required for coordination. Novel techniques should be considered for the new types of distributed system deployments, and new application workloads. For example, centralized algorithms are not appropriate for large, distributed environments, because with heavy workloads, they can easily become the bottleneck and cripple the performance and scalability of the system.

1.5 Scalable Transactions for Scalable Distributed Database Systems

In this thesis, I propose new techniques to implement scalable transactions for scalable database systems. Since applications need data stores that can scale out with the high usage workloads and large data sizes, data stores need to be able to scale transaction management as well. Since traditional transaction techniques and algorithms were developed before the emergence of cloud computing, they may not be appropriate for modern deployments where distributing data across large clusters is prevalent and not the exception. Before cloud computing, horizontal scalability was not a major concern, but now it is critical for meeting the demands of modern applications. I present new ways to improve the transaction performance and experience for scalable distributed databases, without sacrificing general transaction support or scalability. This thesis addresses three facets to improving transaction scalability and support in distributed databases.

1.5.1 New Transaction Commit Protocol

This thesis describes a new transaction commit protocol to reduce the response times of distributed transactions. Since distributed transactions have to communicate and coordinate with other servers in the system, reducing the number of round-trip messages is critical to reducing the response times of transactions. Since two-phase commit, the commonly used protocol, requires two round-trips and also has certain limitations in fault tolerance, it can negatively impact the response times and scalability of distributed transactions, especially in the wide-area network. I propose a new transaction commit protocol based on Generalized Paxos that executes with faster response times, provides durability for distributed transactions and enforces domain integrity constraints (developer-defined rules specifying the domain for data values).

1.5.2 New Transaction Programming Model

This thesis also proposes a new transaction programming model that exposes more details to application developers, providing them with better control over their distributed transactions, since failures and delays are possible with communication and coordination. Traditional transaction programming models are very simple, but can be inflexible especially when unexpected events occur during transaction processing. In distributed database systems, there can be many sources of unpredictability, such as server failures or network delays, so developers should be able to adapt to those situations. I present a new transaction programming model that is more appropriate in distributed settings, and also supports optimizations to improve the performance of distributed transactions.

1.5.3 New Scalable, Convergent View Maintenance Algorithm

In this thesis, I also describe a new scalable view maintenance algorithm for convergent views. Materialized views and derived data allow faster and more complex queries for simple distributed data stores like key-value stores. There are already existing techniques for maintaining materialized views in databases, but they depend on database transactions and centralized algorithms. A centralized algorithm can impose an undesired bottleneck on a system and is not scalable, because all processing is localized to one server. I propose new distributed techniques for maintaining views in a scalable way.

1.6 Summary and Contributions

In this thesis, I present new techniques for providing and using scalable transactions for modern, distributed database systems. There are three main components of scalable distributed transactions addressed in this thesis: a new transaction commit protocol for executing transactions, a new transaction programming model for interacting with distributed transactions, and a new view maintenance algorithm for scalably maintaining derived data. The key contributions of this thesis are:

• I propose MDCC, Multi-Data Center Consistency, a new optimistic commit protocol, which achieves wide-area transactional consistency while requiring only one message round-trip in the common case.

• I describe a new approach using quorum protocols to ensure that domain constraints are not violated.

• I present experimental results using the TPC-W benchmark showing that MDCC provides strong consistency with costs similar to eventually consistent protocols.

• I introduce PLANET, Predictive Latency-Aware NEtworked Transactions, a new transaction programming model that exposes details of the transaction state and allows callbacks, thereby providing developers with the ability to create applications that can dynamically adapt to unexpected events in the environment.

• I demonstrate how PLANET can predict transaction progress using a novel commit likelihood model for a Paxos-based geo-replicated commit protocol.

• I present optimizations for a strongly consistent database and an empirical evaluation of PLANET with a distributed database across five geographically diverse data centers.

• I describe an investigation of the anomalies that can arise when applications attempt to maintain views in a scalable way in partitioned stores.

• I propose SCALAVIEW, a new scalable algorithm for maintaining join views while preventing or correcting the possible anomalies.

• I present an evaluation of view staleness, overheads, and scalability of SCALAVIEW compared with other existing techniques.

1.7 Dissertation Overview

The rest of this dissertation is organized as follows. Chapter 2 presents background on database systems and distributed transactions. Chapter 3 introduces MDCC, a new transaction commit protocol for faster distributed transactions while maintaining consistency. Chapter 4 describes PLANET, a new transaction programming model for helping developers better handle distributed transactions. Chapter 5 discusses SCALAVIEW, a scalable algorithm for maintaining convergent join views for distributed databases. Chapter 6 concludes this thesis with a discussion of the results and outlines areas for future work.

Chapter 2

Background

2.1 Introduction

Before discussing how transactions can be made more scalable for large-scale distributed database systems, it is important to understand what transactions are and how current systems execute them. This chapter provides background information on transactions in single-server database systems and in distributed database systems. Section 2.2 presents background on traditional database systems. Section 2.3 discusses the semantics of database transactions. Section 2.4 describes how modern, distributed storage systems typically scale out to handle greater data size and transaction workload. Section 2.5 describes various techniques commonly used for distributed transactions, and Section 2.6 gives an overview of different algorithms for materialized views and how those algorithms use transactions to maintain views. Examining the restrictions and scalability limitations of current distributed transactions and algorithms provides a better foundation for understanding how the rest of this thesis provides scalable solutions for those issues.

2.2 Traditional Database Systems

In this thesis, traditional database systems refer to single-server, relational [27] database systems. The term “traditional” does not imply antiquated or obsolete, since there are many traditional database systems being used today, like MySQL or PostgreSQL. These single-server systems support transactions using well-known techniques, many of which are discussed in this chapter.

2.2.1 The Relational Model

In this section, I briefly describe the relational model that is common to many of the single-server database systems today. In the relational model, a database is a collection of relations (also known as tables). Each relation consists of a set of tuples, also known as rows or records.

A tuple is an ordered collection of attribute values, or values. All tuples for a given relation have the same set of attributes, defined by the schema of the relation. The schema of the relation is a collection of attribute names and their data types (Integer, String, etc.). While the relational model defines how the data is logically organized and represented, relational algebra provides the foundation for querying the database. Relational algebra consists of operators that transform relations to produce new relations. By composing these relational operators, a wide variety of queries can be expressed. In what follows, I describe a subset of the relational operators that are relevant for the queries in the rest of this thesis.

Selection: The selection operator is written as σ_γ(R), where R is a relation, and γ is a predicate expression that evaluates to true or false. The operator will produce a result relation that is a subset of relation R, where a tuple from R is in the result if and only if the predicate γ evaluates to true. For example, the expression σ_{age < 20}(Students) will return every Student who is younger than 20.

Projection: The projection operator is written as π_{a1, ..., an}(R), where R is a relation, and a1, ..., an is a set of attribute names from the schema of R. The operator will produce a result relation with all the tuples of R, but with a subset of the attributes, a1, ..., an. Since the result must also be a relation, any duplicate tuples in the result are removed. For example, the expression π_{name, age}(Students) will return only the name and age of every Student, even though there are other attributes of Student.

Cartesian Product: The Cartesian product (or cross product) operator is written as R × S, where R and S are relations. The operator will produce a relation where every tuple of R is concatenated together with every tuple of S, to construct new resulting tuples. The resulting schema is the union of attributes of the schemas from R and S. If the schemas of R and S have attributes in common, then those attribute names must be renamed to eliminate the naming conflict.

Join: There are several different types of joins, but a common type of join is called the natural join. The natural join operator is written as R ⋈ S, where R and S are relations. The natural join is similar to the Cartesian product, because it produces tuples which are combined from the tuples from R and the tuples from S. However, a tuple is in the result of the natural join if and only if the R tuple and S tuple are equal for the attribute names common to both R and S.

A more general type of join is the θ-join, and it is written as R ⋈_θ S. θ is a join predicate which evaluates to true or false. A tuple is in the result set of the θ-join if and only if the predicate θ evaluates to true. The θ-join is equivalent to σ_θ(R × S).
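To make these operators concrete, the following is a minimal, illustrative Python sketch (not part of any system discussed in this thesis) that implements selection, projection, and natural join over in-memory relations represented as lists of dictionaries; the Students and Enrolled relations and all names are made up for the example.

    # A relation is modeled as a list of dicts (tuples) sharing the same attribute names.

    def select(relation, predicate):
        # Selection: keep only the tuples for which the predicate evaluates to true.
        return [t for t in relation if predicate(t)]

    def project(relation, attributes):
        # Projection: keep only the given attributes and remove duplicate tuples.
        seen, result = set(), []
        for t in relation:
            reduced = tuple((a, t[a]) for a in attributes)
            if reduced not in seen:
                seen.add(reduced)
                result.append(dict(reduced))
        return result

    def natural_join(r, s):
        # Natural join: combine tuples of R and S that agree on all common attributes.
        common = set(r[0]) & set(s[0]) if r and s else set()
        return [{**tr, **ts} for tr in r for ts in s
                if all(tr[a] == ts[a] for a in common)]

    students = [{"name": "Ana", "age": 19}, {"name": "Bo", "age": 22}]
    enrolled = [{"name": "Ana", "course": "CS186"}]
    print(select(students, lambda t: t["age"] < 20))   # selection: age < 20
    print(project(students, ["name"]))                 # projection onto name
    print(natural_join(students, enrolled))            # join on the shared "name" attribute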

2.2.2 Single-Server Model

At a high level, the single-server paradigm is quite simple. The database system resides on a single server, and the applications and users access the server in order to interact with the data. Figure 2.1 shows the high level overview of how applications access a single-server database system. To query or modify the data, all applications and users must communicate with the single instance of the database management system, which is usually on a separate server. Many elements make up a database system, but the details of each individual component are not critical to understanding how applications interact with a database in the single-server model. For comparison, Section 2.4.4 discusses the distributed database system model.


Figure 2.1: Typical model for a single-server database system

2.3 Database Transactions

A transaction is the basic unit of work for interacting with a database system. For the purposes of this thesis, a transaction is defined as a sequence of read and write operations of data items in the database, and can have one of two outcomes: commit or abort. When a transaction completes all of its operations successfully, the transaction commits and its effects are permanently reflected in the database. When a transaction stops before it completes all of its operations, the transaction aborts and any operations already executed are undone, or rolled back. A transaction may abort for various reasons such as a user request or an internal error in the database system. This thesis is primarily focused on transactions that contain writes of one or more data items, so read-only workloads are not considered. The following is a simple example of a transaction with a sequence of operations to transfer funds ($100) from bank account A to bank account B.

    START TRANSACTION
      balanceA ← READ(A)
      balanceB ← READ(B)
      WRITE(A, balanceA − 100)
      WRITE(B, balanceB + 100)
    COMMIT WORK

This transaction will read the balances from accounts A and B, and then remove $100 from account A and add it to account B. All of these operations belong to the same transaction, or unit of work, so they are logically related to each other. This means that all of the operations, or none of the operations, should succeed. This property is called atomicity. However, it can be difficult to provide atomicity because database systems may abort a transaction because of an internal error, or may fail or crash. Therefore, to ensure atomicity, database systems must carefully manage transactions to remove the effects of partial execution of transactions. After the transaction commits, the results of the operations are made permanent in the database and will not be lost even through a system failure. This property is called durability. Along with atomicity and durability, database systems ensure two other properties for transactions, consistency and isolation. Together, the four fundamental properties of transactions are known as the ACID properties: Atomicity, Consistency, Isolation, and Durability. The following section further describes the ACID properties.

2.3.1 ACID Properties

There are four fundamental properties that database systems ensure for transactions.

2.3.1.1 Atomicity

Atomicity is the property that either all of the operations in a transaction are successful, or none of them are. With this property, developers do not have to worry that only some of the operations in a transaction will complete, for whatever reason. When a transaction cannot complete successfully for any reason such as internal errors, system failures, or a user decision to abort, the database system ensures that the partial execution of the incomplete transaction does not persist. The database system achieves atomicity by undoing the operations of the partially executed transaction. By undoing those operations, the database system can return to the state before the transaction began, and developers do not have to be concerned with incomplete transactions. This can greatly ease application development.

2.3.1.2 Consistency

The database system ensures that every transaction transforms the state of the database from one consistent state to another consistent state, even when many transactions are executing concurrently, or when there are failures in the system. The notion of “consistent” state is dependent on the developer and application, and it is assumed that each individual successful transaction would keep the database consistent if run by itself with no concurrency or failures. Maintaining domain integrity constraints is an example of keeping consistent database state. Domain integrity constraints are user-defined rules for the valid domain for an attribute. A rule defined to disallow product sale prices from being negative (price >= 0) is an example of an integrity constraint.

2.3.1.3 Isolation

Isolation is the property that when a transaction executes, it does not observe the effects of other concurrent operations of other transactions. Each transaction views the database as if it were the only one executing in the system, even though it may not be. The database system guarantees that the net result of concurrent transactions with interleaved operations is equivalent to executing the transactions individually, in some serial order. For example, if there are two concurrent transactions, T1 and T2, the database system will guarantee that the states of the database will be identical to running all of T1 and then T2, or running all of T2 and then T1. There are varying levels of isolation that database systems provide, and they are described further in Section 2.3.2.

2.3.1.4 Durability

Durability is the property that the effects of a transaction are never lost, after the transaction has committed. This means even if the database system or various components fail, the effects should still persist. Database systems usually achieve durability of transactions by using a log, which records the operations of the transactions in the system. The database log is a sequential, in-order history of database operations that is stored on storage that can survive system failures. The log must persist on a stable storage device such as a disk, because the contents in memory are lost when a system crashes. Systems use the Write-Ahead Logging (WAL) protocol for the log, which requires that any update to the database must be recorded and durable in the log before that update is durable in the database. WAL also requires that log entries must be flushed to durable storage in log sequence order. Because all operations are recorded in the log, a transaction is considered committed after all of its operations (including the commit record) are in the log on durable storage. So, even when the database system restarts after a crash, it can reconstruct and recover the state of the committed transactions by using the information in the log. Therefore, even with failures, the effects of committed transactions are not lost and will continue to persist in the database.
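As an illustration of the write-ahead rule, the sketch below (hypothetical Python code, not the implementation of any particular system) appends and flushes log records to stable storage before applying the corresponding updates, and treats a transaction as committed once its commit record has been forced to the log.

    import os

    class WriteAheadLog:
        def __init__(self, path):
            self.f = open(path, "ab")                  # append-only log file

        def append(self, record: str):
            self.f.write(record.encode() + b"\n")      # log records are appended in order

        def force(self):
            self.f.flush()
            os.fsync(self.f.fileno())                  # force log contents to stable storage

    def commit(wal, txn_id, updates, database):
        # Write-ahead rule: every update and the commit record are durable in the
        # log before the database itself is modified.
        for item, value in updates:
            wal.append(f"UPDATE {txn_id} {item}={value}")
        wal.append(f"COMMIT {txn_id}")
        wal.force()                                    # the transaction is now committed
        for item, value in updates:                    # now safe to apply updates to the data
            database[item] = value

    db = {}
    commit(WriteAheadLog("db.log"), "T1", [("A", 900), ("B", 1100)], db)
    print(db)

After a crash, a real system would replay the log from the last checkpoint to reconstruct the effects of committed transactions; that recovery step is omitted from this sketch.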

2.3.2 Isolation Levels

Database systems guarantee the isolation property for transactions executing concurrently. Transactions are not exposed to the actions of other concurrent transactions in the system.

However, there are varying levels of isolation [15] that a database system can provide. In general, stronger levels of isolation lead to lower degrees of concurrent execution, and lower levels of isolation enable higher degrees of concurrency. This section discusses the common isolation levels available in database systems.

Some terminology is commonly used in discussing isolation levels. A history is an ordering of operations (reads and writes) of a set of transactions. Although transactions execute concurrently, the ordering of the interleaved operations can be modeled by a history. A conflict is any two operations on the same data item from distinct transactions where one of the operations is a write. A dependency graph of transactions can be generated from any given history. Every conflict generates an edge in the graph. If op1 of T1 happens before op2 of T2 in the conflict, then there is an edge from T1 to T2 in the graph. Two histories are equivalent if they both have the same dependency graph for the same set of transactions. If a history is equivalent to some serial history (a history from transactions executing without any concurrency), then it is considered serializable. For two isolation levels L1 and L2, L1 is weaker than L2 (or L2 is stronger than L1) if all non-serializable histories of L2 are possible in L1, and there is at least one non-serializable history that is possible in L1 but impossible in L2.
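The following short sketch (illustrative only, not from the dissertation) builds the conflict dependency graph described above from a history of (transaction, operation, item) entries, and tests whether the history is serializable in the conflict sense by checking the graph for cycles.

    def conflict_edges(history):
        # Edge Ti -> Tj if an operation of Ti precedes a conflicting operation of Tj
        # (same data item, distinct transactions, at least one of the two is a write).
        edges = set()
        for i, (ti, op_i, x_i) in enumerate(history):
            for tj, op_j, x_j in history[i + 1:]:
                if ti != tj and x_i == x_j and "w" in (op_i, op_j):
                    edges.add((ti, tj))
        return edges

    def is_conflict_serializable(history):
        # The history is serializable if the dependency graph has no cycle: repeatedly
        # remove nodes with no incoming edges; if none can be removed, a cycle exists.
        edges = conflict_edges(history)
        nodes = {t for t, _, _ in history}
        while nodes:
            free = {n for n in nodes if all(dst != n for _, dst in edges)}
            if not free:
                return False
            nodes -= free
            edges = {(a, b) for a, b in edges if a in nodes and b in nodes}
        return True

    # T1 and T2 interleave on items A and B; the only edges point from T1 to T2,
    # so this history is equivalent to the serial order T1 then T2.
    h = [("T1", "r", "A"), ("T2", "w", "A"), ("T1", "w", "B"), ("T2", "r", "B")]
    print(is_conflict_serializable(h))   # True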

2.3.2.1 Serializable

Serializable isolation is the strongest level of isolation provided by ANSI SQL database systems [15]. This level of isolation is related to the concept of serializability. As stated above, serializability is the property that a history of interleaved execution of concurrent transactions is equivalent to a history of the same transactions running in some serial order (no concurrency). Each transaction executes as if it is fully isolated from others, even when its operations are potentially interleaved with operations of other transactions. Because concurrent execution is equivalent to a serial order, serializable isolation prevents any anomalies related to the interleaved execution.

Serializable isolation is typically achieved with two-phase locking. Two-phase locking (2PL) is a commonly used locking protocol for database systems to ensure serializability. 2PL has simple rules for locking: a transaction must acquire read-locks (shared locks) for any data item it reads, and write-locks (exclusive locks) for any data item it writes, and no additional locks may be acquired once any lock is released. It is called two-phase locking because every transaction goes through two phases. The transaction first acquires locks in the growing phase, and then releases its locks in the shrinking phase. By implementing 2PL, transactions are isolated from others, and this property results in serializable isolation. Since locks are held from the growing phase to the shrinking phase, this level of isolation sacrifices some concurrent execution for serializable isolation. Also, database systems must detect and resolve potential deadlocks when using a locking protocol.

Serializability is also possible with mechanisms that do not use locking. Optimistic concurrency control (OCC) is a common approach that avoids the overhead of locks. OCC avoids using locks by using three phases. In the Read phase, the transaction reads all the values it needs for its execution. During this phase, the transaction does not write into externally visible locations, but rather to private locations local to the transaction. In the Validation phase, the database system checks to see if the reads and writes of the transaction conflict with other concurrently executing transactions. If the validation passes, then all the private writes are written to the database in the Write phase. If the validation fails, then the transaction cannot commit and must be restarted. OCC is called optimistic because it allows all concurrent execution, and then later checks to see if there could be isolation issues. Because it is optimistic, OCC is typically appropriate for low-contention scenarios. However, in scenarios with high-contention, the OCC validation will fail more often, which leads to more transaction restarts, thus reducing the performance of the database system.
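Below is a minimal sketch of the OCC Validation and Write phases, under the simplifying assumption of backward validation against transactions that committed after the validating transaction started; the class and variable names are illustrative and do not correspond to any real system's API.

    class OptimisticTxn:
        def __init__(self, start_ts):
            self.start_ts = start_ts
            self.read_set = set()          # items read during the Read phase
            self.write_set = {}            # item -> new value, buffered privately

    clock = 0                              # global commit timestamp counter
    committed = []                         # (commit_ts, set of items written)

    def occ_commit(txn, db):
        # Validation phase: fail if any transaction that committed after txn started
        # wrote an item that txn read.
        global clock
        for commit_ts, written in committed:
            if commit_ts > txn.start_ts and written & txn.read_set:
                return False               # conflict detected: abort and restart the txn
        # Write phase: install the privately buffered writes.
        clock += 1
        db.update(txn.write_set)
        committed.append((clock, set(txn.write_set)))
        return True

    db = {"A": 1}
    t = OptimisticTxn(start_ts=clock)
    t.read_set.add("A")
    t.write_set["A"] = db["A"] + 1
    print(occ_commit(t, db), db)           # True {'A': 2} when no concurrent writer touched A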

2.3.2.2 Repeatable Read

Repeatable read is an isolation level with fewer guarantees compared to serializable isolation. Repeatable read guarantees that a transaction only reads data items and data item updates from committed transactions. It also ensures that any data item a transaction T reads or writes is not updated by another transaction until T completes. However, repeatable read isolation allows phantom read anomalies, which arise when a transaction reads additional data items on the second execution of the identical query. For example, query Q reads all users with age > 20, and transaction T executes Q twice. If another transaction inserts a new data item with age > 20 in between the two executions of query Q, then transaction T will see additional items in the result not found in the first result. With a lock-based implementation, repeatable read isolation allows phantom reads to occur because locks on predicates or ranges (age > 20 in the example) are not utilized.

2.3.2.3 Read Committed

Read committed is a level that ensures fewer guarantees than repeatable read isolation. With a lock-based implementation, read committed allows more concurrent execution of transactions by releasing read locks early. For any data items read in the transaction, read locks are acquired, but they are released as soon as the query completes, and not held until the end of the transaction. If the transaction tries to read the same data items again, the values may have changed because another transaction may have modified those items and committed. This anomaly is called fuzzy reads. However, data items that a transaction reads are guaranteed to have been already committed by some earlier transaction. Currently, Microsoft SQL Server, Oracle Database, and PostgreSQL all use read committed isolation as the default.

2.3.2.4 Read Uncommitted

Read uncommitted is an isolation level weaker than read committed. Read uncommitted isolation allows dirty reads, which is an anomaly where a transaction may read data values which never existed in any consistent state of the database. For example, a transaction can read data items written by another transaction still in progress, so if the other transaction later aborts, the previously read data items never really existed in the database. With a lock-based implementation, read uncommitted does not acquire any read locks, so it allows higher levels of concurrency, while allowing transactions to read uncommitted values.

2.3.2.5 Snapshot Isolation

Snapshot isolation is a level that is typically implemented with Multi-Version Concurrency Control (MVCC) rather than locking. In relation to the other levels, snapshot isolation is stronger than read committed, but weaker than serializable isolation. MVCC works by creating a new version of each data item each time it is modified, and allows different transactions to see different versions of the data. With MVCC, different transactions read different snapshots of the database, without having to use read locks. A snapshot is the state of the database at a particular time, which is usually the time when the transaction started. Avoiding locks increases the concurrency and can greatly improve performance. Transaction T can commit only when the updated data has not been changed by other concurrent transactions since the start of transaction T. This validation is checked using global start and commit timestamps. If a value was modified, it means another transaction already committed an update to that value, so transaction T is aborted.
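The sketch below (a simplified, hypothetical model rather than any real engine's implementation) shows the two pieces of snapshot isolation described above: reads return the newest version no later than the transaction's start timestamp, and a writer commits only if none of the items it updated were committed by another transaction after it started.

    versions = {}          # item -> list of (commit_ts, value), in commit order
    clock = 0              # global timestamp counter

    def snapshot_read(item, start_ts):
        # Return the newest committed version visible in the transaction's snapshot.
        visible = [v for ts, v in versions.get(item, []) if ts <= start_ts]
        return visible[-1] if visible else None

    def snapshot_commit(start_ts, writes):
        # Abort if any written item has a version newer than the snapshot (another
        # transaction committed an update to it first); otherwise install new versions.
        global clock
        for item in writes:
            history = versions.get(item, [])
            if history and history[-1][0] > start_ts:
                return False
        clock += 1
        for item, value in writes.items():
            versions.setdefault(item, []).append((clock, value))
        return True

    t1_start = clock
    snapshot_commit(t1_start, {"A": 10})          # T1 commits A=10
    t2_start = clock                              # T2 starts after T1 committed
    print(snapshot_read("A", t2_start))           # 10: visible in T2's snapshot
    print(snapshot_commit(t1_start, {"A": 20}))   # False: T1's snapshot is stale, so it aborts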

2.4 Scaling Out Database Management Systems

Because workloads and data sizes are growing, database management systems must be able to scale to handle the load and storage demands. Scaling out, or adding additional servers, is a common technique to increase performance and capacity of a system, especially in the age of cloud computing. Being able to add capacity by quickly and incrementally adding servers to a cluster has been a great benefit of cloud computing. Many database systems scale out by horizontally partitioning their data, and distributing the partitions across many servers. There are two main techniques to scale out database systems: sharded transactional database systems, and distributed NoSQL database systems. This section describes the techniques used in distributed database systems for scaling out.

2.4.1 Data Replication

Most distributed database systems replicate their data. Replication is when systems store multiple copies, or replicas, of the data at separate locations. Although replicas are allowed to store only a proper subset of the data, for the purposes of this thesis, full replication (replicas store a full copy of the data) is the main focus. There are several benefits for data replication:

Durability: With replication, distributed systems can provide durability by avoiding a single point of failure for the data. Even if one replica fails, the data is still intact in the other replicas. Therefore, data replication can provide durability even with the existence of failures.

Availability: Replication also provides availability of the data during failures. Because the data can be accessed from different replicas, even if there is a failure to one of the servers, the data can be retrieved from another copy.

Reduced Latency: With replication, separate replicas can be placed closer to the users and applications accessing the data. As the data is placed closer to the access point, the network latency of interacting with the data is reduced.

Scalability: Replication can also allow database systems to scale out to accommodate growth in the workload. Additional replicas can be added to the system to add capacity for adjusting to new workloads.

Because of these benefits, most database systems use data replication. However, there are many variations in how replicas are kept in-sync with each other, and the rest of this section discusses the different types of replication techniques relevant to this thesis.

2.4.1.1 Master-based vs. Distributed Replication

One aspect of data replication is where the replicas are updated. The two main alternatives are master-based replication and distributed replication. Master-based replication is a centralized technique that requires all updates to modify the master replica first, and then the master replica propagates the updates to the rest of the replicas. For partitioned database systems, there is usually a separate master for each partition. The master-based approach is advantageous because it is simple for the application to update all the data at the master, and the master has all the up-to-date data. However, since it is a centralized technique, the master can be a bottleneck, and if the master fails, the data will be unavailable until the master is restored.

In contrast to master-based replication, distributed replication does not require that all updates go through the master. Updates can be applied to different replicas in different orders. This technique is scalable since there is no designated master that may become the bottleneck, and a replica failure does not render the data unavailable. However, distributed replication can be more complicated because the data can be updated from multiple replicas. This means there is no single master that has the up-to-date data, and the replicas may not be in-sync with each other.

2.4.1.2 Synchronous vs. Asynchronous Replication

Another aspect of data replication is when updates to the data are propagated to replicas. The two alternatives are synchronous replication and asynchronous replication. With synchronous replication, all updates to replicas are part of the originating transaction, so the transaction cannot commit until all replicas are updated. With synchronous replication, all the replicas are up-to-date, and reading any of the copies will retrieve the latest data. However, there are drawbacks to synchronous replication. The transaction commit must wait for the replication to complete, so communicating with all the replicas will negatively impact the latency of transactions. Also, if a replica fails, the transaction will block, waiting for the replica to be restored.

Asynchronous replication differs from synchronous replication by separating the replica updates from the originating transaction. A transaction can commit before any of the replicas are successfully updated. Transaction commits do not have to wait for the replicas, so replication does not affect the response times of commits. However, asynchronous updates mean different replicas will not be exact copies of each other. This reduced level of consistency may be appropriate for some workloads, but is not suitable for all. Also, durability suffers when a replication failure occurs after the commit, but before any replica is successfully updated. In this scenario, the transaction will be considered committed without any of the replicas having the effects of that transaction, so the committed transaction would be lost.

2.4.1.3 Geo-Replication

Geo-replication is data replication that is performed across geographically diverse locations, which can be distributed all over the world. Geo-replication is not a specific protocol for replication, but it is a replication scheme for systems deployed across several different sites. Large Internet companies like Google and Facebook have made geo-replication popular, in order to reduce correlated failures like natural disasters, and to place replicas closer to users all around the world. However, the major drawback is the high network latency between the replica locations. Since sites may be geographically far from each other, the network latency between sites could be hundreds of milliseconds. Therefore, different replication techniques are required to optimize for this environment. Later in this thesis, I present new techniques for distributed transactions that address the challenges of geo-replication.

2.4.2 Sharded Transactional Database Systems

A popular way to scale relational database systems in the cloud is to shard the database. This technique involves horizontally partitioning the data, and storing each partition on separate database instances on separate servers. Sharding is a special case of horizontal partitioning, because the partitions do not have to be on the same database server. By partitioning the data onto multiple machines, the database system can scale out to handle more load, and store more data. Sharding is typically deployed with transactional relational database systems, such as MySQL or PostgreSQL, running separately for each individual partition. This technique enables ACID transactions within each shard of the database, but since each partition is a separate instance, there is no simple way to coordinate between all of them for queries or updates. For queries, external tools must be developed in order to combine results from all the partitions. Join queries are particularly difficult because more of the query processing must be done in the client and not the database system. With sharding, transactions cannot span multiple shards, so this limits the kinds of updates possible in the system. Database schemas and applications must be designed carefully in order to prevent cross-partition transactions. If the system is not designed carefully, resharding may be required. Resharding is the re-partitioning of data across all the individual databases, and it may be quite complex and expensive to perform.
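As a concrete illustration of client-side sharding, the sketch below hashes each key to one of a fixed set of database instances; the host names are placeholders, and a real deployment would also handle resharding and replica failover, which this sketch ignores.

    import hashlib

    # Placeholder shard hosts; each would run its own independent database instance.
    SHARDS = ["db-0.example.com", "db-1.example.com", "db-2.example.com"]

    def shard_for(key: str) -> str:
        # Hash the key and map it to one of the shards, so all rows with the same
        # key (e.g. one user's data) live on a single database instance.
        digest = hashlib.sha1(key.encode()).hexdigest()
        return SHARDS[int(digest, 16) % len(SHARDS)]

    # A transaction that touches only user 1001 stays on a single shard; a transfer
    # between users routed to different shards would need cross-shard coordination,
    # which plain sharding does not provide.
    print(shard_for("user:1001"), shard_for("user:2002"))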

2.4.3 Distributed NoSQL Database Systems NoSQL systems are a newer class of distributed database systems, that emphasize perfor- mance, scalability and availability. In focusing on simplicity and scalability, NoSQL data stores do not usually support relational data, or ACID transactions found in traditional database systems. There are several different types of NoSQL data stores, but distributed key-value stores are popular for addressing workloads similar to those of relational database systems. A distributed key-value store manages a collection of key-value pairs, where the value is the data to be stored and managed, and the key is an identifier for the value. The key- value data model provides two ways to operate on the data: put(key, value) and get(key). The interface put(key, value) is for inserting or updating the value for a specific key, and the interface get(key) is for reading the value for a specific key. An extension to the key- value data model is the ordered key-value data model, which enables access to a contiguous range of keys. The key-value data model allows applications to easily read and write data associated with keys. The simplified interface allows these systems to be very performant and scalable. However, because of the simple interface, distributed key-value stores do not support transactions across multiple keys. Some systems even provide lower levels of consistency for the value associated with a single key, in order to provide more availability during failures. In general, most of these data stores support durability of data and updates using replication. By duplicating the data on multiple servers, the data will not be lost even when a server fails. Bigtable [24] is a key-value store from Google, which inspired other similar systems such as Apache HBase [7]. The architecture is organized as a sorted map partitioned across many servers. Every key-value insert or update goes to a master tablet responsible for the key, and it provides durable, single-key transactions. Amazon Dynamo [33] and Apache Cassandra [6] are different in that they are quorum-based key-value stores that provide eventual consistency. The quorum-based protocol means a group (or quorum) of servers is responsible for a particular key, as opposed to a single master. These systems provide eventual consistency per key, meaning eventually, if new updates cease, the data item will converge to the correct answer. However, special merge rules must be defined in order to handle conflicting updates. By adjusting the read and write quorum, Dynamo can provide a variety of consistency and performance levels. However, operations can only involve a single CHAPTER 2. BACKGROUND 18

key at a time. Dynamo and Cassandra give up transactions and stronger consistency in favor of availability, performance, and scalability. Yahoo's PNUTS [30] is another key-value store, which provides single-key transactions with timeline consistency by routing all writes for a key to that key's master. However, asynchronous replication is used to replicate the key updates, so data loss may be possible during failures. To provide transactions containing writes to multiple records, Google developed Megastore [13] and Spanner [31]. Both systems horizontally partition the data across many servers. Megastore provides ACID transactions for each partition, but the transactions must be serialized per partition. This can greatly reduce transaction throughput. Transactions that span multiple partitions are strongly discouraged, because two-phase commit is required and greatly increases the latency. Spanner improves on Megastore by synchronizing time in the entire cluster of servers with atomic clocks, to provide global timestamp ordering for snapshot isolation. Both Megastore and Spanner use Paxos for storing log positions across a quorum of servers, and both must serialize commit log records per partition. Because transactions spanning multiple partitions must use two-phase commit, the increased latency limits the scalability of these systems.
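As a concrete illustration of the key-value data model discussed above, the following sketch (an in-memory stand-in, not the API of any particular system) shows the basic put/get interface plus the range scan added by the ordered key-value model.

```python
import bisect

class OrderedKeyValueStore:
    """In-memory sketch of an ordered key-value store interface."""

    def __init__(self):
        self._data = {}          # key -> value
        self._sorted_keys = []   # kept sorted to support range scans

    def put(self, key, value):
        # Insert or update the value for a single key (single-key atomicity only).
        if key not in self._data:
            bisect.insort(self._sorted_keys, key)
        self._data[key] = value

    def get(self, key):
        # Read the value for a single key, or None if the key is absent.
        return self._data.get(key)

    def scan(self, start_key, end_key):
        # Ordered extension: return all (key, value) pairs with start_key <= key < end_key.
        lo = bisect.bisect_left(self._sorted_keys, start_key)
        hi = bisect.bisect_left(self._sorted_keys, end_key)
        return [(k, self._data[k]) for k in self._sorted_keys[lo:hi]]
```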

2.4.4 Distributed Architecture While most distributed database systems have different designs and goals, their architectures do have similarities. A major commonality in the designs of these scalable database systems is that they are all shared nothing architectures [80]. In a shared nothing architecture, the nodes (servers) do not share memory or disk resources with each other. Each server has its own private memory and disk, and can be viewed as an independent unit in the system. Because each node operates independently without sharing memory or disk storage, shared nothing architectures are good at scaling out to a large number of machines. Since these architectures are easily scalable, they are widely used for modern, large-scale database systems. Figure 2.2 shows an example of a shared nothing architecture for a distributed database system. In the figure, the database system is made up of several nodes (servers), represented by the colored squares. The database is distributed across different partitions, and in the figure, different partitions have different colors. Each partition is replicated, so each replica of a partition is represented by the same color in the figure. Figure 2.2 also shows how the application interacts with the distributed database system. There may be many application servers, all independent of each other and of the database system. The application runs with a client library which provides much of the functionality of the distributed database system, such as transactions or join queries. The separation of the application layer and the storage layer allows each layer to scale out independently. The architecture shown in Figure 2.2 is similar to that of existing systems, like Megastore, and is the main distributed architecture used for the rest of this thesis.

Figure 2.2: Typical scalable architecture for distributed database systems

2.5 Transactions in a Distributed Setting

Executing transactions in a distributed setting faces additional challenges. Communication is required in order to coordinate between the different participants, and this communication typically travels over the network, contributing additional latency. This added network latency from coordination causes distributed transactions to be slower than transactions processed on a single server. Also, in a distributed setting, there are other possible sources of failures, such as individual server crashes and communication failures. Even partial failures at a site or in the network can cause serious issues with transactions. These challenges make providing transactions in a distributed setting more difficult. In this section, I discuss several existing techniques useful for distributed transactions.

2.5.1 Two-Phase Commit Protocol The two-phase commit protocol [16, 64] is an atomic commit protocol that is widely used to commit transactions over a distributed collection of participating processes. For a particular transaction using the two-phase commit protocol, there are two abstract roles: a single coordinator and several participants. The transaction's coordinator is responsible for executing the two-phase commit protocol to durably commit the transaction among the participants. In normal operation, there are two phases: the voting phase and the commit phase.

Voting Phase In the voting phase (or prepare phase), the transaction coordinator asks all the other participants to prepare to commit the transaction locally, and each participant

replies with a vote of “yes” or “no” (depending on success or failure). Each participant prepares to commit by executing the local transaction until right before it can commit. This involves acquiring all the required locks on data items and writing entries into the transaction log. After the coordinator collects all the responses, the coordinator must decide whether or not to commit the transaction. If any of the participants replies with a “no”, then the transaction must be aborted. The coordinator logs the decision to its own durable log, and then proceeds to the commit phase. Commit Phase In the commit phase, the coordinator sends a message to inform all the participants of the transaction result, and to complete the processing of the prepared transaction. The participants must wait until they receive the final transaction result from the coordinator to make progress.
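The two phases can be condensed into a short coordinator-side sketch (hypothetical participant interface with prepare/commit/abort methods; locking, logging details, and timeouts omitted):

```python
def two_phase_commit(coordinator_log, participants, transaction):
    """Sketch of a 2PC coordinator; participants expose prepare/commit/abort."""
    # Voting (prepare) phase: every participant acquires its locks, writes its
    # transaction log entries, and votes "yes" or "no".
    votes = [p.prepare(transaction) for p in participants]
    decision = "commit" if all(v == "yes" for v in votes) else "abort"

    # The decision is made durable before informing anyone.
    coordinator_log.append((transaction.id, decision))

    # Commit phase: participants complete (or roll back) the prepared transaction.
    for p in participants:
        if decision == "commit":
            p.commit(transaction)
        else:
            p.abort(transaction)
    return decision
```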

Two-phase commit is the standard protocol for distributed transactions, but it requires two message round-trips and is not completely fault tolerant. For example, if the coordinator fails before making the commit/abort decision, the rest of the processes will not know the outcome of the transaction. In fact, if the coordinator and a participant fail, the protocol will block, unable to recover and make progress. It is possible that the failed participant received a commit message before any of the other participants and already performed the local commit, while the other participants are still waiting for the decision. Because of this uncertainty, the state of the transaction is unknown and the protocol is blocked until the failed participant is restored. The next section describes three-phase commit, a solution for this scenario.

2.5.2 Three-Phase Commit Protocol Three-phase commit [78] is a non-blocking solution that adds an additional message round to 2PC. The three phases are the voting phase, the pre-commit phase, and the commit phase. The voting phase and the commit phase are similar to the corresponding phases of two-phase commit.

Voting Phase This phase is similar to the voting phase of two-phase commit. The transaction coordinator asks all the other participants to vote to commit the transaction. Once the coordinator receives all the responses, it decides to commit if and only if all responses are “yes”. Pre-Commit Phase After the coordinator decides to commit the transaction, it sends a pre-commit message to all the participants, and the participants transition to a pre-commit state and respond with an “ACK”. Once the coordinator receives the “ACKs” from all the participants, it can move on to the commit phase. Commit Phase The coordinator sends the commit message to all the participants, and the participants must wait for the commit message in order to finish executing the transaction and make progress.

Three-phase commit eliminates some of the problems with failures in two-phase commit by introducing an additional pre-commit phase. In the pre-commit phase, the coordinator distributes the intent to commit, so when a participant is in the pre-commit state, it is already certain that all participants have voted “yes” in the voting phase. In order for the coordinator to move on to the commit phase, it must receive an “ACK” from every participant, so it knows that every participant is in the pre-commit state. Therefore, even if a participant fails (either before or after receiving the commit message), the other participants are still certain of the outcome of the transaction, and the protocol does not have to block. However, introducing this additional phase imposes an additional message round-trip, which adversely affects the latency of commits.
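A coordinator-side sketch of three-phase commit (same hypothetical participant interface as the 2PC sketch, now with a pre_commit method; failure handling and timeouts omitted) makes the extra round explicit:

```python
def three_phase_commit(participants, transaction):
    """Sketch of a 3PC coordinator showing the extra pre-commit round."""
    # Voting phase: identical to 2PC.
    votes = [p.prepare(transaction) for p in participants]
    if not all(v == "yes" for v in votes):
        for p in participants:
            p.abort(transaction)
        return "abort"

    # Pre-commit phase: distribute the intent to commit and wait for ACKs, so
    # every participant knows that all votes were "yes" before anyone commits.
    acks = [p.pre_commit(transaction) for p in participants]
    assert all(a == "ack" for a in acks)

    # Commit phase: participants can now commit without uncertainty about the outcome.
    for p in participants:
        p.commit(transaction)
    return "commit"
```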

2.5.3 Paxos Distributed Consensus Algorithms Distributed consensus is the process of coordinating a set of participants to agree on a single value, especially in the presence of server or network failures. Distributed consensus can be an effective component for executing transactions by providing durability in unreliable and distributed environments. For example, systems like Megastore or Spanner use a distributed consensus protocol to store durable logs that tolerate server failures. The Paxos [54] family of algorithms solves the distributed consensus problem. Many systems use the Paxos protocol for durably storing values among a set of participants while tolerating server failures. There are two main roles in the classic Paxos algorithm: proposers and acceptors. A proposer is responsible for submitting a value to store, and acceptors are responsible for storing the value. In practice, a single process can take on multiple roles. Paxos also requires quorums in the set of acceptors. A quorum is defined to be a subset of the acceptors such that any two quorums have at least one acceptor in common. Typically, any majority of the acceptors is used as a Paxos quorum. The classic Paxos algorithm works in two phases to store a value among the acceptors.

Phase 1 In the first phase, a proposer sends a Prepare message to all acceptors, with a proposal number greater than any previously used proposal number. If an acceptor receives a Prepare message with a proposal number N greater than any previously received proposal number, then the acceptor replies with a Promise, promising that the acceptor will ignore all messages with a proposal number less than N. If the acceptor has already accepted a value from a proposer at an earlier time, it also sends that value with the Promise message.

Phase 2 Once the proposer receives a Promise from a quorum of acceptors (a majority, for classic Paxos), the second phase starts. The proposer examines the set of Promise responses and determines whether any of the messages contain a value. If at least one of the Promise messages contains a value, then the proposer must choose the value with the largest proposal number. If there were no previously proposed values, then the proposer can freely choose a value to propose. In typical scenarios, an application will

submit a value to the proposer to store, so if the proposer has the choice, it will choose the value supplied by the application. The proposer then sends an Accept message with the chosen value to the acceptors. When an acceptor receives an Accept message with a proposal number N, it will accept the value and respond with an Accepted message if and only if the acceptor did not make a promise for another Prepare message with a proposal number greater than N. When the proposer, or a separate learner process, receives the Accepted message from a quorum of acceptors, the value has been safely stored in the set of acceptors. Since Paxos depends on receiving responses from a quorum rather than from every single acceptor, it can tolerate failures and continue to make progress while failures occur. As long as a quorum of the acceptors is still operating, Paxos can make progress. Since Paxos uses two phases, successfully storing a value requires two message round-trips. While classic Paxos is the core of the consensus algorithm, most systems use Multi-Paxos as a common optimization. Multi-Paxos is like the classic algorithm, but the proposer can “reserve” the leadership for multiple instances of Paxos. This allows the proposer to continue to propose and store values in subsequent Paxos instances without having to execute Phase 1 each time. This greatly reduces the latency of durably storing values across several distributed acceptors.
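The acceptor's side of the two phases can be summarized in a short sketch (message names and return values are illustrative, not taken from any specific implementation):

```python
class PaxosAcceptor:
    """Sketch of a classic Paxos acceptor for a single consensus instance."""

    def __init__(self):
        self.promised_n = -1     # highest proposal number promised so far
        self.accepted_n = None   # proposal number of the accepted value, if any
        self.accepted_value = None

    def on_prepare(self, n):
        # Phase 1: promise to ignore proposals numbered below n, and report any
        # previously accepted value so the proposer must re-propose it.
        if n > self.promised_n:
            self.promised_n = n
            return ("promise", n, self.accepted_n, self.accepted_value)
        return ("reject", self.promised_n)

    def on_accept(self, n, value):
        # Phase 2: accept unless a higher-numbered prepare has been promised.
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted_n = n
            self.accepted_value = value
            return ("accepted", n)
        return ("reject", self.promised_n)
```

A value is learned once a quorum of acceptors has responded with “accepted” for the same proposal number, which is exactly the condition described above.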

2.5.4 Consensus on Commit Consensus on commit [39] is a distributed commit algorithm which uses both two-phase commit and the Paxos algorithm. Consensus on commit solves the blocking problem of two-phase commit by using the Paxos algorithm to durably store decisions among several participants. Therefore, individual failures are not fatal, since decisions are stored in multiple locations. With consensus on commit, each participant of the two-phase commit protocol stores its prepare vote among acceptors in an instance of Paxos, before informing the coordinator. Therefore, if the coordinator fails, the correct state can be recovered from the Paxos state.

2.6 Materialized Views

The previous sections present background on transactions in single-server and distributed database systems. However, it is also important to address how to read the transactional data. Read queries are part of many use cases, such as interactive workloads or analytic workloads, so improvements to read queries can positively impact performance for a variety of applications. This section describes the background on database views, how they can benefit read queries, and some of the challenges when they are implemented in distributed database systems. In relational databases, a view is a virtual relation that represents the results of a query in the database. A view is defined as a query over database tables or views. Users can

query views just like any other table, but the contents of views are not materialized in the database. Views enable logical data independence, because the base table schemas can be modified underneath the views without having to change application and user queries. Since the contents of views are not stored in the database, accessing a view requires computing its contents at runtime. In contrast to logical views, materialized views store the contents of the view in the database. A materialized view stores the precomputed results of its view query, so it uses additional storage for the cached results. Materializing the contents can improve read performance because querying the view does not require recomputing the view for every access. Since materialized views store precomputed results, database systems must handle how to update or maintain the views when the base data is updated. The simplest method of maintaining materialized views is to recompute the entire view every time the base data is updated. However, this can be quite expensive, especially when the data is updated frequently and the updates are small relative to the entire view. Therefore, previous work has focused on incremental view maintenance. View maintenance is inherently relevant to transactions because base table updates and view updates that depend on those base table updates are semantically related and should occur as an atomic unit. This thesis focuses on the challenges of scalable maintenance for join views. In the rest of this section, I present prior work on materialized view maintenance.

2.6.1 Incremental View Maintenance There has been significant previous work on incrementally updating materialized views in database systems. Early work in incremental view maintenance from Blakeley et al. [17], and Ceri and Widom [23], investigated new algorithms and rules to update materialized views with smaller updates, instead of re-computing the entire view. Ceri and Widom introduced a method of maintaining materialized views with automatic generation of production rules. Production rules are a collection of operations executed, or triggered, for every insert, update, or delete of the base table tuples related to the view. These incremental production rules are triggered to run within the same base table transactions, in order to update materialized views correctly. Automatic generation of production rules depends on the properties of safe table references and no duplicates. A view definition has safe table references if base table references (in the projection or equality predicates) contain a key of the base tables. Views should also not contain duplicate records for efficient incremental production rules. A common way of guaranteeing that a view does not contain duplicate records is to include the keys of the base tables in the view definition. This thesis also focuses on views without duplicate records, by including the key columns of the base tables in view definitions. While many of the incremental view maintenance techniques update views synchronously with the base transactions, other work has been done on asynchronous, or deferred, view maintenance [69, 88, 86, 71]. These deferred techniques accept some staleness in the views in exchange for faster base transactions and potentially improved system utilization. Because of the

deferred update to views, it is possible to update views incorrectly and leave inconsistencies. A common technique to correctly update views is to use compensation queries to adjust the results of the incremental updates. In order to determine how to generate the compensation queries, compensation algorithms must examine the sequence of committed transactions. In addition to deferred maintenance, other related work has investigated updating materialized views from many different distributed sources [87, 2, 84, 26]. These algorithms are particularly effective for data warehouses. A data warehouse is a system designed for analytical workloads, instead of transactional workloads, and is typically separate from the transactional system. In this setting, distributed data sources are typically separate from the data warehouse, and the views in the data warehouse are updated incrementally as updates from the sources arrive. These techniques use the sequence of transaction updates observed at the data warehouse, and issue compensation queries to the various data sources to update the views correctly.
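As a small illustration of incremental maintenance for a join view (a hypothetical two-table equi-join, with the primary keys of both base tables included in the view to avoid duplicates), an insert into one base table can be handled by joining just that delta row against the other base table instead of recomputing the whole view:

```python
def maintain_join_view_on_insert(view, other_table, inserted_row, join_col):
    """Incrementally update a materialized join view after a base-table insert.

    view:         dict keyed by (left_pk, right_pk) -> joined row
    other_table:  list of rows (dicts) from the other base table
    inserted_row: the newly inserted row (dict) with its primary key under "pk"
    join_col:     the column the two tables are joined on
    """
    # Join only the delta (the single inserted row) against the other base table.
    for row in other_table:
        if row[join_col] == inserted_row[join_col]:
            # Including both base-table keys in the view key avoids duplicates.
            key = (inserted_row["pk"], row["pk"])
            # Naive column merge; a real system would disambiguate column names.
            view[key] = {**inserted_row, **row}
```

Deletes and updates follow the same pattern with the corresponding delta; the synchronous variant runs this logic inside the base-table transaction, while deferred maintenance applies it later and may need compensation queries as described above.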

2.6.2 Distributed View Maintenance Most of the prior work on incremental view maintenance has focused on single-server database or data warehouse systems. Even in scenarios with multiple data sources in a data warehouse environment, the maintenance algorithms are centralized and need to examine the global sequence of committed transactions. This reliance on centralized algorithms and a global sequence of transactions is not scalable for large distributed database systems. When view maintenance requires a centralized algorithm, it has to process all of the updates in the system, so it cannot scale out and may become a bottleneck. Also, if an algorithm needs the sequential order of all operations in the system, a centralized sequencer is required, which can be a bottleneck. Centralized designs make these algorithms less suitable for scalable distributed database systems. For scalable systems, it is important to be able to distribute, or scale out, algorithms. Asynchronous and distributed view maintenance [3, 85] has been investigated for modern scalable distributed database systems. While previous techniques have focused on incremental view maintenance with a single or a few view updaters, work on distributed view maintenance investigates the challenges of scaling out to many view updaters. Distributed view maintenance involves asynchronously maintaining secondary indexes and co-locating indexes on join keys. By co-locating related records in the indexes, local joins can be performed for the materialized views. In this thesis, I explore new techniques for scalable algorithms for the maintenance of join views.

2.7 Summary

There has been significant previous work investigating transactions for traditional and modern distributed database systems. However, the various techniques depend on centralized algorithms or costly coordination and communication. These centralized or costly techniques will not be able to scale as database systems scale out to many machines. In the rest of this thesis, I present new algorithms and techniques for scalable transactions in distributed database systems.

Chapter 3

A New Transaction Commit Protocol

3.1 Introduction

Modern applications demand more performance and capacity from their data management systems, so large-scale distributed database systems are commonly used. As applications become more popular, they can experience exponential growth from users all over the world. Therefore, many applications use several different data centers in order to allow users to access the closest data center and reduce their response times. Tolerance to the outage of a single data center is now considered essential for many online services. Achieving this for a database-backed application requires replicating data across multiple data centers, and making efforts to keep those replicas reasonably synchronized and consistent. For example, Google’s e-mail service Gmail is reported to use Megastore [13], synchronously replicating across five data centers to tolerate two data center outages: one planned, one unplanned. Replication across geographically diverse data centers (called geo-replication) is qualitatively different from replication within a cluster, data center, or region, because inter-data center network delays are in the hundreds of milliseconds and vary significantly (differing between pairs of locations, and also over time). These delays are near the limit of total latency that users will tolerate, so it becomes crucial to reduce the number of message round-trips taken between data centers, and desirable to avoid waiting for the slowest data center to respond. For database-backed applications, it is valuable when the underlying database system supports transactions: multiple operations (such as individual reads and writes) grouped together. Transactions are valuable to developers because of their ACID properties: Atomicity, Consistency, Isolation, and Durability. More details on transactions are described in Section 2.3. When the underlying database system ensures the ACID properties, it is easier for developers to reason about the behavior of database interactions. The traditional mechanism for transactions that are distributed across multiple servers is called two-phase commit (2PC), but this has serious drawbacks, especially in a geo-replicated system. 2PC depends on a reliable coordinator to determine the outcome of a transaction,

so it will block for the duration of a coordinator failure, and (even worse) the blocked transaction will be holding locks that prevent other transactions from making progress until the recovery is completed. Therefore, this can greatly limit the scalability of distributed transactions. Three-phase commit, described in Section 2.5.2, is a variation of 2PC that avoids blocking at the cost of an additional message round-trip, thus increasing the latency. To avoid the long latencies of coordinating transactions with 2PC, many distributed systems sacrifice some of the guarantees of transactions. Some systems achieve only eventual consistency by allowing updates to be run first at any site (preferably local to the client) and then propagated asynchronously with some form of conflict resolution, so replicas will converge later to a common state. Others restrict each transaction so it can be decided at one site, by only allowing updates to co-located data such as a single record or partition. In the event of a failure, these diverse approaches may lose committed transactions, become unavailable, or violate consistency. Various projects [39, 13, 57, 31] have proposed coordinating transaction outcomes based on Paxos [54]. The earliest design, Consensus on Transaction Commit [39], shows how to use Paxos to reliably store the abort or commit decision of a resource manager for recovery. However, it treats data replication as an orthogonal issue. Newer proposals focus on using Paxos to agree on a log position, similar to state-machine replication. For example, Google’s Megastore [13] uses Paxos to agree on a log position for every commit in a data shard called an entity group, imposing a total order of transactions per shard. Unfortunately, this design makes the system inherently unscalable as it only allows executing one transaction at a time per shard; this was observed [48] in Google’s App Engine, which uses Megastore. Google’s system Spanner [31] enhances the Megastore approach, automatically resharding the data and adding snapshot isolation, but does not remove the scalability bottleneck, as Paxos is still used to agree on a commit log position per shard. Paxos-CP [65] improves Megastore’s replication protocol by combining non-conflicting transactions into one log position, significantly increasing the fraction of committed transactions. However, the same system bottleneck remains, and the experimental results of Paxos-CP are not encouraging, with a throughput of only four transactions per second. Surprisingly, all these new protocols still rely on two-phase commit, with all its disadvantages, to coordinate any transactions that access data across shards. They also rely on a single master, requiring two round-trips from any client that is not local to the master, which can often result in several hundred milliseconds of additional latency. Such additional latency can negatively impact the usability of websites; for example, an additional 200 milliseconds of latency, the typical time of one message round-trip between geographically remote locations, can result in a significant drop in user satisfaction and abandonment of websites [73]. In this chapter, I introduce MDCC (short for “Multi-Data Center Consistency”), an optimistic commit protocol for transactions with a cost similar to eventually consistent protocols.
MDCC requires only a single wide-area message round-trip to commit a transaction in the common case, and is “master-bypassing”, meaning it can read or update from any node in any data center. MDCC replicates data synchronously, so the data is still available even if an entire data center fails and is inaccessible. Like 2PC, the MDCC commit protocol can be combined with different isolation levels that ensure varying properties for the recency and mutual consistency of read operations. In its default configuration, it guarantees read-committed isolation without lost updates [15] by detecting and preventing all write-write conflicts. That is, either all updates of a transaction eventually persist or none do (atomic durability), updates from uncommitted transactions are never visible to other transactions (read-committed), concurrent updates to the same record are either resolved if commutative or prevented (no lost updates), but some updates from successfully committed transactions might be visible before all updates become visible (no atomic visibility). It should be noted that this isolation level is stronger than the default read-committed isolation in most commercial and open-source database platforms. On the TPC-W benchmark deployed across five Amazon data centers, MDCC reduces per-transaction latencies by at least 50% compared to 2PC or Megastore, with orders-of-magnitude higher transaction throughput than Megastore. MDCC is not the only system that addresses wide-area replication, but it is the only one that provides the combination of low latency (through one-round-trip commits) and strong consistency for transactions, without requiring a master or significant limitations on the application design (such as static data partitions that minimize cross-partition transactions). MDCC is the first protocol to use Generalized Paxos [52] as a commit protocol on a per-record basis, combining it with adapted techniques from the database community, such as escrow transactions [63] and demarcation [14]. The key idea is to achieve single round-trip commits by 1) executing parallel Generalized Paxos on each record, 2) ensuring every prepare has been received by a fast quorum of replicas, 3) disallowing aborts for successfully prepared records, and 4) piggybacking notification of commit state on subsequent transactions. A number of subtleties need to be addressed to create a “master-bypassing” approach, including support for commutative updates with value constraints, and for handling conflicts that occur between concurrent transactions. The remainder of this chapter is organized as follows. In Section 3.2 I show the overall architecture of MDCC. Section 3.3 presents MDCC, a new optimistic commit protocol for the wide-area network that achieves wide-area transactional consistency while requiring only one network round trip in the common case. Section 3.4 discusses the read consistency guarantees of MDCC. Experimental results of MDCC and other systems across five data centers are in Section 3.5. In Section 3.6 I relate MDCC to other existing work, and I conclude the chapter in Section 3.7.

3.2 Architecture Overview

Background Section 2.4.4 showed an overview of the typical design of scalable database systems. In this section, I describe MDCC-specific variations to that distributed architecture. MDCC uses a library-centric approach similar to the architectures of DBS3 [19],

Megastore [13] or Spanner [31], and is shown in Figure 3.1. This architecture separates out the stateful component of a database system into a distributed record manager. All higher-level functionality (such as query processing and transaction management) is provided through a stateless DB library, which can be deployed at the application server.

Figure 3.1: MDCC architecture

As a result, the only stateful component of the architecture, the storage node, is significantly simplified and scalable through standard techniques such as range partitioning, whereas all higher layers of the database system can be replicated freely with the application tier because they are stateless. Every storage node is responsible for one or more horizontal partitions of the data, and partitions are completely transparent to the application. MDCC places the storage nodes in geographically distributed data centers. Although not required, every data center contains a full replica of the data, and the data within a single data center is partitioned across multiple machines. The DB library provides a programming model for transactions, and is mainly responsible for coordinating the replication and consistency of the data by using the MDCC commit protocol. The DB library also acts as a transaction manager and is responsible for determining the outcome of a transaction. In contrast to many other systems, MDCC supports an individual master per record, which can be either a storage node or an app-server, and is responsible for coordinating the updates to that record. This allows the transaction manager either to take over the mastership for a single record and coordinate the update directly, or to choose a storage node (the current master) to act on its behalf (black arrows in Figure 3.1). Furthermore, it is often possible to avoid the master altogether, allowing the transaction manager to coordinate the update without acquiring any mastership (red arrows in Figure 3.1). This leads to a very flexible architecture in which storage nodes or application servers can act as coordinators, depending on the specific situation. In the remaining sections of this chapter, I present the MDCC transaction commit protocol.

3.3 The MDCC Protocol

In this section, I describe MDCC, a new optimistic commit protocol for transactions operating on cross-partition, synchronously replicated data in the wide-area network. Intra-data center latencies are largely ignored because they are only a few milliseconds, compared to hundreds of milliseconds for inter-data center latencies. The target is a fault-tolerant atomic commit protocol with high parallelism and reduced latency, achieved by using fewer message rounds and avoiding contacting a master. MDCC makes the trade-off of reducing latency by using more CPU resources to make sophisticated decisions at each site. MDCC exploits a key observation about real workloads: either conflicts are rare, or conflicting updates commute up to a domain integrity constraint (e.g., additions and subtractions with a value constraint that the attribute must remain greater than 0). At its core, the protocol is based on known extensions of Paxos, such as Multi-Paxos [54] and Generalized Paxos [52]. Innovations I introduce enhance these consensus algorithms in order to support transactions on multiple data items without requiring static partitioning. In this section, I present a sequence of optimizations, refining from an initial design to the full MDCC protocol. Section 3.3.2 describes methods to allow multi-record transactions with read-committed isolation without lost updates (see Section 3.4.1) using Multi-Paxos, with two round-trips of messaging. Section 3.3.3 incorporates Fast Paxos, so one round-trip is often possible even without a local master. Then Section 3.3.4 describes how Generalized Paxos is used to combine commit decisions for transactions that are known to be commutative, which relies on database techniques that determine state-based commutativity for operations like increment and decrement. While the component ideas for consensus and for deciding transaction commutativity exist in the database community, the way MDCC uses them for transactions, and their combination, is novel. In contrast to pessimistic commit protocols such as two-phase commit, the protocol does not require a prepare phase and can commit transactions in a single message round-trip across data centers if no conflicts are detected.

3.3.1 Background: Paxos In this section, I provide additional details, beyond Section 2.5.3, on the principles of Paxos and how MDCC adapts Paxos to update a single record.

3.3.1.1 Classic Paxos Paxos is a family of quorum-based protocols for achieving consensus on a single value among a group of replicas. It tolerates a variety of failures, including lost, duplicated, or reordered messages, as well as failure and recovery of nodes. Paxos distinguishes between clients, proposers, acceptors, and learners. These can be directly mapped to the scenario where clients are app-servers, proposers are masters, acceptors are storage nodes, and any node can be a learner. In the remainder of this chapter I use the terminology of clients, masters, and

storage nodes. In the implementation, MDCC places masters on storage nodes, but that is not necessary for correctness. The basic idea in Classic Paxos [54], as applied for replicating a transaction’s updates to data, is as follows: Every record has a master responsible for coordinating updates to the record. At the end of a transaction, the app-server sends the update requests to the master of each record, as shown by the solid lines in Figure 3.1. The master informs all storage nodes responsible for the record that it is the master for the next update. It is possible that multiple masters exist for a record, but to make progress, eventually only one master is allowed. The master processes the client request by attempting to coordinate the storage nodes to agree on the update. A storage node accepts an update if and only if it comes from the most recent master the node knows of, and it has not already accepted a more recent update for the record. In more detail, the Classic Paxos algorithm operates in two phases. Phase 1 tries to establish the mastership for an update to a specific record r. A master P selects a proposal number m, also referred to as a ballot number, higher than any known proposal number, and sends a Phase1a request with m to at least a majority of storage nodes responsible for r. The proposal numbers must be unique for each master because they are used to determine the latest request. To ensure uniqueness, the requester’s IP address is concatenated to the proposal number. If a storage node receives a Phase1a request with a proposal number greater than any it has already responded to, it responds with a Phase1b message containing m and the highest-numbered update (if any) including its proposal number n, and promises not to accept any future requests with a proposal number less than or equal to m. If P receives responses containing its proposal number m from a majority QC of storage nodes, it has been chosen as a master. Now, only P will be able to store a value among the storage nodes for proposal number m. Phase 2 tries to write a value. P sends an accept request Phase2a to all the storage nodes of Phase 1 with the proposal number m and value v. v is either the update of the highest-numbered proposal among the Phase1b responses, or the requested update from the client if no Phase1b response contained a value. P must re-send a previously accepted update to avoid losing the saved value. If a storage node receives a Phase2a request for a proposal numbered m, it accepts the proposal, unless it has already responded to a Phase1a request having a number greater than m, and sends a Phase2b message containing m and the value back to P. If the master receives a Phase2b message from a majority QC of storage nodes for the same proposal number, consensus is reached and the value is considered learned by the master. Reaching consensus means that no master will be able to save a different value for that Paxos instance. Afterwards, the master informs all other components, app-servers and responsible storage nodes, about the success of the update. It is possible to avoid this delay by sending Phase2b messages directly to all involved nodes. This significantly increases the number of messages, so MDCC currently does not use this optimization. Note that Classic Paxos is only able to learn a single value per instance, which may consist of multiple ballots. Thus, MDCC uses a separate Paxos instance per version of a record, with the requirement that the previous version has already reached consensus successfully.

3.3.1.2 Multi-Paxos The Classic Paxos algorithm requires two message rounds to agree on a value, one in Phase 1 and one in Phase 2. If the master is reasonably stable, using Multi-Paxos (the multi-decree Synod protocol) makes it possible to avoid Phase 1 by reserving the mastership for several instances [54]. Multi-Paxos is an optimization of Classic Paxos, and in practice Multi-Paxos is implemented instead of Classic Paxos to take advantage of the fewer message rounds. MDCC explores this by allowing the proposers to suggest the following meta-data: [StartInstance, EndInstance, Ballot]. Thus, the storage nodes can vote on the mastership for all instances from StartInstance to EndInstance with a single ballot number at once. The meta-data also allows for different masters for different instances. This supports custom master policies like round-robin, where serverA is the master for instance 1, serverB is the master for instance 2, and so on. Storage nodes react to these requests by applying the same semantics for each individual instance as defined in Phase1b, but they answer with a single message. The database system stores this meta-data, including the current version number, as part of the record, which enables a separate Paxos instance per record. To support meta-data for inserts, each table stores a default meta-data value for any non-existent records. Therefore, the default configuration assigns a single master per table to coordinate inserts of new records. Although a potential bottleneck, this master is normally not in the critical path and can be bypassed, as explained in Section 3.3.3.

3.3.2 Transaction Support The first contribution of MDCC is the extension of Multi-Paxos to support multi-record transactions with read-committed isolation and without the lost-update problem. MDCC ensures atomic durability (all or none of the updates will persist), prevents all write-write conflicts (if two transactions try to update the same record concurrently, at most one will succeed), and guarantees that only updates from committed transactions are visible. Guaranteeing higher read consistencies, such as atomic visibility and snapshot isolation, is an orthogonal issue and is discussed in Section 3.4. MDCC guarantees this consistency level by using a Paxos instance per record to accept an option to execute the update, instead of writing the value directly. After the app-server learns the options for all the records in a transaction, it commits the transaction and asynchronously notifies the storage nodes to execute the options. If an option is not yet executed, it is called an outstanding option.

3.3.2.1 The Protocol As in all optimistic concurrency control techniques, MDCC assumes that transactions collect a write-set of records at the end of the transaction, which the protocol then tries to commit. Updates to records create new versions, and are represented in the form vread → vwrite, where vread is the version of the record read by the transaction and vwrite is the new version of the

record. This allows MDCC to detect write-write conflicts by comparing the current version of a record with vread. If they are not equal, the record was modified between the read and the write, and a write-write conflict was encountered. For inserts, the update has a missing vread, indicating that an insert should only succeed if the record does not already exist. Deletes work by marking the item as deleted and are handled as normal updates. MDCC only allows one outstanding option per record and requires that the update is not visible until the option is executed. The app-server coordinates the transaction by attempting to store the options (reach consensus among the storage nodes) for all the updates in the transaction. It proposes the options to the Paxos instances running for each record, with the participants being the replicas of the record. Every storage node responds to the app-server with an accept or reject of the option, depending on whether vread is valid, similar to validated Byzantine agreement [21]. Hence, the storage nodes make an active decision to accept or reject the option. This is fundamentally different from existing uses of Paxos (e.g., Consensus on Transaction Commit [39] or Megastore), which send a fixed value (e.g., the “final” accept or commit decision) and only consider the ballot number to decide whether the value should be accepted. The reason this change does not violate the Paxos assumptions is that, as noted at the end of Section 3.3.1.1, a new record version can only be chosen if the previous version was successfully determined. Thus, all storage nodes will always make the same abort or commit decision. This scheme serializes updates to records, but this is not as limiting as serializing updates to an entire partition, as in Megastore. I describe how to relax this requirement in Section 3.3.4. Just as in 2PC, the app-server commits a transaction when it learns all options as accepted, and aborts a transaction when it learns any option as rejected. The app-server learns an option if and only if a majority of storage nodes agrees on the option. In contrast to 2PC, MDCC makes another important change. MDCC does not allow clients or app-servers to abort a transaction once it has been proposed. Decisions are determined and stored by the distributed storage nodes within MDCC, instead of being decided by a single coordinator as in 2PC. This ensures that the commit status of a transaction depends only on the status of the learned options and hence is always deterministic, even with failures. Otherwise, the decision of the app-server/client after the prepare would have to be reliably stored, which either influences the availability (the reason why 2PC is blocking) or requires an additional round, as done by three-phase commit or Consensus on Transaction Commit [39]. When the app-server determines that the transaction is aborted or committed, it informs the involved storage nodes through a Learned message about the decision. The storage nodes in turn execute the option (make it visible) or mark it as rejected. Learning an option is the result of each Paxos instance and thus generates a new version of the record, whether the option is learned as accepted or rejected. Note that, so far, only one option per record can be outstanding at a time, as MDCC requires the previous instance (version) to be decided. As a result, it is possible to complete the transaction (commit or abort) in a single round-trip across the data centers if the masters of all the records in the transaction are in the local data center.
This is possible because the commit/abort decision of a transaction depends entirely on the learned values and the application server is not allowed to prematurely abort

a transaction (in contrast to 2PC or Consensus on Transaction Commit). The Learned message notifying the storage nodes about the commit/abort can be sent asynchronously; it does not influence correctness and only affects the possibility of aborts caused by stale reads. By adding transaction support, this design is able to achieve one-round-trip commits if the master is local, but when the master is not local, MDCC requires two round-trips, due to the additional communication with the remote master. Communication with a local master is ignored because its latency is negligible (a few milliseconds) compared to communication with a geographically remote master (hundreds of milliseconds).
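A storage node's accept/reject decision for an option can be sketched as follows (the record layout and field names are illustrative; a real implementation also tracks Paxos ballots and logs learned options):

```python
def decide_option(record, option):
    """Sketch of a storage node validating an MDCC option (v_read -> v_write).

    record: dict with 'version' (current committed version) and
            'outstanding' (the currently pending option, if any)
    option: dict with 'v_read', 'v_write', and the proposed update
    """
    # At this point in the protocol only one outstanding option per record is
    # allowed; a further proposal must wait for the previous instance to finish.
    if record["outstanding"] is not None:
        return "reject"

    # Write-write conflict check: the version read by the transaction must
    # still be current. (For inserts, v_read is None and the record must not exist.)
    if option["v_read"] != record["version"]:
        return "reject"

    # Accept the option; the update is not visible until the option is executed
    # after a Learned message arrives.
    record["outstanding"] = option
    return "accept"
```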

3.3.2.2 Avoiding Deadlocks The described protocol is able to atomically commit multi-record transactions. Without further effort, concurrent transactions may cause deadlocks by waiting on options of other transactions. For example, if two transactions t1 and t2 try to learn an option for the same two records r1 and r2, t1 might successfully learn the option for r1, and t2 for r2. Since transactions do not abort without learning at least one of their options as rejected, both transactions are now deadlocked, because each transaction waits for the other to finish. MDCC applies a simple pessimistic strategy to avoid deadlocks. The core idea is to relax the requirement that MDCC can only learn a new version if the previous instance is committed. For example, if t1 learns the option v0 → v1 for record r1 as accepted in one instance, and t2 tries to acquire an option v0 → v2 for r1, then t2 learns its option v0 → v2 as rejected in the next Paxos instance, even though t1’s option has not yet been executed. This simple technique causes transaction t1 to commit and t2 to abort, or, in the deadlock case described before, both transactions to abort. The Paxos safety property is still maintained because all storage nodes will make the same decision based on this policy, and the master totally orders the record versions.

3.3.2.3 Failure Scenarios Multi-Paxos allows the MDCC transaction commit protocol to recover from various failures. For example, a failure of a storage node can be masked by the use of quorums. A master failure can be recovered from by selecting a new master (after some timeout) and triggering Phases 1 and 2 as described previously. Handling app-server failures is trickier, because an app-server failure can cause a transaction to be pending forever as a “dangling transaction”. MDCC avoids dangling transactions by including in all of its options a unique transaction id (e.g., a UUID) as well as all primary keys of the write-set, and by additionally keeping a log of all learned options at the storage node. Therefore, every option includes all the information necessary to reconstruct the state of the corresponding transaction. Whenever an app-server failure is detected by simple timeouts, the state is reconstructed by reading from a quorum of storage nodes for every key in the transaction, so any node can recover the transaction. A quorum is required to determine what was decided by the Paxos instance. Finally, a data center failure is treated simply as each of the nodes in the data center failing.

Although not implemented, MDCC can adapt bulk-copy techniques to bring the data up to date more efficiently without involving the Paxos protocol (also see [13]).

3.3.3 Transactions Bypassing the Master The previous section showed how MDCC achieves transactions with multiple updates in a single round-trip if the masters of all the records in the transaction are in the same data center as the app-server. However, two round-trips are required when the masters are remote, or mastership needs to be acquired.

3.3.3.1 Protocol Fast Paxos [51] avoids the master by distinguishing between classic and fast ballots. Classic ballots operate like the classic Paxos algorithm described above and are always the fall-back option. Fast ballots normally use a bigger quorum than classic ballots, but allow bypassing the master. This saves one message round to the master, which may be in a different data center. However, since updates are not serialized by the master, collisions may occur, which can only be resolved by a master using classic ballots. I use this approach of fast ballots for MDCC. All versions start with an implicit fast ballot number, unless a master changed the ballot number through a Phase1a message. This default ballot number informs the storage nodes to accept the next options from any proposer. Afterwards, any app-server can propose an option directly to the storage nodes without going through a master, which in turn promise only to accept the first proposed option. Simple majority quorums, however, are no longer sufficient to learn a value and ensure the safety of the protocol. Instead, learning an option without the master requires a fast quorum [51]. Fast and classic quorums are defined by the following requirements: (i) any two quorums must have a non-empty intersection, and (ii) any three quorums consisting of two fast quorums QF1 and QF2 and a classic quorum QC must have a non-empty intersection. A typical setting for a replication factor of N = 5 is a classic quorum size of 3 (⌊N/2⌋ + 1) and a fast quorum size of 4 (⌈3N/4⌉). If a proposer receives an acknowledgment from a fast quorum, the value is safe and guaranteed to be committed. However, if a fast quorum cannot be achieved, collision recovery is necessary. Note that a Paxos collision is different from a transaction conflict; collisions occur when nodes cannot agree on an option, whereas conflicts are caused by conflicting updates to the same record. To resolve the collision, a new classic ballot must be started with Phase 1. After receiving responses from a classic quorum, all potential intersections with a fast quorum must be computed from the responses. If an intersection consists entirely of members with the highest ballot number and they all agree on some option v, then v must be proposed next. Otherwise, no option was previously agreed upon, so any new option can be proposed. For example, assume the following messages were received as part of a collision resolution from 4 out of 5 servers with the previously mentioned quorums (notation: (server-id, ballot number,

update)): (1,3,v0→v1), (2,4,v1→v2), (3,4,v1→v3), (5,4,v1→v2). Here, the intersection size is 2 and the highest ballot number is 4, so the protocol compares the following intersections:

[(2, 4, v1 → v2), (3, 4, v1 → v3)]

[(3, 4, v1 → v3), (5, 4, v1 → v2)]

[(2, 4, v1 → v2), (5, 4, v1 → v2)]

Only the members of the last intersection have an option in common; the other two intersections do not agree on a common option. Hence, the option v1→v2 has to be proposed next. More details and the correctness proofs of Fast Paxos can be found in [51]. MDCC uses Fast Paxos to bypass the master for accepting an option, which reduces the number of required message rounds. Only one option can be learned per fast ballot. However, by combining the idea of Fast Paxos with Multi-Paxos and using the following adjusted ballot-range definitions from Section 3.3.1.2, [StartInstance, EndInstance, Fast, Ballot], it is possible to reserve several instances as fast rounds. Whenever a collision is detected, the instance is changed to classic, the collision is resolved, and the protocol moves on to the next instance, which can start as either classic or fast. It is important that classic ballot numbers are always ranked higher than fast ballot numbers, in order to resolve collisions and save the correct value. Combined with the earlier observation that a new Paxos instance is started only if the previous instance is stable and learned, this allows the protocol to execute several consecutive fast instances without involving a master. Without the option concept of Section 3.3.2, fast ballots would be impractical to use. Without options, it would be impossible to make an abort/commit decision without first acquiring a lock in a separate message round on the storage servers or some master (e.g., as done by Spanner). This is also the main reason why other existing Paxos commit protocols cannot leverage fast ballots. Using Fast Paxos with options and the deadlock avoidance policy still produces a total order of operations, but with only one message round.
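The collision-recovery rule illustrated by the example can be written down directly: among the responses with the highest ballot number, enumerate every possible intersection of the classic quorum with a fast quorum, and if any such intersection agrees on a single option, that option must be re-proposed. The sketch below is a simplified version of that rule (data structures are illustrative):

```python
from itertools import combinations

def option_to_repropose(responses, n_replicas, classic_q, fast_q):
    """Collision recovery sketch: which option (if any) must be re-proposed.

    responses: list of (server_id, ballot, option) from a classic quorum.
    Returns an option that might already be chosen by some fast quorum, or None
    if the proposer is free to propose any new option.
    """
    highest = max(ballot for _, ballot, _ in responses)
    top = [r for r in responses if r[1] == highest]

    # Any fast quorum intersects the classic quorum in at least this many nodes.
    min_intersection = classic_q + fast_q - n_replicas

    for subset in combinations(top, min_intersection):
        options = {opt for _, _, opt in subset}
        if len(options) == 1:
            # Every member of a possible intersection agrees on one option,
            # so that option may already be chosen and must be proposed next.
            return options.pop()
    return None

# The example from the text: N = 5, classic quorum size 3, fast quorum size 4.
responses = [(1, 3, "v0->v1"), (2, 4, "v1->v2"), (3, 4, "v1->v3"), (5, 4, "v1->v2")]
assert option_to_repropose(responses, 5, 3, 4) == "v1->v2"
```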

3.3.3.2 Fast-Policy There exists a non-trivial trade-off between fast and classic instances. With fast instances, two concurrent updates might cause a collision requiring another two message rounds for the resolution, whereas classic instances usually require two message rounds, one to either contact the master or acquire the mastership, and one for Phase 2. Hence, fast instances should only be used if conflicts and collisions are rare. Currently, MDCC uses a very simple strategy. The default meta-data for all instances and all records is pre-defined to be fast rounds with [0, ∞, fast=true, ballot=0]. As the default meta-data for all records is the same, it does not need to be stored per record. A record’s meta-data is managed separately only when collision resolution is triggered. If MDCC detects a collision, it sets the next γ instances (default is 100) to classic. After γ transactions, fast instances are automatically tried again. This simple strategy stays in fast instances if

possible and in classic instances when necessary, while probing to go back to Fast Paxos every γ instances.
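The policy itself is small enough to sketch directly (parameter and method names are assumed, not MDCC's actual interface):

```python
class FastPolicy:
    """Sketch of the per-record fast/classic switching policy described above."""

    def __init__(self, gamma=100):
        self.gamma = gamma
        self.classic_remaining = 0  # 0 means instances run as fast ballots

    def on_collision(self):
        # After a collision, run the next gamma instances as classic ballots.
        self.classic_remaining = self.gamma

    def next_instance_is_fast(self):
        # Probe fast instances again once the gamma classic instances are used up.
        if self.classic_remaining > 0:
            self.classic_remaining -= 1
            return False   # classic instance
        return True        # fast instance
```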

3.3.4 Commutative Updates The design based on Fast Paxos allows many transactions to commit with a single round-trip between data centers. However, whenever there are concurrent updates to a given data item, conflicts will arise and extra messages are needed to resolve the conflict. MDCC efficiently exploits cases where the updates are commutative, to avoid extra messages for conflict resolution, by using Generalized Paxos [52], which is an extension of Fast Paxos. In this section, I show how the novel option concept and the idea of using Paxos at the record level, instead of at the database log level, as described in the previous sections, enable the use of Generalized Paxos. Furthermore, in order to support the common case of operations on data that are subject to value constraints (e.g., the value should be greater than 0), I present a new demarcation technique for quorums.

3.3.4.1 The Protocol Generalized Paxos [52] uses the same ideas as Fast Paxos but relaxes the constraint that every acceptor must agree on the same exact sequence of values. Since some updates may commute with each other, the acceptors only need to agree on sets of commands which are compatible with each other. MDCC utilizes this notion of compatibility to support commutative updates. Fast commutative ballots are always started by a message from the master. The master sets the record base value, which is the latest committed value. Afterwards, any client can propose commutative updates to all storage nodes directly using options, as described previously. In contrast to the previous section, an option now contains commutative updates, which consist of one or more attributes and their respective delta changes (e.g., decrement(stock, 1)). If a fast quorum QF out of N storage nodes accepts the option, the update is committed. When the updates involved are commutative, the acceptors can accept multiple proposals in the same ballot, and the orderings do not have to be identical on all storage nodes. This allows MDCC to stay in the fast ballot for longer periods of time, bypassing the master and allowing the commit to happen in one message round. More details on Generalized Paxos are given in [52].

3.3.4.2 Global Constraints

Generalized Paxos is based on commutative operations like increment and decrement. However, many database applications must enforce integrity constraints, for example that the stock of an item must be greater than zero. Under a constraint like this, decrements do not always commute. However, in this example, if the database contains ample stock, updates can commute. Thus MDCC allows concurrent processing of decrements and increments while

ensuring domain integrity constraints, by requiring storage nodes to only accept an option if the option would not violate the constraint under all permutations of commit/abort outcomes for outstanding options. For example, given 5 transactions t1, ..., t5 (arriving in order), each generating an option [stock = stock − 1] with the constraint that stock ≥ 0 and a current stock level of 4, a storage node s will reject t5 even though the first four options may abort. This definition is analogous to Escrow [63] and guarantees correctness even in the presence of aborts and failures.
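The local check a storage node performs can be sketched as follows, assuming (as in the example) that all pending options are decrements and the constraint is value ≥ 0:

// Accept a new decrement option only if the constraint still holds in the worst
// case, i.e., if every outstanding (pending) decrement option commits.
def acceptDecrement(currentValue: Long, pendingDeltas: Seq[Long], newDelta: Long): Boolean = {
  val worstCase = currentValue - pendingDeltas.sum - newDelta
  worstCase >= 0
}

// Example above: acceptDecrement(4, Seq(1, 1, 1, 1), 1) == false, so t5 is rejected.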

Figure 3.2: Possible message order in MDCC. Each storage node accepts a fast quorum (QF = 4) of the five options: storage node 1 accepts T1, T2, T3, T4; node 2 accepts T1, T2, T3, T5; node 3 accepts T1, T2, T4, T5; node 4 accepts T1, T3, T4, T5; and node 5 accepts T2, T3, T4, T5.

Unfortunately, this still does not guarantee integrity constraints, as storage nodes base decisions on local, but not global, knowledge. Figure 3.2 shows a possible message ordering for the above example with five storage nodes. Here, clients wait for QF responses (4), and each storage node makes a decision based on its local state. There is an integrity constraint that the value should be greater than or equal to zero (value ≥ 0), and the value is currently at 4. This means at most 4 transactions are allowed to commit in order to satisfy the constraint. However, through different message arrival orders, it is possible for all 5 transactions to commit, even though committing them all violates the constraint. This occurs because each storage node only makes decisions based on its local state.

Therefore, I present a new demarcation-based strategy for quorum-based protocols. This new demarcation technique is similar to existing techniques [14] in that it uses local limits, but it is used in different scenarios. Original demarcation uses limits to safely update distributed values, whereas MDCC uses limits for quorum-replicated values. Without loss of generality, assume a constraint of value ≥ 0, and that all updates are decrements. Let N be the replication factor (number of storage nodes), X be the base value for some attribute, and δi be the decrement amount of transaction ti for the attribute. Consider every replicated base value X as a resource, so the total number of resources in the system is N · X. In order to commit an update, QF storage nodes must accept the update, so every successful transaction ti reduces the resources in the system by at least QF · δi. If there are m successful transactions with δ1 + δ2 + · · · + δm = X, the attribute value has reached 0, and the total amount of resources has been reduced by at least QF · (δ1 + · · · + δm) = QF · X. Even though the integrity constraint forbids any more transactions, it is still possible that the system still has (N − QF) · X resources remaining due to failures, lost, or out-of-order messages. The worst case is when the remaining resources are equally distributed across all the storage nodes; otherwise, at least one of the storage nodes would start to reject options earlier. The remaining resources (N − QF) · X are divided evenly among the N storage nodes to derive a lower limit that guarantees the value constraint. Each storage node must reject an option if it would cause the value to fall below:

L = ((N − QF) / N) · X

This limit L is calculated with every new base value of an attribute. When options in fast ballots are rejected because of this limit, the protocol handles it as a collision, resolves it by switching to classic ballots, and writes a new base value and limit L. In the example in Figure 3.2, the quorum demarcation lower limit would be:

L = ((N − QF) / N) · X = ((5 − 4) / 5) · 4 = 0.8

Since the lower limit is 0.8, the storage nodes would have rejected the options from T4 and T5, causing those two transactions to abort. Since only T1, T2, and T3 commit, the integrity constraint is not violated.
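A minimal sketch of the limit computation (the function name is an assumption) is:

// n = replication factor, qf = fast quorum size, base = latest committed base value.
def lowerLimit(n: Int, qf: Int, base: Double): Double =
  (n - qf).toDouble / n * base

// Figure 3.2 example: lowerLimit(5, 4, 4.0) == 0.8; a storage node rejects any
// option that would drive its local value below this limit.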

3.3.4.3 MDCC Pseudocode

The complete MDCC protocol is listed as pseudocode in Algorithms 1, 2, 3, 4, and 5, while Table 3.1 defines the symbols and variables used in the pseudocode.

Symbols and Definitions:
a : an acceptor
l : a leader
up : an update
ω(up, ) : an option for an update, with ✓ or ✗
✓ / ✗ : acceptance / rejection
m : ballot number
vala[i] : cstruct at ballot i at acceptor a
bala : max{k | vala[k] ≠ none}
vala : cstruct at bala at acceptor a
mbala : current ballot number at acceptor a
ldrBall : ballot number at leader l
maxTriedl : cstructs proposed by leader l
Q : a quorum of acceptors
Quorum(k) : all possible quorums for ballot k
learned : cstruct learned by a learner
⊓ : greatest lower bound operator
⊔ : least upper bound operator
⊑ : partial order operator for cstructs
val • ω(up, ) : appends option ω(up, ) to cstruct val
Table 3.1: Definitions of symbols in MDCC pseudocode

The remainder of this section sketches the algorithm by focusing on how the different pieces from the previous sections work together. The app-server or client starts the transaction by sending proposals for every update on line 3. After learning the status of the options of all the updates (lines 19-36), the app-server sends visibility messages to “execute” the options on lines 6-9, as described in Section 3.3.2.1. While attempting to learn options, if the app-server does not learn the status of an option (line 24), it initiates a recovery. Also, if the app-server learns that a commutative option was rejected during a fast ballot (line 32), it notifies the master to start recovery. Learning a rejected option for commutative updates during fast ballots is an indicator that the quorum demarcation limit was violated, so a classic ballot is required to update the base value and limit. When accepting new options, the storage nodes must evaluate the compatibility of the options and then accept or reject each of them. The compatibility validation is shown in lines 109-129. If the new update is not commutative, the storage node compares the read version of the update to the current value to determine the compatibility, as shown in lines 112-119. For new commutative updates, the storage node computes the quorum demarcation limits as described in Section 3.3.4.2, and determines whether any combination of the pending commutative options violates the limits (lines 120-127). When a storage node receives a visibility message for an option, it executes the option in order to make the update visible, on line 133.

Algorithm 1 Pseudocode for MDCC
1:  procedure TransactionStart                  ⊲ Client
2:    for all up ∈ tx do
3:      run SendProposal(up)
4:    end for
5:    wait to learn all update options
6:    if ∀up ∈ tx : learned ω(up, ✓) then
7:      send Visibility[up, ✓] to Acceptors
8:    else
9:      send Visibility[up, ✗] to Acceptors
10:   end if
11: end procedure
12: procedure SendProposal(up)                  ⊲ Proposer
13:   if classic ballot then
14:     send Propose[up] to Leader
15:   else
16:     send Propose[up] to Acceptors
17:   end if
18: end procedure
19: procedure Learn(up)                         ⊲ Learner
20:   collect Phase2b[m, vala] messages from Q
21:   if ∀a ∈ Q : v ⊑ vala then
22:     learned ← learned ⊔ v
23:   end if
24:   if ω(up, ) ∉ learned then
25:     send StartRecovery[] to Leader
26:     return
27:   end if
28:   if classic ballot then
29:     move on to next instance
30:   else
31:     isComm ← up is CommutativeUpdate[delta]
32:     if ω(up, ✗) ∈ learned ∧ isComm then
33:       send StartRecovery[] to Leader
34:     end if
35:   end if
36: end procedure

3.4 Consistency Guarantees

MDCC ensures atomicity (either all updates in a transaction persist or none) and ensures that two concurrent write-conflicting update transactions do not both commit. This section describes the consistency guarantees of MDCC.

Algorithm 2 Pseudocode for MDCC - Leader l (part 1 of 2)
37: procedure ReceiveLeaderMessage(msg)
38:   switch msg do
39:     case Propose[up]:
40:       run Phase2aClassic(up)
41:     case Phase1b[m, bal, val]:
42:       if received messages from Q then
43:         run Phase2Start(m, Q)
44:       end if
45:     case StartRecovery[]:
46:       m ← new unique ballot number greater than m
47:       run Phase1a(m)
48: end procedure

3.4.1 Read Committed without Lost Updates

The default consistency level of MDCC is read committed, but without the lost update problem [15]. Read committed isolation prevents dirty reads, so no transactions will read any other transaction's uncommitted changes. The lost update problem occurs when transaction t1 first reads a data item X, then one or more other transactions write to the same data item X, and finally t1 writes to data item X. The updates between the read and write of item X by t1 are “lost” because the write by t1 overwrites the value and loses the previous updates. MDCC guarantees read committed isolation by only reading committed values and not returning the value of uncommitted options. Lost updates are prevented by detecting every write-write conflict between transactions. Currently, Microsoft SQL Server, Oracle Database, IBM DB2, and PostgreSQL all use read committed isolation by default. Therefore, the default consistency level of MDCC is sufficient for a wide range of applications.

3.4.2 Staleness & Monotonicity

Reads can be done from any storage node and are guaranteed to return only committed data. However, by just reading from a single node, the read might be stale. For example, if a storage node missed updates due to a network problem, reads might return older data. Reading the latest value requires reading a majority of storage nodes to determine the latest stable version, making it an expensive operation. In order to allow up-to-date reads with classic rounds, MDCC can leverage techniques from Megastore [13]. A simple strategy for up-to-date reads with fast rounds is to ensure that a special pseudo-master storage node is always part of the quorums of Phases 1 and 2, and to switch to classic whenever the pseudo-master cannot be contacted. The techniques from Megastore can then be applied to the pseudo-master to guarantee up-to-date reads in all data centers. A similar strategy can guarantee monotonic reads, such as repeatable reads or read-your-writes, by requiring the local storage node to always participate in the quorum.
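One way to express the quorum-selection rule is sketched below (all names are assumptions; MDCC itself may implement this differently): the designated node, either the pseudo-master for up-to-date reads or the local node for monotonic reads, must be part of every quorum, and the protocol falls back to classic rounds when it cannot be.

// Pick a quorum of the given size that always contains `mustInclude`;
// returns None when that node is unreachable, signalling a switch to classic rounds.
def pickQuorum(reachable: Seq[String], mustInclude: String, size: Int): Option[Seq[String]] =
  if (!reachable.contains(mustInclude) || reachable.size < size) None
  else Some(mustInclude +: reachable.filterNot(_ == mustInclude).take(size - 1))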

Algorithm 3 Pseudocode for MDCC - Leader l (part 2 of 2)
49: procedure Phase1a(m)
50:   if m > ldrBall then
51:     ldrBall ← m
52:     maxTriedl ← none
53:     send Phase1a[m] to Acceptors
54:   end if
55: end procedure
56: procedure Phase2Start(m, Q)
57:   maxTriedl ← ProvedSafe(Q, m)
58:   if new update to propose exists then
59:     run Phase2aClassic(up)
60:   end if
61: end procedure
62: procedure Phase2aClassic(up)
63:   maxTriedl ← maxTriedl • ω(up, )
64:   send Phase2a[ldrBall, maxTriedl] to Acceptors
65: end procedure
66: procedure ProvedSafe(Q, m)
67:   k ≡ max{i | (i ...

3.4.3 Atomic Visibility

MDCC provides atomic durability, meaning either all or none of the operations of the transaction are durable, but it does not support atomic visibility. The visibility point follows after the commit/durability point, when the asynchronous visibility message is sent to the acceptors. Therefore, some of the updates of a committed transaction might be visible whereas others are not. For example, if transaction t1 inserts new data items A and B, and then commits, both inserts will be persistent in the system. However, it is possible that a subsequent transaction t2 reads only one of A or B. Two-phase commit also only provides atomic durability, not visibility, unless it is combined with other techniques such as two-phase locking or snapshot isolation. The same is true for MDCC. For example, MDCC could use a read/write locking service per data center or snapshot isolation, as done in Spanner [31], to achieve atomic visibility. Another technique for atomic visibility without locking can be achieved by including the full write-set of the transaction with every update. It is then possible to use standard optimistic concurrency control techniques to detect whether a committed transaction was only partially read.

Algorithm 4 Pseudocode for MDCC - Acceptor a (part 1 of 2)
77: procedure ReceiveAcceptorMessage(msg)
78:   switch msg do
79:     case Phase1a[m]:
80:       run Phase1b(m)
81:     case Phase2a[m, v]:
82:       run Phase2bClassic(m, v)
83:     case Propose[up]:
84:       run Phase2bFast(up)
85:     case Visibility[up, status]:
86:       run ApplyVisibility(up, status)
87: end procedure
88: procedure Phase1b(m)
89:   if mbala ...

During a read, the write-set is checked against previously performed reads, and if a missing update is detected, the missing values may be retrieved, or the transaction may abort. Nevertheless, this strategy for atomic visibility can be expensive, as it implies storing all write-sets with every update on all records.
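A sketch of the write-set check described above (illustrative only; field and function names are assumptions):

case class VersionedRead(key: String, version: Long)

// Each update carries its transaction's full write-set: key -> version written.
// Returns true if a previously performed read missed one of those writes, in
// which case the missing values are re-read or the transaction aborts.
def missedUpdate(readsSoFar: Seq[VersionedRead], writeSet: Map[String, Long]): Boolean =
  readsSoFar.exists(r => writeSet.get(r.key).exists(written => r.version < written))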

3.4.4 Other Isolation Levels

Finally, MDCC can support higher levels of isolation. In particular, Non-monotonic Snapshot Isolation (NMSI) [70] or Spanner's [31] snapshot isolation through synchronized clocks are natural fits for MDCC. Both would still allow fast commits while providing consistent snapshots. Furthermore, as MDCC already checks the write-set for transactions, the protocol could be extended to also consider read-sets, allowing MDCC to leverage optimistic concurrency control techniques and ultimately provide full serializability, at the expense of more transaction aborts due to conflicts and more network messages.

Algorithm 5 Pseudocode for MDCC - Acceptor a (part 2 of 2)
109: procedure SetCompatible(v)
110:   for all new options ω(up, ) in v do
111:     switch up do
112:       case PhysicalUpdate[vread, vwrite]:
113:         validRead ← vread matches current value
114:         validSingle ← no other pending options exist
115:         if validRead ∧ validSingle then
116:           set option to ω(up, ✓)
117:         else
118:           set option to ω(up, ✗)
119:         end if
120:       case CommutativeUpdate[delta]:
121:         U ← upper quorum demarcation limit
122:         L ← lower quorum demarcation limit
123:         if any option combinations violate U or L then
124:           set option to ω(up, ✗)
125:         else
126:           set option to ω(up, ✓)
127:         end if
128:   end for
129: end procedure
130: procedure ApplyVisibility(up, status)
131:   update ω(up, ) in vala to ω(up, status)
132:   if status = ✓ then
133:     apply up to make update visible
134:   end if
135: end procedure


3.5 Evaluation

I implemented a prototype of MDCC on top of a distributed key/value store across five different data centers using the Amazon EC2 cloud. To demonstrate the benefits of MDCC, I use TPC-W and micro-benchmarks to compare the performance characteristics of MDCC to other transactional protocols and to non-transactional, eventually consistent protocols. This section describes the benchmarks, experimental setup, and my findings.

3.5.1 Experimental Setup

I implemented the MDCC protocol in Scala, on top of a distributed key/value store which uses Oracle BDB Java Edition as the persistent storage engine. I deployed the system across five geographically diverse data centers on Amazon EC2: US West (N. California), US East

(Virginia), EU (Ireland), Asia Pacific (Singapore), and Asia Pacific (Tokyo). Each data center has a full replica of the data, and within a data center, each table is range partitioned by key, and distributed across several storage nodes as m1.large instances (4 cores, 7.5GB memory). Therefore, every horizontal partition, or shard, of the data is replicated five times, with one copy in each data center. Unless noted otherwise, all clients issuing transactions are evenly distributed across all five data centers, on separate m1.large instances.

3.5.2 Comparison with other Protocols

To compare the overall performance of MDCC with alternative designs, I used TPC-W, a transactional benchmark that simulates the workload experienced by an e-commerce web server. TPC-W defines a total of 14 web interactions (WI), each of which is a web page request that issues several database queries. In TPC-W, the only transaction which is able to benefit from commutative operations is the product-buy request, which decreases the stock for each item in the shopping cart while ensuring that the stock never drops below 0 (otherwise, the transaction should abort). I implemented all the web interactions using a SQL-like language but forego the HTML rendering part of the benchmark to focus on the database interactions. TPC-W defines that these WI are requested by emulated browsers, or clients, with a wait-time between requests and varying browse-to-buy ratios. In my experiments, I forego the wait-time between requests and only use the most write-heavy profile to stress the system. It should also be noted that read committed isolation is sufficient for TPC-W to never violate its data consistency. MDCC is a commit protocol, so the main metrics collected are the write transaction response times and throughput. In these experiments, the MDCC prototype uses fast ballots with commutativity where possible (reverting to classic after too many collisions have occurred, as described in Section 3.3.3.2). For comparison, I also implemented other protocols in Scala, using the same distributed data store, and accessed by the same clients. These other protocols are described below:

Quorum Writes (QW) The quorum writes protocol (QW) is the standard for most eventually consistent systems and is implemented by simply sending all updates to all involved storage nodes and then waiting for responses from a quorum of nodes. Typically, a quorum must contain a majority of the storage nodes, so that any two quorums will have at least one overlapping storage node. I used two different configurations for the write quorum: a quorum of size 3 out of 5 replicas for each record (QW-3), and a quorum of size 4 out of 5 (QW-4). I used a read quorum of 1 to access only the local replica (this is the fastest read configuration). It is important to note that the quorum writes protocol provides no isolation, atomicity, or transactional guarantees.
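For reference, the quorum-writes baseline can be sketched in a few lines of Scala (the RPC stub `send` and all names are assumptions, not the evaluated implementation):

import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{ExecutionContext, Future, Promise}

// Send the update to every replica and complete once w acknowledgements arrive.
// No isolation, atomicity, or transactional guarantees; a real implementation
// would also handle timeouts and permanently failed replicas.
def quorumWrite(replicas: Seq[String], update: Array[Byte], w: Int,
                send: (String, Array[Byte]) => Future[Boolean])
               (implicit ec: ExecutionContext): Future[Boolean] = {
  val acked  = new AtomicInteger(0)
  val result = Promise[Boolean]()
  replicas.foreach { replica =>
    send(replica, update).foreach { ok =>
      if (ok && acked.incrementAndGet() == w) result.trySuccess(true)
    }
  }
  result.future
}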

Two-Phase Commit (2PC) Two-phase commit (2PC) is still considered the standard protocol for distributed transactions. 2PC operates in two phases. In the first phase, a transaction manager tries to prepare all involved storage nodes to commit the updates.

If all relevant nodes prepare successfully, then in the second phase the transaction manager sends a commit to all storage nodes involved; otherwise it sends an abort. Note that 2PC requires all involved storage nodes to respond and is not resilient to single node failures.
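A minimal sketch of this baseline (the RPC stubs are assumptions):

// Phase 1: every involved storage node must vote to prepare.
// Phase 2: commit only if all prepared; otherwise abort everywhere.
// Note that a single unresponsive node blocks the transaction.
def twoPhaseCommit(nodes: Seq[String],
                   prepare: String => Boolean,
                   commit: String => Unit,
                   abort: String => Unit): Boolean = {
  val allPrepared = nodes.forall(prepare)
  if (allPrepared) nodes.foreach(commit) else nodes.foreach(abort)
  allPrepared
}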

Megastore* Megastore [13] is a strongly consistent, globally distributed database that uses Paxos to fully serialize transactions within an entity group (a partition of data). I was not able to compare MDCC directly against the Megastore system because it is not publicly available. Google App Engine uses Megastore, but the data centers and configuration are unknown and out of my control. Instead, I simulated the underlying protocol to compare it with MDCC; I do this as a special configuration of the system, referred to as Megastore*. The Megastore protocol is described mainly for transactions within a partition. The authors state that 2PC is used across partitions with looser consistency semantics, but they omit details on the implementation and discourage using the feature because of its high latency. Megastore's primary use case is not a general-purpose transactional workload (such as TPC-W), but statically partitioned, independent entity groups. Therefore, for experiments with Megastore*, all the data was placed into a single entity group to avoid transactions which span multiple entity groups. Furthermore, Megastore only allows one write transaction to execute at any time (all other competing transactions will abort). As this results in unusable throughput for TPC-W, I included an improvement from Paxos-CP [65] and allowed non-conflicting transactions to commit using a subsequent Paxos instance. I also relaxed the read consistency to read committed, enabling a fair comparison between Megastore* and MDCC. Finally, Megastore* was given an advantage by placing all clients and masters local to each other in one data center (US-West), to allow all transactions to commit with a single round-trip.

3.5.2.1 TPC-W Write Response Times

To evaluate the main goal of MDCC, reducing latency, the TPC-W workload was run with each protocol. A TPC-W scale factor of 10,000 items was used, with the data evenly range-partitioned across four storage nodes per data center. 100 evenly geo-distributed clients (on separate machines) each ran the TPC-W benchmark for 2 minutes, after a 1 minute warm-up period. Figure 3.3 shows the cumulative distribution functions (CDF) of the response times of committed write transactions for the different protocols. Note that the horizontal (time) axis is log-scale. Only the response times for write transactions are reported, as read transactions were always local for all configurations and protocols. The two dashed lines (QW-3, QW-4) are non-transactional, eventually consistent protocols, and the three solid lines (MDCC, 2PC, Megastore*) are transactional, strongly consistent protocols. Figure 3.3 shows that the non-transactional protocol QW-3 has the fastest response times, followed by QW-4, then the transactional systems, of which MDCC is the fastest, then 2PC, and finally Megastore* with the slowest times.

Figure 3.3: TPC-W write transaction response times CDF (percentage of transactions vs. write transaction response time in ms, log-scale).

The median response times are: 188ms for QW-3, 260ms for QW-4, 278ms for MDCC, 668ms for 2PC, and 17,810ms for Megastore*. Since MDCC uses fast ballots whenever possible, MDCC often commits transactions from any data center with a single round-trip to a quorum of size 4. This explains why the performance of MDCC is similar to QW-4. The difference between QW-3 and QW-4 arises from the need to wait for the 4th response instead of returning after the 3rd. Because of the non-uniform latencies between data centers, the 4th response is on average farther away than the 3rd, and there is more variance when waiting for more responses. Hence, an administrator might choose to configure an MDCC-like system to use classic instances with a local master if it is known that most requests are issued from the same data center (see Section 3.5.3.3 for an evaluation). MDCC reduces per-transaction latencies by 50% compared to 2PC because it commits in one round-trip instead of two. Most surprising, however, is the orders-of-magnitude improvement over Megastore*. This can be explained by the fact that Megastore* must serialize all transactions with Paxos (it executes one transaction at a time), so heavy queuing effects occur. This queuing effect happens because of the moderate load; it is possible to avoid it by reducing the load or by allowing multiple transactions to commit in one commit log record. If so, performance would be similar to the classic Paxos configuration discussed in Section 3.5.3.1. Even without queuing effects, Megastore* would require an additional round-trip to the master for non-local transactions. Since Google's Spanner [31] uses 2PC across Paxos groups, and each Paxos group requires a master, Spanner should behave similarly to the 2PC data in Figure 3.3. I conclude from this experiment that MDCC achieves the main goal: it supports strongly consistent transactions with latencies similar to non-transactional protocols which provide weaker consistency, and is significantly faster than other strongly consistent protocols (2PC,

Megastore*).

3.5.2.2 TPC-W Throughput and Scalability

One of the intended advantages of cloud-based storage systems is the ability to scale out without affecting performance. I performed a scale-out experiment using the same settings as in the previous section, except that I varied the scale to (50 clients, 5,000 items), (100 clients, 10,000 items), and (200 clients, 20,000 items). For each configuration, the amount of data per storage node was fixed to a TPC-W scale factor of 2,500 items and the number of nodes was scaled accordingly (keeping the ratio of clients to storage nodes constant). For the same reasons as before, a single partition was used for Megastore* to avoid cross-partition transactions.

Figure 3.4: TPC-W throughput scalability (transactions per second vs. number of concurrent clients: 50, 100, 200).

Figure 3.4 shows the results of the throughput measurements of the various protocols. The plot shows that the QW protocols have the lowest message and CPU overhead and therefore the highest throughput, with the MDCC throughput not far behind. For 200 concurrent clients, the MDCC throughput was within 10% of the throughput of QW-4. The experiments also demonstrate that MDCC has higher throughput compared to the other strongly consistent protocols, 2PC and Megastore*. The throughput for 2PC is significantly lower, mainly due to the additional waiting for the second round. As expected, the QW protocols scale almost linearly; MDCC and 2PC scale similarly. The Megastore* throughput is very low and does not scale out well, because all transactions are serialized within the single partition. This low throughput and poor scaling matches the results in [48] for Google App Engine, a public service using Megastore. In summary, Figure 3.4 shows that MDCC provides strongly consistent cross-data-center transactions with throughput and scalability similar to eventually consistent protocols.

3.5.3 Exploring the Design Space

I use a micro-benchmark to study the different optimizations within the MDCC protocol and how it is sensitive to workload features. The data for the micro-benchmark is a single table of items, with randomly chosen stock values and a constraint that the stock attribute has to be at least 0. The benchmark defines a simple buy transaction that chooses 3 random items uniformly and, for each item, decrements the stock value by an amount between 1 and 3 (a commutative operation). Unless stated otherwise, there are 100 geo-distributed clients and a pre-populated product table with 10,000 items sharded on 2 storage nodes per data center.
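For clarity, the buy transaction can be sketched as follows (a Scala sketch generating SQL-like statements; the names are assumptions based on the description above):

import scala.util.Random

// Choose 3 items uniformly at random and decrement each stock by 1 to 3
// (a commutative operation), subject to the constraint Stock >= 0.
def buyTransaction(numItems: Int = 10000): Seq[String] =
  Seq.fill(3) {
    val item   = Random.nextInt(numItems)
    val amount = 1 + Random.nextInt(3)
    s"UPDATE Items SET Stock = Stock - $amount WHERE ItemId = $item"
  }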

3.5.3.1 Response Times

To study the effects of the different design choices in MDCC, the micro-benchmark is run with 2PC and various MDCC configurations. The MDCC configurations are:

• MDCC. the fully featured protocol

• Fast. the MDCC protocol without the commutative update support

• Multi. the MDCC protocol, but with all instances being Multi-Paxos (a stable master can skip Phase 1)

The experiment ran for 3 minutes after a 1 minute warm-up. Figure 3.5 shows the cumulative distribution functions (CDF) of response times of the successful transactions.

Figure 3.5: Micro-benchmark response times CDF (percentage of transactions vs. write transaction response time in ms, for MDCC, Fast, Multi, and 2PC).

The median response times are: 245ms for MDCC, 276ms for Fast, 388ms for Multi, and 543ms for 2PC. 2PC is the slowest because it must use two round-trips across data centers

and has to wait for responses from all 5 data centers. For Multi, with the masters being uniformly distributed across all the data centers, most of the transactions (about 4/5 of them) require a message round-trip to contact the master, so two round-trips across data centers are required, similar to 2PC. In contrast to 2PC, Multi needs responses from only 3 of 5 data centers, so the response times are improved. The response times for Multi would also be observed for Megastore* if no queuing effects are experienced. Megastore* experiences heavier queuing effects because all transactions in the single entity group are serialized, whereas with Multi, only the updates per record are serialized. The main reason for the low latencies of MDCC is the use of fast ballots. Both MDCC and Fast return earlier than the other protocols because they often require one round-trip across data centers and, unlike Multi, do not need to contact a master. The improvement from Fast to MDCC is because commutative updates reduce conflicts and thus collisions, so MDCC can continue to use fast ballots and avoid resolving collisions as described in Section 3.3.3.2.

3.5.3.2 Varying Conflict Rates

MDCC attempts to take advantage of situations when conflicts are rare, so I explore how the MDCC commit performance is affected by different conflict rates. I defined a hot-spot area and modified the micro-benchmark to access items in the hot-spot area with 90% probability, and to access the cold-spot portion of the data with the remaining 10% probability. By adjusting the size of the hot-spot as a percentage of the data, the conflict rate in the access patterns can be altered. For example, when the hot-spot is 90% of the data, the access pattern is equivalent to uniform random access, since 90% of the accesses will go to 90% of the data. The Multi system uses masters to serialize transactions, so Paxos conflicts occur only when there are multiple potential masters, which should be rare. Multi simply aborts transactions when the read version is not the same as the version in the storage node (indicating a write-write transaction conflict), to keep the data consistent, making Paxos collisions independent of transaction conflicts. On the other hand, for Fast Paxos, collisions are related to transaction conflicts, as a collision/conflict occurs whenever a quorum of size 4 does not agree on the same decision. When this happens, collision resolution must be triggered, which eventually switches to a classic ballot, which takes at least 2 more message rounds. MDCC is able to improve on this by exploiting commutativity, but it can still trigger an expensive collision resolution whenever the quorum demarcation limit for the integrity constraint is reached, as described in Section 3.3.4.2. Figure 3.6 shows the number of commits and aborts for the different designs, for various hot-spot sizes. When the hot-spot size is large, the conflict rate is low, so all configurations do not experience many aborts. MDCC commits the most transactions because it does not abort any transactions. Fast commits slightly fewer, because it has to resolve the collisions which occur when different storage nodes see updates in different orders. Multi commits far fewer transactions because most updates have to be sent to a remote master, which increases the response times and decreases the throughput.

Figure 3.6: Commits/aborts for varying conflict rates (commits and aborts, in thousands, for 2PC, Multi, Fast, and MDCC at hot-spot sizes of 90%, 50%, 20%, 10%, 5%, and 2%).

As the hot-spot decreases in size, the conflict rate increases because more of the transactions access smaller portions of the data. Therefore, more transactions abort as the hot-spot size decreases. When the hot-spot is at 5%, Fast commits fewer transactions than Multi. This can be explained by the fact that Fast needs 3 round-trips to ultimately resolve conflicting transactions, whereas Multi usually uses 2 rounds. When the hot-spot is at 2%, the conflict rate is very high, so both Fast and MDCC perform very poorly compared to Multi. The costly collision resolution for fast ballots is triggered so often that many transactions are not able to commit. Therefore, fast ballots can take advantage of master-less operation as long as the conflict rate is not very high. When the conflict rate is too high, a master-based approach is more beneficial and MDCC should be configured as Multi. Exploring policies to automatically determine the best strategy remains future work.

3.5.3.3 Data Access Locality

Classic ballots can save message trips when the client requests have affinity for data with a local master. To explore the trade-off between fast and classic ballots, the micro-benchmark was modified to vary the choice of data items within each transaction, so that a configurable percentage of transactions access records with local masters. At one extreme, 100% of transactions choose their items only from those with a local master; at the other extreme, 20% of the transactions choose items with a local master (items are chosen uniformly at random). Figure 3.7 shows the box plots of the latencies of Multi and MDCC for different master localities. When all the masters are local to the clients, Multi has lower response times than MDCC, as shown in the graph for 100%. However, as updates access more remote masters, response times for Multi get slower and also increase in variance, while MDCC still maintains the same profile.

Figure 3.7: Response times for varying master locality (box plots of response times in ms for Multi and MDCC at 100%, 80%, 60%, 40%, and 20% probability of a local master).

Even when 80% of the updates are local, the median Multi response time (242ms) is slower than the median MDCC response time (231ms). The MDCC design is targeted at situations without particular access locality, and Multi only out-performs MDCC when the locality is near 100%. It is also interesting to note that the maximum latency of the Multi configuration is higher than for full MDCC. This can be explained by the fact that some transactions have to queue until the previous transaction finishes, whereas MDCC normally operates in fast ballots and everything is done in parallel.

3.5.3.4 Data Center Fault Tolerance

How MDCC behaves under data center failures was also studied. In this experiment, 100 clients were started issuing write transactions from the US-West data center. About two minutes into the experiment, a failure of the US-East data center was simulated, which is the data center closest to US-West. The failed data center was simulated by preventing it from receiving any messages. Since US-East is closest to US-West, “killing” US-East forces the protocol to actively deal with the failure. All committed transaction response times were recorded and plotted as the time series graph in Figure 3.8. Figure 3.8 shows the transaction response times before and after failing the data center, which occurred at around 125 seconds into the experiment (solid vertical line). The average response time of transactions before the data center failure was 173.5 ms, and the average response time of transactions after the data center failure was 211.7 ms (dashed horizontal lines). The MDCC system clearly continues to commit transactions seamlessly across the data center failure. The average transaction latencies increase after the data center failure, but that is expected behavior, since the MDCC commit protocol uses quorums and must wait for responses from another data center, potentially farther away. The same argument also explains the increase in variance.

Figure 3.8: Time-series of response times during failure (response time in ms vs. elapsed time in s; the data center failure was simulated at 125 seconds).

If the data center comes up again (not shown in the figure), only records which were updated during the failure would still be impacted by the increased latency, until the next update or until a background process brings them up-to-date. Since fewer servers respond, it is more likely that the protocol is forced to wait for a delayed message. These results show the resilience of the MDCC protocol against data center failures.

3.6 Related Work

This section expands on the general background of Section 2. There has been recent interest in scalable geo-replicated data stores. Several recent proposals use Paxos to agree on log-positions, similar to state-machine replication. For example, Megastore [13] uses Multi-Paxos to agree on log-positions to synchronously replicate data across multiple data centers (typically five data centers). Google Spanner [31] is similar, but uses synchronized timestamps to provide snapshot isolation. Furthermore, other state-machine techniques for WANs, such as Mencius [57] or HP/CoreFP [35], could also be used for wide-area database log replication. All these systems have in common that they significantly limit throughput by serializing all commit log records and thus, implicitly, executing only one transaction at a time. As a consequence, they must partition the data into small shards to get reasonable performance. Furthermore, these protocols rely on an additional protocol (usually 2PC, with all its disadvantages) to coordinate any transactions that access data across shards. Spanner and Megastore are both master-based approaches, and introduce an additional message delay for remote clients. The authors of Spanner describe that they tried a more fine-grained use of Paxos by running multiple instances per shard, but that they eventually gave up because of the complexity. In this chapter, I showed it is possible and presented a system that uses

multiple Paxos instances to execute transactions. Mencius [57] uses a clever token-passing scheme to avoid relying on a single static master. This scheme, however, is only useful on large partitions and is not easily applicable to finer-grained replication (e.g., on a record level). Like MDCC, HP/CoreFP avoids a master, and improves on the cost of Paxos collisions by executing classic and fast rounds concurrently. Their hybrid approach could easily be integrated into MDCC but requires significantly more messages, which is worrisome for real-world applications. In summary, MDCC improves over these approaches by not requiring partitioning, natively supporting transactions as a single protocol, and/or avoiding a master when possible. Paxos-CP [65] improves Megastore's replication protocol by allowing non-conflicting transactions to move on to subsequent commit log-positions and by combining commits into one log-position, significantly increasing the fraction of committed transactions. The ideas are very interesting, but their performance evaluation does not show that it removes the log-position bottleneck (Paxos-CP only executes 4 transactions per second). Compared to MDCC, Paxos-CP requires an additional master-based single-node transaction conflict detection, but is able to provide stronger serializability guarantees. A more fine-grained use of Paxos was explored in Consensus on Commit [39], to reliably store the resource manager decision (commit/abort) and make it resilient to failures. In theory, there could be a resource manager per record. However, they treat data replication as an orthogonal issue and require that a single resource manager makes the decision (commit/abort), whereas MDCC assumes this decision is made by a quorum of storage nodes. Scalaris [74] applied consensus on commit to DHTs, but cannot leverage Fast Paxos as MDCC does. The use of record versioning with Paxos in MDCC has some commonalities with multi-OHS [1], a protocol to construct Byzantine fault-tolerant services, which also supports atomic updates to objects. However, multi-OHS only guarantees atomic durability for a single server (not across shards), and it is not obvious how to use the protocol for distributed transactions or commutative operations. Other geo-replicated data stores include PNUTS [30], Amazon's Dynamo [33], Walter [79] and COPS [56]. These use asynchronous replication, with the risk of violating consistency and losing data updates in the event of major data center failures. Walter [79] also supports a second mode with stronger consistency guarantees between data centers, but this relies on 2PC and always requires two round-trip times. The use of optimistic atomic broadcast protocols for transaction commit was proposed in [47, 66]. That technique does not explore commutativity and often has considerably longer response times in the wide-area network because of the wait for a second verify message before the commit is final. Finally, the MDCC demarcation strategy for quorums was inspired by classic demarcation [14], which first proposed using extra limits to ensure value constraints.

3.7 Conclusion

Scalable transactions for distributed database systems are important for the needs of modern applications. With geo-replicated database systems, the long and fluctuating latencies between data centers make it hard to support highly available applications that can scale out for massive workloads and can survive data center failures. Reducing the latency and improving the scalability of transactional commit protocols are the main goals of this research. I proposed MDCC as a new approach for scalable transactions with synchronous replication in the wide-area network. The MDCC commit protocol is able to tolerate data center failures without compromising consistency, at a cost similar to eventually consistent protocols. It requires only one message round-trip across data centers in the common case. In contrast to 2PC, MDCC is an optimistic commit protocol and takes advantage of situations when conflicts are rare and/or when updates commute. It is the first protocol applying the ideas of Generalized Paxos to transactions that may access records spanning partitions. I also presented a new technique to guarantee domain integrity constraints in a quorum-based system. MDCC provides a scalable transactional commit protocol for the wide-area network which achieves strong consistency at a cost similar to eventually consistent protocols.

Chapter 4

A New Transaction Programming Model

4.1 Introduction

In Chapter 3, I proposed a new commit protocol for more scalable transactions in distributed database systems. Lower-latency transactions are very beneficial for developing demanding applications, but there are still challenges in interacting with distributed transactions. Modern database environments can significantly increase the uncertainty in transaction response times and make developing user-facing applications more difficult than ever before. Unexpected workload spikes [18], as well as the recent trends of multi-tenancy [32, 36] and hosting databases in the cloud [48, 72, 29], are all possible causes for transaction response times to experience higher variance and latency. With modern distributed, geo-replicated database systems, the situation is only worse. Geo-replication is now considered essential for many online services to tolerate entire data center outages [38, 30, 5, 4]. However, geo-replication drastically influences latency, because the network delays can be hundreds of milliseconds and can vary widely. Figure 4.1 exhibits the higher latency (~100ms average latency) and variability (latency spikes exceeding 800ms) of RPC message response times between geographically diverse Amazon EC2 regions. With these modern database environments of higher latency and variance, there are currently only two possible ways developers can deal with transactions in user-facing applications such as web-shops or social web-applications. Developers are forced to either wait longer for the transaction result, or be uncertain of the transaction result. These two options cause an undesirable situation, as clients interacting with database-backed applications are frequently frustrated when transactions take too long to complete or fail unexpectedly. To help developers cope with the uncertainty and higher latency, I introduce Predictive Latency-Aware NEtworked Transactions (PLANET), a new transaction programming model. PLANET provides staged feedback from the system for faster responses, and greater visibility of transaction state and the commit likelihood for minimizing uncertainty. This enables building user-facing applications in unpredictable environments, without sacrificing the user experience in unexpected periods of high latency.

Figure 4.1: Round-trip response times (ms, log-scale) between various regions on Amazon's EC2 cluster (US West–EU, US East–EU, US West–Tokyo, US East–Tokyo), measured from June 10 to June 14.

Also, PLANET is the first transaction programming model that allows developers to build applications using the guesses and apologies paradigm suggested by Helland and Campbell [43]. By exposing more transaction state to the developer, PLANET also enables applications to speculatively initiate some processing, thereby trading the expense of an occasional apology or revocation for faster response in the typical case that the transaction eventually succeeds. Furthermore, in cases where delays or data access patterns suggest that success is unlikely, PLANET can quickly reject transactions rather than spending user time and system resources on a doomed transaction. With the flexibility and insight that PLANET provides, application developers can regain control over their transactions, instead of losing them in unpredictable states after a timeout. The remainder of this chapter describes the details and features of PLANET. Section 4.2 describes how current models fall short, and outlines how PLANET satisfies my vision for a transaction programming model for unreliable environments. In Sections 4.3 and 4.4, I describe the details of PLANET, and in Section 4.5, I discuss the commit likelihood models and implementation details for MDCC, a geo-replicated database system. Finally, various features of the PLANET transaction programming model are evaluated in Section 4.6, followed by related work and the conclusion in Sections 4.7 and 4.8.

4.2 The Past, The Dream, The Future

In a high-variance distributed environment, unexpected periods of high latency can cause database transactions to either take a long time to complete or experience unusually high failure rates. In this section, I first outline the pitfalls of current techniques when dealing with high latency and variance, then describe the requirements for a latency-aware transaction

Session sess = factory.openSession();
Transaction tx = null;
try {
    tx = sess.beginTransaction();
    tx.setTimeout(1); // 1 second
    // The transaction operations
    boolean success = tx.commit();
} catch (RuntimeException e) {
    // Handle the exception
} finally {
    sess.close();
}
Listing 4.1: Typical Hibernate Transaction

programming model, and finally provide an overview of my solution, PLANET.

4.2.1 The Past: Simple Timeouts

Current, state-of-the-art transaction programming models such as JDBC or Hibernate provide little or no support for achieving responsive transactions. Existing transaction programming models only offer a time-out and ultimately implement a “fire-and-hope” paradigm, where once the transaction is started, the user can only hope that it will finish within the desired time frame. If the transaction does not return before the application's response-time limit, its outcome is entirely unknown. In such cases, most applications choose to display a vague error message. Listing 4.1 shows a typical transaction with Hibernate [44] and the timeout set to 1 second. That is, within 1 second, the transaction either returns with the outcome stored in the Boolean variable success, or throws an exception. In the case of an exception, the outcome of the transaction is unknown.¹ Exceptions can either mean the transaction is already committed/aborted, will later be rolled back because of the timeout, or was simply lost and never even accepted at the server. When a transaction throws an exception, developers have two options to recover from this unknown state: either periodically poll the database system to check if the transaction was executed, or “hack” the database system to get access to the persistent transaction log. The first option is difficult to implement without modifying the original transaction, as it is often infeasible to distinguish between an application's own changes and the changes of other concurrent transactions. The second option requires detailed knowledge about the internals of the database system and is especially difficult in a distributed database system with no centralized log.
¹ Hibernate supports wasRolledBack(), which returns true if the transaction rolled back. However, this only accounts for application-initiated rollbacks, not system rollbacks.

In Section 4.2.2, I propose properties that make developing applications easier, especially with geo-replicated database systems.

4.2.2 The Dream: LAGA

As described previously, current transaction programming models like JDBC or Hibernate leave the user in the dark when a timeout occurs. Furthermore, they provide no additional support for writing responsive applications. In this section, I describe the four properties a transaction programming model should have for uncertain environments, referred to as the LAGA properties for Liveness, Assurance, Guesses, and Apologies:

Liveness The most important property is that the application guarantees liveness and does not have to wait arbitrarily long for a transaction to finish. This property is already fulfilled by means of a timeout with current transaction programming models.

Assurance If an application decides to move ahead without waiting for the final outcome of the transaction (e.g., after the timeout), the application should have the assurance that the transaction will never be lost and that the application will at some point be informed about the final outcome of the transaction.

Guesses The application should be able to make informed decisions based on incomplete information, before the transaction even completes to reduce perceived latency, as, for instance, suggested by Helland and Campbell [43]. For example, if a transaction is highly likely to commit or abort, an application may choose to advance instead of waiting for transaction completion.

Apologies If a mistake was made on an earlier guess, the application should be notified of the error and the true transaction outcome, as suggested by Helland and Campbell [43], so the application can apologize to the user and/or correct the mistake.

The LAGA properties describe a transaction programming model, and are orthogonal to the transaction guarantees by the underlying database system. For example, a database system can fully support the ACID properties (Atomicity, Consistency, Isolation, Durability), while the transaction programming model supports the LAGA properties.

4.2.3 The Future: PLANET Example

In contrast to the state-of-the-art transaction programming models, PLANET fulfills not only the Liveness property but all four properties. Listing 4.2 shows an example transaction using PLANET in the Scala programming language. The transaction implements a simple order purchasing action in an e-commerce website, such as Amazon.com. The code fragment outlines how the application can guarantee one of three responses to the user within 300ms, depending on the status of the transaction at the timeout: (1) an error message, (2) a “Thank you for your order” page, or (3) a successful order page. Furthermore, it guarantees an email and AJAX notification when the outcome of the transaction is known.

1  val t = new Tx(300ms) ({
2    INSERT INTO Orders VALUES ();
3    INSERT INTO OrderLines
4      VALUES (, , );
5    UPDATE Items SET Stock = Stock -
6      WHERE ItemId = ;
7  }).onFailure(txInfo => {
8    // Show error message
9  }).onAccept(txInfo => {
10   // Show page: Thank you for your order!
11 }).onComplete(90%)(txInfo => {
12   if (txInfo.state == COMMITTED ||
13       txInfo.state == SPEC_COMMITTED) {
14     // Show success page
15   } else {
16     // Show order not successful page
17   }
18 }).finallyCallback(txInfo => {
19   if (!txInfo.timedOut) {
20     // Update via AJAX
21   }
22 }).finallyCallbackRemote(txInfo => {
23   // Email user the completed status
24 })
Listing 4.2: Order purchasing transaction using PLANET

Here, I briefly explain the different mechanisms used in the example, before Section 4.3 describes the semantics of PLANET. With PLANET, transaction statements are embedded in a transaction object (lines 1–6). PLANET requires a timeout (line 1) to fulfill the Liveness property. After the timeout, the application regains control. PLANET exposes three transaction stages to the application: onFailure, onAccept, and onComplete. These three stages allow the developer to appropriately react to the outcome of the transaction given its state at the timeout. Whereas the code for onFailure (lines 7–8) is only invoked in the case of an error, and onComplete (lines 11–13) is invoked only if the transaction outcome is known before the timeout, onAccept exposes a stage between failure and completion, with the promise that the transaction will not be lost and that the application will eventually be informed of the final outcome. Therefore, the onAccept stage satisfies the Assurance property. Only one of the code fragments for onFailure, onAccept, or onComplete is executed within

the time frame of the timeout. In addition, onComplete can take a probability parameter that enables speculative execution with a developer-defined commit likelihood threshold (90% in the example) and thus fulfills the Guesses property. Finally, the callbacks finallyCallbackRemote and finallyCallback support the Apologies property by providing a mechanism for notifying the application about the final outcome of the transaction, regardless of when the timeout happened. Given the four properties, PLANET enables developers to write highly responsive applications in only a few lines of code. The next section describes the semantics of PLANET in more detail.

4.3 PLANET Simplified Transaction Programming Model

PLANET is a transaction programming model abstraction and can be used with different data models, query languages and consistency guarantees, similar to JDBC being used with SQL or XQuery, depending on database support. The key idea of PLANET is to allow developers to specify different stage blocks (callbacks) for the different stages of a transaction. This section describes the simplified transaction programming model of PLANET, which is essentially “syntactic sugar” for common stages and usage patterns. Section 4.4.1 describes the more general model, which provides the developer with full control and customization possibilities.

4.3.1 Timeouts & Transaction Stage Blocks

At its core, PLANET combines the idea of timeouts with the new concept of stage blocks. In PLANET, the timeout is always required, but can be set to infinity. Finding the right timeout is up to the developer and can be determined through user studies [73, 8]. Listing 4.2 shows an example with the timeout set to 300ms. The PLANET simplified transaction programming model also defines three stage blocks, corresponding to the internal stages of the transaction, that follow an ordered progression of onFailure, then onAccept, then onComplete (see Figure 4.2). When the timeout expires, the application regains the thread of control, and depending on the state of the transaction, only the code for the latest defined stage block is executed. That is, for any given transaction, only one of the three stage blocks is ever executed. In the following, I describe the stage blocks and their guarantees in more detail.

onFailure. PLANET tries to minimize the uncertainty when a timeout occurs, but cannot entirely prevent it. In fact, for distributed database systems, it can be shown by a reduction to the Two Generals' Problem that it is theoretically impossible to completely avoid uncertainty about the outcome. Therefore, PLANET requires the onFailure stage to be defined, which is similar to an exception code block. When nothing is known about the commit progress when the timeout expires, the onFailure code is executed. Reaching the onFailure stage does not necessarily imply that the transaction will abort; instead, the application may later be informed about a successfully committed transaction (see Section 4.3.3).

Figure 4.2: Client view of PLANET transactions. (The figure summarizes the stage progression and guarantees: onFailure guarantees nothing; onAccept means the commit process started and the transaction will not be lost; onComplete(P) means the commit likelihood is at least P, with 100% meaning committed. Within the timeout of X milliseconds, only the latest reached stage block is executed; upon completion, finallyCallback invokes a local callback at most once and finallyCallbackRemote invokes a web-service callback at least once.)

stage does not necessarily imply that the transaction will abort; instead, the application may later be informed about a successfully committed transaction (see Section 4.3.3).

onAccept. When the transaction will not be lost anymore and will complete at some point, the transaction is considered accepted. Typically, this is after the system started the commit process. How strong the not-be-lost guarantee is depends on the implementation. For example, in a distributed database system, it could mean that at least one database server successfully received and acknowledged the commit proposal message (with the assumption that all servers eventually recover all acknowledged messages). If the timeout expires, the onAccept stage is executed if the system is still attempting to commit the transaction, but the final outcome (i.e., abort or commit) is still unknown. If the later stage block onComplete is undefined, onAccept will be executed immediately after the transaction is accepted by the system, without waiting for the timeout. This feature is particularly useful for achieving very fast response times for transactions which will not abort from conflict (non-conflicting, append-only transactions). In contrast to the onFailure stage, if the onAccept stage is invoked, the system makes two important promises: (1) the transaction will eventually complete, and (2) the application will later be informed about the final outcome of the transaction (see Section 4.3.3). Therefore, the onAccept stage satisfies the Assurance property.

onComplete. As soon as the final outcome of the transaction is known and the timeout has not expired, onComplete is executed. If the timeout did expire, an earlier stage, either onFailure or onAccept, was already executed, so onComplete will be disregarded. An important point about the onAccept and onComplete stage blocks is that neither of them has to be defined. If only onAccept is defined, onAccept will be invoked as soon as the transaction is accepted, and the thread of control will return to the application. If only onComplete is defined, onComplete will be invoked when the transaction completes (or onFailure if it timed out). This allows developers to define flexible behavior for different

applications. Finally, all stage blocks are always invoked with a transaction state summary (txInfo in Listing 4.2). The summary contains information about the transaction, including the current state (UNKNOWN, REJECTED, ACCEPTED, COMMITTED, SPEC_COMMITTED, ABORTED), the time relative to the timeout (if the transaction timed out), and the updated commit likelihood (Section 4.3.2).
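To make the progression concrete, the following sketch (not PLANET's actual implementation; the type and helper names are hypothetical) shows which single stage block ends up being executed, given the farthest stage a transaction has reached by the timeout and which blocks the developer defined.

object StageSelection {
  sealed trait TxState
  case object Unknown   extends TxState // nothing is known about the commit progress
  case object Accepted  extends TxState // commit process started; transaction will not be lost
  case object Committed extends TxState
  case object Aborted   extends TxState

  // Returns the name of the single stage block that runs, following the
  // onFailure -> onAccept -> onComplete progression described above.
  def stageToRun(reached: TxState, hasOnAccept: Boolean, hasOnComplete: Boolean): String = {
    val completed = reached == Committed || reached == Aborted
    val accepted  = completed || reached == Accepted
    if (completed && hasOnComplete) "onComplete"
    else if (accepted && hasOnAccept) "onAccept"
    else "onFailure"
  }

  def main(args: Array[String]): Unit = {
    println(stageToRun(Accepted, hasOnAccept = true, hasOnComplete = true))   // onAccept
    println(stageToRun(Committed, hasOnAccept = true, hasOnComplete = false)) // onAccept
    println(stageToRun(Unknown, hasOnAccept = true, hasOnComplete = true))    // onFailure
  }
}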

4.3.2 Speculative Commits using onComplete

PLANET provides the Guesses property by allowing developers to write applications that advance without waiting for the outcome of a transaction, if the expected likelihood of a successful commit is above some threshold P. This ability for applications to advance before the final transaction outcome, based on the commit likelihood, is known as speculative commits. The developer enables speculative commits by using the optional parameter P of the onComplete stage block. For example, if the developer determines that a transaction should be considered as finished when the commit likelihood is at least 90%, then the developer would define the stage block as onComplete(90%) (shown in Listing 4.2). For a defined threshold P, PLANET will execute the onComplete stage block before the timeout, as soon as the commit likelihood of the transaction is greater than or equal to the threshold, which can greatly reduce transaction response times. Obviously, the commit likelihood computation depends on the properties of the underlying system, and Sections 4.5.1 and 4.5.2 describe the model and statistics required for MDCC, a geo-replicated database system using the Paxos protocol. For most database systems, PLANET will calculate the commit likelihood using local statistics at the beginning of the transaction. However, for some database systems, it is also possible to re-evaluate the likelihood as more information becomes available during the execution of the transaction. Possible examples are: discovering a record of a multi-record transaction that has completed, and receiving responses from previous RPCs. Of course, speculative commits are not suited for every application, and the commit likelihoods and thresholds vary significantly from application to application. However, I believe that speculative commits are a simpler programming construct for applications which can already cope with eventual consistency. I envision many additional applications which could significantly profit from PLANET's speculative commits. For example, a ticket reservation system could use speculative commits to allow very fast response times, without risking significantly overselling a high-demand event like the Google I/O conference (see also [49] and [67]). In order to guide users in picking the right threshold, user experience studies or automatic cost-based techniques [49] can be leveraged.

4.3.3 Finally Callbacks and Apologies

It is possible that the application will not know the transaction's final outcome when the timeout expires, either because of a speculative commit, because the onComplete stage block was

not defined, or because the transaction took longer than the timeout to finish. PLANET addresses this uncertainty with the code blocks finallyCallback and finallyCallbackRemote, which are special callbacks used to notify the application of the actual commit decision of the completed transaction. They are different from the other stage blocks because they are not restricted to execute within the timeout period, and they run whenever the transaction completes. The callbacks allow the developer to apologize and correct any incorrect behavior or state caused by speculative commits, errors, or timeouts, and therefore satisfy the Apologies property. In contrast to the finallyCallback code, which can contain arbitrary code, the code for finallyCallbackRemote can only contain web-service invocations (e.g., REST calls), which can be executed anywhere in the system without requiring the outer application context. For finallyCallback, the system guarantees at-most-once execution. For example, a developer might use finallyCallback to update the web page dynamically using AJAX about the success of a transaction after the timeout expires. However, finallyCallback might never be executed, in cases of application server failures. In contrast, finallyCallbackRemote ensures at-least-once execution, as the web-service invocation can happen from any service in the system, at the cost of reduced expressivity (i.e., only web-service invocations are allowed). The restrictions on finallyCallbackRemote allow for fault-tolerant callbacks. Listing 4.2 shows an example of defining both callbacks; finallyCallback updates the application using AJAX and finallyCallbackRemote sends the order confirmation email. Like the stage blocks, finallyCallback and finallyCallbackRemote also have access to the current transaction summary, txInfo. When speculative commits are used with a likelihood parameter P < 100%, some transactions may experience incorrect commits. This occurs when the transaction commit likelihood is high enough (at least P) and onComplete is invoked, but the transaction aborts at a later time. To apologize for incorrect commits, the application is notified of the final status through one of the finally callbacks, thus satisfying the Apologies property.
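As an illustration of the apology pattern, the sketch below decides what a finally callback might need to apologize for; the TxInfo fields are hypothetical stand-ins for the transaction summary, not PLANET's actual API.

object Apologies {
  sealed trait FinalOutcome
  case object Committed extends FinalOutcome
  case object Aborted   extends FinalOutcome

  // shownAsCommitted: the user already saw a success message (speculative commit)
  // shownAsFailed:    the user already saw an error (timeout or onFailure)
  final case class TxInfo(shownAsCommitted: Boolean, shownAsFailed: Boolean, outcome: FinalOutcome)

  // What, if anything, the finally callback should apologize for.
  def apology(info: TxInfo): Option[String] = info match {
    case TxInfo(true, _, Aborted)   => Some("retract the earlier success message; the transaction aborted")
    case TxInfo(_, true, Committed) => Some("inform the user that the request actually succeeded")
    case _                          => None // the outcome matches what the user was told
  }
}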

4.3.4 PLANET vs. Eventual Consistency

Utilizing the onAccept block, coupled with the finally callbacks, is a valuable alternative to Eventual Consistency (EC) models [33]. Eventually consistent systems are typically used for their fast response times and high availability guarantees, but come with a high cost: potential data inconsistencies. In contrast, PLANET can also offer fast response times and high availability without sacrificing data consistency. PLANET does not change the transaction semantics of the underlying database system; even with speculative commits the data would never be rendered inconsistent if the back-end is strongly consistent. It only allows clients to deliberately proceed (using onAccept or speculative commits), even though the final decision (commit/abort) is not yet known. Furthermore, the onAccept stage guarantee (transactions will not be lost) can be implemented in a highly available fashion. Similar to eventual consistency, the application can still make inconsistent decisions, like informing a user about a successful order even though items are

sold out. But in contrast to eventual consistency, PLANET does not allow the database to expose an inconsistent state to other subsequent transactions or concurrent clients, and thus prevents dependencies based on inconsistent data, which are particularly hard to detect and correct. This property makes it easier to write applications than with the eventually consistent transaction model: it isolates mistakes (e.g., a commit followed by an abort) to a single client/transaction, allowing the developer to better foresee the implications.

Figure 4.3: PLANET transaction state diagram. (The diagram shows the internal states, including Transaction Started, Waiting for Timeout, Commit Started, Accepted, Waiting for Completion, Speculatively Committed, Completed, and Finally Completed; the shaded states run the user-defined stage blocks and the finally callbacks, and the transitions are guarded by which stage blocks are defined, the timeout, and the computed commit likelihood.)

4.3.5 Life of a PLANET Transaction

Figure 4.3 formalizes the transaction programming model using a state diagram. State transitions are represented by edges and use the notation "event[guard]/action", which means guard must evaluate to true for the event to trigger the transition and the action. The dark perpendicular lines represent a fork in the state diagram, to capture parallel execution. The shaded states are the ones which execute user-defined stage blocks, and ∃stageName and ¬∃stageName refer to whether or not the particular stage block was defined in the transaction code. This diagram shows that the commit likelihoods are computed every time a new message is received, and that timeouts fork execution so that the transaction continues

val t = new Tx(30000ms) ({
  UPDATE Accounts SET Balance = Balance - 100 WHERE AccountId = ;
}).onFailure(txInfo => {
  // Show error message
}).onComplete(txInfo => {
  if (txInfo.success) {
    // Show money transfer
  } else {
    // Show failure page
  }
}).finallyCallbackRemote(txInfo => {
  if (txInfo.success && txInfo.timedOut) {
    // Inform service personnel
  }
})

Listing 4.3: ATM example using PLANET

to execute while the application regains control. As an example of a speculative commit, in the Accepted state, when the onComplete stage block is defined and the likelihood is at least P, the transition forks so that one branch runs onComplete and returns to the application, while the other waits in the Speculatively Committed state for completion. Even though this state diagram may have many states, the client sees a far simpler view, without all the internal transitions, which can be explained simply using the state progression flow shown in Figure 4.2.

4.3.6 Usage Scenarios

PLANET is very flexible and can express many kinds of transactions. There are ad hoc solutions and systems for every situation, but PLANET is expressive enough to encapsulate these use cases in a single model for the developer, regardless of the underlying implementation. A document containing various use cases can be found in [67]. In addition to the e-commerce website motivating example in Listing 4.2, I describe two other use cases.

ATM Banking. Code Listing 4.3 shows an example of an ATM transaction. The structure is very similar to a standard Hibernate transaction, where only failures and commits are reported. This is because correctness is critical for money transfers, so waiting for the final outcome is the most appropriate behavior. Therefore, there is no onAccept stage block. When the timeout expires and the transaction is not completed, it is treated as a failure. However, if the transaction times out and later commits, the user saw a failure message but the transaction eventually committed successfully. The finallyCallbackRemote handles this problematic situation.

val t = new Tx(200ms) ({
  INSERT INTO Tweets VALUES (, );
}).onFailure(txInfo => {
  // Show error message
}).onAccept(txInfo => {
  // Show tweet accept
})

Listing 4.4: Twitter example using PLANET

Twitter. Code Listing 4.4 shows an example of a transaction for sending an update to a micro-blogging service, like Twitter. In contrast to the ATM example, these small updates are less critical and are not required to be immediately globally visible. Also, the developer knows that there will never be any transaction conflicts, since every transaction is essentially a record append, and not an update. Therefore, the transaction only defines the onFailure and onAccept code blocks, which means the developer is not concerned with the success or failure of the commit. This type of transaction easily provides the response times of eventually consistent systems, while never allowing the data to become inconsistent.

4.4 Advanced PLANET Features

This section first describes the generalized version of PLANET, which gives developers even more freedom, and then the PLANET admission control feature.

4.4.1 Generalized Transaction Programming Model

While the simplified model in Section 4.3 supports many of the common cases, there may be situations when the developer wants more fine-grained control. This section describes the fully generalized transaction programming model, which subsumes the simplified model of Section 4.3 but provides more control for the developer. The generalized model has only one stage block, onProgress, and the two finally callbacks, finallyCallback and finallyCallbackRemote. The stage blocks of the simplified model in Section 4.3 are actually just "syntactic sugar" for common cases of the generalized model. Listing 4.5 is equivalent to the simplified Listing 4.2, but uses the generalized onProgress block instead.

4.4.1.1 onProgress

The onProgress stage block is useful for the application to get updates or notifications on the progress of the executing transaction. Whenever the transaction state changes, the onProgress block is called with the transaction summary, containing information about the transaction status and other additional information about the transaction (e.g., the commit likelihoods).

val t = new Tx(300ms) ({
  // Transaction operations
}).onProgress(txInfo => {
  if (txInfo.timedOut) {
    if (txInfo.state == ACCEPTED) {
      // onAccept code
    } else {
      // onFailure code
    }
    FINISH_TX // finish the transaction.
  } else { // not timedOut
    if (txInfo.commitLikelihood > 0.90) {
      // onComplete(90%) code
      FINISH_TX // finish the transaction.
    }
  }
}).finallyCallback(txInfo => {
  // Callback: Update status via AJAX
}).finallyCallbackRemote(txInfo => {
  // Callback: Update status via email
})

Listing 4.5: PLANET general transaction programming model. Equivalent to Listing 4.2

This means the stage may be called multiple times during execution. By getting notifications on the transaction status, the application can make many informed decisions on how to proceed. A special feature of the code defined in the onProgress block is that the developer can return a special code, FINISH_TX, to signal to the transaction handler that the application wants to stop waiting and wishes to move on. If the code returns FINISH_TX, the application will get back the thread of control and will be notified of the outcome with a finally callback. If the application does not want the thread of control, the transaction handler will continue to wait for a later progress update.

4.4.1.2 User-Defined Commits

Using the generalized PLANET transaction programming model and the exposed transaction state, the developer has full control and flexibility over how transactions behave. For example, onProgress allows the application to inform the user about the progress of a buying transaction: a website could first show "trying to contact the back-end", then move on to "booking received", until it shows "order successfully completed". This is not possible in the simplified model.

The generalized model is a very powerful construct; ultimately, it allows developers to redefine what a commit means. For example, for one transaction, a developer can choose to emulate asynchronously replicated systems by defining the commit to occur when the local data center storage nodes have received the updates (assuming that the transaction summary contains the necessary information), while choosing to wait for the final outcome for another transaction, as sketched below.
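As a sketch of such a user-defined commit (in the style of Listings 4.2–4.5), the transaction below returns control as soon as the summary indicates the local data center has acknowledged the updates; the txInfo.localDataCenterAcked field is hypothetical, since the available summary information depends on the underlying system.

val t = new Tx(500ms) ({
  // Transaction operations
}).onProgress(txInfo => {
  if (txInfo.localDataCenterAcked) { // hypothetical field: local storage nodes acknowledged the updates
    // Treat the transaction as committed for this application
    FINISH_TX
  }
}).finallyCallbackRemote(txInfo => {
  // Apologize (e.g., by email) if the transaction later aborted
})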

4.4.2 Admission Control

With commit likelihoods, it becomes very natural to also use these likelihoods for admission control. PLANET implements admission control with the goal of improving performance by preventing wasted resources and thrashing. If the system computes a transaction commit likelihood which is too low, there is a high chance the transaction will abort. If that is the case, it may be better not to execute the transaction at all, to avoid wasting system resources such as CPU cycles, disk I/O, or extraneous RPCs. In addition to improving resource allocation, not attempting transactions with low likelihoods reduces contention on the involved records, which can improve the chances of other transactions committing on those records under some consistency protocols (e.g., MDCC [50]). Currently, PLANET supports two policies for admission control, Fixed and Dynamic, described below.

Fixed(threshold, attempt_rate) Whenever the transaction commit likelihood is less than the threshold, the transaction is attempted with probability attempt_rate. For example, Fixed(40,20) means that when the commit likelihood is less than 40%, the transaction will be attempted 20% of the time. If attempt_rate is 100%, the policy is equivalent to using no admission control.

Dynamic(threshold) The Dynamic policy is similar to the Fixed policy, except that the attempt rate is not fixed but tied to the commit likelihood. Whenever the likelihood, L, is lower than the threshold, the transaction is attempted with probability L. For example, a Dynamic(50) policy means that all transactions with a likelihood L less than 50% will be attempted with probability L. If the threshold is 0, the policy is equivalent to using no admission control. Section 4.6.7 further investigates these parameters.
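The sketch below is a minimal illustration (not the PLANET implementation) of how the two policies decide whether to attempt a transaction, given a commit likelihood expressed between 0 and 1.

import scala.util.Random

object AdmissionControl {
  sealed trait Policy
  final case class Fixed(threshold: Double, attemptRate: Double) extends Policy
  final case class Dynamic(threshold: Double) extends Policy

  // Decide whether to attempt a transaction with the given commit likelihood (0.0 to 1.0).
  def admit(policy: Policy, likelihood: Double, rng: Random = new Random): Boolean =
    policy match {
      case Fixed(t, a) => likelihood >= t || rng.nextDouble() < a          // below threshold: fixed attempt rate
      case Dynamic(t)  => likelihood >= t || rng.nextDouble() < likelihood // below threshold: attempt with probability L
    }

  def main(args: Array[String]): Unit = {
    println(admit(Fixed(0.40, 0.20), likelihood = 0.25)) // Fixed(40,20): attempted ~20% of the time
    println(admit(Dynamic(0.50), likelihood = 0.30))     // Dynamic(50): attempted ~30% of the time
  }
}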

When a transaction is rejected by the system, PLANET does not actively retry the transaction. However, with PLANET, the developer may choose to retry rejected transactions (the transaction summary contains the necessary information). This may lead to starvation, but the developer can define how retries are done, and implement retries with exponential backoff to mitigate starvation, as sketched below. The PLANET admission control technique using commit likelihoods does not preclude using other methods such as intermittent probing [42]. In fact, the admission control of PLANET can augment existing techniques by using record access rates to improve the granularity of information, and by using commit likelihoods to identify types of transactions and access patterns.
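Here is a sketch of such developer-side retries with exponential backoff (not a PLANET feature; the attempt function is a hypothetical placeholder for re-submitting the transaction).

object RetryWithBackoff {
  // Retries `attempt` until it succeeds or maxRetries is exhausted, doubling the wait each time.
  def retry(attempt: () => Boolean, maxRetries: Int, baseDelayMs: Long = 50): Boolean = {
    var delayMs = baseDelayMs
    (0 to maxRetries).exists { _ =>
      val ok = attempt()
      if (!ok) {
        Thread.sleep(delayMs) // back off before the next attempt
        delayMs *= 2          // exponential backoff reduces contention and mitigates starvation
      }
      ok
    }
  }
}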

4.5 Geo-Replication

The PLANET transaction programming model can be implemented on any transactional database system as long as the required statistics and transaction states are exposed. It is even possible to use PLANET for non-transactional key/value stores, where the commit likelihood would represent the likelihood of an update succeeding without lost updates (see also Section 4.5.1.3). However, the benefits of PLANET are most pronounced with geo-replicated, strongly consistent, transactional database systems such as Megastore [13], Spanner [31] or MDCC [50, 46]. Figure 4.1 shows the long and unpredictable delays between data centers (on Amazon EC2) these systems have to deal with. In the following, I show how PLANET can be implemented on an existing geo-replicated database system, MDCC, for which two implementations are available [50, 46]. However, the results from this study are transferable to other systems, like Megastore or COPS [56], by adjusting the likelihood models.

4.5.1 Conflict Estimation for MDCC

MDCC is a distributed, geo-replicated transactional database system designed along the lines of Megastore [13]. A typical MDCC database deployment is distributed across several storage nodes, and the nodes are fully replicated across multiple data centers. Besides small changes to the underlying MDCC (see Section 4.5.2), most of PLANET is implemented in the client-side library. In MDCC, every record has a master that replicates updates to remote data centers using the Paxos consensus protocol [53], similar to Megastore's Paxos implementation. However, MDCC avoids Megastore's limitation of being able to execute only one transaction at a time per partition and has significantly higher throughput by using a finer-grained execution strategy [50, 46]. Furthermore, MDCC supports various read modes (snapshot isolation, read-committed) and proposes several optimizations, including a fast protocol that reduces the latency of commits at the cost of additional messages in the case of conflicts. For the remainder of this section, many of the optimizations are ignored and I focus on the default setting (read-committed) of MDCC, without the fast protocol. Only the MDCC classic protocol is modeled, as this configuration is more similar to other well-known systems like Megastore.

4.5.1.1 The MDCC Classic Protocol

In its basic configuration, the MDCC protocol provides read-committed isolation and ensures that all write-write conflicts are detected, similar to snapshot isolation, but only provides atomic durability (i.e., all or none of the updates are applied) and not atomic visibility (i.e., consistent reads). This read guarantee has proven to be very useful, because read-committed is still the default isolation level in most commercial and open-source database systems (e.g., MS SQL Server, PostgreSQL, Oracle).

Figure 4.4: Sequence diagram for the MDCC classic protocol. (The diagram shows the client c_p, the record leader l_r, and five storage nodes s1_r ... s5_r, together with the message delays for the propose, phase2a/phase2b (3-out-of-5 quorum), learned, commit visibility, and read messages that make up the steps described below.)

In the MDCC classic protocol, the transaction manager (typically, the client) acquires an option per record update in a transaction using Multi-Paxos. The option is learned either as accepted or rejected using Multi-Paxos from a majority of storage nodes (note that even rejecting an option requires learning the option). A learned (but not yet visible) option prevents other updates on the same record from succeeding (i.e., it does the write-write conflict detection) and can best be compared to the first phase of two-phase commit (2PC), as it prepares the nodes to commit. However, in contrast to 2PC, the transaction is immediately committed if all the options have successfully been learned using Paxos. If all options are learned as accepted, the transaction manager has no choice and must commit the transaction. Similarly, it has to abort the transaction if one of the options is learned as aborted. In this chapter, the MDCC optimization of broadcasting all messages to avoid a second phase is not considered. The sequence diagram of the protocol is shown in Figure 4.4. As a first step, the client

(assumed to be c_p) proposes the option to any record leader l_r (i.e., the master responsible for learning the option) involved in the transaction, shown as step (1). Afterwards, the leader executes the Paxos round by sending a Paxos phase2a message to all storage nodes and waits for a majority of phase2b answers, visualized as step (2) for five storage nodes s1_r ... s5_r in Figure 4.4. If the option is learned by a majority, the leader sends a learned message with the success value to the transaction manager on the client c_p, shown as step (3). The transaction manager now has to wait for all learned messages, one per update in the transaction, indicated as step (4).² If all options are learned successfully, the transaction is committed and the client is allowed to move on. However, the updates are not yet visible to other clients. To make the updates visible, the transaction manager sends a commit visibility message to all involved storage nodes, shown as step (5). Note that the transaction manager/client c_p, the record leader l_r, and the storage nodes s1_r ... s5_r may all be in different data centers. If the client, leader, and at least one storage node are co-located in the same data center, the commit will only take a single round-trip between remote data centers (local round-trips are less significant).
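To summarize the flow, here is a sequential pseudocode sketch of steps (1) through (5); the message helpers are hypothetical placeholders, and the real protocol runs these exchanges asynchronously.

object MdccClassicCommit {
  final case class RecordOption(record: String, update: String)

  // Hypothetical, simplified messaging stubs for illustration only.
  def propose(leader: String, opt: RecordOption): Unit = ()          // (1) client -> record leader
  def paxosPhase2(leader: String, opt: RecordOption): Boolean = true // (2) leader learns the option from a majority
  def sendLearned(client: String, accepted: Boolean): Unit = ()      // (3) leader -> transaction manager
  def sendCommitVisibility(storageNodes: Seq[String]): Unit = ()     // (5) make committed updates visible

  def commit(updates: Seq[RecordOption], leaderOf: Map[String, String], nodes: Seq[String]): Boolean = {
    val learned = updates.map { opt =>
      val leader = leaderOf(opt.record)
      propose(leader, opt)                         // (1)
      val accepted = paxosPhase2(leader, opt)      // (2) majority of phase2b answers
      sendLearned("transaction-manager", accepted) // (3)
      accepted
    }
    val allAccepted = learned.forall(identity)     // (4) wait for one learned message per update
    if (allAccepted) sendCommitVisibility(nodes)   // (5) commit visibility message
    allAccepted                                    // committed only if every option was learned as accepted
  }
}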

4.5.1.2 Commit Likelihood Model for MDCC

Next, I show how the commit likelihood for the described protocol is modeled. The key idea is estimating the time it takes to propagate the updates of a preceding transaction, so that the current transaction does not conflict with it. Given this duration, the likelihood of another transaction interfering can be calculated by considering the update rate per record. Section 4.5.2 describes how the system is able to collect the necessary statistics and precompute many of the calculations. In the remainder of this section, the following symbols are used.³

• $\{a, b, c, l\} \in \{1 \ldots N\}$, where 1 to $N$ are the data centers

• $C \in \{1 \ldots N\}$, stochastic variable of the data center of the client, with $c$ being an instance of the variable

• $L \in \{1 \ldots N\}$, stochastic variable of the data center of the master (i.e., leader), with $l$ being an instance of the variable

• $M^{a \to b} \in \mathbb{R}$, stochastic variable corresponding to the delay (message and processing time) of sending a message from data center $a$ to data center $b$

• $R \in \mathbb{N}$, stochastic variable corresponding to the number of records inside a transaction

• $X(t) \in \mathbb{N}$, stochastic variable corresponding to the number of expected updates for a given record and a time interval $t$

²Actually, if messages come from the same storage nodes, they can be batched together. However, this optimization is not modeled.

³A stochastic variable is described in the short form $X \in \mathbb{R}$ instead of defining the function $X : \Omega \to \mathbb{R}$.

• $W \in \mathbb{R}$, the processing time after reading the value before starting the commit

• $\Theta \in \{\text{commit}, \text{abort}\}$, stochastic variable corresponding to the commit or abort of a transaction

To simplify the model, it is assumed that transactions are issued independently, and the records that are part of a transaction are also chosen independently. The protocol defines that a transaction is committed if all options are successfully learned. Therefore, the likelihood of a commit can be estimated by calculating the likelihood of successfully learning every option. In turn, learning an option can only be successful if no concurrent option is still pending (i.e., not committed). The first goal is therefore to derive a stochastic variable describing the required time for a previous transaction to commit and become visible. A transaction for a record r is in conflict only from the moment it requested the option at the leader until it becomes visible, shown as steps (2) to (5). The leader executes the Paxos round by sending a phase2a message to all storage nodes and waiting for their phase2b responses. The message delay is modeled as a stochastic variable $M^{l,b}$, which simply adds the two stochastic variables for sending and receiving the phase2a and phase2b messages from the leader's data center $l$ to some other data center $b$:

\[
  M^{l,b} = M^{l \to b}_{\text{phase2a}} + M^{b \to l}_{\text{phase2b}} \tag{4.1}
\]

The combined distribution for the round-trip requires convoluting the distributions for phase2a and phase2b. However, the leader only needs to wait for a majority $q$ out of $N$ responses. Assuming that the leader sends the learning request to all $N$ data centers, waiting for a quorum of answers corresponds to waiting for the maximum delay over all possible combinations of picking $q$ out of $N$ responses, and can be expressed as:

\[
  Q^{l} = \max\left\{ x_1 M^{l,1}, \ldots, x_N M^{l,N} \;\middle|\; x_i \in \{0,1\};\; \sum_{i=1}^{N} x_i = q \right\} \tag{4.2}
\]

Deriving the distribution for $Q^l$ requires integrating over the maximum of all possible combinations of $M^{l,b}$. After the option has been successfully learned, the message delay to notify the transaction manager $c_p$ can be modeled by adding the stochastic variable $M^{l \to c_p}_{\text{learned}}$ describing the delay to send a learned message:

\[
  Q^{l,c_p} = Q^{l} + M^{l \to c_p}_{\text{learned}} \tag{4.3}
\]

Unfortunately, even though the duration of time to learn an option for a single record is reflected, the transaction is only committed if all the options of the transaction are successfully acquired. This entails waiting for the learned message from all involved leaders $(l_1, \ldots, l_r)$, with $r$ being the number of updates. Furthermore, after the transaction manager received all learned messages, the update only becomes visible if the commit visibility message arrives at least at the data center of the current transaction $c_c$ before a local read is done for the record (assuming only local reads). Given the locations of the leaders $(l_1, \ldots, l_r)$

and the previous client's data center $c_p$, the delay is modeled by taking the maximum over all $Q^{l,c_p}$, one for each leader, and adding the commit message time:

\[
  U^{c_c}\bigl(c_p, (l_1, \ldots, l_r)\bigr) = \max\left\{ Q^{l_1,c_p}, \ldots, Q^{l_r,c_p} \right\} + M^{c_p \to c_c}_{\text{commit}} \tag{4.4}
\]

Equation 4.4 represents the stochastic variable describing the time to make an update visible for any given record. However, once the update is visible, the current transaction still needs to read it and send its option to the leader. Therefore, the transaction processing time $w$ is added, along with the stochastic variable for sending the propose message to the leader $l$.

\[
  \Phi^{c_c,l}\bigl(c_p, (l_1, \ldots, l_r)\bigr) = U^{c_c}\bigl(c_p, (l_1, \ldots, l_r)\bigr) + w + M^{c_c \to l}_{\text{propose}} \tag{4.5}
\]

Note that $w$ is not a stochastic variable. Instead, $w$ is the measured time from requesting the read, through receiving the response, until committing the transaction. This allows factoring $w$ out of Equation 4.5 and considering it only in the next step. Given the location $c_p$ of the transaction manager of the previous transaction and all the involved leaders $(l_1, \ldots, l_r)$, $\Phi^{c_c,l}$ describes the time in which no other update should arrive to allow the current transaction to succeed. Unfortunately, these values are unknown. Therefore, all possible instantiations of $c_p$ and $(l_1, \ldots, l_r)$ need to be considered. By assuming independence between the inputs, the likelihood to finish within time $t$ is described as:

\[
  P^{c_c,l}(t) = \sum_{\tau \in \mathbb{N}} \; \sum_{l_1 \ldots l_\tau,\, c_p \,\in\, 1 \ldots N} P(R = \tau)\, P(L_1 = l_1) \cdots P(L_\tau = l_\tau)\, P(C_p = c_p)\, P\Bigl(\Phi^{c_c,l}\bigl(c_p, (l_1, \ldots, l_\tau)\bigr) = t\Bigr) \tag{4.6}
\]

For a single-record transaction, the likelihood of committing the transaction is equal to the likelihood of successfully learning the option. Given the likelihood $P(X(t) = 0)$ of having zero other updates in the time interval $t$, the likelihood of committing the current transaction can be expressed by multiplying the likelihood of finishing within time $t$ and the likelihood of having no updates within $t$, over all possible values of $t$:

\[
  P^{c_c,l}(\Theta = \text{commit}) = \int_0^{\infty} P(X(\gamma) = 0)\, P^{c_c,l}(\gamma)\; d\gamma \tag{4.7}
\]

Since $w$ is a constant, all stochastic variables up to this point can be considered independent of the current transaction, by factoring out $w$ from $\Phi$ and considering it as part of the time:

\[
  \Phi_W^{c_c,l}\bigl(c_p, (l_1, \ldots, l_r)\bigr) = U^{c_c}\bigl(c_p, (l_1, \ldots, l_r)\bigr) + M^{c_c \to l}_{\text{propose}} \tag{4.8a}
\]
\[
  P^{c_c,l}(\Theta = \text{commit}) = \int_w^{\infty} P(X(\gamma) = 0)\, P^{c_c,l}(\gamma - w)\; d\gamma \tag{4.8b}
\]
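For instance, under the Poisson arrival assumption used in Section 4.5.2.3 (update arrivals with rate $\lambda$ for the record), $P(X(\gamma) = 0) = e^{-\lambda\gamma}$, and Equation 4.8b becomes the following explicit integral; in the implementation, this integral is evaluated over histogram buckets rather than in closed form.

\[
  P^{c_c,l}(\Theta = \text{commit}) = \int_w^{\infty} e^{-\lambda\gamma}\, P^{c_c,l}(\gamma - w)\; d\gamma
\]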

Finally, in order to generalize the likelihood of committing a single-record transaction to a transaction with multiple records, the likelihood that all updates are acquired successfully needs to be calculated. Assuming independence between records, this can simply be done by multiplying the likelihoods of success for each of the $\varphi$ records inside the transaction:

\[
  P^{c_c,(l_1 \ldots l_\varphi)}(\Gamma = \text{commit}) = \prod_{i=1}^{\varphi} P^{c_c,l_i}(\Theta = \text{commit}) \tag{4.9}
\]

Note that $P^{c_c,(l_1 \ldots l_\varphi)}$ assumes that the current data center $c_c$ as well as all involved leaders $(l_1 \ldots l_\varphi)$ are already known. However, in contrast to the previous transaction, this information is accessible for the current transaction because the involved records are known. Even though it may look expensive to derive all the distributions for the various message delays and to do the actual computation, it turned out to be straightforward. This is mainly because most of the measured statistics and convolution computations are independent of the current transaction and can be done offline instead of online. Furthermore, some of the distributions can be simplified, as the variance does not play an important role. I describe the implementation of the model and the simplifications in Section 4.5.2.

4.5.1.3 Other Protocol Models

Even though I only showed the model for the classic protocol of [50], it is possible to model the conflict rate for other systems as well. For example, a model similar to PBS [11] can be used to estimate the likelihood of losing updates in a typical eventually consistent quorum protocol, as used by distributed key/value stores such as Cassandra or Dynamo [33]. Furthermore, the model can be restricted to be more Megastore-like by assuming updates per partition instead of per record. Finally, the model could be adapted slightly to model more classical two-phase commit implementations by introducing extra wait delays.

4.5.2 System Statistics and Computations

I only had to make small changes to my existing implementation of MDCC in order to support the PLANET transaction programming model. Most of the changes to the system collect statistics on the characteristics of transactions. By gathering statistics on various attributes of the deployed system, the measurements can be used to calculate useful estimations, such as the estimated duration or commit likelihood, using the model in Section 4.5.1. The model in Section 4.5.1 defines various required statistics. However, most of the statistics can be collected on a system-wide level and be approximated. Specifically, the distribution of the stochastic variable of Equation 4.8a can be entirely precomputed for all possible master/client configurations. Afterwards, given the number of records inside the current transaction, their leader locations, and the update arrival rate per record, the computation of the final probability reduces to a look-up to find the distribution of the stochastic variable from data center $c_c$ to master $l$ and integrating over the time, as shown in Equations 4.8b and 4.9. Furthermore, in practice, the integration itself is simplified by using histograms for the statistics. In this section, I describe the required statistics.

4.5.2.1 Message Latencies

Instead of individual per-message-type statistics, I simplified the model and assumed that the message delays are similar for all message types. Therefore, the round-trip latencies between data centers are measured by sending a simple RPC message to storage nodes in all the data centers. The clients keep track of histograms of latencies for every data center. In order to disseminate this information to other clients in the system, the clients send their histograms in the RPC message to the storage nodes. The storage nodes aggregate the data from the different clients and data centers and send the information back with the response to the clients. Furthermore, I implemented a window-based histogram approach [49] and aged out old round-trip values, in order to better approximate the current network conditions.
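A minimal sketch of such a window-based latency histogram follows; the bucket width and window length are illustrative choices, not the values used in the implementation.

import scala.collection.mutable

object LatencyHistogram {
  // Keeps only round-trip samples observed within the last `windowMs` milliseconds,
  // so old measurements age out and the histogram tracks current network conditions.
  final class Windowed(windowMs: Long, bucketWidthMs: Int = 10) {
    private val samples = mutable.Queue.empty[(Long, Int)] // (timestampMs, rttMs)

    def record(nowMs: Long, rttMs: Int): Unit = {
      samples.enqueue((nowMs, rttMs))
      while (samples.nonEmpty && nowMs - samples.head._1 > windowMs) samples.dequeue()
    }

    // Histogram of the current samples, keyed by the lower bound of each bucket.
    def histogram: Map[Int, Int] =
      samples.toSeq
        .groupBy { case (_, rtt) => (rtt / bucketWidthMs) * bucketWidthMs }
        .map { case (bucket, xs) => bucket -> xs.size }
  }
}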

4.5.2.2 Transaction Sizes

The distribution of transaction sizes is collected in a similar way to the data center round-trip latencies described in the previous section. Whenever a transaction starts, the application server stores the size in a local histogram, and occasionally distributes the histograms by sending the data to some storage nodes throughout the distributed database system. The distribution of transaction sizes is useful for estimating transaction durations and is used in the conflict estimation models.

4.5.2.3 Record Access Rates

The likelihood Equation 4.8b needs the likelihood of zero conflicting updates. As a simplification, I assume the update-arrival rate follows a Poisson process, so it is sufficient to compute the average arrival rate as the λ parameter. The update-arrival rate for individual records is measured on the storage nodes. On the servers, the number of accesses to a particular record is counted with bucket granularity, to reduce the size of the collected data. Also, only the most recent buckets are stored for each record to reduce the storage overhead. In my implementation, the configured size of a bucket is 10 seconds, and only the 6 latest buckets are maintained, aggregated using the arithmetic mean. Using these buckets of accesses, the arrival rate described in Section 4.5.1 can be calculated without significant space overhead ([49] describes the required overhead in more detail).
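A small sketch of this estimate is shown below; the bucket layout mirrors the description above (10-second buckets), while the function names are hypothetical.

object AccessRate {
  val bucketSeconds = 10.0 // configured bucket size; only the most recent buckets are averaged

  // Average update-arrival rate lambda (updates per second) from recent bucket counts.
  def lambda(recentBucketCounts: Seq[Int]): Double =
    if (recentBucketCounts.isEmpty) 0.0
    else recentBucketCounts.sum.toDouble / (recentBucketCounts.size * bucketSeconds)

  // P(X(t) = 0): probability of zero conflicting updates within t seconds (Poisson assumption).
  def pNoUpdate(lambda: Double, tSeconds: Double): Double = math.exp(-lambda * tSeconds)

  def main(args: Array[String]): Unit = {
    val l = lambda(Seq(2, 0, 1, 3, 1, 2)) // 9 updates over the last 60 seconds -> 0.15 updates/s
    println(pNoUpdate(l, tSeconds = 0.3)) // chance of no conflicting update within 300 ms
  }
}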

4.5.2.4 Computations

Given the message latencies and the transaction size statistics, it is possible to convolute all these statistics according to Equation 4.8a. The result is an N × N matrix with one entry

per data-center pair. This matrix is very compact and can be stored on every storage node and client, allowing a very efficient likelihood computation. Whenever a transaction is started, it uses the current statistics and precomputed values to compute the commit likelihood of the transaction, using Equation 4.9. The computation cost is negligible and adds virtually no overhead compared to the overall latencies between data centers. Also, whenever a new message or response is received, the client re-computes the likelihood with the new information. This allows the commit likelihood to become more accurate as more information is revealed during the commit process.
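The sketch below illustrates how this final computation might look: for each record, look up the precomputed delay distribution (as a histogram) for its (client data center, leader data center) pair, evaluate Equation 4.8b against the Poisson no-conflict probability, and multiply the per-record likelihoods per Equation 4.9. The data structures are illustrative, not those of the actual implementation.

object CommitLikelihood {
  // One histogram bucket of the precomputed delay distribution (Equation 4.8a).
  final case class Bucket(delaySeconds: Double, probability: Double)

  // Equation 4.8b, evaluated over histogram buckets with Poisson arrival rate lambda.
  def perRecord(delayHistogram: Seq[Bucket], lambda: Double, wSeconds: Double): Double =
    delayHistogram.map(b => b.probability * math.exp(-lambda * (b.delaySeconds + wSeconds))).sum

  // Equation 4.9: multiply the per-record likelihoods of the transaction.
  def transaction(precomputed: Map[(Int, Int), Seq[Bucket]],
                  records: Seq[(Int, Int, Double)], // (clientDataCenter, leaderDataCenter, lambda)
                  wSeconds: Double): Double =
    records.map { case (cc, l, lambda) => perRecord(precomputed((cc, l)), lambda, wSeconds) }.product
}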

4.6 Evaluation

I evaluated PLANET on top of MDCC, which was deployed across five different data centers on Amazon EC2. My evaluation shows that (1) PLANET significantly reduces the uncertainty of transactions with timeouts, (2) speculative commits and admission control notably improve the overall throughput and latency, (3) the prediction model for MDCC is accurate enough for various conflict rates, and (4) the dynamic admission control policy generally provides the best throughput for a variety of configurations.

4.6.1 Experimental Setup

I implemented PLANET on top of my implementation of MDCC [50], and modified it according to Section 4.5.2. I deployed the system in five different data centers of Amazon EC2: US West (Northern California), US East (Virginia), EU (Ireland), Asia Pacific (Tokyo), and Asia Pacific (Singapore). Each data center has a full replica of the database (five times replicated), partitioned across two m1.large servers per data center. Clients issuing transactions are evenly distributed across all five data centers, and run on m1.large servers separate from the data storage nodes. Clients behave in the open system model, so they issue transactions at a fixed rate in order to achieve a global target throughput. For all the experimental runs, clients ran for 3 minutes after a 2 minute warm-up period, and recorded throughput, response times, and statistics. I further configured PLANET to consider a transaction as accepted as soon as the first storage node confirmed the transaction proposal message.

4.6.2 TPC-W-like Buy Transactions

I used a TPC-W-like benchmark for all of the experiments. TPC-W is a transactional benchmark which simulates clients interacting with an e-commerce website. TPC-W defines several read and write transactions, but for my purposes, I only test the TPC-W order buying transaction. Many TPC-W transactions focus on reads, which are orthogonal to the transaction programming model. The buy transaction randomly chooses 1–4 items and purchases them by decrementing the stock levels (similar to the code shown in Listing 4.2).

To focus on database write transactions, credit card checks are ignored, and only a single Items table with the same attributes as defined in the TPC-W benchmark is used.

4.6.3 Reducing Uncertainty With PLANET

Using PLANET's onAccept stage block can reduce the amount of uncertainty that applications may experience. To evaluate the effectiveness, I used the buy transaction with varying timeouts from 0ms to 1500ms. The clients used a uniformly random access pattern, and the fixed client rate was set to 200 TPS, for moderate contention. The Items table had 20,000 items, and both speculative commits and admission control were disabled.

Figure 4.5: Transaction outcomes, varying the timeout (20,000 items, 200 TPS). (Two stacked plots, one for the traditional model and one for PLANET, show the percentage of transactions ending as commits, aborts, unknown, accept-commits, and accept-aborts, as the timeout parameter varies from 0 to 1500 ms.)

Figure 4.5 shows the breakdown of transaction outcomes for the different timeout values, for PLANET and a traditional JDBC model with normal timeouts.⁴ The solid areas of the graph show the percentage of transactions of which the outcomes are known when the timeout expires. All other portions of the graph (striped, crosshatched) represent transactions which have not finished before the timeout. For the traditional model, there can be a large percentage of transactions with an unknown state when the timeout expires (blue crosshatched area in the top graph). So, for a given timeout value, a larger crosshatched area means the application is more frequently in the dark and more users will be presented with an error.

⁴The transaction latencies are not a property of PLANET itself but of the underlying database system, MDCC. Also, all presented latencies in this chapter are not directly comparable to the latencies shown in Chapter 3, as it focuses on MDCC optimizations such as fast Paxos and commutativity, instead of the classic protocol, and uses different workloads (no contention, less load, etc.).

With PLANET, transactions quickly reach the accepted stage, with the promise that they will eventually complete and the user will be informed of the final outcome. The accept-commits and accept-aborts areas of the PLANET graph are those transactions which were accepted before the timeout, but completed after the timeout. By being notified through finallyCallback, applications can discover the true outcome of transactions even if they do not complete before the timeout, and this can drastically reduce the level of uncertainty. For the red and green striped areas, the application can present the user with a meaningful message that the request was received and that the user will be notified about the final outcome, instead of showing an error message. Furthermore, in contrast to the traditional model, for the transactions which only reach the onFailure stage within the timeout (blue crosshatched area in the bottom graph), the finally callbacks may be invoked as long as the transaction is not actually lost due to a failure. As Figure 4.5 shows, PLANET is less sensitive to the timeout parameter using onAccept and finallyCallback, because it allows for providing more meaningful responses to the user and for learning the transaction outcome even after the timeout.

4.6.4 Overall Performance

In order to show the overall benefits of PLANET, I used the TPC-W-like buy transaction, with the clients executing at a fixed rate, to achieve a fixed target aggregate throughput. The transactions used a 5 second timeout and no onAccept stage. To simulate non-uniform access of very popular items, a hotspot access pattern was utilized, where 90% of transactions accessed an item in the hotspot. The Items table had 200,000 items and the hotspot size was varied to vary the transaction conflict rates. The PLANET system enabled admission control with the Dynamic(50) policy, so when the likelihood of commit, L, is less than 50%, the transaction is attempted with probability L. The PLANET transaction also enabled speculative commits with a value of 0.95, so when the likelihood of commit is at least 95%, the transaction is considered committed. Figure 4.6 shows the commit and abort rates for different hotspot sizes, with a client target throughput of 200 TPS. The figure shows that as the hotspot size grows (decreasing conflict rates), PLANET achieves throughput similar to the standard system, with abort rates around 1%–2% under uniform access. As the hotspot sizes shrink (i.e., more conflicts are created), the abort rates increase because of the increasing conflict rates, and PLANET begins to achieve higher commit throughput. When the hotspot is 200 items, PLANET has a commit rate of 58.2%, but the standard system only has a commit rate of 17.1%. The better commit rate is explained by the Dynamic(50) policy, which has the biggest impact at roughly 800 or fewer hotspot items. The effects of different admission control parameters are further studied in Section 4.6.7. Figure 4.7 shows the average commit response times for the different target throughputs. The PLANET response times are all faster than those of the standard system, because PLANET can take advantage of faster, speculative commits.


Figure 4.6: Commit & abort throughput, with variable hotspot (200,000 items, 200 TPS)


Figure 4.7: Average response time, with variable hotspot (200,000 items, 200 TPS)

As the hotspot size increases from 200 to 6400 items, the average PLANET response times (green solid line) increase, because reducing the conflict rate in the hotspot increases the commit likelihoods and fewer transactions are rejected. This means fewer transactions run in the less contended portion of the data and can experience speculative commits. However, as the hotspot size increases further beyond 6400 items, the hotspot becomes large enough that even the transactions accessing the hotspot begin to experience faster, speculative commits, and the response time decreases again. Figure 4.7 also shows the percentage of commits which are speculative, and demonstrates that PLANET response times are low when a larger percentage of speculative commits is possible. Overall, the throughput and response time are significantly improved by PLANET. The next subsections study in more detail the impact of contention, speculative commits, and

admission control on the overall performance.

4.6.5 Performance Under High Contention

To further investigate the performance under high contention, the client request rates (Client Transaction Rate) were varied with a fixed set of items (50,000) and hotspot (100 items). Figure 4.8 shows the successful commit transaction throughput (i.e., goodput) of the PLANET system for various client request rates. The PLANET system outperforms the standard system for all the client throughputs and achieves up to 4 times more throughput at higher request rates. For the standard system, the throughput peaks at around 40 TPS, whereas with PLANET, the peak throughput is around 163 TPS. The abort rates for PLANET ranged from 44% to 75.8% at 600 TPS (hence the difference between request rate and actual commit throughput). Without PLANET, the abort rates ranged from 56.7% to 94.1% at 600 TPS. Again, PLANET admission control is the main reason for the improved goodput. Admission control prevents thrashing the system, uses resources to attempt transactions that are more likely to commit, improves commit throughput, and improves the goodput within the hotspot by reducing the contention.


Figure 4.8: Commit throughput, with variable client rate (50,000 items, 100 hotspot)

Figure 4.9 shows the cumulative distribution functions (CDF) of committed transaction response times for experimental runs with various client request rates. It shows that the latencies for PLANET transactions are lower than the latencies for transactions not using PLANET. The main reason for the reduced response times with PLANET is that speculative commits are utilized. At 300 TPS, about 46.2% of all commits could commit speculatively, therefore greatly reducing response times. Many of the transactions not in the hotspot (the cold-spot) are able to commit speculatively, with a commit likelihood greater than 0.95, because of the low contention. Therefore, the commit likelihoods of cold-spot transactions are high.


Figure 4.9: Commit response time CDF (50,000 items, 100 hotspot)

At low load of 100 TPS, 95.5% of transactions in the cold-spot could commit speculatively, and at high load of 500 TPS, about 20.2% of transactions were speculative commits. These results show that the speculative commits of PLANET can improve the response times.

4.6.6 Speculative Commits

In order to study the prediction model in more detail, I ran experiments with the benchmark, but with the transaction size at 1 item, a 5 second timeout, and no onAccept stage. To better evaluate only the speculative model, admission control is disabled and a more balanced contention scheme is used by selecting items uniformly from the Items table, which was varied in size from 1,000 to 10,000 items. The transactions were defined to speculatively commit when the likelihood was at least 0.95, and the client transaction request rate was set to 200 TPS. Figure 4.10 shows the breakdown of the different commit types for the different data sizes. In the figure, standard commits are labeled as "Normal", speculative commits are labeled as "Spec", and speculative commits which are incorrect are labeled as "Incorrect Spec". When the data size is large and there is low contention on the records (10,000 items), most of the transactions can commit speculatively. At 10,000 items, about 77.3% of transactions could commit speculatively because of the high likelihood of success. When the data size is small and there is high contention (1,000 items), most of the transactions cannot take advantage of speculative commits. At 1,000 items, only 0.1% of transactions could commit speculatively. This occurs because as contention increases, the records have higher access rates, so it becomes less likely that a transaction would have a commit probability of at least 0.95. Since speculative commits finish the transaction early, before the final outcome, sometimes the commit can be wrong. It is clear in Figure 4.10 that the fraction of incorrect commits is not very large.


Figure 4.10: Transaction types, with variable data size (200 TPS, uniform access)

The transaction defines speculative commits with a likelihood of at least 0.95, so ideally only about 5% of speculative commits would be incorrect. For all the data sizes greater than 1,000 items, the rates of incorrect speculative commits were between 1.8% and 5.8%. For 1,000 items, 25.6% of speculative commits were incorrect. The higher error rate for 1,000 items can be explained by the fact that not many transactions committed speculatively (only 39 in 3 minutes), and high contention makes it difficult to predict the commit likelihood accurately. However, most of the error rates are similar to or better than the expected rate of 5%.


Figure 4.11: Average commit latency, with variable data size (200 TPS, uniform access)

Figure 4.11 shows the average transaction response times (including aborts) for the same

setup as Figure 4.10. The graph shows the expected result: larger data sizes lower the response times because more transactions can commit speculatively. I conclude that even this simple prediction model provides enough accuracy and is able to significantly lower the total response times by using PLANET's speculative commits.

4.6.7 Admission Control

Finally, I studied the effects of PLANET's admission control more closely, by running experiments with the benchmark with smaller data sizes, and without speculative commits. The transaction size was set to 1 item, the data size was set to 25,000 items, and the hotspot size was set to 50 items. I ran the experiments with different client request rates, and varied the parameters for the Fixed and Dynamic policies, to observe how the parameters are affected by different access rates. For Fixed(threshold, attempt_rate), the attempt_rate was varied for a few constant values of the threshold. For Dynamic(threshold), the threshold was varied.


Figure 4.12: Admission control, varying policies (100 TPS, 25,000 items, 50 hotspot)

Figures 4.12 and 4.13 show the commit rates for the policies with client throughputs of 100 TPS and 400 TPS, respectively. The Dynamic(*), Fixed(20,*), Fixed(40,*), and Fixed(60,*) policies were tested, represented by Dyn(*), F(20,*), F(40,*), and F(60,*) in the graphs, where "*" refers to the parameter varied (X-axis). The figures show the total commit rates (solid green lines) and the hotspot commit rates (dashed red lines), while varying the parameters. In general, for a 100 TPS request rate (Figure 4.12) all policies behave similarly. At 100 TPS, the contention level is not strong enough for the admission control policies to really show an impact. However, there are three configurations that stand out: Fixed(60,*), Fixed(40,*), and Dyn(*). Fixed(60,*) and Fixed(40,*) overcompensate for attempt_rate near 0%, as they reject too many hotspot transactions, causing them to drop significantly below the maximum


Figure 4.13: Admission control, varying policies (400 TPS, 25,000 items, 50 hotspot)

However, with an attempt rate of 10%, Fixed(60,10) achieves the highest total throughput in this setup. The reason is that Fixed(60,10) is too aggressive in rejecting low commit likelihood transactions and pushes the workload towards a more uniform one, whereas the other policies still attempt and commit more transactions in the hotspot region (around 30 TPS). This is undesirable, since the admission control technique should fully utilize the hotspot instead of underutilizing it. In contrast, Dyn(*) always utilizes the hotspot at around 30 TPS while providing good overall throughput. For a 400 TPS request rate (Figure 4.13) the situation is different. The dynamic policy performs poorly with a threshold near 0%, whereas the Fixed strategies do well, the high threshold (60%) in particular. The reason is simple. Recall that a Fixed(T,A) policy means that when the commit likelihood is less than T%, the transaction will be attempted A% of the time. With a setting of Fixed(60,0) the admission control is most aggressive, whereas Fixed(60,10) will admit some transactions accessing the hotspot and increase the overall performance. In contrast, a Dyn(0) policy actually refers to a setup without any admission control. It is more appropriate to compare the Dyn(60) point with all the data points of Fixed(60,*), as a Dyn(60) policy means that all transactions with a likelihood L less than 60% will be attempted L% of the time. In general, the Dynamic(100) policy, which means that all transactions are tried in proportion to their commit likelihood, performed very well in both experiments. This also shows that the prediction model is accurate enough to allow the Dynamic policy to make good decisions. Lower thresholds for the Dynamic strategy essentially accept more risky transactions into the system. In the experiments, the Dynamic policy performed similarly for all thresholds greater than 50%. For all configurations, using admission control always resulted in a higher total commit rate than when not using admission control. With admission control, PLANET can back off

from the hotspot, avoid spending resources thrashing the commit protocol, and instead execute transactions which have a much better chance to succeed (transactions not accessing the hotspot). These experiments provide a sensitivity analysis for the parameters and policies, and show that the default policy for PLANET, Dynamic with a high threshold, works well for a range of environments. In summary, PLANET enables application developers to write responsive applications by speculatively committing transactions and using the onAccept stage. Also, when the system performs admission control with commit likelihoods, resources are spent on transactions more likely to commit, resulting in higher throughput.
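As a summary of the two policy families, here is a small Python sketch of the admission decision as described above: Fixed(T, A) attempts a transaction A% of the time when its commit likelihood is below T%, while Dynamic(T) attempts it with probability equal to its likelihood. The function names and the random-draw implementation are illustrative assumptions, not PLANET's actual code.

import random

def admit_fixed(likelihood, threshold=0.60, attempt_rate=0.10):
    # Fixed(T, A): below the threshold, attempt the transaction only A of the time.
    if likelihood >= threshold:
        return True
    return random.random() < attempt_rate

def admit_dynamic(likelihood, threshold=1.0):
    # Dynamic(T): below the threshold, attempt with probability equal to the
    # likelihood itself, so Dynamic(0) admits everything (no admission control).
    if likelihood >= threshold:
        return True
    return random.random() < likelihood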

4.7 Related Work

The PLANET transaction programming model enables application developers to write latency-sensitive applications in high variance environments. In [43], the authors describe three types of design patterns in eventually consistent and asynchronous environments: memories, guesses, and apologies. PLANET supports these design patterns with speculative commits and finally callbacks. This chapter described a conflict estimation model to predict the likelihood of commit for a write-set, for an optimistic concurrency control system. Similarly, the work in [83] developed models for two-phase locking, and those models could be implemented in PLANET to compute commit likelihoods, allowing PLANET to work in systems using two-phase locking. [62, 77, 49] all use probabilistic models to limit divergence of replicas or inconsistencies with respect to caching. PLANET uses probabilistic models to predict the likelihood of commit, instead of possible data inconsistencies or divergence. There have been many studies on transaction admission control, or load control, in database systems. In [42], the "optimal" load factor is approximated by adaptively probing the performance of the system with more or less load, and admission control prevented thrashing of the system. [22, 59, 82] have all studied the effects of thrashing and admission control with two-phase locking concurrency control. These solutions require keeping track of the global number of transactions or locks held by transactions to make admission control decisions, which is difficult in geo-replicated distributed database systems. The authors of [37] implemented a proxy for admission control by rejecting new transactions which may surpass the system capacity computed offline. PLANET differs from these solutions by using commit likelihoods to make decisions on admission control. There have been previous proposals for system implementations that perform optimistic commit, sometimes needing compensation if the optimistic decision was wrong [55, 47]. PLANET is orthogonal to this, with speculative commits at the language level that make the application programmer aware of the commit likelihood, so a principled decision can trade off the benefits of fast responses against the occasional compensation costs. Any optimistic commit protocol could be used to implement the PLANET model, so long as PLANET can predict the probability of eventual success.

Several systems support timeouts for transactions, such as JDBC drivers or Hibernate, but they only support simple timeouts with no further guarantees. Also, with most models, after the timeout expires, the transaction outcome is unknown without an easy way to discover it. Some models allow setting various timeouts for different stages of the transaction (e.g., Galera, Oracle RAC), but how the timeouts affect the user application is not obvious. In contrast, PLANET provides a solid foundation for developers to implement highly responsive transactions using the guess and apology pattern, and it minimizes the uncertain state in which the application does not know anything about the transaction.

4.8 Conclusion

High variance and high latency environments can be common with recent trends towards consolidation, scalable cloud computing, and geo-replication. Transactions in such environments can experience unpredictable response times or unexpected failures, and the increased uncertainty makes developing interactive applications difficult. I proposed a new transaction programming model, Predictive Latency-Aware NEtworked Transactions (PLANET), that offers features to help developers build applications. PLANET exposes the progress of transactions to the application, so that applications can flexibly react to unexpected situations while still providing a predictable and responsive user experience. PLANET's novel commit likelihood model and user-defined commits enable developers to explicitly trade off between latency and consistent application behavior (e.g., apologizing for moving ahead too early), making PLANET the first implementation of the previously proposed guesses and apologies transaction design pattern [43]. Furthermore, likelihoods can be used for admission control to reject transactions which have a low likelihood to succeed, in order to better utilize resources and avoid thrashing. I evaluated PLANET in a strongly consistent, synchronous geo-replicated system and showed that using the speculative commits and admission control features of PLANET can improve the throughput of the system, and decrease the response times of transactions. When implemented in a strongly consistent, synchronous geo-replicated system, PLANET can offer the lower latency benefits of eventual consistency. While eventually consistent systems choose to give up data consistency or multi-record transactions for improved response times, PLANET can improve transaction response times while still keeping data consistent.

Chapter 5

A New Scalable View Maintenance Algorithm

5.1 Introduction

In Chapter 3 and Chapter 4, I described new techniques for supporting and interacting with scalable transactions for distributed database systems. However, applications also need to read and query the transactional data in an effective way. Modern distributed partitioned stores commonly split up their data in order to handle growing data sizes and query loads. Partitioning data and load across many machines is an effective technique to scale out, and because of this nearly limitless scalability, partitioned stores have become very popular. There are two commonly used types of partitioned stores: sharded transactional systems, and distributed key-value stores. Sharded transactional systems [20, 31, 13] partition the data across multiple instances of transactional database systems, thus allowing efficient queries and transactions within a single partition. For example, Facebook uses a large array of MySQL instances to store and serve its social graph data. Distributed key-value stores [24, 33, 30, 6, 7, 58] also partition their data, but limit operations to single rows, or key-value pairs. Partitioning data is a common technique to scale data stores; however, there are some limitations for applications using these partitioned stores. For sharded database systems, transactions are usually limited to a single partition, and cross-partition transactions require heavy coordination that limits scalability, so they are strongly discouraged. Also, joins are only supported within a single partition, and cross-partition joins are done by external applications. Key-value stores have similar limitations, where transactions or updates are only supported per single key-value pair, and joins have to be performed by applications. When an application needs joins using partitioned stores, it either has to perform the join itself, or issue the join for a single partition, and neither scenario is ideal. Materialized views are widely used in traditional database system designs to make joins easy and effective to evaluate. Materialized views are precomputed results of queries, and they can be an effective way to support global joins without requiring applications to re-implement a join algorithm.

Materialized join views allow applications to query the join results directly for reduced response times, and can also help with data warehousing and analytics tasks [40]. However, most partitioned stores do not natively support materialized views, mainly because maintaining views may require expensive, cross-partition transactions. In order for applications to use materialized views, the applications themselves must deal with maintaining them, and if they are not careful, anomalies and inconsistencies will arise in the views. Surprisingly, there has been little work on extending incremental view maintenance techniques to distributed and scalable environments. There are many incremental view maintenance techniques [23, 17, 68, 41] that assume single-server database systems or require global transactions. These techniques are not directly applicable to distributed settings, because distributed transactions are expensive or not always available, and increasing the size of transactions will likely cause more conflicts and limit the scalability of the system. Related data warehouse techniques [87, 2, 84, 26] depend on a single server to process all the updates and maintain all the views, so this can become a bottleneck and limit the scalability. In summary, existing techniques for maintaining views are not scalable enough to be appropriate for modern distributed database systems. In this chapter, I propose SCALAVIEW, a new technique for scalable maintenance of Select-Project-Join (SPJ) views. I investigate view maintenance for partitioned stores in a distributed and scalable environment, how to achieve convergence consistency, and how to reduce view staleness. Perfect, complete consistency is not a goal of this work. Instead, a view is allowed to be a little stale, but reducing this staleness is a goal. However, views should exhibit convergence consistency, which means that the correct state will eventually appear in the view. Because partitioned stores do not natively support views and applications must maintain views, I study the types of anomalies that can prevent convergence consistency, and new techniques that prevent and correct the anomalies. I describe the goals of a scalable view maintenance algorithm in Section 5.3. Section 5.4 describes some existing solutions for view maintenance. Potential anomalies of scalable maintenance are discussed in Section 5.5. Design decisions and components of SCALAVIEW are described in Section 5.6 and the algorithm is presented in Section 5.7. SCALAVIEW is evaluated in Section 5.9, and I discuss other related work in Section 5.10.

5.2 Motivation

Partitioned stores have become very popular for their scalability properties. By adding more servers to the configuration of the system, the system can quickly support more data storage and more query workload volume. However, if materialized views are not available, applications may no longer be scalable even when using scalable partitioned stores [10]. Therefore, materialized views can be very beneficial for many workloads. In this section, I describe a simple example of when a materialized view would be beneficial. A Twitter-like micro-blogging service has users who can post content, as well as follow

other users. The database schema would look like the following:

Users(uid INT, name STRING)
Posts(uid INT, ts TIMESTAMP, post STRING)
Follows(uid1 INT, uid2 INT)

Each time a user, user, wants to see the stream of posts from the users they follow, an example of the join query that would have to be issued is:

SELECT * FROM Posts P, Follows F WHERE F.uid1 = user AND F.uid2 = P.uid

Since many modern scalable partitioned stores force the client application to execute joins, the application would have to query the Follows table for all the followed users, and then query Posts for each followed user's posts. This type of query access could become expensive, especially when there are many users trying to read their own streams. However, by using a materialized view, this access could become a much easier query, because the view would essentially be a cache of the join. The following is the schema for the materialized view table.

Streams(uid1 INT, uid2 INT, ts TIMESTAMP, post STRING)

This view precomputes all the streams for all of the users, so when a particular user wants to see their stream, the query is a simple lookup in the view table, with the query:

SELECT * FROM Streams S WHERE S.uid1 = user

By using a materialized view, each user simply has to issue this query to the view, instead of having to compute a join each time. The application can avoid imposing scalability limitations on the underlying system, by avoiding computing the join for each user. Unfortunately, current distributed database systems do not have scalable solutions for maintaining materialized views. Therefore, developing a scalable mechanism for maintaining materialized join views would benefit many distributed database systems.
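To make the contrast concrete, the following is a small Python sketch of the two access paths; the store client and its query method are hypothetical stand-ins for whatever partitioned store the application uses, not an API defined in this thesis.

# Without the view: the application performs the join itself, issuing one
# Posts lookup per followed user (hypothetical client API).
def read_stream_with_client_join(store, user):
    followed = store.query("Follows", uid1=user)          # all (user, uid2) rows
    posts = []
    for row in followed:
        posts.extend(store.query("Posts", uid=row["uid2"]))
    return posts

# With the Streams join view: a single lookup keyed on the reading user.
def read_stream_from_view(store, user):
    return store.query("Streams", uid1=user)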

5.3 Goals for Scalable View Maintenance

Modern partitioned stores are becoming increasingly popular, mainly for their scalability and fault tolerance properties. When designing an algorithm for view maintenance in a scalable system, it is important to not negate the benefits of the underlying system. In this section, I discuss specific goals for a scalable view maintenance algorithm, so that it does not limit the scalability or fault tolerance properties of distributed database systems.

5.3.1 Deferred Maintenance

One of the main goals is to maintain materialized views asynchronously from the base update. That is, no global transaction includes both the base changes and the view changes. As opposed to early materialized view work [23, 17] that included the view maintenance operations in the base transaction, deferred maintenance [28, 75, 88] decouples the view operations from the base transaction. Deferred maintenance is useful because some applications may be able to tolerate the staleness of views due to the asynchronous nature of the maintenance. This is particularly true for data warehouse applications, since updates to the data warehouse are usually deferred, and sometimes batched up for an entire day. Also, for many large-scale distributed database systems, arbitrary transactions are not supported. Some transactions are limited to either a single record, or a single partition. Therefore, including view maintenance operations in the base transaction is often not even possible. However, even if distributed transactions are supported, coupling view maintenance operations with the base transaction may not always be desirable. Adding more operations to the base transaction increases the read and write sets of the transaction, and results in more contention, greater conflict rates, and slower response times. Therefore, deferred view maintenance is a goal for a scalable algorithm.

5.3.2 Scalable & Distributed

Since the materialized views are in a scalable, distributed data store, the view maintenance algorithm should also be scalable and distributed. Distributed systems allow the base data to scale easily to handle larger amounts of data and load. However, if the view maintenance algorithm is not scalable, the maintenance could become a bottleneck and may not be able to keep up with the aggregate throughput. Therefore, the maintenance algorithm should be distributed and scalable. To be able to support scalable view maintenance, the algorithm should not depend on global state or global ordering. Requiring global ordering could limit the scalability, since this is usually achieved by a single point of serialization, coordination, or locking. A single point of serialization or ordering can also adversely affect the fault tolerance of the system. Therefore, the view maintainers should be distributed across multiple machines, and views should not have to be maintained at a single server like a traditional data warehouse.

5.3.3 Incremental Maintenance

Another goal for scalable view maintenance is for the maintenance to be incremental, and to avoid re-computation. Since materialized views are like precomputed results of queries, one way to maintain the views is to re-compute the entire result. However, re-computation should be avoided since it could be very expensive, especially for large tables and large join views.

By just examining and updating the data that changed, incremental view maintenance can be very quick and cost effective.

5.3.4 Reduce Staleness

Another important goal is to reduce staleness of the materialized views. Since views will be updated asynchronously, there will always be some delay between the base data update and the view update. However, the goal is to try to reduce the propagation delay from the base update to the view. The experiments in Section 5.9 focus on measuring this staleness.

5.3.5 Convergence Consistency

Considering the goals for scalable, distributed, and asynchronous view maintenance of joins, maintaining consistency of the views is more difficult. In this environment, global transactions, global ordering, or global state in the distributed system are not available, so much of the previous work on techniques for consistent materialized views is not directly applicable. However, previous work on consistency of materialized views [88] has presented the following varying levels of consistency.

Convergence: After all activity has ceased, the state of the view is consistent with the state of the base data.

Weak Consistency: Every state of the view is consistent with some valid state of the base data. However, it may be in a different order.

Consistency: Every state of the view is consistent with a valid state of the base data, and they occur in the same order.

Completeness: There is a complete, order-preserving mapping between the base data states and the view states.

In this work, the goal is for the materialized views to have convergence consistency. There are use cases of views that can tolerate convergence consistency. For example, if there is reporting or aggregation done on a large result, it may not be important to have the exact answer, but a close estimate may be enough. However, accumulating errors would still not be acceptable, so convergence is a useful property. In Section 5.5, I show threats to convergence in this scalable, distributed, and asynchronous setting, and how I propose to prevent them.

5.3.6 Reduce Data Amplification

The amount of space overhead required to maintain views is another aspect to consider. The amount of data amplification, or extraneous data that is not the base data, can be quite large when dealing with massive data sets. Therefore, a solution should not require too much extraneous data in order to correctly maintain materialized views.

5.3.7 Provide General Solution

The solution should provide a general framework for materialized views. This means the requirements on the underlying system should not be so great as to exclude too many existing distributed database systems. Also, strict assumptions about the workload should be avoided. For example, there are prior solutions for only equi-join views, or local indexes, or domain-specific web indexes. However, I seek a solution that is more general and applicable to a wider range of underlying systems and use cases.

5.4 Existing Maintenance Methods

There are many methods for maintaining materialized views in databases, and in this section, I discuss two classes of techniques applicable for scalable partitioned stores.

5.4.1 Centralized Algorithm

Most view maintenance algorithms are centralized algorithms that either execute on the same server as a single-server database system, or execute on a single data warehouse installation. In both cases, the maintenance algorithm has access to certain global state of the system. For distributed database systems, the data warehouse techniques are most applicable. In these algorithms, all the updates in the system are sent to the data warehouse, where a sequencer assigns a global timestamp to every message. Then the algorithms use this stream of updates to maintain views either single-threaded, as in STROBE [87] or SWEEP [2], or multi-threaded, as in POSSE [61] or PVM [84]. Even the multi-threaded algorithms need to synchronize access to global state to provide correct views. Therefore, this synchronization could become a bottleneck, thus limiting scalability.

5.4.2 Co-Located Indexes

An existing method for scalable, distributed view maintenance for equi-joins is a co-locating scheme, similar to the methods found in PNUTS [3] from Yahoo, or Lynx [85] from Microsoft Research. With this method, intermediate indexes are created and maintained for each table involved in the join. The indexes are partitioned on one of the join keys, so that matching records from different tables are co-located, and executing a local join is possible. After the local join, additional indexes may have to be maintained and re-partitioned for the remaining join keys specified in the view. This is a simple technique that maintains join views in a distributed and scalable way. However, there are some limitations that suggest opportunities for a different solution. First of all, this technique only works for equi-joins and is not general enough to support other join conditions. Also, since re-partitioning must co-locate matching records for a join key, skew in the data could be problematic. If a co-located partition cannot fit on a single machine, multi-partition updates or joins would be required, and that aspect has not been investigated by previous work. Another potential drawback is that intermediate indexes and results must be maintained in addition to the base tables and the materialized join view. Therefore, the data duplication required for the scheme to work could be high, and this could be a limiting factor for large distributed deployments.

5.5 Possible Anomalies

The main idea behind incremental view maintenance techniques is to avoid re-computation of the entire view, which could be quite costly. In typical incremental algorithms such as ECA [88], STROBE [87], or SWEEP [2], queries are issued in response to an update to a base table. These queries access other base tables required for the join view. However, due to the asynchronous nature of the view maintenance, anomalies are possible that threaten convergence unless extra techniques are used. In this section, I describe the types of anomalies that must be prevented during distributed view maintenance.

5.5.1 Incremental Maintenance Operations

A base table can be modified with an insert or a delete operation. A table can also be modified with an update, but for the purposes of this thesis, an update is modeled as a delete followed by an insert. The incremental view maintenance operations for base inserts and deletes are described below.

5.5.1.1 Insert

When a base record is inserted, the view maintainer must perform operations to determine which records to insert into the view. This set of view maintenance operations is called Insert-Join. For the Insert-Join maintenance task, the maintainer first issues queries to the other tables involved in the join, using data from the base insert. After receiving all the query responses, the view maintainer joins the records from the responses to construct the view records to insert. Finally, the set of view records is inserted into the view. Lines 2–10 of Algorithm 6 show the pseudocode for Insert-Join.

5.5.1.2 Delete

When a base record is deleted, the view maintainer must perform operations to determine which records to delete from the view. There are two different methods of determining which records to delete, Delete-Join and Delete-Direct. The Delete-Join task is similar to the Insert-Join task. The maintainer first issues queries to the other tables involved in the join, using data from the base delete. The maintainer joins the records from the responses to construct the view records to delete. Lines 11–19 of Algorithm 6 show the pseudocode for Delete-Join.

Algorithm 6 View Maintenance Tasks
1: V : T1 ⋊⋉ ... ⋊⋉ Tn

2: procedure ProcessInsertJoin(∆Tx)
3:     ∆V ← ∆Tx
4:     for i ≠ x do
5:         Query Ti for records matching ∆V
6:         Receive ∆Ti
7:         ∆V ← ∆Ti ⋊⋉ ∆V
8:     end for
9:     Insert ∆V into View V
10: end procedure

11: procedure ProcessDeleteJoin(∆Tx)
12:     ∆V ← ∆Tx
13:     for i ≠ x do
14:         Query Ti for records matching ∆V
15:         Receive ∆Ti
16:         ∆V ← ∆Ti ⋊⋉ ∆V
17:     end for
18:     Delete ∆V from View V
19: end procedure

20: procedure ProcessDeleteDirect(∆Tx)
21:     Read View V for records matching ∆Tx
22:     Receive ∆V
23:     Delete ∆V from View V
24: end procedure

An alternative method of deleting view records is to use the information from the deleted base record to directly delete from the view table. This Delete-Direct task reads the view table to find view records that join with the deleted record, and then deletes them. Lines 20–24 of Algorithm 6 show the pseudocode for Delete-Direct.
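As a rough, runnable counterpart to Algorithm 6, the Python sketch below expresses the three maintenance tasks against a hypothetical store interface (query, insert_view, read_view, delete_view) and a caller-supplied join function; it mirrors the pseudocode only, so none of the anomaly-prevention techniques introduced later are applied here.

def insert_join(store, tables, x, delta_tx, join):
    # Insert-Join: compute and insert the view records produced by a base insert.
    delta_v = [delta_tx]
    for i, table in enumerate(tables):
        if i == x:
            continue
        delta_ti = store.query(table, matching=delta_v)   # records joining with delta_v
        delta_v = join(delta_ti, delta_v)                  # extend the partial join result
    store.insert_view(delta_v)

def delete_join(store, tables, x, delta_tx, join):
    # Delete-Join: recompute the affected join result to find the view records to delete.
    delta_v = [delta_tx]
    for i, table in enumerate(tables):
        if i == x:
            continue
        delta_ti = store.query(table, matching=delta_v)
        delta_v = join(delta_ti, delta_v)
    store.delete_view(delta_v)

def delete_direct(store, delta_tx):
    # Delete-Direct: find and delete view records that join with the deleted base record.
    delta_v = store.read_view(matching=delta_tx)
    store.delete_view(delta_v)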

5.5.2 When Anomalies Are Possible

Section 5.5.1 describes the basic view maintenance tasks for incrementally updating views. If these tasks were executed in a single transaction with the base insert or delete, and the transactions were executed with a serializable schedule, then there would be no anomalies or inconsistencies with the database. However, maintaining views in base transactions is not desirable because it adversely affects the base update rate, and transactions are not available

in many modern distributed database systems. Therefore, the view maintenance tasks are logically part of the base transactions, but executed without any transaction mechanisms such as locking or coordination, and this can lead to anomalies. View anomalies between concurrent view maintenance tasks are a threat when there are conflicting operations and there is a cycle in the conflict dependency graph [16]. When the read-write, write-read, or write-write conflicts between maintenance tasks produce a cycle in the dependency graph, there is no equivalent serializable schedule, so anomalies would be present. Simply executing the view maintenance tasks without special handling will not be able to prevent or correct the anomalies, since no locking or coordination mechanisms are utilized.

5.5.3 Example of an Anomaly

In this section, I describe how an anomaly may occur. Say there are two relations R and S, and the view V is defined as R ⋊⋉ S. R starts empty and S contains a single tuple. The schemas and contents of R, S, and V are shown below:

R(a, b): empty
S(b, c): {(2, 3)}
V(a, b, c): empty

The following sequence is possible, which leads to an anomaly in the view. For the notation, upi denotes a base update i, and vj denotes a view maintenance task j triggered by a base update upj.

1. up1 inserts a new record (1, 2) into R, then triggers the corresponding view task v1.

2. v1 reads S for matching records for the inserted record (1, 2), and receives S tuple (2, 3).

3. up2 deletes record (2, 3) from S, then triggers the corresponding view task v2.

4. v2 reads R for matching records for the deleted record (2, 3), and receives R tuple (1, 2).

5. v2 attempts to delete joined record (1, 2, 3) from V , but nothing happens since it does not exist.

6. v1 inserts joined record (1, 2, 3) into V.

7. The final result of the view V contains (1, 2, 3), but it is not consistent with the base tables R and S.

This sequence of events, or history, leads to an anomaly in the view, because the resulting view V is not equivalent to R ⋊⋉ S. If you consider up1 and v1 as a "logical" transaction T1, and up2 and v2 as a separate "logical" transaction T2, the conflicts can be visualized. The history of operations and the corresponding conflict dependency graph are shown in Figure 5.1.

T1: ins1[Rx] r1[Sy] ins1[Vz]

T2: del2[Sy] r2[Rx] del2[Vz]

[Conflict dependency graph: edges from T1 to T2 on Rx and Sy, and from T2 to T1 on Vz, forming a cycle.]

Figure 5.1: History and corresponding conflict dependency graph for example anomaly

In denoting the sequence of operations, the notation ins1[Rx] (or del1[Rx]) means transaction T1 inserts (or deletes) record x in table R, and r2[Sy] means transaction T2 reads record y from table S. T1 writes Rx before T2 reads Rx, and T1 reads Sy before T2 writes Sy. However, T2 writes Vz before T1, so the conflict dependency graph looks like Figure 5.1. There is a cycle in the conflict dependency graph, so this sequence of operations results in an anomaly.

5.5.4 Classifying Anomalies

To investigate what types of anomalies are possible and when they may occur, all the combinations of two concurrent view maintenance tasks are considered, and the schedules with cycles in the conflict graphs are examined. In this section, I describe the different types of anomalies possible and how they may occur.

5.5.4.1 Insert-Join & Insert-Join

When two concurrent Insert-Join tasks, T1 and T2, both read the other transaction’s base insert, the conflict schedule looks like this:

T1: ins1[Rx] r1[Sy] ins1[Vz]

T2: ins2[Sy] r2[Rx] ins2[Vz]

Both T1 and T2 will insert the same record Vz into the view. This anomaly is called Duplicate-Insert-Anomaly. Another type of cycle and anomaly is possible when the Insert-Join task uses an asynchronous global index to read values from other tables.

When reading from an asynchronous global index, it is possible that maintenance tasks read stale versions of the index, which can cause this schedule:

T1: ins1[Rx] r1[S.idxy] ins1[R.idxx]

T2: ins2[Sy] r2[R.idxx] ins2[S.idxy]

When both Insert-Join tasks read stale values from the global indexes, neither reads any rows to use for inserting into the view. Therefore, nothing will be inserted into the view, which leads to the Stale-Index-Anomaly.

5.5.4.2 Insert-Join & Delete-Join

For concurrent Insert-Join T1, and Delete-Join T2, multiple conflict graphs with cycles are possible. The first cycle is caused by this schedule:

T1: ins1[Rx] r1[Sy]

T2: del2[Sy] r2[Rx] del2[Vz]

In this schedule, the Insert-Join T1 does not read record Sy because it is deleted, so T1 does not update the view. T2 reads Rx and attempts to delete the view record Vz, but it does not exist, since it was never inserted by T1. This is called a Delete-DNE-Anomaly. A separate conflict schedule causes a different cycle:

T1: ins1[Rx] r1[Sy] ins1[Vz]

T2: del2[Sy] r2[Rx] del2[Vz]

With this schedule, Delete-Join T2 tries to delete the view record Vz before T1 inserts the record. Therefore, when T1 inserts the view record Vz, it will remain in the view even though it should not exist. This is called Late-Insert-Anomaly. Section 5.5.3 shows an example of this anomaly. If asynchronous global secondary indexes are read for querying other tables, another anomaly is possible.

T1: ins1[Rx] r1[S.idxy] ins1[Vz] ins1[R.idxx]

T2: del2[Sy] r2[R.idxx] del2[S.idxy]

With this conflict cycle, neither view maintenance task observes the results of the other, so the Insert-Join task inserts the view record, but the Delete-Join does not delete it. This causes a Stale-Index-Anomaly.

5.5.4.3 Delete-Join & Delete-Join

Below is the schedule with a cycle for two concurrent Delete-Join tasks.

T1: del1[Rx] r1[Sy]

T2: del2[Sy] r2[Rx]

This occurs when both tasks try to read records after they have already been deleted, so both tasks determine that nothing should be deleted from the view. This is called a Missing-Delete-Anomaly.

5.5.4.4 Insert-Join & Delete-Direct

The history below shows the schedule that produces a cycle in the dependency graph.

T1: ins1[Rx] r1[Sy] ins1[Vz]

T2: del2[Sy] r2[Vz]

This anomaly occurs when Insert-Join reads the record before it is deleted, but inserts the record into the view after the Delete-Direct task tries to delete it. Therefore, the view record remains inserted even though the corresponding base record has been deleted. This anomaly is called Early-Read-Anomaly.

5.5.4.5 Delete-Join & Delete-Direct

The history below shows the schedule with a cycle for concurrent Delete-Join and Delete-Direct tasks.

T1: del1[Rx] r1[Sy] del1[Vz]

T2: del2[Sy] r2[Vz] del2[Vz]

In this schedule, both tasks attempt to delete the same view record, so this is called a Duplicate-Delete-Anomaly.

5.5.4.6 Delete-Direct & Delete-Direct

The history below shows the schedule with a cycle for two concurrent Delete-Direct tasks.

T1: del1[Rx] r1[Vz] del1[Vz]

T2: del2[Sy] r2[Vz] del2[Vz]

In this schedule, both tasks attempt to delete the same view record, so this is called a Duplicate-Delete-Anomaly.

5.6 Scalable View Maintenance

In order to support scalable view maintenance of SPJ views for modern, scalable partitioned stores, solutions should not have to depend on a centralized server, algorithm, or serialization point. While distributed systems can easily scale by adding capacity, centralized servers or serialization services may lead to bottlenecks in the system. This is undesirable because bottlenecks could prevent the view maintenance from keeping up with the load, with no way of increasing capacity. Therefore, scalable solutions should be distributed and not depend on centralized algorithms or global state. By allowing distributed execution, but still preventing and correcting the anomalies described in Section 5.5.4, SCALAVIEW enables scalable maintenance for join views. In this section I discuss the components of the SCALAVIEW solution for maintaining materialized views.

5.6.1 System Architecture Requirements

In order to support scalable maintenance of join views, I made several design decisions for the system architecture. In this section, I describe the different parts of the system that enable the SCALAVIEW algorithm.

5.6.1.1 Partitioned with Record Monotonicity

The SCALAVIEW technique for view maintenance is designed for a horizontally partitioned distributed database system that supports monotonicity per record. These properties are found in most modern partitioned stores.

5.6.1.2 Transactions not Supported or Used

Distributed transaction support is assumed to be unavailable, or transactions are not used if the underlying system does support them. Many scalable systems do not support full, arbitrary transactions, so SCALAVIEW does not depend on them. Also, asynchronous maintenance is one of the major goals for a scalable algorithm. Even if transactions are supported by the system, they are not utilized, because including view operations in the base transaction would increase the conflict footprint of the transaction. Increasing contention can negatively affect the concurrency of base updates to the database.

5.6.1.3 No Centralized Serialization or Global State

In order to achieve scalable view maintenance, SCALAVIEW does not depend on a centralized server to process updates, or on any globally consistent state of the entire system. Therefore, there is no total ordering of messages or updates. By not requiring global state, the processing does not have to occur on a single machine, which could otherwise become a bottleneck in the system.

5.6.1.4 Primary Keys and Versions

Every table must define a primary key and every record has a record version number. Materialized join views must also have primary keys and record versions. However, for each join view record, the primary key and the record version are inferred from the base records that make up the joined view record. Therefore, for all records in a join view, the composite primary key is constructed from the keys of the base records, and the composite record version is constructed from the base record version numbers.
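For illustration, a view record's composite key and version could be derived as in the small Python sketch below; the dataclass and the tuple encoding are assumptions for the example, since the thesis only requires that both be constructed from the constituent base records.

from dataclasses import dataclass

@dataclass
class BaseRecord:
    primary_key: tuple   # primary key of the base record
    version: int         # per-record version number

def view_record_identity(base_records):
    # Derive the composite primary key and composite version of a join view record
    # from the base records that produced it (one per base table, in a fixed order).
    composite_key = tuple(r.primary_key for r in base_records)
    composite_version = tuple(r.version for r in base_records)
    return composite_key, composite_version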

5.6.1.5 Independent Maintenance per Record

When any base record in the database is updated, a trigger is asynchronously initiated to execute a view maintenance task. The view maintenance task processes the updates in order for a particular record, and issues additional queries to incrementally update views. The updates have to be processed in order, but only within each record. Each view maintenance task is independent from the tasks of any other record, so at any given time, there could be many tasks executing on many different servers, for different records in the system. Therefore, SCALAVIEW is scalable since there is no centralized view maintainer, nor a global ordering of updates.

5.6.2 Component Techniques for Scalable View Maintenance

The distributed view maintenance triggers described in Section 5.6.1 allow for scalable maintenance of join views, but the anomalies described in Section 5.5 are still possible.

Therefore, techniques are needed in order to prevent and correct the anomalies. The foundation of SCALAVIEW is a simple compensation approach, similar to common data warehouse techniques [88, 87, 2]. In the rest of this section, I discuss additional details of the components of SCALAVIEW that address the anomalies that may arise in a distributed setting without depending on global state or coordination (proofs can be found in Section 5.8).

5.6.2.1 View Record Composite Primary Key

Since every record has a primary key, each view record is uniquely identified. Therefore, materialized join views do not contain duplicates. Since every record is uniquely identified, there is no ambiguity in inserting or deleting a view record. Using primary keys from base records to avoid duplicates in join views has been proposed in previous work [17]. Since every insert or delete is uniquely identified, duplicate inserts and deletes can be detected. Therefore, if a particular record is inserted multiple times, the SCALAVIEW algorithm will only insert the record once, since a record cannot be inserted multiple times. Multiple deletes of a record are resolved in the same way, so using the primary key prevents both Duplicate-Insert-Anomaly and Duplicate-Delete-Anomaly.

5.6.2.2 View Record Tombstones

Using delete tombstones is a mechanism for dealing with out-of-order concurrent updates to the same join view record, and prevents the Delete-DNE-Anomaly and Late-Insert-Anomaly. These anomalies occur because a concurrent insert and delete are attempting to update the same view record, but the operations occur in a schedule that is not conflict serializable. These schedules are described in Section 5.5.4. These types of anomalies occur because the deleting maintenance task attempts to delete a view record that does not exist. So, instead of causing a potential anomaly, if the view record to delete does not exist, a delete tombstone is created. Delete tombstones are specific to a particular version of the record. Therefore, for a particular record, there may exist delete tombstones for some of the versions, but not for others. When reading a particular version of a record and a delete tombstone exists, the record is simply not read. When a concurrent insert is processed for a version of a record and the delete tombstone already exists, the insert is ignored, because the tombstone signifies that the record should not exist. If the insert for a particular record version arrives after the delete tombstone, this means that there was an out-of-order execution of the operations on the same version of the record. Therefore, the insert should be ignored, and the record should not exist. As a result, Delete-DNE-Anomaly and Late-Insert-Anomaly are both handled by using delete tombstones.
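A minimal Python sketch of the tombstone rule for a single view record version is shown below; the in-memory dictionary stands in for the view storage and is purely illustrative. A delete of a missing version leaves a tombstone, a later insert of that version is ignored, and a tombstoned version is never returned by reads.

TOMBSTONE = object()   # sentinel marking a version that was deleted (or must never exist)

def apply_view_delete(view_store, key, version):
    versions = view_store.setdefault(key, {})
    # Remove the version if present; otherwise remember the delete as a tombstone.
    versions[version] = TOMBSTONE

def apply_view_insert(view_store, key, version, record):
    versions = view_store.setdefault(key, {})
    if versions.get(version) is TOMBSTONE:
        return                      # the delete already arrived; ignore the late insert
    versions[version] = record

def read_view_version(view_store, key, version):
    value = view_store.get(key, {}).get(version)
    return None if value is TOMBSTONE else value   # tombstoned versions are not read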

5.6.2.3 Re-Read the Read-Set

Another technique for handling some of the possible anomalies is to re-read the read set of the Insert-Join task and compensate for the differences between the two read sets.

With certain schedules of operations, an Insert-Join will insert view records while a concurrent Delete-Direct may not be able to delete them, resulting in the Early-Read-Anomaly. The standard Insert-Join task reads records from other tables involved in the join (read set RS1), joins the matching records to compute the view records, and finally inserts the records into the view. When using the re-read technique, after the task writes records into the view, it then reads its read set again, into RS2. Then, the first read set, RS1, and second read set, RS2, are compared, and any records in the first read set which do not appear in the second read set are gathered as RSdeleted = RS1 − RS2. The records which appear in RS1 and not RS2 are records which were deleted concurrently, but could potentially remain in the view. Then, the task uses RSdeleted to compute the join view records, and deletes those records from the view. The view records computed from RSdeleted are view records which were already inserted by this task, but should not remain in the view because of other concurrent deletes in the system. These view record deletes could result in duplicate deletes or deletes of records that do not exist, but those issues are handled by using the primary keys. This is similar to existing compensation techniques [88, 87, 2, 84, 26], but without having to depend on centralized state or update timestamps.
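The re-read step can be sketched as the tail of the Insert-Join task, as in the Python sketch below; read_other_tables and compute_join are hypothetical helpers standing in for the query and join steps of Algorithm 8, and the store's insert_view/delete_view calls are assumed interfaces.

def insert_join_with_reread(store, delta_tx, read_other_tables, compute_join):
    rs1 = read_other_tables(store, delta_tx)          # first read set (RS1)
    store.insert_view(compute_join(delta_tx, rs1))    # insert view records as usual

    rs2 = read_other_tables(store, delta_tx)          # re-read the same read set (RS2)
    rs_deleted = [r for r in rs1 if r not in rs2]     # base records deleted concurrently
    if rs_deleted:
        # Compensate: delete the view records built from base records that no longer
        # exist; duplicate or non-existent deletes are absorbed by keys and tombstones.
        store.delete_view(compute_join(delta_tx, rs_deleted))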

5.6.2.4 Query Old Versions

Without coordination, concurrent Delete-Join tasks can cause a Missing-Delete-Anomaly, leaving view records which should not remain. In order to prevent such anomalies from remaining in the materialized join view, Delete-Join tasks use a new mechanism to query old versions. Whenever Delete-Join tasks query other tables involved in the join, instead of reading only the latest versions of the records, the tasks request older versions as well, and include them in their read set. The task uses this new read set in the same way as before, for querying other tables for matching records, and to compute the join view records to delete. The resulting delete set may include versions of join view records which do not exist, since the read sets now include old versions of base records, but extraneous or duplicate view deletes will be ignored or recorded as delete tombstones. By reading old versions, the Delete-Join is able to delete records from the view that would be missed if only the latest versions were read, preventing the Missing-Delete-Anomaly.

5.6.2.5 Trigger View Operations After Index Updates

Asynchronous index updates can cause the Stale-Index-Anomaly, because concurrent reads of the index may be stale. In order to prevent these types of anomalies from happening, the view maintenance tasks can force a partial order of the operations. After a base table is updated and a view maintenance task is triggered, instead of updating all indexes and views asynchronously, the task updates all the indexes of the table first, and then updates the views asynchronously. The view tasks use a synchronization barrier to maintain all indexes first, and then proceed to maintain the views afterwards. The operations in the view task are still not atomic, but the barrier forces a partial order so that all the indexes are updated before any views are maintained for a particular update to a record. This synchronization barrier is not a barrier between multiple maintenance tasks for different records, but only for parallelism within a single view maintenance task. Forcing this partial order is enough to prevent the Stale-Index-Anomaly, without requiring locking or coordination.
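The intra-task barrier can be expressed as a simple two-phase structure, as in the Python sketch below; the thread-pool scheduling and the idx.apply/maintain_views helpers are illustrative assumptions, not how the actual implementation schedules its work.

from concurrent.futures import ThreadPoolExecutor, wait

def process_row_update(update, indexes, maintain_views):
    with ThreadPoolExecutor() as pool:
        # Phase 1: update every index of the base table, possibly in parallel.
        index_futures = [pool.submit(idx.apply, update) for idx in indexes]
        wait(index_futures)          # barrier: all index updates finish first
        # Phase 2: only now trigger the (asynchronous) view maintenance for this update.
        maintain_views(update)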

5.7 SCALAVIEW Algorithm

In this section, I present the SCALAVIEW algorithm and discuss its various properties.

5.7.1 Pseudocode of Algorithm

Here, I describe the pseudocode of the SCALAVIEW algorithm in listings 7, 8, and 9. The pseudocode is broken up into parts, where listings 7 and 8 describe the algorithm for incrementally maintaining views for each update, and listing 9 describes the algorithm the data servers use for responding to queries and row updates. The procedure ProcessRowUpdate in listing 7 is asynchronously triggered for every row insert, delete, or update to maintain the views. Lines 2–4 update the indexes before any of the view maintenance tasks, preventing the Stale-Index-Anomaly. In algorithm 8, lines 11–15 read the read set again, and lines 16–18 compute the difference and apply the compensating deletes. These techniques are necessary to correct the Early-Read-Anomaly. Line 23 shows where the algorithm queries old versions for the Delete-Join task. Querying old versions is necessary to prevent the Missing-Delete-Anomaly. In the data server algorithm 9, line 11 stores delete tombstones if the delete for a view record is processed before the corresponding insert, and line 16 returns the older versions of a record if requested.

5.7.2 Discussion of SCALAVIEW

With the component techniques described in Section 5.6.2, SCALAVIEW can maintain materialized join views in a scalable way for partitioned stores. Since there is no centralized server that has to process a queue of all the updates in the system, and the view maintenance is distributed among the servers of the system, a maintenance bottleneck is not imposed. If the client update rate is too high for the algorithm to keep up, additional servers can be added in order to scale out maintenance throughput, whereas for a centralized algorithm, scaling out is not possible. Also, locking and explicit coordination are not employed, to avoid scalability limitations on the system. Because these constructs are not used, SCALAVIEW places fewer requirements on the underlying partitioned store, so it can be implemented on a wide range of systems.

Algorithm 7 SCALAVIEW Algorithm. Part 1

1: procedure ProcessRowUpdate(∆Tx)
2:     for idx ∈ Tx.indexes do
3:         Update Index idx with ∆Tx
4:     end for
5:     switch ∆Tx do
6:         case Insert[new]:
7:             run ProcessInsertJoin(new)
8:         case Delete[old]:
9:             if useDeleteJoin then
10:                run ProcessDeleteJoin(old)
11:            else
12:                run ProcessDeleteDirect(old)
13:            end if
14:        case Update[old, new]:
15:            run ProcessDeleteJoin(old)
16:            run ProcessInsertJoin(new)
17: end procedure

SCALAVIEW is designed to be applicable to modern scalable database systems, such as distributed key-value stores. Also, it is important to note that one of the intentional goals of SCALAVIEW, as described in Section 5.3, is deferred, or asynchronous, maintenance. Therefore, the techniques and semantics of SCALAVIEW cannot be directly compared with traditional, transactional, synchronous techniques. Compared with the co-located indexes method, SCALAVIEW does add more computing and querying overhead. However, the co-located indexes method only supports equi-joins, whereas the SCALAVIEW algorithm described in Section 5.7.1 can support joins with non-equality conditions or even similarity joins. Some other aspects of the two algorithms are evaluated in Section 5.9. There are some drawbacks to the SCALAVIEW algorithm for view maintenance. Because there is no serialization, total ordering, or coordination, anomalies are possible as described in Section 5.5.4. However, with the techniques described in Section 5.6.2 and the algorithm in Section 5.7.1, the anomalies will either be prevented or corrected.

5.8 Proofs

In this section, I show proofs of how the techniques in SCALAVIEW prevent the anomalies described in Section 5.5.

Algorithm 8 SCALAVIEW Algorithm. Part 2

1: procedure ProcessInsertJoin(∆Tx)
2:     RS1 = ∅
3:     RS2 = ∅
4:     for i ≠ x do
5:         Query Ti for records matching RS1
6:         Receive ∆Ti
7:         RS1 = RS1 ∪ ∆Ti
8:     end for
9:     ∆V ← compute join from RS1
10:    Insert ∆V into View V
11:    for i ≠ x do
12:        Query Ti for records matching RS2
13:        Receive ∆Ti
14:        RS2 = RS2 ∪ ∆Ti
15:    end for
16:    RSdeleted = RS1 − RS2
17:    ∆Vdeleted ← compute join from RSdeleted
18:    Delete ∆Vdeleted from View V
19: end procedure

20: procedure ProcessDeleteJoin(∆Tx)
21:    ∆V ← ∆Tx
22:    for i ≠ x do
23:        Query Versions Ti for records matching ∆V
24:        Receive ∆Ti
25:        ∆V ← ∆Ti ⋊⋉ ∆V
26:    end for
27:    Delete ∆V from View V
28: end procedure

29: procedure ProcessDeleteDirect(∆Tx)
30:    Read View V for records matching ∆Tx
31:    Receive ∆V
32:    Delete ∆V from View V
33: end procedure

Algorithm 9 Algorithm for Data Server
1: storage : internal storage for row data

2: procedure WriteRow(op)
3:     if op.key ∉ storage then
4:         storage.add(op.key, op.row)
5:         return
6:     end if
7:     switch op do
8:         case Insert[new]:
9:             storage.add(op.key, op.row)
10:        case Delete[old]:
11:            storage.deleteOrTombstone(op.key, op.row)
12: end procedure

13: procedure ReadRow(key, oldVersions)
14:    rowData ← storage.get(key)
15:    if oldVersions then
16:        return rowData.oldVersions()
17:    else
18:        return rowData.latestRow()
19:    end if
20: end procedure

Claim 1. Composite primary keys prevent Duplicate-Insert-Anomaly and Duplicate-Delete-Anomaly.

Proof. By definition, primary keys uniquely determine every record in a table. Therefore, any duplicate inserts or duplicate deletes to the identical record can be detected and prevented by using primary keys.

Claim 2. Using delete tombstones prevents Delete-DNE-Anomaly.

Proof. Section 5.5.4.2 shows the schedule in which the Delete-DNE-Anomaly occurs. The anomaly occurs because the Delete-Join task attempts to delete a record which does not exist and will not exist. When using tombstones, a delete tombstone is stored whenever a delete of a non-existent record is attempted. When reading a delete tombstone, the record is not read, so the tombstone behaves as if the record does not exist. The result in the view is equivalent to a serial schedule with no concurrency, where either the Insert-Join task completes first (the view record is deleted successfully), or the Delete-Join task completes first (the task leaves a delete tombstone). Therefore, using delete tombstones prevents the Delete-DNE-Anomaly.

Claim 3. Using delete tombstones prevents Late-Insert-Anomaly.

Proof. Section 5.5.4.2 shows the schedule in which the Late-Insert-Anomaly occurs. This anomaly occurs because the view delete of Delete-Join is out-of-order with the view insert of Insert-Join. The insert is too late and happens after the delete, so the record remains in the view, which is the incorrect result. Delete tombstones solve this out-of-order problem by essentially making the insert and delete commutative. There are only two possible orderings of the insert and delete: either the insert occurs first or the delete occurs first. If the insert occurs first, the following delete operation will simply delete the record, and no anomaly will occur. If the delete occurs first, a delete tombstone will be saved, the following insert will observe the tombstone and effectively be ignored, and no anomaly will occur. Delete tombstones allow the insert and delete of the same view record to be commutative, so the corresponding conflict edge is no longer required in the conflict graph, and there would be no cycle. Therefore, delete tombstones prevent the Late-Insert-Anomaly.

Claim 4. Re-reading the read-set prevents Early-Read-Anomaly.

Proof. Section 5.5.4.4 shows the schedule in which the Early-Read-Anomaly occurs. The conflict cycle exists when the Insert-Join task reads before the Delete-Direct task deletes the record (r1[Sy] → del2[Sy]), and the Delete-Direct task reads the view before the Insert-Join task inserts the view record (r2[Vz] → ins1[Vz]). When re-reading the read-set, the Insert-Join task reads its read-set again after updating the view (reread1[Sy]), and deletes from the view any records determined from the differences in the read-sets. So, all conflicting deletes that occur between r1[Sy] and ins1[Vz] will be reflected in the second read-set with reread1[Sy]. Therefore, the resulting conflict edge is del2[Sy] → reread1[Sy]. The original conflict edge, r1[Sy] → del2[Sy], is no longer relevant because concurrent deletes are captured in the re-read. Therefore, the relevant conflict edges are:

r2[Vz] → ins1[Vz]

del2[Sy] → reread1[Sy]

There is no cycle in this schedule, and by re-reading the read-set, the results are equivalent to a serial schedule where Delete-Direct completes before Insert-Join. Therefore, this technique prevents Early-Read-Anomaly.

Claim 5. Querying old versions prevents Missing-Delete-Anomaly.

Proof. Section 5.5.4.3 shows the conflict cycle for two concurrent Delete-Join tasks. This anomaly occurs because both tasks attempt to read the other record after the record has been deleted by the other task. The conflict edges are:

del1[Rx] → r2[Rx]

del2[Sy] → r1[Sy]

This leads to both tasks not being able to determine the records to delete from the view. When the Delete-Join task requests old versions when issuing maintenance queries, the task is able to retrieve deleted versions of records. By reading older versions, the order in which del1[Rx] and r2[Rx] occur is not important, and likewise for del2[Sy] and r1[Sy]. Those conflict edges are no longer relevant, so that conflict cycle no longer exists. However, the view delete sets for the Delete-Join tasks will be inflated by using older versions, and there could be conflicting updates between view deletes. Those conflicting view deletes will look like duplicate deletes or deletes of non-existent records. However, other techniques already handle these types of anomalies: Claims 1 and 2 show how primary keys and tombstones prevent them. Therefore, querying old versions in Delete-Join tasks prevents the Missing-Delete-Anomaly.

Claim 6. Forcing maintenance of indexes before views prevents Stale-Index-Anomaly.

Proof. Sections 5.5.4.1 and 5.5.4.2 show conflict cycles that result in the Stale-Index-Anomaly. In both of these scenarios, the cause of the anomaly is that asynchronous indexes are queried during the maintenance queries. When both concurrent tasks read the stale indexes, neither task observes the other task's update, so neither is aware of the other task, and neither is able to correctly update the view. There is no equivalent serial schedule where both tasks fail to observe the results of the other. Without loss of generality, we examine the case of concurrent Insert-Join tasks. If we force index maintenance before any view maintenance operations, this imposes a partial ordering on the operations of the Insert-Join tasks. So, the following ordering relationships within each task are enforced:

ins1[R.idxx] → r1[S.idxy]

ins2[S.idxy] → r2[R.idxx]

Looking at the two concurrent reads r1[S.idxy] and r2[R.idxx] of the tasks, there is no imposed ordering between them, but one of the reads will happen before the other. Without loss of generality, assume r1[S.idxy] happens before r2[R.idxx], (r1[S.idxy] → r2[R.idxx]), so by transitivity:

ins1[R.idxx] → r1[S.idxy] → r2[R.idxx]

ins1[R.idxx] → r2[R.idxx]

Therefore, the read r2[R.idxx] will observe the other task's insert, ins1[R.idxx]. Forcing all index maintenance before any view maintenance operations thus guarantees that one of the tasks observes the other, preventing the Stale-Index-Anomaly. Imposing the partial order may cause other anomalies, but they are equivalent to the cases when the indexes are not queried during maintenance. This is because all view maintenance is triggered after the index update, just as all view maintenance is triggered after the base table update when indexes are not used. The overall structure of operations in the two scenarios is identical with respect to the conflicting operations, so any potential anomalies have already been considered and are prevented by the other techniques discussed in Section 5.6.2. Therefore, maintaining indexes before views prevents the Stale-Index-Anomaly.
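A minimal sketch of the enforced ordering is shown below; the trigger handler and store interfaces are hypothetical, and the point is only that the asynchronous index update must complete (and be acknowledged) before any view maintenance query is issued:

# Sketch: force index maintenance before view maintenance. update_index and
# run_view_maintenance are hypothetical stand-ins for the store's trigger
# actions.
def on_base_insert(row, update_index, run_view_maintenance):
    # Step 1: complete the secondary-index update and wait for its
    # acknowledgment ...
    update_index(row)
    # Step 2: ... only then issue the maintenance queries, which may read
    # the (now up-to-date) indexes of the other base tables.
    run_view_maintenance(row)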

5.9 Evaluation

In this section, I evaluate various aspects of the SCALAVIEW approach to view maintenance for join views. I compare SCALAVIEW with a centralized algorithm, as well as the co-located index approach. This section shows the results of the experiments and how well SCALAVIEW achieves the goals for scalable view maintenance. While the asynchronous, incremental, and convergence goals from Section 5.3 are properties of the system and algorithm, I evaluate SCALAVIEW on how well it achieves the other goals. Specifically, I performed experiments to evaluate how well SCALAVIEW reduces view staleness, reduces data amplification, supports more general joins, and scales out with more servers. The rest of this section describes some of the implementation details and presents the experimental results.

5.9.1 Implementation

I implemented SCALAVIEW on a distributed key-value store developed in the AMPLab at UC Berkeley. The underlying key-value store is an in-memory, horizontally partitioned store with support for inner-join queries and asynchronous triggers. While SCALAVIEW was implemented on the distributed system from the AMPLab, it is possible to implement it on other underlying systems that satisfy the requirements discussed in Section 5.6.1. In order to evaluate SCALAVIEW, I compared it with a centralized algorithm and the co-located indexes method. Both of these other methods were implemented in the same system, using the same asynchronous trigger mechanism, so the underlying data store is the same for all three methods.

5.9.1.1 Centralized Algorithm

In order to run a centralized algorithm for comparison, I implemented the ideas of PVM [84]. This algorithm was chosen because it uses similar compensation techniques and also has a notion of parallelism by using multiple threads. However, PVM requires stronger system assumptions, since it requires FIFO and sequential assumptions in how queries are handled.

CREATE TABLE R(rid INT, rdata STRING, PRIMARY KEY(rid))
CREATE TABLE S(rid INT, tid INT, sdata STRING, PRIMARY KEY(rid, tid))
CREATE TABLE T(tid INT, tdata STRING, PRIMARY KEY(tid))
CREATE VIEW V AS SELECT * FROM R, S, T WHERE R.rid = S.rid AND S.tid = T.tid

Figure 5.2: Simple three table join for the micro-benchmark

Also, I experimented with several levels of parallelism for my implementation of PVM and determined that using 2048 threads gave good results on the hardware used. Therefore, all results for the centralized algorithm use my implementation of PVM with 2048 threads.

5.9.1.2 Co-Located Indexes

I also implemented a co-located indexes technique within the distributed key-value store developed in the AMPLab at UC Berkeley. This algorithm is similar to the techniques of PNUTS [3] from Yahoo, or Lynx [85] from Microsoft Research. For each join key, the relevant tables are partitioned on that join key to create co-located indexes, so that local joins can be performed. Then, after the local join is materialized, additional partitioning may be required if there are other join keys.
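A rough sketch of the co-located index idea follows; the partition count, hash-based placement, and in-memory lists are simplifications introduced only for illustration:

# Sketch of co-located indexes: rows of R and S are both partitioned on the
# join key rid, so the partial join of R and S can be computed locally on
# each partition. Placement and storage are simplified for illustration.
NUM_PARTITIONS = 4
partitions = [{"R": [], "S": []} for _ in range(NUM_PARTITIONS)]

def partition_for(join_key):
    return hash(join_key) % NUM_PARTITIONS

def insert_r(row):
    partitions[partition_for(row["rid"])]["R"].append(row)

def insert_s(row):
    partitions[partition_for(row["rid"])]["S"].append(row)

def local_join(p):
    # Each partition joins only its own R and S rows; a further
    # re-partitioning on tid would be needed to bring in table T.
    return [{**r, **s} for r in p["R"] for s in p["S"] if r["rid"] == s["rid"]]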

5.9.1.3 Experiments

For the experiments, I use a simple synthetic micro-benchmark consisting of 3 base tables. Most of the experiments use a simple equi-join view defined on the base tables; the schemas are shown in Figure 5.2. I use this simple 3-way join to demonstrate the characteristics of SCALAVIEW. In the experiments, many clients issue insert, update, and delete operations to random rows in the database. Clients do not issue any read queries, because evaluating maintenance of join views is the primary objective, and reads do not affect maintenance of views. Throughout the experimental runs, various metrics are measured, such as throughput and staleness of view updates. All experiments run on Amazon EC2 in the US West (Oregon) region, and all the servers used cr1.8xlarge instances. The cr1.8xlarge instances are memory-optimized instances with 32 virtual CPUs and 244 GB of memory. All servers are running in the same availability zone, unless otherwise noted.
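As an example of how view staleness can be measured in such an experiment, each base write can record a wall-clock timestamp and the maintenance path can report the elapsed time when the corresponding view record becomes visible. The sketch below is illustrative only and is not the exact harness used; the storage structures and the callback are assumptions:

# Illustrative sketch of measuring view staleness: the delay between a base
# write and the moment the derived view record is installed. The structures
# and the on_view_install callback are hypothetical.
import time

base_write_times = {}   # base key -> wall-clock time of the write
staleness_samples = []  # observed delays, in milliseconds

def client_write(base_table, key, value):
    base_write_times[key] = time.time()
    base_table[key] = value

def on_view_install(base_key):
    # Called by the asynchronous maintenance task once the view record
    # derived from base_key is visible.
    delay_ms = (time.time() - base_write_times[base_key]) * 1000.0
    staleness_samples.append(delay_ms)

# A CDF like Figure 5.3 is then the sorted samples plotted against their
# empirical cumulative fraction.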

Figure 5.3: View staleness CDF (cumulative probability vs. view staleness [ms]; series: SCALAVIEW, co-located, centralized)

5.9.2 View Staleness

I first evaluate how well the algorithms satisfy the goal of reducing view staleness. SCALAVIEW and the other techniques in the comparison are all asynchronous techniques. This means the view update does not execute transactionally with the base update, but is applied at a later time. Therefore, there will always be a delay between the base update and when that update is reflected in the join view. I measure this view staleness for the different view maintenance algorithms. In the experiment, the partitioned data store was configured to use 4 data servers (partitions), and the clients issued write operations at an aggregate rate of 5,000 operations/second.

Figure 5.3 shows the cumulative distribution functions of the view staleness of the different algorithms. The co-located indexes method has the least view staleness, followed by the SCALAVIEW algorithm, and then the centralized algorithm. The average delay from the base update to the view for the co-located indexes is 26.67 ms, and the average delay for SCALAVIEW is 32.38 ms. SCALAVIEW has slightly more delay than the co-located method because it has to issue more queries, since it is a more general technique. However, the view staleness for SCALAVIEW is still reasonable, with most of the delays being less than 100 ms.

Figure 5.3 also shows how a centralized algorithm will not scale. The centralized algorithm exhibits very poor view staleness, because the algorithm is not able to keep up with the incoming update load. My implementation of a centralized algorithm was able to handle a few hundred operations/second, which is not fast enough to handle the 5,000 operations/second from the clients. Because the centralized algorithm needs to access a global queue of updates and other global data structures, this creates thread contention and causes a bottleneck. Additional optimizations could remove some of the contention, but this potential bottleneck is present for all centralized algorithms. At some point, a centralized algorithm will hit a throughput bottleneck and be unable to scale out to handle higher update rates, and this experiment demonstrates this limitation. The results of this experiment show that the SCALAVIEW and co-located index techniques perform well in achieving the goal of reducing view staleness.

Figure 5.4: View staleness CDF across three availability zones (cumulative probability vs. view staleness [ms]; series: SCALAVIEW, co-located)

5.9.2.1 Separate Availability Zones

While the previous experiment places every server in a single availability zone in Amazon EC2, Figure 5.4 shows the same experiment with the servers spread over three availability zones. Since the servers are no longer in the same availability zone, there is higher network latency and variance. In this scenario, the co-located technique actually exhibits higher view staleness (87.54 ms) than SCALAVIEW (61.47 ms). The co-located approach must first co-locate tables R and S on join key rid in order to locally materialize R ⋈ S. Then, another index for R ⋈ S must be re-partitioned on join key tid to locally materialize R ⋈ S ⋈ T. This experiment shows that multiple levels of co-located indexes and re-partitioning are more sensitive to the network latencies, and that SCALAVIEW is comparable in view staleness and can even exhibit lower delays.

5.9.3 Data Size Amplification

Next, I evaluate how well SCALAVIEW satisfies the goal of reducing data size amplification. Different maintenance techniques require different amounts of extra data to be duplicated or kept in the system. The data for the base tables must be retained in the database, but any extra data required by the algorithms counts as data amplification. For SCALAVIEW, the retained old versions are the extraneous data. For the co-located technique, the co-located indexes are the extraneous data.

Figure 5.5: Data size amplification as a percentage of the base table data (extra data [% of base table data] vs. experiment time [s]; series: SCALAVIEW, co-located)

CREATE VIEW V AS SELECT * FROM R, S, T WHERE R.rid = S.rid AND S.tid >= T.tid - c AND S.tid <= T.tid + c

Figure 5.6: Band join query for micro-benchmark

In this experiment, I measure the size of the base table data and the extraneous data in the system. Figure 5.5 shows the data size amplification for the co-located and the SCALAVIEW techniques. The figure shows the amount of extra data as a percentage of the base table data, and SCALAVIEW requires much less extra data in order to operate. Throughout the experimental run, the co-located technique has on average 283% data size amplification, while SCALAVIEW only uses about 4.7% on average. This experiment shows that SCALAVIEW requires far less extraneous data in order to maintain materialized join views.
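For clarity, the reported amplification is simply the extra (non-base) bytes expressed as a percentage of the base-table bytes; a one-line version of the computation, with made-up byte counts chosen only to mirror the reported SCALAVIEW average, is:

# Data size amplification as a percentage of the base table data.
# The byte counts below are made-up values for illustration only.
base_table_bytes = 10_000_000   # data that must be stored regardless
extra_bytes = 470_000           # e.g., retained old versions (SCALAVIEW)
amplification_pct = 100.0 * extra_bytes / base_table_bytes
print(f"data amplification: {amplification_pct:.1f}%")   # -> 4.7%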

5.9.4 Band Joins

Another goal for the SCALAVIEW algorithm is to be general enough to support more than equi-joins. While the co-located indexes only support equi-join views, SCALAVIEW can support non-equality join conditions. One example of a non-equi-join is a band join [34], where a join condition looks like S.y − c1 ≤ R.x ≤ S.y + c2. In this experiment, the view staleness is measured for band joins for different values of c, with the modified view definition in Figure 5.6. Figure 5.7 shows the cumulative distribution functions of the view staleness for different values of the band join parameter, c.

Figure 5.7: View staleness CDF for band joins (cumulative probability vs. view staleness [ms]; series: c=0 (equijoin), c=1, c=2, c=3)

When c = 0, the band join is equivalent to an equi-join. As the parameter c increases, the staleness of the view also increases, as shown in the figure. The average staleness for the equi-join is 32.38 ms, while the average staleness for the band join with c = 3 is 101.84 ms. Larger band joins are expected to exhibit more staleness, because the range join condition makes the incremental maintenance queries larger. More rows are read and transferred for the larger band joins, so the additional processing causes more staleness. However, the staleness is reasonable and expected: for c = 3, the maintenance queries may read up to 7 times more data, and the staleness is about 3 times worse. This experiment demonstrates that SCALAVIEW is general enough for different join conditions, and can still perform well when non-equi-join views are used.
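To illustrate why the band parameter widens the maintenance reads, the sketch below builds the range lookup a maintenance task would perform against table T after an insert into S for the view in Figure 5.6; the in-memory table and helper function are hypothetical:

# Sketch: the maintenance read for the band join of Figure 5.6 after an
# insert into S. For band half-width c, the task must read all T rows with
# tid in [s.tid - c, s.tid + c], so the read grows with c.
def band_join_maintenance_rows(s_row, t_table, c):
    lo, hi = s_row["tid"] - c, s_row["tid"] + c
    return [t for t in t_table if lo <= t["tid"] <= hi]

t_table = [{"tid": i, "tdata": f"t{i}"} for i in range(10)]
s_row = {"rid": 1, "tid": 5, "sdata": "s"}
assert len(band_join_maintenance_rows(s_row, t_table, 0)) == 1   # equi-join
assert len(band_join_maintenance_rows(s_row, t_table, 3)) == 7   # up to 7x more rows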

5.9.5 Scalability

One of the main objectives for SCALAVIEW is to maintain materialized views in a scalable way, to avoid any bottlenecks in a scalable system. I performed an experiment to determine how the SCALAVIEW algorithm scales out with more servers and clients. I executed the equi-join view experiment several times with a varying number of servers and client aggregate rates. For 2 servers, the client aggregate rate was set at 5,000 write operations/second, and as the number of servers increased, the client rate and the amount of data partitioned to each server also increased proportionally. Therefore, at 16 servers, the client rate is 40,000 operations/second. All the scaling experiments were performed in a multiple availability zone environment, because all the instances could not be reserved in a single availability zone.

Figure 5.8 shows the cumulative distribution functions of the view staleness for the different scaling factors. The graph shows that the view staleness measurements for the different numbers of servers are generally similar to each other.

Figure 5.8: View staleness CDF scalability (cumulative probability vs. view staleness [ms]; series: 2, 4, 6, 8, 10, 12, 14, and 16 servers)

Figure 5.9: Throughput scalability (throughput [ops/s] vs. number of servers; series: perfect scalability, experimental throughput)

CREATE TABLE R(rid INT, sid INT, rdata STRING, PRIMARY KEY(rid, sid))
CREATE TABLE S(sid INT, sdata STRING, PRIMARY KEY(sid))
CREATE VIEW V AS SELECT * FROM R, S WHERE R.sid = S.sid

Figure 5.10: Simple two table linear join query (Linear-2)

However, the staleness does slowly increase as the scaling factor increases. When there are only 2 servers, the average staleness was 36.13 ms, but when run with 16 servers, the average staleness in the view was 60.66 ms. This is explained by the extra network messages, since some maintenance queries may have to access more servers. Because some of the queries need to access more servers, those queries experience greater response times, especially when the servers are distributed across multiple availability zones. This causes a longer tail in response times, and thus a longer tail in the view staleness as well. However, this experiment shows that the algorithm is still quite scalable, and SCALAVIEW could scale even better if network-latency-aware optimizations and join techniques were implemented to mitigate the longer tails.

Figure 5.9 shows the throughput of the view maintenance operations for the different scaling factors. Here, the measured throughput is plotted along with the theoretical throughput of perfect scalability. The experimental throughput nearly matches the perfect linear scalability throughput as the servers vary from 2 to 16. When the experiment was run with 16 servers, SCALAVIEW reached about 96.2% of the throughput of perfect linear scalability. Therefore, this experiment demonstrates that SCALAVIEW is able to scale out well with more machines and more client load, and it satisfies the scalability objective from Section 5.3.
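The scaling efficiency quoted above is the measured aggregate throughput divided by the throughput that perfect linear scaling from the 2-server baseline would give; a short illustration, with a hypothetical measured value chosen to mirror the reported figure, is:

# Scaling efficiency relative to perfect linear scaling from the 2-server
# baseline of 5,000 ops/s. The measured value below is hypothetical.
baseline_servers, baseline_ops = 2, 5_000
servers = 16
ideal_ops = baseline_ops * servers / baseline_servers   # 40,000 ops/s
measured_ops = 38_480                                   # hypothetical measurement
efficiency = 100.0 * measured_ops / ideal_ops
print(f"{efficiency:.1f}% of perfect linear scalability")   # ~96.2%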

5.9.6 Join Types

I ran another experiment to observe how SCALAVIEW handles different types of joins. Here, additional simple join types were used in the micro-benchmark to observe how they affect the staleness of the views. Figure 5.10 shows a simple linear join of two tables (Linear-2), and Figure 5.11 shows a simple linear join of three tables (Linear-3). The previously described join in Figure 5.2 is another type of three table join: a star schema join with a single fact table that references two dimension tables (Star-3). Figure 5.12 is a star schema join of four tables (Star-4). The micro-benchmark was run for each of the four join types, and the staleness was measured. Figure 5.13 shows the CDF plots of the view staleness for the different join types. In general, the results are encouraging, since the staleness is fairly small for all the join types. Linear-2 exhibits the least staleness, which is explained by the fact that there are only two tables in the join.

CREATE TABLE R(rid INT, sid INT, rdata STRING, PRIMARY KEY(rid, sid))
CREATE TABLE S(sid INT, tid INT, sdata STRING, PRIMARY KEY(sid, tid))
CREATE TABLE T(tid INT, tdata STRING, PRIMARY KEY(tid))
CREATE VIEW V AS SELECT * FROM R, S, T WHERE R.sid = S.sid AND S.tid = T.tid

Figure 5.11: Simple three table linear join query (Linear-3)

CREATE TABLE R(rid INT, rdata STRING, PRIMARY KEY(rid))
CREATE TABLE S(sid INT, sdata STRING, PRIMARY KEY(sid))
CREATE TABLE T(tid INT, tdata STRING, PRIMARY KEY(tid))
CREATE TABLE U(rid INT, sid INT, tid INT, udata STRING, PRIMARY KEY(rid, sid, tid))
CREATE VIEW V AS SELECT * FROM R, S, T, U WHERE U.rid = R.rid AND U.sid = S.sid AND U.tid = T.tid

Figure 5.12: Simple four table star join query (Star-4)

Star-3 and Star-4 both perform quite similarly to each other. This is expected, since the query plans for those joins are similar. If a fact table is updated, all the dimension tables can be queried in parallel. If a dimension table is updated, then the fact table must be queried before accessing the other dimension tables. Linear-3 showed the most staleness in the views. This was also expected, because many of the incremental update plans had to employ a sequential chain of table queries. Also, secondary indexes were not created for the tables, so some queries to tables R and S took longer to complete.
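The difference between the star and linear plans can be sketched as follows: for a star join, an update to the fact table carries every join key, so the dimension lookups can run in parallel, while the linear join forces a sequential chain of lookups. The lookup helper and row shapes below are hypothetical illustrations of these two plan shapes, not the system's actual planner:

# Sketch: maintenance query plans for star vs. linear joins. lookup() is a
# hypothetical point read against a base table.
from concurrent.futures import ThreadPoolExecutor

def maintain_star(fact_row, lookup):
    # Star-3 (Figure 5.2): the fact row S carries both join keys, so the
    # dimension lookups into R and T can be issued in parallel.
    with ThreadPoolExecutor() as pool:
        r = pool.submit(lookup, "R", fact_row["rid"])
        t = pool.submit(lookup, "T", fact_row["tid"])
        return {**fact_row, **r.result(), **t.result()}

def maintain_linear(r_row, lookup):
    # Linear-3 (Figure 5.11): R only knows sid, so S must be read before T
    # can be read, forming the sequential chain that adds staleness.
    s_row = lookup("S", r_row["sid"])
    t_row = lookup("T", s_row["tid"])
    return {**r_row, **s_row, **t_row}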

5.10 Related Work

There has been significant previous work in the area of efficiently updating materialized views in databases. Early work in incremental view maintenance [23, 17, 68, 41] investigated new algorithms and queries to update materialized views with smaller updates, instead of recomputing the entire view. These techniques assume single-server database systems, and additional queries are triggered to run within the base table transactions in order to update materialized views correctly. RAMP transactions [12] allow cross-partition atomicity for write sets, but for maintaining join views, the read and write sets could grow large. In contrast, my work maintains views separately from the base transactions and focuses on large distributed systems.

Figure 5.13: View staleness CDF for different join types (cumulative probability vs. view staleness [ms]; series: Linear-2, Linear-3, Star-3, Star-4)

While many of the incremental view maintenance techniques update views synchronously with the base transactions, other work has addressed asynchronous, or deferred, view maintenance. Segev [76, 75] studied deferred maintenance of views, but only investigated select-project views, not joins. 2VNL [69] is a data warehouse technique for updating views without locking. ECA [88] is a data warehouse technique that updates views from a remote database source. Because of the deferred nature of the updates, it is possible to update views incorrectly and leave anomalies, but ECA solves the problem by using compensation queries to adjust the results of the incremental queries. Single-server, asynchronous view maintenance techniques [28, 86, 71] use intermediate auxiliary tables in order to incrementally update views. These techniques accept some staleness in the views in exchange for faster base transactions and potentially better system utilization. They are also single-source or single-server algorithms that depend on global state and ordering in the database system. SCALAVIEW does not depend on having access to such global state or ordering.

In addition to deferred maintenance, other related work has investigated updating materialized views from many distributed sources. Most of this work has focused on view maintenance in the context of data warehouses. In this setting, distributed data sources are typically separate from the data warehouse, and the views in the data warehouse are updated incrementally as updates from the sources arrive. STROBE [87] and SWEEP [2] both probe the sources for maintenance queries and also issue compensation queries to prevent anomalies. However, both STROBE and SWEEP are single-threaded, sequential algorithms. POSSE [61] and PVM [84] use multiple threads at the data warehouse to process updates in parallel, and DyDa [26] also handles schema changes. TxnWrap [25] maintains versions of source data and uses a multi-version concurrency control scheme at the data warehouse to provide consistency. In these data warehouse techniques, update ordering is typically established at the data warehouse before any parallelism is exploited. In contrast, SCALAVIEW does not depend on a global sequence of updates.

Asynchronous and distributed view maintenance [3, 85] has been investigated for equi-joins in modern scalable partitioned stores. While previous techniques have focused on incremental view maintenance with a single or a few view updaters, distributed view maintenance addresses the challenges of scaling out to many view updaters distributed over many machines. The main techniques for distributed view maintenance are related to asynchronous secondary index maintenance and co-location of join-key indexes. SCALAVIEW is able to support more general types of joins, and does not require intermediate indexes or results.

5.11 Conclusion

Scalable partitioned stores have become popular for their scalability properties. These distributed systems can easily scale out by adding more servers to handle larger data sizes and greater query load. However, join queries can limit the scalability and performance of these scalable partitioned stores. Materialized views are commonly used in traditional database systems to avoid running the same types of queries and joins repeatedly. However, in distributed partitioned stores, materialized views are not natively supported. Manual materialized views and view maintenance are always possible, but the results may not always be correct. I proposed SCALAVIEW, a new scalable view maintenance algorithm for join views, which is asynchronous, incremental, and distributed. This algorithm is used to maintain materialized views natively in a scalable partitioned store. SCALAVIEW does not require a serializer or a centralized server to process all the updates in the system, and therefore does not impose scalability bottlenecks. SCALAVIEW works asynchronously, so global distributed transactions or locking are not required and not used. I showed how, when views are maintained manually by applications, anomalies may arise in the views; I described how SCALAVIEW prevents and corrects these anomalies to achieve convergence consistency. I evaluated the performance, scalability, and overhead of SCALAVIEW, and compared it with other distributed and centralized techniques. The experiments show SCALAVIEW can scale out well with more machines and load, and provides view staleness comparable to a simpler, but limited and more data-amplifying, distributed technique of co-located indexes. The properties of SCALAVIEW make it very applicable for maintaining views in modern scalable partitioned stores.

Chapter 6

Conclusion

This chapter reviews the main contributions of this thesis. It also discusses possible areas of future work to further scalable transactions in distributed database systems.

6.1 Contributions

This thesis first described a new transaction commit protocol, MDCC, for distributed database systems. MDCC, or Multi-Data Center Consistency, is a new commit protocol that provides scalable ACID transactions in distributed database systems. While MDCC is applicable in a large range of deployment environments, it is particularly beneficial in a wide-area network setting, because it is optimized to reduce commit response times by reducing the number of message rounds. MDCC provides transaction durability by synchronously replicating to multiple data centers, so the data will still be available even when an entire data center fails. MDCC is able to achieve its goal of faster commits by taking advantage of two observations about workloads: transaction conflicts are rare, or transaction conflicts commute with each other. By exploiting these two properties, and adapting the generalized Paxos protocol for transactions, MDCC supports distributed transactions for scalable database systems.

Next, I introduced a new transaction programming model, PLANET, for distributed transactions. When developers have to deal with distributed transactions, the underlying environment is different from a local transaction environment. Instead of using a traditional, single-server database system, modern applications tend to use distributed database systems. With distributed systems, the environment can be less predictable and less reliable than a single-site system. Also, communication and coordination costs in distributed systems lead to slower response times. PLANET, or Predictive Latency-Aware NEtworked Transactions, is a new transaction programming model that aims to alleviate some of the difficulty in interacting with distributed transactions. PLANET provides staged feedback about transactions, and exposes greater visibility of transaction state so applications and developers can better adapt to unpredictable environments. This can improve the experience with user-facing applications during unexpected periods of high latency in the deployment. PLANET

is also, to the best of my knowledge, the first transaction programming model to implement the guesses and apologies paradigm as suggested by Helland and Campbell [43]. PLANET enables guesses by means of commit likelihoods and speculative commits, and this thesis presents a commit likelihood model for Paxos-based geo-replicated protocols such as MDCC. I also showed how admission control can use commit likelihoods in order to use system resources more effectively. With PLANET, developers can build flexible applications to cope with unexpected sources of latency without sacrificing the end-user experience.

Finally, in this thesis, I proposed SCALAVIEW, a new algorithm for scalable, distributed view maintenance of join views. SCALAVIEW is a distributed view maintenance algorithm that is scalable because it does not depend on synchronous coordination or a globally centralized service. Materialized views can speed up queries, especially for scalable systems such as key-value stores that do not have support for queries such as joins. Storing precomputed results can greatly benefit queries by avoiding repeated computation. I described how traditional approaches for view maintenance can restrict scalability because of the centralized nature of the algorithms. I also discussed how and why naïve, distributed algorithms can result in anomalies in the data. If anomalies remain in the data, then the view will never converge to the correct answer. SCALAVIEW is a distributed algorithm that achieves convergent consistency for materialized join views in scalable data stores. This thesis showed how the various techniques of SCALAVIEW can prevent and correct potential anomalies that may arise in the distributed maintenance of views.

As a whole, this thesis described several new algorithms and techniques for scalable transactions in distributed database systems. However, there are certain limitations and opportunities for advancement to the approaches introduced in this thesis, and the next section discusses those points.

6.2 Future Work

This section discusses various areas for extension to the techniques for scalable transactions described in this thesis.

6.2.1 Stronger Levels of Consistency for MDCC

Exploring stronger levels of consistency is a natural extension to MDCC. MDCC has a default isolation level of read-committed without lost updates, and that is stronger than the default level of several popular single-server database systems. However, it may be possible to extend MDCC to support stronger levels such as Snapshot Isolation, and even Serializability. Providing stronger consistency will require additional coordination, and studying the various trade-offs between stronger semantics and scalability is important.

6.2.2 Dynamic Quorums

MDCC is a transaction commit protocol based on Generalized Paxos, a quorum protocol. In this thesis, MDCC assumed a relatively uniform and static deployment environment and workload. However, further investigation is possible into the quorum membership and quorum sizes used in the protocol. For example, by allowing quorum properties to change along with the variable nature of the distributed environment or the user workload, the database system could adjust to the different conditions to provide an optimal configuration for faster distributed transactions. Determining how quorums could adapt while still providing correctness is a great opportunity for future work.

6.2.3 Extensions to PLANET

PLANET provides a flexible transaction programming model for developers to cope with unexpected latency issues with distributed transactions. However, there are opportunities to discover other paradigms for helping developers deal with distributed transactions. While PLANET is flexible for many different use cases, it is possible that there are other transaction stages useful for application development. Also, exposing additional system statistics may be beneficial to the developer, by enabling more adaptive control over the transaction behavior.

6.2.4 Stronger Levels of Consistency for SCALAVIEW

There are opportunities for further study of stronger levels of consistency in SCALAVIEW as well. SCALAVIEW is able to achieve convergence consistency, but stronger levels should be investigated for scalable view maintenance. In particular, global snapshots of consistent views could be very beneficial to applications and developers, and are an opportunity for further study.

6.2.5 Integration with Data Analytics Frameworks

Scalable transactions are important for modifying data in distributed database systems, but data analytics is necessary for extracting value out of the data. Therefore, integration with data analytics engines is a crucial next step to investigate. Supporting materialized views is one good way of enabling analytics on transactional data, but further studies are necessary. By bringing the analytics closer to the source of transactional data, the latency can be reduced, thus allowing faster insights and decisions. There are many challenges to investigate to bridge the gap between OLTP and OLAP systems.

6.3 Conclusion

With the advent of large-scale applications and the emergence of cloud computing, demands on data management systems are greater than ever. Existing techniques for executing and interacting with transactions work well for the traditional model of single-server database systems, but need to be revisited for distributed and scalable database systems. New techniques are required to achieve performant and scalable transactions for the modern cloud environment. My thesis proposed new scalable algorithms and methods for distributed transactions in scalable database systems. Therefore, distributed transactions can be supported in distributed database systems without having to abandon scalability.

Bibliography

[1] Michael Abd-El-Malek et al. “Fault-Scalable Byzantine Fault-Tolerant Services”. In: Proc. of SOSP. Brighton, United Kingdom, 2005.
[2] D. Agrawal et al. “Efficient View Maintenance at Data Warehouses”. In: Proc. of SIGMOD. SIGMOD ’97. Tucson, Arizona, USA: ACM, 1997, pp. 417–427. isbn: 0-89791-911-4.
[3] Parag Agrawal et al. “Asynchronous View Maintenance for VLSD Databases”. In: Proc. of SIGMOD. SIGMOD ’09. Providence, Rhode Island, USA: ACM, 2009, pp. 179–192. isbn: 978-1-60558-551-2.
[4] Amazon EC2 Outage, April 2011. http://aws.amazon.com/message/65648/.
[5] Amazon RDS Multi-AZ Deployments. http://aws.amazon.com/rds/mysql/#Multi-AZ.
[6] Apache Cassandra. http://cassandra.apache.org.
[7] Apache HBase. http://hbase.apache.org.
[8] Michael Armbrust et al. “PIQL: Success-Tolerant Query Processing in the Cloud”. In: PVLDB 5.3 (2011), pp. 181–192.
[9] Michael Armbrust et al. “A View of Cloud Computing”. In: Commun. ACM 53.4 (Apr. 2010), pp. 50–58. issn: 0001-0782.
[10] Michael Armbrust et al. “Generalized Scale Independence Through Incremental Precomputation”. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. SIGMOD ’13. New York, New York, USA: ACM, 2013, pp. 625–636. isbn: 978-1-4503-2037-5.
[11] Peter Bailis et al. “Probabilistically Bounded Staleness for Practical Partial Quorums”. In: Proc. VLDB Endow. 5.8 (2012), pp. 776–787. issn: 2150-8097.
[12] Peter Bailis et al. “Scalable Atomic Visibility with RAMP Transactions”. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. SIGMOD ’14. Snowbird, Utah, USA: ACM, 2014, pp. 27–38. isbn: 978-1-4503-2376-5.
[13] Jason Baker et al. “Megastore: Providing Scalable, Highly Available Storage for Interactive Services”. In: CIDR. 2011.
[14] Daniel Barbará and Hector Garcia-Molina. “The Demarcation Protocol: A Technique for Maintaining Constraints in Distributed Database Systems”. In: VLDB J. 3.3 (1994), pp. 325–353.
[15] Hal Berenson et al. “A Critique of ANSI SQL Isolation Levels”. In: Proc. of SIGMOD. San Jose, California, United States, 1995. isbn: 0-89791-731-6.
[16] Philip A. Bernstein, Vassos Hadzilacos, and Nathan Goodman. Concurrency Control and Recovery in Database Systems. Addison-Wesley, 1987. isbn: 0-201-10715-5.
[17] Jose A. Blakeley, Per-Ake Larson, and Frank Wm Tompa. “Efficiently Updating Materialized Views”. In: Proc. of SIGMOD. SIGMOD ’86. Washington, D.C., USA: ACM, 1986, pp. 61–71. isbn: 0-89791-191-1.
[18] Peter Bodik et al. “Characterizing, Modeling, and Generating Workload Spikes for Stateful Services”. In: Proc. of SoCC. SoCC ’10. Indianapolis, Indiana, USA: ACM, 2010, pp. 241–252. isbn: 978-1-4503-0036-0.
[19] Matthias Brantner et al. “Building a database on S3”. In: Proc. of SIGMOD. Vancouver, Canada, 2008. isbn: 978-1-60558-102-6.
[20] Nathan Bronson et al. “TAO: Facebook’s Distributed Data Store for the Social Graph”. In: Proceedings of the 2013 USENIX Conference on Annual Technical Conference. USENIX ATC’13. San Jose, CA: USENIX Association, 2013, pp. 49–60.
[21] Christian Cachin et al. “Secure and Efficient Asynchronous Broadcast Protocols”. In: Advances in Cryptology-Crypto 2001. Springer. 2001, pp. 524–541.
[22] Michael J. Carey, Sanjay Krishnamurthi, and Miron Livny. “Load Control for Locking: The ’Half-and-Half’ Approach”. In: PODS. 1990, pp. 72–84.
[23] S. Ceri and J. Widom. “Deriving Production Rules for Incremental View Maintenance”. In: Proc. of VLDB. 1991.
[24] Fay Chang et al. “Bigtable: A Distributed Storage System for Structured Data”. In: ACM Trans. Comput. Syst. 26.2 (June 2008), 4:1–4:26. issn: 0734-2071.
[25] Songting Chen, Bin Liu, and Elke A. Rundensteiner. “Multiversion-based View Maintenance over Distributed Data Sources”. In: ACM Trans. Database Syst. 29.4 (Dec. 2004), pp. 675–709. issn: 0362-5915.
[26] Songting Chen, Xin Zhang, and Elke A Rundensteiner. “A compensation-based approach for view maintenance in distributed environments”. In: Knowledge and Data Engineering, IEEE Transactions on 18.8 (2006), pp. 1068–1081.
[27] E. F. Codd. “A Relational Model of Data for Large Shared Data Banks”. In: Commun. ACM 13.6 (June 1970), pp. 377–387. issn: 0001-0782.
[28] Latha S. Colby et al. “Algorithms for Deferred View Maintenance”. In: Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data. SIGMOD ’96. Montreal, Quebec, Canada: ACM, 1996, pp. 469–480. isbn: 0-89791-794-4.
[29] Brian F. Cooper et al. “Benchmarking Cloud Serving Systems with YCSB”. In: Proc. of SoCC. SoCC ’10. Indianapolis, Indiana, USA: ACM, 2010, pp. 143–154. isbn: 978-1-4503-0036-0.
[30] Brian F. Cooper et al. “PNUTS: Yahoo!’s Hosted Data Serving Platform”. In: Proc. VLDB Endow. 1 (2 2008), pp. 1277–1288. issn: 2150-8097.
[31] James C. Corbett et al. “Spanner: Google’s Globally-Distributed Database”. In: 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12). Hollywood, CA: USENIX Association, Oct. 2012, pp. 261–264. isbn: 978-1-931971-96-6.
[32] Carlo Curino et al. “Workload-Aware Database Monitoring and Consolidation”. In: Proc. of SIGMOD. SIGMOD ’11. Athens, Greece: ACM, 2011, pp. 313–324. isbn: 978-1-4503-0661-4.
[33] Giuseppe DeCandia et al. “Dynamo: Amazon’s Highly Available Key-Value Store”. In: Proc. of SOSP. Stevenson, Washington, USA, 2007. isbn: 978-1-59593-591-5.
[34] David J. DeWitt, Jeffrey F. Naughton, and Donovan A. Schneider. “An Evaluation of Non-Equijoin Algorithms”. In: Proceedings of the 17th International Conference on Very Large Data Bases. VLDB ’91. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1991, pp. 443–452. isbn: 1-55860-150-3.
[35] Dan Dobre et al. “HP: Hybrid paxos for WANs”. In: Dependable Computing Conference. IEEE. 2010, pp. 117–126.
[36] Jennie Duggan et al. “Performance Prediction for Concurrent Database Workloads”. In: Proc. of SIGMOD. SIGMOD ’11. Athens, Greece: ACM, 2011, pp. 337–348. isbn: 978-1-4503-0661-4.
[37] Sameh Elnikety et al. “A method for Transparent Admission Control and Request Scheduling in E-Commerce Web Sites”. In: WWW. 2004, pp. 276–286.
[38] Google AppEngine High Replication Datastore. http://googleappengine.blogspot.com/2011/01/announcing-high-replication-datastore.html.
[39] Jim Gray and Leslie Lamport. “Consensus on Transaction Commit”. In: TODS 31 (1 2006), pp. 133–160. issn: 0362-5915.
[40] Ashish Gupta and Inderpal Singh Mumick. “Maintenance of Materialized Views: Problems, Techniques, and Applications”. In: IEEE Data Eng. Bull. 18.2 (1995), pp. 3–18.
[41] Ashish Gupta, Inderpal Singh Mumick, and V. S. Subrahmanian. “Maintaining Views Incrementally”. In: SIGMOD Rec. 22.2 (June 1993), pp. 157–166. issn: 0163-5808.
[42] Hans-Ulrich Heiss and Roger Wagner. “Adaptive Load Control in Transaction Processing Systems”. In: VLDB. 1991, pp. 47–54.
[43] Pat Helland and David Campbell. “Building on Quicksand”. In: CIDR. 2009.
[44] Hibernate. http://www.hibernate.org/.
[45] How Big Is Facebook’s Data? http://techcrunch.com/2012/08/22/how-big-is-facebooks-data-2-5-billion-pieces-of-content-and-500-terabytes-ingested-every-day/.
[46] Hiranya Jayathilaka. MDCC - Strong Consistency with Performance. http://techfeast-hiranya.blogspot.com/2013/04/mdcc-strong-consistency-with-performance.html. 2013.
[47] Bettina Kemme et al. “Processing Transactions over Optimistic Atomic Broadcast Protocols”. In: Proc. of ICDCS. IEEE. 1999, pp. 424–431.
[48] Donald Kossmann, Tim Kraska, and Simon Loesing. “An Evaluation of Alternative Architectures for Transaction Processing in the Cloud”. In: Proc. of SIGMOD. Indianapolis, Indiana, USA, 2010. isbn: 978-1-4503-0032-2.
[49] Tim Kraska et al. “Consistency Rationing in the Cloud: Pay only when it matters”. In: PVLDB 2.1 (2009), pp. 253–264.
[50] Tim Kraska et al. “MDCC: Multi-Data Center Consistency”. In: Proceedings of the 8th ACM European Conference on Computer Systems. EuroSys ’13. Prague, Czech Republic: ACM, 2013, pp. 113–126. isbn: 978-1-4503-1994-2.
[51] Leslie Lamport. “Fast Paxos”. In: Distributed Computing 19 (2 2006), pp. 79–103. issn: 0178-2770.
[52] Leslie Lamport. Generalized Consensus and Paxos. Tech. rep. MSR-TR-2005-33. Microsoft Research, 2005. url: http://research.microsoft.com/apps/pubs/default.aspx?id=64631.
[53] Leslie Lamport. “Paxos Made Simple”. In: SIGACT News 32.4 (2001), pp. 51–58. issn: 0163-5700.
[54] Leslie Lamport. “The Part-Time Parliament”. In: ACM Transactions on Computer Systems (TOCS) 16.2 (1998), pp. 133–169.
[55] Eliezer Levy, Henry F. Korth, and Abraham Silberschatz. “An Optimistic Commit Protocol for Distributed Transaction Management”. In: Proc. of SIGMOD. SIGMOD ’91. Denver, Colorado, USA: ACM, 1991, pp. 88–97. isbn: 0-89791-425-2.
[56] Wyatt Lloyd et al. “Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS”. In: Proc. of SOSP. Cascais, Portugal, 2011. isbn: 978-1-4503-0977-6.
[57] Yanhua Mao, Flavio P. Junqueira, and Keith Marzullo. “Mencius: Building Efficient Replicated State Machines for WANs”. In: Proc. of OSDI. San Diego, California: USENIX Association, 2008, pp. 369–384.
[58] MongoDB. http://www.mongodb.org.
[59] Axel Mönkeberg and Gerhard Weikum. “Conflict-driven Load Control for the Avoidance of Data-Contention Thrashing”. In: ICDE. 1991, pp. 632–639.
[60] New Tweets per second record, and how! https://blog.twitter.com/2013/new-tweets-per-second-record-and-how.
[61] Kevin O’Gorman, Divyakant Agrawal, and Amr El Abbadi. “Posse: A Framework for Optimizing Incremental View Maintenance at Data Warehouse”. In: Proceedings of the First International Conference on Data Warehousing and Knowledge Discovery. DaWaK ’99. London, UK, UK: Springer-Verlag, 1999, pp. 106–115. isbn: 3-540-66458-0.
[62] Chris Olston, Boon Thau Loo, and Jennifer Widom. “Adaptive Precision Setting for Cached Approximate Values”. In: SIGMOD Conference. 2001, pp. 355–366.
[63] Patrick E. O’Neil. “The Escrow Transactional Method”. In: TODS 11 (4 1986), pp. 405–430. issn: 0362-5915.
[64] M. Tamer Özsu and Patrick Valduriez. Principles of Distributed Database Systems. Springer Science & Business Media, 2011.
[65] Stacy Patterson et al. “Serializability, not Serial: Concurrency Control and Availability in Multi-Datacenter Datastores”. In: Proc. VLDB Endow. 5.11 (2012), pp. 1459–1470. issn: 2150-8097.
[66] Fernando Pedone. “Boosting System Performance with Optimistic Distributed Protocols”. In: IEEE Computer 34.12 (Feb. 19, 2002), pp. 80–86.
[67] PLANET. http://planet.cs.berkeley.edu.
[68] X. Qian and Gio Wiederhold. “Incremental Recomputation of Active Relational Expressions”. In: IEEE Trans. on Knowl. and Data Eng. 3.3 (Sept. 1991), pp. 337–341. issn: 1041-4347.
[69] Dallan Quass and Jennifer Widom. “On-line Warehouse View Maintenance”. In: Proc. of SIGMOD. SIGMOD ’97. Tucson, Arizona, USA: ACM, 1997, pp. 393–404. isbn: 0-89791-911-4.
[70] Masoud Saeida Ardekani et al. Non-Monotonic Snapshot Isolation. English. Research Report RR-7805. INRIA, 2011, p. 34. url: http://hal.inria.fr/hal-00643430/en/.
[71] Kenneth Salem et al. “How to Roll a Join: Asynchronous Incremental View Maintenance”. In: Proc. of SIGMOD. SIGMOD ’00. Dallas, Texas, USA: ACM, 2000, pp. 129–140. isbn: 1-58113-217-4.
[72] Jörg Schad, Jens Dittrich, and Jorge-Arnulfo Quiané-Ruiz. “Runtime Measurements in the Cloud: Observing, Analyzing, and Reducing Variance”. In: Proc. VLDB Endow. 3.1-2 (Sept. 2010), pp. 460–471. issn: 2150-8097.
[73] Eric Schurman and Jake Brutlag. Performance Related Changes and their User Impact. Presented at Velocity Web Performance and Operations Conference. 2009.
[74] Thorsten Schütt, Florian Schintke, and Alexander Reinefeld. “Scalaris: Reliable Transactional P2P Key/Value Store”. In: Erlang Workshop. 2008.
[75] A. Segev and J. Park. “Updating Distributed Materialized Views”. In: IEEE Trans. on Knowl. and Data Eng. 1.2 (June 1989), pp. 173–184. issn: 1041-4347.
[76] Arie Segev and Weiping Fang. “Currency-Based Updates to Distributed Materialized Views”. In: Proceedings of the Sixth International Conference on Data Engineering. Washington, DC, USA: IEEE Computer Society, 1990, pp. 512–520. isbn: 0-8186-2025-0.
[77] Shetal Shah, Krithi Ramamritham, and Prashant J. Shenoy. “Resilient and Coherence Preserving Dissemination of Dynamic Data Using Cooperating Peers”. In: IEEE Trans. Knowl. Data Eng. 16.7 (2004), pp. 799–812.
[78] Dale Skeen. “Nonblocking Commit Protocols”. In: Proc. of SIGMOD. SIGMOD ’81. Ann Arbor, Michigan: ACM, 1981, pp. 133–142. isbn: 0-89791-040-0.
[79] Yair Sovran et al. “Transactional storage for geo-replicated systems”. In: Proc. of SOSP. 2011.
[80] Michael Stonebraker. “The Case for Shared Nothing”. In: HPTS. 1985.
[81] D. B. Terry et al. “Managing Update Conflicts in Bayou, a Weakly Connected Replicated Storage System”. In: Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles. SOSP ’95. Copper Mountain, Colorado, USA: ACM, 1995, pp. 172–182. isbn: 0-89791-715-4.
[82] Alexander Thomasian. “Thrashing in Two-Phase Locking Revisited”. In: ICDE. 1992, pp. 518–526.
[83] Alexander Thomasian. “Two-Phase Locking Performance and Its Thrashing Behavior”. In: TODS 18.4 (1993), pp. 579–625.
[84] Xin Zhang, Lingli Ding, and Elke A Rundensteiner. “Parallel multisource view maintenance”. In: VLDB J. 13.1 (2004), pp. 22–48.
[85] Yang Zhang et al. “Transaction Chains: Achieving Serializability with Low Latency in Geo-distributed Storage Systems”. In: Proc. of SOSP. SOSP ’13. Farminton, Pennsylvania: ACM, 2013, pp. 276–291. isbn: 978-1-4503-2388-8.
[86] Jingren Zhou, Per-Ake Larson, and Hicham G. Elmongui. “Lazy Maintenance of Materialized Views”. In: Proc. of VLDB. VLDB ’07. Vienna, Austria: VLDB Endowment, 2007, pp. 231–242. isbn: 978-1-59593-649-3.
[87] Yue Zhuge, Hector Garcia-Molina, and Janet L Wiener. “The Strobe Algorithms for Multi-Source Warehouse Consistency”. In: Parallel and Distributed Information Systems, 1996., Fourth International Conference on. IEEE. 1996, pp. 146–157.
[88] Yue Zhuge et al. “View Maintenance in a Warehousing Environment”. In: Proc. of SIGMOD. SIGMOD ’95. San Jose, California, USA: ACM, 1995, pp. 316–327. isbn: 0-89791-731-6.