Serializable Isolation for Snapshot Databases
This thesis is submitted in fulfillment of the requirements for the degree of Doctor of Philosophy in the School of Information Technologies at The University of Sydney

MICHAEL JAMES CAHILL
August 2009

Copyright © 2009 Michael James Cahill
All Rights Reserved

Abstract

Many popular database management systems implement a multiversion concurrency control algorithm called snapshot isolation rather than providing full serializability based on locking. There are well-known anomalies permitted by snapshot isolation that can lead to violations of data consistency by interleaving transactions that would maintain consistency if run serially. Until now, the only way to prevent these anomalies was to modify the applications by introducing explicit locking or artificial update conflicts, following careful analysis of conflicts between all pairs of transactions.

This thesis describes a modification to the concurrency control algorithm of a database management system that automatically detects and prevents snapshot isolation anomalies at runtime for arbitrary applications, thus providing serializable isolation. The new algorithm preserves the properties that make snapshot isolation attractive, including that readers do not block writers and vice versa. An implementation of the algorithm in a relational database management system is described, along with a benchmark and performance study, showing that the throughput approaches that of snapshot isolation in most cases.

Acknowledgements

Firstly, I would like to thank my supervisors, Dr Alan Fekete and Dr Uwe Röhm, without whose guidance I would not have reached this point. Many friends and colleagues, both at the University of Sydney and at Oracle, have helped me to develop my ideas and improve their expression. Thank you to Alex, Ashok, Danny, James, Keith, Kevin, Margo, Mat, Mike, Mohammad, Raymes, Tim, Tara and Vincent, among many others.
Of course, any errors, inaccuracies, blemishes or bugs that remain are entirely my own.

Lastly, I am profoundly grateful to my wife Rachel and to my children Patrick and Evelyn. Your support and patience while I have been busy working has been invaluable.

CONTENTS

Abstract
Acknowledgements
List of Figures

Chapter 1 Introduction
  1.1 Why does serializable execution matter?
  1.2 What makes snapshot isolation an attractive alternative?
  1.3 Why is a new approach needed?
  1.4 Contributions
    1.4.1 Serializable Snapshot Isolation
    1.4.2 Implementation Experience
    1.4.3 Anomaly-aware Performance Evaluation
    1.4.4 Summary
  1.5 Outline of the Thesis

Chapter 2 Background
  2.1 Transactions
  2.2 Serializable Execution
    2.2.1 Strict Two-Phase Locking
  2.3 Weak Isolation
  2.4 Multiversion Concurrency Control
  2.5 Snapshot Isolation
    2.5.1 Write Skew Anomalies
    2.5.2 Phantoms
  2.6 Making Applications Serializable with Snapshot Isolation
    2.6.1 Materialization
    2.6.2 Promotion
    2.6.3 Mixing Isolation Levels
    2.6.4 Automating Static Analysis of Applications
  2.7 Serialization Graph Testing
  2.8 Database Performance Measurement
    2.8.1 The TPC-C benchmark
    2.8.2 The SmallBank Benchmark
    2.8.3 SmallBank Transaction Mix
    2.8.4 SmallBank Static Dependency Graph
    2.8.5 Making SmallBank Serializable
  2.9 Summary

Chapter 3 Serializable Snapshot Isolation
  3.1 Design alternatives
    3.1.1 After-the-fact analysis tool
    3.1.2 Approaches to runtime conflict detection
  3.2 The Basic Algorithm
  3.3 Transaction lifecycle changes
  3.4 Correctness
  3.5 Detecting phantoms
  3.6 False positives
  3.7 Enhancements and optimizations
    3.7.1 Abort early
    3.7.2 Victim selection
    3.7.3 Upgrading SIREAD locks
  3.8 Mixing queries at SI with updates at Serializable SI
  3.9 Summary

Chapter 4 Implementation Details
  4.1 Berkeley DB Introduction
  4.2 Adding Snapshot Isolation to Berkeley DB
  4.3 Adding Serializable SI to Berkeley DB
    4.3.1 Transaction Cleanup in Berkeley DB
    4.3.2 Analysis of the code changes
  4.4 InnoDB Introduction
  4.5 Adding Snapshot Isolation to InnoDB
  4.6 Adding Serializable SI to InnoDB
    4.6.1 Transaction Cleanup in InnoDB
    4.6.2 Analysis of the code changes