Application-Level Caching with Transactional Consistency by Dan R
Total Page:16
File Type:pdf, Size:1020Kb
Application-Level Caching with Transactional Consistency by Dan R. K. Ports M.Eng., Massachusetts Institute of Technology (2007) S.B., S.B., Massachusetts Institute of Technology (2005) Submitted to the Department of Electrical Engineering and Computer Science in partial fulllment of the requirements for the degree of Doctor of Philosophy in Computer Science at the MASSACHUSETTSINSTITUTEOFTECHNOLOGY June 2012 © Massachusetts Institute of Technology 2012. All rights reserved. Author...................................................... Department of Electrical Engineering and Computer Science May 23, 2012 Certied by . Barbara H. Liskov Institute Professor esis Supervisor Accepted by . Leslie A. Kolodziejski Chair, Department Committee on Graduate eses 2 Application-Level Caching with Transactional Consistency by Dan R. K. Ports Submitted to the Department of Electrical Engineering and Computer Science on May 23, 2012, in partial fulllment of the requirements for the degree of Doctor of Philosophy in Computer Science abstract Distributed in-memory application data caches like memcached are a popular solution for scaling database-driven web sites. ese systems increase performance signicantly by reducing load on both the database and application servers. Unfortunately, such caches present two challenges for application developers. First, they cannot ensure that the application sees a consistent view of the data within a transaction, violating the isolation properties of the underlying database. Second, they leave the application responsible for locating data in the cache and keeping it up to date, a frequent source of application complexity and programming errors. is thesis addresses both of these problems in a new cache called TxCache. TxCache is a transactional cache: it ensures that any data seen within a transaction, whether from the cache or the database, reects a slightly stale but consistent snap- shot of the database. TxCache also o ers a simple programming model. Application developers simply designate certain functions as cacheable, and the system automati- cally caches their results and invalidates the cached data as the underlying database changes. Our experiments found that TxCache can substantially increase the performance of a web application: on the RUBiS benchmark, it increases throughput by up to 5.2 relative to a system without caching. More importantly, on this application, TxCache× achieves performance comparable (within 5%) to that of a non-transactional cache, showing that consistency does not have to come at the price of performance. esis Supervisor: Barbara H. Liskov Title: Institute Professor 3 4 acknowledgments is thesis, like the OSDI paper on which it is based, is the result of a collaboration with Austin Clements, Irene Zhang, Sam Madden, and Barbara Liskov. e work described here would not have been possible without their contributions. Austin and Irene implemented major parts of TxCache; I remain amazed by the speed with which they translated my random ideas into a functional system. More than once, Sam’s keen acumen for databases provided the solution to a problem that appeared intractable. My work – and especially this thesis – was substantially improved by working with my advisor, Barbara Liskov. Barbara’s principled approach to research and emphasis on the value of providing the right abstractions were a strong inuence on my research; I hope that inuence shows clearly in this thesis. Barbara’s suggestions also improved the clarity of this work. I also thank the other members of my thesis committee, Sam Madden and Frans Kaashoek for their helpful feedback (especially at short notice!). It has been a pleasure to work with the other members of the Programming Methodology Group: Winnie Cheng, James Cowling, Dorothy Curtis, Evan Jones, Daniel Myers, David Schultz, Liuba Shrira, and Ben Vandiver. In particular, James Cowling and David Schultz provided essential support during the writing of this thesis – both through their positive feedback and encouragement, and by being willing to discuss even the most esoteric details of transaction processing and database implementation. Before I joined PMG, I nished a master’s thesis working with David Karger on peer-to-peer systems. His enthusiasm and insight encouraged me to continue in systems research. As part of that project, I worked closely with Frans Kaashoek, Robert Morris, and various members of the PDOS group. Frans and Robert taught me many valuable lessons about how to do research and how to write a paper. I spent a few summers as an intern at VMware, where I beneted greatly from working with Carl Waldspurger, Mike Chen, Tal Garnkel, E. Lewis, Pratap Subrah- manyam, and others. I also beneted from recent work with Kevin Grittner on the implementation of 5 Serializable Snapshot Isolation in PostgreSQL. at work inuenced the invalidation mechanism in Chapter 7 and the discussion in Appendix B, but more importantly, Kevin provided valuable insight into the real-world benets of serializable isolation. I thank my family for their support. My parents, omas Ports and Catherine Covey, were a constant source of encouragement. Most of all, I thank my wife, Irene Zhang. It isn’t an exaggeration to say that this work wouldn’t have been possible without her. She has supported me in many ways: from having helpful discussions about this work, to reviewing drafts of my writing, to providing encouragement when I needed it most. 6 previously published material is thesis signicantly extends and revises work previously published in the following paper: Dan R. K. Ports, Austin T. Clements, Irene Zhang, Samuel Madden, and Barbara Liskov. Transactional Consistency and Automatic Manage- ment in an Application Data Cache. In Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’10), Vancouver, BC, Canada, October 2010. Parts of Appendix B are adapted from the following forthcoming publication: Dan R. K. Ports and Kevin Grittner. Serializable Snapshot Isolation in PostgreSQL. In Proceedings of the 38th International Conference on Very Large Data Bases (VLDB ’12), Istanbul, Turkey, August 2012. To appear. funding is research was supported by the National Science Foundation under ITR grants CNS-0428107 and CNS-0834239, and by a National Defense Science and Engi- neering Graduate (NDSEG) Fellowship sponsored by the Oce of Naval Research. 7 8 CONTENTS 1 Introduction 17 1.1 e Case for Application-Level Caching................. 18 1.2 Existing Caches Are Dicult to Use................... 19 1.3 Contributions................................. 23 1.4 Outline..................................... 24 2 Architecture and Model 27 2.1 System Architecture.............................. 27 2.2 Assumptions.................................. 30 3 Cacheable Functions: A Simple Programming Model 31 3.1 e Cacheable Function Model...................... 32 3.1.1 Restrictions on Cacheable Functions.............. 32 3.1.2 Making Functions Cacheable................... 33 3.1.3 Automatic Cache Management................. 34 3.2 Discussion................................... 35 3.2.1 Naming................................ 36 3.2.2 Invalidations............................. 37 3.2.3 Limitations.............................. 38 3.3 Implementing the Programming Model................. 39 3.3.1 Cache Servers............................ 39 3.3.2 TxCache Library.......................... 41 9 4 Consistency Model 45 4.1 Transactional Consistency.......................... 46 4.2 Relaxing Freshness.............................. 47 4.2.1 Dealing with Staleness: Avoiding Anomalies......... 48 4.3 Read/Write Transactions........................... 50 4.4 Transactional Cache API........................... 51 5 Consistency Protocol 53 5.1 Validity Intervals................................ 53 5.1.1 Properties............................... 55 5.2 A Versioned Cache.............................. 56 5.3 Storage Requirements............................ 58 5.3.1 Pinned Snapshots.......................... 60 5.4 Maintaining Consistency with Validity Intervals............ 61 5.4.1 Basic Operation........................... 61 5.4.2 Lazy Timestamp Selection.................... 62 5.4.3 Handling Nested Calls....................... 67 6 Storage Layer Support 69 6.1 A Transactional Block Store......................... 70 6.1.1 Implementation........................... 71 6.2 Supporting Relational Databases...................... 75 6.2.1 Exposing Multiversion Concurrency.............. 76 6.2.2 Tracking Query Validity...................... 79 6.2.3 Managing Pinned Snapshots................... 85 7 Automatic Invalidations 89 7.1 Unbounded Intervals and Invalidation Streams............. 90 7.1.1 Invalidation Streams........................ 91 7.1.2 Discussion: Explicit Invalidations................ 93 7.2 Invalidation Tags............................... 94 7.2.1 Tag Hierarchy............................ 95 7.3 Invalidation Locks............................... 97 10 7.3.1 Storing Invalidation Locks.................... 100 8 Evaluation 103 8.1 Implementation Notes............................ 103 8.2 Porting the RUBiS Benchmark....................... 106 8.3 Experimental Setup.............................. 108 8.4 Cache Sizes and Performance........................ 109 8.4.1 Cache Contents and Hit Rate.................. 111 8.5 e Benet of Staleness........................... 113 8.6 e Cost of Consistency........................... 115 8.6.1 Cache Miss Analysis........................ 116 8.6.2