Operating System Transactions

Copyright by Donald Elliott Porter 2010 The Dissertation Committee for Donald Elliott Porter certifies that this is the approved version of the following dissertation: Operating System Transactions Committee: Emmett Witchel, Supervisor Lorenzo Alvisi Kathryn S. McKinley Vitaly Shmatikov Michael Swift Operating System Transactions by Donald Elliott Porter, B.A., M.S.C.S. Dissertation Presented to the Faculty of the Graduate School of The University of Texas at Austin in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy The University of Texas at Austin December 2010 Dedication To Lindsay, sine qua non. Acknowledgments I would like to express my deepest gratitude to my advisor, Emmett Witchel, for help and guidance over my graduate career. There are few people who have invested more time and energy into my success than Emmett. Moreover, he has very generously supported my pursuit of independent and external projects over the last two years. There is much that is hard to summarize adequately, so I simply say thank you. Proverbs 27:17 says, \As iron sharpens iron, so one [person] sharpens an- other." This has been my experience with my wonderful collaborators: Chris Ross- bach, Owen Hofmann, Indrajit Roy, Hany Ramadan, Mike Bond, Kathryn McKin- ley, Justin Brickell, Vitaly Shmatikov, Jung Woo Ha, Alex Benn, Aditya Bhandari, Jason Davis, and David Chen. TxOS was a large implementation effort, and I thank the graduate students who contributed code to the system: Owen Hofmann, Chris Rossbach, Sangman Kim, Alex Benn, Indrajit Roy, Andrew Matsuoka, and Baishakhi Ray. I also thank each of my committee members for helpful comments and suggestions on the project. I am also thankful for colleagues who offered small acts of kindness during my graduate career, including: • Sara Standtman, who frequently guided me through the UT beauracracy, as well as helping me write a grant proposal that nearly killed us both. • Peter Djeu, who taught me to teach better, as well as how to play Super Bomberman. v • Chris Rossbach, Harry Li, Allen Clement, and Owen Hofmann, who, in the course of sharing office space with me, returned kind words and good advice for rather frequent interruptions. • The three Mikes I was a TA for (Scott, Dahlin, and Walfish), as well as Lorenzo Alvisi, who helped me learn to teach others better. • Taylor Riché, who early in my graduate career impressed upon me the importance of giving a good talk and backed it up with hours watching my bad talks. He also tried (less successfully) to impress upon me the importance of typesetting minutia|you're welcome for the accent aigu. To my dearest friend, Justin Quarry, who continues to teach me the meaning of dedication to one's craft. To Wesley Beal, who reminds me to make time for important relationships. To Terry Talley, who gave me particular help starting graduate school and to whom I owe a disproportionate share of my professional wisdom. I thank God for the many blessings in my life. I am especially thankful for my family; without their love, encouragement, and support I wouldn't be where I am today. I thank my Mom for showing me the fun and creative side of math at a young age; and my Dad for honing my critical and strategic thinking skills over years of conversation, advice, and merciless defeat at board games. Finally, I thank my spouse Lindsay for sharing this adventure with me. vi Operating System Transactions Publication No. Donald Elliott Porter, Ph.D. The University of Texas at Austin, 2010 Supervisor: Emmett Witchel Applications must be able to synchronize accesses to operating system (OS) resources in order to ensure correctness in the face of concurrency and system failures. This thesis proposes system transactions, with which the programmer specifies atomic updates to heterogeneous system resources and the OS guarantees atomicity, consistency, isolation, and durability (ACID). This thesis provides a model for system transactions as a concurrency con- trol mechanism. System transactions efficiently and cleanly solve long-standing concurrency problems that are difficult to address with other techniques. For ex- ample, malicious users can exploit race conditions between distinct system calls in privileged applications, gaining administrative access to a system. Programmers can eliminate these vulnerabilities by eliminating these race conditions with system vii transactions. Similarly, failed software installations can leave a system unusable. System transactions can roll back an unsuccessful software installation without dis- turbing concurrent, independent updates to the file system. This thesis describes the design and implementation of TxOS, a variant of Linux 2.6.22 that implements system transactions. The thesis contributes new implementation techniques that yield fast, serializable transactions with strong isolation and fairness between system transactions and non-transactional activity. Using system transactions, programmers can build applications with better performance or stronger correctness guarantees from simpler code. For instance, wrapping an installation of OpenSSH in a system transaction guarantees that a failed installation will be rolled back completely. These atomicity properties are provided by the OS, requiring no modification to the installer itself and adding only 10% performance overhead. The prototype implementation of system transactions also mini- mizes non-transactional overheads. For instance, a non-transactional compilation of Linux incurs negligible (less than 2%) overhead on TxOS. Finally, this thesis describes a new lock-free linked list algorithm, called olf, for optimistic, lock-free lists. olf addresses key limitations of prior algorithms, which sacrifice functionality for performance. Prior lock-free list algorithms can safely insert or delete a single item, but cannot atomically compose multiple operations (e.g., atomically move an item between two lists). olf provides both arbitrary composition of list operations as well as performance scalability close to previous lock-free list designs. olf also removes previous requirements for dynamic memory allocation and garbage collection of list nodes, making it suitable for low-level system software, such as the Linux kernel. We replace lists in the Linux kernel's directory cache with olf lists, which currently requires a coarse-grained lock to ensure invariants across multiple lists. olf lists in the Linux kernel improve performance of a filesystem metadata microbenchmark by 3× over unmodified Linux at 8 CPUs. viii The TxOS prototype demonstrates that a mature OS running on commodity hardware can provide system transactions at a reasonable performance cost. As a practical OS abstraction for application developers, system transactions facilitate writing correct application code in the presence of concurrency and system failures. The olf algorithm demonstrates that application developers can have both the functionality of locks and the performance scalability of a lock-free linked list. ix Contents List of Tables xv List of Figures xviii Chapter 1 Introduction1 1.1 Motivating examples...........................4 1.1.1 Software installation or upgrade................5 1.1.2 Eliminating races for security..................6 1.2 Composing linked list operations without locks............8 1.3 Summary.................................9 1.4 Thesis organization............................ 10 Chapter 2 Technical overview 11 2.1 System transactions........................... 11 2.1.1 System transaction semantics.................. 12 2.1.2 Interaction of transactional and non-transactional threads.. 14 2.1.3 System transaction progress................... 14 2.1.4 System transactions for system state.............. 15 2.1.5 Communication model...................... 16 2.2 TxOS overview.............................. 18 x Chapter 3 The TxOS Kernel 20 3.1 Version management of transactional state............... 21 3.1.1 Versioning kernel objects.................... 22 3.1.2 Splitting objects into header and data............. 23 3.1.3 Read-only objects........................ 24 3.2 Conflict detection and interoperability................. 26 3.2.1 Conflict detection........................ 26 3.2.2 Contention Management..................... 28 3.2.3 Asymmetric conflicts and fairness................ 28 3.2.4 Minimizing conflicts on lists................... 30 3.3 Managing transaction state....................... 32 3.3.1 Multi-process transactions.................... 36 3.4 Commit protocol............................. 39 3.5 Abort Protocol.............................. 41 3.6 Impact of data structure changes.................... 41 3.7 Integration with transactional memory................. 43 3.7.1 Lock-based STM requirements................. 44 3.7.2 HTM and obstruction-free STM requirements......... 45 Chapter 4 TxOS Kernel Subsystems 47 4.1 Virtual file system............................ 48 4.1.1 Transactional file data access.................. 48 4.1.2 Transactional file systems.................... 50 4.1.3 Serializable directory reads................... 52 4.1.4 Early release of file handles................... 54 4.2 Memory mapping............................. 55 4.3 Pipes.................................... 56 4.4 Text console................................ 57 xi 4.5 Signal delivery.............................. 57 4.6 Future work................................ 58 4.7 Summary................................. 60 Chapter 5 Evaluation 62 5.1 Single-thread system call overheads..................

Load more