Building a Reliable Storage Stack
Total Page:16
File Type:pdf, Size:1020Kb
Building a Reliable Storage Stack Ph.D. Thesis David Cornelis van Moolenbroek Vrije Universiteit Amsterdam, 2016 This work was supported by the European Research Council Advanced Grant 227874. This work was carried out in the ASCI graduate school. ASCI dissertation series number 355. Copyright © 2016 by David Cornelis van Moolenbroek. ISBN 978-94-028-0240-5 Cover design by Eva Dienske. Printed by Ipskamp Printing. VRIJE UNIVERSITEIT Building a Reliable Storage Stack ACADEMISCH PROEFSCHRIFT ter verkrijging van de graad Doctor aan de Vrije Universiteit Amsterdam, op gezag van de rector magnificus prof.dr. V. Subramaniam, in het openbaar te verdedigen ten overstaan van de promotiecommissie van de Faculteit der Exacte Wetenschappen op maandag 12 september 2016 om 11.45 uur in de aula van de universiteit, De Boelelaan 1105 door David Cornelis van Moolenbroek geboren te Amsterdam promotor: prof.dr. A.S. Tanenbaum To my parents Acknowledgments This book marks the end of both a professional and a personal journey–one that has been long but rewarding. There are several people whom I would like to thank for accompanying and helping me along the way. First and foremost, I would like to thank my promotor, Andy Tanenbaum. While I was finishing up my master project under his supervision, he casually asked me “Would you like to be my Ph.D student?” during one of our meetings. I did not have to think long about the answer. Right from the start, he warned me that I would now have to conduct original research myself; only much later did I understand the full weight of this statement. Throughout the years, Andy’s supervision has been fairly hands-off, but he has also been truly supportive especially at key moments. As a result, I learned a lot, but I was also able to see it through to the end. I am incredibly proud to have worked as a Ph.D student under Andy’s supervision, and incredibly grateful that he guided me through the process. Among Andy’s contributions, perhaps the most important one was to suggest that I collaborate with Raja Appuswamy. As a result, I had the opportunity to work with Raja on his great new idea at the time, which turned into Loris and ended up being the ground work for this thesis as well. During our collaboration, I learned so many things from Raja–ranging from the more practical side of doing good research to various aspects of Indian culture. Working with him has always been pleasant and easy, and outside work I have been lucky enough to share several memorable adventures with him. Now that it is my turn to face the guillotine, I could not wish for a better paranymph! I would also like to thank Dirk Vogt, whom I am honored to have as my other paranymph. With his inspiring enthusiasm, broad interests, challenging ideas, and a great gift for generating entropy, he has been a key figure in the later years of my Ph.D and is a major part of the reason that I miss going to the VU on a regular basis. vii viii ACKNOWLEDGMENTS During my years as a Ph.D student, I have had the fortune to share office space and time with all of the researchers in Andy’s MINIX 3 group: Jorrit Herder, Mischa Geldermans, Raja Appuswamy, Cristiano Giuffrida, Tomáš Hrubý, Lorenzo Caval- laro, Dirk Vogt, and Erik van der Kouwe. From detailed technical discussions to pizza-and-movie nights, it has always been fun. The programmers deserve mention in the same breath: Ben Gras, Philip Homburg, Kees Bot, Arun Thomas, Thomas Veerman, Gianluca Guida, Kees Jongenburger, and Lionel Sambuc. They tolerated that I regularly put on a programmers’ hat as well, making my time as Ph.D student much more enjoyable even if longer! I would like to thank several others. The various students that contributed to MINIX 3, and the Loris project in particular, including Arne Welzel, Richard van Heuven van Staereling, and Sharan Santhanam. The Dutch lesson group: Ana-Maria Oprescu, Corina Stratan, Albana Gaba, Philip Homburg, Dirk Vogt, Daniela Remenska, and later various others. I will always cherish the amazing, cultural, crazy evenings and nights I shared with these fantastic people. The larger VU computer systems group for many fun days and evenings in and around the coffee room; Kaveh Razavi and Caroline Waij in particular. Arjen van Deutekom, for always being a good listener and a welcome source of interesting discussions, good advice, and comic relief. The moo team, for being somewhat of a home to me, even if virtual. Especially Oliver [Redacted] and Loren Segal, for enduring all my stories about work frustra- tion, helping me out with a few important issues, and providing many distractions. My talented sister, Eva Dienske, who always pushes me to get the most out of myself, and whose work I am proud to have on the cover of this thesis. My parents, Jaap van Moolenbroek and Joke Putters, for their undying support, encouragement, understanding, and love. Finally, I would like to thank the members of my reading committee: Herbert Bos, Cristiano Giuffrida, Jorrit Herder, Spyros Voulgaris, and Carsten Weinhold, who have taken the time to read my thesis and provide valuable feedback. David van Moolenbroek Hilversum, The Netherlands, June 2016 Contents Acknowledgments vii Contents ix 1 General Introduction 1 1.1 Problems . 3 1.2 Research approach and questions . 5 1.3 Research overview . 7 1.3.1 Loris: a new storage stack arrangement . 7 1.3.2 The platform . 9 1.3.3 Improving the reliability of Loris . 10 1.4 Contributions and thesis outline . 12 2 Loris - A Dependable, Modular, File-Based Storage Stack 15 2.1 Introduction . 16 2.2 Problems with the Traditional Storage Stack . 17 2.2.1 Reliability . 17 2.2.2 Flexibility . 19 2.2.3 Heterogeneity Issues . 20 2.3 Solutions Proposed in the Literature . 20 2.4 The Design of Loris . 21 2.4.1 The Physical Layer . 22 2.4.2 The Logical Layer . 25 2.4.3 The Cache Layer . 26 2.4.4 The Naming Layer . 27 2.5 The Advantages of Loris . 28 ix x CONTENTS 2.5.1 Reliability . 28 2.5.2 Flexibility . 29 2.5.3 Heterogeneity . 30 2.6 Evaluation . 30 2.6.1 Test Setup . 31 2.6.2 Evaluating Reliability and Availability . 31 2.6.3 Performance Evaluation . 35 2.7 Conclusion . 37 3 Integrated System and Process Crash Recovery in the Loris Storage Stack 39 3.1 Introduction . 40 3.2 Background: the Loris storage stack . 41 3.2.1 Layers of the stack . 42 3.3 The case for integrated recovery . 43 3.3.1 Recovering from system crashes . 43 3.3.2 Recovering from process crashes . 45 3.3.3 Integrated recovery . 45 3.4 Checkpointing . 46 3.4.1 The TwinFS file store . 46 3.4.2 General consistency scheme requirements . 48 3.4.3 Taking and reloading checkpoints . 49 3.5 Data resynchronization . 49 3.5.1 Limiting the areas to scan . 50 3.5.2 Verifying data . 50 3.5.3 The TwinFS resynchronization log . 51 3.5.4 Resynchronization procedure . 51 3.6 In-memory roll-forward logging . 52 3.6.1 Interaction with checkpointing . 52 3.6.2 Logging and replay . 53 3.6.3 Assumptions and guarantees . 55 3.7 Evaluation . 55 3.7.1 Performance evaluation . 55 3.7.2 Reliability evaluation . 57 3.8 Related work . 58 3.8.1 System crash recovery . 58 3.8.2 Process crash recovery . 59 3.9 Conclusion and future work . 60 4 Battling Bad Bits with Checksums in the Loris Page Cache 61 4.1 Introduction . 62 4.2 Background: the Loris stack . 63 4.3 The case for checksumming in the cache . 64 CONTENTS xi 4.3.1 Memory errors . 64 4.3.2 Software bugs . 66 4.4 Dealing with memory errors . 67 4.4.1 Suitability of on-disk checksums . 67 4.4.2 Propagation of checksums . 68 4.4.3 Verification strategies . 68 4.4.4 Other memory . 69 4.5 Dealing with software bugs . 70 4.5.1 Assumptions . 71 4.5.2 The Dirty State Store . 71 4.5.3 Checksumming dirty pages . 72 4.5.4 Recovery procedure . 73 4.5.5 Consequences for memory errors . 73 4.6 Implementation . 74 4.7 Evaluation . 74 4.7.1 Microbenchmarks . 74 4.7.2 Macrobenchmarks . 75 4.7.3 Fault injection . 79 4.8 Related work . 80 4.8.1 Memory errors . 80 4.8.2 Software bugs . 80 4.9 Conclusion . 81 5 Transaction-based Process Crash Recovery of File System Namespace Modules 83 5.1 Introduction . 84 5.2 Motivation . 85 5.2.1 Namespace modules as an emerging concept . 85 5.2.2 The reliability problem . 88 5.3 Design . 89 5.3.1 Assumptions . 89 5.3.2 Transactions and recovery . 90 5.3.3 Support in the object storage layer . 91 5.3.4 Support in the VFS layer . 92 5.3.5 Requirements for namespace modules . 92 5.4 Implementation . 93 5.4.1 Background: the Loris storage stack . 93 5.4.2 Infrastructure changes . 95 5.4.3 Case study: the POSIX namespace module . 95 5.4.4 Case study: the HDF5 namespace module . 97 5.5 Evaluation . 98 5.5.1 Performance . 98 5.5.2 Reliability . 101 xii CONTENTS 5.6 Related work . 102 5.7 Conclusion and future work . 103 6 Towards a Flexible, Lightweight Virtualization Alternative 105 6.1 Introduction . ..