User-Space Process Virtualization in the Context of Checkpoint-Restart and Virtual Machines
A dissertation presented by Kapil Arya to the Faculty of the Graduate School of the College of Computer and Information Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Northeastern University
Boston, Massachusetts
August 2014

Copyright © August 2014 by Kapil Arya

NORTHEASTERN UNIVERSITY GRADUATE SCHOOL OF COMPUTER SCIENCE
Ph.D. THESIS APPROVAL FORM

THESIS TITLE: User-Space Process Virtualization in the Context of Checkpoint-Restart and Virtual Machines
AUTHOR: Kapil Arya

Ph.D. Thesis approved to complete all degree requirements for the Ph.D. degree in Computer Science

Distribution: Once completed, this form should be scanned and attached to the front of the electronic dissertation document (page 1). An electronic version of the document can then be uploaded to the Northeastern University-UMI website.

Abstract

Checkpoint-Restart is the ability to save a set of running processes to a checkpoint image on disk, and to later restart them from the disk. In addition to its traditional use in fault tolerance, recovering from a system failure, it has numerous other uses, such as for application debugging and save/restore of the workspace of an interactive problem-solving environment.

Transparent checkpointing operates without modifying the underlying application program, but it implicitly relies on a “Closed World Assumption”: the world (including file system, network, etc.) will look the same upon restart as it did at the time of checkpoint. This is not valid for more complex programs. Until now, checkpoint-restart packages have adopted ad hoc solutions for each case where the environment changes upon restart.

This dissertation presents user-space process virtualization to decouple application processes from the external subsystems. A thin virtualization layer is introduced between the application and each external subsystem.
It provides the application with a consistent view of the external world and allows for checkpoint-restart to succeed. The ever-growing number of external subsystems makes it harder to deploy and maintain virtualization layers in a monolithic checkpoint-restart system. To address this, an adaptive plugin-based approach is used to implement the virtualization layers, allowing the checkpoint-restart system to grow organically.

The principle of decoupling the external subsystem through process virtualization is also applied in the context of virtual machines to provide a solution to the long-standing double-paging problem. Double-paging occurs when the guest attempts to page out memory that has previously been swapped out by the hypervisor. This leads to long delays for the guest, as the contents are read back into machine memory only to be written out again. Performance drops rapidly because the time to complete the guest I/O request lengthens significantly.

Acknowledgments

No dissertation is accomplished without the support of many people and I can only begin to thank all those who have helped me in completing it.

I am indebted to my advisor, Gene Cooperman, for his patience, encouragement, support, and guidance over the years. It is because of Gene that I decided to go for a Ph.D. while I was a Master’s student at Northeastern. Gene taught me how to do research and how to distinguish the ideas that only I would find interesting from the ideas that are important. I could not have asked for a better teacher and without him, this document would not exist.

I am thankful to Panagiotis (Pete) Manolios, Alan Mislove and William Robertson for serving on my committee and for providing their insightful input and constructive criticism. I resoundingly thank Peter Desnoyers for always being available to discuss ideas and for providing constructive feedback on several occasions.
I also want to thank the International Student and Scholar Institute (ISSI) team and Bryan Lackaye for helping with the administrative matters during my stay at Northeastern.

I was fortunate to be mentored by Alex Garthwaite during the summer internships at VMware. His guidance and encouragement are always there and never seem to fade away. Alex agreed to be the external member of my committee, and I am thankful for his feedback and thoughtful comments that have not only improved the quality of this dissertation, but also provided ideas for future directions. His dictum that a good dissertation is a completed one became my mantra during the last two years.

I also want to thank Yury Baskakov for all the help that I received while working on the Tesseract project. He never got tired of my random speculations and was always there to provide further insights and also to cover my blind spots. A special thanks goes to Jerri-Ann Meyer and Joyce Spencer for their continued support of the project. Finally, I want to thank Ron Mann for his continued advice and guidance that has helped me become a better engineer.

I am grateful to Alok Singh Gehlot for his friendship, all the advice he provided me over the years, and for his constant reminder that it’s not done until it’s done. He was always available for me and without his guidance, I would not have been at Northeastern for my Master’s and later, Ph.D.

I want to thank Rohan Garg and Jaideep Ramachandran for going through the thesis drafts, sitting through my practice talks, and providing valuable feedback. Over the years, I have had the support of a lot of friends and I want to thank Jaijun Cao, Harsh Raju Chamarthi, Tyler Denniston, Anand Gehlot, Gregory Kerr, Samaneh Kazemi Nafchi, Artem Polyakov, Sumit Purohit, Praveen Singh Solanki, Ana-Maria Visan, Vishal Vyas, and any others I regrettably failed to name.
I am enormously thankful to Surbhi for her enduring friendship and companionship through all these years.

Finally, I owe much to my family. I want to express my deepest gratitude for my grandparents, Smt. Mohini Devi and Sh. Omdutt Ji, my parents, Smt. Jamana Devi and Sh. Nem Singh Ji, my aunt and uncle, Smt. Sangeeta Devi and Sh. Hari Singh Ji, my uncles Sh. Kamlesh Ji and Sh. Dilip Ji, and my siblings and cousins, Kavita, Lalita, Shilpa, and Anil, for their never ending love, dedication and support. I am forever indebted to them.

To my grandfather
Shri Omdutt Ji Solanki
And my school teacher
Shri Devi Singh Ji Kachhwaha

Contents

Contents
List of Figures
List of Tables

1 Overview
    1.1 Closed-World Assumption
    1.2 Double-Paging Anomaly
    1.3 Process Virtualization
    1.4 Thesis Statement
    1.5 Contributions
        1.5.1 Process Virtualization through Plugins
        1.5.2 Application-Specific Plugins
        1.5.3 Third-Party Plugins
        1.5.4 Solving the Double-Paging Problem
    1.6 Organization

2 Concepts Related to Checkpoint-Restart and Virtualization
    2.1 Checkpoint-Restart
        2.1.1 Kernel-Level Transparent Checkpoint-Restart
        2.1.2 User-Level Transparent Checkpoint-Restart
        2.1.3 Fault Tolerance
    2.2 System Call Interpositioning
    2.3 Virtualization
        2.3.1 Language-Specific Virtual Machines
        2.3.2 Process Virtualization
        2.3.3 Lightweight O/S-based Virtual Machines
        2.3.4 Virtual Machines
    2.4 DMTCP Version 1
        2.4.1 Library Call Wrappers
        2.4.2 DMTCP Coordinator
        2.4.3 Checkpoint Thread
        2.4.4 Checkpoint
        2.4.5 Restart
        2.4.6 Checkpoint Consistency for Distributed Processes

3 Adaptive Plugins as a Mechanism for Virtualization
    3.1 The Ever Changing Execution Environment
        3.1.1 PID: Virtualizing Kernel Resource Identifiers
        3.1.2 SSH Connection: Virtualizing a Protocol
        3.1.3 InfiniBand: Virtualizing a Device Driver
        3.1.4 OpenGL: A Record/Replay Approach to Virtualizing a Device Driver
        3.1.5 POSIX Timers: Adapting to Application Requirements
    3.2 Virtualizing the Execution Environment
        3.2.1 Virtualize Access to External Resources
        3.2.2 Capture/Restore the State of External Resources
    3.3 Adaptive Plugins as a Synthesis of System-Level and Application-Level Checkpointing

4 The Design of Plugins
    4.1 Plugin Architecture
        4.1.1 Virtualization through Function Wrappers
        4.1.2 Event Notifications
        4.1.3 Publish/Subscribe Service
    4.2 Design Recipe for Virtualization through Plugins
    4.3 Plugin Dependencies
        4.3.1 Dependency Resolution
        4.3.2 External Resources Virtualized by Other Plugins
        4.3.3 Multiple Plugins Wrapping the Same Function
    4.4 Extending to Multiple Processes
        4.4.1 Unique Resource-id for Shared Resources
        4.4.2 Checkpointing Shared Resources
        4.4.3 Restoring Shared Resources
    4.5 Three Base Plugins
        4.5.1 Coordinator Interface Plugin
        4.5.2 Thread Plugin
        4.5.3 Memory Plugins
    4.6 Implementation Challenges
        4.6.1 Wrapper Functions
        4.6.2 New Process/Program Creation
        4.6.3 Checkpoint Deadlock on a Runtime Library Resource
        4.6.4 Blocking Library Functions and Checkpoint Starvation

5 Expressivity of Plugins
    5.1 File Descriptor Related Plugins
    5.2 Pid, System V IPC, and Timer Plugins
    5.3 Application-Specific Plugins
    5.4 SSH Connection
    5.5 Batch-Queue Plugin for Resource Managers
    5.6 Ptrace Plugin
    5.7 Deterministic Record-Replay
    5.8 Checkpointing Networks of Virtual Machines
    5.9 3-D Graphic: Support for Programmable GPUs in OpenGL 2.0 and Higher