Non-Intrusive Virtual Systems Monitoring by Sahil Suneja a Thesis
Non-intrusive Virtual Systems Monitoring

by

Sahil Suneja

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Graduate Department of Computer Science
University of Toronto

© Copyright 2016 by Sahil Suneja

Abstract

Non-intrusive Virtual Systems Monitoring
Sahil Suneja
Doctor of Philosophy
Graduate Department of Computer Science
University of Toronto
2016

In this thesis, I discuss why existing intrusive systems monitoring approaches are not a good fit for the modern virtualized cloud, and describe two alternative out-of-band solutions that leverage virtualization for better systems monitoring.

My first solution employs Virtual Machine Introspection (VMI) to gain access to a VM's runtime state from the virtualization layer. I develop new VMI techniques to efficiently expose VM memory state from outside the VM boundary. These techniques can be readily employed in existing cloud platforms, as they are designed to operate with no new modifications or dependencies. While a variety of competing alternatives exist, their latency, overhead, complexity and consistency trade-offs are not clear. I therefore begin my thesis by addressing this gap: I organize the various existing VMI techniques into a taxonomy based upon their operational principles, and perform a thorough exploration of their trade-offs, both qualitatively and quantitatively. I further present a deep dive into VMI consistency to understand the sources of inconsistency in observed VM state, and show that commonly employed VMI solutions offer only marginal consistency benefits despite their prohibitive overheads. Then, I present Near Field Monitoring (NFM), a new approach that decouples system execution from monitoring by pushing monitoring components out of the target systems' scope.
By extending and combining VMI with a backend cloud analytics platform, NFM provides simple, standard interfaces to monitor running systems in the cloud that require no guest cooperation or modification, and have minimal effect on guest execution. By decoupling monitoring and analytics from target system context, NFM provides always-on monitoring, even when the target system is unresponsive.

My second solution, CIVIC (Cloning and Injection based VM Inspection and Customization), avoids NFM's functionality duplication effort and overcomes the VMI-related limitations that arise from its raw, memory-byte-level visibility into the guest. CIVIC operates at a logical OS level and reuses the vast stock monitoring software codebase, but in a separate isolated environment, thus avoiding guest intrusion and interference hassles. CIVIC enables a broader usage scope than NFM's passive (read-only) monitoring by supporting actuation, that is, on-the-fly introduction of new functionality. It restricts all impact and side-effects of such customization operations to a live clone of the guest VM. New functionality over the replicated VM state is introduced using code injection.

I present four applications built on top of NFM using its 'systems as data' monitoring approach, to showcase its capabilities for across-systems and across-time analytics. I also highlight CIVIC's versatility in enabling hotplugged and impact-free live customization, by employing it to monitor, inspect, troubleshoot and tune unmodified VMs.

Acknowledgements

I express my sincere gratitude to Prof. Eyal de Lara for steering me towards the successful completion of this voyage of exploration. I've learnt a great many things from him, and I could not have asked for a more supportive supervisor. I am thankful to my committee members, Prof. Angela Demke Brown, Prof. Bianca Schroeder, and Prof. Ryan Johnson, for their guidance and backing.
I owe a great deal of gratitude to my mentors at IBM Research, Dr. Canturk Isci and Dr. Vasanth Bala, for their encouragement, collaboration and contribution to this work. I also want to thank all members of the DCS Graduate Office for facilitating a smoothly functioning work environment. During the course of my journey I've been helped by so many of them that I'm afraid I'll miss some names if I start listing them here! A note of thanks also to my fellow graduate students and faculty members. I've always been in awe of their brilliance and dedication, which has humbled me and motivated me to strive hard.

My sincere gratitude and respect to my parents, Mr. S.K. Suneja and Mrs. Vandna Suneja, for their love, affection and emotional support, and for encouraging me to put in my sincere efforts. A very special thanks to my brother, Sagar Suneja, for pushing me forward during the last mile. Thanks also to my friends for their help and support throughout my time in Toronto. It would be unfair not to thank the numerous stackoverflow.com users for sharing their technical knowledge! Finally, thank you God for all of the above!

Sahil Suneja

Contents

1 Introduction
2 Background and Related Work
  2.1 Data Center Monitoring Tasks
  2.2 Monitoring Techniques
  2.3 VMI Techniques
    2.3.1 Exposing VM State
    2.3.2 Exploiting VM State
  2.4 VMI Applications
  2.5 Other Candidate Techniques for Monitoring
    2.5.1 Concerns with Alternatives
3 Exploring VM Introspection: Techniques and Trade-offs
  3.1 VMI Taxonomy
  3.2 Qualitative Comparison
  3.3 Quantitative Comparison
    3.3.1 Maximum Monitoring Frequency
    3.3.2 Resource Cost on Host
    3.3.3 Impact on VM's Performance
    3.3.4 Real Workload Results
  3.4 Consistency of VM State
    3.4.1 Inconsistency Types
    3.4.2 Quantitative Evaluation
  3.5 Observations and Recommendations
  3.6 Summary
4 Near Field Monitoring
  4.1 NFM's Design
  4.2 Implementation
    4.2.1 Exposing VM State
    4.2.2 Exploiting VM State
    4.2.3 The Frame Datastore
    4.2.4 Application Architecture
  4.3 Prototype Applications
    4.3.1 TopoLog
    4.3.2 CTop
    4.3.3 RConsole
    4.3.4 PaVScan
  4.4 Evaluation
    4.4.1 Latency and Frequency of Monitoring
    4.4.2 Monitoring Accuracy
    4.4.3 Benefits of Holistic Knowledge
    4.4.4 Operational Efficiency Improvements
    4.4.5 Impact on VM's Performance
    4.4.6 Impact on Co-located VMs
    4.4.7 Space Overhead
  4.5 Summary
5 Cloning and Injection based VM Inspection and Customization
  5.1 CIVIC's Design
    5.1.1 Discussion
  5.2 Implementation
    5.2.1 Disk COW
    5.2.2 Live Migration
    5.2.3 COW Memory
    5.2.4 Hotplugging
    5.2.5 Code Injection
    5.2.6 Application Loader Script
  5.3 Performance Evaluation
    5.3.1 Memory Cost
    5.3.2 Clone Instantiation Time
    5.3.3 Impact on Source VM
  5.4 Applications
    5.4.1 Safe Agent Reuse
    5.4.2 Anomaly Detection
    5.4.3 Problem Diagnostics and Troubleshooting
    5.4.4 Autotuning-as-a-Service
  5.5 Conclusion
6 Conclusion and Future Work
Bibliography

List of Tables

3.1 Qualitative comparison of VMI techniques; empty cells in the compatibility column indicate functionality not advertised by the hypervisor, or enabled by users
4.1 Key capabilities of the prototype applications
6.1 NFM vs. CIVIC

List of Figures

3.1 VMI Taxonomy: categorizing current implementations
3.2 Comparing maximum monitoring frequency across all KVM instances of VMI techniques
3.3 CPU used vs. maximum monitoring frequency
3.4 Comparing % degradation on x264 benchmark's frames-encoded/s as a function of monitoring frequency
3.5 Comparing % degradation on memory, disk and network throughput as a function of monitoring frequency
3.6 Comparing % degradation on Sysbench OLTP benchmark's transactions/s as a function of monitoring frequency
3.7 Comparing % degradation on httperf's metrics as a function of monitoring frequency
3.8 Observed inconsistency probabilities for all categories
4.1 Introspection and analytics architecture