Virtual Devices for Virtual Machines

Andrew Kent Warfield
Clare Hall

A dissertation submitted to the University of Cambridge for the degree of Doctor of Philosophy

University of Cambridge
Computer Laboratory
William Gates Building
15 JJ Thomson Avenue
Cambridge CB3 0FD
UK

Email: andrew.warfi[email protected]

May 5, 2006

This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration except where specifically indicated in the text. This dissertation is not substantially the same as any I have submitted for a degree or diploma or any other qualification at any other university. No part of this dissertation has already been, or is currently being submitted for any such degree, diploma or other qualification. This dissertation does not exceed sixty thousand words.

This dissertation is copyright © 2005 Andrew Warfield. All trademarks used in this dissertation are hereby acknowledged.

Virtual Devices for Virtual Machines
Andrew Kent Warfield, Clare Hall College
May 5, 2006

Summary

Computer systems research has recently seen a huge resurgence of interest in hardware virtualization, a software technique originally developed to manage mainframe computers in the 1960s. Using virtual machines (VMs), a commodity PC may be divided into isolated "slices", each perceiving that it is executing on separate physical hardware. This thesis considers the effective virtualization of I/O devices on commodity hardware and presents an approach that allows developers to add new functionality to a piece of hardware as a software extension, running in an isolated VM. The new virtual device is presented to the OS using the existing virtualized hardware interface, allowing extensions to be easily applied across a wide range of operating systems.

Isolating extensions in their own virtual machines is effectively a "sledgehammer" version of the system decomposition that was attempted by microkernels through the 1980s and 1990s. The VM-based approach has the benefit of demonstrably working with a broad range of existing systems, and allowing developers to build extensions in their OS and language of choice. It concurrently maintains the benefits of isolation: the rest of the system is protected from extension crashes, and extension software has a clean and simple interface to devices. This thesis develops this work by demonstrating the construction of a set of device extensions for various pieces of hardware. Additionally, this thesis demonstrates that device extensions may be aggregated within cluster environments to implement device services, allowing specific device types to be treated as a service throughout a cluster of virtual machines.

Several examples are presented to validate the flexibility of device extensions: a packet symmetry-based rate limiter demonstrates a single-host network extension that prevents VMs from mounting common forms of denial of service attacks. Parallax, a distributed storage system for VMs, demonstrates the implementation of a device service for the management of storage within a cluster. Finally, device extensions are combined with other virtualization projects to develop deployable system-wide extensions to virtual hardware.

Acknowledgements

I am indebted to my advisor, Steven Hand, who has acted as both friend and mentor throughout my doctoral work. Steve is a gifted teacher and frequently provided the right guidance at the right time. I hope to be able to match his standard in supervising my own graduate students in the future.

The packet symmetry work in Chapter 4 stems from collaboration with Christian Kreibich, Jon Crowcroft, Steve Hand, and Ian Pratt. James Bulpin developed two excellent implementations of the distributed block store used by Parallax in Chapter 5, and Christian Limpach provided the file system measurements presented in Figure 5.6. The work in Chapter 6 is a result of collaborations in combining the soft device framework with other research projects. The extensions for debugging are in collaboration with Alex Ho, while taint tracking is the result of ongoing collaboration with Alex, Michael Fetterman, Christopher Clark, and Steve Hand.

I have been very lucky to be a member of the Systems Research Group during a period when so much great work has been in progress, and am delighted to have been able to participate in such a broad range of projects during my time at Cambridge. In particular, James Bulpin, Julian Chesterfield, Christopher Clark, Jon Crowcroft, Michael Dales, Tim Deegan, Michael Fetterman, Keir Fraser, Alex Ho, Eva Kalyvianaki, Christian Kreibich, Christian Limpach, Anil Madhavapeddy, Rolf Neugebauer, Ian Pratt, Russ Ross, and Steven Smith have all provided insightful discussion and generally been really good fun to work with.

Thanks to Steve, Jon, Christian K., Julian, Rolf, Tim D., Alex, and Richard Mortier for providing feedback on drafts of this thesis. Thanks finally to the staff of the Castle Pub, who have provided the impetus for more interesting research than they can possibly realize.

Table of Contents

Summary

Acknowledgements

1 Introduction
  1.1 Motivation
  1.2 Contribution
  1.3 Outline
  1.4 Published Results

2 Background
  2.1 The Virtualization Renaissance
    2.1.1 Role of a Virtual Machine Monitor
    2.1.2 VMMs in Modern Systems
    2.1.3 High-level Principles
  2.2 The Xen Virtual Machine Monitor
    2.2.1 Xen
    2.2.2 Paravirtualized Hardware Interface
    2.2.3 Live VM Migration
  2.3 Summary

3 The Soft Device Architecture
  3.1 Extending Devices
    3.1.1 Performance and Safety
    3.1.2 Software Engineering
  3.2 Split Drivers
    3.2.1 Isolating Driver Code
    3.2.2 VM Device Interface
    3.2.3 The Data Path: Device Channels and Grant Tables
    3.2.4 The Control Path: Control Interfaces and XenStore
    3.2.5 The Virtual Block Interface
    3.2.6 The Virtual Network Interface
    3.2.7 Performance of Split Drivers
  3.3 Soft Devices
    3.3.1 The Device Tap
    3.3.2 The Block Tap
    3.3.3 The Network Tap
    3.3.4 Performance
  3.4 Device Services
    3.4.1 A Device Service Architecture
    3.4.2 Performance Isolation
    3.4.3 Security Isolation Through Narrow Interfaces
    3.4.4 Failure Isolation and Fate Sharing
    3.4.5 Administrative Isolation
  3.5 Summary

4 Soft Devices for Traffic Management
  4.1 Network Device Extensions in Xen
    4.1.1 Preventing Source Address Spoofing
    4.1.2 Resource Isolation and Rate Control
    4.1.3 Migration of Network Addresses
    4.1.4 Virtual Private Networks
  4.2 Operator Responsibility and Denial of Service
  4.3 Packet Symmetry Enforcement
    4.3.1 Packet Symmetry
    4.3.2 Prototype Implementation
  4.4 Summary

5 Soft Devices for Storage
  5.1 Parallax: A Distributed Storage Service for Virtual Machines
    5.1.1 Overall System Design
    5.1.2 Implementation and Evaluation
    5.1.3 Future Work
    5.1.4 Current Status
  5.2 Summary

6 Supporting System-Wide Architectural Change
  6.1 Device Support for Pervasive Debugging
    6.1.1 Hardware Modifications in PDB
    6.1.2 Soft Device Support for Debugging
    6.1.3 Understanding Intrusions
  6.2 Taint-based Data Isolation
    6.2.1 Overview of Taint-tracking
    6.2.2 Device Extensions
  6.3 Summary

7 Related Work
  7.1 Virtual Machines
  7.2 Device Access in Operating Systems
  7.3 Extensibility in System Software
  7.4 Device Virtualization and Extension
  7.5 Smart Devices

8 Conclusion
  8.1 Future Work
    8.1.1 Extending the Existing Extensions
    8.1.2 Other Device Interfaces
  8.2 Summary

List of Tables

2.1 The Paravirtualized x86 Interface

3.1 Xen's Virtual Block Device Interface
3.2 Network Latency Overhead
3.3 Summary of Approaches to the Management of Virtual Devices

5.1 VM Interfaces to CVDs
5.2 Administrative Interfaces to CVDs

List of Figures

2.1 Overview of a VMM-based System
2.2 Division of the Administrative Role
2.3 Type I and II VMMs
2.4 Memory in Xen
2.5 Migration Timeline
2.6 Results of Migrating a Running Web Server VM

3.1 Soft Device Overview
3.2 Split Driver Structure
3.3 Shared-Memory Ring
3.4 Device Channel Example: Block Read
3.5 XenStore Overview
3.6 The Block Request Structure
3.7 System Benchmarks of Split Drivers
3.8 High-level View of a Traffic-limiting Soft Device
3.9 Overview of Device Tap Configurations
3.10 Device Tap Structure
3.11 Examples of Forwarding Modes
3.12 Structure of the Block Tap

3.13 Example of the blktaplib Interface
3.14 Example of the libipq Interface

3.15 Block Throughput
3.16 Block Request Timelines
3.17 Network Throughput
3.18 Structure of a Device Service

4.1 Illustration of Asymmetry-based Rate Limiting
4.2 Simple DDoS Example: UDP Flood
4.3 Limiting UDP Flood Based on Packet Symmetry
4.4 High Throughput TCP Traffic Remains Unaffected

5.1 Parallax High-Level Architecture
5.2 CVD Snapshot and Copy-on-Write
5.3 CVD Tree View—Visualizing the Snapshot Log
5.4 Administrative Data Structures
5.5 Throughput and Compile Performance

5.6 Write Cost of updatedb on File Systems

6.1 Overview of Taint Tracking

Glossary

ABI    Application Binary Interface
CVD    Cluster Virtual Disk
IPI    Inter-processor Interrupt
OS     Operating System
VCPU   A Virtual CPU
VM     Virtual Machine
VMM    Virtual Machine Monitor

Backend: VM containing a backend driver. (§3.2)
Backend Driver: A device driver that maps requests from frontend drivers in client VMs onto a physical device driver in the backend VM. (§3.2)
Communication Ring: A shared memory ring used to pass messages between VMs. (§3.2.3.4)
Device Tap: A driver that allows interposition and redirection of device requests. (§3.3)
Device Channel: A mechanism for high-performance inter-VM communication based on the combination of communication rings and event channels. (§3.2.3)
Device Service: An aggregation of a set of soft devices, allowing a given device class to be managed as a service across a set of physical hosts.
Domain: Logical container within which a VM executes.
Domain 0: Privileged domain, capable of administering other domains.
Driver VM: VM, with hardware access to a device, which hosts a physical device driver. (§3.2)
Event Channel: One-bit VM-to-VM notification mechanism; effectively a virtual interrupt line. (§3.2.3.1)

Event Notification: Notification received on an event channel. (§3.2.3.1)
Extension VM: VM hosting a device extension. (§3.3)
Foreign Page: A page of memory mapped from another VM.
Frontend Driver: A simple virtual device driver allowing a VM to access a class of devices (e.g. disks, network interfaces) exported by a backend driver. (§3.2)
Grant Mapping: A grant table operation allowing a page of memory belonging to one VM to be mapped into another VM as shared memory.
Grant Table: A per-VM table that lists pages of memory that the VM is sharing (mapping) or transferring to other VMs.
Grant Transfer: A grant table operation allowing a page of memory belonging to a VM to be relinquished, and given to a specific other VM.
Guest: A VM running above a hypervisor.
GuestOS: The OS used in a guest. In the case of Xen, guestOSes are often paravirtualized.
Hypercall: Analogous to a system call, a hypercall is a request for the hypervisor to perform a privileged operation.
Hypervisor: syn. Virtual Machine Monitor.
Machine Memory: In the context of a VMM, machine memory refers to the real hardware memory on the system.
Page: Unit of memory described by virtual memory hardware on a processor. Notwithstanding processor extensions, a page of memory on the x86 architecture is 4096 bytes.
Paravirtualization: A technique of hardware virtualization in which the virtual hardware interface is different from the real physical hardware. Paravirtualization is used to improve performance, but requires that hosted operating systems be modified to reflect the modified hardware interface.
Physical Memory: syn. Pseudo-physical memory.
Pseudo-physical Memory: Virtualized machine memory, presented as physical memory to a VM.
Soft Device: A device, presented to a virtual machine, which has had its functionality augmented in software. (§3.3)
Split Driver: A client-server-style device driver model used to share access to I/O devices in a VMM. (§3.2)
Tap: syn. Device Tap.

Virtual Machine: An execution environment in which hosted software is presented with the illusion that it has ownership of a complete physical machine.
Virtual Machine Monitor: A piece of low-level systems software that multiplexes access to physical hardware across a collection of virtual machines.
Xen: A virtual machine monitor developed at the University of Cambridge.
XenBus: A device driver used to map device configuration information from XenStore onto an OS's device probing interfaces. (§3.2.4.1)
XenLinux: The paravirtualized version of Linux that runs on Xen.
XenStore: A persistent hierarchical store used to communicate configuration data in Xen. (§3.2.4.1)

Chapter 1

Introduction

1.1 Motivation

The development of system software to support access to I/O devices is a source of many challenges. For years, OS developers have been tasked with efficiently bridging the gap between the low-level interfaces of a constantly evolving set of device hardware, and the semantically richer, high-level interfaces required by application software. While the design space between these two interfaces is broad and allows considerable room for innovation, it has resulted in a wide variety of OSes for common, commodity hardware that each have a different and generally incompatible approach to interacting with devices.

Device driver code is hard to write, and has been described as the most error-prone subset of modern operating systems [CYC+01, SBL03]. As the driver-OS interface is not standard across systems, device vendors are unable to develop robust commercial drivers for every OS on which a driver will be used. The lack of driver availability and the difficulty in supporting devices as they emerge have been described as a stifling factor in the development of new OSes [FBB+97].

As the task of supporting devices on an OS is difficult, the act of extending them (installing software to augment the functionality of a given device) is very hard indeed. Despite the fact that many interesting research projects have shown the power of innovation at the device interface to build extensions such as secure [GNA+97], distributed [LT96, SFV+04], or versioned [WCG04] storage and packet-filtered [EK96, PF01] or intrusion-detecting [WCSG04] network interfaces, these efforts have had only


minimal impact on real systems. Moreover, development of these extensions is complicated by the eccentricities of individual OS device interfaces, and the resulting code is invariably hard to maintain or port to other systems.

As commodity systems have become increasingly powerful, and organizations have become concerned with the degree of utilization of individual physical servers, there has been a renewed interest in hardware virtualization, a technique originally developed for mainframes in the late 1960s which allows a physical host to be partitioned into a number of virtual machines (VMs). The availability and growing deployment of hardware virtualization on commodity systems has interesting implications for problems relating to device management in systems software; virtualization results in both challenges and opportunities in managing devices.

On one hand, hardware virtualization presents a challenge in that it is unsafe to grant concurrent access to a device's physical interface to more than one operating system. A system providing virtualization is thus responsible for safely multiplexing low-level device access among running virtual machines. Not only must VMs be able to share access to physical devices such as disk and network, but they must be prevented from interfering with one another either maliciously or accidentally.

Conversely, virtualization presents developers with the ability to interact with devices below the OS's hardware interface. As virtualization software runs underneath OS code, interactions with virtualized device hardware must pass through it. This unique position allows the introduction of device extensions that are inherently both isolated from and portable across the range of OSes that run in the virtualized environment.

1.2 Contribution

It is the thesis of this work that a well-designed approach to device virtualization addresses the major problems of portability and extensibility in the management of devices on commodity systems. Using the Xen virtual machine monitor, a widely-available and robust VMM that has been developed at Cambridge over the past four years, this work demonstrates a set of approaches to the virtualization and extension of I/O devices for commodity


hardware. The techniques described are validated through the development of a set of practical, useful device extensions targeted primarily at large computer installations, such as data centres, where virtualization is used.

The initial contribution of this work is the design and implementation of an architecture for the development of device extensions. The combination of a physical device and extension software that modifies the behaviour of that piece of hardware is described as a soft device. I have constructed a set of software tools that allow the construction of soft devices for the disk and network device interfaces that exist in Xen today. This approach demonstrates that extensions may be written and executed in user-space of an isolated VM, allowing developers complete freedom to innovate while maintaining reasonable performance.

The second contribution of this thesis is the aggregation of soft device-based extensions to form device services. Device services allow device extensions to be composed into cluster-wide facilities which serve large numbers of virtual machines.

To demonstrate the range, scope and flexibility of these techniques, I have explored the construction of both soft devices and device services for a variety of applications. An additional contribution of this thesis is the exploration of a set of such examples, specifically storage, traffic management, and whole-system extensions, that are targeted to address relevant problems in clustered VM environments.

1.3 Outline

The remainder of this thesis is structured as follows:

Chapter 2 is a discussion of the relevant background and aims to familiarize the reader with the current state of hardware virtualization in general and Xen in particular.

Chapter 3 describes the complete set of mechanisms for device virtualization and extension advocated by this thesis. It begins with a detailed presentation of the split driver model for providing virtual devices in Xen. Next it presents the soft device architecture for constructing isolated device


extensions, and demonstrates implementations of extension support for both disk and network devices. The chapter concludes with a description of the device service model for aggregating device extensions in cluster environments. The remaining chapters of the thesis validate this architecture by demonstrating examples of soft device-based extensions.

In Chapter 4, I discuss extensions for the support of network interfaces. After surveying the challenges faced in virtualizing network devices, the chapter presents an example device extension, a packet symmetry-based rate limiter. This extension monitors outbound and inbound packet counts and prevents VMs from being used maliciously to mount denial of service attacks.

Chapter 5 presents Parallax, an example device service to address the storage requirements of virtualization-based clusters.

Chapter 6 presents a final example of the application of device services. This chapter considers the combination of the techniques developed in this thesis with more sweeping changes to the virtualization system in order to realize full-system architectural change. The chapter presents two examples of such change: extensions to support whole-system debugging, and taint-based memory protection.

Chapter 7 places this thesis in the context of relevant related work and Chapter 8 concludes and discusses directions for future investigation.

1.4 Published Results

Some aspects of this work have been described previously. In reverse chronological order, the list of related publications is as follows:

1. C. Kreibich, A. Warfield, J. Crowcroft, S. Hand and I. Pratt. Using Packet Symmetry to Curtail Malicious Traffic. In Proceedings of the ACM Workshop on Hot Topics in Networks (HotNets), College Park, MD, 2005.
   Describes initial results regarding the packet symmetry scheme presented in Chapter 4.


2. A. Warfield, R. Ross, K. Fraser, C. Limpach and S. Hand. Parallax: Managing Storage for a Million Machines. In Proceedings of the USENIX Workshop on Hot Topics in Operating Systems (HotOS), Santa Fe, NM, 2005.
   Presents the Parallax prototype, which is extended by the work described in Chapter 5.

3. S. Hand, A. Warfield, K. Fraser, E. Kotsovinos and D. Magenheimer. Are Virtual Machine Monitors Microkernels Done Right? In Proceedings of the USENIX Workshop on Hot Topics in Operating Systems (HotOS), Santa Fe, NM, 2005.
   Argues that virtual machine monitors provide a practical means of achieving the isolation goals sought by microkernel research. The theme of this paper represents the nucleus of the argument presented in this thesis.

4. C. Clark, K. Fraser, S. Hand, J. Hansen, E. Jul, C. Limpach, I. Pratt and A. Warfield. Live Migration of Virtual Machines. In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI), Boston, MA, 2005.
   Presents the design and implementation of live-migration support for Xen-based virtual machines. The ability to migrate running virtual machines punctuates the separation from real hardware that exists in these environments, and helps form the basis of the argument for device services presented in Chapter 3, and illustrated by Parallax in Chapter 5.

5. A. Warfield, S. Hand, K. Fraser and T. Deegan. Facilitating the Development of Soft Devices. In Proceedings of the USENIX Annual Technical Conference, Anaheim, CA, 2005.
   Discusses the initial implementation of the block tap, an instance of a device tap as presented in Chapter 3.

6. K. Fraser, S. Hand, R. Neugebauer, I. Pratt, A. Warfield and M. Williamson. Safe Hardware Access with the Xen Virtual Machine Monitor. In Proceedings of the 1st Workshop on Operating System and Architectural Support for the On-Demand IT Infrastructure


(OASIS-1), Boston, MA, 2004.
   Motivates the use of virtual machine monitors to enhance the reliability of legacy device drivers for commodity systems, and explains the driver architecture used to share I/O devices across virtual machines in Xen. This paper presents an initial discussion of split drivers, which are detailed in Chapter 3.

7. P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt and A. Warfield. Xen and the Art of Virtualization. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP), Lake George, NY, 2003.
   Provides a detailed technical description of the initial public release of the Xen virtual machine monitor.

8. A. Warfield, S. Hand, T. Harris and I. Pratt. Isolation of Shared Network Resources in Xenoservers. PlanetLab Design Note PDN-02-006, 2002.
   Describes early approaches taken to the virtualization of network devices. This paper predates the driver architecture that is used in this thesis.

Chapter 2

Background

This chapter presents background material relevant to this thesis. The material here is not a comprehensive presentation of related work, which is discussed at the end of this thesis in Chapter 7. Instead, it is intended to familiarize the reader with the relevant aspects of hardware virtualization and some specific details of the Xen virtual machine monitor. While hardware virtualization is a long established technique in systems research, it has only returned as a common approach on commodity systems over the past few years. This chapter attempts to set the stage for the remainder of the thesis, first by explaining why virtualization has recently become important again and second by detailing specific aspects of virtualization as realized by Xen.

After describing the design and implementation of Xen, the chapter ends with a presentation of live VM migration. Live migration allows a running virtual machine to be relocated to a new physical host and has been implemented on Xen. Migration illustrates the ability of hardware virtualization to provide useful new OS-agnostic features, and serves as a motivating example for the device-specific extensions that are presented throughout the following chapters.

2.1 The Virtualization Renaissance

A virtual machine monitor (VMM) is a narrow layer of software that provides a set of isolated execution environments that closely match the underlying physical computer. Each of these environments is called a virtual machine (VM), and may contain an operating system and associated set of applications.


[Figure 2.1: Overview of a VMM-based System. Each virtual machine contains applications running above a guest OS; the virtual machine monitor sits between the guest OSes and the physical hardware. The labelled system interfaces are the application binary interface, the virtual hardware interface, and the hardware interface.]

A major aspect of virtualization is the focus on maintaining high performance relative to the underlying hardware. As such, in his 1972 thesis, Goldberg distinguishes virtualization from more heavy-weight techniques such as emulation and simulation by observing that a VM is "a hardware-software duplicate of a real existing computer system in which a statistically dominant subset of the virtual processor's instructions execute on the host processor in native mode" [Gol72].

The technique of virtualizing hardware dates back to the 1960s, when it was used to divide resources on mainframe computers into coarse-grained partitions. Over the past few years, VMMs have returned as an attractive technique on commodity systems for many of the same reasons that they were initially deployed on mainframes. The core property achieved by a VMM is that of isolation; VMMs allow a powerful physical machine to be partitioned into a set of virtual machines, each of which may be separately scheduled, configured, administered, and otherwise managed. The importance of coarse-grained isolation in a VMM environment cannot be overstated, and the point will be revisited throughout this chapter as it is fundamental to the argument made by this thesis.

2.1.1 Role of a Virtual Machine Monitor

Stated briefly, a VMM provides a rigid partitioning of the low-level hardware resources that are typically managed by an operating system. By dividing the system at such a low level, OSes may continue to be deployed with


little or no change; the VMM simply allows a multiplicity of OSes to share a common physical host. Additionally, as the resources divided by a VMM are simple, its complexity is considerably lower than that of an OS. Unlike OSes, VMMs maintain a minimum of state, and are solely responsible for the enforcement of resource partitioning within a system.

The small size of VMMs and their use of simple hardware interfaces make them attractive for a variety of roles in the management of systems. These roles are largely the same today as they were in their deployments on mainframes. Goldberg's 1974 article in IEEE Computer, "Survey of Virtual Machine Research" [Gol74], describes a variety of applications of VMMs, including the hosting of legacy applications that are "locked" to a specific operating system, facilitation of OS development and testing, and the testing and debugging of networks and distributed systems by virtualizing them on a single physical host.

2.1.2 VMMs in Modern Systems

VMMs have become attractive again in modern systems for reasons similar to those which motivated their application on mainframes: commodity hardware has become sufficiently powerful to host multiple virtual machines on a single physical host, while the benefit of virtualization to administrators justifies deployment in many environments.

2.1.2.1 Cost

Why has there been such a sudden resurgence of interest in the application of virtualization on commodity systems? As with many techniques in software systems, the decision to use virtualization is largely based on considering the trade-offs between performance overhead and functional benefit. In addition to the overhead of virtualization itself, the introduction of a VMM requires that a system be sufficiently powerful as to be divided into a set of virtual machines, each of which will run as if it were a single physical host. Each "slice" of the system must have sufficient resources to reasonably run an operating system and its applications. In the case of Xen, which is discussed in more detail in Section 2.2, the physical machine is intended to be divided into up to 100 VMs, and the overhead of virtualization is demonstrably very low. Using VMMs such as Xen, commodity servers easily have


sufficient resources to be divided into a small number of virtual machines, affording this cost of virtualization. This fact was initially demonstrated in our paper on Xen [BDF+03], in which commodity server hardware is partitioned and VMs run industry-standard benchmarks in parallel.

2.1.2.2 Benefit

The evolution of modern hardware has reached a point at which the deployment of virtualization is practical. As such, it is worth considering the benefits provided by virtualization that are motivating its adoption. As mentioned above, the key benefit provided by a VMM is isolation.

Independent of virtualization, modern systems have adopted a model in which individual server applications are often each installed onto a separate physical host. This one-to-one mapping between software servers and physical servers is largely a result of the tight coupling between server applications and the OSes that they run on: isolating servers onto separate physical machines avoids undesirable interactions between the configuration and administration of the hosts. The unfortunate consequence of this trend is that server rooms are increasingly full of mostly idle physical machines: a study by IBM published in 2003 reported that the average daytime utilization of Windows servers across an organization was less than 5%, and 15-20% for UNIX servers [Hea03].

The desire to address wasted physical resources, such as space, cooling, and power consumption, has been largely responsible for the revitalization of virtualization research and development. Large organizations are increasingly moving towards a utilization-based approach to management, and are using virtualization to consolidate servers onto smaller numbers of physical hosts.

While this trend is likely the largest overall motivation for the renewed interest in virtualization, there are a host of other benefits offered by VMMs on modern systems. Applications such as testing and development, and support for legacy applications and heterogeneous OSes, which were major applications of VMMs in the sixties and seventies [Gol74], still apply. Moreover, researchers are using VMMs for a variety of new applications such as intrusion detection and forensics [DKC+02, JX04, VMC+05], configuration management and debugging [WCG04], and software debugging [KDC05, HH05], to name only a few.


2.1.2.3 Challenge

While the deployment of VMMs is justified in terms of the trade-off between overhead and benefit, the development of functional VMMs for commodity systems has been very challenging. Unlike the mainframes on which virtualization was initially deployed, the x86 architecture was not designed with virtualization in mind. The processor is not inherently virtualizable, and requires work-arounds in software to make virtualization efficient. Section 2.2 details the design of Xen, and the challenges presented in virtualizing the x86 architecture.

2.1.3 High-level Principles

Prior to the technical overview of Xen, it is worth emphasizing two high-level aspects of virtualization design that have emerged as significant insights through work with Xen. The two points presented here have had a great deal of influence on the design and development of the work described in this thesis, and it is useful to consider them further.

2.1.3.1 The Value of Coarse-Grained Isolation

As mentioned above, the isolation provided by VMMs is provided at a low level and is both very strong and very coarse-grained. VMs are assigned a partition of system resources including disjoint regions of physical memory and storage, and have strongly enforced CPU allocation. It is important to observe that this isolation is not limited to the performance-centric division of resources: as alluded to throughout this chapter, VM-based isolation is very near to that of running on a separate physical host. As such, isolation also applies to properties such as configuration, administration, and security.

As the interface between the VM and the VMM is very narrow, the VMM itself maintains very little per-VM state. This property has enabled functionality that has been incredibly difficult to achieve in modern operating systems, where the OS stores considerable per-process state. Live VM migration, discussed later in this chapter, is an example of such a benefit: while process migration has typically been very difficult to achieve and is not provided by any modern OS, VM migration is relatively simple to implement and is supported by several existing VMMs.


[Figure 2.2: Division of the Administrative Role. In a conventional system, a single system administrator manages the applications, the OS, and the hardware. In a VMM-based system, each VM (its applications and guest OS) is managed by its own system administrator, while the VMM and the physical hardware are managed by a facilities administrator.]

The isolation provided by VMMs has been likened to the isolation advocated between system components by microkernels, with some VMM designers having gone so far as to describe their own systems as microkernels [WCSG04] while some microkernel researchers describe their systems as VMMs [LUSG04]. A discussion of the success of the VMM approach, relative to that of microkernels, is presented in [HWF+05]: in brief, we propose that although the isolation and architectural elegance provided by microkernel systems is a noble goal, it fails to recognize the demands of existing systems running existing applications. By treating the OS as a functional unit and maintaining application compatibility within OSes, VMMs have been remarkably successful in impacting real systems. The design of VMMs and VMM-based extensions is founded on the treatment of VMs largely as black boxes, and focusing on the management of low-level resources and entire operating systems.

2.1.3.2 The Division of the Administrative Role

In addition to the isolation between individual VMs provided by a VMM, it is worth considering the impact of the horizontal division that is introduced between a VM and the hardware that it runs on. A major impact of the introduction of a VMM is that it bisects the management of a physical host, and the application of policy decisions within the system as a whole.


In a non-virtualized system, there is effectively a single administrator. All software-based management requires administrative access to the OS. However, physical administration also requires this access in order to safely shut down and restart machines that are being serviced. The system administrator is concurrently responsible for enabling functionality on a host, by installing and configuring new software, and for policing the behaviour of that host, for instance by preventing its use for malicious intent.

As illustrated in Figure 2.2, the introduction of a VMM divides this administrative role in two. As isolated units, VMs are administered by individuals responsible for the software that is installed on them. This administration involves traditional OS management activities such as installing software, configuring applications and the OS, and managing users. All of these management tasks are applied to a virtual machine, and the administrator is unconcerned with the physical hardware that underpins that VM save for the fact that it works as expected. Policy decisions made at the VM level concern high-level primitives, such as files and users, that exist within the context of that virtual host.

The physical hardware is then managed by a second, lower-level facilities administrator. This role is concerned with the upkeep of physical hardware, and the enforcement of low-level policy within the system. The isolation provided by the VMM reflects the sort of low-level policy enforcement that the facilities administrator is concerned with: VMs should receive their expected share of resources, and they should be prevented from interfering with other hosts or from otherwise impairing the overall physical environment. The isolation provided by the VMM is particularly beneficial in this sense in that the consequent ability to migrate running VMs between physical machines enables low-level administration without requiring access to individual virtual machines.

In short, modern systems have demonstrated both the capacity and the need for the deployment of hardware virtualization. Virtualization is a powerful tool that allows for strong isolation between OSes, and the division of the administrative role. In some senses, the move toward virtualization is a return to the centralized physical resource management that was provided by mainframes, while continuing to take advantage of commodity hardware, and allowing distributed software administration.


The remainder of this chapter presents an overview of the Xen virtual machine monitor, which has been developed at the University of Cambridge over the past four years and on which the remainder of this thesis is based.

2.2 The Xen Virtual Machine Monitor

While there is a demonstrable case for the application of virtualization techniques on commodity hardware, the development of effective virtual machine monitors for such systems is a non-trivial task. In a 1974 paper, Popek and Goldberg describe a set of formal requirements for hardware systems to support virtualization [PG74]. The key observation of their work is that in order to directly support virtualization, all instructions which access privileged system state must result in a trap when executed outside of supervisor mode. This requirement provides the VMM the opportunity to safely validate and issue privileged operations on the behalf of individual virtual machines, as the operating systems within these VMs no longer execute directly in supervisor mode.

The x86 architecture does not meet the criteria set out by Popek and Goldberg. Specifically, the x86 instruction set contains privileged instructions which, when executed in user mode, do not generate a trap and so do not allow the VMM to intervene. For example, the PUSHF and POPF instructions allow the contents of the processor state flags register (EFLAGS) to be loaded to and from the top of the stack. The EFLAGS register is used by applications to identify conditions such as arithmetic overflow. However, it is also used in supervisor mode to modify aspects of processor state, for instance to enable and disable interrupts. As such, an unmodified OS binary that is moved out of supervisor mode (protection ring 0) with the introduction of a VMM will silently fail to update EFLAGS, and will not behave correctly. Robin and Irvine [RI00] present a detailed survey of the Intel Pentium instruction set, and identify seventeen such non-virtualizable instructions.

An additional consideration in the design of a VMM, described by Goldberg [Gol72], relates to the placement of the VMM itself. Goldberg attempts to dichotomize VMM architecture by differentiating between a VMM that


runs directly on hardware, dubbed a Type I VMM, and a Type II VMM, which runs as an application within an existing "Host OS". This distinction is shown in Figure 2.3: by moving the VMM itself into application space, it may defer the complexities of scheduling, memory management and device access to the Host OS. However, such Type II VMMs are limited, in terms of their ability to provide isolation and resource control, to the mechanisms offered by the Host OS to its applications.

[Figure 2.3: Type I and II VMMs. Goldberg identifies two architectural approaches to the design of VMMs: a Type I VMM executes directly on raw hardware, while a Type II VMM runs as an application within a Host OS.]

This taxonomy was extended by Popek and Goldberg to address the virtualization of certain non-virtualizable architectures [PG74]. The authors introduce the notion of a hybrid VMM (HVM), which emulates all supervisor-mode instructions issued by the virtual machine. This model capitalizes on the ability to trap state transitions between user and supervisor mode to activate an instruction emulator, and achieves correctness by sacrificing the performance of native execution for VM kernel code. It is worth noting that many commercial VMM products, such as VMware Workstation, are essentially Type II HVMs. However, the taxonomy is hardly a rigid one, in that these products generally both install kernel extensions into the Host OS and use some combination of binary scanning and code rewriting to reduce the need for instruction emulation in the VM kernel.
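The silent failure described above can be observed directly from unprivileged code. The short C program below (a sketch for GCC or Clang on x86, written for this discussion rather than taken from Xen or the references above) attempts to clear the interrupt-enable flag (IF, bit 9 of EFLAGS) using POPF from ring 3; the write is simply dropped rather than trapping, which is precisely the behaviour that prevents a classical trap-and-emulate VMM from intervening.

    #include <stdio.h>

    int main(void)
    {
        unsigned long before, attempted, after;

        /* Read the flags register from user mode. */
        __asm__ volatile ("pushf; pop %0" : "=r"(before));

        /* Try to clear IF (bit 9) -- in a kernel this is how interrupts are
         * disabled.  In ring 3 with IOPL=0 the write to IF is silently
         * ignored by POPF instead of generating a trap. */
        attempted = before & ~(1UL << 9);
        __asm__ volatile ("push %0; popf" : : "r"(attempted) : "cc");

        /* Read the flags back: IF is still set. */
        __asm__ volatile ("pushf; pop %0" : "=r"(after));

        printf("before=%#lx attempted=%#lx after=%#lx\n",
               before, attempted, after);
        return 0;
    }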


2.2.1 Xen

The Xen virtual machine monitor is a VMM, originally for the Intel x86 architecture, developed at the University of Cambridge. Xen was motivated by the desire to provide a high-performance VMM with strong resource isolation properties on commodity hardware. As such, the design was for a Type I VMM, where hardware could be more rigidly controlled.

A key distinction between Xen and existing commercial VMMs is the introduction of paravirtualization. A paravirtualizing VMM addresses the same problem as the hybrid VMM mentioned above, but the design optimizes for performance rather than interface preservation: instead of emulating supervisor-mode instructions, a paravirtualizing VMM mandates that those instructions are not directly used by the VM. VMMs such as Xen and Denali [WSG02] present a modified hardware interface in which non-virtualizable instructions must be specifically issued to the VMM as hypercalls (hypervisor calls), allowing them to trap to the VMM and be validated and issued on behalf of the VM.

Xen differs from Denali by merit of the fact that it preserves the application binary interface (ABI). While OSes themselves must be specifically ported to run on a paravirtualizing VMM, the user-mode interface is preserved such that application binaries need not be recompiled in the case of Xen. As with much of Xen's design, this was a practical decision: by maintaining ABI compatibility, existing OS distributions may be installed directly into VMs, needing only a Xen-supporting (e.g. XenLinux) kernel. Denali's application interface is markedly different from that offered by hardware: the VMM allows only single-address-space, single-threaded applications which have been linked directly against the Ilwaco guest OS. Additionally, Denali does not preserve aspects of the hardware interface such as segmentation, which are used by many applications.

Over the past four years of development, the Xen project has been very successful. The VMM has been extended to support other architectures including both Intel's Itanium (IA64) and the 64-bit x86 (x86-64) processors. Additionally, support has been added for emerging hardware virtualization extensions, allowing the VMM to support unmodified OSes without requiring paravirtualization.
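To make the hypercall mechanism concrete, the C fragment below sketches how a paravirtualized guest kernel might hand a sensitive operation to the hypervisor rather than executing a privileged instruction itself. The operation number, trap vector and structure layout are invented for this illustration and do not correspond to the real Xen hypercall ABI.

    /* Hypothetical hypercall number -- illustrative only. */
    #define HYPERVISOR_set_trap_table 0

    struct trap_info {
        unsigned char vector;     /* exception number               */
        unsigned long address;    /* guest handler to be installed  */
    };

    /* A hypercall traps into the VMM much as a system call traps into an
     * OS; here it is modelled as a software interrupt through a fixed
     * vector, passing the operation number and argument in registers. */
    static inline long hypercall1(long op, void *arg)
    {
        long ret;
        __asm__ volatile ("int $0x82"        /* illustrative trap vector */
                          : "=a"(ret)
                          : "a"(op), "b"(arg)
                          : "memory");
        return ret;
    }

    /* Guest OS: rather than loading its exception table directly, the guest
     * hands it to the hypervisor, which validates the entries and installs
     * them on the guest's behalf (see Section 2.2.2.1). */
    long register_trap_table(struct trap_info *table)
    {
        return hypercall1(HYPERVISOR_set_trap_table, table);
    }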


Memory Management
  Segmentation:   Cannot install fully-privileged segment descriptors and cannot overlap with the top end of the linear address space.
  Paging:         Guest OS has direct read access to hardware page tables, but updates are batched and validated by the hypervisor. A domain may be allocated discontiguous machine pages.

CPU
  Protection:     Guest OS must run at a lower privilege level than Xen.
  Exceptions:     Guest OS must register a descriptor table for exception handlers with Xen. The handlers remain the same.
  System Calls:   Guest OS may install a 'fast' handler for system calls, allowing direct calls from an application into its OS instance and avoiding indirecting through Xen on every call.
  Interrupts:     Hardware interrupts are replaced with a lightweight event system.
  Time:           Each guest OS has a timer interface and is aware of both 'real' and 'virtual' time.

Device I/O
  Network, Disk, etc.:  Virtual devices are elegant and simple to access. Data is transferred using asynchronous I/O rings. An event mechanism replaces hardware interrupts for notifications.

Table 2.1: The Paravirtualized x86 Interface.

2.2.2 Paravirtualized Hardware Interface

Xen's paravirtualized hardware interface, summarized in Table 2.1, is based on four design goals: First, it addresses the non-virtualizable aspects of the x86 architecture. Second, it aims to minimize virtualization overhead, opting to modify the hardware interface where there is a strong performance benefit to doing so. Third, it maintains the application binary interface so that existing application binaries may execute unchanged. Finally, it attempts to keep the degree of change required to a Guest OS to a minimum, in order to facilitate the porting of new systems to Xen.


The design of Xen has evolved considerably over the past four years. This section presents an overview of Xen’s design with regard to the aspects of the system described in Table 2.1. The next chapter presents a detailed discussion of the virtualization of device access in the VMM.

2.2.2.1 CPU

The x86 architecture offers a ring-based protection model. While the processor has four levels of hardware protection, conventional OSes use only two: in general the OS runs in the most privileged ring (ring 0), and applications run in the least privileged (ring 3). With the introduction of Xen, the hypervisor executes in ring 0, and the guest OS is moved to ring 1. This approach maintains protection across layers in the system, while allowing the hypervisor to retain complete control. As discussed above, non-virtualizable instructions in the guest OS no longer function as intended when removed from ring 0 and must be replaced with hypercalls to Xen.

Each virtual machine is configured with a set of virtual CPUs (VCPUs), which represent the unit of scheduling that is managed by the VMM. There need be no correlation between the number of physical CPUs in a system and the number of VCPUs given to a VM. Virtualizing the processor in this manner allows an SMP system to be divided into a set of isolated uniprocessor VMs, and conversely allows the testing of SMP OS code on a uniprocessor system. The hypervisor provides a variety of schedulers, and allows individual VCPUs to be "pinned" to specific physical CPUs.

All processor exceptions are handled by Xen. Guest OSes register a table of exception handlers which Xen will validate and install on their behalf. System calls, which are relatively common, and hence could represent a high performance overhead, are handled by allowing the guest to register a fast trap handler which results in the CPU calling directly into the guest OS.

Page faults represent an example of a situation where paravirtualization is not absolutely required, but serves to greatly improve performance. On x86 page faults, the faulting linear address is reported in a register (CR2) which may only be read in ring 0. Xen preserves this value so that it may be accessed in the OS fault handler in ring 1. A non-paravirtualizing solution would be to pass control to the OS fault handler, which would then immediately trap on the attempt to read CR2. Xen could then emulate the access,


returning the stored value from the original fault. The performance impact of this approach would be very high, as each page fault would require two entries into the VMM. Instead, the page fault handler in Xen writes the faulting address into a shared-memory location associated with the faulting virtual CPU. Control is then returned to the guest fault handler, which may read the value from shared memory instead of CR2.

Interrupts are completely virtualized and a guest OS never runs with hardware interrupts disabled. Each virtual CPU has a set of 1024 virtual interrupts, or event channels. These may be mapped to physical interrupts or connected to virtual interrupts on other VCPUs, even those on other VMs. As such, event channels allow virtual interrupt and interprocessor interrupt (IPI) facilities, and also provide a capability to generate interrupt-like notifications between virtual machines. Guest OSes receive event notifications through an upcall from the hypervisor while they are executing. In addition, the hypervisor provides a virtual timer event every 10ms while a VM is scheduled. The event channel interface is described in more detail in the next chapter.
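The two mechanisms just described, the shared-memory copy of the faulting address and the event-channel upcall, can be sketched together in C as follows. The structure layout and function names are invented for illustration and do not match the real Xen shared-info ABI.

    #include <stdint.h>

    /* Illustrative per-VCPU area shared between the hypervisor and guest.
     * The hypervisor writes into it; the guest reads it without trapping. */
    struct vcpu_shared {
        uint32_t saved_cr2;           /* faulting address stashed by the VMM  */
        uint8_t  upcall_pending;      /* at least one event is outstanding    */
        uint8_t  upcall_mask;         /* guest's virtual "interrupts off" bit */
        uint32_t pending[1024 / 32];  /* one bit per event channel            */
    };

    static struct vcpu_shared vcpu0;  /* in reality, a page shared with Xen */

    static void dispatch_event(unsigned port) { (void)port; /* e.g. run ISR */ }

    /* Guest page-fault handler (ring 1): read the faulting linear address
     * from shared memory rather than from CR2, which would trap back into
     * the VMM a second time. */
    void guest_page_fault(void)
    {
        uint32_t addr = vcpu0.saved_cr2;
        (void)addr;                   /* ... normal OS fault handling ... */
    }

    /* Guest event upcall: the paravirtualized replacement for a hardware
     * interrupt.  Scan the pending bitmap and dispatch each set bit. */
    void guest_event_upcall(void)
    {
        vcpu0.upcall_pending = 0;
        for (unsigned word = 0; word < 1024 / 32; word++) {
            uint32_t bits = vcpu0.pending[word];
            vcpu0.pending[word] = 0;
            while (bits) {
                unsigned bit = (unsigned)__builtin_ctz(bits);
                bits &= bits - 1;
                dispatch_event(word * 32 + bit);
            }
        }
    }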

2.2.2.2 Memory Management

The VMM resides at the top of virtual memory, and is protected using x86 segmentation. This approach allows the VMM to have a constant location in memory regardless of execution context, and avoids unnecessary TLB flushes on protection domain crossings. However, it places a slight limitation on the use of segmentation within VMs, as segments must not overlap the top of linear address space, where Xen is located.

Xen adopts the terminology used in Disco [BDGR97b] to describe memory as it is accessed from various contexts within the system. Machine pages refer to the hardware-addressable pages of memory within the host; they represent unvirtualized, untranslated memory within the system. Physical pages (also referred to as pseudo-physical pages) are the set of machine pages allocated to a VM. Physical memory is a virtualization of machine memory, and physical addresses are a contiguous range from zero to the amount of memory allocated to the VM. Each physical page refers to a machine page, but the underlying machine memory is not necessarily contiguous. Finally, virtual pages are references to machine memory through page table-based mappings, using the hardware MMU. Figure 2.4 demonstrates the interactions between these memory contexts.


[Figure 2.4: Memory in Xen. Solid lines represent the cascading relationship between memory address types (virtual pages refer to physical pages, which refer to machine pages), while dashed lines represent mappings loaded into page tables. Shared memory is discussed in detail in Chapter 3.]

Machine pages, which are managed completely by the VMM, have type, ownership, and reference counts. A page of memory in the system may only be owned by a single VM, and has a single purpose. Typing ensures that the VMM is aware of all pages that are used as page tables, and can enforce safe access to memory management hardware.

Guest OSes maintain a table mapping their contiguous physical address space onto the associated underlying machine pages. By providing visibility to both physical and machine pages, guests may efficiently manage their own memory. Typically, the memory allocator in a guest OS will manage ranges of physical memory, while page tables will be programmed directly by the guest with machine addresses, allowing native performance on virtual memory access.

Several techniques exist for the management of virtual memory under a VMM. The technique which is currently used by Xen is that of "writable page tables". In this approach, VMs have direct access to their own page tables. However, all page table pages are mapped read-only. On an attempt to update page tables, the resulting page fault will be handled by Xen and the page to be modified is "unhooked" from the parent table prior to granting write access. The guest may then make arbitrary updates to the page table page, but these updates do not apply until the guest attempts to


access virtual memory described by the unhooked page. At that point, Xen will handle the resulting trap and validate any changes prior to reinserting the page into the page tables and again marking it as read-only. The only exception to this technique is that of page table bases, which are described in a protected register (CR3) that is loaded using a hypercall. Writable page tables provide high performance, as collections of updates to a page table page are implicitly batched, while imposing minimal change on the memory management code in the guest OS.

A second technique used for managing virtual memory in Xen, which is available as an alternative to writable page tables, is that of shadow paging. In this technique, the page table pages visible to a guest are never loaded into the MMU. Instead, the VMM maintains a shadow of these pages in hypervisor memory. Shadow paging has been used as an approach to memory management in VMMs for some time [Wal02, Gum83, HR91]. Although it imposes a higher overhead than writable page tables, the technique has many benefits. For instance, the set of machine pages underlying a guest may be modified while it is running, as the shadow page tables indirect the location of the machine page backing each virtual address. A limited form of shadow paging is used to track page dirtying in live VM migration, described below, while a more complete version is used in the taint-tracking work described in Chapter 6.
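The physical-to-machine table mentioned above can be illustrated with a short C sketch: when the guest writes a page table entry it translates its own contiguous pseudo-physical frame number into the machine frame number that the MMU will actually see. The table size, names and entry format below are invented for the example and are not Xen's real data structures.

    #include <stdint.h>

    #define PAGE_SHIFT 12

    /* One entry per guest pseudo-physical frame number (PFN), giving the
     * machine frame number (MFN) that actually backs it.  The machine
     * frames need not be contiguous even though the PFN space is. */
    static uint64_t phys_to_machine[1u << 20];

    /* Build a page table entry: the MMU resolves machine addresses, so the
     * guest must install the MFN, not its own PFN. */
    uint64_t make_pte(uint32_t pfn, uint64_t flags)
    {
        uint64_t mfn = phys_to_machine[pfn];
        return (mfn << PAGE_SHIFT) | flags;
    }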

2.2.2.3 Device I/O

Access to devices must be carefully managed in a VMM environment. OSes expect direct access to device hardware, and it is unsafe to allow multiple VMs to share concurrent hardware access to the same device. Xen protects hardware by allowing only a single VM to directly interact with a given device. The problem of multiplexing device access is addressed using split drivers: a single VM manages each physical device and multiplexes access to that device among the other VMs. The management of devices for virtual machines is the topic of this thesis, and the approach taken in Xen is discussed in great detail in Chapter 3. Device virtualization is responsible for partitioning resources other than processors and memory.
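As a preview of the 'asynchronous I/O rings' mentioned in Table 2.1 (the full device-channel design is the subject of Chapter 3), the C sketch below shows a minimal shared-memory producer/consumer ring of the kind a split driver pair might use. The structure and field names are illustrative rather than Xen's actual ring ABI, and a real implementation would add memory barriers and event-channel notifications.

    #include <stdint.h>

    #define RING_SIZE 32   /* power of two, so free-running indices can wrap */

    struct io_request  { uint64_t id; uint64_t sector; uint32_t op; };
    struct io_response { uint64_t id; int32_t status; };

    /* A single page shared between the frontend and backend VMs holds the
     * request and response slots plus the producer/consumer indices. */
    struct io_ring {
        volatile uint32_t req_prod, req_cons;
        volatile uint32_t rsp_prod, rsp_cons;
        struct io_request  req[RING_SIZE];
        struct io_response rsp[RING_SIZE];
    };

    /* Frontend: queue a request; returns 0 on success, -1 if the ring is
     * full.  The backend would then be notified via an event channel. */
    int frontend_submit(struct io_ring *r, const struct io_request *rq)
    {
        if (r->req_prod - r->req_cons == RING_SIZE)
            return -1;
        r->req[r->req_prod & (RING_SIZE - 1)] = *rq;
        r->req_prod++;
        return 0;
    }

    /* Backend: consume one outstanding request, if any. */
    int backend_poll(struct io_ring *r, struct io_request *out)
    {
        if (r->req_cons == r->req_prod)
            return 0;
        *out = r->req[r->req_cons & (RING_SIZE - 1)];
        r->req_cons++;
        return 1;
    }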


This section has summarized the set of approaches taken by Xen to paravirtualize the x86 architecture. This paravirtualization has resulted in the development of a stable, high-performance virtual machine monitor that is now used in many production environments around the world. The next section provides detail on a final aspect of Xen's impact on commodity systems, that of live VM migration. Live migration is worth discussing in some detail for two reasons. First, the ability to migrate a running VM between physical hosts has a major impact on how systems may be managed, and is an illustration of the attraction of VMMs to many organizations. Second, this thesis argues that hardware virtualization introduces an effective architecture with which to extend device functionality. Live migration provides an example that sets the stage for this argument by demonstrating the ability to implement a useful new feature below the operating system.

2.2.3 Live VM Migration

As mentioned above, virtual machine monitors maintain a narrow interface between the VM and the hypervisor, and a very limited amount of per-VM run-time state when compared, for example, with managing processes within a conventional OS. As a consequence, it is possible to perform operations on the entire state of a VM (for instance, suspending and resuming it to and from disk) because there are very few state dependencies across the VM-VMM boundary. Xen exploits this property, allowing a VM to be migrated to a new physical host while it runs. Live migration is a powerful management tool, in that it allows a facilities administrator both to balance load within a cluster and to free up a physical host for hardware maintenance. In both of these cases, the administrator leaves the applications running within the VM undisturbed. Live migration attempts to move most of the VM state to the new host while the VM continues to run. The only effect visible to the migrating VM is a brief pause in execution, as if it had been descheduled, and the possible loss of a small number of in-flight packets. Migration of server workloads results in service downtimes as low as 60ms.


[Figure: timeline of the live migration stages. Stage 0: Pre-Migration (VM running normally on Host A; an alternate physical host may be preselected for migration, with block devices mirrored and free resources maintained). Stage 1: Reservation (initialize a container on the target host). Stage 2: Iterative Pre-copy (enable shadow paging; copy dirty pages in successive rounds; overhead due to copying). Stage 3: Stop and copy (VM out of service; suspend the VM on Host A; generate ARP to redirect traffic to Host B; synchronize all remaining VM state to Host B; this is the only downtime). Stage 4: Commitment (VM state on Host A is released). Stage 5: Activation (VM running normally on Host B; the VM starts on Host B, connects to local devices, and resumes normal operation).]

Figure 2.5: Migration Timeline

2.2.3.1 Stages of Live VM Migration

Figure 2.5 summarizes the migration of a VM between two physical hosts. This process is effectively a transaction that can be aborted at all but the final stage. Each stage in the figure proceeds as follows:

Stage 0: Pre-Migration. Migration begins with an active VM on physical host A. To speed any future migration, a target host may be preselected, where the resources required to receive the migration will be guaranteed.

Stage 1: Reservation. A request is issued to migrate an OS from host A to host B. The system confirms that the necessary resources are available on B and reserves a VM container of that size. Failure to secure resources here means that the VM simply continues to run on A unaffected.

Stage 2: Iterative Pre-Copy. All of the pages of the VM’s memory are copied from A to B. A limited form of shadow page tables is used to track writes to copied pages, and the copy process iterates over the VM’s memory, recopying any dirty pages. The rate of copying is compared against the rate of dirtying as an indication of progress, and iterative copy stops when progress diminishes (a sketch of this loop follows the stage descriptions).

Stage 3: Stop-and-Copy. The VM instance at A is suspended. Its network traffic is redirected to B by generating an unsolicited ARP advertisement that the interface now resides at the new host. CPU state and any remaining inconsistent memory pages are then transferred. At the end of this stage there is a consistent suspended copy of the VM at both A and B. The copy at A is still considered to be primary and may be resumed in case of failure.

Stage 4: Commitment. Host B indicates to A that it has successfully received a consistent OS image. Host A acknowledges this message as commitment of the migration transaction: host A may now discard the original VM.

Stage 5: Activation. The migrated VM on B is now activated. The VM restores device connections and continues to run as normal.
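The iterative pre-copy of Stage 2 is the heart of the algorithm. The sketch referred to above follows; the types and helper functions (shadow_log_dirty_enable, snapshot_dirty_bitmap, send_page, and so on) are hypothetical stand-ins for the real migration code in the Xen control tools.

/* A minimal sketch of the iterative pre-copy loop (Stage 2), assuming
 * hypothetical helpers: shadow_log_dirty_enable() turns on the limited
 * form of shadow paging used to track dirtying, snapshot_dirty_bitmap()
 * returns and resets the set of pages dirtied so far (initially all
 * pages), and send_page() copies one page to the destination host. */
static void iterative_precopy(struct vm *vm, int dest)
{
    shadow_log_dirty_enable(vm);

    for (;;) {
        unsigned long pfn, sent = 0, still_dirty;
        unsigned char *bitmap = snapshot_dirty_bitmap(vm);

        for (pfn = 0; pfn < vm->nr_pages; pfn++) {
            if (page_is_dirty(bitmap, pfn)) {
                send_page(dest, vm, pfn);
                sent++;
            }
        }

        /* Pages dirtied while this round was being copied. */
        still_dirty = count_dirty_pages(vm);

        /* Stop when progress diminishes: either the guest dirties pages
         * as fast as they can be copied, or the remaining set is small
         * enough to move during the stop-and-copy stage. */
        if (still_dirty >= sent || still_dirty < FINAL_COPY_THRESHOLD)
            break;
    }
    /* Stage 3 then suspends the VM and transfers the final dirty set
     * together with CPU state. */
}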

2.2.3.2 Migration Performance

Figure 2.6 demonstrates the effectiveness of live VM migration. In the experiment, an Apache 1.3 web server is migrated between a pair of Dell PE-2650 servers. Each host has dual 2GHz Xeon CPUs and 2GB of memory, and the two are connected over switched gigabit Ethernet using Broadcom TG3 interfaces. As the VM migrates between physical hosts, network-attached storage is used to achieve location-transparent access to the system image, which is accessed using an iSCSI mount exported from a NetApp F840 storage server. The web server VM is configured with 800MB of memory, and is loaded with 100 clients, each continuously requesting a 512KB file. As shown in the figure, the server achieves an initial throughput of 870Mbit/sec. Migration begins, and is configured to use a maximum initial bandwidth of 100Mbit/sec. The first copy iteration can be seen to reduce the achieved bandwidth of web traffic during the approximately 60 seconds required to transfer the initial memory image. The copy process then iterates several times over memory during the following ten seconds before stopping VM execution and copying the final state.


[Figure: throughput (Mbit/sec) against elapsed time (secs) for the migrating web server, titled “Effect of Migration on Web Server Transmission Rate”. The plot is annotated with an initial rate of 870 Mbit/sec, a first pre-copy round of 62 secs at 765 Mbit/sec, further iterations of 9.8 secs at 694 Mbit/sec, and a total downtime of 165ms. The workload is 100 concurrent clients fetching 512KB files; samples are taken over 100ms and 500ms intervals.]

Figure 2.6: Results of Migrating a Running Web Server VM

As shown, the copy throughput

increases during the iterative copy process, as the migration algorithm at- tempts to use extra bandwidth to match the page dirtying rate. The total downtime experienced by the server is 165ms, after which it returns to full performance on the new physical host. A more detailed exposition of live VM migration is presented in [CFH+05], which discusses several other server examples and provides additional detail on the iterative copy algorithm. The intention of describing VM migration in this context has been to illustrate an example of the qualitative impact that VMMs provide in managing computer systems.

2.3 Summary

Hardware virtualization, initially developed in the 1960s for the management of mainframe computers, has received renewed attention as a means of managing commodity computer systems. VMMs allow the resources of a physical host to be divided between a set of virtual machines, each of which has the appearance of running on its own dedicated host. The key property achieved by virtualization is isolation: servers may be consolidated onto common physical hardware, and remain isolated from each other in all respects save the failure of the physical hardware. This isolation is not limited to performance and resource control, as servers are also isolated in terms of configuration, security, and administrative control.


The administrative isolation achieved by VMMs is particularly relevant in modern systems, where organizations are attempting to centralize computing hardware and increase its utilization. The introduction of VMMs divides the administrative role into two parts, allowing individual VMs to be administered directly by the responsible system administrators, while the hardware may be administered by facilities administrators without requiring access to the contents of individual VMs.

Xen is a virtual machine monitor for commodity hardware that achieves the goals of hardware virtualization. Using paravirtualization (changing the exported hardware interface and requiring that guest OSes be explicitly ported), Xen overcomes the inherent non-virtualizability of the x86 platform and also achieves drastic improvements in VM performance.

The availability of a robust VMM for commodity hardware is demonstrating the opportunity to achieve qualitative change in software systems by working below the operating system. An example of this type of change is live VM migration, which allows a running VM to be relocated to a new physical host with only milliseconds of downtime. Live migration exploits the coarse granularity of the VM, and the limited shared state that exists between it and the VMM. Moreover, the approach is independent of the contents of the VM being migrated, and so applies to any operating system that runs on Xen.

Another example of the benefits of using virtualization to work at a low level within the system is that of device management. The next chapter discusses the challenges and opportunities in designing and managing virtual device interfaces, and presents a set of approaches for accessing and extending devices in VMM-based environments. The remaining chapters of the thesis validate this approach by demonstrating examples of virtual devices.

Chapter 3

The Soft Device Architecture

This chapter motivates and explains the core mechanisms proposed by this thesis, phrased as solutions to three successive architectural problems relating to I/O devices encountered in the design and deployment of VMMs. These problems are as follows:

1. Device support in a VMM. How should device access within a VMM be structured so as to allow guest OSes to share hardware resources, while balancing issues of performance, dependability, security, and software maintenance effort?

2. Device-level extensions. How can device-level system interfaces be extended to allow the safe introduction of new features with reasonable performance, while making extension development both reasonably easy and portable across OSes?

3. Managing device access in VMM-based clusters. As virtualization is deployed into large cluster environments, how can devices be managed in a facility-wide manner, while catering to new capabilities, such as migration, that are afforded by VMMs?

The first of these problems has been encountered and addressed in the course of the group’s work on Xen. Our paper on safe hardware access [FHN+04] describes how various techniques, in particular split device drivers, may be used to address the many challenges faced in providing device access in a VMM. The design and implementation of split drivers is presented in this chapter as necessary background to understanding the extension framework that I have designed and built. While split drivers are the result of collaboration with the group at Cambridge, the exposition in this thesis is of


considerably greater detail than has been published in the past, and serves as a contribution of this thesis in the same vein as the Xen history presented in the previous chapter. This chapter describes how the split driver model may be extended to allow generic device extensibility, and then describes how extensions to individual devices on a single physical host may be aggregated across a collection of hosts to form device services, allowing the isolation of device provisioning and management within a cluster environment. At the end of this chapter, the reader should understand how the three problems described above are solved by my work. The remainder of the thesis aims to validate these mechanisms by describing a set of useful device extensions that have been built above them. While these contributions stem from experience in the development of Xen, I will further argue that the ability to provide low-level device extensions has hitherto been impractical in conventional operating systems, and is a strong motivation for the broad deployment of VMMs on future systems. The next section provides a detailed argument for the benefit of device extensions and the challenges involved in providing them. The remainder of the chapter then presents the solutions to the three problems described above.

3.1 Extending Devices

At the core of this thesis is the notion of a low-level device extension. In general terms, an extension is a transformation of the behaviour of a device that lives behind the device interface presented to an operating system. As such, device drivers themselves, which map OS interfaces onto physical device hardware, are one form of extension. However, the work in this thesis is primarily concerned with extensions that add new functionality to a device, as if that functionality were a part of the device itself. This composition of physical hardware and software extensions, shown in Figure 3.1, is described as a soft device. Soft devices preserve the interface presented to the OS, and so may be easily deployed in place of the raw device by nature of interposition. They may optionally present an out-of-band control interface, allowing the extension to be manipulated by external tools.


[Figure: a soft device is composed of a physical device and a device extension; it is presented to the operating system over the unmodified device interface, and may expose an out-of-band control interface to external control software.]

Figure 3.1: Soft Device Overview

The ability to easily construct device extensions is desirable to both hardware and software developers, for similar reasons. OS developers may desire to provide new, low-level functionality that is not available in hardware. By extending a disk at or very near the device driver interface, for instance to provide encryption, an extension may be applied across all of the file systems provided by the OS, while concurrently remaining applicable across the wide range of physical disk devices that may be used. Conversely, hardware designers may desire to prototype new functionality in software first, TCP offload for instance, to evaluate the benefits of new features and their impact on system software prior to beginning expensive hardware prototyping.
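To make the disk encryption example concrete, the sketch below shows the shape such an extension takes when interposed on an idealized block interface. The request structure and the helpers (cipher_page, pass_to_real_device, return_to_guest) are hypothetical illustrations rather than part of the framework described later in this chapter.

#include <stdint.h>

/* Sketch of a block extension that encrypts data inside a soft device.
 * The types and helpers are hypothetical; the point is that the
 * extension sits behind the unmodified block interface, so neither the
 * guest OS nor its file systems need change. */
struct block_request {
    int      is_write;
    uint64_t sector;       /* virtual block address */
    void    *page;         /* 4KB of data to write, or buffer to fill */
};

enum cipher_dir { ENCRYPT, DECRYPT };

void cipher_page(void *page, uint64_t sector, enum cipher_dir dir);
void pass_to_real_device(struct block_request *req);
void return_to_guest(struct block_request *req);

static void soft_device_submit(struct block_request *req)
{
    if (req->is_write)
        cipher_page(req->page, req->sector, ENCRYPT);  /* encrypt before it reaches the disk */

    pass_to_real_device(req);                          /* unmodified data path */
}

static void soft_device_complete(struct block_request *req)
{
    if (!req->is_write)
        cipher_page(req->page, req->sector, DECRYPT);  /* decrypt on the way back up */

    return_to_guest(req);                              /* response is otherwise unchanged */
}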

3.1.1 Performance and Safety

On conventional systems, providing device extensibility is a very challeng- ing problem. As a subset of existing work into the more general problem of operating system extensibility, device extension code must strike a bal- ance between requirements for both performance and safety: On one hand, extensions must be sufficiently low-overhead as to be realistically deploy- able, while on the other they must not represent a liability to the system in the case of crashes. The first requirement has traditionally meant attempt- ing to move extensions into the kernel, where they do not incur overheads on protection domain crossings. Meanwhile, the second requirement has historically involved the exploration of mechanisms to ensure the safety of extensions that run in the same protection domain as privileged code, such as type-safe languages. This thesis argues that VMMs provide a considerably more appropriate cut-


point for the introduction of extensions than do OSes, because the interface to devices that is visible at the VMM level is considerably narrower than that of the OS. With regard to the two issues just mentioned, performance and safety, the approach taken in this thesis will be to isolate extensions in a separate virtual machine. This extreme isolation of extensions results in what is likely the highest possible degree of safety that can be provided on conventional hardware: extensions are isolated from both crashes and resource over-consumption by virtue of the VM sandbox. By beginning with strong isolation, this approach treats performance simply as a requirement of any practical solution, specifically that extensions must be “fast enough” to deploy in practice. The performance impact of VM isolation will be demonstrated later in this chapter to be low enough to build deployable extensions for both disk and network devices. Rather than focusing on microbenchmarks, performance is characterized using complete, whole-extension benchmarks to demonstrate the ability to saturate both disk and network interfaces on modern server-class hardware. It is worth noting that while this is not a performance-centric thesis, the results described here far exceed the published results of other recent VMM-based extension work [WCSG04], which is unable to saturate device hardware due to substantial extension and virtualization overheads.

3.1.2 Software Engineering

While concerns of performance and safety have dominated past research into the more general problem of OS extensibility, they are only a portion of the set of concerns that should be considered. In practical terms, modern systems have been at an impasse with regard to the development of deployable extensions for quite some time. Considering device drivers as an extension (as described at the start of this section), developers must re-implement driver support for each OS that uses a piece of hardware, and often across versions of individual OSes. The difficulty of maintaining OS extensions at the device level has been described with respect to tracking the Linux kernel for the Nooks driver isolation work [FGCW05, SBL03]. The lack of uniformity, and in some cases availability, of driver interfaces across OSes has been described as a barrier to entry for new OSes and a considerable overhead to device vendors [UDI99]. Indeed, one of the major contributions of the Flux OSKit [FBB+97] was to provide a bridge to an existing pool of

drivers for new OS projects. This is a crucial point in the consideration of operating systems throughout recent history: Drivers themselves have not been portable across operating systems, and so a generalized notion of device extensions has obviously been impossible. A major claim of this work is that the structure of a VMM lends itself intrinsically to freeing OSes from the device-level ossification that has been present in systems over the past twenty years, and represents a major opportunity to enable innovation in low-level systems code.

The problem of supporting physical devices was encountered in the early versions of Xen in that, as with most previous systems, physical device drivers were a part of the VMM itself. The original version of Xen was based on the Linux kernel, and was left having to track changes to that source in order to support newly emerging devices. This situation held for almost a year of development, and proved to be a tedious and time-consuming task. Xen tackled the problem by removing device support from the VMM altogether, opting instead to present raw device interfaces to “device VMs”. This approach, which is described in the following section, allows the use of physical device drivers from any OS that runs on Xen, and frees the VMM from needing to track a specific OS’s device support.

This is equally applicable to soft device-based extensions. A major concern in allowing the development of extensions is catering to maintainability: Extensions should remain useful over time, despite the fact that the OS code bases that they serve are both varied and rapidly evolving. The split device architecture described in the next section introduces narrow, idealized device interfaces which form the basis of the extension mechanisms. By interposing on these interfaces, extensions may be applied across any OS that is used on the VMM.

Besides maintenance, a second major issue in allowing the construction of device extensions is the interface available to the developer. Traditionally, such extensions are written directly within the source of the target OS. In addition to the maintenance concerns already discussed, this form of development imposes a great deal of initial overhead on the developer. Working within the source of an OS requires that the developer have a reasonably thorough understanding of the subsystem that they hope to extend, in addition to general programming idioms such as memory allocation and locking, which are specific to individual OSes. This developmental barrier to entry makes extension development unattractive, especially to novice developers, and results in code that is likely to destabilize a system. The extension mechanism described in Section 3.3 allows the development of extensions in the user space of the isolated extension VM. Developers are free to use whatever OS and development tools they desire, and may also incorporate debuggers, which are often unavailable when developing within the OS. This is similar to the approach taken with user-level device drivers in other systems [EG04, Chu04, Hun97].

VMMs are a very coarse-grained tool for the decomposition of systems software. They provide isolation at the level of an entire operating system, and only primitive communications mechanisms across VMs. The solutions described in the remainder of this chapter aim to demonstrate that in the case of devices this decomposition is a useful one, specifically because of the strong isolation that is provided between the component parts: Not only are extensions isolated from crashes and resource exhaustion that would traditionally render a system unstable, but their source code and administration are separated from those of both the devices and the OSes that they are applied to.

3.2 Split Drivers

As described above, including device support directly within the VMM presents two major problems. First, a driver base is typically borrowed from an existing operating system such as Linux, resulting in the software maintenance responsibility of tracking that source to ensure that drivers remain up-to-date. Second, as drivers are a major source of bugs resulting in system instability, their inclusion in the VMM itself destabilizes the platform as a whole. As the VMM forms the bottom-most software layer in a system, it is desirable to minimize its footprint in the interest of achieving some confidence of reliability and security. At the time of this writing, the Xen hypervisor for x86 is almost exactly 60K physical source lines (SLOC) of C and 1727 lines of assembler (all source measurements were gathered using SLOCCount, available at http://www.dwheeler.com/sloccount/). In contrast, the Linux 2.6.12 drivers


subtree contains 2.1 million source lines of code. Not only is this a massive amount of code, it is also rapidly changing. A November 2000 analysis of the 2.2.16 kernel reported only 870,000 source lines of driver code, and a total kernel size (including drivers) of just over 1.5 million SLOC [Whe00]. To address this issue, Xen uses an approach involving split drivers. Physical device drivers are removed from the VMM and run in isolated VMs, where faults may be contained and native OS drivers may be used. A pair of virtual device drivers is then used to allow client, or frontend, domains to access physical devices managed by the backend VM. The remainder of this section elaborates on the motivation for split device drivers and explains the details of their construction.

3.2.1 Isolating Driver Code

As of Xen 2.0, drivers were removed from the hypervisor and run in driver VMs. The VMM’s responsibility was thus reduced to the simpler task of restricting access to device hardware to the device VMs managing individual devices. On the x86, achieving this isolation involved the following general mechanisms:

• I/O registers. Memory-mapped device registers are restricted to the managing device VM; Xen validates attempts to include device registers in a VM’s page tables. The x86’s I/O port space access bitmap is updated on context-switch into a VM to ensure that I/O port access instructions (e.g. INB and OUTB) may only be applied to managed devices.

• Interrupt notification. Xen retains control over the system’s interrupt controller and provides a virtualized interrupt notification to the appropriate VM. This interrupt virtualization, called an event channel, is described below.

• Device configuration. Xen virtualizes access to the generic PCI configuration space through which devices are detected and configured. VMs are able to see only the devices that they manage, and the hypervisor is able to validate configuration requests to ensure that they pertain to a permitted device.


This approach allows unmodified drivers to be hosted in individual driver VMs, where crashes may be isolated and drivers may be restarted upon failure without necessitating the restart of client VMs. A single weakness in the isolation provided by this model is that of device access to machine memory through DMA on modern systems: While drivers are prevented from accessing memory outside their own VM, they may program device DMA to read or write arbitrary memory within the system. This problem is a hardware weakness that is addressed by the availability of IOMMUs on modern chipsets; with the use of such MMUs, devices may be restricted to access a specific subset of the host’s memory. Practically speaking, this DMA weakness on current x86 hardware does not present a security or stability problem with regard to the interface presented to frontend VMs in the split driver model. Front ends access devices through narrow interfaces in which all addresses passed to devices are validated and typically translated; the DMA weakness is simply that malicious (or very badly broken) drivers in the backend VM could potentially compromise memory isolation in the system in the absence of an IOMMU.

3.2.2 VM Device Interface

With drivers themselves isolated in individual VMs, a second major design issue in the VMM is in deciding how to represent devices to VMs wishing to share access to a single device. While devices with a single client VM may be exported directly, many devices – storage and networking in particular – will be used by many VMs concurrently, and the hardware interface may not be safely shared. The problem of multiplexing access to device hardware has been a fundamental issue in the design of VMMs throughout their history, and there are two general solutions: hardware emulation, and idealized interfaces. In hardware emulation, a VM is presented with a virtual piece of hardware, typically a simple and common device for which drivers are broadly available. The VM runs the associated driver, and the VMM emulates this hardware, decoding operations and mapping requests down onto the physical device as appropriate. This approach is taken by VMware Workstation [SVL01], where the desire to support a wide range of unmodified operating systems clearly supersedes the requirement for high-performance device access: As physical devices often involve several emulated I/O instructions per actual operation, this approach generally involves a reasonably high overhead.


[Figure: a device driver VM multiplexes a set of clients’ access to a given device. The backend runs the unmodified (e.g. Linux) device driver and a per-device-class backend driver, which handles requests arriving from connected frontend clients over a device channel through Xen. The backend driver presents one or more virtual device instances to each client VM.]

Figure 3.2: Split Driver Structure

A second approach to sharing device access is to present an idealized interface representing a class of device (e.g. a generic network interface) to the VM, and to write a new device driver specifically for this interface. This is the approach taken in Xen, and it has been used successfully in the past by other systems [BDGR97b, EFO95, WSG02]. Figure 3.2 shows the structure of split drivers that results from the combination of isolating physical drivers into driver VMs and exporting access to these devices over idealized device interfaces. Virtual devices are accessed over device channels – a combination of shared memory and event notification – between frontend and backend VMs. The remainder of this section details the design and implementation of split drivers.

3.2.3 The Data Path: Device Channels and Grant Tables

Device channel is the general term used to describe communications between VMs above Xen. Xen does not attempt to abstract inter-VM communications by providing a specific inter-process communication (IPC) API. Instead, it provides the primitive mechanisms upon which arbitrary forms


of IPC may be composed. These primitives are broken into two compo- nents: Notification mechanisms, provided by the event channel interface, allow one VM to send a one-bit signal to another. Data Transfer mecha- nisms allow the sharing and exchange of memory pages between VMs. By building on top of these two sets of interfaces, VMs may construct arbitrary communications mechanisms, allowing complete flexibility in issues such as message batching and buffer sizes.

3.2.3.1 Event Channels

Event channels provide a notification mechanism between VMs: effectively a virtualized interrupt controller. Each VM has a bitmap of 1024 “ports”, on which it may receive a one-bit notification. The API provides a pair of interfaces to bind ports between VMs, generating mappings in Xen that are represented by the tuple (domain1, port1, domain2, port2). Once a channel has been established, either end may request that Xen generate a notification, which will result in the event bit being set for the receiving VM and a hint to Xen that it should be scheduled at some point in the future. Aside from notification requests, Xen provides simple interfaces to manage event channels. A channel is created in one of two ways. Either a special privileged domain requests the creation of a channel between a pair of domains, in which case Xen binds the channel and returns the pair of newly assigned ports; alternatively, a VM may request that an unbound channel be created between itself and another domain. In the latter case, Xen assigns a local port to the unbound channel, and at some time in the future the remote domain may request to connect to the (domain, port) pair, resulting in the completion of the binding. Handling notifications is left to OS implementations, where they are generally bound to the interrupt subsystem. XenLinux provides mechanisms to assign interrupt handler functions to specific ports, allowing a convenient binding between event mechanisms and the existing code. The event interface also provides a bitmap, matching the notification ports, on which a VM may mask inbound notifications. This provides a means for a VM to request that Xen not activate it on the arrival of new events to a given port.
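The sketch below illustrates how a frontend and backend might use this interface to set up notification for a device channel. The wrapper names (evtchn_alloc_unbound and so on) are placeholders for the underlying hypercall operations, whose exact names and signatures have varied across Xen releases.

/* Illustrative event channel set-up between a frontend and a backend.
 * The evtchn_* wrappers are hypothetical stand-ins for the real
 * hypercall operations. */

#define FRONTEND_DOMID 3      /* example domain IDs */
#define BACKEND_DOMID  1

/* Frontend: allocate an unbound local port that the backend may bind.
 * The port number is advertised to the backend over the control path. */
int frontend_alloc_port(void)
{
    return evtchn_alloc_unbound(BACKEND_DOMID);
}

/* Backend: complete the binding using the advertised (domain, port). */
int backend_bind_port(int remote_port)
{
    return evtchn_bind_interdomain(FRONTEND_DOMID, remote_port);
}

/* Either end may then raise a one-bit notification on its local port;
 * the receiver sees the corresponding bit set in its 1024-entry port
 * bitmap and, in XenLinux, the interrupt handler bound to that port is
 * invoked.  A matching mask bitmap can suppress activation. */
void kick_remote(int local_port)
{
    evtchn_notify(local_port);
}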


3.2.3.2 Shared Memory

In order to allow data to be moved efficiently between domains, Xen provides a set of primitive operations to manage shared memory. The presence of a VMM adds an additional level of virtualization to memory in the system: We refer to the physical memory on a system as machine memory, as it is addressed using the physical memory addresses of the machine. This memory is then subdivided and presented to a VM as a pseudophysical memory region, which the VM will treat as physical memory. These terms for describing memory in a VMM-based system are carried forward from Disco [BDGR97b].

Xen maintains a table mapping the assignment of machine pages to pseudophysical page addresses in each domain. Each machine frame on the host is either unallocated, or belongs to a specific domain. Domain pages are both typed and reference counted by Xen: typing ensures that Xen is aware of special-purpose (e.g. page table) pages and can validate them specially, while reference counting ensures that mapped pages remain available until all mappings in a VM are released.

Paravirtualized OSes achieve high performance by working with both pseudophysical and machine page addresses. A VM may directly fill PTEs using machine frame numbers, allowing them to be used directly by the hardware after being validated by Xen. VMs typically take advantage of a physical-to-machine mapping table to translate between pseudophysical frame numbers, used by an OS to manage memory, and the machine frames used by Xen. The range of pseudophysical pages a VM has represents the maximum amount of memory that may be assigned to that VM. However, it is possible to reduce the memory that is actually assigned to a VM using a balloon driver [Wal02]. Installed in a guest OS, the balloon driver provides a means to internally reserve an amount of memory, and then release the underlying machine pages back to Xen. This is a useful technique to resize the amount of memory available to a VM at runtime.

To share memory in Xen, pages may be either mapped or transferred between domains. Memory may be mapped into a VM’s pseudophysical address space by first releasing a range of pages using the balloon driver. To allow virtual memory mappings to these “foreign” pages, Xen provides a hypercall interface that allows machine addresses from other domains to be mapped


into a VM’s page tables. These calls ensure that mappings are valid and permitted, and that pages are correctly reference counted. In some situations, it is not appropriate to simply map memory belonging to a foreign domain. A primary example of this is in providing high-speed network communications, where we desire to deliver a received packet directly to the intended VM without incurring an extra copy. As the destination is not known until the packet has been received into memory and its headers have been examined, it is impossible to pre-map a destination page to receive the data into. Instead, VMs use the balloon interface to relinquish a set of pages to a free list, into which packets are received. A page containing a received packet is then transferred to the appropriate domain: ownership of the page moves to that VM, and the page is added to the VM’s pseudophysical memory.

3.2.3.3 Grant Tables

The map and transfer facilities described above provide the necessary prim- itives for page sharing between VMs, but do not provide much flexibility in terms of managing the ability to share pages. Until Xen 3.0, the ability to modify memory mappings was a single privilege flag in a VM’s domain structure – a VM could either make no foreign memory requests at all, or it could modify mappings belonging to any domain on the system. Motivated primarily by the need to provide an isolated device driver inter- face, but also by the broader goal of designing general page-sharing mech- anisms for VMs, grant tables were added as an extension to the existing memory mapping interface described above. Grant tables reflect a sim- ilar approach to the unbound event channel allocation scheme described in 3.2.3.1: A VM may “grant” access to a page of its memory to another domain, and at some point in the future, the remote domain may map that page into its own address space. The grant table interface introduces the explicit notion of a foreign map- ping: the ability of a VM to map a page that it does not own, given the owning VM’s permission. To create a foreign mapping of a memory page, a VM must present a valid grant reference to the hypervisor in lieu of the page number. This reference comprises the identity of the granting VM, and an index into that VM’s grant table. Every VM owns a private grant


table that it shares only with the hypervisor, in which each entry is a tuple (T, V, P, R, U). An entry with T = map permits VM V to map page P into its address space; asserting the boolean flag R restricts V to read-only mappings. The flag U is written by the hypervisor to indicate whether V currently maps P (i.e., whether the grant tuple is in use).

When the hypervisor is presented with a grant reference (A, G) by a VM B, it first searches for index G in VM A’s active grant table (AGT), a private table that is only accessible by the hypervisor. If no match is found, it reads the appropriate tuple from the guest’s grant table and checks that T = map and V = B, and that R = false if B is requesting a writable mapping. Only if the validation checks are successful will the hypervisor copy the tuple into the AGT and mark the grant tuple as in use. The AGT is a private, in-hypervisor copy of the VM’s grant table that also includes additional book-keeping data such as mapping reference counts. Since it cannot be accessed by a malicious or buggy VM, the hypervisor can guarantee the integrity of this critical data.

The hypervisor tracks uses of grant references by associating a usage count with each AGT entry. Whenever a foreign mapping is created with reference to an existing AGT entry, the hypervisor increments that entry’s count. The grant reference cannot be reallocated or reused by the granting VM until the foreign VM destroys all mappings that were created with reference to it.

The grant table interface supports transfer operations similarly: A transfer tuple is of the form (T = transfer, V, P), permitting VM V to transfer ownership of one of its pages to the granting VM. Each transfer tuple permits the transfer of a single page.
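Expressed in code, the mapping check looks roughly as follows. The structures, error values, and helper functions (grant_table_of, activate_grant) are simplified and hypothetical, but mirror the tuple fields and validation steps described above.

/* Sketch of grant-reference validation for a map request, following
 * the tuple (T, V, P, R, U) described above.  Types and helpers are
 * hypothetical simplifications of the real interface. */

enum grant_type { GRANT_UNUSED, GRANT_MAP, GRANT_TRANSFER };

struct grant_entry {            /* one row of a VM's grant table        */
    enum grant_type type;       /* T: map or transfer                   */
    int             allowed_vm; /* V: the VM permitted to use the grant */
    unsigned long   page;       /* P: the granted page                  */
    int             readonly;   /* R: restrict V to read-only mappings  */
    int             in_use;     /* U: written by the hypervisor         */
};

/* Called when VM b presents grant reference (a, g), asking for a
 * mapping; returns 0 on success, nonzero on failure. */
int validate_grant_map(int a, unsigned int g, int b, int writable)
{
    struct grant_entry *e = &grant_table_of(a)[g];

    if (e->type != GRANT_MAP || e->allowed_vm != b)
        return -1;                    /* not granted to this VM     */
    if (writable && e->readonly)
        return -1;                    /* write access not permitted */

    /* Copy the tuple into the active grant table (AGT), which only the
     * hypervisor can access and where a mapping reference count is
     * kept; then mark the guest-visible entry as in use. */
    activate_grant(a, g, e);
    e->in_use = 1;
    return 0;
}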

3.2.3.4 Communication Rings

The current split drivers structure each device channel as a bi-directional ring buffer with two pairs of producer/consumer pointers. A balanced number of request (client to server) and response (server to client) messages are passed back and forth on the ring. As shown in Figure 3.3, a ring is partitioned into request and response queues, and domains only work within their own space. This can be thought of as a double producer-consumer ring – the ring is described by four pointers into a circular buffer of fixed-size records. Pointers may only advance, and may not pass one another. By adopting the convention that every request will receive a response, not all four pointers need be shared and flow control on the ring becomes very easy to manage. Each domain maintains its own consumer pointer, and the two producer pointers are visible to both. As shown in Figure 3.3, the request and response message regions chase each other around the ring.

[Figure: the shared ring holds a request queue (requests queued by the frontend but not yet accepted by the backend), message slots awaiting a response from the backend, a response queue, and unused slots. The request producer and response producer pointers are shared, updated by the frontend and backend respectively; each side keeps the corresponding consumer pointer private.]

Figure 3.3: Shared-Memory Ring

To ensure safety in accessing the shared memory region, drivers keep private copies of in-use ring data. This generally includes retaining copies of requests that are being processed, and private versions of producer pointers. This “defensive” approach to managing the shared data structure provides stability in the case of a misbehaved or malicious driver on the other side of the ring, and furthermore enables recovery after driver failure or VM migration. Bi-directional rings can trivially be converted into uni-directional rings by using only a single pair of producer/consumer indices. Unbalanced communications, for instance console I/O, may be implemented using a pair of uni-directional rings – one in each direction.
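The ring discipline can be captured in a few lines of code. The layout below is a simplified sketch rather than the actual Xen ring definitions; the request_t and response_t types are left abstract, and wmb() stands for whatever write memory barrier the platform provides.

/* Simplified sketch of the bi-directional shared ring.  Only the two
 * producer indices live in the shared page; each side keeps its own
 * consumer index private, and indices only ever advance. */

#define RING_SIZE   64                    /* fits on a single 4KB page */
#define RING_IDX(i) ((i) & (RING_SIZE - 1))

struct shared_ring {
    unsigned int req_prod;                /* written by the frontend */
    unsigned int rsp_prod;                /* written by the backend  */
    union { request_t req; response_t rsp; } slots[RING_SIZE];
};

/* Frontend: queue a request; the caller later decides when to notify. */
void push_request(struct shared_ring *r, unsigned int *req_prod_pvt,
                  const request_t *req)
{
    r->slots[RING_IDX(*req_prod_pvt)].req = *req;
    (*req_prod_pvt)++;
    wmb();                                /* publish the slot before the index */
    r->req_prod = *req_prod_pvt;
}

/* Backend: consume any requests it has not yet seen, taking a private
 * copy of each (the "defensive" discipline described above).  Because
 * every request receives exactly one response, free space can be
 * inferred without sharing the consumer indices. */
int pop_request(struct shared_ring *r, unsigned int *req_cons_pvt,
                request_t *out)
{
    if (*req_cons_pvt == r->req_prod)
        return 0;                         /* nothing pending */
    *out = r->slots[RING_IDX(*req_cons_pvt)].req;
    (*req_cons_pvt)++;
    return 1;
}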

[Figure: six panels illustrating a block read over a device channel. (1) After initialization, the frontend and backend are connected over a device channel consisting of a shared-memory ring (allocated by the frontend) and a pair of event channels. (2) As block reads are issued to the frontend driver, requests are added to the ring containing grant references to pages (a, b, and c) into which the data should be placed on arrival from disk. (3) Once a batch of requests has been placed on the ring, the frontend issues an event channel notification to the backend, activating the backend’s interrupt handler. (4) Upon receiving notification, the backend reads and validates the pending requests from the ring, maps the granted pages, and issues block read requests to disk using the physical device driver. (5) The block device transfers data directly into the frontend’s memory. On completion, the backend writes response messages onto the ring, and sends an event notification to the frontend. (6) The frontend receives the notification and calls its associated block interrupt handler. It removes the batch of pending responses from the ring, and passes the filled pages up to the higher levels of the OS.]

Figure 3.4: Device Channel Example: Block Read

3.2.3.5 Device Channel Example: Block Read

Figure 3.4 shows a high-level illustration of how device channels are used to issue a block read request across a split driver. Once the device channel has been initialized, the frontend places a set of read requests on to the shared ring. Request messages placed on the ring are described in more detail later in this section; they contain details such as the virtual block address to read and grant references for the pages of memory into which the inbound data should be placed. Requests often arrive in batches, both due to the nature of block workloads and the existence of OS prefetching techniques. Once a batch of requests has been placed onto the ring, the frontend will send a notification on the event channel to trigger activation of the backend driver. The event channel maps to an interrupt handler in the backend driver. When the notification is received, the backend will remove pending entries on the


ring, advancing its request consumer pointer as described above. Each request is validated, the grant references are mapped into the backend VM, and requests are issued to the physical device driver. As the request carries grant references to the machine pages to read into, the data is transferred directly into the frontend’s memory. The backend is notified as read operations complete, at which point it writes response messages onto the shared ring and then notifies the frontend that requests have completed. Block writes work almost identically to reads, except that data moves from, instead of to, frontend memory. The mechanisms of passing request messages on shared rings with references to associated data pages, and of using event channel-based notification to achieve batching, are common to the other split drivers. As described later, the network drivers use grant transfer as opposed to mapping operations.
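Pulling the ring and grant operations of this example together, the backend’s handler has roughly the following shape. The helper names are hypothetical, and batching, error handling, and response construction are summarized only in comments.

/* Rough shape of the backend's handler for block-channel events,
 * combining the ring and grant mechanisms described above.  Helper
 * names are hypothetical. */
void blkback_handle_event(struct blk_backend *be)
{
    blkif_request_t req;

    while (pop_request(be->ring, &be->req_cons, &req)) {
        if (!request_is_sane(be, &req))
            continue;                     /* reject malformed requests */

        for (int i = 0; i < req.nr_segments; i++) {
            /* Map each granted frontend page into the backend's
             * reserved pseudophysical hole, then issue real block I/O
             * against the mapped page; the device DMAs directly into
             * (or out of) the frontend's memory. */
            void *page = map_granted_page(be->frontend_domid,
                                          grant_ref_of(req.frame_and_sects[i]));
            submit_physical_io(be, &req, i, page);
        }
    }
    /* Completion callbacks later unmap the grants, place response
     * messages on the ring, and notify the frontend's event channel. */
}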

3.2.4 The Control Path: Control Interfaces and XenStore

The control interfaces in Xen have evolved constantly throughout its development. Complementing the device channels, the control interfaces provide a non-performance-critical, general API that allows management tools and VM-based OSes to share control information. The control interfaces are used to perform tasks like device setup, which is generally a two-phase process: First, the control tools indicate to a backend driver what virtual devices it should export to a new VM. Second, the VM boots, probes its associated backends for available devices, and negotiates device channel-based connections to them.

3.2.4.1 XenStore

Xen unifies VM configuration data in a persistent store. XenStore provides a hierarchical name space within which configuration data can be exchanged. The store provides a persistent view of the configuration state of the system, and a point of indirection through which many clients may interact around a common set of configuration interests. This approach is similar to the shared whiteboards used in distributed computing, and as such is intended to scale out beyond a single host to allow configuration management for clusters of VMs. XenLinux includes a XenBus driver, which allows the mapping of portions of the store onto the Linux device enumeration mechanisms. In this manner, virtual device configurations may be written into the store, and trigger registration in a guest VM as if a device had been hotplugged into the system.

[Figure: a simplified view of XenStore and its clients. Transactional interface: XenStore is a persistent, hierarchical repository for VMM configuration data; sets of updates may be made to the store transactionally, as defined by Gray’s ACID properties. Watch-based activations: in addition to transactional updates and queries, clients may register watchpoints at paths within the hierarchy; the watchpoint mechanism provides a callback when data within the enclosed subtree is updated. User/kernel access: access to the store is available both in VM kernels and to user-space applications (through libxenstore); control tools write initial configuration information to the store as a VM is created. XenBus: an in-kernel driver exists specifically to manage interactions with the store, providing interfaces for frontend drivers to access the store, helper utilities for common tasks such as initializing device channels, and a mapping of watches onto appropriate callback mechanisms; in Linux, XenBus maps device information contained in the store onto the kernel’s device probing subsystem. The structure of the store contents has been simplified for illustrative purposes.]

Figure 3.5: XenStore Overview

XenStore’s hierarchical name space bears a strong similarity to a traditional file system. Each domain is represented by a subdirectory in the store, and aspects of a domain’s configuration are placed in subdirectories within that domain’s tree. Values in the store behave much like files in a traditional file system, but generally contain human-readable text. The store supports transactional updates, so that sets of new name-value bindings may be applied all at once, or not at all. Communication through the store is achieved by registering watch points on subtrees in the hierarchy, essentially a subscription to the region of interest. After a watch has been registered, changes to the associated subtree will fire the watch, and indicate the value within the subtree that has changed.

Figure 3.5 provides a simplified high-level view of the structure of XenStore. The figure shows configuration information relating to block devices, and the clients involved in block device connection: the domain builder, which is a component of the Xen control tool set; and a frontend and backend VM that will form a split driver-based connection for a block device.

The diagram shows a subset of the control information and mechanisms involved in device set-up, and is intended to provide an overview of this process. A detailed account of block device setup is provided later in this section.

Device configuration and initialization involves collaboration between the three entities mentioned above, using the store to share configuration data. As a new domain is created, the domain builder tool, which runs in user space in domain 0, writes configuration information describing the new VM into the store. In addition, it writes device configuration information into the associated backend domains’ device subtrees. In the figure, the domain builder performs a transactional update, adding a set of configuration data regarding a new virtual disk to be served to VM B. On commit, this data appears in the store, and results in the activation of a watch that the backend domain, VM A, has registered. The backend driver will validate the new disk request, and prepare for a connection from the frontend. Part of this preparation will involve the addition of a second watch point, on the frontend’s (VM B’s) device subtree, as specified by the frontend path.

When the frontend VM boots, the block driver will initiate the connection of a new device channel. The control tools will have provided the backend domain ID, which the frontend will use to allocate an unbound event channel, and a grant reference for the ring page. These entries are then written to its directory in the store. The update will trigger the watch point that was added by the backend, which will then connect to the device channel, completing the initialization of the split driver.

Interactions with the store from user space are provided through libxenstore, a library of accessor functions into the store. In-kernel interactions are managed through the XenBus device driver. XenBus provides accessor functions for the store and helper functions for common tasks such as the initialization of device channels. In addition, it maps a VM’s device details as represented in the store onto the OS’s device probing and hotplug code. As mentioned, the block device description later in this section will provide a more detailed overview of the use of XenStore and XenBus to initialize devices.
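As an illustration of the store interactions just described, the sketch below shows a control tool writing virtual-disk configuration for a new VM, and a backend watching for it. The calls approximate the libxenstore interface, but the exact function names, signatures, and store paths have varied across Xen releases and are given here only for flavour.

/* Sketch of XenStore usage: a control tool exports a disk to VM B via
 * a transactional write, and the backend (VM A) watches for it.  The
 * xs_* calls approximate libxenstore; paths are illustrative. */
#include <xs.h>
#include <stdlib.h>
#include <string.h>

void export_disk_to_vm_b(struct xs_handle *xs)
{
    xs_transaction_t t = xs_transaction_start(xs);
    const char *dev  = "sda4";
    const char *fend = "/VM-B/vbd/sda1";

    /* All of these writes commit atomically, or not at all. */
    xs_write(xs, t, "/VM-A/backend/vbd/sda1/physical_device", dev, strlen(dev));
    xs_write(xs, t, "/VM-A/backend/vbd/sda1/frontend", fend, strlen(fend));
    xs_write(xs, t, "/VM-B/vbd/sda1/backend_dom", "VM-A", 4);

    xs_transaction_end(xs, t, 0 /* commit rather than abort */);
}

void backend_wait_for_disks(struct xs_handle *xs)
{
    unsigned int num;
    char **event;

    /* Register interest in the backend's device subtree; any update
     * beneath this path fires the watch and names the changed value. */
    xs_watch(xs, "/VM-A/backend/vbd", "vbd-token");

    event = xs_read_watch(xs, &num);   /* blocks until the watch fires */
    /* event[0] names the changed path: validate the new disk and
     * prepare to connect the device channel. */
    free(event);
}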


read(vblock) → page     Read the data stored at virtual block address vblock, placing it in the specified page.

write(vblock, page)     Write the contents of page at the specified virtual block address.

Table 3.1: Xen’s Virtual Block Device Interface.

3.2.5 The Virtual Block Interface

The previous sections have explained the underlying mechanisms used in the composition of split drivers. To illustrate how these are used in practice, we now consider the implementation of the virtual block interface.

3.2.5.1 Data Path

The device channel for virtual block devices in Xen makes use of a very simple protocol for block requests. The channel is a symmetric shared memory ring (as described above) in which a client VM issues requests, and a backend returns responses as the requests are completed. There is a one-to-one relationship between requests and responses, although the backend is free to reorder responses. The protocol currently consists of two simple request messages, shown at a high level in Table 3.1. Figure 3.6 shows the C structure that describes a block request, detailing the vblock and page parameters shown in the table. A virtual block address is described simply as a 64-bit sector offset from the start of the virtual device. An extent of data to be read or written is described as an array of page-sized segments associated with the request. Each entry in the frame_and_sects array describes a page in the client VM where data should be read from or written to, and a range of sectors within that page to operate on. Modern file systems generally interact with the disk at page rather than sector granularities, but this interface ensures that sector-granularity access remains available when required. Data pages are accessed using the grant map interface: The backend domain releases a region of its physical memory, creating a hole in its own physical address space. The size of this region is equal to the maximum number of segments per request multiplied by the size of the request ring, allowing a completely full block request ring to be mapped into the backend’s memory. The default Xen configuration uses a 64-entry shared ring, with a maximum of 11 segments per request. These numbers are selected largely out of a desire to maximize bandwidth, while allowing the shared ring to fit on a single 4KB page.


#define BLKIF_OP_READ  0
#define BLKIF_OP_WRITE 1

#define blkif_vdev_t   u16
#define blkif_sector_t u64

typedef struct blkif_request {
    u8             operation;     /* BLKIF_OP_{READ|WRITE}               */
    u8             nr_segments;   /* number of segments                  */
    blkif_vdev_t   handle;        /* the device this request pertains to */
    unsigned long  id;            /* private guest value, echoed in resp */
    blkif_sector_t sector_number; /* start sector idx on virtual disk    */

    /* @f_a_s[4:0] = last_sect ; @f_a_s[9:5] = first_sect               */
    /* @f_a_s[:16] = grant reference (16 bits)                          */
    /* @first_sect: first sector in frame to transfer (inclusive).      */
    /* @last_sect:  last sector in frame to transfer (inclusive).       */
    unsigned long  frame_and_sects[BLKIF_MAX_SEGMENTS_PER_REQUEST];
} blkif_request_t;

Figure 3.6: The Block Request Structure

As requests arrive in the backend driver, they are validated and their data pages are mapped into the empty pseudophysical address space. Requests are then translated into real block I/O requests and issued to the block subsystem. In the Linux backend, these requests are asynchronous and result in a callback on completion. The callback handler fills out a response message, and places this on the shared ring. It then uses an event channel notification to indicate to the client VM that the request has completed.
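The bit layout documented in Figure 3.6 can be expressed as a small packing helper. The macro below is illustrative rather than taken from the Xen headers.

/* Illustrative packing of a frame_and_sects entry, following the
 * layout documented in Figure 3.6: bits [4:0] hold last_sect,
 * bits [9:5] hold first_sect, and the 16-bit grant reference occupies
 * the bits from 16 upward.  Not taken verbatim from the Xen headers. */
#define FAS_ENTRY(gref, first_sect, last_sect)       \
    (((unsigned long)(gref)       << 16) |           \
     ((unsigned long)(first_sect) <<  5) |           \
      (unsigned long)(last_sect))

/* Example: read a whole 4KB page (512-byte sectors 0..7 of the frame)
 * that the frontend has granted under grant reference 11:
 *
 *     req.frame_and_sects[0] = FAS_ENTRY(11, 0, 7);
 */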

3.2.5.2 Control Path

Control operations relating to virtual block devices include adding and re- moving block devices for a client VM, negotiating the setup of shared- memory device channels, and interrogating device metadata such as a virtual disk’s capacity and sector size. As mentioned above, the configuration and control of virtual disks is man- aged through XenStore. When a virtual disk is created, the control tools will create a new directory in the backend’s block device directory. This directory is typically named using a unique identifier of the device being ex- ported, for instance the major and minor numbers of an exported physical

disk (represented as UNIX device names in Figure 3.5 for readability). From this point, device setup proceeds as follows:

1. The tools write details of the disk to export into the VMs’ device trees. Inside the directory, the tools add entries describing the domain id of the client VM, and a path to that VM’s frontend directory in the store. Entries are also added describing the physical disk or partition to export, and whether it is to be exported writable or read-only. Committing this update will fire the backend driver’s watch point, resulting in it scanning the new configuration data. In addition to writing the backend’s tree, the tools add a subdirectory for the new virtual disk in the frontend’s tree, including the ID of the backend VM.

2. The backend prepares to connect the device. On receiving the notification, the backend checks that the device is valid and instantiates data structures to handle the new connection. It registers watch points on both the new backend device directory and the frontend’s directory (which it has been granted permission to access), and waits for the frontend to start.

3. The frontend VM boots, and initializes the device. The frontend allocates the components of a device channel, writing details of an unbound event channel and a grant reference for the shared ring page into its device subdirectory. The frontend driver adds a watch on the virtual disk subdirectory.

4. The backend connects to the device channel. The event channel is bound and the ring page is mapped into the backend. The backend then writes details of the virtual disk, including disk capacity, sector size, and additional device-specific details.

5. The frontend completes the connection set-up. The frontend receives the device info and completes the connection process, registering the new device with the OS. At this point, the disk is available for access.

Device removal is handled by the tools simply by removing the frontend directory. Both drivers take this as an indication that the device is no longer available, disconnecting the device channel and removing outstanding state.


3.2.6 The Virtual Network Interface

The virtual network interface is similar in nature to the block interface, but is more complex in two significant ways, both of which relate to the asynchrony of network traffic: First, as the numbers of packets transmitted and received are not exactly symmetric, a single shared ring is not sufficient as a device channel for network traffic. Instead, two rings are employed, one for transmission and one for receipt of packets. Second, the destination of an inbound network packet is not known until after it has been written to memory and its protocol headers have been examined. This makes it impossible to write received packets directly into memory already belonging to the recipient VM.

To address this second issue, the grant transfer operation is used. The frontend network driver relinquishes a set of pseudophysical pages to the backend, where they are added to a free list. As inbound network packets are received, they are placed into pages on this list and their protocol headers are examined. Once the destination VM is ascertained, the packet-containing page is transferred into the empty pseudophysical memory region provided by that VM. If a VM is unable to provide sufficient free pages to accept inbound packets, the backend simply drops the excess packets, returning their pages to the free list. (A sketch of this receive path is given at the end of this section.)

Linux has an established history of being used to build PC-based routing, NAT, firewall, and related packet processing hosts for small networks. The network subsystem of the OS is very feature-rich, providing support for filtering, limited processing, and forwarding of packets at all layers of the protocol stack. The current backend driver simply maps VMs onto virtual interfaces provided by the Linux network subsystem, allowing the packet processing facilities in Linux to be used to make packet forwarding decisions. The in-hypervisor virtual firewall router [WHTP02] is no longer used.

Typical Xen installations generally use one of two forwarding techniques within the backend VM: The Linux bridge tools allow forwarding of packets at the Ethernet frame level, and include support for MAC-address filtering and proxy ARP. With a bridged interface, VMs appear as additional hosts on the local link. Alternatively, the backend VM may be configured to use the Linux routing tools, acting as an IP-level gateway for the client VMs.


The bridging and routing extensions have comparable performance, and both allow the use of additional network forwarding tools, such as traffic shaping. The bridging tools are perhaps slightly easier to configure, while the routing path is undoubtedly better tested given the large number of home routers based on Linux.
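Returning to the receive path described above, the following sketch illustrates how an inbound packet might be handled by the backend. All of the types and helper functions are hypothetical, standing in for the real free-list, forwarding, and grant-transfer machinery; the sketch shows only the shape of the mechanism.

/* Hypothetical sketch of the backend receive path: packets land in
 * free-list pages donated by frontends, and the filled page is then
 * transferred to whichever VM the protocol headers identify. */
struct packet;
struct page;
struct vm;

struct page *page_from_free_list(void);
void         return_to_free_list(struct page *pg);
void         copy_packet_into_page(struct packet *pkt, struct page *pg);
struct vm   *destination_vm(struct packet *pkt);
int          vm_has_free_slot(struct vm *vm);
void         grant_transfer(struct page *pg, struct vm *dest);
void         notify_frontend(struct vm *dest);

void backend_receive(struct packet *pkt)
{
    struct page *pg = page_from_free_list();
    if (pg == NULL)
        return;                              /* no donated pages: drop packet */

    copy_packet_into_page(pkt, pg);          /* in practice the NIC DMAs here */
    struct vm *dest = destination_vm(pkt);   /* inspect protocol headers      */
    if (dest == NULL || !vm_has_free_slot(dest)) {
        return_to_free_list(pg);             /* excess traffic is dropped     */
        return;
    }
    grant_transfer(pg, dest);                /* page ownership moves to dest  */
    notify_frontend(dest);
}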

3.2.7 Performance of Split Drivers

Figure 3.7 demonstrates the overhead of split drivers on a set of system benchmarks. All measurements in the graph were carried out on a Dell PowerEdge 2650 dual-processor 3.06GHz Intel Xeon server with 1GB of RAM, two Broadcom Tigon 3 Gigabit Ethernet cards, and an Adaptec AIC-7899 Ultra160 SCSI controller. The SCSI controller hosted two Fujitsu MAP3735NC 73GB 10K RPM SCSI disks. All tests used RedHat Linux 9.0 with the 2.4.26 kernel.

Using a native Linux 2.4.26-SMP kernel installation as a baseline (L), the graph shows two relative performance figures for each benchmark. The first measurement evaluates the overhead of virtualizing device access into the backend VM (Xen0): this evaluates just the isolation mechanisms in Xen for virtualizing device registers and interrupts. In the Xen0 tests, the server is run directly in the driver VM. The second relative result (XenU) measures complete isolation using a split driver to isolate the benchmark application in its own VM. This adds the additional overhead of routing device requests over a device channel to the backend VM. In all cases, the benchmarked OS is configured to use 512MB of memory. In the XenU results, the driver domain has 256MB of memory, but does not use the buffer cache. Configuring all benchmarked VMs in this way was intended to ensure that the Linux instances being tested were as identical as possible.

The set of benchmarks is as follows:

• Linux build time - The time to build a Linux 2.4.26 kernel stored on the local ext3 file system.

• Postmark (PM) - A filesystem benchmark developed by Network Appliance which emulates the workload of a mail server. It starts by creating a set of 2,000 files with sizes of between 500B and 1MB each, and then performs 10,000 transactions on them. Each transaction is composed of a set of file operations including creation, deletion, and appending of data. In total, over 7GB of data is transferred to and from the disk.


[Figure: bar chart of scores relative to native Linux for the L, Xen0, and XenU configurations across five benchmarks: Linux build time (s), PM (trans/s), OSDB-OLTP (tup/s), httperf (reqs/s), and SpecWeb99 (score).]

Figure 3.7: System Benchmarks of Split Drivers

• OSDB-OLTP - An open source database benchmark that creates and populates a database and then runs a battery of queries and updates against it. The database used is PostgreSQL 7.3.2, and the benchmark is configured to use a 400MB database.

• httperf-0.8 - A workload generator for web server benchmarking. In this case, httperf was configured to evaluate server latency by allowing only a single outstanding request at a time. As such, this is effectively a ping-pong test for a single 64KB static web page, served using Apache 2.0.40.

• SPEC WEB99 - An industry standard benchmark for evaluating web servers and the systems that host them. The test workload involves a mix of request types: 30% involve the generation of dynamic content, 16% are HTTP POST operations, and 0.5% execute a CGI script. Disk activity in this benchmark involves a 2.7 GB dataset of served data, and the generation of access and POST logs.

With the exception of httperf, split drivers are within 10% of native performance. The httperf result illustrates the main weakness of the split driver approach: as device requests are routed across multiple isolated protection domains, there is a per-request overhead in terms of latency. As the httperf example allows only a single outstanding request, it is not network bound.


The test achieves a maximum bandwidth of approximately 200Mbit/s, and is limited by the end-to-end latency of request processing. In general, this is not a problem for server workloads, as request pipelining and batching allow latency to be amortized across requests and achieve a high throughput, as shown on the other benchmarks. A lesson from the httperf result is that this approach to isolation is costly for certain specific workloads such as fine-grained cluster computing, where synchronous dispatch is used to issue small pieces of computational work to hosts in a cluster. This latency overhead is a result of a design decision in Xen to optimize for the more common requirement of high throughput: the latency overhead is even more costly in the next section on device extensions, as an additional stage is introduced to the pipeline; throughput, however, remains very high.

Figure 3.7 shows an interesting result for OSDB-OLTP, in which the split drivers actually outperform native Linux. This is a result that has recurred in many situations while investigating virtual block I/O for Linux. The virtual memory system in Linux is rather complex with regard to the handling of dirty pages. Experience shows that the introduction of a request pipeline at the block interface tends to stabilize this behaviour, and often improves performance. This result is repeated in the next section with regard to soft devices for block I/O.

In concluding the presentation of the split driver approach, it is worth reiterating that split drivers achieve three main benefits in a VMM-based system:

1. Drivers are isolated against failure. Driver code executes in an isolated VM, and is given direct access to hardware. A thin virtualization layer in the hypervisor virtualizes and/or validates access to sensitive resources such as interrupt dispatch, I/O register access, and device configuration. Using this approach, driver failure is isolated, and driver VMs may be rebooted in the presence of failure and returned to a stable state without affecting the frontend VM. Most importantly, by removing drivers from the VMM, the system as a whole is more stable.

2. Unmodified device drivers may be used. By allowing direct hardware access to entire OS units, driver VMs may capitalize on existing driver source. Any OS that runs on the VMM may be used to host device drivers, freeing the VMM from the need to track changes to driver code in an existing OS code base.


[Figure: a traffic-limiting extension interposed between the network backend driver in the backend VM and the network frontend in the frontend VM. Note that extension code may run in either a separate VM or directly in the backend.]

Figure 3.8: High-level View of a Traffic-limiting Soft Device

3. Devices may be shared across a variety of OSes. Split drivers export access to physical devices across device channels by presenting an idealized interface to a class of device. This idealized interface is easy to write a new frontend driver for, allowing new OSes to be rapidly modified to use virtualized devices within the VMM.

In addition to these three benefits, the introduction of the narrow, idealized interfaces that are used on device channels facilitates the interposition of device extensions. The next section extends the split driver approach by allowing the introduction of new extensions to devices independent of both the frontend OS and the backend driver.

3.3 Soft Devices

As stated in the introduction of this chapter, a major problem in the design of systems is that of determining how to allow the safe introduction of new software-based features to devices with reasonable performance, while making this extension development both reasonably easy and portable across OSes. By extending the device channel interfaces used by split drivers, extensions may be interposed between the front- and backend drivers, examining and modifying device requests as they are issued.

As an example of such an extension, consider a modified network interface that includes traffic shaping and rate-limiting functionality, which makes it difficult for malicious software to generate certain forms of attack traffic. This soft device is shown at a very high level in Figure 3.8. By implementing this functionality in an extension interposed on the virtual network device interface, several benefits are achieved:

1. Security. As the new traffic limiting feature is intended to limit malicious behaviour, it should not be included directly within the host OS, which may be compromised and modified.

2. Proximity. Conversely, by placing the extension as a portion of the virtual network device, a tight coupling is maintained. In this example, the extension need not drop offending traffic, as it would were it added outside the physical host as a network device. Instead, it may simply opt not to dequeue packets from the transmit queue, deferring queuing overhead onto the VM itself.

3. Portability. As mentioned previously, implementing the extension outside the context of both the frontend and backend VMs allows source independence from both of these implementations. Extensions may be applied to any OS on the VMM which accesses the network using the split driver. Moreover, as explained later in this section, extensions may be written in userspace, allowing complete freedom of development environment.

4. Isolation. Finally, as extensions may be written as user-space applications, and may further optionally run in their own virtual machine, their failure is isolated from affecting the rest of the system. Extensions may easily be shut down and restarted while serving requests from running VMs.

The example of limiting malicious traffic within an extension is presented in considerably more detail in Chapter 4. Tens of such device extensions have been written over the course of this thesis; other examples include network address translation to share the port space of a single IP address across a set of VMs, copy-on-write and encrypted disks, and remote device access.

The remainder of this section discusses how a device tap may be used to interpose on the device channel used by split devices. It then goes on to present tap-based extension interfaces for both block and network devices.


[Figure: three device tap configurations.
Split Driver: the backend domain manages the physical device and uses the backend driver to export an idealized device interface to frontend VMs.
Isolated Extension VM: a device tap in a separate extension VM interposes on the device channel between the front- and backend VMs.
Extensions in Backend VM: to improve performance, extensions may run in the backend VM alongside the backend driver; extensions remain isolated in userspace.]

Figure 3.9: Overview of Device Tap Configurations

3.3.1 The Device Tap

The device tap is a device interface-specific driver that provides a means to extend devices in arbitrary ways by interposing on the device channel between the frontend and backend VMs. For example, developers may desire to simply trace request traffic in order to monitor usage patterns, they may wish to modify in-flight requests, or they may desire to construct a terminating device, which does not forward requests to a backend driver at all.

In order to accommodate these varied modes of operation, the device tap acts as a request switch for device channels. As shown in Figure 3.9, the driver may be plumbed into the device channel between the frontend and backend domains, and may operate on all requests that are issued.


[Figure: acting as a message switch for request and response messages, the device tap has a variety of forwarding modes. The device channels to the backend (device VM) and frontend (guest VM) terminate at a message switch in the tap VM, and /dev/devtap is mapped into the soft device application's memory, exposing the back-end and front-end ring pages along with additional address space for data page mappings.]

Figure 3.10: Device Tap Structure

The tap driver is specific to the idealized device interface being used, but may be placed in either a completely isolated VM, or alongside the backend driver in the backend VM. Isolating extensions in a complete VM allows them the freedom to be implemented in any OS, using whatever development tools are desired. However, this approach incurs an additional overhead in terms of context-switching. As shown in Figure 3.9, if extensions are written for the same OS as the backend and physical device driver, the tap may be installed in the backend domain, and performance is improved. As extensions are generally written as user-level applications, this approach still maintains a reasonably high level of isolation.

Figure 3.10 shows a more detailed view of the device tap itself. The single device channel between the front- and backend drivers in the split driver configuration is broken into two rings. The tap interposes on this channel, appearing as a backend to the front and a frontend to the back. The device channels are then presented to applications in userspace using shared memory rings, exported over a character device that may be mapped into application memory. Where event channels were used for inter-VM notification on device channels, poll() and ioctl() syscalls achieve notification to and from extension applications and preserve batching of messages.
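The per-device state implied by Figure 3.10 can be sketched as follows. The names and types are illustrative only, not the driver's actual definitions: the point is simply that the tap terminates one device channel on each side and exposes a corresponding pair of user-level rings.

/* Hypothetical sketch of the tap's per-device state, as suggested by
 * Figure 3.10.  Names and types are illustrative. */
typedef struct message_ring ring_t;   /* a shared-memory message ring */

struct devtap_channel {
    ring_t *be_ring;    /* device channel to the backend VM (tap acts as frontend) */
    ring_t *fe_ring;    /* device channel to the frontend VM (tap acts as backend) */
    ring_t *user_be;    /* user-level ring towards the backend side  */
    ring_t *user_fe;    /* user-level ring towards the frontend side */
    int     mode;       /* forwarding mode, cf. Section 3.3.1.1      */
    int     evtchn_be;  /* event channel bound to the backend        */
    int     evtchn_fe;  /* event channel bound to the frontend       */
};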


3.3.1.1 Switching Modes

The device tap itself acts as a switch for device messages. Once installed in the device channel, it may be configured for a particular switching mode, and will forward messages accordingly. Figure 3.11 summarizes three common switching modes used by the tap.

MODE_PASSTHROUGH is the lowest-overhead switching configuration. In this mode, messages are passed straight through the driver on to the opposite ring, and completely bypass the user rings. Passthrough can be used to implement kernel-level monitoring of block requests, or to implement soft devices in-kernel for improved performance.

MODE_INTERPOSE routes all requests and replies across the user rings. An application must attach to the device tap interface and pass messages across the two rings, allowing complete monitoring and modification of the request stream at the application level. This mode can be used to modify in-flight requests, for instance to build a compressed or encrypted block store. This is the mode of operation that is used for the majority of examples described in the later chapters of this thesis.

MODE_INTERCEPT_FE uses only the front-end rings on the driver, disabling the back-end altogether. This mode allows the development of new device functionality completely in user space of an isolated virtual machine. This mode is used by the Parallax storage system, which acts as a user-space backend and then sends requests to the file system or network using the traditional OS interfaces.

In addition to passthrough and interposition modes, the original tap implementation provided support for applications to request copies of data, with the intention that read-only access to data be made possible out-of-band, without placing the extension code directly on the data path. For most applications this proved to be impractical: incurring a fixed copy on the data path is expensive, and a copy-on-write mechanism for interdomain pages does not currently exist in Xen. In practice, placing hooks on the data path is sufficiently low overhead that this support has been removed as extraneous.
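Building on the hypothetical devtap_channel structure sketched earlier, the three modes can be summarised as a message switch along the following lines. The mode names follow Figure 3.11, while the forwarding helpers are illustrative rather than the driver's real functions.

/* Sketch of the message switch over the hypothetical devtap_channel. */
enum devtap_mode {
    MODE_PASSTHROUGH,   /* kernel-only: copy messages straight to the opposite ring */
    MODE_INTERPOSE,     /* route every message via the user-level rings             */
    MODE_INTERCEPT_FE,  /* terminate at user level; the backend ring is unused      */
};

void forward_to_backend_ring(struct devtap_channel *chan, void *msg);
void deliver_to_user(struct devtap_channel *chan, void *msg);

void switch_request(struct devtap_channel *chan, void *msg)
{
    switch (chan->mode) {
    case MODE_PASSTHROUGH:
        forward_to_backend_ring(chan, msg);  /* never leaves the kernel */
        break;
    case MODE_INTERPOSE:
    case MODE_INTERCEPT_FE:
        deliver_to_user(chan, msg);          /* the extension sees the message; in
                                                INTERPOSE it later re-injects it
                                                towards the backend ring */
        break;
    }
}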


[Figure: three forwarding mode examples. Example A, MODE_PASSTHROUGH: messages pass directly between the backend and frontend device channels inside the message switch. Example B, MODE_INTERPOSE: messages are routed through the user device channels to the application in both directions. Example C, MODE_INTERCEPT_FE: only the frontend device channel is connected; messages terminate at the application.]

Figure 3.11: Examples of Forwarding Modes

3.3.1.2 User Interface

As mentioned above, the device channels between the tap and the front- and backend VMs are presented to userspace using memory mapping and system calls for notification. As with the existing split drivers, this approach avoids copy overheads and preserves the batching of messages on the device channels. For the device tap implementations discussed later in this section, additional support has been provided in the form of library interfaces for extensions. Libraries attach to the tap driver interface and provide data structures representing the front-end connections from attached VMs. Extensions are able to attach to individual devices, and register to receive their device requests.

An interesting avenue of exploration which has been left as future work is allowing extensions to be installed directly into the kernel. One approach of interest is that taken by the Click modular router [KMC+00], in which routing extensions use a common API in both user and kernel mode. This "develop in user, deploy in kernel" model is an attractive approach to building extensions that could be incorporated into tap drivers in the future.

The remainder of this section discusses device taps for block and network devices that have been implemented as part of this work.


[Figure: block requests from VM B and VM C arrive at the block tap driver (/dev/blktap) in the block extension VM, are memory-mapped into userspace, and are dispatched by the block tap library (libblktap) and an extension manager to per-disk extensions: a copy-on-write virtual disk extension serving VMB-sda1 and a remote virtual disk extension serving VMC-sda1. The block backend runs in VM A, with the block frontends in VMs B and C.]

Figure 3.12: Structure of the Block Tap

3.3.2 The Block Tap

The block tap is a tap driver that allows the extension of block devices. It was the first tap driver to be implemented and was initially described in a paper at USENIX 2005 [WFHD05]. The tap was originally implemented to support the development of Parallax, which is described in detail in Chapter 5.

3.3.2.1 Structure of the Block Tap

The overall structure of the block tap is shown in Figure 3.12. This figure shows two extensions written to the library interface provided by the tap. The library generates control messages to an extension manager as client VMs add and remove virtual block devices. Based on watch points in XenStore, these messages may be used to trigger the instantiation of extension code to handle requests for the associated virtual disk.

The figure shows two extensions, each serving block requests from a different VM. VM B is provided with a copy-on-write disk extension which remaps writes onto an alternate backing store, rendering the disk served by the backend as immutable. VM C is served by an extension which forwards block requests to a remote host over the network, where they are issued to a backend driver. The network connections are not shown in the figure for clarity.


Note that the front- and backend VMs all connect over existing block device channel interfaces, and are unaware of the existence of the tap or the extension.

Technical Details of Block Tap Implementation

The block tap is implemented as a Linux 2.6 device driver, and has been maintained in the Xen tree for almost two years. Over the course of its development, several interesting technical challenges have been addressed in order to allow user-level extensions to achieve high performance. The most notable of these challenges has been in extending the Linux memory management system to handle the presence of pages of memory that do not belong to the virtual host. As a non-virtualized OS, it is understandable that this functionality is not a part of Linux; it is, however, crucial for high performance.

When writing user-level extensions that access hardware managed by the extension VM, it is desirable to allow direct device access on pages of memory that have been mapped by the block tap from a client VM. While many modern operating systems, including Linux, have facilities to issue direct device requests (eliminating in-OS copies), memory mapped from a foreign VM is not part of the extension VM's pseudophysical memory, and as such the OS is unable to generate the appropriate machine address to program device DMA. To solve this, the Linux memory management system was extended to allow virtual address regions to be mapped as "foreign". When an attempt is made to resolve the machine address of a page mapped in a foreign region, a special code path is taken, resolving the appropriate machine address in the remote VM.

With this modification, direct block access (O_DIRECT) and the Linux asynchronous I/O (libaio) interfaces may be used in extensions on remote pages. As discussed in the performance section later in this chapter, the block backend driver has been reimplemented in userspace using the block tap, and achieves almost identical performance.
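As an indication of how an extension issues such requests, the fragment below performs a single direct, asynchronous read using the standard Linux libaio interface. It is a simplified, self-contained example rather than the block tap's own code; in the real extension the buffer would lie in the foreign-mapped region described above, and it must be sector-aligned for O_DIRECT.

/* Simplified example: one O_DIRECT read issued through libaio, in the
 * style a user-level block extension might use.  Error handling is
 * minimal and the buffer must be sector-aligned.  Link with -laio. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <libaio.h>

long direct_read(const char *path, void *buf, size_t len, long long off)
{
    io_context_t ctx = 0;
    struct iocb cb, *cbs[1] = { &cb };
    struct io_event ev;
    int fd = open(path, O_RDONLY | O_DIRECT);

    if (fd < 0 || io_setup(1, &ctx) < 0)
        return -1;
    io_prep_pread(&cb, fd, buf, len, off);   /* describe the read       */
    if (io_submit(ctx, 1, cbs) != 1)         /* queue without blocking  */
        return -1;
    io_getevents(ctx, 1, 1, &ev, NULL);      /* wait for completion     */
    io_destroy(ctx);
    close(fd);
    return ev.res;                           /* bytes read, or -errno   */
}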


3.3.2.2 User-Level API

As mentioned above, messages intercepted by the block tap may be passed to extensions in userspace using communication rings and memory mapping similar to inter-VM device channels. There are two components to this interface: the low-level memory-mapped interface presented to userspace, and the library interface provided by blktaplib.

Raw Interface

At the most basic level, the tap device simply exports a character device node to applications. This device node (/dev/blktap) can be mmap()ed into application virtual memory. The mapped virtual memory region is sparse; it includes the rings for forwarding requests and responses to the respective front- and backend domains, and a large region into which foreign data pages may be mapped. As discussed above, this sparse region of virtual memory is marked as containing foreign virtual memory mappings to allow direct I/O access from extensions.

In addition to the mmap() interface, the device provides an ioctl() which may be used to signal that there are outstanding messages to be read by the driver, and a poll() interface to receive upcall notifications of new pending messages from the driver. Allowing applications to interact with the shared memory area directly, rather than providing read() and write() interfaces, preserves the decoupling of data transfer from notification described earlier in this chapter. The ioctl() and poll() interfaces combine to provide an OS-to-application analogue of the event channels used between VMs, and ensure that request batching is maintained. In addition to signaling, the ioctl() interface allows the mode of the request switch to be changed, and provides an interrogation interface for debugging.
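The shape of an extension's attachment to this raw interface is sketched below. The mapping length and ioctl command shown are placeholders rather than the driver's real constants; only the open/mmap/poll/ioctl structure of the loop is the point.

/* Sketch of attaching to the raw tap interface.  DEVTAP_MAP_LEN and
 * DEVTAP_IOCTL_KICK are placeholders: the real driver defines its own
 * mapping layout and ioctl numbers. */
#include <fcntl.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

#define DEVTAP_MAP_LEN    (2 * 1024 * 1024)  /* placeholder mapping size */
#define DEVTAP_IOCTL_KICK 0                  /* placeholder ioctl number */

void run_extension(void)
{
    int fd = open("/dev/blktap", O_RDWR);
    void *area = mmap(NULL, DEVTAP_MAP_LEN, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);    /* rings + foreign data region */
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    for (;;) {
        poll(&pfd, 1, -1);                   /* upcall: new messages pending */
        /* ... consume a batch of requests from the rings in 'area',
         *     process them, and queue responses in place ... */
        ioctl(fd, DEVTAP_IOCTL_KICK, 0);     /* signal that responses are ready */
    }
}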

Library Support

In order to facilitate the development of extensions as user-level applications, the libblktap library provides higher-level interfaces to block device extensions. While this library was originally intended simply to eliminate repeated code, it has evolved over the past year to become a key component of the system.

The user-level API was initially motivated by the Linux IP chains / IP tables interfaces, in which a series of filters and queues (effectively a list of function invocations) could be applied to packets as they were processed by a host. This approach has also been used in research systems concerned with flexible but efficient packet processing in the past [Ros94, MP96, HP91]. The original library presented two "chains", one for requests and one for responses, to which function hooks could be attached. Similar to other approaches, the hook functions would be called on a per-message basis, and would return a verdict regarding the treatment of the message. Requests could be passed, indicating that they should be forwarded on to the next hook, or dropped, indicating that processing should stop. As the block protocol is symmetric, hooks issuing a drop verdict are responsible for injecting an appropriate response back to the client VM; the drop verdict simply provides a way for a hook to consume an in-flight request.

The initial implementation has evolved in several ways. Rather than having a single set of chains for all block requests, the library now provides a virtual disk abstraction to applications. As disks are registered by the Xen control tools, the library provides a callback to indicate that a new disk has been created. Request-handling function chains are now contained within this virtual disk structure, allowing applications to build per-disk handlers to provide whatever behaviours they desire.

Figure 3.13 shows an overview of the library interface to block devices. The register_new_blkif_hook() call is used to register a function that will handle new disks as they are instantiated by the control tools. This function initializes necessary private state, for instance by opening the image file that will back a virtual disk, and sets function pointers to handle interrogating disk metadata. Additionally, a number of request and response message hooks are added to handle requests to the disk from the client VM. The code shown in the figure is a summarized version of the code used in the userspace block backend, described in Chapter 5. This move to a per-disk abstraction was motivated by the language-level support for device virtualization described in [WCSG04] and [WCG04].²

² Andrew Whitaker kindly sent me a copy of the Denali virtual disk source, which I did not manage to incorporate, but from which I took the notion of a per-disk abstraction for extensions.


/* Functions returning details on the virtual block device. */
long int get_size(blkif_t *blkif);
long int get_secsize(blkif_t *blkif);
long int get_info(blkif_t *blkif);

/* Request/response handlers.  batch_done is a hint from the library,
   indicating the last message in a batch. */
int blkback_request(blkif_t *blkif, blkif_request_t *req, int batch_done);
int blkback_response(blkif_t *blkif, blkif_response_t *rsp, int batch_done);

struct blkif_ops blkback_ops = {
    get_size:    get_size,
    get_secsize: get_secsize,
    get_info:    get_info,
};

int new_blkif_handler(blkif_t *blkif)
{
    /* Here we establish any private state, open image files, etc. */
    ...

    /* Register operations for querying virtual disk metadata. */
    blkif->ops = &blkback_ops;

    /* Finally, add handlers for requests/responses for this disk. */
    blkif_register_request_hook (blkif, blkback_request);
    blkif_register_response_hook(blkif, blkback_response);
}

int main (void)
{
    ...
    register_new_blkif_hook(new_blkif_handler);
    ...
}

Figure 3.13: Example of the blktaplib Interface
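Building on the interface in Figure 3.13, the fragment below sketches how the copy-on-write extension of Figure 3.12 might be expressed as a request hook. The remap and lookup helpers, and the BLKTAP_PASS verdict, are illustrative names introduced here rather than part of the actual library.

/* Hypothetical copy-on-write request hook in the style of Figure 3.13. */
void remap_to_cow_store(blkif_t *blkif, blkif_request_t *req);
int  cow_store_has(blkif_t *blkif, unsigned long long sector);
#define BLKTAP_PASS 0   /* illustrative verdict: forward the request onwards */

int cow_request_hook(blkif_t *blkif, blkif_request_t *req, int batch_done)
{
    if (req->operation == BLKIF_OP_WRITE) {
        /* Writes never reach the immutable base image. */
        remap_to_cow_store(blkif, req);
    } else if (cow_store_has(blkif, req->sector_number)) {
        /* Reads of sectors written earlier come from the private store. */
        remap_to_cow_store(blkif, req);
    }
    return BLKTAP_PASS;
}

Such a hook would be registered from a new_blkif_handler(), exactly as blkback_request() is registered in Figure 3.13.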


3.3.3 The Network Tap

The network tap adds extension support for network devices in Xen. It was developed to explore a more general notion of device extensions after the initial block tap implementation had been completed. The ability to quickly develop network extensions has been very useful, and examples of its use are presented in Chapters 4 and 6. The traffic limiting example in Chapter 4 has recently been published at HotNets [KWC+05], and is the subject of ongoing work.

3.3.3.1 Linux Packet Forwarding and Netfilter

Linux has an established history of being used to build PC-based routing, NAT, firewall, and related packet processing functionality for small networks. The network subsystem of the OS is very feature-rich, providing support for filtering, limited processing, and forwarding of packets at all layers of the protocol stack. When network forwarding was moved out of Xen and into a privileged VM (between the 1.0 and 2.0 releases of Xen), the in-hypervisor packet forwarding code [WHTP02] was removed in favour of the existing Linux forwarding mechanisms.

To take advantage of the Linux forwarding tools, the backend driver proxies client VM connections as virtual interfaces in the backend VM. This leaves the backend VM with one or more physical network interfaces, handled by physical device drivers, and a set of virtual interfaces connected to individual frontend VMs. Xen installations generally use one of two forwarding techniques within the backend VM: the Linux bridge tools allow forwarding of packets at the Ethernet frame level, and include support for MAC-address filtering and proxy ARP. With a bridged interface, VMs appear as additional hosts on the local link. Alternatively, the backend VM may be configured to use the Linux routing tools, acting as an IP-level gateway for the client VMs. The bridging and routing extensions have comparable performance, and both allow the use of additional network forwarding tools, such as traffic shaping. The bridging tools are perhaps slightly easier to configure, while the routing path is undoubtedly better tested "in the wild," given the large number of home routers based on Linux.

To enable arbitrary processing of in-flight packets, Linux's netfilter interface allows packets to be forwarded to userspace over a netlink socket and processed by arbitrary applications.


struct ipq_handle *h;

h = ipq_create_handle(0, PF_INET);
status = ipq_set_mode(h, IPQ_COPY_PACKET, BUFSIZE);

do {
    status = ipq_read(h, buf, BUFSIZE);

    switch (ipq_message_type(buf)) {
    case IPQM_PACKET: {
        ipq_packet_message_t *m = ipq_get_packet(buf);

        /* Arbitrary processing of the packet happens here. */

        status = ipq_set_verdict(h, m->packet_id, NF_ACCEPT, 0, NULL);
        break;
    }
    }
} while (1);

ipq_destroy_handle(h);

Figure 3.14: Example of the libipq Interface

As mentioned above, this is the approach that was initially used to prototype network tap-based extensions, as combining this user-queueing with a VMM allows packets to be processed in userspace outside the client OS that they relate to. Figure 3.14 shows a simplified version of the IP queuing example code on the libipq manpage.³ Error checking has been removed to ease readability. This example demonstrates the structure of a typical packet processing application, which, after initializing a connection to the ipq netlink socket, sits in a loop processing packets. The call to ipq_set_mode() indicates how much of the packet is to be copied to user-space; the example copies the entire packet, while it is also possible to only process packet metadata. ipq_set_verdict() passes a verdict on a packet, either NF_ACCEPT or NF_DROP, to the kernel. The call may optionally include a pointer to a replacement packet to send, should the processing application desire to modify the packet contents.

³ The libipq library and manpages are maintained as part of the large netfilter project. Additional information is available at http://www.netfilter.org/


3.3.3.2 Structure of the Network Tap

The network tap shares the same structure as the generalized tap interface described earlier, but uses the ring-pair-based network device channels. The tap has been incorporated as a component of the network backend driver that allows requests to be redirected through user-space. The tap and backend share a common VM, as shown in the third example in Figure 3.9.

Packets retain their batching as they are shuttled from kernel to user-space. For this reason, the tap achieves considerably higher performance than the Linux user-queuing code, which context switches at a per-packet granularity and copies packets instead of mapping them. The per-packet interface provided by libipq can be preserved by providing library support above the net tap, allowing libipq-based extensions to run.

Additionally, the grant transfer mechanism makes mapping data pages to user space considerably more straightforward within the tap. As the transfer of pages actually results in the tap domain owning the data pages, they may be mapped to user space using the existing OS mapping operations and do not require the use of OS modifications for foreign memory access as described in regard to the block tap.
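The batching behaviour can be illustrated with a no-op extension loop of the following shape. The nettap_*() helpers are hypothetical names, not the tap's real API; the real tap exposes its rings through the mapped interface described earlier, but the structure of the loop is the same.

/* Hypothetical no-op network extension above the tap: packets are handled
 * a batch at a time, with a single notification per batch. */
struct nettap;
int   nettap_wait(struct nettap *tap);        /* blocks in poll() until packets pend */
void *nettap_next(struct nettap *tap);        /* next mapped packet in the batch     */
void  nettap_forward(struct nettap *tap, void *pkt);
void  nettap_push(struct nettap *tap);        /* one kick covers the whole batch     */

void noop_extension(struct nettap *tap)
{
    for (;;) {
        int n = nettap_wait(tap);
        for (int i = 0; i < n; i++) {
            void *pkt = nettap_next(tap);
            /* inspect or rewrite the packet here ... */
            nettap_forward(tap, pkt);          /* no copy: the page is already mapped */
        }
        nettap_push(tap);
    }
}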

3.3.4 Performance

I now consider the performance costs of the soft device architecture on real device workloads. The degrees of isolation provided by the soft device framework allow a gradient of separation for extensions, both in terms of increasing safety (through isolating extension failure) and in terms of decoupling extension source from the code bases that they interact with. In the weakest form, an extension may be placed in-kernel, in the backend driver; this is effectively the same as traditional ad hoc device extensions which modify arbitrary kernel code, requiring an understanding of the kernel, and resulting in extensions that are tightly coupled with the kernel. As separation increases, extensions built on the soft device architecture may be promoted to user space and/or moved to an isolated virtual machine. These garner the benefit of additional isolation, but result in performance overhead due to the additional protection domain crossings involved.


[Figure: block throughput (Bonnie 1.03, 2GB) in KB/s for the Xen0, XenU, and XenU-blocktap configurations across five tests: character write, block write, rewrite, character read, and block read.]

Figure 3.15: Block Throughput

While the benefits of isolation for safety and separation of code bases are rather difficult to measure empirically, the study of efficient cross-domain communication has been well explored in existing systems literature, primarily with respect to microkernels [BALL90, Lie93], but also more recently with respect to Xen itself [CG05, MST+05]. One of the interesting results that we have realized in the development of Xen is that these context switch overheads can be amortized over batches of requests. As discussed in Section 3.2.7, the worst workload for soft devices, and for split drivers in general, is that of small-message, synchronous ping-pong style communication, where latency rather than throughput is critical. Fortunately, this class of applications is small; the only example that has been raised from the Xen community to date has been a class of MPI-based applications for scientific computing.

In [HWF+05] we argued that a key benefit of the VMM structure is that it is a practical realization of the isolation goals touted by microkernels. As argued above, building extensions at the application level in OS-granularity containers provides developers freedom of development environment and portability of extension code. The question that this section aims to evaluate is whether or not this approach to extensions is practical in real systems. To address this, I examine the overhead of soft devices on macro-benchmarks for both disk and network.

3.3.4.1 Block Device Overhead

Block devices are sufficiently slow relative to processor speed that achieving low overhead for extensions is relatively easy.


[Figure: timelines for a set of block requests, broken into stages: request from frontend to tap, in tap driver, tap to user, user request to AIO prep, AIO prep to AIO send, AIO send to AIO notify, AIO notify to AIO receive, then response from AIO receive to response send, user to tap, and in tap driver; the horizontal axis is time in microseconds (0 to 16000).]

Figure 3.16: Block Request Timelines

Figure 3.15 shows bonnie results for the block tap. The results compare the block tap to the block backend, each serving a client VM on a second CPU. The baseline result is that of Linux running in domain 0, indicating the direct throughput to disk without using the split driver. In general, the tap is capable of saturating the block device in all configurations.

As an aside, these results show an interesting aspect of performance measurement for devices that the group has seen throughout the work on Xen: device interactions above a VMM are sufficiently complex as to be sensitive to small changes in the environment. In the three configurations shown, the block tap achieves slightly better performance for small writes and block reads, while the backend driver outperforms it considerably on rewrite. These results were repeatable across sets of three tests each on the same configuration, and likely represent timing and batching artifacts. Similar behaviour is shown below with regard to quantizations on network receive, and these device sensitivities have been discussed by other researchers in [MST+05].

Figure 3.16 presents a more detailed view of how in-flight requests are composed from a performance perspective. The diagram plots profile time-lines of a set of requests taken from a trace of sequential reads issued by the frontend. In this test, requests are passed through the block tap to a user-level implementation of the backend driver.


[Figure: network throughput (ttcp), in Mbit/s, for the Linux, Xen0, XenU, NetTap, and IPQ configurations, measured for transmit and receive at packet sizes of 1500 and 550 bytes.]

Figure 3.17: Network Throughput

Requests are then issued to the kernel using the Linux AIO interfaces. As this is a sequential read, each of the horizontal bars in the plot reflects a request to read 44K (11 pages) of data. This diagram illustrates how the overheads imposed by the extension interfaces are dwarfed by the time required to satisfy block I/O requests, shown as the large light-green region in the middle of each bar.

3.3.4.2 Net Device Overhead

The interrupt rate involved in handling modern network interfaces presents a considerably more formidable challenge for device extensions than do disks. Figure 3.17 shows throughput achieved for a 2GB network transfer using ttcp. As with the block throughput measurements above, all results represent the mean of three consecutive trials. The graph shows transmit and receive throughput at packet sizes of both 1500 and 550 bytes. An MTU of 1500 is the default for Ethernet, while 552 is commonly used by dial-up PPP clients and puts considerably higher stress on the I/O system due to the packet rates generated (190,000 packets per second at 800Mb/s). These values are taken from a larger set of results which characterize throughput for MTUs ranging from 1500 down to 300, sampling at 50-byte intervals.

The first three bars in each cluster of the graph represent configurations discussed in the performance section of the presentation of split drivers. From left to right they show a normal Linux installation (Linux), Linux running in Xen with direct hardware access (Xen0), and Linux running in a frontend VM using split drivers (XenU). As discussed earlier in this chapter, split drivers track the performance of native device access, but incur an overhead on packet processing at small MTUs.


Configuration                        Ping RTT    (Overhead)
Linux on Xen – no split drivers      155µs       –
Split driver                         206µs       (25%)
Split driver + net tap               220µs       (30%)

Table 3.2: Network Latency Overhead

The two remaining bars show the throughput achieved using the network tap (NetTap) and the Linux IPQ interface. In both cases, packets are mapped to userspace and immediately returned to the kernel by a no-op extension. While both of these incur overhead, the network tap is clearly superior. For MTUs between 1500 and 550, the IPQ throughput degrades linearly; the interface involves no batching and incurs a context switch on every packet. The network tap has much better performance: it tracks the native Linux results with an approximately seven percent overhead for MTUs down to 750 bytes, and then begins to decay as the CPU becomes saturated.

As mentioned above, the worst-case workloads for split drivers and tap-based extensions are those requiring low-latency synchronous traffic. This form of workload is effectively non-existent in both enterprise and desktop environments, where applications are generally focused on throughput and present considerable opportunity to amortise switching overheads through batching. In order to provide some insight into the overheads on single-packet performance, Table 3.2 presents average ping times and the associated overheads using the network in three configurations. As shown in the table, the introduction of split drivers imposes a 25% overhead on ping latency on a local gigabit network. The addition of the tap, resulting in the forwarding of requests through user-space, adds an additional 5% to this: context switches to userspace cost considerably less than world switches across VMs. These results aim to punctuate the fact that in the case of low-latency synchronous workloads, likely composed of only a very small number of scientific computing applications, the latency and CPU overheads of the techniques presented in this thesis are likely to be prohibitively expensive.

The network results demonstrate that the tap is capable of sustaining high throughput on a network saturation benchmark, and that it far outperforms

the network extension interface currently offered by Linux. While there is a higher overhead concern than exists with extending block devices, these results suggest that it is reasonable to deploy extensions in production systems, especially where the network is not saturated. Moreover, network extensions should benefit from both faster processors and parallelism: moving the extension code to a separate CPU should result in a reduction in overhead, based on the experience of split drivers. The current trends in hardware are clearly towards an increase in the number of processing elements available on a host, and so this approach seems quite attractive as future work.

This concludes the presentation of support for device extensions. Tap devices extend the device channel-based communications used by split drivers to allow new functionality to be added to devices below the virtual machine. This support for the development of soft devices provides the following benefits:

1. Extensions are resistant to tampering. As device extensions execute outside of the VM that uses the device, they are considerably more secure against exploits to that OS. This is useful in adding features that enforce behaviour, such as the network traffic limiting that is presented in Chapter 4, and in building "trusted" extensions to hardware.

2. Extensions remain co-located with the client VM. While extensions run outside the virtual machine accessing the device, they are still on the same physical host. This is preferable to implementing extensions on a separate host such as a network gateway or remote storage server, as extensions are directly involved in the device interface. This provides a more immediate mechanism to provide feedback, and avoids the need to queue outstanding requests.

3. Extension source is decoupled from front- and backend OS code. This source isolation has two benefits. First, extensions are portable across any OS for which a device frontend is available, unlike proprietary OS-specific extension mechanisms or, worse, direct modifications to the OS kernel. Secondly, extension developers are freed from having to track changes to kernel interfaces over time.

4. Extension execution is strongly isolated. Finally, as extensions are written as user-space applications, and optionally run in their own virtual machine,


their failure is isolated from affecting the rest of the system. Extensions may easily be shut down and restarted while serving requests from running VMs.

The next section explains how a set of local device extensions may be aggregated across hosts in a cluster environment to manage a class of devices, such as storage, as a service.

3.4 Device Services

In addition to allowing the extension of local devices, this thesis argues that the combination of isolating extension code in virtual machines and interfacing with client VMs over simple, narrow, virtual device interfaces provides an ideal environment for the construction of device services, particularly in VM hosting environments involving large numbers of physical machines.

This approach to building services behind device interfaces is beneficial from several perspectives. In the short term, many large-scale (∼10K physical machines) hosting facilities are in the process of deploying virtualization. This deployment effectively increases the number of active operating system instances by a factor of between ten and one hundred. While virtualization allows them to increase the number of isolated servers that they are capable of hosting, it also carries the consequence of dramatically increased load on storage and network facilities. Examples such as the Parallax storage service (Chapter 5) and aggregate rate limiting (Chapter 4) address immediate needs that have resulted from this sudden increase in scale.

From a more general perspective, structuring services in a data center environment as an aggregation of service VMs that interact with client VMs over device interfaces allows the isolation afforded by virtualization to be carried beyond the local host and into a facility-wide environment. After detailing the general structure of these services, this section will discuss how soft devices may be composed to form strongly isolated cluster services.


[Figure: four physical hosts, each running client VMs alongside a service VM containing a physical device driver, a device tap, and a user-level device service application; client VMs connect to the service over frontend drivers.

Device Service Slice: device service VMs form an isolated slice of the cluster resources, accessible to client VMs only over narrow device interfaces. Service VMs are administered by a single authority, and may include the entire vertical software stack, from device drivers to server applications. Backends and extensions may still be isolated from one another, as in physical host D, where the backend driver and the extension run in separate VMs.

A device service is composed of a set of VMs, each serving device interfaces of client VMs on the same physical host. Device services provide performance, security, failure, and administrative isolation for a specific class of device (e.g. storage) within a cluster environment.]

Figure 3.18: Structure of a Device Service

3.4.1 A Device Service Architecture

Figure 3.18 shows the structure of a generalized device service. A set of four physical hosts is shown, where a network-wide service is hosted across the entire cluster, composed of individual service VMs on each physical machine. On a given physical host, a service VM may include three main components: physical device drivers with direct access to local devices being managed as part of the service; a service application, running as a device extension in user-space of the service VM; and a client-VM-facing interface, provided by a device tap or backend driver.

Where desired, individual service VMs may be further decomposed, as mentioned earlier in this chapter, to isolate extensions into a separate VM. This is shown in the case of physical host D in Figure 3.18, where the service VM has been decomposed into a backend VM, which hosts the physical device and associated driver, and an extension VM, where the user-level


extension server is run. This decomposition may be desired to further isolate failure, or to allow extensions to be developed in a completely different OS environment than the one for which the physical device driver is available.

This method of structuring device services achieves four kinds of service isolation in a VM cluster environment:

1. Performance Isolation. Overload of a service slice will not have cascading effects onto unrelated VMs in the cluster.

2. Security Isolation. Services present only device interfaces, limiting exposure to security exploitation.

3. Failure Isolation. Crashes and related failures in service VMs do not compromise the stability of non-dependent VMs.

4. Administrative Isolation. A device service represents a single cluster-wide administrative domain.

Beyond performance isolation, the remaining three categories represent the so-called "-ilities" of systems research; they are clearly desirable, but are difficult to validate empirically. The remainder of this section discusses each of the four in more detail and provides initial evidence to support the associated claims. In the remaining chapters of this thesis, I construct two cluster-scale device services and claim, anecdotally, that they exhibit these properties.

3.4.2 Performance Isolation

One of the fundamental benefits of virtualization touted throughout the recent resurgence of research into the area has been the ability to provide strong performance isolation between VMs that are co-located on the same physical host. This isolation is of clear benefit in placing service VMs on the same physical host as clients, as it is desirable to be able to ensure that the distributed device service will not detrimentally affect the clients (who are typically the paying customers) in a hosting environment.

The use of VMMs, Xen in particular, to achieve performance isolation was illustrated in our early work on the project. Specifically, [BDF+03] demonstrated system-wide performance isolation using industry standard


benchmarks such as SPECWeb.

Interestingly, while a VMM is demonstrably capable of very effectively isolating coarse-granularity workloads within a system, the accounting of some forms of device-level access remains a very difficult problem. Providing fair and accounted access to disk in particular is a difficult problem that remains open for examination in the future.

3.4.3 Security Isolation Through Narrow Interfaces

A key aspect of this isolation is the narrow device interfaces provided to the client VMs. This is the only visible aspect of a service, and so can be more easily verified for security. A distributed service is composed of VMs that may be placed on an isolated subnet, and kept separate from any form of VM interaction. The Xen network interface already provides facilities to prevent any form of spoofing from client VMs, "stamping" outgoing packets with the appropriate MAC and IP addresses [BDF+03]. The only VM-facing interface offered by the device service is that of the device channel presented to the frontend driver in each client VM. The entire environment of device services in a cluster (client access over narrow, host-local interfaces and the ability to clearly isolate the service network from clients) allows us to avoid many of the problems faced by traditional distributed systems.
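As an indication of how simple such an interface-level policy can be, the following check sketches source-address enforcement on a client VM's transmit path. struct vm_netinfo is an assumed per-VM record of assigned addresses introduced for illustration; the real backend instead "stamps" (rewrites) the headers rather than dropping frames.

/* Sketch of source-address enforcement on a client VM's transmit path.
 * struct vm_netinfo is an assumed record of the VM's assigned addresses. */
#include <string.h>
#include <linux/if_ether.h>
#include <linux/ip.h>

struct vm_netinfo {
    unsigned char mac[ETH_ALEN];
    unsigned int  ip;              /* network byte order */
};

int tx_frame_allowed(const struct ethhdr *eth, const struct iphdr *iph,
                     const struct vm_netinfo *vm)
{
    if (memcmp(eth->h_source, vm->mac, ETH_ALEN) != 0)
        return 0;                  /* forged source MAC: reject */
    if (iph != NULL && iph->saddr != vm->ip)
        return 0;                  /* forged source IP: reject  */
    return 1;
}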

3.4.4 Failure Isolation and Fate Sharing

Just as virtualization on a single host allows us to isolate two co-located servers from all but failures in physical equipment and the VMM itself, this approach to building cluster services allows a horizontal isolation of a distributed service throughout the cluster environment.

In this manner, virtualization allows services to be composed using an isolated "slice" of the cluster, consisting of a set of VMs on separate physical hosts, each serving local client VMs over local device channel connections. A novel aspect of this approach relative to other distributed or peer-to-peer service structures is that each server VM shares fate with its clients for the majority of failure conditions. For example, while it is still possible for network


disruptions to interfere with the distributed service cluster-wide, we have high degrees of assurance that local device channels will remain serviceable.

The use of VMMs to isolate failing software extensions, specifically device drivers, was presented in [FHN+04]. In that paper we demonstrated the ability to isolate device driver crashes, and to restart driver VMs with minimal disruption to clients.

3.4.5 Administrative Isolation

In addition to the isolation of individual services from other VMs within a cluster, this approach also allows a high degree of administrative isolation, especially where service VMs actually manage physical components of each host. For example, a storage service may actually control the local disks in each physical machine within a cluster, allowing a single administrator to specialize in storage administration, having remit over all aspects of storage throughout a hosting facility. This includes the ability to administer the entire vertical stack of software pertaining to a given service, including low-level tasks such as the upgrade of device drivers, across the hosting facility. This approach is similar to that of virtual appliances [SBC+03], but applied to cluster-wide services, and to lower-level interfaces.

Device services are an attractive approach to managing I/O resources in large computer installations because they allow the isolation achieved by VMMs to be applied to the management of devices. An example of this approach is Parallax, a device service for managing storage resources which aggregates local disks and network-attached storage into a single centrally-managed facility. Parallax was initially described in a paper at HotOS [WRF+05], and is the subject of ongoing research in the lab. The system is presented in detail in Chapter 5.


3.5 Summary

This chapter has presented the core approaches taken by this thesis to virtualizing and extending devices. In a VMM environment, split drivers are used to isolate driver VMs, which run unmodified drivers and export them to client VMs over idealized interfaces. A device tap may be interposed on this interface and used to host device extensions, which may be written in userspace using high-level languages and development tools. These extensions, or soft devices, may then be aggregated within a cluster environment to compose device services, which allow the isolation provided by VMMs to be applied to the management of I/O devices across a large compute facility.

Table 3.3 summarizes the material presented in this chapter by restating the original problems as described in the introduction and briefly revisiting the key aspects of each solution. The remainder of this thesis attempts to validate these techniques by describing extensions and device services that have been constructed above these mechanisms. Chapter 4 presents a network soft device that aims to prevent denial-of-service attacks being hosted from within VMs. Chapter 5 describes Parallax, a device service for the management of storage in cluster environments. Chapter 6 describes the use of disk and network extensions to support the addition of a new architectural feature, tainted data tracking, to a virtual machine.


Device Support in a VMM (Split Drivers)

Problem: How should device access within a VMM be structured so as to allow guest OSes to share hardware resources while balancing issues of performance, dependability, security, and software maintenance effort?

Key aspects of split drivers as a solution:
  Multiplexing - Idealized interfaces allow new OSes to be rapidly modified to use virtualized devices. Split drivers allow multiple VMs to safely share access to a common physical device.
  Source Independence - Isolating drivers in OS-granularity units and directly exporting hardware allows existing physical drivers to be used. The VMM does not need to track driver-OS interfaces, and may use drivers from any OS.
  Isolation - Driver code executes in an isolated virtual machine. The system is protected from most forms of driver crashes. Driver VMs may be rebooted for recovery.

Device-level extensions (Soft Devices)

Problem: How can device-level system interfaces be extended to allow the safe introduction of new features with reasonable performance, while making extension development both reasonably easy and portable across OSes?

Key aspects of device taps as a solution:
  Tamper-resistance - Device extensions execute outside of the client VM and are protected against tampering, even if the OS is compromised.
  Proximity - Extensions are directly associated with hardware interfaces. OS behaviour may be controlled for activities such as rate-limiting and resource control.
  Source Independence - Extensions are portable across OSes. Extension developers need not track changes to OS-device interfaces over time.
  Isolation - Extensions are written as user-space applications, and optionally run in their own virtual machine. Failure is isolated from affecting the rest of the system. As with split driver backends, extensions may easily be shut down and restarted while serving requests from running VMs.

Managing device access in VMM-based clusters (Device Services)

Problem: As virtualization is deployed into large cluster environments, how can devices be managed in a facility-wide manner, while catering to new capabilities, such as migration, that are afforded by VMMs?

Key aspects of device services as a solution:
  Performance Isolation - Overload of a service slice will not have cascading effects onto unrelated VMs in the cluster.
  Security Isolation - Services present only device interfaces, limiting exposure to security exploitation.
  Failure Isolation - Crashes and related failures in service VMs do not compromise the stability of non-dependent VMs.
  Administrative Isolation - A device service presents a single cluster-wide administrative domain.

Table 3.3: Summary of Approaches to the Management of Virtual Devices

Chapter 4

Soft Devices for Traffic Management

The virtualization of network resources has been of particular interest in recent years, as facilities such as virtual hosting providers and academic research networks strive to provide data-link (typically Ethernet) layer access to shared network resources in a fair and isolated manner. My original involvement was with the network subsystem of Xen, which was described both as a Planetlab design note [WHTP02] and in the original Xen paper [BDF+03].

This chapter explores the application of device extensions to address problems relating to the management of network resources in VMM-based environments. As with other examples presented throughout this thesis, device extensions are particularly relevant in that they first present an opportunity to cleanly extend systems in a manner that has hitherto been difficult to achieve. Second, they address emerging problems that result from the scale and division of administrative responsibility presented by large VMM-based environments. These two characteristics may be restated specifically in terms of network resources as follows:

1. The ability to place isolated extensions at the network edge. Soft devices are an especially powerful abstraction with regard to network resources in that, unlike in-OS extensions, they cannot be directly subverted under the compromise of administrative access on the client OS. Secondly, extensions may limit a VM's ability to transmit data by directly applying back-pressure: they may simply refuse to accept transmissions, forcing the burden of queueing onto the client OS itself.

2. The need to manage network resources in VM clusters. The division of the administrative role that occurs in VM environments, first discussed in Chapter 2, leaves facilities administrators responsible for the management and accounting of resources available to VMs. This responsibility is particularly prevalent with regard to network resources, where administrators may be at least partially responsible for Internet traffic generated by individual VMs. As such, administrators desire techniques to ensure that VM traffic is appropriately accounted and well-tempered.

This chapter aims to present a general overview of the challenges of network device virtualization beyond those mentioned in Chapter 3. While the coverage in the previous chapter was concerned with the architecture of network virtualization support, the coverage here addresses additional issues that exist above the split driver implementation. Additionally, this chapter presents an example that validates the ease and flexibility of development of device extensions based on the soft device framework. The example described here is a packet symmetry-based rate limiter, which prevents VMs from generating common forms of denial-of-service (DoS) traffic. The general approach of traffic symmetry has recently been presented at HotNets [KWC+05], while the treatment of the material presented in this chapter is specific to VMM-based environments. The symmetry-based limiter aims to provide an initial demonstration of the applicability of device extensions to solve real problems in VM environments.

4.1 Network Device Extensions in Xen

Before describing a more advanced network extension, it is worth noting that the existing network virtualization provided to VMs above Xen already augments the physical network interface with a variety of extensions.


4.1.1 Preventing Source Address Spoofing

The first example of this augmentation is in virtual network support to prevent the transmission of packets whose source addresses do not match the configuration of the virtual interface. This "antispoof" feature was demonstrated in the context of programmable network interfaces by previous research in the lab [PF01], which was in turn based on earlier packet-filter work such as DPF [EK96]. Additionally, similar features have recently been added in Windows XP Service Pack 2, which explicitly prevents the generation of UDP packets containing a source address that is not currently assigned to a physical interface.

As the network backend driver maps packets that are queued for transmission, it copies out a range of data from the beginning of each packet into its own private address space. The copied region includes the source addresses from the Ethernet and IP frames, which may be validated against the transmitting virtual interface. Maintaining this data as a private copy prevents tampering by the transmitter after validation and allows the backend to "stamp" packets with the appropriate MAC source address before transmission.

This facility has been in place since the original version of Xen. On almost all interfaces, scatter-gather DMA is used to link the copied region of packet headers with the remaining payload data mapped from the transmitting VM.
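As a rough illustration of this check, the following sketch validates and stamps a copied header region against a per-interface configuration. The structure and function names (vif_config, antispoof_check) are illustrative assumptions and do not correspond to the actual Xen backend code.

#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <arpa/inet.h>
#include <net/ethernet.h>
#include <netinet/ip.h>

/* Hypothetical per-virtual-interface configuration. */
struct vif_config {
    uint8_t  mac[ETHER_ADDR_LEN];  /* MAC assigned to the virtual interface */
    uint32_t ipv4;                 /* assigned IPv4 source, network order   */
};

/*
 * Validate (and stamp) the backend's private copy of a packet header before
 * it is linked with the payload for transmission.  Returns 0 if the packet
 * may be sent, -1 if it must be refused.
 */
static int antispoof_check(const struct vif_config *vif, uint8_t *hdr, size_t len)
{
    struct ether_header *eth = (struct ether_header *)hdr;
    struct ip *iph;

    if (len < sizeof(*eth) + sizeof(*iph))
        return -1;

    /* Stamp the Ethernet source with the interface's real MAC address. */
    memcpy(eth->ether_shost, vif->mac, ETHER_ADDR_LEN);

    if (ntohs(eth->ether_type) != ETHERTYPE_IP)
        return 0;                         /* non-IP traffic handled elsewhere */

    iph = (struct ip *)(hdr + sizeof(*eth));
    if (iph->ip_src.s_addr != vif->ipv4)
        return -1;                        /* source address not owned by VM   */

    return 0;
}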

4.1.2 Resource Isolation and Rate Control

The backend driver must provide fair use of network resources across the set of VMs which share a device. This enforcement is generally applied to outgoing traffic, as the physical host is effectively powerless to limit inbound traffic in a useful way. However, inbound traffic is limited in the case that a VM has not donated sufficient pages of memory to receive arriving packets: where donated pages are not available to transfer received packets, those packets are simply dropped.

Packet transmission is managed by the backend driver iterating across the set of device channels with pending outbound traffic.


The backend maintains a list of channels awaiting transmission, and serves them in a round-robin fashion, dequeueing and transmitting a fixed number of packets from each interface at a time. VMs are prevented from enqueueing more packets than may fit on the transmit ring, in a manner analogous to queueing packets for transmission on a physical device. The backend can easily throttle outbound traffic from a given VM by not dequeuing packets and disabling event notifications on the device channel.

Additionally, the split network drivers in Xen have built-in support for leaky-bucket, per-interface rate limiting. This support allows a hard limit to be placed on the number of bytes that a VM may transfer within a window of time, and is enforced within the backend driver.

Finally, as the network backend has been tightly integrated with the existing Linux networking code, the Linux network management facilities are available for the control of traffic. The Linux bridge or routing utilities may be used to handle traffic forwarding, while IP tables and rate control extensions may be used to filter and shape traffic.
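The leaky-bucket enforcement can be pictured with a minimal sketch along the following lines; the structure and function names (vif_rate, rate_may_send) are illustrative and do not mirror the fields used by the Xen backend.

#include <stdbool.h>
#include <stdint.h>

/* Illustrative per-interface rate-limit state. */
struct vif_rate {
    uint64_t credit_bytes;   /* bytes that may still be sent this period */
    uint64_t credit_max;     /* bytes permitted per period               */
    uint64_t period_usec;    /* replenishment period                     */
    uint64_t next_refill;    /* absolute time of the next refill         */
};

/* Refill the bucket once the current period has elapsed. */
static void rate_refill(struct vif_rate *r, uint64_t now_usec)
{
    if (now_usec >= r->next_refill) {
        r->credit_bytes = r->credit_max;
        r->next_refill  = now_usec + r->period_usec;
    }
}

/*
 * Decide whether a packet may be dequeued from the device channel.  If not,
 * the backend simply leaves it on the ring, applying back-pressure to the VM.
 */
static bool rate_may_send(struct vif_rate *r, uint64_t now_usec, uint64_t pkt_bytes)
{
    rate_refill(r, now_usec);
    if (pkt_bytes > r->credit_bytes)
        return false;
    r->credit_bytes -= pkt_bytes;
    return true;
}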

4.1.3 Migration of Network Addresses

When migrating virtual machines within a cluster, the virtual network interface must change physical location and its traffic must be redirected to reflect this move. In order to minimize the period of disconnection, the migrating VM carries its MAC address with it, and an unsolicited ARP advertisement is sent to indicate that the interface has moved. This technique allows a VM to be relocated within the context of a local subnet very efficiently and with a minimum of lost packets; it is one of the key techniques used to achieve the fast live migration of VMs [CFH+05].

Migrating VMs across subnets is also possible, but requires more careful integration with packet forwarding in the cluster. At the point of migration, associated routers will require reconfiguration to reflect the relocation of the migrating VM. It is very likely that enhanced support for cross-subnet migration will be developed in the near future.
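As a sketch of the announcement itself (the real frame is built by the Xen control tools and may differ in detail), the helper below lays out the 42-byte Ethernet/ARP reply that could be queued on the physical interface after a migration; it is an illustrative construction, not the production code.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Build an unsolicited ("gratuitous") ARP reply announcing that `ip` is now
 * reachable at `mac`.  The Ethernet + ARP frame is written into buf, which
 * the caller then transmits on the physical interface.  ip is expected in
 * network byte order.
 */
static size_t build_gratuitous_arp(uint8_t *buf, const uint8_t mac[6], uint32_t ip)
{
    static const uint8_t bcast[6] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
    uint8_t *p = buf;

    memcpy(p, bcast, 6); p += 6;            /* Ethernet destination: broadcast */
    memcpy(p, mac, 6);   p += 6;            /* Ethernet source                 */
    *p++ = 0x08; *p++ = 0x06;               /* EtherType: ARP                  */

    *p++ = 0x00; *p++ = 0x01;               /* hardware type: Ethernet         */
    *p++ = 0x08; *p++ = 0x00;               /* protocol type: IPv4             */
    *p++ = 6;    *p++ = 4;                  /* HW / protocol address lengths   */
    *p++ = 0x00; *p++ = 0x02;               /* opcode: reply                   */

    memcpy(p, mac, 6);  p += 6;             /* sender MAC                      */
    memcpy(p, &ip, 4);  p += 4;             /* sender IP                       */
    memcpy(p, bcast, 6); p += 6;            /* target MAC (ignored/broadcast)  */
    memcpy(p, &ip, 4);  p += 4;             /* target IP = sender IP           */

    return (size_t)(p - buf);               /* 42 bytes */
}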


4.1.4 Virtual Private Networks

Finally, support exists to use Ethernet-over-IP tunnelling to build virtual private networks between VMs. In this configuration, the backend constructs a VPN by maintaining a set of encrypted tunnels between participating physical hosts. The Ethernet link visible to a VM is then bridged over these tunnels, and physically disjoint VMs communicate as if they share a common subnet.

In all of these cases, extension code runs above the unmodified physical device driver, but outside of the isolated client VM. As extensions are frequently used to achieve security or resource-limiting goals, their placement outside the client VM ensures a great deal of resistance to interference from compromised or otherwise malicious VMs.

4.2 Operator Responsibility and Denial of Service

It is generally in the interest of Internet providers and hosting facilities to ensure that the network connectivity they provide is not misused. At the present time this incentive is largely an economic one: providers negotiate the terms of upstream transit links, which typically charge based on a combination of link rate and traffic volume. As a result, hosting compromised machines that are used to perform DDoS may have a direct cost in terms of bandwidth charges to the provider.

Legally speaking, DDoS represents a very challenging situation to resolve, as attackers are often difficult to identify and the resources used are typically compromised hosts belonging to third parties. It has been suggested that service providers may have a legal responsibility to prevent their networks from being used to host denial of service attacks [PP03, Rad01], as they may be held liable as a negligent third party. To my knowledge, these assertions have not been tested in court, as many claims appear to be quietly settled by lawyers and insurance companies [Sca01]. In the absence of clear regulatory guidelines, these articles all advise providers to adopt industry "best practices", and to do everything they reasonably can to ensure that their networks are not misused. Section 4.3 describes an approach, based on the symmetry of inbound and outbound packet counts, that may be deployed as a network extension in VM hosting environments to drastically mitigate the risk of DDoS attacks being sourced from compromised VMs.

4.3 Packet Symmetry Enforcement

As mentioned in Section 4.2, providers are under increasing pressure to ensure that compromised hosts on their networks are not used to mount DDoS attacks on others. This is especially true in hosting centres, where, unlike home DSL-connected workstations, hosts are server-class machines with high-rate network connectivity. In such environments, a single compromised host can generate DDoS traffic equivalent to hundreds of DSL users.

Fortunately, the ability to place traffic management extensions outside the client OS, but on the same physical host, provides administrators with a unique opportunity to manage traffic. For instance, ingress filtering, as advised by RFC 2827 [FS00], can be employed at the virtual network interface, where packet provenance may be absolutely determined at the resolution of a fully-specified IP address.

This placement of extensions on the same physical host is similar to OS-kernel-based traffic limiting techniques such as the virus throttling proposed in [Wil02], "distributed firewalls" [Bel99], and Microsoft's recently produced networking security enhancements released in Windows XP Service Pack 2 [And04, Mic04]. However, OS-based approaches may be circumvented in the case of a system compromise: if an exploit gives 'root' or 'Administrator' access, then an OS-based approach is effectively useless. This is demonstrably true in the case of Microsoft's extensions: SP2 limits a host to no more than 10 partially open TCP connections to different remote hosts, and disallows the use of raw sockets to generate UDP packets with IP source addresses that are not assigned to an existing interface. Binary patches are publicly available that allow both of these limitations to be removed by directly modifying the Windows network stack.

An additional, and unique, benefit of limiting traffic at the virtual interface is that extensions are not required to drop packets. Instead, they may opt simply not to dequeue packets that a VM is attempting to transmit, forcing that VM to bear the burden of queueing while avoiding drops which may be misinterpreted as an indication of congestion.

4.3.1 Packet Symmetry

This work focuses on the observation that well-behaved connections exhibit a balance between transmitted and received packets. For example, although most TCP connections exhibit unidirectional data transfer, acknowledgements still flow back from the receiver to the transmitter at a rate of approximately one ACK for every two segments received; even in cases of loss we can expect this ratio to remain relatively small.

One can argue that symmetry also holds for the vast majority of UDP flows; SunRPC protocols clearly possess symmetry, and even streaming media protocols such as RTP exhibit packet-rate symmetries of better than 6:1. It seems reasonable to argue that any correct Internet protocol must exhibit the flow of packets in both directions if it is to be responsive to conditions in the network or at the peer host.

In light of these observations, many forms of malicious traffic may be constrained by rigidly enforcing a threshold of outbound packet symmetry. To quantify this symmetry, the following metric compares the number of transmitted packets (tx) and received packets (rx):

S = log_e( (tx + 1) / (rx + 1) )

This metric for symmetry produces negative values when rx outweighs tx, positive values when tx outweighs rx, and zero in the case of perfectly balanced traffic. The absolute value of S measures the magnitude of the asymmetry.

This model treats the transmission of reply or acknowledgement traffic as an implicit signal that it is acceptable to continue to send to a specific destination. We enforce an asymmetry limit of S ≤ 2.0, a ratio of roughly seven to one; this limit was established empirically through trace analysis performed by Christian Kreibich, the details of which are described in [KWC+05]. As a general architectural approach, this addition of implicit signalling based on response traffic addresses the weakness in the current Internet design that allows DDoS attacks to be carried out in the first place: there is no way for an end host to indicate that it does not wish to receive traffic from a given source. This inability to signal has been observed by other work, which has proposed more drastic architectural change to the network [ARW03, ALPS03, HG04]. [ARW03], for example, proposes using capabilities to prevent unwanted traffic: a server must explicitly grant a client the ability to send. This explicit signalling requires modification to both ends of a connection, whereas the symmetry-based approach is incrementally deployable at the transmitting hosts.

A second major problem in DDoS prevention lies in identifying the hosts that are at the source of an attack, which frequently spoof source addresses. The difficulty in determining attack sources makes filtering attacks at the receiving end very difficult and has motivated the proposal of trace-back schemes such as [SWKA01, SPLS+02, YPS03]. The use of ingress filtering as an extension to the virtual interface provides the invariant that a VM may only generate packets from its assigned network addresses.

The asymmetry-based limiter delays, rather than drops, packets at the transmitting source. The approach acts like a punitive traffic shaper: as packets exceed the threshold symmetry, it introduces a delay between transmitted packets. This delay grows in proportion to the degree of asymmetry, throttling the host's ability to generate traffic.

Figure 4.1: Illustration of Asymmetry-based Rate Limiting (asymmetry value and per-packet delay in milliseconds, plotted against unacknowledged packets transmitted).

Figure 4.1 shows an illustration of this asymmetry-based limiter, using the parameters of the implementation described in the next section. The graph plots the increasing value of asymmetry as a host generates unacknowledged packets.


Figure 4.2: Simple DDoS Example: UDP Flood. The plot shows the transmit packet rate (K packets/s) and resulting asymmetry of an unacknowledged flood of 1KB UDP packets, with no limiting, against elapsed time (s).

The asymmetry value crosses the threshold, marked by the horizontal line at S = 2.0, at which point the delay function begins to generate increasing per-packet delay penalties for as long as the asymmetry exceeds the threshold. Asymmetry is calculated within a sliding window, so once a sufficiently large delay has accumulated, the host is effectively prevented from generating traffic for the length of that window.
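The metric itself is trivially cheap to evaluate per host-pair or per flow. As a minimal sketch (not the extension code itself), the following program computes S as defined above and compares it against the 2.0 threshold:

#include <math.h>
#include <stdio.h>

/* Symmetry metric: S = log_e((tx + 1) / (rx + 1)). */
static double packet_symmetry(unsigned long tx, unsigned long rx)
{
    return log((double)(tx + 1)) - log((double)(rx + 1));
}

int main(void)
{
    /* A host that has sent 100 packets and received 10 acknowledgements. */
    double s = packet_symmetry(100, 10);
    printf("S = %.2f (threshold 2.0 %s)\n", s,
           s > 2.0 ? "exceeded" : "respected");
    return 0;
}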

4.3.2 Prototype Implementation

I have implemented an asymmetry-based traffic limiter as a soft device for Xen. The implementation interposes on all traffic to and from the client VM, calculating per-host-pair and per-flow symmetry values. As mentioned above, the filter is calibrated to an asymmetry value of S = 2.0, which is approximately a 7:1 ratio of transmitted to received packets within a one-second sliding window. This is a liberal allowance, as it gives hosts the ability to send seven packets per second before any delay is introduced.

In a real-world deployment, the ratio might be reduced, and the window size could certainly be grown. All traffic passing through the prototype with a symmetry value of 2.0 or less is dequeued and sent to the physical driver for transmission. When asymmetry exceeds 2.0, we begin to count packets as outstanding using a monotonically increasing packet counter n. Each outstanding packet is then delayed by 2n ms before it is transmitted. This delay continues to accrue and be applied to transmitted packets until the symmetry falls back below the threshold value, at which point the timer is reset. This penalty function was intended as a rough initial estimate, and will likely be reconsidered in future work on symmetry-based enforcement.

Figure 4.2 shows an example of DDoS-like traffic. The graphs plot the transmit packet rate and the resultant asymmetry of a UDP flood attack against a remote host, a simple form of bandwidth-consumption DDoS. The two hosts are connected over 100Mbit links in a local LAN. The local host transmits 1KB UDP packets as fast as it can, saturating the link. The symmetry value plotted in the lower graph rises logarithmically as the attack continues.

Figure 4.3 shows the effect of symmetry enforcement on the UDP flood. As can be seen, the UDP transmission is aggressively limited at the threshold due to the lack of received packets to offset the asymmetry. Compare this to the scp-based file transfer shown in Figure 4.4: scp also saturates the link (achieving a lower packet rate because it is sending larger packets), but as it is a TCP-based protocol, it stays well below the asymmetry threshold.

Figure 4.3: Limiting UDP Flood Based on Packet Symmetry. The plot shows the transmit packet rate (K packets/s) and asymmetry of the 1KB UDP flood, with limiting enabled, against elapsed time (s); the asymmetry threshold is marked.

There are currently two implementations of the asymmetry extensions. One extends the backend network driver to enforce host-granularity asymmetry: all packets on a given virtual interface are counted together, and the host is punished for violating symmetry across the aggregate of its connections. This implementation has the benefit of being simple, and of requiring very limited state. A second (original) implementation is written as an extension using the libipq interface, and allows per-flow symmetry to be evaluated. This requires the queueing of outstanding packets in the extension code, as they must be transferred to the extension for their five-tuple-based flow to be evaluated. A mechanism in the split network drivers to allow extensions to "peek" at pending packets, and selectively dequeue specific messages, would allow VM-side queueing to be preserved while enforcing flow-granularity symmetries.

While the work here describes symmetry enforcement in the context of a VM environment, my work on this project with other members of the research group, described in [KWC+05], considers symmetry in the larger context of the global Internet. I intend to continue investigating this subject in the future, as we plan to deploy a symmetry shaping device to limit external traffic on a college network.
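A compressed sketch of the enforcement logic is given below. The window bookkeeping and names are illustrative assumptions, and the penalty follows the 2n ms-per-outstanding-packet form described above rather than the exact prototype code.

#include <math.h>
#include <stdint.h>

#define SYM_THRESHOLD  2.0     /* maximum tolerated asymmetry            */
#define WINDOW_USEC    1000000 /* one-second window (simplified reset)   */

/* Illustrative per-interface (or per-flow) limiter state. */
struct sym_state {
    uint64_t tx, rx;           /* packets seen in the current window     */
    uint64_t window_start;     /* start of the window, in microseconds   */
    unsigned outstanding;      /* packets counted while over threshold   */
};

static double symmetry(const struct sym_state *s)
{
    return log((double)(s->tx + 1)) - log((double)(s->rx + 1));
}

/*
 * Called for each packet the VM attempts to transmit.  Returns the delay,
 * in milliseconds, to impose before the packet is dequeued (0 = send now).
 * A simple window reset stands in for the real sliding-window bookkeeping.
 */
static unsigned tx_delay_ms(struct sym_state *s, uint64_t now_usec)
{
    if (now_usec - s->window_start >= WINDOW_USEC) {
        s->tx = s->rx = 0;
        s->outstanding = 0;
        s->window_start = now_usec;
    }

    s->tx++;
    if (symmetry(s) <= SYM_THRESHOLD) {
        s->outstanding = 0;            /* back under threshold: reset timer */
        return 0;
    }

    s->outstanding++;                  /* delay grows with each outstanding */
    return 2 * s->outstanding;         /* packet: 2n ms, per the prototype  */
}

/* Receive path: acknowledgement traffic restores symmetry. */
static void rx_packet(struct sym_state *s)
{
    s->rx++;
}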


Figure 4.4: High Throughput TCP Traffic Remains Unaffected. The plot shows the transmit packet rate (K packets/s) and asymmetry of an scp-based transfer of a 128MB file against elapsed time (s); the asymmetry stays well below the marked threshold.

4.4 Summary

This section has presented an overview of a set of network device modifications that are beneficial in supporting virtual network interfaces for virtual machines. These environments provide both challenges and opportunities that do not exist in traditional hosting environments: as virtual machines are deployed in large cluster environments, network access must support VM migration, and operators must become concerned with the increasing potential for compromised facilities to be exploited for the generation of malicious traffic. Device extensions allow the introduction of new functionality that responds to these challenges below a VM's device interface. As such, extensions may efficiently interact with the VM, while remaining isolated from exploitation.

A future area of interest in this work is the development of a device service to allow the accounting and rate-limiting of traffic generated by groups of VMs as they run across a subset of physical hosts. This service would involve the development of a distributed version of a rate-control algorithm such as the leaky bucket. Initial investigations in this vein have shown that while it is easy to rigidly divide a resource allocation (e.g. network throughput) across a set of distributed hosts, giving each a fixed share, this allocation is clearly suboptimal. A better solution would allow individual hosts to sequentially transmit at the aggregate maximum, but still reactively limit them in the case where the maximum is exceeded. The development of this thesis has involved some initial investigation into this problem, but further design and implementation is left as future work.

The symmetry-based rate limiter is an example of a relatively simple, practical device extension that is beneficial in a virtual machine environment. It has recently been published at HotNets [KWC+05], and is the subject of ongoing investigation in the lab. The next chapter switches focus to explore extensions for block devices, focusing on a device service that provides storage for virtual machines.

Chapter 5

Soft Devices for Storage

As with network resources, the move to VMM-based environments represents a fundamental shift in the requirements for storage, both in terms of mechanistic issues relating to scale and bandwidth, and administrative requirements for describing and deploying storage resources. Despite offerings of "storage virtualization" products, almost all storage products that have been available until very recently have focused exclusively on the unification of a set of physical devices into a storage substrate that can be divided and exported as logical disks. Virtual machines have placed new requirements on storage in several dimensions:

• Capacity. Capacity is a key issue in the deployment of hardware virtualization. First, as physical servers are capable of hosting an order of magnitude more virtual machines, there is a concurrent requirement for system images for these virtual hosts. Second, live migration introduces a requirement that storage be more available throughout a cluster, potentially introducing a need for additional replication of data. Third, if techniques for debugging [KDC05], diagnosis [WCG04], and improved availability [QTSZ05] gain traction in deployed systems, there is a requirement for very fine-grained snapshots of entire system images to be available in an online fashion.

• Bandwidth. As with capacity requirements, an increase in the number of OS-granularity storage clients has a direct impact on storage bandwidth requirements. This is particularly true during physical reboots and the execution of timed maintenance tasks.

• Location Transparency. As suggested above, live VM migration implicitly requires that storage be available in a location-transparent fashion. While network-attached storage provides a high degree of network transparency by nature, OSes have not traditionally changed physical location within the network, and configuration management for this mobility must be addressed.

• Administration. Finally, the division of administrative roles described previously has clear implications for storage. While system administrators managing VMs may take a large amount of responsibility for the installation and management of system software, facilities administrators require tools to create and manage logical storage quickly and efficiently, to account for its use, and potentially to guard against systems which are exposed to threat due to the presence of vulnerable software.

This chapter expands on the storage requirements presented by virtual machines and details approaches which have been taken in current systems. It then introduces the Parallax storage system as an example of a device service as introduced in Chapter 3. Parallax is a storage service designed specifically for VMM-based clusters, motivated by the storage requirements mentioned above. A prototype of the system was described in a paper at HotOS [WRF+05], and this chapter presents a detailed description of a more complete implementation than presented previously.

5.1 Parallax: A Distributed Storage Service for Virtual Machines

Parallax is a device service, based on per-physical-host soft device extensions, that aims to address the storage problems presented above. Parallax is based on a storage abstraction called a Cluster Virtual Disk (CVD). A CVD is a virtual storage device that is available throughout a cluster, and provides a mapping of physical blocks on an underlying storage substrate into a contiguous set of virtual blocks, which are exported directly to a virtual machine. The CVD abstraction takes advantage of the commonality between disks used by VMs to aggressively share common blocks across disks, and to easily create new disks based on well-known images.

While a prevalent issue in existing systems, concerns regarding capacity and bandwidth are only the immediate problem in VMM-based environments. The design of Parallax aims to address the emerging research applications of virtual machines that do not simply require support for scaling capacity and bandwidth. Two applications in particular, large honey farms and VM replay, require fast and efficient snapshots of virtual disk images.

Honey Farms and Application Sandboxes

VMs have attracted considerable attention from the security community as a means for intrusion detection. The Collapsar [JX04] project uses virtualization to turn a single physical host into a honey farm of virtual hosts, allowing the logging and detection of network-based attacks. More recently, the Collaborative Centre for Internet Epidemiology and Defenses (CCIED) has undertaken the construction of a VM-based honey farm containing a million virtual hosts [VMC+05], where the creation of individual VMs is triggered based on packet arrival.

In these situations, the use of VMs results in a dramatic increase in the number of system images that are required. Fortunately, the required VM images are very similar and it is consequently possible to share the common portions. It may often, for instance, be desirable to manage numbers of disks based on a small set of template images. One of the key aspects of the CVD primitive is a copy-on-write block-level virtualization of underlying storage. The implementation discussed in Section 5.1.2 is capable of generating 10,000 new disks from a given template in under 40 seconds.

Replay: Forensics and Debugging

A growing set of OS research efforts uses 'time-travel'—the ability to rewind and replay a VM through historical states—in order to add new functionality. In this area, virtualization has been used both for the forensic examination of historical system state, and for analyzing state changes over time to isolate the root cause of failures.

BackTracker [KC03] is a system that uses virtualization to isolate and analyze intrusions into a virtualized OS. More recently, the same set of researchers have developed a time-travel debugger [KDC05] that allows a system to be rewound, instruction by instruction, after a crash.


The Chronus system [WCG04], developed at the University of Washington, allows administrators to apply a probe over historical versions of a machine's disk in order to isolate the point at which a malfunction began. Chronus then isolates the disk request that was responsible, in the hope that higher-level tools might be able to resolve the problem. In addition to these examples, there are several other clear opportunities to take advantage of time-travel and the existence of past system states. I expect that cluster-wide snapshot and replay tools will be available for VM clusters over the next few years.

Preserving old state and allowing VMs to move forward and backward through time presents a very interesting set of challenges for storage. The ability to snapshot a virtual disk at runtime must be provided with virtually no overhead. Furthermore, the system must be able to store and manage vast numbers of snapshot versions. In this regard, and in light of experiences gained through VM replay efforts at Cambridge, the Parallax design assumes a target of performing per-VM snapshots every 30 seconds. At this rate, the system must account for the creation of almost three thousand snapshots per CVD per day. As such, the snapshot mechanism has been designed to be as space- and time-efficient as possible.

5.1.1 Overall System Design

This section presents the design of Parallax, a distributed storage management service for clusters of virtual machines. Parallax includes three components: first, a distributed block store providing traditional storage virtualization, location transparency, and redundancy; second, a middle-layer persistent cache to mitigate the performance overheads involved in storage virtualization; and finally, CVD support built above these underlying block layers, presenting access and management interfaces to virtual machines. The design is motivated specifically by the requirements and examples presented above.

The design of both Parallax and the CVD primitive is based on a set of simplifying assumptions, specific to virtual machines, that aim to provide a system that is considerably more flexible and scalable than previous approaches to storage management. This section begins with a discussion of these design decisions.


Figure 5.1: Parallax High-Level Architecture. Each physical host runs a storage VM containing a parallaxd instance, the CVDs it serves, and a persistent cache; client VMs attach to their CVDs over the virtual block interface, while blockstored instances on each host communicate over the network to form the distributed blockstore.

5.1.1.1 Simplifying Assumptions

Virtual machine clusters present a new set of storage requirements but also have several properties that greatly simplify the design of a storage system. Parallax's design is based on two initial assumptions specific to the provision of whole system images in VMM environments.

Write Sharing Considered Harmful

Distributed storage has historically implied some degree of concurrency control. Write sharing of disk data, especially at the file system level, typically involves the introduction of some form of distributed lock manager. Lock management is a very complex service to provide in a distributed setting and is notorious for difficult failure cases and recovery mechanisms. Moreover, resolving write conflicts is a well-investigated area of systems research, and one for which no general solutions exist.

Parallax's design is based on the belief that persistent data sharing in a cluster environment should be provided by application-level services with clearly defined APIs, where concurrency and conflicts may be managed with application semantics in mind. As such, the system explicitly excludes support for write sharing of CVD images. Instead the system maintains the invariant that each CVD has at most one writer, greatly reducing the need for concurrency control. No communication between virtual machines is required to access CVD images, only between the virtual machine using the CVD and the block store. If virtual machines wish to communicate or share data, they simply use explicit network-facing protocols rather than implicit shared-file semantics. By avoiding write sharing completely, the state shared across hosts is limited to mostly-static catalogues of storage serving hosts, which only need updating in response to high-level administrative operations.

The reduction of complexity resulting from the removal of write sharing is profound. Unlike Petal [LT96], Parallax does not maintain cluster-wide global data structures for block allocation and concurrency control. The more recent federated array of bricks (FAB) project, directly motivated by Petal, mandates the use of voting protocols for consistency [SFV+04] and requires an intricate snapshot algorithm [Ji05].

Deferred Redundancy

The virtualization of any address space involves the use of address mapping, which in turn requires extra lookup steps. A difficulty in providing such mappings, for disk especially, is that map lookups have the potential to incur additional device accesses that can greatly hinder performance. The persistent cache in Parallax provides an interposition point that recovers this performance: the system is able to cache both blocks and mapping data locally while replicated writes to the block store proceed in the background. Ongoing work with the system will involve enhancing the block-device interface to include synchronization and reordering barriers. These barriers may be used to specify when it is safe to defer write-back or to perform parallel data accesses, and when it is required that data be written back to the redundant store to enforce higher-level semantics. Through direct exposure to these barriers, VMs will have the ability to selectively trade performance for availability guarantees.

5.1.1.2 Cluster Virtual Disks

Cluster virtual disks are the fundamental storage unit used by virtual machines in Parallax. A CVD is a single-writer extent of virtual storage, accessible anywhere within the cluster. CVDs present virtual machines with the interface shown in Table 5.1.


read(vblock) → data : Return the data stored at the specified virtual block address.
write(vblock, data) : Write the data provided at the specified virtual block address.
cvd_sync() : Flush all outstanding writes to the redundant store.
cvd_snapshot() → snap_id : Take a snapshot of the current CVD image.

Table 5.1: VM Interfaces to CVDs

The read() and write() interfaces reflect the minimal block device interface expected by a VM. CVDs additionally allow paravirtualized VMs to request that data be explicitly flushed to the distributed store using cvd_sync(). Finally, the cvd_snapshot() operation marks the current state of the disk as a snapshot in the CVD history.

Virtual Block Mappings

The set of virtual block mappings in a CVD is stored in a radix tree that resides in the same block store as the CVD contents. It maps contiguous virtual block numbers from a single CVD to global block addresses, which identify the block and its replicas in the distributed block store. Global block addresses in the current implementation are 128 bits1, and the size of the virtual block numbers is set at CVD creation time to the minimum size necessary to fully address that image. With four-kilobyte blocks, a radix tree with a depth of three blocks is sufficient to map a 24-bit address space, supporting CVDs up to 68 gigabytes in size.

Though the radix tree is stored in the distributed block store, it is also partly duplicated in the local persistent cache and obeys the no-write-sharing constraint. The root block of the radix tree and other blocks mapping active regions of the CVD are cached in memory as well, so no disk or network activity is required to resolve most global block address lookups. In particular, virtually sequential data blocks share radix tree pages, resulting in good performance for sequential lookups.

1One bit is reserved by the current radix tree implementation to distinguish between read-only blocks inherited from an earlier snapshot or template and writable blocks created as part of the currently active CVD.
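To make the mapping concrete, the sketch below walks a three-level radix tree of 4KB pages (256 entries of 16 bytes each, giving 8 address bits per level) to resolve a 24-bit virtual block number. The types, the position of the writable flag, and the fetch_page() stub are illustrative assumptions rather than the actual Parallax data layout.

#include <stdint.h>
#include <stdbool.h>

#define ENTRIES_PER_PAGE 256    /* 4KB page / 16-byte global addresses */
#define LEVELS           3      /* three levels cover 24 address bits  */

struct gaddr {                  /* 128-bit global block address        */
    uint64_t hi, lo;
};

struct radix_page {
    struct gaddr entry[ENTRIES_PER_PAGE];
};

static bool gaddr_writable(struct gaddr a) { return (a.hi >> 63) != 0; }
static bool gaddr_is_null(struct gaddr a)  { return a.hi == 0 && a.lo == 0; }

/* Stand-in for the persistent cache / block store lookup (not shown). */
static struct radix_page *fetch_page(struct gaddr where)
{
    static struct radix_page dummy;
    (void)where;
    return &dummy;
}

/*
 * Resolve a 24-bit virtual block number to a global block address, noting
 * whether every link on the path was writable (i.e. whether the block may
 * be modified in place rather than via copy-on-write).
 */
static struct gaddr cvd_lookup(struct gaddr root, uint32_t vblock, bool *in_place)
{
    struct gaddr cur = root;
    *in_place = true;

    for (int level = LEVELS - 1; level >= 0; level--) {
        struct radix_page *page = fetch_page(cur);
        unsigned idx = (vblock >> (8 * level)) & 0xff;

        cur = page->entry[idx];
        if (gaddr_is_null(cur))
            break;                       /* sparse region: unallocated block */
        *in_place = *in_place && gaddr_writable(cur);
    }
    return cur;
}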


create(name, [snapshot]) → CVD_id : Create a new CVD, optionally based on an existing snapshot. The provided name is for administrative convenience, whereas the returned CVD identifier is globally unique.
snapshot(CVD_id) → snap_id : Force a snapshot of the specified CVD. This is an administrative interface to the CVD snapshot facility described in Table 5.1.
list() → CVD_list : Return a list of CVDs in the system.
snap_list(CVD_id) → snap_list : Return the log of snapshots associated with the specified CVD.
snap_label(snap_id, name) : Label a snapshot with a human-readable name and mark it forkable.
snap_collapse(parent_snap_id, child_snap_id) : Where the parent and child are co-linear entries in the snapshot log, delete all snapshots beginning with the parent and exclusive of the child.
delete(CVD_id) : Delete the specified CVD. All snapshots are removed backward in time to the last forkable snapshot.
tree() → (tree view of CVDs) : Produce a diagram of the current system-wide CVD tree (see Figure 5.3 for an example).

Table 5.2: Administrative Interfaces to CVDs

Snapshots

The root node of a radix tree uniquely identifies the CVD to which it applies. This property facilitates support for lightweight snapshots, which require only a single block duplication. The snapshot becomes a read-only template from which the active CVD diverges over time using copy-on-write. Careful construction of metadata allows this to proceed in a space- and time-efficient manner.

A flag on each link in the radix tree indicates whether or not the target block (which may be either data or a reference to a successive level of the tree) is directly writable. All blocks created before the most recent snapshot are immutable, while those created in the context of the active CVD can be modified in place. The links in the old root block are not changed, so writable links provide a convenient way to identify all blocks that were created during a specific CVD generation. Taking a snapshot of a CVD requires making a copy of its radix tree root block and marking all of its links as read-only.


Figure 5.2: CVD Snapshot and Copy-on-Write. The diagram shows the snapshot log, the CVD record (last_snapshot, radix_root, capacity), the radix-tree address-mapping metadata with writable ("w") and read-only ("r") links, and the data blocks; the previous radix root is reached from the snapshot log, while the current radix root is reached from the CVD record.

Every link on the path from the root of the radix tree to the data block must be writable for the CVD to modify a block directly, so this forces the CVD to copy blocks when it makes changes, including those on the relevant path down the radix tree. Figure 5.2 illustrates how the radix tree block mapping structure provides snapshots and copy-on-write block access for CVDs. The figure shows a simplified radix tree mapping six-bit block addresses with two address bits per radix page. The example shows a CVD that has had a snapshot taken, and has subsequently written to a block of data at virtual block address 111111 (binary). The snapshot operation copies the radix tree root block and redirects the CVD record to point to the new root. All of the links from the new root are made read-only, as indicated by the "r" flags and the dashed grey arrows in the diagram. The address of the old root is appended, along with the current timestamp, to a snapshot log using a read-only link. The same writable-link semantics that govern links within a radix tree apply to the CVD record and snapshot logs, and so a CVD mounted using a read-only link is immutable; changes to it must be made in a copy-on-write fashion.


Entries in snapshot logs may be mounted read-only for the purposes of VM replay. Copying a radix tree block always involves marking all links from that block as read-only. A snapshot is completed using one such block copy operation, following which the VM continues to run using the new radix tree root. At this point, data writes may not be applied in place, as there is no longer a direct path of writable links from the root to any data block. The write operation shown in the figure copies every radix tree block along the path from the root to the data (two blocks in this example), and the newly copied branch of the radix tree is linked to a freshly allocated data block. All links to newly allocated (or copied) blocks are writable links, allowing successive writes to the same or nearby data blocks to proceed with in-place modification of the radix tree. The active CVD that results is a copy-on-write version of the previous snapshot.
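A snapshot therefore costs a single block copy plus a log append, roughly as in the sketch below. The types and the flag position are the same illustrative assumptions used in the lookup sketch above, and the log append is passed in as a stub rather than being the Parallax interface.

#include <stdint.h>
#include <string.h>
#include <time.h>

#define ENTRIES_PER_PAGE 256

struct gaddr { uint64_t hi, lo; };           /* 128-bit global address   */
struct radix_page { struct gaddr entry[ENTRIES_PER_PAGE]; };

/* Clear the (illustrative) writable flag on a link. */
static struct gaddr make_readonly(struct gaddr a)
{
    a.hi &= ~(UINT64_C(1) << 63);
    return a;
}

/*
 * cvd_snapshot sketch: the new root is a copy of the current root with
 * every link made read-only, and the old root's address is recorded in
 * the snapshot log with a timestamp.  Subsequent writes must then copy
 * the radix pages on the path to the data before modifying anything.
 */
static void cvd_snapshot(const struct radix_page *old_root,
                         struct gaddr old_root_addr,
                         struct radix_page *new_root,
                         void (*log_append)(time_t, struct gaddr))
{
    memcpy(new_root, old_root, sizeof(*new_root));
    for (int i = 0; i < ENTRIES_PER_PAGE; i++)
        new_root->entry[i] = make_readonly(new_root->entry[i]);

    log_append(time(NULL), old_root_addr);
    /* The CVD record is then redirected to point at new_root (not shown). */
}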

5.1.1.3 Administrative Interfaces

The mapping and snapshot mechanisms act to maintain the invariant that a CVD is a chain of read-only radix tree snapshots terminated by a single writable radix tree. The distinction between a snapshot and a new CVD forked from a template image is purely administrative; the same mechanism is used in each case. My prototype installation maintains a set of well-known "pristine" OS installations (FC2, FC3, Debian Sarge, and NetBSD) and generates new CVDs based on these.

The administrative CVD interface is shown in Table 5.2. The CVD interfaces are made available through a set of command-line tools available in administrative VMs. The implementation also provides some simple utility tools, used to populate CVDs with existing images. To create a new CVD, the create() interface is called and a name is provided to identify the CVD for convenience. The create call returns a unique CVD identifier, which may be used by a VM to mount the image. Newly created CVDs that are not based on an existing snapshot have empty radix root blocks. They occupy no additional space on the system, similar to a sparse file in most file systems. Once a functional image has been created and properly configured, the snap_label interface may be used to mark it as well-known. This is partly an administrative convenience, as CVD creation is based on system-wide snapshot IDs.


Figure 5.3: CVD Tree View—Visualizing the Snapshot Log. The example shows CVDs derived from three labelled template snapshots (NetBSD Pristine, Fedora Core 2 Pristine, and Fedora Core 3 Pristine); each node lists its creation time, snapshot count, and CVD or snapshot identifier.

The list of well-known snapshots is useful for quickly locating and creating a new CVD. Figure 5.3 shows sample output from the tree() interface2. The figure is a simplified example showing six CVDs, each based on one of three labelled snapshots. The tree tool, which uses AT&T's graphviz3 to represent the CVD hierarchy, collapses snapshots which do not represent forks and provides an easy-to-read representation of the current storage system state.

2 The cautious reader will notice that some timestamps in the CVD nodes precede the snapshot nodes on which they are based. This is because the leaf node timestamp represents creation time, while the template timestamp is snapshot time. The earlier leaves represent the initial CVDs used to create the template snapshot.
3 http://www.graphviz.org/


Figure 5.4: Administrative Data Structures. A cluster-wide Cluster Info structure lists the cluster managers; each CVD manager holds a CVD_list of CVD records, and each CVD record (last_snapshot, radix_root, capacity) refers to a snapshot log of (time, root) entries chained back through parent references.

5.1.1.4 Administrative Data Structures

The block metadata structures within Parallax have been designed to allow a set of physically distributed storage servers and virtual machines to share access to a common global block address space without necessitating a complicated distributed lock manager. While read and write operations made by individual VMs are lock-free, several administrative-timescale concurrency control checks are instituted, primarily for system safety. None of these mechanisms functions at a granularity that introduces either performance overhead or significant complexity.

Figure 5.4 shows an overview of the important block metadata structures. This figure is a higher-level view of the per-CVD mapping metadata illustrated in Figure 5.2. Block metadata is divided into a set of concurrency domains, where each domain has a single writer. The top of the figure shows a single, cluster-wide data structure (Cluster Info) which presents a list of "cluster managers". Each cluster manager is responsible for the metadata for a set of CVDs and their associated snapshot histories. A cluster manager exports the set of administrative interfaces described in Table 5.2; since it is the single writer for this entire set of metadata, it can be guaranteed safe access.


When a VM mounts a CVD, the cluster manager delegates write-ownership of that CVD's block metadata to it. The VM then becomes the single point of access for administrative interfaces to the CVD for the duration of the mount. In the figure, none of the CVDs are active, and so they all exist within the concurrency domains of their associated CVD managers.

The only piece of metadata that is potentially shared between writers for common administrative operations is the snapshot log. Recall that the log is a tree of timestamped pointers to individual radix roots, each entry representing a complete immutable file system snapshot. (An example snapshot log is shown in Figure 5.3.) As operations are provided both to create a new CVD based on an existing snapshot (resulting in a fork in the log), and to delete groups of snapshots, the system must be careful to avoid conflicts.

This problem is addressed by adding two bits to each entry in the log: the forkable and deleted bits. As its name suggests, the deleted bit indicates that a snapshot is no longer present in the system. The forkable bit indicates exactly the opposite: that a specific snapshot is guaranteed to be available. The forkable bit is set using the snap_label() interface, and allows administrators to safely create new CVDs based on snapshots to which they do not presently have write access.

Administrative operations that modify the snapshot logs may only be applied by the single writer of the CVD owning that portion of the log. Snapshot log blocks are owned by the CVD that creates them; for example, in Figure 5.3, the snapshot log belonging to "FC3 Dev Image 1" extends back to the root, including the template snapshot "Fedora Core 3 Pristine". "FC3 Dev Image 2" has been created based on this image, and so will only own the portion of the log back to, but exclusive of, this shared forkable entry.

Note that the depiction of the snapshot log as a tree is slightly misleading: there is no distinguished root node, nor are there forward pointers. Instead, the tree is implicitly traversed by tracing the snapshot logs backward from each CVD's active generation.

5.1.1.5 Deletion

The frequent creation of snapshots has the potential to consume a considerable amount of additional disk space. The original design was inspired by the arguments presented by Venti [QD02]: that storage is sufficiently inexpensive and plentiful that deletion is unnecessary. However, after some practical experience it became obvious that a delete mechanism would be beneficial. In a production setting the claims made by Venti are realistic; however, administrative policy may dictate that historical versions of data be removed after some period of time.

As shown in Table 5.2, two notions of deletion are provided at the interface level. In many cases, the desired action is simply to reclaim disk space by removing redundant snapshots. This approach has been discussed historically in the context of versioning file systems such as Elephant [SFH+99]. In more extreme cases, an administrator may desire to remove an entire CVD, additionally deleting all snapshots associated with it. The latter problem is very straightforward, as CVDs are single-writer leaf nodes on the snapshot log. Their deletion involves first removing the associated snapshots owned by the CVD and then destroying the CVD itself.

Removing a snapshot can be described as a radix collapse operation. In the base case of two neighboring parent and child snapshots with radix root nodes Sp and Sc respectively, the operation collapse(Sp, Sc) will delete the parent snapshot, removing all blocks that are not also used by the child; in essence, blocks which have been overwritten by the child may safely be removed from the parent. The collapse algorithm relies on the observation that a writable path from the radix root to an arbitrary block indicates that the block is owned by that root. The algorithm performs a simple pairwise depth-first search of the writable links in the two radix trees: links that are writable in both trees may be deleted from the parent, while links writable only in the parent are inherited by the child. All other links are left untouched.

Collapsing a range of snapshots is a simple extension of this approach. If the parent and child are co-linear entries in the snapshot log rather than immediate neighbors, deletion involves an iterative application of collapse() on the specified child and its immediate parent, discarding the then-deleted parent, until the specified parent is reached. As mentioned previously, entries in the snapshot log are not removed, but rather have their deleted bit set. Delete operations may not safely be applied across forkable snapshots, as there may be other CVDs referring to blocks in these snapshots.
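The pairwise walk can be sketched as follows. fetch_page(), free_block(), and inherit_into_child() are stubs standing in for the block store and re-linking machinery, and the fixed-depth recursion mirrors the bounded depth of the radix tree; this is an illustration of the described approach, not the Parallax code.

#include <stdbool.h>
#include <stdint.h>

#define ENTRIES_PER_PAGE 256

struct gaddr { uint64_t hi, lo; };
struct radix_page { struct gaddr entry[ENTRIES_PER_PAGE]; };

static bool writable(struct gaddr a) { return (a.hi >> 63) != 0; }

/* Stubs standing in for the block store (not the Parallax interfaces). */
static struct radix_page *fetch_page(struct gaddr a)
{ static struct radix_page p; (void)a; return &p; }
static void free_block(struct gaddr a)         { (void)a; }
static void inherit_into_child(struct gaddr a) { (void)a; }

/*
 * collapse(parent, child): pairwise depth-first walk of two radix trees.
 * Links writable in both trees point at blocks the child has overwritten,
 * so the parent's copies are freed; links writable only in the parent are
 * inherited by the child; all other links are left untouched.  depth is
 * the number of radix levels remaining above the data blocks.
 */
static void collapse(struct gaddr parent, struct gaddr child, int depth)
{
    struct radix_page *pp = fetch_page(parent);
    struct radix_page *cp = fetch_page(child);

    for (int i = 0; i < ENTRIES_PER_PAGE; i++) {
        bool pw = writable(pp->entry[i]);
        bool cw = writable(cp->entry[i]);

        if (pw && cw) {
            if (depth > 1)                     /* interior pages: recurse   */
                collapse(pp->entry[i], cp->entry[i], depth - 1);
            free_block(pp->entry[i]);          /* overwritten: reclaim      */
        } else if (pw && !cw) {
            inherit_into_child(pp->entry[i]);  /* still referenced by child */
        }
    }
    free_block(parent);                        /* reclaim the parent's page */
}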


5.1.1.6 Distributed Block Store

The current implementation of Parallax uses a relatively simple distributed block store. The CVD primitives and persistent caching are intended to function above any block-level storage substrate and so apply equally in systems where network attached storage is available. In light of this, the distributed block store is intended for environments where a simple storage substrate is desired over a set of commodity disks.

Storage is configured by building replica groups – collections of physical hosts which will store replicated copies of block data. Each storage VM maintains a list of replica groups available within the cluster, and is able to interact with CVDs within these groups. At the moment, a CVD must exist entirely within a replica group; a full image copy is required to move a disk to a new group. This limitation will be addressed in a future version of the block store.

Block servers within each replica group allow the allocation of virtual extents. A storage server will allocate a set of virtual extents and associate these with a CVD. All writes to the CVD are effectively block appends to the virtual extent. Virtual block addresses in the CVD are mapped, through the radix tree, to a global block address representing the block offsets into a set of virtual extents stored on servers within a replica group. The current implementation uses groups of three servers.

Given the variety of distributed block-level substrates currently available, both commercial and open-source, the focus of Parallax has been on the higher-level management of virtual disks through the design and support of the CVD primitive. As such, the current distributed block store is complete enough to be functional but has not been a focal point within the system. Future development of Parallax will explore a more complete block store and also proper integration with existing network attached storage.

5.1.1.7 Persistent Cache

The final component of Parallax is the per-machine persistent cache, which interposes between CVDs and the distributed block store to provide higher performance for frequently accessed blocks. The cache is quite simple, since all CVD information, except for administrative metadata such as the snapshot log, has at most one active writer. This allows blocks to be modified directly in the local persistent cache with no need for locking or invalidation of remote copies. The cache is of fixed size and maintains a list of each cached block, in LRU order. Read requests are satisfied locally if the block is present in the cache. If the block is not present then it is fetched into the cache and then serviced as for a cache hit.

Modifications are always written first to the cache; writing back to the replicated block store may then be deferred to trade reliability against performance. If a VM synchronously writes its modifications back to the block store, then the replicated image is always up to date if the VM crashes (apart from dirty pages in the guest OS's buffer cache); however, performance will suffer because ultimately the write bandwidth is limited to that of the block store rather than the local disk. If writes are deferred for some time, then modifications can be acknowledged more rapidly to the VM, at the cost of seeing a somewhat time-delayed snapshot if the VM fails.
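A single-entry caricature of the read and write paths is given below; the real cache keeps a fixed-size LRU list of such entries, and store_read()/store_write() here are stubs standing in for the replicated block store rather than actual Parallax calls.

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096

struct cache_entry {
    uint64_t vblock;                 /* virtual block number             */
    bool     present, dirty;
    uint8_t  data[BLOCK_SIZE];
};

/* Stubs standing in for the distributed block store. */
static void store_read(uint64_t vblock, uint8_t *buf)
{ (void)vblock; memset(buf, 0, BLOCK_SIZE); }
static void store_write(uint64_t vblock, const uint8_t *buf)
{ (void)vblock; (void)buf; }

/* Reads are satisfied locally when possible; misses fetch from the store. */
static void cache_read(struct cache_entry *e, uint64_t vblock, uint8_t *out)
{
    if (!e->present || e->vblock != vblock) {
        store_read(vblock, e->data);
        e->vblock  = vblock;
        e->present = true;
        e->dirty   = false;
    }
    memcpy(out, e->data, BLOCK_SIZE);
}

/* Writes always go to the cache first; write-back may be deferred. */
static void cache_write(struct cache_entry *e, uint64_t vblock, const uint8_t *in)
{
    memcpy(e->data, in, BLOCK_SIZE);
    e->vblock  = vblock;
    e->present = true;
    e->dirty   = true;               /* flushed later, or on cvd_sync()  */
}

/* Deferred write-back of a dirty block to the replicated store. */
static void flush_dirty(struct cache_entry *e)
{
    if (e->present && e->dirty) {
        store_write(e->vblock, e->data);
        e->dirty = false;
    }
}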

5.1.2 Implementation and Evaluation

Having presented the design of Parallax, this section details the implementation of the system as a device service composed of device extensions.

5.1.2.1 The Parallax Daemon

The Parallax daemon is a user-level application providing CVD functionality and managing local disks. It is written using the block tap interface and currently totals about seven thousand lines of commented C source. The daemon handles block requests from locally connected VMs, provides a local persistent cache, and dispatches requests to the distributed block store.

5.1.2.2 The Blockstore Daemon

The blockstore daemon manages one or more physical disk volumes exported to CVDs within Parallax. Each physical host contains a block management VM, which has physical access to the exported disks and runs both of the daemons. The blockstore daemon is about fifteen hundred lines of commented C source.


5.1.2.3 Administrative Tools

In addition to the two daemons, a set of command-line tools are available to perform administrative tasks. These tools are small single-purpose utilities, and map directly to the set of administrative tasks described in Table 5.2.

5.1.2.4 The Image Lifecycle

This section considers the series of operations applied to create and manage CVDs throughout their lifetimes. The deployment of a new disk image is presented, as are the relevant performance figures as they arise. The current implementation is fully functional in terms of the CVD features described in Section 5.1.1, but has not been optimized for performance; while the current throughput figures do compare reasonably with network disk access over the open-source Cisco iSCSI initiator, I expect them to improve drastically with further effort.

Composing and Deploying

Starting with a new system image for deployment in the cluster, a new CVD is created. It is then populated using the cvd_fill command.

$ cvd_create "FC3 deploy CVD"
Created image id 532.
$ cvd_fill 532 fc3.ext3.image
filling CVD 532 with fc3.ext3.image
done (4294971392 bytes).

The cvd_fill command takes a local image file and copies it directly into the distributed block store. When it is complete, a snapshot of the new image is taken, it is labeled as well-known, and a set of new CVDs may be created based on it.

$ cvd_snap 532
Snapshot id is 12094639439883 0.
$ cvd_label "FC3 Pristine" 12094639439883 0
Well-known snapshot "FC3 Pristine" created.
$ cvd_dup "FC3 Pristine" "FC3 vol " 500
Created 500 CVDs based on "FC3 Pristine" (533 - 1032).

cvd_dup is a simple shell script that iteratively calls cvd_create to generate a large number of disks based on an existing snapshot. CVD creation simply involves writing new CVD records and copying the existing radix root node: the current implementation can create ten thousand usable CVDs based on a template in under 40 seconds.

Utilization

Virtual machines may now be started based on the newly created CVDs. Current measured disk performance is shown in Figure 5.5. The figures shown represent a comparison of the current Parallax implementation to the open-source Cisco iSCSI initiator4 connected to a Network Appliance F840 filer. As mentioned above, the focus to date has been on the functional aspects of the design presented in Section 5.1.1 and not on raw throughput. The implementation, of course, also supports live migration; currently a cvd_sync() must be explicitly issued prior to the final relocation to ensure correctness.

Deletion

Two command-line utilities are provided to allow the reclamation of storage space. cvd_snap_delete may be used to delete a range of snapshots from an existing disk, while cvd_delete deletes the disk itself. Both deletion tools use the radix tree scanning algorithm described in Section 5.1.1, and are able to reclaim data at a rate of 8MB/s concurrently on each storage brick in the replica group. Snapshot deletions do not remove pages in the snapshot log; they remain for the life of the CVD. Snapshot log pages are, however, a negligible overhead on storage consumption.
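The cheapness of CVD creation noted above comes from the fact that only a new CVD record and a copy of the template's radix root node are written; no data blocks are touched. The sketch below illustrates the shape of that operation; the structures and helper functions are assumptions made for illustration and are not taken from the Parallax sources.

/*
 * Illustrative sketch of creating a CVD from a template snapshot: copy
 * the radix root and write a CVD record.  Shared subtrees are reached
 * read-only and copied on first write.  All names are assumptions.
 */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct cvd_record {
    uint64_t cvd_id;
    uint64_t radix_root;          /* global address of this CVD's root node */
    uint64_t snapshot_log;        /* global address of the snapshot log */
    char     name[64];
};

extern uint64_t blockstore_alloc(void);            /* allocate a metadata block */
extern void blockstore_get(uint64_t gaddr, void *buf, size_t len);
extern void blockstore_put(uint64_t gaddr, const void *buf, size_t len);
extern uint64_t next_cvd_id(void);

uint64_t cvd_create_from_snapshot(uint64_t template_root, const char *name)
{
    char root[4096];
    struct cvd_record rec;

    /* Copy the template's radix root node. */
    uint64_t new_root = blockstore_alloc();
    blockstore_get(template_root, root, sizeof(root));
    blockstore_put(new_root, root, sizeof(root));

    /* Write the CVD record itself. */
    memset(&rec, 0, sizeof(rec));
    rec.cvd_id       = next_cvd_id();
    rec.radix_root   = new_root;
    rec.snapshot_log = blockstore_alloc();
    strncpy(rec.name, name, sizeof(rec.name) - 1);
    blockstore_put(blockstore_alloc(), &rec, sizeof(rec));

    return rec.cvd_id;
}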

5.1.2.5 Implications of Copy-on-Write

It is worth considering the consequences of using copy-on-write-based snapshots for regular OS installations. One concern in this regard is that of scheduled system tasks making daily updates to the file system. In particular, updatedb (the indexing invocation of slocate) is typically run once per night, and builds an index of all files in the file system.

A concern in the use of block-level CoW is that the file access-time (atime) updates caused by a whole-disk scan will result in considerable copy-on-write overhead. Figure 5.6 shows the block write overhead that results from running updatedb on a 1.3GB Fedora Core install above a variety of common Linux file systems.

4http://linux-iscsi.sourceforge.net/


Figure 5.5: Throughput and Compile Performance. (Panels: disk throughput in MB/s for write and read, comparing the iSCSI initiator and Parallax; and elapsed time in seconds for a Linux compile, comparing the local disk, the iSCSI initiator, and Parallax.)

Figure 5.6: Write Cost of updatedb on File Systems. (Data written in MB by updatedb on a 1.3GB FC3 installation, for ext2, ext3, xfs, jfs, and reiserfs.)

The overheads range from seven to about thirty megabytes per day, which should be acceptable in most systems. Alternatively, these overheads may be avoided by mounting the file system with an option preventing access-time updates.

5.1.3 Future Work

The prototype implementation demonstrates the efficacy of the Parallax design and provides a useful working environment. It also suggests several avenues for future work, a few of which are described here.


5.1.3.1 Combining Duplicate Blocks

Parallax provides a block-level interface to distributed storage that offers private virtual disks, with the assumption that most users will use a conventional file system over a CVD to provide a higher-level storage interface. The design has focused on aggressive copy-on-write mechanisms, across snapshot histories and between templates and their derivative images, to share data between VMs that need not even be aware of each other.

This type of implicit sharing can be extended even further to combine additional redundant blocks that arise for reasons other than common CVD ancestry. These reasons may include similar package installations on different CVDs, duplicate files within a file system, or blocks of all zeros used by an application to reserve space in a file. It is clearly worth investigating the extent to which such serendipitously shared data accrues in typical deployment scenarios. The following two methods would provide the ability to detect and exploit these duplicate blocks:

Block Janitor

The first potential extension is a background block janitor process that searches for duplicate blocks stored in multiple locations in the store. Its operation would be similar in spirit to the log cleaner process proposed in log-structured file systems [RO92]. The janitor will find duplicate read-only blocks, link all related radix trees to one of the duplicates, and delete the others.

Such an operation would interfere with the existing delete mechanism; since links to the duplicated block would all have to be read-only to prevent incorrect deletion, the existing delete mechanism would never be able to delete such a block. The situation where blocks are independently created in multiple locations and then all instances are removed is not likely to be common, however, and would likely be outweighed by the storage gains from collapsing duplicates.

Content-based Block Map

An alternative to the janitor is to re-implement the distributed block store as a service mapping content hashes to global block addresses, permitting content-based addressing in place of the direct block lookups presently supported. With this service in place, writes can still be made initially to the local persistent cache, and a content hash can be computed asynchronously between the time the VM write operation completes and the block is flushed to the block store. The content-hash map is then consulted, and a new block is allocated only when the block does not already exist in the block store. Slow operations like content hashing and collision detection are kept out of the critical performance path, and read operations are completely unaffected. As with the janitor, links to duplicate blocks must all be read-only and may introduce block leaks under the current deletion rules.
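The write-back path for such a content-addressed store might take the following shape. The digest width, the map service interface, and the helper names are assumptions for illustration rather than a description of an implementation; collision detection is omitted.

/*
 * Sketch of a content-addressed write-back path: the hash is computed
 * off the critical path, when a dirty block is flushed from the
 * persistent cache.  All interfaces shown here are assumed.
 */
#include <stdbool.h>
#include <stdint.h>

/* Assumed interfaces to the content-hash map service and block store. */
extern void     hash_block(const char *data, uint8_t digest[20]);
extern bool     hashmap_lookup(const uint8_t digest[20], uint64_t *gaddr);
extern void     hashmap_insert(const uint8_t digest[20], uint64_t gaddr);
extern uint64_t blockstore_append(const char *data);          /* allocate and write */
extern void     radix_link_readonly(uint64_t radix_root, uint64_t vblock,
                                    uint64_t gaddr);          /* shared links are read-only */

/* Flush one dirty cache block for virtual block 'vblock' of a CVD. */
void flush_block(uint64_t radix_root, uint64_t vblock, const char *data)
{
    uint8_t digest[20];
    uint64_t gaddr;

    hash_block(data, digest);                  /* asynchronous w.r.t. the VM's write */

    if (!hashmap_lookup(digest, &gaddr)) {     /* new content: allocate a block */
        gaddr = blockstore_append(data);
        hashmap_insert(digest, gaddr);
    }
    /* Either way the radix tree now points at a (possibly shared)
     * block; shared links are read-only so the CVD copies on write. */
    radix_link_readonly(radix_root, vblock, gaddr);
}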

5.1.3.2 Security

While security is a clear concern for storage systems, its discussion has been avoided in the text for fear of unnecessarily complicating the functional description of the design and implementation. The threat model on which the Parallax design is based extends that of most other OS virtualization work: Specifically, it assumes that the underlying virtualization software is protected by a set of narrow and clearly-defined interfaces from the overlying virtual machines. VMs using CVDs have access only to the virtual block addresses of the device, and not to the global block addresses used by the distributed block store. Arbitrary levels of security may be engineered into a cluster where Parallax is deployed through combinations of isolating the storage VMs and block store servers from client VMs using techniques such as network partitioning, traffic filtering, encryption and authentication. For the most part, though, a hardened implementation is left as further work.

A secondary concern in the security of the current implementation is the risk of resource consumption. A malicious VM could potentially consume large amounts of disk space, starving other VMs. Performance starvation has been addressed by previous virtual machine research. VMs can be prevented from consuming arbitrary disk space by placing allocation limits on individual CVDs: all allocations made in a CVD may be counted and, when the limit is reached, old snapshots may be deleted to free space. This has not been implemented in the current work, but should be rather straightforward.
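The allocation-limit scheme suggested above amounts to straightforward per-CVD accounting. The sketch below is illustrative only: as noted, no such mechanism is implemented in the current work, and all names are assumed.

/*
 * Sketch of per-CVD allocation accounting, as suggested above.  Not
 * implemented in Parallax; every identifier here is illustrative.
 */
#include <errno.h>
#include <stdint.h>

struct cvd_quota {
    uint64_t limit_blocks;        /* administrative cap for this CVD */
    uint64_t used_blocks;         /* blocks allocated so far */
};

/* Assumed helper: deletes the oldest snapshot of the CVD and returns
 * the number of blocks reclaimed (0 if nothing is left to delete). */
extern uint64_t delete_oldest_snapshot(uint64_t cvd_id);

/* Called for every block allocation made on behalf of a CVD. */
int cvd_charge_allocation(uint64_t cvd_id, struct cvd_quota *q)
{
    while (q->used_blocks >= q->limit_blocks) {
        uint64_t freed = delete_oldest_snapshot(cvd_id);
        if (freed == 0)
            return -ENOSPC;       /* quota exhausted and nothing reclaimable */
        q->used_blocks -= (freed < q->used_blocks) ? freed : q->used_blocks;
    }
    q->used_blocks++;
    return 0;
}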


5.1.4 Current Status

The Parallax work has attracted considerable interest, and development will continue following the submission of this thesis. I am actively working with the UCSD honeyfarm effort to incorporate the use of Parallax into their system, and will also be working towards a Parallax deployment as a basis for VM hosting on development and test machines in the Computer Laboratory over the coming year.

5.2 Summary

The deployment of VMMs in cluster environments presents many new challenges in the management of storage resources. This chapter has presented the Parallax storage system as a device service which attempts to address these issues. Parallax enlists the use of storage VMs as device extensions on each physical host within a cluster. The Parallax server is a soft device-based extension that runs in this VM and provides cluster virtual disks to client VMs. The storage VM itself provides the benefits of device services presented in Chapter 3; storage is unified into a single administrative slice within the cluster and may be managed independently of other facilities. The Parallax work has been one of the most fruitful aspects of this thesis, as it has evolved into a larger development effort which will continue in the lab over the following year.

While the Parallax work represents the most complex block device extension that has been constructed to date, it is hardly the only one. Earlier local-host extensions included simple copy-on-write and encrypted block devices. Additionally, a user-level version of the block backend driver has been constructed, and was used as the source for the block throughput results described in Chapter 3. The user-level block backend is currently being deployed in production systems to allow access to virtual file systems stored in image files on network filers.

The next chapter discusses a final set of examples of device extensions. It explores opportunities for soft devices to be applied along with modifications to other aspects of virtual hardware in order to build sweeping architectural extensions in development systems.

Chapter 6

Supporting System-Wide Architectural Change

This chapter introduces a final set of examples of the application of device extensions. The work presented here demonstrates that, in combination with other techniques, device extensions may be used to introduce sweeping architectural changes to a system, allowing developers to experiment with new system-wide hardware features. Two specific examples are presented:

1. Device Support for Pervasive Debugging. Pervasive debugging (PDB) involves the use of virtualization to enable debugging both vertically, through the entire software stack including OS and application software, and horizontally, across a set of virtual hosts. PDB effectively adds a distributed, system-wide debugging layer with new virtual hardware features such as physical address-based watchpoints and distributed breakpoints. Device extensions are used to extend debug triggers down onto devices themselves, allowing debug entry based on request addresses and data.

2. Device Support for Taint-based Data Isolation. Network-attached computers are under constant threat of exploitation, and are protected using various forms of firewall and intrusion detection. A major challenge in this form of admission-based protection is in discerning good traffic from bad. Rather than attempting to block specific traffic, taint-based isolation attempts to provide the invariant that data received from the network may never be executed. Device extensions are used to mark tainted data as it moves in and out of the system from network and disk, while additional techniques are employed to track the propagation of tainted data at an instruction granularity.

Both of the applications described in this chapter have resulted from efforts to apply device extensions to other research efforts in the lab. This chapter provides general descriptions of these projects and focuses on the device-level aspects of the work that represent my own contributions. In addition to providing further examples of the techniques presented throughout this thesis, I believe that the collaborative nature of these investigations illustrates the immediate benefit of low-level device extensions, and their applicability to other research projects.

6.1 Device Support for Pervasive Debugging

In recent years, virtualization has been revisited as a useful basis for research in debugging. Two notable new approaches have been the use of virtual machine replay to allow "time-travel" debugging, extending the debugger to step execution backwards as well as forwards in time [KDC05], and the use of virtual machines to debug large systems, entire OS instances and distributed systems that span multiple OSes, by running a set of hosts together on a common physical machine [HH05].

The pervasive debugging (PDB) project at the Computer Laboratory addresses the second of these two issues. It is specifically interested in using VMM-based debugging extensions to broaden the scope of traditional debugging tools. PDB extends scope in two ways: Vertical Debugging allows control flow to be followed across layers of a system, through OS, application run-time, and application code. Conversely, Horizontal Debugging allows a distributed system composed of a set of virtual hosts to be run together above the debugger, and facilitates the understanding of complex error conditions that are often present in such systems.

6.1.1 Hardware Modifications in PDB

Existing debuggers take advantage of hardware support for debugging. The Intel x86 architecture, for instance, provides a set of four debug registers which allow breakpoints and watchpoints to be set on executing code and data. Virtual addresses are installed in these registers, and flags are set to activate them. In the case of an instruction breakpoint, the processor will generate a fault when an attempt is made to execute code at the specified address. In the case of a watchpoint, the processor will trap on attempts to access data at the specified address. In addition to break- and watchpoints, the x86 provides support for single-stepping, in which a debug trap is generated after the execution of each instruction.

The approach taken by PDB can be viewed as adding new hardware support for debugging within a system. This is done to a degree within non-virtualized OSes to cope with the limited number of debug registers. If a developer attempts to set more breakpoints than the number of debug registers supports, an OS will typically provide support in one of two ways: First, it may enable constant single-stepped execution and examine each instruction in an attempt to emulate breakpoints in software. Second, and more efficiently, it may use memory protection in combination with this sort of emulation to mark pages containing break- or watchpoints, forcing traps whenever data on those pages is accessed, at which point the debugger may test accesses. This technique will be revisited, as it forms the basis of the second example in this chapter, taint-based data isolation.

PDB's addition of hardware support extends these techniques by taking advantage of virtualization to add debug features below the OS, effectively in virtual hardware. Two examples of this are modifications to the system to allow break- and watchpoints to be physically addressed, and applied to a distributed system. In the interest of brevity, the discussion that follows refers to both break- and watchpoints simply as breakpoints.

Physical address-based breakpoints allow conventional breakpoints to be applied to physical as well as virtual addresses within a system. This functionality is especially useful in OS development and in debugging applications which use shared-memory communications, where physical memory may become aliased across a collection of virtual addresses. Using shadow page tables [Wal02, Gum83, HR91], a VMM-based debugger may validate newly created page table entries, treating them specially if there is a breakpoint on the underlying physical page.

A second example of a virtual hardware capability added by PDB is that of distributed breakpoints and distributed single-stepping. With these tools, a breakpoint may trigger the suspension of a collection of executing virtual machines, and their progress may be followed in detail as a group. This feature is obviously useful in isolating bugs in distributed systems and in diagnosing system-wide failure conditions.

6.1.2 Soft Device Support for Debugging

Existing application debuggers such as gdb1 are powerful development tools, both for isolating bugs and for understanding how software functions. Beyond the hardware debug facilities just mentioned, debuggers such as gdb provide a very simple interface, allowing six operations on an application: The debugger may read and write values in memory, it may read and write registers, it may step to the next instruction, and it may resume normal execution. Additionally, a set of operations are provided to establish break- and watchpoints.

PDB extends the existing debug interfaces to provide the horizontal and vertical debugging described above. However, debugging is still limited to interactions with CPU and memory. To address this, I have constructed soft device-based extensions to allow breakpoints on interactions with I/O devices. For instance, the debugger may be triggered whenever a new packet arrives from the network. While this is possible without the aid of virtualization, for instance by placing a breakpoint at the top of an OS's network interrupt handler, it requires sufficient understanding of the OS code to identify where this handler is. By adding device debugging support to a VMM-based debugger, I/O paths may be debugged in an OS-agnostic fashion. Moreover, more complicated debugger activations, for instance filtering for a specific type of packet, may be implemented once in the isolated extension and applied to all OSes being debugged.

Unlike conventional watchpoints, which are specified according to memory addresses in the debug target, specifying device-based watchpoints requires a rather broader interface and is specific to a class of device. I have implemented device watchpoints as soft devices for both network and disk.

Network watchpoints are implemented using a modified version of snort-inline2, a popular intrusion detection program.

1gdb is available at http://www.gnu.org/software/gdb


Snort includes a packet specification language that allows events to be triggered on matching packets. The implementation allows the debugger to push new packet-matching rules down to the network soft device, and will trigger debug entry on their arrival.

Disk watchpoints use a slightly simpler interface, allowing the specification of blocks based either on virtual block address ranges, or on block contents. Block content matches are specified using regular expressions; when activated, all blocks passing to and from a VM are scanned and an event is generated on a match.

In both cases, the debug event is sent to the debugger and the device request is queued. This allows the debugger to do any additional interrogation or modification of the request message and VM before the message is delivered. Common actions are to switch the VM into single-step mode or modify page permissions on the request page, and then allow the debug device to deliver the request.
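To give a concrete picture of the disk watchpoint path, the following sketch shows a block filter of the kind described, using POSIX regular expressions. It combines the address-range and content checks in one routine for brevity; the helper names are assumptions made for illustration and do not correspond to the PDB sources.

/*
 * Illustrative sketch of a block-content watchpoint in a disk soft
 * device: blocks moving between the VM and the device are scanned
 * against a compiled pattern, and a matching request is queued while
 * the debugger is notified.  All names here are assumed.
 */
#include <regex.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096

struct disk_watchpoint {
    regex_t  content_re;          /* compiled content pattern */
    uint64_t start, end;          /* virtual block address range */
};

extern void debugger_notify(uint64_t vblock);     /* assumed debug-event channel */
extern void queue_request(void *req);             /* hold request until released */
extern void forward_request(void *req);           /* normal soft-device path */

void filter_block(struct disk_watchpoint *wp, void *req,
                  uint64_t vblock, const char *data)
{
    char buf[BLOCK_SIZE + 1];

    if (vblock >= wp->start && vblock < wp->end) {
        /* NUL-terminate so regexec() can scan the block payload; a
         * binary-safe matcher would need more care than this sketch. */
        memcpy(buf, data, BLOCK_SIZE);
        buf[BLOCK_SIZE] = '\0';
        if (regexec(&wp->content_re, buf, 0, NULL, 0) == 0) {
            queue_request(req);                   /* delay delivery ... */
            debugger_notify(vblock);              /* ... until the debugger releases it */
            return;
        }
    }
    forward_request(req);
}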

6.1.3 Understanding Intrusions

By using the IDS rules that are included with snort, the network debugging extension may be used to trigger debugger entry on the arrival of packets containing system exploits. The resulting tool allows the debugger to be used to trace arbitrary exploits from the moment that they arrive at the OS interrupt handler, through to the point at which the system is compromised, and forms a simple but useful illustration of this approach.

In addition to this, the device extensions for debugging may be used to perform fault injection and event logging. In general, device extensions for debugging provide a flexible tool for tackling complex debugging tasks. Beyond the simple examples here, device extensions may be applied in a cluster environment to trigger debugger entry on access to a file used in a shared file system, or on transmission of particular network packets.

The example in the next section extends the techniques developed in the work on device support for debugging to introduce a new system feature that spans disk, network, memory, and CPU behaviour.

2snort-inline is available at http://snort-inline.sourceforge.net/


Figure 6.1: Overview of Taint Tracking. (1) Inbound packet arrives: the memory containing the packet is marked as tainted and the packet is delivered to the protected VM. (2) Protected VM accesses tainted data: accessing tainted data switches the VM into emulation mode in the control VM. (3) Emulator tracks tainted data: tainting propagates across data movement instructions through both registers and memory. (4) Data is stored to or read from disk: tainted data is tracked even when written to persistent storage. (5) VM attempts to execute tainted data: such attempts are trapped and handled by control software. (6) Normal execution resumes: once the processor state is free of tainted data, emulation stops and the VM resumes normally.

6.2 Taint-based Data Isolation

Most existing approaches to protecting computers from malicious software involve the use of firewalls and intrusion detection systems (IDSs) to distinguish good traffic from malicious traffic. This is a complicated task, and is increasingly prone to outbreaks in which an exploit propagates faster than the signature or firewall rule required to protect against it. As an alternative, we chose to investigate a modification to the system in which network data would be prevented from being executed on a host. By mandating that data from the network be immutably non-executable, a system is protected against the large class of exploits in which a host is tricked into running malicious software.

6.2.1 Overview of Taint-tracking

Figure 6.1 illustrates how tainted data is managed as it moves through the system. In (1), a packet arrives at the host from the physical network interface. The packet is processed in the control VM, which is hosting the network device, and the pages into which the data has been received are marked as tainted as they are transferred into the protected VM's pseudo-physical address space.

Shortly afterwards, in (2), the OS in the protected VM handles the interrupt indicating that a new packet has been delivered. As it processes the packet headers, the CPU attempts to load the tainted memory pages. This load results in a fault that causes the VM to be switched into emulation. The current CPU state is transferred into a hardware emulator running in user space of the control VM, where execution continues. The emulator has direct-mapped access to the VM's memory, and so continues to execute the machine in place.

As this execution progresses, as shown in (3), emulated instructions operate on tainted data. The emulated processor microcode is modified to track tainted data across memory and register stores. Additionally, stores of untainted data result in the cleaning of tainted memory. In this manner, well-behaved OSes will clean tainted memory as they zero freed pages.

(4) shows that tainted data is also tracked across storage to disk. The virtual disk is modified to store an on-disk data structure mapping the location of tainted data; as data is reread, the memory it is placed in is marked appropriately.

If a VM attempts to execute a piece of tainted memory, as in (5), either by loading the address of tainted pages into EIP or by advancing execution into a tainted region of memory, an instruction exception is thrown. This exception can use an existing processor fault, such as invalid opcode, or we may extend the processor to add a tainted-execution-specific exception. In an effort to avoid modifying the hardware interface, we opt for the former of these two options in our work.

Finally, after processing a region of tainted data, the processor finds that its registers have become clean. At this point, it continues executing until an appropriate opportunity arises, and then transfers the processor state back to virtualized execution.
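The propagation rule applied by the modified emulator microcode can be pictured as follows. The sketch assumes a simple taint flag per register and a taint bitmap over memory; the per-word granularity and the names are illustrative assumptions, not the actual emulator code.

/*
 * Sketch of taint propagation across data-movement instructions inside
 * the emulator, as in step (3): loads taint the destination register
 * when the source memory is tainted, and stores taint or clean memory
 * according to the source register.  All names are assumed.
 */
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS 8

extern bool mem_taint_get(uint64_t paddr);       /* assumed taint bitmap over memory */
extern void mem_taint_set(uint64_t paddr, bool t);

static bool reg_taint[NUM_REGS];

/* mov reg <- [paddr] */
void emul_load(int reg, uint64_t paddr)
{
    reg_taint[reg] = mem_taint_get(paddr);
}

/* mov [paddr] <- reg: storing clean data also cleans memory, which is
 * how a well-behaved OS scrubs taint when it zeroes freed pages. */
void emul_store(int reg, uint64_t paddr)
{
    mem_taint_set(paddr, reg_taint[reg]);
}

/* The emulator can hand execution back to the VMM once no register
 * carries taint (step 6). */
bool cpu_is_clean(void)
{
    for (int r = 0; r < NUM_REGS; r++)
        if (reg_taint[r])
            return false;
    return true;
}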


6.2.2 Device Extensions

Tainted data is tracked across network and block device interfaces by implementing taint-tracking soft devices. This approach is particularly useful in this situation, as I/O-related data may be marked and tracked outside the scope of the protected virtual machine.

6.2.2.1 Tainting Network Data

As described in Chapter 3, Xen's current network interface delivers packets to individual VMs using page remapping. Inbound packets are received into a system-wide pool of free pages, which have been “donated” by VMs that use the network. A packet is written into an empty page, routed to the destination VM, and delivered to that VM by mapping the page into its physical address space, in place of a previously donated page.

The taint-tracking device extension marks all received data on network devices as tainted. Page table entries for pages containing tainted data are then marked as not-present, resulting in a trap when a VM attempts to access them, after which the VM's execution may be analyzed in emulation until the CPU registers are free of taint.

Processing packet headers results in considerable time spent in emulation, and involves code that is generally well tested and trusted to be safe. As a performance optimization, we allow the virtual network interface to be configured to deliver untainted headers. In addition to mapping the packet's page to the receiving VM, we send a copy of the headers in shared memory, which the protected VM may process without incurring taint faults.
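A minimal sketch of the receive path in such a taint-tracking network soft device is given below; the helper names, and the optional clean header copy, are assumptions made for illustration rather than the actual implementation.

/*
 * Sketch of packet delivery with taint marking: the page carrying an
 * inbound packet is marked tainted and mapped non-present, so the
 * protected VM's first access traps into emulation.  Names assumed.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

extern void mark_page_tainted(uint64_t pfn);                 /* per-page taint bitmap */
extern void map_page_not_present(uint64_t pfn, void *vm);    /* install PTE with P bit clear */
extern void deliver_headers_untainted(void *vm, const void *hdr, size_t len);

void deliver_packet(void *vm, uint64_t pkt_pfn, const void *pkt,
                    size_t hdr_len, bool copy_headers)
{
    mark_page_tainted(pkt_pfn);            /* all network data is tainted on arrival */
    map_page_not_present(pkt_pfn, vm);     /* first access will fault into emulation */

    if (copy_headers)
        /* Optimization from the text: give the VM a clean copy of the
         * protocol headers so header processing avoids taint faults. */
        deliver_headers_untainted(vm, pkt, hdr_len);
}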

6.2.2.2 Tainting Storage

The virtual storage interface in Xen is structured similarly to that of the network, but does not involve page exchanging, as the target memory for reads and writes always belongs to the protected VM and is known at the time of request. The virtual disk device, running in the control VM, maintains a persistent data structure, a sparse tree, identifying all disk blocks that have been marked as tainted. On writes, the taint properties of memory are preserved on disk. An additional benefit to this data structure is that it is not part of the virtual disk, and so cannot be addressed by the protected VM.


On reads from disk, the target pages are marked as read-only for the protected VM while the request is serviced, memory is remarked as tainted wherever necessary, and then the data is made available.

The current implementation of taint tracking is not directly integrated with Parallax. This integration will be pursued as future work, as the mapping structures used by Parallax provide an ideal location to store taint annotations.
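The interaction between the virtual disk and the persistent taint map might look roughly as follows; the hook names and the shape of the map queries are assumptions for illustration, not the implementation.

/*
 * Sketch of the persistent taint map kept by the virtual disk: a sparse
 * structure, outside the virtual disk itself, recording which block
 * numbers hold tainted data.  All interfaces here are assumed.
 */
#include <stdbool.h>
#include <stdint.h>

extern bool taintmap_lookup(uint64_t block);          /* sparse-tree queries (assumed) */
extern void taintmap_set(uint64_t block, bool tainted);

extern bool page_is_tainted(uint64_t pfn);
extern void mark_page_tainted(uint64_t pfn);

/* Write path: record the taint status of the memory being written. */
void disk_write_hook(uint64_t block, uint64_t src_pfn)
{
    taintmap_set(block, page_is_tainted(src_pfn));
}

/* Read path: after the data lands in the (read-only mapped) target
 * page, re-mark memory as tainted wherever the on-disk map says so. */
void disk_read_hook(uint64_t block, uint64_t dst_pfn)
{
    if (taintmap_lookup(block))
        mark_page_tainted(dst_pfn);
}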

6.2.2.3 Selective Tainting

In some situations, it may be desirable to strike a further compromise between the performance overhead of taint tracking and the thoroughness of the system. As an extension of the header exemptions described for the taint-tracking virtual network interface above, snort may be used to monitor all traffic that will be delivered to the protected VM, as described in regard to the device extensions for debugging.

This approach allows an interesting extension to the functionality of the IDS, in which it may divide inbound traffic into three classes: Attack traffic, for which IDS rules already exist, may simply be dropped, as it normally would be by such an application. A white list may then identify traffic that need not be tainted, for instance from trusted hosts or to particular trusted services. Finally, remaining traffic may be treated as untrusted and passed into the system with taint-based protection.

One benefit of this technique is that it allows administrators to focus on the set of traffic that they do trust, rather than that which they do not, while not requiring that unanticipated traffic be dropped; it is simply treated with an enhanced degree of precaution.
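The three-way classification described above amounts to a small decision procedure in the network soft device. The sketch below is illustrative only, with the matcher interfaces standing in for snort's actual rule evaluation; all names are assumed.

/*
 * Sketch of selective tainting: known attack traffic is dropped,
 * white-listed traffic is delivered clean, and everything else is
 * delivered with taint protection.
 */
#include <stdbool.h>
#include <stddef.h>

enum verdict { DROP, DELIVER_CLEAN, DELIVER_TAINTED };

extern bool matches_ids_rules(const void *pkt, size_t len);   /* known-bad signatures */
extern bool matches_whitelist(const void *pkt, size_t len);   /* trusted hosts/services */

enum verdict classify_packet(const void *pkt, size_t len)
{
    if (matches_ids_rules(pkt, len))
        return DROP;                /* attack traffic: drop as an IDS would */
    if (matches_whitelist(pkt, len))
        return DELIVER_CLEAN;       /* trusted: skip tainting */
    return DELIVER_TAINTED;         /* unanticipated: allowed, but with precaution */
}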

6.3 Summary

This chapter has described an additional set of examples of device extensions. In these examples, device extensions are coupled with modifications and extensions to other aspects of the virtualized system, including changes to memory handling through shadow page tables, and changes to CPU behaviour through dynamic switching into emulated execution.


Using these techniques, it is possible to experiment with sweeping new hardware features. In one example, a VMM-based debugger is extended to allow debugger entry to be triggered by certain device accesses. In another, device extensions are used to track tainted data received from the network, and the system is then modified to guard against its eventual execution.

Both of the examples here demonstrate not only the usefulness of device extensions in a VMM environment, but also the potential for their application to broader systems features. Both of the projects described in this chapter are recent and ongoing work within the lab.

Chapter 7

Related Work

Prior to concluding, this chapter places the work presented in this thesis in the context of relevant related research. The work just presented covers a broad range of systems topics, and the related work is consequently very broad. Prior work on network denial-of-service prevention, as it relates to the symmetry-based traffic limiter, was described in Chapter 4. Similarly, relevant work on distributed storage pertaining to Parallax was included in the presentation in Chapter 5.

This chapter aims to survey the broader research areas into which split devices, device extensions, and device services directly apply. It begins with a summary of the history of virtual machine monitors and then moves on to survey the treatment of both access to device hardware and software extensibility within operating systems. It ends by mentioning a sample of device extensions that have been developed by other researchers, and which could be applied using the techniques presented in the preceding chapters.

With regard to the thesis of this work, a key insight that this chapter attempts to expose is the design decision in computer systems development that underpins the desire to provide extensibility in the first place. As with many other systems features, the ability to cleanly extend any system involves the introduction of a layer of indirection that necessarily carries with it a performance overhead. The evolution of operating systems for personal computers has been punctuated by the gradual adoption of techniques, previously used on mainframes, which improved the dependability and manageability of the system at some cost in performance. Multiple protection domains, multi-processing, shared libraries, and most recently hardware virtualization are all examples of this evolution. The survey of related work presented in this chapter aims to demonstrate that, where the performance overhead of device extensions is acceptable, the approaches demonstrated in this thesis represent a powerful technique for developing and deploying device-level extensions in modern systems.

7.1 Virtual Machines

A virtual machine monitor (VMM), or hypervisor, is a narrow layer of software that presents a set of isolated execution environments that closely match the underlying physical computer. Each of these environments is called a virtual machine (VM), and may contain an operating system and associated set of applications.

Virtual machines have existed since the 1960s, primarily as an approach to achieving improved utilization on expensive mainframes. Goldberg's 1974 article in IEEE Computer, "Survey of Virtual Machine Research" [Gol74], cites over 70 papers relating to VM research and development at that time. The paper also identifies and discusses many problems relating to virtualization, such as issues in dealing with nonvirtualizable hardware, and techniques to reduce virtualization overhead.

The paper also discusses many applications of virtual machines that hold true today. It discusses the use of VMs to host legacy applications that are “locked” to a specific operating system, to facilitate OS development and testing, and to test and debug networks and distributed systems by virtualizing them on a single host. IBM's VM/370 [SM79] was a VMM used on IBM mainframes to achieve exactly these goals.

After a long period of relative inactivity, virtualization returned to systems research in the 1990s with the Stanford work on Disco [BDGR97b, BDGR97a], and later Cellular Disco [GTHR99]. Disco used virtualization to run commodity operating systems on non-commodity large-scale, shared-memory multiprocessors: the work targeted NUMA machines and was initially based on hardware simulation. They observed that adding support to existing OSes would be prohibitively complex, and that virtualization presented a viable approach to making use of the resources on such systems. Disco showed that virtual machines were a useful and effective abstraction for managing resources and providing hardware fault isolation on SMP hosts.

In recent years, the performance of commodity x86-based hardware has improved sufficiently that virtualization has become practical. Several commercial packages have become available both for desktop (e.g. VMware Workstation, Connectix Virtual PC), and server (e.g. VMware GSX/ESX Server, Virtual Iron) environments. Workstation products have taken the approach of providing a hosted VMM, which runs as an application above the native OS. Typically this does not require modifications to the existing host, but consequently results in high performance overhead. VMware's server products run on the “bare metal”, and are able to somewhat reduce performance overheads. In all cases, the commercial virtualization offerings are driven by the desire to host unmodified operating system binaries, due to the need to transparently support existing software.

Unfortunately, the x86 platform does not provide direct support for virtualization. Virtualizing the x86 hardware requires additional effort from a VMM, as discussed in Section 2.2. To safely virtualize the x86 for unmodified binaries, a VMM must perform run-time transformations of the executing code, rewriting difficult-to-virtualize operations and replacing them with appropriate calls. An alternative approach is to sacrifice support for unmodified OS binaries, and make modifications to OS code that ease the interactions between the OS and the VMM. Dubbed paravirtualization [WSG02], the approach of adapting the VMM-OS interface has been used throughout the history of virtualization to improve performance, and accommodate difficult-to-virtualize architectures [You73, Gol74].

The influx of research into virtualization on the x86 platform has produced considerable results over the past few years. Two significant research efforts into building paravirtualizing VMMs have been undertaken: Denali [WSG02] initially aimed to host thousands of virtual machines on a single physical host as a platform for hosting Internet services. Xen [BDF+03] was based on a goal of providing strong, resource-accounted partitioning within the host and aiming to accountably divide a system between a smaller number (i.e. 100) of VMs. These two projects, in conjunction with existing virtualization and emulation packages that have become available for the x86 architecture (e.g. User-mode Linux [Dik00], Qemu [Bel05]), have led to a resurgence of investigations into the applications of virtual machines.


In aggregate, this research has demonstrated that not only are virtual machines a useful tool for resource isolation and management, but that they may be beneficially applied to tasks such as intrusion detection and forensics [DKC+02, JX04, VMC+05], configuration management and debugging [WCG04], software debugging [KDC05, HH05], and OS migration [HJ04, CFH+05]. Moreover, virtual machines have formed the basis for many proposals to build distributed computing platforms [HHKP03, AR02, GC04].

Interestingly, the administrative benefits that drove interest in virtualization in the 60s and 70s have been partly responsible for the recent revitalization in the research area. VMMs have become attractive as a potential solution to the management problems faced in large computing facilities such as clusters and data centers [KUS+04, JX03]. The narrow layer of control software provided by a VMM allows the separation of the traditional system administrator role into two parts: a facilities administrator, who manages physical machines, performing repairs and service to hardware, and a software administrator, who manages the OS and applications installed within an individual virtual machine. Many commercial ventures (e.g. VMware, XenSource, Virtual Iron) currently provide software and support for the use of VMs to manage large system installations.

7.2 Device Access in Operating Systems

The structure of device driver code within operating systems has been a long-standing area of investigation in OS construction. While efficient drivers are critical to the I/O performance of a host, drivers are a frequent source of bugs and consequently of system instability. Presumably due to performance concerns, most current commodity OSes include driver code in the same protection domain and address space as the kernel itself, and fall victim to severe crashes in the face of driver failure.

Drivers have long been held to be a common source of bugs in OS code. Static analysis of the Linux 2.4.1 kernel for a set of relatively simple programming bugs, such as calling a blocking function with interrupts disabled, found that 85% of bugs occurred within device driver code [CYC+01]. Many approaches have been taken to isolate driver code in order to mitigate the severity of crashes. Interestingly though, driver dependability has only recently emerged as a motivation for user-level drivers.

While microkernels such as Mach [BRS+85] typically include device drivers within the kernel itself, several projects have explored extracting drivers into their own protected processes. Mach 3.0 introduced "Device Independent Drivers", in an attempt to eliminate repeated code common to most drivers. They specified low-level device interfaces for common device classes (e.g. SCSI, Ethernet) and provided a minimal in-kernel driver to map a particular device to this interface. Generic driver code then ran as a Mach process, interacting with this interface over IPC [FGB91]. This work was extended by Golub et al. to allow multiple drivers to share multiplexed access to a given physical device through the support of a resource manager in the kernel [GSI93].

Several other microkernel systems have explored user-level device drivers. The Raven system provided a minimal microkernel for hosting tasks on a multiprocessor. The kernel provided simple interrupt dispatching, allowing hardware drivers to run completely in user space [RN93]. The motivation behind Raven's structure was that by delivering hardware interrupts directly to drivers, the overhead of both context switches and data movement across the kernel-user boundary could be avoided. Similarly, QNX is a microkernel-based system, largely targeting embedded devices, that allows user-level driver tasks to register for hardware interrupts and claims improved performance as a result [Hil92]. U-Net [vEBBV95] presented a narrow, virtualized ATM interface which could be driven by applications in SunOS and was also motivated by the performance benefits of managing hardware directly from the context of the client application.

Like Xen, Nemesis [LMB+96] was concerned with both isolation and accounting of applications. By removing drivers from the kernel and providing a narrow dispatch layer, drivers could be executed in user space. Pratt's "User-Safe Device Architecture" claims that drivers may be made both safer and accountable using this approach. Arsenic [PF01] demonstrated an application of these techniques to a programmable network interface and served as groundwork for the techniques used to manage devices in Xen.


Almost all of these approaches have been specifically concerned with moving driver code into user space in developmental, often microkernel-based systems. More recently, several projects have applied this technique to isolate “unmodified” device drivers. In particular, the Fluke kernel incorporated drivers from several modern OSes which were wrapped with glue interfaces from the OSKit [FBB+97]. This represents initial work motivated by the pragmatic concerns of system dependability and software maintenance: using existing unmodified drivers was seen as a necessary approach to keep up with the rapidly evolving hardware market. The Exokernel also incorporated some OSKit drivers, while other work incorporated Linux drivers above the L4 microkernel [HHLS97].

Over the past few years, these approaches have also been applied to commodity kernels with the specific intention of improving system reliability. Rather than running drivers directly as user-space applications, unmodified drivers are isolated in a hardware protection domain, but run as if they were still in-kernel. Nooks [SBL03] uses such an approach within Linux to isolate existing drivers. This is achieved by instrumenting the rather ambiguous Linux driver interface to force a context switch, and by carefully accounting resource usage by drivers so that memory may be returned to the system if a driver crashes and is restarted. Both L4 [J. 04, LUSG04] and Xen [FHN+04] allow unmodified drivers to run in isolated virtual machines, avoiding the complexities of finding a cut-point through complex in-kernel driver interfaces. The approaches taken by both of these efforts are fairly similar, and to my knowledge the work was carried out simultaneously by both groups, independent of one another.

In addition to supporting legacy drivers in isolated execution, several recent efforts are reexamining the benefits of allowing driver development and execution in user space, both on academic microkernels [EG04] and on Linux [Chu04]. These follow similar motivation to that of earlier work on driver proxies for Windows NT, which observed that driver writing was a difficult and error-prone process and that, by redirecting NT I/O request packets (IRPs) to user space, driver construction could be facilitated [Hun97].

In addition to this work, there have been industrial efforts to set a common interface to device drivers that may be applied across OSes. The Uniform Driver Interface (UDI) [UDI99] was developed by a consortium of both hardware and OS vendors in an effort to enable the development of stable and portable device drivers. A common driver interface is a laudable effort, and one which could easily be embraced by the work in this thesis. Unfortunately, the project appears to have stagnated, as there has been no new information posted to the UDI web site1 since 2001. The consortium did produce a set of specifications which appeared to suffer from “interface unioning” across the stakeholders: the idealized interfaces used in Xen have taken the opposite approach, opting for simple, narrow interfaces to shared devices. The position taken by this thesis is that in order for devices to be shared in a VMM there must be some interface between device and client VMs, and that it is in the interest of developers to keep this interface relatively stable over time. Given that position, the techniques described in this work may be applied to whatever low-level interface prevails.

The grant table-based approach to sharing memory among VMs, which is described in Chapter 3, draws on a wealth of cross-domain memory sharing work in the existing literature, most notably mechanisms such as those described in Fbufs [DP93] and IOlite [PDZ00]. The grant table approach extends that taken to cross-domain memory sharing in the L4 microkernel, and uses very similar terminology. Somewhat confusingly, an L4 grant is equivalent to a page transfer in Xen, while L4's map operation corresponds to a page grant in Xen. The L4 and Xen techniques are similar in that they place the core components of memory sharing within the most privileged part of the system. However, their use is rather different: L4's operations are tightly coupled with external pagers and memory sharing is exposed at a high level within the system, for instance to construct file systems. In its current form, memory sharing in Xen is used almost exclusively for device-level communications. Broadening the scope of shared memory use within the VMM must be carefully considered, as it represents a risk of increasing the amount of cross-domain shared state, which in turn may compromise the benefits of isolation.

The extension work in this thesis follows the motivation of isolating extensions for safety and ease of development set out by previous work. By working at the driver interface presented to VMs, extensions as described in this thesis achieve isolation while remaining applicable across systems, and give the extension developer complete freedom with respect to the environment (OS, language, tools) that they work in.

1http://www.projectudi.org/

7.3 Extensibility in System Software

Extensibility, the insertion of extension code into a running operating system, is a well-explored area. Vino [SESS96] and SPIN [BSP+95] both used language extensions and trusted compilers to generate and sign extension code, and incorporated safe run-times within the OS to attempt to contain misbehaving extensions. Exokernel [EFO95] and Nemesis [LMB+96] provided minimal near-physical interfaces in an effort to allow applications the freedom to make low-level changes that extensions are used for in other systems. The structure of these systems is similar to virtual machine monitors in that a small kernel provides physical resource sharing and isolation between coarse-grained protection domains.

Interposition on OS interfaces is also a long-standing approach to adding functionality to systems. The extensibility-focused OSes above provided OS support for extensions on OS-internal interfaces. SLIC [GPRA98] allows calls to kernel interfaces such as system calls and signals to be intercepted and redirected to extensions running in either kernel or user mode. Their approach used modifications to kernel binaries to allow extensions to be added without modification to a kernel's source. Similarly, several kernel instrumentation packages such as DTrace [CSL04] and KernInst [TM99] use binary patching to redirect OS execution at a function granularity. While these approaches allow arbitrary OS extension, assuming access to the OS symbol table, they provide little or no safety: extensions effectively have free rein to access system resources and extension crashes can result in complete system failure.

In addition to the mechanistic issues involved in actually applying extensions to an OS kernel, there are practical concerns with the portability and maintenance of extensions. Most commodity OSes now provide some form of support for inserting extension code into the OS kernel. However, extensions are generally specific to the OS version that they were written for. Unlike the application binary interface (ABI), which is held static to ensure that applications may run across versions of an OS, extensions are tightly coupled with internal OS interfaces, which are highly specific to an individual OS, and which change over time. [FGCW05] discusses the difficulties of maintaining extensions that interact directly with OS source over time.

7.4 Device Virtualization and Extension

The SPIN work was later modified to investigate the installation of “device extensions” onto so-called smart hardware: I/O devices with embedded processors capable of running software within the device. In this work, dubbed SPINE [FMBC98], the extensible aspects of the SPIN kernel were moved to Windows NT 4.0 and support was written for a Myrinet programmable network card. Extensions, written in MODULA-3, could be installed on the NIC, allowing certain I/O operations to be carried out independently of the CPU. SPINE's extension model aimed to provide performance improvements for devices: the authors present a video decoder that DMAs incoming frames directly to the frame buffer, and an IP router that DMAs packets across a pair of network interfaces.

Throughout their history, virtual machine monitors have explored two main approaches to handling I/O devices, centered around the design decision of whether to preserve the device's hardware interface or to present an alternate virtualized interface. Preserving the interface of existing hardware, as described with regard to VMware Workstation in [SVL01], has the single clear benefit that VM-based OSes may use existing drivers. In this approach, the VMM captures device requests and passes them to some form of device emulator which translates them and issues them to the real device. While [SVL01] presents several possible performance improvements, the approach is bound to involve a degree of inefficiency, as a logical request (e.g. sending a packet) involves the emulation of a large number of instructions, each of which requires a transition out of the running VM. Even so, the benefit of supporting unmodified OS binaries may often be worth this overhead, especially when modifying the OS is not possible.

An alternate approach is to explicitly modify device interfaces to support virtualization. This technique, which has been used frequently [BDGR97b, KEG+97, WSG02, BDF+03], presents an idealized device interface to the VM which eschews the complexity and overhead of emulation.

Work on Virtual Channel Processors (VCPs) [MN03] has argued for the use of virtual machines to take better advantage of hardware parallelism for I/O-related computation. Referring to the performance benefits that have been achieved by other systems in offloading tasks such as protocol processing onto a separate CPU [MS98, MS00, RBB+02], the VCP work proposes placing I/O subsystems in isolated VMs and scheduling them on dedicated CPUs. The basis of their argument is that the economies of scale that exist for general-purpose processors allow such CPUs to increase in performance at a higher rate than processors on embedded devices. Similar to Disco, this work argues for the use of VMs to take advantage of multiprocessors, but at a sub-OS granularity.

µDenali's proposal for “interposable virtual hardware” [WCSG04] is likely the work most similar to that described in this thesis. Both approaches argue for the use of interposition on VM device interfaces to implement extensions. The µDenali work takes a high-level view of virtual hardware, and presents an initial argument for the validity of the approach while focusing on the need for a standardized interface to VMM operations. As described in Chapter 2, Denali is a considerably less mature VMM, allowing the multiplexing of single-threaded and single-address-space “operating systems” that have been linked against Ilwaco, a BSD-derived kernel library. Their work adopts IPC mechanisms similar to those in Mach [BRS+85] to transport device requests, and provides mechanisms for VMs to interpose on those transports. They argue that extensions should be provided with a standard API extending beyond device interfaces to include issues such as VM execution control (suspend/resume), access to VMM state, and to the internals of VM state.

The work presented in this thesis extends the ideas introduced in [WCSG04] by considering them in a mature VMM and applying them to real-world OSes. Soft devices achieve considerably better performance, with results that make extensions practical to use in production environments. This thesis additionally focuses on the development and management of practical extensions, with particular interest in cluster environments.


7.5 Smart Devices

Applications of software extensions behind device interfaces are plentiful. Notable examples include block extensions for secure [GNA+97], versioning [WCG04], distributed [LT96, SFV+04], and intrusion-detecting [PSG+03] disks. Network devices have been extended to provide packet filtering, rate limiting and admission control [EK96, PF01], protocol scrubbing [MWJH00], and intrusion detection [WCSG04].

Extensions are frequently unportable, as they are developed either for a specific OS or behind a network protocol which may not be supported by all OSes. Existing work demonstrates a wealth of exciting ideas for device extensions, only a sample of which are described here. VMM-based device interfaces and the approaches described in this thesis allow considerably more structure and support for the development of these extensions, and facilitate practical deployment on commodity systems.

Chapter 8

Conclusion

In concluding the presentation of this thesis, this chapter discusses opportunities for further investigation, and summarizes the overall work.

8.1 Future Work

Several aspects of this thesis remain open to additional investigation. This section summarizes a number of opportunities for future work based on soft devices.

8.1.1 Extending the Existing Extensions

Three of the extensions described in this thesis have led to additional research interest in the lab, and investigation into them is continuing beyond the course of this thesis.

The Parallax storage service presented in Chapter 5 has received considerable interest, and we are hoping to work toward a complete prototype which will be used to manage the disks of the test machines used for Xen development. It is my hope that further development will lead to deployments in production environments.

As mentioned earlier, the symmetry-based rate limiting work discussed in Chapter 4 is an aspect of a larger investigation into symmetry-based limiting. With Christian Kreibich, Jon Crowcroft, Steven Hand, and Ian Pratt, I intend to work toward deploying a stand-alone symmetry-based limiter in a college network. We hope to explore the impact of such a limiter on real traffic, in order to further validate the approach.
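As a rough illustration of the policy such a limiter enforces, the fragment below sketches the per-flow accounting at its core: count packets sent and received, and throttle transmission once the transmit-to-receive ratio exceeds a bound. The names and thresholds are hypothetical placeholders rather than the implementation evaluated in Chapter 4.

/* Illustrative core of a symmetry-based rate limiter: track the packets
 * a VM sends and receives on each flow, and throttle transmission once
 * the ratio of transmitted to received packets exceeds a configured
 * bound.  All names and constants are hypothetical placeholders. */
#include <stdbool.h>
#include <stdint.h>

#define SYMMETRY_RATIO      4   /* allow up to 4 packets out per packet in */
#define MIN_TX_BEFORE_LIMIT 64  /* ignore very short flows                 */

struct flow_state {
    uint64_t tx_pkts;           /* packets transmitted by the VM           */
    uint64_t rx_pkts;           /* packets received from the remote host   */
};

/* Called for each packet the VM attempts to transmit; returns true if
 * the packet should be dropped or queued to enforce symmetry. */
static bool symmetry_throttle(struct flow_state *f)
{
    f->tx_pkts++;
    if (f->tx_pkts < MIN_TX_BEFORE_LIMIT)
        return false;
    /* The +1 avoids dividing by zero on flows with no return traffic. */
    return f->tx_pkts > SYMMETRY_RATIO * (f->rx_pkts + 1);
}

/* Called for each packet received on the flow. */
static void symmetry_note_rx(struct flow_state *f)
{
    f->rx_pkts++;
}

A deployed limiter would also need to age these counters and choose a response gentler than dropping outright; such policy questions form part of the investigation described above.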


The taint-based memory protection presented in Chapter 6 is ongoing work. We hope to work with the authors of the Vigilante [CCC+05] system to combine our techniques for the protection of commodity systems. In addition, the dynamic emulation described briefly in that chapter is a promising approach for a variety of interesting system extensions, and work on it will certainly continue.

8.1.2 Other Device Interfaces

In addition to building further extensions on the current interfaces, it will be worthwhile to apply these techniques to other device types. As mentioned in Chapter 3, work is currently underway in the Computer Lab to build a file-system-level split device driver. It may also be interesting to explore interposition-based extensions for video or audio devices.

8.2 Summary

This thesis has argued that an architecture based on virtual machine monitors may be used to build device extensions that are safe, flexible, and achieve reasonable performance.

Chapters 1 and 2 framed the problem and provided an overview of the Xen virtual machine monitor. In Chapter 3, I described a set of techniques for the construction, extension, and aggregation of virtual devices, and discussed the implementation of these approaches for disk and network extensions for VMs running on Xen. The development of extensions, or soft devices, was facilitated by device taps: drivers which allowed extensions to interpose on the device channels used by virtual machines. Measurement results were presented showing that soft devices were of sufficiently low overhead to be practical for all disk and many network applications.

I additionally introduced the notion of aggregating soft devices to form device services. Such services allow the I/O function provided over a specific I/O interface, storage for instance, to be isolated as a separate entity within a cluster environment. This approach allowed such services to be isolated in terms of performance, failure, security, and administration.
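To recall the shape of this interposition concretely, the fragment below sketches how a tap-style extension might hook requests as they cross a block device channel. It is a minimal illustration under assumed interfaces: the types and functions are hypothetical stand-ins, not the actual tap drivers described in Chapter 3.

/* Minimal sketch of a device tap: an extension registers a hook that is
 * invoked for each request crossing a split block-device channel, and
 * decides whether to pass, answer, or drop it.  All types and functions
 * here are illustrative stand-ins for the interfaces in Chapter 3. */
#include <stddef.h>
#include <stdint.h>

typedef struct tap_request {
    uint64_t sector;        /* first sector addressed by the request  */
    size_t   nr_sectors;    /* length of the request                  */
    int      write;         /* non-zero for writes                    */
    void    *data;          /* request data pages, mapped by the tap  */
} tap_request_t;

enum tap_verdict {
    TAP_PASS,               /* forward to the real backend            */
    TAP_RESPOND,            /* the extension services it directly     */
    TAP_DROP                /* discard, e.g. on a policy violation    */
};

typedef enum tap_verdict (*tap_request_hook)(tap_request_t *req);

/* Example hook: let reads through to the base image, but claim writes
 * so they can be redirected to a private copy-on-write store. */
static enum tap_verdict cow_hook(tap_request_t *req)
{
    if (!req->write)
        return TAP_PASS;
    /* ... record req->data in a per-VM write log or snapshot ... */
    return TAP_RESPOND;
}

/* An extension would register the hook with the interposition driver,
 * e.g. tap_register(cow_hook) in this hypothetical interface. */

A network tap has the same structure, with packets taking the place of block requests.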

Chapter 4 explored extensions for network device management. It presented an approach to preventing virtual machines in hosting facilities from being exploited to perform denial of service attacks. The approach enforced a limit on traffic symmetry, the ratio of transmitted to received packets, to throttle the transmission capabilities of VMs that do not appear to be receiving acknowledgement traffic from the hosts they are transmitting to.

As a second example area in which these techniques could be applied, Chapter 5 presented applications of these extensions within the domain of storage. It described Parallax, a storage system for clusters of virtual machines that allowed fast and frequent snapshotting, and efficiently stored large numbers of virtual disks based on a collection of common images.

Chapter 6 explored opportunities to combine device extensions with broader changes to a VMM-based system. An initial example was to support debugging, where the introduction of device-based watchpoints allowed a debugger to be activated based on the contents of device data. The chapter went on to discuss the development of taint-based memory protection, in which device extensions are combined with dynamic emulation to introduce a sweeping new virtual hardware feature requiring changes to CPU, memory, and device behaviour.

In summary, this thesis has demonstrated that the mechanisms provided by virtual machine monitors may be combined to build a compelling framework for device extensions. I believe that this approach to both extending individual devices and treating device functionality as a service within a cluster of physical hosts is a useful way of constructing system software, and one which will gain considerable use in years to come.

Bibliography

[ALPS03] Daniel Adkins, Karthik Lakshminarayanan, Adrian Perrig, and Ion Stoica. Taming IP Packet Flooding Attacks. In Proceedings of the Second Workshop on Hot Topics in Networks (HotNets-II), November 2003. 85
[And04] S. Andersen. Changes to functionality in Microsoft Windows service pack 2, Part 2: Network protection technologies. http://www.microsoft.com/technet/prodtechnol/winxppro/maintain/sp2netwk.mspx, August 2004. 83
[AR02] Amr Awadallah and Mendel Rosenblum. The vMatrix: A network of virtual machine monitors for dynamic content distribution. In Proceedings of the 7th International Workshop on Web Content Caching and Distribution (WCW 2002), August 2002. 126
[ARW03] T. Anderson, T. Roscoe, and D. Wetherall. Preventing Internet Denial-of-Service with Capabilities. In Proceedings of the Second Workshop on Hot Topics in Networks (HotNets-II), November 2003. 85
[BALL90] B. Bershad, T. Anderson, E. Lazowska, and H. Levy. Lightweight remote procedure call. ACM Transactions on Computer Systems, 8(1):37–55, 1990. 66
[BDF+03] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield. Xen and the Art of Virtualization. In Proceedings of the 19th ACM SOSP, pages 164–177, October 2003. 10, 73, 74, 78, 125, 131
[BDGR97a] E. Bugnion, S. Devine, K. Govil, and M. Rosenblum. Disco: Running commodity operating systems on scalable multiprocessors. ACM Transactions on Computer Systems, 15(4):412–447, 1997. 124

[BDGR97b] Edouard Bugnion, Scott Devine, Kinshuk Govil, and Mendel Rosenblum. Disco: Running commodity operating systems on scalable multiprocessors. In Proceedings of the 16th ACM Symposium on Operating Systems Principles, pages 143–156, October 1997. 19, 35, 37, 124, 131
[Bel99] Steven M. Bellovin. Distributed firewalls. In ;login:, pages 39–47, November 1999. 83
[Bel05] Fabrice Bellard. QEMU, a Fast and Portable Dynamic Translator. In Proceedings of the USENIX Annual Technical Conference, pages 41–46, 2005. 125
[BRS+85] R. Baron, R. Rashid, E. Siegel, A. Tevanian, and M. Young. Mach-1: An Operating Environment for Large-Scale Multiprocessor Applications. IEEE Software, 2(4):65–67, July 1985. 127, 132
[BSP+95] B. Bershad, S. Savage, P. Pardyak, E. Gün Sirer, M. Fiuczynski, D. Becker, S. Eggers, and C. Chambers. Extensibility, safety and performance in the SPIN operating system. In Proceedings of the 15th ACM Symposium on Operating Systems Principles, pages 267–284, December 1995. 130
[CCC+05] M. Costa, J. Crowcroft, M. Castro, A. Rowstron, L. Zhou, L. Zhang, and P. Barham. Vigilante: end-to-end containment of internet worms. In Proceedings of the twentieth ACM symposium on Operating systems principles, pages 133–147, 2005. 135
[CFH+05] C. Clark, K. Fraser, S. Hand, J. G. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield. Live migration of virtual machines. In Proc. USENIX Symposium on Networked Systems Design and Implementation, 2005. 25, 81, 126
[CG05] L. Cherkasova and R. Gardner. Measuring CPU overhead for I/O processing in the Xen virtual machine monitor. In Proc. USENIX Annual Technical Conference, pages 379–382, 2005. 66
[Chu04] P. Chubb. Get more device drivers out of the kernel! In Proceedings of the Ottawa Linux Symposium, Ottawa, Canada, July 2004. 32, 128
[CSL04] B. Cantrill, M. W. Shapiro, and A. H. Leventhal. Dynamic instrumentation of production systems. In USENIX Annual Technical Conference, General Track, pages 15–28, 2004. 130

[CYC+01] A. Chou, J. Yang, B. Chelf, S. Hallem, and D. Engler. An empirical study of operating system errors. In Proceedings of the 18th ACM Symposium on Operating Systems Principles, pages 73–88, October 2001. 1, 127
[Dik00] Jeff Dike. A user-mode port of the Linux kernel. In Proceedings of the Fourth Annual Linux Showcase and Conference, Atlanta, GA, October 2000. 125
[DKC+02] G. W. Dunlap, S. T. King, S. Cinar, M. Basrai, and P. M. Chen. ReVirt: Enabling Intrusion Analysis through Virtual-Machine Logging and Replay. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation, pages 211–224, December 2002. 10, 126
[DP93] P. Druschel and L. Peterson. Fbufs: a high-bandwidth cross-domain transfer facility. In Proceedings of the fourteenth ACM symposium on Operating systems principles, pages 189–202, New York, NY, USA, 1993. ACM Press. 129
[EFO95] D. Engler, F. Kaashoek, and J. O'Toole Jr. Exokernel: an operating system architecture for application-level resource management. In Proceedings of the 15th ACM Symposium on Operating Systems Principles, December 1995. 35, 130
[EG04] K. Elphinstone and S. Götz. Initial evaluation of a user-level device driver framework. In Proceedings of the Asia-Pacific Computer Systems Architecture Conference (ACSAC), Beijing, China, Sept 2004. 32, 128
[EK96] D. Engler and M. F. Kaashoek. DPF: fast, flexible message demultiplexing using dynamic code generation. In Proceedings of the ACM SIGCOMM '96 Conference: Applications, Technologies, Architectures, and Protocols for Computer Communication, pages 53–59, Stanford, California, August 1996. 1, 80, 133
[FBB+97] B. Ford, G. Back, G. Benson, J. Lepreau, A. Lin, and O. Shivers. The flux OSKit: A substrate for kernel and language research. In Symposium on Operating Systems Principles, pages 38–51, 1997. 1, 30, 128
[FGB91] A. Forin, D. Golub, and B. Bershad. An I/O system for Mach 3.0. In Proceedings of the Second USENIX Mach Symposium, pages 163–176, Monterey, CA, Nov 1991. 127

[FGCW05] M. E. Fiuczynski, R. Grimm, Y. Coady, and D. Walker. patch (1) considered harmful. In Proceedings of the Tenth Workshop on Hot Topics in Operating Systems (HotOS X), Santa Fe, New Mexico, June 2005. 30, 131
[FHN+04] K. Fraser, S. Hand, R. Neugebauer, I. Pratt, A. Warfield, and M. Williamson. Safe hardware access with the Xen virtual machine monitor. In Proceedings of the 1st Workshop on Operating System and Architectural Support for the On-Demand IT Infrastructure (OASIS-1), October 2004. 27, 75, 128
[FMBC98] M. Fiuczynski, R. Martin, B. Bershad, and D. Culler. SPINE: An operating system for intelligent network adapters. Technical Report TR-98-08-01, University of Washington Department of Computer Science, 1998. 131
[FS00] P. Ferguson and D. Senie. Network Ingress Filtering: Defeating Denial of Service Attacks which employ IP Source Address Spoofing. RFC 2827, IETF, May 2000. 83
[GC04] S. Goyal and J. Carter. Safely harnessing wide area surrogate computing - or how to avoid building the perfect platform for network attacks. In Proceedings of the First Workshop on Real, Large Distributed Systems (WORLDS '04), December 2004. 126
[GNA+97] G. Gibson, D. Nagle, K. Amiri, F. Chang, E. Feinberg, H. Gobioff, C. Lee, B. Ozceri, E. Riedel, D. Rochberg, and J. Zelenka. File server scaling with network-attached secure disks. In Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, pages 272–284, 1997. 1, 133
[Gol72] R. Goldberg. Architectural principles for virtual computer systems. PhD thesis, Harvard University, 1972. 8, 14
[Gol74] R. P. Goldberg. Survey of virtual machine research. IEEE Computer, 7(6):34–45, June 1974. 9, 10, 124, 125
[GPRA98] D. P. Ghormley, D. P. Petrou, S. H. Rodrigues, and T. E. Anderson. SLIC: An extensibility system for commodity operating systems. In USENIX Annual Technical Conference, pages 39–52, June 1998. 130
[GSI93] D. Golub, G. Sotomayor, and F. Rawson III. An architecture for device drivers executing as user-level tasks. In Proceedings of the Third USENIX Mach Symposium, pages 153–171, Santa Fe, NM, Apr 1993. 127

[GTHR99] K. Govil, D. Teodosiu, Y. Huang, and M. Rosenblum. Cellular Disco: Resource management using virtual clusters on shared-memory multiprocessors. In Proceedings of the 17th ACM Symposium on Operating Systems Principles, pages 154–169, December 1999. 124
[Gum83] P. H. Gum. System/370 Extended Architecture: Facilities for Virtual Machines. IBM Journal of Research and Development, 27(6):530–544, November 1983. 21, 115
[Hea03] D. Heap. Taurus: A taxonomy of the actual utilization of real unix and windows servers, Jan 2003. White paper. 10
[HG04] Mark Handley and Adam Greenhalgh. Steps Towards a DoS-resistant Internet Architecture. In Proceedings of the Second ACM SIGCOMM Workshop on Future Directions in Network Architecture (FDNA-04), pages 49–56, 2004. 85
[HH05] A. Ho and S. Hand. On the design of a pervasive debugger. In Proceedings of the Sixth International Symposium on Automated and Analysis-Driven Debugging, Monterey, California, Sept 2005. 10, 114, 126
[HHKP03] S. Hand, T. L. Harris, E. Kotsovinos, and I. Pratt. Controlling the XenoServer Open Platform. In Proceedings of the 6th OPENARCH, April 2003. 126
[HHLS97] Hermann Härtig, Michael Hohmuth, Jochen Liedtke, and Sebastian Schönberg. The performance of micro-kernel-based systems. In Proceedings of the Sixteenth ACM Symposium on Operating System Principles, pages 66–77, Oct 1997. 128
[Hil92] D. Hildebrand. An Architectural Overview of QNX. In Proceedings of the Workshop on Micro-kernels and Other Kernel Architectures, pages 113–126. USENIX Assoc., 1992. 127
[HJ04] J. Hansen and E. Jul. Self-migration of operating systems. In Proceedings of the 11th ACM SIGOPS European Workshop (EW 2004), pages 126–130, 2004. 126
[HP91] N. Hutchinson and L. Peterson. The x-kernel: An architecture for implementing network protocols. IEEE Transactions on Software Engineering, 17(1):64–76, 1991. 61

[HR91] Judith S. Hall and Paul T. Robinson. Virtualizing the VAX Architecture. In ISCA '91: Proceedings of the 18th Annual International Symposium on Computer Architecture, pages 380–389, 1991. 21, 115
[Hun97] Galen Hunt. Creating user-mode device drivers with a proxy. In Proceedings of the 1st USENIX Windows NT Workshop, Seattle, WA, Aug 1997. 32, 128
[HWF+05] S. Hand, A. Warfield, K. Fraser, E. Kotsovinos, and D. Magenheimer. Are virtual machine monitors microkernels done right? In Proceedings of the Tenth Workshop on Hot Topics in Operating Systems (HotOS X), Santa Fe, New Mexico, June 2005. 12, 66
[J. 04] J. LeVasseur and V. Uhlig. A Sledgehammer Approach to Reuse of Legacy Device Drivers. In Proceedings of the 11th ACM SIGOPS European Workshop, Leuven, Belgium, September 2004. 128
[Ji05] M. Ji. Instant snapshots in a federated array of bricks. Technical Report HPL-2005-15, HP Laboratories, 2005. 96
[JX03] X. Jiang and D. Xu. Soda: A service-on-demand architecture for application service hosting utility platforms. In HPDC '03: Proceedings of the 12th IEEE International Symposium on High Performance Distributed Computing (HPDC'03), page 174, Washington, DC, USA, 2003. IEEE Computer Society. 126
[JX04] X. Jiang and D. Xu. Collapsar: A vm-based architecture for network attack detention center. In Proceedings of the 13th USENIX Security Symposium, pages 15–28, 2004. 10, 93, 126
[KC03] S. T. King and P. M. Chen. Backtracking intrusions. In Proc. 19th ACM Symposium on Operating Systems Principles, pages 223–236, 2003. 93
[KDC05] S. T. King, G. W. Dunlap, and P. M. Chen. Debugging operating systems with time-traveling virtual machines. In Proc. USENIX Annual Technical Conference, 2005. 10, 91, 93, 114, 126
[KEG+97] M. Frans Kaashoek, Dawson R. Engler, Gregory R. Ganger, Héctor M. Briceño, Russell Hunt, David Mazières, Thomas Pinckney, Robert Grimm, John Jannotti, and Kenneth Mackenzie. Application performance and flexibility on Exokernel systems. In Proceedings of the 16th ACM Symposium on Operating Systems Principles, pages 52–65, October 1997. 131

[KMC+00] E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F. Kaashoek. The click modular router. ACM Transactions on Computer Systems, 18(3):263–297, August 2000. 57
[KUS+04] M. Kallahalla, M. Uysal, R. Swaminathan, D. E. Lowell, M. Wray, T. Christian, N. Edwards, C. I. Dalton, and F. Gittler. SoftUDC: A software-based data center for utility computing. Computer, 37(11):38–46, 2004. 126
[KWC+05] C. Kreibich, A. Warfield, J. Crowcroft, S. Hand, and I. Pratt. Using packet symmetry to curtail malicious traffic. In Proceedings of the Fourth Workshop on Hot Topics in Networks (HOTNETS), 2005. 63, 79, 84, 88, 90
[Lie93] Jochen Liedtke. Improving IPC by kernel design. In 14th ACM Symposium on Operating System Principles (SOSP), December 1993. 66
[LMB+96] I. M. Leslie, D. McAuley, R. Black, T. Roscoe, P. Barham, D. Evers, R. Fairbairns, and E. Hyden. The design and implementation of an operating system to support distributed multimedia applications. IEEE Journal on Selected Areas In Communications, 14(7):1280–1297, September 1996. 127, 130
[LT96] E. Lee and C. Thekkath. Petal: Distributed virtual disks. In Proc. 7th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 84–92, 1996. 1, 96, 133
[LUSG04] J. LeVasseur, V. Uhlig, J. Stoess, and S. Götz. Unmodified device driver reuse and improved system dependability via virtual machines. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation, San Francisco, CA, December 2004. 12, 128
[Mic04] Microsoft. Transcript: Windows XP service pack 2. http://www.microsoft.com/windowsxp/expertzone/chats/transcripts/04sept01.mspx, September 2004. 83
[MN03] D. McAuley and R. Neugebauer. A case for virtual channel processors. In Proceedings of the ACM SIGCOMM workshop on Network-I/O convergence, pages 237–242, 2003. 132

[MP96] D. Mosberger and L. Peterson. Making paths explicit in the scout operating system. In Operating Systems Design and Implementation, pages 153–167, 1996. 61
[MS98] Steve Muir and Jonathan Smith. AsyMOS – An Asymmetric Multiprocessor Operating System. In Proceedings of OPENARCH '98, April 1998. 132
[MS00] Steve Muir and Jonathan Smith. Piglet: A Low-Intrusion Vertical Operating System. Technical Report MS-CIS-00-04, University of Pennsylvania, January 2000. 132
[MST+05] A. Menon, J. Renato Santos, Y. Turner, G. Janakiraman, and W. Zwaenepoel. Diagnosing performance overheads in the Xen virtual machine environment. In VEE '05: Proceedings of the 1st ACM/USENIX international conference on Virtual execution environments, pages 13–23, New York, NY, USA, 2005. ACM Press. 66, 67
[MWJH00] G. R. Malan, D. Watson, F. Jahanian, and P. Howell. Transport and application protocol scrubbing. In Proceedings of IEEE INFOCOM-00, pages 1381–1390, 2000. 133
[PDZ00] Vivek S. Pai, Peter Druschel, and Willy Zwaenepoel. IO-Lite: A unified I/O buffering and caching system. ACM Transactions on Computer Systems, 18(1):37–66, February 2000. 129
[PF01] I. Pratt and K. Fraser. Arsenic: A User-Accessible Gigabit Ethernet Interface. In Proceedings of IEEE INFOCOM-01, pages 67–76, April 2001. 1, 80, 127, 133
[PG74] G. Popek and R. Goldberg. Formal requirements for virtualizable third generation architectures. Communications of the ACM, 17(7):412–421, 1974. 14, 15
[PP03] S. Personick and C. Patterson, editors. Critical Information Infrastructure Protection and the Law: An Overview of Key Issues. National Academies Press, 2003. 82
[PSG+03] A. Pennington, J. Strunk, J. Griffin, C. Soules, G. Goodson, and G. Ganger. Storage-based intrusion detection: Watching storage activity for suspicious behavior. In Proceedings of the USENIX Security Symposium, August 2003. 133
[QD02] S. Quinlan and S. Dorward. Venti: A new approach to archival storage. In Proc. 1st USENIX Conference on File and Storage Technologies, Monterey, CA, January 2002. 104

[QTSZ05] F. Qin, J. Tucek, J. Sundaresan, and Y. Zhou. Rx: treating bugs as allergies—a safe method to survive software failures. In Proceedings of the twentieth ACM symposium on Operating systems principles, pages 235–248, New York, NY, USA, 2005. ACM Press. 91
[Rad01] M. Radin. Distributed denial of service attacks: Who pays?, 2001. Whitepaper commissioned by Mazu Networks. 82
[RBB+02] M. Rangarajan, A. Bohra, K. Banerjee, E. Carrera, R. Bianchini, L. Iftode, and W. Zwaenepoel. TCP servers: Offloading TCP/IP processing in internet servers. Technical Report DCS-TR-481, Rutgers University, Department of Computer Science, March 2002. 132
[RI00] John Scott Robin and Cynthia E. Irvine. Analysis of the Intel Pentium's ability to support a secure virtual machine monitor. In Proceedings of the 9th USENIX Security Symposium, Denver, CO, USA, pages 129–144, August 2000. 14
[RN93] D. S. Ritchie and G. Neufeld. User level IPC and device management in the Raven kernel. In Proceedings of the USENIX Symposium on Microkernels and Other Kernel Architectures, pages 111–125, San Diego, CA, Sept 1993. 127
[RO92] Mendel Rosenblum and John K. Ousterhout. The design and implementation of a log-structured file system. ACM Transactions on Computer Systems, 10(1):26–52, 1992. 110
[Ros94] T. Roscoe. Linkage in the Nemesis single address space operating system. Operating Systems Review, 28(4):48–55, 1994. 61
[SBC+03] C. Sapuntzakis, D. Brumley, R. Chandra, N. Zeldovich, J. Chow, M. Lam, and M. Rosenblum. Virtual appliances for deploying and maintaining software. In Proceedings of the Seventeenth Large Installation Systems Administration Conference (LISA 2003), October 2003. 75
[SBL03] M. Swift, B. Bershad, and H. Levy. Improving the reliability of commodity operating systems. In Proceedings of the 19th ACM SOSP, pages 207–222, October 2003. 1, 30, 128
[Sca01] S. Scalet. See you in court. CIO Magazine, November 21, 2001. 82

[SESS96] M. I. Seltzer, Y. Endo, C. Small, and K. A. Smith. Dealing with disaster: Surviving misbehaved kernel extensions. In Proceedings of the 2nd Symposium on Operating Systems Design and Implementation, pages 213–227, October 1996. 130
[SFH+99] D. Santry, M. Feeley, N. Hutchinson, A. Veitch, R. Carton, and J. Ofir. Deciding when to forget in the Elephant file system. In Proc. 17th ACM Symposium on Operating Systems Principles, pages 110–123, 1999. 104
[SFV+04] Yasushi Saito, Svend Frølund, Alistair C. Veitch, Arif Merchant, and Susan Spence. FAB: building distributed enterprise disk arrays from commodity components. In ASPLOS, pages 48–58, 2004. 1, 96, 133
[SM79] L. Seawright and R. MacKinnon. VM/370 – a study of multiplicity and usefulness. IBM Systems Journal, pages 4–17, 1979. 124
[SPLS+02] A. C. Snoeren, C. Partridge, C. E. Jones, L. A. Sanchez, F. Tchakountio, B. Schwartz, S. T. Kent, and W. T. Strayer. Single-packet IP traceback. IEEE/ACM Transactions on Networking (TON), 10:721–734, Dec. 2002. 85
[SVL01] J. Sugerman, G. Venkitachalam, and B. Lim. Virtualizing I/O devices on VMware Workstation's hosted virtual machine monitor. In Proceedings of the General Track: 2002 USENIX Annual Technical Conference, pages 1–14, Berkeley, CA, USA, 2001. USENIX Association. 34, 131
[SWKA01] S. Savage, D. Wetherall, A. Karlin, and T. Anderson. Network support for IP traceback. IEEE/ACM Transactions on Networking (TON), 9(3):226–237, Jun. 2001. 85
[TM99] A. Tamches and B. P. Miller. Fine-grained dynamic instrumentation of commodity operating system kernels. In Operating Systems Design and Implementation, pages 117–130, 1999. 130
[UDI99] Introduction to UDI version 1.0. Project UDI, 1999. Technical white paper, http://www.projectudi.org/. 30, 128
[vEBBV95] T. von Eicken, A. Basu, V. Buch, and W. Vogels. U-Net: a user-level network interface for parallel and distributed computing. In Proceedings of the fifteenth ACM symposium on operating systems principles, pages 40–53, Copper Mountain, CO, 1995. 127

[VMC+05] M. Vrable, J. Ma, J. Chen, D. Moore, E. Vandekieft, A. Snoeren, G. Voelker, and S. Savage. Scalability, fidelity and containment in the Potemkin virtual honeyfarm. In Proceedings of the ACM Symposium on Operating System Principles (SOSP), Brighton, UK, October 2005. 10, 93, 126
[Wal02] C. A. Waldspurger. Memory resource management in VMware ESX server. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation, pages 181–194, December 2002. 21, 37, 115
[WCG04] A. Whitaker, R. S. Cox, and S. D. Gribble. Configuration debugging as search: Finding the needle in the haystack. In OSDI, pages 77–90, 2004. 1, 10, 61, 91, 93, 126, 133
[WCSG04] A. Whitaker, R. Cox, M. Shaw, and S. Gribble. Constructing services with interposable virtual hardware. In Proceedings of the 1st Symposium on Networked Systems Design and Implementation, pages 169–182, March 2004. 1, 12, 30, 61, 132, 133
[WFHD05] A. Warfield, K. Fraser, S. Hand, and T. Deegan. Facilitating the development of soft devices. In Proceedings of the USENIX Annual Technical Conference, pages 379–382, 2005. 58
[Whe00] D. A. Wheeler. Estimating Linux's size. http://www.dwheeler.com/sloc/, November 2000. 33
[WHTP02] A. Warfield, S. Hand, T. Harris, and I. Pratt. Isolation of shared network resources in Xenoservers. Technical Report PDN-02-006, Planet Lab Design Note, November 2002. 48, 63, 78
[Wil02] Matthew M. Williamson. Throttling viruses: Restricting propagation to defeat malicious mobile code. In Proceedings of the 18th Annual Computer Security Applications Conference (ACSAC 2002), December 2002. 83
[WRF+05] A. Warfield, R. Ross, K. Fraser, C. Limpach, and S. Hand. Parallax: Managing storage for a million machines. In Proceedings of the Tenth Workshop on Hot Topics in Operating Systems (HotOS X), Santa Fe, New Mexico, June 2005. 75, 92
[WSG02] Andrew Whitaker, Marianne Shaw, and Steven D. Gribble. Denali: Lightweight Virtual Machines for Distributed and Networked Applications. Technical Report 02-02-01, University of Washington, 2002. 16, 35, 125, 131

[You73] C. J. Young. Extended architecture and hypervisor performance. In Proceedings of the ACM SIGARCH-SIGOPS workshop on virtual computer systems, pages 177–183, 1973. 125
[YPS03] A. Yaar, A. Perrig, and D. Song. Pi: A path identification mechanism to defend against DDoS attacks. In Proceedings of the 2003 IEEE Symposium on Security and Privacy, pages 1–15, 2003. 85
