Directvisor: Virtualization for Bare-metal Cloud

Kevin Cheng (Binghamton University, New York, USA), Spoorti Doddamani (Binghamton University, New York, USA), Tzi-Cker Chiueh (Industrial Technology Research Institute, Taiwan), Yongheng Li (Binghamton University, New York, USA), and Kartik Gopalan (Binghamton University, New York, USA)

Abstract

Bare-metal cloud platforms allow customers to rent remote physical servers and install their preferred operating systems and software to make the best of the servers' raw hardware capabilities. However, this quest for bare-metal performance compromises cloud manageability. To avoid overheads, cloud operators cannot install traditional hypervisors that provide common manageability functions such as live migration and introspection. We aim to bridge this gap between performance, isolation, and manageability for bare-metal clouds. Traditional hypervisors are designed to limit and emulate hardware access by virtual machines (VMs). In contrast, we propose Directvisor, a hypervisor that maximizes a VM's ability to directly access hardware for near-native performance, yet retains hardware control and manageability. Directvisor goes beyond traditional direct-assigned (pass-through) I/O devices by allowing VMs to directly control and receive hardware timer interrupts and inter-processor interrupts (IPIs), besides eliminating most VM exits. At the same time, Directvisor supports seamless (low-downtime) live migration and introspection for such VMs having direct hardware access.

CCS Concepts • Software and its engineering → Virtual machines; Operating systems;

Keywords Virtualization, Hypervisor, Virtual Machine, Bare-metal cloud, Live migration

ACM Reference Format: Kevin Cheng, Spoorti Doddamani, Tzi-Cker Chiueh, Yongheng Li, and Kartik Gopalan. 2020. Directvisor: Virtualization for Bare-metal Cloud. In ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE '20), March 17, 2020, Lausanne, Switzerland. ACM, New York, NY, USA, 14 pages. https://doi.org/10.1145/3381052.3381317

1 Introduction

Conventional multi-tenant cloud services [14, 27, 33] enable customers to rent traditional system virtual machines (VMs) [1, 5, 29] to scale up their IT operations to the cloud. However, commodity hypervisors used for virtualizing these platforms suffer from both performance overheads and isolation concerns arising from co-located workloads of other users. To address this concern, cloud operators have begun offering bare-metal cloud services [12, 13], which allow customers to rent dedicated remote physical machines. Bare-metal cloud customers are assured stronger isolation than multi-tenant clouds and bare-metal performance for critical workloads such as big data analytics and AI.

However, the quest for native performance and physical isolation compromises cloud manageability. Since cloud operators do not install hypervisors on bare-metal servers, they lose many essential manageability services provided by hypervisors, such as live migration [11, 25], high availability [15], patching [10], and introspection-based security [18, 20, 24, 42, 44]. In contrast, multi-tenant cloud providers compete to differentiate their offerings through rich hypervisor-level services. We aim to bridge this gap between performance, isolation, and manageability for bare-metal clouds.

We propose Directvisor to provide the best of both worlds: the performance of bare-metal clouds and the manageability of virtualized clouds. Directvisor runs one or more DirectVMs, which are near-native VMs that directly access dedicated hardware. Traditional hypervisors are designed to limit and emulate hardware access by VMs. In contrast, Directvisor is designed to maximize a VM's ability to directly access hardware for near-native performance while retaining hardware control and manageability. During normal execution, Directvisor allows a DirectVM to directly interact with processor and device hardware without the hypervisor's intervention, as if the VM ran directly on the physical server. However, Directvisor maintains its ability to regain control over a DirectVM when needed, such as for live migration, introspection, high availability, and performance monitoring. Specifically, Directvisor makes the following novel contributions.

(1) Direct Interrupt Processing: Directvisor goes beyond traditional direct-assigned (pass-through) I/O devices to allow VMs to directly control and receive timer interrupts and inter-processor interrupts (IPIs) without interception or emulation by the hypervisor. This is accomplished through a novel use of processor-level support for virtualization [30, 49], directed I/O, and posted interrupts [2, 31]. Directly receiving timer interrupts and IPIs in a VM greatly reduces the corresponding processing latencies, which is important for latency-critical applications. In contrast, existing approaches [3, 39, 45, 48] focus only on direct processing of device I/O interrupts. Additionally, Directvisor also eliminates the most common VM exits and nested paging overheads, besides ensuring inter-VM isolation and hypervisor transparency. Other than during startup and live migration, Directvisor is not involved in a DirectVM's normal execution.

(2) Seamless Live Migration and Manageability: Directvisor supports seamless (low-downtime) live migration of a DirectVM by switching the VM from direct hardware access to emulated/para-virtual access at the source machine before migration and re-establishing direct access at the destination after migration. Unlike existing live migration approaches [23, 26, 41, 50–52] for VMs with pass-through I/O access, Directvisor does not require device-specific state capture and migration code, maintains liveness during device switchover, and does not require the hypervisor to trust the guest OS. Additionally, Directvisor supports other manageability functions of traditional clouds, such as VM introspection and checkpointing.

Our Directvisor prototype was implemented by modifying the KVM/QEMU virtualization platform and currently supports Linux guests in a DirectVM. The rest of this paper describes the detailed design, implementation, and evaluation of Directvisor's virtualization support for DirectVM.

2 Background

In this section, we provide a brief background on processor support for direct I/O device access, the Linux KVM/QEMU [1] virtualization platform, and the associated overheads.

VM Exits: I/O operations and interrupt processing using traditional VMs incur higher overheads than bare-metal execution because privileged I/O operations issued by a guest OS are typically trapped and emulated by the hypervisor. A VM exit refers to this forced control transfer from the VM to the hypervisor for emulation of privileged operations. VM exits are also triggered by external device interrupts, local timer interrupts, and IPIs. Each VM exit is expensive since it requires saving the VM's execution context upon the exit, emulating the exit reason in the hypervisor, and finally restoring the VM's context before VM entry.

Direct Device Access: Intel VT-d [2] provides processor-level support, called the IOMMU [7], for direct and safe access to hardware I/O devices by unprivileged VMs running in non-root mode, which is the processor privilege with which VMs execute. Virtual function I/O (VFIO) [45] is a software framework that enables user-space device drivers to interact with I/O devices directly without involving the Linux kernel. In the KVM/QEMU platform, a VM runs as part of a user-space process called QEMU [6]; specifically, guest VCPUs run as non-root mode threads within QEMU. QEMU uses VFIO to configure a VM to directly access an I/O device without emulation by either KVM or QEMU. In contrast, in a para-virtual [47] I/O architecture, the hypervisor emulates a simplified virtual I/O device, called virtio [43], which provides worse I/O performance than VFIO.

Timer Interrupts and IPIs: A CPU may experience two types of interrupts: external and local interrupts. External interrupts originate from external I/O devices, such as a network card or disk. Local interrupts originate within the processor hardware, such as timer interrupts and IPIs. A local APIC (Advanced Programmable Interrupt Controller) associated with each CPU core delivers both types of interrupts.

Posted Interrupts: Normally, when a CPU running a VM's virtual CPU (VCPU) receives an interrupt, a VM exit is triggered. The hypervisor then processes the interrupt and, if necessary, emulates the hardware interrupt by delivering virtual interrupts to one or more VMs. The posted interrupt mechanism [2, 31] is processor-level hardware support that allows a VM to directly receive external interrupts from directly assigned I/O devices without triggering a VM exit to the hypervisor. In this case, the IOMMU and local APIC hardware convert the external interrupt into a special interrupt, called the Posted Interrupt Notification (PIN) vector, which does not cause VM exits. Because the external interrupts "pretend" to be a PIN interrupt, a VM can receive them directly without any VM exits.
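The posted-interrupt descriptor referenced above is a 64-byte structure that the CPU and IOMMU consult when posting an interrupt. For reference, the sketch below shows its layout and the two-step posting operation in C; the field and helper names are ours, following the Intel VT-d/VMX documentation rather than Directvisor's code.

```c
#include <stdint.h>

/*
 * Sketch of the 64-byte posted-interrupt descriptor consulted by the CPU
 * and IOMMU (field names are ours; layout follows the Intel VT-d/VMX
 * documentation).  The hypervisor associates one descriptor with each
 * virtual CPU and, for a DirectVM, shares it with the guest.
 */
struct posted_interrupt_desc {
    uint64_t pir[4];              /* 256-bit Posted Interrupt Request bitmap,
                                     one bit per interrupt vector             */
    union {
        struct {
            uint16_t on   : 1;    /* Outstanding Notification pending         */
            uint16_t sn   : 1;    /* Suppress Notification                    */
            uint16_t rsvd : 14;
            uint8_t  nv;          /* Notification Vector (the PIN vector)     */
            uint8_t  rsvd2;
            uint32_t ndst;        /* Notification Destination (phys. APIC ID) */
        };
        uint64_t control;
    };
    uint64_t reserved[3];         /* pad to 64 bytes */
} __attribute__((aligned(64)));

/* Step 1 of posting: mark the requested vector in the PIR. */
static inline void pi_set_pir(struct posted_interrupt_desc *pid, int vector)
{
    __atomic_fetch_or(&pid->pir[vector / 64], 1ULL << (vector % 64),
                      __ATOMIC_SEQ_CST);
}

/* Step 2: set Outstanding Notification; the sender then delivers the
 * notification vector (pid->nv) to the CPU identified by pid->ndst, and
 * the hardware merges the PIR into the running guest's virtual APIC.     */
static inline void pi_set_on(struct posted_interrupt_desc *pid)
{
    __atomic_fetch_or(&pid->control, 1ULL, __ATOMIC_SEQ_CST);
}
```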


3 Directvisor Overview

Figure 1 shows the high-level architecture of Directvisor, which runs as a thin hypervisor on each physical server and partitions the physical hardware among one or more DirectVMs. A small Guest Agent (a self-contained loadable kernel module) coordinates with Directvisor to enable or disable direct hardware access by the guest on demand, such as before or after live migration. We begin by describing a few basic optimizations included in Directvisor which provide the foundations for its more advanced features of direct interrupt delivery and seamless live migration.

Figure 1. The Directvisor runs as a thin shim layer to provide a DirectVM with dedicated CPU, memory, timer, and I/O access. All interrupts – timers, IPIs, and devices – are directly delivered to the VM without the hypervisor's emulation.

Direct I/O Device Access: As with traditional hypervisors, Directvisor uses the VT-d and posted-interrupt mechanisms to directly assign dedicated (pass-through) I/O devices to a DirectVM and removes itself from both the I/O data path and the interrupt delivery path. For physical devices, such as modern network cards, which support multiple virtual functions (VFs), Directvisor uses existing mechanisms, such as VFIO, to directly assign a virtual function to each DirectVM.

Dedicated Physical CPUs: Directvisor also assigns a dedicated physical CPU to each virtual CPU of a DirectVM. Doing so not only avoids contention between hypervisor and guest tasks but also eliminates the need to re-route interrupts from the hypervisor to the guest. Directvisor reserves one physical CPU (CPU 0) for itself, where it also handles any external interrupts that are not meant for any DirectVM. For hosting multiple DirectVMs, Directvisor performs NUMA-aware assignment of dedicated physical CPUs among different guests.

Dedicated Physical Memory: Directvisor pre-allocates physical memory for each DirectVM to avoid second-level (EPT) page faults during guest execution. Directvisor also configures a DirectVM to use large pages (via transparent hugepages in KVM/QEMU) so that memory accesses by the guest generate fewer TLB misses due to EPT translations.

Disabling Unnecessary VM Exits: To allow direct control over the timer and IPI hardware in the local APIC, Directvisor disables VM exits due to reads and writes by the guest on certain model-specific registers (MSRs), specifically the initial count register, the divide configuration register, and the interrupt command register. Directvisor also implements an optimization to prevent idle virtual CPUs of a DirectVM from exiting to hypervisor mode. Normally, when a CPU idles, the OS executes a HLT instruction to idle (or halt) the CPU in low-power mode. For a VM's virtual CPU, the HLT instruction would normally trigger a VM exit to the hypervisor, which emulates the HLT instruction by blocking the idle virtual CPU and optionally polling for new events. Besides increasing the latency of reviving the blocked virtual CPU, we observed that frequent HLT-triggered VM exits also increase CPU utilization due to event polling by the hypervisor (see Section 9). Directvisor eliminates both overheads by disabling VM exits when a guest's virtual CPU executes the HLT instruction; thus, the physical CPU idles directly in guest mode without any emulation or polling overheads.
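To make this optimization concrete, the sketch below illustrates, in simplified C, the two hardware knobs involved: the HLT-exiting bit in the VMCS primary processor-based execution controls and the per-MSR write-intercept bits in the VMX MSR bitmap for the x2APIC timer and IPI registers. The helper names are illustrative stand-ins, not KVM's actual interfaces.

```c
#include <stdint.h>

/*
 * Sketch (not actual KVM code) of disabling the VM exits that stand
 * between a DirectVM and its local APIC timer/IPI hardware:
 *  1. clear "HLT exiting" in the VMCS primary execution controls so an
 *     idle VCPU halts in guest mode, and
 *  2. clear the write-intercept bits for the x2APIC initial-count,
 *     divide-configuration, and interrupt-command MSRs in the MSR bitmap.
 */
#define CPU_BASED_HLT_EXITING  (1u << 7)   /* primary exec-control bit 7 */

#define MSR_X2APIC_ICR         0x830       /* interrupt command register */
#define MSR_X2APIC_TMICT       0x838       /* timer initial count        */
#define MSR_X2APIC_TDCR        0x83e       /* timer divide configuration */

/* VMX MSR bitmap: 4 KB page; the write bitmap for MSRs 0x0-0x1fff
 * starts at byte offset 0x800.                                           */
static void msr_bitmap_allow_write(uint8_t *bitmap, uint32_t msr)
{
    if (msr <= 0x1fff)
        bitmap[0x800 + msr / 8] &= (uint8_t)~(1u << (msr % 8));
}

/* Stand-ins for VMCS accessors (vmcs_read32/vmcs_write32 in KVM). */
static uint32_t fake_exec_ctls;
static uint32_t vmcs_read_exec_ctls(void)        { return fake_exec_ctls; }
static void     vmcs_write_exec_ctls(uint32_t v) { fake_exec_ctls = v; }

void enable_direct_timer_and_ipi(uint8_t *msr_bitmap)
{
    vmcs_write_exec_ctls(vmcs_read_exec_ctls() & ~CPU_BASED_HLT_EXITING);

    msr_bitmap_allow_write(msr_bitmap, MSR_X2APIC_TMICT);
    msr_bitmap_allow_write(msr_bitmap, MSR_X2APIC_TDCR);
    msr_bitmap_allow_write(msr_bitmap, MSR_X2APIC_ICR);
}
```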
4 Direct Timer Interrupts and IPIs

We next describe how Directvisor enables a VM to directly control and receive local timer interrupts and IPIs. Consider Figure 2, which compares the relative costs of indirect and direct delivery of timer interrupts and IPIs to a VM. Traditional indirect processing of timer interrupts and IPIs is expensive due to several VM exits and entries. A timer interrupt (1) triggers a VM exit (2). The hypervisor injects a virtual timer interrupt into the guest upon VM entry (3). The guest processes the interrupt and attempts to program the next timer interrupt, which triggers another VM exit (4) and VM entry (5). The hypervisor emulates the guest's request by programming the local APIC (6). For IPIs, a guest's request to send an IPI from one virtual CPU to another is still trapped into the hypervisor (4, 5, 6). However, if the processor supports posted interrupts, then the IPI is delivered to the target virtual CPU without any VM exits (7). As shown in Figure 2b, Directvisor completely removes itself from the guest's interrupt processing path by configuring the DirectVM to use the posted interrupt mechanism for both local timers and IPIs. Next, we describe these two mechanisms in more detail.

Figure 2. Indirect vs. direct timer interrupts and IPIs.

4.1 Direct Timer Interrupts

Figure 3 shows how a DirectVM configures the timer in the local APIC to directly receive and handle its timer interrupts. To allow a DirectVM to directly control and receive local timer interrupts, the Guest Agent coordinates with the hypervisor. Directvisor passes control of the timer to the guest through two configuration steps. First, Directvisor configures the local APIC to deliver a posted interrupt notification vector instead of a regular timer interrupt vector when the timer expires (1). Next, the hypervisor executes a VM entry operation and resumes the guest's execution; if the timer fires hereafter, the CPU remains in guest mode. Second, Directvisor configures the DirectVM to program the timer directly by disabling VM exits due to read and write operations on the local APIC's initial count register and divide configuration register (5). Once a timeout value is written to the initial count register, the timer starts ticking.

Figure 3. Direct timer interrupts: the local APIC delivers a posted interrupt notification instead of a timer interrupt vector.

What remains is how to deliver a physical timer interrupt disguised as the guest's virtual timer interrupt without the hypervisor's help. The key is to let the guest use the posted interrupt mechanism. The hypervisor shares a posted interrupt descriptor with the guest. After the guest directly receives and processes a virtual timer interrupt via a posted interrupt notification, it marks the timer interrupt bit in the shared posted interrupt request bitmap and programs the next timer value directly in the local APIC registers (2–5).

One side effect of using posted interrupt notifications is that a spurious timer interrupt can be delivered unexpectedly early if there is a VM entry, a posted interrupt from a device, or an IPI. Such spurious timer interrupts particularly hurt the performance of high-speed, high-bandwidth devices. For instance, consider a guest that uses a pass-through network device: when a network interrupt must be delivered to the guest as a posted interrupt notification, the timer interrupt bit may also be marked as pending in the shared posted interrupt request bitmap. Since both the timer and network interrupt vectors are pending, the CPU has to decide whether to deliver the timer or the network interrupt first. Since the timer interrupt has a higher priority than the network interrupt, the CPU delivers the timer interrupt first. However, the timer interrupt thus delivered is too early, that is, spurious for the guest's actual timer event, and must be ignored. Furthermore, too many spurious timer interrupts can prevent a high-performance network interface from achieving peak throughput due to high CPU utilization.

To solve the problem of spurious timer interrupts, a DirectVM guest keeps track of both the arrival time and the expected expiration time of each timer interrupt. If the arrival time is earlier than the expected expiration time, the guest acknowledges the timer interrupt and does not proceed with regular timer interrupt handling. It then restores the cleared timer interrupt bit in the posted interrupt request bitmap. When the actual timer expires, the corresponding posted interrupt notification triggers the correct timer interrupt in the guest.
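The following sketch illustrates this guest-side logic: arming the timer directly through the initial-count MSR and filtering spurious early deliveries by comparing the arrival time against the expected expiry. The helper functions and the calibration constant are assumptions for illustration, not code from the Directvisor prototype.

```c
#include <stdint.h>
#include <stdbool.h>

/*
 * Guest-side sketch of direct timer arming plus the spurious-interrupt
 * filter described above.  Assumptions (not from the prototype): now_ns()
 * returns the current time, write_x2apic_tmict() writes the x2APIC
 * initial-count MSR (0x838) without a VM exit, and pir_set_bit() re-marks
 * a vector in the posted-interrupt request bitmap shared by the hypervisor.
 */
#define TIMER_VECTOR 0xec                 /* illustrative local timer vector */

extern uint64_t now_ns(void);
extern void write_x2apic_tmict(uint32_t ticks);
extern void pir_set_bit(volatile uint64_t *pir, int vector);

static volatile uint64_t *shared_pir;     /* exported by the hypervisor       */
static uint64_t expected_expiry_ns;       /* when the armed timer should fire */
static uint64_t apic_ticks_per_us = 24;   /* from hypervisor-provided calibration */

/* Program the next timer event directly in the local APIC. */
void direct_timer_arm(uint64_t delta_ns)
{
    expected_expiry_ns = now_ns() + delta_ns;
    write_x2apic_tmict((uint32_t)(delta_ns / 1000 * apic_ticks_per_us));
}

/* Called from the guest's timer interrupt handler; returns true if the
 * interrupt is genuine and regular timer handling should proceed.        */
bool direct_timer_interrupt(void)
{
    if (now_ns() < expected_expiry_ns) {
        /* Spurious: a VM entry, device interrupt, or IPI drained the PIR
         * early.  Re-mark the timer bit and skip handling; the real expiry
         * will deliver another posted-interrupt notification.             */
        pir_set_bit(shared_pir, TIMER_VECTOR);
        return false;
    }
    return true;
}
```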

4.2 Direct IPIs

Traditional hypervisors trap and emulate the delivery of IPIs between guest virtual CPUs. In contrast, Directvisor allows virtual CPUs within a DirectVM to directly send IPIs to each other without emulation by the hypervisor. This is possible because Directvisor dedicates a physical CPU to each guest virtual CPU, as discussed in Section 3. In addition, each CPU has a per-CPU local APIC identifier, or physical APIC ID, which is necessary to specify the physical CPU to which an IPI must be delivered. Directvisor exports the physical APIC IDs of all physical CPUs assigned to a guest via its Guest Agent. Guest virtual CPUs then use the posted interrupt mechanism to send IPIs to each other without VM exits, similar to the direct timer interrupt mechanism described in Section 4.1. The key difference is that the hypervisor disables VM exits when a guest virtual CPU sends an IPI by writing to the per-CPU interrupt command register. For example, consider a virtual CPU A that wants to send an IPI to virtual CPU B. Virtual CPU A first sets the desired bit in B's posted interrupt request bitmap. Then A writes the physical APIC ID of B's physical CPU and the posted interrupt notification request to its interrupt command register without incurring a VM exit. Finally, physical CPU A delivers the posted interrupt notification, which triggers the IPI delivery on CPU B.
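The sketch below illustrates this direct IPI path, assuming the Guest Agent has exported each virtual CPU's physical APIC ID and shared posted-interrupt bitmap; the notification vector value and helper names are illustrative, not taken from the prototype.

```c
#include <stdint.h>

/*
 * Sketch of a direct IPI from virtual CPU A to virtual CPU B.
 * Assumptions: wrmsr_icr() writes the x2APIC interrupt command register
 * (MSR 0x830), which Directvisor has made exit-free; pir_set_bit() marks
 * a vector in B's shared posted-interrupt request bitmap; the PIN vector
 * value and per-VCPU physical APIC IDs come from the Guest Agent.
 */
#define PIN_VECTOR      0xf2u    /* illustrative posted-interrupt notification vector */
#define ICR_DEST_SHIFT  32       /* x2APIC ICR: destination APIC ID in bits 63:32      */

extern void wrmsr_icr(uint64_t value);
extern void pir_set_bit(volatile uint64_t *pir, int vector);

struct vcpu_pi_info {
    volatile uint64_t *pir;      /* target VCPU's PIR bitmap (shared)   */
    uint32_t phys_apic_id;       /* target's dedicated physical CPU     */
};

void direct_send_ipi(const struct vcpu_pi_info *target, int vector)
{
    /* 1. Mark the requested vector in the target's PIR. */
    pir_set_bit(target->pir, vector);

    /* 2. Send the notification vector to the target's physical CPU with
     *    fixed delivery mode and physical destination.  Because the target
     *    runs in non-root mode, the hardware injects the posted interrupt
     *    without a VM exit on either side.                               */
    uint64_t icr = ((uint64_t)target->phys_apic_id << ICR_DEST_SHIFT)
                 | PIN_VECTOR;
    wrmsr_icr(icr);
}
```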
5 Seamless Live Migration

Directvisor supports seamless live migration of DirectVMs, which is one of the key features of server virtualization. Live migration of traditional VMs having pass-through device access is hard for several reasons. First, migration of low-level I/O device state requires the destination node to have an identical instance of the I/O device. Even with identical devices, some of the internal device-specific state may not be readable by the hypervisor, which makes it difficult to restore such state at the destination. Directvisor overcomes these problems by switching a DirectVM from direct hardware access (namely timer, IPI, and device access) to emulated/para-virtual access at the source machine before migration and switching the VM back to direct access at the destination after migration. Unlike existing approaches [23, 26, 41, 50–52] for VMs with pass-through I/O access, Directvisor does not require device-specific state capture and migration code, does not pause guest execution when switching between direct and emulated modes, and does not require the hypervisor to trust the guest OS. Next, we describe the details of network interface migration, followed by timer and IPI state migration.


5.1 Pass-through NIC Migration

Consider a bare-metal server equipped with a Single-Root I/O Virtualization [19] (SR-IOV) capable network device, which provides multiple network interfaces – one physical function (PF) and multiple virtual functions (VFs) – each of which has its own Ethernet address. Directvisor itself uses the physical function whereas each guest is assigned one virtual function as a pass-through network device. Additionally, each guest is also configured with a para-virtual network interface that is emulated via the Directvisor. The Guest Agent in each DirectVM configures a bonding driver [46] that combines the pass-through and para-virtual interfaces into one bonded interface, which the guest uses for communication.

Figure 4 demonstrates how a DirectVM's network traffic is switched between the pass-through and para-virtual network interfaces before and after the live migration. The bonding driver eliminates the need to have an identical network adapter at the source and the destination. During normal guest execution, the bonding driver uses the pass-through interface as the active slave and the para-virtual network device as the backup slave. To prepare for live migration, Directvisor asks the Guest Agent to switch the guest's network traffic to the para-virtual device. Directvisor then hot-unplugs the pass-through network interface from the DirectVM as part of temporarily disabling all direct hardware access for the guest. Directvisor then live migrates the DirectVM (via the QEMU manager process) along with the state of its para-virtual network device. Once the migration completes, the DirectVM resumes on the destination server and continues using the para-virtual network device. Concurrently, the Directvisor at the destination hot-plugs a new pass-through virtual function into the DirectVM, instructs the Guest Agent to bond the new pass-through device with the existing para-virtual device, and restores the new pass-through device as the active slave to carry the guest's network traffic.

Figure 4. Seamless live migration via bonding of pass-through and para-virtual network interfaces in a guest. From (a) to (b), the Guest Agent transfers the network traffic from the pass-through NIC to the para-virtual NIC before hot-unplugging the pass-through NIC. From (c) to (d), the Guest Agent transfers the network traffic back from the para-virtual NIC to the hot-plugged pass-through NIC.

The most important performance metric for live VM migration is the service disruption time. In a typical live migration [11, 25], a VM experiences a brief downtime when its execution state is transferred from source to destination. In addition, for VMs using pass-through devices, the hot-unplug and hot-plug operations during migration can introduce additional network downtimes. Directvisor eliminates these additional network downtimes by proactively controlling the network traffic switchover between the para-virtual and pass-through devices, so that network liveness is not affected by the hot-plug and hot-unplug operations.

In the KVM/QEMU implementation, we also found that the hot-plug operation forces the guest to pause execution, causing significant additional downtime. In particular, the hot-plug operation at the destination consists of three steps. (a) QEMU prepares a software object to represent the pass-through network device. (b) QEMU populates the software object with parameter values extracted from the PCIe configuration space of the pass-through device. (c) QEMU resets the software object to set up the base-address registers and interrupt forwarding information. These three steps occur in QEMU's main event loop, which forcibly pauses the running guest. To minimize this service disruption, we modified QEMU to perform the first and second steps at the destination during the transfer of the guest's dirtied pages. QEMU performs the third step after resuming the guest at the destination, thus completely eliminating this additional downtime.
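For concreteness, the following userspace sketch shows an equivalent of the bonding configuration described at the start of this subsection, using the Linux bonding driver's sysfs interface in active-backup mode with the fail-over MAC option. The interface names are placeholders, and the Guest Agent performs the corresponding setup from kernel space; operational details (interface up/down ordering) are omitted.

```c
#include <stdio.h>
#include <stdlib.h>

/* Write a single value to a sysfs attribute, aborting on error. */
static void sysfs_write(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");
    if (!f) { perror(path); exit(1); }
    fprintf(f, "%s\n", val);
    fclose(f);
}

int main(void)
{
    /* Create bond0 and configure it before enslaving interfaces. */
    sysfs_write("/sys/class/net/bonding_masters", "+bond0");
    sysfs_write("/sys/class/net/bond0/bonding/mode", "active-backup");
    /* Bond's MAC follows the active slave; gratuitous ARP on failover. */
    sysfs_write("/sys/class/net/bond0/bonding/fail_over_mac", "active");

    /* Placeholder names: ens1f0v0 is the pass-through VF (active slave),
     * ens3 is the para-virtual virtio interface (backup slave).          */
    sysfs_write("/sys/class/net/bond0/bonding/slaves", "+ens1f0v0");
    sysfs_write("/sys/class/net/bond0/bonding/slaves", "+ens3");
    sysfs_write("/sys/class/net/bond0/bonding/primary", "ens1f0v0");
    sysfs_write("/sys/class/net/bond0/bonding/active_slave", "ens1f0v0");
    return 0;
}
```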
5.2 Timer and IPI Hardware State Migration

Recall that for direct local timer and IPI access during normal operation, Directvisor configures a DirectVM to directly access (i.e., without VM exits) the posted-interrupt request bitmap, the multiplication and shift factors of the calibrated clock provided by the hypervisor, the initial count register (to set the timer), and the interrupt command register (to send an IPI). Before live migration, Directvisor disables direct timer and IPI access for the DirectVM by coordinating with the Guest Agent. Specifically, it stops sharing the posted-interrupt descriptor, enables VM exits for updates to the initial count and interrupt command registers, configures local vectors to deliver physical timer interrupts and IPIs instead of posted interrupts, and restores the guest's adjusted view of the multiplication and shift factors. Once the guest switches to using virtualized timers and IPIs, Directvisor can live migrate the guest to the destination. After the migration completes, Directvisor re-enables direct timer and IPI control for the DirectVM by restoring direct access to the above state. This switchover between direct and virtualized modes of access is largely transparent to the guest, other than the coordination between the Directvisor and the Guest Agent. Also, the switchover takes negligible time and is performed live as the guest continues running.

6 Isolation and Security Considerations

Directvisor does not trust guests or assume fully cooperative guests. Traditional EPT and IOMMU-based isolation remains unchanged for a DirectVM. Directvisor always controls the boot processor and its local APIC. Guests cannot access or modify other guests' local APIC state because physical CPUs are dedicated to individual DirectVMs. Actions performed by Guest Agents are local to the respective guests and do not affect other co-located guests.

Control over the timer and IPI hardware is important for a hypervisor to maintain its overall control over physical CPUs. Although Directvisor permits a DirectVM to control the timer and IPI hardware on its assigned CPUs, it can reclaim control over these resources whenever needed by simply switching the guest to emulated mode. One example is during VM live migration, discussed in Section 5. Before live migration, Directvisor reclaims control over the local timers and IPI hardware by enabling previously disabled VM exits and reconfiguring the APIC to deliver physical interrupts that cause VM exits. If for any reason a Guest Agent becomes non-responsive, Directvisor can still forcibly switch the corresponding DirectVM to emulated access mode without guest consent.

One potential isolation concern is a malicious guest sending unwanted direct IPIs to other guests or to the hypervisor by guessing their physical APIC IDs. This situation can be detected in a couple of ways and handled by turning off direct access for the offending guest. The first approach is that the IPI receiver could simply monitor the rate of incoming IPIs and inform the hypervisor of unusual IPI activity via the Guest Agent, which could then temporarily switch all guests to emulated IPI delivery mode to identify the offending guest. A second, potentially simpler, approach is to verify that each received IPI is from a valid sender, that is, one of the guest's own virtual CPUs, by recording the sender virtual CPU's identity in guest-private memory, which the receiver virtual CPU verifies before acting on an IPI; a sketch of this check follows.
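A minimal sketch of that sender-verification idea is shown below; the ledger layout and helper names are ours, not Directvisor's.

```c
#include <stdint.h>
#include <stdbool.h>

/*
 * Sketch of sender verification for direct IPIs: each sender records its
 * identity in guest-private memory before issuing the IPI, and the
 * receiver checks the record before acting on it.
 */
#define MAX_VCPUS 64

struct ipi_ledger {
    /* pending[r]: bitmap of guest VCPUs claiming to have sent an IPI to
     * receiver r; lives in guest-private memory.                         */
    volatile uint64_t pending[MAX_VCPUS];
};

static struct ipi_ledger ledger;

/* Sender side: record the sender before writing the interrupt command
 * register (see Section 4.2).                                            */
void ipi_record_sender(unsigned sender, unsigned receiver)
{
    __atomic_fetch_or(&ledger.pending[receiver], 1ULL << sender,
                      __ATOMIC_SEQ_CST);
}

/* Receiver side: returns true if at least one legitimate in-guest sender
 * is recorded; otherwise the IPI did not come from one of the guest's own
 * virtual CPUs and the Guest Agent can report it to the hypervisor.      */
bool ipi_verify_and_clear(unsigned receiver)
{
    uint64_t senders = __atomic_exchange_n(&ledger.pending[receiver], 0,
                                           __ATOMIC_SEQ_CST);
    return senders != 0;
}
```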
7 Other Use Cases

Besides live migration, here we discuss other illustrative use cases that would benefit from Directvisor.

Low-Latency Applications: Real-time applications, which have strict timing constraints, can benefit from Directvisor because a DirectVM processes interrupts with lower latency than traditional VMs. Specifically, the DirectVM (a) directly controls the local APIC hardware (through model-specific registers) without VM exits to schedule timer interrupt and IPI delivery, (b) directly receives all interrupts (device, timers, and IPIs) without VM exits, and (c) remains free of timing jitter due to resource contention because of its dedicated CPUs and memory. In Section 9.1, we show that a DirectVM provides close to bare-metal timer event and IPI processing latencies, which are an order of magnitude lower than with a traditional (Vanilla) VM. Likewise, Table 3 shows that the round-trip packet latency on a 40 Gbps network link is also close to bare-metal latency due to the use of posted interrupt delivery.

VM Introspection: VM introspection refers to a hypervisor's ability to transparently examine a VM's memory content for manageability functions, such as malware detection and performance monitoring. Here we demonstrate that VM introspection can be performed on a DirectVM. Figure 5(a) shows that we run a malware called Diamorphine [37] inside a DirectVM. The malware hides a target process "evil" which simply burns CPU cycles. Figure 5(b) shows that Directvisor uses the Volatility [22] introspection tool to examine the VM's memory and identifies the process "evil" from the guest process list.

Figure 5. The Volatility introspection tool in Directvisor can identify a hidden rootkit "evil" in a DirectVM. (a) Guest view. (b) Directvisor view.

VM-based Container Runtimes: VM-based container runtimes such as Kata Containers [21], gVisor [34], and Firecracker [40] allow customers to run traditional containers such as Docker [28] using system VMs on multi-tenant cloud platforms. However, the stronger isolation of VMs comes at the expense of giving up bare-metal performance for containers. Although these VM-based container runtimes are optimized to boot quickly, the underlying traditional hypervisors reintroduce all the runtime virtualization overheads, such as VM exits and I/O device emulation. Furthermore, these VM-based runtimes do not support seamless live migration of containers because they also reintroduce many semantic dependencies between the container and the host OS which are difficult to decouple. Directvisor can be used to run such VM-based container runtimes using DirectVMs to achieve bare-metal performance like native containers, while being fully live migratable and manageable like VMs.

8 Implementation-specific Details

Directvisor is implemented by modifying the KVM/QEMU virtualization platform (Linux kernel 4.10.1 and QEMU 2.9.1).

Small Footprint: During runtime, Directvisor maintains a small execution footprint, currently one (shareable) CPU core and around 100 MB of RAM, with room for further reductions.

Guest's Code Changes: A special Linux kernel module (the Guest Agent) is installed in an unmodified Linux guest. The module helps to enable or disable the direct timer and IPI delivery. The guest's kernel module has 704 lines of code. Of these, 177 and 193 lines support direct timer and IPI delivery, respectively. The rest deals with managing HLT-triggered VM exits, direct writes to the initial count and interrupt command registers, initialization and termination of the bonding driver, and other supporting functions.

Virtual-to-Physical CPU Assignment: Directvisor keeps the boot CPU (CPU 0) for itself and dedicates the rest to DirectVMs. Each guest virtual CPU is pinned to a dedicated physical CPU core. Directvisor exports the local APIC IDs of a guest's physical CPUs to its Guest Agent, which uses these IDs for direct IPI delivery among guest virtual CPUs.

Disabling HLT Exiting and Updating the MSR Bitmap: To disable HLT-triggered VM exits, Directvisor clears the HLT-exiting bit of the processor-based VM-execution controls in the VM control structure. Directvisor also disables VM exits due to writes to the initial count register and interrupt command register of the local APIC.

Guest Bonding Driver: The Guest Agent configures the bonding driver to use the pass-through network interface and the para-virtual interface as the active slave and backup slave, respectively. Linux's bonding driver provides a fail-over option to change the MAC address of the bonded interface and to broadcast gratuitous ARP packets to advertise its new MAC address when its active slave fails. The Guest Agent configures the bonding driver with this option to cut down the network downtime due to the hot-unplug operations.

9 Performance Evaluation

We evaluate the performance of Directvisor on two physical servers, each with a 10-core Intel Xeon E5 v4 2.2 GHz CPU, 32 GB memory, and two network interfaces – a 40 Gbps Mellanox ConnectX-3 Pro network card and a 1 Gbps Intel Ethernet Connection I217-LM network card. Both the hypervisor and the guest use Linux kernel 4.10.1. Directvisor is implemented by modifying KVM and QEMU 2.9.1. Directvisor configures the guest with 1–9 virtual CPUs and 28 GB of RAM, besides a para-virtual and one pass-through network interface. The Guest Agent configures the bonding driver in active-backup mode with the fail-over MAC option.

We used a Bare-metal server configuration and five guest configurations to compare the effectiveness of Directvisor. Table 1 compares the five guest configurations in terms of their VM exit behaviors.
• Bare-metal: a machine without virtualization.
• VHOST guest: uses a para-virtual network device with VHOST as the backend.
• VFIO/VANILLA guest: uses a pass-through network device to directly handle network I/O and interrupts.
• OPTI guest: VFIO + disabled VM exits due to the HLT instruction and the VMX preemption timer.
• DTID guest: OPTI + direct timer interrupt delivery.
• DID guest: DTID + direct IPI delivery.

In terms of measurement workloads and tools, we measured network throughput using iperf 2.0.5 [32], network latency using ping [36], and CPU utilization using atopsar 2.3.0 [4]. We also used Perf 4.10.1 [38] to measure the number of VM exits. To evaluate the direct timer and IPI latencies, we developed an in-kernel Cyclictest benchmark inspired by the similar userspace Cyclictest [16] benchmark. We further investigated CPU, memory, and cache performance using the PARSEC benchmark [8] (Princeton Application Repository for Shared-Memory Computers) for various parallel workloads and the SPEC CPU 2017 benchmark [9] for CPU-intensive workloads. Finally, we measured the network downtime of our seamless VM live migration using an in-house UDP packet generator and tcpdump [35].

9.1 Direct Interrupt Delivery Efficiency

We first compare both the timer and IPI latency across various guest configurations. The timer interrupt latency is the timing difference between the expected and actual wakeup time when a thread sets a timer; we ran this experiment with a sleep time of 200 µs for 10 million iterations. Unidirectional IPI latency is the difference between the IPI sending and receiving times. To measure IPI latencies, we registered our own special IPI vector in the host or guest kernels. When a CPU received the special IPI, it recorded the timestamp. We sent IPIs between two CPUs with an interval of 200 µs for 10 million iterations.
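For reference, the sketch below shows a userspace cyclictest-style loop that captures the same wakeup-latency metric (expected versus actual wakeup time of a periodic 200 µs sleep). The paper's benchmark is an in-kernel variant, so this is only an illustration of the measurement, not the benchmark itself.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <time.h>

#define SLEEP_NS 200000LL       /* 200 us period                         */
#define ITERS    1000000LL      /* reduced from 10M for a quick run      */

static int64_t ts_ns(const struct timespec *t)
{
    return (int64_t)t->tv_sec * 1000000000LL + t->tv_nsec;
}

int main(void)
{
    struct timespec next, now;
    int64_t worst = 0, sum = 0;

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (long long i = 0; i < ITERS; i++) {
        /* Arm the next absolute wakeup 200 us in the future. */
        next.tv_nsec += SLEEP_NS;
        while (next.tv_nsec >= 1000000000L) {
            next.tv_nsec -= 1000000000L;
            next.tv_sec++;
        }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        clock_gettime(CLOCK_MONOTONIC, &now);

        int64_t lat = ts_ns(&now) - ts_ns(&next);   /* wakeup latency */
        sum += lat;
        if (lat > worst)
            worst = lat;
    }
    printf("avg %.2f us, worst %.2f us\n",
           sum / (double)ITERS / 1000.0, worst / 1000.0);
    return 0;
}
```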

Guest Configuration | EPT Fault | Access NIC | NIC Interrupt | HLT Instruction | VMX Preemption Timer | Timer Interrupt | Inter-Processor Interrupt
VHOST               | Yes       | Yes        | Yes           | Yes             | Yes                  | Yes             | Yes
VFIO/VANILLA        | No        | No         | No            | Yes             | Yes                  | Yes             | Yes
OPTI                | No        | No         | No            | No              | No                   | Yes             | Yes
DTID                | No        | No         | No            | No              | No                   | No              | Yes
DID                 | No        | No         | No            | No              | No                   | No              | No

Table 1. Different virtualization configurations for a guest: whether each event causes a VM exit.

Figure 6. Comparison of timer interrupt latencies.

Figure 7. Comparison of IPI latencies.

Figure 6 shows that the median timer-interrupt latencies for the Bare-metal, DID, DTID, and VANILLA guests were 1.51, 2.03, 2.04, and 9.20 µs, respectively. Both the DTID and DID guests came close to the Bare-metal performance, with an additional 0.52 µs latency. Figure 7 shows that the median IPI latencies for the Bare-metal, DID, DTID, and VANILLA guests were 1.80, 2.89, 3.33, and 9.35 µs, respectively. The DID guest had about 1 µs extra latency compared to the Bare-metal case.

Since both the network and timer interrupt delivery use the posted-interrupt mechanism, network interrupts can trigger spurious (early) timer interrupts, which are filtered out by the Guest Agent. The effect of spurious timer interrupts is particularly notable when a DTID guest runs a network-intensive workload over the InfiniBand link. In our experiment, CPU 0 processed both the network and timer interrupts whereas CPU 1 ran iperf [32] threads. The timers fired at a frequency of 250 Hz. As shown in Table 2, the interrupt profile of the DTID guest matched the Bare-metal profile. If the DTID guest did not use the filtering mechanism, its CPU 0 would have experienced an additional 100,000 or more timer interrupts. Although the DTID guest incurred an extra filtering overhead for every network interrupt, this overhead had negligible impact on its network throughput.

Configuration | Bandwidth (Gbps) | CPU Core | Network Intr. (Hz) | Timer Intr. (Hz)
Bare-metal    | 37.39            | 0        | 118592             | 256
              |                  | 1        | 0                  | 348
DTID Guest    | 37.35            | 0        | 110856             | 255
              |                  | 1        | 0                  | 348

Table 2. Removal of spurious timer interrupts when the DTID guest performs the network-intensive workload.

9.2 Live Migration Performance

Next, we show that Directvisor can live migrate a DirectVM while maintaining liveness comparable to traditional pre-copy migration for regular VMs. Figure 8 compares the number of packets lost for four guest configurations where the guest receives 1 KB UDP packets at a rate of around 30K packets/sec while being migrated from a source machine to a destination machine over the 1 Gbps interface. We chose 30K packets/sec because higher incoming packet rates adversely interfered with VM migration; both the para-virtual network interface and live migration are managed by the same QEMU process, causing CPU contention within QEMU. The four guest configurations being compared are as follows:
• Vhost-only: in which the guest uses only a para-virtual (Vhost) network interface.
• Reactive Switchover: in which the guest uses a bonding driver. The active pass-through slave is hot-unplugged from the guest at the source machine to reactively trigger the bonding driver to switch traffic to its para-virtual slave interface. Likewise, the pass-through slave is hot-plugged back at the destination to reactively trigger a switch back at the destination.
• Managed Switchover: in which the incoming traffic is proactively switched over at the source before the hot-unplug operation and switched back at the destination after the hot-plug operation.
• Optimized Switchover: which improves upon Managed Switchover by overlapping part of the hot-plug operation at the destination with the memory transfer from the source.

Figure 8a for the Vhost-only configuration shows that, for traditional live migration, packets are lost mainly during the stop-and-copy phase when the VM's execution state is transferred from source to destination, besides minor losses during the preparation at the source node. Our goal is to match this liveness when migrating a DirectVM.

Figure 8b for the Reactive Switchover configuration shows that there are two additional events that induce significant packet loss besides the stop-and-copy phase: first, when the pass-through device is hot-unplugged at the source and, second, when it is hot-plugged at the destination. This is expected because the bonding driver takes non-negligible time to detect the network state change and switch over the traffic.

Figure 8. Number of lost UDP packets during live VM migration while the guest receives UDP packets: (a) Vhost-only, (b) Reactive Switchover, (c) Managed Switchover, (d) Optimized Switchover.

Figure 8c for the Managed Switchover configuration shows that switching over guest traffic in a managed fashion completely eliminates packet losses at the source during the hot-unplug operation. However, packet losses remain at the destination during the hot-plug operation. We tracked down the cause of the latter packet losses to the fact that all guest virtual CPUs are paused by QEMU during the entirety of the hot-plug operation.

Figure 8d for the Optimized Switchover configuration shows that by splitting the relevant portions of the hot-plug operation and overlapping them with the memory-copying phase of live migration, we can avoid pausing the guest again once it resumes at the destination. Thus, all packet losses during the hot-plug phase are eliminated and the liveness matches traditional live migration of emulated guests.

Finally, we did not observe any missed timer interrupts when disabling and re-enabling direct hardware access during migration. It took 450 µs to transform a VM from the VANILLA to the DID configuration, while it took 420 µs to transform the guest from DID back to the VANILLA configuration.

9.3 Network I/O Performance

We now show that, with all the optimizations applied in the DID configuration, a DirectVM can match the network I/O performance of a Bare-metal server in terms of network throughput, latency, and CPU utilization, while eliminating almost all VM exits.

Throughput and Latency: In this experiment, we used two CPUs to try to saturate a 40 Gbps InfiniBand link for all configurations. To minimize CPU contention, one core handled network interrupts while another handled softirq requests and ran the network workload. We used a third core to monitor CPU utilization, which did not affect the network performance. For the VHOST configuration, we used one additional core to run the VHOST worker thread in the host kernel. We experimented with two workloads. First, we used the iperf [32] tool to generate two uni-directional TCP streams to try to saturate the InfiniBand link. We also used the ping [36] tool to exchange back-to-back ICMP echo-response messages between two machines.

Table 3 shows that all six configurations were able to saturate the bandwidth of outgoing traffic over InfiniBand. All but VHOST were able to saturate the bandwidth of incoming traffic over InfiniBand. The VHOST configuration needed to use three CPU cores to reach 34.36 Gbps for the incoming traffic. The round-trip delay of 24 µs in the VHOST configuration was approximately twice the latency of the other five configurations due to numerous VM exits.

Guest Configuration | Host User % | Host System % | Guest User % | Guest System % | Active CPUs | Outbound (Gbps) | Inbound (Gbps) | Round-Trip Delay (µs)
Bare-metal          | 0.68        | 90.59         | –            | –              | 2           | 37.39           | 37.60          | 11
VHOST               | 79.32       | 172.74        | 0.31         | 35.19          | 3           | 37.61           | 34.36          | 24
VFIO                | 102.45      | 91.90         | 0.36         | 44.66          | 2           | 37.59           | 37.60          | 12
OPTI                | 199.58      | 0.42          | 0.85         | 99.97          | 2           | 37.55           | 37.58          | 12
DTID                | 199.58      | 0.42          | 0.89         | 90.55          | 2           | 37.59           | 37.60          | 12
DID                 | 199.58      | 0.42          | 0.86         | 90.74          | 2           | 37.56           | 37.59          | 13

Table 3. Network throughput, latency, and CPU utilization for a network-intensive guest over a 40 Gbps InfiniBand link.

Guest Configuration | EPT Fault | Physical Interrupt | HLT Instruction | PAUSE Instruction | Preemption Timer | MSR Write | Total
VHOST               | 13652     | 98                 | 10454           | 104               | 145              | 1065      | 25518
VFIO                | 0         | 126                | 36672           | 19                | 151              | 1421      | 38389
OPTI                | 0         | 547                | 0               | 0                 | 0                | 1446      | 1993
DTID                | 0         | 3                  | 0               | 0                 | 0                | 1147      | 1150
DID                 | 0         | 3                  | 0               | 0                 | 0                | 0         | 3

Table 4. VM exits per second per virtual CPU (by exit reason) when the guest sends TCP traffic over a 40 Gbps InfiniBand link.

CPU Utilization: While all configurations except VHOST are able to match the Bare-metal throughput and latency, their CPU utilizations differ in important ways. Table 3 shows the CPU utilization measured from both the host's and the guest's perspective when the guest saturated the InfiniBand link. The CPU utilization is computed as a simple sum of the corresponding individual utilization values across all active CPUs, leading to some values greater than 100%. Host User and Host System represent the CPU utilization in user mode and system mode, respectively, as measured in the host (hypervisor). Guest User and Guest System represent the CPU utilization in user and system modes, respectively, as measured in the guest.

When interpreting the Host User value, one should remember that it includes the Guest User and Guest System values as well; from the hypervisor's perspective, any CPU time spent by the guest in non-root mode is counted against the QEMU process's time usage. Thus, for the OPTI, DTID, and DID configurations, which disable HLT-triggered VM exits, the Host User time is misleadingly inflated because any CPU idle time in non-root (guest) mode appears as busy time in Host User mode.

From the table, we observe that the DTID and DID guests came very close to matching the Bare-metal CPU utilization, as expected. The OPTI guest consumed an additional 10% Guest System CPU time. The VFIO guest consumed less Guest System time but more Host System CPU time due to HLT-triggered VM exits. The VHOST guest spent a majority of its time in Host System since network I/O was processed as part of the VHOST worker thread.

VM Exits: We also measured the VM exits for the two active virtual CPUs using the perf [38] tool while the guest transmitted TCP traffic over InfiniBand. Table 4 shows that in the DTID configuration, we eliminated timer-related VM exits by enabling direct timer-interrupt delivery. However, the DTID guest still experiences VM exits due to IPIs. Such a VM exit happens when a DTID guest receives IPIs from the Directvisor or other virtual CPUs. In the DID configuration, we used the posted-interrupt mechanism to deliver direct IPIs and eliminate all VM exits, achieving close to bare-metal performance.

Among the other configurations, the VHOST guest suffered the most VM exits, mainly due to HLT-triggered VM exits and EPT faults. In contrast, the VFIO guest did not trigger any VM exits when accessing its assigned network device or receiving a network interrupt. It still experienced more HLT-triggered VM exits than the VHOST case. This behavior depended on the amount of time that the guest idled on its CPUs. For the VHOST guest, the hypervisor used the guest's CPU time to handle EPT faults and emulate HLT instructions. On the other hand, the VFIO guest actually had a longer idle time than the VHOST guest: when the VFIO guest occupied its CPUs longer and waited for its next network I/O, it idled more often and issued HLT.

The OPTI configuration further eliminated HLT-triggered VM exits but experienced 547 VM exits per second per virtual CPU due to local interrupts, most of which can be attributed to timer interrupts. In our test, both the hypervisor and the guest set a timer resolution of 4 ms, or 250 Hz, leading to at most 500 hardware timer interrupts every second. In addition, iperf [32] threads set up their own timers, and rescheduling IPIs woke up the iperf threads. In the VFIO configuration, the rate of VM exits per virtual CPU due to local interrupts was substantially lower than what we observed in the OPTI configuration. Because of the large number of HLT-related VM exits, a timer interrupt could arrive at a CPU that was under the hypervisor's control; such an interrupt did not contribute an additional VM exit.

Figure 9. Comparison of integer computation throughput of the SPEC CPU 2017 benchmark [9] (workloads: Perl, GO, GCC, Chess, Sudoku, Route Planning, XML Conversion, Data Compression, Video Compression, Computer Network).

Figure 10. Comparison of floating-point computation speed of the SPEC CPU 2017 benchmark [9] (workloads: Relativity, Fluid Dynamics, Electromagnetics, Explosion Modeling, Weather Forecasting, Image Manipulation, Molecular Dynamics, Atmosphere Modeling, Regional Ocean Modeling, Climate-Level Ocean Modeling).

Figure 11. Comparison of the PARSEC benchmark [8] execution times (workloads: Vips, Dedup, Ferret, Raytrace, Facesim, Canneal, Bodytrack, Freqmine, Swaptions, Blackscholes, Fluidanimate, Streamcluster).

9.4 Parallel Application Performance

We now evaluate how closely Directvisor matches the performance of the Bare-metal configuration when running parallel applications. We use two benchmarks for evaluation, namely SPEC CPU 2017 [9] and PARSEC [8]. SPEC CPU evaluates the performance of various CPU-intensive workloads. PARSEC evaluates various parallel workloads having diverse locality, synchronization, and memory access profiles. In these experiments, the DirectVM had eight dedicated cores and 27 GB RAM, whereas Directvisor had the remaining two cores and 5 GB RAM. The topology of the L1/L2/L3 caches was exposed to the guests. The benchmark programs limited the number of active CPUs to eight. By default, QEMU/KVM allocated 2 MB huge pages for the guests using madvise() to reduce TLB miss overhead.

For SPEC CPU 2017, we ran two suites: integer rate and floating-point speed. The integer rate suite ran eight concurrent copies of the same program and measured the work done per unit time. The floating-point speed suite ran eight OpenMP threads [17] for each program and measured the amount of time to complete the work. Figures 9 and 10 show the performance of the integer-rate and floating-point suites running in guests in comparison with the Bare-metal case. In particular, the DID-THP configuration disabled the support of transparent huge pages (THP) for the DID guest. For the integer computation throughput, the DID guest's performance came within two standard deviations of the Bare-metal performance. One interesting case involved the computer network simulation: the DID guest experienced a 1.2-score reduction, or a 7.95% throughput reduction, compared to the Bare-metal case. The DID guest without huge-page support had a worse throughput, with a 3.2-score reduction, or a 21.19% throughput reduction. However, enabling huge-page support further closed the performance gap for GCC, XML conversion, and data compression. For the floating-point workloads, the DID guest matched bare-metal performance.

Similarly, we ran the PARSEC benchmark [8] to study how well the DID guest compared to the Bare-metal case. We ran eight POSIX threads for each workload and measured the parallel execution time. Figure 11 shows the performance. The DID guest's performance was close to the Bare-metal's; the worst case, for Canneal, was about a 5.85% slowdown compared to Bare-metal. Without hugepage support, the DID guest experienced 7.35% and 19.52% slowdowns for Streamcluster and Canneal, respectively. There were two notable observations. First, enabling huge-page support helped reduce the performance difference between Bare-metal and the DID guest to less than 8%. We speculate that the remaining gap is due to sparse memory access patterns, resulting in more TLB misses, which are more expensive for virtual machines. Second, sometimes the DID guest performed surprisingly better than the Bare-metal case. The reason could be that when the processor started to use the TLB for 2 MB huge pages, it saved the cost of walking the EPT. In comparison, the Bare-metal case still used 4 KB pages, which have a much lower TLB coverage than 2 MB huge pages.
The reason could be that traffic before and after the hot-unplug and hot-plug opera- when the processor started to use the TLB for 2 MB huge tions respectively besides addressing implementation-level pages, it saved the cost of walking the EPT. In comparison, inefficiences with the hot-plug operation in QEMU. Addi- the Bare-metal case still used 4 KB pages which had a much tionally, Directvisor also supports live migration of guests lower TLB coverage than using the 2 MB huge pages. having direct control over local APIC timers and IPIs, which earlier approaches do not address. 10 Related Work Reducing virtualization overheads: A number of tech- To mitigate I/O-related virtualization costs, customers can niques have been proposed to reduce virtualization over- provision VMs with direct device assignment [2, 31]. How- heads in interrupt delivery by eliminating VM Exits to the ever, doing so has meant sacrificing the ability to live mi- hypervisor. ELI [3] and DID [48] presented techniques for grate VMs for load balancing and fault-tolerance due to the direct delivery of I/O interrupts to the VM. Intel VT-d [2, 31] difficulty of migrating the state of physical I/O devices. Di- supports a posted interrupt delivery mechanism [39]. How- rectvisor supports close-to-bare-metal performance while ever, these techniques do not fully eliminate a hypervisor’s retaining the benefits of virtualized servers. role in the delivery of device interrupt. Specifically, the hy- Live migration with direct device access: On-demand pervisor must intercept and deliver device interrupts (us- virtualization [26] presented an approach to insert a hyper- ing either virtual interrupts or IPIs) when the target VM’s visor that deprivileges the OS at the source server just before VCPU is not scheduled on the CPU that receives the device migration and re-privileges it at the destination after migra- interrupt. Further, idle guest virtual CPUs waiting for I/O tion. However, this approach requires the hypervisor to fully completion will trap to the hypervisor, where a halt-polling trust the OS being migrated, the OS having full access to optimizations can inadvertently end up increasing CPU uti- the hypervisor’s memory and code. Such lack of isolation lization. In contrast, Directvisor eliminates these overheads rules out the use of many hypervisor-level cloud services in device interrupt delivery by combining the use of Intel that require strong isolation, such as VM introspection and VT-d with optimizations to dedicate physical CPU cores to SLA enforcement. In contrast, a DirectVM cannot access the guest and disable VM exits when VCPUs are idle. or modify Directvisor’s code or memory. Recent develop- Finally, the key distinction of Directvisor lies in direct ments [23, 52] have proposed frameworks to support live local APIC timer and IPI access by the guest. Existing tech- migration of VMs using pass-through I/O devices. These niques do not eliminate hypervisor overheads when a VM approaches require guest device drivers to save and recon- interacts with its local APIC timer hardware and nor do they struct device-specific I/O states for VM migration. Similar support direct delivery of timer interrupts and IPIs between approaches [41, 50] allow a hypervisor to extract internal virtual CPUs. Directvisor provides these features for a Di- state of virtual functions from SR-IOV devices by monitoring rectVM through a novel use of posted interrupt mechanism and capturing writes to device-specific registers. 
Reducing virtualization overheads: A number of techniques have been proposed to reduce virtualization overheads in interrupt delivery by eliminating VM exits to the hypervisor. ELI [3] and DID [48] presented techniques for direct delivery of I/O interrupts to the VM. Intel VT-d [2, 31] supports a posted interrupt delivery mechanism [39]. However, these techniques do not fully eliminate the hypervisor's role in the delivery of device interrupts. Specifically, the hypervisor must intercept and deliver device interrupts (using either virtual interrupts or IPIs) when the target VM's VCPU is not scheduled on the CPU that receives the device interrupt. Further, idle guest virtual CPUs waiting for I/O completion will trap to the hypervisor, where halt-polling optimizations can inadvertently end up increasing CPU utilization. In contrast, Directvisor eliminates these overheads in device interrupt delivery by combining the use of Intel VT-d with optimizations that dedicate physical CPU cores to the guest and disable VM exits when VCPUs are idle.

Finally, the key distinction of Directvisor lies in direct local APIC timer and IPI access by the guest. Existing techniques do not eliminate hypervisor overheads when a VM interacts with its local APIC timer hardware, nor do they support direct delivery of timer interrupts and IPIs between virtual CPUs. Directvisor provides these features for a DirectVM through a novel use of the posted interrupt mechanism and without VM exit overheads.

11 Conclusion
We presented Directvisor, which provides virtualization support for bare-metal cloud platforms. Directvisor enables near-native performance for DirectVMs while retaining the manageability and isolation benefits of traditional clouds. Directvisor goes beyond traditional direct-assigned (pass-through) I/O to enable VMs to directly control and receive hardware timer interrupts and inter-processor interrupts (IPIs), besides eliminating most VM exits. At the same time, Directvisor retains the ability to perform seamless live migration of a DirectVM, besides providing other manageability functions such as VM introspection.

Acknowledgments
This work is supported in part by the Industrial Technology Research Institute, Taiwan, and the National Science Foundation, USA.

References
[1] A. Kivity, Y. Kamay, K. Laor, U. Lublin, and A. Liguori. 2007. KVM: The Linux Virtual Machine Monitor. In Proceedings of the Ottawa Linux Symposium. 225–230.
[2] Darren Abramson, Jeff Jackson, Sridhar Muthrasanallur, Gil Neiger, Greg Regnier, Rajesh Sankaran, Ioannis Schoinas, Rich Uhlig, Balaji Vembu, and John Wiegert. 2006. Intel Virtualization Technology for Directed I/O. Intel Technology Journal 10, 3 (2006).
[3] Nadav Amit, Abel Gordon, Nadav Har'El, Muli Ben-Yehuda, Alex Landau, Assaf Schuster, and Dan Tsafrir. 2015. Bare-metal Performance for Virtual Machines with Exitless Interrupts. Commun. ACM 59, 1 (Dec. 2015), 108–116. https://doi.org/10.1145/2845648
[4] Atoptool.nl. 2019. atoptool: Various System Activity Reports. https://www.atoptool.nl/downloadatop.php
[5] Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield. 2003. Xen and the Art of Virtualization. SIGOPS Operating Systems Review 37, 5 (2003), 164–177.
[6] Fabrice Bellard. 2005. QEMU, A Fast and Portable Dynamic Translator. In USENIX Annual Technical Conference, FREENIX Track, Vol. 41. 46.
[7] Muli Ben-Yehuda, Jon Mason, Jimi Xenidis, Orran Krieger, Leendert Van Doorn, Jun Nakajima, Asit Mallick, and Elsie Wahlig. 2006. Utilizing IOMMUs for Virtualization in Linux and Xen. In OLS '06: The 2006 Ottawa Linux Symposium. Citeseer, 71–86.
[8] Christian Bienia, Sanjeev Kumar, Jaswinder Pal Singh, and Kai Li. 2008. The PARSEC Benchmark Suite: Characterization and Architectural Implications. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques (PACT '08). ACM, New York, NY, USA, 72–81. https://doi.org/10.1145/1454115.1454128
[9] James Bucek, Klaus-Dieter Lange, and Jóakim v. Kistowski. 2018. SPEC CPU2017: Next-Generation Compute Benchmark. In Companion of the 2018 ACM/SPEC International Conference on Performance Engineering (ICPE '18). ACM, New York, NY, USA, 41–42. https://doi.org/10.1145/3185768.3185771
[10] H. Chen, R. Chen, F. Zhang, B. Zang, and P.C. Yew. 2006. Live Updating Operating Systems Using Virtualization. In Proc. of ACM VEE. Ottawa, Canada.
[11] Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hansen, Eric Jul, Christian Limpach, Ian Pratt, and Andrew Warfield. 2005. Live Migration of Virtual Machines. In Proceedings of the 2nd Conference on Symposium on Networked Systems Design & Implementation - Volume 2 (NSDI '05). USENIX Association, Berkeley, CA, USA, 273–286. http://dl.acm.org/citation.cfm?id=1251203.1251223
[12] Oracle Corporation. 2020. Oracle Cloud. https://cloud.oracle.com/compute/bare-metal/features
[13] International Business Machines Corporation. 2020. IBM SoftLayer. http://www.softlayer.com
[14] Microsoft Corporation. 2020. Microsoft Azure. https://azure.microsoft.com/en-us/
[15] Brendan Cully, Geoffrey Lefebvre, Dutch Meyer, Mike Feeley, Norm Hutchinson, and Andrew Warfield. 2008. Remus: High Availability via Asynchronous Virtual Machine Replication. In Proc. of Networked Systems Design and Implementation.
[16] cyclictest. [n. d.]. Cyclictest: Finding Realtime Kernel Latency. http://people.redhat.com/williams/latency-howto/rt-latency-howto.txt
[17] Leonardo Dagum and Ramesh Menon. 1998. OpenMP: An Industry-Standard API for Shared-Memory Programming. IEEE Comput. Sci. Eng. 5, 1 (Jan. 1998), 46–55. https://doi.org/10.1109/99.660313
[18] Artem Dinaburg, Paul Royal, Monirul Sharif, and Wenke Lee. 2008. Ether: Malware Analysis via Hardware Virtualization Extensions. In 15th ACM Conference on Computer and Communications Security (CCS). 51–62.
[19] Yaozu Dong, Zhao Yu, and Greg Rose. 2008. SR-IOV Networking in Xen: Architecture, Design and Implementation. In Workshop on I/O Virtualization, Vol. 2.
[20] George W. Dunlap, Samuel T. King, Sukru Cinar, Murtaza A. Basrai, and Peter M. Chen. 2002. ReVirt: Enabling Intrusion Analysis through Virtual-Machine Logging and Replay. In Proc. of USENIX OSDI. Boston, MA.
[21] OpenStack Foundation. 2020. Kata Containers. https://katacontainers.io
[22] Volatility Foundation. 2019. Volatility Framework - Volatile Memory Extraction Utility Framework. https://github.com/volatilityfoundation/volatility.git
[23] T. Fukai, T. Shinagawa, and K. Kato. 2018. Live Migration in Bare-metal Clouds. IEEE Transactions on Cloud Computing (2018). https://doi.org/10.1109/TCC.2018.2848981
[24] Tal Garfinkel and Mendel Rosenblum. 2003. A Virtual Machine Introspection Based Architecture for Intrusion Detection. In Network & Distributed Systems Security Symposium.
[25] Michael Hines and Kartik Gopalan. 2009. Post-Copy Based Live Virtual Machine Migration Using Adaptive Pre-Paging and Dynamic Self-Ballooning. In Proceedings of ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE), Washington, DC.
[26] Jaeseong Im, Jongyul Kim, Jonguk Kim, Seongwook Jin, and Seungryoul Maeng. 2017. On-demand Virtualization for Live Migration in Bare Metal Cloud. In Proceedings of the Symposium on Cloud Computing (SOCC). https://doi.org/10.1145/3127479.3129254
[27] Amazon.com Inc. 2020. Amazon Elastic Compute Cloud. https://aws.amazon.com/ec2/
[28] Docker Inc. 2020. Docker. https://www.docker.com
[29] VMWare Inc. 2008. Architecture of VMWare ESXi. https://www.vmware.com/techpapers/2007/architecture-of-vmware-esxi-1009.html
[30] Wikimedia Foundation Inc. 2020. Intel Virtualization (VT-x). https://en.wikipedia.org/wiki/X86_virtualization#Intel_virtualization
[31] Intel Corporation. 2017. Intel Virtualization for Directed I/O – Architecture Specification. Intel Corporation.
[32] Jon Dugan, Seth Elliott, Bruce A. Mah, Jeff Poskanzer, and Kaustubh Prabhu. 2020. iPerf: The Network Bandwidth Measurement Tool. https://iperf.fr/
[33] Google LLC. 2020. Google Cloud Platform. https://cloud.google.com/
[34] Google LLC. 2020. gVisor. https://gvisor.dev
[35] Canonical Ltd. 2014. tcpdump: Dump Traffic on the Network. http://manpages.ubuntu.com/manpages/xenial/man8/tcpdump.8.html
[36] Canonical Ltd. 2015. iputils: Utility Tools for the Linux Networking. https://launchpad.net/ubuntu/+source/iputils
[37] Victor Ramos Mello. 2019. Diamorphine: LKM Rootkit for Linux Kernels 2.6.x/3.x/4.x. https://github.com/m0nad/Diamorphine.git
[38] Ingo Molnar. 2017. perf: Linux Profiling with Performance Counters. https://github.com/torvalds/linux/tree/master/tools/perf/Documentation
[39] Jun Nakajima. 2012. Enabling Optimized Interrupt/APIC Virtualization in KVM. In KVM Forum, Barcelona, Spain.
[40] Gal Niv. 2019. Amazon Firecracker: Isolating Serverless Containers and Functions. https://blog.aquasec.com/amazon-firecracker-serverless-container-security
[41] Zhenhao Pan, Yaozu Dong, Yu Chen, Lei Zhang, and Zhijiao Zhang. 2012. CompSC: Live Migration with Pass-through Devices. In ACM SIGPLAN Notices, Vol. 47. ACM, 109–120.
[42] Ryan Riley, Xuxian Jiang, and Dongyan Xu. 2008. Guest-transparent Prevention of Kernel Rootkits with VMM-based Memory Shadowing. In Recent Advances in Intrusion Detection. 1–20.
[43] Rusty Russell. 2008. Virtio: Towards a De-facto Standard for Virtual I/O Devices. ACM SIGOPS Operating Systems Review 42, 5 (2008), 95–103.
[44] Sahil Suneja, Canturk Isci, Vasanth Bala, Eyal de Lara, and Todd Mummert. 2014. Non-intrusive, Out-of-band and Out-of-the-box Systems Monitoring in the Cloud. In SIGMETRICS '14, Austin, TX, USA.

[45] The kernel development community. 2020. VFIO - Virtual Function I/O. https://www.kernel.org/doc/Documentation/vfio.txt
[46] Thomas Davis, Willy Tarreau, Constantine Gavrilov, Chad N. Tindel, Janice Girouard, Jay Vosburgh, and Mitch Williams. 2011. Linux Ethernet Bonding Driver HOWTO. https://www.kernel.org/doc/Documentation/networking/bonding.txt
[47] Michael S. Tsirkin. 2009. vhost-net: A Kernel-level Virtio Server. https://lwn.net/Articles/346267/
[48] Cheng-Chun Tu, Michael Ferdman, Chao-tung Lee, and Tzi-cker Chiueh. 2015. A Comprehensive Implementation and Evaluation of Direct Interrupt Delivery. In Proceedings of the 11th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE '15). ACM, New York, NY, USA, 1–15. https://doi.org/10.1145/2731186.2731189
[49] R. Uhlig, G. Neiger, D. Rodgers, A.L. Santoni, F.C.M. Martins, A.V. Anderson, S.M. Bennett, A. Kagi, F.H. Leung, and L. Smith. 2005. Intel Virtualization Technology. Computer 38, 5 (2005), 48–56.
[50] Xin Xu and Bhavesh Davda. 2016. SRVM: Hypervisor Support for Live Migration with Passthrough SR-IOV Network Devices. In ACM SIGPLAN Notices, Vol. 51. ACM, 65–77.
[51] Edwin Zhai, Gregory D. Cummings, and Yaozu Dong. 2008. Live Migration with Pass-through Device for Linux VM. In Proceedings of the Ottawa Linux Symposium. 261–267.
[52] Yan Zhao. 2019. QEMU VFIO Live Migration. https://patchwork.kernel.org/cover/10819495/