The Design, Implementation, and Evaluation of Software and Architectural Support for ARM Virtualization
The Design, Implementation, and Evaluation of Software and Architectural Support for ARM Virtualization

Christoffer Dall

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences

COLUMBIA UNIVERSITY
2018

© 2017 Christoffer Dall
All rights reserved

ABSTRACT

The Design, Implementation, and Evaluation of Software and Architectural Support for ARM Virtualization

Christoffer Dall

The ARM architecture dominates the mobile and embedded markets and is making an upwards push into the server and networking markets, where virtualization is a key technology. Similar to x86, ARM has added hardware support for virtualization, but there are important differences between the ARM and x86 architectural designs. Given two widely deployed computer architectures with different approaches to hardware virtualization support, we can evaluate, in practice, the benefits and drawbacks of different approaches to architectural support for virtualization.

This dissertation explores new approaches to combining software and architectural support for virtualization with a focus on the ARM architecture and shows that it is possible to provide virtualization services an order of magnitude more efficiently than traditional implementations.

First, we investigate why the ARM architecture does not meet the classical requirements for virtualizable architectures and present an early prototype of KVM for ARM, a hypervisor that uses lightweight paravirtualization to run VMs on ARM systems without hardware virtualization support. Lightweight paravirtualization is a fully automated approach which replaces sensitive instructions with privileged instructions and requires no understanding of the guest OS code.
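To make the mechanism concrete, the following is a minimal sketch of what such an automated rewriting pass could look like, assuming a caller-supplied predicate that recognizes the sensitive non-privileged encodings; the structure layout, function names, and side-table scheme are illustrative and are not the prototype's actual code. Each sensitive instruction is replaced in place by a trapping SVC whose immediate indexes a table recording the original instruction, which a trap handler can later emulate against virtual CPU state.

```c
/*
 * Illustrative sketch of a lightweight-paravirtualization rewriting pass.
 * Every sensitive non-privileged instruction in a block of guest kernel
 * text is replaced with a trapping SVC whose 24-bit immediate indexes a
 * side table recording the original instruction word. The encoding checks
 * are supplied by the caller; nothing here is the actual KVM for ARM code.
 */
#include <stddef.h>
#include <stdint.h>

#define SVC_OPCODE   0xEF000000u   /* A32 "svc #imm24", AL condition */
#define SVC_IMM_MASK 0x00FFFFFFu

struct encoded_insn {
    size_t   offset;     /* byte offset of the rewritten instruction */
    uint32_t original;   /* original sensitive instruction word */
};

/* Caller-supplied: does this A32 word encode a sensitive instruction? */
typedef int (*sensitive_fn)(uint32_t insn);

/*
 * Rewrite all sensitive instructions in `text`, filling `table` with the
 * information a trap handler needs to emulate them. Returns the number of
 * rewritten instructions, or -1 if the side table cannot hold them all.
 */
static int encode_sensitive(uint32_t *text, size_t n_insns,
                            sensitive_fn is_sensitive,
                            struct encoded_insn *table, size_t table_len)
{
    size_t used = 0;

    for (size_t i = 0; i < n_insns; i++) {
        if (!is_sensitive(text[i]))
            continue;
        if (used >= table_len || used > SVC_IMM_MASK)
            return -1;

        table[used].offset   = i * sizeof(uint32_t);
        table[used].original = text[i];

        /* The SVC immediate tells the trap handler which entry to emulate. */
        text[i] = SVC_OPCODE | (uint32_t)used;
        used++;
    }
    return (int)used;
}
```

At trap time, the handler would use the SVC immediate to look up the recorded instruction, emulate it against the virtual CPU state, and resume the guest at the following instruction.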
Second, we introduce split-mode virtualization to support hosted hypervisor designs using ARM's architectural support for virtualization. Unlike x86, the ARM virtualization extensions are based on a new hypervisor CPU mode, separate from the existing CPU modes. This separate hypervisor CPU mode does not support running existing unmodified OSes, and therefore hosted hypervisor designs, in which the hypervisor runs as part of a host OS, do not work on ARM. Split-mode virtualization splits the execution of the hypervisor such that the host OS with core hypervisor functionality runs in the existing kernel CPU mode, while a small runtime runs in the hypervisor CPU mode and supports switching between the VM and the host OS. Split-mode virtualization was used in KVM/ARM, which was designed from the ground up as an open source project and merged into the mainline Linux kernel, resulting in interesting lessons about translating research ideas into practice.

Third, we present an in-depth performance study of 64-bit ARMv8 virtualization using server hardware and compare against x86. We measure the performance of both standalone and hosted hypervisors on both ARM and x86 and compare their results. We find that ARM hardware support for virtualization can enable faster transitions between the VM and the hypervisor for standalone hypervisors compared to x86, but results in high switching overheads for hosted hypervisors compared to both x86 and to standalone hypervisors on ARM. We identify a key reason for the high switching overhead of hosted hypervisors: the need to save and restore kernel mode state between the host OS kernel and the VM kernel. However, standalone hypervisors such as Xen cannot leverage their performance benefit in practice for real application workloads. Other factors related to hypervisor software design and I/O emulation play a larger role in overall hypervisor performance than low-level interactions between the hypervisor and the hardware.

Fourth, realizing that modern hypervisors rely on running a full OS kernel, the hypervisor OS kernel, to support their hypervisor functionality, we present a new hypervisor design which runs the hypervisor and its hypervisor OS kernel in ARM's separate hypervisor CPU mode and avoids the need to multiplex kernel mode CPU state between the VM and the hypervisor. Our design benefits from new architectural features, the Virtualization Host Extensions (VHE), in ARMv8.1 to avoid modifying the hypervisor OS kernel to run in the hypervisor CPU mode. We show that the hypervisor must be co-designed with the hardware features to take advantage of running in a separate CPU mode, and we implement our changes to KVM/ARM. We show that running the hypervisor OS kernel in a separate CPU mode from the VM, and taking advantage of ARM's ability to quickly switch between the VM and the hypervisor, results in an order of magnitude reduction in overhead for important virtualization microbenchmarks and reduces the overhead of real application workloads by more than 50%.
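As a rough illustration of where the saved overhead comes from, the sketch below contrasts the VM-entry paths of the two designs. The EL1 register subset, structure layout, and helper names are invented for illustration and are not the actual KVM/ARM context-switch code: with split-mode virtualization the host OS kernel and the VM share the kernel CPU mode (EL1), so its state must be multiplexed on every transition, whereas with the hypervisor OS kernel running in the hypervisor CPU mode (EL2), EL1 belongs to the VM alone.

```c
/*
 * Illustrative contrast of the two transition costs discussed above.
 * The EL1 register subset and helpers are placeholders; on real hardware
 * they would be mrs/msr sequences, and the actual KVM/ARM code saves
 * considerably more state than shown here.
 */
struct el1_state {
    unsigned long sctlr_el1, ttbr0_el1, ttbr1_el1, tcr_el1;
    unsigned long vbar_el1, sp_el1, elr_el1, spsr_el1;
};

struct vcpu {
    struct el1_state guest_el1;
    struct el1_state host_el1;   /* only the split-mode design needs this */
};

/* Stubs modeling the mrs/msr sequences and the eret into the guest. */
static void save_el1_state(struct el1_state *s)          { (void)s; }
static void restore_el1_state(const struct el1_state *s) { (void)s; }
static void eret_to_guest(void)                          { }

/*
 * Split-mode virtualization: the host OS kernel also lives in EL1, so its
 * kernel mode state must be multiplexed with the VM's on every transition.
 */
static void enter_guest_split_mode(struct vcpu *v)
{
    save_el1_state(&v->host_el1);
    restore_el1_state(&v->guest_el1);
    eret_to_guest();
}

/*
 * VHE-based design: the hypervisor OS kernel runs in EL2, so EL1 holds
 * only the VM's kernel state and nothing needs to be saved on entry.
 */
static void enter_guest_vhe(struct vcpu *v)
{
    restore_el1_state(&v->guest_el1);
    eret_to_guest();
}
```

This is the "avoids the need to multiplex kernel mode CPU state" point from the paragraph above, expressed as code.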
Contents

List of Figures
List of Tables
Introduction
1 ARM Virtualization without Architectural Support
  1.1 Requirements for Virtualizable Architectures
    1.1.1 Virtualization of ARMv7
    1.1.2 Virtualization of ARMv8
  1.2 KVM for ARM without Architectural Support
    1.2.1 Lightweight Paravirtualization
    1.2.2 CPU Exceptions
    1.2.3 Memory Virtualization
  1.3 Related Work
  1.4 Summary
2 The Design and Implementation of KVM/ARM
  2.1 ARM Virtualization Extensions
    2.1.1 CPU Virtualization
    2.1.2 Memory Virtualization
    2.1.3 Interrupt Virtualization
    2.1.4 Timer Virtualization
    2.1.5 Comparison of ARM VE with Intel VMX
  2.2 KVM/ARM System Architecture
    2.2.1 Split-mode Virtualization
    2.2.2 CPU Virtualization
    2.2.3 Memory Virtualization
    2.2.4 I/O Virtualization
    2.2.5 Interrupt Virtualization
    2.2.6 Timer Virtualization
  2.3 Reflections on Implementation and Adoption
  2.4 Experimental Results
    2.4.1 Methodology
    2.4.2 Performance and Power Measurements
    2.4.3 Implementation Complexity
  2.5 Related Work
  2.6 Summary
3 Performance of ARM Virtualization
  3.1 Hypervisor Design
    3.1.1 Hypervisor Overview
    3.1.2 ARM Hypervisor Implementations
  3.2 Experimental Design
  3.3 Microbenchmark Results
  3.4 Application Benchmark Results
  3.5 Related Work
  3.6 Summary
4 Improving ARM Virtualization Performance
  4.1 Architectural Support for Hosted Hypervisors
    4.1.1 Intel VMX
    4.1.2 ARM VE
    4.1.3 KVM
  4.2 Hypervisor OS Kernel Support
    4.2.1 Virtualization Host Extensions
    4.2.2 el2Linux
  4.3 Hypervisor Redesign
  4.4 Experimental Results
    4.4.1 Microbenchmark Results
    4.4.2 Application Benchmark Results
  4.5 Related Work
  4.6 Summary
5 Conclusions and Future Work
  5.1 Conclusions
  5.2 Future Work
Bibliography

List of Figures

1.1 Address Space Mappings
2.1 ARMv7 Processor Modes
2.2 ARMv8 Processor Modes
2.3 Stage 1 and Stage 2 Page Table Walk
2.4 ARM Generic Interrupt Controller v2 (GICv2) Overview
2.5 KVM/ARM System Architecture
2.6 UP VM Normalized lmbench Performance
2.7 SMP VM Normalized lmbench Performance
2.8 UP VM Normalized Application Performance
2.9 SMP VM Normalized Application Performance
2.10 SMP VM Normalized Energy Consumption
3.1 Hypervisor Design
3.2 Xen ARM Architecture
3.3 KVM/ARM Architecture
3.4 Application Benchmark Performance
4.1 KVM on Intel VMX and ARM VE
4.2 Hypervisor Designs and CPU Privilege Levels
4.3 Virtualization Host Extensions (VHE)
4.4 Application Benchmark Performance

List of Tables

1.1 ARMv7 sensitive non-privileged instructions
1.2 ARMv6 sensitive non-privileged instructions
1.3 Sensitive non-privileged status register instruction descriptions
1.4 Sensitive instruction encoding types
1.5 ARMv7-A Exceptions
2.1 Comparison of ARM VE with Intel VMX
2.2 ARMv7 CPU State
2.3 ARMv8 CPU state (AArch64)
2.4 Instructions and registers that trap from the VM to the hypervisor
2.5 Benchmark Applications
2.6 Micro-Architectural Cycle Counts
2.7 Code Complexity in Lines of Code (LOC)
3.1 Microbenchmarks …