VMI: An Interface for Paravirtualization
Zach Amsden, Daniel Arai, Daniel Hecht, Anne Holler, Pratap Subrahmanyam
VMware, Inc.
{zach,arai,dhecht,anne,pratap}@vmware.com

2006 Linux Symposium, Volume Two

Abstract

Paravirtualization has a lot of promise, in particular in its ability to deliver performance by allowing the hypervisor to be aware of the idioms in the operating system. Since kernel changes are necessary, it is very easy to get into a situation where the paravirtualized kernel is incapable of executing on a native machine, or on another hypervisor. It is also quite easy to expose too many hypervisor implementation details in the name of performance, which can impede the general development of the kernel with many hypervisor-specific subtleties.

VMI, or the Virtual Machine Interface, is a clearly defined, extensible specification for OS communication with the hypervisor. VMI delivers great performance without requiring that kernel developers be aware of concepts that are only relevant to the hypervisor. As a result, it can keep pace with the fast releases of the Linux kernel, and a new kernel version can be trivially paravirtualized. With VMI, a single Linux kernel binary can run on a native machine and on one or more hypervisors.

In this paper, we discuss a working patch to Linux 2.6.16 [1], the latest version of Linux as of this writing. We present performance data on native hardware to show the negligible cost of VMI, and on the VMware hypervisor to show its overhead compared with native. We also share some future work directions.

1 Introduction

Virtual machines allow multiple copies of potentially different operating systems to run concurrently on a single hardware platform [5]. A virtual machine monitor (VMM) is a software layer that virtualizes hardware resources, exporting a virtual hardware interface that reflects the underlying machine architecture. A processor architecture whose instructions produce different results depending on the privilege level at which they are executed is not classically virtualizable [13]; the x86 is an example of such an architecture. Unfortunately, these architectures require additional complexity in the VMM to cope with their non-virtualizable instructions.

A flexible operating system such as Linux has the advantage that its source code can be modified to avoid the use of these non-virtualizable instructions [15], thereby simplifying the VMM. Recently, the Xen project [12] has explored paravirtualization in some detail by constructing a paravirtualizing VMM for Linux. Once one has taken the mental leap of accepting changes to the kernel source, it becomes obvious that further VMM simplification is possible by allowing the kernel to communicate complex idioms to the VMM.

VMMs traditionally make copies of critical processor data structures and then write-protect the original data structures to maintain consistency of the copy. The processor faults when the primary is modified, at which time the VMM gets control and appropriately updates the copy. A paravirtualized kernel can instead directly notify the VMM when it modifies data structures that are of interest to the VMM. This communication channel can be faster than a processor fault. This leads to both elimination of code from the VMM—i.e., simplicity—and better performance.

While reducing the complexity of the VMM is good, we should be careful not to increase the overall complexity of the system. It would be unacceptable if the code changes made the kernel harder to maintain, or restricted its portability, distributability, or general reliability. Performance and the simplification of the VMM have to be balanced against these considerations too. For instance, it is tempting to make the kernel aware of idioms from the hypervisor for more performance. This can lead to a situation where the paravirtualized kernel is incapable of executing on a native machine or on another hypervisor. Introducing hypervisor-specific subtleties into the kernel can also impede general kernel development.

Hence, paravirtualization must be done carefully. The purpose of this paper is to propose a disciplined approach to paravirtualization.

The rest of the paper is organized as follows. In Section 2, we describe the core guiding principles to follow while paravirtualizing the kernel. In Section 3, we propose VMI, or the Virtual Machine Interface, which is an implementation of these guidelines. Section 4 describes the other face of VMI, the part that interfaces with the hypervisor. In Section 5, we share the key aspects of the Linux 2.6.16-rc6 implementation. Section 6 describes several of the performance experiments we have done and shares performance data. In Section 7, we talk about our future work. Section 8 describes work done by our peers in this area. The paper concludes in Section 9 by summarising our observations.

2 Challenges for Paravirtualization

There are several high-level goals which must be balanced in designing an API for paravirtualization. The most general concerns are:

• Portability – it should be easy to port a guest OS to use the API.

• Performance – the API must enable a high-performance hypervisor implementation.

• Maintainability – it should be easy to maintain and upgrade the guest OS.

• Extensibility – it should be possible to expand the API in the future.

• Transparency – the same kernel should run both on native hardware and on multiple hypervisors.

2.1 Portability

There is some code cost to porting a guest OS to run in a paravirtualized environment. The closer the API resembles a native platform that the OS supports, the lower the cost of porting. A low-level interface that encapsulates the non-virtualizable and performance-critical parts of the system can, in many cases, make porting a guest OS a simple matter of replacing one function with another.

Of course, once we introduce interfaces that go beyond simple instructions, we have to move to a higher level. For instance, the kernel can manage its page tables cooperatively with the VMM. In these cases, we carefully maintain kernel portability by relying on the kernel source architecture itself. As an example, support for the page table interfaces in the Linux operating system has proven to be very minimal in cost because of the already portable and modular design of the memory management layer.

2.2 High Performance

In addition to pure CPU emulation, performance concerns in a hypervisor arise from the fact that many operations, such as accesses to page tables or virtual devices including the APIC, require costly trapping memory accesses. To alleviate these performance problems, a simple CPU-oriented interface must be expanded to incorporate MMU and interrupt controller interfaces.

Also, while a low-level API that closely resembles hardware is preferred for portability, care must be taken to ensure that performance is not sacrificed. A low-level API does not explicitly provide support for higher-level compound operations. Some examples of such compound operations are updating many page table entries, flushing system TLBs, and providing bulk operations during context switches.

Therefore, the interface must not preclude the possibility of optimizing low-level operations in some way to achieve the same performance that would be available had it provided higher-level abstractions. Then, deeply intrusive hooks into the paravirtualized OS can be avoided while preserving performance.

2.3 Maintainability

Concurrent development of the paravirtual kernel and the hypervisor is a common scenario. If changes to the hypervisor are visible to the paravirtual kernel, maintenance of the kernel becomes difficult. Additionally, in the Linux world, the rapid pace of development on the kernel means new kernel versions are produced every few months. This rapid pace is not always appropriate for end users, so it is not uncommon to have dozens of different versions of the Linux kernel in use that must be actively supported. Keeping this many versions in sync with potentially radical changes in the paravirtualized system is not a scalable solution.

To reduce the maintenance burden as much as possible while still allowing the implementation to accommodate changes, a stable ABI with semantic invariants is necessary. The underlying implementation of the ABI, including the details of how it communicates with the hypervisor, should not be visible to the kernel. If this encapsulation exists, then in most cases the paravirtualized kernel need not be recompiled to work with a newer hypervisor. This allows performance optimizations, bug fixes, debugging, or statistical instrumentation to be added to the API implementation without any impact on the guest kernel.

2.4 Extensibility

In order to provide a vehicle for new features, new device support, and general evolution, the API uses feature compartmentalization with controlled versioning. The API is split into sections, and each section can be incrementally expanded as needed.

2.5 Transparency

Any software vendor will appreciate the cost of handling multiple kernels, so the API takes into account the need to allow the same paravirtualized kernel to run both on native hardware [10] and on other hypervisors. See Figure 1.

[Figure 1: VMI guests run unmodified on different hypervisors and raw hardware. A single VMI Linux binary sits atop one of three interchangeable VMI layers: a VMI layer for native hardware, a VMI layer for the VMware hypervisor (via VMware hypercalls), and a VMI layer for Xen 3.0.1 (via Xen hypercalls).]

3.1 Linear Address Space

The VMI specifies that a portion of the paravirtualized kernel's linear address space is reserved. This space is used by the VMI layer and the hypervisor. See Section 4.4 for more details.

3.2 Bootstrapping

Our implementation allows a paravirtualized kernel to begin running in a fully virtualized manner, compatible with a standard PC boot sequence.