Microkernels Meet Recursive Virtual Machines
Total Page:16
File Type:pdf, Size:1020Kb
Microkernels Meet Recursive Virtual Machines Bryan Ford Mike Hibler Jay Lepreau Patrick Tullmann Godmar Back Stephen Clawson Department of Computer Science, University of Utah Salt Lake City, UT 84112 [email protected] http://www.cs.utah.edu/projects/flux/ Abstract “vertically” by implementing OS functionality in stackable virtual machine monitors, each of which exports a virtual This paper describes a novel approach to providingmod- machine interface compatible with the machine interface ular and extensible operating system functionality and en- on which it runs. Traditionally, virtual machines have been capsulated environments based on a synthesis of micro- implemented on and export existing hardware architectures kernel and virtual machine concepts. We have developed so they can support “naive” operating systems (see Fig- a software-based virtualizable architecture called Fluke ure 1). For example, the most well-known virtual machine that allows recursive virtual machines (virtual machines system, VM/370 [28, 29], provides virtual memory and se- running on other virtual machines) to be implemented ef- curity between multiple concurrent virtual machines, all ficiently by a microkernel running on generic hardware. exporting the IBM S/370 hardware architecture. Further- A complete virtual machine interface is provided at each more, special virtualizable hardware architectures [22, 35] level; efficiency derives from needing to implement only have been proposed, whose design goal is to allow virtual new functionality at each level. This infrastructure allows machines to be stacked much more efficiently. common OS functionality, such as process management, demand paging, fault tolerance, and debugging support, to This paper presents a new approach to OS extensibil- be provided by cleanly modularized, independent, stack- ity which combines both microkernel and virtual machine able virtual machine monitors, implemented as user pro- concepts in one system. We have designed a “virtualiz- cesses. It can also provide uncommon or unique OS fea- able architecture” that does not attempt to emulate an ac- tures, including the above features specialized for particu- tual hardware architecture closely, but is instead designed lar applications’ needs, virtual machines transparently dis- along the lines of a traditional process model and is in- tributed cross-node, or security monitors that allow arbi- tended to be implemented in software by a microkernel. trary untrusted binaries to be executed safely. Our proto- The microkernel runs on the “raw” hardware platform and type implementation of this model indicates that it is prac- exports our software-based virtualizable architecture (see tical to modularize operating systems this way. Some types Figure 2), which we will refer to as a virtualizable pro- of virtual machine layers impose almost no overhead at all, cess or nested process architecture to avoid confusion with while others impose some overhead (typically 0–35%), but traditional hardware-based architectures. The virtual ma- only on certain classes of applications. chine monitors designed to run on this software architec- ture, which we call nesters, can efficiently create additional recursive virtual machines or nested processes in which ar- 1 Introduction bitrary applications or other nesters can run. Increasing operating system modularity and extensibil- Although the Fluke architecture does not closely fol- ity without excessively hurting performance is a topic of low a traditional virtual machine architecture, it is de- much ongoing research [5, 9, 18, 36, 40]. Microkernels [4, signed to preserve certain highly useful properties of re- 24] attempt to decompose operating systems “horizontally” cursive virtual machines. These properties are required to by moving traditional kernel functionality into servers run- different degrees by different nesters that take advantage ning in user mode. Recursive virtual machines [23], on of the model. For example, demand paging and check- the other hand, allow operating systems to be decomposed pointing nesters require access to and control over the pro- gram state contained in their children, grandchildren, and This research was supported in part by the Defense Advanced Re- so on, whereas process management and security monitor- search Projects Agency, monitored by the Department of the Army, under contract number DABT63–94–C–0058. The opinions and conclusions ing nesters primarily rely on being able to monitor and con- containedin this documentare those of the authors andshould not be inter- trol IPC-based communication across the boundary sur- preted as representing official views or policies of the U.S. Government. Application App App Nested Process Process Architecture Interface Process Process Checkpoint Operating Nester System Application Kernel App App Process Process Virtual Machine Demand Paging Application Debug Nester Nester Operating System Virtual Machine Kernel Monitor Process Mgmt Nester Virtual Machine Virtual Machine Hardware Virtual Machine Monitor Hardware Microkernel Interface Interface Bare Machine Bare Machine Figure 1: Traditional virtual machines based on hardware architec- Figure 2: Virtual machines based on an extended architecture imple- tures. Each shaded area is a separate virtual machine, and each virtual mented by a microkernel. The interface between the microkernel and the machine exports the same architecture as the base machine’s architecture. bare machineis a traditional hardware-basedmachine architecture, but the common interface between all the other layers in the system is a software- based nested process architecture. Each shaded area is a separate process. rounding the nested environment. Our microkernel’s API provides these properties effi- ciently in several ways. Address spaces are composed from nel and most do not allow control over physical memory other address spaces using hierarchical memory remapping management, just backing store. Similarly, multiuser se- primitives. For CPU resources, the kernel provides primi- curity mechanisms are not always needed, since most per- tives that support hierarchical scheduling. To allow IPC- sonal computers are dedicated to the use of a single per- based communication to short-circuit the hierarchy safely, son, and even process management and job control features the kernel provides a global capability model that supports may not be needed in single-applicationsystems such as the selective interposition on communication channels. On top proverbial “Internet appliance.” Our system demonstrates of the microkernel API, well-defined IPC interfaces pro- decomposed paging and POSIX process management by vide I/O and resource management functionalityat a higher implementing these traditional kernel functions as optional level than in traditional virtual machines. These higher- nesters which can be used only when needed, and only for level interfaces are more suited to the needs of modern ap- the parts of a system for which they are desired. plications: e.g., they provide file handles instead of device Increasing the scope of existing mechanisms: There I/O registers. are algorithms and software packages available for com- This nested process architecture can be used to apply mon operating systems to provide features such as dis- existing algorithms and techniques in more flexible ways. tributed shared memory (DSM) [10, 32], checkpoint- Some examples we demonstrate in this paper include the ing [11], and security against untrusted applications [52]. following: However, these systems only cleanly support applications Decomposing the kernel: Some features of traditional running in a single logical protection domain. In a nested operating systems are usually so tightly integrated into the process model, any process can create further nested sub- kernel that it is difficult to eliminate them in situations processes which are completely encapsulated within the in which they are not needed. A striking example is de- parent. This design allows DSM, checkpointing, security, mand paging. Although it is often possible to disable it and other mechanisms to be applied just as easily to multi- in particular situations on particular regions (e.g., using process applications or even complete operating environ- POSIX’s mlock()), all of the paging support is still in ments. Our system demonstrates this flexibility by provid- the kernel, occupying memory and increasing system over- ing a checkpointer, implemented as a nester, which can be head. Even systems that support “external pagers,” such as transparently applied to arbitrary domains such as a single Mach, contain considerable paging-related code in the ker- application, a multi-process user environment containing a process manager and multiple applications, or even the en- sections. tire system. Composing OS features: The mechanisms mentioned 2.1 Traditional Virtual Machines above are generally difficult or impossible to combine flex- ibly. One might be able to run an application and check- A virtual machine simulator, such as the Java inter- point it, or to run an untrusted application in a secure en- preter [25], is a program that runs on one hardware ar- vironment, but existing software mechanisms are insuffi- chitecture and implements in software a virtual machine cient to run a checkpointed, untrusted application without conforming to a completely different architecture. In con- implementing a new, specialized program designed to pro- trast, a virtual machine monitor or hypervisor, such