Microkernels Meet Recursive Virtual Machines

Bryan Ford, Mike Hibler, Jay Lepreau, Patrick Tullmann, Godmar Back, Stephen Clawson
Department of Computer Science, University of Utah
Salt Lake City, UT 84112
[email protected]
http://www.cs.utah.edu/projects/flux/

Abstract

This paper describes a novel approach to providing modular and extensible operating system functionality and encapsulated environments based on a synthesis of microkernel and virtual machine concepts. We have developed a software-based virtualizable architecture called Fluke that allows recursive virtual machines (virtual machines running on other virtual machines) to be implemented efficiently by a microkernel running on generic hardware. A complete virtual machine interface is provided at each level; efficiency derives from needing to implement only new functionality at each level. This infrastructure allows common OS functionality, such as process management, demand paging, fault tolerance, and debugging support, to be provided by cleanly modularized, independent, stackable virtual machine monitors, implemented as user processes. It can also provide uncommon or unique OS features, including the above features specialized for particular applications' needs, virtual machines transparently distributed cross-node, or security monitors that allow arbitrary untrusted binaries to be executed safely. Our prototype implementation of this model indicates that it is practical to modularize operating systems this way. Some types of virtual machine layers impose almost no overhead at all, while others impose some overhead (typically 0–35%), but only on certain classes of applications.

1 Introduction

Increasing operating system modularity and extensibility without excessively hurting performance is a topic of much ongoing research [5, 9, 18, 36, 40]. Microkernels [4, 24] attempt to decompose operating systems "horizontally" by moving traditional kernel functionality into servers running in user mode. Recursive virtual machines [23], on the other hand, allow operating systems to be decomposed "vertically" by implementing OS functionality in stackable virtual machine monitors, each of which exports a virtual machine interface compatible with the machine interface on which it runs. Traditionally, virtual machines have been implemented on and export existing hardware architectures so they can support "naive" operating systems (see Figure 1). For example, the most well-known virtual machine system, VM/370 [28, 29], provides virtual memory and security between multiple concurrent virtual machines, all exporting the IBM S/370 hardware architecture. Furthermore, special virtualizable hardware architectures [22, 35] have been proposed, whose design goal is to allow virtual machines to be stacked much more efficiently.

This paper presents a new approach to OS extensibility which combines both microkernel and virtual machine concepts in one system. We have designed a "virtualizable architecture" that does not attempt to emulate an actual hardware architecture closely, but is instead designed along the lines of a traditional process model and is intended to be implemented in software by a microkernel. The microkernel runs on the "raw" hardware platform and exports our software-based virtualizable architecture (see Figure 2), which we will refer to as a virtualizable process or nested process architecture to avoid confusion with traditional hardware-based architectures. The virtual machine monitors designed to run on this software architecture, which we call nesters, can efficiently create additional recursive virtual machines or nested processes in which arbitrary applications or other nesters can run.

Although the Fluke architecture does not closely follow a traditional virtual machine architecture, it is designed to preserve certain highly useful properties of recursive virtual machines. These properties are required to different degrees by different nesters that take advantage of the model. For example, demand paging and checkpointing nesters require access to and control over the program state contained in their children, grandchildren, and so on, whereas process management and security monitoring nesters primarily rely on being able to monitor and control IPC-based communication across the boundary surrounding the nested environment.

This research was supported in part by the Defense Advanced Research Projects Agency, monitored by the Department of the Army, under contract number DABT63-94-C-0058. The opinions and conclusions contained in this document are those of the authors and should not be interpreted as representing official views or policies of the U.S. Government.

Figure 1: Traditional virtual machines based on hardware architectures. Each shaded area is a separate virtual machine, and each virtual machine exports the same architecture as the base machine's architecture.

Figure 2: Virtual machines based on an extended architecture implemented by a microkernel. The interface between the microkernel and the bare machine is a traditional hardware-based machine architecture, but the common interface between all the other layers in the system is a software-based nested process architecture. Each shaded area is a separate process.

Our microkernel's API provides these properties efficiently in several ways. Address spaces are composed from other address spaces using hierarchical memory remapping primitives. For CPU resources, the kernel provides primitives that support hierarchical scheduling. To allow IPC-based communication to short-circuit the hierarchy safely, the kernel provides a global capability model that supports selective interposition on communication channels. On top of the microkernel API, well-defined IPC interfaces provide I/O and resource management functionality at a higher level than in traditional virtual machines. These higher-level interfaces are more suited to the needs of modern applications: e.g., they provide file handles instead of device I/O registers.

This nested process architecture can be used to apply existing algorithms and techniques in more flexible ways. Some examples we demonstrate in this paper include the following:

Decomposing the kernel: Some features of traditional operating systems are usually so tightly integrated into the kernel that it is difficult to eliminate them in situations in which they are not needed. A striking example is demand paging. Although it is often possible to disable it in particular situations on particular regions (e.g., using POSIX's mlock()), all of the paging support is still in the kernel, occupying memory and increasing system overhead. Even systems that support "external pagers," such as Mach, contain considerable paging-related code in the kernel and most do not allow control over physical memory management, just backing store. Similarly, multiuser security mechanisms are not always needed, since most personal computers are dedicated to the use of a single person, and even process management and job control features may not be needed in single-application systems such as the proverbial "Internet appliance." Our system demonstrates decomposed paging and POSIX process management by implementing these traditional kernel functions as optional nesters which can be used only when needed, and only for the parts of a system for which they are desired.

Increasing the scope of existing mechanisms: There are algorithms and software packages available for common operating systems to provide features such as distributed shared memory (DSM) [10, 32], checkpointing [11], and security against untrusted applications [52]. However, these systems only cleanly support applications running in a single logical protection domain. In a nested process model, any process can create further nested subprocesses which are completely encapsulated within the parent. This design allows DSM, checkpointing, security, and other mechanisms to be applied just as easily to multi-process applications or even complete operating environments. Our system demonstrates this flexibility by providing a checkpointer, implemented as a nester, which can be transparently applied to arbitrary domains such as a single application, a multi-process user environment containing a process manager and multiple applications, or even the entire system.

Composing OS features: The mechanisms mentioned above are generally difficult or impossible to combine flexibly. One might be able to run an application and checkpoint it, or to run an untrusted application in a secure environment, but existing software mechanisms are insufficient to run a checkpointed, untrusted application without implementing a new, specialized program designed to provide both functions. A nested

sections.

2.1 Traditional Virtual Machines

A virtual machine simulator, such as the Java interpreter [25], is a program that runs on one hardware architecture and implements in software a virtual machine conforming to a completely different architecture. In contrast, a virtual machine monitor or hypervisor, such as
