Predictable Migration and Communication in the Quest-V Multikernel

Predictable Migration and Communication in the Quest-V Multikernel Ye Li, Eric Missimer and Richard West Computer Science Department Boston University Boston, MA 02215 Email: liye,missimer,richwest}@cs.bu.edu Abstract—Quest-V is a system we have been developing separate largely unrelated guest OSes and their appli- from the ground up, with objectives focusing on safety, pre- cations, Quest-V sandbox kernels work together as a dictability and efficiency. It is designed to work on emerging distributed system on a chip. multicore processors with hardware virtualization support. Quest-V is implemented as a “distributed system on a The distributed organization of Quest-V means there chip” and comprises multiple sandbox kernels. Sandbox is no notion of a single kernel responsible for global kernels are isolated from one another in separate regions scheduling or system-wide resource management. In- of physical memory, having access to a subset of processing stead, each sandbox kernel manages its own dedicated cores and I/O devices. This partitioning prevents system resources, including a specific set of cores, I/O devices failures in one sandbox affecting the operation of other sandboxes. Shared memory channels managed by system and memory regions. The philosophy behind Quest-V is monitors enable inter-sandbox communication. to share as little as possible between different sandbox The distributed nature of Quest-V means each sandbox kernels and their memory spaces. This reduces resource has a separate physical clock, with all event timings being contention and has been shown to increase scalabil- managed by per-core local timers. Each sandbox is respon- ity [1]. However, this imposes several challenges on the sible for its own scheduling and I/O management, without requiring intervention of a hypervisor. In this paper, we design of applications. For example, multi-threaded ap- formulate bounds on inter-sandbox communication in the plications must be distributed across the various sandbox absence of a global scheduler or global system clock. kernels to increase the degree of parallelism. Since each We also describe how address space migration between sandbox manages a private region of physical memory, sandboxes can be guaranteed without violating service separate threads in different sandboxes will each need a constraints. Experimental results on a working system show the conditions under which Quest-V performs real-time private copy of their address spaces. Parallel applications communication and migration. must be designed to work in a distributed manner, communicating across special shared memory channels when necessary. I. INTRODUCTION Quest-V’s distributed system design means that each Quest-V is a real-time system we are developing that sandbox operates with an independent system clock. Just focuses on predictability, efficiency and safety. It is built as traditional distributed systems have neither physically from the ground up rather than being a modification shared memory nor a single physical clock for all loosely to a pre-existing system such as Linux. We chose this coupled compute nodes, each Quest-V sandbox on a path because Linux is in many ways overly complex, multicore processor uses a local timer on each core to making it difficult to enforce system-wide predictability manage event timings. by simply modifying individual components such as the The lack of a global clock and a global scheduler in scheduler. Additionally, our development of Quest-V has Quest-V poses challenges in two key ways: (1) in the enabled us to investigate ways to integrate emerging communication between threads in different sandboxes, hardware features directly into the design of the OS. and (2) the migration of threads between sandboxes. The For example, many modern processors feature hardware first problem occurs when two or more threads need to virtualization support (e.g., Intel VT, AMD-V and ARM synchronize or exchange data within time bounds. A Cortex A15 processors), and Quest-V uses these capabil- sending thread in one sandbox may be forced to wait ities to isolate system components into separate “sandbox for a reply from a remote thread that is independently kernels”. This enhances fault tolerance and dependability scheduled in another sandbox. A method to bound the of the system. Unlike traditional hypervisors that support round-trip delay is necessary. The second problem occurs when a thread may have partially executed in one sand- This work is supported in part by NSF Grant #1117025. box and needs to complete by a certain time in another 1 sandbox. Ideally, we want the migrating thread to avoid any penalty for migration and clock skew that could affect its ability to meet timing requirements. Likewise, any migrating thread should not adversely affect the timing guarantees of threads already executing in the destination sandbox. Contributions. This paper describes how Quest-V guarantees predictable inter-sandbox communication and thread migration between separate sandboxes running on a multicore processor. We formulate constraints under which these objectives can be guaranteed without requiring a common clock or global scheduler. In the next section, we briefly describe the Quest- Fig. 1. Quest-V Architecture Overview V architecture. This is followed by a discussion of the mechanisms to enforce predictable thread migration, into a 4KB memory page. They primarily exist to main- as well as inter-sandbox communication. Experimental tain extended page table (EPT) mappings, so that each results are shown in Section V. A summary of related sandbox’s virtual memory space maps onto a separate work is described in Section VI. Finally, conclusions and region of host physical memory. The only other times future work are discussed in Section VII. monitors are needed is to launch a sandbox when the system is initialized, to aid in fault recovery or protection II. QUEST-V ARCHITECTURE management (e.g., if there is an EPT violation), and to As stated earlier, the Quest-V system is partitioned update shared memory mappings between sandboxes. into a series of sandbox kernels, with each sandbox en- Shared memory mappings are used in Quest-V to compassing a subset of memory, I/O and CPU resources. establish communication channels between the otherwise Sandbox memories are isolated from one another using separate sandboxes. We use replicated monitors rather hardware virtualization capabilities found on modern than a single hypervisor such as in Xen [2] for three multicore processors. The current system works on x86 main reasons. First, each monitor is a trusted code base (Intel VT-x) platforms but plans are underway to port with a small memory footprint (less than 4KB). Second, Quest-V to the ARM architecture. the monitors can be implemented differently so that they We envision Quest-V being applicable to safety- are not all susceptible to the same security vulnerability. critical systems, where it is important that services Third, a monitor maintains EPT mappings for only one remain operational even when there are hardware and sandbox, eliminating the overheads of scheduling and software failures. Future automotive, avionics, factory switching guest address spaces as is done with traditional automation and healthcare systems are example appli- hypervisor systems. cation domains where safety-criticality is required. In Quest-V all scheduling and device management Although we assume the existence of hardware vir- is performed within each sandbox directly. We use a tualization features, this is not essential for the design form of I/O passthrough to allow device interrupts to of our system. Sandbox memory isolation can instead be directed to a sandbox kernel without monitor inter- be enforced using segmentation or paging on platforms vention. This differs from the “split driver” model of with simpler memory protection units. However, hard- systems such as Xen that have a special domain to handle ware virtualization offers potential advantages (albeit interrupts before they are directed into a guest. Allowing with added costs) that we wish to investigate in our sandboxes to have direct access to I/O devices and to system design. In particular, it provides an extra logical perform local scheduling decisions greatly reduces the “ring of protection” that allows services to execute with overhead of switching into a hypervisor (or, equivalently, traditional kernel privileges. This is in contrast with monitor) to aid in resource management. micro-kernels that grant the least privileges necessary Quest-V supports configurable partitioning of re- for services, at the expense of added communication to sources amongst guests, similar to the way cores are access other services in separate protection domains. partitioned in systems such as Corey [3]. By default, A high-level overview of the Quest-V architecture is we assume each sandbox is associated with a single shown in Figure 1. Each sandbox, including its kernel processing core since this simplifies local (sandbox) and application space, is associated with a separate scheduling, although it is possible to configure a sandbox monitor. The monitors are relatively small, fitting easily to encompass multiple cores. Similarly devices can be 2 shared or partitioned. For example, a network device maximizing total CPU utilization in all cases. Factors could be exclusive to one sandbox while all USB devices such as shared caches and other micro-architectural are shared between two or more sandboxes. resource contention [8],

Predictable Migration and Communication in the Quest-V Multikernel

Clustering with Openmosix

Load Balancing Experiments in Openmosix”, Inter- National Conference on Computers and Their Appli- Cations , Seattle WA, March 2006 0 0 5 4 9

HPC with Openmosix

Introduction Course Outline Why Does This Fail? Lectures Tutorials

The Utility Coprocessor: Massively Parallel Computation from the Coffee Shop

Legoos: a Disseminated, Distributed OS for Hardware Resource

Architectural Review of Load Balancing Single System Image

SEMINARIS DOCENTS DE CASO 03/04 - 2Q NÚM GRUP: Facultat D´Informàtica De Barcelona - Departament AC - UPC

Study of Mosix Software Tool for Cluster Computing in Linux

Process Migration in Distributed Systems

Using Mosix for Wide-Area Compuational Resources

Task Management Issues in Distributed Systems Abstract 1 Introduction 2 Techniques and Attributes of Task Management