In-Kernel Servers on Mach

InKernel ServersonMach Implementation and Performance Jay Lepreau Mike Hibler Bryan Ford Jerey Law Center for Software Science Department of Computer Science University of Utah Salt LakeCity UT Email flepreaumikebafordlawgcsuta hedu Abstract The advantages in mo dularityandpower of microkernelbased op erating systems suchas Macharewell known The existing p erformance problems of these systems however are signicant Much of the p erformance degradation is due to the cost of maintaining separate protection domains traversing software layers and using a semantically richinterpro cess com munication mechanism An approach that optimizes the common case is to p ermit merging of protection domains in p erformance critical applications while maintainin g protection b ound aries for debugging or in situations that demand robustness In our system client calls to the server are eectively b ound either to a simple system call interface or to a full RPC mechanism dep ending on the servers lo cation The optimization reduces argument copies as well as work done in the control path to handle complex and infrequently encountered message typ es In this pap er we present a general metho d of doing this for Mach and the results of applying it to the Mach microkernel and the OSF single server We describ e the necessary mo dications to the kernel the single server and the RPC stub generator Semantic equivalence backwards compatibility and common source and binary co de are preserved Performance on micro and macro b enchmarks is rep orted with RPC p erformance improving by a factor of three Unix system calls to the server improving b etween and a factor of two and p erformance gain on large b enchmarks A breakdown of the times on the RPC path is also presented Intro duction The mo dularity of microkernelbased op erating systems OSs provides them wellknown b en ets These b enets include improved debugging facilities and software engineering improvements which result from a higher degree of system structuring The use of separate protection domains to implement systems on top of a microkernel provides robustness the ability to comp ose servers to exhibit dierent OS p ersonalities oers exibility In addition microkernels exp ort p owerful abstractions which are useful for building distributed systems and multicomputer OSs However this mo dularity and p ower has come at some cost Because the mo dularity is strictly at the task level and tasks communicate via a p owerful but heavy interpro cess communication IPC mechanism all intertask interactions are exp ensive The exp ensiveintertask op erations include the common case of untrusted clients calling trusted servers on the same machine passing small amounts of data back and forth This researchwas sp onsored by the HewlettPackard ResearchGrants Program Although tasks can communicate through shared memorymuchintertask interaction is handled via remote pro cedure calls RPCs A few common IO op erations suchasUnix read are often implemented with shared memory and can b e reasonably fast Although the Mach microkernel has shown isolated instances of p erformance comparable to that of macrokernel systems this has not b een observed in general Manysoftware layers and some hardware b oundaries must b e traversed even for the most trivial interactions b etween tasks All messages exercise much of the IPC mechanism even though common servers such as the Unix server use only a small subset of its capabilities The conclusion is that for intertask calls Mach has failed to optimize the common case Instrumentation of the Mach IPC path on the HewlettPackard PARISC has shown that at least on this platform the traditional context switch from one address space to another is only a minor comp onent of samemachine Mach RPC costs As detailed later most of the cost is in kernel entry and exit co de data copies in the MIGgenerated stub routines and in p ort shuing and related co de Wehave largely eliminated the last two costs in the common case To optimize the common case wehave constructed a separately linked OSF server image referred to as the inkernel server or INKS that can b e loaded into the kernel protection domain at run time We dynamically replace heavyweight IPCbased communicationmechanisms with lighterweight trap and pro cedure call mechanisms where p ossible Wehave extended the RPC stub generator to pro duce sp ecial stubs which enact these new mechanisms Since the change of communication mechanism is essentially a change in pro cedure binding the presence or absence of these mechanisms is transparenttoclients In the rest of this pap er we discuss the goals constraints and design in Section give im plementation details in Section and give status and p erformance results in Section including breakdown of the RPC path microb enchmarks and macrob enchmarks Finallywe examine related work discuss planned and p ossible future work and give our conclusions Design Goals Our goals were vefold Most imp ortantlywewanted to improve the performance of systems based on Mach b oth those implementing Unix and more sp ecialized systems such as network proto col servers Nearly as imp ortant we soughttobeascompatible as p ossible as elab orated b elow Wewanted to make the required system transformations acceptableby imp osing minimal constraints on Mach program structure Wewanted to explore the issues involved and nallywe wanted a sample system as a target for a more dynamic binding mechanism based on an ob ject server wehave implemented Compatibility Constraints To maintain compatibilitywe imp osed a numb er of constraints on the design The source for in and outofkernel servers must b e identical with little use of conditional compilation All changes to the microkernel must b e backwards compatible That is old clients and servers that use normal RPCs must still run New clients must b e able to function with old servers With a small amount of eort new clients and servers should function with old microkernels Note that for Unix programs compatibility is not an issue b ecause the emulator encapsulates all of the interface to Mach and will normally match the installed server Beyond our primary goals wehaveachieved additional compatibility Fully linked binaries of in and outofkernel servers are identical Conformant servers which are dened to b e those which dont directly call machmsg but only use MIGgenerated stubs currently require only a few lines of b oilerplate source co de in order to take full advantage of our system These could b e eliminated by mo dest further changes to MIG or to pieces of the Mach library Nonconformant servers run correctly but with only a slight sp eed increase over outofkernel servers The C threads library required no changes with the exception of one bug x Changes to the OSF server and emulator were mo dest and predominantly stemmed from its co de heritage as a macrokernel Design Overview First we will outline the changes to the kernel and to the RPC stub generator In principle these are the only changes needed to allow conformant servers to run inside the kernel with full p erformance Then we will trace a clienttoserver call which will demonstrate the overall design Later sections will describ e the sp ecial problems that the OSF server and emulator presented When loaded into the kernel a server continues to exist as a distinct task but with two unique attributes it shares the kernel address map ie lives in the kernel address space and its threads run in the same privileged mo de as kernel threads Twochanges were made to the kernel in supp ort of this First a mechanism was added to load a server into the kernel address space and instantiate it as a separate task Second existing supp ort for kernel tasks had to b e extended b oth to supp ort preemption of its threads and to allow them to p erform system calls In addition to supp ort user to server system calls we added kernel infrastructure for server stack management and call registration The other key comp onent in our design is the RPC stub generator whichfortypical Mach clientsserver applications is the MachInterface Generator MIG Wehave pro duced an extended version of MIG called KMIG which includes facilities to supplement remote pro cedure calls by generating sp ecial traps through a dispatch table The sp ecial stubs pro duced by KMIG allow user mo de client programs to use more ecient control paths when making calls to servers loaded into the kernel protection domain This mechanism could also b e made available to those servers when making calls to other servers in the same domain Ifaserver is linked with the corresp onding serverside KMIGgenerated stub routines when it is started it can optionally b e loaded into the kernel Once loaded the servers rst action is to register with the kernel the RPC pro cedure ID ranges it supp orts as client system calls along with avector of p ointers to its serverside stubs These tables are automatically generated by KMIG The KMIGgenerated client stubs are essentially conventional system call traps The system call numb ers are used bythekernel to index into the appropriate server registered table When a client executes a serverprovided system call the kernel observes that it is out of the normal range nds the servers receive p ort from the clients send right name and lo oks up the registered trap table for that server If such a table exists and the kernel nds the vector index within it it sets up some server context stack global base p ointers and task identity and dispatches to the asso

In-Kernel Servers on Mach

Performance Best Practices for Vmware Workstation Vmware Workstation 7.0

Chapter 1. Origins of Mac OS X

Zenon Manual Programming Interfaces

Operating System Architectures

Access Control for the SPIN Extensible Operating System

Designing a Concurrent File Server

Z/OS Basics Preface

Contents of Lecture 8 on UNIX History

A Generic Software Modeling Framework for Building Heterogeneous Distributed and Parallel Software Systems

Microkernels in a Bit More Depth

Extensibility, Safety and Performance in the SPIN Operating

Density Functional Studies on EPR Parameters and Spin-Density Distributions