
The following paper was originally published in the Proceedings of the 3rd Symposium on Operating Systems Design and Implementation, New Orleans, Louisiana, February 1999.

Interface and Execution Models in the Fluke Kernel

Bryan Ford, Mike Hibler, Jay Lepreau, Roland McGrath, Patrick Tullmann University of Utah


Bryan Ford, Mike Hibler, Jay Lepreau, Roland McGrath, Patrick Tullmann
Department of Computer Science, University of Utah

Abstract

We have defined and implemented a kernel API that makes every exported operation fully interruptible and restartable, thereby appearing atomic to the user. To achieve interruptibility, all possible kernel states in which a thread may become blocked for a "long" time are represented as kernel system calls, without requiring the kernel to retain any unexposable internal state.

Since all kernel operations appear atomic, services such as transparent checkpointing and process migration that need access to the complete and consistent state of a process can be implemented by ordinary user-mode processes. Atomic operations also enable applications to provide reliability in a more straightforward manner.

This API also allows us to explore novel kernel implementation techniques and to evaluate existing techniques. The Fluke kernel's single source implements either the "process" or the "interrupt" execution model on both uniprocessors and multiprocessors, depending on a configuration option affecting a small amount of code.

We report preliminary measurements comparing fully, partially, and non-preemptible configurations of both process and interrupt model implementations. We find that the interrupt model has a modest performance advantage in some benchmarks, maximum preemption latency varies by nearly three orders of magnitude, average preemption latency varies by a factor of six, and memory use favors the interrupt model as expected, but not by a large amount. We find that the overhead for restarting the most costly kernel operation ranges from 2–8% of the cost of the operation.

This research was largely supported by the Defense Advanced Research Projects Agency, monitored by the Dept. of the Army under contract DABT63–94–C–0058, and the Air Force Research Laboratory, Rome Research Site, USAF, under agreement F30602–96–2–0269. Contact information: [email protected], Dept. of Computer Science, 50 Central Campus Drive, Rm. 3190, University of Utah, SLC, UT 84112-9205. http://www.cs.utah.edu/projects/flux/.

1 Introduction

An essential issue of operating system design and implementation is when and how one thread can block and relinquish control to another, and how the state of a thread suspended by blocking or preemption is represented in the system. This crucially affects both the kernel interface that represents these states to user code, and the fundamental internal organization of the kernel implementation. A central aspect of this internal structure is the execution model in which the kernel handles in-kernel events such as processor traps, hardware interrupts, and system calls. In the process model, which is used by traditional monolithic kernels such as BSD, Linux, and Windows NT, each thread of control in the system has its own kernel stack—the complete state of a thread is implicitly encoded in its stack. In the interrupt model, used by systems such as V [8], QNX [19], and the exokernel implementations [13, 22], the kernel uses only one kernel stack per processor—thus, for typical uniprocessor configurations, only one kernel stack. A thread in a process-model kernel retains its kernel stack state when it sleeps, whereas in an interrupt-model kernel threads must explicitly save any important kernel state before sleeping, since there is no stack implicitly encoding the state. This saved kernel state is often known as a continuation [11], since it allows the thread to "continue" where it left off.

In this paper we draw attention to the distinction between an interrupt-model kernel implementation—a kernel that uses only one kernel stack per processor, explicitly saving important kernel state before sleeping—and an "atomic" kernel API—a kernel API designed to eliminate implicit kernel state. These two kernel properties are related but fall on orthogonal dimensions, as illustrated in Figure 1. In a purely atomic API, all possible states in which a thread may sleep for a noticeable amount of time are cleanly visible as a kernel entrypoint. For example, the state of a thread involved in any system call is always well-defined, complete, and immediately available for examination or modification by other threads; this is true even if the system call is long-running and consists of many stages. In general, this requires all system calls and exception handling mechanisms to be cleanly interruptible and restartable, in the same way that the instruction sets of modern processor architectures are cleanly interruptible and restartable. For purposes of readability, in the rest of this paper we will refer to an API with these properties as "atomic." We use this term because, from the user's perspective, no thread is ever in the middle of any system call.

We have developed a kernel, Fluke [16], which exports a purely atomic API. This API allows the complete state of any user-mode thread to be examined and modified by other user-mode threads without being arbitrarily delayed.
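To make the payoff concrete, the following fragment, written in the same pseudocode style as the figures later in the paper, sketches how a user-mode checkpointer could exploit such an API. The thread_stop, thread_get_state, and thread_set_state names are illustrative stand-ins, not the actual Fluke entrypoints.

    struct thread_state {
        unsigned pc, sp, gp_regs[8];    /* the complete user-visible register file */
        unsigned pseudo_regs[2];        /* e.g., intermediate IPC state on the x86 */
    };

    void checkpoint(thread_t victim, struct thread_state *out) {
        thread_stop(victim);            /* must return promptly, even if the victim
                                           is blocked in a long-running system call */
        thread_get_state(victim, out);  /* complete and consistent: nothing is hidden
                                           in an in-kernel continuation */
    }

    void restore(thread_t t, const struct thread_state *in) {
        thread_set_state(t, in);        /* the re-created thread behaves as if it
                                           had never been touched */
        thread_resume(t);
    }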

Supporting a purely atomic API slightly increases the width and complexity of the kernel interface but provides important benefits to user-level applications in terms of the power, flexibility, and predictability of system calls.

In addition, Fluke supports both the interrupt and process execution models through a build-time configuration option affecting only a small fraction of the source, enabling a comparison between them. Fluke demonstrates that the two models are not necessarily as different as they have been considered to be in the past; however, they each have strengths and weaknesses. Some processor architectures inherently favor the process model, and process-model kernels are easier to make fully preemptible. Although full preemptibility comes at a cost, this cost is associated with preemptibility, not with the process model itself. Process-model kernels use more per-thread kernel memory, but this is unlikely to be a problem in practice except for power-constrained systems. We show that while an atomic API is beneficial, the kernel's internal execution model is less important: an interrupt-based organization has a slight size advantage, whereas a process-based organization has somewhat more flexibility.

Finally, contrary to conventional wisdom, our kernel demonstrates that it is practical to use legacy process-model code even within interrupt-model kernels. The key is to run the legacy code in user mode but in the kernel's address space.

Our key contributions in this work are: (i) To present a kernel supporting a pure atomic API and demonstrate the advantages and drawbacks of this approach. (ii) To explore the relationship between an "atomic API" and the kernel's execution model. (iii) To present the first comparison between the two kernel implementation models using a kernel that supports both in pure form, revealing that the models are not necessarily as different as commonly believed. (iv) To show that it is practical to use process-model legacy code in an interrupt-model kernel, and to present several techniques for doing so.

The rest of the paper is organized as follows. In Section 2 we look at other systems, both in terms of the "atomicity" of their API and in terms of their execution models. In Section 3 we define the two models more precisely, and examine the implementation issues in each, looking at the strengths and weaknesses each model brings to a kernel. Fluke's atomic API is detailed in Section 4. In the following section, we present six issues of importance to the execution model of a kernel, with measurements based on different configurations of the same kernel. The final section summarizes our analysis.

Figure 1: The kernel execution and API model continuums. [The figure plots kernels along two axes, execution model (interrupt vs. process) against API model (atomic vs. conventional): the interrupt-model and process-model configurations of Fluke sit at the atomic end, BSD at the conventional end, with the original and modified versions of V (Carter) and Mach (Draves) in between.] V was originally a pure interrupt-model kernel but was later modified to be partly process-model; Mach was a pure process-model kernel later modified to be partly interrupt-model. Fluke supports either execution model via compile-time options.

2 Related Work

Related work is grouped into kernels with atomic or near-atomic system call APIs and work related to kernel execution models.

2.1 Atomic System Call API

The clean interruptibility and restartability of instructions is now recognized as a vital property of all modern processor architectures. However, this has not always been the case; as Hennessy and Patterson state:

    This last requirement is so difficult that computers are awarded the title restartable if they pass that test. That supercomputers and many early microprocessors do not earn that badge of honor illustrates both the difficulty of interrupts and the potential cost in hardware complexity and execution speed. [18]

Since kernel system calls appear to user-mode code essentially as an extension of the processor architecture, the OS clearly faces a similar challenge. However, few operating systems have met this challenge nearly as thoroughly as processor architectures have.

For example, the Unix API [20] distinguishes between "short" and "long" operations. "Short" operations such as disk reads are made non-interruptible on the assumption that they will complete quickly enough that the delay will not be noticeable to the application, whereas "long" operations are interruptible but, if interrupted, must be restarted manually by the application. This distinction is arbitrary and has historically been the source of numerous practical problems. The case of disk reads from an NFS server that has gone down is a well-known instance of this problem: the arbitrarily long delays caused by the network make it inappropriate to treat the read operation
In this way, processes cannot be frozen in certain situations such as transactional semantics are provided for high-level ser- while waiting for an acknowledgement from another net- vices such as file operations even though the basic ker- work node. V [8, 32] allows one process to examine and nel operations on which they are built may not be. The modify the state of another, but the retrieved state is in- VINO [29] kernel uses transactions to maintain system complete, and state modification is only allowed if the integrity when executing untrusted extensions target process is awaiting an IPC reply from the modify- downloaded into the kernel. These transactions make ing process. graft invocations appear atomic, even though invocations In scheduler activations [2], user threads have no ker- of the basic kernel API are not wrapped in transactions. nel state at all when they are neither running on a proces- sor in user-mode, nor blocked in the kernel on a system 2.2 Kernel Execution Models call or page fault. However, threads blocked in system Many existing kernels have been built using either the calls have complex state that is represented by a “sched- interrupt or the process model internally: for example, uler activation” kernel object just as it would be by a ker- most Unix systems use the process model exclusively, nel thread object; that state is not available to the user. whereas QNX [19], the Cache Kenrel, and the The Cache Kernel [9] and the Aegis [13] and Xok [22] use the interrupt model exclusively. Other systems such exokernels implement atomic kernel interfaces by re- as Firefly’s Taos [25, 28] were designed with a hybrid stricting the kernel API to managing extremely low-level model where threads often give up their kernel stacks in abstractions so that none of the kernel system calls ever particular situations but can retain them as needed to sim- have any reason to block, and therefore they avoid the plify the kernel’s implementation. [30] used kernel need for handling restarts or interruptions. Although this threads to run process-model kernel activities such as de- is a perfectly reasonable and compelling design strategy, vice driver code, even though the kernel “core” used the the somewhat higher-level Fluke API demonstrates that interrupt model. The V kernel was originally organized strict atomicity can be implemented even in the pres- around a pure interrupt model, but was later adapted by Carter [7] to allow multiple kernel stacks while handling 1If the client’s data buffer into which an IPC reply is to be received page faults. The Mach 3.0 kernel [1] was taken in the is paged out by a user-mode memory manager at the time the reply is opposite direction: it was originally created in the pro- made, the kernel simply discards the reply message rather than allow- cess model, but Draves [10, 11] later adapted it to use a ing the operation to be delayed arbitrarily long by a potentially unco- partial interrupt model by adding continuations in key lo- operative user-mode pager. This usually was not a problem in practice because most paging in the system is handled by the kernel, which is cations in the kernel and by introducing a “stack handoff” trusted to service paging requests promptly. mechanism. However, not all kernel stacks for suspended threads were eliminated. Draves et. al. 
also identified the at these times the kernel may switch to a different thread optimization of continuation recognition, which exploits having its own separate kernel stack state, and then switch explicit continuations to recognize the computation a sus- back later when the first thread’s wait condition is satis- pended thread will perform when resumed, and do part fied. The important point is that each thread retains its or all of that work by mutating the thread’s state with- kernel stack state even while it is sleeping, and therefore out transferring control to the suspended thread’s context. has an implicit “execution context” describing what op- Though it was used to good effect within the kernel, user- eration it is currently performing. Threads may even hold mode code could not take advantage of this optimization kernel resources, such as locks or allocated memory re- technique because the continuation information was not gions, as part of this implicit state they retain while sleep- available to the user. In Fluke, the continuation is explicit ing. in the user-mode thread state, giving the user a full, well- An interrupt model kernel, on the other hand, uses only defined picture of the thread’s state. one kernel stack per processor—for typical uniproces- The ITS system used the process model of execution, sor kernels, just one kernel stack. This stack only holds each thread always having a private kernel stack that the state related to the currently running thread; no state is kernel switched to and from for normal blocking and pre- stored for sleeping threads other than the state explicitly emption. To ensure the API’s atomicity guarantee, im- encoded in its thread control block or equivalent kernel plementations of system calls were required to explicitly data structure. Context switching from one thread to an- update the user register state to reflect partial completion other involves “unwinding” the kernel stack to the begin- of the operation, or to register special cleanup handlers to ning and starting over with an empty stack to service the do so for a system call interrupted during a block. Once new thread. In practice, putting a thread to sleep often in- a system call blocked, an interruption would discard all volves explicitly saving state relating to the thread’s oper- context except the user registers,2 and run these special ation, such as information about the progress it has made cleanup handlers. The implementation burden of these in an I/O operation, in a continuation structure. This con- requirements was eased by the policy that each user mem- tinuation information allows the thread to continue where ory page touched by system call code was locked in core it left off once it is again awakened. By saving the re- until the system call completed or was cleaned up and quired portions of the thread’s state, it essentially per- discarded. forms the function of the per-thread kernel stack in the We are not aware of any previous kernel that simul- process model but without the overhead of a full kernel taneously supported both the “pure” interrupt model and stack. the “pure” process model through a compile-time config- uration option. 3.1 Kernel Structure vs. Kernel API The internal thread handling model employed by the 3 The Interrupt and Process Models kernel is not the only factor in choosing a kernel design. 
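The bookkeeping described above can be pictured as a small structure filled in before the thread gives up the processor; the field and function names below are invented for exposition and do not come from any of the kernels discussed.

    struct continuation {
        void (*resume)(struct thread *t);   /* code to run when the wait is satisfied */
        unsigned bytes_done;                /* progress of the in-flight operation */
        void *user_buf;                     /* parameters the operation still needs */
    };

    void sleep_interrupt_model(struct thread *t, void (*resume)(struct thread *t)) {
        t->cont.resume = resume;            /* everything needed later must be saved
                                               explicitly: the kernel stack is about
                                               to be unwound and reused */
        thread_block(t);                    /* switch threads on the per-processor stack */
    }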
There tends to be a strong correlation between the ker- An essential feature of operating systems is managing nel’s execution model and the kinds of operations pre- many computations on a smaller number of processors, sented by the kernel to application code in the kernel’s typically just one. Each computation is represented in the API. Interrupt-model kernels tend to export short, simple, OS by a thread of control. When a thread is suspended atomic operations that don’t require large, complicated either because it blocks awaiting some event or is pre- continuations to be saved to keep track of a long running empted when the scheduler policy chooses another thread operation’s kernel state. Process-model kernels tend to to run, the system must record the suspended thread’s export longer operations with more stages because they state so that it can continue operation later. The way an are easy to implement given a separate per-thread stack OS kernel represents the state of suspended threads is a and they allow the kernel to get more work done in one fundamental aspect of its internal structure. system call. There are exceptions, however; in particular, In the process model each thread of control in the sys- ITS used one small (40 word) stack per thread despite its tem has its own kernel stack. When a thread makes a provision of an atomic API. [5] system call or is interrupted, the processor switches to Thus, in addition to the execution model of the ker- the thread’s assigned kernel stack and executes an appro- nel itself, a distinction can be drawn between an atomic priate handler in the kernel’s address space. This handler API, in which kernel operations are designed to be short may at times cause the thread to go to sleep waiting for and simple so that the state associated with long-running some event, such as the completion of an I/O request; activities can be maintained mostly by the application process itself, and a conventional API, in which opera- 2 Each ITS thread (or “job”) also had a small set of “user variables” tions tend to be longer and more complex and their state that acted as extra “pseudo-registers” containing additional parameters for certain system calls. Fluke uses a similar mechanism on the is maintained by the kernel invisibly to the application. because it has so few registers. This stylistic difference between kernel API designs is analogous to the “CISC to RISC” shift in processor ar- interactions with other threads in the system. Such state chitecture design, in which complex, powerful operations manipulation operations can be delayed in some cases, are broken into a series of simpler instructions with more but only by activities internal to the kernel that do not state exposed through a wider register file. depend on the promptness of other untrusted application Fluke exports a fully interruptible and restartable threads; this is the API’s promptness requirement. For (“atomic”) API, in which there are no implicit thread example, if a thread is performing an RPC to a server states relevant to, but not visible and exportable to appli- and is waiting for the server’s reply, its state must still be cation code. Furthermore, Fluke’s implementation can be promptly accessible to other threads without delaying the configured at compile-time to use either execution model operation until the reply is received. 
in its pure form (i.., either exactly one stack per proces- In addition, the Fluke API requires that, if the state of sor or exactly one stack per thread); to our knowledge it an application thread is extracted at an arbitrary time by is the first kernel to do so. In fact, it is Fluke’s atomic API another application thread, and then the target thread is that makes it relatively simple for the kernel to run using destroyed, re-created from scratch, and reinitialized with either organization: the difference in the kernel code for the previously extracted state, the new thread must be- the two models amounts to only about two hundred as- have indistinguishably from the original, as if it had never sembly language instructions in the system call entry and been touched in the first place. This is the API’s correct- exit code, and about fifty lines of C in the - ness requirement. ing, exception frame layout, and thread startup code. This Fulfilling only one of the promptness and correctness code deals almost exclusively with stack handling. The requirements is fairly easy for a kernel to do, but strictly configuration option to select between the two models satisfying both is more difficult. For example, if prompt- has no impact on the functionality of the API. Note that ness is not a requirement, and the target thread is blocked the current implementation of Fluke is not highly opti- in a system call, then thread manipulation operations on mized, and more extensive optimization would naturally that target can simply be delayed until the system call is tend to interfere with this configurability since many ob- completed. This is the approach generally taken by de- vious optimizations depend on one execution model or bugging interfaces such as Unix’s ptrace and /proc the other: e.g., the process model implementation could facilities [20], for which promptness is not a primary avoid rolling back and restarting in certain cases, whereas concern—e.g., if users are unable to stop or debug a pro- the interrupt model implementation could avoid returning cess because it is involved in a non-interruptible NFS through layers of function calls by simply truncating the read, they will either just wait for the read to complete stack on context switches. However, similar optimiza- or do something to cause it to complete sooner—such as tions generally apply in either case even though they may rebooting the NFS server. manifest differently depending on the model, so we be- lieve that despite this caveat, Fluke still provides a valu- Similarly, if correctness is not an absolute requirement, able testbed for analyzing these models. The API and then if one thread tries to extract the state of another implementation model properties of the Fluke kernel and thread at an inconvenient time, the kernel can simply re- their relationships are discussed in detail in the following turn the target thread’s “last known” state in hopes that it sections. will be “good enough.” This is the approach taken by the Mach 3.0 API, which provides a thread abort sys- tem call to forcibly break a thread out of a system call in 4 Properties of an Atomic API order to make its state accessible; this operation is guar- An atomic API provides four important and desirable anteed to be prompt, but in some cases may affect the properties: prompt and correct exportability of thread state of the target thread so that it will not behave prop- state, and full interruptibility and restartability of system erly if it is ever resumed. 
To support process migration, calls and other kernel operations. To illustrate these basic the OSF later added a thread abort safely system properties, we will contrast the Fluke API with the more call [27] which provides correctness, but at the expense conventional APIs of the Mach and Unix kernels. of promptness. Prompt and correct state exportability are required to 4.1 Promptness and Correctness varying degrees in different situations. For process con- The Fluke system call API supports the extraction, ex- trol or debugging, correctness is critical since the target amination, and modification of the state of any thread by thread’s state must not be damaged or lost; promptness any other thread (assuming the requisite access checks is not as vital since the and target process are are satisfied). The Fluke API requires the kernel to ensure under the user’s direct control, but a lack of promptness that one thread always be able to manipulate the state of is often perceptible and causes confusion and annoyance another thread in this way without being held up indef- to the user. For conservative garbage collectors which initely as a result of the target thread’s activities or its must check an application thread’s stack and registers for pointers, correctness is not critical as long as the “last- atomic in this way, others fundamentally require the pres- known” register state of the target thread is available. ence of intermediate states. For example, there is an IPC Promptness, on the other hand, is important because with- system call that a thread can use to send a request mes- out it the garbage collector could be blocked for an arbi- sage and then wait for a reply. Another thread may at- trary length of time, causing resource shortages for other tempt to access the thread’s state after the request has threads or even deadlock. User-level checkpointing, pro- been sent but before the reply is received; if this hap- cess migration, and similar services clearly require cor- pens, the request clearly cannot be “un-sent” because it rectness, since without it the state of re-created threads has probably already been seen by the server; however, may be invalid; promptness is also highly desirable and the kernel can’t wait for the reply either since the server possibly critical if the risk of being unable to checkpoint may take arbitrarily long to reply. Mach addressed this or migrate an application for arbitrarily long periods of scenario by allowing an IPC operation to be interrupted time is unacceptable. Mission critical systems often em- between the send (request) and receive (reply) operations, ploy an “auditor” daemon which periodically wakes up later restarting the receive operation from user mode. and tests each of the system’s critical threads and data A more subtle problem is page faults that may occur structures for integrity, which is clearly only possible if a while transferring IPC . Since Fluke IPC does correct snapshot of the system’s state is available without not arbitrarily limit the size of IPC messages, faulting IPC unbounded delay. Common user-mode deadlock detec- operations cannot simply be rolled back to the beginning. tion and recovery mechanisms similarly depend on ex- Additionally, the kernel cannot hold off all accesses to the amination of the state of other threads. In short, although faulting thread’s state, since page faults may be handled real operating systems often get away with providing un- by user-mode servers. 
In Mach, a page fault during an predictable or not-quite-correct thread control semantics, IPC transfer can cause the system call to block until the in general promptness and correctness are highly desir- fault is satisfied (an arbitrarily long period). able properties. Fluke’s atomic API allows the kernel to update system 4.2 Atomicity and Interruptibility call parameters in place in the user-mode registers to re- One natural implication of the Fluke API’s prompt- flect the data transferred prior to the fault. Thus, while ness and correctness requirements for thread control is waiting for the fault to be satisfied both threads are left that all system calls a thread may make must either ap- in the well-defined state of having transferred some data pear completely atomic, or must be cleanly divisible into and about to start an IPC to transfer more. The API for user-visible atomic stages. Fluke system calls is directly analogous to the interface An atomic system call is one that always completes of machine instructions that operate on large ranges of “instantaneously” as far as user code is concerned. If a memory, such as the block-move and string instructions thread’s state is extracted by another thread while the tar- on machines such as the Intel x86 [21]. The buffer ad- get thread is engaged in an atomic system call, the kernel dresses and sizes used by these instructions are stored in will either allow the system call to complete, or will trans- registers, and the instructions advance the values in these parently abort the system call and roll the target thread registers as they work. When the processor takes an inter- back to its original state just before the system call was rupt or page fault during a string instruction, the param- started. (This contrasts with the Unix and Mach APIs, eter registers in the interrupted processor state have been for example, where user code is responsible for restarting updated to indicate the memory about to be operated on, interrupted system calls. In Mach, the restart code is part and the program counter remains at the faulting string in- of the Mach that normally wraps kernel calls; but struction. When the fault is resolved, simply jumping to there are intermediate states in which system calls cannot that program counter with that register state resumes the be interrupted and restarted, as discussed below.) string operation in the exact spot it left off. Because of the promptness requirement, the Fluke ker- Table 1 breaks the Fluke system call API into four nel can only allow a system call to complete if the tar- categories, based on the potential length of each system get thread is not waiting for any event produced by some call. “Trivial” system calls are those that will always run other user-level activity; the system call must be cur- to completion without putting the thread to sleep. For rently running (i.e., on another processor) or it must be example, Fluke’s thread self (analogous to Unix’s waiting on some kernel-internal condition that is guar- getpid) will always fetch the current thread’s identi- anteed to be satisfied “soon” without any user-mode in- fier without blocking. “Short” system calls usually run to volvement. For example, a short, simple operation such completion immediately, but may encounter page faults as Fluke’s equivalent of getpid will always be allowed or other exceptions during processing. If an exception to run to completion; whereas sleeping operations such happens then the system call will roll back and restart. 
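Written out, the idiom looks like the following loop, in which the only record of progress is the exported parameter registers themselves; the names are illustrative, not Fluke kernel source.

    int copy_words(unsigned *src_reg, unsigned *dst_reg, unsigned *count_reg) {
        while (*count_reg > 0) {
            int rc = copy_one_word(*src_reg, *dst_reg);  /* may raise a page fault */
            if (rc != SUCCESS)
                return rc;       /* the registers already describe the remaining work */
            *src_reg += 4;       /* advance progress exactly as an x86 string */
            *dst_reg += 4;       /* instruction advances its operand registers */
            *count_reg -= 1;
        }
        return SUCCESS;
    }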
as mutex are interrupted and rolled back. “Long” system calls are those that can be expected to While many Fluke system calls can easily be made sleep for an extended period of time (e.g., waiting on a Type Examples Count Percent Object Description Trivial thread self 8 7% Mutex A kernel-supported mutex which is safe for Short mutex trylock 68 64% sharing between processes. Long mutex lock 8 7% Cond A kernel-supported condition variable. Multi-stage cond wait,IPC 23 22% Mapping Encapsulates an imported region of mem- Total 107 100% ory; associated with a Space (destination) and Region (source). Region Encapsulates an exportable region of mem- Table 1: Breakdown of the number and types of system calls in ory; associated with a Space. the Fluke API. “Trivial” system calls always run to completion. Port Server-side endpoint of an IPC. “Short” system calls usually run to completion immediately, but Portset A set of Ports on which a server thread waits. may roll back. “Long” system calls can be expected to sleep Space Associates memory and threads. indefinitely. “Multi-stage” system calls can be interrupted at Thread A thread of control, associated with a Space. intermediate points in the operation. Reference A cross-process handle on a Mapping, Re- gion, Port, Thread or Space. Most often used as a handle on a Port that is used for initiat- ing client-side IPC. condition variable). “Multi-stage” system calls are those that can cause the calling thread to sleep indefinitely and can be interrupted at various intermediate points in the Table 2: The primitive object types exported by the Fluke ker- operation. nel. Except for cond wait and region search—a system call which can be passed an arbitrarily large re- 4.3 Examples from Fluke gion of memory—all of the multi-stage calls in the Fluke The Fluke kernel directly supports nine primitive ob- API are IPC-related. Most of these calls simply repre- ject types, listed in Table 2. All types support a common sent different options and combinations of the basic send set of operations including create, destroy, “rename,” and receive primitives. Although all of these entrypoints “point-a-reference-at,” “get objstate,” and “set objstate.” could easily be rolled into one, as is done in Mach, the Obviously, each type also supports operations specific Fluke API’s design gives preference to exporting several to its nature; for example, a Mutex supports lock and simple, narrow entrypoints with few parameters rather unlock operations (the complete API is documented than one large, complex entrypoint with many parame- in [15]). The majority of kernel operations are trans- ters. This approach enables the kernel’s critical paths to parently restartable. For example, port reference, be streamlined by eliminating the need to test for vari- a “short” system call, takes a Port object and a Reference ous options. However, the issue of whether system call object and “points” the reference at the port. If either ob- options are represented as additional parameters or as ject is not currently mapped into memory3, a page fault separate entrypoints is orthogonal to the issue of atom- IPC will be generated by the kernel after which the ref- icity and interruptibility; the only difference is that if a erence extraction will be restarted. In all such cases page multi-stage IPC operation in Fluke is interrupted, the ker- faults are generated very early in the system call, so little nel may occasionally modify the user-mode instruction work is thrown away and redone. 
pointer to refer to a different system call entrypoint in Simple operations that restart after an error are fairly addition to updating the other user-mode registers to in- uninteresting. The more interesting ones are those that dicate the amount of data remaining to be transferred. update their parameters, or even the system call entry In [31] we more fully discuss the consequences of pro- point, to record partial success of the operation. The sim- cond wait viding an atomic API. In summary, the purely atomic plest example of this is the operation which API greatly facilitates the job of user-level checkpoint- atomically blocks on a condition variable, releasing the ers, process migrators, and systems. associated mutex, and, when the calling thread is wo- The correct, prompt access to all relevant kernel state of ken, reacquires the mutex. In Fluke, this operation is cond wait any thread in a system makes user-level managers them- broken into two stages: the portion and the mutex lock selves correct and prompt. Additionally, the clean, uni- . To represent this, before a thread sleeps form management of thread state in an atomic API frees on the condition variable, the thread’s user-mode instruc- mutex lock the managers from having to detect and handle obscure tion pointer is adjusted to point at the entry corner cases. Finally, such an API simplifies the kernel point, and the mutex argument is put into the appropriate itself and is fundamental to allowing the kernel imple- 3In Fluke, kernel objects are mapped into the address space of an mentation easily to use either the process or the interrupt application with the virtual address serving as the “handle” and memory model; this factor will be discussed in Section 5. protections providing access control. register for the new entrypoint. Thus, if the thread is in- Actual Cause of Exception Cost to Cost to terrupted or awoken it will automatically retry the mutex Remedy Rollback lock and not the whole condition variable wait. Client-side soft page fault 18.9 none An example of a system call that updates its parame- Client-side hard page fault 118 2.2 ters is the ipc client send system call, which sends Server-side soft page fault 29.3 2.5 data on an already established IPC connection to a wait- Server-side hard page fault 135 6.8 ing server thread. The call might either be the result of an explicit invocation by user code, or its invocation Table 3: Breakdown of restart costs in microseconds for pos- could have been caused implicitly by the kernel due to the sible kernel-internal exceptions during a reliable IPC trans- earlier interruption of a longer operation such as ipc - fer, the area of the kernel with the most internal syn- client connect send, which establishes a new IPC chronization (specifically, ipc client connect send - connection and then sends data across it. Regardless of over receive). “Actual Cause” describes the reason the ex- how ipc client send was called, at the time of entry ception was raised: either a “soft” page fault (one for which the one well-defined processor register contains the number kernel can derive a entry based on an entry higher of words to transfer and another register contains the ad- in the memory mapping hierarchy) or a “hard” page fault (re- dress of the data buffer. As the data are transferred, the quiring an RPC to a user-level memory manager) in either the pointer register is incremented and the word count reg- client or server side of the IPC. 
“Cost to Rollback” is roughly the amount of work thrown away and redone, while “Cost to ister decremented to reflect the new start of the data to Remedy” approximates the amount of work needed to service transfer and the new amount of data to send. For exam- the fault. Results were obtained on a 200-Mhz Pentium Pro ple, if an IPC tries to send 8,192 bytes starting from ad- with the Fluke kernel configured using a process model without dress 0x08001800 and successfully transfers the first kernel thread preemption. 6,144 bytes and then causes a page fault, the registers will be updated to reflect a 2,048 byte transfer starting at address 0x08003000. Thus, the system call can operations. However, we have found in practice that be cleanly restarted without redoing any transfers. The although these seldom-used entrypoints are mandated IPC connection state itself is stored as part of the current by the fully-interruptible API design, they are also di- thread’s control block in the kernel so it is not passed as rectly useful to some applications; there are no Fluke an explicit parameter, though that state too is cleanly ex- entrypoints whose purpose is solely to provide a pure portable through a different mechanism. Interfaces of this interrupt-model API. type are relatively common in Fluke, and the majority of Thread state size: Additional user-visible thread state the IPC interfaces exploit both parameter and program may be required. For example, in Fluke on the x86, counter manipulation. due to the shortage of processor registers, two 32-bit “pseudo-registers” implemented by the kernel are in- 4.4 Disadvantages of an Atomic API cluded in the user-visible thread state frame to hold in- This discussion reveals several potential disadvantages termediate IPC state. These pseudo-registers add a little of an atomic API: more complexity to the API, but they never need to be Design effort required: The API must be carefully de- accessed directly by user code except when saving and signed so that all intermediate kernel states in which a restoring thread state, so they do not in practice cause a thread may have to wait indefinitely can be represented performance burden. in the explicit user-accessible thread state. Although the Overhead from Restarting Operations: During some Fluke API demonstrates that this can be done, in our ex- system calls, various events can cause the thread’s state perience it does take considerable effort. As a simple to be rolled back, requiring a certain amount of work to example, consider the requirement that updatable sys- be re-done later. Our measurements, summarized in Ta- tem call parameters be passed in registers. If instead ble 3, show this not to be a significant cost. Application parameters were passed on the user-mode stack, modi- threads rarely access each other’s state (e.g., only during fying one might cause a page fault—an indefinite wait— the occasional checkpoint or migration), so although it potentially exposing an inconsistent intermediate state. is important for this to be possible, it is not the com- API width: Additional system call entrypoints (or ad- mon case. The only other situation in which threads are ditional options to existing system calls) may be re- rolled back is when an exception such as a page fault oc- quired to represent these intermediate states, effectively curs, and in such cases, the time required to handle the widening the kernel’s API. 
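The register update in that example amounts to the following arithmetic; the structure and field names are illustrative, but the values correspond to the 8,192-byte transfer described above.

    struct user_regs { unsigned buf_addr; unsigned word_count; };

    void record_partial_send(struct user_regs *r, unsigned bytes_done) {
        r->buf_addr   += bytes_done;      /* 0x08001800 + 6144 = 0x08003000 */
        r->word_count -= bytes_done / 4;  /* 2048 words - 1536 = 512 words (2,048 bytes) */
        /* the user-mode program counter is left at (or pointed to) the
           ipc_client_send entrypoint, so resuming the thread simply
           re-issues the call for the data that remains */
    }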
For example, in the Fluke exception invariably dwarfs the time spent re-executing API, there are five system calls that are rarely called di- a small piece of system call code later. rectly from user-mode programs, and are instead usu- Architectural bias: Certain older processor architec- ally only used as “restart points” for interrupted kernel tures make it impossible for the kernel to provide cor- msg_send_rcv(msg, option, rect and prompt state exportability, because the proces- send_size, rcv_size, ...) { sor itself does not do so. For example, the Motorola rc = msg_send(msg, option, 68020/030 saved state frame includes some undocu- send_size, ...); mented fields whose contents must be kept unmodified if (rc != SUCCESS) return rc; by the kernel; these fields cannot safely be made acces- sible and modifiable by user-mode software, and there- rc = msg_rcv(msg, option, rcv_size, ...); fore a thread’s state can never be fully exportable when return rc; certain floating-point operations are in progress. How- } ever, most other architectures, including the x86 and even other 68000-class processors, such as the 68040, Figure 2: An example IPC send and receive path in a process do not have this problem. model kernel. Any waiting or fault handling during the opera- In practice, none of these disadvantages has caused us tion must keep the kernel stack bound to the current thread. significant problems in comparison to the benefits of cor- rect, prompt state exportability. The following sections discuss these issues in detail and 5 Kernel Execution Models provide concrete measurement results where possible. We now return to the issue of the execution model used 5.1 Exported API in a kernel’s implementation. While there is typically a In kernels with medium-to-high level API’s, one of strong correlation between a kernel’s API and its internal the most common objections to the interrupt-based ex- execution model, in many ways these issues are indepen- ecution model is that it requires the kernel to maintain dent. In this section we report our experiments with Fluke explicit continuations. Our observation is that contin- and, previously, with Mach, that demonstrate the follow- uations are not a fundamental property of an interrupt- ing findings. model kernel, but instead are the symptom of the mis- Exported API: A process-model kernel can easily im- match between the kernel’s API and its implementation. plement either style of API, but an interrupt-model ker- In brief, an interrupt-model kernel only requires contin- nel has a strong “preference” for an atomic API. uations when implementing a conventional API; when Preemptibility: It is easier to make a process-model an interrupt-model kernel serves an atomic API, explicit kernel preemptible, regardless of the API it exports; user-visible register state of a thread acts as the “continu- however, it is easy to make interrupt-model kernels ation.” partly preemptible by adding preemption points. Performance: Depending on the application running Continuations on the kernel, either a process-model or interrupt-model To illustrate this difference, consider the IPC pseu- kernel can be faster, but not by much. In terms of pre- docode fragments in Figures 2, 3, and 4. 
The first shows a emption latency, an interrupt-model kernel can perform very simplified version of a combined IPC message send- as well as an equivalently configured process-model ker- and-receive system call similar to the mach msg trap nel, but a fully-preemptible process-model kernel pro- system call inside the original process-model Mach 3.0 vides the lowest latency. kernel. The code first calls a to send a mes- sage; if that succeeds, it then calls a second routine to Memory use: Naturally, process-model kernels use receive a message. If an error occurs in either stage, the more memory because of the larger number of kernel entire operation is aborted and the system call finishes by stacks in the system; on the other hand, the size of ker- passing a return code back to the user-mode caller. This nel stacks sometimes can be reduced to minimize this structure implies that any exceptional conditions that oc- disadvantage. cur along the IPC path that should not cause the operation Architectural bias: Some CISC architectures that insist to be completely aborted, such as the need to wait for an on providing automatic stack handling, such as the x86, incoming message or service a page fault, must be han- are fundamentally biased towards the process model, dled completely within these by blocking the whereas most RISC architectures support both models current thread while retaining its kernel stack. Once the equally well. msg send rcv call returns, the system call is complete. Legacy code: Since most existing, robust, easily avail- Figure 3 shows pseudocode for the same IPC path able OS code, such as device drivers and file systems, modified to use a partial interrupt-style execution envi- is written for the process model, it is easiest to use this ronment, as was done by Draves in the Mach 3.0 contin- legacy code in process-model kernels. However, it is uations work [10, 11]. The first stage of the operation, also possible to use this code in interrupt-model kernels msg send, is expected to retain the current kernel stack, with a slight performance penalty. as above; any page faults or other temporary conditions msg_send_rcv(msg, option, msg_send_rcv(cur_thread) { send_size, rcv_size, ...) { rc = msg_send(cur_thread); rc = msg_send(msg, option, if (rc != SUCCESS) send_size, ...); return rc; if (rc != SUCCESS) return rc; set_pc(cur_thread, msg_rcv_entry); rc = msg_rcv(cur_thread); cur_thread->continuation.msg = msg; cur_thread->continuation.option = option; if (rc != SUCCESS) cur_thread->continuation.rcv_size = rcv_size; return rc; ... return SUCCESS; } rc = msg_rcv(msg, option, rcv_size, ..., msg_rcv_continue); return rc; Figure 4: Example IPC send and receive path for a kernel } exporting an atomic API. The set pc operation effectively msg_rcv_continue(cur_thread) { serves the same purpose as saving a continuation, using the msg = cur_thread->continuation.msg; user-visible register state as the storage area for the continua- option = cur_thread->continuation.option; tion. Exposing this state to user mode as part of the API pro- rcv_size = cur_thread->continuation.rcv_size; vides the benefits of a purely atomic API and eliminates much ... of the traditional complexity of continuations. The kernel never needs to save parameters or other continuation state on entry rc = msg_rcv(msg, option, rcv_size, ..., msg_rcv_continue); because it is already in the thread’s user-mode register state. 
return rc; } including the continuation function pointer itself (point- ing to msg rcv continue), is part of the thread’s log- Figure 3: Example interrupt model IPC send and receive path. ical state but is inaccessible to user code. State defining the “middle” of the send-receive is saved away by the kernel after msg send in the case that the msg rcv is Interrupt-Model Kernels Without Continuations interrupted. Special code, msg rcv continue, is needed to Finally, contrast these two examples with correspond- handle restart from a continuation. ing code in the style used throughout the Fluke kernel, shown in Figure 4. Although this code at first appears during this stage must be handled in process-model fash- very similar to the code in Figure 2, it has several fun- ion, without discarding the stack. However, in the com- damental differences. First, system call parameters are mon case where the subsequent receive operation must passed in registers rather than on the user stack. The sys- wait for an incoming message, the msg rcv function can tem call entry and exit code saves the appropriate reg- discard the kernel stack while waiting. When the wait is isters into the thread’s control block in a standard for- satisfied or interrupted, the thread will be given a new mat, where the kernel can read and update the parame- kernel stack and the msg rcv continue function will ters. Second, since the system call parameters are stored be called to finish processing the msg send rcv system in the register save area of the thread’s control block, call. The original parameters to the system call must be no unique, per-system call continuation state is needed. saved explicitly in a continuation structure in the current Third, when an internal system call handler returns a thread, since they are not retained on the kernel stack. nonzero result code, the system call exit layer does not Note that although this modification partly changes the simply complete the system call and pass this result code system call to have an interrupt model implementation, back to the user. Return values in the kernel are only it still retains its conventional API semantics as seen by used for kernel-internal exception processing; results in- user code. For example, if another thread attempts to ex- tended to be seen by user code are returned by modify- amine this thread’s state while it is waiting continuation- ing the thread’s saved user-mode register state. Finally, style for an incoming message, the other thread will ei- if the msg send stage in msg send rcv completes ther have to wait until the system call is completed, or successfully, the kernel updates the user-mode program the system call will have to be aborted, causing loss of counter to point to the user-mode system call entrypoint state.4 This is because the thread’s continuation structure, for msg rcv before proceeding with the msg rcv stage. Thus, if the msg rcv must wait or encounters a page 4In this particular situation in Mach, the mach msg trap opera- fault, it can simply return an appropriate (kernel-internal) tion gets aborted with a special return code; standard library user-mode result code. The thread’s user-mode register state will code can detect this situation and manually restart the IPC. 
However, be left so that when normal processing is eventually re- there are many other situations, such as page faults occurring along the IPC path while copying data, which, if aborted, cannot be reliably sumed, the msg rcv system call will automatically be restarted in this way. invoked with the appropriate parameters to finish the IPC operation. Configuration Description The upshot of this is that in the Fluke kernel, the Process NP Process model with no kernel preemption. thread’s explicit user-mode register state acts as the “con- Requires no kernel-internal locking. Com- tinuation,” allowing the kernel stack to be thrown away parable to a uniprocessor Unix system. Process PP Process model with “partial” kernel pre- or reused by another thread if the system call must wait emption. A single explicit preemption or handle an exception. Since a thread’s user-mode reg- point is added on the IPC data copy path, ister state is explicit and fully visible to user-mode code, checked after every 8k of data is trans- it can be exported at any time to other threads, thereby ferred. Requires no kernel locking. providing the promptness and correctness properties re- Process FP Process model with full kernel preemp- quired by the atomic API. Furthermore, this atomic API tion. Requires blocking mutex locks for in turn simplifies the interrupt model kernel implementa- kernel locking. tion to the point of being almost as simple and clear as Interrupt NP Interrupt model with no kernel preemp- the original process model code in Figure 2. tion. Requires no kernel locking. Interrupt PP Interrupt model with partial preemption. Uses the same IPC preemption point as in 5.2 Preemptibility Process PP. Requires no kernel locking.
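The explicit preemption point used in the PP configurations can be sketched as follows, reusing the illustrative user_regs fields from the earlier sketch; because the API is atomic, all the check must guarantee is that the thread's exported parameters describe the remaining work before the scheduler runs. The function names are invented for exposition.

    void ipc_copy_data(struct user_regs *r) {
        while (r->word_count > 0) {
            copy_chunk(r, 8 * 1024);        /* copies at most 8KB and advances
                                               buf_addr / word_count as it goes */
            if (preemption_pending()) {
                /* r already describes the remaining transfer, so the thread can
                   be resumed later simply by re-entering the IPC system call; a
                   process-model kernel may switch stacks here, while an
                   interrupt-model kernel returns through the system call exit path */
                yield_to_scheduler();
            }
        }
    }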

Although the use of an atomic API greatly reduces the kernel complexity and the burden traditionally associated Table 4: Labels and characteristics for the different Fluke ker- with interrupt-model kernels, there are other relevant fac- nel configurations used in test results. tors as well, such as kernel preemptibility. Low preemp- tion latency is a desirable kernel characteristic, and is 5.3 Performance critical in real-time systems and in such as L3 [23] and VSTa [33] that dispatch hardware interrupts The Fluke kernel supports a variety of build-time con- to device drivers running as ordinary threads (in which figuration options that control the execution model of case preemption latency effectively becomes interrupt- the kernel; by comparing different configurations of the handling latency). Since preemption can generally occur same kernel, we can analyze the properties of these dif- at any time while running in user mode, it is the kernel it- ferent execution models. We explore kernel configura- self that causes preemption latencies that are greater than tions along two axes: interrupt versus process model and the hardware minimum. full versus partial (explicit preemption points) versus no preemption. Since full kernel preemptibility requires the In a process-model kernel that already supports multi- ability to block within the kernel and is therefore incom- processors, it is often relatively straightforward to make patible with the interrupt model, there are five possible most of the kernel preemptible by changing locks configurations, summarized in Table 4. into blocking locks (e.g., mutexes). Of course, a cer- Table 5 shows the relative performance of three appli- tain core component of the kernel, which implements cations on the Fluke kernel under various kernel config- and preemption itself, must still remain non- urations. For each application, the execution times for preemptible. Implementing kernel preemptibility in this all kernel configurations are normalized to the execution manner fundamentally relies on kernel stacks being re- time of that application on the “base” configuration: pro- tained by preempted threads, so it clearly would not work cess model with no kernel preemption. For calibration, in a pure interrupt-model kernel. The Fluke kernel can be the raw execution time is given for the base configura- configured to support this form of kernel preemptibility tion. The non-fully-preemptible kernels were run both in the process model. with and without partial preemption support on the IPC Even in an interrupt model kernel, important parts of path. All tests were run on a 200MHz Pentium Pro PC the kernel can often be made preemptible as long as pre- with 256KB L2 cache and 64MB of memory. The appli- emption is carefully controlled. For example, in micro- cations measured are: kernels that rely heavily on IPC, many long-running ker- Flukeperf: A series of tests to time various synchro- nel operations tend to be IPCs that copy data from one nization and IPC primitives. It performs a large number process to another. It is relatively easy to support par- of kernel calls and context switches. tial preemptibility in a kernel by introducing preemption points in select locations, such as on the data copy path. Memtest: Accesses 16MB of memory one byte at a time Besides supporting full kernel preemptibility in the pro- sequentially. 
5.3 Performance

The Fluke kernel supports a variety of build-time configuration options that control the execution model of the kernel; by comparing different configurations of the same kernel, we can analyze the properties of these different execution models. We explore kernel configurations along two axes: interrupt versus process model, and full versus partial (explicit preemption points) versus no preemption. Since full kernel preemptibility requires the ability to block within the kernel and is therefore incompatible with the interrupt model, there are five possible configurations, summarized in Table 4.

Configuration   Description
Process NP      Process model with no kernel preemption. Requires no kernel-internal locking. Comparable to a uniprocessor Unix system.
Process PP      Process model with "partial" kernel preemption. A single explicit preemption point is added on the IPC data copy path, checked after every 8KB of data is transferred. Requires no kernel locking.
Process FP      Process model with full kernel preemption. Requires blocking mutex locks for kernel locking.
Interrupt NP    Interrupt model with no kernel preemption. Requires no kernel locking.
Interrupt PP    Interrupt model with partial preemption. Uses the same IPC preemption point as in Process PP. Requires no kernel locking.

Table 4: Labels and characteristics for the different Fluke kernel configurations used in test results.

Table 5 shows the relative performance of three applications on the Fluke kernel under various kernel configurations. For each application, the execution times for all kernel configurations are normalized to the execution time of that application on the "base" configuration: process model with no kernel preemption. For calibration, the raw execution time is given for the base configuration. The non-fully-preemptible kernels were run both with and without partial preemption support on the IPC path. All tests were run on a 200MHz Pentium Pro PC with 256KB L2 cache and 64MB of memory. The applications measured are:

Flukeperf: A series of tests to time various synchronization and IPC primitives. It performs a large number of kernel calls and context switches.

Memtest: Accesses 16MB of memory one byte at a time, sequentially. Memtest runs under a memory manager which allocates memory on demand, exercising kernel fault handling and the exception IPC facility.

Gcc: Compiles a single .c file. This test includes running the front end, the C preprocessor, the C compiler, the assembler, and the linker to produce a runnable Fluke binary.

Configuration   memtest         flukeperf       gcc
Process NP      1.00 (2884ms)   1.00 (7120ms)   1.00 (7150ms)
Process PP      1.00            1.01            1.03
Process FP      1.11            1.20            1.05
Interrupt NP    1.00            0.94            1.03
Interrupt PP    1.00            0.94            1.03

Table 5: Performance of three applications on various configurations of the Fluke kernel. Execution time is normalized to the performance of the process-model kernel without kernel preemption (Process NP), for which absolute times are also given.

Configuration   avg latency (µs)   max latency (µs)   runs   missed
Process NP      28.9               7430               7594   132
Process PP      18.0               1200               7805   5
Process FP      5.14               19.6               9212   0
Interrupt NP    30.4               7356               7348   141
Interrupt PP    18.7               1272               7531   7

Table 6: Effect of execution model on preemption latency. We measure the average and maximum time (µs) required for a periodic high-priority kernel thread to start running after being scheduled, while competing with lower-priority application threads. Also shown is the number of times the kernel thread runs during the lifetime of the application and the number of times it failed to complete before the scheduling interval.

As expected, the performance of the fully-preemptible kernel is somewhat worse than the other configurations due to the need for kernel locking. The extent of the degradation varies from 20% for the kernel-intensive flukeperf test to only 5% for the more user-mode-oriented gcc. Otherwise, the interrupt- and process-model kernels are nearly identical in performance except for the flukeperf case. In flukeperf we are seeing a positive effect of an interrupt-model kernel implementation. Since a thread will restart an operation after blocking rather than resuming from where it slept in the kernel, there is no need to save the thread's kernel-mode register state on a context switch. In Fluke this translates to eliminating six 32-bit memory reads and writes on every context switch.
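
The saving can be pictured with the following sketch, which is purely illustrative: the six words shown (the x86 callee-saved registers plus stack and instruction pointers) are an assumed layout, since the measurement above states only that six 32-bit reads and writes are eliminated.

    /* Illustrative contrast between the two switch paths; not Fluke's
       actual code.  A process-model switch must save and restore the
       outgoing thread's kernel-mode execution context, while an
       interrupt-model switch has no kernel context to preserve, only
       the user-mode trapframe already recorded in the thread structure. */
    struct kctx       { unsigned long ebx, esi, edi, ebp, esp, eip; };
    struct utrapframe { unsigned long regs[8]; };   /* user-mode state */
    struct thread     { struct kctx kctx; struct utrapframe uframe; };

    extern void save_kctx(struct kctx *c);       /* six 32-bit stores */
    extern void load_kctx(const struct kctx *c); /* six 32-bit loads  */
    extern void return_to_user(const struct utrapframe *f);

    void switch_process_model(struct thread *from, struct thread *to)
    {
        save_kctx(&from->kctx);     /* preserve where we slept in the kernel */
        load_kctx(&to->kctx);       /* resume the next thread's kernel context */
    }

    void switch_interrupt_model(struct thread *to)
    {
        /* The outgoing thread was already rolled back to a restartable
           user-mode state, so we simply resume the incoming thread. */
        return_to_user(&to->uframe);
    }
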
Because the applications shown are all single-threaded, the results in Table 5 do not realistically reflect the impact of preemption. To measure the effect of the execution model on preemption latency, we introduce a second, high-priority kernel thread which is scheduled every millisecond, and record its observed preemption latencies during a run of flukeperf. The flukeperf application is used because it performs a number of large, long-running IPC operations ideal for inducing preemption latencies.

Table 6 summarizes the experiment. The first two columns are the average and maximum observed latency in microseconds. The last two columns of the table show the number of times the high-priority kernel thread ran over the course of the application and the number of times it could not be scheduled because it was still running or queued from the previous interval. As expected, the fully-preemptible (FP) kernel permits much smaller and more predictable latencies and allowed the high-priority thread to run without missing an event. The non-preemptible (NP) kernel configuration exhibits highly variable latency for both the process and interrupt models, causing a large number of missed events. Though we implement only a single explicit preemption point on the IPC data copy path, the partial preemption (PP) configuration fares well on this benchmark. This result is not surprising given that the benchmark performs a number of large IPC operations, but it illustrates that a few well-placed preemption points can greatly reduce preemption latency in an otherwise nonpreemptible kernel.

5.4 Memory Use

One of the perceived benefits of the interrupt model is the memory saved by having only one kernel stack per processor rather than one per thread. For example, Mach's average per-thread kernel memory overhead was reduced by 85% when the kernel was changed to use a partial interrupt model [10, 11]. Of course, the overall memory used in a system for thread management overhead depends not only on whether each thread has its own kernel stack, but also on how big these kernel stacks are and how many threads are generally used in a realistic system.

To provide an idea of how these factors add up in practice, we show in Table 7 memory usage measurements gathered from a number of different systems and configurations. The Mach figures are as reported in [10]: the process-model numbers are from MK32, an earlier version of the Mach kernel, whereas the interrupt-model numbers are from MK40. The L3 figures are as reported in [24]. For Fluke, we show three different rows: two for the process model using two different stack sizes, and one for the interrupt model.

The two process-model stack sizes for Fluke bear special attention. The smaller 1K stack size is sufficient only in the "production" kernel configuration, which leaves out various kernel debugging features, and only when the device drivers do not run on these kernel stacks. (Fluke's device drivers were "borrowed" [14] from legacy systems and require a 4K stack.)

System    Execution Model   TCB Size   Stack Size   Total Size
FreeBSD   Process           2132       6700         8832
Linux     Process           2395       4096         6491
Mach      Process           452        4022         4474
Mach      Interrupt         690        —            690
L3        Process           —          1024         1024
Fluke     Process           —          4096         4096
Fluke     Process           —          1024         1024
Fluke     Interrupt         300        —            300

Table 7: Memory overhead in bytes due to thread/process management in various existing systems and execution models.

To summarize these results, although it is true that interrupt-model kernels most effectively minimize kernel thread memory use, at least for modest numbers of active threads, much of this reduction can also be achieved in process-model kernels simply by structuring the kernel to avoid excessive stack requirements. At least on the x86 architecture, as long as the thread management overhead is about 1K or less per thread, there appears to be no great difference between the two models for modest numbers of threads. However, real production systems may need larger stacks and also may want them to be a multiple of the page size in order to use a "red zone." These results should apply to other architectures, although the basic sizes may be scaled by an architecture-specific factor. For all but power-constrained systems, the memory differences are probably in the noise.
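
For concreteness, a hypothetical workload with 1000 active threads would need roughly 1000 x 1024 bytes (about 1MB) of thread-management memory in the 1K process-model Fluke configuration, about 1000 x 300 bytes (about 0.3MB, plus a single per-processor kernel stack) in the interrupt-model configuration, and roughly 8.8MB at FreeBSD's 8832 bytes per thread; on anything but a memory-starved system, the difference between the first two figures is indeed negligible.
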
5.5 Architectural Bias

Besides the more fundamental advantages and disadvantages of each model discussed above, in some cases there are advantages to one model artificially caused by the design of the underlying processor architecture. In particular, traditional CISC architectures, such as the x86 and 680x0, tend to be biased somewhat toward the process model and make the kernel programmer jump through various hoops to write an interrupt-model kernel. With a few exceptions, more recent RISC architectures tend to be fairly unbiased, allowing either model to be implemented with equal ease and efficiency.

Unsurprisingly, the architectural property that causes this bias is the presence of automatic stack management and stack switching performed by the processor. For example, when the processor enters supervisor mode on the x86, it automatically loads the new supervisor-mode stack pointer, and then pushes the user-mode stack pointer, instruction pointer (program counter), and possibly several other registers onto this supervisor-mode stack. Thus, the processor automatically assumes that the kernel stack is associated with the current thread. To build an interrupt-model kernel on such a "process-model architecture," the kernel must either copy this data on kernel entry from the per-processor stack to the appropriate thread control block, or it must keep a separate, "minimal" process-model stack as part of each thread control block, where the processor automatically saves the thread's state on kernel entry before kernel code manually switches to the "real" kernel stack. Fluke in its interrupt-model configuration uses the former technique, while Mach uses the latter.

Most RISC processors, on the other hand, including the MIPS, PA-RISC, and PowerPC, use "shadow registers" for exception and interrupt handling rather than explicitly supporting stack switching in hardware. When an interrupt or exception occurs, the processor merely saves off the original user-mode registers in special one-of-a-kind shadow registers, and then disables further interrupts until they are explicitly re-enabled by software. If the OS wants to support nested exceptions or interrupts, it must then store these registers on the stack itself; it is generally just as easy for the OS to save them on a per-processor interrupt-model stack as it is to save them on a per-thread process-model stack. A notable exception among RISC processors is the SPARC, with its stack-based register window feature.

To examine the effect of architectural bias on the x86, we compared the performance of the interrupt- and process-model Fluke kernels in otherwise completely equivalent configurations (using no kernel preemption). On a 100MHz Pentium CPU, the additional trap and system call overhead introduced in the interrupt-model kernel by moving the saved state from the kernel stack to the thread structure on entry, and back again on exit, amounts to about six cycles (60ns). In contrast, the minimal hardware-mandated cost of entering and leaving supervisor mode is about 70 cycles on this processor. Therefore, even for the fastest possible system call the interrupt-model overhead is less than 10%, and for realistic system calls it is in the noise. We conclude that although this architectural bias is a significant factor in terms of programming convenience, and may be important if it is necessary to "squeeze every last cycle" out of a critical path, it is not a major performance concern in general.
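
The entry-path copy that this measurement refers to can be sketched roughly as follows; the frame layout and helper names are hypothetical, and real code of this sort would live in the assembly entry stubs rather than in C.

    /* Illustrative only: on the x86 the hardware pushes the trapped
       thread's state onto the per-processor kernel stack; an
       interrupt-model kernel then moves it into the thread control
       block on entry and back again on exit.  These few extra word
       copies are the roughly six-cycle overhead measured above. */
    struct hwframe { unsigned long eip, cs, eflags, esp, ss; };
    struct tcb     { struct hwframe saved; /* ... other per-thread state ... */ };

    void kernel_entry(struct tcb *current, struct hwframe *pushed)
    {
        current->saved = *pushed;    /* copy off the per-processor stack */
        /* ... dispatch the trap, interrupt, or system call ... */
    }

    void kernel_exit(struct tcb *next, struct hwframe *pushed)
    {
        *pushed = next->saved;       /* restore before returning to user mode */
    }
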
5.6 Legacy Code

One of the most important practical concerns with an interrupt-based kernel execution model is that it appears to be impossible to use pre-existing legacy code, borrowed from process-model systems such as BSD or Linux, in an interrupt-model kernel such as the exokernels or the Cache Kernel. For example, especially on the x86 architecture, it is impractical for a small programming team to write device drivers for any significant fraction of the commonly available PC hardware; they must either borrow drivers from existing systems, or support only a bare minimum set of hardware configurations. The situation is similar, though not as severe, for other types of legacy code such as file systems or TCP/IP protocol stacks.

There are a number of approaches to incorporating process-model legacy code into interrupt-model kernels. For example, if kernel threads are available (threads that run in the kernel but are otherwise ordinary process-model threads), process-model code can be run on these threads when necessary. This is the method Minix [30] uses to run such code. Unfortunately, kernel threads can be difficult to implement in interrupt-model kernels, and can introduce additional overhead on the kernel entry/exit paths, especially on architectures with the process-model bias discussed above. This is because such processors behave differently in a trap or interrupt depending on whether the interrupted code was running in user or supervisor mode [21]; therefore each trap or interrupt handler in the kernel must now determine whether the interrupted code was a user thread, a process-model kernel thread, or the interrupt-model "core" kernel itself, and react appropriately in each case. In addition, the process-model stacks of kernel threads on these architectures cannot easily be pageable or dynamically growable, because the processor depends on always being able to push saved state onto the kernel stack if a trap occurs. Ironically, on RISC processors that have no bias towards the process model, it is much easier to implement process-model kernel threads in an interrupt-model kernel.

As an alternative to supporting kernel threads, the kernel can instead use only a partial interrupt model, in which kernel stacks are usually handed off to the next thread when a thread blocks, but can be retained while executing process-model code. This is the method that Mach with continuations [11] uses. Unfortunately, this approach brings with it a whole new set of complexities and inefficiencies, largely caused by the need to manage kernel stacks as first-class kernel objects independent of and separable from both threads and processors.

The Fluke kernel uses a different approach, which keeps the "core" interrupt-model kernel simple and uncluttered while effectively supporting something almost equivalent to kernel threads.

Basically, the idea is to run process-model "kernel" threads in user mode but in the kernel's address space. In other words, these threads run in the processor's unprivileged execution mode, and thus run on their own process-model user stacks separate from the kernel's interrupt-model stack; however, the address translation hardware is set up so that while these threads are executing, their view of memory is effectively the same as it is for the "core" interrupt-model kernel itself. This design allows the core kernel to treat these process-level activities just like all other user-level activities running in separate address spaces, except that this particular address space is set up a bit differently.

There are three main issues with this approach. The first is that these user-level pseudo-kernel threads may need to perform privileged operations occasionally, for example to enable or disable interrupts or access device registers. On the x86 this is not a problem because user-level threads can be given direct access to these facilities simply by setting some processor flag bits associated with those threads; however, on other architectures these operations may need to be "exported" from the core kernel as pseudo-system calls available only to these special pseudo-kernel threads. Second, these user-level activities may need to synchronize and share data structures with the core kernel to perform operations such as allocating kernel memory or installing interrupt handlers; since these threads are treated as normal user-mode threads, they are probably fully preemptible and do not share the same constrained execution environment or synchronization primitives as the core kernel uses. Again, a straightforward solution, which is what Fluke does, is to "export" the necessary facilities through a special system call that allows these special threads to temporarily jump into supervisor mode and the kernel's execution environment, perform some arbitrary (nonblocking) activity, and then return to user mode. The third issue is the cost of performing this extra mode switching; however, our experience indicates that this cost is negligible. Finally, note that the address translation hardware on some processors, particularly the MIPS architecture, does not support this technique; however, at least on MIPS there is no compelling need for it either, because the processor does not have the traditional CISC-architecture bias toward a particular kernel stack management scheme.
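
A sketch of the kind of "export" call described above might look as follows; the function and field names are hypothetical rather than Fluke's actual interface, but they show the shape of the mechanism: a privileged, non-blocking excursion into the core kernel's execution environment.

    /* Illustrative only: a special system call that lets a trusted
       pseudo-kernel thread briefly enter supervisor mode, run a short
       non-blocking operation inside the core kernel's environment, and
       then return to unprivileged mode. */
    #include <errno.h>

    typedef void (*kernel_op_t)(void *arg);

    struct kthread { int is_pseudo_kernel_thread; /* ... */ };

    extern void disable_preemption(void);
    extern void enable_preemption(void);

    long sys_kernel_call(struct kthread *caller, kernel_op_t op, void *arg)
    {
        if (!caller->is_pseudo_kernel_thread)
            return -EPERM;          /* ordinary threads may not enter */

        disable_preemption();       /* adopt the core kernel's constrained,
                                       non-blocking execution environment */
        op(arg);                    /* e.g., allocate kernel memory or
                                       install an interrupt handler */
        enable_preemption();
        return 0;                   /* resume in user (unprivileged) mode */
    }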

6 Conclusion

In this paper, we have explored in depth the differences between the interrupt and process models and presented a number of ideas, insights, and results. Our Fluke kernel demonstrates that the need for the kernel to manually save state in continuations is not a fundamental property of the interrupt model, but instead is a symptom of a mismatch between the kernel's implementation and its API. Our kernel exports a purely "atomic" API, in which all kernel operations are fully interruptible and restartable; this property has important benefits for fault tolerance and for applications such as user-mode process migration, checkpointing, and garbage collection, and it eliminates the need for interrupt-model kernels to manually save and restore continuations. Using our configurable kernel, which supports both the interrupt-based and process-based execution models, we have made a controlled comparison between the two execution models. As expected, the interrupt-model kernel requires less per-thread memory. Although a null system call entails a 5–10% higher overhead on an interrupt-model kernel, due to a built-in bias toward the process model in common processor architectures such as the x86, the interrupt-model kernel exhibits a modest performance advantage in some cases. However, the interrupt model can incur vastly higher preemption latencies unless care is taken to insert explicit preemption points on critical kernel paths. Our conclusion is that it is highly desirable for a kernel to present an atomic API such as Fluke's, but that for the kernel's internal execution model, either implementation model is reasonable.

Acknowledgements

We are grateful to Alan Bawden and Mootaz Elnozahy for interesting and enlightening discussion concerning these interface issues and their implications for reliability, to Kevin Van Maren for elucidating and writing up notes on other aspects of the kernel's execution model, to the anonymous reviewers, John Carter, and Dave Andersen for their extensive and helpful comments, to Linus Kamb for help with the use of certain system calls, and to Eric Eide for last-minute expert formatting help.

References

[1] M. Accetta, R. Baron, W. Bolosky, D. Golub, R. Rashid, A. Tevanian, and M. Young. Mach: A New Kernel Foundation for UNIX Development. In Proc. of the Summer 1986 USENIX Conf., pages 93–112, June 1986.
[2] T. E. Anderson, B. N. Bershad, E. D. Lazowska, and H. M. Levy. Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism. ACM Transactions on Computer Systems, 10(1):53–79, Feb. 1992.
[3] J. F. Bartlett. A NonStop Kernel. In Proc. of the 8th ACM Symposium on Operating Systems Principles, pages 22–29, Dec. 1981.
[4] A. Bawden. PCLSRing: Keeping Process State Modular. Unpublished report, ftp://ftp.ai.mit.edu/pub/alan/pclsr.memo, 1989.
[5] A. Bawden. Personal communication, Aug. 1998.
[6] A. C. Bomberger and N. Hardy. The KeyKOS Nanokernel Architecture. In Proc. of the USENIX Workshop on Micro-kernels and Other Kernel Architectures, pages 95–112, Apr. 1992.
[7] J. B. Carter, J. K. Bennett, and W. Zwaenepoel. Implementation and Performance of Munin. In Proc. of the 13th ACM Symp. on Operating Systems Principles, pages 152–164, Oct. 1991.
[8] D. R. Cheriton. The V Distributed System. Communications of the ACM, 31(3):314–333, Mar. 1988.
[9] D. R. Cheriton and K. J. Duda. A Caching Model of Operating System Kernel Functionality. In Proc. of the First Symp. on Operating Systems Design and Implementation, pages 179–193. USENIX Assoc., Nov. 1994.
[10] R. P. Draves. Control Transfer in Operating System Kernels. PhD thesis, Carnegie Mellon University, May 1994.
[11] R. P. Draves, B. N. Bershad, R. F. Rashid, and R. W. Dean. Using Continuations to Implement Thread Management and Communication in Operating Systems. In Proc. of the 13th ACM Symp. on Operating Systems Principles, Asilomar, CA, Oct. 1991.
[12] D. Eastlake, R. Greenblatt, J. Holloway, T. Knight, and S. Nelson. ITS 1.5 Reference Manual. Memo 161a, MIT AI Lab, July 1969.
[13] D. R. Engler, M. F. Kaashoek, and J. O'Toole Jr. Exokernel: An Operating System Architecture for Application-Level Resource Management. In Proc. of the 15th ACM Symp. on Operating Systems Principles, pages 251–266, Dec. 1995.
[14] B. Ford, G. Back, G. Benson, J. Lepreau, A. Lin, and O. Shivers. The Flux OSKit: A Substrate for OS and Language Research. In Proc. of the 16th ACM Symp. on Operating Systems Principles, pages 38–51, St. Malo, France, Oct. 1997.
[15] B. Ford, M. Hibler, and Flux Project Members. Fluke: Flexible µ-kernel Environment (draft documents). University of Utah. Postscript and HTML available under http://www.cs.utah.edu/projects/flux/fluke/, 1996.
[16] B. Ford, M. Hibler, J. Lepreau, P. Tullmann, G. Back, and S. Clawson. Microkernels Meet Recursive Virtual Machines. In Proc. of the Second Symp. on Operating Systems Design and Implementation, pages 137–151. USENIX Assoc., Oct. 1996.
[17] R. Haskin, Y. Malachi, W. Sawdon, and G. Chan. Recovery Management in QuickSilver. ACM Transactions on Computer Systems, 6(1):82–108, Feb. 1988.
[18] J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, San Mateo, CA, 1989.
[19] D. Hildebrand. An Architectural Overview of QNX. In Proc. of the USENIX Workshop on Micro-kernels and Other Kernel Architectures, pages 113–126, Seattle, WA, Apr. 1992.
[20] Institute of Electrical and Electronics Engineers, Inc. Information Technology — Portable Operating System Interface (POSIX) — Part 1: System Application Program Interface (API) [C Language], 1994. Std 1003.1-1990.
[21] Intel Corp. Pentium Processor User's Manual, volume 3. Intel, 1993.
[22] M. F. Kaashoek, D. R. Engler, G. R. Ganger, H. Briceño, R. Hunt, D. Mazières, T. Pinckney, R. Grimm, J. Janotti, and K. Mackenzie. Application Performance and Flexibility on Exokernel Systems. In Proc. of the 16th ACM Symp. on Operating Systems Principles, pages 52–65, St. Malo, France, Oct. 1997.
[23] J. Liedtke. A Persistent System in Real Use – Experiences of the First 13 Years. In Proc. of the Third International Workshop on Object Orientation in Operating Systems, pages 2–11, Dec. 1993.
[24] J. Liedtke. A Short Note on Small Virtually-Addressed Control Blocks. Operating Systems Review, 29(3):31–34, July 1995.
[25] P. R. McJones and G. F. Swart. Evolving the UNIX System Interface to Support Multithreaded Programs. In Proc. of the Winter 1989 USENIX Technical Conference, pages 393–404, San Diego, CA, Feb. 1989.
[26] S. J. Mullender. Process Management in a Distributed Operating System. In J. Nehmer, editor, Experiences with Distributed Systems, volume 309 of Lecture Notes in Computer Science. Springer-Verlag, 1988.
[27] Open Software Foundation and Carnegie Mellon Univ. OSF MACH Kernel Principles, 1993.
[28] M. Schroeder and M. Burrows. Performance of Firefly RPC. ACM Transactions on Computer Systems, 8(1):1–17, Feb. 1990.
[29] M. I. Seltzer, Y. Endo, C. Small, and K. A. Smith. Dealing With Disaster: Surviving Misbehaved Kernel Extensions. In Proc. of the Second Symp. on Operating Systems Design and Implementation, pages 213–227, Seattle, WA, Oct. 1996. USENIX Assoc.
[30] A. S. Tanenbaum. Operating Systems: Design and Implementation. Prentice-Hall, Englewood Cliffs, NJ, 1987.
[31] P. Tullmann, J. Lepreau, B. Ford, and M. Hibler. User-level Checkpointing Through Exportable Kernel State. In Proc. of the Fifth International Workshop on Object Orientation in Operating Systems, pages 85–88, Seattle, WA, Oct. 1996. IEEE.
[32] V-System Development Group. V-System 6.0 Reference Manual. Computer Systems Laboratory, Stanford University, May 1986.
[33] A. Valencia. An Overview of the VSTa Microkernel. http://www.igcom.net/~jeske/VSTa/vsta_intro.html.