Proceedings of the FREENIX Track: 2001 USENIX Annual Technical Conference

USENIX Association

Proceedings of the FREENIX Track: 2001 USENIX Annual Technical Conference
Boston, Massachusetts, USA
June 25–30, 2001

THE ADVANCED COMPUTING SYSTEMS ASSOCIATION

© 2001 by The USENIX Association. All Rights Reserved.
For more information about the USENIX Association:
Phone: 1 510 528 8649   FAX: 1 510 548 5738   Email: [email protected]   WWW: http://www.usenix.org
Rights to individual papers remain with the author or the author's employer. Permission is granted for noncommercial reproduction of the work for educational or research purposes. This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein.

Improving the FreeBSD SMP implementation

Greg Lehey
IBM LTC Ozlabs
[email protected]
[email protected]

ABSTRACT

UNIX-derived operating systems have traditionally had a simplistic approach to process synchronization which is unsuited to multiprocessor application. Initial FreeBSD SMP support kept this approach by allowing only one process to run in kernel mode at any time, and also blocked interrupts across multiple processors, causing seriously suboptimal performance of I/O bound systems. This paper describes work done to remove this bottleneck, replacing it with fine-grained locking. It derives from work done on BSD/OS and has many similarities with the approach taken in SunOS 5. Synchronization is performed primarily by a locking construct intermediate between a spin lock and a binary semaphore, termed a mutex. In general, mutexes attempt to block rather than to spin in cases where the likely wait time is long enough to warrant a process switch. The issue of blocking interrupt handlers is addressed by attaching a process context to the interrupt handlers. Despite this process context, an interrupt handler normally runs in the context of the interrupted process and is scheduled only when blocking is required.

Introduction

A crucial issue in the design of an operating system is the manner in which it shares resources such as memory, data structures and processor time. In the UNIX model, the main clients for resources are processes and interrupt handlers. Interrupt handlers operate completely in kernel space, primarily on behalf of the system. Processes normally run in one of two different modes, user mode and kernel mode. User mode code is the code of the program from which the process is derived, and kernel mode code is part of the kernel. This structure gives rise to multiple potential conflicts.

Use of processor time

The most obvious demand a process or interrupt routine places on the system is that it wants to run: it must execute instructions. In traditional UNIX, the rules governing this sharing are:

• There is only one processor. All code runs on it.

• If both an interrupt handler and a process are available to run, the interrupt handler runs.

• Interrupt handlers have different priorities. If one interrupt handler is running and one with a higher priority becomes runnable, the higher priority interrupt immediately preempts the lower priority interrupt.

• The scheduler runs when a process voluntarily relinquishes the processor, its time slice expires, or a higher-priority process becomes runnable. The scheduler chooses the highest priority process which is ready to run.

• If the process is in kernel mode when its time slice expires or a higher priority process becomes runnable, the system waits until it returns to user mode or sleeps before running the scheduler.

This method works acceptably for the single processor machines for which it was designed. In the following section, we'll see the reasoning behind the last decision.

Kernel data objects

The most obvious problem is access to memory. Modern UNIX systems run with memory protection, which prevents processes in user mode from accessing the address space of other processes. This protection no longer applies in kernel mode: all processes share the kernel address space, and they need to access data shared between all processes. For example, the fork() system call needs to allocate a proc structure for the new process. The file sys/kern_fork.c contains the following code:

    int
    fork1(p1, flags, procp)
        struct proc *p1;
        int flags;
        struct proc **procp;
    {
        struct proc *p2, *pptr;
        ...
        /* Allocate new proc. */
        newproc = zalloc(proc_zone);

The function zalloc takes a struct proc entry off a freelist and returns its address:

    item = z->zitems;
    z->zitems = ((void **) item)[0];
    ...
    return item;

What happens if the currently executing process is interrupted exactly between the first two lines of the code above, maybe because a higher priority process wants to run? item contains the pointer to the process structure, but z->zitems still points to it. If the interrupting code also allocates a process structure, it will go through the same code and return a pointer to the same memory area, creating the process equivalent of Siamese twins.

UNIX solves this issue with the rule "The UNIX kernel is non-preemptive". This means that when a process is running in kernel mode, no other process can execute kernel code until the first process relinquishes the kernel voluntarily, either by returning to user mode, or by sleeping.

Synchronizing processes and interrupts

The non-preemption rule only applies to processes. Interrupts happen independently of process context, so a different method is needed. In device drivers, the process context ("top half") and the interrupt context ("bottom half") must share data. Two separate issues arise here: each half must ensure that any changes to shared data structures occur in a consistent manner, and they must find a way to synchronize with each other.

Protection

Each half must protect its data against change by the other half. For example, the buffer header structure contains a flags word with 32 flags, some set and reset by both halves. Setting and resetting bits requires multiple instructions on most architectures, so the potential for data corruption exists. UNIX solves this problem by locking out interrupts during critical sections. Top half code must explicitly lock out interrupts with the spl functions [1]. One of the most significant sources of bugs in drivers is inadequate synchronization with the bottom half.

[1] The naming goes back to the early days of UNIX on the PDP-11. The PDP-11 had a relatively simplistic level-based interrupt structure. When running at a specific level, only higher priority interrupts were allowed. UNIX named functions for setting the interrupt priority level after the PDP-11 SPL instruction, so initially the functions had names like spl4 and spl7. Later machines came out with interrupt masks, and BSD changed the names to more descriptive names such as splbio (for block I/O) and splhigh (block out all interrupts).

Interrupt code does not need to perform any special synchronization: by definition, processes don't run when interrupt code is active.

Blocking interrupts has a potential danger that an interrupt will not be serviced in a timely fashion. On PC hardware, this is particularly evident with serial I/O, which frequently generates an interrupt for every character. At 115200 bps, this equates to an interrupt every 85 µs. In the past, this has given rise to the dreaded silo overflows; even on fast modern hardware it can be a problem. It's also not easy to decide interrupt priorities: in the early days, disk I/O was given a high priority in order to avoid overruns, while serial I/O had a low priority. Nowadays disk controllers can handle transfers by themselves, but overruns are still a problem with serial I/O.

Waiting for the other half

In other cases, a process will need to wait for some event to complete. The most obvious example is I/O: a process issues an I/O request, and the driver initiates the transfer. It can be a long time before the transfer completes: if it's reading keyboard input, for example, it could be weeks before the I/O completes. When the transfer completes, it causes an interrupt, so it's the interrupt handler which finally determines that the transfer is complete and notifies the process. Traditional UNIX performs this synchronization with the functions sleep and wakeup, though current BSD no longer uses sleep: it has been replaced with tsleep, which offers additional functionality.

The top half of a driver calls sleep or tsleep when it wants to wait for an event, and the bottom half calls wakeup when the event occurs. In more detail,

• The process issues a system call read, which brings it into kernel mode.

• read locates the driver for the device and calls it to initiate a transfer.

• read next calls tsleep, passing it the address of some unique object related to the request. tsleep stores the address in the proc structure, marks the process as sleeping and relinquishes the processor. At this point, the process is sleeping.

• At some later point, when the request is complete, the interrupt handler calls wakeup with the address which was passed to ... events map to the same address.

Adapting the UNIX model to SMP

A number of the basic assumptions of this model no longer apply to SMP, and others become more of a problem:

• More than one processor is available. Code can run in parallel.

• Interrupt handlers and user processes can run on different processors at the same time.

• The "non-preemption" rule is no longer sufficient to ensure that two processes can't execute at the same time, so it would theoretically be possible for two processes to allocate the same memory.

• Locking out interrupts must happen in every processor. This can adversely affect performance.

The initial FreeBSD model

The original version of FreeBSD SMP support solved these problems in a manner designed for reliability rather than performance: effectively it found a method to simulate the single-processor paradigm on multiple processors.
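The free-list race described under "Kernel data objects" can be replayed by hand in a few lines of C. The following is only a sketch: struct zone, alloc_read_head, alloc_unlink and demo_double_allocation are hypothetical names invented for illustration, not kernel code. The two statements of the allocation are split into separate functions so that a second context can be run between them deterministically.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel zone: the first word of each
 * free item points to the next free item. */
struct zone {
    void *zitems;
};

/* Step 1 of the allocation: item = z->zitems; */
static void *alloc_read_head(struct zone *z)
{
    return z->zitems;
}

/* Step 2 of the allocation: z->zitems = ((void **) item)[0]; */
static void alloc_unlink(struct zone *z, void *item)
{
    z->zitems = ((void **)item)[0];
}

/* Replays the bad interleaving: context B runs both steps between
 * A's step 1 and A's step 2.  Returns 1 if A and B were handed the
 * same item. */
int demo_double_allocation(void)
{
    void *slots[3][4];          /* three fake free items */
    struct zone z;

    slots[0][0] = slots[1];     /* build the free list 0 -> 1 -> 2 */
    slots[1][0] = slots[2];
    slots[2][0] = NULL;
    z.zitems = slots[0];

    void *a = alloc_read_head(&z);  /* A reads the head ...        */
                                    /* ... and is preempted here   */
    void *b = alloc_read_head(&z);  /* B reads the same head       */
    alloc_unlink(&z, b);            /* B unlinks it                */
    alloc_unlink(&z, a);            /* A resumes and unlinks again */

    /* Only one item left the list, yet two callers hold it. */
    assert(z.zitems == (void *)slots[1]);
    return a == b;
}
```

The non-preemption rule rules out exactly this interleaving on a single processor. On SMP the two contexts can be two processors, which is why the rule alone is no longer enough.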
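The flags-word problem described under "Protection" can be sketched in the same way. This is a single-threaded simulation under stated assumptions: B_BUSY, B_DONE, sim_splhigh, sim_splx and demo_set_flag are hypothetical names, and the simulated spl calls are simplified (the real functions return and restore a priority level). The top half updates the flags word as an explicit load, modify, store sequence, and the "interrupt" arrives just after the load.

```c
#include <assert.h>
#include <stdint.h>

#define B_BUSY 0x01u   /* hypothetical flag set by the top half */
#define B_DONE 0x02u   /* hypothetical flag set by the interrupt handler */

static uint32_t flags;      /* shared flags word */
static int intr_blocked;    /* nonzero while interrupts are locked out */
static int intr_pending;    /* an interrupt arrived while locked out */

static void intr_handler(void)
{
    flags |= B_DONE;
}

/* Deliver an interrupt now, or hold it if interrupts are locked out. */
static void raise_interrupt(void)
{
    if (intr_blocked)
        intr_pending = 1;
    else
        intr_handler();
}

static void sim_splhigh(void)
{
    intr_blocked = 1;
}

/* Re-enable interrupts and run any interrupt that was held back. */
static void sim_splx(void)
{
    intr_blocked = 0;
    if (intr_pending) {
        intr_pending = 0;
        intr_handler();
    }
}

/* Top half sets B_BUSY; the interrupt arrives after the load.
 * Returns the final flags word. */
uint32_t demo_set_flag(int protect)
{
    flags = 0;
    intr_blocked = 0;
    intr_pending = 0;

    if (protect)
        sim_splhigh();
    uint32_t tmp = flags;   /* load */
    raise_interrupt();      /* interrupt arrives mid-update */
    tmp |= B_BUSY;          /* modify */
    flags = tmp;            /* store: wipes B_DONE if the handler ran */
    if (protect)
        sim_splx();

    assert(!intr_pending);
    return flags;
}
```

Calling demo_set_flag(0) leaves only B_BUSY set, because the store overwrites the handler's update. Calling demo_set_flag(1) leaves both flags set, because the handler is deferred until the update is complete.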
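The sleep and wakeup handshake listed under "Waiting for the other half" can be modelled as a table of processes that each record the address they are waiting on. This is a minimal sketch, not the kernel implementation: struct sim_proc, sim_tsleep, sim_wakeup and demo_sleep_wakeup are hypothetical names, no context switch takes place, and the sketch assumes that wakeup makes every process sleeping on the given address runnable.

```c
#include <assert.h>
#include <stddef.h>

enum sim_state { SIM_RUNNING, SIM_SLEEPING };

/* Hypothetical, minimal stand-in for the proc structure. */
struct sim_proc {
    enum sim_state state;
    const void *wchan;      /* address slept on, NULL when not sleeping */
};

#define NPROC 3
static struct sim_proc procs[NPROC];

/* Top half: store the wait address and mark the process as sleeping.
 * A real kernel would relinquish the processor at this point. */
void sim_tsleep(struct sim_proc *p, const void *chan)
{
    p->wchan = chan;
    p->state = SIM_SLEEPING;
}

/* Bottom half: make every process sleeping on chan runnable.
 * Returns the number of processes woken. */
int sim_wakeup(const void *chan)
{
    int woken = 0;

    for (int i = 0; i < NPROC; i++) {
        if (procs[i].state == SIM_SLEEPING && procs[i].wchan == chan) {
            procs[i].state = SIM_RUNNING;
            procs[i].wchan = NULL;
            woken++;
        }
    }
    return woken;
}

/* Two processes wait on one request, a third waits on another. */
int demo_sleep_wakeup(void)
{
    int request_a, request_b;   /* only their addresses are used */

    for (int i = 0; i < NPROC; i++) {
        procs[i].state = SIM_RUNNING;
        procs[i].wchan = NULL;
    }

    sim_tsleep(&procs[0], &request_a);
    sim_tsleep(&procs[1], &request_a);
    sim_tsleep(&procs[2], &request_b);

    int woken = sim_wakeup(&request_a);   /* "interrupt" for request_a */

    /* The process waiting on request_b is untouched. */
    assert(procs[2].state == SIM_SLEEPING);
    return woken;
}
```

The only link between sleeper and waker is the address: both processes sleeping on request_a wake together, whether or not they were waiting for the same event.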
