Process Synchronization in the UTS Kernel

Lawrence M. Ruane
AT&T Bell Laboratories

ABSTRACT: Any kernel has some form of process synchronization, allowing a process to wait for a particular condition. The traditional choice for UNIX systems, the event-wait mechanism, leads to race conditions on multiprocessors. This problem was initially solved in Amdahl's UTS multiprocessing kernel by replacing the event-wait mechanism with Dijkstra semaphores. The kernel, however, became noticeably more complicated and less reliable when based on semaphores. This has led us to develop a race-free multiprocessor event-wait mechanism with some novel properties. A few common synchronization techniques have emerged, the most complex of which was verified correct with the supertrace protocol validation system spin. A new approach with per-CPU run queues reduces the number of unnecessary context switches due to awakening all waiting processes. The overall approach is claimed to be simple, efficient, and reliable.

1. Introduction

Broadly, process synchronization refers to processes blocking themselves until the system changes state in some way. In the original UNIX kernel, processes wait for events [Thompson 1978]. An event is represented by an arbitrary integer (the channel), chosen by convention to be the address of the kernel data structure whose state change the process is interested in.

For example, suppose a resource, represented by a data structure pointed to by r, must be used by only one process at a time. Here is the standard protocol for acquiring the resource:¹

    while (r->lock)
        sleep(r);
    r->lock = 1;

If the resource is locked (lock is nonzero), the process suspends itself until the resource becomes free (lock is zero). The process then locks the resource and proceeds to use it. The lock flag is often called a sleep lock.

Since the kernel is non-preemptable (the kernel will not switch to another process unless explicitly requested via sleep()), there is no race in the test and set of the sleep lock: it is impossible for two processes to see the lock cleared and then both set the lock and use the resource simultaneously. Non-preemptability also simplifies the kernel tremendously, since it implies that sleep locks are only needed when a process might sleep at a "lower level," during which no other process should access the resource. For example, while a process waits for I/O completion for a particular inode (file), the process holds the inode's sleep lock.

After changing the state of the system in a way that might be relevant to sleeping processes, a process must wake them up using the agreed-upon channel:

    r->lock = 0;
    wakeup(r);

1. For simplicity, we ignore process scheduling priority upon waking up, the second argument to sleep().

An awakened process must treat the wakeup as advisory; it must re-evaluate the wait condition - thus the while instead of an if - for two reasons. First, any number of processes might have run since the wakeup was issued, any of which might have changed the wait condition. Second, more than one event can map to the same channel, so the wait condition might not have changed at all.

Our experience has been that while event-wait is not as efficient as some other synchronization methods (those that wake only one waiting process, for example), it puts the least burden on kernel algorithms because it closely approximates the busy-wait model. Under pure busy-wait, there are no process blocking or resumption concerns at all - every process has its own processor. Algorithms can use "polling" loops that are much more complex than the simple while statement above; then the polling is slowed down by slipping in calls to sleep() and wakeup() at the right places, without changing the algorithms. Appendix A compares event-wait to other forms of monitors.
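To make the polling style concrete, here is a minimal, hypothetical loop (the buffer flag names are ours, not quoted from the kernel): a process waits until a buffer is neither busy nor incomplete, re-evaluating everything after each sleep():

    /* hypothetical polling loop: each sleep() merely slows the
     * polling down, so the structure is the same as under pure
     * busy-wait */
    for (;;) {
        if (bp->b_flags & B_BUSY) {
            sleep(bp);
            continue;
        }
        if ((bp->b_flags & B_DONE) == 0) {
            sleep(bp);
            continue;
        }
        break;
    }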

2. Race Conditions

If the resource is released by an interrupt handler, a race condition can cause the wakeup to be lost. A process wishing to use the resource tests the lock, finds it set, and commits to go to sleep. But before actually calling sleep(), an interrupt occurs. The interrupt handler frees the lock and does the wakeup() (which has no effect), and returns to the interrupted process, which goes to sleep, possibly forever.

Disabling interrupts between the test of r->lock and the transition into the wait state eliminates the race condition:

    spl6();             /* disable interrupts */
    while (r->lock)
        sleep(r);
    r->lock = 1;
    spl0();             /* enable interrupts */

Interrupts are automatically enabled after the process is switched out, and are disabled again when the process switches back in.

On multiprocessing systems, disabling interrupts on the current CPU is not sufficient - the interrupt handler can run on another CPU. Also, the race condition prevented by non-preemptability on a uniprocessor can now occur: two processes on different CPUs attempt to lock the same resource at about the same time; the processes both see lock set to zero, set lock, and simultaneously use the resource - practically a guaranteed disaster [Bach 1986].

3. Semaphores

The problems with the event-wait method on multiprocessors have led to several implementations that substitute semaphores (the P() and V() routines) [Dijkstra 1968]. Multiprocessing kernels based on semaphores exist for the IBM/370 architecture (UTS and the UNIX System for System/370 [Felton et al. 1984]), the AT&T 3B20A computer, and the Sequent systems [Beck et al. 1987], among others.

In supporting and developing new kernel features for UTS over the years, we have found semaphores troublesome. There are the obvious portability problems of converting algorithms based on event-wait to a different synchronization model. But more importantly, we have encountered many difficulties due to the nature of the semaphore mechanism itself. Sometimes semaphores simplify kernel algorithms by providing just the right abstraction, but often their use leads to significantly more complicated algorithms.

What is the root of the problems with semaphores? Semantically, if not in actual implementation, the semaphore mechanism encapsulates the common

    while (resource is locked)
        sleep();
    get resource lock;

synchronization paradigm we have already seen (or a slight generalization if the semaphore is initialized to a value greater than one). There are really three separate parts to this: testing for resource availability; blocking; and resource allocation.

But when the algorithms are complex, we would often like to have direct control over the individual parts. For example, the buffer cache getblk() routine requires sophisticated "polling" loops [Bach 1986]. In other situations, we would like to (indivisibly) test for the availability of several different resources before allocating any of them, to simplify the handling of a non-available resource (a sketch follows at the end of this section). In cases like these, which are all too common in the real world, event-wait works better.

By analogy, consider the printf() routine. It really does two separate things - formatting a string and writing it to standard output - that often happen to be needed together. But if only printf() were available, the programmer's life would be difficult indeed. This is why sprintf() and puts(), the two components of printf(), are made available. Providing high-level primitives such as printf() and P() is fine, as long as one allows access to the lower level ones as well.

A case study showing some of the problems with semaphores in actual use is given in Appendix B. Due to these problems, we decided to develop a modified event-wait mechanism for UTS.
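The multi-resource test mentioned above is straightforward with event-wait on a uniprocessor. A minimal sketch (the structures a and b and the shared channel are ours, purely for illustration; both release paths must issue the matching wakeup()):

    /* illustrative only: wait until both resources are free,
     * then lock both; kernel non-preemptability makes the
     * combined test and set indivisible */
    while (a->lock || b->lock)
        sleep(a);       /* a's address doubles as the channel */
    a->lock = 1;
    b->lock = 1;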

4. Spinlocks

Before getting into the multiprocessor event-wait implementation, we must review spinlocks. All multiprocessing kernels use spinlocks to provide low-level processor (as opposed to process) synchronization. A spinlock is a variable that takes on values 0 and 1. The spinlock() routine indivisibly changes a given spinlock variable from 0 to 1 (retrying if necessary, without giving up the CPU). The UTS implementation of spinlocks is typical - a busy-wait loop around an indivisible test-and-set instruction. The freelock() routine simply sets the given spinlock to 0. (Spinlocks are known by other names, such as primitive semaphores in Bach [1986].) Spinlocks provide short-term exclusion; they must never be held across sleeps (otherwise every processor may try to get the spinlock at about the same time, causing system-wide deadlock, since the process holding the lock cannot run to free it).
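As a sketch (not the UTS source), the pair might look like this, assuming a hypothetical tas() that indivisibly sets the lock word to 1 and returns its previous value:

    /* minimal sketch of the spinlock primitives; tas() is an
     * assumed indivisible test-and-set returning the old value */
    spinlock(lp)
    long *lp;
    {
        while (tas(lp))
            ;           /* busy-wait until the old value was 0 */
    }

    freelock(lp)
    long *lp;
    {
        *lp = 0;
    }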

Similarly, interrupts must be disabled while holding any spinlocks (at least any that an interrupt handler might try to get), lest the interrupt handler deadlock on a held spinlock. This requirement is met in UTS in the simplest possible way: the entire kernel is non-interruptible. (Another simple way is to increment a per-CPU counter in spinlock(), decrement it in freelock(), and only enable interrupts if the counter is zero.)

The sleep queue is a typical example. In UTS, the sleep queue is hashed by channel number, with one spinlock per hash chain. While manipulating or referencing a hash chain, one must hold its spinlock:

    struct hsque {
        long lk;
        struct proc *proc;
    } hsque[64];

    #define sqhash(c)   (&hsque[(c>>3)&63])

    sleep(chan)
    {
        struct hsque *hp = sqhash(chan);

        spinlock(&hp->lk);
        link current proc onto hp->proc;
        freelock(&hp->lk);
        swtch();
    }

    wakeup(chan)
    {
        struct hsque *hp = sqhash(chan);

        spinlock(&hp->lk);
        move all procs waiting on chan from hp->proc to run queue;
        freelock(&hp->lk);
    }

5. A Multiprocessor Event-Wait Mechanism

The modified event-wait mechanism extends the use of spinlocks to prevent the race conditions described earlier. A new routine, sleepl() (for "sleep while holding a (spin)lock"), takes the address of a spinlock as an additional argument. wakeup() is unchanged. sleepl() has the following form:

    sleepl(chan, lp)
    long *lp;
    {
        struct hsque *hp = sqhash(chan);

        spinlock(&hp->lk);
        freelock(lp);
        link current proc onto hp->proc;
        freelock(&hp->lk);
        swtch();
        spinlock(lp);
    }

This is just sleep() except that after locking the sleep queue, the argument lock lp is freed. We will see below that this is the earliest possible time lp can be freed, minimizing contention. As a matter of convenience, it is relocked as late as possible before returning. Here is the multiprocessor protocol for acquiring a resource:

    spinlock(&r->lk);
    while (r->lock)
        sleepl(r, &r->lk);
    r->lock = 1;
    freelock(&r->lk);

(The resource structure now includes a spinlock lk.) It is easy to see how the race condition in which two processes use the resource at the same time is prevented: the test and set of the sleep lock (lock) is made indivisible by the use of the spinlock (lk).

A new primitive, waitlock(), solves the lost wakeup race condition in a particularly elegant and (to our knowledge) unique way. As the name implies, it waits until the given spinlock is free, but does not lock it:

    waitlock(lp)
    long *lp;
    {
        while (*lp)
            ;
    }

This primitive is used when releasing the resource:

    r->lock = 0;
    waitlock(&r->lk);
    wakeup(r);

To see how the wakeup cannot be lost, suppose process A, trying to lock the resource, has seen lock set, but has not yet called sleepl(). Now process B on another CPU, releasing the resource, sets lock to zero. At this point B spins in waitlock() until A gets the sleep queue hash spinlock and releases lk, at which point B can proceed to wakeup(). The first thing wakeup() does is get the (same) sleep queue hash spinlock, which it cannot do until A is queued to the sleep hash chain. Therefore, B is sure to find A and move it to the run queue.

It is not necessary, as we first thought, to hold the spinlock throughout the entire sequence to release the resource. Using waitlock() minimizes contention on lk. The rule for preventing a lost wakeup is that the spinlock must be held, even if momentarily, between the time the awaited condition is made true and the wakeup. waitlock() fulfills this requirement since it is equivalent to a spinlock() followed by a freelock().

Interestingly, no race results from lock being inspected at the same time it is being set (to zero) on another CPU. Also, the fact that lock is set to zero without holding any spinlock provides a simple counter-example to our early impression that a spinlock protects data structures - that if a spinlock needs to be held for any change of a variable, it needs to be held for every change. Here we see that a spinlock actually protects transitions - the spinlock must be held while changing lock from zero to one, but not the reverse. This sort of thing occurs in other places.²

6. Combining the Two Uses of a Spinlock

Often, the above rule is satisfied without using waitlock(). This happens when the same spinlock is used both to protect a data structure and to prevent a lost wakeup.

For example, suppose we have a pool of resources (data structures), with the free ones on a singly linked list:

    struct {
        long lk;
        struct resource *head;
    } freelist;

When a process wants one of the resources and there is none available, it sleeps:

    spinlock(&freelist.lk);
    while (freelist.head == NULL)
        sleepl(&freelist, &freelist.lk);
    r = freelist.head;
    freelist.head = r->next;
    freelock(&freelist.lk);
    return(r);

Putting r back on the freelist is done with:

    spinlock(&freelist.lk);
    r->next = freelist.head;
    freelist.head = r;
    freelock(&freelist.lk);
    wakeup(&freelist);

2. For example, in UTS it is necessary to hold the per-process spinlock for some process state (sleep, run, zombie, etc.) transitions, but not for others.

Here, freelist.lk is used both to protect the freelist and to prevent a lost wakeup. When returning a resource to the freelist, there is naturally a moment at which both the wait condition is true (freelist.head != NULL) and the spinlock is held, so no waitlock() is needed.

7. "Wanted" Flags

To improve performance, a "wanted" flag is sometimes added to the protocol to avoid unnecessary wakeups. The uniprocessor method of allocating the resource becomes:

    while (r->lock) {
        r->wanted = 1;
        sleep(r);
    }
    r->lock = 1;

The resource is released with:

    r->lock = 0;
    if (r->wanted) {
        r->wanted = 0;
        wakeup(r);
    }

Using multiprocessor event-wait, the resource is allocated with:

    spinlock(&r->lk);
    while (r->lock) {
        r->wanted = 1;
        sleepl(r, &r->lk);
    }
    r->lock = 1;
    freelock(&r->lk);

and released with:

    r->lock = 0;
    waitlock(&r->lk);

    if (r->wanted) {
        r->wanted = 0;
        waitlock(&r->lk);
        wakeup(r);
    }

It is easy to see that since the wakeup is conditioned on the test of wanted, a waitlock() must be done before the test. It is harder to see the need for the second waitlock(), but one can imagine that without it process A is about to clear wanted; process B sets wanted and is about to go to sleep; process A clears wanted and does the wakeup(), which B misses. A very complex sequence with this ending, requiring three processors, was indeed found by the spin supertrace protocol verifier [Holzmann 1990]. With the second waitlock() in place, spin has proven this protocol safe for at least 5 processors. The spin model is given in Appendix C.

Surprisingly, wanted can be getting turned on and off by different processors at the same time (with an indeterminate result), yet there is no race condition. This is unusual in a multiprocessing kernel.

This protocol provides very low contention on spinlocks, and especially fast resource releases. Normally, lk is not held and wanted is not set, in which case no spinlocks at all are needed to release the resource.

8. Conditionally Acquiring a Sleep Lock

Another interesting protocol used in uniprocessing kernels involves waiting for a sleep lock to become available, but not locking the sleep lock unless sleep is required. Here is an outline of free(), which puts a given disk block on the free list of a given filesystem. The in-memory part of the free list and the sleep lock are kept in the filesystem's superblock, pointed to by fp. If free() discovers that the in-memory free list is full, it synchronously writes the free list to disk and resets the list to empty. Then free() adds the given disk block to the free list.

    while (fp->lock)
        sleep(fp);
    if (fp->free is full) {
        fp->lock = 1;
        synchronously write fp->free to disk;
        set fp->free to empty;
        fp->lock = 0;
        wakeup(fp);
    }
    add block being freed to fp->free;

This routine doesn't bother to change the sleep lock or do a wakeup() in the normal case that the free list is not full. We can get the same advantage in the multiprocessing version of this routine:

    spinlock(&fp->lk);
    while (fp->lock)
        sleepl(fp, &fp->lk);
    if (fp->free is full) {
        fp->lock = 1;
        freelock(&fp->lk);
        synchronously write fp->free to disk;
        set fp->free to empty;
        spinlock(&fp->lk);
        fp->lock = 0;
        wakeup(fp);
    }
    add block being freed to fp->free;
    freelock(&fp->lk);

The spinlock is held across the wakeup(), which could be prevented in most cases by adding a wanted flag. Notice that the wakeup() happens before we are done using the resource, which is fine since we hold the spinlock in the interval.

9. What Provides Processor Mutual Exclusion?

The previous example provides a good lead-in to the general question of what type of lock provides processor mutual exclusion. At one extreme, such as with the sleep queue hash chains, spinlocks provide all of it - there are no sleep locks. This applies to many other parts of UTS as well: the run queue, the free list of process structures, the hash lists of the buffer cache, etc.

In other cases, such as the most recent example, the responsibility is split between a spinlock and a sleep lock. The spinlock provides the processor mutual exclusion for the statement

    add block being freed to fp->free;

(we hold the spinlock but not the sleep lock) while the sleep lock protects

    set fp->free to empty;

(we hold the sleep lock but not the spinlock).

At the other extreme, spinlocks only provide synchronization for manipulating the sleep lock (including preventing missed wakeups); the data structure is manipulated with just the sleep lock held. This is also very common, applying to individual buffer cache headers, inodes, etc.

Of course sleep locks provide all process mutual exclusion.
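A minimal sketch of this last regime, using hypothetical inode fields i_lk (spinlock) and i_lock (sleep lock) - not necessarily the UTS field names:

    /* lock an inode: the spinlock guards only the sleep-lock
     * manipulation; the inode fields themselves are then used
     * with just the sleep lock held */
    ilock(ip)
    struct inode *ip;
    {
        spinlock(&ip->i_lk);
        while (ip->i_lock)
            sleepl(ip, &ip->i_lk);
        ip->i_lock = 1;
        freelock(&ip->i_lk);
    }

    iunlock(ip)
    struct inode *ip;
    {
        ip->i_lock = 0;
        waitlock(&ip->i_lk);
        wakeup(ip);
    }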

10. Scope of Sleep Locks

When a sleep lock provides processor mutual exclusion, it must be held, not just across sleep, but across all manipulations of the resource. We have discovered parts of the standard kernel where the scope of a sleep lock is too small for a multiprocessor. In a uniprocessing kernel, an inode can be unlocked and then still manipulated; since the kernel is non-preemptable, no other process can use the inode until the current process gives up the CPU. On a multiprocessor, this doesn't work. From the instant the sleep lock is released, a process on another processor can get the lock and start to use the inode, even though the first process is still running. So, in a few cases, the unlock of the resource had to be moved (delayed) to the point at which the resource was really finished being used.
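In terms of the hypothetical ilock()/iunlock() sketch in the previous section, the fix is purely a reordering:

    /* uniprocessor order - safe only because the kernel is
     * non-preemptable: */
    iunlock(ip);
    manipulate the inode;

    /* multiprocessor order - finish with the inode first: */
    manipulate the inode;
    iunlock(ip);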

11. Indivisibly Freeing a Sleep Lock

There are many parts of the kernel where a process gets the sleep lock of a resource, but then discovers that the resource is "not ready" in some sense, requiring the process to wait. Before blocking, however, it must free the sleep lock to allow another process to change the state of the resource to "ready."

One place this occurs is in the IPC message facility. Each message queue has a sleep lock. Suppose process R wishes to receive a message from a certain queue, but after locking the queue, discovers that it is empty. It then frees the sleep lock and sleeps on the address of a particular member (qp->qnum) of that queue structure (not the address of the overall structure, since that is used for the sleep lock). When another process S sends (enqueues) a message to the queue, it does a wakeup on the address of qp->qnum.

On a multiprocessor, without any precautions, R can miss the wakeup, since as soon as it frees the sleep lock in preparing to sleep, S can get the sleep lock, enqueue the message, and do the wakeup - before R is asleep. The solution for S is:

    /* get sleep lock: */
    spinlock(&msg_lk);
    while (qp->lock)
        sleepl(qp, &msg_lk);
    qp->lock = 1;
    freelock(&msg_lk);

    /* deposit the message: */

    wakeup(&qp->qnum);

    /* free the sleep lock: */
    qp->lock = 0;
    waitlock(&msg_lk);
    wakeup(qp);

and for R:

    loop:
        /* get sleep lock: */
        spinlock(&msg_lk);
        while (qp->lock)
            sleepl(qp, &msg_lk);
        qp->lock = 1;
        freelock(&msg_lk);

        /*
         * no messages to read;
         * free sleep lock and
         * wait for a message:
         */
        spinlock(&msg_lk);
        qp->lock = 0;
        wakeup(qp);
        sleepl(&qp->qnum, &msg_lk);
        freelock(&msg_lk);
        goto loop;      /* re-evaluate */

(There is one global spinlock for the entire IPC message facility, rather than one spinlock per resource, although it could have been done the other way.) The important point is that R gets the spinlock before it frees the sleep lock, so there is no way S can lock the sleep lock until R is asleep, since S must get the spinlock before getting the sleep lock. This means the wakeup cannot be lost. (The spinlock is held across a wakeup(); as before, this can be prevented in most cases by use of a wanted flag.)

Notice that after S gets the sleep lock and queues a message, it does not have to do a waitlock() before the wakeup(); R cannot be about to go to sleep, or S would not have been able to get the spinlock, which is needed to get the sleep lock.

12. Semaphores on Top of Event-Wait

A semaphore is a counter initialized to the number of some class of resources (or one for mutual exclusion). When nonnegative, the semaphore gives the number of available resources; when negative, the absolute value of the semaphore gives the number of processes waiting for resources. P() is called when allocating a unit of the resource; it decrements the counter and blocks if negative. A call to V() releases a unit by incrementing the counter and, if processes are waiting, unblocking one of them. Here is the old UTS implementation:

    struct semaphore {
        long lk;
        int count;
        struct proc *p;
    };

    P(s)
    struct semaphore *s;
    {
        spinlock(&s->lk);
        if (--s->count < 0) {
            queue curproc onto s;
            freelock(&s->lk);
            swtch();
        } else
            freelock(&s->lk);
    }

    V(s)
    struct semaphore *s;
    {
        struct proc *p;

        spinlock(&s->lk);
        if (s->count++ < 0) {
            p = dequeue one process from s;
            freelock(&s->lk);
            setrq(p);
        } else
            freelock(&s->lk);
    }

Since semaphores can easily be implemented using event-wait, this is what we did in UTS, replacing the "primitive" semaphore implementation above. Here is the event-wait version of semaphores:³

    struct semaphore {
        long lk;
        int count;
    };

    P(s)
    struct semaphore *s;
    {
        spinlock(&s->lk);
        while (--s->count < 0)
            sleepl(s, &s->lk);
        freelock(&s->lk);
    }

    V(s)
    struct semaphore *s;
    {
        spinlock(&s->lk);
        if (s->count < 0) {
            s->count = 1;
            freelock(&s->lk);
            wakeup(s);
        } else {
            ++s->count;
            freelock(&s->lk);
        }
    }

3. For simplicity, we ignore the case of interruptible semaphores, although they are easy to implement.

(Since a negative count indicates one or more processes are sleeping, it is functionally a "wanted" flag.) This reimplementation of semaphores changed their semantics slightly, and unfortunately broke some things. (It is unclear if the original programmers were aware they were depending on such detailed semantics of semaphores.)

The old semaphores were "strict": if process A blocks on a P(), and process B executes V() on the same semaphore, A is guaranteed to get the semaphore next, even if some other process, C, tries to do a P() before A runs. C blocks; A switches in and gets the semaphore. The new semaphores, shown above, are "lazy": in the previous example C gets the semaphore; A would switch in only to find the semaphore still locked, and go back to sleep.

One part of the kernel that broke is the named pipe open protocol. A per-inode (per-pipe) semaphore, i_fwcnt, maintains the number of writers. When a process tries to open a named pipe for read, it does a P() on that semaphore, which blocks if there are no writers. What can happen is that the writer comes along; increments the writer count (does a V(), which awakens the reader); quickly writes a small amount of data; and closes the pipe, which decrements the writer count (with P()) to zero. Then, the reader gets the CPU, but finds the semaphore count is still zero, and goes back to sleep! The reader remains stuck in the open protocol. This was easily fixed by porting the standard event-wait based code, which is simpler anyway. Other problems of this sort were fixed the same way.

13. Convoys

In going from semaphores to event-wait, two performance factors work in opposite directions. Semaphores have the advantage that processes never wake up unnecessarily. That is, once a process is on the run queue, it is guaranteed to obtain the resource. Arising from this very fact, however, are semaphore convoys, which hurt performance [Lee et al. 1987].

Convoys can occur when processes lock and release a semaphore repeatedly without sleeping. Such high-contention semaphores are typical of database environments, for example. If process q holds a semaphore, and process p blocks on that semaphore, a convoy begins. When q releases the semaphore, p is put on the run queue. Then q loops back to re-obtain the semaphore, but blocks because the semaphore is owned by p. Process p runs, uses the resource, and releases the semaphore, giving it back to q, and so on. The two processes alternate on every use of the resource, dramatically increasing the number of context switches.

Convoys tend to be self-perpetuating, because once started, the resource-locked duration per use increases dramatically. It now includes the time for the new owner process to be scheduled to run, in addition to the time the process actually uses the resource. This in turn increases the probability that another process will block on the semaphore. Once identified, convoys can usually be solved by redesigning the kernel algorithms, but at the expense of simplicity.

With event-wait, convoys are not possible, because even though a process has been awakened and is on the run queue, expecting to use the resource, it does not yet own the resource, so the current process can lock it again without blocking.

14. The "Thundering Herd" Problem

This delightful expression refers to a problem that derives from the fact that wakeup() makes all processes waiting on the event runnable, not just one. They race to lock the resource, one gets it, and the others go back to sleep. Semaphores have the advantage that processes never wake up unnecessarily: once a process is on the run queue, it is guaranteed to obtain the resource.

However, Beck et al. [1987] point out that there usually isn't a problem with event-wait on uniprocessors, since by the time each process runs, the resource is generally available. For example, suppose that while a buffer is busy being read into from disk, five processes block trying to read it. When the I/O is done, all five go on the run queue. Each one in turn finds the data in the buffer valid, copies it to user space, frees the buffer, and gives up the CPU for an unrelated reason (e.g., sleeping on another resource). As long as each process does not sleep while holding the resource, there are no unnecessary context switches.

On a multiprocessor, this condition is not sufficient. While the five processes in the example are waiting on the run queue, they are available to be run by other CPUs. This problem becomes worse as CPUs are added to the system.

We addressed this problem by giving a private run queue to each CPU. wakeup() (actually, setrq()) puts processes on the current CPU's run queue. The global run queue remains, but only holds processes with user priority (that is, processes ready to return to the user program that have deferred to a better priority process); thus user priorities are honored. When a CPU wants a process to run (in swtch()), it first looks in its own run queue, then in the user priority queue, and finally, since it doesn't make sense for a CPU to go idle if there is really work to do somewhere, it "raids" other CPUs' run queues.

The result is that the number of unnecessary context switches is reduced to about the level of a uniprocessor, as long as there is sufficient work in the system to prevent a lot of raiding (and if there isn't that much work, we probably don't care about unnecessary context switches anyway). Another advantage is that the contention on the run queue spinlock, usually the worst in the system, is far lower.

To demonstrate the full effect of this approach, we started four identical processes, each reading the first 64 blocks of the same file forever:

    fd = open("bigfile", 0);
    for (;;) {
        lseek(fd, 0, 0);
        read(fd, buf, 64*4096);
    }

Then we started another four reading a different file. The system had two processors. Using the new run queue scheme and scheduler, the system did 90% fewer context switches per second, and got 50% more "real work" done (bytes read) per second. The effect on a typical workload is not yet known.

What happens is that each resource becomes informally associated, for perhaps a few seconds at a time, with a particular CPU; each CPU becomes a de facto server for a set of resources. If process a, running on CPU 1, blocks while trying to access a resource that is being used by process b, running on CPU 2, then when b releases the resource, a goes on CPU 2's run queue. Thus, the users of a certain resource tend to migrate to the same CPU as other users of the same resource. This may also improve the processor cache efficiency. The global user-priority run queue provides load balancing among CPUs.

15. Conclusions

Our experience with event-wait on UTS has clearly been positive. The code, except for per-CPU run queues, has been incorporated into the UTS product beginning with release 2.1. It is conceptually easy to understand and leads to simpler and more reliable kernel algorithms. There are just two new primitives, sleepl() and waitlock(), with trivial implementations.

The importance of easing porting from the standard uniprocessing kernel is hard to overstate, especially as the kernel (unfortunately) becomes more complicated. Porting the uniprocessor buffer cache code, for example, was almost as easy as adding spinlock() to the top of each function, freelock() to the bottom, and replacing each sleep() with sleepl() (a single spinlock protects the entire buffer cache). None of the algorithms needed modification. (Of course, it's not always quite that easy!)

Our performance measurements so far have been encouraging. Without the per-CPU run queues (which are still being evaluated), the throughput of the event-wait kernel was consistently measured to be 0.8% to 1.2% better than the semaphore based kernel on a typical mix (compiles, greps, troffs, etc.), depending on the load. The only difference was that in the event-wait kernel, semaphores were implemented using event-wait rather than as primitives; none of the other algorithms were changed (i.e., they used semaphores).

A greater relative improvement can be expected as more of the kernel is converted to use event-wait directly, and as a result of per-CPU run queues. This approach might provide a good basis for future multiprocessing kernels, since it is in no way specific to the S/370 architecture or to UTS.

Acknowledgements

Thanks to Joan Kalmanek and Craig Harmer for their contributions to this project. Gerard Holzmann provided lots of help with using his protocol verifier.

Appendix A: Event-Wait and Monitors

The event-wait mechanism is actually a variant of the traditional monitor synchronization model as described in Ben-Ari [1990]. A particular implementation of monitors is defined by several independent characteristics, many of which are described in the literature as if they were dependent. What all forms of monitors have in common is the existence of two routines, wait() and signal() (sometimes with different names). The wait() primitive always suspends the current process (unlike the semaphore P() operation); signal() unblocks one or more processes. Each of the following sections describes a separate monitor characteristic.

1. implicit vs. explicit mutual exclusion

Traditionally, a monitor is described as consisting of a set of procedures and data, encapsulated in a monitor construct provided by the language. In some cases, a monitor is a class, from which many instances of the encapsulated data can be created. The language provides implicit per-instance mutual exclusion among the procedures within the monitor, ensuring single-threaded execution.

The alternative is explicit mutual exclusion of code segments using calls to primitives such as spinlock() and freelock() on a multiprocessor or spl6() and spl0() on a uniprocessor. This is how most "real systems" are written, not only because C does not support monitors, but because of the added flexibility of being able to apply the mutual exclusion exactly where needed, without having to pull code segments into separate monitor functions. (Of course, the encapsulation approach can be followed by convention.)

2. blocking vs. non-blocking mutual exclusion

If a process finds the mutual exclusion lock unavailable, the system can either suspend the process, or spin until the lock is free. In the practical world, spinning is almost always best, because the lock duration should be small (otherwise a design problem is indicated).

Also, blocking mutual exclusion presents difficulties for interrupt handlers, which cannot sleep. The only use of a blocking mutual exclusion lock in UTS is in the memory manager (which should probably be redesigned to use spinlocks). When a disk paging request completes, but the disk interrupt handler cannot get the memory manager lock, it queues a data structure to a special list (protected by a spinlock) of "paging operations completed but not yet processed through the memory manager." Then, whenever the memory manager lock is freed, any items on this list are processed. This obviously complicates the memory manager.
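A rough sketch of that deferred list, with hypothetical names throughout (mm_lock, donelist, done_lk; CP() is the conditional lock attempt discussed in Appendix B); the real memory manager code is more involved:

    /* disk paging interrupt: handle the completion now if the
     * memory manager lock is free, otherwise defer it */
    pagedone(pp)
    struct pageop *pp;
    {
        if (CP(&mm_lock)) {
            process the completed operation;
            V(&mm_lock);        /* releasing the lock also drains donelist */
        } else {
            spinlock(&done_lk);
            pp->next = donelist;
            donelist = pp;
            freelock(&done_lk);
        }
    }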

3. condition variables vs. wait channels

In the usual description of monitors, wait() is passed a pointer to a condition variable. The process is queued on the condition variable, to be later dequeued by signal(). There is no state associated with a condition variable except the list of processes. In the UNIX kernel, the argument to wait()'s equivalent (sleep()) is an arbitrary integer, the channel, which usually is hashed to one of many sleep queues for speed.

The difference is almost purely cosmetic. The arbitrary integer approach simplifies matters by not requiring variables to be allocated (and named!). Also, condition variables might use a significant amount of space if they are members of highly replicated structures.

4. immediate resumption vs. normal scheduling

In the usual description of monitors, if signal() finds a process waiting, that process is made runnable; but more than that, it is sure to be the next process to run in the monitor. The advantage is that the condition which was made true by the process doing the signal() need not be reevaluated by the awakened process - it is guaranteed to be true. Usually signal() is required to be the last statement of the monitor routine so that the signaling process cannot change the state of the monitor before the awakened process runs.

The alternative is that signal() simply puts the process on the run queue, and an arbitrary amount of time might pass until it runs. Since the state of the monitor might change between the signal and when the awakened process runs, it must reevaluate the condition that caused it to block. This is the approach taken in Mesa [Lampson & Redell 1980].

How is immediate resumption implemented? If the mutual exclusion lock is the blocking type, then signal() can simply not free the lock (and not have wait() try to get the lock before returning). But this approach won't work if the mutual exclusion lock is a spinlock, unless the run queue is bypassed entirely, with the signaling process directly resuming the process being signaled. Immediate resumption can lead to convoys (discussed elsewhere in this paper), since a process cannot reenter the monitor after doing a signal() without giving up the CPU.

5. wake one vs. wake all

This alternative is actually a subset of the "normal scheduling" case above; it does not apply to immediate resumption. If signal() finds more than one process waiting, should it make just one process runnable, or all waiting processes?

Waking just one process is sometimes more efficient (however, see the "Thundering Herd" section of this paper), but adds to the complexity of the monitor. When the process resumes inside the monitor, it bears a heavy burden of responsibility: if the process finds the condition true (i.e., this was not an unnecessary wake up), but decides not to make the sleep condition false for any reason, it must signal the next waiting process. For example, after sleeping on a busy buffer in the getblk() buffer cache routine (see Appendix B), the process cannot blithely goto loop as if it has just come into the routine for the first time; it might choose a different buffer and thus fail to make the original buffer busy. That could cause processes waiting for the original buffer to wait forever. Also, the mapping from conditions to events must be unique across the entire system.

Waking everyone, also called a broadcast signal, allows each awakened process to worry only about itself with regard to the event that has been signaled. It minimizes complexity by allowing algorithms to be written in the busy-wait model, adding process synchronization at the last minute.

Monitors Summary

Characteristics 1 through 3 are mostly superficial, and do not affect the structure of kernel algorithms much. Characteristics 4 and 5 can have a significant impact on algorithms. The standard uniprocessing UNIX kernel uses explicit mutual exclusion, wait channels, normal scheduling, and wake all. UTS uses, in addition, non-blocking mutual exclusion (spinlocks).

Appendix B: Problems with Semaphores

One of the areas in which semaphores have given us serious problems is the section of the kernel that implements the buffer cache. By looking at how the design evolved as problems were discovered, we can gain insight into the question of when semaphores are appropriate and when they are not. (While the story of this evolution is not historically accurate, it's not far off.) Although the following discussion applies to both uniprocessing and multiprocessing kernels, we will ignore the multiprocessing issues; the structures of the algorithms are the same.

The getblk() routine finds a buffer in the disk buffer cache [Bach 1986]. Here is an outline of getblk() using event-wait (for simplicity, we assume there are no delayed-write (dirty) buffers):

    loop:
        if (find buffer in cache) {
            if (buffer BUSY) {
                sleep(&buffer);
                goto loop;
            }
            set buffer BUSY;
            freecnt--;
            take buffer off freelist;
            return(buffer);
        }
        if (freecnt == 0) {
            sleep(&freelist);
            goto loop;
        }
        freecnt--;
        set first buffer on freelist BUSY;
        take first buffer off freelist;
        reassign buffer;
        return(buffer);

There are two "polling" loops, corresponding to the two goto statements. In the first loop, the cache hit case, the current process p polls the buffer it wants until it is not busy. In the second loop, for a cache miss, p polls for the freelist to be non-empty. The loops must enclose the search as well as the test for BUSY because anything can happen while p sleeps.

For the cache hit case, when the buffer is no longer busy, p sets it busy, unlinks it from somewhere in the freelist (the buffer must be there since it wasn't busy) and returns. In the case of a cache miss, when the freelist becomes non-empty, p takes an old buffer from the freelist and reassigns it (i.e., changes the disk block this buffer will henceforth be associated with).

Converting this routine to use semaphores seems straightforward. Since p might wait for two kinds of resources (a particular buffer, and any buffer on the freelist), we replace each buffer's BUSY bit with a semaphore, and replace freecnt with a semaphore, freesema, initialized to the number of buffers. Here is a first try using semaphores:

    if (find buffer in cache) {
        P(buffer);
        P(freesema);
        take buffer off freelist;
        return(buffer);
    }
    P(freesema);
    P(first buffer on freelist);
    take first buffer off freelist;
    reassign buffer;
    return(buffer);

It's pretty easy to see why this version doesn't work. If p finds the buffer, and blocks while locking it (P(buffer)), the buffer might be reassigned, causing p to return the wrong buffer. Similarly, if the process doesn't find the buffer, it might block on the freelist semaphore, allowing the buffer it is looking for to appear in the cache. This results in two buffers corresponding to the same disk block - bad news.

Here is a fix for the first problem:

    loop:
        if (find buffer in cache) {
            P(buffer);
            if (wrong buffer) {
                V(buffer);
                goto loop;
            }
            P(freesema);
            take buffer off freelist;
            return(buffer);
        }

After finding the buffer, p locks it. Since in doing so p might have blocked, it then verifies that this buffer is still the one it wants. If not, p releases the buffer and re-evaluates.

Since in practice the process usually doesn't block in P(), verifying the match every time seems wasteful. Even worse, the analogous solution to the second problem (the case of a cache miss) requires that p make a second search for the buffer every time, since it might have blocked on the freelist semaphore. The problem seems to be that p doesn't know when it blocks in P(); that's hidden from it. We can make blocking explicit with a new primitive, CP(), the conditional version of P() (CP() is available in UTS). CP() attempts to lock the semaphore but will not block, returning zero instead. Here is a version using CP():

    loop:
        if (find buffer in cache) {
            if (!CP(buffer)) {
                P(buffer);
                V(buffer);
                goto loop;
            }
            P(freesema);
            take buffer off freelist;
            return(buffer);
        }
        if (!CP(freesema)) {
            P(freesema);
            V(freesema);
            goto loop;
        }
        P(first buffer on freelist);
        take first buffer off freelist;
        reassign buffer;
        return(buffer);

Suppose p finds the buffer. If the buffer is available, CP() succeeds, and since p hasn't blocked, verifying the match is unnecessary - the normal case is fast. If CP() fails, p sleeps until the buffer is free, and re-evaluates. Similarly, p knows it must re-evaluate if it blocks on the freelist.

This version does not work (with the usual "strict" semaphores). Suppose two processes, p and q, block while trying to access the same buffer. When the buffer is released, p runs, does a V() (which puts q on the run queue), loops back, and finds the same buffer. But the attempt to lock the buffer fails (the semaphore count is zero) because the buffer is owned by q (who is waiting on the run queue). So p blocks, q runs, and the cycle repeats forever. The solution is to re-evaluate only if the buffer has been reassigned:

    loop:
        if (find buffer in cache) {
            if (!CP(buffer)) {
                P(buffer);
                if (wrong buffer) {
                    V(buffer);
                    goto loop;
                }
            }
            P(freesema);
            take buffer off freelist;
            return(buffer);
        }

In the cache miss case, the freelist semaphore can cause the same kind of livelock, but no analogous solution exists. Although keeping track of the number of items on the freelist seems such a natural use of semaphores, we know of no versions that actually do. Instead, a semaphore that never goes positive is used to block processes, in an event-wait style.

There is yet another problem with all the above semaphore-based versions (assuming strict semaphores). Just because a buffer is on the freelist, a P() operation on it might still block. The buffer may be owned by a process on the run queue, who is about to take the buffer off the freelist. So in the cache miss case, the process must scan the freelist until it finds a buffer that can be locked. More generally, it is harder to enforce invariants (such as "a buffer is on the freelist if and only if it is not locked") because of the existence of an interval between when a resource (a buffer) becomes owned by someone and when that process actually runs and changes the state of the system accordingly (takes the buffer off the freelist).

Appendix C: Spin Model for Sleep/Wakeup with "Wanted" Flag

Using BITSTATE, this model needed 3 minutes of CPU time and 32MB on an IBM 3090 running UTS to get a coverage factor over 100 (see Holzmann [1990]). No errors were found.

    /*
     * get resource:
     *
     *    spinlock(&r->lk);
     *    while (r->lock) {
     *        r->wanted = 1;
     *        sleepl(r, &r->lk);
     *    }
     *    r->lock = 1;
     *    freelock(&r->lk);
     *
     * use resource...
     *
     * free resource:
     *
     *    r->lock = 0;
     *    waitlock(&r->lk);
     *    if (r->wanted) {
     *        r->wanted = 0;
     *        waitlock(&r->lk);
     *        wakeup(r);
     *    }
     */

    /* number of cpus (processes) */
    #define N 5
    #define RUN 0
    #define SLEEP 1

    /* resource wanted flag and sleep lock */
    byte r_wanted, r_lock;

    /* resource spinlock, 0 or 1 */
    byte r_lk;

    /* sleep queue spinlock, 0 or 1 */
    byte sq_lk;

    /* process state, SLEEP or RUN */
    byte state[N];

    proctype cpu(int i)
    {
        int j;

        /* get resource: */
        atomic { (r_lk == 0) -> r_lk = 1 };
        do
        :: (r_lock == 1) ->
            r_wanted = 1;
            /* inside sleepl() */
            atomic { (sq_lk == 0) -> sq_lk = 1 };
            r_lk = 0;
            state[i] = SLEEP;
            sq_lk = 0;
            /* wait to be awakened */
            (state[i] == RUN);
            atomic { (r_lk == 0) -> r_lk = 1 }
        :: (r_lock == 0) ->
            break
        od;
        assert(r_lock == 0);
        r_lock = 1;
        r_lk = 0;

        /* use resource ... free resource */
        assert(r_lock == 1);
        r_lock = 0;
        (r_lk == 0);
        if
        :: (r_wanted == 1) ->
            r_wanted = 0;
            (r_lk == 0);
            /* inside wakeup() */
            atomic { (sq_lk == 0) -> sq_lk = 1 };
            /* wakeup everyone */
            atomic {
                do
                :: (j < N) -> state[j] = RUN; j = j+1
                :: (j >= N) -> break
                od
            };
            sq_lk = 0
        :: (r_wanted == 0) ->
            skip
        fi
    }

    init {
        int i;

        atomic {
            do
            :: (i < N) -> run cpu(i); i = i+1
            :: (i >= N) -> break
            od
        }
    }

References

M. J. Bach, The Design of the UNIX Operating System, Englewood Cliffs, NJ: Prentice-Hall, 1986.

B. Beck, B. Kasten, and S. Thakkar, VLSI Assist for a Multiprocessor, Proceedings of the Second International Conference on Architectural Support for Programming Languages and Operating Systems, pages 10-20, October 1987.

M. Ben-Ari, Principles of Concurrent and Distributed Programming, Englewood Cliffs, NJ: Prentice Hall, 1990.

E. W. Dijkstra, Cooperating Sequential Processes, in Programming Languages, ed. F. Genuys, pages 43-112, New York: Academic Press, 1968.

W. A. Felton, G. L. Miller, and J. M. Milner, A UNIX System Implementation for System/370, AT&T Bell Laboratories Technical Journal, 63(8), Part 2, pages 1751-1768, October 1984.

Gerard Holzmann, Algorithms for Automated Protocol Validation, AT&T Technical Journal, 69(1):32-59, January/February 1990.

B. W. Lampson and D. D. Redell, Experience with Processes and Monitors in Mesa, Communications of the ACM, 23(2):105-117, February 1980.

T. P. Lee, M. W. Luppi, and R. E. Menninger, Solving Performance Problems on a Multiprocessor UNIX System, Proceedings of the USENIX Conference, pages 399-405, Summer 1987.

K. Thompson, UNIX Implementation, The Bell System Technical Journal, 57(6), Part 2, pages 1931-1946, July-August 1978.

[submitted April 9, 1990; revised June 14, 1990; accepted June 20, 1990]
