A Fast Mutual Exclusion Algorithm
Leslie Lamport
November 14, 1985, revised October 31, 1986

This report appeared in the ACM Transactions on Computer Systems, Volume 5, Number 1, February 1987, Pages 1–11.

© Digital Equipment Corporation 1988

This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of the Systems Research Center of Digital Equipment Corporation in Palo Alto, California; an acknowledgment of the authors and individual contributors to the work; and all applicable portions of the copyright notice. Copying, reproducing, or republishing for any other purpose shall require a license with payment of fee to the Systems Research Center. All rights reserved.

Author's Abstract

A new solution to the mutual exclusion problem is presented that, in the absence of contention, requires only seven memory accesses. It assumes atomic reads and atomic writes to shared registers.

Capsule Review

To build a useful computing system from a collection of processors that communicate by sharing memory, but lack any atomic operation more complex than a memory read or write, it is necessary to implement mutual exclusion using only these operations. Solutions to this problem have been known for twenty years, but they are linear in the number of processors. Lamport presents a new algorithm which takes constant time (five writes and two reads) in the absence of contention, which is the normal case. To achieve this performance it sacrifices fairness, which is probably unimportant in practical applications.

The paper gives an informal argument that the algorithm's performance in the absence of contention is optimal, and a fairly formal proof of safety and freedom from deadlock, using a slightly modified Owicki-Gries method. The proofs are extremely clear, and use very little notation.

Butler Lampson

Contents

1 Introduction
2 The Algorithms
3 Correctness Proofs
  3.1 Mutual Exclusion
  3.2 Deadlock Freedom
References

1 Introduction

The mutual exclusion problem—guaranteeing mutually exclusive access to a critical section among a number of competing processes—is well known, and many solutions have been published. The original version of the problem, as presented by Dijkstra [2], assumed a shared memory with atomic read and write operations. Since the early 1970s, solutions to this version have been of little practical interest. If the concurrent processes are being time-shared on a single processor, then mutual exclusion is easily achieved by inhibiting hardware interrupts at crucial times. On the other hand, multiprocessor computers have been built with atomic test-and-set instructions that permitted much simpler mutual exclusion algorithms. Since about 1974, researchers have concentrated on finding algorithms that use a more restricted form of shared memory or that use message passing instead of shared memory. Of late, the original version of the problem has not been widely studied.

Recently, there has arisen interest in building shared-memory multiprocessor computers by connecting standard processors and memories, with as little modification to the hardware as possible.
Because ordinary sequential processors and memories do not have atomic test-and-set operations, it is worth investigating whether shared-memory mutual exclusion algorithms are a practical alternative.

Experience gained since shared-memory mutual exclusion algorithms were first studied seems to indicate that the early solutions were judged by criteria that are not relevant in practice. A great deal of effort went into developing algorithms that do not allow a process to wait longer than it "should" while other processes are entering and leaving the critical section [1, 3, 6]. However, the current belief among operating system designers is that contention for a critical section is rare in a well-designed system; most of the time, a process will be able to enter without having to wait [5]. Even an algorithm that allows an individual process to wait forever (be "starved") by other processes entering the critical section is considered acceptable, since such starvation is unlikely to occur. This belief should perhaps be classified as folklore, since there does not appear to be enough experience with multiprocessor operating systems to assert it with great confidence. Nevertheless, in this paper it is accepted as fact, and solutions are judged by how fast they are in the absence of contention. Of course, a solution must not take much too long or lead to deadlock when there is contention.

With modern high-speed processors, an operation that accesses shared memory takes much more time than one that can be performed locally. Hence, the number of reads and writes to shared memory is a good measure of an algorithm's execution time. All the published N-process solutions that I know of require a process to execute O(N) operations to shared memory in the absence of contention. This paper presents a solution that does only five writes and two reads of shared memory in this case. An even faster solution is also given, but it requires an upper bound on how long a process can remain in its critical section. An informal argument is given to suggest that these algorithms are optimal.

2 The Algorithms

Each process is assumed to have a unique identifier, which for convenience is taken to be a positive integer. Atomic reads and writes are permitted to single words of memory, which are assumed to be long enough to hold a process number. The critical section and all code outside the mutual exclusion protocol are assumed not to modify any variables used by the algorithms.

Perhaps the simplest possible algorithm is one suggested by Michael Fischer, in which process number i executes the following algorithm, where x is a word of shared memory, angle brackets enclose atomic operations, and await b is an abbreviation for while ¬b do skip:

    repeat await ⟨x = 0⟩;
           ⟨x := i⟩;
           delay
    until ⟨x = i⟩;
    critical section;
    x := 0

The delay operation causes the process to wait sufficiently long so that, if another process j had read the value of x in its await statement before process i executed its x := i statement, then j will have completed the following x := j statement. It is traditional to make no assumption about process speeds because, when processes time-share a processor, a process can be delayed for quite a long time between successive operations. However, assumptions about execution times may be permissible in a true multiprocessor if the algorithm can be executed by a low-level operating system routine with hardware interrupts disabled. Indeed, an algorithm with busy waiting should never be used if contending processes can share a processor, since a waiting process i could be tying up a processor needed to run the other process that i is waiting for.
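As a concrete illustration (not from the paper), Fischer's protocol might be rendered in C11 roughly as follows, with sequentially consistent atomic loads and stores standing in for the atomic reads and writes of the shared word x. The names fischer_lock, fischer_unlock, and delay, as well as the spin count inside delay, are illustrative assumptions only; a real delay must exceed the worst-case time for another process to complete a pending write to x.

    #include <stdatomic.h>

    /* Shared word x; 0 means that no process is trying to enter or holds
     * the critical section.  Process numbers are positive integers. */
    static atomic_int x = 0;

    /* Stand-in for the paper's "delay": it must be long enough that any
     * process j that read x = 0 before our write x := i has completed its
     * own write x := j.  The spin count below is an arbitrary placeholder
     * that would have to be tuned to the worst-case memory access time. */
    static void delay(void)
    {
        for (volatile long k = 0; k < 100000; k++)
            ;
    }

    /* Entry protocol for process number i (i > 0). */
    static void fischer_lock(int i)
    {
        for (;;) {
            while (atomic_load(&x) != 0)   /* await <x = 0>, i.e., spin */
                ;
            atomic_store(&x, i);           /* <x := i> */
            delay();                       /* wait out competing writes */
            if (atomic_load(&x) == i)      /* until <x = i> */
                return;                    /* caller enters its critical section */
        }
    }

    /* Exit protocol. */
    static void fischer_unlock(void)
    {
        atomic_store(&x, 0);               /* x := 0 */
    }

In the absence of contention this path performs a read of x in the await, the write x := i, a delay as long as one access, a read in the until test, and the exit write x := 0, which matches the five memory access times discussed next.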
The algorithm above appears to require a total of only five memory access times in the absence of contention, since the delay must wait for only a single memory access to occur. However, the delay must be for the worst case access time. Since there could be N − 1 processes contending for access to the memory, the worst case time must be at least O(N) times the best case (most probable) time needed to perform a memory access.¹ Moreover, in computer systems that use a static priority for access to memory, there may not even be an upper bound to the time taken by a memory access. Therefore, an algorithm that has such a delay in the absence of contention is not acceptable.

¹ Memory contention is not necessarily caused by processes contending for the critical section; it could result from processes accessing other words stored in the same memory module as x. Memory contention may be much more probable than contention for the critical section.

Before constructing a better algorithm, let us consider the minimum sequence of memory accesses needed to guarantee mutual exclusion starting from the initial state of the system. The goal is an algorithm that requires a fixed number of memory accesses, independent of N, in the absence of contention. The argument is quite informal, some assertions having such flimsy justification that they might better be called assumptions, and the conclusion could easily be wrong. But even if it should be wrong, the argument can guide the search for a more efficient algorithm, since such an algorithm must violate some assertion in the proof.

Delays long enough to ensure that other processes have done something seem to require O(N) time because of possible memory contention, so we may assume that no delay operations are executed. Therefore, only memory accesses need be considered. Let Si denote the sequence of writes and reads executed by process i in entering its critical section when there is no contention—that is, the sequence executed when every read returns either the initial value or a value written by an earlier operation in Si.

There is no point having a process write a variable that is not read by another process. Any access by Si to a memory word not accessed by Sj can play no part in preventing both i and j from entering the critical section at the same time. Therefore, in a solution using the minimal number of memory references, all the Si should access the same set of memory words. (Remember that Si consists of the accesses performed in the absence of contention.) Since the number of memory words accessed is fixed, independent of N, by increasing N we can guarantee that there are arbitrarily many processes i for which Si consists of the identical sequence of writes and reads—that is, identical except for the actual values that are written, which may depend upon i.
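To spell out the counting behind the last sentence (a hedged sketch; the symbols k, m, and P are illustrative and not the paper's notation): by the stated goal, each Si draws its accesses from a fixed set of k shared words and has length at most m, with both k and m independent of N. Ignoring the values written, the pattern of Si records which word each step touches and whether it is a read or a write, so

    % k = number of shared words, m = maximum uncontended sequence length,
    % both fixed independently of N; each step has 2k possibilities.
    \[
        \#\{\text{possible patterns of } S_i\} \;\le\; \sum_{\ell=0}^{m} (2k)^{\ell} \;=\; P,
        \qquad
        \max_{p}\, \#\{\, i \le N : S_i \text{ has pattern } p \,\} \;\ge\; \lceil N / P \rceil .
    \]

Since P does not depend on N, the right-hand side grows without bound as N increases, which gives arbitrarily many processes with identical uncontended sequences.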