96

How to Make a Correct Multiprocess Program
Execute Correctly on a Multiprocessor

Leslie Lamport

February 14, 1993

Systems Research Center
DEC's business and technology objectives require a strong research program.
The Systems Research Center (SRC) and three other research laboratories
are committed to filling that need.

SRC began recruiting its first research scientists in 1984; their charter, to
advance the state of knowledge in all aspects of computer systems research.
Our current work includes exploring high-performance personal computing,
distributed computing, programming environments, system modelling
techniques, specification technology, and tightly-coupled multiprocessors.

Our approach to both hardware and software research is to create and use
real systems so that we can investigate their properties fully. Complex
systems cannot be evaluated solely in the abstract. Based on this belief,
our strategy is to demonstrate the technical and practical feasibility of our
ideas by building prototypes and using them as daily tools. The experience
we gain is useful in the short term in enabling us to refine our designs, and
invaluable in the long term in helping us to advance the state of knowledge
about those systems. Most of the major advances in information systems
have come through this strategy, including time-sharing, the ArpaNet, and
distributed personal computing.

SRC also performs work of a more mathematical flavor which complements
our systems research. Some of this work is in established fields of theoretical
computer science, such as the analysis of algorithms, computational geometry,
and logics of programming. The rest of this work explores new ground
motivated by problems that arise in our systems research.

DEC has a strong commitment to communicating the results and experience
gained through pursuing these activities. The Company values the improved
understanding that comes with exposing and testing our ideas within the
research community. SRC will therefore report results in conferences, in
professional journals, and in our research report series. We will seek users
for our prototype systems among those with whom we have common research
interests, and we will encourage collaboration with university researchers.

Robert W. Taylor, Director
© Digital Equipment Corporation 1993

This work may not be copied or reproduced in whole or in part for any
commercial purpose. Permission to copy in whole or in part without payment
of fee is granted for nonprofit educational and research purposes provided
that all such whole or partial copies include the following: a notice that
such copying is by permission of the Systems Research Center of Digital
Equipment Corporation in Palo Alto, California; an acknowledgment of the
authors and individual contributors to the work; and all applicable portions
of the copyright notice. Copying, reproducing, or republishing for any other
purpose shall require a license with payment of fee to the Systems Research
Center. All rights reserved.
Author's Abstract

A multiprocess program executing on a modern multiprocessor must issue
explicit commands to synchronize memory accesses. A method is proposed for
deriving the necessary commands from a correctness proof of the algorithm.
Capsule Review

Recently, a number of mechanisms for interprocess synchronization have been
proposed. As engineers attempt to implement multiprocessors of increasing
scale and performance, these mechanisms have become quite complex and
difficult to reason about.

This short paper presents a formalism based only on two ordering relations
between the events of an algorithm, "precedes" and "can affect". It allows
the mechanisms that must be provided to ensure the algorithm's correctness
to be determined directly from the correctness proof. The formalism and
its application to an example mutual exclusion algorithm are presented and
discussed.

Although the paper is quite terse, a careful reading will reward those
interested in concurrency or multiprocessor design.

Chuck Thacker
Contents

1 The Problem
2 The Formalism
3 An Example
  3.1 An Algorithm and its Proof
  3.2 The Implementation
  3.3 Observations
4 Further Remarks
References
1 The Problem

Accessing a single memory location in a multiprocessor is traditionally
assumed to be atomic. Such atomicity is a fiction; a memory access consists
of a number of hardware actions, and different accesses may be executed
concurrently. Early multiprocessors maintained this fiction, but more modern
ones usually do not. Instead, they provide special commands with which
processes themselves can synchronize memory accesses. The programmer must
determine, for each particular computer, what synchronization commands are
needed to make his program correct.
One proposed method for achieving the necessary synchronization is with a
constrained style of programming specific to a particular type of
multiprocessor architecture [7, 8]. Another method is to reason about the
program in a mathematical abstraction of the architecture [5]. We take a
different approach and derive the synchronization commands from a proof of
correctness of the algorithm.
The commonly used formalisms for describing multiprocess programs assume
atomicity of memory accesses. When an assumption is built into a formalism,
it is difficult to discover from a proof where the assumption is actually
needed. Proofs based on these formalisms, including invariance proofs
[4, 16] and temporal-logic proofs [17], therefore seem incapable of yielding
the necessary synchronization requirements. We derive these requirements
from proofs based on a little-used formalism that makes no atomicity
assumptions [11, 12, 14]. This proof method is quite general and has been
applied to a number of algorithms. The method of extracting synchronization
commands from a proof is described by an example: a simple mutual exclusion
algorithm. It can be applied to the proof of any algorithm.

Most programs are written in higher-level languages that provide
abstractions, such as locks for shared data, that free the programmer from
concerns about the memory architecture. The compiler generates
synchronization commands to implement the abstractions. However, some
algorithms, especially within the operating system, require more efficient
implementations than can be achieved with high-level language abstractions.
It is to these algorithms, as well as to algorithms for implementing the
higher-level abstractions, that our method is directed.
2 The Formalism

An execution of a program is represented by a collection of operation
executions with the two relations → (read "precedes") and ⇢ (read "can
affect"). An operation execution can be interpreted as a nonempty set of
events, where the relations → and ⇢ have the following meanings.

A → B: every event in A precedes every event in B.

A ⇢ B: some event in A precedes some event in B.

However, this interpretation serves only to aid our understanding. Formally,
we just assume that the following axioms hold, for any operation executions
A, B, C, and D.

A1. → is transitive (A → B → C implies A → C) and irreflexive
    (¬(A → A)).

A2. A → B implies A ⇢ B and ¬(B ⇢ A).

A3. A → B ⇢ C or A ⇢ B → C implies A ⇢ C.

A4. A → B ⇢ C → D implies A → D.

A5. For any A there are only a finite number of B such that ¬(A → B).

The last axiom essentially asserts that all operation executions terminate;
nonterminating operations satisfy a different axiom that is not relevant
here. Axiom A5 is useful only for proving liveness properties; safety
properties are proved with Axioms A1-A4. Anger [3] and Abraham and
Ben-David [1] introduced the additional axiom

A6. A ⇢ B → C ⇢ D implies A ⇢ D.

and showed that A1-A6 form a complete axiom system for the interpretation
of operation executions as sets of events.
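For a finite execution, axioms of this kind can be checked mechanically. The following sketch (an illustration of ours, not part of the paper) represents → and ⇢ as sets of ordered pairs and reports which of A1-A4 and A6 fail; A5 concerns finiteness and holds trivially here.

```python
def failed_axioms(ops, prec, caff):
    """Check axioms A1-A4 and A6 on finite relations.

    `prec` models -> (precedes) and `caff` models the dashed arrow
    (can affect); both are sets of ordered pairs drawn from `ops`.
    Returns the names of the axioms that fail."""
    bad = []
    # A1: -> is transitive and irreflexive.
    if any((a, b) in prec and (b, c) in prec and (a, c) not in prec
           for a in ops for b in ops for c in ops) or \
       any((a, a) in prec for a in ops):
        bad.append("A1")
    # A2: A -> B implies A --> B and not (B --> A).
    if any((a, b) in prec and ((a, b) not in caff or (b, a) in caff)
           for a in ops for b in ops):
        bad.append("A2")
    # A3: A -> B --> C or A --> B -> C implies A --> C.
    if any((((a, b) in prec and (b, c) in caff) or
            ((a, b) in caff and (b, c) in prec)) and (a, c) not in caff
           for a in ops for b in ops for c in ops):
        bad.append("A3")
    # A4: A -> B --> C -> D implies A -> D.
    if any((a, b) in prec and (b, c) in caff and (c, d) in prec
           and (a, d) not in prec
           for a in ops for b in ops for c in ops for d in ops):
        bad.append("A4")
    # A6: A --> B -> C --> D implies A --> D.
    if any((a, b) in caff and (b, c) in prec and (c, d) in caff
           and (a, d) not in caff
           for a in ops for b in ops for c in ops for d in ops):
        bad.append("A6")
    return bad

# A small sequentially ordered execution satisfies every axiom:
ops = {"A", "B", "C"}
prec = {("A", "B"), ("B", "C"), ("A", "C")}
caff = set(prec)
ok = failed_axioms(ops, prec, caff)
# Dropping ("A", "C") from "can affect" violates A2 (and with it A3):
broken = failed_axioms(ops, prec, caff - {("A", "C")})
```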
Axioms A1-A6 are independent of what the operation executions do. Reasoning
about a multiprocess program requires additional axioms to capture the
semantics of its operations. The appropriate axioms for read and write
operations will depend on the nature of the memory system.

The only assumptions we make about operation executions are axioms A1-A5
and axioms about read and write operations. We do not assume that → and ⇢
are the relations obtained by interpreting an operation execution as the set
of all its events. For example, sequential consistency [10] is equivalent to
the condition that → is a total ordering on the set of operation
executions, a condition that can be satisfied even though the events
comprising different operation executions are actually concurrent.

This formalism was developed in an attempt to provide elegant proofs of
concurrent algorithms: proofs that replace conventional behavioral arguments
with axiomatic reasoning in terms of the two relations → and ⇢. Although
the simplicity of such proofs has been questioned [6], they do tend to
capture the essence of why an algorithm works.
3 An Example

3.1 An Algorithm and its Proof

Figure 1 shows process i of a simple N-process mutual exclusion
algorithm [13]. We prove that the algorithm guarantees mutual exclusion (two
processes are never concurrently in their critical sections). The algorithm
is also deadlock-free (some critical section is eventually executed unless
all processes halt in their noncritical sections), but we do not consider
this liveness property. Starvation of individual processes is possible.

The algorithm uses a standard protocol to achieve mutual exclusion. Before
entering its critical section, each process i must first set x_i true and
then find x_j false, for all other processes j. Mutual exclusion is
guaranteed because, when process i finds x_j false, process j cannot enter
its critical section until it sets x_j true and finds x_i false, which is
impossible until i has exited the critical section and reset x_i. The proof
of correctness formalizes this argument.

To prove mutual exclusion, we first name the following operation executions
that occur during the nth iteration of process i's repeat loop.

L_i^n    The last execution of statement l prior to entering the critical
         section. This operation execution sets x_i to true.

R_{i,j}^n    The last read of x_j before entering the critical section.
         This read obtains the value false.
repeat forever
    noncritical section;
    l: x_i := true;
    for j := 1 until i-1
        do if x_j then x_i := false;
                       while x_j do od;
                       goto l
        od;
    for j := i+1 until N do while x_j do od od;
    critical section;
    x_i := false
end repeat

Figure 1: Process i of an N-process mutual-exclusion algorithm.
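Under the traditional assumption of atomic, sequentially consistent accesses, the algorithm of Figure 1 can be exercised by simulation. In the sketch below (ours; the harness names are invented), each process is a coroutine that yields at every memory access, and a randomized scheduler interleaves the processes while checking mutual exclusion.

```python
import random

def process(i, N, x, in_cs):
    """Process i of the Figure 1 algorithm; each `yield` marks one
    atomic (sequentially consistent) memory access."""
    while True:                             # repeat forever
        restart = True
        while restart:                      # label l
            restart = False
            x[i] = True; yield
            for j in range(N):
                if j == i:
                    continue
                if j < i:
                    busy = x[j]; yield
                    if busy:
                        x[i] = False; yield
                        while True:         # while x_j do od
                            busy = x[j]; yield
                            if not busy:
                                break
                        restart = True      # goto l
                        break
                else:
                    while True:             # while x_j do od
                        busy = x[j]; yield
                        if not busy:
                            break
        in_cs[i] = True; yield              # critical section
        assert sum(in_cs) == 1, "mutual exclusion violated"
        in_cs[i] = False
        x[i] = False; yield

def simulate(N=3, steps=5000, seed=0):
    """Interleave the processes at random; raises if exclusion fails."""
    x = [False] * N
    in_cs = [False] * N
    procs = [process(i, N, x, in_cs) for i in range(N)]
    rng = random.Random(seed)
    for _ in range(steps):
        next(rng.choice(procs))
```

Such a simulation only samples executions: it may refute a broken algorithm but cannot establish correctness, which is the point of arguing from axioms instead.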
CS_i^n    The execution of the critical section.

X_i^n    The write to x_i after exiting the critical section. It writes the
         value false.
Mutual exclusion asserts that CS_i^n and CS_j^m are not concurrent, for all
m and n, if i ≠ j. Two operations are nonconcurrent if one precedes (→)
the other.[1] Thus, mutual exclusion is implied by the assertion that, for
all m and n, either CS_i^n → CS_j^m or CS_j^m → CS_i^n, if i ≠ j.

The proof of mutual exclusion, using axioms A1-A4 and assumptions B1-B4
below, appears in Figure 2. It is essentially the same proof as in [13],
except that the properties required of the memory system have been isolated
and named B1-B4. (In [13], these properties are deduced from other
assumptions.)

B1-B4 are as follows, where universal quantification over n, m, i, and j is
assumed. B4 is discussed below.
B1. L_i^n → R_{i,j}^n

B2. R_{i,j}^n → CS_i^n

B3. CS_i^n → X_i^n

[1] Except where indicated otherwise, all assertions have as an unstated
hypothesis the assumption that the operation executions they mention
actually occur. For example, the theorem in Figure 2 has the hypothesis that
CS_i^n and CS_j^m occur.
Theorem  For all m, n, i, and j such that i ≠ j, either CS_i^n → CS_j^m
or CS_j^m → CS_i^n.

Case A: R_{i,j}^n ⇢ L_j^m.

1. L_i^n → R_{j,i}^m
   Proof: B1, case assumption, B1 (applied to L_j^m and R_{j,i}^m), and A4.

2. ¬(R_{j,i}^m ⇢ L_i^n)
   Proof: 1 and A2.

3. X_i^n ⇢ R_{j,i}^m
   Proof: 2 and B4 (applied to R_{j,i}^m, L_i^n, and X_i^n).

4. CS_i^n → CS_j^m
   Proof: B3, 3, B2 (applied to R_{j,i}^m and CS_j^m), and A4.

Case B: ¬(R_{i,j}^n ⇢ L_j^m).

1. X_j^m ⇢ R_{i,j}^n
   Proof: Case assumption and B4.

2. CS_j^m → CS_i^n
   Proof: B3 (applied to CS_j^m and X_j^m), 1, B2, and A4.

Figure 2: Proof of mutual exclusion for the algorithm of Figure 1.
B4. If ¬(R_{i,j}^n ⇢ L_j^m) then X_j^m exists and X_j^m ⇢ R_{i,j}^n.

Although B4 cannot be proved without additional assumptions, it merits an
informal justification. The hypothesis, ¬(R_{i,j}^n ⇢ L_j^m), asserts that
process i's read R_{i,j}^n of x_j occurred too late for any of its events to
have preceded any of the events in process j's write L_j^m of x_j. It is
reasonable to infer that the value obtained by the read was written by L_j^m
or a later write to x_j. Since L_j^m writes true and R_{i,j}^n is a read of
false, R_{i,j}^n must read the value written by a later write. The first
write of x_j issued after L_j^m is X_j^m, so we expect X_j^m ⇢ R_{i,j}^n
to hold.
3.2 The Implementation

Implementing the algorithm for a particular memory architecture may require
synchronization commands to assure B1-B4. Most proposed memory systems
satisfy the following property.

C1. All write operations to a single memory cell by any one process are
    observed by other processes in the order in which they were issued.

They also provide some form of synch command (for example, a "cache flush"
operation) satisfying

C2. A synch command causes the issuing process to wait until all previously
    issued memory accesses have completed.

Properties C1 and C2 are rather informal. We restate them more precisely as
follows.

C1'. If the value obtained by a read A issued by process i is the one
     written by process j, then that value is the one written by the
     last-issued write B in process j such that B ⇢ A.

C2'. If operation executions A, B, and C are issued in that order by a
     single process, and B is a synch, then A → C.

Property C2' implies that B1-B3 are guaranteed if synch operations are
inserted in process i's code immediately after statement l (for B1),
immediately before the critical section (for B2), and immediately after the
critical section (for B3). Assumption B4 follows from C1'.
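The role of the synch inserted after statement l can be sketched on a toy store-buffer machine (our model, with C1 and C2 idealized; only a two-process entry protocol is represented). A write sits in the issuing process's buffer until it drains to shared memory; a synch blocks the process while its own buffer is nonempty. Exhaustive search over this model finds a violation without the synch and none with it.

```python
def both_can_enter(use_synch):
    """Exhaustively explore a 2-process entry protocol on a machine
    with per-process store buffers.  Each process: (pc 0) issue a
    buffered write setting its own flag, then (pc 1) read the other's
    flag, entering the critical section iff it reads False.  With
    use_synch, the read is blocked until the process's own buffer has
    drained.  Returns True iff some execution lets both enter."""
    # state: (program counters, store buffers, shared memory, reads)
    init = ((0, 0), (False, False), (False, False), (None, None))
    seen, stack = {init}, [init]
    while stack:
        pcs, bufs, mem, reads = stack.pop()
        if pcs == (2, 2) and reads == (False, False):
            return True                     # both entered: violation
        succs = []
        for p in (0, 1):
            if bufs[p]:                     # buffered write drains
                m, b = list(mem), list(bufs)
                m[p], b[p] = True, False
                succs.append((pcs, tuple(b), tuple(m), reads))
            if pcs[p] == 0:                 # issue write into buffer
                b, c = list(bufs), list(pcs)
                b[p], c[p] = True, 1
                succs.append((tuple(c), tuple(b), mem, reads))
            elif pcs[p] == 1 and not (use_synch and bufs[p]):
                r, c = list(reads), list(pcs)
                r[p], c[p] = mem[1 - p], 2  # read the other's flag
                succs.append((tuple(c), bufs, mem, tuple(r)))
        for s in succs:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return False
```

The search is exhaustive over a finite state space, so here (unlike random simulation) the negative answer with the synch is conclusive for this model.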
Now let us consider a more specialized memory architecture in which each
process has its own cache, and a write operation (asynchronously) updates
every copy of the memory cell that resides in the caches. In such an
architecture, the following additional condition is likely to hold:

C3. A read of a memory cell that resides in the process's cache precedes
    (→) every operation execution issued subsequently by the same process.

If the memory system provides some way of ensuring that a memory cell is
permanently resident in a process's cache, then B2 can be satisfied by
keeping all the variables x_j in process i's cache. In this case, the synch
immediately preceding the critical section is not needed.
3.3 Observations

One might think that the purpose of memory synchronization commands is to
enforce orderings between commands issued by different processes. However,
B1-B3 are precedence relations between operations issued by the same
process. In general, one process cannot directly observe all the events in
the execution of an operation by another process. Hence, the results of
executing two operation executions A and D in different processes can permit
the deduction only of a causality (⇢) relation between A and D. Only if A
and D occur in the same process can A → D be deduced by direct observation.
Otherwise, deducing A → D requires the existence of an operation B in the
same process as A and an operation C in the same process as D such that
A → B ⇢ C → D. Synchronization commands can guarantee the relations
A → B and C → D.
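The deduction pattern is mechanical: from same-process precedences and one observed cross-process causality, axiom A4 yields the cross-process precedence. The following sketch (ours, with invented names) closes a precedence relation under A4.

```python
def close_under_a4(prec, caff):
    """Extend -> with every consequence of axiom A4:
    A -> B --> C -> D implies A -> D.  `prec` and `caff` are sets of
    ordered pairs; returns the enlarged `prec` at a fixed point."""
    prec = set(prec)
    while True:
        new = {(a, d)
               for (a, b) in prec
               for (b2, c) in caff if b2 == b
               for (c2, d) in prec if c2 == c} - prec
        if not new:
            return prec
        prec |= new

# A -> B in one process (by a synch), C -> D in another (likewise),
# and B --> C observed across the processes, give A -> D:
derived = close_under_a4({("A", "B"), ("C", "D")}, {("B", "C")})
```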
The mutual exclusion example illustrates how a set of properties sufficient
to guarantee correctness can be extracted directly from a correctness proof
of the algorithm. Implementations of the algorithm on different memory
architectures can be derived from the assumptions, with no further reasoning
about the algorithm.
4 Further Remarks

The atomicity condition traditionally assumed for multiprocess programs is
sequential consistency, meaning that the program behaves as if the memory
accesses of all processes were interleaved and then executed
sequentially [10]. It has been proposed that, when sequential consistency is
not provided by the memory system, it be achieved by a constrained style of
programming. Synchronization commands are added either explicitly by the
programmer, or automatically from hints he provides. The method of [7, 8]
can be applied to our simple example, if the x_i are identified by the
programmer as synchronization variables. However, in general, deducing what
synchronization commands are necessary requires analyzing all possible
executions of the program, which is seldom feasible. Such an analysis is
needed to find the precedence relations that, in the approach described
here, are derived from the proof.

Although it replaces traditional informal reasoning with a more rigorous,
axiomatic style, the proof method we have used is essentially behavioral:
one reasons directly about the set of operation executions. Behavioral
methods do not seem to scale well, and our approach is unlikely to be
practical for large, complicated algorithms. Most multiprocess programs for
modern multiprocessors are best written in terms of higher-level
abstractions. The method presented here can be applied to the algorithms
that implement these abstractions and to those algorithms, usually in the
depths of the operating system, where efficiency and correctness are
crucial.
Assertional proofs are practical for more complicated algorithms. The
obvious way to reason assertionally about algorithms with nonatomic memory
operations is to represent a memory access by a sequence of atomic
operations [2, 9]. With this approach, the memory architecture and
synchronization operations are encoded in the algorithm. Therefore, a new
proof is needed for each architecture, and the proofs are unlikely to help
discover what synchronization operations are needed. A less obvious approach
uses the predicate transformers win (weakest invariant) and sin (strongest
invariant) to write assertional proofs for algorithms in which no atomic
operations are assumed, requirements on the memory architecture being
described by axioms [15]. Such a proof would establish the correctness of an
algorithm for a large class of memory architectures. However, in this
approach, all intraprocess → relations are encoded in the algorithm, so the
proofs are unlikely to help discover the very precedence relations that lead
to the introduction of synchronization operations.
Acknowledgments
I wish to thank Allan Heydon, Michael Merritt, David Probst, Garrett
Swart, Fred Schneider, and Chuck Thacker for their comments on earlier
versions.
References

[1] Uri Abraham, Shai Ben-David, and Menachem Magidor. On global-time and
inter-process communication. In M. Z. Kwiatkowska, M. W. Shields, and
R. M. Thomas, editors, Semantics for Concurrency, pages 311-323.
Springer-Verlag, Leicester, 1990.

[2] James H. Anderson and Mohamed G. Gouda. Atomic semantics of nonatomic
programs. Information Processing Letters, 28:99-103, June 1988.

[3] Frank D. Anger. On Lamport's interprocessor communication model. ACM
Transactions on Programming Languages and Systems, 11(3):404-417, July 1989.

[4] E. A. Ashcroft. Proving assertions about parallel programs. Journal of
Computer and System Sciences, 10:110-135, February 1975.

[5] Hagit Attiya and Roy Friedman. A correctness condition for
high-performance multiprocessors. In Proceedings of the Twenty-Fourth
Annual ACM Symposium on the Theory of Computing, pages 679-690, 1992.

[6] Shai Ben-David. The global time assumption and semantics for concurrent
systems. In Proceedings of the 7th Annual ACM Symposium on Principles of
Distributed Computing, pages 223-232. ACM Press, 1988.

[7] Kourosh Gharachorloo, Daniel Lenoski, James Laudon, Phillip Gibbons,
Anoop Gupta, and John Hennessy. Memory consistency and event ordering in
scalable shared-memory multiprocessors. In Proceedings of the International
Conference on Computer Architecture, 1990.

[8] Phillip B. Gibbons, Michael Merritt, and Kourosh Gharachorloo. Proving
sequential consistency of high-performance shared memories. In Symposium on
Parallel Algorithms and Architectures, July 1991. A full version available
as an AT&T Bell Laboratories technical report, May 1991.

[9] Leslie Lamport. Proving the correctness of multiprocess programs. IEEE
Transactions on Software Engineering, SE-3(2):125-143, March 1977.

[10] Leslie Lamport. How to make a multiprocessor computer that correctly
executes multiprocess programs. IEEE Transactions on Computers,
C-28(9):690-691, September 1979.

[11] Leslie Lamport. A new approach to proving the correctness of
multiprocess programs. ACM Transactions on Programming Languages and
Systems, 1(1):84-97, July 1979.

[12] Leslie Lamport. The mutual exclusion problem, part I: A theory of
interprocess communication. Journal of the ACM, 33(2):313-326, January 1985.

[13] Leslie Lamport. The mutual exclusion problem, part II: Statement and
solutions. Journal of the ACM, 32(1):327-348, January 1985.

[14] Leslie Lamport. On interprocess communication, part I: Basic
formalism. Distributed Computing, 1:77-85, 1986.

[15] Leslie Lamport. win and sin: Predicate transformers for concurrency.
ACM Transactions on Programming Languages and Systems, 12(3):396-428, July
1990.

[16] Susan Owicki and David Gries. Verifying properties of parallel
programs: An axiomatic approach. Communications of the ACM, 19(5):279-284,
May 1976.

[17] Amir Pnueli. The temporal logic of programs. In Proceedings of the
18th Annual Symposium on the Foundations of Computer Science, pages 46-57.
IEEE, November 1977.