
96

How to Make a Correct Multiprocess Program Execute Correctly on a Multiprocessor

Leslie Lamport

February 14, 1993

Systems Research Center

DEC's business and technology objectives require a strong research program. The Systems Research Center (SRC) and three other research laboratories are committed to filling that need.

SRC began recruiting its first research scientists in 1984; their charter was to advance the state of knowledge in all aspects of computer systems research. Our current work includes exploring high-performance personal computing, distributed computing, programming environments, system modelling techniques, specification technology, and tightly-coupled multiprocessors.

Our approach to both hardware and software research is to create and use real systems so that we can investigate their properties fully. Complex systems cannot be evaluated solely in the abstract. Based on this belief, our strategy is to demonstrate the technical and practical feasibility of our ideas by building prototypes and using them as daily tools. The experience we gain is useful in the short term in enabling us to refine our designs, and invaluable in the long term in helping us to advance the state of knowledge about those systems. Most of the major advances in information systems have come through this strategy, including time-sharing, the ArpaNet, and distributed personal computing.

SRC also performs work of a more mathematical flavor which complements our systems research. Some of this work is in established fields of theoretical computer science, such as the analysis of algorithms, computational geometry, and logics of programming. The rest of this work explores new ground motivated by problems that arise in our systems research.

DEC has a strong commitment to communicating the results and experience gained through pursuing these activities. The Company values the improved understanding that comes with exposing and testing our ideas within the research community. SRC will therefore report results in conferences, in professional journals, and in our research report series. We will seek users for our prototype systems among those with whom we have common research interests, and we will encourage collaboration with university researchers.

Robert W. Taylor, Director


© Digital Equipment Corporation 1993

This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of the Systems Research Center of Digital Equipment Corporation in Palo Alto, California; an acknowledgment of the authors and individual contributors to the work; and all applicable portions of the copyright notice. Copying, reproducing, or republishing for any other purpose shall require a license with payment of fee to the Systems Research Center. All rights reserved.

Author's Abstract

A multiprocess program executing on a modern multiprocessor must issue explicit commands to synchronize memory accesses. A method is proposed for deriving the necessary commands from a correctness proof of the algorithm.

Capsule Review

Recently, a number of mechanisms for interprocess synchronization have been proposed. As engineers attempt to implement multiprocessors of increasing scale and performance, these mechanisms have become quite complex and difficult to reason about.

This short paper presents a formalism based only on two ordering relations between the events of an algorithm, "precedes" and "can affect". It allows the mechanisms that must be provided to ensure the algorithm's correctness to be determined directly from the correctness proof. The formalism and its application to an example algorithm are presented and discussed.

Although the paper is quite terse, a careful reading will reward those interested in concurrency or multiprocessor design.

Chuck Thacker

Contents

1 The Problem
2 The Formalism
3 An Example
   3.1 An Algorithm and its Proof
   3.2 The Implementation
   3.3 Observations
4 Further Remarks
References

1 The Problem

Accessing a single memory location in a multiprocessor is traditionally assumed to be atomic. Such atomicity is a fiction; a memory access consists of a number of hardware actions, and different accesses may be executed concurrently. Early multiprocessors maintained this fiction, but more modern ones usually do not. Instead, they provide special commands with which processes themselves can synchronize memory accesses. The programmer must determine, for each particular computer, what synchronization commands are needed to make his program correct.

One proposed method for achieving the necessary synchronization is with a constrained style of programming specific to a particular type of multiprocessor architecture [7, 8]. Another method is to reason about the program in a mathematical abstraction of the architecture [5]. We take a different approach and derive the synchronization commands from a proof of correctness of the algorithm.

The commonly used formalisms for describing multiprocess programs assume atomicity of memory accesses. When an assumption is built into a formalism, it is difficult to discover from a proof where the assumption is actually needed. Proofs based on these formalisms, including invariance proofs [4, 16] and temporal-logic proofs [17], therefore seem incapable of yielding the necessary synchronization requirements. We derive these requirements from proofs based on a little-used formalism that makes no atomicity assumptions [11, 12, 14]. This proof method is quite general and has been applied to a number of algorithms. The method of extracting synchronization commands from a proof is described by an example: a simple mutual exclusion algorithm. It can be applied to the proof of any algorithm.

Most programs are written in higher-level languages that provide abstractions, such as locks for shared data, that free the programmer from concerns about the memory architecture. The compiler generates synchronization commands to implement the abstractions. However, some algorithms, especially within the operating system, require more efficient implementations than can be achieved with high-level language abstractions. It is to these algorithms, as well as to algorithms for implementing the higher-level abstractions, that our method is directed.

2 The Formalism

An execution of a program is represented by a collection of operation executions with the two relations → (read precedes) and ⇢ (read can affect). An operation execution can be interpreted as a nonempty set of events, where the relations → and ⇢ have the following meanings.

A → B: every event in A precedes every event in B.

A ⇢ B: some event in A precedes some event in B.

However, this interpretation serves only to aid our understanding. Formally, we just assume that the following axioms hold, for any operation executions A, B, C, and D.

A1. → is transitive (A → B → C implies A → C) and irreflexive (¬(A → A)).

A2. A → B implies A ⇢ B and ¬(B ⇢ A).

A3. A → B ⇢ C or A ⇢ B → C implies A ⇢ C.

A4. A → B ⇢ C → D implies A → D.

A5. For any A there are only a finite number of B such that ¬(A → B).

The last axiom essentially asserts that all operation executions terminate; nonterminating operations satisfy a different axiom that is not relevant here. Axiom A5 is useful only for proving liveness properties; safety properties are proved with Axioms A1–A4. Anger [3] and Abraham and Ben-David [1] introduced the additional axiom

A6. A ⇢ B → C ⇢ D implies A ⇢ D,

and showed that A1–A6 form a complete axiom system for the interpretation of operation executions as sets of events.
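To make the event-set interpretation concrete, here is a small self-contained C program that models each operation execution as a real time interval and mechanically spot-checks axioms A1–A4 and A6 over a few overlapping sample executions. It is only an illustrative sketch under the global-time reading described above; the interval representation, the names op_t, precedes, and can_affect, and the sample data are assumptions introduced here, not part of the paper.

    /* A minimal global-time model of operation executions (illustrative only):
     * each operation execution occupies a real interval [start, end), end > start.
     * Under this reading,
     *   precedes(A, B)    (A -> B):  every event of A precedes every event of B;
     *   can_affect(A, B)  (A ~> B):  some event of A precedes some event of B.
     */
    #include <assert.h>
    #include <stdbool.h>

    typedef struct { double start, end; } op_t;

    static bool precedes(op_t a, op_t b)   { return a.end   < b.start; }  /* A -> B */
    static bool can_affect(op_t a, op_t b) { return a.start < b.end;   }  /* A ~> B */

    int main(void) {
        op_t ops[] = { {0, 1}, {0.5, 2}, {1.5, 3}, {2.5, 4} };  /* overlapping samples */
        int n = (int)(sizeof ops / sizeof ops[0]);

        for (int i = 0; i < n; i++)
         for (int j = 0; j < n; j++)
          for (int k = 0; k < n; k++)
           for (int l = 0; l < n; l++) {
            op_t A = ops[i], B = ops[j], C = ops[k], D = ops[l];
            /* A1: -> is transitive and irreflexive */
            if (precedes(A, B) && precedes(B, C)) assert(precedes(A, C));
            assert(!precedes(A, A));
            /* A2: A -> B implies A ~> B and not (B ~> A) */
            if (precedes(A, B)) assert(can_affect(A, B) && !can_affect(B, A));
            /* A3: A -> B ~> C or A ~> B -> C implies A ~> C */
            if ((precedes(A, B) && can_affect(B, C)) ||
                (can_affect(A, B) && precedes(B, C))) assert(can_affect(A, C));
            /* A4: A -> B ~> C -> D implies A -> D */
            if (precedes(A, B) && can_affect(B, C) && precedes(C, D))
                assert(precedes(A, D));
            /* A6: A ~> B -> C ~> D implies A ~> D */
            if (can_affect(A, B) && precedes(B, C) && can_affect(C, D))
                assert(can_affect(A, D));
           }
        return 0;
    }

Axiom A5 cannot be checked on finite data; under this interval reading it corresponds to every operation execution having a finite right endpoint, which is exactly the termination assumption mentioned above.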

Axioms A1–A6 are independent of what the operation executions do. Reasoning about a multiprocess program requires additional axioms to capture the semantics of its operations. The appropriate axioms for read and write operations will depend on the nature of the memory system.

The only assumptions we make about operation executions are axioms A1–A5 and axioms about read and write operations. We do not assume that → and ⇢ are the relations obtained by interpreting an operation execution as the set of all its events. For example, sequential consistency [10] is equivalent to the condition that → is a total ordering on the set of operation executions, a condition that can be satisfied even though the events comprising different operation executions are actually concurrent.

This formalism was developed in an attempt to provide elegant proofs of concurrent algorithms: proofs that replace conventional behavioral arguments with axiomatic reasoning in terms of the two relations → and ⇢. Although the simplicity of such proofs has been questioned [6], they do tend to capture the essence of why an algorithm works.

3 An Example

3.1 An Algorithm and its Proof

Figure 1 shows process i of a simple N-process mutual exclusion algorithm [13]. We prove that the algorithm guarantees mutual exclusion (two processes are never concurrently in their critical sections). The algorithm is also deadlock-free (some critical section is eventually executed unless all processes halt in their noncritical sections), but we do not consider this liveness property. Starvation of individual processes is possible.

The algorithm uses a standard protocol to achieve mutual exclusion. Before entering its critical section, each process i must first set x_i true and then find x_j false, for all other processes j. Mutual exclusion is guaranteed because, when process i finds x_j false, process j cannot enter its critical section until it sets x_j true and finds x_i false, which is impossible until i has exited the critical section and reset x_i. The proof of correctness formalizes this argument.
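For readers who want to execute the protocol, the following is a minimal runnable rendering of Figure 1. It is a sketch, not the paper's code: it uses C11 threads and sequentially consistent atomic operations, so the memory system already supplies the atomicity the proof assumes; N, NITER, the in_cs instrumentation, and the helper names are invented for the example.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <threads.h>

    #define N      4        /* number of processes (illustrative) */
    #define NITER  1000     /* iterations of the repeat loop per process */

    static atomic_bool x[N + 1];   /* x[1..N], all initially false */
    static atomic_int  in_cs;      /* instrumentation: processes currently in the CS */

    static void critical_section(void) {
        if (atomic_fetch_add(&in_cs, 1) != 0) printf("mutual exclusion violated!\n");
        atomic_fetch_sub(&in_cs, 1);
    }

    static int process(void *arg) {                        /* process i of Figure 1 */
        int i = (int)(intptr_t)arg;
        for (int iter = 0; iter < NITER; iter++) {
            /* noncritical section omitted */
    l:      atomic_store(&x[i], true);                     /* statement l */
            for (int j = 1; j <= i - 1; j++)
                if (atomic_load(&x[j])) {
                    atomic_store(&x[i], false);
                    while (atomic_load(&x[j])) ;           /* busy wait */
                    goto l;
                }
            for (int j = i + 1; j <= N; j++)
                while (atomic_load(&x[j])) ;               /* busy wait */
            critical_section();
            atomic_store(&x[i], false);                    /* the write X_i */
        }
        return 0;
    }

    int main(void) {
        thrd_t t[N + 1];
        for (int i = 1; i <= N; i++) thrd_create(&t[i], process, (void *)(intptr_t)i);
        for (int i = 1; i <= N; i++) thrd_join(t[i], NULL);
        return 0;
    }

Because every access above is sequentially consistent, the synchronization discussed in Section 3.2 is implicit in the atomics; the question addressed below is what must be added when the memory system is weaker.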

    repeat forever
        noncritical section;
    l:  x_i := true;
        for j := 1 until i - 1
            do if x_j then x_i := false;
                           while x_j do od;
                           goto l od;
        for j := i + 1 until N do while x_j do od od;
        critical section;
        x_i := false
    end repeat

Figure 1: Process i of an N-process mutual-exclusion algorithm.

To prove mutual exclusion, we first name the following operation executions that occur during the nth iteration of process i's repeat loop.

L_i^n: The last execution of statement l prior to entering the critical section. This operation execution sets x_i to true.

R_{i,j}^n: The last read of x_j before entering the critical section. This read obtains the value false.

CS_i^n: The execution of the critical section.

X_i^n: The write to x_i after exiting the critical section. It writes the value false.

Mutual exclusion asserts that CS_i^n and CS_j^m are not concurrent, for all m and n, if i ≠ j.¹ Two operations are nonconcurrent if one precedes (→) the other. Thus, mutual exclusion is implied by the assertion that, for all m and n, either CS_i^n → CS_j^m or CS_j^m → CS_i^n, if i ≠ j.

¹ Except where indicated otherwise, all assertions have as an unstated hypothesis the assumption that the operation executions they mention actually occur. For example, the theorem in Figure 2 has the hypothesis that CS_i^n and CS_j^m occur.

The proof of mutual exclusion, using axioms A1–A4 and assumptions B1–B4 below, appears in Figure 2. It is essentially the same proof as in [13], except that the properties required of the memory system have been isolated and named B1–B4. (In [13], these properties are deduced from other assumptions.)

B1–B4 are as follows, where universal quantification over n, m, i, and j is assumed. B4 is discussed below.

B1. L_i^n → R_{i,j}^n

B2. R_{i,j}^n → CS_i^n

B3. CS_i^n → X_i^n

Theorem. For all m, n, i, and j such that i ≠ j, either CS_i^n → CS_j^m or CS_j^m → CS_i^n.

Case A: R_{i,j}^n ⇢ L_j^m.

1. L_i^n → R_{j,i}^m
   Proof: B1, case assumption, B1 (applied to L_j^m and R_{j,i}^m), and A4.

2. ¬(R_{j,i}^m ⇢ L_i^n)
   Proof: 1 and A2.

3. X_i^n ⇢ R_{j,i}^m
   Proof: 2 and B4 (applied to R_{j,i}^m, L_i^n, and X_i^n).

4. CS_i^n → CS_j^m
   Proof: B3, 3, B2 (applied to R_{j,i}^m and CS_j^m), and A4.

Case B: ¬(R_{i,j}^n ⇢ L_j^m).

1. X_j^m ⇢ R_{i,j}^n
   Proof: Case assumption and B4.

2. CS_j^m → CS_i^n
   Proof: B3 (applied to CS_j^m and X_j^m), 1, B2, and A4.

Figure 2: Proof of mutual exclusion for the algorithm of Figure 1.

B4. If ¬(R_{i,j}^n ⇢ L_j^m) then X_j^m exists and X_j^m ⇢ R_{i,j}^n.

Although B4 cannot be proved without additional assumptions, it merits an informal justification. The hypothesis, ¬(R_{i,j}^n ⇢ L_j^m), asserts that process i's read R_{i,j}^n of x_j occurred too late for any of its events to have preceded any of the events in process j's write L_j^m of x_j. It is reasonable to infer that the value obtained by the read was written by L_j^m or a later write to x_j. Since L_j^m writes true and R_{i,j}^n is a read of false, R_{i,j}^n must read the value written by a later write. The first write of x_j issued after L_j^m is X_j^m, so we expect X_j^m ⇢ R_{i,j}^n to hold.

3.2 The Implementation

Implementing the algorithm for a particular memory architecture may require synchronization commands to assure B1–B4. Most proposed memory systems satisfy the following property.

C1. All write operations to a single memory cell by any one process are observed by other processes in the order in which they were issued.

They also provide some form of synch command (for example, a "cache flush" operation) satisfying

C2. A synch command causes the issuing process to wait until all previously issued memory accesses have completed.

Properties C1 and C2 are rather informal. We restate them more precisely as follows.

C1'. If the value obtained by a read A issued by process i is the one written by process j, then that value is the one written by the last-issued write B in process j such that B ⇢ A.

C2'. If operation executions A, B, and C are issued in that order by a single process, and B is a synch, then A → C.

Property C2' implies that B1–B3 are guaranteed if synch operations are inserted in process i's code immediately after statement l (for B1), immediately before the critical section (for B2), and immediately after the critical section (for B3). Assumption B4 follows from C1'.
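On an architecture that provides an explicit synch command, the three insertions dictated by C2' can be written down mechanically. The sketch below is one plausible C11 rendering, not the paper's notation: relaxed atomic loads and stores stand in for the machine's ordinary accesses, atomic_thread_fence(memory_order_seq_cst) stands in for synch on the assumption that the target's fence satisfies C2', and the names synch, rd, wr, enter_critical_section, and exit_critical_section are invented for the example.

    #include <stdatomic.h>
    #include <stdbool.h>

    #define N 4                                  /* number of processes (illustrative) */

    static atomic_bool x[N + 1];                 /* shared flags x[1..N] */

    /* Stand-ins: relaxed accesses model ordinary loads and stores of the machine,
     * and synch() models the synchronization command of C2 (assumed to satisfy C2'). */
    static inline void synch(void)       { atomic_thread_fence(memory_order_seq_cst); }
    static inline bool rd(int j)         { return atomic_load_explicit(&x[j], memory_order_relaxed); }
    static inline void wr(int j, bool v) { atomic_store_explicit(&x[j], v, memory_order_relaxed); }

    void enter_critical_section(int i) {         /* entry protocol of Figure 1 */
    l:  wr(i, true);                             /* statement l */
        synch();                                 /* for B1: the write at l precedes the reads of x_j */
        for (int j = 1; j <= i - 1; j++)
            if (rd(j)) {
                wr(i, false);
                while (rd(j)) ;
                goto l;
            }
        for (int j = i + 1; j <= N; j++)
            while (rd(j)) ;
        synch();                                 /* for B2: the reads precede the critical section
                                                  * (can be dropped in the cache architecture below) */
    }

    void exit_critical_section(int i) {
        synch();                                 /* for B3: the critical section precedes the write X_i */
        wr(i, false);
    }

The placement mirrors the prose exactly: one synch immediately after statement l, one immediately before the critical section, and one immediately after it; B4 itself needs no synch, since it follows from C1'.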

Now let us consider a more specialized memory architecture in which each process has its own cache, and a write operation (asynchronously) updates every copy of the memory cell that resides in the caches. In such an architecture, the following additional condition is likely to hold:

C3. A read of a memory cell that resides in the process's cache precedes (→) every operation execution issued subsequently by the same process.

If the memory system provides some way of ensuring that a memory cell is permanently resident in a process's cache, then B2 can be satisfied by keeping all the variables x_j in process i's cache. In this case, the synch immediately preceding the critical section is not needed.

3.3 Observations

One might think that the purpose of memory synchronization commands is to enforce orderings between commands issued by different processes. However, B1–B3 are precedence relations between operations issued by the same process. In general, one process cannot directly observe all the events in the execution of an operation by another process. Hence, the results of executing two operation executions A and D in different processes can permit the deduction only of a causality (⇢) relation between A and D. Only if A and D occur in the same process can A → D be deduced by direct observation. Otherwise, deducing A → D requires the existence of an operation B in the same process as A and an operation C in the same process as D such that A → B ⇢ C → D. Synchronization commands can guarantee the relations A → B and C → D.

The mutual exclusion example illustrates how a set of properties sufficient to guarantee correctness can be extracted directly from a correctness proof of the algorithm. Implementations of the algorithm on different memory architectures can be derived from the assumptions, with no further reasoning about the algorithm.

4 Further Remarks

The atomicity condition traditionally assumed for multiprocess programs is sequential consistency, meaning that the program behaves as if the memory accesses of all processes were interleaved and then executed sequentially [10]. It has been proposed that, when sequential consistency is not provided by the memory system, it be achieved by a constrained style of programming. Synchronization commands are added either explicitly by the programmer, or automatically from hints he provides. The method of [7, 8] can be applied to our simple example, if the x_i are identified by the programmer as synchronization variables. However, in general, deducing what synchronization commands are necessary requires analyzing all possible executions of the program, which is seldom feasible. Such an analysis is needed to find the precedence relations that, in the approach described here, are derived from the proof.

Although it replaces traditional informal reasoning with a more rigorous, axiomatic style, the proof method we have used is essentially behavioral: one reasons directly about the set of operation executions. Behavioral methods do not seem to scale well, and our approach is unlikely to be practical for large, complicated algorithms. Most multiprocess programs for modern multiprocessors are best written in terms of higher-level abstractions. The method presented here can be applied to the algorithms that implement these abstractions and to those algorithms, usually in the depths of the operating system, where efficiency and correctness are crucial.

Assertional proofs are practical for more complicated algorithms. The obvious way to reason assertionally about algorithms with nonatomic memory operations is to represent a memory access by a sequence of atomic operations [2, 9]. With this approach, the memory architecture and synchronization operations are encoded in the algorithm. Therefore, a new proof is needed for each architecture, and the proofs are unlikely to help discover what synchronization operations are needed. A less obvious approach uses the predicate transformers win (weakest invariant) and sin (strongest invariant) to write assertional proofs for algorithms in which no atomic operations are assumed, requirements on the memory architecture being described by axioms [15]. Such a proof would establish the correctness of an algorithm for a large class of memory architectures. However, in this approach, all intraprocess → relations are encoded in the algorithm, so the proofs are unlikely to help discover the very precedence relations that lead to the introduction of synchronization operations.

Acknowledgments

I wish to thank Allan Heydon, Michael Merritt, David Probst, Garrett

Swart, Fred Schneider, and Chuck Thacker for their comments on earlier

versions.

References

[1] Uri Abraham, Shai Ben-David, and Menachem Magidor. On global-time and inter-process communication. In M. Z. Kwiatkowska, M. W. Shields, and R. M. Thomas, editors, Semantics for Concurrency, pages 311–323. Springer-Verlag, Leicester, 1990.

[2] James H. Anderson and Mohamed G. Gouda. Atomic semantics of nonatomic programs. Information Processing Letters, 28:99–103, June 1988.

[3] Frank D. Anger. On Lamport's interprocessor communication model. ACM Transactions on Programming Languages and Systems, 11(3):404–417, July 1989.

[4] E. A. Ashcroft. Proving assertions about parallel programs. Journal of Computer and System Sciences, 10:110–135, February 1975.

[5] Hagit Attiya and Roy Friedman. A correctness condition for high-performance multiprocessors. In Proceedings of the Twenty-Fourth Annual ACM Symposium on the Theory of Computing, pages 679–690, 1992.

[6] Shai Ben-David. The global time assumption and semantics for concurrent systems. In Proceedings of the 7th Annual ACM Symposium on Principles of Distributed Computing, pages 223–232. ACM Press, 1988.

[7] Kourosh Gharachorloo, Daniel Lenoski, James Laudon, Phillip Gibbons, Anoop Gupta, and John Hennessy. Memory consistency and event ordering in scalable shared-memory multiprocessors. In Proceedings of the International Conference on Computer Architecture, 1990.

[8] Phillip B. Gibbons, Michael Merritt, and Kourosh Gharachorloo. Proving sequential consistency of high-performance shared memories. In Symposium on Parallel Algorithms and Architectures, July 1991. A full version is available as an AT&T Bell Laboratories technical report, May 1991.

[9] Leslie Lamport. Proving the correctness of multiprocess programs. IEEE Transactions on Software Engineering, SE-3(2):125–143, March 1977.

[10] Leslie Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Transactions on Computers, C-28(9):690–691, September 1979.

[11] Leslie Lamport. A new approach to proving the correctness of multiprocess programs. ACM Transactions on Programming Languages and Systems, 1(1):84–97, July 1979.

[12] Leslie Lamport. The mutual exclusion problem. Part I: A theory of interprocess communication. Journal of the ACM, 33(2):313–326, January 1985.

[13] Leslie Lamport. The mutual exclusion problem. Part II: Statement and solutions. Journal of the ACM, 32(1):327–348, January 1985.

[14] Leslie Lamport. On interprocess communication. Part I: Basic formalism. Distributed Computing, 1:77–85, 1986.

[15] Leslie Lamport. win and sin: Predicate transformers for concurrency. ACM Transactions on Programming Languages and Systems, 12(3):396–428, July 1990.

[16] Susan Owicki and David Gries. Verifying properties of parallel programs: An axiomatic approach. Communications of the ACM, 19(5):279–284, May 1976.

[17] Amir Pnueli. The temporal logic of programs. In Proceedings of the 18th Annual Symposium on the Foundations of Computer Science, pages 46–57. IEEE, November 1977.