
Tight Bounds for k-Set Agreement

Soma Chaudhuri Maurice Herlihy Nancy A. Lynch Mark R. Tuttle

CRL 98/4 May 1998 Cambridge Research Laboratory

The Cambridge Research Laboratory was founded in 1987 to advance the state of the art in both core computing and human-computer interaction, and to use the knowledge so gained to support the Company's corporate objectives. We believe this is best accomplished through interconnected pursuits in technology creation, advanced systems engineering, and business development. We are actively investigating scalable computing; mobile computing; vision-based human and scene sensing; speech interaction; computer-animated synthetic persona; intelligent information appliances; and the capture, coding, storage, indexing, retrieval, decoding, and rendering of multimedia data. We recognize and embrace a technology creation model which is characterized by three major phases:

Freedom: The life blood of the Laboratory comes from the observations and imaginations of our research staff. It is here that challenging research problems are uncovered (through discussions with customers, through interactions with others in the Corporation, through other professional interactions, through reading, and the like) or that new ideas are born. For any such problem or idea, this phase culminates in the nucleation of a project team around a well articulated central research question and the outlining of a research plan.

Focus: Once a team is formed, we aggressively pursue the creation of new technology based on the plan. This may involve direct collaboration with other technical professionals inside and outside the Corporation. This phase culminates in the demonstrable creation of new technology which may take any of a number of forms - a journal article, a technical talk, a working prototype, a patent application, or some combination of these. The research team is typically augmented with other resident professionals—engineering and business development—who work as integral members of the core team to prepare preliminary plans for how best to leverage this new knowledge, either through internal transfer of technology or through other means.

Follow-through: We actively pursue taking the best technologies to the marketplace. For those opportunities which are not immediately transferred internally and where the team has identified a significant opportunity, the business development and engineering staff will lead early-stage commercial development, often in conjunction with members of the research staff. While the value to the Corporation of taking these new ideas to the market is clear, it also has a significant positive impact on our future research work by providing the means to understand intimately the problems and opportunities in the market and to more fully exercise our ideas and concepts in real-world settings.

Throughout this process, communicating our understanding is a critical part of what we do, and participating in the larger technical community, through the publication of refereed journal articles and the presentation of our ideas at conferences, is essential. Our technical report series supports and facilitates broad and early dissemination of our work. We welcome your feedback on its effectiveness.

Robert A. Iannucci, Ph.D.
Director

Tight Bounds for k-Set Agreement

Soma Chaudhuri
Department of Computer Science
Iowa State University
Ames, IA 50011
[email protected]

Maurice Herlihy
Computer Science Department
Brown University
Providence, RI 02912
[email protected]

Nancy A. Lynch
MIT Lab for Computer Science
545 Technology Square
Cambridge, MA 02139
[email protected]

Mark R. Tuttle
DEC Cambridge Research Lab
One Kendall Square, Bldg 700
Cambridge, MA 02139
[email protected]

May 1998

Abstract

We prove tight bounds on the time needed to solve k-set agreement. In this problem, each processor starts with an arbitrary input value taken from a fixed set, and halts after choosing an output value. In every execution, at most k distinct output values may be chosen, and every processor's output value must be some processor's input value. We analyze this problem in a synchronous, message-passing model where processors fail by crashing. We prove a lower bound of $\lfloor f/k \rfloor + 1$ rounds of communication for solutions to k-set agreement that tolerate f failures, and we exhibit a protocol proving the matching upper bound. This result shows that there is an inherent tradeoff between the running time, the degree of coordination required, and the number of faults tolerated, even in idealized models like the synchronous model. The proof of this result is interesting because it is the first to apply topological techniques to the synchronous model.

© Digital Equipment Corporation, 1998

This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of the Cambridge Research Laboratory of Digital Equipment Corporation in Cambridge, Massachusetts; an acknowledgment of the authors and individual contributors to the work; and all applicable portions of the copyright notice. Copying, reproducing, or republishing for any other purpose shall require a license with payment of fee to the Cambridge Research Laboratory. All rights reserved.

CRL Technical reports are available on the CRL’s web page at http://www.crl.research.digital.com.

Digital Equipment Corporation
Cambridge Research Laboratory
One Kendall Square, Building 700
Cambridge, Massachusetts 02139

1 Introduction

Most interesting problems in concurrent and distributed computing require processors to coordinate their actions in some way. It can also be important for protocols solving these problems to tolerate processor failures, and to execute quickly. Ideally, one would like to optimize all three properties (degree of coordination, fault-tolerance, and efficiency), but in practice, of course, it is usually necessary to make tradeoffs among them. In this paper, we give a precise characterization of the tradeoffs required by studying a family of basic coordination problems called k-set agreement.

In k-set agreement [Cha91], each processor starts with an arbitrary input value and halts after choosing an output value. These output values must satisfy two conditions: each output value must be some processor's input value, and the set of output values chosen must contain at most k distinct values. The first condition rules out trivial solutions in which a single value is hard-wired into the protocol and chosen by all processors in all executions, and the second condition requires that the processors coordinate their choices to some degree. This problem is interesting because it defines a family of coordination problems of increasing difficulty. At one extreme, if n is the number of processors in the system, then n-set agreement is trivial: each processor simply chooses its own input value. At the other extreme, 1-set agreement requires that all processors choose the same output value, a problem equivalent to the consensus problem [LSP82, PSL80, FL82, FLP85, Dol82, Fis83]. Consensus is well-known to be the "hardest" problem, in the sense that all other decision problems can be reduced to it. Consensus arises in applications as diverse as on-board aircraft control [W+78], database transaction commit [BHG87], and concurrent object design [Her91]. Between these extremes, as we vary the value of k from n to 1, we gradually increase the degree of processor coordination required.

We consider this family of problems in a synchronous, message-passing model with crash failures. In this model, n processors communicate by sending messages over a completely connected network. Computation in this model proceeds in a sequence of rounds. In each round, processors send messages to other processors, then receive messages sent to them in the same round, and then perform some local computation and change state. This means that all processors take steps at the same rate, and that all messages take the same amount of time to be delivered. Communication is reliable, but up to f processors can fail by stopping in the middle of the protocol.

The primary contribution of this paper is a lower bound on the amount of time required to solve k-set agreement, together with a protocol for k-set agreement that proves a matching upper bound. Specifically, we prove that any protocol solving k-set agreement in this model and tolerating f failures requires $\lfloor f/k \rfloor + 1$ rounds of communication in the worst case, assuming $n \ge f + k + 1$ (meaning that there are at least $k+1$ nonfaulty processors), and we prove a matching upper bound by exhibiting a protocol that solves k-set agreement in $\lfloor f/k \rfloor + 1$ rounds. Since consensus is just 1-set agreement, our lower bound implies the well-known lower bound of $f+1$ rounds for consensus when $n \ge f+2$ [FL82]. More important, the running time $r = \lfloor f/k \rfloor + 1$ demonstrates that there is a smooth but inescapable tradeoff among the number f of faults tolerated, the degree k of coordination achieved, and the time r the protocol must run. For a fixed value of f, Figure 1 shows that 2-set agreement can be achieved in half the time needed to achieve consensus.

[Figure 1 plots the number of rounds $r = \lfloor f/k \rfloor + 1$ against the degree of coordination k, falling from $f+1$ at $k = 1$ to $f/2 + 1$ at $k = 2$.]

Figure 1: Tradeoff between rounds and degree of coordination.

In addition, the lower bound proof itself is interesting because of the geometric proof technique we use, combining ideas due to Chaudhuri [Cha91, Cha93], Fischer and Lynch [FL82], Herlihy and Shavit [HS93], and Dwork, Moses, and Tuttle [DM90, MT88]. In the past few years, researchers have developed powerful new tools based on classical algebraic topology for analyzing tasks in asynchronous models (e.g., [AR96, BG93, GK96, HR94, HR95, HS93, HS94, SZ93]). The principal innovation of these papers is to model computations as simplicial complexes (rather than graphs) and to derive connections between computations and the topological properties of their complexes. This paper extends this topological approach in several new ways: it is the first to derive results in the synchronous model, it derives lower bounds rather than computability results, and it uses explicit constructions instead of existential arguments.

Although the synchronous model makes some strong (and possibly unrealistic) assumptions, it is well-suited for proving lower bounds. The synchronous model is a special case of almost every realistic model of a concurrent system we can imagine, and therefore any lower bound for k-set agreement in this simple model translates into a lower bound in any more complex model. For example, our lower bound holds for models that permit messages to be lost, failed processors to restart, or processor speeds to vary. Moreover, our techniques may be helpful in understanding how to prove (possibly) stricter lower bounds in more complex models. Naturally, our protocol for k-set agreement in the synchronous model does not work in more general models, but it is still useful because it shows that our lower bound is the best possible in the synchronous model.

This paper is organized as follows. In Section 2, we give an informal overview of our lower bound proof. In Section 3 we define our model of computation, and in Section 4 we define k-set agreement. In Sections 5 through 9 we prove our lower bound, and in Section 10 we give a protocol solving k-set agreement, proving the matching upper bound.

2 Overview

We start with an informal overview of the ideas used in the lower bound proof. For the remainder of this paper, suppose P is a protocol that solves k-set agreement and tolerates the failure of f out of n processors, and suppose P halts in $r \le \lfloor f/k \rfloor$ rounds. This means that all nonfaulty processors have chosen an output value at time r in every execution of P. In addition, suppose $n \ge f + k + 1$, which means that at least $k+1$ processors never fail. Our goal is to consider the global states that occur at time r in executions of P, and to show that in one of these states there are $k+1$ processors that have chosen $k+1$ distinct values, violating k-set agreement. Our strategy is to consider the local states of processors that occur at time r in executions of P, and to investigate the combinations of these local states that occur in global states. This investigation depends on the construction of a geometric object. In this section, we use a simplified version of this object to illustrate the general ideas in our proof.

Since consensus is a special case of k-set agreement, it is helpful to review the standard proof of the $f+1$ round lower bound for consensus [FL82, DS83, Mer85, DM90] to see why new ideas are needed for k-set agreement. Suppose that the protocol P is a consensus protocol, which means that in all executions of P all nonfaulty processors have chosen the same output value at time r. Two global states $g_1$ and $g_2$ at time r are said to be similar if some nonfaulty processor p has the same local state in both global states. The crucial property of similarity is that the decision value of any processor in one global state completely determines the decision value for any processor in all similar global states. For example, if all processors decide v in $g_1$, then certainly p decides v in $g_1$. Since p has the same local state in $g_1$ and $g_2$, and since p's decision value is a function of its local state, processor p also decides v in $g_2$. Since all processors agree with p in $g_2$, all processors decide v in $g_2$, and it follows that the decision value in $g_1$ determines the decision value in $g_2$. A similarity chain is a sequence of global states $g_1, \dots, g_m$ such that $g_i$ is similar to $g_{i+1}$. A simple inductive argument shows that the decision value in $g_1$ determines the decision value in $g_m$. The lower bound proof consists of showing that all time r global states of P lie on a single similarity chain. It follows that all processors choose the same value in all executions of P, independent of the input values, violating the definition of consensus.

The problem with k-set agreement is that the decision values in one global state do not determine the decision values in similar global states. If p has the same local state in $g_1$ and $g_2$, then p must choose the same value in both states, but the values chosen by the other processors are not determined. Even if $n-1$ processors have the same local state in $g_1$ and $g_2$, the decision value of the last processor is still not determined. The fundamental insight in this paper is that k-set agreement requires considering all "degrees" of similarity at once, focusing on the number and identity of local states common to two global states. While this seems difficult, if not impossible, to do using conventional graph-theoretic techniques like similarity chains, there is a geometric generalization of similarity chains that provides a compact way of capturing all degrees of similarity simultaneously, and it is the basis of our proof.

A simplex is just the natural generalization of a triangle to n dimensions: for example, a 0-dimensional simplex is a vertex, a 1-dimensional simplex is an edge linking two vertices, a 2-dimensional simplex is a solid triangle, and a 3-dimensional simplex is a solid tetrahedron.

Figure 2: Global states for zero, one, and two-round protocols.

Figure 3: Global states for an r-round protocol (showing the embedded Bermuda Triangle).

We can represent a global state for an n-processor protocol as an $(n-1)$-dimensional simplex [Cha93, HS93], where each vertex is labeled with a processor id and local state. If $g_1$ and $g_2$ are global states in which p has the same local state, then we "glue together" the vertices of $g_1$ and $g_2$ labeled with p. Figure 2 shows how these global states glue together in a simple protocol in which each of three processors repeatedly sends its state to the others. Each processor begins with a binary input. The first picture shows the possible global states after zero rounds: since no communication has occurred, each processor's state consists only of its input. It is easy to check that the simplexes corresponding to these global states form an octahedron. The next picture shows the complex after one round. Each triangle corresponds to a failure-free execution, each free-standing edge to a single-failure execution, and so on. The third picture shows the possible global states after two rounds.

The set of global states after an r-round protocol is quite complicated (Figure 3), but it contains a well-behaved subset of global states which we call the Bermuda Triangle B, since all fast protocols vanish somewhere in its interior. The Bermuda Triangle (Figure 4) is constructed by starting with a large k-dimensional simplex, and triangulating it into a collection of smaller k-dimensional simplexes. We then label each vertex with an ordered pair $(p, s)$ consisting of a processor identifier p and a local state s in such a way that for each simplex T in the triangulation there is a global state g consistent with the labeling of the simplex: for each ordered pair $(p, s)$ labeling a corner of T, processor p has local state s in global state g.

To illustrate the process of labeling vertices, Figure 5 shows a simplified representation of a two-dimensional Bermuda Triangle B.

sentation of a two-dimensional Bermuda Triangle B . It is the Bermuda Triangle for 6 2 OVERVIEW

[Figure 4 shows the Bermuda Triangle with its vertices labeled with (processor, local state) pairs; one highlighted primitive simplex, with corners (P,a), (Q,b), (R,c), and (S,d), is consistent with a single global state.]

Figure 4: Bermuda Triangle with simplex representing typical global state.

[Figure 5 shows a triangular grid whose corners are labeled aaaaa, bbbbb, and ccccc, and whose other grid points carry one-round local states such as cc?aa, ccbaa, and bbbaa.]

Figure 5: The Bermuda Triangle for 5 processors and a 1-round protocol for 2-set agreement.

It is the Bermuda Triangle for a protocol P for 5 processors solving 2-set agreement in 1 round. We have labeled grid points with local states, but we have omitted processor ids and many intermediate nodes for clarity. The local states in the figure are represented by expressions such as bb?aa. Given distinct input values a, b, c, we write bb?aa to denote the local state of a processor p at the end of a round in which the first two processors have input value b and send messages to p, the middle processor fails to send a message to p, and the last two processors have input value a and send messages to p. In Figure 5, following any horizontal line from left to right across B, the input values are changed from a to b. The input value of each processor is changed, one after another, by first silencing the processor, and then reviving the processor with the input value b. Similarly, moving along any vertical line from bottom to top, processors' input values change from b to c.

The complete labeling of the Bermuda Triangle B shown in Figure 5 (which would include processor ids) has the following property. Let $(p, s)$ be the label of a grid point x. If x is a corner of B, then s specifies that each processor starts with the same input value, so p must choose this value if it finishes protocol P in local state s. If x is on an edge of B, then s specifies that each processor starts with one of the two input values labeling the ends of the edge, so p must choose one of these values if it halts in state s. Similarly, if x is in the interior of B, then s specifies that each processor starts with one of the three values labeling the corners of B, so p must choose one of these three values if it halts in state s.

Now let us "color" each grid point with output values (Figure 6). Given a grid point x labeled with $(p, s)$, let us color x with the value v that p chooses in local state s at the end of P. This coloring of B has the property that the color of each of the corners is determined uniquely, the color of each point on an edge between two corners is forced to be the color of one of the corners, and the color of each interior point can be the color of any corner. Colorings with this property are called Sperner colorings, and have been studied extensively in the field of algebraic topology. At this point, we exploit a remarkable combinatorial result first proved in 1928: Sperner's Lemma [Spa66, p.151] states that any Sperner coloring of any triangulated k-dimensional simplex must include at least one simplex whose corners are colored with all $k+1$ colors. In our case, however, this simplex corresponds to a global state in which $k+1$ processors choose $k+1$ distinct values, which contradicts the definition of k-set agreement. Thus, in the case illustrated above, there is no protocol for 2-set agreement halting in 1 round.

Figure 6: Sperner's Lemma.
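Sperner's Lemma is easy to experiment with in two dimensions. The sketch below (our illustration; the triangulation and the random coloring are not the paper's construction) assigns an arbitrary legal Sperner coloring to the grid points of a triangulated triangle and then finds a small triangle whose three corners carry all three colors, as the lemma guarantees.

```python
import random

N = 8  # corners of the big triangle are (0, 0), (N, 0), and (0, N)

def sperner_color(x, y):
    """Pick a legal Sperner color for grid point (x, y) with x + y <= N."""
    if (x, y) == (0, 0): return 0
    if (x, y) == (N, 0): return 1
    if (x, y) == (0, N): return 2
    allowed = []
    if y == 0:      allowed += [0, 1]    # edge between corners 0 and 1
    if x == 0:      allowed += [0, 2]    # edge between corners 0 and 2
    if x + y == N:  allowed += [1, 2]    # edge between corners 1 and 2
    return random.choice(allowed or [0, 1, 2])   # interior: any color

color = {(x, y): sperner_color(x, y)
         for x in range(N + 1) for y in range(N + 1 - x)}

def primitive_triangles():
    for x in range(N):
        for y in range(N - x):
            yield [(x, y), (x + 1, y), (x, y + 1)]              # "up" triangle
            if x + y <= N - 2:
                yield [(x + 1, y), (x, y + 1), (x + 1, y + 1)]  # "down" triangle

panchromatic = [t for t in primitive_triangles()
                if {color[v] for v in t} == {0, 1, 2}]
print(len(panchromatic), "triangles with all three colors")  # always at least 1
```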

We note that the basic structure of the Bermuda Triangle and the idea of coloring the vertices with decision values and applying Sperner's Lemma have appeared in previous work by Chaudhuri [Cha91, Cha93]. In that work, she also proved a lower bound of $\lfloor f/k \rfloor + 1$ rounds for k-set agreement, but for a very restricted class of protocols. In particular, a protocol's decision function can depend only on vectors giving partial information about which processors started with which input values, but cannot depend on any other information in a processor's local state, such as processor identities or message histories. The technical challenge in this paper is to construct a labeling of vertices with processor ids and local states that will allow us to prove a lower bound for k-set agreement for arbitrary protocols.

Our approach consists of four parts. First, we label points on the edges of B with global states. For example, consider the edge between the corner where all processors start with input value a and the corner where all processors start with b. We construct a long sequence of global states that begins with a global state in which all processors start with a, ends with a global state in which all processors start with b, and in between systematically changes input values from a to b. These changes are made so gradually, however, that for any two adjacent global states in the sequence, at most one processor can distinguish them. Second, we label each remaining point using a combination of the global states on the edges. Third, we assign nonfaulty processors to points in such a way that the processor labeling a point has the same local state in the global states labeling all adjacent points. Finally, we project each global state onto the associated nonfaulty processor's local state, and label the point with the resulting processor-state pair.

3 The Model

We use a synchronous, message-passing model with crash failures. The system consists of n processors, $p_1, \dots, p_n$. Processors share a global clock that starts at 0 and advances in increments of 1. Computation proceeds in a sequence of rounds, with round r lasting from time $r-1$ to time r. Computation in a round consists of three phases: first each processor p sends messages to some of the processors in the system, possibly including itself, then it receives the messages sent to it during the round, and finally it performs some local computation and changes state. We assume that the communication network is totally connected: every processor is able to send distinct messages to every other processor in every round. We also assume that communication is reliable (although processors can fail): if p sends a message to q in round r, then the message is delivered to q in round r.

Processors follow a deterministic protocol that determines what messages a processor should send and what output a processor should generate. A protocol has two components: a message component that maps a processor's local state to the list of messages it should send in the next round, and an output component that maps a processor's local state to the output value (if any) that it should choose. Processors can be faulty, however, and any processor p can simply stop in any round r. In this case, processor p follows its protocol and sends all messages the protocol requires in rounds 1 through $r-1$, sends some subset of the messages it is required to send in round r, and sends no messages in rounds after r. We say that p is silent from round r if p sends no messages in round r or later. We say that p is active through round r if p sends all messages in round r and earlier.

A full-information protocol is one in which every processor broadcasts its entire local state to every processor, including itself, in every round [PSL80, FL82, Had83]. One nice property of full-information protocols is that every execution of a full-information protocol P has a compact representation called a communication graph [MT88]. The communication graph G for an r-round execution of P is a two-dimensional, two-colored graph. The vertices form an $n \times (r+1)$ grid, with processor names $p_1$ through $p_n$ labeling the vertical axis and times 0 through r labeling the horizontal axis. The node representing processor p at time i is labeled with the pair $\langle p, i \rangle$. Given any pair of processors p and q and any round i, there is an edge between $\langle p, i-1 \rangle$ and $\langle q, i \rangle$ whose color determines whether p successfully sends a message to q in round i: the edge is green if p succeeds, and red otherwise. In addition, each initial node $\langle p, 0 \rangle$ is labeled with p's input value. Figure 7 illustrates a three-round communication graph. In this figure, green edges are denoted by solid lines and red edges by dashed lines.

Figure 7: A three-round communication graph.

We refer to the edge between $\langle p, i-1 \rangle$ and $\langle q, i \rangle$ as the round i edge from p to q, and we refer to the node $\langle p, i-1 \rangle$ as the round i node for p, since it represents the point at which p sends its round i messages. We define what it means for a processor to be silent or active in terms of communication graphs in the obvious way.
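To make the graph representation concrete, the following sketch (our own illustration of the definitions above; the class and method names are invented) records the color of every round edge and extracts the local communication graph of a processor at time t by following green edges backwards in time, allowing at most one final red edge.

```python
class CommGraph:
    """Communication graph for an r-round execution in the crash-failure model."""

    def __init__(self, n, r, inputs):
        self.n, self.r = n, r
        self.inputs = dict(inputs)      # processor -> input value (time-0 labels)
        self.red = set()                # (round, sender, receiver) edges colored red

    def green(self, i, p, q):
        return (i, p, q) not in self.red

    def delete(self, i, p, q):
        """Color the round-i edge from p to q red (message not delivered)."""
        self.red.add((i, p, q))

    def local_graph(self, p, t):
        """Nodes <q, s> visible to p at time t: everything reachable backwards
        through green edges, plus one further node across a single red edge."""
        visible, expanded = set(), set()
        stack = [(p, t)]
        while stack:
            q, s = stack.pop()
            visible.add((q, s))
            if s == 0 or (q, s) in expanded:
                continue
            expanded.add((q, s))
            for sender in self.inputs:               # round-s edges into <q, s>
                if self.green(s, sender, q):
                    stack.append((sender, s - 1))    # keep following green edges
                else:
                    visible.add((sender, s - 1))     # one red edge, then stop
        return visible

g = CommGraph(n=3, r=2, inputs={"p1": 0, "p2": 1, "p3": 1})
for q in list(g.inputs):
    g.delete(2, "p3", q)       # p3 crashes after round 1 and is silent in round 2
g.delete(1, "p3", "p1")        # its round-1 message to p1 is also lost
print(sorted(g.local_graph("p1", 2)))   # p1 still sees <p3, 0> through p2
```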

In the crash failure model, a processor is silent in all rounds following the round in which it stops. This means that all communication graphs representing executions in this model have the property that if a round i edge from p is red, then every round j edge from p with $j > i$ is red, which means that p is silent from round $i+1$. We assume that all communication graphs in this paper have this property, and we note that every r-round graph with this property corresponds to an r-round execution of P.

Since a communication graph G describes an execution of P, it also determines the global state at the end of P, so we sometimes refer to G as a global communication graph. In addition, for each processor p and time t, there is a subgraph of G that corresponds to the local state of p at the end of round t, and we refer to this subgraph as a local communication graph. The local communication graph for p at time t is the subgraph $G(p, t)$ of G containing all the information visible to p at the end of round t. Namely, $G(p, t)$ is the subgraph induced by the node $\langle p, t \rangle$ and all earlier nodes reachable from $\langle p, t \rangle$ by a sequence (directed backwards in time) of green edges followed by at most one red edge.

In the remainder of this paper, we use graphs to represent states. Wherever we used "state" in the informal overview of Section 2, we now substitute the word "graph." Furthermore, we defined a full-information protocol to be a protocol in which processors broadcast their local states in every round, but we now assume that processors broadcast their local communication graphs instead. In addition, we assume that all executions of a full-information protocol run for exactly r rounds and produce output at exactly time r. All local and global communication graphs are graphs at time r, unless otherwise specified.

The crucial property of a full-information protocol is that every protocol can be simulated by a full-information protocol, and hence that we can restrict attention to full-information protocols when proving the lower bound in this paper:

Lemma 1: If there is an n-processor protocol solving k-set agreement with f failures in r rounds, then there is an n-processor full-information protocol solving k-set agreement with f failures in r rounds.

4 The k-set Agreement Problem

The k-set agreement problem [Cha91] is defined as follows. We assume that each processor $p_i$ has two private registers in its local state, a read-only input register and a write-only output register. Initially, $p_i$'s input register contains an arbitrary input value from a set V containing at least $k+1$ values $v_0, \dots, v_k$, and its output register is empty. A protocol solves the problem if it causes each processor to halt after writing an output value to its output register in such a way that

1. every processor's output value is some processor's input value, and

2. the set of output values chosen has size at most k.
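The matching protocol appears in Section 10. As a preview, the sketch below (our simplified simulation; it illustrates the standard flooding strategy consistent with the stated bound rather than reproducing the paper's protocol) has every processor relay all input values it has seen for $\lfloor f/k \rfloor + 1$ rounds and then output the smallest value it knows, which yields at most k distinct outputs when at most f processors crash.

```python
def k_set_agreement(inputs, k, f, crash_round=None, crash_reaches=None):
    """Simulate a flooding protocol for k-set agreement with crash failures.
    inputs: processor -> input value.  crash_round[p] is the round in which p
    stops (if any); crash_reaches[p] is the set of processors p still reaches
    in that round.  Returns the decisions of the processors that never crash."""
    crash_round = crash_round or {}
    crash_reaches = crash_reaches or {}
    rounds = f // k + 1                          # matches the lower bound
    known = {p: {v} for p, v in inputs.items()}

    for r in range(1, rounds + 1):
        delivered = {p: set() for p in inputs}
        for p in inputs:
            stop = crash_round.get(p)
            if stop is not None and stop < r:
                continue                         # p crashed earlier: silent
            receivers = inputs if stop != r else crash_reaches.get(p, set())
            for q in receivers:
                delivered[q] |= known[p]         # p relays everything it knows
        for q in inputs:
            known[q] |= delivered[q]

    return {p: min(known[p]) for p in inputs if p not in crash_round}

decisions = k_set_agreement(
    inputs={1: 10, 2: 20, 3: 30, 4: 40, 5: 50}, k=2, f=2,
    crash_round={1: 1}, crash_reaches={1: {2}})  # processor 1 reaches only 2
print(decisions, len(set(decisions.values())) <= 2)
```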

5 Bermuda Triangle

In this section, we define the basic geometric constructs used in our proof that every protocol P solving k-set agreement and tolerating f failures requires at least $\lfloor f/k \rfloor + 1$ rounds of communication, assuming $n \ge f + k + 1$.

We start with some preliminary definitions. A simplex S is the convex hull of $k+1$ affinely-independent points $x_0, \dots, x_k$ in Euclidean space.¹ It is a k-dimensional volume, the k-dimensional analogue of a solid triangle or tetrahedron. The points $x_0, \dots, x_k$ are called the vertices of S, and k is the dimension of S. We sometimes call S a k-simplex when we wish to emphasize its dimension. A simplex F is a face of S if the vertices of F form a subset of the vertices of S (which means that the dimension of F is at most the dimension of S). A set of k-simplexes $S_1, \dots, S_m$ is a triangulation of S if $S = S_1 \cup \dots \cup S_m$ and the intersection of $S_i$ and $S_j$ is a face of each for all pairs i and j.² The vertices of a triangulation are the vertices of the $S_i$. Any triangulation of S induces triangulations of its faces in the obvious way.

The construction of the Bermuda Triangle is illustrated in Figure 8. Let B be the k-simplex in k-dimensional Euclidean space with vertices

$(0, 0, \dots, 0), (N, 0, \dots, 0), (N, N, 0, \dots, 0), \dots, (N, N, \dots, N),$

where N is a huge integer defined later in Section 6.3. The Bermuda Triangle $\mathcal{B}$ is a triangulation of B defined as follows. The vertices of $\mathcal{B}$ are the grid points contained in B: these are the points of the form $x = (x_1, \dots, x_k)$, where the $x_i$ are integers between 0 and N satisfying $x_1 \ge x_2 \ge \dots \ge x_k$. Informally, the simplexes of the triangulation are defined as follows: pick any grid point and walk one step in the positive direction along each dimension (Figure 9).

¹ Points $x_0, \dots, x_k$ are affinely independent if $x_1 - x_0, \dots, x_k - x_0$ are linearly independent.

² Notice that the intersection of two arbitrary k-dimensional simplexes $S_i$ and $S_j$ will be a volume of some dimension, but it need not be a face of either simplex.

Figure 8: Construction of Bermuda Triangle.

Figure 9: Simplex generation in Kuhn’s triangulation.


The $k+1$ points visited by this walk define the vertices of a simplex, and the triangulation $\mathcal{B}$ consists of all simplexes determined by such walks. For example, the 2-dimensional Bermuda Triangle is illustrated in Figure 5. This triangulation, known as Kuhn's triangulation, is defined formally as follows [Cha93]. Let $e_1, \dots, e_k$ be the unit vectors; that is, $e_i$ is the vector with a single 1 in the ith coordinate. A simplex is determined by a grid point $y_0$ and an arbitrary permutation $f_1, \dots, f_k$ of the unit vectors $e_1, \dots, e_k$: the vertices of the simplex are the points $y_i = y_{i-1} + f_i$ for $1 \le i \le k$. When we list the vertices of a simplex, we always write them in the order $y_0, \dots, y_k$ in which they are visited by the walk.

For brevity, we refer to the vertices of B as the corners of $\mathcal{B}$. The "edges" of B are partitioned to form the edges of $\mathcal{B}$. More formally, the triangulation $\mathcal{B}$ induces triangulations of the one-dimensional faces (line segments connecting the vertices) of B, and these induced triangulations are called the edges of $\mathcal{B}$. The simplexes of $\mathcal{B}$ are called primitive simplexes.

Each vertex of $\mathcal{B}$ is labeled with an ordered pair $(p, L)$ consisting of a processor id p and a local communication graph L. As illustrated in the overview in Section 2, the crucial property of this labeling is that if S is a primitive simplex with vertices $y_0, \dots, y_k$, and if each vertex $y_i$ is labeled with a pair $(q_i, L_i)$, then there is a global communication graph G such that each $q_i$ is nonfaulty in G and has local communication graph $L_i$ in G. Constructing this labeling is the subject of the next three sections. We first assign global communication graphs to vertices in Section 6, then we assign processors to vertices in Section 7, and then we assign ordered pairs $(p, L)$ to vertices in Section 8, where L is the local communication graph of p in the global communication graph G assigned to the vertex.

6 Graph Assignment

In this section, we label each vertex of $\mathcal{B}$ with a global communication graph. Actually, for expository reasons, we augment the definition of a communication graph and label vertices of $\mathcal{B}$ with these augmented communication graphs instead. Constructing this labeling involves several steps. We define operations on augmented communication graphs that make minor changes in the graphs, and we use these operations to construct long sequences of graphs. Then we label vertices along edges of $\mathcal{B}$ with graphs from these sequences, and we label interior vertices of $\mathcal{B}$ by performing a merge of the graphs labeling the edges.

6.1 Augmented Communication Graphs

We extend the definition of a communication graph to make the processor assignment in Section 7 easier to describe. We augment communication graphs with tokens, and place tokens on the graph so that if processor p fails in round i, then there is a token on the node $\langle p, j \rangle$ for processor p in some earlier round (Figure 10). In this sense, every processor failure is "covered" by a token, and the number of processors failing in the graph is bounded from above by the number of tokens. In the next few sections, when we construct long sequences of these graphs, tokens will be moved between adjacent processors within a round, and used to guarantee that processor failures in adjacent graphs change in an orderly fashion. For every value of $\alpha$, we define graphs with exactly $\alpha$ tokens placed on nodes in each round, but we will be most interested in the two cases with $\alpha$ equal to 1 and k.

Figure 10: Three-round communication graph with one token per round.

For each value $\alpha$, we define an $\alpha$-graph to be a communication graph with tokens placed on the nodes of the graph that satisfies the following conditions for each round i, $1 \le i \le r$:

1. The total number of tokens on round i nodes is exactly $\alpha$.

2. If a round i edge from p is red, then there is a token on a round $j \le i$ node for p.

3. If a round i edge from p is red, then p is silent from round $i+1$.

We say that p is covered by a round i token if there is a token on the round i node for p, we say that p is covered in round i if p is covered by a round $j \le i$ token, and we say that p is covered in a graph if p is covered in any round. Similarly, we say that a round i edge from p is covered if p is covered in round i. The second condition says every red edge is covered by a token, and this together with the first condition implies that at most $\alpha r$ processors fail in an $\alpha$-graph. We often refer to an $\alpha$-graph as a graph when the value of $\alpha$ is clear from context or unimportant. We emphasize that the tokens are simply an accounting trick, and have no meaning as part of the global or local state in the underlying communication graph.

We define a failure-free $\alpha$-graph to be an $\alpha$-graph in which all edges are green, and all round i tokens are on processor $p_1$ in all rounds i.
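The three conditions defining an $\alpha$-graph are mechanical to check. The following sketch (our helper, building on the CommGraph class sketched in Section 3; tokens[(p, i)] counts the tokens on the round i node for p) verifies a candidate token placement.

```python
def is_alpha_graph(g, tokens, alpha):
    """Check the alpha-graph conditions for CommGraph g and a token placement."""
    procs = list(g.inputs)
    for i in range(1, g.r + 1):
        # 1. Exactly alpha tokens sit on round-i nodes.
        if sum(tokens.get((p, i), 0) for p in procs) != alpha:
            return False
        for p in procs:
            if all(g.green(i, p, q) for q in procs):
                continue                          # no red round-i edge from p
            # 2. A red round-i edge from p is covered by a round j <= i token.
            if not any(tokens.get((p, j), 0) for j in range(1, i + 1)):
                return False
            # 3. A red round-i edge from p means p is silent from round i+1.
            if any(g.green(j, p, q) for j in range(i + 1, g.r + 1) for q in procs):
                return False
    return True
```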

6.2 Graph operations

We now define four operations on augmented graphs that make only minor changes to a graph. In particular, the only change an operation makes is to change the color of a single edge, to change the value of a single processor's input, or to move a single token between adjacent processors within the same round. The operations are defined as follows (see Figure 11):

1. delete(i, p, q): This operation changes the color of the round i edge from p to q to red, and has no effect if the edge is already red. This makes the delivery of the round i message from p to q unsuccessful. It can only be applied to a graph if p and q are silent from round $i+1$, and p is covered in round i.

2. add(i, p, q): This operation changes the color of the round i edge from p to q to green, and has no effect if the edge is already green. This makes the delivery of the round i message from p to q successful. It can only be applied to a graph if p and q are silent from round $i+1$, processor p is active through round $i-1$, and p is covered in round i.

3. change(p, v): This operation changes the input value for processor p to v, and has no effect if the value is already v. It can only be applied to a graph if p is silent from round 1, and p is covered in round 1.

4. move(i, p, q): This operation moves a round i token from $\langle p, i-1 \rangle$ to $\langle q, i-1 \rangle$, and is defined only for adjacent processors p and q (that is, $\{p, q\} = \{p_j, p_{j+1}\}$ for some j). It can only be applied to a graph if p is covered by a round i token, and all red edges are covered by other tokens.

Figure 11: Operations on augmented communication graphs.

It is obvious from the definition of these operations that they preserve the property of being an $\alpha$-graph: if G is an $\alpha$-graph and $\omega$ is a graph operation, then $\omega(G)$ is an $\alpha$-graph. We define delete, add, and change operations on ordinary communication graphs in exactly the same way, except that the condition "p is covered in round i" is omitted.
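The four operations and the preconditions stated above can be written out directly. The sketch below (continuing the CommGraph and token sketches; helper names are ours) applies an operation only when its preconditions hold, so that the $\alpha$-graph conditions are preserved.

```python
def silent_from(g, p, i):
    """p sends no messages in rounds i, i+1, ..., r of CommGraph g."""
    return all(not g.green(j, p, q) for j in range(i, g.r + 1) for q in g.inputs)

def active_through(g, p, i):
    """p sends all of its messages in rounds 1, ..., i of CommGraph g."""
    return all(g.green(j, p, q) for j in range(1, i + 1) for q in g.inputs)

def covered_in_round(tokens, p, i):
    return any(tokens.get((p, j), 0) for j in range(1, i + 1))

def delete(g, tokens, i, p, q):
    assert silent_from(g, p, i + 1) and silent_from(g, q, i + 1)
    assert covered_in_round(tokens, p, i)
    g.red.add((i, p, q))

def add(g, tokens, i, p, q):
    assert silent_from(g, p, i + 1) and silent_from(g, q, i + 1)
    assert active_through(g, p, i - 1) and covered_in_round(tokens, p, i)
    g.red.discard((i, p, q))

def change(g, tokens, p, v):
    assert silent_from(g, p, 1) and covered_in_round(tokens, p, 1)
    g.inputs[p] = v

def move(tokens, i, p, q):
    # p and q must be adjacent processors, and every red edge must remain
    # covered by some other token; the sequences of Section 6.3 arrange this.
    assert tokens.get((p, i), 0) > 0
    tokens[(p, i)] -= 1
    tokens[(q, i)] = tokens.get((q, i), 0) + 1
```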

6.3 Graph sequences

We now define a sequence $\Sigma(v)$ of graph operations that can be applied to any failure-free graph G to transform it into the failure-free graph G(v) in which all processors have input v. We want to emphasize that the sequences $\Sigma(v)$ differ only in the value v. For this reason, we define a parameterized sequence $\Sigma(V)$ with the property that for all values v and all failure-free graphs G, the sequence $\Sigma(v)$ transforms G to G(v). In general, we define a parameterized sequence $\sigma(X_1, \dots, X_l)$ to be a sequence of graph operations with free variables $X_1, \dots, X_l$ appearing as parameters to the graph operations in the sequence.

Given a graph G, let $G_{\mathrm{red}}(p, m)$ and $G_{\mathrm{green}}(p, m)$ be graphs identical to G except that all edges from p in rounds m through r are red and green, respectively. We define these graphs only if

1. p is covered in round m in G,

2. all faulty processors are silent from round m (or earlier) in G, and

3. all round $m+1$ through r tokens are on $p_1$ in G.

In addition, we define the graph $G_{\mathrm{green}}(p, m)$ only if

4. p is active through round $m-1$ in G.

These restrictions guarantee that if G is an $\alpha$-graph and $G_{\mathrm{red}}(p, m)$ and $G_{\mathrm{green}}(p, m)$ are defined, then $G_{\mathrm{red}}(p, m)$ and $G_{\mathrm{green}}(p, m)$ are both $\alpha$-graphs.

In the case of ordinary communication graphs, a result by Moses and Tuttle [MT88] implies that there is a "similarity chain" of graphs between G and $G_{\mathrm{red}}(p, m)$ and between G and $G_{\mathrm{green}}(p, m)$. In their proof, a refinement of similar proofs by Dwork and Moses [DM90] and others, the sequence of graphs they construct has the property that each graph in the chain can be obtained from the preceding graph by applying a sequence of the add, delete, and change graph operations defined above. The same proof works for augmented communication graphs, provided we insert move operations between the add, delete, and change operations to move tokens between nodes appropriately. With this modification, we can prove the following. Let faulty(G) be the set of processors that fail in G.

Lemma 2: For every processor p, round , and set of processors, there are se-

G p m p m

quences silence and revive such that for all graphs :

G p m G

1. If red is defined and faulty , then

p mG G p m

silence red

G p m G

2. If green is defined and faulty , then

G p m p mG

revive green

Proof: We proceed by reverse induction on m. Suppose $m = r$. Define

$\mathit{silence}(p, r)_Q = \mathit{delete}(r, p, p_1) \cdots \mathit{delete}(r, p, p_n)$

$\mathit{revive}(p, r)_Q = \mathit{add}(r, p, p_1) \cdots \mathit{add}(r, p, p_n)$

For part 1, let G be any graph and suppose $G_{\mathrm{red}}(p, r)$ is defined. For each i with $0 \le i \le n$, let $G_i$ be the graph identical to G except that the round r edges from p to $p_1, \dots, p_i$ are red. Since $G_{\mathrm{red}}(p, r)$ is defined, condition 1 implies that p is covered in round r in G. For each i with $1 \le i \le n$, it follows that $G_i$ is really a graph, and $\mathit{delete}(r, p, p_i)$ can be applied to $G_{i-1}$ and transforms it to $G_i$. Since $G_0 = G$ and $G_n = G_{\mathrm{red}}(p, r)$, it follows that $\mathit{silence}(p, r)_Q$ transforms G to $G_{\mathrm{red}}(p, r)$. For part 2, let G be any graph and suppose $G_{\mathrm{green}}(p, r)$ is defined. The proof of this part is the direct analogue of the proof of part 1. The only difference is that since we are coloring round r edges from p green instead of red, we must verify that p is active through round $r-1$ in G, but this follows immediately from condition 4.

Suppose $m < r$ and the induction hypothesis holds for $m+1$. Define $Q' = Q \cup \{p\}$ and define

$\mathit{set}_i(m) = \mathit{move}(m, p_1, p_2) \cdots \mathit{move}(m, p_{i-1}, p_i)$

$\mathit{reset}_i(m) = \mathit{move}(m, p_i, p_{i-1}) \cdots \mathit{move}(m, p_2, p_1)$

The set function moves a token from $p_1$ to $p_i$ and the reset function moves the token back from $p_i$ to $p_1$.

Define $\mathit{block}_i(m, p)$ to be $\mathit{delete}(m, p, p_i)$ if $p_i \in Q'$, and otherwise

$\mathit{set}_i(m+1) \; \mathit{silence}(p_i, m+1)_{Q'} \; \mathit{delete}(m, p, p_i) \; \mathit{revive}(p_i, m+1)_{Q' \cup \{p_i\}} \; \mathit{reset}_i(m+1)$

Define $\mathit{unblock}_i(m, p)$ to be $\mathit{add}(m, p, p_i)$ if $p_i \in Q'$, and otherwise

$\mathit{set}_i(m+1) \; \mathit{silence}(p_i, m+1)_{Q'} \; \mathit{add}(m, p, p_i) \; \mathit{revive}(p_i, m+1)_{Q' \cup \{p_i\}} \; \mathit{reset}_i(m+1)$

Finally, define

$\mathit{block}(m, p) = \mathit{block}_1(m, p) \cdots \mathit{block}_n(m, p)$

$\mathit{unblock}(m, p) = \mathit{unblock}_1(m, p) \cdots \mathit{unblock}_n(m, p)$

and define

$\mathit{silence}(p, m)_Q = \mathit{silence}(p, m+1)_Q \; \mathit{block}(m, p)$

$\mathit{revive}(p, m)_Q = \mathit{silence}(p, m+1)_{Q'} \; \mathit{unblock}(m, p) \; \mathit{revive}(p, m+1)_Q$

For part 1, let G be any graph, and suppose $G_{\mathrm{red}}(p, m)$ is defined and $\mathit{faulty}(G) \subseteq Q$. Since $G_{\mathrm{red}}(p, m)$ is defined, the graph $G_{\mathrm{red}}(p, m+1)$ is also defined, and the induction hypothesis for $m+1$ states that $\mathit{silence}(p, m+1)_Q$ transforms G to $G_{\mathrm{red}}(p, m+1)$. We now show that $\mathit{block}(m, p)$ transforms $G_{\mathrm{red}}(p, m+1)$ to $G_{\mathrm{red}}(p, m)$, and we will be done. For each i with $0 \le i \le n$, let $G_i$ be the graph identical to G except that p is silent from round $m+1$ and the round m edges from p to $p_1, \dots, p_i$ are red in $G_i$. Since $G_{\mathrm{red}}(p, m)$ is defined, condition 1 implies that p is covered in round m in G. For each i with $0 \le i \le n$, it follows that $G_i$ really is a graph and that $\mathit{faulty}(G_i) \subseteq Q'$. Since $G_0 = G_{\mathrm{red}}(p, m+1)$ and $G_n = G_{\mathrm{red}}(p, m)$, it is enough to show that $\mathit{block}_i(m, p)$ transforms $G_{i-1}$ to $G_i$ for each i with $1 \le i \le n$. The proof of this fact depends on whether $p_i \in Q'$, so we consider two cases.

Consider the easy case with $p_i \in Q'$. We know that p is covered in round m in $G_{i-1}$ since it is covered in G by condition 1. We know that p is silent from round $m+1$ in $G_{i-1}$ since it is silent in $G_0 = G_{\mathrm{red}}(p, m+1)$. We know that $p_i$ is silent from round $m+1$ in $G_{i-1}$ since $p_i \in Q'$ implies (assuming that $p_i$ is not just p again) that $p_i$ fails in G, and hence is silent from round m (or earlier) in G by condition 2. This means that $\mathit{block}_i(m, p) = \mathit{delete}(m, p, p_i)$ can be applied to $G_{i-1}$ and transforms $G_{i-1}$ to $G_i$.

Now consider the difficult case when $p_i \notin Q'$. Let $H_{i-1}$ and $H_i$ be graphs identical to $G_{i-1}$ and $G_i$, except that a single round $m+1$ token is on $p_i$ in $H_{i-1}$ and $H_i$. Condition 3 guarantees that all round $m+1$ tokens are on $p_1$ in G, and hence in $G_{i-1}$ and $G_i$, so $H_{i-1}$ and $H_i$ really are graphs. In addition, $\mathit{set}_i(m+1)$ transforms $G_{i-1}$ to $H_{i-1}$, and $\mathit{reset}_i(m+1)$ transforms $H_i$ to $G_i$. Let $I_{i-1}$ and $I_i$ be identical to $H_{i-1}$ and $H_i$ except that $p_i$ is silent from round $m+1$ in $I_{i-1}$ and $I_i$. Processor $p_i$ is covered in round $m+1$ in $H_{i-1}$ and $H_i$, so $I_{i-1}$ and $I_i$ really are graphs. In fact, $p_i$ does not fail in $G_{i-1}$ since $p_i \notin Q'$, so $p_i$ is active through round m in $H_{i-1}$ and $H_i$, so $I_{i-1} = (H_{i-1})_{\mathrm{red}}(p_i, m+1)$ and $H_i = (I_i)_{\mathrm{green}}(p_i, m+1)$. The inductive hypothesis for $m+1$ states that $\mathit{silence}(p_i, m+1)_{Q'}$ transforms $H_{i-1}$ to $I_{i-1}$, and $\mathit{revive}(p_i, m+1)_{Q' \cup \{p_i\}}$ transforms $I_i$ to $H_i$. Finally, notice that the only difference between $I_{i-1}$ and $I_i$ is the color of the round m edge from p to $p_i$. Since p is covered in round m and p and $p_i$ are silent from round $m+1$ in both graphs, we know that $\mathit{delete}(m, p, p_i)$ transforms $I_{i-1}$ to $I_i$. It follows that $\mathit{block}_i(m, p)$ transforms $G_{i-1}$ to $G_i$, and we are done.

For part 2, let G be any graph and suppose $G_{\mathrm{green}}(p, m)$ is defined and $\mathit{faulty}(G) \subseteq Q$. Since $G_{\mathrm{green}}(p, m)$ is defined, let $G' = G_{\mathrm{green}}(p, m)$. Now let H and H' be graphs identical to G and G' except that p is silent from round $m+1$ in H and H'. Since $G_{\mathrm{green}}(p, m)$ is defined, processor p is covered in round m in G by condition 1 and hence in G', so H and H' really are graphs. In addition, since $G_{\mathrm{green}}(p, m)$ is defined, processor p is active through round $m-1$ in G by condition 4, so processor p is active through round $m-1$ in H and H'. This means that $H'_{\mathrm{green}}(p, m+1)$ is defined, and in fact we have $H = G_{\mathrm{red}}(p, m+1)$ and $G' = H'_{\mathrm{green}}(p, m+1)$. The induction hypothesis for $m+1$ states that $\mathit{silence}(p, m+1)_{Q'}$ transforms G to H and that $\mathit{revive}(p, m+1)_Q$ transforms H' to G'. To complete the proof, we need only show that $\mathit{unblock}(m, p)$ transforms H to H'. The proof of this fact is the direct analogue of the proof in part 1 that $\mathit{block}(m, p)$ transforms $G_{\mathrm{red}}(p, m+1)$ to $G_{\mathrm{red}}(p, m)$. The only difference is that since we are coloring round m edges from p with green instead of red, we must verify that p is active through round $m-1$ in the graphs $H_i$ analogous to $G_i$ in the proof of part 1, but this follows immediately from condition 4.

Given a graph G, let $G_i(v)$ be a graph identical to G, except that processor $p_i$ has input v. Using the preceding result, we can transform G to $G_i(v)$.

Lemma 3: For each i, there is a parameterized sequence $\sigma_i(V)$ with the property that for all values v and failure-free graphs G, the sequence $\sigma_i(v)$ transforms G to $G_i(v)$.

Proof: Define

$\mathit{set}_i = \mathit{move}(1, p_1, p_2) \cdots \mathit{move}(1, p_{i-1}, p_i)$

$\mathit{reset}_i = \mathit{move}(1, p_i, p_{i-1}) \cdots \mathit{move}(1, p_2, p_1)$

and define

$\sigma_i(V) = \mathit{set}_i \; \mathit{silence}(p_i, 1)_{\emptyset} \; \mathit{change}(p_i, V) \; \mathit{revive}(p_i, 1)_{\{p_i\}} \; \mathit{reset}_i$

where $\emptyset$ denotes the empty set. Now consider any value v and any failure-free graph G, and let $G' = G_i(v)$. Since G and G' are failure-free graphs, all round 1 tokens are on $p_1$, so let H and H' be graphs identical to G and G' except that a single round 1 token is on $p_i$ in H and H'. We know that H and H' are graphs, and that $\mathit{set}_i$ transforms G to H and $\mathit{reset}_i$ transforms H' to G'. Since $p_i$ is covered in H and H', let I and I' be identical to H and H' except that $p_i$ is silent from round 1. We know that I and I' are graphs, and it follows by Lemma 2 that $\mathit{silence}(p_i, 1)_{\emptyset}$ transforms H to I and that $\mathit{revive}(p_i, 1)_{\{p_i\}}$ transforms I' to H'. Finally, notice that I and I' differ only in the input value for $p_i$. Since $p_i$ is covered and silent from round 1 in both graphs, the operation $\mathit{change}(p_i, v)$ can be applied to I and transforms it to I'. Stringing all of this together, it follows that $\sigma_i(v)$ transforms G to $G_i(v)$.

By concatenating such operation sequences, we can transform G into G(v) by changing processors' input values one at a time:

Lemma 4: Let $\Sigma(V) = \sigma_1(V) \sigma_2(V) \cdots \sigma_n(V)$. For every value v and failure-free graph G, the sequence $\Sigma(v)$ transforms G to G(v).

Now we can define the parameter N used in defining the shape of B: N is the length of the sequence $\Sigma(V)$, which is exponential in r.
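The parameterized sequences can be assembled symbolically, which also makes N concrete for small cases. The sketch below is a rough illustration that follows our reading of the recursive definitions in Lemma 2 and Lemma 3; processors are integers $1, \dots, n$, operations are plain tuples, and the exact index sets are reconstructions rather than the paper's text.

```python
def set_(i, m):   return [("move", m, j, j + 1) for j in range(1, i)]
def reset_(i, m): return [("move", m, j + 1, j) for j in range(i - 1, 0, -1)]

def silence(p, m, Q, n, r):
    if m == r:
        return [("delete", r, p, q) for q in range(1, n + 1)]
    return silence(p, m + 1, Q, n, r) + block(p, m, Q, n, r, "delete")

def revive(p, m, Q, n, r):
    if m == r:
        return [("add", r, p, q) for q in range(1, n + 1)]
    return (silence(p, m + 1, Q | {p}, n, r) + block(p, m, Q, n, r, "add")
            + revive(p, m + 1, Q, n, r))

def block(p, m, Q, n, r, op):
    """block (op == "delete") or unblock (op == "add") for each p_i in turn."""
    Qp, ops = Q | {p}, []
    for i in range(1, n + 1):
        if i in Qp:
            ops += [(op, m, p, i)]
        else:
            ops += (set_(i, m + 1) + silence(i, m + 1, Qp, n, r) + [(op, m, p, i)]
                    + revive(i, m + 1, Qp | {i}, n, r) + reset_(i, m + 1))
    return ops

def sigma(i, v, n, r):
    return (set_(i, 1) + silence(i, 1, set(), n, r) + [("change", i, v)]
            + revive(i, 1, {i}, n, r) + reset_(i, 1))

def Sigma(v, n, r):
    return [op for i in range(1, n + 1) for op in sigma(i, v, n, r)]

N = len(Sigma("b", n=5, r=1))   # the grid resolution used to define B
print(N)
```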

6.4 Graph merge

Speaking informally, we will use each sequence $\Sigma(v_i)$ of graph operations to generate a sequence of graphs, and we will use this sequence of graphs to label vertices along the edge of $\mathcal{B}$ in the ith dimension. Then we will label vertices in the interior of $\mathcal{B}$ by performing a "merge" of the graphs on the edges in the different dimensions.

The merge of a sequence $H_1, \dots, H_k$ of graphs is a graph H defined as follows:

1. an edge is colored red if it is red in any of the graphs $H_1, \dots, H_k$, and green otherwise, and

2. an initial node $\langle p, 0 \rangle$ is labeled with the value $v_i$, where i is the maximum index such that $\langle p, 0 \rangle$ is labeled with $v_i$ in $H_i$, or with $v_0$ if no such i exists, and

3. the number of tokens on a node $\langle p, i \rangle$ is the sum of the numbers of tokens on the node in the graphs $H_1, \dots, H_k$.

The first condition says that a message is missing in the resulting graph if and only if it is missing in any of the merged graphs. To understand the second condition, notice that for each processor $p_j$ there is an integer $s_j$ with the property that $p_j$'s input value is changed to v by the $s_j$th operation appearing in $\Sigma(v)$. Now choose a vertex $x = (x_1, \dots, x_k)$ of $\mathcal{B}$, and imagine walking from the origin to x by walking along the first dimension to $(x_1, 0, \dots, 0)$, then along the second dimension to $(x_1, x_2, 0, \dots, 0)$, and so forth. In each dimension i, processor $p_j$'s input is changed from $v_{i-1}$ to $v_i$ after $s_j$ steps in this dimension. Since $x_1 \ge x_2 \ge \dots \ge x_k$, there is a final dimension i in which $p_j$'s input is changed to $v_i$, and never changed again. The second condition above is just a compact way of identifying this final value $v_i$.
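Merging is a pointwise operation. The sketch below (continuing the CommGraph and token sketches; the list values = [v_0, ..., v_k] is passed in so the initial-node rule can select the latest changed input) combines 1-graphs $H_1, \dots, H_k$ into a single k-graph.

```python
def merge(graphs, values):
    """Merge 1-graphs H_1, ..., H_k into a k-graph.
    graphs: list of (CommGraph, tokens) pairs; values: [v_0, ..., v_k]."""
    base, _ = graphs[0]
    merged = CommGraph(base.n, base.r, {p: values[0] for p in base.inputs})
    merged_tokens = {}

    for i, (g, tokens) in enumerate(graphs, start=1):
        # 1. An edge is red in the merge if it is red in any merged graph.
        merged.red |= g.red
        # 2. p's input is v_i for the largest i with H_i assigning p input v_i
        #    (iterating in order, the last matching assignment wins).
        for p in g.inputs:
            if g.inputs[p] == values[i]:
                merged.inputs[p] = values[i]
        # 3. Token counts add up node by node.
        for node, count in tokens.items():
            merged_tokens[node] = merged_tokens.get(node, 0) + count

    return merged, merged_tokens
```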

Lemma 5: Let H be the merge of the graphs $H_1, \dots, H_k$. If $H_1, \dots, H_k$ are 1-graphs, then H is a k-graph.

Proof: We consider the three conditions required of a k-graph in turn. First, there are k tokens in each round of H, since there is 1 token in each round of each graph $H_1, \dots, H_k$. Second, every red edge in H is covered by a token, since every red edge in H corresponds to a red edge in one of the graphs $H_j$, and this edge is covered by a token in $H_j$. Third, if there is a red edge from p in round i in H, then there is a red edge from p in round i of one of the graphs $H_j$. In this graph, p is silent from round $i+1$, so the same is true in H. Thus, H is a k-graph.

6.5 Graph assignments

Now we can define the assignment of graphs to vertices of $\mathcal{B}$. For each value $v_i$, let $F_i$ be the failure-free 1-graph in which all processors have input $v_i$. Let $x = (x_1, \dots, x_k)$ be an arbitrary vertex of $\mathcal{B}$. For each coordinate j, let $\Sigma_j$ be the prefix of $\Sigma(v_j)$ consisting of the first $x_j$ operations, and let $H_j$ be the 1-graph resulting from the application of $\Sigma_j$ to $F_{j-1}$. This means that in $H_j$, some set of adjacent processors $p_1, \dots, p_i$ have had their inputs changed from $v_{j-1}$ to $v_j$. The graph G labeling x is defined to be the merge of $H_1, \dots, H_k$. We know that G is a k-graph by Lemma 5, and hence that at most $rk \le f$ processors fail in G.

Remember that we always write the vertices of a primitive simplex in a canonical order $y_0, \dots, y_k$. In the same way, we always write the graphs labeling the vertices of a primitive simplex in the canonical order $G_0, \dots, G_k$, where $G_i$ is the graph labeling $y_i$.

6.6 Graphs on a simplex

The graphs labeling the vertices of a primitive simplex have some convenient properties. For this section, fix a primitive simplex S, and let $y_0, \dots, y_k$ be the vertices of S and let $G_0, \dots, G_k$ be the graphs labeling the corresponding vertices of S. Our first result says that any processor that is uncovered at a vertex of S is nonfaulty at all vertices of S.

Lemma 6: If processor q is not covered in the graph labeling a vertex of S, then q is nonfaulty in the graph labeling every vertex of S.

Proof: Let $y_0 = (a_1, \dots, a_k)$ be the first vertex of S. For each i, let $\Sigma_i$ and $\Sigma'_i$ be the prefixes of $\Sigma(v_i)$ consisting of the first $a_i$ and $a_i + 1$ operations, and let $H_i$ and $H'_i$ be the result of applying $\Sigma_i$ and $\Sigma'_i$ to $F_{i-1}$. For each i, we know that the graph $G_i$ labeling the vertex $y_i$ of S is the merge of graphs $I^i_1, \dots, I^i_k$, where each $I^i_j$ is either $H_j$ or $H'_j$.

Suppose q is faulty in $G_i$. Then q must be faulty in some graph in the sequence of graphs $I^i_1, \dots, I^i_k$ merged to form $G_i$, so q must fail in one of the graphs $H_j$ or $H'_j$. Since $\Sigma_j$ and $\Sigma'_j$ are prefixes of $\Sigma(v_j)$, it is easy to see from the definition of $\Sigma(v_j)$ that the fact that q fails in one of the graphs $H_j$ and $H'_j$ implies that q is covered in both graphs. Since one of these graphs is contained in the sequence of graphs merged to form $G_a$ for each a, it follows that q is covered in each $G_a$. This contradicts the fact that q is uncovered in a graph labeling a vertex of S.

Our next result shows that we can use the bound on the number of tokens to bound the number of processors failing at any vertex of S.

Lemma 7: If $F_i$ is the set of processors failing in $G_i$, and $F = F_0 \cup \dots \cup F_k$, then $|F| \le rk \le f$.

Proof: If $q \in F$, then $q \in F_i$ for some i and q fails in $G_i$, so q is covered in every graph labeling every vertex of S by Lemma 6. It follows that each processor in F is covered in each graph labeling S. Since there are at most rk tokens to cover processors in any graph, there are at most rk processors in F.

We have assigned graphs to S, and now we must assign processors to S. A local processor labeling of S is an assignment of distinct processors $q_0, \dots, q_k$ to the vertices $y_0, \dots, y_k$ of S so that $q_i$ is uncovered in $G_i$ for each i. A global processor labeling of $\mathcal{B}$ is an assignment of processors to vertices of $\mathcal{B}$ that induces a local processor labeling at each primitive simplex. The final important property of the graphs labeling S is that if we use a processor labeling to label S with processors, then S is consistent with a single global communication graph. The proof of this requires a few preliminary results.

Lemma 8: If $G_{i-1}$ and $G_i$ differ in p's input value, then p is silent from round 1 in $G_0, \dots, G_k$. If $G_{i-1}$ and $G_i$ differ in the color of an edge from q to p in round t, then p and q are silent from round $t+1$ in $G_0, \dots, G_k$.

Proof: Suppose the two graphs $G_{i-1}$ and $G_i$ labeling vertices $y_{i-1}$ and $y_i$ differ in the input to p at time 0 or in the color of an edge from q to p in round t. The vertices differ in exactly one coordinate j, so $y_{i-1} = (a_1, \dots, a_j, \dots, a_k)$ and $y_i = (a_1, \dots, a_j + 1, \dots, a_k)$. For each l, let $\Sigma^0_l$ be the prefix of $\Sigma(v_l)$ consisting of the first $a_l$ operations, and let $H^0_l$ be the result of applying $\Sigma^0_l$ to $F_{l-1}$. Furthermore, in the special case of $l = j$, let $\Sigma^1_j$ be the prefix of $\Sigma(v_j)$ consisting of the first $a_j + 1$ operations, and let $H^1_j$ be the result of applying $\Sigma^1_j$ to $F_{j-1}$.

We know that $G_{i-1}$ is the merge of $H^0_1, \dots, H^0_j, \dots, H^0_k$, and that $G_i$ is the merge of $H^0_1, \dots, H^1_j, \dots, H^0_k$. If $H^0_j$ and $H^1_j$ were equal, then $G_{i-1}$ and $G_i$ would be equal. Thus, $H^0_j$ and $H^1_j$ must differ in the input to p at time 0 or in the color of an edge between q and p in round t, exactly as $G_{i-1}$ and $G_i$ differ. Since $H^0_j$ and $H^1_j$ are the result of applying $\Sigma^0_j$ and $\Sigma^0_j \omega_j$ to $F_{j-1}$, where $\omega_j$ is the $(a_j + 1)$st operation of $\Sigma(v_j)$, this change at time t must be caused by the operation $\omega_j$. It is easy to see from the definition of a graph operation like $\omega_j$ that (1) if $\omega_j$ changes p's input value, then p is silent from round 1 in $H^0_j$ and $H^1_j$, and (2) if $\omega_j$ changes the color of an edge from q to p in round t, then p and q are silent from round $t+1$ in $H^0_j$ and $H^1_j$. Consequently, the same is true in the merged graphs $G_0, \dots, G_k$, since each of them is the merge of a collection of graphs containing either $H^0_j$ or $H^1_j$.

G G p t p

1 i

Lemma 9: If i and differ in the local communicationgraph of at time , then

t G G k

is silent from round in 0 .

t

Proof: We proceed by induction on t. If , then the two graphs must differ in

p

the input to p at time , and Lemma 8 implies that is silent from round in the

G G t k

graphs 0 labeling the simplex. Suppose and the inductive hypothesis

p t

holds for t . Processor ’s local communication graph at time can differ in the

q t

two graphs for one of two reasons: either p hears from some processor in round in

q q

one graph and not in the other, or p hears from some processor in both graphs but

has different local communication graphs at time t in the two graphs. In the first

p t G G k

case, Lemma 8 implies that is silent from round in the graphs 0 . Inthe

q t

second case, the induction hypothesis for t implies that is silent from round in

G G q t G G

k i1 i

the graphs 0 . In particular, is silent in round in and , contradicting

q t the assumption that p hears from in round in both graphs, so this case can’t happen.


Lemma 10: If p sends a message in round r in any of the graphs G_0, ..., G_k, then p has the same local communication graph at time r − 1 in all of the graphs G_0, ..., G_k.

Proof: If p has different local communication graphs at time r − 1 in two of the graphs G_0, ..., G_k, then there are two adjacent graphs G_{i-1} and G_i in which p has different local communication graphs at time r − 1. By Lemma 9, p is silent in round r in all of the graphs G_0, ..., G_k, contradicting the assumption that p sent a round r message in one of them.

Finally, we can prove the crucial property of primitive simplexes in the Bermuda Triangle:

Lemma 11: Given a local processor labeling, let q_0, ..., q_k be the processors labeling the vertices of S, and let L_i be the local communication graph of q_i in G_i. There is a global communication graph G with the property that each q_i is nonfaulty in G and has the local communication graph L_i in G.

Proof: Let Q be the set of processors that send a round r message in any of the graphs G_0, ..., G_k. Notice that this set includes the uncovered processors q_0, ..., q_k, since Lemma 6 says that these processors are nonfaulty in each of these graphs. For each processor q ∈ Q, Lemma 10 says that q has the same local communication graph at time r − 1 in each graph G_0, ..., G_k.

Let H be the global communication graph underlying any one of these graphs. Notice that each processor q ∈ Q is active through round r in H. To see this, notice that since q sends a message in round r in one of the graphs labeling S, it sends all messages in round r in that graph. On the other hand, if q fails to send a message in round r in H, then the same is true for the corresponding graph labeling S. Thus, there are adjacent graphs G_{i-1} and G_i labeling S where q sends a round r message in one and not in the other. Consequently, Lemma 8 says q is silent in round r in all graphs labeling S, but this contradicts the fact that q does send a round r message in one of these graphs.

Now let G be the global communication graph obtained from H by coloring green each round r edge from each processor q ∈ Q, unless the edge is red in one of the local communication graphs L_0, ..., L_k, in which case we color it red in G as well. Notice that since the processors q ∈ Q are active through round r in H, changing the color of a round r edge from a processor q ∈ Q to either red or green is acceptable, provided we do not cause more than f processors to fail in the process. Fortunately, Lemma 7 implies that there are at least n − rk ≥ n − f processors that do not fail in any of the graphs G_0, ..., G_k. This means that there is a set of n − f processors that send to every processor in round r of every graph G_i, and in particular that the round r edges from these processors are green in every local communication graph L_i. It follows that for at least n − f processors, all round r edges from these processors are green in G, so at most f processors fail in G.

Each processor q_i is nonfaulty in G, since q_i is nonfaulty in each G_0, ..., G_k, meaning each edge from q_i is green in each G_0, ..., G_k and L_0, ..., L_k, and therefore in G. In addition, each processor q_i has the local communication graph L_i in G. To see this, notice that L_i consists of a round r edge from p_j to q_i for each j, and the local communication graph for p_j at time r − 1 if this edge is green. This edge is green in L_i if and only if it is green in G. In addition, if this edge is green in L_i, then it is green in G_i. In this case, Lemma 10 says that p_j has the same local communication graph at time r − 1 in each graph G_0, ..., G_k, and therefore in G. Consequently, q_i has the local communication graph L_i in G.

7 Processor Assignment

Lemma 11 at the end of the preceding section tells us that all that remains is to construct a global processor labeling. In this section, we show how to do this. We first associate a set of "live" processors with each communication graph labeling a vertex of B, and then we choose one processor from each set to label the vertices of B.

7.1 Live processors

Given a graph G, we construct a set of n − rk ≥ k + 1 uncovered (and hence nonfaulty) processors. We refer to these processors as the live processors in G, and we denote this set by live(G). These live sets have one crucial property: if G and G′ are two graphs labeling adjacent vertices, and if p is in both live(G) and live(G′), then p has the same rank in both sets. As usual, we define the rank of p_i in a set R of processors to be the number of processors p_j in R with j < i.

Given a graph G, we now show how to construct live(G). This construction has one goal: if G and G′ are graphs labeling adjacent vertices, then the construction should minimize the number of processors whose rank differs in the sets live(G) and live(G′).

The construction of live(G) begins with the set of all processors, and removes a set of rk processors, one for each token. This set of removed processors includes the covered processors, but may include other processors as well. For example, suppose p_i and p_{i+1} are covered with one token each in G, but suppose p_i is uncovered and p_{i+1} is covered by two tokens in G′. For simplicity, let's assume these are the only tokens on the graphs. When constructing the set live(G), we remove both p_i and p_{i+1} since they are both covered. When constructing the set live(G′), we remove p_{i+1}, but we must also remove a second processor corresponding to the second token covering p_{i+1}. Which processor should we remove? If we choose a low processor like p_1, then we have changed the rank of a low processor like p_2 from 1 to 0. If we choose a high processor like p_n, then we have changed the rank of a high processor like p_{n-1} from n − 4 to n − 3. On the other hand, if we choose to remove p_i again, then no processors change rank. In general, the construction of live(G) considers each processor in turn. If p is covered by m_p tokens in G, then the construction removes m_p processors by starting with p, working down the list of remaining processors smaller than p, and then working up the list of processors larger than p if necessary.

Specifically, given a graph G, the multiplicity m_p of p is the number of tokens appearing on the nodes for p in G, and the multiplicity of G is the vector ⟨m_1, ..., m_n⟩. Given the multiplicity of G as input, the algorithm given in Figure 12 computes live(G). In this algorithm, processor p_i is denoted by its index i. We refer to the ith iteration of the main loop as the ith step of the construction. This construction has two obvious properties:


S ← {1, ..., n}
for each i ← 1, ..., n do
    count ← 0
    for each j ← i, i − 1, ..., 1, i + 1, ..., n do
        if count = m_i then break
        if j ∈ S then
            S ← S − {j}
            count ← count + 1
live(G) ← S

Figure 12: The construction of live(G).
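As an illustration, the construction of Figure 12 can be restated as a short Python sketch; the multiplicity vector m below (padded with an unused m[0] so that indices match the text) and the example multiplicities are hypothetical.

def live(m):
    # m[i] is the multiplicity of processor i; processors are numbered 1..n,
    # and m[0] is unused padding so that indices match the paper.
    n = len(m) - 1
    S = set(range(1, n + 1))          # start with every processor
    for i in range(1, n + 1):         # the ith step of the construction
        count = 0
        # starting at i, work down through smaller processors, then up
        # through larger ones, removing m[i] surviving processors
        for j in list(range(i, 0, -1)) + list(range(i + 1, n + 1)):
            if count == m[i]:
                break
            if j in S:
                S.remove(j)
                count += 1
    return S

# Hypothetical instance of the example above with n = 8: one token each on
# p_4 and p_5 versus both tokens on p_5.  Both runs return the same set,
# so no surviving processor changes rank.
A = live([0, 0, 0, 0, 1, 1, 0, 0, 0])   # tokens on p_4 and p_5
B = live([0, 0, 0, 0, 0, 2, 0, 0, 0])   # both tokens on p_5
assert A == B == {1, 2, 3, 6, 7, 8}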

Lemma 12: If i ∈ live(G) then

1. i is uncovered: m_i = 0, and

2. room exists under i: m_1 + · · · + m_{i-1} ≤ i − 1.

Proof: Suppose i ∈ live(G). For part 1, if m_i > 0 then i will be removed by step i if it has not already been removed by an earlier step, contradicting i ∈ live(G). For part 2, notice that steps 1 through i − 1 remove a total of m_1 + · · · + m_{i-1} values. If this sum is greater than i − 1, then it is not possible for all of these values to be contained in {1, ..., i − 1}, so i will be removed within the first i − 1 steps, contradicting i ∈ live(G).

The assignment of graphs to the corners of a simplex has the property that once p becomes covered on one corner of S, it remains covered on the following corners of S:

Lemma 13: If p is uncovered in the graphs G_i and G_j, where i ≤ j, then p is uncovered in each graph G_i, G_{i+1}, ..., G_j.

Proof: If p is covered in G_ℓ for some ℓ between i and j, then p is uncovered in G_{ℓ-1} and covered in G_ℓ for some ℓ between i + 1 and j. Since G_{ℓ-1} and G_ℓ are on adjacent vertices of the simplex, the sequences of graphs merged to construct them are of the form H_1, ..., H_m, ..., H_k and H_1, ..., H′_m, ..., H_k, respectively, for some m. Since p is uncovered in G_{ℓ-1} and covered in G_ℓ, it must be that p is uncovered in H_m and covered in H′_m. Notice, however, that H′_m is used in the construction of each graph G_ℓ, G_{ℓ+1}, ..., G_j. This means that p is covered in each of these graphs, contradicting the fact that p is uncovered in G_j.

Finally, because token placements in adjacent graphs on a simplex differ in at most the movement of one token from one processor to an adjacent processor, we can use the preceding lemma to prove the following:

Lemma 14: If p ∈ live(G_i) and p ∈ live(G_j), then p has the same rank in live(G_i) and live(G_j).

Proof: Assume without loss of generality that i ≤ j. Since p ∈ live(G_i) and p ∈ live(G_j), Lemma 12 implies that p is uncovered in the graphs G_i and G_j, and Lemma 13 implies that p is uncovered in each graph G_i, G_{i+1}, ..., G_j. Since token placements in adjacent graphs differ in at most the movement of one token from one processor to an adjacent processor, and since p is uncovered in all of these graphs, this means that the number of tokens on processors smaller than p is the same in all of these graphs. Specifically, the sum m_1 + · · · + m_{p-1} of multiplicities of processors smaller than p is the same in G_i, G_{i+1}, ..., G_j. In particular, Lemma 12 implies that this sum is the same value s ≤ p − 1 in G_i and G_j, so p has the same rank p − 1 − s in live(G_i) and live(G_j).

7.2 Processor labeling

We now choose one processor from each set live(G) to label the vertex with graph G. Given a vertex x = ⟨x_1, ..., x_k⟩, we define

    plane(x) = (x_1 + · · · + x_k) mod (k + 1).

Lemma 15: If x and y are distinct vertices of the same simplex, then plane(x) ≠ plane(y).

Proof: Since x and y are in the same simplex, we can write y = x + f_1 + · · · + f_j for some distinct unit vectors f_1, ..., f_j and some 1 ≤ j ≤ k. If x = ⟨x_1, ..., x_k⟩ and y = ⟨y_1, ..., y_k⟩, then the sums x_1 + · · · + x_k and y_1 + · · · + y_k differ by exactly j. Since 1 ≤ j ≤ k, and since planes are defined as sums modulo k + 1, we have plane(x) ≠ plane(y).
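As a quick illustration of plane and of Lemma 15, the following Python sketch (with an arbitrary choice of k and of a starting vertex, both hypothetical) checks that the vertices reached by adding distinct unit vectors all receive distinct planes:

import itertools

def plane(x, k):
    return sum(x) % (k + 1)       # plane(x) = (x_1 + ... + x_k) mod (k + 1)

k = 3
start = [2, 1, 1]                 # an arbitrary starting vertex
for order in itertools.permutations(range(k)):
    vertex = list(start)
    planes = {plane(vertex, k)}
    for i in order:               # add distinct unit vectors one at a time
        vertex[i] += 1
        planes.add(plane(vertex, k))
    assert len(planes) == k + 1   # k + 1 vertices, k + 1 distinct planes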

We define a global processor labeling as follows: given a vertex x labeled with a graph G, we map x to the processor having rank plane(x) in live(G).

Lemma 16: This mapping is a global processor labeling.

Proof: First, it is clear that the mapping sends each vertex x labeled with a graph G_x to a processor q_x that is uncovered in G_x. Second, it maps distinct vertices of a simplex to distinct processors. To see this, suppose that both x and y are labeled with p, and let G_x and G_y be the graphs labeling x and y. We know that the rank of p in live(G_x) is plane(x) and that the rank of p in live(G_y) is plane(y), and we know that p has the same rank in live(G_x) and live(G_y) by Lemma 14. Consequently, plane(x) = plane(y), contradicting Lemma 15.

We label the vertices of B with processors according to this processor labeling.
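For illustration, the resulting labeling rule can be sketched as follows; here live_set stands for the set live(G) computed by the construction of Figure 12 and is assumed to have more than k members:

def label_vertex(x, live_set, k):
    # x is the vertex, live_set is live(G) for the graph G labeling x
    rank = sum(x) % (k + 1)          # plane(x)
    return sorted(live_set)[rank]    # the processor of rank plane(x) in live(G)

By Lemma 14 a processor appearing in two adjacent live sets has the same rank in both, and by Lemma 15 distinct vertices of a simplex have distinct planes, which is why this rule never assigns the same processor to two vertices of a primitive simplex.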

8 Ordered Pair Assignment

Finally, we assign ordered pairs ⟨p, L⟩ of processor ids and local communication graphs to vertices of B. Given a vertex x labeled with processor p and graph G, we label x with the ordered pair ⟨p, L⟩ where L is the local communication graph of p in G. The following result is a direct consequence of Lemmas 11 and 16. It says that the local communication graphs of processors labeling the corners of a simplex are consistent with

a single global communication graph.

Lemma 17: Let q_0, ..., q_k and L_0, ..., L_k be the processors and local communication graphs labeling the vertices of a simplex. There is a global communication graph G with the property that each q_i is nonfaulty in G and has the local communication graph L_i in G.

9 Sperner’s Lemma

We now state Sperner’s Lemma, and use it to prove a lower bound on the number of

rounds required to solve k-set agreement.

Notice that the corners of B are points of the form c_i = ⟨N, ..., N, 0, ..., 0⟩ with i indices of value N, for 0 ≤ i ≤ k. For example, c_0 = ⟨0, ..., 0⟩, c_1 = ⟨N, 0, ..., 0⟩, and c_k = ⟨N, ..., N⟩. Informally, a Sperner coloring of B assigns a color to each vertex so that each corner vertex c_i is given a distinct color w_i, each vertex on the edge between c_i and c_j is given either w_i or w_j, and so on.

More formally, let S be a simplex and let F be a face of S. Any triangulation of S induces a triangulation of F in the obvious way. Let T be a triangulation of S. A Sperner coloring of T assigns a color to each vertex of T so that each corner of T has a distinct color, and so that the vertices contained in a face F are colored with the colors on the corners of F, for each face F of T. Sperner colorings have a remarkable property: at least one simplex in the triangulation must be given all possible colors.

Lemma 18 (Sperner's Lemma): If B is a triangulation of a k-simplex, then for any Sperner coloring of B, there exists at least one k-simplex in B whose vertices are all given distinct colors.
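As an illustration, Sperner's Lemma is easy to confirm by brute force in small cases. The following sketch uses the standard half-square triangulation of a plain 2-simplex (not the triangulation B constructed in this paper) together with one admissible Sperner coloring chosen for the example:

N = 8   # triangulate {(x, y) : x, y >= 0, x + y <= N} into unit half-squares

def color(x, y):
    # corners (0,0), (N,0), (0,N) receive colors 0, 1, 2; every edge vertex
    # uses only its edge's corner colors; interior vertices are unconstrained
    if x == 0 and y == 0:
        return 0
    if y == 0:
        return 1
    if x == 0:
        return 2
    if x + y == N:
        return 1
    return (x + 2 * y) % 3

triangles = []
for x in range(N):
    for y in range(N - x):
        triangles.append([(x, y), (x + 1, y), (x, y + 1)])
        if x + y < N - 1:
            triangles.append([(x + 1, y), (x, y + 1), (x + 1, y + 1)])

panchromatic = [t for t in triangles if len({color(*v) for v in t}) == 3]
assert panchromatic   # Sperner's Lemma: some triangle carries all three colors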

Let P be the protocol whose existence we assumed in the previous section. Define a coloring of B, induced by P, as follows. Given a vertex x labeled with processor p and local communication graph L, color x with the value v that P requires processor p to choose when its local communication graph is L. This coloring is clearly well-defined, since P is a protocol in which all processors choose an output value at the end of round r. We will now expand the argument sketched in the introduction to show that the coloring induced by P is a Sperner coloring.

We first prove a simple claim. Recall that B is a triangulation of the simplex whose vertices are the corner vertices c_0, ..., c_k. Let F be some face of this simplex not containing the corner c_i, and consider the triangulation of F induced by B. We prove the following technical statement about vertices in F.

Claim 19: If x = ⟨x_1, ..., x_k⟩ is a vertex of a face F not containing c_i, then

1. if i = 0, then x_1 = N,

2. if 0 < i < k, then x_i = x_{i+1}, and

3. if i = k, then x_k = 0.

Proof: Each vertex x of B can be expressed using barycentric coordinates with respect to the corner vertices: that is, x = α_0 c_0 + · · · + α_k c_k, where α_j ≥ 0 for 0 ≤ j ≤ k and α_0 + · · · + α_k = 1. Since x is a vertex of a face F not containing the corner c_i, it follows that α_i = 0. We consider the three cases.

Case 1: i = 0. Each corner c_1, ..., c_k has the value N in the first position. Since α_0 = 0, the value in the first position of α_0 c_0 + · · · + α_k c_k is N(α_1 + · · · + α_k) = N.

Case 2: 0 < i < k. Each corner c_0, ..., c_{i-1} has 0 in positions i and i + 1, and each corner c_{i+1}, ..., c_k has N in positions i and i + 1. Since α_i = 0, the linear combination α_0 c_0 + · · · + α_k c_k has the same value in positions i and i + 1. Thus, x_i = x_{i+1}.

Case 3: i = k. Each corner c_0, ..., c_{k-1} has 0 in position k. Since α_k = 0, the value in the kth position of α_0 c_0 + · · · + α_k c_k is 0. Thus, x_k = 0.

Lemma 20: If P is a protocol for k-set agreement tolerating f faults and halting in r = ⌊f/k⌋ rounds, then the coloring induced by P is a Sperner coloring of B.

Proof: We must show that the coloring induced by P satisfies the two conditions of a Sperner coloring.

For the first condition, consider any corner vertex c_i. Remember that c_i was originally labeled with the 1-graph F_i describing a failure-free execution in which all processors start with input v_i, and that the local communication graph labeling c_i is a subgraph of F_i. Since the k-set agreement problem requires that any value chosen by a processor must be an input value of some processor, all processors must choose v_i in F_i, and it follows that the vertex c_i must be colored with v_i. This means that each corner c_i is colored with a distinct value v_i.

For the second condition, consider any face F of B, and let us prove that vertices in F are colored with the colors on the corners of F. Equivalently, suppose that c_i is not a corner of F, and let us prove that no vertex in F is colored with v_i. Consider a vertex x of F, the global communication graph G originally labeling x, and the graphs H_1, ..., H_k used in the merge defining G. The definition of this merge says that the input value labeling a node ⟨p, 0⟩ in G is v_m, where m is the maximum value such that ⟨p, 0⟩ is labeled with v_m in H_m, or v_0 if no such m exists. Again, we consider three cases. In each case, we show that no processor in G has the input value v_i.

Suppose i = 0. Since x_1 = N by Claim 19, we know that H_1 = F_1, where the input value of every processor is v_1. By the definition of the merge operation, it follows immediately that no processor in G can have input value v_0.

Suppose 0 < i < k. Again, x_i = x_{i+1} by Claim 19. Now, H_i is the result of applying π_i, the first x_i operations of σ_i, to the graph F_{i-1}. Similarly, H_{i+1} is the result of applying π_{i+1}, the first x_{i+1} operations of σ_{i+1}, to the graph F_i. Since x_i = x_{i+1}, both π_i and π_{i+1} are of the same length, and it follows that π_i contains an operation of the form "change p's input value to v_i" if and only if π_{i+1} contains an operation of the form "change p's input value to v_{i+1}". This implies that for any processor, either its input value is v_{i-1} in H_i and v_i in H_{i+1}, or its input value is v_i in H_i and v_{i+1} in H_{i+1}. In both cases, by the definition of the merge, v_i is not the input value of this processor in G.

Suppose i = k. Since x_k = 0 by Claim 19, we know that H_k = F_{k-1}, where the input value of every processor is v_{k-1}. By the definition of merge, it follows immediately that no processor in G can have input value v_k.

Therefore, we have shown that if x is a vertex of a face F of B, and c_i is not a corner vertex of F, then the communication graph G corresponding to x contains no processor with input value v_i. Since any value chosen by a processor must be an input value of some processor, the value chosen at this vertex cannot be v_i, and it follows that x is assigned a color other than v_i. So x must be colored with a color v_j such that c_j is a corner vertex of F. Since c_j is colored v_j, the second condition of Sperner's Lemma holds. So the coloring induced by P is a Sperner coloring.

Sperner's Lemma guarantees that some primitive simplex is colored by k + 1 distinct values, and this simplex corresponds to a global state in which k + 1 processors choose k + 1 distinct values, contradicting the definition of k-set agreement:

Theorem 21: If n ≥ f + k + 1, then no protocol for k-set agreement can halt in fewer than ⌊f/k⌋ + 1 rounds.

Proof: Suppose P is a protocol for k-set agreement tolerating f faults and halting in r = ⌊f/k⌋ rounds, and consider the corresponding Bermuda Triangle B. Lemma 20 says that the coloring induced by P is a Sperner coloring of B, so Sperner's Lemma 18 says that there is a simplex S whose vertices are colored with k + 1 distinct values v_0, ..., v_k. Let q_0, ..., q_k and L_0, ..., L_k be the processors and local communication graphs labeling the corners of S. By Lemma 17, there exists a communication graph G in which each q_i is nonfaulty and has local communication graph L_i. This means that G is a time r global communication graph of P in which each q_i must choose the value v_i. In other words, k + 1 processors must choose k + 1 distinct values, contradicting the fact that P solves k-set agreement in r rounds.

10 Protocol

An optimal protocol P for k-set agreement is given in Figure 13. In this protocol, processors repeatedly broadcast input values and keep track of the least input value received in a local variable best. Initially, a processor sets best to its own input value. In each of the next ⌊f/k⌋ + 1 rounds, the processor broadcasts the value of best and then sets best to the smallest value received in that round from any processor (including itself). In the end, it chooses the value of best as its output value.

To prove that P is an optimal protocol, we must prove that, in every execution of P, processors halt in r = ⌊f/k⌋ + 1 rounds, every processor's output value is some processor's input value, and the set of output values chosen has size at most k. The first two statements follow immediately from the text of the protocol, so we need only prove the third. For each time t and processor p, let best_p^t be the value of best held by p at time t. For each time t, let Best^t be the set of values best_q^t for the processors q active through time t. Notice that Best^0 is the set of input values, and that Best^r is the set of chosen output values. Our first observation is that the set Best^t never increases from one round to the next.

Lemma 22: Best^{t+1} ⊆ Best^t for all times t.

best ← input value;
for each round 1 through ⌊f/k⌋ + 1 do
    broadcast best;
    receive values b_1, ..., b_ℓ from the other processors;
    best ← min{best, b_1, ..., b_ℓ};
choose best.

Figure 13: An optimal protocol P for k-set agreement.
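As an illustration of the protocol, the following Python sketch simulates P under a simplified crash model in which a crashing processor delivers its final broadcast to an arbitrary subset of recipients; the inputs and crash pattern in the example are hypothetical:

def run_protocol(inputs, f, k, crashes):
    # inputs[p] is p's input value.  crashes maps a processor p to a pair
    # (t, recipients): p crashes in round t, delivering its round-t broadcast
    # only to the listed recipients and sending nothing afterwards.
    n = len(inputs)
    rounds = f // k + 1                     # the protocol runs floor(f/k)+1 rounds
    best = list(inputs)                     # best starts as the input value
    crashed = set()
    for t in range(1, rounds + 1):
        received = [[] for _ in range(n)]
        for q in range(n):
            if q in crashed:
                continue
            if q in crashes and crashes[q][0] == t:
                recipients = crashes[q][1]  # partial broadcast, then crash
                crashed.add(q)
            else:
                recipients = range(n)       # full broadcast, including itself
            for p in recipients:
                received[p].append(best[q])
        for p in range(n):
            if p not in crashed:
                best[p] = min([best[p]] + received[p])
    return {best[p] for p in range(n) if p not in crashed}

# Hypothetical run with n = 4, f = 2, k = 2 (so two rounds): two crashes can
# leave the survivors with two distinct outputs, but never more than k = 2.
outs = run_protocol(inputs=[0, 1, 2, 3], f=2, k=2,
                    crashes={0: (1, [1]), 1: (2, [2])})
assert len(outs) <= 2        # here outs == {0, 1}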

Proof: If b ∈ Best^{t+1}, then best_p^{t+1} = b for some processor p active through round t + 1. Since best_p^{t+1} is the minimum of the values sent to p by processors during round t + 1, we know that best_q^t = b for some processor q that is active through time t. Consequently, b ∈ Best^t.

We can use this observation to prove that the only executions in which many output values are chosen are executions in which many processors fail. We say that a processor p fails before time t if there is a processor q to which p sends no message in round t (and p may fail to send to q in earlier rounds as well).

Lemma 23: If |Best^t| > d, then at least dt processors fail before time t.

Proof: We proceed by induction on t. The case of t = 0 is immediate, so suppose that t ≥ 1 and that the induction hypothesis holds for t − 1. Since |Best^t| > d and since Best^t ⊆ Best^{t-1} by Lemma 22, it follows that |Best^{t-1}| > d, and the induction hypothesis for t − 1 implies that there is a set S of d(t − 1) processors that fail before time t − 1. It is enough to show that there are an additional d processors not contained in S that fail before time t.

Let b_0 < b_1 < · · · < b_d be d + 1 of the values of Best^t, written in increasing order. Let q be a processor with best_q^t set to the largest value b_d at time t, and for each value b_i let q_i be a processor that sent b_i in round t. The processors q_0, ..., q_d are distinct since the values b_0, ..., b_d are distinct, and these processors do not fail before time t − 1 since they send a message in round t, so they are not contained in S. On the other hand, the processors q_0, ..., q_{d-1} sending the small values b_0, ..., b_{d-1} in round t clearly did not send their values to the processor q setting best_q to the large value b_d, or q would have set best_q^t to a smaller value. Consequently, these d processors q_0, ..., q_{d-1} fail in round t and hence fail before time t.

Since Best^r is the set of output values chosen by processors at the end of round r = ⌊f/k⌋ + 1, if k + 1 output values are chosen, then Lemma 23 says that at least kr = k(⌊f/k⌋ + 1) ≥ f + 1 > f processors fail, which is impossible. Consequently, the set of output values chosen has size at most k, and we are done.
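For instance, with f = 7 and k = 3 the protocol runs ⌊7/3⌋ + 1 = 3 rounds, and choosing 4 distinct output values would, by Lemma 23, require at least 3 · 3 = 9 > 7 failures.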

Theorem 24: The protocol P solves k-set agreement in ⌊f/k⌋ + 1 rounds.

Acknowledgments

The results in the paper have appeared earlier in preliminary form. The lower bound result has appeared in [CHLT93] and the algorithm has appeared in [Cha91]. Much of this work was performed while the first author was visiting MIT. The first and third authors were supported in part by NSF grant CCR-89-15206, in part by DARPA contracts N00014-89-J-1988, N00014-92-J-4033, and N00014-92-J-1799, and in part by ONR contract N00014-91-J-1046. In addition, the first author was supported in part by NSF grant CCR-93-08103.

References

[AR96] Hagit Attiya and Sergio Rajsbaum. A combinatorial topology framework for wait-free computability. In Proceedings of the Workshop on Distributed Algorithms and Graphs, 1996.

[BHG87] Philip A. Bernstein, Vassos Hadzilacos, and Nathan Goodman. Concurrency Control and Recovery in Database Systems. Addison-Wesley Publishing Company, Reading, Massachusetts, 1987.

[BG93] Elizabeth Borowsky and Eli Gafni. Generalized FLP impossibility result for t-resilient asynchronous computations. In Proceedings of the 25th ACM Symposium on the Theory of Computing, May 1993.

[Cha91] Soma Chaudhuri. Towards a complexity hierarchy of wait-free concurrent objects. In Proceedings of the 3rd IEEE Symposium on Parallel and Distributed Processing. IEEE, December 1991. Also appeared as Technical Report No. 91-024, Iowa State University, 1991.

[Cha93] Soma Chaudhuri. More choices allow more faults: Set consensus problems in totally asynchronous systems. Information and Computation, 105(1):132–158, July 1993. A preliminary version appeared in ACM PODC, 1990.

[CHLT93] Soma Chaudhuri, Maurice Herlihy, Nancy Lynch, and Mark R. Tuttle. A tight lower bound for k-set agreement. In Proceedings of the 34th IEEE Symposium on Foundations of Computer Science, pages 206–215. IEEE, November 1993.

[DM90] Cynthia Dwork and Yoram Moses. Knowledge and common knowledge in a Byzantine environment: Crash failures. Information and Computation, 88(2):156–186, October 1990.

[Dol82] Danny Dolev. The Byzantine generals strike again. Journal of Algorithms, 3(1):14–30, March 1982.

[DS83] Danny Dolev and H. Raymond Strong. Authenticated algorithms for Byzantine agreement. SIAM Journal on Computing, 12(3):656–666, November 1983.

[Fis83] Michael J. Fischer. The consensus problem in unreliable distributed systems (a brief survey). In Marek Karpinski, editor, Proceedings of the 10th International Colloquium on Automata, Languages, and Programming, pages 127–140. Springer-Verlag, 1983. A preliminary version appeared as Yale Technical Report YALEU/DCS/RR-273.

[FL82] Michael J. Fischer and Nancy A. Lynch. A lower bound for the time to assure interactive consistency. Information Processing Letters, 14(4):183–186, June 1982.

[FLP85] Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. Impossibility of distributed consensus with one faulty processor. Journal of the ACM, 32(2):374–382, 1985.

[GK96] Eli Gafni and Elias Koutsoupias. Three-processor tasks are undecidable. daphne.cs.ucla.edu/eli/undec.ps, 1996.

[Had83] Vassos Hadzilacos. A lower bound for Byzantine agreement with fail-stop processors. Technical Report TR–21–83, Harvard University, 1983.

[Her91] Maurice P. Herlihy. Wait-free synchronization. ACM Transactions on Programming Languages and Systems, 13(1):124–149, January 1991.

[HR94] Maurice P. Herlihy and Sergio Rajsbaum. Set consensus using arbitrary objects. In Proceedings of the 13th Annual ACM Symposium on Principles of Distributed Computing, August 1994.

[HR95] Maurice P. Herlihy and Sergio Rajsbaum. Algebraic spans. In Proceedings of the 14th Annual ACM Symposium on Principles of Distributed Computing, pages 90–99. ACM, August 1995.

[HS93] Maurice P. Herlihy and Nir Shavit. The asynchronous computability theorem for t-resilient tasks. In Proceedings of the 25th ACM Symposium on Theory of Computing, pages 111–120. ACM, May 1993.

[HS94] Maurice P. Herlihy and Nir Shavit. A simple constructive computability theorem for wait-free computation. In Proceedings of the 1994 ACM Symposium on Theory of Computing, May 1994.

[LSP82] Leslie Lamport, Robert Shostak, and Marshall Pease. The Byzantine generals problem. ACM Transactions on Programming Languages and Systems, 4(3):382–401, July 1982.

[Mer85] Michael Merritt. Notes on the Dolev-Strong lower bound for Byzantine agreement. Unpublished manuscript, 1985.

[MT88] Yoram Moses and Mark R. Tuttle. Programming simultaneous actions using common knowledge. Algorithmica, 3(1):121–169, 1988.

[PSL80] Marshall Pease, Robert Shostak, and Leslie Lamport. Reaching agreement in the presence of faults. Journal of the ACM, 27(2):228–234, 1980.

[SZ93] Michael Saks and Fotis Zaharoglou. Wait-free k-set agreement is impossible: The topology of public knowledge. In Proceedings of the 1993 ACM Symposium on Theory of Computing, May 1993.

[Spa66] Edwin H. Spanier. Algebraic Topology. Springer-Verlag, New York, 1966.

[W+78] J. H. Wensley et al. SIFT: Design and analysis of a fault-tolerant computer for aircraft control. Proceedings of the IEEE, 66(10):1240–1255, October 1978.
