WaitFree Consensus

James Aspnes

July

CMUCS

Scho ol of Computer Science

Carnegie Mellon University

Pittsburgh PA

Submitted in partial fulllmen t of the requirements

for the degree of Do ctor of Philosophy

c

James Aspnes

This researc h was supp orted by an IBM Graduate Fellowship and an NSF Graduate

Fellowship

Keywords distributed algorithms shared memory consensus random

walks martingales shared coins

Abstract

Consensus is a decision problem in which n pro cessors each starting with a

value not known to the others must collectively agree on a single value If the

initial values are equal the pro cessors must agree on that common value this

is the validity condition A consensus proto col is waitfree if every pro ces

sor nishes in a nite numb er of its own steps regardless of the relative sp eeds

of the other pro cessors a condition that precludes the use of traditional syn

chronization techniques such as critical sections lo cking or leader election

Waitfree consensus is fundamental to synchronization without mutual ex

clusion as it can b e used to construct waitfree implementations of arbitrary

concurrent data structures It is known that no deterministic algorithm for

waitfree consensus is p ossible although many randomized algorithms have

b een prop osed

I present two algorithms for solving the waitfree consensus problem in the

standard asynchronous sharedmemory model The rst is a very simple

proto col based on a random walk The second is a proto col based on weighted

2

voting in which each pro cessor executes O n log n exp ected op erations

This b ound is close to the trivial lower b ound of n and it substantially

2

improves on the b est previouslyknown b ound of O n log n due to Bracha

and Rachman

Acknowledgments

I would like to b egin by thanking the memb ers of my committee My advisor

Steven Rudich has always b een willing to provide encouragement Danny

Sleator showed me that research is b etter done as play than as work Merrick

Furst has always b een a source of interesting observations and ideas Maurice

Herlihy intro duced me to waitfree consensus when I rst arrived at Carnegie

Mellon his keen insights into have b een a continuing

inuence on my work

I would like to thank my parents for their warm supp ort and encourage

ment

I would like to thank the many other students who lightened the monastic

burdens of graduate student life David Applegate in particular was always

ready to supply a new problem or a new toy

Finally I would like to thank my b eloved wife Nan Ellman who waited

longer and more patiently for me to nish than anyone i

Contents

Intro duction

The Asynchronous SharedMemory Mo del

Basic elements

Time and asynchrony

Randomization

Relation to other models

Performance measures

Consensus and Shared Coins

Consensus

Shared coins

Consensus using shared coins

Consensus Using a Random Walk

Random walks

The robust shared coin proto col

Implementing a b ounded counter with atomic registers

The randomized consensus proto col

Consensus Using Weighted Voting

Intro duction

The shared coin proto col

Martingales

Knowledge algebras and measurability

Denition of a martingale

Gambling systems ii

Limit theorems

Pro of of correctness

Phases of the proto col

Common votes

Extra votes

Written votes vs decided votes

Choice of parameters

Conclusions and Op en Problems

Comparison with other proto cols

Limits to waitfree consensus

Op en problems iii

List of Figures

Consensus from a shared coin

Robust shared coin proto col

Pictorial representation of robust shared coin proto col

The proto col as a controlled random walk

Pseudo co de for counter op erations

Consensus proto col

Counter scan for randomized consensus proto col

Shared coin proto col iv

List of Tables

Comparison of consensus proto cols v

Chapter

Intro duction

Consensus CIL is a to ol for allowing a group of pro cessors to collectively

cho ose one value from a set of alternatives It is dened as a decision problem

in which n pro cessors each starting with a value or not known to

the others must collectively agree on a single value The restriction to a

single bit do es not prevent the pro cessors from cho osing b etween more than

two p ossibilities since they can run just run a onebit consensus proto col

multiple times The pro cessors communicate by reading from and writing

to a collection of registers each pro cessor nishes the proto col by deciding

on a value and halting A consensus proto col is waitfree if each pro cessor

makes its decision after a nite numb er of its own steps regardless of the

relative sp eeds or halting failures of the other pro cessors In addition a

consensus proto col must satisfy the validity condition if every pro cessor

starts with the same input value every pro cessor decides on that value This

condition excludes trivial proto cols such as one where every pro cessor always

decides

The asynchronous sharedmemory model is an attempt to capture the

eect of making the weakest p ossible assumptions ab out the timing of events

in a distributed system At each moment an adversary scheduler cho oses one

of the n pro cessors to run No guarantees are made ab out the schedulers

choices it may start and stop pro cessors at will based on a total knowledge

of the state of the system including the contents of the registers the pro

gramming of the pro cessors and even the internal states of the pro cessors

Since the scheduler can always simulate a halting failure by cho osing not to

run a pro cessor the mo del eectively allows up to n halting failures The

adversarys p ower however is not unlimited It cannot cause the pro cessors

to deviate from their programming or cause op erations on the registers to

fail or return incorrect values

Combined with the requirement that a consensus proto col terminate af

ter a nite numb er of op erations the adversarys p ower precludes the use of

traditional synchronization techniques such as critical sections lo cking or

leader election any pro cessor that obtains a critical resource can b e killed

and as so on as any pro cessor or group of pro cessors is given control over the

outcome of the proto col the scheduler can put them to sleep and leave the

other pro cessors helpless to complete the proto col on their own In general

any proto col dep ending on mutual exclusion where one pro cessors p os

session of a resource or role dep ends on other pro cessors b eing excluded from

it will not b e waitfree

Waitfree consensus is fundamental to synchronization without mutual

exclusion and thus lies at the heart of the more general problem of con

structing highly concurrent data structures Her It can b e used to obtain

waitfree implementations of arbitrary abstract data typ es with atomic op er

ations Her Plo It is also complete for distributed decision tasks

CM in the sense that it can b e used to solve all such decision tasks that

have a waitfree solution Intuitively the pro cessors can individually simulate

a sequence of op erations on an ob ject or the computation of a decision value

and use consensus proto cols to cho ose among the p ossibly distinct outcomes

Conversely if consensus is not p ossible it is also impossible to construct

waitfree implementations for many simple abstract data typ es including

queues testandset bits or compareandswap registers as there exist sim

ple deterministic consensus proto cols for b ounded numb ers of pro cessors

using these primitives Her

Alas given the p owerful adversary of the asynchronous sharedmemory

model it is not p ossible to have a deterministic waitfree consensus proto col

one in which the b ehavior of the pro cessors is predictable in advance In

fact in a wide variety of asynchronous models it has b een shown there is

no deterministic consensus proto col that works against a scheduler that can

stop even a single pro cessor CIL DDS FLP Her LAA TM

Though this result is usually proved using more general techniques when

only singlewriter registers are used it has a simple pro of that illustrates

many of the problems that arise when trying to solve waitfree consensus

Imagine that two pro cessors A and B are trying to solve the consensus

problem Their situation is very much like the situation of two p eople facing

each other in a narrow hallway neither p erson has any stake in whether

they pass on the left or the right but if one go es left and the other right

they will bump into each other and make no progress When A and B are

deterministic pro cessors under the control of a malicious adversary scheduler

we can show the scheduler will b e able to use its knowledge of their state and

its control over the timing of events to keep A and B oscillating back and

forth forever b etween the two p ossible decision values

Here is what happ ens Since each pro cessor is deterministic at any given

p oint in time it has some preference dened as the value left or right

in the hallway example that it will eventually cho ose if the other pro cessor

1

executes no more op erations AHa At the b eginning of the proto col

each pro cessors preference is equal to its input b ecause without knowing

that some other pro cessor has a dierent input it must cautiously decide on

its own input to avoid violating the validity condition So we can assume

that initially pro cessor A prefers to pass on the left and pro cessor B on the

right

Now the scheduler go es to work It stops B and runs A by itself After

some nite numb er of steps A must make a decision to go left and halt or

the termination condition will b e violated But b efore A can nish it must

make sure that B will make the same decision it makes or the consistency

condition will b e violated So at some p oint A must tell B something that

will cause B to change its preference to left and in the sharedmemory

model this message must take the form of a write op eration since B cant

see when A do es a read op eration Immediately before A carries out this

critical write the scheduler stops A and starts B

This action puts B in the same situation that A was in B still prefers to

go right and after some nite numb er of steps it must tell A to change its

preference to right When this p oint is reached either one of two conditions

holds either B has done something to neutralize As still undelivered demand

that B change it preference in which case the scheduler just stops B and

runs A again or b oth A and B are ab out to deliver writes that will cause

the other to change its preference In this case the scheduler allows b oth of

1

This unfortunate p ossibility is unlikely to o ccur in the realworld hallway situation

assuming healthy participants but it is allowed by the asynchronous sharedmemory model

since the adversary can always cho ose never to run the other pro cessor again

the writes to go through and now A prefers to go right and B prefers to go

left putting the two pro cessors back where they started with roles reversed

In eect the adversary uses its p ower over the timing of events to make sure

that just when A gives in and agrees to adopt B s p osition B do es exactly

the same thing and so on ad innitum

Fortunately human b eings do not app ear to b e controlled by an adver

sary scheduler so in real life one hardly ever sees two p eople b ouncing in

unison from one side of a hallway to the other for more than a few iterations

Pro cessors that do not have even the illusion of free will can nonetheless get

some of the same eect using randomization Imagine in the hallway situa

tion that A had the ability to tell B to ip a fair coin to set its new preference

No matter what As preference was or changed to there would b e a

chance that the result of B s coin ip would match As preference In fact

if b oth pro cessors were continually ipping coins to change their preferred

value a run of identical coin ips would so on o ccur that was long enough

that the two pro cessors would b e able to notice that they were in agreement

and the proto col would terminate Though this rough description leaves out

many important details it gives the basic idea b ehind the rst randomized

consensus proto col for the asynchronous sharedmemory mo del due to Abra

hamson Abr The only drawback of the approach is that it do es not scale

well the o dds of n pro cessors simultaneously ipping heads is exp onentially

small and b ecause agreement is not detected immediately in Abrahamsons

2

O (n )

proto col its worstcase exp ected running time is only b ounded by

The rst p olynomialtime consensus proto col for this mo del is describ ed

by Aspnes and Herlihy AHa The key observation similar to one made

by Chor Merritt and Shmoys CMS in the context of a dierent model is

that the n dierent lo cal coin ips can b e replaced by a single shared coin

proto col that pro duces random bits that all of the pro cessors agree on with

at least a constant probability regardless of the b ehavior of the scheduler

We showed that it is p ossible to construct a consensus proto col from any

shared coin proto col by running the shared coin rep eatedly until agreement

is reached A description of the construction app ears in Section The

cost of the resulting consensus proto col is within a constant factor of the cost

of the shared coin Subsequent work on sharedmemory consensus proto cols

has concentrated primarily on the problem of constructing ecient shared

coin proto cols

All currently known shared coin proto cols use some form of a very simple

idea Each pro cessor rep eatedly adds random  votes to a common p o ol

until some termination condition is reached Any pro cessor that sees a p os

itive total vote decides and those that see a negative total vote decide

Intuitively b ecause all of the pro cessors are executing the same lo op over and

over again the adversarys p ower is eectively limited to blo cking votes it

dislikes by stopping pro cessors in b etween ipping their lo cal coins to decide

on the value of the votes and actually writing the votes out to the registers

The adversarys control is limited by running the proto col for long enough

that the sum of these blo cked votes is likely to b e only a fraction of the total

2

vote a pro cess that requires accumulating n votes

In the original shared coin proto col of Aspnes and Herlihy AHa each

pro cessor decides on a value when it sees a total vote whose absolute value

is at least a constant multiple of n from the origin For each of the exp ected

2 2

n votes n register op erations are executed giving a total running

4

time of n op erations Unfortunately b oth the implementation of the

counter representing the p osition of the random walk and the mechanism for

rep eatedly running the shared coin require a p otentially unb ounded amount

of space This problem was corrected in a proto col of Attiya Dolev and

Shavit ADS which retained the multiple rounds of its predecessor but

cleverly reused the space used by old shared coins once they were no longer

needed

A simpler descendent of the shared coin proto col of Aspnes and Herlihy

which also requires only b ounded space is the shared coin proto col describ ed

in Chapter This proto col by using a more sophisticated termination con

dition guarantees that the pro cessors always agree on its outcome A simple

modication of this proto col gives a consensus proto col that do es not require

multiple executions of a shared coin which can b e implemented using only

three O log nbit counters supp orting increment decrement and read op er

2

ations and which runs in only n exp ected counter op erations However

this apparent sp eed is lost in the implementation of the counter b ecause

2

n register op erations are needed for each counter op eration giving it

4

the same running time of n exp ected register op erations as its prede

cessors Since the consensus proto col of Chapter rst app eared Asp

other researchers BR DHPW have describ ed weaker primitives that

act suciently like counters to make the proto col work and which use only a

linear numb er of register op erations for each counter op eration Using these

primitives in place of the counters gives a consensus proto col that runs in

3

exp ected n register op erations

An alternative to having each pro cessor nish the proto col when it sees a

total vote far from the origin is to simply gather votes until some predeter

mined quorum is reached The rst shared coin proto col to use this technique

2

is that of Saks Shavit and Woll SSW It is still necessary to gather n

votes to overcome the eect of votes withheld by the scheduler and in fact

4

the SaksShavitWoll proto col still requires n register op erations Fur

thermore it is unlikely that any proto col that runs in a xed numb er of

total op erations can guarantee that all pro cessors agree on the outcome of

the coin thus it is necessary to retain the complex multiple rounds of the

AspnesHerlihy proto col in some form However stopping the proto col after

a sp ecied numb er of votes are collected has a very important consequence

it is no longer necessary for a pro cessor to check for termination after every

vote it casts

This remarkable fact was observed by Bracha and Rachman BR and

is the basis for their fast shared coin proto col In this proto col as in previous

shared coin proto cols the pro cessors rep eatedly generate random  votes

2

and add them to a running total After a quorum of n votes are collected

pro cessors may decide on the output of the shared coin based on the sign

of the total vote But each pro cessor only checks if the quorum has b een

reached after every O n log n votes so the pro cessors can generate an

2

additional O n log n extra votes b eyond the common votes making up

the quorum However by making the numb er of common votes large enough

compared to the numb er of extra votes the probability that the extra votes

will change the sign of the total vote can b e made arbitrarily small Thus

even if one pro cessor reads the total vote immediately after the quorum is

reached and another reads it after many extra votes have b een cast it is

still likely that b oth will agree with each other on the outcome of the shared

coin In addition b ecause each pro cessor only needs to compute the total

vote once after it has seen a full quorum no counters or other complicated

primitives are needed to keep track of the voting Each pro cessor simply

maintains in its own register a tally of all the votes it has cast and computes

the total vote by summing the tallies in all of the registers The result is that

2

the proto col requires only O n log n expected total register op erations

There is however still ro om for improvement All of the shared coin

proto cols we have described suer from a fundamental aw if the scheduler

stops all but one of the pro cessors that lone pro cessor is still forced to

2

generate n lo cal coin ips The essence of waitfreeness is b ounding the

work done by a single pro cessor despite the failures of other pro cessors But

the b ound on the work done by a single pro cessor in every one of these

proto cols is asymptotically no b etter than the b ound on the work done by

all of the pro cessors together

Chapter shows that waitfree consensus can b e achieved without forcing

a fast pro cessor to do most of the work I describ e a shared coin proto col

in which the pro cessors cast votes of steadily increasing weights In eect a

fast pro cessor or a pro cessor running in isolation b ecomes impatient and

starts casting large votes to nish the proto col more quickly This mechanism

do es grant the adversary greater control b ecause it can cho ose from up to n

dierent weights one for each pro cessor when determining the weight of the

next vote to b e cast One eect of this control is that a more sophisticated

analysis is required than for the unweightedvoting proto cols Still with

appropriatelychosen parameters the proto col guarantees that each pro cessor

2

nishes after only O n log n exp ected op erations

The organization of the dissertation is as follows Chapters and pro

vide a framework of denitions for the material in the later chapters Chapter

describ es the asynchronous sharedmemory mo del in detail and compares

it with other models of distributed systems Chapter formally denes the

consensus problem and its relationship to the problem of constructing shared

coins The main results app ear in Chapters and Chapter describes the

simple consensus proto col based on a random walk Chapter describ es the

faster proto col based on weighted voting Finally Chapter compares these

results to other solutions to the problem of waitfree consensus and discusses

p ossible directions for future work

Much of the content of Chapters and also app ears in Asp and

AW resp ectively Some of the material in Chapter is derived from

AHa

Chapter

The Asynchronous

SharedMemory Mo del

This chapter gives a detailed description of the asynchronous sharedmemory

model This model is the standard one for analyzing waitfree consensus

proto cols Abr ADS AHa Asp AW BR BR DHPW

SSW Though it app ears in varying guises all are essentially equivalent

The description of the model here largely follows that of the weak model

of Abrahamson Abr The reader interested in a more formal denition of

the model may nd one in AHa based on the IO Automaton model of

Lynch Lyn

Basic elements

The system consists of a collection of n pro cessors state machines whose

b ehavior is typically sp ecied by a highlevel proto col In principle no

limits are assumed on the computational p ower of the pro cessors although

in practice none of the proto cols describ ed in this do cument will require much

lo cal computation

The pro cessors can communicate only by executing read and write op

erations on a collection of singlewriter multireader atomic registers

Lam Lamb Each of these registers is asso ciated with one of the pro

cessors its owner Only the owner of a register is allowed to write to it

although any of the pro cessors may read from it Atomicity means that

read and write op erations act as if they take place instantaneously they

never fail and the result of concurrent execution of multiple op erations on

the same register is consistent with their having o ccurred sequentially

The assumptions b ehind atomicity may app ear to b e rather strong esp e

cially in a mo del that is designed to b e as harsh as p ossible However it turns

out that atomic registers are not p owerful enough to implement determinis

tically such simple synchronization primitives as queues or testandset bits

Her and may b e constructed eciently from much weaker primitives in a

variety of ways BP IL NW Pet SAG So in fact the apparent

strength of atomic registers is somewhat illusory

Time and asynchrony

The systems represented by the mo del may have many events o ccurring con

currently However b ecause the only communication b etween pro cessors in

the system is by op erations on atomic registers it is p ossible to represent

its b ehavior using a globaltime mo del BD Lama Lamb Instead

of treating op erations on the registers as o ccurring over p ossiblyoverlapping

intervals of time they are treated as o ccurring instantaneously The history

of an execution of the system can thus b e described simply as a sequence

of op erations Concurrency in the system as a whole is mo deled by the

interleaving of op erations from dierent pro cessors in this sequence

The actual order of the interleaving is the primary source of nondetermin

ism in the system At any given time there may b e up to n pro cessors that are

ready to execute another op eration how then do es the system cho ose which

of the pro cessors will run next We would like to make as few assumptions

here as p ossible so that our proto cols will work under the widest p ossible

set of circumstances One way of doing this is to assign control over timing

to an adversary scheduler a function that cho oses a pro cessor to run at

each step based on the previous history and current state of the system The

adversary scheduler is not b ound by any fairness constraints it may start

and stop pro cessors at will doing whatever is necessary to prevent a proto col

from executing correctly In addition no limits are placed on the schedulers

computational p ower or knowledge of the programming or internal states of

the pro cessors However its control is limited only to the timing of events

in the system it cannot for example cause a read op eration to return the

wrong value or a pro cessor to deviate from its programming

The denition of a waitfree proto col implicitly dep ends on having such

a p owerful adversary A proto col is said to b e waitfree if every pro cessor

nishes the proto col in a nite numb er of its own steps regard less of the

relative speeds of the other processors The adversary in the asynchronous

sharedmemory mo del simply represents the universal quantier hidden in

that condition If we can design a proto col that will b eat an allp owerful

adversary we will know that the proto col will succeed in the far easier task

of working correctly in whatever circumstances chance and the workings of

a real system might throw at it

Randomization

In order to solve the consensus problem in the presence of an adversary

scheduler the pro cessors will need to b e able to act nondeterministically In

addition to giving each pro cessor the ability to write to its own registers and

to read from any of the registers we will give each pro cessor the ability to

ip a lo cal coin This op eration provides the pro cessor with a random bit

that cannot b e predicted by the scheduler in advance though it is known to

the scheduler immediately afterwards by virtue of the schedulers ability to

see the internal states of the pro cessors The timing of coinip op erations

like that of read and write op erations is under the control of the scheduler

Relation to other mo dels

There are other mo dels that are closely related to the asynchronous shared

memory model In particular it is tempting to dene the prop erty of b eing

waitfree as the prop erty that a proto col will nish that is one pro cessor will

nish even in the presence of up to n halting failures where a halting

failure is an event after which a pro cessor executes no more op erations Such

a denition would make waitfreeness a natural extension of tresilience

the prop erty of working in the presence of up to t halting failures

However in the context of a totally asynchronous system this denition

is unnecessarily restrictive It is true that the adversary is able to simulate

up to n halting failures simply by cho osing not to run halted pro ces

sors ever again However there is no reason to b elieve that dead pro cessors

are the only source of diculty in an asynchronous environment For ex

ample the adversary could cho ose to put some pro cessor to sleep for a very

long interval waking it only when its view of the world was so outdated

that its misguided actions would only hinder the completion of a proto col

As the hallway example in the intro duction shows stopping a pro cessor and

reawakening it much later can b e even more devastating stopping a pro cessor

forever Furthermore distinguishing b etween slow pro cessors and dead ones

requires either an assumption that slow pro cessors must take a step after

some b ounded interval or that fast pro cessors may execute a p otentially un

b ounded numb er of op erations waiting for the slow pro cessors to revive The

rst assumption imposes a weak form of synchrony on the system violating

the principle of avoiding helpful assumptions the second makes it dicult

to measure the eciency of a proto col For these reasons we avoid the issue

completely by using the more general denition

Other alternatives to the model involve changing the underlying commu

nications medium from atomic registers either by adopting stronger prim

itives that provide greater synchronization or by moving to some sort of

messagepassing model We avoid the rst approach b ecause as always we

would like to work in as weak a mo del as p ossible However the question of

how a dierent choice of primitives can aect the diculty of solving wait

free consensus is an interesting one ab out which little is known except for

the deterministic case LAA Her

Moving to a messagepassing model presents new diculties In general

the dening prop erty of a messagepassing model is that the pro cessors com

municate by sending messages to each other directly rather than op erating

on a common p o ol of registers or other primitives Messagepassing models

come in b ewildering variety a general taxonomy can b e found in LL

Dolev et al DDS classify a large collection of messagepassing models

and show which are capable of solving consensus deterministically

Among these many models one has traditionally b een asso ciated with

solving asynchronous consensus BND BT CM FLP In this

model the adversary is allowed to i stop up to t pro cessors and ii de

lay messages arbitrarily Unfortunately a simple partition argument shows

that in this model one cannot solve consensus even with a randomized al

gorithm if at least n pro cessors can fail BT Intuitively the adversary

can divide the pro cessors into two groups of size n and delay all messages

passing b etween the groups As neither group will b e able to distinguish

this partitioning from the other group actually b eing dead the two groups

will indep endently come up with decisions that may b e inconsistent On the

other hand solutions to the waitfree consensus problem for shared memory

can b e used to obtain solutions to consensus problems for messagepassing

models with weaker failure conditions by simulating the shared memory An

example of this technique may b e found in BND A general comparison

of the p ower of sharedmemory and messagepassing mo dels in the presence

of halting failures can b e found in CM

Performance measures

It is not immediately obvious how b est to measure the p erformance of a wait

free decision proto col Two measures are very natural for the asynchronous

model as they impose no implicit assumptions on the scheduling of op era

tions in the system These are the total work measure which simply counts

the total numb er of register op erations executed by all the pro cessors together

until every pro cessor has nished the proto col and the p erpro cessor work

measure which takes the maximum over all pro cessors of the numb er of reg

ister op erations executed by each pro cessor individually b efore it nishes the

proto col

The p erpro cessor measure is closer to the spirit of the waitfree guar

antee that each pro cessor nishes in a nite numb er of its own steps as it

gives an upp er b ound on what that nite numb er is However prior to the

proto col of Chapter for every known consensus proto col see Table

the two measures were within a constant factor of each other As a result

only the total work measure has typically b een considered This usage is in

contrast to what is needed in situations where pro cessors may reenter the

proto col rep eatedly as in proto cols for simulating various shared abstract

+

data typ es AAD AG AHb And DHPW Her Plo or for

timestamping and similar mechanisms DS DW IL In these pro

to cols one is typically interested in the numb er of register op erations each

pro cessor must execute to simulate a single op eration on the shared ob ject

an inherently p erpro cessor measure and the total work is meaningful only

when interpreted in an amortized sense

Given the usefulness of the p erpro cessor measure in this broader con

text I will concentrate primarily on it However b ecause the totalwork

measure has traditionally b een used to analyze consensus proto cols it will b e

considered as well

An alternative to these measures that has seen some use in analyzing

waitfree proto cols is the rounds measure of asynchronous time AFL

ALS LF SSW It is used for models that represent halting failures

explicitly When using this measure up to n pro cesses may b e designated

as faulty at the discretion of the adversary once a pro cessor b ecomes faulty it

is never allowed to execute another op eration A round is a minimal interval

during which every nonfaulty pro cessor executes at least one op eration The

measure is simply the numb er of these rounds In eect this measure counts

the op erations of the slowest nonfaulty pro cessor at any given p oint in the

execution If a slow pro cessor executes only one op eration in a given interval

only one round has elapsed even though a faster pro cessor might have carried

out hundreds of op erations during the same interval

The rounds measure is reasonable if one denes the prop erty of b eing

waitfree as equivalent to b eing able to survive up to n halting failures

However as explained ab ove in the context of a totally asynchronous sys

tem this denition is unnecessarily restrictive But once we adopt the more

general denitions we quickly run into trouble If some pro cessor stops and

then starts again much later during the execution of the proto col the entire

p erio d that the pro cessor is inactive counts as only one round As a result

the rounds measure implicitly resolves the problem of distinguishing slow

pro cessors from dead ones by guaranteeing that pro cessors will either run at

b ounded relative sp eeds or not run at all This is in conict with the goal of

using a mo del that is as general as p ossible and for this reason the rounds

1

measure will not b e used here

1

A notion of rounds do es app ear in Section these rounds are part of the internal

structure of the proto col describ ed there and have no relation to the rounds measure

Chapter

Consensus and Shared Coins

This chapter formally describes the problem of solving consensus and the

closelyrelated problem of construction a shared coin and gives an example

of a method for solving consensus using a shared coin This last technique

will b e of particular importance in Chapter

Consensus

Consensus is a decision problem in which n pro cessors each starting with

a value or not known to the others must collectively agree on a single

value A consensus proto col is a distributed proto col for solving consensus

It is correct if it meets the following conditions CIL

 Consistency All pro cessors decide on the same value

 Termination Every pro cessor decides on some value in nite exp ected

time

 Validity If every pro cessor starts with the same value every pro cessor

decides on that value

The basic idea b ehind consensus is to allow the pro cessors to make a

collective decision For this purp ose the consistency condition is the most

fundamental of the correctness conditions as it is what actually guarantees

that the pro cessors agree The termination condition is phrased to apply in

many p ossible models in the asynchronous sharedmemory model it trans

lates into requiring that the proto col b e waitfree as it requires that pro

cessors must nish in nite exp ected time regardless of the actions of the

adversary scheduler

If it happ ens that the pro cessors already agree with each other we want

the consensus proto col to ratify that agreement rather than veto it hence the

validity condition From a less practical p ersp ective the validity condition

is needed b ecause its absence makes the problem uninteresting since all of

the pro cessors could just decide every time the proto col is run without any

communication at all

If we are allowed to make convenient assumptions ab out the system con

sensus is not a dicult problem For example on a PRAM p erhaps the

friendliest cousin of asynchronous sharedmemory consensus reduces to sim

ply taking any function we like of the input values that satises the validity

condition In general in any model where b oth the pro cessors and the com

munications medium are reliable the problem can b e solved simply by having

the pro cessors exchange information ab out their inputs until all of them know

the entire set of inputs at this p oint each can individually compute a func

tion of the inputs as in the PRAM case to come up with the decision value

for the proto col It is only when we move to a model like asynchronous

sharedmemory that allows pro cessors to fail that consensus b ecomes hard

One diculty is that the harsh assumptions of the asynchronous shared

memory model can amplify the correctness conditions in ways that may not

b e immediately obvious For example the validity conditions implies that

the adversary can always force the pro cessors to decide on a particular value

by running only those pro cessors that started with that value Because these

live pro cessors are unable to see the diering input values of the dead

pro cessors they will see a situation indistinguishable from one in which ev

ery pro cessor started with the same value In this latter case the validity

condition would force the pro cessors to decide on that common value So

b ecause of their limited knowledge the live pro cessors must decide on the

only input value they can see even though there may b e other pro cessors

that disagree with it This example shows that one must b e very careful

ab out what assumptions one makes in the model as they can subtly aect

what a proto col is allowed to do

Shared coins

In order to solve the consensus problem we will need to cop e with the con

siderable p ower of the adversary We cannot mo dify the model to place

restrictions on the adversary instead we must nd some way of getting the

pro cessors to reach agreement in spite of the adversarys interference

One way is to base a consensus proto col on a stronger primitive the

shared coin A shared coin is a decision proto col in which each pro cessor

decides on a bit which with some probability will b e the same value that

every other pro cessor decides on But unlike consensus the actual value

chosen will not always b e under the control of the adversary In order to

prevent this control given the adversarys ability to run only some of the

pro cessors we must drop the validity condition and with it the notion of

input bits What we are left with is the following denition

1

A shared coin proto col with agreement parameter is a distributed

decision proto col that satises these two conditions

 Termination Every pro cessor decides on some value in nite exp ected

time

 Probabilistic agreement For each value b or the probability

that every pro cess decides on b is at least

The probabilistic agreement condition guarantees that with probability

the outcome of the shared coin proto col is agreed on by all pro cessors

and is indistinguishable from ipping a fair coin With probability

no guarantees whatso ever are made it is p ossible that the pro cessors will

not agree with each other at all or that the adversary will b e able to cho ose

what value each pro cessor decides on Some sort of adversary control is

always p ossible as it is known that a waitfree shared coin with exactly

equal to is impossible AHa

The agreement parameter is not the only p ossible parameter for shared

coin merely the one that is most convenient when building consensus pro

to cols If we wish to use the coin directly for example as a source of semi

random bits SV in a distributed algorithm a more natural parameter

1

Called the deance parameter in AHa The less melodramatic term agreement

parameter is taken from SSW

is the bias dened by In terms of the bias the agreement

prop erty can b e restated as follows

 Bounded bias The probability that at least one pro cessor decides on

a given value is at most

This prop erty says in eect that the adversary can force some pro cessor to

see a particular outcome with only greater probability than if the pro cessors

were actually collectively ipping a fair coin

In some circumstances we would like to guarantee that all of the pro

cessors always agree on the outcome of the coin even though the adversary

might have b een able to control what that outcome is A shared coin that

guarantees agreement will b e called robust As will b e seen in Chapter

robust shared coins can often b e converted directly into consensus proto cols

by the addition of only a small amount of machinery However Chapter

describ es an intrinsically nonrobust shared coin in this situation more so

phisticated techniques are needed to achieve consensus One approach is

describ ed in the next section

Consensus using shared coins

It is a wellestablished result that one can construct a consensus proto col from

a shared coin with constant agreement parameter ADS AHa SSW

This section gives as an example the rst of these constructions AHa As

we shall see this construction gives a consensus proto col which requires an

0 2

exp ected O T n n op erations p er pro cessor and O T n n

0

total op erations where T n and T n are the exp ected numb er of op erations

p er pro cessor and total op erations for the shared coin proto col

Pseudo co de for each pro cessors b ehavior in the sharedcoinbased con

sensus proto col is given in Figure Each pro cessor has a register of its own

with two elds prefer and round initialized to ? In addition there are

assumed to b e a p otentially unb ounded collection of shared coin primitives

one for each round of the proto col Two sp ecial terms are used to simplify

the description of the proto col A pro cessor is a leader if its round eld is

greater than or equal to every other pro cesss round eld Two pro cessors

agree if b oth their prefer elds are equal and neither eld is ?

pro cedure consensus input

prefer round input

rep eat

read all the registers

if all who disagree trail by and Im a leader then

output prefer

else if leaders agree then

prefer round leader preference round

else if prefer 6 ? then

prefer round ? round

else

prefer round shared coin round round

Figure Consensus from a shared coin

Let us sketch out the workings of the proto col The most serious problem

that the proto col is designed to solve is how to neutralize slow pro cessors

that have old outofdate views of the world Because such pro cessors end

up with low round values relative to the fast pro cessors they are eectively

excluded from the real decisionmaking in the proto col until they manage to

catch up to their faster comrades

Intuitively the decisionmaking pro cess consists of the leaders running

the shared coin proto col in line It is not necessarily the case that all of

the leaders at each round will take part in the shared coin proto col as those

that arrive earliest may not see disagreement and will execute line instead

However those early arrivals must in fact agree with each other and so with

probability at least the others will switch to agree with them at any given

round It follows that the exp ected numb er of rounds until agreement is

O

Once the leaders agree the slower pro cessors are forced to adopt the

leaders p osition by executing line The proto col terminates when the

agreeing pro cessors advance far enough rounds to know that any pro cessor

that disagrees will pass through line b efore catching up and b ecoming a

leader itself

This explanation is informal and glosses over many important but tedious

details of the proto col The interested reader is referred to AHa for a more

thorough description of the construction including a full pro of of correctness

Alternative constructions with similar p erformance may b e found in ADS

and SSW

For our purp oses it will suce to summarize the relevant results from

AHa

Theorem AHa The protocol of Figure implements a consen

sus protocol that requires an expected O rounds where is the agreement

parameter of the shared coin

From which it follows that

Corollary The protocol of Figure implements a consensus proto

col that requires an expected O T n n operations per processor and

0 2

O T n n operations in total where is the agreement parameter

0

T n the expected number of operations per processor and T n the expected

number of operations in total for the shared coin

Pro of From the theorem we exp ect at most O rounds

In each round each pro cessor executes at most n read op erations one

instance of the shared coin and two write op erations for a total of n

T n op erations

2

Similarly in each round the pro cessors collectively execute at most n

read op erations n write op erations and one instance of the shared coin for

2 0

a total of n n T n op erations

Chapter

Consensus Using a Random

Walk

All currentlyknown waitfree consensus proto cols that run in p olynomial

time are based on some form of a shared coin proto col The key insight

used in constructing shared coins is that it is dangerous to give to o much

p ower over the outcome of the proto col to any one pro cessor at any given

time Such tyranny like all tyrannies runs the risk of a sudden change

in p olicy following an assassination and thereby gives control over p olicy to

p otential assassins like our adversary scheduler In all currently known shared

coin proto cols each individual pro cessors p ower is minimized by having the

pro cessors rep eatedly cast small random votes for the two decision values

How this voting pro cess is b est represented dep ends on the method used

to decide when it is nished In this chapter we describe a shared coin and

a consensus proto col in which the voting ends when the dierence b etween

the numb er of votes and votes is large Under such circumstances the

voting pro cess can b e viewed as a random walk in which each vote moves

the total up or down by one until an absorbing barrier at K is reached

where K is a parameter of the proto col In fact the original p olynomial

time shared coin of AHa worked on exactly this principle Unfortunately

a simple implementation of a random walk do es not guarantee agreement

as the adversary can allow one pro cessor to see a total greater than K and

decide and then by releasing negative votes trapp ed inside stopp ed

pro cessors move the total down out of the decision range so that with some

nonzero probability the other pro cessors will eventually move it to K and

decide So to use a simple randomwalkbased shared coin in a consensus

proto col one would need to run it rep eatedly as described in Section

The proto cols describ ed in this chapter avoid the need for such metho ds

by extending the random walk to incorp orate the function of detecting agree

ment As a result we obtain a robust shared coin describ ed in Section

which guarantees that all pro cessors agree on its outcome Because the coin

guarantees agreement it can b e modied in to obtain a consensus proto col

simply by attaching a preamble to ensure validity as describ ed in Section

The resulting consensus proto col and its variants obtained by replacing the

counter implementation BR DHPW are particularly simple as they

are the only known waitfree consensus proto cols that do not require the re

p eated execution of a nonrobust shared coin proto col and the multiround

sup erstructure that comes with it

The simplicity of the proto col also allows some optimizations that are

more dicult when using a nonrobust coin The consensus proto col is de

signed to require fewer total op erations if fewer pro cessors actually partici

pate in it a feature which b ecomes important when for example the proto col

is used as a primitive for building shared data structures which only a few

pro cessors might attempt to access simultaneously

The chapter is organized as follows Section describ es some prop erties

of random walks that will b e used later in the chapter Section describ es

the robust shared coin proto col and proves its correctness The description of

the robust shared coin proto col assumes the presence of an atomic counter

providing increment decrement and read op erations that app ear to o ccur

sequentially Section shows how such a counter may b e built from single

2

writer atomic registers at the cost of O n register op erations p er counter

op eration Finally Section describes the consensus proto col obtained by

modifying the robust shared coin

Random walks

Let us b egin by stating a few basic lemmas ab out the b ehavior of random

walks

Lemma Consider a symmetric random walk with step size running

between absorbing barriers at a and b and starting at x where a x b

Then

The expected number of steps until one of the barriers is reached is

2

ba

given by x ab x which is always less than or equal to

2

xa

The probability that the random walk hits b before a is

ba

Pro of The random walk described is just a form of the classical gamblers

ruin problem See Fel pp

Lemma Consider a symmetric random walk with step size running

between a reecting barrier at a and an absorbing barrier at b starting at

position x a x b Then the expected time until b is reached is x a

2

b ab x b a

Pro of This random walk can b e obtained from the random walk with ab

sorbing barriers at b and a b a by the transformation x a jx aj

The following critical lemma describ es a mo died random walk that will

b e of great importance in analyzing the shared coin and consensus proto cols

Lemma Consider a symmetric random walk with absorbing barriers at

a and b with the fol lowing twist a point c a c b is chosen as the center

of the random walk The adversary chooses the starting position x of the

random walk to be anywhere in the range from a to b Also before each step

the adversary may choose between moving randomly in either direction with

probability or moving away from c with probability No matter what

choices the adversary makes the expected number of steps until one of the

2

barriers is reached is at most b a

Pro of The game describ ed can b e thought of as a controlled Markov

pro cess DY in which the adversary is trying to maximize the exp ected

time Because this pro cess is played over a nite set of states a standard

result of the theory of controlled Markov pro cesses can b e applied This

result states that the maximum time can b e achieved by an adversary using

a simple strategy one which cho oses the same option from each state at all

times

Such a strategy can b e sp ecied by listing the p oints where the adversary

cho oses to force the particle to move away from c We can think of these

p oints as dividing the range of the random walk into intervals b etween each

pair of p oints where the adversary forces the particle to move determinis

tically is a region where the particle moves randomly The p oints at the

edge of these random regions act like barriers in a random walk A p oint on

the side away from c pushes the particle into a new region and so acts like

an absorbing barrier while a p oint on the side toward c pushes the particle

back into the old region and so acts like a reecting barrier Thus the region

containing c acts like a random walk with two absorbing barriers and the

remaining regions act like random walks with one absorbing barrier on the

side away from c and one reecting barrier on the side toward c

Because each barrier can only b e crossed away from c once the particle

leaves a region it can never return Now supp ose the particle starts in a

2

region with width w After at most w steps on average by Lemmas or

1

1

2

steps it will pass into a new region of width w after an additional w

2

2

it will pass into a new region of width w and so on until either a or b is

3

P

reached Since these regions all t b etween a and b w b a and thus

i

P

2 2

since each w w b a

i

i

Though the b ound in Lemma is proved for the case of a very p owerful

adversary that is always allowed to cho ose b etween a random move and a

deterministic move at each step the b ound applies equally well to a weaker

adversary whose choices are more constrained as the stronger adversary

could always cho ose to op erate within the weaker adversarys constraints

This technique of proving b ounds for a strong adversary that carry over to

a weaker one has great simplifying p ower It will b e used extensively in the

analysis of the shared coin and consensus proto cols

The robust shared coin proto col

Figure shows pseudo co de for each pro cessors b ehavior in the robust

shared coin proto col The coin is constructed using an atomic counter

which supp orts atomic increment decrement and read op erations In this

section these op erations are assumed to take unit time The counter is

initialized to The pro cessors lo cal coin is represented by the pro cedure

local ip which returns the values and with equal probability

A pro cessors b ehavior in the proto col is represented in pictorial form in

Figure While a pro cessor reads values in the central range from K

Shared data

counter counter with range K n K n initialized to

coin pro cedure shared

rep eat

c counter

if c K n then output

else if c K n then output

else if c K then decrement counter

else if c K then increment counter

else

if local ip then increment counter

else decrement counter

Figure Robust shared coin proto col

Z

K K

Z

Z

Z

Z

Z

 

Z

Z

Z

 

K n K n

Figure Pictorial representation of robust shared coin proto col

to K where K is a parameter of the proto col it ips a lo cal fair coin to

decide whether to increment or decrement the counter This part of the

proto col is essentially the same as the randomwalkbased shared coin of

Aspnes and Herlihy AHa What is new is the addition of a slop e at

either side of the random walk On these slop es a pro cessor do es not move

the counter randomly but instead always moves it away from the center

When a pro cessor reads a counter value in one of the buckets b eyond the

slop es it decides either or dep ending on the sign of the counter

If the slop es are wide enough once any pro cessor has seen a value that

causes it to decide all other pro cessors will see values that cause them to

push the counter toward the same decision This mechanism eliminates the

p ossibility that delayed writes might move the counter out of the decision

range and allow the random walk with small but nonnegligible probability

to wander over to the other side More formally we can show

Lemma If any processor reads a counter value v K n then al l

subsequent reads wil l return values greater than or equal to K in the

symmetric case where v K n al l subsequent values read wil l be less

than or equal to K

Pro of Supp ose that a pro cessor has read v K n then it immediately

terminates leaving n running pro cessors Thus the numb er d of pro cessors

that will execute a decrement b efore their next read is at most n Let

l c d where c is the value stored in the counter Since c K n it

must b e the case that l K Now consider the eect of the actions the

scheduler can take If it allows a decrement to pro ceed c and d b oth drop

by and l remains constant If it allows an increment to o ccur c increases

and l increases with it If it allows a read the value read is c l K

and thus d is unaected In each case l remains at least K and the claim

follows since c l The pro of of the symmetric case is similar

The consistency prop erty follows immediately from Lemma A similar

argument shows that the counter will not overow

Lemma The counter value never leaves the range K n K n in

any execution of the shared coin protocol

Pro of Supp ose that the counter reaches K n at some p oint Then each

pro cessor will execute at most one increment or decrement op eration b efore

it reads the counter at which p oint it will decide and execute no additional

op erations Thus the counter cannot exceed K n n K n The full

result follows by symmetry

Proving the termination and b ounded bias prop erties of the shared coin

requires some additional machinery Dene the true p osition t of the ran

dom walk to b e the value in the counter plus for each pro cessor that will

increment the counter b efore its next read and minus for each pro cessor

that will decrement the counter b efore its next read The following Lemma

relates the value read by a pro cessor to the true p osition of the random walk

Lemma Let c be a value read from the counter by some processor and

t the true position of the random walk in the state preceding the read Then

jc tj n

Pro of There can b e at most n pro cessors with p ending increments or

decrements

Let us assume hereafter that the scheduler can cause a pro cessor to read

any value b etween t n and t n Because such a scheduler could

always cho ose to simulate any scheduler the proto col will actually face any

go o d statement we can prove with the assumption will carry over to the

situation without it The advantage of granting the adversary this additional

p ower is that it allows us to forget ab out the vagaries of the counter value

Instead we can treat the proto col as a controlled random walk using the true

p osition t

Consider the lower part of Figure the upp er part simply rep eats

Figure without the buckets If the true p osition t is in the central region

b etween K n and K n then Lemma implies that any

pro cessor that reads the counter will see a value b etween K and K and

move t randomly In the two immediately adjacent regions any pro cessor

will either read a value b etween K and K and move t randomly or read a

value that causes it to move t away from Finally any pro cessor that reads

a value in the outermost regions where jtj K n will either make a

decision or move t away from In each of these cases the scheduler is never

allowed to force that true p osition toward and if K is large relative to n

much of the execution of the proto col will b e sp ent in the central region where

the schedulers control is ineective These two prop erties of the proto col are

the basis of the pro of of its termination and b ounded bias as shown b elow

Z

K K

Z

. .

. .

. .

. . Z

. .

Z

Z

. .

. .

. .

. .

Z

. .

. .

. .

. .

Z

. .

. .

. .

. .

. .

Z

. .

. .

. .

. .

. . Z

. .

. .

.

.

. .

Z

. .

Z

. .

. .

. .

. .

. .

. .

. .

. .

. .

. .

. .

. .

. .

. .

. .

. .

. .

K n K n

. .

. .

. . . .

h

. .

h

h

. .

h

h

. .

h

h

. .

h

h

K n K n h

. .

Z

Z

Z

Z

Z

Z

Z

Z

Z

Z

Z

Figure The proto col as a controlled random walk

Lemma The robust shared coin protocol executes an expected O K

2

n total counter operations when K n

Pro of If we consider the true p osition t Lemma implies that the sched

uler can only force t up if t K n and down if t K n

Furthermore if jtj ever exceeds K n n each pro cessor will

decide after its next read Thus the movement of the true p osition is a

controlled random walk in the sense of Lemma with center and bar

riers at K n The exp ected numb er of steps until a barrier is

2

reached is at most K n steps which will b e followed by at most

n op erations as the pro cessors each decide Since each step takes a constant

numb er of counter op erations the exp ected numb er of op erations required is

2

O K n

The time b ound of Lemma shows that every pro cessor terminates in

nite exp ected time when K n The b ounded bias prop erty is a conse

quence of the following lemma

Lemma Against any scheduler the probability that the processors in the

K (n1) K +(n1)

robust shared coin protocol wil l decide is between and

2K 2K

Pro of Supp ose the scheduler is trying to maximize the probability of decid

ing Under the simplifying assumption it can force a decision of as so on

as t K n however if it allows t to slip b elow K n the

pro cessors will eventually decide When K n t K n

the scheduler may cho ose b etween moving t randomly or forcing t toward

K n Clearly forcing the counter toward K n can only

increase the probability of deciding so cho osing to move t randomly max

imizes the probability of deciding But if the scheduler makes this choice

the movement of the true p osition b ecomes a simple random walk with ab

sorbing barriers at K n and K n By Lemma the

K +(n1)

probability that t reaches K n rst is The lower b ound

2K

follows by symmetry

Combining the lemmas we obtain

Theorem When K n the protocol of Figure implements a robust

shared coin

Pro of Consistency follows from Lemma termination from Lemma

and b ounded bias from Lemma

Lemma allows K to b e chosen to obtain arbitrarily small nonnegative

1

then bias Let the bias of the shared coin b e

2

n

K

which gives

n

K

Combining this inequality with Lemma gives a b ound on the worst

2

case exp ected running time for the proto col of O n total counter op er

ations This time is comparable to the worstcase exp ected running times of

the proto cols nonrobust ancestors The proto col thus achieves robustness

without paying a signicant cost in sp eed

Implementing a b ounded counter with

atomic registers

The robust shared coin proto col assumes the presence of a shared counter

supp orting atomic increment decrement and read op erations with the re

striction that no op eration will b e applied that will move the counter out of

some xed range r r In practice such a counter is not likely to b e avail

able as a hardware primitive Fortunately it is not dicult to implement a

shared counter using atomic registers However some care must b e taken to

guarantee that the counter uses only a b ounded amount of space

Both Aspnes and Herlihy AHa and Attiya Dolev and Shavit ADS

describ e shared counter implementations The two counter implementations

b oth assign a register to hold the net increment due to each pro cessor so

that the counters value is simply the sum of the values in these registers

Both algorithms use simple atomic snapshot proto cols to allow the entire set

of registers to b e read in a single atomic action

Alas neither implementation do es quite what we would like Even though

the value stored in the counter will never exceed the range r r the net

increment due to an individual pro cessor is p otentially unb ounded The

AspnesHerlihy proto col ignores this diculty by assuming the presence of

unb ounded registers which it also uses to implement the atomic scan The

AttiyaDolevShavit proto col uses only b ounded registers but enforces the

b ounds by prematurely terminating the shared coin proto col if any pro ces

sors register wanders out of a limited range This premature termination

o ccurs infrequently and is acceptable in a shared coin that do es not need to

guarantee consistency But it is not acceptable for a robust coin as it may

allow the scheduler to force some pro cessor to cho ose one value through

premature termination after another has already chosen a dierent value

through the normal workings of the shared coin proto col

A simple alternative to premature termination that still allows the size

of the registers to b e b ounded is to store the remainder of each pro cessors

contribution relative to some convenient modulus m greater than the total

range r The counter value can then b e reconstructed as the unique v

in the range r r that is congruent to the sum of the registers mo dulo m

Pseudo co de for the three counter op erations using this technique is shown

in Figure it assumes the presence of an array of registers which can b e

Shared data

scannable array count n initialized to

pro cedure increment

v count me

count me v mod m

pro cedure decrement

v count me

count me v mod m

pro cedure read

scan count

P

n1

count i v

i=0

0 0 0

return v where r v r and v v mod m

Figure Pseudo co de for counter op erations

read in a single op eration Such an array can b e simulated using an atomic

+

snapshot proto col AAD AHb And An atomic snapshot is an

op eration that returns a picture of the values in all of the registers in the

array that is consistent with other pictures returned by other snapshots and

with the order of nonoverlapping write op erations even though it may not

corresp ond to the actual values in the registers at a particular p oint in time

Typically it is necessary to make writes to registers in the scannable array

b e more than just simple writes to individual registers so taking a snapshot

of the array and writing to an element of the array will b oth b e exp ensive

+

However the algorithm of Afek et al AAD allows an atomic scan op er

2

ation to b e implemented with O n bits of extra space and a maximum of

2

O n primitive read and write op erations for each snapshot and each write

to a simulated register in the scannable array Using their algorithm

2

implements an atomic counter where each counter op erations costs O n

register op erations

The randomized consensus proto col

Figure shows pseudo co de for each pro cessors b ehavior in the randomized

consensus proto col The proto col uses three shared counters The rst two

maintain a total of the numb er of participating pro cessors that started with

each of the inputs and The last is used as the counter for a modied

version of the robust shared coin proto col All of the counters have an initial

value of

The proto col is optimized for the case where few pro cessors participate

We will dene a pro cessor to b e active if it takes at least one step b efore

some pro cessor decides on a value and denote by p the total numb er of active

pro cessors in a given execution The proto col uses the counters a and a

0 1

to keep track of the numb er of active pro cessors by having each pro cessor

increment one or the other of these counters as it starts the proto col

The proto col dep ends on b eing able to take an atomic snapshot of the

counters Since the rst two counters are never decremented such a snapshot

can b e obtained as described in Figure Though the op eration dened

there is not waitfree b ecause it will not nish if a or a changes during

0 1

some pass through the lo op this event can o ccur at most p times during any

execution of the consensus proto col So in fact the time to carry out the

atomic snapshot will b e b ounded in the context in which it is used

If the counters are not primitives but are instead constructed as describ ed

in Section using an atomic scan op eration the overhead of Figure can

b e avoided completely by simply reading all three counters in a single atomic

scan of the arrays that implement them

Several features of the proto col are worth noting First of all the same

slop es that ensured consistency for the robust shared coin ensure consis

tency for the consensus proto col for the same reasons Second the counters

a and a allow the proto col to guarantee validity as the random walk is only

0 1

invoked if b oth have nonzero values These counters are also used to min

imize the range of the random walk by taking advantage of the fact stated

in the following lemma a mo dication of Lemma

Lemma Let a a c be the values read from the counters by some pro

0 1

cessor and t the true position of the random walk in the state preceding the

read Then jc tj a a

0 1

Pro of There are at most a a pro cessors with p ending increments or

0 1

Shared data

counter a with range n initialized to

0

counter a with range n initialized to

1

counter c with range n n initialized to

pro cedure consensus input

increment a

input

rep eat

read a a c

0 1

if c n then output

else if c n then output

else if c a a or a then decrement c

0 1 1

else if c a a or a then increment c

0 1 0

else

if local ip then increment c

else decrement c

Figure Consensus proto col

pro cedure scan counters

rep eat

a read a

0 0

a read a

1 1

c read c

0

read a a

0

0

0

read a a

1

1

0 0

a a and a until a

1 0

1 0

return a a c

0 1

Figure Counter scan for randomized consensus proto col

decrements

To prove that the consensus proto col is correct we must establish that it

is consistent that it terminates and that it is valid The pro of of consistency

is a straightforward modication of the pro of of Lemma

Lemma If any processor reads a counter value v n then al l subse

quent reads wil l return values n in the symmetric case where v n

al l subsequent reads wil l return values n

Pro of Apply the pro of of Lemma with K n

Similarly the pro of that the counter c do es not overow is a straightfor

ward modication of Lemma

Lemma The value of c never leaves the range n n in any execu

tion of the consensus protocol

Pro of Apply the pro of of Lemma with K n

Termination is trickier to demonstrate As in the case of the shared coin

the key to proving the consensus proto cols termination is the fact that the

schedulers only alternative to moving the true p osition randomly is to move

it away from the origin In the shared coin proto col this condition dep ends

on xing the parameter K n In the consensus proto col the situation

is more complicated as the proto col uses its knowledge of the numb er of

currently active pro cessors to set the inner b oundaries of the slop e close to

the origin while still preventing the scheduler from b eing able to force the

true p osition to move toward the origin

Lemma Let n be the total number of processors and p be the number

of processors that take at least one step before some processor decides on a

value Then the worstcase expected running time of the consensus protocol

2

is O p n total counter operations

2

Pro of We will show that the consensus proto col terminates in O p n

time by reducing it to a controlled random walk of the true p osition t Divide

the execution of the proto col into two phases In the rst phase at most one

of a a is nonzero if the execution do es not leave the rst phase b efore

0 1

n increments or decrements have o ccurred the proto col will terminate after

O n additional steps by Lemma

In the second phase b oth a and a are nonzero Let v b e a value read

0 1

by some pro cessor from the counter c By Lemma we know that jt

v j a a p Now to force an increment during the second

0 1

phase the scheduler must show a pro cessor a counter value v that is at least

a a p ossibly by withholding lo cal coinips to raise the value of c or

0 1

by withholding increments to lower the value of a a In either case

0 1

Lemma applies and t must b e greater than The case of the scheduler

attempting to force a decrement is symmetric and thus in either case the

scheduler can only force the true p osition to move away from

Furthermore since p is an upp er b ound b oth on the distance b etween c

and t and on the value of a a if jtj p then jv j a a and the

0 1 0 1

true p osition will move away from thereafter Thus the second phase of the

execution can b e modeled as a controlled random walk in the sense of Lemma

with center barriers at p and a starting p osition equal to the true

p osition at the end of the rst phase By Lemma this random walk will

2

take an exp ected O p steps each consisting of a constant numb er of counter

op erations to this value must b e added O n steps until termination up to

2

O n steps from the rst phase and O p extra read op erations due to extra

passes through the lo op in scan counters The total expected numb er of

2

counter op erations is thus O p n

2

Note that the exp ected running time of O p n is expressed in total

counter op erations If the counter is implemented as described in Section

2 2

the total numb er of register op erations will b e O n p n

Lemma The protocol of Figure satises the validity condition

Pro of Supp ose every pro cessor starts with the input Then a is never

0

incremented and so retains its initial value of throughout the execution

of the proto col Thus each pro cessor will increment c until it reads a value

v n at which p oint it will decide The case where every pro cessor has

input is symmetric

Combining the lemmas gives

Theorem Figure implements a consensus protocol

Pro of Lemmas and

It is worth lo oking at the b ehavior of the shared coin implicitly emb edded

in the consensus proto col of Figure Because the function of detecting

agreement is implemented in the shared coin itself limiting scheduler con

trol over the outcome of the shared coin is no longer necessary to achieve

consensus Thus the parameter K of the shared coin proto col can b e set to

minimize the time taken in the random walk without regard to its eect on

the agreement parameter In the proto col of Figure the shared coin has

1

as low as is p ossible without setting an eective agreement parameter of

2p

K p

At the same time the simplicity of the proto col allows the numb er and

size of the shared counters to b e very small Unfortunately when the avail

able primitives are limited to atomic registers this small size is lost in the

2

n space overhead of the atomic scan op eration It is not immediately

clear that this overhead is a necessary feature of an atomic counter imple

mentation much work remains to b e done in this area

Chapter

Consensus Using Weighted

Voting

Intro duction

In the previous chapter we built a consensus proto col that directly incorp o

rated a robust shared coin Here we will show how to construct a faster but

nonrobust shared coin which gives consensus using standard constructions

such as the one of AHa describ ed in Section

This shared coin proto col requires a departure from previous practice As

in the proto cols of the previous chapter the fundamental technique b ehind all

shared coin proto cols since AHa has b een the use of rep eated equally

weighted votes to reduce the impact of any particular pro cessors private

knowledge and with it the adversarys ability to aect the outcome of the

coin There are many advantages to this approach The pro cessors act as

anonymous conduits of a stream of unpredictable random increments If

the scheduler stops a particular pro cessor at worst all it do es is keep one

vote from b eing written out to the common p o olthe next lo cal coin ip

executed by some other pro cessor is no more or less likely to give the value

the scheduler wants than the next one executed by the pro cessor it has just

stopp ed Intuitively the schedulers p ower over the outcome of the shared

coin is limited to ltering out up to n lo cal coin ips from this stream

of indep endent random variables But the eect of this ltering is at worst

equivalent to adjusting the nal tally of votes by up to n If a constant

2 2

multiple of n votes are cast the total variance will b e n Because the

total vote is approximately normally distributed the proto col can guarantee

that with constant probability the total vote is more than n away from the

origin rendering the schedulers adjustment ineective

Alas the very anonymity of the pro cessors that is the strength of the

voting technique is also its greatest weakness To overcome the schedulers

2

p ower to withhold votes it is necessary that a total of n votes are cast

but the scheduler might also cho ose to stop all but one of the pro cessors

2

leaving that lone pro cessor to generate all n votes by itself It follows

that for all of the p olynomialtime waitfree consensus proto cols based on

unweighted voting the worstcase exp ected b ound on the work done by a

single pro cessor is asymptotically no b etter than the b ound on the total

work done by all of the pro cessors together

In this chapter we show how to avoid this problem by modifying a pro

to col of Bracha and Rachman BR to allow the pro cessor to cast votes of

increasing weight Thus a fast pro cessor or a pro cessor running in isolation

can quickly generate votes of sucient total variance to nish the proto col

at the cost of giving the scheduler greater control by allowing it b oth to with

hold votes with larger impact and to cho ose among up to n dierent weights

one for each pro cessor when determining the weight of the next vote

There are two main diculties that this approach entails The rst is that

careful adjustment of the weight function and other parameters of the pro

to col is necessary to make sure that it p erforms correctly More importantly

allowing the weight of the ith vote to dep end on the particular pro cessor

the scheduler cho oses to run which may in turn dep end on the outcomes

of previous votes means that we cannot treat the sequence of votes as a

sequence of indep endent random variables

However the sign of each vote is determined by a fair coin ip that

the scheduler cannot predict in advance and so despite all the schedulers

p owers the exp ected value of each vote b efore it is cast is always This

is the primary requirement of a martingale pro cess Bil Fel Kop

Under the right conditions martingales have many similarities to sequences

of sums of independent random variables In particular martingale analogues

of the Central Limit Theorem and Cherno b ounds will b e used in the pro of

of correctness

The rest of the chapter is organized as follows Section denes the

shared coin proto col and gives an overview of its op eration Section

pro cedure shared coin

b egin

my reg variance vote

t

rep eat

for i to c do

ip w t vote local

2

my reg my regvariance w t my regvote vote

t t

end

read all the registers summing the variance elds into the

lo cal variable total variance

until total variance K

read all the registers summing the vote elds into the lo cal vari

able total vote

if total vote

then output

else if total vote

then output

else fail

end

Figure Shared coin proto col

contains a brief denition of martingales and describ es some of their prop er

ties Finally Section proves the correctness of the proto col for two sets

of parameters one of which allows it to simulate the equallyweighted vot

2

ing proto col of BR and one which gives a b ound of O n log n on the

exp ected numb er of op erations executed by a single pro cessor

The shared coin proto col

Figure gives pseudo co de for each pro cessors b ehavior during the

shared coin proto col Each pro cessor rep eatedly ips a lo cal coin that re

turns the values and with equal probability The weighted value of

each ip is w t or w t resp ectively where t is the numb er of coins ipp ed

by the pro cessor up to and including its current ip Each weighted ip

represents a vote for either the output value if p ositive or if negative

After each ip the pro cessor up dates its register to hold the sum of the

weighted ips it has p erformed and the sum of the squares of their values

After every c ips the pro cessor reads the registers of all the other pro ces

sors and computes the sum of all the weighted ips the total vote and the

sum of the squares of their values the total variance If the total variance

is greater than the quorum K it stops and outputs if the total vote is

p ositive and if it is negative it treats a total vote of zero as a failure to

avoid intro ducing asymmetry b etween the two outcomes Alternatively if

the total variance has not yet reached the quorum K it continues to ip its

lo cal coin

As in the previous chapter the function local ip returns the values and

randomly with equal probability The values K and c are parameters of

the proto col which will b e set dep ending on the numb er of pro cessors n to

give the desired b ounds on the agreement parameter and running time The

weight function w t is used to make later lo cal coin ips have more eect

than earlier ones so that a pro cessor running in isolation will b e able to

achieve the quorum K quickly The weight function will b e assumed to b e

a

of the form w t t where a is a nonnegative parameter dep ending on n

though other weight functions are p ossible this choice simplies the analysis

We will demonstrate that for suitable choice of K c and a all pro cessors

return with constant probability the case of all pro cessors returning

will follow by symmetry The structure of the argument follows the pro of of

correctness of the less sophisticated proto col of Bracha and Rachman BR

2

which corresp onds to Figure when w t is the constant K n

and c n log n Votes cast b efore the quorum K is reached will form

1

a p o ol of common votes that all pro cessors see We will show that with

constant probability i the total of the common votes is far from the origin

and ii the sum of the extra votes cast b etween the time the quorum is

reached and the time some pro cessor do es its nal read in line is small

so that the total vote read by each pro cessor will have the same sign as the

total common vote

1

The denitions of the common and extra votes we will use dier slightly from those

used in BR the formal denitions app ear in Section

This simple overview of the pro of hides many tricky details To simplify

the analysis we will concentrate not on the votes actually written to the

registers but on the votes whose values have b een decided by the pro cessors

execution of the lo cal coin ip in line conversion back to the values actually

in the registers will b e done by showing a b ound on the dierence b etween

the total decided vote and the total of the register values In eect we are

treating a vote as having b een cast the moment that its value is determined

instead of when it b ecomes visible to the other pro cessors

Some care is also needed to correctly model the sequence of votes Most

importantly as p ointed out ab ove allowing the weight of the ith vote to

dep end on which pro cessor the scheduler cho oses to run means the votes are

not indep endent So the straightforward pro of techniques used for proto cols

based on a stream of identicallydistributed random votes no longer apply

and it is necessary to bring in the theory of martingales to describ e the

execution of the proto col

Martingales

A martingale is a sequence of random variables S S which informally

1 2

may b e thought of as representing the changes in the fortune of a gambler

playing in a fair casino Because the gambler can cho ose how much to b et or

which game to play at each instant each random variable S may dep end on

i

all previous events But b ecause the casino is fair and the gambler cannot

predict the future the exp ected change in the gamblers fortune at any play

is always

We will need to use a very general denition of a martingale Bil Fel

Kop The simplest denition of a martingale says that the exp ected value

of S given S S S is just S To use a gambling analogy this deni

i+1 1 2 i i

tion says that a gambler who knows only the previous values of her fortune

cannot predict its expected future value any b etter than by simply using its

current value But what if the gambler knows more information than just

the changing size of her bankroll For example imagine that she is placing

b ets on a fair version of roulette and always b ets on either red or black

Knowing that her fortune increased after b etting red will tell her only that

one of eighteen red numb ers came up but a real gambler will see precisely

which of the eighteen numb ers it was Still we would like to claim that this

additional knowledge do es not aect her ability to predict the future To

do so the denition of a martingale must b e extended to allow additional

information to b e represented explicitly

The to ol used to represent the information known at any p oint in time

2

will b e a concept from measure theory a algebra The description given

here is informal more complete denitions can b e found in Fel Sections

IV IV and V or Bil

Knowledge  algebras and measurability

Recall that any probabilistic statement is always made in the context of some

p ossibly implicit sample space The elements of the sample space called

sample p oints represent all p ossible results of some set of exp eriments

such as ipping a sequence of coins or cho osing a p oint at random from the

unit interval Intuitively all randomness is reduced to selecting a single p oint

from the sample space An event such as a particular coinip coming up

heads or a random variable taking on the value is simply a subset of the

sample space that o ccurs if one of the sample p oints it contains is selected

If we are omniscient we can see which sample p oint is chosen and thus

can tell for each event whether it o ccurs or not However if we have only

partial information we will not b e able to determine whether some events

o ccurred or not We can represent the extent of our knowledge by making

a list of all events we do know ab out This list will have to satisfy certain

closure prop erties for example if we know whether or not A o ccurred and

whether or not B o ccurred then we should know whether or not the event

A or B o ccurred

We will require that the set of known events b e a algebra A algebra

F is a family of subsets of a sample space that i contains the empty set

ii is closed under complement if F contains A it contains n A the com

plement of A and iii is closed under countable union if F contains all of

S

1

3

A A it contains A An event A is said to b e F measurable if it

1 2 i

i=1

is contained in F In our context the term measurable which comes from

the original measuretheoretic use of algebras to represent families of sets

on which a probability distribution is welldened simply means known

2

Sometimes called a  eld

3

Additional prop erties such as b eing closed under nite union or intersection follow

immediately from this denition

We know ab out an event if we can determine whether or not it o ccurred

What ab out random variables A random variable X is dened to b e F

measurable if every event of the form X c is F measurable The closure

prop erties of F then imply that such events as a X b X d and

so forth are also F measurable Lo oking at the situation in reverse given

random variables X X we can consider the minimum algebra F for

1 2

which each of the random variables is F measurable this algebra written

hX i is called the algebra generated by X X and represents all

i 1 2

information that can b e inferred from knowing the values of the generators

A algebra gives us a rigorous way to dene knowledge in a probabilis

tic context Measurability and generated algebras give us a way to move

back and forth b etween the abstract concept of a algebra and concrete

statements ab out which random variables are completely known To analyze

random variables that are only partial ly known we need one more denition

We need to extend conditional exp ectations so that the condition can b e a

algebra rather than just a collection of random variables

For each event A let I b e the indicator variable that is if A o ccurs

A

and otherwise Let U E X j F b e a random variable such that i U

is F measurable and ii EUI EXI for all A in F The random

A A

variable E X j F is called the conditional exp ectation of X with resp ect

to F Fel Section V Intuitively the rst condition on EX j F says

that it reveals no information not already found in F The second condition

says that just knowing that some event in F o ccurred do es not allow one

to distinguish b etween X and EX j F this fact ultimately implies that

EX j F uses all information that is found in F and is relevant to X

If F is a generated by random variables X X the conditional exp ec

1 2

tation EX j F reduces to the simpler version EX j X X Some other

1 2

facts ab out conditional exp ectation that we will use but not prove if X is

F measurable then EXY j F X EY j F which implies E X j F X

0 0 0

and if F F then EEX j F j F EX j F See Fel Section V

Denition of a martingale

We now have the to ols to dene a martingale when the information available

at each p oint in time is not limited to just the values of earlier random

variables A martingale fS F g i n is a sto chastic pro cess where

i i

each S is a random variable representing the state of the pro cess at time i and

i

F is a algebra representing the knowledge of the underlying probability

i

distribution available at time i Martingales are required to satisfy three

axioms for all i

F F The past is never forgotten

i i+1

S is F measurable The present is always known

i i

ES j F S The future cannot b e foreseen

i+1 i i

Often F will simply b e the algebra hS S i generated by the vari

i 1 i

ables S through S in this case axioms and will hold automatically

1 i

To avoid sp ecial cases let F denote the trivial algebra consisting of the

0

empty set and the entire probability space The dierence sequence of a

martingale is the sequence X X X where X S and X S S

1 2 n 1 1 i i i1

for i A zeromean martingale is a martingale for which ES

i

Gambling systems

A remarkably useful theorem which has its origins in the study of gambling

systems is due to Halmos Hal We restate his theorem b elow in modern

notation

Theorem Let fS F g i n be a martingale with dierence se

i i

quence fX g Let f g i n be random variables taking on the values

i i

and such that each is F measurable Then the sequence of random

i i1

P

i

0

X is a martingale relative to F variables S

j j i

j =1 i

Pro of The rst two prop erties are easily veried Because is F

i i1

measurable E X j F EX j F and the third prop erty also

i i i1 i i i1

follows

Limit theorems

Many results that hold for sums of indep endent random variables carry over

in modied form to martingales For example the following theorem of

Hall and Heyde HH Theorem is a martingale version of the classical

Central Limit Theorem

2

Theorem HH Let fS F g be a zeromean martingale Let V

i i

n

h i

P P

n n

2+2 2

E jX j EX j F and let Dene L

i i1 n

i=1 i=1 i

i h

2 1+

Then there exists a constant C depending only on such E jV j

n

that whenever L

n

1(3+2 )

jPrS x xj CL

n

2

n

4(1+ ) (3+2 )

jxj

where is the standard unit normal distribution with mean and variance

If we are interested only in the tails of the distribution of S we can

n

get a tighter b ound using Azumas inequality a martingale analogue of the

standard Cherno b ound Che for sums of indep endent random variables

The usual form of this b ound see AS Sp e assumes that the dierence

variables X satisfy jX j This restriction is to o severe for our purp oses

i i

so b elow we prove a generalization of the inequality In order to do so we

will need the following technical lemma

Lemma Let fS F g i n be a zeromean martingale with dif

i i

ference sequence fX g Let F F be a not necessarily trivial algebra

i 0 1

such that ES j F If there exists a sequence of random variables

1 0

w w w and a random variable W such that

1 2 n

W is F measurable

0

Each w is F measurable

i i1

For al l i jX j w with probability and

i i

P

n

2

w W with probability

i=1 i

then for any

i h

2

W 2 S

n

e j F E e

0

x

Pro of The pro of is by induction on n Using the convexity of e and the

fact that EX j F we have

1 0

i h

2 2

X w 2 w w

1 1 1

1

j F E e cosh w e e e

0 1

2

If n we are done since w W If n is greater than for each i

1

0 0 0 0

n let S S X and F F Then fS F g i n satises

i+1 1 i+1

i i i i

0 2 0 0

so w and W W w F w the conditions of the lemma with F

i+1 1

1 i 0

i h

0

2 2

S

(W w )2 0

n1

1

e But then using by the induction hyp othesis E e j F

0

0 0

the fact that EX j F E EX j F j F when F F we can compute

i i i h h h

X (S X ) S

n n

1 1

j F E E e e j F j F E e

0 1 0

i i h h

0

X S 0

1

n1

j F E e E e j F

0

0

i h

2 2

X )2 (W w

1

1

E e j F e

0

i h

2 2

(W w )2 X

1

1

e E e j F

0

2 2 2 2

(W w )2 w 2

1 1

e e

2

W 2

e

Theorem Let fS F g i n be a zeromean martingale with

i i

dierence sequence fX g If there exists a sequence of random variables

i

w w w and a constant W such that

1 2 n

Each w is F measurable

i i1

For al l i jX j w with probability and

i i

P

n

2

W with probability w

i i=1

then for any

2

2W

PrS e

n

i h

2

W 2 S

n

e Thus by Markovs Pro of By Lemma for any E e

inequality

i h

2

W 2 S

n

e e e PrS Pr e

n

Setting W gives

Symmetry immediately gives us

Corollary For any martingale fS F g satisfying the premises of The

i i

orem and any

2

2W

PrS e

n

Pro of Replace each S by S and apply Theorem

i i

Pro of of correctness

For this section we will x a particular scheduler We may assume without

loss of generality that the scheduler is deterministic b ecause any random

inputs the scheduler might use cannot dep end on the history of an execution

and therefore may also b e xed in advance

Consider the sequence of random variables X X where X represents

1 2 i

the ith vote that is decided by some pro cessor executing line or if fewer

than i lo cal coin ips o ccur For each i let F b e hX X i the algebra

i 1 i

generated by X through X Because the scheduler is deterministic all of

1 i

the random events in the system preceding the ith vote are captured in the

variables X through X and the algebra F thus determines the entire

1 i1 i1

history of the system up to but not including the ith vote Furthermore

since the schedulers b ehavior dep ends only on the history of the system

F in fact determines the schedulers choice of which pro cessor will cast

i1

the ith vote Thus conditioned on F X is just a random variable which

i1 i

takes on the values w with equal probability for some weight w determined

by the schedulers choice of which pro cessor to run Hence EX j F

i i1

P

i

and the sequence of partial sums S X is a martingale relative to

i i

j =1

fF g

i

We are not going to analyze fS F g directly Instead it will b e used as

i i

a base on which other martingales will b e built using Theorem

P

i

2

Let if X K and otherwise Votes for which will b e

i i

j =1 j

called common votes For each pro cessor P let if the vote X o ccurs

P i i

b efore P reads during its nal read in line the register of the pro cessor

deciding X and let otherwise In eect is the indicator variable

i P i P i

for whether P would see X if it were written out immediately Observe

i

that for a xed scheduler the values of b oth and can b e determined by

i P i

examining the history of the system up to but not including the time when the

vote X is cast and thus b oth and are F measurable Consequently

i i P i i

o o n n

P P

i i

are martingales relative to and the sequences X X

P j j j j

j j

fF g by Theorem Votes for which but will b e referred to

i P i i

as the extra votes for pro cessor P Observe that since P could

P i i

not have started its nal read until the total variance was at least K The

n o

P

i

sequence of the partial sums of these extra votes is a X

P i i i

j

dierence of martingales and is thus also a martingale relative to fF g

i

The structure of the pro of of correctness is as follows First we show

P

that the distribution of the total common vote X is close to a normal

i i

distribution with mean and variance K for suitable choices of a and K

p

P

in particular for n suciently large the probability that X x K

i i

will b e at least a constant for any xed x Next we complete the pro of by

showing that if the total common vote is far from the origin the chances

that any pro cessor will read a total vote whose sign diers from the common

vote is small This fact is itself shown in two steps First it is shown

that for suitable choice of c the total of the extra votes for a pro cessor P

P

X will b e small with high probability Second a b ound is

P i i i

P

derived on the dierence b etween X and the total vote actually read

P i i

by P

It will b e necessary to select values for a K and c that give the correct

b ounds on the probabilities However we will b e in a b etter p osition to

justify our choice for these parameters after we have develop ed more of the

analysis so the choice of parameters will b e deferred until Section

Phases of the proto col

We b egin by dening the phases of the proto col more carefully Let t b e

i

the value of the ith pro cessors internal variable t at any given step of the

proto col Let U b e the random variable representing the maximum value of

i

t during the entire execution of the proto col Let T b e the random variable

i i

representing the maximum value of t during the part of the execution of the

i

proto col where

i

In the pro of of correctness we will encounter many quantities of the form

P P

n n

T or U for various functions We will want to get b ounds

i i

i i

on these quantities without having to lo ok to o closely at the particular values

of each T or U This section proves several very general inequalities ab out

i i

quantities of this form all of which are ultimately based on the following

constraint

Z

T

a

i

T

X X X X

i

T

i

a a

j dj j K

a

i i j i

The constant a will reapp ear often for convenience we will write it as

A As noted ab ove a and hence A

A

A

nT

AK

K

so that K The constant T represents Dene T

K K

n A

the maximum value of each T if they are set to b e equal while satisfying

i

inequality Note that T need not b e integral Now we can show

K

A

Lemma Let x x A and let be any strictly increasing function

P

n

x such that is concave Then for any nonnegative fx g if

i i

i

P

n

K then x nT

i K

i

Pro of Since is concave we have

X X

x x

i i

n n

HLP Theorem Simple algebraic manipulation yields

X X

x

i

x n

i

n

But

A

X X

x K x

i i

T

K

n n A n

P

x nT Hence

i K

A

Letting b e the identity function we have x Ax which is

concave for A Hence

Corollary

n

X

T nT

i K

i

In the case where is convex the following lemma applies instead

A

Lemma Let x x A and let be any strictly increasing function

P

n

A

such that is convex Then for any nonnegative fx g if x A

i

i i

P

n

A

x n n T K then

i K

i

P

Pro of Let Y x Now x x or

i i i

x x

i i

Y

Y Y

which is at most

x x

i i

Y

Y Y

given the convexity of Hence

n n n

X X X

x x

i i

x n Y

i

Y Y

i i i

n

X

x n

i

i

n K

A

which is just n n T

K

A

The quantity n T is the maximum value that any x can take on

K i

P

without violating the constraint on x So what Lemma says is that

i

P

if is convex x is maximized by maximizing one of the x while

i i

setting the rest to zero

For the variables U we can show

i

A

Lemma Let x x A and let be any strictly increasing function

such that x c is concave in x Then for any nonnegative fx g

i

P

n

if x K then

i

i

n

X

U nT c

i K

i

Pro of Let W b e the numb er of votes written to the registers during the part

i

of the execution where the total of the register variance elds is less than or

P

A

A K equal to K The set of variables fW g satises the inequality W

i

i

using the same argument as gives Furthermore U W c b ecause

i i

after the ith pro cessors next vote the total variance in the registers must

exceed K and it can cast at most c more votes b efore noticing this fact

Dene x x c Then U W c W But

i i i

P P

n n

W U satisfy the premises of Lemma and thus

i i

i i

n T nT c

K K

Setting x to x gives

Corollary

n

X

U nT c

i K

i

A

Pro of x c Ax c which is concave since A

c

Dene g then g T T c will b e an upp er b ound for

K K

T

K

T c as well as a numb er of closely related constants involving c that

K

will app ear later

Common votes

The purp ose of this section is to show that for n suciently large the total

common vote is far from the origin with constant probability We do so by

showing that under the right conditions the total common vote will b e nearly

normally distributed

o n

P P

i i

is X F X As p ointed out ab ove S Let S

j j i j j Ki Ki

j j

a martingale Let N d nT e It follows from Corollary that for

K i

i N and thus S lim S is the sum of all the common votes The

KN i Ki

distribution of S is characterized in the following lemma

KN

Lemma If

A

A

n T

K

then for any x

i h

p

A

K x C Pr S x

KN

A

n T

K

where C is an absolute constant

Pro of The pro of uses Theorem which requires that the martingale b e

normalized so that the total conditional variance V is close to So let

N

n o

P

X i

i i

p

Y and consider the martingale To apply the theorem Y F

i j i

j

K

we need to compute a b ound on the value L We will x

N

h i

P

We b egin by getting a b ound on the rst term E jY j We have

i

T

N n N N

i

i h

X X X X X

a

j X j j E E jY j E E jY j

i i i i

K K

i j i i i

Now

Z

T

a

i

T

X

i

T

i

a a a a

j dj T j T

i i

a

j

4a+1

x

A a

Dene x x A x x taking Then y

a

4a+1

(4a+1)A

P

T

Ay

n

a aA

i

is at most is convex and hence T Ay

i i

a a

1A 4a+1

n T

A a

K

n using Lemma If a is p ositive then n T

K

a

is zero however if a is zero will b e In either case n

n Plugging everything value back into gives

N

A a A a

h i

X

n T n T n

K K

E jY j

i

K K a K

i

i h

observe that j For the second term E jV

N

N N

i h i h

X X

j F E Y V E X j F

i i i i

i N

K

i i

which is just K times the sum of the squares of the weights jX j of the

i

common votes But the total variance of the common votes can dier from

K by at most the variance of the rst vote X for which Since the

i i

A

pro cessor that casts this vote can have cast at most n T votes b eforehand

K

a

A

the variance of this vote is at most n T giving the b ound

K

a

A

jV j n T

K

N

K

Combining and gives

a

A

A a A a

n T

K

n T n T n

K K

L

N

K K a K K

a

aA a aA

n T n T A n

K K

A

K K a n T

K

A a aA a

n T n T

K K

K

A

A n T

K

A A

A n T A n T

K K

a

2a

2aA

n T A

K

e An T

K

A

A

n T

K

b bx

The secondtolast step uses the approximation x e for nonnegative

b and x The exp onential term is serendipitously b ounded by e if holds

A A a

since A n T implies that n T is also at most

K K

A more direct application of shows that L and thus Theorem

N

applies Hence

N

h i

p

X X

Pr X x K x Pr Y x x

i i i

i

A

C

A

n T jxj

K

A

C

A

n T

K

Extra votes

In this section we examine the extra votes from the p oint of view of a par

ticular pro cessor P

Recall that is dened to b e if the vote X is cast by some pro cessor Q

P i i

b efore P s nal read of Qs register and otherwise Clearly since P

P i i

could not have started its nal read until the total variance exceeded K As

discussed ab ove b oth and are F measurable Thus is

P i i i i P i i

o n

P

i

a random variable that is F measurable and S X F

i P i j j i

j

is a martingale by Theorem

a

Dene ng T The following lemma shows a b ound on the tails of

K

P

X

i i

Lemma For any x if

s

T

K

a

g d

nA

holds for some positive d x and

x d

A

g

log np

holds for some positive p n then for each processor P

h i

p

X

Pr X x K pn

P i i i

Pro of The pro of uses Corollary so we pro ceed by showing that its

premises stated in Theorem are satised

By Corollary X and thus X is zero for i nT c So

i i i K

P

X S where M nT c

i i P M K

Set w j X j Then the rst premise of Corollary follows from the

i i i

fact that for each i and jX j are b oth F measurable The second premise

i i i

is immediate For the third premise notice that

X X X X X X

j X j X X X X X

i i i P i i i

i i i i i

The rst term is

U

n

i

X X X

a

j X

i

j i

The second term is

X

a

K t X

i

i

for some t which is at most U for some i Thus

i

U

n

i

X X X

a a

j j X j K t

i i

j i

U

n

i

X X

a

j K

j i

n

X

A

U A K

i

i

A

Let x x A Then

A

A

Ay c

y c

A

A

X

A

k A Ak

Ay c

k A

k

For y the second derivative of each term is either when k A or

negative thus y c is concave and Lemma gives

n

A A A

X

U nT c ng T

i K K

nT c

K

A A A

i

It follows from and that

A

X

ng T

K

A

j X j K K g

i i

A

Applying from Corollary now yields for all

2 A

K g

PrS e

P M

p

K by Lemma So If holds then d

i h i h

p p

X

K Pr S x d K Pr X x

P M i i

2 A

xd KK g

e

2 A

xd g

e

But if holds then

x d

A

g

log np

and since log np and g

x d

log np log pn

A

g

From which it follows that

2 A

xd g logpn

e e pn

Written votes vs decided votes

P

In this section we show that the dierence b etween X and the total

P i i

a

vote actually read by P is b ounded by ng T

K

Lemma Let R be the sum of the votes read during P s nal read

P

Then

X

a a

nT c ng T X R

k K P i i P

Pro of Supp ose and supp ose X is decided by pro cessor P If the

P i i j

vote X is not included in the value read by P it must have b een decided

i

b efore P s read of P s register but written afterwards Because each vote

j

is written out b efore the next vote is decided there can b e at most one vote

P

from P which is included in X but is not actually read by P This

j P i i

P P

n

a a

vote has weight at most U So we have j X R j U Now let

P i i P

j i i

a

x x Then

a

X

a

a

k A ak A

Ay c y c Ay c

k

k

which is concave since the second derivative of each term of the sum is neg

ative The rest follows from Lemma

Choice of parameters

Let us summarize the pro of of correctness in a single theorem

Theorem Dene

A a

A

AK

T

K

n

c

g

T

K

and suppose there exist d x d and positive p n such that al l of the

fol lowing hold

s

T

K

a

g d

nA

x d

A

g

log np

A

A

n T

K

Then the protocol implements a shared coin with agreement parameter at least

A

x C p

A

n T

K

where C is the constant from Lemma

Pro of To show that the agreement parameter is at least we must

show that for each z f g the probability that all pro cessors decide z is

at least Without loss of generality let us consider only the probability

that all pro cessors decide the case of all pro cessors deciding follows by

symmetry

p

P

a

Recall the denition ng T Supp ose that X x K and

K i i

p

P

K Then for each P we that for each pro cessor P X x

P i i i

P

have X and by Lemma P reads a value greater than during

P i i

its nal read and thus decides

p

P

Now for this event not to o ccur we must either have X x K

i i

p

P

or X x K for some P But as the probability of a

P i i i

union of events never exceeds the sum of the probabilities of the events the

probability of failing in any of these ways is at most

h i h i

p p

X X X

Pr K Pr K X x X x

i i P i i i

P

A

npn x C

A

n T

K

by Lemmas and So the probability some pro cessor decides is at

most and thus the probability that all pro cessors decide is at least

minus

The running time of the proto col is more easily shown

A

Theorem No processor executes more than AK nc c n

register operations during an execution of the shared coin protocol

Pro of First consider the maximum numb er of votes a pro cessor can cast

A

After AK votes the total variance of the pro cessors votes will b e

A

1A

A

Z AK

1A

AK

AK

X

a a

K x x dx

A

x

so after at most an additional c votes the pro cessor will execute line of

Figure and see a total variance greater than K Thus each pro cessor

A

casts at most AK c votes But each vote costs write op eration in

line and every c votes costs n reads in line to which must b e added

a onetime cost of n reads in line The total numb er of op erations is

A A

thus at most AK c dnce n AK c nc n

A

AK nc c n

It remains only to nd values for a K and c which give b oth a constant

agreement parameter and a reasonable running time As a warmup let us

consider what happ ens if we emulate the proto col of Bracha and Rachman

BR

n

Theorem If a K n and c then for n suciently

log n

large the protocol implements a shared coin with agreement parameter at least

in which each processor executes at most O n log n operations

Pro of For the agreement parameter we have A T n and g

K

log n Let d x and p Then holds since

q

a

g d T nA Furthermore

K

A

x d

log np log n log p

log n

when n p Thus holds The remaining inequality holds

for n so by Theorem we have a probability of failure of at most

C p

n

O

n

which is not more than for n suciently large In particular for

n greater than some n this quantity is at most and the agreement

parameter is thus at least

The running time is immediate from Theorem

Now consider what happ ens if a is not restricted to b e a constant

log n

Theorem If a log n K n log n n log n and c

nl og n then for n suciently large the protocol implements a shared coin

with constant agreement parameter in which each processor executes at most

O n log n operations

Pro of We have A log n T n log n and g Let d

2

K

log n

x and p

We want to apply Theorem so rst we verify that its premises are

satised To show compute

log n

2

log n log n log n a

e e g

log n

q

which for n will b e less than d T nA To show note that

K

log n

log n A

e g

log n

A

and thus log g log n But

x d

log log

log np log np

log np

log np

log n log p log n log p

x For suciently large n using the approximation log x x

this quantity exceeds log n and holds The remaining constraint

is easily veried and thus Theorem applies and the agreement

parameter is at least

log n

C

log n

n n log n

O log nn

which is at least for suciently large n Thus the proto col gives a

constant agreement parameter

Now by Theorem the numb er of op erations executed by any single

A

pro cessor is at most AK nc c n or

log n log n

log n n log nn log n O log n O n

which is O n log n

It follows immediately that plugging a coin with the parameters of The

orem into the consensus proto col construction of Chapter gives a

consensus proto col that requires an exp ected O n log n op erations p er pro

cessor It is not dicult to see that the b est b ound we can place on the total

numb er of op erations is in fact n times this quantity or O n log n The

worst case is when each pro cessor casts the same numb er of common votes

Chapter

Conclusions and Op en

Problems

In this thesis I have shown

A simple algorithm for a robust waitfree shared coin with bias at most

which runs in an exp ected O n total register op erations

A modication of this algorithm that achieves consensus in an exp ected

O n total register op erations and which can b e implemented using

only three atomic counters

The asymptotically fastest known waitfree consensus proto col in the

p erpro cessor measure based on a shared coin that requires only an

exp ected O n log n register op erations p er pro cessor to achieve a con

stant agreement parameter

This chapter discusses how these results t into the history of waitfree

consensus and what diculties need to b e overcome to make further im

provements It concludes with a list of op en problems

Comparison with other proto cols

Table gives a comparison of the running times of waitfree consensus

proto cols for the sharedmemory model In this table the quantity p is the

Exp ected op erations

Per pro cessor Total

2 2

O n O n

Abrahamson Abr

Aspnes and Herlihy AHa O n O n

Attiya Dolev and Shavit ADS O n O n

Chapter Asp O n p n O n p n

Bracha and Rachman BR O np n O np n

Dwork et al DHPW O np n O np n

Saks Shavit and Woll SSW O n O n

Bracha and Rachman BR O n log n O n log n

Chapter AW O n log n O n log n

Table Comparison of consensus proto cols

numb er of active pro cessors as dened in Section The rst known pro

to col was the exp onential proto col of Abrahamson Abr The rst known

p olynomialtime proto col was that of Aspnes and Herlihy AHa Attiya

Dolev and Shavit ADS describ ed a modication of this proto col which

required only a b ounded amount of space but which retained the spirit of

the roundsbased structure of the AspnesHerlihy proto col

The proto col of Chapter which also app ears in Asp was the rst to

eliminate the use of rounds by using a robust shared coin Since its rst ap

p earance its p erformance was improved by a factor of n by Bracha and Rach

man BR and by Dwork et al DHPW Both groups achieved the im

provement by replacing the O n implementation of an atomic counter with

a weaker primitive that required only O n register op erations p er counter

op eration and acted suciently like a counter to make the consensus proto

col work

The rst proto col to use the idea of casting votes until a quorum is reached

instead of until a suciently large margin of victory is reached was that

of Saks Shavit and Woll SSW Their proto col was optimized for the

sp ecial case where nearly all of the pro cessors are running in lo ckstep Bracha

and Rachman BR noticed that the proto col could b e sp ed up by having

each pro cessor read all the registers only after every O n log n votes the

resulting proto col is a sp ecial case of the proto col of Chapter obtained by

setting a to The proto col of Chapter which also app ears in AW is

the rst to use votes of unequal weight and as a result is the rst for which

the maximum exp ected numb er of op erations executed by a single pro cessor

is more than a constant factor less than the maximum executed by all of the

pro cessors together

Limits to waitfree consensus

The table shows a considerable evolution of waitfree consensus proto cols

since Abrahamsons exp onential solution It is natural to ask how much

b etter consensus proto cols can still get

One limitation we quickly run into is the following If a pro cessor is run

ning by itself it must read every other pro cessors register at least once If

not it cannot distinguish the situation where it really ran rst all by itself

from the situation where some other pro cessor whose register it has not read

ran to completion b efore it started In the latter case the pro cessor would b e

required by the consistency condition to agree with its unseen predecessor

but without reading that predecessors register it would have no way of know

ing which value to cho ose Thus in any waitfree consensus proto col some

pro cessor can always b e forced to execute at least n read op erations

This n lower b ound is unaected even if the adversary is substantially

weakened the argument remains valid for example if the adversary is not

allowed to see the internal states of pro cessors or even if it is required to

sp ecify all of its scheduling decisions b efore the proto col starts So in fact

the O n log n proto col we have described here is close to the b est we can

hop e for in the p erpro cessor measure given the assumption of singlewriter

registers even against relatively weak adversaries

On the other hand the question of how far the total numb er of op era

tions can b e reduced do es not have as easy an answer That some pro cessor

can b e forced to execute n op erations do es not mean that all pro cessors

can b e forced to it could b e the case that if the pro cessors co op erate they

could collectively gather information ab out the state of the system faster

than they would indep endently In fact the b est known lower b ound for

exp ected total op erations is only n log n based on the minimum num

b er of op erations needed to communicate every pro cessors state to every

other pro cessor SSW Furthermore the fact that the proto col of BR

achieves a b ound of O n log n on total work shows that some improvement

is p ossible on our proto col though p ossibly only at the exp ense of increasing

the p erpro cessor b ound

However to get b elow n op erations will require at least two break

throughs The rst problem is that all of the algorithms we currently have

require that every pro cessor read every other pro cessors register directly at

some p oint which takes n total op erations It seems likely that some

sort of randomized co op erative technique could allow this dissemination of

information to pro ceed more quickly p ossibly at the cost of using very large

registers but at present no such technique is known

The second problem is that to reduce the total numb er of op erations b elow

n it will b e necessary to reduce the numb er of lo cal random choices b elow

n as lo cal coinips that have no writes b etween them eectively con

solidate into a single random choice from the p oint of view of the scheduler

This problem app ears more dicult than the rst as it requires abandoning

the voting technique at the heart of all currently known waitfree consen

sus proto cols The reason is that in these proto cols the schedulers p ower

only b ecomes limited when the standard deviation of the total vote b ecomes

comparable to the sum of the votes that the scheduler can withhold With

unweighted votes n votes are required for weighted votes the situation

is only made worse as increasing the weight of some votes increases the sum

of the withheld votes more quickly than it increases the standard deviation of

the total vote It app ears that it will b e dicult to get b elow n without

adopting some decision metho d that takes more account of the ordering of

events in the system

Op en problems

The consensus proto col describ ed in Chapter comes quite close to the limits

of current methods for solving waitfree consensus Aside from optimizations

such as eliminating the log n factors from the p erpro cessor b ound or reducing

the value of n at which the proto col b ecomes practical essentially the only

question remaining is whether the total numb er of op erations can b e reduced

substantially There are several questions whose answers would shed light on

this problem as well as many other problems in the area

Is it p ossible in the asynchronous sharedmemory mo del for n pro ces

sors to collectively read n registers in fewer than n total op erations

Do es every consensus proto col contain a shared coin

Can a shared coin with constant agreement parameter b e built that

requires less than n total op erations A closely related question

can a shared coin of arbitrarily small bias run in less than n

total op erations

Biblio graphy

AAD Yehuda Afek Danny Dolev Eli Gafni Michael

Merritt and Atomic snapshots of shared memory In

Proceedings of the Ninth ACM SIGACTSIGOPS Symposium on

Principles of Distributed Computing pages August

Abr Karl Abrahamson On achieving consensus using a shared mem

ory In Proceedings of the Seventh ACM SIGACTSIGOPS Sym

posium on Principles of Distributed Computing pages

August

ADS H Attiya D Dolev and N Shavit Bounded p olynomial ran

domized consensus In Proceedings of the Eighth ACM Sym

posium on Principles of Distributed Computing pages

August

AFL Eshrat Arjomandi Michael J Fischer and Nancy A Lynch Ef

ciency of synchronous versus asynchronous distributed systems

Journal of the ACM July

AG James H Anderson and Bo jan Groselj Beyond atomic registers

Bounded waitfree implementations of nontrivial ob jects In

Proceedings of the Fifth International Workshop on Distributed

Algorithms pages SpringerVerlag

AHa James Aspnes and Maurice Herlihy Fast randomized consen

sus using shared memory Journal of Algorithms

Septemb er

AHb James Aspnes and Maurice Herlihy Waitfree synchronization in

the asynchronous PRAM mo del In Second Annual ACM Sym

posium on Paral lel Algorithms and Architectures July

ALS Hagit Attiya Nancy Lynch and Nir Shavit Are waitfree al

gorithms fast In st Annual Symposium on Foundations of

Computer Science pages Octob er

And James H Anderson Comp osite registers In Proceedings of the

Ninth ACM SIGACTSIGOPS Symposium on Principles of Dis

tributed Computing pages August

AS and Jo el H Sp encer The Probabilistic Method John

Wiley and Sons

Asp James Aspnes Time and spaceecient randomized consensus

In Proceedings of the Ninth ACM SIGACTSIGOPS Symposium

on Principles of Distributed Computing pages August

To app ear Journal of Algorithms

AW James Aspnes and Orli Waarts Randomized consensus in ex

p ected O n log n op erations p er pro cessor To app ear Thirty

Third Annual Symposium on Foundations of Computer Science

BD Shai BenDavid The global time assumption and semantics

for concurrent systems In Proceedings of the Seventh ACM

SIGACTSIGOPS Symposium on Principles of Distributed Com

puting pages August

Bil Patrick Billingsley Probability and Measure John Wiley and

Sons second edition

BND Amotz BarNoy and Danny Dolev Sharedmemory vs message

passing in an asynchronous distributed environment In Proceed

ings of the Eighth ACM Symposium on Principles of Distributed

Computing pages August

BP James E Burns and Gary L Peterson Constructing multireader

atomic values from nonatomic values In Proceedings of the Sixth

ACM Symposium on Principles of Distributed Computing pages

BR Gabi Bracha and Ophir Rachman Approximated counters and

randomized consensus Technical Rep ort Technion

BR Gabi Bracha and Ophir Rachman Randomized consensus in ex

p ected O n log n op erations In Proceedings of the Fifth Inter

national Workshop on Distributed Algorithms SpringerVerlag

BT Gabriel Bracha and Sam Toueg Asynchronous consensus and

byzantine proto cols in faulty environments Technical Rep ort

Department of Computer Science Cornell University

Che H Cherno A measure of asymptotic eciency for tests of a

hyp othesis based on the sum of observations Annals of Mathe

matical Statistics Decemb er

CIL Benny Chor Amos Israeli and Ming Li On pro cessor co ordi

nation using asynchronous hardware In Proceedings of the Sixth

ACM Symposium on Principles of Distributed Computing pages

CM Benny Chor and Lior Moscovici Solvability in asynchronous

environments In th Annual Symposium on Foundations of

Computer Science pages Octob er

CMS Benny Chor Michael Merritt and David B Shmoys Sim

ple constanttime consensus proto cols in realistic failure mo dels

Journal of the ACM July

DDS Danny Dolev and Larry Sto ckmeyer On the

minimal synchronism needed for distributed consensus Journal

of the ACM January

DHPW Cynthia Dwork Maurice Herlihy Serge Plotkin and Orli

Waarts Timelapse snapshots In Proceedings of Israel Sym

posium on the Theory of Computing and Systems

DS Danny Dolev and Nir Shavit Bounded concurrent timestamp

systems are constructible In Proceedings of the TwentyFirst

Annual ACM Symposium on the Theory of Computing pages

DW Cynthia Dwork and Orli Waarts Simple and ecient b ounded

concurrent timestamping or b ounded concurrent timestamp sys

tems are comprehensible In Proceedings of the TwentyFourth

Annual ACM Symposium on the Theory of Computing pages

DY EB Dynkin and AA Yushkevich Control led Markov Processes

SpringerVerlag

Fel William Feller An Introduction to Probability Theory and Its

Applications volume John Wiley and Sons third edition

Fel William Feller An Introduction to Probability Theory and Its

Applications volume John Wiley and Sons second edition

FLP Michael J Fischer Nancy Ann Lynch and Michael S Pater

son Impossibility of distributed commit with one faulty pro cess

Journal of the ACM April

Hal Paul R Halmos Invariants of certain sto chastic transformations

The mathematical theory of gambling systems Duke Mathemat

ical Journal June

Her Maurice Herlihy Waitfree synchronization ACM Transactions

on Programming Languages and Systems Jan

uary

HH P Hall and CC Heyde Martingale Limit Theory and Its Ap

plication Academic Press

HLP G Hardy JE Littlewo o d and G Polya Inequalities Cam

bridge University Press second edition

IL Amos Israeli and Ming Li Bounded time stamps In Proceed

ings of the th Annual Symposium on Foundations of Computer

Science pages

Kop PE Kopp Martingales and Stochastic Integrals Cambridge

University Press

LAA Michael C Loui and Hosame H AbuAmara Memory require

ments for agreement among unreliable asynchronous pro cesses

In Franco P Preparata editor Advances in Computing Research

volume JAI Press

Lam Leslie Lamp ort Concurrent reading and writing Communica

tions of the ACM Novemb er

Lama Leslie Lamp ort On interpro cess communication part I Basic

formalism Distributed Computing

Lamb Leslie Lamp ort On interpro cess communication part I I Algo

rithms Distributed Computing

LF Nancy A Lynch and Michael J Fischer On describing the b e

havior and implementation of distributed systems Theoretical

Computer Science

LL Leslie Lamp ort and Nancy Lynch Distributed computing Mo d

els and metho ds In Jan Van Leeuwen editor Handbook of The

oretical Computer Science volume B chapter IX pages

MIT Press

Lyn Nancy Lynch IO automata A model for discrete event sys

tems Technical Rep ort MITLCSTM MIT Lab oratory for

Computer Science March

NW Richard NewmanWolfe A proto col for waitfree atomic multi

reader shared variables In Proceedings of the Sixth ACM Sym

posium on Principles of Distributed Computing pages

Pet Gary L Peterson Concurrent reading while writing ACM

Transactions on Programming Languages and Systems

January

Plo Serge A Plotkin Sticky bits and universality of consensus In

Proceedings of the Eighth ACM Symposium on Principles of Dis

tributed Computing pages August

SAG Ambuj K Singh James H Anderson and Mohamed G Gouda

The elusive atomic register revisited In Proceedings of the Sixth

ACM Symposium on Principles of Distributed Computing pages

August

Sp e Jo el Sp encer Ten Lectures on the Probabilistic Method So ciety

for Industrial and Applied Mathematics

SSW Michael Saks Nir Shavit and Heather Woll Optimal time ran

domized consensus making resilient algorithms fast in prac

tice In Proceedings of the Second Annual ACMSIAM Sympo

sium on Discrete Algorithms pages

SV Miklos Santha and Umesh V Vazirani Generating quasirandom

sequences from semirandom sources Journal of Computer and

System Sciences

TM Gadi Taub enfeld and Possibility and impossi

bility results in a shared memory environment In Proceedings

of the Third International Workshop on Distributed Algorithms

pages SpringerVerlag Septemb er