
review articles

DOI:10.1145/3363823

COMMUNICATIONS OF THE ACM | JANUARY 2020 | VOL. 63 | NO. 1

Mastering Concurrent Computing through Sequential Thinking

A 50-year history of concurrency.

BY SERGIO RAJSBAUM AND MICHEL RAYNAL

key insights

˽˽ A main way of dealing with the enormous challenges of building concurrent systems is by reduction to sequential thinking. Over more than 50 years, more sophisticated techniques have been developed to build complex systems in this way.

˽˽ The strategy starts by designing sequential specifications, and then mechanisms to associate a concurrent execution to a sequential execution that itself can be tested against the sequential specification.

˽˽ The history starts with concrete, physical techniques, and evolves up to today, with more abstract, scalable, and fault-tolerant ideas including distributed ledgers.

˽˽ We are at the limits of the approach, encountering performance limitations imposed by the requirement to satisfy a sequential specification, and because not all concurrent problems have sequential specifications.

I must appeal to the patience of the wondering readers, suffering as I am from the sequential nature of human communication.12
—E.W. Dijkstra, 1968

ONE OF THE most daunting challenges in information science and technology has always been mastering concurrency. Concurrent programming is enormously difficult because it copes with many possible nondeterministic behaviors of tasks being done at the same time. These come from different sources, including failures, operating systems, architectures, and asynchrony. Indeed, even today we do not have good tools to build efficient, scalable, and reliable concurrent systems.

Concurrency was once a specialized discipline for experts, but today the challenge is for the entire information technology community because of two disruptive phenomena: the development of networking communications, and the end of the ability to increase processor speed at an exponential rate. Increases in performance come through concurrency, as in multicore architectures. Concurrency is also critical to achieve fault-tolerant, distributed services, as in global, cloud, and blockchain applications.

Right from the start in the 1960s, the main way of dealing with concurrency has been by reduction to sequential reasoning. Transforming problems in the concurrent domain into simpler problems in the sequential domain yields benefits for specifying, implementing, and verifying concurrent programs. It is a two-sided strategy, together with a bridge connecting the two sides. First, a sequential specification of an object (or service) that can be accessed concurrently states the desired behavior only in executions where the processes access the object one after the other, sequentially. Thus, familiar paradigms from sequential computing can be used to specify shared objects, such as classical data structures (for example, queues, stacks, and lists), registers that can be read or modified, or transactions. This makes it easy to understand the object being implemented, as opposed to a truly concurrent specification, which would be hard or unnatural. Instead of trying to modify the well-understood notion of, say, a queue, we stay with the usual sequential specification, and move the meaning of a concurrent implementation of a queue to another level of the system.

The second part of the strategy is to provide implementation techniques for efficient, scalable, and fault-tolerant concurrent objects. Locks enforce exclusive accesses to shared data. More abstract and fault-tolerant solutions include agreement protocols that can be used for replicated data to locally execute object operations in the same order. Reliable communication protocols such as atomic broadcast and gossiping are used by the processes to communicate with each other. Distributed data structures, such as blockchains. Commit protocols to ensure atomicity properties. Several techniques are commonly useful, such as timestamps, quorums, group membership services, and failure detectors. Interesting liveness issues arise, specified by progress conditions to guarantee that operations are actually executed.

The bridge establishes a connection between the executions of a concurrent program and the sequential specification. It enforces safety properties by which concurrent executions appear as if the operations invoked on the object were executed instantaneously, in some sequential interleaving. This is captured by the notion of a consistency condition, which defines the way concurrent invocations to the operations of an object correspond to a sequential interleaving, which can then be tested against its sequential specification.

A brief history and some examples. The history of concurrency is long and the body of research enormous; a few milestones are in the sidebar "A Few Dates from the History of Synchronization." The interested reader will find many more results about principles of concurrency in textbooks.4,21,35,37 We concentrate here only on a few significant examples of sequential reasoning used to master concurrency, providing a sample of fundamental notions of this approach, and we describe several algorithms, both shared memory and message passing, as a concrete illustration of the ideas.

We tell the story through an evolution that starts with mutual exclusion, followed by implementing read/write registers on top of message-passing systems, then implementing arbitrary objects through powerful synchronization mechanisms. We discuss the modern distributed ledger trends of doing so in a highly scalable, tamper-proof way. We conclude with a discussion of the limitations of this approach: it may be expensive to implement, and furthermore, there are inherently concurrent problems with no sequential specifications.


A Few Dates from the History of Synchronization

1965: Mutual exclusion from atomic read/write registers (Dijkstra11)
1965: Semaphores (Dijkstra13)
1971: Mutual exclusion from non-atomic read/write registers (Lamport23)
1974: Concurrent reading and writing (Lamport,24 Peterson33)
1977, 1983: Distributed state machine (DA 2000) (Lamport25)
1980: Byzantine failures in synchronous systems (DA 2005) (Pease, Shostak, Lamport31)
1981: Simplicity in mutex (Peterson32)
1983: Asynchronous randomized consensus (DA 2015) (Ben-Or,5 Rabin34)
1985: Liveness (progress condition) (DA 2018) (Alpern, Schneider1)
1985: Impossibility of asynchronous deterministic consensus in the presence of crashes (DA 2001) (Fischer, Lynch, Paterson16)
1987: Fast mutual exclusion (Lamport27)
1991: Wait-free synchronization (DA 2003) (Herlihy19)
1993, 1997: Transactional memory (DA 2012) (Herlihy, Moss,20 Shavit, Touitou40)
1995: Shared memory on top of asynchronous message-passing systems despite a minority of process crashes (DA 2011) (Attiya, Bar-Noy, Dolev2)
1996: Weakest information on failures to solve consensus in the presence of asynchrony and process crashes (DA 2010) (Chandra, Hadzilacos, Toueg8)
2008: Scalability, accountability (Nakamoto30)

A paper that received the ACM Dijkstra Prize in the year X is marked (DA X).

Mutual Exclusion
Concurrent computing began in 1961 with what was called multiprogramming in the Atlas computer, where concurrency was simulated, as we do when telling stories where things happen concurrently, by interlacing the execution of sequential programs. Concurrency was born in order to make efficient use of a sequential computer, which can execute only one instruction at a time, giving users the illusion that their programs are all running simultaneously, through the operating system. A collection of early foundational articles on concurrent programming appears in Brinch Hansen.6

As soon as the programs being run concurrently began to interact with each other, it was realized how difficult it is to think concurrently. By the end of the 1960s a crisis was emerging: programming was done without any conceptual foundation and programs were riddled with subtle errors causing erratic behaviors. In 1965, Dijkstra discovered that mutual exclusion of parts of code is a fundamental concept of programming and opened the way for the first books of principles on concurrent programming, which appeared at the beginning of the 1970s.

Locks. A mutual exclusion algorithm consists of the code for two operations, acquire() and release(), that a process invokes to bracket a section of code called a critical section. The usual environment in which it is executed is asynchronous, where process speeds are arbitrary, independent from each other. A mutual exclusion algorithm guarantees two properties.

˲˲ Mutual exclusion. No two processes are simultaneously executing their critical section.

˲˲ Deadlock-freedom. If one or several processes invoke acquire() operations that are executed concurrently, eventually one of them terminates its invocation, and consequently executes its critical section.

Deadlock-freedom does not prevent specific timing scenarios from occurring in which some processes can never enter their critical section. The stronger starvation-freedom progress condition states that any process that invokes acquire() will terminate its invocation (and will consequently execute its critical section).

A mutual exclusion algorithm. The first mutual exclusion algorithms were difficult to understand and prove correct. We describe here an elegant algorithm by Peterson32 based on read/write shared registers. Algorithms for a message-passing system have been described since Lamport's logical clock paper.25

Algorithm 1. Peterson's mutual exclusion algorithm for two processes.

The version presented in Algorithm 1 is for two processes but can be easily generalized to n processes. The two processes p1 and p2 share three read/write atomic registers, FLAG[1], FLAG[2], and LAST.
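Algorithm 1 itself is shown as a figure in the original article; the following is only a rough Python sketch of its logic, with zero-based process indices, booleans standing in for the up/down flags, and CPython's interpreter lock standing in for the registers' atomicity. It is an illustration of the idea, not a production lock (real hardware needs fences or atomic instructions).

```python
import threading

# Sketch of Peterson's two-process mutual exclusion (names follow the text).
FLAG = [False, False]   # FLAG[i] raised: process i is competing
LAST = [0]              # identity of the last writer of LAST

def acquire(i):
    j = 1 - i
    FLAG[i] = True      # raise my flag: I am competing
    LAST[0] = i         # record that I am the last writer of LAST
    # wait until the other process is not competing, or it wrote LAST after me
    while FLAG[j] and LAST[0] == i:
        pass

def release(i):
    FLAG[i] = False     # lower my flag

counter = 0

def worker(i):
    global counter
    for _ in range(500):
        acquire(i)
        counter += 1    # critical section: a read-modify-write on counter
        release(i)

threads = [threading.Thread(target=worker, args=(i,)) for i in (0, 1)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)          # 1000: mutual exclusion prevented lost updates
```

Without the acquire()/release() bracketing, the two increments could interleave and lose updates; with it, the final count is exactly the number of critical-section entries.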

80 COMMUNICATIONS OF THE ACM | JANUARY 2020 | VOL. 63 | NO. 1 review articles

Initially FLAG[1] and FLAG[2] are down, while LAST does not need to be initialized. Both processes can read all registers. Moreover, while LAST can be written by both processes, only pi, i ∈ {1, 2}, writes to FLAG[i]. Atomic means the read and write operations on the registers seem to have been executed sequentially (hence, the notion of "last writer" associated with LAST is well defined).

When process pi invokes acquire(), it first raises its flag, thereby indicating it is competing, and then writes its name in LAST, indicating it is the last writer of this register. Next, process pi repeatedly reads FLAG[j] and LAST until it sees FLAG[j] = down or it is no longer the last writer of LAST. When this occurs, pi terminates its invocation. The operation release() consists in a simple lowering of the flag of the invoking process. The read and write operations on FLAG[1], FLAG[2], and LAST are totally ordered (atomicity), which facilitates the proof of the mutual exclusion and starvation-freedom properties.

Mutual exclusion was the first mechanism for mastering concurrent programming through sequential thinking, and led to the identification of notions that began to give a scientific foundation to the approach, such as the concepts of progress condition and atomicity. It is the origin of the important area of concurrency control, by controlling access to data using locks (for example, 2-phase locking).

From Resources to Objects
At the beginning, a critical section encapsulated the use of a physical resource, which by its own nature is sequentially specified (for example, disk, printer, processor). Conceptually not very different, locks were then used to protect concurrent accesses to simple data (such as a file). However, when critical sections began to be used to encapsulate more general shared objects, new ideas were needed.

Data is not physical resources. A shared object is different from a physical object, in that it does not a priori require exclusive access; a process can read the data of a file while another process concurrently modifies it. The mutex-free approach (introduced by Lamport24) makes it possible to envisage implementations of purely digital objects without using mutual exclusion, in a way that operations can overlap in time.

Tolerating crash failures. Additionally, mutual exclusion cannot be used to implement an object in the presence of asynchrony and process crashes. If a process crashes inside its critical section, other processes, unable to tell if it crashed or is just slow, are prevented from accessing the object.

Consistency conditions. Wherever concurrent accesses to shared data take place, a consistency condition is needed to define which concurrent operation executions are considered correct. Instead of transforming a concurrent execution into a sequential execution (as in mutual exclusion), the idea is to enforce that, from an external observer point of view, everything must appear as if the operations were executed sequentially. This is the sequential consistency notion, which has been used since early 1976 in the database context to guarantee that transactions appear to have executed atomically.38 However, sequential consistency is not composable. The stronger consistency condition of linearizability (or atomicity) requires that the total order of the operations respects the order on non-overlapping operations.22 Linearizability is illustrated in the sidebar "An Atomic (Linearizable) Execution of Processes," which describes an execution in which three processes access an atomic read/write register R.

An Atomic (Linearizable) Execution of Processes p1, p2, and p3 on a Read/Write Register R.
(Figure: p2 invokes R.write(1) and R.write(2); p3 invokes R.write(3) and a read returning 2; p1 invokes two reads returning 1 and then 2. On the omniscient observer's time line the register holds 1, then 3, then 2.) From an external observer point of view, it appears as if the operations were executed sequentially, at the sequence of linearization points of the read/write operations (indicated by the dotted arrows).

Read/Write Register on Top of Message-Passing Systems
Perhaps the most basic shared object is the read/write register. In the shared memory context, there are implementations from very simple registers (that only one process can write, and another can read), all the way to a multi-writer multi-reader (MWMR) register, that every process can write and every process can read; for example, see Herlihy.21

Distributed message-passing systems often support a shared memory abstraction and have a wide acceptance in both research and commercial computing, because this abstraction provides a more natural transition from uniprocessors and simplifies programming tasks. We describe here a classic fault-tolerant implementation of a read/write register on top of a message-passing system.

It is relatively easy to build atomic read/write registers on top of a reliable asynchronous message-passing system, for example, Raynal,36 but if processes may crash, more involved algorithms are needed. Two important results are presented by Attiya, Bar-Noy, and Dolev:2

˲˲ An algorithm that implements an atomic read/write register on top of a system of n asynchronous message-passing processes, where at most t < n/2 of them may crash.

˲˲ A proof of the impossibility of building an atomic read/write register when t ≥ n/2.
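Returning to the linearizability condition: on small histories it can be checked by brute force, trying every total order that respects real-time precedence and replaying it against the register's sequential specification. The sketch below encodes an execution in the style of the sidebar; the start/end times are invented for illustration.

```python
from itertools import permutations

# op = (process, kind, value, start, end); times are real time
history = [
    ("p1", "read", 1, 3, 5), ("p1", "read", 2, 8, 10),
    ("p2", "write", 1, 0, 2), ("p2", "write", 2, 6, 9),
    ("p3", "write", 3, 1, 4), ("p3", "read", 2, 7, 11),
]

def respects_real_time(order):
    # if b finished before a started, a may not precede b in the total order
    return all(not (b[4] < a[3])
               for i, a in enumerate(order) for b in order[i + 1:])

def replay_ok(order):
    # replay the order against a register's sequential specification
    value = None
    for _, kind, v, _, _ in order:
        if kind == "write":
            value = v
        elif v != value:
            return False
    return True

def linearizable(hist):
    return any(respects_real_time(o) and replay_ok(o)
               for o in permutations(hist))

print(linearizable(history))  # True
```

One witness order is write(1), read→1, write(3), write(2), read→2, read→2, matching the observer's view of the register holding 1, then 3, then 2. Brute force is exponential; it is a specification, not an algorithm one would use at scale.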


This section presents the algorithm, referred to as the ABD Algorithm, which illustrates the importance of the ideas of reducing concurrent thinking to sequential reasoning. A more detailed proof as well as other algorithms can be found elsewhere.2,4,37

Design principles of ABD. Each written value has an identity. Each process is both a client and a server. Let REG be the multi-writer multi-reader (MWMR) register that is built (hence, any process is allowed to read and write the register). On its client side a process pi can invoke the operations REG.write(v) to write a value v in REG, and REG.read() to obtain its current value. On its server side, a process pi manages two local variables: regi, which locally implements REG, and timestampi, which contains a timestamp made up of a sequence number (which can be considered as a date) and a process identity j. The timestamp timestampi constitutes the "identity" of the value v saved in regi (namely, this value was written by this process at this time). Any two timestamps 〈sni, i〉 and 〈snj, j〉 are totally ordered by their lexicographical order; namely, 〈sni, i〉 < 〈snj, j〉 means (sni < snj) ∨ (sni = snj ∧ i < j).

Design principles of ABD: intersecting quorums. A process pi broadcasts a query to all the processes and waits for acknowledgments from a majority of them. Such a majority quorum set has the following properties. As t < n/2, waiting for acknowledgments from a majority of processes can never block the invoking process forever. Moreover, the fact that any two quorums have a non-empty intersection implies the atomicity property of the read/write register REG.

The operation REG.write(v). This operation is implemented by Algorithm 2. When a process pi invokes REG.write(v), it first creates a tag (denoted tag) which will identify the query/response messages generated by this write invocation. Then (phase 1), it executes a first instance of the query/response exchange pattern to learn the highest sequence number saved in the local variables timestampj of a majority of processes pj. When this is done, pi computes the timestamp ts which will be associated with the value v it wants to write in REG. Finally (phase 2), pi starts a second query/response pattern in which it broadcasts the pair (v, ts) to all the processes. When it has received the associated acknowledgments from a quorum, pi terminates the write operation.

Algorithm 2. ABD's implementation of read/write register: write operation.

On its server side, a process pi that receives a write_req message sent by a process pj during phase 1 of a write operation sends it back an acknowledgment carrying the sequence number associated with the latest value it saved in regi. When it receives a write_req message sent by a process pj during phase 2 of a write operation, it updates its local data regi implementing REG if the received timestamp is more recent (with respect to the total order on timestamps) than the one saved in timestampi, and, in all cases, it sends back to pj an acknowledgment (so pj terminates its write).

It is easy to see that, due to the intersection property of quorums, the timestamp associated with a value v by the invoking process pi is greater than the ones of the write operations that terminated before pi issued its own write operation. Moreover, while concurrent write operations can associate the same sequence number with their values, these values have different (and ordered) timestamps.

The operation REG.read(). Algorithm 3 implements operation REG.read(), with a similar structure as the implementation of operation REG.write(). Notice that the following scenario can occur, which involves two read operations read1 and read2 on a register REG by the processes p1 and p2, respectively, and a concurrent write operation REG.write(v) issued by a process p3. Let ts(v) be the timestamp associated with v by p3. It is possible that the phase 1 majority quorum obtained by p1 includes the pair (v, ts(v)), while the one obtained by p2 does not. If this occurs, the first read operation read1 obtains a value more recent than the one obtained by the second read2, which violates atomicity. This can be easily solved by directing each read operation to write the value it is about to return as a result. In this way, when read1 terminates and returns v, this value is known by a majority of processes despite asynchrony, concurrency, and a minority of process crashes. This phenomenon (called new/old inversion) is prevented by phase 2 of a read operation (as illustrated in the accompanying figure).

We have seen how the combination of intersecting quorums and timestamps, two ideas useful in other situations, facilitates the implementation of atomic read/write registers in asynchronous message-passing systems where a minority of processes may crash. And how sequential thinking for shared registers can be used at the upper abstraction level.

Algorithm 3. ABD's implementation of read/write register: read operation.

New/old inversion scenario. (Figure: the phase 1 majority quorum obtained by p1 contains the pair (v, ts(v)), while the one obtained by p2 does not, for reads read1 by p1 and read2 by p2 concurrent with REG.write(v) by p3.)

The World of Concurrent Objects
A read/write register is a special case of an object. In general, an object is defined by the set of operations that processes can invoke, and by the behavior of the object when these operations are invoked sequentially. These can be represented by an automaton or by a set of sequential traces. In the case of an automaton, for each state, and each possible operation invocation, a transition specifies a response to that invocation, and a new state (the transition is often a deterministic function, but not always). Thus, usual data structures from sequential programming, such as queues and stacks, can be used to define concurrent objects.

Consensus. At the core of many situations where sequential reasoning for concurrent programming is used (including state machine replication) are agreement problems. A common underlying abstraction is the consensus object. Let CONS be a consensus object. A process pi can invoke the operation CONS.propose(v) once. The invocation eventually returns a value v′. This sequential specification for CONS is defined by the following properties.

˲˲ Validity. If an invocation returns v then there is a CONS.propose(v).

˲˲ Agreement. No two different values are returned.

˲˲ Termination. If a process invokes CONS.propose(v) and does not crash, the operation returns a value.

All objects are not equal in an asynchronous, crash-prone environment. Consensus objects are the strongest, in the sense that (together with read/write registers) they can be used to implement, despite asynchrony and process crashes, any object defined by a sequential specification. Other important objects, such as a queue or a stack, are of intermediate strength: they cannot be implemented by asynchronous processes that communicate using read/write registers only. Such implementations, which require that any operation invoked by a process that does not crash must return (independently of the speed or crashes of other processes), are said to be wait-free.

One way of measuring the synchronization power of an object in the presence of asynchrony and process crashes is by its consensus number. The consensus number of an object O is the greatest integer n such that it is possible to wait-free implement a consensus object for n processes from any number of objects O and atomic read/write registers. The consensus number of O is ∞ if there is no such greatest integer. As an example, the consensus number of a Test&Set object or a stack object is 2, while the consensus number of a Compare&Swap or Load Link/Store Conditional (LL/SC) object is ∞. We will discuss an LL/SC object later. These ideas were first discussed by Herlihy.19

State Machine Replication
A concurrent stack can be implemented by executing the operations pop() and push() using mutual exclusion. However, as already indicated, this strategy does not work if processes may crash. The state machine replication mechanism25,39 is a general way of implementing an object by asynchronous processes communicating by message passing. We will discuss implementations where the processes may fail by crashing; there are also implementations that tolerate arbitrary (Byzantine) failures.7 We should point out that nondeterministic automata sometimes appear in applications and pose additional challenges for implementations.

The general idea is for the processes to agree on a sequential order of the concurrent invocations, and then each one to simulate the sequential specification automaton locally. We illustrate here the approach with a total order broadcast mechanism for reaching the required agreement.
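The automaton view of a sequential specification can be made concrete. A minimal sketch (names assumed) of a stack's specification as a transition function δ, the form the universal construction discussed in this article consumes:

```python
# delta(state, op) -> (new_state, result): a stack's sequential specification
def delta(state, op):
    name, *args = op
    if name == "push":
        return state + [args[0]], None
    if name == "pop":
        if not state:
            return state, "empty"           # pop on an empty stack
        return state[:-1], state[-1]
    raise ValueError(f"unknown operation {name}")

# replaying a sequential trace against the specification
state = []
for op in [("push", 1), ("push", 2), ("pop",)]:
    state, res = delta(state, op)
print(state, res)  # [1] 2
```

Because δ is deterministic, any two processes that apply the same sequence of operations from the same initial state reach the same state, which is exactly what state machine replication exploits.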


Total order broadcast. The TO-broadcast abstraction is an important communication primitive, which ensures that all correct processes receive messages in the same order.18,37 It is used through two operations, TO_broadcast() and TO_deliver(). A process invokes TO_broadcast(m) to send a message m to all other processes. As a result, processes execute TO_deliver() when they receive a (totally ordered) message.

TO-broadcast illustrates one more general idea within the theory of mastering concurrent programming through sequential thinking: the identification of communication abstractions that facilitate building concurrent objects defined by a sequential specification.

State machine replication based on TO-broadcast. A concurrent implementation of object O is described in Algorithm 4. It is a universal construction, as it works for any object O defined by a sequential specification. The object has operations opx(), and a transition function δ() (assuming δ is deterministic), where δ(state, opx(paramx)) returns the pair 〈state′, res〉, where state′ is the new state of the object and res the result of the operation.

Algorithm 4. TO-broadcast-based universal construction.

The idea of the construction is simple. Each process pi has a copy statei of the object, and the TO-broadcast abstraction is used to ensure that all the processes pi apply the same sequence of operations to their local representation statei of the object O.

Implementing TO-broadcast from consensus. Algorithm 5 is a simple construction of TO-broadcast on top of an asynchronous system where consensus objects are assumed to be available.18 Let broadcast(m) stand for "for each j ∈ {1, . . . , n} do send(m) to pj end for." If the invoking process does not crash during its invocation, all processes receive m; if it crashes, an arbitrary subset of processes receive m.

Algorithm 5. Implementing TO-broadcast from consensus.

The core of the algorithm is the background task T. A consensus object CS[k] is associated with the iteration number k. A process pi waits until there are messages in the set pendingi and not yet in the queue to_deliverablei. When this occurs, process pi computes this set of messages (seq) and orders them. Then it proposes seq to the consensus instance CS[k]. This instance returns a sequence saved in resi, which is added by pi at the end of its local queue to_deliverablei.

Circumventing Consensus Impossibility

Ways of circumventing the consensus impossibility:

• The failure detector approach8 can abstract away synchrony assumptions sufficient to distinguish between slow processes and dead processes.
• In eventually synchronous systems14 there is a time after which the processes run synchronously. The celebrated Paxos algorithm is an example.28
• By using random coins,5 consensus is solvable with high probability.
• Often not all combinations of input values occur.29
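The construction's core idea, agree on one total order and replay it locally, can be sketched by letting a single shared log stand in for the TO-broadcast abstraction (all names assumed; in a real system the order itself must come from consensus, as Algorithm 5 shows):

```python
# Sketch of a TO-broadcast-based universal construction for a stack object.
LOG = []                          # the totally ordered, TO-delivered log

def to_broadcast(op):
    LOG.append(op)                # one shared append point = one total order

class Replica:
    """A process p_i with its local copy state_i of the object."""
    def __init__(self):
        self.state = []
        self.next = 0             # index of the next log entry to apply

    def to_deliver(self):
        # apply, in log order, every operation delivered since the last call
        while self.next < len(LOG):
            name, *args = LOG[self.next]
            if name == "push":
                self.state.append(args[0])
            elif name == "pop" and self.state:
                self.state.pop()
            self.next += 1

p1, p2 = Replica(), Replica()
to_broadcast(("push", 1))
to_broadcast(("push", 2))
p1.to_deliver()                   # replicas may apply at different times...
to_broadcast(("pop",))
p1.to_deliver(); p2.to_deliver()
print(p1.state == p2.state == [1])  # True: same order, hence same state
```

The replicas apply operations at different moments, yet because they apply the same sequence to the same deterministic specification, their states coincide whenever they have caught up with the log.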

When are Universal Constructions Possible?
An impossibility. A fundamental result in distributed computing is the impossibility to design a (deterministic) algorithm that solves consensus in the presence of asynchrony, even if only one process may crash, either in message-passing or read/write shared memory systems.16 Given that consensus and TO-broadcast are equivalent, the state machine replication algorithm presented above cannot be implemented in asynchronous systems where processes can crash.

Thus, sequential thinking for concurrent computing has studied properties about the underlying system that enable the approach to go through. There are several ways of considering computationally stronger (read/write or message-passing) models,35,37 where state machine replication can be implemented. Some ways, mainly suited to message-passing systems, are presented in the sidebar "Circumventing Consensus Impossibility." Here, we discuss a different way, through powerful communication hardware.

Systems that include powerful objects. Shared memory systems usually include synchronization operations such as Test&Set, Compare&Swap, or the pair of operations Load Link and Store Conditional (LL/SC), in addition to read/write operations. These operations have a consensus number greater than 1. More specifically, the consensus number of Test&Set is 2, while the consensus number of both Compare&Swap and the pair LL/SC is +∞. Namely, 2-process (but not 3-process) consensus can be implemented from Test&Set, despite crash failures. Compare&Swap (or LL/SC) can implement consensus for any number of processes. Hence, for any n, any object can be implemented in an asynchronous n-process read/write system enriched with Compare&Swap (or LL/SC), despite up to n−1 process crashes. Furthermore, there are implementations that tolerate arbitrary, malicious (Byzantine) failures.7,37

State machine replication based on LL/SC. To give more intuition about state machine replication, and furthermore, about the way that blockchains work, we present an implementation based on LL/SC. (Another option is based on Compare&Swap, but it is not "self-contained" in the sense it has to deal with the ABA problem.35)

The intuition of how the LL/SC operations work is as follows. Consider a memory location M, initialized to ⊥, accessed only by the operations LL/SC. Assume that if a process invokes M.SC(v) it has previously invoked M.LL(). The operation M.LL() is a simple read of M, which returns the current value of M. When a process pi invokes M.SC(v), the value v is written into M if and only if no other process invoked M.SC() since pi's last invocation of M.LL(). If the write succeeds, M.SC() returns true; otherwise it returns false.

Algorithm 6. Implementing a consensus object CONS from the operations LL/SC.

Universal Construction Based on LL/SC (sidebar).
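A sketch, in the spirit of Algorithm 6, of consensus from LL/SC (all names assumed). The lock inside the LLSC class only simulates the atomicity of the hardware primitive; the consensus logic on top of it is lock-free.

```python
import threading

class LLSC:
    """Simulated LL/SC memory location, initialized to ⊥ (None)."""
    def __init__(self):
        self._value = None                 # plays the role of ⊥
        self._version = 0                  # bumped by every successful SC
        self._mutex = threading.Lock()     # simulates hardware atomicity
        self._seen = {}                    # version observed at each LL

    def LL(self):
        with self._mutex:
            self._seen[threading.get_ident()] = self._version
            return self._value

    def SC(self, v):
        with self._mutex:
            if self._seen.get(threading.get_ident()) == self._version:
                self._value, self._version = v, self._version + 1
                return True
            return False                   # someone SC'd since my LL

M = LLSC()
decided = []

def propose(v):
    if M.LL() is None:                     # nobody has decided yet...
        M.SC(v)                            # ...try; failure means I lost
    return M.LL()                          # the first successful SC wins

threads = [threading.Thread(target=lambda v=v: decided.append(propose(v)))
           for v in ("a", "b", "c")]
for t in threads: t.start()
for t in threads: t.join()
print(len(set(decided)) == 1 and decided[0] in ("a", "b", "c"))  # True
```

Whichever SC succeeds first fixes M forever (every later SC fails because the version changed), so all proposers return the same proposed value: validity and agreement hold for any number of crashes among the other proposers.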


In the sidebar "Universal Construction Based on LL/SC," there is a shared-memory, LL/SC-based universal construction.15 Looking at the algorithm, one begins to get a feeling for the distributed ledgers discussed next.

Distributed Ledgers
Since ancient times, ledgers have been at the heart of commerce, to represent concurrent transactions by a permanent list of individual records sequentialized by date. Today we are beginning to see algorithms that enable the collaborative creation of digital distributed ledgers with properties and capabilities that go far beyond traditional physical ledgers. All participants within a network can have their own copy of the ledger. Any of them can append a record to the ledger, which is then reflected in all copies in minutes or even seconds. The records stored in the ledger can stay tamperproof, using cryptographic techniques.

Ledgers as universal constructions. Mostly known because of their use in cryptocurrencies, and due to their blockchain implementation,30 from the perspective of this paper a distributed ledger is a Byzantine fault-tolerant replicated implementation of a specific ledger object. The ledger object has two operations, read() and append(). Its sequential specification is defined by a list of blocks. A block X can be added at the end of the list with the operation append(X), while a read() returns the whole list. In the case of a cryptocurrency, X may contain a set of transactions.

Thus, a ledger object, as any other object, can be implemented using a Byzantine fault-tolerant state machine replication algorithm. Conversely, a ledger can be used as a universal construction of an object O defined by a state machine with a transition function δ. To do so, when a process invokes append(X), X consists of a transition to be applied to the state machine. The state of the object is obtained through a read() invocation, which returns the sequence of operations that have been sequentially appended to the ledger, then locally applying them starting from the initial state of the object (see Raynal37 for more details).

Three remarkable properties. The apparently innocent idea of a read() operation that returns the list of commands that have been applied to the state machine opens the discussion of one of the remarkable points of distributed ledgers that has brought them to such wide attention: the possibility of guaranteeing a tamperproof list of commands. The blockchain implementation does so using cryptographic hashes that link each record to the previous one (although the idea has been known in the cryptography community for years).

The ledger implementation used in Bitcoin showed it is possible to have a state machine replication tolerating Byzantine failures that scales to hundreds of thousands of processes. The cost is temporarily sacrificing consistency: forks can happen at the end of the blockchain, which implies that the last few records in the blockchain may have to be withdrawn.

The third remarkable property brought to the public attention by distributed ledgers is the issue of who the participants can be. As opposed to classic algorithms for mastering concurrency through sequential thinking, the participants do not have to be a priori known, can vary with time, and may even be anonymous. Anyone can append a block and read the blockchain (although there are also permissioned versions where participants have to be registered, and even hybrid models). In a sense, a distributed ledger is an open distributed database, with no central authority, where the data itself is distributed among the participants.
case of a cryptocurrency, X may contain Agreement in dynamic, Byzantine sys- a set of transactions. tems. Bitcoin’s distributed ledger im- Thus, a ledger object, as any other plementation is relatively simple to ex- object, can be implemented using a plain in the framework of state machine Byzantine failures-tolerant state ma- replication. Conceptually it builds on chine replication algorithm. Converse- randomized consensus (something ly, a ledger can be used as a universal that had already been carefully studied construction of an object O defined by in traditional approaches, as noted in a state machine with a transition func- the sidebar “Circumventing Consensus


Impossibility"), through the following ingenious technique to implement it. Whenever several processes (not necessarily known a priori, hence the name "dynamic system") want to concurrently append a block, they participate in a lottery. Each process selects a random number (by solving cryptographic puzzles) between 0 and some large integer K, and the one that gets a number smaller than k << K wins, and has the right to append its desired block.

The implementation details of the lottery (carried out by a procedure called proof of work) are not important for this article; what matters here is that, with high probability, only one process wins (and it is selected at random). However, from time to time more than one process wins, and a fork happens, with more than one block being appended at the end of the blockchain. Only one branch eventually prevails (in Bitcoin this is achieved by always appending to the longest branch). This introduces a new and interesting idea into the paradigm of mastering concurrency through sequential thinking: a trade-off between faster state machine replication and temporary loss of consistency. In other words, the x operations at the very end of the blockchain, for some constant x (which depends on the assumptions about the environment), cannot yet be considered committed.

The Limits of the Approach
It is intuitively clear, and it has been formally proved, that linearizability or even serializability may be costly. Recent papers in the context of shared memory programming argue that it is often possible to improve the performance of concurrent data structures by relaxing their semantics.9 In the context of distributed systems, eventual consistency is widely deployed to achieve high availability, by guaranteeing that if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value (despite its name, it is not technically a consistency condition3). In the case of distributed ledgers, we have seen the benefit that can be gained by relaxing the sequential approach to mastering concurrency: branches at the end of the blockchain (as in Bitcoin) temporarily violate a consistent view of the ledger. Still, blockchains suffer from a performance bottleneck due to the requirement of ordering all transactions in a single list, which has prompted the exploration of partially ordered ledgers, based on directed acyclic graphs such as those of Tangle or Hedera Hashgraph.

The CAP Theorem formalizes a fundamental limitation of the approach of mastering concurrency through sequential reasoning: at most two of the following three properties are achievable: consistency, availability, partition tolerance.17 This may give an intuition of why distributed ledger implementations have temporary forks. An alternative is to pay a cost in availability, postponing the property that every non-failing participant returns a response to all operations in a reasonable amount of time. We have already seen in the ABD algorithm that the system continues to function and upholds its consistency guarantees, provided that only a minority of processes may fail.

Finally, another fundamental limitation of the approach of mastering concurrency through sequential reasoning is that not all concurrent problems of interest have sequential specifications. Many examples are discussed in Castañeda et al.,10 where a generalization of linearizability to arbitrary concurrent specifications is described.

Acknowledgment. The authors acknowledge the support of UNAM-PAPIIT IN106520, the INRIA-Mexico Associate Team, and the CNRS-Mexico UMI.

References
1. Alpern, B. and Schneider, F.B. Defining liveness. Information Processing Letters 21, 4 (1985), 181–185.
2. Attiya, H., Bar-Noy, A., and Dolev, D. Sharing memory robustly in message-passing systems. JACM 42, 1 (1995), 121–132.
3. Attiya, H., Ellen, F., and Morrison, A. Limitations of highly-available eventually-consistent data stores. IEEE Trans. Parallel and Distributed Systems 28, 1 (2017), 141–155.
4. Attiya, H. and Welch, J. Distributed Computing: Fundamentals, Simulations and Advanced Topics (2nd Edition). Wiley, 2004.
5. Ben-Or, M. Another advantage of free choice: Completely asynchronous agreement protocols. In Proc. 2nd ACM Symp. on Principles of Distributed Computing (1983), 27–30.
6. Brinch Hansen, P. (Ed.). The Origin of Concurrent Programming. Springer, 2002.
7. Cachin, C. State machine replication with Byzantine faults. In Replication, Springer LNCS 5959 (2011), 169–184.
8. Chandra, T.D., Hadzilacos, V., and Toueg, S. The weakest failure detector for solving consensus. JACM 43, 4 (1996), 685–722.
9. Calciu, I., Sen, S., Balakrishnan, M., and Aguilera, M. How to implement any concurrent data structure for modern servers. ACM Operating Systems Review 51, 1 (2017), 24–32.
10. Castañeda, A., Rajsbaum, S., and Raynal, M. Unifying concurrent objects and distributed tasks: Interval-linearizability. JACM 65, 6 (2018), Article 45.
11. Dijkstra, E.W. Solution of a problem in concurrent programming control. Commun. ACM 8, 9 (Sept. 1965), 569.
12. Dijkstra, E.W. Cooperating sequential processes. In Programming Languages. Academic Press, 1968, 43–112.
13. Dijkstra, E.W. Hierarchical ordering of sequential processes. Acta Informatica 1, 1 (1971), 115–138.
14. Dolev, D., Dwork, C., and Stockmeyer, L. On the minimal synchronism needed for distributed consensus. JACM 34, 1 (1987), 77–97.
15. Fatourou, P. and Kallimanis, N.D. Highly-efficient wait-free synchronization. Theory of Computing Systems 55 (2014), 475–520.
16. Fischer, M.J., Lynch, N.A., and Paterson, M.S. Impossibility of distributed consensus with one faulty process. JACM 32, 2 (1985), 374–382.
17. Gilbert, S. and Lynch, N. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News 33, 2 (2002), 51–59.
18. Hadzilacos, V. and Toueg, S. A modular approach to fault-tolerant broadcasts and related problems. TR 94-1425, Cornell Univ. (1994).
19. Herlihy, M.P. Wait-free synchronization. ACM Trans. Programming Languages and Systems 13, 1 (1991), 124–149.
20. Herlihy, M.P. and Moss, J.E.B. Transactional memory: Architectural support for lock-free data structures. In Proc. 20th ACM Int'l Symp. Computer Architecture. ACM Press, 1993, 289–300.
21. Herlihy, M. and Shavit, N. The Art of Multiprocessor Programming. Morgan Kaufmann, 2008, ISBN 978-0-12-370591-4.
22. Herlihy, M.P. and Wing, J.M. Linearizability: A correctness condition for concurrent objects. ACM Trans. Programming Languages and Systems 12, 3 (1990), 463–492.
23. Lamport, L. A new solution of Dijkstra's concurrent programming problem. Commun. ACM 17, 8 (1974), 453–455.
24. Lamport, L. Concurrent reading and writing. Commun. ACM 20, 11 (Nov. 1977), 806–811.
25. Lamport, L. Time, clocks, and the ordering of events in a distributed system. Commun. ACM 21, 7 (July 1978), 558–565.
26. Lamport, L. On interprocess communication, Part I: Basic formalism. Distributed Computing 1, 2 (1986), 77–85.
27. Lamport, L. A fast mutual exclusion algorithm. ACM Trans. Computer Systems 5, 1 (1987), 1–11.
28. Lamport, L. The part-time parliament. ACM Trans. Computer Systems 16, 2 (1998), 133–169.
29. Mostéfaoui, A., Rajsbaum, S., and Raynal, M. Conditions on input vectors for consensus solvability in asynchronous distributed systems. JACM 50, 6 (2003), 922–954.
30. Nakamoto, S. Bitcoin: A peer-to-peer electronic cash system. Unpublished manuscript, 2008.
31. Pease, M., Shostak, R., and Lamport, L. Reaching agreement in the presence of faults. JACM 27, 2 (1980), 228–234.
32. Peterson, G.L. Myths about the mutual exclusion problem. Information Processing Letters 12, 3 (1981), 115–116.
33. Peterson, G.L. Concurrent reading while writing. ACM Trans. Programming Languages and Systems 5, 1 (1983), 46–55.
34. Rabin, M. Randomized Byzantine generals. In Proc. 24th IEEE Symp. Foundations of Computer Science. IEEE Computer Society Press, 1983, 116–124.
35. Raynal, M. Concurrent Programming: Algorithms, Principles and Foundations. Springer, 2013, ISBN 978-3-642-32026-2.
36. Raynal, M. Distributed Algorithms for Message-Passing Systems. Springer, 2013, ISBN 978-3-642-38122-5.
37. Raynal, M. Fault-Tolerant Message-Passing Distributed Systems: An Algorithmic Approach. Springer, 2018, ISBN 978-3-319-94140-0.
38. Stearns, R.C., Lewis, P.M., and Rosenkrantz, D.J. Concurrency control for database systems. In Proc. 16th Conf. Foundations of Computer Science (1976), 19–32.
39. Schneider, F.B. Implementing fault-tolerant services using the state machine approach. ACM Computing Surveys 22, 4 (1990), 299–319.
40. Shavit, N. and Touitou, D. Software transactional memory. Distributed Computing 10, 2 (1997), 99–116.

Sergio Rajsbaum ([email protected]) is a professor at the Instituto de Matemáticas of the Universidad Nacional Autónoma de México in Mexico City, México.

Michel Raynal ([email protected]) is a professor at IRISA, University of Rennes, France, and Polytechnic University in Hong Kong.

© 2020 ACM 0001-0782/20/1
