Motivation

Synchronization is Coming Back • Synchonization is coming back, but is it the same? But is it the Same?

• Lots of new concepts and mechanisms since locks! Michel RAYNAL [email protected] • The multicore (r)evolution

The synchronization world has changed Institut Universitaire de France • IRISA, Universit´ede Rennes, France Dpt of Computing, Polytechnic Univ., Hong Kong

c Michel Raynal Synchronization is coming back! 1 c Michel Raynal Synchronization is coming back! 2

A recent book on the topic Part I

Concurrent programming: , Principles, PART 1: Lock-Based Synchronization and Foundations

• Chapter 1: The Problem by Michel Raynal • Chapter 2: Solving Exclusion Problem Springer, 515 pages, 2013 • Chapter 3: Lock-Based Concurrent Objects ISBN: 978-3-642-32026-2 6 Parts, composed of 17 Chapters Balance: algorithms vs foundations

c Michel Raynal Synchronization is coming back! 3 c Michel Raynal Synchronization is coming back! 4 Part II Part III

PART 3: Mutex-free Synchronization

Chapter 5: Mutex-Free Concurrent Objects PART 2: On the Foundations Side: the Atomicity Concept • • Chapter 6: Hybrid Concurrent Objects

• Chapter 4: • Chapter 7: Wait-Free Objects from Read/write Registers Only

Atomicity: Formal Definition and Properties • Chapter 8: Snapshot Objects from Read/Write Registers Only • Chapter 9: Renaming Objects from Read/Write Registers Only

c Michel Raynal Synchronization is coming back! 5 c Michel Raynal Synchronization is coming back! 6

Part IV Part V

PART 5: On the Foundations Side: from Safe Bits to Atomic Registers

PART 4: The Approach • Chapter 11: Safe, Regular, and Atomic Read/Write Registers

• Chapter 10: Transactional Memory • Chapter 12: From Safe Bits to Atomic Bits: a Lower Bound and an Optimal Construction • Chapter 13: Bounded Constructions of Atomic b-Valued Registers

c Michel Raynal Synchronization is coming back! 7 c Michel Raynal Synchronization is coming back! 8 Part V Summary

Computing Model: Processes and Concurrent Objects PART 6: On the Foundations Side: • the Computability Power of Concurrent Objects • Base read/write model, more powerful models • On the safety side: Linearizability • Chapter 14: Universality of Consensus • On the liveness side: Progress conditions • Chapter 15: The Case of Unreliable Base Objects • Mutex-free concurrent objects • Chapter 16: • Comuptability power, consensus number, etc. Consensus Numbers and the Consensus Hierarchy • Hybrid concurrent objects • Chapter 17: The Alpha(s) and Omega of Consensus • Conclusion

c Michel Raynal Synchronization is coming back! 9 c Michel Raynal Synchronization is coming back! 10

Part I Asynchronous model (1)

• Computing model: n sequential processes Π = p1,...,pn (Turing machines) • Timing model: Asynchrony: No upper bound on the MODEL and DEFINITIONS time required to execute a computation step • Failure model

⋆ No failure ⋆ Crash failure

c Michel Raynal Synchronization is coming back! 11 c Michel Raynal Synchronization is coming back! 12 Asynchronous process model (2) Asynchronous Base Communication Model (1)

Shared memory provides processes with

• Process crash: a process behaves according to its spec- ification until it possibly crashes, i.e., halts prematurely • Reliable Read/Write atomic registers (after it has crashed a process is definitely stopped) • Reliable Compare&Swap atomic registers • Terminology wrt a run: • Atomic means that: ⋆ A Correct process is a p that never crashes i ⋆ The operations on a register appear as if they have been executed sequentially ⋆ A Faulty process is a pi that crashes ⋆ Each operation appears as being executed instanta- neously at some point of the time line between its start event and its end event

c Michel Raynal Synchronization is coming back! 13 c Michel Raynal Synchronization is coming back! 14

Atomic register Other base (hardware) operations

READ: 2 READ: 1

WRITE: 1 WRITE: 2 READ: 2 WRITE: 3 • Test&set • Fetch&add Real time line • Compare&swap

Valueis1 3 2 • LL/SC • Swap, Mem-to-Mem SWAP

- Lamport L., On interprocess Communication (part 1: Basic Formalism, Part 2: • etc. Algorithms), , 1(2):77-101, 1986 - Herlihy M.P. and Wing J.L., Linearizability: a Correctness Condition for Concurrent Objects. ACM Trans. on Programming Languages and Systems, 12(3):463-492, 1990

c Michel Raynal Synchronization is coming back! 15 c Michel Raynal Synchronization is coming back! 16 Structural view (Base level) Part II p1 local memory pi local memory pn local memory

p1 pi pn On the SAFETY side

Asynchronous shared memory Abstraction (could be shared disks, SAN)

c Michel Raynal Synchronization is coming back! 17 c Michel Raynal Synchronization is coming back! 18

SAFETY PROPERTY Concurrent Object: example

An object accessed by concurrent processes

p LINEARIZABILITY p1 pi n

- Herlihy M.P. and Wing J.M., Linearizability: a correctness condition for concurrent objects. ACM Toplas, 12(3):463- 492, 1990 Enqueue (v) r ← Dequeue ()

- Lamport L., On interprocess Communication (part 1: Basic Formalism, Part 2: Algorithms), Distributed Com- puting, 1(2):77-101, 1986       c Michel Raynal Synchronization is coming back! 19  c Michel Raynal Synchronization is coming back! 20      Sequential vs Concurrent (1) History (Run, Execution)

Enq (a) Enq (b) Deq (a|b|c) ? SEQUENTIAL: p1

Enq (a) Enq (c) Enq (b) Deq(a) Deq (c) Enq (c) Deq(a|b|c) ? p2

CONCURRENT:

Enq (a) Enq (b) Deq (a|b|c) ? p1 History H Inv (Enq(b)) Sequences of events Res (Enq(b)) Enq (c) Deq(a|b|c) ? p2 A history defines a partial order on the operations

c Michel Raynal Synchronization is coming back! 21 c Michel Raynal Synchronization is coming back! 22

Two types of Concurrent Objects Example: Interactive consistency

• A concurrent object encapsulates a particular synchro problem Each process pi proposes a value vi • Concurrent objects with a sequential specification Queue, file, graph, tree, stack, ... and has to decide a value such that:

⋆ Fifo queue: producer/consumer problem • Validity. Let D[1..n] be the vector decided by a process. ⋆ File (register): Readers/writers problem ∀i ∈ [1..n]: D[i] ∈ {vi, ⊥} and D[i]= vi if pi is correct • Concurrent objects with a not-sequential specification • Agreement. No two processes decide different vectors Rendezvous Object, Interactive Consistency, • Termination. Every correct process decides Non Blocking Atomic Commit Object, ...

c Michel Raynal Synchronization is coming back! 23 c Michel Raynal Synchronization is coming back! 24 Linearizability Sequential vs Concurrent (2)

Enq (a) Enq (b) Deq (a|b|c) ? An execution (or “history”) H is linearizable if the opera- p1 tions issued by the processes appear as if they have been executed in some sequential order that respects: Enq (c) Deq(a|b|c) ? p2 • The seq order in each process • The seq specification of the object The real-time order of the non-overlapping operations • Enq (a) Enq (c) Deq (b) Enq (b) Deq (a)

c Michel Raynal Synchronization is coming back! 25 c Michel Raynal Synchronization is coming back! 26

Sequential vs Concurrent (3) Locality of the Linearizability property (1)

Enq (a) Enq (b) Deq (a|b|c) ? p1 • A property P of a concurrent system is Local if the system as a whole satisfies P whenever each individual Enq (c) Deq (a|b|c) ? object satisfies P p2 Locality means that, given two objects X and Y , each satisfying the property P , the composite object [X,Y ] satisfies the property P • Linearizability is a local property (consistency criterion) Enq (a) Deq (a) Enq (c) Enq (b) Deq(c)

c Michel Raynal Synchronization is coming back! 27 c Michel Raynal Synchronization is coming back! 28 Locality of the Linearizability property (2) Sequential Consistency

• An execution (or “history”) H is sequentially consistent if the operations issued by the processes appear as if they have been executed in some sequential order that • Theorem: A history H is linearizable iff, for each object respects each process order and the seq specification X, H restricted to the operations on X is linearizable of every object Locality means that, given two linearizable objects X .Enq( ) and Y , the composite object [X,Y ] is linearizable Q a • It follows that each object can be implemented indepen- dently from the others (this is a fundamental property from both theoretical and practical point of views) Q.Enq(b) Q.Deq(b) • Sequential consistency, some forms of serializability are not local properties A ”witness” seq history:

Q.Enq(b) Q.Enq(a) Q.Deq(b)

c Michel Raynal Synchronization is coming back! 29 c Michel Raynal Synchronization is coming back! 30

Sequential Consistency is not Local Part III

Q.Enq(a) Q′.Enq(b′) Q′.Deq(b′)

Q′.Enq(a′) Q.Enq(b) Q.Deq(b) On the LIVENESS side

• Q and Q′ are sequentially consistent PROGRESS CONDITIONS • the whole history H is not sequentially consistent Impossibility to produce a “witness” sequential history • See distributed caches

c Michel Raynal Synchronization is coming back! 31 c Michel Raynal Synchronization is coming back! 32 Two cases MUTEX-FREEDOM

• Reliable systems: • No code is protected by a critical section ⋆ Lock-based implementations ⋆ Progress conditions O.op1() by p1 O.op2(b) by p2 O.op1() by p3 ∗ -freedom ∗ Starvation-freedom

• Crash-prone systems:

⋆ Mutex-free implmentations R1 R3 R2R3R3 R2R3 R1 R2 R1 ⋆ Associated progress conditions

c Michel Raynal Synchronization is coming back! 33 c Michel Raynal Synchronization is coming back! 34

Abortable object Obstruction-freedom

• If an operation executes alone (concurrency-free con- • If an operation executes alone (concurrency-free con- text), it terminates and returns a correct value text), it terminates and returns a correct value • If an operation executes in a concurrency context, • If an operation executes in a concurrency context, it terminates and returns a correct value or it returns ⊥ and has no effect on the object ⋆ it can never termination ⋆ if it terminates, its result is correct This definition is different from the one used by Aguilera M.K., Frolund S., Hadzila- cos V., Horn S.L. and Toueg S., Abortable and Query-abortable Objects and their Implementations. PODC’07, pp. 23-32, 2007 Abortability is a stronger notion than obstruction-freedom

c Michel Raynal Synchronization is coming back! 35 c Michel Raynal Synchronization is coming back! 36 Non-blocking Starvation-free

• Any operation invocation terminates

• In a concurrency context, at least one operation termi- • Failure-prone systems where t process may crash: nates called t-resilient systems • In a failure-free system: Non-blocking = Deadlock-free • Systems where t = n − 1: called wait-free systems • Difference between wait-freedom and starvation-freedom is failure-free systems

c Michel Raynal Synchronization is coming back! 37 c Michel Raynal Synchronization is coming back! 38

Hierarchy of progress conditions in failure-prone systems Part IV

• Obstruction-freedom ≺ Non-blocking ≺ Wait-freedom Simple mutex-free implementations • Abortable ≺ Wait-freedom of concurrent objects

c Michel Raynal Synchronization is coming back! 39 c Michel Raynal Synchronization is coming back! 40 Introductory example: the SPLITTER Splitter: Specification

x processes

• At most one process exits with stop At most ( 1) processes exit with stop right • x − right ≤ x − 1 processes ≤ 1 process • At most (x − 1) processes exit with down

down Lamport 1987, Anderson-Moir 1995.

≤ x − 1 processes

c Michel Raynal Synchronization is coming back! 41 c Michel Raynal Synchronization is coming back! 42

Splitter: WAIT-FREE Implementation Splitter: Proof (very easy)

Two shared registers: LAST (init ∀) and DOOR (init open) procedure direction() % issued by pi % LAST ← i DOOR← closed LAST = i

LAST ← idi; if DOOR = closed then movei ← right else DOOR ← closed if (LAST = idi) then movei ← stop else move ← down i No processpj has modified the atomic registerLAST endif endif; return (movei)

c Michel Raynal Synchronization is coming back! 43 c Michel Raynal Synchronization is coming back! 44 A NON-BLOCKING timestamp object: definition A non-blocking timestamp object: base objects

• NEXT : next integer timestamp value (init to 1) • Validity. No two invocations of get timestamp() return • LAST : unbounded array of atomic registers. the same value A process p deposits its index i in LAST [k] to indicate • Consistency. Let gt1() and gt2() be two distinct invo- i cations of get timestamp(). it is trying to obtain the timestamp k • COMP: unbounded array of atomic Boolean registers If gt1() returns before gt2() starts, the timestamp re- (all init to false) turned by gt2() is greater than the one returned by gt1() A process pi sets COMP[k] to true to indicate that it is • Termination. Obstruction-freedom competing for the timestamp k (several processes can write true into COMP[k])

c Michel Raynal Synchronization is coming back! 45 c Michel Raynal Synchronization is coming back! 46

A non-blocking timestamp object: implementation A wait-free stack: model and base objects

operation get timestamp(i) is k ← NEXT ; • Read/write model enriched with Fetch&add and swap repeat forever operations LAST [k] ← i; if (¬COMP[k]) • Base atomic objects then COMP[k] ← true; if (LAST [k]= i) ⋆ REG[0..∞): array of atomic registers which contains the elements of the stack (init to ) then NEXT ← NEXT + 1; return(k) ⊥ end if REG[0] sentinel end if; ⋆ NEXT : atomic register that contains the index of the k ← k +1 next entry where a value can be deposited (init to 1) end repeat end operation.

c Michel Raynal Synchronization is coming back! 47 c Michel Raynal Synchronization is coming back! 48 A wait-free stack: push A wait-free stack: pop algorithm

operation Q.pop() is last ← NEXT − 1; for x from last to 0 do operation push(v) is aux ← REG[x].swap(⊥); in ← NEXT .fetch&add() − 1; if (aux 6= ⊥) then return(aux) end if REG[in] ← v; end for, return() return(empty) end operation. end operation.

wait-freedom vs bounded wait-freedom

c Michel Raynal Synchronization is coming back! 49 c Michel Raynal Synchronization is coming back! 51

Part V Boosting progress with contention managers

• From Obstruction-freedom to wait-freedom Boosting Failure detector: ✸P (eventually perfect) IMPLEMENTATIONS • From non-blocking to wait-freedom

Failure detector: ΩX

c Michel Raynal Synchronization is coming back! 52 c Michel Raynal Synchronization is coming back! 53 Boosting progress with contention managers From an obstruction-free to a non-blocking timestamp

operation get timestamp(i) is k ← NEXT ; repeat forever

operation1() operationm() LAST [k] ← i; if (¬COMP[k]) then COMP[k] ← true; if (LAST [k]= i) need help() thenNEXT ← NEXT + 1; Obstruction-free Failure detector-based implementation Contention manager CM.stop help(i); return(k) stop help() end if end if; k ← k + 1; CM.need help(i) end repeat end operation.

c Michel Raynal Synchronization is coming back! 54 c Michel Raynal Synchronization is coming back! 55

From obstruction-freedom to non-blocking Part VI

operation CM .need help(i) is NEED HELP[i] ← true; repeat x ← {j | NEED HELP[j]} until (ev leader(x)= i) end repeat; Read/write-based WAIT-FREE return() IMPLEMENTATIONS end operation. operation CM .stop help(i) is NEED HELP[i] ← false; return() end operation.

c Michel Raynal Synchronization is coming back! 56 c Michel Raynal Synchronization is coming back! 57 The (Static) Renaming Problem Moir-Anderson’s Protocol

• id1,...,idn are the initial identities of the processes

• id1,...,idn ∈ [0..N − 1], where N >>>>> n • Each process has to acquire a new name in the set [0..M − 1] • No two processes can get the same name • Considers M = n(n + 1)/2 • Type of renaming: • Basic idea: a grid of splitters ⋆ Static renaming: a name is acquired once for all

⋆ Dynamic renaming: names are (acquired; released)∗

• Theoretical result: M ≥ 2n − 1 (Herlihy-Shavit)

c Michel Raynal Synchronization is coming back! 58 c Michel Raynal Synchronization is coming back! 59

Grid of Splitters The wait-free algorithm

li 0 1 2 3 4 ki operation get name(idi) 0 01 2 3 4 di ← 0; ri ← 0; movei ← down; while (movei 6= stop) do 1 5 6 7 8 movei ← splitter[di, ri].direction(); case (movei = right) then ri ← ri +1 (move = down) then d ← d +1 2 9 10 11 i i i (movei = stop) then exit loop end case 3 12 13 end while return (n × di + ri − (di(di − 1)/2)) % the new name is the position [di, ri] in the grid % 4 14

c Michel Raynal Synchronization is coming back! 60 c Michel Raynal Synchronization is coming back! 61 A snapshot Object Snapshot operations

• Keeps data provided by processes SM[i] SM[1] SM[n] • When pi invokes store(v) it defines v as its last deposited value • A process invokes snapshot to get the values deposited by the processes • Everything has to appear as if the operations were executed instantaneously (at some time between their invocation and their termina- tion)

update(v) by pi snapshot() by pj, ∀j

c Michel Raynal Synchronization is coming back! 62 c Michel Raynal Synchronization is coming back! 63

Underlying idea Snapshot: Partial proof (easy)

aa [1].sn = a = SM [1].sn bbi[1].sn = a REG[1] i operation update (v) sn ← sn +1; % local seq number generator % aa [2].sn = b bbi[2].sn = b i i REG[2] i SM[i] ← (v,sni) % atomic write %

aai[3].sn = c bbi[3].sn = c operation snapshot REG[3] while true do aai[4].sn = d bbi[4].sn = d = SM [4].sn Ai ← scan; Bi ← scan; REG[4] % double “asynchronous” scan % if (∀j : A [j].sn = B [j].sn) then return (A .val) end if i i i first scan() second scan() snapshot() operation % Ai.val =[Ai[1].val,...,Ai[n].val] % end while time line

linearization point of the snapshot() operation

c Michel Raynal Synchronization is coming back! 64 c Michel Raynal Synchronization is coming back! 65 How an update can help a snapshot Afek et al.s algorithm (1)

up1 up2 operation update ( ): pj v snap int help arrayi ← snapshot(); sni ← sni + 1; pi SM[i] ← (v,sn , help array ) snap i i

c Michel Raynal Synchronization is coming back! 66 c Michel Raynal Synchronization is coming back! 67

Afek et al.s algorithm (2) Snapshot: Proof operation snapshot: could helpi ← ∅; while true do snapshot() pi Ai ← scan; Bi ← scan; % double “asynch” collect % if (∀j : Ai[j].sn = Bi[j].sn) help array then return (Ai.val) update() update() else for j : 1 ≤ j ≤ n do pj snapshot() if (Ai[j].sn 6= Bi[j].sn) then if (j ∈ could helpi) help array then return (Bi[j].help array) else could help ← could help ∪ {j} update() update() i i pk end if end if snapshot() end for successful double scan end if end while

c Michel Raynal Synchronization is coming back! 68 c Michel Raynal Synchronization is coming back! 69 Part VII Definition

An implementation of a concurrent object is

• Mutex-free if it does not use locks HYBRID • Static Hybrid if IMPLEMENTATIONS ⋆ It uses locks only for some operations

• Dynamic Hybrid if

⋆ It never uses locks in “favorable circumstances”

c Michel Raynal Synchronization is coming back! 70 c Michel Raynal Synchronization is coming back! 71

Example Hybridism vs adaptivity

• Static hybrid set:

⋆ add() and remove(): lock-based ⋆ belong(): no lock • Hybridism is wrt to lock and concurrency

• Static hybrid stack/queue • Adaptivity is wrt the concurrency degree (cost has to depends on the numb of competing processes only) ⋆ In concurrency-free contexts: no lock and a bounded number of steps ⋆ In concurrency contexts: locks can be used

c Michel Raynal Synchronization is coming back! 72 c Michel Raynal Synchronization is coming back! 73 Part VII Underlying system

A MUTEX-FREE • Atomic read/write registers ABORTABLE STACK • Compare&swap objects

c Michel Raynal Synchronization is coming back! 74 c Michel Raynal Synchronization is coming back! 75

Compare&Swap: definition Compare&Swap: use by pi

• Let X = a Compare&Swap is a conditional write • pi reads a from register X • It then does computation and computes a new value c for X primitive X.C&S(old, new): if (X = old) then X ← new; return(true) • Finally pi wants to assign c to X if and only if X has not been modified since it read it else return(false) end if. To that end pi issues X.C&S(a, c)

c Michel Raynal Synchronization is coming back! 76 c Michel Raynal Synchronization is coming back! 77 Compare&Swap: the ABA problem Solving the ABA problem

Unfortunately the previous use of Compare&Swap is incor- Associate a new sequence number with every X.C&S rect!

• Initially X = a • X is now a pair ha,sni • At time τ : • At time τ1: pi reads a from X 1 pi reads ha,sni from X • At time τ2 >τ1: At time : pj successfully executes X.C&S(a, b) (X = b) • τ2 >τ1 pj successfully executes X.C&S(ha,sni, hb, sn +1i) • At time τ3 >τ2: At time : pj successfully executes X.C&S(b, a) (X = a) • τ3 >τ2 pk successfully executes X.C&S(hb, sn +1i, ha,sn +2i) • At time τ >τ : 4 3 • At time τ >τ : pi successfully executes X.C&S(a, b) and erroneously be- 4 3 lieves that X has not been modified by another process when pi executes X.C&S(ha,sni, hc, sn + 1i), the write in the interval [τ1..τ4] into X fails and returns false to pi

c Michel Raynal Synchronization is coming back! 78 c Michel Raynal Synchronization is coming back! 79

Stack operations Stack representation (1)

• The stack is of size k • An array STACK [0..k] of atomic registers • Operation push(v) • ∀x : 0 ≤ x ≤ k : STACK [x] has two fields

⋆ returns full if the stack is full, otherwise ⋆ STACK [x].val contains a value ⋆ adds v to the top of the stack and returns done ⋆ STACK [x].sn contains a seq number (used to prevent the ABA problem on this register) • Operation pop() It counts the nb of successful writes on STACK [x]

⋆ returns empty if the stack is empty, otherwise ∀x : 1 ≤ x ≤ k : STACK [x] initialized to h⊥, 0i ⋆ suppresses the value from the top of the stack and returns it • STACK [0] always stores a dummy entry (init to h⊥, −1i)

c Michel Raynal Synchronization is coming back! 80 c Michel Raynal Synchronization is coming back! 81 Stack representation (2) Principle: laziness + helping mechanism

• A push or pop operation

⋆ updates TOP, and • A register TOP that contains the index of the top of the stack plus the corresponding pair hv,sni ⋆ leaves to the next operation the corresponding update of the stack • TOP initialized to h0, ⊥, 0i Hence it helps the previous (push or pop) operation by modifying the stack accordingly • Both STACK [x] and TOP are modified with Compare&Swap

Shafiei N., Non-blocking Array-based Algorithms for Stacks and Queues. Proc. th Int’l Conference on Distributed Computing and Networking (ICDCN’09), Springer Verlag LNCS #5408, pp. 55-66, 2009

c Michel Raynal Synchronization is coming back! 82 c Michel Raynal Synchronization is coming back! 83

Abortable push: weak push() Abortable stack: help procedure

operation weak push(v): (index, value, seqnb) ← TOP; help(index, value, seqnb); procedure help(index, value, seqnb): if (index = k) then return(full) end if; stacktop ← STACK [index].val; sn of next ← STACK [index + 1].sn; STACK [index].C&S(hstacktop, seqnb − 1i, hvalue, seqnbi). newtop ←hindex +1,v,sn of next +1i; if TOP.C&S(hindex, value, seqnbi, newtop) then return(done) else return(⊥) end if.

c Michel Raynal Synchronization is coming back! 84 c Michel Raynal Synchronization is coming back! 85 Abortable pop: weak pop() From an abortable to a non-blocking stack

operation weak pop(): (index, value, seqnb) ← TOP; operation non blocking push(v): help(index, value, seqnb); repeat res ← weak push(v) until res 6= ⊥ end repeat; if (index = 0) then return(empty) end if; return(res). belowtop ← STACK [index − 1]; newtop ←hindex − 1, belowtop.val, belowtop.sn +1i; operation non blocking pop(): if TOP.C&S(hindex, value, seqnbi, newtop) repeat res ← weak pop() until res 6= ⊥ end repeat; then return(value) else return(⊥) end if. return(res).

c Michel Raynal Synchronization is coming back! 86 c Michel Raynal Synchronization is coming back! 87

Part VIII From a non-blocking lock to a starvation-free lock (1)

A Dynamic Hybrid • A non-blocking lock denoted LOCK STARVATION-FREE STACK • A boolean register FLAG[i] (initialized to false) that process pi sets to true when it wants to obtain the starvation-free lock

c Michel Raynal Synchronization is coming back! 88 c Michel Raynal Synchronization is coming back! 89 From a non-blocking lock to a starvation-free lock (2) Operation and additional data structures

operation starvation free lock(): FLAG[i] ← true; wait ((TURN = i) ∨ (¬FLAG[TURN ])); • Operations strong push() or strong pop() LOCK .lock(). denoted strong push or pop() operation starvation free unlock(): • A boolean register CONTENTION (initialized to false) FLAG[i] ← false; that is set to true by a process when it executes the if (¬FLAG[TURN ]) underlying weak operation(par) operation. then TURN ← (TURN mod n)+1 end if; LOCK .unlock();

c Michel Raynal Synchronization is coming back! 90 c Michel Raynal Synchronization is coming back! 91

The implementation (par = v for push() and ⊥ for pop()) Proof

• Lemma 1: If a process pi returns from its strong push(v) or strong pop() operation strong push or pop(par): invocation, it returns a non-⊥ value if (¬CONTENTION ) then res ← weak push or pop(par); • Lemma 2: If, while executing a strong push(v) or strong pop(), a pro- if (res 6= ⊥) then return(res) end if cess pi reads true from CONTENTION at first line or end if; obtains res = ⊥ at the 2nd line, it eventually obtains starvation free lock(); the lock CONTENTION ← true; • Theorem: repeat res ← weak push or pop(par) until (res 6= ⊥); Any invocation of strong push() or strong pop() returns a CONTENTION ← false; non-⊥ value, and all invocations are linearizable starvation free unlock(); Moreover, the algorithm is contention-sensitive: any op- return(res). eration invoked in a contention-free context is lock-free and accesses six times the shared memory

c Michel Raynal Synchronization is coming back! 92 c Michel Raynal Synchronization is coming back! 93 When there are failures Part IX

• The abortable and non-blocking implementations cope with any number of process crashes • The contention-sensitive implementation is wait-free IF THE fundamental ISSUE no process crashes between the invocation of lock() and the return of unlock()

c Michel Raynal Synchronization is coming back! 94 c Michel Raynal Synchronization is coming back! 95

The fundamental problem Example

• Wait-free implement linearizable concurrent objects in presence of process crashes • Example: Build a wait-free queue A from read/write atomic registers in a system where up to n−1 processes Compose base objects to get “more powerful” objects can crash

X: t-fault tolerant Compare&Swap object k = t + 1

Problem: Given two objects A and X, is there

a wait-free implementation of A by X in a [1] [2] [ ] system of n processes (prone to crashes)? x base x base ··· ··· x base k

• Wait-free = live with respect to any number of process • Is it possible?? crashes

c Michel Raynal Synchronization is coming back! 96 c Michel Raynal Synchronization is coming back! 97 Herlihy’s Result The CONSENSUS Problem: Definition

• An object is universal if it can be used to wait-free im- Each process proposes a value and has to decide a value plement any object (that has a seq specification) in such a way that: • Main result (Herlihy 1991):

• Termination: Every correct process eventually Consensus is a universal object decides some value

• Any concurrent object (that has a sequential speci- fication) can be built from a consensus object (and • Validity: If a process decides v, then v was read/write registers) proposed by some process • A Universal construction is an algorithm that, given the sequential specification of an object, constructs a cor- responding concurrent object (from consensus objects • Uniform Agreement: No two (correct or not) and registers) processes decide differently

c Michel Raynal Synchronization is coming back! 98 c Michel Raynal Synchronization is coming back! 99

CONSENSUS in Action! (Concept) ... But there is a bad news: The Main Result

Let X be a consensus object (a single operation Propose(v)) Fischer-Lynch-Paterson’s Impossibility result (1985)

• Consensus is a Synchronization tool: Consensus makes a decision irreversible (only one ac- There is no protocol that solves the tion/value succeeds) consensus problem in asynchronous

x1 ← X.Propose(a) at t1 and x2 ← X.Propose(b) at t2 systems (shared memory or message we always have x1 = x2 (either a or b) whatever the passing) that is subject to even a single invocation times t1 and t2 process crash failure • Consensus solves Non-determinism: Consensus makes a result UNIQUE Let f be a non-deterministic function Fischer M.J., Lynch N.A. and Paterson M.S., Impossibility If x1 ← X.Propose(f(a)) and x2 ← X.Propose(f(a)) of Distributed Consensus with One Faulty Process. Journal we always have x1 = x2 of the ACM, 32(2):374-382, 1985.

c Michel Raynal Synchronization is coming back! 100 c Michel Raynal Synchronization is coming back! 101 Herlihy’s Results Herlihy’s Results

• As the synchronization power of read/write atomic vari- • Which objects allow implementing a consensus object? ables is too weak to implement consensus, are they syn- chronization primitives powerful enough to solve con- • Idea: investigate the synchro power of base objects sensus in a SM asynchronous system? • A few synchronization primitives (synchr objects): Each object has a consensus number The consensus number of X is the largest n for which ⋆ Test&Set (shared)= X solves consensus among n processes [prev ← shared; shared ← 1; return (prev)] • FLP ⇒ CN(Read/Write atomic variables)=1 Which means that read/write operations are not pow- ⋆ Swap (local, shared)= [local ↔ shared] erful enough to solve consensus in presence of even a single crash when n> 1 ⋆ Move (shared1, shared2)= [shared1 ↔ shared2] • Which is the synchronization power needed to solve consensus in a SM asynchronous system? ⋆ Compare&Swap (shared, old, new)= [prev ← shared; if prev = old then shared ← new fi; return (prev)]

c Michel Raynal Synchronization is coming back! 102 c Michel Raynal Synchronization is coming back! 103

Herlihy’s Results Consensus with Test&Set (2 processes)

• Let shared shared variable init to 0 • Let prefer[0], prefer[1] be two shared variables init to ⊥ • Consensus numbers define a hierarchy on the power of synch primitives Propose (v) = % issued by pi (i =0 or 1) % ⋆ CN(Test&Set) = CN(Swap) = CN(Stack)= prefer[i] ← v; CN(Fetch&Add) = CN(Fifo Queue) = ... = 2 val ← Test&Set (shared); case val =0 then return (v) ⋆ CN(Move) = CN(Compare&Swap)= +∞ val =1 then return (prefer[1 − i]) endcase

The “winner” is the first that executes Test&Set(shared)

c Michel Raynal Synchronization is coming back! 104 c Michel Raynal Synchronization is coming back! 105 Consensus with a FIFO Queue (2 processes) Consensus with Compare&Swap (n processes)

• Let queue be a shared queue init to < 0, 1 > (return from an empty queue returns ⊥) • Let shared shared variable init to ⊥ • Let prefer[0], prefer[1] be two shared variables init to ⊥ Propose (v) = % issued by pi (i =1, 2,...,n) %

Propose (v) = % issued by pi (i =0 or 1) % Let first be a local variable; first ←Compare&Swap (shared, ⊥,v) prefer[i] ← v; if first = ⊥ val ←Dequeue (queue); then return (v) case val =0 then return (v) else return (first) val =1 then return (prefer[1 − i]) endif endcase

The “winner” is the first that deposits its v in shared The “winner” is the first process that dequeues 0

c Michel Raynal Synchronization is coming back! 106 c Michel Raynal Synchronization is coming back! 107

An Observation Herlihy’s Hierarchy

Read/Write objects CN =1

Test-and-Set, Swap, Queue, Stack, ... • With Test&Set or FIFO Queue: CN =2 the shared register used to synchronize (determine a winner) and the shared register used to store the decided Herlihy’s Hierarchy value are distinct registers

• With Compare&Swap: CN =+∞ the shared register used to synchronize (determine a Consensus object winner) and the shared register used to store the decided value are the same register Comp-and-Swap, LLCS, Move, ...

c Michel Raynal Synchronization is coming back! 108 c Michel Raynal Synchronization is coming back! 109 What has been learnt from Herlihy’s Hierarchy Part X

• Test&Set, Swap, Fetch&Add, etc. are synchronization primitives that are too weak to implement reliable ob- A UNIVERSAL CONSTRUCTION jects in presence of process crashes A hierarchy on the power of synchr primitives when ad- - Chandra T. and Toueg S., Unreliable Failure Detectors dressing fault-tolerance for Reliable Distributed Systems. Journal of the ACM, 43(2):225-267, 1996. • Additional synchr power is required to design objects tolerant to process crashes. - Herlihy M.P., Wait-free synchronization. ACM Toplas, Object combination does not always work! (e.g., atomic 11(1):124-149, 1991. read/write objects do not allow the construction of more sophisticated objects) - Guerraoui R. and Raynal M., Fault-Tolerance Techniques for Concurrent Objects. Tech Report # 1667, 22 pages, FLP means (here) that fault-masking can be impossible IRISA, Universit´ede Rennes 1 (France), December 2004 to achieve when solving non-trivial problems if we do not rely on powerful enough synch primitives http://www.irisa.fr/bibli/publi/pi/2004/1667/1667.html

c Michel Raynal Synchronization is coming back! 110 c Michel Raynal Synchronization is coming back! 111

The Specification of the Objects Local data structures

• sx is a local variable containing the state of the object • X an object of type T i X as currently known by pi • Operations: X .op(param) (returns always a response) • next sni[1..n] is a local array local of sequence numbers • The type of X is defined by a transition function δ() next sni[j] (initialized to 1) is the next sequence number that, to p ’s knowledge, p will associate with its next ′ i j ⋆ δ(sx,op(param)) is a non-empty set of (sx ,res) pairs operation on X defining all the possible “results” we can obtain when the object X is in the state sx • propi (a list), execi (a list), resulti (an invocation result) ⋆ Each pair (sx′,res) is such that sx′ is a possible new and ki (an integer) are auxiliary variables state of X , and then res is the corresponding result returned by op(param) Notation: ⋆ If the set has only one pair, the type T is deterministic execi[r] = rth element of the list execi Otherwise, it is non-deterministic |execi| = size of the list execi

c Michel Raynal Synchronization is coming back! 112 c Michel Raynal Synchronization is coming back! 113 Shared Base Objects Universal construction (1); User interface

• LAST OP[1 ..n]: shared array of 1WnR atomic registers

Only pi can write LAST OP[i] Each register LAST OP[j ] has two fields: when operation X .op(param) is invoked by pi: resulti ←⊥; ⋆ LAST OP[j].op: last operation invoked by p j LAST OP[i] ← (op(param),next sni[i]); ⋆ LAST OP[j].sn: associated sequence number wait until (resulti 6= ⊥); ⋆ Each entry of the array is initialized to (⊥, 0) return (resulti)

• A list of consensus objects CONS[k] for k =1, 2,... (used sequentially by each process) A process invokes CONS[k].propose (v)

c Michel Raynal Synchronization is coming back! 114 c Michel Raynal Synchronization is coming back! 115

Universal construction (2): Background task Universal construction (3): Background task

while (true) do Assume first that the type T of the object is deterministic Step 1: Build a proposal prop ← ǫ; % empty list % i CONS[k ] propose ( ); for 1 ≤ j ≤ n do execi ← i . propi let = ; if (LAST OP[j].sn ≥ next sn [j]) then ℓ |execi| i for r from 1 to ℓ do add (LAST OP[j].op, j) to propi end if; end for; (sxi,res) ← δ(sxi, exec[r].op); let j = exec [r].proc; Step 2: Try to commit the proposal i next sn [j] ← next sn [j]+1; if (prop 6= ǫ) then k ← k + 1; i i i i i if (i = j) then result ← res end if ...... i end for end if end while

c Michel Raynal Synchronization is coming back! 116 c Michel Raynal Synchronization is coming back! 117 The case of non-deterministic types Correctness proof: The object is live and safe

• Wait-free property: Show that each operation invoked by a correct process terminates whatever the behavior of the other processes • Brute force strategy: This is the liveness property stating that the implemen- Consider a deterministic reduction δ′() of δ() tation ensures the operations do terminate • A nicer solution: • Linearizability (semantics of the objet): Use the consensus invocation to solve ordering + single From an external observer point of view, the operations result of each operation + single resulting state on the concurrent object occur as if the object was accessed sequentially by the processes This is the safety property stating that the implemen- tation of the object is linearizable

c Michel Raynal Synchronization is coming back! 118 c Michel Raynal Synchronization is coming back! 119

A Variant Wait-free vs Non-Blocking

“Simplified” version of the construction where the shared • The previous construction: array LAST OP[1 : n] and the seq numbers are suppressed, prop is a simpe variable (init. to ⊥) and δ(sx , ⊥)= sx i i i ⋆ Satisfies the non-blocking property: if processes pro- pose operations at least one process progress when operation X .op(param) is invoked by pi: ⋆ Does not satisfy the wait-free property: the progress of a correct process cannot be ensured resulti ←⊥; propi ← op(param) ; wait until (result 6= ⊥); return (result ) i i • The universal construction algorithm ensures the wait- free property thanks to a helping mechanism while (true) do % Background Task % ki ← ki + 1; The shared array LAST OP[1 : n] (and the associated execi ← CONS[ki].propose (propi); seq numbers) allows a process to propose to the con- (sx ,res) ← δ(sx , exec .op); sensus instances not only its own operations but all the i i i pending operations let j = execi[r].proc; if (i = j) then resulti ← res; propi ←⊥ end if Helping mechanisms: feature of wait-free computing end while

c Michel Raynal Synchronization is coming back! 120 c Michel Raynal Synchronization is coming back! 121 Part XI From Process Failures to Object Failures

When OBJECTS CAN FAIL • Until now: we have considered that - Jayanti P., Chandra T.D. and Toueg S., Fault-Tolerant only processes may fail, objects (atomic Wait-Free Shared Objects. Journal of the ACM, 45(3):451- registers) were implicitly assumed reli- 500, 1998 able - Guerraoui R. and Raynal M., Fault-Tolerance Techniques for Concurrent Objects. Tech Report # 1667, 22 pages, • From now on: base objects can fail IRISA, Universit´ede Rennes 1 (France), December 2004 http://www.irisa.fr/bibli/publi/pi/2004/1667/1667.html

c Michel Raynal Synchronization is coming back! 122 c Michel Raynal Synchronization is coming back! 123

Object Failure Modes Failure Mode Hierarchy

• An implementation of a shared object is t-fault tolerant • Crash failure of an object: after some time all with respect to a failure mode F (crash, omission, arbi- operations answer (responsive failure mode) trary) if the object remains correct and wait-free despite ⊥ the occurrence of up to t base objects that fail according to F • Omission failure of an object: after some time the answers to some processes are always ⊥ • Failure mode F is less severe than the failure mode G (denoted F ≺G) if a protocol that is t-fault tolerant for • Arbitrary failure of an object: the answers can the failure mode G is also t-fault tolerant for the failure be arbitrary mode F • crashes ≺ omissions ≺ arbitrary failures

c Michel Raynal Synchronization is coming back! 124 c Michel Raynal Synchronization is coming back! 125 State Machine Replication A Wait-Free t-Omission Tolerant Implementation

t +1 copies of the base object: base cons[1..(t + 1)]

sequential traversal of the base consensus objects • Replication management using the State Ma- chine approach (Thomas, Lamport-Schneider’s approach): concurrent RPC-like + votes procedure PROPOSE v • Replication + Sequential iteration on replicas % est, aux,k: local variables of the invoking process % The type of control structure applied to repli- est ← v; cas becomes decisive (concurrent=independent for k from 1 to (t + 1) do vs sequential=dependent) aux ← base cons[k].propose est; if (aux 6= ⊥) then est ← aux endif endfor; return(est)

c Michel Raynal Synchronization is coming back! 126 c Michel Raynal Synchronization is coming back! 127

Graceful Degradation Graceful Degradation: Example

• Let us assume that base objects can fail by crashing. • A wait-free implementation of a shared object is grace- If the implementation remains wait-free and correct de- fully degrading if it never fails more severely than the spite the crash of any number of processes and the crash base objects it is derived from, whatever the number of of up to t base objects, then this implementation is base objects that fail t-fault tolerant with respect to the crash failure mode • Remark: the “severity” relation on failure modes (≺) • If the implementation is wait-free and fails only by crash involves only the existence of a fault-tolerant protocol (if it fails) despite the crash of any number of processes (it does not involve the notion of graceful degradation) and the crash of any number of base objects, then it is gracefully degrading

c Michel Raynal Synchronization is coming back! 128 c Michel Raynal Synchronization is coming back! 129 A Gracefully Degrading t-Omission Tolerant Impl. Impossibility for the Crash Failure Mode

2t +1 copies of the base object: base cons[1..(2t + 1)] • It is impossible to design gracefully degrading procedure PROPOSE v t-fault tolerant wait-free implementations for the crash failure mode (for any object) % V [1..2t + 1],est,k: local variables of the invoking process % est ← v; • It is possible to design gracefully degrading for k from 1 to (2t + 1) do t-fault tolerant wait-free implementations for V [k] ← base cons[k].propose est; the omission (or arbitrary) failure mode if (V [k] 6= ⊥) ∧ (V [k] 6= est) then est ← V [k]; • Hence (Jayanti-Chandra-Toueg 1998): Com- V [1..(k − 1)] ← [⊥,..., ⊥] endif bining fault-tolerance and graceful degradation endfor; is not possible for all failure modes if (#⊥(V ) >t) then return(⊥) else return(est) endif

c Michel Raynal Synchronization is coming back! 130 c Michel Raynal Synchronization is coming back! 131

Conclusion Conclusion

• Linearizability • Wait-free computing • “Wait-Free” concept • Consensus number, Herlihy Hierarchy • Wait-free objects • Universal construction • Process failure vs Object failures • Object failure modes • Fault-tolerant wait-free objects • Process failure vs object failures • t-Resilient wait-free objects • Gracefully Degrading wait-free objects • Gracefully degrading objects

c Michel Raynal Synchronization is coming back! 132 c Michel Raynal Synchronization is coming back! 133 The other “only slide to remember” ! Three books

• Taubenfeld G., Synchronization algorithms and concur- rent programming. Pearson Education/Prentice Hall, 423 pages, 2006 (ISBN 0-131-97259-6) • Herlihy M. and Shavit N., The art of multiprocessor pro- Asynchrony and failures gramming. Morgan Kaufmann, 508 pages, 2008 (ISBN do modify our view of synchronization 978-0-12-370591-4) • Raynal M., Concurrent programming: algorithms, prin- ciples and foundations. Springer 510 pages, November 2012 (ISBN 978-3-642-32026-2)

c Michel Raynal Synchronization is coming back! 134 c Michel Raynal Synchronization is coming back! 135