Distributed Algorithms

Master 2 IFI, CSSR Distributed Algorithms Francesco Bongiovanni INRIA Sophia Antipolis Research Center OASIS Team [email protected] Course web site : deptinfo.unice.fr/~baude/AlgoDist Nov. 2009 Chapter 7 : Failure Detectors, Consensus, Self-Stabilization 1 Acknowledgement . The slides for this lecture are based on ideas and materials from the following sources: . Introduction to Reliable Distributed Programming Guerraoui, Rachid, Rodrigues, Luís, 2006, 300 p., ISBN: 3-540-28845-7 (+ teaching material) . ID2203 Distributed Systems Advanced Course by Prof. Seif Haridi from KTH – Royal Institute of Technology (Sweden) . CS5410/514: Fault-tolerant Distributed Computer Systems Course by Prof. Ken Birman from Cornell University . Distributed Systems : An Algorithmic Approach by Sukumar, Ghosh, 2006, 424 p.,ISBN:1-584-88564-5 (+teaching material) . Various research papers 2 Outline 1. Failure Detectors . Definition . Properties – completeness and accuracy . Classes of FDs . Two algorithms : PFD and EPFD . Leader Election vs Failure Detector 2. Consensus . Definition . Properties . Types of Consensus : regular and uniform . Algorithm: hierarchical consensus 3. Self-stabilization . Principle . Example: Dijkstra's Token ring 3 Failure detectors 4 System models . synchronous distributed system . each message is received within bounded time . each step in a process takes lb < time < ub . each local clock’s drift has a known bound . asynchronous distributed system . no bounds on process execution . no bounds on message transmission delays . arbitrary clock drifts the Internet is an asynchronous distributed system 5 Failure model . First we must decide what do we mean by failure? . Different types of failures . Crash-stop (fail-stop) . A process halts and Crashes does not execute any further operations Omissions . Crash-recovery Crashes and recoveries . A process halts, but then recovers (reboots) after Arbitrary (Byzantine) a while . Crash-stop failures can be detected in synchronous systems . Next: detecting crash-stop failures in asynchronous systems 6 What's a Failure Detector ? Pi Pj 7 What's a Failure Detector ? Crash failure Pi Pj 8 What's a Failure Detector ? Needs to know about PJ's failure Crash failure Pi Pj 9 1. Ping-ack protocol If pj fails, within T time units, pi will send it a ping message, and will time out within another T time units. Detection time = 2T Needs to know about PJ's failure ping Pi Pj ack - Pj replies - Pi queries Pj once every T time units - if Pj does not respond within T time units, Pi marks pj as failed 10 2. Heart-beating protocol Needs to know about PJ's failure heartbeat Pi Pj - Pj maintains a sequence number - if P has not received a new heartbeat for the past i - P send P a heartbeat with T time units, P declares P as failed j i i j incremented seq. number after T' (=T) time units If pj has sent x heartbeats until the time it fails, then pi will timeout within (x+1)*T time units in the worst case, and will detect p as failed. j 11 Failure Detectors . Abstracting time . FD provide information (not necessary fully accurate) about which processes have crashed . Use failure detectors to encapsulate timing assumptions . Black box giving suspicions regarding node failures . Accuracy of suspicions depends on model strength 12 Failure Detectors . Basic properties . Completeness . Every crashed process is suspected . Accuracy . No correct process is suspected Both properties comes in two flavours . Strong and Weak 13 Failure Detectors . Strong Completeness . Every crashed process is eventually suspected by every correct process . Weak Completeness . Every crashed process is eventually suspected by at least one correct process . Strong Accuracy . No correct process is ever suspected . Weak Accuracy . There is at least one correct process that is never suspected 14 Failure Detectors . Classes of FDs Accuracy Weak Strong Eventually Eventually Weak Strong Weak W Q Eventual ◊Q Completeness Weak ◊W Strong Perfect Eventual Eventually S P Strong Perfect Strong ◊S ◊P Synchronous Systems Asynchronous Systems 15 Perfect Failure Detector (P) 16 Perfect Failure Detector (P) 17 Correctness of P . PFD1 (strong completeness) . A crashed node doesn’t send <heartbeat> . Eventually every node will notice the absence of <heartbeat> . PFD2 (strong accuracy) . Assuming local computation is negligible . Maximum time between 2 heartbeats . γ + δ time units . If alive, all nodes will recv hb in time . No inaccuracy 18 Eventually Perfect Failure Detector 19 Eventually Perfect Failure Detector 20 Correctness of EPFD . PFD1 (strong completeness) . Same as before . PFD2 (eventual strong accuracy) . Each time p is inaccurately suspected by a correct q . Timeout T is increased at q . Eventually system becomes synchronous, and T becomes larger than the unknown bound δ (T>γ +δ) . q will receive HB on time, and never suspect p again 21 Leader Election 22 Leader Election vs Failure Detection . Failure detection captures failure behavior . Detect failed nodes . Leader election (LE) also captures failure behavior . Detect correct nodes (a single & same for all) . Formally, leader election is an FD . Always suspects all nodes except one (leader) . Ensures some properties regarding that node 23 Leader Election vs Failure Detection . We’ll define two leader election algorithm . Leader election (LE) which “matches” P . Eventual leader election (Ω) which “matches” eventual P 24 Matching LE and P . P’s properties . P always eventually detects failures (strong completeness) . P never suspects correct nodes (strong accuracy) . Completeness of LE . Informally: eventually ditch crashed leaders . Formally: eventually every correct node trusts some correct node . Accuracy of LE . Informally: never ditch a correct leader . Formally: No two correct nodes trust different correct nodes . Is this really accuracy? . Yes! Assume two nodes trust different correct nodes . One of them must eventually switch, i.e. leaving a correct node 25 LE desirable properties . LE always eventually detects failures . Eventually every correct node trusts some correct node . LE is always accurate . No two correct nodes trust different correct nodes . But the above two permit the following . But P1 is “inaccurately” leaving a correct leader 26 LE desirable properties . To avoid “inaccuracy” we add . Local Accuracy: . If a node is elected leader by pi, all previously elected leaders by pi have crashed Not allowed, as P is 1 correct 27 Leader election - interface 28 Leader election - algorithm 29 Matching Ω and EPFD . Eventual P weakens P by only providing eventual accuracy . Weaken LE to Ω by only guaranteeing eventual agreement LE Properties: LE1 (eventual completeness). Eventually every eventual correct node trusts some correct node LE2 (agreement). No two correct nodes trust different correct nodes LE3 (local accuracy).If a node is elected leader by pi, all previously elected leaders by pi have crashed 30 Eventual Leader election - interface 31 Eventual Leader election - algorithm . See in the book... 32 Consensus (agreement) 33 Consensus . In the consensus problem, the processes propose values and have to agree on one among these values B A C . Solving consensus is key to solving many problems in distributed computing (e.g., total order broadcast, atomic commit, terminating reliable broadcast) 34 Consensus – cannonical application . a set of servers implement a distributed database . a subset of servers participate in a particular transaction . some of the servers may fail . the remaining servers must agree on whether to install the results of the transaction to the database or discard them 35 Consensus – cannonical application 36 Consensus – cannonical application 37 Consensus – basic properties . Termination . Every correct node eventually decides . Agreement . No two correct processes decide differently . Validity . Any value decided is a value proposed . Integrity: . A node decides at most once 38 FLP impossibility result . Consensus in Asynchronous System . Impossibility of consensus in the fail-silent model . FPL (Fischer, Lynch and Peterson 1985) : consensus is impossible in the fail-silent model with deterministic processes, even if only one process crashes . No way to satisfy agreement (safety) and termination (liveness) together 39 How to solve consensus in asynchronous systems with crashes ? . How to solve consensus in the presence of crashes ? . Either we relaxed our system model, that is, we assume partial synchrony . Either we modify the specifications . Constraining the set of inputs . Change the termination property: terminates with some probability . Or... 40 How to solve consensus in asynchronous systems with crashes ? . Intuitively consensus is impossible to solve because : . 1) the decision depends on one process . 2) we have no idea if this process is alive (we have to wait for its message) or dead. Thus we add to the asynchronous system what it needs in order to solve the consensus: . Failure detectors 41 (regular) Consensus 42 (regular) Consensus . Sample execution Question : does it satisfy consensus ? 43 Uniform consensus 44 Uniform consensus Question: Does it satisfy uniform consensus ? 45 Hierarchical consensus . Use perfect fd (P) and best-effort bcast (BEB) . Each node stores its proposal in proposal . Possible to adopt another proposal by changing proposal . Store identity of last adopted proposer in lastprop . Loop through rounds 1 to N . In round i . node i is leader and . broadcasts proposal v, and decides proposal v

Load more