182.703: Problems in Distributed Computing (Part 3)
WS 2019
Ulrich Schmid
Institute of Computer Engineering, TU Vienna
Embedded Computing Systems Group E191-02
[email protected]

Content (Part 3)
• The Role of Synchrony Conditions
• Failure Detectors
• Real-Time Clocks
• Partially Synchronous Models
• Models supporting lock-step round simulations
• Weaker partially synchronous models
• Dynamic distributed systems

U. Schmid 182.703 PRDC 2

The Role of Synchrony Conditions

Recall Distributed Agreement (Consensus)
[Figure: participants holding Yes/No inputs must agree on a single outcome: all meet or none meet]

Recall Consensus Impossibility (FLP)
Fischer, Lynch and Paterson [FLP85]: “There is no deterministic algorithm for solving consensus in an asynchronous distributed system in the presence of a single crash failure.”
Key problem: Distinguish slow from dead!

Consensus Solvability in ParSync [DDS87] (I)
Dolev, Dwork and Stockmeyer investigated consensus solvability in Partially Synchronous Systems (ParSync), varying five “synchrony handles”:
• Processors: synchronous / asynchronous
• Communication: synchronous / asynchronous
• Message order: synchronous (system-wide consistent) / asynchronous (out-of-order)
• Send steps: broadcast / unicast
• Computing steps: atomic receive+send / separate receive, send

Consensus Solvability in ParSync [DDS87] (II)
[Figure: consensus solvability for the combinations of synchrony handles. Axes: communication (async/sync), global message order (async/sync), send steps (ucast/bcast), computing steps (s/r, s+r). Regions: wait-free consensus possible; consensus possible for f=1; consensus impossible]

The Role of Synchrony Conditions
Enable failure detection
• Distinguish slow from dead
Enforce event ordering
• Distinguish “old” from “new”
• Rule out the existence of stale (in-transit) information
• Create non-overlapping “phases of operation” (rounds)

Failure Detectors

Failure Detectors [CT96] (I)
• Chandra & Toueg augmented purely asynchronous systems with (unreliable) failure detectors (FDs):
[Figure: processors p and q, connected by a network and attached to a pressure sensor and a valve]
• Every processor owns a local FD module (an “oracle”; we do not care a priori how it is implemented!)
• In every step [of a purely asynchronous algorithm], the FD can be queried for a hint about failures of other processors

Failure Detectors [CT96] (II)
• FDs may make mistakes; the (time-free!) FD specification restricts the mistakes an FD is allowed to make
• FD hierarchy: A stronger FD specification implies
  – fewer allowed mistakes
  – more difficult problems can be solved using this FD
  – But: the FD implementation is more demanding/difficult
• Every problem Pr has a weakest FD W:
  – There is a purely asynchronous algorithm for solving Pr that uses W
  – Every FD that also allows solving Pr can be transformed (via a purely asynchronous algorithm) to simulate W

Example Failure Detectors (I)
• Perfect failure detector P: Outputs suspect list
  – Strong completeness: Eventually, every process that crashes is permanently suspected by every correct process
  – Strong accuracy: No process is ever suspected before it crashes
• Eventually perfect failure detector ◊P:
  – Strong completeness
  – Eventual strong accuracy: There is a time after which correct processes are never suspected by correct processes

Example Failure Detectors (II)
• Eventually strong failure detector ◊S:
  – Strong completeness
  – Eventual weak accuracy: There is a time after which some correct process is never suspected by correct processes
• Leader oracle Ω: Outputs a single process ID
  – There is a time after which every not yet crashed process outputs the same correct process p (the “leader”)
• Both are weakest failure detectors for consensus (with a majority of correct processes)

Consensus with ◊S: Rotating Coordinator
[Figure: rotating coordinator consensus algorithm using ◊S]
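
The agreement property of a rotating coordinator scheme for n > 2f rests on quorum intersection: if every process waits for n − f others, any two such sets share at least one process, because 2(n − f) > n. The following Python sketch checks this by brute force for small n. The function names are illustrative assumptions and are not taken from [CT96].

```python
from itertools import combinations


def quorums(n: int, f: int):
    """Return all sets of n - f process ids chosen from {0, ..., n-1}."""
    return [frozenset(c) for c in combinations(range(n), n - f)]


def all_quorums_intersect(n: int, f: int) -> bool:
    """True if every pair of (n - f)-sized quorums shares a process."""
    qs = quorums(n, f)
    # An empty frozenset is falsy, so `a & b` is truthy iff a and b intersect.
    return all(a & b for a, b in combinations(qs, 2))


if __name__ == "__main__":
    print(all_quorums_intersect(7, 3))  # n > 2f: quorums of size 4 out of 7
    print(all_quorums_intersect(6, 3))  # n = 2f: quorums of size 3 out of 6
```

With n = 7 and f = 3 every pair of quorums overlaps; with n = 6 and f = 3 the quorums {0, 1, 2} and {3, 4, 5} are disjoint, which is why n > 2f is needed.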

Why Agreement? Intersecting Quorums
[Figure: intersecting quorums for n=7, f=3. Four processes hold v, three hold ⊥; p decides v, and every q changes its estimate to v]

Implementability of FDs
• If we can implement an FD like Ω or ◊S, we can also implement consensus (for n > 2f)
• In a purely asynchronous system
  – it is impossible to solve consensus (FLP result)
  – it is hence also impossible to implement Ω or ◊S
• Back to the key question: What needs to be added to an asynchronous system to make Ω or ◊S implementable?
  – Real-time constraints [ADFT04, …]
  – Order constraints [MMR03, …]
  – ???

Food for Thought
(1) Starting out from the rotating coordinator consensus algorithm with ◊S shown above, devise a consensus algorithm that uses Ω instead of ◊S. (Use Ω in the first phase to receive only from the process that is considered leader, and keep in mind that different receivers may have different leaders for some time. You thus need some additional effort to make their estimates the same if somebody decides early.)
(2) Sketch the proof that your algorithm works correctly in a system of n > 2f processes, where up to f may crash.

Real-Time Clocks

Distributed Systems with RT Clocks
• Equip every processor p with a local RT clock C_p(t)
[Figure: clock time T over real time t; C_p(t) and C_q(t) stay between the lines of slope 1+ρ and 1−ρ. Processors p and q, connected by a network and attached to a pressure sensor and a valve]
• Small clock drift ρ: local clocks progress approximately as real time, with clock rate in [1−ρ, 1+ρ]
• End-to-end delay bounds [τ−, τ+], known a priori

The Role of Real-Time
• Real-time clocks enable both:
  – Failure detection
  – Event ordering
• [Shown later: Real-time clocks are not the only way …]
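
How a real-time clock enables failure detection can be sketched as a ping/pong timeout. The Python model below is a minimal sketch under stated assumptions: the end-to-end bound τ+ covers processing time, the local clock runs at a constant rate in [1−ρ, 1+ρ], and the names `local_timeout` and `do_roundtrip` as well as the numeric values are illustrative only.

```python
from typing import Optional


def local_timeout(tau_plus: float, rho: float) -> float:
    """Timeout in local clock units that covers a real round trip of 2*tau_plus.

    A clock running at rate at most 1 + rho advances by at most
    2 * tau_plus * (1 + rho) while 2 * tau_plus of real time passes.
    """
    return 2.0 * tau_plus * (1.0 + rho)


def do_roundtrip(pong_delay: Optional[float], clock_rate: float,
                 tau_plus: float, rho: float) -> str:
    """Model one ping/pong exchange from p to q.

    pong_delay: real time between sending ping and receiving pong,
                or None if q has crashed and never answers (assumption).
    clock_rate: rate of p's local clock, assumed constant.
    """
    if not (1.0 - rho) <= clock_rate <= (1.0 + rho):
        raise ValueError("clock rate violates the drift bound")
    timeout = local_timeout(tau_plus, rho)
    if pong_delay is None:
        return "DEAD"  # no pong ever arrives, so the timer expires
    elapsed_local = pong_delay * clock_rate  # what p's clock measures
    return "ALIVE" if elapsed_local <= timeout else "DEAD"


if __name__ == "__main__":
    tau_plus, rho = 2.5, 1e-4
    print(do_roundtrip(4.9, 1.00005, tau_plus, rho))   # timely pong
    print(do_roundtrip(None, 1.00005, tau_plus, rho))  # crashed q
```

Under these assumptions a correct q is never reported DEAD as long as the real round-trip time stays within 2τ+, and a crashed q is always reported DEAD.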

Failure Detection: Timeout using RT Clock

status = do_roundtrip(q)
{
    send ping to q
    TO := C_p(t) + 5 seconds
    wait until C_p(t) = TO
    if pong did not arrive then
        return DEAD
    else
        return ALIVE
}

[Figure: timeline of p and q. p sends ping and sets a timer for 5 seconds (+1 … +5); q processes the ping and returns pong; p processes the pong. Pong before TO: ALIVE; pong after TO: DEAD]

p can reliably detect whether q has been alive recently, if
• the end-to-end delays are at most τ+ = 2.5 seconds
• τ+ is known a priori [at coding time]

Event Ordering: Via Clock Synchronization
Internal CS:
• Precision |C_p(t) − C_q(t)| ≤ π
• Progress like RT (small drift ρ)
• CS-Alg must periodically resynchronize
External CS:
• Accuracy |C_p(t) − t| ≤ α
• CS-Alg needs access to RT
• External CS implies internal CS with π = 2α
[Figure: clock time T over real time t. Internal CS: C_p(t) and C_q(t) are at most π apart. External CS: t − α ≤ C_p(t) ≤ t + α around the line T = t]

Internal CS: Generate Periodic Resync Event
Via synchronized clocks
• p: C_p(t_1) = P, C_p(t_2) = 2P, C_p(t_3) = 3P
• q: C_q(t_1′) = P, C_q(t_2′) = 2P, C_q(t_3′) = 3P
• Resync events at p and q are at most π apart
+ no message overhead
− requires initial synchrony
Via message exchange [ST87]
• Trigger phase, followed by exchange phase
• p: C_p(t_k) = kP; q: C_q(t_k′) = kP
− message overhead
+ no initial synchrony required

Internal CS: Estimate Remote Clocks
One-way: p sends at C_p(t_k) = kP
+ 1 message only
− Must know d and ε (and thus the maximum delay d + ε)
Estimate [at q]: C_p(t_q) = kP + d + ε/2, |Error| ≤ ε/2
Round-trip: p sends at C_p(t_k) = kP
− 2 messages & larger error, BUT
+ Round-trip time U can be measured locally → need to know d only [compute ε]
Estimate [at p]: C_q(t_p) = kP + 2d + ε, |Error| ≤ ε
U ≈ 2d → ε ≈ 0: “Probabilistic CS” [Cri89]
[Figure: message timelines between p and q. One-way: delay between d and d + ε, received at t_q. Round-trip: request “Time?” from p, reply T_q from q, received at t_p; total delay U between 2d and 2d + 2ε]

Internal CS: FT Midpoint Algo [LWL88]
• A priori bounded [τ−, τ+] allows estimating all remote clocks
• Discard the f largest and f smallest clock readings (could be faulty)
• Set local clock to midpoint of remaining interval
[Figure: clocks C_p and C_q are at most π apart before resync and at most π′ ≤ π/2 apart after resync]
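
The resynchronization step of the fault-tolerant midpoint rule can be sketched in a few lines of Python. This is a minimal sketch, assuming the remote clock estimates have already been collected; the function name `ft_midpoint` and the sample readings are illustrative, and the code only checks that some readings remain after discarding, not the resilience bound required by [LWL88].

```python
from typing import Sequence


def ft_midpoint(readings: Sequence[float], f: int) -> float:
    """Return the fault-tolerant midpoint of the given clock readings.

    Discards the f smallest and f largest readings (which could stem from
    faulty clocks) and returns the midpoint of the remaining interval.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    if len(readings) <= 2 * f:
        raise ValueError("need more than 2f readings")
    ordered = sorted(readings)
    kept = ordered[f:len(ordered) - f]
    return (kept[0] + kept[-1]) / 2.0


if __name__ == "__main__":
    # Seven estimated clock values; 250.0 and -5.0 mimic faulty clocks.
    readings = [100.0, 101.0, 103.0, 99.0, 250.0, -5.0, 102.0]
    print(ft_midpoint(readings, f=2))  # midpoint of [100.0, 102.0] -> 101.0
```

Discarding f readings on each side guarantees that the remaining extremes lie within the range spanned by correct clocks, so faulty values cannot pull the new clock value arbitrarily far.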

Internal CS in Biological Systems
• Malaccae fireflies
  – Male fireflies emit light pulses about once per second
  – A swarm of fireflies eventually flashes in synchrony
• Cardiac ganglion of lobster heart
  – Heart activation by synchronized firing of 4 interneurons
  – Ganglion controls activation frequency within some bounds, without losing synchrony
  – Evolution-optimized strategy: fault tolerance, self-stabilization, etc.

SS+FT Pulse Generation Alg. [DDP03]
• Pulse-coupled oscillators (“integrate-and-fire”)
  – Time-decaying refractory function ~ own node’s sense of time
  – Time-decaying threshold level ~ perception of pulses from peers
  – Pulse is generated [+ refractory function reset] when the refractory function hits the threshold level
[Figure: refractory function and threshold level over time t during one endogenous cycle, for peers synced and peers not synced]

SS+FT Clock Sync Algorithm [DDP03b]
• Allows building a linear-time self-stabilizing, Byzantine fault-tolerant clock sync algorithm, by
  – using synchronized pulses as a pacemaker
  – employing a self-stabilizing Byzantine agreement algorithm acting on clock values
• Also solves the initial clock synchronization problem

External CS: Global Positioning System (GPS)
GPS satellites
• Satellite clocks synchronized to USNO atomic master clock
• Satellite i: time t_i (known), 3D-pos s_i (known)
GPS receiver
• Rec. time in GPS time (unknown)
• Local rec. time: T_1, T_2 (known)
• Clk. offset Δ = T − t_GPS (unknown)
• Rec. 3D-pos: χ (unknown)
• Solves system of equations t_i + |χ − s_i|/c + Δ = T_i
• 4 satellites required to determine χ = (x, y, z) and Δ
• 1 satellite sufficient for Δ if χ is already known

Why are Synchronized Clocks Useful?
• Synchronized clocks allow simulating communication-closed lock-step rounds via clock time [NT93]:
[Figure: p starts rounds at C_p(t_1) = R, C_p(t_2) = 2R, C_p(t_3) = 3R; q starts rounds at C_q(t_1′) = R, C_q(t_2′) = 2R, C_q(t_3′) = 3R. Round starts at p and q are at most π apart; message delays are at most τ+]
• Only requirement: R ≥ τ+ + π must hold!
• Lock-step rounds imply perfect failure detection at the end of rounds
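
The round-duration requirement R ≥ τ+ + π can be checked with a small timing model. The Python sketch below is a simplification under stated assumptions: clock drift within a round is ignored, p's clock serves as the time reference, q's round boundaries are shifted by an offset in [−π, π], and messages that arrive before q has entered round k are assumed to be buffered until then. All function names are illustrative.

```python
def delivered_in_round(k: int, R: float, q_offset: float, delay: float) -> bool:
    """True if a round-k message from p reaches q before q leaves round k.

    p sends at the start of its round k (reference time k * R).
    q leaves round k at reference time (k + 1) * R + q_offset.
    """
    send_time = k * R
    q_round_end = (k + 1) * R + q_offset
    return send_time + delay <= q_round_end


def rounds_are_closed(R: float, tau_plus: float, pi: float,
                      k: int = 1, steps: int = 10) -> bool:
    """Check delivery for a grid of clock offsets and message delays."""
    for i in range(steps + 1):
        q_offset = -pi + 2 * pi * i / steps   # offset in [-pi, pi]
        for j in range(steps + 1):
            delay = tau_plus * j / steps      # delay in [0, tau_plus]
            if not delivered_in_round(k, R, q_offset, delay):
                return False
    return True


if __name__ == "__main__":
    tau_plus, pi = 2.5, 0.5
    print(rounds_are_closed(3.0, tau_plus, pi))  # R = tau_plus + pi
    print(rounds_are_closed(2.9, tau_plus, pi))  # R too short
```

The worst case is a maximally delayed message sent to a receiver whose clock is ahead by π. With R = τ+ + π this message still arrives just in time; any shorter round lets it slip into the next round, so a missing round-k message no longer proves that its sender crashed.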
