Kein Folientitel

Total Page:16

File Type:pdf, Size:1020Kb

Kein Folientitel 182.703: Problems in Distributed Computing (Part 3) WS 2019 Ulrich Schmid Institute of Computer Engineering, TU Vienna Embedded Computing Systems Group E191-02 [email protected] Content (Part 3) The Role of Synchrony Conditions Failure Detectors Real-Time Clocks Partially Synchronous Models Models supporting lock-step round simulations Weaker partially synchronous models Dynamic distributed systems U. Schmid 182.703 PRDC 2 The Role of Synchrony Conditions U. Schmid 182.703 PRDC 3 Recall Distributed Agreement (Consensus) Yes No Yes NoYes? None? meet All meet Yes No No U. Schmid 182.703 PRDC 4 Recall Consensus Impossibility (FLP) Fischer, Lynch und Paterson [FLP85]: “There is no deterministic algorithm for solving consensus in an asynchronous distributed system in the presence of a single crash failure.” Key problem: Distinguish slow from dead! U. Schmid 182.703 PRDC 5 Consensus Solvability in ParSync [DDS87] (I) Dolev, Dwork and Stockmeyer investigated consensus solvability in Partially Synchronous Systems (ParSync), varying 5 „synchrony handles“ : • Processors synchronous / asynchronous • Communication synchronous / asynchronous • Message order synchronous (system-wide consistent) / asynchronous (out-of-order) • Send steps broadcast / unicast • Computing steps atomic rec+send / separate rec, send U. Schmid 182.703 PRDC 6 Consensus Solvability in ParSync [DDS87] (II) Wait-free consensus possible Consensus impossible s/r Consensus possible s+r for f=1 ucast bcast async sync Global message order message Global async sync Communication U. Schmid 182.703 PRDC 7 The Role of Synchrony Conditions Enable failure detection Enforce event ordering • Distinguish „old“ from „new“ • Ruling out existence of stale (in-transit) information • Distinguish slow from dead • Creating non-overlapping „phases of operation“ (rounds) U. Schmid 182.703 PRDC 8 Failure Detectors U. Schmid 182.703 PRDC 9 Failure Detectors [CT96] (I) • Chandra & Toueg augmented purley asynchronous systems with (unreliable) failure detectors (FDs): Pressure Sensor Valve Proc p Network Proc q • Every processor owns a local FD module (an „oracle“ – we do not a priori care about how it is implemented!) • In every step [of a purely asynchronous algorithm], the FD can be queried for a hint about failures of other procs U. Schmid 182.703 PRDC 10 Failure Detectors [CT96] (II) • make mistakes – the (time-free!) FD specification restricts the allowed mistakes of a FD • FD hierarchy: A stronger FD specification implies – less allowed mistakes – more difficult problems to be solved using this FD – But: FD implementation more demanding/difficult • Every problem Pr has a weakest FD W: – There is a purely asynchronous algorithm for solving Pr that uses W – Every FD that also allows to solve Pr can be transformed (via a purely asynchronous algorithm) to simulate W U. Schmid 182.703 PRDC 11 Example Failure Detectors (I) • Perfect failure detector P: Outputs suspect list – Strong completeness: Eventually, every process that crashes is permanently suspected by every correct process – Strong accuracy: No process is ever suspected before it crashes • Eventually perfect failure detector ◊P: – Strong completeness – Eventual strong accuracy: There is a time after which correct processes are never suspected by correct processes U. Schmid 182.703 PRDC 12 Example Failure Detectors (II) • Eventually strong failure detector ◊S: – Strong completeness – Eventual weak accuracy: There is a time after which some correct process is never suspected by correct processes • Leader oracle Ω: Outputs a single process ID – There is a time after which every not yet crashed process outputs the same correct process p (the „leader“) • Both are weakest failure detectors for consensus (with majority of correct processes) U. Schmid 182.703 PRDC 13 Consensus with ◊S: Rotating Coordinator U. Schmid 182.703 PRDC 14 Why Agreement? Intersecting Quorums Intersecting Quorums: v v v v ┴ ┴ ┴ n=7 f=3 p decides v every q changes its estimate to v U. Schmid 182.703 PRDC 15 Implementability of FDs • If we can implement a FD like Ω or ◊S, we can also implement consensus (for n > 2f ) • In a purely asynchronous system – it is impossible to solve consensus (FLP result) – it is hence also impossible to implement Ω or ◊S • Back at key question: What needs to be added to an asynchronous system to make Ω or ◊S implementable? – Real-time constraints [ADFT04, …] – Order constraints [MMR03, …] – ??? U. Schmid 182.703 PRDC 16 Food for Thoughts (1) Starting out from the rotating coordinator consensus algorithm with ◊S shown above, devise a consensus algorithm that uses Ω instead of ◊S. (Use Ω in the first phase to receive only from the process that is considered leader, and keep in mind that different receivers may have different leaders for some time. You thus need some additional effort to make their estimates the same if somebody decides early.) (2) Sketch the proof that your algorithm works correctly in a system of n > 2f processes, where up to f may crash. U. Schmid 182.703 PRDC 17 Real-Time Clocks U. Schmid 182.703 PRDC 18 Distributed Systems with RT Clocks • Equip every processor p with a local RT clock Cp(t) T 1+ ρ C (t) C (t) 1− ρ Pressure p q Sensor Valve t Proc p Network Proc q • Small clock drift ρ local clocks progress approximately as real-time, with clock rate [1-ρ,1+ ρ] • End-to-end delay bounds [τ-, τ+], a priori known U. Schmid 182.703 PRDC 19 The Role of Real-Time • Real-time clocks enable both: Failure detection Event ordering • [Show later: Real-time clocks are not the only way …] U. Schmid 182.703 PRDC 20 Failure Detection: Timeout using RT Clock 5 seconds status = do_roundtrip(q) TO afterbefore pong: pong: ALIVEDEAD { send ping to q +1 +2 +3 +4 +5 p TO := Cp(t) + 5 seconds set timer process pong wait until Cp(t) = TO ping if pong did not arrive then return DEAD pong else q t return ALIVE process ping } p can reliably detect whether q has been alive recently, if • the end-to-end delays are at most τ+ = 2.5 seconds • τ+ is known a priori [at coding time] U. Schmid 182.703 PRDC 21 Event Ordering: Via Clock Synchronization Internal CS: External CS: • Precision |Cp(t) - Cq(t)| ≤ π • Accuracy |Cp(t) – t | ≤ α • Progress like RT (small drift ρ) • CS-Alg needs access to RT • CS-Alg must periodically • External CS internal CS π = 2α resynchronize T = t T T α Cp(t) t + α ≤ π ≥ Cp(t) ≥ α Cq(t) t - α t t U. Schmid 182.703 PRDC 22 Internal CS: Generate Periodic Resync Event Via sync. clocks Cp(t1) = P Cp(t2) = 2P Cp(t3) = 3P + no message p overhead ≤ π − requires initial synchrony q C (t ’) = P C (t ’) = 2P C (t ’) = 3P q 1 q 2 q 3 Via message Trigger phase Exchange phase exchange [ST87] p − message overhead Cp(tk) = kP + no initial synchrony required q Cq(tk’) = kP U. Schmid 182.703 PRDC 23 Internal CS: Estimate Remote Clocks One-way: Cp(tk) = kP Cp(tq) + 1 message only p d ε − Must know d and ε (and thus d = d + ε) kP Estimate [at q]: max C (t ) = kP+d+ε/2 q p q t |Error| ≤ ε/2 q Round-trip: C (t ) = kP − 2 messages & larger error, p k 2d 2ε BUT p tp + Round-trip time U can be U Estimate [at p]: measured locally → need to Time ? Cq(tp) = kP+2d+ε know d only [compute ε] Tq q |Error| ≤ ε + U ≈ 2d → ε ≈ 0 tq “Probabilistic CS” [Cri89] Cq(tp) U. Schmid 182.703 PRDC 24 Internal CS: FT Midpoint Algo [LWL88] A priori bounded [τ-, τ+] allows to estimate all remote clocks Discard f largest and f smallest clock readings (could be faulty) Set local clock to midpoint of remaining interval π´≤ π/2 After resync … C p p Cp Cq q C Before resync … π q U. Schmid 182.703 PRDC 25 Internal CS in Biological Systems • Malaccae Fireflies – Male fireflies emit light pulses ~ once per second – Swarm of fireflies eventually flash in synchrony • Cardiac ganglion of lobster heart – Heart activation by synchronized firing of 4 interneurons – Ganglion controls activation frequency within some bounds, without losing synchrony – Evolution-optimized strategy: Fault- tolerance, self-stabilization, etc. U. Schmid 182.703 PRDC 26 SS+FT Pulse Generation Alg. [DDP03] • Pulse-coupled oscillators (“integrate-and-fire”) – Time-decaying refractory function ~ own node’s sense of time – Time-decaying threshold level ~ perception of pulses from peers – Pulse is generated [+ refractory function reset] when refractory function hits threshold level Peers synced Peers not synced endogenous cycle t U. Schmid 182.703 PRDC 27 SS+FT Clock Sync Algorithm [DDP03b] • Allows to build linear-time self-stabilizing, Byzantine fault-tolerant clock sync algorithm, by – using synchronized pulses as a pacemaker – employing a self-stabilizing Byzantine agreement algorithm acting on clock values • Also solves initial clock synchronization problem U. Schmid 182.703 PRDC 28 External CS: Global Positioning System (GPS) GPS satellites Satellite clocks synchronized to Time: t1 (known) Time: t2 (known) USNO atomic 3D-pos: s1 (known) 3D-pos: s2 (known) master clock GPS-Receiver Rec. time: t1, t2 (unknown) solves system of equations Local rec. time: T , T Clk.offset Δ = T – t (unknown) 1 2 GPS ti+|χ-si|/c+Δ = Ti (known) Rec. 3D-pos: χ (unknown) • 4 satellites required to determine χ = (x, y, z) and Δ • 1 satellite sufficient for Δ if χ is already known U. Schmid 182.703 PRDC 29 Why are Synchronized Clocks Useful? • Synchronized clocks allow to simulate communication- closed lock-step rounds via clock time [NT93]: C (t ) = R C (t ) = 2R C (t ) = 3R p 1 p 2 p 3 p ≤ τ+ t ≤ π q C (t ’) = R C (t ’) = 2R C (t ’) = 3R q 1 q 2 q 3 • Only requirement: R ≥ τ+ + π holds! • Lock-step rounds perfect failure detection at end of rounds U.
Recommended publications
  • ACM SIGACT News Distributed Computing Column 28
    ACM SIGACT News Distributed Computing Column 28 Idit Keidar Dept. of Electrical Engineering, Technion Haifa, 32000, Israel [email protected] Sergio Rajsbaum, who edited this column for seven years and established it as a relevant and popular venue, is stepping down. This issue is my first step in the big shoes he vacated. I would like to take this opportunity to thank Sergio for providing us with seven years’ worth of interesting columns. In producing these columns, Sergio has enjoyed the support of the community at-large and obtained material from many authors, who greatly contributed to the column’s success. I hope to enjoy a similar level of support; I warmly welcome your feedback and suggestions for material to include in this column! The main two conferences in the area of principles of distributed computing, PODC and DISC, took place this summer. This issue is centered around these conferences, and more broadly, distributed computing research as reflected therein, ranging from reviews of this year’s instantiations, through influential papers in past instantiations, to examining PODC’s place within the realm of computer science. I begin with a short review of PODC’07, and highlight some “hot” trends that have taken root in PODC, as reflected in this year’s program. Some of the forthcoming columns will be dedicated to these up-and- coming research topics. This is followed by a review of this year’s DISC, by Edward (Eddie) Bortnikov. For some perspective on long-running trends in the field, I next include the announcement of this year’s Edsger W.
    [Show full text]
  • Distributed Consensus: Performance Comparison of Paxos and Raft
    DEGREE PROJECT IN INFORMATION AND COMMUNICATION TECHNOLOGY, SECOND CYCLE, 30 CREDITS STOCKHOLM, SWEDEN 2020 Distributed Consensus: Performance Comparison of Paxos and Raft HARALD NG KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE Distributed Consensus: Performance Comparison of Paxos and Raft HARALD NG Master in Software Engineering of Distributed Systems Date: September 10, 2020 Supervisor: Lars Kroll (RISE), Max Meldrum (KTH) Examiner: Seif Haridi School of Electrical Engineering and Computer Science Host company: RISE Swedish title: Distribuerad Konsensus: Prestandajämförelse mellan Paxos och Raft iii Abstract With the growth of the internet, distributed systems have become increasingly important in order to provide more available and scalable applications. Con- sensus is a fundamental problem in distributed systems where multiple pro- cesses have to agree on the same proposed value in the presence of partial failures. Distributed consensus allows for building various applications such as lock services, configuration manager services or distributed databases. Two well-known consensus algorithms for building distributed logs are Multi-Paxos and Raft. Multi-Paxos was published almost three decades before Raft and gained a lot of popularity. However, critics of Multi-Paxos consider it difficult to understand. Raft was therefore published with the motivation of being an easily understood consensus algorithm. The Raft algorithm shares similar characteristics with a practical version of Multi-Paxos called Leader- based Sequence Paxos. However, the algorithms differ in important aspects such as leader election and reconfiguration. Existing work mainly compares Multi-Paxos and Raft in theory, but there is a lack of performance comparisons in practice. Hence, prototypes of Leader- based Sequence Paxos and Raft have been designed and implemented in this thesis.
    [Show full text]
  • 2011 Edsger W. Dijkstra Prize in Distributed Computing
    2011 Edsger W. Dijkstra Prize in Distributed Computing We are proud to announce that the 2011 Edsger W. Dijkstra Prize in Distributed Computing is awarded to Hagit Attiya, Amotz Bar-Noy, and Danny Dolev, for their paper Sharing Memory Robustly in Message-Passing Systems which appeared in the Journal of the ACM (JACM) 42(1):124-142 (1995). This fundamental paper presents the first asynchronous fault-tolerant simulation of shared memory in a message passing system. In 1985 Fischer, Lynch, and Paterson (FLP) proved that consensus is not attainable in asynchronous message passing systems with failures. In the following years, Loui and Abu-Amara, and Herlihy proved that solving consensus in an asynchronous shared memory system is harder than implementing a read/write register. However, it was not known whether read/ write registers are implementable in asynchronous message-passing systems with failures. Hagit Attiya, Amotz Bar-Noy, and Danny Dolev’s paper (ABD) answered this fundamental question afrmatively. The ABD paper makes an important foundational contribution by allowing one to “view the shared-memory model as a higher-level language for designing algorithms in asynchronous distributed systems.” In a manner similar to that in which Turing Machines were proved equivalent to Random Access Memory, ABD proved that the implications of FLP are not idiosyncratic to a particular model. Impossibility results and lower bounds can be proved in the higher-level shared memory model, and then automatically translated to message passing. Besides FLP, this includes results such as the impossibility of k-set agreement in asynchronous systems as well as various weakest failure detector constructions.
    [Show full text]
  • The Need for Language Support for Fault-Tolerant Distributed Systems
    The Need for Language Support for Fault-tolerant Distributed Systems Cezara Drăgoi1, Thomas A. Henzinger∗2, and Damien Zufferey†3 1 INRIA, ENS, CNRS Paris, France [email protected] 2 IST Austria Klosterneuburg, Austria [email protected] 3 MIT CSAIL Cambridge, USA [email protected] Abstract Fault-tolerant distributed algorithms play an important role in many critical/high-availability applications. These algorithms are notoriously difficult to implement correctly, due to asyn- chronous communication and the occurrence of faults, such as the network dropping messages or computers crashing. Nonetheless there is surprisingly little language and verification support to build distributed systems based on fault-tolerant algorithms. In this paper, we present some of the challenges that a designer has to overcome to implement a fault-tolerant distributed system. Then we review different models that have been proposed to reason about distributed algorithms and sketch how such a model can form the basis for a domain-specific programming language. Adopting a high-level programming model can simplify the programmer’s life and make the code amenable to automated verification, while still compiling to efficiently executable code. We conclude by summarizing the current status of an ongoing language design and implementation project that is based on this idea. 1998 ACM Subject Classification D.3.3 Language Constructs and Features Keywords and phrases Programming language, Fault-tolerant distributed algorithms, Auto- mated verification Digital Object Identifier 10.4230/LIPIcs.xxx.yyy.p 1 Introduction Replication of data across multiple data centers allows applications to be available even in the case of failures, guaranteeing clients access to the data.
    [Show full text]
  • R S S M : 20 Y A
    Robust Simulation of Shared Memory: 20 Years After∗ Hagit Attiya Department of Computer Science, Technion and School of Computer and Communication Sciences, EPFL Abstract The article explores the concept of simulating the abstraction of a shared memory in message passing systems, despite the existence of failures. This abstraction provides an atomic register accessed with read and write opera- tions. This article describes the Attiya, Bar-Noy and Dolev simulation, its origins and generalizations, as well as its applications in theory and practice. Dedicated to the 60th birthday of Danny Dolev 1 Introduction In the summer of 1989, I spent a week in the bay area, visiting Danny Dolev, who was at IBM Almaden at that time, and Amotz Bar-Noy, who was a post- doc at Stanford, before going to PODC in Edmonton.1 Danny and Amotz have already done some work on equivalence between shared memory and message- passing communication primitives [14]. Their goal was to port specific renaming algorithms from message-passing systems [10] to shared-memory systems, and therefore, their simulation was tailored for the specific construct—stable vectors– used in these algorithms. Register constructions were a big fad at the time, and motivated by them, we were looking for a more generic simulation of shared-memory in message passing systems. ∗This article is based on my invited talk at SPAA 2008. 1 During the same visit, Danny and I also worked on the first snapshot algorithm with bounded memory; these ideas, tremendously simplified and improved by our coauthors later, lead to the atomic snapshots paper [2].
    [Show full text]
  • On the Minimal Synchronism Needed for Distributed Consensus
    On the Minimal Synchronism Needed for Distributed Consensus DANNY DOLEV Hebrew University, Jerusalem, Israel CYNTHIA DWORK Cornell Univer.sity, Ithaca, New York AND LARRY STOCKMEYER IBM Ahnaden Research Center, San Jose, California Abstract. Reaching agreement is a primitive of distributed computing. Whereas this poses no problem in an ideal, failure-free environment, it imposes certain constraints on the capabilities of an actual system: A system is viable only if it permits the existence of consensus protocols tolerant to some number of failures. Fischer et al. have shown that in a completely asynchronous model, even one failure cannot be tolerated. In this paper their work is extended: Several critical system parameters, including various synchrony conditions, are identified and how varying these affects the number of faults that can be tolerated is examined. The proofs expose general heuristic principles that explain why consensus is possible in certain models but not possible in others. Categories and Subject Descriptors: C.2.4 [Computer-Communication Networks]: Distributed Systems- distributed applications; distributed databases; network operating systems; C.4 [Performance of Systems]: reliability, availability, and serviceability; F. I .2 [Computation by Abstract Devices]: Modes of Computation-parallelism; H.2.4 [Database Management]: Systems-distributed systems General Terms: Algorithms, Reliability, Theory, Verification Additional Key Words and Phrases: Agreement problem, asynchronous system, Byzantine Generals problem, commit problem, consensus problem, distributed computing, fault tolerance, reliability 1. Introduction The problem of reaching agreement among separated processors is a fundamental problem of both practical and theoretical importance in the area of distributed systems; (see, e.g., [I], [7], [8], [ 1 l] and [ 121).We consider a system of N processors A preliminary version of this paper appears in the Proceedings of the 24th Annual Symposium on Foundations of Computer Science, November 7-9, 1983, Tucson, Ariz., pp.
    [Show full text]
  • Distinguished Lecture Series
    DISTINGUISHED LECTURE SERIES NANCY LYNCH NEC Professor of Software Science and Engineering, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology “A Theoretical View of Distributed Systems” Thursday, January 30, 2020 • 2-3 p.m Donald Bren Hall, Room 6011 ABSTRACT For several decades, my collaborators, students, and I have worked on theory for distributed systems, in order to understand their capabilities and limitations in a rigorous, mathematical way. This work has produced many different kinds of results, including: • Abstract models for problems that are solved by distributed systems, and for the algorithms used to solve them, • Rigorous proofs of algorithm correctness and performance properties (also some error discoveries), • Impossibility results and lower bounds, expressing inherent limitations of distributed systems, • Some new algorithms, and • General mathematical foundations for modeling and analyzing distributed systems. These various results have spanned many different kinds of systems, ranging from distributed data- management systems, to communication systems, to biological systems such as insect colonies and brains. In this talk, I will overview some highlights of our work over many years on theory for distributed systems. I will break this down in terms of three intertwined “research threads”: algorithms for traditional distributed systems, impossibility results, and mathematical foundations. At the end, I will say something about ourrecent work on algorithms for new kinds of distributed systems. BIO Nancy Lynch is the NEC Professor of Software Science and Engineering in MIT’s EECS department. She heads the Theory of Distributed Systems research group in the Computer Science and AI Laboratory. She received her PhD from MIT and her BS from Brooklyn College, both in Mathematics Lynch has (co-)written many research articles about distributed algorithms and impossibility results, and about formal modeling and verification of distributed systems.
    [Show full text]
  • BEATCS No 119
    ISSN 0252–9742 Bulletin of the European Association for Theoretical Computer Science EATCS E A T C S Number 119 June 2016 Council of the European Association for Theoretical Computer Science President:Luca Aceto Iceland Vice Presidents:Paul Spirakis United Kingdom and Greece Antonin Kucera Czech Republic Giuseppe Persiano Italy Treasurer:Dirk Janssens Belgium Bulletin Editor:Kazuo Iwama Kyoto,Japan Lars Arge Denmark Anca Muscholl France Jos Baeten The Netherlands Luke Ong UK Lars Birkedal Denmark Catuscia Palamidessi France Mikolaj Bojanczyk Poland Giuseppe Persiano Italy Fedor Fomin Norway Alberto Policriti Italy Pierre Fraigniaud France Alberto Marchetti Spaccamela Italy Leslie Ann Goldberg UK Vladimiro Sassone UK Magnus Halldorsson Iceland Thomas Schwentick Germany Monika Henzinger Austria Jukka Suomela Finland Christos Kaklamanis Greece Thomas Wilke Germany Elvira Mayordomo Spain Peter Widmayer Switzerland Michael Mitzenmacher USA Gerhard Woeginger¨ The Netherlands Past Presidents: Maurice Nivat (1972–1977)Mike Paterson (1977–1979) Arto Salomaa (1979–1985)Grzegorz Rozenberg (1985–1994) Wilfred Brauer (1994–1997)Josep D´iaz (1997–2002) Mogens Nielsen (2002–2006)Giorgio Ausiello (2006–2009) Burkhard Monien (2009–2012) Secretary Office:Ioannis Chatzigiannakis Italy Efi Chita Greece EATCS Council Members email addresses Luca Aceto ..................................... [email protected] Lars Arge .............................. [email protected] Jos Baeten ............................... [email protected] Lars Birkedal ............................ [email protected] Mikolaj Bojanczyk .......................... [email protected] Fedor Fomin ................................ [email protected] Pierre Fraigniaud .. [email protected] Leslie Ann Goldberg ................ [email protected] Magnus Halldorsson ........................ [email protected] Monika Henzinger ................ [email protected] Kazuo Iwama ........................ [email protected] Dirk Janssens ........................
    [Show full text]
  • Chapter on Distributed Computing
    Chapter on Distributed Computing Leslie Lamport and Nancy Lynch February 3, 1989 Contents 1 What is Distributed Computing? 1 2 Models of Distributed Systems 2 2.1 Message-PassingModels . 2 2.1.1 Taxonomy......................... 2 2.1.2 MeasuringComplexity . 6 2.2 OtherModels........................... 8 2.2.1 SharedVariables . .. .. .. .. .. 8 2.2.2 Synchronous Communication . 9 2.3 FundamentalConcepts. 10 3 Reasoning About Distributed Algorithms 12 3.1 ASystemasaSetofBehaviors . 13 3.2 SafetyandLiveness. 14 3.3 DescribingaSystem . .. .. .. .. .. .. 14 3.4 AssertionalReasoning . 17 3.4.1 Simple Safety Properties . 17 3.4.2 Liveness Properties . 20 3.5 DerivingAlgorithms . 25 3.6 Specification............................ 26 4 Some Typical Distributed Algorithms 27 4.1 Shared Variable Algorithms . 28 4.1.1 MutualExclusion. 28 4.1.2 Other Contention Problems . 31 4.1.3 Cooperation Problems . 32 4.1.4 Concurrent Readers and Writers . 33 4.2 DistributedConsensus . 34 4.2.1 The Two-Generals Problem . 35 4.2.2 Agreement on a Value . 35 4.2.3 OtherConsensusProblems . 38 4.2.4 The Distributed Commit Problem . 41 4.3 Network Algorithms . 41 4.3.1 Static Algorithms . 42 4.3.2 Dynamic Algorithms . 44 4.3.3 Changing Networks . 47 4.3.4 LinkProtocols ...................... 48 i 4.4 Concurrency Control in Databases . 49 4.4.1 Techniques ........................ 50 4.4.2 DistributionIssues . 51 4.4.3 Nested Transactions . 52 ii Abstract Rigorous analysis starts with a precise model of a distributed system; the most popular models, differing in how they represent interprocess commu- nication, are message passing, shared variables, and synchronous communi- cation. The properties satisfied by an algorithm must be precisely stated and carefully proved; the most successful approach is based on assertional reasoning.
    [Show full text]
  • A Formal Treatment of Lamport's Paxos Algorithm
    A Formal Treatment of Lamport’s Paxos Algorithm Roberto De Prisco Nancy Lynch Alex Shvartsman Nicole Immorlica Toh Ne Win October 10, 2002 Abstract This paper gives a formal presentation of Lamport’s Paxos algorithm for distributed consensus. The presentation includes a state machine model for the complete protocol, a correctness proof, and a perfor- mance analysis. The correctness proof, which shows the safety properties of agreement and validity, is based on a mapping to an abstract state machine representing a non-distributed version of the protocol. The performance analysis proves latency bounds, conditioned on certain timing and failure assumptions. Liveness and fault-tolerance correctness properties are corollaries of the performance properties. This work was supported in part by the NSF ITR Grant CCR-0121277. Massachusetts Institute of Technology, Laboratory for Computer Science, 200 Technology Square, NE43-365, Cambridge, MA 02139, USA. Email: [email protected]. Department of Computer Science and Engineering, 191 Auditorium Road, Unit 3155, University of Connecticut, Storrs, CT 06269 and Massachusetts Institute of Technology, Laboratory for Computer Science, 200 Technology Square, NE43-316, Cam- bridge, MA 02139, USA. Email: [email protected]. Massachusetts Institute of Technology, Laboratory for Computer Science, 200 Technology Square, NE43-???, Cambridge, MA 02139, USA. Email: [email protected]. Massachusetts Institute of Technology, Laboratory for Computer Science, 200 Technology Square, NE43-413, Cambridge, MA 02139, USA. Email: [email protected]. 1 Introduction Overview: Lamport’s Paxos algorithm [1] has at its core a solution to the problem of distributed consen- sus. In that consensus algorithm, the safety properties—agreement and validity—hold in all executions.
    [Show full text]
  • Consensus on Transaction Commit
    Consensus on Transaction Commit Jim Gray and Leslie Lamport Microsoft Research 1 January 2004 revised 19 April 2004 MSR-TR-2003-96 Abstract The distributed transaction commit problem requires reaching agreement on whether a transaction is committed or aborted. The classic Two-Phase Commit protocol blocks if the coordinator fails. Fault-tolerant consensus algorithms also reach agreement, but do not block whenever any majority of the processes are working. The Paxos Commit algorithm runs a Paxos consensus algorithm on the commit/abort decision of each participant to obtain a transaction commit protocol that uses 2F + 1 coordinators and makes progress if at least F + 1 of them are working. Paxos Commit has the same stable-storage write delay, and can be implemented to have the same message delay in the fault-free case, as Two-Phase Commit, but it uses more messages. The classic Two-Phase Commit algorithm is obtained as the special F = 0 case of the Paxos Commit algorithm. Contents 1 Introduction 1 2 Transaction Commit 2 3 Two-Phase Commit 4 3.1 The Protocol . 4 3.2 The Cost of Two-Phase Commit . 6 3.3 The Problem with Two-Phase Commit . 6 4 Paxos Commit 7 4.1 The Paxos Consensus Algorithm . 7 4.2 The Paxos Commit Algorithm . 10 4.3 The Cost of Paxos Commit . 12 5 Paxos versus Two-Phase Commit 14 6 Transaction Creation and Registration 16 6.1 Creation . 16 6.2 Registration . 17 7 Conclusion 18 A The TLA+ Specifications 21 A.1 The Specification of a Transaction Commit Protocol .
    [Show full text]
  • Easy Impossibility Proofs for Distributed Consensus Problems
    Distributed Computing (1986) 1:26 39 Easy impossibility proofs for distributed consensus problems Michael J. Fischer 1, Nancy A. Lynch 2, and Michael Merritt 3 Department of Computer Science, Yale University, P.O. Box 2158, New Haven, CT 06520, USA 2 Laboratory for Computer Science, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, M A 02139, USA s AT & T Bell Laboratories, 600 Mountain Ave, Murray Hill, NJ 07974, USA and Laboratory for Computer Science, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, M A 02139, US A Michael J. Fischer is cur- 1972. She has served on the .klculty of Tufts University, the rently Professor of Computer University of Southern California, Florida International Science at Yale University, University, Georgia Tech. New Haven, CT, where he heads the Theory of Compu- Michael Merritt is currently tation Group. He is also Edi- a member of the technical tor_in_Chief of the Journal of stq[] with AT& T Bell the Association .for Comput- Laboratories. During the 1984 ing Machinery. His research 85 academic year, he was interests include theory of a visiting lecturer at M.I.7:, distributed systems, crypto- sponsered by Bell Labs. His graphic protocols, and compu- research interests include dis- tational complexity. tributed computation, cryptog- Dr. Fischer received the raphy and security. Dr. Merritt B. S. degree in mathematics received the B. S. degree in j?om the University of Mi- computer science and philo- chigan, Ann Arbor, in 1963, sophy from Yale in 1978 and and the M. A. and Ph.D. degrees in applied mathematics the M. Sc.
    [Show full text]