Distributed Computing


Principles of Distributed Computing
Roger Wattenhofer
[email protected]
Spring 2014

Contents

1 Vertex Coloring
  1.1 Problem & Model
  1.2 Coloring Trees
2 Leader Election
  2.1 Anonymous Leader Election
  2.2 Asynchronous Ring
  2.3 Lower Bounds
  2.4 Synchronous Ring
3 Tree Algorithms
  3.1 Broadcast
  3.2 Convergecast
  3.3 BFS Tree Construction
  3.4 MST Construction
4 Distributed Sorting
  4.1 Array & Mesh
  4.2 Sorting Networks
  4.3 Counting Networks
5 Shared Memory
  5.1 Introduction
  5.2 Mutual Exclusion
  5.3 Store & Collect
    5.3.1 Problem Definition
    5.3.2 Splitters
    5.3.3 Binary Splitter Tree
    5.3.4 Splitter Matrix
6 Shared Objects
  6.1 Introduction
  6.2 Arrow and Friends
  6.3 Ivy and Friends
7 Maximal Independent Set
  7.1 MIS
  7.2 Original Fast MIS
  7.3 Fast MIS v2
  7.4 Applications
8 Locality Lower Bounds
  8.1 Locality
  8.2 The Neighborhood Graph
9 Social Networks
  9.1 Small World Networks
  9.2 Propagation Studies
10 Synchronization
  10.1 Basics
  10.2 Synchronizer α
  10.3 Synchronizer β
  10.4 Synchronizer γ
  10.5 Network Partition
  10.6 Clock Synchronization
11 Hard Problems
  11.1 Diameter & APSP
  11.2 Lower Bound Graphs
  11.3 Communication Complexity
  11.4 Distributed Complexity Theory
12 Stabilization
  12.1 Self-Stabilization
  12.2 Advanced Stabilization
13 Wireless Protocols
  13.1 Basics
  13.2 Initialization
    13.2.1 Non-Uniform Initialization
    13.2.2 Uniform Initialization with CD
    13.2.3 Uniform Initialization without CD
  13.3 Leader Election
    13.3.1 With High Probability
    13.3.2 Uniform Leader Election
    13.3.3 Fast Leader Election with CD
    13.3.4 Even Faster Leader Election with CD
    13.3.5 Lower Bound
    13.3.6 Uniform Asynchronous Wakeup without CD
  13.4 Useful Formulas
14 Peer-to-Peer Computing
  14.1 Introduction
  14.2 Architecture Variants
  14.3 Hypercubic Networks
  14.4 DHT & Churn
  14.5 Storage and Multicast
15 Dynamic Networks
  15.1 Synchronous Edge-Dynamic Networks
  15.2 Problem Definitions
  15.3 Basic Information Dissemination
  15.4 Small Messages
    15.4.1 k-Verification
    15.4.2 k-Committee Election
  15.5 More Stable Graphs
16 All-to-All Communication
17 Consensus
  17.1 Impossibility of Consensus
  17.2 Randomized Consensus
18 Multi-Core Computing
  18.1 Introduction
    18.1.1 The Current State of Concurrent Programming
  18.2 Transactional Memory
  18.3 Contention Management
19 Dominating Set
  19.1 Sequential Greedy Algorithm
  19.2 Distributed Greedy Algorithm
20 Routing
  20.1 Array
  20.2 Mesh
  20.3 Routing in the Mesh with Small Queues
  20.4 Hot-Potato Routing
  20.5 More Models
21 Routing Strikes Back
  21.1 Butterfly
  21.2 Oblivious Routing
  21.3 Offline Routing

Introduction

What is Distributed Computing?

In the last few decades, we have experienced an unprecedented growth in the area of distributed systems and networks. Distributed computing now encompasses many of the activities occurring in today's computer and communications world.
Indeed, distributed computing appears in quite diverse application areas: typical "old school" examples are parallel computers, or the Internet. More recent application examples of distributed systems include peer-to-peer systems, sensor networks, and multi-core architectures.

These applications have in common that many processors or entities (often called nodes) are active in the system at any moment. The nodes have certain degrees of freedom: they may have their own hardware, their own code, and sometimes their own independent task. Nevertheless, the nodes may share common resources and information, and, in order to solve a problem that concerns several (or maybe even all) nodes, coordination is necessary.

Despite these commonalities, a peer-to-peer system, for example, is quite different from a multi-core architecture. Due to such differences, many different models and parameters are studied in the area of distributed computing. In some systems the nodes operate synchronously, in other systems they operate asynchronously. There are simple homogeneous systems, and heterogeneous systems where different types of nodes, potentially with different capabilities, objectives etc., need to interact. There are different communication techniques: nodes may communicate by exchanging messages, or by means of shared memory. Occasionally the communication infrastructure is tailor-made for an application, sometimes one has to work with any given infrastructure. The nodes in a system often work together to solve a global task, occasionally the nodes are autonomous agents that have their own agenda and compete for common resources. Sometimes the nodes can be assumed to work correctly, at times they may exhibit failures. In contrast to a single-node system, distributed systems may still function correctly despite failures, as other nodes can take over the work of the failed nodes. There are different kinds of failures that can be considered: nodes may just crash, or they might exhibit an arbitrary, erroneous behavior, maybe even to a degree where it cannot be distinguished from malicious (also known as Byzantine) behavior. It is also possible that the nodes do follow the rules, but tweak the parameters to get the most out of the system; in other words, the nodes act selfishly.

Apparently, there are many models (and even more combinations of models) that can be studied. We will not discuss them in greater detail now, but simply define them when we use them. Towards the end of the course a general picture should emerge. Hopefully!

This course introduces the basic principles of distributed computing, highlighting common themes and techniques. In particular, we study some of the fundamental issues underlying the design of distributed systems:

• Communication: Communication does not come for free; often communication cost dominates the cost of local processing or storage. Sometimes we even assume that everything but communication is free.

• Coordination: How can you coordinate a distributed system so that it performs some task efficiently?

• Fault-tolerance: As mentioned above, one major advantage of a distributed system is that even in the presence of failures the system as a whole may survive.

• Locality: Networks keep growing. Luckily, global information is not always needed to solve a task; often it is sufficient if nodes talk to their neighbors.
  In this course, we will address the fundamental question in distributed computing of whether a local solution is possible for a wide range of problems.

• Parallelism: How fast can you solve a task if you increase your computational power, e.g., by increasing the number of nodes that can share the workload? How much parallelism is possible for a given problem?

• Symmetry breaking: Sometimes some nodes need to be selected to orchestrate the computation (and the communication). This is achieved by a technique called symmetry breaking.

• Synchronization: How can you implement a synchronous algorithm in an asynchronous system?

• Uncertainty: If we need to agree on a single term that fittingly describes this course, it is probably "uncertainty". As the whole system is distributed, the nodes cannot know what other nodes are doing at this exact moment, and the nodes are required to solve the tasks at hand despite the lack of global knowledge.

Finally, there are also a few areas that we will not cover in this course, mostly because these topics have become so important that they deserve and have their own courses. Examples of such topics are distributed programming, software engineering, as well as security and cryptography.

In summary, in this class we explore essential algorithmic ideas and lower bound techniques, basically the "pearls" of distributed computing and network algorithms. We will cover a fresh topic every week. Have fun!

Chapter Notes

Many excellent text books have been written on the subject. The book closest to this course is by David Peleg [Pel00], as it shares about half of the material. A main focus of Peleg's book is network partitions, covers, decompositions, spanners, and labeling schemes, an interesting area that we will only touch on in this course. There exist a multitude of other text books that overlap with one or two chapters of this course, e.g., [Lei92, Bar96, Lyn96, Tel01, AW04, HKP+05, CLRS09, Suo12]. Another related course is by James Aspnes [Asp].

Some chapters of this course have been developed in collaboration with (former) Ph.D. students, see chapter notes for details. Many students have helped to improve exercises and script. Thanks go to Philipp Brandes, Raphael Eidenbenz, Roland Flury, Klaus-Tycho Förster, Stephan Holzer, Barbara Keller, Fabian Kuhn, Christoph Lenzen, Thomas Locher, Remo Meier, Thomas Moscibroda, Regina O'Dell, Yvonne-Anne Pignolet, Jochen Seidel, Stefan Schmid, Johannes Schneider, Jara Uitto, Pascal von Rickenbach (in alphabetical order).

Bibliography

[Asp] James Aspnes. Notes on Theory of Distributed Systems.

[AW04] Hagit Attiya and Jennifer Welch. Distributed Computing: Fundamentals, Simulations and Advanced Topics (2nd edition). John Wiley Interscience, March 2004.

[Bar96] Valmir C. Barbosa. An Introduction to Distributed Algorithms. MIT Press, 1996.
Recommended publications
  • ACM SIGACT News Distributed Computing Column 28
ACM SIGACT News Distributed Computing Column 28
Idit Keidar
Dept. of Electrical Engineering, Technion, Haifa, 32000, Israel
[email protected]

Sergio Rajsbaum, who edited this column for seven years and established it as a relevant and popular venue, is stepping down. This issue is my first step in the big shoes he vacated. I would like to take this opportunity to thank Sergio for providing us with seven years' worth of interesting columns. In producing these columns, Sergio has enjoyed the support of the community at large and obtained material from many authors, who greatly contributed to the column's success. I hope to enjoy a similar level of support; I warmly welcome your feedback and suggestions for material to include in this column!

The main two conferences in the area of principles of distributed computing, PODC and DISC, took place this summer. This issue is centered around these conferences, and more broadly, distributed computing research as reflected therein, ranging from reviews of this year's instantiations, through influential papers in past instantiations, to examining PODC's place within the realm of computer science.

I begin with a short review of PODC'07, and highlight some "hot" trends that have taken root in PODC, as reflected in this year's program. Some of the forthcoming columns will be dedicated to these up-and-coming research topics. This is followed by a review of this year's DISC, by Edward (Eddie) Bortnikov. For some perspective on long-running trends in the field, I next include the announcement of this year's Edsger W.
  • Edsger Dijkstra: the Man Who Carried Computer Science on His Shoulders
INFERENCE / Vol. 5, No. 3
Edsger Dijkstra: The Man Who Carried Computer Science on His Shoulders
Krzysztof Apt

As it turned out, the train I had taken from Nijmegen to Eindhoven arrived late. To make matters worse, I was then unable to find the right office in the university building. When I eventually arrived for my appointment, I was more than half an hour behind schedule. The professor completely ignored my profuse apologies and proceeded to take a full hour for the meeting. It was the first time I met Edsger Wybe Dijkstra.

At the time of our meeting in 1975, Dijkstra was 45 years old. The most prestigious award in computer science, the ACM Turing Award, had been conferred on him three years earlier. Almost twenty years his junior, I knew very little about the field—I had only learned what a flowchart was a couple of weeks earlier.

Edsger Dijkstra was born in Rotterdam in 1930. He described his father, at one time the president of the Dutch Chemical Society, as "an excellent chemist," and his mother as "a brilliant mathematician who had no job." In 1948, Dijkstra achieved remarkable results when he completed secondary school at the famous Erasmiaans Gymnasium in Rotterdam. His school diploma shows that he earned the highest possible grade in no less than six out of thirteen subjects. He then enrolled at the University of Leiden to study physics.

In September 1951, Dijkstra's father suggested he attend a three-week course on programming in Cambridge. It turned out to be an idea with far-reaching consequences.
  • arXiv:1901.04709v1 [cs.GT]
Self-Stabilization Through the Lens of Game Theory
Krzysztof R. Apt (CWI, Amsterdam, The Netherlands; MIMUW, University of Warsaw, Poland)
Ehsan Shoja (Sharif University of Technology, Tehran, Iran)

Abstract. In 1974 E.W. Dijkstra introduced the seminal concept of self-stabilization that turned out to be one of the main approaches to fault-tolerant computing. We show here how his three solutions can be formalized and reasoned about using the concepts of game theory. We also determine the precise number of steps needed to reach self-stabilization in his first solution.

1 Introduction

In 1974 Edsger W. Dijkstra introduced in a two-page article [10] the notion of self-stabilization. The paper was completely ignored until 1983, when Leslie Lamport stressed its importance in his invited talk at the ACM Symposium on Principles of Distributed Computing (PODC), published a year later as [21]. Things have changed since then. According to Google Scholar, Dijkstra's paper has by now been cited more than 2300 times. It became one of the main approaches to fault-tolerant computing. An early survey was published in 1993 as [26], while the research on the subject until 2000 was summarized in the book [13]. In 2002 Dijkstra's paper won the PODC Influential Paper Award (renamed in 2003 to the Dijkstra Prize). The literature on the subject initiated by it continues to grow. There are annual Self-Stabilizing Systems Workshops, the 18th edition of which took place in 2016.

The idea proposed by Dijkstra is very simple. Consider a distributed system viewed as a network of machines.
  • Distributed Algorithms for Wireless Networks
A THEORETICAL VIEW OF DISTRIBUTED SYSTEMS
Nancy Lynch, MIT EECS, CSAIL
National Science Foundation, April 1, 2021

Theory for Distributed Systems

• We have worked on theory for distributed systems, trying to understand (mathematically) their capabilities and limitations. This has included:
  • Defining abstract, mathematical models for problems solved by distributed systems, and for the algorithms used to solve them.
  • Developing new algorithms.
  • Producing rigorous proofs, of correctness, performance, fault-tolerance.
  • Proving impossibility results and lower bounds, expressing inherent limitations of distributed systems for solving problems.
  • Developing foundations for modeling, analyzing distributed systems.
• Kinds of systems:
  • Distributed data-management systems.
  • Wired, wireless communication systems.
  • Biological systems: insect colonies, developmental biology, brains.

This talk:
1. Algorithms for Traditional Distributed Systems
2. Impossibility Results
3. Foundations
4. Algorithms for New Distributed Systems

1. Algorithms for Traditional Distributed Systems

• Mutual exclusion in shared-memory systems, resource allocation: Fischer, Burns, ... late 70s and early 80s.
• Dolev, Lynch, Pinter, Stark, Weihl. Reaching approximate agreement in the presence of faults. JACM, 1986.
• Lundelius, Lynch. A new fault-tolerant algorithm for clock synchronization. Information and Computation, 1988.
• Dwork, Lynch, Stockmeyer. Consensus in the presence of partial synchrony. JACM, 1988. Dijkstra Prize, 2007.

1A. Dwork, Lynch, Stockmeyer [DLS]

"This paper introduces a number of practically motivated partial synchrony models that lie between the completely synchronous and the completely asynchronous models and in which consensus is solvable. It gave practitioners the right tool for building fault-tolerant systems and contributed to the understanding that safety can be maintained at all times, despite the impossibility of consensus, and progress is facilitated during periods of stability.
  • 2020 SIGACT REPORT SIGACT EC – Eric Allender, Shuchi Chawla, Nicole Immorlica, Samir Khuller (Chair), Bobby Kleinberg September 14th, 2020
2020 SIGACT REPORT
SIGACT EC – Eric Allender, Shuchi Chawla, Nicole Immorlica, Samir Khuller (chair), Bobby Kleinberg
September 14th, 2020

SIGACT Mission Statement: The primary mission of ACM SIGACT (Association for Computing Machinery Special Interest Group on Algorithms and Computation Theory) is to foster and promote the discovery and dissemination of high quality research in the domain of theoretical computer science. The field of theoretical computer science is the rigorous study of all computational phenomena - natural, artificial or man-made. This includes the diverse areas of algorithms, data structures, complexity theory, distributed computation, parallel computation, VLSI, machine learning, computational biology, computational geometry, information theory, cryptography, quantum computation, computational number theory and algebra, program semantics and verification, automata theory, and the study of randomness. Work in this field is often distinguished by its emphasis on mathematical technique and rigor.

1. Awards

▪ 2020 Gödel Prize: This was awarded to Robin A. Moser and Gábor Tardos for their paper "A constructive proof of the general Lovász Local Lemma", Journal of the ACM, Vol 57 (2), 2010. The Lovász Local Lemma (LLL) is a fundamental tool of the probabilistic method. It enables one to show the existence of certain objects even though they occur with exponentially small probability. The original proof was not algorithmic, and subsequent algorithmic versions had significant losses in parameters. This paper provides a simple, powerful algorithmic paradigm that converts almost all known applications of the LLL into randomized algorithms matching the bounds of the existence proof. The paper further gives a derandomized algorithm, a parallel algorithm, and an extension to the "lopsided" LLL.
  • LNCS 4308, Pp
On Distributed Verification
Amos Korman and Shay Kutten
Faculty of IE&M, Technion, Haifa 32000, Israel

Abstract. This paper describes the invited talk given at the 8th International Conference on Distributed Computing and Networking (ICDCN 2006), at the Indian Institute of Technology Guwahati, India. This talk was intended to give a partial survey and to motivate further studies of distributed verification. To serve the purpose of motivating, we allow ourselves to speculate freely on the potential impact of such research.

In the context of sequential computing, it is common to assume that the task of verifying a property of an object may be much easier than computing it (consider, for example, solving an NP-complete problem versus verifying a witness). Extrapolating from the impact the separation of these two notions (computing and verifying) had in the context of sequential computing, the separation may prove to have a profound impact on the field of distributed computing as well. In addition, in the context of distributed computing, the motivation for the separation seems even stronger than in the centralized sequential case.

In this paper we explain some motivations for specific definitions, survey some very related notions and their motivations in the literature, survey some examples for problems and solutions, and mention some additional general results such as general algorithmic methods and general lower bounds. Since this paper is mostly intended to "give a taste" rather than be a comprehensive survey, we apologize to the authors of additional related papers that we did not mention or detail.

1 Introduction

This paper addresses the problem of locally verifying global properties.
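To make the flavor of local verification concrete, here is a toy sketch (illustrative names only, not taken from the paper) of verifying a global property with purely local checks: each node compares its own distance label against its neighbors' labels, yet if every node accepts, the labels are guaranteed to be the true BFS distances from the root.

```python
def locally_correct(v, d, adj, root):
    """The check node v runs using only its own label and its neighbors' labels."""
    if v == root:
        return d[v] == 0
    if d[v] <= 0:
        return False                                  # only the root may claim distance 0
    if any(abs(d[u] - d[v]) > 1 for u in adj[v]):
        return False                                  # labels must change by at most 1 per edge
    return any(d[u] == d[v] - 1 for u in adj[v])      # some neighbor can serve as parent

def verify(d, adj, root):
    """Accept iff every node accepts; then d[v] equals dist(root, v) for all v."""
    return all(locally_correct(v, d, adj, root) for v in adj)

# A 5-node example graph with edges 0-1, 0-2, 1-3, 2-3, 3-4.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
good = {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}   # true BFS distances from node 0
bad  = {0: 0, 1: 1, 2: 1, 3: 5, 4: 6}   # globally wrong, caught locally at node 3
print(verify(good, adj, 0), verify(bad, adj, 0))   # True False
```

The point of the sketch is the separation the abstract describes: no node ever sees the whole graph, yet a globally wrong labeling is always rejected by at least one node.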
  • Stabilization
Chapter 13: Stabilization

A large branch of research in distributed computing deals with fault-tolerance. Being able to tolerate a considerable fraction of failing or even maliciously behaving ("Byzantine") nodes while trying to reach consensus (on e.g. the output of a function) among the nodes that work properly is crucial for building reliable systems. However, consensus protocols require that a majority of the nodes remains non-faulty all the time. Can we design a distributed system that survives transient (short-lived) failures, even if all nodes are temporarily failing? In other words, can we build a distributed system that repairs itself?

13.1 Self-Stabilization

Definition 13.1 (Self-Stabilization). A distributed system is self-stabilizing if, starting from an arbitrary state, it is guaranteed to converge to a legitimate state. If the system is in a legitimate state, it is guaranteed to remain there, provided that no further faults happen. A state is legitimate if the state satisfies the specifications of the distributed system.

Remarks:

• What kind of transient failures can we tolerate? An adversary can crash nodes, or make nodes behave Byzantine. Indeed, temporarily an adversary can do harm in even worse ways, e.g. by corrupting the volatile memory of a node (without the node noticing, not unlike the movie Memento), or by corrupting messages on the fly (without anybody noticing). However, as all failures are transient, eventually all nodes must work correctly again, that is, crashed nodes get resurrected, Byzantine nodes stop being malicious, messages are being delivered reliably, and the memory of the nodes is secure.
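The classic concrete instance behind this definition is Dijkstra's K-state token ring. The following is a minimal simulation sketch (helper names are illustrative, and the scheduler is a randomized central daemon; this is an illustration of the idea, not the chapter's formal treatment): machines 0..N each hold a counter, a machine may move only when its guard is enabled ("privileged"), and with K > N the ring converges from any initial state to exactly one circulating privilege.

```python
import random

def privileged(S, i, K, N):
    """Machine i holds a privilege (the 'token') iff its guard is enabled."""
    if i == 0:
        return S[0] == S[N]          # bottom machine compares with its left neighbor
    return S[i] != S[i - 1]          # other machines fire on a mismatch

def fire(S, i, K, N):
    """Execute machine i's move, passing the privilege along the ring."""
    if i == 0:
        S[0] = (S[0] + 1) % K
    else:
        S[i] = S[i - 1]

def stabilize(S, K):
    """Run from an arbitrary state until exactly one machine is privileged."""
    N = len(S) - 1
    steps = 0
    while True:
        priv = [i for i in range(N + 1) if privileged(S, i, K, N)]
        if len(priv) == 1:
            return steps                          # legitimate: exactly one privilege
        fire(S, random.choice(priv), K, N)        # daemon picks one enabled machine
        steps += 1

N = 5
K = N + 1                                         # Dijkstra's algorithm needs K > N
S = [random.randrange(K) for _ in range(N + 1)]   # arbitrary (corrupted) initial state
print("stabilized after", stabilize(S, K), "steps")
```

Running this from many random initial states illustrates both halves of Definition 13.1: convergence to a legitimate state, and closure once it is reached.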
  • 182.703: Problems in Distributed Computing (Part 3)
182.703: Problems in Distributed Computing (Part 3)
WS 2019
Ulrich Schmid
Institute of Computer Engineering, TU Vienna, Embedded Computing Systems Group E191-02
[email protected]

Content (Part 3):
• The Role of Synchrony Conditions
• Failure Detectors
• Real-Time Clocks
• Partially Synchronous Models
  • Models supporting lock-step round simulations
  • Weaker partially synchronous models
• Dynamic distributed systems

Recall Distributed Agreement (Consensus). (Slide figure: processes starting with Yes/No inputs must all meet on a common output.)

Recall Consensus Impossibility (FLP). Fischer, Lynch and Paterson [FLP85]: "There is no deterministic algorithm for solving consensus in an asynchronous distributed system in the presence of a single crash failure." Key problem: Distinguish slow from dead!

Consensus Solvability in ParSync [DDS87]. Dolev, Dwork and Stockmeyer investigated consensus solvability in partially synchronous systems (ParSync), varying five "synchrony handles":
• Processors synchronous / asynchronous
• Communication synchronous / asynchronous
• Message order synchronous (system-wide consistent) / asynchronous (out-of-order)
• Send steps broadcast / unicast
• Computing steps atomic receive+send / separate receive, send
(Slide figure: a map of these parameter combinations showing where wait-free consensus is possible, where consensus is possible for f = 1, and where it is impossible.)

The Role of Synchrony Conditions. Synchrony conditions enable failure detection (distinguish slow from dead) and enforce event ordering (distinguish "old" from "new", ruling out the existence of stale in-transit information, creating non-overlapping "phases of operation", i.e., rounds).
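The "distinguish slow from dead" problem is exactly what a timeout-based failure detector attacks. The following is a small illustrative sketch (class and method names are assumed, not from the slides): heartbeats reset a per-peer timer, and every false suspicion doubles the timeout, so once the unknown delay bound of a partially synchronous system starts to hold, suspicions eventually stop being wrong.

```python
import time

class HeartbeatDetector:
    """Suspects a peer once no heartbeat arrives within the current timeout.
    A false suspicion grows the timeout, so under partial synchrony (an
    unknown but finite delay bound) suspicions are eventually accurate."""

    def __init__(self, peers, initial_timeout=0.5):
        self.timeout  = {p: initial_timeout for p in peers}
        self.last     = {p: time.monotonic() for p in peers}
        self.suspects = set()

    def on_heartbeat(self, p):
        if p in self.suspects:       # we were wrong: p was slow, not dead
            self.suspects.discard(p)
            self.timeout[p] *= 2     # adapt: be more patient next time
        self.last[p] = time.monotonic()

    def check(self):
        now = time.monotonic()
        for p, t in self.last.items():
            if now - t > self.timeout[p]:
                self.suspects.add(p)
        return self.suspects

fd = HeartbeatDetector(peers=["p1", "p2"])
time.sleep(0.6)                        # silence longer than the initial timeout
print(fd.check())                      # both peers become suspects...
fd.on_heartbeat("p1")                  # ...until p1's heartbeat arrives late:
print(fd.check(), fd.timeout["p1"])    # p1 unsuspected, its timeout doubled
```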
  • Distributed Consensus: Performance Comparison of Paxos and Raft
DEGREE PROJECT IN INFORMATION AND COMMUNICATION TECHNOLOGY, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2020
KTH ROYAL INSTITUTE OF TECHNOLOGY, SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

Distributed Consensus: Performance Comparison of Paxos and Raft
HARALD NG
Master in Software Engineering of Distributed Systems
Date: September 10, 2020
Supervisor: Lars Kroll (RISE), Max Meldrum (KTH)
Examiner: Seif Haridi
Host company: RISE
Swedish title: Distribuerad Konsensus: Prestandajämförelse mellan Paxos och Raft

Abstract

With the growth of the internet, distributed systems have become increasingly important in order to provide more available and scalable applications. Consensus is a fundamental problem in distributed systems where multiple processes have to agree on the same proposed value in the presence of partial failures. Distributed consensus allows for building various applications such as lock services, configuration manager services or distributed databases.

Two well-known consensus algorithms for building distributed logs are Multi-Paxos and Raft. Multi-Paxos was published almost three decades before Raft and gained a lot of popularity. However, critics of Multi-Paxos consider it difficult to understand. Raft was therefore published with the motivation of being an easily understood consensus algorithm. The Raft algorithm shares similar characteristics with a practical version of Multi-Paxos called Leader-based Sequence Paxos. However, the algorithms differ in important aspects such as leader election and reconfiguration.

Existing work mainly compares Multi-Paxos and Raft in theory, but there is a lack of performance comparisons in practice. Hence, prototypes of Leader-based Sequence Paxos and Raft have been designed and implemented in this thesis.
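One property both algorithm families rest on is majority quorums: any two majorities of the same cluster intersect, which is why a log entry accepted by one quorum cannot be lost across a leader change. A tiny illustrative check of that intersection property (not code from the thesis):

```python
from itertools import combinations

def majorities(n):
    """All minimal majority quorums in a cluster of n nodes."""
    k = n // 2 + 1
    return [set(c) for c in combinations(range(n), k)]

# Any two majorities of the same cluster share at least one node; this is
# why a value accepted by one quorum is seen by every later quorum, in
# Multi-Paxos and Raft alike.
n = 5
qs = majorities(n)
assert all(q1 & q2 for q1 in qs for q2 in qs)
print(f"{len(qs)} minimal majorities of {n} nodes, all pairwise intersecting")
```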
  • Download Distributed Systems Free Ebook
DISTRIBUTED SYSTEMS DOWNLOAD FREE BOOK
Maarten van Steen, Andrew S. Tanenbaum | 596 pages | 01 Feb 2017 | CreateSpace Independent Publishing Platform | 9781543057386 | English | United States

Distributed Systems - The Complete Guide

The hope is that together, the system can maximize resources and information while preventing failures, as if one system fails, it won't affect the availability of the service. For the first time, computers would be able to send messages to other systems with a local IP address. The messages passed between machines contain forms of data that the systems want to share, like databases, objects, and files. Also known as distributed computing and distributed databases, a distributed system is a collection of independent components located on different machines that share messages with each other in order to achieve common goals. To prevent infinite loops, running the code requires some amount of Ether. As mentioned in many places, one of which this great article, you cannot have consistency and availability without partition tolerance. Because it works in batches (jobs), a problem arises where, if your job fails, you need to restart the whole thing. While in a voting system an attacker need only add nodes to the network (which is easy, as free access to the network is a design target), in a CPU-power-based scheme an attacker faces a physical limitation: getting access to more and more powerful hardware.
  • Distributed Computing Column 36 Distributed Computing: 2009 Edition
Distributed Computing Column 36
Distributed Computing: 2009 Edition
Idit Keidar
Dept. of Electrical Engineering, Technion, Haifa, 32000, Israel
[email protected]

It's now the season for colorful reviews of 2009. While you can read elsewhere about the year in sports, top-40 pop hit charts, people of the year, and so on, I throw my share into the mix with a (biased) review of this year's major distributed computing events.

Awards

First, let's look at awards. This year we learned that two women were recognized with prestigious ACM and IEEE awards for their achievements in (among other things) distributed computing. Barbara Liskov won the 2008 ACM Turing Award for a range of contributions to practical and theoretical foundations of programming language and system design, some of which are in distributed computing. The award citation mentions her impact on distributed programming, decentralized information flow, replicated storage, and modular upgrading of distributed systems. I include below a short reflection, by Rodrigo Rodrigues, on Barbara Liskov's Turing Award and legacy.

Looking ahead to 2010, it was recently announced that Nancy Lynch is the recipient of the prestigious 2010 IEEE Emanuel R. Piore Award for her contributions to foundations of distributed and concurrent computing. The award is given for outstanding contributions in the field of information processing in relation to computer science. Leslie Lamport won the same award in 2004 for his seminal contributions to the theory and practice of concurrent programming and fault-tolerant computing.

The 2009 Edsger W. Dijkstra Prize in Distributed Computing was awarded to Joseph Halpern and Yoram Moses for "Knowledge and Common Knowledge in a Distributed Environment".
  • The Need for Language Support for Fault-Tolerant Distributed Systems
The Need for Language Support for Fault-tolerant Distributed Systems
Cezara Drăgoi (INRIA, ENS, CNRS, Paris, France; [email protected])
Thomas A. Henzinger (IST Austria, Klosterneuburg, Austria; [email protected])
Damien Zufferey (MIT CSAIL, Cambridge, USA; [email protected])

Abstract

Fault-tolerant distributed algorithms play an important role in many critical/high-availability applications. These algorithms are notoriously difficult to implement correctly, due to asynchronous communication and the occurrence of faults, such as the network dropping messages or computers crashing. Nonetheless there is surprisingly little language and verification support to build distributed systems based on fault-tolerant algorithms. In this paper, we present some of the challenges that a designer has to overcome to implement a fault-tolerant distributed system. Then we review different models that have been proposed to reason about distributed algorithms and sketch how such a model can form the basis for a domain-specific programming language. Adopting a high-level programming model can simplify the programmer's life and make the code amenable to automated verification, while still compiling to efficiently executable code. We conclude by summarizing the current status of an ongoing language design and implementation project that is based on this idea.

1998 ACM Subject Classification: D.3.3 Language Constructs and Features
Keywords and phrases: Programming language, Fault-tolerant distributed algorithms, Automated verification
Digital Object Identifier: 10.4230/LIPIcs.xxx.yyy.p

1 Introduction

Replication of data across multiple data centers allows applications to be available even in the case of failures, guaranteeing clients access to the data.
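The models the paper has in mind are round-based: computation proceeds in communication-closed rounds, and faults appear simply as messages that are not heard. A minimal sketch of such an execution loop (names assumed for illustration, written in plain Python rather than a domain-specific language):

```python
import random

def run_rounds(states, send, update, rounds, drop=0.2):
    """Lockstep round execution: each round, every process sends a message,
    the environment drops some messages (modeling faults), and each process
    updates from the set of messages it heard. Communication is closed per
    round, which is what makes such programs easier to reason about."""
    n = len(states)
    for r in range(rounds):
        msgs = {p: send(p, states[p], r) for p in range(n)}
        for p in range(n):
            heard = {q: msgs[q] for q in range(n)
                     if q == p or random.random() > drop}   # own message always heard
            states[p] = update(p, states[p], heard, r)
    return states

# Toy agreement-style task: propagate the minimum value despite message loss.
init  = [random.randrange(100) for _ in range(5)]
final = run_rounds(list(init),
                   send=lambda p, s, r: s,
                   update=lambda p, s, heard, r: min(heard.values()),
                   rounds=10)
print(init, "->", final)
```

With enough rounds the toy update tends to drive all processes to the global minimum despite dropped messages; the structured round loop, not the particular update rule, is the point of the sketch.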