Principles of Distributed Computing

Roger Wattenhofer
[email protected]
Spring 2014

Contents

1  Vertex Coloring
   1.1  Problem & Model
   1.2  Coloring Trees
2  Leader Election
   2.1  Anonymous Leader Election
   2.2  Asynchronous Ring
   2.3  Lower Bounds
   2.4  Synchronous Ring
3  Tree Algorithms
   3.1  Broadcast
   3.2  Convergecast
   3.3  BFS Tree Construction
   3.4  MST Construction
4  Distributed Sorting
   4.1  Array & Mesh
   4.2  Sorting Networks
   4.3  Counting Networks
5  Shared Memory
   5.1  Introduction
   5.2  Mutual Exclusion
   5.3  Store & Collect
        5.3.1  Problem Definition
        5.3.2  Splitters
        5.3.3  Binary Splitter Tree
        5.3.4  Splitter Matrix
6  Shared Objects
   6.1  Introduction
   6.2  Arrow and Friends
   6.3  Ivy and Friends
7  Maximal Independent Set
   7.1  MIS
   7.2  Original Fast MIS
   7.3  Fast MIS v2
   7.4  Applications
8  Locality Lower Bounds
   8.1  Locality
   8.2  The Neighborhood Graph
9  Social Networks
   9.1  Small World Networks
   9.2  Propagation Studies
10 Synchronization
   10.1  Basics
   10.2  Synchronizer α
   10.3  Synchronizer β
   10.4  Synchronizer γ
   10.5  Network Partition
   10.6  Clock Synchronization
11 Hard Problems
   11.1  Diameter & APSP
   11.2  Lower Bound Graphs
   11.3  Communication Complexity
   11.4  Distributed Complexity Theory
12 Stabilization
   12.1  Self-Stabilization
   12.2  Advanced Stabilization
13 Wireless Protocols
   13.1  Basics
   13.2  Initialization
         13.2.1  Non-Uniform Initialization
         13.2.2  Uniform Initialization with CD
         13.2.3  Uniform Initialization without CD
   13.3  Leader Election
         13.3.1  With High Probability
         13.3.2  Uniform Leader Election
         13.3.3  Fast Leader Election with CD
         13.3.4  Even Faster Leader Election with CD
         13.3.5  Lower Bound
         13.3.6  Uniform Asynchronous Wakeup without CD
   13.4  Useful Formulas
14 Peer-to-Peer Computing
   14.1  Introduction
   14.2  Architecture Variants
   14.3  Hypercubic Networks
   14.4  DHT & Churn
   14.5  Storage and Multicast
15 Dynamic Networks
   15.1  Synchronous Edge-Dynamic Networks
   15.2  Problem Definitions
   15.3  Basic Information Dissemination
   15.4  Small Messages
         15.4.1  k-Verification
         15.4.2  k-Committee Election
   15.5  More Stable Graphs
16 All-to-All Communication
17 Consensus
   17.1  Impossibility of Consensus
   17.2  Randomized Consensus
18 Multi-Core Computing
   18.1  Introduction
         18.1.1  The Current State of Concurrent Programming
   18.2  Transactional Memory
   18.3  Contention Management
19 Dominating Set
   19.1  Sequential Greedy Algorithm
   19.2  Distributed Greedy Algorithm
20 Routing
   20.1  Array
   20.2  Mesh
   20.3  Routing in the Mesh with Small Queues
   20.4  Hot-Potato Routing
   20.5  More Models
21 Routing Strikes Back
   21.1  Butterfly
   21.2  Oblivious Routing
   21.3  Offline Routing

Introduction

What is Distributed Computing?

In the last few decades, we have experienced an unprecedented growth in the area of distributed systems and networks. Distributed computing now encompasses many of the activities occurring in today's computer and communications world.
Indeed, distributed computing appears in quite diverse application areas: Typical "old school" examples are parallel computers, or the Internet. More recent application examples of distributed systems include peer-to-peer systems, sensor networks, and multi-core architectures.

These applications have in common that many processors or entities (often called nodes) are active in the system at any moment. The nodes have certain degrees of freedom: they may have their own hardware, their own code, and sometimes their own independent task. Nevertheless, the nodes may share common resources and information, and, in order to solve a problem that concerns several (or maybe even all) nodes, coordination is necessary.

Despite these commonalities, a peer-to-peer system, for example, is quite different from a multi-core architecture. Due to such differences, many different models and parameters are studied in the area of distributed computing. In some systems the nodes operate synchronously, in other systems they operate asynchronously. There are simple homogeneous systems, and heterogeneous systems where different types of nodes, potentially with different capabilities, objectives etc., need to interact. There are different communication techniques: nodes may communicate by exchanging messages, or by means of shared memory. Occasionally the communication infrastructure is tailor-made for an application, sometimes one has to work with any given infrastructure. The nodes in a system often work together to solve a global task, occasionally the nodes are autonomous agents that have their own agenda and compete for common resources. Sometimes the nodes can be assumed to work correctly, at times they may exhibit failures. In contrast to a single-node system, distributed systems may still function correctly despite failures, as other nodes can take over the work of the failed nodes. There are different kinds of failures that can be considered: nodes may just crash, or they might exhibit an arbitrary, erroneous behavior, maybe even to a degree where it cannot be distinguished from malicious (also known as Byzantine) behavior. It is also possible that the nodes do follow the rules, but tweak the parameters to get the most out of the system; in other words, the nodes act selfishly.

Apparently, there are many models (and even more combinations of models) that can be studied. We will not discuss them in greater detail now, but simply define them when we use them. Towards the end of the course a general picture should emerge. Hopefully!

This course introduces the basic principles of distributed computing, highlighting common themes and techniques. In particular, we study some of the fundamental issues underlying the design of distributed systems:

• Communication: Communication does not come for free; often communication cost dominates the cost of local processing or storage. Sometimes we even assume that everything but communication is free.

• Coordination: How can you coordinate a distributed system so that it performs some task efficiently?

• Fault-tolerance: As mentioned above, one major advantage of a distributed system is that even in the presence of failures the system as a whole may survive.

• Locality: Networks keep growing. Luckily, global information is not always needed to solve a task; often it is sufficient if nodes talk to their neighbors. In this course, we will address the fundamental question in distributed computing whether a local solution is possible for a wide range of problems.

• Parallelism: How fast can you solve a task if you increase your computational power, e.g., by increasing the number of nodes that can share the workload? How much parallelism is possible for a given problem?

• Symmetry breaking: Sometimes some nodes need to be selected to orchestrate the computation (and the communication). This is achieved by a technique called symmetry breaking; a small sketch of the idea follows after this list.

• Synchronization: How can you implement a synchronous algorithm in an asynchronous system?

• Uncertainty: If we need to agree on a single term that fittingly describes this course, it is probably "uncertainty". As the whole system is distributed, the nodes cannot know what other nodes are doing at this exact moment, and the nodes are required to solve the tasks at hand despite the lack of global knowledge.
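To make the symmetry-breaking theme a little more concrete, here is a minimal Python sketch; it is our illustration, not an algorithm from these notes. It assumes a deliberately simplistic single-hop setting in which every node sees every other node's random draw; the function name elect_leader and the draw range of n^2 are arbitrary choices for the example.

    import random

    def elect_leader(n, seed=None):
        """Toy symmetry breaking among n indistinguishable nodes.

        In each round every node draws a random value; a node whose draw
        is strictly larger than all others wins. On a tie for the maximum,
        all nodes simply try again. Returns (leader_index, rounds_needed).
        """
        rng = random.Random(seed)
        rounds = 0
        while True:
            rounds += 1
            # One independent draw per node; the range n**2 keeps the
            # probability of a tied maximum small, so only a few rounds
            # are needed in expectation.
            draws = [rng.randrange(n ** 2) for _ in range(n)]
            best = max(draws)
            if draws.count(best) == 1:  # unique maximum: symmetry is broken
                return draws.index(best), rounds

    if __name__ == "__main__":
        leader, rounds = elect_leader(8, seed=42)
        print(f"node {leader} became leader after {rounds} round(s)")

The randomness is essential here: truly identical deterministic nodes can never single one node out, which is why randomization (or pre-assigned unique identifiers) is the standard way to break symmetry. Leader election is treated rigorously in Chapter 2, and randomized variants for wireless models in Chapter 13.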
Finally, there are also a few areas that we will not cover in this course, mostly because these topics have become so important that they deserve and have their own courses. Examples of such topics are distributed programming and software engineering, as well as security and cryptography.

In summary, in this class we explore essential algorithmic ideas and lower bound techniques, basically the "pearls" of distributed computing and network algorithms. We will cover a fresh topic every week. Have fun!

Chapter Notes

Many excellent text books have been written on the subject. The book closest to this course is by David Peleg [Pel00], as it shares about half of the material. A main focus of Peleg's book is network partitions, covers, decompositions, spanners, and labeling schemes, an interesting area that we will only touch upon in this course. There exist a multitude of other text books that overlap with one or two chapters of this course, e.g., [Lei92, Bar96, Lyn96, Tel01, AW04, HKP+05, CLRS09, Suo12]. Another related course is by James Aspnes [Asp].

Some chapters of this course have been developed in collaboration with (former) Ph.D. students; see the chapter notes for details. Many students have helped to improve exercises and script. Thanks go to Philipp Brandes, Raphael Eidenbenz, Roland Flury, Klaus-Tycho Förster, Stephan Holzer, Barbara Keller, Fabian Kuhn, Christoph Lenzen, Thomas Locher, Remo Meier, Thomas Moscibroda, Regina O'Dell, Yvonne-Anne Pignolet, Jochen Seidel, Stefan Schmid, Johannes Schneider, Jara Uitto, Pascal von Rickenbach (in alphabetical order).

Bibliography

[Asp] James Aspnes. Notes on Theory of Distributed Systems.

[AW04] Hagit Attiya and Jennifer Welch. Distributed Computing: Fundamentals, Simulations and Advanced Topics (2nd edition). John Wiley Interscience, March 2004.

[Bar96] Valmir C. Barbosa. An Introduction to Distributed Algorithms. MIT Press, 1996.