Michel Raynal
Fault-Tolerant Message-Passing Distributed Systems
An Algorithmic Approach

Michel Raynal
IRISA-ISTIC
Université de Rennes 1
Institut Universitaire de France
Rennes, France

Parts of this work are based on the books "Fault-Tolerant Agreement in Synchronous Message-Passing Systems" and "Communication and Agreement Abstractions for Fault-Tolerant Asynchronous Distributed Systems", author Michel Raynal, © 2010 Morgan & Claypool Publishers (www.morganclaypool.com). Used with permission.

ISBN 978-3-319-94140-0
ISBN 978-3-319-94141-7 (eBook)
https://doi.org/10.1007/978-3-319-94141-7

Library of Congress Control Number: 2018953101

© Springer Nature Switzerland AG 2018

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.
The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

La recherche du temps perdu passait par le Web. [...] La mémoire était devenue inépuisable, mais la profondeur du temps [...] avait disparu. On était dans un présent infini.
(The search for lost time went through the Web. [...] Memory had become inexhaustible, but the depth of time [...] had vanished. We were in an infinite present.)
In Les années (2008), Annie Ernaux (1940)

Sed nos immensum spatiis confecimus aequor,
Et iam tempus equum fumentia solvere colla.¹
In Georgica, Liber II, 541-542, Publius Virgilius (70 BC–19 BC)

Je suis arrivé au jour où je ne me souviens plus quand j'ai cessé d'être immortel.
(I have come to the day when I no longer remember when I ceased to be immortal.)
In Livro de Crónicas, António Lobo Antunes (1942)

C'est une chose étrange à la fin que le monde
Un jour je m'en irai sans en avoir tout dit.
(The world is a strange thing, in the end / One day I shall go away without having said it all.)
In Les yeux et la mémoire (1954), chant II, Louis Aragon (1897–1982)

Tout garder, c'est tout détruire.
(To keep everything is to destroy everything.)
Jacques Derrida (1930–2004)

¹French: Mais j'ai déjà fourni une vaste carrière, il est temps de dételer les chevaux tout fumants. English: But now I have traveled a very long way, and the time has come to unyoke my steaming horses.

What is distributed computing?

Distributed computing was born in the late 1970s when researchers and practitioners started taking into account the intrinsic characteristics of physically distributed systems. The field then emerged as a specialized research area distinct from networking, operating systems, and parallel computing.

Distributed computing arises when one has to solve a problem in terms of distributed entities (usually called processors, nodes, processes, actors, agents, sensors, peers, etc.) such that each entity has only a partial knowledge of the many parameters involved in the problem that has to be solved.
While parallel computing and real-time computing can be characterized, respectively, by the terms efficiency and on-time computing, distributed computing can be characterized by the term uncertainty. This uncertainty is created by asynchrony, multiplicity of control flows, absence of shared memory and global time, failure, dynamicity, mobility, etc. Mastering one form or another of uncertainty is pervasive in all distributed computing problems.

A main difficulty in designing distributed algorithms comes from the fact that no entity cooperating in the achievement of a common goal can have an instantaneous knowledge of the current state of the other entities; it can only know their past local states. Although distributed algorithms are often made up of only a few lines, their behavior can be difficult to understand and their properties hard to state and prove. Hence, distributed computing is not only a fundamental topic but also a challenging one, where simplicity, elegance, and beauty are first-class citizens.

Why this book?

In the book "Distributed algorithms for message-passing systems" (Springer, 2013), I addressed distributed computing in failure-free message-passing systems, where the computing entities (processes) have to cooperate in the presence of asynchrony. In contrast, in my book "Concurrent programming: algorithms, principles and foundations" (Springer, 2013), I addressed distributed computing where the computing entities (processes) communicate through a read/write shared memory (e.g., multicore), and the main adversary lies in the net effect of asynchrony and process crashes (unexpected definitive stops).

The present book considers synchronous and asynchronous message-passing systems, where processes can commit crash failures or Byzantine failures (arbitrary behavior). Its aim is to present, in a comprehensive way, basic notions, concepts and algorithms in the context of these systems.
The main difficulty comes from the uncertainty created by the adversaries managing the environment (mainly asynchrony and failures), which, by its very nature, is not under the control of the system.

A quick look at the content of the book

The book is composed of four parts: the first two are on communication abstractions, the other two on agreement abstractions. These are the most important abstractions that distributed applications rely on in asynchronous and synchronous message-passing systems where processes may crash or commit Byzantine failures. The book addresses what can be done and what cannot be done in the presence of such adversaries. It consequently presents both impossibility results and distributed algorithms. All impossibility results are proved, and all algorithms are described in a simple algorithmic notation and proved correct.

• Parts on communication abstractions.
  – Part I is on the reliable broadcast abstraction.
  – Part II is on the construction of read/write registers.
• Parts on agreement.
  – Part III is on agreement in synchronous systems.
  – Part IV is on agreement in asynchronous systems.

On the presentation style

When known, the names of the authors of a theorem, or of an algorithm, are indicated together with the date of the associated publication. Moreover, each chapter has a bibliographical section, where a short historical perspective and references related to that chapter are given. Each chapter ends with a few exercises and problems, whose solutions can be found in the article cited at the end of the corresponding exercise/problem.

From a vocabulary point of view, the following terms are used: an object implements an abstraction, defined by a set of properties, which allows a problem to be solved. Moreover, each algorithm is first presented intuitively with words, and then proved correct.
Understanding an algorithm is a two-step process:
• First, have a good intuition of its underlying principles and its possible behaviors. This is necessary, but remains informal.
• Then, prove the algorithm is correct in the model it was designed for. The proof consists of logical reasoning based on the properties provided by (i) the underlying model, and (ii) the statements (code) of the algorithm. More precisely, each property defining the abstraction the algorithm is assumed to implement must be satisfied in all its executions.

Only when these two steps have been completed can we say that we understand the algorithm.

Audience

This book has been written primarily for people who are not familiar with the topic and the concepts that are presented. These include mainly:
• Senior-level undergraduate students and graduate students in informatics or computing engineering, who are interested in the principles and algorithmic foundations of fault-tolerant distributed computing.
• Practitioners and engineers who want to be aware of the state-of-the-art concepts, basic principles, mechanisms, and techniques encountered in fault-tolerant distributed computing.

Prerequisites for this book include undergraduate courses on algorithms, basic knowledge of operating systems, and notions of concurrency in failure-free distributed computing. One-semester courses based on this book are suggested in the section titled "How to Use This Book" in the Afterword.

Origin of the book and acknowledgments

This book has two complementary origins:
• The first is a set of lectures for undergraduate and graduate courses on distributed computing I gave at the University of Rennes (France), the Hong Kong Polytechnic University, and, as an invited professor, at several universities all over the world. Hence, I want to thank the numerous students for their questions that, in one way or another, contributed