Thèse n° 8271

The Hidden Complexity of Distributed Systems

Présentée le 15 octobre 2020

à la Faculté informatique et communications
Laboratoire de calcul distribué
Programme doctoral en informatique et communications
pour l'obtention du grade de Docteur ès Sciences
par

Karolos ANTONIADIS

Acceptée sur proposition du jury

Prof. E. Telatar, président du jury
Prof. R. Guerraoui, directeur de thèse
Prof. R. Friedman, rapporteur
Prof. F. Pedone, rapporteur
Prof. M. Grossglauser, rapporteur

2020

Τό νά ὁραματίζεσαι ἕνα σκοπό, αὐτό δέν εἶναι ἀντίθετο μέ τήν πράξη, ἀπεναντίας εἶναι ἡ ἀρχή της. ᾿Αλλά πρέπει ἐν συνεχείᾳ να ὁρίσεις τα μέσα τοῦ σκοποῦ, τά συγκεκριμένα σκαλοπάτια ἕνα ἕνα καί νά τά πατᾶς. ᾿Εδῶ ὃμως πιά ὁ ἡδονισμός τοῦ ὡραίου καί τοῦ ὑψηλοῦ σέ ἐγκαταλείπει καί θἄπρεπε νά τόν ἀντικαταστήσει ἡ πειθαρχία καί ἡ ἄσκηση τῆς συγκέντρωσης καί τῆς δουλειᾶς. — Κωνσταντίνος Τσάτσος

Man muss nur gehn: Kein Gefühl ist das fernste. — Rainer Maria Rilke

To my parents, Sigrid and Jannis

Acknowledgements

I am extremely grateful to Prof. Rachid Guerraoui for giving me the opportunity to freely pursue the work presented in this thesis. I want to thank him for being a great mentor, extremely patient, and encouraging during all those years.

I was lucky enough to meet some amazing people during my studies whom I have the honor of considering good friends: Georgios Chatzopoulos, Georgios Damaskinos, David Kozhaya, Dragos-Adrian Seredinschi, Vasileios Trigonakis, and Igor Zablotchi. Special thanks go to Igor Zablotchi: without our discussions this work would not have been possible. Additionally, the whole PhD experience would have been rather dull without the interaction with Aggelos Biboudis, Panayiotis Danassis, Tudor David, Chi Thang Duong, El Mahdi El Mhamdi, Jad Hamza, Lê Nguyên Hoang, Christos Kotsalos, Rhicheek Patra, Matej Pavlovic, Javier Picorel, Georg Stefan Schmid, Mahsa Taziki, and Jingjing Wang. Furthermore, I would like to thank France Faille and Fabien Salvi for always being there to help and answer my bureaucratic (and not only bureaucratic) questions.

I would like to express my gratitude to two people who played a great role in my academic path. I owe a lot to Prof. Panagiota Fatourou at the University of Crete for inspiring me in my early study years and introducing me to research, and to Prof. Souzana Papadopoulou for being a lifelong mathematics mentor and a great role model.

Naturally, I owe everything to my parents, Sigrid and Jannis, and my siblings Katerina and Antonios, for always being there for me, no matter what. Words cannot describe how thankful and fortunate I am to be part of this great family.

Finally, I want to thank my wife, Maria. Meeting her is the best and most beautiful thing that ever happened to me. I am glad she was part of this journey. The best is yet to come.

Bern, September 11, 2020 K. A.


Preface

The work presented in this thesis was conducted in the Distributed Computing Laboratory at EPFL under the supervision of Prof. Rachid Guerraoui.

The results of this thesis are based on the following articles:

• Karolos Antoniadis, Vincent Gramoli, Rachid Guerraoui, Eric Ruppert, and Igor Zablotchi. “Leaderless Consensus.” Under submission.

• Karolos Antoniadis, Diego Didona, Rachid Guerraoui, and Willy Zwaenepoel. “The Impossibility of Fast Transactions.” In the Proceedings of the 34th International Parallel and Distributed Processing Symposium (IPDPS), 2020.

• Karolos Antoniadis, Rachid Guerraoui, Dahlia Malkhi, and Dragos-Adrian Seredinschi. “State Machine Replication is More Expensive than Consensus.” In the 32nd International Symposium on Distributed Computing (DISC), 2018.

Besides the aforementioned papers, during my doctoral studies I was also involved in work that resulted in the following publications:

• Karolos Antoniadis, Rachid Guerraoui, and Vasileios Trigonakis. “Thread-Placement Learning.” In the Proceedings of the 40th International Conference on Distributed Computing Systems (ICDCS), 2020.

• (Book Chapter) Karolos Antoniadis and Rachid Guerraoui. “The Notions of Time and Global State in a Distributed System.” In the book “Concurrency: The Works of Leslie Lamport”, 2019.

• Karolos Antoniadis, Peva Blanchard, Rachid Guerraoui, and Julien Stainer. “The entropy of a distributed computation: random number generation from memory interleaving.” In the Distributed Computing journal, 2018.


Abstract

The field of distributed computing has a long history of more than fifty years. During that time, our understanding of the field has improved immensely and a certain body of folklore beliefs has formed. However, such folklore beliefs are not necessarily always true.

In this thesis, we identify hidden complexities (in the sense of intricacies or costs) of distributed systems that contradict some of these folklore beliefs. Specifically, we challenge accepted beliefs in the following ways: i) consensus algorithms need not be leader-based: there exists a deterministic leaderless consensus algorithm that is robust to non-synchronous periods, ii) completing a state machine replication command can be arbitrarily more expensive than solving a consensus instance, and iii) no data store actually provides fast transactions because they are impossible. Our results are associated with some of the fundamental problems in the field of distributed computing and are of significant practical relevance.

Keywords: consensus, fast transactions, leaderless, state machine replication


Zusammenfassung

Der Bereich des verteilten Rechnens hat eine lange Geschichte von mehr als fünfzig Jahren. In dieser Zeit hat sich unser Verständnis auf diesem Gebiet immens verbessert, und es hat sich ein gewisser “Volksglaube” herausgebildet. Solche Überlieferungen sind jedoch nicht unbedingt immer wahr.

In dieser Arbeit identifizieren wir versteckte Komplexitäten (im Sinne von Feinheiten oder Kosten) verteilter Systeme, die einigen dieser überlieferten Annahmen widersprechen. Konkret stellen wir in dieser Arbeit akzeptierte Überzeugungen auf folgende Weise in Frage: i) Konsensusalgorithmen müssen nicht führerbasiert sein: Es gibt einen deterministischen führerlosen Konsensusalgorithmus, der robust gegenüber nicht-synchronen Perioden ist, ii) die Ausführung einer Replikation von Zustandsautomaten kann beliebig teurer sein als die Lösung einer Konsensusinstanz, und iii) kein Datenspeicher bietet tatsächlich schnelle Transaktionen, weil sie unmöglich sind. Unsere Ergebnisse stehen im Zusammenhang mit einigen der grundlegenden Probleme auf dem Gebiet des verteilten Rechnens und sind von erheblicher praktischer Relevanz.

Schlüsselwörter: Konsensus, schnelle Transaktionen, führerlos, Replikation von Zustandsautomaten


Contents

Acknowledgements

Preface

Abstract (English/Deutsch)

List of Figures

1 Introduction
   1.1 Distributed Computing Models
   1.2 Consensus and State Machine Replication
   1.3 Transactions
   1.4 Contributions
   1.5 Roadmap

I Consensus and State Machine Replication

2 Leaderless Consensus
   2.1 Introduction
   2.2 Model
   2.3 Leaderless Termination
   2.4 Archipelago: Leaderless Consensus
   2.5 Archipelago: Proof of Correctness
   2.6 Leaderless Consensus in Message Passing
   2.7 ArchSMR: Archipelago in Practice
   2.8 Related Work
   2.9 Conclusion

3 State Machine Replication is More Expensive Than Consensus
   3.1 Introduction
   3.2 Model
      3.2.1 Consensus
      3.2.2 State Machine Replication
   3.3 Complexity Lower Bound on State Machine Replication
      3.3.1 Complexity Lower Bound
      3.3.2 Extension to other Models
   3.4 The Empirical Perspective
      3.4.1 Experimental Methodology
      3.4.2 Experimental Results on a Single Machine
      3.4.3 Wide-area Experiments
   3.5 Discussion
   3.6 Conclusion

II Transactions

4 The Impossibility of Fast Transactions
   4.1 Introduction
   4.2 Model
   4.3 Fast Transactions Are Impossible
   4.4 Unbounded-Version Data Store
   4.5 Conclusion

5 Concluding Remarks

Bibliography

Curriculum Vitae

List of Figures

2.1 Graphical depiction of a synchronous−1 execution
2.2 With 2 processes, Archipelago might never decide in a synchronous−1 execution (v′ > v)
2.3 Execution pattern that appears when the minimum value propagates to the next adopt-commit-max object (x ≥ 2)
2.4 Lemma 7 (1)
2.5 Lemma 7 (2)
2.6 Lemma 7 (3)
2.7 Performance of ZooKeeper and ArchSMR upon server suspension
2.8 Performance of ArchSMR upon rotating suspensions of all servers

3.1 Constructed execution of Theorem 4. Red dashed lines correspond to rounds where a replica is suspended. Replica p1 is suspended for a1 rounds, replica p2 for a2 rounds, etc.
3.2 Experimental results with LibPaxos on a single-machine setup. We compare the cost of SMR commands with the cost of consensus instances in three scenarios.
3.3 Experimental results with LibPaxos on the WAN. Similar to Figure 3.2, we compare the cost of SMR commands with the cost of consensus instances.
3.4 Experimental results with Raft on the WAN. Similar to Figures 3.2 and 3.3, we compare the cost of SMR commands with the cost of consensus instances.

4.1 A transaction t that reads, among others, objects o and o′ cannot read values vi and vj due to the existence of ew. However, transaction t can read values v and vj for objects o and o′ respectively. Note that all three writes ewi, ew, and ewj are performed by the same client c.
4.2 Executions α1 and α2 where client cw writes (in red) objects o1 and o2. Client ch issues a finite number of transactions (in green) between cw's writes, and cr performs a transaction that reads both objects o1 and o2 (in blue).


4.3 At the top, we depict execution α where client cw alternates between writing (in red) objects o1 and o2. Client ch issues a finite number of transactions (in green) after client cw's write of value v_o2^j until values (v_o1^j, v_o2^j) are visible. At the bottom, we depict execution αi. Due to space constraints, we depict the transactions of ch until values (v_o1^i, v_o2^i) and (v_o1^{k+2}, v_o2^{k+2}) are visible with shortened green boxes. The events of client cr's transaction that read objects o1 and o2 are depicted in blue.

1 Introduction

Since the completion of the first fully automatic digital computer in 1941 [126], computers have been fundamentally enhanced in at least two ways. First, the foundational work on ARPANET in the sixties [88] led to computer networks, making computers substantially more practical and subsequently changing the world as we know it. Second, with the advent of multicores and multiprocessors, computers now have multiple processing units, and hence the ability to perform computations in parallel, which has allowed groundbreaking new technologies to emerge. Both of these fundamental enhancements brought distributed systems to the foreground of computing.

A distributed system is, intuitively, a set of independent processes coordinating with each other to achieve a common goal. One might think of a “distributed system” as a set of geographically distant computers that achieve a specific goal by communicating over a network (e.g., the Internet). However, it is not necessary that the processes are geographically distant. Based on the above intuitive definition, a multi-threaded program running on top of a multicore can also be characterized as a distributed system.

Distributed systems can provide us with a few key benefits, such as speed and robustness. Speed, because we can use multiple resources to solve a problem. For example, a multi-threaded program can take advantage of the machine's underlying cores and hence complete its tasks faster. We can also reduce a user's experienced latency by, for example, sharing the resources of an application in such a way that the user's desired resources are geographically closer to the user. Robustness, because even if some process crashes, the system can still take advantage of the remaining non-crashed processes to perform computations. Distributed systems are the cornerstone of today's Internet services, such as search engines, social networks, and banking. However, distributed systems are also found in industrial settings [101], or even in the Boeing 777 aircraft system [124].


The field of distributed computing studies distributed systems and has a long history of more than fifty years, starting from the fundamental work of pioneers such as Edsger W. Dijkstra, who investigated the mutual exclusion problem [46]. Distributed computing examines whether a problem is solvable in a distributed system, as well as the efficiency of such a solution. Although distributed computing can be practical in nature, in this thesis, unless said otherwise, when we talk about distributed computing we refer to theoretical distributed computing. Specifically, distributed computing concerns mathematical models that capture the characteristics of real or theoretical distributed systems.

In the rest of this chapter, we describe some preliminary notions behind models used in distributed systems. Then, we present the fundamental abstractions of consensus, state machine replication, and of a transaction in a distributed system, before listing the contributions of this thesis. We conclude this chapter by presenting the roadmap for the rest of this dissertation.

1.1 Distributed Computing Models

To study a distributed system, we need to abstract the underlying distributed system by devising a mathematical model. There are two broad categories of models that we typically use, depending on whether we are modeling a message-passing (i.e., processes communicate over a network) or a shared-memory (i.e., processes communicate by sharing memory) distributed system. For example, we can model a message-passing distributed system as a set of state machines where each state machine corresponds to a process, and these state machines change their state in response to a received message or to some local computation.
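For illustration, the following minimal Python sketch (not taken from the thesis; all names are illustrative) models each process of a message-passing system as a state machine whose state changes only upon delivering a message or performing a local computation step.

    class Process:
        """A process modeled as a state machine."""

        def __init__(self, pid, initial_state):
            self.pid = pid
            self.state = initial_state
            self.outbox = []  # messages produced by the last transition

        def on_message(self, sender, msg):
            # Transition triggered by a received message: the new state is a
            # function of the current state and the message; replies go to the outbox.
            self.state = ("delivered", sender, msg, self.state)
            self.outbox.append((sender, ("ack", self.pid)))

        def on_local_step(self):
            # Transition triggered by local computation only.
            self.state = ("local-step", self.state)


    # A configuration of the modeled system: the state of every process
    # (plus the messages in transit, omitted here for brevity).
    processes = {i: Process(i, initial_state="init") for i in range(3)}
    processes[1].on_message(sender=0, msg="hello")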

A model should be as general as possible but should also try to capture the peculiarities of the modeled distributed system. Therefore, we can further classify a message-passing or shared-memory model as synchronous or asynchronous. For example, in an embedded processor we may assume that there are bounds on the time it takes for a process to read from or write to shared memory, and those bounds are strictly enforced. Similarly, in an industrial setting we may assume that there are bounds on the time it takes for a message to be transmitted to another process. A model that assumes such bounds is called synchronous. However, a synchronous model would not be able to capture a general multicore machine, because a long context switch might violate these timing assumptions. Similarly, modeling a distributed system running on top of a wide-area network with a synchronous model is a bad idea, because the transmission of a message over the network might take an arbitrary amount of time. We can therefore devise a model that makes no such timing assumptions. We call such a model asynchronous. We say that a model is eventually synchronous when there is a point after which it behaves synchronously (i.e., some timing assumptions eventually hold).
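To make the distinction concrete, here is a small, hypothetical Python sketch (not from the thesis) of the timing assumption behind each model flavor: a synchronous model enforces a delay bound at all times, an asynchronous model enforces none, and an eventually synchronous model enforces the bound only after some unknown stabilization time.

    DELTA = 10    # assumed bound on message delay (arbitrary time units)
    GST = 1000    # global stabilization time; unknown to the algorithm, used only by the model


    def delay_allowed(model, send_time, delivery_time):
        """Return True if delivering at `delivery_time` a message sent at
        `send_time` is permitted by the given model."""
        delay = delivery_time - send_time
        if model == "synchronous":
            return delay <= DELTA                      # the bound always holds
        if model == "asynchronous":
            return True                                # any finite delay is allowed
        if model == "eventually synchronous":
            return send_time < GST or delay <= DELTA   # the bound holds only after GST
        raise ValueError(f"unknown model: {model}")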

Having a model, we can formally answer whether we can solve a specific problem using the underlying (modeled) distributed system, and how efficient such a solution is. Solving a problem is equivalent to devising an algorithm, namely defining the set of state machines.

Finally, we can further augment a model by capturing the types of failures that can occur in the underlying distributed system. For example, a machine might crash, some process might behave in a malicious way, there might be a partition in the network, etc. We say that an algorithm is fault-tolerant if the algorithm operates in a model that allows some of the processes to be faulty (e.g., crashed).

1.2 Consensus and State Machine Replication

The problem of consensus, formally introduced by Pease, Shostak, and Lamport [103], is one of the fundamental problems in the field of distributed computing. In this problem, a set of distributed processes need to reach agreement on a single value. Consensus has been extensively studied, with fundamental results such as FLP [51], and also plays an important role in shared-memory systems [61].

Agreeing only once on something is arguably not that useful in a distributed system. However, by utilizing multiple instances of consensus, we can create what is known as state machine replication (SMR). Essentially, SMR consists of replicating a sequence of commands – often known as a log – on a set of processes that replicate the same state machine. These commands represent the ordered input to the state machine. SMR can be used to transform almost any algorithm into a fault-tolerant one. This is achieved by replicating the same state machine across multiple servers. When one server crashes, the remaining servers remain accessible, giving the illusion of a single, always available machine.

SMR has been successfully deployed in applications ranging from storage systems [102] to lock and coordination services [64]. At a high level, SMR can be viewed as a sequence of consensus instances, so that each value output from an instance corresponds to a command in the SMR log.
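As a rough illustration of this view, the sketch below (assumptions: a consensus_propose(slot, value) primitive that returns the value decided for that slot; all names are illustrative) feeds each command into its own consensus instance and applies the decided values in slot order, so that every replica applies the same sequence.

    def run_replica(apply_command, consensus_propose, incoming_commands):
        """Drive one replica of an SMR service: one consensus instance per log slot."""
        log = []
        for slot, command in enumerate(incoming_commands):
            decided = consensus_propose(slot, command)  # may differ from `command` if
            log.append(decided)                         # another replica won this slot
            apply_command(decided)                      # all replicas apply the same sequence
        return log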

1.3 Transactions

A transaction is a fundamental abstraction for manipulating multiple objects atomically in a data store. Data stores can exhibit erratic behavior for a plethora of reasons: computers might crash, there could be bugs in the code of the database, etc. [69]. Transactions ameliorate the negative effects of these problems, as well as aid the programmer's job. For this reason, transactions are now ubiquitous in the world of data stores, and most well-known database systems provide them (e.g., MySQL, PostgreSQL, and MongoDB).


The most basic interface of a data store is a read-write interface [74] on a set of objects, where each object can be identified by what is called a key. At a high level, the client issues a request to the data store by identifying the object (e.g., by providing the key) the client wants to access. Subsequently, the server responds to the client with the desired data. Data stores augment their interface by providing transactions. Since data stores are extensively used for read-heavy workloads, they aim to optimize read-only transactions, which are the most frequent in practice [28]. It is thus natural to seek implementations of read-only transactions that are as fast as possible.

Lately, there has been an attempt to introduce the notion of a fast transaction [87]. A fast transaction captures the requirement that a read-only transaction is one round-trip, non-blocking, and one-version. One round-trip means that a client does not contact a server more than once during a transaction. Non-blocking states that servers should not communicate with each other before responding to the client. Finally, one-version asks that a server only sends one value for each read object.
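The following hypothetical client-side sketch (the partition_of, send, and receive helpers are assumptions, not part of any real data store API) illustrates the three requirements: each server holding a read object is contacted exactly once (one round-trip), servers answer from local state without coordinating with each other (non-blocking, which is not visible on the client side), and each server returns a single value per object (one-version).

    def fast_read_only_transaction(keys, partition_of, send, receive):
        # One round-trip: group the keys by server and send a single request per server.
        keys_by_server = {}
        for key in keys:
            keys_by_server.setdefault(partition_of(key), []).append(key)
        for server, server_keys in keys_by_server.items():
            send(server, ("read", server_keys))

        # One-version: every server replies with exactly one value per requested key.
        result = {}
        for server in keys_by_server:
            for key, value in receive(server):
                result[key] = value
        return result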

1.4 Contributions

The field of distributed computing has a long and rich history. During this history, distributed computing has produced an impressive battery of theoretical results that allow us to understand the field very well. Thus, for better or for worse, a certain body of folklore beliefs has formed. However, these folklore beliefs are not necessarily always true.

The Oxford English Dictionary defines “complexity” as “the state or quality of being intricate or complicated.” Contemporary distributed systems exhibit a level of complexity that sometimes refutes the formed beliefs.

In this thesis, we uncover hidden complexities (in the sense of intricacies or costs) of distributed systems that challenge accepted beliefs in the following ways: i) consensus algorithms need not be leader-based: there exists a deterministic leaderless consensus algorithm that is robust to non-synchronous periods, ii) completing a state machine replication command can be arbitrarily more expensive than solving a consensus instance, and iii) no data store actually provides fast transactions because they are impossible. In what follows, we present three flavors of intricacies along with our concrete contributions.

Consensus Classic eventually synchronous consensus algorithms are leader-based. A leader helps them converge fast when the system is initially synchronous, but impacts performance when the system is not. Therefore, a leader-based algorithm has to pay a substantial performance cost when the leader is slow or crashes. In other words, leader-based consensus algorithms are not robust to non-synchronous periods of time. This thesis asks whether, under eventual synchrony, it is possible to deterministically solve consensus without a leader. One might believe that some form of leader is inherently needed, because the weakest failure detector to solve consensus provides a leader [37]. We prove that this is not the case. Specifically, we:

1. study the very definition of a leaderless consensus algorithm;
2. devise and prove correct an eventually synchronous deterministic leaderless consensus algorithm;
3. utilize our leaderless consensus algorithm to build a state machine replication algorithm and illustrate its robustness.

State Machine Replication (SMR) Consensus and SMR are generally considered to be equivalent problems. Indeed, the two problems are computationally equivalent: any solution to the former problem leads to a solution to the latter, and vice versa. In this thesis, we study the relation between consensus and SMR from a complexity perspective. We find that, surprisingly, completing an SMR command can be more costly than solving a consensus instance. Our contributions are:

1. proving that completing an SMR command is more expensive than solving a consensus instance (i.e., in a model where each consensus instance takes bounded time to complete, completing an SMR command might take an arbitrary amount of time);
2. empirically showing that our result corresponds to practical phenomena.

Transactions We prove that fast transactions in a fault-tolerant data store have to perform a write and argue that a transaction that writes cannot be non-blocking. This by definition contradicts the very existence of fast transactions and renders them impossible. Specifically, we show that fast transactions are impossible if we want to tolerate the failure of even one server. In other words, by unveiling the hidden cost (i.e., writing) of fast transactions, we refute the belief that fast transactions are possible. In this thesis, we:

1. devise a formal framework that is general enough to capture any data store, while at the same time precisely capturing the notion of fast read-only transactions;
2. prove that fast transactions, as well as a weaker definition of fast transactions, are impossible.

Our impossibility results are extremely useful because they clarify the limitations of real data stores and prevent practitioners from chasing impossible designs.


1.5 Roadmap

The thesis consists of two parts. In the first part (Chapters 2 and 3) we investigate general concepts regarding consensus and SMR. Specifically, in Chapter 2 we study the very definition and feasibility of a leaderless consensus algorithm. We provide a definition and present an eventually synchronous deterministic leaderless consensus algorithm. Among other things, we use this consensus algorithm in an SMR implementation and show its robustness against arbitrary server failures. Then, in Chapter 3 we prove the surprising result that SMR is more expensive than a repetition of consensus instances. In the second part (Chapter 4), we prove that transactions cannot be fast in an asynchronous fault-tolerant system. Finally, in Chapter 5 we conclude this thesis and discuss future research directions.

Part I
Consensus and State Machine Replication


2 Leaderless Consensus

Unlike classic synchronous consensus algorithms, eventually synchronous consensus algorithms are leader-based. A leader helps them converge fast when the system is initially synchronous, but impacts performance when the system is not.

In this chapter, we ask whether, under eventual synchrony, it is possible to deterministically solve consensus without a leader. Actually, the fact that the weakest failure detector to solve consensus provides a leader [37] seems to indicate that some form of leader is inherently needed. We show that consensus algorithms need not be leader-based: there exists an eventually synchronous deterministic leaderless consensus algorithm.

In the rest of this chapter, we first address the question of the very meaning of a leaderless consensus algorithm. For pedagogical reasons, we do so in a shared memory system. Then, we present Archipelago, a new leaderless eventually synchronous consensus algorithm for shared memory, along with its message passing variant, and prove it correct. We also present a conjecture on how a modification of Archipelago can tolerate Byzantine failures. Finally, we implement a leaderless state machine replication algorithm based on the message passing variant of Archipelago and illustrate its robustness, compared to ZooKeeper, in a failure scenario.

2.1 Introduction

Deterministic consensus algorithms that tolerate periods of asynchrony are considered robust because their safety is preserved even if the system is asynchronous [58]. These algorithms are typically leader-based [16,35,37,72,77]. A leader is convenient because it schedules commands when used in the context of state machine replication (SMR), helps ensure the uniqueness of a decision, and enforces a fast decision in good cases, i.e., when the system is synchronous and there is no failure or contention.


Unfortunately, the leader is a limitation in many ways, especially when used in the general context of SMR [6, 15, 30, 41, 57, 60, 64, 91, 96, 121, 125]. The leader is often considered the bottleneck in large distributed systems [64, 96]. A faulty leader has to be replaced through view-change procedures that are usually complex and error-prone [6]. In periods of asynchrony, a correct leader can be wrongly suspected. In addition, the choice of the timeout before suspecting a leader impacts the performance of practical systems [64, 100]. For example, the leader election of ZooKeeper takes at least 200 ms [67] but a view change may delay concurrent operations by two orders of magnitude [57].

A lot of work has been devoted to minimizing the role of the leader by, for example, changing the leader frequently [30, 125] or tolerating multiple concurrent leaders [57, 91]. Note however that this work only eliminates the leader from the SMR algorithm. The underlying consensus algorithm for a single SMR slot or consensus instance remains Paxos-like and hence leader-based. In this work, we investigate whether an algorithm is leaderless first at the level of a single consensus instance. Hence, the leaderless property of the overlying SMR is obtained naturally by combining multiple consensus instances.

The natural question arises: Is it possible to eliminate the very notion of a leader completely in a deterministic consensus algorithm and still tolerate periods of asynchrony? Actually, the facts that (a) the weakest failure detector to solve consensus provides a leader [37] and that (b) one correct process must have as many eventually timely links as there can be failures [7,26], in order to solve consensus in eventually synchronous systems, seem to indicate that some form of leader is inherently needed.

To address this question it is necessary to first define what a leaderless algorithm is. Although motivated by a message passing system, we start by doing so, for pedagogical reasons, in a shared memory system, before going to message passing. Intuitively, a leaderless consensus algorithm is an algorithm that terminates and solves consensus even in scenarios where an adversary can suspend any process within each round of an eventually synchronous execution. In fact, the idea can be generalized from suspending 1 process to suspending k. We define the notion of a synchronous−k (which reads “synchronous minus k”) round-based model where executions are (eventually) synchronous and at most k < n processes can be suspended per round. A leaderless algorithm is one that decides in an eventually synchronous−1 (or ♦synchronous−1) system.

With this definition of a leaderless algorithm at hand, we face the second challenge of asking whether we can devise a deterministic algorithm that solves consensus in an eventually synchronous−1 system. To illustrate the difficulty, assume the system is synchronous−1 from the start and consider the classic idea of exchanging values in rounds and adopting the maximum one. Because the adversary can suspend the process with the maximum value for as long as it wants, it is hard to guarantee that this process decides the same value as others.


Despite this powerful adversary, we show the existence of an eventually synchronous consensus algorithm that is leaderless. This algorithm, called Archipelago, builds upon a new variation of an adopt-commit object [52] whose invocation by different processes helps them converge towards the same output value without a leader. Proving its leaderless property is not immediate because it requires at least n ≥ 3 processes.

Roadmap. Section 2.2 presents our model. Section 2.3 formalizes the notion of a leaderless consensus algorithm and discusses why well-known leader-based consensus algorithms do not fulfill this property. Section 2.4 presents a leaderless consensus algorithm in shared memory that makes use of a sequence of modified adopt-commit objects. Section 2.5 describes the proof, which we believe is interesting in its own right. Section 2.6 discusses the message passing leaderless algorithms that tolerate crash and Byzantine failures. Section 2.7 shows empirically the robustness advantage of an SMR based on Archipelago over ZooKeeper. Finally, Section 2.8 lists the related work.

2.2 Model

We consider an asynchronous shared-memory model with n processes P = {p1, p2, ..., pn}. Processes have access to an (infinite) set R of atomic registers that can each store values from a set V. Initially, all registers contain the initial value ⊥. For notational simplicity, we assume that R includes an infinite set of single-writer multi-reader (SWMR) arrays of n registers each.

We denote these arrays as R1, R2, ... where a process pi can write locations R1[i], R2[i], .... Processes communicate by reading from and writing to atomic registers. A process is a state machine that can change its state as a result of reading a register or writing to a register. An algorithm is the state machine for each process. A configuration corresponds to the state of all processes and the values in all registers in R. An initial configuration is a configuration where all processes are in their initial state and all registers in R contain value ⊥. When a process p performs a read or a write, we say that p performs a read or write event, respectively. An execution corresponds to an alternating sequence of configurations and events, starting from an initial configuration. For example, in the execution α = C, read(r,v)p, C′, write(r′,v′)p′, C′′ we have processes p, p′ ∈ P, registers r, r′ ∈ R, values v, v′ ∈ V, and configurations C, C′, C′′ where C is an initial configuration, and the system moves from configuration C to C′ when p reads v from r, and from C′ to C′′ when p′ writes v′ to r′. We assume that all executions are well-formed, which roughly speaking means that, for a process p to perform an event after configuration C in an execution, there must be a transition specified by p's state machine from p's state in C. In this work, we consider deterministic algorithms and hence the initial state of the processes and the sequence of processes that take steps define a unique well-formed execution.

An execution α′ is called an extension of a finite execution α if α is a prefix of α′. Two executions α and β are equal if both executions contain the exact same configurations and events in the same order. An algorithm A is a set of state machines for each process and their initial states.

Synchronous−k execution. After setting up the main characteristics of our model, we can now define what it means for an execution to be synchronous in a shared-memory system.

Our definition is inspired by message passing, because our main motivation is to devise a leaderless consensus algorithm for the message passing model. In message passing models, during periods of synchronous behavior there is a bound on the time needed for a message to propagate from one process to another and for the receiver to process this message. Hence, in message passing we can divide time into rounds [48] such that, in each round, every process p does the following: (i) sends a message to every other process in the system, and (ii) delivers any message that was sent to p and performs some local computation.

We adapt this notion of synchrony to shared memory. We assume that processes take steps in rounds. Specifically, in each round, every process pi does the following: (i) performs a write in some Rj[i], and (ii) collects all the values written in array Rj. In one round, different processes can read from different arrays.

We precisely define this notion of synchrony below. A collect by a process pi on an array Rj is defined as a sequence of n read events: collect(Rj)pi = read(Rj[1], ·)pi, ..., read(Rj[n], ·)pi. By abuse of notation, and because for defining collect we are not interested in the read values, we use the “·” symbol. Similarly, in the rest of this chapter, we use · at any point where we do not know or are not interested in the exact operation, value, etc. We define a step of Rj by a process pi as a write event and then a collect on Rj. So, step(Rj)pi = write(Rj[i], ·)pi, collect(Rj)pi. A round consists of all the write events write(Rj1[1], ·)p1, ..., write(Rjn[n], ·)pn, followed by a sequence collect(Rj1)p1, ..., collect(Rjn)pn of collects by the exact same processes that performed a write event. Note that indices ja and jb could be the same for a ≠ b. For example, if we only consider two processes {p1, p2}, then a round r could be the following sequence of events: r = write(Rj1[1], ·)p1, write(Rj2[2], ·)p2, collect(Rj1)p1, collect(Rj2)p2.
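To make the definitions of a collect and a step concrete, here is a small illustrative Python sketch (not the formal model itself; indices are zero-based for convenience): process pi first writes its own location of an SWMR array Rj and then reads all n locations of that array.

    def collect(R_j):
        # A collect is a sequence of n read events, one per location of the array.
        return [R_j[k] for k in range(len(R_j))]


    def step(R_j, i, value):
        # A step on R_j by process p_i: a write to its own location, then a collect.
        R_j[i] = value
        return collect(R_j)


    # Example round with two processes: p_1 and p_2 may step on different arrays.
    n = 2
    R1, R2 = [None] * n, [None] * n
    view_p1 = step(R1, 0, "x")  # p_1 writes R1[0], then collects R1
    view_p2 = step(R2, 1, "y")  # p_2 writes R2[1], then collects R2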


Figure 2.1 – Graphical depiction of a synchronous−1 execution (an X marks a round in which the process is suspended):

Round:   1         2         3         4         5         6         7         8         9         10        11
p1:      step(R5)  X         step(R2)  step(R6)  step(R3)  X         step(R3)  step(R2)  step(R1)  step(R4)  step(R1)
p2:      step(R2)  step(R4)  X         X         X         step(R2)  step(R1)  X         X         X         X

To capture the notion that a process is suspended in a round r, we denote by r|−Ps (notice the minus in the notation) all the steps of r except the ones taken by processes in Ps. For instance, for the above sequence r, we have r|−{p1} = write(Rj2[2], ·)p2, collect(Rj2)p2. We say that an execution α is synchronous−k (which reads “synchronous minus k”) if α is equal to a sequence of rounds r1|−Ps1, r2|−Ps2, r3|−Ps3, ... and |Psi| ≤ k for i ≥ 1. In other words, at most k processes can be suspended in each round. A suspended process p in a round r performs no events in r. For this reason, we call such an execution “synchrony minus k,” since all processes except at most k behave synchronously in each round. We say that an infinite execution α is eventually synchronous−k (or ♦synchronous−k) if an infinite suffix of α is equal to a synchronous−k execution. Naturally, a synchronous−k execution for k = 0 corresponds to a fully synchronous execution, while synchronous−k with k > 0 allows for some asynchrony in an execution.

In a synchronous−k or ♦synchronous−k execution α, we say that a round r′ occurs after round r if the events of round r′ appear after the events of round r in α.

We say that a process p is correct in an infinite execution α if p is not suspended forever in α. More precisely, a process p is correct in an infinite execution if, for every round r, there exists a later round r′ such that process p is not suspended in r′. Figure 2.1 depicts a synchronous−1 execution for two processes p1 and p2. Processes p1 and p2 take steps in a sequence of rounds, starting from round 1 and ending in round 11. The X symbol in a round indicates that the process is suspended in this round. In Figure 2.1, both processes perform steps in the first round, process p1 in array R5, while process p2 in array R2. Then, in the next round, process p1 is suspended, and so on.

Our shared-memory model is inspired by message-passing models [10, 105, 106, 108, 109] that allow for message omissions in each round, as well as by shared-memory models that combine an update and a collect (or scan), such as the iterated immediate snapshot model [24, 25]. Nevertheless, we opted for our simple model that mainly refers to synchronous executions and can easily be used to capture the notion of a leaderless algorithm. Additionally, note that in our model, a process is either fully suspended in a round (i.e., all communication from and to this process is omitted) or not suspended at all in that round.


Consensus. In consensus [33], each process proposes a value by invoking a propose(v) function and then all processes have to decide on a single value. Consensus is defined by the following three properties. Validity states that a decided value was previously proposed. Agreement states that no two processes decide different values, and termination states that every correct process eventually decides. We say that a consensus algorithm decides in an execution α if a propose(v) function call by some process p returns in α.

2.3 Leaderless Termination

We introduce a new liveness property that captures the notion that an algorithm can terminate without a leader.

Definition 1 (Leaderless Termination). A consensus algorithm A satisfies leaderless termination if, in every ♦synchronous−1 execution of A, every correct process decides.

We can now define what it means for a consensus algorithm to be leaderless.

Definition 2 (Leaderless Consensus Algorithm). A consensus algorithm is leaderless if it satisfies validity, agreement, and termination, as well as leaderless termination.

Note that in Definition 2, termination does not imply leaderless termination (see Section 2.5). A consensus algorithm that is not leaderless is called leader-based. To the best of our knowledge, this is the first formal definition of what leaderless means. The definition can be easily extended to the message-passing model, as we describe in Section 2.6.

Intuitively, Definition 2 is based on the idea that a leader-based algorithm encompasses the notion of a leader process. A process has first to become a leader, and subsequently the leader process should be able to continuously perform a number of rounds in order for the consensus algorithm to decide. Therefore, after a process becomes a leader, the leader has to be unsuspended for at least one round in order for the consensus algorithm to decide. This implies that a potential adversary could prevent a leader-based consensus algorithm from deciding by selectively suspending a process the moment it becomes a leader. Therefore, an algorithm that decides even if the adversary has the right to suspend one process per round has to be a leaderless algorithm. Note that we do not argue that a leaderless algorithm always has an advantage over a leader-based algorithm just because it always decides in an ♦synchronous−1 execution. For example, in an execution with no suspensions, a leader-based algorithm could perform better. The reason we devise Definition 2 is to capture whether an algorithm is leaderless or leader-based.

Naturally, we do not argue that this is the only way to capture the notion of a leaderless algorithm. As a matter of fact, settling on Definition 2 was an arduous task. For instance, one might be tempted to define a consensus algorithm as leaderless if, during an execution, irrespective of which process crashes (i.e., gets suspended forever), the algorithm decides in the exact same number of rounds. More specifically, the idea would be to “stop” an execution in a bivalent state and examine the minimum number of rounds the algorithm needs to decide when crashing different processes. If the number of rounds is the same irrespective of which process we crash, this suggests that no process has more responsibility than another and hence the algorithm is leaderless. However, we favor Definition 2, because an algorithm that satisfies our definition is robust against the adaptive behavior of a dynamic adversary.

Examples. It is easy to design a consensus algorithm that decides in finite time in all synchronous−1 executions, but that could nevertheless violate safety (e.g., agreement) in an ♦synchronous−1 execution. For completeness, in what follows, we describe such an algorithm. We present Algorithm 1, an example of a consensus algorithm that decides in every synchronous−1 execution, but that violates safety (i.e., agreement) when executed in an ♦synchronous−1 execution.

Algorithm 1 Consensus algorithm that correctly decides in every synchronous−1 execution
 1: ▷ Shared state
 2: Reg[n] ← {⊥, ..., ⊥}          ▷ array of n single-writer multi-reader registers
 3:
 4: ▷ process pi proposes value v
 5: procedure propose(v)
 6:     ▷ first round
 7:     Reg[i] ← v
 8:     vals ← collect(Reg) \ {⊥}
 9:     if ∃〈commit, cv〉 ∈ vals then
10:         dv ← cv                ▷ pi was suspended in the first round, hence adopt committed value
11:     else
12:         dv ← max({v : v ∈ vals ∨ 〈·, v〉 ∈ vals})
13:
14:     ▷ second round
15:     Reg[i] ← 〈commit, dv〉
16:     return dv

Algorithm 1 satisfies validity, agreement, and decides in finite time in every synchronous−1 execution. Clearly, Algorithm 1 does not have a distinguished (leader) process that drives the decision, and the algorithm decides in two rounds if the system is synchronous−1. However, this algorithm is not leaderless according to Definition 1, because it does not tolerate asynchrony: in an ♦synchronous−1 execution, the algorithm can violate safety.

We prove below that Algorithm 1 satisfies validity, agreement, and decides in finite time in every synchronous−1 execution.

Validity. Each process writes the proposed value in Reg[i] (Line 7) and then collects (Line 8) all the values written in Reg. Hence, variable vals contains only proposed values. Then, if there is a 〈commit, cv〉 pair in vals, the algorithm decides cv, stores 〈commit, cv〉 in Reg[i], and returns (lines 9, 10, and 15). Otherwise, the algorithm retrieves the maximum value stored in vals, and hence retrieves a proposed value (Line 12). The process then stores 〈commit, dv〉 in Reg[i] and returns (Line 15).

Agreement. Algorithm 1 satisfies agreement in a model with n ≥ 3 processes. In a model with n ≥ 3 processes, at least one process p performs steps in both rounds one and two. Process p writes 〈commit, v〉 (Line 15) in the second round and the algorithm decides v. If multiple processes were unsuspended in the first round, then all of these processes retrieve the same maximum value (Line 12), and hence write the exact same 〈commit, dv〉 pair in the second round (Line 15). Any process that was suspended in the first or second round reads the committed value (Line 10) and hence decides on the same value.

In a model with n = 2 processes, Algorithm 1 could violate agreement, even in a synchronous−1 execution. For example, assume two processes p1 and p2 that propose v and v′ respectively (with v < v′). Then, consider that process p2 is suspended in the first round and process p1 is suspended in the second round. Both processes p1 and p2 are unsuspended in the third round. In such an execution, p1 writes v to Reg[0] and then retrieves the maximum value in Reg, which is v. Then, in the second round, process p2 writes v′ to Reg[1] and retrieves the maximum value in Reg, which now is v′. Hence, in the third round, processes p1 and p2 decide v and v′ respectively.

Algorithm 1 is a straightforward algorithm. The real challenge is to devise a leaderless consensus algorithm that decides in finite time in every ♦synchronous−1 execution and never violates safety. We present such an algorithm in Section 2.4.

Consider Algorithm 2, a leader-based shared-memory algorithm that is at the heart of the well-known Paxos algorithm [77]. As a matter of fact, Algorithm 2 combined with a leader election algorithm corresponds to Paxos in shared memory.

In Algorithm 2, all processes share an array R of n single-writer multi-reader (SWMR) registers (Line 2); each element of R is a pair 〈a, b〉 where a corresponds to a value and b to a timestamp. Additionally, each process stores locally a ts value (Line 5). When a process pi invokes propose(v), pi first stores its current timestamp value to R[i] (Line 10). Then, pi retrieves a value from array R, the one associated with the highest timestamp (Line 11). If no such value exists, and hence val = ⊥, then pi proposes its own value v (Line 13). Otherwise, process pi proposes the value that pi retrieved from array R. Then, pi stores the pair 〈val, ts〉 to array R[i] (Line 14) and examines whether the highest timestamp in R is the one that pi wrote (Line 15). If this is the case, the algorithm decides (Line 16); otherwise pi increases ts and repeats the loop (Line 17).


Algorithm 2 Leader-based consensus algorithm
 1: ▷ Shared state
 2: R[n] ← {〈⊥, 0〉, ..., 〈⊥, 0〉}    ▷ one SWMR register per process
 3:
 4: ▷ Local state
 5: ts ← i                          ▷ for process pi
 6:
 7: ▷ process pi proposes value v
 8: procedure propose(v)
 9:     while true do
10:         R[i].ts ← ts
11:         val ← getHighestTspValue(R)
12:         if val = ⊥ then
13:             val ← v
14:         R[i] ← 〈val, ts〉
15:         if ts = getHighestTsp(R) then
16:             return val
17:         ts ← ts + n


We can think of Algorithm 2 using Paxos terminology [78]. A timestamp is used like the proposal (or ballot) number of Paxos. Getting the timestamp and retrieving the value with the highest associated timestamp constitutes the prepare phase. Then, writing the 〈val, ts〉 pair and checking that the highest timestamp has not been modified constitutes the propose phase.

According to Definition 2, Algorithm 2 is leader-based. In a ♦synchronous−1 execution where, in each round, the adversary suspends the process that has written the highest value in the R array so far, the algorithm never decides. Specifically, when a process p is about to check whether its timestamp ts is the highest timestamp (Line 15), an adversary can suspend p until some other process p′ stores a timestamp ts′ that is greater than ts in array R (Line 10). This can occur even if the execution is synchronous−1.

2.4 Archipelago: Leaderless Consensus

We introduce Archipelago, our new leaderless consensus algorithm. Archipelago is leaderless according to Definition 1 when n ≥ 3, and never violates safety (i.e., validity and agreement). Let us first recall the adopt-commit object and present our adopt-commit-max variant used in Archipelago.


Adopt-commit-max implementation. The adopt-commit object [52] has the following specification. Every process p proposes an input value v to such an object and obtains an output, which consists of a pair 〈d, v〉; d can be either commit or adopt. The following properties are satisfied:

• CA-Validity: If a process obtains output 〈commit, v〉 or 〈adopt, v〉, then v was proposed by some process.

• CA-Agreement: If a process p outputs 〈commit, v〉 and a process q outputs 〈commit, v′〉 or 〈adopt, v′〉, then v = v′.

• CA-Commitment: If every process proposes the same value, then no process may output 〈adopt, v〉 for any value v.

• CA-Termination: Every correct process eventually obtains an output.

Algorithm 3 depicts a new implementation of an adopt-commit object. It differs from the classic implementation [52] in that, if the collect of A by a process p that proposes v returns different values, then p stores 〈adopt, mv〉 to array B (Line 8) instead of storing 〈adopt, v〉, where mv is the maximum of the values collected from A (max(SA)). Additionally, if all pairs collected from B are of the form 〈adopt, ·〉, then process p returns 〈adopt, mv〉, where mv is max(SA) (Line 12). Note that Algorithm 3 is just a different implementation of the classic adopt-commit object [52] and that its main properties remain the same. These modifications are crucial in order for Archipelago to satisfy leaderless termination.

Algorithm 3 The adopt-commit-max algorithm
 1: ▷ Shared state
 2: A and B, two arrays of n single-writer multi-reader registers, all initially ⊥
 3:
 4: procedure propose(v)                                  ▷ taken by a process pi
 5:     A[i] ← v                                          ▷ step A starts
 6:     SA ← collect(A)                                   ▷ step A ends
 7:     if (SA \ {⊥} = {v′}) then B[i] ← 〈commit, v′〉      ▷ step B starts
 8:     else B[i] ← 〈adopt, max(SA)〉                       ▷ or step B starts
 9:     SB ← collect(B)                                   ▷ step B ends
10:     if SB \ {⊥} = {〈commit, v′〉} then return 〈commit, v′〉
11:     else if 〈commit, v′〉 ∈ SB then return 〈adopt, v′〉
12:     else return 〈adopt, max(SA)〉

In what follows, we prove the correctness of Algorithm 3, which is similar to that of an adopt-commit object [52]. Algorithm 3 satisfies CA-Validity (the max function preserves validity) and CA-Termination (Algorithm 3 does not use waiting or loops). To prove CA-Agreement and CA-Commitment, we first prove the following lemma.

Lemma 1. If B contains two entries (commit, v1) and (commit, v2), then v1 = v2.

Proof. Assume not. Since every process writes in A and B at most once, it must be that some process p1 wrote (commit,v1) and some other process p2 wrote (commit,v2). Thus, it must be that p1 wrote v1 in A, took a collect of A and only saw v1 in that collect. Similarly, it must be that p2 wrote v2 in A, took a collect of A and only saw v2 in that collect. This is impossible: since the processes update A before collecting, it must be that either p1 saw p2’s value, or vice-versa. We have reached a contradiction.

CA-Agreement. In order for a process p to commit v, p must write v to A, collect A and see only entries equal to v; p must then write 〈commit, v〉 to B, collect B and see only entries equal to 〈commit, v〉, and finally return 〈commit, v〉.

Assume by contradiction that process p commits v and some process q commits or adopts v′ ≠ v. q's collect of B cannot include the 〈commit, v〉 entry written by p, otherwise q would adopt v (remember that by Lemma 1, q cannot see any entry 〈commit, v′〉 with v′ ≠ v in B since p writes 〈commit, v〉 to B). Therefore, q's collect of B must happen before p's write to B. Furthermore, q's collect of B must include some entry e = 〈·, v′〉 with v′ ≠ v (written either by q or some other process). But then p's collect of B (which is after p's write to B and therefore after q's collect of B) will also include e, and thus p cannot commit v. We have reached a contradiction.

CA-Commitment. Assume all proposed values are equal. Then no process can write 〈adopt, ·〉 in B; B contains only entries of the form 〈commit, ·〉. By Lemma 1, all such entries have equal values, so all processes that return must commit.

Archipelago Algorithm. The main idea of Algorithm 4 is to use a sequence of adopt-commit-max objects,¹ in combination with a max register. A max register r is a wait-free register that provides a write operation, as well as a read operation (readmax) that retrieves the largest value that was previously written to r [12]. A max register can be implemented by simply letting each process write to a single-writer multi-reader register and then collecting all the values written by all processes and taking the maximum. The adopt-commit-max objects ensure safety, while the max register and the maximum component (Line 12) of the adopt-commit-max help the processes converge on the value they propose to some later adopt-commit-max object in the sequence of adopt-commit-max objects. In a synchronous−1 execution, the processes converge on one value and there is an adopt-commit-max object to which all processes propose this exact single value. Then, due to the CA-Commitment property of the adopt-commit-max object, the adopt-commit-max outputs 〈commit, ·〉 and Archipelago decides in finite time.

¹ For brevity, we use the terms “adopt-commit-max implementation” and “adopt-commit-max object” interchangeably.

Algorithm 4 contains Archipelago. In Archipelago, all processes share an infinite sequence of adopt-commit-max objects (C) and a max register m (lines 14 to 16). Additionally, each process stores locally in variable c the index of the next adopt-commit-max object to be used (Line 19); thus a process intends to use C[c]. Each process p in Archipelago first writes 〈c, v〉 to m (Line 23) and then retrieves the maximum tuple 〈c′, v′〉 stored in m (Line 24). Note that values c and v are not necessarily equal to c′ and v′. Process p then proposes value v′ to C[c′] (i.e., p invokes C[c′].propose(v′) at Line 25) and sets c to correspond to the next to-be-used adopt-commit-max object (Line 26). If process p receives a commit response from some adopt-commit-max object (Line 28), then process p decides and can return. Otherwise, when process p receives an 〈adopt, v′′〉 response, process p adopts the value v′′ (Line 27) and repeats the loop.

Algorithm 4 The Archipelago leaderless consensus algorithm
13: ▷ Shared state
14: C[0, ..., +∞], an infinite array of adopt-commit-max objects in their initial state
15: m, a max register object that initially contains 〈0, ⊥〉.
16:     Note that 〈x, y〉 > 〈x′, y′〉 if x > x′ or (x = x′ and y > y′)
17:
18: ▷ Local state
19: c, index of the adopt-commit-max object to use, initially 0
20:
21: procedure propose(v)
22:     while true do
23:         m.write(〈c, v〉)                     ▷ step R starts
24:         〈c′, v′〉 ← m.readmax()              ▷ step R ends
25:         〈control, v′′〉 ← C[c′].propose(v′)
26:         c ← c′ + 1
27:         if control = adopt then v ← v′′
28:         else return v′′

Recall from Section 2.2 that a step consists of a write and a subsequent collect. Algorithm 4 operates by repeatedly performing three steps. First, a process performs what we call an R step (lines 23-24). Namely, an R step corresponds to the writing and the reading of the max register m. Then, during the proposal to an adopt-commit-max object (Line 25), a step A is performed (lines 5-6) in the underlying adopt-commit-max object (Algorithm 3). Finally, a step B starts either in Line 7 or in Line 8 of Algorithm 3, and the subsequent collect takes place in Line 9. We use the notion of steps R, A, and B in the next section, where we prove that Archipelago is a leaderless consensus algorithm.

The cautious reader might think that, by solving consensus in an ♦synchronous−1 execution with Archipelago, we could implement the Ω failure detector [37]. We could then augment Algorithm 2 with Ω so that Algorithm 2 decides in every ♦synchronous−1 execution. There is work that implements Ω in crash-recovery settings, however only when a crashed process can recover a finite number of times [33, 50, 93]. This is in contrast to our model, where a process can crash and recover an infinite number of times. In other words, in our model every process is unstable [93], which questions the existence of Ω in our model.

2.5 Archipelago: Proof of Correctness

Archipelago is a leaderless consensus algorithm. First, we show that it satisfies the consensus properties (validity, agreement, and termination under ⋄synchrony), and then we prove that it provides leaderless termination, which is more interesting and significantly more challenging. Note that Archipelago solves multi-valued consensus. Naturally, we could have presented and proved correct a modified version of Archipelago for binary consensus. However, we do not believe that such an approach would simplify either the presentation or the proof of Archipelago, as we explain later on.

Validity, agreement, termination. Algorithm Archipelago satisfies validity. We prove that if an adopt-commit-max object C[c] returns a ⟨·, v⟩ tuple, then v was proposed by some process. We can easily show this using induction. For c = 0, this is clearly the case, since all the values that were proposed to C[0] are written in m and were initially proposed. Let c ≥ 0 and assume that for every adopt-commit-max object C[c′] with c′ ≤ c, C[c′] returns a value that was initially proposed by some process. Then, for a value v to be proposed to C[c + 1], a process must have read ⟨c + 1, v⟩ from m (Line 24). This implies that at some point, some process p wrote ⟨c + 1, v⟩ to m (Line 23). But for this to happen, p retrieved ⟨adopt, v⟩ from an adopt-commit-max object C[c′] with c′ < c + 1, and by induction, v is a proposed value. Since all the values returned by adopt-commit-max objects are proposed, and Archipelago decides (Line 28) upon a value that it retrieves from some adopt-commit-max object, Archipelago satisfies validity.

Algorithm Archipelago satisfies agreement. To see this, assume by way of contradiction that two processes p and p′ decide on different values v and v′ respectively. This means that process p returned v after receiving a ⟨commit, v⟩ response from an adopt-commit-max object C[c] and process p′ received a ⟨commit, v′⟩ response from an adopt-commit-max object C[c′]. Because the adopt-commit-max object satisfies CA-agreement, it has to be the case that c ≠ c′, otherwise v = v′. Without loss of generality, assume that c < c′. All the processes (including p′) that received a response from C[c] either received ⟨commit, v⟩ or ⟨adopt, v⟩, due to the agreement property of the adopt-commit-max object. Hence, all processes that write to m (Line 23) write ⟨c + 1, v⟩, since they retrieved v from C[c]. Therefore, all the values proposed to the adopt-commit-max object C[c + 1] equal v, and hence C[c + 1] returns ⟨commit, v⟩. Similarly, all upcoming adopt-commit-max objects return ⟨commit, v⟩, contradicting the fact that C[c′] (c < c′) responds with ⟨commit, v′⟩ with v′ ≠ v.

Leaderless termination. It is far from obvious that Archipelago satisfies leaderless termination. As a matter of fact, Archipelago does not provide leaderless termination for n = 2 processes. However, Archipelago satisfies leaderless termination for n ≥ 3 processes. Before we describe the proof, we introduce some auxiliary notation.

Notation. For an execution α, we say that a process p takes a step A_i(v) when p performs an A step that belongs to adopt-commit-max object C[i] (lines 5 and 6). We denote with A_i^0(v) the fact that p is the first process that performed the A step for adopt-commit-max object C[i] in execution α. Note that a single round might contain multiple A_i^0(v) steps taken by different processes. We denote with A_i^+(v) the fact that this step is not the first A step on C[i]. We denote with B_i(1, v) the B step of a process on adopt-commit-max object C[i] that writes ⟨commit, v⟩ (lines 7 and 9). With B_i(0, v), we denote the B step of a process on adopt-commit-max object C[i] that writes ⟨adopt, v⟩ (lines 8 and 9). Similarly to the notation of an A step, we use the notation B_i^0(1, v) and B_i^+(1, v). We say that in an execution α values v_1, v_2, ..., v_k are proposed to C[i] if there are steps A_i(v_j) for all 1 ≤ j ≤ k in α. We denote with R⟨c, v⟩ the R step of a process together with the fact that the process read ⟨c, v⟩ as the maximum value in m (lines 23 and 24). As with steps A and B, we use the R^0⟨c, v⟩ and R^+⟨c, v⟩ notation; specifically, with R^0⟨i, ·⟩ we denote the first R step that reads ⟨i, ·⟩. Note that in this notation, when we have A_i(v) and B_i(·, v), this v is the value that is written, while in R⟨c, v⟩ the value v is read from m. Furthermore, note that R is not part of an adopt-commit-max operation like the A and B steps and hence has no subscript.

n = 2 processes. For n = 2 processes, we can devise a synchronous-1 execution in which the Archipelago algorithm never decides. This execution is depicted in Figure 2.2, which has a pattern that repeats every 5 rounds. In Figure 2.2, processes p_1 and p_2 propose values v′ and v respectively, with v′ > v. In the first round, process p_1 is suspended, so process p_2 performs an R step, writes ⟨0, v⟩, and retrieves ⟨0, v⟩ from m. Then, in the second round, both processes p_1 and p_2 take steps. Process p_1 writes ⟨0, v′⟩ and retrieves ⟨0, v′⟩ since ⟨0, v′⟩ > ⟨0, v⟩. In the same round, p_2 writes v to C[0].A[2]. Then, in the third round, when process p_1 takes an A step it writes value v′ in C[0].A[1], and when p_1 collects the values written in array A (Line 6), p_1 sees that there are two different values (v and v′) in C[0].A.


Therefore, in the fourth round, when process p_1 performs a B step, it retrieves back ⟨adopt, v′⟩. Process p_2 takes a B step in the fifth round, after being suspended in the third and fourth rounds: p_2 writes ⟨commit, v⟩ in C[0].B[2] and then, during the collect of B, p_2 sees that ⟨adopt, v′⟩ is written in C[0].B[1] and returns ⟨adopt, v⟩ (Line 11). Afterwards, starting from the sixth round, the processes behave in the exact same way: processes p_1 and p_2 propose v′ and v to the next adopt-commit-max object, respectively. This can happen ad infinitum and Archipelago never decides.

Figure 2.2 – With 2 processes, Archipelago might never decide in a synchronous-1 execution (v′ > v). [The figure shows the repeating 5-round pattern: p_2 takes steps R^0⟨0, v⟩, A_0^0(v), is suspended for two rounds, then takes B_0^+(1, v), R^0⟨1, v⟩, A_1^0(v), and so on; p_1 is suspended in the first round, then takes R^+⟨0, v′⟩, A_0^+(v′), B_0^0(0, v′), is suspended for two rounds, then takes R^+⟨1, v′⟩, and so on.]

n ≥ 3 processes. We consider synchronous-1 executions that start from an arbitrary, albeit valid, initial state (i.e., a state that corresponds to a configuration in a well-formed execution). We prove that in every synchronous-1 execution, irrespectively of the initial state, Archipelago terminates in finite time. Therefore, in every ⋄synchronous-1 execution, eventually the execution becomes synchronous-1 and hence Archipelago decides in finite time.

Theorem 1. Archipelago satisfies leaderless termination for n ≥ 3.

To prove Theorem 1, we show that as Archipelago traverses adopt-commit-max objects, the current minimal value, among those values still being proposed to adopt-commit-max objects, eventually gets eliminated (i.e., processes only propose larger values to later adopt-commit-max objects). Therefore, eventually only one value gets proposed to some adopt-commit-max object, and every correct process decides.

Note that to prove Theorem 1, we have to show that Archipelago terminates in finite time in every synchronous-1 execution, irrespectively of the initial state (i.e., any state that corresponds to a configuration in a well-formed execution). Therefore, in every ⋄synchronous-1 execution, eventually the execution becomes synchronous-1 and hence Archipelago decides in finite time.

Before we prove Theorem 1, we first need to prove some auxiliary lemmas.

Lemma 2. If an execution α contains step R^0⟨i, v⟩, then for any step R⟨j, v′⟩ with j > i that is in α, it is the case that v′ ≥ v.

Proof. Consider an execution α that contains a step R^0⟨i, v⟩ in a round r taken by process p. Then, when process p continues, p proposes value v to adopt-commit-max object C[i]. Similarly, and since each process retrieves the maximum value when reading the max register m (Line 24), any process that performs an R step in round r or after r reads at least ⟨i, v⟩, and hence retrieves a value at least as great as v. Note that a process that performs an R step in round r cannot read ⟨j, v′⟩ with j > i and v′ < v, since process p takes step R^0⟨i, v⟩. Hence, all values that are proposed to adopt-commit-max objects C[j] (j ≥ i) are ≥ v and therefore, for any step R⟨j, v′⟩ with j > i, it holds that v′ ≥ v.

Lemma 3. If an execution α contains step B_i^0(1, v), then Archipelago decides v in α.

Proof. Assume an execution α contains step B_i^0(1, v) in round r. If a process p takes a step B_i(·, ·), then p definitely takes this step in a round k with k ≥ r. Therefore, process p sees ⟨commit, v⟩ when collecting B (Line 9) and either returns ⟨commit, v⟩ (Line 10 and then Line 28) and decides, or returns ⟨adopt, v⟩ (Line 11). Due to CA-agreement, p cannot return ⟨commit, v′⟩ or ⟨adopt, v′⟩ with v′ ≠ v. Thus, process p proposes v to adopt-commit-max object C[i + 1]. However, when all processes propose the same value v to adopt-commit-max object C[i + 1], then Archipelago decides v.

Lemma 4. If an execution α contains at least two steps A_i^0(v) from processes p and p′ (p ≠ p′), and there is no process performing step A_i^0(v′) with v′ ≠ v in α, then either p, or p′, or both perform step B_i^0(1, v) in α.

Proof. Suppose that a round r contains two A_i^0(v) events by processes p and p′ respectively. Since in a round there can be at most one suspended process, at least one of the processes p and p′ takes a step in round r + 1. Since both processes p and p′ write value v in array C[i].A, and no process wrote another value in C[i].A during that round, v is the only value that p and p′ read when collecting A. Hence, in the upcoming step in round r + 1, at least one of the two processes takes step B_i^0(1, v).

Roughly speaking, the following lemma states that if an execution contains a step A_i^0(v′) where v′ > min({v : ∃A_i(v) ∈ α}), then any value proposed to a later adopt-commit-max object (i.e., written in A) is greater than min({v : ∃A_i(v) ∈ α}), namely greater than the minimum value proposed to adopt-commit-max object C[i].

Lemma 5. In an execution α, consider V_f = {v : ∃A_i(v) ∈ α} and let v_m be min(V_f). If there is a step A_i^0(v) ∈ α with v > v_m, then for any step A_j(v′) ∈ α with j > i, it is the case that v′ > v_m.

Proof. Because execution α contains step A_i^0(v) with v > min(V_f), any step A_j with j > i on adopt-commit-max object C[j] sees value v written in array A (Line 8) and hence adopts a value v′ with v′ ≥ v > v_m.

Figure 2.3 – Execution pattern that appears when the minimum value propagates to the next adopt-commit-max object (x ≥ 2). [The figure shows process p_a taking step A_i^0(v_m) in round r, being suspended from round r + 1 to round r + x, taking step B_i^0(1, v_m) in round r + x + 1 and step R^0⟨i + 1, v_m⟩ in round r + x + 2; process p_b takes steps A_i^+(v) in round r + x − 1 and B_i^0(0, v) in round r + x, and is suspended in rounds r + x + 1 and r + x + 2; no R⟨i + 1, ·⟩ step is taken before round r + x + 2.]

To prove Theorem 1, we show that as Archipelago traverses adopt-commit-max objects, the current minimal value, among those values still being proposed to adopt-commit-max objects, eventually gets eliminated (i.e., processes only propose larger values to later adopt-commit-max objects). Specifically, we show that within at most three consecutive adopt-commit-max objects, the minimal value gets eliminated. Since we have n processes, there can be at most n distinct proposed values. Therefore, using at most 3n adopt-commit-max objects, Archipelago decides in finite time. From the moment of synchrony, Archipelago needs O(n) rounds to decide.

Towards this goal, the following lemma is useful. Lemma 6 captures the idea that if, in an execution α, the minimum value proposed to an adopt-commit-max object C[i] appears in a later adopt-commit-max object C[j] with j > i, then α contains a specific execution pattern. By execution pattern we mean that some process has to take a step, then be suspended, then another process has to take some step, etc.

Figure 2.3 captures the fact that there is some process p_a that takes an A_i^0(v_m) step and, before p_a performs B_i^0(1, v_m), some other process p_b performs A_i^+(v) and B_i^0(0, v), etc.

Lemma 6. In an execution α, consider V_f = {v : ∃A_i(v) ∈ α} and let v_m be min(V_f). If Archipelago does not decide in α and there is a step A_j(v_m) ∈ α with j > i, then there exist x ≥ 2, processes p_a, p_b ∈ P, and a round r such that p_a and p_b perform steps as depicted in Figure 2.3, and there is no R⟨i + 1, ·⟩ step taken before round r + x + 2.

Proof. Suppose that α has no step A_i^0(v_m) and hence α contains a step A_i^0(v) with v > v_m. Then, due to Lemma 5, we know that for every A_j(v′) with j > i it is the case that v′ > v_m. But this implies that there is no A_j(v_m) with j > i in α, which is not the case we consider. Therefore, for an A_j(v_m) to exist in α, execution α must contain A_i^0(v_m).


Assume that process p_a takes step A_i^0(v_m) in some round r. Lemmas 3 and 4 imply that if there is another A_i^0(v_m) step in α taken by some process p ≠ p_a, then the algorithm decides. Since in the lemma we assume that Archipelago does not decide, we can exclude this case and consider that there is at most one A_i^0(v_m) step in round r.

Suppose that process p_a takes a step in round r + 1. Then, process p_a takes a B_i^0(1, v_m) step, since p_a was the process that first performed an A step on adopt-commit-max object C[i]. However, if process p_a takes a B_i^0(1, v_m) step, then due to Lemma 3, the algorithm decides. Again, we do not consider this case. Similarly, if process p_a takes a B step in round r + 2, then process p_a takes a B_i^0(1, v_m) step and, due to Lemma 3, the algorithm decides. Therefore, we need to consider the case where process p_a is suspended in both rounds r + 1 and r + 2. Process p_a can potentially be suspended for more rounds, up to round r + x where x ≥ 2. Therefore, for v_m to appear in a later adopt-commit-max object C[j] with j > i with an A_j(v_m) step, execution α has to be similar to the execution depicted below.

[Execution sketch: process p_a takes step A_i^0(v_m) in round r, is suspended from round r + 1 to round r + x, and takes step B_i^0(1, v_m) in round r + x + 1; the steps of the other processes are not yet constrained.]

We now show that there cannot be an R⟨i + 1, ·⟩ step before round r + x + 2. Assume by way of contradiction that there exists an R⟨i + 1, ·⟩ step before round r + x + 2 in α. If multiple such steps exist in α, consider the one that takes place in the earliest round. Suppose that this R^0⟨i + 1, v⟩ has v > v_m. This means that a process reads a value v > v_m and hence, when later processes perform an R step in some later round, they see a value (Line 24) greater than v_m and hence propose only values greater than v_m to upcoming adopt-commit-max objects (Lemma 2). This contradicts the fact that there is a j > i with A_j(v_m).

This means that if an R^0⟨i + 1, v′⟩ step appears before round r + x + 2 in α, then it has to be that v′ = v_m. Suppose that this R⟨i + 1, v_m⟩ step is taken by some process p in round r + y. Before round r + y, process p has to take steps A_i and B_i, since p performs the first R^0⟨i + 1, v_m⟩ step. This means that y has to be greater than 2, since otherwise the step A_i taken by p would occur in a round smaller than or equal to r. However, process p_a is the only process that takes an A_i^0(v_m) step in round r.

Since R^0⟨i + 1, v_m⟩ occurs in round r + y, where 2 < y < x + 2, process p must perform an A_i(v) step in round r + y − 2 and a B_i^0(·, ·) step in round r + y − 1 (p cannot be suspended between rounds r + y − 2 and r + y because p_a is already suspended). If v = v_m, then p's B step is B_i^0(1, v_m) and so, due to Lemma 3, the algorithm decides (Line 10 and Line 28), which we assume does not happen in α. If v > v_m, then p's B step is B_i^0(0, v), which contradicts the fact that p takes step R^0⟨i + 1, v_m⟩ immediately afterwards.

Therefore, there cannot be an R⟨i + 1, ·⟩ step before round r + x + 2. This is depicted in the figure below, where no round before r + x + 2 can contain an R⟨i + 1, ·⟩ step.

[Execution sketch as above: p_a takes A_i^0(v_m) in round r, is suspended from round r + 1 to round r + x, and takes B_i^0(1, v_m) in round r + x + 1; no R⟨i + 1, ·⟩ step occurs before round r + x + 2.]

If between rounds r and r + x + 1 no other process performs a B_i(·, ·) step, then process p_a is the first to take a B step on adopt-commit-max object C[i] and thus its B step is B_i^0(1, v_m). Hence Archipelago decides due to Lemma 3, which contradicts our initial assumption. Therefore, there is at least one process p_b that performs B_i^0(·, ·) between rounds r + 1 and r + x + 1. If process p_b takes step B_i^0(·, ·) in a round smaller than r + x, then it performs R⟨i + 1, ·⟩ before round r + x + 2, since process p_b has to take continuous steps (p_a is suspended from round r + 1 to round r + x), a contradiction. Therefore, process p_b performs a step A_i(v) with v > v_m in round r + x − 1 and B_i^0(0, v) in round r + x. The current execution is depicted below.

[Execution sketch: in addition to p_a's steps above, p_b takes steps A_i^+(v) in round r + x − 1 and B_i^0(0, v) in round r + x; no R⟨i + 1, ·⟩ step occurs before round r + x + 2.]

Due to Lemma 2, process p_b must be suspended in round r + x + 1, as well as in round r + x + 2. Otherwise, if process p_b is not suspended in rounds r + x + 1 and r + x + 2, then process p_b takes an R^0⟨i + 1, v⟩ step with v > v_m. Due to Lemma 2, this implies that no process proposes v_m to any upcoming adopt-commit-max object, because all R⟨i + 1, ·⟩ steps appear after round r + x + 1, which contradicts the if-statement of our lemma. Since process p_b is suspended in round r + x + 2 and at most one process can be suspended in each round, process p_a takes an R^0⟨i + 1, v_m⟩ step in round r + x + 2.

We are therefore in the following setting, which is exactly the same execution pattern as the one in Figure 2.3.

[Execution sketch: p_a takes A_i^0(v_m) in round r, is suspended from round r + 1 to round r + x, takes B_i^0(1, v_m) in round r + x + 1 and R^0⟨i + 1, v_m⟩ in round r + x + 2; p_b takes A_i^+(v) in round r + x − 1 and B_i^0(0, v) in round r + x, and is suspended in rounds r + x + 1 and r + x + 2; no R⟨i + 1, ·⟩ step occurs before round r + x + 2.]

To conclude, given an adopt-commit-max object C[i] where the minimum value proposed is v_m, for value v_m to be proposed to the next adopt-commit-max object C[i + 1], the execution has to be as shown in Figure 2.3. In other words, there is some process p_a that takes an A_i^0(v_m) step alone and, before p_a performs B_i^0(1, v_m), some other process p_b performs A_i^+(v) and B_i^0(0, v), etc.

Lemma 7. In an execution α, consider V_f = {v : ∃A_i(v) ∈ α}. Then, for any A_j(v) step with j ≥ i + 3 in α, it is the case that v > min(V_f), or the algorithm decides.

Proof. The proof is by contradiction, and the idea is to apply Lemma 6 to three consecutive adopt-commit-max objects (C[i], C[i + 1], and C[i + 2]) and show that either the algorithm decides or v_m (= min(V_f)) does not propagate beyond these three adopt-commit-max objects. Due to Lemma 6, we know that all processes except p_a and p_b execute continuously for at least four rounds. We also know that operating on an adopt-commit-max object in Archipelago takes only three round-steps (R, A, and B). Because of this, after three adopt-commit-max objects, we can show that for adopt-commit-max object C[i + 2] there are r″ and x″ such that a process takes a step R⟨i + 3, ·⟩ before round r″ + x″ + 2, which contradicts Lemma 6.

To prove this lemma, assume by way of contradiction that there is an execution α such that

(1) the algorithm does not decide in α, (2) α contains an A_i(v_m) step, and (3) α contains an A_j(v_m) step, where j ≥ i + 3.

Due to Lemma 6, we know that if there is a j ≥ i + 3 with A_j(v_m), then the execution looks like Figure 2.4. Because x ≥ 2, we have at least 4 continuous suspensions (of p_a or p_b) from round r + 1 to round r + x + 2.

Figure 2.4 – Lemma 7 (1). [Same execution pattern as Figure 2.3: p_a takes A_i^0(v_m) in round r, is suspended until round r + x, then takes B_i^0(1, v_m) and R^0⟨i + 1, v_m⟩; p_b takes A_i^+(v) and B_i^0(0, v) in rounds r + x − 1 and r + x.]

Note that, in any execution, a process takes a sequence of steps R⟨i_1, ·⟩, A_{i_1}, B_{i_1}, R⟨i_2, ·⟩, A_{i_2}, B_{i_2}, ... where i_1 < i_2 < .... We show that all processes must perform certain steps of this sequence prior to certain rounds. One of the three steps that p_c takes in rounds r + 1, r + 2, or r + 3 is an R step that returns a value that is at least ⟨i, ·⟩, since process p_a performed an A_i^0 step in round r. Thus, by round r + x + 2, p_c must perform an A_j step with j ≥ i. Processes p_a and p_b have also performed a step A_i by round r + x + 2. So, every process in the system has performed an A_j step with j ≥ i by round r + x + 2.

By assumption, value v_m does not get eliminated, and hence when the algorithm operates on adopt-commit-max object C[i + 1] we have the exact same execution pattern as in Figure 2.3, but for adopt-commit-max object C[i + 1]; see Figure 2.5. Again, let p_a′ and p_b′ be the processes described in Lemma 6 with respect to A_{i+1}, and let p_c′ be any other process. Note that process p_a′ is not necessarily the same as process p_a, etc., since a different process could be the one that performs the A_{i+1}^0(v_m) step now. For example, it could be that p_a′ = p_c and p_b′ = p_a. Also, note that round numbers are now based upon r′ ≠ r. By Lemma 6, no R⟨i + 1, ·⟩ step occurs before round r + x + 2 and, since p_a′ does an R⟨i + 1, ·⟩ step before round r′, we have r′ > r + x + 2. Thus, p_c′ must perform a step A_j with j ≥ i before round r′. Then, p_c′ takes at least four more steps by round r′ + x′ + 2. So, p_c′ must perform a step B_k with k ≥ i + 1 by round r′ + x′ + 2. Processes p_a′ and p_b′ have performed step B_{i+1} by round r′ + x′ + 2. So, every process performs a step B_k with k ≥ i + 1 by round r′ + x′ + 2.

Again, because of Lemma 6, this pattern of execution must appear for adopt-commit-max object C[i + 2]; consider Figure 2.6. Again, let p_a″ and p_b″ be the processes described in Lemma 6 with respect to A_{i+2}, and let p_c″ be any other process. By Lemma 6, no R⟨i + 2, ·⟩ step occurs before round r′ + x′ + 2 and, since process p_a″ does such a step before round r″, we have r″ > r′ + x′ + 2. Thus, p_c″ must perform a step B_k with k ≥ i + 1 before round r″. Then, p_c″ takes at least four more steps by round r″ + x″ + 2. Hence, by round r″ + x″ + 2, p_c″ must perform a step R⟨ℓ, ·⟩ with ℓ ≥ i + 3. This contradicts the fact that no R⟨i + 3, ·⟩ step occurs before round r″ + x″ + 2, as dictated by Lemma 6.

Figure 2.5 – Lemma 7 (2). [Same execution pattern as Figure 2.3, shifted to adopt-commit-max object C[i + 1]: p_a′ takes A_{i+1}^0(v_m) in round r′, is suspended until round r′ + x′, then takes B_{i+1}^0(1, v_m) and R^0⟨i + 2, v_m⟩; p_b′ takes A_{i+1}^+(v) and B_{i+1}^0(0, v) in rounds r′ + x′ − 1 and r′ + x′.]

Figure 2.6 – Lemma 7 (3). [Same execution pattern, shifted to adopt-commit-max object C[i + 2]: p_a″ takes A_{i+2}^0(v_m) in round r″, is suspended until round r″ + x″, then takes B_{i+2}^0(1, v_m) and R^0⟨i + 3, v_m⟩; p_b″ takes A_{i+2}^+(v) and B_{i+2}^0(0, v) in rounds r″ + x″ − 1 and r″ + x″.]

Lemma 7 implies Theorem 1, because either the algorithm decides or the minimum value proposed to an adopt-commit-max object C[i] does not propagate to any later adopt-commit-max object C[j] with j ≥ i + 3. Hence, due to the continual elimination of the current minimal value, eventually only one value gets proposed to an adopt-commit-max object and the algorithm decides. Finally, note that devising Archipelago for binary consensus would not substantially simplify the proof: we would still need to prove that the minimum value, in this case 0, does not propagate to later adopt-commit-max objects.

Termination. Archipelago satisfies termination for n ≥ 3, meaning that in a ⋄synchronous execution, every correct process eventually decides. In such an execution, Archipelago needs at most 5 rounds after processes have been aligned in rounds. This alignment can take place after the global stabilization time (GST) round [48]. Note that after the GST round, it is not guaranteed that the rounds are aligned (i.e., that all processes start and end a round at the same time). Therefore, after the GST round, the rounds need to be aligned first. Once the alignment occurs, Archipelago needs at most 5 rounds to decide.

Specifically, we show that Archipelago terminates in any ⋄synchronous execution with up to f = n − 1 crashes. Consider such an execution and let r be a round such that (1) the system has reached synchrony by round r and (2) each process p is either correct or has crashed by round r. In such a ⋄synchronous execution, Archipelago needs at most 5 rounds starting from round r in order to decide.

As in the proof of leaderless termination for Archipelago, we assume a model with n ≥ 3 processes. In this scenario, since processes take steps without crashes starting from round r, every correct process p takes steps R, A, and B without suspensions somewhere between rounds r and r + 5. Each process p performs an R step at the latest by round r + 2, because p can perform step A in round r and then B in round r + 1. Consider a process p that performs an R^0⟨i, v⟩ step with the greatest ⟨i, v⟩ value. This means that p immediately afterwards performs A_i^0(v) and then B_i^0(1, v), and due to Lemma 3 Archipelago decides. If multiple such processes perform R^0⟨i, v⟩, then all these processes retrieve the same maximum value ⟨i, v⟩ from m (Line 24), hence propose the same value to adopt-commit-max object C[i], perform steps A_i^0(v) and B_i^0(1, v), and the algorithm decides (see Lemma 3).

The above discussion implies that Archipelago satisfies termination, meaning that in a ⋄synchronous execution, Archipelago decides. Furthermore, note that Archipelago can withstand up to f = n − 1 crashes and still decides in a ⋄synchronous execution. Naturally, the message-passing variant of Archipelago (Section 2.6) can only withstand up to f = (n − 1)/2 crashes.

2.6 Leaderless Consensus in Message Passing

In this section we show how our shared memory results translate to the message passing model. The definition of ⋄synchronous-1 in message passing is the same as in shared memory: an execution is ⋄synchronous-1 if it is equal to a sequence of rounds such that at most 1 process can be suspended in each round. We say that a process p is suspended [10] in a round r if p does not send any messages in r and does not receive any messages sent by other processes in round r.

Additionally, we assume that up to f − 1 (where, for simplicity, n = 2f + 1) processes may fail by crashing forever.²

We use the following notion of round: in each round r, every (correct, non-suspended) process p_i (i) broadcasts a message (we call this first message a request), (ii) delivers all requests that were sent to p_i in r, (iii) sends a message (we call this second message a response) for every request it has delivered in (ii), and (iv) delivers all responses sent to it in r.

²We allow up to f − 1 crashes (instead of the more typical f) so that, taking into account at most one additional suspended process per round, we can use the familiar f + 1 quorum size.

Note that this notion of round involves 2 message delays. With this definition of ⋄synchronous-k, the leaderless termination property remains the same as in Definition 1.

It is easy to see that Paxos (in message-passing) is not leaderless. With our notion of round, the prepare and propose phases correspond to a round each. At each round, the adversary selects the proposer p that has completed the prepare phase with the highest ballot number so far and suspends p in its propose phase.

The Crash Case. In order to obtain a leaderless consensus algorithm in a message-passing system with crash failures, one might be tempted to apply the ABD emulation [13] to Algorithm 4. However, this ABD-emulated version of Archipelago would require at least two message-passing rounds per step (R, A, and B): one round for the write and one round for the collect (n reads in parallel). It is unclear whether this emulated algorithm would be leaderless, since our proof for Archipelago hinges on each step (R, A, and B) taking exactly one round.

Instead, we obtain a message-passing version of Archipelago (shown in Algorithm 5) by combining the write and the collect in a single round. The broadcast in lines 16, 24, and 31 acts as both the write invocation and the read invocation. Processes' responses in lines 42, 46, and 50 serve to confirm the write and return all values written so far.


Algorithm 5 Archipelago in message passing with n = 2f + 1
1: ▷ Local state
2: i, the current adopt-commit-max object, initially 0
3: R, a set of tuples, initially empty
4: A[0, 1, ...], a sequence of sets, all initially empty
5: B[0, 1, ...], a sequence of sets, all initially empty
6:
7: procedure propose(v)
8:     while true do
9:         ⟨i, v′⟩ ← R-Step(v)
10:        ⟨flag, v″⟩ ← A-Step(v′)
11:        ⟨control, val⟩ ← B-Step(flag, v″)
12:        if control = commit then return val
13:        else i ← i + 1
14:
15: procedure R-Step(v)
16:     broadcast(R, i, v)
17:     wait until receive (R-response, i, R) from f + 1 processes
18:
19:     R ← R ∪ {union of all Rs received in Line 17}
20:     ⟨i′, v′⟩ ← max(R)     ▷ note that ⟨x, y⟩ > ⟨x′, y′⟩ if x > x′ or (x = x′ and y > y′)
21:     return ⟨i′, v′⟩
22:
23: procedure A-Step(v)
24:     broadcast(A, i, v)
25:     wait until receive (A-response, i, A[i]) from f + 1 processes
26:
27:     S ← union of all A[i]s received
28:     if S contains only one value val then return ⟨true, val⟩
29:     else return ⟨false, max(S)⟩

30: procedure B-Step(flag, v)
31:     broadcast(B, i, flag, v)
32:     wait until receive (B-response, i, B[i]) from f + 1 processes
33:
34:     S ← union of all B[i]s received
35:     if S contains only ⟨true, val⟩ for some val then
36:         return ⟨commit, val⟩
37:     else if S contains some entry ⟨true, val⟩ then return ⟨adopt, val⟩
38:     else return ⟨adopt, v⟩
39:
40: upon event reception of (R, j, v) from p do
41:     add ⟨j, v⟩ to R
42:     send(R-response, j, R) to p
43:
44: upon event reception of (A, j, v) from p do
45:     add v to A[j]
46:     send(A-response, j, A[j]) to p
47:
48: upon event reception of (B, j, flag, v) from p do
49:     add ⟨flag, v⟩ to B[j]
50:     send(B-response, j, B[j]) to p

This way of combining writes and reads can break atomicity, but is sufficient to guarantee safety (of consensus) during asynchronous periods. More precisely, the R-Step behaves like a “regular” max-register, one that returns valid, non-decreasing values to each invoker (see Lemma8), and the A- and B-Steps together behave like an adopt-commit object (see Lemma9). As such, our proof of safety in Section 2.5 applies to Algorithm5 as well.

The non-atomic behavior exhibited during asynchronous periods is due to the overlap in time of the request and response parts of each round. However, during synchronous-1 periods, we can assume that requests are delivered by all processes before any response is sent out. Thus, once the system becomes permanently synchronous-1, the R-Step satisfies the (atomic) max-register properties and the A- and B-Steps together behave like an adopt-commit-max object. Therefore, our proof of leaderless termination in Section 2.5 remains valid for Algorithm 5 as well.

In what follows, we provide some lemmas that show the safety of Algorithm 5. Note that all results and line numbers refer to Algorithm 5.


Lemma 8. The R-Step satisfies the following properties:

Validity For a fixed i, if some process returns v, then v was the input of some process.

Monotonicity If process p returns (i, v_i) in an R-Step and p returns (j, v_j) in a later R-Step, then j ≥ i and v_j ≥ v_i.

Proof.

Validity At Line 20, (i′, v′) (the value returned by the R-Step) is computed as the maximum of all tuples ever received, which must in turn have been broadcast at Line 16 by some process.

Monotonicity Assume by contradiction that some process p returns (i, v_i) in an R-Step r_1 and later returns (j, v_j) in an R-Step r_2 such that (j, v_j) < (i, v_i). During r_1, p selected and returned (i, v_i) as the maximum element of its local R set. Since elements can only be added to a process's R set, (i, v_i) will still be in R during r_2. Thus, p cannot select and return a tuple smaller than (i, v_i) during r_2. We have reached a contradiction.

Lemma 9. For a fixed i, an A-Step followed by a B-Step corresponds to an adopt-commit object.

Proof. Validity holds because at lines 28, 29, 36, 37, and 38, processes only return values that were sent at lines 46 or 50. In turn, these values must be input values of some process that broadcast them at lines 24 or 31.

Termination holds because the only waiting is done at lines 25 and 32; processes always wait for f + 1 responses; since f + 1 = n − f, processes eventually receive these responses.

Commitment holds because if all processes enter the A-Step with the same value v, then the check at Line 28 will succeed and all processes will enter the B-Step with (true, v); thus the check at Line 35 will succeed and all processes will return (commit, v) in the B-Step.

Agreement. Assume by contradiction that process p outputs (commit, v) and process p′ outputs (·, v′) with v ≠ v′. Then p must have received B-responses containing only (true, v) from a set R_p of f + 1 distinct processes; p′ must have also received B-responses from a set R_p′ of f + 1 distinct processes. Since f + 1 > n/2, R_p and R_p′ must intersect in at least one process q.

Let S be the union of all B[i]s received by p′ in B-responses. We distinguish three cases, based on the number of distinct values val for which S contains (true, val).


• S does not contain any (true, val) tuples. In this case, q's B-response to p′ must contain a (false, val) tuple. If q responded to p before p′, then by Lemma 10, q's B-response to p′ must include a (true, v) tuple, a contradiction. If q responded to p′ before p, then by Lemma 10, q's B-response to p must include (false, val), a contradiction.

• S contains (true, val) tuples for a single value val. Then val ≠ v, otherwise p′ would either commit v or adopt v. Assume without loss of generality that q responds to p before it responds to p′. Then q's response to p′ must contain both (true, v) and (true, val), contradicting Lemma 11.

• S contains (true, val) tuples for more than one distinct value val. This is impossible by Lemma 11.

Lemma 10. For a fixed i, if a process p sends a B-response (B-response, i, B[i]) to some process q at time t and p sends a B-response (B-response, i, B[i]′) to some process q′ at time t′ > t, then B[i] ⊆ B[i]′.

Proof. This is because items can only be added to B[i] (Line 49).

Lemma 11. For a fixed i, if two processes p and q broadcast (true, v) and (true, v′) at Line 31, then v = v′.

Proof. Assume not. Then p must have received A-responses containing only v from a set R_p of f + 1 processes and q must have received A-responses containing only v′ from a set R_q of f + 1 processes. Since f + 1 > n/2, R_p and R_q must intersect in at least one process r. Assume without loss of generality that r responded to p first and then to q: then the response to q must also include v, by Lemma 10 (the same containment argument applies to A[i], whose entries are only ever added at Line 45). We have reached a contradiction.

The Byzantine Case. We conjecture that modifying Algorithm 5 in the following way yields a Byzantine-fault-tolerant³ leaderless consensus algorithm:

1. The number of processes becomes n = 3f + 1.

2. Processes wait for 2f + 1 (instead of f + 1) responses at lines 17, 25, and 32.

3. Replace the max function in lines 20 and 29 with the max-f function; this function takes 2f + 1 or more arguments, orders them in decreasing order, discards the first f values in this order, and returns the first remaining value.

³We refer here to the weak Byzantine agreement version of the problem [76], whose validity property is: with no faulty processes, if some process decides v, then v is the input of some process.


4. Replace the B-Step as shown in Algorithm 6.

5. All messages are signed. At each step (R, A, and B) except the very first R step, when broadcasting, processes include a certificate consisting of all 2f + 1 responses received in the previous step. Processes only respond to requests with valid certificates, i.e., requests for which v, flag, and/or i (depending on the step) were chosen correctly according to the responses in the certificate.

6. Replace broadcast with reliable broadcast.

Algorithm 6 Fragment: B-Step in Byzantine version of Archipelago in message passing where n = 3f + 1
1: procedure B-Step(flag, v)
2:     broadcast(B, i, flag, v)
3:     wait until receive (B-response, i, B[i]) from 2f + 1 processes
4:
5:     S ← { all B[i]s received }
6:     if f + 1 sets in S contain only ⟨true, val⟩ for some val then
7:         return ⟨commit, val⟩
8:     else if f + 1 sets in S contain some entry ⟨true, val⟩ then
9:         return ⟨adopt, val⟩
10:    else return ⟨adopt, v⟩

Our changes have the following rationales: (1) and (2) increase the quorum size such that any two quorums intersect in (at least) one correct process. Change (3) serves to eliminate any maliciously chosen high values in the R- and A-Steps. Note that our proofs in Section 2.5 remain valid if we replace max by max-f, since both functions discard the minimum value as long as one correct process does not propose the minimum value. Change (4) aims to prevent Byzantine processes from getting an invalid value adopted or committed by sending fake ⟨false, ·⟩ or ⟨true, ·⟩ tuples in their B-responses. Change (5) aims to allow honest processes to validate that other processes have correctly processed the responses received at the previous step.
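As an illustration of change (3), the following Java sketch implements the max-f selection as described above; the class and method names are ours, not part of Algorithm 5 or 6.

import java.util.Arrays;

// Illustrative sketch of the max-f function from change (3): order the
// received values in decreasing order, discard the f largest (potentially
// maliciously chosen) values, and return the first remaining one.
final class MaxMinusF {
    static long maxMinusF(long[] values, int f) {
        if (values.length < 2 * f + 1) {
            throw new IllegalArgumentException("max-f needs at least 2f+1 values");
        }
        long[] sorted = values.clone();
        Arrays.sort(sorted);                  // ascending order
        return sorted[sorted.length - 1 - f]; // skip the f largest values
    }
}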

2.7 ArchSMR: Archipelago in Practice

To demonstrate the different impact of failures on leaderless and leader-based consensus algorithms, we implemented ArchSMR, a state machine replication (SMR) algorithm based on Archipelago (Algorithm 5), deployed it in a distributed setting, and compared its behavior to that of Apache ZooKeeper [64], a production-level leader-based SMR.


As ZooKeeper is written in Java, we implemented Archipelago in Java and built ArchSMR with an unbounded sequence of separate Archipelago instances, just as Multi-Paxos uses a sequence of separate Paxos instances [78]. We deployed ArchSMR along with ZooKeeper on 3 Amazon EC2 t2.large instances with 2 vCPUs and 8 GB of memory, running Ubuntu Server v18.04, all within the same availability zone.
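The following Java sketch illustrates this construction at a high level, assuming a hypothetical Consensus interface with one instance per log slot; it is not the ArchSMR code, and it omits details such as re-proposing a command whose slot was won by another replica's proposal.

import java.util.ArrayList;
import java.util.List;

// Hypothetical building block: one independent consensus instance per log slot.
interface Consensus {
    String propose(String command); // blocks until this instance decides
}

interface ConsensusFactory {
    Consensus instance(int slot);   // returns (or lazily creates) the slot's instance
}

// Sketch of an SMR replica built from a sequence of consensus instances:
// every submitted command is proposed to the next instance, and the decided
// command is appended to the replicated log.
final class SequencedSmr {
    private final ConsensusFactory factory;
    private final List<String> log = new ArrayList<>();
    private int nextSlot = 0;

    SequencedSmr(ConsensusFactory factory) {
        this.factory = factory;
    }

    // Propose `command` for the next free slot and append whatever value the
    // consensus instance decides (it may be another replica's command).
    synchronized String submit(String command) {
        String decided = factory.instance(nextSlot).propose(command);
        log.add(decided);
        nextSlot++;
        return decided;
    }
}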

Specifically, in the following experiments, we have n = 3 servers on distinct VMs and 3 clients, each sending requests to and receiving responses from one server. Note that the goal of our experiments is to show that the leaderless algorithm definition (Definition 2), as well as Archipelago, can be of practical relevance. We are not interested in the maximum throughput without failures but in the average throughput during failures, so we fixed the sending rate to 500 requests per second (~166 requests per second per client) for both SMRs.

In order to experiment with the ⋄synchronous-1 model described in Section 2.2, we stop a server by sending it a POSIX.1-1990 standard SIGSTOP signal and later resume it by sending a POSIX.1-1990 standard SIGCONT signal. In practice, the effects of such a suspension can occur in case of a software bug, CPU overload, overheating, etc. Naturally, our results also hold if we crash and restart a server instead of suspending it.
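For completeness, a suspension of this kind can be scripted as in the Java sketch below, which shells out to the standard POSIX kill utility; the PID argument and the two-second duration are placeholders of this sketch, not the parameters of our experimental harness.

// Illustrative sketch: suspend a server process with SIGSTOP and resume it
// with SIGCONT by invoking the POSIX `kill` utility.
final class Suspend {
    static void run(String... cmd) throws Exception {
        new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        String pid = args[0];          // PID of the server process to suspend
        run("kill", "-STOP", pid);     // suspend the server
        Thread.sleep(2_000);           // keep it suspended for two seconds
        run("kill", "-CONT", pid);     // resume the server
    }
}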

In what follows, we first consider the case where we suspend the leader server in ZooKeeper and some arbitrary server in ArchSMR. Afterwards, we show how ArchSMR performs when there are recurring suspensions of different servers.

Single Suspension. The main characteristic of a leaderless algorithm is that suspending or crashing one server does not decimate throughput (i.e., does not drop throughput to 0). Definition 2 precisely captures this characteristic by arguing that, irrespectively of which server we suspend, the algorithm is still able to complete operations and hence throughput is not decimated.

To confirm this experimentally, we suspended one server while running ArchSMR and ZooKeeper, sampling their throughput every 100 ms. Figure 2.7 depicts the throughput evolution of ArchSMR and ZooKeeper over time, omitting the warm-up and cool-down phases of the experiments. The suspension, depicted with a vertical solid line, affects an arbitrary server in ArchSMR, since all servers have the same role, and the leader server in ZooKeeper.

As expected, when we suspend the leader server in ZooKeeper, throughput drops to 0. Note that the throughput remains 0 for more than 3 seconds. This is due to the default configuration (tickTime and initLimit [66]) of ZooKeeper. When the leader server is suspended, the ZooKeeper cluster needs more than 10 s to initiate a new leader election. In contrast, if we crash the leader server in ZooKeeper, then the socket connection closes and the rest of the servers immediately recognize the leader crash and initiate a leader election.


[Plot omitted: throughput (requests/s) versus time (ms) for ArchSMR and ZooKeeper, with the server suspension marked by a vertical solid line.]

Figure 2.7 – Performance of ZooKeeper and ArchSMR upon server suspension.

By default, ZooKeeper needs at least 200 ms to initiate a new leader election [67] after a leader crash. One can, in any case, extend ZooKeeper's downtime by suspending a server from the moment it becomes leader until it gets deposed.

In contrast to ZooKeeper, the throughput of ArchSMR simply drops to about 330 requests per second, due to the fact that one of the 3 clients keeps sending around 166 requests per second to the suspended process. This shows that the leaderless algorithm definition (Definition 2) is of practical interest for robustness.

Recurring Suspensions. Definition 1 suggests that a leaderless consensus algorithm decides even if we suspend at most one server at a time. Naturally, we would expect the same to be the case for ArchSMR. Here, we investigate how ArchSMR performs under recurring suspensions.

Figure 2.8 depicts the throughput of ArchSMR during recurring suspensions. Specifically, in a round-robin fashion, we suspend each server for two seconds and then wait for one second before suspending the next server, etc. We wait for one second to guarantee that all servers are up and running again. As can be seen in Figure 2.8, throughput drops by one third during the suspension of a server, and when the server gets unsuspended we observe a jump in throughput, since the previously issued commands of its client get decided.


[Plot omitted: throughput (requests/s) versus time (ms) for ArchSMR under rotating server suspensions.]

Figure 2.8 – Performance of ArchSMR upon rotating suspensions of all servers.

Notice that even though throughput drops when a server is suspended, the average throughput (e.g., from 20 s to 30 s) remains at around 480 requests per second. So, even though throughput drops during suspensions, the average throughput remains almost the same as with no suspensions.

2.8 Related Work

Given that the limitation of having a leader in consensus or SMR is well known [6, 15, 30, 41, 57, 60, 64, 91, 96, 121, 125], it is surprising that the notion of a leaderless algorithm has never been defined. With the recent need to scale SMRs to blockchain networks [31, 35, 41, 121, 125], this limitation is exacerbated. Crain et al. [41] proposed DBFT for blockchains. It relies on n binary consensus instances, each using a weak coordinator to converge even if sufficiently many correct processes propose distinct values. DBFT is not leaderless according to our definition, as the same weak coordinator can be used by all instances. Yin et al. [125] replaced the leader's large proposals of PBFT [35] by smaller message digests to obtain HotStuff. HotStuff's throughput still drops to zero when the leader fails, until some view-change completes [121]. These observations serve as motivation for our work.


In a brief announcement [81], Lamport proposed a high-level transformation of a class of leader-based consensus algorithms into a class of leaderless algorithms, by repeatedly using a synchronous virtual leader election algorithm in which all processes try to agree on a set of proposals. In a corresponding patent document [82], Lamport explains that, during a period of asynchrony, if the virtual leader election fails, then the consensus algorithm may not progress but should not violate safety, as long as it tolerates malicious leaders [81]. The adopt-commit-max object of Archipelago allows processes to converge towards a unique value, hence sharing similarities with the proposal of some virtual leader. Yet, neither a leaderless definition nor a virtual leader specification was given.

Borran and Schiper proposed a so-called “leader-free” consensus algorithm [38] without offering a leader-freedom definition. The algorithm has an exponential complexity, which limits its applicability.

In randomized consensus algorithms (e.g., [20,98,104]), correct processes typically execute the same series of steps, but they do not converge to the same value deterministically.

Interestingly, SMR algorithms that rely on multiple leaders (e.g., Mencius [91]) do not necessarily rely on a leaderless consensus algorithm. Moraru et al. [96] used multiple "command leaders" in EPaxos. Each command leader commits one command as long as commands are compatible. However, in the general case, where commands have dependencies, only one of the command leaders can get its command committed at a time, as if there were successive leader-based consensus instances. If a leader fails after receiving a positive acknowledgement from a fast quorum of n − 1 processes, it rejoins with a new identifier and a greater ballot, without being able to acknowledge the previous commit message. Despite being specified in TLA+, the EPaxos specification was recently shown to be incorrect [115], indicating that designing a multi-leader algorithm is error-prone.

2.9 Conclusion

In this chapter, we defined what it means for a consensus algorithm to be leaderless. To the best of our knowledge, this work is the first to formally introduce the notion of a leaderless consensus algorithm. Then, we devised Archipelago, a leaderless consensus algorithm. We proved that Archipelago is correct, a daunting task since we had to prove that Archipelago decides in any synchronous-1 execution that starts from an arbitrary initial state. We then translated our result to the message-passing model. Afterwards, we conjectured that Archipelago can be modified to tolerate Byzantine failures. Finally, we used Archipelago to build a state machine replication algorithm and illustrated its robustness, compared to ZooKeeper, in a failure scenario.


3 State Machine Replication is More Expensive Than Consensus

Consensus and State Machine Replication (SMR) are generally considered to be equivalent problems. In certain system models, indeed, the two problems are computationally equivalent: any solution to the former problem leads to a solution to the latter, and vice versa.

In this chapter, we study the relation between consensus and SMR from a complexity perspective. We find that, surprisingly, completing an SMR command can be more expensive than solving a consensus instance. Specifically, given a synchronous system model where every instance of consensus always terminates in constant time, completing an SMR command does not necessarily terminate in constant time. Besides its theoretical interest, our result also corresponds to practical phenomena we identify empirically. We experiment with two well-known SMR implementations (Multi-Paxos and Raft) and show that, indeed, SMR is more expensive than consensus in practice. One important implication of our result is that, even under synchrony conditions, no SMR algorithm can ensure bounded response times.

3.1 Introduction

Consensus is a fundamental problem in distributed computing. In this problem, a set of distributed processes need to reach agreement on a single value [77]. Solving consensus is one step away from implementing State Machine Replication (SMR) [75, 110]. Essentially, SMR consists of replicating a sequence of commands—often known as a log—on a set of processes which replicate the same state machine. These commands represent the ordered input to the state machine. SMR has been successfully deployed in applications ranging from storage systems, e.g., LogCabin built on Raft [100], to lock [36] and coordination [64] services. At a high level, SMR can be viewed as a sequence of consensus instances, so that each value output from an instance corresponds to a command in the SMR log.

From a solvability standpoint and assuming no malicious behavior, SMR can use consensus

as a building block. When the latter is solvable, the former is solvable as well (the reverse direction is straightforward). Most previous work in this area, indeed, explains how to build SMR assuming a consensus foundation [53, 78, 83], or proves that consensus is equivalent, from a solvability perspective, to other SMR abstractions, such as atomic broadcast [37, 99]. An important body of work also studies the complexity of individual consensus instances [54, 68, 80, 108]. SMR is typically assumed to be a repetition of consensus instances [69, 79], so at first glance it seems that the complexity of an SMR command can be derived from the complexity of the underlying consensus. We show that this is not the case.

In practice, SMR algorithms can exhibit irregular behavior, where some commands complete faster than others [34, 96, 123]. This suggests that the complexity of an SMR command can vary and may not necessarily coincide with the complexity of consensus. Motivated by this observation, we study the relation between consensus and SMR in terms of their complexity. To the best of our knowledge, we are the first to investigate this relation. In doing so, we take a formal, as well as a practical (i.e., experimental) approach. Counter-intuitively, we find that SMR is not necessarily a repetition of consensus instances.

We show that completing an SMR command can be more expensive than solving a consensus instance. Constructing a formalism to capture this result is not obvious. We prove our result by considering a fully synchronous system, where every consensus instance always completes in a constant number of rounds, and where at most one process in a round can be suspended (e.g., due to a crash or because of a network partition). A suspended process in a round is unable to send or deliver any messages in that round. Surprisingly, in this system model, we show that it is impossible to devise an SMR algorithm that can complete a command in constant time, i.e., completing a command can potentially require a non-constant number of rounds. We also discuss how this result applies in weaker models, e.g., partially synchronous, or if more than one process is suspended per round (see Section 3.3.2).

At a high level, the intuition behind our result is that a consensus instance “leaks,” so that some processing for that instance is deferred for later. Simply put, even if a consensus instance terminates, some protocol messages belonging to that instance can remain undelivered. Indeed, consensus usually builds on majority quorum systems [117], where a majority of processes is sufficient and necessary to reach agreement; any process which is not in this majority may be left out. Typically, undelivered messages are destined to processes which are not in the active majority—e.g., because they are slower, or they are partitioned from the other processes. Such a leak is inherent to consensus: the instance must complete after gathering a majority, and should not wait for additional processes. If a process is not in the active majority, that process might as well be faulty, e.g., permanently crashed.

In the context of an SMR algorithm, when successive consensus instances leak, the same process can be left behind across multiple SMR commands; we call this process a straggler.


Consequently, the deferred processing accumulates. It is possible, however, that this straggler is in fact correct. This means that eventually the straggler can become part of the active quorum for a command. This can happen when another process fails and the quorum must switch to include the straggler. When such a switch occurs, the SMR algorithm might not be able to proceed before the straggler recovers the whole chain of commands that it missed. Only after this recovery completes can the next consensus instance (and SMR command) start. Another way of looking at our result is that a consensus instance can neglect stragglers, whereas SMR must deal with the potential burden of helping stragglers catch up.¹

We experimentally validate our result in two well-known SMR systems: a Multi-Paxos implementation (LibPaxos [3]) and a Raft implementation (etcd [2]). Our experiments include wide-area deployments and clearly demonstrate the difference in complexity, both in terms of latency and number of messages, between a single consensus instance and an SMR command. Specifically, we show that even if a single straggler needs to be included in an active quorum, SMR performance noticeably degrades. It is not unlikely for processes to become stragglers in practical SMR deployments, since these algorithms typically run on commodity networks [19]. These systems are subject to network partitions, processes can be slow or crashed, and consensus-based implementations can often be plagued with corner cases or implementation issues [22, 36, 62, 73], all of which can lead to stragglers.

Our result, namely that an SMR algorithm cannot guarantee a constant response time even if the system otherwise behaves synchronously, brings into focus a trade-off in SMR. In a nutshell, this is the trade-off between the best-case performance and the worst-case performance of an SMR algorithm. On the one hand, such an algorithm can optimize for worst-case performance. In this case, the algorithm can dedicate resources (e.g., by provisioning additional processes or assisting stragglers) to preserve its performance even when faults manifest, translating into lower tail latencies; there are certain classes of SMR-based applications where latencies and their variability are very important [11, 40, 42]. On the other hand, an SMR algorithm can optimize for best-case performance, i.e., during fault-free periods, so that the algorithm advances despite stragglers being left arbitrarily behind [63, 96]. This strategy means that the algorithm can achieve superior throughput, but its performance will be more sensitive to faults.

The contribution we present in this chapter is twofold. First, we initiate the study of the relation, in terms of complexity, between consensus and SMR. We devise a formalism to capture the difference in complexity between these two problems, and use this formalism to prove that completing a single consensus instance is not equivalent to completing an SMR command in terms of their complexity (i.e., number of rounds). More precisely, we prove that it is impossible to design an SMR algorithm that can complete a command in constant time, even if consensus always completes in constant time.

¹We note that this leaking property seems inherent not only in consensus, but in any equivalent replication primitive, such as atomic broadcast.

Second, we experimentally validate our theoretical result using two SMR systems, in both a single-machine setting and a wide-area network.

Roadmap. The rest of this chapter is organized as follows. We describe our system model in Section 3.2. In Section 3.3 we present our main result, namely that no SMR algorithm can complete every command in a constant number of rounds. Section 3.4 presents experiments to support our result. We describe the implications of our result in Section 3.5, including ways to circumvent it and a trade-off in SMR. Finally, Section 3.6 concludes this chapter.

3.2 Model

This chapter studies the relation in terms of complexity between consensus and State Machine Replication (SMR). In this section we formulate a system model that enables us to capture this relation, and also provide background notions on consensus and SMR.

We consider a synchronous model and assume a finite and fixed set of processes Π = {p_1, p_2, ..., p_n}, where |Π| = n ≥ 3. Processes communicate by exchanging messages. Each message is taken from a finite set M = {m_1, ...}, where each message has a positive and bounded size, which means that there exists a B ∈ N+ such that ∀m ∈ M, 0 < |m| ≤ B. A process is a state machine that can change its state as a consequence of delivering a message or performing some local computation. Each process has access to a read-only global clock, called round number, whose value increases by one on every round. In each round, every process p_i: (1) sends one message to every other process p_j ≠ p_i (in total, p_i sends n−1 messages in each round)²; (2) delivers any messages sent to p_i in that round; and (3) performs some local computation.

An algorithm in such a model is the state machine for each process and its initial state. A configuration corresponds to the internal state of all processes, as well as the current round number. An initial configuration is a configuration where all processes are in their initial state and the round number is one. In each round, up to n(n−1) messages are transmitted. More specifically, we denote a transmission as a triplet (p, q, m) where p, q ∈ Π (p ≠ q) and m ∈ M. For instance, transmission (p_i, p_j, m_{i,j}) captures the sending of message m_{i,j} from process p_i to process p_j. We associate with each round an event, corresponding to the set of transmissions which take place in that round; we denote this event by τ ⊆ {(p_i, p_j, m_{i,j}) : i, j ∈ {1,...,n} ∧ i ≠ j}. An execution corresponds to an alternating sequence of configurations and events, starting with an initial configuration. An execution e⁺ is called an extension of a finite execution e if e

² As a side note, if a process p_i does not have anything to send to process p_j in a given round, we simply assume that p_i sends an empty message.


is a prefix of e⁺. Given a finite execution e, we denote with E(e) the set of all extensions of e. We assume deterministic algorithms: the sequence of events uniquely defines an execution.

Failures. Our goal is to capture the complexity—i.e., cost in terms of number of synchronous rounds—of a consensus instance and of an SMR command, and expose any differences in terms of this complexity. Towards this goal, we introduce a failure mode which omits all transmissions to and from at most one process per round.

We say that a process p_i is suspended in round r associated with the event τ if ∀m ∈ M and ∀j ∈ {1,...,n} with j ≠ i, (p_i, p_j, m) ∉ τ and (p_j, p_i, m) ∉ τ; hence |τ| = n(n−1) − 2(n−1) = (n−1)(n−2). If a process p_i is not suspended in a round r, we say that p_i is correct in round r. In a round associated with an event τ where all processes are correct there are no omissions, hence |τ| = n(n−1). A process p_i is correct in a finite execution e if there is a round in e where p_i is correct. Process p_i is correct in an infinite execution e if there are infinitely many rounds in e where p_i is correct. For our result, it suffices that in each round a single process is suspended. Note that each round in our model is a communication-closed layer [49], so messages omitted in a round are not delivered in any later round. This form of suspension is similar to the one presented in the model (Section 2.2) of Chapter 2.
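As a sanity check of the transmission counts above, the following Python sketch (our own illustration, not part of the model) enumerates the transmissions of a round with and without a suspended process; the helper name round_transmissions is hypothetical.

from itertools import permutations

def round_transmissions(n, suspended=None):
    """Return the event tau of one round: all ordered pairs (sender, receiver),
    excluding every transmission to or from the suspended process (if any)."""
    return [(p, q) for p, q in permutations(range(1, n + 1), 2)
            if suspended not in (p, q)]

n = 4
assert len(round_transmissions(n)) == n * (n - 1)                     # no omissions: 12
assert len(round_transmissions(n, suspended=2)) == (n - 1) * (n - 2)  # one suspended: 6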

A suspended process represents a scenario where a process is slowed down. This may be caused by various real-world conditions, e.g., a transient network disconnect, a load imbalance, or a temporary slowdown due to garbage collection. In all of these, after a short period, connections are dropped and message buffers are reclaimed; such conditions can manifest as message omissions. The notion of being suspended also represents a model where processes may crash and recover, where any in-transit messages are typically lost.

There is a multitude of work [23,105,106,108,109] on message omissions (e.g., due to link failures) in synchronous models. Our system model is based on the mobile faults model [105]. Note however that our model is stronger than the mobile faults model, since we consider that either exactly zero or exactly 2(n−1) message omissions occur in a given round.³ Other powerful frameworks, such as layered analysis [97], the heard-of model [38], or RRFD [52], can be used to capture omission failures, but we opted for a simpler approach that can specifically express the model which we consider.

3.2.1 Consensus

In the consensus problem, processes have initial values which they propose, and have to decide on a single value. Consensus [33] is defined by three properties: validity, agreement, and termination. Validity requires that a decided value was proposed by one of the processes,

³ If a process p is suspended, then the n−1 messages sent by p and the n−1 messages delivered to p are omitted.

whilst agreement asks that no two processes decide differently. Finally, termination states that every correct process eventually decides. In the interest of having an “apples to apples” comparison with SMR commands (defined below, Section 3.2.2), we introduce a client (e.g., a learner in Paxos terminology [78]), and say that a consensus instance completes as soon as the client learns about the decided value. This client is not subject to being suspended, and after receiving the decided value, the client broadcasts this value to the other processes. Algorithm 7 is a consensus algorithm based on this idea.

It is easy to see that in such a model consensus completes in two rounds: processes broadcast their input, and every process uses some deterministic function (e.g., maximum) to decide on a specific value among the set of values it delivers. Since all processes deliver exactly the same set of n−1 (or n) values, they reach agreement. In the second round, all processes which decided send their decided value to all the other processes, including the client. Since n ≥ 3 and at least n−1 processes are correct in the second round, the client delivers the decided value and thus the consensus instance completes by the end of round two. Afterwards (starting from the third round), the client broadcasts the decided value to all the processes, so eventually every correct process decides, satisfying termination. Note that if a process is suspended in the first round (but correct in the second round), it decides in the second round, after delivering the decided value from the other processes. Algorithm 7 represents this solution.

We remark that Algorithm 7 does not contradict the lossy link impossibility result of Santoro and Widmayer [105], even though our model permits more than n−1 message omissions in a round, since the model we consider is stronger.

Algorithm 7 Consensus

1: procedure propose(p_i, v_i)                          ▷ p_i proposes value v_i
2:     ▷ round 1
3:     decision ← ⊥
4:     ∀p ∈ Π \ {p_i}, send(p, v_i)                      ▷ Π is the set of processes
5:     values ← {v_i} ∪ { each value v delivered from process p (∀p ∈ Π \ {p_i}) }
6:     if |values| ≠ 1 then                              ▷ p_i is correct in round 1
7:         decision ← deterministicFunction(values)
8:     else                                              ▷ p_i was suspended
9:         ▷ p_i cannot decide yet
10:
11:    ▷ round k (k ≥ 2): consensus instance completes in round 2
12:    if decision ≠ ⊥ then                              ▷ p_i knows the decided value
13:        ∀p ∈ (Π \ {p_i}) ∪ {client}, send(p, decision) ▷ broadcast decided value
14:    else
15:        decision ← received decision                  ▷ message received from another process

We emphasize that although correct processes can decide in the first round, we consider that the consensus instance completes when the client delivers the decided value. Hence, the consensus instance in Algorithm 7 completes in the second round. In more practical terms, this consensus instance has a constant cost.
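To make the two-round structure concrete, here is a minimal Python sketch of one instance of Algorithm 7, assuming at most one process is suspended per round; the simulation harness (run_consensus, deterministic_function) is our own illustration, not part of the algorithm as specified above.

def deterministic_function(values):
    # any deterministic choice over the delivered set works; we use the maximum
    return max(values)

def run_consensus(inputs, suspended_r1=None, suspended_r2=None):
    """Simulate one instance of Algorithm 7 for processes 0..n-1, where inputs[i]
    is p_i's proposal and suspended_r1/suspended_r2 name the (at most one)
    process suspended in round 1 and round 2, respectively."""
    n = len(inputs)
    decisions = [None] * n

    # Round 1: every correct process delivers all values except the suspended one.
    for i in range(n):
        if i == suspended_r1:
            continue  # a suspended process only knows its own value and cannot decide
        delivered = {inputs[j] for j in range(n) if j != suspended_r1}
        decisions[i] = deterministic_function(delivered)

    # Round 2: deciders broadcast; the client (never suspended) learns the decision,
    # and a process suspended in round 1 but correct in round 2 adopts it.
    deciders = [i for i in range(n) if decisions[i] is not None and i != suspended_r2]
    client_learned = decisions[deciders[0]] if deciders else None
    for i in range(n):
        if decisions[i] is None and i != suspended_r2:
            decisions[i] = client_learned
    return client_learned, decisions

learned, _ = run_consensus([3, 7, 5], suspended_r1=1)
assert learned == 5  # p_1's value is omitted in round 1; the client learns max{3, 5}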

3.2.2 State Machine Replication

The SMR approach requires a set of processes (i.e., replicas) to agree on an ordered sequence of commands [75, 110]. We use the terms replica and process interchangeably. Informally, each replica has a log of the commands it has performed, or is about to perform, on its copy of the state machine.

Log. Each replica is associated with a sequence of decided and known commands which we call the log. The commands are taken from a finite set C = {c_1, ..., c_k}. We denote the log with ℓ(e,p), where e is a finite execution, p is a replica, and each element in ℓ(e,p) belongs to the set C ∪ {ε}. Specifically, ℓ(e,p) corresponds to the commands known by replica p after all the events in a finite execution e have taken place (e.g., ℓ(e,p) = c_{i1}, ε, c_{i3}). For 1 ≤ i ≤ |ℓ(e,p)|, we denote with ℓ(e,p)_i the i-th element of the sequence ℓ(e,p). If there is an execution e and ∃p ∈ Π and ∃i ∈ N+ such that ℓ(e,p)_i = ε, this means that replica p does not have knowledge of the command for the i-th position in its log, while at least one replica does have knowledge of this command. We assume that if a process p knows about a command c, then c exists in ℓ(e,p). To keep our model at a high level, we abstract over the details of how each command appears in the log of each replica, since this is typically algorithm-specific. Additionally, state-transfer optimizations or snapshotting [100] are orthogonal to our discussion.

An SMR algorithm is considered valid if the following property is satisfied for any finite execution e of that algorithm: ∀p, p′ ∈ Π and for every i such that 1 ≤ i ≤ min(|ℓ(e,p)|, |ℓ(e,p′)|), if ℓ(e,p)_i ≠ ℓ(e,p′)_i then either ℓ(e,p)_i = ε or ℓ(e,p′)_i = ε. In other words, consider a replica p which knows a command for a specific log position i, i.e., ℓ(e,p)_i = c_k, where c_k ∈ C. Then for the same log position i, any other process p′ can either know command c_k (i.e., ℓ(e,p′)_i = c_k), not know the command (i.e., ℓ(e,p′)_i = ε), or have no information regarding the command (i.e., |ℓ(e,p′)| < i). Note that we only consider valid SMR algorithms.

In what follows, we define what it means for a replica to be a straggler, as well as how replicas first learn about commands.

Stragglers. Intuitively, stragglers are replicas that are missing commands from their log. More specifically, let L be max_p |ℓ(e,p)|. We say that q is a k-straggler if the number of non-ε elements in ℓ(e,q) is at most L − k. A replica p is a straggler in an execution e if there exists a k ≥ 1 such that p is a k-straggler. Otherwise, we say that the replica is a non-straggler. A replica that is suspended for a number of rounds could potentially miss commands and hence become a straggler.
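The validity condition and the straggler definition translate directly into the following Python sketch; the helper names (is_valid_pair, straggler_lag) and the EPSILON placeholder are ours, introduced only for illustration.

EPSILON = None  # stands for the epsilon placeholder (command unknown at this replica)

def is_valid_pair(log_p, log_q):
    """SMR validity for two logs: wherever both logs have an entry for the same
    position and the entries differ, at least one of them must be epsilon."""
    return all(a == b or a is EPSILON or b is EPSILON for a, b in zip(log_p, log_q))

def straggler_lag(logs, q):
    """Largest k such that replica q is a k-straggler (0 if q is a non-straggler)."""
    longest = max(len(log) for log in logs.values())
    known = sum(1 for c in logs[q] if c is not EPSILON)
    return max(0, longest - known)

logs = {"p1": ["c1", "c2", "c3"], "p2": ["c1", EPSILON, "c3"], "p3": ["c1"]}
assert is_valid_pair(logs["p1"], logs["p2"])
assert straggler_lag(logs, "p2") == 1 and straggler_lag(logs, "p3") == 2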

Client. Similar to the consensus client, there is a client process in SMR as well. In SMR, however, the client proposes commands. The client acts like the (n+1)-th replica in a system with n replicas and its purpose is to supply one command to the SMR algorithm, wait until it receives (i.e., delivers) a response for the command it sent, then send another command, etc. A client, however, is different from the other replicas, since an SMR algorithm has no control over the state machine operating in the client and the client is never suspended. A client operates in lock-step⁴ as follows:

• sends a command c ∈ C to all the n replicas in some round r;
• waits until some replica responds to the client’s command (i.e., with the response of applying the command).⁵

A replica p can respond to a client command c only if it has all commands preceding c in its log. This means that ∃i : ℓ(e,p)_i = c and ∀j < i, ℓ(e,p)_j ≠ ε. We say that the client is suggesting a command c at a round r if the client sends a message containing command c to all the replicas in round r. Similarly, we say that a client gets a response for command c at a round r if some replica sends a message to the client containing the response of the command in round r.

State Machine Replication Algorithm

Algorithm 7 shows that consensus is solvable in our model. It seems intuitive that SMR is solvable in our model as well. To prove that this is the case, we introduce an SMR algorithm for our model (Section 3.2). Roughly speaking, this algorithm operates as follows. Each replica contains an ordered log of decided commands. A command is decided for a specific log position by executing a consensus instance similar to Algorithm 7. The SMR algorithm takes care of stragglers through the use of helping. Specifically, each replica tries to help stragglers by sending commands which the straggler might be missing.

In contrast to the consensus algorithm (Algorithm 7), the presented SMR algorithm is considerably more involved. For clarity, the algorithm is presented in two parts: Algorithm 8 and Algorithm 9. In Algorithm 8, we present the local variables of each replica and the code each replica executes in every round,

⁴ Clients need not necessarily operate in lock-step, but can employ pipelining, i.e., they can have multiple outstanding commands. Practical systems employ pipelining [2,3,100], and we account for this aspect later in our practical experiments of Section 3.4.
⁵ We consider that a command is applied instantaneously on the state machine (i.e., the execution time for any command is zero).

and in Algorithm 9 we present the two main procedures of the algorithm: prepareMessages and onReceive.

The high-level overview of the algorithm is that processes decide on a command similarly to Algorithm 7 and can help each other by sending commands to processes that are missing them. More specifically, each process contains an ordered log of decided commands. For a command to appear in the i-th position of this log, processes need to agree by performing consensus instance i. Processes propose a command for the next consensus instance they are missing, and if this position has already been decided, other processes will try to help them by sending them their missing commands. In order to be able to help, each process keeps information⁶ on the next consensus instance each other process tries to decide upon.

Algorithm 8 State Machine Replication: Local Variables for Process p_i and Flow in Each Round
1: ▷ Local Variables
2: ins ← 1                                    ▷ next consensus instance number to get a decision for
3: maxIns ← 0                                 ▷ greatest consensus instance number where a value is decided upon
4: nextMissingIns[p] ← 1 (∀p ∈ Π \ {p_i})     ▷ next instance each process needs
5: cmdsSet ← ∅                                ▷ set of commands that are received from the client
6: cmdsDecided[i] ← ⊥ (∀i ∈ N+)               ▷ cmdsDecided[i] is the command decided for consensus instance i; ⊥ means no decision is known yet
7: messageFor[p] ← ⊥ (∀p ∈ (Π \ {p_i}) ∪ {client})   ▷ messages to be sent in each round
8: SM ← initSM                                ▷ initialized state machine to be replicated
9: myProposal ← (p_i, ⊥, 0)                   ▷ p_i's last proposal in the format (pid, value, instance)
10: clientResponses[i] ← ⊥ (∀i ∈ N+)          ▷ stores all the responses destined for the client
11: lastResponse ← 1                          ▷ index of clientResponses
12:
13: while new round do
14:     send(messageFor)
15:     responses ← receive()
16:     onReceive(responses, myProposal)
17:     (messageFor, myProposal) ← prepareMessages()

We continue by describing the local variables (Algorithm 8). Variable cmdsDecided corresponds to a log of commands⁷ and contains the commands (in order) the process knows have been decided. Note that cmdsDecided allows gaps; for example, a process might have cmdsDecided[1] ≠ ⊥ and cmdsDecided[3] ≠ ⊥ but cmdsDecided[2] = ⊥. It is guaranteed however that ∀i ∈ N+ with i < ins, cmdsDecided[i] ≠ ⊥, where ins corresponds to the smallest position in the log for which the process does not have a command. Similar to ins, maxIns corresponds to the maximum decided instance number of all the other processes. Specifically, maxIns contains the maximum number j such that a process has decided on a command for the j-th position in its log (i.e., there is some process that has cmdsDecided[j] ≠ ⊥). Initially maxIns is zero,

⁶ Note that since this information is local, it might become stale and not accurately describe the system.
⁷ You can think of the log as the ℓ(e,p) construct presented in Section 3.2.

since no process has decided on a command. Each time a process sends a message, it attaches maxIns to it, so processes can get informed about the greatest consensus instance number in the whole system for which a value has been decided upon. Additionally, each process attaches ins to each message it sends so that it can potentially get help from other processes. Helping takes place when a process informs other processes about decided commands they may need. For this, each process has an array (nextMissing) of the next instance each process needs. A process looks at this array to see if it can help another process, and if so it sends the decided command to the other process. Variable cmdsSet corresponds to a set of commands that are received from the client and are to be proposed by the process. messageFor is set in prepareMessages, as we will see later on, and it simply contains the message to be sent in every round to each of the other processes or the client. Additionally, each replica has a state machine SM that is initialized to initSM and provides an apply operation that takes a command as a parameter and returns the response of applying this command to the state machine. Variable myProposal corresponds to a potential command proposed by the process. Finally, the array clientResponses is used to store computed responses that are to be sent to the client, and lastResponse is used to index this array. This array is convenient in case a process is suspended in a round: in that case, the process cannot send the response to the client in this round, so it keeps responses in the clientResponses array in order to be able to send a response when it is not suspended.

The exact steps the algorithm executes in a round are presented in lines 13-17: initially the process sends messages, then it waits to receive responses that are used, together with myProposal, to change the state of the process and compute the next round’s messages (messageFor).

We continue by describing how prepareMessages and onReceive operate. prepareMessages operates as follows (Algorithm 9). First, it checks (Line 19) whether the process has already decided for instance ins. This could happen if the process retrieved a decision in Line 60 or in Line 63 of onReceive. If the command exists in the set, the command is removed from it (lines 20-21). Afterwards, the command is applied to the state machine (Line 22), the response is stored in clientResponses (Line 23), and the latest response that has not yet been transmitted to the client is stored in messageFor (Line 24) so it can be sent to the client in the next round. Additionally, ins is incremented by one (Line 25). The algorithm then initializes messageFor to contain the pair (ins, maxIns) (Line 26) for every message to be sent to each other process. Including this pair in each message is helpful, since ins allows other processes to know the consensus instance the process needs a command for, and maxIns ensures that processes only accept proposals for positions that have not yet been decided. Then, myProposal (Line 27) is cleared, since otherwise the algorithm might use a previous proposal message. Afterwards, if there exists a command (Line 28) to be proposed that is not yet decided by the process (Line 30), the process concatenates the proposal to each message using the ∥ construct, which denotes concatenation of messages (Line 31), and sets myProposal (Line 32).


Algorithm 9 State Machine Replication (for process p_i)
18: procedure prepareMessages()
19:     if cmdsDecided[ins] ≠ ⊥ then
20:         if cmdsDecided[ins] ∈ cmdsSet then
21:             cmdsSet.remove(cmdsDecided[ins])
22:         response ← SM.apply(cmdsDecided[ins])
23:         clientResponses[ins] ← response
24:         messageFor[client] ← clientResponses[lastResponse]
25:         ins ← ins + 1
26:     messageFor[p] ← (ins, maxIns), ∀p ∈ Π \ {p_i}
27:     myProposal ← (p_i, ⊥, 0)                ▷ clear last proposal
28:     if cmdsSet ≠ ∅ then
29:         cmd ← cmdsSet.get()                 ▷ returns but does not remove an element from the set
30:         if ∀j : cmdsDecided[j] ≠ cmd then   ▷ not yet decided in which order to execute cmd
31:             ∀p ∈ Π \ {p_i}, messageFor[p] ← messageFor[p] ∥ pro(cmd, ins)
32:             myProposal ← (p_i, cmd, ins)
33:         else
34:             cmdsSet.remove(cmd)             ▷ already decided on cmd, so no need to propose it
35:     for p ∈ Π \ {p_i} do
36:         if ∃j : nextMissingIns[p] = j ∧ cmdsDecided[j] = cmd ≠ ⊥ then
37:             messageFor[p] ← messageFor[p] ∥ dec(cmd, nextMissingIns[p])
38:     return (messageFor, myProposal)
39:
40: procedure onReceive(responses, myProposal)
41:     if (client, cmd) ∈ responses then
42:         cmdsSet.put(cmd)
43:         responses ← responses \ {(client, cmd)}
44:     ▷ a received message sent by process p_j is of the format p_j, A ∥ B ∥ C, where
45:     ▷ A = (ins_j, maxIns_j), B = pro(cmd_j, ins_j) and C = dec(cmd_j, nextMissing_j)
46:     if responses ≠ ∅ then                   ▷ else, p_i is suspended in this round
47:         if messageFor[client] ≠ ⊥ then
48:             lastResponse ← lastResponse + 1
49:             messageFor[client] ← ⊥
50:         proposals ← decisions ← ∅
51:         for p_j, (ins_j, maxIns_j) ∥ pro(pcmd_j, pins_j) ∥ dec(dcmd_j, nextMiss_j) in responses do
52:             maxIns ← max(maxIns, maxIns_j)
53:             nextMissing[p_j] ← ins_j + 1
54:             proposals ← proposals ∪ {(p_j, pcmd_j, pins_j)}
55:             decisions ← decisions ∪ {(dcmd_j, nextMiss_j)}
56:         proposals ← proposals ∪ {myProposal}
57:         if ∃(_, _, pins) ∈ proposals : pins = maxIns + 1 then
58:             commands ← {pcmd : ∃(_, pcmd, pins) ∈ proposals : pins = maxIns + 1}
59:             decision ← deterministicFunction(commands)
60:             cmdsDecided[maxIns + 1] ← decision
61:             maxIns ← maxIns + 1
62:         for ∀(dcmd, nextMissing) ∈ decisions : cmdsDecided[nextMissing] = ⊥ do
63:             cmdsDecided[nextMissing] ← dcmd

Then, in lines 35 to 37, the process checks whether it can help other processes by sending them decided commands it knows they are missing. At the end it returns messageFor together with the proposal (Line 38). Finally, note that each message sent to another process consists of at most three parts: a pair of instance numbers (ins, maxIns), a propose message, and a decided message. The careful reader might notice that consensus instance numbers can grow infinitely large. Hence, messages can potentially have unbounded size. One option to avoid this is to split a large message into multiple smaller ones and keep the exact same algorithm, with the slight change that a process only considers a message when it has received all of its smaller parts. Another option is to explicitly state that messages consist of two parts: a header part that contains information such as instance numbers, signatures, etc., and an application part. Then, we can just ask to bound the application part of the messages, but not the header part [39]. We are not concerned with the header, which can grow so as to permit increasingly larger consensus instance numbers. On the other hand, this bound on the application part of a message is important to prevent “cheating” in the sense of batching all the commands from multiple consensus instances in a single message.

onReceive operates as follows. First, it checks whether the client sent a message (Line 41), and if so adds the sent command to cmdsSet (Line 42) and removes it from responses (Line 43). Then, the process checks whether it is suspended in this round (Line 46), and if this is the case it does not perform any other operation. If the process is correct and messageFor[client] ≠ ⊥ (Line 47), it means that it successfully sent the previous response to the client, so it increases lastResponse (Line 48) and clears messageFor[client] (Line 49). Then, it initializes proposals and decisions to be empty sets (Line 50). Next, it goes through the responses (Line 51) to update maxIns (Line 52), update nextMissing for each process (Line 53), and gather proposals and decisions from the responses (lines 54-55).⁸ Afterwards, the process adds its own proposal to the set of proposals (Line 56). Then, if there are proposals (Line 57) with instance number equal to maxIns + 1, all the commands are extracted from such proposals (Line 58) (note that we employ pattern matching by using the symbol _). The extracted commands are passed through some deterministic function (similar to Algorithm 7) and a value is decided upon (Line 60). Subsequently, maxIns is incremented appropriately (Line 61). At the end, the process goes through the decided messages (Line 62) and utilizes delivered commands that it needs.

Finally, note that the algorithm admits further optimizations (e.g., updating the nextMissing array after deciding on a command in Line 60), but we omitted them for clarity.

Proof. In what follows we prove that the algorithm satisfies safety (i.e., it is a valid SMR algorithm) and liveness (i.e., a client eventually gets a response for any command it proposes) in Theorems 2 and 3, respectively. We start by proving some useful lemmas. Note that in what

⁸ Note that a response might not contain a proposal or a decision.

follows we consider that the commands proposed by the client are always distinguishable.

Furthermore, we denote with variable_p[i] the value of variable[i] of process p. Finally, when we state that a variable has a specific value in some round r, we refer to the beginning of round r (exactly before Line 14).

Lemma 12. For any p ∈ Π, maxIns_p never decreases.

Proof. Note that maxIns_p is only modified in lines 52 and 61. In Line 52, maxIns_p cannot decrease since it is updated to the maximum of its own value and the maxIns received from some other process, so it will be greater than or equal to what it was before. In Line 61, maxIns_p is incremented by one, so again it does not decrease.

Lemma 13. For any p ∈ Π and i ∈ N+, if i ≥ maxIns_p + 1, then cmdsDecided_p[i] = ⊥.

Proof. We use induction to prove this lemma. At the beginning of the first round, the property trivially holds. Assume it holds at the beginning of round r. We will show it holds at the beginning of round r+1. During the execution of round r there are two possibilities for cmdsDecided_p to be modified so that the property would not hold. One is for cmdsDecided_p to be written in Line 60 and another is for it to be written in Line 63. If a write occurs in Line 60, the property momentarily does not hold, but maxIns_p is incremented immediately afterwards in Line 61, so the property holds again. The other case is that cmdsDecided_p is written in Line 63. If nextMissing = maxIns_p + 1, then the property would not hold in round r+1. However, this would imply that some other replica sent a decided message with maxIns_p + 1 in the previous round. But then maxIns_p in Line 52 would have been updated to reflect this fact, a contradiction. Finally, note that due to Lemma 12, maxIns_p never decreases, so the property cannot be violated by a reduction in maxIns_p. Therefore, the property holds at the beginning of round r+1 and hence of every round.

Lemma 14. For any p ∈ Π and i ∈ N+, cmdsDecided_p[i] is written at most once.

Proof. We note that cmdsDecided_p[i] for a specific i ∈ N+ is only updated in Algorithm 9 and in two places: Line 60 and Line 63. Furthermore, updates to cmdsDecided_p take place only by processes that were not suspended in this round (Line 46). An update of cmdsDecided_p in Line 60 occurs for the index maxIns_p + 1. Note that maxIns_p since the start of the round might have only increased in Line 52, so the result of Lemma 13 is still satisfied. Due to Lemma 13, cmdsDecided_p[maxIns_p + 1] will be ⊥. Hence updates in Line 60 can only occur in slots of cmdsDecided_p that do not contain a command (i.e., a slot i such that cmdsDecided_p[i] = ⊥). Therefore, slots that contain commands will not get overwritten in Line 60. Similarly, the update in Line 63 only occurs if cmdsDecided_p[i] = ⊥ (where i = nextMissing) and not otherwise, so a cmdsDecided_p[i] that already contains a command (≠ ⊥) will not get updated in Line 63. Therefore, a specific position in cmdsDecided_p gets updated at most once.

Lemma 15. For any two processes p, p′ ∈ Π that are correct in a round r, maxIns_p = maxIns_p′ immediately after Line 52 in round r.

Proof. For this lemma, we use a similar argument to the one used to prove that the consensus Algorithm 7 satisfies agreement. All the correct processes deliver each other’s local maxIns. Then each correct process takes the maximum over all the maxIns values it delivered (Line 52), as well as its own. Hence, they all end up with the exact same value.

Theorem 2. Algorithm 9 is a valid SMR algorithm.

Proof. To show that the algorithm is valid, we have to show that the logs (cmdsDecided) are always consistent with each other.

We prove by induction that Algorithm 9 has the following property: for any two processes p, p′ ∈ Π, if cmdsDecided_p[i] ≠ cmdsDecided_p′[i] for some i ∈ N+, then cmdsDecided_p[i] = ⊥ or cmdsDecided_p′[i] = ⊥. In the first round, the property trivially holds since cmdsDecided_p[i] = ⊥ ∀i ∈ N+ and ∀p ∈ Π. Assume the property holds for all the rounds up to the r-th one. We will prove that it holds for round r+1. For this, we note that cmdsDecided is updated in two different places, so we consider two cases:

• If a decision takes place in Line 60. Due to Lemma 15, all correct processes have the exact same maxIns value, so they all consider the exact same set of proposals (Line 58). As in the consensus Algorithm 7, correct processes choose the exact same set of commands and pass them through deterministicFunction to make a decision. Therefore, all replicas store the exact same decision in their cmdsDecided log and hence the property still holds.

• If cmdsDecided is updated in Line 63, this means that the process received a decided message. This decided message was computed in a previous round (Line 37), and in that previous round the property holds (by induction). Therefore, if more than one process updates the log position indicated by nextMissing, they update it with the exact same command.

Finally, since the same log position is never updated more than once due to Lemma 14, the property is always satisfied. Therefore, based on the definition of a valid SMR algorithm (Section 3.2), Algorithm 9 is valid.

Lemma 16. If cmdsDecided_p[i] = cmd ≠ ⊥ for some process p ∈ Π, then there are at least n−1 processes with cmdsDecided[i] = cmd.

Proof. In other words, this lemma states that each decided command exists in the log of n−1 replicas. When a command is first decided upon in Line 60, every correct process in that round (and there are at least n−1 correct processes in each round) decides on exactly the same command, since all correct processes see the exact same maxIns value (Lemma 15). Hence, at least n−1 processes will store this command in cmdsDecided.

Lemma 17. If cmdsDecided_p[i] ≠ ⊥, then eventually at least n−1 processes will have cmdsDecided[j] ≠ ⊥ for all j : 1 ≤ j ≤ i.

Proof. Recall that in every round, every process sends ins (Line 26) to the other processes, informing them of the instance it is currently trying to get a command for. If the process is correct in the round, at least n−1 of the other processes that are correct in this round will deliver the message and update their local nextMissing array (Line 53). From Lemma 16 we know that each decided command exists in the log of at least n−1 processes. Since n ≥ 3 and at most one process can be suspended in a round, we know that there will always be one correct process that contains a command that the process is missing and that can help (Line 37). Therefore, in the next round, if the process that needs the command is correct, it will receive a decision. Hence, eventually at least n−1 processes will fill up their log up to position i.

Lemma 18. If a command c is decided (i.e., ∃p ∈ Π and ∃j ∈ N+ such that cmdsDecided_p[j] = c), then the client eventually gets a response for command c.

Proof. Due to Lemma 17, we know that if a command c appears in the log of some process, eventually c will appear in the logs of at least n−1 processes, as well as all the commands preceding it in the log. Each such process will have applied the command to its local state machine and stored the response in clientResponses (Line 23). Note however that a process sends the response of a command to the client only if it is certain that the response to the previous command has been successfully delivered to the client. Only if the process is correct in a round and can be assured that the previous response message was sent to the client (see lines 48 and 49) does the process send the response for the next command to the client. Therefore, the client eventually gets a response for command c.

Theorem 3. If a client suggests a command, then the client eventually gets a response.

Proof. Assume by contradiction that the k-th command c_k is the first command suggested by the client for which the client never gets a response. We will prove that the client eventually gets a response for c_k, and hence a contradiction. First, note that since at least n−1 processes are correct in each round, at least n−1 processes will add c_k to their cmdsSet (Line 42). The client only suggests a command if it received a response for the previous command (see Section 3.2). Hence, processes have already decided on the command c_{k−1}. From Lemma 16, we know that c_{k−1} exists in the log of at least n−1 processes. Furthermore, due to Lemma 17, eventually all processes will fill their logs up to command c_{k−1}. This means that eventually cmdsSet will contain c_k as the first command for at least n−1 processes, since all the other commands will have been decided and removed from the set (lines 21 and 34). Therefore, c_k will eventually be proposed with instance number k, where k = maxIns_p + 1 for some process p, and hence it will be decided. Finally, from Lemma 18, since command c_k is decided, the client eventually gets a response for this command.

As we show next (Section 3.3), no SMR algorithm can guarantee a response to the client within a constant number of rounds. Hence, even with helping, our SMR algorithm cannot guarantee a constant response time either.

3.3 Complexity Lower Bound on State Machine Replication

We now present the main result of this chapter, namely we show that there is no State Machine Replication (SMR) algorithm that can always respond to a client in a constant number of rounds. We also discuss how this result extends beyond the model of Section 3.2.

3.3.1 Complexity Lower Bound

We briefly describe the idea behind our result. We observe that there is a bounded number of commands that can be delivered by a replica in a single round, since messages are of bounded size, a practical assumption (Lemma 19). Using this observation, we show that in a finite execution e, if each replica p_i is missing β_i commands, then an SMR algorithm needs Ω(min_i β_i) rounds to respond to at least one client command suggested in e⁺ ∈ E(e) (Lemma 20). Finally, for any r ∈ N+, we show how to construct an execution e where each replica misses enough commands in e, so that a command suggested by a client in e⁺ ∈ E(e) cannot get a response in fewer than r rounds (Theorem 4). Hence, no SMR algorithm in our model can respond to every client command in a constant number of rounds.

Lemma 19. A single replica can deliver up to a bounded number (that we denote by Ψ) of commands in a round.

Proof. Since any message m is of bounded size B (∀m ∈ M, |m| ≤ B), the number of commands a message m can contain is bounded. Let us denote with ψ the maximum number of commands any message can contain. Since the number of commands that can be contained in one message is at most ψ, a replica can transmit at most ψ commands to another replica in one round. Therefore, in a given round a replica can deliver from the other replicas up to Ψ = (n−1)ψ commands. In other words, a replica cannot recover faster than Ψ commands per round.


Lemma 20. For any finite execution e, if each replica p_i misses β_i commands (i.e., p_i is a β_i-straggler), then there is a command suggested by the client in some execution e⁺ ∈ E(e) such that we need at least min_i ⌈β_i/Ψ⌉ rounds to respond to it.

Proof. Consider an execution e⁺ ∈ E(e) such that in a given round r, a client suggests to all replicas a command c, where round r exists in e⁺ but does not exist in e. This implies that replicas are not yet aware of command c in e, so command c should appear in a log position i where i is greater than max_p |ℓ(e,p)|. In order for a replica to respond to the client’s command c, the replica first needs to have all the commands preceding c in its log. For this to happen, some replica p_i needs to get informed about β_i commands. Note that from Lemma 19, a replica can only deliver Ψ commands in a round. Therefore, replica p_i needs at least ⌈β_i/Ψ⌉ rounds to get informed about the commands it is missing (i.e., to recover), and hence we need at least min_i ⌈β_i/Ψ⌉ rounds for the client to get a response for c.
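As a purely illustrative numerical instance of Lemmas 19 and 20 (the message capacity ψ and the β_i values below are assumptions, not measurements), the bound can be computed as follows in Python:

import math

n, psi = 3, 10                  # 3 replicas; at most psi = 10 commands per message (assumed)
Psi = (n - 1) * psi             # Lemma 19: at most 20 commands delivered per round
betas = [1000, 45, 41]          # hypothetical number of commands each replica is missing
lower_bound = min(math.ceil(b / Psi) for b in betas)
print(lower_bound)              # -> 3: at least 3 rounds before any replica can respond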

Figure 3.1 – Constructed execution of Theorem 4. Red dashed lines correspond to rounds where a replica is suspended. Replica p_1 is suspended for α_1 rounds, replica p_2 for α_2 rounds, and so on up to p_n.

Theorem 4. For any r ∈ N+ and any SMR algorithm with n replicas (n ≥ 3), there exists an execution e such that a command c which the client suggests in some execution e⁺ ∈ E(e) cannot get a response in fewer than r rounds.

Proof. Assume by contradiction that, given an SMR algorithm, each command suggested by a client needs at most a constant number of rounds k to get a response. Since a command gets a response in at most k rounds, we can make a replica “miss” any number of commands by simply suspending it for an adequate number of rounds.

To better convey the proof we introduce the notion of a phase. A phase is a conceptual construct that corresponds to a number of contiguous rounds in which a specific replica is suspended. Specifically, we construct an execution e consisting of n phases. Figure 3.1 conveys the intuition behind this execution. In the i-th phase, replica p_i is suspended for α_i rounds, with α_i ≠ α_j for i ≠ j. The idea is that after the n-th phase, each replica is a straggler and needs more than k rounds to become a non-straggler and be able to respond to a client command suggested in a round o, where o exists in e⁺ but not in e. We start from the n-th phase, going backwards. In the n-th phase, we make replica p_n miss enough commands, say β_n. In general, the number β_n of commands is such that if a client suggests a command at the end of the n-th phase, the client cannot get a response within k rounds of the command being suggested. For this to hold, it suffices to miss β_n = kΨ + 1 commands. In order to miss β_n commands, we have to suspend p_n for at least β_n·k rounds, since a client may submit a new command every (at most) k rounds. Thus, we set α_n = β_n·k. Similarly, replica p_{n−1} has to miss enough commands (β_{n−1}) such that it cannot get all the commands in less than k rounds. Note that after p_{n−1} was suspended for α_{n−1} rounds, replica p_n took part in α_n rounds. During these α_n rounds, replica p_{n−1} could have recovered commands it was missing. Therefore, p_{n−1} must miss at least β_{n−1} = (α_n + k)Ψ + 1 commands, and α_{n−1} = β_{n−1}·k. In the same vein, ∀i ∈ {1,...,n}: β_i = ((∑_{j=i+1}^{n} α_j) + k)Ψ + 1.

With our construction we succeed in having β_i/Ψ = (∑_{j=i+1}^{n} α_j) + k + 1/Ψ > k for every i ∈ {1,...,n}. Therefore, using Lemma 20, after the n phases, each replica needs more than k rounds to get informed about the commands it is missing from its log, a contradiction.
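The backwards recursion for the α_i and β_i can be sketched in Python as follows; the concrete parameter values (n = 3, k = 2, Ψ = 20) are arbitrary and only serve to illustrate the construction:

def construct_phases(n, k, Psi):
    """Compute, from the last phase backwards, how many commands beta_i each replica
    must miss and for how many rounds alpha_i it is suspended, as in Theorem 4."""
    alphas, betas = [0] * (n + 1), [0] * (n + 1)   # 1-indexed; slot 0 unused
    for i in range(n, 0, -1):
        betas[i] = (sum(alphas[i + 1:]) + k) * Psi + 1
        alphas[i] = betas[i] * k
    return alphas[1:], betas[1:]

alphas, betas = construct_phases(n=3, k=2, Psi=20)
print(alphas, betas)                    # e.g., beta_3 = 41 and alpha_3 = 82, beta_2 = 1681, ...
assert all(b / 20 > 2 for b in betas)   # every replica needs more than k rounds to recover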

Theorem 4 states that there exists no SMR algorithm in our model that can respond to every client command in a constant number of rounds.

3.3.2 Extension to other Models

The system model we use (Section 3.2) lends itself naturally to capturing the difference in complexity (i.e., number of rounds) between consensus and SMR. It is natural to ask whether this difference extends to other system models—and which those models are. Identifying all the models where our result applies, or does not apply, is a very interesting topic which is beyond the scope of this work, but we briefly discuss it here.

Consider models which are stronger than ours. An example of a stronger model is one that is synchronous with no failures; such a model would disallow stragglers and hence both consensus and SMR can be solved in constant time. Similarly, if the model does not restrict the size of messages (see Lemma 19), then an SMR command can complete in constant time, circumventing our result. We further discuss how our result can be circumvented in Section 3.5.

A more important case is that of weaker, perhaps more realistic models. If the system model is too weak—if consensus is not solvable [51]—then it is not obvious how consensus relates to SMR in terms of complexity. Such a weak model, however, can be augmented, for instance with unreliable failure detectors [37], allowing consensus to be solved. Informally, during well-behaved executions of such models, i.e., executions when the system behaves synchronously and no failures occur [68], SMR commands can complete in constant time.

Most practical SMR systems [36,40,96, 100] typically assume a partially synchronous or an asynchronous model with failure detectors [37], and executions are not well-behaved, because failures are prone to occur [19]. We believe our result applies in these practical settings, concretely within synchronous periods (or when the failure detector is accurate, respectively) of these models. During such periods, if at least one replica can suffer message omissions, completing an SMR command can take a non-constant amount of time. Indeed, in the next section, we present an experimental evaluation showing that our result holds in a partially synchronous system.

3.4 The Empirical Perspective

Our goal in this section is to substantiate empirically the theoretical result of Section 3.3. We first cover details of the experimental methodology. Then we discuss the evaluation results both in a single-machine environment, as well as on a practical wide-area network (WAN).

3.4.1 Experimental Methodology

We use two well-known State Machine Replication (SMR) systems: (1) LibPaxos, a Multi-Paxos implementation [3], and (2) etcd [2], a mature implementation of the Raft protocol [100]. We note that LibPaxos distinguishes between three roles of a process: proposer, acceptor, and learner [78]. To simplify our presentation, we unify the terminology so that we use the term replica instead of acceptor, the term client replaces learner, and the term leader replaces proposer. Each system we deploy consists of three replicas, since this is sufficient to validate our result and moreover it is a common deployment practice [40,55]. We employ one client. In LibPaxos, we use a single leader, which corresponds to a separate role from replicas. In Raft, one of the three replicas acts as the leader.

Using these two systems, we measure how consensus relates to SMR in terms of cost in the following three scenarios:

1. Graceful: when network conditions are uniform and no failures occur; this scenario only applies to the single-machine experiments of Section 3.4.2;

2. Straggler: a single replica is slower than the others (i.e., it is a straggler) but no failures occur, so the SMR algorithm need not rely on the straggler;

3. Switch: a single replica is a straggler and a failure occurs, so the SMR algorithm has to include the straggler on the critical path of agreement on commands.

Due to the difficulty of running synchronous rounds in a practical system, our measurements are not in terms of rounds (as in the model of Section 3.2). Instead, we take a lower-level perspective. We report on the cost, i.e., number of messages, and the latency measured at the client.⁹ Specifically, in each experiment, we report on the following three measurements.

First, we present the cost of each consensus instance i in terms of number of messages which belong to instance i, and which were exchanged between replicas, as well as the client. Each consensus instance has an identifier (called iid in LibPaxos and index in Raft), and we count these messages up to the point where the instance completes at the client. Recall that in our model (Section 3.2.1) we similarly consider consensus to complete when the client learns the decided value. This helps us provide an “apples to apples” comparison between the cost of consensus instances and SMR commands (which we describe next).

Second, we measure the cost of each SMR command c. Each command c is associated with a consensus instance i. The cost of c is similar to the cost of i: we count messages exchanged between replicas and the client for instance i.¹⁰ The cost of a command c, however, is a more nuanced measurement. As we discussed already, a consensus instance typically leaks messages, which can be processed later. Also, both systems we consider use pipelining, so that a consensus instance i may overlap with other instances while a replica is working on command c. Specifically, the cost of c can include messages leaked from some instance j, where j < i (because a replica cannot complete command c without having finished all previous instances), but also from some instance k, with k > i (these future instances are being prepared in advance in a pipeline, and are not necessary for completing command c).
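A rough sketch of this attribution, in Python, is given below; the message format and the window-based rule for charging leaked or pipelined messages to a command are our own simplification, not the exact bookkeeping of LibPaxos or etcd:

from collections import defaultdict

def attribute_costs(messages, completion_round):
    """messages: iterable of (instance_id, round) pairs observed between replicas
    and the client. completion_round[i]: round at which the client learns instance i.
    Consensus cost of instance i counts only messages tagged with i up to i's
    completion. The SMR cost of command i (one possible attribution) counts every
    message processed between the completion of command i-1 and that of command i,
    which is how leaked and pipelined messages end up charged to a command."""
    consensus_cost, smr_cost = defaultdict(int), defaultdict(int)
    for iid, rnd in messages:
        if rnd <= completion_round.get(iid, float("inf")):
            consensus_cost[iid] += 1
        for cid, done in completion_round.items():
            if completion_round.get(cid - 1, 0) < rnd <= done:
                smr_cost[cid] += 1
    return consensus_cost, smr_cost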

Third, we measure the latency for completing each SMR command. An SMR command starts when the client submits this command to the leader, and ends when the client learns the command. In LibPaxos, this happens when the client gathers replies for that command from two out of three replicas; in Raft, the leader notifies the client with a response.

We consider both a single-machine setup and a WAN. The former setup serves as a controlled environment where we can vary specifically the variable we seek to study, namely the impact of a straggler when quorums switch. For this experiment, we use LibPaxos and we discuss the results thoroughly. The latter setup reflects real-world conditions which we use to validate against our findings in the single-machine setup, and we experiment with both systems. In all executions the client submits 1000 SMR commands; we ignore the first 100 (warm-up) and the last 50 commands (cool-down) from the results. We run the same experiment three times to confirm that we are not presenting outlying results.

⁹ Note that it is simple to convert rounds to messages, considering our description of rounds in Section 3.2.
¹⁰ For LibPaxos, the cost of consensus and SMR additionally includes messages involving the leader.


(a) Graceful scenario: all replicas experience uniform conditions and no failures occur. (b) Straggler scenario: one of the three replicas is a straggler. (c) Switch scenario: one of the three replicas is a straggler and the active quorum switches to include this straggler.

Figure 3.2 – Experimental results with LibPaxos on a single-machine setup. We compare the cost of SMR commands with the cost of consensus instances in three scenarios. Each panel plots, per SMR command number, the SMR cost and the consensus cost (# of messages) together with the latency (milliseconds).

3.4.2 Experimental Results on a Single Machine

We experiment on an Intel Core i7-3770K (3.50GHz) equipped with 16GB of RAM. Since there is no network in these experiments, spurious network conditions—which can arise in practice, as we shall see next in Section 3.4.3—do not create noise in our results. To make one of the replicas a straggler, we make this replica relatively slower through a random delay (via the select system call) of up to 500 us when this replica processes a protocol message.
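The delay injection can be approximated by the Python sketch below; the real straggler logic lives in the C code of the system, and the function name delay_before_processing is purely illustrative:

import random
import select

MAX_DELAY_US = 500

def delay_before_processing():
    """Sleep for a random duration of up to 500 microseconds before handling a
    protocol message, emulating a straggler; calling select with empty fd sets and
    a timeout mirrors the select-based delay described above (POSIX behavior)."""
    select.select([], [], [], random.randint(0, MAX_DELAY_US) / 1_000_000)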


In Figure 3.2a we show the evolution of the three measurements we study for the graceful execution. The mean latency is 5590 us with a standard deviation of 730 us, i.e., the performance is very stable. This execution serves as a baseline.

In Figure 3.2b we present the results for the straggler scenario. The average latency, compared with Figure 3.2a, is slightly smaller, at 5005 us; the standard deviation is 403 us. The explanation for this decrease is that there is less contention (because the straggler backs off periodically), so the performance increases. In this scenario, additionally, there is more variability in the cost of SMR commands, which is a result of the straggler replica being less predictable in how many protocol messages it handles per unit of time.

For both Figures 3.2a and 3.2b, the average cost of an SMR command is the same as the average cost of a consensus instance, specifically around 12 messages. There is, however, a greater variability in the cost of SMR commands—ranging from 5 to 30 messages—while consensus instances are more regular—between 11 and 13 messages. As we mentioned already, the variability in the cost of SMR springs from two sources: (1) consensus instances leak into each other, and (2) the use of pipelining, a crucial part in any practical SMR algorithm, which allows consensus instances to overlap in time [64,107].

Pipelining allows the leader to have multiple outstanding proposals, and these are typically sent and delivered in a burst, in a single network-level packet. This means that some commands can comprise just a few messages (all the other messages for such a command have been processed earlier with previous commands, or have been deferred), whereas some commands comprise many more messages (e.g., messages leaked from previous commands, or processed in advance for upcoming commands). In our case, the pipeline has size 10, and we can distinguish in the plots that the bumps in the SMR cost have this frequency. Larger pipelines allow higher variability in the cost of SMR. Importantly, to reduce the effect of pipelining on the cost of SMR commands, this pipeline size of 10 is much smaller than what is used in practice, which can be 64, 128, or larger [2,3].

Figure 3.2c shows the execution where we stop one replica, so the straggler has to take part in the active quorum. The moment when the straggler has to recover all the missing state and start participating is evident in the plot. This happens at SMR command 450. We observe that SMR command 451 has considerably higher cost. This cost comprises all the messages which the straggler requires to catch up before being able to participate in the next consensus instance. The cost of consensus instance 451 itself is no different from that of other consensus instances. Since the straggler becomes the bottleneck, the latency increases and remains elevated for the rest of the execution. The average latency in this case is noticeably higher than in the two previous executions, at 10730 us (standard deviation of 4726 us). For this execution, we observe the same periodic bumps in the cost of SMR commands. Because the straggler replica is on the critical path of agreement, these bumps are more pronounced and less frequent: the messages concerning the straggler (including those to and from other replicas or the client) accumulate in the incoming and outgoing queues and are processed in bursts.

3.4.3 Wide-area Experiments

We deploy both LibPaxos and Raft on Amazon EC2 using t2.micro virtual machines [1]. For LibPaxos, we colocate the leader with the client in Ireland, and we place the three replicas in London, Paris, and Frankfurt, respectively. Similarly, for Raft we colocate the leader replica along with the client in Ireland, and we place the other two replicas in London and Frankfurt. Under these deployment conditions, the replica in Frankfurt is naturally the straggler, since this is the farthest node from Ireland (where the leader is in both systems). Therefore, we do not impose any delays, as we did in the earlier single-machine experiments. Furthermore, colocating the client with the leader minimizes the latency between these two, so the latency measurements we report indicate the actual latency of SMR.

(a) Straggler scenario: the replica in Frankfurt is a straggler, since this is the farthest from the leader in Ireland. The system forms a quorum using the replicas in London and Paris. (b) Switch scenario: at SMR command 450 we switch out the replica in London. The straggler in Frankfurt then becomes part of the active quorum.

Figure 3.3 – Experimental results with LibPaxos on the WAN. Similar to Figure 3.2, we compare the cost of SMR commands with the cost of consensus instances. Each panel plots, per SMR command number, the SMR and consensus cost (# of messages) together with the latency (milliseconds).

Figures 3.3 and 3.4 present our results for LibPaxos and Raft, respectively. To enhance visibility, please note that we use different scales for the y and y2 axes. These experiments do not include the graceful scenario, because the WAN is inherently heterogeneous.

(a) Straggler scenario: the replica in Frankfurt is a straggler. The active quorum consists of the leader in Ireland and the replica in London. (b) Switch scenario: we stop the replica in London at SMR command 450. Thereafter, the active quorum must switch to include the straggler in Frankfurt.

Figure 3.4 – Experimental results with Raft on the WAN. Similar to Figures 3.2 and 3.3, we compare the cost of SMR commands with the cost of consensus instances. Each panel plots, per SMR command number, the SMR and consensus cost (# of messages) together with the latency (milliseconds).

The most interesting observation is for the switch scenarios, i.e., Figures 3.3b and 3.4b. In these experiments, when we stop one of the replicas at command 450, there is a clear spike in the cost of SMR, which is similar to the spike in Figure 3.2c. Additionally, however, there is also a spike in latency. This latency spike does not manifest in single-machine experiments, where communication delays are negligible. Moreover, on the WAN the latency spike extends over multiple commands, because the system has a pipeline so the latency of each command being processed in the pipeline is affected while the straggler is catching up. After this spike, the latency decreases but remains slightly more elevated than prior to the switch, because the active quorum now includes the replica from Frankfurt, which is slightly farther away; the difference in latency is roughly 5 ms.

Besides the latency spike at SMR command 450, these experiments reveal a few other glitches, for instance around command 830 in Figure 3.3a, or command 900 in Figure 3.4b. In fact, we observe that, unlike in our single-machine experiments, the latency exhibits greater variability.


As we mentioned already, this has been observed before [34,96,123] and is largely due to the heterogeneity in the network and the spurious behavior this incurs. This effect is more notable in LibPaxos, but Raft also shows some variability. The latter system reports consistently lower latencies because an SMR command completes after a single round-trip between the leader and replicas [100].

As a final remark, our choice of parameters, e.g., execution length or pipeline width, is conservative. For instance, executions longer than 1000 commands would exacerbate the difference in cost between SMR commands and consensus instances, since longer executions allow a straggler to miss even more state, which it needs to recover when switching.

3.5 Discussion

The main implication of Theorem 4 is that it is impossible to devise a State Machine Replication (SMR) algorithm that can bound its response times. There are, however, several conditions that allow us to circumvent our lower bound, which we discuss here. Moreover, when our result does apply, we observe that SMR algorithms can mitigate, to some degree, the performance degradation in the worst case, i.e., when quorums switch and stragglers become necessary. These algorithms face a trade-off between best-case and worst-case performance. We also discuss how various SMR algorithms deal with this trade-off.

Circumventing the Lower Bound. Informally, our result applies to SMR systems that fulfill two basic characteristics: i) messages are bounded in size, and ii) replicas can straggle for arbitrary lengths of time. Simply put, if one of these conditions does not hold, then we can circumvent Theorem 4. We discuss several cases where this can happen.11

For instance, if the total size of the state machine is bounded, as well as small, then the whole state machine can potentially fit in a single message, so a straggler can recover in bounded time. This is applicable in limited practical situations. We are not aware of any SMR protocol that caps its state. But this state can be very small in some applications, e.g., if SMR is employed to replicate only a critical part of the application, such as distributed locks or coordination kernels [64,89].

The techniques of load shedding or backpressure [122] can be employed to circumvent our result. These are application-specific techniques which, concretely, allow a system to simply drop or deny a client command if the system cannot fulfill that command within bounded time. Other, more drastic, approaches to enforce strict latencies involve resorting to weak consistency or combining multiple consistency models in the same application [59], or provisioning additional replicas proactively when stragglers manifest [42,111].

11 We do not argue that we can guarantee bounded response times in a general setting, only in the model we consider in Section 3.2.

Best-case Versus Worst-case Performance Trade-off. When our lower bound holds, an SMR algorithm can take steps to ameliorate the impact which stragglers have on performance in the worst-case (i.e., when quorums switch). Coping with stragglers, however, does not come for free. The best-case performance can suffer if this algorithm expends resources (e.g., additional messages) to assist stragglers. Concretely, these resources could have been used to sustain a higher best-case throughput. When a straggler becomes necessary in an active quorum, however, this algorithm will suffer a smaller penalty for switching quorums and hence the performance in the worst-case will be more predictable.

This is the trade-off between best- and worst-case performance, which can inform the design of SMR algorithms. Most of the current well-known SMR protocols aim to achieve superior best-case throughput by sacrificing worst-case performance. This is done by reducing the replication factor, also known as a thrifty optimization [96]. In this optimization, the SMR system uses only F + 1 instead of 2F + 1 replicas (thereby stragglers are non-existent), so as to reduce the amount of transmitted messages and hence improve throughput or other metrics [3,85,96]. In the worst case, however, when a fault occurs, this optimization requires the SMR system to either reconfigure or provision an additional replica on the spot [84,85], impairing performance.

Multi-Paxos proposes a mode of operation that can strike a good balance between best- and worst-case performance [77]. Namely, replicas in this algorithm can have gaps in their logs. When gaps are allowed, a replica can participate in the agreement for some command at log position k even if this replica does not have earlier commands, i.e., commands at log positions l with l < k. As long as the leader has the full log, the system can progress. Even when quorums switch, stragglers can participate without recovery. If the leader fails, however, the protocol halts [92,120] because no replica has the full log, and execution can only resume after some replica builds the full log by coordinating with the others. It would be interesting in future work to experiment with an implementation that allows gaps, but LibPaxos does not follow this approach [3], and we are not aware of any such implementation.
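To illustrate this gap-based mode of operation, here is a minimal sketch (our own toy code; the names GappyLog, accept, and missing are hypothetical and not taken from Multi-Paxos or LibPaxos) of a replica log that accepts a command at position k even when earlier positions are empty; the gaps only need to be filled if this replica must later reconstruct the full log.

```python
# Toy sketch of a replica log with gaps, as in the Multi-Paxos mode of
# operation described above. All names are illustrative.

class GappyLog:
    def __init__(self):
        self.entries = {}  # log position -> command

    def accept(self, position, command):
        # The replica can vote for position k even if positions < k are empty.
        self.entries[position] = command

    def missing(self, up_to):
        # Gaps this replica would have to fill (by coordinating with others)
        # before it could take over as leader with the full log.
        return [k for k in range(1, up_to + 1) if k not in self.entries]


replica = GappyLog()
replica.accept(3, "cmd-3")  # participates at position 3 without positions 1, 2
replica.accept(5, "cmd-5")
print(replica.missing(5))   # [1, 2, 4]
```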

3.6 Conclusion

We examined the relation between consensus and State Machine Replication (SMR) in terms of their complexity. We proved the surprising result that SMR is more expensive than a repetition of consensus instances. Concretely, we showed that in a synchronous system where a single instance of consensus always terminates in a constant number of rounds, completing one SMR command can potentially require a non-constant number of rounds. Such a scenario can

occur if some processes are stragglers in the SMR algorithm, but later the stragglers become active and are necessary to complete a command. We showed that such a scenario can occur even if only one process is a straggler at a time.

Additionally, we supported our formal proof with experimental results using two well-known SMR implementations (a Multi-Paxos and a Raft implementation). Our experiments highlighted the difference in cost between a single consensus instance and an SMR command.


Part II
Transactions


4 The Impossibility of Fast Transactions

In this chapter, we prove that transactions cannot be fast in an asynchronous fault-tolerant system. Our result holds in any system where we require transactions to ensure monotonic writes or any stronger consistency model, such as causal consistency. Thus, our result unveils an important, and so far unknown, limitation of fast transactions: they are impossible if we want to tolerate the failure of even one server.

4.1 Introduction

The surge of cloud computing and big data has led to the design of large-scale and highly available online services. A fundamental component of large-scale online services is a distributed data store [4,43,74]. Naturally, the demand for highly available online services translates to the demand for highly available data stores.

The CAP theorem [27,56] states that a distributed system has to choose between availability and strong consistency during a network partition. In practice, networks are not reliable [19], and thus we can never eliminate the possibility of network partitions. For this reason, highly available data stores choose to sacrifice strong consistency in favor of availability [4,43,70,74]. A number of well-known data stores support eventual consistency [4,43,70]. Eventual consistency [65] simply states that if we reach a quiescent state where no updates are taking place (i.e., no writes are issued by the clients), eventually all the servers contain non-conflicting data.1 Recent work has shown that the strongest consistency model that can be achieved in the presence of network partitions is causal consistency [14,90]. As a result, causal consistency has recently gained attention in academia [18,45,86,87,112-114], as well as in industry, where most notably the MongoDB [4] data store supports causal consistency.

1In contrast to its name, eventual consistency is a liveness, rather than a safety property.


Typically, the interface of a data store is a read-write interface [74] on a set of objects, where the objects can be identified by what is called a key. To handle the enormous amount of data, data stores partition2 the data (e.g., based on a key) across multiple servers. To avoid loss of data, these systems replicate the data to multiple servers. To remain highly available and reduce the latency of client operations, data stores are replicated across multiple geographically separated data centers. At a high level, the client issues a request on the data store by identifying the object (e.g., by providing the key) the client wants to access. Subsequently, the server responds to the client with the desired data. A number of data stores [4,9,45,47,86,87,94] augment their interface by providing transactions. Transactions operate on multiple objects at once and substantially aid the programmer's job. Since data stores are extensively used for read-heavy workloads, they aim to optimize read-only transactions, which are the most frequent in practice [28]. It is thus natural to seek implementations of read-only transactions that are as fast as possible.

Lu et al. [87] provide a first informal description of what it means for a read-only transaction to be fast, or as they call it, latency-optimal. Their definition captures the fact that a read-only transaction is one round-trip, non-blocking, and one-version. One round-trip means that a client does not contact a server more than once during a transaction. Non-blocking [87] states that servers should not communicate with each other before responding to the client. Finally, one-version asks that a server only sends one value for each read object. In the same work, Lu et al. prove that fast read-only transactions are possible by presenting COPS-SNOW, a causally-consistent data store that provides fast read-only transactions. Because of the critical importance of fast read-only transactions for industrial data stores, fast read-only transactions have received much attention of late [44,45,71,87,118]. However, all this research [44,45,71,87,118] targets the ideal case where servers never fail.
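As a rough illustration of these three properties (our own sketch, not code from COPS-SNOW or any cited system), the read path below answers from local server state only (non-blocking), returns a single version per requested object (one-version), and has the client contact each server exactly once (one round-trip).

```python
# Illustrative sketch of a "fast" read-only transaction: one round-trip,
# non-blocking, one-version. All names and data are hypothetical.

latest = {"x": "v4", "y": "v7"}  # server-local store: object -> latest value

def handle_read(objects):
    # Non-blocking: the server replies without contacting other servers.
    # One-version: exactly one value per requested object.
    return {o: latest[o] for o in objects}

def read_only_tx(objects):
    # One round-trip: the client sends one request and uses the one reply.
    return handle_read(objects)

print(read_only_tx(["x", "y"]))  # {'x': 'v4', 'y': 'v7'}
```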

In practice, distributed systems are deployed in settings where failures are the norm. Distributed systems ought to be fault-tolerant. Fault-tolerance is usually achieved by means of replication or logging. However, replication and logging are synchronous (i.e., blocking) operations. As expected, achieving fault-tolerance has a cost on the performance of transactions that write to objects. Write transactions cannot be fast since they are blocking: write transactions have to replicate data before committing. It is therefore natural that the description of fast transactions [87] only refers to read-only transactions.

In this chapter we prove that, surprisingly, read-only transactions cannot be fast either. Fast transactions in general are impossible in a system that aims to be fault-tolerant. To show that read-only transactions cannot be fast, we examine the concept of the visibility of transactions, and investigate whether fast read-only transactions can be invisible. Transactions are said to be invisible if they do not modify the state of the servers with which they communicate.

2In this context, we use the term “partition” in the sense of sharding.


Intuitively, visible read-only transactions modify the state of the server they are operating on. Thus, visible read-only transactions are blocking, since the modification on the server has to be performed in a fault-tolerant way (e.g., by replicating the modification to other servers). This means that, in a fault-tolerant system, read-only transactions can only be fast if they are invisible.

We prove that in an asynchronous system, if we require transactions to ensure monotonic writes, a minimal level of consistency that is weaker than causal consistency, then read-only transactions cannot be both fast and invisible. In fact, our proof holds for a weaker definition of fast transactions, one where a server can send a bounded number of versions (instead of just one) back to the client. In this sense, we prove a more general result. Our result sheds some light on a so far unexplored limitation of fast transactions: they are impossible if we want to tolerate the failure of even one server. Furthermore, we prove that if a server can send an unbounded number of versions back to the client, then transactions can be non-blocking and one round-trip. To prove this, we devise a new data store algorithm that we believe is interesting in its own right, called ubvStore, that provides non-blocking and one round-trip transactions.

The practical implications of our theoretical results are threefold. First, similar to the CAP theorem [27,56], we demonstrate a new trade-off for data stores: either fast transactions or fault-tolerance can be achieved, even with the weak consistency model of monotonic writes. Second, our results allow designers to avoid chasing impossible designs. Third, we show that fault-tolerance should be a first-class concern when proposing new theoretical properties, in order for the properties to have practical utility.

To prove our results, we devised a new formal framework that is general enough to capture any data store, while at the same time the framework is able to precisely capture notions such as bounded-version, non-blocking, etc., a challenging endeavor. As far as we know, our formalism is the first that precisely captures the notion of fast read-only transactions. We consider the framework as a contribution on its own.

Roadmap. The rest of this chapter is organized as follows. We present our framework in Section 4.2. In Section 4.3, we prove the impossibility of fast transactions and we discuss the ramifications of our result. Then, in Section 4.4, we describe the ubvStore algorithm. Finally, in Section 4.5, we discuss related work before concluding.

4.2 Model

We consider an asynchronous model that captures the notion of a data store that supports write operations on single objects, as well as read-only transactions that operate on groups of objects.


Our model does not support transactions that write to objects, hence our impossibility result is stronger. Since we only consider read-only transactions, whenever we refer to a transaction, we refer to a read-only transaction. We distinguish between servers and clients in our system and clearly define the ways they can communicate and the exact type of messages a client can send to the server. Nevertheless, the model provides great flexibility on how the processes (i.e., clients and servers) can communicate.

We consider a data store as a message-passing system with servers and clients that communicate. Servers store objects that the clients can read or write. Specifically, a data store is a tuple (S, C, O, V, M, dec) where S, C, O, V, and M are sets and dec is a function, as described below. We consider that a data store consists of n servers contained in a set S = {s_1, ..., s_n}, as well as a finite set C = {c_1, c_2, ..., c_m} of m clients. We consider that both servers and clients are deterministic. Clients communicate with servers, and servers with servers, by exchanging messages, where messages can take arbitrary time to be delivered but are eventually delivered (i.e., no message is lost). We assume that clients cannot communicate with each other. Additionally, we consider a set O = {o_1, ..., o_l} of l objects and an infinite set V = {v_1, v_2, ...} ∪ {⊥} of finite values that the objects can take. Value ⊥ corresponds to the initial value of each object. No write operation can write value ⊥ to an object. Note that, depending on the context, we refer interchangeably to a value as a version. We also consider an infinite set of messages M, where each message m ∈ M is created over some alphabet. All the infinite sets we consider are countable. Finally, we consider the decoding function dec : M → 2^V that, given a message m, returns the set of values that are encoded in m. We use function dec to bound the number of values a client can utilize from a given message.
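The tuple (S, C, O, V, M, dec) can be pictured as plain data, as in the toy sketch below; the concrete sets and the particular dec function are invented for illustration only and are not part of the model.

```python
# Toy instantiation of the data-store tuple (S, C, O, V, M, dec).
# Contents are illustrative; in the model V and M are infinite sets.

S = {"s1", "s2"}           # servers
C = {"c1", "c2"}           # clients
O = {"o1", "o2"}           # objects
BOTTOM = None              # stands in for the initial value ⊥
V = {BOTTOM, "v1", "v2"}   # values

def dec(message):
    # dec : M -> 2^V, returning the set of values encoded in a message.
    # Here a message is assumed to be a dict with a "values" field.
    return set(message.get("values", []))

m = {"values": ["v1", "v2"], "extra": "arbitrary additional content"}
print(dec(m))  # {'v1', 'v2'}
```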

We introduce some notation that we use throughout this chapter. For a set S we define S^{≤k} = S^1 ∪ S^2 ∪ S^3 ∪ ··· ∪ S^k, where S^1 = S and, for i > 1, S^i = S × S^{i−1}. For a tuple v = (v_1, v_2, ..., v_g), we denote with v_i the i-th element of v, e.g., (3,8,1)_2 is 8. Finally, given a sequence of elements α = a_1, a_2, ..., we denote with a_i ∈ α that a_i appears in α.

Server. We model a server as a state machine s = (Σ, σ_0, E, ∆, obj) where Σ is the set of possible states, σ_0 ∈ Σ is the initial state, and E is the set of possible events. ∆ : Σ × E → Σ is a partial function that captures the possible state transitions a server can take based on a given event. The set obj ⊆ O is the set of objects that the server handles. For a server s, we denote with s.Σ, s.σ_0, s.E, s.∆, and s.obj the set of states Σ of server s, the set of events E of server s, etc. We say that a server s serves objects s.obj, or that a server serves an object o where o ∈ s.obj. We consider a system where all the objects are served by some server, hence ∪_{i=1}^{n} s_i.obj = O, and additionally we assume that every object is served by a single server, hence for all s, s′ ∈ S with s ≠ s′, s.obj ∩ s′.obj = ∅.

In what follows, we refer to either a server or a client as a process. For each server s ∈ S, the set of events is defined as s.E = {send(m, p), receive(m, p) : m ∈ M, p ∈ S ∪ C}. Specifically, a send(m, p) event corresponds to server s transmitting message m to process p. Event receive(m, p) corresponds to server s receiving message m from process p.

Client. We model a client as a state machine c = (Σ, σ_0, E, ∆, ρ) where Σ, σ_0, and ∆ are defined in the same way as for a server. Function ρ captures the exact values a client reads for a transaction. We consider an infinite set T = {t_1, t_2, ...} of finite transactional identifiers. Then, function ρ : Σ × T → V^{≤l} (note that |O| = l) is given a state σ ∈ Σ and a transactional identifier and returns an arbitrary number of values. Function ρ is used to define the monotonic writes consistency property (see later on). Additionally, the set of events E for a client is different from that of a server, since for a client c we restrict the events c can take (i.e., the messages a client can send and receive).

A client can either perform a write operation on a single object, or a transaction on a set of objects that are served by more than one server. To clearly capture the notion of a transaction in our model, we consider that a client splits a transaction into multiple read operations, where each read operation is destined to a different server with the objects to be read. Specifically, a client can only issue two kinds of operations, a read and a write operation. In what follows, we first describe the exact operations a client can issue. We then describe what kind of messages a client c can send and receive (i.e., events a client can take) based on the operations c performs. A read(O_s, t, d) operation reads ℓ objects defined in the ℓ-tuple O_s ∈ O^ℓ, with d ∈ M, as part of some transaction with identifier t ∈ T. Note that a read operation reads from distinct objects: for a read read(O_s, t, d) where O_s = (o_1, o_2, ..., o_ℓ) is an ℓ-tuple, ∀ i, j ∈ {1, ..., ℓ} with i ≠ j, it is the case that o_i ≠ o_j. A read(O_s, t, d) operation is always part of a transaction. Naturally, a client can perform a transaction on objects that reside on different servers. In such a case, a client sends two read operations with the same transactional identifier to two different servers. For example, assume we have two servers s_1, s_2 ∈ S with s_1.obj = {o_1}, s_2.obj = {o_2}, and a client c ∈ C that wants to perform a transaction reading both objects o_1 and o_2. Then, client c has to issue a read((o_1), t_id, d_1) operation to server s_1 and a read((o_2), t_id, d_2) operation to server s_2.
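The sketch below shows, with hypothetical names, how a client would split one transaction into per-server read operations that all carry the same transactional identifier, following the example of s_1 and s_2 above.

```python
# Sketch: splitting a read-only transaction into one read per server,
# all tagged with the same transactional identifier t. Names are illustrative.

serves = {"s1": {"o1"}, "s2": {"o2"}}  # server -> objects it serves

def split_transaction(objects, t_id, d=None):
    reads = {}
    for server, served in serves.items():
        group = tuple(o for o in objects if o in served)
        if group:  # one read(O_s, t, d) per server holding requested objects
            reads[server] = ("read", group, t_id, d)
    return reads

print(split_transaction(["o1", "o2"], "t_id"))
# {'s1': ('read', ('o1',), 't_id', None), 's2': ('read', ('o2',), 't_id', None)}
```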

A write(o, v, d) operation writes value v to a single object o, where o ∈ O, v ∈ V \ {⊥}, and d ∈ M. Note that both the read and write operations take as a parameter a message d. This message is not necessarily bounded (since a message can be of any size) and can contain additional information the client might want to include in its operation. We specifically allow clients to send any message d in order to have a model as general as possible. This way, we do not restrict the possible data stores the model expresses.

Client responses. The response to a read(O_r, t, d) operation with |O_r| = r is res(x) where x ∈ O^r × M. Specifically, res(x) = (O_r, m) where m ∈ M. In other words, the response to a read reading r objects is a pair consisting of the tuple that contains the r to-be-read objects and the message m containing the values for the r objects. Naturally, a response to a single-object read(o, t, d) is res(x) where x = (o, m) and m ∈ M. Note that a message m can contain multiple values (i.e., versions) for a specific object. We present later how to use dec to restrict the possible values a client can retrieve from a message.

Similarly to a read operation, we consider that a response to a write operation is res(d) where d ∈ M. Note that we can get a response res(d) with d ∈ M only in response to a write operation. A client can issue a transaction that consists of multiple read operations. The responses from the read operations are used to extract the values the transaction reads, using ρ.

Client events. We describe the set of all messages the client can send or receive. The set of messages the client can send is M_s = {m : m = read(O_r, t_id, d) and O_r ∈ O^{≤l}, t_id ∈ T, d ∈ M} ∪ {m : m = write(o, v, d) and o ∈ O, v ∈ V \ {⊥}, d ∈ M}. The set of messages the client can receive is M_r = {m : m = res(x) and x ∈ O^{≤l} × M} ∪ {m : m = res(d) and d ∈ M}. The set of possible events a client c ∈ C can take is c.E = {send(m_s, p) : m_s ∈ M_s, p ∈ S} ∪ {receive(m_r, p) : m_r ∈ M_r, p ∈ S}. Finally, note that clients cannot communicate with each other but only with servers, a natural assumption [9,86,87]. It might seem that a client can issue a read or a write operation for an object o to a server s that does not serve o. We restrict these cases later, when we define what a well-formed execution is.

Execution. We say that an event e is enabled in state σ if ∆(σ, e) is defined. An execution is a (possibly infinite) sequence of events occurring at the servers and the clients. A sequence of events occurring at a process p (i.e., p is either a server or a client) is well-formed if there is a sequence of states σ_1, σ_2, ... such that σ_i = p.∆(σ_{i−1}, e_i) for all 2 ≤ i ≤ (the length of the sequence). An execution has correct issues of operations if every read(O_r, t, d) with O_r = (o_1, ..., o_ℓ) that a client issues is destined to a server s where ∀ i, 1 ≤ i ≤ ℓ, o_i ∈ s.obj, and every write(o, v, d) operation is destined to a server s where o ∈ s.obj. We assume that clients have some initial knowledge of which server contains which objects, a reasonable assumption in practice since such information could be stored in the initial state of each client.

We say that an event e is a client write request if e corresponds to the send event of a client for a write operation. We say that an event e is a client read request if e corresponds to the send event of a client for a read operation. For example, event send(read(O_r, t, d), s) taken by some client is a client read request, while event send(write(o, v, d), s) is a client write request. We say that an event e is a client request if e is a client read or write request. Similarly, we say that an event e is a client read response if e corresponds to the receipt event of a client with res(x) and x ∉ M (i.e., e = receive(res(O_r, m), c)). We call an event e a client write response if e corresponds to the receipt event of a client with a res(d) event where d ∈ M. We say that an event e is a client response if e is a client read or write response. For brevity, we also use the notation e = read(O_s, t, d), e = write(o, v, d), or e = res(x), when it is clear from the context whether event e is being sent or received by a client or by a server.

For a given client request e, we define obj(e) to be the tuple of objects e is operating on. For example, if e is a read((o_5, o_8), t, d), then obj(e) = (o_5, o_8). Similarly, for a client read request e that is associated with a transaction, tx(e) provides the transactional identifier associated with e. Again, if an event e is taken by a client, we denote with cl(e) ∈ C the client that took e. For every process p ∈ C ∪ S, given an execution α, we define the process execution α|_p to be the subsequence of α that contains all the events of α taken by process p. Given an execution α, we define the read execution α|_read to be the subsequence of α containing only client read requests and client read responses. Similarly, we define execution α|_write to be the subsequence of α containing only client write requests and client write responses.

Valid responses. An execution α has no-thin-air responses if, for every client c ∈ C and every client response event e′ = res(x) in α|_c, there is a client request event e that precedes e′ in α|_c such that e is either a write event if x ∈ M, or e is a read(O_s, t, d) event if x = (O_s, m) with O_s ∈ O^{≤l} and m ∈ M. As the name suggests, no-thin-air responses captures the notion that client responses are not created out of thin air (i.e., a client request should trigger the response).

An execution α has written-values responses if, for every client c ∈ C and every client read response event e′ = res(x) with x = (O_s, m), for every v ∈ dec(m) there is an object o ∈ O_s such that a client write request e = write(o, v, d) appears before e′ in α. In other words, the values a client reads were written by some client at some previous point in time.

An execution α has valid responses if α has no-thin-air and written-values responses.

Sequential clients. Clients can issue reads as part of the same transaction to objects belonging to different servers. For example, a client might issue a transactional read((o_1, o_2), t, d) to a server s_1 and another read read((o_4, o_5), t, d) to a server s_2. Clients are said to be sequential. This means that a client can issue a write or a read operation only if the client has received responses to all its previous requests. Furthermore, a client c can issue a read with a transactional identifier t only if c has received responses to any previous write request, as well as to all transactional reads of any transaction t′ with t′ ≠ t. In other words, a client c can issue parallel read operations for the same transaction, but c has to wait for responses to its previous operations before issuing a write or a new transaction.

Formally, we say that an execution α has sequential clients if for every client c ∈ C, for every client request e ∈ α|_c, where e is not the last event in α|_c, the following holds:

• if e = read(O_r, t, d), then the event e′ that immediately succeeds e in α|_c has tx(e′) = tx(e), or e′ = res(x) with x = (O_r′, m) with m ∈ M (O_r′ is not necessarily equal to O_r);

• if e = write(o, v, d), then the event e′ that immediately succeeds e is res(d) with d ∈ M.

Valid values. Next, we provide auxiliary definitions that help us capture the notion of a bounded-version data store.

Definition 3 (Corresponding event). Given an execution α that has no-thin-air responses, consider a client response e′ = res(x) where e′ ∈ α. Event e′ is received by client c ∈ C in response to client c's request e. We say that event e′ has e as its corresponding event, or that response e′ has e as its corresponding request. Conversely, request e has e′ as its corresponding client response.

Given a client request e and its corresponding client response e′ in an execution α, we denote e's corresponding client response with cor(e). We say that a client request e is completed in an execution α if cor(e) ∈ α. Note that a transaction can be split into many client read requests, and the completion of some of these requests does not imply that the transaction has completed.

We say that a client read response e is associated with a transaction t if e's corresponding read request e′ has tx(e′) = t. A transaction t is completed in an execution α if there is a read request event e ∈ α with tx(e) = t and cl(e) = c, and there is a read request event e′ that succeeds e in α|_c such that tx(e′) ≠ t. Given an execution α, we define as comp(α) ⊆ T the set of all completed transactions in α.

Definition 4 (Last state of a transaction). Consider an execution α and a transaction t ∈ comp(α) issued by a client c. We denote with σ_last(t, α) the last state of client c that was part of transaction t; σ_last(t, α) corresponds to the state immediately after the last event associated with t was taken by c in α.

The following definition helps us capture what it means for a transaction in an execution to read correct values. For this, we need to know which values a transaction reads.

Definition 5 (Values of a transaction). Given an execution α, we say a transaction t ∈ comp(α) reads values ρ(σ_last(t, α), t).

Consider an execution α and a client c ∈ C; we define as msg(α, c) the set of messages contained in all the client responses in (α|_read)|_c. The definition below captures the notion that a transaction by a client can only read the initial value (⊥) or values that were at some point received by a server.

Definition 6 (Valid values). Given an execution α and a transaction t ∈ comp(α), we say that t reads valid values if for every v ∈ ρ(σ_last(t, α), t), there is an m ∈ msg(t, α) such that v ∈ dec(m).

Well-formed execution. An execution α has distinct values if for every two write operations write(o, v, d) and write(o′, v′, d′), where o, o′ ∈ O, v, v′ ∈ V, and d, d′ ∈ M, it is the case that v ≠ v′. We can achieve this in practice by having a client append its client identifier and a monotonically increasing counter to the value it intends to write. We say that an execution has no-transaction reuse when each client c uses different transactional identifiers for each of c's transactions, as well as transactional identifiers different from those of other clients.
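A small sketch of the trick mentioned above for obtaining distinct values in practice: each client tags the payload it writes with its own identifier and a monotonically increasing counter (the class and method names are ours).

```python
# Sketch: enforcing distinct values by appending (client id, counter) to each
# written payload, as suggested above. Names are illustrative.

class TaggingClient:
    def __init__(self, client_id):
        self.client_id = client_id
        self.counter = 0

    def tag(self, payload):
        self.counter += 1
        # No two write operations (by any clients) produce the same tagged value.
        return (payload, self.client_id, self.counter)

c1, c2 = TaggingClient("c1"), TaggingClient("c2")
print(c1.tag("hello"))  # ('hello', 'c1', 1)
print(c2.tag("hello"))  # ('hello', 'c2', 1) -- distinct from c1's value
```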

An execution α is well-formed if the following conditions hold for α:

• ∀ p ∈ S ∪ C, α|_p is well-formed;

• α has correct issues, valid responses, sequential clients, distinct values, and no-transaction reuse;

• for every t ∈ comp(α), transaction t reads valid values;

• if there is a receive(m, p_j) event e′ taken by some process p_i in α, then there is an event e that precedes e′ in α with e = send(m, p_i) taken by process p_j;

• for a specific m ∈ M, if there are z identical events send(m, p_i) taken by process p_j in α, then there are at most z receive(m, p_j) events taken by process p_i in α.

The last two conditions state that a message is not received out of thin air and that there is no message duplication. Both conditions can be implemented in practice with common techniques, such as the use of timestamps [33]. In this chapter, and unless stated otherwise, we consider only well-formed executions. When we talk about an implementation in our model, we refer to the state machines of all the servers (i.e., function ∆) and all the clients, as well as the sets S, C, O, V, M, and function dec. We denote an implementation with I and write α ∈ I to denote that execution α can be generated by implementation I. Note that if a data store I can generate an execution α′, then I can also generate any execution α where α is a contiguous prefix of α′. Formally, a data store is:

Definition 7 (Data store). A data store is an implementation I such that every execution α ∈ I is a well-formed execution.

Bounded-version data store. In response to a client’s read operation to an object o, a server can potentially send an unbounded number of values that were written to o back to the client.


In this work, we prove that non-blocking and one round-trip transactions are impossible when a server can only send a bounded number of values to a client. Therefore, we need to define what it means for a data store to be bounded-version. Instead of restricting the number of values a server can send to a client, we allow a server to send an unbounded number of values, and then use (among others) the dec function to restrict the number of values a client can utilize.

Definition 8 (k-version data store). We say that a data store I is k-version if for every m ∈ M, |I.dec(m)| ≤ k.

Although messages are finite, they are unbounded in size; hence the above definition allows a server to send back an arbitrarily long message m ∈ M to the client, and hence an unbounded number of values. This allows a client to cache old values and use them in future transactions. However, in combination with valid values (Definition 6), the client can only extract a bounded number of values. Note that even if a client uses a cache to store retrieved values, these values cannot be used by the client unless they belong to a decoding of an already received message.

If for a data store I there is a k > 0, k ∈ N, such that I is a k-version data store, then we say that I is a bounded-version data store; otherwise we say that I is an unbounded-version data store. To the best of our knowledge, function dec is the first that formally captures the notion of a k-version data store in an elegant way. Other approaches [44,45,71] are not formal enough and can potentially be circumvented (see Section 4.5).
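The sketch below (hypothetical helper names) shows how dec bounds the values a client can utilize: a server may ship an arbitrarily long message, e.g., with cached old versions, but a k-version dec exposes at most k of them to the client.

```python
# Sketch: a k-version restriction enforced through dec (Definition 8).
# The message may carry arbitrarily many values, but the client can only
# utilize the at most K values that dec extracts. Names are illustrative.

K = 2

def dec(message):
    # This particular dec exposes at most K of the encoded values.
    return set(message["values"][:K])

m = {"values": ["v1", "v2", "v3", "v4"], "note": "arbitrarily long payload"}
usable = dec(m)
assert len(usable) <= K
print(usable)  # at most K values, e.g. {'v1', 'v2'}
```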

Fast reads. In what follows, we define the notion of invisible reads, which refers to the fact that servers do not update their state when they perform a read operation.

Definition 9 (Invisible reads). A data store I has invisible reads if, for every execution α ∈ I, for every read request event received or read response event e sent by some server s ∈ S, σ = s.∆(σ, e).

Definition 9 captures the fact that if the state of the server s is σ before event e, then it remains σ after event e takes place. A data store I that does not provide invisible reads is said to have visible reads. Next, we define non-blocking reads.

We consider a function nc that, given an execution α and an event e ∈ α, returns a subsequence of α. Formally, consider an execution α, a client c, and a client response e′ ∈ α|_c; then nc(α, e′) corresponds to execution α where all the events between the corresponding request e of e′ to a server s and e′ are removed from α, except the events that correspond to server s receiving request e and responding with e′ back to c.


Intuitively, a data store supports non-blocking reads if a server can respond to a read operation without blocking. This means that the server does not have to communicate with other servers in order to respond to the client. Formally:

Definition 10 (Non-blocking reads). We say that a data store I has non-blocking reads if, for every finite execution α ∈ I that ends in a client read response e, nc(α, e) ∈ I.

We now define what it means for a transactional read to take a specific number of rounds. Roughly speaking, an operation takes r rounds if a client receives r client responses (from the same server) for one client read request.

Definition 11 (r-round reads). We say that a data store I has r-round reads if, for every execution α ∈ I and every transaction t ∈ T, every client c performs at most r client read responses for a specific read request associated with transaction t.

For instance, in a data store that has 1-round reads, a client that issues a transaction t communicates with a specific server at most once for transaction t.

Definition 12 (Fast reads). We say that a data store I has fast reads if I is a 1-version data store, and I has non-blocking and 1-round reads.

Definition 12 captures the notion of latency-optimal reads as informally described by Lu et al. [87], since a client can utilize only one value per read object. We relax the definition of fast reads by defining what we call semi-fast reads.

Definition 13 (Semi-fast reads). We say that a data store I has semi-fast reads if I is a bounded-version data store, and I has non-blocking and 1-round reads.

In contrast to fast reads, semi-fast reads allow a server to send more than one value back to a client in response to a read request. In this sense, semi-fast reads are not latency-optimal in the sense of Lu et al. [87]. We prove our impossibility result for semi-fast reads, and hence our impossibility result is stronger (i.e., it also holds for fast reads).

Monotonic writes. We consider data stores that provide the client-centric consistency model [119] of monotonic writes [116].

Given an execution α and a client c ∈ C, we define with ord_w(α, c) the set of pairs (e, e′) such that e and e′ are in (α|_c)|_write and e precedes e′ in α (and hence in α|_c). We define with ord_w^+(α, c) the transitive closure of ord_w(α, c).

Roughly speaking, a data store provides monotonic writes consistency if the write operations performed by a specific client in some specific order are seen by any other client in that order.


Note that monotonic writes does not specify anything regarding the order of write operations between different clients. We use the notation t ∈ α to denote that there is an event e ∈ α such that tx(e) = t.

Definition 14 (Monotonic writes). Consider an execution α and every transaction t ∈ comp(α) with values ρ(σ_last(t, α), t) = (v_1, v_2, ..., v_r). Values (v_1, v_2, ..., v_r) are written by client write requests that correspond to events e_w1, e_w2, ..., e_wr. We say that α provides monotonic writes if for any two client write requests e_wi and e_wj with c = cl(e_wi) = cl(e_wj), there is no client write request e_w with cl(e_w) = c such that (e_wi, e_w) ∈ ord_w^+(α, c) and (e_w, e_wj) ∈ ord_w^+(α, c) and obj(e_wi) = obj(e_w).

Figure 4.1 depicts the intuition behind Definition 14. It should not be possible for a transaction reading, among others, objects o and o′ to read values v_i and v_j, since the same client wrote value v to object o after writing value v_i to o.

client c:  e_wi writes (o = v_i)  then  e_w writes (o = v)  then  e_wj writes (o′ = v_j)

Figure 4.1 – A transaction t that reads, among others, objects o and o′ cannot read values v_i and v_j due to the existence of e_w. However, transaction t can read values v and v_j for objects o and o′ respectively. Note that all three writes e_wi, e_w, and e_wj are performed by the same client c.

Definition 15 (Monotonic writes). We say that a data store I provides monotonic writes if every execution α ∈ I has monotonic writes.

Note that current definitions of monotonic writes [29,116] refer to single-object operations (i.e., no transactions). Bailis et al. [17] provide an intuitive transactional description, but it lacks adequate formality. In this sense, this is the first formal definition of monotonic writes in a transactional setting.
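Operationally, the condition of Definition 14 can be checked as in the sketch below (our own code and naming): a transaction's reads violate monotonic writes if, for two returned values written by the same client, that client issued another write to the first value's object in between.

```python
# Sketch of the monotonic-writes condition of Definition 14. write_order[c]
# is client c's ordered list of (object, value) writes; reads maps each object
# read by a transaction to the returned value. Names are illustrative.

def violates_monotonic_writes(write_order, reads):
    for client, writes in write_order.items():
        pos = {wv: i for i, wv in enumerate(writes)}
        returned = [(o, v) for o, v in reads.items() if (o, v) in pos]
        for (o_i, v_i) in returned:
            for (o_j, v_j) in returned:
                i, j = pos[(o_i, v_i)], pos[(o_j, v_j)]
                # A later write by the same client to o_i, ordered before
                # the write of (o_j, v_j), makes the returned v_i stale.
                if any(writes[k][0] == o_i for k in range(i + 1, j)):
                    return True
    return False

# The scenario of Figure 4.1: client c writes (o, vi), then (o, v), then (o', vj).
order = {"c": [("o", "vi"), ("o", "v"), ("o'", "vj")]}
print(violates_monotonic_writes(order, {"o": "vi", "o'": "vj"}))  # True
print(violates_monotonic_writes(order, {"o": "v", "o'": "vj"}))   # False
```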

Minimal progress. A data store implementation where every read operation performed by a client reads back the initial value ⊥ of the object is a data store that provides monotonic writes. However, such a data store is of no practical interest. We need to incorporate some notion of liveness in the data store to make it useful. For this, we introduce the notion of minimal progress, which roughly states that if only one client writes a value v to an object o, this value is visible after some point in time, unless the same client writes a new value or some other client writes o. We start by defining the notion of being eventually responsive. The intuition behind this definition is that if a client requests a read or write operation, then the client eventually gets a response.

Definition 16 (Eventually Responsive). A data store I is eventually responsive if, for every finite execution α ∈ I with last event e that is a client request, every infinite extension α′ ∈ I of α has a client response e′ such that e′'s corresponding request is e.

Another way to think of the above definition is that, given a finite execution α that ends with a client request, there is an extension of α that contains the response to the client.

We say that a transaction t appears for the first time after an event e′ in an execution α if there is no event e ∈ α that appears before e′ in α with tx(e) = t. The following definition captures the idea that, from some point onwards, all transactions keep returning the newer written values. If a single write with value v is performed to an object o, then there is a point after which all transactions that appear for the first time read v or a later-written value for object o.

Definition 17 (Bounded Visible). A data store I is called bounded visible if the following holds for every finite execution α ∈ I with last event e_w that is a client write request to object o. Consider all r > 0 completed client write requests to o before e_w: e_w1, ..., e_wr in α. In other words, cor(e_w1), ..., cor(e_wr) appear before e_w in α. Each of these client write requests writes a value; consider these values to be contained in the set v_old. Then, there is a bound b ≥ 0 such that for every extension α′ ∈ I of α, every transaction that appears for the first time after |α| + b in α′ and that requests o does not read a value that belongs to v_old ∪ {⊥} for o.

Definition 18 (Minimal Progress). A data store I provides minimal progress if I is eventually responsive and bounded visible.

Note that if a data store I provides minimal progress, I cannot use the stable snapshot approach [118]. In the stable snapshot approach, servers totally order write operations and keep track of the most recent stable snapshot. A stable snapshot is a point in the serial order of the write operations for which all updates are known to have been applied to all objects. When reading, a client c indicates the last known stable snapshot, and then c reads from that snapshot and retrieves information about the current last known stable snapshot (which c uses in the next transaction). Thus client c can always make progress, albeit by reading from the past. However, using the stable snapshot approach, when a client c that has been inactive (i.e., not performing any operations) for an arbitrarily long amount of time issues a new transaction, c could potentially read old values and violate bounded visibility.

4.3 Fast Transactions Are Impossible

In this section, we prove that fast transactions are impossible. Specifically, we prove our main result.

Theorem 5. No data store can provide monotonic writes, minimal progress, and invisible semi-fast reads.


Fast reads. For pedagogical purposes, we first present the proof of the following theorem:

Theorem 6. No data store can provide monotonic writes, minimal progress, and invisible fast reads.

For our result, we consider two servers s_1, s_2 ∈ S and two objects o_1, o_2 ∈ O served by servers s_1 and s_2 respectively. Furthermore, we consider three clients c_r, c_h, c_w ∈ C, where client c_h issues a finite number of transactions, each of which reads both objects o_1 and o_2, client c_r issues a single transaction t on both objects o_1 and o_2, and client c_w performs writes. Before we continue, we introduce some auxiliary notation to simplify the proof.

α1: w_o1(v^1_o1) w_o2(v^1_o2) t_1 ... t_l1 w_o1(v^2_o1) w_o2(v^2_o2) t_{l1+1} ... t_l2 req_o1 req_o2 resp_o1(m^1_1) resp_o2(m^1_2)

α2: w_o1(v^1_o1) w_o2(v^1_o2) t_1 ... t_l1 req_o1 req_o2 resp_o1(m^2_1) w_o1(v^2_o1) w_o2(v^2_o2) t_{l1+1} ... t_l2 resp_o2(m^2_2)

Figure 4.2 – Executions α1 and α2 where client c_w writes (in red) objects o_1 and o_2. Client c_h issues a finite number of transactions (in green) between c_w's writes, and c_r performs a transaction that reads both objects o_1 and o_2 (in blue).

A read operation in a data store with fast reads consists of 4 events e_1, e_2, e_3, e_4, in this order. Event e_1 is send(read(o, t_i, m_s), s), event e_2 is receive(read(o, t_i, m_s), c), and since a data store with fast reads is non-blocking, the server s can respond right away to the client with e_3 = send(res((o, m_r)), c); finally the client c receives the message e_4 = receive(res((o, m_r)), s). In what follows, we denote event e_1 as req_o and the remaining three events as resp_o(m_r), meaning that for object o the client got back response m_r.

Similarly, we denote with w_o(v) the sequence of events needed to write value v to object o. In contrast to a read operation, a write operation might span an arbitrary, yet finite, number of events: arbitrary, since write operations are not necessarily non-blocking and hence a server might communicate with other servers before completing the write; finite, since we consider a data store with eventual responsiveness (i.e., the write should eventually complete).

For the proof of Theorem 6, we assume by way of contradiction that a data store exists that provides monotonic writes, minimal progress, and invisible fast reads. For this, we consider the execution

α = w_o1(v^1_o1), w_o2(v^1_o2), t_1, ..., t_l1, w_o1(v^2_o1), w_o2(v^2_o2), t_{l1+1}, ..., t_l2

where the first two writes are performed by client c_w and transactions t_1, ..., t_l1 are performed by client c_h until values v^1_o1 and v^1_o2 become visible. Then, client c_w performs two more writes to objects o_1 and o_2, and afterwards client c_h takes steps until values v^2_o1 and v^2_o2 are visible.


Based on execution α, we construct executions α1 and α2:

α1 = w_o1(v^1_o1), w_o2(v^1_o2), t_1, ..., t_l1, w_o1(v^2_o1), w_o2(v^2_o2), t_{l1+1}, ..., t_l2, req_o1, req_o2, resp_o1(m^1_1), resp_o2(m^1_2)

α2 = w_o1(v^1_o1), w_o2(v^1_o2), t_1, ..., t_l1, req_o1, req_o2, resp_o1(m^2_1), w_o1(v^2_o1), w_o2(v^2_o2), t_{l1+1}, ..., t_l2, resp_o2(m^2_2)

Note the differences between executions α1 and α2 (see Figure 4.2). In execution α1, client c_r performs both read operations on objects o_1 and o_2 when the values v^2_o1 and v^2_o2 are visible. In execution α2, c_r's read requests are sent after values v^1_o1 and v^1_o2 are visible. However, the request to object o_1 is received by s_1 before value v^2_o2 is visible, and the request to object o_2 is received by s_2 after value v^2_o2 is visible.

In execution α1, client c_r receives only two messages, m^1_1 and m^1_2. Message m^1_1 cannot contain a value for object o_2, since o_2 is not served by server s_1 and s_1 can only contain values for object o_1. Therefore, for c_r to read value v^2_o2 for object o_2, it should be the case that message m^1_2 in execution α1 contains v^2_o2. Assume by way of contradiction that m^1_2 does not contain v^2_o2; this means that there is no way for client c_r to have read value v^2_o2, therefore v^2_o2 ∉ ρ(σ_last(t, α1), t), hence the value is not visible yet. A contradiction, therefore v^2_o2 ∈ dec(m^1_2).

In execution α2, note that m^2_1 cannot contain v^2_o1 (due to the written-values responses property), since at this point in execution α2, v^2_o1 has not been written yet. We argue that message m^2_2 in execution α2 should contain value v^1_o2. Assume it does not. Then, the only other possible values m^2_2 can contain are ⊥ and v^2_o2 (recall that we consider fast reads, hence 1-version reads). However, the pair (v^1_o1, ⊥) would not satisfy minimal progress, since value v^1_o2 is visible when client c_r performed its transaction. Furthermore, the pair (v^1_o1, v^2_o2) violates monotonic writes, since the same client wrote v^2_o1 to object o_1 before writing v^2_o2 to object o_2. A contradiction in both cases. Therefore v^1_o2 ∈ dec(m^2_2).

Executions α1 and α2 are indistinguishable to server s_2. Indeed, in both executions α1 and α2, server s_2 receives, performs, and responds to client c_r's request (resp_o2) to read object o_2 after values v^2_o1 and v^2_o2 have been written and are visible. Since all the transactions performed by client c_h are invisible, server s_2 cannot distinguish between the executions and responds back to client c_r in the same way in both executions; therefore messages m^1_2 and m^2_2 are equal (m^1_2 = m^2_2 = m_2). Since we consider distinct values, v^1_o2 ≠ v^2_o2, and both are in dec(m_2), it is the case that |dec(m_2)| = 2 > 1, a contradiction. Hence, Theorem 6 holds.

Semi-fast reads. We prove Theorem 5 by contradiction. We assume that a data store I exists that provides monotonic writes, minimal progress, and invisible semi-fast reads. Specifically, we assume that data store I is a k-version data store. We prove that there is an execution where server s_2 sends a message back to a client that contains more than k values in order for I to satisfy monotonic writes. To prove our impossibility result, we construct k+1 executions α1, α2, ..., α_{k+1} and show that all these executions are indistinguishable to server s_2. We show that in execution α_i, server s_2 needs to respond with a message that contains value v^i_o2.

Before presenting the construction of executions α_i (1 ≤ i ≤ k+1), we first construct the execution α ∈ I on which all executions α_i are based. We construct execution α as follows. First, client c_w writes value v^1_o1 to object o_1, waits for the response of server s_1 acknowledging the write, and then writes value v^1_o2 to object o_2 and waits for server s_2 to acknowledge the write. Servers s_1 and s_2 acknowledge the writes since I provides minimal progress, and hence is eventually responsive. Afterwards, client c_h issues transactions until values (v^1_o1, v^1_o2) are visible, again due to minimal progress. After the values are visible, we allow c_w to write value v^2_o1 to object o_1 and afterwards value v^2_o2 to object o_2. Then, we allow client c_h to issue transactions until values (v^2_o1, v^2_o2) are visible. We keep repeating the same procedure until values (v^{k+2}_o1, v^{k+2}_o2) are visible. Namely, c_w writes both objects o_1 and o_2 and then c_h issues transactions until the latest values written by c_w become visible. Execution α (see top of Figure 4.3) is therefore:

α = w_o1(v^1_o1), w_o2(v^1_o2), t_1, ..., t_l1, w_o1(v^2_o1), w_o2(v^2_o2), t_{l1+1}, ..., t_l2, w_o1(v^3_o1), w_o2(v^3_o2), ..., w_o1(v^{k+2}_o1), w_o2(v^{k+2}_o2), t_{l_{k+1}+1}, ..., t_{l_{k+2}}

α: w_o1(v^1_o1) w_o2(v^1_o2) t_1 ... t_l1 w_o1(v^2_o1) w_o2(v^2_o2) ... w_o1(v^{k+2}_o1) w_o2(v^{k+2}_o2) t_{l_{k+1}+1} ... t_{l_{k+2}}

α_i: w_o1(v^1_o1) w_o2(v^1_o2) ... w_o1(v^i_o1) w_o2(v^i_o2) req_o1 req_o2 resp_o1(m^i_1) ... w_o1(v^{k+2}_o1) w_o2(v^{k+2}_o2) resp_o2(m^i_2)

Figure 4.3 – At the top, we depict execution α where client c_w alternates between writing (in red) objects o_1 and o_2. Client c_h issues a finite number of transactions (in green) after client c_w's write of value v^j_o2 until values (v^j_o1, v^j_o2) are visible. At the bottom, we depict execution α_i. Due to space constraints, we depict the transactions of c_h until values (v^i_o1, v^i_o2) and (v^{k+2}_o1, v^{k+2}_o2) are visible with shortened green boxes. The events of client c_r's transaction that reads objects o_1 and o_2 are depicted in blue.

Before we continue with each individual execution α_i, we prove the following lemma for transactions in execution α, where we consider that v^0_o1 = v^0_o2 = ⊥.

Lemma 21. No transaction introduced in execution α that appears for the first time after transaction t_li (i > 0) completes can read values v^j_o1 and v^j_o2 with 0 ≤ j < i for objects o_1 and o_2 respectively.

Proof. By construction of execution α, we know that after transaction t_li, values v^i_o1 and v^i_o2 are visible. By the definition of minimal progress, no transaction that appears for the first time after transaction t_li has completed can return an older value for objects o_1 and o_2. Hence no transaction after t_li can read values v^j_o1 and v^j_o2 with 0 ≤ j < i.

Lemma 22. No transaction introduced in α that reads objects o_1 and o_2 can read values (v^a_o1, v^b_o2) with a < b.

Proof. Assume by way of contradiction that we can introduce a transaction t in α that reads (v^a_o1, v^b_o2) with a < b. If a < b, then since client c_w issues writes in the order v^a_o1 → ··· → v^b_o1 → v^b_o2, a transaction t that reads (v^a_o1, v^b_o2) with a < b violates monotonic writes, a contradiction, since t should have read value v^b_o1 for o_1.3

We construct execution α_i for 1 ≤ i ≤ k+1 as follows (with l_0 = 0):

α_i = w_o1(v^1_o1), w_o2(v^1_o2), ..., w_o1(v^i_o1), w_o2(v^i_o2), t_{l_{i−1}+1}, ..., t_{l_i}, req_o1, req_o2, resp_o1(m^i_1), w_o1(v^{i+1}_o1), w_o2(v^{i+1}_o2), ..., w_o1(v^{k+2}_o1), w_o2(v^{k+2}_o2), t_{l_{k+1}+1}, ..., t_{l_{k+2}}, resp_o2(m^i_2)

Lemma 23. Value v^i_o2 ∈ dec(m^i_2) in execution α_i.

Proof. Assume by way of contradiction that v^i_o2 ∉ dec(m^i_2) in execution α_i. Due to Lemma 21, transaction t can only read values (v^a_o1, v^b_o2) with a ≥ i. Since the read to o_1 is performed before the write of value v^{i+1}_o1 takes place, this means that t should read value v^i_o1 for object o_1. Therefore, the only allowed value for object o_2 is v^i_o2, since returning v^j_o2 with j < i implies that v^i_o2 is not visible yet (Lemma 21), and returning v^j_o2 with j > i violates monotonic writes (Lemma 22). In other words, if v^i_o2 ∉ dec(m^i_2), transaction t has no correct values to return. A contradiction.

Due to Lemma 23, we know that v^i_o2 ∈ dec(m^i_2) for every i, 1 ≤ i ≤ k+1. However, all executions α_i are indistinguishable to server s_2, and therefore m^i_2 = m_2 for every 1 ≤ i ≤ k+1. Due to the distinct values property, all values v^i_o2 ≠ v^j_o2 for i ≠ j, and hence v^i_o2 ∈ dec(m_2) for every i implies that |dec(m_2)| > k, a contradiction.

Circumventing the impossibility. In a synchronous system, where the duration of a transaction cannot span more than one synchronous round, Theorem 5 collapses. This means that in a synchronous system, we can have a data store with fast invisible reads that also provides minimal progress and monotonic writes. In such a scenario, every server stores the latest value written to an object. In each round, in case of a read, the server returns this latest value, and in case of a write it overwrites the currently stored value. However, highly available data stores [4,70,74] are deployed in the wide area (i.e., not a synchronous setting), since synchronous settings are not realistic because, among other reasons, they preclude the possibility of partitions. Additionally, in a synchronous system we need to choose the duration of rounds in a conservative manner, which contradicts the idea of having transactions as fast as possible.

3 With v_a → v_b we denote that a client first performs the write of value v_a, and afterwards the write of value v_b.

Fast transactions and fault-tolerance. At first sight, the ramifications of Theorem 5 are inconspicuous. One might think that whether transactions are visible or not has little impact on the latency of transactions. This might be the case if we assume that the client communicates with a distant server. Nevertheless, in practice, we have to consider the possibility that if a visible transaction updates the state of a server s, s might fail (i.e., crash). In case server s fails, we might lose all the information that was written by s during the update. Highly available data stores have to implement fail-over; therefore, in case server s fails, another server s′ should take over in place of s. However, server s′ does not contain the data that was written during the update. The only way for server s′ to know about the updated state of s is if s replicated the update to s′ before failing. In such a scenario, the transaction is blocking, since it has to wait for the underlying writes to be replicated before responding to the client. To summarize, a data store with fast transactions cannot be fault-tolerant, and conversely, a fault-tolerant data store cannot provide fast transactions.

4.4 Unbounded-Version Data Store

Theorem 5 states that we cannot devise a bounded-version data store with non-blocking and 1-round reads. This raises the question of whether we can achieve non-blocking and 1-round reads in an unbounded-version data store. With Theorem 7 we answer this question in the affirmative.

Theorem 7. There is an unbounded-version data store that provides monotonic writes, minimal progress, non-blocking and 1-round reads.

To prove Theorem 7, we devise ubvStore, a new unbounded-version data store with non-blocking and 1-round reads. ubvStore is unbounded-version since a server can send an unbounded number of values to a client (i.e., ∄k : ∀m ∈ M, |dec(m)| < k). Note that we do not claim that ubvStore is a practical algorithm; we use ubvStore to generalize our impossibility result (Theorem 5).

Overview. We base ubvStore on three ideas: (i) servers assign version numbers to values, (ii) servers and clients store locally the write history (i.e., the sequence of writes performed by a client) of each client and exchange write histories whenever they communicate, and (iii) a read-only transaction by a client c is performed on c's local knowledge of the write histories. We describe these three ideas in detail below.

An object o is served by a single server s (o ∈ s.obj). Therefore, server s can order the writes issued by clients to an object o by assigning incrementally increasing version numbers to the values written to o. We say that a value v written to an object o has version number vn if v was the vn-th value written to object o. For example, if two writes to object o with values v_1 and v_2 are performed by two different clients c_{w_1} and c_{w_2} respectively, server s can assign version number 1 to value v_1 and version number 2 to value v_2. We consider that the initial value ⊥ of each object has version number 0.

Both servers and clients in ubvStore contain local information about the write history of each client. The write history of a client c corresponds to the ordered list of the write operations client c has performed. Specifically, the write history of a client is a list of triples, where each triple is of the format (object, value, version number). We denote with hist_p[c] the write history of client c as it is known to process p. Note that process p might have stale information on the write history of client c, and hence hist_p[c] is a subset of client c's actual write history. For example, in a system with one server s and two clients c_1 and c_2, server s might store locally hist_s[c_1] = (o_1, v_1, 3)·(o_2, v_3, 1) and hist_s[c_2] = ε. This means that s is aware that client c_1 first performed the write of value v_1 to object o_1 and then the write of value v_3 to object o_2, and since hist_s[c_2] = ε, s has no information about the write history of client c_2 (potentially, client c_2 has not performed any writes). Similarly, client c_1 might locally contain hist_{c_1}[c_2] = (o_5, v_9, 6).

Whenever a client c performs a write operation to an object served by a server s, c sends all its write histories (hist_c) to server s. Similarly, when a server s responds to the read request of a client c, s sends all its write histories (hist_s) to client c. This way, whenever servers and clients communicate, they can potentially extend their local write histories. Note that a server stores only values (i.e., versions) of objects that it serves. Nevertheless, a server s can store the write histories of all the clients, and which objects these clients wrote, even though s might not serve these objects. For example, a server s with o_1 ∉ s.obj could contain hist_s[c_3] = (o_1, _, 9), where _ indicates that s does not "know", and hence does not store, the value of object o_1, since s does not serve o_1.
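As a rough illustration of this bookkeeping (a sketch under our own naming assumptions, e.g., ServerState and record_write; not ubvStore's pseudocode), a write history can be kept as a per-client list of (object, value, version) triples, with versions assigned by the serving server and None standing in for the "_" placeholder:

# Sketch of ubvStore's bookkeeping: per-client write histories made of
# (object, value, version) triples, with versions assigned per object by the
# server that serves it. Illustrative names only.
from collections import defaultdict

UNKNOWN = None  # plays the role of the "_" placeholder (value not stored)

class ServerState:
    def __init__(self, served_objects):
        self.served = set(served_objects)
        self.next_version = defaultdict(int)   # object -> last assigned version
        self.hist = defaultdict(list)          # client -> [(object, value, version)]

    def record_write(self, client, obj, value):
        assert obj in self.served              # a server only stores values of objects it serves
        self.next_version[obj] += 1            # versions start at 1; version 0 is the initial value ⊥
        version = self.next_version[obj]
        self.hist[client].append((obj, value, version))
        return version

if __name__ == "__main__":
    s = ServerState(served_objects={"o1", "o2"})
    print(s.record_write("c1", "o1", "v1"))    # 1
    print(s.record_write("c2", "o1", "v2"))    # 2 (second write to o1)
    print(dict(s.hist))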

In ubvStore, transactions are 1-round and non-blocking. This means that when a client c performs a transaction, c sends messages to the servers serving the transaction's objects and the servers respond immediately to c. Then, client c retrieves the responses from the servers and potentially extends hist_c. Afterwards, c attempts to read the latest (judging by the version number) value of each object it wants to read without violating monotonic writes. If it can read the latest values without violating monotonic writes, then client c reads these latest values. If not, client c attempts to read potentially earlier values of objects until it finds a tuple of values that satisfies monotonic writes.

To give an example of how ubvStore performs a transaction, consider execution α_2 in Figure 4.2. In execution α_2, client c_r sends messages to both s_1 and s_2 as part of transaction t. Server s_1 receives the message after value v^1_{o_1} is visible and before value v^2_{o_1} is written. Therefore, s_1 sends value v^1_{o_1} to c_r. Server s_2 receives c_r's message after value v^2_{o_2} is visible and hence s_2 sends value v^2_{o_2} to c_r. The issue with execution α_2 is that the values (v^1_{o_1}, v^2_{o_2}) cannot be read for t, since these values violate monotonic writes. However, since we consider an unbounded-version data store, server s_2 can also send value v^1_{o_2} to c_r. In ubvStore, when client c_r receives the message from server s_1, c_r has hist_{c_r}[c_{w_1}] = (o_1, v^1_{o_1}, 1). When c_r receives the message from s_2, c_r knows that hist_{c_r}[c_{w_1}] = (o_1, v^1_{o_1}, 1)·(o_2, v^1_{o_2}, 1)·(o_1, _, 2)·(o_2, v^2_{o_2}, 2). Knowing hist_{c_r}, client c_r can safely read the values (v^1_{o_1}, v^1_{o_2}) for transaction t.

ubvStore in detail. The algorithm of ubvStore is presented in Algorithms 10 and 11. Lines 1-106 correspond to the code for the client and lines 107-118 correspond to the server code. Note, however, that the server code uses the extend and subsequently the mergeClientHist procedure. ubvStore uses both procedures, as well as event-based (i.e., trigger and upon event) techniques [33]. A trigger event corresponds to the transmission of a message, while an upon event corresponds to the receipt of a message. A client cl has the following local variables (lines 2-4): hist_cl, responses, and versionNumber. Client cl stores the write histories in hist_cl, which is an array of lists, where each list (except hist_cl[c_init], see below) is initially empty (ε). For a list list, we denote with |list| its length. We consider that there is some initial client c_init ∉ C that wrote value ⊥ with version number 0 to all the objects. This way, if a client c performs a transaction on an object o that has not been written, the initial value ⊥ is read.


Algorithm 10 ubvStore for a client cl
1: ▷ Client's cl local variables
2: hist_cl[c] ← ε  ∀c ∈ C
3: hist_cl[c_init] ← (o_1, ⊥, 0)·(o_2, ⊥, 0)· ... ·(o_k, ⊥, 0)
4: responses ← ε, versionNumber ← −1
5:
6: ▷ client cl to read objects in Os
7: procedure read(Os, t, d)
8:   for server s in {server that serves an object o ∈ Os} do
9:     ▷ ask server s only for the objects in Os that s serves
10:    trigger 〈s, READ | Os ∩ objectsServedBy(s), t, d〉
11:  wait until |responses| = |servers|
12:  hist_cl ← extend(hist_cl, responses)
13:
14:  verToRead[o] ← 0  ∀o ∈ Os
15:  for object o in Os do
16:    verToRead[o] ← maxVerNumber(hist_cl, o)
17:  while true do
18:    result ← getValues(hist_cl, verToRead)
19:    (safe, prObj) ← isSafe(hist_cl, result)
20:    if safe then
21:      return result
22:    else
23:      verToRead[prObj] ← verToRead[prObj] − 1
24:  responses ← ε
25:
26: ▷ client cl to write value v to object o
27: procedure write(o, v, d)
28:   s ← server that serves o
29:   histToSend ← clean(hist_cl, s)
30:   trigger 〈s, WRITE | o, v, histToSend〉
31:   wait until versionNumber ≠ −1
32:   hist_cl[cl] ← hist_cl[cl]·(o, v, versionNumber)
33:   versionNumber ← −1
34:   return ack
35:
36: upon event 〈cl, READRESPONSE | r〉 do
37:   responses ← responses·r
38:
39: upon event 〈cl, WRITERESPONSE | vn〉 do
40:   versionNumber ← vn


41: procedure isSafe(hist_cl, result)
42:   if ∃o : o.val = _ ∧ o ∈ result then
43:     return (false, o)
44:   for client c in C do
45:     ▷ for triple tr ← (val, o, vn), v(tr) corresponds to val and obj(tr) to o
46:     if ∃t1, t2, t3 ∈ hist_cl[c] : v(t1), v(t3) ∈ result then
47:       if t1 → t2 ∧ t2 → t3 ∧ obj(t1) = obj(t2) then
48:         return (false, obj(t3))
49:   return true
50:
51: ▷ returns the extension of hist when utilizing responses
52: procedure extend(hist, responses)
53:   histRes[c] ← ε  ∀c ∈ C
54:   for response r in responses do
55:     for client c in C do
56:       histRes[c] ← mergeClientHist(histRes[c], r[c])
57:       histRes[c] ← mergeClientHist(histRes[c], hist[c])
58:   return histRes
59:

60: procedure getValues(hist_cl, verToRead)
61:   result ← ε
62:   for object o in verToRead do
63:     vn ← verToRead[o]
64:     val ← getVersion(hist_cl, o, vn)
65:     result ← result·(o, val, vn)
66:   return result


67: procedure mergeClientHist(aHist, bHist)
68:   if size(aHist) < size(bHist) then
69:     return mergeClientHist(bHist, aHist)
70:
71:   ▷ it is guaranteed that size(aHist) ≥ size(bHist)
72:   mergedHist ← ε
73:   for triple ← (o, v, vn) in bHist do
74:     if v ≠ _ then
75:       mergedHist ← mergedHist·triple
76:     else
77:       if (o, v′, vn) ∈ aHist with v′ ≠ _ then
78:         mergedHist ← mergedHist·(o, v′, vn)
79:       else
80:         mergedHist ← mergedHist·(o, _, vn)
81:   append remaining elements of aHist to mergedHist

82: ▷ removes values in the history of objects that s does not serve
83: procedure clean(history, server)
84:   histToReturn ← ε
85:   for triple ← (o, v, vn) in history do
86:     if o ∈ objectsServedBy(server) then
87:       histToReturn ← histToReturn·triple
88:     else
89:       histToReturn ← histToReturn·(o, _, vn)
90:   return histToReturn
91:
92: ▷ returns the vn-th version value for object o in history hist
93: procedure getVersion(hist, o, vn)
94:   for client c in C do
95:     for triple ← (o′, v, vn′) in hist[c] do
96:       if vn′ = vn and o′ = o then
97:         return v
98:
99: ▷ returns the greatest version number for object o found in history hist
100: procedure maxVerNumber(hist, o)
101:   maxvn ← 0
102:   for client c in C do
103:     for triple ← (o′, v, vn) in hist[c] do
104:       if maxvn < vn and o′ = o then
105:         maxvn ← vn
106:   return maxvn

Algorithm 11 ubvStore for a server s
107: ▷ Server's s local variables
108: hist_s[c] ← ε  ∀c ∈ C
109: mem[o] ← (⊥, 0)  ∀o served by s
110:
111: upon event 〈s, READ | Os, t, d〉 do
112:   trigger 〈cl, READRESPONSE | hist_s〉
113:
114: upon event 〈s, WRITE | o, v, hist_cl〉 do
115:   mem[o] ← (v, mem[o].ver + 1)
116:   hist_s[cl] ← hist_s[cl]·(o, v, mem[o].ver)
117:   hist_s ← extend(hist_s, hist_cl)
118:   trigger 〈cl, WRITERESPONSE | mem[o].ver〉

Variables responses and versionNumber are used in the read and write procedures in order to inform the client that a message has been received from the server. Specifically, responses contains the messages received from servers during a read transaction, and versionNumber contains the version number the server assigns to the object the client writes.

Client cl performs a transaction by calling the read procedure (lines 7-24), providing as parameters the objects Os, a transactional identifier t, and an additional message d. For every server s that serves an object o ∈ Os, client cl triggers a server READ event (lines 8-10). Note that client cl only "asks" a server s for the objects that s serves, using the objectsServedBy procedure. We assume that objectsServedBy returns the set of all the objects served by s and that this information is stored locally at cl (i.e., no communication with the server is needed). The client then waits until it receives messages from all the servers it sent a message to (Line 11). For this, client cl uses the responses variable, which is initially ε; each time cl receives a response (READRESPONSE) from a server, the response is appended to responses (lines 36-37). When cl has received messages from all the servers it communicated with, |responses| is equal to |servers| (Line 11). After receiving the responses from the servers, client cl calls the extend procedure (Line 12), which extends hist_cl if cl received additional information from some of the servers (e.g., information with which cl can extend a write history hist_cl[c] for some client c). The extend procedure (lines 52-58) takes two write histories and combines them to obtain the maximum possible write histories, using mergeClientHist as a helper procedure (lines 67-81). Afterwards, client cl stores in the verToRead array all the versions that it intends to read in this transaction. Initially, cl intends to get the latest version of each object (lines 15-16), and cl retrieves the latest version number (as known by cl) of each object using the maxVerNumber procedure. The maxVerNumber procedure (lines 100-106) accepts write histories (hist) and an object o and finds the maximum version number of object o by utilizing the information in hist.
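The history-merging idea behind extend and mergeClientHist can be conveyed with the following sketch (illustrative Python with assumed names and representations, not the pseudocode of Algorithm 10): histories are merged entry-wise, always preferring a concrete value over the "_" placeholder (None here), and any unmatched suffix of the longer history is kept.

# Sketch of the merging performed by extend (lines 52-58) and mergeClientHist
# (lines 67-81). A history is a list of (object, value, version) triples and
# None stands for the "_" placeholder. Illustrative only.
UNKNOWN = None

def merge_client_hist(a_hist, b_hist):
    if len(a_hist) < len(b_hist):              # make a_hist the longer history
        a_hist, b_hist = b_hist, a_hist
    merged = []
    for (obj, value, version), (_o, other_value, _v) in zip(a_hist, b_hist):
        merged.append((obj, value if value is not UNKNOWN else other_value, version))
    merged.extend(a_hist[len(b_hist):])        # remaining elements of the longer history
    return merged

def extend(local_hist, responses):
    # local_hist and every response map a client to its (known) write history
    clients = set(local_hist) | {c for r in responses for c in r}
    result = {}
    for c in clients:
        merged = local_hist.get(c, [])
        for r in responses:
            merged = merge_client_hist(merged, r.get(c, []))
        result[c] = merged
    return result

if __name__ == "__main__":
    from_s1 = {"cw1": [("o1", "v1_o1", 1)]}
    from_s2 = {"cw1": [("o1", None, 1), ("o2", "v1_o2", 1), ("o1", None, 2), ("o2", "v2_o2", 2)]}
    print(extend({}, [from_s1, from_s2])["cw1"])
    # [('o1', 'v1_o1', 1), ('o2', 'v1_o2', 1), ('o1', None, 2), ('o2', 'v2_o2', 2)]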


Client cl then enters the loop (lines 17-23), where cl attempts to read the latest values that satisfy monotonic writes, until it succeeds. It does so by calling the getValues procedure (lines 60-66). The getValues procedure loops through the objects to be read and finds the desired version for each object using the getVersion procedure. The getVersion (lines 93-97) procedure loops through all the write histories (similar to the maxVerNumber procedure) to find the value of object o with the specific version number.

Then, the client verifies that the read values are safe (i.e., satisfy monotonic writes) by calling isSafe (Line 19). The isSafe procedure (lines 41-49) takes as input the write histories (hist_cl) and the read values (result) and examines whether the values read by cl violate monotonic writes (Line 47). Note that in isSafe client cl might get back _ as the value of an object (Line 42), meaning that cl knows that a value was written by some client but is not aware of this value. In such a case, client cl attempts to read the immediately preceding value of that object in the next iteration, by decrementing the version that is about to be read for that object (Line 23) and repeating the loop. At the end, the client sets responses back to ε (Line 24). Specifically, isSafe checks whether there exist three triples (i.e., three writes) t1, t2, and t3 performed by the same client, where the first write takes place before the second write (t1 → t2) and the second write before the third write (t2 → t3). Furthermore, isSafe checks whether the two values v(t1) and v(t3) have been read by the client, and whether value v(t1) of obj(t1) has been overwritten (by the write t2 to the same object) before the write of value v(t3). If this is the case, the read values violate monotonic writes and isSafe returns obj(t3) (Line 48). Note that the monotonic writes property is violated because the client attempted to read this version of obj(t3), and therefore we call obj(t3) the problematic object. If the read values are safe (Line 49), they are returned (Line 21). Otherwise, the version needed for object prObj is decremented by one (Line 23) and the loop repeats.
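The check itself can be sketched as follows (assumed Python names and representations, not the thesis code): a set of read values is unsafe if some client's history shows that a read object was overwritten before another read value was written. On the Figure 4.2 history, the pair (v^1_{o_1}, v^2_{o_2}) is rejected and (v^1_{o_1}, v^1_{o_2}) is accepted.

# Sketch of the monotonic-writes check behind isSafe (lines 41-49).
# A history entry is (object, value, version); result maps each read object to
# the (value, version) the client intends to return. Illustrative only.
UNKNOWN = None

def is_safe(hist, result):
    for obj, (value, _version) in result.items():
        if value is UNKNOWN:
            return False, obj                      # Line 42: value written but not known locally
    for client_hist in hist.values():
        for i, t1 in enumerate(client_hist):
            for j in range(i + 1, len(client_hist)):
                t2 = client_hist[j]
                if t2[0] != t1[0]:
                    continue                       # need obj(t1) = obj(t2): t1 was overwritten
                for t3 in client_hist[j + 1:]:     # a later write t3 by the same client
                    if result.get(t1[0]) == (t1[1], t1[2]) and result.get(t3[0]) == (t3[1], t3[2]):
                        return False, t3[0]        # Line 48: obj(t3) is the problematic object
    return True, None

if __name__ == "__main__":
    hist = {"cw1": [("o1", "v1_o1", 1), ("o2", "v1_o2", 1), ("o1", None, 2), ("o2", "v2_o2", 2)]}
    print(is_safe(hist, {"o1": ("v1_o1", 1), "o2": ("v2_o2", 2)}))  # (False, 'o2')
    print(is_safe(hist, {"o1": ("v1_o1", 1), "o2": ("v1_o2", 1)}))  # (True, None)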

To perform a write operation, client cl calls the write procedure (lines 27-34). Client cl retrieves the server s that serves object o (Line 28), cleans history hist_cl by removing values of objects that server s does not serve (Line 29), and then sends a WRITE message to the server (Line 30). Recall that a server does not store values of objects that it does not serve; therefore, the client calls the clean procedure (lines 83-90), which goes through all the triples in the provided history and, whenever the history of some client contains a value for an object not served by the server, stores _ as the value. Afterwards, the client waits until the server responds (Line 31), i.e., until versionNumber is updated upon receiving a WRITERESPONSE from the server (lines 39-40). Finally, client cl appends the write, with the version number received from the server, to its own write history (Line 32), sets versionNumber back to −1, and returns an acknowledgment.

The code for a server s is presented in Algorithm 11. As with a client, server s stores locally all the write histories (hist_s). Additionally, s stores the values of the objects it serves in the mem array (Line 109). For each object o ∈ s.obj, mem[o] corresponds to a pair with the latest written value and its corresponding version number. When server s receives a READ event, it responds


to the client with a READRESPONSE that contains hist_s (Line 112). Note that server s does not update its state in response to a read event, and hence reads in ubvStore are invisible. When server s receives a WRITE event to write a value v to object o (Line 114), s stores the value in mem[o] and increases its version number by one (Line 115). Then, server s (potentially) extends its current knowledge hist_s with the histories received from the client (hist_cl), using the extend procedure (Line 117). Finally, server s responds to the client with a WRITERESPONSE that includes the assigned version number (Line 118).
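A compact sketch of this server logic (illustrative names; the merge of Line 117 is crudely simplified here) could look as follows: reads return hist_s without touching any state, while writes bump the object's version, append to the writer's history, and absorb histories sent by the client.

# Sketch of the server side of ubvStore (Algorithm 11). Illustrative only:
# the merge below merely adopts unseen suffixes and is a crude stand-in for extend.
from collections import defaultdict

class Server:
    def __init__(self, served_objects):
        self.mem = {o: (None, 0) for o in served_objects}    # object -> (value, version); None is ⊥
        self.hist = defaultdict(list)                        # client -> [(object, value, version)]

    def on_read(self, objects, t, d):
        return dict(self.hist)                               # Line 112: no state update, reads are invisible

    def on_write(self, client, obj, value, client_hist):
        _, version = self.mem[obj]
        self.mem[obj] = (value, version + 1)                 # Line 115
        self.hist[client].append((obj, value, version + 1))  # Line 116
        for c, h in client_hist.items():                     # Line 117: crude stand-in for extend
            if len(h) > len(self.hist[c]):
                self.hist[c].extend(h[len(self.hist[c]):])
        return version + 1                                   # Line 118: version for the WRITERESPONSE

if __name__ == "__main__":
    s = Server(served_objects={"o2"})
    print(s.on_write("cw1", "o2", "v1_o2", {}))  # 1
    print(s.on_read({"o2"}, t="t1", d=None))     # {'cw1': [('o2', 'v1_o2', 1)]}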

The cautious reader might notice that some parameters (e.g., d) are not used. Nevertheless, such parameters appear in ubvStore to adhere to the model we present in Section 4.2.

Correctness. To prove that ubvStore is correct, we have to show that ubvStore: (i) is a valid data store (i.e., all its executions are well-formed), (ii) provides monotonic writes, and (iii) provides minimal progress.

Valid data store. Naturally, each per-process execution is well-formed, since processes follow ubvStore's algorithm. Any execution has correct issues, since clients send requests for objects only to the servers that serve them (Line 10). A server only responds to a client after receiving a request, and hence any execution has valid responses. Additionally, clients are sequential if we force them to wait for an operation's response before invoking a new operation. Furthermore, any execution has distinct values and no-transaction reuse; we assume that clients append their client identifier and a constantly increasing counter to the values they write, as well as to the transactional identifiers they use. Any execution generated by ubvStore has valid values, since a client c only reads values contained in hist_c. The values contained in hist_c were received from some server and hence were written; values are only appended to a client history by the server (Line 116). Finally, we assume that in any execution messages are not received out of thin air and that there is no message duplication; this can be achieved with well-known techniques [33], which we do not incorporate in ubvStore to avoid making it verbose.
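The distinct-values and no-transaction-reuse assumptions are easy to discharge in practice; a minimal sketch of the usual trick (illustrative, not thesis code) is to tag every written value and every transaction identifier with the client's identifier and a strictly increasing local counter:

# Sketch of how a client can guarantee distinct values and fresh transaction
# identifiers: tag them with the client id and a local counter. Illustrative only.
import itertools

class TaggingClient:
    def __init__(self, client_id):
        self.client_id = client_id
        self._counter = itertools.count(1)

    def fresh_value(self, payload):
        return (self.client_id, next(self._counter), payload)

    def fresh_tx_id(self):
        return f"{self.client_id}-tx-{next(self._counter)}"

if __name__ == "__main__":
    c = TaggingClient("c1")
    print(c.fresh_value("hello"))   # ('c1', 1, 'hello')
    print(c.fresh_tx_id())          # 'c1-tx-2'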

Monotonic writes. ubvStore trivially provides monotonic writes: a client returns values from a read-only transaction only if these values are safe (Line 49 in Algorithm 10). The fact that the values are safe means that there is no monotonic-writes violation (Line 47) and the isSafe procedure is successful. However, we still have to prove the following lemma to show that ubvStore does not attempt to read versions that do not exist.

Lemma 24. For every read(Os, t, d) performed by a client c, for every object o ∈ Os, it is the case that verToRead[o] ≥ 0.

Proof. One might think that verToRead[o] < 0 is possible due to Line 23. We show that this is not the case. Assume by way of contradiction that during a read(Os, t, d) operation by some client c, there is an object o ∈ Os such that verToRead[o] < 0. For this to happen, it should be the case that verToRead[o] = 0 and a subsequent isSafe call (Line 19) returns (false, o). This would imply that verToRead[o] gets decremented by one and hence verToRead[o] < 0. However, we show that if verToRead[o] = 0, then isSafe cannot return o as a problematic object. For o to be returned as a problematic object, the following needs to happen: when isSafe is called, there exists a client c′ such that there exist triples t1, t2, t3 ∈ hist_c[c′] with t1 → t2 ∧ t2 → t3, obj(t1) = obj(t2), and obj(t3) = o. The fact that t2 → t3 implies that client c′ first performed a write to obj(t2) and later a write to object obj(t3) = o. However, if verToRead[o] = 0, the read value is the initial value, namely ⊥, and t3 would have to be some initial write, which is not the case. A contradiction. Note that we also need to consider the case where isSafe returns (false, o) for object o after successfully executing the if-statement in Line 42. This is not possible because all the clients have the value ⊥ for version number 0, and hence if verToRead[o] = 0, then o.val = ⊥ ≠ _.

Minimal progress. To show that ubvStore provides minimal progress we have to show that ubvStore is eventually responsive and bounded visible. ubvStore is eventually responsive since, by construction, the servers respond to the client's requests (i.e., reads or writes). ubvStore is also bounded visible. To see this, consider a finite execution α ∈ I whose last event e_w is a client write request that writes value v to object o served by server s, as in Definition 17. Furthermore, consider all r > 0 completed client write requests to o before e_w that write values, where all the written values are contained in the set v_old. Then for b = 0, for any extension α′ ∈ I of α, every transaction that appears for the first time after |α| + b in α′ and that requests o does not read a value that belongs to v_old ∪ {⊥}. By way of contradiction, consider a transaction t that appears for the first time after |α| + b, that reads o, and that reads a value that belongs to v_old ∪ {⊥}. When transaction t contacts server s, server s sends back at least the latest value v of object o. For transaction t to read a value that belongs to v_old ∪ {⊥}, it must be the case that isSafe returns (false, o) from Line 48. Note that isSafe cannot return (false, o) from Line 42, because server s returns value v ≠ _ for object o. Because transaction t started its execution after the write of v (due to a third write t3 as seen in Line 47), transaction t also sees the writes of t2 and t1. Therefore, object o cannot be the problematic object. A contradiction.

As a final note, we devised ubvStore to clarify and generalize our theoretical impossibility result (Theorem 5). In practice, sending several versions in one round is costly; we therefore expect that an implementation of ubvStore would perform worse than COPS-SNOW [87]. However, this is not an apples-to-apples comparison, because COPS-SNOW can violate consistency during an asynchronous period, while ubvStore never violates consistency.


4.5 Conclusion

Our framework is inspired by the models of Attiya et al. [14] and Burckhardt et al. [32]. In addition, however, our framework captures transactions. As far as we know, our formal framework is the first to precisely capture the notion of fast transactions, and specifically the notion of bounded-version data stores. We formally defined bounded-version data stores by introducing the decoding function dec and the definition of valid values, something we believe is a contribution in itself. In contrast, other models [45, 71, 87] that attempt to define fast transactions fail to precisely define the restrictions imposed by one-version reads (i.e., servers could potentially "cheat" and respond with more than one version).

Lu et al. [87] introduce the informal notion of latency-optimal read-only transactions (i.e., fast reads in Definition 12), and present the SNOW theorem. The SNOW theorem states that we cannot devise a read-only transaction algorithm that satisfies the following four properties: (i) strict serializability, (ii) non-blocking reads, (iii) one-round and one-version reads, and (iv) coexistence with write transactions. Konwar et al. [71] revisit the SNOW theorem and present new results that consider the role of client-to-client messaging in the SNOW theorem. Tomsic et al. [118] investigate the relation between the consistency, the speed, and the freshness of reads and, among other things, show that visible fast transactions are possible by reading an arbitrarily old snapshot of the database; such a data store hence does not provide the weak property of bounded visibility. Didona et al. [44, 45] look into causally-consistent data stores and investigate the cost that fast read-only transactions impose on writes, and prove that a data store cannot provide fast read-only transactions in combination with write transactions. In contrast to previous work [44, 45, 71, 87, 118], we consider the weaker consistency model of monotonic writes and hence strengthen our impossibility result. Finally, note that real systems [5, 8, 40] that provide fast transactions either assume strong synchrony (e.g., Spanner [40] relies on global time) or provide visible read-only transactions and hence are not fault-tolerant.

Finally, to the best of our knowledge, our work is the first to bring the combination of fault-tolerance and fast transactions to attention. As a matter of fact, the original work of Lu et al. [87], which informally introduces fast transactions, presents the COPS-SNOW data store that supports such fast transactions. COPS-SNOW is able to keep operating during a network partition, and hence despite the loss of one or more data centers. However, COPS-SNOW cannot tolerate the failure of even a single server, since otherwise consistency is violated.

In this chapter, we proved that invisible fast transactions are impossible in a data store that supports the weak consistency model of monotonic writes. To prove our result, we devised a formalization to precisely capture notions such as one-round, one-version, etc. Our proof is the first one to shed light on an important and unexplored consequence of fast transactions, that is, a data store that supports fast transactions cannot tolerate the failure of even one server.


Additionally, with ubvStore, we showed that the number of versions a server can send to a client is consequential to whether transactions can be both non-blocking and 1-round.


5 Concluding Remarks

In this thesis, we revealed hidden complexities (in the sense of intricacies or costs) of distributed systems that challenge accepted truths in the field of distributed computing. In what follows, we summarize the main findings of this thesis and propose avenues for future research.

Consensus. In Chapter 2 we defined what it means for a consensus algorithm to be leaderless. To the best of our knowledge, this work is the first to formally introduce the notion of a leaderless algorithm, initiating the study of provably leaderless consensus algorithms. Then, we devised and proved correct Archipelago, a leaderless consensus algorithm for the shared-memory model, and subsequently translated our result to the message-passing model. Additionally, we conjectured that Archipelago could be transformed to tolerate Byzantine failures. It would be interesting to see if this conjecture holds and how such a Byzantine-tolerant algorithm performs in practice.

State Machine Replication. In Chapter 3, we examined the relation between consensus and State Machine Replication (SMR) in terms of their complexity. We proved the surprising result that SMR is more expensive than a repetition of consensus instances. Concretely, we showed that in a synchronous system where a single instance of consensus always terminates in a constant number of rounds, completing one SMR command can potentially require a non-constant number of rounds. Additionally, we supported our formal proof with experimental results using two well-known SMR implementations (a Multi-Paxos and a Raft implementation). An interesting direction for future work would be to optimize SMR performance for the worst case, for example by expediting recovery [95], perhaps with applicability in performance-sensitive applications. Another possible direction would be to investigate whether it is possible to design SMR algorithms where replicas collaboratively balance among themselves the burden of keeping each other up to date (e.g., as attempted in [21]).


Transactions. In Chapter 4, we proved that fast transactions are impossible. Specifically, we proved that invisible fast transactions are impossible in a data store that supports the weak consistency model of monotonic writes. Our proof sheds light on an important and unexplored consequence of fast transactions, namely that a data store that supports fast transactions cannot tolerate the failure of even one server. In this regard, this thesis brings a better understanding of the relationship between fault-tolerance and fast transactions. A possible direction for future research is to investigate whether a different reasonable definition of fast transactions exists that allows for practical fault-tolerant data stores.

Bibliography

[1] Amazon EC2. http://aws.amazon.com/ec2/. [Online; accessed 8-July-2020].

[2] etcd. https://github.com/coreos/etcd. [Online; accessed 8-July-2020].

[3] LibPaxos3. https://bitbucket.org/sciascid/libpaxos. [Online; accessed 8-July-2020].

[4] MongoDB. https://www.mongodb.com. [Online; accessed 8-July-2020].

[5] MySQL Cluster. https://www.mysql.com/products/cluster/. [Online; accessed 8-July- 2020].

[6] I. Abraham, G. Gueta, D. Malkhi, L. Alvisi, R. Kotla, and J.-P. Martin. Revisiting fast practical Byzantine fault tolerance. CoRR, abs/1712.01367, 2017.

[7] M. K. Aguilera, C. Delporte-Gallet, H. Fauconnier, and S. Toueg. Communication-efficient leader election and consensus with limited link synchrony. In PODC, 2004.

[8] M. K. Aguilera, J. B. Leners, and M. Walfish. Yesquel: Scalable sql storage for web applications. In SOSP, 2015.

[9] D. D. Akkoorath, A. Z. Tomsic, M. Bravo, Z. Li, T. Crain, A. Bieniusa, N. M. Preguiça, and M. Shapiro. Cure: Strong semantics meets high availability and low latency. In ICDCS, 2016.

[10] K. Antoniadis, R. Guerraoui, D. Malkhi, and D.-A. Seredinschi. State machine replication is more expensive than consensus. In DISC, 2018.

[11] B. Arun, S. Peluso, R. Palmieri, G. Losa, and B. Ravindran. Speeding up Consensus by Chasing Fast Decisions. In DSN, 2017.

[12] J. Aspnes, H. Attiya, and K. Censor. Max registers, counters, and monotone circuits. In PODC, 2009.

[13] H. Attiya, A. Bar-Noy, and D. Dolev. Sharing memory robustly in message-passing systems. JACM, 42(1):124–142, 1995.


[14] H. Attiya, F. Ellen, and A. Morrison. Limitations of highly-available eventually-consistent data stores. TPDS, 28(1):141–155, 2017.

[15] P.-L. Aublin, S. B. Mokhtar, and V. Quéma. RBFT: Redundant Byzantine fault tolerance. In ICDCS, 2013.

[16] P.-L. Aublin, R. Guerraoui, N. Knežević, V. Quéma, and M. Vukolić. The next 700 BFT protocols. TOCS, 32(4):12:1–12:45, Jan. 2015.

[17] P. Bailis, A. Davidson, A. Fekete, A. Ghodsi, J. M. Hellerstein, and I. Stoica. Highly available transactions: Virtues and limitations. In VLDB, 2013.

[18] P. Bailis, A. Ghodsi, J. M. Hellerstein, and I. Stoica. Bolt-on causal consistency. In SIGMOD, 2013.

[19] P. Bailis and K. Kingsbury. The network is reliable. ACM Queue, 12(7):20, 2014.

[20] M. Ben-Or. Another advantage of free choice (extended abstract): Completely asynchronous agreement protocols. In PODC, 1983.

[21] A. Bessani, M. Santos, J. Felix, N. Neves, and M. Correia. On the efficiency of durable state machine replication. In ATC, 2013.

[22] A. Bessani, J. Sousa, and E. E. Alchieri. State machine replication for the masses with bft-smart. In DSN, 2014.

[23] M. Biely, P. Robinson, U. Schmid, M. Schwarz, and K. Winkler. Gracefully degrading consensus and k-set agreement in directed dynamic networks. TCS, 726:41 – 77, 2018.

[24] E. Borowsky and E. Gafni. Immediate atomic snapshots and fast renaming. In PODC, 1993.

[25] E. Borowsky and E. Gafni. A simple algorithmically reasoned characterization of wait-free computation (extended abstract). In PODC, 1997.

[26] Z. Bouzid, A. Mostfaoui, and M. Raynal. Minimal synchrony for Byzantine consensus. In PODC, 2015.

[27] E. A. Brewer. Towards robust distributed systems (abstract). In PODC, 2000.

[28] N. Bronson, Z. Amsden, G. Cabrera, P. Chakka, P. Dimov, H. Ding, J. Ferris, A. Giardullo, S. Kulkarni, H. Li, M. Marchukov, D. Petrov, L. Puzar, Y. J. Song, and V. Venkataramani. TAO: Facebook’s distributed data store for the social graph. In ATC, 2013.

[29] J. Brzezinski, C. Sobaniec, and D. Wawrzyniak. From session causality to causal consis- tency. In PDP, 2004.

[30] E. Buchman, J. Kwon, and Z. Milosevic. The latest gossip on BFT consensus. CoRR, abs/1807.04938, 2018.


[31] Y. Buchnik and R. Friedman. Fireledger: A high throughput blockchain consensus protocol. VLDB, 13(9):1525–1539, 2020.

[32] S. Burckhardt, A. Gotsman, H. Yang, and M. Zawirski. Replicated data types: specification, verification, optimality. In POPL, 2014.

[33] C. Cachin, R. Guerraoui, and L. Rodrigues. Introduction to Reliable and Secure Distributed Programming. Springer, 2011.

[34] D. Cason, P. J. Marandi, L. E. Buzato, and F. Pedone. Chasing the tail of atomic broadcast protocols. In SRDS, 2015.

[35] M. Castro and B. Liskov. Practical Byzantine fault tolerance and proactive recovery. TOCS, 20(4):398–461, 2002.

[36] T. D. Chandra, R. Griesemer, and J. Redstone. Paxos made live: An engineering perspective. In PODC, 2007.

[37] T. D. Chandra and S. Toueg. Unreliable failure detectors for reliable distributed systems. JACM, 43(2):225–267, 1996.

[38] B. Charron-Bost and A. Schiper. The heard-of model: computing in distributed systems with benign faults. Distributed Computing, 22(1):49–71, Apr 2009.

[39] G. Chockler, R. Guerraoui, and I. Keidar. Amnesic distributed storage. In DISC, 2007.

[40] J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, W. Hsieh, S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik, D. Mwaura, D. Nagle, S. Quinlan, R. Rao, L. Rolig, Y. Saito, M. Szymaniak, C. Taylor, R. Wang, and D. Woodford. Spanner: Google’s globally distributed database. TOCS, 31(3):8:1–8:22, 2013.

[41] T. Crain, V. Gramoli, M. Larrea, and M. Raynal. DBFT: Efficient leaderless Byzantine consensus and its application to blockchains. In NCA, 2018.

[42] J. Dean and L. A. Barroso. The tail at scale. CACM, 56(2):74–80, 2013.

[43] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon’s highly available key-value store. In SOSP, 2007.

[44] D. Didona, P. Fatourou, R. Guerraoui, J. Wang, and W. Zwaenepoel. Distributed transactional systems cannot be fast. In SPAA, 2019.

[45] D. Didona, R. Guerraoui, J. Wang, and W. Zwaenepoel. Causal consistency and latency optimality: friend or foe? In VLDB, 2018.

[46] E. W. Dijkstra. Cooperating sequential processes, technical report ewd-123. Technical report, 1965.


[47] J. Du, C. Iorgulescu, A. Roy, and W. Zwaenepoel. Gentlerain: Cheap and scalable causal consistency with physical clocks. In SoCC, 2014.

[48] C. Dwork, N. Lynch, and L. Stockmeyer. Consensus in the presence of partial synchrony. JACM, 35(2):288–323, Apr. 1988.

[49] T. Elrad and N. Francez. Decomposition of distributed programs into communication-closed layers. Science of Computer Programming, 2(3):155–173, 1982.

[50] C. Fernández-Campusano, M. Larrea, R. Cortiñas, and M. Raynal. Eventual leader election despite crash-recovery and omission failures. In PRDC, 2015.

[51] M. J. Fischer, N. A. Lynch, and M. S. Paterson. Impossibility of distributed consensus with one faulty process. JACM, 32(2):374–382, 1985.

[52] E. Gafni. Round-by-round fault detectors (extended abstract): Unifying synchrony and asynchrony. In PODC, 1998.

[53] Á. García-Pérez, A. Gotsman, Y. Meshman, and I. Sergey. Paxos consensus, deconstructed and abstracted (extended version). CoRR, abs/1802.05969, 2018.

[54] C. Georgiou, S. Gilbert, R. Guerraoui, and D. R. Kowalski. On the complexity of asynchronous gossip. In PODC, 2008.

[55] S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google File System. In SOSP, 2003.

[56] S. Gilbert and N. Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News, 33(2):51–59, June 2002.

[57] V. Gramoli, L. Bass, A. Fekete, and D. Sun. Rollup: Non-disruptive rolling upgrade with fast consensus-based dynamic reconfigurations. TPDS, 27(9):2711–2724, Sep 2016.

[58] R. Guerraoui. Indulgent algorithms (preliminary version). In PODC, 2000.

[59] R. Guerraoui, M. Pavlovic, and D.-A. Seredinschi. Incremental consistency guarantees for replicated objects. In OSDI, 2016.

[60] D. Gupta, L. Perronne, and S. Bouchenak. BFT-Bench: Towards a practical evaluation of robustness and effectiveness of BFT protocols. In DAIS, 2016.

[61] M. Herlihy. Wait-free synchronization. TOPLAS, 13(1):124–149, Jan. 1991.

[62] H. Howard and J. Crowcroft. Coracle: Evaluating Consensus at the Internet Edge. In SIGCOMM, 2015.

[63] H. Howard, D. Malkhi, and A. Spiegelman. Flexible Paxos: Quorum Intersection Revisited. In OPODIS, 2016.

[64] P. Hunt, M. Konar, F. P. Junqueira, and B. Reed. Zookeeper: Wait-free coordination for internet-scale systems. In ATC, 2010.


[65] P. R. Johnson and R. Thomas. Maintenance of duplicate databases. RFC 677, 1975.

[66] ZooKeeper documentation. https://zookeeper.apache.org/doc/r3.6.1/. [Online; accessed 8-July-2020].

[67] F. Junqueira and B. Reed. ZooKeeper: Distributed Process Coordination. O’Reilly Media, Inc., 2013.

[68] I. Keidar and S. Rajsbaum. On the cost of fault-tolerant consensus when there are no faults: Preliminary version. SIGACT News, 32(2):45–63, June 2001.

[69] M. Kleppmann. Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. O’Reilly Media, 2017.

[70] R. Klophaus. Riak core: Building distributed applications without shared state. In CUFP, 2010.

[71] K. M. Konwar, W. Lloyd, H. Lu, and N. A. Lynch. The SNOW theorem revisited. CoRR, abs/1811.10577, 2018.

[72] R. Kotla, L. Alvisi, M. Dahlin, A. Clement, and E. Wong. Zyzzyva: Speculative Byzantine fault tolerance. In SOSP, 2007.

[73] T. Kraska, G. Pang, M. J. Franklin, S. Madden, and A. Fekete. MDCC: Multi-data center consistency. In EuroSys, 2013.

[74] A. Lakshman and P. Malik. Cassandra: a decentralized structured storage system. ACM SIGOPS Operating Systems Review, 44(2):35–40, 2010.

[75] L. Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. CACM, 21(7):558–565, 1978.

[76] L. Lamport. The weak Byzantine generals problem. JACM, 30(3):668–676, July 1983.

[77] L. Lamport. The part-time parliament. TOCS, 16(2):133–169, 1998.

[78] L. Lamport. Paxos made simple. ACM Sigact News, 32(4):18–25, 2001.

[79] L. Lamport. Lower bounds for asynchronous consensus. In Future Directions in Distributed Computing, pages 22–23. Springer, 2003.

[80] L. Lamport. Fast paxos. Distributed Computing, 19(2):79–103, 2006.

[81] L. Lamport. Leaderless Byzantine consensus, 2010. United States Patent, Microsoft, Redmond, WA (USA).

[82] L. Lamport. Brief announcement: Leaderless Byzantine Paxos. In DISC, 2011.

[83] L. Lamport, D. Malkhi, and L. Zhou. Stoppable Paxos. TechReport, Microsoft Research, 2008.


[84] L. Lamport and M. Massa. Cheap paxos. In DSN, 2004.

[85] B. Liskov, S. Ghemawat, R. Gruber, P. Johnson, L. Shrira, and M. Williams. Replication in the Harp File System. In SOSP, 1991.

[86] W. Lloyd, M. J. Freedman, M. Kaminsky, and D. G. Andersen. Don’t settle for eventual: Scalable causal consistency for wide-area storage with cops. In SOSP, 2011.

[87] H. Lu, C. Hodsdon, K. Ngo, S. Mu, and W. Lloyd. The SNOW theorem and latency-optimal read-only transactions. In OSDI, 2016.

[88] S. Lukasik. Why the Arpanet was built. IEEE Annals of the History of Computing, 2011.

[89] J. MacCormick, N. Murphy, M. Najork, C. Thekkath, and L. Zhou. Boxwood: Abstractions as the foundation for storage infrastructure. In OSDI, 2004.

[90] P. Mahajan, L. Alvisi, and M. Dahlin. Consistency, availability, and convergence. Technical Report TR-11-21, The University of Texas at Austin, 2011.

[91] Y. Mao, F. P. Junqueira, and K. Marzullo. Mencius: Building efficient replicated state machines for wans. In OSDI, 2008.

[92] P. J. Marandi, S. Benz, F. Pedone, and K. Birman. Practical experience report: The performance of paxos in the cloud. CoRR, 2014.

[93] C. Martín, M. Larrea, and E. Jiménez. Implementing the omega failure detector in the crash-recovery failure model. Journal of Computer and System Sciences, 75(3):178–189, 2009.

[94] S. A. Mehdi, C. Littley, N. Crooks, L. Alvisi, N. Bronson, and W. Lloyd. I can’t believe it’s not causal! scalable causal consistency with no slowdown cascades. In NSDI, 2017.

[95] O. M. Mendizabal, F. L. Dotti, and F. Pedone. High performance recovery for parallel state machine replication. In ICDCS, 2017.

[96] I. Moraru, D. G. Andersen, and M. Kaminsky. There is More Consensus in Egalitarian Parliaments. In SOSP, 2013.

[97] Y. Moses and S. Rajsbaum. A layered analysis of consensus. SICOMP, 31(4):989–1021, 2002.

[98] A. Mostéfaoui, H. Moumen, and M. Raynal. Signature-free asynchronous binary Byzantine consensus with t < n/3, O(n²) messages, and O(1) expected time. JACM, 62(4):1–21, 2015.

[99] A. Mostefaoui and M. Raynal. Low cost consensus-based atomic broadcast. In Proceedings of the 2000 Pacific Rim International Symposium on Dependable Computing, 2000.


[100] D. Ongaro and J. Ousterhout. In search of an understandable consensus algorithm. In ATC, 2014.

[101] M. Oriol, M. Wahler, R. Steiger, S. Stoeter, E. Vardar, H. Koziolek, and A. Kumar. Fasa: A scalable software framework for distributed control systems. In ISARCS, 2012.

[102] J. Ousterhout, A. Gopalan, A. Gupta, A. Kejriwal, C. Lee, B. Montazeri, D. Ongaro, S. J. Park, H. Qin, M. Rosenblum, S. Rumble, R. Stutsman, and S. Yang. The ramcloud storage system. TOCS, 33(3):7:1–7:55, Aug. 2015.

[103] M. Pease, R. Shostak, and L. Lamport. Reaching agreement in the presence of faults. JACM, 27(2):228–234, Apr. 1980.

[104] M. O. Rabin. Randomized Byzantine generals. In FOCS, 1983.

[105] N. Santoro and P. Widmayer. Time is not a healer. In STACS, 1989.

[106] N. Santoro and P. Widmayer. Agreement in synchronous networks with ubiquitous faults. TCS, 384(2):232–249, 2007.

[107] N. Santos and A. Schiper. Tuning paxos for high-throughput with batching and pipelining. In ICDCN, 2012.

[108] U. Schmid, B. Weiss, and I. Keidar. Impossibility results and lower bounds for consensus under link failures. SICOMP, 38(5):1912–1951, 2009.

[109] U. Schmid, B. Weiss, and J. Rushby. Formally verified byzantine agreement in presence of link faults. In ICDCS, 2002.

[110] F. B. Schneider. Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Computing Surveys, 22(4):299–319, Dec. 1990.

[111] E. Sit, A. Haeberlen, F. Dabek, B.-G. Chun, H. Weatherspoon, R. Morris, M. F. Kaashoek, and J. Kubiatowicz. Proactive Replication for Data Durability. In IPTPS, 2006.

[112] K. Spirovska, D. Didona, and W. Zwaenepoel. Optimistic causal consistency for geo-replicated key-value stores. In ICDCS, 2017.

[113] K. Spirovska, D. Didona, and W. Zwaenepoel. Wren: Nonblocking reads in a partitioned transactional causally consistent data store. In DSN, 2018.

[114] K. Spirovska, D. Didona, and W. Zwaenepoel. Paris: Causally consistent transactions with non-blocking reads and partial replication. In ICDCS, 2019.

[115] P. Sutra. On the correctness of egalitarian Paxos. CoRR, abs/1906.10917, 2019.

[116] D. B. Terry, A. J. Demers, K. Petersen, M. Spreitzer, M. Theimer, and B. W. Welch. Session guarantees for weakly consistent replicated data. In PDIS, 1994.


[117] R. H. Thomas. A majority consensus approach to concurrency control for multiple copy databases. TODS, 4(2):180–209, 1979.

[118] A. Z. Tomsic, M. Bravo, and M. Shapiro. Distributed transactional reads: the strong, the quick, the fresh & the impossible. In Middleware, 2018.

[119] M. van Steen and A. S. Tanenbaum. Distributed Systems. CreateSpace Independent Publishing Platform, 2017.

[120] G. M. D. Vieira, I. C. Garcia, and L. E. Buzato. Seamless paxos coordinators. CoRR, abs/1710.07845, 2017.

[121] G. Voron and V. Gramoli. Dispel: Byzantine SMR with distributed pipelining. CoRR, abs/1912.10367, 2019.

[122] M. Welsh, D. Culler, and E. Brewer. Seda: An architecture for well-conditioned, scalable internet services. In SOSP, 2001.

[123] B. Wester, J. A. Cowling, E. B. Nightingale, P. M. Chen, J. Flinn, and B. Liskov. Tolerating latency in replicated state machines through client speculation. In NSDI, 2009.

[124] Y. C. Yeh. Safety critical avionics for the 777 primary flight controls system. In DASC, 2001.

[125] M. Yin, D. Malkhi, M. K. Reiter, G. Golan-Gueta, and I. Abraham. Hotstuff: BFT consensus with linearity and responsiveness. In PODC, 2019.

[126] K. Zuse. The Computer-My Life. Springer Science & Business Media, 1993.

Karolos Antoniadis

Contact Phone: +41 (0) 76 70 23 10 6 Email: karolos.antoniadis@epfl.ch

Languages English: TOEFL iBT Test (2012, score: 107/120) German: Mittelstufe Goethe-Institut (2007, C1) Greek: Mother tongue

Education
Doctoral Student in Computer and Communication Sciences
École Polytechnique Fédérale de Lausanne (EPFL), Distributed Computing Laboratory
September 2016 – (expected) August 2020

Master’s Degree in Computer Science, Eidgenössische Technische Hochschule (ETH) Zürich, September 2012 – November 2015 (GPA 5.24/6)

Degree (Ptychio) in Computer Science University of Crete, Greece September 2007 – July 2012 (GPA 9.44/10, summa cum laude)

Work Experience
Software Engineer Intern, Twitter, Seattle
June – September 2019: Member of the Coordination team working on ZooKeeper. Among others, implemented a Jepsen-like tool for injecting faults in a ZooKeeper cluster. Identified and fixed bugs in ZooKeeper’s leader election algorithm.

Agile Software Engineer Intern, eBay, Zurich March – September 2014: Member of a Scrum team developing tools for real-time monitoring of eBay’s retail campaigns. Coded and tested back-end components written in Java.

Publications Leaderless Consensus, co-authored with Vincent Gramoli, Rachid Guerraoui, Eric Ruppert, and Igor Zablotchi. (under submission)

Thread-Placement Learning, co-authored with Rachid Guerraoui, and Vasileios Trigonakis. (to appear) (40th International Conference on Distributed Computing Systems (ICDCS), 2020 )

The Impossibility of Fast Transactions, co-authored with Diego Didona, Rachid Guerraoui, and Willy Zwaenepoel. (34th International Parallel and Distributed Processing Symposium (IPDPS), 2020)

[Book Chapter] The Notions of Time and Global State in a Distributed System, co-authored with Rachid Guerraoui in “Concurrency: The Works of Leslie Lamport.”, 2019 In personal communication, Turing Award Laureate Leslie Lamport said: I was asked to check the page proofs of the ACM book about my work [...] You gave a very nice presentation of the work and said more about its influence than I would have been able to. And yours was the one chapter in which I didn’t find even the tiniest error–a testimony to the effort you must have put into it.

State Machine Replication is More Expensive than Consensus, co-authored with Rachid Guerraoui, Dahlia Malkhi, and Dragos-Adrian Seredinschi. (32nd International Symposium on Distributed Computing (DISC), 2018)

The entropy of a distributed computation random number generation from memory interleaving, co-authored with Peva Blanchard, Rachid Guerraoui, and Julien Stainer. (Distributed Computing Journal, Volume 31, 2018 )

Sequential Proximity - Towards Provably Scalable Concurrent Search Algorithms, co-authored with Rachid Guerraoui, Julien Stainer, and Vasileios Trigonakis. (5th International Conference on Networked Systems (NETYS), 2017)

Teaching Assistant Experience
Concurrent Algorithms, EPFL (Fall 2018, 2019)
Software Engineering, EPFL (Fall 2017)
Practice of Object-Oriented Programming, EPFL (Spring 2017, 2018, 2019)
Probability II, University of Crete (Spring 2009)

Additional Research Experience
Research Intern at EPFL, School of Computer and Communication Sciences, Distributed Computing Laboratory
January – June 2016: Worked on a random number generator that leverages the unpredictability of memory interleaving in modern processors.

May – November 2015: Finished my master thesis titled “Sequential Proximity - Towards Provably Scalable Concurrent Search Algorithms,” in which I formalized properties that lead to scalable concurrent search data structures.

Undergraduate Research Fellow at Foundation for Research & Technology - Hellas (FORTH), Institute of Computer Science, Computer Architecture and VLSI Systems Laboratory
October 2011 – July 2012: Finished my diploma thesis titled “On Multi-Version Software Transactional Memories,” which surveyed some of the most important multi-version software transactional memory algorithms.

Information Systems Laboratory July – September 2009: Developed an application for generating RDF Schemas.

Honors and Awards
Computer and Communication Sciences Fellowship at EPFL (2016) for outstanding academic achievements

Distinguished undergraduate scholarship “Stelios Orphanoudakis” from FORTH (years: 2009, 2010)

Scholarship from the State Scholarships Foundation (IKY) for being first in my class during the first, second, and third year of my studies (years: 2008, 2009, 2010)

Honors prize award, “Thales” competition, 2007, hosted by the Greek Mathematical Society

Honors prize award, “Second Phase” (2007) and “First Phase” (2006) Informatics Competition (PDP) hosted by the Greek Computer Society

Programming Languages
(in order of experience) Java, C, and Python
