Theoretical Foundations for Practical Concurrent and Distributed Computation

Naama Ben-David

CMU-CS-20-126
August 26, 2020

Computer Science Department
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

Thesis Committee:
Guy Blelloch, Chair
Umut Acar
Phillip Gibbons
Mor Harchol-Balter
Erez Petrank (Technion)

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Copyright © 2020 Naama Ben-David

This research was sponsored by the National Science Foundation award numbers CCF1919223, CCF1533858, and CCF1629444; by a Microsoft Research PhD Fellowship; and by a National Sciences and Engineering Research Council of Canada (NSERC) Postgraduate Scholarship-Doctoral (PGS-D) fellowship. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government or any other entity.

Keywords: Multiprocessor hardware, NUMA, contention, shared memory, NVRAM, RDMA

Abstract

Many large-scale computations are nowadays performed by several processes, whether on a single multi-core machine or distributed over many machines. This widespread use of concurrent and distributed technology is also driving innovations in its underlying hardware. To design fast and correct algorithms in such settings, it is important to develop a theory of concurrent and distributed computing that is faithful to practice. Unfortunately, it is difficult to abstract practical problems into approachable theoretical ones, leading to theoretical models that are too far removed from reality to be easily applied in practice. This thesis aims to bridge this gap by building a strong theoretical foundation for practical concurrent and distributed computing. The thesis takes a two-pronged approach toward this goal: improving theoretical analysis of well-studied settings, and modeling new technologies theoretically.

In the first vein, we consider the problem of analyzing shared-memory concurrent algorithms such that the analysis reflects their performance in practice. A new shared-memory model that accounts for contention costs is presented, as well as a tool for uncovering the way that a machine interleaves concurrent instructions. We also show that the analysis of concurrent algorithms in more restricted settings can be easy and accurate, by designing and analyzing a concurrent data structure for nested parallelism.

The second approach explored in this thesis to bridge the theory-practice gap considers and models two emerging technologies. Firstly, we study non-volatile random access memories (NVRAM) and develop a general simulation that can adapt many classic concurrent algorithms to a setting in which memory is non-volatile and can recover after a system fault. Finally, we study remote direct memory access (RDMA) as a tool for replication in data centers. We develop a model that captures the power of RDMA and demonstrate that RDMA supports heightened performance and fault tolerance, thereby uncovering its practical relevance by formally reasoning about it theoretically.

Acknowledgements

I would like to express my deep gratitude to my incredible advisor, Guy Blelloch. Being a good researcher and a good advisor are two very different skill-sets, and I feel extremely fortunate to have worked with someone who is so profoundly great at both. I've had the privilege of working closely with Guy over the five years of my PhD, allowing me to learn from his wide breadth of technical knowledge and his critical way of choosing worthy problems. At the same time, Guy has always allowed me to explore my own research interests, supporting and encouraging my various internships, visits, and other collaborations. I cannot think of a better role model, and I can only hope to continue to learn from him in the future.

I'd also like to thank my other thesis committee members, Umut Acar, Phil Gibbons, Mor Harchol-Balter, and Erez Petrank, for their time, their wisdom, and their advice. I especially thank Erez for his friendship, relentless support, and complete confidence in my ability as a researcher. He has made the Technion feel like a second home-base for me, welcoming me with office space, regular research meetings, and frequent lunch excursions whenever I visit.

During my PhD, I've had the opportunity to work with many amazing researchers, who have inspired me and made my research more enjoyable. I therefore thank my collaborators, Umut Acar, Marcos Aguilera, Guy Blelloch, Irina Calciu, David Yu Cheng Chan, Jeremy Fineman, Michal Friedman, Phil Gibbons, Yan Gu, Vassos Hadzilacos, Virendra Marathe, Charles McGuffey, Erez Petrank, Mike Rainey, Ziv Scully, Julian Shun, Yihan Sun, Sam Toueg, Yuanhao (Hao) Wei, Athanasios (Thanasis) Xygkis, and Igor Zablotchi. I've especially enjoyed my collaborations with Hao; I thank him for the many insightful conversations, brainstorming sessions, and long deadline-fueled nights.

I would like to thank my internship/visit hosts, Marcos Aguilera and Rachid Guerraoui. Marcos hosted me during my internship at VMware, and introduced me to an exciting research area and to many opportunities that have enriched my PhD experience, even long after my internship ended. Throughout it all, Marcos has been a great collaborator and a reliable source of support and advice. Rachid invited me to visit him at EPFL, which became one of the most fun and productive semesters of my degree. I thank him for this opportunity, for his encouragement and for the examples he sets, both in research and in life.

The years spent working on my PhD have been the best in my life. This would not have been possible without my incredible friends. While there are too many friends to name here, I am grateful for each office chat, coffee break, walk, talk, lunch, dinner, karaoke night, library and coffee shop work session, work party, and real party. You have made Pittsburgh my home; the place I feel most comfortable and miss when I am away. I'd like to especially thank Bailey Flanigan and Ram Raghunathan. Bailey has kept me sane during the quarantine, providing company, laughter, support, good food, and overall close friendship whenever I needed it. Ram is an extraordinary friend whom I feel lucky to have in my life. I am looking forward to finally living in the same metropolitan area as him again!

My fiancé, David Wajc, is both a family member and a friend. He has been there for me through the day-to-day frustrations and joys, making my successes more rewarding and my failures less discouraging. I thank him for the inexpressible comfort in the knowledge that despite leaving Pittsburgh behind, I will feel at home anywhere with him.

Finally, I thank my mother Shoham Ben-David, my father Shai Ben-David, my two brothers Shalev and Sheffi Ben-David, and my sister Lior Hochman, for their unconditional love and support. They provided encouragement to pursue my passion, confidence that I can achieve anything, and the reassurance that research is not the most important thing in life. I dedicate this thesis to them.

Contents

1 Introduction
  1.1 Part 1: Contention in Shared Memory
    1.1.1 Analyzing Backoff
    1.1.2 Measuring Real Schedules
    1.1.3 Leveraging Structure in Nested Parallel Programs
  1.2 Part 2: Non Volatile Memory
    1.2.1 Delay-Free Persistent Simulations
  1.3 Part 3: Remote Direct Memory Access
    1.3.1 Scaling to Large Networks
    1.3.2 Dynamic Permissions in Small Networks
    1.3.3 State Machine Replication with RDMA
  1.4 Summary of Contributions
  1.5 Model and Preliminaries
  1.6 Bibliographic Note

I Contention in Shared Memory

2 Analyzing Backoff
  2.1 Introduction
  2.2 Model
  2.3 Exponential Backoff
    2.3.1 Analyzing Exponential Delay
  2.4 A New Approach: Adaptive Probability
    2.4.1 Analysis
  2.5 Related Work: Other Contention Models

3 Measuring Scheduling Patterns under Contention
  3.1 Introduction
    3.1.1 Our Contributions
  3.2 Background and Machine Details
    3.2.1 NUMA Architectures
    3.2.2 Lock-Free Algorithms and Scheduling
    3.2.3 Machines Used
  3.3 The Benchmark
    3.3.1 Implementation Details
    3.3.2 Experiments Shown
  3.4 Inferring Scheduling Models
    3.4.1 AMD Scheduling Model
    3.4.2 Intel Scheduling Model
  3.5 Takeaways for Fairness and Focus
    3.5.1 Fairness
    3.5.2 Focus
  3.6 Related Work
  3.7 Chapter Discussion

4 Contention-Bounded Series-Parallel DAGs
  4.1 Introduction
  4.2 Background and Preliminaries
    4.2.1 SP-DAGs
    4.2.2 The SNZI Data Structure
  4.3 The Series-Parallel Dag Data Structure
    4.3.1 Incounters
    4.3.2 Dynamic SNZI
    4.3.3 Choosing Handles to the In-Counter
  4.4 Correctness and Analysis
    4.4.1 Running Time
    4.4.2 Space Bounds
  4.5 Experimental Evaluation
  4.6 Related Work

II Non Volatile Random Access Memory

5 Delay-Free Simulations on NVRAM
  5.1 Introduction
  5.2 Model and Preliminaries
    5.2.1 Capsules
  5.3 k-Delay Simulations
  5.4 Building Blocks
    5.4.1 Capsule Implementation
    5.4.2 Recoverable Primitives
  5.5 Persisting Concurrent Programs
  5.6 Optimizing the Simulation
    5.6.1 CAS-Read Capsules
    5.6.2 Normalized Data Structures
  5.7 Practical Concerns
  5.8 Experiments

III Remote Direct Memory Access

6 Introduction and Preliminaries
  6.1 RDMA in Practice
  6.2 The M&M Model
    6.2.1 Problem Definitions

7 Large Networks
  7.1 Related Work
  7.2 Consensus Algorithm
    7.2.1 Algorithm
    7.2.2 Correctness
    7.2.3 Termination
  7.3 Shared-Memory Expanders
    7.3.1 Explicit Construction: Margulis Graphs
  7.4 Impossibility Result

8 Small RDMA Networks
  8.1 Related Work
  8.2 Model
    8.2.1 Reflecting RDMA in Practice
  8.3 Byzantine Agreement
    8.3.1 Non-Equivocating Broadcast
    8.3.2 The Cheap Quorum Sub-Algorithm
    8.3.3 Putting it Together: the Cheap & Robust Algorithm
    8.3.4 Impossibility of Strong Agreement
  8.4 Crash-Tolerant Consensus
    8.4.1 Protected Memory Paxos
    8.4.2 Aligned Paxos
  8.5 Dynamic Permissions are Necessary for Efficient Consensus

9 Microsecond-Scale State Machine Replication
  9.1 Introduction
  9.2 Background
    9.2.1 Microsecond Applications and Computing
    9.2.2 State Machine Replication
  9.3 Overview of Mu
    9.3.1 Architecture
    9.3.2 RDMA Communication
  9.4 Replication Plane
    9.4.1 Basic Algorithm
    9.4.2 Extensions
  9.5 Background Plane
    9.5.1 Leader Election
    9.5.2 Permission Management
    9.5.3 Log Recycling
    9.5.4 Adding and Removing Replicas
  9.6 Implementation
  9.7 Evaluation
    9.7.1 Common-Case Latency
    9.7.2 Fail-Over Time
    9.7.3 Throughput
  9.8 Related Work
  9.9 Conclusion

10 Conclusion

List of Figures

2.1 Update Loop
2.2 Exponential Delay
2.3 Adaptive Probability
2.4 Simple state transition system for one process's execution

3.1 Generic lock-free algorithm (simplified)
3.2 NUMA architecture with 4 NUMA nodes
3.3 AMD node layout and distance matrix
3.4 AMD throughput of F&I operations
3.5 AMD core visit distance distributions, all nodes participating
3.6 Intel throughput of F&I operations
3.7 Intel execution trace of F&I operations
3.8 Intel core visit length distributions
3.9 Intel node visit length distributions
3.10 Intel core visit distance distributions
3.11 AMD throughput of CAS operations, target on Node 0
3.12 Intel throughput of CAS operations, target on Node 0
3.13 AMD throughput of CAS operations, target on each node
3.14 Intel throughput of CAS operations, target on each node

4.1 SNZI data structure pseudocode
4.2 Pseudocode for our sp-dag data structure
4.3 Pseudocode for the probabilistic SNZI grow
4.4 Pseudocode for the in-counter data structure
4.5 Fan-in benchmark
4.6 Indegree-2 benchmark
4.7 Fan-in benchmark, varying number of processors
4.8 Fan-in benchmark, varying number of operations
4.9 Indegree-2 benchmark
4.10 Threshold experiment

5.1 Recoverable CAS algorithm
5.2 Check Recoverable CAS
5.3 CAS-Read Capsule
5.4 Persistent Normalized Simulator
5.5 Throughput of persistent and concurrent queues under various thread counts

7.1 Hybrid Ben-Or (HBO) consensus algorithm
7.2 FavorableToss specification

8.1 The Permissioned M&M model
8.2 Non-Equivocating Broadcast
8.3 Helper functions for Non-Equivocating Broadcast algorithm
8.4 Validate Operation for Non-Equivocating Broadcast
8.5 Clement et al. with non-equivocating broadcast
8.6 Cheap Quorum normal operation—code for process p
8.7 Cheap Quorum panic mode—code for process p
8.8 Interactions of the components of the Cheap & Robust Algorithm
8.9 Preferential Paxos—code for process p
8.10 Protected Memory Paxos—code for process p
8.11 Aligned Paxos
8.12 Communicate Phase 1—code for process p
8.13 Hear Back Phase 1—code for process p
8.14 Analyze Phase 1—code for process p
8.15 Communicate Phase 2—code for process p
8.16 Hear Back Phase 2—code for process p
8.17 Analyze Phase 2—code for process p

9.1 Architecture of Mu
9.2 Log Structure
9.3 Basic Replication Algorithm of Mu
9.4 Performance comparison of different permission switching mechanisms
9.5 Replication latency of Mu integrated into different applications
9.6 Replication latency of Mu compared with other replication solutions
9.7 End-to-end latencies of applications
9.8 Fail-over time distribution
9.9 Latency vs throughput

List of Tables

1.1 RDMA fault tolerance vs other models

3.1 Machine details
3.2 AMD core visit length distributions
3.3 Intel number of times cores are visited per cycle

8.1 Known fault tolerance results for Byzantine agreement

Chapter 1

Introduction

As Moore's law comes to an end, the continued advancement of computation increasingly relies on the use of multiple processes. Nowadays, most phones and laptops have multiple CPUs, and many large-scale computations are computed using several machines. We depend on multiprocess computation for the speed, reliability, and scalability we have grown to expect from our day-to-day interactions with computers.

Multiprocess settings can be broadly divided into two categories: concurrent systems, which handle the interaction of multiple processes on the same machine, and distributed systems, which deal with communication among several machines. These two settings are generally modeled differently, according to how processes communicate with each other; the shared-memory model handles concurrent settings in which processes communicate by writing and reading from the same memory, and the message-passing model is considered for settings in which processes send each other messages, usually when distributed across separate machines. In either case, however, many of the same challenges arise; different processes often operate at different speeds, with the possibility of experiencing independent failures. Therefore the communication between processes often lacks full information, as it is difficult to distinguish between a process that is slow and a process that has experienced a fatal crash, and a total order among the operations of different processes is impossible to establish. Despite these challenges, it is of the utmost importance to understand how to best use the multiprocess technology available to us, to continue to improve computations that rely on concurrent and distributed settings.

To achieve such an understanding, concurrent and distributed computing have been studied both in theory and in practice for decades. On the theory side, clean models have been developed to abstract away messy details and help reason about multiprocess executions [76, 110, 116, 119, 131, 150, 155]. Many efficient algorithms, as well as impossibility results, have been established with the help of this theoretical reasoning [29, 51, 77, 149]. On the practical side, decades of studies, work, and expertise have yielded highly efficient libraries of concurrent data structures [144, 199, 231, 232]. In fact, such results have not been achieved in isolation, and the field has greatly benefited from the mixing of theoretical and practical ideas. However, significant gaps between theory and practice are still left unaddressed.

Key features are lost in the abstraction of real hardware into a theoretical model, and theoretical results can therefore often miss crucial practical considerations. For example, existing theoretical models have difficulty capturing the running time of concurrent algorithms, and do not distinguish between different types of hardware that can greatly increase the space of possible algorithmic designs. Furthermore, lack of a theoretical understanding of new and developing hardware features has led to missed opportunities in system design [13, 14].

In this thesis, we aim to bridge the gap between theory and practice in concurrent and distributed computing. Broadly speaking, we take two approaches to achieving this goal. Firstly, we study classic concurrent architectures that have already been modeled in the literature for decades, and refine the known model to allow for more accurate analysis of concurrent algorithms. A known drawback of the classic shared-memory model is that, while it allows developing robust algorithms that are provably correct under any possible scenario, analyzing the running time of these algorithms is extremely difficult. In fact, most works, after rigorously proving an algorithm correct, evaluate its efficiency through empirical analysis alone [111, 112, 144, 235, 274]. It has been noted that relying so heavily on empirical analysis can lead to false conclusions or to missed problematic scenarios unless done with extreme care [145]. We therefore aim to facilitate the theoretical analysis of running time for shared-memory algorithms. We take a few approaches toward this goal; we first present a model that separates different causes of asynchrony in the system, and we show that this model allows analyzing a protocol known as exponential backoff, which was never analyzed in shared memory before. Secondly, we develop a tool for viewing the behavior of concurrent executions on modern machines, thereby allowing us to study these executions and learn important patterns. Finally, we show that analysis is sometimes possible using the classic model if we consider the context in which an algorithm is run. In particular, we show that in a nested parallel computation, which imposes some structure on the concurrent execution, it is possible to design and analyze provably efficient algorithms.

To be able to have theory that is meaningful in practice, it is important to model the real capabilities of the hardware. However, this can be surprisingly challenging, since hardware is ever changing and developing. Some technological advancements not only improve the speed of processing power and memory, but fundamentally change the system behavior, or introduce new possibilities altogether. Failing to model useful features of hardware can result in researchers overlooking opportunities that these technologies can offer.

Our second approach to bridging the theory-practice gap therefore considers modeling emerging hardware that has features that were not accounted for in classic theoretical work. We focus on two such technologies: non-volatile random access memory (NVRAM), and remote direct memory access (RDMA).

NVRAM is a new memory technology that provides persistence at speeds comparable to DRAM. It opens the potential to design persistent data structures that are much faster than the ones that exist today, which normally rely on much slower disk technology. However, NVRAM also opens up new challenges, since it cannot make caches persistent, meaning that simply running a concurrent algorithm in a system with NVRAM would lose any data stored on cache upon a system fault. Care must be taken in designing algorithms that store enough data on main memory to be able to recover after such a fault despite losing some data.
NVRAM has been heavily studied in recent years [44, 56, 60, 80, 89, 92, 136, 166, 201, 202]. We discuss models for NVRAM [30, 59, 166], and present the first general simulation result that can convert any concurrent algorithm designed in the classic shared-memory model into an algorithm that uses NVRAM and can recover after a fault.

RDMA is a communication technology among machines in a data center that allows different machines to access each other's memory without involving the host CPU. This capability is akin to supporting shared-memory accesses among processes that normally communicate by sending messages over a network. Communication over RDMA is also significantly faster than TCP/IP. Given these features, RDMA-based systems have received considerable attention in recent years, with fast RDMA-based distributed key-value stores and state machine replication systems being designed [41, 106, 107, 170, 172, 173, 183, 250, 279]. We present the first theoretical model that captures RDMA's capabilities, and prove that it can enhance the fault tolerance of systems that use it, as well as reduce the number of network round trips of communication required to solve agreement problems among processes. We present several results that focus on scalability, fault tolerance, and performance of agreement tasks. Finally, we use some of the ideas from the theoretical work to build a state machine replication system that is faster than all previously known systems, both in the common case and when handling failures. This therefore supports the message of this thesis, that theoretical understanding of a problem can lead to better solutions in practice.

The thesis is organized into three parts. The first presents new ways of analyzing concurrent algorithms on classic hardware, the second studies NVRAM, and the third presents our contributions on modeling and using RDMA. In the remainder of this chapter, we briefly summarize the results of each part, and then discuss some preliminaries that are needed to understand the technical contributions presented in the thesis.

1.1 Part 1: Contention in Shared Memory

The behavior of concurrent shared memory is difficult to predict. There are many factors that may affect the performance of a given concurrent algorithm; from the number of processors available, to the structure of the memory hierarchy, and even to how busy the machine is. It is fair to assume that every time a concurrent algorithm runs, its behavior will be different. To reason about such a chaotic setting, theoreticians often abstract away these concerns by considering an omnipotent adversarial scheduler that determines the order of events in the system. A bound or a proof of correctness must then hold under any schedule that the adversary chooses. In this way, we can guarantee that any real execution will be at least as good as the theoretical bounds.

However, as it turns out, the theoretical guarantees that can be derived under such an adversarial model are too pessimistic, and often fail to accurately represent observed behavior. For example, a common class of concurrent algorithms, known as lock-free algorithms, provides system-wide progress, but cannot guarantee that any specific process will succeed in its operations. Interestingly, these are often the algorithms of choice in practice; despite their weak theoretical progress guarantees, they perform quite well in the majority of systems, displaying quick progress for all participating processes.

This is reminiscent of the gap between theory and practice in other areas of computer science. For example, many problems, like SAT solving and clustering, are known to be NP-hard, but are solved efficiently by practitioners every day. To explain this, a different approach in theoretical computer science, known as Beyond Worst Case Analysis [257], restricts the allowable inputs or input distributions. In concurrency, the difficulties stem from the worst-case adversarial scheduler, so a 'beyond worst case' approach in concurrency means restricting the power of the scheduler. For example, some work aims to relax the adversarial scheduling assumptions by replacing the adversary with a random scheduler [15, 16], which picks the next process to take a step according to some predetermined distribution over the processes. However, since real systems are not actually random, such models can misrepresent the schedules that are likely to arise in practice. Furthermore, even random schedulers cannot explain some basic practical phenomena that stump adversarial analysis.

1.1.1 Analyzing Backoff

An interesting example of practical algorithms whose success cannot be explained by existing models is algorithms that use exponential backoff. Backoff protocols, in which processes wait for increasing amounts of time before retrying a failed operation, are known to significantly improve the performance of many concurrent algorithms in practice [20, 148]. The idea behind backoff is to reduce contention; when many processes attempt to access the same memory at the same time, they are all delayed, and most of the attempted accesses fail. By waiting before retrying, a process is less likely to encounter contention again in its next attempt. However, while backoff is well understood in other settings [33, 52, 54, 117], shared-memory adversarial models cannot explain the practical advantages that backoff brings. An adversary could simply stall all processes until they all complete their backoff, and then proceed as if the backoff protocol were not there at all. Random schedulers have also so far failed to model the possibility that a process might prevent itself from participating in the protocol for a certain amount of time.

In Chapter 2, we address this issue by restricting the power of the scheduler in a different way. The first step in finding the right restrictions is to understand what causes asynchrony in the first place. The delays experienced by a process in a real system can be roughly divided into three categories: system delays, caused by interrupts; hardware delays, caused by cache coherence protocols and contention; and self-imposed delays, in which the algorithm itself waits, as is done in backoff protocols. Usually, all of these concerns are grouped into one, and modeled as the adversary; in our work, we separate them.

We define a modular model, in which at any point in time, each process belongs to exactly one of three sets, corresponding to the source of the delay it is experiencing. Processes move from one source of delay to another as they progress in the execution of their instructions, starting at self-delay before they begin executing a shared instruction, moving to system delay once a new instruction is invoked, then hardware delay as their instruction is being executed, and finally moving back to self-delay after they complete an instruction. Different policies govern the movement of processes among each of the categories of delay; if an adversary controls all three categories, this model boils down to the classic adversarial scheduler. However, the adversary can also be restricted to control just some of these categories, or can be given different power in each.

For the purpose of our analysis in Chapter 2, we instantiate the modular model with the following simple policies. Processes in the self-delay category control their own movement into the system-delay category; they move there whenever they invoke a new instruction. We let an adversary control the movement of processes from the system delay to the hardware delay. For hardware delays, we define conflicts between process instructions, and assume a FIFO priority order; in every time step, every process p that does not conflict with anyone that started its hardware delay before p gets to complete its instruction and move back to the self-delay category. We say two instructions conflict if and only if they access the same location and one of them intends to modify that memory location.¹ In real machines, the hardware delay stage blocks the process; it cannot do anything else while it experiences hardware delays due to contention. Thus, we charge the time a process spends in the hardware delay phase as work that the process was forced to do. The total work of an algorithm is calculated as the sum over all time steps of the number of processes in the hardware delay phase. We describe this model in more detail in Section 2.2.

We show that with this restriction on the allowable schedules, we can now analyze what has evaded rigorous characterization in the past; we provide the first analysis of backoff protocols in shared memory. In particular, we consider a simple contended scenario: n processes attempt to update the same location, each repeatedly trying until the first time it succeeds. We show that this simple protocol costs Ω(n³) total work until all processes terminate if no backoff is used. However, if classic exponential backoff is applied, then Θ(n² log n) work suffices. Thus, we are able to capture a theoretical separation between the naïve protocol and one that uses backoff.

Equipped with this insight, we then present a new backoff protocol that achieves better performance under our model than classic exponential backoff. The idea of this protocol is to make use of read instructions, which cause fewer conflicts and therefore spend less time in the hardware delay phase, to gauge the amount of contention in the system. In every new iteration of the protocol, a process flips a coin to determine whether to try to update the memory location or to simply read it. Its probability for heads is then adjusted depending on what it observed when accessing that location. We show that this simple idea improves on the work bound for this protocol; our algorithm, called adaptive probabilities, allows all processes to successfully update the same memory location within O(n²) work.

¹ This conflict definition reflects cache coherence protocols, which allow a memory location to be held in 'shared mode' by several processes at once if they are all just reading the memory, but requires 'exclusive mode' for any process whose instruction will modify that location. Similar conflict models have been used in the past [116].
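To make the contended-update scenario concrete, the following is a minimal C++ sketch of the update loop with exponential backoff. The thesis's analysis is in terms of hardware-delay time steps rather than wall-clock sleeps, so the delay cap and the sleep-based waiting here are illustrative assumptions only.

```cpp
#include <atomic>
#include <chrono>
#include <random>
#include <thread>

// Sketch of the contended-update scenario from Chapter 2: each process tries
// to install its own value into one shared word, retrying until it succeeds.
std::atomic<int> shared_word{0};

void update_with_backoff(int my_id) {
    std::mt19937 rng(my_id);
    int expected = shared_word.load();
    uint64_t delay = 1;                       // backoff window, in microseconds (illustrative)
    while (!shared_word.compare_exchange_strong(expected, my_id)) {
        // Failed attempt: wait a random time in [0, delay], then double the window.
        std::uniform_int_distribution<uint64_t> dist(0, delay);
        std::this_thread::sleep_for(std::chrono::microseconds(dist(rng)));
        delay = std::min<uint64_t>(delay * 2, 1 << 16);
        expected = shared_word.load();        // re-read before the next attempt
    }
}
```

The adaptive-probabilities protocol of Chapter 2 instead replaces the sleep with a biased coin flip between a plain read and an update attempt, adjusting the bias based on the value observed at the contended location.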

1.1.2 Measuring Real Schedules

The hardware delays that lead to contention are poorly understood, and can actually differ depending on the architecture on which an algorithm is run. While the hardware-delay policy we used in Chapter 2 provides one general way to model these delays, we could benefit from having a deeper understanding. Such an understanding could directly lead to better analyses, by abstracting it into a contention model and plugging it into the modular framework of Chapter 2.

To address this need, in Chapter 3 we present a tool, called Severus, that helps to view and analyze schedules produced by real hardware executions [50]. A user can run this tool on their own machine, and use the results to tailor their model assumptions to that machine to arrive at better predictive models. Through the development of this tool, and by analyzing the schedules of different machines, we also uncovered phenomena that are counter-intuitive, and were not documented before. In particular, we study contended executions on non-uniform memory access (NUMA) architectures. In NUMA architectures, which are now commonplace, cores are organized into groups called nodes, and each node has cache as well as main memory. All cores can access all shared caches and memory, through an interconnect network between the nodes. However, accesses to cache and memory in a core's own node (local accesses) are faster than accesses to the cache or memory of a different node (remote accesses).

We run Severus on two different NUMA machines; the first is an AMD machine with 8 NUMA nodes, and the second is an Intel machine with 4 NUMA nodes. Each machine employs a different cache coherence protocol. Interestingly, we show that, while the machines exhibit seemingly very different scheduling patterns, some key phenomena remain the same. For example, we show that in highly-contended workloads, a process's throughput increases when it accesses memory that is physically far from it (i.e., on a different NUMA node, thus requiring a higher latency). We also show that in general, regardless of how long an execution is run, the schedules on both machines are not fair, i.e., some processes are perpetually biased against, and get fewer opportunities to modify a contended memory location.

Creating a tool such as Severus is highly non-trivial. The difficulty can be thought of as a Schrödinger's Cat problem; the mere act of trying to observe the schedule of concurrent operations can significantly perturb it. Indeed, accessing a global timestamp to indicate the order of instructions can take tens of nanoseconds (whereas faster instructions take just a handful of nanoseconds) and can be highly contended. Thus, the order of instructions as documented by any timestamping mechanism is not very reflective of the order that would have occurred without the timestamps. We carefully avoid this problem by using the values that processes write into the contended memory itself to establish order; each process writes its own id, and uses a local log to document the id that was in the memory each time it reads that location. We later reconstruct a global ordering by merging the local logs. We believe that this work can lead to an understanding of how contention truly affects workloads, and can allow us to abstract our observations into an accurate contention model.
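As an illustration of this id-logging idea (the names and the use of an atomic exchange below are my own simplification, not Severus's actual interface), each thread can record only which id it displaced from the contended word, keeping timestamps out of the hot loop entirely:

```cpp
#include <atomic>
#include <vector>

// Each thread repeatedly swaps its own id into the contended word and records
// locally which id it displaced. No timestamps are taken, so the schedule is
// perturbed as little as possible.
std::atomic<int> contended{-1};

void worker(int my_id, int ops, std::vector<int>& local_log) {
    local_log.reserve(ops);
    for (int i = 0; i < ops; i++) {
        int prev = contended.exchange(my_id);  // the one contended instruction
        local_log.push_back(prev);             // record the predecessor locally
    }
}
// Offline, the logs form "who came right before me" links, from which a single
// global sequence of successful writes to `contended` can be stitched together.
```

Merging the per-thread logs offline yields the global order of successful writes, at the cost of only one local append per contended operation.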

1.1.3 Leveraging Structure in Nested Parallel Programs

In Chapter 4, we take a different approach for improving the analysis of concurrent algorithms. We note that sometimes, understanding the underlying hardware and modifying the model accordingly isn't necessary for achieving meaningful analyses of concurrent algorithms; there may be other factors that limit the possible schedules that an algorithm must handle. Although concurrent shared memory can be chaotic and difficult to model, sometimes we use concurrent algorithms in a context that is more structured.

As a concrete example, consider concurrent algorithms that are used within a nested parallel program. Nested parallelism is widely used in many systems to take advantage of inherent parallelism in a task, and is broadly available in a number of modern programming languages and language extensions [79, 120, 124, 159, 161, 181, 199, 203]. In a nested parallel program, new parallel tasks can be created and removed using programming primitives, which govern the number of processes that may participate in the execution; fork and async create new parallel tasks, allowing new processes to join the computation to execute these tasks. Similarly, the join and finish primitives represent synchronization points, where parallel tasks are merged, terminating the processes that executed them. In this way, these programs impose a structure of their own on a concurrent execution.

Nested parallel computations can be modeled by a series-parallel directed acyclic graph, or sp-dag, where the vertices represent parallel tasks, and edges represent dependencies between them. A task cannot be executed unless all tasks that it depends on have completed; that is, all vertices with edges into a vertex v must complete their execution before v can begin. To efficiently run a nested parallel computation, an efficient sp-dag data structure must be implemented, which allows creating new tasks and keeping track of the number of dependencies remaining on each task.

In Chapter 4, we design an sp-dag structure that, when used in nested parallel programs, guarantees that all operations on it complete in amortized constant time, even when accounting for contention. Such a result was not known before; indeed, there was reason to believe that this would be impossible. The key algorithm behind an sp-dag data structure is a concurrent counter for the remaining dependencies on each vertex. This counter should be incremented by different processes when new dependencies are spawned, and decremented when dependent tasks are completed. However, there is a known lower bound showing that, when accounting for contention, any concurrent counter among n processes requires Ω(n) stalls for some process p. That is, in any execution, for some process p, Ω(n) processes must access the same location as p since p executed its previous instruction [116].

Crucially, for sp-dags, we do not need the full interface of a counter. In particular, the important part is only to know when a vertex in the dag has zero dependencies remaining, but we do not care about the precise count at all times. This allows us to circumvent the lower bound and use a data structure known as a scalable non-zero indicator (SNZI), which only gives an indication of whether or not the current count on it is zero [111]. While the SNZI data structure was previously introduced as a way to reduce contention when implementing counters, no contention analysis was provided. In Chapter 4, we extend and modify the original SNZI data structure and show that, when used in an sp-dag for nested parallel computations, our version of it guarantees a constant amortized number of contention stalls per increment and decrement. This data structure is the key component that makes our larger sp-dag data structure achieve these guarantees. The work presented in Chapter 4 highlights the importance of remembering the context in which an algorithm will be run when designing and analyzing it.
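To fix the interface in question, here is a trivial C++ placeholder for the dependency counter that an sp-dag vertex needs; it only exposes a zero/non-zero indication, which is what a SNZI-style object provides. This plain atomic version is purely illustrative and has none of the contention bounds of the dynamic SNZI developed in Chapter 4.

```cpp
#include <atomic>

// Interface an sp-dag vertex needs from its dependency counter: only "did the
// count reach zero?" matters, never the exact value.
class InCounter {
    std::atomic<long> count{0};
public:
    void increment() { count.fetch_add(1); }           // a new dependency is spawned
    // Returns true if this decrement took the count to zero, i.e., the vertex
    // is now ready to execute.
    bool decrement() { return count.fetch_sub(1) == 1; }
    bool is_zero() const { return count.load() == 0; }
};
```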

1.2 Part 2: Non Volatile Memory

The theory of multiprocessor systems not only needs to accurately reflect technology that has existed for many years, but also must adapt to changing features. In Part II of this thesis, we consider non-volatile random access memory (NVRAM) and use theoretical models to find constructions that make use of its new features to improve concurrent algorithms.

NVRAM refers to a class of new main memory technologies that offer byte-addressable persistent memory at speeds comparable with DRAM. That is, upon a system fault (e.g., loss of power) in a machine with NVRAM, data stored in main memory will not be lost. This is in contrast to previous non-volatile disks, which are around 3 orders of magnitude slower to access, and are accessed at a much coarser granularity. NVRAM now co-exists with DRAM on the newest Intel machines, and may largely replace or augment DRAM in the future. Thus, NVRAM yields a new model and opportunity for programs running on such machines—we could potentially persist programs and data significantly faster than before.

However, without further technological advancements, caches and registers are expected to remain volatile, losing their contents upon a crash. This creates a challenge when programming with NVRAM for persistence. Simply running a program that was built for DRAM on NVRAM does not make it able to recover after a system fault, since some of its data would have been stored on registers and cache, and would therefore be lost. Furthermore, running programs built for persistence on disk doesn't work either; many updates can be done atomically on disk because of its block granularity, and updates never unintentionally overwrite previous contents. This is not the case on NVRAM. Thus, the key question remains—how can we take advantage of persistent main memory to make a program recover after a fault?

There has been a surge of recent work designing algorithms and transactional memories that use NVRAM to recover after a system fault [74, 89, 92, 100, 122, 219, 278]. Redesigning programs to be persistent involves two parts; first, we must ensure that the state of memory after a fault is consistent, that is, could have resulted from an execution without a fault. However, even if the state of main memory is consistent, this might not be enough to be able to continue the execution after the system faults. In particular, when a program runs, it relies on key data like the program counter and the return value of function calls to determine what to do next. If this information is lost, the program cannot continue. Therefore, the second component for making programs persistent is ensuring that programs can continue their execution approximately where they left off.

In general, keeping the memory in a consistent state can be easy; for lock-free algorithms, for example, it is enough to flush every memory location accessed, immediately after accessing it [166]. However, we should aim to minimize injected flush instructions, since flushes are notoriously expensive. Nevertheless, we must be careful not to flush too little, since this can lead to an inconsistent state of memory, from which it is impossible to recover after a crash. The problem of flushing just enough to maintain correctness has been the subject of significant work in recent years [89, 100, 122, 123, 202].
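For a sense of what the "flush after every access" discipline looks like in code, here is a minimal C++ sketch, assuming an x86 CPU with CLWB support (and compilation with that support enabled). Persistent algorithms in practice aim to issue far fewer flushes than this.

```cpp
#include <atomic>
#include <immintrin.h>

// Write the cache line containing addr back to persistent memory and wait
// until the write-back is ordered. This is the per-access persistence step
// described for lock-free algorithms on NVRAM [166].
inline void persist(const void* addr) {
    _mm_clwb(const_cast<void*>(addr));  // cache-line write-back
    _mm_sfence();                       // order the write-back before later stores
}

bool persistent_cas(std::atomic<long>* word, long expected, long desired) {
    bool ok = word->compare_exchange_strong(expected, desired);
    persist(word);  // make the (possibly updated) location durable before proceeding
    return ok;
}
```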

1.2.1 Delay-Free Persistent Simulations

In Chapter 5, we focus on addressing the second issue: how to ensure that we can continue the execution of a persistent program after a system fault, even when the state of memory is guaranteed to be consistent. To decouple the two concerns, we adopt the Parallel Persistent Memory (PPM) model of Blelloch et al. [59]. This model assumes that all shared memory accesses are done directly on persistent memory, thereby guaranteeing that the memory is in a consistent state at all times. However, the model assumes that each process owns a private ephemeral memory that, like caches and registers on real machines, loses its contents upon a fault. The process can write any local variables on its ephemeral memory, and register values like the program counter and return values of shared-memory instructions are stored on its ephemeral memory. The process can decide to persist values in its ephemeral memory by writing them on persistent memory.

The problem of continuing the execution after a fault can be challenging, especially in the concurrent setting. Concurrent algorithms are designed under the assumption that each instruction in them is executed exactly once. Repeating an instruction or skipping it altogether can easily damage the safety of an algorithm. Furthermore, when designing persistent algorithms that can continue executing after a fault, it is desirable to minimize the changes that are made; decades of research into concurrent algorithms have led to designs that are highly optimized. It is therefore preferable to keep the old algorithms as a base, and change their structure as little as possible.

Our approach to solving this problem is to create persistent linked simulations, which replace each instruction of the original non-persistent version of an algorithm with a simulation that has the same effect. This helps maintain the original algorithm's structure. The simulation ensures that instructions are only repeated after a fault if it is safe to do so. Our main result is showing that for any concurrent program written with compare-and-swap (CAS) and read instructions, there is a simulation of it that introduces only constant overhead, and allows recovering to the same place in the execution within a constant number of steps. That is, our simulations are virtually delay free. We also show how to optimize this simulation to be efficient in practice, by reducing the overhead as much as possible.
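One ingredient of such simulations, sketched below in C++, is making a CAS recoverable by tagging the stored value with the writer's id and a per-process sequence number, so that after a fault a process can check whether its pending CAS already took effect instead of blindly re-executing it. The field widths and the recovery check are illustrative assumptions; the full construction in Chapter 5 also handles the case where the value has since been overwritten by other processes.

```cpp
#include <atomic>
#include <cstdint>

// A value tagged with the id and sequence number of the process that wrote it.
struct Tagged {
    uint64_t raw;
    static Tagged make(uint32_t pid, uint16_t seq, uint16_t val) {
        return {(uint64_t(pid) << 32) | (uint64_t(seq) << 16) | val};
    }
    uint32_t pid() const { return uint32_t(raw >> 32); }
    uint16_t seq() const { return uint16_t(raw >> 16); }
};

bool recoverable_cas(std::atomic<uint64_t>& word, Tagged expected, Tagged desired) {
    uint64_t e = expected.raw;
    return word.compare_exchange_strong(e, desired.raw);
}

// After a fault, a process that had a CAS in flight with tag (my_pid, my_seq)
// can re-read the word and tell whether that CAS is already reflected in memory.
bool my_cas_is_visible(const std::atomic<uint64_t>& word, uint32_t my_pid, uint16_t my_seq) {
    Tagged cur{word.load()};
    return cur.pid() == my_pid && cur.seq() == my_seq;
}
```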

1.3 Part 3: Remote Direct Memory Access

Another interesting technology that offers non-traditional features is Remote Direct Memory Access (RDMA). Although RDMA has been available for a while, it has mostly been used just for high performance computing, and has only recently been adopted in data centers. RDMA allows fast read and write access to a remote machine's memory by bypassing the remote CPU. Given this capability, researchers are working on understanding how to best use RDMA for fast communication. Many recent papers in the systems community have shown large performance benefits gained by replacing standard messaging technology with RDMA-based versions [42, 106, 171, 173, 250, 279]. However, it has so far been unclear whether these speedups are due to a fundamental difference between RDMA and previous hardware generations. In fact, recent work has shown that classic messaging remote procedure calls can be adapted to achieve almost the same performance as new RDMA-based algorithms [174].

In Part III of the thesis, we make steps towards a theoretical understanding of RDMA technology. Notably, this part of the thesis departs from the concurrent shared-memory setting in favor of a distributed setting in which processes are on physically separate machines. Such a setting is usually modeled with message passing, where processes communicate by sending and receiving messages over an asynchronous network. However, RDMA challenges this assumption, since it allows processes to also communicate via direct memory accesses. We therefore first focus on faithfully modeling RDMA's capabilities.

When designing a new theoretical model, it can be difficult to decide what details to account for, and which ones should be ignored in the name of keeping the model clean, intuitive, and useful. In this thesis, we focus on modeling the shared-memory power of remote memory accesses, as well as some other key features: RDMA's ability to dynamically open and destroy connections, its secure nature, and the cost of maintaining multiple connections. We assume we have a fully-connected message-passing network, on top of which we are allowed to pick shared-memory edges (representing RDMA links) to add, thus making it a hybrid network. We also allow algorithms to specify different access permissions to different parts of the memory, and to change the shared-memory edges dynamically. We call this model the message-and-memory, or M&M, model, as it unites two very well studied communication models from the literature: message passing and shared memory. Interestingly, the simplicity of the M&M model means that it can be applicable not only to RDMA, but to several other memory and message technologies, including disaggregated memory and GenZ.

Using the M&M model, we show that there is a fundamental advantage to using an RDMA network over message-passing-only networks. We do so by considering the classic consensus problem as a lens through which to study RDMA. In consensus, the goal is for several processes to agree on the same output, despite network asynchrony and the possible failures of some of the processes. Well-known bounds have shown that in message passing, consensus cannot be solved if half or more of the processes in the network have crashed, or if a third or more of the processes in the network experience Byzantine failures (i.e., behave maliciously). In Chapters 7 and 8, we present algorithms in the M&M model that solve consensus in a more fault-tolerant manner; we can tolerate more than half of the network processes crashing, and more than a third of the network processes being Byzantine. The focus of these two chapters is slightly different, with Chapter 7 considering how to use RDMA while scaling to large networks, and Chapter 8 focusing on harnessing the full power RDMA can bring to a small well-connected network.
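To make the model concrete, a toy representation might look as follows in C++ (the type and method names are my own, purely for illustration; the formal model is defined in Chapter 6): every pair of processes can exchange messages, and a chosen subset of pairs additionally shares memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Toy view of an M&M (message-and-memory) network: message passing is
// all-to-all, and a chosen set of edges also shares memory (e.g., RDMA links).
struct MMNetwork {
    std::size_t n;                                // number of processes
    std::set<std::pair<int, int>> memory_edges;   // undirected shared-memory edges

    void add_memory_edge(int p, int q) {
        memory_edges.insert({std::min(p, q), std::max(p, q)});
    }
    // Processes with which p can communicate through shared memory; every other
    // process is still reachable by message passing.
    std::vector<int> memory_neighbors(int p) const {
        std::vector<int> out;
        for (auto [a, b] : memory_edges) {
            if (a == p) out.push_back(b);
            else if (b == p) out.push_back(a);
        }
        return out;
    }
};
```

This separation is what Chapter 7 exploits: the shared-memory neighbors of a process are exactly the processes with which it can run a shared-memory consensus sub-protocol.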

1.3.1 Scaling to Large Networks

In Chapter 7, we consider how to increase the fault tolerance of consensus algorithms in large message-and-memory networks.

Shared-memory systems have worse scalability than message-passing systems due to hardware limitations. For example, a typical shared-memory system today has tens to thousands of processes, while message-passing systems can be much larger (e.g., map-reduce systems with tens of thousands of hosts [102], peer-to-peer systems with hundreds of thousands of hosts, the SMTP email system and DNS with hundreds of millions of hosts, etc). This limitation is also apparent in RDMA networks. If a single node in the network maintains many open RDMA connections, significant slowdowns can be observed for RDMA operations on that node.

To model efficient networks with many nodes, we therefore think of an RDMA-enabled system as a system in which all machines can communicate with each other over message passing (all-to-all links), but only subsets of the system can share memory. Under this model, we consider the problem of crash-tolerant consensus, and show that even with a few shared-memory connections, consensus can be solved with higher fault tolerance than is achievable with message passing alone.

Our approach is to first show a consensus algorithm that operates on an M&M network, making use of shared-memory edges that are available, but which guarantees correctness regardless of how many shared-memory connections it has. The key insight is that shared memory can solve crash-tolerant consensus in a system where arbitrarily many processes may fail, whereas message-passing networks require a majority of the processes to be active. We therefore run a message-passing algorithm in the network, but have each process run a shared-memory consensus algorithm with its shared-memory neighbors in each round to agree on their values. Each process then sends not only its own message, but also the messages of its neighbors.

Thus, the fault tolerance of our algorithm depends on the shared-memory connections of the graph. In particular, we show that the fault tolerance achieved in a given M&M network G depends on the vertex expansion ratio of G. The vertex expansion ratio of a graph is a measure of how well connected the graph is. There are known graph constructions that guarantee high vertex expansion and low maximum degree. Thus, using such a construction to choose the shared-memory connections in an M&M network yields a highly fault tolerant, highly scalable consensus algorithm.

1.3.2 Dynamic Permissions in Small Networks

In Chapter 8, we focus on small networks that can handle all-to-all RDMA connections without slowdowns. We consider the power that RDMA can bring not only from simply sharing memory, but also from its ability to provide dynamic access permissions to different parts of the memory. We show that RDMA can improve the fundamental trade-off between fault tolerance and performance that exists in message-passing and shared-memory algorithms. In particular, each process in an RDMA network can divide its memory into regions, and can choose, for each region r and each process p in the system, whether to give read access, write access, both, or neither for p to access r.

We study how this feature helps in solving two variants of consensus: a crash-tolerant version like the one considered in Chapter 7, and a Byzantine-tolerant version, which tolerates processes that become malicious and actively try to break the protocol. For both types of consensus, we consider two metrics; the number of process failures that can be tolerated, and the best-case, or common-case, running time that is achievable. Common-case running time is a measure of how quickly a process can reach a decision in an instance of consensus in an execution in which no failures happen and messages are synchronous. However, the protocol must be able to tolerate failures and asynchrony, and cannot know a priori that it will be experiencing a good run. This way of measuring the performance of a consensus algorithm is important in practice, since, while protocols must be designed to tolerate worst-case conditions, most of the time the network is well-behaved.

We show that dynamic permissions in RDMA can be leveraged to achieve better common-case performance than what is possible in shared memory systems, while achieving better fault tolerance than what is possible in message-passing systems. This is true for both the crash-tolerant and the Byzantine-tolerant versions of the problem. That is, RDMA can leverage its shared memory and permission features to get the best of both worlds between shared memory and message passing.

More specifically, we measure common-case running time by the number of network delays it takes until the first process reaches a decision on the consensus value. A network delay is the amount of time it takes for a message sent by some process p to be received by some process q. Since the network might be asynchronous, the actual time a single network delay takes may vary greatly. However, network delays give a good way to measure the amount of communication required in a protocol, without relying on specific network parameters.

For Byzantine-tolerant consensus, we show that it is possible to have an algorithm that tolerates f Byzantine failures in a system with n ≥ 2f + 1 processes, and that reaches its decision within two network delays. The best possible solution in message passing can only work in a network with n ≥ 3f + 1 processes, and also reaches its decision within two network delays [31]. In shared memory, it was not known how to solve Byzantine consensus. We show it is possible to solve with n ≥ 2f + 1, but doing so would require at least four network delays. Similarly, for crash-tolerant consensus, we present an algorithm in the M&M model that tolerates f < n failures and terminates in two network delays in the common case. This matches the best possible fault tolerance of shared memory and the best possible performance of message passing, while neither of the classic models can match our algorithm in the other metric.

For both crash- and Byzantine-tolerant consensus, the key to achieving this high common-case performance is the ability to dynamically change permissions. In both, this ability is used to give exclusive write access to some memory region to a single process at a time. That process is considered the leader while it holds the exclusive write permission. If the leader is suspected by some other process to have failed, its exclusive permission is revoked. However, if not, the leader can reach a decision very quickly, as long as it succeeds in writing its value without its permission being revoked.

Thus, in Chapters 7 and 8, we prove that RDMA is indeed more powerful than classic communication primitives, thereby giving strong motivation to continue to use and develop RDMA technology. Follow-up work uses insights from the results presented in Chapter 8 to design a state machine replication system that significantly outperforms competitors [14].
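The following C++ sketch illustrates the role of exclusive write permission in the crash-tolerant protocol. The RdmaRegion type and its methods are hypothetical stand-ins (a real implementation would use RDMA memory-region permissions rather than a local atomic); the sketch only shows why a leader that writes without losing its permission can decide after a single round trip.

```cpp
#include <atomic>
#include <cstdint>
#include <optional>

// Stand-in for an RDMA memory region with per-process write permission.
struct RdmaRegion {
    std::atomic<int> writer{-1};     // id of the process with exclusive write access
    std::atomic<uint64_t> slot{0};   // the replicated value

    void grant_exclusive_write(int p) { writer.store(p); }   // revokes any previous writer
    // Returns false if the caller no longer holds write permission.
    bool write(int p, uint64_t value) {
        if (writer.load() != p) return false;                // permission was revoked
        slot.store(value);
        return writer.load() == p;                           // still the writer afterwards
    }
};

// A leader that succeeds in writing without losing its permission can decide
// its value; a process that suspects the leader first has the permission
// switched to itself before proposing.
std::optional<uint64_t> try_decide(RdmaRegion& region, int my_id, uint64_t my_value) {
    if (region.write(my_id, my_value)) return my_value;
    return std::nullopt;
}
```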

1.3.3 State Machine Replication with RDMA In Chapter 9, we put our theory to the test, and implemented an RDMA-based state machine replication (SMR) system using the ideas developed in our theoretical work. In particular, in Chapter 8, we showed that crash-tolerant consensus can be solved in RDMA in two network delays, which can also be thought of as a single round trip, in the common case. We use this insight to guide our design of the SMR system. To understand SMR, consider the following client-server setting: a server maintains a state machine, and handles client requests in the form of state machine operations. The server orders client requests and executes them on the state machine, replying to the client with the output of their state machine operations. In this setting, if the server experiences a failure or slowdown, the entire system becomes unavailable, with client requests going unanswered, until the server is recovered. In many applications, the risk of long periods of unavailability is too high. In such cases, state machine replication (SMR) is often implemented to boost availability. The idea behind SMR is to have not one, but several servers, which replicate client requests among themselves and coordinate to agree on the order in which these requests should be serviced. Then, each server locally keeps an up-to-date copy of the state machine. Now, if one server fails, the other servers can take over the execution, masking most of the recovery time from the client. At its core, an SMR system relies on a consensus protocol that is executed among the servers to agree on requests. This is where insights from our theoretical work comes in, since we studied consensus in RDMA and understood how to design a highly efficient algorithm in Chapter 8. However, unlike the single-shot consensus protocol presented in Chapter 8, an SMR consen- sus protocol is long-lived. That is, instead of agreeing on a single value, the consensus protocol is continuously used to agree on consecutive values as more requests enter the system. While a single-shot consensus protocol can be repeatedly used to implement a long-lived version, such an approach neglects many opportunities to optimize along the way. Furthermore, while the con- sensus algorithm of Chapter 8 only ensured that the first process reach a decision quickly, in an

SMR system, it is important that all processes know the decision value and update their state machines accordingly. The protocol used for Mu, the SMR system in Chapter 9, therefore uses insights from the theoretical work, but is carefully designed to optimize for practical concerns. Indeed, we optimize Mu not only for common-case executions, in which we ensure the latency is just that of a single RDMA write operation, but also for fail-over time. That is, we ensure that when a server, and in particular the leader server of the protocol, fails, a new leader takes over as quickly as possible. Fail-over time is composed of two factors: the first is failure detection, i.e., recognizing that a failure happened in the first place, and the second is leader election, i.e., choosing a server to be the new leader and having that new leader restart the execution. Failure detection depends on the variance in network and process latency that is expected, since detecting failures too eagerly can result in false positives. We develop a failure detection mechanism that depends only on process speed variance and not that of the network, thereby allowing us to detect failures much more quickly than other systems. For leader election, our main bottleneck is switching RDMA permissions, as discussed in Chapter 8. We study various ways of implementing permission switches in RDMA, and show that a switch can be accomplished in only a couple of hundred microseconds. Our resulting implementation outperforms state-of-the-art RDMA-based competitors by a factor of 2.5–6× in the common case, and by more than an order of magnitude when recovering from server failures. This highlights the impact that reasoning about algorithms in theory can have on practical performance.
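As a rough illustration of the SMR idea described above (not Mu's actual interfaces), each server can be viewed as a log of agreed-upon requests plus a local state machine that applies that log in order. The toy counter state machine and all names below are assumptions made for this sketch.

#include <cstdint>
#include <vector>

// Toy replicated counter; Mu itself replicates an arbitrary application.
struct StateMachine {
    int64_t value = 0;
    int64_t apply(int64_t delta) { return value += delta; }  // one client request
};

// Each server holds a copy of the agreed-upon log and applies it in order.
// In an RDMA-based system the log entries arrive via remote writes; here the
// log is a plain vector to keep the sketch self-contained.
struct Replica {
    StateMachine sm;
    std::vector<int64_t> log;   // requests in the order consensus produced
    size_t applied = 0;         // prefix of the log already executed

    // Apply newly agreed entries; every replica reaches the same state because
    // all replicas apply the same log in the same order.
    void catch_up() {
        while (applied < log.size()) sm.apply(log[applied++]);
    }
};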

1.4 Summary of Contributions

In summary, this thesis narrows the gap between theory and practice in concurrent and distributed computing. Its key contributions are as follows.

Analyzing Concurrent Algorithms. We improve our ability to accurately analyze the running time of concurrent shared-memory algorithms. This is achieved through three approaches: (1) we present a new modular model for calculating contention costs in shared-memory algorithms and demonstrate its effectiveness by analyzing and improving exponential backoff; (2) we present a tool for tracing the operation schedule produced on real machines in a variety of workloads; and (3) we demonstrate that accurate analysis of concurrent algorithms is possible in structured settings by designing and analyzing a provably efficient series-parallel dag structure for nested parallel computations.

NVRAM. We study the Parallel Persistent Memory (PPM) model of NVRAM [59], and present the first general simulation that can convert any concurrent algorithm that uses atomic Read and CAS into a persistent algorithm. The PPM model decouples concerns of memory consistency after a fault from detectability [122], or the ability to continue an execution after the fault. Thus, our simulation is a general way to achieve detectability for concurrent algorithms. We introduce the notions of computation and recovery delays as measurements of the overhead of such a simulation, prove that our simulation is both computation- and recovery-delay free, and present optimizations that make the simulation more practical and work for a large class of algorithms.

                            Message Passing   Shared Memory   RDMA (large networks)   RDMA (small networks)
Crash fault tolerance       f < n/2           f < n           f < (1 − 1/(2h)) · n    f < n
Byzantine fault tolerance   f < n/3           f < n/2         ?                       f < n/2
Complexity                  2                 ≥ 3             ?                       2

Table 1.1: Summary of previously-known and new RDMA results on consensus fault tolerance and complexity, comparing the message-passing and shared-memory models to the M&M RDMA model. New results presented in this thesis are in blue. Complexity is measured in message delays in common-case executions. n and f represent the number of processes in the system and the maximum number of failures, respectively, and h is the vertex expansion ratio of the underlying shared memory graph in an RDMA network.

RDMA. Finally, we present the Message-and-Memory (M&M) model, a model that combines features of the message-passing and the shared-memory models, and which captures the capabilities of RDMA in data centers. We prove that the M&M model can tolerate more failures when solving consensus than is possible in the message-passing model, and that solutions for the consensus problem can be more efficient and more scalable in the M&M model than in shared memory. Along the way, we prove new results for the tolerance and efficiency of consensus in the shared-memory model. Table 1.1 summarizes these results. Furthermore, we demonstrate that the M&M model indeed captures practical concerns by implementing an RDMA-based state-machine replication (SMR) system using the insights gained from our theoretical study of the M&M model. Our SMR system, Mu, can replicate requests at least 3× faster than other state-of-the-art systems, and can recover from server failures at least an order of magnitude faster.

1.5 Model and Preliminaries

This thesis considers the shared memory concurrent setting and some of its variants. In this section, we present the classic shared memory model, which will form the basis for all the models we consider.

Processes and Instructions. We assume a system of n asynchronous processes, P = {p_1, . . . , p_n}, that communicate with each other by executing machine instructions on shared memory. Shared memory machine instructions may be atomic read, write, compare-and-swap (CAS), test-and-set (TAS), or fetch-and-add (FAA) instructions. A write(X, val) updates the contents of memory location X to val. A CAS(X, old, new) updates the contents of memory location X to new if it contains old, and returns true in this case. Otherwise, the contents of X do not change, and the CAS returns false. We say that a CAS is successful if it returned true. A FAA(X, num) increments the value in X by num, and returns the value stored in X before the increment. A TAS(X) sets the value of X to 1, and returns the previous value of X. The TAS instruction

operates on booleans; the value of X may only ever be 0 or 1. An atomic read(X) returns the value most recently written by a write, TAS, FAA, or successful CAS instruction on memory location X. Other shared memory instructions are possible, but are not considered in this thesis. Processes may also execute local read and write instructions. Local instructions are non-racy; if a process p_i executes a local instruction on memory location ℓ, ℓ cannot be accessed by any other process until p_i explicitly makes ℓ shared. The execution of an instruction by a process p_i is sometimes referred to as p_i taking a step.
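The instructions above correspond closely to the operations of C++ std::atomic; the small program below is my own illustration (not part of the thesis) and simply exercises write, CAS, FAA, TAS, and read on shared locations.

#include <atomic>
#include <cstdio>

std::atomic<int>  X{0};      // shared memory location accessed with atomic instructions
std::atomic<bool> B{false};  // boolean location used for test-and-set

int main() {
    X.store(5);                                          // write(X, 5)
    int old = 5;
    bool ok = X.compare_exchange_strong(old, 7);         // CAS(X, 5, 7): succeeds, returns true
    int before = X.fetch_add(3);                         // FAA(X, 3): returns 7, X becomes 10
    bool prev = B.exchange(true);                        // TAS(B): sets B to 1, returns previous value 0
    int cur = X.load();                                  // read(X): value of the latest write/CAS/FAA
    std::printf("%d %d %d %d\n", ok, before, prev, cur); // prints: 1 7 0 10
}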

The ABA-problem. Usually, CAS instructions are used in conjunction with read instructions; a memory location X is read by process p, and the return value of the read instruction is then used as the ‘old’ value argument for the following CAS instruction by p on X. It is often convenient to think of a CAS instruction as failing if and only if X’s value did not change since p’s most recent read of X. However, this does not always hold; it could be the case that after p read a value v from X, X’s value changed to some v′ and then back to v before p executed its CAS. This scenario is called the ABA-problem2. To be more precise, we say that the ABA-problem occurs if a value that already appeared in some memory location X is written or CASed into X again at a later time. We note, however, that this scenario can be avoided in practice if a sequence number is attached to each value that is written or CASed on each location X, thereby preventing repeated values. Therefore, it is common to assume ABA-freedom, i.e., that the ABA problem cannot occur.
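One common way to realize the sequence-number idea, sketched here under the assumption that values fit in 32 bits, is to pack a counter next to the value so that the bit pattern stored at the location never repeats. The packing scheme and the names below are my own illustration.

#include <atomic>
#include <cstdint>

// Pack a 32-bit value together with a 32-bit sequence number so that the
// word stored in X is never repeated, ruling out the ABA scenario.
static inline uint64_t pack(uint32_t val, uint32_t seq) {
    return (static_cast<uint64_t>(seq) << 32) | val;
}
static inline uint32_t value_of(uint64_t packed) { return static_cast<uint32_t>(packed); }
static inline uint32_t seq_of(uint64_t packed)   { return static_cast<uint32_t>(packed >> 32); }

std::atomic<uint64_t> X{pack(0, 0)};

// CAS in a new value; because the sequence number grows on every successful
// update, the CAS fails exactly when X changed since `expected` was read.
bool versioned_cas(uint64_t expected, uint32_t new_val) {
    uint64_t desired = pack(new_val, seq_of(expected) + 1);
    return X.compare_exchange_strong(expected, desired);
}

With this scheme, assuming ABA-freedom corresponds to assuming the 32-bit sequence number never wraps around at a single location.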

Executions. An execution is a sequence of atomic steps during the run of a program. There are a few levels of abstraction at which one can view an execution. We begin by discussing low-level executions. A low-level execution is a sequence of processes, representing the order in which processes took steps. One can also view this as a sequence of steps, where each step specifies the process that executed it, the memory location on which it was executed, the instruction that was executed, and its arguments. A legal low-level execution E is one in which each instruction i in E returns the correct value assuming that the state of the memory it accesses is the state that results from executing the sequence of steps in E up to i.
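The legality condition can be checked operationally by replaying the steps in order and comparing each reported return value against the memory state built so far. Below is a minimal sketch for executions over read, write, and CAS steps; the step layout and encoding are my own, not the thesis's formalism.

#include <string>
#include <unordered_map>
#include <vector>

// One step of a low-level execution: which process, which location,
// which instruction, its arguments, and the value it claims to have returned.
struct Step {
    int process;
    std::string loc, instr;   // instr: "read", "write", or "cas"
    long arg1 = 0, arg2 = 0;  // write value, or CAS old/new
    long returned = 0;        // value reported by a read, or 1/0 reported by a CAS
};

// Replays the execution and checks that every instruction returned the value
// dictated by the memory state produced by the preceding steps (all locations start at 0).
bool is_legal(const std::vector<Step>& exec) {
    std::unordered_map<std::string, long> mem;
    for (const Step& s : exec) {
        long cur = mem[s.loc];
        if (s.instr == "read") {
            if (s.returned != cur) return false;
        } else if (s.instr == "write") {
            mem[s.loc] = s.arg1;
        } else if (s.instr == "cas") {
            bool ok = (cur == s.arg1);
            if (ok) mem[s.loc] = s.arg2;
            if (s.returned != (ok ? 1 : 0)) return false;
        }
    }
    return true;
}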

Objects. We often consider algorithms that implement concurrent objects, which can intuitively be thought of as data structures. An object has a sequential specification, which defines a set of operation types that can be called on it, and specifies the expected behavior of each operation type when executed sequentially. An implementation of an object is a set of algorithms, one per operation type of the object. Processes interact with objects through events. There are two types of events: an invocation of an operation type op of an object o with input value val, denoted inv(o, p_i, op, val), and the response of the last operation type invoked by p_i on o, with return value res, denoted res(o, p_i, op, res).

High Level Executions and Operations. A high-level execution is a sequence of events. A response r is said to match an invocation i in a high-level execution E if r is the first response

2The name A-B-A represents the different values written in memory location X.

after i in E that has the same process, operation type and object as i. An operation in a high-level execution E is an invocation of an operation type in E with its matching response, if it exists. An operation can be thought of as an instantiation of an operation type3. We say an operation is pending in a high-level execution E if it has been invoked but has not received a matching response in E. An operation is complete in E if it has both an invocation and a matching response. Given two operations op1 and op2 in E, we say that op1 happens before op2, denoted op1 <_E op2, if the matching response of op1 appears before the invocation of op2 in E.
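As a small illustration (my own example, not taken from the thesis), consider the high-level execution

E = inv(o, p1, inc, ⊥), inv(o, p2, inc, ⊥), res(o, p1, inc, 1), res(o, p2, inc, 2), inv(o, p1, read, ⊥), res(o, p1, read, 2).

The two inc operations are concurrent: neither happens before the other, since each is invoked before the other's matching response appears. The read operation by p1, however, happens after both inc operations, because both of their matching responses precede its invocation.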

Execution Projections. The projection of an execution E onto a process p_i, denoted by E|p_i, is the subexecution of E that contains exactly all of the steps or events that involve p_i. Similarly, the projection of E onto an object o, denoted by E|o, is the subexecution of E that contains exactly all of the steps or events that were applied on o (for steps in a low-level execution, o is a memory location rather than an implemented object).

Properties of (High-Level) Executions. A high-level execution is said to be sequential if each invocation is immediately followed by a matching response. Note that in this case, the happens-before relationship is a total order on the operations of a high-level execution. We say that a sequential high-level execution E is legal if, for each object o, E|o satisfies the sequential specification of o. Note that this definition of legality applies to high-level executions only, and is different from the definition of a legal low-level execution. A completion of a high-level execution E is a high-level execution E′ in which, for each pending operation in E, a matching response is appended at the end of E. We say that two low-level executions or high-level executions E and E′ are equivalent if, for each process p_i, E|p_i = E′|p_i.

Linearizability. We say that a high-level execution is linearizable if there is a way to order the operations that respects their happens-before relationship and yields a legal sequential execution. More formally, execution E is linearizable if there exists a completion E′ of E and a legal sequential execution S such that (1) S and E′ are equivalent, and (2) S respects the happens-before order of E′; that is, if op1 <_{E′} op2, then op1 appears before op2 in S. We say that an implementation of an object is

3In a slight abuse of terminology, we sometimes refer to operation types simply as operations when the context is clear.

linearizable if all possible executions that can be produced by processes executing its operation algorithms are linearizable.
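For small histories, linearizability can be checked by brute force: enumerate the total orders of the completed operations, keep those that respect happens-before, and test each against the object's sequential specification. The sketch below does this for a single read/write register initialized to 0; the operation encoding and timestamps are my own simplification, not the thesis's formalism.

#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// A completed operation on a single shared register: its real-time interval
// and its effect (write v) or observed result (read returning v).
struct Op {
    int inv, res;          // invocation and response times, inv < res
    std::string type;      // "write" or "read"
    int value;             // value written, or value returned by the read
};

// Does the sequential order `perm` (indices into ops) satisfy the register's
// sequential specification and respect the happens-before order?
static bool legal_and_respects_order(const std::vector<Op>& ops, const std::vector<int>& perm) {
    std::vector<int> pos(perm.size());
    for (size_t k = 0; k < perm.size(); ++k) pos[perm[k]] = static_cast<int>(k);
    // happens-before: if a responds before b is invoked, a must come first
    for (size_t a = 0; a < ops.size(); ++a)
        for (size_t b = 0; b < ops.size(); ++b)
            if (ops[a].res < ops[b].inv && pos[a] > pos[b]) return false;
    // sequential specification of a register initialized to 0
    int reg = 0;
    for (int idx : perm) {
        if (ops[idx].type == "write") reg = ops[idx].value;
        else if (ops[idx].value != reg) return false;
    }
    return true;
}

// Brute-force linearizability check: only feasible for very small histories.
bool is_linearizable(const std::vector<Op>& ops) {
    std::vector<int> perm(ops.size());
    std::iota(perm.begin(), perm.end(), 0);
    do {
        if (legal_and_respects_order(ops, perm)) return true;
    } while (std::next_permutation(perm.begin(), perm.end()));
    return false;
}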

Adversarial Scheduler. Asynchrony is modeled by an adversarial scheduler, which, at every point in the execution, determines which process will execute the next instruction. Formally, the adversary is a function from the set of low-level executions to the set of processes, which, given a low-level execution, outputs a process. The power of the adversary can be controlled by specifying the amount of information it gets as the input low-level execution; instead of specifying the instruction, location, process, and arguments of every step in the low-level execution, some subset of these can be specified, thereby giving the adversary less power. An oblivious adversary is one that receives only the process id for each step in the execution. An adaptive adversary receives full information.
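One way to picture the difference in adversary strength is through the type of the scheduling function itself; the type sketch below (names are mine) gives the adaptive adversary the full step descriptions and the oblivious adversary only the process ids.

#include <functional>
#include <string>
#include <vector>

// Full description of one step, as seen by an adaptive adversary.
struct StepInfo {
    int process;
    std::string location, instruction;
    long argument;
};

// An adaptive adversary maps the full low-level execution so far to the next
// process to schedule; an oblivious adversary sees only which process took
// each step, not what the step did.
using AdaptiveAdversary  = std::function<int(const std::vector<StepInfo>&)>;
using ObliviousAdversary = std::function<int(const std::vector<int>&)>;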

Progress Conditions. An implementation of an object o is lock-free if, in every infinite execution of the operations of o, infinitely many operations complete. An implementation of an object o is wait-free [149] if, in every infinite execution of the operations of o, every process that takes infinitely many steps completes infinitely many operations. Note that the key difference is that lock-freedom guarantees global progress in the system, since not all processes can get stuck forever, whereas wait-freedom guarantees local progress, in that no process can get stuck [150].
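A standard way to see the distinction is on a shared counter, sketched below as my own illustration of the definitions: the CAS retry loop is lock-free, since a failed CAS means some other increment succeeded, but a single process may retry forever; the fetch-and-add version completes in a bounded number of the caller's own steps and is therefore wait-free.

#include <atomic>

std::atomic<long> counter{0};

// Lock-free increment: if the CAS fails, another process's CAS succeeded,
// so the system as a whole made progress, but this process may retry forever.
void increment_lock_free() {
    long cur = counter.load();
    while (!counter.compare_exchange_weak(cur, cur + 1)) {
        // cur was refreshed with the current value; try again
    }
}

// Wait-free increment: finishes in a single step of the calling process,
// regardless of how the other processes are scheduled.
void increment_wait_free() {
    counter.fetch_add(1);
}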

1.6 Bibliographic Note

Chapter 2 is based on
Naama Ben-David and Guy E. Blelloch. Analyzing contention and backoff in asynchronous shared memory. In ACM Symposium on Principles of Distributed Computing (PODC), pages 53–62. ACM, 2017

Chapter 3 is based on
Naama Ben-David, Ziv Scully, and Guy E. Blelloch. Unfair scheduling patterns in NUMA architectures. In International Conference on Parallel Architecture and Compilation (PACT), pages 205–218. IEEE, 2019

Chapter 4 is based on
Umut A. Acar, Naama Ben-David, and Mike Rainey. Contention in structured concurrency: Provably efficient dynamic non-zero indicators for nested parallelism. ACM Symposium on Principles and Practice of Parallel Programming (PPoPP), 2017

Chapter 5 is based on
Naama Ben-David, Guy E. Blelloch, Michal Friedman, and Yuanhao Wei. Delay-free concurrency on faulty persistent memory. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 253–264, 2019

Chapter 7 is based on
Marcos K. Aguilera, Naama Ben-David, Irina Calciu, Rachid Guerraoui, Erez Petrank, and Sam Toueg. Passing messages while sharing memory. In ACM Symposium on Principles of Distributed Computing (PODC), pages 51–60. ACM, 2018

Chapter 8 is based on
Marcos K. Aguilera, Naama Ben-David, Rachid Guerraoui, Virendra Marathe, and Igor Zablotchi. The impact of RDMA on agreement. In ACM Symposium on Principles of Distributed Computing (PODC), pages 409–418, 2019

Chapter 9 is based on
Marcos K. Aguilera, Naama Ben-David, Rachid Guerraoui, Virendra Marathe, Athanasios Xygkis, and Igor Zablotchi. Microsecond consensus for microsecond applications. In Symposium on Operating Systems Design and Implementation (OSDI), 2020

In an effort to keep the thesis short, several other papers by the author on related topics have been omitted [44, 45, 46, 48, 123].

Part I

Contention in Shared Memory


Chapter 2

Analyzing Backoff

2.1 Introduction

Exponential backoff protocols are commonly used in many settings to avoid contention on a shared resource. They appear in [261], communication channels [52, 53, 117, 118], radio and wireless networks [33, 35], local area networks [230], and even shared memory algorithms [20, 148]. Many backoff protocols have been developed and proved efficient for these communication tasks. The general setting is that many processes want to send a message over a channel (or a network), but the transmission fails if more than one message is sent at the same time. Usually, transmissions are assumed to happen in synchronous rounds, and time is measured by the number of rounds until all messages are successfully transmitted. A round is wasted if more than one process attempts to transmit, but the cost remains constant regardless of how many processes attempted to transmit in that round. The situation is different in a shared memory setting. If many processes attempt to access the same memory location, they all suffer contention, which slows down their response time, and becomes worse as the number of conflicting processes increases. For example, if n processes all access the same memory location at the same time, they queue up behind one another, causing Θ(n) ‘wait time’ for each process on average, for a total of Θ(n^2) wait time, or work. Several theoretical models have been developed to try to capture these costs [110, 113, 131, 132, 146]. However, none of these models have been applied to analyze backoff protocols. The reason is, at least in part, that these models are not well suited for the task. In general, shared memory contention and performance are more difficult to model than channel communication because of the inherent asynchrony in shared memory systems. Asynchrony in shared memory algorithms is usually modeled by assuming a powerful adversary that determines the schedule of instructions. Known models capturing contention, like those of Dwork, Herlihy, and Waarts [110] and Fich, Hendler and Shavit [113], while accounting for contention cost, give too much power to the adversary to be able to get any meaningful worst case bounds on backoff protocols. The model of Dwork et al. assumes every instruction has an invocation and a response, and each instruction suffers contention equivalent to the number of responses to other instructions that occurred between its own invocation and response. Fich et al. assign cost differently; in particular, they assume only modifying instructions that immediately

precede a given instruction i can have an effect on i’s running time. However, both models still allow an adversary to control when instructions’ responses appear in the execution. This means that while these models are reasonable for obtaining lower bounds, they are impractical for attaining meaningful upper bounds. Because of the great power the adversary is given, the advantage of backoff protocols is lost; the adversary can simply wait for processes that are backing off to invoke their next operation, at which point it regains full control, and can determine the conflicts in the execution with as much flexibility as if no backoff was ever attempted. Further work by Hendler and Kutten [146] restricts the power of the adversary in an attempt to capture more likely executions. They present a model similar to that of Dwork et al. [110], but add the notion of k-synchrony on top of it. This requirement states that an execution is k-synchronous if in any subexecution, no process can do k + 1 operations before all other processes do at least one. This property, while attractively simple, does not succeed in capturing all possible real executions. Asynchrony in real systems often arises from processes being swapped out. In such cases, one process can be delayed a great deal with respect to the others, and no reasonable value of k allows for such delays. All of these models have an additional problem: there is no clear notion of time. This further complicates the analysis of time-based protocols such as exponential delay, where processes delay themselves for increasingly longer amounts of time. A different line of work aiming to create a more realistic model for shared memory has focused on relaxing the adversarial scheduler to a stochastic one, leading to tractable analysis of the step complexity of some lock-free algorithms [15, 16]. However, this line of work also relies on strong assumptions on the behavior of the scheduler. In the next chapter, we study the schedules produced in practice and show that such assumptions do not always reflect real executions. Furthermore, such stochastic assumptions on the adversary have so far not directly accounted for contention, rendering the analysis of backoff under this model meaningless. In spite of the difficulty of theoretically analyzing backoff, these protocols have been shown to be effective in combatting shared memory contention. Many systems benefit greatly from the application of an exponential backoff protocol for highly contended memory [20, 135, 148, 227, 233]. For about 30 years, exponential delay has been successfully used in practice to significantly reduce running time in algorithms that suffer from contention in shared memory. In this chapter, we take a first step towards closing this gap between theory and practice. We define a model for analyzing contention in shared memory protocols, and use it to analyze backoff. Our model allows for asynchrony, but maintains a clear notion of time. We do this by giving the adversary full power on one part of the processes’ delay, but taking it away in another part. In particular, we allow each process to have up to one ready instruction, and the adversary determines when to invoke, or activate, each ready instruction. However, once active, instructions have a priority order that the adversary can no longer control. This models the difference between delays caused by the system between instructions of a process (e.g.
caused by an interrupt), and delays incurred after the instruction has been invoked in the hardware (assuming an instruction cannot be interrupted midstream). We then define time steps, in which all active instructions attempt to execute. Furthermore, we define conflicts between instructions; two conflicting instructions cannot be executed in the same time step. An active instruction will be executed in the first time step in which it does not conflict with any higher-priority instruction. In this chapter, we define conflicts as follows: modifying instructions (such as compare-and-swap) conflict with all

other instructions on the same location, but any two non-modifying instructions do not conflict. This reflects the fact that in modern systems, multiple reads to the same location can go through in parallel, but a modifying instruction requires exclusive access to the memory, sequentializing other instructions behind it. This is due to cache coherency protocols, which allow cache lines to be read in shared mode by several processes at once. However, our model allows any definition of conflicts to be plugged in, and can therefore adapt to different architectural concerns. We then consider a situation in which n processes want to update the same memory location. Such a situation arises often in the implementations of various lock-free data structures such as counters and stacks. We use our model to analyze various protocols that achieve such an update. We first show that a naïve approach, which uses a simple read-modify-write loop with no backoff, can lead to a total of Ω(n^3) work, while a classic exponential delay protocol improves that to Θ(n^2 log n). To the best of our knowledge, this is the first analysis of exponential delay for shared memory. Perhaps surprisingly, we show that this protocol is suboptimal. We propose a different protocol, based on adapting probabilities, that achieves O(n^2) work. The key to this improvement stems from using read operations to get a more accurate estimate of the contention that the process is facing. Recall that in our model, reads can go through in parallel, while modifying instructions conflict with others. In our protocol, instead of delaying for increasingly long periods of time when faced with recurring contention, a process becomes increasingly likely to choose to read rather than write. Intuitively, reading the value of a memory location allows a process to gauge the level of contention it’s facing by comparing the new value to the previous value it read. Each process keeps a write-probability that it updates after every operation. It uses this probability to determine its next instruction; i.e., whether it will attempt a compare-and-swap or only read. If the value of the memory does not change between two consecutive reads, the process raises its write-probability. Otherwise, the process lowers it. This constant feedback, and the possibility of raising the probability back up when necessary, is what allows this protocol to do better than exponential delay. In summary, the contributions presented in this chapter are as follows:
• We define a new time-based cost model to account for contention in shared memory;
• We provide the first analysis of the commonly used exponential delay protocol; and
• We propose a new backoff protocol that is asymptotically better than the classic one.

2.2 Model

Our model aims to separate delays caused by contention from other delays that may occur in shared memory systems. In most known shared memory models, all types of delays (or causes of asynchrony) are grouped into one, and are treated by assuming an adversary that controls the order of execution. However, we note two predictable sources of delay: (1) processes often engage in local work that does not affect the scheduling of others, and (2) once an instruction is invoked in hardware, while its performance can greatly vary due to the architecture and contention, it is relatively predictable. In this chapter, we model this delay in a simple way, and in the next chapter, we delve into how architectural design can affect it. Since local instructions do not affect the scheduling of other processes, in this chapter, we use the word instructions to refer

to only those on shared memory. Each process p ∈ P may have up to one pending instruction I_t(p) at any time t. p may be in one of three states at any point in time, depending on its progress in executing its current instruction: idle, ready, or active. p is idle if it currently has no pending instruction. This could mean that p is executing some local work (recall that we don’t consider local work as instructions), or that p is executing the delay prescribed by some backoff protocol. Once p finishes its local work, it invokes an instruction and moves to the ready state. The ready processes represent processes with instructions that are ready to run, but have not yet been invoked in the hardware; the instruction might be delayed because the calling process is swapped out, failed or otherwise delayed by the system. Finally, a process is active if its instruction has already started executing in the hardware, and is now delayed by contention (e.g., executing the cache coherency protocol). In this way, the processes in the system at a time t are partitioned into three subsets: an unordered set of idle processes D_t, an unordered set of ready processes R_t and an ordered set of active processes A_t. Recall that each process p in R_t and A_t has exactly one instruction i_{p,t} associated with it. We therefore sometimes refer to the elements of R_t or A_t as instructions instead of processes. There is a priority order, denoted <, among instructions in A_t. Furthermore, to model contention, we define conflicts between instructions. For any pair of operations i, j ∈ I_t, the predicate C(i, j) is true if i conflicts with j, and is false otherwise. In each time step t, each process may undergo at most one of the following transitions: it may move from D_t to R_{t+1}, from R_t to A_{t+1}, or from A_t to D_{t+1}. The factors that determine each transition are different for each set. Moving from D to R is determined by the process itself; when a process completes its local work, it may invoke its next shared memory instruction, thereby moving itself from D to R. When a process is in the ready set, its movement is controlled by an adversarial scheduler. More specifically, the adversary chooses a (possibly empty) subset S_t ⊆ R_t to activate. It gives a priority order among the instructions in S_t, and adds them to A. There is a merging policy that determines the priority of the elements of S_t with respect to the instructions already in A_t. This merging policy is deterministic, and maintains the invariant that if, for some instructions i, j ∈ A_t, i < j and both i and j are in A_{t+1}, then i < j in A_{t+1}. That is, regardless of the new instructions added to A in time step t, the relative priorities of A_t do not change. Similarly, elements of S_t maintain their relative priorities assigned to them by the adversary when they are merged into A. Thus, the adversary can determine the priorities of instructions when it activates them, but has no control over them once it does so. Finally, in time step t, we define the set of executed instructions to be E_t = {i ∈ A_t | ∄ j ∈ A_t such that C(i, j) ∧ j < i}. That is, an instruction i ∈ A_t is executed in time t if and only if there is no other instruction in A_t that conflicts with i and is ahead of i in the priority order. The processes that invoked the instructions in E_t are the ones that transition from A_t to D_{t+1} at time t. For the purpose of this chapter, we say that C(i, j) is true if and only if i and j both access the same memory location and at least one of them is a modifying instruction.
This function can also represent other types of conflicts, such as false sharing, depending on the behavior of the system we want to model. Furthermore, the merging policy considered in this chapter is simple FIFO ordering; the instructions of S_t get added behind all instructions already in A_t. Execution histories are defined in this model similarly to how they are defined in Section 1.5. An execution history, or simply execution, E is a sequence of the instructions executed by processes in the

system, which defines a total order on the instructions. This total order must be an extension of the partial order defined by time steps; that is, for any two instructions i and j, executed in time t and t′ respectively, if t < t′ then i appears before j in E. We consider two measurements of efficiency for algorithms evaluated by this model. One measurement is immediate: we say the running time of an execution is the number of time steps taken until termination. Another measurement is called work. Intuitively, this measures the total amount of cycles that all processes spent on the execution. We say that a process spends one cycle for every time step t in which I_t(p) ∈ A_t. Thus, the amount of work taken by an execution is given by the sum over all time steps t of the size of A_t (i.e., W = Σ_t |A_t|). All of the bounds in this chapter are given in terms of work. We feel that work better represents the demands of these protocols on the resources of the machine, as inactive processes could be asleep or making progress on some unrelated task.
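To make the definitions above concrete, the following sketch simulates one time step of the model under the conflict relation and FIFO merging policy used in this chapter, charging one unit of work to every active instruction. The data layout is my own, and the sketch omits the idle and ready sets and the adversary's activation choices.

#include <deque>
#include <string>
#include <vector>

// An active instruction: the process that issued it, the location it targets,
// and whether it modifies that location (reads are the only non-modifying case here).
struct Instr {
    int process;
    std::string location;
    bool modifying;
};

// Conflict relation used in this chapter: same location and at least one modifier.
bool conflicts(const Instr& a, const Instr& b) {
    return a.location == b.location && (a.modifying || b.modifying);
}

// One time step on the active set A_t (front of the deque = highest priority).
// An instruction executes iff no higher-priority instruction in A_t conflicts
// with it; every instruction currently in A_t is charged one unit of work.
std::vector<Instr> time_step(std::deque<Instr>& active, long& work) {
    work += static_cast<long>(active.size());
    std::vector<Instr> executed;
    std::deque<Instr> remaining;
    for (size_t i = 0; i < active.size(); ++i) {
        bool blocked = false;
        for (size_t j = 0; j < i; ++j)
            if (conflicts(active[i], active[j])) { blocked = true; break; }
        if (blocked) remaining.push_back(active[i]);
        else executed.push_back(active[i]);
    }
    active.swap(remaining);   // FIFO merging: newly activated instructions are appended at the back
    return executed;
}

Running repeated calls to time_step while instructions are activated into the back of the deque accumulates exactly the work measure W = Σ_t |A_t| used in the bounds below.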

Adversary. We assume an oblivious adversary [32]. That is, the adversary does not see processes’ local values before making them active, and thus cannot make decisions based on that knowledge. Such values include the results of any random coin flips and the type of operation that the process will perform. The adversary can, however, know the algorithm that the processes are following and the history of the execution so far. We also assume that once an instruction is made active, the adversary may see its local values. However, at that point, the adversary is no longer allowed to make changes to the priorities, and thus cannot use that knowledge.

Protocols. In the rest of this chapter, we analyze and compare different backoff protocols. To keep the examples clean and the focus on the backoff protocols themselves rather than where they are used, we consider a simple use case: a read-modify-write update as defined in Algorithm 2.1. An update attempt by process p is one iteration of Algorithm 2.1. In every update attempt, p reads the current value, then uses the Modify function provided to it to calculate the new value it will use for its CAS instruction. The Modify function can represent any change a process might need to make in an algorithm that uses such updates. A successful update attempt is one in which the CAS succeeds and the loop exits. Otherwise, we say that the update attempt failed. For simplicity, we assume that every CAS instruction has a distinct new value, and thus an update attempt fails if and only if there was a successful CAS by a different process between its read and its CAS. This assumption is also known as ABA-freedom. Note that such an update protocol is commonly used in the implementation of shared objects such as counters and stacks.

update(x, modify){
  while true {  // one iteration is an update attempt
    currentVal = Read(x);
    val = modify(currentVal);
    if CAS(x, currentVal, val){ return; } } }

Figure 2.1: Update Loop

update(x, modify){
  maxDelay = 1;
  while true {  // one iteration is an update attempt
    currentVal = Read(x);
    val = modify(currentVal);
    if (CAS(x, currentVal, val)){ return; }
    maxDelay = 2 · maxDelay;
    d = rand(1, maxDelay);
    wait d steps; } }

Figure 2.2: Exponential Delay

2.3 Exponential Backoff

Exponential backoff protocols are widely used in practice to improve the performance of shared-memory concurrent algorithms [20, 148, 227]. However, to the best of our knowledge, there has been no running time analysis of these algorithms that shows why they do so well. In this section, we provide the first analysis of the standard exponential delay algorithm for concurrent algorithms. In general, the exponential backoff protocol works as follows. Each process starts with a maxDelay of 1 (or some other constant), and attempts an update. If it fails, it doubles its maxDelay, and then picks a number d uniformly at random in the range [1, maxDelay]. It then waits d time units, and attempts another update. We refer to this approach as the exponential delay protocol. Exponential delay is common in the literature and is applied in several settings to reduce running time; for example, Mellor-Crummey and Scott use it for barriers [227], Herlihy uses it between successive attempts of lock-free operations [148], and Anderson used it for spin locks [20]. The pseudo-code for exponential delay on an update protocol is given in Algorithm 2.2. Using our model, as defined in Section 2.2, we show that in an asynchronous but timed setting, the exponential delay protocol does indeed significantly improve the running time. In particular, we show that for n processes to each successfully update the same memory location once, the naïve (no-backoff) approach requires Θ(n^3) work, but using exponential delay, this is reduced to Θ(n^2 log n). For simplicity, we assume that each process only needs to successfully write once; after its first successful write, it will not rejoin the ready set. To show the upper bound for exponential delay, we rely on one additional assumption, which is not used in the proofs of the lower bounds.

Assumption 2.3.1. In any time step t where the active set A_t is empty and the ready set R_t is not, the adversary must move a non-empty subset of R_t to A_t.

Analyzing the Naïve Protocol. We consider n processes trying to update the same memory location using a naïve update loop with no backoff. The pseudocode of this protocol is given in Algorithm 2.1. To give a lower bound for this protocol, we imagine a very simple adversarial behavior; whenever any instruction is ready, the adversary immediately activates it. Note that this is not a contrived or improbable adversary. Nevertheless, this scheduling strategy is enough

to show an Ω(n^3) work bound. The intuition is as follows: the active set has around n instructions at all times. A successful update attempt can only happen about once every n time steps, because CAS attempts always conflict, and the active queue is long. The average process will thus have to make around n/2 update attempts before succeeding, and wait around n/2 time steps for each one, thus needing Ω(n^2) work. This gives us the bound of Ω(n^3) work for all processes together.

Theorem 2.3.2. The naïve update protocol requires Ω(n^3) work for n processes to each successfully update one location once.

Proof. Consider an adversary that simply activates every instruction as soon as it becomes ready, and assume that n instructions are ready at the start. Recall that an update attempt can only succeed if there was no successful CAS by any other process between that update attempt’s read and CAS instructions. Furthermore, all CAS instructions conflict with each other, and for a CAS attempt to be successful, there have to be no other successful CAS instructions while it waits in the active set to be executed. We partition the execution into rounds. Each round has exactly one successful CAS in it (the i-th successful CAS is in round R_i), and ends immediately after this CAS. The execution ends after round R_n. Note that R_1 is short; all processes start with their read instructions, which takes one time step since reads do not conflict with each other, and in the next time step there is a successful CAS. However, after R_1, all processes that have not yet terminated will have a failed CAS instruction in each round. In particular, every round R_i for i ≥ 2 has at least n − i failed CAS attempts in it. We denote by Load(R_i) the minimum number of instructions in the active set at any time step in R_i, and by Length(R_i) the total number of time steps in round R_i. Note that for each round R_i, Load(R_i) ≥ n − i. This is because, in the naïve update attempt policy, no time is spent in the idle phase, and by assumption in this proof, the adversary immediately activates each ready instruction. So each process immediately returns to the active set with a new instruction after leaving the active set. Furthermore, as argued above, each round R_i has at least n − i failed CAS attempts, and since all CAS attempts conflict, this means that Length(R_i) ≥ n − i. Recall that the work of an execution is equal to the sum over all time steps of the size of the active set. Thus, we sum over all rounds and all time steps in them, to get

W ≥ Σ_{i=1}^{n} Load(R_i) · Length(R_i) ≥ Σ_{i=1}^{n} (n − i)^2 > Σ_{i=1}^{n/2} (n/2)^2 ∈ Ω(n^3).

2.3.1 Analyzing Exponential Delay

Recall that delays executed by process p during the protocol translate to time steps p spends in the idle set before becoming ready.

The Lower Bound. To show a lower bound of Ω(n^2 log n) on the running time of an update loop using the exponential delay protocol, we simply need to present a strategy for the adversary that forces the algorithm to take that long.

Theorem 2.3.3. The standard exponential delay algorithm takes Ω(n^2 log n) work for n processes to each successfully update one location once.

Proof. Consider an adversary that simply places every operation in the active set as soon as it becomes ready, and assume that n operations are ready at the start. Recall that work is defined as the sum over all time steps of the length of the active set. For every process p, let M_r be p’s maxDelay and D_r be its actual delay after its r-th update attempt. Note that the adversary’s strategy is to activate p immediately after D_r steps. However, regardless of D_r, in this proof we only start charging work for p’s presence in the active set M_r steps after its r-th update attempt, or we charge zero if the process already finished after M_r steps. Thus, we actually charge for less work than is being done in reality. Note that at time step t, there will be n − d_t processes in the active set, where d_t is the number of processes that are done by time t or are delayed at time t. Then note that any single process p can only do at most one update attempt per n time steps. This is because all update attempts contain a CAS, and thus conflict with each other under our model. Furthermore, after p executes one update attempt, it must delay for maxDelay time steps, and then wait behind every other instruction in the active set. The active set starts with n instructions at the beginning of the execution. This also means that successful CAS instructions happen at most once every n time steps. Every time a process executes an update attempt, assuming it does not terminate, it doubles its maxDelay. As noted above, for every process p, this happens about once every n time steps. As delays grow, there are proportionally many processes that are delayed at any given time t. We can model this as d_t ≤ 2^{⌈t/n⌉+1}. We now calculate the amount of work W this execution will take.

W = Σ_t |A_t| = Σ_t max{n − d_t, 0}
  ≥ Σ_t max{n − 2^{⌈t/n⌉+1}, 0}
  = n · (n − 4) + n · (n − 8) + ... + n · (n − n)
  = Ω(n^2 log n).

The Upper Bound. Recall that for the purpose of this analysis, we use Assumption 2.3.1. Given this assumption, we show that once every process’s max delay reaches 12n, the algorithm terminates within O(n log n) work w.h.p. This is because sufficiently many time slots are empty to allow quick progress. Thus, the adversary can only expand the work while not all processes’ maxDelays have reached at least 12n. We show that the adversary cannot prevent this from happening for more than O(n^2 log n) work w.h.p.

Lemma 2.3.4. Consider n processes attempting to update the same memory location, using the exponential delay algorithm, and assume all of them have a max delay of at least S by some time t. Then in any segment of length S starting after time t in the execution, there will be at most 2n update attempts in expectation and w.h.p.

Proof. For each process p, let R(p, k) be the event that process p was ready with a new update attempt k times during the segment.

Then the probability that p will attempt k updates in the segment is bounded by:

P[R(p, k)] ≤ ∏_{j=1}^{k} S / (2^{j−1} · S) = ∏_{j=1}^{k} 1/2^{j−1} = 1/2^{k(k−1)/2}

The above calculation assumes that every time p attempts to execute and then restarts its delay, it gets to start over at the beginning of the segment. This assumption is clearly too strong, and means that in reality the probability of attempting k updates within one segment decays even faster as k grows. However, for our purposes, this bound suffices. Furthermore, note that the probability that any single process p is ready at least 2 · (√(log n) + 1) times is:

P[R(p, 2 · (√(log n) + 1))] ≤ 1 / 2^{(√(log n)+1) · √(log n)} < 1/n.

Thus, no process will be ready in any one segment more than O(√(log n)) times w.h.p. Let N be the number of update attempts by all the processes during the segment. We can calculate the expected number of update attempts in the segment as follows:

E[N] < n · Σ_{k=1}^{S} P[R(p, k)] = n · Σ_{k=1}^{S} 1/2^{k(k−1)/2} < (7/4) n

Using Hoeffding’s inequality, we can similarly bound the number of update attempts in the segment with high probability:

P(X − E[X] ≥ δ) ≤ exp(− 2n^2 δ^2 / Σ_{i=1}^{n} (max_i − min_i)^2)

P(X ≥ (15/8) n) ≤ exp(− n^2 / (32n (√(log n))^2)) = exp(− n / (32 log n))

Therefore, the number of update attempts ready by processes with max delay at least S during any segment of length S is bounded by (15/8) n < 2n in expectation and with high probability.

Lemma 2.3.5. Consider n processes attempting to update the same memory location, using the exponential delay algorithm. Within O(n log n) time steps and O(n^2 log n) work, all processes will have a max delay of at least 12n, with high probability.

Proof. We partition the set of processes P into three subsets – small-delay, large-delay and done – according to the processes’ state in the execution, and update these sets as the execution proceeds. For a given time step t, let S_t be the set of small-delay processes, whose max delay is small, i.e., maxDelay < 12n. Similarly, let L_t be the set of large-delay processes, whose max delay is large, i.e., maxDelay ≥ 12n, and let D_t be the set of processes that are done (i.e., have already

CASed successfully and terminated by time t). Note that S_0 = P, L_0 = D_0 = ∅; that is, at the beginning of the execution, all processes have a small max delay. Furthermore, by definition, the execution ends on the first time step t such that D_t = P and S_t = L_t = ∅. We want to bound the amount of work in the execution until the first time step t such that L_t ∪ D_t = P. To do so, we need to show that small-delay processes make progress regularly. That is, we need to show that the adversary cannot prevent small-delay processes from raising their delays by keeping them in the ready set for long. By Assumption 2.3.1, if only small-delay processes are in the ready set R_t at some time t, then the adversary must activate at least one of them. Thus, we show that in any execution, a constant fraction of the time steps have no large-delay processes ready. We split up the execution into segments of length 12n time steps each. By Lemma 2.3.4, in each such segment starting at time t, there are at most 2|L_t| update attempts by processes in L_t with high probability. Thus, even if |L_t| = Θ(n), in every segment of size 12n, there is a constant fraction of time steps (at least 10n with high probability) in which no process in L_t is ready. Furthermore, note that since their max delays are < 12n, every process in S_t must be ready at least once in every segment. Recall that the adversary must make at least one instruction active in every step in which there is at least one instruction that is ready. Thus, allowing for processes in S_t to ‘miss’ the empty time steps in one segment, we have the following progress guarantee: at least once every two segments of length 12n, at least n instructions of processes with small delays will be executed. Note that every process needs to execute log 12n instructions in order to have a large delay. Therefore, for all processes to arrive at a large delay, we need n log 12n executions of instructions that belong to processes with a small delay. By the above progress guarantee, we know that every 24n time steps, at least n such instructions are executed. Thus, in at most 24n log 12n time steps, all processes will reach a delay of at least 12n, with high probability. Note that in every time step t, there can be at most n instructions in the active set, A_t. Thus, the total work for this part of the execution is bounded by W = Σ_{t=1}^{24n log 12n} |A_t| = O(n^2 log n) with high probability.

Lemma 2.3.6. Consider n processes attempting to update the same memory location, using the exponential delay algorithm. If the max delay of every process is at least 12n, then the algorithm will terminate within O(n log n) additional work with high probability.

Proof. We can apply the analysis of linear probing in hash tables [185] to this problem. Every process has to find its own time slot in which to execute its instruction, and if it collides with another process on some time slot, it will remain in the active set, and try again in the next one. We think of the update attempts of the processes as elements being inserted into a hash table, and the time slots as the bins of the hash table. If a process’s instruction collides with another’s then it will ‘probe’ the next time slot, until it finds an empty slot in which to execute. Counting the number of probes that each instruction does gives us exactly the amount of work done in the execution. For the purpose of this analysis, we assume that an update attempt succeeds if and only if neither its read nor its CAS collide with any other instruction. If either one of them does, we assume that this process will have to try again. Furthermore, since each update attempt involves two instructions, we put two time slots in a bin; that is, we assume the number of bins is half of

the minimum max delay of the processes (in this case, we know all processes have max delay ≥ 12n). Thus, we effectively double the load factor in our analysis to account for the two instructions of the update attempts. Thus, if we have k ≤ n update attempts, and all of their processes have max delay at least m, then the maximum amount of work for one update attempt of any process is at most the number of probes an element would require to be inserted into a hash table with load factor α = 2k/m. We start with n processes participating, all of them having a max delay of at least 12n. By Lemma 2.3.4, we know that there are at most 2n update attempts in 12n time steps, with high probability. Thus, our load factor is α = (2 · 2n)/12n = 1/3. Let X_i be the amount of work that process p_i does for one update attempt. Let X = (1/n) Σ_i X_i. It is well known [185] that in this case, the expected number of probes an insertion does in linear probing is at most

E[X] = (1/2) (1 + (1/(1 − α))^2) = (1/2) (1 + 9/4) = 13/8.

Thus, by Markov’s inequality,

P[X ≥ 15/8] ≤ (8 E[X]) / 15 = 13/15.

Thus, X < 15/8 with probability at least 2/15. Note that the amount of work per process is integral and at least one (when it experiences no collisions). Therefore, a simple averaging argument shows that at least 1/8 of the processes cost only 1, and thus experience no collisions, with probability at least 2/15. Any process that experiences no collisions terminates in that step. Therefore, with constant probability, a constant fraction of the processes terminate out of every n update attempts. We can imagine repeatedly having rounds in which the remaining processes are inserted into a hash table of the same size. Note that the expected cost for n processes to do one update attempt is at most 16n/13. Furthermore, note that this cost is in reality even better, since process delays increase and the number of inserted instructions decreases, both causing fewer collisions as the load factor α decreases. However, even if we do not take the decreasing load factor into account, a constant factor of the processes terminate in every such ‘round’. Note that since we do not account for the improvements from the previous rounds when calculating the expected cost of round i, the rounds are independent. Thus, we can apply the Chernoff bound to show that within O(log n) rounds, all processes successfully update, costing a total of O(n log n) work with high probability. The lemmas above immediately lead to the following theorem.

Theorem 2.3.7. The standard exponential delay algorithm takes O(n^2 log n) work with high probability for n processes to each successfully update the same location once.

2.4 A New Approach: Adaptive Probability

We present a new protocol that beats the standard exponential delay protocol under our model. The pseudocode for our algorithm is given in Algorithm 2.3. We say that every iteration of the while loop executed by process p is an update attempt, or a step by p.

Each process maintains a probability prob, which it updates in every step. All processes start with prob = 1. Each process also keeps track of the value val that was returned by its most recent read. On each step, p tries to CAS with probability prob into location x, and if successful it is done. Otherwise it re-reads the value at x; if it changed (indicating contention) it halves its probability, and if it did not change it doubles its probability. Intuitively, this protocol lets processes receive more frequent feedback on the state of the system than exponential delay (by reading the contended memory), and hence processes can adjust to different levels of contention faster. Furthermore, ‘backing-on’ allows processes to recover more quickly from previous high loads. Since reads do not conflict with each other in our model, reads between CASes are coalesced into a single time step, and do not significantly affect work, as our analysis will show. In particular, we show that for n processes to successfully update a single memory location following our protocol, it takes O(n^2) work in expectation and with high probability.

update(x, modify){
  currentVal = Read(x);
  prob = 1;
  while true {  // one iteration is an update attempt
    heads = flip(prob);
    if (heads) {
      val = modify(currentVal);
      if (CAS(x, currentVal, val)){ return; } }
    newVal = Read(x);
    if (newVal == currentVal){
      prob = min(1, 2 · prob);
    } else {
      prob = prob/2;
      currentVal = newVal; } } }

Figure 2.3: Adaptive Probability

2.4.1 Analysis

The most important factor for understanding the behaviour of our algorithm is the CAS-probabilities of the processes. These probabilities determine the expected number of reads and CAS attempts each process does during the execution, and thus they greatly influence the overall work. We first analyze an execution for a single process in isolation, and then consider the progress of the execution as a whole, when all processes participate. Consider any process p ∈ P as it executes Algorithm 2.3. Let E be any legal execution, and E|p be the execution restricted to only p’s instructions. For the rest of this discussion, and for the proof of Lemma 2.4.1, when we discuss numbered steps, we mean p’s steps in E|p. Recall that a step is defined to be a single iteration of the while loop (or a single update attempt), and thus every step may contain more than one instruction. For every process p ∈ P, we define p’s state at step t, s(t), to be i if p’s probability at step t is 2^{−i}. Note that p’s state is never the same in two consecutive steps. Furthermore, as long as p doesn’t terminate, p’s state can only change by one in every step. That is, if s(t) = i, then s(t + 1) = i + 1 or s(t + 1) = i − 1. We can model a process p’s states over

[Figure: state transition system for a single process. From state i, the process Goes Right to state i + 1 with probability ρ, Goes Left to state i − 1 with probability (1 − ρ)(1 − 1/2^i), and Leaves (terminates) with probability (1 − ρ)/2^i.]

Figure 2.4: Simple state transition system for one process’s execution. The transitions p can make are labeled with their name and their transition probability. The probability ρ is determined by the adversary.

the execution using a simple state transition system (Figure 2.4). In addition to these states, we need one state in the chain to represent p terminating. The transition probabilities between p’s probability states depend on the other processes in the system. Note that in our algorithm, p decreases its probability in step t, or transitions to the right, if and only if there has been a successful update by some other process between p’s read and its CAS. Otherwise, p’s state transition will depend on its own CAS-probability—if it CASes, it successfully updates and terminates, and otherwise it goes left. Thus, p’s state transition depends on whether other processes successfully update. This is heavily controlled by the adversary. Recall that we assume an oblivious adversary. That is, the adversary cannot base its decision of which processes to activate on their local values. In particular, that means that we do not allow the adversary to know the result of a process’s coin flips, and thus whether it will read or CAS in the next step. However, we do allow the adversary to observe the instruction being done once that instruction is in the active set, and thus keep track of each process’s history to know its state at every time step. Therefore, the adversary can decide which instructions to activate given the probability states of their processes. In particular, when considering a single process p’s state transition, the adversary can decide how many processes it activates between two consecutive steps of p and how likely those processes are to successfully update. Thus, we can abstract the adversary’s power by saying it can pick any probability ρ for how likely the state is to go right, i.e., conflict with another process and decrease its probability. Note that in reality, if all participating processes are using the same protocol, the adversary will be more restricted than this; however, despite the power we are giving the adversary, this abstraction still allows us to prove the first important lemma of our analysis.

Lemma 2.4.1. Every process executes at most 4 CAS attempts in expectation before terminating.

For this lemma, we will focus on any one process p ∈ P, and track its movement in the state transition system during a given execution E. In the rest of this section, unless otherwise indicated, all analysis refers to process p and execution E of Algorithm 2.3. We denote by R(i) the number of ‘right’ state transitions of p from state i to state i + 1 in E, and by L(i) the number of ‘left’ state transitions of p from i to i − 1 in E. We first make an observation about these state transitions.

Observation 2.4.2. For every state i, L(i + 1) ≤ R(i) + 1.

That is, the number of transitions from i + 1 to i is at most the number of transitions from

i to i + 1, plus 1. This can be easily seen by considering, for every time p transitions from i to i + 1, how it got to state i. p starts at state 0. Thus, for every state i > 0, the first time p reaches state i + 1 is by going right from state i. However, for every subsequent time p transitions right from i to i + 1, it must have, at some point, gone left from i + 1 to i. Thus, the number of times p transitions from i to i + 1 is at most one more than the number of times it transitions from i + 1 to i.

Proof of Lemma 2.4.1. Consider execution E|p; that is, the execution E restricted to just the instructions of p, and let C_E be the number of CAS attempts in this execution. We partition E|p into two buckets, and analyze each separately; for every state i, we place the first time p reaches i into the first bucket, denoted Init, and place every remaining instruction in the execution into the second bucket, called Main. Thus, C_E = C_I + C_M, where C_I and C_M correspond to the cost (number of CAS attempts) of Init and Main respectively. Note that Init corresponds to an execution in which p only goes down in probability. In every state i it reaches, it has probability 1/2^i of attempting a CAS, and thus expected cost of 1/2^i for that state. Thus, we can easily calculate its expected cost.

\[ E[C_I] = \sum_{i=0}^{\infty} 2^{-i} \le 2. \]

We now consider the Main bucket of the execution E|p. There are two random variables that affect the steps that p takes. The adversary can affect whether p goes right or not, by picking a value for ρ (see Figure 2.4). However, if p does not go right, then only p's coin flip determines whether it will go left or leave (this is independent of any choice of the adversary). We will charge the right transitions, which the adversary can affect, against the left-or-leave transitions, which the adversary cannot. We can do this since for every left transition from a state i in the Main bucket, there is at most one corresponding right transition from i, when p later goes back to the right through state i (there might be none if it terminates at a position to the left). We define D to be the sequence of times t in Main in which p does not go right, and use j to indicate indices into this sequence (to avoid confusion with i for state and t for position in E|p). Let C_{M,j} be the random variable for the remaining cost of Main after the jth step in D, but not including the final successful CAS. If just before the jth step in D the process is at state i, there is a c_j = 1/2^i probability that the process will CAS and terminate. Thus, there is a 1 − c_j probability that the execution will continue. We let R_j be the random variable indicating that the future right transition assigned to j does a CAS. The expected cost (probability) of CASing when going right from state i is 1/2^i, but since there might not be a future right transition, E[R_j] ≤ c_j. We now have the following recurrence for the expectation of C_{M,j}:

\[
E[C_{M,j}] = E[R_j] + (1 - c_j)\,E[C_{M,j+1}] \le c_j + (1 - c_j)\,E[C_{M,j+1}].
\]

The c_j's represent the results of independent coin flips, and therefore the expectations can be multiplied. We therefore have:

\begin{align*}
E[C_{M,1}] &\le c_1 + (1 - c_1)\,E[C_{M,2}] \\
&= c_1 + (1 - c_1)\bigl(c_2 + (1 - c_2)(c_3 + (1 - c_3)(\ldots))\bigr) \\
&= 1 - (1 - c_1) + (1 - c_1)\,c_2 + (1 - c_1)(1 - c_2)\,c_3 + \ldots \\
&= 1 - (1 - c_1)(1 - c_2) + (1 - c_1)(1 - c_2)\,c_3 + \ldots \\
&= 1 - (1 - c_1)(1 - c_2)(1 - c_3)\cdots \le 1.
\end{align*}

Recall that in addition to this cost, there is exactly one successful CAS that causes p to terminate, so C_M = C_{M,1} + 1. Adding this to the cost of the two buckets of the execution, we can bound the total expected cost of E|p by E[C_E] = E[C_I + C_M] ≤ 2 + 1 + 1 = 4.

We now consider the total number of steps that p takes in E. Recall that a step, or an update attempt, is defined as an iteration of the loop in Algorithm 2.3, even if the process does not execute a CAS. We first show that the maximum number of steps of p depends on the number of successful updates by other processes. For this purpose, we stop considering E|p in isolation, and instead look at the execution E as a whole.

Lemma 2.4.3 (Throughput lemma). For any process p and state i, let t and t′ be two times in E at which p executes a read instruction that causes it to transition to state i. Then the number of steps p takes between t and t′ is at most 2k, where k is the number of successful updates in the execution between t and t′.

Proof. Let E′ ⊆ E be a sub-execution where the first and last instructions in E′ are by process p, and both bring p to state i. Let k be the number of successful updates that occur during E′. We say that p observes a successful update w if p decreased its probability due to the value written by w. We note that for p to return to state i, it has to raise its probability exactly as many times as it decreases it, counting from the last time it was at state i. Note that p can only decrease its probability if it observes a new successful update. Since it must change its probability on every step, p raises its probability on every step in which it does not observe a new update. Thus, one observed update's effect gets 'canceled out' by one update attempt of p in which it did not observe any successful updates. Clearly, p can observe at most k successful updates during E′. Thus, it can make at most k update attempts raising its probability, for a total of at most 2k steps during E′.

Corollary 2.4.4. Every process makes at most 2n update attempts during an execution of Algorithm 2.3, where n is the number of successful updates in the execution.

Proof. Consider any process p ∈ P and recall that every process starts the execution at state i = 0. At this state, p has probability 1 of attempting a CAS. If, at any time, p is in state 0 and no successful update happens before its next step, then p succeeds and terminates. Note that there are at most n − 1 successful updates by other processes during p's execution, since each process only updates once. By Lemma 2.4.3, if p takes 2n − 2 steps without terminating, then p is back at state 0, and all other processes have executed successful updates. Thus, p's next step is a CAS that succeeds, since all other processes have terminated. Therefore, p can make at most 2n update attempts before it succeeds.

We are now ready to analyze the entire execution.

Theorem 2.4.5. Algorithm 2.3 takes O(n^2) work in expectation for n processes to each successfully update the same location once.

Proof. Note that by Corollary 2.4.4, every process executing Algorithm 2.3 does at most O(n) instructions until it terminates. Thus, in the entire execution, there are O(n^2) instructions. Furthermore, by Lemma 2.4.1, only O(n) of these are CAS attempts in expectation. The rest must be read instructions. Recall that W = Σ_t |A_t|, and that for all times t, |A_t| ≤ n. Furthermore, recall that CAS instructions conflict with every other instruction, but read instructions do not conflict with each other. Thus, a CAS instruction w executed at time t can cause every instruction i ∈ A_t such that i

We can extend our results to hold with high probability. For this, the key is to show that there is a linear number of CAS attempts with high probability in every execution; the rest of the results then carry through.

Lemma 2.4.6. There are O(n) CAS attempts in any execution with high probability.

Proof. We first show that, with high probability, no process attempts more than O(log n) CAS instructions. To do this, we define an indicator variable X_i = 1 if p executes a CAS instruction in its ith update attempt, and X_i = 0 otherwise. Note that X_i is independent of X_j for all i ≠ j. Furthermore, let X = Σ_i X_i. In Lemma 2.4.1, we show that E[X] ≤ 4. We can now use the Chernoff inequality to bound the probability of a single process attempting more than c log n CAS instructions for any constant c.

\[
P\bigl[X > (1 + c\log n)\cdot 4\bigr] \le e^{-\frac{4c\log n}{3}} = \frac{1}{n^{4c/3}}.
\]

Thus, every process attempts at most O(log n) CAS instructions w.h.p. We can now use this bound on the number of CAS attempts per process when bounding the probability of getting more than a linear number of CAS attempts in the system in any execution. We number the processes and define, for each process p_i, Y_i = w_i, where w_i is the number of CAS attempts p_i executes in E. We assume that for every i, Y_i ∈ [0, c log n] for some constant c. Let \bar{Y} = (1/n) Σ_{i=1}^{n} Y_i. Then by Lemma 2.4.1, E[\bar{Y}] ≤ 4. Plugging this into Hoeffding's inequality, we get
\[
P\bigl[\bar{Y} \ge 5\bigr] \le e^{-\frac{2n}{c\log^2 n}}.
\]
Therefore, with high probability, there are at most 5n CAS attempts in the execution E.
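For reference, a hedged restatement of the standard concentration bounds invoked in this proof (generic textbook forms, not taken from the thesis): the Chernoff bound for a sum X of independent indicator variables with E[X] ≤ μ, and Hoeffding's inequality for independent variables Y_i ∈ [a_i, b_i] with average \bar{Y}. Instantiating the first with μ = 4 and δ = c log n gives the single-process bound displayed above; instantiating the second with Y_i ∈ [0, c log n] and t = 1 gives a bound of the displayed form, up to the constant in the exponent.

\[
P\bigl[X \ge (1+\delta)\mu\bigr] \le e^{-\delta\mu/3} \quad (\delta \ge 1),
\qquad
P\bigl[\bar{Y} - E[\bar{Y}] \ge t\bigr] \le \exp\!\left(-\frac{2n^2t^2}{\sum_{i=1}^{n}(b_i - a_i)^2}\right).
\]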

We thus have the following theorem.

Theorem 2.4.7. Algorithm 2.3 takes O(n^2) work in expectation and with high probability for n processes to each successfully update the same location once.

2.5 Related Work: Other Contention Models

Contention has been the subject of a substantial amount of previous work. Many researchers noted its effect on the performance of algorithms in shared memory and used backoff to mitigate its cost [20, 135, 148, 227]. Anderson [20] used exponential delay to improve the performance of spin locks. Herlihy [148] used a very similar exponential delay protocol in read-modify-write loops of lock-free and wait-free algorithms. However, all of these studies, while showing empirical evidence of the effectiveness of exponential delay, do not provide any theoretical analysis. Studying the effect of contention in theory requires a good model to account for its cost.

Gibbons, Matias and Ramachandran first introduced contention into PRAM models [131, 132]. They observed that the CRCW PRAM is not realistic, and instead defined the QRQW PRAM. This model allows for concurrent reads and writes, but queues them up at every memory location, such that the cost incurred is proportional to the number of such operations on that location. Their model also allows processes to pipeline instructions. That is, processes are allowed to have multiple instructions enqueued on the memory at the same time. Instructions incur a delay at least as large as the length of the queue at the time they join.

Dwork, Herlihy and Waarts [110] defined memory stalls to account for contention. In their model, each atomic instruction on a base object has an invocation and a response. An instruction experiences a stall if, between its invocation and response, another instruction on the same location receives its response. Thus, if many operations access the same location concurrently, they could incur many stalls, capturing the effect of contention. In this model, the adversary is given more power than in ours. In particular, we can imagine our model's 'activation' of an instruction as its invocation. Our model then allows the adversary to control the order of invocations of the instructions, but not their responses. Furthermore, Dwork et al.'s model does not distinguish between different types of instructions; all are assumed to conflict if they are on the same location. Thus, our model is similar to a version of Dwork et al.'s model that restricts the allowed executions according to instructions' invocation order and only considers conflicts with modifying instructions. However, their model still cannot capture exponential delay, since it has no direct measure of time.

Fich, Hendler and Shavit [113] defined a different version of memory stalls, based on Dwork et al.'s work. Their model distinguishes between instructions, or events, that may modify the memory and ones that may not. An instruction is only slowed down by modifying instructions done by different processes on the same memory location. They show a lower bound on the number of stalls incurred by a single instruction in any implementation of a large class of objects. In our model, we consider the same types of conflicts as Fich et al. do. However, Fich et al. use the same fully adversarial model as in Dwork et al.'s work. Thus, our model allows only a subset of the executions allowed by theirs. However, even given the same execution, the two models do not assign it the same cost. In particular, in Fich et al.'s model, the only instructions that stall a given instruction i are the modifying instructions immediately preceding it. Thus, if there is at least one read instruction between any constant number of modifying instructions, each instruction will only suffer a constant number of stalls.
The same is not true for our model.

Further work in this area was done by Hendler and Kutten [146]. Their work introduces a model that restricts the power of the adversary. They define the notion of k-synchrony, which, intuitively, states that no process is allowed to be more than a factor of k faster than any other process.

They consider a model which assigns a cost to a given linearized execution by considering the dependencies between the instructions. While in their analysis they assume that all instructions on the same location contend, they discuss the possibility of treating read instructions differently than writes.

Atalar et al. [26] also address conflicts in shared memory and their effect on algorithm performance. They consider a class of lock-free algorithms whose operations do some local work, and then repeat a 'Read-CAS loop' (similar to our definition of an update attempt) until they succeed. Their proposed model divides conflicts into two types: hardware and logical conflicts. This division is reminiscent of our model's ready vs. active instructions. However, the two classifications do not address the same sources of delays.

Contention has been studied in many settings other than shared memory as well. Bar-Yehuda, Goldreich and Itai [35] applied exponential backoff to the problem of broadcasting a message over multi-hop radio networks. In this setting, there are synchronous rounds in which nodes on a graph may transmit a message to all of their neighbors. A node gets a message in a given round if and only if exactly one of its neighbors transmitted in that round. They show that using exponential backoff, broadcast can be solved exponentially faster than by any deterministic broadcast algorithm. Radio networks have since been extensively studied, and many new algorithms using backoff have been developed [34, 129, 140].

Similar backoff protocols have been considered for communication channels. Bender et al. [52] consider a simple channel with adversarial disruptions. They show that using a backoff/back-on approach, in which the probability of transmission is allowed to rise in some cases, achieves better throughput. This is similar to the approach taken by our adaptive probability protocol. An in-depth study of backoff on multiple access channels is given by Bender et al. in [53]. They introduce a new backoff protocol which achieves good throughput, polylog transmission attempts, and robustness to a disruptive adversary. Further work has been done on different varieties of communication channels [33, 54, 117, 118].

The models considered in these papers are similar to each other (with some variations on the initial state of the system, how collisions are handled, and the adversary's power), but differ from shared memory in several important ways. Firstly, these communication networks generally assume synchronous rounds, whereas this is not reasonable for shared memory. Furthermore, unlike shared memory, running time in networks and communication channels is measured by the number of rounds for an algorithm to complete, regardless of how many transmissions are sent per round. However, in shared memory the cost grows with the number of attempted operations. Another difference is the behavior upon collision; in shared memory, one modifying operation still goes through, and slows down the rest, whereas in other communication models, no message can be transmitted in that round. Furthermore, a single 'message' in shared memory can take multiple steps (such as an update that is implemented with two instructions). These differences make it hard to apply results from other models directly to shared memory.

Chapter 3

Measuring Scheduling Patterns under Contention

3.1 Introduction

Creating pragmatic concurrent programs is essential for making the best use of modern multicore systems. When considering what constitutes a pragmatic program, designers often aim for high throughput, but another important feature is fairness among the cores participating in the algorithm. Fairness is sometimes a goal in its own right, such as in multicore web servers and other applications where each individual core's responsiveness is important. Even outside of such use cases, fairness can be important as a prerequisite for performance. Parallel programs in which work is statically assigned to cores, as is routine when using POSIX Threads1 or OpenMP2, often have synchronization barriers, at which point the last core to complete its work is the performance bottleneck. Such programs run faster if there is fairness among cores.

A large body of work has focused on designing algorithms that are lock-free or have other fairness guarantees [20, 58, 149, 227, 274]. However, as already discussed earlier in this thesis, due to a lack of an understanding of memory operation scheduling, lock-free algorithms are typically designed with an adversarial scheduler in mind, meaning memory operations can happen in any order consistent with the memory model. While this guarantees correctness on any hardware, it leads to overly pessimistic predictions of performance and fairness. A recent line of work aims to relax adversarial scheduling assumptions to better reflect reality [15, 16, 26, 27, 43, 150]. It is well-known that if the hardware schedule guarantees fairness properties, then algorithms can be faster, simpler, and more powerful [43, 109, 223]. However, it is unclear whether such assumptions are realistic. Thus, to understand the performance of lock-free algorithms, we must study the scheduling of memory operations in hardware.

Let us first consider the kinds of demands that most concurrent lock-free algorithms make on the scheduler. Many lock-free algorithms have the structure shown in Algorithm 3.1 [16, 27]. All cores run parallel work (line 3), which they do independently, and then synchronize in an atomic modify section (lines 4–7). Note that this atomic modify section is similar to the update attempt

1https://ieeexplore.ieee.org/document/8277153/ 2https://www.openmp.org

39 1 success = false 2 while true{ 3 parallel_work(); 4 while (not success) { 5 currentVal = Read(x); 6 val = modify(currentVal); 7 success = CAS(x, currentVal, val); } }

Figure 3.1: Generic lock-free algorithm (simplified).

considered in Chapter 2. In this section, a core executes a modification of location x that must not be interrupted by any other core's modification of x. Thus, the ordering, or schedule, of reads and CASes of x has a large impact on the fairness and performance of the algorithm. Intuitively, a good schedule has:
• Long-term fairness: we want each core to perform the same number of read and successful CAS instructions over any sufficiently long period of time.
• Short-term focus: for performance, whenever a core reads x, we want it to execute its following CAS without other cores performing any read or CAS instructions in between.
Having outlined what a good memory operation schedule looks like, we ask: what do memory operation schedules look like on modern hardware? Do practical schedules have the fairness and focus properties we want for lock-free algorithms?

Unfortunately, this is a difficult question to answer because the complexity of modern memory hierarchies makes scheduling patterns difficult to predict. Design decisions in aspects such as the cache coherence protocol and non-uniform memory access (NUMA) can have a drastic impact on the schedule. However, exactly how different designs correspond to scheduling patterns is unclear, especially when multiple features interact with one another. For example, it is well known that the latency of a local-node cache hit is much lower than that of a remote-node cache hit [138]. This encourages the design of NUMA-aware algorithms [58, 69, 70, 210] that minimize remote-node memory accesses. However, recent work on arbitration policies in the processor-interconnect [269] shows that when most but not all memory accesses are local (which is exactly the situation for many NUMA-aware algorithms), hardware can unfairly bias the schedule towards remote nodes. Thus we see that a NUMA architecture can yield unexpected schedules.

3.1.1 Our Contributions

In this chapter, we provide a way to test the schedules produced by today's machines and find patterns that can be important for fairness and performance. To do so, we introduce a benchmarking tool, called Severus, that allows the user to specify a workload, and tracks the execution trace produced. We show how to use Severus to understand the scheduling patterns of two modern NUMA machines, and provide a plotting library that helps intuitively visualize the results. Severus allows the user to play with several parameters of the execution, including which threads participate in a run, what locations are accessed, how much local work each does, and how long each thread waits between two consecutive operations. With this flexibility, Severus can simulate the workloads that are most relevant to the user's application. We describe Severus and use it to demonstrate the following takeaways:
• Operation schedules are not fair by default.
• Uniform random scheduling assumptions do not accurately reflect real schedules.
• The amount of local work a thread does in a lock-free algorithm, particularly the length of the atomic modify section, has a large but hard-to-predict impact on the algorithm's performance.
• The details of these effects are different on each platform, but these details can be revealed by tools such as Severus.
We believe that these new findings can guide both the design of new pragmatic concurrent algorithms on existing machines and the development of new memory architectures that enable faster and more fair concurrent executions.

We reach the above takeaways by studying the memory operation scheduling patterns of two NUMA machines: an AMD Opteron 6278 and an Intel Xeon CPU E7-8867 v4. These two machines exhibit different architectural designs: the Intel has four equidistant nodes and uses a hierarchical cache coherence protocol, whereas the AMD is arranged in eight nodes, with two different distances between them, and employs a flat cache coherence mechanism. We show how these design choices translate to differences in schedules. While the scheduling patterns remain mostly round-robin on AMD regardless of the cores participating in a run, on Intel, the schedule changes drastically depending on whether cores from more than one node are running. Interestingly, both machines show higher throughput for cores that access remote contended memory. We characterize workloads in which this phenomenon is prominent, and show how this unfairness changes as certain parameters of the program are varied.

3.2 Background and Machine Details

3.2.1 NUMA Architectures

NUMA architectures are everywhere in modern machines. Cores are organized into groups called nodes, and each node has cache as well as main memory (see Figure 3.2). Within a node, cores may have one or two levels of private cache, and a shared last-level cache. Each core can often be split into two logical threads, called hyperthreads. All cores can access all shared caches and memory, through an interconnect network between the nodes. However, accesses to cache and memory in a core's own node (local accesses) are faster than accesses to the cache or memory of a different node (remote accesses).

3.2.2 Lock-Free Algorithms and Scheduling

Lock-free algorithms guarantee that progress is made in the algorithm regardless of the number of threads participating or their relative speeds. The correctness of lock-free algorithms is typically proved under an adversarial model, whereby a powerful adversary determines the schedule of atomic operations on each location, thus controlling who succeeds and who fails at any time. The adversarial model produces robust algorithms, but lacks predictive capabilities for performance. Usually, the best performance guarantees that can be proven under an adversarial scheduler are embarrassingly pessimistic.

Thus, recent work in lock-free algorithms proposes different scheduling models, with the goal of being able to analytically predict performance. Common alternative models include that the scheduler picks the next thread uniformly at random [16, 150], or with some predetermined distribution [15]. The goal of our work is to test whether such assumptions are reasonable, and to understand what factors of modern architectures most affect the operation scheduling, and which most affect performance.

Figure 3.2: NUMA architecture with 4 NUMA nodes. [Each node contains CPUs, a cache, and memory; the nodes communicate over an interconnect.]

3.2.3 Machines Used

We test our benchmark on two different NUMA architectures: an Intel Xeon CPU E7-8867 v4 machine with 4 nodes and 72 cores with Quick Path Interconnect technology, and an AMD Opteron 6278 machine with 8 nodes and 32 cores, using HyperTransport. Throughout this chapter, we refer to these machines as simply Intel and AMD respectively. Both machines have a per-core L1 and L2 cache (shared among a pair of hyperthreads), and a shared L3 cache on each node. The details of the two machines are shown in Table 3.1. The Intel machine's interconnect layout is fully connected, and therefore all nodes are at the same distance from one another. However, this is not the case for the AMD machine, in which there are two different distances among the nodes. The AMD node layout and distance matrix is shown in Figure 3.3.

Figure 3.3: AMD node layout and distance matrix.
        N0  N1  N2  N3  N4  N5  N6  N7
   N0    0   1   1   2   1   2   1   2
   N1    1   0   2   1   1   2   2   1
   N2    1   2   0   1   1   1   1   1
   N3    2   1   1   0   1   1   2   2
   N4    1   1   1   1   0   1   1   2
   N5    2   2   1   1   1   0   2   1
   N6    1   2   1   2   1   2   0   1
   N7    2   1   1   2   2   1   1   0

Both machines have an atomic compare-and-swap (CAS) instruction and an atomic fetch-and-increment (F&I) or fetch-and-add (F&A, also called xadd) instruction. A CAS instruction takes in a memory word, an old value old, and a new value new, and changes the word's value to new if the previous value was old. In this case, it returns true, and is said to succeed. Otherwise, the CAS does not change the memory word. It returns false and we say that it fails. The F&I instruction takes in a memory word and increments its value. It always returns the value of the word immediately before the increment. Both the CAS and the F&I instructions fully sequentialize accesses.
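As a concrete illustration of these two primitives, here is a small, hedged C++ example using std::atomic rather than the raw hardware instructions (the benchmark itself operates at the instruction level; the wrapper calls below are only illustrative):

    #include <atomic>
    #include <cstdio>

    int main() {
        std::atomic<long> word{5};

        // CAS(word, old = 5, new = 7): succeeds because the current value is 5.
        long expected = 5;
        bool ok = word.compare_exchange_strong(expected, 7);
        std::printf("CAS succeeded: %d, value: %ld\n", ok, word.load());

        // A CAS with a stale 'old' value fails and leaves the word unchanged.
        expected = 5;
        ok = word.compare_exchange_strong(expected, 9);
        std::printf("stale CAS succeeded: %d, value: %ld\n", ok, word.load());

        // F&I / F&A: atomically increment and return the value *before* the increment.
        long before = word.fetch_add(1);
        std::printf("F&I returned %ld, value: %ld\n", before, word.load());
        return 0;
    }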

Table 3.1: Machine details.

SPECS                 INTEL            AMD
CPU family            Xeon E7-8800     Opteron 6200
Sockets               4                4
Nodes                 4                8
Cores                 72               32
Hyperthreading        2-way            2-way
Frequency             1200-3300 MHz    2400 MHz
L1i Cache             32K              16K
L1d Cache             32K              64K
L2 Cache              256K             2048K
L3 Cache              46080K           6144K
Coherence protocol    MESIF            MOESI

3.3 The Benchmark

Severus provides many settings to simulate the behavior of a large range of applications. For clarity, we begin by describing one simple setting, and then show ways to extend it. At its core, Severus simply has all threads contend on updating a single memory location, either with a read-modify-CAS loop, or with an F&A. We measure throughput: how many changes to the memory location were made. To retain information about the execution, we also have a logging option, in which we have each thread record the values it observed on the shared location every time the thread accesses it. For the F&A case, simply recording these numbers allows us to reconstruct the order in which threads incremented the shared variable. For a CAS-based benchmark, we can control what values the threads write into the shared variable. To allow reconstruction of the execution order, we have each thread CAS in its own id and a timestamp. In this way, when threads record the values they observed, they are in effect recording which thread was the last one to modify the variable with a successful CAS. From this information, we obtain a total order of successful CASes, and a partial order on the reads and unsuccessful CAS attempts.

Severus provides parameters to modify the basic benchmark to reflect different workloads, including the following settings.
• The number of shared variables contended on.
• Which node each shared variable is allocated on.
• Which threads participate.
• For each thread, which shared variables it should access.
• Length of execution.
• Whether or not the threads should log execution information. Turning this option off helps optimize space usage.
• For CAS-based tests, delays can be injected between a read operation and the following CAS attempt of that thread. This simulates the time it takes in real programs to calculate the new value to be written.
• Delay can be injected between two consecutive modifications of the shared variable by the same thread. This simulates programs in which threads have other work.
• Delay can also be injected between a failed CAS attempt and the thread's next read operation. This allows simulation of backoff protocols.
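The following is a minimal C++ sketch of the CAS-based logging scheme described above. The packing of the thread id and timestamp into a single word, and all names, are assumptions made for illustration rather than Severus's actual encoding.

    #include <atomic>
    #include <cstdint>
    #include <vector>

    std::atomic<uint64_t> target{0};   // the contended location

    // Pack the thread id into the high 16 bits and a per-thread timestamp into the low 48.
    inline uint64_t pack(uint16_t tid, uint64_t ts) {
        return (uint64_t(tid) << 48) | (ts & ((uint64_t(1) << 48) - 1));
    }

    // Each thread repeatedly reads the target, logs the value it observed
    // (i.e., who last succeeded), and tries to CAS in its own id and timestamp.
    void worker(uint16_t tid, size_t num_successes, std::vector<uint64_t>& log) {
        uint64_t ts = 0;
        for (size_t done = 0; done < num_successes; ) {
            uint64_t observed = target.load();
            log.push_back(observed);
            if (target.compare_exchange_strong(observed, pack(tid, ts))) {
                ++ts;
                ++done;
            }
            // on failure, simply retry (optionally after an injected delay)
        }
    }

After the run, merging the per-thread logs by the successful values they contain yields the total order on successful CASes and the partial order on the other operations described above.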

3.3.1 Implementation Details

When evaluating the schedule of a concurrent application, one must be very careful not to perturb the execution. Many common instructions used for logging performance, including accesses to timers, cycle counters, or memory allocated earlier in the program, can greatly affect the concurrent execution, leading to useless measurements. Thus, we take care in ensuring that our logging mechanism minimizes such accesses.

NUMA Memory and Thread Allocation. We use the Linux NUMA policy library libnuma to allocate memory on a specified node (both for contended locations and memory used for logging), and to specify the threads used. We pin threads to cores.
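A hedged sketch of this setup using libnuma and pthreads (compile with -lnuma -lpthread); the function name and parameters below are illustrative, not Severus's code:

    #include <numa.h>       // numa_alloc_onnode, numa_free
    #include <pthread.h>
    #include <sched.h>
    #include <cstring>

    // Pin the calling thread to core_id and allocate its log on NUMA node node_id.
    // (A real program would first check numa_available() >= 0 and the return values.)
    void* setup_thread(int core_id, int node_id, size_t log_bytes) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core_id, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

        // Place the log in the memory of the thread's own NUMA node ...
        void* log = numa_alloc_onnode(log_bytes, node_id);
        // ... and touch it once so the measured run sees no compulsory misses.
        std::memset(log, 0, log_bytes);
        return log;   // release later with numa_free(log, log_bytes)
    }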

Logging. All information logged during the execution is local. We allocate a large amount of space per thread for logging, and ensure that for each thread, this log space is in the memory of the NUMA node on which that thread is pinned. No two threads access the same log. This helps eliminate coherence cache misses that are not directly caused by the tested access pattern. Before beginning the real execution, we have each thread access its preallocated log, to avoid compulsory cache misses when it first accesses the log during its execution. Severus always records the total number of operations executed by each thread, and the total number of successful CASes per thread. This simply involves incrementing two counters, and thus never causes cache misses. If the logging option is enabled, each thread also records which values it observed on the shared location when it accessed it. This logging takes much more space, since this information cannot be aggregated into one counter, and thus we keep a word per operation executed by each thread. Logging can also perturb the execution; more (uncontended) writing is done, and cache misses occur every once in a while, when the size of the log written exceeds the cache size. However, since the memory of the log is accessed consecutively, prefetching helps mitigate the effect of log-caused cache misses. With this local method of logging, we process the results after the execution ends, and reconstruct the global trace from the per-process ones.

Compiler Options. To eliminate as much overhead as possible during the execution, many of the settings of a run are determined at compile time. This includes machine details, like the number of nodes and cores, and the ids of the cores on each node. The type of execution (CAS, F&A, etc.) and logging are also determined at compile time.

44 Delay. We implement atomic delay and parallel delay by iteratively incrementing a local volatile counter. The amount of delay given as a parameter for an execution translates to the number of iterations that are run. In the rest of the chapter, we use ‘iterations’ as the unit of delay used in experiments. This is done to avoid mechanisms of waiting that are too coarse grained or can perturb the execution. Therefore, given the same delay parameter, the actual amount of time that a thread waits depends on the system on which the benchmark is run (in particular, depending on the core frequency). A single unit of delay corresponds to approximately 2.2 nanoseconds on Intel and 3.5 nanoseconds on AMD (both averaged over 10 runs). We note that measuring delay in terms of iterations of local cache accesses is reasonable for simulating algorithm workloads, since it reflects the reality that different algorithms take different amounts of time on different machines.
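A minimal sketch of this delay mechanism, where one unit of delay is one iteration of the loop:

    // Spin for `iterations` increments of a local counter; `volatile` keeps the
    // compiler from optimizing the loop away.
    inline void delay(long iterations) {
        volatile long counter = 0;
        for (long i = 0; i < iterations; ++i)
            ++counter;
    }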

3.3.2 Experiments Shown

All tests shown in this chapter can be broadly split into two categories.

• Sequence Experiments. In these experiments, we take a subset of the threads (possibly all of them), and have them repeatedly increment a single location using atomic fetch-and-increment (F&I). We call the contended location the counter. All threads record the return value of their fetch-and-add after each operation, using the logging option. This allows us to recreate the order in which threads incremented the counter.
• Competition Experiments. These experiments are similar to the sequence experiments, but differ mainly in the operation used. A subset of the threads repeatedly read a location, locally modify its value, and then compare-and-swap (CAS) their new value into the same location. We call the contended location the target. In competition experiments, we sometimes vary other parameters, like the local modification time (which we call atomic delay), and the time threads wait between a successful CAS and that thread's next operation (parallel delay).

The competition experiments cause different scheduling patterns than the sequence ones; the read operations mean that the cache line enters the shared coherence state in addition to the modified state. Furthermore, compare-and-swaps fail if another thread has changed the value. This means that to successfully modify the location, a thread must execute two operations in a row, possibly changing its cache line's coherence state in between. The schedules produced by sequence experiments are more regular, and thus easier to analyze to obtain a high-level understanding of the scheduler. Therefore, to learn about each machine's scheduling patterns, we use sequence experiments with the logging option turned on (Section 3.4). We show how the lessons we learn from these experiments generalize to other workloads by running competition experiments (which better reflect real-world applications), without logging, and comparing the results to the predictions made based on our learned scheduling model (Section 3.5). We also provide a script that runs the experiments described in this chapter and produces the relevant plots.
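For concreteness, a minimal sketch of the per-thread loop in a sequence experiment (names are illustrative): each thread fetch-and-increments the shared counter and logs the returned pre-increment value, so that sorting the union of all logs reconstructs the global increment order.

    #include <atomic>
    #include <cstdint>
    #include <vector>

    std::atomic<uint64_t> counter{0};   // the contended counter

    void sequence_worker(std::vector<uint64_t>& log, size_t ops) {
        log.reserve(ops);                           // log is preallocated and NUMA-local
        for (size_t i = 0; i < ops; ++i)
            log.push_back(counter.fetch_add(1));    // F&I returns the value before the increment
    }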

(a) All nodes participating.   (b) One node participating at a time.
Figure 3.4: AMD throughput of F&I operations. Counter allocated on Node 0. Subfigure (a) shows all nodes participating, and Subfigure (b) shows one node participating at a time, comparing distance 0 (Node 0), distance 1 (Node 4), and distance 2 (Node 7).

3.4 Inferring Scheduling Models

In this section, we show experiments that help determine scheduling models for the AMD and Intel machines. All the experiments in this section are sequence experiments (see Section 3.3). To review, in a sequence experiment, multiple cores atomically fetch-and-increment (F&I) a single memory location called the counter. This yields a full execution trace, namely a sequence of all the F&I operations executed by all threads, which we analyze in several ways to determine a scheduling model. Across different experiments, we vary the number of threads participating, the placement of the threads, and the NUMA node on which the counter is allocated. A sequence experiment is a hardware stress test meant to reveal details about how the machine schedules memory operations. It is not meant to model a realistic lock-free algorithm. In particular, throughput measurements of sequence experiments should not be interpreted as a proxy for the performance of a lock-free algorithm. (In contrast, the competition experiments in Section 3.5 are intended to model lock-free algorithms.)

3.4.1 AMD Scheduling Model

AMD Throughput Measurements. We begin with a basic question: when all cores participate in a sequence experiment, do they achieve the same throughput? As we will see, the answer to this question is counterintuitive and will guide our more detailed analysis of the machine's scheduling model. To answer this, we run a sequence experiment with the counter on Node 0 and simply count the number of F&I operations executed by each core. For each node, Figure 3.4a shows the distribution of throughputs among cores of that node.3 We see that most cores within any given node have similar throughput, but different nodes have very different throughputs. We observe that the throughput is unfair:

3Throughout this section, all throughput distribution plots show the aggregate throughput distribution of 10 separate 10-second runs.

• Node 0, which is where the counter is allocated, has the lowest throughput;
• Node 1, Node 2, Node 4, and Node 6 have intermediate throughput; and
• Node 3, Node 5, and Node 7 have the highest throughput.
What distinguishes Node 3, Node 5, and Node 7 from the other nodes? The answer lies in Figure 3.3: they are the farthest from the counter on Node 0. That is, a core's throughput tends to increase with its distance from the counter. Repeating the experiment with the counter on each node confirms this.

So far, we have seen that with all cores from all nodes participating, cores on nodes farther from the counter have a throughput advantage. We now ask: does this trend still hold when nodes participate one at a time? To answer this question, we run experiments with the counter on Node 0 with cores on just a single node participating. Figure 3.4b shows the distribution of results for each of Node 0 (distance 0), Node 4 (distance 1), and Node 7 (distance 2) participating. Unlike the previous plots, each distribution in the plot represents a separate configuration in which only that node is participating. The overall throughput is higher in these configurations because of reduced contention. Remarkably, Figure 3.4b shows that even with only a single node participating, throughput still increases with distance from the counter. Results for other nodes at distances 1 and 2 are similar to those for Node 4 and Node 7, respectively. Similar results hold when cores from any subset of nodes participate.

We have firmly established that throughput is unfair and is skewed toward cores that are farther from the counter, even when the counter's cache line remains cached on the same node. This pattern reflects the directory coherence protocol on AMD, which seems to use the interconnect even when a cache line remains on one node, likely due to the need to update its coherence state in the directory. To understand why increased interconnect use increases throughput, we need a more detailed analysis of the execution traces.

AMD Execution Trace Analysis. We now thoroughly examine the execution trace of a single sequence experiment. All cores participate, and the counter is on Node 0. We examine an execution trace excerpt of 2^20 operations, taken from the middle of the experiment to avoid edge effects. For space reasons, we show results from just one run and focus on three nodes: Node 0 (distance 0 from the counter), Node 4 (distance 1), and Node 7 (distance 2). We have confirmed that the results shown are robust across several trials, and that other nodes at distances 1 and 2 behave similarly.

The result of a sequence experiment is an execution trace, which is an ordered list of core IDs whose ith entry is the ID of the core that executed the ith F&I operation on the counter. We can think of the trace as describing how (modify-mode access to) the counter's cache line moves from core to core. To talk about the trace and its implications for throughput, we use the following vocabulary:
• Core visit: a contiguous interval during which just one core performs F&I operations.4
• Core visit length: the number of F&I operations performed during a given core visit.

4When discussing core visits, we take “core” to specifically mean “physical core” and group its two threads together.

Table 3.2: AMD core visit length distributions.

                    LENGTH 1   LENGTH 2   LENGTH ≥ 3   MEAN
Cores on Node 0        88%        9%          3%       1.147
Cores on Node 4        93%        4%          3%       1.105
Cores on Node 7        55%       34%         11%       1.585

• Core visit distance: the number of core visits to other cores between two visits to a given core.
A core's throughput is
• directly proportional to its average core visit length and
• inversely proportional to its average core visit distance.
For each of Node 0, Node 4, and Node 7, Table 3.2 shows the distribution of visit lengths for cores on that node. Notably, the average core visit length on Node 7 is roughly 40% higher than on each of Node 0 and Node 4. Recall that in Figure 3.4a, Node 7 has roughly 40% higher throughput than Node 4, which in turn has higher throughput than Node 0. It thus appears that average core visit length explains the throughput difference between Node 4 and Node 7, but explaining the even lower throughput of Node 0 requires examining core visit distances.

We now turn to core visit distances. Figure 3.5 shows the CDF of visit distances aggregated over all cores for Node 0 and Node 4. Due to space limitations, we omit the plot for Node 7, but it is almost identical to that of Node 4. Remarkably, nearly all core visit distances are just below multiples of 31, which is one less than the number of physical cores on the AMD machine. This suggests that core visits occur in round-robin fashion, visiting all 31 other cores between two visits to a given core, except that cores are occasionally skipped, mainly on Node 0. Given that average core visit lengths are roughly the same for Node 0 and Node 4 (see Table 3.2), their throughput difference is due mainly to the skipping of cores on Node 0.
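A hedged sketch of how core visits can be extracted from such a trace (the trace is the ordered list of core IDs defined above; the struct and function names are ours, not Severus's):

    #include <vector>

    struct Visit { int core; int length; };

    // Collapse the trace into core visits: maximal runs of F&I operations by one core.
    std::vector<Visit> core_visits(const std::vector<int>& trace) {
        std::vector<Visit> visits;
        for (int core : trace) {
            if (visits.empty() || visits.back().core != core)
                visits.push_back({core, 1});   // a new contiguous interval begins
            else
                visits.back().length++;        // the same core keeps the cache line
        }
        return visits;
    }

    // Core visit distances for `core`: the number of visits to other cores
    // between two consecutive visits to `core`.
    std::vector<int> visit_distances(const std::vector<Visit>& visits, int core) {
        std::vector<int> distances;
        int since_last = -1;                   // -1 means `core` has not been seen yet
        for (const Visit& v : visits) {
            if (v.core == core) {
                if (since_last >= 0) distances.push_back(since_last);
                since_last = 0;
            } else if (since_last >= 0) {
                since_last++;
            }
        }
        return distances;
    }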

3.4.2 Intel Scheduling Model

Intel Throughput Measurements. We begin our analysis of the Intel machine in the same way we did for AMD. We want to know whether throughput is fair among different cores, and in particular, whether the distance patterns we observed for AMD hold for Intel as well. Recall that the Intel machine has only 4 NUMA nodes, with a full interconnect that places all nodes equidistantly from one another. Figure 3.6a shows each node's throughput distribution for a sequence experiment with all cores participating and the counter placed on Node 0. We see that, again, throughput is unfair, and cores on Node 0 have lower throughput than cores on the other three nodes. The results are analogous when the counter is allocated on Node 1, Node 2, or Node 3. We next test whether cores close to the counter still have lower throughput when only one node participates at a time. To answer this question, we run experiments with the counter on Node 0 with cores on just a single node participating. Figure 3.6b shows the results for each of Node 0 and Node 3 participating.

(a) Core 0 on Node 0, avg 51.6   (b) Core 16 on Node 4, avg 29.1
Figure 3.5: AMD core visit distance distributions with all nodes participating. Counter allocated on Node 0. Showing distributions for Subfigure (a) a core on Node 0 (distance 0 from counter) and Subfigure (b) a core on Node 4 (distance 1). Distributions for other distance 0 cores are similar to Subfigure (a), and likewise for distances 1 and 2 with Subfigure (b).

(a) All nodes participating.   (b) One node participating at a time.
Figure 3.6: Intel throughput of F&I operations, with counter allocated on Node 0. Subfigure (a) shows all nodes participating, and Subfigure (b) shows one node participating at a time, comparing Node 0 (distance 0 from counter) and Node 3 (distance 1).

49 ������ �� ��� ��� ��� �� �� �� �� ��� ���� �� ��� ��� ��� ��� Figure 3.7: Intel execution trace of F&I operations with all nodes participating. Counter allocated on Node 0. Thread IDs are clustered by node: 0–35 on Node 0 (yellow), 36–71 on Node 1 (purple), 72–107 on Node 2 (orange), and 108–143 on Node 3 (blue). Even-odd pairs of threads (0-1, 2-3, etc.) run on the same physical core. Even thread IDs are shaded darker.

Unlike in the experiment with all nodes participating, we see that Node 0 and Node 3 have similar throughput distributions when only one node participates at a time. The results for Node 1 and Node 2 are similar. We have seen that with all nodes participating, Intel and AMD both exhibit core throughput increasing with distance from the counter, but the machines differ when only one node participates. This can be explained by considering the directory coherence protocol. Each node on Intel has a shared L3 cache, and the coherence protocol does not communicate updates to other nodes so long as the cache line is not in any other node's L3 cache. This means single-node runs are virtually unaffected by where the counter is allocated.

Intel Execution Trace Analysis. We now investigate the Intel execution trace in detail. Figure 3.7 shows the execution trace produced from a sequence experiment with the counter allocated on Node 0. The y-axis shows the different thread IDs, color-coded by node. The x-axis shows "time", measured in number of F&I operations. The line shows the counter's migration pattern across the caches of the different cores. To discuss the execution trace, we define the following terms:
• Core visit: a contiguous interval during which just one core performs F&I operations (see Section 3.4.1). The length of a core visit is the number of F&I operations performed in it.
• Node visit: a contiguous interval during which cores on just one node perform F&I operations. The length of a node visit is the number of core visits it contains.

50 ��� ��� ��� ��� ��� ��� ��� ��� ��� ���

���������������� ���������� ��� ����������� ���������� ��� � � �� �� � � �� �� ���� ����� ������ ���� ����� ������ (a) Cores on Node 0, avg 2.92 (b) Cores on Node 3, avg 4.42 Figure 3.8: Intel core visit length distributions with all nodes participating. Counter allocated on Node 0. Showing aggregate distributions for a cores on Node 0 (distance 0 from counter) and b cores on Node 4 (distance 1). Distributions for Nodes 1 and 2 are similar to b.

Figure 3.7 reveals unusual features of its core and node visits.
Round-robin node visits: The nodes are visited in a fixed repeating order throughout Figure 3.7: 0, 2, 3, 1, . . . . We have confirmed that this pattern is consistent over the entire trace, though the order occasionally changes and Node 0 is occasionally skipped. We omit the detailed statistics for brevity.
Uneven core visit lengths: The first core visit of each node visit is usually relatively long. Moreover, these long core visits only occur at the start of a node visit: almost all other core visits are very short, having just one or two F&I operations. To confirm this observation, we show the CDF of the core visit length distribution for Node 0 (distance 0 from the counter) and Node 3 (distance 1) in Figure 3.8. For brevity, we omit plots for Node 1 and Node 2, which are similar to that for Node 3. The pattern is very clear for Node 3: about 70% of core visits are of length 1 or 2, but visits of length greater than 2 are likely to be at least length 10. The pattern is a bit less prominent on Node 0, where longer visits only last around 5 operations. This partially explains the difference in throughput observed between Node 0 and the other nodes.
Occasional bursts: In Figure 3.7, most node visits only contain a few core visits: first a long core visit, followed by 0 to 2 more core visits. However, every once in a while, a node visit ends with many short core visits in a row. We call this occurrence a "burst" of visits. A natural question is: are bursts simply the result of noise, or are they a separate phenomenon? To answer this question, we plot the CDF of the node visit length distribution in Figure 3.9, again showing only Node 0 and Node 3 for brevity. The distributions make clear that there are two distinct types of node visits: those with 3 or fewer core visits, constituting about 80% of all node visits; and those with significantly more, usually at least 8, making up the other 20% of node visits. We therefore define the following terms:
• Burst: a node visit of length 4 or greater. For example, Figure 3.7 shows bursts for each of Nodes 1, 2, and 3.
• Cycle: the time between the end of one burst on a given node and the end of the next burst

51 ��� ��� ��� ��� ��� ��� ��� ��� ��� ���

���������������� ���������� ��� ����������� ���������� ��� � � �� �� � � �� �� ���� ����� ������ ���� ����� ������ (a) Node 0 (b) Node 3 Figure 3.9: Intel node visit length (measured in number of core visits) distributions with all nodes partici- pating. Counter allocated on Node 0. Showing distributions for (a) Node 0 (distance 0 from counter) and (b) Node 4 (distance 1). Distributions for Nodes 1 and 2 are similar to (b).

Table 3.3: Intel number of times cores are visited per cycle.

                    0 VISITS   1 VISIT   2 VISITS   ≥ 3 VISITS
Cores on Node 1        9%        85%        5%         < 1%
Cores on Node 3       10%        85%        5%         < 1%

on that node.
Interestingly, we find that in most cycles, each core is visited exactly once. This is shown in Table 3.3. This pattern, which occurs on all nodes, suggests a possible mechanism for the bursts: requests for the counter's cache line build up in a queue in each node, and each queue occasionally "flushes" if it is too full for too long.

Finally, recall from the throughput measurements earlier in this section that single-node executions produce different throughput distributions than executions that cross node boundaries. We therefore also examine the trace of a single-node execution with the counter on Node 0 and only cores on Node 0 participating. In contrast to the multiple-node trace, the single-node trace is close to uniformly random. To confirm this, we show the CDF of the core visit distance distribution in Figure 3.10. The CDF is close to that of a geometric distribution, which is what the CDF would be for a truly uniformly random schedule. This means that for analyzing algorithms for single-node executions on the Intel machine, a uniformly random scheduling model is appropriate.
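Continuing the trace-analysis sketch from Section 3.4.1, node visits and bursts can be computed from the core visits in the same way (core_to_node is an assumed mapping from physical core id to NUMA node, and Visit is the struct from the earlier sketch):

    #include <vector>

    struct NodeVisit { int node; int num_core_visits; };

    // Group consecutive core visits by the node their core belongs to.
    std::vector<NodeVisit> node_visits(const std::vector<Visit>& visits,
                                       const std::vector<int>& core_to_node) {
        std::vector<NodeVisit> result;
        for (const Visit& v : visits) {
            int node = core_to_node[v.core];
            if (result.empty() || result.back().node != node)
                result.push_back({node, 1});    // cores from a new node take over the line
            else
                result.back().num_core_visits++;
        }
        return result;
    }

    // A burst is a node visit containing at least 4 core visits.
    inline bool is_burst(const NodeVisit& nv) { return nv.num_core_visits >= 4; }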

3.5 Takeaways for Fairness and Focus

Recall the desirable properties a schedule should have: in the long run, we want it to be fair, letting each thread make the same amount of progress, but in the short term, we want the schedule to be focused, allowing each thread enough time to read, locally modify, and then apply its

modification on a cache line before the cache line gets invalidated.

We now go back to our original question: do memory operation schedules on modern hardware achieve long-term fairness and short-term focus? In the previous section, we saw some indications that the schedules might not be fair: initial throughput experiments indicated that on both machines, the node on which memory is allocated is unfairly treated, even in long runs. We saw that short-term focus might be behind this: cores on remote nodes get longer visits on average. However, recall that these experiments were sequence experiments, which were designed to uncover scheduling patterns but not to represent the workloads of real lock-free algorithms.

Figure 3.10: Intel core visit distance distributions with only Node 0 participating. Counter allocated on Node 0. Distribution is very close to Geometric(1/18) (dashed blue line).

In this section, we thus test whether these initial findings carry over to more realistic workloads. More specifically, all the experiments in this section are competition experiments (see Section 3.3). To review, in a competition experiment, multiple cores attempt to read from and CAS a new value into a single memory location called the target. Competition experiments have two delay parameters.
• Between a read and the following CAS is the atomic delay. This simulates work in the atomic modify section of a lock-free operation (Line 6 of Algorithm 3.1).
• Between each successful CAS and the following read is the parallel delay. This simulates the parallel work of a lock-free algorithm between synchronization blocks (Line 3 of Algorithm 3.1).
We simulate different lock-free workloads by varying the atomic and parallel delays. To highlight the effects of the atomic delay, the experiments in this section are conducted with a high parallel delay (set to 256 iterations in all experiments; see Section 3.3 for details on how the delay is implemented). This means that long streaks of successful read-modify-CAS operations by one thread without interruption from another thread are unlikely, even when the atomic delay is small. All plots in this section show the results over 10 repetitions of 10-second runs. Each plot point shows the median total throughput of successful CAS instructions over the 10 repetitions, and error bars show the 75th and 25th percentiles.

3.5.1 Fairness

To test long-term fairness on lock-free workloads, we run a set of competition experiments in which all cores on all nodes are participating. We vary the atomic delay to evaluate the fairness for lock-free algorithms with differently sized atomic modify sections. We measure the throughput of successful CAS instructions exhibited by cores on each node, and compare them to the throughput on other nodes. These tests answer the following question: when all cores run the same code, how skewed is their throughput with respect to each other?

AMD Fairness. The results for the fairness test on the AMD machine are shown in Figure 3.11. It is clear that cores on distance 2 nodes (represented by Node 7 here) perform much better when the atomic delay is low, outperforming other nodes by up to 31×, but this advantage drops very quickly.5 By the time the atomic delay reaches 16 iterations (around 56 ns), distance 1 nodes start outperforming distance 2 nodes. However, recall that the throughput reported in Figure 3.11 counts successful CAS instructions. Interestingly, if we consider the number of attempted CAS instructions, rather than just the successful ones, the difference is less stark, with distance 2 nodes reaching a peak at an atomic delay of 5 iterations, at which point they only outperform distance 1 nodes by a factor of 2.2.6 This indicates that at low atomic delays, distance 2 nodes succeed in a much larger fraction of their attempted CAS instructions.

Figure 3.11: AMD throughput of CAS operations for varying atomic delay with all nodes participating. Target allocated on Node 0. Showing total throughput of Node 0 (distance 0 from target), Node 4 (distance 1), and Node 7 (distance 2).

The throughput reaches a steady state at around an atomic delay of 30 iterations (roughly 70 ns), but is still highly unfair. Notably, distance 1 nodes achieve the highest throughput at the steady state, outperforming the other two groups by an order of magnitude. Insight into this phenomenon can be gained by looking at the success ratio, or the fraction of successful CAS instructions out of the overall number attempted. For distance 1 nodes, the success ratio is around 0.04–0.05, whereas for cores in the other two node categories, it lies at around 0.005. A failed CAS is always caused by the success of another thread's CAS. In particular, a CAS by thread p will fail if p executed its read of the target between the read and the CAS of the thread whose CAS was successful. Thus, the numbers indicate that most threads align their read instructions with each other, causing repeated failures for the same set of threads. The delays inherent to the cache coherence protocol on the AMD machine thus repeatedly favor these 'mid latency' (distance 1) threads over their counterparts that are farther from or closer to the memory.

Intel Fairness. The fairness test results on the Intel machine are shown in Figure 3.12. Only Node 0 and Node 3 are shown, as the other nodes’ curves were almost exactly the same as Node 3. As could be expected, both Node 0 and Node 3 drop in throughput as the atomic delay grows, and eventually both reach approximately the same throughput.

5This part is truncated in the plot, to make other trends more visible. 6This data is not shown in the plot.

We can see that in general, fairness here is not as skewed as on AMD; at high throughputs (corresponding to low atomic delay), Node 3 outperforms Node 0 by a factor of 1.4–1.8. Both nodes' performance degrades quickly, though at somewhat different speeds. At an atomic delay of 34 iterations (around 75 ns), unfairness is at its worst, with Node 3 outperforming Node 0 by a factor of 12.5. However, soon after that, starting at an atomic delay of 52 iterations, the two nodes are consistently within 10% of each other in terms of their throughput.

Figure 3.12: Intel throughput of CAS operations for varying atomic delay with all nodes participating. Target allocated on Node 0. Showing total throughput of Node 0 (distance 0 from target) and Node 3 (distance 1).

Fairness Takeaways. We conclude that the fairness of schedules of a lock-free algorithm is highly dependent on the algorithm itself, in particular, on the length of its atomic modify section. This observation is perhaps counterintuitive, especially for theoreticians in the field; most literature on lock-free algorithms never accounts for 'local' work. However, the exact length of local operations within the atomic modify section can have a drastic effect on both fairness and performance. This is despite the fact that local work operates on the L1 cache and thus experiences much lower latencies than memory instructions that access new or contended data. We thus recommend making efforts to minimize work in the atomic modify section when designing and implementing lock-free algorithms. Furthermore, we note that despite fairness arbitration efforts within each node, fairness is not generally achieved among nodes. This is a similar observation to that made by Song et al. [269]. However, while they study workloads in which there is an uneven number of requests from competing nodes, we show unfairness even when all nodes issue the same number of requests. In general, to achieve better fairness even with relatively small atomic modify sections, it can be beneficial to design architectures to explicitly favor requests from the local node over those from remote nodes.

3.5.2 Focus

Recall the original intuition (Section 3.1) for why focus may be useful in a hardware schedule. Ideally, to avoid wasted work, a thread should be able to keep a cache line in its private cache for long enough to execute both the read and the CAS instructions of its atomic modify section in a lock-free algorithm. This means that, depending on the length of the atomic modify section of a given algorithm, the cache line may need to remain in one core's cache for longer to provide sufficient focus. Recall that when inferring the scheduling patterns of each machine in Section 3.4, we considered the visit length of a cache line at each core.

That is, we measured how many memory instructions a single core can execute before the cache line leaves its private cache. Note that a schedule with better focus corresponds to a schedule with longer core visits. Thus, the longer the atomic delay, the more focus is required from the schedule. We say that a hardware schedule has meaningful focus for a given lock-free algorithm if the entire atomic modify section of the algorithm fits in a single core visit. We now test how the longer core visits observed in Section 3.4 translate to meaningful focus for lock-free algorithms. Unlike previous experiments in this chapter, we test focus using experiments with multiple targets. Specifically, we allocate one target on each node, and each core is assigned one target to access for the duration of the experiment. This means that each core is only directly contending with other cores accessing the same target. However, there may be indirect contention caused by traffic on the node interconnect. To exhibit a variety of core visit lengths, we run different types of experiments for AMD and Intel.
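As a rough illustration (our own simplified model, not an analysis from this chapter) of why meaningful focus disappears as the atomic modify section grows: suppose the visit length L at a core, measured in memory instructions, were geometrically distributed with parameter q, and suppose a visit begins with the thread's read of the target. Then the chance that an atomic modify section spanning k memory instructions (from the read through the CAS) fits within a single visit is

Pr[L ≥ k] = (1 − q)^(k−1).

With a mean visit length of 1/q = 18, this is about 0.89 for k = 3 but falls to roughly 0.3 by k ≈ 22, which is qualitatively consistent with focus ceasing to be meaningful once the atomic delay grows past a few tens of iterations.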

AMD Focus. Recall from Section 3.4.1 that nodes that are 2 hops away from the memory they access have longer core visits on average. To test how these longer visits translate to meaningful focus, we conduct competition experiments with three different settings. In each setting, all cores access a target that is a fixed distance away. The results of this test are shown in Figure 3.13.

Figure 3.13: AMD throughput of CAS operations for varying atomic delay with all nodes participating. A target is allocated on each node. All cores accessing a given target are in the same node. In each run, all cores access targets the same distance away.

For small atomic delays, we observe a significant difference between the three settings. In particular, both distance 1 and distance 2 placements exhibit higher throughput than distance 0. Throughput at distance 1 drops near atomic delay 18. This indicates that at this point, a thread can no longer fit both its read and its CAS into the same visit. A similar drop happens for the distance 2 placement near atomic delay 23. In contrast, it appears that the distance 0 placement never fits a read and CAS into the same visit, even with atomic delay 0. These findings make sense in light of the results of Section 3.4.1. Specifically, as shown in Table 3.2, cores at distance 2 have longer visit lengths than those at distance 1. From the table, it initially appears as if distance 0 cores have visit lengths comparable to distance 1 cores. However, as shown in Figure 3.5, cores at distance 0 are frequently skipped in what is otherwise a mostly round-robin visit sequence. If we view these skips as "length 0" visits, then cores at distance 1 have visits longer than those at distance 0, whose visits can be so short that not even a single atomic instruction finishes executing. All three thread placements eventually reach a steady throughput of around 7.2–10.3 million successful CAS operations per second.

This happens at an atomic delay of 36 iterations, roughly corresponding to 125 ns on the AMD machine (see Section 3.3). Distance 1 nodes display the highest throughput of the three categories in the steady state, outperforming distance 2 nodes by 20% and Node 0 by 43%. This is consistent with the results from the fairness tests, but the difference in performance is smaller here. There are some other phenomena that we do not yet know how to explain, such as the drops in throughput for distances 1 and 2 as atomic delay increases from 0 to 5, and the occasional throughput spikes. It is possible that some of these effects would be smoothed over by an experiment in which the atomic delay was random rather than deterministic.

Intel Focus. Recall from Section 3.4.2 that longer visits occur on the first core visited in a node, when the cache line travels between nodes. In particular, these long core visits happen only when cores of multiple nodes are active, rather than just one node. To test the effect of longer core visits on meaningful focus in the Intel machine, we therefore compare two types of competition experiments: the first simply uses all threads of one node, and the second uses the same number of threads, but splits them across two nodes. Just like we did for AMD, we run the experiments in parallel to create interconnect traffic. The results of this test are shown in Figure 3.14.

Figure 3.14: Intel throughput of CAS operations for varying atomic delay with all nodes participating. A target is allocated on each node. Showing results for four different target assignments. In grouped assignments, all cores accessing a given target are in the same node. In split assignments, the set of cores accessing a given target is split evenly across two nodes.

As expected given our knowledge of Intel's schedule, it is clear that for a small atomic delay, splitting the threads across two nodes produces significantly higher throughput than having them all on one node. At around an atomic delay of 30 iterations (approximately 66 ns), the runs on a single node start outperforming the split runs. This can be attributed to the lowered contention caused by such a high atomic delay. When contention is low, the dominating factor for performance becomes the latency of accessing the memory (or the L3 cache, in this case), which is known to be much lower for local accesses than for remote accesses.

Focus Takeaways. On both machines, we observed that for experiments with low atomic delay, higher throughput occurs on schedules that we know exhibit better focus. The higher focus seems to be meaningful only for an atomic delay of up to approximately 25-30 iterations, indicating that algorithms with atomic modify sections of around this length or shorter can benefit from such schedules.

However, more generally, it is clear that focus in the hardware schedule is extremely helpful for throughput; it would be desirable to achieve meaningful focus even for algorithms with a longer atomic modify section. This observation was made by Haider et al. [141]. Using simulation results, they showed that it can be very beneficial to allow each thread to lease a cache line for a bounded amount of time, and release it either when that time is up, or when it finishes its atomic modify section. Our results support those of Haider et al., but on real architectures rather than simulations. That is, even when all features of an architecture interact with each other, it can be beneficial to extend the implicit lease of a cache line that the memory instruction schedule provides to a thread.

3.6 Related Work

Alistarh et al. [16] ran tests similar to our sequence experiments to verify the validity of their uniform random scheduler assumption. They ran the experiments on a single Fujitsu PRIMERGY RX600 S6 server with four Intel Xeon E7-4870 (Westmere EX) processors, but they used only one of its nodes. Our results for this setting are consistent with theirs; scheduling seems mostly uniformly random on a single Intel node. Our experiments, however, consider a much greater scope, noting when this random scheduling pattern falters.
NUMA architectures have been extensively studied. Previous works have designed benchmarks to understand the latencies and bandwidth associated with accesses to different levels of the cache and local versus remote memory on NUMA machines [138, 236, 264]. However, these papers did not consider the effect of contended workloads on NUMA access patterns.
A thorough study of synchronization primitives was conducted by David et al. [98]. Some of their tests are similar to ours. However, their setup is different; in all contention experiments, David et al. inject a large delay between consecutive operations of one thread. While we use a similar pattern for our focus and fairness experiments, we also test configurations that do not inject such delays. Thus, our work uncovers some performance phenomena that were not found by David et al.
Song et al. [269] show that NUMA architectures can have highly unfair throughput among the nodes. They also show that this unfairness does not always favor nodes that access local memory, displaying this behavior in VMs. However, they do not study lock-free algorithms or contention.
Performance prediction has been the goal of significant previous work, not only in the lock-free algorithms community [36, 71, 134, 182, 285]. Techniques range from simulation, to hand-built models, to regression-based models, to profiling tools. Goodman et al. present one such profiling tool [134]. While this produces accurate results, sometimes it is impractical to have the algorithm ready to use for profiling before performance predictions are made, since performance predictions can help develop the algorithm. Our work aims to obtain a high-level performance model to guide algorithm design in its earlier stages. Furthermore, our benchmark can be used on any machine to gain an understanding of its underlying model.

3.7 Chapter Discussion

Analytical performance prediction of lock-free algorithms is a hard problem. One must consider the likely operation scheduling patterns on the machines on which the algorithm is run. Previous approaches assumed a random scheduler instead of an adversarial one, but did not show whether such an assumption is reflective of real machines. In this chapter, we present a thorough study of scheduling patterns produced on two NUMA architectures, using Severus, our benchmarking tool [49, 50]. Our experiments uncover several phenomena that can greatly affect the schedules of lock-free algorithms and make models based solely on uniform randomness inaccurate. In particular, we show that thread placement with respect to a contended memory location can be crucial, and that surprisingly, remote threads often perform better under contention than local threads.
On both tested machines, the reason for this rise in throughput seems to stem from improved focus, or the increased length of visits of the cache line for cores on remote nodes. This phenomenon has been largely overlooked in literature that aims to approximate the operation scheduler, other than a few exceptions [141]. Additionally, these focus benefits come at the cost of fairness on modern machines; not all cores on a machine experience these beneficial longer visits.
We believe that there are several takeaways and further directions from the work presented in this chapter. Firstly, fairness is not a given. This knowledge can affect algorithm design, as well as the choice of programming framework; in a system with low fairness, a work-stealing scheduler may be crucial for ensuring a fair allocation of parallel tasks that leads to high throughput. Secondly, this work casts doubt on previous works that assume requests for a cache line are simply handled in a random order, and shows that more careful modeling may be necessary. Furthermore, we've shown in our experiments that the length of the atomic delay (the delay between the read and the following CAS in a read-modify-CAS loop) has a significant—yet a priori unpredictable—effect on performance, since different platforms can behave drastically differently. Finally, we provide a tool that allows a user to test their platform and understand what assumptions are reasonable for them, and what factors might have the greatest effect on their algorithm's performance.

Chapter 4

Contention-Bounded Series-Parallel DAGs

4.1 Introduction

This part of the thesis has so far focused on getting a better understanding of the causes and effects of contention on modern architectures and on modeling these effects more accurately. Studying contention and refining our theoretical model is important, since most work in the literature does not theoretically analyze the performance of concurrent algorithms, instead relying exclusively on empirical evaluation.
As discussed in previous chapters, the difficulty of analyzing the performance of concurrent algorithms stems from their inherent asynchrony, which is modeled by an adversarial scheduler. This adversary can create executions in which each process executes in isolation, or have all processes contend on the same location at once. This control that the adversary possesses leads to pessimistic and inaccurate analyses. In fact, even though virtually no upper bounds are known, Fich, Hendler and Shavit [116] show that, when accounting for contention, a number of concurrent data structures, including counters, stacks, and queues, require Ω(n) time in a system with n processes. This lower bound gives a pessimistic outlook on the design of efficient data structures.
In this chapter, based on [5], we take a different approach to designing and analyzing concurrent algorithms that are provably efficient (even when accounting for contention). In particular, instead of considering a fully general concurrent setting and trying to understand likely behaviors of the scheduler (as we did in the previous two chapters), we study concurrency in a restricted but commonly used setting. We show that by considering the setting in which a concurrent algorithm is run, it is possible to design it in a way that is provably efficient. We show this by presenting an implementation of an indicator data structure designed to be used in nested parallel programs.
Nested parallelism is a programming paradigm in which concurrency is created under the control of programming primitives. Nested parallelism is broadly available in a number of modern programming languages and language extensions, such as OpenMP, Cilk [124], Fork/Join Java [199], Habanero Java [159], TPL [203], TBB [161], X10 [79], parallel ML [120], and parallel Haskell [181]. In these systems, nested-parallel programs can be expressed by using primitives such as fork–join and async–finish. These primitives govern the number of processes that may participate in the execution; fork and async create new parallel tasks, allowing new processes to join the computation to execute these tasks.

Similarly, the join and finish primitives represent synchronization points, where parallel tasks are merged, terminating the processes that executed them. This allows the computation to have a tight bound on the potential amount of contention at any given point, allowing a data structure to 'prepare' for highly contended scenarios in advance. Thus, nested parallel computations yield a more structured concurrency setting, improving our ability to analyze concurrent algorithms that run in this setting.
Nested parallel computations can be modeled by a series-parallel directed acyclic graph, or sp-dag, where the vertices represent parallel tasks, and edges represent dependencies between them. A task cannot be executed unless all tasks that it depends on have completed; that is, all vertices with edges into a vertex v must have completed their execution before v can begin. The fork-join primitives allow each vertex in this dag to have an indegree of at most 2, but async-finish allows for arbitrary indegree. Variants of dag data structures are used broadly in the implementation of modern parallel programming systems [79, 120, 124, 159, 161, 181, 199, 203].
To execute such nested parallel programs efficiently, one must determine when a vertex in the dag becomes ready, i.e., when all of its dependencies have been executed. Such readiness detection requires a concurrent dependency counter, which we call an in-counter, that counts the number of incoming dependencies for each task. When a dependency between a task u and a task v is created, the in-counter of v is incremented; when u terminates, v's in-counter is decremented; when its in-counter reaches zero, vertex v becomes ready and can be executed. In this chapter, we design an in-counter for sp-dags that is guaranteed to be efficient under all possible nested parallel executions.
Our starting point for the design of the in-counter is prior work on relaxed concurrent counters, also called indicators, that indicate whether a counter is positive or not. While the lower bound of Fich et al. applies to exact counters, it does not apply to indicators, leaving room for implementations that guarantee less contention. In prior work, Ellen et al. [111] present the Scalable Non-Zero Indicator (SNZI) data structure with the goal of achieving less contention. They prove it to be linearizable, and evaluate it empirically, but do not provide any analytical upper bounds. The idea of the SNZI data structure is to have a tree of SNZI nodes, each of which can be incremented or decremented. A change is propagated up the tree to a node's parent only if the sign of the node has changed (i.e., the node's sum changed from positive to zero or vice versa). In this way, few changes get propagated all the way up to the root, thereby preventing high contention on a single node. However, depending on the execution and where increments and decrements are initiated on the tree, the SNZI data structure could still experience significant contention. In this chapter, we present an extension to the SNZI data structure that allows it to grow dynamically at run time in response to an increasing degree of concurrency. We show that in a nested-parallel computation, in which we have a good handle on the amount of concurrency that can occur due to the fork-join and async-finish primitives, the dynamic SNZI can be used to implement in-counters with O(1) amortized contention per operation.
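For concreteness, the in-counter interface just described could be implemented naively with a single fetch-and-add counter per vertex, as in the sketch below (our own illustration, not code from this chapter). This version is correct, but every increment and decrement on the same vertex targets one word, so its contention grows with the number of concurrent operations; the dynamic-SNZI-based design presented in this chapter avoids exactly this behavior.

#include <atomic>

// Naive dependency counter: matches the in-counter interface, but all
// operations on one vertex contend on a single atomic word.
struct NaiveInCounter {
    std::atomic<int> count;

    explicit NaiveInCounter(int n) : count(n) {}

    // Called when a new dependency on the owning vertex is created.
    void increment() { count.fetch_add(1); }

    // Called when a dependency is satisfied; returns true exactly when the
    // count reaches zero, i.e., when the vertex becomes ready to execute.
    bool decrement() { return count.fetch_sub(1) == 1; }

    bool is_zero() const { return count.load() == 0; }
};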
Finally, we present an implementation and perform an experimental evaluation in comparison to prior work (Section 4.5). Our results offer empirical evidence for the practicality of our proposed techniques, showing that our approach can perform well in practice. Our contributions are as follows.

• We present an extension to the original SNZI data structure that allows it to grow dynamically to accommodate varying numbers of processes.
• We show how to use this version of SNZI to implement a non-blocking data structure for series-parallel dags that can be used to represent nested-parallel computations.
• We prove that our series-parallel dag data structure guarantees low contention under a nested parallel programming model.
• We evaluate our algorithm empirically, showing that it is practical and can perform well in practice.

4.2 Background and Preliminaries

We adopt the definition of Fich et al. [116] for contention; the contention of an instruction i by process p in low-level execution E is the number of non-trivial1 instructions on the same memory location as i that occur between p’s most recent instruction and i in E. Recall that in this chapter, we do not restrict the adversarial scheduler beyond what is already imposed by the nested-parallel program, and therefore we do not use the definition of contention presented in Chapter 2. It would be interesting to see what bounds can be achieved for algorithms run in the nested-parallel paradigm when analyzing them under the model of Chapter 2, but this is future work beyond the scope of this thesis.
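As a hypothetical illustration of this definition (our own example, not one from [116]): suppose processes p, q, and r all operate on a single location x, and the low-level execution is E = read_p(x), CAS_q(x), write_r(x), CAS_p(x). The contention of p's CAS is 2, because two non-trivial instructions on x (q's CAS and r's write) occur between p's most recent instruction (its read) and the CAS itself; a read by another process in the same interval would not count, since reads are trivial.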

4.2.1 SP-DAGs

A nested-parallel program can be represented as a series-parallel dag, or an sp-dag, of threads, where vertices are pieces of computation, and edges represent dependencies between them. In fact, to execute such a program, modern programming languages construct an sp-dag and schedule it on parallel hardware. We present a modified implementation of an sp-dag, which incorporates a new indicator data structure for keeping track of unsatisfied dependencies. Our sp-dag data structure is provably efficient and can be used to execute nested-parallel programs. After defining sp-dags, we present in Section 4.3 our data structure for sp-dags by assuming an in-counter data structure that enables keeping track of the dependencies of a dag vertex. We then present our data structure for in-counters in Section 4.3.1.

Definition of SP-Dags. A series-parallel dag, or an sp-dag for short, is a directed acyclic graph (dag) that has a unique source vertex, which has indegree 0, and a unique terminal vertex, which has outdegree 0; it can be defined iteratively. In the base case, an sp-dag consists of a source vertex u, a terminal vertex v, and the edge (u, v). In the iterative case, an sp-dag G = (V,E) with source s and terminal t is defined by serial or parallel composition of two disjoint sp-dags G1 and G2, where G1 = (V1,E1) with source s1 and terminal t1, G2 = (V2,E2) with source s2 and terminal t2, and V1 ∩ V2 = ∅:
• serial composition: s = s1, t = t2, V = V1 ∪ V2, E = E1 ∪ E2 ∪ {(t1, s2)}.

1 Recall that non-trivial instructions are those that can have an effect on memory, including writes, FAA, and CASes, regardless of whether or not they succeed.

type snzi_node =
  struct {
    c: N ∪ {1/2}; initially 0            // counter
    v: N; initially 0                    // version number
    children: array of 2 snzi nodes; initially [null, null]
    parent: snzi_node
  }

void snzi_arrive (snzi_node a) { ... }
void snzi_depart (snzi_node a) { ... }
bool snzi_query (snzi_node a) { ... }

Figure 4.1: Partial pseudocode for the SNZI data structure; full description can be found in original SNZI paper [111].

• parallel composition: s, t ∉ V1 ∪ V2, V = V1 ∪ V2 ∪ {s, t}, and E = E1 ∪ E2 ∪ {(s, s1), (s, s2), (t1, t), (t2, t)}.
For any vertex v in an sp-dag, there is a path from s to t that passes through v. We define the finish vertex of v as the vertex u which is the closest proper descendant of v such that every path from v to the terminal t passes through u.
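The recursive definition above translates directly into code. The following C++ sketch (our own illustration, unrelated to the Dag data structure of Section 4.3) builds an sp-dag as an edge list by applying the base case and the two composition rules; for example, parallel(base(), base()) produces the diamond-shaped dag of a single fork-join.

#include <utility>
#include <vector>

// Illustrative construction of sp-dags following the recursive definition;
// vertices are integer ids and the dag is kept as an explicit edge list.
struct SpDag {
    int source, terminal;
    std::vector<std::pair<int, int>> edges;
};

static int next_id = 0;
static int fresh() { return next_id++; }

// Base case: source u, terminal v, and the single edge (u, v).
SpDag base() {
    int u = fresh(), v = fresh();
    return {u, v, {{u, v}}};
}

// Serial composition: s = s1, t = t2, plus the joining edge (t1, s2).
SpDag serial(SpDag g1, const SpDag& g2) {
    g1.edges.insert(g1.edges.end(), g2.edges.begin(), g2.edges.end());
    g1.edges.push_back({g1.terminal, g2.source});
    return {g1.source, g2.terminal, std::move(g1.edges)};
}

// Parallel composition: fresh source s and terminal t, with edges
// (s, s1), (s, s2), (t1, t), (t2, t).
SpDag parallel(SpDag g1, const SpDag& g2) {
    int s = fresh(), t = fresh();
    g1.edges.insert(g1.edges.end(), g2.edges.begin(), g2.edges.end());
    g1.edges.push_back({s, g1.source});
    g1.edges.push_back({s, g2.source});
    g1.edges.push_back({g1.terminal, t});
    g1.edges.push_back({g2.terminal, t});
    return {s, t, std::move(g1.edges)};
}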

4.2.2 The SNZI Data Structure

Concurrent counters are an important data structure used in many algorithms, but they can be challenging to implement efficiently when accounting for contention, as shown by the linear (in the degree of concurrency) lower bound of Fich et al. [116]. Observing that the full strength of a counter is unnecessary in many applications, Ellen et al. proposed non-zero indicators as relaxed counters that allow querying not the exact value of the counter but its sign or non-zero status [111]. Their non-zero indicator data structure, called SNZI, achieves low contention by using a tree of SNZI nodes, each of which can be used to increment or decrement the counter. Determining the non-zero status of the counter only requires accessing the root of the tree. The tree is updated by increment and decrement operations. These operations are filtered on the way up to the root so that few updates reach the root.
Figure 4.1 illustrates the interface for the SNZI data structure and pseudocode for the SNZI node struct snzi_node, the basic building block of the data structure from [111]. A SNZI node contains a counter c, a version number v, an array of children (two in our example), and a parent. The value of the counter indicates the surplus of arrivals with respect to the departures. The arrive and depart operations increment and decrement the value of the counter at the specified SNZI node a, respectively, and query, which must be called at the root of the tree, indicates whether the number of arrive operations in the whole tree is greater than the number of depart operations since the creation of the tree.
To ensure efficiency and low contention, the SNZI implementation carefully controls the propagation of an arrival or departure at a node up the tree. The basic idea is to propagate a change to a node's parent only if the surplus at that node flips from zero to positive or vice versa. More specifically, an arrive at node a is called on a's parent if and only if a had surplus 0 at the beginning of the operation.

Similarly, a depart at a node a is recursively called on a's parent if and only if a's surplus became 0 due to this depart. We say that node a has surplus due to its child if there was an arrive operation on a that started in a's child's subtree.
The correctness and linearizability of the SNZI data structure, as proven in [111], rely on two important invariants: (1) a node has surplus due to its child if and only if its child has surplus, and (2) surplus due to a child is never negative. Together, these properties guarantee that the root has surplus if and only if there have been more arrive than depart operations on the tree.
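The propagation rule can be seen in isolation in the following sequential sketch (our own simplification). It deliberately omits the version numbers, the intermediate 1/2 counter value, and the helping steps that the real SNZI data structure [111] needs in order to be linearizable under concurrent operations; it only illustrates when a change is forwarded to a node's parent.

// Sequential model of SNZI's filtering rule only; not safe for concurrent use.
struct SeqSnziNode {
    int surplus = 0;
    SeqSnziNode* parent = nullptr;
};

void arrive(SeqSnziNode* a) {
    bool was_zero = (a->surplus == 0);
    a->surplus += 1;
    // Forward to the parent only when this node flips from 0 to positive.
    if (was_zero && a->parent != nullptr) arrive(a->parent);
}

void depart(SeqSnziNode* a) {
    a->surplus -= 1;
    // Forward to the parent only when this node flips back to 0.
    if (a->surplus == 0 && a->parent != nullptr) depart(a->parent);
}

bool query(const SeqSnziNode* root) { return root->surplus > 0; }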

4.3 The Series-Parallel Dag Data Structure

Figure 4.2 shows our data structure for sp-dags. We represent an sp-dag as a graph G consisting of a set of vertices V and a set of edges E. The data structure assigns to each vertex of the dag an in-counter data structure, which counts the number of (unsatisfied) dependencies of the vertex. As the code for a dag vertex executes, it may dynamically insert and delete dependencies by using several handles. These handles can be thought of, in the abstract, as in-counters, but they are implemented as pointers to specific parts of an in-counter data structure. More precisely, a vertex consists of
• a query handle (query), an increment handle (inc), and a decrement handle (dec),
• a flag first_dec indicating whether the first or the second decrement handle should be used,
• a flag dead indicating whether the vertex has been executed,
• the finish vertex fin of the vertex, and
• a body, which is a piece of code that will be run when the scheduler executes the vertex.
The sp-dag uses an in-counter data structure Incounter, whose implementation is described in Section 4.3.1. The in-counter data structure supplies the following operations:
• make: takes an integer n, creates an in-counter with n as the initial count, and returns a handle to the in-counter;
• increment: takes a vertex v, increments its in-counter, and returns two decrement and two increment handles;
• decrement: takes a vertex v and decrements its in-counter;
• is_zero: takes a vertex v and returns true iff the in-counter of v is zero.
Inspired by recent work on formulating a calculus for expressing a broad range of parallel computations [4], our dag data structure provides several operations to construct and schedule dags dynamically at run time. The operation new_vertex creates a vertex. It takes as arguments a finish vertex u, an increment handle i, a decrement handle d, and a number n indicating the number of dependencies it starts with. It allocates a fresh vertex, sets its dead and first_dec flags to false, and sets the body for the vertex to be a dummy placeholder. It then creates an in-counter and a handle to it. The operation returns the new vertex along with the handle to its in-counter. The handle becomes the query handle. The operation make creates an sp-dag consisting of a root u and its finish vertex z and returns the root u. As can be seen in the function make, the increment and decrement handles of a vertex v always belong to the in-counter of v's finish vertex.

A vertex u and its finish vertex z can be thought of as a computation in which u executes the code specified by its body and then returns to z. This constraint implies that z serially depends on u because z can only be executed after u. As it executes, a vertex u can use the functions chain and spawn to "nest" another sequential or parallel computation (respectively) within the current computation.
The chain operation, which corresponds to serial composition of sp-dags, nests a sequential computation within the current computation. When a vertex u calls chain, the call creates two vertices v and w such that w serially depends on v. Furthermore, v serially depends on u, and z serially depends on w. After calling chain, u terminates and thus dies. The initial in-counters of v and w are set to 0 and 1, respectively (to indicate that v is ready for execution, but w is waiting on one other vertex). The edge (u, v) is inserted to the dag to indicate a satisfied dependency between u and v.
The spawn operation, which corresponds to parallel composition of sp-dags, nests a parallel computation within the current computation. When a vertex u calls spawn, the call creates two vertices, v and w, which depend serially on u, but are themselves parallel. The operation increments the in-counter of u's finish node, to indicate a serial dependency between u's finish vertex and the new vertices v and w. Even though two new vertices are created, the in-counter is incremented once, because one of the vertices can be thought of as a continuation of u. The increment operation returns two new increment handles i and j and a decrement-handle pair d. The spawn operation then creates the two vertices v and w using the two increment handles, one for each, and the decrement-handle pair d as a shared argument. Assigning each of v and w their own separate increment handles enables controlling contention at the in-counter of u's finish node. As described in Section 4.3.1, sharing of the decrement handles by the new vertices (v, w) is critical to the efficiency. As with the chain operation, spawn must be the last operation performed by u. It therefore completes by creating the edges from u to v and w, and marking u dead.
The operation signal indicates the completion of its argument vertex u by decrementing the in-counter for its finish vertex v, and inserting the edge (u, v).
In terms of efficiency, apart from calls to Incounter, the operations of sp-dags involve several simple, constant-time operations. The asymptotic complexity of the sp-dags is thus determined by the complexity of the in-counter data structure.

4.3.1 Incounters

We present the in-counter: a time- and space-efficient low-contention data structure for keeping track of pending dependencies in sp-dags. When a new dependency for vertex v is created in the dag, its in-counter is incremented. When a dependency vertex in the dag terminates, v's in-counter is decremented. Our goal is to ensure that the increment and decrement operations are quick; that is, that they access few memory locations and encounter little contention. These goals could seem contradictory at first—if operations access few memory locations then they would conflict often.

module Incounter = ... (* As defined in Figure 4.4 *)

class Dag {
  G = (V, E)
  type handle = Incounter.handle
  struct {
    handle query, inc, dec[2];
    boolean first_dec, dead;
    vertex fin;
    function body;
  } vertex;

  (vertex, handle) new_vertex (vertex u, handle i, handle d, int n) {
    V = V ∪ {v}, v ∉ V
    (v.firstDec, v.dead) = (false, false)
    v.body = { *code* }          // piece of code for this vertex to run
    v.query = Incounter.make(n)
    (v.fin, v.inc, v.dec) = (u, i, d)
    return (v, v.query)
  }

  (vertex, vertex) make () {
    V = {v}
    E = ∅
    (v.firstDec, v.dead) = (false, false)
    v.body = { *code* }          // piece of code for this vertex to run
    v.query = Incounter.make(1)
    u = new_vertex (u, v.query, v.query, 0)
    return (u, v)
  }

  (vertex, vertex) chain (vertex u) {
    (w, h) = new_vertex (u.fin, u.inc, u.dec, 1)
    (v, h) = new_vertex (w, h, [h, h], 0)
    E = E ∪ { (u, v) }
    u.dead = true
    return (v, w)
  }

  (vertex, vertex) spawn (vertex u) {
    (d, i, j) = Incounter.increment (u.fin)
    (v, _) = new_vertex (u.fin, i, d, 0)
    (w, _) = new_vertex (u.fin, j, d, 0)
    E = E ∪ { (u, v), (u, w) }
    u.dead = true
    return (v, w)
  }

  void signal (vertex u) {
    Incounter.decrement (u.fin)
    E = E ∪ { (u, u.fin) }
  }
}

Figure 4.2: Pseudocode for our sp-dag data structure.

(snzi_node, snzi_node) grow (snzi_node a, float p) {
  heads = flip(p)                       // flip a p-biased coin
  if (heads) {
    left = new snzi_node(0)
    right = new snzi_node(0)
    CAS(a.children, null, [left, right])
  }
  children = a.children
  if (children == null) { return (a, a) }
  return children
}

Figure 4.3: Pseudocode for the probabilistic SNZI grow operation.

Our in-counter data structure circumvents this issue by ensuring that each operation accesses few memory locations that are mostly disjoint from those of others. The in-counter is fundamentally a dynamic version of the SNZI tree. We begin by presenting this extension to the SNZI data structure of Ellen et al., which makes it able to dynamically adjust to contention.

4.3.2 Dynamic SNZI

The SNZI data structure does not specify the shape of the tree that should be used, instead allowing the application to determine its structure. For example, if the counter is being used by p threads concurrently, then we may use a perfectly balanced tree with p nodes, each of which is assigned to a thread to perform its arrive and depart operations. This approach, however, only supports a static SNZI tree, i.e., one that is unchanging over time. While a static SNZI tree may be sufficient for some computations, where a small number of coarse-grained threads are created, it is not sufficient for nested parallel computations, where the number of fine-grained threads varies dynamically over a wide range, from just a few to very large numbers, such as millions or even billions, depending on the input size. The problem is that, with a fixed-sized tree, it is impossible to ensure low contention when the number of threads increases, because many threads would have to share the same SNZI node. Conversely, it is impossible to ensure efficiency when the number of threads is small, because there may be many more nodes than necessary.
To support re-sizing the tree dynamically, we extend the SNZI data structure with a simple operation called grow, whose code is shown in Figure 4.3. The operation takes a SNZI node as argument, determines whether to extend the node (if the node has no children), and returns the (possibly newly created) children of the node. If the node already has children, pointers to them are returned, but they are left unchanged. If the node does not have children, then they are locally allocated, and the process then tries to atomically link them to the tree. They are initialized with surplus 0 so that their addition to the tree does not affect the surplus of their parent.
The grow operation also takes a probability p, which dictates how likely it is to create new children for the node. The operation only attempts to create new children if the node does not already have children and it flipped heads in a coin toss with probability p for heads. The idea is that this probabilistic growing allows an application to control the rate at which a SNZI tree grows without changing the protocol for any of the threads using it. Such probabilistic growing is useful when one wants a balance between low contention and frequent memory allocation.
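The coin flip used by grow is left abstract in Figure 4.3. One way flip(p) could be realized is with a thread-local pseudorandom generator, as in the sketch below; this is our own assumption about an implementation detail, not something specified by the data structure.

#include <random>

// Possible implementation of the p-biased coin flip used by grow().
// Each thread keeps its own generator so that flips require no sharing.
bool flip(double p) {
    thread_local std::mt19937_64 rng{std::random_device{}()};
    std::bernoulli_distribution coin(p);
    return coin(rng);
}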

The right probability threshold to use, however, may depend not only on the application, but also on the system. If node a does not have any children at the end of a grow operation, we have the operation return two pointers to a itself. This return value is convenient for the in-counter application that we present in the rest of the chapter. Other applications using dynamic SNZI may return anything else in that case.
The grow operation may be called at any time on any SNZI node. To keep the flexible structure of the SNZI data structure, we do not specify when to use the grow operation on the tree, instead leaving it to the application to specify how to do so. For example, in Section 4.3.1, we show how to use the grow operation to implement a low-contention dependency counter for dags. It is easy to see that the operation does not break the linearizability of the SNZI structure; it is completely independent of the count, version number, and parent fields of nodes already in the tree, which are the only fields that affect correctness and linearizability.
We note a key property of the grow operation. We would like to ensure that, given a probability p, when grow is called on a childless node, only 1/p such calls will return no children in expectation, regardless of the timing of the calls. That is, even if all calls to grow are concurrent, we want only 1/p of them to return no children. This property is ensured by having the coin flip happen before reading the value of the children array to be returned. This guarantees that any adversary that does not know the result of local coin flips cannot cause more than 1/p calls to return no children in expectation.
Dynamically shrinking the SNZI tree is more difficult, because we need to be sure that no operation could access a deallocated node. In general, one may easily and safely delete a SNZI node from the tree if the following two conditions hold: (1) its surplus is 0, and (2) no thread other than the one deleting the node can reach it. The second condition is much trickier to ensure. In Section 4.4 we show how to deallocate nodes in our dependency counter data structure.

4.3.3 Choosing Handles to the In-Counter

To ensure disjointness of memory accesses, our in-counter data structure assigns different handles to different dag vertices. These handles are pointers to SNZI nodes within the in-counter, dictating where in the tree an operation should begin. Thus, if two processes have different handles, their operations will access different memory locations.
The in-counter data structure ensures the invariant that only the leaves have a surplus of zero. This allows us to exploit an important property of SNZI: operations complete when they visit a node with positive surplus, effectively stopping propagation of the change up the tree. Specifically, an increment that starts at a leaf completes quickly when it visits the parent. To maintain this invariant, we have to be careful about which SNZI nodes are decremented.
Figure 4.4 shows the pseudocode for the in-counter data structure. The in-counter's interface is similar to the original SNZI data structure: the operation make creates an instance of the data structure and returns a handle to it. The operation increment is meant to be used when a dag vertex spawns. It takes in a dag vertex, and uses its increment handle to increment the SNZI node's surplus. Before doing so, it calls the grow operation. Intuitively, this growth request notifies the tree of possible contention in the future and gives it a chance to grow to accommodate a higher load. The increment operation returns handles to other nodes in the SNZI tree, that is, two decrement handles and two increment handles, indicating where the newly spawned children of the dag vertex should perform their increment or decrement.

1  class Incounter{
2    type handle = snzi_node          // A handle is a snzi node

4    float p = *prob*                 // Growth probability: architecture-specific constant.

6    // Make a new counter with surplus n.
7    handle make(n) { snzi_make(n) }

9    // Auxiliary function: select decrement handle.
10   handle claim_dec(vertex u) {
11     if (CAS(u.firstDec, false, true)) { u.dec[0] }
12     else u.dec[1] }

14   // Increment u's target dependency counter.
15   (handle[2], handle, handle) increment (vertex u) {
16     (a, b) = grow(u.inc, p)
17     (i1, i2) = (a, b)
18     if (u is a left child) { d2 = a }
19     else { d2 = b }
20     snzi_arrive(d2)
21     d1 = claim_dec(u)
22     return [[d1, d2], i1, i2] }

24   // Decrement u's counter.
25   void decrement (vertex u) {
26     d = claim_dec(u)
27     snzi_depart(d) }

29   // Check that u's counter is zero.
30   boolean is_zero(u) { return snzi_isZero(u.query) }
31 }

Figure 4.4: Pseudocode for the in-counter data structure.

The operation decrement is meant to be used when a dag vertex signals the end of its computation. It takes in a dag vertex, and uses its decrement handle to decrement the surplus of a previously incremented node.
Since the increment operation is the only thing that can cause growth in the SNZI tree, the structure of the tree is determined by the operations performed on the in-counter structure. The tree is therefore not necessarily balanced. However, as we establish in Section 4.4, the operations' running time does not depend on the depth of the tree. In fact, we show that the operations complete in constant amortized time and also lead to constant amortized contention.
To understand the implementation, it is helpful to consider the way the sp-dag structure uses the in-counter operations. The increment operation is called by a dag vertex when the vertex calls the spawn operation, which uses the vertex's increment handle to determine the node in the SNZI tree where an arrive operation will start. To find the place, the increment operation first uses a grow operation to check whether the SNZI node pointed at by the handle has children. Additionally, the operation creates new children if necessary (line 16). Intuitively, we create new SNZI nodes to reduce contention by using more space. Thus, if the increment handle has children, we should make use of them and start our arrive operation there. This is exactly what the increment operation does (line 20). Note that, in case the grow operation does not return new children, we simply have the arrive operation start at the increment handle. The increment operation returns two increment handles and two decrement handles; one of each for each of the dag vertex's new children.
The in-counter algorithm makes use of an important SNZI property: decrements at the top of the tree do not affect any node below them. Furthermore, if there is a depart operation at SNZI node a, it cannot cause a to phase change as long as there is surplus anywhere in a's subtree. These observations lead to the following intuition: to minimize phase changes in the SNZI tree, priority should be given to decrementing nodes closer to the root.
Thus, each dag vertex stores two decrement handles rather than just one. These two decrement handles are shared with the vertex's sibling, and they are ordered: the first handle always points to a SNZI node that is higher in the tree than the second. A decrement handle is needed by the decrement operation (line 26) and the increment operation. Recall that an increment always returns two decrement handles. One of these decrement handles always points to the SNZI node on which the increment operation started its arrive. However, the other one is inherited from the parent dag vertex (the one that invoked the increment). To preserve the invariant, the incrementing vertex needs to claim a decrement handle (line 21). The two decrement handles returned are always ordered as follows: first, the one inherited from the parent, and, then, the one pointing at the freshly incremented SNZI node. In this way, we guarantee that the first decrement handle in any pair points higher in the tree than its counterpart. When a decrement handle is needed, the two dag vertices determine which handle to use through a test-and-set.
The first of the two to use a decrement handle will take the first handle, thus ensuring that higher SNZI nodes are decremented earlier. We rely on this property to prove our bounds in Lemma 4.4.8. Note that, in an increment operation, the decrement handle is claimed only after the arrive has completed. This key invariant helps to ensure that phase changes rarely occur in the SNZI tree, thus leading to fast operations.

4.4 Correctness and Analysis

We first prove that our in-counter data structure is linearizable and then establish its time and space efficiency. We say that the in-counter is correct if any is_zero operation at the root of the tree correctly indicates whether there is a surplus of increment operations over decrement operations.
To prove correctness, we establish a symmetry with the original SNZI data structure. Note that each increment, decrement, and is_zero operation calls the corresponding SNZI operation (i.e., arrive, depart, and query, respectively) exactly once. We define the linearization point of each operation as that of the corresponding SNZI operation. Since the correctness condition for the in-counter and for SNZI are the same, we have the following observation.
Observation 4.4.1. An execution of the in-counter is linearizable if the corresponding SNZI execution is linearizable.
At this point, we recall that any SNZI run is linearizable if there are never more departs than arrives on any node, i.e., the surplus at any node is never negative [111]. We show that this invariant indeed holds for any valid execution of the in-counter data structure, and therefore that any valid execution is linearizable. We define a valid execution as follows.
Definition 4.4.2. An in-counter execution is valid if any handle passed as argument to the decrement operation was returned by a prior increment operation. Furthermore, that handle is passed to at most one decrement operation.
Lemma 4.4.3. Any valid in-counter execution is linearizable.

Proof. Observe that every increment operation generates one new decrement handle, and that handle points to the node at which this increment's arrive operation was called. Thus, in valid executions, any depart on the underlying SNZI data structure (which is always called by the decrement operation) has a corresponding arrive at an earlier point in time. Therefore, there are never more departs than arrives at any SNZI node, and the corresponding SNZI execution is linearizable. From Observation 4.4.1, the in-counter execution is linearizable as well.

It is easy to see that any execution of the in-counter that only accesses the object as prescribed by the sp-dag algorithm is valid. From this observation, the main theorem follows immediately.
Theorem 4.4.4. Any execution of the in-counter through sp-dag accesses is linearizable.

4.4.1 Running Time

To prove that the in-counter is efficient, we analyze shared-memory steps and contention separately: we first show that every operation takes an amortized constant number of shared memory steps, and then we show that each shared memory location experiences constant amortized contention at any time. In our analysis, we do not consider is_zero, since it does not do any non-trivial shared memory steps.
For the analysis, consider an execution on an sp-dag, starting with a dag consisting of a single root vertex and its corresponding finish vertex. Further, observe that this finish vertex has a single SNZI node as the root of its in-counter. Note that all shared memory steps in the execution are on the in-counters. We begin by making an important observation.
Lemma 4.4.5. There exist at most one increment handle and one decrement handle pointing to any given SNZI node.
To prove this lemma, note that every increment operation creates new children for the SNZI node whose handle is used, and thus does not produce another handle to that node. A simple induction, starting with the fact that the root node only has one handle pointing to it, yields the result.
We now want to show that every operation on the in-counter performs an amortized constant number of shared memory steps. First, note that the only calls in the in-counter operations that might result in more than a constant number of steps are their corresponding SNZI operations. The SNZI operations do a constant number of shared memory steps per node they reach, but they can be recursively called up an arbitrarily long path in the tree, as long as nodes in that path phase change due to the operation. That is, frequent phase changing in the SNZI nodes will cause operations on the data structure to be slower. In fact, we show that on average, the length of the path traversed by an arrive or depart operation is constant, when those SNZI operations are called only through the in-counter operations.
We say that a dag vertex is live if it is not marked dead, and a handle is live if it is owned by a live vertex. We have the following lemma about live vertices.
Lemma 4.4.6. If a dag vertex v is live, then it has not used claim_dec to claim a decrement handle.
Keeping track of live vertices in the dag is important in the analysis, since by Lemma 4.4.6, these are the vertices that can use their handles and potentially cause more contention.
To show that operations on SNZI nodes are fast, we begin by considering executions in which only increment operations are allowed, and no decrements happen on the in-counter. Our goal is to show that the in-counter's SNZI tree is relatively saturated, so that arrive operations that are called within an increment only access a constant number of nodes. After we establish that increment operations are fast when decrements are not allowed, we bring back the decrement operations and see that they cannot slow down the increments.
Lemma 4.4.7. Without any decrement operations, when there are no increments mid-execution, only leaves of the in-counter's SNZI tree can have surplus 0.

Proof. The proof is by induction on the number of increment operations called on the in-counter. The tree is initialized with one node with surplus 1. Thus, when there have been 0 increments, no node has surplus 0.
Assume that the lemma holds for in-counters that have had up to k increments on them. Consider an in-counter that has had k increments, and consider a new increment operation invoked on node a in the tree. If a is not a leaf, the grow operation does not create new children for a, and the tree does not grow. Thus, the lemma still holds. If a is a leaf, the increment operation creates two children for a and starts an arrive operation on one of the children. As a consequence of the SNZI invariant, we know that the surplus of a increased by 1. After the increment, regardless of what its surplus was before this operation, a cannot have surplus 0, and the lemma still holds.

By Lemma 4.4.7, we know that, ignoring any effect that decrements may have, the tree has surplus in all non-leaf nodes. Recall that an arrive operation that reaches a node with positive surplus will terminate at that node. Thus, any arrive operation invoked on this tree will only climb up a path until it reaches the first node that was not a leaf before this increment started. That is, since each increment only expands the tree by at most one level, any arrive operation will reach at most 3 nodes.
We now bring back the decrement operations. The danger with decrements is that they could potentially cause non-leaf nodes to phase change back to 0 surplus, meaning that an arrive operation in that node's subtree might have to traverse a long path up the SNZI tree. Our main lemma shows that traversal of a long path can never happen; if a decrement causes a vertex's surplus to go back to 0, then no subsequent increment operation will start in that node's subtree.
Lemma 4.4.8. If an in-counter node a at any point had a surplus and then phase changed to 0 surplus, then no live vertices in the dag point anywhere in its subtree.

Proof. We prove this by induction on the size of a's subtree. Assume that node a in the in-counter has a surplus at time t, and consider the decrement operation that caused a's surplus to become 0.
If a has no children, we only need to consider the handles that point directly at a, since it has no subtree. Note that it must be the case that a was incremented by an arrive operation that started there, in an increment operation that had an increment handle to a's parent. During that operation, a decrement handle to a was created, and placed as the second of the pair of decrement handles returned. If a was now decremented, both dag vertices that shared these decrement handles must have claimed them, and thus, by Lemma 4.4.6, neither of them is now live. In particular, this means that the dag vertex that owned the increment handle to a is dead. So, no live dag vertex has a handle to a.
Assume that the lemma holds for nodes with subtrees of size at most k. Now consider a SNZI node a which has a subtree of size k + 1. Consider the decrement operation that caused a's surplus to change to 0. There are two possible cases: either (1) this decrement's first depart is at a, or (2) it started at one of a's descendants.
CASE 1. This decrement started at a. Note that a has children, but neither of them has a surplus at the time this decrement operation starts, since otherwise, a would have a surplus due to its children, and would not phase change. Note that for a to have children, an increment operation must have started at a and used grow to create a's children. This same increment then arrived at one of a's children. So at least one of a's children must have had surplus at some point, and now neither of them does. By the induction hypothesis, if both of them had surplus at some point, then no live dag handles point anywhere in a's children's subtrees now. Furthermore, since by Lemma 4.4.5 only one increment handle and one decrement handle are ever created for a, no handle in the dag points anywhere in a's subtree (including a itself). If only one of a's children ever had surplus, by the induction hypothesis, that child now has no handles pointing to it or its subtree. Furthermore, for that child's surplus to become 0, both of the dag vertices that were created in the spawn operation that created a's children must have used their decrement handles. Note that this includes the only dag vertex that had an increment handle to a's other child. Thus, neither of a's children have any live handles pointing to them, and since a's handles have been used, no live handles now point to anywhere in a's subtree.
CASE 2. The decrement that caused a's phase change started in a's subtree, but not at a itself. Let ar denote a's right child and al denote its left child. Without loss of generality, assume that the decrement started in ar's subtree. First note that ar must have had surplus before this decrement started, since no node can have a negative surplus. We also know that after this decrement, ar has a surplus of 0, since it must have phase changed in order for a depart to reach a. Thus, by the induction hypothesis, no live handles point to ar. We now only have to consider al to see whether there are still live handles pointing to that subtree. We know that al does not have surplus, since otherwise a would not phase change due to this decrement. Assume by contradiction that there is a live handle pointing to al. Then by the induction hypothesis, al has never had a surplus, and there is a live increment handle to it. Note that, by the algorithm, the dag vertex that has the increment handle to al must have a decrement handle pointing to ar. By Lemma 4.4.6, this decrement pointer has never been used, thus contradicting our earlier conclusion that ar has 0 surplus.
Note that Lemma 4.4.7 and Lemma 4.4.8 immediately give us the following important property.
Corollary 4.4.9. No increment operation can invoke more than 3 arrive operations on the SNZI tree.

Proof. We already saw that by Lemma 4.4.7, if no decrements happen, then each increment operation can invoke at most 3 arrive operations. Now consider decrements as well. Note that if a non-leaf node has surplus 0 and there is no increment mid-operation at that node, then it must have previously had surplus (again by Lemma 4.4.7). By Lemma 4.4.8, no new increment operations can happen on that node’s subtree. Thus, even allowing for decrement operations, no increment operation can invoke more than 3 arrives on the SNZI tree.

To finish the proof, we amortize the number of departs invoked by decrements against the arrives invoked by increments. Recall that the number of departs invoked on a SNZI tree is at most the number of arrives invoked on it. We get the following theorem:

Theorem 4.4.10. Any operation performed on the in-counter itself invokes amortized O(1) operations on the SNZI tree.

We now move on to analyzing the amount of contention a process experiences when executing an operation on the in-counter. Note that several models have been suggested to account for contention, as discussed in Section 4.6. In our contention analysis, we say that an operation contends with a shared memory step if that step is non-trivial, and it could, in any execution, be concurrent with the operation at the same memory location. Our results are valid in any contention model in which trivial operations do not affect contention (e.g., stalls [116], the CRQW model [132], etc.).

Theorem 4.4.11. Any operation executed on the in-counter experiences O(1) amortized contention.

Proof. We show something stronger: the maximum number of operations that access a single SNZI node over the course of an entire dag computation (other than is_zero operations on the root) is constant.

Consider a node a in the SNZI tree. By Lemma 4.4.5, only one increment operation ever starts its arrive at a. That increment has exactly one corresponding decrement whose depart starts at a. By the SNZI algorithm, an arrive at a's child only propagates up to a if that child's surplus was 0 before the arrive. By Lemma 4.4.8, this situation can only happen once; if a's child's surplus returns to 0 after being higher, the counter will never be incremented again. Thus, each of a's children can only ever be responsible for at most two operations that access a: the initial arrive and the final depart from that subtree. In total, we get a maximum of 6 operations that access a over the course of the computation. Note also that the first arrive at a node always strictly precedes any other operation in that node's subtree. Thus, concurrent arrive operations are not an issue. Therefore, any operation on the in-counter can be concurrent with at most 4 other operations per SNZI node it accesses. By Theorem 4.4.10, every operation on the in-counter accesses amortized O(1) nodes. Therefore, any such operation experiences amortized O(1) contention.

4.4.2 Space Bounds

Shrinking the tree by removing unnecessary nodes is tricky because, in its most general form, an ideal solution likely requires knowing the future of the computation. Knowing where in the tree to shrink requires knowing which nodes are not likely to be incremented in the future. For the specific case of a grow probability of 1, we establish a safety property that makes it possible to keep the data structure compact by removing SNZI nodes that correspond to deleted sub-graphs of the sp-dag.

We would like to show that our in-counter data structure does not take much more space than the dag computation uses on its own. To do so, we will show when it is safe to delete SNZI nodes in relation to the state of the dag vertices that have handles to them. First, however, we note that even without dynamic shrinking of the SNZI tree, there are never more nodes in the in-counter than the total number of dag vertices created in an execution. This is because whenever a dag vertex spawns to create two more children, the SNZI tree also grows by two nodes. However, we can do better than that by deleting SNZI nodes that will never be used again. For this, we begin with the following lemma:

Lemma 4.4.12. Any node whose surplus was positive and then returned to 0 may be deleted safely from the SNZI tree.

This lemma is a direct consequence of Lemma 4.4.8. Any node whose surplus goes back to 0 has no live handles pointing to it, and is thus safe to delete. Using this knowledge, we can shrink the SNZI tree as the computation progresses. To be able to express the next theorem, we need to define when a vertex u in the dag has finished. This is a recursive definition:
• Base Case. If u has no children, then it is finished when it signals the end of its computation.
• Iterative Case. If u has children, then it is finished when both of its children have finished.
We note the following important lemma, which relates finished vertices to the state of the execution in an sp-dag that uses in-counters.

Lemma 4.4.13. If a vertex u has finished, then the decrement handle that it claimed has been used.

Proof. We prove this by induction on the number of descendants u has. If u doesn’t have any descendants, then we know it had to use its claimed decrement handle before it finished, since by definition, it finishes when it signals, and to signal, it needs to call the decrement operation on the in-counter. Assume the lemma holds for any dag vertex u with up to k descendants. Let u be a dag vertex with k + 1 descendants. By the induction hypothesis, both of its children have used their claimed decrement handles. Note that of those two handles, one was claimed by u and passed down to its children. Thus, u’s decrement handle has been used.

Now, we can relate finished vertices to deallocation. Theorem 4.4.14. Consider a dag vertex u and its increment handle, pointing to SNZI node a. If u has finished, then any node in a’s subtree, excluding a itself, can be safely deleted from the SNZI tree.

Proof. We prove this by induction on the size of a’s subtree. If a doesn’t have children, then we are done - no nodes are deleted when u finishes. Assume that if u is finished, and has an increment handle to SNZI node a, which has a subtree of size at most k, then a’s entire subtree, excluding a itself, can be deleted. Let u be a finished vertex whose increment handle points to a node a with a subtree of size k + 1. Note that by definition, if u is finished, then both of its children are finished as well. Note that, by the algorithm, u’s children’s increment handles must point to a’s children in the SNZI tree. Thus, by the induction hypothesis, a’s entire subtree, excluding itself and its children, can be deleted safely. In particular, it means that neither of a’s children have surplus due to their children. Note that only one of a’s two children had an arrive operation start on it (this was done by u’s increment operation, which created a’s children). Thus, if neither of them has surplus due to their children, at most one of them has surplus. Note that the decrement handle to that child was owned by one of u’s children. By Lemma 4.4.13, since both of u’s children have finished, they have both used their decrement handles. Thus, a’s child has been decremented, and now has surplus 0. Furthermore, even if a’s other child never had surplus, the dag vertex that owned its increment handle has used its decrement handle, so a’s other child can never grow. Thus, both of a’s children can be safely deleted.

4.5 Experimental Evaluation

We report an empirical comparison between our in-counter algorithm, the simple fetch-and-add counter, and an algorithm using fixed-size SNZI trees. Our implementation consists of a small C++ library that uses a state-of-the-art implementation of a work-stealing scheduler [3]. Overall, our results show that the simple fetch-and-add counter performs well only when there is one core, the fixed-size SNZI trees scale better, and our in-counter algorithm outperforms the others when the core count is two or more.

Implementation. Our implementation of the core SNZI operations snzi_arrive and snzi_depart follows closely the algorithm presented in the original SNZI paper, except for two differences. First, we do not need snzi_query because readiness detection is performed via the return result of snzi_depart. This method suffices because only the caller of snzi_depart can bring the count to zero. Second, our snzi_depart returns true if the call brought the counter to zero. To control the grain of contention, we use the probabilistic technique presented in Section 4.3.2, with probability p := 1/(25c), where c is the number of cores. The idea of using c is to try to keep contention the same as the number of cores increases. We chose the constant factor 25 because it yields good results, but as our Threshold study, described below, shows, many other constants also yield good results; e.g., any probability in the range 1/(25c) ≤ p ≤ 1/(2.5c) yields qualitatively the same results. The implementation of the sp-dags and in-counter data structures (Section 4.3) builds on the SNZI data structure. The sp-dags interface enables writing nested-parallel programs using various constructs, such as fork-join and async-finish; we use the latter in our experiments. We compare our in-counter with an atomic fetch-and-add counter because the fetch-and-add counter is optimal for very small numbers of cores. For higher numbers of cores, we designed a different, SNZI-based algorithm that uses a fixed-depth SNZI tree. This algorithm gives us another point of comparison, by offering a data structure that uses the existing state of the art more directly. The fixed-depth SNZI algorithm allocates for each finish block a SNZI tree of 2^{d+1} − 1 nodes, for a given depth d. To maintain the critical SNZI invariant that the surplus of a SNZI node never becomes negative, the fixed-depth SNZI algorithm ensures that every snzi_depart call targets the same SNZI node that was targeted by a matching snzi_arrive call. To determine which SNZI node should be targeted by a snzi_arrive call, we map dag vertices to SNZI nodes using a hash function, as sketched below, to ensure that operations are spread evenly across the SNZI tree.
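The C++ fragment below is an illustrative sketch of these two knobs, under our own naming: grow_with_probability is a hypothetical helper for the probabilistic technique of Section 4.3.2, and leaf_for_vertex is a hypothetical mapping of a dag-vertex id onto a leaf of a fixed-depth SNZI tree laid out implicitly in an array. Only snzi_arrive, snzi_depart, and the probability p = 1/(25c) come from the text above.

#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>

// Probabilistic grain control (Section 4.3.2): on an arrive, grow the SNZI
// tree with probability p = 1/(25 * c), where c is the number of cores.
bool grow_with_probability(unsigned num_cores, std::mt19937& rng) {
  std::uniform_real_distribution<double> coin(0.0, 1.0);
  return coin(rng) < 1.0 / (25.0 * num_cores);
}

// Fixed-depth variant: hash a dag-vertex id onto one of the 2^d leaves of a
// SNZI tree of depth d stored in heap order, so that the matching
// snzi_depart call can target the same node as its snzi_arrive call.
std::size_t leaf_for_vertex(std::uint64_t vertex_id, unsigned depth) {
  std::size_t num_leaves = std::size_t{1} << depth;   // 2^d leaves
  std::size_t first_leaf = num_leaves - 1;            // nodes 0 .. 2^{d+1}-2
  return first_leaf + std::hash<std::uint64_t>{}(vertex_id) % num_leaves;
}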

Experimental Setup. We compiled the code using GCC -O2 -march=native (version 5.2). Our machine has four Intel E7-4870 chips and 1TB of RAM and runs Ubuntu Linux kernel v3.13.0-66-generic. Each chip has ten cores (40 total) and shares a 30MB L3 cache. Each core runs at 2.4GHz and has 256KB of L2 cache and 32KB of L1 cache. The machine has a non-uniform memory architecture (NUMA), whereby RAM is partitioned into four banks. Pages are, by default in our Linux distribution, allocated to banks according to the first-touch policy, which assigns a freshly allocated page to the bank of the processor that first accesses the page. An alternative policy assigns pages to banks in a round-robin fashion. We determined that, for the purposes of our experiments, the choice of NUMA policy has no significant impact on our main results.

fun fanin_rec(n)
  if n ≥ 2
    async fanin_rec(n/2)
    async fanin_rec(n/2)

fun fanin(n)
  finish { fanin_rec(n) }

Figure 4.5: Fan-in benchmark.

fun indegree2(n)
  if n ≥ 2
    finish {
      async indegree2(n/2)
      async indegree2(n/2)
    }

Figure 4.6: Indegree-2 benchmark.

Figure 4.7: Fanin benchmark with n = 8 million and varying number of processors (curves: Fetch & Add, SNZI depths 1–9, in-counter). Higher is better.

Figure 4.8: Fanin benchmark, varying the number of operations n (in-counter at 1, 10, 20, 30, and 40 cores). Higher is better.

Figure 4.9: Indegree-2 benchmark, using number of operations n = 8 million (curves: Fetch & Add, SNZI depths 2 and 4, in-counter). Higher is better.

The experiment we performed to back this claim is described in more detail in the appendix. For all other experiments, we used the round-robin policy.

Benchmarks: fanin and indegree2. To study scalability, we use the two small benchmarks shown in Figures 4.5 and 4.6. The fanin benchmark performs n async calls, all of which synchronize at a single finish block, making the finish block a potential source of contention. The fanin benchmark implements a common pattern, such as a parallel-for, where a number of independent computations are forked to execute in parallel and synchronize at termination. Our second benchmark, indegree2, implements the same pattern as fanin but using binary fork-join. This program yields a computation dag in which each finish vertex has indegree 2.

Scalability Study. Figure 4.7 shows the scalability of the fanin benchmark using different counter data structures. With one core, the Fetch & Add counter performs best because there is no contention, but with more cores, Fetch & Add gives the worst performance (this pattern is occluded by the other curves). For more than one core, our in-counter performs the best. The fixed-depth SNZI algorithm scales poorly when the depth is small, doing best at tree depth 8, where there are enough nodes to deal with contention among 40 cores. Increasing the depth further does not improve performance, however.

Size-Invariance Study. Theorem 4.4.11 shows that the amortized contention of our in-counter is constant. We test this result empirically by measuring the running time of the fanin benchmark for different input parameters n. As shown in Figure 4.8, for all input sizes considered and for all core counts, the throughput is within a factor of 2 of the single-core (no-contention) fetch-and-add counter. The throughput suffers a bit for the smaller values of n because, at these values, there is not enough useful parallelism to feed the available cores.

Low-Indegree Study. Our next experiment examines the overhead imposed by the in-counter by using the indegree2 benchmark shown in Figure 4.6. The results, illustrated in Figure 4.9, show that our in-counter data structure is within a factor of 2 of the best performer, the fetch-and-add counter. For the fixed-depth SNZI, we only considered small depths, since larger ones took too long to run. With the fixed-depth SNZI, the benchmark creates many finish blocks and thus many SNZI trees, becoming inefficient for large trees.

Threshold Study. In Section 4.3.2, we presented a technique to control the grain of contention by expanding the SNZI tree dynamically. Here, we report, using the fanin benchmark, the results of varying the probability p = 1/threshold for different settings of threshold. As shown in Figure 4.10, essentially any threshold between 50 and 1000 works well. We separately checked that on our test machine, using a constant threshold such as 100 or 1000 yields good results for any number of cores. On a different machine, perhaps with many more cores, the best setting might be different or might also depend on the number of cores used.

Figure 4.10: Threshold experiment. Each bar represents a different setting for threshold (p = 1/threshold). All runs were performed using 40 cores. Higher is better.

4.6 Related Work

Complexity models for analyzing contention, and techniques for designing algorithms for reducing contention, have both been studied before. Upper bounds accounting for contention have been proven for several non-blocking data structures, including several tree- and list-based search structures [114, 121, 244]. In many of these problems, contention factors in as a linear additive term, which is consistent with our upper bounds. For relaxed counters, general lower bounds have been established that show that contention cannot be avoided in the general concurrency model; in contrast, the contention of series-parallel programs can be upper bounded by a constant [115, 116, 151, 168].

Contention has also been identified as an important practical performance issue in multiprocessor systems. Techniques to reduce contention by detecting contention and introducing wait periods between threads [110, 130, 132, 177, 217], by passing tokens between threads [19], or by using contention managers that carefully schedule transactions [143, 152, 260, 266] proved essential for minimizing the detrimental effects of aborted transactions in software-transactional memory.

In this chapter, we consider non-blocking, or lock-free, algorithms. Since our upper bounds are O(1), our algorithms guarantee that each operation completes quickly in our relaxed concurrency model. Wait-free data structures can guarantee similar properties in the general concurrency model [148, 149]. Such data structures were traditionally considered to be impractical, though recent results show that they may also be practical [186, 273].

Part II

Non-Volatile Random Access Memory


Chapter 5

Delay-Free Simulations on NVRAM

5.1 Introduction

In the previous part of the thesis, we studied shared memory, and adjusted the classic model to reflect parts of shared memory behavior that are not usually taken into account. Notably, we did not consider new memory technology, but instead deepened our understanding of existing hardware. In contrast, in this part, which consists of only one chapter, we study a completely new technology, called Non-Volatile Random Access Memory (NVRAM), that changes the way in which shared memory can be used. This chapter is based on the work presented in [47].

NVRAM offers byte-addressable persistent memory at speeds comparable with DRAM. This memory technology can now co-exist with DRAM on the newest Intel machines, and may largely replace DRAM in the future. Upon a system crash in a machine with NVRAM, data stored in main memory will not be lost. However, without further technological advancements, caches and registers are expected to remain volatile, losing their contents upon a crash. Thus, NVRAM yields a new model and opportunity for programs running on such machines—how can we take advantage of persistent main memory to recover a program after a fault, despite losing values in cache and registers?

Programs for persistent storage in the past have prioritized design concerns that are not relevant for use on NVRAM; since disk accesses are in block granularity, and are significantly more expensive than DRAM accesses, persistent programs were primarily designed to minimize block accesses. However, NVRAM accesses are not nearly as expensive, being similar in latency to DRAM. Furthermore, the byte-addressability of NVRAM means that previous algorithms that assumed block-granularity of accesses to persistent memory are not only optimizing for the wrong metric, but also potentially wrong if run on NVRAM. Whereas before, blocks could be updated on persistent storage atomically, on NVRAM we may experience a fault in the middle of such an update. However, this shift in design concerns is worth the effort, since it offers the potential to have persistent programs that are orders of magnitude faster than those that rely on disk. A lot of work has thus focused on developing persistent algorithms for index trees [81, 201, 204, 245, 276], lock-based data structures [74, 237], and lock-free data structures [89, 100, 122] for this new NVRAM setting.

A natural question that arises is whether we can find general mechanisms that would port

memory-only (non-persistent) algorithms for current machines over to the new persistent setting. One approach has been the development of persistent transactional memory frameworks [92, 188, 215, 228]. This can be an effective approach, although it does not handle code between transactions. Another approach to achieve arbitrary persistent data structures is the design of persistent universal constructions [89, 166]. In particular, Cohen et al. [89] present a universal construction that only requires one flush per operation, thereby achieving optimality in terms of flushes. However, universal constructions often suffer from poor performance, because they sequentialize accesses to the data structure, and lead to executions with high contention. Furthermore, universal constructions are only applicable to data structures with clearly defined operations, and cannot apply to a program as a whole. For these reasons, even a seemingly efficient universal construction leaves more to be desired.

Another downside of these approaches is that they don't make use of existing efficient concurrent algorithms designed for DRAM (without persistence). As discussed in the previous part of the thesis, the performance of concurrent algorithms heavily depends on factors like contention, disjoint access parallelism, and remote accesses. While these factors are difficult to characterize theoretically, they are monumental in their effect on an algorithm's performance. Decades of work on the design of concurrent algorithms has yielded carefully engineered solutions that take these factors into account. When creating algorithms for the persistent setting, it is hence important to be able to preserve the structure of these tried and tested efficient concurrent algorithms in their persistent counterparts.

In this chapter, we therefore take a different approach; we consider simulators that take any concurrent program and transform it by replacing each of its instructions with a simulation that has the same effect. The simulation provides a mapping between the original version and the new version of the algorithm, preserving its structure.

We assume the Parallel Persistent Memory (PPM) model [30, 59]. The model consists of n processes, each with a fast local ephemeral memory of limited size, and sharing a large persistent memory. The model allows for each process to fault and restart (independently or together). On faulting, all of the process's state and local ephemeral memory are lost, but the persistent memory remains. On restart, each process has a location in persistent memory that points to a context from which it loads its registers, including the program counter, and restarts.

In this model, we define persistent simulations, which exhibit a tradeoff between their computation delay, meaning the overhead introduced by the simulation in a run without faults, and their recovery delay, which is the maximum time the simulation takes to recover from a fault. We also consider the contention delay, which characterizes the extra contention that may be introduced by the simulation. Our first result is the presentation of a general persistent simulator, called the Constant-Delay Simulator, that takes any concurrent program using Reads and compare-and-swap (CAS) operations, and simulates it with constant computation and recovery delays.

Theorem 5.1.1. Any concurrent program that uses Reads and CAS operations can be simulated in the persistent memory model with constant computation delay and constant recovery delay.
The Constant-Delay Simulator is achieved by transforming code to be idempotent, meaning that it can be repeated several times and only take effect once. To make code idempotent, we use two techniques: code encapsulation and recoverable primitives.

The idea behind code encapsulation, introduced by Blelloch et al. [59], is that the program is broken up into contiguous chunks of code, called capsules. Between every pair of capsules, information about the state of the execution is persisted. When a crash occurs in the middle of a capsule, recovery reloads information to continue the execution from the beginning of the capsule. This means that some instructions may be repeated several times, but only within a single capsule. Thus, this reduces the problem of making the entire code idempotent to the problem of ensuring that each of the smaller capsules is idempotent. Blelloch et al. [59] gave sufficient conditions to ensure that sequential code is idempotent. However, repetitions of concurrent code can be even more hazardous, as other processes may observe changes that should not have happened. We therefore formalize what it means for a capsule to be correct in a concurrent setting, and show how to build correct concurrent capsules.

To build correct capsules for general concurrent programs, we must be able to determine whether a modification of a shared variable can safely be repeated. This can be problematic, because often, the execution of a process depends on the return value of its accesses to shared memory. A bad situation can occur if a process has already made a persistent change in shared memory, but crashed before it could persist its operation's return value [59]. Attiya et al. [30] consider this problem and define nesting-safe recoverable linearizability (NRL), a correctness condition for persistent objects that allows them to be safely nested. In particular, Attiya et al. [30] introduced the recoverable CAS, a primitive which ensures that if a compare-and-swap by process p has successfully changed its target, this fact will be made known to p even if a crash occurs.

At a high level, we show how to combine the ideas of capsules with recoverable primitives in a careful way to achieve our results for programs with shared reads and CASes. We use a modification of this recoverable CAS primitive in our capsules to ensure that a program can know whether it should repeat a CAS. We show that the recoverable CAS algorithm satisfies a stronger property than NRL, allowing the recovery to be called even if a fault occurs after the operation has terminated. This property is very important for use in capsules, because after a fault, all operations of the capsule must be recovered, rather than just the most recent one. In this way, we achieve our persistent simulations by encapsulating code and replacing atomic instructions with their recoverable counterparts. This leads to a tradeoff between computation and recovery delay when it comes to capsule size; each capsule introduces some overhead in normal execution when persisting the next checkpoint, but decreases the recovery delay, since we only have to repeat at most one capsule when recovering from a fault. In a nutshell, our first result, the Constant-Delay Simulator, shows that we can have constant computation overhead per capsule, and that we can have constant-sized capsules, leading to both constant computation and recovery delay. In practice, however, faults are relatively rare. It is thus important to minimize the computation delay introduced by a persistent simulator, even at the cost of increased recovery time. After we present the Constant-Delay Simulator, we therefore present optimizations that can be applied to the simulator to decrease the computation delay.
The first such optimization is just as general as the Constant-Delay Simulator in that it applies to any concurrent program. The difference is that we use fewer capsules; we show where boundaries between capsules can be removed, creating capsules that are larger (not necessarily constant sized), to arrive at a smaller computation delay but a larger recovery delay. The second optimization applies to a large class of lock-free data structures called normalized data structures [274]. In this setting, we show how

to further reduce the number of capsules and thus the computation delay.

We test our simulations by applying them to the lock-free queue of Michael and Scott (MSQ) [232], and comparing their performance with two other state-of-the-art implementations: one using the transactional memory framework Romulus [92], and the other a hand-tuned detectable queue, known as the LogQueue [122]. We did not expect general constructions to match the performance of specialized implementations, but it turns out to be very close. The LogQueue outperforms our transformations on 2-6 running threads, but only by an average of 5%; our most optimized transformation even outperforms the LogQueue on 1 and 7-8 threads. In comparison, the original MSQ is usually between 1.9x and 3.57x faster than the LogQueue, showing that the inevitable cost of persistence outweighs the extra cost paid for generality by our transformations. We further show that our optimized transformations outperform Romulus for the thread counts we tried.

In summary, in this chapter we present the following contributions.
• We define persistent simulations, which consider computation and recovery delay, thereby providing a measure of how faithful a simulation is to the original program, and how fast it can recover from crashes.
• We present a constant-computation and constant-recovery delay simulation that applies to any concurrent program.
• We show optimized simulations that trade off computation delay for recovery delay, both for general programs and for normalized data structures.
• We show that our transformations are practical by comparing experimentally to state-of-the-art persistent algorithms.

5.2 Model and Preliminaries

We use the Parallel Persistent Memory (PPM) model [30, 59]. It consists of a system of n asynchronous processes p_1, . . . , p_n. Each process may access a persistent shared memory of size M with Read and CAS instructions, as well as a smaller private ephemeral memory, which can be accessed with standard RAM instructions. The return values of instructions on persistent memory are stored in ephemeral variables. A process can persist the contents of its ephemeral memory by writing them into a persistent memory location. The ephemeral memory is explicitly managed; it does not behave like a cache, in that no automatic evictions occur. Memory operations are sequentially consistent, and all memory locations can hold O(log M) bits.

Each process may fault between any two instructions. Upon a fault, the contents of a process's private ephemeral memory are lost, but the persistent memory remains unchanged. After a process faults, it restarts. To allow for a consistent restart, each process has a fixed memory location in the persistent memory referred to as its restart pointer location. This location points to a context from which to restart (i.e., a program counter and some constant number of register values). On restart this is reloaded into the registers, much like a context switch. Processes can checkpoint by updating the restart pointer. Furthermore, a process can know whether it has just faulted by calling a special fault() function that returns a boolean flag, which resets once it is called. We call programs that are run on a processor of such a machine persistent programs.
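The following C++ fragment is a minimal sketch of the restart mechanism just described, under assumed names: Context, resume_from, and checkpoint are hypothetical helpers; only the restart pointer and the fault() flag are part of the PPM model above.

struct Context {            // stored in persistent memory
  void* program_counter;    // where to resume execution
  long  registers[8];       // a constant number of saved register values
};

extern Context* restart_pointer;          // per-process restart pointer location
extern bool fault();                      // returns and resets the just-faulted flag
extern void resume_from(const Context&);  // hypothetical: reload registers and jump

void on_restart() {
  if (fault()) {
    // Ephemeral memory was lost: reload the persisted context and continue
    // from the most recent checkpoint, much like a context switch.
    resume_from(*restart_pointer);
  }
}

void checkpoint(Context* fresh_context) {
  // Persisting a new context advances the point from which the process
  // will restart after its next fault.
  restart_pointer = fresh_context;
}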

For our simulations we also consider a standard concurrent RAM (C-RAM) consisting of n asynchronous processes with local registers and a shared memory of size M. As with the PPM, we assume the C-RAM supports Read and CAS instructions on the shared memory, that the memory is sequentially consistent, and that registers and memory locations can hold O(log M) bits.

Recall from Section 1.5 that a high-level execution involves invocation and response events for the operations of each process. In this chapter, we add another type of event that executions may have: the fault event. Fault events are process-specific. On a fault event C_i, the process p_i loses all information stored in its ephemeral variables (but shared objects remain unaffected).

5.2.1 Capsules

Our goal is to create concurrent algorithms that can recover their execution after a fault. The main idea in achieving this is to periodically persist checkpoints, which record the current state of the execution, and from which we can continue our execution after a fault. We call the code between any two consecutive checkpoints a capsule, and the checkpoint itself a capsule boundary. At a boundary, we persist enough information to continue the execution from this point when we recover from a fault. This approach was used by Blelloch et al. in [59] and is similar to approaches by others [90, 101, 218, 222]. An encapsulation of a program is the placement of such boundaries in its code to partition it into capsules.

Capsule Correctness. When executing recoverable code that is encapsulated, it is possible for some instructions to be repeated. This happens if the program crashes in the middle of a capsule, or even at the very end of it before persisting the new boundary, and restarts at the previous capsule boundary. To reason about the correctness of encapsulated programs after a crash, we define what it means for a capsule to be correct in a concurrent setting, intuitively meaning that it is idempotent, i.e., that it can be repeated safely.

Definition 5.2.1. An instruction I in a low-level execution E is said to be invisible if E remains legal even when I is removed.

Definition 5.2.2. A capsule C inside algorithm A is correct if:
1. Its execution does not depend on the local values of the executing thread prior to the beginning of the capsule, and
2. For any high-level execution E of A in which C is restarted from its beginning at any point during C's interval an arbitrary number of times, there exists a set of invisible instructions performed in E as part of C such that when they are removed, C only executes once in the corresponding low-level execution.

Definition 5.2.3. A program is correctly encapsulated if all of its capsules are correct.
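As a toy illustration of these definitions (with variable names of our choosing, and assuming x lives in persistent shared memory), consider the two single-capsule C++ fragments below: the first is idempotent, the second is not.

extern long x;    // persistent shared variable

// Correct capsule: a blind write. If the capsule restarts after a fault,
// the repeated write stores the same value, so the extra executions are
// invisible in the sense of Definition 5.2.1.
void capsule_ok() {
  x = 42;
}   // capsule boundary would be persisted here

// Incorrect capsule: the write depends on a value read earlier in the same
// capsule. If the write is persisted and the capsule then restarts, the
// re-read observes the new value and x ends up incremented twice.
void capsule_bad() {
  long tmp = x;   // read
  x = tmp + 1;    // write-after-read conflict within one capsule
}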

5.3 k-Delay Simulations

We are interested in efficiently simulating arbitrary computations on a reliable concurrent RAM (C-RAM) on a faulty PPM machine. Due to factors like contention and disjoint access parallelism, which are difficult to analyze theoretically, it is desirable to be able to preserve the structure of tried and tested efficient concurrent algorithms in their persistent counterparts. We

formalize this notion of 'preserving structure' with the definitions of computation delay, recovery delay and contention delay. Roughly, the first refers to the number of instructions on the PPM required to simulate each instruction on the C-RAM when there is no fault, the second refers to the number of instructions needed to recover from each fault, and the third takes into account the delay caused by concurrent accesses to the same object.

Before talking about delays, we begin with the definition of a linked simulation, which provides a mapping from the instructions of the simulated code to those of the simulation itself, thereby preserving the original structure. We allow for arbitrary computations on the machine that is being simulated, as long as the machine is sequentially consistent.

Definition 5.3.1 (Linked Simulation). Consider a concurrent source machine S with n processes for which each process executes a sequence of atomic operations, and a concurrent target machine T, also with n processes. A linked simulation of S on T is a simulation that allows for any computation on S, preserves its behavior, and for each process p the execution of p on T can always be partitioned, contiguously, so that:
1. there is a one-to-one order-maintaining correspondence between these partitions and operations of p in some execution on S, and
2. each partition atomically simulates the corresponding source operation, linearized at some point between the partition's first and last operation.
We refer to each partition on each process as a step of the simulation.

Note that our definition includes all local operations as well as any access to shared objects. The simulated operations could include linearizable operations on shared objects, or could just be machine instructions. Our first simulation operates at the granularity of machine instructions.

We say a step of a linked simulation is fault free if no fault happened on the step's process on T during that step. For a sequence of instructions s on a single process of either the source or target machine, let t(s) be the number of instructions in s, i.e., the time.

Definition 5.3.2 (Computation Delay). A linked simulation has k computation delay if for each fault-free step s of the target machine simulating an operation o of the source machine, t(s) = O(t(o) + k).

Ben-David et al. use a similar concept of delay for measuring the efficiency of transactions [48]. Persistent computations are tightly coupled with their recovery mechanisms. When discussing a computation for a persistent setting, it is important to also discuss how it recovers from faults. Note that, if all processes fault together during a system fault, simply running a concurrent program as is in a persistent setting yields a trivial 1-computation-delay simulation of itself; all steps of the program remain exactly the same. However, upon a fault, the entire computation has to be restarted, and all progress is completely lost. Thus, the recovery time of this 'simulation' is unbounded; it grows with the length of the execution. We therefore also formalize the notion of a recovery delay: how long it takes for a persistent program on any process to recover from a fault (processes can fault independently).

Definition 5.3.3 (Recovery Delay). A linked simulation of a source machine S on a faulty target machine T has k recovery delay if each fault that occurs within a step of the simulation on T adds at most k instructions to the step.
Note that the k for computation delay and recovery delay need not be the same. Furthermore, while our simulations guarantee a constant k, in general k could be a function of other parameters of

the machine or computation (e.g., input size or number of processes).

A stronger notion of a k computation delay simulation is one in which the amount of contention experienced by an algorithm cannot grow by more than a factor of k either. Accounting for contention helps to capture the structure of an algorithm, since scalability is highly associated with keeping contention as low as possible on all accesses. To be able to discuss contention formally, we follow the definition of contention presented by Dwork et al. in [110]; the amount of contention experienced by an operation op on object O is the number of responses to operations on O received between the invocation and response of op.¹ We note that this measure of contention can also be seen as a special case of the model introduced by Ben-David and Blelloch [43] and discussed in Chapter 2 of this thesis, in which invocations correspond to an operation being made active, and conflicts are defined between all operations on the same location. We now extend the definition of computation delay to account for contention.

Definition 5.3.4 (Contention Delay). A linked simulation has k contention delay if the following condition holds: for any operation op during the simulated computation on the source S, if op's contention is C, then the corresponding step in the simulation on the target T has contention at most k × C.

In this chapter, we consider persistent simulations that have small computation, recovery and contention delay. We say that an algorithm is X-delay free if it has c X delay for a constant c, where X is one of computation, contention, or recovery.

5.4 Building Blocks

5.4.1 Capsule Implementation

We now briefly discuss how to implement the capsule boundaries themselves, and in particular, how we handle persisting the ephemeral variables between capsules. We keep a copy of each ephemeral variable in persistent memory. At a high level, we update the persistent copy of each ephemeral variable with its current value at the end of each capsule, and read these values into ephemeral memory after a fault. However, we must be careful to avoid inconsistencies that could arise if a fault occurs after updating some, but not all, of the persistent copies. Such inconsistencies could occur if ephemeral variables suffer write-after-read conflicts in a capsule [59]. In a nutshell, these conflicts occur if a variable can be read and then written to in the same capsule. If the write gets persisted and then a fault occurs, causing the code to repeat itself from the read instruction, then the read sees a different value than it did originally. This could mean that the next repetition of the capsule may behave differently from the first, and could violate idempotence. These conflicts can be avoided if it can be guaranteed that after a new value is persisted, the program will never repeat an earlier instruction that reads it. Note that in general, it is not a problem if an ephemeral variable suffers a write-after-read conflict, since its writes are to ephemeral memory and thus are not persistent. However, since we persist their new values at the capsule boundary, we must be careful not to overwrite their previous values if we might need to use them again.

¹ For atomic memory operations, which don't have invocations and responses, we can split them into two instructions, one for the invocation and one for the response.

Thus, for ephemeral variables that are read and subsequently written to in this capsule, we do not write out their new values into their persistent copies right away. Instead, we keep two persistent write buffers, along with a persistent bit indicating which one of them is currently valid. At the boundary at the end of the capsule, the invalid write buffer is cleared, and for each ephemeral variable v that has a write-after-read conflict, a tuple of the form (v, newVal) is written into it, where newVal is the value of v at the end of the capsule. After writing out all of these tuples to the write buffer, the validity bit for the write buffers is flipped. At the beginning of each capsule, before executing any of its code, the valid write buffer is read, and the persistent copy of each ephemeral variable that appears in it is updated with this ephemeral variable's value in the buffer.

Note that persisting the values of ephemeral variables that are either not updated at all, or updated before being read in a given capsule, is not difficult. We don't have to do anything at the capsule boundary for an ephemeral variable that was not updated in this capsule. For ephemeral variables that were updated before being read in this capsule, we can simply copy their new value into their persistent copy, without going through the write buffer. This is because the current capsule's execution does not depend on the value of such an ephemeral variable. This copying is done before the validity bit of the write buffer is flipped.
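A minimal C++ sketch of this boundary mechanism is given below, under assumed names (Buffer, valid_bit, persistent_copy); it is illustrative only and elides the explicit persist/flush calls.

#include <cstddef>
#include <utility>
#include <vector>

struct Buffer {                                 // persistent
  std::vector<std::pair<int, long>> entries;    // (variable id, new value) tuples
};

extern Buffer buffers[2];        // the two persistent write buffers
extern int    valid_bit;         // persistent bit: which buffer is valid
extern long   persistent_copy[]; // persistent copies of the ephemeral variables

// End of a capsule: stage the conflicting variables in the invalid buffer,
// then flip the validity bit as the single commit point of the boundary.
void capsule_boundary(const std::vector<std::pair<int, long>>& conflicting) {
  Buffer& staging = buffers[1 - valid_bit];
  staging.entries.clear();         // clear the invalid buffer
  staging.entries = conflicting;   // write (and persist) the tuples
  valid_bit = 1 - valid_bit;       // commit: the staged buffer becomes valid
}

// Start of a capsule (including after a fault): apply the valid buffer to
// the persistent copies before executing any of the capsule's code.
void capsule_start() {
  for (const auto& [var, val] : buffers[valid_bit].entries)
    persistent_copy[var] = val;
}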

5.4.2 Recoverable Primitives

Recall that the return values of instructions on persistent memory are stored automatically in ephemeral variables. This represents registers in real machines, which store return values of memory accesses, but which remain volatile despite new NVRAM technology. This means that such return values may be lost upon a fault. For example, consider a CAS operation that is applied to a shared memory location. It must atomically read the location, change it if necessary, and return whether or not it succeeded. If a fault occurs immediately after a CAS is executed, the return value could be lost before the process can view it. When the process recovers from the fault, it has no way of knowing whether or not it has already executed its CAS. This is a dangerous situation; repeating a CAS that was already executed, or skipping it altogether, can render a concurrent program incorrect. In fact, any primitive that changes the memory suffers from the same problem.

This issue was pointed out by Attiya et al. in [30]. To address the problem, they present several recoverable primitives, among them a recoverable CAS algorithm. This algorithm is an implementation of a CAS object with the addition that the read and CAS operations on it also have read.recover and CAS.recover parts to them, which must be invoked immediately after a fault, and may not be invoked at any other time. The idea of the algorithm is that when CASing in a new value, a process writes in not only the desired value, but also its own ID. Before changing the value of the object, a process must notify the process whose ID is written on the object of the success of its CAS operation. The recovery of the CAS operation checks this notification to see whether its last CAS has been successfully executed. Attiya et al. show that their recoverable CAS algorithm satisfies nesting-safe recoverable linearizability (NRL), intuitively meaning that as long as recovery operations are always run immediately after faults, the high-level execution is linearizable. Attiya et al.'s algorithm uses classic CAS as a base object, and assumes that CAS operations are ABA-free. This is easy to ensure by using timestamps.

It turns out Attiya et al.'s recoverable CAS algorithm satisfies strict linearizability [8], a stronger correctness property than NRL. The main difference between the two properties is that while NRL only allows recovering operations that were pending when the fault happened, strict linearizability is more flexible. This means that we can define the recovery to work even on operations that have already completed at the time of the fault. This property is very important for use in the transformations provided in the rest of the chapter, in which we may not know exactly where in the execution we were when a fault occurred. To satisfy strict linearizability, we need to tweak the recoverable CAS algorithm to include the use of sequence numbers on each CAS. In contrast to Attiya et al., we treat the recovery as another operation of the recoverable CAS object, whose sequential specification is as follows. Each Recover(i) operation R returns a sequence number seq and a flag f with the following properties:
• If f = 1, then seq is the sequence number of the last successful CAS operation with process id i.
• If f = 0, all successful CAS operations before R with process id i have sequence number less than seq.

We also further modify the recoverable CAS algorithm to create a version that has constant recovery time (instead of O(P)), and uses less space (O(P) instead of O(P²)). The modification is simple; just like Attiya et al. [30], we have processes CAS in their id along with their desired value, and we rely on a notification mechanism for recovering return values. However, we also include a sequence number, which is CASed in along with the process id and value. This allows us to simplify the notification mechanism; instead of the 2-D array of notification slots (one slot per pair of processes) employed by Attiya et al.'s algorithm, we reduce the notification slots to a single slot per process, which is where others notify that process of the result of its own CAS attempts. Process p_i writes the sequence number of its next CAS operation into its slot before executing its CAS, and others only notify it if the sequence number they observed is the same as the sequence number currently written in its slot. Algorithm 5.1 shows the pseudocode for this algorithm. Note that it implements a recoverable CAS object using O(1) steps for all three operations. We use capitalized operation names for the operations being implemented, and lower-case names for machine instructions used on the base objects.

Correctness of our Recoverable CAS. We now prove that Algorithm 5.1 is correct by showing it satisfies strict linearizability [8]; that is, that all operations are linearizable, and are either linearized before a fault event, or not at all. This condition lets us safely repeat an operation if the recovery says that it didn't happen. We first prove the following key property about the Notify method.

Lemma 5.4.1. Let N be an instance of a Notify method that reads x = ⟨∗, seq, i⟩. Then after N's execution, A[i] = ⟨seq, 1⟩ or A[i] = ⟨seq′, ∗⟩, where seq′ > seq, and ∗ is used as a placeholder for any value.

Proof. We first show that at the first step of N, the sequence number in A[i] is at least seq. This is because for x to contain ⟨∗, seq, i⟩, process p_i must have executed a CAS with sequence number seq. By the code, p_i first had to update A[i] to contain ⟨seq, 0⟩ before CASing this value into x.

Algorithm 5.1: Recoverable CAS algorithm

 1  class RCas {
 2    ⟨Value, int, int⟩ x;      // shared persistent
 3    ⟨int, bool⟩ A[P];         // shared persistent

 5    Value Read() {
 6      ⟨v, *, *⟩ = x;
 7      return v; }

 9    ⟨Value, int, int⟩ Notify() {
10      ⟨v, pid, seq⟩ = read(x);
11      CAS(A[pid], ⟨seq, 0⟩, ⟨seq, 1⟩);
12      return ⟨v, pid, seq⟩; }

14    bool Cas(Value a, Value b, int seq, int i) {
15      ⟨v, pid, seq'⟩ = Notify();
16      if (v != a) return false;
17      A[i] = ⟨seq, 0⟩;        // announce
18      return CAS(x, ⟨a, pid, seq'⟩, ⟨b, i, seq⟩); }

20    ⟨int, bool⟩ Recover(int i) {
21      Notify();
22      return A[i]; } }


Note that A[i] can only change in two ways: (1) process p_i can change A[i] to a higher sequence number; in this case, the lemma holds. (2) Some process p_j can change A[i] by notifying p_i. Note that any Notify attempt that reads a sequence number seq′ ≠ seq would fail to change A[i]. In particular, this means that notifications cannot cause the sequence number in A[i] to become smaller. So, the only way A[i] can change through a notification (without p_i also changing it in a CAS operation) is to contain ⟨seq, 1⟩. Thus, if some change occurs on A[i] during N's execution, the lemma holds. Assume by contradiction that no change occurs during N's execution. In this case, N's CAS contains the correct old value, and therefore must succeed. This contradicts the assumption that no change occurs (i.e., N itself makes the change).

Now we are ready to prove that Algorithm 5.1 is strictly linearizable. Its linearization points are also given by the following lemma.

Lemma 5.4.2. Algorithm 5.1 is a strictly linearizable implementation of a recoverable CAS object with the following linearization points:
• Each CAS operation that sees v ≠ a on line 16 is linearized when it performs line 10. Otherwise, it is linearized when it performs line 18.
• Each Read operation is linearized when it performs line 6.
• Each Recover operation is linearized when it returns.

Proof. We first note that all linearization points defined in the lemma statement can only be executed by the process that invoked the operation. Therefore, if that process stalls, its operation will not be linearized. This property reduces strict linearizability to proving linearizability. To show that CAS and Read operations linearize correctly, we can ignore operations on A because they do not affect the return values of these operations. At every configuration C, the variable x stores the value written by the last successful CAS operation linearized before C. This is because the value of x can only be changed by the linearization point of a CAS operation. So x always stores the current value of the persistent CAS object. Each Read operation R is correct because R reads x at its linearization point and returns the value that was read.

Each CAS operation C is either linearized on line 10 or line 18. If C is linearized on line 10, then it behaves correctly because x does not contain the expected value at the linearization point of C. Suppose a is the value C expects and b is the value it wants to write. If C is linearized on line 18, then we know that x = ⟨a, j, s′⟩ at line 10 of C for some process id j and sequence number s′. If C is successful, then x contained the expected value at the linearization point of C, which matches the sequential specification. Otherwise, we know that x changed between lines 10 and 18 of C. Since this recoverable CAS object can only be used in an ABA-free manner, we know that x does not store the value a at the linearization point of C, so C is correct to return false and leave the value in x unchanged.

In the remainder of this proof, we argue that the Recover operation is correct. Let seq(C) represent the sequence number of a CAS operation C. Let R be a call to Recover by process p_i and let C be the last successful CAS operation by process p_i linearized before the end of R. We just need to show that R either returns ⟨seq(C), 1⟩ or it returns ⟨s′, 0⟩ for some sequence number s′ > seq(C). First note that the sequence number in A[i] is always increasing, as argued in the proof of Lemma 5.4.1. Thus, the sequence number returned by R is at least seq(C), since A[i] is set to ⟨seq(C), 0⟩ on the line before the linearization point of C. First we show that R cannot return ⟨s′, 1⟩ for any s′ > seq(C), and then we show that R cannot return ⟨seq(C), 0⟩.

Now we show that R cannot return ⟨s′, 1⟩ for any s′ > seq(C). R returns the value of A[i], so suppose for contradiction that A[i] = ⟨s′, 1⟩ at some configuration before the end of R. Then there must have been a Notify method that set A[i] to this value. This Notify method must have seen x = ⟨∗, s′, i⟩ when it performed its first step. This means that a successful CAS by p_i with sequence number s′ > seq(C) has been linearized, which contradicts our choice of C.

From Lemma 5.4.1, we can complete the proof by showing that there is a Notify method that reads x = ⟨∗, seq(C), i⟩ and completes before R reads A[i]. To show this we just need to consider two cases: either x is changed between the linearization point of C and line 10 of R, or it is not. In the second case, the Notify call performed by R reads x = ⟨∗, seq(C), i⟩. In the first case, some other successful CAS operation must have been linearized after the linearization point of C and before R reads A[i]. Let C′ be the first such CAS operation. Then we know that x = ⟨∗, seq(C), i⟩ between the linearization points of C and C′. Furthermore, in order for C′ to have been successful, the read on line 10 must have occurred after the linearization point of C (otherwise C′ would not see the most recent process id and sequence number). Therefore the Notify operation on lines 10 and 11 of C′ sees that x = ⟨∗, seq(C), i⟩, and this Notify completes before R reads A[i], as required.

Lemma 5.4.2, plus the fact that each operation only performs a constant number of steps, immediately leads to the following theorem.

Theorem 5.4.3. Algorithm 5.1 is a strictly linearizable, contention-delay-free and recovery-delay-free implementation of a recoverable CAS object.

5.5 Persisting Concurrent Programs

One way to ensure that a program is tolerant to faults is to place a capsule boundary between every two instructions. We call these Single-Instruction capsules. Can this guarantee a correctly encapsulated program? Even with single-instruction capsules, maintaining the correctness of the program despite faults and restarts is not trivial. In particular, a fault could occur after an instruction has been executed, but before we had the chance to persist the new program counter at the boundary. This would cause the program to repeat this instruction upon recovery.

Trivially, if the single instruction I in a capsule C does not modify persistent memory, then I is invisible, and thus C is correct. But what if I does modify persistent memory? A private persistent write is invisible because the process simply overwrites the effect of its previous operation, and no other process could have changed it in between. So, we only have CASes left to handle. This is where we employ the recoverable CAS operation. We replace every CAS object in the program with a recoverable CAS. We show that it is safe to repeat a recoverable CAS if we wrap it with a mechanism that only repeats it if the recovery operation indicates it has not been executed; any repeated CAS will become invisible to the higher-level program. When recovering from a fault, we simply call a checkRecovery function, which takes in a sequence number and calls the Recover operation of the recoverable CAS object. The checkRecovery function returns whether or not the CAS referenced by the sequence number was successful. If it was, then we do not repeat it, and instead continue on to the capsule boundary. Otherwise, the CAS is safe to repeat. Pseudocode for the checkRecovery function is given in Algorithm 5.2. With this mechanism to replace CAS operations, single-instruction capsules are correct. The formal proof of correctness is implied by the proof of Theorem 5.6.1, which we show later.

Algorithm 5.2: Check Recoverable CAS

bool check_recovery(RCas X, int seq, int pid) {
  ⟨last, flag⟩ = X.Recover(pid);
  if (last >= seq && flag == true) return true;
  else return false; }
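The wrapper below sketches how a single-instruction CAS capsule might use check_recovery on a restart. RCas and check_recovery come from Algorithms 5.1 and 5.2, and fault() is the model primitive from Section 5.2; the capsule boundary is left as a comment. This is an illustrative sketch, not the exact code used in the simulator.

void cas_capsule(RCas X, Value a, Value b, int seq, int pid) {
  // Only skip the CAS if we have just faulted and the recovery reports that
  // the CAS with this sequence number already succeeded; otherwise (no fault,
  // or an unexecuted/failed CAS) it is safe to (re)execute it.
  if (!(fault() && check_recovery(X, seq, pid)))
    X.Cas(a, b, seq, pid);
  // ... capsule boundary: persist the ephemeral state and the next program counter ...
}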

We now show that this transformation, applied to any concurrent program C, is a contention-delay-free simulation of C.

Theorem 5.5.1. For any concurrent algorithm A written in the C-RAM model, if As is the program resulting from encapsulating A using single-instruction capsules, then As is a c-contention-delay, c′-recovery-delay simulation of A, where c and c′ are constants.

To prove the theorem, we first show a useful general lemma that relates the way a simulated object is implemented to the contention-delay of a simulation algorithm.

Lemma 5.5.2. Let As be a k-computation-delay simulation of A. If for every two base objects O1 and O2, the set of primitive objects used to implement O1 is disjoint from the set used to implement O2 in As, then As is a k-contention-delay simulation of A.

Proof. Consider an execution E of As in which an operation op by process p on object O1 experiences k · C contention. Since O1 is implemented with primitive objects that are not shared with any other object, all contention experienced by op must be from other operations that are accessing O1. Note that each step by another process accessing O1 can cause at most one contention point for op. Thus, there must be at least k · C steps by other processes on the primitive objects op is accessing within op's interval. Since As is a k-computation-delay simulation of A, and O1's primitive objects are not shared with any other base object, there must be at least C other accesses of O1 that are concurrent with op. So, E must map to an execution E′ of A in which all C of these accesses to O1 happen before op's corresponding access, but after the last operation of p. Therefore, in E′, op experiences at least C contention.

Proof of Theorem 5.5.1. Since each recoverable CAS and each capsule can be used to recover in constant time, it is easy to see that As has constant recovery delay. We implement each base object O of A by calling the operations of O, followed by a capsule boundary. For CAS, we implement it by replacing the CAS object with a recoverable CAS object and also calling a capsule boundary. Because both the recoverable CAS algorithm and the capsule boundary take a constant number of instructions, our transformation is a k-computation-delay simulation of A for a constant k. Furthermore, each recoverable CAS object uses primitive objects that are unique to it, and not shared with any other object. Note that while the capsule boundary does use primitive objects that are shared among other capsule boundaries, the capsule boundaries are in fact local operations, since each process uses its own space for persisting the necessary data. So capsule boundaries do not introduce any contention. The rest of the proof therefore follows from Lemma 5.5.2.

5.6 Optimizing the Simulation

Although capsule boundaries only consist of a constant number of uncontended instructions, they can still be expensive in practice, as they require persisting several pieces of data and use up to two fence instructions. We now discuss how to reduce the number of required capsule boundaries in a program, while still maintaining correctness. Note, however, that fewer capsule boundaries mean more instructions per capsule, so more progress could be lost due to a fault.

5.6.1 CAS-Read Capsules

We begin by showing that, at a high level, as long as there is only one CAS operation per capsule, and this operation is the first of the capsule, the program remains correctly encapsulated. We note that Blelloch et al. [59] comprehensively showed how to place capsule boundaries in non-racy persistent code to ensure idempotence. Their guideline is to create capsules that avoid write-after-read conflicts (see Section 5.4.1). Therefore, in addition to the capsule boundaries dictated by instructions on shared memory as outlined above, we also place a capsule boundary

between a read of a persistent private location in memory and the following write to that location. However, note that we don't always add this extra boundary; if a capsule begins with a persistent write of a private variable v, any number of reads and writes to v may be executed in the same capsule, since there is no write-after-read conflict.

Another potentially dangerous situation arises when we have a branch that depends on a shared memory read and the two paths after the branch write to different persistent memory locations. If this is all placed within a single capsule, then faulting could cause an incorrect execution where both persistent memory locations are written to. For example, there could be a fault immediately following the write to one location, and after recovering, the process could see a different shared value and decide to follow the other branch, leading to a write on the other persistent memory location.

Intuitively, all read operations are always invisible, as long as their results are not used in a persistent manner. So, a capsule that has at most one recoverable CAS operation, followed by any number of shared reads, is correct. We call this construction a CAS-Read capsule. We also allow for capsules that do not modify any shared variables at all; we call such capsules Read-Only capsules.

Note that we assume that every process has a sequence number that it keeps locally and increments once per capsule. At the capsule boundary, the incremented value of the sequence number is persisted (along with other ephemeral values, for example the arguments for the next recoverable CAS operation). Therefore, all repetitions of a capsule always use the same sequence number, but different capsules have different sequence numbers to use.

We now describe in more detail how to use the recovery function of the recoverable CAS object. We assume there is a fault() function that indicates whether the process is currently recovering from a fault. This assumption is realistic, since in most real systems, there is a way for processes to know that they are now recovering from a fault. We use the fault() function to optimize some reads of persistent memory: if we are recovering from a fault, we read in all ephemeral values we need for this capsule from the place where the previous capsule persisted them. Otherwise, there is no need to do so, since they are still in our ephemeral memory. We show pseudocode for the CAS-Read capsule in Algorithm 5.3. Read-Only capsules are a subset of the code for CAS-Read capsules.

We now show that the CAS-Read capsule is correct. This fact trivially implies that Read-Only capsules are correct as well, so we do not prove their correctness separately. We wrap up this section by showing that a transformation that applies CAS-Read and Read-Only capsules is a c-contention-delay simulation for constant c. Intuitively, removing capsule boundaries can only improve the contention delay. Note that the definition of correctness is with respect to an algorithm that contains the capsule. Here, we prove the claim in full generality; we show that this capsule is correct in any algorithm that could use it. For this, we argue that its repeated operations are invisible in any execution, despite possible concurrent operations. Note that this implies correctness for any context in which the capsule might be used. We also require that each process increments the sequence number before calling CAS.

Theorem 5.6.1. If C is a CAS-Read capsule, then C is a correct capsule.

Proof. Consider an execution of C in which the capsule was restarted k times due to faults.

Algorithm 5.3: CAS-Read Capsule

    if (fault()) {
        *Read all vars persisted by the previous capsule into ephemeral memory.*
        seq = seq + 1;
        flag = check_recovery(X, seq, pid);
        if (!flag) {   // operation 'seq' wasn't done
            c = X.Cas(exp, new, seq, pid);
        } else {
            c = 1;
        }
    } else {
        seq = seq + 1;
        // exp and new are from the previous capsule
        c = X.Cas(exp, new, seq, pid);
    }
    *Any number of persistent memory reads and ephemeral memory operations.*
    *Writes to private persistent memory are allowed under certain conditions.*
    capsule_boundary()

First note that if a fault occurred, then the capsule never uses any of the local variables before overwriting them. Therefore, its execution does not depend on local values from previous capsules. This includes the sequence number for the capsule, which must have been written in persistent memory before the capsule started, and is therefore the same in all repetitions of the capsule. Note that in all but the first (partial) run of the capsule, the fault() function must return true. Furthermore, note that the code only repeats X.Cas() if checkRecovery returns false. Due to the correctness of the recovery protocol, this happens only if each earlier operation with this capsule's sequence number has not been executed in a visible way. This means they are either linearized and invisible, or they have not been linearized at all. The partial executions that have not been linearized cannot become linearized at any later configuration, because X is strictly linearizable. Therefore, the X.Cas() call is only ever repeated if the previous calls that this capsule made to it were invisible, so all but the last instance of the X.Cas() operations executed are invisible. Furthermore, since X.Cas() is a strictly linearizable implementation of CAS, the effect of all instances together is that of a single CAS. Note that the rest of the capsule is composed of only invisible operations; the recovery of an object is always invisible, as are Reads and local computations.

Theorem 5.6.2. A program that uses only CAS-Read, Read-Only, and Single-Instruction capsules is correctly encapsulated, and is a contention-delay-free simulation of its underlying program.

Proof. Since CAS-Read, Read-Only, and Single-Instruction capsules are all correct (corollary of Theorem 5.6.1), by definition, a program that uses only these capsules is correctly encapsulated. Furthermore, since by Theorem 5.5.1, a program encapsulated with single-instruction capsules only is a constant-contention-delay simulation of its underlying program, and CAS-Read and

Read-Only capsules use strictly fewer instructions, programs encapsulated with these capsules are also contention-delay-free simulations.

5.6.2 Normalized Data Structures

Timnat and Petrank [274] defined normalized data structures. The idea is that the definition captures a large class of lock-free algorithms that all have a similar structure. This structure allows us to reason about this class of algorithms as a whole. In this section, we briefly recap the definition of normalized data structures, and show optimizations that allow converting normalized data structures into persistent ones, with fewer persistent writes than even our general Low-Computation-Delay Simulator would require. We will show two optimizations; one that works for any normalized data structure, and one that is more efficient, but requires a few more (not-too-restricting) assumptions about the algorithm.

Normalized lock-free algorithms use only CAS and Read as their synchronization mechanisms. At a high level, every operation of a normalized algorithm can be split into three parts. The first part, called the CAS Generator, takes in the input of the operation, and produces a list of CASs that must be executed to make the operation take effect. The second part, called the CAS Executor, takes in the list of CASs from the generator, and executes them in order, until the first failure, or until it reaches the end of the list. Finally, the Wrap-Up examines which CASs were successfully executed by the executor, and determines the return value of the operation, or indicates that the operation should be restarted. Interestingly, the Generator and Wrap-Up methods must be parallelizable, intuitively meaning that they do not depend on a thread's local values, and can be executed many times without having lasting effects on the semantics of the data structure. A sketch of this three-part structure is given below.
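For concreteness, the following sketch shows the generic shape of a normalized operation as just described. The phase names mirror the pseudocode used later in Algorithm 5.4, but the skeleton itself, including the CasDesc descriptor and the Alg template parameter, is only an illustration and not code from Timnat and Petrank [274].

    #include <atomic>
    #include <utility>
    #include <vector>

    // Illustrative only: a CAS descriptor and the three phases of a normalized operation.
    struct CasDesc {
        std::atomic<long>* obj;   // target object
        long exp, desired;        // expected and new values
    };

    template <typename Input, typename Output, typename Alg>
    Output normalized_operation(Alg& alg, const Input& input) {
        while (true) {
            // 1. CAS Generator (parallelizable): produce the CASs that make the
            //    operation take effect.
            std::vector<CasDesc> cas_list = alg.cas_generator(input);

            // 2. CAS Executor: perform the CASs in order, stopping at the first failure.
            size_t idx = 0;
            for (; idx < cas_list.size(); idx++) {
                long e = cas_list[idx].exp;
                if (!cas_list[idx].obj->compare_exchange_strong(e, cas_list[idx].desired))
                    break;
            }

            // 3. Wrap-Up (parallelizable): decide the return value, or ask for a retry.
            auto [output, repeat] = alg.wrap_up(input, cas_list, idx);  // pair<Output, bool>
            if (!repeat) return output;
        }
    }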

Optimizations. Our Low-Computation-Delay Simulator works for all concurrent algorithms that access shared objects using only read, write and CAS. In particular, it works for normalized data structures. However, we can exploit the additional structure of normalized algorithms to optimize the simulation. Note that placing capsule boundaries around a parallelizable method yields a correct capsule. This is implied by the ability of parallelizable methods to be repeated without affecting the execution, which is exactly the condition required for capsule correctness. The formal definition of parallelizable methods is slightly different, but a proof that this definition implies capsule correctness appears in [87]. Thus, there is no need to separate the code in parallelizable methods into several capsules. Furthermore, there is also no need to use recoverable CAS for some of the CAS operations performed by parallelizable methods; for normalized data structures, we can simply surround the CAS generator and the Wrap-Up methods in a capsule, and do not need to alter them in any other way.

All that remains now is to discuss the CAS executor, which simply takes in a list of CASs to do, and executes them one by one. No other operations are done in between them. Note that we can convert CAS operations to use the recoverable CAS algorithm, and then many consecutive CASs could be executed in the same capsule, as long as they access different objects. In the case of normalized data structures, however, we do not have the guarantee that the CASs all

access different shared objects. Therefore, we cannot just plug in that capsule construction as is. However, we note another quality of the CAS executor that we can use to our advantage: the executor stops after the first CAS in its list that fails. Translated to the language of persistent algorithms, this means that we do not actually need to remember the return values of each CAS in the list separately; we only need to know the index of the last successful CAS in the list. Fortunately, the recovery operation of the recoverable CAS algorithm gives us exactly that; it provides the sequence number of the last CAS operation that succeeded. Therefore, as long as we increment the sequence number by exactly 1 between each CAS call in the executor, then after a crash, we can use the recovery function to know exactly where we left off. We can then continue execution from the next CAS in the list. Note that if the next CAS in the list actually was executed to completion but failed before the crash, there is no harm in repeating it. We simply execute it again, see that it failed, and skip to the end of the executor method.

For a recoverable CAS to work correctly, all CASs to that object must use the recoverable CAS algorithm. Whenever a generator or wrap-up method performs a CAS on an object that could also be modified by a CAS executor, it must use recoverable CAS instead of classic CAS. This is because even though the generator or wrap-up method never needs to recover a CAS's results, it is still important to notify other processes of the success or failure of their last CAS.

We now discuss a method that allows removing the capsule boundary between the executor and the wrap-up. We argue that as long as we can recover the arguments and results of each executor CAS, it is safe to restart the execution from the beginning of the executor. Suppose a combined executor-plus-wrap-up section faults and repeats multiple times. We first argue that, as long as the wrap-up part cannot overwrite the notification of any CAS in the cas-list, then in every repetition, the executor returns exactly the same index in the cas-list. Recall that we assume CAS operations are ABA-free in the original program (i.e., the object cannot take on a previous value) and that each process calls CAS using a value it previously read as the expected value. This means that if a CAS operation fails the first time, then the same CAS operation will also fail the second time. Furthermore, since we do not overwrite the result of the executor CASs outside the executor, the executor can always use the recovery operation to determine which CAS in the list was the last to succeed. Therefore the index returned by the CAS executor will be the same across all repetitions. This means that the executor-plus-wrap-up capsule behaves as if there were a capsule boundary between the two methods. Since the wrap-up method is parallelizable, we know this capsule is correct.

To remove the capsule immediately after the executor, we need to ensure that the wrap-up does not corrupt the ability of the recoverable CAS to tell whether the most recent executor CAS on each object succeeded. If the wrap-up does not access any CAS location accessed by the executor, this property is guaranteed. However, if there is a CAS in the wrap-up that accesses the same location as some CAS in the executor, we can still ensure that we can recover. Let Cw be such a CAS in the wrap-up executed by process p.
Note that Cw never needs to use the recovery function for itself; since the wrap-up is parallelizable, it is always safe to repeat Cw after a crash. Therefore, when Cw is executed using a recoverable CAS, it can leave out its own ID and sequence number, so that other processes do not notify p. Thus, the previous notification that p received (i.e. a notification about p’s executor CAS on the same object) remains intact. Notice that if we have two parallelizable methods, A and B, next to each other, we can actually put them in a single capsule as long as the inputs to A and B are the same whenever the

capsule restarts. Since A and B have the same inputs, we know by parallelizability that A and B each appear to execute once regardless of how many times the capsule restarts. Also, A must appear to finish before B because there was a completed execution of A before any invocation of B. Therefore, this capsule appears to have executed only once. So, we can avoid an additional capsule boundary between the current iteration's wrap-up method and the next iteration's generator method, as long as we now use the same notification trick in the CASs of the generator as well. So, as long as there are capsule boundaries before and after each call to a normalized operation, we only need one capsule boundary in each iteration of the main loop: just before the executor. We call this simulation the Persistent Normalized Simulator. The details of our encapsulation are shown in Algorithm 5.4. The results of this section are summarized in Theorem 5.6.3.

Theorem 5.6.3. Any normalized data structure N can be simulated in a persistent manner with constant contention-delay using one capsule boundary per repetition of the operation.

Algorithm 5.4: Persistent Normalized Simulator

    result_type NormalizedOperation(arg_type input) {
        do {
            cas_list = CAS_Generator(input);
            capsule_boundary();
            if (fault()) {
                cas_list = read(CAS_list);
                seq = read(seq);
            }
            idx = CAS_Executor(cas_list, seq);
            ⟨output, repeat⟩ = Wrap_Up(cas_list, idx);
        } while (repeat == true)
        return output;
    }

    int CAS_Executor(list CASs, int seq) {
        // CASs is a list of tuples ⟨obj, exp, new⟩
        bool faulted = fault();
        bool done = false;
        for (i = 0; i < CASs.size(); i++) {
            if (faulted) {
                done = check_recovery(CASs[i].obj, seq, p);
            }
            if (!done) {
                if (!RCAS(CASs[i])) return i;
            }
            seq++;
        }
        return CASs.size();
    }

5.7 Practical Concerns

Flushes and Fences. In this chapter, we've assumed that there is a way to persist variables on ephemeral memory by copying them over to persistent memory. On real machines, this can be done using flush and fence instructions. At a high level, a flush instruction writes out a given cache line to memory. However, most flush instructions do not block until the cache line reaches memory. Thus, even if a cache line is flushed before a crash, it is possible that the flush instruction did not complete, and the cache line's data could still be lost. To prevent this, one can execute a fence instruction, which blocks the execution until after any flushes invoked before it complete.
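As a concrete illustration of such a persist step, the sketch below uses the clflushopt and sfence instructions that are also discussed in Section 5.8. The persist helper, its fixed cache-line size, and the use of compiler intrinsics are our own assumptions rather than part of the transformation; it requires a CPU and compiler with CLFLUSHOPT support.

    #include <immintrin.h>
    #include <cstddef>
    #include <cstdint>

    // Hypothetical helper: write back the cache lines covering [addr, addr + len)
    // and order those write-backs before any later store.
    static inline void persist(const void* addr, size_t len) {
        const size_t line = 64;  // assumed cache-line size
        uintptr_t p   = reinterpret_cast<uintptr_t>(addr) & ~(line - 1);
        uintptr_t end = reinterpret_cast<uintptr_t>(addr) + len;
        for (; p < end; p += line)
            _mm_clflushopt(reinterpret_cast<void*>(p));  // asynchronous write-back
        _mm_sfence();  // fence: earlier flushes complete before subsequent stores
    }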

Implementing Capsules in Real Programs. When interpreting real program code in terms of our model, we assume that all stack-allocated local variables, including the program counter, are updated in ephemeral memory, and that heap-allocated variables are created and updated directly on persistent memory. That is, the ephemeral variables in our model map to stack-allocated variables, and we add to each stack frame the program counter at the last capsule boundary. We therefore keep a copy of the stack in persistent memory, using explicit flush and fence instructions to ensure that the variables get written to memory. We use a doubling trick similar to the one discussed in Section 5.4.1 to handle stack-allocated variables with write-after-read conflicts.

Instead of using write buffers to only duplicate the variables that have write-after-read conflicts, another possibility is to duplicate all stack-allocated variables, that is, have two persistent copies of each stack frame, and toggle between the two, copying all values over from one to the other at the end of every capsule. This technique seems wasteful, but is very robust, and works well when the number of variables is small.

Note that from the control flow graph of the program, we can know which variables suffer write-after-read conflicts in a capsule. If these variables stay the same across many capsules, we can optimize the technique a bit, by duplicating only these variables, and having only one copy for the rest of the variables. A validity bit can indicate which copy of the duplicated variables is valid, similarly to how the write buffers work. The validity bit must be flipped after all other changes are made, and flipping it represents the atomic transition between capsules. This technique allows us to use space only as necessary, as the write buffer technique does, but reduces the number of times values are copied over to different locations. If between two capsules the variables with write-after-read conflicts change, we can use the copy-all technique to remap the memory usage to have copies of the correct variables.

Note also that, generally, flushes are done at the granularity of a cache line. Thus, if all stack-allocated variables (with the necessary duplications) fit in a single cache line, we can simply ensure that they are all updated before the validity bit is flipped, and we only need a single fence to commit the capsule.

Unlike stack-allocated variables, heap-allocated variables are directly written on persistent memory. Therefore, we must make sure that repeated code does not corrupt their values by setting capsule boundaries in between instructions that may harm each other, just like we do for the shared memory instructions.

Shared vs Private Model. In the PPM model used in this work, we assume that the only way for processes to communicate is through persistent memory (i.e., all shared memory is persistent). Furthermore, the volatile memory is explicitly managed, and no automatic flushes occur. This model and similar variants have been used often in the literature [30, 59, 133]. A slightly different model has also been considered in several other works [89, 122, 123, 166].
The main difference between the two models is in how processes communicate. In this other model, sometimes referred to as the shared cache model, processes communicate through objects in volatile memory rather than persistent memory. The values in these objects are persisted when the program issues an explicit flush instruction or when a cache line is evicted automatically.

This introduces the possibility that shared memory operations are persisted out-of-order due to implicit cache evictions. Algorithms designed for the shared cache persistent model therefore have to handle this problem in addition to the problems of losing private data discussed in this chapter. The shared model is more faithful to current cache-coherent machines, while the private model helps to abstract away machine-specific flushes.

A simple transformation to convert an algorithm for the private cache variant into an algorithm that works in the shared cache variant was presented by Izraelevitz et al. in [166]. This transformation applies to programs written in the release-consistency memory model. After every load-acquire, it adds a flush and a fence. Before every store-release, it adds a fence, and after every store and store-release, it adds a flush. When transforming an algorithm from the shared cache model to a model in which cache lines may be automatically evicted, one needs to consider not only shared variables, but also local ones. Inconsistencies can occur in code that might repeat changes to persisted local variables (due to a crash). This can be handled again by avoiding write-after-read conflicts [59], as is discussed for heap-allocated variables in Section 5.6.

Compiler. Note that we treat shared variables differently from private ones; private heap-allocated variables may be written to many times in a single capsule, but this is disallowed for shared variables. It is thus important that the compiler be able to distinguish between these two types of variables. One way to achieve this is to have the user annotate shared variables. We also need annotations to allow the compiler to determine which of our constructions should be used; for normalized data structures, we assume the user can annotate the generator, executor, and wrap-up sections. We also assume that for simple cases, the compiler can determine if a variable is ever used again. Therefore a capsule boundary only needs to persist the variables that may be used in the future.

Constant Stack Frames. Recall that for the stack-allocated variables, we assume that there is only a constant number of them (around the same as the number of bits in a word) in each stack frame. This is important to be able to atomically update the validity mask of the variables in each capsule boundary, as in Section 5.2.

CAS. Also recall that the recoverable CAS algorithm requires storing not only the value, but also an ID and sequence number in each CAS location. This can be achieved by using a double-word CAS, which is common in modern machines.

Flushes on the Same Cache Line. In a capsule boundary, if all the local variables fit on the same cache line, then we only need one fence for the capsule, since the cache line gets flushed all at once in the private model. On the other hand, note that on modern machines with automatic cache evictions, writing all variables on one cache line does not guarantee atomicity, since an eviction can happen part way through updating the cache line. However, we can still assume, as is done in [88], that writes to the same cache line are flushed in the order they are written. Intuitively, this is because on real machines, the following three properties generally hold: (1) total store order (TSO) is preserved, (2) individual words are written atomically, and (3) each cache line is evicted atomically.
Memory mapping. Note that for our algorithms to recover, we assume that after a crash, each process can always find the memory in which the capsule boundary stored information. This requires persisting the page table. We assume that this is done by the operating system. We further assume that each process is assigned the same virtual address space as it was before the crash. These two assumptions together ensure that each process receives the same physical

address space before and after a crash. More details about the virtual-to-physical mapping for persistent memory are given in [74].

Figure 5.5: Throughput of persistent and concurrent queues under various thread counts. Normalized and General are the result of our transformations. a) Flushing every instruction; b) comparing our queues with manual flushes to prior work; c) comparing persistent queues to the original MS Queue.

5.8 Experiments

We measured the overhead of our general and normalized data structure transformations by applying them to the lock-free queue of Michael and Scott [232]. We also compare against Romulus [92], a persistent transactional memory framework, and LogQueue [122], a hand-tuned persistent queue. Both Romulus and LogQueue provide durability [166] and detectability [122] in the shared cache model. We use the shared cache model for our experiments because it is closer to the machine that we test on. But this means we need to somehow translate our detectable queues from the private cache model to the shared cache model. We consider two different ways of doing this translation: automatically, by Izraelevitz et al.'s durability transformation [166], or manually by hand.

We ran our experiments on Amazon's EC2 web service with an Intel(R) Xeon(R) Platinum 8124M CPU (8 physical cores, 2-way hyperthreading, 3GHz, 25MB L3 cache) and 16GB main memory. The operating system is Ubuntu 16.04.5 LTS. Just like in [74, 92, 122], we assume that the cost of flushing on this system will be similar to what we will see in real NVM systems. All functions were implemented in C++ and compiled using g++ 5.4.0 with -O3. We only measured the performance on 1-8 threads (each on a separate core), as queues are not a scalable data structure. As in previous work [122, 232], we evaluated the performance with threads that run enqueue-dequeue pairs concurrently. In all the experiments we present, the queue is initiated with 1M nodes; however, we also tested on a nearly-empty queue and verified that the same trends occur.

The flush operation consists of two instructions: clflushopt and sfence. Clflushopt has store semantics as far as memory consistency is concerned. It guarantees that previous stores will not be executed after the execution of the clflushopt. According to Intel, flushing with clflushopt is faster than executing flushes using clflush [162]. We also tried using clwb instead of clflushopt and found no difference in performance. The sfence instruction guarantees that the clflushopt instruction is globally visible before any following store instruction in program

order becomes globally visible. We omitted some fences when the ordering of the flushes is not important. All the presented results use the recoverable CAS algorithm that was proposed by Attiya et al. [30]. In our experiments, their algorithm performed slightly better than ours and thus was the one that was presented. Each of our tests was run for 5 seconds, and we report the average throughput over 10 runs. In general, queues that contain fewer flushes perform better, which is consistent with what we expected.

In our experiments, our goal is to understand the overhead general programs would observe if they were made persistent using various methods. When we run the queue experiments, we keep in mind that these queues should be used within general programs. So, before calling each of the queue operations, the general program has to execute a capsule boundary. This is true for all queues that we test, including the LogQueue and Romulus. Therefore, since this additional overhead would be the same for all queues tested, we omit it in our comparative experiments. However, we note that the LogQueue and Romulus produce stand-alone data structures that maintain more information than our queues do if the initial capsule boundary is removed. This means that in some specific contexts, for example when a few queue operations are executed consecutively, a capsule boundary can be avoided before calling LogQueue or Romulus operations, whereas our constructions still require it. It is possible to store some extra information in our queue constructions to match the properties of the other queues, but this requires some careful manipulations, which are outside the scope of this work.

Using the Izraelevitz Construction. One general way to automatically achieve correctness in the shared model is to use the construction presented by Izraelevitz et al. [166] (described in Section 5.7). Figure 5.5 a) shows the result of applying our transformations along with Izraelevitz's construction to the Michael-Scott queue (MSQ) [232]. To isolate the overhead of our transformations, we also show the performance of a Michael-Scott queue with just the Izraelevitz construction. We call this the IZ queue, and it is an upper bound on how well our transformations can perform. The result of the Low-Computation-Delay Simulator is called General. Normalized represents the normalized data structure transformation introduced in Section 5.6.2. As the General queue contains more capsule boundaries than the Normalized queue, we can see that Normalized performs 1.25x better when there are 2 running threads, and 1.17x better when there are 8 threads. Without any of the detectability transformations, the IZ queue performs better than Normalized by 1.13x and 1.26x for 2 and 8 threads respectively.

Competitors. Another way to make our transformed queues correct in the shared model is to add flushes manually. The flushes we add are very similar to those in Friedman et al.'s Durable Queue [122]. The difference is that we flush both the head and tail to allow for faster recovery, and we omit the return value array; Friedman et al. used the return value array to recover return values following a crash, but this functionality is handled by our transformations. We compared the manual flush version of our transformed queues with the Log queue [122] as well as a queue written using Romulus [92]. We chose the RomulusLR version as it performed better for every thread count. The results are depicted in Figure 5.5 b). For scalable memory allocation, our transformations use jemalloc and Romulus uses its own scalable memory allocator. We ran

the Log queue both with and without jemalloc and found that it tends to be faster without jemalloc; the faster set of numbers is reported. Our Normalized queue performs much better than Romulus when the thread count is low, which could partly be because Romulus incurs the extra overhead of implementing a persistent memory allocator and general transactional memory. Romulus scales very well and outpaces General because it uses flat combining [147], a technique where update transactions are aggregated and processed with a single lock acquisition and release. When compared to the Log queue, we found that Normalized performs better by 29% on one running thread and by up to 9% on 7-8 threads. The Log queue is better by up to 10% on 2-6 threads. We believe this is because Normalized performs fewer fences overall compared to the Log queue; however, in some places, Normalized performs more work in between a read and its corresponding CAS, and these instructions bottleneck performance at higher thread counts. With clever cache line usage, it is also possible to reduce Log queue enqueues by one flush, but we did not implement this in our experiments. Given that the Log queue was tailored to one particular data structure, we were impressed that our Normalized automatic construction gets comparable performance.

Although we did not measure this experimentally, our transformation also has lower recovery time compared to the Log queue in situations where the size of the queue is larger than the number of processes. This is because the recovery function of the Log queue traverses the entire queue starting from the head. In contrast, our recovery function just involves loading the previous capsule and performing the recovery function of a recoverable CAS object. Since we are using Attiya et al.'s implementation of recoverable CAS [30], recovery time is linear in the number of threads.

Figure 5.5 c) shows how these persistent queues compare to the original MSQ. The graph suggests that the cost our transformations pay to ensure generality and quick recovery is modest compared to the inevitable cost of persistence.

Part III

Remote Direct Memory Access


Chapter 6

Introduction and Preliminaries

So far in this thesis, we've considered hardware that exists within multicore machines, and modeled it with variants of the shared memory model. In this part of the thesis, we shift our focus to a distributed setting in which the different processes reside on physically separate machines. Usually, such settings are theoretically modeled and studied through the message passing model. The distributed computing community has a dichotomy between shared-memory and message-passing models. Books, courses, and papers explicitly separate these models to present results and algorithms. These models differ based on how processes communicate. As discussed earlier in the thesis, in the shared-memory model, processes can write and read data in a common area of memory. In the message-passing model, processes can send and receive messages to and from each other. The dichotomy in their study is the result of the different motivations for the models, with the shared memory model representing multicore machines, and the message passing model representing systems of machines connected over a network.

In recent years, a technology known as Remote Direct Memory Access (RDMA) has made its way into data centers, earning a spotlight in distributed systems research. RDMA provides the traditional send/receive communication primitives, but also allows a process to directly read/write remote memory, thereby challenging the assumption that shared memory and message passing cannot coexist in the same system. Motivated by RDMA, in this part of the thesis, we investigate the benefits of a hybrid model, called the message-and-memory model or simply the M&M model, where processes can both pass messages and share memory. Such a model can find applications in systems using other emerging technologies as well, such as disaggregated memory [211] and Gen-Z [128]. By using both methods of communication, one can devise a new genre of algorithms that could potentially combine the advantages of shared-memory and message-passing algorithms.

It has been proven that the message-passing and shared-memory models are equivalent [29], by demonstrating that one model can simulate the other. If that is so, what could be the benefit of combining two models that are equivalent? Closer inspection of the equivalence result reveals that it holds only in a certain sense and under some assumptions. In particular, the equivalence is computational: it shows how algorithms for a problem in one model can be translated to the other using an emulation. However, the emulation does not preserve efficiency, fault tolerance, or synchrony (e.g., a timely process in one model can become untimely as it waits for other processes). In fact, we find that each model has its own advantages and they benefit algorithms

in different ways.

It is well known that neither the message passing model nor the shared memory model (with atomic read and write operations) can solve consensus deterministically in an asynchronous setting [119]. However, using randomization or assuming partial synchrony allows both models to get around the impossibility. Interestingly, in such a setting, a message passing system can only tolerate a minority of crash failures for solving consensus, whereas under the same randomization or synchrony assumptions, a shared memory system can solve consensus despite an arbitrary number of crashed processes. This is one concrete way in which shared memory is stronger than message passing. Conversely, when considering not only crash failures, but Byzantine failures as well,1 it is known that a message passing system can tolerate up to one third of the processes in the system being Byzantine, whereas the shared memory model cannot handle any Byzantine failures unless stronger primitives are available. Thus, the message passing model is also stronger than the shared memory model in some ways.

We consider two slightly different settings in which the M&M model is applicable—large and small networks. RDMA memory operations, and more generally, operations in shared memory, do not scale as well as message passing. This is due to several factors, mostly having to do with cache concerns. As discussed earlier in the thesis, operations in a multicore machine may experience significant slowdowns due to contention. With RDMA operations, the cause is different; the network interface card (NIC) of a machine must keep information about each open RDMA connection in its cache, leading to frequent cache misses if a single machine in the network maintains many RDMA connections. Regardless of the cause, poor scalability remains a reality across all known technologies that allow shared memory communication. Thus, in networks with many processes, it is impractical to have all-to-all shared memory communication. We therefore distinguish between large networks, in which we must diligently choose which RDMA connections to open, and small networks, in which we can allow all-to-all RDMA connections.

For large networks, we consider the problem of crash-tolerant consensus, and show that we can use shared memory to enhance the fault tolerance of a message passing system. In fact, we show that the fault tolerance achievable in such a network is related to the topology of the graph of shared memory connections in the network. We then consider small networks, in which all-to-all RDMA connections are practical. Our goal is to understand the full power that technologies such as RDMA can provide. We therefore bring in more RDMA-specific features to our model. In particular, we consider RDMA's ability to give and dynamically change specific access permissions to memory, as well as the possibility of experiencing memory failures. Under this RDMA-specific version of the M&M model, we consider implementing efficient and fault tolerant consensus under two types of process failures: crash-stop and Byzantine. For Byzantine failures, we give an algorithm that (a) requires only n ≥ 2fP + 1 processes (where fP is the maximum number of faulty processes) and (b) decides in two network delays in the common case. This is an improvement in fault tolerance over both the classic, permissionless, read/write shared memory model and the message passing model, and is as efficient as is possible in the message passing model.

1 A Byzantine process may arbitrarily deviate from its protocol, and is assumed to collude with the adversarial scheduler to try to make the algorithm fail.

With crash failures, we give the first algorithm for consensus that requires only n ≥ fP + 1 processes and decides in two network delays in the common case. With both Byzantine and crash failures, our algorithms can also tolerate crashes of memory—only m ≥ 2fM + 1 memories are required, where fM is the maximum number of faulty memories. Furthermore, with crash failures, we improve resilience further, to tolerate crashes of a minority of the combined set of memories and processes. Thus we show that, aside from the crash-fault tolerance already studied in the large network setting, RDMA can also help improve the Byzantine-fault tolerance and efficiency of consensus algorithms.

Finally, to evaluate the practicality of our model, in Chapter 9 we use insights from the crash-tolerant small-network consensus algorithm presented in Chapter 8 to develop an RDMA-based state machine replication (SMR) system. SMR helps increase the availability of a system by replicating the state machine of an application across several servers, thereby allowing the system to remain available despite a server crash. The core engine of an SMR system makes use of a consensus algorithm to ensure the replicas all have the same state. We employ the consensus algorithm developed from our theoretical model, extending it to support a full SMR system. The resulting system, called Mu, can replicate requests in only 1.3µs, outperforming other state-of-the-art SMR systems by up to 6× in the common case, and by over an order of magnitude when there are server failures. This therefore shows that our theoretical study of RDMA enabled us to make an impact in practice.

6.1 RDMA in Practice

To keep the discussion of our RDMA model grounded in reality, we now briefly discuss its capabilities in practice. We highlight the features that we model in this thesis and later make use of in our SMR implementation.

Remote Direct Memory Access (RDMA) allows a host to access the memory of another host without involving the processor at the other host. RDMA enables low-latency communication by bypassing the OS kernel and by implementing several layers of the network stack in hardware. RDMA supports many operations: Send/Receive, Write/Read, and Atomics (compare-and-swap, fetch-and-increment). Because of their lower latency, we use only RDMA Writes and Reads. RDMA has several transports; we use Reliable Connection (RC) to provide in-order reliable delivery.

RDMA connection endpoints are called Queue Pairs (QPs). Each QP is associated with a Completion Queue (CQ). Operations are posted to QPs as Work Requests (WRs). The RDMA hardware consumes the WR, performs the operation, and posts a Work Completion (WC) to the CQ. Applications make local memory available for remote access by registering local virtual memory regions (MRs) with the RDMA driver. Both QPs and MRs can have different access modes (e.g., read-only or read-write). The access mode is specified when initializing the QP or registering the MR, but can be changed later. MRs can overlap: the same memory can be registered multiple times, yielding multiple MRs, each with its own access mode. In this way, different remote machines can have different access levels to the same memory. The same effect can be obtained by using different access flags for the QPs used to communicate with remote machines.
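As an illustration of the memory registration mechanism described above, the sketch below registers the same buffer twice with different access modes using the libibverbs API. The surrounding setup (creating the protection domain pd and allocating buf) is assumed and omitted, and the helper's name is ours.

    #include <infiniband/verbs.h>
    #include <cstddef>
    #include <utility>

    // Sketch: register one buffer as two overlapping memory regions (MRs) with
    // different access modes, so that different remote peers can be given
    // different access levels to the same memory.
    std::pair<ibv_mr*, ibv_mr*>
    register_overlapping_mrs(ibv_pd* pd, void* buf, size_t len) {
        // Read-only view: peers given this MR's rkey may only read the buffer.
        ibv_mr* ro_mr = ibv_reg_mr(pd, buf, len, IBV_ACCESS_REMOTE_READ);

        // Read-write view of the very same memory.
        ibv_mr* rw_mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);

        return {ro_mr, rw_mr};  // each MR has its own keys and access mode
    }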

6.2 The M&M Model

In this part of the thesis, we model the features of Remote Direct Memory Access (RDMA) and similar technologies, like Gen-Z and disaggregated memory, that allow for both shared memory and message passing communication in the same network. We present the most general form of our model, called the message-and-memory model, or the M&M model, in this section, and later extend it to model RDMA-specific features in Chapter 8.

We begin by assuming a shared memory system, like in the rest of the thesis. However, this time, the system represents processes residing on different physical machines. We therefore make a few different assumptions about the primitives that processes can use. First, for algorithms in this part of the thesis, instead of assuming that any number of processes can fail, we assume an upper bound fP on the number of processes that can crash during the execution. Secondly, we assume that processes use only atomic read and write registers in shared memory, and disallow FAA and CAS. In practice, some hardware for communication across different machines, like RDMA, does allow FAA and CAS, but these primitives are notoriously slow [172, 281]. We therefore focus on what can be achieved without them.2

Shared-Memory Layout. To represent hardware limitations on how many shared-memory connections can be open on each NIC at once [106, 170, 173, 275], we restrict each shared-memory location to only be shared by a subset of the processes. We define the shared-memory domain S as a set of process subsets; intuitively, S determines what subsets of processes can share memory. More precisely, for each set S ∈ S, the model permits having any number of registers shared among the processes in S. In general, S can be arbitrary. However, in practice memory sharing is simpler, as the hardware technology naturally imposes a structure on S: for example, a process might be able to share memory only with processes that connect to it over the underlying hardware. We say that S is uniform if it can be represented by an undirected graph GSM of processes, such that registers can be shared by a process and its neighbors in GSM; intuitively, GSM is the graph of connections of the underlying hardware that implements the shared memory. Formally, GSM is a graph GSM = (Π, ESM) and the sets in S are exactly the sets consisting of a process p and its neighbors in GSM. That is, if we let Sp = {p} ∪ {q : (p, q) ∈ ESM}, then S = {Sp : p ∈ Π}. For a uniform S, we say that GSM is its shared-memory graph.

We are interested in the uniform model, and all our results work with the graph GSM . The broader model based on S is provided to allow for future theoretical work and potential new hardware platforms. Note that, while the model does not constrain the number or size of registers that can be shared, algorithms may choose to reduce their shared-memory usage for efficiency.

In systems with few processes (e.g., in the tens), GSM could be a fully connected graph, but systems with lots of processes may have to limit the maximum degree of GSM (e.g., limit the connections over the hardware).

2 For the algorithms in the first chapter in this part of the thesis, using CAS and FAA would not change the results, as we use shared memory simply as a black box to solve consensus tolerating n − 1 failures.

Message Passing. In addition to shared memory, we assume that processes may communicate using message passing. Processes can send messages over directed links. The link from p to q, denoted p → q, is an output link of p and an input link of q. Each link p → q satisfies the following property in every run:

• [Integrity]: If q receives m from p k times, then p previously sent m to q at least k times.

Some links may satisfy additional properties, which are described next. We consider two types of links: reliable and fair lossy. Roughly, a reliable link does not drop messages, while a fair lossy link may drop messages but ensures that if a process sends a message repeatedly then it is eventually received. More precisely, we say that a link p → q is reliable if it satisfies Integrity and the following property:

• [No loss]: If p sends a message m to q and q is correct, then eventually q receives m from p.

We say that a link p → q is fair lossy if it satisfies Integrity and the following property:

• [Fair loss]: If p sends a message m infinitely often to q and q is correct, then q receives m infinitely often from p.

In this thesis, we always consider a fully-connected message-passing network, that is, for any two processes p ≠ q, there is a link from p to q.

6.2.1 Problem Definitions

We now formally define the consensus problem considered in this part of the thesis.

Consensus Problem. In the consensus problem, each process begins with an input value v ∈ {0, 1} which it proposes, and it must make an irrevocable decision on an output value in the end. With crash failures, we require the following properties:

• Uniform Agreement. If processes p and q decide vp and vq, then vp = vq.
• Validity. If some process decides v, then v is the initial value proposed by some process.
• Termination. Eventually all correct processes decide.

We expect Agreement and Validity to hold in an asynchronous system, while Termination requires standard additional assumptions (partial synchrony, failure detection, etc.). We also allow randomized solutions, where the above Termination property must hold with probability 1 under a strong adversary—one that can schedule processes based on their current state and past history.

In a Byzantine setting, the requirements must be adjusted a little. In particular, we cannot require that all agents that decide a value agree, since Byzantine agents may decide arbitrary values. Similarly, we cannot require that Byzantine agents output valid decision values. We consider weak Byzantine agreement [192], with the following properties:

• Agreement. If correct processes p and q decide vp and vq, then vp = vq.
• Validity. With no faulty processes, if some process decides v, then v is the input of some process.

• Termination. Eventually all correct processes decide.

A consensus object is a shared-memory object with one operation, propose(v), which takes a value v and returns the first value that was proposed to the object. The process that proposed the first value to the consensus object is the winner of that consensus object.
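For reference, the consensus object just defined corresponds to an interface along the lines of the following sketch; the class name is ours, and the value type is int since inputs here are binary.

    // Sketch of the consensus-object interface defined above.
    class ConsensusObject {
    public:
        // propose(v) returns the first value ever proposed to this object; the
        // process that proposed that value is the winner of the object.
        virtual int propose(int v) = 0;
        virtual ~ConsensusObject() = default;
    };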

Chapter 7

Large Networks

We begin the study of RDMA with a simple model that combines classic shared memory with classic message passing. This chapter is based on results presented in [12].

Shared-memory systems have worse scalability than message-passing systems due to hardware limitations. For example, a typical shared-memory system today has tens to thousands of processes, while message-passing systems can be much larger (e.g., map-reduce systems with tens of thousands of hosts [102], peer-to-peer systems with hundreds of thousands of hosts, the SMTP email system and DNS with hundreds of millions of hosts, etc.). This limitation is also apparent in RDMA networks. If a single node in the network maintains many open RDMA connections, significant slowdowns can be observed for RDMA operations on that node. To model efficient networks with many nodes, we thus model an RDMA-enabled system as a system in which all machines can communicate with each other over message passing (all-to-all links), but only subsets of the system can share memory. We assume that the shared memory does not fail, as in the pure shared-memory model. This assumption can be supported by the hardware: with RDMA, the shared memory can be registered with the kernel so that it remains accessible after processes crash. Other technologies, like disaggregated memory, can similarly preserve memory accesses after process crashes. The formal model is described in Section 6.2.

Under this model, we consider the problem of crash-tolerant consensus, and show that even with a few shared memory connections, consensus can be solved with higher fault tolerance than is achievable with message passing alone. It is known that consensus cannot be solved deterministically in an asynchronous system subject to failures, even if processes can only fail by crashing and at most one process may fail; this is true in both message-passing and shared-memory models [119, 220]. We thus consider asynchronous systems where processes can toss coins. Then, in a shared-memory system, consensus can be solved (with probability 1) with up to n − 1 crash failures [2] (n is the number of processes in the system), whereas in a message-passing system, a consensus algorithm can tolerate at most ⌊(n − 1)/2⌋ crash failures [51].

We show that the M&M model can strike a balance between shared memory and message passing. We define an undirected shared-memory graph, whose nodes are processes; there is an edge between processes p and q if they share a memory location. First note that if the shared-memory graph GSM is fully connected, then any fault-tolerant shared-memory algorithm also works in the M&M model—the algorithm simply never sends messages. Thus, there are

algorithms in the M&M model that can tolerate up to n − 1 crash failures. However, as discussed above, in a large system it is impractical to connect all processes over shared memory. When fewer processes can share memory, we show that the M&M model provides a range of choices, where the fault tolerance increases as we improve the shared-memory graph. Specifically, we present an algorithm for the M&M model that tolerates anywhere between ⌊(n − 1)/2⌋ and n − 1 crash failures, depending on the topology of the shared-memory graph. In particular, the hardware limitations on scalability translate to limitations on the degree d of the shared-memory graph [106, 170]. Our algorithm employs expander graphs to tolerate a majority of crash failures—up to f < (1 − 1/(2(1 + h)))n of them—in a system with n processes, where h is the expansion of the graph as measured by the vertex expansion ratio. Roughly, this ratio indicates by how much a set of vertices expands each time we add their neighbors to the set. The higher the expansion, the more failures the algorithm can tolerate.

Our algorithm is a simulation of a pure message-passing consensus algorithm that requires a majority of correct processes, without having that majority in reality. To do that, we use a wait-free shared-memory consensus algorithm among each local neighborhood in the shared-memory graph to emulate a virtual process in the larger message-passing algorithm. This virtual process fails only if all processes in its neighborhood fail. By taking a shared-memory graph with high expansion, we ensure that even with a small d, many processes can fail without affecting a majority of virtual processes. Here, the topology of the shared-memory graph determines the fault tolerance of consensus: graphs with higher expansion allow for higher fault tolerance, because correct processes are adjacent to (and thus can simulate) more processes. We show that this relation is inherent by giving an impossibility result relating graph expansion and fault tolerance.
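For reference, one standard way to formalize the vertex expansion ratio mentioned above is the following; this is the common textbook definition, stated only as a reminder, and the exact normalization used in this chapter's analysis may differ slightly. For an undirected graph G = (V, E) with |V| = n, let N(S) = {v ∈ V \ S : (u, v) ∈ E for some u ∈ S} denote the outer neighborhood of a vertex set S. Then

    h(G) = min { |N(S)| / |S| : S ⊆ V, 0 < |S| ≤ n/2 }.

Intuitively, h(G) ≥ h means that every set S of at most n/2 processes has at least h · |S| neighbors outside S, which is what lets correct processes be adjacent to, and thus simulate, many of their crashed neighbors.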

7.1 Related work

The M&M model is motivated mostly by RDMA [160, 165, 255], but also by similar emerging technologies such as disaggregated memory [211], Gen-Z [128], and OmniPath [163]. These technologies provide remote memory [10] and can be unified under higher-level abstractions [11]. RDMA permits a process to access the memory of a remote host without interrupting the remote processor. It has been widely used in high-performance computing [283] and is now being adopted in modern data centers [137]. Work on RDMA shows how it can improve the performance of important applications, such as key-value storage systems [106, 170, 234], database systems [38, 284], distributed file systems [221], and more [107, 267, 271]. Recent work uses RDMA to improve the performance of consensus [250, 279] assuming a majority of processes are correct. Disaggregated memory separates compute and memory, and connects them using a fast network; prior work proposes new architectures for disaggregated memory [24, 95, 240] and studies the network [142] and system [10, 212] requirements for a practical implementation. Gen-Z and OmniPath are commercial technologies under development that offer memory semantics and low-latency access to remote data. The shared-memory and message-passing models are well studied in academic research, and have been compared under both theoretical and practical considerations [68, 75, 184, 200]. The two models have been shown to be computationally equivalent [29], though for efficiency, simplicity, or hardware availability, one might prefer one model over the other.

For instance, Barrelfish [39] uses message passing to improve performance on a shared-memory multicore machine [99]. Conversely, distributed shared-memory systems [18, 55, 259] offer the abstraction of shared memory on top of a message-passing system. Recent work improves the performance of such systems using RDMA [179, 239]. Integrating message passing and shared memory in hardware has been explored in the MIT Alewife machine [190]. Our work differs in that (1) we propose a new abstract distributed computing model, which encapsulates low-latency remote access technologies, such as RDMA and disaggregated memory, and (2) we show that this model can improve the robustness of algorithms, rather than the performance or simplicity of applications. Consensus is a fundamental problem in distributed computing. Following the well-known FLP result [119] showing that it cannot be solved in an asynchronous crash-prone message-passing system, much work has focused on getting around the impossibility by using randomization [25, 28, 51], partial synchrony [104, 109], or unreliable failure detectors [76, 77]. Expander graphs are graphs that are sparse, yet well-connected. They are well studied and have applications in many areas of computer science, including distributed computing [91, 282]. In this chapter, we show that the fault tolerance of the M&M model is tightly coupled with the expansion of its shared-memory connections, highlighting another problem in which expander graphs apply.

7.2 Consensus Algorithm

We now present an algorithm in the M&M model that improves the fault tolerance of message-passing systems using shared-memory connections. We first discuss the algorithm itself, which works regardless of how many or which shared-memory connections are available, and then discuss the best topology of shared-memory connections to choose to optimize scalability and fault tolerance. We then give an impossibility result about the fault tolerance of consensus in the M&M model (Section 7.4).

7.2.1 Algorithm

The algorithm for the M&M model is based on Ben-Or's randomized algorithm [51], which can tolerate up to f < n/2 process crashes in the message-passing model. This is one of the simplest consensus algorithms, but not the most efficient one. Our goal here is to show feasibility; designing more efficient algorithms for the M&M model is future work.

Ben-Or’s Algorithm. In Ben-Or’s algorithm, each process has an estimate of the decision value, which starts with the process’s initial value. The algorithm proceeds in rounds, each with two phases. In the first phase (phase R), each process p sends its current estimate to all processes, waits to receive at least n−f messages, and checks if more than n/2 messages have the same value v. If so, p sends this value to all processes in the second phase (phase P). Otherwise, p sends a special value ‘?’ to all processes in the second phase. Process p then waits to receive at least n−f messages. If at least f+1 of them have the same non-‘?’ value, p decides on this value. If at least one of them is a non-‘?’ value, p changes its estimate to that value. Otherwise, p changes its estimate to a random bit.
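As a point of reference before we modify it, the round structure just described can be pictured in the following Python sketch. This is only an illustration, not the algorithm of [51] verbatim: send_to_all and receive_phase_messages are hypothetical helpers standing in for the message layer (the latter is assumed to block until at least n − f messages of the given phase and round have arrived and to return their values), and messages are reduced to bare values.

    import random

    def ben_or_round(k, estimate, n, f, send_to_all, receive_phase_messages):
        # Phase R: report the current estimate and collect at least n - f reports.
        send_to_all(('R', k, estimate))
        reports = receive_phase_messages('R', k)
        majority = [v for v in (0, 1) if reports.count(v) > n / 2]
        proposal = majority[0] if majority else '?'

        # Phase P: propose the majority value (or '?') and collect at least n - f proposals.
        send_to_all(('P', k, proposal))
        proposals = receive_phase_messages('P', k)
        for v in (0, 1):
            if proposals.count(v) >= f + 1:
                return True, v                      # decide v
        non_question = [v for v in proposals if v != '?']
        if non_question:
            return False, non_question[0]           # adopt a reported non-'?' value
        return False, random.randint(0, 1)          # otherwise flip a coin

A caller would invoke this once per round, feeding the returned estimate into the next round.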

Ben-Or's algorithm satisfies the validity and uniform agreement properties of consensus (Section 5.2), and it satisfies the termination property with probability 1 if a majority of the processes are correct [9].

Simulating Ben-Or's Algorithm in the M&M model. We modify Ben-Or's algorithm so that correct processes simulate the actions of their neighbors in the shared-memory graph GSM. The idea of the simulation is simple: when sending a message in any phase, process p sends not only its own value, but also the value that its neighbors are supposed to send. That is, p ensures that its neighbors progress at least as much as it does. To do so, for each neighbor q, p reaches agreement with q and q's neighbors on what q's message should be. Then, p sends a message of the form (phase, round, [⟨q, val⟩ : q ∈ neighbors(p)]), where phase is a phase (either R or P), round is a round number, and the last entry is an array with a tuple for each neighbor q indicating the agreed value of q's message. We say that the message represents each of the processes whose ids appear in the tuple. For a set of such messages, we say that the messages represent the union of the processes that are represented by each message separately. To reach agreement on the message of a neighbor q, there are two arrays of consensus objects, one array for each phase, indexed by q and the round. All of q's neighbors use the same consensus object to determine what q's message might be for a given phase and round. Process q and its neighbors propose their own value to that consensus object. The consensus objects themselves are implemented using known wait-free randomized shared-memory algorithms [25, 28], which work in the M&M model because neighbors in GSM share memory. We call this algorithm the Hybrid Ben-Or or HBO algorithm. Algorithm 7.1 shows the pseudocode. There, processes do not terminate after deciding, but it is easy to modify the algorithm so that they do. This algorithm always satisfies the safety properties of consensus, irrespective of the number of crash failures:

Theorem 7.2.1. The HBO algorithm in Algorithm 7.1 satisfies the Validity and Uniform Agreement properties of consensus in the M&M model with reliable links.

7.2.2 Correctness

The correctness of our algorithm relies on that of Ben-Or's. We show that, intuitively, every execution of our algorithm corresponds to some execution of Ben-Or's algorithm. By mapping executions of our algorithm to those of Ben-Or's, we thus show that the possible behaviors of our algorithm are a subset of the behaviors exhibited by Ben-Or's algorithm, and thus all lead to correct solutions to consensus. We begin with a series of definitions. First, instead of counting full rounds, for the purpose of this proof we count each phase separately. That is, we say that a message m is in phase j if m is of the form (R, k, M) where j = 2k, or (P, k, M) where j = 2k + 1. The state of a process comprises (a) its local values, (b) its current phase, and (c) the messages it has sent and received since the beginning of the algorithm's execution. A process p may take one of two actions per phase; p may deterministically update its local estimate to a value v ∈ {0, 1, ?} (called a det-update), or p may update its local estimate by flipping a coin (called a rand-update). After taking an action, p always sends a message with its new estimate to all processes.

Algorithm 7.1: Hybrid Ben-Or (HBO) consensus algorithm

Shared objects:
    RVals[p, i]: consensus object accessible by {p} ∪ neighbors(p), ∀p ∈ Π, ∀i ∈ {1, 2, ...}
    PVals[p, i]: consensus object accessible by {p} ∪ neighbors(p), ∀p ∈ Π, ∀i ∈ {1, 2, ...}

Code for process p:
    procedure Consensus(v_p)
        message = []
        k = 1
        for q ∈ {p} ∪ neighbors(p) do
            message[q] = ⟨q, RVals[q, k].propose(v_p)⟩
        while true
            send (R, k, message) to all
            wait for messages of the form (R, k, ∗) representing more than n/2 processes
            if received more than n/2 tuples with different ids and the same value v
                for q ∈ {p} ∪ neighbors(p) do
                    message[q] = ⟨q, PVals[q, k].propose(v)⟩
            else for q ∈ {p} ∪ neighbors(p) do
                    message[q] = ⟨q, PVals[q, k].propose(?)⟩
            send (P, k, message) to all
            wait for messages of the form (P, k, ∗) representing more than n/2 processes
            if received more than n/2 tuples with different ids and the same value v ≠ ?
                decide(v)
            k = k + 1
            if at least one tuple has value v ≠ ?
                for q ∈ {p} ∪ neighbors(p) do
                    message[q] = ⟨q, RVals[q, k].propose(v)⟩
            else for q ∈ {p} ∪ neighbors(p) do
                    v = 0 or 1 randomly
                    message[q] = ⟨q, RVals[q, k].propose(v)⟩

A protocol step of a process p is an action taken by p, followed by sending a message. A protocol step by p in which p took action a and sent message m is denoted as (p, a, m). The first protocol step of every process p is defined as a det-update in which p sends its initial value. In our HBO algorithm, process p may also take helping steps to help its neighbors perform a protocol step.

Observation 7.2.2. In Ben-Or's algorithm, two processes with states s and s′ take the same action as long as they are in the same phase, and all messages received since the last protocol step they took were the same. Furthermore, if the action taken is a det-update, then their next message is the same.

This observation is easy to see from a quick inspection of Ben-Or's algorithm. It simply states that processes act deterministically based only on the messages received in the last phase; the only source of non-determinism is the random coin flip. In particular, a process's past history and its previous estimate of the value have no effect on its next action. We thus define a contracted state of a process as the subset of its state containing only its current phase and the messages received since the process last took a protocol step. Recall that an execution E of an algorithm is a sequence of protocol steps taken by processes in the system, in which, for any process p, the subsequence containing only steps of p is consistent with the deterministic actions p can take. An extension of an execution E is a concatenation of E with another execution E′ of the same algorithm, in which the first protocol step of each process p in E′ is consistent with the next protocol step p could take after its last step in E. We denote by p_init(E) process p's initial (input) value in execution E. Note that the messages sent in our HBO algorithm are not the same as the messages sent in Ben-Or's algorithm. To be able to compare executions of the two algorithms, we define a mapping of an execution of the HBO algorithm to a sequence of protocol steps of Ben-Or's algorithm. Given an execution E of the HBO algorithm, E's flattened execution, Ef, is the sequence of protocol steps where every protocol step (p, a, (phase, round, [(q, vq) | q ∈ N(p)])) in E is substituted with the sequence of protocol steps (q, aq, (phase, round, vq)) for q ∈ N(p), where aq is the action taken by the process that determined q's value for that phase (won its consensus). Repeated protocol steps are deleted. Intuitively, this flattening just corresponds to an execution in which a process p receives a message from q in phase j if p receives some message in phase j that contains the tuple (q, v) for some v ∈ {0, 1, ?}. Similarly, q sends a message in phase j if there is a message sent by some process in phase j that contains the tuple (q, v) for some value v. Furthermore, redundant messages are ignored in the flattened execution. We say that an execution E of the HBO algorithm and an execution E′ of Ben-Or's algorithm are equivalent if the flattened execution of E, Ef, is equal to E′. We now turn to the actual correctness proof. First, we show that although multiple processes may take a protocol step on behalf of p, they all take the same step, and thus there are no inconsistencies in p's representation.
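Operationally, the flattening can be pictured as the following sketch. The tuple layout is hypothetical: each HBO step is written as (p, action, (phase, rnd, tuples)), where, for simplicity, each tuple carries the agreed value together with the action aq of the process that won q's consensus object; only the substitution and deduplication described above are meant to be faithful.

    def flatten(hbo_steps):
        # Map HBO protocol steps (E) to Ben-Or protocol steps (Ef).
        flat, seen = [], set()
        for _p, _action, (phase, rnd, tuples) in hbo_steps:
            for q, v_q, a_q in tuples:
                if (q, phase, rnd) not in seen:     # repeated protocol steps are deleted
                    seen.add((q, phase, rnd))
                    flat.append((q, a_q, (phase, rnd, v_q)))
        return flat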

Lemma 7.2.3. For any execution E of the HBO algorithm, Ef cannot contain more than one protocol step per process per phase.

Proof. It is easy to see that this lemma holds, since the value sent for each process p in phase j is always determined by a proposal to the same consensus object. Every proposal to the same consensus object returns the same value. Thus, every time p is represented in a message in phase j in E, it is the same message. By the definition of Ef, each protocol step appears only once.

We note that an execution of the HBO algorithm could be equivalent to an execution of Ben-Or's algorithm that has a different set of input values. This difference is due to the fact that in the HBO algorithm, a process can impose its own initial value on its neighbors via the first consensus object of each process. However, the decision that the HBO algorithm arrives at in the end is always a valid decision for the original inputs, since consensus is a colorless task¹.

Observation 7.2.4. In the Hybrid Ben-Or Algorithm, processes may adopt the input values of their neighbors. This does not affect the validity of the decision value.

Lemma 7.2.5. For any execution E of our HBO algorithm, there exists an execution E′ of Ben-Or's algorithm such that E and E′ are equivalent. E and E′ do not necessarily have the same initial values for all processes, but if a process p has input v in E′, then at least one of its neighbors has input v in E.

Proof. The proof is by induction on the number of protocol steps in the execution Ef.

Base of the induction: Consider the empty prefix of Ef. This is equivalent to any empty execution E′ of the original algorithm.

Step of the induction: Assume that for any prefix S of Ef of length at most k, there is an equivalent execution E′ of Ben-Or's algorithm. We now show that there is an extension of E′ in which the next protocol step taken is the same one taken in Ef. Let p be the next process to take a protocol step in Ef, and let j be p's phase for this step. By Lemma 7.2.3, p hasn't yet taken a protocol step in phase j. This means that, since by the induction hypothesis S and E′ are equal, there is an extension of E′ in which p takes a protocol step in this phase. In order for p to take a protocol step, consensus has to have been reached for the step it should take. Consider the process q that proposed the winning value in p's consensus object for this phase. We consider two cases: when j = 0 (the first phase), and when j ≠ 0.

Case 1: j = 0. If this is the first phase, then q imposes its own initial value for p. Note that there is an extension of E′ in which this is p's first step; however, it is an execution in which p_init(E′) = q_init(E). Since q is p's neighbor, this satisfies the lemma.

Case 2: j ≠ 0. If this is not the first phase, by Algorithm 7.1, q proposed its value after having received at least n/2 + 1 messages from other processes in the previous phase. By the induction hypothesis, these same messages have also been sent in E′ (since so far the protocol steps taken have been the same in both executions). Furthermore, since all messages in both the HBO algorithm and Ben-Or's algorithm are sent to all processes, we can let these n/2 + 1 messages be the ones received by p in this phase in the extension of E′. Thus, by Observation 7.2.2, p takes the same action as q would in this extension of E′. This is what happens in the HBO algorithm as well. Note that if p's action is a rand-update, there is an extension of E′ in which the result of p's coin flip is the same as the result q proposed for p in the HBO algorithm. Therefore, there is an extension of E′ in which the next protocol step by p is the same as the one taken by p in Ef.

¹A colorless task is one in which processes may adopt the input value of other processes without affecting the validity of the output [153].

Lemma 7.2.5, Observation 7.2.4 and the correctness of Ben-Or's algorithm immediately imply the following theorem.

Theorem 7.2.6. Algorithm 7.1 satisfies uniform agreement and validity.

7.2.3 Termination

For pedagogical reasons, we first show that Algorithm 7.1 terminates under a weak adversary (one that cannot see the local and shared-memory values of processes). Later, we show that the algorithm in fact terminates under a strong adversary as well. Note that in any round, at most one non-'?' value can be proposed. This is because there needs to be a strict majority of the processes that reported that value in the report phase of that round. Furthermore, if all processes that flip a coin in some round k obtain the same value as the one proposed in that round (if there is such a value), then all processes start round k + 1 with the same value, and they all decide at the end of round k + 1. Note that this scenario has a non-zero probability of happening, since all coin flips are independent of one another, and the adversary cannot alter the majority value to be the opposite of the value flipped, since it cannot observe the local values of the processes. Thus, given enough rounds, the algorithm must eventually terminate under a weak adversary. We now turn to arguing about termination under a strong adversary. Ben-Or's algorithm terminates even under a strong adversary, provided that there is a majority of correct processes [9]. The proof of termination is non-trivial, and requires arguing about two rounds at a time. The idea of the proof is to define a 'lucky epoch': two consecutive rounds in which each coin flip returns the value that will be best in that situation to lead to termination (called 'FavorableToss' in [9]). The main idea is that depending on the current majority opinion, the best coin flip can be different at different times in the execution. Thus, FavorableToss takes in both a round number, r, and a time t at which the coin is flipped. Aguilera and Toueg then show that if, for all processes in two consecutive rounds, any query of the random number generator (r.n.g.) at any time t returns FavorableToss(r, t), then the algorithm terminates. With the HBO algorithm, the time at which a coin for a particular process is flipped is not well defined; to establish its value for the next round, a process must reach consensus with all of its neighbors on its value. Each of its neighbors may separately flip a coin, and then compete in the consensus protocol to have its value chosen. This gives the adversary more power, as it can effectively choose which neighbor will win. We now show that despite this extra power, the HBO algorithm still terminates with probability 1, even under a strong adversary. We do so by closely following the proof of Aguilera and Toueg for the original Ben-Or algorithm, with a few changes clearly defining the time at which a process is said to flip a coin in a given round, and incorporating this into the definition of a FavorableToss.

Definition 7.2.7. For every process p and round r, the time t at which p is said to have tossed its coin for round r is the earliest time in the execution at which some process in the neighborhood of p queries the r.n.g. to obtain a value to propose for p.

We thus alter the definition of FavorableToss as shown in Algorithm 7.2. The algorithm is taken from [9], and the lines in red are the ones added for the proof of the HBO algorithm. Here, τ_k is the first time at which some process receives at least n − f proposals in round k.

Algorithm 7.2: FavorableToss specification

FavorableToss(k, t, p) {
    if there is a time t′ < t such that FavorableToss(k, t′, p) was called {
        return FavorableToss(k, t′, p);
    }
    if at time t there is a majority value v in round k {
        return k mod 2;
    }
    if v is a majority value in round k + 1 by time t {    // t ≥ τ_{k+1}
        return v;
    }
    return (k + 1) mod 2;
}

We also slightly modify the terminology used by Aguilera and Toueg, and say that a process R-gets a value in round k if the winner of its consensus object for the propose phase in that round got its value by flipping a coin. Otherwise, we say that the process D-gets its value in that round. The rest of the proof is identical to the proof given by Aguilera and Toueg. The key insight that allows us to use the old proof is that, with these new definitions, there is a well-defined point in time at which a process p's value for round k is determined, and this point in time is always strictly before p sends its first message in round k. Note that the probability for two rounds in a row to be lucky is now only 2^{−2dn}, since each process's value is determined by at most d coin flips. However, this still suffices to show that the probability that we will eventually have a lucky epoch is 1.
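To spell out that last step (under the assumption that the lucky-epoch events for disjoint epochs are independent of one another): the probability that none of the first ℓ disjoint epochs is lucky is at most (1 − 2^{−2dn})^ℓ, which tends to 0 as ℓ grows, so a lucky epoch, and hence termination, occurs with probability 1.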

7.3 Shared-Memory Expanders

In this section, we consider the fault tolerance of the HBO algorithm: how many crash failures can it tolerate while ensuring that processes decide. In the algorithm, correct processes represent their neighbors in GSM, so the fault tolerance depends on GSM and how many neighbors correct processes have. We show that, by choosing GSM to be a graph with high expansion, we obtain the best trade-off between maintaining low degree and achieving high fault tolerance. Having a low degree is important because the degree indicates the number of connections that a process must have to establish a shared memory, and that number is limited by the hardware (Section 5.2). Roughly, expander graphs are graphs with the property that every sufficiently small set of vertices has many neighbors. To define these graphs more precisely, we follow the survey by Hoory, Linial and Wigderson [157]; we refer the reader to this survey for a detailed treatment.

Definition 7.3.1. Let G = (V, E) be an undirected graph.
1. The vertex boundary of a set S ⊆ V is δS = {u ∈ V : {u, v} ∈ E, v ∈ S} \ S.

2. The vertex expansion ratio of G, denoted h(G), is defined as

    h(G) = min_{S ⊆ V : |S| ≤ |V|/2} |δS| / |S|.
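For concreteness, both quantities in Definition 7.3.1 can be computed by brute force on small graphs. The Python sketch below is only an illustration (it enumerates all subsets, so it is limited to tiny graphs) and is not part of the constructions used later.

    from itertools import combinations

    def vertex_boundary(adj, S):
        # delta(S): vertices outside S that have a neighbor in S
        S = set(S)
        return {u for v in S for u in adj[v]} - S

    def expansion_ratio(adj):
        # h(G): minimum of |delta(S)| / |S| over non-empty S with |S| <= |V|/2
        V = list(adj)
        best = float('inf')
        for size in range(1, len(V) // 2 + 1):
            for S in combinations(V, size):
                best = min(best, len(vertex_boundary(adj, S)) / size)
        return best

    # A 6-cycle: the worst sets are paths of 3 vertices, giving h(G) = 2/3.
    cycle6 = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
    print(expansion_ratio(cycle6))   # 0.666...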

Intuitively, the higher the vertex expansion ratio, the larger the vertex boundary and the better connected the graph. To apply this definition to the fault tolerance of HBO, consider a system with shared-memory graph GSM and vertex expansion ratio h(GSM), where up to f processes may crash. The set of vertices of interest to us is the set C of correct processes. If C has many neighbors, then it can simulate many extra processes in HBO. The adversary may pick any set of at least n − f processes to be correct; regardless of the set it picks, that set has a vertex boundary of at least (n−f) · h(GSM). The fault tolerance of HBO improves with the vertex expansion ratio of the graph. This is made precise by the following:

Theorem 7.3.2. Consider the M&M model with shared-memory graph GSM where links are reliable and f processes may crash. The HBO algorithm in Algorithm 7.1 satisfies the Termination property of consensus with probability 1 if f < (1 − 1/(2(1 + h(GSM)))) · n.

Proof. Recall that Ben-Or's algorithm requires that f < n/2 for termination. The HBO algorithm simulates the correct processes and the processes in their vertex boundary. Given that GSM has vertex expansion ratio h(GSM), the number of processes simulated by the algorithm is at least (n−f) · (1 + h(GSM)). Rearranging the terms leads to the theorem.
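Spelling out the rearrangement, under the requirement that the simulated correct processes form a strict majority of the n processes of the simulated message-passing algorithm:

    (n − f)(1 + h(GSM)) > n/2
    ⟺  n − f > n / (2(1 + h(GSM)))
    ⟺  f < (1 − 1/(2(1 + h(GSM)))) · n.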

In the rest of this section, we give an example of the kind of graphs we can use, by discussing an explicit construction of one family of expander graphs and showing the actual value of h(G) that they yield.

7.3.1 Explicit Construction: Margulis Graphs

Before presenting a construction that yields graphs with good expansion, we begin with a few general definitions and facts about graphs that will help us analyze expansion properties. Again, we closely follow the survey of Hoory et al. [157]. A d-regular graph is a graph in which each vertex has degree exactly d. The Adjacency Matrix of an n-vertex graph G, denoted A(G), is an n × n matrix whose entry (u, v) is the number of edges between u and v in G. Since A(G) is real and symmetric for any G, it has n real eigenvalues, which we denote λ1 ≥ λ2 ≥ ... ≥ λn. These eigenvalues are also known as the spectrum of the graph G. In d-regular graphs, λ1 = d. The spectral gap, i.e., the difference between λ1 and λ2, is an important indicator for the expansion of a graph. Denote λ(G) = max(|λ2|, |λn|). Often λ(G) is used instead of λ2 when comparing to λ1. Note that λ2 ≤ λ(G), so that the spectral gap is at least as large as the difference between λ1 and λ(G). Where the context is clear, we may use λ to mean λ(G). We now consider an alternative definition for the expansion of a graph.

Definition 7.3.3. Ψ′_v(G, k) = min_{S ⊂ V : |S| ≤ k} |Γ(S)| / |S|, where Γ(S) is the neighborhood of S, including S itself.

Note that this definition is very similar to h(G). The only differences are that (1) Ψ′_v(G, k) is more flexible, in that it allows us to bound our discussion to subsets of size up to k, rather than up to n/2, and (2) the fraction is in terms of Γ(S) rather than δS. There is a simple conversion between the two: h(G) ≥ Ψ′_v(G, n/2) − 1.

We are now ready to present the Margulis Graph construction.

Definition 7.3.4 (Construction 8.1 in [157]). Define the following 8-regular graph Gn = G = (V,E) on the vertex set V = Zn × Zn. Let

    T1 = [1 2; 0 1],   T2 = [1 0; 2 1],   e1 = (1, 0)^T,   e2 = (0, 1)^T

Each vertex v = (x, y) is adjacent to the four vertices T1v, T2v, T1v + e1, T2v + e2, and the other four neighbors of v are obtained by the four inverse transformations. Note that all calculations are mod n and this is an 8-regular undirected graph that may have multiple edges and self loops.

Theorem 7.3.5 (Theorem 8.2 in [157], due to Gabber and Galil [125]). The graph Gn satisfies λ(Gn) ≤ 5√2 for every positive integer n.

An (n, d, α)-graph is a d-regular graph on n vertices in which λ ≤ αd. Thus, we know that Margulis graphs are (n, d, α)-graphs for d = 8 and α ≥ 5√2/8.

Theorem 7.3.6 (Theorem 4.15 in [157], due to Tanner [272]). An (n, d, α)-graph G satisfies

    Ψ′_v(G, ρn) ≥ 1 / (ρ(1 − α²) + α²)

for all ρ > 0.

If we plug ρ = 1/2 into the above theorem, we get that Ψ′_v(G, n/2) ≥ 2/(1 + α²). Plugging in α ≥ 5√2/8 for Margulis graphs,

    Ψ′_v(Gn, n/2) ≥ 128/114 ≥ 1.122.

Therefore, Margulis graphs have a vertex expansion ratio of at least 0.122. Using these graphs as GSM in our simulation algorithm, we can tolerate up to (1 − 1/(2(1.122))) · n ≥ 0.55n failures.
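The numbers above can be checked numerically on small instances. The following Python sketch builds the adjacency matrix of Gn from Definition 7.3.4, computes λ(Gn), and evaluates the resulting failure bound; it is only an illustration (dense eigendecomposition limits it to small n), and the value h = 0.122 is taken from the calculation above rather than computed.

    import numpy as np

    def margulis_adjacency(n):
        # Adjacency matrix of the 8-regular Margulis graph on Z_n x Z_n
        # (Definition 7.3.4): forward edges via T1, T2, T1+e1, T2+e2; the
        # inverse transformations are covered by symmetrizing the matrix.
        N = n * n
        A = np.zeros((N, N), dtype=float)
        idx = lambda x, y: (x % n) * n + (y % n)
        for x in range(n):
            for y in range(n):
                v = idx(x, y)
                for nx, ny in [(x + 2 * y, y), (x, 2 * x + y),
                               (x + 2 * y + 1, y), (x, 2 * x + y + 1)]:
                    u = idx(nx, ny)
                    A[v, u] += 1
                    A[u, v] += 1
        return A

    A = margulis_adjacency(7)
    eig = np.sort(np.linalg.eigvalsh(A))
    lam = max(abs(eig[0]), abs(eig[-2]))     # lambda(G) = max(|lambda_2|, |lambda_n|)
    print(lam <= 5 * np.sqrt(2))             # expected True by Theorem 7.3.5
    h = 0.122                                # expansion ratio derived above
    print(1 - 1 / (2 * (1 + h)))             # ~0.554: fraction of tolerated failures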

7.4 Impossibility result

We now show an impossibility result about the fault tolerance of consensus in the M&M model. The impossibility depends on the topology of the shared-memory graph GSM. Intuitively, the impossibility indicates that the expander construction is the correct approach: we show that the fault tolerance of the system is related to the minimum cut that separates a large subgraph from the rest of the GSM graph. In graphs with high expansion, the size of such a cut is guaranteed to be large, and thus many failures can be tolerated. To establish the impossibility, we extend the well-known partitioning argument [223] to the M&M model. Basically, if two processes cannot communicate during the execution of an algorithm, then they cannot be guaranteed to decide the same value. Thus, if the adversary can partition the system into two disjoint subgraphs A and B, each of size ≥ n − f, where processes in A do not communicate with processes in B, then agreement cannot hold. This argument works in message-passing models, where the adversary can arbitrarily delay messages on the network, but it breaks in a shared-memory model, in which communication between processes cannot be delayed without blocking the processes themselves.

Thus, in the M&M model, to create such a partition the adversary must get rid of all shared-memory edges of GSM on the cut between A and B. We now formalize the intuition to arrive at the impossibility. Given a graph G = (V, E), we say that C = (B, S, T) is an SM-cut in G if B, S, and T are disjoint subsets of V such that B ∪ S ∪ T = V, and there is a way to partition B into two disjoint subsets B1 and B2 such that (B1 ∪ S, B2 ∪ T) is a cut of the graph G, and for every b1 ∈ B1, b2 ∈ B2, s ∈ S and t ∈ T, {s, t} ∉ E, {b1, t} ∉ E, and {b2, s} ∉ E. Intuitively, B is the set of vertices on the boundary of the cut, and S and T are the remaining vertices on each side.
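The SM-cut conditions are mechanical to check. The following sketch (with B given as the pair B1, B2 from the definition, and the graph as an adjacency-set dictionary) is a direct transcription of the definition, not an algorithm used in the proofs below.

    def is_sm_cut(adj, B1, B2, S, T):
        # (B1 ∪ B2, S, T) is an SM-cut if the four sets partition V and there is
        # no edge between S and T, between B1 and T, or between B2 and S.
        B1, B2, S, T = map(set, (B1, B2, S, T))
        parts = (B1, B2, S, T)
        if set().union(*parts) != set(adj) or sum(map(len, parts)) != len(adj):
            return False
        forbidden = [(S, T), (B1, T), (B2, S)]
        return all(adj[u].isdisjoint(other)
                   for side, other in forbidden for u in side)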

Theorem 7.4.1. Consider the M&M model with shared-memory graph GSM, where links are reliable and f processes may crash. Consensus cannot be solved if GSM has an SM-cut (B, S, T) with |S| ≥ n−f and |T| ≥ n−f.

Note that, in a graph with high expansion, there are no SM-cuts (B, S, T) with |S| ≥ n−f and |T| ≥ n−f. Intuitively, this is because if we want to build an SM-cut and we start with some set S with |S| ≥ n − f, we must include δS in B1, and then include δ(S ∪ B1) in B2. As these sets expand quickly, we are then left with fewer than n − f vertices to put in T.

Proof. Assume by contradiction that there exists an algorithm A that solves consensus on an M&M network G and is tolerant to f ≥ min_{C ∈ SM-cuts(G)} (n − |S|) process failures. Let C = arg min_{C ∈ SM-cuts(G)} (n − |S|). Consider the following two executions:

Execution E1: All initial values are 0. All processes in B and in T crash at the beginning and never execute a single step, while all processes in S are correct. This is possible, since f ≥ n − |S| = |B| + |T|. Then, since A solves consensus and is tolerant to f failures, all processes in S must eventually, by some time t, decide 0 and terminate.

Execution E2: All initial values are 1. All processes in B and in S crash at the beginning and never execute a single step, while all processes in T are correct. This is possible, since by assumption |S| ≤ |T|, and therefore f ≥ n − |T| = |B| + |S|. Then, since A solves consensus and is tolerant to f failures, all processes in T must eventually, by some time t′, decide 1 and terminate.

We now construct a third execution which violates the specification of consensus.

Execution E3: The initial values of all processes in S are 0, and the initial values of all processes in T are 1. The initial values of processes in B may be arbitrary. All processes in B crash at the beginning and never execute a single step. Furthermore, all messages from any process in T to any process in S are delayed until after time t, and all messages from any process in S to any process in T are delayed until after time t′. Since all elements of B have crashed, and all shared-memory communication between S and T must pass through B, there cannot be any shared-memory communication. Since this execution is indistinguishable from E1 to all processes in S until after they decide, and they cannot change their decision, all processes in S decide 0 in E3. Symmetrically, this execution is indistinguishable from E2 to processes in T, so they must decide 1. This violates the specification of consensus—a contradiction.

Note that Theorem 7.4.1 asserts that consensus cannot be solved if there is an SM-cut (B, S, T) in a shared-memory graph where both |S| and |T| are at least n − f. Intuitively, such an SM-cut cannot exist in a shared-memory graph with high expansion.

This is because an SM-cut (B, S, T) can be built in a graph as follows: take any subset S′ of the graph. Let S = S′, B1 = δS′, B2 = δ(S′ ∪ δS′), and T be the rest of the graph. If necessary, switch S and T such that S is the smaller of the two sets. Since this construction uses the vertex boundaries of sets in the graph, the SM-cut produced would be very small if the sets expand quickly. We formalize this intuition with the following corollary, for which we first define the 2-hop average expansion of a set.

Definition 7.4.2. Let G = (V, E). The 2-hop average expansion ratio of a set S ⊂ V, denoted h_S^(2), is the average of the expansion ratios of S and S ∪ δS. That is,

    h_S^(2) = (1/2) · ( |δS| / |S| + |δ(S ∪ δS)| / |S ∪ δS| ).

Corollary 7.4.3. In an M&M network with shared-memory graph GSM = (V, E), consensus cannot be solved if there is a set S ⊂ V such that |S| ≥ n − f and h_S^(2) is at most √(f/|S|) − 1.

Proof. Assume that there exists a set S ⊂ V such that |S| ≥ n − f and h_S^(2) ≤ √(f/|S|) − 1. We show that in this case, there is an SM-cut in G such that |S| ≥ n − f and |T| ≥ n − f. We do so by constructing the SM-cut (B, S′, T) as follows: S′ = S, B1 = δS, B2 = δ(S ∪ δS), and T = V \ (S ∪ B1 ∪ B2). Let h1 = |δS| / |S| and h2 = |δ(S ∪ δS)| / |S ∪ δS|. Then

    |T| = n − (|S| + |δS| + |δ(S ∪ δS)|)
        = n − (|S| + h1|S| + h2(|S| + h1|S|))
        = n − (|S| + h1|S| + h2|S| + h1h2|S|)
        ≥ n − (1 + 2h_S^(2) + (h_S^(2))²) · |S|                    (by the inequality of means)
        ≥ n − (1 + 2(√(f/|S|) − 1) + (√(f/|S|) − 1)²) · |S|
        = n − (1 + (√(f/|S|) − 1))² · |S|
        = n − f.

By Theorem 7.4.1, consensus cannot be solved if there is an SM-cut in G such that |S| ≥ n − f and |T| ≥ n − f.
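As a sanity check, the corollary's condition can be evaluated directly on a small graph. The sketch below restates the vertex-boundary helper for self-containment and tests whether a given set S and failure bound f satisfy |S| ≥ n − f and h_S^(2) ≤ √(f/|S|) − 1; it is an illustration of the condition, not part of the proof.

    def vertex_boundary(adj, S):
        S = set(S)
        return {u for v in S for u in adj[v]} - S

    def two_hop_avg_expansion(adj, S):
        # h_S^(2): average of the expansion ratios of S and of S ∪ δS
        S = set(S)
        d1 = vertex_boundary(adj, S)
        d2 = vertex_boundary(adj, S | d1)
        return 0.5 * (len(d1) / len(S) + len(d2) / len(S | d1))

    def corollary_rules_out_consensus(adj, S, f):
        # Corollary 7.4.3: |S| >= n - f and h_S^(2) <= sqrt(f/|S|) - 1
        n, s = len(adj), len(set(S))
        return s >= n - f and two_hop_avg_expansion(adj, S) <= (f / s) ** 0.5 - 1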

Chapter 8

Small RDMA Networks

In the previous chapter, we considered the use of shared memory and message passing together in large networks. In particular, because shared memory is known to be less scalable to large numbers of processes, we assumed that allowing all processes in the system to communicate over shared memory would be impractical. We therefore focused on finding how to improve network guarantees while using only a few well-placed shared-memory connections. We showed a trade-off between the improved fault tolerance of the overall network and the number of shared-memory connections used. The case of a fully-connected shared-memory graph, in which all processes were able to communicate with each other over shared memory, was one extreme of the trade-off we considered, and was of limited interest to the discussion of fault tolerance, since the results in such a setting are implied by well-known results in the purely shared-memory setting. In contrast, in this chapter we focus exclusively on the setting in which the network is small enough that allowing all processes to communicate over shared memory is practical. Such small networks in fact crop up very often in the real world, when we only want to optimize communication within a single rack in a data center, or when implementing state machine replication for fault tolerance, where it is common practice to replicate across only a small constant number of machines (3 to 7). In the small network setting, we consider a more detailed version of the M&M model, which reflects more features of RDMA hardware. Recall that RDMA was one of the main technologies that inspired the M&M model in the first place; it provides the traditional send/receive communication primitives, but also allows a process to directly read/write remote memory. Recent work shows that RDMA leads to some new and exciting distributed algorithms [12, 42, 106, 171, 250, 279]. In addition to providing both message-passing and shared-memory capabilities, RDMA provides some other features. Namely, RDMA provides a protection mechanism to grant and revoke access for reading and writing data. This mechanism is fine grained: an application can divide the memory into different (possibly overlapping) subsets called memory regions, and can give different access permissions (read/write/both) to different processes. Furthermore, these protections are dynamic: they can be changed by the application over time. However, in RDMA, the remote memories may be subject to failures that cause them to become unresponsive. This behavior differs from traditional shared memory, which is often assumed to be reliable.

In this chapter, we lay the groundwork for a theoretical understanding of these RDMA capabilities by adding them to the M&M model, and we show that they lead to distributed consensus algorithms that are inherently more powerful than before. Similarly to the previous chapter, we focus on small networks in which it is practical to allow all-to-all shared-memory communication as enabled by RDMA. Notably, in Chapter 7, where we studied fault tolerance for the consensus problem, the fully-connected setting was of limited interest, since the fault tolerance results were implied by previous results on shared-memory systems. However, in this chapter, we consider several aspects that were omitted in Chapter 7. Namely, we show that RDMA improves on the fundamental trade-off in distributed systems between failure resilience and performance—specifically, we show how a consensus protocol can use RDMA to achieve both high resilience and high performance, while traditional algorithms had to choose one or the other. As already discussed, traditional shared-memory systems provide higher crash-fault tolerance than message-passing systems. However, we show that shared-memory algorithms are also inherently slower than their message-passing counterparts when solving consensus, as they require at least 2 operations by the same process (4 network delays in our model), even in best-case executions, whereas in message passing, consensus can be solved in a single network round trip (2 network delays in our model). We formalize these notions of performance in Section 8.2. Furthermore, in this chapter we consider not only crash failures, but also Byzantine faults, whereby processes may arbitrarily deviate from the protocol. With Byzantine failures, we consider the consensus problem called weak Byzantine agreement, defined by Lamport [192]. We give an algorithm that (a) requires only n ≥ 2fP + 1 processes (where fP is the maximum number of faulty processes) and (b) decides in two delays in the common case. With crash failures, we give the first algorithm for consensus that requires only n ≥ fP + 1 processes and decides in two delays in the common case. With either Byzantine or crash failures, our algorithms can also tolerate crashes of memory—only m ≥ 2fM + 1 memories are required, where fM is the maximum number of faulty memories. Furthermore, with crash failures, we improve resilience further, to tolerate crashes of a minority of the combined set of memories and processes. Our algorithms appear to violate known impossibility results: it is known that with message passing, Byzantine agreement requires n ≥ 3fP + 1, while consensus with crash failures requires n ≥ 2fP + 1 if the system is partially synchronous [109]. There is no contradiction: our algorithms rely on the power of RDMA, not available in other systems. RDMA's power comes from two features: (1) simultaneous access to message passing and shared memory, and (2) dynamic permissions. Intuitively, shared memory helps resilience, message passing helps performance, and dynamic permissions help both. To see how shared memory helps resilience, consider the Disk Paxos algorithm [126], which uses shared memory (disks) but no messages. Disk Paxos requires only n ≥ fP + 1 processes, matching the resilience of our algorithm. However, Disk Paxos is not as fast: it takes at least four delays. In fact, we show that no shared-memory consensus algorithm can decide in two delays (Section 8.5).
To see how message passing helps performance, consider the Fast Paxos algorithm [196], which uses message passing and no shared memory. Fast Paxos decides in only two delays in common executions, but it requires n ≥ 2fP + 1 processes. Of course, the challenge is achieving both high resilience and good performance in a single algorithm.

This is where RDMA's dynamic permissions shine. Clearly, dynamic permissions improve resilience against Byzantine failures, by preventing a Byzantine process from overwriting memory and making it useless. More surprising, perhaps, is that dynamic permissions help performance, by providing an uncontended instantaneous guarantee: if each process revokes the write permission of other processes before writing to a register, then a process that writes successfully knows that it executed uncontended, without having to take additional steps (e.g., to read the register). We use this technique in our algorithms for both Byzantine and crash failures.

8.1 Related Work

RDMA. Many high-performance systems were recently proposed using RDMA, such as distributed key-value stores [106, 171], communication primitives [106, 173], and shared address spaces across clusters [106]. Kaminsky et al. [175] provide guidelines for designing systems using RDMA. RDMA has also been applied to solve consensus [42, 250, 279]. Our model shares similarities with DARE [250] and APUS [279], which modify queue-pair state at run time to prevent or allow access to memory regions, similar to our dynamic permissions. These systems perform better than TCP/IP-based solutions by exploiting the better raw performance of RDMA, without changing the fundamental communication complexity or failure resilience of the consensus protocol. Similarly, Rüsch et al. [258] use RDMA as a replacement for TCP/IP in existing BFT protocols.

Byzantine Fault Tolerance. Lamport, Shostak and Pease [197, 248] show that Byzantine agreement can be solved in synchronous systems iff n ≥ 3fP + 1. With unforgeable signatures, Byzantine agreement can be solved iff n ≥ 2fP + 1. In asynchronous systems subject to failures, consensus cannot be solved [119]. However, this result is circumvented by making additional assumptions for liveness, such as randomization [51] or partial synchrony [76, 109]. Many Byzantine agreement algorithms focus on safety and implicitly use the additional assumptions for liveness. Even with signatures, asynchronous Byzantine agreement can be solved only if n ≥ 3fP + 1 [65]. It is well known that the resilience of Byzantine agreement varies depending on various model assumptions like synchrony, signatures, equivocation, and the exact variant of the problem to be solved. A system that has non-equivocation is one that can prevent a Byzantine process from sending different values to different processes. Table 8.1 summarizes some known results that are relevant to the work in this chapter. Our Byzantine agreement results share similarities with results for shared memory. Malkhi et al. [224] and Alon et al. [17] show bounds on the resilience of strong and weak consensus in a model with reliable memory but Byzantine processes. They also provide consensus protocols, using read-write registers enhanced with sticky bits (write-once memory) and access control lists not unlike our permissions. Bessani et al. [57] propose an alternative to sticky bits and access control lists through Policy-Enforced Augmented Tuple Spaces. All these works handle Byzantine failures with powerful objects rather than registers. Bouzid et al. [63] show that 3fP + 1 processes are necessary for strong Byzantine agreement with read-write registers.

Work              | Synchrony | Signatures | Non-Equiv | Resiliency | Strong Validity
[197]             | ✓         | ✓          | ✗         | 2f + 1     | ✓
[197]             | ✓         | ✗          | ✗         | 3f + 1     | ✓
[17, 224]         | ✗         | ✗          | ✗         | 3f + 1     | ✓
[85]              | ✗         | ✓          | ✗         | 3f + 1     | ✓
[85]              | ✗         | ✗          | ✓         | 3f + 1     | ✓
[85]              | ✗         | ✓          | ✓         | 2f + 1     | ✓
This work (RDMA)  | ✗         | ✓          | ✓         | 2f + 1     | ✗

Table 8.1: Known fault tolerance results for Byzantine agreement.

Some prior work solves Byzantine agreement with 2fP + 1 processes using specialized trusted components that Byzantine processes cannot control [82, 83, 93, 94, 176, 277]. Some schemes decide in two delays but require a large trusted component: a coordinator [83], reliable broadcast [94], or message ordering [176]. For us, permission checking in RDMA is a trusted component of sorts, but it is small and readily available. At a high level, our improved Byzantine fault tolerance is achieved by preventing equivocation by Byzantine processes, thereby effectively translating each Byzantine failure into a crash failure. Such translations from one type of failure into a less serious one have appeared extensively in the literature [40, 65, 85, 238]. Early work [40, 238] shows how to translate a crash-tolerant algorithm into a Byzantine-tolerant algorithm in the synchronous setting. Bracha [64] presents a similar translation for the asynchronous setting, in which n ≥ 3fP + 1 processes are required to tolerate fP Byzantine failures. Bracha's translation relies on the definition and implementation of a reliable broadcast primitive, very similar to the non-equivocating broadcast in this chapter. However, we show that using the capabilities of RDMA, we can implement it with higher fault tolerance.

Faulty Memory. Afek et al. [6] and Jayanti et al. [167] study the problem of masking the benign failures of shared memory or objects. We use their ideas of replicating data across memories. Abraham et al. [1] consider honest processes but malicious memory.

Common-Case Executions. Many systems and algorithms tolerate adversarial scheduling but optimize for common-case executions without failures, asynchrony, contention, etc. (e.g., [61, 103, 108, 180, 191, 196, 225]). None of these match both the resilience and performance of our algorithms. Some algorithms decide in one delay but require n ≥ 5fP + 1 for Byzantine failures [270] or n ≥ 3fP + 1 for crash failures [66, 103].

Figure 8.1: Our model with processes and memories, which may both fail. Processes can send messages to each other or access registers in the memories. Registers in a memory are grouped into memory regions that may overlap, but in our algorithms they do not. Each region has a permission indicating what processes can read, write, and read-write the registers in the region (shown for two regions).

8.2 Model

We consider an extension of the message-and-memory (M&M) model: as in Section 6.2, processes may send messages and communicate over shared memory. We assume a fully connected shared-memory network. The system has n processes P = {p1, . . . , pn} and m (shared) memories M = {µ1, . . . , µm}, which all processes can access. Throughout this chapter, memory refers to the shared memories, not the local state of processes.

Memory Permissions. Each memory consists of a set of registers. To control access, an algorithm groups those registers into a set of (possibly overlapping) memory regions, and then defines permissions for those memory regions. Formally, a memory region mr of a memory µ is a subset of the registers of µ. We often refer to mr without specifying the memory µ explicitly. Each memory region mr has a permission, which consists of three disjoint sets of processes R_mr, W_mr, RW_mr indicating whether each process can read, write, or read-write the registers in the region. We say that p has read permission on mr if p ∈ R_mr or p ∈ RW_mr; we say that p has write permission on mr if p ∈ W_mr or p ∈ RW_mr. In the special case when R_mr = P \ {p}, W_mr = ∅, RW_mr = {p}, we say that mr is a Single-Writer Multi-Reader (SWMR) region—registers in mr correspond to the traditional notion of SWMR registers. Note that a register may belong to several regions, and a process may have access to the register in one region but not another—this models the existing RDMA behavior. Intuitively, when reading or writing data, a process specifies the region and the register, and the system uses the region to determine if access is allowed (we make this precise below).

Permission Change. An algorithm indicates an initial permission for each memory region mr. Subsequently, the algorithm may wish to change the permission of mr during execution. For that, processes can invoke an operation changePermission(mr, new_perm), where new_perm is a triple (R, W, RW). This operation returns no results and is intended to modify R_mr, W_mr, RW_mr to R, W, RW. To tolerate Byzantine processes, an algorithm can restrict processes from changing permissions.

For that, the algorithm specifies a function legalChange(p, mr, old_perm, new_perm), which returns a boolean indicating whether process p can change the permission of mr to new_perm when the current permissions are old_perm. More precisely, when changePermission is invoked, the system evaluates legalChange to determine whether changePermission takes effect or becomes a no-op. When legalChange always returns false, we say that the permissions are static; otherwise, the permissions are dynamic.

Accessing Memories. Processes access the memories via operations write(mr, r, v) and read(mr, r), for memory region mr, register r, and value v. A write(mr, r, v) by process p changes register r to v and returns ack if r ∈ mr and p has write permission on mr; otherwise, the operation returns nak. A read(mr, r) by process p returns the last value successfully written to r if r ∈ mr and p has read permission on mr; otherwise, the operation returns nak. In our algorithms, a register belongs to exactly one region, so we omit the mr parameter from write and read operations.
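To make these rules concrete, here is a toy Python model of a memory region. It is not RDMA code and not part of the algorithms in this chapter, just a transcription of the read/write/changePermission semantics above, with the legalChange predicate passed in by the caller.

    class MemoryRegion:
        # A region groups registers and carries a permission (R, W, RW).
        def __init__(self, registers, R=(), W=(), RW=()):
            self.registers = dict.fromkeys(registers)   # register -> last written value
            self.R, self.W, self.RW = set(R), set(W), set(RW)

        def can_read(self, p):
            return p in self.R or p in self.RW

        def can_write(self, p):
            return p in self.W or p in self.RW

        def write(self, p, r, v):
            if r in self.registers and self.can_write(p):
                self.registers[r] = v
                return 'ack'
            return 'nak'

        def read(self, p, r):
            if r in self.registers and self.can_read(p):
                return self.registers[r]
            return 'nak'

        def change_permission(self, p, new_perm, legal_change):
            # Takes effect only if legalChange allows it; otherwise a no-op.
            old_perm = (set(self.R), set(self.W), set(self.RW))
            if legal_change(p, self, old_perm, new_perm):
                self.R, self.W, self.RW = map(set, new_perm)

    # An SWMR region owned by p1: every other process reads, only p1 writes.
    swmr = MemoryRegion(['r0'], R={'p2', 'p3'}, RW={'p1'})
    assert swmr.write('p2', 'r0', 7) == 'nak'
    assert swmr.write('p1', 'r0', 7) == 'ack' and swmr.read('p3', 'r0') == 7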

Executions and Steps. As in the general model of this thesis (Section 1.5), an execution is a sequence of process steps. In each step, a process does the following, according to its local state: (1) sends a message or invokes an operation on a memory (read, write, or changePermission), (2) tries to receive a message or a response from an outstanding operation, and (3) changes local state. We require a process to have at most one outstanding operation on each memory.

Failures. A memory m may fail by crashing, which causes subsequent operations on its registers to hang without returning a response. Because the system is asynchronous, a process cannot differentiate a crashed memory from a slow one. We assume there is an upper bound fM on the number of memories that may crash. Processes may fail by crashing or becoming Byzantine. If a process crashes, it stops taking steps forever. If a process becomes Byzantine, it can deviate arbitrarily from the algorithm. However, that process cannot operate on memories without the required permission. We assume there is an upper bound fP on the number of processes that may be faulty. Where the context is clear, we omit the P and M subscripts from the number of failures, f.

Signatures. Our algorithms assume unforgeable signatures: there are primitives sign(v) and sValid(p, v) which, respectively, sign a value v and determine whether v is signed by process p.

Messages and Disks. The model defined above includes two common models as special cases. In the message-passing model, there are no memories (m = 0), so processes can communicate only by sending messages. In the disk model [126], there are no links, so processes can communicate only via memories; moreover, each memory has a single region which always permits all processes to read and write all registers.

Complexity of Algorithms. We are interested in the performance of algorithms in common-case executions, when the system is synchronous and there are no failures. In those cases, we measure performance using the notion of delays, which extends message delays to our model.

Under this metric, computations are instantaneous, each message takes one delay, and each memory operation takes two delays. Intuitively, a delay represents the time incurred by the network to transmit a message; a memory operation takes two delays because its hardware implementation requires a round trip. We say that a consensus protocol is k-deciding if, in common-case executions, some process decides in k delays.

8.2.1 Reflecting RDMA in Practice

Our model is meant to reflect capabilities of RDMA, while providing a clean abstraction to reason about. We now briefly discuss how features of our model can be implemented using RDMA. Recall that an overview of how RDMA works in practice is presented in Section 6.1. Memory regions in our model correspond to RDMA memory regions. Using RDMA, a process p can grant permissions to a remote process q by registering memory regions with the appropriate access permissions (read, write, or read/write) and sending the corresponding key to q. p can revoke permissions dynamically by simply de-registering the memory region. Another way for p to change permissions is by changing the queue pair setting. To prevent Byzantine processes from changing permissions illegally, permission changes should be done in the OS kernel, which is less susceptible to corruption. Alternatively, future hardware support similar to SGX could even allow parts of the kernel to be Byzantine without harming our model assumptions. As highlighted by previous work [250], failures of the CPU, NIC and DRAM can be seen as independent (e.g., arbitrary delays, too many bit errors, failed ECC checks, respectively). For instance, zombie servers in which the CPU is blocked but RDMA requests can still be served account for roughly half of all failures [250]. This motivates our choice to treat processes and memory separately in our model. In practice, if a CPU fails permanently, the memory will also become unreachable through RDMA eventually; however, in such cases memory may remain available long enough for ongoing operations to complete. Also, in practical settings it is possible for full-system crashes to occur (e.g., machine restarts), which correspond to a process and a memory failing at the same time—this is allowed by our model.

8.3 Byzantine Agreement

We now consider Byzantine failures and give a 2-deciding algorithm for weak Byzantine agreement with n ≥ 2fP + 1 processes and m ≥ 2fM + 1 memories. The algorithm consists of the composition of two sub-algorithms: a slow one that always works, and a fast one that gives up under hard conditions. The first sub-algorithm, called Robust Backup, is developed in two steps. We first implement a primitive called non-equivocating broadcast, which prevents Byzantine processes from sending different values to different processes. Then, we use the framework of Clement et al. [85] combined with this primitive to convert a message-passing consensus algorithm that tolerates crash failures into a consensus algorithm that tolerates Byzantine failures. This yields Robust Backup.¹

¹The attentive reader may wonder why at this point we have not achieved a 2-deciding algorithm already: if we apply Clement et al. [85] to a 2-deciding crash-tolerant algorithm (such as Fast Paxos [196]), will the result not be a 2-deciding Byzantine-tolerant algorithm? The answer is no, because Clement et al. needs non-equivocated broadcast, which incurs at least 6 delays.

It uses only static permissions and assumes memories are split into SWMR regions. Therefore, this sub-algorithm works in the traditional shared-memory model with SWMR registers, and it may be of independent interest. The second sub-algorithm is called Cheap Quorum. It uses dynamic permissions to decide in two delays using one signature in common executions. However, the sub-algorithm gives up if the system is not synchronous or there are Byzantine failures. Finally, we combine both sub-algorithms using ideas from the Abstract framework of Aublin et al. [31]. More precisely, we start by running Cheap Quorum; if it aborts, we run Robust Backup. There is a subtlety: for this idea to work, Robust Backup must decide on a value v if Cheap Quorum decided v previously. To do that, Robust Backup decides on a preferred value if at least f + 1 processes have this value as input. To do so, we use the classic crash-tolerant Paxos algorithm (run under the Robust Backup algorithm to ensure Byzantine tolerance) but with an initial set-up phase that ensures this safe decision. We call the protocol Preferential Paxos. We develop Robust Backup using the construction by Clement et al. [85], which we now explain. Clement et al. show how to transform a message-passing algorithm A that tolerates fP crash failures into a message-passing algorithm that tolerates fP Byzantine failures in a system with n ≥ 2fP + 1 processes, assuming unforgeable signatures and a non-equivocation mechanism. They do so by implementing trusted message-passing primitives, T-send and T-receive, using non-equivocation and signature verification on every message. Processes include their full history with each message, and then verify locally whether a received message is consistent with the protocol. This restricts Byzantine behavior to crash failures. To apply this construction in our model, we show that our model can implement non-equivocation and message passing. We first show that shared memory with SWMR registers (and no memory failures) can implement these primitives, and then show how our model can implement shared memory with SWMR registers.
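The composition just described can be pictured as the following control-flow sketch. Here cheap_quorum and preferential_paxos are placeholders for the sub-algorithms developed below, and the assumption that an aborting Cheap Quorum hands back a value to prefer (its decided or adopted value, or the caller's input if none exists) is a simplification of the actual hand-off.

    def consensus(my_input, cheap_quorum, preferential_paxos):
        # Fast path: Cheap Quorum decides in two delays in nice executions.
        status, value = cheap_quorum(my_input)
        if status == 'decide':
            return value
        # Slow path: Preferential Paxos (run under Robust Backup) decides on a
        # value preferred by at least f + 1 processes, so a value already
        # decided by Cheap Quorum is never contradicted.
        return preferential_paxos(preferred=value)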

8.3.1 Non-Equivocating Broadcast

Consider a shared-memory system. We present a way to prevent equivocation through a solution to the following broadcast problem, which we call non-equivocating broadcast. This notion is similar to reliable broadcast, but makes use of a sequence number, k.

Definition 8.3.1. Non-equivocating broadcast is defined in terms of two primitives, broadcast(k, m) and deliver(k, m, q). When a process p invokes broadcast(k, m) we say that p broadcasts (k, m). When a process p invokes deliver(k, m, q) we say that p delivers (k, m) from q. Each correct process p must invoke broadcast(k, ∗) with k one higher than p's previous invocation (and with k = 1 on its first invocation). The following properties hold:
1. If a correct process p broadcasts (k, m), then all correct processes eventually deliver (k, m) from p.
2. If p and q are correct processes, p delivers (k, m) from r, and q delivers (k, m′) from r, then m = m′.
3. If a correct process delivers (k, m) from a correct process p, then p must have broadcast (k, m).

4. If a correct process delivers (k, m) from p, then all correct processes eventually deliver (k, m′) from p for some m′.

Algorithm 8.2 shows how to implement non-equivocating broadcast tolerating a minority of Byzantine failures in shared memory using SWMR registers. To broadcast its k-th message m, p simply signs (k, m) and writes it in slot Value[p, k, p] of its memory.² Delivering a message from another process is a little more involved, requiring verification steps to ensure that all correct processes will eventually deliver the same message and no other.

The high-level idea is that before delivering a message (k, m) from q, each process p checks that no other process saw a different value from q, and waits to hear that "enough" other processes also saw the same value. More specifically, each process p has 3 slots per process per sequence number, that only p can write to, but all processes can read from. These slots are initialized to ⊥, and p uses them to write the values that it has seen. The 3 slots represent 3 levels of 'proofs' that this value is correct; for each process q and sequence number k, p has a slot to write (1) the initial value v it read from q for k, (2) a proof that at least f + 1 processes saw the same value v from q for k, and (3) a proof that at least f + 1 processes wrote a proof of seeing value v from q for k in their second slot. We call these slots the Value slot, the L1Proof slot, and the L2Proof slot, respectively. We note that each such valid proof has signed copies of only one value for the message. Any proof that shows copies of two different values, or a value that is not signed, is not considered valid. If a proof has copies of only value v, we say that this proof supports v.

To deliver a value v from process q with sequence number k, process p must successfully write a valid proof-of-proofs in its L2Proof slot supporting value v (we call this an L2 proof). It has two options for how to do this. First, if it sees a valid L2 proof in some other process i's L2Proof[i, k, q] slot, it copies this proof over to its own L2Proof slot, and can then deliver the value that this proof supports. If p does not find a valid L2 proof in some other process's slot, it must try to construct one itself. We now describe how this is done.

A correct process p goes through three stages when constructing a valid L2 proof for (k, m) from q. In the pseudocode, the three stages are denoted using states that p goes through: WaitForSender, WaitForL1Proof, and WaitForL2Proof. In the first stage, WaitForSender, p reads q's Value[q, k, q] slot. If p finds a (k, m) pair, p signs and copies it to its Value[p, k, q] slot and enters the WaitForL1Proof state. In the second stage, WaitForL1Proof, p reads all Value[i, k, q] slots, for i ∈ Π. If all the values p reads are correctly signed and equal to (k, m), and if there are at least f + 1 such values, then p compiles them into an L1 proof, which it signs and writes to L1Proof[p, k, q]; p then enters the WaitForL2Proof state. In the third stage, WaitForL2Proof, p reads all L1Proof[i, k, q] slots, for i ∈ Π. If p finds at least f + 1 valid and signed L1 proofs for (k, m), then p compiles them into an L2 proof, which it signs and writes to L2Proof[p, k, q]. The next time that p scans the L2Proof[·, k, q] slots, p will see its own L2 proof (or some other valid proof for (k, m)) and deliver (k, m).

This three-stage validation process ensures the following crucial property: no two valid L2

²The indexing of the slots is as follows: the first index is the writer of the SWMR register, the second index is the sequence number of the message, and the third index is the sender of the message.

Algorithm 8.2: Non-Equivocating Broadcast

1 SWMR Value[n,M,n];   // Initially, Value[j,k,i] = ⊥.
2 SWMR L1Proof[n,M,n]; // Initially, L1Proof[j,k,i] = ⊥.
3 SWMR L2Proof[n,M,n]; // Initially, L2Proof[j,k,i] = ⊥.

5 Code for process p
6 last[n];  // last index delivered from each process. Initially, last[q]=0
7 state[n]; // ∈ {WaitForSender, WaitForL1Proof, WaitForL2Proof}. Initially, state[q]=WaitForSender

9 void broadcast(k,m) {
10   Write(Value[p,k,p], sign((k,m))); }

12 for q in Π in parallel {
13   while true {
14     try_deliver(q); }} }

16 void try_deliver(q) {
17   k = last[q];
18   val = checkL2Proof(q,k);
19   if (val != null) {
20     deliver(k, val, q);
21     last[q] += 1;
22     state = WaitForSender;
23     return; }
24   my_val = Read(Value[p,k,q]);

26   if (state == WaitForSender) {
27     my_val = Read(Value[q,k,q]);
28     if (my_val == ⊥ || !sValid(q, my_val) || my_val.key != k) { return; }
29     Write(Value[p,k,q], sign(my_val));
30     state = WaitForL1Proof; }

32   if (state == WaitForL1Proof) {
33     checkedVals = ∅;
34     for i ∈ Π {
35       v = Read(Value[i,k,q]);
36       if (validateValue(v,my_val,k,q)) { checkedVals.add((i,v)); } }

38     if (size(checkedVals) > n/2) {
39       l1prf = sign(checkedVals);
40       Write(L1Proof[p,k,q], l1prf);
41       state = WaitForL2Proof; } }

43   if (state == WaitForL2Proof) {
44     checkedL1Prfs = ∅;
45     for i in Π {
46       prf = Read(L1Proof[i,k,q]);
47       if (validateL1Prf(prf,my_val,k,q)) { checkedL1Prfs.add((i,prf)); } }

49     if (size(checkedL1Prfs) > n/2) {
50       l2prf = sign(checkedL1Prfs);
51       Write(L2Proof[p,k,q], l2prf); } } }

Algorithm 8.3: Helper functions for the Non-Equivocating Broadcast algorithm

52 value checkL2proof(q,k) {
53   for i ∈ Π {
54     proof = Read(L2Proof[i,k,q]);
55     if (proof != ⊥) {
56       val = first value in proof; }
57     if (proof != ⊥ && validateL2Prf(proof,val,k,q)) {
58       Write(L2Proof[p,k,q], proof);
59       return val; } }
60   return null; }

62 bool validateValue(v, val, k, q) {
63   if (v == val && sValid(q,v) && v.key == k) {
64     return true; }
65   return false; }

67 bool validateL1Prf(proof, val, k, q) {
68   if (size(proof) <= n/2) { return false; }
69   for each (i,(v, s)) ∈ proof {
70     if (!validateValue(v,val,k,q) || !sValid(i, (v,s))) {
71       return false; } }
72   return true; }

74 bool validateL2Prf(proof, val, k, q) {
75   if (size(proof) <= n/2) { return false; }
76   for each (i,(l1prf,s)) ∈ proof {
77     if (!validateL1Prf(l1prf,val,k,q) || !sValid(i, (l1prf, s))) {
78       return false; } }
79   return true; }

proofs can support different values. Intuitively, this property is achieved because for both L1 and L2 proofs, at least f + 1 values of the previous stage must be copied, meaning that at least one correct process was involved in the quorum needed to construct each proof. Because correct processes read the slots of all others at each stage before constructing the next proof, and because they never overwrite or delete values that they already wrote, it is guaranteed that no two correct processes will create valid L1 proofs for different values, since one must see the Value slot of the other. Thus, no process, Byzantine or otherwise, can construct valid L2 proofs for two different values.

Notably, a weaker version of non-equivocating broadcast, which does not require Property 4, can be solved with just the first stage of Algorithm 8.2, without the L1 and L2 proofs. The purpose of those proofs is to ensure that the 4th property holds; that is, to enable all correct processes to deliver a value once some correct process has delivered.
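The quorum-intersection arithmetic behind this argument can be checked mechanically; the following small Python sanity check (illustrative only, not part of the algorithm) verifies it for a range of values of f.

```python
# With n = 2f + 1 processes, any two sets of f + 1 signers intersect in at
# least one process (pigeonhole: |A ∩ B| >= |A| + |B| - n), and a set of
# f + 1 signers is larger than the number of Byzantine processes, so the
# intersection contains a correct process.
for f in range(1, 1000):
    n = 2 * f + 1
    quorum = f + 1
    assert 2 * quorum - n >= 1   # two quorums always intersect
    assert quorum > f            # a quorum cannot be entirely Byzantine
```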

Correctness of Non-Equivocating Broadcast. We now prove the following key lemma: Lemma 8.3.2. Non-equivocating broadcast can be solved in shared-memory with SWMR regular registers with n ≥ 2f + 1 processes. We begin the proof with a couple of useful observations. Observation 8.3.3. In Algorithm 8.2, if p is a correct process, then no slot that belongs to p is written to more than once.

Proof. Since p is correct, p never writes on any slot more than once. Furthermore, since all slots are single-writer registers, no other process can write on these slots. Observation 8.3.4. In Algorithm 8.2, correct processes invoke and return from try_deliver(q) infinitely often, for all q ∈ Π.

Proof. The try_deliver() function does not contain any blocking steps, loops, or goto statements. Thus, if a correct process invokes try_deliver(), it will eventually return. Therefore, for a fixed q, the infinite loop at line 14 will invoke and return from try_deliver(q) infinitely often. Since the parallel for loop at line 12 performs the infinite loop in parallel for each q ∈ Π, the observation holds.

Proof of Lemma 8.3.2. We prove the lemma by showing that Algorithm 8.2 correctly implements non-equivocating broadcast. That is, we need to show that Algorithm 8.2 satisfies the four properties of non-equivocating broadcast.

Property 1. Let p be a correct process that broadcasts (k, m). We show that all correct processes eventually deliver (k, m) from p. Assume by contradiction that there exists some correct process q which does not deliver (k, m). Furthermore, assume without loss of generality that k is the smallest key for which Property 1 is broken. That is, all correct processes eventually deliver all messages (k′, m′) from p, for k′ < k. Thus, all correct processes must eventually increment last[p] to k. We consider two cases, depending on whether or not some process eventually writes an L2 proof for some (k, m′) message from p in its L2Proof slot.

First consider the case where no process ever writes an L2 proof of any value (k, m′) from p. Since p is correct, upon broadcasting (k, m), p must sign and write (k, m) into Value[p,k,p]

at line 10. By Observation 8.3.3, (k, m) will remain in that slot forever. Because of this, and because there is no L2 proof, all correct processes, after reaching last[p] = k, will eventually read (k, m) in line 27, write it into their own Value slot in line 29, and change their state to WaitForL1Proof. Furthermore, since p is correct and we assume signatures are unforgeable, no process q can write any other valid value (k′, m′) ≠ (k, m) into its Value[q, k, p] slot. Thus, eventually each correct process q will add at least f + 1 copies of (k, m) to its checkedVals, write an L1 proof consisting of these values into L1Proof[q,k,p] in line 40, and change its state to WaitForL2Proof. Therefore, all correct processes will eventually read at least f + 1 valid L1 proofs for (k, m) in line 46 and construct and write valid L2 proofs for (k, m). This contradicts the assumption that no L2 proof ever gets written.

In the case where there is some L2 proof, by the argument above, the only value it can prove is (k, m). Therefore, all correct processes will eventually see at least one valid L2 proof and deliver. This contradicts our assumption that q is correct but does not deliver (k, m) from p.

Property 2. We now prove the second property of non-equivocating broadcast. Let p and p′ be any two correct processes, and q be some process, such that p delivers (k, m) from q and p′ delivers (k, m′) from q. Assume by contradiction that m ≠ m′. Since p and p′ are correct, they must have seen valid L2 proofs at line 54 before delivering (k, m) and (k, m′) respectively. Let P and P′ be those valid proofs for (k, m) and (k, m′) respectively. P (resp. P′) consists of at least f + 1 valid L1 proofs; therefore, at least one of those proofs was created by some correct process r (resp. r′). Since r (resp. r′) is correct, it must have written (k, m) (resp. (k, m′)) to its Value slot in line 29. Note that after copying a value to their slot, in the WaitForL1Proof state, correct processes read all Value slots at line 35. Thus, both r and r′ read all Value slots before compiling their L1 proofs for (k, m) (resp. (k, m′)). Assume without loss of generality that r wrote (k, m) before r′ wrote (k, m′); by Observation 8.3.3, it must then be the case that r′ later saw both (k, m) and (k, m′) when it read all Value slots (line 35). Since r′ is correct, it cannot have then compiled an L1 proof for (k, m′) (the check at line 38 would have failed). We have reached a contradiction.

Property 3. We show that if a correct process p delivers (k, m) from a correct process p′, then p′ broadcast (k, m). Correct processes only deliver values for which a valid L2 proof exists (lines 54 and 18–20). Therefore, p must have seen a valid L2 proof P for (k, m). P consists of at least f + 1 L1 proofs for (k, m), and each L1 proof consists of at least f + 1 matching copies of (k, m), signed by p′. Since p′ is correct and we assume signatures are unforgeable, p′ must have broadcast (k, m) (otherwise p′ would not have attached its signature to (k, m)).

Property 4. Let p be a correct process such that p delivers (k, m) from q. We show that all correct processes must deliver (k, m′) from q, for some m′. By construction of the algorithm, if p delivers (k, m) from q, then for all i < k there exists mi such that p delivered (i, mi) from q before delivering (k, m) (this is because p can only deliver (k, m) if last[q] = k, and last[q] is only incremented to k after p delivers (k − 1, mk−1)).
Assume by contradiction that there exists some correct process r which does not deliver (k, m′) from q, for any m′. Further assume without loss of generality that k is the smallest key for which r does not deliver any message from q. Thus, r must have delivered (i, mi) from q for all i < k; thus, r must have incremented last[q] to k. Since r never delivers any message from q for key k, r's last[q] will never increase past k.

Algorithm 8.4: Validate Operation for Non-Equivocating Broadcast

1 bool validate(q,k,m) {
2   val = checkL2proof(q,k);
3   if (val == m) {
4     return true; }
5   return false; }

Since p delivers (k, m) from q, p must have written a valid L2 proof P of (k, m) in its L2Proof slot at line 51 or 58. By Observation 8.3.3, P will remain in p's L2Proof[p,k,q] slot forever. Thus, at least one of the slots L2Proof[·, k, q] will forever contain a valid L2 proof. Since r's last[q] eventually reaches k and never increases past k, r will eventually (by Observation 8.3.4) see a valid L2 proof at line 54 and deliver a message for key k from q. We have reached a contradiction.

Applying Clement et al.'s Construction. Clement et al. show that given unforgeable transferable signatures and non-equivocation, one can reduce Byzantine failures to crash failures in message-passing systems [85]. They define non-equivocation as a predicate validp for each process p, which takes a sequence number and a value and evaluates to true for just one value per sequence number. All processes must be able to call the same validp predicate, and it must terminate every time it is called.

We now show how to use non-equivocating broadcast to implement messages with transferable signatures and non-equivocation as defined by Clement et al. [85]. Note that our non-equivocating broadcast mechanism already involves the use of transferable signatures, so to send and receive signed messages, one can simply broadcast and deliver those messages. However, simply using broadcast and deliver is not quite enough to satisfy the requirements of the validp predicate of Clement et al. The problem occurs when trying to validate nested messages recursively. In particular, recall that in Clement et al.'s construction, whenever a message is sent, the entire history of that process, including all messages it has sent and received, is attached. Consider two Byzantine processes q1 and q2, and assume that q1 attempts to equivocate in its kth message, signing both (k, m) and (k, m′). Assume therefore that no correct process delivers any message from q1 in its kth round. However, since q2 is also Byzantine, it could claim to have delivered (k, m) from q1. If q2 then sends a message that includes (q1, k, m) as part of its history, a correct process p receiving q2's message must recursively verify the history q2 sent. To do so, p can call try_deliver on (q1, k). However, since no correct process delivered any message from (q1, k), it is possible that this call never returns.

To solve this issue, we introduce a validate operation that can be used along with broadcast and deliver to validate the correctness of a given message. The validate operation is very simple: it takes in a process id, a sequence number, and a message value m, and simply runs the checkL2proof helper function. If the function returns a proof supporting m, validate returns true. Otherwise it returns false. The pseudocode is shown in Algorithm 8.4.

In this way, Algorithms 8.2 and 8.4 together provide signed messages and a non-equivocation primitive. Thus, combined with the construction of Clement et al. [85], we immediately get the following result.

Algorithm 8.5: T-send and T-receive (due to Clement et al. [85]) with non-equivocating broadcast.

1 Local variables for each process:
2 int k = 0
3 history H = []
4 T-send(m) {
5   k++
6   broadcast(k,(m,H))
7   Append "sent(k,(m,H))" to H }

9 Upon deliver(k,(m,H),p):
10   Check whether all messages in H are properly signed, and whether they correspond to a correct history of the algorithm
11   if so,
12     T-receive(m,p)
13     add "received(sign(k,(m,H), p))" to H

Theorem 8.3.5. There exists an algorithm for weak Byzantine agreement in a shared-memory system with SWMR regular registers, signatures, and up to fP Byzantine processes, where n ≥ 2fP + 1.

In particular, we can implement weak Byzantine agreement by taking any correct consensus algorithm A for the classic crash-only message-passing model, and replacing all its sends and receives by non-equivocating broadcast and deliver (respectively) that also attach a process's entire execution history to each message. We call this method of communication trusted sends and receives, or simply T-send and T-receive primitives. Clement et al. [85] show that implementing such T-send and T-receive primitives with non-equivocation and signatures yields a Byzantine-tolerant replacement for classic sends and receives. Algorithm 8.5 shows how to use non-equivocating broadcast to implement T-send and T-receive. See Clement et al. [85] for more details.

Non-Equivocation in Our Model. To convert the above algorithm to our model, where memory may fail, we use the ideas in [6, 29, 167] to implement failure-free SWMR regular registers from the fail-prone memories, and then run weak Byzantine agreement using those regular registers. To implement an SWMR register, a process writes to or reads from all memories, and waits for a majority to respond. When reading, if p sees exactly one distinct non-⊥ value v across the memories, it returns v; otherwise, it returns ⊥.

Definition 8.3.6. Let A be a message-passing algorithm. Robust Backup(A) is the algorithm A in which all send and receive operations are replaced by T-send and T-receive operations (respectively) implemented with non-equivocating broadcast.

Thus we get the following lemma, from the result of Clement et al. [85], Lemma 8.3.2, and the above handling of memory failures.

Lemma 8.3.7. If A is a consensus algorithm that is tolerant to fP process crash failures, then Robust Backup(A) is a weak Byzantine agreement algorithm that is tolerant to up to fP Byzantine processes and fM memory crashes, where n ≥ 2fP + 1 and m ≥ 2fM + 1, in the message-and-memory model.
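The majority-based register emulation described above can be sketched concretely. The following simplified Python illustration assumes a synchronous call interface to each memory replica in which a crashed memory raises an exception; the Memory interface and the exception type are placeholders, and real accesses would be issued in parallel over RDMA.

```python
# Sketch of the SWMR regular-register emulation over 2fM + 1 fail-prone
# memories: an operation completes once a majority of replicas respond.
# A reader returns v only if it sees exactly one distinct non-⊥ value.

BOTTOM = None  # represents the initial value ⊥

def swmr_write(memories, addr, value):
    acks = 0
    for mem in memories:               # in practice: issued in parallel
        try:
            mem.write(addr, value)     # raises if this memory has crashed
            acks += 1
        except ConnectionError:
            continue
        if acks > len(memories) // 2:  # majority reached: write is complete
            return

def swmr_read(memories, addr):
    seen, responses = set(), 0
    for mem in memories:
        try:
            v = mem.read(addr)
            responses += 1
        except ConnectionError:
            continue
        if v is not BOTTOM:
            seen.add(v)
        if responses > len(memories) // 2:
            break
    # exactly one distinct non-⊥ value across a majority: safe to return it
    return seen.pop() if len(seen) == 1 else BOTTOM
```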

Algorithm 8.6: Cheap Quorum normal operation—code for process p

1 Leader code
2 propose(v) {
3   sign(v);
4   status = Value[ℓ].write(v);
5   if (status == nak) Panic_mode();
6   else decide(v); }

8 Follower code
9 propose(w) {
10   do { v = read(Value[ℓ]);
11        for all q ∈ Π do pan[q] = read(Panic[q]);
12   } until (v ≠ ⊥ || pan[q] == true for some q || timeout);
13   if (v ≠ ⊥ && sValid(p1,v)) {
14     sign(v);
15     write(Value[p],v);
16     do { for all q ∈ Π do val[q] = read(Value[q]);
17          if |{q : val[q] == v}| ≥ n then {
18            Proof[p].write(sign(val[1..n]));
19            for all q ∈ Π do prf[q] = read(Proof[q]);
20            if |{q : verifyProof(prf[q]) == true}| ≥ n { decide(v); exit; } }
21          for all q ∈ Π do pan[q] = read(Panic[q]);
22   } until (pan[q] == true for some q || timeout); }
23   Panic_mode(); }

The following theorem is an immediate corollary of the lemma.

Theorem 8.3.8. There exists an algorithm for Weak Byzantine Agreement in a message-and-memory model with up to fP Byzantine processes and fM memory crashes, where n ≥ 2fP + 1 and m ≥ 2fM + 1.

8.3.2 The Cheap Quorum Sub-Algorithm

We now give an algorithm that decides in two delays in common executions in which the system is synchronous and there are no failures. It requires only one signature for a fast decision, whereas the best prior algorithm requires 6fP + 2 signatures and n ≥ 3fP + 1 [31]. Our algorithm, called Cheap Quorum, is not in itself a complete consensus algorithm; it may abort in some executions. If Cheap Quorum aborts, it outputs an abort value, which is used to initialize Robust Backup so that their composition preserves weak Byzantine agreement. This composition is inspired by the Abstract framework of Aublin et al. [31].

The algorithm has a special process ℓ, say ℓ = p1, which serves both as a leader and a follower. Other processes act only as followers. The memory is partitioned into n + 1 regions: Region[p] for each p ∈ Π, plus an extra region for p1, Region[ℓ], in which it proposes a value. Initially, Region[p] is a regular SWMR region where p is the writer. Unlike in Algorithm 8.2, some of the permissions are dynamic; processes may remove p1's write permission to Region[ℓ]

Algorithm 8.7: Cheap Quorum panic mode—code for process p

1 panic_mode() {
2   Panic[p] = true;
3   changePermission(Region[ℓ], R: Π, W: {}, RW: {}); // remove write permission
4   v = read(Value[p]);
5   prf = read(Proof[p]);
6   if (v ≠ ⊥) { Abort with ⟨v, prf⟩; return; }
7   LVal = read(Value[ℓ]);
8   if (LVal ≠ ⊥) { Abort with ⟨LVal, ⊥⟩; return; }
9   Abort with ⟨myInput, ⊥⟩; }

(i.e., the legalChange function returns false to any permission change request, except for ones revoking p1's permission to write on Region[ℓ]). Processes initially execute under a normal mode in common-case executions, but may switch to panic mode if they intend to abort, as in [31]. The pseudo-code of the normal mode is in Algorithm 8.6. Region[p] contains three registers Value[p], Panic[p], and Proof[p], initially set to ⊥, false, and ⊥, respectively.

To propose v, the leader p1 signs v and writes it to Value[ℓ]. If the write is successful (it may fail because its write permission was removed), then p1 decides v; otherwise p1 calls Panic_mode(). Note that all processes, including p1, continue their execution after deciding. However, p1 never decides again if it decided as the leader. A follower q checks if p1 wrote to Value[ℓ] and, if so, whether the value is properly signed. If so, q signs v, writes it to Value[q], and waits for other processes to write the same value to Value[∗]. If q sees 2f + 1 copies of v signed by different processes, q assembles these copies in a unanimity proof, which it signs and writes to Proof[q]. q then waits for 2f + 1 unanimity proofs for v to appear in Proof[∗], and checks that they are valid, in which case q decides v. This waiting continues until a timeout expires,³ at which time q calls Panic_mode().

In Panic_mode(), a process p sets Panic[p] to true to tell other processes it is panicking; other processes periodically check to see if they should panic too. p then removes the write permission from Region[ℓ], and decides on a value to abort with: either Value[p] if it is non-⊥, Value[ℓ] if it is non-⊥, or p's input value. If p has a unanimity proof in Proof[p], it adds it to the abort value.

The above construction assumes a fail-free memory with regular registers, but we can extend it to tolerate memory failures using the approach discussed for making non-equivocating broadcast work in our model, noting that each register has a single writer process.

Correctness of Cheap Quorum. We prove that Cheap Quorum satisfies certain useful properties that will help us show that it composes with Preferential Paxos to form a correct weak Byzantine agreement protocol. For the proofs, we first formalize some terminology. We say that a process proposed a value v by time t if it successfully executed line 4 by time t; that is, p received the response ack in line 4 by t. When a process aborts, note that it outputs a tuple. We say that the first element of its tuple is its abort value, and the second is its abort proof. We sometimes say

³The timeout is chosen to be an upper bound on the communication, processing, and computation delays in the common case.

that a process p aborts with value v and proof pr, meaning that p outputs (v, pr) in its abort. Furthermore, the value in a process p's Proof region is called a correct unanimity proof if it contains n copies of the same value, each correctly signed by a different process.

Observation 8.3.9. In Cheap Quorum, no value written by a correct process is ever overwritten.

Proof. By inspecting the code, we can see that the correct behavior is for processes to never overwrite any values. Furthermore, since all regions are initially single-writer, and the legalChange function never allows another process to acquire write permission on a region that it cannot write to initially, no other process can overwrite these values.
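As a concrete illustration of the "correct unanimity proof" check used in these proofs, the following small Python sketch spells it out; the proof format (a list of signer/value/signature triples) and the verify callable are illustrative assumptions, not the thesis's pseudocode.

```python
# Sketch of the "correct unanimity proof" check: n copies of the same value,
# each correctly signed by a different process. The triple format and the
# verify(signer, value, signature) callable are illustrative assumptions.

def is_correct_unanimity_proof(proof, n, verify):
    if proof is None or len(proof) < n:
        return False
    signers = {p for (p, _, _) in proof}
    values = {v for (_, v, _) in proof}
    return (len(signers) == n and len(values) == 1
            and all(verify(p, v, s) for (p, v, s) in proof))
```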

Lemma 8.3.10 (Cheap Quorum Validity). In Cheap Quorum, if there are no faulty processes and some process decides v, then v is the input of some process.

Proof. If p = p1, the lemma is trivially true, because p1 can only decide on its input value. If p 6= p1, p can only decide on a value v if it read that value from the leader’s region. Since only the leader can write to its region, it follows that p can only decide on a value that was proposed by the leader (p1).

Lemma 8.3.11 (Cheap Quorum Termination). If a correct process p proposes some value, every correct process q will decide a value or abort.

Proof. Clearly, if q = p1 proposes a value, then q decides. Now let q 6= p1 be a correct follower and assume p1 is a correct leader that proposes v. Since p1 proposed v, p1 was able to write v in the leader region, where v remains forever by Observation 8.3.9. Clearly, if q eventually enters panic mode, then it eventually aborts; there is no waiting done in panic mode. If q never enters panic mode, then q eventually sees v on the leader region and eventually finds 2f + 1 copies of v on the regions of other followers (otherwise q would enter panic mode). Thus q eventually decides v.

Lemma 8.3.12 (Cheap Quorum Progress). If the system is synchronous and all processes are correct, then no correct process aborts in Cheap Quorum.

Proof. Assume the contrary: there exists an execution in which the system is synchronous and all processes are correct, yet some process aborts. Processes can only abort after entering panic mode, so let t be the first time when a process enters panic mode and let p be that process. Since p cannot have seen any other process declare panic, p must have either timed out at line 12 or 22, or its checks failed on line 13. However, since the entire system is synchronous and p is correct, p could not have panicked because of a timeout at line 12. So, p1 must have written its value v, correctly signed, to p1's region at a time t′ < t. Therefore, p also could not have panicked by failing its checks on line 13. Finally, since all processes are correct and the system is synchronous, all processes must have seen p1's value and copied it to their slot. Thus, p must have seen these values and decided v at line 20, contradicting the assumption that p entered panic mode.

Lemma 8.3.13 (Cheap Quorum Decision Agreement). Let p and q be correct processes. If p decides v1 while q decides v2, then v1 = v2.

Proof. Assume the property does not hold: p decided some value v1 and q decided some different value v2. Since p decided v1, p must have seen a copy of v1 at 2fP + 1 replicas, including q. But then q cannot have decided v2, because by Observation 8.3.9, v1 never gets overwritten in q's region, and by the code, q can only decide a value written in its region.

Lemma 8.3.14 (Cheap Quorum Abort Agreement). Let p and q be correct processes (possibly identical). If p decides v in Cheap Quorum while q aborts from Cheap Quorum, then v will be q's abort value. Furthermore, if p is a follower, q's abort proof is a correct unanimity proof.

Proof. If p = q, the property follows immediately, because of lines 4 through 6 of panic mode. If p ≠ q, we consider two cases:
• If p is a follower, then for p to decide, all processes, and in particular q, must have replicated both v and a correct proof of unanimity before p decided. Therefore, by Observation 8.3.9, v and the unanimity proof are still there when q executes the panic code in lines 4 through 6. Therefore q will abort with v as its value and a correct unanimity proof as its abort proof.
• If p is the leader, then first note that since p is correct, by Observation 8.3.9 v remains the value written in the leader's Value region. There are two cases. If q has replicated a value into its Value region, then it must have read it from Value[p1], and therefore it must be v. Again by Observation 8.3.9, v must still be the value written in q's Value region when q executes the panic code. Therefore q aborts with value v. Otherwise, if q has not replicated a value, then q's Value region must be empty at the time of the panic, since the legalChange function disallows other processes from writing on that region. Therefore q reads v from Value[p1] and aborts with v.

Lemma 8.3.15. Cheap Quorum is 2-deciding.

Proof. Consider an execution in which every process is correct and the system is synchronous. Then no process will enter panic mode (by Lemma 8.3.12), and thus p1 will not have its permission revoked. p1 will therefore be able to write its input value to p1's region and decide after this single write (2 delays).

8.3.3 Putting it Together: the Cheap & Robust Algorithm

The final algorithm, called Cheap & Robust, combines Cheap Quorum (Section 8.3.2) and Robust Backup (Section 8.3.1), as we now explain. Recall that Robust Backup is parameterized by a message-passing consensus algorithm A that tolerates crash failures. A can be any such algorithm (e.g., Paxos). Roughly, in Cheap & Robust, we run Cheap Quorum and, if it aborts, we use a process's abort value as its input value to Robust Backup.

However, we must carefully glue the two algorithms together to ensure that if some correct process decided v in Cheap Quorum, then v is the only value that can be decided in Robust Backup. For this purpose, we propose a simple wrapper for Robust Backup, called Preferential Paxos. Preferential Paxos first runs a set-up phase, in which processes may adopt new values, and then runs Robust Backup with the new values. More specifically, there are some preferred input values


Figure 8.8: Interactions of the components of the Cheap & Robust Algorithm.

v1, . . . , vk, ordered by priority. We guarantee that every process adopts one of the top f + 1 priority inputs. In particular, this means that if a majority of processes get the highest priority value, v1, as input, then v1 is guaranteed to be the decision value. The set-up phase is simple: all processes send each other their input values. Each process p waits to receive n − f such messages, and adopts the value with the highest priority that it sees. This is the value that p uses as its input to Paxos. Algorithm 8.9 gives the pseudocode of Preferential Paxos. Recall that T-send and T-receive are the trusted message-passing primitives that are implemented in [85] using non-equivocating broadcast and signatures.

Algorithm 8.9: Preferential Paxos—code for process p

1 propose((v, priorityTag)) {
2   T-send(v, priorityTag) to all;
3   Wait to T-receive (val, priorityTag) from n − fP processes;
4   best = value with highest priority out of messages received;
5   RobustBackup(Paxos).propose(best); }

Correctness of Preferential Paxos. Lemma 8.3.16 (Preferential Paxos Priority Decision). Preferential Paxos implements weak Byzantine agreement with n ≥ 2fP + 1 processes. Furthermore, let v1, . . . , vn be the input values of an instance C of Preferential Paxos, ordered by priority. The decision value of correct processes is always one of v1, . . . , vfP+1.

Proof. By Lemma 8.3.7, Robust Backup(Paxos) solves weak Byzantine agreement with n ≥ 2fP + 1 processes. Note that before calling Robust Backup(Paxos), each process may change its input, but only to the input of another process. Thus, by the correctness and fault tolerance of Paxos, Preferential Paxos clearly solves weak Byzantine agreement with n ≥ 2fP + 1 processes. Thus we only need to show that Preferential Paxos satisfies the priority decision property with 2fP + 1 processes that may only fail by crashing. Since Robust Backup(Paxos) satisfies validity, if all processes call Robust Backup(Paxos) in line 5 with a value v that is one of the fP + 1 top priority values (that is, v ∈ {v1, . . . , vfP+1}),

then the decision of correct processes will also be in {v1, . . . , vfP+1}. So we just need to show that every process indeed adopts one of the top fP + 1 values. Note that each process p waits to see n − fP values, and then picks the highest priority value that it saw. No process can lie or pick a different value, since we use T-send and T-receive throughout. Thus, p can miss at most fP values that are higher priority than the one that it adopts.

Cheap & Robust. We can now describe Cheap & Robust in detail. We start executing Cheap Quorum. If Cheap Quorum aborts, we execute Preferential Paxos, with each process receiving its abort value from Cheap Quorum as its input value to Preferential Paxos. We define the priorities of inputs to Preferential Paxos as follows; a short illustrative sketch follows the definition.

Definition 8.3.17 (Input Priorities for Preferential Paxos). The input values for Preferential Paxos as it is used in Cheap & Robust are split into three sets (here, p1 is the leader of Cheap Quorum):
• T = {v | v contains a correct unanimity proof}

• M = {v | v ∉ T ∧ v contains the signature of p1}
• B = {v | v ∉ T ∧ v ∉ M}

The priority order of the input values is such that for all values vT ∈ T, vM ∈ M, and vB ∈ B, priority(vT) > priority(vM) > priority(vB).

Figure 8.8 shows how the various algorithms presented in this section come together to form the Cheap & Robust algorithm.
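The following Python sketch illustrates this classification and the adoption rule of the set-up phase. The predicate arguments stand in for the unanimity-proof and signature checks described earlier in the section and are passed in as callables, since their details are given elsewhere; the function names are illustrative, not the thesis's pseudocode.

```python
# Sketch of Definition 8.3.17: rank an abort value from Cheap Quorum into
# class T (correct unanimity proof), M (signed by the leader p1), or B.

def priority(abort_value, abort_proof, is_unanimity_proof, signed_by_leader):
    if abort_proof is not None and is_unanimity_proof(abort_proof):
        return 2          # class T: highest priority
    if signed_by_leader(abort_value):
        return 1          # class M
    return 0              # class B

def adopt(received, is_unanimity_proof, signed_by_leader):
    # Set-up phase of Preferential Paxos: among the n - fP (value, proof)
    # tuples received, adopt a value of maximal priority.
    best = max(received,
               key=lambda t: priority(t[0], t[1],
                                      is_unanimity_proof, signed_by_leader))
    return best[0]
```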

Correctness of Cheap & Robust. We now prove the following key composition property, which shows that the composition of Cheap Quorum and Preferential Paxos is safe.

Lemma 8.3.18 (Composition Lemma). If some correct process decides a value v in Cheap Quorum before an abort, then v is the only value that can be decided in Preferential Paxos with priorities as defined in Definition 8.3.17.

Proof. To prove this lemma, we mainly rely on two properties: Cheap Quorum Abort Agreement (Lemma 8.3.14) and Preferential Paxos Priority Decision (Lemma 8.3.16). We consider two cases.

Case 1. Some correct follower process p ≠ p1 decided v in Cheap Quorum. Then note that by Lemma 8.3.14, all correct processes aborted with value v and a correct unanimity proof. Since n ≥ 2f + 1, there are at least f + 1 correct processes. Note that by the way we assign priorities to inputs of Preferential Paxos in the composition of the two algorithms, all correct processes have inputs with the highest priority. Therefore, by Lemma 8.3.16, the only decision value possible in Preferential Paxos is v. Furthermore, note that by Lemma 8.3.13, if any other correct process decided in Cheap Quorum, that process's decision value was also v.

Case 2. Only the leader, p1, decides in Cheap Quorum, and p1 is correct. Then by Lemma 8.3.14, all correct processes aborted with value v. Since p1 is correct, v is signed by p1. It is possible that some of the processes also had a correct unanimity proof as their abort proof. However, note that in this scenario, all correct processes (at least f + 1 processes) had inputs with either the highest or second highest priorities, all with the same abort value. Therefore, by Lemma 8.3.16,

the decision value must have been the value of one of these inputs. Since all these inputs had the same value v, v must be the decision value of Preferential Paxos.

Theorem 8.3.19 (End-to-end Validity). In the Cheap & Robust algorithm, if there are no faulty processes and some process decides v, then v is the input of some process.

Proof. Note that by Lemmas 8.3.10 and 8.3.16, this holds for each of the algorithms individually. Furthermore, recall that the abort values of Cheap Quorum become the input values of Preferential Paxos, and the set-up phase does not invent new values. Therefore, we just have to show that if Cheap Quorum aborts, then all abort values are inputs of some process. Note that by the code in panic mode, if Cheap Quorum aborts, a process p can output an abort value from one of three sources: its own Value region, the leader's Value region, or its own input value. Clearly, if its abort value is its input, then we are done. Furthermore, note that a correct leader only writes its input in the Value region, and correct followers only write a copy of the leader's Value region in their own region. Since there are no faults, this means that only the input of the leader may be written in any Value region, and therefore all processes always abort with some process's input as their abort value.

Theorem 8.3.20 (End-to-end Agreement). In the Cheap & Robust algorithm, if p and q are correct processes such that p decides v1 and q decides v2, then v1 = v2.

Proof. First note that by Lemmas 8.3.13 and 8.3.16, each of the algorithms satisfies this individually. Thus Lemma 8.3.18 implies the theorem.

Theorem 8.3.21 (End-to-end Termination). In the Cheap & Robust algorithm, if some correct process is eventually the sole leader forever, then every correct process eventually decides.

Proof. Assume that some correct process p is eventually the sole leader forever, and let t be the time when p last becomes leader. Now consider some process q that has not decided before t. We consider several cases:
1. If q is executing Preferential Paxos at time t, then q will eventually decide, by termination of Preferential Paxos (Lemma 8.3.16).
2. If q is executing Cheap Quorum at time t, we distinguish two sub-cases:
(a) p is also executing as the leader of Cheap Quorum at time t. Then p will eventually propose a value, so q will either decide in Cheap Quorum or abort from Cheap Quorum (by Lemma 8.3.11) and decide in Preferential Paxos by Lemma 8.3.16.
(b) p is executing in Preferential Paxos. Then p must have panicked and aborted from Cheap Quorum. Thus, q will also abort from Cheap Quorum and decide in Preferential Paxos by Lemma 8.3.16.

Note that to strengthen Theorem 8.3.21 to the general termination property stated in our model, we require the additional standard assumption [193] that some correct process p is eventually the sole leader forever. In practice, however, p does not need to be the sole leader forever, but rather only for long enough that all correct processes decide.

Theorem 8.3.22. There exists a 2-deciding algorithm for Weak Byzantine Agreement in a message-and-memory model with up to fP Byzantine processes and fM memory crashes, where n ≥ 2fP + 1 and m ≥ 2fM + 1.

8.3.4 Impossibility of Strong Agreement

In this chapter and in this section so far, the validity requirement for Byzantine agreement is weak; that is, if there is even one Byzantine fault, the processes may agree on an arbitrary value. Another notion of validity, known as strong validity, exists in the literature [197]:

Definition 8.3.23 (Strong Validity). If some process decided the value v, then v was the input of some correct process.

The Strong Byzantine Agreement problem requires an algorithm to satisfy Agreement and Termination as defined in Section 6.2, and Strong Validity. In this subsection, we show that it is impossible to solve Strong Byzantine Agreement in an RDMA system with n ≤ 3f. Thus, our solution for the Weak Byzantine Agreement problem cannot be strengthened to Strong Agreement without sacrificing the fault tolerance that RDMA provides.

Theorem 8.3.24. Strong Byzantine Agreement cannot be solved in a permissions M&M network where n ≤ 3f.

Proof. Assume by contradiction that there exists an algorithm 𝒜 that solves strong Byzantine agreement with n ≤ 3f in this setting. We partition the set of processes into three subsets, A, B, and C, each of size at most f. Consider the following scenarios.

Scenario 1. All processes in A and B are correct and have input 1. Processes in C are faulty, but act as if they are correct and have input 0. Processes in B sleep at the beginning of the execution, not taking any steps. We run 𝒜 until the processes in A decide some value v. Then, the processes in B wake up and run 𝒜. Since 𝒜 is a correct strong Byzantine agreement algorithm, the processes in B eventually also decide the value v. Furthermore, since all correct processes (those in the sets A and B) have the same input value 1, v = 1.

Scenario 2. All processes in C and B are correct and have input 0. Processes in A are faulty, but act as if they are correct and have input 1. Processes in B sleep at the beginning of the execution, not taking any steps. We run 𝒜 until the processes in C decide some value v′. Then, the processes in B wake up and run 𝒜. Since 𝒜 is a correct strong Byzantine agreement algorithm, the processes in B eventually also decide the value v′. Furthermore, since all correct processes (those in the sets C and B) have the same input value 0, v′ = 0.

Scenario 3. All processes in A and C are correct. Processes in A have input value 1, and processes in C have input value 0. Processes in B are faulty and never take any steps. Since 𝒜 is a correct strong Byzantine agreement algorithm, the processes in A and C eventually decide the same value v″.

Note that Scenario 1 and Scenario 3 are indistinguishable to the processes in A. Thus, since in Scenario 1 the decision value is 1, there must be an execution of Scenario 3 in which the decision value is 1. Furthermore, Scenario 2 and Scenario 3 are indistinguishable to the processes in C, and thus there must be an execution of Scenario 3 in which the decision value is 0. Consider an execution of Scenario 3 that decides 0. Run this execution in Scenario 1. Since Scenario 1 is indistinguishable to the processes in A from Scenario 3, the processes in A decide 0 in

this execution. However, this violates strong validity, since in Scenario 1, all correct processes had input 1.

We note that the proof in fact applies more generally than to just the RDMA model. A similar proof, that considers shared memory objects, appears in [224].

8.4 Crash-Tolerant Consensus

We now restrict ourselves to crash failures of processes and memories. Clearly, we can use the algorithms of Section 8.3 in this setting to obtain a 2-deciding consensus algorithm with n ≥ 2fP + 1 and m ≥ 2fM + 1. However, this is overkill, since those algorithms use sophisticated mechanisms (signatures, non-equivocation) to guard against Byzantine behavior. With only crash failures, we now show that it is possible to retain the efficiency of a 2-deciding algorithm while improving resiliency. In Section 8.4.1, we first give a 2-deciding algorithm that tolerates the crash of all but one process (n ≥ fP + 1) and a minority of memories (m ≥ 2fM + 1). In Section 8.4.2, we improve resilience further by giving a 2-deciding algorithm that tolerates crashes of a minority of the combined set of memories and processes.

8.4.1 Protected Memory Paxos

Our starting point is the Disk Paxos algorithm [126], which works in a system with processes and memories where n ≥ fP + 1 and m ≥ 2fM + 1. This is our resiliency goal, but Disk Paxos takes four delays in common executions. Our new algorithm, called Protected Memory Paxos, removes two delays; it retains the structure of Disk Paxos but uses permissions to skip steps. Initially, some fixed leader ℓ = p1 has exclusive write permission to all memories; if another process becomes leader, it takes the exclusive permission. Having exclusive permission allows a leader ℓ to optimize its execution, because ℓ can do two things simultaneously: (1) write its consensus proposal and (2) determine whether another leader took over. Specifically, if ℓ succeeds in (1), it knows no leader ℓ′ took over, because ℓ′ would have taken the permission. Thus ℓ avoids the last read in Disk Paxos, saving two delays. Of course, care must be taken to implement this without violating safety.

The pseudocode of Protected Memory Paxos is in Algorithm 8.10. Each memory has one memory region, and at any time exactly one process can write to the region. Each memory i holds a register slot[i, p] for each process p. Intuitively, slot[i, p] is intended for p to write, but p may not have write permission to do that if it is not the leader; in that case, no process writes slot[i, p]. When a process p becomes the leader, it must execute a sequence of steps on a majority of the memories to successfully commit a value. It is important that p execute all of these steps on each of the memories that counts toward its majority; otherwise two leaders could miss each other's values and commit conflicting values. We therefore present the pseudocode for this algorithm in a parallel-for loop (lines 18–37), with one thread per memory that p accesses. The algorithm has two phases similar to Paxos, where the second phase may only begin after the first phase has

Algorithm 8.10: Protected Memory Paxos—code for process p

1 Registers: for i=1..m, p ∈ Π,
2   slot[i,p]: tuple (minProp, accProp, value) // in memory i
3 Ω: failure detector that returns current leader

5 startPhase2(i) {
6   add i to ListOfReady processes;
7   while (size(ListOfReady) ≤ m/2) {}  // barrier: wait for a majority of threads (reconstructed from the text's description)
8   Phase2Started = true; }             // line 8: phase 2 begins (referenced in the proof of Lemma 8.4.4)

10 propose(v) {
11   repeat forever {
12     wait until Ω == p; // wait to become leader
13     propNr = a higher value than any proposal number seen before;
14     CurrentVal = v;
15     CurrentMaxProp = 0;
16     Phase2Started = false;
17     ListOfReady = ∅;
18     for every memory i in parallel {
19       if (p != p1 || not first attempt) {
20         getPermission(i);
21         success = write(slot[i,p], (propNr,⊥,⊥));
22         if (not success) { abort(); }
23         vals = read all slots from i;
24         if (vals contains a non-null value) {
25           val = v ∈ vals with highest propNr;
26           if (val.propNr > propNr) { abort(); }
27           atomic {
28             if (val.propNr > CurrentMaxProp) {
29               if (Phase2Started) { abort(); }
30               CurrentVal = val.value;
31               CurrentMaxProp = val.propNr;
32           }}}
33         startPhase2(i); }
34       // done phase 1 or (p == p1 && p1's first attempt)
35       success = write(slot[i,p], (propNr,propNr,CurrentVal));
36       if (not success) { abort(); }
37     } until this has been done at a majority of the memories, or until abort has been called
38     if (loop completed without abort) {
39       decide CurrentVal; } } }

been completed for a majority of the memories. We represent this in the code with a barrier that waits for a majority of the threads.

When a process p becomes leader, it executes the prepare phase (the first leader p1 can skip this phase in its first execution of the loop), where, for each memory, p attempts to (1) acquire exclusive write permission, (2) write a new proposal number in its slot, and (3) read all slots of that memory. p waits to succeed in executing these steps on a majority of the memories. If any of p's writes fail or p finds a proposal with a higher proposal number, then p gives up. This is represented with an abort in the pseudocode; when an abort is executed, the for loop terminates. We assume that when the for loop terminates—either because some thread has aborted or because a majority of threads have reached the end of the loop—all threads of the for loop are terminated and control returns to the main loop (lines 11–37). If p does not abort, it adopts the value with the highest proposal number of all those it read in the memories. To make it clear that races should be avoided among parallel threads in the pseudocode, we wrap this part in an atomic environment.

In the next phase, each of p's threads writes its value to its slot on its memory. If a write fails, p gives up. If p succeeds, this is where we optimize time: p can simply decide, whereas Disk Paxos must read the memories again. Note that it is possible that some of the memories that made up the majority that passed the initial barrier may crash later on. To prevent p from stalling forever in such a situation, it is important that straggler threads that complete phase 1 later on be allowed to participate in phase 2. However, if such a straggler thread observes a more up-to-date value in its memory than the one adopted by p for phase 2, this must be taken into account. In this case, to avoid inconsistencies, p must abort its current attempt and restart the loop from scratch.

The code ensures that some correct process eventually decides, but it is easy to extend it so that all correct processes decide [76], by having a decided process broadcast its decision. Also, the code shows one instance of consensus, with p1 as the initial leader. With many consensus instances, the leader terminates one instance and becomes the default leader in the next.

Correctness of Protected Memory Paxos. We now present the proof of correctness and the common-case running time of Algorithm 8.10. We first show that Algorithm 8.10 correctly implements consensus, starting with validity. Intuitively, validity is preserved because each process that writes any value in a slot either writes its own value, or adopts a value that was previously written in a slot. We show that every value written in any slot must have been the input of some process.

Theorem 8.4.1 (Validity). In Algorithm 8.10, if a process p decides a value v, then v was the input of some process.

Proof. Assume by contradiction that some process p decides a value v and v is not the input of any process. Since v is not the input value of p, p must have adopted v by reading it at line 23 from a slot written by some process p′ (a process cannot adopt the initial value ⊥). Thus we can define a sequence s1, s2, . . . , sk, where si adopts v from the location where it was written by si+1 and s1 = p. This sequence is necessarily finite, since only a finite number of steps have been taken up to the point

when p decided v. Therefore, there must be a last element of the sequence, sk, which wrote v at line 35 without having adopted v. This implies v was sk's input value, a contradiction.

We now focus on agreement.

Theorem 8.4.2 (Agreement). In Algorithm 8.10, for any processes p and q, if p and q decide values vp and vq respectively, then vp = vq.

Before showing the proof of the theorem, we first introduce the following useful lemmas.

Lemma 8.4.3. The values a leader accesses on a remote memory cannot change between when it reads them and when it writes them.

Proof. Recall that each memory only allows write access to the most recent process that acquired it. In particular, this means that each memory only gives access to one process at a time. Note that the only place at which a process acquires write permission on a memory is at the very beginning of its run, before reading the values written on the memory. In particular, a process does not issue a read on a memory m before its permission request on m successfully completes. Therefore, if a process p succeeds in writing on memory m, then no other process could have acquired m's write permission after p did, and therefore no other process could have changed the values written on m after p's read of m.

Lemma 8.4.4. If a leader writes values vi and vj at line 35 with the same proposal number to memories i and j, respectively, then vi = vj.

Proof. Assume by contradiction that a leader p writes different values v1 ≠ v2 with the same proposal number. Since each thread of p executes the phase 2 write (line 35) at most once per proposal number, it must be that different threads T1 and T2 of p wrote v1 and v2, respectively. If p does not perform phase 1 (i.e., if p = p1 and this is p's first attempt), then it is impossible for T1 and T2 to write different values, since CurrentVal was set to v at line 14 and was not changed afterwards. Otherwise (if p performs phase 1), then let t1 and t2 be the times when T1 and T2 executed line 8, respectively (T1 and T2 must have done so, since we assume that they both reached the phase 2 write at line 35). Assume wlog that t1 ≤ t2. Due to the check and abort at line 29, CurrentVal cannot change after t1 while keeping the same proposal number. Thus, when T1 and T2 perform their phase 2 writes (after t1), CurrentVal has the same value as it did at t1; it is therefore impossible for T1 and T2 to write different values. We have reached a contradiction.

Lemma 8.4.5. If a process p performs phase 1 and then writes to some memory m with proposal number b at line 35, then p must have written b to m at line 21 and read from m at line 23.

Proof. Let T be the thread of p which writes to m at line 35. If phase 1 is performed (i.e., the condition at line 19 is satisfied), then by construction T cannot reach line 35 without first performing lines 21 and 23. Since T only communicates with m, it must be that lines 21 and 23 are performed on m.

Proof of Theorem 8.4.2. Assume by contradiction that vp ≠ vq. Let bp and bq be the proposal numbers with which vp and vq are decided, respectively. Let Wp (resp. Wq) be the set of memories to which p (resp. q) successfully wrote in phase 2 (line 35) before deciding vp (resp. vq). Since

Wp and Wq are both majorities, their intersection must be non-empty. Let m be any memory in Wp ∩ Wq.

We first consider the case in which one of the processes did not perform phase 1 before deciding (i.e., one of the processes is p1 and it decided on its first attempt). Let that process be p wlog. Further assume wlog that q is the first process to enter phase 2 with a value different from vp. p's phase 2 write on m must have preceded q obtaining permissions from m (otherwise, p's write would have failed due to lack of permissions). Thus, q must have seen p's value during its read on m at line 23, and thus q cannot have adopted its own value. Since q is the first process to enter phase 2 with a value different from vp, q cannot have adopted any value other than vp, so q must have adopted vp. Contradiction.

We now consider the remaining case: both p and q performed phase 1 before deciding. We assume wlog that bp < bq and that bq is the smallest proposal number larger than bp for which some process enters phase 2 with CurrentVal ≠ vp. Since bp < bq, p's read at m must have preceded q's phase 1 write at m (otherwise p would have seen q's larger proposal number and aborted). This implies that p's phase 2 write at m must have preceded q's phase 1 write at m (by Lemma 8.4.3). Thus q must have seen vp during its read and cannot have adopted its own input value. However, q cannot have adopted vp, so q must have adopted vq from some other slot sl that q saw during its read. It must be that sl.minProposal < bq, otherwise q would have aborted. Since sl.minProposal ≥ sl.accProposal for any slot, it follows that sl.accProposal < bq. If sl.accProposal < bp, q cannot have adopted sl.value in line 30 (it would have adopted vp instead). Thus it must be that bp ≤ sl.accProposal < bq; however, this is impossible because we assumed that bq is the smallest proposal number larger than bp for which some process enters phase 2 with CurrentVal ≠ vp. We have reached a contradiction.

Finally, we prove that the termination property holds.

Theorem 8.4.6 (Termination). Eventually, all correct processes decide.

Lemma 8.4.7. If a correct process p is executing the for loop at lines 18–37, then p will eventually exit from the loop.

Proof. The threads of the for loop perform the following potentially blocking steps: obtaining permission (line 20), writing (lines 21 and 35), reading (line 23), and waiting for other threads (the barrier at line 7 and the exit condition at line 37). By our assumption that a majority of memories are correct, a majority of the threads of the for loop must eventually obtain permission at line 20 and invoke the write at line 21. If one of these writes fails due to lack of permission, the loop aborts and we are done. Otherwise, a majority of threads will perform the read at line 23. If some thread aborts at line 22 or 26, then the loop aborts and we are done. Otherwise, a majority of threads must add themselves to ListOfReady, pass the barrier at line 7, and invoke the write at line 35. If some such write fails, the loop aborts; otherwise, a majority of threads will reach the check at line 37 and thus the loop will exit.

Proof of Theorem 8.4.6. The Ω failure detector guarantees that eventually all processes trust the same correct process p. Let t be the time after which all correct processes trust p forever. By Lemma 8.4.7, at some time t′ ≥ t, all correct processes except p will be blocked at line 12.
Therefore, the minProposal values on all memories, in all slots except those of p, stop increasing.

Thus, eventually, p picks a propNr that is larger than all others written on any memory, and stops restarting at line 26. Furthermore, since no process other than p executes any steps of the algorithm, and in particular no process other than p ever acquires any memory after time t′, p never loses its permission on any of the memories. So, all writes executed by p on any correct memory must return ack. Therefore, p will decide and broadcast its decision to all. All correct processes will receive p's decision and decide as well.

To complete the proof of Theorem 8.4.9, we now show that Algorithm 8.10 is 2-deciding.

Theorem 8.4.8. Algorithm 8.10 is 2-deciding.

Proof. Consider an execution in which p1 is timely, and no process's failure detector ever suspects p1. Then, since no process other than p1 considers itself the leader, and processes do not deviate from their protocols, no process calls changePermission on any memory. Furthermore, p1's firstAttempt flag is set, since it never switched leaders. So, since p1 initially has write permission on all memories, all of p1's writes succeed. Therefore, p1 terminates, deciding its own proposed value v, after one write per memory.

The following theorem summarizes the result.

Theorem 8.4.9. Consider a message-and-memory model with up to fP process crashes and fM memory crashes, where n ≥ fP + 1 and m ≥ 2fM + 1. There exists a 2-deciding algorithm for consensus.

8.4.2 Aligned Paxos

We now further enhance the failure resilience. We show that memories and processes are equivalent agents, in that it suffices for a majority of the agents (processes and memories together) to remain alive to solve consensus. Our new algorithm, Aligned Paxos, achieves this resiliency. To do so, the algorithm relies on the ability to use both the messages and the memories in our model; permissions are not needed. The key idea is to align a message-passing algorithm and a memory-based algorithm to use any majority of agents. We align Paxos [193] and Protected Memory Paxos so that their decisions are coordinated. More specifically, Protected Memory Paxos and Paxos have two phases. To align these algorithms, we factor out their differences and replace their steps with an abstraction that is implemented differently for each algorithm. The result is our Aligned Paxos algorithm, which has two phases, each with three steps: communicate, hear back, and analyze. Each step treats processes and memories separately, and translates the results of operations on different agents to a common language. We implement the steps using their analogues in Paxos and Protected Memory Paxos.4 The pseudocode of Aligned Paxos is shown in Algorithms 8.11–8.17.

4We believe other implementations are possible. For example, replacing the Protected Memory Paxos implementation for memories with the Disk Paxos implementation yields an algorithm that does not use permissions.

Algorithm 8.11: Aligned Paxos

A = (P, M) is the set of acceptors
propose(v){
    resps = []  // prepare empty responses list
    choose propNr bigger than any seen before
    for all a in A{
        cflag = communicate1(a, propNr)
        resp = hearback1(a)
        if (cflag){ resps.append((a, resp)) } }
    wait until resps has responses from a majority of A
    next = analyze1(resps)
    if (next == RESTART) restart;
    resps = []
    for all a in A{
        cflag = communicate2(a, next)
        resp = hearback2(a)
        if (cflag){ resps.append((a, resp)) } }
    wait until resps has responses from a majority of A
    next = analyze2(next, resps)
    if (next == RESTART) restart;
    decide next; }

Algorithm 8.12: Communicate Phase 1—code for process p

bool communicate1(agent a, value propNr){
    if (a is memory){
        changePermission(a, {R: Π-{p}, W: ∅, RW: {p}})  // acquire write permission
        return write(a[p], {propNr, -, -}) }
    else{
        send prepare(propNr) to a
        return true } }

Algorithm 8.13: Hear Back Phase 1—code for process p

value hearback1(agent a){
    if (a is memory){
        for all processes q{
            localInfo[q] = read(a[q]) }
        return localInfo }
    else{
        return value received from a } }

Algorithm 8.14: Analyze Phase 1—code for process p

responses is a list of (agent, response) pairs
value analyze1(responses resps){
    for resp in resps {
        if (resp.agent is memory){
            for all slots s in resp.response {
                if (s.minProposal > propNr) return RESTART }
            (v, accProposal) = value and accProposal of the slot with the
                highest accProposal that had a value } }
    return v where (v, accProposal) is the highest accProposal seen in
        resps.response of all agents }

Algorithm 8.15: Communicate Phase 2—code for process p

bool communicate2(agent a, value msg){
    if (a is memory){
        return write(a[p], {msg.propNr, msg.propNr, msg.val}) }
    else{
        send accepted(msg.propNr, msg.val) to a
        return true } }

Algorithm 8.16: Hear Back Phase 2—code for process p

value hearback2(agent a){
    if (a is memory){
        return ack }
    else{
        return value received from a } }

Algorithm 8.17: Analyze Phase 2—code for process p

value analyze2(value v, responses resps){
    if there are at least |A|/2 + 1 resps such that resp.response == ack{
        return v }
    return RESTART }

8.5 Dynamic Permissions are Necessary for Efficient Consensus

In Section 8.4.1, we showed how dynamic permissions can improve the performance of Disk Paxos. Are dynamic permissions necessary? We prove that with shared memory (or disks) alone, under static permissions, one cannot achieve 2-deciding consensus, even if the memory never fails, processes may only fail by crashing, and the system is partially synchronous in the sense that eventually there is a known upper bound on the time it takes a correct process to take a step [109]. This result applies a fortiori to the Disk Paxos model [126].

Theorem 8.5.1. Consider a partially synchronous shared-memory model with registers, where registers can have arbitrary static permissions, memory never fails, and at most one process may fail by crashing. No consensus algorithm is 2-deciding.

Proof. Assume by contradiction that A is an algorithm in the stated model that is 2-deciding. That is, there is some execution E of A in which some process p decides a value v with 2 delays. We denote by R and W the set of objects which p reads and writes in E, respectively. Note that since p decides in 2 delays in E, R and W must be disjoint, by the definition of operation delay and the fact that a process has at most one outstanding operation per object. Furthermore, p must issue all of its reads and writes without waiting for the response of any operation.

Consider an execution E′ in which p reads from the same set R of objects and writes the same values as in E to the same set W of objects. All of the read operations that p issues return by some time t0, but the write operations of p are delayed for a long time. Another process p′ begins its proposal of a value v′ ≠ v after t0. Since p's writes are delayed and no other process writes to any objects, E′ is indistinguishable to p′ from an execution in which it runs alone. Since A is a correct consensus algorithm that terminates if there is no contention, p′ must eventually decide value v′. Let t′ be the time at which p′ decides. All of p's write operations terminate and are linearized in E′ after time t′. Execution E′ is indistinguishable to p from execution E, in which it ran alone. Therefore, p decides v ≠ v′, violating agreement.

Theorem 8.5.1, together with the Fast Paxos algorithm of Lamport [196], shows that an atomic read-write shared memory model is strictly weaker than the message passing model in its ability to solve consensus quickly. This result may be of independent interest, since the classic shared memory and message passing models are often seen as equivalent, because of the seminal computational equivalence result of Attiya, Bar-Noy, and Dolev [29]. Interestingly, it is known that shared memory can tolerate more failures when solving consensus (with randomization or partial synchrony) [25, 65], and therefore it seems that perhaps shared memory is strictly stronger than message passing for solving consensus. However, our result shows that there are aspects in which message passing is stronger than shared memory. In particular, message passing can solve consensus faster than shared memory in well-behaved executions.

Chapter 9

Microsecond-Scale State Machine Replication

9.1 Introduction

In the previous two chapters, we studied RDMA primarily through a theoretical lens. We abstracted its features into the ready-active model and reasoned about the capabilities of this new communication model compared to previous ones. However, when developing a theory meant to reflect practice, it is important to check that the theoretical results can indeed translate back into practical advantages. In this chapter, we present a system built using RDMA to enhance availability in microsecond-scale computing.

Enabled by modern technologies such as RDMA, microsecond-scale computing is emerging as a must [37]. A microsecond app might be expected to process a request in 10 microseconds. Areas where software systems care about microsecond performance include finance (e.g., trading systems), embedded computing (e.g., control systems), and microservices (e.g., key-value stores). Some of these areas are critical, and it is desirable to replicate their microsecond apps across many hosts to provide high availability, for economic, safety, or robustness reasons. Typically, a system may have hundreds of microservice apps [127], some of which are stateful and can disrupt a global execution if they fail (e.g., key-value stores)—these apps should be replicated for the sake of the whole system.

The gold standard for replicating an app is State Machine Replication (SMR) [263], whereby replicas execute requests in the same total order determined by a consensus protocol. Unfortunately, traditional SMR systems add hundreds of microseconds of overhead even on a fast network [158]. Recent work explores modern hardware in order to improve the performance of replication [41, 174, 178, 187, 250, 279]. The fastest of these (e.g., Hermes [178], DARE [250], and HovercRaft [187]) induce, however, an overhead of several microseconds, which is clearly high for apps that themselves take only a few microseconds. Furthermore, when a failure occurs, prior systems incur a prohibitively large fail-over time in the tens of milliseconds (not microseconds). For instance, HovercRaft takes 10 milliseconds, DARE 30 milliseconds, and Hermes at least 150 milliseconds. The rationale for such large latencies is timeouts that account for the natural fluctuations in the latency of modern networks. Improving replication and fail-over latencies requires fundamentally new techniques.

We propose Mu, a new SMR system that adds less than 1.3 microseconds to replicate a (small) app request, with the 99th percentile at 1.6 microseconds. Although Mu is a general-purpose SMR scheme for a generic app, Mu really shines with microsecond apps, where even the smallest replication overhead is significant. Compared to the fastest prior system, Mu is able to cut 61% of its latency. This is the smallest latency possible with current RDMA hardware, as it corresponds to one round of one-sided communication.

To achieve this performance, Mu introduces a new SMR protocol that fundamentally changes how RDMA can be leveraged for replication. Our protocol, inspired by the one-shot consensus algorithm presented in Chapter 8, reaches consensus and replicates a request with just one round of parallel RDMA write operations on a majority of replicas. This is in contrast to prior approaches, which take multiple rounds [41, 250, 279] or resort to two-sided communication [158, 174, 189, 226]. Roughly, in Mu the leader replicates a request by simply using RDMA to write it to the log of each replica, without additional rounds of communication. Doing this correctly is challenging because concurrent leaders may try to write to the logs simultaneously. In fact, the hardest part of most replication protocols is the mechanism to protect against races of concurrent leaders (e.g., Paxos proposal numbers [193]). Traditional replication implements this mechanism using send-receive communication (two-sided operations) or multiple rounds of communication. Instead, Mu uses RDMA write permissions to guarantee that a replica's log can be written by only one leader. This is similar to the crash-consensus algorithm of Chapter 8. However, other mechanisms are also critical to the correctness and efficiency of the fully-fledged SMR system, including those to change leaders and garbage collect logs, which we describe in this chapter.

Mu also improves fail-over time to just 873 microseconds, with the 99th percentile at 945 microseconds, which cuts the fail-over time of prior systems by an order of magnitude. The fact that Mu significantly improves both replication overhead and fail-over latency is perhaps surprising: folklore suggests a trade-off between the latencies of replication in the fast path and fail-over in the slow path.

The fail-over time of Mu has two parts: failure detection and leader change. For failure detection, traditional SMR systems typically use a timeout on heartbeat messages from the leader. Due to large variances in network latencies, timeout values are in the 10–100ms range even with the fastest networks. This is clearly high for microsecond apps. Mu uses a conceptually different method based on a pull-score mechanism over RDMA. The leader increments a heartbeat counter in its local memory, while other replicas use RDMA to periodically read the counter and calculate a badness score. The score is the number of successive reads that returned the same value. Replicas declare a failure if the score is above a threshold, corresponding to a timeout. Unlike traditional heartbeats, this method can use an aggressively small timeout without false positives, because network delays slow down the reads rather than the heartbeat. In this way, Mu detects failures usually within ∼600 microseconds. This is bottlenecked by variances in process scheduling, as we discuss later.
For leader change, the latency comes from the cost of changing RDMA write permissions, which with current NICs takes hundreds of microseconds. This is higher than we expected: it is far slower than RDMA reads and writes, which go over the network. We attribute this delay to a lack of hardware optimization. RDMA has several methods to change permissions: (1) re-register memory regions, (2) change queue-pair access flags, or (3) close and reopen queue pairs.

We carefully evaluate the speed of each method and propose a scheme that combines two of them using a fast-slow path to minimize latency. Despite our efforts, the best way to cut this latency further is to improve the NIC hardware.

We implemented Mu and used it to replicate several apps: a financial exchange app called Liquibook [213], Redis, Memcached, and an RDMA-based key-value store called HERD [170]. We evaluate Mu extensively, by studying its replication latency stand-alone or integrated into each of the above apps. We find that, for some of these apps (Liquibook, HERD), Mu is the only viable replication system that incurs a reasonable overhead. This is because Mu's latency is lower by a factor of at least 2.7× compared to other replication systems. We also report on our study of Mu's fail-over latency, with a breakdown of its components, suggesting ways to improve the infrastructure to further reduce the latency.

Mu has some limitations. First, Mu relies on RDMA and so it is suitable only for networks with RDMA, such as local area networks, but not across the wide area. Second, Mu is an in-memory system that does not persist data in stable storage—doing so would add additional latency dependent on the device speed.1 However, we observe that the industry is working on extensions of RDMA for persistent memory, whereby RDMA writes can be flushed at a remote persistent memory with minimum latency [268]—once available, this extension will provide persistence for Mu.

To summarize, this chapter makes the following contributions:
• We propose Mu, a new SMR system with low replication and fail-over latencies.
• To achieve its performance, Mu leverages RDMA permissions and a scoring mechanism over heartbeat counters.
• We implement Mu, and evaluate both its raw performance and its performance in microsecond apps. Results show that Mu significantly reduces replication latencies to an acceptable level for microsecond apps.

One might argue that Mu is ahead of its time, as most apps today are not yet microsecond apps. However, this situation is changing. We already have important microsecond apps in areas such as trading, and more will come as existing timing requirements become stricter and new systems emerge as the composition of a large number of microservices (Section 9.2.1).

9.2 Background

9.2.1 Microsecond Applications and Computing

Apps that are consumed by humans typically work at the millisecond scale: to the human brain, the lowest reported perceptible latency is 13 milliseconds [251]. Yet, we see the emergence of apps that are consumed not by humans but by other computing systems. An increasing number of such systems must operate at the microsecond scale, for competitive, physical, or composition reasons. Schneider [262] speaks of a microsecond market where traders spend massive resources to gain a microsecond advantage in their high-frequency trading.

1For fairness, all SMR systems that we compare against also operate in-memory.

Industrial robots must orchestrate their motors with microsecond granularity for precise movements [21]. Modern distributed systems are composed of hundreds [127] of stateless and stateful microservices, such as key-value stores, web servers, load balancers, and ad services—each operating as an independent app whose latency requirements are gradually decreasing to the microsecond level [62], as the number of composed services increases. With this trend, we already see the emergence of key-value stores with microsecond latency (e.g., [174, 241]).

To operate at the microsecond scale, the computing ecosystem must be improved at many layers. This is happening gradually through various recent efforts. Barroso et al. [37] argue for better support of microsecond-scale events. The latest Precision Time Protocol improves clock synchronization to achieve sub-microsecond accuracy [23]. And other recent work improves CPU scheduling [62, 246, 253], thread management [254], power management [252], RPC handling [96, 174], and the network stack [246]—all at the microsecond scale. Mu fits in this context, by providing microsecond SMR.

9.2.2 State Machine Replication

State Machine Replication (SMR) replicates a service (e.g., a key-value storage system) across multiple physical servers called replicas, such that the system remains available and consistent even if some servers fail. SMR provides strong consistency in the form of linearizability [154]. A common way to implement SMR, which we adopt in this work, is as follows: each replica has a copy of the service software and a log. The log stores client requests. We consider non-durable SMR systems [164, 169, 209, 216, 243, 247], which keep state in memory only, without logging updates to stable storage. Such systems make an item of data reliable by keeping copies of it in the memory of several nodes. Thus, the data remains recoverable as long as there are fewer simultaneous node failures than data copies [250].

A consensus protocol ensures that all replicas agree on what request is stored in each slot of the log. Replicas then apply the requests in the log (i.e., execute the corresponding operations), in log order. Assuming that the service is deterministic, this ensures all replicas remain in sync. We adopt a leader-based approach, in which a dynamically elected replica called the leader communicates with the clients and sends back responses after requests reach a majority of replicas. We assume a crash-failure model: servers may fail by crashing, after which they stop executing.

A consensus protocol must ensure safety and liveness properties. Safety here means (1) agreement (different replicas do not obtain different values for a given log slot) and (2) validity (replicas do not obtain spurious values). Liveness means termination—every live replica eventually obtains a value. We guarantee agreement and validity in an asynchronous system, while termination requires eventual synchrony and a majority of non-crashed replicas, as in typical consensus protocols.
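To make the log-replay convention above concrete, the following is a minimal C++ sketch of the replica-side apply loop it implies; StateMachine, LogSlot, and the commit-index bookkeeping are illustrative names, not part of Mu's code base.

    #include <cstdint>
    #include <vector>

    struct LogSlot { uint64_t prop_nr; std::vector<uint8_t> request; };

    struct StateMachine {
      // Deterministic: applying the same requests in the same order yields the
      // same state on every replica.
      void apply(const std::vector<uint8_t>& request) { /* application-specific */ (void)request; }
    };

    // Apply every decided entry, in log order, exactly once.
    void replay(StateMachine& sm, const std::vector<LogSlot>& log,
                uint64_t& applied, uint64_t commit_index) {
      while (applied < commit_index) {
        sm.apply(log[applied].request);
        ++applied;
      }
    }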

Figure 9.1: Architecture of Mu. Grey color shows Mu components. A replica is either a leader or a follower, with different behaviors. The leader captures client requests and writes them to the local logs of all replicas. Followers replay the log to inject the client requests into the application. A leader election component includes a heartbeat and the identity of the current leader. A permission management component allows a leader to request write permission to the local log while revoking the permission from other nodes.

9.3 Overview of Mu

9.3.1 Architecture

Figure 9.1 depicts the architecture of Mu. At the top, a client sends requests to an application and receives a response. We are not particularly concerned about how the client communicates with the application: it can use a network, a local pipe, a function call, etc. We do assume, however, that this communication is amenable to being captured and injected. That is, there is a mechanism to capture requests from the client before they reach the application, so we can forward these requests to the replicas; a request is an opaque buffer that is not interpreted by Mu. Similarly, there is a mechanism to inject requests into the app. Providing such mechanisms requires changing the application; however, in our experience, the changes are small and non-intrusive. These mechanisms are standard in any SMR system (a minimal sketch of such hooks appears at the end of this subsection).

Each replica has an idea of which replica is currently the leader. A replica that considers itself the leader assumes that role (left of figure), and otherwise assumes the role of a follower (right of figure). Each replica grants RDMA write permission to its log for its current leader and no other replica. The replicas constantly monitor their current leader to check that it is still active. The replicas might not agree on who the current leader is. But in the common case, all replicas have the same leader, and that leader is active. When that happens, Mu is simple and efficient. The leader captures a client request, uses an RDMA Write to append that request to the log of each follower, and then continues the application to process the request. When the followers detect a new request in their log, they inject the request into the application, thereby updating the replicas.

The main challenge in the design of SMR protocols is to handle leader failures.

Of particular concern is the case when a leader appears failed (due to intermittent network delays), so another leader takes over, but the original leader is still active. To detect failures in Mu, the leader periodically increments a local counter; the followers periodically check the counter using an RDMA Read. If the followers do not detect an increment of the counter after a few tries, a new leader is elected. The new leader revokes write permission from any old leaders, thereby ensuring that old leaders cannot interfere with the operation of the new leader [13]. The new leader also reconstructs any partial work left by prior leaders.

Both the leader and the followers are internally divided into two major parts: the replication plane and the background plane. Roughly, the replication plane is responsible for copying requests captured by the leader to the followers, and for replaying those requests at the followers' copy of the application. The background plane monitors (the health of) the leader and handles permission changes. Each plane has its own threads and queue pairs, in order to improve parallelism and provide isolation of performance and functionality. More specifically, the following components exist in each of the planes.

The replication plane has three components:
• Replicator. This component implements the main protocol to replicate a request from the leader to the followers, by writing the request in the followers' logs using RDMA Write.
• Replayer. This component replays entries from the local log.
• Logging. This component stores client requests to be replicated. Each replica has its own local log, which may be written remotely by other replicas according to previously granted permissions. Replicas also keep a copy of remote logs, which is used by a new leader to reconstruct partial log updates by older leaders.

The background plane has two components:
• Leader election. This component detects failures of leaders and selects other replicas to become leader.
• Permission management. This component grants and revokes write access of local data by remote replicas. It maintains a permissions array, which stores access requests by remote replicas. Basically, a remote replica uses RDMA to store a 1 in this vector to request access.

We describe these planes in more detail in Section 9.4 and Section 9.5.
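As a concrete illustration of the capture/inject mechanism mentioned at the start of this subsection, here is a minimal sketch of what such hooks can look like; the names (ReplicationHooks, capture, inject) are hypothetical and are not Mu's actual API.

    #include <cstdint>
    #include <functional>
    #include <vector>

    // A request is an opaque buffer; the SMR layer does not interpret it.
    using Request = std::vector<uint8_t>;

    struct ReplicationHooks {
      // Called on the leader when a client request arrives, before the
      // application processes it, so the SMR layer can replicate the buffer.
      std::function<void(const Request&)> capture;

      // Called on followers by the SMR layer to feed a replicated request into
      // the local copy of the application.
      std::function<void(const Request&)> inject;
    };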

9.3.2 RDMA Communication

Each replica has two QPs for each remote replica: one QP for the replication plane and one for the background plane. The QPs for the replication plane share a completion queue, while the QPs for the background plane share another completion queue. The QPs operate in Reliable Connection (RC) mode. Each replica also maintains two MRs, one for each plane. The MR of the replication plane contains the consensus log, and the MR of the background plane contains metadata for leader election (Section 9.5.1) and permission management (Section 9.5.2). During execution, replicas may change the level of access to their log that they give to each remote replica; this is done by changing QP access flags.

Note that all replicas always have remote read and write access permissions to the memory region in the background plane of each replica.
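A hedged ibverbs sketch of the two memory regions just described is shown below: one MR holds the consensus log (replication plane) and one holds the leader-election and permission metadata (background plane). The variable names, and the choice of registering both regions with remote read/write and restricting log writers per QP instead (Section 9.5.2), are illustrative assumptions, not Mu's exact code.

    #include <infiniband/verbs.h>
    #include <cstddef>

    struct PlaneMemory {
      ibv_mr* log_mr;   // replication plane: writers are restricted dynamically
      ibv_mr* meta_mr;  // background plane: always remotely readable and writable
    };

    PlaneMemory register_planes(ibv_pd* pd, void* log_buf, size_t log_len,
                                void* meta_buf, size_t meta_len) {
      PlaneMemory pm{};
      // Register both regions for remote read/write; which remote replica may
      // actually write the log is controlled by per-QP access flags.
      pm.log_mr = ibv_reg_mr(pd, log_buf, log_len,
                             IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_READ |
                                 IBV_ACCESS_REMOTE_WRITE);
      pm.meta_mr = ibv_reg_mr(pd, meta_buf, meta_len,
                              IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_READ |
                                  IBV_ACCESS_REMOTE_WRITE);
      return pm;
    }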

9.4 Replication Plane

The replication plane takes care of execution in the common case, but remains safe during leader changes. This is where we take care to optimize the latency of the common path. We do so by ensuring that, in the replication plane, only the leader replica communicates over the network, whereas all follower replicas are silent (i.e., they only do local work). In this section, we discuss algorithmic details related to replication in Mu. For pedagogical reasons, we first describe a basic version of the algorithm and then discuss extensions and optimizations to improve functionality and performance. We also give intuition for the correctness of our algorithm.

9.4.1 Basic Algorithm

The leader captures client requests and calls propose to replicate these requests. It is simplest to understand our replication algorithm relative to the Paxos algorithm, which we briefly summarize; for details, we refer the reader to [193]. In Paxos, for each slot of the log, a leader first executes a prepare phase where it sends a proposal number to all replicas.2 A replica replies with either nack if it has seen a higher proposal number, or otherwise with the value with the highest proposal number that it has accepted. After getting a majority of replies, the leader adopts the value with the highest proposal number. If it got no values (only acks), it adopts its own proposal value. In the next phase, the accept phase, the leader sends its proposal number and adopted value to all replicas. A replica acks if it has not received any prepare-phase message with a higher proposal number.

In Paxos, replicas actively reply to messages from the leader, but in our algorithm, replicas are silent and communicate information passively by publishing it to their memory. Specifically, along with their log, a replica publishes a minProposal representing the minimum proposal number which it can accept. The correctness of our algorithm hinges on the leader reading and updating the minProposal number of each follower before updating anything in its log, and on updates to a replica's log happening in slot order. However, this by itself is not enough; Paxos relies on active participation from the followers not only for the data itself, but also to avoid races. Simply publishing the relevant data on each replica is not enough, since two competing leaders could miss each other's updates. This can be avoided if each of the leaders rereads the value after writing it [126]. However, this requires more communication. To avoid this, we shift the focus from the communication itself to the prevention of bad communication. A leader ℓ maintains a set of confirmed followers, which have granted write permission to ℓ and revoked write permission from other leaders before ℓ begins its operation. This is what prevents races among leaders in Mu. We describe these mechanisms in more detail below.

2Paxos uses proposer and acceptor terms; instead, we use leader and replica.

169 Algorithm 9.2: Log Structure

struct Log {
    minProposal = 0,
    FUO = 0,
    slots[] = (0, ⊥) for all slots }

Log Structure. The main data structure used by the algorithm is the consensus log kept at each replica (Algorithm 9.2). The log consists of (1) a minProposal number, representing the smallest proposal number with which a leader may enter the accept phase on this replica; (2) a first undecided offset (FUO), representing the lowest log index which this replica believes to be undecided; and (3) a sequence of slots—each slot is a (propNr, value) tuple.

Algorithm Description. Each leader begins its propose call by constructing its confirmed followers set (Algorithm 9.3, lines 8–11). This step is only necessary the first time a new leader invokes propose or immediately after an abort. This step is done by sending permission requests to all replicas and waiting for a majority of acks. When a replica acks, it means that this replica has granted write permission to this leader and revoked it from other replicas. The leader then adds this replica to its confirmed followers set. During execution, if the leader ℓ fails to write to one of its confirmed followers, because that follower crashed or gave write access to another leader, ℓ aborts and, if it still thinks it is the leader, it calls propose again.

After establishing its confirmed followers set, the leader invokes the prepare phase. To do so, the leader reads the minProposal from its confirmed followers (line 18) and chooses a proposal number propNum which is larger than any that it has read or used before. Then, the leader writes its proposal number into minProposal for each of its confirmed followers. Recall that if this write fails at any follower, the leader aborts. It is safe to overwrite a follower f's minProposal in line 21 because, if that write succeeds, then ℓ has not lost its write permission since adding f to its confirmed followers set, meaning no other leader wrote to f since then. To complete its prepare phase, the leader reads the relevant log slot of all of its confirmed followers and, as in Paxos, adopts either (a) the value with the highest proposal number, if it read any non-⊥ values, or (b) its own initial value, otherwise.

The leader ℓ then enters the accept phase, in which it tries to commit its previously adopted value. To do so, ℓ writes its adopted value to its confirmed followers. If these writes succeed, then ℓ has succeeded in replicating its value. No new value or minProposal number could have been written on any of the confirmed followers in this case, because that would have involved a loss of write permission for ℓ. Since the confirmed followers set constitutes a majority of the replicas, this means that ℓ's replicated value now appears in the same slot at a majority. Finally, ℓ increments its own FUO to denote successfully replicating a value in this new slot. If the replicated value was ℓ's own proposed value, then it returns from the propose call; otherwise it continues with the prepare phase for the new FUO.

170 Algorithm 9.3: Basic Replication Algorithm of Mu

5  Propose(myValue):
6    done = false
7    If I just became leader or I just aborted
8      For every process p in parallel:
9        Request permission from p
10       If p acks, add p to confirmedFollowers
11     Until this has been done for a majority
12   While not done:
13     Execute Prepare Phase
14     Execute Accept Phase

16 Prepare Phase:
17   For every process p in confirmedFollowers:
18     Read minProposal from p's log
19   Pick new propNum, higher than any minProposal seen so far
20   For every process p in confirmedFollowers:
21     Write propNum into LOG[p].minProposal
22     Read LOG[p].slots[myFUO]
23   Abort if any write fails
24   if all entries read were empty:
25     value = myValue
26   else:
27     value = entry value with the largest proposal number of slots read

29 Accept Phase:
30   For every process p in confirmedFollowers:
31     Write value, propNum to p in slot myFUO
32   Abort if any write fails
33   If value == myValue:
34     done = true
35   Locally increment myFUO

9.4.2 Extensions

The basic algorithm described so far is clear and concise, but it also has downsides related to functionality and performance. We now address these downsides with some extensions, all of which are standard for Paxos-like algorithms.

Bringing Stragglers Up to Date. In the basic algorithm, if a replica r is not included in some leader’s confirmed followers set, then its log will lag behind. If r later becomes leader, it can catch up by proposing new values at its current FUO, discovering previously accepted values, and re-committing them. This is correct but inefficient. Even worse, if r never becomes leader, then it will never recover the missing values. We address this problem by introducing an update phase for new leaders. After a replica becomes leader and establishes its confirmed followers set, but before attempting to replicate new values, the new leader (1) brings itself up to date with its highest-FUO confirmed follower and (2) brings its followers up to date. This is done by copying the contents of the more up-to-date log to the less up-to-date log.
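The update phase just described can be pictured as the following sketch; read_remote_fuo and copy_log_range are hypothetical wrappers around RDMA Reads and Writes of the log region, not Mu's actual helpers.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    uint64_t read_remote_fuo(size_t replica);                              // RDMA Read of a replica's FUO
    void copy_log_range(size_t src, size_t dst, uint64_t lo, uint64_t hi); // copy slots [lo, hi)

    void update_phase(const std::vector<size_t>& confirmed, size_t me, uint64_t& my_fuo) {
      // (1) Catch up with the most advanced confirmed follower.
      size_t best = me;
      uint64_t best_fuo = my_fuo;
      for (size_t r : confirmed) {
        uint64_t f = read_remote_fuo(r);
        if (f > best_fuo) { best = r; best_fuo = f; }
      }
      if (best != me) { copy_log_range(best, me, my_fuo, best_fuo); my_fuo = best_fuo; }

      // (2) Bring each confirmed follower up to date with the leader's log.
      for (size_t r : confirmed) {
        uint64_t f = read_remote_fuo(r);
        if (f < my_fuo) copy_log_range(me, r, f, my_fuo);
      }
    }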

Followers Commit in Background. In the basic algorithm, followers do not know when a value is committed and thus cannot replay the requests in the application. This is easily fixed without additional communication. Since a leader will not start replicating in an index i before it knows index i − 1 to be committed, followers can monitor their local logs and commit all values up to (but excluding) the highest non-empty log index. This is called commit piggybacking, since the commit message is folded into the next replicated value.

Omitting the Prepare Phase. Once a leader finds only empty slots at a given index at all of its confirmed followers at line 22, then no higher index may contain an accepted value at any confirmed follower; thus, the leader may omit the prepare phase for higher indexes (until it aborts, after which the prepare phase becomes necessary again). This optimization concerns performance on the common path. With this optimization, the cost of a Propose call becomes a single RDMA write to a majority in the common case.

Growing Confirmed Followers. In our algorithm, the confirmed followers set remains fixed after the leader initially constructs it. This implies that processes outside the leader’s confirmed followers set will miss updates, even if they are alive and timely, and that the leader will abort even if one of its followers crashes. To avoid this problem, we extend the algorithm to allow the leader to grow its confirmed followers set by adding replicas which respond to its initial request for permission. The leader must bring these replicas up to date before adding them to its set. When its confirmed follower set is large, the leader cannot wait for its RDMA reads and writes to complete at all of its confirmed followers before continuing, since we require the algorithm to continue operating despite the failure of a minority of the replicas; instead, the leader waits for just a majority of the replicas to complete.

Replayer. Followers continually monitor the log for new entries. This creates a challenge: how to ensure that the follower does not read an incomplete entry that has not yet been fully written by the leader. We adopt a standard approach: we add an extra canary byte at the end of each log entry [214, 279]. Before issuing an RDMA Write to replicate a log entry, the leader sets the entry's canary byte to a non-zero value. The follower first checks the canary and then the entry contents. In theory, it is possible that the canary gets written before the other contents under RDMA semantics. In practice, however, NICs provide left-to-right semantics in certain cases (e.g., the memory region is in the same NUMA domain as the NIC), which ensures that the canary is written last. This assumption is made by other RDMA systems [106, 107, 170, 214, 279]. Alternatively, we could store a checksum of the data in the canary, and the follower could read the canary and wait for the checksum to match the data.
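A minimal sketch of the follower-side check is shown below, assuming fixed-size log slots with the canary byte placed at the end of the entry; the types and the use of an acquire load to model visibility of the NIC's write are illustrative assumptions, not Mu's exact code.

    #include <atomic>
    #include <cstdint>

    struct LogEntry {
      uint32_t length;               // payload length written by the leader
      uint8_t payload[256];          // fixed-size slot, for illustration
      std::atomic<uint8_t> canary;   // placed at the end; non-zero once the entry is fully written
    };

    // Returns true (and replays the entry) only if the slot has been fully written.
    template <typename App>
    bool try_replay(LogEntry& e, App& app) {
      if (e.canary.load(std::memory_order_acquire) == 0) return false;  // not ready yet
      app.apply(e.payload, e.length);  // inject the request into the application
      return true;
    }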

9.5 Background Plane

The background plane has two main roles: electing and monitoring the leader, and handling permission change requests. In this section, we describe these mechanisms.

9.5.1 Leader Election

The leader election component of the background plane maintains an estimate of the current leader, which it continually updates. The replication plane uses this estimate to determine whether to execute as leader or follower. Each replica independently and locally decides who it considers to be leader. We opt for a simple rule: replica i decides that j is leader if j is the replica with the lowest id among those that i considers to be alive.

To know whether a replica has failed, we employ a pull-score mechanism, based on a local heartbeat counter. A leader election thread continually increments its own counter locally and uses RDMA Reads to read the counters (heartbeats) of other replicas and check whether they have been updated. It maintains a score for every other replica. If a replica has updated its counter since the last time it was read, we increment that replica's score; otherwise, we decrement it. Once a replica's score drops below a threshold, we consider it to have failed. To avoid oscillation, we have different failure and recovery thresholds, chosen so as to avoid false positives.
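The scoring rule above translates into a short loop such as the following self-contained sketch; the threshold values and field names are illustrative assumptions, not Mu's actual parameters.

    #include <cstdint>
    #include <vector>

    constexpr int kFailureThreshold  = -5;  // assumed value, for illustration
    constexpr int kRecoveryThreshold = 5;   // assumed value, for illustration

    struct RemoteReplica {
      uint64_t last_heartbeat = 0;
      int score = kRecoveryThreshold;
      bool suspected = false;
    };

    // One round of the leader-election thread: heartbeats[i] is the counter just
    // read (via RDMA Read) from replica i's background-plane memory region.
    void update_scores(std::vector<RemoteReplica>& replicas,
                       const std::vector<uint64_t>& heartbeats) {
      for (size_t i = 0; i < replicas.size(); ++i) {
        RemoteReplica& r = replicas[i];
        if (heartbeats[i] != r.last_heartbeat) {
          r.last_heartbeat = heartbeats[i];
          if (r.score < kRecoveryThreshold) ++r.score;
        } else if (r.score > kFailureThreshold) {
          --r.score;
        }
        if (r.score <= kFailureThreshold) r.suspected = true;         // declare failure
        else if (r.score >= kRecoveryThreshold) r.suspected = false;  // declare recovery
      }
    }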

9.5.2 Permission Management

The permission management module is used when changing leaders. Each replica maintains the invariant that only one replica at a time has write permission on its log. As explained in Section 2.4, when a leader changes in Mu, the new leader must request write permission from all the other replicas; this is done through a simple RDMA Write to a permission request array on the remote side. When a replica r sees a permission request from a would-be leader ℓ, r revokes write access from the current holder, grants write access to ℓ, and sends an ack to ℓ.

During the transition phase between leaders, it is possible that several replicas think themselves to be leader, and thus the permission request array may contain multiple entries. A permission management thread monitors and handles permission change requests one by one, in order of requester id, by spinning on the local permission request array.

Figure 9.4: Performance comparison of different permission switching mechanisms (time to grant/revoke access, in µs on a log scale, as a function of log size from 4MB to 4GB). QP Flags: change the access flags on a QP; QP Restart: cycle a QP through the reset, init, RTR and RTS states; MR Rereg: re-register an RDMA MR with different access flags.

RDMA provides multiple mechanisms to grant and revoke write access. The first mechanism is to register the consensus log as multiple overlapping RDMA memory regions (MRs), one per remote replica. In order to grant or revoke access from replica r, it suffices to re-register the MR corresponding to r with different access flags. The second mechanism is to revoke r's write access by moving r's QP to a non-operational state (e.g., init); granting r write access is then done by moving r's QP back to the ready-to-receive (RTR) state. The third mechanism is to grant or revoke access from replica r by changing the access flags on r's QP.

We compare the performance of these three mechanisms in Figure 9.4, as a function of the log size (which is the same as the RDMA MR size). We observe that the time to re-register an RDMA MR grows with the size of the MR, and can reach values close to 100ms for a log size of 4GB. On the other hand, the time to change a QP's access flags or cycle it through different states is independent of the MR size, with the former being roughly 10 times faster than the latter. However, changing a QP's access flags while RDMA operations to that QP are in flight sometimes causes the QP to go into an error state. Therefore, in Mu we use a fast-slow path approach: we first optimistically try to change permissions using the faster QP access flag method and, if that leads to an error, switch to the slower, but robust, QP state method.
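The fast path of this fast-slow scheme boils down to a single ibv_modify_qp call, as in the sketch below, which assumes the remote peer's access is controlled entirely through the QP's access flags; the helper name is hypothetical, and the slow-path fallback (cycling the QP through reset/init/RTR/RTS) is omitted.

    #include <infiniband/verbs.h>
    #include <cstring>

    // Change whether the remote replica connected through this QP may RDMA-write
    // into our log. Returns true on success; on failure (e.g., the QP entered an
    // error state because operations were in flight), the caller falls back to
    // the slower QP-state-cycling path.
    bool set_remote_write_access(ibv_qp* qp, bool allow_remote_write) {
      ibv_qp_attr attr;
      std::memset(&attr, 0, sizeof(attr));
      attr.qp_access_flags = IBV_ACCESS_REMOTE_READ |
                             (allow_remote_write ? IBV_ACCESS_REMOTE_WRITE : 0);
      return ibv_modify_qp(qp, &attr, IBV_QP_ACCESS_FLAGS) == 0;
    }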

9.5.3 Log Recycling

Conceptually, a log is an infinite data structure, but in practice we need to implement a circular log with limited memory. This is done as follows. Each follower has a local log head variable, pointing to the first entry not yet executed in its copy of the application. The replayer thread advances the log head each time it executes an entry in the application. Periodically, the leader's background plane reads the log heads of all followers and computes minHead, the minimum of all log head pointers read from the followers. Log entries up to minHead can be reused. Before these entries can be reused, they must be zeroed out to ensure the correct function of the canary byte mechanism. Thus, the leader zeroes all follower logs after the leader's first undecided offset and before minHead, using an RDMA Write per follower. Note that this means that a new leader must first execute all leader change actions, ensuring that its first undecided offset is higher than all followers' first undecided offsets, before it can recycle entries.

To facilitate the implementation, we ensure that the log is never completely full.
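The leader-side recycling pass just described amounts to the following sketch; read_log_head and zero_follower_range are hypothetical names standing for the RDMA Read of a follower's log head and the RDMA Write of zeroes, and the wrap-around handling of the circular log is folded into the latter.

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    uint64_t read_log_head(size_t follower);                               // RDMA Read of follower's log head
    void zero_follower_range(size_t follower, uint64_t lo, uint64_t hi);   // RDMA Write of zeroes (handles wrap-around)

    void recycle(const std::vector<size_t>& followers, uint64_t leader_fuo) {
      // minHead = minimum of all followers' log heads.
      uint64_t min_head = UINT64_MAX;
      for (size_t f : followers) min_head = std::min(min_head, read_log_head(f));
      // Entries between the leader's first undecided offset and minHead are
      // executed everywhere; zero them on every follower so that the canary-byte
      // check remains correct when the slots are reused.
      for (size_t f : followers) zero_follower_range(f, leader_fuo, min_head);
    }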

9.5.4 Adding and Removing Replicas

Mu adopts a standard method to add or remove replicas: use consensus itself to inform replicas about the change [193]. More precisely, there is a special log entry that indicates that replicas have been removed or added. Removing replicas is easy: once a replica sees it has been removed, it stops executing, while other replicas subsequently ignore any communication with it. Adding replicas is more complicated because it requires copying the state of an existing replica into the new one. To do that, Mu uses the standard approach of check-pointing state, and we do that from one of the followers [279].

9.6 Implementation

Mu is implemented in 7157 lines of C++ code (CLOC [97]). It uses the ibverbs library for RDMA over InfiniBand. We implemented all features and extensions in Sections 9.4 and 9.5, except adding/removing replicas. Moreover, we implement some standard RDMA optimizations to reduce latency. RDMA Writes and Sends with payloads below a device-specific limit (256 bytes in our setup) are inlined, meaning that their payload is written directly to their work request. We pin threads to cores in the NUMA node of the NIC.
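To illustrate the inlining optimization, the following hedged ibverbs sketch posts an RDMA Write with the IBV_SEND_INLINE flag, so that a small payload is copied into the work request and the NIC does not need a separate DMA read of the source buffer; the helper name and parameters are illustrative, not Mu's actual code.

    #include <infiniband/verbs.h>
    #include <cstdint>
    #include <cstring>

    bool post_inline_write(ibv_qp* qp, const void* buf, uint32_t len,
                           uint64_t remote_addr, uint32_t rkey) {
      ibv_sge sge;
      std::memset(&sge, 0, sizeof(sge));
      sge.addr   = reinterpret_cast<uintptr_t>(buf);
      sge.length = len;  // lkey is not needed for inlined payloads

      ibv_send_wr wr, *bad = nullptr;
      std::memset(&wr, 0, sizeof(wr));
      wr.opcode              = IBV_WR_RDMA_WRITE;
      wr.send_flags          = IBV_SEND_INLINE | IBV_SEND_SIGNALED;
      wr.sg_list             = &sge;
      wr.num_sge             = 1;
      wr.wr.rdma.remote_addr = remote_addr;
      wr.wr.rdma.rkey        = rkey;
      return ibv_post_send(qp, &wr, &bad) == 0;
    }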

9.7 Evaluation

Our goal is to evaluate whether Mu indeed provides viable replication for microsecond computing. We aim to answer the following questions in our evaluation:
• What is the replication latency of Mu? How does it change with payload size and the application being replicated? How does Mu compare to other solutions?
• What is Mu's fail-over time?
• What is the throughput of Mu?

We evaluate Mu on a 4-node cluster, where each node has two Intel Xeon E5-2640 v4 CPUs @ 2.40GHz (20 cores, 40 threads total per node), 256 GB of RAM equally split across the two NUMA domains, and a Mellanox Connect-X 4 NIC. All 4 nodes are connected to a single 100 Gbps Mellanox MSB7700 switch through 100 Gbps InfiniBand. All experiments show 3-way replication, which accounts for most real deployments [158]. With more replicas, replication latencies increase gradually with the number of replicas, up to 35% higher for Mu (for 9 replicas), and with a larger increase than Mu for other systems at every replication level.

We compare against APUS [279], DARE [250], and Hermes [178] where possible. The most recent system, HovercRaft [187], also provides SMR, but its latency at 30–60 microseconds is substantially higher than the other systems, so we do not consider it further.

For a fair comparison, we disable APUS's persistence to stable storage, since Mu, DARE, and Hermes all provide only in-memory replication.

We measure time using the POSIX clock_gettime function, with the CLOCK_MONOTONIC parameter. In our deployment, the resolution and overhead of clock_gettime is around 16ns [105]. In our figures, we show bars labeled with the median latency, with error bars showing the 99-percentile and 1-percentile latencies. These statistics are computed over 1 million samples with a payload of 64 bytes each, unless otherwise stated.
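A minimal sketch of this measurement methodology is shown below; the nearest-rank percentile estimate and the helper names are our own illustration, not Mu's exact post-processing.

    #include <time.h>
    #include <algorithm>
    #include <cstdint>
    #include <vector>

    inline uint64_t now_ns() {
      timespec ts;
      clock_gettime(CLOCK_MONOTONIC, &ts);  // monotonic clock, ~16ns overhead in our setup
      return static_cast<uint64_t>(ts.tv_sec) * 1000000000ull + ts.tv_nsec;
    }

    uint64_t percentile_ns(std::vector<uint64_t> samples, double p) {
      std::sort(samples.begin(), samples.end());
      size_t idx = static_cast<size_t>(p * (samples.size() - 1));
      return samples[idx];
    }

    // Usage: record now_ns() before and after each propose() call, push the
    // difference into a sample vector, then report percentile_ns(samples, 0.50),
    // percentile_ns(samples, 0.01), and percentile_ns(samples, 0.99).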

Applications. We use Mu to replicate several microsecond apps: three key-value stores, as well as an order matching engine for a financial exchange. The key-value stores that we replicate with Mu are Redis [256], Memcached [229], and HERD [170]. For the first two, the client is assumed to be on a different cluster, and connects to the servers over TCP. In contrast, HERD is a microsecond-scale RDMA-based key-value store. We replicate it over RDMA and use it as an example of a microsecond application. Integration with the three applications requires 183, 228 and 196 additional lines of code, respectively.

The other app is in the context of financial exchanges, in which parties unknown to each other submit buy and sell orders of stocks, commodities, derivatives, etc. At the heart of a financial exchange is an order matching engine [22], such as Liquibook [213], which is responsible for matching the buy and sell orders of the parties. We use Mu to replicate Liquibook. Liquibook's inputs are buy and sell orders. We created an unreplicated client-server version of Liquibook using eRPC [174], and then replicated this system using Mu. The eRPC integration and the replication required 611 lines of code in total.

9.7.1 Common-Case Latency

We begin by testing the overhead that Mu introduces in normal execution, when there is no leader failure. For these experiments, we first measure raw replication latency and compare Mu to other replication systems, as well as to itself under different payloads and attached applications. We then evaluate client-to-client application latency.

Effect of Payload and Application on Latency. We first study Mu in isolation, to understand its replication latency under different conditions. We evaluate the raw replication latency of Mu in two settings: standalone and attached. In the standalone setting, Mu runs just the replication layer with no application and no client; the leader simply generates a random payload and invokes propose() in a tight loop. In the attached setting, Mu is integrated into one of a number of applications; the application client produces a payload and invokes propose() on the leader. These settings could behave differently, as Mu and the application could interfere with each other. Figure 9.5 compares standalone to attached runs as we vary payload size. Liquibook and HERD allow only one payload size (32 and 50 bytes, respectively), so they have only one bar each in the graph, while Redis and Memcached have many bars.

Figure 9.5: Replication latency of Mu integrated into different applications [Memcached (mcd), Liquibook (LiQ), Redis (rds), HERD] and payload sizes, compared to standalone Mu.

We see that the standalone version slightly outperforms the attached runs, for all tested applications and payload sizes. This is due to processor cache effects; in standalone runs, replication state, such as the log and queue pairs, is always in cache, and the requests themselves need not be fetched from memory. This is not the case when attaching to an application.

Mu supports two ways of attaching to an application, which have different processor cache sharing. The direct mode uses the same thread to run both the application and the replication, and so they share L1 and L2 caches. In contrast, the handover method places the application thread on a separate core from the replication thread, thus avoiding sharing L1 or L2 caches. Because the application must communicate the request to the replication thread, the handover method requires a cache coherence miss per replicated request. This method consistently adds ≈400ns over the standalone method. For applications with large requests, this overhead might be preferable to the one caused by the direct method, where replication and application compete for CPU time. For lighter weight applications, the direct method is preferable. In our experiments, we measure both methods and show the best method for each application: Liquibook and HERD use the direct method, while Redis and Memcached use the handover method.

We see that for payloads under 256 bytes, standalone latency remains constant despite increasing payload size. This is because we can RDMA-inline requests for these payload sizes, so the amount of work needed to send a request remains the same. At a payload of 256 bytes, the NIC must do a DMA itself to fetch the value to be sent, which incurs a gradual increase in overhead as the payload size increases. However, we see that Mu still performs well even at larger payloads; at 512B, the median latency is only 35% higher than the latency of inlined payloads.

Figure 9.6: Replication latency of Mu compared with other replication solutions: DARE, Hermes, Apus on memcached (mcd), and Apus on Redis (rds).

Comparing Mu to Other Replication Systems. We now study the replication time of Mu compared to other replication systems, for various applications. This comparison is not possible for every pair of replication system and application, because certain replication systems are incompatible with certain applications. In particular, APUS works only with socket-based applications (Memcached and Redis). In DARE and Hermes, the replication protocol is bolted onto a key-value store, so we cannot attach it to the apps we consider—instead, we report their performance with their key-value stores.

Figure 9.6 shows the replication latencies of these systems. Mu's median latency outperforms all competitors by at least 2.7×, outperforming APUS on the same applications by 4×. Furthermore, Mu has smaller tail variation, with a difference of at most 500ns between the 1-percentile and 99-percentile latency. In contrast, Hermes and DARE both varied by more than 4µs across our experiments, with APUS exhibiting 99-percentile executions up to 20µs slower (cut off in the figure). We attribute this higher variance to two factors: the need to involve the CPU of many replicas in the critical path (Hermes and APUS), and sequentializing several RDMA operations so that their variance aggregates (DARE and APUS).

End-to-End Latencies. Figure 9.7 shows the end-to-end latency of our tested applications, which includes the latency incurred by the application and by replication (if enabled). We show the results in three graphs corresponding to three classes of applications.

The leftmost graph is for Liquibook. The left bar is the unreplicated version, and the right bar is replicated with Mu. We can see that the median latency of Liquibook without replication is 4.08µs, and therefore the overhead of replication is around 35%. There is a large variance in latency, even in the unreplicated system.

Figure 9.7: End-to-end latencies of applications. The first graph shows a financial exchange app (Liquibook) unreplicated and replicated with Mu. The second graph shows microsecond key-value stores: HERD unreplicated, HERD replicated with Mu, and DARE. The third graph shows traditional key-value stores: Memcached and Redis, replicated with Mu and APUS. In this graph, each bar is split in two parts: application latency (bottom) and replication latency (top).

This variance comes from the client-server communication of Liquibook, which is based on eRPC. This variance changes little with replication. The other replication systems cannot replicate Liquibook (as noted before, DARE and Hermes are bolted onto their app, and APUS can replicate only socket-based applications). However, extrapolating their latency from Figure 9.6, they would add unacceptable overheads—over 100% overhead for the best alternative (Hermes).

The middle graph in Figure 9.7 shows the client-to-client latency of replicated and unreplicated microsecond-scale key-value stores. The first bars, in orange, show HERD unreplicated and HERD replicated by Mu. The green bar shows DARE's key-value store with its own replication system. The median unreplicated latency of HERD is 2.25µs, and Mu adds 1.34µs. While this is a significant overhead (59% of the original latency), this overhead is lower than any alternative. We do not show Hermes in this graph since Hermes does not allow for a separate client, and only generates its requests on the servers themselves. HERD replicated with Mu is the best option for a replicated key-value store, with an overall median latency 2× lower than the next best option, and with a much lower variance.

The rightmost graph in Figure 9.7 shows the replication of the traditional key-value stores, Redis and Memcached. For these applications, we compare replication with Mu to replication with APUS. Each bar has two parts: the bottom is the latency of the application and client-server communication, and the top is the replication latency. Note that the scale starts at 100µs to show better precision. Mu replicates these apps about 5µs faster than APUS, a 5% difference. With a faster network, this difference would be bigger. In either case, Mu provides fault tolerant replication with essentially no overhead for these applications.

9.7.2 Fail-Over Time

We now study Mu's fail-over time. In these experiments, we run the system and subsequently introduce a leader failure. To get a thorough understanding of the fail-over time, we repeatedly introduce leader failures (1000 times) and plot a histogram of the fail-over times we observe. We also time the latency of permission switching, which corresponds to the time to change leaders after a failure is detected.

Figure 9.8: Fail-over time distribution (histograms of the permission switch time and of the total fail-over time, in µs).

The detection time is the difference between the total fail-over time and the permission switch time. We inject failures by delaying the leader, thus making it become temporarily unresponsive. This causes other replicas to observe that the leader's heartbeat has stopped changing, and thus detect a failure.

Figure 9.8 shows the results. We first note that the total fail-over time is quite low; the median fail-over time is 873µs and the 99-percentile fail-over time is 947µs, still below a millisecond. This represents an order of magnitude improvement over the best competitor at ≈10 ms (HovercRaft [187]). The time to switch permissions constitutes about 30% of the total fail-over time, with mean latency at 244µs and 99-percentile at 294µs. Recall that this measurement in fact encompasses two changes of permission at each replica: one to revoke write permission from the old leader and one to grant it to the new leader. Thus, improvements in the RDMA permission change protocol would be doubly amplified in Mu's fail-over time.

The rest of the fail-over time is attributed to failure detection (≈600µs). Although our pull-score mechanism does not rely on network variance, there is still variance introduced by process scheduling (e.g., in rare cases, the leader process is descheduled by the OS for tens of microseconds)—this is what prevented us from using smaller timeouts/scores, and it is an area under active investigation for microsecond apps [62, 246, 253, 254].

Figure 9.9: Latency vs. throughput (50th-percentile latency in µs vs. throughput in Ops/µs). Each line represents a different number of allowed concurrent outstanding requests (1, 2, 4, or 8); each point on a line represents a different batch size, shown as an annotation close to the point.

9.7.3 Throughput

While Mu optimizes for low latency, in this section we evaluate Mu's throughput. In this experiment, we run a standalone microbenchmark (not attached to an application). We increase throughput in two ways: by batching requests together before replicating, and by allowing multiple outstanding requests at a time. In each experiment, we vary the maximum number of outstanding requests allowed at a time and the batch size. Figure 9.9 shows the results in a latency-throughput graph. Each line represents a different maximum number of outstanding requests, and each data point represents a different batch size. As before, we use 64-byte requests.

We see that Mu reaches high throughput with this simple technique. At its highest point, the throughput reaches 47 Ops/µs with a batch size of 128 and 8 concurrent outstanding requests, with per-operation median latency at 17µs. Since the leader is sending requests to two other replicas, this translates to a throughput of 48Gbps, around half of the bandwidth of the NIC.

Latency and throughput both increase as the batch size increases. Median latency is also higher with more concurrent outstanding requests. However, the latency increases slowly, remaining under 10µs even with a batch size of 64 and 8 outstanding requests. There is a throughput wall at around 45 Ops/µs, where latency rises sharply. This can be traced to the transition between the client requests and the replication protocol at the leader replica: the leader must copy each request it receives into a memory region prepared for its RDMA write, and this memory operation becomes a bottleneck. We could optimize throughput further by allowing direct contact between the client and the follower replicas. However, that may not be useful, as the application itself might need some of the network bandwidth for its own operation, so the replication protocol should not saturate the network.

Increasing the number of outstanding requests while keeping the batch size constant substantially increases throughput at a small latency cost. The advantage of more outstanding requests is largest when going from one concurrent request to two: regardless of batch size, this allows substantially higher throughput at a negligible latency increase. Allowing two outstanding requests instead of one increases latency by at most 400ns for batch sizes up to 32, and by only 1.1µs at a batch size of 128, while increasing throughput by 20–50% depending on batch size. This effect grows less pronounced with higher numbers of outstanding requests. Similarly, increasing the batch size increases throughput with a small latency hit for small batches, but the latency hit grows for larger batches. Notably, using 2 outstanding requests and a batch size of 32 keeps the median latency at only 3.4µs while achieving a throughput of nearly 30 Ops/µs.
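For concreteness, the following is a minimal sketch of the kind of batching loop described above: the leader packs pending requests into batches and caps the number of replication operations in flight. It is not Mu's code; `replicate_batch_async` and `wait_one` are hypothetical stand-ins for issuing a replication write and waiting for one completion.

```cpp
// Illustrative sketch of batching with a bounded number of outstanding
// replication operations. The replication back end is stubbed out.
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

using Request = std::string;                 // e.g., a 64-byte payload

struct Replicator {
    std::size_t max_batch;                   // e.g., 32 or 128
    std::size_t max_outstanding;             // e.g., 1, 2, 4, or 8
    std::size_t in_flight = 0;

    // Stand-ins for the replication back end (assumed, not Mu's API).
    void replicate_batch_async(const std::vector<Request>& batch) {
        (void)batch;                         // stub: issue the replication write
    }
    void wait_one() { /* stub: wait for one completion */ }

    void run(std::deque<Request>& incoming) {
        while (!incoming.empty()) {
            // If the window of outstanding operations is full, drain one first.
            if (in_flight == max_outstanding) {
                wait_one();
                --in_flight;
            }
            // Pack up to max_batch pending requests into one replication op.
            std::vector<Request> batch;
            while (!incoming.empty() && batch.size() < max_batch) {
                batch.push_back(std::move(incoming.front()));
                incoming.pop_front();
            }
            replicate_batch_async(batch);
            ++in_flight;
        }
        while (in_flight > 0) { wait_one(); --in_flight; }  // drain the window
    }
};

int main() {
    Replicator r{32, 2};                                  // batch 32, 2 outstanding
    std::deque<Request> q(1000, Request(64, 'x'));        // 1000 64-byte requests
    r.run(q);
}
```

Sweeping `max_batch` over the batch sizes and `max_outstanding` over {1, 2, 4, 8} corresponds to the parameter space explored in Figure 9.9.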

9.8 Related Work

SMR in General. State machine replication is a common technique for building fault-tolerant, highly available services [193, 263]. Many practical SMR protocols have been designed, addressing simplicity [61, 67, 158, 198, 242], cost [189, 196], and harsher failure assumptions [72, 73, 126, 189]. In the original scheme, which we follow, the order of all operations is agreed upon using consensus instances. At a high level, our Mu protocol resembles the classical Paxos algorithm [193], but there are some important differences. In particular, we leverage RDMA's ability to grant and revoke access permissions to ensure that two leader replicas cannot both write a value without recognizing each other's presence. This allows us to optimize out participation from the follower replicas, leading to better performance. Furthermore, these dynamic permissions guide our unique leader-change mechanism.

Several implementations of Multi-Paxos avoid repeating Paxos's prepare phase for every consensus instance, as long as the same leader remains [78, 194, 226]. Piggybacking a commit message onto the next replicated request, as is done in Mu, is also used as a latency-hiding mechanism in [226, 279].

Aguilera et al. [12] suggested the use of local heartbeats in a leader election algorithm designed for a theoretical message-and-memory model, in an approach similar to our pull-score mechanism. However, no system has so far implemented such local heartbeats for leader election over RDMA.

Single round-trip replication has been achieved in several previous works using two-sided sends and receives [108, 178, 180, 189, 196]. Theoretical work has shown that single-shot consensus can be achieved in a single one-sided round trip [13]. However, Mu is the first system to put that idea to work and implement one-sided, single-round-trip SMR.

Alternative reliable replication schemes totally order only conflicting operations [86, 156, 178, 195, 247, 249, 265]. These schemes require opening up the service being replicated to identify which operations commute. In contrast, we designed Mu assuming the replicated service is a black box. If desired, several parallel instances of Mu could be used to replicate concurrent operations that commute; this could be used to increase throughput in specific applications.

It is also important to note that we consider “crash” failures; in particular, we assume nodes cannot behave in a Byzantine manner [72, 84, 189].

Improving the Stack Underlying SMR. While we propose a new SMR algorithm adapted to RDMA in order to optimize latency, other systems keep a classical algorithm but improve the underlying communication stack [174, 208]. With this approach, somewhat orthogonal to ours, the best reported replication latency is 5.5 µs [174], almost 5× slower than Mu. HovercRaft [187] shifts the SMR from the application layer to the transport layer to avoid I/O and CPU bottlenecks on the leader replica. However, its request latency is more than an order of magnitude higher than Mu's, and it does not optimize fail-over time. Some SMR systems leverage recent technologies such as programmable switches and NICs [164, 169, 209, 216]. However, programmable networks are not as widely available as RDMA, which has been commoditized with technologies such as RoCE and iWARP.

SMR over RDMA. A few SMR systems have recently been designed for RDMA [41, 250, 279]. DARE [250] was the first RDMA-based SMR system. Similarly to Mu, DARE uses only one-sided RDMA verbs, executed by the leader, to replicate the log in the normal execution. However, DARE requires updating the tail pointer of each replica's log in a separate RDMA Write from the one that copies over the new value, and therefore incurs more round trips for replication. Furthermore, DARE has a heavier leader election protocol than Mu's: DARE's leader election is similar to the one used in Raft, in which care is taken to ensure that at most one process considers itself leader at any point in time. APUS [279] improved upon DARE's throughput. However, APUS requires active participation from the follower replicas during the replication protocol, resulting in higher latencies. Both DARE and APUS use transitions through queue-pair states to allow or deny RDMA access. This is reminiscent of our permissions approach, but is less fine-grained.

Derecho [41] provides durable and non-durable SMR by combining a data-movement protocol (SMC or RDMC) with a shared-state table primitive (SST) for determining when it is safe to deliver messages. This design yields high throughput but also high latency: a minimum of 10µs for non-durable SMR [41, Figure 12(b)] and more for durable SMR. This latency results from a node delaying the delivery of a message until all nodes have confirmed its receipt using the SST, which takes additional RDMA communication steps compared to Mu. It would be interesting to explore how Mu's protocol could improve a large system like Derecho.

Other RDMA Applications. More generally, RDMA has recently been the focus of many data center system designs, including key-value stores [106, 170] and transactions [173, 280]. Kalia et al. provide guidelines on the best ways to use RDMA to enhance performance [172]. Many of their suggested optimizations are employed by Mu. Kalia et al. also advocate the use of two-sided RDMA verbs (Sends/Receives) instead of RDMA Reads in situations in which a single RDMA Read might not suffice. However, this does not apply to Mu, since we know a priori which memory location should be read, and we rarely have to follow up with another read.

Failure Detection. Failure detection is typically done using timeouts. Conventional wisdom is that timeouts must be large, on the order of seconds [205], though some systems report timeouts as low as 10 milliseconds [187]. It is possible to improve detection time using inside information [205, 206] or fine-grained reporting [207], but this requires changes to apps and/or the infrastructure. This approach is orthogonal to our score-based mechanism and could be used to further improve Mu.

9.9 Conclusion

Computers have progressed from batch-processing systems that operate at the time scale of minutes, to progressively lower latencies in the seconds, then milliseconds, and now we are in the microsecond revolution. Work has already started in this space at various layers of the computing stack. Our contribution fits in this context, by providing generic microsecond replication for microsecond apps.

Mu is a state machine replication system that can replicate microsecond applications with little overhead. This involved two goals: achieving low latency on the common path, and minimizing fail-over time to maintain high availability. To reach these goals, Mu relies on (a) RDMA permissions to replicate a request with a single one-sided operation, as well as (b) a failure detection mechanism that does not incur false positives due to network delays, a property that permits Mu to use aggressively small timeout values.

Chapter 10

Conclusion

In this thesis, we bridge the gap between theory and practice in concurrent and distributed computing by building and strengthening the theoretical foundations of the study of practical systems. We take two broad approaches toward this goal: first, we refine the classic shared-memory model, and second, we study and model new technologies. However, plenty of work remains in the effort to close the gap between theory and practice in this field, with many promising directions and open questions arising from the work in this thesis.

Analyzing Shared-Memory Algorithms. Our tool for studying memory schedulers, Severus, along with the experiments presented in Chapter 3, revealed complicated patterns that changed depending on many factors. However, some patterns persisted across the different workloads and architectures tested; we believe that such patterns can be abstracted into a more accurate model of memory-operation schedulers. In particular, a promising direction is to use Severus to extract a stochastic model that approximates the observed scheduler behavior. Since the schedules produced by different workloads on different machines can vary greatly and depend on many parameters, we believe that machine learning can aid in creating faithful stochastic models. Plugging such a stochastic model into the modular framework presented in Chapter 2 could yield a model that reflects practice more closely than before, while still accounting for real-world phenomena like the practical success of back-off.
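As a toy illustration of the kind of stochastic model one might extract (purely hypothetical, and not part of Severus or of the framework in Chapter 2), the sketch below fits a first-order Markov chain to an observed schedule, i.e., a sequence recording which core's memory operation was serviced at each step, and reports the estimated transition probabilities.

```cpp
// Toy sketch: estimate P(next winner | current winner) from an observed
// sequence of winner core IDs. Purely illustrative; not part of Severus.
#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
    const int kCores = 4;
    // Hypothetical observed schedule (winner core at each step).
    std::vector<int> schedule = {0, 0, 1, 1, 0, 2, 2, 2, 3, 0, 0, 1, 2, 3, 3, 0};

    // Count transitions between consecutive winners.
    std::vector<std::vector<double>> counts(kCores, std::vector<double>(kCores, 0.0));
    for (std::size_t i = 0; i + 1 < schedule.size(); i++)
        counts[schedule[i]][schedule[i + 1]] += 1.0;

    // Normalize each row into transition probabilities.
    for (int a = 0; a < kCores; a++) {
        double row = 0.0;
        for (int b = 0; b < kCores; b++) row += counts[a][b];
        for (int b = 0; b < kCores; b++) {
            double p = row > 0 ? counts[a][b] / row : 0.0;
            std::printf("P(next=%d | cur=%d) = %.2f\n", b, a, p);
        }
    }
}
```

A model fitted from real Severus traces would of course need to condition on more than the previous winner, which is where the machine-learning direction mentioned above could come in.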

Non-Volatile Memories. The study of non-volatile main memories is relatively young. Unsurprisingly, as is typical when a new problem is tackled, several different correctness criteria have been suggested to capture the meaning of persistence [8, 30, 56, 59, 122, 166]. Currently, very little is understood about the comparative power of these different definitions. Do some of the correctness conditions in the literature imply others? Are they incomparable? Does one correctness condition admit solutions that are provably more efficient than another? And, perhaps most importantly, do some of these conditions better capture the desired behavior of practical persistent code? We believe that answering these questions can help streamline the study of NVRAMs, and help us gain clarity about what should or should not be expected from persistent algorithms upon recovery.

On a more practical note, we believe that NVRAM has the most potential to make an impact on applications that already rely on persistence (through disk rather than main memory). Such applications include large databases and file systems, which must store large amounts of data and be accessible for long periods of time. When we discussed the performance of our persistent simulations in Chapter 5, we viewed our goal as minimizing the overheads that we introduce to gain persistence. This may give the impression that we must trade off performance for persistence. However, in applications in which persistence is essential, and which therefore currently rely on disk for persistence, using NVRAM-based solutions could potentially improve performance by orders of magnitude. Applying NVRAM to file systems and databases is already an active area of research [74, 92, 202, 219, 278]. However, many interesting open problems remain in identifying the best data structure for the job, and in integrating these applications with persistent data structures that, by residing in main memory rather than on disk, can support significantly more concurrent accesses than they could before.

RDMA and Consensus. In Chapter 7, we demonstrate that RDMA can be used sparingly to yield consensus algorithms that are highly scalable and tolerate more failures than is possible in classic networks. In Chapter 8, we prove that RDMA can also be used to achieve impressive common-case performance. However, the latter result does not apply to the large networks considered in the former case. An interesting open problem is to understand whether this disparity is inherent, or whether we can design a highly fault-tolerant RDMA-based algorithm that both scales to large networks and maintains the same common-case performance. If this is impossible, what is the tradeoff between scalability, fault tolerance, and performance in RDMA networks?

Furthermore, we only briefly touched upon the potential of employing RDMA to tolerate Byzantine failures. In Chapter 8, we proved that it is possible to tolerate f < n/2 Byzantine failures when solving consensus in the permissioned M&M model. However, our solution is far from being a practical one; while it boasts optimal common-case performance, its running time quickly deteriorates once network conditions are less than ideal. Furthermore, when building a system that promises to withstand Byzantine attacks, one must ensure that no side-channel attacks are possible. This means not only relying on the theoretical model, but also studying the security of the permission mechanism of RDMA.

Finally, the insights about RDMA gained in this thesis have focused only on the problem of consensus. However, we believe that RDMA can also be used to strengthen solutions to other problems. In fact, some work has already studied the potential of RDMA (and, in particular, the M&M model) to solve leader election [12] and to implement global shared registers [139]. It would be interesting to discover the power of RDMA in other problems as well.

Other Directions. We believe that it is important to continue to narrow the gap between theory and practice in concurrent and distributed systems. Pursuing this long-term goal can take many forms, including studying the relationships between different hardware technologies, and applying tools from other subfields of computer science to help find clean patterns in complicated real systems and to apply new theoretical insights in practice.

Bibliography

[1] Ittai Abraham, Gregory Chockler, Idit Keidar, and Dahlia Malkhi. Byzantine disk paxos: optimal resilience with byzantine shared memory. Distributed computing (DIST), 18(5): 387–408, 2006. [2] Karl Abrahamson. On achieving consensus using a shared memory. In ACM Symposium on Principles of Distributed Computing (PODC), pages 291–302, August 1988. [3] Umut A. Acar, Arthur Charguéraud, and Mike Rainey. Scheduling parallel programs by work stealing with private deques. In ACM Symposium on Principles and Practice of Parallel Programming (PPoPP), 2013. [4] Umut A. Acar, Arthur Charguéraud, Mike Rainey, and Filip Sieczkowski. Dag-calculus: A calculus for parallel computation. In ACM SIGPLAN International Conference on Func- tional Programming (ICFP), 2016. [5] Umut A Acar, Naama Ben-David, and Mike Rainey. Contention in structured concurrency: Provably efficient dynamic non-zero indicators for nested parallelism. ACM Symposium on Principles and Practice of Parallel Programming (PPoPP), 2017. [6] Yehuda Afek, David S. Greenberg, Michael Merritt, and Gadi Taubenfeld. Computing with faulty shared memory. In ACM Symposium on Principles of Distributed Computing (PODC), pages 47–58, August 1992. [7] A. Agarwal and M. Cherian. Adaptive backoff synchronization techniques. In Interna- tional Symposium on Computer Architecture (ISCA), 1989. [8] Marcos K Aguilera and Svend Frølund. Strict linearizability and the power of aborting. Technical Report HPL-2003-241, 2003. [9] Marcos K. Aguilera and Sam Toueg. The correctness proof of Ben-Or’s randomized con- sensus algorithm. Distributed Computing, 25(5):371–381, October 2012. [10] Marcos K. Aguilera, Nadav Amit, Irina Calciu, Xavier Deguillard, Jayneel Gandhi, Pratap Subrahmanyam, Lalith Suresh, Kiran Tati, Rajesh Venkatasubramanian, and Michael Wei. Remote memory in the age of fast networks. In Symposium on Cloud Computing (SoCC), pages 121–127, September 2017. [11] Marcos K. Aguilera, Nadav Amit, Irina Calciu, Xavier Deguillard, Jayneel Gandhi, Stanko Novakovic, Arun Ramanathan, Pratap Subrahmanyam, Lalith Suresh, Kiran Tati, Rajesh Venkatasubramanian, and Michael Wei. Remote regions: a simple abstraction for remote memory. In USENIX Annual Technical Conference (ATC), July 2018.

187 [12] Marcos K Aguilera, Naama Ben-David, Irina Calciu, Rachid Guerraoui, Erez Petrank, and Sam Toueg. Passing messages while sharing memory. In ACM Symposium on Principles of Distributed Computing (PODC), pages 51–60. ACM, 2018. [13] Marcos K. Aguilera, Naama Ben-David, Rachid Guerraoui, Virendra Marathe, and Igor Zablotchi. The impact of RDMA on agreement. In ACM Symposium on Principles of Distributed Computing (PODC), pages 409–418, 2019. [14] Marcos K Aguilera, Naama Ben-David, Rachid Guerraoui, Virendra Marathe, Athansios Xygkis, and Igor Zablotchi. Microsecond consensus for microsecond applications. In Symposium on Operating Systems Design and Implementation (OSDI), 2020. [15] Dan Alistarh, Thomas Sauerwald, and Milan Vojnovic.´ Lock-free algorithms under stochastic schedulers. In ACM Symposium on Principles of Distributed Computing (PODC), pages 251–260. ACM, 2015. [16] Dan Alistarh, Keren Censor-Hillel, and Nir Shavit. Are lock-free concurrent algorithms practically wait-free? Journal of the ACM (JACM), 63(4):31, 2016. [17] Noga Alon, Michael Merritt, , Gadi Taubenfeld, and Rebecca N Wright. Tight bounds for shared memory systems accessed by byzantine processes. Distributed computing (DIST), 18(2):99–109, 2005. [18] Cristiana Amza, Alan L. Cox, Shandya Dwarkadas, Pete Keleher, Honghui Lu, Ramakr- ishnan Rajamony, Weimin Yu, and Willy Zwaenepoel. TreadMarks: Shared memory computing on networks of workstations. IEEE Computer, 29(2):18–28, February 1996. [19] T. E. Anderson. The performance of spin lock alternatives for shared-memory multipro- cessors. IEEE Trans. Parallel Distrib. Syst., 1990. [20] Thomas E. Anderson. The performance of spin lock alternatives for shared-money multi- processors. IEEE Transactions on Parallel and Distributed Systems, 1(1):6–16, 1990. [21] Anonymous. When microseconds count: Fast current loop innovation helps motors work smarter, not harder. http://e2e.ti.com/blogs_/b/thinkinnovate/ archive/2017/11/14/when-microseconds-count-fast-current- loop-innovation-helps-motors-work-smarter-not-harder. [22] anonymous. Order matching system. https://en.wikipedia.org/wiki/ Order_matching_system. [23] Anonymous. 1588-2019—ieee approved draft standard for a precision clock synchroniza- tion protocol for networked measurement and control systems. https://standards. ieee.org/content/ieee-standards/en/standard/1588-2019.html. [24] Krste Asanovic and David Patterson. FireBox: A hardware building block for 2020 warehouse-scale computers. In USENIX Conference on File and Storage Technologies (FAST), February 2014. [25] James Aspnes and . Fast randomized consensus using shared memory. Journal of algorithms, 11(3):441–461, September 1990. [26] Aras Atalar, Paul Renaud-Goud, and Philippas Tsigas. Analyzing the performance of lock- free data structures: A conflict-based model. In International Symposium on Distributed

188 Computing (DISC), pages 341–355. Springer, 2015. [27] Aras Atalar, Paul Renaud-Goud, and Philippas Tsigas. How lock-free data structures per- form in dynamic environments: Models and analyses. arXiv preprint arXiv:1611.05793, 2016. [28] Hagit Attiya and Keren Censor. Tight bounds for asynchronous randomized consensus. Journal of the ACM (JACM), 55(5):20, October 2008. [29] Hagit Attiya, Amotz Bar-Noy, and Danny Dolev. Sharing memory robustly in message- passing systems. Journal of the ACM (JACM), 42(1):124–142, January 1995. [30] Hagit Attiya, Ohad Ben Baruch, and Danny Hendler. Nesting-safe recoverable lineariz- ability: Modular constructions for non-volatile memory. In ACM Symposium on Principles of Distributed Computing (PODC), 2018. [31] Pierre-Louis Aublin, Rachid Guerraoui, Nikola Kneževic,´ Vivien Quéma, and Marko Vukolic.´ The next 700 BFT protocols. ACM Transactions on Computer Systems (TOCS), 32(4):12, 2015. [32] Yonatan Aumann and Michael A Bender. Efficient asynchronous consensus with the value-oblivious adversary scheduler. pages 622–633. Springer, 1996. [33] Baruch Awerbuch, Andrea Richa, and Christian Scheideler. A jamming-resistant mac pro- tocol for single-hop wireless networks. In ACM Symposium on Principles of Distributed Computing (PODC), pages 45–54. ACM, 2008. [34] R. Bar-Yehuda, A. Israeli, and A. Itai. Multiple communication in multi-hop radio net- works. SIAM Journal on Computing, 22(4):875–887, 1993. [35] Reuven Bar-Yehuda, Oded Goldreich, and Alon Itai. On the time-complexity of broadcast in multi-hop radio networks: An exponential gap between determinism and randomiza- tion. Journal of Computer and System Sciences, 45(1):104–126, 1992. [36] Bradley J Barnes, Barry Rountree, David K Lowenthal, Jaxk Reeves, Bronis De Supinski, and Martin Schulz. A regression-based approach to scalability prediction. In Proceedings of the 22nd annual international conference on Supercomputing, pages 368–377. ACM, 2008. [37] Luiz Barroso, Mike Marty, David Patterson, and Parthasarathy Ranganathan. Attack of the killer microseconds. Communications of the ACM, 60(4):48–54, April 2017. [38] Claude Barthels, Simon Loesing, Gustavo Alonso, and Donald Kossmann. Rack-scale in-memory join processing using RDMA. In International Conference on Management of Data (SIGMOD), pages 1463–1475, May 2015. [39] Andrew Baumann, Paul Barham, Pierre-Evariste Dagand, Tim Harris, Rebecca Isaacs, Simon Peter, Timothy Roscoe, Adrian Schüpbach, and Akhilesh Singhania. The multi- kernel: A new OS architecture for scalable multicore systems. In ACM Symposium on Operating Systems Principles (SOSP), pages 29–44, October 2009. [40] Rida Bazzi and Gil Neiger. Optimally simulating crash failures in a byzantine environ- ment. In International Workshop on Distributed Algorithms (WDAG), pages 108–128. Springer, 1991.

189 [41] Jonathan Behrens, Sagar Jha, Matthew Milano, Edward Tremel, Ken Birman, and Robbert van Renesse. The Derecho project. https://derecho-project.github.io. [42] Jonathan Behrens, Ken Birman, Sagar Jha, Matthew Milano, Edward Tremel, Eugene Bagdasaryan, Theo Gkountouvas, Weijia Song, and Robbert Van Renesse. Derecho: Group communication at the speed of light. Technical report, Technical Report. Cornell University, 2016. [43] Naama Ben-David and Guy E Blelloch. Analyzing contention and backoff in asyn- chronous shared memory. In ACM Symposium on Principles of Distributed Computing (PODC), pages 53–62. ACM, 2017. [44] Naama Ben-David, Guy E Blelloch, Jeremy T Fineman, Phillip B Gibbons, Yan Gu, Charles McGuffey, and Julian Shun. Parallel algorithms for asymmetric read-write costs. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 145– 156, 2016. [45] Naama Ben-David, David Yu Cheng Chan, Vassos Hadzilacos, and Sam Toueg. k- abortable objects: progress under high contention. In International Symposium on Dis- tributed Computing (DISC), pages 298–312. Springer, 2016. [46] Naama Ben-David, Guy E Blelloch, Jeremy T Fineman, Phillip B Gibbons, Yan Gu, Charles McGuffey, and Julian Shun. Implicit decomposition for write-efficient connectiv- ity algorithms. pages 711–722. IEEE, 2018. [47] Naama Ben-David, Guy E Blelloch, Michal Friedman, and Yuanhao Wei. Delay-free con- currency on faulty persistent memory. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 253–264, 2019. [48] Naama Ben-David, Guy E Blelloch, Yihan Sun, and Yuanhao Wei. Multiversion con- currency with bounded delay and precise garbage collection. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 241–252, 2019. [49] Naama Ben-David, Ziv Scully, and Guy Blelloch. Severus, August 2019. URL https: //doi.org/10.5281/zenodo.3360044. [50] Naama Ben-David, Ziv Scully, and Guy E Blelloch. Unfair scheduling patterns in NUMA architectures. In International Conference on Parallel Architecture and Compilation (PACT), pages 205–218. IEEE, 2019. [51] Michael Ben-Or. Another advantage of free choice (extended abstract): Completely asyn- chronous agreement protocols. In ACM Symposium on Principles of Distributed Comput- ing (PODC), pages 27–30, August 1983. [52] Michael A Bender, Martin Farach-Colton, Simai He, Bradley C Kuszmaul, and Charles E Leiserson. Adversarial contention resolution for simple channels. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 325–332. ACM, 2005. [53] Michael A Bender, Jeremy T Fineman, Seth Gilbert, and Maxwell Young. How to scale exponential backoff: Constant throughput, polylog access attempts, and robustness. In ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 636–654. Society for In- dustrial and Applied Mathematics, 2016.

190 [54] Michael A Bender, Tsvi Kopelowitz, Seth Pettie, and Maxwell Young. Contention reso- lution with log-logstar channel accesses. pages 499–508, 2016. [55] J. K. Bennett, J. B. Carter, and W. Zwaenepoel. Munin: Distributed shared memory based on type-specific memory coherence. In ACM Symposium on Principles and Practice of Parallel Programming (PPoPP), pages 168–176, March 1990. [56] Ryan Berryhill, Wojciech Golab, and Mahesh Tripunitara. Robust shared objects for non- volatile main memory. In International Conference on Principles of Distributed Systems (OPODIS), volume 46, 2016. [57] Alysson Neves Bessani, Miguel Correia, Joni da Silva Fraga, and Lau Cheuk Lung. Shar- ing memory between byzantine processes using policy-enforced tuple spaces. IEEE Trans- actions on Parallel and Distributed Systems, 20(3):419–432, 2009. [58] Sergey Blagodurov, Sergey Zhuravlev, Alexandra Fedorova, and Ali Kamali. A case for numa-aware contention management on multicore systems. In USENIX Annual Technical Conference (ATC), 2011. [59] Guy Blelloch, Phillip Gibbons, Yan Gu, Charles McGuffey, and Julian Shun. The par- allel persistent memory model. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), 2018. [60] Guy E Blelloch, Jeremy T Fineman, Phillip B Gibbons, Yan Gu, and Julian Shun. Efficient algorithms with asymmetric read and write costs. ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), 2015. [61] Romain Boichat, Partha Dutta, Svend Frølund, and Rachid Guerraoui. Deconstructing paxos. ACM Sigact News, 34(1):47–67, 2003. [62] Sol Boucher, Anuj Kalia, and David G. Andersen. Putting the “micro” back in microser- vice. pages 645–650, July 2018. [63] Zohir Bouzid, Damien Imbs, and Michel Raynal. A necessary condition for byzantine k-set agreement. Information Processing Letters, 116(12):757–759, 2016. [64] Gabriel Bracha. Asynchronous byzantine agreement protocols. Information and Compu- tation, 75(2):130–143, 1987. [65] Gabriel Bracha and Sam Toueg. Asynchronous consensus and broadcast protocols. Jour- nal of the ACM (JACM), 32(4):824–840, 1985. [66] Francisco Brasileiro, Fabíola Greve, Achour Mostéfaoui, and Michel Raynal. Consensus in one communication step. In International Conference on Parallel Computing Technolo- gies, pages 42–50. Springer, 2001. [67] Mike Burrows. The Chubby lock service for loosely-coupled distributed systems. In Symposium on Operating Systems Design and Implementation (OSDI), pages 335–350, 2006. [68] Irina Calciu, Dave Dice, Tim Harris, Maurice Herlihy, Alex Kogan, Virendra Marathe, and Mark Moir. Message passing or shared memory: Evaluating the delegation abstraction for multicores. In International Conference on Principles of Distributed Systems (OPODIS), pages 83–97, December 2013.

191 [69] Irina Calciu, JE Gottschlich, and Maurice Herlihy. Using delegation and elimination to implement a scalable numa-friendly stack. In USENIX Workshop on Hot Topics in Paral- lelism (HotPar), 2013. [70] Irina Calciu, Siddhartha Sen, Mahesh Balakrishnan, and Marcos K Aguilera. Black-box concurrent data structures for numa architectures. International Conference on Archi- tectural Support for Programming Languages and Operating Systems (ASPLOS), 51(2): 207–221, 2017. [71] Laura Carrington, Allan Snavely, and Nicole Wolter. A performance prediction framework for scientific applications. Future Generation Computer Systems, 22(3):336–346, 2006. [72] Miguel Castro and Barbara Liskov. Practical Byzantine fault tolerance. In Symposium on Operating Systems Design and Implementation (OSDI), pages 173–186, February 1999. [73] Miguel Castro, Rodrigo Rodrigues, and Barbara Liskov. BASE: Using abstraction to improve fault tolerance. ACM Transactions on Computer Systems (TOCS), 21(3), August 2003. [74] Dhruva R Chakrabarti, Hans-J Boehm, and Kumud Bhandari. Atlas: Leveraging locks for non-volatile memory consistency. In ACM SIGPLAN Notices, volume 49, pages 433–452. ACM, 2014. [75] Satish Chandra, James R. Larus, and Anne Rogers. Where is time spent in message- passing and shared-memory programs? In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 61–73, October 1994. [76] Tushar Deepak Chandra and Sam Toueg. Unreliable failure detectors for reliable dis- tributed systems. Journal of the ACM (JACM), 43(2):225–267, March 1996. [77] Tushar Deepak Chandra, Vassos Hadzilacos, and Sam Toueg. The weakest failure detector for solving consensus. Journal of the ACM (JACM), 43(4):685–722, July 1996. [78] Tushar Deepak Chandra, Robert Griesemer, and Joshua Redstone. Paxos made live: An engineering perspective. In ACM Symposium on Principles of Distributed Computing (PODC), pages 398–407, August 2007. [79] Philippe Charles, Christian Grothoff, Vijay Saraswat, Christopher Donawa, Allan Kiel- stra, Kemal Ebcioglu, Christoph von Praun, and Vivek Sarkar. X10: an object-oriented approach to non-uniform cluster computing. In Object-oriented Programming, Systems, Languages, and Applications (OOPSLA), 2005. [80] Himanshu Chauhan, Irina Calciu, Vijay Chidambaram, Eric Schkufza, Onur Mutlu, and Pratap Subrahmanyam. Nvmove: Helping programmers move to byte-based persistence. In INFLOW (OSDI), 2016. [81] Shimin Chen and Qin Jin. Persistent b+-trees in non-volatile main memory. Proceedings of the VLDB Endowment, 8(7):786–797, 2015. [82] Byung-Gon Chun, Petros Maniatis, , and John Kubiatowicz. Attested append-only memory: making adversaries stick to their word. In ACM Symposium on Operating Systems Principles (SOSP), pages 189–204, 2007.

192 [83] Byung-Gon Chun, Petros Maniatis, and Scott Shenker. Diverse replication for single- machine byzantine-fault tolerance. In USENIX Annual Technical Conference (ATC), pages 287–292, 2008. [84] Allen Clement, Edmund L. Wong, Lorenzo Alvisi, Michael Dahlin, and Mirco Marchetti. Making Byzantine fault tolerant systems tolerate Byzantine faults. In Symposium on Net- worked Systems Design and Implementation (NSDI), pages 153–168, 2009. [85] Allen Clement, Flavio Junqueira, Aniket Kate, and Rodrigo Rodrigues. On the (limited) power of non-equivocation. In ACM Symposium on Principles of Distributed Computing (PODC), pages 301–308. ACM, 2012. [86] Austin T. Clements, M. Frans Kaashoek, Nickolai Zeldovich, Robert Tappan Morris, and Eddie Kohler. The scalable commutativity rule: designing scalable software for multicore processors. In ACM Symposium on Operating Systems Principles (SOSP), pages 1–17, 2013. [87] Nachshon Cohen and Erez Petrank. Efficient memory management for lock-free data structures with optimistic access. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 254–263. ACM, 2015. [88] Nachshon Cohen, Michal Friedman, and James R Larus. Efficient logging in non-volatile memory by exploiting coherency protocols. Proceedings of the ACM on Programming Languages, 1(OOPSLA):67, 2017. [89] Nachshon Cohen, Rachid Guerraoui, and Mihail Igor Zablotchi. The inherent cost of remembering consistently. In ACM Symposium on Parallelism in Algorithms and Archi- tectures (SPAA), 2018. [90] Alexei Colin and Brandon Lucia. Termination checking and task decomposition for task- based intermittent programs. In International Conference on Compiler Construction, 2018. [91] Colin Cooper, Tomasz Radzik, Nicolas Rivera, and Takeharu Shiraga. Fast plurality consensus in regular expanders. In International Symposium on Distributed Computing (DISC), pages 13:1–13:16, October 2017. [92] Andreia Correia, Pascal Felber, and Pedro Ramalhete. Romulus: Efficient algorithms for persistent transactional memory. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 271–282. ACM, 2018. [93] Miguel Correia, Nuno Ferreira Neves, and Paulo Veríssimo. How to tolerate half less one byzantine nodes in practical distributed systems. In IEEE Symposium on Reliable Distributed Systems (SRDS), pages 174–183, 2004. [94] Miguel Correia, Giuliana S Veronese, and Lau Cheuk Lung. Asynchronous byzantine consensus with 2f+ 1 processes. In Symposium On Applied Computing (SAC), pages 475– 480. ACM, 2010. [95] Alexandros Daglis, Dmitrii Ustiugov, Stanko Novakovic, Edouard Bugnion, Babak Fal- safi, and Boris Grot. SABRes: Atomic object reads for in-memory rack-scale computing. In International Symposium on Microarchitecture (MICRO), pages 1–13, October 2016.

193 [96] Alexandros Daglis, Mark Sutherland, and Babak Falsafi. RPCValet: NI-driven tail-aware balancing of µs-scale RPCs. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 35–48, April 2019. [97] Al Danial. cloc: Count lines of code. https://github.com/AlDanial/cloc. [98] Tudor David, Rachid Guerraoui, and Vasileios Trigonakis. Everything you always wanted to know about synchronization but were afraid to ask. In ACM Symposium on Operating Systems Principles (SOSP), pages 33–48. ACM, 2013. [99] Tudor David, Rachid Guerraoui, and Maysam Yabandeh. Consensus inside. In Interna- tional Middleware Conference, pages 145–156, December 2014. [100] Tudor David, Aleksandar Dragojevic, Rachid Guerraoui, and Igor Zablotchi. Log-free concurrent data structures. In USENIX Annual Technical Conference (ATC). 2018. [101] Marc de Kruijf and Karthikeyan Sankaralingam. Idempotent processor architecture. In Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitec- ture. ACM, 2011. [102] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. In Symposium on Operating Systems Design and Implementation (OSDI), pages 137–150, December 2004. [103] Dan Dobre and Neeraj Suri. One-step consensus with zero-degradation. In International Conference on Dependable Systems and Networks (DSN), pages 137–146. IEEE Com- puter Society, 2006. [104] Danny Dolev, Cynthia Dwork, and Larry Stockmeyer. On the minimal synchronism needed for distributed consensus. Journal of the ACM (JACM), 34(1):77–97, January 1987. [105] Travis Downs. A benchmark for low-level cpu micro-architectural features. https: //github.com/travisdowns/uarch-bench. [106] Aleksandar Dragojevic,´ Dushyanth Narayanan, Miguel Castro, and Orion Hodson. FaRM: Fast remote memory. In Symposium on Networked Systems Design and Implementation (NSDI), pages 401–414, April 2014. [107] Aleksandar Dragojevic, Dushyanth Narayanan, Ed Nightingale, Matthew Renzelmann, Alex Shamis, Anirudh Badam, and Miguel Castro. No compromises: distributed transac- tions with consistency, availability, and performance. In ACM Symposium on Operating Systems Principles (SOSP), October 2015. [108] Partha Dutta, Rachid Guerraoui, and . How fast can eventual synchrony lead to consensus? In International Conference on Dependable Systems and Networks (DSN), pages 22–27. IEEE, 2005. [109] Cynthia Dwork, Nancy Lynch, and Larry Stockmeyer. Consensus in the presence of partial synchrony. Journal of the ACM (JACM), pages 288–323, April 1988. [110] Cynthia Dwork, Maurice Herlihy, and Orli Waarts. Contention in shared memory algo- rithms. Journal of the ACM (JACM), 44(6):779–805, 1997.

194 [111] Faith Ellen, Yossi Lev, Victor Luchangco, and Mark Moir. Snzi: Scalable nonzero indica- tors. In ACM Symposium on Principles of Distributed Computing (PODC), pages 13–22. ACM, 2007. [112] Faith Ellen, Panagiota Fatourou, Eric Ruppert, and Franck van Breugel. Non-blocking binary search trees. In ACM Symposium on Principles of Distributed Computing (PODC), pages 131–140. ACM, 2010. [113] Faith Ellen, Danny Hendler, and Nir Shavit. On the inherent sequentiality of concurrent objects. SIAM Journal on Computing, 41(3):519–536, 2012. [114] Faith Ellen, Panagiota Fatourou, Joanna Helga, and Eric Ruppert. The amortized complex- ity of non-blocking binary search trees. In ACM Symposium on Principles of Distributed Computing (PODC), 2014. [115] Faith Fich and Eric Ruppert. Hundreds of impossibility results for distributed computing. Distributed Computing, 2003. [116] Faith Ellen Fich, Danny Hendler, and Nir Shavit. Linear lower bounds on real-world implementations of concurrent objects. pages 165–173. IEEE, 2005. [117] Jeremy T Fineman, Seth Gilbert, Fabian Kuhn, and Calvin Newport. Contention resolution on a fading channel. In ACM Symposium on Principles of Distributed Computing (PODC), pages 155–164. ACM, 2016. [118] Jeremy T Fineman, Calvin Newport, and Tonghe Wang. Contention resolution on mul- tiple channels with collision detection. In ACM Symposium on Principles of Distributed Computing (PODC), pages 175–184. ACM, 2016. [119] Michael J Fischer, Nancy A Lynch, and Michael S Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM (JACM), 32(2):374–382, April 1985. [120] Matthew Fluet, Mike Rainey, John Reppy, and Adam Shaw. Implicitly threaded paral- lelism in Manticore. Journal of Functional Programming, 20(5-6):1–40, 2011. [121] Mikhail Fomitchev and Eric Ruppert. Lock-free linked lists and skip lists. In ACM Sym- posium on Principles of Distributed Computing (PODC), 2004. [122] Michal Friedman, Maurice Herlihy, Virendra Marathe, and Erez Petrank. A persistent lock-free queue for non-volatile memory. In ACM Symposium on Principles and Practice of Parallel Programming (PPoPP), pages 28–40. ACM, 2018. [123] Michal Friedman, Naama Ben-David, Yuanhao Wei, Guy E Blelloch, and Erez Petrank. Nvtraverse: in nvram data structures, the destination is more important than the jour- ney. ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2020. [124] Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The implementation of the Cilk-5 multithreaded language. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). [125] Ofer Gabber and Zvi Galil. Explicit constructions of linear-sized superconcentrators. Journal of Computer and System Sciences, 22(3):407–420, June 1981.

195 [126] Eli Gafni and Leslie Lamport. Disk paxos. Distributed Computing, 16(1):1–20, 2003. [127] Yu Gan, Yanqi Zhang, Dailun Cheng, Ankitha Shetty, Priyal Rathi, Nayan Katarki, Ariana Bruno, Justin Hu, Brian Ritchken, Brendon Jackson, Kelvin Hu, Meghna Pancholi, Yuan He, Brett Clancy, Chris Colen, Fukang Wen, Catherine Leung, Siyuan Wang, Leon Zaru- vinsky, Mateo Espinosa, Rick Lin, Zhongling Liu, Jake Padilla, and Christina Delimitrou. An open-source benchmark suite for microservices and their hardware-software implica- tions for cloud & edge systems. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 3–18, April 2019. [128] genz. Gen-Z draft core specification—december 2016. http://genzconsortium. org/draft-core-specification-december-2016. [129] Mohsen Ghaffari, Bernhard Haeupler, and Majid Khabbazian. Randomized broadcast in radio networks with collision detection. Distributed Computing, 28(6):407–422, 2015. [130] Phillip B Gibbons, Yossi Matias, and Vijaya Ramachandran. Efficient low-contention parallel algorithms. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 236–247. ACM, 1994. [131] Phillip B Gibbons, Yossi Matias, and Vijaya Ramachandran. The queue-read queue-write asynchronous PRAM model. In International Conference on Parallel Processing (Eu- roPar), pages 277–292. Springer, 1996. [132] Phillip B Gibbons, Yossi Matias, and Vijaya Ramachandran. The queue-read queue-write PRAM model: Accounting for contention in parallel algorithms. SIAM Journal on Com- puting, pages 638–648, 1997. [133] Wojciech Golab and Danny Hendler. Recoverable under system-wide failures. In ACM Symposium on Principles of Distributed Computing (PODC), pages 17–26, 2018. [134] Daniel Goodman, Georgios Varisteas, and Timothy L. Harris. Pandia: comprehensive contention-sensitive thread placement. In European Conference on Computer Systems (EuroSys), pages 254–269. ACM, 2017. [135] Gary Graunke and Shreekant Thakkar. Synchronization algorithms for shared-memory multiprocessors. Computer, 23(6):60–69, 1990. [136] Yan Gu. Write-efficient Algorithms. PhD thesis, Intel, 2018. [137] Chuanxiong Guo, Haitao Wu, Zhong Deng, Gaurav Soni, Jianxi Ye, Jitu Padhye, and Marina Lipshteyn. RDMA over commodity ethernet at scale. In ACM Conference on SIGCOMM, pages 202–215, August 2016. [138] Daniel Hackenberg, Daniel Molka, and Wolfgang E Nagel. Comparing cache architectures and coherency protocols on x86-64 multicore smp systems. In International Symposium on Microarchitecture (MICRO), pages 413–422. IEEE, 2009. [139] Vassos Hadzilacos, Xing Hu, and Sam Toueg. Optimal register construction in m&m systems. International Conference on Principles of Distributed Systems (OPODIS), 2019. [140] Bernhard Haeupler and David Wajc. A faster distributed radio broadcast primitive. In ACM Symposium on Principles of Distributed Computing (PODC).

196 [141] Syed Kamran Haider, William Hasenplaugh, and Dan Alistarh. Lease/release: Architec- tural support for scaling contended data structures. ACM SIGPLAN Notices, 51(8):17, 2016. [142] Sangjin Han, Norbert Egi, Aurojit Panda, , Guangyu Shi, and Scott Shenker. Network support for resource disaggregation in next-generation datacenters. In Workshop on Hot Topics in Networks (HOTNETS), pages 10:1–10:7, November 2013. [143] Tim Harris and Keir Fraser. Language support for lightweight transactions. In Object- oriented Programming, Systems, Languages, and Applications (OOPSLA), 2003. [144] Timothy L Harris. A pragmatic implementation of non-blocking linked-lists. In Interna- tional Symposium on Distributed Computing (DISC), pages 300–314. Springer, 2001. [145] Timothy L. Harris. Five ways not to fool yourself: designing experiments for understand- ing performance. https://timharris.uk/misc/five-ways.pdf, 2016. [146] Danny Hendler and Shay Kutten. Constructing shared objects that are both robust and high-throughput. In International Symposium on Distributed Computing (DISC), pages 428–442. Springer, 2006. [147] Danny Hendler, Itai Incze, Nir Shavit, and Moran Tzafrir. Flat combining and the synchronization-parallelism tradeoff. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 355–364. ACM, 2010. [148] Maurice Herlihy. A methodology for implementing highly concurrent data structures. In ACM SIGPLAN Notices, volume 25, pages 197–206. ACM, 1990. [149] Maurice Herlihy. Wait-free synchronization. ACM Transactions on Programming Lan- guages and Systems (TOPLAS), 13(1):124–149, 1991. [150] Maurice Herlihy and Nir Shavit. On the nature of progress. In International Conference on Principles of Distributed Systems (OPODIS), pages 313–328. Springer, 2011. [151] Maurice Herlihy, Nir Shavit, and Orli Waarts. Linearizable counting networks. Distributed Computing, 9(4):193–203, 1996. [152] Maurice Herlihy, Victor Luchangco, Mark Moir, and William N Scherer III. Software transactional memory for dynamic-sized data structures. In ACM Symposium on Princi- ples of Distributed Computing (PODC). ACM, 2003. [153] Maurice Herlihy, Dmitry Kozlov, and Sergio Rajsbaum. Distributed computing through combinatorial topology. Morgan Kaufmann, 2013. [154] Maurice P. Herlihy and Jeannette M. Wing. Linearizability: A correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems (TOPLAS), 12(3):463–492, January 1990. [155] Maurice P Herlihy and Jeannette M Wing. Linearizability: A correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems (TOPLAS), 12(3):463–492, 1990. [156] Brandon Holt, James Bornholt, Irene Zhang, Dan R. K. Ports, Mark Oskin, and Luis Ceze. Disciplined inconsistency with consistency types. In Symposium on Cloud Computing

197 (SoCC), pages 279–293, 2016. [157] Shlomo Hoory, Nathan Linial, and Avi Wigderson. Expander graphs and their applica- tions. Bulletin of the American Mathematical Society, 43(4):439–561, August 2006. [158] Patrick Hunt, Mahadev Konar, Flavio Paiva Junqueira, and Benjamin Reed. Zookeeper: Wait-free coordination for internet-scale systems. In USENIX Annual Technical Confer- ence (ATC), June 2010. [159] Shams Mahmood Imam and Vivek Sarkay. Habanero-java library: a java 8 framework for multicore programming. In International Conference on Principles and Practices of Programming on the Java Platform Virtual Machines, Languages and Tools, PPPJ, 2014. [160] Infiniband. InfiniBand. http://www.infinibandta.org/content/pages. php?pg=about_us_infiniband. [161] Intel. Intel threading building blocks, 2011. https://www. threadingbuildingblocks.org/. [162] Intel. Intel64 and ia-32 architectures optimization reference manual, 2016. URL https: //www.intel.com/content/dam/www/public/us/en/documents/ manuals/64-ia-32-architectures-optimization-manual.pdf. [163] IntelOmnipath. Intel Omni-Path. http://www.intel.com/content/ www/us/en/high-performance-computing-fabrics/omni-path- architecture-fabric-overview.html. [164] Zsolt István, David Sidler, Gustavo Alonso, and Marko Vukolic. Consensus in a box: Inexpensive coordination in hardware. In Symposium on Networked Systems Design and Implementation (NSDI), pages 425–438, 2016. [165] iwarp. iWARP. https://en.wikipedia.org/wiki/IWARP. [166] Joseph Izraelevitz, Hammurabi Mendes, and Michael L Scott. Linearizability of persistent memory objects under a full-system-crash failure model. In International Symposium on Distributed Computing (DISC), pages 313–327. Springer, 2016. [167] Prasad Jayanti, Tushar Deepak Chandra, and Sam Toueg. Fault-tolerant wait-free shared objects. Journal of the ACM (JACM), 45(3):451–500, May 1998. [168] Prasad Jayanti, King Tan, and Sam Toueg. Time and space lower bounds for nonblocking implementations. SIAM Journal on Computing, 30(2):438–456, 2000. [169] Xin Jin, Xiaozhou Li, Haoyu Zhang, Nate Foster, Jeongkeun Lee, Robert Soulé, Changhoon Kim, and . Netchain: Scale-free sub-rtt coordination. In Symposium on Networked Systems Design and Implementation (NSDI), pages 35–49, 2018. [170] Anuj Kalia, Michael Kaminsky, and David G. Andersen. Using RDMA efficiently for key-value services. In ACM Conference on SIGCOMM, pages 295–306, August 2014. [171] Anuj Kalia, Michael Kaminsky, and David G Andersen. Using RDMA efficiently for key-value services. ACM SIGCOMM Computer Communication Review, 44(4):295–306, 2015. [172] Anuj Kalia, Michael Kaminsky, and David G Andersen. Design guidelines for high perfor-

198 mance RDMA systems. In USENIX Annual Technical Conference (ATC), pages 437–450, 2016. [173] Anuj Kalia, Michael Kaminsky, and David G. Andersen. FaSST: Fast, scalable and sim- ple distributed transactions with two-sided (RDMA) datagram RPCs. In Symposium on Operating Systems Design and Implementation (OSDI), pages 185–201, November 2016. [174] Anuj Kalia, Michael Kaminsky, and David Andersen. Datacenter RPCs can be general and fast. In Symposium on Networked Systems Design and Implementation (NSDI), pages 1–16, February 2019. [175] Anuj Kalia Michael Kaminsky and David G Andersen. Design guidelines for high per- formance RDMA systems. In USENIX Annual Technical Conference (ATC), page 437, 2016. [176] Rüdiger Kapitza, Johannes Behl, Christian Cachin, Tobias Distler, Simon Kuhnle, Seyed Vahid Mohammadi, Wolfgang Schröder-Preikschat, and Klaus Stengel. Cheap- bft: resource-efficient byzantine fault tolerance. In European Conference on Computer Systems (EuroSys), pages 295–308, 2012. [177] Richard M Karp and Yanjun Zhang. Randomized parallel algorithms for backtrack search and branch-and-bound computation. Journal of the ACM (JACM), 40(3):765–789, 1993. [178] Antonios Katsarakis, Vasilis Avrielatos, M R Siavash Katebzadeh, Arpit Joshi, Aleksan- dar Dragojevic,´ Boris Grot, and Vijay Nagarajan. Hermes: A fast, fault-tolerant and linearizable replication protocol. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 201–217, March 2020. [179] Stefanos Kaxiras, David Klaftenegger, Magnus Norgren, Alberto Ros, and Konstantinos Sagonas. Turning centralized coherence and distributed critical-section execution on their head: A new approach for scalable distributed shared memory. In IEEE International Sym- posium on High Performance Distributed Computing (HPDC), pages 3–14, June 2015. [180] Idit Keidar and Sergio Rajsbaum. On the cost of fault-tolerant consensus when there are no faults: preliminary version. ACM SIGACT News, 32(2):45–63, 2001. [181] Gabriele Keller, Manuel M.T. Chakravarty, Roman Leshchinskiy, Simon Peyton Jones, and Ben Lippmeier. Regular, shape-polymorphic, parallel arrays in haskell. In ACM SIGPLAN international conference on Functional Programming (ICFP), 2010. [182] Darren J Kerbyson, Henry J Alme, Adolfy Hoisie, Fabrizio Petrini, Harvey J Wasserman, and Mike Gittings. Predictive performance and scalability modeling of a large-scale ap- plication. In Supercomputing, ACM/IEEE 2001 Conference, pages 39–39. IEEE, 2001. [183] Daehyeok Kim, Amirsaman Memaripour, Anirudh Badam, Yibo Zhu, Hongqiang Harry Liu, Jitu Padhye, Shachar Raindel, Steven Swanson, Vyas Sekar, and Srinivasan Se- shan. Hyperloop: group-based nic-offloading to accelerate replicated transactions in multi-tenant storage systems. In Proceedings of the 2018 Conference of the ACM Spe- cial Interest Group on Data Communication, pages 297–312, 2018. [184] A. C. Klaiber and H. M. Levy. A comparison of message passing and shared memory
