Programming Safely with Weak (And Strong) Consistency
Total Page:16
File Type:pdf, Size:1020Kb
PROGRAMMINGSAFELYWITHWEAK(ANDSTRONG)CONSISTENCY A Dissertation Presented to the Faculty of the Graduate School of Cornell University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy by matthew pierson milano August 2020 ©2020 Matthew Pierson Milano Some parts of this dissertation (in particular, Chapters 1 and 2) are based on published work where publishing rights have been transferred to the Association for Computing Machinery. For those parts, the following copyright notices are required: Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. For chapter 1 : © 2018 Matthew Milano and Andrew Myers. Publication rights licensed to ACM. For chapter 2 : © 2019 Association for Computing Machinery Chapter 3 is based on published work first appearing under the Leibniz International Proceedings in Informatics (LIPIcs). © 2019 Matthew Milano, Rolph Recto, Tom Magrino, and Andrew C Myers PROGRAMMINGSAFELYWITHWEAK(ANDSTRONG) CONSISTENCY matthew pierson milano Cornell University, 2020 Writing programs against weak consistency is inherently difficult. This dissertation makes the job of writing safe programs against weak consistency easier, by introducing programming languages in which strong guarantees are defended from weakly-consistent influence, and in which programmers can write consistent-by-construction programs atop underlying weakly-consistent replication. The first of these languages is MixT, a new language for writing mixed-consistency transactions. These atomic transactions can operate against data at multiple consistency levels simultaneously, and are equipped with an information-flow type system which guarantees weakly-consistent observations cannot influence strongly-consistent actions. While mixed-consistency transactions can defend strong data from weak observations, they cannot ensure that fully-weak code is itself correct. To address this, we leverage monotonic data types to introduce a core language of datalog-like predicates and triggers. In this language, programmers can write monotonic functions over a set of monotonic shared objects, ultimately resulting in a boolean. These monotonic, boolean-returning functions are stable predicates: once they have become iii true, they remain true for all time. Actions which are predicated on these stable predicates cannot be invalidated by missed or future updates. This monotonic language sits at the core of Derecho, a new system for building strongly-consistent distributed systems via replicated state machines. Derecho’s Shared State Table (SST) implements monotonic datatypes atop Remote Direct Memory Access (RDMA), resulting in a high-performance, asynchronous substrate on which to build Derecho’s monotonic language. Using this SST, we have rephrased the Paxos delivery condition monotonically, granting strong consistency despite the underlying asynchronous replication. Finally Gallifrey exposes the monotonic reasoning properties of Derecho’s core language directly to the user, safely integrating monotonic datatypes into a traditional Java-like programming language. Gallifrey allows any object to be asynchronously replicated via Restrictions to its interface, allow- ing only those operations which are safe to call concurrently. Datatypes shared under these restrictions can be viewed monotonically, using a language of predicates and triggers similar to that at the core of Dere- cho. A novel linear region-based type system enforces that shared object restrictions are respected. iv BIOGRAPHICALSKETCH Matthew Milano was born in New York, New York in 1990. He obtained his Bachelor of Science in Computer Science and Music from Brown University in 2013 and his Masters of Science in Computer Science from Cornell University in 2017. He is a recipient of the 2012 Rose Rosengard Subotnik prize, the 2013 Brown University Senior Prize in Computer Science, an honorable mention for the 2015 National Science Foundation Graduate Research Fellowship, and is a 2015 National Defense Science and Engineering Graduate Fellow. He has been recognized for his service to the Cornell Computer Science Department and his outreach efforts in the community. In his spare time Matthew is an avid reader, board games enthusiast, and academically-trained composer. v ACKNOWLEDGMENTS The kindness and generosity of the people who have surrounded me dur- ing my PhD never ceases to amaze. Without the community surrounding me, attaining a PhD would be nearly impossible. While my research and academic life has been touched by innumerable hands, I would like to single out a few specific individuals who have had an outsized impact on my life and work in graduate school. I first must call out my advisor, Andrew Myers, and my former advisor, Nate Foster. It is hard to overstate the influence your advisor has on you, and I have been blessed to work under two faculty who are dedicated to making sure that I grow into the researcher they always knew (no matter how I doubted) that I could be. Andrew Myers in particular takes special care to let me pick my own directions and follow my own problems, even if it’s not in a direction that he would naturally have taken—and even if it’s not clear that success is in easy reach. And he’s been right there with me every step of the way, ever ready with a guiding hand in case I stumble. Tom Magrino and Isaac Sheff were my original mentors when I arrived at the department, and soon grew into my friends and collaborators. Tom’s expertise in coordination-avoiding systems was instrumental in understanding Gallifrey’s systems challenges, while Isaac’s continued quiet assistance has proven invaluable into the final months of my PhD. vi Rolph Recto, my closest student collaborator and intellectual sparring partner, always pushed me to justify my designs, to consider every avenue and not just go with what seems most natural to me. Key ideas in Gallifrey (chapter 3)—such as the ability to transition between restrictions—are the result of his insistence. The accessibility of Gallifrey has no doubt improved dramatically as a result. Ken Birman has been an incredibly impactful mentor, collaborator, and all-but-advisor since the early years of my PhD. His continued insistence that we shouldn’t settle just for what seems possible has been the driving force uncountable breakthroughs in Derecho, and his tireless advocacy for his students and projects is legendary. I am forever grateful that he chose to include me. Fabian Muehlboeck has been an endless source of inspiration, assistance, and companionship for the duration of my PhD. We spent many late nights together, working through each other’s research problems, playing games, or staying sane. Every idea in this thesis owes something to Fabian, whether it be late-night debugging help or careful assistance with formal material. But the largest thanks must be saved for two primary undergraduate re- search assistants, Patrick LaFontaine and Danny Yang, without whom this dissertation would surely remain incomplete. Patrick took implementation lead in the systems aspects of Gallifrey (chapter 3), integrating shared objects into Antidote’s existing CRDT patterns. Danny is responsible for almost the entire Gallifrey compiler, and continued his contributions for months after his graduation. It is my sincere hope that I see both of them as PhD students some day. vii This research was conducted with Government support under and awarded by DoD, Air Force Office of Scientific Research, National Defense Science and Engineering Graduate (NDSEG) Fellowship, 32 CFR 168a, and by NSF grants 1717554 and 1704788 viii CONTENTS 0 introduction1 0.1 Setting the Stage . 1 0.2 An Overview of Contributions . 4 0.3 Meeting The Mixed-Consistency World with MixT . 5 0.3.1 Where We Are Today . 5 0.3.2 Static Information Flow for Consistency . 8 0.3.3 Mixed-Consistency Transactions . 10 0.3.4 Problems Still Persist . 11 0.4 In Defense of Strong Consistency with Derecho . 11 0.4.1 Existing Solutions . 12 0.4.2 Convergent Programming over RDMA . 15 0.4.3 Derecho: Datacenter Replication at the Speed of Light 17 0.4.4 The Limits of RDMA . 18 0.5 Consistent Programming Over the Wide Area with Gallifrey 19 0.5.1 Systems Solutions with CRDTs . 19 0.5.2 Gallifrey: A New Language for Geodistributed Pro- gramming . 22 0.6 Reasoning about Isolation with Static Types . 25 0.6.1 Existing Type Systems . 25 0.6.2 Isolation Types . 27 0.7 Common Themes . 30 0.8 A Roadmap . 31 1 mixt 33 ix x contents 1.1 Introduction . 33 1.2 Motivation . 36 1.2.1 A Running Example . 36 1.2.2 The Need For Mixed-Consistency Transactions . 37 1.2.3 Mixing Consistency Breaks Strong Consistency . 39 1.3 MixT Transaction Language . 41 1.3.1 MixT Language Syntax . 42 1.3.2 A MixT Example Program . 43 1.3.3 MixT API . 45 1.4 Mixed-Consistency Transactions . 47 1.4.1 Defining Mixed Consistency . 47 1.4.2 Noninterference for Mixed Consistency . 49 1.4.3 Consistency as Information Flow . 51 1.4.4 Transaction Splitting . 52 1.4.5 Whole-Transaction Atomicity . 53 1.5 Formalizing the MixT Language . 55 1.5.1 Desugared Language . 55 1.5.2 Statically Checking Consistency Labels . 57 1.5.3 Transaction Splitting . 58 1.5.4 Enforcing Atomicity in Split Transactions . 60 1.6 Correctness . 65 1.6.1 Mixed Isolation . 65 1.6.2 Mixed Consistency and Noninterference through Splitting . 66 1.6.3 Atomicity through Write Witnesses .