Reactive Design Patterns
ROLAND KUHN
WITH BRIAN HANAFEE AND JAMIE ALLEN
FOREWORD BY JONAS BONÉR
MANNING SHELTER ISLAND For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact Special Sales Department Manning Publications Co. 20 Baldwin Road PO Box 761 Shelter Island, NY 11964 Email: [email protected]
©2017 by Manning Publications Co. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.
Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964

Development editor: Jennifer Stout
Technical development editor: Brian Hanafee
Project editors: Tiffany Taylor and Janet Vail
Line editor: Ben Kovitz
Copyeditor: Tiffany Taylor
Proofreader: Katie Tennant
Technical proofreader: Thomas Lockney
Typesetter: Dottie Marsico
Cover designer: Leslie Haimes
ISBN 9781617291807
Printed in the United States of America
1 2 3 4 5 6 7 8 9 10 – EBM – 22 21 20 19 18 17

To my children – Roland
brief contents
PART 1 INTRODUCTION...... 1 1 ■ Why Reactive? 3 2 ■ A walk-through of the Reactive Manifesto 12 3 ■ Tools of the trade 39
PART 2 THE PHILOSOPHY IN A NUTSHELL ...... 65 4 ■ Message passing 67 5 ■ Location transparency 81 6 ■ Divide and conquer 91 7 ■ Principled failure handling 100 8 ■ Delimited consistency 105 9 ■ Nondeterminism by need 113 10 ■ Message flow 120
PART 3 PATTERNS ...... 125 11 ■ Testing reactive applications 127 12 ■ Fault tolerance and recovery patterns 162 13 ■ Replication patterns 184 14 ■ Resource-management patterns 220 15 ■ Message flow patterns 255 16 ■ Flow control patterns 294 17 ■ State management and persistence patterns 311
contents
foreword xv preface xvii acknowledgments xix about this book xxi about the authors xxiv
PART 1 INTRODUCTION...... 1 Why Reactive? 3 1 1.1 The anatomy of a Reactive application 4 1.2 Coping with load 5 1.3 Coping with failures 7 1.4 Making the system responsive 8 1.5 Avoiding the ball of mud 10 1.6 Integrating nonreactive components 10 1.7 Summary 11
A walk-through of the Reactive Manifesto 12 2 2.1 Reacting to users 12
Understanding the traditional approach 13 ■ Analyzing latency with a shared resource 15 ■ Limiting maximum latency with a queue 16
2.2 Exploiting parallelism 18
Reducing latency via parallelization 18 ■ Improving parallelism with composable Futures 20 ■ Paying for the serial illusion 21 2.3 The limits of parallel execution 23
Amdahl’s Law 23 ■ Universal Scalability Law 24 2.4 Reacting to failure 26
Compartmentalization and bulkheading 28 ■ Using circuit breakers 29 ■ Supervision 30 2.5 Losing strong consistency 31
ACID 2.0 33 ■ Accepting updates 34 2.6 The need for Reactive design patterns 35
Managing complexity 36 ■ Bringing programming models closer to the real world 37 2.7 Summary 38
Tools of the trade 39 3 3.1 Early Reactive solutions 39 3.2 Functional programming 41
Immutability 41 ■ Referential transparency 44 ■ Side effects 45 ■ Functions as first-class citizens 46 3.3 Responsiveness to users 47 Prioritizing the three performance characteristics 47 3.4 Existing support for Reactive design 48
Green threads 48 ■ Event loops 49 ■ Communicating Sequential Processes 50 ■ Futures and promises 52 ■ Reactive Extensions 57 ■ The Actor model 58 3.5 Summary 62
PART 2 THE PHILOSOPHY IN A NUTSHELL ...... 65
Message passing 67 4 4.1 Messages 67 4.2 Vertical scalability 68 4.3 Event-based vs. message-based 69 4.4 Synchronous vs. asynchronous 71 4.5 Flow control 73
4.6 Delivery guarantees 76 4.7 Events as messages 78 4.8 Synchronous message passing 80 4.9 Summary 80
Location transparency 81 5 5.1 What is location transparency? 81 5.2 The fallacy of transparent remoting 82 5.3 Explicit message passing to the rescue 83 5.4 Optimization of local message passing 84 5.5 Message loss 85 5.6 Horizontal scalability 87 5.7 Location transparency makes testing simpler 87 5.8 Dynamic composition 88 5.9 Summary 90
Divide and conquer 91 6 6.1 Hierarchical problem decomposition 92 Defining the hierarchy 92 6.2 Dependencies vs. descendant modules 94 Avoiding the matrix 95 6.3 Building your own big corporation 96 6.4 Advantages of specification and testing 97 6.5 Horizontal and vertical scalability 98 6.6 Summary 99
Principled failure handling 100 7 7.1 Ownership means commitment 100 7.2 Ownership implies lifecycle control 102 7.3 Resilience on all levels 104 7.4 Summary 104
Delimited consistency 105 8 8.1 Encapsulated modules to the rescue 106 8.2 Grouping data and behavior according to transaction boundaries 107
8.3 Modeling workflows across transactional boundaries 107 8.4 Unit of failure = unit of consistency 108 8.5 Segregating responsibilities 109 8.6 Persisting isolated scopes of consistency 111 8.7 Summary 112
Nondeterminism by need 113 9 9.1 Logic programming and declarative data flow 113 9.2 Functional reactive programming 115 9.3 Sharing nothing simplifies concurrency 116 9.4 Shared-state concurrency 117 9.5 So, what should we do? 117 9.6 Summary 119
Message flow 120 10 10.1 Pushing data forward 120 10.2 Modeling the processes of your domain 122 10.3 Identifying resilience limitations 122 10.4 Estimating rates and deployment scale 123 10.5 Planning for flow control 124 10.6 Summary 124
PART 3 PATTERNS ...... 125 Testing reactive applications 127 11 11.1 How to test 127
Unit tests 128 ■ Component tests 129 ■ String tests 129 Integration tests 129 ■ User-acceptance tests 130 ■ Black-box vs. white-box tests 130 11.2 Test environment 131 11.3 Testing asynchronously 132
Providing blocking message receivers 133 ■ The crux of choosing timeouts 135 ■ Asserting the absence of a message 141 Providing synchronous execution engines 142 ■ Asynchronous assertions 144 ■ Fully asynchronous tests 145 ■ Asserting the absence of asynchronous errors 147
11.4 Testing nondeterministic systems 150
The trouble with execution schedules 151 ■ Testing distributed components 151 ■ Mocking Actors 152 ■ Distributed components 153 11.5 Testing elasticity 154 11.6 Testing resilience 154
Application resilience 155 ■ Infrastructure resilience 158 11.7 Testing responsiveness 160 11.8 Summary 161
Fault tolerance and recovery patterns 162 12 12.1 The Simple Component pattern 162
The problem setting 163 ■ Applying the pattern 163 The pattern, revisited 165 ■ Applicability 166 12.2 The Error Kernel pattern 166
The problem setting 167 ■ Applying the pattern 167 The pattern, revisited 170 ■ Applicability 171 12.3 The Let-It-Crash pattern 171
The problem setting 172 ■ Applying the pattern 172 The pattern, revisited 173 ■ Implementation considerations 174 ■ Corollary: the Heartbeat pattern 175 Corollary: the Proactive Failure Signal pattern 176 12.4 The Circuit Breaker pattern 177
The problem setting 177 ■ Applying the pattern 178 The pattern, revisited 181 ■ Applicability 182 12.5 Summary 182
Replication patterns 184 13 13.1 The Active–Passive Replication pattern 184
The problem setting 185 ■ Applying the pattern 186 The pattern, revisited 196 ■ Applicability 197 13.2 Multiple-Master Replication patterns 197
Consensus-based replication 198 ■ Replication with conflict detection and resolution 201 ■ Conflict-free replicated data types 203 13.3 The Active–Active Replication pattern 210
The problem setting 211 ■ Applying the pattern 211
The pattern, revisited 217 ■ The relation to virtual synchrony 218 13.4 Summary 219
Resource-management patterns 220 14 14.1 The Resource Encapsulation pattern 221
The problem setting 221 ■ Applying the pattern 221 The pattern, revisited 227 ■ Applicability 228 14.2 The Resource Loan pattern 229
The problem setting 229 ■ Applying the pattern 230 The pattern, revisited 232 ■ Applicability 233 Implementation considerations 233 ■ Variant: using the Resource Loan pattern for partial exposure 234 14.3 The Complex Command pattern 234
The problem setting 235 ■ Applying the pattern 236 The pattern, revisited 243 ■ Applicability 243 14.4 The Resource Pool pattern 244
The problem setting 244 ■ Applying the pattern 245 The pattern, revisited 247 ■ Implementation considerations 248 14.5 Patterns for managed blocking 248
The problem setting 249 ■ Applying the pattern 249 The pattern, revisited 251 ■ Applicability 253 14.6 Summary 253
Message flow patterns 255 15 15.1 The Request–Response pattern 255
The problem setting 256 ■ Applying the pattern 257 Common instances of the pattern 258 ■ The pattern, revisited 263 ■ Applicability 264 15.2 The Self-Contained Message pattern 265
The problem setting 265 ■ Applying the pattern 266 The pattern, revisited 268 ■ Applicability 268 15.3 The Ask pattern 269
The problem setting 269 ■ Applying the pattern 270 The pattern, revisited 273 ■ Applicability 274 15.4 The Forward Flow pattern 275
The problem setting 275 ■ Applying the pattern 275 The pattern, revisited 276 ■ Applicability 276
15.5 The Aggregator pattern 277
The problem setting 277 ■ Applying the pattern 277 The pattern, revisited 281 ■ Applicability 281 15.6 The Saga pattern 282
The problem setting 282 ■ Applying the pattern 283 The pattern, revisited 284 ■ Applicability 286 15.7 The Business Handshake pattern (a.k.a. Reliable Delivery pattern) 286
The problem setting 287 ■ Applying the pattern 287 The pattern, revisited 292 ■ Applicability 292 15.8 Summary 293
Flow control patterns 294 16 16.1 The Pull pattern 294
The problem setting 295 ■ Applying the pattern 295 The pattern, revisited 297 ■ Applicability 298 16.2 The Managed Queue pattern 298
The problem setting 298 ■ Applying the pattern 299 The pattern, revisited 300 ■ Applicability 301 16.3 The Drop pattern 301
The problem setting 301 ■ Applying the pattern 302 The pattern, revisited 303 ■ Applicability 306 16.4 The Throttling pattern 306
The problem setting 307 ■ Applying the pattern 307 The pattern, revisited 309 16.5 Summary 310
State management and persistence patterns 311 17 17.1 The Domain Object pattern 312
The problem setting 312 ■ Applying the pattern 313 The pattern, revisited 315 17.2 The Sharding pattern 316
The problem setting 316 ■ Applying the pattern 316 The pattern, revisited 318 ■ Important caveat 319 17.3 The Event-Sourcing pattern 319
The problem setting 319 ■ Applying the pattern 320 The pattern, revisited 321 ■ Applicability 322
17.4 The Event Stream pattern 322
The problem setting 323 ■ Applying the pattern 323 The pattern, revisited 325 ■ Applicability 325 17.5 Summary 326
appendix A Diagramming Reactive systems 327 appendix B An illustrated example 329 appendix C The Reactive Manifesto 343
index 351

foreword
I’m grateful that Roland has taken the time to write this foundational book, and I can’t think of anyone more capable of pulling it off. Roland is an unusually clear and deep thinker; he coauthored the Reactive Manifesto, has been the technical lead for the Akka project for several years, has coauthored and taught the very popular Coursera course on Reactive programming and design, and is the best technical writer I have met.
Clearly, I’m very excited about this book. It outlines what Reactive architecture/design is all about, and does an excellent job explaining it from first principles in a practical context. Additionally, it is a catalog of patterns that explains the bigger picture, how to think about system design, and how it is all connected, much like what Martin Fowler’s Patterns of Enterprise Application Architecture did 15 years ago.
During my professional life, I have seen the immense benefits of resilient, loosely coupled, message-driven systems firsthand, especially when compared with more traditional approaches that propose to hide the nature of distributed systems. In 2013, I had the idea of formalizing the experiences and lessons learned: the Reactive Manifesto was born. It started out as a set of rough notes that I remember presenting to the company at one of Typesafe’s (now Lightbend) internal technical meetups. Coincidentally, this meetup was collocated with the Scala Days New York conference, where Roland, Martin Odersky, and Erik Meijer shot their bad, and unintentionally quite funny, promotion video of their Coursera course on Reactive programming. The story around the Reactive principles resonated with the other engineers and was published in July of 2013. Since then, the Manifesto has been receiving a lot of great feedback from the community. It was rewritten and vastly improved by Roland, Martin Thompson, Dave Farley, and myself, leading up to version 2.0 published in September 2014.
By the end of 2016, it had been signed by more than 17,000 people. During this time, we have seen Reactive progress from a virtually unacknowledged technique used only by fringe projects within a select few corporations to a part of the overall platform strategy of numerous big players in many different fields, including middleware, financial services, retail, social media, betting/gaming, and so on.
The Reactive Manifesto defines “Reactive Systems” as a set of architectural design principles that are geared toward meeting the demands that systems face, today and tomorrow. These principles are most definitely not new; they can be traced back to the ’70s and ’80s and the seminal work by Jim Gray and Pat Helland on the Tandem System, as well as Joe Armstrong and Robert Virding on Erlang. However, these pioneers were ahead of their time, and it was not until the past five years that the technology industry was forced to rethink current best practices for enterprise system development and learned to apply the hard-won knowledge of the Reactive principles to today’s world of multicore architectures, Cloud Computing, and the Internet of Things.
By now, the Reactive principles have had a big impact on the industry, and as with many successful ideas, they get overloaded and reinterpreted. This is not a bad thing; ideas need to evolve to stay relevant. However, this can also cause confusion and lead to dilution of the original intent. One example is the unfortunate emerging misconception that Reactive is nothing but programming in an asynchronous and nonblocking style using callbacks or stream-oriented combinators, techniques that are aptly classified as Reactive Programming. Concentrating on this aspect alone means missing out on many of the benefits of the Reactive principles.
It is the contribution of this book to take a much larger perspective, a systems view, moving the focus from how individual components function in isolation to the design of collaborative, resilient, and elastic systems: Reactive systems. This future classic belongs on the shelf of every professional programmer, right next to GoF1 and Domain-Driven Design.2 Enjoy the ride; I certainly did!
JONAS BONÉR CTO AND FOUNDER OF LIGHTBEND CREATOR OF AKKA
1 Design Patterns: Elements of Reusable Object-Oriented Software by Gamma, Helm, Johnson, and Vlissides (Addison-Wesley, 1995).
2 Domain-Driven Design by Eric Evans (Addison-Wesley, 2004).

preface
Even before I had officially joined the Akka team, Mike Stephens from Manning tried to convince me to write a book on Akka. I was tempted to say yes, but in the context of an impending change of jobs and countries, my wife brought me to my senses: such a project would be too much to handle. The idea of writing a book stuck in my head, though. Three years later, after the Reactive Manifesto had been published, Martin Odersky, Erik Meijer, and I taught the course Principles of Reactive Programming on the Coursera platform, reaching more than 120,000 students in two iterations. The idea for that course had been born at a Typesafe engineering meeting where I suggested to Martin that we should nurture the blossoming movement of Reactive programming by demonstrating how to use these tools effectively while avoiding the pitfalls; my own experience answering questions on the Akka mailing list had given me a good idea of the topics people commonly struggled with.
A video course is a wonderful way of reaching a large number of students, interacting with them on the discussion forums, and in general improving the lives of others. Unfortunately, the discussion of the subject is necessarily limited in its depth and breadth by the format: only so much can be shown in seven weekly lectures. Therefore, I still longed for formalizing and passing on my knowledge about Reactive systems by writing a book. It would have been straightforward to write about Akka, but I felt that if I wrote a book, its scope should be wider than that. I love working on Akka (it has literally changed the course of my life), but Akka is merely a tool for expressing distributed and highly reliable systems, and it is not the only tool needed in this regard.
Thus began the journey toward the work you are holding in your hands right now. It was a daunting task, and I knew that I would need help. Luckily, Jamie was just
about to finish Effective Akka 3 and was immediately on board. Neither of us had the luxury of writing during daytime; consequently, the book started out slow and kept lagging behind the plan. Instead of having three chapters ready to enter the early access program during the first iteration of the course Principles of Reactive Programming, we could only announce it several months later. It is astonishing how much detail one finds missing when starting out from the viewpoint that the contents are basically already known and just need to be transferred into the computer. Over time, Jamie got even busier with his day job, until he had to stop contributing entirely.
Later, Brian joined the project as Manning’s technical development editor, and it soon became clear that he could not only make very good suggestions but also implement them. We made it official by signing him up as a coauthor, and then Brian helped me push the manuscript over the finish line.
This book contains not only advice on when and how to use the tools of Reactive programming, but also the reasoning behind the advice, so that you may adapt it to different requirements and new applications. I hope that it will inspire you to learn more and to explore the wonderful world of Reactive systems.
ROLAND KUHN
3 Effective Akka by Jamie Allen (O’Reilly Media, 2013).

acknowledgments
ROLAND KUHN My first thanks go to Jamie, without whom I would not have dared take on this project. But my deepest gratitude is to Jonas Bonér, who created Akka, entrusted Akka to my care for many years, and supported me on every step along the way. I am also deeply thankful to Viktor Klang for countless rigorous discussions about all topics of life (and distributed systems) but, more importantly, for teaching me how to lead by example and how important it is to not let the devil over the bridge. Jonas, Viktor, and Patrik Nordwall also deserve special thanks for covering my duties as Akka Tech Lead while I took a mini-sabbatical of three months to work intensely on this book. I greatly appreciate Brian and Jamie stepping up and shouldering part of the tremendous weight of such a project: it is gratifying and motivating to work alongside such trusted companions.
For helpful reviews of the early manuscript, I would like to thank Sean Walsh and Duncan DeVore, as well as Bert Bates, who helped shape the overall arrangement of how the patterns are presented. I also thank Endre Varga, who spent considerable effort developing the KVStore exercise for Principles of Reactive Programming that forms the basis for the state replication code samples used in chapter 13. Thanks also go to Pablo Medina for helping me with the CKite example code in section 13.2, and to Thomas Lockney, technical proofreader, who kept a sharp eye out for errors. The following peer reviewers gave generously of their time: Joel Kotarski, Valentine Sinitsyn, Mark Elston, Miguel Eduardo Gil Biraud, William E. Wheeler, Jonathan Freeman, Franco Bulgarelli, Bryan Gilbert, Carlos Curotto, Andy Hicks, William Chan, Jacek Sokulski, Dr. Christian Bridge-Harrington, Satadru Roy, Richard Jepps, Sorbo Bagchi, Nenko Tabakov, Martin Anlauf, Kolja Dummann, Gordon Fische, Sebastien Boisver,
and Henrik Løvborg. I am grateful to the Akka community for being such a welcoming and fertile place for developing our understanding of distributed systems.
I would like to thank the team at Manning who made this book possible, especially Mike Stephens for nagging until I gave in, Jenny Stout for urging me to make progress, and Candace Gillhoolley from marketing. I would like to distinguish Ben Kovitz as an extremely careful and thorough copy editor, and I thank Tiffany Taylor for finding even more redundant words to be removed from the final text, as well as Katie Tennant for identifying and fixing unclear passages.
Finally, in the name of all readers, I extend my utmost appreciation and love to my wife, Alex. You endured my countless hours of spiritual absence with great compassion.
JAMIE ALLEN I wish to thank my wife, Yeon, and my three children, Sophie, Layla, and James. I am also grateful to Roland for allowing me to participate in this project, and to Brian for pushing the project over the finish line and contributing his expertise.
BRIAN HANAFEE I thank my wife, Patty, for supporting me, always, and my daughters, Yvonne and Barbara, for helping me with Doctor Who history and sometimes pretending my jokes are funny. Thank you, Susan Conant and Bert Bates, for getting me started and teaching me how to edit and teach in book form. Finally, thank you, Roland and Jamie, for showing me Reactive principles and welcoming me into this project.

about this book
This book is intended to be a comprehensive guide to understanding and designing Reactive systems. Therefore, it includes not only an annotated version of the Reactive Manifesto, but also the reasoning that led to its inception. The main part of the book is a selection of design patterns that implement many facets of Reactive system design, with pointers toward deeper literature resources for further study. While the presented patterns form a cohesive whole, the list is not exhaustive (it cannot be), but the included background knowledge will enable the reader to identify, distill, and curate new patterns as the need arises.

Whom this book is for
This book was written for everyone who may want to implement Reactive systems:
■ It covers the architecture of such systems as well as the philosophy behind it, giving architects an overview of the characteristics of Reactive applications and their components and discussing the applicability of the patterns.
■ Practitioners will benefit from a detailed discussion of the scenario solved by each pattern, the steps to take in applying it (illustrated with complete source code), as well as a guide to transfer and adapt the pattern to different cases.
■ Learners wishing to deepen their knowledge, for example, after viewing the course material of Principles of Reactive Programming, will be delighted to read about the thought processes behind the Reactive principles and to follow the literature references for further study.
This book does not require prior knowledge of Reactive systems; it builds upon familiarity with software development in general and refers to some experience with the
difficulties arising from distributed systems. For some parts, a basic understanding of functional programming is helpful (in the sense of programming with immutable values and pure functions), but category theory does not feature in this book.

How to read this book
The contents of this book are arranged such that it lends itself well to being read as a story, cover to cover, developing from an introductory example and an overview of the Reactive Manifesto and the Reactive toolbox, continuing with the philosophy behind Reactive principles, and culminating in the description of patterns covering the different aspects of designing a Reactive system. This journey covers a lot of ground and the text contains references to additional background information. Reading it in one go will leave you with an intuition of the scope of the book and what information is found where, but it will typically only be the entry point for further study; you will return for the extraction of deeper insights while applying the acquired knowledge in projects of your own.
If you are already familiar with the challenges of Reactive systems, you may skip the first chapter, and you will likely skim chapter 3 on the tools of the trade because you have already worked with most of those. The impatient will be tempted to start reading the patterns in part 3, but it is recommended to take a look at part 2 first: the pattern descriptions frequently refer to the explanations and background knowledge of this more theoretical part that form the basis on which the patterns have been developed.
It is expected that you will return to the more philosophical chapters (especially chapters 8 and 9) after having gained more experience with the design and implementation of Reactive systems; don’t worry if these discussions do not immediately become fully transparent upon first reading.
Conventions
Due to the overloading of the English term “future” for a programming concept that deviates significantly from the original meaning, all uses of the word referring to the programming concept appear capitalized as Future, even when not appearing in code font. The situation is slightly different for the term “actor,” which in plain English refers to a person on stage as well as a participant in an action or process. This term appears capitalized only when referring specifically to the Actor model, or when the name of the Actor trait appears in code font.

Source code for the examples
All source code for the examples used in this book is available for download on GitHub here: https://github.com/ReactiveDesignPatterns/CodeSamples/. GitHub also offers facilities for raising issues with the samples or discussing them; please make use of them. You are also welcome to open pull requests with improvements; this way, all future readers will benefit from your thoughtfulness and experience.
Most of the samples are written in Java or Scala and use sbt for the build definition; please refer to www.scala-sbt.org/ for detailed documentation. A Java development kit supporting Java 8 will be required to build and run the samples.

Other online resources
An overview of the presented patterns as well as further material is available at www.reactivedesignpatterns.org/. In addition, purchase of Reactive Design Patterns includes free access to a private web forum run by Manning Publications where you can make comments about the book, ask technical questions, and receive help from the lead author and from other users. To access the forum and subscribe to it, point your web browser to www.manning.com/books/reactive-design-patterns. This page provides information on how to get on the forum once you are registered, what kind of help is available, and the rules of conduct on the forum.
Manning’s commitment to our readers is to provide a venue where a meaningful dialog between individual readers and between readers and authors can take place. It is not a commitment to any specific amount of participation on the part of the authors, whose contribution to the AO forum remains voluntary (and unpaid). We suggest you try asking them some challenging questions lest their interest stray! The Author Online forum and the archives of previous discussions will be accessible from the publisher’s website as long as the book is in print.

about the authors
Dr. Roland Kuhn studied physics at the Technische Universität München and obtained a doctorate with a dissertation on measurements of the gluon spin structure of the nucleon at a high-energy particle physics experiment at CERN (Geneva, Switzerland). This entailed usage and implementation of large computing clusters and fast data processing networks, which laid the foundation for his thorough understanding of distributed computing. Afterward, he worked for four years at the German space operations center, building control centers and ground infrastructure for military satellite missions, before joining the Akka team at Lightbend (then called Typesafe), which he led from November 2012 to March 2016. During this time he co-taught the course Principles of Reactive Programming on the Coursera platform together with Martin Odersky and Erik Meijer, a course that was visited by more than 120,000 students. Together with Jonas Bonér, he authored the first version of the Reactive Manifesto, published in June 2013. Currently, Roland is CTO of Actyx, a Munich-based company he cofounded, bringing the benefits of modern Reactive systems to small and midsize manufacturing enterprises across Europe.
Brian Hanafee received his BS in EECS from the University of California, Berkeley. He is a Principal Systems Architect at Wells Fargo Bank, where he designs internet banking and payment systems and is a consistent advocate for raising the technology bar. Previously he was with Oracle, working on new and emerging products and systems for interactive television and for text processing. He sent his first email from a moving vehicle in 1994. Prior to that, Brian was an Associate at Booz, Allen & Hamilton and at Advanced Decision Systems, where he applied AI techniques to military planning
systems. He also wrote software for one of the first ejection-safe helmet-mounted display systems.
Jamie Allen is the Director of Engineering for the UCP project at Starbucks, an effort to redefine the digital experience for every customer across all of our operating models and locations. He is the author of Effective Akka (O’Reilly, 2013), and previously worked with Roland and Jonas at Typesafe/Lightbend for over four years. Jamie has been a Scala and actor developer since 2008, and has worked with numerous clients around the world to help them understand and adopt Reactive application approaches.
Part 1 Introduction
Have you ever wondered how high-profile web applications are implemented? Social networks and huge retail sites must have some secret ingredient that makes them work quickly and reliably, but what is it? In this book, you will learn about the design principles and patterns behind such systems that never fail and are capable of serving the needs of billions of people. Although the systems you build may not have such ambitious requirements, the primary qualities are common:
■ You want your application to work reliably, even though parts (hardware or software) may fail.
■ You want it to keep working when you have more users to support, and you want to be able to add or remove resources to adapt its capacity to changing demand (capacity planning is hard to get right without a crystal ball).
In chapter 1, we will sketch the development of an application that exhibits these qualities and more. We will illustrate the challenges you will encounter and present solutions based on a concrete example (a hypothetical implementation of the Gmail service), but we will do so in a technology-agnostic fashion.
This use case sets the stage for the detailed discussion of the Reactive Manifesto that follows in chapter 2. The manifesto is written in a concise, high-level form in order to concentrate on its essence: the combination of individually useful program characteristics into a cohesive whole that is larger than the sum of its parts. We will show this by breaking the high-level traits into smaller pieces and explaining how everything fits back together.
We will complete this part of the book in chapter 3 with a whirlwind tour through the tools of the trade: functional programming, Futures and Promises, Communicating Sequential Processes (CSP), Observers and Observables (Reactive Extensions), and the Actor model.
Why Reactive?
We start from the desire to build a system that is responsive to users. This means the system should respond to user input in a timely fashion under all circumstances. Because any single computer can fail at any time, we need to distribute such a system over multiple computers. Adding this fundamental requirement for distribution makes us recognize the need for new architecture patterns (or to rediscover old ones). In the past, we developed methods that allowed us to retain the illusion of single-threaded local processing while having it magically executed on multiple cores or network nodes, but the gap between that illusion and reality is becoming prohibitively large.1 The solution is to make the distributed, concurrent nature of our applications explicit in the programming model, using it to our advantage.
This book will teach you how to write systems that stay responsive in the face of partial outages, program failure, changing loads, and even bugs in the code. You will see that this requires adjustments to the way you think about and design your applications. Here are the four tenets of the Reactive Manifesto,2 which defines a common vocabulary and lays out the basic challenges that a modern computer system needs to meet:
It must react to its users (responsive).
It must react to failure and stay available (resilient).
It must react to variable load conditions (elastic).
It must react to inputs (message-driven).
1 For example, Java EE services allow us to transparently call remote services that are wired in automatically, possibly even including distributed database transactions. The possibility of network failure or remote service overload, and so on, is completely hidden, abstracted away, and consequently out of reach for developers to meaningfully take into account.
2 http://reactivemanifesto.org
Figure 1.1 The structure of Reactive values (Value: responsive, maintainable, extensible; Means: elastic, resilient; Form: message-driven)
In addition, creating a system with these properties in mind will guide you toward better modularization, both of the runtime deployment and of the code itself. Therefore, we add two more attributes to the list of benefits: maintainability and extensibility. Another way to structure the attributes is shown in figure 1.1.
In the following chapters, you will learn about the reasoning of the Reactive Manifesto in detail, and you will get to know several tools of the trade and the philosophy behind their design, enabling you to effectively use these tools to implement reactive designs. The design patterns that emerge from these tools are presented in the third part of the book.
To set the stage for diving into the manifesto, we will first explore the challenges of creating a Reactive application, using the example of a well-known email service: we will imagine a reimplementation of Gmail.
1.1 The anatomy of a Reactive application
The first task when starting such a project is to sketch an architecture for the deployment and draft the list of software artifacts that need to be developed. This may not be the final architecture, but you need to chart the problem space and explore potentially difficult aspects. We will start the Gmail example by enumerating the different high-level features of the application:
The application must offer a view of the mailboxes to the user and display their contents.
To this end, the system must store all emails and keep them available.
It must allow the user to compose and send email.
To make this more comfortable, the system should offer a list of contacts and allow the user to manage them.
A good search function for locating emails is required.
The real Gmail application has more features, but this list will suffice for our purposes.
Some of these features are more intertwined than the others: for example, displaying emails and composing them are both part of the user interface and share (or compete for) the same screen space, whereas the implementation of email storage is only distantly related to these two. The implementation of the search function will need to be closer to the storage than the front-end presentation.
Figure 1.2 Partially decomposed module hierarchy of the hypothetical Gmail implementation (labels shown: Gmail, Sign-on, Profile, Contacts, Mail, Search, Storage, Filters, Listing, Composing, Editing, Autocomplete, Pop-up card, Fuzzy index, Full search)
These considerations guide the hierarchical decomposition of Gmail’s overall functionality into smaller and smaller pieces. More precisely, you can apply the Simple Component pattern as described in chapter 12, making sure you clearly delimit and segregate the different responsibilities of the entire application. The Error Kernel pattern and the Let-It-Crash pattern complement this process, ensuring that the application’s architecture is well suited to reliable failure handling, not only in case of machine or network outages, but also for rare failure conditions in the source code that are handled incorrectly (a.k.a. bugs).
The result of this process will be a hierarchy of components that need to be developed and deployed. An example is shown in figure 1.2. Each component may be complex in terms of its function, such as the implementation of search algorithms; or it may be complex in its deployment and orchestration, such as providing email storage for billions of users. But it will always be simple to describe in terms of its responsibility.
1.2 Coping with load
The resources necessary to store all those emails will be enormous: hundreds of millions of users with gigabytes of emails each will need exabytes3 of storage capacity. This magnitude of persistent storage will need to be provided by many distributed machines. No single storage device offers so much space, and it would be unwise to store everything in one location. Distribution makes the dataset resilient against local perils like natural disasters; but, more important, it also allows the data to be accessed efficiently from a larger region. For a worldwide user base, the data should be globally distributed as well. It would be preferable to have the emails of a Japanese user stored in or close to Japan (assuming that is where the user logs in from most of the time).
3 One exabyte is 1 billion gigabytes (using decimal SI prefixes; using binary prefixes, one EiB is roughly 1.07 billion GiB).
This insight leads us to the Sharding pattern described in chapter 17: you can split up the overall dataset into many small pieces, or shards, that you then distribute. Because the number of shards is much smaller than the number of users, it is practical to make the location of each shard known throughout the system. In order to find a user’s mailbox, you only need to identify the shard it belongs to. You can do that by equipping every user with an ID that expresses geographical affinity (for example, using the first few digits to denote the country of residence), which is then mathematically partitioned into the correct number of shards (for example, shard 0 contains IDs 0–999,999; shard 1 contains IDs 1,000,000–1,999,999; and so on).
The key here is that the dataset naturally consists of many independent pieces that can easily be separated from each other. Operations on one mailbox never affect another mailbox directly, so the shards also do not need to communicate among themselves. Each serves only one particular part of the solution.
Another area in which the Gmail application will need a lot of resources is in the display of folders and emails to the user.
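As an aside on the ID-based sharding scheme described above, the lookup can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed implementation: the constant `IDS_PER_SHARD`, the `SHARD_LOCATIONS` table, the node names, and the function names are all hypothetical, chosen only to mirror the example in which shard 0 holds IDs 0–999,999 and shard 1 holds IDs 1,000,000–1,999,999.

```python
# Hypothetical block size: each shard covers one million consecutive user IDs,
# mirroring the example in the text (shard 0 = IDs 0-999,999, and so on).
IDS_PER_SHARD = 1_000_000

# Hypothetical shard-to-node table. Because there are far fewer shards than
# users, a table like this is small enough to be known throughout the system.
SHARD_LOCATIONS = {
    0: "node-tokyo-1",
    1: "node-tokyo-2",
    2: "node-frankfurt-1",
}


def shard_of(user_id: int) -> int:
    """Return the shard number that holds the mailbox of the given user ID."""
    if user_id < 0:
        raise ValueError("user IDs are assumed to be non-negative")
    return user_id // IDS_PER_SHARD


def node_for(user_id: int) -> str:
    """Return the (hypothetical) node that hosts the user's shard."""
    return SHARD_LOCATIONS[shard_of(user_id)]


if __name__ == "__main__":
    for uid in (0, 999_999, 1_000_000, 2_500_000):
        print(uid, shard_of(uid), node_for(uid))
```

Because the shard number is computed from the ID alone, no shard needs to consult another one to locate a mailbox; only the small location table has to be shared.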
It would be impossible to provide this display functionality in a centralized fashion, not only for reasons of latency (even at the speed of light, it takes noticeable time to send information around the globe) but also due to the sheer number of interactions that millions of users perform every second. Here, you will also split the work among many machines, starting with the users’ computers: most of the graphical presentation is rendered within the browser, shifting the workload very close to where it is needed and in effect sharding it for each user.
The web browser will need to get the raw information from a server, ideally one that is close by to minimize network round-trip time. The task of connecting a user with their mailbox and routing requests and responses accordingly is one that can also easily be sharded. In this case, the browser’s network address directly provides all needed characteristics, including an approximate geographic location.
One noteworthy aspect is that in all the aforementioned cases, resources can be added by making the shards smaller, distributing the load over more machines. The maximum number is given by the number of users or used network addresses, which will be more than enough to provide sufficient resources. This scheme will need adjustment only when serving a single user requires more computing power than a single machine can provide, at which point a user’s dataset or computing problem needs to be broken down into smaller pieces.
This means that by splitting a system into distributable parts, you gain the ability to scale the service capacity, using a larger number of shards to serve more users. As long as the shards are independent from each other, the system is in theory infinitely scalable. In practice, the orchestration and operation of a worldwide deployment with millions of nodes requires substantial effort and must of course be worth it.
1.3 Coping with failures
Sharding datasets or computational resources solves the problem of providing sufficient resources for the nominal case, when everything is running smoothly and networks are operational. In order to cope with failures, you need the ability to keep running when things go wrong:
A machine may fail temporarily (for example, due to overheating or kernel panic) or permanently (electrical or mechanical failure, fire, flood, and so on).
Network components may fail, both within a computing center and outside on the internet, including the case that intercontinental overseas cables go down, resulting in a split of the internet into disconnected regions.
Human operators or automated maintenance scripts may accidentally destroy parts of the data.
The only solution to this problem is to replicate the system (its data or functionality) in more than one location. The geographical placement of the replicas needs to match the scope of the system; a global email service should serve each customer from multiple countries, for example.
Replication is a more difficult and diverse topic than sharding because intuitively you mean to have the same data in multiple places, but keeping the replicas synchronized to match this expectation comes at a high cost. Should writing to the nearest location fail or be delayed if a more distant replica is momentarily unavailable? Should it be impossible to see the old data on a distant replica after the nearest one has already signaled completion of the operation? Or should such inconsistency just be unlikely or very short-lived? These questions will be answered differently between projects or even for different modules of one particular system.
Therefore, you are presented with a spectrum of solutions that allows you to make trade-offs between operational complexity, performance, availability, and consistency. We will discuss several approaches covering a wide range of characteristics in chapter 13. The basic choices are as follows:
Active–passive replication: Replicas agree on which one of them can accept updates. Fail-over to a different replica requires consensus among the remaining ones when the active replica no longer responds.
Consensus-based multiple-master replication: Each update is agreed on by sufficiently many replicas to achieve consistent behavior across all of them, at the cost of availability and latency.
Optimistic replication with conflict detection and resolution: Multiple active replicas disseminate updates and roll back transactions during conflict or discard conflicting updates that were performed during a network partition.
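To make the availability cost of the second choice more tangible, here is a small Python sketch. It assumes that "sufficiently many replicas" means a simple majority, which is one common interpretation and not something the text prescribes. The `Replica` class, its `reachable` flag, and the function names are hypothetical, and the code is an in-memory simulation, not a real consensus protocol.

```python
class Replica:
    """In-memory stand-in for one replica (hypothetical helper).

    Setting `reachable` to False simulates a crashed node or a network partition.
    """

    def __init__(self, name):
        self.name = name
        self.reachable = True
        self.data = {}


def quorum_size(replica_count):
    """Smallest number of replicas that forms a majority (an assumed policy)."""
    return replica_count // 2 + 1


def replicated_write(replicas, key, value):
    """Apply the update only if a majority of all replicas can take part.

    Returns True when the update was applied and False when it was refused.
    """
    reachable = [r for r in replicas if r.reachable]
    if len(reachable) < quorum_size(len(replicas)):
        # Too few participants: refuse the update instead of risking divergence.
        return False
    for replica in reachable:
        replica.data[key] = value
    return True


if __name__ == "__main__":
    cluster = [Replica("a"), Replica("b"), Replica("c")]
    print(replicated_write(cluster, "mail-42", "draft 1"))  # all replicas reachable
    cluster[0].reachable = False
    cluster[1].reachable = False
    print(replicated_write(cluster, "mail-42", "draft 2"))  # only one replica left
```

With three replicas, the update still succeeds while one replica is unreachable, but it is refused once two are gone: the remaining replica stays consistent and the system becomes unavailable for writes. An unreachable replica also keeps its old value until it is brought up to date, which a real implementation would have to handle.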