Matthew K. Mukerjee. Eliminating Adverse Control Plane Interactions
Total Page:16
File Type:pdf, Size:1020Kb
Eliminating Adverse Control Plane Interactions in Independent Network Systems Matthew K. Mukerjee May 2018 CMU-CS-18-106 Computer Science Department School of Computer Science Carnegie Mellon University Pittsburgh, PA ¿esis Committee: Srinivasan Seshan, Chair Vyas Sekar Peter Steenkiste Bruce Maggs (Duke; Akamai) Submitted in partial fulllment of the requirements for the degree of Doctor of Philosophy. Copyright © 2018 Matthew K. Mukerjee. ¿is work was supported by the National Science Foundation under grants numbered CNS-1040801, CNS-1314721, CNS-1345305, and CNS-1565343. ¿e views and conclusions contained in this document are those of the author and should not be interpreted as representing the ocial policies, either expressed or implied, of any sponsoring institution, the U.S. government, or any other entity. Keywords: networks, control plane, split control, control competition, information sharing, interfaces, recongurable datacenters, datacenter optics, content brokering, CDNs, live video delivery To Sophie for putting up with this long, crazy ride. iv Abstract Network system operation is typically divided into control and data planes— while the data plane is responsible for processing individual messages or packets, the control plane computes the conguration of devices and optimizes system-wide performance. Unfortunately, the control plane of each network system or protocol layer typically operates independently (e.g., CDNs selecting servers without coordinating with ISPs), resulting in poor interactions between control planes across systems. We can categorize existing systems into one of four general control plane coordination mechanisms that overcome these problems, based on how much information can be shared between control planes. If no information can be shared, control planes simply react to data plane changes as a rudimentary form of coordination (e.g., CDN server selection + ISP trac engineering). If all information can be shared, transparency in decision making can remove most poor interactions (e.g., Coow datacenter ow scheduling). In many scenarios, however, only some information can be shared (e.g., between control planes running in dierent companies). Coordination in these scenarios is more specialized; control planes with separate data plane resources can use priority ranking (i.e., providing a list of preferences for resources without needing to show how these preferences were decided; e.g., BGP routing between ISPs), and control planes with shared data plane resources can use hierarchical partitioning (i.e., making coarse- grained decisions globally, and ne-grained decisions locally; e.g., internet-wide BGP + OSPF routing). While systems utilizing control plane coordination exist today, they have been designed ad hoc. We propose a set of recipes that show when it’s appropriate to use dierent coordination mechanisms, based on key properties (information sharing and shared resources) in varied scenarios (layering, administrative separation, and internet-scale systems). We use these recipes to guide system design in a variety of contexts, as a case study in control plane coordination. First, many systems use layer separation for modularity, but in doing so trade performance for generality. As layered systems have no infor- mation sharing constraints, we argue that transparency (i.e., cross-layer optimization; allowing layers to run specialized code to better use other layers) is the correct tech- nique for regaining performance. We explore this with our emulator Etalon [116], in the context of recongurable datacenters. Second, some systems are administratively separate (i.e., are split across dierent companies), limiting information sharing for business reasons. ¿ese systems may overcome adverse interactions using priority ranking. We explore this with VDX [114, 115], in the context of content brokering. Finally, systems needing internet scalability are increasingly combining a slow central- ized control plane with a fast distributed control plane. Complete information sharing (and thus, transparency) would appear possible, but timescale separation makes this fundamentally impractical. Instead, hierarchical partitioning can overcome the chal- lenges present in this scenario. We explore this with VDN [117], in the context of live video delivery. ¿rough this case study we nd that these coordination mechanism not only solve a variety of problems, but can be eciently implemented. vi Acknowledgments First and foremost I would like to thank my girlfriend Sophie for always encourag- ing me to push forward on what I want to work on and providing innite emotional support. I would also like to thank my family (Mom, Dad, Amanda, and Austin) for their love and emotional support. I would like to thank my advisor, Srini Seshan, for seven years of fun tackling a large variety of interesting problems, as well as giving me the needed encouragement to keep submitting papers until they get in. I would also like to thank the faculty on my committee (Peter Steenkiste, Vyas Sekar, and Bruce Maggs) for all the help and feedback that went into this dissertation, as well as fun collaborations throughout the years. I’d like to thank my undergrad and masters advisors, Andrew Campbell, Tanzeem Choudhury, and Daniel Freedman, for pushing me towards research and eventually a PhD. I’d like to thank my mentors at Google, Ben Greenstein, Mike Piatek, and Matt Welsh, for a fun Summer internship. I’d like to thank the research and support sta in CSD and XIA, Deb Cavlovich, Angie Miller, Kathy McNi, Jenn Landefeld, Angy Malloy, Dan Barrett, and Nitin Gupta, for making sure everything just works and minimizing a lot of the pain of getting a PhD. I would like to thank all my friends, collaborators, and mentors: David Naylor, Nico Feltman, Lili Ehrlich, Susu and Ray Kanemoto, Evan Shimizu, Alex Lloyd and Dana Daugherty, Lauren and Logan Plath, Ravi Choudhuri, Elizabeth, Matt, and Argus Grin, Kris Salada, Mitch Cairns, and Michaela Nachtigall, Yu Zhao, Lindsey and Vince Slaugh, Brian Russman, Richard Wang and Yvonne Yu, Yuchen Wu, Alex Poms, John Wright, Hyeontaek Lim, David Witmer, Junchen Jiang, Bruno Vavala, Ying Jung Chen, Eunjin Lee, Nadi Bozkurt, Conglong Li, Devdeep Ray, Ranysha Ware, Athula Balachandran, George Nychis, Wolf Richter, Raja Sambasivan, Dongsu Han, Justine Sherry, Hui Zhang, Dave Andersen, Michael Kaminsky, George Porter, Alex Snoeren, other people working under XIA, and the CMU systems seminar folks. I apologize if I forgot anyone. Finally, I’dlike to thank the anonymous reviewers for SIGCOMM, NSDI, CoNEXT, and HotNets for their feedback on dra s of the included work. ¿e work presented in this dissertation is based on work in submission to SIGCOMM 2018 [116], appearing in CoNEXT 2017 [115], HotNets 2016 [114], and SIGCOMM 2015 [117], and builds on work appearing in CoNEXT 2015 [98] and ANCS 2017 [95]. viii Contents 1 Introduction1 1.1 Coordination Mechanisms.................................4 1.2 Understanding designs for split control plane systems.................8 1.2.1 Salient features...................................8 1.2.2 Related Work.................................... 11 1.2.3 Design Space.................................... 13 1.3 Common Scenarios..................................... 14 1.4 Recipes for control plane coordination.......................... 16 1.5 Scope............................................. 17 1.6 Goals............................................. 17 1.7 Contributions........................................ 17 2 Control Planes in Dierent Layers: A Case Study in Recongurable Datacenters 19 2.1 Introduction......................................... 20 2.2 Setting............................................. 22 2.2.1 Network Model.................................. 22 2.2.2 Computing Schedules............................... 23 2.2.3 Schedule Execution................................ 23 2.2.4 Challenges..................................... 23 2.3 Etalon............................................. 24 2.3.1 Overview...................................... 24 2.3.2 So ware Switch.................................. 25 2.3.3 Time Dilation................................... 26 2.3.4 Testbed....................................... 26 2.3.5 Validation...................................... 27 2.4 Overcoming rapid bandwidth uctuation with dynamic buer resizing...... 27 2.4.1 Understanding the problem........................... 27 2.4.2 Dynamic buer resizing............................. 29 2.4.3 Experiments.................................... 30 2.4.4 Incorporating explicit network feedback.................... 31 2.4.5 Delay sensitivity analysis............................. 32 2.5 Overcoming poor demand estimation with ADUs................... 32 2.5.1 Understanding problems with demand estimation.............. 32 ix 2.5.2 Using ADUs.................................... 33 2.5.3 Experiments.................................... 34 2.5.4 Cumulative results................................. 35 2.6 Overcoming dicult-to-schedule workloads with application-specic changes................................ 36 2.6.1 HDFS write placement diculties and solutions............... 37 2.6.2 Experiments.................................... 38 2.7 Related work......................................... 40 2.8 Summary........................................... 40 3 Control Planes in Dierent Businesses: A Case Study in Content Brokering 43 3.1 Introduction......................................... 44 3.2 Content Delivery: ¿e Past and