Making Networks Robust to Component Failures
University of Massachusetts Amherst
ScholarWorks@UMass Amherst
Doctoral Dissertations, Dissertations and Theses
Spring August 2014

Daniel Gyllstrom, University of Massachusetts Amherst

Part of the OS and Networks Commons

Recommended Citation
Gyllstrom, Daniel, "Making Networks Robust to Component Failures" (2014). Doctoral Dissertations. 88.
https://doi.org/10.7275/qvzz-3479
https://scholarworks.umass.edu/dissertations_2/88

This Open Access Dissertation is brought to you for free and open access by the Dissertations and Theses at ScholarWorks@UMass Amherst. It has been accepted for inclusion in Doctoral Dissertations by an authorized administrator of ScholarWorks@UMass Amherst.

MAKING NETWORKS ROBUST TO COMPONENT FAILURES

A Dissertation Presented by
DANIEL P. GYLLSTROM

Submitted to the Graduate School of the University of Massachusetts Amherst in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
May 2014
School of Computer Science

© Copyright by Daniel P. Gyllstrom 2014
All Rights Reserved

Approved as to style and content by:
Jim Kurose, Chair
Prashant Shenoy, Member
Deepak Ganesan, Member
Lixin Gao, Member
Lori A. Clarke, Chair, School of Computer Science

ACKNOWLEDGMENTS

The help and support of family, friends, and colleagues made this thesis possible. Starting with my wife, Sarah, thank you for your love, support, and reminding me to enjoy the fun things in life when research was a struggle. Next, thank you to my daughter Sophia. Coming home to you was (is) the best part of my day. You forced me to create some structure to my work days (Mom, you were right!); I can't help but think that if you had been born a few years earlier I would have graduated a year or two sooner.
Thank you, Mom and Dad, for your love and constant encouragement, especially during tough times. Your support guided me through the program. I hope to be the same parent to my kids. To my sisters Amanda, Lisa, and Sara, thank you for your unconditional love and support. JP, Dave, and Devon, thank you for sharing a beer with me whenever I asked (selfless).

To my second family (Cecile, Vladimir, Grandma Lola, Felicia, Adam, and Hannah), thank you for your care and support. Avon, CT will always be a haven of relaxation and sleep.

Thank you, Jim Kurose, for being my mentor, role model, and advisor. If I had never set foot in your graduate Networking course, I would have left the program. You injected energy and curiosity into research that re-inspired me to stay in the Ph.D. program. You gave me the independence I wanted and a guiding hand when necessary. I will always aspire to have your enthusiasm and approach to both teaching and research.

Thank you, Leeanne Leclerc, for believing in me, keeping me in order (you even reminded me to buy Sarah a Christmas present), and helping me through the program. It was always a treat to stop by your office.

Thank you to my friends and colleagues in the Networks Lab. I was lucky to have such a bright group of friends to help me work through research challenges, share a cup of coffee, and geek out on networking research. Moe, Karl, D. Brent, Oggy, et al., thank you for being my chilled-out entertainers.

ABSTRACT

MAKING NETWORKS ROBUST TO COMPONENT FAILURES
MAY 2014
DANIEL P. GYLLSTROM
B.Sc., TRINITY COLLEGE
M.Sc., UNIVERSITY OF MASSACHUSETTS AMHERST
Ph.D., UNIVERSITY OF MASSACHUSETTS AMHERST
Directed by: Professor Jim Kurose

In this thesis, we consider instances of component failure in the Internet and in networked cyber-physical systems, such as the communication network used by the modern electric power grid (termed the smart grid).
We design algorithms that make these networks more robust to various component failures, including failed routers, failures of links connecting routers, and failed sensors. This thesis divides into three parts: recovery from malicious or misconfigured nodes injecting false information into a distributed system (e.g., the Internet), placing smart grid sensors to provide measurement error detection, and fast recovery from link failures in a smart grid communication network.

First, we consider the problem of malicious or misconfigured nodes that inject and spread incorrect state throughout a distributed system. Such false state can degrade the performance of a distributed system or render it unusable. For example, in the case of network routing algorithms, false state corresponding to a node incorrectly declaring a cost of 0 to all destinations (maliciously or due to misconfiguration) can quickly spread through the network. This causes other nodes to (incorrectly) route via the misconfigured node, resulting in suboptimal routing and network congestion. We propose three algorithms for efficient recovery in such scenarios and evaluate their efficacy.

The last two parts of this thesis consider robustness in the context of the electric power grid. We study the use and placement of a sensor, called a Phasor Measurement Unit (PMU), currently being deployed in electric power grids worldwide. PMUs provide voltage and current measurements at a sampling rate orders of magnitude higher than the status quo. As a result, PMUs can both drastically improve existing power grid operations and enable an entirely new set of applications, such as the reliable integration of renewable energy resources. However, PMU applications require correct (addressed in thesis part 2) and timely (covered in thesis part 3) PMU data. Without these guarantees, smart grid operators and applications may make incorrect decisions and take corresponding (incorrect) actions.
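
To make the false-state routing example above concrete, the following minimal Python sketch runs a synchronous distance-vector (Bellman-Ford style) iteration on a small hypothetical topology, once with every node honest and once with one node advertising a cost of 0 to all destinations. The function name run_distance_vector, the node names, and the link costs are illustrative assumptions and are not taken from the thesis. The sketch only shows how the false state spreads; it does not implement any of the three recovery algorithms.

```python
def run_distance_vector(nodes, links, liar=None):
    """Synchronous distance-vector iteration on an undirected graph.

    nodes: list of node names.
    links: dict mapping (u, v) -> link cost (treated as undirected).
    liar:  optional node that advertises cost 0 to every destination
           (hypothetical stand-in for a compromised or misconfigured node).
    Returns (dist, next_hop), both indexed as [source][destination].
    """
    inf = float("inf")
    neighbors = {u: {} for u in nodes}
    for (u, v), cost in links.items():
        neighbors[u][v] = cost
        neighbors[v][u] = cost

    dist = {u: {d: (0 if u == d else inf) for d in nodes} for u in nodes}
    next_hop = {u: {d: None for d in nodes} for u in nodes}

    def advertised(u):
        # The liar claims a least cost of 0 to every destination.
        if u == liar:
            return {d: 0 for d in nodes}
        return dict(dist[u])

    # len(nodes) rounds is enough for this small static graph to converge.
    for _ in range(len(nodes)):
        adverts = {u: advertised(u) for u in nodes}
        for u in nodes:
            for d in nodes:
                if u == d:
                    continue
                best, hop = inf, None
                for v, cost in neighbors[u].items():
                    candidate = cost + adverts[v][d]
                    if candidate < best:
                        best, hop = candidate, v
                dist[u][d], next_hop[u][d] = best, hop
    return dist, next_hop


# Hypothetical topology: M - A - B - C - D (M hangs off A).
nodes = ["A", "B", "C", "D", "M"]
links = {("A", "B"): 1, ("B", "C"): 2, ("C", "D"): 2, ("A", "M"): 1}

honest_dist, honest_hop = run_distance_vector(nodes, links)
bad_dist, bad_hop = run_distance_vector(nodes, links, liar="M")

print(honest_dist["A"]["D"], honest_hop["A"]["D"])  # 5 B
print(bad_dist["A"]["D"], bad_hop["A"]["D"])        # 1 M
print(bad_hop["B"]["D"])                            # A
```

In this toy run, B ends up forwarding traffic for D away from D (toward A and then M), even though B never communicates with M directly, which illustrates how one node's false advertisement can propagate through its neighbors' distance vectors.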
The second part of this thesis addresses PMU measurement errors, which have been observed in practice. We formulate a set of PMU placement problems that aim to satisfy two constraints: place PMUs "near" each other to allow for measurement error detection and use the minimal number of PMUs to infer the state of the maximum number of system buses and transmission lines. For each PMU placement problem, we prove it is NP-Complete, propose a simple greedy approximation algorithm, and evaluate our greedy solutions.

In the last part of this thesis, we design algorithms for fast recovery from link failures in a smart grid communication network. We propose, design, and evaluate solutions to all three aspects of link failure recovery: (a) link failure detection, (b) algorithms for pre-computing backup multicast trees, and (c) fast backup tree installation. To address (a), we design link-failure detection and reporting mechanisms that use OpenFlow to detect link failures when and where they occur inside the network. OpenFlow is an open source framework that cleanly separates the control and data planes for use in network management and control. For part (b), we formulate a new problem, Multicast Recycling, that pre-computes backup multicast trees that aim to minimize control plane signaling overhead. We prove Multicast Recycling is at least NP-hard and present a corresponding approximation algorithm. Lastly, two control plane algorithms are proposed that signal data plane switches to install pre-computed backup trees. An optimized version of each installation algorithm is designed that finds a near minimum set of forwarding rules by sharing forwarding rules across multicast groups. This optimization reduces backup tree install time and associated control state. We implement these algorithms using the POX open-source OpenFlow controller and evaluate them using the Mininet emulator, quantifying control plane signaling and installation time.
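
As a rough illustration of what a greedy placement heuristic of the kind mentioned for the PMU placement problems can look like, the Python sketch below repeatedly places a PMU at the bus that would newly observe the most buses. It rests on a simplifying assumption that is not stated in the abstract: a PMU is taken to observe its own bus and every adjacent bus. The thesis's actual observability rules, its "near each other" error-detection constraint, and its greedy algorithms are not reproduced here, and the function name greedy_pmu_placement and the six-bus example are hypothetical.

```python
def greedy_pmu_placement(adjacency, budget):
    """Greedy coverage heuristic for placing at most `budget` PMUs.

    adjacency: dict mapping each bus to a list of neighboring buses.
    Assumption (not from the source): a PMU at a bus observes that bus
    and all of its neighbors.
    Returns (placed, observed): chosen buses in order, and the set of
    buses observed by them.
    """
    observed = set()
    placed = []
    for _ in range(budget):
        best_bus, best_gain = None, 0
        # sorted() makes tie-breaking deterministic (lowest bus id wins).
        for bus in sorted(adjacency):
            if bus in placed:
                continue
            covered = {bus} | set(adjacency[bus])
            gain = len(covered - observed)
            if gain > best_gain:
                best_bus, best_gain = bus, gain
        if best_bus is None:
            break  # no remaining bus would observe anything new
        placed.append(best_bus)
        observed |= {best_bus} | set(adjacency[best_bus])
    return placed, observed


# Hypothetical six-bus system.
grid = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2, 5], 5: [4, 6], 6: [5]}
placed, observed = greedy_pmu_placement(grid, budget=3)
print(placed)            # [2, 5]
print(sorted(observed))  # [1, 2, 3, 4, 5, 6]
```

The heuristic stops early here because two PMUs already observe every bus under the assumed rule. A greedy choice of this kind is simple to compute but carries no optimality guarantee in general.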
TABLE OF CONTENTS

ACKNOWLEDGMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES

CHAPTER

1. INTRODUCTION
   1.1 Thesis Overview
       1.1.1 Component Failures in Communication Networks
       1.1.2 Approaches to Making Networks More Robust to Failures
   1.2 Thesis Contributions
   1.3 Thesis Outline

2. RECOVERY FROM FALSE ROUTING STATE IN DISTRIBUTED ROUTING ALGORITHMS
   2.1 Introduction
   2.2 Problem Formulation
   2.3 Recovery Algorithms
       2.3.1 Preprocessing
       2.3.2 The 2nd Best Algorithm
       2.3.3 The Purge Algorithm
       2.3.4 The CPR Algorithm
       2.3.5 Multiple Compromised Nodes
   2.4 Analysis of Algorithms
   2.5 Simulation Study
       2.5.1 Simulations using Graphs with Fixed Link Weights
             2.5.1.1 Simulation 1: Erdős-Rényi Graphs with Fixed Unit Link Weights
             2.5.1.2 Simulation 2: Erdős-Rényi Graphs with Fixed but Randomly