Sketching as a Tool for Efficient Networked Systems

by

Zaoxing Liu

A dissertation submitted to The Johns Hopkins University in conformity with the requirements for the degree of

Doctor of Philosophy

Baltimore, Maryland

October 2018

© 2018 by Zaoxing Liu

All rights reserved

Abstract

Today, computer systems need to cope with the explosive growth of data in the world. For instance, in data-center networks, monitoring systems are used to measure traffic statistics at high speed, and in financial technology companies, distributed processing systems are deployed to support graph analytics. To fulfill the requirements of handling such large datasets, we build efficient networked systems in a distributed manner most of the time. Ideally, we expect the systems to meet service-level objectives (SLOs) using the least amount of resources. However, existing systems constructed with conventional in-memory algorithms face the following challenges: (1) excessive resource requirements (e.g., CPU, ASIC, and memory) with high cost; (2) infeasibility at larger scales; and (3) processing the data too slowly to meet the objectives.

To address these challenges, we propose sketching techniques as a tool to build more efficient networked systems. Sketching algorithms aim to process the data in one or several passes in an online, streaming fashion (e.g., over a stream of network packets) and compute highly accurate results. With sketching, we maintain only a compact summary of the entire dataset and provide theoretical guarantees on error bounds.

This dissertation argues for a sketching-based design for large-scale networked systems and demonstrates the benefits in three application contexts:

(i) Network monitoring: we build generic monitoring frameworks that support a range of applications on both software and hardware with universal sketches.

(ii) Graph pattern mining: we develop a swift, approximate graph pattern miner that scales to very large graphs by leveraging graph sketching techniques.

(iii) Halo finding in N-body simulations: we design scalable halo finders on CPU and GPU by leveraging sketch-based heavy hitter algorithms.

Acknowledgments

First, I would like to thank my advisor Vladimir (Vova) Braverman for his guidance, endless patience in tolerating my immature ideas and views, and continuous encouragement in pursuing exciting research problems. He has taught me how to do research, from defining research problems to designing elegant solutions. Time and again, I have been amazed by Vova's insights into the core of many research questions. His wisdom and humility inspired me over the past four years and will forever influence my life. Vova is very open-minded and gave me the freedom to pursue the ideas that interest me most; he was also very supportive of whatever I wanted to pursue, from attending various conferences and workshops to connecting me with experts in the field.

Second, I would like to express my gratitude for the support of my committee members Vyas Sekar and Xin Jin. Vyas has always offered his time to discuss ideas with me and provided insightful thoughts on the roots of the problems. I have been very fortunate to work with him on the UnivMon project and several others. His guidance on how to formulate problems and complete system designs has benefited me greatly over the years. Xin has always been energetic and enthusiastic in sharing his thoughts on the ASAP project and others. It was with great joy that I got the chance to work with Xin and learn from him in various ways.

During the PhD journey, I have had the opportunity to interact or collaborate with several researchers and faculty who have mentored me in many ways — Yair Amir, Randal Burns, Tamas Budavari, Michael Dinitz, Gil Einziger, Roy Friedman, Ryan Huang, Gerard Lemson, Mark Neyrinck, Shivaram Venkataraman, Vinodchandran N. Variyam, Ion Stoica, and Alex Szalay. Their advice and suggestions benefited me in accomplishing specific projects and beyond. I'm also thankful to my graduate student collaborators — Zhihao Bai, Ran Ben-Basat, Nikita Ivkin, Anand Iyer, Srinivas Suresh Kumara, Antonis Manousis, Li Song, Tejasvam Singh, Greg Vorsanger, Lin Yang, and Zhishuai Zhang.

I would like to thank my GBO committee members: Tamas Budavari, Andrei Gritsan, Xin Li, and Scott Smith, who examined my pre-proposal towards the dissertation and gave valuable suggestions.

It has been a great pleasure studying in the department, and I want to extend my thanks to the wonderful faculty and staff members here. I am grateful to our awesome administrative team — Zackary Burwell, Debbie Deford, Tonette McClamy, Cathy Thornton, and others — for their responsiveness and service.

I have enjoyed my time at Hopkins. In particular, I feel incredibly fortunate to have met and interacted with many talented graduate students and postdocs: Zhihao Bai, James Browne, Renyuan Cheng, Kuan Cheng, Arka Rai Choudhuri, Shuya Chu, Jinqiu Deng, Venkata Gandikota, Cong Gao, Jiaqi Gao, Yifan Ge, Aarushi Goel, Yuge Gong, Yigong Hu, Kevin Huang, Yao Huang, Nikita Ivkin, Haoyuan Ji, Zhengzhong Jin, Qian Ke, Xingguo Li, Shuo Li, Kunal Lillaney, Chang Liu, Qing Liu, Chang Lou, Hongyuan Mei, Disa Mhembere, Yasamin Nazari, Hang Ou, Fabian Prada, Jalaj Upadhyay, Enayat Ullah, Yuefan Wang, Shiwei Weng, Xiang Xiang, Yanbo Xu, Xi Yang, Lin Yang, Mo Yu, Zhuolong Yu, Andong Zhan, Shi Zhang, Zeyu Zhang, Zehua Zhao, Tuo Zhao, Chao Zheng, Yu Zheng, Da Zheng, Hang Zhu, Zhuodun Zhu, and many others. Thanks to them all for ensuring that my time at Hopkins was smooth and delightful. Special thanks to Tuo Zhao and Yanbo Xu for helping me develop and learn, and for gathering us together during our time at Hopkins.

I want to thank my parents for their all-around support. Thanks to them for believing in my capabilities more than I ever will. Their love and care encourage me to chase my dreams without worry.

Finally, I want to express my deepest gratitude to my wife, Keke, for sharing her valuable time with me in the United States. We share our joy and sorrow, happiness and pain, together. I owe a great deal to her for her endless love, care, and support.

This dissertation is based upon work supported in part by DARPA-111477, NSF IIS- 1447639, Research Trends Det. HR0011-16-P-0014, EAGER CCF-1650041, CAREER CCF- 1652257, Cisco-90073352, and ONR.

To my parents, J.H. Liu & Z.L. Zhao.

To my wife, K.K. Wen.

Table of Contents

Table of Contents
List of Tables
List of Figures

1 Introduction
1.1 Current Practice
1.1.1 Network Monitoring
1.1.2 Graph Pattern Mining
1.2 Background on Sketching Techniques
1.3 Thesis Approach and Contributions
1.3.1 A Robust Network Monitoring Infrastructure
1.3.2 A Fast, Approximate Graph Patterning Framework
1.3.3 A Memory-efficient Halo Finder in N-body Simulations

2 UnivMon: Universal Flow Monitoring with Sketching
2.1 Background and Related Work
2.2 UnivMon architecture
2.3 Theoretical Foundations of UnivMon
2.3.1 Theory of Universal Sketching
2.3.2 Algorithms for Universal Sketching
2.3.3 Application to Network Monitoring
2.4 Network-wide UnivMon
2.4.1 Problem Scope
2.4.2 Strawman Solutions and Limitations
2.4.3 Our Approach
2.4.4 Extension to Multi-path
2.5 UnivMon Implementation
2.5.1 Implementation overview
2.5.2 Mapping UnivMon data plane to P4
2.5.3 Control plane
2.6 Evaluation
2.6.1 Methodology
2.6.2 Single Router Evaluation
2.6.3 Network-wide Evaluation
2.6.4 Summary of Main Findings
2.7 Chapter Summary

3 NitroSketch: Robust Sketch-based Monitoring in Software Switches
3.1 Related Work and Motivation
3.2 Bottleneck Analysis
3.3 NitroSketch Framework
3.3.1 Key Idea
3.3.2 NitroSketch Algorithms
3.3.3 Interface to Other Sketching Algorithms
3.4 Analysis of NitroSketch
3.4.1 Interpretation of Main Theorems
3.4.2 Comparison to Uniform Sampling
3.4.3 Proof of Theorem 3.4.2
3.4.4 Analysis of DS-NitroSketch
3.4.5 Analysis of the Comparison with Uniform Sampling
3.5 Implementation
3.5.1 Data Plane Module
3.5.2 Control Plane Module
3.6 Evaluation
3.6.1 Methodology
3.6.2 Throughput
3.6.3 CPU Utilization
3.6.4 Accuracy and Convergence Time
3.6.5 Comparison with Other Solutions
3.7 Chapter Summary

4 ASAP: Fast, Approximate Graph Pattern Mining at Scale
4.1 Background & Motivation
4.1.1 Approximate Pattern Mining
4.1.2 Graph Pattern Mining Theory
4.1.2.1 Example: Triangle Counting
4.1.3 Challenges
4.2 ASAP Overview
4.3 Approximate Pattern Mining in ASAP
4.3.1 Extending to General Patterns
4.3.1.1 Analysis of General Patterns
4.3.1.2 Programming API
4.3.2 Applying to Distributed Settings
4.3.3 Advanced Mining Patterns
4.4 Building the Error-Latency Profile (ELP)
4.4.1 Building Estimator vs. Time Profile
4.4.2 Building Estimator vs. Error Profile
4.4.3 Handling Evolving Graphs
4.5 Evaluation
4.5.1 Overall Performance
4.5.2 Advanced Pattern Mining
4.5.3 Effectiveness of ELP Techniques
4.5.4 Scaling ASAP on a Cluster
4.5.5 More Complex Patterns
4.6 Related Work
4.7 Chapter Summary

5 Streaming Algorithms for Halo Finders
5.1 Streaming Algorithm
5.1.1 Streaming Data Model
5.1.1.1 Definitions
5.1.1.2 Heavy Hitter
5.1.1.3 Data Transformation
5.1.1.4 Heavy Hitter and Dense Cells
5.1.2 Streaming Algorithms for Heavy Hitter Problem
5.1.2.1 The Count-Sketch Algorithm
5.1.2.2 The Pick-and-Drop Sampling Algorithm
5.2 Implementation
5.2.1 Simulation Data
5.2.2 Implementation Details
5.2.2.1 Count-Sketch-based Halo Finder
5.2.2.2 Pick-and-Drop-based Halo Finder
5.2.3 Shifting Method
5.3 Evaluation
5.3.1 Correctness
5.3.2 Stability
5.3.3 Memory Usage
5.4 Chapter Summary

6 Conclusions and Future Work
6.1 Summary of Contributions
6.2 Potential Limitations
6.3 Future Work

Bibliography

List of Tables

1.1 Summary of the efficiency optimization goals in different application settings.
2.1 CAIDA traces in the evaluation.
2.2 Time to compute sketching manifests using ILP.
3.1 I/O comparison on a single core.
3.2 CPU hotspots in an OVS-DPDK vswitchd thread.
3.3 Cache and DRAM access for a simple hash table.
3.4 CPU hotspots on UnivMon with OVS-DPDK.
4.1 ASAP's Approximate Pattern Mining API.
4.2 Graph datasets used in evaluating ASAP.
4.3 Comparing the performance of ASAP and Arabesque on large graphs. The System column indicates the number of machines used and the number of cores per machine.
4.4 Improvements from techniques in ASAP that handle advanced pattern mining queries.
4.5 ELP building time for different tasks on the UK graph.
4.6 Approximating 5-Motif patterns in ASAP.

List of Figures

2.1 Overview of UnivMon: The data plane nodes perform the monitoring operations and report sketch summaries to the control plane, which calculates application-specific metric estimates.
2.2 High-level view of universal sketch.
2.3 Example topology to explain the one-big-switch notion and to compare candidate network-wide solutions.
2.4 ILP to compute sketching manifests.
2.5 Example topology to showcase the difficulty of multi-path.
2.6 An illustration of UnivMon's stages along with the two main implementation options.
2.7 Error rates of HH, Change, and DDoS for UnivMon and OpenSketch.
2.8 Error vs. memory for HH, DDoS, and Change.
2.9 Average memory usage to achieve a 1% error rate for different time intervals.
2.10 Error rates of Entropy and F2 estimation.
2.11 The impact of a growing portfolio of monitoring applications on the relative performance.
2.12 Analyzing different HH data structures.
2.13 Network-wide evaluation on major ISP backbone topologies.
3.1 Count-Min Sketch example.
3.2 Packet rate of different data structures using random 64B packets. (Setting: single-thread OVS-DPDK.)
3.3 (a) Before using NitroSketch, each packet goes through multiple hash computations (e.g., O(log δ⁻¹)), updates multiple counters, and queries and updates a top-k HH storage (e.g., a heap). (b) After applying NitroSketch, only a small portion of packets (say 5%, chosen by geometric sampling) go through O(1) hash computations, update one row of counters (instead of all rows), and occasionally update a top-k structure. Therefore, the CPU cost is significantly reduced.
3.4 Overview of the evaluation testbed.
3.5 Throughput/packet rate on OVS-DPDK with the all-in-one version using CAIDA and data center traces.
3.6 Throughput/packet rate on OVS-DPDK and VPP with the separate-thread version using 64B packets and data center traces. In (a), virtual switches use one CPU core to switch packets, while in (b) and (c) there are two cores.
3.7 Throughput over time for the delayed sampling approach (DS-NitroSketch) with two different sketches. (Setting: 40GbE with CAIDA traces.)
3.8 (a) Throughput vs. memory for varying error targets. (b) Throughput with different NitroSketch components applied: 0: vanilla UnivMon; 1: add AVX2-parallelized hash; 2: apply also NitroSketch; 3: add pre-batched geometric samples; 4: apply also sampling for heap update. (Setting: one vswitchd thread with 40GbE NIC.)
3.9 CPU usage of the all-in-one version (NitroSketch-AIO) and the separate-thread version (NitroSketch-ST).
3.10 (a), (b) Error rates of NitroSketch. (c) Convergence time on CAIDA traces.
3.11 (a) In-memory packet rates: SketchVisor vs. NitroSketch. (b) Memory usage: NetFlow vs. NitroSketch.
3.12 HH errors on SketchVisor and NitroSketch, in CAIDA, DDoS, and data center traces.
3.13 HH recall rates on NetFlow/sFlow with different sampling rates and NitroSketch with 0.01, using CAIDA, DDoS, and data center traces.
4.1 Simply extending approximate processing techniques to graph pattern mining does not work.
4.2 Triangle count by neighborhood sampling.
4.3 ASAP architecture.
4.4 Two ways to sample four-cliques. (a) Sample two adjacent edges (0, 1) and (0, 3), sample another adjacent edge (1, 2), and wait for the other three edges. (b) Sample two disjoint edges (0, 1) and (2, 3), and wait for the other four edges.
4.5 Example approximate pattern mining programs written using the ASAP API.
4.6 Runtime with graph partition.
4.7 The actual relations between the number of estimators and the runtime or error rate.
4.8 ASAP is able to gain up to a 77× improvement in performance against Arabesque. The gains increase with larger graphs and more complex patterns. The y-axis is in log scale.
4.9 Runtime vs. number of estimators for the Twitter, Friendster, and UK graphs. The black solid lines are ASAP's fitted lines.
4.10 Error vs. number of estimators for the Twitter, Friendster, and UK graphs.
4.11 CDF of 100 runs with a 3% error target.
4.12 The errors from two cluster scenarios with different numbers of nodes. Config-1: strong scaling, fixing the total number of estimators as 2M × 128; Config-2: weak scaling, fixing the number of estimators per executor as 2M.
4.13 Two representative (from 21) patterns in 5-Motif.
5.1 Count-Sketch algorithm.
5.2 Pick-and-Drop algorithm.
5.3 Halo mass distribution of various halo finders.
5.4 Count-Sketch algorithm.
5.5 Pick-and-Drop sampling.
5.6 Halo finder procedure.
5.7 (a) Measure of the disagreement between PD and CS, and various in-memory algorithms. The percentage shown is the fraction of haloes farther than a half-cell diagonal (0.5√3 Mpc/h) from PD or CS halo positions. (b) The number of top-1000 FoF haloes farther than a distance d away from any top-1000 halo from the algorithm of each curve.
5.8 Number of detected halos by our two algorithms. The solid lines correspond to (CS) and the dashed lines to (PD). The dotted line at k = 1000 shows our selection criteria. The x-axis is the threshold in the number of particles allocated to the heavy hitter. The cyan color denotes the total number of detections, the blue curves are the true positives (TP), and the red curves are the false positives (FP).
5.9 This ROC curve shows the tradeoff between true and false detections as a function of threshold. The figure plots TPR vs. FPR on a log-log scale. The two thresholds are shown with symbols: the circle denotes 1000, and the square is 900.
5.10 The top 1000 heavy hitters are rank-ordered by the number of their particles. We also computed a rank of the corresponding FoF halo. The linked pairs of ranks are plotted. One can see that if we adopted a cut at k = 900, it would eliminate a lot of the false positives.
5.11 Each line on the graph represents the top 1000 halo centers found with Pick-and-Drop sampling, Count-Sketch, and in-memory algorithms. The shaded area (too small to be visible) shows the variation due to randomness.

Chapter 1

Introduction

Over the last decade, we have experienced an exponential growth of data in the world, and this trend will continue to 2020 and beyond [1]. The challenges of handling massive-scale datasets arise in domains such as data centers, enterprises, ISPs, and scientific research. Each domain requires large networked systems to handle tasks such as cloud services, network monitoring, security, or other computation-heavy tasks. For instance, the network operators of data centers may utilize monitoring systems to (1) effectively measure the network performance, (2) efficiently detect anomalies in the network, and (3) detect and filter malicious network traffic as much as possible. In financial technology companies, large distributed systems are deployed to (1) compute graph analytics to detect outliers, (2) mine complex graph patterns to detect fraudulent transactions, and (3) perform stream processing to drop malicious transactions in an online fashion.

However, as the amount of data explodes and workload patterns evolve, systems built with traditional in-memory algorithms are unable to handle the fast growth of the data in one or more of the following aspects: (i) excessive hardware resource requirements, (ii) infeasibility at larger scales, and (iii) significant slowness in processing the data. The reason is that improvements in hardware cannot keep pace linearly with the growth of datasets. To address these challenges, we propose to build efficient systems with sketching algorithms¹ when approximate results are acceptable. While there has been a large body of work on sketching techniques in the streaming processing model in the theory community, little work has been done on building systems with sketches in different applications. To put the work presented in this dissertation in perspective, we discuss sketching techniques for building efficient systems in the contexts of network monitoring, graph pattern mining, and halo finding in astrophysical N-body simulations. In this chapter, we discuss the current practice and the background on sketching algorithms before briefing our approaches and contributions.

¹In this thesis, we refer to "sketching algorithms" and "sketches" interchangeably.

1.1 Current Practice

1.1.1 Network Monitoring

Network management today requires accurate estimates of metrics for many applications including traffic engineering (e.g., heavy hitters), anomaly detection (e.g., entropy of source addresses), and security (e.g., DDoS detection). Obtaining accurate estimates given router CPU and memory constraints is a challenging problem. Existing approaches fall in one of two undesirable extremes: (1) low-fidelity general-purpose approaches such as packet sampling, or (2) high-fidelity but complex algorithms customized to specific application-level metrics.

Packet Sampling. Existing network monitoring tools in the industry depend on sampled flow measurements from routers (e.g., NetFlow [2] or sFlow [3]). These tools sample packets and aggregate them either per packet (identified by flow keys) or per volume (counted in bytes). The core technique is to aggregate uniformly sampled packets into flow reports and to compute any metrics of interest from these reports. While sampling-based approaches are useful for coarse-grained metrics (e.g., total volume or estimated flow size distribution), they cannot offer good fidelity unless running at a very high sampling rate, which is undesirable due to computation and memory overhead.
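To make the fidelity limitation concrete, here is a minimal, hypothetical Python sketch (our own toy, not the code of NetFlow or sFlow) that mimics uniform packet sampling and scales sampled counts by the inverse sampling rate: the elephant flow is estimated reasonably well, while most small flows are never sampled at all.

```python
import random
from collections import Counter

def sample_and_estimate(packets, rate=0.01, seed=42):
    """Uniformly sample packets at `rate` and scale counts by 1/rate.

    `packets` is an iterable of flow keys (e.g., 5-tuples); returns a map
    from each sampled flow key to its estimated packet count.
    """
    rng = random.Random(seed)
    sampled = Counter(pkt for pkt in packets if rng.random() < rate)
    return {flow: cnt / rate for flow, cnt in sampled.items()}

# Toy trace: one elephant flow and many mice flows.
trace = ["elephant"] * 100_000 + [f"mouse{i}" for i in range(5_000)]
est = sample_and_estimate(trace, rate=0.01)

print(est.get("elephant"))                                   # close to 100000
print(sum(1 for i in range(5_000) if f"mouse{i}" not in est))  # most mice are missed
```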

Application-specific Algorithms. To address the drawbacks of the packet sampling approach, researchers have proposed a number of application-specific sketches to handle specific measurement tasks. These sketching algorithms allow for memory-efficient monitoring systems, as they reduce the memory usage of measurement tasks while maintaining guaranteed fidelity. There is always a trade-off between memory and accuracy, backed by rigorous theoretical proofs. Examples of monitoring tasks that are supported by sketches include:

• Heavy Hitter Detection to identify flows that consume more than a threshold α of the total capacity. The capacity can be packet-based (flow keys) or volume-based (byte counts). Example custom algorithms include the Count-Min Sketch [4], Space-Saving [5], and the Count Sketch [6]. (A simplified Count-Min example follows this list.)

• Cardinality Estimation to estimate the number of distinct flows in the traffic [7].

• Change Detection to identify flows that contribute more than a threshold of the total capacity change over two consecutive time intervals, using reversible k-ary sketches [8, 9].

• Entropy Estimation to measure the entropy of a specific header field distribution (e.g., Lall et al. [10]).

• Attack Victim Detection to identify a destination host that receives traffic from more than a threshold number of source hosts [11].
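As a concrete illustration of the custom-sketch approach, the following is a minimal Count-Min Sketch in Python (a simplified sketch of the idea in [4], not an implementation used in this dissertation): each packet updates one counter per row, a flow's frequency is estimated as the minimum of its counters, and the estimate overestimates by at most εN (N being the total count) with probability 1 − δ for width ⌈e/ε⌉ and depth ⌈ln(1/δ)⌉.

```python
import hashlib
import math

class CountMinSketch:
    """Simplified Count-Min Sketch: depth x width counters, one hash per row."""

    def __init__(self, epsilon=0.01, delta=0.01):
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1.0 / delta))
        self.table = [[0] * self.width for _ in range(self.depth)]

    def _index(self, row, key):
        digest = hashlib.sha1(f"{row}:{key}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def update(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += count

    def estimate(self, key):
        return min(self.table[row][self._index(row, key)]
                   for row in range(self.depth))

# Heavy hitter check against a threshold alpha of the total capacity.
cms, total = CountMinSketch(), 0
for flow in ["A"] * 900 + ["B"] * 80 + ["C"] * 20:
    cms.update(flow)
    total += 1
alpha = 0.1
print([f for f in ("A", "B", "C") if cms.estimate(f) > alpha * total])  # ['A']
```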

However, although application-specific approaches have good theoretical guarantees and practical performance, this architecture still has several drawbacks: (1) when handling multiple measurement tasks, it imposes large memory and CPU burdens; and (2) it offers no late binding: the resources and the set of tasks must be committed before the measurement runs.

1.1.2 Graph Pattern Mining

Mining patterns in a graph represents an important class of graph processing problems. The objective is to find instances of a given pattern in a graph or a collection of graphs. The common way of representing graph data is in the form of a property graph [12], where user-defined properties are attached to the vertices and edges of the graph. A pattern is an arbitrary subgraph, and pattern mining algorithms aim to output all subgraphs, commonly referred to as embeddings, that match the input pattern. Matching is done via subgraph isomorphism, which is known to be NP-complete. Several varieties of graph pattern mining problems exist, ranging from finding cliques to mining frequent subgraphs. We refer the reader to [13, 14] for an excellent, in-depth overview of graph mining algorithms.

A common approach to implementing pattern mining algorithms is to iterate over all possible embeddings in the graph starting with the simplest pattern (e.g., a vertex or an edge). We can then check all candidate embeddings and prune those that cannot be a part of the final answer. The resulting candidates are then expanded by adding one more vertex or edge, and the process is repeated until it is not possible to explore further. The obvious challenge in graph pattern mining, as opposed to graph analysis, is the exponentially large candidate set that needs to be checked.
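The toy Python sketch below (an illustration of the generic expand-and-check strategy described above, not the code of any particular system) enumerates connected vertex sets of a target size by repeatedly extending candidates by one neighboring vertex; even on small graphs the candidate set grows combinatorially, which is the scalability problem that approximate mining targets.

```python
from itertools import combinations

def grow_embeddings(adj, size):
    """Enumerate connected vertex sets of `size` by expand-and-check.

    `adj` maps each vertex to its neighbor set. Starting from single
    vertices, each round extends every candidate by one adjacent vertex;
    frozenset keys de-duplicate embeddings reached via different orders.
    """
    candidates = {frozenset([v]) for v in adj}
    for _ in range(size - 1):
        expanded = set()
        for emb in candidates:
            frontier = set().union(*(adj[v] for v in emb)) - emb
            for v in frontier:                    # expand by one vertex
                expanded.add(emb | {v})
        candidates = expanded                     # candidate set keeps growing
    return candidates

# Example: count triangles (3-vertex cliques) in a toy graph.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
embeddings = grow_embeddings(adj, 3)
triangles = [e for e in embeddings
             if all(v in adj[u] for u, v in combinations(e, 2))]
print(len(embeddings), len(triangles))  # many candidates, fewer matches
```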

Distributed graph processing frameworks are built to process large graphs and thus seem like an ideal candidate for this problem. Unfortunately, when applied to graph mining problems, they face several challenges in managing the candidate set. Arabesque [13], a recently proposed distributed graph mining system, discusses these challenges in detail and proposes solutions to tackle several of them. However, even Arabesque is unable to scale to large graphs due to the need to materialize candidates and exchange them between machines. As an example, Arabesque takes over 10 hours to count motifs of size 3 in a graph with less than a billion edges on a cluster of 20 machines, each having 256GB of memory.

Current graph processing systems. A large number of systems have been proposed in the literature for graph processing [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]. Of these, some [15, 17, 18] are single-machine systems, while the rest support distributed processing. By using careful and optimized operations, these systems can process huge graphs, on the order of a trillion edges. However, these systems have focused their attention mainly on graph analysis and do not support efficient graph pattern mining. Some systems implement very specific versions of simple pattern mining (e.g., triangle counting), but they do not support general pattern mining.

Current graph mining systems. Similar to graph processing systems, a number of graph mining systems have also been proposed. Here too, the proposals contain a mix of centralized systems and distributed systems. These proposals can be classified into two categories. The first category focuses on mining patterns in an input consisting of multiple small graphs. This problem is significantly easier, since the system only finds one instance of the pattern in the graph, and it is trivially incorporated in ASAP. Since this approach can be massively parallelized, several distributed systems exist that focus specifically on this problem. The state-of-the-art in distributed, general-purpose pattern mining systems is Arabesque [13]. While it supports efficient pattern mining, the system still requires a significant amount of time to process even moderately sized graphs. A few distributed systems have focused on providing approximate pattern mining. However, these systems focus on a specific algorithm and hence are not general-purpose.

6 1.2 Background on Sketching Techniques

Streaming Model. A data stream D = D(n, m) is an ordered sequence of objects a1, a2, ..., an, where aj ∈ {1, ..., m}. The elements of the stream can represent any digital objects: integers, real numbers of fixed precision, network packets, edges of a graph, messages, images, web pages, etc. The data streaming model has emerged as a natural computational model for a number of applications in big data processing. In this model, algorithms are allowed to access only a limited amount of memory and can process the dataset in one pass (or a few passes), yet are guaranteed to produce sufficiently accurate answers for some objective statistics of the dataset. In particular, the strict memory limitation in this model captures various applications in processing large-scale datasets and makes these algorithms new, scalable tools for networked systems. Let us first describe some fundamental problems in this model.

Lp Norm and Frequency Moment. In a stream D of objects a1, a2, ..., an, we define the frequency vector F(m) as a vector of dimensionality m with non-negative entries fi, i ∈ [m], given by

fi = |{j : 1 ≤ j ≤ n, aj = i}|.

Then, the Lp norm of the frequency vector is defined as Lp = (∑_{i=1}^{m} fi^p)^{1/p}, and the p-th frequency moment is Fp = ∑_{i=1}^{m} fi^p.

Lp Heavy Hitter Problem. We say that an element is “heavy” if it appears more times than a constant fraction of some Lp norm of the stream. We consider the following heavy hitter problem.

Problem 1. Given a stream D of n elements, the ϵ-approximate (ϕ, Lp)-heavy hitter problem is to find a set of elements T such that:

• ∀i ∈ [m], fi > ϕ·Lp ⟹ i ∈ T.

• ∀i ∈ [m], fi < (ϕ − ϵ)·Lp ⟹ i ∉ T.

In networking, we usually consider the L1 heavy hitter problem.
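As a small worked example of these definitions (plain Python with exact computation, not a streaming algorithm), the snippet below builds the frequency vector of a toy stream, evaluates its Lp norm and p-th frequency moment, and reports the (ϕ, L1)-heavy hitters.

```python
from collections import Counter

def frequency_vector(stream):
    """f_i = number of occurrences of item i in the stream."""
    return Counter(stream)

def lp_norm(freqs, p):
    return sum(f ** p for f in freqs.values()) ** (1.0 / p)

def frequency_moment(freqs, p):
    return sum(f ** p for f in freqs.values())

def l1_heavy_hitters(freqs, phi):
    """Exact (phi, L1)-heavy hitters: items with f_i > phi * L1."""
    l1 = frequency_moment(freqs, 1)          # L1 norm = stream length
    return {i for i, f in freqs.items() if f > phi * l1}

stream = [1, 1, 1, 1, 1, 1, 2, 2, 3, 4]      # n = 10 elements, m = 4 distinct items
freqs = frequency_vector(stream)
print(frequency_moment(freqs, 2))             # F2 = 36 + 4 + 1 + 1 = 42
print(round(lp_norm(freqs, 2), 3))            # L2 = sqrt(42) ~ 6.481
print(l1_heavy_hitters(freqs, phi=0.5))       # {1}: 6 > 0.5 * 10
```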

1.3 Thesis Approach and Contributions

In this thesis, we look into the core constructions behind several networked systems and focus on more resource-efficient designs in the application contexts of networking, graph processing, and astrophysics. Before pursuing resource efficiency, we want to understand the resource bottlenecks in different application settings. For instance, in network monitoring, the resource bottleneck on a hardware switch is the memory (SRAM) used to store intermediate data in an online fashion, while this might not be the case in a software switch. In a software switch, we focus on reducing the per-packet processing overhead of the measurement module while still achieving high accuracy. For a detailed bottleneck analysis on software switches, please refer to Section 3.2. However, we still do not want to allocate too much memory on the software switch, since we may want to keep most of the data structures in the CPU cache and save cache and memory for other concurrent network services. In graph pattern mining systems, a deterministic algorithm may generate a number of intermediate subgraph candidates that is exponential in the number of graph edges, which makes it infeasible to handle large graphs due to memory overflow, or leads to slow execution due to memory bandwidth limitations. Similarly, in astrophysical N-body simulations, a deterministic algorithm needs to handle huge intermediate data.

Settings | Cache | RAM | CPU or ASIC
Network Monitoring (Hardware) | ↓ | ↓ | ×
Network Monitoring (Software) | ↓ | × | ×
Graph Pattern Mining | × | ↓ | ↓
N-body Simulation | × | ↓ | ↓

Table 1.1: Summary of the efficiency optimization goals in different application settings.

Thus, we summarize the efficiency goals in Table 1.1 based on the different resource bottlenecks in the three application contexts. ↓ means the usage should be as small as possible and × implies not important. In these cases, by trading a small loss on the accuracy of the results, we can utilize sketching algorithms as the core construction to alleviate the resource bottlenecks and achieve significant efficiency.

1.3.1 A Robust Network Monitoring Infrastructure

UnivMon: Network management requires accurate estimates of metrics for many applications. Existing monitoring approaches fall in one of two undesirable extremes: (1) low fidelity general-purpose approaches such as sampling, or (2) high fidelity but complex algorithms customized to specific application-level metrics. Ideally, a solution should be both general (i.e., supports many applications) and provide accuracy comparable to custom algorithms.

In Chapter 2, we present UnivMon [26], a framework for flow monitoring which leverages recent theoretical advances and demonstrates that it is possible to achieve both generality and high accuracy. UnivMon uses an application-agnostic data plane monitoring primitive; different (and possibly unforeseen) estimation algorithms run in the control plane and use the statistics from the data plane to compute application-level metrics. We implement UnivMon using P4 and develop coordination techniques to provide a "one-big-switch" abstraction for network-wide monitoring. We evaluate the effectiveness of UnivMon using a range of trace-driven evaluations and show that it offers comparable (and sometimes better) accuracy relative to custom sketching solutions across a range of monitoring tasks.

NitroSketch: With increasing virtualization of services and network functions, virtual switches are emerging as an important measurement vantage point. Given the tight resource requirements, sketching algorithms are a promising alternative to traditional monitoring (e.g., sampling or full packet capture). However, sketching algorithms (e.g., the Count-Min Sketch and UnivMon) are typically designed with memory-oriented optimization goals in theory and incur significant computational overhead in software. Unfortunately, existing efforts that try to address this performance issue have to make compromises on the worst-case theoretical guarantees, make strong assumptions about the traffic distributions, or only work for specific sketches.

Chapter 3 presents NitroSketch, a general and efficient software sketching framework that enables line-rate packet processing for a broad spectrum of sketching algorithms. NitroSketch has provable worst-case guarantees, without needing any distributional assumptions about the traffic. We do this by systematically identifying the fundamental performance bottlenecks of sketches and developing rigorous solutions to tackle them. We implement a NitroSketch prototype and integrate it with two popular software switching platforms: Open vSwitch-DPDK and FD.io-VPP. We evaluate NitroSketch on commodity servers and show that accuracy is guaranteed to be above 95% while attaining a 27× speedup in sketching and a 45% reduction in CPU usage.

1.3.2 A Fast, Approximate Graph Patterning Framework

While there has been a tremendous interest in processing data that has an underlying graph structure, existing distributed graph processing systems take several minutes or even hours to mine simple patterns on graphs.

Chapter 4 presents ASAP [27], a fast, approximate computation engine for graph pattern mining. ASAP² leverages state-of-the-art results in graph approximation theory and extends them to general graph patterns in distributed settings. To enable users to navigate the tradeoff between result accuracy and latency, we propose a novel approach to build the Error-Latency Profile (ELP) for a given computation. We have implemented ASAP on a general-purpose distributed dataflow platform and evaluated it extensively on several graph patterns. Our experimental results show that ASAP outperforms existing exact pattern mining solutions by up to 77×. Further, ASAP can scale to graphs with billions of edges without the need for large clusters.

²As co-leading authors, Anand Iyer and I are both using this in our dissertations with full knowledge and support of the other.

11 1.3.3 A Memory-efficient Halo Finder in N-body Simulations

Astrophysical N-body simulations are essential for studies of the large-scale distribution of matter and galaxies in the Universe. This analysis often involves finding clusters of particles and retrieving their properties. Detecting such "halos" among a very large set of particles is a computationally intensive problem, usually executed on the same supercomputers that produced the simulations, requiring huge amounts of memory.

In Chapter 5, we present a novel connection between N-body simulations and sketching algorithms [28]. In particular, we investigate a link between halo finders and the problem of finding frequent items (heavy hitters) in a data stream, which should significantly reduce the computational resource requirements, especially the memory needs. Based on this connection, we build a new halo finder by running efficient heavy hitter algorithms as a black box. We implement two representatives of the family of heavy hitter algorithms, the Count-Sketch algorithm (CS) and Pick-and-Drop sampling (PD), and evaluate their accuracy and memory usage. Comparison with other halo-finding algorithms from [29] shows that our halo finder can locate the largest haloes using significantly less memory and with comparable running time. This streaming approach makes it possible to run and analyze extremely large datasets from N-body simulations on a smaller machine, rather than on supercomputers. Our findings demonstrate the connection between the halo search problem and streaming algorithms as a promising initial direction for further research.

Chapter 2

UnivMon: Universal Flow Monitoring with Sketching

Network management is multi-faceted and encompasses a range of tasks including traffic engineering [30, 31], attack and anomaly detection [32], and forensic analysis [33]. Each such management task requires accurate and timely statistics on different application-level metrics of interest; e.g., the flow size distribution [34], heavy hitters [35], entropy measures [10, 36], or detecting changes in traffic patterns [9].

At a high level, there are two classes of techniques to estimate these metrics of interest. The first class of approaches relies on generic flow monitoring, typically with some form of packet sampling (e.g., NetFlow [2]). While generic flow monitoring is good for coarse-grained visibility, prior work has shown that it provides low accuracy for more fine-grained metrics [37, 38, 39]. These well-known limitations of sampling motivated an alternative class of techniques based on sketching or streaming algorithms. Here, custom online algorithms and data structures are designed for specific metrics of interest and can yield provable resource-accuracy tradeoffs (e.g., [39, 10, 40, 38, 41, 42, 43]).

While the body of work in data streaming and sketching has made significant contributions, we argue that this trajectory of crafting special-purpose algorithms is untenable in the long term. As the number of monitoring tasks grows, it entails significant investment in algorithm design and hardware support for new metrics of interest. While recent tools like OpenSketch [44] and SCREAM [45] provide libraries to reduce the implementation effort and offer efficient resource allocation, they do not address the fundamental need to design and operate new custom sketches for each task. Furthermore, at any given point in time the data plane resources have to be committed (a priori) to a specific set of metrics to monitor and will have fundamental blind spots for other metrics that are not currently being tracked.

Ideally, we want a monitoring framework that offers generality by delaying the binding to specific applications of interest, but at the same time provides the required fidelity for estimating these metrics. Achieving generality and high fidelity simultaneously has been an elusive goal both in theory [46] (Question 24) and in practice [47].

In this chapter, we present the UnivMon (short for Universal Monitoring) framework that can simultaneously achieve both generality and high fidelity across a broad spectrum of monitoring tasks [10, 39, 40, 48]. UnivMon builds on and extends recent theoretical advances in universal streaming, where a single universal sketch is shown to be provably accurate for estimating a large class of functions [49, 50, 51, 52, 53]. In essence, this generality can enable us to delay the binding of the data plane resources to specific monitoring tasks, while still providing accuracy that is comparable to (if not better than) running custom sketches with similar resources.

While our previous paper suggested the promise of universal streaming [54], it fell short of answering several practical challenges, which we address in this chapter. First, we demonstrate a concrete switch-level realization of UnivMon using P4 [55], and discuss key implementation challenges in realizing UnivMon. Second, prior work only focused on a single switch running UnivMon for a specific feature (e.g., source addresses) of interest, whereas in practice network operators want a panoramic view across multiple features and across traffic belonging to multiple origin-destination pairs. To this end, we develop lightweight-yet-effective coordination techniques that enable UnivMon to effectively provide a "one big switch" abstraction for network-wide monitoring [56], while carefully balancing the monitoring load across network locations.

We evaluate UnivMon using a range of traces [57, 58] and operating regimes and compare it to state-of-the-art custom sketching solutions based on OpenSketch [44]. We find that for a single network element, UnivMon achieves comparable accuracy, with an observed error gap ≤ 3.6% and average error gap ≤ 1%. Furthermore, UnivMon outperforms OpenSketch in the case of a growing application portfolio. In a network-wide setting, our coordination techniques can reduce the memory consumption and communication with the control plane by up to three orders of magnitude.

Contributions and roadmap: In summary, this chapter presents the following contributions:

• A practical architecture which translates recent theoretical advances to serve as the basis for a general-yet-accurate monitoring framework (§2.2, §2.3).

• An effective network-wide monitoring approach that provides a one-big-switch abstraction (§2.4).

• A viable implementation using emerging programmable switch architectures (§2.5).

• A trace-driven analysis which shows that UnivMon provides comparable accuracy and space requirements compared to custom sketches (§4.5).

We begin with background and related work in the next section. We highlight outstanding issues and conclude in §6.

2.1 Background and Related Work

Many network monitoring and management applications depend on sampled flow measurements from routers (e.g., NetFlow or sFlow). While these are useful for coarse-grained metrics (e.g., total volume), they do not provide good fidelity unless they are run at very high sampling rates, which is undesirable due to compute and memory overhead.

This inadequacy of packet sampling has inspired a large body of work in data streaming or sketching. This derives from a rich literature in the theory community on streaming algorithms, starting with the seminal "AMS" paper [59], and has since been an active area of research (e.g., [49, 6, 60, 61]). At a high level, the problem these algorithms address is as follows: given an input sequence of items, the algorithm is allowed to make a single or constant number of passes over the data stream while using sub-linear (usually poly-logarithmic) space compared to the size of the data set and the size of the dictionary. The algorithm then provides an approximate estimate of the desired statistical property of the stream (e.g., mean, median, frequency moments). Streaming is a natural fit for network monitoring and has been applied to several tasks including heavy hitter detection [39], entropy estimation [10], and change detection [40], among others.

A key limitation that has stymied the practical adoption of streaming approaches is that the algorithms and data structures are tightly coupled to the intended metric of interest. This forces vendors to invest time and effort in building specialized algorithms, data structures, and corresponding hardware without knowing if these will be useful for their customers. Given the limited resources available on network routers and business concerns, it is difficult to support a wide spectrum of monitoring tasks in the long term.

Moreover, at any instant the data plane resources are committed beforehand to specific application-level metrics, and other metrics that may be required in the future (e.g., as administrators start diagnostic tasks and require additional statistics) will fundamentally not be available.

The efforts closest in spirit to our UnivMon vision are the minimalist monitoring work of Sekar et al. [47] and OpenSketch by Yu et al. [44]. Sekar et al. showed empirically that flow sampling and sample-and-hold [39] can provide comparable accuracy to sketching when equipped with similar resources. However, this work offers no analytical basis for this observation and does not provide guidelines on what metrics are amenable to this approach.

OpenSketch [44] addresses an orthogonal problem of making it easier to implement sketches. Here, the router is equipped with a library of predefined functions in hardware (e.g., hash-maps or count-min sketches [61]) and the controller can reprogram these as needed for different tasks. While OpenSketch reduces the implementation burden, it still faces key shortcomings. First, because the switches are programmed to monitor a specific set of metrics, there will be a fundamental lack of visibility into other metrics for which data plane resources have not been committed, even if the library of functions supports those tasks. Second, to monitor a portfolio of tasks, the data plane will need to run many concurrent sketch instances, which increases resource requirements.

In summary, prior work presents a fundamental dichotomy: generic approaches that offer poor fidelity and are hard to reason about analytically vs. sketch-based approaches that offer good guarantees but are practically intractable given the wide range of monitoring tasks of interest.

Our previous paper makes a case for a "RISC" approach for monitoring [54], highlighting the promise of recent theoretical advances in universal streaming [49, 50]. However, this prior work fails to address several key practical challenges. First, it does not discuss how these primitives can actually be mapped into switch processing pipelines. In fact, we observe that the data-control plane split that it suggests is impractical to realize, as it requires expensive sorting/sifting primitives (see §2.5). Second, this prior work takes a narrow single-switch perspective. As we show later, naively extending it to a network-wide context can result in inefficient use of compute resources on switches and/or inaccurate estimates (see §2.4). This work develops network-wide coordination schemes and demonstrates an implementation in P4 [55]. Further, we show the fidelity of UnivMon on a broader set of traces and metrics.

Figure 2.1: Overview of UnivMon: The data plane nodes perform the monitoring operations and report sketch summaries to the control plane, which calculates application-specific metric estimates.

2.2 UnivMon architecture

In this section, we provide a high-level overview of the UnivMon framework. We begin by highlighting the end-to-end workflow to show the interfaces between (a) the UnivMon control plane and the management applications and (b) the UnivMon control and data plane components. We discuss the key technical requirements that UnivMon needs to satisfy and why these are challenging. Then, we briefly give an overview of the control and data plane design to set up the context for the detailed design in the following sections.¹

Figure 2.1 shows an end-to-end view of the UnivMon framework. The UnivMon data plane nodes run general-purpose monitoring primitives that process the incoming stream of packets and maintain a set of counter data structures associated with this stream. The UnivMon control plane assigns monitoring responsibilities across the nodes. It periodically collects statistics from the data plane and estimates the various application-level metrics of interest.

Requirements and challenges: There are three natural requirements that UnivMon should satisfy:

• [R1.] Fidelity for a broad spectrum of applications: Ideally, UnivMon should require no prior knowledge of the set of metrics to be estimated, and yet offer strong guarantees on accuracy.

• [R2.] One-big-switch abstraction for monitoring: There may be several network-wide management tasks interested in measuring different dimensions of traffic; e.g., source IPs, destination ports, IP 5-tuples. UnivMon should provide a "one big switch" abstraction for monitoring to the management applications running atop UnivMon, so that the estimates appear as if all the traffic entering the network was monitored at a giant switch [56].

¹We use the terms routers, switches, and nodes interchangeably.

• [R3.] Feasible implementation roadmap: While pure software solutions (e.g., Open vSwitch [62]) may be valuable in many deployments, for broader adoption and performance requirements, the UnivMon primitives used to achieve [R1] and [R2] must have a viable implementation in (emerging) switch hardware [55, 63].

Given the trajectory of prior efforts that offer high generality and low fidelity (e.g., packet sampling) vs. low generality and high fidelity (e.g., custom sketches), [R1] may appear infeasible. To achieve [R2], we observe that if each router acts on the traffic it observes independently, it can become difficult to combine the measurements, and/or the monitoring load may become significantly imbalanced across routers. Finally, for [R3], we note that even emerging flexible switches [63, 64, 55] have constraints on the types of operations that they can support.

Approach Overview: Next, we briefly outline how the UnivMon control and data plane designs address these key requirements and challenges:

• UnivMon data plane: The UnivMon data plane uses sketching primitives based on recent theoretical work on universal streaming [49, 50]. By design, these so-called universal sketches require no prior knowledge of the metrics to be estimated. More specifically, as long as these metrics satisfy a series of statistical properties discussed in detail in §2.3, we can prove theoretical guarantees on the memory-accuracy tradeoff for estimating these metrics in the control plane.

• UnivMon control plane: Given that the data plane supports universal streaming, the control plane needs no additional capabilities w.r.t. [R1] once it collects the sketch information from the routers. It runs simple estimation algorithms for every management application of interest, as we discuss in §2.3, and provides simple APIs and libraries for applications to run estimation queries on the collected counters. To address [R2], the UnivMon control plane generates sketching manifests that specify the monitoring responsibility of each router. These manifests specify the set of universal sketch instances for different dimensions of interest (e.g., for source IPs, for 5-tuples) that each router needs to maintain for the different origin-destination (OD) pair paths that it lies on. This assignment takes into account the network topology, the routing policies, and knowledge of the hardware resource constraints of the network elements. (A toy illustration of what such a manifest might contain appears right after this list.)
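As a purely illustrative aid, the snippet below pictures a sketching manifest as a per-router assignment of universal sketch instances to traffic dimensions and OD pairs; the field names, router names, and structure are hypothetical and ours alone, while the actual manifest contents are computed by the control plane as described in §2.4.

```python
# Hypothetical sketching manifest: for each router, the universal sketch
# instances it must maintain, keyed by traffic dimension and by the
# origin-destination (OD) pairs whose paths traverse that router.
manifest = {
    "router_A": [
        {"dimension": "srcIP",   "od_pairs": [("h1", "h3"), ("h1", "h4")]},
        {"dimension": "5-tuple", "od_pairs": [("h1", "h3")]},
    ],
    "router_B": [
        {"dimension": "dstPort", "od_pairs": [("h2", "h4")]},
    ],
}

def sketches_for(router):
    """Number of universal sketch instances a router must run."""
    return len(manifest.get(router, []))

print(sketches_for("router_A"))  # 2
```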

In the following sections, we begin by providing the background on universal streaming that forms the theoretical basis for UnivMon. Then, in §2.4, we describe the network-wide coordination problem that the UnivMon control plane solves. In §2.5, we show how we implement this design in P4 [55, 65].

2.3 Theoretical Foundations of UnivMon

In this section, we first describe the theoretical reasoning behind universal streaming and the class of supported functions [50, 49]. Then, we present and explain the underlying algorithms from universal streaming which serve as a basis for UnivMon. We also show how several canonical network monitoring tasks are amenable to this approach.

2.3.1 Theory of Universal Sketching

For the following discussion, we consider an abstract stream D(m, n) of length m with n unique elements. Let fi denote the frequency of the i-th unique element in the stream.

The intellectual foundations of many streaming algorithms can be traced back to the celebrated lemma by Johnson and Lindenstrauss [66]. This shows that N points in Euclidean space can be embedded into another Euclidean space with an exponentially smaller dimension while approximately preserving the pairwise distance between the points. Alon, Matias, and Szegedy used a variant of the Johnson-Lindenstrauss lemma to approximately compute the second moment of the frequency vector, F2 = ∑i fi² (equivalently, the L2 norm √(∑i fi²)), in the streaming model [59], using a small (polylogarithmic) amount of memory. The main question that universal streaming seeks to answer is whether such methods can be extended to more general statistics of the form ∑i g(fi) for an arbitrary function g. We refer to this statistic as the G-sum.
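To make this concrete, here is a minimal Python rendering of the AMS idea (a didactic sketch, not the estimator used later in this chapter): each item is mapped to a random ±1 sign, a single counter accumulates the signed updates, and the square of the counter is an unbiased estimate of F2 = ∑i fi²; averaging several independent counters reduces the variance.

```python
import random
from collections import Counter

def ams_f2_estimate(stream, counters=64, seed=7):
    """Estimate F2 = sum_i f_i^2 with `counters` independent AMS counters.

    Each counter keeps sum_i s(i) * f_i for a random sign function s; its
    square has expectation F2, so we average the squares.
    """
    rng = random.Random(seed)
    signs = [dict() for _ in range(counters)]      # lazily drawn +/-1 per item
    sums = [0] * counters
    for item in stream:
        for k in range(counters):
            s = signs[k].setdefault(item, rng.choice((-1, 1)))
            sums[k] += s
    return sum(z * z for z in sums) / counters

stream = [1] * 50 + [2] * 30 + [3] * 20
exact = sum(f * f for f in Counter(stream).values())   # 2500 + 900 + 400 = 3800
print(exact, round(ams_f2_estimate(stream), 1))        # the estimate is close to 3800
```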

Class of Stream-PolyLog Functions: Informally, streaming algorithms with polylogarithmic space complexity are known to exist for G-sum functions in which g is monotonic and upper bounded by the function O(fi²) [49, 67].² Note that this only guarantees that some (possibly custom) sketching algorithm exists if G-sum ∈ Stream-PolyLog; it does not argue that a single "universal" sketch can compute all such G-sums.

²This is an informal explanation; the precise characterization is more technically involved and can be found in [49]. While streaming algorithms are also known for G-sum when g grows monotonically faster than fi² [42], they cannot be computed in polylogarithmic space, due to the lower bound Ω(n^(1−2/k)) where k > 2 [68].

Intuition Behind Universality: The surprising recent theoretical result behind universal sketches is that, for any function g() whose G-sum belongs to the class Stream-PolyLog defined above, the G-sum can now be computed using a single universal sketch.

The intuition behind universality stems from the following argument about heavy hitters in the stream. Informally, item i is a heavy hitter w.r.t. g if changing its frequency fi significantly affects the G-sum value as well. For instance, consider the frequency vector (√n, 1, 1, ..., 1) of size n; here the first item is an L2 heavy hitter, since its frequency is a large fraction of the L2 norm of the frequency vector. For a function g, let G-core be the set containing the g-heavy elements, where, for 0 < γ < 1, an element i ∈ [n] is g-heavy if g(fi) > γ ∑j g(fj).

Now, let us consider two cases:

1. There is one sufficiently large g-heavy hitter in the stream: If the frequency vector has one (sufficiently) large heavy hitter, then most of the mass is concentrated in this value. Now, it can be shown that a heavy hitter for the L2 norm of the frequency vector is also a heavy hitter for computable g [49, 67]. Thus, to compute G-core, we can simply find L2 heavy hitters (L2-HH) using known techniques (e.g., [6, 59]) and use them to estimate G-sum.

2. There is no single g-heavy hitter in the stream and no single element contributes significantly to the G-sum: When there is no single large heavy hitter, it can be shown that G-sum can be approximated w.h.p. by finding heavy hitters on a series of sampled substreams of increasingly smaller size. The exact details are beyond the scope of this chapter [49], but the main intuition comes from tail bounds (Chernoff/Hoeffding). Each substream is defined recursively by the substream before it, and is created by sampling the previous frequency vector, replacing each coordinate of the frequency vector with a zero value with probability 0.5. Repeating this procedure k times reduces the dimensionality of the problem by a factor of 2^k. Then, summing across heavy hitters of all these recursively defined vectors, we create a single "recursive sketch" which gives a good estimate of G-sum [50].

2.3.2 Algorithms for Universal Sketching

Using the intuition from the two cases described above, we now have the following universal sketch construction using an online sketching stage and an offline estimation stage. The proof of the theorems governing the behavior of these algorithms is outside the scope of this chapter and we refer readers to the previous work of Braverman et al [49, 50]. In this section, we focus on providing a conceptual view of the universal sketching primitives. As we will discuss later, the actual data plane and control plane realization will be slightly different to accommodate switch hardware constraints (see §2.5).

In the online stage, as described in Algorithm1, we maintain log(n) par- allel copies of a “L2-heavy hitter” (L2-HH) sketch (e.g., [6]), one for each

25 Figure 2.2: High-level view of universal sketch. substream as described in case 2 above. For the jth parallel instance, the algo- rithm processes each incoming packet 5-tuple and uses an array of j pairwise independent hash functions hi : [n] → {0, 1} to decide whether or not to sample the tuple. When 5-tuple tup arrives in the stream, if for all h1 to hj, hi(tup) = 1, then the tuple is added to Dj, the sampled substream. Then, for substream Dj, we run an instance of L2-HH as shown in Algorithm1, and visualized in Figure 2.2. Each L2-HH instance outputs Qj that contains L2 heavy hitters and their estimated counts from Dj. This creates substreams of decreasing lengths as the j-th instance is expected to have all of the hash functions agree to sample half as often as the (j − 1)-th instance. This data structure is all that is required for the online portion of our approach.

In the offline stage, we use Algorithm 2 to combine the results of the parallel copies of Algorithm 1 to estimate different G-sum functions of interest. This method is based on the Recursive Sum Algorithm from [50]. The input to this algorithm is the output of Algorithm 1, i.e., the set of buckets {Qj} maintained by the L2-HH sketch of each parallel instance j.

³ In this way, we obtain log(n) streams D1, D2, ..., Dlog(n); i.e., for j = 1 ... log(n), the number of unique items in Dj+1 is expected to be half that of Dj.

Algorithm 1 UnivMon Online Sketching Step

Input: Packet stream D(m, n) = {a1, a2, ..., am}
• Generate log(n) pairwise independent hash functions h1, ..., hlog(n) : [n] → {0, 1}.
• Run an L2-HH sketch on D and maintain the heavy hitter set Q0.
• For j = 1 to log(n), in parallel:
  1. When a packet ai in D arrives, if h1(ai) × h2(ai) × ··· × hj(ai) = 1, sample ai and add it to the sampled substream Dj.³
  2. Run an L2-HH sketch on Dj and maintain heavy hitters Qj.

Output: Q = {Q0, ..., Qlog(n)}

Algorithm 2 UnivMon Offline Estimation Algorithm

Input: Set of heavy hitters Q = {Q0, ..., Qlog(n)}
• For j = 0 ... log(n), apply g() to all counters wj(i) in Qj; after this step, the i-th entry in Qj is g(wj(i)).
• Compute Ylog(n) = ∑i g(wlog(n)(i)).
• For each j from log(n) − 1 down to 0, compute:
  Yj = 2 · Yj+1 + ∑i∈Qj (1 − 2 · hj+1(i)) · g(wj(i))

Output: Y0

Let wj(i) be the counter of the i-th bucket (heavy hitter) in Qj, and let hj(i) be the hash of the value of the i-th bucket in Qj, where hj is the hash function described in Step 1 of Algorithm 1. It can be shown that the output of Algorithm 2 is an unbiased estimator of G-sum [49, 50]. In this algorithm, each Y is recursively defined: Yj is computed from the function g applied to each bucket of Qj (the L2-HH sketch for substream Dj), the sum taken over the values of those buckets, and all Yj′ with j′ > j. Note that Qlog(n) is the set of heavy hitters from the sparsest substream Dlog(n) in Algorithm 1, and we begin by computing Ylog(n). Thus, Y0 can be viewed as computing G-sum in parts using these sampled streams.
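The offline combination can be summarized by the short Python sketch below; it assumes (hypothetically) that Q is a list in which Q[j] maps each heavy-hitter key of substream Dj to its estimated count wj(i), and that level_bit is the same 0/1 hash used online (e.g., the helper sketched earlier):

```python
def recursive_gsum(Q, g, level_bit, seed=0):
    """Recursive Sum estimator in the spirit of Algorithm 2:
    Y_j = 2*Y_{j+1} + sum_{i in Q_j} (1 - 2*h_{j+1}(i)) * g(w_j(i)); returns Y_0."""
    top = len(Q) - 1                        # deepest level, log(n)
    Y = sum(g(w) for w in Q[top].values())
    for j in range(top - 1, -1, -1):
        Y = 2 * Y + sum((1 - 2 * level_bit(seed, j + 1, key)) * g(w)
                        for key, w in Q[j].items())
    return Y
```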

The key observation here is that the online component, Algorithm 1, which will run in the UnivMon data plane, is agnostic to the specific choice of g in the offline stage. This is in stark contrast to custom sketches, where the online and offline stages are both tightly coupled to the specific statistic we want to compute.

2.3.3 Application to Network Monitoring

As discussed earlier, if a function G-sum ∈ Stream-PolyLog, then it is amenable to estimation via the universal sketch. Next, we show that a range of network measurement tasks can be formulated via a suitable G-sum ∈ Stream-PolyLog. For the following discussion, we consider network traffic as a stream D(m, n) with m packets and at most n unique flows. When referring to the definitions of heavy hitters, note that L2 heavy hitters are a stronger notion that subsumes L1 heavy hitters.

Heavy Hitters: To detect heavy hitters in the network traffic, our goal is to identify the flows that consume more than a fraction γ of the total capacity [39]. Consider the function g(x) = x, such that the corresponding G-core outputs a list of heavy hitters with a (1 ± ϵ)-approximation of their frequencies. For this case, these heavy hitters are L1 heavy hitters and g(x) is upper bounded by x². Thus we have an algorithm that provides G-core. This is technically a special case of the universal sketch: we never compute a G-sum function and instead use G-core directly.

DDoS Victim Detection: Suppose we want to identify whether a host X is experiencing a Distributed Denial of Service (DDoS) attack. We can do so with sketching by checking if more than k unique flows from different sources are communicating with X [44]. To show that this simple DDoS victim detection problem is solvable by the universal sketch, consider a function g with g(x) = x⁰ for x > 0 and g(0) = 0. Here g is upper bounded by f(x) = x², and sketches already exist to solve this exact problem. Thus, we know G-sum is in Stream-PolyLog and we can approximate G-sum in polylogarithmic space using the universal sketch. In terms of interpreting the results of this measurement, if G-sum is estimated to be larger than k, the specific host is a potential DDoS victim.
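A minimal illustration of this reduction, assuming the per-source flow counts toward X have already been recovered (e.g., from the sketch):

```python
def is_ddos_victim(per_source_counts, k):
    """g(x) = x^0 for x > 0 and g(0) = 0, so G-sum counts the distinct sources
    contacting host X; X is flagged if the estimate exceeds k."""
    g = lambda x: 1 if x > 0 else 0
    return sum(g(c) for c in per_source_counts.values()) > k
```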

Change Detection: Change detection is the process of identifying flows that contribute the most to traffic change over two consecutive time intervals. As this computation takes place in the control plane, we can store the output of the universal sketches from multiple intervals without impacting online performance. Consider two adjacent time intervals tA and tB, and let the volume of a flow x be SA[x] over interval tA and SB[x] over interval tB. The difference signal for x is defined as D[x] = |SA[x] − SB[x]|, and the total difference is D = ∑x∈[n] D[x]. A flow x is defined to be a heavy change iff D[x] ≥ ϕ · D, i.e., its difference signal exceeds a fraction ϕ of the total change over all flows. The task is to identify these heavy change flows. We assume the size of heavy change flows is above some threshold T of the total capacity c. We can show that the heavy change flows are L1 heavy hitters on interval tA (a1 ··· an/2) and interval tB (b1 ··· bn/2), where L1(tA, tB) = ∑i |ai − bi|. The G-sum here is the L1 norm, which belongs to Stream-PolyLog, and G-core can be solved by the universal sketch: G-sum outputs the estimated size of the total change D and G-core outputs the candidate heavy change flows. By comparing the outputs from G-sum and G-core, we can detect and determine the heavy change flows that are above the threshold.
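The control-plane comparison can be sketched as follows, assuming per-flow volume estimates SA and SB have been extracted from the two intervals' sketches (hypothetical inputs for illustration):

```python
def heavy_change_flows(SA, SB, phi):
    """Return flows whose difference signal |SA[x] - SB[x]| is at least
    phi times the total change D over all flows."""
    keys = set(SA) | set(SB)
    diff = {x: abs(SA.get(x, 0) - SB.get(x, 0)) for x in keys}
    total = sum(diff.values())
    return {x: d for x, d in diff.items() if total > 0 and d >= phi * total}
```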

Entropy Estimation: We define entropy as H ≡ − ∑_{i=1}^{n} (fi/m) log(fi/m) [10], with the convention 0 log 0 = 0. The entropy estimation task is to estimate H for source IP addresses (but it could be performed for ports or other features). To compute the entropy, note that H = − ∑_{i=1}^{n} (fi/m) log(fi/m) = log(m) − (1/m) ∑i fi log(fi). As m can be easily obtained,⁴ the difficulty lies in calculating ∑i fi log(fi). Here the function g(x) = x log(x) is bounded by f(x) = x², and thus its G-sum is in Stream-PolyLog and H can be estimated by the universal sketch.

⁴ E.g., a single counter or estimated as a G-sum.
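Concretely, once the universal sketch returns an estimate of ∑i fi log(fi), the entropy follows by simple arithmetic (base-2 logarithm assumed in this illustration):

```python
import math

def entropy_estimate(gsum_f_log_f, m):
    """H = log2(m) - (1/m) * sum_i f_i*log2(f_i), where the second term is the
    G-sum for g(x) = x*log2(x) returned by the universal sketch."""
    return math.log2(m) - gsum_f_log_f / m
```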

Global Iceberg Detection: Consider a system or network that consists of N distributed nodes (e.g., switches). The data set Sj at node j contains a stream of tuples ⟨itemid, c⟩, where itemid is an item identity from a set U = {µ1, ..., µn} and c is an incremental count. For example, an item can be a packet or an origin-destination (OD) flow. We define fri = ∑j ∑⟨µi,c⟩∈Sj c, the frequency of the item µi when aggregated across all the nodes. We want to detect the presence of items whose total frequency across all the nodes adds up to exceed a given threshold T; in other words, we would like to find out if there exists an element µi ∈ U such that fri ≥ T. (In §2.4, we explain how to obtain a network-wide universal sketch. Here, we assume that we maintain an abstract universal sketch across all nodes by correctly combining all distributed sketches.) Consider the function g(x) = x, such that the corresponding G-core outputs a list of global heavy hitters with a (1 ± ϵ)-approximation of their frequencies. For this case, since g-heavy hitters are L1 heavy hitters, we have an algorithm that provides G-core.

2.4 Network-wide UnivMon

In a network-wide context, we have flows from several origin-destination (OD) pairs, and applications may want network-wide estimates over multiple packet header combinations of interest. For instance, some applications may want per-source IP estimates, while others may want characteristics in terms of the entire IP-5-tuple. We refer to these different packet header features and feature-combinations as dimensions.

In this section, we focus on the network-wide monitoring problem of measuring multiple dimensions of interest traversing multiple OD-pairs. Our goal is to provide coverage and fidelity equivalent to a "one big switch" abstraction, i.e., the same level of monitoring precision at the network level as at the switch level. We focus mostly on the case where each OD-pair has a single network route and describe possible extensions to handle multi-pathing.

2.4.1 Problem Scope

We begin by scoping the types of network-wide estimation tasks we can support and formalize the one-big-switch abstraction that we want to provide in UnivMon. To illustrate this, we use the example in Figure 2.3, where we want to measure statistics over two dimensions of interest: 5-tuple and source IP.

Figure 2.3: Example topology to explain the one-big-switch notion and to compare candidate network-wide solutions.

In this example, we have four OD-pairs; suppose the set of traffic flows on each of these is denoted by P11, P12, P21, and P22. We can divide the traffic in the network into four partitions, one per OD-pair. Now, imagine we abstract away the topology and consider the union of the traffic across these partitions flowing through one logical node representing the entire network; i.e., computing some estimation function F(P11 ⊎ P12 ⊎ P21 ⊎ P22), where ⊎ denotes the disjoint set union operation. For this work, we restrict our discussion to network-wide functions where we can independently compute the F estimates on each OD-pair substream and add them up. In other words, we restrict our problem scope to estimation

functions F such that:

F(P11 ⊎ P12 ⊎ P21 ⊎ P22) = F(P11) + F(P12) + F(P21) + F(P22)

Note that this still captures a broad class of network-wide tasks such as those mentioned in Section 2.3.3; one such example measurement is finding heavy hitters for destination IP addresses. An important characteristic of the UnivMon approach is that in the network-wide setting the outputs of the sketches in the data plane can be added together in the control plane to give the same results as if all of the packets had passed through one switch. The ability to combine the separate sketches is a property of the universal sketch primitive used in the data plane and is independent of the final statistic monitored in the control plane, allowing the combination to work for all measurements supported by UnivMon. We do, however, acknowledge that some tasks fall outside the scope of this partition model; an example statistic that is out of scope is measuring the statistical independence of source and destination IP address pairs (i.e., whether a source IP is likely to appear with a given destination IP), as this introduces cross-OD-pair dependencies. We leave extensions to support more general network-wide functions for future work (see §6).

The challenge here is to achieve correctness and efficiency (e.g., switch memory, controller overhead) while also balancing the load across the data plane elements. Informally, we seek to minimize the total number of sketches instantiated in the network and the maximum number of sketches that any single node needs to maintain.

2.4.2 Strawman Solutions and Limitations

Next, we discuss strawman strategies and argue why these fail to meet one or more of our goals w.r.t. correctness, efficiency, and load balancing. We observe that we can combine the underlying sketch primitives at different switches as long as we use the same random seeds for our sketches, as the counters are additive at each level of the UnivMon sketch. With this, we only need to add the guarantee that we count each packet once to assure correctness. In terms of resource usage, our goal is to minimize the number of sketches used.
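To make the counter additivity concrete, the following minimal Python sketch (an illustration; the actual system operates on P4 register arrays) merges two counter arrays built with the same dimensions and hash seeds by element-wise addition:

```python
def merge_sketch_counters(counters_a, counters_b):
    """Element-wise sum of two counter matrices from sketches configured with
    identical sizes and hash seeds; the result is as if one sketch had seen
    the union of the two packet streams."""
    assert len(counters_a) == len(counters_b)
    return [[a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(counters_a, counters_b)]
```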

Redundant Monitoring (RM): Suppose that for each of the k dimensions of interest, we maintain a sketch on every node, with each node independently processing traffic for the OD-pairs whose paths it lies on. Now, we have the issue of combining sketches to get an accurate network-wide estimate. In particular, adding all of the counters from the sketches would be incorrect, as packets would be counted multiple times. In the example topology, to correctly count packets we would need to either only use the sketches at A or B, or, conversely, combine the sketches for source IP at O1 and O2 or at D1 and D2. In terms of efficiency, this RM strategy maintains a sketch for all k dimensions at each node, and thus we maintain a total of kN sketches across N nodes. Our goal is to maintain s total sketches, where s ≪ kN.

Ingress Monitoring (IM): An improvement over the RM method is to have only ingress nodes maintain every sketch. Thus, for each OD pair, all sketch information is maintained at a single node. By not having duplicate sketches per OD pair, we will not double count and can therefore combine sketches together. This gives us the correctness guarantee missing in RM. In Figure 2.3, IM would maintain sketches at O1 and O2. However, for Ni ingress nodes, we would run kNi sketches, and if Ni ≈ N we spend a similar amount of resources to RM, which is still high. Additionally, these sketches would all be present on a small number of nodes, while other nodes with available compute resources would not run any sketches.

Greedy Divide and Conquer (GDC): To overcome the concentration of sketches in IM above, one potential solution is to evenly divide sketch processing duties across the path. Specifically, each node has a priority list of sketches and tags packets with the sketches that are already maintained for this flow, so that downstream nodes know which remaining sketches to run. For instance, in Figure 2.3, GDC would maintain the source IP sketch at O1 and O2, and the 5-tuple sketch at A. This method is correct, as each sketch for each OD pair is maintained once. However, it is difficult to properly balance resources, as nodes at the intersection of multiple paths could be burdened with higher load.

Reactive Query and Sketch (QS): An alternative approach is to use the controller to ensure better sketch assignment. For instance, whenever a new flow is detected at a node, we query the controller to optimally assign sketches. In Figure 2.3, the controller would optimally put the source IP sketch at A and the 5-tuple sketch at B (or vice versa). With this method, we can be assured of correctness. However, the reactive nature means that QS generates many requests to the controller.

2.4.3 Our Approach

Next, we present our solution, which uses the UnivMon controller to coordinate switches to guarantee correctness and efficiency without incurring the reactive query load of the QS strategy described above.

Periodically, the UnivMon controller gives each switch a sketching manifest. For each switch A and for each OD-pair route that A lies on, the manifest specifies the dimensions for which A needs to maintain a sketch. When a packet arrives at a node, the node uses the manifest to determine the set of sketching actions to apply. When the controller needs to compute a network-wide estimate, we pull sketches from all nodes and, for each dimension, combine the sketches across the network for that dimension. This method minimizes communication to the control plane while still making use of the controller's ability to optimize resource use.

The controller solves a simple constrained optimization problem that we discuss below. Note that maintaining two sketches uses much more memory than adding twice as many elements to one sketch. Thus, a key part of this optimization is to ensure that we try to reuse the same sketch for a given dimension across multiple OD pairs. In Figure 2.3, we would first assign A the source IP sketch, then B the 5-tuple sketch for the OD pair (O1, D1). When choosing where to place the sketches for the OD pair (O2, D2), the algorithm matches the manifests such that the manifest for (O2, D2) uses the source IP sketch already at A and the 5-tuple sketch already at B.

We formulate the controller’s decision to place sketches as an integer linear program (ILP) shown in Figure 2.4. Let sjk be a binary decision variable

Minimize: N × Maxload + Sumload, subject to

∀i, k : ∑j∈pi sjk ≥ 1    (2.4.1)

∀j : Loadj = ∑k rk × sjk    (2.4.2)

∀j : ∑k sjk × rk ≤ R    (2.4.3)

∀j : Maxload ≥ Loadj    (2.4.4)

Sumload = ∑j Loadj    (2.4.5)

Figure 2.4: ILP to compute sketching manifests.

denoting whether switch j maintains a sketch for dimension k. The goal of the optimization is to ensure that every OD-pair is suitably "covered" and that the load across the switches is balanced. Let rk be the amount of memory for a sketch for dimension k, and let R denote the maximum amount of memory available on a single node. Note that the amount of memory for a sketch can be chosen in advance based on the accuracy required. As a simple starting point, we focus primarily on memory resource consumption, assuming that all UnivMon operations can be done at line rate; we can extend this formulation to incorporate processing load as well.
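For illustration, the ILP of Figure 2.4 can be written with an off-the-shelf modeling library; the sketch below uses Python's PuLP package (an assumption for this example; the dissertation's evaluation uses glpsol) with hypothetical input dictionaries for paths, per-dimension sketch sizes rk, and the per-switch budget R, and it omits the post-processing that removes duplicate sketches on a path.

```python
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary

def sketch_manifests(paths, dims, r, R, N):
    """paths: {od_pair: [switch, ...]}; dims: dimensions k; r: {k: sketch memory};
    R: per-switch memory budget; N: number of switches (normalizes Maxload)."""
    switches = sorted({j for p in paths.values() for j in p})
    prob = LpProblem("sketch_placement", LpMinimize)
    s = {(j, k): LpVariable(f"s_{j}_{k}", cat=LpBinary) for j in switches for k in dims}
    load = {j: lpSum(r[k] * s[(j, k)] for k in dims) for j in switches}
    maxload = LpVariable("Maxload", lowBound=0)
    prob += N * maxload + lpSum(load[j] for j in switches)      # objective
    for path in paths.values():                                 # (2.4.1) coverage
        for k in dims:
            prob += lpSum(s[(j, k)] for j in path) >= 1
    for j in switches:
        prob += load[j] <= R                                    # (2.4.3) capacity
        prob += maxload >= load[j]                              # (2.4.4)
    prob.solve()
    return {(j, k): int(s[(j, k)].value()) for j in switches for k in dims}
```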

Eq (2.4.1) captures our coverage constraint that we maintain each sketch once for each OD-pair.5 We model the per-node load in Eq (2.4.2) and ensure that it respects the router capacity in Eq (2.4.3). Our objective function balances

⁵ Our coverage constraint allows multiple sketches of the same kind to be placed on the same path. This is because in some topologies it may not be feasible to have an equality constraint. In this case, the controller post-processes the solution and removes duplicates before assigning sketches for a given OD pair.

two components: the maximum load that any one node faces and the total number of sketches maintained.⁶

Figure 2.5: Example topology to showcase the difficulty of multi-path.

2.4.4 Extension to Multi-path

Adapting the above technique to multi-path requires some modification, but is feasible. For simplicity, we still assume that packets are routed deterministically (e.g., by prefix rules), but may have multiple routes. We defer settings that depend on randomized or non-deterministic routing to future work.

Even in this deterministic setting, there are two potential problems. First, ensuring that packets are counted only once is important to avoid false positives, as in the single-path case. Second, if the packets with a heavy feature (e.g., a heavy destination address) are divided over many routes, it becomes harder to accurately find heavy hitters, to remove false positives, and to prevent false negatives.

⁶ The N term for Maxload helps to normalize the values.

The first issue, guaranteeing packets are counted only once, is solvable by the ILP presented above. For each path used by an OD pair, we create a unique sub-pair, which we treat as an independent OD pair. This is shown in Figure 2.5 by the red O1-D1 path and the blue O1-D1 path. By computing the ILP with multiple paths per OD pair as needed, sketches are distributed across nodes and single counting is guaranteed. This method works best when the total number of paths per OD pair is constant relative to the total number of nodes; larger numbers of paths will cause the sketches to concentrate on the source or destination nodes, possibly requiring additional solutions.

The second issue occurs when multi-path routing causes the frequency of an item to be split between too many sketches. In the single-path setting, if an OD pair has a globally heavy feature, then it will be equally heavy or heavier in the sketch where it is processed. However, in the multi-path case, it is possible for some OD pairs to have more paths than others, and thus it becomes possible for items that are less frequent but have fewer routes to be incorrectly reported as heavy and, in turn, for true heavy elements to go unreported in the control plane. This problem is shown in Figure 2.5. In this case, we have 10,000 packets from node O1 to D1 split across two paths, and 6,000 packets from O2 to D2. For simplicity, assume we are only looking for the "heaviest" source IP, the source IP with the highest frequency, and that the nodes have a single IP address (i.e., packets go from IP_O1 to IP_D1 and from IP_O2 to IP_D2). For this metric, the sketch at A will report IP_O1 as a heavy source address with count 5,000, and B will report IP_O2 as a heavy source address with count 6,000. When these values are compared, the algorithm would return IP_O2, a false positive, and miss IP_O1, a false negative. To solve this issue, instead of sending the heavy hitter report from individual sketches as described in Algorithm 1, the counters from each sketch must be sent directly to the control plane to be added, reconstructing the entire sketch and allowing the correct heavy hitters to be identified. In our example, the counters for O1 at A and B would be added, and IP_O1 would be correctly identified as the heavy hitter. This approach is already used in the P4 implementation discussed below, but is not a requirement of UnivMon in general. We note that when using the method described in Section 2.5.2, identifying the true IP address of the heavy item is harder in the multi-path setting, but this is solved by increasing γ relative to the maximum number of sketches per true OD pair, which is naturally limited by the ILP. With these modifications, the heavy hitters are correctly found from the combined sketch, and the one-big-switch abstraction is maintained in a multi-path setting.

2.5 UnivMon Implementation

In this section, we discuss our data plane implementation in P4 [55, 65]. We begin by giving an overview of key design tradeoffs we considered. Then, we describe how we map UnivMon into corresponding P4 constructs.

2.5.1 Implementation overview

At a high level, realizing the UnivMon design described in the previous sections entails four key stages:


Figure 2.6: An illustration of UnivMon's stages along with the two main implementation options.

1. A sample stage, which decides whether an incoming packet will be added to a specific substream.

2. A sketching stage, which calculates sketch counters from input substreams and populates the respective sketch counter arrays.

3. A top-k computation stage, which identifies (approximately) the k heaviest elements of the input stream.

4. An estimation stage, which collects the heavy element frequencies and calculates the desired metrics.

Let us now map these stages to our data and control plane modules from Figure 2.1. Our delayed binding principle implies that the estimation stage maps to the UnivMon control plane. Since the sample and sketching stages process packets, they naturally belong in the data plane to avoid control plane overhead.

One remaining question is whether the top-k computation stage is in the data or control plane (Figure 2.6). Placing the top-k stage in the data plane has two advantages. First, the communication cost between the data and control plane will be low, as only the top-k rather than raw counters need to be transferred. Second, the data plane only needs to keep track of the flowkeys

(e.g., source IP) of the k heaviest elements at any given point in time, and thus does not incur high memory costs. However, one stumbling block is that realizing this stage requires (i) sorting counter values and (ii) storing information about the heavy elements in some form of a priority queue. Unfortunately, these primitives may be hard to implement in hardware and are not yet supported in P4. Thus, we make a pragmatic choice to split the top-k stage between the control and the data planes. We identify the top-k heavy flowkeys in the data plane, and then we use the raw data counters to calculate their frequencies in the control plane. The consequence is that we incur higher communication overhead to report the raw counter data structure, but the number of flowkeys stored in the data plane remains low.

UnivMon’s raw counters and flowkeys are stored on the target’s on-chip memory (TCAM and SRAM). We argue that in practice the storage overhead of UnivMon is manageable even for hardware targets with limited SRAM [69, 70, 44]. We show that for the largest traces that we evaluate and without losing accuracy, the total size of the raw counters can be less than 600 KB

whereas the cost of storing flowkeys (assuming k ≤ 20) is only a few KBs per measurement epoch. Thus, this decision to split the top-k computation between the two planes is practical and simplifies the data plane requirements.

2.5.2 Mapping UnivMon data plane to P4

Based on the above discussion, the UnivMon data plane implements the sample, sketching, and "heavy" flowkey storage stages in P4. In a P4 program, packet processing is implemented through Match+Action tables, and the control flow of the program dictates the order in which these tables are applied to incoming packets. Given the sketching manifests from the control plane, we generate a control program that defines the different pipelines that a packet needs to be assigned to. These pipelines are specific to the dimension(s) (i.e., source IP, 5-tuple) for which the switch needs to maintain a universal sketch. We begin by explaining how we implemented these functions and then describe a sample control flow.

Sample: P4 enables programmable calculations on specific header fields using user-defined functions. We use this to sample incoming packets, with a configurable flowkey that can be any subset of the 5-tuple (srcIP, dstIP, srcPort, dstPort, protocol). We define l pairwise-independent hash functions, where l is the number of levels from §2.3. These functions take as input the flowkey and output a binary value. We store this output bit as packet metadata. A packet is sampled at level i if the outputs of the hash functions of all levels ≤ i are equal to 1. We implement sampling for each level as a table that matches all packets and whose action is to apply the sampling hash function of that level. The hash metadata in the packets is used in conditional statements in the control flow to append the packet to the first i substreams. Packets that are not sampled are not subject to further UnivMon processing.⁷

Sketching: The sketching stage is responsible for maintaining counters for each one of the l substreams. From these sketch counters, we can estimate the L2-HH for each stage and then the overall top-k heavy hitters and their counts.

While UnivMon does not impose any constraints on the L2-HH algorithm to be used, in our P4 implementation we use Count Sketch [6]. The sketching computation for each level is implemented as a table that matches every packet belonging to that level's substream and whose actions update the counters stored in the sketch counter arrays. Similar to the sample stage, we leverage user-defined hash functions that take as input the same flowkey as in the sample stage. We use their output to retrieve the indexes of the sketch register array cells that correspond to a particular packet and update their values as dictated by the Count Sketch algorithm.
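For reference, the Count Sketch update/estimate logic that these register arrays implement can be summarized as follows; this is a software illustration (Python's built-in hash stands in for the per-row index and ±1 sign hashes), not the P4 code itself:

```python
class CountSketch:
    def __init__(self, t, w, seed=0):
        self.t, self.w = t, w
        self.counters = [[0] * w for _ in range(t)]
        self.salts = [(hash((seed, row, "idx")), hash((seed, row, "sign"))) for row in range(t)]

    def _bucket(self, row, key):
        idx_salt, sign_salt = self.salts[row]
        idx = hash((idx_salt, key)) % self.w
        sign = 1 if hash((sign_salt, key)) & 1 else -1
        return idx, sign

    def update(self, key, count=1):
        for row in range(self.t):
            idx, sign = self._bucket(row, key)
            self.counters[row][idx] += sign * count

    def estimate(self, key):
        # Median of the signed counters across the t rows.
        vals = sorted(sign * self.counters[row][idx]
                      for row in range(self.t)
                      for idx, sign in [self._bucket(row, key)])
        mid = len(vals) // 2
        return vals[mid] if len(vals) % 2 else (vals[mid - 1] + vals[mid]) // 2
```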

P4 provides a register abstraction, which offers a form of stateful memory that can store user-defined data and can be arranged into one-dimensional arrays of user-defined length. Register cells can be read or written by P4 action statements and are also accessible through the control plane API. Given that our goal is to store sketch counter values, which do not represent byte or packet counts, we use register arrays to store and update sketch counters. The size of the array and the bitlength of each array cell are user-defined and can be varied based on the required memory-accuracy tradeoff as well as on the

⁷ There may be other routing/ACL actions to be applied to the packet, but this is outside our scope.

available on-chip memory of the hardware target. Each sketch is an array of t rows and w columns. We instantiate register arrays of length t × w, and the bitlength of each cell is based on the maximum expected value of a counter.

The one remaining issue is storing flowkeys corresponding to the “heavy” elements since these will be needed by the estimation stage running in the control plane. One option is to use a priority queue to maintain the top k heavy hitters online, as it is probably the most efficient and accurate choice to maintain heavy flowkeys. However, this can incur more than constant update time for each element, which makes it difficult to implement on hardware switches. To address the issue, we use an alternative approach which is to maintain a fixed sized table of heavy keys and use constant time updates for each operation. It is practical and acceptable when the size of the table is small (e.g., 10-50) and the actual number of heavy flows doesn’t greatly exceed this size. The lookup/update operations could be very fast (in a single clock cycle) when leveraging some special types of memory (e.g., TCAM) on hardware switches.

Another scheme we use is as follows, and we leave improved sketches for finding heavy flowkeys as future work. For γ-threshold heavy hitters, there are at most 1/γ of them. While packets are being processed, we maintain an up-to-date L2 value (of the frequency vector), specifically L2 = (L2² + (ci + 1)² − ci²)^(1/2), where ci is each flow's current count, and we create log(1/γ) buckets of size k. In the online stage, when updating the counters in L2-HH, ci is obtained by reading the current sketch counters.

We then maintain buckets marked with L2/2, L2/4, ... , γL2. For each

element that arrives, if its counter is greater than L2/2, we insert it into the L2/2 bucket using a simple hash; otherwise, if its counter is greater than L2/4, we insert it into the L2/4 bucket, and so forth. When the value of L2 doubles, we delete the last γL2 bucket and add a new L2/2 bucket. This scheme ensures that O(k log(1/γ)) flowkeys are stored, and at the end of the stream we can return most of the top-k items heavier than γL2.
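A small Python sketch of this bookkeeping, under the assumption that the flow's current count ci is read from the sketch counters as described above (illustrative only):

```python
import math

def update_running_l2(l2, ci):
    """L2_new = sqrt(L2^2 + (ci + 1)^2 - ci^2) when a flow with count ci gets one more packet."""
    return math.sqrt(l2 * l2 + (ci + 1) ** 2 - ci ** 2)

def bucket_index(counter, l2, gamma):
    """Index b of the L2/2^(b+1) bucket the flowkey falls into, or None if its
    counter is below roughly gamma * L2; there are about log2(1/gamma) buckets."""
    num_buckets = max(1, int(math.ceil(math.log2(1.0 / gamma))))
    for b in range(num_buckets):
        if counter > l2 / (2 ** (b + 1)):
            return b
    return None
```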

P4 Control Flow: As a simple starting point, we use a sequential control flow to avoid cloning every incoming packet l (i.e., number of levels) times. This means that every packet is processed by a sketching, a storage and a sampling table sequentially until the first level where it doesn’t get sampled. More specifically, after a packet passes the parsing stage during which P4 extracts its header fields, it is first processed by the sketching table of level_0. The “heavy” keys for that stage are updated and then it is processed by the sampling table of level_1. If the packet gets sampled at level_1, it is sketched at this level, the “heavy” keys are updated and the procedure continues until the packet reaches the last level or until it is not sampled.

2.5.3 Control plane

We implement the UnivMon control plane as a set of custom C++ modules and libraries. We implement modules for (1) assigning sketching responsibilities to the network elements and (2) implementing the top-k and estimation stages. The P4 framework allows us to define the API for control–data plane communication. We currently use a simple RPC protocol that allows us to import sketching manifests and to query the contents of data plane register arrays.

After the heavy flowkeys and their respective counters have been collected, the frequencies of the k most frequent elements in the stream are extracted. The heavy elements, along with the statistical function of the metric to be estimated, are then fed to the recursive algorithm of UnivMon's estimation stage.

2.6 Evaluation

We divide our evaluation into two parts. First, we focus on a single router setup and compare UnivMon vs. custom sketches via OpenSketch [44]. Second, we demonstrate the benefits of our network-wide coordination mechanisms.

2.6.1 Methodology

We begin by describing our trace-driven evaluation setup.

Applications and error metrics: We have currently implemented translation libraries for five monitoring tasks: Heavy Hitter detection (HH), DDoS detection (DDoS), Change Detection (Change), Entropy Estimation (Entropy), and Global Iceberg Detection (Iceberg). For brevity, we only show results for metrics computed over one feature, namely the source IP address; our results are qualitatively similar for other dimensions too.

For Heavy Hitters and Global Iceberg detection, we set a threshold T = 0.05% of the link capacity and identify all large flows that consume more traffic than that threshold. We obtain the average relative error on the counts of each identified large flow, i.e., |True − Estimate| / True. For Change Detection, we identify flows whose frequency has changed by more than a threshold ϕ of the total change over all flows across two monitoring windows; we chose this threshold to be 0.05% and calculate the average relative error similarly to HH. For Entropy Estimation and DDoS, we evaluate the relative error of the estimated entropy value and of the number of distinct source IPs.

Table 2.1: CAIDA traces in the evaluation.

Trace          Loc               Date and Time
1. CAIDA'15    Equinix-Chicago   2015/02/19
2. CAIDA'15    Equinix-Chicago   2015/05/21
3. CAIDA'15    Equinix-Chicago   2015/09/17
4. CAIDA'15    Equinix-Chicago   2015/12/17
5. CAIDA'14    Equinix-Sanjose   2014/06/19

Configuration: We normalize UnivMon's memory usage with the custom sketches by varying three key parameters: the number of rows t and the number of columns w in the Count-Sketch tables, and the number of levels l in the universal sketch. In total, UnivMon uses t × w × l counters. In OpenSketch, we configure the memory usage in a similar way by varying the number of rows t and counters per row w in all the sketches it uses. When comparing the memory usage with OpenSketch, we calculate the total number of sketch counters assuming that each integer counter occupies 4 bytes. Both UnivMon and OpenSketch use randomized algorithms; we run each experiment 10 times with random hash seeds and report the median across these runs.

Traces: For this evaluation, we use five different one-hour backbone traces (Table 2.1) collected at backbone links of a Tier1 ISP between (i) Chicago,

IL and Seattle, WA in 2015 and (ii) between San Jose and Los Angeles in 2014 [57, 58]. We split the traces into different representative time intervals (5s,

30s, 1min, 5min). For example, each one hour trace contains 720 5s-epoch data points and we report min, 25%, median, 75%, and max on whisker bars. By default, we report results for a 5-second trace. Each 5s packet-trace contains

155k to 286k packets with ∼55k distinct source IP addresses and ∼40k distinct destination IP addresses. The link speed of these traces is 10 Gbps.

Experiment Setup: For our P4 implementation prototype, we used the P4 behavioral simulator, which is essentially a P4-enabled software switch [71]. To validate the correctness of our P4 implementation, we compare it against a software implementation of the data plane and control plane algorithms, written in C++. We evaluate the P4 prototype on Trace 1 and run the software implementation in parallel on Traces 1–5. The two implementations are consistent: the relative error between their results does not exceed 0.3%. To evaluate OpenSketch, we use its simulator written in C++ [72].

2.6.2 Single Router Evaluation

Comparison under fixed memory setting: First, we compare UnivMon and OpenSketch on the applications that OpenSketch supports: HH, Change, and

DDoS. In Figures 2.7(a) and 2.7(b), we assign 600KB of memory and use all traces in order to estimate the error when running UnivMon and OpenSketch. We find that the absolute error is very small for both approaches. We observe that OpenSketch provides slightly better results for all three metrics. However, we note that UnivMon uses 600KB of memory to run three tasks concurrently while OpenSketch is given 600KB to run each task. Figures 2.7(a) and 2.7(b) confirm that this observation holds on multiple traces; the error gap between UnivMon and OpenSketch is ≤3.6%.

Figure 2.7: Error rates of HH, Change and DDoS for UnivMon and OpenSketch.

Figure 2.8: Error vs. Memory for HH, DDoS, Change.

Accuracy vs. Memory: The previous result considered a fixed memory value. Next, we study the sensitivity of the error to the memory available.

Figures 2.8(a) and 2.8(b) show that the error is already quite small for the HH and DDoS applications and that the gap is almost negligible with slightly increased memory (≥ 1MB).

Figure 2.8(c) shows the results for the Change Detection task. For this task,

the original OpenSketch paper uses a streaming algorithm based on reversible k-ary sketches [9]. We implement an extension to OpenSketch using a similar idea as UnivMon.⁸ Our evaluation results show that our extension offers a better accuracy vs. memory tradeoff than OpenSketch's original method [9]. For completeness, we also report the memory usage of OpenSketch's original design (using the k-ary sketch). From Figure 2.8(c), we see that UnivMon provides comparable accuracy even though UnivMon has a much smaller sketch table on each level of its hierarchical structure. This is because the "diff" across sketches is well preserved in UnivMon's structure.

Figure 2.9: Average memory usage to achieve a 1% error rate for different time intervals.

Fixed Target Errors: Next, we evaluate the memory needed to achieve the same error rates (≤1%). In Figures 2.9(a) and 2.9(b), as we vary the monitoring window, we can see that only a small increase in memory is required for both UnivMon and OpenSketch to achieve 1% error rates. In fact, we find that UnivMon does not require more memory to maintain a stable error rate for an increased number of flows in the traffic.

⁸ We maintain two recent Count-Min sketches using the same hash functions, combine the two sketches by having one sketch "subtract" the other, and use a reversible sketch to trace back the keys.

This is largely because sketch-based approaches usually require only a logarithmic memory increase in terms of input size to maintain similar error guarantees. Furthermore, the nature of the traffic distribution also helps, as there are only a few very heavy flows and the rest of the distribution is quite "flat".

Figure 2.10: Error rates of Entropy and F2 estimation.

Other metrics: We also consider metrics not in the OpenSketch library in Figure 2.10 to confirm that UnivMon is able to calculate low-error estimates. Specifically, we consider the entropy of the distribution and the second frequency moment F2 = f1² + f2² + ··· + fm² for m distinct elements.⁹ Again, we find that with reasonable amounts of memory (≥ 500KB) the error of UnivMon is very low.

Impact of Application Portfolio: Next, we explore how UnivMon and

OpenSketch handle a growing portfolio of monitoring tasks with a fixed amount of memory. We set the switch memory to 600KB for both UnivMon and OpenSketch and run three different application sets: AppSet1={HH}, AppSet2

⁹ This is a measure of the "skewness" and is useful for calculating the repeat rate or Gini index of homogeneity.


Figure 2.11: The impact of a growing portfolio of monitoring applications on the relative performance.

={HH,DDoS}, and AppSet3={HH,DDoS,Change}. We assume that OpenSketch divides the memory uniformly across the constituent applications; i.e., in AppSet1, 600KB is devoted to HH, but in AppSet2 and AppSet3, HH only gets 300KB and 200KB, respectively. Figure 2.11 shows the "error gap" between UnivMon and OpenSketch (UnivMon − OpenSketch); i.e., positive values imply UnivMon is worse and vice versa. As expected, we find that when running concurrent tasks, the error gap decreases as each task gets less memory in OpenSketch. That is, with more concurrent and supported tasks, UnivMon can still provide guaranteed results on each of the applications.

Choice of Data Structures: UnivMon uses a sketching algorithm that identifies L2 heavy hitters as a building block. Two natural questions arise: (1) how do different heavy hitter algorithms compare, and (2) can we use other popular heavy hitter identifiers, such as the Count-Min sketch? We implemented and tested the Pick-and-Drop algorithm [41] and the Count-Min sketch [61] as building blocks for UnivMon. Figure 2.12 shows that Pick-and-Drop and the Count-Min

sketch lose the generality of UnivMon, as they can provide accurate results only for the HH and Change tasks. Intuitively, this is because only Lp (p = 1 or p ≥ 3) heavy hitters are identified. The technical analysis of the universal sketch shows that only L2 heavy hitters contribute significantly to G-sum when G-sum is upper bounded by some L2 norm. As discussed in Section 2.3.3, the G-sum functions corresponding to HH and Change are actually L1 norms. Therefore, the estimated L1 heavy hitters output by Count-Min or Pick-and-Drop work well for the HH and Change tasks, but not for Entropy or DDoS: when combining heavy hitter counters in the recursive step of the calculation, we simply miss too many significant heavy elements for those tasks.

Figure 2.12: Analyzing different HH data structures.

Processing Overhead: One concern might be the computational cost of the UnivMon vs. custom sketch primitives. We used the Intel Performance Counter Monitor [73] to evaluate compute overhead (e.g., Total cycles on

CPU) on UnivMon and OpenSketch’s software simulation libraries. For any

given task, our software implementation was only 15% more expensive than OpenSketch. When we look at all three applications together, however, UnivMon takes only half the compute cycles used by OpenSketch in total. While we acknowledge that we cannot directly translate this into actual hardware processing overheads, it suggests that UnivMon's compute footprint will be comparable and possibly better.

Figure 2.13: Network-wide evaluation on major ISP backbone topologies: (a) error rates of global iceberg detection; (b) average memory consumption; (c) total number of requests to the controller.

2.6.3 Network-wide Evaluation

For the network-wide evaluation, we consider different topologies from the Topology Zoo dataset [74]. As a specific network-wide task, we consider the problem of estimating source IP and destination IP “icebergs”. We report the average relative errors across these two tasks.

Benefits of Coordination: Figures 2.13(a), 2.13(b), and 2.13(c) present the error, average memory consumption, and total controller requests of four solutions: Ingress Monitoring (IM), Greedy Divide and Conquer (GDC), Query and Sketch (QS), and our approach (UnivMon). We pick three representative topologies: AT&T North America, Geant, and Bell South. We see that UnivMon provides an even distribution of resources on each node while providing results with high accuracy. Furthermore, the control overhead is several orders of magnitude smaller than purely reactive approaches.

Table 2.2: Time to compute sketching manifests using the ILP.

Topology        OD Pairs   Dim.   Time (s)   Total Sketches
Geant2012       1560       4      0.09       68
Bellsouth       2550       4      0.10       60
Dial Telecom    18906      4      2.8        252
Geant2012       1560       8      0.22       136
Bellsouth       2550       8      0.28       120
Dial Telecom    18906      8      12.6       504

ILP solving time: One potential concern is the time to solve the ILP. Table 2.2 shows the time to compute the ILP solution on a Macbook Pro with a 2.5

GHz Intel Core i7 processor using glpsol, allowing at most k sketches per switch, where k is the number of dimensions maintained. We see that the ILP computation takes at most a few seconds, which suggests that updates can be pushed to switches with reasonable responsiveness as the topology or routing policy changes.

2.6.4 Summary of Main Findings

Our analysis of UnivMon’s performance shows that:

1. For a single router with 600KB of memory, we observe comparable median error rates between UnivMon and OpenSketch, with a relative error gap ≤ 3.6%. This gap decreases significantly with a growing application portfolio.

2. When comparing sensitivity to error and available memory, we observe

that UnivMon provides accuracy comparable to OpenSketch with similar or smaller memory requirements.

3. The network-wide evaluation shows that UnivMon provides an even

distribution of resources on each node while providing results with high accuracy.

2.7 Chapter Summary

In contrast to the status quo in flow monitoring, which can offer generality or fidelity but not both simultaneously, UnivMon offers a dramatically different design point by leveraging recent theoretical advances in universal streaming. By delaying the binding of data plane primitives to specific (and unforeseen) monitoring tasks, UnivMon provides a truly software-defined monitoring approach that can fundamentally change network monitoring. We believe that this "minimality" of the UnivMon design will naturally motivate hardware vendors to invest time and resources to develop optimized hardware implementations, in the same way that a minimal data plane was key to getting vendor buy-in for SDN [75].

Our work in this chapter takes UnivMon beyond just a theoretical curiosity and demonstrates a viable path toward a switch implementation and a network-wide monitoring abstraction. We also demonstrate that UnivMon is already very competitive w.r.t. custom solutions and that the trajectory (i.e., as the number of measurement tasks grows) is clearly biased in favor of UnivMon vs. custom solutions.

UnivMon already represents a substantial improvement over the status quo. That said, we identify several avenues for future work to further push the envelope. First, in terms of the data plane, while the feasibility of mapping UnivMon to P4 is promising and suggests a natural hardware mapping, we would like to further demonstrate an actual hardware implementation on both P4-like and other flow processing platforms. Second, in terms of the one-big-switch abstraction, we need to extend our coordination and sketching primitives to capture other classes of network-wide tasks that entail cross-OD-pair dependencies. Third, while the ILP is quite scalable for many reasonably sized topologies, we may need other approximation algorithms (e.g., via randomized rounding) to handle even larger topologies. Fourth, in terms of the various dimensions of interest to track, we currently maintain independent sketches; a natural question is whether we can avoid explicitly creating a sketch per dimension. Finally, while being application agnostic gives tremendous power, it might be useful to consider additional tailoring where operators may want the ability to adjust the granularity of the measurement to dynamically focus on sub-regions of interest [76].

Chapter 3

NitroSketch: Robust Sketch-based Monitoring in Software Switches

Traffic measurements are at the core of advanced network algorithms such as traffic engineering, fairness, load balancing, quality of service and intrusion detection [77, 78, 79, 80, 81, 82, 83]. While monitoring on dedicated switching hardware continues to be important, measurement capabilities are increasingly deployed inside software switches with the transition to virtualized deployments and "white-box" capabilities (e.g., Open vSwitch [84], Microsoft

Hyper-V [85], Cisco Nexus 1000V [86], and FD.io VPP [87]).

Naturally, we want these measurement capabilities to run at high line rates and yet have a small resource footprint to avoid constraining the main switching functions and services that run atop the switch. In this respect, sketching algorithms are a promising approach for various metrics of interest such as per-flow frequency estimation [6, 4], Heavy Hitters [88, 9, 89], Hierarchical

Heavy Hitters [90], Distinct flows [26], Frequency moments [91] and Change detection [26].

However, the packet processing performance of sketching algorithms on software switches is far from ideal [92, 93, 94]. This is not surprising, as sketches are often optimized for (asymptotically) low memory requirements, which is not the key bottleneck in software implementations.

Motivated by this, recent efforts seek to tackle the performance issues of sketching algorithms in software, including SketchVisor [94], hashtable-based monitoring [92, 93], and Randomized-HHH [90], but these efforts sacrifice generality or rigor on one or more dimensions; i.e., they either make strong assumptions about the traffic patterns at high load (e.g., SketchVisor relies on the skew of the workload to achieve high accuracy and speedup, and cannot meet 10G line-rate under min-sized packets), or lose the theoretical guarantees (e.g., the hashtable-based approach uses extensive memory to achieve high accuracy, and fails to achieve 10G line-rate when the workload is not skewed), or only apply to a specific measurement task (e.g., R-HHH achieves speedup only when measuring hierarchical heavy hitters).

This chapter is motivated by a simple question: can we rigorously improve the performance of sketches in software switches in general settings; i.e., (a) without compromising the worst-case theoretical guarantees; (b) without making assumptions about the traffic distributions; and (c) in a way that benefits a large number of sketching use cases and software platforms? To this end, we revisit the problem from first principles and systematically profile sketch implementations to identify the key performance bottlenecks. Basically, the sketch data structure is a (2D) array of counters. We find that existing sketches compute multiple hash functions while processing each packet (computationally intensive) and exhibit random memory access patterns (making inefficient use of the cache if it is smaller than the sketch).

Our insight here is to reformulate the sketching problem to be optimized for software implementations. Specifically, we design sketching algorithms that require slightly more space and convergence time than the theoretical lower bounds, but run significantly faster in software. Intuitively, we allow sketches to process enough packets before fetching statistics from them, and we call this period “convergence”.

Based on this reformulation, we present NitroSketch, a framework to optimize the packet processing speed of sketches. The key idea is that we want the sketching algorithms to conduct fewer hash computations and counter updates while maintaining the same accuracy. We achieve this by combining the sketch implementation with a sampling front-end. The front-end reduces the number of packets processed with a sampling probability of p. However, the memory increase needed to get comparable accuracy can be high if we naively use uniform sampling (section 3.4.2). Instead, we draw geometric samples¹ to decide the index of the next counter update to the sketch (and thus determine how many packets to skip until the next update). We rigorously prove (section 3.4) that NitroSketch is accurate for a broad family of sketches that share a common structure with Count-Min Sketch [4] and Count Sketch [6].
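The geometric front-end boils down to drawing, once per sampled packet, the gap until the next counter update; a minimal sketch of that draw (assuming 0 < p < 1) is:

```python
import math
import random

def next_update_gap(p, rng=random):
    """Draw a Geometric(p) gap: the number of packets to skip before the next
    counter update. Computed once per sampled packet rather than per packet."""
    u = 1.0 - rng.random()                      # uniform in (0, 1]
    return int(math.log(u) / math.log(1.0 - p)) + 1
```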

We acknowledge that the performance speedups of NitroSketch come with two caveats. First, it consumes more memory than the original sketches for

¹ Geometric sampling also reduces the CPU requirement, as we do not need to run a probabilistic computation every time: once a packet is sampled, we compute the "index" of the next sampled packet and can avoid processing all intermediary packets.

61 the same error guarantee. Our theoretical analysis shows that for a given

ϵL1 guarantee,² NitroSketch requires double the space, and for an ϵL2 guarantee, it increases space by a factor of O(p⁻¹), where p is the sampling rate in the geometric pre-processing (section 3.4). In practice, this increase is acceptable in software switches (e.g., <6MB). Second, due to its sampling techniques,

NitroSketch is only accurate after some convergence time (<10s).

We implement a NitroSketch prototype and integrate it with Open vSwitch-DPDK [84] and FD.io-VPP [87]. We port several canonical sketches: a recently proposed universal sketching framework (UnivMon [26]) and three application-specific sketches (Count-Min [4], Count Sketch [6], and K-ary Sketch [8]). We evaluate NitroSketch on OVS-DPDK and VPP using a range of traces [95, 96, 97] on commodity servers with 40GbE NICs. We show that sketches based on NitroSketch match the throughput of 40GbE OVS-DPDK, and have reduced CPU utilization and competitive accuracy after convergence.

Compared to NetFlow/sFlow [2, 3], NitroSketch achieves better accuracy and uses significantly less memory when evaluated with the same sampling rate. When compared with SketchVisor [94], NitroSketch runs dramatically faster (>52Mpps vs. <7Mpps), or uses significantly less CPU to achieve the same throughput and yields more accurate results after convergence. Our in-memory benchmark also suggests that NitroSketch can keep up with future

(faster than 40G) virtual switches.

² Here, L1 ≜ ∑x fx and L2 ≜ √(∑x fx²) refer to the first and second norms of the flow frequency vector of the packet trace, and ϵ > 0.

3.1 Related Work and Motivation

In this section, we briefly survey sketching algorithms and explain why they are a promising alternative to traditional measurement approaches such as sampling and accurate measurements. We also show that the performance of existing sketching algorithms on software switch platforms is far from ideal and discuss recent efforts to alleviate these bottlenecks and their limitations.

Sketches as a measurement tool. Traditional network measurement tasks depend on uniformly sampled flows with statistics, e.g., NetFlow [2] and sFlow [3]. It is expensive to sample the traffic at a high sampling rate due to memory and computational limitations, while low sampling rates cause failures in fine-grained measurement tasks. Sketches allow for memory-efficient network measurement systems, as they reduce the memory usage of measurement tasks while maintaining guaranteed fidelity. Sketches are traditionally designed for hardware, as high-speed memory on hardware comes at a premium [98, 99, 100]. Sketching algorithms are backed by rigorous theoretical proofs on error vs. memory trade-offs, and they make no assumptions on the workload.

Examples of measurement tasks that are supported by sketching algorithms include: (1) Heavy Hitter Detection to identify flows that consume more than a threshold α of the total capacity. Here the capacity can be packet-based (identified by flow keys) or volume-based (counted by byte counts). Concrete sketches include Count-Min Sketch [4], Space-saving [5], Count

Sketch [6], and UnivMon [26]; (2) Change Detection to identify flows that

contribute more than a threshold of the total capacity change over two consecutive time intervals, using reversible k-ary Sketch [8, 9] and UnivMon [26]; (3) Cardinality Estimation to estimate the number of distinct flows in the traffic [7, 26]; (4) Entropy Estimation to estimate the entropy of different header distributions (e.g., Lall et al. [10]); and (5) Attack Detection to identify a destination host that receives traffic from more than a threshold number of source hosts [11]. Instead of using a different sketch for each task, universal sketching [26] supports a broad spectrum of these tasks, while the measurement task is given only at query time. Indeed, NitroSketch supports the above sketches, including UnivMon [26], Count-Min [4], Count Sketch [6], and K-ary Sketch [8].

Figure 3.1: Count-Min Sketch Example.

To illustrate the main idea of sketches, we can use Count-Min Sketch [4], where we maintain a d × w matrix of counters, as shown in Figure 3.1. On its Count procedure, the flow identifier (e.g., 5-tuple information) of each packet is hashed d times independently into one of the w counters of a row by hash functions {h_i : [n] → [w]}, i ∈ [d], and then the corresponding counters are updated by its packet size. To obtain the size estimate of a given flow, we return Min, which is the minimum among the d hashed counters of the flow.

Count-Min Sketch guarantees ϵL1-error bound estimates when d = log₂ δ⁻¹ and w = 2ϵ⁻¹, with probability 1 − δ.
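To make the Count and Min steps concrete, the following is a minimal, self-contained C sketch of a Count-Min structure in the spirit of the description above. The dimensions, seeds, and the hash mixer are illustrative placeholders and are not the implementation evaluated in this chapter.

#include <stdint.h>

#define D 5            /* rows: roughly log2(1/delta), illustrative */
#define W 1000         /* counters per row: roughly 2/epsilon       */

static uint64_t cm[D][W];
static const uint32_t seeds[D] = {11, 23, 37, 51, 73};   /* arbitrary */

static uint32_t hash_row(uint32_t key, uint32_t seed) {
    uint32_t h = key ^ seed;          /* placeholder 32-bit mixer */
    h ^= h >> 16; h *= 0x85ebca6bu;
    h ^= h >> 13; h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h % W;
}

/* Count: add the packet size to one counter in every row. */
static void cm_update(uint32_t flow_key, uint32_t pkt_len) {
    for (int i = 0; i < D; i++)
        cm[i][hash_row(flow_key, seeds[i])] += pkt_len;
}

/* Min: the estimate is the smallest of the d hashed counters. */
static uint64_t cm_query(uint32_t flow_key) {
    uint64_t est = UINT64_MAX;
    for (int i = 0; i < D; i++) {
        uint64_t c = cm[i][hash_row(flow_key, seeds[i])];
        if (c < est) est = c;
    }
    return est;
}

For packet counts rather than byte counts, one would simply pass 1 as pkt_len.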

Sketch performance in software switches. The goal of conducting measurement tasks on software switches is to support line rate operations with high fidelity and a low resource footprint. The low footprint is critical to ensure that other concurrent services can make maximal use of available resources, e.g., virtual machine instances.

To this end, we evaluate the I/O performance of various software sketches implemented atop OVS-DPDK (Table 3.1). We configure the memory allocation of the sketches based on the desired error targets. For Count-Min Sketch, we set 5 rows of 1000 counters; for UnivMon, its Count Sketch component has 5 rows of 10000 counters. We observe that sketches bring significant computation overhead to a single-thread vanilla OVS-DPDK. Even the purportedly light-weight Count-Min Sketch [4] is far from line rate processing. We see that existing sketches can neither meet the 10G line-rate with a single CPU core under minimal-sized packets nor meet the 40G line-rate under common case workloads (e.g., data center).

Therefore, while sketches optimize towards space efficiency, they incur significant per-packet computation. This conflicts with the goal of software measurement, as computation time becomes the bottleneck.

Proposed optimizations. Indeed, parallel efforts were made to address such bottlenecks. Alipourfard et al. [92, 93] suggest that simple hash tables will suffice on software switches since normal workloads are quite skewed and the management of larger L2/L3 caches in modern CPUs keeps improving.

However, as we will see later, a hash table based approach is not robust for two key reasons. First, hash tables consume excessive memory to maintain 100% or high accuracy when the number of flows in the traffic is large. The access pattern of a hash table highly depends on the skewness of the workload.

Setting            Core/Thread   Pkt Sizes (Bytes)   Thru. (Mpps)
DPDK               1C/1T         64/714              22.93/6.95
OVS-DPDK           1C/1T         64/714              14.76/6.81
 with Count-Min    1C/1T         64/714              7.29/6.29
 with UnivMon      1C/1T         64/714              0.81/0.79

Table 3.1: I/O comparison on a single core.

In Section 3.2, we show that less-skewed traffic may cause L2/L3 cache misses and thus prevent line-rate processing. Further, even if we maintain the hash table entirely in the cache, we can still hardly achieve 10G line-rate using a single thread due to its per-packet access pattern.

SketchVisor [94] proposes a hybrid solution with a normal path and a fast path. The fast path algorithm is designed to take over the packets when the sketches in the normal path cannot handle packets at high speed. To track the top k heavy hitters, the fast path, in essence, maintains a hash table of k entries, and each entry has three different counters used for deciding an update or kick-out operation from the table. Even though this fast path algorithm runs faster than existing sketches, it still runs significantly slower than hash tables due to its more expensive per-packet operations. The correctness of this hybrid approach crucially depends on the skewness of traffic, which may not hold under attacks or anomalous conditions. In terms of accuracy, the fast path algorithm implies worse error bounds than sketches, and accuracy can degrade when the fast path handles the majority of the traffic. Furthermore, their evaluation only reports numbers without DPDK and caps out at 1.7Mpps.³

³ This is far from 10G line-rate for 64B packets, and we are unsure of how SketchVisor will perform when DPDK is enabled.

3.2 Bottleneck Analysis

Before we propose any optimization, it is critical to identify what/where the performance bottlenecks are. Unfortunately, while the efforts above [92, 93, 94] measured the throughput limitations of existing sketches, they do not provide a systematic understanding of what the fundamental bottlenecks are.

To this end, we tackle this problem from first principles and profile sketch implementations to identify the bottlenecks that ultimately inform our design innovations in the next sections. To stress-test the throughput of software sketch implementations, we use min-sized packets as a "worst case" scenario, since all other real-world situations are less stressful. We have also experimented with CAIDA traces, whose average packet size is 717 bytes, and the performance bottlenecks are structurally similar.

Bottleneck analysis testbed. We set up a testbed with three commodity servers, each of which has an Intel Xeon E5-2620 v4 processor with 16 cores (256KB L2 cache per core, and 20MB L3 cache) and 128GB DDR4 memory.

Among the servers, one acts as a switch and the other two as hosts. To alleviate the bottleneck of packet I/O, each server enables single-thread Data Plane Development Kit Polling Mode Driver (DPDK-PMD) [101] on each 40G port.

Vanilla DPDK does not saturate a single CPU core to transmit packets; rather, the bottleneck in the media access control layer of the Intel XL710 network interface card (NIC) limits the performance (see the Intel XL710 Datasheet [102]). We instrument the system using Intel VTune Amplifier [103] to analyze the vswitchd process; the results are shown in Table 3.2.

Func/Call Stack        Description               CPU Time
miniflow_extract       retrieve miniflow info    7.808s
i40e_recv_pkts_vec     dpdk packet recv          6.188s
__memcmp_sse_4_1       flow entry memory cmp     5.93s
xmit_fixed_burst_vec   dpdk packet send          5.836s
emc_lookup             lookup emc table          2.449s
All others             other function calls      21.866s
Total (OVS-DPDK)                                 50.076s

Table 3.2: CPU hotspots in an OVS-DPDK vswitchd thread.

Observation 1: In a single-thread OVS-DPDK, there is little CPU headroom for sketching algorithms.

In a one-minute profiling run with a 1ms sampling interval, the total CPU time is 50.076 sec (83.86%). The profiling results show that OVS-DPDK is bottlenecked by miniflow extraction, exact match cache (EMC) processing, and the packet transmission path in the DPDK PMD thread. For a single OVS-DPDK vswitchd process, only a small portion of the CPU can be used for running sketching algorithms.

Observation 2: Cache residency is crucial for any existing sketch to achieve high-speed or line-rate processing, due to sketches' per-packet access pattern.

Figure 3.2: Packet rate of different data structures using random 64B packets. (Setting: single-thread OVS-DPDK.) The compared structures are a hash table, Count-Min (1% error target), UnivMon (5%), and K-ary Sketch (5%); the x-axis is the number of flows (1K to 100M) and the y-axis is the packet rate in Mpps.

Hash tables can be considered the simplest "sketch" structure we can use to preserve the entire traffic's flow size distribution. Despite their huge memory usage, they incur fewer computations than sketches, i.e., their per-packet access pattern is light-weight with three operations: a hash computation, a counter update, and a flow key copy. Prior work [92, 93] evaluated simple hash tables (on Click Router and DPDK) against more complex sketches and observed superior throughput performance. We evaluate simple hash tables as the only measurement component integrated with OVS-DPDK.
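For illustration, a per-packet hash-table update in the spirit of the three operations above (one hash, one counter update, one flow-key copy) could look as follows; the table size, entry layout, and mixer function are assumptions for this sketch and collision handling is deliberately omitted, so this is not the code evaluated here.

#include <stdint.h>

#define TABLE_SIZE (1u << 20)          /* assumed number of entries */

struct flow_entry {
    uint32_t key;                      /* e.g., source IP */
    uint64_t bytes;                    /* byte counter    */
};

static struct flow_entry table[TABLE_SIZE];

static uint32_t hash32(uint32_t key) { /* placeholder 32-bit mixer */
    key ^= key >> 16; key *= 0x85ebca6bu;
    key ^= key >> 13; key *= 0xc2b2ae35u;
    key ^= key >> 16;
    return key;
}

static void table_update(uint32_t flow_key, uint32_t pkt_len) {
    uint32_t idx = hash32(flow_key) & (TABLE_SIZE - 1);  /* 1 hash        */
    table[idx].bytes += pkt_len;                          /* 1 counter add */
    table[idx].key = flow_key;                            /* 1 key copy    */
    /* a real table would probe or chain on collisions */
}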

In Figure 3.2, we evaluate hash tables and sketching algorithms with a varying number of flows in the network traffic. We implement a packet generator using a Zipf distribution and use the MoonGen [104] API to emulate different numbers of flows in the traffic. For the hash table, the number of flows equals the number of entries in the table. For the other sketches, we use the theoretically calculated amount of memory to meet 1% or 5% error targets. For instance, UnivMon uses 410KB on 1M flows while Count-Min uses 20KB.

Memory Usage   IPC    L2 Hit   L3 Hit   DRAM (per sec)
0.08 MB        2.62   0.45     1.00     <200
0.8 MB         2.52   0.42     1.00     <100
8 MB           1.32   0.03     1.00     <100
80 MB          0.71   0.00     0.25     8383K

Table 3.3: Cache and DRAM access for a simple hash table.

When the number of entries that a hash table holds is small, it can nearly meet the 10G line rate with minimal-sized packets (13.1Mpps out of 14.88Mpps). If the number of flows is small, a tiny hash table can beat the other sketches in terms of processing speed, since the sketches are far from line rate. However, as the number of flows increases, the hash table's throughput gradually moves away from the line speed. We analyzed the hash table's cache residency and report the results in Table 3.3. Together with Figure 3.2, this confirms that cache residency is crucial for existing sketches to achieve line rate.

Observation 3: Even when a sketch fits into the cache, it cannot achieve line- rate due to its per-packet hash operations, data structure maintenance, and counter updates.

Taking the recently proposed UnivMon [26] as an example, we profiled its per-packet function-call hotspots. With UnivMon [26], a single sketch simultaneously collects different types of traffic statistics. At a high level, UnivMon needs to maintain a set of heavy hitter monitoring modules, which comprises Ω(log δ⁻¹) (for failure probability δ) hash operations and priority queue updates for each packet. Thus it runs slowly in software switches. We instrument a single-thread OVS-DPDK with UnivMon integrated to pinpoint the significant performance bottlenecks.

Func/Call Stack     Description                 CPU Time
xxhash32            hash computations           36.79%
heap_find           heap operation              11.31%
__memcpy            memcpy and counter update   10.91%
univmon_proc        pkt copy and cache          9.13%
heapify             heap maintenance            5.12%
miniflow_extract    retrieve miniflow info      2.91%
recv_pkts_vecs      dpdk packet recv            2.73%

Table 3.4: CPU hotspots on UnivMon with OVS-DPDK.

From the profiling results in Table 3.4, we can pinpoint a couple of major performance bottlenecks: multiple calls to the hash function, and excessive calls to locate and update items in the heap structure of UnivMon. In addition, the large number of memory copies implies significant overhead. Similarly, for less complex sketches (e.g., Count Sketch), the performance bottlenecks are structurally the same.

Summary and key implications. Based on the above analysis, we argue that several fundamental bottlenecks prevent sketching algorithms from achieving line rate processing in software. The data structures in the measurement algorithms with per-packet random access patterns cannot be too large; when they are sized larger than the L3 cache, performance drops. Per-flow data structures, such as hash tables, fall short of line rate when they are not cache-resident. Sketches, as a memory-efficient alternative, can be sized to fit the L3 cache, but they are unable to meet line rate due to the expensive hash computations and memory accesses per packet.

Based on these observations, we revisit prior work that tried to address the performance of sketches and understand why it does not adequately tackle this problem. We see that it is stuck in a fundamental dilemma: (1) per-flow based structures need to fit into the cache to achieve 10G line rate, but they are in practice too large to retain cache residency; (2) sketches can fit into the cache easily but have access patterns that are too complicated to attain line-rate speeds. SketchVisor [94] partially addressed the bottlenecks by proposing an improved Misra-Gries algorithm [88] to simplify the per-packet access pattern, but this comes at the cost of accuracy, generality, and robustness. Moreover, even the fast path still requires per-packet hashes and updates and can suffer under worst-case workloads. As we will see, our work addresses these fundamental bottlenecks by reformulating the problem to tolerate a small increase in memory footprint and latency.

3.3 NitroSketch Framework

This section describes the design of our framework. We start by outlining the key ideas, and then define the NitroSketch algorithm and its variant. We conclude by explaining how our general approach applies to other sketch algorithms.

3.3.1 Key Idea

At a high level, answering the following question is the driving intuition behind our design: can we trade off a small increase in the memory footprint and measurement latency for a significant reduction in the CPU footprint, without compromising the error guarantees (i.e., for arbitrary traffic)?

Figure 3.3: (a) Before using NitroSketch, each packet goes through multiple hash computations (e.g., O(log δ⁻¹)), updates multiple counters, and queries and updates a top-k HH storage (e.g., a heap). (b) After applying NitroSketch, only a small portion of packets (say 5%, selected by geometric sampling) go through O(1) hash computations, update one row of counters (instead of all rows), and occasionally update a top-k structure. Therefore, the CPU cost is significantly reduced.

As we show, this problem reformulation produces significant speedups even for complicated sketches such as the recently proposed UnivMon [26], and even in the hardest case of min-sized packets. We observe that many sketches share a common structure but with different operations on that structure. This allows us to design a sampling front-end that applies to these sketches, as depicted in Figure 3.3.

Common structure of sketches. Many sketches, including the popular Count Sketch [6] and Count-Min Sketch [4], are derived from the seminal AMS sketch [91]. At a high level, they project high dimensional data into a lower dimensional counter matrix while preserving some of the properties of the original data with high probability.

These sketches can be conceptually viewed as a counter matrix where for each stream input we independently update a subset of the counters in multiple rows, as shown in Figure 3.3(a). The variations between algorithms manifest in different choices of hash operations and counter configurations. Indeed, even more general and complex sketches like UnivMon [26] are composed of multiple instances of such basic sketches. For example, Count Sketch requires O(log(1/δ)) rows of counters and two hash functions per row, where δ is the failure probability. It updates a single counter in each row, requiring Ω(log(1/δ)) independent hash computations per packet. Further, in some cases, we also require an additional data structure to list the heavy hitters, which further increases per-packet processing. Count-Min Sketch follows a very similar structure and differs in the number of counters per row and the "action" performed on each counter. That is, Count Sketch increments and decrements the counters, while Count-Min Sketch only increments them.

NitroSketch workflow. NitroSketch uses a two-stage procedure to process packets: a pre-processing stage and a sketch-updating stage, as shown in Figure 3.3(b).

• Pre-processing stage: In this stage, packets arrive as batches (e.g., from the DPDK Polling Mode Driver). NitroSketch selects a sample of the packets and sends them to the sketch-updating stage, while ignoring the remaining packets. We model the sampling decisions in this stage with a geometric distribution, and only the packets that "succeed" in the geometric trials proceed to update the sketch. As a result, the majority of packets (say 95%) are "skipped" from sketching operations; we discuss this stage in Section 3.3.2.

• Sketch-updating stage: Selected packets from the pre-processing stage are sent to the sketch-updating stage to update a single counter on a specified row. This minimizes the processing overheads of selected packets.

Convergence time: Intuitively, NitroSketch skips most of the packets and performs a single counter update for the rest. This approach only works once the number of processed packets is large enough. The term "convergence time" refers to the number of packets needed to converge. We provide an upper bound in Section 3.4 and evaluate it with real workloads in Section 4.5.

3.3.2 NitroSketch Algorithms

We now introduce NitroSketch and explain how it addresses the performance bottlenecks of existing sketches. We use NitroSketch with Count Sketch (L2 guarantee) as the illustrative example. We start by showing that NitroSketch is better than a Count Sketch with a single row.

Naively, we could simply use a single-row Count Sketch structure. We call this suggestion One-Row-Count-Sketch. This reduces the per-packet hash operations to O(1) but comes with two drawbacks. First, it requires O(ϵ⁻²δ⁻¹) space [6], which is exponentially more than the O(ϵ⁻² log δ⁻¹) requirement of Count Sketch. Second, we are still required to perform a per-packet hash computation followed by a random memory access, which is not always feasible. Instead, NitroSketch amortizes the per-packet hash computations to o(1). Further, it also utilizes the cache in an efficient manner, as the majority of operations are counting the number of packets to be ignored until the next sample.

The NitroSketch Algorithm. We maintain a similar data structure to sketches (e.g., Count-Min Sketch or Count Sketch), but update and query counters differently. Rather than updating counters in all rows for every packet, we update each row independently with probability p. To compensate for the sampling, each counter update changes its value by p⁻¹. When querying a flow, we calculate the median of its estimations in all rows, same as in the original Count Sketch. This poses a trade-off: the sketch becomes faster but requires more space as the probability p decreases.

Algorithm 3 NitroSketch Data-plane
Input: Packet stream D(m, n) = a_0, ..., a_{m-1} ∈ [n]^[m]
 1: Generate pairwise independent hash functions:
 2:    {h_i: [n] → [w]; g_i: [n] → {−1, +1} or {+1}}, i ∈ [d]
 3: r ← (−1), q ← 0
 4: j ← 0                          ▷ The next packet to select
 5: while j ≤ m do                 ▷ Continue to process packets
 6:     FastUpdate()
 7: procedure FASTUPDATE
 8:     r += Geo(p)                ▷ Geometric variable
 9:     j += ⌊r/d⌋                 ▷ Skip packets if needed
10:     r ← r mod d                ▷ The row to update
11:     C[r, h_r(a_j)] += p⁻¹ · g_r(a_j)
12: procedure QUERY(x)
13:     return median over i ∈ [d] of {C[i, h_i(x)] · g_i(x)}

A straightforward realization of the above algorithm needs to make an independent coin flip for every row when processing each packet, which incurs one additional operation per row of the sketch. Instead, we use a geometric sampling technique to directly calculate how many packets to ignore before the next sampled packet, and which row to select next. This is logically equivalent to a per-row coin flip and allows us to "skip" multiple packets with a single uniform variate once a packet is selected.

Specifically, in Algorithm 3, we present this procedure as FASTUPDATE (Lines 7-11). The procedure draws a geometric variable X ∼ Geo(p) to determine how many rows to skip until the next sampled packet. This implementation requires o(1) hash computations per packet, a significant speedup compared to a hash table and One-Row-Count-Sketch. The evaluation in Section 4.5 shows that NitroSketch reaches the limit of 40G OVS-DPDK when setting a small sampling probability p ≪ d⁻¹, where d is the number of rows in the sketch (e.g., d = 5 and p = 0.01).
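The following C sketch mirrors the FASTUPDATE logic under stated assumptions: geo() draws the geometric variable by inverse-transform sampling, the hash helpers are simple placeholders, and the function processes a whole array of flow keys. A real data plane would additionally carry the skip state (r and the packet offset) across DPDK batches; none of this is the evaluated prototype.

#include <math.h>
#include <stdint.h>
#include <stdlib.h>

#define D 5                    /* rows (e.g., d = 5)     */
#define W 102400               /* counters per row       */
#define P 0.01                 /* sampling probability p */

static double cs[D][W];        /* Count Sketch counters  */

/* Geo(p) via inverse transform: number of trials until the first success. */
static long geo(double p) {
    double u = (rand() + 1.0) / ((double)RAND_MAX + 2.0);   /* u in (0,1) */
    return (long)floor(log(u) / log(1.0 - p)) + 1;
}

static uint32_t mix(uint32_t x) {            /* placeholder 32-bit mixer */
    x ^= x >> 16; x *= 0x85ebca6bu;
    x ^= x >> 13; x *= 0xc2b2ae35u;
    x ^= x >> 16;
    return x;
}
static uint32_t row_hash(uint32_t key, long row) {           /* -> [0, W)   */
    return mix(key ^ (0x9e3779b9u * (uint32_t)(row + 1))) % W;
}
static int sign_hash(uint32_t key, long row) {               /* -> {-1, +1} */
    return (mix(key ^ (0x7f4a7c15u * (uint32_t)(row + 1))) & 1u) ? +1 : -1;
}

/* Algorithm 3's FastUpdate loop over m flow keys: each geometric draw
 * skips row-trials (and therefore whole packets), and a successful trial
 * updates exactly one row with weight 1/p. */
static void nitro_process(const uint32_t *keys, long m) {
    long r = -1;   /* row-trial offset within the current packet */
    long j = 0;    /* index of the packet currently considered   */
    while (j < m) {
        r += geo(P);           /* skip row-trials until the next success */
        j += r / D;            /* ...which may skip whole packets        */
        r  = r % D;            /* the row to update for packet j         */
        if (j >= m) break;
        cs[r][row_hash(keys[j], r)] += (double)sign_hash(keys[j], r) / P;
    }
}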

Maintaining Top-k Heavy-Hitters. Similarly to Count Sketch, top-k heavy hitters are maintained in a heap of size k. In Count Sketch, when a packet arrives, we need to query its flow counter to check if the minimal item should be replaced.

Performing this update for each incoming packet requires Ω(d) hash computations. Therefore, we only update the heap with probability p for each packet. As p = o(d⁻¹), this only adds o(1) hash computations per packet. In practice, we use a geometric variable and generate one uniform variate for every query instead of for each packet.

NitroSketch with Delayed Sampling (DS-NitroSketch). NitroSketch only provides accuracy guarantees after a certain number of packets have been processed, which poses a challenge to some applications and sketch-based measurement methods. Some applications require measurement results to be available at all times and may not want to wait.

We design DS-NitroSketch, which is always accurate but only provides an acceleration once enough packets have been processed (converged). Intuitively, DS-NitroSketch starts as a Count Sketch (in which all rows are updated per packet). It periodically estimates the L2 norm to determine when it can justify switching to NitroSketch. That way, the accuracy guarantee is preserved throughout the measurement, but the speedup is only achieved once converged. The pseudocode of DS-NitroSketch is given in Algorithm 4. Observe that we now update rows with varying probabilities (initially 1 and then p), and we update the counters with the inverse sampling probability (initially 1 and then p⁻¹). Formally, the sum of squared counters in each row i serves as a (1 + ϵ√p)-multiplicative estimator of the stream's L2² with probability 0.5, and the rows' median achieves this with probability 1 − δ. We perform this computation once per Q (e.g., Q = 1000) packets, which reduces the overheads and ensures that sampling starts at most Q packets late. We use the union bound to get an overall error probability of 2δ: with probability ≤ δ we start sampling too early, and with probability ≤ δ the sketch's error exceeds ϵL2.

Algorithm 4 NitroSketch with Delayed Sampling
Input: Packet stream D(m, n) = a_0, ..., a_{m-1} ∈ [n]^[m]
 1: Generate pairwise independent hash functions:
 2:    {h_i: [n] → [w]; g_i: [n] → {−1, +1}}, i ∈ [d]
 3: converged ← 0, r ← (−1), q ← 0, j ← 0
 4: T ← 121(1 + ϵ√p) ϵ⁻⁴ p⁻²        ▷ Threshold for sampling
 5: while j ≤ m do                   ▷ Continue to process packets
 6:     if converged then FASTUPDATE()
 7:     else NORMALUPDATE()
 8: procedure NORMALUPDATE
 9:     j ← j + 1
10:     for r = 0, ..., d − 1 do     ▷ A Count Sketch update
11:         C[r, h_r(a_j)] += g_r(a_j)
12:     if (j mod Q) = 0 ∧ (median over i ∈ [d] of Σ_{y=1..w} C[i, y]²) > T then
13:         converged ← 1

3.3.3 Interface to Other Sketching Algorithms

While NitroSketch and DS-NitroSketch are introduced with Count Sketch, a variety of other sketching algorithms can leverage the same key ideas. This requires setting an error budget and some minor modifications as explained below:

Error budget ϵ. For every sketch applicable to NitroSketch, the pre-processing stage is the same and is provided by the system. A user can specify an error target ϵ for some application-specific sketch or a general-purpose sketch, and NitroSketch will configure the sampling rate in the pre-processing stage and the number of counters in the sketch structure of the second stage, based on the analysis in Section 3.4.

Minor modifications to existing sketches. To benefit from NitroSketch's acceleration, we make slight adjustments to the sketches. To be specific, the pre-processing stage is the same for all sketches, but users may need to modify the sketch-updating stage. For instance, the original Count Sketch needs two sets of log δ⁻¹ (where δ is the failure probability) hash functions, and its logic updates counters in all log δ⁻¹ rows of the sketch, as illustrated in Figure 3.3(a). To adopt NitroSketch (Figure 3.3(b)), we need to change this logic to update one counter based on the "index" (next row to update) from the pre-processing stage.
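One way to realize this adjustment, sketched here under assumed names (this is not the prototype's actual API), is to expose a per-sketch callback that updates a single counter in the row chosen by the shared pre-processing stage:

#include <stdint.h>

struct nitro_ops {
    /* update exactly one counter in `row` for this flow key,
     * scaled by inv_p = 1/p to compensate for the sampling   */
    void (*update_row)(void *sketch, int row, uint32_t flow_key,
                       uint32_t pkt_len, double inv_p);
    /* query is unchanged: e.g., median (Count Sketch) or min (Count-Min) */
    uint64_t (*query)(void *sketch, uint32_t flow_key);
};

/* Called from the pre-processing stage only for sampled row-trials. */
static void nitro_dispatch(const struct nitro_ops *ops, void *sketch,
                           int row, uint32_t key, uint32_t len, double inv_p) {
    ops->update_row(sketch, row, key, len, inv_p);
}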

3.4 Analysis of NitroSketch

We now show the theoretical guarantees of NitroSketch. We consider two variants: first, combining the Count-Min Sketch with geometric sampling for achieving an $\epsilon L_1$ guarantee, and second, using NitroSketch for an $\epsilon L_2$ approximation. Here, $L_k \triangleq \sqrt[k]{F_k} = \sqrt[k]{\sum_{x \in U} f_x^k}$ is the k-th norm of the frequency vector and U is the set of all possible flows (e.g., all $2^{32}$ possible source IPs).

Specifically, L1 is simply the number of packets in the measurement. We note that computing an Lk approximation for k > 2 cannot be done using polylogarithmic space (in |U|) and is considered infeasible [91].

The following theorem follows from the analysis in [90].

Theorem 3.4.1. Let $d \triangleq \log_2\delta^{-1}$ and $w \triangleq 4\epsilon^{-1}$. For streams in which $L_1 \ge c \cdot \epsilon^{-2}\left(\sqrt{p^{-1}\log\delta^{-1}}\right)$ for a sufficiently large constant c, Count-Min Sketch in which every packet increases the counter of each row i independently with probability p satisfies $\Pr\left[|\hat{f}_x - f_x| \ge \epsilon L_1\right] \le \delta$, where $f_x$ is the actual frequency of flow x, and $\hat{f}_x$ is the value returned by calling Query(x) in Algorithm 3.

Next, we state Theorem 3.4.2 and Theorem 3.4.3, which establish the correctness of NitroSketch and DS-NitroSketch.

Theorem 3.4.2. Let $w = 8\epsilon^{-2}p^{-1}$ and $d = O(\log\delta^{-1})$. NitroSketch requires $O(\epsilon^{-2}p^{-1}\log\delta^{-1})$ space, operates in amortized $O(1 + dp)$ time (constant for $p = O(1/d)$), and provides the following guarantee for streams in which $L_2 \ge 8\epsilon^{-2}p^{-1}$:
\[
\Pr\left[\left|f_x - \hat{f}_x\right| > \epsilon L_2\right] \le \delta.
\]

Theorem 3.4.3. Let $w = 11\epsilon^{-2}p^{-1}$ and $d = O(\log\delta^{-1})$; DS-NitroSketch provides an estimator $\hat{f}_x$ that satisfies $\Pr\left[|\hat{f}_x - f_x| > \epsilon L_2\right] < 2\delta$.

3.4.1 Interpretation of Main Theorems

Intuitively, NitroSketch trades space for throughput. For example, it requires an $O(p^{-1})$ factor more space for the same accuracy when compared to Count Sketch.

Compared to One-Row-Count-Sketch, our solution provides faster updates and has lower space requirements. The improvement in update time comes from reducing the number of hash computations. While One-Row-Count-Sketch computes two hashes per packet, NitroSketch only does so for each sampled row. As the expected number of sampled rows per packet is $dp = o(1)$, NitroSketch significantly reduces the processing overheads. Space-wise, NitroSketch only requires $O(\epsilon^{-2}p^{-1}\log\delta^{-1})$ space compared to the $O(\epsilon^{-2}\delta^{-1})$ memory of One-Row-Count-Sketch and is therefore more cache resident. For example, we can set $p = d^{-2} = O(\log^{-2}\delta^{-1})$ to get a space of $O(\epsilon^{-2}\log^{3}\delta^{-1})$. Further, One-Row-Count-Sketch makes two hash computations per packet while NitroSketch makes just o(1).

Convergence time: For example, the first 10M source IPs of the CAIDA 2016 [95] trace have a second norm of $L_2 \approx 1.28 \cdot 10^6$, while 100M packets give $L_2 \approx 1.03 \cdot 10^7$. This means that if p = 1%, we get guaranteed convergence for ϵ ≥ 2.5% after 10M packets and ϵ ≥ 0.88% after 100M. In practice, we observe that the error is lower, which suggests that our analysis is just an upper bound and one can use smaller ϵ values as well. Finally, we note that further extending the row sizes allows faster convergence of the algorithm.
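For completeness, these thresholds follow directly from the condition $L_2 \ge 8\epsilon^{-2}p^{-1}$ of Theorem 3.4.2, rearranged as a bound on ϵ and evaluated with p = 0.01 and the CAIDA norms quoted above:
\[
\epsilon \ge \sqrt{\frac{8}{p\,L_2}}:\qquad
\sqrt{\frac{8}{0.01 \cdot 1.28\cdot 10^{6}}} \approx 0.025 = 2.5\%,
\qquad
\sqrt{\frac{8}{0.01 \cdot 1.03\cdot 10^{7}}} \approx 0.0088 = 0.88\%.
\]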

Alternatively, DS-NitroSketch has no convergence time, but does not provide an acceleration over Count Sketch before the L2 norm is large enough. Therefore, DS-NitroSketch can accelerate UnivMon while providing provable accuracy guarantees throughout the measurement. As UnivMon comprises multiple L2 sketches, those whose L2 is large enough would converge and act like NitroSketch; the remaining sketches would not converge and would act like Count Sketches.

3.4.2 Comparison to Uniform Sampling

A natural question is whether we gain the same performance boost by feeding a sub-sampled stream into a sketch. Indeed, one can expect to achieve a speedup by considering fewer items (and thus reducing the number of hash computations and memory accesses). The question is how to combine the two to get the same error guarantee, and how much to increase the space to make up for the sampling error.

In Section 3.4.5, we analyze the option of sampling each packet independently with probability p and feeding it into Count Sketch. We stress that the expected number of hash computations, memory accesses, and uniform variate generations is similar to that of NitroSketch with the same parameter p (when using geometric sampling). We show that uniform samples are inferior to NitroSketch. To provide the same L2 guarantee with probability 1 − δ, the Count Sketch used for processing the uniform sample needs $\Omega\left(\epsilon^{-2}p^{-1}\log\delta^{-1} + \epsilon^{-2}p^{-1.5}m^{-0.5}\log^{1.5}\delta^{-1}\right)$ counters. In contrast, NitroSketch requires only $O\left(\epsilon^{-2}p^{-1}\log\delta^{-1}\right)$ counters. Here, m is the number of packets. This also gives an alternative perspective on the result: given the same space, the convergence time for uniform sampling is higher and depends on the error probability δ. We also empirically compare our solution against uniform sampling methods such as sFlow and NetFlow; as depicted in Figure 3.13, our approach outperforms these solutions even if the sample is thoroughly analyzed and not sketched.

3.4.3 Proof of Theorem 3.4.2

We consider the sequence of packets that was sampled for each of the rows.

That is, let $S_i \subseteq S$ be the subset of packets that updated row i (for $i \in \{1, \ldots, d\}$). Further, we denote by $f_{i,x} \triangleq |\{j \mid (x_j \in S_i) \wedge (x_j = x)\}|$ the frequency of x within $S_i$. That is, $f_{i,x}$ is the number of times a packet arrived from flow x and we updated row i. Let $L_2 \triangleq \sqrt{\sum_{x \in U} f_x^2}$ denote the second norm of the frequency vector of S and similarly let $L_{2,i} \triangleq \sqrt{\sum_{x \in U} f_{i,x}^2}$ denote that of $S_i$. Clearly, we have $L_{2,i} \le L_2$ for any row $i \in \{1, \ldots, d\}$.

We proceed with a simple lemma that bounds $\mathbb{E}\left[L_{2,i}^2\right]$ as a function of $L_2^2$. Observe that $f_{i,x} \sim \mathrm{Bin}(f_x, p)$ and thus $\mathrm{Var}[f_{i,x}] = f_x p(1-p)$ and $\mathbb{E}[f_{i,x}] = f_x p$.

Lemma 3.4.4. $\mathbb{E}\left[L_{2,i}^2\right] \le 2pL_2^2$.

Proof.
\[
\mathbb{E}\left[L_{2,i}^2\right] = \sum_{x\in U}\mathbb{E}\left[f_{i,x}^2\right]
 = \sum_{x\in U}\left(\mathrm{Var}[f_{i,x}] + \left(\mathbb{E}[f_{i,x}]\right)^2\right)
 = \sum_{x\in U}\left(f_x p(1-p) + (f_x p)^2\right)
 \le \sum_{x\in U} 2p f_x^2 = 2pL_2^2.
\]

Next, we bound the variance of $C_{i,h_i(x)}g_i(x) - p^{-1}f_{i,x}$, the noise that flows other than x add to its counter on the i'th row.

Lemma 3.4.5. $\mathrm{Var}\left[C_{i,h_i(x)}g_i(x) - p^{-1}f_{i,x}\right] \le 2p^{-1}L_2^2/w$.

Proof. First, observe that
\[
C_{i,h_i(x)} = p^{-1}\sum_{x'\in U \mid h_i(x)=h_i(x')} f_{i,x'}\, g_i(x').
\]
That is, the value of x's counter, $C_{i,h_i(x)}$, is affected by all $x' \in U$ such that $h_i(x) = h_i(x')$, and the contribution of each such $x'$ is $p^{-1}f_{i,x'}g_i(x')$.

Next, notice that since $\mathbb{E}[g_i(x')] = 0$ and as $g_i$ is two-way independent, we have that
\[
\mathbb{E}\left[C_{i,h_i(x)}\cdot g_i(x)\right]
 = p^{-1}\,\mathbb{E}\left[\sum_{x'\in U : h_i(x')=h_i(x)} f_{i,x'}\cdot g_i(x')\cdot g_i(x)\right]
 = p^{-1}\,\mathbb{E}\left[f_{i,x}\right] = f_x.
\]
Now, as $h_i$ is pairwise independent, we have that for any $x' \in U \setminus \{x\}$: $\Pr\left[h_i(x) = h_i(x')\right] = 1/w$. We are now ready to prove the lemma:
\begin{align*}
\mathrm{Var}\left[C_{i,h_i(x)}g_i(x) - p^{-1}f_{i,x}\right]
&= \mathbb{E}\left[\left(C_{i,h_i(x)}g_i(x) - p^{-1}f_{i,x}\right)^2\right]\\
&= \mathbb{E}\left[\left(C_{i,h_i(x)}g_i(x)\right)^2 - 2p^{-1}\,C_{i,h_i(x)}g_i(x)\,f_{i,x} + p^{-2}f_{i,x}^2\right]\\
&= \mathbb{E}\Bigg[\Big(p^{-1}\!\!\sum_{x'\in U\mid h_i(x)=h_i(x')}\!\! f_{i,x'}\,g_i(x')g_i(x)\Big)^2
 - 2p^{-1}\Big(p^{-1}\!\!\sum_{x'\in U\mid h_i(x)=h_i(x')}\!\! f_{i,x'}\,g_i(x')g_i(x)\Big)f_{i,x} + p^{-2}f_{i,x}^2\Bigg]\\
&= p^{-2}\,\mathbb{E}\Bigg[\Big(\sum_{x'\in U\mid h_i(x)=h_i(x')} f_{i,x'}^2\Big) - f_{i,x}^2\Bigg]\\
&= p^{-2}\,\mathbb{E}\Bigg[\sum_{x'\in U\setminus\{x\}\mid h_i(x)=h_i(x')} f_{i,x'}^2\Bigg]
 \le p^{-2}\,\mathbb{E}\left[L_{2,i}^2\right]/w \le 2p^{-1}L_2^2/w,
\end{align*}
where the last inequality is correct per Lemma 3.4.4.

We denote $A \equiv C_{i,h_i(x)}g_i(x)$ and $B \equiv p^{-1}f_{i,x}$ (since $\mathrm{Var}[f_{i,x}] = f_x p(1-p)$, we have $\mathrm{Var}[B] = f_x p^{-1}(1-p)$). Our goal is to bound the variance of A and use Chebyshev's inequality.

Observe that
\[
A - B = \left(p^{-1}\sum_{x'\in U\mid h_i(x)=h_i(x')} f_{i,x'}\,g_i(x')g_i(x)\right) - p^{-1}f_{i,x}
 = p^{-1}\sum_{x'\in U\setminus\{x\}\mid h_i(x)=h_i(x')} f_{i,x'}\,g_i(x')g_i(x)
\]
is independent from B, and thus
\[
\mathrm{Var}[A] = \mathrm{Var}[(A - B) + B] = \mathrm{Var}[A - B] + \mathrm{Var}[B]
 \le p^{-2}\,\mathbb{E}\left[L_{2,i}^2\right]/w + f_x p^{-1}(1-p) \le 2p^{-1}L_2^2/w + f_x p^{-1}.
\]

We denote by $\hat{f}_x(i) \triangleq A = C_{i,h_i(x)}g_i(x)$ the estimation of x's frequency provided by the i'th row. Then according to Chebyshev's inequality:
\begin{align*}
\Pr\left[|\hat{f}_x(i) - f_x| \ge \epsilon L_2\right]
&= \Pr\left[\left|C_{i,h_i(x)}g_i(x) - \mathbb{E}\left[C_{i,h_i(x)}g_i(x)\right]\right| \ge \epsilon L_2\right]
 = \Pr\left[|A - \mathbb{E}[A]| \ge \epsilon L_2\right]\\
&\le \Pr\left[|A - \mathbb{E}[A]| \ge \frac{\sigma(A)\cdot \epsilon L_2}{\sqrt{2p^{-1}L_2^2/w + f_x p^{-1}}}\right]
 \le \frac{2p^{-1}L_2^2/w + f_x p^{-1}}{(\epsilon L_2)^2}
 = \frac{2L_2^2/w + f_x}{p(\epsilon L_2)^2}\\
&= \frac{2/w}{p\epsilon^2} + \frac{f_x}{p(\epsilon L_2)^2}
 \le \frac{2/w}{p\epsilon^2} + \frac{1}{p\epsilon^2 L_2}.
\end{align*}

We want a constant probability of the error exceeding $\epsilon L_2$ in each row, so that the median of the rows will be correct with probability $1 - \delta$. Therefore, by demanding $L_2 \ge 8p^{-1}\epsilon^{-2}$ and $w \ge 8p^{-1}\epsilon^{-2}$, we get that the error probability is
\[
\Pr\left[|\hat{f}_x(i) - f_x| \ge \epsilon L_2\right] \le \frac{2/w}{p\epsilon^2} + \frac{1}{p\epsilon^2 L_2} \le 3/8.
\]
As the $d = O(\log\delta^{-1})$ rows are independent, the algorithm's estimate, $\hat{f}_x = \mathrm{median}_{i\in\{1,\ldots,d\}}\,\hat{f}_x(i)$, will be correct with a probability of at least $1 - \delta$ using a standard Chernoff bound. This establishes the correctness of Theorem 3.4.2.

3.4.4 Analysis of DS-NitroSketch

We now formally analyze the accuracy guarantees of DS-NitroSketch (Algorithm 4). We start with Lemma 3.4.6, which shows that once Algorithm 4 converges (see Line 12), the L2 is large enough to justify sampling with probability p.

Lemma 3.4.6. If converged = 1 then
\[
\Pr\left[L_2 \ge 11\epsilon^{-2}p^{-1}\right] \ge 1 - \delta.
\]

Proof. Since L2 grows monotonically with the number of packets, it is enough to show that the condition of Line 12 implies the lower bound on the L2 value. Namely, we assume that
\[
\mathrm{median}_{i\in[d]} \sum_{y=1}^{w} C_{i,y}^2 > 121\left(1 + \epsilon\sqrt{p}\right)\epsilon^{-4}p^{-2}. \qquad (3.4.1)
\]
It is known that given a Count Sketch that is configured for an $(\epsilon', \delta)$-guarantee, it is possible to compute a $(1 + \epsilon')$-approximation of the L2 with probability $1 - \delta$ [91]. Specifically, as up to this point our sketch has processed S exactly as a Count Sketch (for $\epsilon' = \epsilon\sqrt{p}$), we have that:
\[
\Pr\left[\left|\left(\mathrm{median}_{i\in[d]}\sum_{y=1}^{w} C_{i,y}^2\right) - L_2^2\right| > \epsilon' L_2^2\right] \le \delta.
\]
Combining this with (3.4.1), the lemma follows.

In DS-NitroSketch, there are $d = O(\log\delta^{-1})$ rows, each having $w = 11\epsilon^{-2}p^{-1}$ counters. As long as the sketch has not 'converged' (see Line 12), it is indistinguishable from a Count Sketch [6] with a guarantee of $\epsilon' \triangleq \epsilon\sqrt{p}$. Thus, given a flow x, if converged = 0 then Algorithm 4 guarantees:
\[
\Pr\left[|\hat{f}_x - f_x| \ge \epsilon' L_2\right] \le \delta.
\]
As $\epsilon' = \epsilon\sqrt{p} \le \epsilon$, the algorithm provides the desired accuracy guarantee prior to convergence. Henceforth, we assume that converged = 1 and show that the error is still at most $\epsilon L_2$.

We denote by u the index of the packet during whose processing the condition in Line 12 was satisfied and the sketch converged. That is, packets $a_1, \ldots, a_u$ were processed using NORMALUPDATE, while $a_{u+1}, \ldots, a_m$ followed FASTUPDATE. Further, we denote by $\grave{S} \triangleq a_1, \ldots, a_u$ the substream of the first u packets, by $\ddot{S} \triangleq a_{u+1}, \ldots, a_m$ the remaining substream, and for a flow x we use $\grave{f}_x$ and $\ddot{f}_x$ to denote its frequency in $\grave{S}$ and $\ddot{S}$. Note that the overall frequency of x is $f_x = \grave{f}_x + \ddot{f}_x$. Additionally, we denote the number of times a packet that belongs to a flow x in $\ddot{S}$ was sampled by the i'th row as $\ddot{f}_{x,i}$. Similarly to the analysis of NitroSketch, we first analyze the guarantee provided by a single row. Namely, fix some flow $x \in U$ and a row $i \in \{1, \ldots, d\}$; the counter associated with x on this row is $C_{i,h_i(x)}$. Observe that we can express the value of the i'th estimator as:
\[
C_{i,h_i(x)}\, g_i(x) = \sum_{y: h_i(y) = h_i(x)} \grave{f}_y\, g_i(x) g_i(y)
 \;+\; p^{-1}\cdot\!\! \sum_{y: h_i(y)=h_i(x)} \ddot{f}_{y,i}\, g_i(x) g_i(y). \qquad (3.4.2)
\]
That is, every flow y that is mapped to the same counter as x (i.e., $h_i(y) = h_i(x)$) changes the estimation by $\grave{f}_y g_i(x)g_i(y) + p^{-1}\ddot{f}_{y,i} g_i(x)g_i(y)$: every packet of y in $\grave{S}$ surely adds $g_i(y)$ to the counter (Algorithm 4, Line 11), while every sampled packet in $\ddot{S}$ modifies the counter by $p^{-1}g_i(y)$ (Algorithm 3, Line 11). Next, we denote $A \triangleq \sum_{y:h_i(y)=h_i(x)} \grave{f}_y\, g_i(x)g_i(y)$ and $B \triangleq p^{-1}\sum_{y:h_i(y)=h_i(x)} \ddot{f}_{y,i}\, g_i(x)g_i(y)$ (i.e., $C_{i,h_i(x)}g_i(x) = A + B$). We note that A and B are independent and that $\mathbb{E}\left[C_{i,h_i(x)}g_i(x)\right] = \mathbb{E}[A] + \mathbb{E}[B] = \grave{f}_x + \ddot{f}_x = f_x$. That is, the resulting estimator for row i is unbiased.

We now turn to bound the variance of the estimator by bounding $\mathrm{Var}[A - \grave{f}_x]$ and $\mathrm{Var}[B - p^{-1}\ddot{f}_{x,i}]$. First, since $\Pr[h_i(x) = h_i(y)] = 1/w$ for $x \ne y$, observe that:
\begin{align*}
\mathrm{Var}[A - \grave{f}_x]
&= \mathrm{Var}\Bigg[\sum_{y:h_i(y)=h_i(x)} \grave{f}_y\, g_i(x)g_i(y) - \grave{f}_x\Bigg]
 = \mathrm{Var}\Bigg[\sum_{y\ne x:h_i(y)=h_i(x)} \grave{f}_y\, g_i(x)g_i(y)\Bigg]\\
&= \mathbb{E}\Bigg[\sum_{y\ne x:h_i(y)=h_i(x)} \grave{f}_y^2\Bigg]
 \le \frac{1}{w}\sum_{y\in U} \grave{f}_y^2. \qquad (3.4.3)
\end{align*}
Next, let us analyze the variance of $B - p^{-1}\ddot{f}_{x,i}$:
\begin{align*}
\mathrm{Var}\left[B - p^{-1}\ddot{f}_{x,i}\right]
&= \mathrm{Var}\Bigg[p^{-1}\cdot\!\!\sum_{y:h_i(y)=h_i(x)} \ddot{f}_{y,i}\, g_i(x)g_i(y) - p^{-1}\ddot{f}_{x,i}\Bigg]
 = \mathrm{Var}\Bigg[p^{-1}\cdot\!\!\sum_{y\ne x:h_i(y)=h_i(x)} \ddot{f}_{y,i}\, g_i(x)g_i(y)\Bigg]\\
&= \mathbb{E}\Bigg[p^{-2}\cdot\!\!\sum_{y\ne x:h_i(y)=h_i(x)} \ddot{f}_{y,i}^2\Bigg]
 \le \frac{p^{-2}}{w}\cdot\mathbb{E}\Bigg[\sum_{y\in U} \ddot{f}_{y,i}^2\Bigg]. \qquad (3.4.4)
\end{align*}
Similarly to Lemma 3.4.4, we have that $\mathbb{E}\left[\sum_{y\in U} \ddot{f}_{y,i}^2\right] \le 2p\sum_{y\in U} \ddot{f}_y^2$, which allows us to reduce (3.4.4) to
\[
\mathrm{Var}\left[B - p^{-1}\ddot{f}_{x,i}\right] \le \frac{2p^{-1}}{w}\cdot\sum_{y\in U} \ddot{f}_y^2. \qquad (3.4.5)
\]

Recall that during the processing of $\ddot{S}$, every packet is sampled with probability p and thus $\ddot{f}_{x,i} \sim \mathrm{Bin}(\ddot{f}_x, p)$. Putting everything together we get:
\begin{align*}
\mathrm{Var}\left[C_{i,h_i(x)}g_i(x) - f_x\right]
&= \mathrm{Var}\left[A + B - f_x\right]
 = \mathrm{Var}\left[(A - \grave{f}_x) + (B - \ddot{f}_x)\right]\\
&= \mathrm{Var}\left[(A - \grave{f}_x) + (B - p^{-1}\ddot{f}_{x,i}) + (p^{-1}\ddot{f}_{x,i} - \ddot{f}_x)\right]\\
&= \mathrm{Var}\left[A - \grave{f}_x\right] + \mathrm{Var}\left[B - p^{-1}\ddot{f}_{x,i}\right] + \mathrm{Var}\left[p^{-1}\ddot{f}_{x,i} - \ddot{f}_x\right]\\
&\le \frac{1}{w}\sum_{y\in U}\grave{f}_y^2 + \frac{2p^{-1}}{w}\sum_{y\in U}\ddot{f}_y^2 + \ddot{f}_x p^{-1}
 \le \frac{L_2^2\cdot\left(1 + 2p^{-1}\right)}{w} + \ddot{f}_x p^{-1}
 \le \frac{L_2^2\cdot\left(3p^{-1} + p^{-1}w/L_2\right)}{w}. \qquad (3.4.6)
\end{align*}
Here, the last inequality follows as $\ddot{f}_x \le f_x \le L_2$. We now use Lemma 3.4.6 to get that, with very high probability, $L_2 > w$. Intuitively, this follows from our convergence criterion (Algorithm 4, Line 12). This means that, conditioned on $L_2 > w$ (which happens with probability $1 - \delta$), we have that
\[
\mathrm{Var}\left[C_{i,h_i(x)}g_i(x) - f_x\right] \le \frac{L_2^2\cdot\left(3p^{-1} + p^{-1}w/L_2\right)}{w} \le \frac{L_2^2\cdot 4p^{-1}}{w} \le 3L_2^2\epsilon^2/8. \qquad (3.4.7)
\]

We now use Chebyshev’s inequality to conclude that the estimator of the

91 i’th row, fˆx(i), satisfies

[ ] [ ] | ( ) − | ≥ = | ( ) − | ≥ Pr fˆx i fx ϵL2 Pr Ci,hi(x)gi x fx ϵL2

[ ] Var Ci,h (x)gi(x) − fx ≤ i ≤ 2 3/8. (3.4.8) (ϵL2)

That is, the probability that each row estimates the frequency of x with an error no larger than $\epsilon L_2$ is at least 5/8. Finally, a standard use of Chernoff's inequality shows that $d = O(\log\delta^{-1})$ (independent) rows are required for their median to amplify the probability to $1 - \delta$. Taking the union bound over the events of sampling too early and having an error in the rows' median, we have an error probability no larger than 2δ. This concludes the proof of Theorem 3.4.3.

3.4.5 Analysis of the Comparison with Uniform Sampling

Our sketch updates each row, for every packet, with probability p. An alternative approach, uniform sampling, would be to sample each packet with probability p and update all rows for the sampled packets. We note that the two approaches make the same number of hash computations in expectation. Here, we claim that our approach is superior to uniform sampling.

Intuitively, our sketch uses the fact that for each row i, with probability 3/4 we have $L_{2,i} = O(\sqrt{p}\,L_2)$. This reduction in the second norm allows one to increase the row width by a factor of $p^{-1}$ (compared to Count Sketch) to make up for the extra error introduced by the sampling. We now show that uniform sampling requires asymptotically more space, as with probability Ω(δ) the second norm of the sampled substream is at least $\sqrt{mp + \sqrt{mp(1-p)\log\delta^{-1}}}$ (Lemma 3.4.12). Since Count Sketch is known to have an error of $\Omega\left(L_2/\sqrt{w}\right)$ for streams with a second norm of L2, we get that for an error of $\epsilon L_2$ one would need to use more counters per row, or wait longer for the algorithm to converge. That is, uniform sampling with the same update rate would require a multiplicative $\Omega\left(\log\delta^{-1}\right)$ factor more space.

To begin, we first discuss a lower bound on the error of Count Sketch. In Count Sketch, one uses a matrix of w columns and $d = O(\log\delta^{-1})$ rows to get $\Pr\left[|\hat{f}_x - f_x| \ge L_2/\sqrt{w}\right] \le \delta$. For an $\epsilon L_2$ guarantee, one then sets $w = O(\epsilon^{-2})$. We now show that this is asymptotically tight. Namely, we show that there exists a distribution for which $\Pr\left[|\hat{f}_x - f_x| \ge \epsilon L_2\right] = \Omega(\delta)$.

To prove our result, we use the following theorem.

Theorem 3.4.7. ([105, 106]) Let X be a binomial variable such that $\mathrm{Var}[X] \ge 40000$. Then for all $t \in [0, \mathrm{Var}[X]/100]$, we have
\[
\Pr\left[X \ge \mathbb{E}[X] + t\right] = \Omega\left(e^{-t^2/3\mathrm{Var}[X]}\right).
\]

We are now ready to show a lower bound on the error of Count Sketch.

Lemma 3.4.8. Let $n \ge m + 1$. Consider Count Sketch allocated with $d = O(\log\delta_1^{-1})$ rows and $w \le m/c'$ columns, for a sufficiently large constant $c'$.⁴ There exists $c = \Theta(1)$, a stream $S \in [n]^{[m]}$, and an element $x \in [n]$ such that $\Pr\left[|\hat{f}_x - f_x| \ge c\cdot L_2/\sqrt{w}\right] \ge \delta_1$.

⁴ In practice, w ≪ m, as otherwise we have enough memory for exact counting and would not need sketches, and this trivially holds.

Proof. We denote by $c''$ the constant in the $\Omega(\cdot)$ of Theorem 3.4.7, and by $z = O(1)$ the constant such that $d = z\ln\delta_1^{-1}$. Let $c' = \max\left\{320000,\; -8\ln\left(1 - e^{-1/2z}/c''\right)\right\}$ and $c = \sqrt{\frac{3}{4z}}$ be two constants. We will show that, with probability of at least $e^{-1/z}$, each row has an error of at least $c\cdot L_2/\sqrt{w}$. This would later allow us to conclude that the estimation, which is the median row, has an error of $c\cdot L_2/\sqrt{w}$ with probability of at least $\delta_1$.

Consider the stream in which all elements of [m] arrive once each (and thus $L_2 = \sqrt{m}$), and consider a query for $x \triangleq m + 1$ (i.e., $f_x = 0$). Fix a row i, and let $Q \triangleq \{j \in [m] \mid h_i(j) = h_i(x)\}$ be the elements that affect x's counter on the i'th row. Intuitively, we show that the number of items that change x's counter ($C_{i,h_i(x)}$) is $|Q| = \Omega\left(L_2^2/w\right)$ and then give a lower bound on the resulting value of the counter (given that some of the flows in Q increase it while others decrease it). Observe that $|Q| \sim \mathrm{Bin}(m, 1/w)$. According to Chernoff's bound:
\[
\Pr\left[|Q| \le m/2w\right] \le e^{-m/8w} \le e^{-c'/8} \le 1 - e^{-1/2z}/c''. \qquad (3.4.9)
\]
Next, we denote by $X \triangleq |\{j \in Q \mid g_i(j) = +1\}|$ the number of elements from Q that increased the value of x's counter. Observe that $X \sim \mathrm{Bin}(|Q|, 1/2)$ is binomially distributed and that x's counter satisfies $C_{i,h_i(x)} = 2X - |Q|$. Conditioned on the event $|Q| > m/2w$ (which happens with constant probability, as (3.4.9) shows), we have that $\mathrm{Var}[X] = |Q|/4 \ge m/8w \ge c'/8 \ge 40000$. According to Theorem 3.4.7, we now have that
\[
\Pr\left[X \ge \mathbb{E}[X] + t \;\middle|\; |Q| > m/2w\right] \ge c''e^{-t^2/3\mathrm{Var}[X]} \qquad (3.4.10)
\]

for some $c'' > 0$ and any $t \in [0, \mathrm{Var}[X]/100]$. We will now show that in each row i, $\Pr\left[C_{i,h_i(x)} \ge c\cdot L_2/\sqrt{w}\right] \ge e^{-1/z}$, for $c = \sqrt{\frac{3}{4z}}$:
\begin{align*}
\Pr\left[C_{i,h_i(x)} \ge c\cdot L_2/\sqrt{w}\right]
&= \Pr\left[2X - |Q| \ge c\cdot\sqrt{m/w}\right]
 = \Pr\left[X \ge \left(|Q| + c\cdot\sqrt{m/w}\right)/2\right]
 = \Pr\left[X \ge \mathbb{E}[X] + c\cdot\sqrt{m/4w}\right]\\
&\ge \Pr\left[\left(X \ge \mathbb{E}[X] + c\cdot\sqrt{m/4w}\right) \wedge \left(|Q| > m/2w\right)\right]\\
&= \Pr\left[X \ge \mathbb{E}[X] + c\cdot\sqrt{m/4w} \;\middle|\; |Q| > m/2w\right]\cdot\Pr\left[|Q| > m/2w\right]\\
&\ge \Pr\left[X \ge \mathbb{E}[X] + c\cdot\sqrt{m/4w} \;\middle|\; |Q| > m/2w\right]\cdot e^{-1/2z}/c''. \qquad (3.4.11)
\end{align*}
Setting $t \triangleq c\cdot\sqrt{m/4w} = O\left(\sqrt{\mathrm{Var}[X]}\right)$ and using (3.4.10), we get that
\[
\Pr\left[X \ge \mathbb{E}[X] + c\cdot\sqrt{m/4w} \;\middle|\; |Q| > m/2w\right]
 \ge c''e^{-\left(c\sqrt{m/4w}\right)^2/3\mathrm{Var}[X]}
 \ge c''e^{-\left(c\sqrt{m/4w}\right)^2/3(m/8w)}
 = c''e^{-2c^2/3} = c''e^{-1/2z},
\]
where the last inequality follows from $\mathrm{Var}[X] = |Q|/4 \ge m/8w$. Plugging this back into (3.4.11), we get
\[
\Pr\left[C_{i,h_i(x)} \ge c\cdot L_2/\sqrt{w}\right]
 \ge \Pr\left[X \ge \mathbb{E}[X] + c\cdot\sqrt{m/4w} \;\middle|\; |Q| > m/2w\right]\cdot e^{-1/2z}/c''
 \ge c''e^{-1/2z}\cdot e^{-1/2z}/c'' = e^{-1/z}.
\]
Thus, we established that in each row i, with a probability of at least $e^{-1/z}$, x's counter (and thus the error) is larger than $c\cdot L_2/\sqrt{w}$. Finally, since the rows are independent, we get that the probability of Count Sketch returning a wrong estimate is at least
\[
\Pr\left[\hat{f}_x - f_x \ge c\cdot L_2/\sqrt{w}\right] \ge \left(e^{-1/z}\right)^d = \left(e^{-1/z}\right)^{z\ln\delta_1^{-1}} = \delta_1.
\]

To proceed, we need some inequalities that allow us to provide a lower bound on the reduction in L2 of the sub-sampled stream. To that end, we use the following results:

Theorem 3.4.9. ([107]) Let $X \sim \mathrm{Bin}(n, p)$; for all k such that $np \le k \le n(1-p)$:
\[
\Pr\left[X \ge k\right] \ge 1 - \Phi\left(\frac{k - np}{\sqrt{np(1-p)}}\right),
\]
where $\Phi(z) \triangleq \int_{-\infty}^{z}\frac{1}{\sqrt{2\pi}}e^{-t^2/2}\,dt$ is the cumulative distribution function of the normal distribution.

Theorem 3.4.10. ([108]) For any z > 0:
\[
1 - \Phi(z) > \frac{z}{1 + z^2}\,\phi(z),
\]
where $\phi(z) \triangleq \frac{1}{\sqrt{2\pi}}e^{-z^2/2}$ is the density function of the normal distribution.

For convenience, we also use the following fact:

Fact 3.4.11. For any z ≥ 2:
\[
\frac{z}{1+z^2}\,\phi(z) = \frac{z}{1+z^2}\cdot\frac{1}{\sqrt{2\pi}}e^{-z^2/2} \ge e^{-z^2}.
\]

Next, we provide a lower bound on the reduction in L2 when sub-sampling a stream with probability p. Once again, we consider the stream S in which m distinct elements arrive once each (and thus its L2 is $\sqrt{m}$).

Lemma 3.4.12. Let $\overline{S}$ be a substream of S such that each packet in S appears in $\overline{S}$ independently with probability $p \le 1/2$. Denote by $L_2^{\overline{S}}$ the L2 of $\overline{S}$ and by $L_2^{S}$ the L2 of S. Then for $\delta_2 \le 1/4$:
\[
\Pr\left[L_2^{\overline{S}} \ge \sqrt{mp + \sqrt{mp(1-p)\log\delta_2^{-1}}}\right] \ge \delta_2.
\]

Proof. Denote by J the set of sampled elements; observe that $|J| \sim \mathrm{Bin}(m, p)$ and that $L_2^{\overline{S}} = \sqrt{|J|}$. According to Theorem 3.4.9, Theorem 3.4.10, and Fact 3.4.11, we have that:
\[
\Pr\left[|J| \ge mp + \sqrt{mp(1-p)\log\delta_2^{-1}}\right] \ge \delta_2.
\]
Thus, we have that:
\[
\Pr\left[L_2^{\overline{S}} \ge \sqrt{mp + \sqrt{mp(1-p)\log\delta_2^{-1}}}\right]
 = \Pr\left[\sqrt{|J|} \ge \sqrt{mp + \sqrt{mp(1-p)\log\delta_2^{-1}}}\right]
 = \Pr\left[|J| \ge mp + \sqrt{mp(1-p)\log\delta_2^{-1}}\right] \ge \delta_2.
\]

The above lemma shows that the L2 of the uniformly sub-sampled stream is larger than $\sqrt{mp + \sqrt{mp(1-p)\log\delta_2^{-1}}}$ with probability $\ge \delta_2$. In contrast, in our sketch every row processes a sub-stream with an L2 of $O\left(\sqrt{F_2\,p}\right)$ (i.e., $O(\sqrt{mp})$ for this stream) with a constant probability, independently from the other rows. We now show that in some cases (when the desired error probability is small), the dependence between the rows in the case of uniform samples requires asymptotically more space than our sketch, for the same error guarantee. Therefore, we claim that our sketch has clear advantages over uniform sampling.

Theorem 3.4.13. Let $\overline{S}$ be a substream of S such that each packet in S appears in $\overline{S}$ independently with probability p. There exists a stream S such that Count Sketch with $d = \Theta(\log\delta^{-1})$ rows applied on $\overline{S}$ requires $w = \Omega\left(\epsilon^{-2}p^{-1} + \epsilon^{-2}p^{-1.5}m^{-0.5}\sqrt{\log\delta^{-1}}\right)$ counters per row to provide (with probability $1 - \delta$) an $\epsilon L_2$ error for S.

Proof. We set $\delta_1 = \delta_2 = \sqrt{\delta}$ (and thus $\log\delta_1^{-1}, \log\delta_2^{-1} = \Theta\left(\log\delta^{-1}\right)$). According to Lemma 3.4.8, there exists $c = \Theta(1)$ such that:
\[
\Pr\left[|\hat{f}_x - f_x| \ge c\cdot L_2^{\overline{S}}/\sqrt{w}\right] \ge \delta_1.
\]
Next, we use Lemma 3.4.12 to obtain
\[
\Pr\left[L_2^{\overline{S}} \ge \sqrt{mp + \sqrt{mp(1-p)\log\delta_2^{-1}}}\right] \ge \delta_2.
\]

Since the Count Sketch uses randomization that is independent from the stream sampling, we have that
\[
\Pr\left[\left(L_2^{\overline{S}} \ge \sqrt{mp + \sqrt{mp(1-p)\log\delta_2^{-1}}}\right) \wedge \left(|\hat{f}_x - f_x| \ge c\cdot L_2^{\overline{S}}/\sqrt{w}\right)\right] \ge \delta_1\delta_2 = \delta. \qquad (3.4.12)
\]
Thus, with probability of at least δ, the error of the Count Sketch is
\[
\Omega\left(\sqrt{\frac{mp + \sqrt{mp(1-p)\log\delta^{-1}}}{w}}\right).
\]

Next, recall that to estimate the frequencies in the original stream S, one needs to divide the Count Sketch estimate by p. Thus, if we denote the resulting estimation error by $E_x$, we have that
\[
\Pr\left[E_x = \Omega\left(p^{-1}\sqrt{\frac{mp + \sqrt{mp(1-p)\log\delta^{-1}}}{w}}\right)\right] \ge \delta.
\]
To provide an $\epsilon L_2 = \epsilon\sqrt{m}$ guarantee, the uniform-sampling Count Sketch needs to set w such that $\Pr\left[E_x \ge \epsilon\sqrt{m}\right] \le \delta$. Demanding
\[
p^{-1}\sqrt{\frac{mp + \sqrt{mp(1-p)\log\delta^{-1}}}{w}} = \epsilon\sqrt{m},
\]
the bound follows.

We therefore conclude that while our sketch requires $O\left(\epsilon^{-2}p^{-1}\log\delta^{-1}\right)$ counters overall, inserting a uniform sample into Count Sketch, for the same sampling probability p and error guarantee, requires at least
\[
\Omega\left(\epsilon^{-2}p^{-1}\log\delta^{-1} + \epsilon^{-2}p^{-1.5}m^{-0.5}\log^{1.5}\delta^{-1}\right)
\]
counters.

3.5 Implementation

We have built a prototype of NitroSketch in C and integrated it with Open vSwitch (OVS-DPDK) and FD.io/Vector Packet Processing (VPP). We implement four sketches with NitroSketch: UnivMon [26], Count-Min Sketch [4], Count Sketch [6], and K-ary Sketch [8]. In our implementation, we focus on a single-thread measurement daemon for both OVS and VPP. We use the xxHash library's [109] hash function. When hashing the same flow key with different hash seeds, we utilize the Intel AVX2 instruction set [110] to parallelize the hash operations.
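For illustration, a scalar version of the multi-seed hashing (without the AVX2 parallelization used in the prototype) could use xxHash's XXH32() as follows; D and the seed values are arbitrary placeholders.

#include <stddef.h>
#include <stdint.h>
#include "xxhash.h"

#define D 5   /* number of sketch rows, chosen for illustration */

/* Hash one flow key with D different seeds, one 32-bit hash per row. */
static void hash_flow_key(const void *key, size_t key_len,
                          const uint32_t seeds[D], uint32_t out[D]) {
    for (int i = 0; i < D; i++)
        out[i] = XXH32(key, key_len, seeds[i]);
}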

At a high level, NitroSketch includes two modules: a data-plane Sketching module, and a control-plane Estimation module. The Sketching module maintains the sketch data structure and the Estimation module fetches the data from the Sketching module. In the next section, we describe the two different versions of the Sketching module.

3.5.1 Data Plane Module

OVS-DPDK Integration. Since OVS-DPDK performs the packet processing entirely in user space, the user-space vswitchd thread has a three-tier look-up cache hierarchy. The first-level table works as an Exact Match Cache (EMC) with the fastest look-up speed. If a packet misses in the EMC, it goes through the second-level classifier, which performs a Tuple Space Search, and it may finally trigger the third-level table managed by an OpenFlow-compliant controller. For more details about the architecture of OVS vswitchd, please refer to the paper [84]. For efficiency, we integrate the sketching module with OVS-DPDK's EMC module in dpif-netdev. We provide implementations for varying performance requirements:

• All-in-one version. In this version, the Sketching module works as a sub-module of the EMC module inside an OVS vswitchd/PMD thread. That is, for each packet batch received from the DPDK PMD, NitroSketch decides which packets are replicated and measured without affecting the packet batch, as described in Section 3.3.2. This extension incurs a small processing overhead in the EMC module, but a single dedicated CPU core handles all tasks, such as DPDK I/O, table look-up, and measurement.

• Separate-thread version. In this version, the Sketching module works as a separate thread alongside the OVS vswitchd thread. When a packet batch arrives, the extended EMC module (pre-processing stage) in vswitchd decides which packets' headers to add into a fast lock-free concurrent FIFO queue (modified from [111]). A separate NitroSketch thread (sketch-updating stage) fetches the packets' header fields concurrently and handles the sketching updates. This implementation has minimal overhead for the vanilla OVS-DPDK but requires an additional core for the measurement tasks. (A minimal queue in this spirit is sketched after this list.)
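A minimal single-producer/single-consumer ring in the spirit of that hand-off, written in C11 atomics, is sketched below; it is not the queue from [111], and the capacity, field names, and the behavior when full are assumptions for illustration.

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 4096u                 /* power of two, assumed capacity */

struct pkt_hdr {                        /* header fields NitroSketch needs */
    uint32_t src_ip;
    uint32_t pkt_len;
};

struct spsc_ring {
    struct pkt_hdr slots[RING_SIZE];
    _Atomic size_t head;                /* advanced by the consumer thread */
    _Atomic size_t tail;                /* advanced by the producer thread */
};

/* Producer (vswitchd, pre-processing stage): enqueue one sampled header. */
static bool ring_push(struct spsc_ring *r, const struct pkt_hdr *h) {
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_SIZE)
        return false;                   /* full: caller decides what to do */
    r->slots[tail & (RING_SIZE - 1)] = *h;
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer (NitroSketch thread, sketch-updating stage): dequeue one header. */
static bool ring_pop(struct spsc_ring *r, struct pkt_hdr *out) {
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail)
        return false;                   /* empty */
    *out = r->slots[head & (RING_SIZE - 1)];
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}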

VPP Integration. VPP is a modular, flexible, and extensible platform that runs entirely in user space. VPP is based on a "packet processing graph", where each node is a module and packets are processed node by node. For instance, in a simple VPP-based L3 vSwitch, VPP first fetches packets from the network I/O as a batch. VPP then sends the packet batch to the Ethernet-input module (L2), and then through the IP4-input and IP4-lookup modules (L3). We implemented a measurement module in VPP 18.02 and added it to the packet processing graph after the VPP IP stack. This module runs both stages of NitroSketch in a dedicated thread, minimizing the impact on other VPP plugins.

3.5.2 Control Plane Module

The control plane module (i) periodically (at the end of each epoch) receives sketching data from the data plane module by a simple RPC protocol through a 1GbE link connected to the virtual switch; (ii) assigns the sketching data to the corresponding measurement tasks based on user definitions; (iii) calculates the estimated results. For instance, it processes the UnivMon [26] sketching data and calculates HH, Change detection, or traffic Entropy.

3.6 Evaluation

Our evaluation demonstrates that NitroSketch: (a) can meet 10GbE line-rate with min-sized packets, and match 40GbE on real workloads with a single core; (b) runs on software switches with small CPU overheads; (c) provides accurate results once converged; (d) has significantly higher throughput (> 7.6× faster) and better accuracy once converged when compared to SketchVisor [94], and (e) is more accurate and requires less memory than NetFlow [2] and sFlow [3].

Figure 3.4: Overview of the evaluation testbed.

3.6.1 Methodology

Testbed. We evaluate NitroSketch on a set of 4 commodity servers running Ubuntu 16.04.03, each of which has an Intel Xeon E5-2620 v4 CPU @ 2.10GHz, 128GB of DDR4 2400MHz memory, two Broadcom BCM5720 1GbE NICs, and an Intel XL710 Ethernet NIC with two 40-Gigabit ports. Our testbed has three hosts as the data plane (Figure 3.4), which send traffic directly through 10/40G links. The control plane is connected through a 1GbE link. Each virtual switch is configured with two forwarding rules for bidirectional packet forwarding.

Workloads. We use four types of workloads: (a) CAIDA: 10 one-hour public CAIDA traces from 2015 [58] and 2016 [95], each containing 1 to 1.9 billion packets; (b) Min-sized: simulated traffic with minimal-sized packets for stress testing; (c) Data center: data center traces UNI1 and UNI2 from [97]; (d) Cyber attack: DDoS attack traces from [96]. The average packet sizes in the CAIDA, DDoS attack, and data center traces are 714, 272, and 747 bytes, respectively.

For optimal switching performance, we modify the MAC addresses of packets to avoid cache misses on the Exact-Match Cache of OVS-DPDK. We use the MoonGen [104] packet sender/generator to replay traces and to generate valid random 64B packets.

Sketches and metrics. We evaluate NitroSketch with four existing sketches: Count-Min Sketch [4], Count-Sketch [6], UnivMon [26], and K-ary Sketch [8].

We use source IP as the flow key.

We consider the following performance metrics:

• Throughput: the traffic volume processed per second as Gigabits per second (Gbps).

• Packet Rate: the number of packets transmitted per second, in million packets per second (Mpps). For 64B packets, 10Gbps throughput is equivalent to 14.88Mpps, and 40Gbps equals 59.53Mpps.

• CPU Utilization: the percentage of the total CPU time spent on each module/function, measured by Intel VTune Amplifier 2018 [103].

• Accuracy: the accuracy of three measurement tasks: Heavy Hitter (HH), Change Detection (Change), and Entropy Estimation (Entropy). For HH and Change, we set a threshold of 0.01% and estimate the relative errors on the detected flows. We report relative error = |t − t_real| / t_real, where t_real is the ground truth of a task and t is the measured value. For each data point, we run 10 times independently and report the median and the standard deviation. Also, the recall rate is defined as the ratio of true instances found.

Parameters. By default, we set a 95% precision guarantee. Note that this is a theoretical guarantee and NitroSketch achieves higher fidelity in practice.

Figure 3.5: Throughput/Packet rate on OVS-DPDK with the all-in-one version using CAIDA and data center traces. (a) 10GbE with CAIDA traces; (b) 40GbE with CAIDA traces; (c) 40GbE with data center traces. Each panel compares the original sketches with NitroSketch w/OVS for UnivMon, CM, CS, and K-ary.

For throughput evaluation, we set a p = 0.01 geometric sampling rate for NitroSketch and allocate the memory based on the precision guarantee. We evaluate four sketches in NitroSketch. (a) UnivMon: we allocate 2MB, 1MB, 500KB, and 250KB for the first HH sketches, and 125KB for the rest of the sketches. (b) Count-Min: we use 20KB of memory for 5 rows of 1000 counters. (c) Count Sketch: we allocate 2MB for 5 rows of 102400 counters. (d) K-ary Sketch: we utilize 10 rows of 51200 counters.

3.6.2 Throughput

Throughput with all-in-one. We evaluate the throughput of the all-in-one version in Figure 3.5 with 1h CAIDA traces and 1h data center traces. All original sketches implemented with OVS-DPDK suffer from significant throughput degradation. Among the four sketches, UnivMon only achieves 2.9Gbps and the faster Count-Min only reaches 5.5Gbps. After plugging in NitroSketch, all sketches achieve 10G and 40G line rates under CAIDA and data center traces, without adding an extra thread. We observe that inside this vswitchd thread, the DPDK, OVS, and NitroSketch modules "squeeze" all the potential of a single core.

Throughput with separate-thread. Figure 3.6 shows the throughput of the separate-thread version. It is already difficult for virtual switches to achieve 10G line-rate on a single core with 64B packets. For 40G, even vanilla DPDK does not reach the line rate with 64B packets due to the hardware limitation of the Intel XL710 NIC [102]. This means that OVS-DPDK and VPP cannot reach this line rate under 64B packet traces.

Figure 3.6: Throughput/Packet rate on OVS-DPDK and VPP with the separate-thread version using 64B packets and data center traces. (a) 10GbE with 64B packets; (b) 40GbE with 64B packets; (c) 40GbE with data center packets. In (a), virtual switches use one CPU core to switch packets, while in (b) and (c) there are two cores.

Figure 3.7: Throughput over time for the delayed sampling approach (DS-NitroSketch) with two different sketches, Count Sketch and UnivMon. (Setting: 40GbE with CAIDA traces.)

In Figure 3.6(a), we can see that NitroSketch has a negligible throughput impact on the performance of virtual switches. That is, it achieves 10G line rate under any packet workload. As is evident from Figure 3.6(b) and (c), NitroSketch is not the bottleneck in achieving 40G line rates for 64B packets and for data center workloads.

Figure 3.8: (a) Throughput vs. memory usage for error targets of 3% and 5% (setting: 40GbE OVS-DPDK). (b) Throughput/packet rate with different NitroSketch speedup components applied: 0: vanilla UnivMon; 1: add AVX2-parallelized hashing; 2: apply also NitroSketch; 3: add pre-batched geometric samples; 4: apply also sampling for heap updates. (Setting: one vswitchd thread with a 40GbE NIC.)

Throughput with DS-NitroSketch. To evaluate the convergence time, we implement DS-NitroSketch with Count-Sketch and UnivMon in OVS-DPDK with the all-in-one version. In Figure 3.7, we report the measured throughput every 0.1sec (extra measurement overhead added) under 40GbE. We see that it needs about 0.6s for Count-Sketch and 0.8s for UnivMon to reach full throughput.

Throughput vs. Memory. To guarantee an error budget ϵ (for any distri- bution), the sampling probability p in the pre-processing stage depends on the amount of allocated memory. To illustrate this trade-off, we set error guarantees 3% and 5% for UnivMon with NitroSketch. Figure 3.8(a) shows that NitroSketch copes with 40G OVS-DPDK with an acceptable increase in memory.

Improvement breakdown. While implementing NitroSketch, we used mul- tiple optimization techniques. Therefore, we evaluate the gains of each opti- mization separately for UnivMon with NitroSketch. Figure 3.8(b) confirms

that the NitroSketch technique offers the most significant speedup.

Figure 3.9: CPU usage of the all-in-one version (NitroSketch-AIO) and the separate-thread version (NitroSketch-ST): (a) CPU usage on a 10G NIC; (b) CPU usage on a 40G NIC.

3.6.3 CPU Utilization

A single DPDK PMD thread continuously polls packets from the NIC. It "saturates" a core and utilizes 100% of the CPU as reported by a general-purpose process viewer (e.g., htop). Therefore, we profile the CPU time of each module. CPU Time in all-in-one. We measure the CPU time in the same setting as in Figure 3.5. As shown in Figure 3.9(a), when vanilla sketches are running, the majority of the CPU time is spent on sketching, and the overall switching performance drops. After applying NitroSketch-AIO, the switch achieves line rate while keeping the CPU time of NitroSketch-AIO below 20%. CPU Time in separate-thread. Figure 3.9(b) compares the CPU time between OVS-DPDK and NitroSketch-ST (separate thread), in the same setting as in Figure 3.6(b).

When the switch is saturated with minimum-sized packets (∼22 Mpps), the cores for packet switching run at nearly 100%, while NitroSketch is not running at full speed and could handle higher packet rates if the virtual switch could deliver them.

Figure 3.10: (a), (b) Error rates of NitroSketch: (a) with UnivMon (HH, Change, Entropy) and (b) with individual sketches (HH with Count-Min and Count Sketch, Change with k-ary), as a function of epoch size and for error targets of 1%, 3%, and 5%. (c) Guaranteed convergence time (in packets) vs. sampling rate on CAIDA traces.

3.6.4 Accuracy and Convergence Time

We evaluate the accuracy of HH, Change, and Entropy in NitroSketch with dif- ferent sized epochs and report in Figure 3.10(a) and (b). NitroSketch achieves better-than-guaranteed results (< 5% error) after seeing 2-3M packets.

Since NitroSketch uses geometric sampling to select packets, it requires a convergence time before it produces a result with guaranteed accuracy (analyzed in section 3.4). For different error targets on CAIDA traces, we study the trade-off between the geometric sampling rate p and the convergence time (in terms of the number of packets) and report it in Figure 3.10(c). Further, NitroSketch is expected to converge faster on data center traces because of the larger L2 norm they are expected to establish.

3.6.5 Comparison with Other Solutions

Comparison with SketchVisor. Since the source code of SketchVisor [94] on

Open vSwitch is not publicly available, we implement its fast-path algorithm in C and carefully integrate it with UnivMon on OVS-DPDK, using the same FIFO buffer as NitroSketch [111]. The performance of SketchVisor depends on how traffic is distributed between the fast and the normal paths, which is unknown.

Thus we evaluate the throughput based on in-memory testing, manually injecting 20%, 50%, and 100% of the traffic into the fast path. The CAIDA traces are entirely loaded into DRAM using libpcap [112] to eliminate the packet I/O between the NIC and the software switches. We allocate memory for SketchVisor and NitroSketch to detect the top 100 HHs: we use 900 counters for the fast path and set a 5% error guarantee on UnivMon.

Figure 3.11: (a) In-memory packet rates with CAIDA traces: SketchVisor vs. NitroSketch. (b) Memory consumption on virtual switches: sFlow/NetFlow vs. NitroSketch.

As reported in Figure 3.11(a), the throughput of SketchVisor improves when the percentage of traffic handled by the fast path increases. When the fast path processes 20% of the traffic, it achieves 2.12 Mpps. SketchVisor achieves its maximum packet rate of 6.11 Mpps when 100% of the traffic goes into the fast path. Meanwhile, NitroSketch runs at a dramatically faster 53 Mpps. This is consistent with SketchVisor using 100% CPU (not shown in the figure) while NitroSketch requires less than 50%

(shown in Figure 3.9(b)) when running in a separate thread on OVS-DPDK.

We observe that to cope with the full 10G speed and avoid packet drops, the fast path has to handle 100% of the packets. For a fair comparison on OVS, we prevent packet drops by using a very large buffer. We manually redirect 20%,

50%, and 100% of the packets to the fast-path. Figure 3.12(a), (b) and (c) report

relative errors on HH in the three traces.

Figure 3.12: HH errors of SketchVisor and NitroSketch in CAIDA, DDoS, and data center traces.

Figure 3.13: HH recall rates of NetFlow/sFlow with different sampling rates and of NitroSketch with a 0.01 sampling probability, using CAIDA, DDoS, and data center traces.

We can see that NitroSketch has larger errors before convergence (< 3.61M packets) but is more accurate than

SketchVisor after convergence. In a 10G OVS-DPDK switch, this stabilization time can be as little as 0.24 seconds. Here, SketchVisor has degraded accuracy in the CAIDA and DDoS trace in Figure 3.12(a) and (b) and relatively good accuracy in the data center trace [97]. In contrast, NitroSketch achieves good accuracy on all traces.

Comparison with NetFlow/sFlow. On OVS-DPDK and VPP, NetFlow and sFlow are the default monitoring tools. We configure OVS-DPDK to enable sFlow and VPP to enable NetFlow. We set a polling interval of 10 seconds with sampling rates of 0.01, 0.02, and 0.1 for NetFlow. For fairness, we configure NitroSketch with a sampling probability of 0.01. On the controller, we collect the

sampled packets/reports with Wireshark [113] directly from the port. Figure 3.11(b) indicates that NetFlow consumes much more memory even with a 0.01 sampling rate. For NetFlow (Figure 3.13), we observe that the recall rates of the top 100 HHs are low on the CAIDA and DDoS traces and relatively good on the UNI2 data center trace [97]. This is because UNI2 is quite skewed while CAIDA and DDoS are heavy-tailed. In contrast, NitroSketch achieves high recall rates in all cases.

3.7 Chapter Summary

Sketching is an attractive theoretical construction for monitoring in software switches. However, its current performance on software switches is far from what is needed to serve as a viable line-rate, low-CPU-consumption option. By identifying the key bottlenecks and reformulating the requirements for software sketch implementations, we develop NitroSketch and integrate it with two popular software switches. Our results show that NitroSketch can serve as the basis for fast and efficient realizations of many popular sketches on software switches.

Chapter 4

ASAP: Fast, Approximate Graph Pattern Mining at Scale

The recent past has seen a resurgence in storing and processing massive amounts of graph-structured data [114, 115]. Algorithms for graph process- ing can broadly be classified into two categories. The first, graph analysis algorithms, compute properties of a graph typically using neighborhood in- formation. Examples of such algorithms include PageRank [116], community detection [117] and label propagation [118]. The second, graph pattern min- ing algorithms, discover structural patterns in a graph. Examples of graph pattern mining algorithms include motif finding [119], frequent sub-graph mining (FSM) [120] and clique mining [121]. Graph mining algorithms are used in applications like detecting similarity between graphlets [122] in so- cial networking and for counting pattern frequencies to do credit card fraud detection.

Today, a deluge of graph processing frameworks exist, both in academia and open-source [15, 16, 17, 18, 19, 20, 21, 22, 23, 123, 124, 125, 126, 127, 25].

These frameworks typically provide high-level abstractions that make it easy for developers to implement many graph algorithms. A vast majority of the existing graph processing frameworks, however, have focused on graph analysis algorithms. These frameworks are fast and can scale out to handle very large graph analysis settings: for instance, GraM [128] can run one iteration of PageRank on a trillion-edge graph in 140 seconds on a cluster. In contrast, systems that support graph pattern mining fail to scale to even moderately sized graphs, and are slow, taking several hours to mine simple patterns [13, 129].

The main reason for the lack of scalability in pattern mining is the underlying complexity of these algorithms: mining patterns requires complex computations and storing exponentially large intermediate candidate sets. For example, a graph with a million vertices may possibly contain $10^{17}$ triangles. While distributed graph-processing solutions are good candidates for processing such massive intermediate data, the need to do expensive joins to create candidates severely degrades performance. To overcome this, Arabesque [13] proposes new abstractions for graph mining in distributed settings that can significantly optimize how intermediate candidates are stored. However, even with these methods, Arabesque takes over 10 hours to count motifs in a graph with less than 1 billion edges.

In this chapter, we present ASAP1, a system that enables both fast and scalable pattern mining. ASAP is motivated by one key observation: in many

1 For A Swift Approximate Pattern-miner.

114 pattern mining tasks, it is often not necessary to output the exact answer. For in- stance, in FSM the task is to find the frequency of subgraphs with an end-goal of ordering them by occurrences. Similarly, motif counting determines the number of occurrences of a given motif. In these scenarios, it is sufficient to provide an almost correct answer. Indeed, our conversations with a so- cial network firm revealed that their application for social graph similarity uses a count of similar graphlets [122]. Another company’s fraud detection system similarly counts the frequency of pattern occurrences. In both cases, an approximate count is good enough. Furthermore, it is not necessary to materialize all occurrences of a pattern2. Based on these use cases, we build a system for approximate graph pattern mining.

Approximate analytics is an area that has gathered attention in big data analytics [130, 131, 132], where the goal is to let the user trade-off accuracy for much faster results. The basic idea in approximation systems is to execute the exact algorithm on a small portion of the data, referred to as samples, and then rely on the statistical properties of these samples to compose partial results and/or error characteristics. The fundamental assumption underlying these systems is that there exists a relationship between the input size and the accuracy of the results which can be inferred. However, this assumption falls apart when applied to graph pattern mining. In particular, running the exact algorithm on a sampled graph may not result in a reduction of runtime or good estimation of error (section 4.1.1).

Instead, in ASAP, we leverage graph approximation theory, which has a

2In fact, it may even be infeasible to output all embeddings of a pattern in a large graph.

rich history of proposing approximation algorithms for mining specific patterns such as triangles. ASAP exploits a key idea that approximate pattern mining can be viewed as equivalent to probabilistically sampling random instances of the pattern. Using this as a foundation, ASAP extends the state-of-the-art probabilistic approximation techniques to general patterns in a distributed setting. This lets ASAP massively parallelize sampling instances and provides a drastic reduction in run-times while sacrificing a small amount of accuracy. ASAP captures this technique in a simple API that allows users to plug in code to detect a single instance of the pattern and then automatically orchestrates computation while adjusting the error bounds based on the parallelism.

Further, ASAP makes pattern mining practical by supporting predicate matching and introducing caching techniques. In particular, ASAP allows mining for patterns where edges in the pattern satisfy a user-specified property.

To further reduce the computation time, ASAP leverages the fact that in several mining tasks, such as motif finding, it is possible to cache partial patterns that are building blocks for many other patterns. Finally, an important problem in any approximation system is allowing users to navigate the tradeoff between the result accuracy and latency. For this, ASAP presents a novel approach to build the Error-Latency Profile (ELP) for graph mining: it uses a small sample of the graph to obtain necessary information and applies Chernoff bound analysis to estimate the worst-case error profile for the original graph.

These techniques allow ASAP to outperform Arabesque [13], a state-of-the-art exact pattern mining solution, by up to 77× on the LiveJournal graph

while incurring less than 5% error. In addition, ASAP can scale to graphs with billions of edges: for instance, ASAP can count all the 6 patterns in 4-motifs on the Twitter graph (1.5B edges) and the UK graph (3.7B edges) in 22 and 47 minutes, respectively, on a 16-machine cluster.

We make the following contributions in this chapter:

• We present ASAP, the first system to our knowledge, that does fast, scalable approximate graph pattern mining on large graphs. (section 4.2)

• We develop a general API that allows users to mine any graph pattern

and present techniques to automatically distribute executions on a cluster. (section 4.3)

• We propose techniques that quickly infer the relationship between approximation error and latency, and show that it is accurate across many

real-world graphs. (section 4.4)

• We show that ASAP handles graphs with billions of edges, a scale that existing systems failed to reach. (section 4.5)

4.1 Background & Motivation

In this section, we describe recent advancements in graph pattern mining theory that we leverage.

Figure 4.1: Simply extending approximate processing techniques to graph pattern mining does not work. (a) Uniform edge sampling (p = 0.5) followed by triangle counting on an example graph with 10 triangles; (b) 3-chains in the Twitter graph; (c) triangles in the UK graph. Panels (b) and (c) plot error (%) and speedup against the percentage of edges dropped.

4.1.1 Approximate Pattern Mining

Approximate processing is an approach that has been used with tremendous success in solving similar problems in both big data analytics [130, 131] and databases [133, 134, 135], and thus it is natural to explore similar techniques for graph pattern mining. However, simply extending existing approaches to graphs is insufficient.

The common underlying idea in approximate processing systems is to sample the input that a query or an algorithm works on. Several techniques for sampling the input exist; for instance, BlinkDB [130] leverages stratified sampling. To estimate the error, approximation systems rely on the assumption that the sample size relates to the error in the output (e.g., if we sample K items from the original input, then the error in aggregate queries, such as SUM, is inversely proportional to $\sqrt{K}$). It is straightforward to envision extending this approach to graph pattern mining: given a graph and a pattern to mine in the graph, we first sample the graph, and then run the pattern mining algorithm on the sampled graph.

Figure 4.1(a) depicts the idea as applied to triangle counting. In this example, the input graph consists of 10 triangles. Using uniform sampling

on the edges, we obtain a graph with 50% of the edges. We can then apply triangle counting on this sample to get an answer of 1. To scale this number to the actual graph, we can use several ways; one naive way is to double it, since we reduced the input by half. To verify the validity of the approach, we evaluated it on the Twitter graph [136] for finding 3-chains and on the UK web graph [137] for triangle counting. The relation between the sample size, the error, and the speedup compared to running on the original graph ($T_{orig}/T_{sample}$) is shown in Figures 4.1(b) and 4.1(c), respectively.

These results show the fundamental limitations of the approach. We see that there is no relation between the size of the graph (sample) and the error or the speedup. Even very small samples do not provide noticeable speedups, and conversely, even very large samples end up with significant errors. We conclude that the existing approximation approach of running the exact algorithm on one or more samples of the input is incompatible with graph pattern mining. Thus, in this chapter, we propose a new approach.

4.1.2 Graph Pattern Mining Theory

The graph theory community has spent significant effort studying various approximation techniques for specific patterns. The key idea in these approaches is to model the edges in the graph as a stream and sample instances of a pattern from the edge stream. Then the probability of sampling is used to bound the number of occurrences of the pattern. There has been a large body of theoretical work on various algorithms to sample specific patterns and analyses to prove their bounds [138, 139, 140, 141, 142, 143, 144].

While the intuition of using such sampling to approximate pattern counts is straightforward, the technical details and the analysis are quite subtle. Since sampling once results in a large variance in the estimate, multiple rounds are required to bound the variance. Consider triangle counting as an example. Naively, one would design a technique that uniformly samples three edges from the graph without replacement. Since the probability of sampling one edge is 1/m in a graph of m edges, the probability of sampling three edges is $1/m^3$. If the sampled three edges form a triangle, we estimate the number of triangles to be $m^3$ (the expectation); otherwise, the estimate is 0. While such a sampling technique is unbiased, since m is large in practice, the probability that the sampling would find a triangle is very low and the variance of the result is very large. Obtaining an approximate count with high accuracy would require a large number of trials, which not only consumes time but also memory.

Neighborhood sampling [142] is a recently proposed approach that provides a solution to this problem in the context of a specific graph pattern, triangle counting. The basic idea is to sample one edge and then gradually add more edges until the edges form a triangle or it becomes impossible to form the pattern. This can be analyzed by Bayesian probability [142]. Let’s denote

$E$ as the event that a pattern is formed and $E_1, E_2, \ldots, E_k$ as the events that edges $e_1, e_2, \ldots, e_k$ are sampled and stored. Then the probability that a pattern is actually sampled can be calculated as $\Pr(E) = \Pr(E_1 \cap E_2 \cap \cdots \cap E_k) = \Pr(E_1) \times \Pr(E_2 \mid E_1) \times \cdots \times \Pr(E_k \mid E_1, \ldots, E_{k-1})$. Intuitively, compared to the naive sampling, neighborhood sampling increases the probability that each

trial would find an instance of the given pattern, and thus requires fewer estimations to achieve the same accuracy.

Figure 4.2: Triangle counting by neighborhood sampling (r = 4 estimators) on a five-node example graph with edge stream (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4).

4.1.2.1 Example: Triangle Counting

To illustrate neighborhood sampling, we will revisit the triangle counting example discussed earlier. To sample a triangle from a graph with m edges, we need three edges:

• First edge $l_0$. Uniformly sample one edge from the graph as $l_0$. The sampling probability is $\Pr(l_0) = 1/m$.

• Second edge $l_1$. Given that $l_0$ is already sampled, we uniformly sample one of $l_0$'s adjacent edges (neighbors) from the graph, which we call $l_1$. Note that neighborhood sampling depends on the ordering of edges in the stream, and $l_1$ appears after $l_0$ here. The sampling probability is $\Pr(l_1 \mid l_0) = 1/c$, where $c$ is the number of $l_0$'s neighbors appearing after $l_0$.

• Third edge $l_2$. To finish, find $l_2$ such that the edges $l_0$, $l_1$, $l_2$ form a triangle and $l_2$ appears after $l_1$ in the stream. If such a triangle is sampled, the sampling probability is $\Pr(l_0 \cap l_1 \cap l_2) = \Pr(l_0) \times \Pr(l_1 \mid l_0) \times \Pr(l_2 \mid l_0, l_1) = 1/(mc)$.

The above technique describes the behavior of one sampling trial. For each trial, if it successfully samples a triangle, then, converting probabilities to an expectation, $e_i = mc$ is the estimate of the number of triangles in the graph. For a total of $r$ trials, $\frac{1}{r}\sum_{i=1}^{r} e_i$ is output as the approximate result. Figure 4.2 presents an example of a graph with five nodes.
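To make one trial concrete, here is a minimal single-pass Python sketch of a neighborhood-sampling estimator for triangles, following the three steps above (the variable names l0, l1, c mirror the description; this is an illustrative reading of the algorithm rather than ASAP's implementation):

    import random

    def triangle_estimate(edges):
        # One neighborhood-sampling trial over a single pass of the edge stream.
        # edges: iterable of (u, v) tuples; returns m * c if a triangle was
        # sampled and 0 otherwise, as described above.
        m = 0
        l0 = None          # first edge, reservoir-sampled uniformly from the stream
        l1 = None          # second edge, sampled among l0's later neighbors
        c = 0              # number of l0-adjacent edges seen after l0
        closed = False     # True once an edge closing the (l0, l1) wedge is seen
        for e in edges:
            m += 1
            u, v = e
            if l0 is not None and l1 is not None and not closed:
                wedge = set(l0) | set(l1)
                if len(wedge) == 3 and set(e) == wedge - (set(l0) & set(l1)):
                    closed = True
            if random.random() < 1.0 / m:
                l0, l1, c, closed = e, None, 0, False     # restart with a new l0
            elif l0 is not None and (u in l0 or v in l0):
                c += 1
                if random.random() < 1.0 / c:             # reservoir over l0's neighbors
                    l1, closed = e, False
        return m * c if closed else 0

    def approx_triangles(edges, r):
        # Average r independent trials (Figure 4.2 uses r = 4 for illustration).
        return sum(triangle_estimate(edges) for _ in range(r)) / r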

4.1.3 Challenges

While the neighborhood sampling algorithm described above has good theo- retical properties, there are a number of challenges in building a general sys- tem for large-scale approximate graph mining. First, neighborhood sampling was proposed in the context of a specific graph pattern (triangle counting).

Therefore, to be of practical use, ASAP needs to generalize neighborhood sampling to other patterns. Second, neighborhood sampling and its anal- ysis assume that the graph is stored in a single machine. ASAP focuses on large-scale, distributed graph processing, and for this it needs to extend neighborhood sampling to computer clusters. Third, neighborhood sampling assumes homogeneous vertices and edges. Real-world graphs are property graphs, and in practice pattern mining queries require predicate matching which needs the technique to be aware of vertex and edge types and properties. Finally, as in any approximate processing system, ASAP needs to allow the end user to trade-off accuracy for latency and hence needs to understand the relation between run-time and error in a distributed setting.

4.2 ASAP Overview

In this work, we design ASAP, a system that facilitates fast and scalable approximate pattern mining. Figure 4.3 shows the overall architecture of ASAP. We provide a brief overview of the different components, and how users leverage ASAP to do approximate pattern mining, in this section to aid the reader in the rest of this chapter.

Figure 4.3: ASAP architecture.

User interface. ASAP allows users to trade off accuracy for result latency. Specifically, a user can perform pattern mining tasks using the following two modes:

• Time budget. The user specifies a time budget T, with an error rate guarantee and confidence level, and ASAP returns the most accurate answer it can produce within the configurable confidence level (default of 95%).

• Error budget. The user gives an error budget ϵ and confidence level, and ASAP returns an answer within ϵ in the shortest time possible.

Before running the algorithm, ASAP first returns to the user its estimates of the time or error bounds it can achieve (step 6 in Figure 4.3). After the user approves the estimates, the algorithm is run, and the result presented to the user consists of the count, the confidence level, and the actual run time (step 7). Users can also optionally ask to output the actual (potentially large number of) embeddings of the pattern found.

Development framework. All pattern mining programs in ASAP are versions of the generalized approximate pattern mining (step 2) we describe in detail in section 4.3. ASAP provides a standard library of implementations for several common patterns such as triangles, cliques, and chains. To allow developers to write programs to mine any pattern, ASAP further provides a simple API that lets them utilize our approximate mining technique (section 4.3.1.2). Using the API, developers simply need to write a program that finds a single instance of the pattern they are interested in, which we refer to as an estimator in the rest of this chapter. In a nutshell, our approximate mining approach depends on running multiple such estimators in parallel.

Error-Latency Profile (ELP). In order to run a user program, ASAP first must find out how many estimators it needs to run for the given bounds (step 3). To do this, ASAP builds an ELP. If an ELP is available for a graph, it simply queries the ELP to find the number of estimators (step 4). Otherwise, the system builds a new ELP (step 5) using a novel technique that is extremely fast and can be done online. We detail our ELP-building technique in section 4.4. Since this phase is fast, ASAP can also accommodate graph updates; on large changes, we simply rebuild the ELP.

124 System runtime. Once ASAP determines the number of estimators necessary to achieve the required error or time bounds, it executes the approximate mining program using a distributed runtime built on Apache Spark [145, 146].

4.3 Approximate Pattern Mining in ASAP

We now present how ASAP enables large-scale graph pattern mining using neighborhood sampling as a foundation. We first describe our programming abstraction (section 4.3.1) that generalizes neighborhood sampling. Then, we describe how ASAP handles errors that arise in distributed processing (section 4.3.2). Finally, we show how ASAP can handle queries with predicates on edges or vertices (section 4.3.3).

4.3.1 Extending to General Patterns

To extend the neighborhood sampling technique to general patterns, we lever- age one simple observation: at a high level, neighborhood sampling can be viewed as consisting of two phases, sampling phase and closing phase. In the sampling phase, we select an edge in one of two ways by treating the graph as an ordered stream of edges: (a) sample an edge randomly; (b) sample an edge that is adjacent to any previously sampled edges, from the remainder of the stream. In the closing phase, we wait for one or more specific edges to complete the pattern.

The probability of sampling a pattern can be computed from these two phases. The closing phase always has a probability of 1 or 0, depending on whether it finds the edges it is waiting for. The probability of the sampling

phase depends on how the initial pattern is formed and is a choice made by the developer. For a general graph pattern with multiple nodes, there can be multiple ways to form the pattern. For example, there are two ways to sample a four-clique, with different probabilities, as shown in Figure 4.4. (i) In the first case, the sampling phase finds three adjacent edges, and the closing phase waits for the remaining three edges in order to form the pattern. The sampling probability is $\frac{1}{m \cdot c_1 \cdot c_2}$, where $c_1$ is the number of the first edge's neighbors and $c_2$ represents the neighbor count of the first and the second edges. (ii) In the second case, the sampling phase finds two disjoint edges, and the closing phase waits for the other four edges to form the pattern. The sampling probability in this case is $\frac{1}{m^2}$.

Figure 4.4: Two ways to sample four-cliques. (a) Sample two adjacent edges (0, 1) and (0, 3), sample another adjacent edge (1, 2), and wait for the other three edges. (b) Sample two disjoint edges (0, 1) and (2, 3), and wait for the other four edges.

4.3.1.1 Analysis of General Patterns

We now show how neighborhood sampling, when captured using the two phases, can extend to general patterns.

Definition 4.3.1 (General Pattern). We define a “general pattern” as a set of k

126 connected vertices that form a subgraph in a given graph.

First, let us consider how an estimator can (possibly) find any general pattern. We show how to sample one general pattern from the graph uniformly with a certain success probability, taking 2- to 5-node patterns as examples. Then, we turn to the problem of maintaining $r \geq 1$ pattern(s) sampled with replacement from the graph. We sample r patterns, and a reasonably large r will yield a count estimate with good accuracy. For the convenience of the analysis, we define the following notation: the input graph G = (V, E) has m edges and n vertices, and we denote the number of occurrences of a given pattern in G by f(G). A pattern $p = \{e_i, e_j, \ldots\}$ contains a set of ordered edges, i.e., $e_i$ arrives before $e_j$ when $i < j$. When describing the operation of an estimator, $c(e)$ denotes the number of edges adjacent to e and appearing after e, and $c_i$ is $c(e_1, \ldots, e_i)$ for any $i \geq 1$. For a given pattern p∗ with k∗ vertices, the technique of neighborhood sampling produces p∗ with probability Pr[p = p∗, k = k∗]. The goal of one estimator is to fix all the vertices that form the pattern, and complete the pattern if possible.

Lemma 4.3.2. Let p∗ be a k-node pattern in the graph. The probability of detecting the pattern p = p∗ depends on k and the different ways to sample using neighborhood sampling technique.

(1) When k = 2, the probability that p = p∗ after processing all edges in the graph, over all possible neighborhood sampling ways, is

$$\Pr[p = p^*, k = 2] = \frac{1}{m}$$

(2) When k = 3, the probability that p = p∗ is

$$\Pr[p = p^*, k = 3] = \frac{1}{m \cdot c_1}$$

(3) When k = 4, the probability that p = p∗ is

$$\Pr[p = p^*, k = 4] = \frac{1}{m^2} \ \text{(Type-I)} \quad \text{or} \quad \frac{1}{m \cdot c_1 \cdot c_2} \ \text{(Type-II)}$$

(4) When k = 5, the probability that p = p∗ is

$$\Pr[p = p^*, k = 5] = \frac{1}{m^2 \cdot c_1} \ \text{(Type-I)} \quad \text{or} \quad \frac{1}{m^2 \cdot c_2} \ \text{(Type-II.a)} \quad \text{or} \quad \frac{1}{m \cdot c_1 \cdot c_2 \cdot c_3} \ \text{(Type-II.b)}$$

Proof. Since a pattern is connected, the sampling phase is able to reach all nodes in a sampled pattern. To fix such a pattern, the neighborhood sampling needs to confirm all the vertices that form the pattern. Once the vertices are found, the probability of completing such a pattern is fixed.

When k = 2, let $p^* = \{e_1\}$ be an edge in the graph. Let $E_1$ be the event that $e_1$ is found by neighborhood sampling. There is only one way to fix the two vertices of the pattern: uniformly sampling an edge from the graph. By reservoir sampling, we claim that

$$\Pr[p = p^*, k = 2] = \Pr[E_1] = \frac{1}{m}$$

When k = 3, we need to fix one more vertex beyond the case of k = 2. As shown in [142], we need to sample an edge $e_2$ from $e_1$'s neighbors that occur in the stream after $e_1$. Let $E_2$ be the event that $e_2$ is found. Since $\Pr[E_2 \mid E_1] = \frac{1}{c(e_1)}$,

$$\Pr[p = p^*, k = 3] = \Pr[E_1] \cdot \Pr[E_2 \mid E_1] = \frac{1}{m \cdot c(e_1)}$$

When k = 4, we require one more step from the case of k = 2 or the case of k = 3, by extending neighborhood sampling. Extending from the case of k = 2 (denoted as Type-I), two more vertices are needed to fix a 4-node pattern. In Type-I, we independently find another edge $e_2^*$ that is not adjacent to the sampled edge $e_1$. Let $E_2^*$ be the event that $e_2^*$ is found. Since $\Pr[E_2^* \mid E_1] = \frac{1}{m}$,

$$\Pr[p = p^*, k = 4] = \Pr[p = p^*, k = 2] \cdot \Pr[E_2^* \mid E_1] = \frac{1}{m^2} \ \text{(Type-I)}$$

When extending from the case k = 3 (denoted as Type-II), one more vertex is needed to fix a 4-node pattern. In Type-II, we sample a "neighbor" $e_3$ that comes after $e_1$ and $e_2$. Let $E_3$ be the event that $e_3$ is found. Since $e_3$ is sampled uniformly from the neighbors of $e_1$ and $e_2$ and appears after $e_1, e_2$, we have $\Pr[E_3 \mid E_1, E_2] = \frac{1}{c(e_1, e_2)}$. Thus,

$$\Pr[p = p^*, k = 4] = \Pr[p = p^*, k = 3] \cdot \Pr[E_3 \mid E_1, E_2] = \frac{1}{m \cdot c(e_1) \cdot c(e_1, e_2)} \ \text{(Type-II)}$$

When k = 5, we again need one more step from the case k = 3 or the case k = 4. By extending from k = 3 (denoted as Type-I), we require two separate vertices to fix a 5-node pattern. In Type-I, we independently sample another edge $e_3^*$ that is not adjacent to $e_1, e_2$. Let $E_3^*$ be the event that $e_3^*$ is found, with $\Pr[E_3^* \mid E_1, E_2] = \frac{1}{m}$. Therefore,

$$\Pr[p = p^*, k = 5] = \Pr[p = p^*, k = 3] \cdot \Pr[E_3^* \mid E_1, E_2] = \frac{1}{m^2 \cdot c(e_1)} \ \text{(Type-I)}$$

When extending from the case k = 4, we need to consider the two types separately. By extending Type-I of case k = 4 (denoted as Type-II.a), we need one more vertex to construct a 5-node pattern, and thus we sample a neighboring edge $e_4$. Let $E_4$ be the event that $e_4$ is found. Since $e_4$ is sampled from the neighbors of $e_1, e_2$,

$$\Pr[p = p^*, k = 5] = \Pr[p = p^*, k = 4] \cdot \Pr[E_4 \mid E_1, E_2^*] = \frac{1}{m^2 \cdot c(e_1, e_2)} \ \text{(Type-II.a)}$$

Similarly, by extending Type-II of case k = 4 (denoted as Type-II.b),

$$\Pr[p = p^*, k = 5] = \frac{1}{m \cdot c(e_1) \cdot c(e_1, e_2) \cdot c(e_1, e_2, e_3)} \ \text{(Type-II.b)}$$

Lemma 4.3.3. For a pattern p∗ with k∗ nodes, let us define

$$\tilde{t} = \begin{cases} \dfrac{1}{\Pr[p = p^*, k = k^*]} & \text{if } p \neq \emptyset \\ 0 & \text{if } p = \emptyset \end{cases}$$

Thus, $E[\tilde{t}] = f(G)$.

Proof. By Lemma 4.3.2, we know that one estimator samples a particular pattern p∗ with probability $\Pr[p = p^*, k = k^*]$. Let p(G) be the set of instances of a given pattern in the graph. Then

$$E[\tilde{t}] = \sum_{p^* \in p(G)} \tilde{t}(p \neq \emptyset) \cdot \Pr[p = p^*, k = k^*] = |p(G)| = f(G)$$

The estimated count is the average of the outputs of all estimators. Now, we consider how many estimators are needed to maintain an ϵ error guarantee.

Theorem 4.3.4. Let r ≥ 1, 0 < ϵ ≤ 1, and 0 < δ ≤ 1. There is an O(r)-space bounded algorithm that returns an ϵ-approximation to the count of a k-node pattern with probability at least 1 − δ. For a given ϵ, when k = 4, we need $r \geq \frac{C_1 m^2}{f(G)}$ Type-I estimators or $r \geq \frac{C_2 m \Delta^2}{f(G)}$ Type-II estimators, for some constants $C_1$ and $C_2$, to achieve an ϵ-approximation in the worst case. When k = 5, we need $r \geq \frac{C_3 m^2 \Delta}{f(G)}$ Type-I estimators, or $r \geq \frac{C_4 m^2 \Delta}{f(G)}$ Type-II.a estimators, or $r \geq \frac{C_5 m \Delta^3}{f(G)}$ Type-II.b estimators, for some constants $C_3$, $C_4$, $C_5$, in the worst case.

Proof. Let us first consider the case k = 4. Let $X_i$ for $i = 1, \ldots, r$ be the output value of the i-th estimator, and let $\bar{X} = \frac{1}{r}\sum_{i=1}^{r} X_i$ be the average of the r estimators. By Lemma 4.3.3, we know that $E[X_i] = f(G)$ and $E[\bar{X}] = f(G)$. From the properties of the graph G, we have $c(e) \leq \Delta$ for all $e \in E$, where ∆ is the maximum degree (note that in practice ∆ is not a tight bound for the edge neighbor information). In Type-I, $X_i \leq m^2$, and we construct random variables $Y_i = \frac{X_i}{m^2}$ such that $Y_i \in [0, 1]$. Let $Y = \sum_{i=1}^{r} Y_i$, so $E[Y] = \frac{f(G)\, r}{m^2}$. The probability that the estimated number of patterns exceeds its expectation f(G) by more than an ϵ relative error is

$$\Pr[\bar{X} > (1 + \epsilon) f(G)] = \Pr\left[\sum_{i=1}^{r} Y_i > (1 + \epsilon) E[Y]\right] \leq e^{-\frac{\epsilon^2 E[Y]}{2+\epsilon}} \leq e^{-\frac{\epsilon^2 E[Y]}{3}}$$

by the Chernoff bound. Requiring this to be at most $\frac{\delta}{2}$ gives $r \geq \frac{3 m^2}{\epsilon^2 f(G)} \ln\frac{2}{\delta}$. Similarly, this lower bound on r holds for $\Pr[\bar{X} < (1 - \epsilon) f(G)]$.

In Type-II, $X_i \leq 6 m \Delta^2$. Let $Y_i = \frac{X_i}{6 m \Delta^2}$ such that $Y_i \in [0, 1]$. Let $Y = \sum_{i=1}^{r} Y_i$, so $E[Y] = \frac{f(G)\, r}{6 m \Delta^2}$. By the Chernoff bound, $r \geq \frac{18 m \Delta^2}{\epsilon^2 f(G)} \ln\frac{2}{\delta}$. Similarly, when k = 5, we (theoretically) need $\frac{6 m^2 \Delta}{\epsilon^2 f(G)} \ln\frac{2}{\delta}$ Type-I estimators, $\frac{12 m^2 \Delta}{\epsilon^2 f(G)} \ln\frac{2}{\delta}$ Type-II.a estimators, and $\frac{24 m \Delta^3}{\epsilon^2 f(G)} \ln\frac{2}{\delta}$ Type-II.b estimators. Since each estimator stores O(1) edges, the total memory is O(r).

Table 4.1: ASAP's Approximate Pattern Mining API.
    SampleVertex: () → (v, p). Uniformly sample one vertex from the graph.
    SampleEdge: () → (e, p). Uniformly sample one edge from the graph.
    ConditionalSampleVertex: (subgraph) → (v, p). Uniformly sample a vertex that appears after a sampled subgraph.
    ConditionalSampleEdge: (subgraph) → (e, p). Uniformly sample an edge that is adjacent to the given subgraph and comes after the subgraph in the order.
    ConditionalClose: (subgraph, subgraph) → boolean. Given a sampled subgraph, check if another subgraph that appears later in the order can be formed.

4.3.1.2 Programming API

ASAP automates the process of computing the probability of finding a pattern, and derives an expectation from it by providing a simple API that captures two phases. The API, shown in Table 4.1, consists of the following five functions:

• SampleVertex uniformly samples one vertex from the graph. It takes no input and outputs v and p, where v is the sampled vertex and p is the probability with which v was sampled, which is the inverse of the number of vertices.

• SampleEdge uniformly samples one edge from the graph. It also takes no input, and outputs e and p, where e is the sampled edge, and p is the

sampling probability, which is the inverse of the number of edges of the graph.

• ConditionalSampleVertex conditionally samples one vertex from the graph,

given subgraph as input. It outputs v and p, where v is the sampled vertex and p is the probability to sample v given that subgraph is already sampled.

• ConditionalSampleEdge(subgraph) conditionally samples one edge adjacent

to subgraph from the graph, given that subgraph is already sampled. It outputs e and p, where e is the sampled edge and p is the probability to sample e given subgraph.

• ConditionalClose(subgraph, subgraph) waits for edges that appear after the first subgraph to form the second subgraph. It takes the two subgraphs as input and outputs yes/no, which is a boolean value indicating whether

the second subgraph can be formed. This function is usually used as the final step to sample a pattern where all nodes of a possible instance have been fixed (thereby fixing the edges needed to complete that instance of the

pattern) and the sampling process only awaits the additional edges to form the pattern.

These five APIs capture the two phases in neighborhood sampling and can

be used to develop pattern mining algorithms. To illustrate the use of these APIs, we describe how they can be used to write two representative graph patterns, shown in Figure 4.5.

    SampleThreeNodeChain:
        1: (e1, p1) = SampleEdge()
        2: (e2, p2) = ConditionalSampleEdge(Subgraph(e1))
        3: if (!e2) return 0
        4: else return 1/(p1.p2)

    SampleFourCliqueType1:
        1: (e1, p1) = SampleEdge()
        2: (e2, p2) = ConditionalSampleEdge(Subgraph(e1))
        3: if (!e2) return 0
        4: (e3, p3) = ConditionalSampleEdge(Subgraph(e1, e2))
        5: if (!e3) return 0
        6: subgraph1 = Subgraph(e1, e2, e3)
        7: subgraph2 = FourClique(e1, e2, e3) - subgraph1
        8: if (ConditionalClose(subgraph1, subgraph2))
        9:     return 1/(p1.p2.p3)
        10: else return 0

Figure 4.5: Example approximate pattern mining programs written using the ASAP API.

Chain. Using our API to write a sampling function for counting three-node chains is straightforward. It only includes two steps. In the first step, we use SampleEdge() to uniformly sample one edge from the graph (line 1). In the second sampling step, given the first sampled edge, we use ConditionalSampleEdge(subgraph) to find the second edge of the three-node chain, where subgraph is set to be the first sampled edge (line 2). Finally, if the algorithm cannot find e2 to form a chain with e1 (line 3), it estimates the number of three-node chains to be 0; otherwise, since the probability to get e1 and e2 is p1 · p2, it estimates the number of chains to be 1/(p1 · p2).

Four clique. Similarly, we can extend the algorithm of sampling three-node chains to sample four-cliques. We first sample a three-node chain (lines 1-2). Then we sample an adjacent edge of this chain to find the fourth node (line 4). Again, during the three steps, if any edges were not sampled, the function would return 0 as no cliques would be found (lines 3 and 5). Given e1, e2, and e3, all the four nodes are fixed. Therefore, the function only needs to wait


for all edges to form a clique (lines 8-9). If the clique is formed, it estimates the number of cliques to be 1/(p1 · p2 · p3); otherwise, it returns 0 (line 10). Figure 4.4(a) illustrates this sampling procedure (CliqueType1).

Figure 4.6: Runtime with graph partition: each of w workers runs r estimators on its local subgraph and emits a partial count; a reduce step combines the partial counts into the final estimate f(G).

4.3.2 Applying to Distributed Settings

Capturing general graph pattern mining using the simple two phase API allows ASAP to extend pattern mining to distributed settings in a seamless fashion. Intuitively, each execution of the user program can be viewed as an instance of the sampling process. To scale this up, ASAP needs to do two things. First, it needs to parallelize the sampling processes, and second, it needs to combine the outputs in a meaningful fashion that preserves the approximation theory.

For parallelizing the pattern mining tasks, ASAP’s runtime takes the pat- tern mining program and wraps it into an estimator3. ASAP first partitions the vertices in the graph across machines and executes many copies of the

3Since each program is providing an estimate of the final answer.

estimator using standard dataflow operations: map and reduce. In the map phase, ASAP schedules several copies of the estimator on each of the machines. Each estimator operates on the local subgraph in each machine and produces an output, which is a partial count. ASAP's runtime ensures that each estimator in a machine sees the graph's edges and vertices in the same order, which is important for the sampling process to produce correct results. Note that although every estimator in each partition sees the graph in the same order, there is no restriction on what the order might be (e.g., no sorting requirement); thus ASAP uses a random ordering, which is fast and requires no pre-processing of the graph. Once this is completed, ASAP runs a reduce task to combine the partial counts and obtain the final answer. This is depicted in fig. 4.6. This massively parallel execution is one of the reasons for the huge latency reduction in ASAP. Since the input to the reduce phase is simply an array of numbers, ASAP's shuffle is extremely light-weight compared to a system that produces exact answers (and needs to exchange intermediate patterns).

Handling Underestimation. Only summing up the partial counts in the reduce phase underestimates the total number of instances, because when vertices are partitioned to the workers, the instances that span across the partitions are not counted. This results in our technique underestimating the results, and makes the theoretical bounds in neighborhood sampling invalid. Thus, ASAP needs to estimate the error incurred due to distributed execution and incorporate that in the total error analysis.

We use probability theory to do this estimation. We enforce that the vertices

in the graph are uniformly randomly distributed across the machines. ASAP is not affected by the normal shortcomings of random vertex partitioning [16], as the amount of data communication is independent of the partitioning scheme used. In this case, random vertex partitioning is in fact simple to implement, and allows us to theoretically analyze the underestimation.

The theoretical proof for handling the underestimation is outside the scope of this chapter. Intuitively, we can think of the random vertex partitioning into w workers as a uniform vertex coloring from w available colors. Vertices with the same color are at the same worker, and each worker estimates patterns locally on its monochromatic vertices. By doing this coloring, the occurrence of a pattern has been reduced by a factor of 1/f(w), where f is a function of the number of workers and the pattern. For instance, a locally sampled triangle has three monochromatic vertices, and the probability that this happens among all triangles is $1/w^2$. Thus, by the linearity of expectation, each such triangle is scaled by $f(w) = w^2$. A rigorous proof for the maximum possible w with small errors (in practice w can be ≫ 100) can be shown using concentration bounds and the Hajnal-Szemerédi Theorem [138]. Similarly, each monochromatic

4-clique is scaled by $f(w) = w^3$, and f(w) can be computed for any given pattern.
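A minimal sketch of the reduce-side combination under this coloring argument is shown below (assuming, as in the triangle and 4-clique cases above, that a connected k-vertex pattern is monochromatic with probability 1/w^(k-1); the function name is ours, not ASAP's):

    def combine_partial_counts(partial_counts, w, k):
        # partial_counts: per-worker estimates computed on monochromatic vertices.
        # Each pattern instance survives the coloring with probability 1 / w**(k - 1),
        # so the summed estimates are scaled back by f(w) = w**(k - 1).
        f_w = w ** (k - 1)
        return f_w * sum(partial_counts)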

4.3.3 Advanced Mining Patterns

Predicate Matching. In property graphs, the edges and vertices contain prop- erties; and thus many real-world mining queries require that matching pat- terns satisfy some predicates. For example, a predicate query might ask for the

count of all four cliques on the graph where every vertex in the clique is of a certain type. ASAP supports two types of predicates on the pattern's vertices and edges: all and atleast-one.

For “all” predicate, queries specify a predicate that is applied to every vertex or edge. For example, such query may ask for “four cliques where all vertices have a weight of atleast 10”. To execute such queries, ASAP introduces a filtering phase where the predicate condition is applied before the execution of the pattern mining task. This results in a new graph which consists only of vertices and edges that satisfy the predicate. On this new graph, ASAP runs the pattern mining algorithm. Thus, the “all” predicate query does not require any changes to ASAP’s pattern mining algorithm.

The "atleast-one" predicate allows specifying a condition that at least one of the vertices or edges in the pattern must satisfy. An example of such a query is "four cliques where at least one edge has a weight of 10". To execute such predicate queries, we modify the execution to take two passes over the edge list. In the first pass, edges that match the predicate are copied from the original edge list to a matched edge list. Every entry in the matched list is a tuple, (edge, pos), where pos is the position in the original list where the matched edge appears. In the second pass, every estimator picks the first edge randomly from the matched list. This ensures that the pattern found by the estimator (if it finds one) satisfies the predicate. For the second edge onwards, the estimator uses the original list but starts the search from the position at which the first matched edge was found. This ensures that ASAP's probability analysis to estimate the error holds.
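The two-pass structure can be sketched as follows (illustrative only; the probability bookkeeping that accounts for drawing the first edge from the matched list is omitted here):

    import random

    def matched_edge_list(edges, predicate):
        # First pass: copy matching edges together with their positions
        # in the original edge list.
        return [(pos, e) for pos, e in enumerate(edges) if predicate(e)]

    def pick_first_edge(edges, matched):
        # Second pass (per estimator): pick the first edge uniformly from the
        # matched list, then continue the usual sampling from the position
        # right after it, so any completed pattern satisfies the predicate.
        pos, e1 = random.choice(matched)
        return e1, edges[pos + 1:]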

138 Motif mining. Another query used in many real-world workloads is to find all patterns with a certain number of vertices. We define these as motif queries; for example a 3-motif query will look for two patterns, triangles and 3-chains. Similarly a 4-motif query looks for six patterns [147]. For motif mining we notice that several patterns have the same underlying building block. For example, in 4-motifs, 3-chains are used in many of the constituent patterns. To improve performance, ASAP saves the sampling phase’s state for the building block pattern. This state includes (i) the currently sampled edges, (ii) the probability of sampling at that point, and (iii) the position in the edge list up to which the estimator has traversed. All the patterns that use this building block are then executed starting from the saved state. This technique can significantly speedup the execution of motif mining queries and we evaluate this in Section 4.5.2.

Refining accuracy. In many mining tasks, it is common for the user to first ask for a low accuracy answer, followed by a higher accuracy. For example, users performing exploratory analysis on graph data often would like to iteratively refine the queries. In such settings, ASAP caches the state of the estimator from previous runs. For instance, if a query with an error bound of 10% was executed using 1 million estimators, ASAP saves the output from these estimators. Later, when the same pattern is being queried, but with an error bound of 5% that requires 3 million estimators, ASAP only needs to launch 2 million, and can reuse the first 1 million.
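The caching idea amounts to memoizing per-estimator outputs; a minimal sketch of this behavior (our own structuring, not ASAP's code):

    class EstimatorCache:
        def __init__(self):
            self.outputs = []          # outputs of estimators launched so far

        def count(self, run_one_estimator, n_estimators):
            # Launch only the estimators that have not been run yet and
            # reuse the cached outputs for the rest.
            while len(self.outputs) < n_estimators:
                self.outputs.append(run_one_estimator())
            return sum(self.outputs[:n_estimators]) / n_estimators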

Figure 4.7: The actual relations between the number of estimators and run-time or error rate (triangle counting on the Twitter graph): left, runtime (min) vs. number of estimators; right, error rate (%) vs. number of estimators.

4.4 Building the Error-Latency Profile (ELP)

A key feature in any approximate processing system is allowing users to trade off accuracy for result latency. To do this for graph mining, we need to understand the relation between running time and error.

In ASAP's general, distributed graph pattern mining technique described earlier, the only configurable parameter is the number of estimator processes used for a mining task. By using r estimators and making r sufficiently large, ASAP is able to get results with bounded errors. Since an estimator takes computation and memory resources to sample a pattern, picking the number of estimators r provides a trade-off between result accuracy and resource consumption. In other words, setting a specific number of estimators, Ne, results in a fixed runtime and an error within a certain bound. As an example, fig. 4.7 depicts the relation between the number of estimators, runtime, and error for triangle counting run on the Twitter graph [136]. To enable the user to traverse this trade-off, ASAP needs to determine the correct number of estimators given an error or time budget.

Algorithm 5 BuildTimeProfile(T∗)
    1: P ← ∅                    // store points for the profile
    2: T ← 0, t ← 0, α ← α∗     // α∗ can be a reasonable random start
    3: while T + t <= T∗ do
    4:     t ← run approximation algorithm with α estimators
    5:     P.add((α, t))
    6:     α ← 2α
    7:     T ← T + t

4.4.1 Building Estimator vs. Time Profile

The time complexity of our approximation algorithm is linearly related to the number of edges in the graph and the number of estimators. Given a graph and a particular pattern, we find that the computation time is dominated by the number of estimators when the number of estimators is large enough. From fig. 4.7, we see that the estimator-time curve is close to linear when the number of estimators is greater than 0.5M. Thus we propose using a linear model to relate the running time to the number of estimators.

When the number of estimators is small, the computation time is also affected by other factors and thus the curve is not strictly linear. However, in these regions it is not computationally expensive to profile more exhaustively. Therefore, to build the time profile, we exponentially space our data collection, gathering more points when the number of estimators is small and fewer points as the number of estimators grows. We use a profiling budget T∗ to bound the total time spent on profiling. Algorithm 5 shows the pseudocode.

ASAP starts with a small number of estimators (α ← α∗) and doubles α each time until the total profiling time would exceed the budget T∗. In practice, we have found that setting T∗ at minute granularity gives good results.
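For concreteness, the short Python sketch below mirrors the profiling loop of Algorithm 5 and the linear fit described above. The run_approx callable and the starting value alpha0 are placeholders; the real system runs the distributed mining task at each profiled point.

import time

def build_time_profile(run_approx, t_budget, alpha0=50_000):
    """Collect (alpha, runtime) points, doubling alpha until the budget is hit."""
    points, total = [], 0.0
    alpha, last = alpha0, 0.0
    while total + last <= t_budget:
        start = time.time()
        run_approx(alpha)               # run the approximation with alpha estimators
        last = time.time() - start
        points.append((alpha, last))
        total += last
        alpha *= 2                      # exponentially space the profiled points
    return points

def fit_linear(points):
    """Least-squares fit t ≈ a*alpha + b; needs at least two distinct points."""
    n = len(points)
    sx = sum(a for a, _ in points); sy = sum(t for _, t in points)
    sxx = sum(a * a for a, _ in points); sxy = sum(a * t for a, t in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b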

4.4.2 Building Estimator vs. Error Profile

Since the error profile is non-linear (fig. 4.7), techniques like extrapolating from a few data points are not directly applicable. Some recent work has leveraged sophisticated techniques, such as experiment design [148] or Bayesian optimization [149], to build non-linear models in the context of instance selection in the cloud. However, these techniques also require the system to compute the error for a given setting, for which we need to know the ground truth, say, by running the exact algorithm on the graph. Not only is this infeasible in many cases, it also undermines the usefulness of an approximation system.

In ASAP, we design a new approach to determine the relationship between the number of estimators N_e and the error ϵ. Our approach is based on two main insights. First, we observe that for every pattern, a loose upper bound on the number of estimators required can be computed from the sampling probability using Chernoff bounds. For instance, for triangle counting, the sampling probability is 1/mc, where m is the number of edges and c is the degree of the first chosen edge (Section 4.1.2.1). This probability bound can be translated to an estimate of the form N_e > K·m·∆ / (ϵ²·P) (Theorem 3.3 [142]), where K is a constant, m is the number of edges, ∆ is the maximum degree, and P is the ground truth, i.e., the exact number of triangles. At a high level, the bound is based on the fact that the maximum-degree vertex leads to the worst-case scenario, where we have the minimum probability of sampling. Similar bounds exist for 4-cliques and other patterns [142]. These theoretical bounds provide a relation between the number of estimators (N_e), the error bound (ϵ), and the ground truth (P) in terms of graph properties such as m and ∆.

The second insight we use is that for smaller graphs we can get a very close approximation to the ground truth by using a very large number of estimators. This is useful in practice, as it avoids having to run the exact algorithm to get a good estimate of the ground truth. Based on these two insights, the steps we follow are listed below (an illustrative sketch follows the list):

(a) We first uniformly sample the graph by edges to reduce it to a size where we can obtain a nearly 100% accurate result. In our experiments, we find that 5–10% of the graph is appropriate, depending on the size of the graph.

(b) On the sampled graph, we run our algorithm with a large number of estimators (N_gt) to find P̂_s, a value very close to the ground truth for the sampled graph.

(c) Using P̂_s as the ground-truth value and the theoretical relationship described above, we compute the values of the other variables on the sampled graph. For example, in the sampled graph it is easy to compute m_s and ∆_s, and we can then infer K by running a varying number of estimators.

(d) Finally, we scale the values m_s, ∆_s, and P̂_s to the larger graph to compute N_e. We note that the scaled P̂ might not be close to P for the larger graph. But because we use the worst-case bound to compute P̂_s, the computed value of N_e offers a good bound in practice for the larger graph.
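The following Python sketch illustrates steps (a)–(d) for triangle counting, using the bound N_e > K·m·∆/(ϵ²·P) from above. The helper names, the way K is inferred from a single probe run, and the q³ scaling of the sampled triangle count are simplifying assumptions of this sketch rather than ASAP's exact procedure; count_with_estimators is supplied by the caller and returns the approximate pattern count obtained with a given number of estimators.

import random

def max_degree(edges):
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return max(deg.values()) if deg else 1

def estimators_needed(edges, epsilon, count_with_estimators,
                      sample_frac=0.05, n_gt=5_000_000, n_probe=100_000):
    # (a) uniformly sample the graph by edges to obtain a small graph.
    sampled = [e for e in edges if random.random() < sample_frac]
    m_s, delta_s = len(sampled), max_degree(sampled)
    # (b) near-ground-truth count on the sampled graph (assumes it has triangles).
    p_hat_s = count_with_estimators(sampled, n_gt)
    # (c) infer the constant K from the observed error at a smaller probe run.
    err = abs(count_with_estimators(sampled, n_probe) - p_hat_s) / p_hat_s
    k_const = n_probe * err**2 * p_hat_s / (m_s * delta_s)
    # (d) scale m, Δ, and P̂ to the full graph and solve the bound for N_e.
    m, delta = len(edges), max_degree(edges)
    p_hat = p_hat_s / (sample_frac ** 3)   # each triangle survives edge sampling w.p. q^3
    return int(k_const * m * delta / (epsilon**2 * p_hat))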

4.4.3 Handling Evolving Graphs

The ELP building process in ASAP is designed to be fast and scalable. Hence, it is possible to extend our pattern mining technique to evolving graphs by simply rebuilding the ELP every time the graph is updated. However, in practice, we do not need to rebuild the ELP for every update; it is possible to reuse an ELP for a limited number of graph changes. Thus we use a simple heuristic: after a fixed number of changes, say 10% of the edges, we rebuild the ELP. The general problem of accurately estimating when a profile becomes incorrect for approximate processing systems is hard [150]; in the future we plan to study whether we can automatically determine when to rebuild the ELP by tracking changes to the smaller sample graph we use in Section 4.4.2.
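A minimal sketch of this heuristic, assuming a user-supplied rebuild_elp routine and a 10% change threshold, is shown below.

class ElpManager:
    """Reuse the current ELP until a fixed fraction of the edges has changed."""

    def __init__(self, num_edges, rebuild_elp, threshold=0.10):
        self.baseline = num_edges
        self.threshold = threshold
        self.rebuild_elp = rebuild_elp
        self.changes = 0
        self.elp = rebuild_elp()          # initial profile

    def on_graph_update(self, num_changed_edges):
        self.changes += num_changed_edges
        if self.changes >= self.threshold * self.baseline:
            self.elp = self.rebuild_elp() # rebuild the profile
            self.changes = 0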

4.5 Evaluation

We evaluate ASAP using a number of real-world graphs and compare it to Arabesque, a state-of-the-art distributed graph mining system. Overall, our evaluations show that:

• Compared to Arabesque, we find ASAP can improve performance by up to 77× with just 5% loss of accuracy for counting 3-motifs and 4-motifs.

• We find that ASAP can also scale to much larger graphs (up to 3.7B edges), whereas existing systems fail to complete execution.

• Our techniques to build error profile and time profile (ELP) are highly accurate across all the graphs while finishing within a few minutes.

Graph               Nodes           Edges           Avg. Degree
CiteSeer [129]      3,312           4,732           2.8
MiCo [129]          100,000         1,080,298       22
Youtube [151]       1,134,890       2,987,624       8
LiveJournal [151]   3,997,962       34,681,189      17
Twitter [136]       41.7 million    1.47 billion    36
Friendster [152]    65.5 million    1.80 billion    28
UK [137, 153]       105.9 million   3.73 billion    35

Table 4.2: Graph datasets used in evaluating ASAP.

Implementation. We built ASAP on Apache Spark [146], a general-purpose dataflow engine. The implementation uses GraphX [19], the graph processing library of Spark, to load and partition the graph. We do not use any other functionality from GraphX, and our techniques only use simple dataflow operators like map and reduce. As such, ASAP can be implemented on any dataflow engine.

Datasets and Comparisons. Table 4.2 lists the graphs we use in our experiments. We use 4 small and 3 large graphs, and compare ASAP against Arabesque [13] (using its open-source release [154] built on Apache Giraph [155]) on the four smaller graphs: CiteSeer [129], MiCo [129], Youtube [151], and LiveJournal [151]. For all other evaluations, we use the large graphs. Our experiments were done on a cluster of 16 Amazon EC2 r4.2xlarge instances, each with 8 virtual CPUs and 61 GiB of memory. While all of these graphs fit in the main memory of a single server, the intermediate state generated (Section 4.1) during pattern mining makes it challenging to execute them. Arabesque, despite being a highly optimized distributed solution, fails to scale to the larger graphs in our cluster. We note that Arabesque (or any exact mining system) needs to enumerate the edges many more times than ASAP, which only needs to do it once or twice, depending on the query.

Patterns and Metrics. For evaluating ASAP, we use two types of patterns, motifs and cliques. For motifs, we consider 3-motifs (consisting of 2 individual patterns) and 4-motifs (consisting of 6 individual patterns); for cliques, we consider 4-cliques. For our experiments, we run 10 trials for each data point and report the median (with error bars in the ELP evaluation).

We do not include the time to load the graph in any of the experiments, for either ASAP or Arabesque. We use total runtime as the metric when raw performance is evaluated. When evaluating ASAP on its ability to provide errors within the requested bound, we need to know the actual error so that it can be compared with ASAP's output. We compute the actual error as |t − t_real| / t_real, where t_real is the ground-truth count of a specific pattern in a given graph. Since this requires us to know the ground truth, we use simpler, known patterns, such as triangles and chains, where the ground truth can be obtained from verified sources. Note that the actual error is only used for evaluation purposes. Unless otherwise stated, the ASAP evaluations were done with an error target of 5% at 95% confidence.

4.5.1 Overall Performance

We first present the overall performance numbers. To do so, we perform comparisons with Arabesque and evaluate ASAP’s scalability on larger graphs. We do not include ELP building time in these numbers since it is a one-time effort for each graph/task and we measure this in section 4.5.3.

Comparison with Arabesque. In this experiment, we compare Arabesque and ASAP on the 4 smaller graphs (Table 4.2). In each of these systems, we load the graph first, and then warm up the JVM by running a few test patterns. Then we use each system to perform 3-motif and 4-motif mining, and measure the time taken to complete the task. In Arabesque, we do not consider the time to write the output. Similarly, for ASAP we do not output the pattern embeddings. The results are depicted in figs. 4.8(a) and 4.8(b).

Figure 4.8: ASAP is able to gain up to 77× improvement in performance against Arabesque ((a) 3-Motif Counting, (b) 4-Motif Counting). The gains increase with larger graphs and more complex patterns. Y-axis is in log-scale.

We see that ASAP significantly outperforms Arabesque on all the graphs for both patterns, with performance improvements of up to 77× with under 5% loss of accuracy. The performance improvements would increase further if the user can afford a larger error (e.g., 10%). We also noticed that the performance gap between Arabesque and ASAP increases with larger graphs and/or more complex patterns. In this experiment, mining the more complex pattern (4-motif) on the largest graph (LiveJournal) provides the highest gains for ASAP. This validates our choice of using approximation for large-scale pattern mining.

4 These graph datasets in Arabesque are not publicly available.

3-Motif
              System    Graph        |V|     |E|     Runtime
ASAP (5%)     16 x 8    Twitter      42M     1.5B    2.5m
              16 x 8    Friendster   66M     1.8B    5.0m
              16 x 8    UK           106M    3.7B    5.9m
Arabesque     20 x 32   Inst4        180M    0.9B    10h45m

4-Motif
              System    Graph        |V|     |E|     Runtime
ASAP (5%)     16 x 8    Twitter      42M     1.5B    22m
              16 x 8    UK           106M    3.7B    47m
              16 x 8    LiveJ        4M      34M     0.7m
Arabesque     16 x 8    LiveJ        4M      34M     53m
              20 x 32   SN4          5M      199M    6h18m

Table 4.3: Comparing the performance of ASAP and Arabesque on large graphs. The System column indicates the number of machines used and the number of cores per machine.

Scalability on Larger Graphs. We repeat the above experiment on the larger graphs. Since Arabesque fails to execute on these graphs on our cluster, we also provide performance numbers that were reported by its authors [13] as a rough comparison. The results are shown in Table 4.3.

When mining for 3-motifs, ASAP performs vastly better on the Twitter, Friendster, and UK graphs. Arabesque's authors report a run time of approximately 11 hours on a graph with a similar number of edges, which translates to a 258× improvement for ASAP. In the case of 4-motifs, ASAP easily scales to the more complex pattern on larger graphs. In comparison, Arabesque is only able to handle a much smaller graph with fewer than 200 million edges; even then, it takes over 6 hours to mine all the 4-motif patterns. These results indicate that ASAP is able to not only outperform state-of-the-art solutions significantly, but do so on a much smaller cluster. ASAP is able to effortlessly scale to large graphs.

Pattern                Baseline   ASAP     Improv.
Motif Mining           32.2min    22min    32%
Predicate Matching     2.5min     27s      82%
Accuracy Refinement    2.5min     1.5min   40%

Table 4.4: Improvements from techniques in ASAP that handle advanced pattern mining queries.

4.5.2 Advanced Pattern Mining

We next evaluate the advanced pattern mining capabilities in ASAP described in section 4.3.3.

Motif mining. We first evaluate the impact of ASAP’s optimization when handling motif queries for multiple patterns. We use the Twitter graph and study a 4-motif query that looks for 6 different patterns. In this case ASAP caches the 3-node chain that is shared by multiple patterns. As shown in Table 4.4, we see a 32% performance improvement from this.

Predicate Matching. To study how well predicate matching queries work, we annotate every edge in the Twitter graph with a randomly chosen property. We then consider a 3-motif query which matches 10% of the edges. With ASAP's filtering-based technique, the “all” query completes in 27 seconds, compared to 2.5 minutes when running without pre-filtering.

Accuracy Refinement. We study a scenario where the user first launches a 3-motif query on the Twitter graph with a 10% error guarantee and then refines the results with another query that has a 5% error bound. We find that the running time goes from 2.5min to 1.5min (a 40% improvement) when our caching technique is enabled.

Figure 4.9: Runtime vs. number of estimators for the Twitter, Friendster, and UK graphs ((a) Chain Counting, (b) Triangle Counting, (c) Clique Counting). The black solid lines are ASAP's fitted lines.

4.5.3 Effectiveness of ELP Techniques

Here, we evaluate the effectiveness of the ELP building techniques in ASAP, described in section 4.4.

Time Profile. To evaluate how well our time profiling technique (Section 4.4.1) works, we run three patterns (3-chains, triangles, and 4-cliques) on the three large graphs. In each graph, we obtain the time vs. estimator curve by exhaustively running the mining task with a varying number of estimators and noting the time taken to complete the task. We then use our time profiling technique, which uses a small number of points instead of exhaustive profiling, to obtain ASAP's estimate. We plot both curves in Figure 4.9 for each of the three graphs. In these figures, the colored lines represent the actual (exhaustively profiled) curve, and the black line shows ASAP's estimate.

From the figure we can see that the time profile estimated by ASAP very closely tracks the actual time taken, thereby showing the effectiveness of our technique.

Error Profile. We repeat the experiment for evaluating ASAP's error profile building technique. Here, we exhaustively build the error profile by running a different number of estimators on each graph and noting the error. Then we use ASAP's technique of using a small portion of the graph to build the profile. We show both in Figure 4.10. We see that the actual errors are always within the estimated profile. This means that ASAP is able to guarantee that the answer it returns is within the requested error bound. We also note that in real-world graphs, the worst-case bounds are never really reached. In edge cases, where the number of patterns in the graph is high, like the chains in the UK graph, the overestimation may be large, and one concern might be that we run more estimators than required. We are working on techniques that can help us determine a tighter bound for the number of estimators in the future, but as discussed in Section 4.5.1, even with this overestimation we get significant speedups in practice. This experiment confirms that ASAP's heuristic of using a very small portion of the graph and leveraging the Chernoff bound analysis (Section 4.4.2) is a viable approach.

Figure 4.10: Error vs. number of estimators for the Twitter, Friendster, and UK graphs ((a) Chain Counting, (b) Triangle Counting, (c) Clique Counting); each plot shows the actual error and the profiled worst-case error.

Error Rate Confidence. In Figure 4.11, we evaluate the cumulative distribution function (CDF) of 100 independent runs on the UK graph with a 3% error target and 99% confidence. We can see that 100/100 actual results are no worse than 3% error and 74/100 results are within 2% error. Thus the actual results are even better than the theoretical analysis for 99% confidence.

Figure 4.11: CDF of 100 runs with a 3% error target (UK graph).

Graph        Task       Time Profile   Error Profile
UK-2007-05   3-Chain    5.2m           2.1m
             3-Motif    6.1m           2.7m
             4-Clique   9.5m           4.8m
             4-Motif    11.2m          5.9m

Table 4.5: ELP building time for different tasks on the UK graph.

ELP Building Time. Finally, we evaluate the time taken to build the profiling curves. For this, we use the UK graph and configure ASAP to use 1% of the graph to build the error profile. The results are shown in Table 4.5 for different patterns; the time to build the profiles is relatively small, even for the largest graph.

Figure 4.12: The errors from two cluster scenarios with different numbers of nodes. Config-1: strong scaling, fixing the total number of estimators at 2M × 128; Config-2: weak scaling, fixing the number of estimators per executor at 2M.

4.5.4 Scaling ASAP on a Cluster

ASAP partitions the graph into different subgraphs based on random vertex partitioning, and aggregates the scaled results in the final reduce phase. In this section we evaluate how configurations with different numbers of machines impact accuracy. In Fig. 4.12, we consider two scenarios: strong scaling, where we fix the total number of estimators used for the entire graph and increase the number of machines; and weak scaling, where we fix the number of estimators used per machine and thus correspondingly scale the total number of estimators as we add machines. We run the triangle counting task on the Twitter graph with cluster sizes of 4, 8, 12, and 16 machines. From the figure we see that in the strong-scaling regime, adding more machines has no impact on the accuracy of ASAP and that we are able to correctly adjust the accuracy as more graph partitions are created. In the weak-scaling case we see that the accuracy improves as we add more machines, which is the expected behavior when we have more estimators.

Figure 4.13: Two representative (from 21) patterns in 5-Motif: the 5-Chain and the 5-House.

5-Chain
              System    Graph     |V|     |E|     Runtime
ASAP (5%)     16 x 16   Twitter   42M     1.5B    9.2m
              16 x 16   UK        106M    3.7B    17.3m
ASAP (10%)    16 x 16   Twitter   42M     1.5B    3.2m
              16 x 16   UK        106M    3.7B    6.5m

5-House
              System    Graph     |V|     |E|     Runtime
ASAP (5%)     16 x 16   Twitter   42M     1.5B    12.3m
              16 x 16   UK        106M    3.7B    22.1m
ASAP (10%)    16 x 16   Twitter   42M     1.5B    5.6m
              16 x 16   UK        106M    3.7B    14.2m

Table 4.6: Approximating 5-Motif patterns in ASAP.

4.5.5 More Complex Patterns

Finally, we evaluate the generality of ASAP's techniques by applying them to mine 5-motifs, consisting of 21 individual patterns. This choice was influenced by our conversations with industry partners, who use similar patterns in their production systems. Due to the complexity of the patterns, we used a larger cluster for this experiment, consisting of 16 machines, each with 16 cores and 128GB of memory. Due to space constraints, and also because of the absence of a system to compare against, we only provide ASAP's performance on two representative patterns (Figure 4.13) in Table 4.6. As we can see, ASAP is able to handle complex patterns on large graphs easily.

4.6 Related Work

A large number of systems have been proposed in the literature for graph processing [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]. Of these, some [15, 17, 18] are single-machine systems, while the rest support distributed processing. By using careful and optimized operations, these systems can process huge graphs, on the order of a trillion edges. However, these systems have focused mainly on graph analysis and do not support efficient graph pattern mining. Some systems implement specific versions of simple pattern mining (e.g., triangle counting), but they do not support general pattern mining.

Similar to graph processing systems, a number of graph mining systems have also been proposed. Here too, the proposals contain a mix of centralized and distributed systems. These proposals can be classified into two categories. The first category focuses on mining patterns in an input consisting of multiple small graphs. This problem is significantly easier, since the system only finds one instance of the pattern in the graph, and it is trivially incorporated in ASAP. Since this approach can be massively parallelized, several distributed systems exist that focus specifically on this problem. The state-of-the-art in distributed, general-purpose pattern mining systems is Arabesque [13]. While it supports efficient pattern mining, the system still requires a significant amount of time to process even moderately sized graphs. A few distributed systems have focused on providing approximate pattern mining. However, these systems focus on a specific algorithm, and hence are not general-purpose.

In distributed data processing, approximate analysis systems [130, 131, 132] have recently gained popularity due to the time required to process large datasets. Following approximate query processing theory from the database community, these systems focus on reducing the amount of data used in the analysis, in the hope that the analysis time is also reduced. However, as we show in this work, applying an exact algorithm to a sampled graph does not yield the desired results. In addition, doing so complicates, or even makes infeasible, providing good time or error guarantees.

The theory community has invested a significant amount of time in analyzing and proposing approximate graph algorithms for several graph analysis tasks [156, 157, 158, 159, 160, 161]. None of these are aimed at distributed processing, nor do they propose ways to understand the performance profile of the algorithms when deployed in the real world. We leverage this rich theoretical foundation in our work by extending these algorithms to mine general patterns in a distributed setting. We further devise a strategy to build accurate profiles to make the approach practical.

4.7 Chapter Summary

We present ASAP, a distributed, sampling-based approximate computation engine for graph pattern mining. ASAP leverages graph approximation theory and extends it to general patterns in a distributed setting. It further employs a novel ELP building technique to allow users to trade off accuracy for result latency. Our evaluation shows that not only does ASAP outperform state-of-the-art exact solutions by more than an order of magnitude, but it also scales to large graphs while being low on resource demands.

Chapter 5

Streaming Algorithms for Halo Finders

The goal of astrophysics is to explain the observed properties of the universe we live in. In cosmology in particular, one tries to understand how matter is distributed on the largest scales we can observe. In this effort, advanced computer simulations play an ever more important role. Simulations are currently the only way to accurately understand the nonlinear processes that produce cosmic structures such as galaxies and patterns of galaxies.

Hence a large amount of effort is spent on running simulations modelling representative parts of the universe in ever greater detail. A necessary step in the analysis of such simulations involves locating mass concentrations, called “haloes”, where galaxies would be expected to form. This step is crucial to connect theory to observations: galaxies are the most observable objects that trace the large-scale structure, but their precise spatial distribution is only established through these simulations.

Many algorithms have been developed to find these haloes in simulations. The algorithms vary widely, even conceptually. There is no absolutely agreed-upon physical definition of a halo, although all algorithms give density peaks, i.e., clusters of particles. Galaxies are thought to form at these concentrations of matter. Some codes find regions inside an effective high-density contour, such as Friends-of-Friends (FoF) [162]. In FoF, particles closer to each other than a specified linking length are gathered together into haloes. Other algorithms directly incorporate velocity information as well. Another approach finds particles that have crossed each other as compared to the initial conditions, which also ends up giving density peaks [163]. FoF is often considered to be a standard approach, if only because it was among the first used, and is conceptually simple. The drawbacks of FoF include that the simple density estimate can artificially link physically separate haloes together, and the arbitrariness of the linking length. A halo-finding comparison project [164] evaluated the results of 17 different halo-finding algorithms; further analysis appeared in [29]. We take the FoF algorithm as a fiducial result for comparison, but compare to results from some other finders as well.

Halo-finding algorithms are generally computationally intensive, often requiring all particle positions and velocities to be loaded in memory simultaneously. In fact, most are executed during the execution of the simulation itself, requiring comparable computational resources. However, in order to understand the systematic errors in such algorithms, it is often necessary to run multiple halo finders, often well after the original simulation has been run. Also, many of the newest simulations have several hundred billion to a trillion particles, with a very large memory footprint, making such posterior computations quite difficult. Here, we investigate a way to apply streaming algorithms as halo finders, and compare the results to those of other algorithms participating in the Halo-Finding Comparison Project.

Recently, streaming algorithms [59] have become a popular way to process massive data sets. In the streaming model, the input is given as a sequence of items and the algorithm is allowed to make a single pass (or a constant number of passes) over the input data while using sub-linear, usually poly-logarithmic space compared to the size of the data. Streaming algorithms have found many applications in networking ([165, 10, 36]), machine learning ([166, 167]), financial analytics ([168, 169, 170]), and databases ([171, 172]).

In this chapter, we apply streaming algorithms to the area of cosmological simulations and provide space- and time-efficient solutions to the halo finding problem. In particular, we show a relation between the problem of finding haloes in simulation data and the well-known problem of finding “heavy hitters” in streaming data. This connection allows us to employ efficient heavy hitter algorithms, such as Count-Sketch [6] and Pick-and-Drop Sampling [41]. By equating heavy hitters with haloes, we are implicitly defining haloes as positions exceeding some high density threshold. In our case, these usually turn out to be density peaks, but only because of the very spiky nature of the particle distributions in cosmology. Conceptually, FoF haloes are also regions enclosed by high-density contours, but in practice, the FoF implementation is very different from ours.

5.1 Streaming Algorithm

In this section, we investigate the application of streaming algorithms to find haloes, using a strong relation between the halo-finding problem and the heavy hitter problem, which we discuss in Section 5.1.1.4. Heavy hitter algorithms find the k densest regions, which may physically correspond to haloes. In our implementation, we carefully choose k to get the desired outcome; this parameter k is also discussed in Section 5.1.1.4. We first present, in the next sub-section, the formal definition of streaming algorithms and the connection between the heavy hitter problem and the halo-finding problem. After that, we present the basic procedures of the two heavy hitter algorithms: Count-Sketch and Pick-and-Drop sampling.

5.1.1 Streaming Data Model

5.1.1.1 Definitions

A data stream D = D(n, m) is an ordered sequence of objects p_1, p_2, ..., p_n, where p_j ∈ {1, ..., m}. The elements of the stream can represent any digital object: integers, real numbers of fixed precision, edges of a large graph, messages, images, web pages, etc. In practical applications both n and m may be very large, and we are interested in algorithms with o(n + m) space. A streaming algorithm is an algorithm that can make a single pass over the input stream. The above constraints imply that a streaming algorithm is often a randomized algorithm that provides approximate answers with high probability. In practice, these approximate answers often suffice.

We investigate the results of cosmological simulations where the number of particles will soon reach 10¹². Compared to offline algorithms that require the input to be entirely in memory, streaming algorithms provide a way to process the data using only megabytes of memory instead of gigabytes or terabytes in practice.

5.1.1.2 Heavy Hitter

For each element i, its frequency f_i is the number of its occurrences in D. The k-th frequency moment of a data stream D is defined as F_k(D) = ∑_{i=1}^{m} f_i^k. We say that an element is “heavy” if it appears more times than a constant fraction of some L_p norm of the stream, where L_p = (∑_i f_i^p)^{1/p} for p ≥ 1. In this chapter, we consider the following heavy hitter problem.

Problem 2. Given a stream D of n elements, the ϵ-approximate (ϕ, L_p)-heavy hitter problem is to find a set of elements T such that:

• ∀i ∈ [m], f_i > ϕ L_p ⟹ i ∈ T;

• ∀i ∈ [m], f_i < (ϕ − ϵ) L_p ⟹ i ∉ T.

We allow the heavy hitter algorithms to use randomness; the requirement is that the correct answer should be returned with high probability. The heavy hitter problem is equivalent to the problem of approximately finding the k most frequent elements. Indeed, the top k most frequent elements are in the set of (ϕ, L_1)-heavy hitters of the stream, where ϕ = Θ(1/k). There is an Ω(1/ϵ²) trade-off between the approximation error ϵ and the memory usage. Heavy hitter algorithms are building blocks of many data stream algorithms ([42, 173]).

We treat the cosmological simulation data from [164] as a data stream. To do so, we apply an online transformation that we describe in the next section.

5.1.1.3 Data Transformation

In a cosmological simulation, dark matter particles form structures through gravitational clustering in a large box with periodic boundary conditions representing a patch of the simulated universe. The box we use [164] is of size 500 Mpc/h, or about 2 billion light-years. The simulation data consist of positions and velocities of 256³, 512³, or 1024³ particles, each representing a huge number of physical dark-matter particles. They are distributed rather uniformly on large scales (≳ 50 Mpc/h) in the simulation box, clumping together on smaller scales. A halo is a clump of particles that are gravitationally bound.

To apply the streaming algorithms, we transform the data. We discretize the spatial coordinates so that we have a finite number of element types in our transformed data stream. We partition the simulation box into a grid of cubic cells and bin the particles into them. The cell size is chosen to be 1 Mpc/h so as to match the typical size of a large halo; there are thus 500³ cells. This parameter can be modified in practical applications, but it relates to the space and time efficiency of the algorithm. We summarize the data transformation steps as follows.

• Partition the simulation box into a grid of cubic cells. Assign each cell a unique integer ID.

• After reading a particle, determine its cell and insert that cell ID into the data stream.

Using the above transformation, streaming algorithms can process the particles in the same way as an integer data stream.
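The transformation itself is only a few lines of code. The Python sketch below is our own illustration, not the dissertation's C++ implementation; it bins each particle position of the 500 Mpc/h periodic box into a 1 Mpc/h cell and emits the integer cell ID.

BOX = 500.0              # box size in Mpc/h
CELL = 1.0               # cell size in Mpc/h
GRID = int(BOX / CELL)   # 500 cells per dimension

def cell_id(x, y, z):
    """Map a particle position in the periodic box to a unique integer cell ID."""
    ix = int(x / CELL) % GRID
    iy = int(y / CELL) % GRID
    iz = int(z / CELL) % GRID
    return (ix * GRID + iy) * GRID + iz

def particle_stream(positions):
    """Transform a stream of (x, y, z) positions into a stream of cell IDs."""
    for x, y, z in positions:
        yield cell_id(x, y, z)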

5.1.1.4 Heavy Hitter and Dense Cells

For a heavy-hitter algorithm to save memory and time, the distribution of cell counts must be very non-uniform. The simulations begin with an almost uniform lattice of particles, but after gravity clusters them together, the density distribution in cells can be modeled by a lognormal PDF ([174], [175]):

P_LN(δ) = (2πσ_1²)^(−1/2) exp{ −[ln(1 + δ) + σ_1²/2]² / (2σ_1²) } · 1/(1 + δ),    (5.1.1)

where δ = ρ/ρ̄ − 1 is the overdensity, σ_1²(R) = ln[1 + σ_nl²(R)], and σ_nl²(R) is the variance of the nonlinear density field in spheres of radius R. Our cells are cubic, not spherical; for theoretical estimates, we use a spherical top-hat of the same volume as a cell.

Let N be the number of cells, and P_c be the distribution of the number of particles per cell. The L_p heaviness ϕ_p can be estimated as

ϕ_p ≈ P_200 / (N⟨P_c^p⟩)^(1/p),    (5.1.2)

where P_200 is the number of particles in a cell with density exactly 200ρ̄. This density threshold is a typical minimum density of a halo, coming from the spherical-collapse model. We theoretically estimated σ_nl for the cells in our density field by integrating the nonlinear power spectrum (using the fit of [176] and the cosmological parameters of the simulation) with a spherical top-hat window. The grid size in our algorithm is roughly 1.0 Mpc (500³ cells in total), giving σ_nl(Cell) ≈ 10.75. We estimated ϕ_1 ≈ 10⁻⁶ and ϕ_2 ≈ 10⁻³, matching order-of-magnitude with the measurement of the actual density variance from the simulation cells. These heaviness values are low enough to presume that a heavy-hitter algorithm will efficiently find cells corresponding to haloes.
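When the binned cell counts are available, the estimate in eq. (5.1.2) can also be evaluated directly, as in the small Python sketch below; the toy cell counts and the value of P_200 passed in are placeholders, not measurements from the simulation.

def heaviness(cell_counts, p, p200_per_cell):
    """Evaluate phi_p ≈ P200 / (N * <P_c^p>)^(1/p) from binned cell counts."""
    n = len(cell_counts)
    moment = sum(c ** p for c in cell_counts) / n        # <P_c^p>
    return p200_per_cell / (n * moment) ** (1.0 / p)

# Example with toy counts: one very heavy cell among mostly empty ones.
counts = [0, 1, 0, 2, 0, 0, 350, 1, 0, 4]
print(heaviness(counts, 1, 200), heaviness(counts, 2, 200))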

5.1.2 Streaming Algorithms for Heavy Hitter Problem

The above relation between the halo-finding problem and the heavy hitter problem encourages us to apply efficient streaming algorithms to build a new halo finder. Our halo finder takes a stream of particles, performs the data transformation described in Section 5.1.1.3, and then applies a heavy hitter algorithm to output the approximate top k heavy hitters in the transformed stream. These heavy hitters correspond to the densest cells in the simulation data, as described in Section 5.1.1.4. In the first version of our halo finder, we use the Count-Sketch algorithm [6] and Pick-and-Drop Sampling [41].

5.1.2.1 The Count-Sketch Algorithm

For a more general description of the algorithm, please refer to [6]. For completeness, we summarize the algorithm as follows. The Count-Sketch algorithm uses a compact data structure to maintain approximate counts of the top k most frequent elements in a stream. This data structure is an r × t matrix M representing estimated counts for all elements. These counts are calculated using two sets of hash functions: let h_1, h_2, ..., h_r be r hash functions mapping the input items to {1, ..., t}, where each h_i is sampled uniformly from a hash function family H; and let s_1, s_2, ..., s_r be hash functions mapping the input items to {+1, −1}, sampled uniformly from another hash function family S. We can interpret this matrix as an array of r hash tables, each containing t buckets.

There are two operations on the Count-Sketch data structure. Denote by M_{i,j} the j-th bucket in the i-th hash table:

• Add(M, p): for i ∈ [1, r], M_{i, h_i[p]} += s_i[p].

• Estimate(M, p): return median_i { M_{i, h_i[p]} · s_i[p] }.

The Add operation updates the approximate frequency for each incoming element and the Estimate operation outputs the current approximate frequency. To maintain and store the estimated k most frequent elements, Count-Sketch also needs a priority queue data structure. The pseudocode of the Count-Sketch algorithm is presented in Figure 5.1; more details and theoretical guarantees are given in [6].

1: procedure CountSketch(r, t, k, D)          ▷ D is a stream
2:   Initialize an empty r × t matrix M.
3:   Initialize a min-priority queue Q of size k
4:     (the element with the smallest count is at the top).
5:   for i = 1, . . . , n and p_i ∈ D do
6:     Add(M, p_i)
7:     if p_i ∈ Q then
8:       p_i.count++
9:     else if Estimate(M, p_i) > Q.top().count then
10:      Q.pop()
11:      Q.push(p_i)
12:  return Q

Figure 5.1: Count-Sketch Algorithm
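For concreteness, the following Python sketch gives a compact, self-contained version of the Count-Sketch structure and the top-k loop of Figure 5.1. It is illustrative only (our implementation is in C++, Section 5.2.2); the salted calls to Python's built-in hash stand in for the pairwise-independent hash families assumed by the analysis in [6], and the dictionary-based top-k set replaces the priority queue.

import random

class CountSketch:
    def __init__(self, r, t, seed=0):
        rng = random.Random(seed)
        self.r, self.t = r, t
        self.table = [[0] * t for _ in range(r)]
        # Two salts per row: one for the bucket index, one for the ±1 sign.
        self.salts = [(rng.getrandbits(61), rng.getrandbits(61)) for _ in range(r)]

    def _hashes(self, x, i):
        a, b = self.salts[i]
        bucket = hash((a, x)) % self.t
        sign = 1 if hash((b, x)) & 1 else -1
        return bucket, sign

    def add(self, x):
        for i in range(self.r):
            bucket, sign = self._hashes(x, i)
            self.table[i][bucket] += sign

    def estimate(self, x):
        vals = sorted(self.table[i][b] * s
                      for i in range(self.r)
                      for b, s in [self._hashes(x, i)])
        return vals[len(vals) // 2]      # median of the per-row estimates

def top_k(stream, k, r=5, t=1 << 14):
    """Approximate top-k counts; mimics the queue logic of Figure 5.1."""
    cs, counts = CountSketch(r, t), {}
    for x in stream:
        cs.add(x)
        if x in counts:
            counts[x] += 1
        elif len(counts) < k:
            counts[x] = cs.estimate(x)
        else:
            victim = min(counts, key=counts.get)   # O(k) eviction, fine for a sketch
            if cs.estimate(x) > counts[victim]:
                del counts[victim]
                counts[x] = cs.estimate(x)
    return counts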

5.1.2.2 The Pick-and-Drop Sampling Algorithm

Pick-and-Drop Sampling is a sampling-based streaming algorithm for approximating the heavy hitters. To describe the idea of Pick-and-Drop sampling, we view the data stream as a sequence of r blocks of size t. Define d_{i,j} as the j-th element of the i-th block, i.e., d_{i,j} = p_{t(i−1)+j} in stream D. In each block of the stream, Pick-and-Drop sampling picks one random sample and records its remaining frequency in the block. The algorithm maintains the sample with the largest current counter and drops previous samples. The pseudocode of Pick-and-Drop sampling [41] is given in Figure 5.2, for which we need the following definitions. For i ∈ [r], j, s ∈ [t], q ∈ [m] define:

f_{i,q} = |{ j ∈ [t] : d_{i,j} = q }|,    (5.1.3)

a_{i,s} = |{ j∗ : s ≤ j∗ ≤ t, d_{i,j∗} = d_{i,s} }|.    (5.1.4)

1: procedure PickDrop(r, t, λ, D)
2:   Sample S_1 uniformly at random on [t].
3:   L_1 ← d_{1,S_1}
4:   C_1 ← a_{1,S_1}
5:   u_1 ← 1
6:   for i = 2, . . . , r do
7:     Sample S_i uniformly at random on [t].
8:     l_i ← d_{i,S_i},  c_i ← a_{i,S_i}
9:     if C_{i−1} < max(c_i, λ·u_{i−1}) then
10:      L_i ← l_i
11:      C_i ← c_i
12:      u_i ← 1
13:    else
14:      L_i ← L_{i−1}
15:      C_i ← C_{i−1} + f_{i,L_{i−1}}
16:      u_i ← u_{i−1} + 1
17:  return {L_r, C_r}

Figure 5.2: Pick-and-Drop Algorithm

The detailed implementation is described in Section 5.2.2.
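The Python sketch below is a direct transcription of the pseudocode in Figure 5.2 for a stream stored as a list of integers. It is illustrative only (the C++ implementation is described in Section 5.2.2); it assumes a non-empty stream, and elements beyond the r·t-th are simply ignored.

import random

def pick_and_drop(stream, r, t, lam):
    def block(i):                        # i is 1-based, as in the pseudocode
        return stream[(i - 1) * t : i * t]

    def remaining_count(b, s):           # a_{i,s}: occurrences of b[s-1] in b[s-1:]
        return b[s - 1 :].count(b[s - 1])

    def full_count(b, q):                # f_{i,q}: occurrences of q in block i
        return b.count(q)

    b = block(1)
    s = random.randint(1, len(b))
    label, counter, u = b[s - 1], remaining_count(b, s), 1
    for i in range(2, r + 1):
        b = block(i)
        if not b:
            break
        s = random.randint(1, len(b))
        l_i, c_i = b[s - 1], remaining_count(b, s)
        if counter < max(c_i, lam * u):
            label, counter, u = l_i, c_i, 1                       # pick the new sample
        else:
            counter, u = counter + full_count(b, label), u + 1    # keep and keep counting
    return label, counter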

5.2 Implementation

5.2.1 Simulation Data

The N-body simulation data we use as the input to our halo finder was used in the Halo-Finding Comparison Project [164] and consists of various resolutions (numbers of particles) of the MareNostrum Universe cosmological simulation [177]. These simulations ran in a 500 h⁻¹Mpc box, assuming a standard ΛCDM (cold dark matter and cosmological constant) cosmological model.

Figure 5.3: Halo mass distribution of various halo finders (FOF, AHF, ASOHF, BDM, VOBOZ); the number of haloes, log(N_h), is plotted against halo size, log(N_p).

In the first implementation of our halo finder, we consider two halo properties: center position and mass (the number of particles in it). We compare to the fiducial offline algorithm FoF. The distributions of halo sizes from different halo finders are presented in Fig. 5.3.

Since our halo finder builds on streaming algorithms for finding frequent items, it needs to transform the data as described in Section 5.1.1.3: dividing all the particles into small cells and labeling each particle with its associated cell ID. For example, if an input dataset contains three particles p_1, p_2, p_3 and they are all included in a cell with ID = 1, then the transformed data stream becomes 1, 1, 1. The most frequent element in the stream is obviously 1, and thus cell 1 is the heaviest cell overall.

168 Figure 5.4: Count-Sketch Algorithm

5.2.2 Implementation Details

Our halo finder implementation is written in C++ and compiled with GNU GCC 4.9.2. We implemented Count-Sketch and Pick-and-Drop sampling as two algorithms to find heavy hitters.

5.2.2.1 Count-Sketch-based Halo Finder

There are three basic steps in the Count-Sketch algorithm, which returns the heavy cells and the number of particles associated with them. (1) Allocate memory for the CountSketch data structure to hold current estimates of cell frequencies; (2) use a priority queue to record the k most frequent elements; (3) return the positions of the top k heavy cells. Figure 5.4 presents the process of the Count-Sketch.

The Count-Sketch data structure is an r × t matrix. Following [6], we set

169 Figure 5.5: Pick-and-Drop Sampling

r = log(n/ϵ) and t to be sufficiently large (> 1 million) to achieve an expected approximation error ϵ = 0.05. We build the matrix as a 2D array initialized with r × t zeros.

For each incoming element in the stream, an Add operation has to be executed and an estimate operation needs to be executed only when this element is not in the queue.

5.2.2.2 Pick-and-Drop-based Halo Finder

In the Pick-and-Drop-sampling-based halo finder, we implement a general hash function H: N⁺ → {1, 2, ..., ck}, where c ≥ 1, to boost the probability of successfully approximating the k heaviest cells. We apply the hash function H to every incoming element and group the elements with the same hash value together, so that the original stream is divided into ck smaller sub-streams.

Meanwhile, we initialize ck instances of Pick-and-Drop sampling so that each PD instance processes one sub-stream. The whole process of approximating the heavy hitters is presented in Figure 5.5. In this way, repeated items in the whole stream are routed to the same sub-stream, where they are much heavier. With high probability, each instance of Pick-and-Drop sampling will output the heaviest element of its sub-stream, and in total we will have ck output items. Because of the randomness in the sampling method, we expect some inaccurate heavy hitters among the ck outputs. By setting a large c, most of the actual top k most frequent elements should be inside the ck outputs (the raw result).

Figure 5.6: Halo Finder Procedure
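The hash-partitioning step can be sketched as follows in Python, reusing the pick_and_drop routine from the earlier sketch. The parameter choices are placeholders, and buffering the sub-streams in lists is for clarity only; a real streaming implementation would feed each Pick-and-Drop instance online as elements arrive.

def pd_halo_candidates(cell_ids, k, c=4, r=64, lam=2):
    buckets = c * k
    substreams = [[] for _ in range(buckets)]
    for cid in cell_ids:                       # split the stream by hash value
        substreams[hash(cid) % buckets].append(cid)
    raw = []
    for sub in substreams:                     # one PD instance per sub-stream
        if sub:
            t = max(1, len(sub) // r)          # block size for this sub-stream
            raw.append(pick_and_drop(sub, r, t, lam))
    # Keep the k candidates with the largest counters as heavy-cell candidates.
    raw.sort(key=lambda label_count: label_count[1], reverse=True)
    return raw[:k]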

To get precise properties of haloes, such as the center and mass, an offline algorithm such as FoF [162] can be applied to the particles inside the returned heavy cells and their neighboring cells. This needs an additional pass over the data, but we only need to store a small number of particles to run those offline in-memory algorithms. The whole process of the halo finder is represented in Figure 5.6, where the heavy hitter algorithm can be regarded as a black box.

That is, any theoretically efficient heavy hitter algorithms could be applied to further improve the memory usage and practical performance.

171 5.2.3 Shifting Method

In the first pass of our halo finder, we only use the position of a heavy cell as the position of a halo. However, each heavy cell may contain several haloes, and some haloes located on the edges between two cells cannot be recognized because the cell size in the data transformation step is fixed. To recover those missing haloes, we utilize a simple shifting method (a small sketch follows the list below):

• Initialize 2d instances of Count-Sketch or Pick-and-Drop in parallel, where d is the dimension. Our simulation data reside in three dimensions, so d = 3.

• Move all the particles in one of the 2d directions by a distance of 0.5 Mpc/h (half of the cell size). In each of the 2d shifting processes, assign a Count-Sketch/Pick-and-Drop instance to run. By combining the results from the 2d shifting processes, we expect that the majority of the k largest haloes are discovered. All the parallel instances of Count-Sketch/Pick-and-Drop are enabled by OpenMP 4.0 in C++.
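A minimal Python sketch of the shifting method is shown below. It reuses CELL, cell_id, and top_k from the earlier sketches, assumes the particle positions can be iterated once per shift (e.g., re-read from disk), and runs the instances sequentially rather than via OpenMP.

def shifted_heavy_cells(positions, k):
    half = CELL / 2.0
    shifts = [(0.0, 0.0, 0.0)]            # the unshifted grid
    for axis in range(3):                 # d = 3 spatial dimensions
        for sign in (+1.0, -1.0):         # 2d shifted copies
            s = [0.0, 0.0, 0.0]
            s[axis] = sign * half
            shifts.append(tuple(s))
    merged = {}
    for dx, dy, dz in shifts:
        stream = (cell_id(x + dx, y + dy, z + dz) for x, y, z in positions)
        for cell, count in top_k(stream, k).items():
            # Keyed by (shift, cell); a real run would map cells back to positions.
            merged[(dx, dy, dz, cell)] = count
    return merged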

5.3 Evaluation

To evaluate how well streaming-based halo finders work, we focus on testing the following aspects:

• Correctness: Evaluate how close the positions of the k largest haloes found by the streaming-based algorithms are to the top k largest haloes returned by widely used in-memory algorithms, and evaluate the trade-off between the selection of k and the quality of the result.

• Stability: Since streaming algorithms require randomness and may produce some incorrect results, we want to see how stable the streaming-based heavy hitter algorithms are.

• Memory Usage: The linear memory requirement is a bottleneck for all offline algorithms, and it is the central problem that we are trying to overcome by applying a streaming approach. Thus it is important to estimate, theoretically or experimentally, the memory usage of the Pick-and-Drop and Count-Sketch algorithms.

In the evaluation, all the in-memory algorithms we compare against were proposed in the Halo-Finding Comparison Project [164]. We test against the fiducial FOF method, as well as four others that find density peaks:

1. FOF by Davis et al. [162]: “plain-vanilla” Friends-of-Friends.

2. AHF by Knollmann & Knebe [178]: density-peak search with a recursively refined grid.

3. ASOHF by Planelles & Quilis [179]: finds spherical-overdensity peaks using adaptive density refinement.

4. BDM [180], run by Klypin & Ceverino: “Bound Density Maxima” – finds gravitationally-bound spherical-overdensity peaks.

5. VOBOZ by Neyrinck et al. [181]: “Voronoi BOund Zones” – finds gravitationally bound peaks using a Voronoi tessellation.

5.3.1 Correctness

As there is no agreed-upon rule for how to define the center and the boundary of a halo, it is impossible to theoretically define and deterministically verify the correctness of any halo finder. Therefore a comparison to the results of previous widely accepted halo finders seems to be the best practical verification of a new halo finder. To compare the outputs of two different halo finders we need to introduce a formal measure of similarity. The most straightforward way to compare them is to consider one of them, H, as a ground truth and the other, E, as an estimator. Among these, the FOF algorithm is considered to be the oldest and the most widely used, so in our initial evaluation we decided to concentrate on the comparison with FOF. Then the most natural measure of similarity is the number of elements in H that match elements in the output of E. More formally, we define “matches” as follows: for a given θ, we say that a center e_i ∈ E matches the element h_i ∈ H if dist(e_i, h_i) ≤ θ, where dist(·, ·) is the Euclidean distance. Then our measure of similarity is

Q(θ) = Q(E_k, H_k, θ) = |{ h_i ∈ H_k : min_{e_j ∈ E_k} dist(h_i, e_j) < θ }|,

where E_k and H_k denote the k heaviest haloes in each catalog.
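This measure is straightforward to compute; the following Python sketch is a direct, brute-force implementation (quadratic in k, which is sufficient for k on the order of a few thousand).

import math

def q_similarity(h_centers, e_centers, theta):
    """Count ground-truth centers in H_k with an estimated center within theta."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    return sum(1 for h in h_centers
               if any(dist(h, e) < theta for e in e_centers))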

We compare the output of both streaming-based halo finders to the output of in-memory halo finders. We made comparisons for the 256³-, 512³-, and 1024³-particle simulations, finding the top 1000 and top 10000 heavy hitters. Since the comparison results in all cases were similar, the figures presented below are for the 256³ dataset and k = 1000.

Figure 5.7: (a) Measure of the disagreement between PD and CS and various in-memory algorithms. The percentage shown is the fraction of haloes farther than a half-cell diagonal (0.5√3 Mpc/h) from PD or CS halo positions. (b) The number of top-1000 FoF haloes farther than a distance d away from any top-1000 halo from the algorithm of each curve.

In Figure 5.7(a) we show, for each in-memory algorithm, the percentage of centers that were not found by the streaming-based halo finders. We can see that both the Count-Sketch and Pick-and-Drop algorithms missed not much more than 10 percent of the haloes in any of the results from the in-memory algorithms.

To understand whether the 10 percent means two halo catalogs are close to each other or not, we choose one of the in-memory algorithms as a ground truth and compare how close the other in-memory algorithms are. Again, we choose the FOF algorithm as the ground truth. The comparison is depicted in Fig. 5.7(b). From this graph one can see that the outputs of the Count-Sketch and Pick-and-Drop based halo finders are closer to the FOF haloes than the other in-memory algorithms. This is easily explained: since, after finding heavy cells, we apply the same FOF to these heavy cells and their neighborhoods, the output should always have a similar structure to the output of in-memory FOF on the full dataset. Also from this graph one can see that each line can be represented as a mix of two components, one of which is the component of a random distribution. It means that beyond a distance of √3/2 all matches are what we would obtain by simply placing points at random.

Figure 5.8: Number of detected halos by our two algorithms. The solid lines correspond to Count-Sketch (CS) and the dashed lines to Pick-and-Drop (PD). The dotted line at k = 1000 shows our selection criterion. The x-axis is the threshold in the number of particles allocated to the heavy hitter. The cyan color denotes the total number of detections, the blue curves are the true positives (TP), and the red curves are the false positives (FP).

The classifier uses a top-k cut to select the halo candidates. Figure 5.8 shows how sensitive the results are to the selection threshold of k = 1000. It shows several curves, including the total number of heavy hitters, the ones close to an FoF group – we can call these true positives (TP) – and the ones detected but not near an FoF object (false positives, FP). From the figure, it is clear that the threshold of 1000 is close to the optimal detection threshold, preserving TP and minimizing FP. This corresponds to a true positive detection rate (TPR) of 96% and a false positive detection rate (FPR) of 3.6%. If we lowered our threshold to k = 900, our TPR would drop to 91% but the FPR would become even lower, 0.88%.

Figure 5.9: This ROC curve shows the tradeoff between true and false detections as a function of the threshold. The figure plots TPR vs. FPR on a log-log scale. The two thresholds are shown with symbols: the circle denotes 1000, and the square 900.

These tradeoffs are shown in Fig. 5.9 by a so-called ROC curve (receiver operating characteristic), where the TPR is plotted against the FPR. This shows how lowering the detection threshold increases the true detections, but the false detection rate increases much faster. Using the ROC curve, we can see the position of the k = 1000 threshold as a circle and the k = 900 threshold as a square.

Figure 5.10: The top 1000 heavy hitters are rank-ordered by the number of their particles. We also computed the rank of the corresponding FoF halo. The linked pairs of ranks are plotted. One can see that if we adopted a cut at k = 900, it would eliminate a lot of the false positives.

Finally, we should also ask, beyond the set comparison, how the individual particle cardinalities counted around the heavy hitters correspond to the FoF ones. Our particle counting is restricted to neighboring cells, while FoF's is not, so we will always be undercounting. To be less sensitive to such biases, we compare the rank ordering of the two particle counts in the two samples in Fig. 5.10. Rank 1 is assigned to the most massive object in each set.

178 5.3.2 Stability

As most streaming algorithms utilize randomness, we estimate how stable our results are compared to the results from a deterministic search. In the deterministic search algorithm, we find the actual heavy cells by counting the number of particles inside them; we perform the comparison for the dataset containing 256³ particles. To perform this evaluation we run 50 instances of each algorithm (denoting the outputs as {C^i_cs}_{i=1}^{50} and {C^i_pd}_{i=1}^{50}). We also count the number of cells of each result that match the densest cells returned by the deterministic search algorithm, C_ds. The normalized numbers of matches are

ρ^i_pd = |C^i_pd ∩ C_ds| / |C_ds|   and   ρ^i_cs = |C^i_cs ∩ C_ds| / |C_ds|.

Our experiments showed:

µ(ρ^i_cs) = 0.946,  σ(ρ^i_cs) = 2.7 · 10⁻⁷

µ(ρ^i_pd) = 0.995,  σ(ρ^i_pd) = 6 · 10⁻⁷

This means that the approximation error caused by randomness is very small compared with the error caused by the transition from overdense cells to halo centers. This fact can also be seen in Fig. 5.11: the shaded area below and above the red and green lines, which represents the range of outputs among the 50 instances, is very thin. Thus the output is very stable.

5.3.3 Memory Usage

Compared with current halo-finding solutions, the streaming approach's low memory usage is one of its most significant advantages. To the best of our knowledge, even for the problem of locating the 1000 largest haloes in simulation data with 1024³ particles, there is no way to run other halo-finding algorithms on a regular PC, since 1024³ particles already need ≈ 12 GB of memory just to store all the particle coordinates; a computing cluster or even a supercomputer is necessary. Therefore, the application of streaming techniques introduces a new direction in the development of halo-finding algorithms.

Figure 5.11: Each line on the graph represents the top 1000 halo centers found with Pick-and-Drop sampling, Count-Sketch, and in-memory algorithms. The shaded area (too small to be visible) shows the variation due to randomness.

To find the top k heavy cells, Count-Sketch theoretically requires the following amount of space:

O( k·log(n/δ) + (∑_{q′=k+1}^{m} f_{q′}²) / (ϵ·f_k)² · log(n/δ) ),

where 1 − δ is the probability of success, ϵ is the estimation error of Q_k, and Q_k is the frequency of the k-th heaviest cell. It is worth mentioning that, in application to the heavy-cell searching problem, the second term is the dominating one. The first factor in the second term represents the linear dependency of memory usage on the heaviness of the top k cells. Thus we can expect linear memory usage for a small dataset, but as the dataset grows the dependency becomes logarithmic if we assume the same level of heaviness. Experiments verify this observation: for the small dataset with 256³ particles, the Count-Sketch algorithm used around 900 megabytes of memory, while for the large 1024³ dataset the memory usage increased to only nearly 1000 megabytes. Thus the memory grows logarithmically with the size of the dataset if we assume almost constant heaviness of the top k cells; that is why such an approach is scalable to even larger datasets.

In the experiments using this particular simulation data, Pick-and-Drop sampling shows much better performance in terms of memory usage than Count-Sketch. The actual memory usage was around 20 megabytes for the dataset with 256³ particles and around 30 megabytes for the dataset with 1024³ particles.

5.4 Chapter Summary

In this chapter we find a novel connection between the problem of finding the most massive haloes in cosmological N-body simulations and the problem of finding heavy hitters in data streams. Based on this link, we have built a halo finder on top of implementations of the Count-Sketch algorithm and Pick-and-Drop sampling. The halo finder successfully locates most (> 90%) of the k largest haloes using sub-linear memory. Most halo finders require the entire simulation to be loaded into memory, but our halo finder does not, and it could be run on the massive N-body simulations that are anticipated to arrive in the near future with relatively modest computing resources. We will continue to improve the performance of our halo finder, something we have not yet paid much attention to: in the first implementation evaluated here, we mainly focus on verifying precision rather than performance. Both Count-Sketch and Pick-and-Drop sampling can be parallelized further to achieve significantly better performance. The majority of the computation in Count-Sketch is spent evaluating its hash functions, so a straightforward way to improve performance is to take advantage of highly parallel GPU streaming processors for calculating a large number of hash values. Similarly, Pick-and-Drop sampling is also a good candidate for more parallelism, since the Pick-and-Drop instances run independently.

We also note that this halo finder finds only the k most massive haloes. These are features of interest in the simulation, but some further work is required for our methods to return a complete set of haloes, as an in-memory algorithm would.

Future work:

1. Optimize the current methods based on Count-Sketch and Pick-and-Drop sampling. Our goal is to provide a halo-finder tool that can run on desktop PCs or even laptops and provide comparably accurate results in a reasonable running time.

2. An application of interest to cosmologists would be to run a streaming algorithm similar to this one that includes velocity information; this is important in distinguishing small “subhaloes” from FoF-type haloes. Including additional attributes/dimensions in our clustering algorithms is quite easy and will be investigated in the near future.

Chapter 6

Conclusions and Future Work

Building a resource-efficient networked system is challenging, as one needs to optimize the usage of various kinds of resources, including memory, CPU, cache, and external storage, across diverse hardware. This dissertation is motivated by the need for fast and memory-efficient systems for computation-heavy tasks in the contexts of network monitoring, graph analytics, and astrophysics. One key observation is that 100% accuracy may not be necessary for many computation- or memory-heavy tasks, and thus we can summarize the data using sketches with some accuracy loss. The sketches lead to significant gains in memory and processing efficiency. In conclusion, we briefly summarize the main contributions of the work presented in this dissertation before highlighting some potential avenues for future work.

6.1 Summary of Contributions

We demonstrate the benefits of sketching-based design in multiple networked systems. With UnivMon and NitroSketch, we take a step toward building a robust monitoring system on both hardware and software platforms with guaranteed accuracy for any workload. With ASAP, we show that a graph pattern mining system that works on large graphs with arbitrary graph patterns may well be within our reach. With the streaming halo finder, we make it possible to handle large-scale halo finding from N-body simulation data on a single laptop. Specifically, we make the following contributions.

Bottleneck Analysis: Before proposing any algorithmic optimizations to existing solutions, we conduct bottleneck analysis to find out what the real bottlenecks are. In NitroSketch, we instrument a number of sketching-based measurement algorithms on two popular software switches (Open vSwitch and VPP) and carefully model the performance bottlenecks in these algorithms.

Algorithmic Design: We design efficient sketching techniques that optimize the network monitoring modules on both hardware and software platforms.

In hardware, we build a universal monitoring framework named UnivMon that supports a range of measurement tasks simultaneously with bounded error and memory. Formally, UnivMon works for any function defined over the frequency vector of the data, as long as the function is bounded by the second frequency moment. UnivMon is also late-binding: the metrics to measure do not have to be specified beforehand. The evaluation shows that UnivMon achieves good memory efficiency while maintaining accuracy comparable to state-of-the-art custom algorithms.
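To give a sense of the construction, the following Python sketch illustrates the layered-sampling structure behind the universal sketch under simplifying assumptions: each level keeps exact top-k counters instead of a Count-Sketch, the input is a re-iterable list of flow keys, and one pass is made per level rather than a single pass updating all levels in parallel. It is meant only to show how level-wise heavy hitters feed the recursive estimator of a function g, not the P4 data-plane design; the function and parameter names are illustrative.

```python
import random
from collections import Counter


def universal_estimate(stream, g, levels=16, k=100, seed=7):
    """Illustrative universal-sketch estimate of sum_i g(f_i) over `stream`,
    a list of flow keys.  Level j keeps only flows that survive j independent
    sub-sampling hashes; each level tracks its top-k flows (exact counters
    here for brevity, Count-Sketch in the real system)."""
    rng = random.Random(seed)
    salts = [rng.getrandbits(64) for _ in range(levels)]

    def survives(flow, level):
        # A flow survives level j iff its first j sampling hashes are all 1.
        return all(hash((salts[j], flow)) & 1 for j in range(level))

    # Per-level heavy hitters (top-k by exact count in this toy version).
    per_level = [dict(Counter(x for x in stream if survives(x, j)).most_common(k))
                 for j in range(levels)]

    # Recursive estimator: start at the sparsest level, then combine downward;
    # heavy hitters that survive into level j+1 get coefficient -1, others +1.
    y = sum(g(f) for f in per_level[-1].values())
    for j in range(levels - 2, -1, -1):
        correction = sum((-1 if survives(flow, j + 1) else 1) * g(f)
                         for flow, f in per_level[j].items())
        y = 2 * y + correction
    return y


# e.g. a second-frequency-moment estimate: universal_estimate(flows, lambda f: f * f)
```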

In software, we propose NitroSketch to address the performance bottlenecks identified by our profiling and to minimize the per-packet processing overhead. We add a geometric-sampling-based frontend to existing sketching algorithms and reduce the per-packet hash computation to o(1). We formally prove that, by trading a small increase in memory usage, NitroSketch maintains the same error guarantees as existing sketching algorithms under arbitrary workloads. Our evaluation shows that we push the processing performance to the 40 Gbps limit of virtual switches using only a single CPU core.
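A minimal Python illustration of the geometric-sampling frontend is sketched below, under simplifying assumptions: one geometric countdown per sketch row, Python's built-in hash in place of the AVX2-accelerated hashes, and no top-k tracking. Rows skip the hashing entirely on the common path and scale the sampled updates by 1/p to remain unbiased; the exact sampling granularity in NitroSketch differs.

```python
import math
import random


class NitroFrontend:
    """Illustrative geometric-sampling frontend over a Count-Sketch-style table."""

    def __init__(self, depth=5, width=1 << 15, p=0.05, seed=1):
        self.rng = random.Random(seed)
        self.depth, self.width, self.p = depth, width, p
        self.table = [[0.0] * width for _ in range(depth)]
        self.seeds = [self.rng.getrandbits(64) for _ in range(depth)]
        self.sign_seeds = [self.rng.getrandbits(64) for _ in range(depth)]
        # Per-row countdown of packets to skip before the next sampled update.
        self.next_gap = [self._geometric() for _ in range(depth)]

    def _geometric(self):
        # Geometric(p) gap: number of packets skipped before the next sample.
        return int(math.log(1.0 - self.rng.random()) / math.log(1.0 - self.p))

    def update(self, key):
        for r in range(self.depth):
            if self.next_gap[r] > 0:
                self.next_gap[r] -= 1              # common path: no hashing at all
                continue
            b = hash((self.seeds[r], key)) % self.width
            s = 1 if hash((self.sign_seeds[r], key)) & 1 else -1
            self.table[r][b] += s / self.p         # scale by 1/p to stay unbiased
            self.next_gap[r] = self._geometric()

    def estimate(self, key):
        vals = sorted(
            (1 if hash((self.sign_seeds[r], key)) & 1 else -1)
            * self.table[r][hash((self.seeds[r], key)) % self.width]
            for r in range(self.depth))
        return vals[len(vals) // 2]                # median over rows
```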

System Design and Implementation: In UnivMon, we map our data-plane algorithm to P4 and address several practical issues due to hardware limitations, including how to remove the requirement of storing the top K flow keys in the data plane and how to efficiently compute parallel hash functions with distinct seeds. In NitroSketch, we demonstrate our complete measurement framework by implementing it on Open vSwitch and FD.io-VPP. We optimize the software implementations by deploying a cached pseudo-random number generator, exploiting AVX2 for parallel hash computations, and using probabilistic priority-queue updates.

In ASAP, we design a distributed graph pattern miner based on an efficient sketching technique, neighborhood sampling. We extend the technique to support general patterns in a distributed setting and prove the error bounds. We build ASAP atop Apache Spark and optimize runtime performance with efficient hash-table constructions, estimator caching, and an accurate ELP (error-latency profile).
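For intuition, the following is a minimal single-machine neighborhood-sampling estimator for triangle counts over an edge list; averaging many independent estimators mirrors how ASAP runs estimators in parallel, but the distributed execution, general patterns, and the ELP are beyond this sketch, and the function name and parameters are illustrative only.

```python
import random


def triangle_estimate(edges, num_estimators=1000, seed=0):
    """Illustrative neighborhood-sampling triangle estimator over an edge list.

    Each estimator reservoir-samples a first edge e1, reservoir-samples a
    second edge e2 among the edges adjacent to e1 that arrive later, and
    checks whether a subsequent edge closes the wedge (e1, e2).  A closed
    wedge contributes m * |N(e1)| (the inverse sampling probability);
    averaging the independent estimators gives the triangle-count estimate."""
    rng = random.Random(seed)
    m = len(edges)
    total = 0.0
    for _ in range(num_estimators):
        e1 = e2 = None
        adj_seen = 0          # |N(e1)|: edges adjacent to e1 seen after e1
        closed = False
        for i, edge in enumerate(edges):
            if rng.random() < 1.0 / (i + 1):
                e1, e2, adj_seen, closed = edge, None, 0, False   # new first edge
                continue
            if e1 is None:
                continue
            u, v = edge
            # Does this edge close the wedge formed by the current (e1, e2)?
            if e2 is not None:
                wedge = set(e1) | set(e2)
                if len(wedge) == 3 and {u, v} == wedge - (set(e1) & set(e2)):
                    closed = True
            # Reservoir-sample e2 among edges adjacent to e1.
            if u in e1 or v in e1:
                adj_seen += 1
                if rng.random() < 1.0 / adj_seen:
                    e2, closed = edge, False       # new wedge, not yet closed
        if closed:
            total += m * adj_seen
    return total / num_estimators
```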

For the streaming halo finder, we build a single-machine system with two heavy-hitter algorithms, Count-Sketch and Pick-and-Drop sampling. We optimize memory efficiency and running time by reducing priority-queue operations and parallelizing hash computations.

6.2 Potential Limitations

Multidimensionality:

The sketching algorithms used in Chapters 2 and 3 focus on handling one-dimensional data (i.e., any one of the 5-tuple fields or any one combination of them). If we want to measure the network in a multidimensional fashion, e.g., find the top K superspreaders (the top K source IPs that send traffic to the largest number of distinct destination IPs), we need a multidimensional version of UnivMon. It is unclear how to build a memory-efficient UnivMon construction that handles multidimensional data.

Scalability:

Efficient network measurement at larger scale remains an issue: can our monitoring systems scale to a larger network topology with hundreds of nodes or more? In UnivMon, we propose a simple network-wide solution by solving an ILP formulation, but it might be hard to achieve satisfactory accuracy at a larger scale. We need to handle critical issues such as errors caused by multi-path, multi-counting, and rerouting. It is also complicated to give good accuracy guarantees on heterogeneous topologies. One possible direction to address this issue is to disseminate the sketches to critical nodes and improve the mergeability of UnivMon using the idea of mergeable data sketches [182].
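The mergeability property [182] that this direction relies on is easy to state for linear sketches: two tables built with identical dimensions and hash seeds can be combined by element-wise addition, and the merged table answers queries as if it had observed the union of both traffic streams. The Count-Min illustration below is a minimal sketch of that idea, not the UnivMon data plane; the class and seed handling are assumptions made for this example.

```python
import random


class CountMin:
    """Minimal Count-Min sketch; two instances with the same seeds are mergeable."""

    def __init__(self, depth=4, width=1 << 12, seeds=None):
        self.depth, self.width = depth, width
        self.seeds = seeds or [random.getrandbits(64) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def update(self, key, count=1):
        for r, s in enumerate(self.seeds):
            self.table[r][hash((s, key)) % self.width] += count

    def query(self, key):
        return min(self.table[r][hash((s, key)) % self.width]
                   for r, s in enumerate(self.seeds))

    def merge(self, other):
        """Element-wise addition; valid only if dimensions and seeds match."""
        assert self.seeds == other.seeds and self.width == other.width
        for r in range(self.depth):
            for c in range(self.width):
                self.table[r][c] += other.table[r][c]
        return self


# Usage sketch: one sketch per switch, merged at a collector.
# shared = [random.getrandbits(64) for _ in range(4)]
# sw1, sw2 = CountMin(seeds=shared), CountMin(seeds=shared)
# ... sw1.update(flow) / sw2.update(flow) on each switch ...
# network_wide = sw1.merge(sw2)     # now answers queries over both streams
```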

Handling large general patterns (e.g., patterns with 6 nodes or more) on a large graph may still be hard. Due to the complexity of large patterns, the number of estimators needed to achieve good accuracy (say, 5% error) could be too large to fit into a small cluster. To improve the scalability in this case, we need to explore better sampling techniques.

Flow identities:

In programmable hardware switches, there is no efficient way to store the identities of the most frequent flows, because sketch data structures preserve only the counts of network flows, not the flow identities. In theory, we can maintain a priority queue alongside the sketch to store the identities of the most frequent flows, which gives the best memory efficiency. Unfortunately, it is infeasible for existing programmable ASICs to support exact priority-queue operations, and nontrivial effort is needed to enable even approximate operations on a priority queue with a small number of entries. To avoid using a priority queue, one potential technique is to add reversible sketches that approximately invert the hash functions [9].
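In software, the priority-queue-alongside-the-sketch bookkeeping looks roughly as follows; this per-packet ordered update is cheap on a CPU but is exactly the operation that today's programmable ASICs cannot support, which motivates the reversible-sketch alternative. The sketch interface (update/query) and the parameters below are assumptions for illustration, e.g. the Count-Min class sketched earlier.

```python
import heapq


def track_heavy_flow_keys(packets, sketch, k=128):
    """Keep the identities of (approximately) the k most frequent flows.

    `sketch` is any object with update(key) and query(key) -> estimate.
    A bounded min-heap, keyed by the estimate at insertion time, stores the
    candidate flow identities alongside the sketch counters.
    """
    heap = []                 # (estimate_at_insertion, flow_key)
    members = set()
    for key in packets:
        sketch.update(key)
        est = sketch.query(key)
        if key in members:
            continue          # identity already stored; its rank is refreshed lazily
        if len(heap) < k:
            heapq.heappush(heap, (est, key))
            members.add(key)
        elif est > heap[0][0]:
            _, evicted = heapq.heapreplace(heap, (est, key))
            members.discard(evicted)
            members.add(key)
    return sorted(members, key=sketch.query, reverse=True)
```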

Sensitivity to parameters:

The actual accuracy of sketching algorithms depends heavily on the input parameters. For different workload patterns, the parameters of a sketching algorithm determine the number of independent hash trials and the probability of collisions. There is (very likely) no single parameter setting that works for all workloads, and it is not straightforward for network administrators or users to pick appropriate parameters to obtain the best possible accuracy.
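As a point of reference for how parameters map to guarantees, the standard Count-Min analysis sets the table width to about e/ϵ and the depth to about ln(1/δ); the small helper below makes that mapping explicit. Choosing ϵ itself, however, still depends on the workload's skew and the metric of interest, which is exactly the difficulty described above.

```python
import math


def count_min_dimensions(epsilon, delta):
    """Table dimensions for the standard Count-Min guarantee: with probability
    at least 1 - delta, every estimate overshoots the true count by at most
    epsilon times the total stream size."""
    width = math.ceil(math.e / epsilon)
    depth = math.ceil(math.log(1.0 / delta))
    return depth, width


# e.g. count_min_dimensions(0.0001, 0.01) -> (5, 27183)
```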

6.3 Future Work

Can we further optimize the performance in hardware:

Network monitoring is just one of the modules running on a switch. Our work UnivMon is memory-efficient, but UnivMon's per-packet operations occupy many processing stages on a programmable switch. The number of processing stages is limited, and we want to leave enough room for other concurrent network functions. Thus, we will further explore ways to improve the efficiency of UnivMon's stateful operations.

Sketching primitives are useful but can we make them more expressive:

We obtain traffic metrics from UnivMon, NitroSketch, or other sketching-based primitives. These metrics are important but may not be what users ultimately want, since they are not very expressive. The question is whether we can make sketching-based measurement systems more expressive so that users can write SQL-like queries against the system. Users would then not be required to understand which underlying metrics they need in order to understand the workload; they would just issue an expressive query instead.

Measuring traffic metrics is nice but can we go a step further:

In network monitoring, traffic metrics such as heavy hitters, entropy, and traffic change are important information collected from the workload. A natural question is what the next step should be in utilizing these metrics. We need more concrete use cases to demonstrate why these metrics are important; to this end, we will explore the use of these statistics in the detection and diagnosis of DDoS attacks.

ASAP handles static graphs; can we support large evolving graphs:

Evolving graphs pose significant challenges to the existing system design of ASAP. Since the core algorithm behind ASAP is an advanced sampling technique, a set of edge additions or deletions will break the randomness of each estimator in ASAP, which invalidates the proofs. Naively, to fix the theory, we would need to maintain uniform randomness and update the state of every estimator for each edge. This simple fix increases the per-edge computation to O(r) for r estimators, which is infeasible for large graphs. We will work on designing a better update mechanism that reduces the per-edge computation.

Bibliography

[1] E. Team, “insidebigdata guide to use of big data on an industrial scale,” insideBIGDATA, 2017.

[2] B. Claise, “Cisco Systems NetFlow Services Export Version 9,” RFC 3954, 2004.

[3] M. Wang, B. Li, and Z. Li, in Proc. of ICDCS, 2004.

[4] G. Cormode and S. Muthukrishnan, “An Improved Data Stream Sum-

mary: The Count-min Sketch and Its Applications,” J. Algorithms, 2005.

[5] A. Metwally, D. Agrawal, and A. E. Abbadi, “Efficient computation of frequent and top-k elements in data streams,” in Proc. of ICDT, 2005.

[6] M. Charikar, K. Chen, and M. Farach-Colton, “Finding frequent items in data streams,” 2002.

[7] Z. Bar-Yossef, T. S. Jayram, R. Kumar, D. Sivakumar, and L. Trevisan, “Counting distinct elements in a data stream,” in Proc. of RANDOM,

2002.

[8] B. Krishnamurthy, S. Sen, Y. Zhang, and Y. Chen, “Sketch-based change

detection: Methods, evaluation, and applications,” in Proc. of IMC, 2003.

[9] R. Schweller, A. Gupta, E. Parsons, and Y. Chen, “Reversible sketches for efficient and accurate change detection over network data streams,”

in Proc. of IMC, 2004.

[10] A. Lall, V. Sekar, M. Ogihara, J. Xu, and H. Zhang, “Data streaming algorithms for estimating entropy of network traffic,” in Proc. of SIG- METRICS/Performance, 2006.

[11] M. Yu, L. Jose, and R. Miao, “Software defined traffic measurement with

opensketch,” in Proc. of NSDI, 2013.

[12] I. Robinson, J. Webber, and E. Eifrem, Graph Databases. O’Reilly Media, Inc., 2013.

[13] C. H. C. Teixeira, A. J. Fonseca, M. Serafini, G. Siganos, M. J. Zaki, and A. Aboulnaga, “Arabesque: A system for distributed graph mining,” in

Proc. of SOSP, 2015.

[14] C. C. Aggarwal and H. Wang, Eds., Managing and Mining Graph Data, ser. Advances in Database Systems. Springer, 2010.

[15] Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein, “Graphlab: A new framework for parallel machine

learning.” in UAI, P. Grünwald and P. Spirtes, Eds. AUAI Press, 2010, pp. 340–349. [Online]. Available: http://dblp.uni-trier.de/db/conf/ uai/uai2010.html#LowGKBGH10

[16] J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, “Powergraph: Distributed graph-parallel computation on natural graphs,” in Proc. of OSDI, Berkeley, CA, USA, 2012.

[17] A. Kyrola, G. Blelloch, and C. Guestrin, “Graphchi: Large-scale graph computation on just a pc,” in Proc. of OSDI, Hollywood, CA, 2012.

[18] A. Roy, I. Mihailovic, and W. Zwaenepoel, “X-stream: Edge-centric graph processing using streaming partitions,” in Proc. of SOSP, 2013.

[19] J. Gonzalez, R. Xin, A. Dave, D. Crankshaw, M. Franklin, and I. Stoica,

“Graphx: Graph processing in a distributed dataflow framework,” in Proc. of OSDI, 2014.

[20] A. Quamar, A. Deshpande, and J. Lin, “Nscale: Neighborhood-centric

large-scale graph analytics in the cloud,” 2016.

[21] A. Roy, L. Bindschaedler, J. Malicevic, and W. Zwaenepoel, “Chaos: Scale-out graph processing from secondary storage,” in Proc. of SOSP, 2015.

[22] G. Wang, W. Xie, A. J. Demers, and J. Gehrke, “Asynchronous large-scale graph processing made easy.” in Proc. of CIDR, 2013.

[23] A. Buluç and J. R. Gilbert, “The combinatorial BLAS: design, implemen-

tation, and applications,” IJHPCA, 2011.

[24] R. Chen, J. Shi, Y. Chen, and H. Chen, “Powerlyra: Differentiated graph computation and partitioning on skewed graphs,” in Proc. of EuroSys,

2015.

[25] M. Zhang, Y. Wu, Y. Zhuo, X. Qian, C. Huan, and K. Chen, “Wonderland: A novel abstraction-based out-of-core graph processing system,” in Proc.

of ASPLOS, 2018.

[26] Z. Liu, A. Manousis, G. Vorsanger, V. Sekar, and V. Braverman, “One sketch to rule them all: Rethinking network flow monitoring with univ-

mon,” in Proc. of SIGCOMM, 2016.

[27] A. P. Iyer, Z. Liu, X. Jin, S. Venkataraman, V. Braverman, and I. Stoica, “ASAP: Fast, approximate graph pattern mining at scale,” in Proc. of OSDI, 2018.

[28] Z. Liu, N. Ivkin, L. Yang, M. Neyrinck, G. Lemson, A. Szalay, V. Braver-

man, T. Budavari, R. Burns, and X. Wang, “Streaming algorithms for halo finders,” in Proc. of e-Science, 2015.

[29] A. Knebe, F. R. Pearce, H. Lux, Y. Ascasibar, P. Behroozi, J. Casado, C. C. Moran, J. Diemand, and K. Dolag, “Structure finding in cosmological

simulations: the state of affairs,” MNRAS, vol. 435, pp. 1618–1658, Oct. 2013.

[30] A. Feldmann, A. Greenberg, C. Lund, N. Reingold, J. Rexford, and F. True, “Deriving traffic demands for operational ip networks: Method-

ology and experience,” IEEE/ACM TON, 2001.

[31] T. Benson, A. Anand, A. Akella, and M. Zhang, “Microte: Fine grained traffic engineering for data centers,” in Proc. of ACM CoNEXT, 2011.

[32] Y. Zhang, “An adaptive flow counting method for anomaly detection in sdn,” in Proc. of ACM CoNEXT, 2013.

[33] Y. Xie, V. Sekar, D. A. Maltz, M. K. Reiter, and H. Zhang, “Worm origin

identification using random moonwalks,” in Proc. of S&P, 2005.

[34] A. Kumar, M. Sung, J. J. Xu, and J. Wang, “Data streaming algorithms for efficient and accurate estimation of flow size distribution,” in Proc.

of SIGMETRICS, 2004.

[35] N. Bandi, A. Metwally, D. Agrawal, and A. El Abbadi, “Fast data stream algorithms using associative memories,” in Proc. of SIGMOD, 2007.

[36] H. C. Zhao, A. Lall, M. Ogihara, O. Spatscheck, J. Wang, and J. Xu, “A data streaming algorithm for estimating entropies of od flows,” in Proc. of IMC, 2007.

[37] N. Duffield, C. Lund, and M. Thorup, “Estimating flow distributions from sampled flow statistics,” in Proc. of SIGCOMM, 2003.

[38] A. Ramachandran, S. Seetharaman, N. Feamster, and V. Vazirani, “Fast monitoring of traffic subpopulations,” in Proc. of IMC, 2008.

[39] C. Estan and G. Varghese, “New directions in traffic measurement and accounting,” in Proc. of SIGCOMM, 2002.

[40] B. Krishnamurthy, S. Sen, Y. Zhang, and Y. Chen, “Sketch-based change

detection: methods, evaluation, and applications,” 2003.

[41] V. Braverman and R. Ostrovsky, “Approximating large frequency moments with pick-and-drop sampling,” in Proc. of APPROX/RANDOM,

2013.

[42] V. Braverman, J. Katzman, C. Seidell, and G. Vorsanger, “An optimal algorithm for large frequency moments using O(n^(1-2/k)) bits,” in Proc.

of APPROX/RANDOM, 2014.

[43] V. Braverman, Z. Liu, T. Singh, N. V. Vinodchandran, and L. F. Yang, “New bounds for the CLIQUE-GAP problem using graph decomposition

theory,” in Proc. of MFCS, 2015.

[44] M. Yu, L. Jose, and R. Miao, “Software defined traffic measurement with opensketch,” in Proc., ser. NSDI, 2013. [Online]. Available: http://dl.acm.org/citation.cfm?id=2482626.2482631

[45] M. Moshref, M. Yu, R. Govindan, and A. Vahdat, “SCREAM: Sketch

Resource Allocation for Software-defined Measurement,” in Proc., CoNEXT, 2015.

[46] P. Indyk, A. McGregor, I. Newman, and K. Onak, “Open problems in data streams, property testing, and related topics,” 2011.

[47] V. Sekar, M. K. Reiter, and H. Zhang, “Revisiting the case for a minimalist

approach for network flow monitoring,” in Proc. of IMC, 2010.

[48] H. C. Zhao, A. Lall, M. Ogihara, and J. J. Xu, “Global iceberg detection over distributed data streams,” in Proc. of ICDE, 2010.

[49] V. Braverman and R. Ostrovsky, “Zero-one frequency laws,” in Proc. of STOC, 2010.

[50] ——, “Generalizing the layering method of indyk and woodruff: Re-

cursive sketches for frequency-based vectors on streams,” in Proc. of APPROX/RANDOM, 2013.

[51] V. Braverman, S. R. Chestnut, D. P. Woodruff, and L. F. Yang, “Streaming space complexity of nearly all functions of one variable on frequency

vectors,” in Proc. of PODS, 2016.

[52] V. Braverman, R. Ostrovsky, and A. Roytman, “Zero-one laws for sliding windows and universal sketches,” in Proc. of APPROX/RANDOM, 2015.

[53] V. Braverman, S. R. Chestnut, R. Krauthgamer, and L. F. Yang, “Stream- ing symmetric norms via measure concentration,” CoRR, 2015.

[54] Z. Liu, G. Vorsanger, V. Braverman, and V. Sekar, “Enabling a "risc"

approach for software-defined monitoring using universal streaming,” in Proc. of HotNets, 2015.

[55] P. Bosshart, D. Daly, G. Gibb, M. Izzard, N. McKeown, J. Rexford, C. Schlesinger, D. Talayco, A. Vahdat, G. Varghese, and D. Walker, “P4:

Programming protocol-independent packet processors,” SIGCOMM Comput. Commun. Rev., 2014.

[56] N. Kang, Z. Liu, J. Rexford, and D. Walker, “Optimizing the "one big switch" abstraction in software-defined networks,” in Proc. of CoNEXT,

2013.

[57] “Caida internet traces 2014 sanjose,” http://goo.gl/uP5aqG.

[58] “The caida ucsd anonymized internet traces 2015,” http://www.caida.org/data/passive/passive_2015_dataset.xml.

[59] N. Alon, Y. Matias, and M. Szegedy, “The space complexity of approximating the frequency moments,” in Proc., ser. STOC, 1996.

[Online]. Available: http://doi.acm.org/10.1145/237814.237823

[60] M. Datar, A. Gionis, P. Indyk, and R. Motwani, “Maintaining stream statistics over sliding windows,” SIAM J. Comput., 2002.

[61] G. Cormode and S. Muthukrishnan, “An improved data stream sum- mary: The count-min sketch and its applications,” J. Algorithms, 2005.

[62] B. Pfaff, J. Pettit, T. Koponen, K. Amidon, M. Casado, and S. Shenker,

“Extending networking into the virtualization layer,” in Proc. of HotNets, 2009.

[63] P. Bosshart, G. Gibb, H.-S. Kim, G. Varghese, N. McKeown, M. Izzard, F. Mujica, and M. Horowitz, “Forwarding metamorphosis: Fast pro-

grammable match-action processing in hardware for sdn,” in Proc. of ACM SIGCOMM, 2013.

[64] “Intel flexpipe,” http://goo.gl/H5qPP2.

[65] “P4 specification,” http://goo.gl/5ttjpA.

[66] S. Dasgupta and A. Gupta, “An elementary proof of a theorem of john- son and lindenstrauss,” Random Struct. Algorithms, 2003.

[67] V. Braverman and S. R. Chestnut, “Universal Sketches for the Frequency

Negative Moments and Other Decreasing Streaming Sums,” in Proc. of APPROX/RANDOM, 2015.

[68] A. Chakrabarti, S. Khot, and X. Sun, “Near-optimal lower bounds on the multi-party communication complexity of set disjointness.” in Proc.

of IEEE CCC, 2003.

[69] “Why big data needs big buffer switches,” https://goo.gl/ejWUIq.

[70] “Netfpga technical specifications,” http://netfpga.org/1G_specs.html.

[71] “P4 behavioral simulator,” https://github.com/p4lang/p4factory.

[72] “Opensketch simulation library,” https://goo.gl/kyQ80q.

[73] R. Dementiev, T. Willhalm, O. Bruggeman, P. Fay, P. Ungerer, A. Ott, P. Lu, J. Harris, P. Kerly, P. Konsor, A. Semin, M. Kanaly, R. Brazones, and R. Shah, “Intel performance counter monitor - a better way to measure

cpu utilization,” http://goo.gl/tQ5gxa.

[74] S. Knight, H. Nguyen, N. Falkner, R. Bowden, and M. Roughan, “The internet topology zoo,” Selected Areas in Communications, IEEE Journal on, 2011.

[75] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson,

J. Rexford, S. Shenker, and J. Turner, “Openflow: Enabling innovation in campus networks,” SIGCOMM Comput. Commun. Rev., 2008.

[76] L. Yuan, C.-N. Chuah, and P. Mohapatra, “Progme: towards pro- grammable network measurement,” IEEE/ACM TON, 2011.

[77] M. Alizadeh, T. Edsall, S. Dharmapurikar, R. Vaidyanathan, K. Chu,

A. Fingerhut, V. T. Lam, F. Matus, R. Pan, N. Yadav, and G. Varghese,

“CONGA: Distributed Congestion-aware Load Balancing for Datacenters,” in Proc. of SIGCOMM, 2014.

[78] M. Alizadeh, S. Yang, M. Sharif, S. Katti, N. McKeown, B. Prabhakar,

and S. Shenker, “pFabric: Minimal Near-optimal Datacenter Transport,” in Proc. of SIGCOMM, 2013.

[79] T. Benson, A. Anand, A. Akella, and M. Zhang, “MicroTE: Fine Grained Traffic Engineering for Data Centers,” in Proc. of ACM CoNEXT, 2011.

[80] A. R. Curtis, J. C. Mogul, J. Tourrilhes, P. Yalagandula, P. Sharma,

and S. Banerjee, “DevoFlow: Scaling Flow Management for High- performance Networks,” in Proc. of SIGCOMM, 2011.

[81] P. Garcia-Teodoro, J. E. Diaz-Verdejo, G. Macia-Fernandez, and E. Vazquez, “Anomaly-Based Network Intrusion Detection: Techniques,

Systems and Challenges,” Computers and Security, 2009.

[82] A. Kabbani, M. Alizadeh, M. Yasuda, R. Pan, and B. Prabhakar, “AF- QCN: Approximate Fairness with Quantized Congestion Notification for Multi-tenanted Data Centers,” in Proc. of IEEE HOTI, 2010.

[83] L. Ying, R. Srikant, and X. Kang, “The Power of Slightly More than One

Sample in Randomized Load Balancing,” in Proc. of IEEE INFOCOM, 2015.

[84] B. Pfaff, J. Pettit, T. Koponen, E. Jackson, A. Zhou, J. Rajahalme, J. Gross, A. Wang, J. Stringer, P. Shelar, K. Amidon, and M. Casado, “The Design

and Implementation of Open vSwitch,” in Proc. of NSDI, 2015.

[85] “Microsoft Hyper-V virtual switch,” https://technet.microsoft.com/en-us/library/hh831823.aspx.

[86] “Cisco Nexus 1000V switch for VMware vSphere,” https://www.cisco.com/c/en/us/products/switches/nexus-1000v-switch-vmware-vsphere/index.html.

[87] “Fd.io vector packet processing,” https://fd.io/technology/.

[88] J. Misra and D. Gries, “Finding repeated elements,” Tech. Rep., 1982.

[89] V. Sivaraman, S. Narayana, O. Rottenstreich, S. Muthukrishnan, and J. Rexford, “Heavy-hitter detection entirely in the data plane,” in Proc.

of ACM SOSR, 2017.

[90] R. B. Basat, G. Einziger, R. Friedman, M. C. Luizelli, and E. Waisbard, “Constant time updates in hierarchical heavy hitters,” ACM SIGCOMM and CoRR/1707.06778, 2017.

[91] N. Alon, Y. Matias, and M. Szegedy, “The space complexity of approxi-

mating the frequency moments,” in Proc. of STOC, 1996.

[92] O. Alipourfard, M. Moshref, and M. Yu, “Re-evaluating measurement algorithms in software,” in Proc. of HotNets, 2015.

[93] O. Alipourfard, M. Moshref, Y. Zhou, T. Yang, and M. Yu, “A comparison of performance and accuracy of measurement algorithms in software,”

in Proc. of SOSR, 2018.

[94] Q. Huang, X. Jin, P. P. C. Lee, R. Li, L. Tang, Y.-C. Chen, and G. Zhang, “Sketchvisor: Robust network measurement for software packet process-

ing,” in Proc. of SIGCOMM, 2017.

[95] “The caida ucsd anonymized internet traces 2016 - equinix-chicago,” http://www.caida.org/data/passive/passive_2016_dataset.xml.

[96] “Capture traces from mid-atlantic ccdc 2012,” http://www.netresec.com/?page=MACCDC.

[97] T. Benson, A. Akella, and D. A. Maltz, “Network traffic characteristics

of data centers in the wild,” in Proc. of IMC, 2010.

[98] N. Hua, B. Lin, J. J. Xu, and H. C. Zhao, “Brick: A novel exact active statistics counter architecture,” in Proc. of ACM/IEEE ANCS, 2008.

[99] L. Yang, W. Hao, P. Tian, D. Huichen, L. Jianyuan, and L. Bin, “Case:

Cache-assisted stretchable estimator for high speed per-flow measure- ment,” in Proc. of IEEE INFOCOM, 2016.

[100] Y. Lu, A. Montanari, B. Prabhakar, S. Dharmapurikar, and A. Kabbani,

“Counter braids: a novel counter architecture for per-flow measurement,” in ACM SIGMETRICS Performance Evaluation Review, 2008.

[101] “Data plane developer kit (dpdk),” https://software.intel.com/en-us/

networking/dpdk.

[102] “Intel ethernet controller 710 series datasheet,” https://www.intel. com/content/dam/www/public/us/en/documents/datasheets/

xl710-10-40-controller-datasheet.pdf.

[103] “Intel VTune Amplifier,” https://software.intel.com/en-us/intel-vtune-amplifier-xe.

[104] P. Emmerich, S. Gallenmüller, D. Raumer, F. Wohlfart, and G. Carle,

“Moongen: A scriptable high-speed packet generator,” in Proc. of IMC, 2015.

[105] J. Matoušek and J. Vondrák, “The Probabilistic Method - Lecture Notes,” https://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15859-f09/www/handouts/matousek-vondrak-prob-ln.pdf.

[106] W. Feller, “Generalization of a probability limit theorem of cramér,” Transactions of the American Mathematical Society, 1943.

[107] E. V. Slud, “Distribution inequalities for the binomial law,” The Annals of Probability, 1977.

[108] R. D. Gordon, “Values of mills’ ratio of area to bounding ordinate and

of the normal probability integral for large values of the argument,” The Annals of Mathematical Statistics, 1941.

[109] “xxhash library,” http://www.xxhash.com/.

[110] “Intel advanced vector extensions,” https://software.intel.com/en-us/ isa-extensions/intel-avx.

[111] “Fast concurrent queue,” https://github.com/cameron314/

readerwriterqueue.

[112] “Tcpdump and libpcap,” http://www.tcpdump.org.

[113] “Wireshark,” https://www.wireshark.org.

[114] “Graph DBMS increased their popularity by 500% within the last 2 years,” http://db-engines.com/en/blog_post//43.

[115] “Enterprise DBMS, Q1 2014,” https://www.forrester.com/report/

TechRadar+Enterprise+DBMS+Q1+2014/-/E-RES106801.

[116] L. Page, S. Brin, R. Motwani, and T. Winograd, “The pagerank citation ranking: Bringing order to the web.” Stanford InfoLab, Technical Report, 1999.

[117] S. Fortunato, “Community detection in graphs,” Physics reports, 2010.

[118] X. Zhu and Z. Ghahramani, “Learning from labeled and unlabeled data

with label propagation,” Technical Report CMU-CALD-02-107, 2002.

[119] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, “Network motifs: simple building blocks of complex networks,” Science, 2002.

[120] X. Yan and J. Han, “gspan: Graph-based substructure pattern mining,”

in Proc. of ICDM, 2002.

[121] C. Bron and J. Kerbosch, “Algorithm 457: finding all cliques of an undirected graph,” Communications of the ACM, 1973.

[122] N. Pržulj, D. G. Corneil, and I. Jurisica, “Modeling interactome: Scale- free or geometric?” Bioinformatics, 2004.

[123] R. Cheng, J. Hong, A. Kyrola, Y. Miao, X. Weng, M. Wu, F. Yang, L. Zhou, F. Zhao, and E. Chen, “Kineograph: Taking the pulse of a fast-changing

and connected world,” in Proc. of EuroSys, 2012.

[124] W. Han, Y. Miao, K. Li, M. Wu, F. Yang, L. Zhou, V. Prabhakaran, W. Chen, and E. Chen, “Chronos: A graph engine for temporal graph analysis,” in Proc. of EuroSys, 2014.

[125] D. G. Murray, F. McSherry, R. Isaacs, M. Isard, P. Barham, and M. Abadi,

“Naiad: A timely dataflow system,” in Proc. of SOSP, 2013.

[126] P. Macko, V. J. Marathe, D. W. Margo, and M. I. Seltzer, “Llama: Efficient graph analytics using large multiversioned arrays,” in Proc. of ICDE, 2015.

[127] A. Ching, S. Edunov, M. Kabiljo, D. Logothetis, and S. Muthukrishnan,

“One trillion edges: Graph processing at facebook-scale,” 2015.

[128] M. Wu, F. Yang, J. Xue, W. Xiao, Y. Miao, L. Wei, H. Lin, Y. Dai, and L. Zhou, “Gram: Scaling graph computation to the trillions,” in Proc. of SoCC, 2015.

[129] M. Elseidy, E. Abdelhamid, S. Skiadopoulos, and P. Kalnis, “Grami:

Frequent subgraph and pattern mining in a single large graph,” Proc. of VLDB Endow., 2014.

[130] S. Agarwal, B. Mozafari, A. Panda, H. Milner, S. Madden, and I. Stoica, “Blinkdb: Queries with bounded errors and bounded response times on

very large data,” in Proc. of EuroSys, 2013.

[131] I. Goiri, R. Bianchini, S. Nagarakatte, and T. D. Nguyen, “ApproxHadoop: Bringing approximations to mapreduce frameworks,” in Proc.

of ASPLOS, 2015.

[132] G. Ananthanarayanan, M. C.-C. Hung, X. Ren, I. Stoica, A. Wierman, and M. Yu, “Grass: Trimming stragglers in approximation analytics.” in

Proc. of NSDI, 2014.

[133] S. Chaudhuri, G. Das, and V. Narasayya, “Optimized stratified sampling for approximate query processing,” ACM Trans. Database Syst., 2007.

[134] T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, J. Gerth, J. Talbot, K. Elmeleegy, and R. Sears, “Online aggregation and continuous query support in mapreduce,” in Proc. of SIGMOD, 2010.

[135] G. Cormode, M. N. Garofalakis, P. J. Haas, and C. Jermaine, “Synopses

for massive data: Samples, histograms, wavelets, sketches,” Foundations and Trends in Databases, 2012.

[136] H. Kwak, C. Lee, H. Park, and S. Moon, “What is Twitter, a social

network or a news media?” in Proc. of WWW, 2010.

[137] P. Boldi and S. Vigna, “The WebGraph framework I: Compression tech- niques,” in Proc. of WWW, Manhattan, USA, 2004.

[138] R. Pagh and C. E. Tsourakakis, “Colorful triangle counting and a mapre- duce implementation,” CoRR, 2011.

[139] C. E. Tsourakakis, U. Kang, G. L. Miller, and C. Faloutsos, “Doulion:

Counting triangles in massive graphs with a coin,” in Proc. of KDD, 2009.

[140] L. S. Buriol, G. Frahling, S. Leonardi, A. Marchetti-Spaccamela, and C. Sohler, “Counting triangles in data streams,” in Proc. of PODS, 2006.

[141] M. Al Hasan and M. J. Zaki, “Output space sampling for graph patterns,” Proc. VLDB Endow., 2009.

[142] A. Pavan, K. Tangwongsan, S. Tirthapura, and K.-L. Wu, “Counting and

sampling triangles from a graph stream,” Proc. VLDB Endow., 2013.

[143] N. K. Ahmed, N. Duffield, J. Neville, and R. Kompella, “Graph sample and hold: A framework for big-graph analytics,” in Proc. of KDD, 2014.

[144] M. Jha, C. Seshadhri, and A. Pinar, “A space-efficient streaming algo-

rithm for estimating transitivity and triangle counts using the birthday paradox,” ACM Trans. Knowl. Discov. Data, 2015.

[145] M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J.

Franklin, S. Shenker, and I. Stoica, “Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing,” in Proc. of NSDI, 2012.

[146] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, “Spark: Cluster computing with working sets,” in Proc. of HotCloud, 2010.

[147] P. Ribeiro and F. Silva, “G-tries: A data structure for storing and finding

subgraphs,” Data Min. Knowl. Discov., 2014.

[148] S. Venkataraman, Z. Yang, M. Franklin, B. Recht, and I. Stoica, “Ernest: Efficient performance prediction for large-scale advanced analytics,” in

Proc. of NSDI, 2016.

[149] O. Alipourfard, H. H. Liu, J. Chen, S. Venkataraman, M. Yu, and M. Zhang, “Cherrypick: Adaptively unearthing the best cloud con-

figurations for big data analytics,” in Proc. of NSDI, 2017.

[150] S. Agarwal, H. Milner, A. Kleiner, A. Talwalkar, M. Jordan, S. Madden, B. Mozafari, and I. Stoica, “Knowing when you’re wrong: Building fast and reliable approximate query processing systems,” in Proc. of

SIGMOD, 2014.

[151] J. Leskovec and A. Krevl, “SNAP Datasets: Stanford large network dataset collection,” http://snap.stanford.edu/data, 2014.

[152] J. Yang and J. Leskovec, “Defining and evaluating network communities based on ground-truth,” CoRR, 2012.

[153] P. Boldi, M. Rosa, M. Santini, and S. Vigna, “Layered label propaga-

tion: A multiresolution coordinate-free ordering for compressing social networks,” in Proc. of WWW, 2011.

[154] “Graph data mining with arabesque,” http://arabesque.io.

[155] Apache Giraph, “http://giraph.apache.org.”

[156] A. Das Sarma, S. Gollapudi, and R. Panigrahy, “Estimating pagerank on graph streams,” in Proc. of PODS, 2008.

[157] N. Z. Gong, W. Xu, L. Huang, P. Mittal, E. Stefanov, V. Sekar, and D. Song,

“Evolution of social-attribute networks: Measurements, modeling, and implications using google+,” in Proc. of IMC, 2012.

[158] K. J. Ahn, S. Guha, and A. McGregor, “Analyzing graph structure via linear measurements,” in Proc. of SODA, 2012.

[159] ——, “Graph sketches: Sparsification, spanners, and subgraphs,” in Proc. of PODS, 2012.

[160] V. Braverman, R. Ostrovsky, and D. Vilenchik, “How hard is counting

triangles in the streaming model?” in Proc. of ICALP, 2013.

[161] S. Assadi, S. Khanna, and Y. Li, “On estimating maximum matching size in graph streams,” in Proc. of SODA, 2017.

[162] M. Davis, G. Efstathiou, C. S. Frenk, and S. D. M. White, “The evolution

of large-scale structure in a universe dominated by cold dark matter,” ApJ, 1985.

[163] B. L. Falck, M. C. Neyrinck, and A. S. Szalay, “ORIGAMI: Delineating

Halos Using Phase-space Folds,” ApJ, 2012.

[164] A. Knebe, S. R. Knollmann, S. I. Muldrew, F. R. Pearce, M. A. Aragon- Calvo, Y. Ascasibar, P. S. Behroozi, D. Ceverino, S. Colombi, J. Die-

mand, and K. Dolag, “Haloes gone MAD: The Halo-Finder Comparison Project,” MNRAS, 2011.

[165] Y. Zhang, S. Singh, S. Sen, N. Duffield, and C. Lund, “Online identifica-

tion of hierarchical heavy hitters: algorithms, evaluation, and applica- tions,” in Proc. of IMC, 2004.

[166] J. Beringer and E. Hüllermeier, “Efficient instance-based learning on

data streams,” Intell. Data Anal., 2007.

[167] E. Liberty, “Simple and deterministic matrix sketching,” in Proc. of SIGKDD, 2013.

[168] M. Kontaki, A. N. Papadopoulos, and Y. Manolopoulos, “Continuous trend-based clustering in data streams,” in Proc. of DaWaK, 2008.

[169] L. Serir, E. Ramasso, and N. Zerhouni, “Evidential evolving gustafson-

kessel algorithm for online data streams partitioning using belief func- tion theory.” Int. J. Approx. Reasoning, 2012.

[170] B. Ball, M. Flood, H. Jagadish, J. Langsam, L. Raschid, and P. Wiriy-

athammabhum, “A flexible and extensible contract aggregation frame- work (caf) for financial data stream analytics,” in Proc. of DSMM, 2014.

[171] F. Rusu and A. Dobra, “Statistical analysis of sketch estimators,” in Proc. of SIGMOD, 2007.

[172] J. Spiegel and N. Polyzotis, “Graph-based synopses for relational selec- tivity estimation,” in Proc. of SIGMOD, 2006.

[173] P. Indyk and D. Woodruff, “Optimal approximations of the frequency

moments of data streams,” in Proc. of STOC, 2005.

[174] P. Coles and B. Jones, “A lognormal model for the cosmological mass distribution,” MNRAS, 1991.

[175] I. Kayo, A. Taruya, and Y. Suto, “Probability distribution function of cosmological density fluctuations from a gaussian initial condition: Comparison of one-point and two-point lognormal model predictions

with n-body simulations,” The Astrophysical Journal, 2001.

[176] R. E. Smith, J. A. Peacock, A. Jenkins, S. D. M. White, C. S. Frenk, F. R. Pearce, P. A. Thomas, G. Efstathiou, and H. M. P. Couchman, “Stable

clustering, the halo model and non-linear cosmological power spectra,” MNRAS, 2003.

[177] S. Gottlöber and G. Yepes, “Shape, Spin, and Baryon Fraction of Clusters in the MareNostrum Universe,” ApJ, 2007.

[178] S. R. Knollmann and A. Knebe, “AHF: Amiga’s Halo Finder,” The Astrophysical Journal Supplement Series, 2009.

[179] S. Planelles and V. Quilis, “Asohf: a new adaptive spherical overdensity halo finder,” Astronomy & Astrophysics, 2010.

[180] A. Klypin and J. Holtzman, “Particle-Mesh code for cosmological simu- lations,” ArXiv Astrophysics e-prints, 1997.

[181] M. C. Neyrinck, N. Y. Gnedin, and A. J. S. Hamilton, “VOBOZ: an

almost-parameter-free halo-finding algorithm,” MNRAS, 2005.

[182] P. K. Agarwal, G. Cormode, Z. Huang, J. Phillips, Z. Wei, and K. Yi, “Mergeable summaries,” in Proc. of PODS, 2012.

Vita

Zaoxing (Alan) Liu was born in 1989 in China. Zaoxing did his undergraduate work at Southeast University, Nanjing, China, where he majored in Computer Science and Physics (2007-2011). In 2012, Zaoxing enrolled in the Master’s program in Computer Science at the Johns Hopkins University and earned an M.S.E. degree in 2013. In 2014, Zaoxing began the Computer Science Ph.D. program under the advisement of Prof. Vladimir Braverman, with work focusing on networked systems and algorithmic design for systems.
