© Copyright 2020
Ming Liu

Building Distributed Systems Using Programmable Networks

Ming Liu

A dissertation submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

University of Washington

2020

Reading Committee:

Arvind Krishnamurthy, Chair
Luis Ceze
Ratul Mahajan

Program Authorized to Offer Degree:
Computer Science & Engineering

University of Washington

Abstract

Building Distributed Systems Using Programmable Networks

Ming Liu

Chair of the Supervisory Committee:
Professor Arvind Krishnamurthy
Computer Science & Engineering

The continuing increase in data center network bandwidth, coupled with slower improvements in CPU performance, has challenged our conventional wisdom regarding data center networks: how do we build distributed systems that keep up with network speeds while remaining high-performance and energy-efficient? The recent emergence of a programmable network fabric (PNF) suggests a potential solution. By offloading suitable computations to a PNF device (i.e., a SmartNIC, a reconfigurable switch, or a network accelerator), one can reduce request serving latency, save end-host CPU cores, and enable efficient traffic control. In this dissertation, we present three frameworks for building PNF-enabled distributed systems: (1) IncBricks, an in-network caching fabric built with network accelerators and programmable switches; (2) iPipe, an actor-based framework for offloading distributed applications onto SmartNICs; (3) E3, an energy-efficient microservice execution platform for SmartNIC-accelerated servers. This dissertation shows how to make efficient use of in-network heterogeneous computing resources by employing new programming abstractions, applying approximation techniques, co-designing with end-host software layers, and designing efficient control and data planes. Our prototype systems, built with commodity PNF hardware, not only show the feasibility of this approach but also demonstrate that it is an indispensable technique for efficient data center computing.

Table of Contents

List of Figures
List of Tables

Chapter 1: Introduction
  1.1 Main Contribution
  1.2 Lessons Learned
  1.3 Published Work
  1.4 Organization

Chapter 2: Background
  2.1 Programmable Network Fabric
  2.2 Programmable Switches
  2.3 Programmable NICs (SmartNICs)
  2.4 Network Accelerators
  2.5 Challenges of PNF Offloading
  2.6 Summary

Chapter 3: IncBricks: Caching Data in the Network
  3.1 Background
  3.2 Solution Overview
  3.3 System Architecture
    3.3.1 Data Center Network
    3.3.2 Deployment
  3.4 IncBox: Programmable Network Middlebox
    3.4.1 Design Decisions
    3.4.2 IncBox Design and Prototype
  3.5 IncCache: a Distributed Coherent Key-Value Store
    3.5.1 Packet Format
    3.5.2 Hash Table Based Data Cache
    3.5.3 Hierarchical Directory-based Cache Coherence
    3.5.4 Extension for Multi-rooted Tree Topologies
    3.5.5 Optimization Between Switch and Network Processor
    3.5.6 Fault Tolerance
    3.5.7 Compute Primitives and Client Side APIs
    3.5.8 IncCache Implementation
  3.6 Evaluation
    3.6.1 Experimental Setup
    3.6.2 Throughput and Latency
    3.6.3 Performance Factors of IncBricks
    3.6.4 Case Study: Conditional Put and Increment
  3.7 Related Work
    3.7.1 In-network Aggregation
    3.7.2 In-network Monitoring
    3.7.3 Network Co-design for Key-value Stores
  3.8 Summary and Learned Lessons

Chapter 4: iPipe: Offloading Distributed Actors onto SmartNICs
  4.1 Background
  4.2 Solution Overview
  4.3 Understanding the SmartNIC
    4.3.1 Experiment Setup
    4.3.2 Traffic Control
    4.3.3 Computing Units
    4.3.4 Onboard Memory
    4.3.5 Host Communication
  4.4 iPipe Framework
    4.4.1 Actor Programming Model and APIs
    4.4.2 iPipe Actor Scheduler
    4.4.3 Distributed Memory Objects and Others
    4.4.4 Security Isolation
    4.4.5 Host/NIC Communication
  4.5 Applications Built with iPipe
    4.5.1 Replicated Key-value Store
    4.5.2 Distributed Transactions
    4.5.3 Real-time Analytics
  4.6 Evaluation
    4.6.1 Experimental Methodology
    4.6.2 Host Core Savings
    4.6.3 Latency versus Throughput
    4.6.4 iPipe Actor Scheduler
    4.6.5 iPipe Migration
    4.6.6 SmartNIC as a Dispatcher
    4.6.7 Comparison with Floem
    4.6.8 Network Functions on iPipe
  4.7 Related Work
    4.7.1 SmartNIC Acceleration
    4.7.2 In-network Computations
    4.7.3 RDMA-based Data Center Applications
  4.8 Summary and Learned Lessons

Chapter 5: E3: Running Microservices on the SmartNIC
  5.1 Background
  5.2 Solution Overview
  5.3 Benefits and Challenges of Microservices Offloading
    5.3.1 Microservices
    5.3.2 SmartNIC-accelerated Server
    5.3.3 Benefits of SmartNIC Offload
    5.3.4 Challenges of SmartNIC Offload
  5.4 E3 Microservice Platform
    5.4.1 Communication Subsystem
    5.4.2 Addressing and Routing
    5.4.3 Control-plane Manager
    5.4.4 Data-plane Orchestrator
    5.4.5 Failover/Replication Manager
  5.5 Implementation
  5.6 Evaluation
    5.6.1 Benefit and Cost of SmartNIC-Offload
    5.6.2 Avoiding Host Starvation
    5.6.3 Sharing SmartNIC and Host Bandwidth
    5.6.4 Communication-aware Placement
    5.6.5 Energy Efficiency = Cost Efficiency
    5.6.6 Performance at Scale
  5.7 Related Work
    5.7.1 Architecture Studies for Microservices
    5.7.2 Heterogeneous Scheduling
    5.7.3 Microservice Scheduling
    5.7.4 Power Proportionality
  5.8 Summary and Learned Lessons

Chapter 6: Exploring the PNF-based Disaggregated Storage
  6.1 Background
  6.2 Solution Overview
  6.3 SmartNIC-based Disaggregated Storage
    6.3.1 Hardware Architecture
    6.3.2 Software System Stack
  6.4 Performance Characterization and Implication
    6.4.1 Experiment Setup
    6.4.2 Performance Results
    6.4.3 Design Implications
  6.5 Related Work
    6.5.1 Remote Storage I/O Stack
    6.5.2 Disaggregated Storage Architecture
    6.5.3 Applications Built Using Disaggregated Storage
  6.6 Open Problems

Chapter 7: Conclusions and Future Work
  7.1 Contributions
  7.2 Limitations
  7.3 Future Work
    7.3.1 Rethinking Distributed System Abstraction
    7.3.2 Multi-tenancy
    7.3.3 A Unified Programming Model
    7.3.4 In-network Security
    7.3.5 Traffic Control
    7.3.6 In-network Data Persistence
  7.4 Final Remarks

Bibliography

List of Figures

1.1 The breakdown of x86 servers based on Ethernet speed over the last 20 years.
1.2 45 years of microprocessor trends.
2.1 Architectural block diagram for a programmable switch.
2.2 Architectural block diagram for a SmartNIC and packet processing for the two types of SmartNICs.
2.3 Architectural block diagram for a network accelerator.
3.1 An overview of the IncBricks system architecture in a commodity data center network.
3.2 IncBox internal architecture.
3.3 In-network computing packet format.
3.4 Bucket splitting hash table design. The prefix of a key's hash code is looked up in an index table, which points to (square) sentinel bucket nodes in the bucket list. Hash table entries are stored in entry lists from each bucket node. As the table expands, buckets are inserted to create (round) overflow bucket nodes, the index is extended, and more sentinel (square) nodes are added.
3.5 Packet flow for GET/SET/DELETE requests, showing how end-host servers and switches maintain cache coherence in a single-path scenario.
3.6 Cache coherence for a multi-rooted tree topology. (a) presents a typical multi-path scenario, showing two communication paths between server A and server B; (b) gives an example of the global registration table and designated caching; (c) and (d) show how to use the proposed methods to maintain coherence for GET/SET/DELETE requests.
3.7 Average latency versus throughput in the within-cluster scenario.
3.8 Average latency versus throughput in the cross-cluster scenario.
3.9 SET request latency breakdown for within-cluster and cross-cluster cases.
3.10 The impact of cache size at ToR and Root IncBoxes. The y-axis is the average request latency.
3.11 The impact of skewness on IncBricks for the cross-cluster case. The y-axis is the average request latency.
3.12 Performance comparison of the conditional put command among client-side, server-side, and IncBricks.
4.1 SmartNIC bandwidth varying the number of NIC cores for the 10GbE LiquidIOII CN2350.
4.2 SmartNIC bandwidth varying the number of NIC cores for the 25GbE Stingray PS225.
4.3 Average/p99 latency when achieving the maximum throughput on the 10GbE LiquidIOII CN2350.
4.4 Send/Recv latency on the 10GbE LiquidIOII CN2350, compared with RDMA/DPDK ones.
4.5 Per-core blocking/non-blocking DMA read/write latency when increasing payload size.
4.6 Per-core blocking/non-blocking DMA read/write throughput when increasing payload size.
4.7 An overview of the iPipe scheduler on the SmartNIC. Cond is the operation-triggering condition.
4.8 iPipe distributed memory objects.
4.9 Host CPU cores used, compared between DPDK and iPipe on three different applications, varying packet sizes for a 10GbE/25GbE network.
4.10 Latency versus per-core throughput for three applications under 10GbE, compared between DPDK and iPipe cases. Packet size is 512B.
4.11 Latency versus per-core throughput for three applications under 25GbE, compared between DPDK and iPipe cases. Packet size is 512B.
4.12 P99 tail latency as the networking load varies, for low- and high-dispersion request distributions on the 10GbE LiquidIOII CN2350 and 25GbE Stingray cards.
4.13 Migration elapsed-time breakdown of 8 actors from three applications evaluated with 10GbE CN2350 cards.
4.14 Throughput comparison varying packet size (on 10GbE) among SmartNIC, dedicated, and shared dispatchers.
5.1 Thermostat analytics as a DAG of microservices. The platform maps each DAG node to a physical computing node.
5.2 Request size impact on SmartNIC Requests/Joule benefits for five different microservices.
5.3 Average RTT (3 runs) of different communication mechanisms in a SmartNIC-accelerated server.
5.4 Hardware and software architecture of E3.
5.5 Energy-efficiency and average/tail latency comparison between Type1-SmartNIC and Beefy at peak utilization.
5.6 Communication-aware microservice placement.
5.7 Avoiding host starvation via the data-plane orchestrator.
5.8 Power draw of 3 applications normalized to the idle power of 3×Type1-SmartNIC, varying request load.
5.9 Energy-efficiency and throughput using ECMP-based SmartNIC sharing (log y scale).
5.10 Cost efficiency of 3 applications across the cluster configurations from Table 5.4.
5.11 Orchestrator migration decision time scalability.
6.1 Architectural block diagram of the Stingray PS1100R SmartNIC-based disaggregated storage.
6.2 Read/write latency comparison between SmartNIC-based and server-based storage disaggregation.
6.3 4KB/128KB request latency breakdown at the NVMe-oF target. NR/NW = SmartNIC read/write.
6.4 Read/write throughput as the number of cores increases on server and SmartNIC disaggregation.
6.5 4KB read/write request latency as the number of concurrent network connections increases.
6.6 P99.9 latency vs. bandwidth of 4KB random read and sequential write on two disaggregated cases.
6.7 4KB/128KB read/write bandwidth given a per-IO processing cost on the SmartNIC disaggregation.

List of Tables

2.1 Specifications of the eight SmartNICs used in this study. BW = bandwidth. Nstack = networking stack.
3.1 Testbed details. Our IncBox comprises a Cavium switch and a network accelerator (OCTEON or LiquidIO).
4.1 Performance comparison among generic offloaded applications and accelerators for the 10GbE LiquidIOII CN2350. Request size is 1KB for all cases. We report both per-request execution time as well as microarchitectural counters. DS = Data structure. IPC = Instructions per cycle. MPKI = L2 cache misses per kilo-instructions. Acl = Accelerator. bsz = Batch size. DFA = Deterministic Finite Automaton.
4.2 Access latency of 1 cacheline to different memory hierarchies on 4 SmartNICs and the Intel server. The cacheline for the LiquidIOII cards is 128B while the rest is 64B. The performance of the LiquidIOII CN2350 and CN2360 is similar.
4.3 iPipe major APIs. There are four categories: actor management (Actor), distributed memory object (DMO), message passing (MSG), and networking stack (Nstack). The Nstack has additional methods for packet manipulation. APIs with * are mainly used by the runtime as opposed to actor code.
5.1 Microservice comparison among host (Linux and DPDK) and SmartNIC. RPS = Throughput (requests/s), W = Active power (W), C = Number of active cores, L = Average latency (ms), 99% = 99th percentile latency, RPJ = Energy efficiency (requests/Joule).
5.2 16 microservices implemented on E3. S = Stateful.
5.3 8 microservice applications. N = # of DAG nodes.
5.4 Evaluated systems and clusters. BC = Beefy cores, WC = Wimpy cores, Mem = Memory (GB), Idle and Peak power (W), Bw = Network bandwidth (Gb/s).
5.5 Energy efficiency across five clusters (KRPJ).
5.6 Per-microservice deployment time (ms) scalability.
6.1 Power consumption comparison between Xeon- and SmartNIC-based disaggregated storage boxes. Section 6.4.1 describes the hardware configurations. We use a Watts Up Pro meter [96] to measure wall power.

Acknowledgments

I am extremely grateful to Arvind Krishnamurthy, my thesis advisor, for giving me the right amount of freedom and guidance during my graduate studies. Arvind is a fearless academic, visionary, and system architect. He always encouraged me to look for challenging problems, explore root causes, and refine my ideas in a constructive way. I treasure our whiteboard discussions on various topics. Above all else, he has been endlessly enthusiastic and supportive of my research and me. His engaging arguments and strong feedback have contributed greatly to this dissertation. I hope and look forward to continued collaboration with him in the future.

The genesis of this thesis can be traced back to my first visit to UW CSE, when Luis Ceze and Arvind Krishnamurthy pitched the in-network computing project to me, which led to the IncBricks project. I am indebted to Luis for answering my various computer architecture questions, which improved my understanding of programmable network hardware. I have also been lucky to have Simon Peter as my co-advisor outside of UW. Simon is a solid systems researcher who taught me system-building principles, evaluation methodology, and performance analysis. I am grateful for their encouragement and feedback.

I am grateful to the other members of my committee: Ratul Mahajan, whose unique insights on data center management made me think about how to deploy programmable networks in a cost-efficient way, and Radha Poovendran, who suggested how to organize the presentation, especially the iPipe actor scheduler.

The Allen School is an amazing environment for collaboration, and I want to thank everyone I have worked with during my time here: Liang Luo, Naveen Kr. Sharma, Phitchaya Mangpo Phothilimthana, Tianyi Cui, Henry Schuh, Kaiyuan Zhang, Tapan Chugh, and Chenxingyu Zhao. Tom Anderson and Xi Wang have been excellent mentors to me, and I cherish the sage advice I received from them. I would also like to thank my colleagues from NetLab, Sampa, and Syslab, with whom I spent my time as a graduate student at UW CSE. There are too many of these wonderful people to list here. I enjoyed our group meetings, reading seminars, and lunch discussions.

I am grateful to my industry collaborators. Jacob Nelson is my go-to consultant for high-performance computing clusters. Kishore Atreya taught me many hardware architecture details of Cavium programmable network products. I learned how to build a robust Paxos system in enterprise clouds from Karan Gupta, Rishi Bhardwaj, Chinmay Kamat, Huapeng Yuan, Aditya Jaltade, Roger Liao, Pavan Konka, and Anoop Jawahar.

A huge thanks to Elise Dorough and Melody Kadenko for taking care of all administrative tasks seamlessly. I am especially thankful for Melody, who helped me order various hardware. With her support, we have upgraded the Zookeeper cluster network, built the 100Gbps hometown cluster, and assembled different programmable network elements.

Finally, I am deeply indebted to my dear wife Jingyi Sun for her love and understanding through my graduate years. I am grateful to my parents, Weicheng Liu and Xiuyun Sun, for their unconditional love and support. Thanks to my son Harold for the joy and happiness he brings to me during our many moments together.

Dedication

To my family


1 Introduction

Today's data centers host many essential services that we use every day, such as distributed databases [124, 129, 111, 8], cloud storage [119, 133, 114], data analytics [250, 109, 31], and machine learning workloads [191, 126, 101]. They are deployed on thousands of servers within the data center, connected via a Clos network topology [152, 212]. These applications run 24/7, support tens of thousands of users, and have pervaded almost every aspect of our lives, ranging from social networks (e.g., Facebook [53], Twitter [93], LinkedIn [65]), shopping (e.g., Amazon [28], eBay [50]), and transportation (e.g., Uber [94], Lyft [66]) to work productivity (e.g., Google Docs [56], Microsoft Office 365 [73]) and entertainment (e.g., Netflix [75], Spotify [90], TikTok [92]). Many of these popular workloads store a large amount of user data, receive high-volume read/update request traffic, require high availability and strong consistency, and strive to provide interactive user experiences.

Generally, two major factors affect the performance of these distributed systems: one is how fast internal nodes communicate with each other; the other is how efficiently the end host performs computations. Recently, these two factors have exhibited distinct growth trends within the data center. First, network speeds are increasing dramatically. Figure 1.1 presents a breakdown of x86 servers in prominent data centers by Ethernet speed over the last 20 years [74, 11, 21]. We entered the 40G network era within the last five years, and 100G networks will be deployed soon. Second, end-host server CPU performance has stagnated. As shown in Figure 1.2, single-thread performance improvements over the last ten years have slowed down, and the number of logical cores per processor is also growing slowly. These two trends indicate that as more traffic comes into the server, one has to use more CPU cores to

Figure 1.1: The breakdown of x86 servers based on Ethernet speed over the last 20 years.
Figure 1.2: 45 years of microprocessor trends.

handle it [149, 102]. For example, a recent microservice characterization study [149] shows that, for a social networking application, more than 36% of the total execution time is spent on network protocol processing. Google's auto-sharding system Slicer [102] reports that more than 20% of its CPU cycles are used for data marshaling. As a result, a significant fraction of a server's computing capacity is consumed just keeping up with the basic needs of network protocol processing and application request pre-processing, leaving limited computing resources for application execution.

The fact that network bandwidths continue to increase dramatically, significantly outpacing the modest improvements in CPU performance, has challenged the conventional wisdom underlying data center architectures: that server CPUs have abundant computing headroom to handle networking requests. The resulting high-volume, microsecond-scale network I/O raises the question: how can we build low-latency, energy-efficient computing systems that keep up with network speeds?

This dissertation answers this question affirmatively by designing systems that offload computations of distributed applications to the emerging programmable network fabric (PNF). The PNF is a domain-specific acceleration architecture for the data center that accelerates packet processing tasks. It provides computing capabilities along the packet communication path, including fully programmable SmartNICs at the network edge, match-action-based reconfigurable switches in the network, and domain-specific network accelerators attached to the switch. PNF hardware

usually encloses ingress/egress pipelines for header parsing/deparsing, power-efficient computing engines for packet manipulation, onboard traffic control for flow directing and load balancing, diverse memory regions to save flow states, and programmable DMA engines to interact with hosts. Therefore, by offloading suitable computations into the network, one can (1) serve application requests within the network with low latency and high throughput, saving end-host cores; (2) improve end-host energy efficiency by increasing the per-packet computing density; and (3) deploy new network protocols that enable efficient packet processing.

The PNF is a practical instantiation of the active networks idea [245, 244] from the 1990s. In an active network, routers and switches can perform customized computations on the messages flowing through them. One typical realization [255] uses the active capsule approach, where traditional passive packets are replaced with arbitrary miniature programs. These programs are encapsulated in transmission frames and can be executed at each node along the communication path. In contrast, programmable networks rely on passive packets, with computations triggered at different PNF devices based on pre-defined execution conditions (such as parsing results or matched rules). More importantly, the PNF is a modest proposal: it allows only a limited amount of computation and state so that traffic can be processed at line rate.

The key challenge of using a PNF is how to make efficient use of these in-network heterogeneous computing resources. There is a significant gap between the interfaces that application software expects and the computing capabilities of programmable networking hardware. First, a PNF device lacks generic programming models or abstractions. A SmartNIC or network accelerator is programmed using low-level primitives or APIs within the firmware [79]. A programmable switch is programmed using domain-specific languages such as P4 [116]. Essentially, PNF hardware has multiple heterogeneous computing units along with diverse non-cache-coherent memory regions. There is no unified programming model that fits all scenarios; one has to take into account the execution characteristics of offloaded workloads and the hardware capabilities to program these devices effectively.

Second, one should stay within the hardware constraints, since a PNF device has limited computing capabilities and limited memory space. For example, a programmable switch favors

match-action processing, while SmartNICs and network accelerators can perform general-purpose per-packet execution and domain-specific per-message acceleration. When a PNF device is overloaded, one observes increased tail latency, packet drops, or even bandwidth loss. In terms of memory, a programmable switch has tens of megabytes of SRAM, whereas a SmartNIC or network accelerator encloses several gigabytes of DRAM. To use them effectively, one must consider how to lay out the application working set and balance the trade-off between data structure maintenance overheads and offloading benefits.

Third, because of these hardware constraints, we need to figure out how to co-design application logic between programmable network hardware and end-host servers. This requires support from both the control plane and the data plane. Specifically, the control-plane manager should add PNF awareness when performing resource allocation and generating execution plans. The data plane should guarantee high PNF utilization and performance isolation. The data-plane orchestrator should (1) schedule computations across the PNF and hosts based on runtime usage, and (2) tolerate packet processing tasks with different latency distributions, traffic micro-bursts, and synchronous/asynchronous completion signal delivery scenarios.

In this dissertation, we address these challenges by performing detailed hardware characterization, proposing new programming models and abstractions, and employing efficient control-plane and data-plane systems. We build real systems using commodity PNF hardware and evaluate the benefits and trade-offs of such offloading.

1.1 Main Contribution

My thesis is that distributed systems can keep up with network speeds and achieve high performance and better energy efficiency by being co-designed with the programmable network fabric. This dissertation's main contribution is to bridge the gap between increasing network bandwidths and stagnating CPU computing capabilities by co-designing distributed systems with programmable network fabrics. It develops principles and techniques for using in-network heterogeneous computing resources efficiently. Specifically, we first understand PNF hardware from three

perspectives: how the interconnect fabric feeds data to different in-network computing engines, how much stateful and stateless processing can be applied to a packet without impacting line rate, and how much parallelism the programmable hardware exposes along its pipeline. Second, when developing the programming system, we explore an execution model that is tailored to the underlying hardware architecture, design a data structure layout model that is appropriate for the PNF memory hierarchy, and apply approximation techniques to reduce application working sets. Finally, we add PNF awareness to the control plane and provide a flexible allocation policy. The data-plane orchestrator is able to tolerate packet processing tasks with different latency distributions, traffic micro-bursts, and synchronous/asynchronous completion signal deliveries. To demonstrate the feasibility of such solutions, we have designed, implemented, and evaluated the following PNF-enabled distributed systems using commodity PNF devices:

• IncBricks [248] is an in-network caching fabric with basic computing primitives. It is a hardware-software co-designed system that supports caching in the network using a programmable network middlebox. Its hardware part combines a programmable switch (i.e., Cavium XPliant) and a network accelerator (i.e., OCTEON SoC) into a programmable middlebox. Its software part runs a distributed cache-coherent key-value store along with a topology-aware source routing protocol. IncBricks exposes NoSQL key-value store APIs. As a key-value store accelerator, our prototype lowers request latency by over 30% and doubles throughput for 1024-byte values in a small-scale cluster configuration. When performing computations on cached values, IncBricks achieves three times more throughput and a third of the latency of client-side computation.
• iPipe [247] is an actor-based framework for offloading distributed applications onto a SmartNIC. It is designed based on our characterization of SmartNICs from four perspectives (traffic control, computing units, onboard DRAM, and host communication). iPipe introduces an actor programming model, where each actor has its own self-contained private state and communicates with other actors via messages (a minimal sketch of such an actor interface follows this list). The framework provides a distributed memory object abstraction and enables actor migration. The central piece of iPipe is a hybrid scheduler combining FCFS with DRR-based processor sharing, which tolerates tasks with variable execution costs and maximizes NIC compute utilization. Using iPipe, we build a real-time data

analytics engine, a distributed transaction system, and a replicated key-value store. Based on our prototype with commodity SmartNICs, we show that when processing 10/25Gbps of application bandwidth, NIC-side offloading can save up to 3.1/2.2 beefy Intel cores, along with 23.0/28.0µs latency savings.
• E3 [249] is a microservice execution platform for SmartNIC-accelerated servers. By offloading suitable microservices to a SmartNIC's low-power computing units, one can improve server energy efficiency without latency loss. E3 follows the design philosophies of the Azure Service Fabric microservice platform and extends key system components. It employs three key techniques: an ECMP-based load balancer at the ToR switch to balance network request traffic among SmartNICs, a hierarchical communication-aware microservice placement algorithm to balance computation demands, and a data-plane orchestrator to detect and eliminate SmartNIC overloading. Our prototype using the Cavium LiquidIO shows that SmartNIC offload can improve cluster energy efficiency up to 3× and cost efficiency up to 1.9× at up to 4% latency cost for common microservices, including real-time analytics, an IoT hub, and virtual network functions.
• Characterizing PNF-based disaggregated storage. We conduct a detailed characterization study on a Broadcom Stingray PS1100R platform to understand its performance characteristics, its computing capabilities when serving line-rate network/storage traffic, and the potential computing headroom for offloading. Our observations provide guidelines for building efficient storage applications on such disaggregated infrastructures.
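To make the actor abstraction in the iPipe item above concrete, the following is a minimal, self-contained C sketch of what an actor interface can look like: each actor owns private state and a handler that a runtime invokes on incoming messages. The names (actor_t, msg_t, deliver) and the toy dispatch loop are illustrative assumptions, not the actual iPipe API.

    /* Minimal sketch of an actor-style programming model: each actor owns
     * private state and reacts to messages delivered by a runtime loop.
     * Names are illustrative; this is not the iPipe API. */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    typedef struct msg {
        uint32_t dst_actor;            /* destination actor id */
        char     payload[64];          /* opaque message body  */
    } msg_t;

    typedef struct actor {
        uint32_t id;
        void    *state;                                /* private, per-actor state */
        void   (*handler)(struct actor *, const msg_t *);
    } actor_t;

    /* Example actor: counts how many messages it has received. */
    static void count_handler(actor_t *self, const msg_t *m) {
        uint64_t *count = (uint64_t *)self->state;
        (*count)++;
        printf("actor %u got \"%s\" (total %llu)\n",
               self->id, m->payload, (unsigned long long)*count);
    }

    /* Toy runtime: deliver each message to the matching actor. */
    static void deliver(actor_t *actors, size_t n, const msg_t *m) {
        for (size_t i = 0; i < n; i++)
            if (actors[i].id == m->dst_actor)
                actors[i].handler(&actors[i], m);
    }

    int main(void) {
        uint64_t c0 = 0, c1 = 0;
        actor_t actors[2] = {
            { .id = 0, .state = &c0, .handler = count_handler },
            { .id = 1, .state = &c1, .handler = count_handler },
        };
        msg_t m1 = { .dst_actor = 0 }, m2 = { .dst_actor = 1 };
        strcpy(m1.payload, "GET k1");
        strcpy(m2.payload, "PUT k2");
        deliver(actors, 2, &m1);
        deliver(actors, 2, &m2);
        return 0;
    }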

1.2 Lessons Learned

We summarize some general lessons about using PNFs to build high-performance and energy-efficient distributed systems that go beyond the specific contributions of the individual PNF-enabled systems above.

#1: Approximation is a useful technique for addressing the hardware resource constraints of PNFs. For example, one should take advantage of probabilistic data structures

(such as count-min sketches and Bloom filters) to store application/flow state in a sublinear amount of memory (a minimal count-min sketch example appears at the end of this section). One should also decompose the application logic into small modules (each of which can fit into a PNF unit) and spread the whole computation across a number of execution stages and/or devices.

#2: PNFs can be used to build a disaggregated processing system that would be suitable for future data center settings. IncBricks provides a concrete example of co-designing switch and accelerator logic in a disaggregated setting. A switch can do certain things effectively, such as application-level monitoring and traffic control over terabits of traffic, and it can forward the interesting subset of that traffic to a network accelerator. The accelerator can maintain interesting (non-trivial) data structures with a reasonable amount of memory and handle the modest amount of traffic forwarded to it by the switch. One can also extend the ingress/egress processing pipelines of a programmable switch to SmartNICs to realize a big-switch abstraction. This form of co-design and disaggregation allows the different types of hardware acceleration to scale independently.

#3: Before determining an optimal code deployment strategy, we must characterize PNF devices from multiple perspectives, e.g., traffic control, computing units, onboard memory, and host communication. Specifically, there are three important questions to answer: how much stateful and stateless processing can be applied to a packet without impacting line rate; how much execution parallelism the programmable hardware exposes along its pipeline; and how the interconnect fabric feeds data to an in-network acceleration engine. All of these factors determine the effectiveness of code offloading. We provide a concrete case study of this approach in the context of SmartNICs. To enable efficient SmartNIC offloading, one should consider computing headroom, sensitivity to traffic workloads, and execution isolation. SmartNICs come in two structural designs (on-path vs. off-path), which expose different offloading capacities. The size and amount of offloading depend on the traffic profile (such as the packet size distribution, flow size distribution, and packet interval). Further, when offloading, one should consider how to provide performance isolation and security/privacy protection.

#4: When using PNFs to design an energy-efficient system, one should schedule computations onto the execution engine that offers the best application performance given the power budget. There are three considerations here. First, PNF offloading might not always be the optimal design decision, even though PNFs usually enclose power-efficient computing engines. Second, the performance metrics that drive the system design are throughput (i.e., requests per second) and latency SLAs/SLOs; under a given power budget, one should find a solution that offers the highest throughput without violating the SLA/SLO. Third, energy efficiency is also related to the amount of incoming traffic, which determines the request load. For example, a system tuned for line-rate energy efficiency would not be optimal in the unloaded case.
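As an illustration of lesson #1, the sketch below shows a minimal count-min sketch in C: a few hash rows of counters track approximate per-flow counts in a small, fixed memory footprint, and an estimate never under-counts. The dimensions and hash function are arbitrary choices for illustration; none of this is code from the systems in this dissertation.

    /* Minimal count-min sketch for approximate per-flow packet counts.
     * Uses CMS_D hash rows of CMS_W counters; an estimate is the minimum
     * counter across rows.  Sizes and hash function are illustrative. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define CMS_D 4        /* number of hash rows */
    #define CMS_W 1024     /* counters per row    */

    static uint32_t cms[CMS_D][CMS_W];

    /* Simple per-row hash (FNV-1a variant with a per-row seed). */
    static uint32_t cms_hash(const void *key, size_t len, uint32_t row) {
        const uint8_t *p = key;
        uint32_t h = 2166136261u ^ (row * 0x9e3779b9u);
        for (size_t i = 0; i < len; i++) { h ^= p[i]; h *= 16777619u; }
        return h % CMS_W;
    }

    static void cms_update(const void *key, size_t len, uint32_t delta) {
        for (uint32_t d = 0; d < CMS_D; d++)
            cms[d][cms_hash(key, len, d)] += delta;
    }

    static uint32_t cms_estimate(const void *key, size_t len) {
        uint32_t est = UINT32_MAX;
        for (uint32_t d = 0; d < CMS_D; d++) {
            uint32_t c = cms[d][cms_hash(key, len, d)];
            if (c < est) est = c;
        }
        return est;   /* never under-estimates the true count */
    }

    int main(void) {
        const char *flow = "10.0.0.1:80->10.0.0.2:5000";
        for (int i = 0; i < 42; i++) cms_update(flow, strlen(flow), 1);
        printf("estimated packets for flow: %u\n",
               cms_estimate(flow, strlen(flow)));
        return 0;
    }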

1.3 Published Work

Part of the work presented in this thesis has been published in the following papers:
• Ming Liu, Liang Luo, Jacob Nelson, Luis Ceze, Arvind Krishnamurthy, and Kishore Atreya. IncBricks: Toward In-Network Computation with an In-Network Cache. Proceedings of Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2017.
• Ming Liu, Tianyi Cui, Henry Schuh, Arvind Krishnamurthy, Simon Peter, and Karan Gupta. Offloading Distributed Applications onto SmartNICs using iPipe. Proceedings of the ACM Special Interest Group on Data Communication (SIGCOMM), 2019.
• Ming Liu, Simon Peter, Arvind Krishnamurthy, and Phitchaya Mangpo Phothilimthana. E3: Energy-Efficient Microservices on SmartNIC-Accelerated Servers. Proceedings of the USENIX Annual Technical Conference (ATC), 2019.

1.4 Organization

The rest of this dissertation is organized as follows. Chapter 2 provides background information. It first presents an overview of the programmable network fabric and then describes the hardware architectures and software stacks of three PNF devices (the programmable switch, the SmartNIC, and the network accelerator) in detail. Next, it discusses the challenges of PNF offloading.

Chapter 3 describes the design and implementation of our first PNF-enabled system. It motivates the need for in-network caching and shows how to combine a programmable switch and a network accelerator efficiently. We also describe how to address cache coherence issues under a multi-path fat-tree topology.

Chapter 4 first presents a SmartNIC characterization study. Based on the gathered insights, it introduces the iPipe framework and covers its different components (e.g., the actor programming model, distributed memory objects, and the actor scheduler). We then show how to use the framework to develop three distributed applications.

Chapter 5 describes how we build a microservice execution platform on SmartNIC-accelerated servers. We first discuss the benefits and challenges of microservice offloading quantitatively and then describe how to address them by extending key system components of Azure Service Fabric.

Chapter 6 presents a characterization study of SmartNIC-based disaggregated storage. It describes the experimental methodology, presents our observations, and discusses some open problems in applying it to system design.

Finally, Chapter 7 summarizes the work performed in this dissertation, discusses the limitations of our work, and ends with directions for future work.

2 Background

The increasing data center network bandwidths and stagnating CPU performance have motivated the use of programmable network hardware to accelerate networked applications. By offloading suitable computations onto intermediate networking devices, one can keep up with network speeds and achieve high performance and energy efficiency. This thesis presents how to realize these goals for distributed applications. In this chapter, we first describe the hardware architectures and software system stacks of contemporary programmable networking devices and then discuss the challenges of using them in system designs. The remainder of this chapter is organized as follows. Section 2.1 presents a high-level overview of the programmable network fabric. Sections 2.2–2.4 describe programmable switches, SmartNICs, and network accelerators in detail. Section 2.5 discusses the offloading challenges from three perspectives (programming abstractions, hardware constraints, and control-plane/data-plane co-design). Section 2.6 concludes this chapter by summarizing our findings.

2.1 Programmable Network Fabric

Programmable network fabric (PNF) is a kind of emerging domain-specific hardware that computing architects bring into the data center to increase its computing density. It targets packet processing tasks. PNF is a practical realization of past academic proposals, such as active networks [244, 245] from the 1990s. However, instead of injecting and running general program fragments (or capsules) at each network router/switch along the path, a PNF allows only a limited amount of computation and state so that traffic can be processed at line rate. A PNF device stays on the data plane and is able to perform stateless/stateful

Figure 2.1: Architectural block diagram for a programmable switch.

computations on traversing packets. It usually comprises five functional components: (a) ingress/egress pipelines that parse/deparse the packet header, generate packet metadata, and modify certain fields based on installed rules; (b) computing engines, which may be reconfigurable match-action tables, network processors, general-purpose multi-core/many-core processors, or domain-specific accelerators; (c) traffic control modules, which direct and load-balance network flows from Ethernet MAC ports to different execution units and may also enclose internal priority queues with flexible scheduling primitives; (d) diverse memory units, ranging from fast scratchpad SRAM to slower onboard L2/DRAM; and (e) programmable DMA engines, which move data between units via high-bandwidth interconnects. Generally, a programmable network fabric can be a programmable switch in the network, a fully programmable SmartNIC at the network edge, or a network accelerator along the communication path.

2.2 Programmable Switches

Programmable (or reconfigurable) switches provide a match+action (M+A) processing model: a switch matches on arbitrary predefined packet header fields and then applies simple packet processing actions. Figure 2.1 presents the hardware architecture. The switch begins packet processing by extracting relevant packet headers via a user-defined parse graph and generating per-packet metadata. Next, the extracted header fields, along with the metadata, are passed into a pipeline

of stages. Each stage has user-defined M+A tables (in TCAM or SRAM) for packet matching, as in fixed-function switches. Unlike a traditional switch, a programmable switch also makes registers available so that one can maintain and mutate state as packets traverse the switch. Further, each stage has small ALUs for performing a limited set of operations on packet headers or register contents. After traversing the pipeline stages, packets are deposited in an egress queue; the data-plane program on the switch can also control which queue is used for egress. Some switches, such as the Cavium XPliant [7], are also equipped with an embedded service CPU, which can asynchronously perform some computations on the data plane. In summary, a programmable switch provides the following hardware features that enable flexible application offloading: (1) a limited amount of stateful memory (such as counters, meters, and registers) that can be used to maintain state across packets; (2) computation primitives (e.g., addition, bit shifts, max/min) that are executed over extracted packet header fields, metadata, and stateful memory; (3) in-network traffic telemetry, reporting egress queue lengths, concurrent flow characteristics, etc.; (4) packet scheduling primitives, which can be used to design customized QoS-aware enqueue/dequeue service disciplines; and (5) communication interfaces, via the local control-plane CPU, for bookkeeping tasks.

As with traditional fixed-function switches, the system stack of a programmable switch has three levels (management, control, and data). The management plane carries administration traffic, which is used for configuring and monitoring devices; its programming interfaces are mainly the CLI [47], WebGUI [97], SNMP [89], and OpenConfig [84]. The control plane is responsible for populating the routing/forwarding tables and determining the packet forwarding path. It comprises three layers: protocol stacks, which support a range of routing protocols such as OSPF [83], EIGRP [52], BGP [39], and IS-IS [63]; platform libraries, including program-dependent table-level APIs and fixed-function APIs for port/MMU management; and device drivers that interact with the switch ASIC. The data plane defines how to forward traffic from ingress to egress. A programmable switch usually provides a domain-specific language, such as P4 [116], that allows developers to build flexible and customizable packet processing pipelines. The program is then compiled into a sequence of architecture-dependent instructions, linked with the system libraries and ABI, and turned into an executable firmware image that can be loaded.
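Because P4 programs only run inside a switch toolchain, the following C sketch merely models the stateful match+action idea described above: one pipeline stage exact-matches a destination address against a small table and, on a hit, performs an action and updates a per-entry register, the way a switch program might keep counters across packets. The table layout and field names are invented for illustration and do not correspond to any vendor's interface.

    /* Toy model of one match+action stage with stateful registers:
     * exact-match on a destination IP; the action increments a per-entry
     * register (e.g., a packet counter).  Illustrative only. */
    #include <stdint.h>
    #include <stdio.h>

    #define TABLE_SIZE 4

    struct entry {
        uint32_t match_dst_ip;   /* match key                   */
        uint32_t out_port;       /* action parameter            */
        uint64_t reg_pkts;       /* stateful register per entry */
    };

    static struct entry table[TABLE_SIZE] = {
        { 0x0A000001u, 1, 0 },   /* 10.0.0.1 -> port 1 */
        { 0x0A000002u, 2, 0 },   /* 10.0.0.2 -> port 2 */
    };

    /* Returns the egress port, or -1 to drop on a table miss. */
    static int stage_apply(uint32_t dst_ip) {
        for (int i = 0; i < TABLE_SIZE; i++) {
            if (table[i].match_dst_ip == dst_ip) {
                table[i].reg_pkts++;             /* stateful update */
                return (int)table[i].out_port;   /* forward action  */
            }
        }
        return -1;                               /* default: drop   */
    }

    int main(void) {
        uint32_t pkts[] = { 0x0A000001u, 0x0A000001u, 0x0A000002u };
        for (int i = 0; i < 3; i++)
            printf("pkt %d -> port %d\n", i, stage_apply(pkts[i]));
        printf("10.0.0.1 counter = %llu\n",
               (unsigned long long)table[0].reg_pkts);
        return 0;
    }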


Figure 2.2: Architectural block diagram for a SmartNIC and packet processing for the two types of SmartNICs.

Several programmable switches are available today, such as Barefoot Tofino [37]/Tofino2 [38], Cavium XPliant [7], and Intel FlexPipe [60]. They are mainly used as the ToR (top of the rack) switches and are able to deliver tens of terabits per second traffic. We have seen an increasing number of deployments of such switches from hyperscale cloud and service provider networks to enterprise and high-performance computing clusters.

2.3 Programmable NICs (SmartNICs)

A SmartNIC consists of four major parts (as shown in Figure 2.2-a): (1) computing units, including a general-purpose execution engine (such as an ARM/MIPS/RISC multi-core processor), along with accelerators for packet processing (e.g., deep packet inspection, packet buffer management) and specialized functions (e.g., encryption/decryption, hashing, pattern matching, compression); (2) onboard memory, enclosing fast self-managed scratchpad and slower L2/DRAM; (3) a traffic control module that transfers packets between TX/RX ports and the packet buffer, accompanied by an internal traffic manager or NIC switch that delivers packets to NIC cores; and (4) DMA engines for communicating with the host. Table 2.1 lists the HW/SW specifications of eight commercial SmartNICs. They represent different design trade-offs regarding performance, programmability, and flexibility. The first two LiquidIOII SmartNICs enclose an OCTEON [44] processor with a rich set of accelerators but run in

SmartNIC model | Vendor | Processor | BW | L1/L2/DRAM | Deployed SW | Nstack | To/From host
LiquidIOII CN2350 [43] | Marvell | cnMIPS, 12 cores, 1.2GHz | 2×10GbE | 32KB / 4MB / 4GB | Firmware | Raw packet | Native DMA
LiquidIOII CN2360 [43] | Marvell | cnMIPS, 16 cores, 1.5GHz | 2×25GbE | 32KB / 4MB / 4GB | Firmware | Raw packet | Native DMA
BlueField1 1M332A [67] | Mellanox | ARM A72, 8 cores, 0.8GHz | 2×25GbE | 32KB / 1MB / 16GB | Full OS | Linux/DPDK/RDMA | RDMA
BlueField2 MBF2H332A-AEEOT [68] | Mellanox | ARM A72, 8 cores, 2.5GHz | 2×25GbE | 32KB / 1MB / 16GB | Full OS | Linux/DPDK/RDMA | RDMA
Stingray PS225 [42] | Broadcom | ARM A72, 8 cores, 3.0GHz | 2×25GbE | 32KB / 16MB / 8GB | Full OS | Linux/DPDK/RDMA | RDMA
Agilio CX [76] | Netronome | NFP-4000 | 2×25GbE | NA / 4MB / 2GB | Firmware | Raw packet | Native DMA
Innova-2 MNV303212A-ADLT [69] | Mellanox | Xilinx Kintex UltraScale XCKU15P | 2×25GbE | NA / NA / 8GB | Firmware | Raw packet | Native DMA
ADM-PCIE-9V3 [27] | Alpha Data | Xilinx Virtex UltraScale Plus VU3P-2 | 1×100GbE | NA / NA / 2GB | Firmware | Raw packet | Native DMA

Table 2.1: Specifications of the eight SmartNICs used in this study. BW = bandwidth. Nstack = networking stack.

the context of a lightweight firmware. Programmers have to use native hardware primitives to process raw packets, issue DMA commands, and trigger accelerator computations. The Netronome Agilio is a similar SmartNIC but encloses MicroFlow engine processors. The BlueField 1/2 and Stingray cards run an ARM Cortex-A72 [34] processor and host a full-fledged operating system. They offer a lower barrier for application development, and one can use traditional Linux/DPDK/RDMA stacks to communicate with local and external endpoints. The BlueField card even has NVDIMM support for fault-tolerant storage. Some SmartNICs, such as the Mellanox Innova [69], Alpha Data ADM-PCIE-9V3 [27], and Microsoft Catapult [146], use FPGAs as the execution engine. Current SmartNICs typically have link speeds of 10/25/100 GbE, and 200/400 GbE units are starting to

appear. Generally, a SmartNIC is somewhat more expensive than a traditional dumb NIC; for example, a 10/25GbE SmartNIC typically costs $100–400 more than a corresponding standard NIC [23]. Based on how SmartNIC cores interact with traffic, we further categorize SmartNICs into two types: on-path and off-path. The cores of on-path SmartNICs (Figure 2.2-b) sit on the packet communication path and can manipulate each incoming/outgoing packet. The LiquidIOII CN2350/CN2360, Netronome Agilio, and FPGA-based cards are on-path SmartNICs. Off-path SmartNICs (Figure 2.2-c) deliver traffic flows to host cores (bypassing NIC cores) based on forwarding rules installed on a NIC switch. The Mellanox BlueField 1/2 and Broadcom Stingray are examples of off-path SmartNICs. Both NIC vendors are further improving the programmability of the NIC switch (e.g., Broadcom TruFlow [118], Mellanox ASAP2 [202]). For both types of SmartNICs, host processing is the same as with standard NICs. On the transmit path (where the host server sends out traffic), the host processor first creates a DMA control command (including the instruction header and packet buffer address) and writes it into a command ring. The NIC DMA engine then fetches the command and data from host memory and writes them into the packet buffer (located in NIC memory). On on-path SmartNICs, NIC cores pull in incoming packets (usually represented as work items), perform some processing, and then deliver them to the TX/RX ports via the DMA engine. On off-path SmartNICs, packets are forwarded directly to either NIC cores or TX/RX ports based on switching rules. Receive processing (where the host server receives traffic from the SmartNIC) is similar but performed in the reverse order.
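The transmit path sketched above (the host fills a DMA command in a ring and notifies the NIC, which then fetches the command and packet data) can be illustrated with a simplified, host-side-only C sketch. The descriptor layout, ring size, and doorbell register are invented for illustration; real SmartNICs define their own formats and require DMA-mapped buffers and memory barriers.

    /* Simplified host-side view of posting a transmit command to a NIC
     * command ring.  The descriptor format and "doorbell" are invented
     * for illustration; real devices define their own layouts. */
    #include <stdint.h>
    #include <stdio.h>

    #define RING_SIZE 256

    struct tx_desc {
        uint64_t buf_addr;   /* (I/O virtual) address of the packet buffer */
        uint32_t buf_len;    /* packet length in bytes                     */
        uint32_t flags;      /* e.g., checksum-offload request             */
    };

    struct tx_ring {
        struct tx_desc desc[RING_SIZE];
        uint32_t head;               /* next slot the host will fill */
        volatile uint32_t doorbell;  /* would be a device register   */
    };

    /* Post one packet: fill the next descriptor, then ring the doorbell
     * so the NIC DMA engine knows there is work to fetch. */
    static int tx_post(struct tx_ring *r, const void *pkt, uint32_t len) {
        struct tx_desc *d = &r->desc[r->head % RING_SIZE];
        d->buf_addr = (uint64_t)(uintptr_t)pkt;  /* real code: DMA-mapped addr */
        d->buf_len  = len;
        d->flags    = 0;
        /* A real driver issues a write barrier here before the doorbell. */
        r->head++;
        r->doorbell = r->head;
        return 0;
    }

    int main(void) {
        static struct tx_ring ring;
        char frame[64] = "example ethernet frame payload";
        tx_post(&ring, frame, sizeof(frame));
        printf("posted %u descriptor(s), doorbell=%u\n", ring.head, ring.doorbell);
        return 0;
    }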

2.4 Network Accelerators

The last type of PNF we consider is the network accelerator. As shown in Figure 2.3, a network accelerator scales up the communication and computation capabilities of a SmartNIC. It usually has 4 to 8 communication ports and can therefore handle much more traffic than a SmartNIC. In terms of computing, a network accelerator has a many-core flow processor and more sophisticated domain-specific accelerators, such as compression, encryption, and storage RAID engines. An important hardware component of the accelerator is a programmable hardware dispatcher that can steer traffic to a

Figure 2.3: Architectural block diagram for a network accelerator.

particular computing unit efficiently. Generally, a network accelerator is a standalone SoC board with an independent power supply, deployed close to a rack switch or router. Take the Cavium OCTEON SoC board [44] as an example: it is equipped with 32 cnMIPS [45] cores and diverse hardware acceleration units (such as packet-management, security, application, and specialized accelerators). A high-speed coherent interconnect connects all of these execution engines and optimizes data movement. The board has up to 8 I/O interfaces, which can serve more than 200GbE of traffic.

Network accelerators provide a flexible software system stack for application offloading. The underlying firmware abstracts the hardware resources and exposes hardware threads, memory managers, signals/mutexes, and a set of libraries for using the accelerators. On the control-plane side, programmers can partition and configure the device. The data plane is similar to that of a SmartNIC: it can perform per-packet computation and maintain intermediate state across packets, and one can build customized packet handlers (using the provided libraries) into the firmware. Generally, there are two common issues in using a domain-specific accelerator on the board. The first is how to feed data into the execution engine; essentially, one needs to set up the memory channel for data movement and manage the inbound DMA facility (including both the DMA queues and the

instruction words). The second is how to catch the completion signal. Since most onboard accelerators use synchronous execution, polling is the primary approach; it helps reduce latency but wastes some NIC core cycles. The interrupt mechanism is more heavyweight but offers the flexibility of asynchronous handling. Hence, one should design a hybrid approach that takes the workload characteristics into account. Some emerging accelerators offer complex application logic (such as video encoding [198]) and use the asynchronous style; in such cases, an event-handling mechanism is more appropriate.
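The hybrid completion-handling strategy suggested above, polling briefly for low latency and then falling back to a heavier-weight blocking wait, can be sketched in a few lines of C. Here the accelerator is simulated by a thread and the "interrupt" by a condition variable; the poll budget and signaling mechanism are stand-ins for whatever a particular device actually exposes.

    /* Hybrid completion handling: spin-poll for a short budget to catch
     * fast completions with low latency, then fall back to a blocking
     * (interrupt-style) wait.  The "device" here is simulated. */
    #include <stdatomic.h>
    #include <stdio.h>
    #include <pthread.h>
    #include <unistd.h>

    static atomic_int done = 0;   /* completion flag the device would set */
    static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cv  = PTHREAD_COND_INITIALIZER;

    static void *fake_accelerator(void *arg) {
        (void)arg;
        usleep(2000);                      /* pretend the job takes 2 ms */
        atomic_store(&done, 1);
        pthread_mutex_lock(&mtx);
        pthread_cond_signal(&cv);          /* the "interrupt" */
        pthread_mutex_unlock(&mtx);
        return NULL;
    }

    static void wait_completion(int poll_budget) {
        for (int i = 0; i < poll_budget; i++)          /* phase 1: poll  */
            if (atomic_load(&done)) { printf("completed while polling\n"); return; }
        pthread_mutex_lock(&mtx);                      /* phase 2: block */
        while (!atomic_load(&done))
            pthread_cond_wait(&cv, &mtx);
        pthread_mutex_unlock(&mtx);
        printf("completed after blocking wait\n");
    }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, fake_accelerator, NULL);
        wait_completion(1000);             /* small poll budget, then sleep */
        pthread_join(t, NULL);
        return 0;
    }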

2.5 Challenges of PNF Offloading

The key challenge in using a PNF is how to make efficient use of these in-network heterogeneous computing resources. There is a significant gap between the interfaces that application software expects and the computing capabilities of programmable networking hardware.

First, a PNF device lacks generic programming models or abstractions. A SmartNIC or network accelerator is programmed using low-level primitives or APIs within the firmware. A programmable switch is programmed with a domain-specific language (such as P4 [116]) for the data plane. PNF hardware has multiple heterogeneous computing units along with diverse non-cache-coherent memory regions. There is no unified programming model that fits all scenarios; one has to take into account both the execution characteristics of the offloaded workloads and the hardware capabilities. For example, network functions present great packet-level parallelism with light control flow, so when offloading to a network accelerator or SmartNIC, a data-flow [48] programming model would be effective. On the other hand, the actor model [25] is better suited for applications that are communication-intensive.

Second, one should stay within the hardware constraints, since a PNF device has limited computing capabilities and limited memory space. Specifically, a programmable switch favors match-action processing, while the other two PNF devices can perform general-purpose per-packet execution and domain-specific per-message (or even per-flow) acceleration. When overloading happens, one observes increased tail latency, packet drops, or even bandwidth loss. In terms

of memory, a programmable switch has tens of megabytes of SRAM, whereas a SmartNIC or network accelerator encloses several gigabytes of DRAM. To use them effectively, one has to figure out how to lay out the application working set. Generally, there are four models, described below; when designing a PNF-offloaded system, one has to balance the trade-off between data structure maintenance overheads and offloading benefits (a minimal sketch of the partition model appears at the end of this section).
• Partition, which applies key-based sharding;
• Caching, which maintains a software cache layer with application-dependent organization and replacement policies;
• Assisted, which provides lookup indexes to accelerate traversal of the main data structure;
• Distributed shared memory, which offers a unified memory space.
Finally, due to the hardware constraints, we need to figure out how to co-design application logic between programmable network hardware and end-host servers. This requires support from both the control plane and the data plane. Specifically, the control-plane manager should add PNF awareness when performing resource allocation and generating execution plans. The data plane should guarantee high PNF utilization and performance isolation. The data-plane orchestrator should (1) schedule computations across PNF devices and hosts based on runtime usage, and (2) tolerate packet processing tasks with different latency distributions, traffic micro-bursts, and synchronous/asynchronous completion signal delivery scenarios. For example, when using a programmable switch, one needs to first partition tables/meters/registers across different stages and then design an execution chaining pipeline. The problem becomes much more challenging for SmartNICs and network accelerators due to their substantial computing resources, which would require library OS support.
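Of the four layout models above, the partition model is the simplest to illustrate. The C sketch below shards keys across a fixed number of PNF units by hashing, so each unit only holds its slice of the working set; the hash function and unit count are arbitrary choices for illustration, not taken from any specific system.

    /* Key-based sharding: map each key to one of N PNF units so that a
     * unit only has to hold its own slice of the working set.  The hash
     * function and unit count are arbitrary choices for illustration. */
    #include <stdint.h>
    #include <stdio.h>

    #define NUM_UNITS 4   /* e.g., 4 SmartNICs or 4 pipeline partitions */

    static uint32_t fnv1a(const char *key) {
        uint32_t h = 2166136261u;
        for (; *key; key++) { h ^= (uint8_t)*key; h *= 16777619u; }
        return h;
    }

    static int shard_for_key(const char *key) {
        return (int)(fnv1a(key) % NUM_UNITS);
    }

    int main(void) {
        const char *keys[] = { "user:1001", "user:1002", "cart:77", "session:abc" };
        for (int i = 0; i < 4; i++)
            printf("%-12s -> unit %d\n", keys[i], shard_for_key(keys[i]));
        return 0;
    }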

2.6 Summary

In this chapter, we presented the background on programmable network fabrics and described three PNF devices (the programmable switch, the SmartNIC, and the network accelerator) in detail. We discussed their hardware architectures as well as their software stacks. Different PNF devices make different

trade-offs between computing capability and bandwidth. A programmable switch can perform only limited computation, but at tens of terabits per second; a SmartNIC can perform general-purpose computation at tens of gigabits per second; and a network accelerator resembles a SmartNIC but can process hundreds of gigabits per second. We concluded this chapter by discussing the challenges of PNF offloading. In this thesis, we address these challenges in the context of distributed applications: we perform detailed hardware characterizations, develop new programming models, and build efficient control planes and data planes. The next three chapters present three PNF-enabled systems.

3 IncBricks: Caching Data in the Network

In this chapter, we describe the first PNF-enabled system (called IncBricks), which accelerates distributed in-memory key-value stores. It is built with a programmable switch and a network accelerator. By caching data on intermediate networking devices, one can (1) serve network requests on the fly with low latency; (2) reduce data center traffic and mitigate network congestion; and (3) save energy by running servers in a low-power mode. IncBricks is a hardware-software co-designed system that supports caching with basic computing primitives. The hardware part is a hybrid switch/network-accelerator architecture that offers flexible support for offloading application-level operations. The software part is a distributed cache-coherent key-value store.

The rest of this chapter is organized as follows. Section 3.1 provides the background and motivation for in-network caching. Section 3.2 gives a high-level overview of our proposal. Section 3.3 describes the system architecture of an IncBricks deployment. Sections 3.4 and 3.5 describe the design and implementation of its hardware and software components, respectively. Section 3.6 reports on our prototype system, experimental setup, and evaluation. Section 3.7 discusses related work. We conclude this chapter by discussing the lessons we learned in Section 3.8.

3.1 Background

Networking is a core part of today’s data centers. Modern applications such as big data analytics and distributed machine learning workloads [4, 5, 258, 191, 126] generate a tremendous amount of network traffic. Network congestion is a major cause of performance degradation, and many applications are sensitive to increases in packet latency and loss rates. Therefore, it is important to reduce traffic, lower communication latency, and reduce data communication overheads in commodity data centers.

Existing networks move data but do not perform computation on transmitted data, since traditional network equipment (including NICs, switches, routers, and middleboxes) primarily focuses on achieving high throughput with limited forms of packet processing. As discussed in the previous chapter, new classes of programmable network devices such as programmable switches (e.g., Intel’s FlexPipe [60], Cavium’s XPliant [7], and the Reconfigurable Match-Action Table architecture [117]) and network accelerators (e.g., Cavium’s OCTEON [44]) have recently emerged. Programmable switches allow for application-specific header parsing and customized match-action rules, providing terabit packet switching with a light-weight programmable forwarding plane. Network accelerators are equipped with scalable low-power multi-core processors and fast traffic managers that support more substantial data plane computation at line rate. Together, they offer in-transit packet processing capabilities that can be used for application-level computation as data flows through the network. The key idea of this work is offloading a set of compute operations from end-servers onto programmable network devices (primarily switches and network accelerators) so that (1) remote operations can be served on the fly with low latency; (2) network traffic is reduced; and (3) servers can be put in a low-power mode (e.g., the Intel C6 state) or even be turned off or removed, leading to energy and cost savings. However, offloading computation to the network has three challenges. First, even though the hardware resources associated with programmable network infrastructure have been improving over time, there is limited compute power and little storage to support general data center computation or services (like a key-value store). For instance, the Intel FlexPipe [60] chip in the Arista 7150S switch [33] has a flexible parser and a customizable match-action engine to make forwarding decisions and control flow state transitions. The switch is programmed with a set of rules, and then it applies data-driven modifications to packet headers as packets flow through the switch. The 9.5 MB packet buffer memory on this switch is not exposed for storing non-packet data; even if it were, the bulk of it would still be needed to buffer incoming traffic from dozens of ports in the case of congestion, leaving limited space for other uses. Network accelerators have less severe

space and processing constraints, but that flexibility comes at the cost of reduced interface count and lower routing and traffic management performance when compared with switches. Second, data center networks offer many paths between end-points [152, 212, 224], and network component failures are commonplace, so keeping computations and state coherent across networking elements is complex. Finally, programmable network hardware should provide a simple and general computing abstraction that can be easily integrated with application logic in order to support a broad class of data center applications. In this chapter, we address the challenges outlined above by building IncBricks. As a first step towards in-network computation, it provides a distributed caching service to accelerate data retrieval of data center workloads.

3.2 Solution Overview

IncBricks is an in-network caching fabric with basic computing primitives, based on programmable network devices. IncBricks is a hardware/software co-designed system, comprising IncBox and IncCache. IncBox is a hybrid switch/network accelerator architecture that offers flexible support for offloading application-level operations. We choose a key-value store as a basic interface to the in-network caching fabric because it is general and broadly used by applications. We build IncCache, an in-network cache for key-value stores, and show that it significantly reduces request latency and increases throughput. IncCache borrows ideas from shared-memory multi-processors and supports efficient, coherent replication of key-value pairs. To support more general offloading, we provide basic computing primitives for IncCache, allowing applications to perform common compute operations on key-value pairs, such as increment, compare and update, etc. We prototype IncBox using Cavium’s XPliant switches and LiquidIO boards. Our real system evaluations demonstrate that (1) in-network caching can provide common data center service abstractions with lower latency and higher throughput than existing software-only implementations; and (2) by carefully splitting the computation across programmable switches, network accelerators, and end hosts, one can achieve fast and efficient data center network request processing.


Figure 3.1: An overview of the IncBricks system architecture in a commodity data center network.

3.3 System Architecture

This section describes the IncBricks system architecture in the context of a data center. We first present the background of a commodity data center network. Then we discuss the internal architecture of IncBricks in detail and explain how to integrate IncBricks into existing networks.

3.3.1 Data center Network

Figure 3.1 shows a typical data center network [228, 152, 212], with a hierarchical topology reaching from a layer of servers in racks at the bottom to a layer of core switches at the top. Each rack contains roughly 20 to 40 nodes, connected to a Top of Rack (ToR) switch via 10 Gbps links. ToR switches connect to multiple aggregation switches (for both redundancy and performance) using 10–40 Gbps links, and these aggregation switches further connect to core switches with a higher bandwidth (e.g., 100 Gbps) link. To offer higher aggregate bandwidth and robustness, modern data centers create multiple paths in the core of the network by adding redundant switches.

Switches usually run a variant of ECMP [1] routing, hashing network flows across equal-cost paths to balance load across the topology. Multiple paths bring coherence challenges to our IncBricks system design. Traditional Ethernet switches take an incoming packet and forward it based on a forwarding database (FDB). They comprise two parts: a data plane, which focuses on processing network packets at line rate, and a control plane, which is used for configuring forwarding policies. The data plane consists of specialized logic for three core functions: (1) an ingress/egress controller, which maps transmitted and received packets between their wire-level representation and a unified, structured internal format; (2) packet memory, which buffers in-flight packets across all ingress ports; and (3) a switching module, which makes packet forwarding decisions based on the forwarding database. The control plane usually contains a low-power processor (e.g., an Intel Atom or ARM chip) that is primarily used to add and remove forwarding rules. SDN techniques enable dynamic control and management by making the control plane more programmable [220, 201].
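The ECMP routing mentioned above hashes each flow's identifying fields onto one of the equal-cost next hops, so all packets of a flow follow the same path. The sketch below illustrates the idea; the 5-tuple layout and hash mixing are illustrative assumptions, not the switch's actual implementation.

    /* Sketch of ECMP-style path selection: a hash over the flow 5-tuple picks
     * one of the equal-cost next hops, keeping a flow on a single path. */
    #include <stdint.h>

    struct flow_tuple {
        uint32_t src_ip, dst_ip;
        uint16_t src_port, dst_port;
        uint8_t  proto;
    };

    static uint32_t ecmp_next_hop(const struct flow_tuple *f, uint32_t num_paths)
    {
        /* simple xor/multiply mix over the 5-tuple (illustrative only) */
        uint32_t h = f->src_ip ^ f->dst_ip;
        h ^= ((uint32_t)f->src_port << 16) | f->dst_port;
        h ^= f->proto;
        h *= 0x9e3779b1u;                  /* Fibonacci hashing constant */
        return h % num_paths;              /* index of the chosen equal-cost path */
    }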

3.3.2 Deployment

IncBricks is a hardware/software co-designed system that enables in-network caching with basic computing primitives. It comprises two components, IncBox and IncCache, shown in Figure 3.1. IncBox is a hardware unit consisting of a network accelerator co-located with an Ethernet switch. After a packet marked for in-network computing arrives at the switch, the switch will forward it to its network accelerator, which performs the computation. IncCache is a distributed, coherent key-value store augmented with some computing capabilities. It is responsible for packet parsing, hashtable lookup, command execution, and packet encapsulation. Our IncBricks implementation depends heavily on the execution characteristics of switches. For example, a core or aggregation switch with a higher transmit rate and more ports might enclose a larger hash table inside its IncCache compared to the hash table of a ToR switch. Moreover, IncBricks requires that the data center has a centralized SDN controller connecting to all switches so that different IncBricks instances can see the same global view of the system state.

3.4 IncBox: Programmable Network Middlebox

This section describes IncBox, our middlebox. We first discuss our design decisions and then show the internal architecture of IncBox.

3.4.1 Design Decisions

A design must support three things to enable in-network caching: (1) it must parse in-transit network packets and extract some fields for the IncBricks logic (F1); (2) it must modify both header and payload and forward the packet based on the hash of the key (F2); (3) it must cache key/value data and potentially execute basic operations on the cached value (F3). In terms of performance, it should provide higher throughput (P1) and lower latency (P2) than existing software-based systems for forwarding packets or processing payloads. A network switch is an ideal place to do in-network computation since all packets are processed by its forwarding pipeline. However, traditional fixed-function switches struggle to meet F1 since their packet parser configuration is static. Software switches (such as Click [179], RouteBricks [144], and PacketShader [156]) are able to perform dynamic header parsing, but they remain two orders of magnitude slower at switching than a hardware switch chip. This means they fail to meet P1 and P2 in terms of forwarding, not to mention header or payload modifications. Programmable switches seem a promising alternative. Their configurable parsers use a TCAM and action RAM to extract a customized header vector. During ingress processing, they can match different parts of a packet based on user-defined rules and apply header field modifications. Nevertheless, programmable switches have two drawbacks: first, they can support only simple operations, such as reads or writes to a specified memory address, or primitive compute operations (e.g., add, subtract, shift) on counters. There is no support for more complicated operations on payloads. Second, the size of the packet buffer (along with TCAM and action RAM) is on the order of a few tens of megabytes; most of that is needed to store incoming packet traffic and match-action rules, leaving little space for caching. They meet F1 and F2, but to satisfy F3, P1, and P2 in terms of payload-related operations, we must take advantage of other devices.


Figure 3.2: IncBox internal architecture.

We choose network accelerators to satisfy the rest of our requirements for three reasons. First, their traffic managers can serve packet data to a processing core in hundreds of nanoseconds (e.g., 200ns for our OCTEON board), which is significantly faster than kernel-bypass techniques [217]. Second, their multi-core processors are able to saturate 40Gbps–100Gbps of bandwidth easily, which is hard to achieve with general-purpose CPUs. Third, they support multiple gigabytes of memory, which can be used for caching. FPGA accelerators like Catapult [121] could also fit our scenario, but the general-purpose processors in the network accelerator are more flexible and easier to program compared with an FPGA.

3.4.2 IncBox Design and Prototype

We carefully split IncBox’s functionality across a programmable switch and a network accelerator. Figure 3.2 shows our design. The switch performs three tasks. The first is packet checking, which filters in-network caching packets based on the application header. If there is a match, the packet will be forwarded to a network accelerator. Otherwise, it will be processed in the original processing pipeline. The second is the key hit check, which determines whether the network accelerator has cached the key or not. This is an optimization between the two hardware units and can be bypassed depending on the switch configuration. The third is packet steering, which forwards the packet

to a specific port based on the hash value of the key. This will be used when there are multiple attached network accelerators, to ensure that the same key from the same network flow will always go to the same accelerator, avoiding packet reordering issues. Since IncCache is based on the UDP protocol, packet reordering will cause incorrect results (e.g., a GET after SET request might fail since it could be processed out of order). We expect that more capabilities will be provided in upcoming programmable switches (e.g., Barefoot’s Tofino [37, 38]) due to deeper ingress pipelines and more flexible rewrite engines. For example, a switch could maintain a Bloom filter of keys cached by the network accelerator or could even probabilistically identify heavy-hitter keys that are frequently accessed. Implementing these operations fully in the switch data-plane would require reading state updated by previous packets; the Cavium switch we use requires intervention of its management processor to make state updated by one packet available to another. The network accelerator performs application-layer computations and runs our IncCache system. First, it extracts key/value pairs and the command from the packet payload. Next, it conducts memory-related operations. If the command requires writing a new key/value pair, the accelerator will allocate space and save the data. If the command requires reading a value based on a key, the accelerator performs a cache lookup. If it misses, the processing stops and the network accelerator forwards the request further along its original path. If it hits, the network accelerator goes to the third stage and executes the command, both performing any operations on the value associated with the key and conducting cache coherence operations. Finally, after execution, the accelerator rebuilds the packet and sends it back to the switch. Our IncBox prototype consists of (1) one 3.2 Tbps (32x100Gbps) Cavium XPliant switch and (2) one Cavium OCTEON-II many-core MIPS64 network accelerator card. We use two types of OCTEON-II devices: an OCTEON-II evaluation board with 32 CN68XX MIPS64 cores and 2GB memory, and a light-weight LiquidIO adapter, enclosing 8 CN66XX MIPS64 cores and 2GB memory. We program the switch by (1) adding packet parsing rules for the new packet header; (2) changing the ingress processing pipeline; and (3) modifying the forwarding database (FDB) handlers. We program the network accelerator using the Cavium Development Kit (CDK), which provides a thin data plane OS (OCTEON simple executive) along with a set of C-language hardware


Figure 3.3: In-network computing packet format

acceleration libraries (such as compression/decompression, pattern matching, and encryption/decryption). Our software stack uses the hardware features of this network accelerator for fast packet processing.

3.5 IncCache: a Distributed Coherent Key-Value Store

This section describes the IncCache system design. IncCache is able to (1) cache data on both IncBox units and end-servers; (2) keep the cache coherent using a directory-based cache coherence protocol; (3) handle scenarios related to multipath routing and failures; and (4) provide basic compute primitives. A common design pattern in data center applications today is to have two distributed storage layers: a high-latency persistent storage layer (like MySQL), and a low-latency in-memory cache layer (like Memcached). IncCache accelerates the latter; when data requests go through the data center network, our IncCache will (1) cache the value in the hash table; (2) execute a directory-based cache coherence protocol; and (3) perform computation involving cached values.

3.5.1 Packet Format

Packet format is one of the key components enabling IncCache. There are three design requirements: (1) it should be a common format agreed to by the client, server, and switch; (2) it should be parsed by the network switch efficiently without maintaining network flow states; and (3) it should

be flexible and easily extended. We target UDP-based network protocols, which have been shown to reduce client-perceived latency and overhead [213] and are also amenable to switch processing without requiring per-flow state. Our in-network computing messages are carried in the UDP payload; their format is shown in Figure 3.3 and described in detail here (a struct-level sketch follows the list):
• Request ID (2 bytes): A client-provided value for matching responses with their requests.
• Magic field (6 bytes): Labels in-network cache packets, allowing switches to filter other UDP packets that should not be handled by IncCache.
• Command (4 bytes): Selects which operation to perform on the key/value of the payload.
• Hash (4 bytes): The hash of the key, used to guarantee ordering inside the network switch.
• Application payload (variable bytes): Key-value pairs, which also include the type information.
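For concreteness, the header layout above could be expressed as a packed C struct along the following lines; the field names are illustrative, since the dissertation specifies sizes and roles but not identifiers.

    /* A minimal sketch of the in-network computing (INC) header carried in the
     * UDP payload, following the field sizes listed above. Field names are
     * illustrative, not taken from the IncBricks source. */
    #include <stdint.h>

    struct inc_header {
        uint16_t request_id;   /* client-chosen ID to match replies to requests */
        uint8_t  magic[6];     /* marks the packet as an INC/IncCache packet    */
        uint32_t command;      /* operation: GET, SET, DEL, INCR, COND_PUT, ... */
        uint32_t key_hash;     /* hash of the key, used for in-switch ordering  */
        /* variable-length application payload (key, type, value length, value)
         * follows immediately after this 16-byte header */
    } __attribute__((packed));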

3.5.2 Hash Table Based Data Cache

We use a hash table to cache the key-value data on both network accelerators and end-host servers. To achieve high throughput, the hash table should (1) support concurrent accesses from multiple processing cores and (2) provide full associativity for cached keys. It should also respect the constraints of the hardware.

Fixed-size lock-free hash table We designed the hash table on the network accelerator to have a fixed size due to the limited memory space. It consists of a set of buckets, each containing an ordered linked list of entries. Each node in the bucket list has a unique hash code and serves as the head of an entry list. Items with the same hash code but different keys are stored on the same entry list, in sorted order based on the key. The hash table uses lock-free operations to access and modify its state [158, 204]. For set or delete operations, the hash table interacts with the memory allocator module to allocate and free data. Our allocator maintains a fixed number of data items and applies a least frequently used eviction policy.
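The layout described above can be pictured with the following C sketch. The structure names and the (non-atomic) lookup loop are illustrative only; the actual implementation uses lock-free list operations and the OCTEON hardware fetch-and-add unit.

    /* Sketch of the fixed-size cache layout: an array of buckets, each holding
     * a sorted list of hash nodes, each hash node heading a sorted entry list. */
    #include <stdint.h>
    #include <string.h>
    #include <stddef.h>

    struct inc_entry {                 /* one cached key/value pair            */
        struct inc_entry *next;        /* next entry with the same hash code   */
        uint32_t key_len, val_len;
        char     data[];               /* key bytes followed by value bytes    */
    };

    struct inc_hash_node {             /* all entries sharing one hash code    */
        struct inc_hash_node *next;
        uint32_t hash;
        struct inc_entry *entries;
    };

    #define INC_NBUCKETS 4096
    static struct inc_hash_node *buckets[INC_NBUCKETS];

    static struct inc_entry *
    inc_lookup(uint32_t hash, const char *key, uint32_t key_len)
    {
        struct inc_hash_node *n = buckets[hash % INC_NBUCKETS];
        for (; n && n->hash <= hash; n = n->next)       /* bucket list is sorted */
            if (n->hash == hash)
                for (struct inc_entry *e = n->entries; e; e = e->next)
                    if (e->key_len == key_len && !memcmp(e->data, key, key_len))
                        return e;                       /* hit */
        return NULL;                                    /* miss */
    }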


Figure 3.4: Bucket splitting hash table design. The prefix of a key’s hash code is looked up in an index table, which points to (square) sentinel bucket nodes in the bucket list. Hash table entries are stored in entry lists from each bucket node. As the table expands, buckets are inserted to create (round) overflow bucket nodes and the index is extended, and more sentinel (square) nodes are added.

Extensible lock-free hash table On the server side, we implemented an extensible hash table due to the server’s more relaxed memory constraints. It is also lock-free, using the split-ordered list technique [234]. Figure 3.4 shows the structure of the hash table. Search and insert are the two primary operations, and behave similarly. When looking for an item, the hash table first traverses the bucket list to match a hash code. If there is a match, the corresponding entry list is searched for an exact key match. In order to accelerate this lookup process, our hash table maintains a set of hash indexes that are shortcuts into the bucket list. We build the hash index by examining the prefix of the hash code of a given item. An insert operation follows a similar procedure, except that it atomically swaps in a new node (using compare-and-swap) if there is no match. To guarantee a constant time lookup, the hash table dynamically expands the size of the hash index and examines a progressively longer prefix of an item’s hash code. The newer elements in the expanded hash index are ordered to appear after the elements that existed before the expansion, resulting in what


Figure 3.5: Packet flow for GET/SET/DELETE requests, showing how end-host servers and switches maintain cache coherence in a single path scenario.

is known as a recursively split-ordered list. By recursively splitting the hash indexes with this technique, we introduce finer-grained shortcuts in a lock-free manner, without having to copy the index.

3.5.3 Hierarchical Directory-based Cache Coherence

IncCache allows data to be cached in both IncBox units and servers. A cache coherence protocol is required to keep data consistent without incurring high overheads. Maintaining the sharers list (tracking who has the data) is a challenge since (1) servers do not know the details of the network topology, and (2) there is no shared communication medium like the coherence bus in commodity processors. Therefore, we propose a hierarchical directory-based cache coherence protocol. Our key ideas are: (1) take advantage of the structured network topology by using a hierarchical distributed directory mechanism to record the sharers information; (2) decouple the system interface and program interface in order to provide flexible programmability; (3) support sequential consistency for high-performance SET/GET/DEL requests. We outline the scheme below, wherein we initially constrain

the network topology to a tree topology and then generalize it to a multi-rooted multi-path topology in the subsequent section. We place the home directory at the end-host server. Note that in the tree topology, a source-destination pair uniquely determines the entire communication path and all of the switches that can potentially cache the key-value entry. Hence, the directory only needs to record the source address to be able to infer all other potential sharers along the path. Switches (including ToR/Aggregation/Core) will route an “in-network” labeled packet to the IncBox unit for further processing. For a GET request (Figure 3.5-a), if there is a hit, the IncBox replies with the key’s value directly. If not, the GET request will be forwarded to the end-host server to fetch the data. For a GET reply (Figure 3.5-a), each IncBox on the path will cache the data in the hash table. For SET and DELETE operations (Figure 3.5-b), we use a write invalidation mechanism. On a SET or DELETE request, switches first forward the request to the home node and then the home node issues an invalidation request to all sharers in the directory. IncBox units receiving this request will delete the data (specified in the request) from the hash table. The IncBox at the client-side ToR switch will respond with an invalidation ACK. After the home node receives all the expected invalidation ACKs, it performs the SET or DELETE operation and then sends a set/delete ACK to the client.
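The home node's write path under this protocol can be summarized with the following sketch. The directory layout and messaging helpers are assumptions introduced for illustration; they are not IncBricks' actual interfaces.

    /* Pseudocode-style sketch of the home node's SET/DEL path under the
     * single-path (tree) protocol: invalidate every sharer recorded in the
     * directory, wait for all invalidation ACKs, apply the write, then ACK
     * the client. */
    #include <stdint.h>

    #define MAX_SHARERS 16

    struct dir_entry {
        uint32_t key_hash;
        uint32_t sharers[MAX_SHARERS];   /* client addresses recorded on GETs */
        int      num_sharers;
    };

    /* Assumed messaging helpers (not part of IncBricks' published interface). */
    void send_invalidate(uint32_t sharer, uint32_t key_hash);
    void wait_for_invalidate_acks(int expected);
    void apply_write(uint32_t key_hash, const void *val, uint32_t len);
    void send_set_ack(uint32_t client, uint32_t key_hash);

    void home_handle_set(struct dir_entry *d, uint32_t client,
                         const void *val, uint32_t len)
    {
        for (int i = 0; i < d->num_sharers; i++)
            send_invalidate(d->sharers[i], d->key_hash);  /* walk the sharer list */

        wait_for_invalidate_acks(d->num_sharers);  /* client-side ToR IncBoxes ACK */
        apply_write(d->key_hash, val, len);        /* now update the home copy     */
        d->num_sharers = 0;                        /* sharers were invalidated     */
        send_set_ack(client, d->key_hash);         /* SET/DEL ACK to the requester */
    }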

3.5.4 Extension for Multi-rooted Tree Topologies

We discuss how to address the issue of multiple network paths between source-destination pairs, as would be the case in multi-rooted network topologies. As existing data centers provide high aggregate bandwidth and robustness through multiple paths in a multi-rooted structure, this creates a challenge for our original cache coherence design—recording only the source/destination address is not sufficient to reconstruct the sharers list. For example, GET and SET requests might not traverse the same path, and a SET invalidation message might not invalidate the switches that were on the path of the GET response. We adapt our original cache coherence protocol to record path-level information for data requests and also traverse predictable paths in the normal case, thus minimizing the number of invalidate or update messages for the cache coherence module.


Figure 3.6: Cache coherence for a multi-rooted tree topology. (a) presents a typical multi-path scenario; specifically, it shows two communication paths between server A and server B; (b) gives an example of the global registration table and designated caching; (c) and (d) show how the proposed methods maintain coherence for GET/SET/DELETE requests.

Figure 3.6-a shows a multi-path scenario. It has three layers of switches: ToR, Aggregation, and Core. Each switch connects to more than one upper-layer switch for greater reliability and performance. We assume each switch routes based on ECMP. To handle the multi-path scenario, our key ideas are as follows. (1) Designated caching ensures that data is only cached at a set of predefined IncBox units for a given key. Any IncBox units between the designated IncBox units will not cache that key, meaning that routing between designated units can use any of the feasible paths. (2) Deterministic path selection ensures that for a given key and given destination, the end-host and IncBox units will choose a consistent set of designated caching IncBox units to traverse. (3) A global registration table (Figure 3.6-b), replicated on each IncBox, is used to identify the designated cache locations and designated paths for a given key and destination (indexed by a hash of the key rather than the key itself to bound its size). (4) A fault-tolerance mechanism uses

non-designated paths in the event of failures, but will not allow the data to be cached on the IncBox units traversed along the non-designated paths. Our mechanism addresses the multi-path scenario by breaking a single (src, dest) flow into several shorter-distance flows; for example, the path tuple (client, server) is transformed into a sequence of sub-paths: (client, designated ToR1) + (designated ToR1, designated Aggregation1/Core1) + ... + (designated ToR2, server). This means that we have to perform additional packet re-encapsulation operations at designated IncBox units along a designated path, which is not necessary in the single path case. The global registration table is preloaded on each IncBox and will be updated by a centralized SDN controller upon failures. We illustrate our proposal using the example in Figure 3.6-a and Figure 3.6-b. The designated path between Server A and Server B for Key1 is shown as the green/light colored path: Server A ↔ ToR1 ↔ Aggregation1 ↔ Core1 ↔ Aggregation4 ↔ ToR4 ↔ Server B. Note that this is symmetric; communication in either direction uses the same path for a given key. This designated path is always chosen for Key1 when there are no failures. If Core1 or ToR1 detects a malfunction on one of Aggregation4’s links, it will instead use the secondary path through Aggregation3 (the dark/blue path). However, given the global registration table at Core1 (Figure 3.6-b), Key1 will not be cached at Aggregation3. IncBox routes to the next designated caching location using network-layer ECMP, utilizing any of the paths connecting the IncBox to the next designated caching location, and skipping over the intermediate IncBox units. We see two benefits of our design. First, the use of a designated path ensures that coherence messages traverse the same paths as the original cache request. It also increases the effectiveness of caching as the same key is not cached at multiple IncBox units at the same level of the network topology (other than ToR); this better utilizes total cache space and reduces invalidation overheads. Second, the ability to label designated caching locations provides a flexible policy mechanism that can take advantage of deployment and workload properties of the application. For example, Facebook deploys Memcached servers within a cluster and the clients in a different cluster. In such a setting, it might be more beneficial to perform caching only inside the client cluster as opposed to doing it also in the server cluster. The designated caching locations mechanism helps address

such deployment concerns. Now, we discuss in detail how to handle different requests: GET requests and replies: As Figure 3.6-c shows, GET requests and replies behave similarly to the single-path case, with three differences. First, both IncBoxes and end servers calculate the next designated caching location based on the global registration table. Second, since the original packet flow has been transformed as discussed above, the IncBox unit at the first ToR switch (which connects to the client) will “expand” the packet to include a copy of the original client source address to be used in forming the reply. This address is removed from the packet (the packet is “shrunk”) before sending the reply to the client. Third, for a GET reply, an IncBox will cache the data only if it is one of the designated caching locations. SET/DEL requests and ACKs: As Figure 3.6-d shows, for a SET/DEL request, the IncBox unit will (1) interact with the global registration table and (2) perform packet expansion/shrink operations, in the same way as for GET requests and replies. Invalidation requests are generated by the home server. When a switch receives one, if the associated IncBox is not the destination, the packet will be forwarded; otherwise, it will be sent to its network processor, which will (1) invalidate the data, (2) calculate the next IncBox to be invalidated using the global registration table, and (3) modify the packet header and send it out. The IncBoxes at client-side ToRs respond to the home server with invalidation ACKs, and the home server sends a SET/DEL ACK back to the requester when the coherence operation is complete (which is detected by counting the number of invalidation ACKs).
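A minimal sketch of the global registration table lookup follows, assuming entries indexed by (key hash, destination) as in Figure 3.6-b; the entry layout and linear scan are illustrative simplifications.

    /* Sketch of the global registration table described above: entries name
     * the next designated hop and whether this IncBox should cache the key. */
    #include <stdint.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct grt_entry {
        uint32_t key_hash;      /* hash prefix identifying the key group      */
        uint32_t dest;          /* destination server address                 */
        uint32_t next_hop;      /* next designated caching IncBox / server    */
        bool     cache_here;    /* designated caching location for this key?  */
    };

    #define GRT_SIZE 1024
    static struct grt_entry grt[GRT_SIZE];   /* preloaded by the SDN controller */

    static const struct grt_entry *
    grt_lookup(uint32_t key_hash, uint32_t dest)
    {
        for (size_t i = 0; i < GRT_SIZE; i++)
            if (grt[i].key_hash == key_hash && grt[i].dest == dest)
                return &grt[i];
        return NULL;            /* no entry: fall back to normal forwarding */
    }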

3.5.5 Optimization Between Switch and Network Processor

As described previously, the switch in an IncBox by default forwards any in-network caching requests to its network processor for further processing. For GET or DEL requests, if the key is not cached at the IncBox, this default operation will waste (1) one round trip between a switch and its network processor and (2) some processing time within the network processor. We observe that for a 1KB packet, these missed GET/DEL requests on average add 2.5us forwarding time at

the switch and 2.7–5.0us computation time at the IncBox. This becomes even worse if it happens multiple times along one request path. Therefore, we propose an optimization that uses the switch’s Forwarding Database (FDB) and associated management handlers to mitigate this overhead.

Our proposal works in the following way. First, we program the switch parser to extract the key hash field as well as MAC source and destination addresses. Second, we modify the behavior of the layer 2 processing stage of the switch’s forwarding pipeline for certain in-network cache packets. For GET requests, instead of using the MAC destination address to perform FDB lookup, we use an artificial MAC address formed by concatenating a locally administered MAC prefix [85] with the key hash. If it’s a hit, the packet is forwarded to the attached IncBox network processor that caches the key. Otherwise, the switch forwards it to the end server. For GET replies, after the IncBox network processor finishes cache operation execution and responds to the switch, we trigger the switch’s MAC learning handler on its management CPU, using it to insert the key hash into the FDB. For invalidations, the ACK from the IncBox’s network processor triggers another handler that removes the key hash from the FDB. To guarantee consistency, these switch FDB operations are triggered only after the IncBox network processor’s work is complete. This approach adds no additional latency to the switch’s forwarding pipeline, and all non-INC packet types are processed exactly as before.
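The artificial MAC address used for the FDB lookup could be formed roughly as follows; the exact prefix value is an assumption, beyond the requirement that it be a locally administered address.

    /* Sketch of forming the artificial destination MAC: a 2-byte locally
     * administered prefix followed by the 4-byte key hash. The prefix value
     * is illustrative; any prefix with the locally-administered bit (0x02 in
     * the first octet) set would do. */
    #include <stdint.h>

    static void make_fdb_mac(uint8_t mac[6], uint32_t key_hash)
    {
        mac[0] = 0x02;                    /* locally administered, unicast   */
        mac[1] = 0x00;                    /* rest of the illustrative prefix */
        mac[2] = (uint8_t)(key_hash >> 24);
        mac[3] = (uint8_t)(key_hash >> 16);
        mac[4] = (uint8_t)(key_hash >> 8);
        mac[5] = (uint8_t)(key_hash);
    }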

This approach has two limitations. First, two keys may have the same hash, so requests for keys not cached in an IncBox could be forwarded to its network processor. In this case the request is forwarded back towards the end server as it would be without this optimization. Two keys with the same hash may also be cached in the same IncBox; in this case, reference counts in the hash table can be used to ensure the FDB entry is not removed until both keys are evicted. Second, the FDB has limited size due to the switch’s resource constraints (16K entries in our testbed). To cache more than this many keys in an IncBox, a smaller, more collision-prone hash may be used in the FDB. Alternatively, heavy-hitter detection techniques may be used to identify a subset of frequently-accessed keys that can benefit from this optimization, while other keys would follow the normal, slower path.

3.5.6 Fault Tolerance

In this section, we discuss how to handle failures. Generally, a failure must be one of the following: a client failure, a non-designated switch/IncBox failure, a designated switch/IncBox failure, or a server failure. We treat a link failure as a failure of the switch/IncBox reached through that link. Client failures could happen at three points in time: (1) before sending a request; (2) after sending a request but before receiving an ACK or data response packet; or (3) after receiving an ACK. None of these have an impact on the behavior of other clients and servers. A non-designated switch failure results in path disconnection, in which case the preceding switch/IncBox chooses another path using ECMP. For a designated switch/IncBox failure, a request’s whole communication path will be disconnected, and a backup path without caching will be employed until the global controller recomputes a new global registration table. Messages sent by a server (e.g., invalidations) could be lost, in which case the messages are retransmitted after a timeout. The server itself could fail either before or after sending the ACK or data result back. In the latter case, the data in the IncCache will be consistent but not available for updates. In the former case, the data will be inconsistent and cannot be updated, but the operation would not have been deemed complete, so the application can perform appropriate exception handling.

3.5.7 Compute Primitives and Client Side APIs

IncCache’s flexible packet format supports multiple compute primitives. While this work focuses primarily on the cache, we implemented two simple compute operations as a case study: conditional put and increment. They work as follows. When an IncBox or home server receives a compute request, it (1) reads the value from the cache; (2) if it is a miss, the request is forwarded; (3) if it is a hit, the IncBox or server directly performs the operation on the cached value (e.g., increments the value); (4) coherence operations are then performed; and (5) the response is sent back to the client. As should be apparent, there are many design trade-offs about how the compute layer should interact with the caching layer. For example, when there is a key miss, it is also possible for

the IncBox to issue a fetch request and perform the operation locally (as in a write-allocate cache). For this case study, we left our cache behavior unchanged. A more complete API and exploration of design trade-offs are left to future work. The client-side API includes three classes of operations: (1) initialization/teardown calls, (2) IncCache accesses, and (3) compute case-study operations. inc_init() can be used to enable/disable the IncCache layer and configure the coherence protocol; inc_close() tears it down. inc_{get(), set(), del()} read, write, and delete keys from the IncCache and block to guarantee data coherence. inc_increment() adds its integer value argument to a specified key. inc_cond_put() sets a key to its value argument if the cached value is smaller or unset.
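A hypothetical usage of this API is sketched below. The function names come from the list above, but the argument lists are assumptions, since the dissertation does not spell out full signatures.

    /* Hypothetical usage of the client-side API; argument lists are
     * illustrative only. */
    #include <stdint.h>

    /* assumed prototypes, for illustration only */
    int inc_init(const char *config);
    int inc_set(const char *key, const void *val, uint32_t len);
    int inc_get(const char *key, void *buf, uint32_t buf_len);
    int inc_increment(const char *key, int64_t delta);
    int inc_cond_put(const char *key, int64_t val);
    void inc_close(void);

    int main(void)
    {
        char buf[64];

        inc_init(NULL);                       /* enable IncCache with defaults  */
        inc_set("counter", "0", 2);           /* cached along the request path  */
        inc_increment("counter", 5);          /* executed in-network on a hit   */
        inc_cond_put("high_score", 42);       /* set only if cached value < 42  */
        inc_get("counter", buf, sizeof buf);  /* likely served by a ToR IncBox  */
        inc_close();
        return 0;
    }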

3.5.8 IncCache Implementation

IncCache is implemented across end-host servers, switches, and network accelerators. The server-side implementation is built following the model of Memcached and can support native Memcached traffic with modifications to the packet format. We do not take advantage of fast packet processing libraries (e.g., DPDK [59]) at present because most GET requests are served by IncBox units. Moreover, our server-side implementation could easily be replaced with a high-performance key-value store, like MICA [193] or Masstree [200], with modifications to perform coherence operations. We program the Cavium XPliant switch with its SDK, which includes C++ APIs to access hardware features from the management processor, as well as a proprietary language to program the forwarding pipeline. The implementation of the network accelerator component is based on the Cavium Development Kit [79] and makes heavy use of the hardware features provided by the OCTEON-II network processor. We use a fixed-length header for INC packets to accelerate parsing. For hashtable entries, we pre-allocate a large memory region (an “arena” in the CDK) during the kernel loading phase and then use a private memory allocator (dlmalloc2) to allocate from that arena. There are multiple memory spaces in the processor, and we found that careful use is required to avoid considerable overhead. We also use the hardware-accelerated hash computation

engine and the fetch and add accelerator (FAU) to hash keys and access lock-free data structures. As packet headers need frequent re-encapsulation, we use fast packet processing APIs to read and write the header. When generating a packet inside the accelerator, we take advantage of the DMA scatter and gather functions provided by the transmit units.

3.6 Evaluation

In this section, we demonstrate how IncBricks impacts a latency-critical data center application: an in-memory key-value store similar to Memcached [147]. Similar architectures are used as a building block for other data center workloads [191, 126]. We evaluate this with a real system prototype, answering the following questions:
• How does IncBricks impact the latency and throughput of a key-value store cluster under a typical workload setup?
• What factors will impact IncBricks performance?
• What are the benefits of doing computation in the cache?

3.6.1 Experimental Setup

Platform. Our testbed is realized with a small-scale cluster that has 24 servers, 2 switches, and 4 network accelerators. Detailed information about each component is shown in Table 3.1. In our experiments, we set up a two-layer network topology where 4 emulated ToR switches connect to 2 emulated root switches; each ToR connects to both root switches. We emulate two ToR switches and one root switch in each Cavium switch by dividing the physical ports using VLANs. The IncBox at each ToR and Root switch is equipped with 4096 and 16384 entries, respectively, except for the cache size experiment. Workload. For the throughput-latency experiment, we set up a 4-server key-value store cluster and use the other 20 servers as clients to generate requests. We use our custom key-value request generator, which is similar to YCSB [128]. Based on previous studies [110, 213], the workload is configured with (1) a Zipfian distribution [99] of skewness 0.99; (2) a 95%/5% read/write ratio;

Server: Dual-socket, with a total of two Intel Xeon 6-core X5650 processors (running at 2.67GHz) and 24GB memory. Each server is equipped with a Mellanox MT26448 10Gbit NIC.

Switch: Cavium/XPliant CN88XX-based switch with 32 100Gbit ports, configured to support 10Gbit links between emulated ToR switches and clients, and 40Gbit links between emulated ToR and Root switches.

Network accelerators: Two Cavium LiquidIO boards, each with one CN6640 processor (8 cnMIPS64 cores running at 0.8GHz) and 2GB memory. Two OCTEON-II EEB68 evaluation boards, each with one CN6880 processor (32 cnMIPS64 cores running at 1.2GHz) and 2GB memory.

Table 3.1: Testbed details. Our IncBox comprises a Cavium switch and a network accelerator (OCTEON or LiquidIO).

(3) 1 million keys with a 1024B value size; and (4) two deployments: within-cluster and cross-cluster. The compute performance evaluation uses the full cluster with a uniform random load generator.

3.6.2 Throughput and Latency

In this experiment, we compare the throughput and latency of a vanilla key-value cluster with that of one augmented with IncBricks. Figures 3.7 and 3.8 report the average latency versus throughput under two cluster deployments. This comparison shows:
• At low to medium request rates (less than 1 million ops/s), IncBricks provides up to 10.4 us lower latency for the within-cluster case. The cross-cluster deployment provides an additional 8.6 us reduction because of the additional latency at Root and ToR switches. For high request rates (at 1.2 million ops/s), IncBricks saves even more: 85.8 us and 123.5 us for the within-cluster and cross-cluster cases, respectively.

Figure 3.7: Average latency versus throughput in the within-cluster scenario.
Figure 3.8: Average latency versus throughput in the cross-cluster scenario.

• IncBricks is able to sustain higher throughput than the vanilla key-value cluster for all scenarios. For example, IncBricks provides 857.0K ops/s and 458.4K ops/s more throughput for the within-cluster and cross-cluster scenarios, respectively. Furthermore, our measured maximum throughput is limited by the number of client nodes in our cluster and the rate at which they generate requests.
• This experiment does not include the switch optimization. If it did, our measurements suggest that we could save up to an additional 7.5 us in each IncBox, by reducing both (1) communication time between the switch and accelerator and (2) execution time inside the accelerator. This would provide significant benefits in the cross-cluster scenario since it has more hops.

SET request latency IncBricks SET requests have higher latency than the vanilla case due to cache coherence overhead. In this section, we break down SET processing latency into four components: server processing, end-host OS network stack overhead, switch forwarding, and IncBox processing. Figure 3.9 shows this for both the within-cluster and cross-cluster scenarios. First, server processing and the client and server OS network stack take the same amount of time (20 us) for both scenarios because the end-hosts behave the same in each. To reduce these components, we could apply kernel bypass techniques [59] or use a higher-performance key-value

Figure 3.9: SET request latency breakdown for within-cluster and cross-cluster cases.
Figure 3.10: The impact of cache size at ToR and Root IncBoxes. The y-axis is the average request latency.

store instead (e.g., MICA [193] or Masstree [200]). Second, the switch forwarding latency increases from 21.5% of the total latency in the within-cluster case to 41.2% of the total latency in the cross-cluster case. This is due to the larger number of hops between clients and servers, which both requests and replies as well as invalidation traffic must traverse. Third, IncBox processing latency is higher (36 us) in the cross-cluster scenario compared with the within-cluster one (12 us) for the same reason. The optimization from Section 3.5.5 can help reduce this.

3.6.3 Performance Factors of IncBricks

We explored two parameters that impact IncBricks performance: cache size and workload skewness.

Cache size Figure 3.10 shows the result of varying the number of entries in the IncCache for both ToR and Root IncBoxes. We find that ToR cache size has a significant impact. As we increase the number of available cache entries from 512 to 16384, the average request latency decreases from 194.2 us to 48.9 us. Since we use a Zipfian distribution with skewness 0.99, 4096 entries cover

Figure 3.11: The impact of skewness on IncBricks for the cross-cluster case. The y-axis is the average request latency.
Figure 3.12: Performance comparison of the conditional put command among client-side, server-side, and IncBricks implementations.

60% of the accesses, and 16384 entries cover 70%. That is why larger cache sizes provide diminishing performance benefits. We also find that Root cache size has little impact on performance. This is due to our cache being inclusive; it is hard for requests that miss in the ToR IncBox to hit in a Root IncBox. This suggests that a ToR-only IncCache might provide the bulk of the benefit of the approach, or that moving to an exclusive cache model might be necessary to best exploit IncCaches at root switches.

Workload skewness We vary the skewness from 0.99 to 0.01 and find that the average request latency at the client ToR IncBox increases significantly (by up to 2 ms). Since our workload generates 1M keys and there are 4096 entries in the ToR IncCache, there will be many cache miss events as the workload access pattern becomes more uniform. These misses will fetch data from end-servers but also go through the IncBricks caching layer, adding latency. This may have two other consequences: the additional misses may cause network congestion on the key path, and the high miss rate may lead to high cache eviction overhead in the memory management layer of each IncBox unit. This suggests that a mechanism to detect “heavy-hitter” keys and bias the cache toward holding these hot keys could be beneficial. We plan to explore this in future work.

3.6.4 Case Study: Conditional Put and Increment

This experiment demonstrates the benefit of performing compute operations in the network. As described above, we use conditional puts to demonstrate this capability. We compare three implementations: (1) client-side, where clients first fetch the key from the key-value servers, perform a computation, and then write the result back to the server; (2) server-side, where clients send requests to the server and the server performs the computation locally and replies; and (3) IncBricks, where computation is done in the network. We use a simple request generator for this experiment that generates uniform random operands that are applied to one of the keys, selected uniformly at random. Figure 3.12 reports both latency and throughput for these three cases. The client-side implementation must perform both a GET and a SET for each operation, leading to 106.9 us latency and 113.3K ops/s. The server-side implementation only sends one message per remote operation and thus obtains better performance: 52.8 us and 228.1K ops/s. IncBricks performs the best since the operation can be performed directly by the IncBox, which results in 31.7 us latency and 379.5K ops/s. We also conducted the same experiment for increment operations, and the results were essentially identical and thus are omitted.

3.7 Related Work

3.7.1 In-network Aggregation

Motivated by the partition/aggregation model [104] used by many data center applications (e.g., [257, 162, 135]), in-network aggregation techniques have been developed to mitigate bandwidth constraints during the aggregation phase of these applications. Camdoop [131] exploits a low-radix directly-connected network topology to enable aggregation during the MapReduce shuffle phase. Servers are responsible for forwarding packets through the network, making computation on data traversing the network straightforward. However, this network is very different from the high-radix indirect topologies commonly seen in data centers today. NETAGG [199] targets similar

aggregation opportunities, but on common data center network topologies, by attaching software middleboxes to network switches with high-capacity links. A set of shims and overlays route requests to middleboxes to be aggregated on their way to their final destination. In contrast to these largely-stateless approaches, IncBricks supports stateful applications with its coherent key-value cache. Data aggregation operations could be implemented as applications on top of IncBricks.

3.7.2 In-network Monitoring

The limited compute capability found in switches today is largely intended for collecting network performance statistics (queue occupancy, port utilization, etc.) in order to support tasks such as congestion control, network measurement, troubleshooting, and verification. Minions [165] proposes embedding tiny processors in the switch data plane to run small, restricted programs embedded in packets to query and manipulate the state of the network. Smart Packets [233] explores a similar idea but in the switch control plane, with fewer restrictions on what could be executed, making it harder to sustain line rate. In contrast, the vision of IncBricks is to repurpose this in-switch compute capability (which was originally intended for these administrator-level tasks), together with network accelerators, for user-level applications.

3.7.3 Network Co-design for Key-value Stores

Dynamic load balancing is a key technique to ensure that scale-out storage systems meet their performance goals without considerable over-provisioning. SwitchKV [192] is a key-value store that uses OpenFlow-capable switch hardware to steer requests between high-performance cache nodes and resource-constrained backend storage nodes based on the content of the requests and the hotness of their keys. IncBricks also does content-based routing of key-value requests, but by dispersing its cache throughout the network it can reduce communication in the core of the network, avoiding the hot spots SwitchKV seeks to mitigate.

3.8 Summary and Learned Lessons

This chapter presents IncBricks, a hardware-software co-designed in-network caching fabric with basic computing primitives for data center networks. IncBricks comprises two components: (1) IncBox, a programmable middlebox combining a reconfigurable switch and a network accelerator, and (2) IncCache, a distributed key-value store built on IncBoxes. We prototype IncBricks in a real system using Cavium XPliant switches and OCTEON network accelerators. Our prototype lowers request latency by over 30% and doubles throughput for 1024-byte values in a common cluster configuration. When doing computations on cached values, IncBricks provides 3 times more throughput and a third of the latency of client-side computation. These results demonstrate the promise of in-network computation. We learned several lessons from building IncBricks. First, approximation is a key technique for coping with resource constraints on a programmable switch. For example, one should take advantage of probabilistic data structures (such as the count-min sketch and the Bloom filter) to save state given the storage limits, tailor the application logic so that it fits into one ALU unit, and spread computation across several stages. Second, co-designing the switch and accelerator logic is helpful. Switches can do certain things very effectively, such as application-level monitoring and traffic control over terabits of traffic, and can forward the interesting subset of traffic to the network accelerator. Accelerators can maintain rich data structures with a reasonable amount of memory and can handle the modest amounts of traffic forwarded to them by the switch. Finally, IncBricks is a disaggregated design that fully explores in-network computing capabilities. Recently, some work [167, 166] has tried to fit the application entirely into the switch to take advantage of its high computational bandwidth. We instead take a different approach that combines the programmable switch with an accelerator. This allows for the use of different types of accelerators as needed. It also allows for the independent scaling of accelerators and offloading processing to customized hardware.
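As an illustration of the first lesson, the following sketch shows the kind of Bloom filter a switch or IncBox could use to track cached keys approximately within a small, fixed memory budget; the filter size, number of hash functions, and hash mixing are illustrative choices.

    /* A minimal Bloom filter sketch: fixed 8 KB of state, three hash
     * functions, answering "possibly cached" or "definitely not cached". */
    #include <stdint.h>

    #define BF_BITS  (1 << 16)            /* 64 Kbit = 8 KB of filter state */
    static uint8_t bf[BF_BITS / 8];

    static uint32_t bf_hash(uint32_t x, uint32_t seed)
    {
        x ^= seed;                        /* cheap xor-shift style mixing */
        x ^= x >> 16; x *= 0x7feb352d;
        x ^= x >> 15; x *= 0x846ca68b;
        x ^= x >> 16;
        return x % BF_BITS;
    }

    static void bf_insert(uint32_t key_hash)
    {
        for (uint32_t s = 0; s < 3; s++) {         /* 3 hash functions */
            uint32_t b = bf_hash(key_hash, s);
            bf[b / 8] |= (uint8_t)(1u << (b % 8));
        }
    }

    static int bf_maybe_contains(uint32_t key_hash)
    {
        for (uint32_t s = 0; s < 3; s++) {
            uint32_t b = bf_hash(key_hash, s);
            if (!(bf[b / 8] & (1u << (b % 8))))
                return 0;                          /* definitely not cached */
        }
        return 1;                                  /* possibly cached */
    }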

4 iPipe: Offloading Distributed Actors onto SmartNICs

Emerging SmartNICs, enclosing rich computing resources (e.g., a multi-core processor or FPGAs, reconfigurable match-action tables, onboard SRAM/DRAM, accelerators, programmable DMA engines), hold the potential to offload generic data center server tasks. However, it is unclear how to use a SmartNIC efficiently and maximize the offloading benefits, especially for distributed applications. In this chapter, we first characterize commodity SmartNICs and summarize the offloading performance implications from four perspectives: traffic control, computing capability, onboard memory, and host communication. Next, based on our characterization, we build the second PNF-enabled system (called iPipe), an actor-based framework for offloading distributed applications onto SmartNICs. At the core of iPipe is a hybrid scheduler, combining FCFS and DRR-based processor sharing, which can tolerate tasks with variable execution costs and maximize NIC compute utilization. We build three applications (i.e., a real-time data analytics engine, a distributed transaction system, and a replicated key-value store) using iPipe and evaluate them on our prototyped systems.

This chapter is organized as follows. Section 4.1 describes the background and motivation for SmartNIC offloading. Section 4.2 highlights the overall design of the iPipe framework. Section 4.3 presents the characterization results and summarizes the design implications. We describe the design and implementation of the iPipe framework and applications in Sections 4.4 and 4.5. Section 4.6 reports our evaluation results. We review relevant studies in Section 4.7. Section 4.8 concludes this chapter and discusses our learned lessons.

4.1 Background

SmartNICs have emerged for the data center, aiming to mitigate the gap between increasing network bandwidth and stagnating CPU computing power [2, 3, 10]. In the last two years, major network hardware vendors have released different SmartNIC products, such as Mellanox’s Bluefield [68], Broadcom’s Stingray [42], Marvell (Cavium)’s LiquidIO [43], Huawei’s IN5500 [16], Netronome’s Agilio [76], Mellanox Innova [69], and Alpha Data [27]. They not only target acceleration of traditional networking/storage protocol processing (e.g., OVS/RoCE/iWARP/TCP offloading, traffic monitoring, firewalls, etc.), but also bring a new computing substrate into the data center to expand server computing capacity at a lower cost: SmartNICs usually enclose simple-microarchitecture computing cores or reconfigurable hardware logic that are cheap and cost-effective. Generally, these SmartNICs comprise a multi-core, possibly wimpy, processor (i.e., MIPS/ARM ISA), FPGAs, onboard SRAM/DRAM, packet processing/domain-specific accelerators, and programmable DMA engines. Different architectural components are connected by high-bandwidth coherent memory buses or high-performance interconnects. Today, most of these SmartNICs are equipped with one or two 10/25GbE ports, and 100GbE is on the horizon. These rich computing resources allow hosts to offload generic computations (with complex algorithms and data structures) without sacrificing performance (i.e., latency/throughput) or program generality. The key question we ask in this project is how to use these SmartNICs efficiently to maximize such benefits for distributed applications. There have been some recent research efforts that offload networking functions onto FPGA-based SmartNICs (e.g., ClickNP [189], AzureCloud [146]). They take a conventional domain-specific acceleration approach that consolidates as much of the application logic as possible onto FPGA programmable logic blocks. This approach is applicable to a specific class of applications that exhibit sufficient parallelism, deterministic program logic, and regular data structures that can be synthesized efficiently on FPGAs. Our focus, on the other hand, is to target distributed applications with complex data structures and algorithms that cannot be realized efficiently on FPGA-based

In this chapter, we describe how to efficiently use a SmartNIC's diverse computing resources to offload distributed applications. We build a programming framework along with a runtime to achieve this goal.

4.2 Solution Overview

We first perform a detailed performance characterization of four commodity SmartNICs (i.e., LiquidIOII CN2350, LiquidIOII CN2360, Bluefield 1M332A, and Stingray PS225), shown in Table 2.1. We decompose a Multi-core SoC SmartNIC into four architectural components – traffic control, computing units, onboard memory, and host communication – and use microbenchmarks to understand their performance implications. The experiments identify the resource constraints that we have to be cognizant of, illustrate the utility of a SmartNIC's hardware acceleration units, and provide guidance on how to efficiently utilize the SmartNIC resources.

We design and implement the iPipe framework based on our characterization observations. iPipe introduces an actor programming model for distributed application development. Each actor has self-contained private state and communicates with other actors via messages. Our framework provides a distributed memory object abstraction and enables actor migration, responding to dynamic workload changes and ensuring the delivery of line-rate traffic. A central piece of iPipe is the actor scheduler that combines the use of FCFS and DRR-based processor sharing, in order to tolerate tasks with variable execution costs and maximize the SmartNIC computing resource utilization. iPipe allows multiple actors to coexist safely on the SmartNIC, protecting against actor state corruption and denial-of-service attacks.

We prototype iPipe and build three applications (i.e., a real-time data analytics engine, a distributed transaction processing system, and a replicated key-value store) using commodity 10GbE/25GbE SmartNICs. We evaluate the system using an 8-node testbed and compare the performance against DPDK-based implementations. Our experimental results show that we can significantly reduce the host load for three real-world distributed applications; iPipe saves up to 3.1/2.2 beefy Intel cores used to process 25/10Gbps of application bandwidth, along with up to 23.0µs and 28.0µs savings in request processing latency.


4.3 Understanding the SmartNIC

This section characterizes the SmartNIC from four perspectives (i.e., traffic control, computing units, onboard memory, host communication), and summarizes implications that guide the design of iPipe.

4.3.1 Experiment Setup

We use Supermicro 1U boxes as host servers for both the client and server and an Arista DCS-7050S/Cavium XP70 ToR switch for the 10/25GbE network. The client is equipped with a dumb NIC (i.e., Intel XL710 for 10GbE and Intel XXV710-DA2 for 25GbE). We insert the SmartNIC into one of the PCIe 3.0 ×8 slots on the server side. The server box has a 12-core E5-2680 v3 Xeon CPU running at 2.5GHz with hyperthreading enabled, 64GB DDR3 DRAM, and a 1TB Seagate HDD. When evaluating the Bluefield and Stingray cards, we use a 2U Supermicro server with two 8-core Intel E5-2620 v4 processors at 2.1GHz, 128GB memory, and 7 Gen3 PCIe slots. We use DPDK pktgen as the workload generator and augment it with the capability to generate different application-layer packet formats at the desired packet interval. We report end-to-end performance metrics (e.g., latency/throughput), as well as microarchitectural counters (e.g., IPC, L2 cache misses per kilo-instruction or MPKI).

4.3.2 Traffic Control

As described above, traffic control is responsible for two tasks: packet forwarding (through the TX/RX ports) and feeding packets to the NIC computing cores. Here, we use an ECHO server that runs entirely on a SmartNIC to understand: (1) how many NIC cores are sufficient to saturate the link speed for different packet sizes and how much computing capacity is left for other "offloaded applications"; (2) what are the synchronization overheads in supplying packets to multiple NIC cores.

Figure 4.1: SmartNIC bandwidth varying the number of NIC cores for the 10GbE LiquidIOII CN2350.

Figure 4.2: SmartNIC bandwidth varying the number of NIC cores for the 25GbE Stingray PS225.

Figures 4.1 and 4.2 present experimental data for the 10GbE LiquidIOII CN2350 and the 25GbE Stingray PS225. When the packet size is 64B/128B, neither NIC can achieve full link speed even if all NIC cores are used. However, when the packet size is 256B/512B/1024B/1500B (MTU), the LiquidIOII requires 10/6/4/3 cores to achieve line rate, while the Stingray takes 3/2/1/1 cores. The Stingray uses fewer cores due to its much higher core frequency (3.0GHz vs. 1.20GHz). This indicates that packet forwarding is not free; it is the default execution tax of a SmartNIC. Figure 4.3 reports the average and tail (p99) latency when achieving the maximum throughput for four different packet sizes using 6 and 12 cores. Interestingly, the latencies don't increase as we increase the core count; compared to the 6-core case, the 12-core experiments only add 4.1%/3.4% average/p99 latency on average across the four scenarios. Similar results can be observed on the Stingray card. This means that the hardware traffic manager is effective at providing a shared queue abstraction with little synchronization overhead for packet buffer management.

Design implications: I1: Not only is packet forwarding not free, but the packet size distribution of incoming traffic significantly impacts the availability of computing cycles on a Multi-core SmartNIC. One should monitor the packet (request) sizes and adaptively make offloading decisions. I2: Hardware support reduces synchronization overheads and enables scheduling paradigms that involve multiple workers pulling incoming traffic from a shared queue.

Figure 4.3: Average/p99 latency when achieving the max throughput on the 10GbE LiquidIOII CN2350.

Figure 4.4: Send/Recv latency on the 10GbE LiquidIOII CN2350, compared with RDMA/DPDK.

4.3.3 Computing Units

To explore the execution behavior of the computing units, we use the following: (1) a microbenchmark suite comprising representative in-network offloaded workloads from recent literature; (2) low-level primitives that trigger the domain-specific accelerators. We conduct experiments on the 10GbE LiquidIOII CN2350 and report both system and microarchitecture results in Table 4.1. We observe the following. First, the execution times of the offloaded tasks vary significantly, from 1.9/2.0us (replication and load balancer) to 34.0/71.0us (ranker/classifier). Second, low IPC1 or high MPKI is an indicator of high computing cost, as in the case of the rate limiter, packet scheduler, and classifier. Tasks with high MPKI are memory-bound tasks that are less likely to benefit from the complex microarchitecture on the host, and they might be ideal candidates for offloading. Third, SmartNIC accelerators provide fast domain-specific processing appropriate for networking/storage tasks. For example, the MD5/AES engine is 7.0X/2.5X faster than the one on the host server (even when the host uses the Intel AES-NI instructions). However, invoking an accelerator is not free, since the NIC core has to wait for the execution completion and also incurs cache misses (i.e., higher MPKI) when feeding data to the accelerator.

1Note that the cnMIPS OCTEON [44] is a 2-way processor, so the ideal IPC is 2.

Applications | Comp. | DS | Exe. Lat. (us) | IPC | MPKI || Acl. | IPC | MPKI | Exe. Lat. (us) bsz=1 / bsz=8 / bsz=32
Baseline (echo) | N/A | N/A | 1.87 | 1.4 | 0.6 || CRC | 1.2 | 2.8 | 2.6 / 0.7 / 0.3
Flow monitor [235] | Count-min sketch | 2-D array | 3.2 | 1.4 | 0.8 || MD5 | 0.7 | 2.6 | 5.0 / 3.1 / 3.0
KV cache [188] | key/value Rd/Wr/Del | Hashtable | 3.7 | 1.2 | 0.9 || SHA-1 | 0.9 | 2.6 | 3.5 / 1.2 / 0.9
Top ranker [218] | Quick sort | 1-D array | 34.0 | 1.7 | 0.1 || 3DES | 0.8 | 0.9 | 3.4 / 1.3 / 1.1
Rate limiter [189] | Leaky bucket | FIFO | 8.2 | 0.7 | 4.4 || AES | 1.1 | 0.9 | 2.7 / 1.0 / 0.8
Firewall [189] | Wildcard match | TCAM | 3.7 | 1.3 | 1.6 || KASUMI | 1.0 | 0.9 | 2.7 / 1.1 / 0.9
Router [176] | LPM lookup | Trie | 2.2 | 1.3 | 0.6 || SMS4 | 0.8 | 0.9 | 3.5 / 1.4 / 1.2
Load balancer [140] | Maglev LB | Permut. table | 2.0 | 1.3 | 1.3 || SNOW 3G | 1.4 | 0.5 | 2.3 / 0.9 / 0.8
Packet scheduler [105] | pFabric scheduler | BST tree | 12.6 | 0.5 | 4.9 || FAU | 1.4 | 0.6 | 1.9 / 1.4 / 1.0
Flow classifier [195] | Naive Bayes | 2-D array | 71.0 | 0.5 | 15.2 || ZIP | 1.0 | 0.2 | 190.9 / N/A / N/A
Packet replication [175] | Chain replication | Linked list | 1.9 | 1.4 | 0.6 || DFA | 1.3 | 0.2 | 9.2 / 7.5 / 7.3

Table 4.1: Performance comparison among generic offloaded applications and accelerators for the 10GbE LiquidIOII CN2350. Request size is 1KB for all cases. We report both per-request execution time as well as microarchitectural counters. DS=Data structure. IPC=Instructions per cycle. MPKI=L2 cache misses per kilo-instructions. Acl.=Accelerator. bsz=Batch size. DFA=Deterministic Finite Automaton.

Batching can amortize invocation costs but results in tying up the NIC core for longer periods. Other SmartNICs (e.g., BlueField and Stingray) display similar characteristics.

Figure 4.5: Per-core blocking/non-blocking DMA read/write latency when increasing payload size.

Figure 4.6: Per-core blocking/non-blocking DMA read/write throughput when increasing payload size.

SmartNICs also usually enclose specialized accelerators for packet processing. Take the LiquidIOII cards (CN2350/CN2360) as an example: they have packet input (PKI) and packet output (PKO) units for moving data between the MAC and the packet buffer, and a hardware-managed packet buffer along with fast packet indexing. When compared with two host-side kernel-bypass networking stacks (DPDK/RDMA), even with the polling model, the SmartNIC's SEND path shows 4.6X/4.2X speedups on average across all cases (shown in Figure 4.4), respectively.

Design implications: I3: The offloading framework should be able to handle tasks with a wide range of execution latencies and simultaneously ensure that the NIC's packet forwarding is not adversely impacted. I4: One should take advantage of the available accelerators on the SmartNIC and perform batched execution for domain-specific ones if necessary (at the risk of increased queueing for incoming traffic).

4.3.4 Onboard Memory

Generally, a SmartNIC has the following onboard memory resources in its hierarchy: (1) Scratchpad/L1 cache, which is per-core local memory. It has limited size (e.g., the LiquidIOII has 54 cache lines of scratchpad) with fast access speed. (2) Packet buffer. This is onboard SRAM along with fast indexing. A SmartNIC (like the LiquidIOII) usually has hardware-based packet buffer management. Some SmartNICs (like Bluefield and Stingray) don't have a dedicated packet buffer region.

Device | L1 (ns) | L2 (ns) | L3 (ns) | DRAM (ns)
LiquidIOII CNXX | 8.3 | 55.8 | N/A | 115.0
BlueField 1M332A | 5.0 | 25.6 | N/A | 132.0
Stingray PS225 | 1.3 | 25.1 | N/A | 85.3
Host Intel Server | 1.2 | 6.0 | 22.4 | 62.2

Table 4.2: Access latency of 1 cacheline for the different memory hierarchies on 4 SmartNICs and the Intel server. The cacheline size for the LiquidIOII cards is 128B while the rest use 64B. The performance of the LiquidIOII CN2350 and CN2360 is similar.

(3) L2 cache, which is shared across all NIC cores. (4) NIC local DRAM, which is accessed via the onboard high-bandwidth coherent memory bus. Note that a SmartNIC can also read/write host memory using its DMA engine (as evaluated in the next section). We use a pointer-chasing microbenchmark (with random stride distances) to characterize the access latency of the different memory hierarchies for the 4 SmartNICs and compare it with the host server. The results in Table 4.2 illustrate that there is significant diversity in memory subsystem performance across SmartNICs. Also, the memory performance of many of the SmartNICs is worse than the host server (e.g., the access latency of the SmartNIC L2 cache is comparable to the L3 cache on the host server), but the well-provisioned Stingray has performance comparable to the host.

Design implications: I5: When the application working set exceeds the L2 cache size of the SmartNIC, executing memory-intensive workloads on the SmartNIC might result in a performance loss compared to running on the host.
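To make the pointer-chasing methodology concrete, the sketch below shows the common shape of such a microbenchmark: dependent loads over a randomly permuted, cache-line-sized linked list defeat prefetching, so each iteration pays roughly one memory access. The working-set size, iteration count, and timing source here are illustrative assumptions, not the exact parameters we used.

/* Minimal pointer-chasing sketch (illustrative): build a random cyclic
 * permutation of cache-line-sized nodes, then time dependent loads. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define LINE  64                           /* assumed cache-line size       */
#define NODES (16 * 1024 * 1024 / LINE)    /* 16MB working set (assumption) */

struct node { struct node *next; char pad[LINE - sizeof(void *)]; };

int main(void) {
    struct node *buf = aligned_alloc(LINE, NODES * sizeof(struct node));
    size_t *idx = malloc(NODES * sizeof(size_t));
    for (size_t i = 0; i < NODES; i++) idx[i] = i;
    srand(1);                               /* shuffle to defeat prefetching */
    for (size_t i = NODES - 1; i > 0; i--) {
        size_t j = rand() % (i + 1);
        size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
    }
    for (size_t i = 0; i < NODES; i++)      /* link nodes into one cycle     */
        buf[idx[i]].next = &buf[idx[(i + 1) % NODES]];

    struct timespec t0, t1;
    volatile struct node *p = &buf[idx[0]];
    long iters = 100L * 1000 * 1000;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++) p = p->next;   /* dependent loads       */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("avg load latency: %.1f ns\n", ns / iters);
    free(idx); free(buf);
    return 0;
}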

4.3.5 Host Communication

A SmartNIC communicates with host processors using DMA engines through the PCIe bus. PCIe is a packet-switched network with 500ns-2us latency and 7.87 GB/s theoretical bandwidth per Gen3 x8 endpoint (which is the configuration used by all of our SmartNICs).

Its performance is usually impacted by many runtime factors. With respect to latency, DMA engine queueing delay, PCIe request size and its response ordering, PCIe completion word delivery, and host DRAM access costs can all slow down PCIe packet delivery [151, 171, 211]. With respect to throughput, PCIe is limited by transaction layer packet (TLP) overheads (i.e., 20-28 bytes for header and addressing), the maximum number of credits used for flow control, the queue size in DMA engines, and the PCIe tags used for identifying unique DMA reads.

Generally, a DMA engine provides two kinds of primitives: blocking accesses, which wait for the DMA completion word from the engine, and non-blocking ones, which allow the processing core to continue executing after sending the DMA commands into the command queue. Figures 4.5 and 4.6 show our performance characterization of the 10GbE LiquidIOII CN2350. Non-blocking operations insert a DMA instruction word into the queue and don't wait for completion. Hence, their read/write latency and throughput are independent of payload size and outperform the blocking primitives. For blocking DMA reads/writes, a large message can fully utilize the PCIe bandwidth. For example, with 2KB payloads, one can achieve 2.1/1.4 GB/s per-core PCIe write/read bandwidth, outperforming the 64B case by 8.7X/6.0X. This indicates that one should take advantage of the DMA scatter-gather technique. Some SmartNICs (like BlueField and Stingray) expose RDMA verbs instead of native DMA primitives. We characterize the one-sided RDMA read/write latency from a SmartNIC to its host using the Mellanox Bluefield card, which resembles the DMA blocking operations. We observe results similar to the LiquidIOII ones, but it requires much larger payloads to amortize the verbs software processing cost.

SmartNICs can also assist in intelligent packet steering. Multiqueue is a de-facto standard for today's modern NICs. By providing multiple transmit and receive queues, different host CPU cores can process packets simultaneously without coordination, achieving higher throughput. Normal NICs usually apply a hashing function to map network flows into different queues, referred to as RSS [87] (receive side scaling) or Intel Flow Director [61]. SmartNICs can push this benefit even further – by looking into the application header, one can perform application-level dispatching.

The performance benefits of this feature over host-side dispatching can be significant.

Design implications: I6: There are significant performance benefits to using non-blocking DMA and aggregating transfers into large PCIe messages (via DMA scatter-gather). In addition, SmartNICs can assist host processing by performing intelligent application-layer dispatching.

4.4 iPipe Framework

This section describes the design and implementation of our iPipe framework. We use the insights from our characterization experiments to address the following challenges.
• Programmability: A commodity server equipped with a SmartNIC is a non-cache-coherent heterogeneous computing platform with asymmetric computing power. We desire simple programming abstractions that can be used for developing general distributed applications.
• Computation efficiency: There are substantial computing resources on a SmartNIC (e.g., a multi-core processor, modest L2/DRAM, plenty of accelerators, etc.), but one should use them in an efficient way. Inappropriate offloading could cause NIC core overloading, bandwidth loss, and wasteful execution stalls.
• Isolation: A SmartNIC can hold multiple applications simultaneously. One should guarantee that different applications cannot touch each other's state, that there is no performance interference between applications, and that tail latency increases, if any, are modest.

4.4.1 Actor Programming Model and APIs

iPipe applies an actor programming model [159, 103, 240] for application development. iPipe uses the actor-based model, instead of, say, data-flow or thread-based models, for the following reasons. First, unlike a thread-based model, actors interact with each other not through shared memory but with explicit messages. Given the communication latencies that we observe between the NIC and the host, explicit messaging is more appropriate in our setting. Second, the actor model is able to support computing heterogeneity and hardware parallelism automatically. While data-flow models also provide such support, the actor-based model allows for non-deterministic and irregular communication patterns that arise in complex distributed applications.

Finally, actors have well-defined associated states and can be migrated between the NIC and the host dynamically. This allows us to have dynamic control over the use of SmartNIC computing capabilities as we adapt to traffic characteristics (which is necessary given our characterization experiments).

An actor is a computation agent that performs two kinds of operations based on the type of incoming messages: (1) trigger its execution handlers and manipulate its private state; (2) interact with other actors by sending messages asynchronously. Actors don't share memory. In our system, every actor has an associated structure with the following fields: (1) init_handler and exec_handler for state initialization and message execution; (2) private_state, which can use different data types (described below); (3) mailbox, a multi-producer multi-consumer concurrent FIFO queue used to store incoming asynchronous messages; (4) exec_lock, used to decide whether an actor can be executed on multiple cores; (5) some runtime information, such as port, actor_id, and a reference to actor_tbl, which contains the communication addresses of all actors. The structure outlined above represents a streamlined and light-weight actor design.

The iPipe runtime enables the actor-based model by providing support for actor allocation/destruction, runtime scheduling of actor handlers, and transparent migration of actors and their associated state. Specifically, iPipe has three key system components: (1) an actor scheduler that works across both SmartNIC and host cores and uses a hybrid FCFS/DRR scheduling discipline to enable execution of actor handlers with diverse execution costs; (2) a distributed memory object (DMO) abstraction that enables flexible actor migration, as well as support for a software-managed cache to mitigate the cost of SmartNIC-to-host communication; (3) a security isolation mechanism that protects actor state from corruption and denial-of-service attacks. We describe them in the following sections.

Table 4.3 presents the major APIs. The actor management APIs are used by our runtime. We provide five calls for managing DMOs. When creating an object on the NIC, iPipe first allocates a local memory region using the dlmalloc2 allocator and then inserts an entry (i.e., object ID, actor ID, start address, size) into the NIC object table. Upon dmo_free, iPipe frees the space allocated for the object and deletes the entry from the object table.

dmo_memset, dmo_memcpy, and dmo_memmove resemble the memset/memcpy/memmove APIs in glibc, except that they use object IDs instead of pointers. For the networking stack, iPipe takes advantage of the packet processing accelerators to build a shim networking stack for the SmartNIC. This stack performs Layer 2/Layer 3 protocol processing, such as packet encapsulation/decapsulation, fragmentation, checksum verification, etc. When building a packet, it uses the DMA scatter-gather technique to combine the header and payload if they are not colocated. This helps improve bandwidth utilization, as shown in our characterization (Section 4.3.5).
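As a concrete illustration of the per-actor structure described above, the C sketch below mirrors the listed fields; the field types, the opaque message and mailbox declarations, and the handler signatures are our own illustrative choices rather than the actual iPipe definitions.

/* Illustrative per-actor structure; field names follow the text, types are assumed. */
#include <stdint.h>

struct message;                 /* an asynchronous message (opaque here)        */
struct mailbox;                 /* MPMC FIFO of pending messages (opaque here)  */
struct actor;

typedef void (*init_handler_t)(struct actor *self);
typedef void (*exec_handler_t)(struct actor *self, struct message *msg);

struct actor {
    uint32_t        actor_id;
    uint16_t        port;            /* runtime routing information             */
    init_handler_t  init_handler;    /* initializes private_state               */
    exec_handler_t  exec_handler;    /* invoked for each incoming message       */
    void           *private_state;   /* self-contained state, never shared      */
    struct mailbox *mailbox;         /* multi-producer multi-consumer queue     */
    int             exec_lock;       /* whether multiple cores may run the actor */
    struct actor  **actor_tbl;       /* communication addresses of all actors   */
};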

4.4.2 iPipe Actor Scheduler

iPipe schedules actor execution among the SmartNIC and host cores. The scheduler allocates actor execution tasks to computing cores and specifies a custom scheduling discipline for different tasks. In designing the scheduler, we not only want to maximize the computing resource utilization on the SmartNIC but also ensure that computing efficiency does not come at the cost of compromising the NIC's primary task of conveying traffic. Recall that all traffic is conveyed through SmartNIC cores, so executing actor handlers could adversely impact the latency and throughput of other traffic.

Problem formulation and background. The runtime system executes on both the host and the SmartNIC, determines on which side an actor executes, and schedules the invocation of actor handlers. There are two critical decisions in the design of the scheduler: (a) whether the scheduling system is modeled as a centralized, single-queue model or as a decentralized, multi-queue model, and (b) the scheduling discipline used for determining the order of task execution. We consider each of these issues below.

It is well understood that the decentralized multi-queue model can be implemented without synchronization but suffers from temporary load imbalances, thus leading to worse tail latencies. Fortunately, hardware traffic managers on SmartNICs provide support for a shared queue abstraction with low synchronization overhead (see Section 4.3). We therefore resort to using a centralized queue model on the SmartNIC and a decentralized multi-queue model on the host side, along with NIC-side support for flow steering.

Category | API | Explanation
Actor | actor_create (*) | create an actor
Actor | actor_register (*) | register an actor into the runtime
Actor | actor_init (*) | initialize an actor's private state
Actor | actor_delete (*) | delete the actor from the runtime
Actor | actor_migrate (*) | migrate an actor to the host
DMO | dmo_malloc | allocate a DMO
DMO | dmo_free | free a DMO
DMO | dmo_mmset | set space in a DMO with a value
DMO | dmo_mmcpy | copy data from a DMO to a DMO
DMO | dmo_mmmove | move data from a DMO to a DMO
DMO | dmo_migrate | migrate a DMO to the other side
MSG | msg_init | initialize a remote message I/O ring
MSG | msg_read (*) | read new messages from the ring
MSG | msg_write | write messages into the ring
Nstack | nstack_new_wqe | create a new WQE
Nstack | nstack_hdr_cap | build the packet header
Nstack | nstack_send | send a packet to the TX
Nstack | nstack_get_wqe | get the WQE based on the packet
Nstack | nstack_recv (*) | receive a packet from the RX

Table 4.3: iPipe major APIs. There are four categories: actor management (Actor), distributed memory object (DMO), message passing (MSG), and networking stack (Nstack). The Nstack has additional methods for packet manipulation. APIs with * are mainly used by the runtime as opposed to actor code.

We next consider the question of what scheduling discipline to use and how it impacts the average and tail response times for scheduled operations (i.e., both actor handlers and message forwarding operations). Note that the response time, or sojourn time, is the total time spent, including queueing delay and request processing time.

If our goal is to optimize for mean response time, then Shortest Remaining Processing Time (SRPT) and its non-preemptive counterpart, Shortest Job First (SJF), are considered optimal regardless of the task size and interarrival time distributions [232]. However, in our setting, we also care about the tail response time; even if the application can tolerate it, a high response latency in our setting means that the NIC isn't performing its basic duty of forwarding traffic in a timely manner. If we were to consider minimizing the tail response time, then First Come First Served (FCFS) is considered optimal when the task size distribution has low variance [242] but has been shown to perform poorly when the task size distribution has high dispersion or is heavy-tailed [106]. In contrast, Processor Sharing is considered optimal for high-variance distributions [256].

In addition to the issues described above, the overall setting of our problem is unique. Our runtime manages the scheduling on both the SmartNIC and the host, with the flexibility to move actors between the two computing zones. Crucially, we want to increase the occupancy on the SmartNIC without overloading it or causing tail latency spikes, and we can shed load to the host if necessary. Furthermore, given that the offloaded tasks will likely have different cost distributions (as we saw in our characterization experiments), we desire a solution that is suitable for a broad class of tasks.

Scheduling algorithm. We propose a hybrid scheduler that: (1) combines the FCFS and DRR (deficit round robin) service disciplines; (2) migrates actors between the SmartNIC and host processors when necessary. Essentially, the scheduler takes advantage of FCFS for tasks that have low dispersion in their service times and delegates tasks with a greater variance in service time to a DRR scheduler. The scheduler uses DRR for high-variance tasks as DRR is an efficient approximation of Processor Sharing in a non-preemptible setting [238]. Further, the scheduler places as much computation as possible on the SmartNIC and migrates actors when the NIC cannot promptly handle incoming bandwidth. To assist in these transitions, the scheduler collects statistics regarding the average and tail execution latencies, actor-specific execution latencies, and queueing delays. We mainly describe the NIC-side scheduler below and then briefly describe how the host-side scheduler differs from it.

Figure 4.7: An overview of the iPipe scheduler on the SmartNIC. Cond is the condition that triggers each operation.

The scheduler works as follows. Initially, all scheduling cores start in FCFS mode, where they fetch packet requests from the shared incoming queue, dispatch requests to the target actor based on their flow information, and perform run-to-completion execution (see lines 5-6, 11-12 of ALG1). When the measured tail latency of operations on the FCFS cores is greater than tail_thresh, the scheduler downgrades the actor with the highest dispersion (a measure that we describe later) by pushing the actor into a DRR runnable queue and spawns a DRR scheduling core if necessary (lines 13-16 of ALG1). All DRR cores share one runnable queue to take advantage of execution parallelism.

We next consider the DRR cores (see ALG2). These cores scan all actors in the DRR runnable queue in a round-robin way. When the deficit counter of an actor is larger than its estimated latency, the core pops a request from the actor's mailbox and executes it. The DRR quantum value for an actor, which is added to the counter in each round, is the maximum tolerated forwarding latency for the actor's average request size (obtained from the measurements in Section 4.3). When the measured tail latency of operations performed by FCFS is less than (1 − α)tail_thresh (where α is a hysteresis factor), the actor with the lowest dispersion in the DRR runnable queue is pushed back to the FCFS group (lines 10-12 of ALG2).

Finally, when the scheduler detects that the mean request latency for FCFS jobs is larger than mean_thresh, it indicates a queue build-up at the SmartNIC, and the scheduler migrates the actor that contributes the most to the overloading to the host processor (lines 17-23 of ALG1). Similarly, when the mean request latency of the FCFS core group is lower than (1 − α)mean_thresh and there is sufficient CPU headroom in the FCFS cores, the scheduler issues a pull request to the host server to migrate the actor that will incur the least load back to the SmartNIC. We use a dedicated core of the FCFS group (core 0) for the migration tasks.

Bookkeeping execution statistics. Our runtime monitors the following statistics to assist the scheduler: (1) Request execution latency distribution across all actors: we measure µ, the execution latency of each request (including its queueing delay), using microarchitectural timestamp counters. To efficiently approximate the tail of the distribution, we also track the standard deviation of the request latency σ and use µ + 3σ as a tail latency measure. Note that this is close to the P99 measure for normal distributions. All of these estimates are updated using exponentially weighted moving averages (EWMA). (2) Per-actor execution cost and dispersion statistics: for each actor i, we track its request latency µi, the standard deviation of the latency σi, request sizes, and the request frequency. We use µi + 3σi as a measure of the dispersion of the actor's request latency. Again, we use EWMA to update these measures. (3) Per-core/per-group CPU utilization: we monitor the per-core CPU usage over the recent past and also use its EWMA to estimate current utilization. The CPU group utilization (for FCFS or DRR) is the average across all cores' CPU usage. Finally, we use measurements from our characterization study to set the thresholds mean_thresh and tail_thresh. We consider the MTU packet size at which the SmartNIC is able to sustain line rate and use the average and P99 tail latencies experienced by traffic forwarded through the SmartNIC as the corresponding thresholds (Section 4.3). This means that we provide the same level of service with offloaded computations as when we have full line-rate processing of moderately sized packets.
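The following sketch shows one way to maintain these EWMA-based estimates and the µ + 3σ tail approximation; the smoothing weight and the incremental variance form are assumptions for illustration, not the constants used inside iPipe.

/* EWMA latency bookkeeping sketch: mean and variance are smoothed online,
 * and mean + 3*stddev approximates the P99 tail as described in the text. */
#include <math.h>

#define ALPHA 0.125   /* EWMA weight (assumed value) */

struct lat_stats {
    double mean;      /* EWMA of request latency (us)          */
    double var;       /* EWMA estimate of the latency variance */
};

static void stats_update(struct lat_stats *s, double sample_us) {
    double diff = sample_us - s->mean;
    s->mean += ALPHA * diff;                                    /* update mu      */
    s->var   = (1.0 - ALPHA) * (s->var + ALPHA * diff * diff);  /* update sigma^2 */
}

static double stats_tail(const struct lat_stats *s) {
    return s->mean + 3.0 * sqrt(s->var);                        /* ~P99 estimate  */
}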

FCFS and DRR core auto-scaling. All cores start in FCFS mode. When an actor is pushed into the DRR runnable queue, the scheduler spawns a core for DRR execution.

Algorithm 1 iPipe FCFS scheduler algorithm
 1: wqe : contains packet data and metadata
 2: DRR_queue : the runnable queue for the DRR scheduler
 3: procedure FCFS_SCHED                      ▷ on each FCFS core
 4:   while true do
 5:     wqe = iPipe_nstack_recv()
 6:     actor = iPipe_dispatcher(wqe)
 7:     if actor.is_DRR then
 8:       actor.mailbox_push(wqe)
 9:       Continue
10:     end if
11:     actor.actor_exe(wqe)
12:     actor.bookeeping()                    ▷ Update execution statistics
13:     if T_tail > Tail_thresh then          ▷ Downgrade
14:       actor.is_DRR = 1
15:       DRR_queue.push(actor)
16:     end if
17:     if core_id is 0 then                  ▷ Management core
18:       if T_mean > Mean_thresh then        ▷ Migration
19:         iPipe_actor_migrate(actor_chosen)
20:       end if
21:       if T_mean < (1-α)Mean_thresh then   ▷ Migration
22:         iPipe_actor_pull()
23:       end if
24:     end if
25:   end while
26: end procedure

Algorithm 2 iPipe DRR scheduler algorithm
 1: procedure DRR_SCHED                           ▷ On each DRR core
 2:   while true do
 3:     for actor in all DRR_queue do
 4:       if actor.mailbox is not empty then
 5:         actor.update_deficit_val()
 6:         if actor.deficit > actor.exe_lat then
 7:           wqe = actor.mailbox_pop()
 8:           actor.actor_exe(wqe)
 9:           actor.bookeeping()
10:           if T_tail < (1-α)Tail_thresh then   ▷ Upgrade
11:             actor.is_DRR = 0
12:             DRR_queue.remove(actor)
13:           end if
14:         end if
15:         if actor.mailbox is empty then
16:           actor.deficit = 0
17:         end if
18:         if actor.mailbox.len > Q_thresh then  ▷ Migration
19:           iPipe_actor_migrate(actor)
20:         end if
21:       end if
22:     end for
23:   end while
24: end procedure

When all cores in the DRR group are nearly fully used (i.e., CPU_DRR ≥ 95%) and the CPU usage of the FCFS group is less than 100 × (FCFSCore# − 1)/FCFSCore# % (i.e., FCFS is able to spare one core for DRR), the scheduler migrates a core to DRR. A similar condition is used for moving a core back to the FCFS group.
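A minimal sketch of this core-lending condition, assuming utilizations are reported as fractions in [0, 1]; the function name and its inputs are illustrative, and the reverse move would use a symmetric check.

/* Returns nonzero when the FCFS group can lend one core to the DRR group. */
static int fcfs_can_lend_core(double cpu_drr, double cpu_fcfs, int fcfs_cores) {
    return cpu_drr >= 0.95 &&                                  /* DRR nearly saturated   */
           fcfs_cores > 1 &&                                   /* keep at least one core */
           cpu_fcfs < (double)(fcfs_cores - 1) / fcfs_cores;   /* FCFS can spare a core  */
}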

SmartNIC push/pull migration. We only allow the SmartNIC to initiate the migration operation since it is much more sensitive to overloading than the host processor. As described before, when there is persistent queueing and the mean response time is above a threshold, the scheduler moves an actor to the host side. We pick the actor with the highest load (i.e., average execution latency scaled by the frequency of invocation) for migration. We perform a cold migration in four phases as follows:
• Phase 1: The actor transitions into the Prepare state and removes itself from the runtime dispatcher. An actor in the DRR group is also removed from the DRR runnable queue. The actor stops receiving incoming requests, which are buffered in the iPipe runtime.
• Phase 2: The actor finishes the execution of its current tasks and changes to the Ready state. Note that an actor in the DRR group finishes executing all the requests in its mailbox.
• Phase 3: The scheduler moves the distributed objects of the actor to the host runtime, starts the host actor, and marks the NIC actor state as Gone.
• Phase 4: The scheduler forwards the buffered requests from the NIC to the host and rewrites their destination addresses. We then label the NIC actor as Clean.
When migrating an actor to the host, as shown in Figure 4.8, our runtime (1) collects all objects that belong to the actor; (2) sends the object data to the host side using messages and DMA primitives; (3) creates new objects on the host side and then inserts entries into the host-side object table; (4) deletes the related entries from the NIC-side object table upon deleting the actor. The host-side DMO works similarly, except that it uses the glibc memory allocator.

SmartNICs with no hardware traffic managers. We now consider SmartNICs that do not provide a shared queue abstraction to the NIC processor (especially off-path ones like BlueField and Stingray). There are two possible ways to overcome this limitation.

One is to apply a dedicated kernel-bypass component (such as the IOKernel module in Shenango [214]) that processes all incoming traffic and exposes a single queue abstraction to the FCFS cores. This module would run exclusively on one or several NIC cores, depending on the traffic load. Another way is to add an intermediate shuffle layer across the FCFS cores. Essentially, this shuffle queue is a single-producer, multiple-consumer one. This approach could cause load imbalances due to flow steering. Similar to ZygOS [221], one should also allow an FCFS core to steal other cores' requests when it becomes idle with no pending requests in its local queue. Note that both approaches bring performance overheads, so future on-path/off-path SmartNICs could benefit from adding a hardware traffic manager to simplify NIC-side computation scheduling.

Summary: The scheduler manages the execution of actor requests on both the SmartNIC and the host. We use a hybrid scheme that combines FCFS and DRR on both sides. With the scheme outlined above, light-weight tasks with low dispersion are executed on the SmartNIC's FCFS cores, light-weight tasks with high dispersion are executed on the SmartNIC's DRR cores, and heavy-weight tasks are migrated to the host. These decisions are made dynamically to meet the desired average and tail response times.

4.4.3 Distributed Memory Objects and Others

iPipe provides a distributed memory object (DMO) abstraction to enable flexible actor migration. Actors allocate and de-allocate DMOs as needed, and a DMO is associated with the actor that allocated it; there is no sharing of DMOs across actors. iPipe maintains an object table (Figure 4.8-a) on both sides and utilizes the local memory manager to allocate/deallocate copies. At any given time, a DMO has only one copy, either on the host or on the NIC. We also do not allow an actor to perform reads/writes on objects across the PCIe bus because remote memory accesses are 10x slower than local ones (as shown in Section 4.3). Instead, iPipe automatically moves DMOs along with the actor, and all DMO read/write/copy/move operations are performed locally.

When using DMOs to design a data structure, one has to use object IDs for indexing instead of pointers. This provides a level of indirection so that we can change the actual location of the object (say, during migration to/from the host) without impacting an actor's local state regarding DMOs.

(a) Object migration: the iPipe-host and iPipe-NIC object tables, each mapping an object ID and actor ID to a start address and size.

(b) Skiplist node implementation in DMO:

    Normal SkipList node:
    struct node {
        char key[KEY_LEN];
        char *val;
        struct node *forwards[MAX_LEVEL];
    }

    DMO SkipList node:
    struct node {
        char key[KEY_LEN];
        int val_object;
        int forward_obj_id[MAX_LEVEL];
    }

Figure 4.8: iPipe distributed memory objects.

As an example, in our replicated key-value store application (discussed later), we built the skiplist-based memtable via DMOs. As shown in Figure 4.8-b, a traditional skiplist node includes a key string, a value string, and a set of forwarding pointers. With DMOs, the key field is the same, but the value and forwarding pointers are replaced by object IDs. When traversing, one uses the object ID to get the start address of the object, casts the type, and then reads/writes its contents.

Scratchpad. Our characterization experiments (Section 4.3) have shown that the scratchpad memory provides the fastest performance but is a very limited resource. Instead of exposing it to applications and managing it, we decided to keep this memory resource internal and use it for storing the iPipe bookkeeping information.
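To illustrate how object-ID indexing replaces pointer chasing in the DMO-backed skiplist of Figure 4.8-b, the sketch below resolves each forward_obj_id through a toy object table before dereferencing; the table layout, the NIL encoding, and the lookup routine are simplified assumptions rather than iPipe's implementation.

/* DMO skiplist lookup sketch: every hop goes object ID -> local address. */
#include <string.h>

#define KEY_LEN    16
#define MAX_LEVEL  8
#define NIL_OBJ    (-1)

struct dmo_node {
    char key[KEY_LEN];
    int  val_object;                   /* object ID of the value DMO       */
    int  forward_obj_id[MAX_LEVEL];    /* object IDs instead of pointers   */
};

/* Toy object table: object ID -> current local address. This indirection
 * is what lets a migration change the address without touching the nodes. */
static void *obj_tbl[1024];
static void *obj_addr(int id) { return id == NIL_OBJ ? (void *)0 : obj_tbl[id]; }

/* Find the node with the given key at the bottom level, or NULL. */
static struct dmo_node *skiplist_find(int head_obj, const char *key) {
    struct dmo_node *n = obj_addr(head_obj);
    for (int lvl = MAX_LEVEL - 1; lvl >= 0 && n; lvl--) {
        struct dmo_node *next = obj_addr(n->forward_obj_id[lvl]);
        while (next && strncmp(next->key, key, KEY_LEN) < 0) {
            n = next;                                   /* advance on this level */
            next = obj_addr(n->forward_obj_id[lvl]);
        }
    }
    struct dmo_node *cand = n ? obj_addr(n->forward_obj_id[0]) : (struct dmo_node *)0;
    return (cand && strncmp(cand->key, key, KEY_LEN) == 0) ? cand : (struct dmo_node *)0;
}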

4.4.4 Security Isolation

iPipe allows multiple actors to execute concurrently on a SmartNIC. There are two attacks that iPipe should protect against: (1) actor state corruption, where a malicious actor manipulates other actors' states; (2) denial-of-service, where an actor hangs on a SmartNIC core and violates the service availability of other actors. We primarily describe how to protect against these attacks on the Cavium LiquidIOII, as one can apply similar techniques to other SmartNICs.

Actor state corruption. Since iPipe provides the distributed memory object abstraction to use the onboard DRAM, we rely on the processor paging mechanism to secure object accesses. The LiquidIOII CN2350/CN2360 SmartNICs employ a MIPS processor (which has a software-managed TLB) and a light-weight firmware for memory management. In this case, we use a physical partitioning approach. During the initialization phase, iPipe creates large, equal-size memory regions for each registered actor, where each region's size is (Mem_total − Mem_firmware − Mem_runtime) / Num_actor. The iPipe runtime maintains the mapping between an actor ID, its base address, and its size. During execution, an actor can only allocate/reclaim/access objects within its region. Invalid reads/writes from an actor cause a TLB miss and trap into the iPipe runtime. If the address is not in the region, access is not granted.

Denial-of-service. A malicious actor might occupy a NIC core forever (e.g., executing an infinite loop), violating actor availability. We apply a timeout mechanism to address this issue. The LiquidIOII CN2350/CN2360 SmartNICs include a hardware timer with 16 timer rings. We give each core a dedicated timer. When an actor is executed, it clears out the timer and initializes the time interval. The timeout unit traverses all timer rings and notifies the NIC core when there is a timeout event. If a NIC core receives the timeout notification, iPipe deregisters the actor, removes it from the dispatch table/runnable queue (if it is in the DRR group), and frees the actor's resources.
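The arithmetic behind this physical partitioning, and the bounds check the runtime would apply on a trapped access, can be sketched as follows; the memory layout (runtime and firmware regions placed at the bottom of DRAM) and the function names are assumptions for illustration only.

/* Per-actor partition sketch: size = (Mem_total - Mem_firmware - Mem_runtime) / Num_actor. */
#include <stdint.h>

struct actor_region { uint64_t base; uint64_t size; };

static struct actor_region actor_region_of(uint64_t mem_total, uint64_t mem_firmware,
                                           uint64_t mem_runtime, uint32_t num_actors,
                                           uint64_t dram_base, uint32_t actor_idx) {
    uint64_t chunk = (mem_total - mem_firmware - mem_runtime) / num_actors;
    struct actor_region r = {
        .base = dram_base + mem_firmware + mem_runtime + (uint64_t)actor_idx * chunk,
        .size = chunk,
    };
    return r;
}

/* Called from the TLB-miss path: grant access only inside the actor's own region. */
static int access_allowed(const struct actor_region *r, uint64_t addr, uint64_t len) {
    return addr >= r->base && addr + len <= r->base + r->size;
}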

4.4.5 Host/NIC Communication

We use a message-passing mechanism to communicate between the host and the SmartNIC. iPipe creates a set of I/O channels, and each one includes two circular buffers for sending and receiving. A buffer is unidirectional and stored in host memory. NIC cores write into the receive buffer, and a host core polls it to detect new messages. The send buffer works in reverse. We use a lazy-update mechanism to synchronize the header pointer between the host and the NIC, wherein the host notifies the SmartNIC via a dedicated message when it has processed half of the buffer. We use batched non-blocking DMA reads/writes for the implementation.

To handle the case where the DMA engine does not write the message contents in a monotonic sequence (unlike RDMA NICs), we add a 4B checksum to the message header to verify the integrity of the whole message. Table 4.3 shows the messaging API list.
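The sketch below illustrates the receive-side check that such a 4B header checksum enables: a ring slot is treated as delivered only once the checksum over the payload matches, which guards against acting on a partially written DMA message. The header layout and the additive checksum are illustrative assumptions; iPipe's actual format may differ.

/* Message framing sketch: header carries length and a 4B payload checksum. */
#include <stdint.h>

struct msg_hdr {
    uint32_t len;        /* payload length in bytes                 */
    uint32_t checksum;   /* 4B checksum over the payload, never 0   */
};

static uint32_t msg_checksum(const uint8_t *p, uint32_t len) {
    uint32_t sum = 0;
    for (uint32_t i = 0; i < len; i++)      /* simple additive checksum   */
        sum += (uint32_t)p[i] * 31u + 1u;
    return sum ? sum : 1u;                  /* reserve 0 for "empty slot" */
}

/* Receiver-side poll: returns 1 only when the whole message has landed. */
static int msg_complete(const struct msg_hdr *h, const uint8_t *payload) {
    return h->checksum != 0 && h->checksum == msg_checksum(payload, h->len);
}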

4.5 Applications Built with iPipe

We implement three distributed applications using iPipe: a replicated key-value store, a distributed transaction system, and a real-time analytics engine.

4.5.1 Replicated Key-value Store

A replicated key-value store (RKV) is a critical data center service comprising two key system components: a consensus protocol and a key-value data store. We use the traditional Multi-Paxos algorithm [183] to achieve consensus among multiple replicas. Each replica maintains an ordered log for every Paxos instance. There is a distinguished leader that receives client requests and performs consensus coordination using Paxos prepare/accept/learning messages. In the common case, consensus for a log instance can be achieved with a single round of accept messages, and the consensus value can be disseminated using an additional round (the learning phase). Each node of a replicated state machine can then execute the sequence of commands in the ordered log to implement the desired replicated service. When the leader fails, replicas run a two-phase Paxos leader election (which determines the next leader), choose the next available log instance, and learn accepted values from other replicas if their own logs have gaps. Typically, the Multi-Paxos protocol can be expressed as a sequence of messages that are generated and processed based on the state of the RSM log.

For the key-value store, we use a log-structured merge tree (LSM), which is widely used in many KV systems (such as Google's Bigtable [124], LevelDB [12], and Cassandra [8]). An LSM tree accumulates recent updates in memory, serves reads of recently updated values from the in-memory data structure, flushes the updates to disk sequentially in batches, and merges long-lived on-disk persistent data to reduce disk seek costs. There are two key system components: the memtable, a sorted data structure (i.e., a SkipList), and SSTables, collections of data items sorted by their keys and organized into a series of levels.

Each level has a size limit on its SSTables, and this limit grows at an exponential rate with the level number. Low-level SSTables are merged into high-level ones via minor/major compaction operations. Deletions are a special case of insertions wherein a deletion marker is added. Data retrieval might require multiple lookups on the memtable and the SSTables (starting at level 0 and moving to higher levels) until a matching key is found.

We implement the application with four kinds of actors: (1) the consensus actor receives application requests and triggers the Multi-Paxos logic; (2) the LSM memtable actor accumulates incoming writes/deletes and serves fast reads; (3) the LSM SSTable read actor serves SSTable read requests when requests miss in the memtable; (4) the LSM compaction actor performs minor/major compactions. The consensus actor sends a message to the LSM memtable actor during the commit phase. When requests miss in the memtable actor, they are forwarded to the SSTable read actor. Upon a minor compaction, the memtable actor migrates its memtable object to the host and issues a message to the compaction actor. Our system has multiple shards, based on the NIC DRAM capacity. The two SSTable-related actors are stationary on the host because they have to interact with persistent storage.
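As a compact illustration of how a request moves among the four actors just described, the sketch below encodes only the routing decision; the enums, flags, and the function itself are our own simplifications, not iPipe code (in the real system these transitions happen via asynchronous actor messages).

/* RKV per-request routing sketch among the four actors described above. */
enum rkv_actor { CONSENSUS, MEMTABLE, SSTABLE_READ, COMPACTION };
enum rkv_op    { OP_GET, OP_PUT, OP_DELETE };

/* Decide which actor should handle the next step of a request. */
static enum rkv_actor rkv_next_actor(enum rkv_op op, int committed,
                                     int memtable_hit, int memtable_full) {
    if (!committed)
        return CONSENSUS;                         /* run Multi-Paxos first          */
    if (op == OP_GET)
        return memtable_hit ? MEMTABLE : SSTABLE_READ;
    /* writes/deletes are absorbed by the memtable; a full memtable
     * triggers a minor compaction handled on the host */
    return memtable_full ? COMPACTION : MEMTABLE;
}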

4.5.2 Distributed Transactions

We build a distributed transaction system that uses optimistic concurrency control and two-phase commit for distributed atomic commit, following the design used by other systems [259]. Note that we choose not to add a replication layer, to avoid overlapping functionality with our replicated key-value store. The application includes a coordinator and participants that run a transaction protocol. Given a read set (R) and a write set (W), the protocol works as follows. Phase 1 (read and lock): the coordinator reads the values of the keys in R and locks the keys in W. If any key in R or W is already locked, the coordinator aborts the transaction and replies with a failure status. Phase 2 (validation): after locking the write set, the coordinator checks the versions of the keys in its read set by issuing a second read. If any key is locked or its version has changed after the first phase, the coordinator aborts the transaction.

Phase 3 (log): the coordinator logs the key/value/version information into its coordinator log and then sends a reply to the client with the result. Phase 4 (commit): the coordinator sends commit messages to the nodes that store the W set. After receiving this message, a participant updates the key/value/version and unlocks the key.

In iPipe, we implement the coordinator and participants as actors running on the NIC. The key storage abstractions required to implement the protocol are the coordinator log [132] and the data store, which we realize using a traditional extensible hashtable [95]. Both of these are realized using distributed shared objects. We also cache responses from outstanding transactions. There is also a logging actor pinned to the host since it requires persistent storage access. When the coordinator log reaches a storage limit, the coordinator migrates its log object to the host side and sends a checkpointing message to the logging actor.
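The coordinator's control flow for this protocol can be summarized as a small phase machine; the sketch below is a simplified illustration of the four phases above (the enum names and the advance function are ours, and replies and messages to participants are elided).

/* Simplified coordinator phase progression for OCC + two-phase commit. */
enum txn_phase { READ_LOCK, VALIDATE, LOG, COMMIT, ABORT, DONE };

static enum txn_phase txn_advance(enum txn_phase p,
                                  int any_locked,        /* a key in R/W was locked       */
                                  int version_changed)   /* a read-set version has moved  */
{
    switch (p) {
    case READ_LOCK: return any_locked ? ABORT : VALIDATE;          /* Phase 1 */
    case VALIDATE:  return (any_locked || version_changed)
                           ? ABORT : LOG;                           /* Phase 2 */
    case LOG:       return COMMIT;    /* Phase 3: write coordinator log, reply */
    case COMMIT:    return DONE;      /* Phase 4: participants apply + unlock  */
    default:        return DONE;      /* ABORT and DONE are terminal           */
    }
}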

4.5.3 Real-time Analytics

Data processing pipelines use a real-time analytics engine to gain instantaneous insights into vast and frequently changing datasets. We acquired the implementation of FlexStorm [174] and extended its functionality. All data tuples pass through three workers: a filter, a counter, and a ranker. The filter applies a pattern-matching module to discard uninteresting data tuples. The counter uses a sliding window and periodically emits a tuple to the ranker. Ranking workers sort incoming tuples based on count and then emit the top-n data to an aggregated ranker. Each worker uses a topology mapping table to determine the next worker to which the result should be forwarded.

To implement this application in iPipe, we build the three workers as actors. The filter actor is stateless. The counter uses the software-managed cache for statistics. The ranker is implemented using a distributed shared object, and we consolidate all top-n data tuples into one object. Among them, the ranker performs quicksort to order tuples, which could impact the NIC's ability to receive new data tuples when the network load is high. In such cases, iPipe will migrate the actor to the host side.

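As an illustration of the counter worker's sliding-window behavior, the sketch below keeps per-key counts for the current window and tells the caller when the window has rolled over so the snapshot can be emitted to the ranker actor; the window length, key space, and hashing are arbitrary assumptions.

/* Sliding-window counter sketch for the counter actor described above. */
#include <stdint.h>
#include <string.h>

#define WINDOW_US 1000000ULL    /* 1-second window (assumed) */
#define NKEYS     1024          /* toy key space (assumed)   */

struct counter_state {
    uint64_t window_start_us;
    uint32_t counts[NKEYS];     /* per-key counts for the current window */
};

/* Returns 1 when the current window has expired; the caller should then
 * emit counts[] to the ranker actor and call counter_roll(). */
static int counter_window_expired(const struct counter_state *c, uint64_t now_us) {
    return now_us - c->window_start_us >= WINDOW_US;
}

static void counter_roll(struct counter_state *c, uint64_t now_us) {
    memset(c->counts, 0, sizeof(c->counts));   /* start the next window    */
    c->window_start_us = now_us;
}

static void counter_add(struct counter_state *c, uint32_t key_hash) {
    c->counts[key_hash % NKEYS]++;             /* count the incoming tuple */
}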

4.6 Evaluation

Our evaluations aim to answer the following questions:
• What are the host CPU core savings when offloading computations using iPipe? (§4.6.2)
• What are the latency savings with iPipe? (§4.6.3)
• How effective is the iPipe actor scheduler? (§4.6.4)
• What is the iPipe migration performance? (§4.6.5)
• How effective is a SmartNIC when used as a smart dispatcher? (§4.6.6)
• When compared with another SmartNIC programming system (i.e., Floem [218]), what are the design trade-offs in terms of performance and programmability? (§4.6.7)
• Can we use iPipe to build other applications (e.g., network functions)? How do they perform? (§4.6.8)

4.6.1 Experimental Methodology

We use the same testbed as in our characterization experiments (Section 4.3.1). For evaluating our application case studies, we mainly use the LiquidIOII CN2350/CN2360 (10/25 GbE) cards, as we had a sufficient number of them to build a small distributed testbed. We built iPipe into the LiquidIOII firmware using the Cavium Development Kit [80]. On the host side, we use pthreads for iPipe execution and allocate 1GB of pinned hugepages for the message ring. Each runtime thread periodically polls requests from the channel and performs actor execution. It transitions to idle C-states when there is no more work. The iPipe runtime spreads across the NIC firmware and the host system with 10683 LOCs and 4497 LOCs, respectively. To show the effectiveness of the actor scheduler, we also present results for the Stingray card.

Programmers use the C language to build applications (which are compiled with the SmartNIC/host GNU toolchains). Our three workloads built with iPipe, real-time analytics (RTA), distributed transactions (DT), and a replicated key-value store (RKV), have 1583, 2225, and 2133 LOCs, respectively, and we compare them with similar implementations that use DPDK.

(a) 10GbE w/ LiquidIOII CN2350. (b) 25GbE w/ LiquidIOII CN2360.

Figure 4.9: Host CPU cores used, compared between DPDK and iPipe, for three different applications with varying packet sizes on a 10GbE/25GbE network.

Our workload generator is implemented using DPDK and invokes operations in a closed-loop manner. For RTA, we generate requests based on Twitter tweets [187]. The number of data tuples in each request varies based on the packet size. For DT, each request is a multi-key read-write transaction including 2 reads and 1 write (as used in previous work [170]). For RKV, we generate the key-value pair in each packet with the following characteristics: 16B keys, 95% reads/5% writes, a zipf distribution (skewness=0.99), and 1 million keys (as used in previous work [213, 193]). For both DT and RKV, the value size increases with the packet size. We deploy each of the applications on three servers, equipped with SmartNICs in the case of iPipe and normal Intel NICs in the case of DPDK. The RTA application runs an RTA worker on each server, the DT application runs coordinator logic on one server and participant logic on two servers, and the RKV application involves a leader node and two follower nodes.

4.6.2 Host Core Savings

We find that we can achieve significant host core savings by offloading computations to the SmartNIC. Figure 4.9 reports the average host server CPU usage of the three applications when achieving the maximum throughput for different packet sizes under 10/25GbE networks. First, when the packet size is small (i.e., 64B), iPipe uses all NIC cores for packet forwarding, leaving no room for actor execution.

(a) RTA. (b) DT. (c) RKV.

Figure 4.10: Latency versus per-core throughput for three applications under 10GbE, compared between DPDK and iPipe. Packet size is 512B.

In this case, one does not save host CPU cores. Second, the host CPU usage reduction is related to both packet size and bandwidth. Higher link bandwidth and smaller packet sizes bring more packet-level parallelism. When the SmartNIC is able to absorb enough requests for execution, one can reduce host CPU load significantly. For example, applications built on iPipe save 3.1, 2.6, and 2.5 host cores for the 256B/512B/1KB cases, on average across the three applications, using the 25GbE CN2360 cards. Such savings are marginally reduced with the 10GbE CN2350 cards (i.e., 2.2, 1.8, and 1.8 core savings). Among these three applications, the DT participant saves the most since it is able to run all its actors on the SmartNIC, followed by the DT coordinator, RTA worker, RKV follower, and RKV leader.

4.6.3 Latency versus Throughput

We next examine the latency reduction and per-core throughput increase provided by iPipe and find that SmartNIC offloading provides considerable benefits. Figures 4.10 and 4.11 report the results comparing the DPDK and iPipe versions of the applications, when we configure the system to achieve the highest possible throughput with the minimal number of cores. When calculating the per-core throughput of the three applications, we use the CPU usage of the RTA worker, DT coordinator, and RKV leader to account for fractional core usage. First, under 10GbE SmartNICs, applications (RTA, DT, and RKV) built with iPipe outperform the DPDK ones by 2.3X, 4.3X, and 4.2X, respectively, as iPipe allows applications to use fewer host CPU cores.

(a) RTA. (b) DT. (c) RKV.

Figure 4.11: Latency versus per-core throughput for three applications under 25GbE, compared between DPDK and iPipe. Packet size is 512B.

The benefits diminish slightly under the 25GbE setup (with 2.2X, 2.9X, and 2.2X improvements) since actors running on the host CPU receive more requests and require more CPU power.

Second, at low to medium request rates, NIC-side offloading reduces request execution latency by 5.7µs, 23.0µs, and 8.7µs for 10GbE and by 5.4µs, 28.0µs, and 12.5µs for 25GbE, respectively. Even though the SmartNIC has only a wimpy processor, the iPipe scheduler keeps the light-weight fast-path tasks on the NIC and moves the heavy-weight slow ones to the host. As a result, PCIe transaction savings, fast networking primitives, and hardware-accelerated buffer management help reduce the fast-path execution latency. DT benefits the most as both the coordinator and the participants mainly run on the SmartNIC processor and the host CPU is only involved in the logging activity.

P99 tail latency. We measured the tail latency (P99) when achieving 90% of the maximum throughput for the two link speeds. For the three applications, iPipe reduces tail latency by 7.3µs, 11.6µs, and 7.5µs for 10GbE and by 3.4µs, 10.9µs, and 12.8µs for 25GbE. This is not only due to fast packet processing (discussed above), but also because iPipe's NIC-side runtime guarantees that there is no significant queue build-up.

(a) Low dispersion on 10GbE LiquidIOII. (b) High dispersion on 10GbE LiquidIOII.
(c) Low dispersion on 25GbE Stingray. (d) High dispersion on 25GbE Stingray.

Figure 4.12: P99 tail latency as the networking load varies, for low- and high-dispersion request distributions on the 10GbE LiquidIOII CN2350 and 25GbE Stingray cards.

4.6.4 iPipe Actor Scheduler

We evaluate the effectiveness of iPipe's scheduler, comparing it with standalone FCFS and DRR schedulers under two different request cost distributions: one is exponential with low dispersion; the other is bimodal-2 with high dispersion. We choose two SmartNICs (i.e., the 10GbE LiquidIOII CN2350 and the 25GbE Stingray), representing the cases where the actor scheduler runs as firmware hardware threads and as OS pthreads, respectively. The workload generator is built using packet traces obtained from our three real-world applications and issues requests following a Poisson process. We measure the latency from the client side. The mean service times of the exponential distribution on the two SmartNICs (i.e., LiquidIOII and Stingray) are 32µs and 27µs, while b1/b2 of the bimodal-2 distribution are 35µs/60µs and 25µs/55µs. Figure 4.12 shows the P99 tail latency as we increase the network load for the four different cases.

Figure 4.13: Migration elapsed time (ms) breakdown across the four phases for 8 actors from the three applications, evaluated with 10GbE CN2350 cards.

Figure 4.14: Throughput (Mop/s) comparison varying packet size (on 10GbE) among the SmartNIC, dedicated, and shared (w/ RSS) dispatchers.

For the low-dispersion distribution, iPipe's scheduler behaves similarly to FCFS but outperforms DRR. Under 0.9 networking load, iPipe reduces DRR's tail latency by 9.6% and 21.7% on the LiquidIOII and Stingray, respectively. For the high-dispersion distribution, the iPipe scheduler tolerates the variation in request execution time and serves short tasks promptly, outperforming the other two. For example, when the networking load is 0.9, iPipe reduces tail latency by 68.7% (61.4%) and 10.9% (12.9%) relative to the FCFS and DRR cases on the LiquidIOII (Stingray).

4.6.5 iPipe Migration

We estimate the migration cost (SmartNIC-pushed) by breaking down the elapsed time of its four phases. We choose 8 actors from the three applications. Our experiments are conducted under 90% networking load, and we force the actor migration after the warm-up (5s). Figure 4.13 presents our results. First, phase 3 dominates the migration cost (67.8% on average across the 8 actors) since it requires moving the distributed objects to the host side. For example, the LSM memtable actor has an approximately 32MB object and consumes 35.8ms. Phase 4 ranks second (27.2%) as it pushes buffered requests to the host; its cost also varies with the networking load. Phases 1 and 2 are light-weight because they only introduce the iPipe runtime locking/unlocking and state manipulation overheads.

4.6.6 SmartNIC as a Dispatcher

SmartNICs allow application-specific flow steering into multiple transmit/receive NIC queues. To demonstrate this, we use a sharded key-value store (which uses the Memtable implementation of RKV) along with our workload generator and compare three different dispatcher designs. Specifically, we configure the system with 4 host CPU cores (each running one shard), 4 NIC queues, and the 10GbE LiquidIOII CN2350. We compare three scenarios: (1) SmartNIC dispatcher, where NIC cores obtain the shard information from the request's application header and push requests into the appropriate queue; (2) dedicated dispatcher, where one dedicated host core polls incoming requests from all NIC queues and pushes them to the other three cores appropriately (the keyspace is divided into three shards in this case); and (3) shared dispatcher, where the dispatcher runs on all 4 cores and is co-located with the key-value store, with RSS enabled. Note that the dedicated and shared dispatchers use a lockless FIFO command queue to buffer requests from other cores.

Figure 4.14 reports the measured throughput for different packet sizes. On average, the SmartNIC dispatcher outperforms the dedicated and shared dispatchers by 32.9% and 25.8%, respectively. This is because SmartNIC steering eliminates the inter-core request transfer and command queue operation overheads.
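A minimal sketch of the SmartNIC-dispatcher steering step is shown below: a NIC core reads a shard identifier from the request's application header and pushes the request onto the NIC queue polled by the owning host core. The header layout, the header offset, and nic_queue_push are illustrative stand-ins for the firmware's real descriptor format and queue primitives.

/* Sketch of shard-aware request steering on the SmartNIC dispatcher. The
 * application header layout and nic_queue_push() are hypothetical stand-ins
 * for the LiquidIO firmware's descriptor format and queue routines. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NUM_SHARDS 4               /* one shard per host core in this experiment */

struct app_hdr {                   /* assumed application header layout */
    uint32_t shard_id;
    uint32_t req_len;
};

static int nic_queue_push(unsigned q, const void *pkt, size_t len) {
    printf("queue %u <- %zu-byte request\n", q, len);   /* stub for the SDK call */
    return 0;
}

static int dispatch_request(const uint8_t *pkt, size_t len, size_t hdr_off) {
    if (len < hdr_off + sizeof(struct app_hdr))
        return -1;                                      /* malformed request */
    struct app_hdr hdr;
    memcpy(&hdr, pkt + hdr_off, sizeof(hdr));
    /* Steer straight to the owning shard's queue: no host core has to act as
     * a software dispatcher, avoiding inter-core transfers. */
    return nic_queue_push(hdr.shard_id % NUM_SHARDS, pkt, len);
}

int main(void) {
    uint8_t req[64] = {0};
    struct app_hdr h = { .shard_id = 7, .req_len = sizeof(req) };
    memcpy(req + 42, &h, sizeof(h));                    /* 42 = assumed offset */
    return dispatch_request(req, sizeof(req), 42);
}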

4.6.7 Comparison with Floem

Floem [218] is a programming system aimed at easing the programming effort for SmartNIC offloading. It applies a data-flow language to express packet processing and proposes a couple of programming abstractions, such as logical queues and per-packet state. iPipe has similar designs (like message rings and packet metadata). However, the key difference is that Floem's language runtime does not use the SmartNIC's computing power efficiently. First, the offloaded elements (computations) in Floem are stationary, regardless of the incoming traffic. However, we have shown (Section 4.3) that when the packet size is small and the networking load is high, such multi-core SoC SmartNICs have no room for application computation; in iPipe, we migrate the computation to the host side in such cases. Second, the common computation elements of Floem mainly comprise simple tasks (like hashing, steering, or bypassing); complex ones, even though they can be expressed, are performed on the host side. In iPipe, we have shown that complex operations can also be offloaded, and our runtime dynamically schedules them in the right place.

We take the real-time analytics (RTA) workload and compare its Floem and iPipe implementations. With the same experimental setup, Floem-RTA achieves at most 1.6Gbps/core, while iPipe-RTA achieves 2.9Gbps. As described above, this is because iPipe can offload the entire actor computation, while Floem utilizes a NIC-side bypass queue to mitigate the multiplexing overhead. For the small packet size case (i.e., 64B), iPipe-RTA delivers 0.6Gbps/core, outperforming Floem by 88.3%, since iPipe moves all the actors to the host side and uses all NIC cores for packet forwarding, while Floem still uses the NIC side for offloading. In sum, we believe iPipe can be an efficient backend for Floem.

4.6.8 Network Functions on iPipe

The focus of iPipe is to accelerate distributed applications with significant complexity in program logic and maintained state. For network functions with easily expressed state (or even stateless ones) that have sufficient parallelism, FPGA-based SmartNICs are an appropriate fit. We now consider how well iPipe running on multi-core SmartNICs can approximate FPGA-based SmartNICs for such workloads. We built two network functions with iPipe (i.e., a firewall and an IPsec gateway) and evaluated them on the 10/25GbE LiquidIOII cards. For the firewall, we use a software-based TCAM implementation matching wildcard rules. With 8K rules and 1KB packets, the average packet processing latency ranges from 3.65µs to 19.41µs as we increase the networking load, whereas an FPGA-based solution achieves 1.23∼1.6µs. For the IPsec gateway, we take advantage of the crypto engines to accelerate packet processing. For 1KB packets, iPipe achieves 8.6Gbps and 22.9Gbps bandwidth on the 10/25GbE SmartNIC cards, respectively. These results are comparable to the ClickNP ones (i.e., 37.8Gbps under a 40GbE link). In other words, if one can use the accelerators on a multi-core SoC SmartNIC, one can achieve performance comparable to FPGA-based designs for network functions.
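As an illustration of what the software TCAM does, the sketch below performs first-match classification over (value, mask) wildcard rules with a linear scan; the rule structure, the restriction to one IP and one port field, and the default-drop policy are simplifications for illustration, not the evaluated implementation.

/* Sketch of wildcard (value/mask) rule matching, the core of a software
 * TCAM-style firewall. Linear first-match scan; the evaluated implementation
 * covers full packet 5-tuples, rule priorities, and many more rules. */
#include <stdint.h>
#include <stdio.h>

struct fw_rule {
    uint32_t ip_value, ip_mask;       /* match if (ip & mask) == value */
    uint16_t port_value, port_mask;
    int      action;                  /* 0 = drop, 1 = accept */
};

static int fw_classify(const struct fw_rule *rules, int n,
                       uint32_t ip, uint16_t port) {
    for (int i = 0; i < n; i++) {
        if ((ip & rules[i].ip_mask) == rules[i].ip_value &&
            (uint16_t)(port & rules[i].port_mask) == rules[i].port_value)
            return rules[i].action;   /* first matching rule wins */
    }
    return 0;                         /* default drop */
}

int main(void) {
    /* Example rule: accept anything in 10.0.0.0/8 on any port. */
    struct fw_rule rules[] = { { 0x0A000000u, 0xFF000000u, 0, 0, 1 } };
    printf("action=%d\n", fw_classify(rules, 1, 0x0A0102A0u, 8080));
    return 0;
}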

4.7 Related Work

4.7.1 SmartNIC Acceleration

In addition to Floem [218], ClickNP [189] is another framework that uses FPGA-based SmartNICs for network functions. It uses the Click [179] data-flow programming model and statically allocates a data-flow graph during configuration, whereas iPipe is able to move computations based on runtime workload (e.g., request execution latency, incoming traffic). A few other studies use SmartNICs for application acceleration. KV-Direct [188] is an in-memory key-value store that runs key-value operations on the FPGA and uses host memory as a storage pool. HotCocoa [108] proposes a set of hardware abstractions to offload an entire congestion control algorithm to a SmartNIC.

4.7.2 In-network Computations

Recent RMT switches [117] and SmartNICs enable programmability along the packet data plane. Researchers have proposed in-network computation, where compute operations are offloaded from end-hosts into these network devices in order to reduce data center traffic and improve application performance. For example, IncBricks [248] is an in-network caching fabric with basic computing primitives. NetCache [167] is another in-network caching design, which uses a packet-processing pipeline on a Barefoot Tofino switch to detect, index, store, invalidate, and serve key-value items. DAIET [230] conducts data aggregation (for MapReduce and TensorFlow) along the network path using programmable switches.

4.7.3 RDMA-based Data center Applications

Recent years have seen growing use of RDMA in data center environments due to its low-latency, high-bandwidth, and low CPU utilization benefits. These applications include key-value store systems [207, 138, 169], DSM systems [210, 138], and database and transactional systems [125, 139, 254]. Generally, RDMA provides fast data access capabilities but limited opportunities to reduce the host CPU computing load. While one-sided RDMA operations allow applications to bypass remote server CPUs, they are rarely used in general distributed systems given the narrow set of remote memory access primitives associated with them. In contrast, iPipe provides a framework to offload simple but general computations onto SmartNICs. It does, however, borrow some techniques from related RDMA projects (e.g., lazy updates for the send/receive rings in FaRM [138]).

4.8 Summary and Learned Lessons

This chapter presents a framework for offloading distributed applications onto SmartNICs. We first conduct a detailed performance characterization across different commodity SmartNICs and then build the iPipe system based on the experimental observations. We then develop three applications using iPipe and prototype them on these SmartNICs. Our evaluations show that by offloading computation onto a SmartNIC, one can achieve considerable host CPU and latency savings.

We learned three lessons by building iPipe. First, the design of the iPipe framework is driven by our experimental observations; when using a new kind of SmartNIC, one should perform an empirical analysis first. That said, our characterization methodology (which views the SmartNIC from four angles: traffic control, computing units, onboard memory, and host communication) can be applied to most SmartNICs. Second, there are different structural designs (on-path vs. off-path) as well as interesting innovations in hardware acceleration, which impact the way we offload. Finally, moving computations to a SmartNIC has to be done with care given the small computing headroom, sensitivity to traffic workloads, and the need to provide isolation.

5 E3: Running Microservices on the SmartNIC

In this chapter, we investigate the use of SmartNIC-accelerated servers to execute microservice-based applications in the data center. By offloading suitable microservices to the SmartNIC's low-power processor, we can improve server energy efficiency without latency loss. However, as a heterogeneous computing substrate in the data path of the host, SmartNICs bring several challenges to a microservice platform: network traffic routing and load balancing, microservice placement on heterogeneous hardware, and contention on shared SmartNIC resources. We present the third PNF-enabled system, E3, a microservice execution platform for SmartNIC-accelerated servers. E3 follows the design philosophies of the Azure Service Fabric microservice platform and extends key system components to a SmartNIC to address the above-mentioned challenges. E3 employs three key techniques: ECMP-based load balancing via SmartNICs to the host, network topology-aware microservice placement, and a data-plane orchestrator that can detect SmartNIC overload. Our E3 prototype using Cavium LiquidIO SmartNICs shows that SmartNIC offload can achieve substantial energy-efficiency improvements at a small latency cost.

The remainder of this chapter is organized as follows. Sections 5.1 and 5.2 provide the background and solution overview of E3. Section 5.3 then discusses the benefits and challenges of running microservices on SmartNICs. Sections 5.4 and 5.5 describe the design and implementation of the E3 microservice execution platform. Section 5.6 reports our evaluation results regarding latency, the effectiveness of the proposed techniques, and energy/cost efficiency. We review related studies in Section 5.7 and conclude this chapter in Section 5.8.

5.1 Background

Energy efficiency has become a major factor in data center design [252]. U.S. data centers consume an estimated 70 billion kilowatt-hours of energy per year (about 2% of total U.S. energy consumption), and as much as 57% of this energy is used by servers [6, 142]. Improving server energy efficiency is thus imperative [122]. A recent option is the integration of low-power processors in server network interface cards (NICs). Table 2.1 lists some examples, such as the Netronome Agilio-CX [76], Mellanox BlueField [67, 68], Broadcom Stingray [42], and Cavium LiquidIO [43], which rely on ARM/MIPS-based processors and on-board memory. These SmartNICs can process microsecond-scale client requests but consume much less energy than server CPUs. By sharing idle power and the chassis with host servers, SmartNICs also promise to be more energy- and cost-efficient than other heterogeneous or low-power clusters. However, SmartNICs are not powerful enough to run large, monolithic cloud applications, preventing their offload.

Today, cloud applications are increasingly built as microservices, prompting us to revisit SmartNIC offload in the cloud. A microservice-based workload comprises loosely coupled processes whose interaction is described via a data-flow graph. Microservices often have a small enough memory footprint for SmartNIC offload, and their programming model efficiently supports transparent execution on heterogeneous platforms. Microservices are deployed via a microservice platform [168, 13, 15, 20] on shared data center infrastructure. These platforms abstract and allocate physical data center computing nodes, provide a reliable and available execution environment, and interact with deployed microservices through a set of common runtime APIs. Large-scale web services already use microservices on hundreds of thousands of servers [172, 168].

In this chapter, we describe how to run such microservice-based applications on SmartNICs to improve system energy efficiency. Essentially, we show how to integrate multiple SmartNICs per server into a microservice platform with the goal of achieving better energy efficiency at minimum latency cost.

5.2 Solution Overview

It is non-trivial to transparently integrate SmartNICs into microservice platforms. Unlike traditional heterogeneous clusters, SmartNICs are collocated with their host servers, raising a number of issues. First, SmartNICs and hosts share the same MAC address; we require an efficient mechanism to route and load-balance traffic to hosts and SmartNICs. Second, SmartNICs sit in the host's data path, and microservices running on a SmartNIC can interfere with microservices on the host; microservices need to be appropriately placed to balance network-to-compute bandwidth. Finally, microservices can contend on shared SmartNIC resources, causing overload; we need to efficiently detect and prevent such situations.

We build E3, a microservice execution platform for SmartNIC-accelerated servers that addresses these issues. E3 follows the design philosophies of the Azure Service Fabric microservice platform [168] and extends key system components to allow transparent offload of microservices to a SmartNIC. To balance network request traffic among SmartNICs and the host, E3 employs equal-cost multipath (ECMP) load balancing at the top-of-rack (ToR) switch and provides high-performance PCIe communication mechanisms between the host and SmartNICs. To balance computation demands, we introduce HCM, a hierarchical, communication-aware microservice placement algorithm, combined with a data-plane orchestrator that can detect and eliminate SmartNIC overload via microservice migration. This allows E3 to optimize server energy efficiency with minimal impact on client request latency. We prototype E3 within a cluster of Xeon-based servers with up to 4 Cavium LiquidIO-based SmartNICs per server. We evaluate energy and cost efficiency, as well as client-observed request latency and throughput, for common microservices, such as a real-time analytics framework, an IoT hub, and various virtual network functions, across various homogeneous and heterogeneous cluster configurations. Our results show that offloading microservices to multiple SmartNICs per server with E3 improves cluster energy efficiency up to 3× and cost efficiency up to 1.9×, at up to 4% client-observed latency cost, versus all other cluster configurations.

5.3 Benefits and Challenges of Microservices Offloading

Microservices simplify distributed application development and are a good match for low-power SmartNIC offload. Together, they are a promising avenue for improving server energy efficiency. We discuss this rationale, quantify the potential benefits, and outline the challenges of microservice offload to SmartNICs in this section.

5.3.1 Microservices

Microservices have become a critical component of today's data center infrastructure with a considerable and diverse workload footprint. Microsoft reports running microservices 24/7 on over 160K machines across the globe, including Azure SQL DB, Skype, Cortana, and the IoT suite [168]. Google reports that Google Search, Ads, Gmail, video processing, flight search, and more are deployed as microservices [172]. These microservices span large and small data and code footprints, long and short running times, and are billed by run-time or by remote procedure call (RPC) [149].

What unifies these services is their software engineering philosophy. Microservices use a modular design pattern, which simplifies distributed application design and deployment. Microservices are loosely coupled, communicate through a set of common APIs invoked via RPCs [88], and maintain state via reliable collections [168]. As a result, developers can take advantage of languages and libraries of their choice, while not having to worry about microservice placement, communication mechanisms, fault tolerance, or availability. Microservices are also attractive to data center operators as they provide a way to improve server utilization. Microservices execute as light-weight processes that are easier to scale and migrate compared with a monolithic development approach. They can be activated upon incoming client requests, execute to request completion, and then be swapped out.

A microservice platform, such as Azure Service Fabric [168], Amazon Lambda [13], Google Application Engine [15], or Nirmata [20], is a distributed system manager that enables isolated microservice execution on shared data center infrastructure. To do so, microservice platforms include the following components (cf. [168]):

Figure 5.1: Thermostat analytics as a DAG of microservices, with an API gateway (authentication) feeding a sharded SQL store (sensor logging) that triggers Spike, EMA, and Recommend data analytics; the microservice platform (Service Fabric, E3, ...) maps each DAG node to a physical computing node.

1. federation subsystem, abstracting and grouping servers into a unified cluster that holds deployed applications; 2. resource manager, allocating computation resources to individual microservices based on their execution requirements; 3. orchestrator, dynamically scheduling and migrating microservices within the cluster based on node health information, microservice execution statistics, and service-level agreements (SLAs); 4. transport subsystem, providing (secure) point-to-point communication among various microservices; 5. failover manager, guaranteeing high availability/reliability through replication; 6. troubleshooting utilities, which assist developers with performance profiling/debugging and understanding microservice co-execution interference.

A microservice platform usually provides a number of programming models [86] that developers adhere to, such as data-flow and actor-based models. The models capture the execution requirements and describe the communication relationships among microservices. For example, the data-flow model (e.g., Amazon Datapipe [29], Google Cloudflow [55], Azure Data Factory [72]) requires programmers to assemble microservices into a directed acyclic graph (DAG): nodes contain microservices that are interconnected via flow-controlled, lossless data-flow channels.

MS.     |          Host (Linux)           |             Host (DPDK)               |           SmartNIC
        | RPS    W     C  L    99%   RPJ  | RPS    W     C  L    99%   RPJ   %    | RPS    W    L    99%   RPJ    ×
IPsec   | 0.8M   117.0 12 1.8  6.6   7.0K | 0.9M   112.1 12 1.7  5.2   8.1K  15.9 | 1.9M   23.4 0.2  0.8   79.0K  9.7
BM25    | 91.9K  116.4 12 40.3 205.8 0.8K | 99.5K  110.0 12 30.7 155.6 0.9K  14.5 | 0.4M   19.2 4.1  12.4  20.6K  22.8
NIDS    | 1.8M   111.0 12 0.06 0.2   16.1K| 1.8M   106.8 12 0.05 0.15  17.2K 7.4  | 2.0M   23.4 0.03 0.1   84.8K  4.9
Recom.  | 3.6K   109.4 12 86.6 477.0 0.03K| 4.1K   111.7 12 78.7 358.6 0.04K 11.6 | 12.8K  18.9 21.3 123.6 0.7K   18.4
NATv4   | 1.9M   72.1  8  0.04 0.1   26.2K| 1.9M   52.1  4  0.04 0.1   36.8K 40.4 | 2.0M   23.6 0.03 0.09  86.9K  2.4
Count   | 2.0M   68.1  6  0.07 0.1   28.8K| 2.0M   48.6  4  0.03 0.1   40.3K 40.0 | 2.0M   21.0 0.03 0.09  96.1K  2.4
EMA     | 2.0M   72.7  8  0.04 0.2   27.0K| 2.0M   52.1  4  0.03 0.09  38.6K 42.8 | 2.1M   22.0 0.03 0.08  93.5K  2.4
KVS     | 1.9M   48.6  8  0.04 0.1   40.0K| 2.0M   33.6  2  0.04 0.1   59.6K 49.0 | 2.0M   21.6 0.03 0.1   97.1K  1.6
F.M.    | 1.9M   70.9  8  0.04 0.1   27.4K| 2.0M   49.8  4  0.03 0.09  40.4K 47.4 | 2.0M   24.3 0.03 0.08  83.6K  2.1
DDoS    | 2.0M   111.2 12 0.05 0.2   17.9K| 1.8M   105.7 12 0.05 0.2   17.4K -3.0 | 2.0M   24.3 0.03 0.1   80.4K  4.6
KNN     | 42.2K  118.3 12 53.7 163.4 0.4K | 42.4K  110.4 12 45.8 161.3 0.4K  7.5  | 29.9K  20.0 20.6 80.3  1.5K   3.9
Spike   | 91.9K  112.5 12 29.3 94.5  0.8K | 0.1M   112.3 12 25.7 83.0  0.9K  13.7 | 73.8K  23.5 9.0  50.3  3.1K   3.4
Bayes   | 12.1K  113.9 12 82.0 406.5 0.1K | 13.7K  112.0 12 80.6 400.5 0.1K  14.8 | 1.6K   19.5 41.9 164.7 0.08K  0.7
GW      | 1.5M   108.5 12 0.9  3.2   14.2K| 1.6M   110.6 12 0.8  2.7   14.3K 1.1  | 0.1M   24.7 8.5  403.6 5.0K   0.4
TopR    | 0.7M   119.7 12 4.0  15.0  5.9K | 0.8M   109.2 12 3.5  12.3  7.1K  18.9 | 14.8K  20.3 31.1 154.9 0.7K   0.1
SQL     | 0.5M   114.7 12 6.9  31.1  4.0K | 0.5M   113.0 12 6.7  29.5  4.7K  15.7 | 39.5K  18.8 29.5 104.2 2.1K   0.4

Table 5.1: Microservice comparison among host (Linux and DPDK) and SmartNIC. RPS = Throughput (requests/s), W = Active power (W), C = Number of active cores, L = Average latency (ms), 99% = 99th percentile latency (ms), RPJ = Energy efficiency (requests/Joule). % = energy-efficiency improvement of DPDK over Linux; × = energy-efficiency improvement of the SmartNIC over the DPDK host.

These models bring attractive benefits for a heterogeneous platform since they explicitly express concurrency and communication, enabling the platform to transparently map them to the available hardware [226, 227]. Figure 5.1 shows an IoT thermostat analytics application [57] consisting of microservices arranged in 3 stages: 1. thermostat sensor updates are authenticated by the API gateway; 2. updates are logged into a SQL store sharded by a thermostat identifier; 3. SQL store updates trigger data analytic tasks (e.g., spike detection, moving average, and recommendation) based on thresholds. The data-flow programming model allows the SQL store sharding factor to be dynamically adjusted to scale the application with the number of thermostats reporting. Reliable collections ensure state consistency when re-sharding, and the microservice platform automatically migrates and deploys DAG nodes to available hardware resources.
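To make the data-flow model concrete, the sketch below encodes the thermostat DAG of Figure 5.1 as an array of microservice nodes and directed channels; the structures and the sharding factor are illustrative and do not reflect E3's internal graph representation.

/* Sketch of a microservice DAG in the data-flow model: nodes are microservice
 * instances and edges are lossless channels in the direction of RPC flow,
 * mirroring the thermostat application of Figure 5.1. Illustrative only. */
#include <stdio.h>

enum ms_type { API_GW, SQL_STORE, SPIKE, EMA, RECOMMEND };

struct ms_node {
    enum ms_type type;
    int shards;                /* sharding factor, adjustable at runtime */
    int stateful;              /* state kept in reliable collections */
};

struct channel { int src, dst; };   /* indices into the node array */

int main(void) {
    struct ms_node nodes[] = {
        { API_GW,    1, 1 },   /* stage 1: authenticate sensor updates */
        { SQL_STORE, 4, 1 },   /* stage 2: log updates, sharded by thermostat id */
        { SPIKE,     1, 1 },   /* stage 3: threshold-triggered analytics */
        { EMA,       1, 1 },
        { RECOMMEND, 1, 1 },
    };
    struct channel edges[] = { {0, 1}, {1, 2}, {1, 3}, {1, 4} };
    printf("%zu nodes, %zu channels\n",
           sizeof(nodes) / sizeof(nodes[0]), sizeof(edges) / sizeof(edges[0]));
    return 0;
}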

A microservice can be stateful or stateless. Stateless microservices have no persistent storage and only keep state within the request context. They are easy to scale, migrate, and replicate, and they usually rely on other microservices for stateful tasks (e.g., a database engine). Stateful microservices use platform APIs to access durable state, allowing the platform full control over data placement. For example, Service Fabric provides reliable collections [168], a set of data structures that automatically persist mutations. Durable storage is typically disaggregated for microservices and accessed over the network. The use of platform APIs to maintain state allows for fast service migration compared with traditional virtual machine migration [127], as the stateful working set is directly observed by the platform. All microservices in Figure 5.1 are stateful. We describe further microservices in §5.5.

5.3.2 SmartNIC-accelerated Server

A SmartNIC-accelerated server is a commodity server with one or more SmartNICs. Host and SmartNIC processors do not share thermal, memory, or cache coherence domains, and communicate via DMA engines over PCIe. This allows them to operate as independent, heterogeneous computers, while sharing a power domain and its idle power. SmartNICs hold promise for improving server energy efficiency when compared to other heterogeneous computing approaches. For example, racks populated with low-power servers [107], or with a heterogeneous mix of servers, suffer from high idle energy draw, as each server requires energy to power its chassis, including fans and devices, and its own ToR switch port. System-on-chip designs with asymmetric performance, such as ARM's big.LITTLE [164] and DynamIQ [49] architectures, and AMD's heterogeneous system architecture (HSA) [30], which combines a GPU with a CPU on the same die, have scalability limits due to the shared thermal design point (TDP). These architectures presently scale to a maximum of 8 cores, making them more applicable to mobile than to server applications. GPGPUs and single instruction multiple threads (SIMT) architectures, such as Intel's Xeon Phi [62] and HP Moonshot [58], are optimized for computational throughput, and the extra interconnect hop prevents these accelerators from running latency-sensitive microservices efficiently [205]. SmartNICs are not encumbered by these problems and can thus be used to balance the power draw of latency-sensitive services efficiently.

5.3.3 Benefits of SmartNIC Offload

We quantify the potential energy-efficiency and request-latency benefits of using SmartNICs for microservices. To do so, we choose two identical commodity servers and equip one with a traditional 10GbE Intel X710 NIC and the other with a 10GbE Cavium LiquidIO SmartNIC. We then evaluate 16 different microservices (detailed in §5.5) on these two servers with synthetic benchmarks of random 512B requests. We measure request throughput, wall power consumed at peak throughput (defined as the knee of the latency-throughput graph, where queueing delay is minimal) and when idle, as well as client-observed, average/tail request latency in a closed loop. We use host cores on the traditional server and SmartNIC cores on the SmartNIC server for microservice execution. We use as many identical microservice instances, CPUs, and client machines as necessary to attain peak throughput and put unused CPUs into their deepest sleep state. The SmartNIC does not support per-core low-power states and always keeps all 12 cores active, diminishing SmartNIC energy-efficiency results somewhat. The SmartNIC microservice runtime system uses a kernel-bypass network stack (cf. §5.5). To break out kernel overheads from the host experiments, we run all microservices on the host in two configurations: 1. Linux kernel network stack; 2. kernel-bypass network stack [217], based on Intel's DPDK [59].

Table 5.1 presents measured peak request throughput, active power (wall power at peak throughput minus idle wall power), number of active cores, (tail) latency, and energy efficiency, averaged over 3 runs. Active power allows a direct comparison of host to SmartNIC processor power draw. Energy efficiency equals throughput divided by active power.
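Written out, the metric is simply the following (using the KVS row of Table 5.1 under DPDK as a check; the small gap to the reported 59.6K stems from the rounded throughput value):

\mathrm{RPJ} = \frac{\mathrm{RPS}}{P_{\mathrm{active}}} = \frac{\mathrm{RPS}}{P_{\mathrm{peak}} - P_{\mathrm{idle}}},
\qquad \text{e.g., KVS (DPDK): } \frac{2.0 \times 10^{6}\ \mathrm{req/s}}{33.6\ \mathrm{W}} \approx 59.5\,\mathrm{K\ req/J}.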

Kernel overhead. We first analyze the overhead of in-kernel networking on the host (Linux versus DPDK). As expected, the kernel-bypass networking stack performs better than the in-kernel one. On average, it improves energy efficiency by 21% (% column in Table 5.1) and reduces tail latency by 16%. Energy efficiency improves because (1) DPDK achieves similar throughput with fewer cores and (2) at peak server CPU utilization, DPDK delivers higher throughput.

Figure 5.2: Request size impact on SmartNIC requests/Joule benefits (SmartNIC:Host (DPDK) RPJ ratio) for five different microservices (Flow monitor, NIDS, DDoS, IPsec, IPv4 NAT).

Figure 5.3: Average RTT (3 runs) of different communication mechanisms (Host-Host-Linux, Host-Host-DPDK, SmartNIC-SmartNIC, SmartNIC-Host) in a SmartNIC-accelerated server, varying payload size.

SmartNIC performance. SmartNIC execution improves the energy efficiency of 12 of the measured microservices by a geometric mean of 6.5× compared with host execution using kernel-bypass (× column in Table 5.1). The SmartNIC consumes at most 24.7W of active power to execute these microservices, while the host processor consumes up to 113W. IPsec, BM25, Recommend, and NIDS particularly benefit from various SmartNIC hardware accelerators (crypto coprocessor, fetch-and-add atomic units, floating-point engines, and pattern matching units). NATv4, Count, EMA, KVS, Flow monitor, and DDoS can take advantage of the computational bandwidth and fast memory interconnect of the SmartNIC. In these cases, the energy efficiency comes not just from the lower power consumed by the SmartNIC, but also from peak throughput improvements versus the host processor. KNN and Spike attain lower throughput on the SmartNIC; however, since the SmartNIC consumes less power, the overall energy efficiency is still better than the host's. For all of these microservices, the SmartNIC also improves client-observed latency. This is due to the hardware-accelerated packet buffers and the elimination of PCIe bus traversals. SmartNICs reduce average and tail latency by a geometric mean of 45.3% and 45.4% versus host execution, respectively.

The host outperforms the SmartNIC for Top ranker, Bayes classifier, SQL, and API gateway by a geometric mean of 4.1× in energy efficiency and 41.2% and 30.0% in average and tail latency reduction. These microservices are branch-heavy with large working sets that are not handled well by the simpler cache hierarchy of the SmartNIC. Moreover, the API gateway uses double-precision floating-point numbers for its rate limiter implementation, which the SmartNIC emulates in software.

Request size impact. SmartNIC performance also depends on request size. To demonstrate this, we vary the request size of our synthetic workload and evaluate the SmartNIC energy-efficiency benefits of 5 microservices versus host execution. Figure 5.2 shows that with small (≤128B) requests, the SmartNIC benefit for IPsec, NIDS, and DDoS is smaller. Small requests are more computation intensive, and we are limited by the SmartNIC's wimpy cores. SmartNIC offload hits a sweet spot at 256–512B request sizes, where the benefit almost doubles; here, network and compute bandwidth utilization are balanced for the SmartNIC. At larger request sizes, we are network-bandwidth limited, allowing us to put host CPUs to sleep, and SmartNIC benefits again diminish. This can be seen in particular for IPsec, which excels on the SmartNIC due to hardware cryptography acceleration but whose benefit still diminishes with larger request sizes. We conclude that request size has a major impact on the benefit of SmartNIC offload, and measuring it is necessary to make good offload choices.

Overall, SmartNIC offload can provide large energy-efficiency and latency benefits for many microservices. However, it is not a panacea: computation- and memory-intensive microservices are more suitable to run on the host processor. We need an efficient method to define and monitor critical SmartNIC offload criteria for microservices.

5.3.4 Challenges of SmartNIC Offload

While there are quantifiable benefits, offloading microservices to SmartNICs brings a number of additional challenges: • SmartNICs share the same Ethernet MAC address with the host server. Layer-2 switching is not enough to route traffic between SmartNICs and host servers; we require a different switching scheme that can balance traffic and provide fault tolerance when a server is equipped with multiple SmartNICs. • Microservice platforms assume uniform communication performance among all computing nodes.

However, Figure 5.3 shows that SmartNIC-host (via PCIe) and SmartNIC-SmartNIC (via the ToR switch) communication round-trip times (RTTs) are up to 83.3% and 86.2% lower, respectively, than host-host (via the ToR switch) kernel-bypass communication. We have to consider this topology effect to achieve good performance. • Microservices share SmartNIC resources and contend with the SmartNIC firmware for cache and memory bandwidth. This can create a head-of-line blocking problem for network packet exchange with both the SmartNIC and the host. Prolonged head-of-line blocking can result in denial of service to unrelated microservices and is more severe than transient sources of interference, such as network congestion. We need to sufficiently isolate SmartNIC-offloaded microservices from the firmware to guarantee quality of service.

5.4 E3 Microservice Platform

We present the E3 microservice platform for SmartNIC-accelerated servers. Our goal is to maximize microservice energy efficiency at scale. Energy efficiency is the ratio of microservice throughput to cluster power draw. Power draw is determined by our choice of SmartNIC acceleration, while E3 focuses on maximizing microservice throughput on this heterogeneous architecture. We describe how we support microservice offload to a SmartNIC and address the request routing, microservice placement, and scheduling challenges.

E3 overview. E3 is a distributed microservice execution platform. We follow the design philosophies of Azure Service Fabric [168] but add energy efficiency as a design requirement. Figure 5.4 shows the hardware and software architecture of E3. E3 runs in a typical data center, where servers are grouped into racks, with a ToR switch per rack. Each server is equipped with one or more SmartNICs, and each SmartNIC is connected to the ToR. This creates a new topology where host processors are reachable via any of the SmartNICs (Figure 5.4(a)). SmartNICs within the same server also have multiple communication options (via the ToR or PCIe, §5.4.1).

Figure 5.4: Hardware and software architecture of E3: (a) a SmartNIC-accelerated server; (b) SmartNIC block diagram (traffic manager, NIC processor cores, orchestrator agent, and offloaded microservices, connected to the host over PCIe).

Programming model. E3 uses a data-flow programming model. Programmers assemble microservices into a DAG of microservice nodes interconnected via channels in the direction of RPC flow (cf. Figure 5.1). A channel provides lossless data communication between nodes. A DAG in E3 describes all RPC and execution paths of a single microservice application, but multiple DAGs may coexist and execute concurrently. E3 is responsible for mapping DAGs to computational nodes.

Software stack. E3 employs a central, replicated cluster resource controller [168] and a microservice runtime on each host and SmartNIC. The resource controller includes four components: (1) traffic control, responsible for routing and load balancing requests between different microservices; (2) the control-plane manager, placing microservice instances on cluster nodes; (3) the data-plane orchestrator, dynamically migrating microservices across cluster nodes; and (4) the failover/replication manager, providing failover and node membership management using consistent hashing [241]. The microservice runtime includes an execution engine, an orchestrator agent, and a communication subsystem, described next.
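For reference, a minimal consistent-hashing lookup of the kind the failover/replication manager relies on for node membership is sketched below; the FNV-1a hash, the ring layout without virtual nodes, and the node names are illustrative assumptions, not E3's implementation.

/* Minimal consistent-hashing sketch for node membership: nodes are hashed
 * onto a ring, and a key is owned by the first node clockwise from the key's
 * hash. Hash choice and the absence of virtual nodes are illustrative only. */
#include <stdint.h>
#include <stdio.h>

static uint32_t fnv1a(const char *s) {
    uint32_t h = 2166136261u;
    for (; *s; s++) { h ^= (uint8_t)*s; h *= 16777619u; }
    return h;
}

struct ring_entry { uint32_t point; const char *node; };

static const char *lookup(const struct ring_entry *ring, int n, const char *key) {
    uint32_t h = fnv1a(key);
    const struct ring_entry *owner = &ring[0];
    uint32_t best = UINT32_MAX;
    for (int i = 0; i < n; i++) {
        uint32_t dist = ring[i].point - h;   /* clockwise distance, mod 2^32 */
        if (dist < best) { best = dist; owner = &ring[i]; }
    }
    return owner->node;
}

int main(void) {
    const char *nodes[] = { "host0", "nic0", "nic1", "host1" };
    struct ring_entry ring[4];
    for (int i = 0; i < 4; i++)
        ring[i] = (struct ring_entry){ fnv1a(nodes[i]), nodes[i] };
    printf("kvs-shard-3 -> %s\n", lookup(ring, 4, "kvs-shard-3"));
    return 0;
}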

Execution engine. E3 executes each microservice as a multi-threaded process, either on the SmartNIC or on the host. The host runs Linux. The SmartNIC runs a light-weight firmware. Microservices interact only via microservice APIs, allowing E3 to abstract from the OS. SmartNIC and host support hardware virtual memory for microservice confinement. E3 is work-conserving and runs requests to completion. It leverages a round-robin policy for steering incoming requests to cores, context switching cores if needed.

Orchestrator agent. Each node runs an orchestrator agent to periodically monitor and report runtime execution characteristics to the resource controller. The information is used by (1) the failover manager to determine cluster health and (2) the data-plane orchestrator to monitor the execution performance of each microservice and make migration decisions. On the host, the agent runs as a separate process. On the SmartNIC, the agent runs on dedicated cores (blue in Figure 5.4(b)), and a traffic manager hardware block exchanges packets between the NIC MAC ports and the agent. For each packet, the agent determines the destination (network, host, or SmartNIC core).

5.4.1 Communication Subsystem

E3 leverages various communication mechanisms, depending on where communicating microservices are located.

Remote communication. When communicating among host cores across different servers, E3 uses the Linux network stack. SmartNIC remote communication uses a user-level network stack [217].

Local SmartNIC-host communication. SmartNIC and host cores on the same server communicate via PCIe. Prior work has extensively explored communication channels via PCIe [189, 194, 151, 211], and we adopt their design. High-throughput messaging for PCIe interconnects requires leveraging multiple DMA engines in parallel. E3 takes advantage of the eight DMA engines on the LiquidIO, which can concurrently issue scatter/gather requests.
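A sketch of the idea, assuming a hypothetical dma_issue() in place of the vendor SDK's scatter/gather call, is shown below: a host-bound message is striped across the eight engines so the chunks transfer in parallel.

/* Sketch of striping one host-bound message across multiple DMA engines so
 * the transfers proceed in parallel. dma_issue() is a hypothetical stand-in
 * for the SDK's scatter/gather call; the engine count matches the LiquidIO. */
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_DMA_ENGINES 8

static void dma_issue(int engine, uint64_t host_addr, size_t len) {
    printf("engine %d: host 0x%llx, %zu bytes\n",
           engine, (unsigned long long)host_addr, len);   /* stub */
}

static void send_to_host(uint64_t host_addr, size_t msg_len) {
    size_t chunk = (msg_len + NUM_DMA_ENGINES - 1) / NUM_DMA_ENGINES;
    for (int e = 0; e < NUM_DMA_ENGINES && (size_t)e * chunk < msg_len; e++) {
        size_t off = (size_t)e * chunk;
        size_t len = (off + chunk <= msg_len) ? chunk : msg_len - off;
        dma_issue(e, host_addr + off, len);   /* engines run concurrently */
    }
}

int main(void) { send_to_host(0x10000000ull, 60000); return 0; }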

Local SmartNIC-SmartNIC communication. SmartNICs in the same host can use three methods for communication: 1. using the host to relay requests, which involves two data transfers over PCIe and pointer manipulation on the host, increasing latency; 2. PCIe peer-to-peer [143], which is supported on most SmartNICs [76, 67, 68]; however, the bandwidth of peer-to-peer PCIe communication is capped in a NUMA system when the communication passes between sockets [205]; 3. the ToR switch. We take the third approach, and our experiments show that it incurs lower latency and achieves higher bandwidth than the first two.

5.4.2 Addressing and Routing

Since SmartNICs and their host servers share Ethernet MAC addresses, we have to use an addressing/routing scheme to distinguish between these entities and load-balance across them. For illustration, assume we have a server with two SmartNICs, each with one MAC port. If remote microservices communicate with this server, there are two possible paths, and either might be congested. We use equal-cost multipath (ECMP) [1] routing on the ToR switch to route and balance load among these ports. We assign each SmartNIC and the host its own IP. We then configure the ToR switch to route to SmartNICs directly via the attached ToR switch port and to the host IP via an ECMP route over any of the ports. The E3 communication subsystem on each SmartNIC differentiates by destination IP address whether an incoming packet is for the SmartNIC or the host. On the host, we take advantage of NIC teaming [19] (also known as port trunking) to bond all related SmartNIC ports into a single logical interface, and then apply the dynamic link aggregation policy (supporting the IEEE 802.3ad protocol). ECMP automatically balances connections to the host over all available ports. If a link or SmartNIC fails, ECMP automatically rebalances new connections via the remaining links, improving host availability.
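The per-packet decision on the SmartNIC side reduces to a destination-IP check, sketched below for untagged Ethernet/IPv4 frames; the example address and the forwarding of everything else to the host are illustrative assumptions.

/* Sketch of per-packet steering on the SmartNIC: packets addressed to the
 * SmartNIC's own IP are delivered locally; everything else, including the
 * ECMP-balanced host IP, is forwarded over PCIe to the host. Offsets assume
 * an untagged Ethernet + IPv4 frame; the address below is only an example. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>

enum steer_target { TO_SMARTNIC_CORE, TO_HOST };

static enum steer_target steer(const uint8_t *frame, size_t len,
                               uint32_t smartnic_ip_be) {
    if (len < 14 + 20)                       /* Ethernet (14B) + IPv4 (20B) */
        return TO_HOST;                      /* not IPv4-sized; let the host decide */
    uint32_t dst_be;
    memcpy(&dst_be, frame + 14 + 16, sizeof(dst_be));  /* IPv4 dst at IP offset 16 */
    return (dst_be == smartnic_ip_be) ? TO_SMARTNIC_CORE : TO_HOST;
}

int main(void) {
    uint32_t my_ip = inet_addr("10.0.1.2");  /* hypothetical SmartNIC IP */
    uint8_t frame[64] = {0};
    printf("steer=%d\n", steer(frame, sizeof(frame), my_ip));
    return 0;
}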

5.4.3 Control-plane Manager

The control-plane manager is responsible for energy-efficient microservice placement. This is a computing-intensive operation due to the large search space with myriad constraints; hence, it is done on the control plane. Service Fabric uses simulated annealing, a well-known approximation heuristic, to solve microservice placement. It considers three types of constraints: (1) the currently available resources of each computing node (memory, disk, CPU, network bandwidth); (2) computing node runtime statistics (aggregate outstanding microservice requests); and (3) individual microservice execution behavior (average request size, request execution time and frequency, diurnal variation, etc.). Service Fabric ignores network topology and favors spreading load over multiple nodes.

E3 extends this algorithm to support bump-in-the-wire SmartNICs by considering network topology. We categorize computing nodes (host or SmartNIC processors) into different levels of communication distance and perform a search from the closest to the furthest. We present the HCM algorithm (Algorithm 3). HCM takes as input the microservice DAG G and source nodes Vsrc, as well as the cluster topology T, including runtime statistics for each computing node (as collected). HCM performs a breadth-first traversal of G to map microservices to cluster computing nodes (MS_DAG_TRAVERSE).

If a microservice V is not already deployed (get_deployed_node), HCM (via MS_DAG_TRAVERSE) assigns it to a computing node N via the find_first_fit function (lines 9-11) and deploys it via set_deployed_node. find_first_fit is a greedy function that returns the first computing node that satisfies the microservice's constraints (via its resource and runtime statistics) without considering communication cost; if no such node is found, it returns the node closest to satisfying the constraints. Next, for the descendant microservices of node V (lines 12-15), HCM assigns them to computing nodes based on their communication distance to V (MS_PLACE). To do so, HCM first computes the hierarchical topology representation of computing node N via get_hierarchical_topo. Each level in the hierarchical topology includes computing nodes that require a similar communication mechanism, starting with the closest. For example, in a single rack there are four levels in this order: 1. the same computing node as V; 2. an adjacent computing node on the same server; 3. a SmartNIC computing node on an adjacent server; 4. a host computing node on an adjacent server. If there are multiple nodes in the same level, HCM uses find_best_fit to find the best fit according to resource constraints. If no node in the hierarchical topology fits the constraints, we fall back to find_first_fit.

Algorithm 3 HCM microservice placement algorithm
1:  G : microservice DAG graph
2:  Vsrc : source microservice node(s) of the DAG
3:  T : server cluster topology graph
4:  procedure MS_DAG_TRAVERSE(G, Vsrc, T)
5:      Q.enqueue(Vsrc)                          ▷ Let Q be a queue
6:      while Q is not empty do
7:          V ← Q.dequeue()
8:          N ← get_deployed_node(V)
9:          if N is NULL then
10:             N ← find_first_fit(V, T)
11:             set_deployed_node(V, N)
12:         end if
13:         for W in all direct descendants of V in G do
14:             NW ← MS_PLACE(W, N, T)
15:             set_deployed_node(W, NW)
16:             Q.enqueue(W)
17:         end for
18:     end while
19: end procedure
20: <V, N, T> = <microservice, computation node, cluster topology graph>
21: procedure MS_PLACE(V, N, T)
22:     Topo ← get_hierarchical_topo(N, T)
23:     for L in all Topo.Levels do
24:         N ← find_best_fit(V, Topo.node_list(L))
25:         if N is not NULL then
26:             return N
27:         end if
28:     end for
29:     return find_first_fit(V, T)              ▷ Ignore topology
30: end procedure

5.4.4 Data-plane Orchestrator

The data-plane orchestrator is responsible for detecting load changes and migrating microservices among computing nodes at runtime in response to those changes. To do so, we piggyback several measurements onto the periodic node health reports made by orchestrator agents to the resource controller. This approach is light-weight and integrates well with runtime execution. We believe that our proposed methods can also be used in other microservice schedulers [216, 223, 157].

In this section, we introduce the additional techniques implemented in our data-plane orchestrator to mitigate SmartNIC overload caused by compute-intensive microservices. Such microservices can interfere with the SmartNIC's traffic manager, starving the host of network packets. They can also simply execute too slowly on the SmartNIC to keep up with the incoming request rate.

Host starvation. This issue is caused by head-of-line blocking of network traffic due to microservice interference with the firmware on SmartNIC memory and caches. It is typically caused by a single compute-intensive microservice overloading the SmartNIC. To alleviate this problem, we monitor the incoming/outgoing network throughput and the packet queue depth at the traffic manager. If network bandwidth is under-utilized but there is a standing queue at the traffic manager, the SmartNIC is overloaded and we need to migrate microservices.
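A sketch of this check is shown below, with illustrative thresholds and a made-up stats structure standing in for the traffic manager's actual counters.

/* Sketch of the host-starvation check: a standing queue at the traffic
 * manager while the link is under-utilized indicates that offloaded
 * microservices are interfering with the firmware. The stats structure and
 * thresholds are illustrative; E3 reads the real traffic-manager counters. */
#include <stdbool.h>
#include <stdint.h>

struct tm_stats {
    double   link_util;         /* fraction of line rate currently in use */
    uint32_t queue_depth;       /* packets queued at the traffic manager */
};

#define UTIL_LOW_WATER 0.80     /* assumed threshold */
#define STANDING_QUEUE 256      /* assumed threshold */

static bool host_starvation(const struct tm_stats *s) {
    /* Bandwidth is available but packets still back up: trigger migration. */
    return s->link_util < UTIL_LOW_WATER && s->queue_depth > STANDING_QUEUE;
}

int main(void) {
    struct tm_stats s = { .link_util = 0.55, .queue_depth = 1024 };
    return host_starvation(&s) ? 0 : 1;
}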

Microservice overload. This issue is caused by microservices in aggregate requiring more computational bandwidth than the SmartNIC can offer, typically because too many microservices are placed on the same SmartNIC. To detect this problem, we periodically monitor the execution time of each microservice and compare it to its exponential moving average. When the difference is negative and its magnitude exceeds 20%, we assume a microservice overload and trigger microservice migration. The threshold was determined empirically.
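The overload test can be sketched as follows, interpreting the criterion as the current execution time exceeding its moving average by more than 20%; the smoothing factor is an assumed value, and only the 20% threshold comes from the text.

/* Sketch of the per-microservice overload check: track an exponential moving
 * average (EMA) of request execution time and flag overload when the newest
 * sample exceeds the EMA by more than 20%. alpha is an assumed smoothing
 * factor; the 20% threshold matches the text. */
#include <stdbool.h>
#include <stdio.h>

struct ms_monitor {
    double ema_us;              /* EMA of execution time, 0 until first sample */
    double alpha;               /* smoothing factor, e.g., 0.2 (assumed) */
};

static bool check_and_update(struct ms_monitor *m, double exec_us) {
    bool overloaded = (m->ema_us > 0.0) && (exec_us > 1.20 * m->ema_us);
    m->ema_us = (m->ema_us == 0.0) ? exec_us
                                   : m->alpha * exec_us + (1.0 - m->alpha) * m->ema_us;
    return overloaded;          /* true => ask the orchestrator to migrate */
}

int main(void) {
    struct ms_monitor m = { .ema_us = 0.0, .alpha = 0.2 };
    double samples[] = { 30.0, 31.0, 29.5, 45.0 };   /* last sample spikes */
    for (int i = 0; i < 4; i++)
        if (check_and_update(&m, samples[i]))
            printf("overload detected at sample %d\n", i);
    return 0;
}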

Microservice migration. For either issue, the orchestrator will migrate the microservice with the highest CPU utilization to the host. To do so, it uses a cold migration approach, similar to other microservice platforms. Specifically, when the orchestrator makes a migration decision, it will first push the microservice binary to the new destination, and then notify the runtime of the old node to (1) remove the microservice instance from the execution engine; (2) clean up and free any local resources; (3) migrate the working state, as represented by reliable collections [168], to the destination. After the orchestrator receives a confirmation from the original node, it will update connections and restart the microservice execution on the new node.

5.4.5 Failover/Replication Manager

Since SmartNICs share the same power supply as their host server, our failover manager treats all SmartNICs and the host as belonging to the same fault domain [168] and avoids placing replicas within that domain. Replication for fault tolerance is typically done across different racks of the same data center or across data centers, so there is no impact from placing SmartNICs in the same failure domain as their hosts.

5.5 Implementation

Host software stack. The E3 resource controller and host runtime are implemented in 1,287 and 3,617 lines of C (LOC), respectively, on Ubuntu 16.04. Communication among co-located microservices uses per-core, multi-producer, single-consumer FIFO queues in shared memory. Our prototype uses UDP for all network communication.

SmartNIC runtime. The E3 SmartNIC runtime is built in 3,885 LOC on top of the Cavium CDK [79], with a user-level network stack. Each microservice runs on a set of non-preemptive hardware threads. Our implementation takes advantage of a number of hardware accelerator libraries. We use (1) a hardware-managed memory manager to store the state of each microservice, (2) the hardware traffic controller for Ethernet MAC packet management, and (3) atomic fetch-and-add units to gather performance statistics.

Microservice   S   Description
IPsec              Authenticates (SHA-1) & encrypts (AES-CBC-128) NATv4 [173]
BM25               Search engine ranking function [81], e.g., ElasticSearch
NATv4              IPv4 network address translation using DIR-24-8-BASIC [154]
NIDS               Network intrusion detection w/ Aho-Corasick parallel match [26]
Count          X   Item frequency counting based on a bitmap [173]
EMA            X   Exponential moving average (EMA) for data streams [51]
KVS            X   Hashtable-based in-memory key-value store [145]
Flow mon.      X   Flow monitoring system using count-min sketch [173]
DDoS           X   Entropy-based DDoS detection [206]
Recommend      X   Recommendation system using collaborative filtering [46]
KNN                Classifier using the K-nearest neighbours algorithm [64]
Spike          X   Spike detector from a data stream using Z-score [231]
Bayes              Naive Bayes classifier based on maximum a posteriori [195]
API gw         X   API rate limiter and authentication gateway [32]
Top Ranker     X   Top-K ranker using quicksort [250]
SQL            X   In-memory SQL database [17]

Table 5.2: 16 microservices implemented on E3. S = Stateful.

We use page protection of the cnMIPS architecture to confine microservices.

Microservices. We implemented 16 popular microservices on E3, as shown in Table 5.2, in an aggregate 6,966 LOC. Six of the services are stateless or use read-only state that is modified only via the cluster control plane. The remaining services are stateful and use reliable collections to maintain their state. When running on the SmartNIC, IPsec and the API gateway can use the crypto coprocessor (105 LOC), while Recommend and NIDS can take advantage of the deterministic finite automata unit (65 LOC). For Count, EMA, KVS, and Flow monitor, our compiler automatically uses the dedicated atomic fetch-and-add units on the SmartNIC.

Application   Description                      N     Microservices
NFV-FIN       Flow monitoring [173, 219]       72    Flow mon., IPsec, NIDS
NFV-DIN       Intrusion detection [260, 219]   60    DDoS, NATv4, NIDS
NFV-IFID      IPsec gateway [260, 173]         84    NATv4, Flow mon., IPsec, DDoS
RTA-PTC       Twitter analytics [250]          60    Count, Top Ranker, KNN
RTA-SF        Spam filter [160]                96    Spike, Count, KVS, Bayes
RTA-SHM       Server health mon. [163]         84    Count, EMA, SQL, BM25
IOT-DH        IoT data hub [70]                108   API gw, Count, KNN, KVS, SQL
IOT-TS        Thermostat [57]                  108   API gw, EMA, Spike, Recommend, SQL

Table 5.3: 8 microservice applications. N = # of DAG nodes.

When performing single-precision floating-point computations (EMA, KNN, Spike, Bayes), our compiler generates FPU code on the SmartNIC. Double-precision floating-point calculations (API gateway) are software emulated. E3 reliable collections currently only support hashtables and arrays, preventing us from migrating the SQL engine; we thus constrain the control-plane manager to pin SQL instances to host processors.

Applications. Based on these microservices, we develop eight applications across three application domains: (1) distributed real-time analytics (RTA), such as Apache Storm [250], implemented as a data-flow processing graph of workers that pass data tuples in real time to trigger computations; (2) network function (NF) virtualization (NFV) [215], which is used to build cloud-scale network middleboxes, software switches, and enterprise IT networks by chaining NFs; and (3) an IoT hub (IOT) [36], which gathers sensor data from edge devices and generates events for further processing (e.g., spike detection, classification) [70]. To maximize throughput, applications may shard and replicate microservices, resulting in a DAG node count larger than the number of involved microservice types. Table 5.3 presents the microservice types involved in each application, the deployed DAG node count, and references the workloads used for evaluation. The workloads are trace-based and synthetic benchmarks, validated against realistic scenarios. The average and maximum node fanouts among our applications are 6 and 12, respectively. Figure 5.1 shows IOT-TS as an example.

System/Cluster       Cost [$]   BC   WC    Mem   Idle   Peak   Bw
Beefy                4,500      12   0     64    83     201    20
Wimpy                2,209      0    32    2     79     95     20
Type1-SmartNIC       4,650      12   12    68    98     222    20
Type2-SmartNIC       6,750      16   48    144   145    252    40
SuperBeefy           12,550     24   0     192   77     256    80
4×Beefy              18,000     48   0     256   332    804    80
4×Wimpy              8,836      0    128   8     316    380    80
2×B.+2×W.            13,018     24   64    132   324    592    80
2×Type2-SmartNIC     13,500     32   96    288   290    504    80
1×SuperBeefy         12,550     24   0     192   77     256    80

Table 5.4: Evaluated systems and clusters. BC = Beefy cores, WC = Wimpy cores, Mem = Memory (GB), Idle and Peak power (W), Bw = Network bandwidth (Gb/s).

IOT-TS is sharded into 6×API, 12×SQL, 12×EMA, 12×Spike, and 12×Recommend, and each microservice has one backup replica.

5.6 Evaluation

Our evaluation aims to answer the following questions:
1. What is the energy-efficiency benefit of microservice SmartNIC offload? Is it proportional to client load? What is the latency cost? (§5.6.1)
2. Does E3 overcome the challenges of SmartNIC offload? (§5.6.2, §5.6.3, §5.6.4)
3. Do SmartNIC-accelerated servers provide a better total cost of ownership than other cluster architectures? (§5.6.5)
4. How does E3 perform at scale? (§5.6.6)

Experimental setup. Our experiments run on a set of clusters (Table 5.4 presents the server and cluster configurations) attached to an Arista DCS-7050S ToR switch. Beefy is a Supermicro 1U server with a 12-core E5-2680 v3 processor at 2.5GHz and a dual-port 10Gbps Intel X710 NIC.

Wimpy is ThunderX-like, with a CN6880 processor (32 cnMIPS64 cores running at 1.2GHz) and a dual-port 10Gbps NIC. SuperBeefy is a Supermicro 2U machine with a 24-core Xeon Platinum 8160 CPU at 2.1GHz and a dual-port 40Gbps Intel XL710 NIC. Our SmartNIC is the Cavium LiquidIOII [43], with one OCTEON processor (12 cnMIPS64 cores at 1.2GHz), 4GB memory, and two 10Gbps ports. Based on this, we build two SmartNIC servers: Type1 is Beefy but swaps the X710 10Gbps NIC for the Cavium LiquidIOII; Type2 is a 2U server with two 8-core Intel E5-2620 processors at 2.1GHz, 128GB memory, and 4 SmartNICs. All servers have a Seagate HDD. We build the clusters such that each has the same amount of aggregate network bandwidth. This allows us to compare energy efficiency based on the compute bandwidth of the clusters, without varying network bandwidth. We also exclude the switch from our cost and energy evaluations, as each cluster uses an identical number of switch ports.

We measure server power consumption using the servers' IPMI data center management interface (DCMI), cross-checked by a Watts Up wall power meter. Throughput and average/tail latency across 3 runs are measured from clients (Beefy machines), of which we provide as many as necessary. We enable hyper-threading and use the intel_pstate governor for power management. All benchmarks in this section report energy efficiency as throughput over server/cluster wall power (not just active power).

5.6.1 Benefit and Cost of SmartNIC-Offload

Peak utilization. We evaluate the latency and energy efficiency of using SmartNICs for microservice applications, compared to homogeneous clusters. We compare 3×Beefy to 3×Type1-SmartNIC to ensure that microservices also communicate remotely. We focus first on peak utilization, which is desirable for energy efficiency as it amortizes idle power draw. To do so, we deploy as many instances of each application and apply as much client load as necessary to maximize request throughput without overloading the cluster, as determined by the knee of the latency-throughput curve. Figure 5.5 shows that Type1-SmartNIC achieves on average 2.5×, 1.3×, and 1.3× better energy efficiency across the NFV, RTA, and IOT application classes, respectively.

Figure 5.5: (a) Energy-efficiency (KRPJ), (b) average latency (ms), and (c) 99th-percentile latency (ms) of Type1-SmartNIC versus Beefy at peak utilization across the eight applications.

This goes along with 43.3%, 92.3%, and 80.4% average latency savings and 35.5%, 90.4%, and 88.6% 99th-percentile latency savings, respectively. NFV-FIN gains the most (3× better energy efficiency) because E3 is able to run all of its microservices on the SmartNICs. RTA-PTC benefits the least (12% energy-efficiency improvement at a 4% average and tail latency cost), as E3 only places the Count microservice on the SmartNIC and migrates the rest to the host.

Power proportionality. This experiment evaluates the power proportionality of E3 (energy efficiency at lower than peak utilization). Using 3×Type1-SmartNIC, we choose an application from each class (NFV-FIN, RTA-SHM, and IOT-TS) and vary the offered request load between idle and peak via a client-side request limiter. Figure 5.8 shows that RTA-SHM and IOT-TS are power proportional. NFV-FIN is not power proportional but also draws negligible power: it runs all microservices on the SmartNICs, which have low active power, but the cnMIPS architecture has no per-core sleep states. We conclude that applications can benefit from E3's microservice offload to SmartNICs, in particular at peak cluster utilization. Peak cluster utilization is desirable for energy efficiency, and microservices make it more common due to light-weight migration. However, transient periods of low load can occur, and E3 draws power proportional to request load. We can apply insights from Prekas et al. [222] to reduce polling overheads and improve power proportionality further.

5.6.2 Avoiding Host Starvation

We show that E3's data-plane orchestrator prevents host starvation by identifying head-of-line blocking of network traffic. To do so, we use 3×Type1-SmartNIC and place as many microservices on the SmartNIC as fit in memory. E3 identifies the microservices that cause interference (Top Ranker in RTA-PTC, Spike in RTA-SF, API gateway in IOT-DH and IOT-TS) and migrates them to the host. As shown in Figure 5.7, our approach achieves up to 29× better energy efficiency and up to 89% latency reduction across RTA-PTC, RTA-SF, IOT-DH, and IOT-TS. For the other applications, our traffic engine has little effect because the initial microservice assignment already put the memory-intensive microservices on the host.

5.6.3 Sharing SmartNIC and Host Bandwidth

This experiment evaluates the benefits of sharing SmartNIC network bandwidth with the host. We compare two Type2-SmartNIC configurations: 1. Sharing aggregate network bandwidth among host and SmartNICs, using ECMP to balance host traffic over SmartNIC ports; 2. Replacing one SmartNIC with an Intel X710 NIC used exclusively to route traffic to the host. To emphasize the load balancing benefits, we always place the client-facing microservices on the host server. Note that SmartNIC-offloaded microservices still exchange network traffic (when communicating remotely or among SmartNICs) and interfere with host traffic.
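To illustrate the first configuration, the sketch below hashes a flow's five-tuple to choose one of a server's SmartNIC ports, in the spirit of ECMP; the tuple encoding and port count are hypothetical, and the real selection happens in the NIC/switch data path rather than in host software.

    import zlib

    NUM_SMARTNIC_PORTS = 4  # hypothetical: one uplink per SmartNIC in a Type2 server

    def pick_port(src_ip, dst_ip, src_port, dst_port, proto):
        """ECMP-style selection: hash the five-tuple so all packets of a flow
        use the same SmartNIC port while different flows spread across ports."""
        key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
        return zlib.crc32(key) % NUM_SMARTNIC_PORTS

    # The same flow always maps to one port; distinct flows tend to spread out.
    print(pick_port("10.0.0.1", "10.0.0.2", 40000, 80, 6))
    print(pick_port("10.0.0.3", "10.0.0.2", 40001, 80, 6))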

Figure 5.9 shows that load balancing improves application throughput by up to 2.9× and cluster energy efficiency by up to 2.7× (NFV-FIN). Available host network bandwidth when sharing SmartNICs can be up to 4× that of the dedicated NIC, which balances better with the host compute bandwidth. With a dedicated NIC, host processors can starve for network bandwidth. IOT-TS is compute-bound and thus benefits the least from sharing. In terms of latency, all cases behave the same since the request execution flows are the same.

Figure 5.6: Communication-aware microservice placement: energy efficiency (KRPJ) and latency (ms) for HCM, simulated annealing, and ILP.

Figure 5.7: Avoiding host starvation via the data-plane orchestrator: energy efficiency (KRPJ) and latency (ms) with and without host-starvation avoidance.

5.6.4 Communication-aware Placement

To show the effectiveness of communication-aware microservice placement, we evaluate HCM on E3 without the data-plane orchestrator. In this case, all microservices are stationary after placement. We avoid host starvation and microservice overload by constraining problematic microservices to the host.

Using 3×Type1-SmartNIC and all placement constraints of Service Fabric [168] (described in §5.4.3), we compare HCM with both simulated annealing and an integer linear program (ILP). HCM places the highest importance on minimizing microservice communication latency. Simulated annealing and ILP use a cost function with the highest weight on minimizing co-execution interference. Hence, HCM tries to co-schedule communicating microservices on proximate resources, while the others will spread them out. ILP attempts to find the best configuration, while simulated annealing approximates it. Figure 5.6 shows that compared to simulated annealing and ILP, HCM improves energy efficiency by up to 35.2% and 22.0%, and reduces latency by up to 24.0% and 18.6%, respectively. HCM's short communication latency benefits outweigh interference from co-execution.
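To make the difference in objectives concrete, the toy sketch below scores a candidate placement under two weightings: one dominated by communication distance between adjacent microservices (the emphasis we attribute to HCM in this comparison) and one dominated by co-execution interference (the emphasis of the annealing/ILP baselines). It only illustrates the trade-off; it is not the actual cost function used by E3.

    from itertools import combinations

    def comm_cost(edges, placement, hops):
        """Total network distance over the edges of the microservice graph."""
        return sum(hops[(placement[a], placement[b])] for a, b in edges)

    def interference_cost(placement):
        """Count co-located microservice pairs as a crude interference proxy."""
        groups = {}
        for svc, node in placement.items():
            groups.setdefault(node, []).append(svc)
        return sum(len(list(combinations(g, 2))) for g in groups.values())

    def score(edges, placement, hops, w_comm, w_intf):
        """Weighted objective: HCM-like settings use a large w_comm, while the
        annealing/ILP baselines in this comparison weight w_intf heavily."""
        return (w_comm * comm_cost(edges, placement, hops)
                + w_intf * interference_cost(placement))

    # Hypothetical three-microservice chain placed across a host and its SmartNIC.
    edges = [("gateway", "counter"), ("counter", "store")]
    hops = {("host", "host"): 0, ("host", "nic"): 1,
            ("nic", "host"): 1, ("nic", "nic"): 0}
    placement = {"gateway": "host", "counter": "nic", "store": "nic"}
    print(score(edges, placement, hops, w_comm=10, w_intf=1))  # -> 11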

Figure 5.8: Power draw of 3 applications (NFV-FIN, RTA-SHM, IOT-TS), normalized to the idle power of 3×Type1-SmartNIC, as request load (Mop/s) varies.

Figure 5.9: Energy-efficiency (KRPJ) and throughput (KRPS) with and without ECMP-based SmartNIC sharing (log y scale).

5.6.5 Energy Efficiency = Cost Efficiency

While SmartNICs benefit energy efficiency and thus potentially bring cost savings, can they compete with other forms of heterogeneous clusters, especially when factoring in the capital expense to acquire the hardware? In this experiment, we evaluate the cost efficiency, in terms of request throughput over total cost and time of ownership, of using SmartNICs for microservices, compared with four other clusters (see Table 5.4). Assuming that clusters are usually at peak utilization, we use the cost efficiency metric (Throughput × T) / (CAPEX + Power × T × Electricity), where Throughput is the measured average throughput at peak utilization for each application, as executed by E3 on each cluster, T is elapsed time, CAPEX is the capital expense to purchase the cluster including all hardware components ($), Power is the peak power draw of the cluster (Watts), and Electricity is the price of electricity ($ per watt-hour). The cluster cost and power data is shown in Table 5.4, and we use the average U.S. electricity price [153] of $0.0733/kWh. Figure 5.10 reports results for three applications at very different points in the workload space, extrapolated over time of ownership by our cost efficiency metric.
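The sketch below evaluates this metric over the time of ownership; the CAPEX, power, and throughput values are placeholders rather than the Table 5.4 numbers.

    ELECTRICITY_PRICE = 0.0733 / 1000.0  # $ per watt-hour (average U.S. price [153])

    def cost_efficiency(throughput_rps, capex_dollars, power_watts, years):
        """Requests per dollar over the ownership period:
        (Throughput * T) / (CAPEX + Power * T * Electricity)."""
        hours = years * 365 * 24
        seconds = hours * 3600.0
        energy_bill = power_watts * hours * ELECTRICITY_PRICE  # OPEX in dollars
        return (throughput_rps * seconds) / (capex_dollars + energy_bill)

    # Placeholder cluster parameters (not the Table 5.4 values):
    for years in (1, 3, 5):
        print(years, cost_efficiency(throughput_rps=2_000_000,
                                     capex_dollars=20_000,
                                     power_watts=800, years=years))

As the ownership period grows, the energy term in the denominator grows linearly while CAPEX stays fixed, which is why energy efficiency increasingly determines long-term cost efficiency.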

We make three observations. First, in the long term (>1 year of ownership), cost efficiency is increasingly dominated by energy efficiency. This highlights the importance of energy efficiency for data center design, where servers are typically replaced after several years to balance CAPEX [113].

Cluster              NFV-FIN   RTA-SHM   IOT-TS
4×Beefy              5.1       1.9       2.7
4×Wimpy              29.9      0.4       0.1
2×B.+2×W.            8.2       1.4       1.9
2×Type2-SmartNIC     29.0      4.5       6.1
1×SuperBeefy         8.8       2.9       5.0

Table 5.5: Energy efficiency across five clusters (KRPJ).

Second, when all microservices are able to run on a low-power platform (NFV-FIN), both the 4×Wimpy and 2×Type2-SmartNIC clusters are the most cost efficient. After 5 years, 4×Wimpy is 14.1% more cost-efficient than 2×Type2-SmartNIC because of its lower power draw. Third, when a microservice application contains both compute- and IO-intensive microservices (RTA-SHM, IOT-TS), the 2×Type2-SmartNIC cluster is up to 1.9× more cost-efficient after 5 years of ownership than the next best cluster configuration (4×Beefy in both cases). Table 5.5 presents the measured energy-efficiency, which shows cost-efficiency in the limit (over very long time of ownership). We can see that 4×Wimpy is only 3% more energy-efficient (but has lower CAPEX) than 2×Type2-SmartNIC for NFV-FIN. 2×Type2-SmartNIC is on average 2.37× more energy-efficient (but has higher CAPEX) than 1×SuperBeefy, which is the second-best cluster in terms of energy-efficiency.

5.6.6 Performance at Scale

We evaluate and discuss the scalability of E3 along three axes: 1. Mechanism performance scalability; 2. Tail-latency; 3. Energy-efficiency.

Mechanism scalability. At scale, pressure on the control-plane manager and data-plane orchestrator increases. We evaluate the performance scalability of both mechanisms with an increasing number of Type2-SmartNIC servers in a simulated FatTree [152] topology with 40 servers per rack.

Figure 5.10: Cost efficiency (Bops/$) over time of ownership (years) of 3 applications, (a) NFV-FIN, (b) RTA-SHM, and (c) IOT-TS, across the cluster configurations from Table 5.4.

Servers →    100    200    400    600     800    1,000
HCM          4.85   8.31   19.83  34.32   74.39  263.46
Annealing    3.15   4.73   7.43   15.64   23.50  61.42
ILP          7.64   19.43  84.83  361.85  >1s    >1s

Table 5.6: Per-microservice deployment time (ms) scalability.

To avoid host starvation and microservice overload, E3's data-plane orchestrator receives one heartbeat message (16B) every 50ms from each SmartNIC that reports the queue length of the traffic manager and the SmartNIC's microservice execution times. The orchestrator parses the heartbeat message and makes a migration decision (§5.4.4). Figure 5.11 shows that the time taken to transmit the message and make a decision with a large number of servers stays well below the time taken to migrate a service (on the order of 10s-100s of ms) and is negligibly impacted by the number of deployed microservices. This is because the heartbeat message contributes only 1Kbps of traffic, even with 50K servers. E3 uses HCM in the control-plane manager. We compare it to simulated annealing and ILP, deploying 10K microservices on an increasing number of servers. Table 5.6 shows that while HCM does not scale as well as simulated annealing, it can deploy new microservices in a reasonable time span (≪1s) at scale. ILP fails to deliver acceptable performance.
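A minimal sketch of the orchestrator's decision step is shown below; the exact 16-byte heartbeat layout, field widths, and thresholds are illustrative assumptions rather than E3's actual wire format or policy (§5.4.4).

    import struct

    # Assumed 16B heartbeat layout: SmartNIC id (u32), traffic-manager queue
    # length (u32), and mean microservice execution time in nanoseconds (u64).
    HEARTBEAT_FMT = "<IIQ"            # struct.calcsize(HEARTBEAT_FMT) == 16
    QUEUE_THRESHOLD = 4096            # illustrative overload thresholds
    EXEC_THRESHOLD_NS = 50_000

    def parse_heartbeat(payload):
        nic_id, queue_len, exec_ns = struct.unpack(HEARTBEAT_FMT, payload)
        return nic_id, queue_len, exec_ns

    def should_migrate(queue_len, exec_ns):
        """Flag a SmartNIC as overloaded when its traffic-manager queue builds
        up or its microservices run long, triggering migration to the host."""
        return queue_len > QUEUE_THRESHOLD or exec_ns > EXEC_THRESHOLD_NS

    msg = struct.pack(HEARTBEAT_FMT, 7, 8192, 12_000)
    nic_id, qlen, exec_ns = parse_heartbeat(msg)
    print(nic_id, should_migrate(qlen, exec_ns))  # -> 7 True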

Figure 5.11: Orchestrator migration decision time (ms) as the number of servers grows from 0.1K to 50K, for 1K, 10K, 50K, and 100K deployed microservices.

Tail latencies. At scale, tail latencies dominate [134]. While SmartNICs can introduce high tail latency for some microservices (§5.3.3), E3 places these microservices on the host to ensure that application-level tail-latency inflation is minimized (§5.6.1). The tail-latency impact of SmartNIC offload is reduced at scale, as baseline communication latency increases with increasing inter-rack distance.

Energy-efficiency and power budgets. E3's energy efficiency benefits are constant regardless of deployment size and power budgets. At scale, there is additional energy cost for core and spine switches, but these are negligible compared to racks (ToRs and servers). Within a rack, ToR switch energy consumption stays constant, as all compared systems use the same number of switch ports. Our results show that E3 achieves up to 1.9× more throughput under the same power budget. Conversely, operators can save 48% of power while offering the same bandwidth.

5.7 Related Work

5.7.1 Architecture Studies for Microservices

Prior work has explored the architectural implications of running microservices [148, 136, 251]. It shows that wimpy servers hold potential for microservices under low loads, as they have less cache utilization and network virtualization overhead compared with traditional monolithic cloud workloads.

FAWN [107] explored using only low-power processor architectures for data-intensive workloads as an energy- and cost-effective alternative to server multiprocessors. However, FAWN assumed that I/O speeds are much lower than CPU speeds, so CPUs would be left idle for data-intensive applications. With the advent of fast network technologies, this assumption no longer holds and server CPUs are still required. Motivated by these studies, we focus on using SmartNIC-accelerated servers for energy efficiency.

5.7.2 Heterogeneous Scheduling

A set of schedulers for performance-asymmetric architectures have been proposed. For example, Kumar et al. [180, 181] use instructions per cycle to determine the relative speedup of each thread on different types of cores. HASS [237] introduces the architectural signature concept as a scheduling indicator, which contains information about memory-boundedness, available instruction-level parallelism, etc. CAMP [229] combines both efficiency and thread-level parallelism specialization and proposes a light-weight technique to discover which threads could use fast cores more efficiently. PTask [226] provides a data-flow programming model for programmers to manage computation for GPUs. It enables sharing GPUs among multiple processes, parallelizes multiple tasks, and eliminates unnecessary data movements. These approaches target long-running computations, mostly on cache-coherent architectures, rather than microsecond-scale, request-based workloads over compute nodes that do not share memory, and are hence not applicable.

5.7.3 Microservice Scheduling

Wisp [243] enforces end-to-end performance objectives by globally adapting rate limiters and request schedulers based on operator policies under varying system conditions. This work is not concerned with SmartNIC heterogeneity. UNO [185] is an NFV framework that can systematically place NFs across SmartNIC and host with a resource-aware algorithm on the control plane. E3 is a microservice platform and thus goes several steps further: (1) E3 uses a data-plane orchestrator to detect node load and migrates microservices if necessary; (2) HCM considers communication distance during placement.

With the advent of SmartNICs and programmable switches, researchers have identified the potential performance benefits of applying request processing across the communication path [120]. E3 is such a system designed for the programmable cloud and explores the energy efficiency benefits of running microservices.

5.7.4 Power Proportionality

Power-proportional systems can vary energy use with the presented workload [197]. For example, Prekas et al. propose an energy-proportional system management policy for memcached [222]. While E3 can provide energy proportionality, we are primarily interested in energy efficiency. Geoffrey et al. [123] propose a heterogeneous power-proportional system. By carefully selecting component ensembles, it can provide an energy-efficient solution for a particular task. However, due to the high cost of ensemble transitions, we believe that this architecture is not fit for high-bandwidth I/O systems. Rivoire et al. propose a more balanced system design (for example, a low-power mobile processor with numerous laptop disks connected via PCIe) and show that it achieves better energy efficiency for sorting large data volumes [225]. Pelican [112] presents a software storage stack on under-provisioned hardware targeted at cold storage workloads. Our proposal could be viewed as a balanced-energy approach for low-latency, query-intensive server applications, rather than cold, throughput-intensive ones.

5.8 Summary and Learned Lessons

This chapter presents E3, a microservice execution platform on SmartNIC-accelerated servers. E3 extends key system components (programming model, execution engine, communication subsystem, scheduling) of the Azure Service Fabric microservice platform to a SmartNIC. E3 demonstrates that SmartNIC offload can improve cluster energy-efficiency up to 3× and cost efficiency up to 1.9× at up to 4% latency cost for common microservices. A key lesson we learned from building E3 is that when using PNFs to design an energy-efficient system, one should schedule computations to the execution engine that offers the best application performance given the power budget.

There are three perspectives regarding this. First, PNF offloading might not be an optimal design decision even though a PNF usually encloses a power-efficient computing engine. For complicated request processing, such as SQL queries involving compute-heavy analytical operators, the host server would achieve the best requests per joule compared to any PNF hardware. Second, the performance metrics that drive the system design are throughput (i.e., requests per second) and the latency SLA/SLO. Under a given power budget, one should find a solution that offers the highest throughput without violating the SLA/SLO. Third, energy efficiency is also related to the amount of incoming traffic, which decides the request load. A system tuned for line-rate energy efficiency would not be optimal in the unloaded case.

Chapter 6: Exploring PNF-based Disaggregated Storage

Storage disaggregation has become a trend for future data centers due to its flexibility and cost efficiency benefits. More recently, the significant drop in NAND flash price as well as the increasing network speeds have motivated hardware vendors to integrate PNF into disaggregated storage infrastructures to improve performance and reduce costs. In this chapter, we perform a preliminary investigation into SmartNIC-based disaggregated storage. Specifically, we take an emerging commercial-off-the-shelf (COTS) platform, analyze its hardware architecture and software stacks, and characterize its performance. Based on our observations, we then discuss open problems regarding how to use such hardware to design storage systems. This chapter is organized as follows. Sections 6.1 and 6.2 present the background and solution overview. We then describe the SmartNIC-based disaggregated storage in detail in Section 6.3. Section 6.4 reports our characterization results. We discuss the related work in Section 6.5 and conclude this chapter by discussing open problems regarding building PNF-based disaggregated storage applications.

6.1 Background

Storage disaggregation has gained great interest recently since it allows independent scaling of compute/storage capacities and achieves high resource utilization [177, 18, 14, 9, 98]. As data center network speeds have transitioned to 40/100 Gbps and the NVMe-oF specification [78] (which enables flash-based SSDs to communicate over a network) is becoming widely adopted, a disaggregated NVMe SSD can provide microsecond-scale access latencies and millions of IOPS.

SmartNIC-based disaggregated storage solutions, such as the Mellanox BlueField-2 [68] and Broadcom Stingray PS1100R [41], have emerged and become increasingly popular because of their low deployment costs and competitive I/O performance compared with traditional server-based approaches. Such a storage node usually comprises a COTS high-bandwidth SmartNIC, some domain-specific accelerators (like RAID), a PCIe switch, and many NVMe SSDs, supported by a standalone power supply. Take the Broadcom Stingray solution as an example. Compared with a conventional disaggregated server node, the Stingray PS1100R storage box costs half as much and consumes at most 52.5W of active power, yet delivers 1.4/1.3 million IOPS along with 75.3/14.7µs unloaded latency for 4KB random reads and 4KB sequential writes, respectively.

In this chapter, we take such a SmartNIC-based solution as one kind of PNF-based disaggregated storage and perform systematic characterizations to understand its benefits and challenges. We then discuss the open problems of using it for system design.

6.2 Solution Overview

We use the Broadcom Stingray PS1100R platform [41] to understand SmartNIC-based disaggregated storage. We take the Intel server-based storage disaggregation solution as the baseline. Our characterization experiments use the fio [54] benchmark utility and mainly consider average/tail latency and throughput (in terms of IOPS) as the performance metrics. Since the SmartNIC disaggregated node encloses a wimpy computing unit, our goal is to understand its performance impact and implications for system design. Specifically, we aim to answer the following questions: (1) How much NIC computing capacity is used to serve both the networking and storage I/O traffic? (2) When the node is fully loaded, what is the tail latency for different types of storage I/O requests? (3) How many concurrent networking connections can a node handle without causing too much latency degradation? (4) What is the remaining processing headroom that one can add to each I/O request without any bandwidth loss? We believe that such observations will guide how we structure the system.
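As a sketch of how such a sweep can be scripted, the snippet below builds fio command lines for the unloaded-latency experiment (one outstanding I/O, varying block size). The device path is a placeholder, and our actual runs use the SPDK fio plugin against the SPDK NVMe-oF target rather than the generic setup shown here.

    import shlex

    def fio_cmd(filename, rw, block_size, iodepth=1, runtime_s=30):
        """Build an fio command line for one data point of the latency sweep."""
        args = [
            "fio",
            "--name=latency_sweep",
            f"--filename={filename}",
            f"--rw={rw}",               # e.g., randread or write (sequential)
            f"--bs={block_size}",
            f"--iodepth={iodepth}",     # 1 outstanding I/O for unloaded latency
            "--direct=1",
            "--time_based", f"--runtime={runtime_s}",
            "--output-format=json",     # parse avg/tail latency and IOPS later
        ]
        return " ".join(shlex.quote(a) for a in args)

    # Placeholder remote block device exposed by the NVMe-oF initiator.
    for bs in ["4k", "8k", "16k", "32k", "64k", "128k", "256k"]:
        print(fio_cmd("/dev/nvme1n1", "randread", bs))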

Figure 6.1: Architectural block diagram of the Stingray PS1100R SmartNIC-based disaggregated storage: ARM A72 CPUs with L2/L3 caches and DDR4 DRAM, accelerators, a 100Gbps QSFP28 port, a PCIe root complex and PCIe switch, and NVMe drives.

6.3 SmartNIC-based Disaggregated Storage

SmartNICs [67, 68, 69, 42, 43, 76, 27] have emerged in the data center recently, not only accelerating packet manipulations and virtual switching functionalities [146, 35], but also offloading generic distributed workloads [188, 247, 249, 141]. Generally, a SmartNIC comprises general-purpose computing substrates (such as ARM/MIPS processors or FPGAs), an array of domain-specific accelerators (e.g., crypto engine, reconfigurable match-action table, compression/hashing units), an onboard memory hierarchy (including self-managed SRAM and L2/DRAM), and a traffic manager (or embedded NIC switch) for packet steering. SmartNICs are low-profile PCIe devices, add incremental cost to existing data center infrastructure, and represent a promising way to expand warehouse computing capacity.

6.3.1 Hardware Architecture

Lately, hardware vendors have combined a SmartNIC with NVMe drives as a disaggregated computational storage node to replace traditional server solutions for cost efficiency. Figure 6.1 presents the Broadcom Stingray solution [41], which encloses a PS1100R SmartNIC, a PCIe switch (on a standalone carrier board), and some NVMe SSDs. The SmartNIC has 8 ARM A72 cores at 3.0GHz, 8GB DDR4-2400 DRAM (along with a 16MB cache), FlexSPARX [40] acceleration engines, a 100Gb NetXtreme Ethernet NIC, and PCIe Gen3 root complex controllers.

                   Idle Power (W)   Max Active Power (W)
Xeon-based         92.0             192.0
SmartNIC-based     45.5             52.5

Table 6.1: Power consumption comparison between Xeon-based and SmartNIC-based disaggregated storage boxes. Section 6.4.1 describes the hardware configurations. We use a Watts Up Pro meter [96] to measure the wall power.

The PCIe switch can be configured with either two ×8 or four ×4 NVMe SSDs. A Stingray disaggregated storage box costs around $3,228 (including four Samsung 960GB DCT983 SSDs), much cheaper than a Xeon-based one with similar I/O configurations. Unsurprisingly, it also consumes lower power than a Xeon disaggregated server node. As Table 6.1 shows, the power consumption of a Stingray PS1100R based storage node is 52.5W at most, nearly one-fourth of the Xeon one.

6.3.2 Software System Stack

NVMe-over-Fabrics [78] is the remote storage protocol used for SmartNIC-based disaggregated storage. It defines a common architecture that supports the NVMe block storage protocol over a range of storage network fabrics (e.g., Ethernet, RoCE, InfiniBand, Fibre Channel). NVMe-oF extends the NVMe base specification [77] and the controller interface. A storage client (i.e., NVMe-oF initiator) first attaches to a storage server (also known as the NVMe-oF target) and then issues NVMe commands to the remote controller. The NVMe-oF target comprises two main components: the NVMe target core and the RDMA transport. After setting up this connection with the initiator, it creates a one-to-one mapping between I/O submission queues and I/O completion queues. Each NVMe queue pair is pinned to specific CPU cores and aligned with dedicated RDMA send, receive, and completion queues. Such an end-to-end queueing design exposes the internal parallelism of NVMe SSDs to the NVMe-oF initiator.

The software stack on a SmartNIC disaggregated server node works similarly to the Xeon one, except that the NVMe-oF target runs on top of the SmartNIC's wimpy cores. The I/O request processing still includes three steps at the target. First, it receives an NVMe command (formatted as a capsule [78]) through a two-sided RDMA receive. The command capsule contains the NVMe submission queue entry and scatter-gather addresses or data (for NVMe writes). Second, the NVMe SSD controller firmware picks up commands from the submission queue, performs the execution, adds a completion notification to the completion queue, and updates the doorbell register to notify the target core. Finally, the target driver catches the completion signal, builds a response capsule (which includes the completion queue entry and data if the request is an NVMe read), and sends it back to the NVMe-oF initiator. NVMe-oF is able to provide fast access to remote NVMe devices at moderate CPU cost.
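The sketch below mirrors those three steps for a single request; the class and method names are ours and stand in for the RDMA transport and NVMe driver machinery of a real target such as SPDK's, so it should be read as a model of the control flow rather than target code.

    from dataclasses import dataclass

    @dataclass
    class Capsule:
        """NVMe-oF command capsule: a submission-queue entry plus, for writes,
        in-capsule data or a scatter-gather descriptor."""
        opcode: str      # "read" or "write"
        lba: int
        length: int
        data: bytes = b""

    class StubSSD:
        """Stand-in for the NVMe submission/completion queue pair and firmware."""
        def __init__(self):
            self.blocks = {}
        def execute(self, c):
            if c.opcode == "write":
                self.blocks[c.lba] = c.data
                return "ok", b""
            return "ok", self.blocks.get(c.lba, b"\x00" * c.length)

    class StubRDMA:
        """Stand-in for the RDMA send queue used to return response capsules."""
        def send_response(self, status, payload):
            print(status, len(payload))

    def handle_capsule(capsule, ssd, rdma):
        # Step 1: the capsule arrived via a two-sided RDMA receive (not shown).
        # Step 2: the SSD consumes the command from the submission queue,
        # executes it, and posts a completion-queue entry.
        status, data = ssd.execute(capsule)
        # Step 3: build the response capsule (completion entry plus data for
        # reads) and send it back to the NVMe-oF initiator.
        rdma.send_response(status, data if capsule.opcode == "read" else b"")

    ssd, rdma = StubSSD(), StubRDMA()
    handle_capsule(Capsule("write", lba=0, length=4096, data=b"x" * 4096), ssd, rdma)
    handle_capsule(Capsule("read", lba=0, length=4096), ssd, rdma)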

6.4 Performance Characterization and Implication

We characterize the storage I/O performance (in terms of latency and throughput) of the Stingray disaggregated node using the fio tool [54] and compare it with the server disaggregation case. We then summarize the implications of using SmartNIC-based disaggregated storage nodes.

6.4.1 Experiment Setup

Our experimental testbed comprises a local rack-scale RDMA-capable cluster with 7 Supermicro servers and 3 Stingray PS1100R storage nodes, connected to a 100Gbps Arista 7160-32CQ switch. Each server is a 2U box, enclosing Intel Xeon Gold 5218/E5-2620 v4 processors, 96GB/128GB DDR4 memory, and a 100Gbps dual-port Mellanox ConnectX-5 MCX516A-CCAT NIC via PCIe 3.0 ×16. Section 6.3.1 describes the Stingray setup, and we configure it to use four NVMe SSDs. To match the same I/O capabilities, the Intel server is also configured with four NVMe SSDs when used as a disaggregated storage node. We use Samsung 960GB DCT 983 NVMe SSDs for all experiments. We run CentOS 7.4 on Intel machines and Ubuntu 16.04 on Stingray nodes.

Figure 6.2: Read/write latency comparison between SmartNIC-based and server-based storage disaggregation as the I/O request size varies from 4KB to 256KB.

Figure 6.3: 4KB/128KB request latency breakdown at the NVMe-oF target (receive NVMe-oF request, execute NVMe command, send NVMe-oF reply). NR/NW = SmartNIC read/write.

Figure 6.4: Read/write throughput (KIOPS) as the number of cores increases on server and SmartNIC disaggregation.

Figure 6.5: 4KB read/write request latency as the number of concurrent network connections increases.

Our experiments use Intel SPDK [91], version 19.07 on the Intel servers and 18.04 on the Stingray nodes. We use the fio utility with the SPDK plugin and the SPDK NVMe-oF target for all characterization experiments.

6.4.2 Performance Results

SmartNIC-based storage disaggregation achieves comparable read/write unloaded latency to server disaggregation. In this experiment, we configure fio with one outstanding I/O and measure the average latency while increasing the request size (Figure 6.2). In terms of serving random reads, the SmartNIC solution adds 1.0% more latency on average across the five cases where the request size is no larger than 64KB.

The latency differences rise to 20.3% and 23.3% if the I/O block size is 128KB and 256KB, respectively. In the case of sequential writes, SmartNIC execution only adds 2.7µs compared with the server case on average across all cases. We further break down the request processing at the NVMe-oF target (based on the above description) and compare the server and SmartNIC cases. As Figure 6.3 shows, NVMe command execution is the most time-consuming part for both reads and writes; it includes writing into the submission queue, processing commands within the SSD, and handling signals from the completion queue. For a 4KB/128KB random read, it contributes 92.4%/86.1% and 88.8%/92.2% of the latency for the server and SmartNIC settings, respectively. The second most expensive execution phase is data transmission, i.e., recv NVMe-oF req in reads and send NVMe-oF reply in writes, especially when the I/O block size is large. For example, recv NVMe-oF req consumes 23.9µs and 24.0µs on the server and SmartNIC, respectively. SmartNIC-based disaggregated storage is able to saturate the storage bandwidth, but requires the use of more cores. This experiment measures the maximum 4KB random read and sequential write bandwidth as we increase the number of cores. For each fio configuration, we increase the I/O depth to maximize throughput. As depicted in Figure 6.4, server disaggregation achieves 1513 KIOPS and 1316 KIOPS using two cores, respectively. The SmartNIC case is able to serve similar read and write traffic with 3 ARM cores. Under a large request size (i.e., 128KB), one core is enough to achieve the maximum bandwidth.

6.4.3 Design Implications

SmartNICs are indeed wimpy computing devices. When using one to serve 100Gbps network traffic and a similar storage load, one would observe the following performance issues. SmartNICs can only handle a small number of network connections without significant performance loss. We measure the average latency of 4KB random reads and sequential writes as we increase the number of concurrent workers, where each worker uses its own dedicated RDMA QP. We also bound the outstanding requests of each storage stream to avoid queueing at the target side.

Figure 6.6: P99.9 latency versus bandwidth of 4KB random reads and sequential writes on the two disaggregated cases.

Figure 6.7: 4KB/128KB read/write bandwidth given a per-IO processing cost on the SmartNIC disaggregation.

When there are more than four concurrent network I/Os, because of fast polling performance on the Intel cores, server disaggregation reduces the read/write request latency by 40.2%/53.2% on average across these cases when compared to the SmartNIC one. Thus, one should consider coalescing requests into fewer network connections when using SmartNIC disaggregation. SmartNIC-based disaggregated storage achieves lower bandwidth and incurs higher tail latency with the same number of NVMe-oF target cores. This experiment measures the P99.9 latency as we increase the device load. Figure 6.6 presents the results. Compared with the SmartNIC case, server disaggregation achieves 2.1× and 1.2× higher throughput for 4KB random reads and sequential writes, respectively. Similarly, when offered the same load, the tail latency on the SmartNIC is higher. For example, the P99.9 latency is 34/66µs on the server/SmartNIC when serving ∼3000MB/s of 4KB sequential writes. This indicates that one requires a careful rate-limiting mechanism for QoS control when building applications on SmartNIC-based disaggregated storage. SmartNICs have little computing headroom for each I/O request to fully drive the storage read/write bandwidth. Figure 6.7 presents the achieved bandwidth of 4KB/128KB random reads and sequential writes as we add a per-I/O processing cost. We use all NIC cores in this case. The maximum tolerable added latency is 1µs and 5µs for 4KB read and write requests, respectively. However, if the request size is 128KB, one can add at most 5µs and 10µs of execution cost for reads and writes without bandwidth loss. Thus, we can offload only minimal computation for each storage I/O, and the amount of offloading depends on the storage traffic profile.
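A back-of-the-envelope way to see this headroom: with all cores busy, the per-core time budget for one request is roughly cores/IOPS, and whatever remains after the baseline NVMe-oF target processing is the room left for offloaded work. The sketch below applies this reasoning; the baseline per-request processing time is an assumed value for illustration.

    def per_io_headroom_us(num_cores, iops, baseline_us):
        """Approximate extra per-I/O compute budget (us) before throughput drops:
        per-core service budget (cores / IOPS) minus baseline target-side cost."""
        budget_us = num_cores / iops * 1e6
        return max(0.0, budget_us - baseline_us)

    # Illustrative only: 8 ARM cores serving ~1.4M 4KB random-read IOPS leave a
    # per-request budget of ~5.7us; assuming ~4.5us of baseline NVMe-oF work,
    # only about 1us of offloaded computation fits, in line with the small
    # headroom observed in Figure 6.7.
    print(round(per_io_headroom_us(8, 1.4e6, 4.5), 2))  # -> 1.21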

In sum, to use SmartNIC storage disaggregation in an efficient way, one should (1) keep the storage I/O data path clean and add minimal functionalities, and (2) apply stringent QoS control to avoid computation contention on the NIC cores.

6.5 Related Work

6.5.1 Remote Storage I/O Stack

Researchers [177] have characterized the performance of dedicated storage using iSCSI and proposed optimization opportunities, such as parallel TCP protocol processing, NIC TSO offloading, jumbo frames, and interrupt affinity pinning. ReFlex [178] provides a kernel-bypass data-plane that removes the NVMe storage processing stack, along with a credit-based QoS scheduler. In contrast, i10 [161] is an efficient in-kernel TCP/IP remote storage stack using dedicated end-to-end I/O paths and delayed doorbell notifications. Researchers have also characterized NVMe-oF performance under server-based disaggregated storage settings [155]. Our work also uses the NVMe-oF storage protocol, but in the context of the SmartNIC-based disaggregation case.

6.5.2 Disaggregated Storage Architecture

Researchers have also proposed new hardware designs to address the limitations of existing disaggregation designs. Sergey et al. [186] propose four kinds of in-rack storage disaggregation (i.e., complete, dynamic elastic, failure, and configuration disaggregation) and explore their trade-offs. Shoal [239] is a power-efficient and performant network fabric design for disaggregated racks built using fast circuit switches. LeapIO [190] provides a uniform address space across the host Intel x86 CPU and ARM-based co-processors in the runtime layer, and exposes a virtual NVMe driver to an unmodified guest VM. As a result, the ARM cores are able to run a complex cloud storage stack. We instead target emerging COTS SmartNIC disaggregated storage solutions.

6.5.3 Applications Built Using Disaggregated Storage

There are some studies exploring how to build applications efficiently when using storage disaggregation. Decibel [208] provides the dVol abstraction for fast storage access and flexible storage management. Its runtime applies a shared-nothing architecture and allows execution whenever the hardware resource is available. Hailstorm [115] addresses load skew and compaction interference in distributed LSM-based KV stores via data placement and compaction operation scheduling. Researchers [253] have also built an elastic query execution engine on disaggregated storage to achieve flexible resource scaling and multi-tenancy.

6.6 Open Problems

Based on the system architecture of PNF-based disaggregated storage and our characterizations, we discuss a few open problems regarding how to use it to build a high-performance and cost-efficient storage application.

Remote storage management. Storage disaggregation provides a pool of persistent remote blocks. A key question on the control-plane is how to perform remote storage management. First, one needs to organize these blocks in a way that is tailored to the workload requirements and simplifies metadata management. For example, the volume abstraction would be appropriate for virtual disks, while file blobs work better for an object store. Second, we need to figure out an efficient storage allocation mechanism that considers multiple factors, such as runtime access contention, network congestion, SSD access lifetime (or wear leveling), etc. Third, one also needs to design a naming scheme, which not only addresses node churn, but also accommodates the SSD's hierarchical organization (such as controller, namespace, and zoned namespace [100]).

Performance isolation. Disaggregated storage allows multiple client tenants to access it simultaneously. As a result, mitigating performance interference is a key issue for the data-plane. Along the request I/O path, there are many interference factors.

Examples include reads and writes co-executing in one SSD, concurrent storage streams sharing the NVMe-oF target core, and different tenants contending for the network. An efficient performance isolation mechanism will help improve remote storage utilization. Essentially, such a technique should (a) be able to understand the interference among different types of I/O requests (e.g., random read vs. sequential write) and (b) enforce an end-to-end isolation path, including the network delivery and I/O execution.

Flash failures. Flash-based SSD failures are common in the data center [209, 203]. When one happens, we should minimize its impact on running applications. With a replication protocol, one would provide f replicas so that the application has multiple choices upon reading. In terms of write requests, the protocol should guarantee the required consistency, but also take storage disaggregation into consideration. This means that it can schedule different phases of the replication protocol (e.g., commit) based on remote storage access contention. Further, the replication mechanism should ensure fast recovery by dynamically adjusting the replication granularity and choosing among various coding schemes.

Hardware diversity. In addition to SmartNIC-based solutions, there is also Ethernet-attached disaggregated storage comprising a programmable SAN switch and a micro-controller. It converts network requests into NVMe commands and forwards them to the storage device. These disaggregated solutions (server-based vs. SmartNIC-based vs. Ethernet-attached) present different computing capabilities at the remote backend side, which will lead to different system design trade-offs. Further, there is also a range of NVMe SSDs with different features and characteristics. For example, an open-channel SSD [82] allows applications to manage its physical storage resources directly, providing multiple design choices under a disaggregated setting. Therefore, to deploy an optimal disaggregation system infrastructure, a thorough understanding of the workload characteristics is necessary.

Chapter 7: Conclusions and Future Work

In this chapter, we conclude the dissertation by (1) summarizing our contributions, (2) discussing limitations of current solutions, and (3) proposing several directions for future work.

7.1 Contributions

The continuing increase of data center network bandwidth and the slower improvement in CPU performance have fundamentally challenged our conventional wisdom regarding how to build high-performant and energy-efficient distributed systems that can keep up with the network speeds. Traditional server-based computing would end up using a significant amount of host CPU cores just to handle a large amount of traffic and perform network protocol processing and application request pre-processing. As a result, there are fewer computing resources left for application processing. Emerging programmable network fabric offers a potential solution. It is a form of domain-specific acceleration architecture recently brought into the data center by computer architects to accelerate packet processing tasks. A PNF device can be a programmable switch in the network, a SmartNIC at the networking edge, or a network accelerator along the packet communication path. These devices usually enclose wimpy computing units (like a many-core processor, domain-specific ASICs or accelerators, or reconfigurable match-action tables) and diverse memory regions, which can process traffic at line rate and save state across packets. By offloading suitable computations to such a PNF device, one can reduce request serving latency, save end-host CPU cores, and enable efficient traffic control. The main contribution of this dissertation is to bridge the gap between increasing network bandwidths and stagnating CPU computing power by co-designing distributed systems with a programmable network fabric.

It develops principles and techniques for using in-network heterogeneous computing resources efficiently. Essentially, it applies approximation mechanisms, splits application logic among PNF and end-host software layers, employs new programming abstractions, and designs efficient control-/data-planes. To demonstrate the feasibility of such an approach, we build three PNF-enabled distributed systems using commodity PNF hardware and evaluate them in a real system testbed:

• IncBricks (Chapter 3) is an in-network caching fabric. It is a hardware-software co-designed system that supports caching in the network and provides basic computing primitives. Its hardware part combines a programmable switch (i.e., Cavium XPliant) and a network accelerator (i.e., OCTEON SoC) as a programmable middlebox. Its software part runs a distributed cache-coherent key-value store along with a topology-aware source routing protocol. IncBricks exposes programmer-friendly key-value APIs. When using it as a key-value store accelerator, our prototype lowers request latency by over 30% and doubles throughput for 1KB values. When conducting computations on cached values, IncBricks achieves 3 times more throughput and a third of the latency of client-side computation. IncBricks makes the first step towards generic in-network computations.

• iPipe (Chapter 4) is an actor-based framework for developing distributed applications on a SmartNIC. It is designed based on our characterization observations of a SmartNIC from four perspectives (i.e., traffic control, computing units, onboard DRAM, and host communication). The central piece of iPipe is a hybrid scheduler combining FCFS and DRR-based processor sharing that can tolerate tasks with variable execution costs and maximize NIC compute utilization. To enable flexible state migration across different execution engines, we provide the distributed memory object abstraction. Using iPipe, we build a real-time data analytics engine, a distributed transaction system, and a replicated key-value store. Based on our prototyped system with commodity SmartNICs, we show that when processing 10/25Gbps of application bandwidth, NIC-side offloading can save up to 3.1/2.2 beefy Intel cores, along with 23.0/28.0µs latency savings.

• E3 (Chapter 5) is a microservice execution platform on SmartNIC-accelerated servers.

By offloading suitable microservices to a SmartNIC's low-power computing units, one can improve server energy-efficiency without latency loss. E3 follows the design philosophies of the Azure Service Fabric microservice platform and extends key system components. It employs three key techniques: ECMP-based load balancing via SmartNICs to the host, network-topology-aware microservice placement, and a data-plane orchestrator that can detect SmartNIC overload. Our prototyped system using the Cavium LiquidIO shows that SmartNIC offload can improve cluster energy-efficiency up to 3× and cost efficiency up to 1.9× at up to 4% latency cost for common microservices, including real-time analytics, an IoT hub, and virtual network functions.

In addition to the above three systems, we also perform a preliminary investigation into PNF-based disaggregated storage (Chapter 6). Specifically, we conduct a detailed characterization study on a Broadcom Stingray PS1100R platform to understand its performance characteristics, computing capabilities when serving line-rate network/storage traffic, and computing headroom for offloading. Our gathered observations provide guidelines on how to build efficient applications on PNF-based disaggregated storage.

In sum, this dissertation demonstrates that co-designing distributed systems with programmable networks is an effective approach towards building future high-performant and energy-efficient data center computing systems. We believe that our experimental observations, proposed programming models, explorations of PNF-friendly data structures and algorithms, workload-dependent control-/data-planes, and real-system evaluations will provide insights on building future in-network solutions.

7.2 Limitations

This section discusses some limitations of this dissertation from different perspectives, such as applicability, experimental setup, evaluation, and system design.

Results applicability. There is a great diversity of programmable network hardware in the market. Hardware vendors are also building new products that integrate different hardware features and further reduce costs.

For example, there are already more than ten different SmartNICs on the market (shown in Table 2.1). Thus, our characterization and evaluation results will not be applicable to all types of hardware. For example, the Barefoot Tofino1 [37] switch has more stateful memory than the Cavium XPliant switch [7]. When using it to build the IncBricks count-min sketch table, it would improve the key-value cache hit accuracy at the accelerator and reduce latency. The LiquidIOII SmartNIC [43] used in iPipe and E3 is the second-generation device along the Cavium product line. We expect its performance characteristics (such as the number of cnMIPS NIC cores needed to saturate the bandwidth) and power consumption to differ considerably from other SmartNICs, or even from next generations of the same product. However, we do believe that our characterization methodology, analyses of hardware architecture, system design implications, and offloading trade-offs are broadly applicable.

Offloading benefits at scale. All the evaluations in this dissertation use a small-scale testbed, which encloses a single rack of servers with a few ToR switches. When performing cross-rack experiments (such as in IncBricks), we use VLANs to configure a two-layer FatTree topology. This limits our understanding of offloading benefits at scale. For example, under a large-scale deployment, IncBricks would see more performance degradation during cache invalidation. Also, E3 might find more energy-saving opportunities since there are more microservice placement choices. Further, a large deployment also complicates the control-plane design, such as node management, failure monitoring, and resource allocation. We leave such deployment explorations for future work.

Data-plane optimality. One of the key questions regarding PNF offloading is whether a system could maximize its computing efficiency. An ideal solution is very challenging to develop due to the dynamic networking profile, runtime workload characteristics, and non-deterministic execution behaviors. To address this, we decouple the policy from underlying mechanisms and use parameters to explore approximately optimal execution states. Our hybrid scheduler in iPipe, which combines FCFS and DRR-based processor sharing, follows this principle.

Also, the data-plane orchestrator in E3 uses thresholds to detect the SmartNIC loading factor. We believe that the proposed data-plane designs are sub-optimal. One can further improve them by adding fine-grained runtime monitoring and using the collected data to develop new queueing/scheduling algorithms. It is also possible to use detailed architectural simulations plus packet tracing to identify an optimal schedule for each phase and compare it with the proposed solution.

Simplified transport protocol. We use the UDP protocol for all three PNF-based systems. This simplifies our protocol processing on the PNF hardware so that we can explore more computing headroom for application execution. We make such design decisions because more and more data center applications use small RPC messages [213, 184] for communication, where UDP is able to satisfy workload requirements and consumes fewer CPU cycles. However, it brings two drawbacks: first, we need to add a reliability component in the application layer to ensure reliable message delivery; second, it is unable to support streaming applications. Based on our experience using different PNF hardware, we believe that supporting a reliable transport protocol (like TCP) on the general-purpose wimpy cores of a PNF device is rather demanding. Instead, one should integrate a built-in hardware-acceleration module and provide RDMA verbs (or active TCP messages) directly to applications.

Firmware-dependent implementation. We build the runtime system directly into the PNF firmware and assume it has abstracted the underlying hardware resources efficiently. For example, the SmartNIC firmware usually provides hardware threads, memory managers, signals/mutexes, and a set of support libraries. If such an assumption doesn't hold, the built-in runtime module will suffer from some inefficiencies. A typical example is the programmable DMA engine on a network accelerator or a SmartNIC. To check the completion of a DMA request, existing firmware usually relies on a general-purpose core to poll the completion word status, consuming substantial NIC core cycles. Another related issue is system portability. Since firmware is hardware-dependent, migrating a system across different PNF devices would require substantial effort.

7.3 Future Work

In this section, we identify several research directions for future work. We try to emphasize the main challenges and possible solutions.

7.3.1 Rethinking Distributed System Abstraction

Unlike host servers with full operating system support, the software stack of a PNF device usually exposes hardware execution threads with massive parallelism, self-managed memory space, and low-level computing primitives for accelerators. As a result, a run-to-completion execution model would work better than a pipelined model. A hashtable-based data structure can perform faster lookups than a tree-based one. Instead of enforcing concurrent accesses on a general-purpose core, one can use a hardware-based traffic control module to schedule different in-flight requests and avoid contention. These examples indicate that to enable better offloading, one should rethink distributed system abstractions. We have conducted an early investigation with a replicated state machine design, fRSM [196]. Its key idea is that instead of representing the whole shard as a single replicated state machine, it represents each key-value pair as a separate RSM. As a result, one can streamline the consensus algorithm with key-value processing. This enables better use of computing and storage parallelism.

7.3.2 Multi-tenancy

In today's emerging deployments and use cases, a PNF device will be used by multiple tenants sharing the network. The resulting issues of multi-tenancy and the needed PNF support have received little attention from the community. Multi-tenancy brings to the fore both (a) resource isolation, to offer tenants a minimal expected performance and to minimize cross-tenant interference, and (b) a variety of security-related issues, such as confidentiality and integrity of tenant computation, side-channel prevention, prevention of resource exhaustion attacks, etc. We focus on discussing performance isolation here and talk about the security part below.

To provide such support, one should understand both workload characteristics and hardware capabilities. For example, for a programmable switch, it is very challenging to time-multiplex a match-action table. Thus, one should mainly apply space partitioning on the control plane and build a packet steering module to direct the traffic. For network accelerators or SmartNICs, both time and space multiplexing would be possible. The main question will be how to architect a light-weight mechanism to enable sharing different resources (such as NIC cores, caches, interconnects, accelerators, or co-processors) among multiple application classes. We believe an efficient system will combine multiple techniques, such as flexible priorities, compute/traffic isolation, DRF-based resource allocation [150], etc.
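As one example for the last point, the sketch below runs a textbook dominant resource fairness (DRF) allocation loop over two hypothetical SmartNIC resources (NIC cores and accelerator units); it illustrates the policy cited as [150], not an E3 or vendor mechanism.

    def drf_allocate(total, demands):
        """Dominant resource fairness: repeatedly grant one task's worth of
        resources to the active tenant with the smallest dominant share.

        total:   {resource: capacity}
        demands: {tenant: {resource: per-task demand}}
        """
        used = {r: 0.0 for r in total}
        alloc = {t: {r: 0.0 for r in total} for t in demands}
        tasks = {t: 0 for t in demands}
        active = set(demands)
        while active:
            # Dominant share = max over resources of allocated/capacity.
            tenant = min(active,
                         key=lambda t: max(alloc[t][r] / total[r] for r in total))
            need = demands[tenant]
            if any(used[r] + need[r] > total[r] for r in total):
                active.remove(tenant)   # this tenant's next task no longer fits
                continue
            for r in total:
                used[r] += need[r]
                alloc[tenant][r] += need[r]
            tasks[tenant] += 1
        return tasks

    # Hypothetical capacities: 12 NIC cores and 4 accelerator units.
    total = {"nic_cores": 12.0, "accel": 4.0}
    demands = {"tenantA": {"nic_cores": 1.0, "accel": 0.5},   # accelerator-heavy
               "tenantB": {"nic_cores": 3.0, "accel": 0.1}}   # core-heavy
    print(drf_allocate(total, demands))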

7.3.3 A Unified Programming Model

Our three PNF-enabled systems apply different programming abstractions to satisfy different workload requirements. Specifically, IncBricks exposes key-value data store APIs that fit NoSQL-based workloads. iPipe instead uses the actor-based programming model, which targets communication-intensive, message-driven distributed applications. In E3, given that the microservice execution graph is predetermined, we use a data-flow programming model. Such ad-hoc design helps improve the offloading efficiency, but limits the system applicability. When switching to a new PNF device, one has to start from scratch and go through tedious effort. Thus, it will be useful to build a unified programming model that satisfies all cases. This is a non-trivial task since programming models are tied to the underlying computing resources. The hardware architecture of various PNF devices is different. For example, programmable switches have match-action tables, SRAM/TCAM, and a traffic manager for monitoring flow statistics. SmartNICs have wimpy many/multi-cores, programmable DMA engines, and domain accelerators. Network accelerators are computation-specific but do require consideration of how to feed data and catch completion signals. One possible solution is to start with a mature programming model and see if one can extend it to different hardware by building a compiler backend. For example, today's P4 language [116] provides a packet-level programming abstraction, which aims at per-packet manipulations.

One might extend it with a flow-level abstraction to broaden the application domain and then build various architecture-dependent backends.

7.3.4 In-network Security

One of the major concerns preventing cloud providers from deploying programmable networks is that in-network devices can access raw packets and hold sensitive data along the data-plane. As a result, enforcing in-network security attracts great attention. In order to do this, one has to consider three issues. The first one is the security-preserving granularity. It could be at the application-session level, the transport-flow level, or even between two end-host tunnels. Different granularity levels dictate how much state is to be maintained. Second, one should consider which networking layer should enforce the security mechanism. If doing it in the IP layer for individual packets, this is nearly stateless computation. If doing it in the transport and application layers, one has to pay the cost of rebuilding messages within the network. Given the limited computing and memory capabilities, this would add considerable overhead. Finally, due to hardware limitations, one cannot build a homogeneous security mechanism across the communication path. As SmartNICs provide more computing power than a programmable switch, one can apply a more complicated security protocol design. We believe that a robust in-network security solution will decouple a monolithic mechanism into multiple small pieces and apply them to different entities along the data-plane.

7.3.5 Traffic Control

Today's data center network has a mix of big and small flows. It is very challenging to achieve low latency under high bandwidth utilization. We can also use programmable networks to enable better traffic control. With such in-network visibility, one can design more efficient packet scheduling, congestion control, and flow control mechanisms. For example, we have leveraged configurable per-packet processing and the ability to maintain mutable state on a programmable switch to provide an approximate fair queueing abstraction [235].

Similarly, we also show that it is possible to expose a programmable calendar queue, using either data-plane primitives or control-plane commands to dynamically modify the scheduling status of queues [236]. Another example is to revisit the credit-based flow control protocol [182], which was originally proposed for ATM networks. It has been widely used in InfiniBand networks for high-performance computing clusters. With data-plane programmability across the packet communication path, one can build better credit allocation, credit exchange, and credit overcommit schemes.

7.3.6 In-network Data Persistence

Commodity PNF devices enclose no persistent storage. When failures happen, all in-network intermediate computing state will be lost without logging, and applications have to reissue their commands. This adds significant overhead, especially for streaming workloads. One possible solution is to apply a typical incremental computation technique, such as delta checkpointing. To realize such a solution, there are a few issues to address, such as where persistent storage is located (host server or Ethernet-attached SSD), what system abstractions to use (files or a key-value store), and how to manage such an ephemeral persistent data store. Driven by today's non-volatile memory techniques [24], it is also possible to equip the PNF device with some NVM directly. The Mellanox BlueField-2 [68] will provide such support. We believe such hardware features will enable better incremental in-network computation.

7.4 Final Remarks

In this dissertation, we have presented the first solutions that co-design distributed systems with the emerging programmable network fabric, including programmable switches, SmartNICs, and network accelerators. To demonstrate the feasibility and benefits of our solutions, we have designed and implemented three PNF-enabled systems: an in-network caching fabric (IncBricks), a SmartNIC offloading framework for distributed applications (iPipe), and a microservice execution platform on SmartNIC-accelerated servers (E3). We also performed a characterization study of a PNF-based disaggregated storage platform and discussed its potential benefits and challenges.

Given the continuing increase of data center network bandwidths and the slower improvement in CPU performance, we believe that co-designing distributed systems with programmable networks is an effective approach to building future high-performant and energy-efficient data center computing systems.

Bibliography

[1] ECMP routing protocol. https://en.wikipedia.org/wiki/Equal-cost_multi-path_ routing.

[2] The New Need for Speed in the Datacenter Network. http://www.cisco.com/ c/dam/en/us/products/collateral/switches/nexus-9000-series-switches/ white-paper-c11-734328.pdf, 2015.

[3] Cisco Global Cloud Index: Forecast and Methodology, 2015-2020. http: //www.cisco.com/c/dam/en/us/solutions/collateral/service-provider/ global-cloud-index-gci/white-paper-c11-738085.pdf, 2016.

[4] Google SyntaxNet. https://research.googleblog.com/2016/05/ announcing-syntaxnet-worlds-most.html, 2016.

[5] Microsoft Azure Machine Learning. https://azure.microsoft.com/en-us/services/ machine-learning/, 2016.

[6] Trends in Server Efficiency and Power Usage in Data Centers. https://www.spec.org/ events/beijing2016/slides/015-Trends_in_Server_Efficiency_and_Power_ Usage_in_Data_Centers%20-%20Sanjay%20Sharma.pdf, 2016.

[7] XPliant Ethernet Switch Product Family. http://www.cavium.com/ XPliant-Ethernet-Switch-Product-Family.html, 2016.

[8] The Apache Cassandra Database. http://cassandra.apache.org, 2017.

[9] Disaggregated or Hyperconverged, What Storage will Win the Enterprise? https://www.nextplatform.com/2017/12/04/ disaggregated-hyperconverged-storage-will-win-enterprise/, 2017.

[10] Hardware-Accelerated Networks at Scale in the Cloud. https://conferences.sigcomm. org/sigcomm/2017/files/program-kbnets/keynote-2.pdf, 2017.

[11] IEEE Std 802.3bs-2017 Standard. https://standards.ieee.org/standard/802_ 3bs-2017.html, 2017.

[12] LevelDB Key-Value Store. http://leveldb.org, 2017.

[13] Amazon Lambda Serverless Computing Platform. https://aws.amazon.com/lambda/, 2018.

[14] Free Your Flash And Disaggregate. https://www.lightbitslabs.com/blog/ free-your-flash-and-disaggregate/, 2018.

[15] Google App Engine. https://cloud.google.com/appengine/, 2018.

[16] Huawei IN550 SmartNIC. https://e.huawei.com/us/news/it/201810171443, 2018.

[17] In-memory SQL database. https://www.memsql.com, 2018.

[18] Industry Outlook: NVMe and NVMe-oF For Storage. https://www.iol.unh.edu/news/ 2018/02/08/industry-outlook-nvme-and-nvme-storage, 2018.

[19] NIC Teaming. https://docs.microsoft.com/en-us/windows-server/networking/ technologies/nic-teaming/nic-teaming, 2018.

[20] Nirmata Platform. https://www.nirmata.com/, 2018.

[21] 400GbE Takes Ethernet in a New Era. https://www.accton.com/Technology-Brief/ the-new-world-of-400-gbps-ethernet/, 2019.

[22] Marvell 88SN2400. https://www.marvell.com/documents/52e9puzva8ze3hl2zjc0/, 2019.

[23] Private conversation with SmartNIC vendors. Unpublished, 2019.

[24] 3D XPoint. https://en.wikipedia.org/wiki/3D_XPoint, 2020.

[25] Actor model. https://en.wikipedia.org/wiki/Actor_model, 2020.

[26] Aho-Corasick Algorithm. https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_ algorithm, 2020.

[27] Alpha Data ADM-PCIE-9V3 SmartNIC. https://www.alpha-data.com/dcp/ products.php?product=adm-pcie-9v3, 2020.

[28] Amazon. https://www.amazon.com, 2020.

[29] Amazon Data Pipeline. https://aws.amazon.com/datapipeline/, 2020.

[30] AMD HSA. https://www.amd.com/en-us/innovations/software-technologies/ hsa, 2020.

[31] Apache Kafka: A Distributed Streaming Platform. https://kafka.apache.org, 2020.

[32] Api gateway. http://microservices.io/patterns/apigateway.html, 2020.

[33] Arista 7150 Series Datasheet. https://www.arista.com/assets/data/pdf/ Datasheets/7150S_Datasheet.pdf, 2020.

[34] Arm Cortex-A72 Multi-core Processor. https://developer.arm.com/products/ processors/cortex-a/cortex-a72, 2020.

[35] AWS Nitro System. https://aws.amazon.com/ec2/nitro/, 2020.

[36] Azure IoT hub. https://azure.microsoft.com/en-us/services/iot-hub/, 2020.

[37] Barefoot Tofino Switch. https://www.barefootnetworks.com/products/ brief-tofino/, 2020.

[38] Barefoot Tofino2 Switch. https://www.barefootnetworks.com/products/ brief-tofino-2/, 2020.

[39] Border Gateway Protocol. https://en.wikipedia.org/wiki/Border_Gateway_ Protocol, 2020.

[40] Broadcom FlexSPARX acceleration subsystem. https://docs.broadcom.com/doc/ 1211168571391, 2020.

[41] Broadcom Stingray PS1100R SmartNIC. https://www.broadcom.com/products/ storage/ethernet-storage-adapters-ics/ps1100r, 2020.

[42] Broadcom Stingray SmartNIC. https://www.broadcom.com/products/ ethernet-connectivity/smartnic/ps225, 2020.

[43] Cavium LiquidIO SmartNICs. https://cavium.com/pdfFiles/LiquidIO_II_CN78XX_ Product_Brief-Rev1.pdf, 2020.

[44] Cavium OCTEON Multi-core Processor. http://www.cavium.com/octeon-mips64. html, 2020.

[45] cnMIPS Microarchitecture. https://en.wikichip.org/wiki/cavium/ microarchitectures/cnmips, 2020.

[46] Collaborative filtering. https://en.wikipedia.org/wiki/Collaborative_filtering, 2020.

[47] Command-line interface. https://en.wikipedia.org/wiki/Command-line_ interface, 2020.

[48] Dataflow programming model. https://en.wikipedia.org/wiki/Dataflow, 2020.

[49] DynamIQ. https://developer.arm.com/technologies/dynamiq, 2020.

[50] eBay. https://www.ebay.com, 2020.

[51] Ema. https://en.wikipedia.org/wiki/Moving_average, 2020.

[52] Enhanced Interior Gateway Routing Protocol. https://en.wikipedia.org/wiki/ Enhanced_Interior_Gateway_Routing_Protocol, 2020.

[53] Facebook. https://www.facebook.com, 2020.

[54] Flexible I/O Tester. https://github.com/axboe/fio, 2020.

[55] Google Cloud Dataflow. https://cloud.google.com/dataflow/, 2020.

[56] Google Docs. https://www.google.com/docs/about/, 2020.

[57] Honeywell Case Study. https://blogs.msdn.microsoft.com/azureservicefabric/ 2018/03/20/service-fabric-customer-profile-honeywell/, 2020.

[58] HPE ProLiant m800 Server Cartridge. https://support.hpe.com/hpsc/doc/public/ display?docId=emr_na-c04500667&sp4ts.oid=6532018, 2020.

[59] Intel DPDK. http://dpdk.org, 2020.

[60] Intel FlexPipe. https://www.intel.com/content/dam/www/public/us/en/ documents/product-briefs/ethernet-switch-fm6000-series-brief.pdf, 2020.

[61] Intel Flow Director. https://software.intel.com/en-us/articles/ setting-up-intel-ethernet-flow-director, 2020.

[62] Intel Xeon Phi Coprocessor 7120A. https://ark.intel.com/products/80555/ Intel-Xeon-Phi-Coprocessor-7120A-16GB-1238-GHz-61-core, 2020.

[63] Intermediate System to Intermediate System Protocol. https://en.wikipedia.org/ wiki/IS-IS, 2020.

[64] k-nearest neighbors algorithm. https://en.wikipedia.org/wiki/K-nearest_ neighbors_algorithm, 2020.

[65] LinkedIn. https://www.linkedin.com, 2020.

[66] Lyft. https://www.lyft.com, 2020.

[67] Mellanox BlueField-1 SmartNIC. http://www.mellanox.com/related-docs/ npu-multicore-processors/PB_BlueField_Ref_Platform.pdf, 2020.

[68] Mellanox BlueField-2 SmartNIC. https://www.mellanox.com/files/doc-2020/ pb-bluefield-2-smart-nic-eth.pdf, 2020.

[69] Mellanox Innova-2 Flex Open Programmable SmartNIC. https://www.mellanox.com/ sites/default/files/doc-2020/pb-innova-2-flex.pdf, 2020.

[70] Mesh Systems. http://www.mesh-systems.com, 2020.

[71] Microprocessor Trend Data. https://github.com/karlrupp/ microprocessor-trend-data, 2020.

[72] Microsoft Data Factory. https://azure.microsoft.com/en-us/services/ data-factory/, 2020.

[73] Microsoft Office 365. https://www.office.com, 2020.

[74] MTP/MPO Cabling System: A Panacea for Data Centers. https://community.fs.com/ blog/things-you-should-know-about-mpomtp-fiber-cable.html, 2020.

[75] Netflix. https://www.netflix.com, 2020.

[76] Netronome Agilio SmartNIC. https://www.netronome.com/products/agilio-cx/, 2020.

[77] NVM Express Base Specification. https://nvmexpress.org/developers/ nvme-specification/, 2020.

[78] NVM Express over Fabrics Specification. https://nvmexpress.org/developers/ nvme-of-specification/, 2020.

[79] OCTEON Development Kits. http://www.cavium.com/octeon_software_develop_ kit.html, 2020.

[80] Octeon Development Kits. http://www.cavium.com/octeon_software_develop_kit. html, 2020.

[81] Okapi BM25. https://en.wikipedia.org/wiki/Okapi_BM25, 2020.

[82] Open-channel SSD. https://en.wikipedia.org/wiki/Open-channel_SSD, 2020.

[83] Open Shortest Path First. https://en.wikipedia.org/wiki/Open_Shortest_Path_ First, 2020.

[84] OpenConfig protocol. https://www.openconfig.net/, 2020.

[85] Organizationally unique identifier. https://en.wikipedia.org/wiki/ Organizationally_unique_identifier, 2020.

[86] Programming Model in the Azure Service Fabric. https://docs.microsoft.com/ en-us/azure/service-fabric/service-fabric-choose-framework, 2020.

[87] Receiver Side Scaling. https://docs.microsoft.com/en-us/windows-hardware/ drivers/network/introduction-to-receive-side-scaling, 2020.

[88] Representational State Transfer Architecture. https://en.wikipedia.org/wiki/ Representational_state_transfer, 2020.

[89] SNMP protocol. https://en.wikipedia.org/wiki/Simple_Network_Management_ Protocol, 2020.

[90] Spotify. https://www.spotify.com/us/, 2020.

[91] The Intel Storage Performance Development Kit (SPDK). https://spdk.io/, 2020.

[92] TikTok. https://www.tiktok.com/en/, 2020.

[93] Twitter. https://twitter.com/?lang=en, 2020.

[94] Uber. https://www.uber.com, 2020.

[95] Uthash Hashtable. https://troydhanson.github.io/uthash/, 2020.

[96] Watts up? PRO. http://www.idlboise.com/sites/default/files/WattsUp_Pro_ ES.pdf, 2020.

[97] WebGUI interface. https://en.wikipedia.org/wiki/WebGUI, 2020.

[98] What is Composable Disaggregated Infrastructure? https://blog.westerndigital. com/what-is-composable-disaggregated-infrastructure/, 2020.

[99] Zipf’s law. https://en.wikipedia.org/wiki/Zipf%27s_law, 2020.

[100] Zoned Namespaces (ZNS) SSDs. https://zonedstorage.io/introduction/zns/, 2020.

[101] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vi- jay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. Tensorflow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016.

[102] Atul Adya, Daniel Myers, Jon Howell, Jeremy Elson, Colin Meek, Vishesh Khemani, Ste- fan Fulger, Pan Gu, Lakshminath Bhuvanagiri, Jason Hunter, Roberto Peon, Larry Kai, Alexander Shraer, Arif Merchant, and Kfir Lev-Ari. Slicer: Auto-sharding for datacenter applications. In 12th USENIX Symposium on Operating Systems Design and Implementa- tion (OSDI 16), 2016.

[103] Gul Agha. Actors: A Model of Concurrent Computation in Distributed Systems. 1986.

[104] Mohammad Alizadeh, Albert Greenberg, David A. Maltz, Jitendra Padhye, Parveen Patel, Balaji Prabhakar, Sudipta Sengupta, and Murari Sridharan. Data center TCP (DCTCP). In Proceedings of the ACM SIGCOMM 2010 Conference, 2010.

[105] Mohammad Alizadeh, Shuang Yang, Milad Sharif, Sachin Katti, Nick McKeown, Balaji Prabhakar, and Scott Shenker. pfabric: Minimal near-optimal datacenter transport. In Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM, 2013.

[106] Venkat Anantharam. Scheduling strategies and long-range dependence. Queueing systems, 33(1-3):73–89, 1999.

[107] David G. Andersen, Jason Franklin, Michael Kaminsky, Amar Phanishayee, Lawrence Tan, and Vijay Vasudevan. FAWN: A Fast Array of Wimpy Nodes. In Proceedings of the ACM SIGOPS 22Nd Symposium on Operating Systems Principles, 2009.

[108] Mina Tahmasbi Arashloo, Monia Ghobadi, Jennifer Rexford, and David Walker. Hotcocoa: Hardware congestion control abstractions. In Proceedings of the 16th ACM Workshop on Hot Topics in Networks, 2017.

[109] Michael Armbrust, Reynold S. Xin, Cheng Lian, Yin Huai, Davies Liu, Joseph K. Bradley, Xiangrui Meng, Tomer Kaftan, Michael J. Franklin, Ali Ghodsi, and Matei Zaharia. Spark sql: Relational data processing in spark. In Proceedings of the 2015 ACM SIGMOD Inter- national Conference on Management of Data, 2015.

[110] Berk Atikoglu, Yuehai Xu, Eitan Frachtenberg, Song Jiang, and Mike Paleczny. Work- load analysis of a large-scale key-value store. In Proceedings of the 12th ACM SIGMET- RICS/PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, 2012.

[111] Jason Baker, Chris Bond, James C. Corbett, JJ Furman, Andrey Khorlin, James Larson, Jean-Michel Leon, Yawei Li, Alexander Lloyd, and Vadim Yushprakh. Megastore: Provid- ing scalable, highly available storage for interactive services. In Proceedings of the Confer- ence on Innovative Data system Research (CIDR), 2011.

[112] Shobana Balakrishnan, Richard Black, Austin Donnelly, Paul England, Adam Glass, Dave Harper, Sergey Legtchenko, Aaron Ogus, Eric Peterson, and Antony Rowstron. Pelican: A Building Block for Exascale Cold Data Storage. In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation, 2014.

[113] Luiz André Barroso, Urs Hölzle, and Parthasarathy Ranganathan. The Datacenter as a Com- puter: Designing Warehouse-Scale Machines. Synthesis Lectures on Computer Architecture, 2018.

[114] Doug Beaver, Sanjeev Kumar, Harry C. Li, Jason Sobel, and Peter Vajgel. Finding a needle in haystack: Facebook’s photo storage. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, 2010.

[115] Laurent Bindschaedler, Ashvin Goel, and Willy Zwaenepoel. Hailstorm: Disaggregated Compute and Storage for Distributed LSM-Based Databases. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, 2020.

[116] Pat Bosshart, Dan Daly, Glen Gibb, Martin Izzard, Nick McKeown, Jennifer Rexford, Cole Schlesinger, Dan Talayco, Amin Vahdat, George Varghese, and David Walker. P4: Pro- gramming protocol-independent packet processors. SIGCOMM Comput. Commun. Rev., 44(3):87–95, July 2014.

[117] Pat Bosshart, Glen Gibb, Hun-Seok Kim, George Varghese, Nick McKeown, Martin Izzard, Fernando Mujica, and Mark Horowitz. Forwarding metamorphosis: Fast programmable match-action processing in hardware for SDN. In Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM, 2013.

[118] Broadcom. The TruFlow Flow processing engine. https://www.broadcom.com/ applications/data-center/cloud-scale-networking, 2020.

[119] Brad Calder, Ju Wang, Aaron Ogus, Niranjan Nilakantan, Arild Skjolsvold, Sam McKelvie, Yikang Xu, Shashwat Srivastav, Jiesheng Wu, Huseyin Simitci, et al. Windows azure stor- age: a highly available cloud storage service with strong consistency. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, 2011.

[120] Adrian Caulfield, Paolo Costa, and Monia Ghobadi. Beyond SmartNICs: Towards a fully programmable cloud. In IEEE International Conference on High Performance Switching and Routing, ser. HPSR, volume 18, 2018.

[121] Adrian M Caulfield, Eric S Chung, Andrew Putnam, Hari Angepat, Jeremy Fowers, Michael Haselman, Stephen Heil, Matt Humphrey, Puneet Kaur, Joo-Young Kim, et al. A cloud-scale acceleration architecture. In Microarchitecture (MICRO), 2016 49th Annual IEEE/ACM International Symposium on, 2016.

[122] Luis Ceze, Mark D Hill, and Thomas F Wenisch. Arch2030: A vision of computer archi- tecture research over the next 15 years. arXiv preprint arXiv:1612.03182, 2016.

[123] Geoffrey Challen and Mark Hempstead. The Case for Power-agile Computing. In Proceed- ings of the 13th USENIX Conference on Hot Topics in Operating Systems, 2011.

[124] Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. Bigtable: A distributed storage system for structured data. In 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI 06), 2006.

[125] Yanzhe Chen, Xingda Wei, Jiaxin Shi, Rong Chen, and Haibo Chen. Fast and general distributed transactions using rdma and htm. In Proceedings of the Eleventh European Conference on Computer Systems, 2016.

[126] Trishul Chilimbi, Yutaka Suzue, Johnson Apacible, and Karthik Kalyanaraman. Project adam: Building an efficient and scalable deep learning training system. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), 2014.

[127] Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hansen, Eric Jul, Christian Limpach, Ian Pratt, and Andrew Warfield. Live migration of virtual machines. In Proceed- ings of the 2nd Conference on Symposium on Networked Systems Design & Implementation- Volume 2, 2005.

[128] Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. Benchmarking cloud serving systems with YCSB. In Proceedings of the 1st ACM Sympo- sium on Cloud Computing, 2010.

[129] James C Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, Jeffrey John Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, et al. Spanner: Google’s globally distributed database. ACM Transactions on Computer Systems (TOCS), 31(3):1–22, 2013.

[130] P Costa, A Donnelly, G O’shea, and A Rowstron. CamCube: a key-based data center. Microsoft Research, 2010.

[131] Paolo Costa, Austin Donnelly, Antony Rowstron, and Greg O’Shea. Camdoop: Exploit- ing in-network aggregation for big data applications. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, 2012.

[132] F Cristian et al. Coordinator log transaction execution protocol, 1990.

[133] Benoit Dageville, Thierry Cruanes, Marcin Zukowski, Vadim Antonov, Artin Avanes, Jon Bock, Jonathan Claybaugh, Daniel Engovatov, Martin Hentschel, Jiansheng Huang, Alli- son W. Lee, Ashish Motivala, Abdul Q. Munir, Steven Pelley, Peter Povinec, Greg Rahn, Spyridon Triantafyllis, and Philipp Unterbrunner. The snowflake elastic data warehouse. In Proceedings of the 2016 International Conference on Management of Data, 2016.

[134] Jeffrey Dean and Luiz André Barroso. The Tail at Scale. Commun. ACM, 56(2):74–80, 2013.

[135] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clus- ters. Commun. ACM, 51(1), January 2008.

[136] Christina Delimitrou. The Hardware-Software Implications of Microservices and How Big Data Can Help. 2018.

[137] Docker. Docker Container Migration. https://docs.docker.com/docker-cloud/ migration/, 2018.

[138] Aleksandar Dragojević, Dushyanth Narayanan, Orion Hodson, and Miguel Castro. FaRM: Fast remote memory. In Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, 2014.

[139] Aleksandar Dragojević, Dushyanth Narayanan, Edmund B Nightingale, Matthew Renzelmann, Alex Shamis, Anirudh Badam, and Miguel Castro. No compromises: distributed transactions with consistency, availability, and performance. In Proceedings of the 25th symposium on operating systems principles, 2015.

[140] Daniel E. Eisenbud, Cheng Yi, Carlo Contavalli, Cody Smith, Roman Kononov, Eric Mann- Hielscher, Ardas Cilingiroglu, Bin Cheyney, Wentao Shang, and Jinnah Dylan Hosein. Ma- glev: A fast and reliable software network load balancer. In 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16), 2016.

[141] Haggai Eran, Lior Zeno, Maroun Tork, Gabi Malka, and Mark Silberstein. NICA: An Infrastructure for Inline Acceleration of Network Applications. In 2019 USENIX Annual Technical Conference (USENIX ATC 19), 2019.

[142] A. Shehabi et al. United States Data Center Energy Usage Report. Technical report, Lawrence Berkeley National Laboratory, 2016.

[143] Dolphin Express. Remote Peer to Peer made easy. https://www.dolphinics.com/ download/WHITEPAPERS/Dolphin_Express_IX_Peer_to_Peer_whitepaper.pdf, 2020.

[144] Kevin Fall, Gianluca Iannaccone, Maziar Manesh, Sylvia Ratnasamy, Katerina Argyraki, Mihai Dobrescu, and Norbert Egi. Routebricks: Enabling general purpose network infras- tructure. SIGOPS Oper. Syst. Rev., 45(1), February 2011.

[145] Bin Fan, David G. Andersen, and Michael Kaminsky. MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing. In 10th USENIX Symposium on Networked Systems Design and Implementation, 2013.

[146] Daniel Firestone, Andrew Putnam, Sambhrama Mundkur, Derek Chiou, Alireza Dabagh, Mike Andrewartha, Hari Angepat, Vivek Bhanu, Adrian Caulfield, Eric Chung, Harish Kumar Chandrappa, Somesh Chaturmohta, Matt Humphrey, Jack Lavier, Norman Lam, Fengfen Liu, Kalin Ovtcharov, Jitu Padhye, Gautham Popuri, Shachar Raindel, Tejas Sapre, Mark Shaw, Gabriel Silva, Madhan Sivakumar, Nisheeth Srivastava, Anshuman Verma, Qasim Zuhair, Deepak Bansal, Doug Burger, Kushagra Vaid, David A. Maltz, and Albert Greenberg. Azure Accelerated Networking: SmartNICs in the Public Cloud. In 15th USENIX Symposium on Networked Systems Design and Implementation, 2018.

[147] Brad Fitzpatrick. Distributed caching with memcached. Linux J., 2004(124), August 2004.

[148] Y. Gan and C. Delimitrou. The Architectural Implications of Cloud Microservices. IEEE Computer Architecture Letters, 17(2):155–158, 2018.

[149] Yu Gan, Yanqi Zhang, Dailun Cheng, Ankitha Shetty, Priyal Rathi, Nayan Katarki, Ariana Bruno, Justin Hu, Brian Ritchken, Brendon Jackson, Kelvin Hu, Meghna Pancholi, Yuan He, Brett Clancy, Chris Colen, Fukang Wen, Catherine Leung, Siyuan Wang, Leon Zaruvinsky, Mateo Espinosa, Rick Lin, Zhongling Liu, Jake Padilla, and Christina Delimitrou. An Open- Source Benchmark Suite for Microservices and Their Hardware-Software Implications for Cloud & Edge Systems. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019.

[150] Ali Ghodsi, Matei Zaharia, Benjamin Hindman, Andy Konwinski, Scott Shenker, and Ion Stoica. Dominant resource fairness: Fair allocation of multiple resource types. In Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation, 2011.

[151] Alex Goldhammer and John Ayer Jr. Understanding performance of pci express systems. Xilinx WP350, Sept, 4, 2008.

[152] Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel, and Sudipta Sengupta. VL2: A scalable and flexible data center network. In Proceedings of the ACM SIGCOMM 2009 Conference on Data Communication, 2009.

[153] Site Selection Group. Power in the Data Center and its Cost Across the U.S. https://info.siteselectiongroup.com/blog/ power-in-the-data-center-and-its-costs-across-the-united-states, 2017.

[154] Pankaj Gupta, Steven Lin, and Nick McKeown. Routing lookups in hardware at memory ac- cess speeds. In INFOCOM’98. Seventeenth Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings. IEEE, 1998.

[155] Zvika Guz, Harry (Huan) Li, Anahita Shayesteh, and Vijay Balakrishnan. NVMe-over-Fabrics Performance Characterization and the Path to Low-Overhead Flash Disaggregation. In Proceedings of the 10th ACM International Systems and Storage Conference, 2017.

[156] Sangjin Han, Keon Jang, KyoungSoo Park, and Sue Moon. PacketShader: A GPU- accelerated software router. In Proceedings of the ACM SIGCOMM 2010 Conference, 2010.

[157] Md E. Haque, Yuxiong He, Sameh Elnikety, Thu D. Nguyen, Ricardo Bianchini, and Kathryn S. McKinley. Exploiting Heterogeneity for Tail Latency and Energy Efficiency. In Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitec- ture, 2017.

[158] Timothy L Harris. A pragmatic implementation of non-blocking linked-lists. In Interna- tional Symposium on Distributed Computing, pages 300–314. Springer, 2001.

[159] Carl Hewitt, Peter Bishop, and Richard Steiger. A universal modular actor formalism for artificial intelligence. In Proceedings of the 3rd International Joint Conference on Artificial Intelligence, 1973.

[160] Thomas Hunter II. Advanced Microservices: A Hands-on Approach to Microservice Infras- tructure and Tooling. 2017.

[161] Jaehyun Hwang, Qizhe Cai, Ao Tang, and Rachit Agarwal. TCP ≈ RDMA: CPU-efficient Remote Storage Access with i10 . In 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20), 2020.

[162] Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. Dryad: Dis- tributed data-parallel programs from sequential building blocks. In Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems, 2007.

[163] Rajkumar Jalan, Swaminathan Sankar, and Gurudeep Kamat. Distributed system to deter- mine a server’s health, 2018. US Patent 9,906,422.

[164] Brian Jeff. Ten Things to Know About big.LITTLE. ARM Holdings, 2013.

[165] Vimalkumar Jeyakumar, Mohammad Alizadeh, Yilong Geng, Changhoon Kim, and David Mazières. Millions of little minions: Using packets for low latency network programming and visibility. In Proceedings of the 2014 ACM Conference on SIGCOMM, 2014.

[166] Xin Jin, Xiaozhou Li, Haoyu Zhang, Nate Foster, Jeongkeun Lee, Robert Soulé, Changhoon Kim, and Ion Stoica. Netchain: Scale-free sub-rtt coordination. In 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18), 2018.

[167] Xin Jin, Xiaozhou Li, Haoyu Zhang, Robert Soulé, Jeongkeun Lee, Nate Foster, Changhoon Kim, and Ion Stoica. Netcache: Balancing key-value stores with fast in-network caching. In Proceedings of the 26th Symposium on Operating Systems Principles, 2017.

[168] Gopal Kakivaya, Lu Xun, Richard Hasha, Shegufta Bakht Ahsan, Todd Pfleiger, Rishi Sinha, Anurag Gupta, Mihail Tarta, Mark Fussell, Vipul Modi, et al. Service fabric: a dis- tributed platform for building microservices in the cloud. In Proceedings of the Thirteenth EuroSys Conference, 2018.

[169] Anuj Kalia, Michael Kaminsky, and David G Andersen. Using rdma efficiently for key- value services. In ACM SIGCOMM Computer Communication Review, volume 44, pages 295–306, 2014.

[170] Anuj Kalia, Michael Kaminsky, and David G. Andersen. Fasst: Fast, scalable and simple distributed transactions with two-sided (RDMA) datagram rpcs. In 12th USENIX Sympo- sium on Operating Systems Design and Implementation (OSDI 16), 2016.

[171] Anuj Kalia, Michael Kaminsky, and David G. Andersen. Design guidelines for high performance rdma systems. In 2016 USENIX Annual Technical Conference, 2016.

[172] Svilen Kanev, Juan Pablo Darago, Kim Hazelwood, Parthasarathy Ranganathan, Tipp Mose- ley, Gu-Yeon Wei, and David Brooks. Profiling a Warehouse-scale Computer. In Proceed- ings of the 42Nd Annual International Symposium on Computer Architecture, 2015.

[173] Georgios P. Katsikas, Tom Barbette, Dejan Kostić, Rebecca Steinert, and Gerald Q. Maguire Jr. Metron: NFV Service Chains at the True Speed of the Underlying Hardware. In 15th USENIX Symposium on Networked Systems Design and Implementation, 2018.

[174] Antoine Kaufmann, SImon Peter, Naveen Kr. Sharma, Thomas Anderson, and Arvind Krishnamurthy. High Performance Packet Processing with FlexNIC. In Proceedings of the Twenty-First International Conference on Architectural Support for Programming Lan- guages and Operating Systems, 2016.

[175] Daehyeok Kim, Amirsaman Memaripour, Anirudh Badam, Yibo Zhu, Hongqiang Harry Liu, Jitu Padhye, Shachar Raindel, Steven Swanson, Vyas Sekar, and Srinivasan Seshan. Hyperloop: Group-based nic-offloading to accelerate replicated transactions in multi-tenant storage systems. In Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication, 2018.

[176] Joongi Kim, Keon Jang, Keunhong Lee, Sangwook Ma, Junhyun Shim, and Sue Moon. NBA (network balancing act): A High-performance Packet Processing Framework for Het- erogeneous Processors. In Proceedings of the Tenth European Conference on Computer Systems, 2015.

[177] Ana Klimovic, Christos Kozyrakis, Eno Thereska, Binu John, and Sanjeev Kumar. Flash Storage Disaggregation. In Proceedings of the Eleventh European Conference on Computer Systems, 2016.

[178] Ana Klimovic, Heiner Litz, and Christos Kozyrakis. ReFlex: Remote Flash ≈ Local Flash. In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017.

[179] Eddie Kohler, Robert Morris, Benjie Chen, John Jannotti, and M Frans Kaashoek. The Click modular router. ACM Transactions on Computer Systems (TOCS), 18(3):263–297, 2000.

[180] Rakesh Kumar, Keith I Farkas, Norman P Jouppi, Parthasarathy Ranganathan, and Dean M Tullsen. Single-ISA heterogeneous multi-core architectures: The potential for proces- sor power reduction. In Microarchitecture, 2003. MICRO-36. Proceedings. 36th Annual IEEE/ACM International Symposium on, 2003.

[181] Rakesh Kumar, Dean M Tullsen, Parthasarathy Ranganathan, Norman P Jouppi, and Keith I Farkas. Single-ISA heterogeneous multi-core architectures for multithreaded workload per- formance. In Computer Architecture, 2004. Proceedings. 31st Annual International Sympo- sium on, 2004.

[182] H. T. Kung and Robert Morris. Credit-based flow control for ATM networks. IEEE Network, 9(2):40–48, 1995.

[183] Leslie Lamport et al. Paxos made simple. ACM Sigact News, 32(4):18–25, 2001.

[184] Adam Langley, Alistair Riddoch, Alyssa Wilk, Antonio Vicente, Charles Krasic, Dan Zhang, Fan Yang, Fedor Kouranov, Ian Swett, Janardhan Iyengar, Jeff Bailey, Jeremy Dorfman, Jim Roskind, Joanna Kulik, Patrik Westin, Raman Tenneti, Robbie Shade, Ryan Hamilton, Victor Vasiliev, Wan-Teh Chang, and Zhongyi Shi. The quic transport protocol: Design and internet-scale deployment. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication, 2017.

[185] Yanfang Le, Hyunseok Chang, Sarit Mukherjee, Limin Wang, Aditya Akella, Michael M Swift, and TV Lakshman. UNO: Unifying host and smart NIC offload for flexible packet processing. In Proceedings of the 2017 Symposium on Cloud Computing, 2017.

[186] Sergey Legtchenko, Hugh Williams, Kaveh Razavi, Austin Donnelly, Richard Black, An- drew Douglas, Nathanaël Cheriere, Daniel Fryer, Kai Mast, Angela Demke Brown, Ana Klimovic, Andy Slowey, and Antony Rowstron. Understanding Rack-Scale Disaggregated Storage. In Proceedings of the 9th USENIX Conference on Hot Topics in Storage and File Systems, 2017.

[187] Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.

[188] Bojie Li, Zhenyuan Ruan, Wencong Xiao, Yuanwei Lu, Yongqiang Xiong, Andrew Put- nam, Enhong Chen, and Lintao Zhang. Kv-direct: High-performance in-memory key-value store with programmable nic. In Proceedings of the 26th Symposium on Operating Systems Principles, 2017.

[189] Bojie Li, Kun Tan, Layong Larry Luo, Yanqing Peng, Renqian Luo, Ningyi Xu, Yongqiang Xiong, Peng Cheng, and Enhong Chen. Clicknp: Highly flexible and high performance net- work processing with reconfigurable hardware. In Proceedings of the 2016 ACM SIGCOMM Conference, 2016.

[190] Huaicheng Li, Mingzhe Hao, Stanko Novakovic, Vaibhav Gogte, Sriram Govindan, Dan R. K. Ports, Irene Zhang, Ricardo Bianchini, Haryadi S. Gunawi, and Anirudh Badam. LeapIO: Efficient and Portable Virtual NVMe Storage on ARM SoCs. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Lan- guages and Operating Systems, 2020.

[191] Mu Li, Li Zhou, Zichao Yang, Aaron Li, Fei Xia, David G Andersen, and Alexander Smola. Parameter server for distributed machine learning. In Big Learning NIPS Workshop, vol- ume 6, page 2, 2013.

[192] Xiaozhou Li, Raghav Sethi, Michael Kaminsky, David G. Andersen, and Michael J. Freed- man. Be fast, cheap and in control with SwitchKV. In 13th USENIX Symposium on Net- worked Systems Design and Implementation (NSDI 16), 2016.

[193] Hyeontaek Lim, Dongsu Han, David G. Andersen, and Michael Kaminsky. Mica: A holistic approach to fast in-memory key-value storage. In 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI 14), 2014.

[194] Felix Xiaozhu Lin and Xu Liu. Memif: Towards Programming Heterogeneous Memory Asynchronously. In Proceedings of the Twenty-First International Conference on Architec- tural Support for Programming Languages and Operating Systems, 2016.

[195] Jianxiao Liu, Zonglin Tian, Panbiao Liu, Jiawei Jiang, and Zhao Li. An approach of se- mantic web service classification based on Naive Bayes. In Services Computing, 2016 IEEE International Conference on, 2016.

[196] Ming Liu, Arvind Krishnamurthy, Harsha V. Madhyastha, Rishi Bhardwaj, Karan Gupta, Chinmay Kamat, Huapeng Yuan, Aditya Jaltade, Roger Liao, Pavan Konka, and Anoop Jawahar. Fine-grained replicated state machines for a cluster storage system. In 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20), 2020.

[197] David Lo, Liqun Cheng, Rama Govindaraju, Luiz André Barroso, and Christos Kozyrakis. Towards energy proportionality for large-scale latency-critical workloads. In Computer Ar- chitecture, 2014 ACM/IEEE 41st International Symposium on, 2014.

[198] Ikuo Magaki, Moein Khazraee, Luis Vega Gutierrez, and Michael Bedford Taylor. Asic clouds: Specializing the datacenter. In 2016 ACM/IEEE 43rd Annual International Sympo- sium on Computer Architecture (ISCA), 2016.

[199] Luo Mai, Lukas Rupprecht, Abdul Alim, Paolo Costa, Matteo Migliavacca, Peter Pietzuch, and Alexander L. Wolf. NetAgg: Using middleboxes for application-specific on-path ag- gregation in data centres. In Proceedings of the 10th ACM International on Conference on Emerging Networking Experiments and Technologies, 2014.

[200] Yandong Mao, Eddie Kohler, and Robert Morris. Cache craftiness for fast multi-core key-value storage. In Proceedings of the 7th ACM European Conference on Computer Systems, 2012.

[201] Nick McKeown, Tom Anderson, Hari Balakrishnan, Guru Parulkar, Larry Peterson, Jennifer Rexford, Scott Shenker, and Jonathan Turner. OpenFlow: Enabling innovation in campus networks. SIGCOMM Comput. Commun. Rev., 38(2), March 2008.

[202] Mellanox. Accelerated Switch and Packet Processing. http://www.mellanox.com/page/ asap2?mtag=asap2, 2020.

[203] Justin Meza, Qiang Wu, Sanjev Kumar, and Onur Mutlu. A large-scale study of flash mem- ory failures in the field. ACM SIGMETRICS Performance Evaluation Review, 43(1):177– 190, 2015.

[204] Maged M Michael. High performance dynamic lock-free hash tables and list-based sets. In Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architec- tures, 2002.

[205] Changwoo Min, Woonhak Kang, Mohan Kumar, Sanidhya Kashyap, Steffen Maass, Heese- ung Jo, and Taesoo Kim. Solros: A Data-centric Operating System Architecture for Hetero- geneous Computing. In Proceedings of the Thirteenth EuroSys Conference, 2018.

[206] Jelena Mirkovic and Peter Reiher. A taxonomy of DDoS attack and DDoS defense mecha- nisms. ACM SIGCOMM Computer Communication Review, 34(2):39–53, 2004.

[207] Christopher Mitchell, Yifeng Geng, and Jinyang Li. Using one-sided rdma reads to build a fast, cpu-efficient key-value store. In USENIX Annual Technical Conference, 2013.

[208] Mihir Nanavati, Jake Wires, and Andrew Warfield. Decibel: Isolation and sharing in dis- aggregated rack-scale storage. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17), 2017.

[209] Iyswarya Narayanan, Di Wang, Myeongjae Jeon, Bikash Sharma, Laura Caulfield, Anand Sivasubramaniam, Ben Cutler, Jie Liu, Badriddine Khessib, and Kushagra Vaid. Ssd failures in datacenters: What? when? and why? In Proceedings of the 9th ACM International on Systems and Storage Conference, 2016.

[210] Jacob Nelson, Brandon Holt, Brandon Myers, Preston Briggs, Luis Ceze, Simon Kahan, and Mark Oskin. Latency-tolerant software distributed shared memory. In USENIX Annual Technical Conference, 2015.

[211] Rolf Neugebauer, Gianni Antichi, José Fernando Zazo, Yury Audzevich, Sergio López- Buedo, and Andrew W. Moore. Understanding pcie performance for end host networking. In Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Com- munication, 2018.

[212] Radhika Niranjan Mysore, Andreas Pamboris, Nathan Farrington, Nelson Huang, Pardis Miri, Sivasankar Radhakrishnan, Vikram Subramanya, and Amin Vahdat. PortLand: A scalable fault-tolerant layer 2 data center network fabric. In Proceedings of the ACM SIG- COMM 2009 Conference on Data Communication, 2009.

[213] Rajesh Nishtala, Hans Fugal, Steven Grimm, Marc Kwiatkowski, Herman Lee, Harry C Li, Ryan McElroy, Mike Paleczny, Daniel Peek, Paul Saab, et al. Scaling Memcache at facebook. In Presented as part of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13), 2013.

[214] Amy Ousterhout, Joshua Fried, Jonathan Behrens, Adam Belay, and Hari Balakrishnan. Shenango: Achieving high CPU efficiency for latency-sensitive datacenter workloads. In 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19), 2019.

[215] Shoumik Palkar, Chang Lan, Sangjin Han, Keon Jang, Aurojit Panda, Sylvia Ratnasamy, Luigi Rizzo, and Scott Shenker. E2: A Framework for NFV Applications. In Proceedings of the 25th Symposium on Operating Systems Principles, 2015.

[216] Jun Woo Park, Alexey Tumanov, Angela Jiang, Michael A. Kozuch, and Gregory R. Ganger. 3Sigma: Distribution-based Cluster Scheduling for Runtime Uncertainty. In Proceedings of the Thirteenth EuroSys Conference, 2018.

[217] Simon Peter, Jialin Li, Irene Zhang, Dan R. K. Ports, Doug Woos, Arvind Krishnamurthy, Thomas Anderson, and Timothy Roscoe. Arrakis: The operating system is the control plane. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), 2014.

[218] Phitchaya Mangpo Phothilimthana, Ming Liu, Antoine Kaufmann, Simon Peter, Rastislav Bodik, and Thomas Anderson. Floem: A programming system for nic-accelerated network applications. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), 2018.

[219] Rishabh Poddar, Chang Lan, Raluca Ada Popa, and Sylvia Ratnasamy. SafeBricks: Shield- ing Network Functions in the Cloud. In 15th USENIX Symposium on Networked Systems Design and Implementation, 2018.

[220] Lucian Popa, Norbert Egi, Sylvia Ratnasamy, and Ion Stoica. Building extensible networks with rule-based forwarding. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, 2010.

[221] George Prekas, Marios Kogias, and Edouard Bugnion. Zygos: Achieving low tail latency for microsecond-scale networked tasks. In Proceedings of the 26th Symposium on Operating Systems Principles, 2017.

[222] George Prekas, Mia Primorac, Adam Belay, Christos Kozyrakis, and Edouard Bugnion. Energy Proportionality and Workload Consolidation for Latency-critical Applications. In Proceedings of the Sixth ACM Symposium on Cloud Computing, 2015.

[223] Hang Qu, Omid Mashayekhi, Chinmayee Shah, and Philip Levis. Decoupling the Control Plane from Program Control Flow for Flexibility and Performance in Cloud Computing. In Proceedings of the Thirteenth EuroSys Conference, 2018.

[224] Costin Raiciu, Sebastien Barre, Christopher Pluntke, Adam Greenhalgh, Damon Wischik, and Mark Handley. Improving datacenter performance and robustness with multipath TCP. In Proceedings of the ACM SIGCOMM 2011 Conference, 2011.

[225] Suzanne Rivoire, Mehul A. Shah, Parthasarathy Ranganathan, and Christos Kozyrakis. JouleSort: A Balanced Energy-efficiency Benchmark. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, 2007.

[226] Christopher J Rossbach, Jon Currey, Mark Silberstein, Baishakhi Ray, and Emmett Witchel. PTask: operating system abstractions to manage GPUs as compute devices. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, 2011.

[227] Christopher J Rossbach, Yuan Yu, Jon Currey, Jean-Philippe Martin, and Dennis Fetterly. Dandelion: a compiler and runtime for heterogeneous systems. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, 2013.

[228] Arjun Roy, Hongyi Zeng, Jasmeet Bagga, George Porter, and Alex C. Snoeren. Inside the social network’s (datacenter) network. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, 2015.

[229] Juan Carlos Saez, Manuel Prieto, Alexandra Fedorova, and Sergey Blagodurov. A compre- hensive scheduler for asymmetric multicore systems. In Proceedings of the 5th European conference on Computer systems, 2010.

[230] Amedeo Sapio, Ibrahim Abdelaziz, Abdulla Aldilaijan, Marco Canini, and Panos Kalnis. In-network computation is a dumb idea whose time has come. In Proceedings of the 16th ACM Workshop on Hot Topics in Networks, 2017.

[231] Felix Scholkmann, Jens Boss, and Martin Wolf. An efficient algorithm for automatic peak detection in noisy periodic and quasi-periodic signals. Algorithms, 5(4):588–603, 2012.

[232] Linus Schrage. Letter to the editor—a proof of the optimality of the shortest remaining processing time discipline. Operations Research, 16(3):687–690, 1968.

[233] Beverly Schwartz, Alden W. Jackson, W. Timothy Strayer, Wenyi Zhou, R. Dennis Rock- well, and Craig Partridge. Smart packets: Applying active networks to network manage- ment. ACM Trans. Comput. Syst., 18(1), February 2000.

[234] Ori Shalev and Nir Shavit. Split-ordered lists: Lock-free extensible hash tables. J. ACM, 53(3), May 2006.

[235] Naveen Kr. Sharma, Ming Liu, Kishore Atreya, and Arvind Krishnamurthy. Approximating fair queueing on reconfigurable switches. In 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18), 2018.

[236] Naveen Kr. Sharma, Chenxingyu Zhao, Ming Liu, Pravein G Kannan, Changhoon Kim, Arvind Krishnamurthy, and Anirudh Sivaraman. Programmable calendar queues for high-speed packet scheduling. In 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20), 2020.

[237] Daniel Shelepov, Juan Carlos Saez Alcaide, Stacey Jeffery, Alexandra Fedorova, Nestor Perez, Zhi Feng Huang, Sergey Blagodurov, and Viren Kumar. HASS: a scheduler for heterogeneous multicore systems. ACM SIGOPS Operating Systems Review, 43(2):66–75, 2009.

[238] Madhavapeddi Shreedhar and George Varghese. Efficient fair queuing using deficit round- robin. IEEE/ACM Transactions on networking, 4(3):375–385, 1996.

[239] Vishal Shrivastav, Asaf Valadarsky, Hitesh Ballani, Paolo Costa, Ki Suh Lee, Han Wang, Rachit Agarwal, and Hakim Weatherspoon. Shoal: A Network Architecture for Disaggre- gated Racks. In 16th USENIX Symposium on Networked Systems Design and Implementa- tion (NSDI 19), 2019.

[240] Sriram Srinivasan and Alan Mycroft. Kilim: Isolation-typed actors for java. In European Conference on Object-Oriented Programming, 2008.

[241] Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. In Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, 2001.

[242] Alexander L Stolyar and Kavita Ramanan. Largest weighted delay first scheduling: Large deviations and optimality. Annals of Applied Probability, pages 1–48, 2001.

[243] Lalith Suresh, Peter Bodik, Ishai Menache, Marco Canini, and Florin Ciucu. Distributed resource management across process boundaries. In Proceedings of the 2017 Symposium on Cloud Computing, 2017.

[244] D. L. Tennenhouse, J. M. Smith, W. D. Sincoskie, D. J. Wetherall, and G. J. Minden. A survey of active network research. IEEE Communications Magazine, 35(1):80–86, 1997.

[245] D. L. Tennenhouse and D. J. Wetherall. Towards an active network architecture. In Pro- ceedings DARPA Active Networks Conference and Exposition, 2002.

[246] David L Tennenhouse and David J Wetherall. Towards an active network architecture. ACM SIGCOMM Computer Communication Review, 26(2):5–17, 1996.

[247] Ming Liu, Tianyi Cui, Henry Schuh, Arvind Krishnamurthy, Simon Peter, and Karan Gupta. Offloading Distributed Applications onto SmartNICs using iPipe. In Proceedings of the ACM Special Interest Group on Data Communication (SIGCOMM ’19), 2019.

[248] Ming Liu, Liang Luo, Jacob Nelson, Luis Ceze, Arvind Krishnamurthy, and Kishore Atreya. IncBricks: Toward In-Network Computation with an In-Network Cache. In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’17), Xi’an, China, 2017.

[249] Ming Liu, Simon Peter, Arvind Krishnamurthy, and Phitchaya Mangpo Phothilimthana. E3: Energy-Efficient Microservices on SmartNIC-Accelerated Servers. In 2019 USENIX Annual Technical Conference (ATC ’19), 2019.

[250] Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M Patel, San- jeev Kulkarni, Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, et al. Storm@ twitter. In Proceedings of the 2014 ACM SIGMOD international conference on Manage- ment of data, 2014.

[251] Takanori Ueda, Takuya Nakaike, and Moriyoshi Ohara. Workload characterization for mi- croservices. In 2016 IEEE international symposium on workload characterization (IISWC), 2016.

[252] Leendert van Doorn. Microsoft’s datacenters. In Proceedings of the 1st Workshop on Hot Topics in Data Centers, 2016.

[253] Midhul Vuppalapati, Justin Miron, Rachit Agarwal, Dan Truong, Ashish Motivala, and Thierry Cruanes. Building An Elastic Query Engine on Disaggregated Storage . In 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20), 2020.

[254] Xingda Wei, Jiaxin Shi, Yanzhe Chen, Rong Chen, and Haibo Chen. Fast in-memory trans- action processing using rdma and htm. In Proceedings of the 25th Symposium on Operating Systems Principles, 2015.

[255] David Wetherall. Active Network Vision and Reality: Lessons from a Capsule-Based Sys- tem. In Proceedings DARPA Active Networks Conference and Exposition, 2002.

[256] Adam Wierman and Bert Zwart. Is tail-optimal scheduling possible? Operations research, 60(5):1249–1257, 2012.

[257] Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson, Pradeep Kumar Gunda, and Jon Currey. DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, 2008.

[258] Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. Spark: Cluster computing with working sets. In Proceedings of the 2Nd USENIX Conference on Hot Topics in Cloud Computing, 2010.

[259] Irene Zhang, Naveen Kr. Sharma, Adriana Szekeres, Arvind Krishnamurthy, and Dan R. K. Ports. Building consistent transactions with inconsistent replication. In Proceedings of the 25th Symposium on Operating Systems Principles, 2015.

[260] Kai Zhang, Bingsheng He, Jiayu Hu, Zeke Wang, Bei Hua, Jiayi Meng, and Lishan Yang. G- NET: Effective GPU Sharing in NFV Systems. In 15th USENIX Symposium on Networked Systems Design and Implementation, 2018.