Evaluating the Power of Flexible Packet Processing for Network Resource Allocation Naveen Kr

Evaluating the Power of Flexible Packet Processing for Network Resource Allocation Naveen Kr. Sharma, Antoine Kaufmann, and Thomas Anderson, University of Washington; Changhoon Kim, Barefoot Networks; Arvind Krishnamurthy, University of Washington; Jacob Nelson, Microsoft Research; Simon Peter, The University of Texas at Austin https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/sharma This paper is included in the Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’17). March 27–29, 2017 • Boston, MA, USA ISBN 978-1-931971-37-9 Open access to the Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation is sponsored by USENIX. Evaluating the Power of Flexible Packet Processing for Network Resource Allocation Naveen Kr. Sharma∗ Antoine Kaufmann∗ Thomas Anderson∗ Changhoon Kim† Arvind Krishnamurthy∗ Jacob Nelson‡ Simon Peter§ Abstract of the packet header, perform simple computations on Recent hardware switch architectures make it feasible values in packet headers, and maintain mutable state that to perform flexible packet processing inside the net- preserves the results of computations across packets. Im- work. This allows operators to configure switches to portantly, these advanced data-plane processing features parse and process custom packet headers using flexi- operate at line rate on every packet, addressing a ma- ble match+action tables in order to exercise control over jor limitation of earlier solutions such as OpenFlow [22] how packets are processed and routed. However, flexible which could only operate on a small fraction of packets, switches have limited state, support limited types of op- e.g., for flow setup. FlexSwitches thus hold the promise erations, and limit per-packet computation in order to be of ushering in the new paradigm of a software defined able to operate at line rate. dataplane that can provide datacenter applications with Our work addresses these limitations by providing a greater control over the network’s datapaths. set of general building blocks that mask these limita- Despite their promising new functionality, Flex- tions using approximation techniques and thereby en- Switches are not all-powerful. Per-packet processing ca- abling the implementation of realistic network proto- pabilities are limited and so is stateful memory. There cols. In particular, we use these building blocks to tackle has not yet been a focused study on how the new hard- the network resource allocation problem within datacen- ware features can be used in practice. Consequently, ters and realize approximate variants of congestion con- there is limited understanding of how to take advantage trol and load balancing protocols, such as XCP, RCP, of FlexSwitches, nor is there insight into how effective and CONGA, that require explicit support from the net- are the hardware features. In other words, do Flex- work. Our evaluations show that these approximations Switches have instruction sets that are sufficiently pow- are accurate and that they do not exceed the hardware erful to support the realization of networking protocols resource limits associated with these flexible switches. that require in-network processing? We demonstrate their feasibility by implementing RCP This paper takes a first step towards addressing this with the production Cavium CNX880xx switch. This question by studying the use of FlexSwitches in the con- implementation provides significantly faster and lower- text of a classic problem: network resource allocation. variance flow completion times compared with TCP. We focus on resource allocation because it has a rich literature that advocates per-packet dataplane processing 1 Introduction in the network. Researchers have used dataplane pro- Innovation in switch design now leads to not just cessing to provide rate adjustments to end-hosts (con- faster but more flexible packet processing architectures. gestion control), determine meaningful paths through Whereas early generations of software-defined network- the network (load balancing), schedule or drop packets ing switches could specify forwarding paths on a per- (QoS, fairness), monitor flows to detect anomalies and flow basis, today’s latest switches support configurable resource exhaustion attacks (IDS), and so on. Our goal per-packet processing, including customizable packet is to evaluate the feasibility of implementing these pro- headers and the ability to maintain state inside the switch. tocols on a concrete and realizable hardware model for Examples include Intel FlexPipe [26], Texas Instru- programmable data planes. ments’s Reconfigurable Match Tables (RMTs) [13], the We begin our study with a functional analysis of the Cavium XPliant switches [15], and Barefoot’s Tofino set of issues that arise in realizing protocols such as the switches [10]. We term these switches FlexSwitches. RCP [17] congestion control protocol on a FlexSwitch. FlexSwitches give greater control over the network At first sight, the packet processing capabilities of Flex- by exposing previously proprietary switching features. Switches appear to be insufficient for RCP and associ- They can be reprogrammed to recognize, modify, and ated protocols; today’s FlexSwitches provide a limited add new header fields, choose actions based on user- number of data plane operators and also limit the num- defined match rules that examine arbitrary components ber of operations performed per packet and the amount of state maintained by the switch. While some of these lim- ∗University of Washington †Barefoot Networks its can be addressed through hardware redesigns that sup- ‡Microsoft Research port an enhanced set of operators, the limits associated §University of Texas at Austin with switch state and operation count are likely harder USENIX Association 14th USENIX Symposium on Networked Systems Design and Implementation 67 to overcome if the switches are to operate at line rate. We therefore develop a set of building blocks designed to mask the absence of complex operators and to overcome the switch resource limits through the use of approximation algorithms adapted from the literature on streaming algorithms. We then demonstrate that it is possible to implement approximate versions of classic resource allocation protocols using two design principles. First, our approximations only need to provide an appropriate level of accuracy, taking advantage of known workload prop- erties of datacenter networks. Second, we co-design the network computation with end-host processing to reduce the computational demand on the FlexSwitch. Figure 1: Abstract Switch Model (from [12]). Our work makes the following contributions: • We identify and implement a set of building blocks data are passed to an ingress pipeline of user-defined that mask the constraints associated with Flex- M+A tables. Each table matches on a subset of ex- Switches. We quantify the tradeoff between resource tracted fields and can apply simple processing primi- use and accuracy under various realistic workloads. tives to any field, ordered by a user-defined control pro- gram. The ingress pipeline also chooses an output queue • We show how to implement a variety of network re- for the packet, determining its forwarding destination. source allocation protocols using our building blocks. Packets pass through an egress pipeline for destination- • In many instances, our FlexSwitch realization is an ap- dependent modifications before output. Legacy forward- proximate variant of the original protocol, so we eval- ing rules (e.g., IP longest prefix match) may also impact uate the extent to which we are able to match the per- packet forwarding and modification. formance of the unmodified protocol. Using ns-3 sim- To support packet processing on the data path, Flex- ulations, we show that our approximate variants emu- Switches provide several hardware features: late their precise counterparts with high accuracy. Computation primitives that perform a limited amount • Using an implementation on a production FlexSwitch of processing on header fields. This includes operations and emulation on an additional production FlexSwitch such as addition, bit-shifts, hashing, and max/min. hardware model, we show that our protocol implemen- M+A tables generalize the abstraction of tations are feasible on today’s hardware. match+action provided by OpenFlow, allowing matches 2 Resource Allocation Case Study and actions on any user-defined packet field. Actions may add, modify, and remove fields. We begin by studying whether FlexSwitches are capable A limited amount of stateful memory can maintain of supporting RCP, a classic congestion control proto- state across packets, such as counters, meters, and reg- col that assumes switch dataplane operations. We first isters, that can be used while processing. describe an abstract FlexSwitch hardware model based Switch meta-data, such as queue lengths, congestion on existing and upcoming switches. We then exam- status, and bytes transferred, augment stateful memory ine whether we can implement RCP on the FlexSwitch and can also be used in processing. model. We identify a number of stumbling blocks that Timers built into the hardware can invoke the switch- would prevent a direct implementation. local CPU to perform periodic computation such as up- FlexSwitch hardware model. Rather than support- dating M+A table entries or exporting switch meta-data ing arbitrary per-packet computation, most FlexSwitches off-switch to a central controller. provide a match+action

Evaluating the Power of Flexible Packet Processing for Network Resource Allocation Naveen Kr

Packet Processing at Wire Speed Using Network Processors

Packet Processing Execution Engine (PROX) - Performance Characterization for NFVI User Guide

Lightweight Internet Protocol Stack Ideal Choice for High-Performance HFT and Telecom Packet Processing Applications Lightweight Internet Protocol Stack

High Level Synthesis for Packet Processing Pipelines Cristian

An Architecture for High-Speed Packet Switched Networks (Thesis)

ICMP for Ipv6

Packet Processing with a Noc-Enhanced FPGA

Pathminer Powered Predictable Packet Processing

Impressive Packet Processing Performance Enables Greater Workload Consolidation Single Intel® Xeon® Processor Platform Achieves Over 80 Mpps Throughput1

High Level Synthesis Tool for High Speed Packet Processing

High Performance Packet Processing with Flexnic

IMPLEMENTING VOICE OVER IP Copyright 6 2003 by John Wiley & Sons, Inc