Bringing Programmability to the Data Plane: Packet Processing with a NoC-Enhanced FPGA

Andrew Bitar, Mohamed S. Abdelfattah, Vaughn Betz Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada {bitar, mohamed, vaughn}@eecg.utoronto.ca

Abstract—Modern computer networks need components that can evolve to support both the latest bandwidth demands and new protocols and features. To address this need, we propose a new programmable packet processor architecture built from an FPGA containing an embedded Network-on-Chip (NoC). The architecture is highly flexible, providing more programmability than is possible in an ASIC-based design, while supporting throughputs of 400 and 800 Gb/s. Additionally, we show that our design is 1.7× and 3.2× more area efficient, and achieves 1.5× and 3.7× lower latency than the best previously proposed FPGA-based packet processor on complex and simple applications, respectively. Lastly, we explore various ways a designer can take advantage of the flexibility available in this architecture.

Fig. 1: Transceiver bandwidth available on Altera Stratix devices has rapidly grown with every new generation. The three data points for each generation correspond to the device models with the three highest transceiver BW. (Plot: total transceiver bandwidth in Gb/s, 0–4000, for the Stratix I, Stratix II, Stratix IV, Stratix V and Stratix X generations.)

I. INTRODUCTION

Computer networks have seen rapid evolution over the past decade. "Cloud computing" and the "Internet of Things" are becoming household terms, as we move to an era where computational power is offloaded from the PC and onto data centers located miles away. This surge in demand on networking capabilities has led to new network protocols and functionalities being created, updated and enhanced. The implementation of these protocols and functionalities has proven challenging in current network infrastructures, causing a demand for "programmable networks".

Software-Defined Networking (SDN) is a proposed networking paradigm that provides programmability by configuring network hardware (i.e. the "data plane") through a separate software-programmable "control plane" [1]. Various APIs have been developed to interface the control plane with the data plane, with OpenFlow [2] being a popular one in the academic community. However, the degree of programmability provided by these APIs is limited by the capabilities of the data plane. If new protocols are developed with functionalities that go beyond what is available in the hardware, then expensive hardware replacements will still be necessary.

Field-programmable gate arrays (FPGAs) have long provided a hardware-programmable alternative to fixed ASIC designs. The reconfigurability of FPGAs seems like a natural solution to the demand for programmable networks; FPGA designs can be re-programmed in hardware to serve network evolution. However, what FPGAs gain in programmability they lose in performance; Kuon and Rose quantified the FPGA's programmability overhead to be 18×-35× in area and 3×-4× in critical path delay compared to ASICs [3]. As a result, one would expect FPGAs to struggle to efficiently support the high bandwidth demands of modern computer networks. This is generally true; as a whole, established network infrastructures are largely dominated by ASICs. FPGA designs have made some breakthroughs in various networking applications [4–7], thanks to the increasing transceiver bandwidth available in modern FPGAs (Figure 1). Although the transceivers can bring the high bandwidth data onto the chip, these designs still rely on buses made of the FPGA's soft reconfigurable interconnect to move the data across the chip. These buses are slow (typically 100-400 MHz) and therefore must be made very wide, consuming large amounts of interconnect area and creating challenging timing closure problems for the designer. Thus, it should come as no surprise that FPGAs have yet to be widely adopted throughout network infrastructures.

Recent work has looked closely at this FPGA interconnect problem and has argued for the inclusion of a new system-level interconnect in the form of a Network-on-Chip (NoC) [8]. Hardening such a NoC in the FPGA's silicon would provide a fast, chip-wide interconnect that consumes a small fraction of the FPGA area [8]. Such a NoC-enhanced FPGA has the potential to better transport high-bandwidth transceiver data in networking applications. Prior work has already shown that an efficient and programmable Ethernet switch can be built from a NoC-enhanced FPGA that far outperforms previous FPGA-based switch designs [9].

In this work, we consider another fundamental network building block: the packet processor. In general terms, a packet processor is a device that performs actions based on the contents of received packetized data. The flexibility of this unit is imperative to the future programmability of computer networks. Recent works [4, 10–12] have proposed various ways to build packet processors that support SDN/OpenFlow, with the majority using match tables that can be configured to support various types of processing (Section II-B). Our design takes a different route. By interconnecting multiple, protocol-specific processing modules through a NoC embedded in an FPGA, we develop a new form of packet processor design that provides a high degree of flexibility for network evolution. In effect, our design brings programmability directly to the data plane.

Fig. 2: A NoC-Enhanced FPGA. The embedded NoC is implemented in hard logic, and connects to modules on the FPGA through a FabricPort [8]. (The figure shows the hard NoC routers and links within the FPGA fabric, surrounded by transceivers, memory controllers and other I/Os, with soft modules attached through FabricPorts.)
Our focus is to explore this new form of packet processor design. To this end, we make the following contributions:
1) Propose a new packet processor architecture that maximizes hardware flexibility while efficiently supporting modern network bandwidths (400G and 800G).
2) Evaluate the architecture by implementing common packet parsing and processing use-cases.
3) Compare its resource utilization and performance to the best previously proposed FPGA packet processor.
4) Explore how a designer can take advantage of this architecture's flexibility.

Fig. 3: An example parse graph commonly seen in enterprise networks, and a packet belonging to that parse graph. A parse graph can be used to represent the combinations of protocols that a packet processor may see in a certain application. (a) Parse graph: Ethernet → VLAN → IPv4/IPv6 → TCP/UDP. (b) TCP/IP packet: Ethernet, IPv4 and TCP headers followed by the payload.

II. BACKGROUND

A. NoC-Enhanced FPGA

Existing FPGAs contain only fine-grained programmable blocks (lookup-tables, registers and multiplexers) from which an application's interconnect can be constructed. Previous work has proposed the inclusion of an embedded NoC to augment this fine-grained interconnect so as to better interconnect an FPGA application. Embedded NoCs have been shown to improve both area and power efficiency, and to ease timing closure, compared to traditionally-used soft buses configured from the FPGA's programmable blocks [13–15].

Figure 2 shows an embedded NoC on the FPGA. The NoC routers and links run approximately four times as fast as the FPGA fabric; therefore, we use a special component called the FabricPort to bridge both width and frequency in a flexible way between the embedded NoC routers and soft FPGA modules [8]. Furthermore, the processing modules connected to the NoC need not operate using the same clock frequency or phase – the FabricPort essentially decouples the modules connected together through the NoC. The routers are packet-switched virtual-channel routers that perform distributed arbitration and switching of packets coming from the modules connected to the NoC. These routers also contain built-in buffering and credit-based backpressure, and so can automatically respond to bursts of data and heavy traffic on their links while maintaining application correctness.

The NoC we use has been evaluated on a large 28-nm Stratix V FPGA, and can run at 1.2 GHz in that process technology [8]. The NoC links are 150 bits wide each and so can transport up to 180 Gb/s in each direction between any two routers, of which we have 16 as illustrated in Figure 2. Because it is implemented in efficient hard logic, this NoC only consumes 1.3% of the core area of an Altera Stratix V FPGA. The FabricPort for this NoC can connect a module running at any FPGA frequency and any width between 1 and 600 bits to the NoC routers at 150 bits and 1.2 GHz. It does so using a combination of clock-crossing and width adaptation circuitry that formats data into NoC packets in an efficient way [8].
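For concreteness, the back-of-the-envelope sketch below (in Python; it is our illustration, not part of the FabricPort implementation) shows why any module width up to 600 bits running at fabric speed can be carried by a 150-bit, 1.2 GHz NoC port; the function name and structure are ours.

    # Illustrative bandwidth check (not from the FabricPort RTL): a soft module
    # fits on a FabricPort as long as its width x clock does not exceed the
    # 150-bit x 1.2 GHz capacity of one NoC port.
    NOC_FLIT_BITS = 150
    NOC_CLOCK_GHZ = 1.2
    NOC_PORT_GBPS = NOC_FLIT_BITS * NOC_CLOCK_GHZ   # 180 Gb/s per direction

    def fits_on_fabricport(width_bits, clock_mhz):
        """True if a soft module's raw bandwidth fits under one NoC port."""
        assert 1 <= width_bits <= 600, "FabricPort accepts widths of 1-600 bits"
        return width_bits * clock_mhz / 1000.0 <= NOC_PORT_GBPS

    # e.g. a 512-bit module at 215 MHz (the parameters used in Section III-D)
    # needs ~110 Gb/s, comfortably below the 180 Gb/s port capacity.
    print(fits_on_fabricport(512, 215))   # True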
B. Match Table Packet Processors

Depending on the application, a packet processor must be able to process packets of a certain set of protocols. These protocols vary depending on the type of packet and the layer they represent in the OSI stack. For example, Ethernet is one of the most common layer 2 protocols found in modern networks. The protocol at each layer contains information regarding the type and location of the protocol found in the next higher layer (Figure 3b). The various combinations of protocols that may be received in a single packet can be represented by a parsing graph [11] (Figure 3a).

As network protocols are changed, added and replaced, a programmable packet processor must be able to be reconfigured to support different parsing graphs. The OpenFlow standard comes with specifications for architecting an SDN-compatible packet processor [16]. The architecture is based on a cascade of match-action "flow" tables: memory holding all possible values of relevant packet fields, and instructions for what action to take once a match occurs. Figure 4 illustrates the high-level design of this architecture. The pipeline begins by matching fields whose locations in the packet header are known. When a field is matched, the entry contains information regarding where to look in the next match table for the protocol of the next layer, as well as any other corresponding actions to be taken. This model requires the match tables to contain all possible combinations of header protocols and field values that the processor may face. All of the tables in the pipeline are populated by the control plane, such that lookups in a Table j can depend on information from any Table i so long as i < j.

Fig. 4: High-level overview of OpenFlow's programmable packet processor architecture [16]. (A packet flows in, traverses a pipeline of flow tables 0 … N, each producing actions, and flows out.)
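As a rough software model of the flow-table cascade just described (a simplification of ours, not the OpenFlow specification's data structures), each table entry returns an action together with the table and field location to use next; the two-stage pipeline and field offsets below are hypothetical.

    # Simplified model of a match-action flow-table cascade (illustration only;
    # real OpenFlow tables also hold masks, priorities, counters, etc.).
    def process(packet, tables, offset, width, table_idx=0):
        """Walk the pipeline: a hit in table i steers lookups in tables j > i."""
        while table_idx is not None:
            entry = tables[table_idx].get(packet[offset:offset + width])
            if entry is None:
                return "miss"                  # e.g. drop or punt to the controller
            action, table_idx, offset, width = entry
            action(packet)
        return "done"

    # Hypothetical two-stage pipeline: match the Ethertype, then the IPv4 protocol.
    tables = [
        {b"\x08\x00": (lambda p: None, 1, 14 + 9, 1)},  # 0x0800 -> look in table 1
        {b"\x06": (lambda p: None, None, 0, 0)},        # protocol 6 (TCP) -> finish
    ]
    pkt = bytes(12) + b"\x08\x00" + bytes(9) + b"\x06" + bytes(20)
    print(process(pkt, tables, offset=12, width=2))     # "done"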
There are two important limitations to this programmable packet processor design. When built in an ASIC, the number of tables, along with their width and depth, is fixed upon chip fabrication. Consequently, should a new protocol be created that requires fields larger than the table width, a number of entries that exceeds the table depth, or more pipeline stages than are available, then the packet processor cannot be configured to support such a protocol. Similarly, the action set available is also set upon fabrication. Any new actions required by newly developed protocols could not be supported.

Recent work has revealed two possible ways to address these limitations. Bosshart et al. proposed building additional programmability into an ASIC-based packet processor through the use of ternary content-addressable memories (TCAMs) [10]. Their design – called RMT (Reconfigurable Match Tables) – can be viewed as an upgrade to the OpenFlow model. It involves mapping logical match (or flow) tables to physical ones by allowing a logical table to use resources from multiple pipeline stages. This mapping process allows the width and depth of flow tables to be customized for specific sets of protocols. Additionally, their design implements a reduced instruction set that, when used in combination in a very-long instruction word, can execute a wider array of actions than OpenFlow.

The RMT design mitigates, but does not solve, OpenFlow's limitations. As the authors admit in the paper, their design is still restricted by the number of pipeline stages, table sizes, and action units made available on the chip upon fabrication. These restrictions cannot be avoided when building a design in ASIC technology. In contrast, Attig and Brebner have previously shown that a programmable packet parser can be built on an FPGA that supports 400 Gb/s bandwidth [4]. Their design – which they refer to as PP¹ – uses a design style similar to OpenFlow, with a cascade of processing stages containing tables with entries that encode what fields to extract from a packet. Using an FPGA allows their design to be reconfigured to support different table configurations and actions.

¹ Attig and Brebner use "PP" to refer to the language used to program their architecture. For simplicity, we shall use PP to refer to their design as a whole.

Our programmable packet processor design also leverages FPGA technology to avoid the limited flexibility of ASIC-based designs. However, unlike the PP design, we elect to avoid an approach similar to OpenFlow, which was originally proposed to bring programmability to ASIC-based network infrastructures. We instead embrace the full reconfigurability of the FPGA, creating a fully-programmable packet processor that is more efficient than the FPGA-based PP design and more flexible than the ASIC-based RMT design.

III. THE NOC PACKET PROCESSOR

A. Design Overview

Instead of match tables that each support a set of protocols, our design implements multiple processing modules, with each module dedicated to processing a single parse-graph node's protocol (Figure 3a). Packets are sent to the modules corresponding to the protocols found in their headers. Each processing module determines what actions to take for its protocol, and the type and location of the protocol processing module for the packet's next OSI layer. Figure 5 depicts a general representation of this packet processor design.

The FPGA reconfigurable fabric is used to implement the processing modules. The flexibility of the fabric allows the modules to be fully customized and later updated, as existing protocols are enhanced and new protocols are added. It is by updating the modules that processing rule updates are made. For example, in order to modify the supported parse graph, each module's decision logic is updated to match the new set of edges connecting its corresponding node in the graph. Module updates are made by reconfiguring (or partially reconfiguring, see Section V-B) the FPGA.

As in any packet processor, the processing modules perform some set of actions when packet fields are matched. However, since each processing module is dedicated to a single node in the parsing graph, its matches and actions are tied solely to what is relevant to that node. Field matches and routing decisions at a particular protocol node can be stored in tables at the module. These tables only need to contain entries relevant to actions that can take place for that protocol, unlike the OpenFlow flow tables, which must have entries for all possible field matches at that stage. In effect, a flow table is broken up and spread across several processing modules. The module tables' widths and depths, as well as the modules' actions, can be customized to exactly fit the requirements of the module's protocol. We argue that having these "fine-grained" tables that are specifically customized for a protocol will allow for a more efficient design than OpenFlow's "coarse-grained" flow tables, as illustrated by the following two examples.

Example 1: Consider a programmable packet processor that reads the IP address from either an IPv4 or IPv6 header in order to make a packet routing decision. In an OpenFlow implementation, the addresses to be matched from both the IPv4 and IPv6 headers must be contained in a single match table (or copied over multiple tables). This table must therefore be sized to fit the entries containing the 128-bit IPv6 addresses, causing entries for IPv4 addresses (only 32 bits) to have at least 96 wasted memory bits per entry. In our separated processing module architecture, the IPv4 and IPv6 modules contain their own respective tables that are sized perfectly to fit the width of their corresponding match fields.
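The memory argument in Example 1 can be made concrete with a short calculation; the field widths come from the example, while the table depths below are hypothetical placeholders.

    # Storage comparison for Example 1 (field widths from the text; the entry
    # counts below are hypothetical and only for illustration).
    IPV4_BITS, IPV6_BITS = 32, 128
    ipv4_entries, ipv6_entries = 4096, 1024

    # Single OpenFlow-style table: every entry must be IPv6-wide.
    shared_bits = (ipv4_entries + ipv6_entries) * IPV6_BITS

    # NoC-PP: each module's table matches its own field width exactly.
    split_bits = ipv4_entries * IPV4_BITS + ipv6_entries * IPV6_BITS

    print(IPV6_BITS - IPV4_BITS)      # 96 wasted bits per IPv4 entry in the shared table
    print(shared_bits - split_bits)   # total saving = 4096 entries x 96 bits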
Example 2: Suppose a programmable packet processor is built to support only Ethernet processing today, but will, in future, need to process other protocols as well. Using OpenFlow's match-action style of design would require over-provisioning the architecture with wider and deeper tables, and more actions and processing stages than are currently needed for just Ethernet. On the other hand, NoC-PP effectively implements a "use only what you need" design style: the architecture can be provisioned for exactly the type of processing it currently needs, and can be updated for future needs by adding or updating processing modules through partial reconfiguration. The NoC-PP design can begin by instantiating only Ethernet processing modules, while adding new processing modules later as more protocols are introduced.

Fig. 5: High-level representation of the module-based packet processor design. Packets are switched between processing modules corresponding to the protocols found in their headers. (Ingress packets enter modules I_0 … I_n, then pass through layer 2 (L2_0 … L2_m), layer 3 (L3_0 … L3_p) and layer 4 (L4_0 … L4_q) processing modules, with a full crossbar between each stage.)
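As a software analogy of this module-based flow (a sketch under our own simplifications; the actual modules are RTL blocks and the crossbars perform the dispatch), each module handles exactly one parse-graph node and simply names the next node to visit:

    # Software analogy of the module-based packet processor of Figure 5 (our
    # sketch; module internals here are stand-ins for the real hardware blocks).
    def ethernet_module(pkt):
        return "ipv4" if pkt["ethertype"] == 0x0800 else "ipv6"

    def ipv4_module(pkt):
        pkt["ttl"] -= 1
        return "tcp"

    def ipv6_module(pkt):
        pkt["hop_limit"] -= 1
        return "tcp"

    def tcp_module(pkt):
        pkt["ports_parsed"] = True
        return "egress"

    MODULES = {"eth": ethernet_module, "ipv4": ipv4_module,
               "ipv6": ipv6_module, "tcp": tcp_module}

    def process(pkt):
        node = "eth"                  # walk the parse graph, one module per node
        while node != "egress":
            node = MODULES[node](pkt)
        return pkt

    print(process({"ethertype": 0x0800, "ttl": 64}))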

B. The NoC-Crossbar

Interconnecting the processing modules is a challenging problem on current FPGA devices. Multiplexers, which are necessary for the full crossbars in Figure 5, are synthesized poorly on the FPGA's fabric, often consuming relatively high chip area and running considerably slower than they would if built in ASIC technology. A full crossbar that is wide enough to support the very high bandwidth of data entering modern packet processors would be very difficult, if not impossible, to synthesize efficiently in the FPGA fabric.

To address this problem, we draw inspiration from previous work that used an embedded NoC in an FPGA as a crossbar for an Ethernet switch [8, 9]. This NoC – described in Section II-A – can function as the full crossbar in our packet processor design. Not only is it already designed to transfer packets across the FPGA, it also includes a built-in flow control mechanism that can handle scenarios of adversarial traffic, such as prolonged bursts of packets. For example, if a certain processing module is busy and cannot accept packets from the NoC, the NoC will hold packets destined for that module in a buffer at the connecting router. Should that buffer become full, packets will then be buffered at downstream NoC routers. If the NoC cannot accept packets at one of its routers due to a buffer being full, then it can also send a backpressure signal to the modules connected to that router.

With a capacity of 180 Gb/s at each of its links, the NoC can transport high bandwidth data throughout the FPGA, to and from the processing modules in our design. Processing modules are connected to NoC routers, as illustrated in Figure 6, with the NoC's FabricPort used to bridge the frequency of the processing modules and the frequency of the NoC (see Section II-A). Moreover, we can combine multiple modules at a single router node using the FPGA's soft logic for arbitration. For the remainder of the paper, we shall refer to this NoC packet processor as "NoC-PP".

Fig. 6: The "NoC-PP" architecture: the embedded NoC serves as the crossbar for the module-based packet processor design. NoC routers and links are "hard", i.e. embedded in the chip, while the processing modules and arbitration logic are synthesized from the FPGA "soft" fabric.

C. Inter-Module Information Passing

Thus far, we have described a mechanism for processing protocols with modules that are independent of one another. However, packet processing requires at least some information to be passed from one protocol to another. As a bare minimum, information regarding where the next protocol's header is located in the packet is determined from the previous protocol's header and therefore must be passed to the next protocol's processing module (see Figure 3b). There are many other possible scenarios where information from a lower-level protocol may be used later in the processing stages. For example, in packet classification, a classification decision may be made after a combination of fields are parsed from each of the packet's headers.

To provide a mechanism for information-passing between processing modules, the NoC-PP design adds a blank header flit to every packet upon entering the processor. This header flit can hold any combination of parsed fields from the packet as it proceeds through the processor. Data stored in the header flit are maintained across processing modules until they are overwritten. For example, our design stores a 7-bit "data offset" field in this header flit that is updated by every processing module. This field tells the next processing module where its header is located in the packet. The header flit is removed just before the packet exits the processor.
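A minimal software model of this header flit is sketched below; only the 7-bit data-offset field is taken from the design, and the remaining structure is an illustrative assumption.

    # Minimal model of the information-passing header flit. Only the 7-bit
    # "data offset" field is described in the text; treating the rest of the
    # 150-bit flit as free-form scratch space is our assumption.
    FLIT_BITS = 150
    DATA_OFFSET_BITS = 7

    class InfoFlit:
        def __init__(self):
            self.data_offset = 0       # where the next module's header starts
            self.scratch = {}          # other parsed fields carried forward

        def set_data_offset(self, offset):
            assert 0 <= offset < (1 << DATA_OFFSET_BITS)
            self.data_offset = offset

    # An Ethernet module, for instance, would advance the offset past its own
    # 14-byte header (18 bytes with a VLAN tag) before the packet moves on.
    flit = InfoFlit()
    flit.set_data_offset(14)
    print(flit.data_offset)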
D. Design Example: Eth-IPv4/IPv6-TCP

In order to evaluate this design, we implemented a packet processor that supports processing of several common network protocols: Ethernet, VLAN, IPv4, IPv6, and TCP. Table I lists the features implemented in each of the protocol processing modules. Each packet going through the processor will visit a processing module for each protocol found in its header.

TABLE I: Functions implemented in each protocol processing module in our NoC-PP design.

Ethernet/VLAN module:
1. Parse MAC source and destination
2. Maintain a MAC table
3. Extract priority code identifier (PCP) in VLAN header
4. Determine layer 3 protocol from Ethertype

IPv4 module:
1. Compute checksum and drop packet if it results in error
2. Decrement time to live (TTL) and drop packet if zero is reached
3. Determine total length of header
4. Parse source and destination IP addresses
5. Determine layer 4 protocol

IPv6 module:
1. Decrement hop limit and drop packet if zero is reached
2. Parse source and destination IP addresses
3. Determine layer 4 protocol

TCP module:
1. Parse source and destination ports
2. When receiving a request to establish a connection, generate and send a reply ACK message over TCP/IP

For example, consider an Ethernet/IPv4/TCP packet. The packet is first brought onto the chip through the FPGA's transceivers, which perform clock recovery and serial-to-parallel data conversion. It is then transported to the embedded NoC through the FPGA's soft interconnect, where the FabricPort (Section II-A) performs clock conversion to bring the data from the slower FPGA fabric to the fast NoC. The NoC's links and routers steer the packet to a router connected to an Ethernet processing module, where the FabricPort again performs clock conversion to bring the packet back to the FPGA fabric. Once Ethernet processing is complete and the next layer protocol is determined, the packet is brought back to the NoC to be sent to an IPv4 processing module, and finally a TCP processing module, before being sent back out through the FPGA's transceiver.

The processing modules are designed with a data path width of 512 bits running at 215 MHz, providing an overall processing throughput of 100 Gb/s.² There are two possible ways for this design to support a higher network bandwidth: increase the supported throughput of the individual processing modules, or instantiate multiple instances of a processing module throughout the NoC. In our design, we duplicate the 100G processing modules to provide higher overall processing bandwidths, such as 400G and 800G (Figure 7). This ability to duplicate individual processing modules provides a key form of design flexibility not found in previously proposed packet processor designs. If a designer is aware of the expected frequencies of traffic of each of the different protocols, then he/she can duplicate the different processing modules to the appropriate degree. For example, if IPv4 packets are currently much more frequent than IPv6 packets, then the design could instantiate more IPv4 processing modules than IPv6. This design flexibility is explored in Section IV-C.

² Raw throughput is 110 Gb/s, with 10 Gb/s of capacity used for the information-passing header added to each packet.
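The arithmetic behind the 100G building block, and its duplication to 400G and 800G, is made explicit below using only the figures quoted above.

    # The 100G building block follows directly from the datapath parameters
    # given above (all numbers from the text; only the arithmetic is ours).
    raw_gbps = 512 * 215 / 1000.0            # 512-bit datapath at 215 MHz ~= 110 Gb/s
    usable_gbps = raw_gbps - 10              # minus the information-passing header
    print(round(raw_gbps), round(usable_gbps))              # 110, 100
    print(round(4 * usable_gbps), round(8 * usable_gbps))   # ~400G and ~800G designs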
Besides module duplication, the NoC-PP design also allows for the easy addition and/or removal of protocols, thanks to the logical and physical decoupling of the processing modules by the NoC. Say a new protocol has been developed and must be supported by the packet processor. A processing module for this protocol can then be designed and added to the NoC-PP by connecting it to any of the routers in the NoC, with little other modification to the rest of the design. Similarly, if a protocol no longer needs to be supported, then its processing modules can simply be removed from the design.

IV. EVALUATION

A. Simulation and Synthesis Setup

We use RTL2Booksim [17] to connect the packet processing modules to the embedded NoC, and perform a cycle-accurate simulation in ModelSim. This allows us to gather our performance data in cycle counts; however, we also need to find the operating frequency of our packet processing modules to be able to quantify the overall performance of NoC-PP. Furthermore, we want to realistically model any physical design consequences of connecting these modules to embedded routers (such as interconnection choke-points or possible frequency degradation). To do so, we emulate the existence of routers in an Altera FPGA (Stratix V 5SGSED8K1F40C2) by creating 16 design partitions – one for each router – that have the same size, location and interconnection flexibility as an embedded router. We then connect our packet processing modules to them the same way they are connected in NoC-PP, and compile the design in Quartus II v14.0 to collect the area and frequency results presented in this section.

B. Design Efficiency

We measure the hardware cost and performance of the NoC-PP design and compare it to Attig and Brebner's PP design. PP, however, only provides packet parsing functionality. It extracts fields from the packets but does not perform any form of action after the extraction, besides determining where the next header is located. Consequently, we also synthesized a modified version of the NoC-PP design that only contains parsing functionality, thus providing a fair comparison. We compare to two versions of the PP design: (1) the smallest, "JustEth", which only performs parsing on the Ethernet header, and (2) one of their biggest, "TcpIp4andIp6", which performs parsing on Ethernet, IPv4, IPv6 and TCP [4].

Table II contains hardware cost and performance results of the NoC-PP and PP designs. Hardware cost is measured using resource utilization as a percentage of an Altera Stratix V-GS FPGA. Attig and Brebner's experimental results, originally presented as a percentage of a Xilinx Virtex-7 870HT, are converted to equivalent Stratix V-GS numbers. To perform this conversion, we use equivalent logic element/logic cell counts on each device, which we have found to accurately reflect the logic capacity for both vendors across a large number of designs. The resource utilization results for the NoC-PP design also include the resources consumed by RAM blocks and the embedded NoC [8]. The table also contains results for the full packet processor described in Section III-D at 400G and 800G, referred to as "TcpIp4Ip6-Processor" (illustrated in Figure 7). Overall, the NoC-PP proves to be more resource efficient and achieves better performance compared to the PP architecture.
Fig. 7: The NoC-PP design for an Ethernet/VLAN/IPv4/IPv6/TCP packet processor (Eth = Ethernet + VLAN). Processing modules run at 100G, and are instantiated four times to support 400G processing (a), or eight times to support 800G processing (b).

TABLE II: Comparison of the NoC-PP and PP architectures.

Application                  | Architecture | Resource Utilization (% FPGA) | Latency (ns) | Throughput (Gb/s)
JustEth                      | NoC-PP       | 3.6%                          | 79           | 400
JustEth                      | PP [4]       | 11.6%                         | 293          | 343
TcpIp4Ip6                    | NoC-PP       | 9.4%                          | 200          | 400
TcpIp4Ip6                    | PP [4]       | 15.6%                         | 309          | 325
TcpIp4Ip6-Processor (400G)   | NoC-PP       | 14.4%                         | 230          | 400
TcpIp4Ip6-Processor (800G)   | NoC-PP       | 25.8%                         | 232          | 800

Fig. 8: Area breakdown of NoC-PP when using a hard NoC, a soft NoC or a soft custom crossbar (Xbar). (Bar chart: percentage of a Stratix V 5SGSD8, 0–70%, for PP and the three NoC-PP variants on JustEth and TcpIp4Ip6, split into interconnect and processing-module area.)

For the smaller application (JustEth), the NoC-PP design is 3.2× more efficient, whereas for the larger application (TcpIp4Ip6), it is 1.7× more efficient. NoC-PP also reduces latency by 3.7× and 1.5× compared to PP for JustEth and TcpIp4Ip6, respectively. Table II also shows that our full-featured packet processor is still more efficient than the more basic packet parser presented in prior work. Lastly, it is worth noting that the 800G design achieves a throughput greater than any previously reported packet processor built from an FPGA. Thus, the module-based NoC-PP architecture provides significant advantages for FPGA packet processors.

It is also important to determine what brings these efficiencies to NoC-PP; is it the new module-based packet processor architecture, the introduction of the hard NoC, or a synergistic fusion of the two? To answer this question, we began by replacing the hard NoC in our design with an equivalent soft NoC, and separately quantified the cost of the NoC and the processing modules. We also built another iteration of our design using three customized soft crossbars (as in the block diagram shown in Figure 5), each with a radix of eight and containing 10-flit-deep buffers at the ports in order to handle scenarios of backpressure. As can be seen in Figure 8, the costs of the soft NoC and the soft crossbar are 29× and 11× greater than that of the hard NoC, respectively. The significantly higher cost of building NoC-PP's interconnection network out of the reconfigurable FPGA fabric is due to the fact that it runs at a considerably lower clock frequency than the hard NoC and must therefore use wide datapaths to transport the high bandwidth data. Switching between these wide datapaths requires large multiplexers and wide buffers that consume high amounts of resources. The design therefore achieves significant savings by hardening this interconnect in the embedded NoC.

Since the processing modules form a small fraction of the design cost when using a soft interconnect, NoC-PP can achieve significant overall savings when replacing the soft interconnect with a hard NoC. The PP design, on the other hand, uses a feed-forward design style. Rather than switching between protocol modules, PP uses tables containing entries for all possible protocols that must be processed at that stage. Thus, no wide multiplexing exists in the design that can be efficiently replaced by a hard NoC. The logic and memory within each stage form the majority of PP's hardware cost, which would not change if a hard NoC was introduced. We therefore conclude that the efficiencies of NoC-PP stem from a synergistic fusion of using the hard NoC with our module-based packet processor architecture.

C. Design Flexibility

The highly modular design of NoC-PP allows for the easy addition and removal of protocol processing modules. This flexibility can be used to introduce new network protocols, modify protocol processing modules, and duplicate existing protocol modules in order to support higher bandwidths. The TcpIp4Ip6-Processor design example duplicates each 100G processing module four and eight times in order to fully support 400G and 800G processing, respectively, for each protocol. Depending on the application, full duplication of each processing module may not be necessary. It is possible, for example, that a low percentage of the traffic seen by the packet processor uses IPv6.³ The designer therefore has the flexibility to reduce the number of IPv6 processing modules in order to free up chip area for other modules.

³ As of July 2015, Google measures that only ∼7% of accesses to their website are through IPv6 [18].

To assess the impact of module duplication on performance, we use RTL2Booksim to simulate the design with varied degrees of duplication of the IPv6 processing module, swept across the percentage of the total traffic that is IPv6. A bursty, 400G traffic pattern is used to emulate real Internet traffic, with the packet payload size set to 512 bytes. The design is first simulated using a NoC with the same parameters (link width, mesh radix, router buffer depth) as those chosen in previous work [8]. Figure 9a illustrates the results. The "knee" in each curve indicates the maximum percentage of IPv6 that can be supported at 400G; after this point, the NoC saturates, causing the source to be frequently backpressured.

Fig. 9: Average packet latency (ns) through the NoC-PP design as a function of the percentage of total traffic using IPv6, for four different degrees of IPv6 module duplication. (a) NoC router buffer depth = 10 flits; (b) NoC router buffer depth = 64 flits. Four IPv6 modules (each running at 100G) are sufficient to fully support any rate of IPv6 traffic running at 400G.

Originally, it was expected that the design would be able to support up to a fraction of IPv6 packets equal to the number of IPv6 modules implemented divided by four, given it takes four modules to fully support a 100% IPv6 packet rate (i.e. 25% for one IPv6 module, 50% for two, etc.). However, we see in Figure 9a that the NoC saturates at a much earlier point in each curve. This can be attributed to the fact that the NoC is not an entirely non-blocking crossbar; packets compete for resources, and when one does not receive access to its desired link, it must be held in buffers within the NoC. In this case, the NoC routers have buffers that can hold 10 NoC flits [8], while each injected packet forms 40 NoC flits. Thus, when one packet is waiting for a NoC link to become free, it must be stored in four buffers, consequently congesting four upstream routers. In the case when four IPv6 modules are implemented, each injection point has its own IPv6 module for its IPv6 packets. Thus, there is no competition for resources in the NoC. As soon as there is a mismatch between the number of injection points and the number of IPv6 modules, packets may compete for resources when two are sent to one IPv6 module at the same time.
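The blocking effect can be summarized with the numbers above; the short calculation below (ours, not the simulator's model) shows why a stalled 40-flit packet occupies four 10-flit router buffers, and what the naive saturation points would be before accounting for that blocking.

    # Back-of-the-envelope view of the blocking effect described above, using
    # the numbers from the text (our simplification of the simulated behaviour).
    import math
    flits_per_packet = 40          # each injected packet forms 40 NoC flits
    buffer_depth = 10              # baseline router buffer depth [8]
    print(math.ceil(flits_per_packet / buffer_depth))   # a blocked packet spans 4 routers

    # Naive expectation for the supported IPv6 fraction at 400G, ignoring blocking:
    def expected_ipv6_fraction(ipv6_modules, modules_for_full_rate=4):
        return min(1.0, ipv6_modules / modules_for_full_rate)

    print([expected_ipv6_fraction(n) for n in (1, 2, 3, 4)])  # [0.25, 0.5, 0.75, 1.0]
    # Figure 9a shows saturation well below these values with 10-flit buffers.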
One way to mitigate this congestion is to increase the buffer depth in the NoC routers, as has been studied in previous work [19]. In our application, if a router's buffer could hold an entire packet, rather than just a quarter of one, it would prevent upstream routers from becoming congested as well. We test this theory by adjusting the simulated NoC buffer size to 64 flits, rather than 10. As can be seen in Figure 9b, increasing the buffer size mitigates the NoC's blocking nature, allowing it to saturate according to the availability of IPv6 processing bandwidth in the design.

The impact of increasing the buffer size of the NoC on its hardware cost is illustrated in Figure 10. Increasing NoC buffer size allows the NoC-crossbar to provide more switching capability for larger packets at a moderate hardware cost. However, this scaling technique requires architectural changes to an embedded NoC. Thus, it is crucial that a manufacturer of a NoC-enhanced FPGA appropriately provisions the NoC to serve potential applications. By looking at important FPGA networking applications such as switching [8, 9] and packet processing, we hope to better guide the architectural choices of an embedded NoC.

Fig. 10: Embedded NoC chip area, measured in equivalent Stratix LABs and as a fraction of a large Stratix V-GS, scaling with buffer depth given a fixed link width of 150 bits [20]. (Axes: area in Stratix LABs, 0–2000; area fraction of a large Stratix V, 0–5%; buffer depth, 0–70 flits.)

Overall, we see that the flexibility that NoC-PP provides to support reprogrammability for network evolution surpasses that of any conceivable ASIC-based packet processor. NoC-PP is only limited by the amount of resources available on the entire FPGA. Unlike the RMT design, NoC-PP is not limited by the table sizes and action set established upon chip fabrication. New actions and match field values can be reprogrammed into new or existing processing modules at any time. Modules can be modified, replaced and even duplicated by reconfiguring the FPGA fabric. Not only does NoC-PP manage to exceed the flexibility of the ASIC-based RMT, but it also matches and surpasses RMT's supported bandwidth of 640G.

V. DESIGN ENHANCEMENTS

A. Virtual Channel Priority Scheme

Virtual Channels (VCs) in a NoC are separate FIFO buffers located at every router port. They allow packets arriving at or being sent along a common physical link to be stored in separate buffers. The embedded NoC used in our design employs two VCs, as previous work has shown that NoC congestion is reduced by ∼30% when a second VC is used [21]. With two VCs, data flowing through the packet processor can be sorted into two groups of traffic. This can be especially useful for establishing a packet priority scheme. Packet priority is prevalent in many different modern network protocols, including VLAN and TCP. In NoC-PP, one VC can be reserved for packets given priority based on the contents of their headers, while all other traffic is routed along the other VC. Thus, priority packets can bypass non-priority packets when the NoC is congested, thereby giving priority packets a faster path through the design.
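A possible priority mapping is sketched below; the choice of the VLAN PCP field as the priority criterion and the threshold used are illustrative assumptions, not part of the embedded NoC.

    # Sketch of a two-VC priority policy (our illustration; which header bits
    # mark a packet as high priority is a design choice left to the application).
    HIGH_PRIORITY_VC, LOW_PRIORITY_VC = 0, 1

    def select_vc(vlan_pcp):
        """Map a VLAN priority code point (0-7) onto one of the two NoC VCs."""
        return HIGH_PRIORITY_VC if vlan_pcp >= 5 else LOW_PRIORITY_VC

    print(select_vc(6), select_vc(0))   # 0 1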

B. Partial Reconfiguration

A key advantage of OpenFlow's flow table architecture is the capability of updating processing rules "on-the-fly". In other words, the table entries can be updated with new match-action rules even while the processor is receiving packets. This is trickier in the NoC-PP architecture. Although NoC-PP may still contain tables that can perform soft updates, its decreased reliance on memory means that more of its processing has been moved to dedicated logic in the FPGA fabric. Typically, modifying an FPGA design requires complete reconfiguration of the device, during which the transceivers cannot accept any data. Making processing rule updates through a complete reconfiguration would require the packet processor to be effectively "paused".

Partial reconfiguration [22] presents a potential solution to this limitation. Unlike the monolithic PP design, NoC-PP has both logically and physically partitioned processing modules. The modules share a common interface and are decoupled by the NoC, making the design highly amenable to partial reconfiguration of individual processing modules. While soft flow table updates allow for processing rule changes, partial reconfiguration makes both rule and architecture changes possible. To perform any architecture changes to PP, such as modifying table sizes or the action set, a complete reconfiguration would be necessary. On the other hand, NoC-PP can partially reconfigure individual processing modules while the remaining modules remain operational. Thus, partial reconfiguration would open up the possibility of modules being updated, added or removed while the packet processor remains "live". An investigation into partial reconfiguration in a NoC-enhanced FPGA is beyond the scope of this paper and is left for future work.

VI. CONCLUSION

As network performance becomes ever more crucial to data processing and mobile applications, there is an ever-increasing demand on network infrastructures to evolve. The NoC-PP packet processor architecture proposed in this work focuses on maximizing hardware flexibility, thus providing a platform that is very amenable to network evolution. By using an FPGA augmented with an embedded NoC, we have proposed a modular packet processor design that is capable of being reprogrammed to support different combinations of actions and header field matches. Unlike ASIC-based OpenFlow architectures, it is not limited by the action set and table sizes fixed upon chip fabrication. Its flexibility significantly exceeds that of the RMT ASIC architecture, while matching, and even surpassing, its supported bandwidth (400G/800G vs. 640G). Compared to the best previously proposed FPGA-based packet processor, NoC-PP proves to be 1.7× and 3.2× more resource efficient, and achieves 1.5× and 3.7× lower latency on complex and simple applications, respectively.

The NoC-PP architecture is only possible given the inclusion of a hardened NoC embedded into an FPGA. This hard NoC is both fast and small, enabling easy cross-chip communication. Though an embedded NoC has yet to be adopted in FPGAs, we hope that the compelling applications explored in this and previous work [8, 9] present a convincing case for its inclusion in future FPGA devices. Such a NoC-enhanced FPGA could revolutionize SDN by paving the way for a fully-programmable data plane.

ACKNOWLEDGMENTS

This work is funded by Altera, NSERC and Vanier CGS.
REFERENCES

[1] B. Nunes et al., "A survey of software-defined networking: Past, present, and future of programmable networks," Communications Surveys & Tutorials, vol. 16, no. 3, pp. 1617–1634, 2014.
[2] N. McKeown et al., "OpenFlow: Enabling innovation in campus networks," SIGCOMM, vol. 38, no. 2, pp. 69–74, 2008.
[3] I. Kuon and J. Rose, "Measuring the gap between FPGAs and ASICs," TCAD, vol. 26, no. 2, pp. 203–215, 2007.
[4] M. Attig and G. Brebner, "400 Gb/s programmable packet parsing on a single FPGA," in ANCS. IEEE Computer Society, 2011, pp. 12–23.
[5] S. Byma et al., "FPGAs in the cloud: Booting virtualized hardware accelerators with OpenStack," in FCCM. IEEE, 2014, pp. 109–116.
[6] S. Zhou, Y. R. Qu, and V. K. Prasanna, "Large-scale packet classification on FPGA," in ASAP. IEEE, 2015.
[7] S. Mühlbach et al., "Malcobox: Designing a 10 Gb/s malware collection honeypot using reconfigurable technology," in FPL. IEEE, 2010, pp. 592–595.
[8] M. S. Abdelfattah, A. Bitar, and V. Betz, "Take the highway: Design for embedded NoCs on FPGAs," in FPGA. ACM, 2015, pp. 98–107.
[9] A. Bitar et al., "Efficient and programmable Ethernet switching with a NoC-enhanced FPGA," in ANCS. ACM, 2014, pp. 89–100.
[10] P. Bosshart et al., "Forwarding metamorphosis: Fast programmable match-action processing in hardware for SDN," in SIGCOMM, vol. 43, no. 4. ACM, 2013, pp. 99–110.
[11] G. Gibb et al., "Design principles for packet parsers," in ANCS. IEEE, 2013, pp. 13–24.
[12] J. Naous et al., "Implementing an OpenFlow switch on the NetFPGA platform," in ANCS. ACM, 2008, pp. 1–9.
[13] M. S. Abdelfattah and V. Betz, "Networks-on-Chip for FPGAs: Hard, Soft or Mixed?" TRETS, vol. 7, no. 3, pp. 20:1–20:22, 2014.
[14] M. S. Abdelfattah and V. Betz, "Power Analysis of Embedded NoCs on FPGAs and Comparison With Custom Buses," IEEE TVLSI, 2015.
[15] M. S. Abdelfattah and V. Betz, "The Case for Embedded Networks-on-Chip on Field-Programmable Gate Arrays," IEEE Micro, vol. 34, no. 1, pp. 80–89, 2014.
[16] Open Networking Foundation, "OpenFlow Switch Specification."
[17] M. S. Abdelfattah et al., "Design and simulation tools for embedded NoCs on FPGAs," in FPL. IEEE, 2015.
[18] Google IPv6 Connectivity Statistics. [Online]. Available: https://www.google.ca/intl/en/ipv6/statistics.html
[19] M. Coenen et al., "A buffer-sizing algorithm for networks on chip using TDMA and credit-based end-to-end flow control," in CODES. ACM, 2006, pp. 130–135.
[20] M. S. Abdelfattah, "NoC Designer," 2013. [Online]. Available: http://www.eecg.utoronto.ca/~mohamed/noc_designer.html
[21] M. S. Abdelfattah and V. Betz, "The Power of Communication: Energy-Efficient NoCs for FPGAs," in FPL, 2013, pp. 1–8.
[22] C. Kao, "Benefits of partial reconfiguration," Xcell Journal, vol. 55, pp. 65–67, 2005.
[23] Video Over IP Reference Design. [Online]. Available: www.altera.com