Design Patterns for Packet Processing Applications on Multi-Core IA Processors
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Control Plane Overview Subnet 1.2 Subnet 1.2
The Network Layer: Control Plane Overview Subnet 1.2 Subnet 1.2 R3 R2 R6 Interior Subnet 1.2 R5 Subnet 1.2 Subnet 1.2 R7 1. Routing Algorithms: Link-State, Distance Vector R8 R4 R1 Subnet 1.2 Dijkstra’s algorithm, Bellman-Ford Algorithm Exterior Subnet 1.2 Subnet 1.2 2. Routing Protocols: OSPF, BGP Raj Jain 3. SDN Control Plane Washington University in Saint Louis 4. ICMP Saint Louis, MO 63130 5. SNMP [email protected] Audio/Video recordings of this lecture are available on-line at: Note: This class lecture is based on Chapter 5 of the textbook (Kurose and Ross) and the figures provided by the authors. http://www.cse.wustl.edu/~jain/cse473-16/ Washington University in St. Louis http://www.cse.wustl.edu/~jain/cse473-16/ ©2016 Raj Jain Washington University in St. Louis http://www.cse.wustl.edu/~jain/cse473-16/ ©2016 Raj Jain 5-1 5-2 Network Layer Functions Overview Routing Algorithms T Forwarding: Deciding what to do with a packet using 1. Graph abstraction a routing table Data plane 2. Distance Vector vs. Link State T Routing: Making the routing table Control Plane 3. Dijkstra’s Algorithm 4. Bellman-Ford Algorithm Washington University in St. Louis http://www.cse.wustl.edu/~jain/cse473-16/ ©2016 Raj Jain Washington University in St. Louis http://www.cse.wustl.edu/~jain/cse473-16/ ©2016 Raj Jain 5-3 5-4 Rooting or Routing Routeing or Routing T Rooting is what fans do at football games, what pigs T Routeing: British do for truffles under oak trees in the Vaucluse, and T Routing: American what nursery workers intent on propagation do to T Since Oxford English Dictionary is much heavier than cuttings from plants. -
C5ENPA1-DS, C-5E NETWORK PROCESSOR SILICON REVISION A1
Freescale Semiconductor, Inc... SILICON REVISION A1 REVISION SILICON C-5e NETWORK PROCESSOR Sheet Data Rev 03 PRELIMINARY C5ENPA1-DS/D Freescale Semiconductor,Inc. F o r M o r G e o I n t f o o : r w m w a t w i o . f n r e O e n s c T a h l i e s . c P o r o m d u c t , Freescale Semiconductor, Inc... Freescale Semiconductor,Inc. F o r M o r G e o I n t f o o : r w m w a t w i o . f n r e O e n s c T a h l i e s . c P o r o m d u c t , Freescale Semiconductor, Inc... Freescale Semiconductor,Inc. Silicon RevisionA1 C-5e NetworkProcessor Data Sheet Rev 03 C5ENPA1-DS/D F o r M o r Preli G e o I n t f o o : r w m w a t w i o . f n r e O e n s c T a h l i e s . c P o r o m m d u c t , inary Freescale Semiconductor, Inc... Freescale Semiconductor,Inc. F o r M o r G e o I n t f o o : r w m w a t w i o . f n r e O e n s c T a h l i e s . c P o r o m d u c t , Freescale Semiconductor, Inc. C5ENPA1-DS/D Rev 03 CONTENTS . -
The Network Layer: Control Plane
CHAPTER 5 The Network Layer: Control Plane In this chapter, we’ll complete our journey through the network layer by covering the control-plane component of the network layer—the network-wide logic that con- trols not only how a datagram is forwarded among routers along an end-to-end path from the source host to the destination host, but also how network-layer components and services are configured and managed. In Section 5.2, we’ll cover traditional routing algorithms for computing least cost paths in a graph; these algorithms are the basis for two widely deployed Internet routing protocols: OSPF and BGP, that we’ll cover in Sections 5.3 and 5.4, respectively. As we’ll see, OSPF is a routing protocol that operates within a single ISP’s network. BGP is a routing protocol that serves to interconnect all of the networks in the Internet; BGP is thus often referred to as the “glue” that holds the Internet together. Traditionally, control-plane routing protocols have been implemented together with data-plane forwarding functions, monolithi- cally, within a router. As we learned in the introduction to Chapter 4, software- defined networking (SDN) makes a clear separation between the data and control planes, implementing control-plane functions in a separate “controller” service that is distinct, and remote, from the forwarding components of the routers it controls. We’ll cover SDN controllers in Section 5.5. In Sections 5.6 and 5.7 we’ll cover some of the nuts and bolts of managing an IP network: ICMP (the Internet Control Message Protocol) and SNMP (the Simple Network Management Protocol). -
Design and Implementation of a Stateful Network Packet Processing
610 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 25, NO. 1, FEBRUARY 2017 Design and Implementation of a Stateful Network Packet Processing Framework for GPUs Giorgos Vasiliadis, Lazaros Koromilas, Michalis Polychronakis, and Sotiris Ioannidis Abstract— Graphics processing units (GPUs) are a powerful and TCAMs, have greatly reduced both the cost and time platform for building the high-speed network traffic process- to develop network traffic processing systems, and have ing applications using low-cost hardware. The existing systems been successfully used in routers [4], [7] and network tap the massively parallel architecture of GPUs to speed up certain computationally intensive tasks, such as cryptographic intrusion detection systems [27], [34]. These systems offer a operations and pattern matching. However, they still suffer from scalable method of processing network packets in high-speed significant overheads due to critical-path operations that are still environments. However, implementations based on special- being carried out on the CPU, and redundant inter-device data purpose hardware are very difficult to extend and program, transfers. In this paper, we present GASPP, a programmable net- and prohibit them from being widely adopted by the industry. work traffic processing framework tailored to modern graphics processors. GASPP integrates optimized GPU-based implemen- In contrast, the emergence of commodity many-core tations of a broad range of operations commonly used in the architectures, such as multicore CPUs and modern graph- network traffic processing applications, including the first purely ics processors (GPUs) has proven to be a good solution GPU-based implementation of network flow tracking and TCP for accelerating many network applications, and has led stream reassembly. -
IS-IS Client for BFD C-Bit Support
IS-IS Client for BFD C-Bit Support The Bidirectional Forwarding Detection (BFD) protocol provides short-duration detection of failures in the path between adjacent forwarding engines while maintaining low networking overheads. The BFD IS-IS Client Support feature enables Intermediate System-to-Intermediate System (IS-IS) to use Bidirectional Forwarding Detection (BFD) support, which improves IS-IS convergence as BFD detection and failure times are faster than IS-IS convergence times in most network topologies. The IS-IS Client for BFD C-Bit Support feature enables the network to identify whether a BFD session failure is genuine or is the result of a control plane failure due to a router restart. When planning a router restart, you should configure this feature on all neighboring routers. • Finding Feature Information, on page 1 • Prerequisites for IS-IS Client for BFD C-Bit Support, on page 1 • Information About IS-IS Client for BFD C-Bit Support, on page 2 • How to Configure IS-IS Client for BFD C-Bit Support, on page 2 • Configuration Examples for IS-IS Client for BFD C-Bit Support, on page 3 • Additional References, on page 4 • Feature Information for IS-IS Client for BFD C-Bit Support, on page 4 Finding Feature Information Your software release may not support all the features documented in this module. For the latest caveats and feature information, see Bug Search Tool and the release notes for your platform and software release. To find information about the features documented in this module, and to see a list of the releases in which each feature is supported, see the feature information table. -
Embedded Multi-Core Processing for Networking
12 Embedded Multi-Core Processing for Networking Theofanis Orphanoudakis University of Peloponnese Tripoli, Greece [email protected] Stylianos Perissakis Intracom Telecom Athens, Greece [email protected] CONTENTS 12.1 Introduction ............................ 400 12.2 Overview of Proposed NPU Architectures ............ 403 12.2.1 Multi-Core Embedded Systems for Multi-Service Broadband Access and Multimedia Home Networks . 403 12.2.2 SoC Integration of Network Components and Examples of Commercial Access NPUs .............. 405 12.2.3 NPU Architectures for Core Network Nodes and High-Speed Networking and Switching ......... 407 12.3 Programmable Packet Processing Engines ............ 412 12.3.1 Parallelism ........................ 413 12.3.2 Multi-Threading Support ................ 418 12.3.3 Specialized Instruction Set Architectures ....... 421 12.4 Address Lookup and Packet Classification Engines ....... 422 12.4.1 Classification Techniques ................ 424 12.4.1.1 Trie-based Algorithms ............ 425 12.4.1.2 Hierarchical Intelligent Cuttings (HiCuts) . 425 12.4.2 Case Studies ....................... 426 12.5 Packet Buffering and Queue Management Engines ....... 431 399 400 Multi-Core Embedded Systems 12.5.1 Performance Issues ................... 433 12.5.1.1 External DRAMMemory Bottlenecks ... 433 12.5.1.2 Evaluation of Queue Management Functions: INTEL IXP1200 Case ................. 434 12.5.2 Design of Specialized Core for Implementation of Queue Management in Hardware ................ 435 12.5.2.1 Optimization Techniques .......... 439 12.5.2.2 Performance Evaluation of Hardware Queue Management Engine ............. 440 12.6 Scheduling Engines ......................... 442 12.6.1 Data Structures in Scheduling Architectures ..... 443 12.6.2 Task Scheduling ..................... 444 12.6.2.1 Load Balancing ................ 445 12.6.3 Traffic Scheduling ................... -
Packet Processing at Wire Speed Using Network Processors
Packet processing at wire speed using Network processors Chethan Kumar and Hao Che University of Texas at Arlington {ckumar, hche}@cse.uta.edu Abstract 1 Introduction Recent developments in fiber optics and the new The modern day Internet has seen an explosive bandwidth hungry applications have put more stress growth of applications being used on it. As more and on the active components (switches, routers etc.,) of a more applications are being developed, there is an network. Optical fiber bandwidth is no longer a increase in the amount of load put on the internet. At constraint for increasing the network bandwidth. the same time the fiber optics bandwidth has However, the processing power of the network has not increased dramatically to meet the traffic demand, but scaled upto the increase in the fiber bandwidth. the present day routers have limited processing power Communication industry is looking forward for more to handle this profound demand increase. Hence the innovative ways of designing router1 architecture and networking and telecommunications industry is research is being conducted to develop a scalable, compelled to look for new solutions for improving flexible and cost-effective architecture for routers. A the performance and the processing power of the successful outcome of this effort is a specialized routers. processor called Network processor. Network processor provides performance at hardware speeds One of the industry’s solutions to the challenges while attaining the flexibility of software. Network posed by the increased demand for the processing processors from different vendors employ different power is programmable functional units grouped into architectures and the choice of a particular type of a processor called Application Specific Instruction network processor can affect the architecture of the Processor (ASIP) or Network processor (NP)2. -
Packet Processing Execution Engine (PROX) - Performance Characterization for NFVI User Guide
USER GUIDE Intel Corporation Packet pROcessing eXecution Engine (PROX) - Performance Characterization for NFVI User Guide Authors 1 Introduction Yury Kylulin Properly designed Network Functions Virtualization Infrastructure (NFVI) environments deliver high packet processing rates, which are also a required dependency for Luc Provoost onboarded network functions. NFVI testing methodology typically includes both Petar Torre functionality testing and performance characterization testing, to ensure that NFVI both exposes the correct APIs and is capable of packet processing. This document describes how to use open source software tools to automate peak traffic (also called saturation) throughput testing to characterize the performance of a containerized NFVI system. The text and examples in this document are intended for architects and testing engineers for Communications Service Providers (CSPs) and their vendors. This document describes tools used during development to evaluate whether or not a containerized NFVI can perform the required rates of packet processing within set packet drop rates and latency percentiles. This document is part of the Network Transformation Experience Kit, which is available at https://networkbuilders.intel.com/network-technologies/network-transformation-exp- kits. 1 User Guide | Packet pROcessing eXecution Engine (PROX) - Performance Characterization for NFVI Table of Contents 1 Introduction ................................................................................................................................................................................................................ -
Lightweight Internet Protocol Stack Ideal Choice for High-Performance HFT and Telecom Packet Processing Applications Lightweight Internet Protocol Stack
GE Intelligent Platforms Lightweight Internet Protocol Stack Ideal Choice for High-Performance HFT and Telecom Packet Processing Applications Lightweight Internet Protocol Stack Introduction The number of devices connected to IP networks will be nearly three times as high as the global population in 2016. There will be nearly three networked devices per capita in 2016, up from over one networked device per capita in 2011. Driven in part by the increase in devices and the capabilities of those devices, IP traffic per capita will reach 15 gigabytes per capita in 2016, up from 4 gigabytes per capita in 2011 (Cisco VNI). Figure 1 shows the anticipated growth in IP traffic and networked devices. The IP traffic is increasing globally at the breath-taking pace. The rate at which these IP data packets needs to be processed to ensure not only their routing, security and delivery in the core of the network but also the identification and extraction of payload content for various end-user applications such as high frequency trading (HFT), has also increased. In order to support the demand of high-performance IP packet processing, users and developers are increasingly moving to a novel approach of combining PCI Express packet processing accelerator cards in a standalone network server to create an accelerated network server. The benefits of using such a hardware solu- tion are explained in the GE Intelligent Platforms white paper, Packet Processing in Telecommunications – A Case for the Accelerated Network Server. 80 2016 The number of networked devices -
High Level Synthesis for Packet Processing Pipelines Cristian
High Level Synthesis for Packet Processing Pipelines Cristian Soviani Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences COLUMBIA UNIVERSITY 2007 c 2007 Cristian Soviani All Rights Reserved ABSTRACT High Level Synthesis for Packet Processing Pipelines Cristian Soviani Packet processing is an essential function of state-of-the-art network routers and switches. Implementing packet processors in pipelined architectures is a well-known, established technique, albeit different approaches have been proposed. The design of packet processing pipelines is a delicate trade-off between the desire for abstract specifications, short development time, and design maintainability on one hand and very aggressive performance requirements on the other. This thesis proposes a coherent design flow for packet processing pipelines. Like the design process itself, I start by introducing a novel domain-specific language that provides a high-level specification of the pipeline. Next, I address synthesizing this model and calculating its worst-case throughput. Finally, I address some specific circuit optimization issues. I claim, based on experimental results, that my proposed technique can dramat- ically improve the design process of these pipelines, while the resulting performance matches the expectations of hand-crafted design. The considered pipelines exhibit a pseudo-linear topology, which can be too re- strictive in the general case. However, especially due to its high performance, such an architecture may be suitable for applications outside packet processing, in which case some of my proposed techniques could be easily adapted. Since I ran my experiments on FPGAs, this work has an inherent bias towards that technology; however, most results are technology-independent. -
Evaluating the Power of Flexible Packet Processing for Network Resource Allocation Naveen Kr
Evaluating the Power of Flexible Packet Processing for Network Resource Allocation Naveen Kr. Sharma, Antoine Kaufmann, and Thomas Anderson, University of Washington; Changhoon Kim, Barefoot Networks; Arvind Krishnamurthy, University of Washington; Jacob Nelson, Microsoft Research; Simon Peter, The University of Texas at Austin https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/sharma This paper is included in the Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’17). March 27–29, 2017 • Boston, MA, USA ISBN 978-1-931971-37-9 Open access to the Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation is sponsored by USENIX. Evaluating the Power of Flexible Packet Processing for Network Resource Allocation Naveen Kr. Sharma∗ Antoine Kaufmann∗ Thomas Anderson∗ Changhoon Kim† Arvind Krishnamurthy∗ Jacob Nelson‡ Simon Peter§ Abstract of the packet header, perform simple computations on Recent hardware switch architectures make it feasible values in packet headers, and maintain mutable state that to perform flexible packet processing inside the net- preserves the results of computations across packets. Im- work. This allows operators to configure switches to portantly, these advanced data-plane processing features parse and process custom packet headers using flexi- operate at line rate on every packet, addressing a ma- ble match+action tables in order to exercise control over jor limitation of earlier solutions such as OpenFlow [22] how packets are processed and routed. However, flexible which could only operate on a small fraction of packets, switches have limited state, support limited types of op- e.g., for flow setup. FlexSwitches thus hold the promise erations, and limit per-packet computation in order to be of ushering in the new paradigm of a software defined able to operate at line rate. -
Network Processors: Building Block for Programmable Networks
NetworkNetwork Processors:Processors: BuildingBuilding BlockBlock forfor programmableprogrammable networksnetworks Raj Yavatkar Chief Software Architect Intel® Internet Exchange Architecture [email protected] 1 Page 1 Raj Yavatkar OutlineOutline y IXP 2xxx hardware architecture y IXA software architecture y Usage questions y Research questions Page 2 Raj Yavatkar IXPIXP NetworkNetwork ProcessorsProcessors Control Processor y Microengines – RISC processors optimized for packet processing Media/Fabric StrongARM – Hardware support for Interface – Hardware support for multi-threading y Embedded ME 1 ME 2 ME n StrongARM/Xscale – Runs embedded OS and handles exception tasks SRAM DRAM Page 3 Raj Yavatkar IXP:IXP: AA BuildingBuilding BlockBlock forfor NetworkNetwork SystemsSystems y Example: IXP2800 – 16 micro-engines + XScale core Multi-threaded (x8) – Up to 1.4 Ghz ME speed RDRAM Microengine Array Media – 8 HW threads/ME Controller – 4K control store per ME Switch MEv2 MEv2 MEv2 MEv2 Fabric – Multi-level memory hierarchy 1 2 3 4 I/F – Multiple inter-processor communication channels MEv2 MEv2 MEv2 MEv2 Intel® 8 7 6 5 y NPU vs. GPU tradeoffs PCI XScale™ Core MEv2 MEv2 MEv2 MEv2 – Reduce core complexity 9 10 11 12 – No hardware caching – Simpler instructions Î shallow MEv2 MEv2 MEv2 MEv2 Scratch pipelines QDR SRAM 16 15 14 13 Memory – Multiple cores with HW multi- Controller Hash Per-Engine threading per chip Unit Memory, CAM, Signals Interconnect Page 4 Raj Yavatkar IXPIXP 24002400 BlockBlock DiagramDiagram Page 5 Raj Yavatkar XScaleXScale