Packet Processing at Wire Speed Using Network Processors

Chethan Kumar and Hao Che
University of Texas at Arlington
{ckumar, hche}@cse.uta.edu

Abstract

Recent developments in fiber optics and new bandwidth-hungry applications have put more stress on the active components (switches, routers, etc.) of a network. Optical fiber bandwidth is no longer a constraint for increasing the network bandwidth. However, the processing power of the network has not scaled up with the increase in fiber bandwidth. The communication industry is looking for more innovative ways of designing router architecture, and research is being conducted to develop a scalable, flexible and cost-effective architecture for routers. A successful outcome of this effort is a specialized processor called the Network processor. A Network processor provides performance at hardware speeds while attaining the flexibility of software. Network processors from different vendors employ different architectures, and the choice of a particular type of network processor can affect the architecture of the router and the performance of the whole system. Selecting the optimal design for a router architecture with a particular type of network processor can be very difficult. A systematic modeling framework has to be developed to analyze the impact of various design choices on the system performance. This framework should be simple, efficient and easy to comprehend. In this paper, we provide a survey of the ongoing research work in the network processing field: the problems faced in processing packets at wire speed, some of the solutions developed to address these problems, router and network processor architectures, and simulation and analysis tools for routers and network processors.

(The term "router" here also covers multi-service switches that include multiple Asynchronous Transfer Mode (ATM) and frame-relay interfaces.)

1 Introduction

The modern day internet has seen an explosive growth of applications being used on it. As more and more applications are developed, there is an increase in the amount of load put on the internet. At the same time, fiber optic bandwidth has increased dramatically to meet the traffic demand, but present day routers have limited processing power to handle this profound increase in demand. Hence the networking and telecommunications industry is compelled to look for new solutions for improving the performance and the processing power of routers.

One of the industry's solutions to the challenges posed by the increased demand for processing power is programmable functional units grouped into a processor called an Application Specific Instruction Processor (ASIP) or Network processor (NP), also referred to as a Network Processing Unit (NPU). NPs offer ease of programming with high scalability. They offer dedicated processing power to the routers for performing standard RFC-compliant packet processing tasks while allowing the slower, higher level control and management tasks to be performed in the general purpose CPU. It is this separation of tasks which allows the router to harness the full power of the NP. To be able to optimize the performance of an NP there should be clear guidelines for dividing the tasks (also called function partitioning) between the NP and the general purpose processor. Further, the tasks that can be executed in the NP can be split between the slow CPU and the fast processing elements within the NP. (Different vendors use different names for processing elements: Intel uses the term "Micro Engine", while AMCC calls it an "nPCore".)

Different vendors use different architectures to design Network processors. An NP's architecture can affect its performance and thereby the performance of the router as a whole. Benchmarking the performance of NPs based on different architectures will help system integrators to choose the right kind of NP for their routers. However, just choosing the right kind of NP will not be sufficient to boost the performance of a router; there have to be several other components working in tandem with the NP. The individual abilities of the hardware components, along with their interactions, can affect the performance of the whole system, which can be much different from what is anticipated. Good modeling frameworks to analyze and quantify the effect of individual components and their interactions will make the choice of design for a system a lot easier.

This survey paper is an effort to give an overview of ongoing research efforts in the network processing field. The paper is organized as follows: in Sec. 2 we mention some of the common packet processing tasks and explain how these tasks can be partitioned. This gives us a big picture of the role of the NP and the division of tasks among different processors in a router. In Sec. 3 we describe how the packet processing tasks are mapped to physical components and study some of the architectural solutions for designing a router. In Sec. 4 we discuss some of the techniques to evaluate the performance of routers based on their designs. Here we study different modeling frameworks from a system-level perspective. These frameworks are helpful not only to analyze the capabilities of the individual components, but also to quantify the effect of the interactions of the components on the system architecture. In Sec. 5 we talk about NPs, different types of NPs and the programming models for NPs. We describe some of the issues in processing packets at line speed and mention some of the techniques to address these issues. In Sec. 6 we explain two of the important characteristics of NPs – multithreading and pipelining. In this section we also describe different types of pipelining techniques and their advantages and disadvantages. A good understanding of multithreading and pipelining is very important to analyze NP architecture. In Sec. 7 we mention some of the tools to model and analyze the performance of NPs. Analysis of different architectures and comparison of their effects on the performance of an NP can be very helpful to designers and system integrators in evaluating different NP architectures and choosing the right one for their system. In Sec. 8 we try to address one of the most important issues which can cause undeterministic behavior of the NP – memory access latency. Memory access latencies can prevent NPs from achieving wire speed (also called line speed) processing. We look at some of the solutions to the memory access latency problem. Finally, we conclude our discussion in Sec. 9.

2 Function Partitioning

Fig 1: Packet processing tasks – control plane: policy applications, network management, signaling, topology management; forwarding plane: queuing/scheduling, data transformation, classification, data parsing, media access control.

The key to an efficient design for a router is understanding the nature of the packet processing tasks and dividing the tasks into functional components. Packet processing tasks can be broadly categorized into:

• Forwarding plane (data path) tasks – the group of tasks on the forwarding path of a router. These include receiving, processing and transmitting the packets.

• Control plane (control path) tasks – the group of tasks which involve the control and management operations. These comprise maintaining the routing table, ICMP packet processing, building up the routing tree, network monitoring and management tasks.

A detailed description of these packet processing tasks can be found in [6] [7] & [14].

As trivial as it sounds, partitioning the tasks into control plane and data plane has been one of the biggest challenges in network processing. Researchers are trying to come up with a framework for partitioning the packet processing tasks. One such effort is being conducted by the ForCES (Forwarding and Control Element Separation) [3] working group of the IETF. The ForCES working group is trying to come up with an architectural framework for the data plane and control plane separation, identifying the associated entities in each of these planes and the interactions among them.

In this framework, the network entity (such as a router) is subdivided into two logical sub-elements known as the Control Element (CE) and the Forwarding Element (FE). Forwarding elements can be hardware based (ASIC), programmable (Network processor) or software based (implemented with a general purpose CPU). The Forwarding Element handles all the data plane tasks shown in fig 1. Control Elements are based on general purpose CPUs implementing the control plane tasks. Separating the tasks into data plane and control plane functions and having them implemented in separate hardware has some advantages. Standards-based specifications can evolve for these components and vendors can specialize in developing the components. Standards-based components can interoperate with one another and allow systems integrators to choose the best components for their products. Scalability could be easily achieved by just adding additional components to the system to improve the performance.

The physical separation of CE and FE can be achieved at the blade level [3] or at the box level [3]. Blade level architectures are used in chassis based solutions where the interconnectivity between the CE and FE can be provided by a switch fabric. In the case of a box solution, CEs and FEs can be implemented using separate boxes and interconnected using high speed LAN technologies like Gigabit Ethernet. ForCES is an effort to develop a standard protocol for intercommunication between FEs and CEs.

One of the hidden outcomes of the separation of control plane and data plane is the ability to forward packets even in the case of CE failure and/or restart. Since the CE and FE association is dynamic, availability can be increased through mechanisms such as graceful restart [3].

A similar framework [4] has been developed by Intel for their IXP series of network processors. However, this framework is limited to Intel network processors.

Data path tasks can be further partitioned into slow and fast data path functions [12] [14]. Invoking slow data path functions for a packet results in slow data path forwarding of that packet. The purpose of slow data path forwarding is to process packets which need special treatment and more resources. The slow data path functions may include, for example, packet fragmentation and options field processing. Fast data path functions include IP validation, IP header lookup, firewall/policy filtering, and MPLS label swapping.
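To make the split concrete, the sketch below shows one way a line card's dispatch logic might steer packets between the two paths. It is a minimal illustration in C under assumed helpers (ip_validate, policy_filter, slow_path_enqueue and the pkt fields are hypothetical, not part of any vendor API); real NPs implement this selection in microcode or hardware classifiers.

/* Minimal sketch of fast/slow data path dispatch on an NP; all
 * helper functions and fields are hypothetical placeholders.       */
#include <stdbool.h>
#include <stdint.h>

struct pkt {
    bool     has_options;   /* IP options present                   */
    bool     is_fragment;   /* needs fragmentation/reassembly       */
    uint32_t dst_addr;
};

/* Fast path primitives: validation, lookup, filtering, label swap. */
extern bool ip_validate(const struct pkt *p);
extern int  ip_header_lookup(uint32_t dst);
extern bool policy_filter(const struct pkt *p);
extern void mpls_label_swap(struct pkt *p, int next_hop);

/* Slow path: exceptional packets are handed to the local CPU.      */
extern void slow_path_enqueue(struct pkt *p);

void dispatch(struct pkt *p)
{
    /* Packets needing special treatment leave the fast path early. */
    if (p->has_options || p->is_fragment) {
        slow_path_enqueue(p);
        return;
    }
    if (!ip_validate(p) || !policy_filter(p))
        return;                       /* drop                       */
    mpls_label_swap(p, ip_header_lookup(p->dst_addr));
}

The point the sketch makes is that the fast path performs only fixed, bounded work per packet, while anything irregular is diverted to the slow path so that it cannot exhaust the fast path's per-packet budget.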
3 Mapping packet processing tasks to physical components

There are three major approaches to mapping the packet processing tasks to the physical components in a router, depending on the platform used for performing the tasks. The main tradeoff in these approaches is speed against flexibility. The pure hardware approach uses configurable ASICs, the pure software approach uses programmable general purpose processors, and the hybrid approach uses programmable components.

The pure hardware approach [5] uses ASICs optimized for the data path functions. This approach gives the highest achievable performance for the data path functions. However, it takes relatively longer to develop the ASICs, which means a long time-to-market. The cost of development and enhancement is very significant: fixing a bug or adding functionality takes long cycles and is very resource intensive.

In the pure software approach, the data path functions are performed entirely in the software domain [5] using general purpose CPUs. This approach is optimized for maximum programmability. It has a comparatively shorter time-to-market and maximum reusability. However, there are a lot of scalability issues with this approach.

The hybrid approach [5] uses the best features of both the hardware and software domains mentioned above. The performance is comparable to that of ASICs while achieving a high level of flexibility. Data path functions that do not require flexibility are implemented using dedicated hardware components (also called co-processors). Modifications to existing application protocols can be easily accommodated by reprogramming the programmable components (also called processing elements). The forwarding plane functions shown in fig. 1 are performed in hardware and software in a Network processor (a dedicated programmable processor). The control plane functions are implemented in software, either in an embedded microprocessor or in a dedicated general purpose CPU.

Fast data path functions are handled by either the NP and its coprocessors (special purpose processors handling specific tasks like routing table lookup, next-hop lookup, policy filtering (TCAMs), encryption/decryption (IPSec coprocessors) and deep content inspection), ASICs or a software program, depending on the type of approach, while the slow data path functions are performed by either the local CPU, the control card, or even an embedded CPU available in some NPs. The purpose of slow data path forwarding is to offload from the NP the processing and resource load of those packets which need special treatment.

Packet processing tasks can also be centralized or distributed. In a centralized architecture, all the packet processing functions are performed by a central processing element. In a distributed architecture, the intelligence is distributed to the line cards. A line card typically contains a transceiver, a framer device, a network processor (for data plane tasks), a traffic scheduler and a CPU (for control plane tasks). A detailed description of these components can be found in [10] & [11].
4 Tools and methodologies for router performance analysis

The performance of a router is affected not just by its individual components, but also by the interactions between them. Modern day routers contain network processors which employ parallelism to increase the overall latency budget available to process packets (latency here is the amount of time available to process a packet before the next packet arrives). Changes in the workload behavior can greatly affect the interactions among the components. For example, under the worst-case traffic condition where minimum sized packets arrive back-to-back, the contention for external memory access by multiple processing elements in an NPU can lead to undeterministic latency behavior by the line card.

Another example is the thread scheduling algorithms used in the NPs. To understand this, let us explain what may happen when an NP is overloaded. Since, on the inbound side of the NP, there is virtually no buffer available to hold backlogged packets, any inbound packet which fails to grab an NP thread is dropped. On the other hand, an outbound packet that fails to grab an NP thread can be back-pressured to the switch fabric memory or to the queue management module in the line card without being dropped.

The difficulty lies in the fact that most of today's NP solutions adopt either a system-agnostic thread scheduling algorithm (e.g., the Intel IXP1200 uses a round-robin thread scheduling algorithm) or a simple strict priority based algorithm (e.g., the AMCC nP7120 assigns strict priority to threads assigned inbound packets over threads assigned outbound packets). These algorithms generally fail to allow graceful service degradation when the NP is overloaded. On one hand, system-agnostic algorithms tend to cause significant inbound packet dropping when the NP is overloaded. On the other hand, strict priority based algorithms may indiscriminately back-pressure best-effort traffic as well as real-time traffic, causing excessive delay or loss to the real-time traffic. In this context, it becomes necessary to analyze the performance of the network processor in conjunction with the several other hardware components constituting a router.

To achieve an effective system design, a good system-level modeling technique is required. This modeling technique should be able to capture the application and workload characteristics and also quantify the impact of the component interactions on the system performance. There are several techniques to measure the performance of the individual components. However, only a few modeling frameworks have been developed to analyze the performance of a system as a whole. Some of these techniques are explained below.

System-level modeling frameworks can be broadly categorized into three groups [45]:

• Measurement-based – this approach uses the actual system and software implementation to study the performance. This approach has several limitations [45]: it works only for existing systems, the designer is forced to work with pre-configured hardware settings, and it does not depict the component interactions. This approach may have limited capabilities for a thorough analysis of new system designs.

• Analytical modeling – this approach uses analytical methods to model the system. However, it is too complex to model the application behavior using this approach.

• Simulation – this approach is better suited for system level modeling and can integrate application and workload behavior as well. Each system component is modeled individually and placed in a discrete event simulator. Work flows through each component model as events representing the interactions among all the components modeled in the system (see the sketch below). With this kind of model it is easy to identify the bottlenecks in the system and rectify the problem.
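The following toy discrete event-driven simulator shows the structure just described: component models exchange timestamped events through a scheduler. All components, latencies and names are illustrative assumptions, not part of a real framework such as [45] or [46].

/* Toy discrete event-driven simulator: each component is a handler
 * and work flows between components as timestamped events.         */
#include <stdio.h>

#define MAX_EVENTS 1024

typedef void (*component_fn)(double now);

struct event { double time; component_fn handler; };

static struct event queue[MAX_EVENTS];
static int n_events = 0;

static void schedule(double time, component_fn handler)
{
    if (n_events >= MAX_EVENTS) return;  /* sketch-level guard      */
    queue[n_events].time = time;         /* unsorted; popped by scan */
    queue[n_events].handler = handler;
    n_events++;
}

static void memory_model(double now)
{
    printf("%8.1f ns: memory access completes\n", now);
}

static void pe_model(double now)         /* processing-element model */
{
    printf("%8.1f ns: PE issues memory read\n", now);
    schedule(now + 55.0, memory_model);  /* assumed 55 ns DRAM latency */
}

int main(void)
{
    schedule(0.0, pe_model);             /* packet arrival starts work */
    while (n_events > 0) {               /* pop earliest event         */
        int min = 0;
        for (int i = 1; i < n_events; i++)
            if (queue[i].time < queue[min].time) min = i;
        struct event ev = queue[min];
        queue[min] = queue[--n_events];
        ev.handler(ev.time);
    }
    return 0;
}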
The main difference between Measurement-based – this approach uses the actual [45] and [46] is that [45] uses the actual application system and software implementation to study the code to model the various hardware components performance. This approach has several limitations along the pipeline whereas [46] uses models for both [45]. It works only for existing systems. A designer is hardware components and the program code forced to work with pre-configured hardware settings, representing the application. and it does not depict the component interactions. This approach may have limited capabilities for a thorough Another tool that can be used for performance analysis of new system designs. analysis of system level design for routers is parallel object-oriented specification language (POOSL) [27] Analytical modeling – this approach uses analytical [29]. [27] uses performance modeling for developing methods to model the system. However, it is too an executable model of a system under study. complex to model the application behavior using this POOSL is a modeling language that can be used for approach. analyzing the properties of real-time distributed hardware/software systems. It provides primitives for Simulation approach - this approach is more suited for describing both behavior and architecture of a system level modeling and can integrate application system. POOSL is equipped with a formal (i.e., and work load behavior as well. Each system mathematically defined) semantics, which can be component is modeled individually and placed in a used to apply analytical techniques to analyze the discrete event simulator. Work flows through each performance. component model as events representing the interactions among all the components modeled in the A similar tool for analyzing router architecture is system. By having this kind of model it is easy to “Click-modular router” [31] [32]. Click-modular identify the bottlenecks in the system and rectify the router uses an object – based description of system problem. hardware and can determine maximum packet flow through the system and the overall resource Discrete event-driven simulators [45] [46] can be used utilization for a given traffic model. It is designed to to develop a modeling framework that can give a primarily run on Linux systems. More details about system level performance measure. Countach [45] is a click can be found in [31] & [32]. performance modeling framework that tightly [50] uses an analytical framework for modeling and interconnected by a configurable network. A manager analyzing the impact of technology on the cost- is used to configure the coprocessors and to handle performance tradeoffs in distributed-router the interconnection between them. The manager also architectures. controls the memory access, and the instruction set used by the coprocessors. Coprocessors perform a predefined set of functions (e.g., longest prefix 5 Network processors matching or packet classification). The manager co- ordinates the functioning of the coprocessors and sets Network processors use the hybrid approach for the path for the packet flow through the coprocessors. implementing fast data path functions. With their flexibility, scalability and shorter time-to-market, This type of NP is optimized for a limited set of data network processors are finding their place in virtually path functions. The advantage of this type of NP is all communications equipment today. 
[1], [2], [5], [6], that the coprocessors can be designed for high [11] & [26] gives a detailed description of various performance. The disadvantage is that this approach network processors available from different vendors in limits the adaptation of NPs for new applications and the market today. protocols and reduces the NPs’ time-in-market. NPs with VLIW architecture are considered to be There are several architectures [10] [14] used for configurable NPs [10]. designing a network processor. Based on the type of processors used for performing data path functions, • Programmable NP [10] - This type of NP consists NPs can be broadly classified into two categories: of a main controller and task units that are interconnected by a central switch fabric. A task unit • Multiple Instruction Multiple Data (MIMD) [10] can be a cluster of (one or more) RISC processors or – several processors perform the fast data path a special-purpose coprocessor. The controller tasks in parallel. Multiple Reduced Instruction Set controls the interactions among the RISC processors, Computing (RISC) processors are used in the core the coprocessors, memory, and the switch fabric. It of the network processor. They are connected to also loads the instruction set for each RISC shared buffer memory and I/O devices through processor. common BUS. Inter-process communication and access to external memory can become a bottleneck This approach offers great flexibility due to its for processing packets at wire speed for higher line adaptation to new applications and protocols. The rates. RISC processors can be programmed to perform different functions and the processing order can be • Very Long Instruction Word (VLIW) [10] – This programmed. The disadvantage is that the design of is similar to MIMD architecture except that it uses the interconnection between fabric, RISC processors, multiple special purpose processors called co- and co-processors cannot be optimized for all processors to perform different fast data path tasks functions. As a result, the latency budget for some like IP table lookup and packet classification. functions may exceed the worst-case requirement7 These co-processors are designed for specific tasks and achieving the wire-speed processing may be and can give better performance for the particular difficult. MIMD processing architectures can be task. However since they are function-specific they included in this category. have limited flexibility and portability.

NPs can also be classified based on their architecture and the type of embedded processors [10].

• Configurable NP [10] – this type of NP consists of multiple special-purpose coprocessors which are interconnected by a configurable network. A manager is used to configure the coprocessors and to handle the interconnection between them. The manager also controls the memory access and the instruction set used by the coprocessors. The coprocessors perform a predefined set of functions (e.g., longest prefix matching or packet classification). The manager coordinates the functioning of the coprocessors and sets the path for the packet flow through the coprocessors. This type of NP is optimized for a limited set of data path functions. The advantage of this type of NP is that the coprocessors can be designed for high performance. The disadvantage is that this approach limits the adaptation of NPs to new applications and protocols and reduces the NP's time-in-market. NPs with VLIW architecture are considered to be configurable NPs [10].

• Programmable NP [10] – this type of NP consists of a main controller and task units that are interconnected by a central switch fabric. A task unit can be a cluster of (one or more) RISC processors or a special-purpose coprocessor. The controller controls the interactions among the RISC processors, the coprocessors, memory, and the switch fabric. It also loads the instruction set for each RISC processor. This approach offers great flexibility due to its adaptability to new applications and protocols: the RISC processors can be programmed to perform different functions, and the processing order can be programmed. The disadvantage is that the design of the interconnection between the fabric, RISC processors and co-processors cannot be optimized for all functions. As a result, the latency budget for some functions may exceed the worst-case requirement (the time available to transmit a minimum sized packet) and achieving wire-speed processing may be difficult. MIMD processing architectures can be included in this category.

5.1 Programming model for Network Processors

While it is true that the design of the hardware influences the ease of programming and the performance level, it is really the software that determines the flexibility, simplicity and scalability of the network processors. The level of programmability determines the extent to which the power of the hardware components can be utilized. There are several choices for programming [7] the network processors:

• Microcode programming – all the forwarding plane functions are implemented in application specific hardware using microcode programming. It is implemented in multiple instances of the processing elements running in parallel, with each processing element using multi-threading. Fixed schedulers distribute the incoming packets to the next available processing element. However, this requires the programmer to be knowledgeable about everything, including the processing latency of each processing element, the memory access latency and the synchronization among the threads. The advantages of microcode programming are the efficiency and the compactness of the code. The drawbacks include the difficulty of programming in low level machine code, the understanding of the underlying architecture required of the programmer, and the lack of portability, since the machine code can be hardware and vendor dependent.

• 4GL programming – this type of programming uses proprietary search and pattern-matching algorithms for the parsing and classification part of the packet processing tasks. Many of these algorithms are implemented using "Fourth Generation Languages". These languages provide an optimized method of programming for classification functions, but they can be used to implement only a part of the data path processing tasks. The processors using 4GL languages typically trade memory size for search speed [7]. These processors provide mostly parsing and classification capabilities; additional external hardware (and associated software) is required to implement the other processing tasks. The programming domain is disjoint, as part of the packet processing tasks are implemented using 4GL languages and the rest may use the other techniques described here.

• Standard language programming – standard higher level languages (such as C/C++) combined with special purpose hardware called co-processors are used to implement the various packet processing tasks. Multiple RISC cores are used to execute the C/C++ programs implementing the packet processing functions. Many "RISC-based" NPs implement vendor-specific instruction sets which are based on the hardware design of the NP. This restricts programmers to writing all or significant portions of their code in processor dependent RISC assembly language. Higher level programming models [47][48] provide an API abstraction layer that hides the lower level chip implementation details from the higher level code without sacrificing performance (which is in general not possible) and support writing effective programs in higher level, processor independent standard languages. Writing programs in higher level languages can extend the life of the software, as the programs can be re-used on any NP. However, the drawbacks of this type of model are that it requires some kind of mapping between the processor independent, higher level programming language and the lower-level, processor dependent microcode, and that the RISC core has to possess enough processing power to perform this mapping.
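The sketch below illustrates the API-abstraction idea in C: the application is written against a processor-independent interface, and a per-chip backend maps it onto vendor primitives. The structure and all names (np_ops, stub_chip, etc.) are invented for illustration and do not correspond to the actual APIs of [47] or [48].

/* Sketch of a processor-independent programming interface.         */
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Interface the portable packet processing code is written against. */
struct np_ops {
    int  (*lookup_route)(uint32_t dst_ip);
    void (*enqueue)(int port, const void *pkt, size_t len);
};

/* Application code: portable because it only sees np_ops.           */
static void forward(const struct np_ops *np, uint32_t dst,
                    const void *pkt, size_t len)
{
    np->enqueue(np->lookup_route(dst), pkt, len);
}

/* One backend per chip maps the interface onto vendor primitives;
 * this stub stands in for, e.g., generated microcode calls.         */
static int  stub_lookup(uint32_t dst)  { return (int)(dst & 0x3); }
static void stub_enqueue(int port, const void *pkt, size_t len)
{
    (void)pkt;
    printf("enqueue %zu bytes on port %d\n", len, port);
}
static const struct np_ops stub_chip = { stub_lookup, stub_enqueue };

int main(void)
{
    uint8_t pkt[40] = {0};
    forward(&stub_chip, 0x0A000001, pkt, sizeof pkt);
    return 0;
}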
5.2 Challenges for packet processing at wire speed

Network processors are required to process packets at wire speed so that no packets are dropped. This imposes a limitation on the number of instructions that can be executed on a packet and on the latency available for processing the packet in the network processor. For example, at a 10Gbps line rate a minimum sized packet of 40 bytes (POS) arrives every 32ns. At 40Gbps the packet arrival time drops to 8ns. The network processor must complete all the data path functions on an incoming packet, including accessing external memories, within one packet arrival time.
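These inter-arrival figures follow directly from the packet size and line rate; the short C program below reproduces them (framing overhead, which accounts for the slightly larger 35 ns POS figure quoted later, is deliberately ignored here).

/* Back-of-the-envelope check of worst-case inter-arrival times.    */
#include <stdio.h>

int main(void)
{
    const double pkt_bits = 40 * 8;              /* minimum packet   */
    const double rates_gbps[] = { 2.5, 10.0, 40.0 };

    for (int i = 0; i < 3; i++) {
        double t_ns = pkt_bits / rates_gbps[i];  /* bits/Gbps -> ns  */
        printf("%5.1f Gbps: one 40B packet every %5.1f ns\n",
               rates_gbps[i], t_ns);
    }
    return 0;   /* prints 128.0, 32.0 and 8.0 ns respectively        */
}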
There are several problems in processing packets at higher line speeds:

• Upper bound on packet processing latency [14] – the time available to process a packet in a network processor is the time required to transmit a minimum sized packet (the worst case) at wire speed. Processing elements in the NP check the header information in the incoming packets and, based on the next-hop information, place them in queues in external memory corresponding to the output ports (for simplicity, we will call them output queues). The output scheduler in the NP checks each queue for packets to be transmitted to the output ports. The number of clock cycles required to process a packet at wire speed varies depending on the number and size of the output queues and on the position of the packets in the output queues in the external memory. Schedulers, which have to parse the output queues to transmit the packets, therefore see variable memory access latencies depending on the size of the queues and the searching algorithm used. Meeting the wire speed consistently becomes difficult in such cases. It is very important to use proper data structures so that there is a deterministic upper bound on the memory access latency involved in searching for the packets in the queues.

• Dependency [13] – certain computations on cells/packets are dependent upon the results of computations on the preceding cells/packets. In such cases, processing the packets in sequential order is very important. This increases the latency of packet processing.

• Access to shared data structures [13] – access to shared data structures also increases latency. Sometimes the access to a shared data structure has to be exclusive, which can result in serialization of multiple parallel accesses.

• Storing the shared data [13] – since the arrival of packets belonging to the same flow is undeterministic, the packet contexts may have to be stored for access at a later time. Depending on the number of contexts that the application must support and the size of the on-chip storage in the network processor, the shared data may need to be stored in external memories. Accessing the external memories requires additional processor cycles, thus increasing the overall latency.

• Hiding memory access latency [10] [11] [14] – even with the fastest memory available today, the memory access latency is still very high compared to the inter-packet arrival time at higher speeds. The time available to execute instructions for packet processing is very small compared to the memory access latency. The switch fabric interface of the line card has a bandwidth that is usually twice [10] that of the line speed. This can compensate for some of the performance degradation arising from improper scheduling mechanisms for the output ports and from the overhead used to carry routing, flow control, and Quality-of-Service (QoS) information in the packet/cell header. At 10Gbps (OC-192) and 40Gbps (OC-768) line rates, the aggregated I/O bandwidth of the memory at the switch port can be 120Gbps [10]. For minimum sized packets of 40 bytes, the memory access latency at each port is then required to be less than 2.66 ns. Also, the large size makes it highly difficult to integrate the memories into the network processor (the available on-chip memories are insufficient to hold all the information required for packet processing, so separate memory chips have to be used). The high pin count for the memory puts a limitation on the number of external memories that can be attached to the network processor. This external memory access increases the overall latency.

• Consistency of shared data [14] – multiple packets can be processed in parallel using multithreading and pipelining. However, it is very important to maintain the coherency of the shared data, since several threads may be accessing and updating the shared data at the same time. There are several techniques to achieve consistency in such cases; using a locking mechanism or ensuring strict thread ordering are some of them. But these techniques increase the latency.

• Packet ordering [14] – some applications require ordered packet processing, for example voice packets and ATM cells. The packet ordering problem can be solved using the techniques mentioned below:
i. The sequence numbers of the packets can be used to order the packets [14].
ii. Thread ordering can be used: packets are assigned to threads in the incoming order, and the threads process the packets in the same order and transmit them to the output port [14]. However, in such cases a thread can hog the resources, and processing of the succeeding packets will be delayed.

• Quality of Service [10] – QoS requirements can vary for different traffic flows. Traffic policing/metering/shaping requires proper packet scheduling and queue management (packet discarding policies) algorithms. For a network processor using multithreading, the QoS requirements impose additional challenges on the thread scheduling algorithm. Policies similar to those applied for packet scheduling and buffer management should be applied to the multithread scheduling.

Hence, to increase the processing (instruction) budget and hide the memory access latency, several techniques like multiple parallel processing elements, pipelining and multithreading are used. Programming with multithreading is a challenge, as the programmer has to deal with issues like thread synchronization.

To give an example of the latency budget available to a network processor at higher line speeds, consider a minimum sized packet (40 byte) flow at 10Gbps; a new packet arrives every 35ns. Assume that the packets are buffered in an SDRAM. A 100MHz SDRAM with a 64 bit bus has a total memory bandwidth of 6.4Gbps, so storing a 40 byte packet in the SDRAM takes 50ns. An additional 50ns is required to transmit the packet from the packet buffer to the output port. Hence, without even considering the latency for processing the packet header, just storing the packet in the external packet buffer and transmitting it to the output port takes 100ns.

To overcome the latency problem, network processors use special hardware designs. One of the solutions to increase the latency available to process a packet in the network processor is to use multiple processing elements. The fast data path is subdivided into packet processing tasks, which are implemented by programming the processing elements to perform the individual tasks. The elements can be connected to form a "pipeline", and the packet processing tasks can then run in parallel on multiple packets.

The processing elements would otherwise remain idle most of the time waiting for the completion of I/O operations. In order to make the network processor "work conserving" (in other words, active as much as possible), multithreading can be used. Multithreading is explained in detail in the next section.

6 Multithreading

Fig. 2: Parallel processing of packets using multi-threading – four threads (0 through 3) each handle one of the packets n through n+3, so with an inter-packet arrival time of Ta the execution time available per packet is 4 × Ta.

Multithreading allows multiple packets to be processed in parallel. Each incoming packet is assigned to a thread, and the thread executes a particular packet processing task on that packet. When a thread has to wait for the completion of an I/O operation, which can be a memory access, the status of the thread can be changed to "wait", and any other thread performing the same task that is "ready" for execution can be switched to "active", thus conserving processor cycles that would otherwise go to waste waiting for the completion of the I/O operation. This increases the overall throughput and also the latency budget available for processing a packet, as multiple packets are being processed in parallel at any given time.
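A software caricature of this switching discipline is sketched below: the active thread is parked on a memory reference and the next ready thread is resumed round-robin, in the style the IXP1200 is reported to use. Actual microengines do this in hardware in a few cycles; the C here only illustrates the state transitions.

/* Work-conserving thread switching, illustrated in software.       */
enum tstate { READY, ACTIVE, WAIT };

struct thread { enum tstate state; /* ... packet context ... */ };

#define NTHREADS 4
static struct thread threads[NTHREADS];

/* Called when thread 'cur' starts a memory access: park it and
 * pick the next READY thread round-robin.                          */
int on_memory_access(int cur)
{
    threads[cur].state = WAIT;
    for (int i = 1; i <= NTHREADS; i++) {
        int t = (cur + i) % NTHREADS;
        if (threads[t].state == READY) {
            threads[t].state = ACTIVE;
            return t;               /* resume this thread            */
        }
    }
    return -1;                      /* all threads waiting: PE stalls */
}

/* The memory controller calls this when the access completes.      */
void on_memory_done(int t) { threads[t].state = READY; }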

Partitioning the fast data path processing requires some consideration of the worst case performance requirement of the tasks: the processing elements should be able to meet the worst case inter-arrival time of the packets, corresponding to the maximum rate at which they arrive. To illustrate this, let the worst case inter-packet arrival time be Ta, let I be the number of instructions required to execute a particular task, and let F be the clock frequency of the processing element implementing that task. The processing element can meet the worst case performance requirement only if I/F < Ta (the time required to execute all the instructions implementing the task is less than the worst case inter-packet arrival time).

If the execution time of a particular packet processing task on a single processing element is greater than the inter-packet arrival time, the latency budget available to complete the task can be increased by using multiple processing elements executing the task in parallel. The tasks can be assigned to the processing elements in three different ways:

• Pipeline (context pipelining [11] & [14]) – each processing element performs a particular task, in serial fashion.

• Parallel (function pipelining [11] [14] & [15]) – multiple processing elements perform all the tasks in parallel.

• Mixed pipelining [11] – uses a mixture of context and function pipelining. Elasticity buffers [11] are used to move the data from one pipeline to the other.

6.1 Context pipelining

Fig. 3: Pipeline of processing elements – n elements (#1 through #n) connected in series, each with m threads and an execution time budget of m*Ta, where Ta is the inter-packet arrival time.

This approach is also called the "pipelined architecture". In this approach, each packet processing task is allocated to a separate processing element, and the processing elements are connected sequentially to form a "pipeline of tasks". The context (state) of a packet moves across the pipeline stages as the individual packet processing tasks are performed sequentially. If each processing element supports m threads, then m packets can be processed simultaneously by a processing element. This increases the latency budget for a particular stage to m*Ta. The total latency budget for a pipeline with n stages would be m*n*Ta. In other words, the total processing time for a packet processing task assigned to a processing element in an n-stage pipeline is m*n*Ta (each processing element having m threads).

Advantages of context pipelining [14]:

• The state for a given packet processing task is persistent across all the packets in a pipeline stage and can be stored local to the pipeline stage.

• It eliminates the complexity associated with sharing the state information among multiple processing elements.

• The processing element's program memory space can be dedicated to a single packet processing task.

Disadvantages of context pipelining [14]:

• Some packet state information must be communicated from each processing element in the pipeline to the next (e.g., updated packet headers). Sharing packet state information involves additional overhead if the packet is large.

• Each packet processing task must meet the worst-case performance requirement. Partitioning the fast data path into packet processing tasks becomes very critical to achieving the desired performance level.

6.2 Functional pipelining

Fig. 4: Multiprocessing elements – n processing elements (#1 through #n) in parallel, each a pipeline of m stages with a per-element execution time of m*Ta.

This configuration is also called "multiprocessing". Each processing element executes all the packet processing tasks of the fast data path on a cell/packet context, so a packet is handled by only one processing element, and the processing elements are used in parallel. A processing element performs m packet processing tasks on a single packet and, at any given instance, performs only one of the m packet processing tasks [13]. Each processing element can be regarded as a pipeline of m stages. To ensure that each incoming packet is processed without dropping any packet, n processing elements work in parallel. If each processing element has m stages and there are n processing elements in parallel, the total execution time available for completing all the packet processing tasks on each packet is given by m*n*Ta.

Advantages of functional pipelining [14]:

• The packet state information can be held local to a processing element.

• This design eliminates the latency involved in communicating the packet state between processing elements.

• Each processing task in a processing element should meet the worst case performance requirement, but the worst case performance limit of a processing element is the sum of the processing latencies of all the packet processing tasks executed by that element. This makes it possible to distribute the execution time among the different stages of the pipeline in a processing element unevenly, resulting in a better utilization of the processing element's execution time.

Disadvantages of functional pipelining [14]:

• The processing element's program memory is shared between multiple functions and can become a bottleneck.

• The state information shared across the packets is kept in external memory. Maintaining that state coherently and accessing it can be costly.
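Plugging illustrative numbers into the budget expressions above shows that both organizations arrive at the same total m*n*Ta budget, differing only in how it is sliced; the m, n and Ta values below are arbitrary examples, not measurements.

/* Latency budgets for the two pipelining organizations.            */
#include <stdio.h>

int main(void)
{
    const double ta_ns = 32.0;  /* worst-case inter-arrival, 40B @ 10G */
    const int m = 8;            /* threads (or stages) per element     */
    const int n = 4;            /* pipeline stages (or parallel PEs)   */

    /* Context pipeline: each of n stages gets m*Ta; total m*n*Ta.     */
    printf("context:    per stage %.0f ns, total %.0f ns\n",
           m * ta_ns, (double)m * n * ta_ns);

    /* Functional pipeline: each PE runs all m tasks on one packet,
     * with n PEs in parallel, giving the same m*n*Ta total budget.    */
    printf("functional: total per packet %.0f ns\n",
           (double)m * n * ta_ns);
    return 0;
}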

7 Performance analysis of Network Processor

Network processors are used for a wide variety of applications, and their performance depends on the underlying hardware and software. A key factor in obtaining good performance out of a network processor is analyzing the performance requirements of the network processor for the different applications. Standard benchmarks along the lines of CPU and system-level benchmarks are still evolving for network processors.

Two important parameters used to analyze the performance of a network processor are the instruction budget and the memory access latency budget. The instruction budget, measured in terms of compute cycles [24], is given by the total number of instructions required to execute all the fast data path functions [6] & [14] on a packet. The memory access latency budget, measured in terms of I/O cycles [24], includes all the accesses to external memories during the processing stage.

To estimate the total available compute cycle and I/O cycle budgets for a given application, the following rules of thumb can be used:

IN = F * Ta
Total instruction budget for the given application = IN * N

where IN is the instruction budget per stage in the pipeline, F is the processor clock frequency, Ta is the inter-packet arrival time for worst case traffic (minimum sized packets of 40 bytes at line speed), and N is the number of stages in the pipeline.

Memory latency budget = Ta * FSR

where FSR is the memory clock speed.
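As a worked example of these rules of thumb, evaluated for an assumed 600 MHz processing element, 200 MHz memory channel and 4-stage pipeline (none of these numbers describe a specific NP):

/* Rule-of-thumb compute and I/O budgets from the formulas above.   */
#include <stdio.h>

int main(void)
{
    const double f_mhz   = 600.0;   /* PE clock F                    */
    const double ta_ns   = 32.0;    /* worst-case inter-arrival Ta   */
    const int    nstages = 4;       /* pipeline depth N              */
    const double fsr_mhz = 200.0;   /* memory clock FSR              */

    double in_per_stage = f_mhz * ta_ns / 1000.0;  /* IN = F * Ta    */
    printf("instructions per stage   = %.1f\n", in_per_stage);
    printf("total instruction budget = %.1f\n", in_per_stage * nstages);
    printf("memory latency budget    = %.1f I/O cycles\n",
           ta_ns * fsr_mhz / 1000.0);              /* Ta * FSR       */
    return 0;   /* 19.2 and 76.8 compute cycles, 6.4 I/O cycles      */
}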

Systems engineers and architects are faced with the challenge of designing complex real-time hardware/software systems. The design of hardware systems becomes complex as it involves partitioning the functions and mapping them onto hardware components. Different design choices affect the system performance and the cost of the system. Well structured design methods and tools aid system engineers and architects in better analysis of a particular design for a system. Using actual hardware/software components for the performance analysis can be expensive, complex and time consuming for system-level design. Standard system design tools which are inexpensive, easy to comprehend and easy to apply have yet to evolve for analyzing the performance of the various network processor architectures, though a few tools are already available.

7.1 Tools and methodologies to analyze the performance of Network Processor

[24] describes a methodology that can be used to analyze the performance of a network processor. It uses 46-byte POS packets for an IPv4 forwarding plus DiffServ application running on an Intel IXP2400 [25] network processor for the performance analysis. This methodology uses a data movement model [24] which describes the various packet processing tasks performed by the target network processor. The model can be used to estimate the total number of compute clock cycles and I/O cycles required to process a packet for a given application. The estimate can be validated by implementing microcode and tuning the code on the simulator.

System level modeling tools like POOSL [28] and the Click modular router [30] [51] can also be used for modeling NPs.

[33] describes a task and resource model for network processors, along with an analytical model for data traffic, which is used to analyze the problems of packet processing. It is also used in design space exploration [34] of network processors. An analytical model [33] [34] can also be used to study the performance of network processors; arrival curves and deadlines [34] can be used for specifying the traffic load. [20] uses a C++ cycle accurate model for comparing the AES key scheduler against known results.

[36] compares the results from an analytical model to those of a simulation on an Intel IXP1200 network processor, and the results are shown to be within 15% accuracy.

7.2 Performance Analysis
[30] presents several examples of modeling uniprocessor and multiprocessor systems executing IPv4 routing and IPSec VPN encryption/decryption applications. The performance results of the architecture in the Click modular [30] model are compared to the actual results measured on the real systems being modeled; the results are found to be accurate within 10% [30].

While multithreading certainly increases the performance of the network processor, designing a proper architecture with multithreading is very complicated. Thread scheduling is one of the major problems when it comes to multithreading, and there have been very few efforts to study the behavior of multithreading in network processors.

One such effort is explained in [17], where the performance of two types of architectures – a single processor with simultaneous multithreading (SMT) [17] and chip-multiprocessors (CMP) [17] – has been analyzed. The simulation uses a cycle accurate simulator [20] [23] with a multi-programmed workload [17]. The workload [17] comprises three different tasks: IP forwarding, a web-switch monitoring HTTP requests and connections, and a VPN node that performs encryption/decryption and authentication. The main contribution of [17] is the comparison of the performance of SMT and CMP processors.

A similar work is explained in [19] & [20], though it is limited to cryptographic applications: they analyze the performance of different cryptographic algorithms on network processors. According to [19], cryptographic applications require different architectural characteristics than normal packet processing applications. This work uses an execution driven simulator called "SimpleScalar". SimpleScalar [19] is a tool which can simulate the behavior of a general purpose processor based on the SimpleScalar architecture [19]. The architectural characteristics studied in [19] would be more applicable to MIPS-based processor architectures rather than the RISC-based architectures used by most modern day processors. Moreover, the general purpose processor architecture is different from the network processor architecture, the latter being optimized for packet processing functions, so new simulation tools which can simulate the behavior of network processors could yield better results. This study is also limited to instruction set characteristics, instruction level parallelism, branch prediction, and cache behavior; further research can be done on the latency requirements of cryptographic applications.

[34] & [35] estimate the performance of a network processor for different applications. They provide a new scheme to estimate end-to-end packet delays and packet queuing, and to explore the design space. This can be used to quickly develop new architectures which can later be analyzed in detail using other design tools.

[21] & [22] describe a systematic approach to benchmarking network processors.

8 Memory access latency

One of the major bottlenecks in processing packets at wire speed is the latency in accessing the external memories. Memories are used for storing the incoming packets, maintaining various queues based on next hop information, and packet classification. Memory access is required for routing table lookups, policy filtering, metering and policing, and for enqueuing/dequeuing the packets.

The performance of the memory depends on the following factors: the memory chip's I/O signaling interface, its core architecture, and its address and command protocols. No matter which type of pipelining (context vs functional) is used, the threads access the memories at least once every Ta (inter-packet arrival time) units. Packet enqueue and dequeue operations [11] can cause undeterministic memory access behavior in network processors; the access latency depends upon various factors [11] such as the number of queues, the length of each queue, and the position of a packet in a particular queue. In order to process the packets at wire speed, the network processor needs to complete the memory operations well within the inter-packet arrival time.
The upper limits on the memory access latency and the required memory bandwidth are given by

Memory access latency per thread per stage = Ta / N

where Ta is the inter-packet arrival time for worst case traffic (minimum sized packets of 40 bytes at line speed) and N is the number of memory references (read/write) per thread, and

Memory bandwidth = memory clock speed * bus width.

To process packets at line rate, the memory clock rate should be at least Nmax/Ta, where Nmax represents the maximum number of memory references required by any stage.

Another important factor that can affect the performance of the memory is the random nature of the data traffic and the variable packet size (40 to 1500 bytes). The packet size also determines the number of memory references required to store/read the packet in/from the buffer.
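Evaluated for the worst case used throughout this paper (40-byte packets at 10 Gbps, Ta = 32 ns) and an assumed four memory references per thread, the bounds above work out as follows:

/* Per-access latency bound and minimum memory clock from Ta and N. */
#include <stdio.h>

int main(void)
{
    const double ta_ns = 32.0;      /* 40B packets at 10 Gbps        */
    const int    nrefs = 4;         /* assumed references per thread */

    printf("latency per access <= %.1f ns\n", ta_ns / nrefs);
    printf("min memory clock   >= %.1f MHz\n",
           nrefs / ta_ns * 1000.0); /* Nmax/Ta, in MHz               */
    return 0;   /* 8.0 ns and 125.0 MHz with these assumptions       */
}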
In this context, the choice of memory type plays a very critical role in achieving line rate processing. The new types of memories available today which enable an NP to process OC-192 (10Gbps) data rates include fast cycle RAMs (FCRAM) [40], reduced-latency DRAMs (RLDRAM) [40], double data rate DRAMs (DDRDRAM) [40], and Rambus DRAMs (RDRAM) [39] [40]. These memories are characterized by different types of signaling and different frequencies. Some of the most common signaling levels are series-stub-terminated logic (SSTL) [40], high-speed transceiver logic (HSTL) [40] and Rambus signaling levels (RSL) [40]. The following table [40] gives the memory types, signaling types and clock frequencies:

Memory     I/O frequency (MHz)   Signaling   Number of banks   Row access latency (ns)
DDRDRAM    400                   SSTL-2      4                 55
FCRAM      400                   SSTL-2      8                 25
RLDRAM     600                   HSTL        8                 26.7
RDRAM      1200                  RSL         32                53.3

A 40 byte packet at 20Gbps line speed requires the memory access latency to be less than or equal to 32ns. This can be achieved by using memories like FCRAM and RLDRAM, which have a memory access time of less than 32ns. Another option is to interleave or pipeline the accesses to memory across different memory banks (preferably in the same chip) [40], yielding one memory access per clock cycle. RDRAM uses a protocol which can access multiple packets simultaneously, allowing up to five pipelined operations [40].

At OC-768 line rates, none of these memories would be able to accommodate the memory access requests at line speed. A new research memory code-named "Yellowstone" [40] is expected to allow memories to run at 3.2GHz and subsequently at 6.4GHz, which will be able to accommodate higher line rates.

Content Addressable Memories (CAMs) [41] allow simultaneous search operations on every entry in the memory. Each entry in a CAM has compare logic which allows the entries to be compared to a search key in a single access. This reduces the latency associated with a lookup in conventional memories. IP forwarding applications can use ternary CAMs [41] for next hop lookups. An on-chip TCAM [11] can be used to reduce the latency in accessing the queue descriptors [26] in the external memory. This solution tries to take advantage of various forms of locality, both spatial and temporal. However, in core routers, where backbone links converge, millions of individual packet streams flow through the network processor; caching techniques would not be of much help there, and special techniques have to be designed to improve the memory access latency.
high throughput it proposes a new design for the memory using non-uniform wide word parallelism [43]. Memory bandwidth can be increased by increasing the 9 Conclusions width of the word that can be accessed within one clock cycle. Using pipelined memory architecture to enable Network processors are fast emerging as the solution multiple memory (word) access at any given instance, for processing packet at wire speeds. Network the throughput can be increased. Concurrent accesses to processors provide the performance of ASICs with memory can be more efficient if smaller memory tiles the flexibility of the software programs. They are (banks) are used in the memory chip. Even though this scalable, flexible, portable (atleast among the design increases latency per transaction, it increases the products from the same vendors) and they have shorter time-to-market. Efforts are on to evolve standards [49] for the processor interfaces and policy-based networking are posing challenges to the functionalities. router design and at the same time offer research opportunities, especially the data plane design, in However network processors come with their own terms of speed and function extensibility. share of problems. They use multithreading extensively Communication industry is moving towards to increase the latency budget available to process achieving wire-speed packet processing, though it has packets, which may pose problems with respect to to overcome a lot of challenges. The ongoing thread scheduling. Portions of code can interact with research effort is a good step in that direction each other quickly leading to complex behavior. The although it is not sufficient to achieve the ultimate interaction of NPs with other hardware components in goal for flexibility, scalability, code portability, the router also may lead to undeterministic performance shorter time-to-market and longer time-in-market. A characteristics especially when it comes to accessing lot more needs to be done to accomplish this goal. external memories. This fact cannot be overlooked and analyzing the performance of network processor in isolation is not sufficient to achieve the ultimate goal of References: wire speed packet processing. As a matter of fact, it underlines the importance of analyzing the performance [1] Niraj Shah, Kurt Keutzer “Network Processors: of network processor from a system-level perspective. Origin of Species”, University of California, Berkeley. Our main motivation in this survey paper is to expose the research community to the various challenges faced [2] Niraj Shah, “Understanding Network Processors”, in achieving wire speed packet processing in University of California, Berkeley, 4th September communication networks. First, we discussed the 2001. function partitioning which is necessary to define the role of various processors (NPs, co-processors, [3] L. Yang , T. Anderson , R. Gopal “Forwarding embedded CPUs, general purpose CPUs) in a router. and Control Element Separation (ForCES) We explained how to map the packet processing tasks Framework”, Internet Draft, Working Group: to various components and analyze the effect of this ForCES, June 2003. mapping on the performance of the router using various simulation tools and analysis techniques. We described [4] Uday Naik, Alex Shoykhet, Larry Huston, Donald various types of NPs and programming models for NPs. 
9 Conclusions

Network processors are fast emerging as the solution for processing packets at wire speed. They provide the performance of ASICs with the flexibility of software. They are scalable, flexible, portable (at least among products from the same vendor), and they offer a shorter time-to-market. Efforts are under way to evolve standards [49] for processor interfaces and functionalities.

However, network processors come with their own share of problems. They use multithreading extensively to increase the latency budget available for processing packets, which may pose problems with respect to thread scheduling. Portions of code can interact with each other, quickly leading to complex behavior. The interaction of NPs with other hardware components in the router may also lead to non-deterministic performance characteristics, especially when accessing external memories. This fact cannot be overlooked: analyzing the performance of a network processor in isolation is not sufficient to achieve the ultimate goal of wire-speed packet processing. Rather, it underlines the importance of analyzing the performance of the network processor from a system-level perspective.

Our main motivation in this survey paper is to expose the research community to the various challenges faced in achieving wire-speed packet processing in communication networks. First, we discussed function partitioning, which is necessary to define the roles of the various processors (NPs, co-processors, embedded CPUs, general-purpose CPUs) in a router. We explained how to map the packet processing tasks to the various components and how to analyze the effect of this mapping on router performance using simulation tools and analysis techniques. We described the various types of NPs and the programming models for NPs. We highlighted some of the challenges in achieving wire-speed packet processing and the techniques used to overcome some of them. We also discussed NP architecture in general and pipelining and multithreading in particular, mentioning the advantages and disadvantages of the different types of pipelining. We briefly explained simulation tools and analysis methodologies for studying the performance of NPs based on different architectures, along with some of the ongoing research in this area. Finally, we discussed some solutions to overcome memory access latencies.

In summary, new applications and protocol suites, including multiprotocol label switching (MPLS), Differentiated Services (DiffServ), layer 2/3 virtual private networks (VPNs), static/dynamic network address translation (NAT), constraint-based routing and policy-based networking, are posing challenges to router design while at the same time offering research opportunities, especially in data plane design, in terms of speed and function extensibility. The communication industry is moving towards wire-speed packet processing, though it has many challenges to overcome. The ongoing research effort is a good step in that direction, although it is not yet sufficient to achieve the ultimate goals of flexibility, scalability, code portability, shorter time-to-market and longer time-in-market. A lot more remains to be done to accomplish these goals.

References:

[1] Niraj Shah, Kurt Keutzer, "Network Processors: Origin of Species", University of California, Berkeley.

[2] Niraj Shah, "Understanding Network Processors", University of California, Berkeley, September 4, 2001.

[3] L. Yang, T. Anderson, R. Gopal, "Forwarding and Control Element Separation (ForCES) Framework", Internet Draft, Working Group: ForCES, June 2003.

[4] Uday Naik, Alex Shoykhet, Larry Huston, Donald Hooper, Raj Yavatkar, Duke Tallam, Travis Schluessler, Prashant Chandra, Adrian Georgescu, "IXA Portability Framework: Preserving Software Investment in Network Processor Applications", Intel® Technology Journal, Volume 6, Issue 3, 2002.

[5] Vik Chandra, "Selecting a network processor", IBM Microelectronics.

[6] David Husak, "Network processors: A Definition and Comparison", White paper, C-Port (a Motorola company).

[7] David Husak, "Network processor programming models: The key to achieving faster time-to-market and extending product life", White paper, C-Port (a Motorola company).

[8] V. P. Kumar, T. V. Lakshman, D. Stiliadis, "Beyond Best Effort: Router Architecture for the Differentiated Services of Tomorrow's Internet", IEEE Communications Magazine, pp. 152-164, May 1998.

[9] "Network processor designs for next generation equipments", White paper, EZchip Technologies.

[10] H. Jonathan Chao, "Next Generation Routers", Proceedings of the IEEE, vol. 90, no. 9, September 2002.

[11] Matthew Adiletta, Mark Rosenbluth, Debra Bernstein, Gilbert Wolrich, Hugh Wilkinson, "The Next Generation of Intel IXP Network Processors", Intel® Technology Journal, Volume 6, Issue 3, 2002.

[12] James Aweya, "IP Router Architectures: An Overview", Nortel Networks, Ottawa, Canada.

[13] "Next generation network processor technologies", Intel white paper, October 2001.

[14] Muthu Venkatachalam, Prashant Chandra, Raj Yavatkar, "A highly flexible, distributed multiprocessor architecture for network processing", Computer Networks 41 (2003), pp. 563-586.

[15] Keith Morris, "Challenges in Making Highly Integrated Network Processors", Applied Micro Circuits Corporation.

[16] Matthew Adiletta, Donald Hooper, Myles Wilde, "Packet over SONET: Achieving 10 Gigabit/sec Packet Processing with an IXP2800", Intel® Technology Journal, Volume 6, Issue 3, 2002.

[17] Patrick Crowley, Marc E. Fiuczynski, Jean-Loup Baer, "On the Performance of Multithreaded Architectures for Network Processors", Technical Report 2000-10-01, Department of Computer Science & Engineering, University of Washington, Seattle, WA.

[18] Patrick Crowley, Marc E. Fiuczynski, Jean-Loup Baer, Brian N. Bershad, "Characterizing Processor Architectures for Programmable Network Interfaces", Proceedings of the 2000 International Conference on Supercomputing, Santa Fe, N.M., May 2000.

[19] Haiyong Xie, Li Zhou, Laxmi Bhuyan, "Architectural Analysis of Cryptographic Applications for Network Processors", Department of Computer Science & Engineering, University of California, Riverside.

[20] Wajdi Feghali, Brad Burres, Gilbert Wolrich, Douglas Carrigan, "Security: Adding Protection to the Network via the Network Processor", Intel® Technology Journal, Volume 6, Issue 3, 2002.

[21] Mel Tsai, Chidamber Kulkarni, Christian Sauer, Niraj Shah, Kurt Keutzer, "A Benchmarking Methodology for Network Processors", 1st Network Processor Workshop, held with the 8th International Symposium on High Performance Computer Architecture (HPCA), Boston, February 3, 2002.

[22] Prashant R. Chandra, Frank Hady, Raj Yavatkar, Tony Bock, Mason Cabot, Philip Mathew, "Benchmarking Network Processors", Intel Corporation.

[23] Ram Bhamidipati, Ahmad Zaidi, Siva Makineni, Kah K. Low, Robert Chen, Kin-Yip Liu, Jack Dahlgren, "Challenges and Methodologies for Implementing High-Performance Network Processors", Intel® Technology Journal, Volume 6, Issue 3, 2002.

[24] Sridhar Lakshmanamurthy, Kin-Yip Liu, Yim Pun, Larry Huston, Uday Naik, "Network Processor Performance Analysis Methodology", Intel® Technology Journal, Volume 6, Issue 3, 2002.

[25] "Intel® IXP2400 Network Processor: Flexible, High-Performance Solution for Access and Edge Applications", Intel white paper.

[26] Intel IXP1200 Network Processor, Programmer's Manual.

[27] B.D. Theelen, J.P.M. Voeten, L.J. van Bokhoven, "Assessment of POOSL Modelling: Performance analysis for system-level design", Eindhoven University of Technology, June 30, 2000.

[28] B.D. Theelen, J.P.M. Voeten, R.D.J. Kramer, "Performance modeling of a network processor using POOSL", Information and Communication Systems Group, Faculty of Electrical Engineering, Eindhoven University of Technology.

[29] Zhangqin Huang, J.P.M. Voeten, B.D. Theelen, "Modeling and Simulation of a Packet Switch System using POOSL", Proceedings of the 3rd PROGRESS Workshop on Embedded Systems, October 24, 2002.

[30] Patrick Crowley, Jean-Loup Baer, "A Modeling Framework for Network Processor Systems", Department of Computer Science & Engineering, University of Washington.

[31] E. Kohler, "The Click modular router", PhD thesis, Massachusetts Institute of Technology, November 2000.

[32] E. Kohler, R. Morris, B. Chen, J. Jannotti, M. F. Kaashoek, "The Click modular router", ACM Transactions on Computer Systems, 18(3):263-297, August 2000.

[33] Lothar Thiele, Samarjit Chakraborty, Matthias Gries, Alexander Maxiaguine, Jonas Greutert, "Embedded Software in Network Processors - Models and Algorithms", Computer Engineering and Networks Laboratory, Swiss Federal Institute of Technology (ETH) Zürich, Switzerland.

[34] Lothar Thiele, Samarjit Chakraborty, Matthias Gries, Simon Künzli, "Design Space Exploration of Network Processor Architectures", Computer Engineering and Networks Laboratory, Swiss Federal Institute of Technology (ETH) Zürich, Switzerland.

[35] Lothar Thiele, Samarjit Chakraborty, Matthias Gries, Simon Künzli, "A Framework for Evaluating Design Tradeoffs in Packet Processing Architectures", Computer Engineering and Networks Laboratory, Swiss Federal Institute of Technology (ETH) Zürich, Switzerland.

[36] Matthias Gries, Chidamber Kulkarni, Christian Sauer, Kurt Keutzer, "Comparing Analytical Modeling with Simulation for Network Processors: A Case Study".

[37] Timothy Sherwood, George Varghese, Brad Calder, "A Pipelined Memory Architecture for High Throughput Network Processors", Department of Computer Science and Engineering, University of California, San Diego.

[38] Pankaj Gupta, Steven Lin, Nick McKeown, "Routing Lookups in Hardware at Memory Access Speeds", Computer Systems Laboratory, Stanford University, Stanford, CA.

[39] Matthias Gries, "The Impact of Recent DRAM Architectures on Embedded Systems Performance", Euromicro 2000, Symposium on Digital Systems Design, Maastricht, Netherlands, September 2000, vol. 1, pp. 282-289.

[40] Michael Ching, "Packet buffer memory bandwidth causes NPU performance bottlenecks", Rambus Inc., Los Altos, Calif., EE Times, May 9, 2003. http://www.commdesign.com/story/OEG20030509S0037

[41] Romain Saha, Tomasz Wojcicki, "TCAMs Emerge as Viable Replacement to Trie Lookups", CommsDesign.com, June 19, 2003. http://www.commsdesign.com/design_center/netprocessing/design_corner/OEG20030619S0013

[42] Gianfranco Bilardi, Kattamuri Ekanadham, Pratap Pattnaik, "Optimal Organizations for Pipelined Hierarchical Memories", SPAA '02, August 10-13, 2002.

[43] Timothy Sherwood, George Varghese, Brad Calder, "A Pipelined Memory Architecture for High Throughput Network Processors", Proceedings of the 30th International Symposium on Computer Architecture (ISCA), June 2003.

[44] M. Rosenblum et al., "Using the SimOS Machine Simulator to Study Complex Computer Systems", ACM Transactions on Modeling and Computer Simulation, Vol. 7, No. 1, pp. 78-103, 1997.

[45] Prashant Pradhan, Wen Xu, Indira Nair, Sambit Sahu, "Efficient and Faithful Performance Modeling for Network-Processor Based System Designs".

[46] Wen Xu, Larry Peterson, "Support for Software Performance Tuning on Network Processors", IEEE Network, July/August 2003.

[47] N. Shah, W. Plishker, K. Keutzer, "NP-Click: A Programming Model for the Intel IXP1200", IEEE Second Workshop on Network Processors, held with HPCA-9, Boston, February 2003.

[48] G. Memik, W. H. Mangione-Smith, "NEPAL: A Framework for Efficiently Structuring Applications for Network Processors", IEEE Second Workshop on Network Processors, held with HPCA-9, Boston, February 2003.

[49] The Network Processing Forum, http://www.npforum.org

[50] Henry C. B. Chan, Hussein M. Alnuweiri, Victor C. M. Leung, "A Framework for Optimizing the Cost and Performance of Next-Generation IP Routers", IEEE Journal on Selected Areas in Communications, vol. 17, no. 6, June 1999.

[51] Niraj Shah, William Plishker, Kurt Keutzer, "NP-Click: A Programming Model for the Intel IXP1200", University of California, Berkeley.