NetBind: A Binding Tool for Constructing Data Paths in Network Processor-Based Routers

Andrew T. Campbell, Stephen T. Chou, Michael E. Kounavis, Vassilis D. Stachtos and John Vicente

Andrew T. Campbell, Stephen T. Chou, Michael E. Kounavis and Vassilis D. Stachtos are members of the COMET Group, Columbia University, {campbell, schou, mk, vs}@comet.columbia.edu. John Vicente is affiliated with the Intel Corporation and is also a member of the COMET Group, Columbia University, [email protected].

0-7803-7457-6/02/$17.00 ©2002 IEEE. IEEE OPENARCH 2002

Abstract-- There is growing interest in network processor technologies capable of processing packets at line rates. In this paper, we present the design, implementation and evaluation of NetBind, a high performance, flexible and scalable binding tool for dynamically constructing data paths in network processor-based routers. The methodology that underpins NetBind balances the flexibility of network programmability against the need to process and forward packets at line speeds. Data paths constructed using NetBind seamlessly share the resources of the same network processor. We compare the performance of NetBind to the MicroACE system developed by Intel to support binding between software components running on Intel IXP1200 network processors. We evaluate these alternative approaches in terms of their binding overhead, and discuss how this can affect the forwarding performance of IPv4 data paths running on IXP1200 network processor-based routers. We show that NetBind provides better performance than MicroACE, with smaller binding overhead. The NetBind source code described and evaluated in this paper is freely available on the Web (comet.columbia.edu/genesis/netbind) for experimentation.

I. INTRODUCTION

Recently, there has been growing interest in network processor technologies [1-4] that can support software-based implementations of the critical path while processing packets at high speeds. Network processors use specialized architectures that employ multiple processing units to offer high packet-processing throughput. We believe that introducing programmability in network processor-based routers is an important area of research that has not yet been fully addressed. The difficulty stems from the fact that network processor-based routers need to forward minimum size packets at line rates and yet support modular and extensible data paths. Typically, the higher the line rate supported by a network processor-based router, the smaller the set of instructions that can be executed in the critical path.

Data path modularity and extensibility require dynamic binding between independently developed packet processing components. Traditional techniques for realizing dynamic binding (e.g., insertion of code stubs or indirection through function tables) cannot be applied to network processors because these techniques introduce considerable overhead in terms of additional instructions in the critical path. One solution to this problem is to optimize the code produced by a binding tool once data path composition has taken place. Code optimization algorithms can be complex and time-consuming, however, and thus not suitable for applications that require fast data path composition (e.g., data path composition on a packet-by-packet basis). We believe that a binding tool for network processor-based routers needs to balance the flexibility of network programmability against the need to process and forward packets at line rates. This poses significant challenges.

In this paper, we present the design, implementation and evaluation of NetBind, a high performance, flexible and scalable binding tool for creating composable data paths in network processor-based routers. By "high performance" we mean that NetBind can produce data paths that forward minimum size packets at line rates without introducing significant overhead in the critical path. By "flexible" we mean that NetBind allows data paths to be composed at a fine granularity from components supporting simple operations on packet headers and payloads. By "scalable" we mean that NetBind can be used across a wide range of applications and time scales. NetBind can be used to dynamically program a diverse set of data paths. For example, in [14] we show how Cellular IP [6] data paths can be composed for network processor-based radio routers.

NetBind creates packet-processing pipelines through the dynamic binding of small pieces of machine language code. In NetBind, data path components export symbols, which are used during the binding process. A binder modifies the machine language code of executable components at run-time. As a result, components can be seamlessly merged into a single code piece. In order to support fast data path composition, NetBind reduces the number of binding operations required for constructing data paths to a minimum set, so that binding latencies are comparable to packet forwarding times. While the design of NetBind is guided by a set of general principles that make it applicable to a class of network processors, the current implementation of the tool is targeted at the Intel IXP1200 network processor. Porting NetBind to other network processors is left for future work.

This paper is structured as follows. In Section II we present an overview of network processor architectures, and the IXP1200 in particular, and discuss issues and design choices associated with dynamic binding. Specifically, we investigate tradeoffs that are associated with different design choices and discuss their implications on the performance of the data path. In Section III, we present the design and implementation of NetBind. NetBind can create multiple data paths that share the resources of the same IXP1200 network processor. In Section IV, we use a number of NetBind-created IPv4 [5] data paths to evaluate the performance of the system. We also evaluate the MicroACE system [7] developed by Intel to support binding between software components running on Intel IXP1200 network processors, and identify its pros and cons in comparison to NetBind. Section V discusses related work, and finally, in Section VI, we provide some concluding remarks.
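The binding mechanism just described — components exporting symbols and a binder rewriting machine language code at run time so that components merge into a single code piece — can be sketched with a toy example. The instruction tuples and `sym:` convention below are invented for illustration; they are not the IXP1200 microcode format or NetBind's actual encoding.

```python
# Toy illustration of symbol-patching dynamic binding in the spirit of
# NetBind. Instruction format and symbol scheme are invented for this
# sketch, not NetBind's actual encoding.

def bind(components):
    """Concatenate component code and patch unresolved branch targets.

    Each component is a dict:
      'code'    : list of (opcode, operand) tuples; a branch into another
                  component is written as ('br', 'sym:NAME')
      'exports' : {symbol_name: offset_within_component}
    Returns one merged code list with every symbolic branch rewritten to
    an absolute instruction address -- no run-time indirection remains.
    """
    merged, symtab = [], {}
    for comp in components:                       # pass 1: lay out code
        base = len(merged)
        for name, off in comp['exports'].items():
            symtab[name] = base + off             # absolute address
        merged.extend(comp['code'])
    for i, (op, arg) in enumerate(merged):        # pass 2: patch branches
        if isinstance(arg, str) and arg.startswith('sym:'):
            merged[i] = (op, symtab[arg[4:]])
    return merged

# Two trivial "components": a classifier that branches to a forwarder.
classifier = {'code': [('load', 'hdr'), ('br', 'sym:forward')],
              'exports': {'classify': 0}}
forwarder  = {'code': [('store', 'out')],
              'exports': {'forward': 0}}

pipeline = bind([classifier, forwarder])
print(pipeline)   # the branch now targets instruction 2 directly
```

Because the branch operand is rewritten in place, the composed pipeline executes no extra stub or table lookup per packet — the property that distinguishes this style of binding from indirection through function tables.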
II. DYNAMIC BINDING IN NETWORK PROCESSORS

A. Network Processors

1) Features

A common practice when designing and building high performance routers is to implement the fast path using Application Specific Integrated Circuits (ASICs) in order to avoid the performance cost of software implementations. Network processors represent an alternative approach where multiple processing units (e.g., the microengines of Intel's IXP1200 [2] or the dyadic protocol processing units of IBM's PowerNP NP4GS3 [3]) offer dedicated computational support for parallel packet processing. Processing units often have their own on-chip instruction and data stores. In some network processor architectures, processing units are multithreaded. Hardware threads usually have a separate program counter and manage a separate set of state variables. However, each thread shares an arithmetic logic unit and register space. Network processors do not only employ parallelism in the execution of packet processing code; they also support common networking functions realized in hardware (e.g., hashing [2], classification [3] or packet scheduling [2-3]).

The IXP1200 incorporates seven RISC CPUs, a proprietary bus (IX bus) controller, a PCI controller, control units for accessing off-chip SRAM and SDRAM memory chips, and an on-chip scratch memory. The internal architecture of the IXP1200 is illustrated in Figure 1. In what follows, we provide an overview of the IXP1200 architecture. The information presented here is sufficient to understand the design and implementation of NetBind. For further details on the IXP1200 see [2].

Figure 1: Internal Architecture of IXP1200 (StrongARM Core, six microengines, hash unit, IX bus controller, PCI controller, SRAM and SDRAM controllers, 4K scratch memory)

One of the IXP1200 RISC CPUs is a StrongARM Core processor running at 166 or 200 MHz. The StrongARM Core can be used for processing slow path exception packets, managing routing tables and other network state information. The StrongARM Core uses a 16K instruction cache and an 8K data cache. The other six RISC CPUs, called "microengines", are used for executing the fast path code. Like the StrongARM Core, microengines run at 166 or 200 MHz. Each microengine supports four hardware contexts sharing an arithmetic logic unit, register space and instruction store. Each microengine has a separate instruction store of size 1K or 2K, called the "microstore".

Each microengine incorporates 256 32-bit registers. Among these registers, 128 are General Purpose Registers (GPRs) and 128 are memory transfer registers. The register space of each microengine is shown in Figure 2. Registers are used in the following manner. GPRs are divided into two banks (i.e., banks A and B, shown in Figure 2) of 64 registers each. Each GPR can be addressed in a "context relative" or an "absolute" addressing mode. By context relative mode, we mean that the address of a register is meaningful to a particular context only. Each context can access one fourth of the GPR space in the context relative mode.

Figure 2: Microengine Registers (64 32-bit GPRs in bank A and 64 in bank B; 32 SRAM and 32 SDRAM "read" transfer registers; 32 SRAM and 32 SDRAM "write" transfer registers; shifter and ALU shared per microengine)
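The register budget described above can be made concrete with a small model. Assuming the straightforward partitioning implied by the text — each of the four contexts owning one fourth of each 64-register GPR bank — a context-relative register number maps to an absolute in-bank index as sketched below. The contiguous-quarter layout is an assumption for illustration; the IXP1200's actual address encoding may differ.

```python
# Model of one microengine's register file as described in the text:
# 256 32-bit registers = 128 GPRs (two banks of 64) + 128 memory transfer
# registers (32 SRAM read, 32 SDRAM read, 32 SRAM write, 32 SDRAM write).
GPR_BANKS = {'A': 64, 'B': 64}
TRANSFER = {'sram_read': 32, 'sdram_read': 32,
            'sram_write': 32, 'sdram_write': 32}

CONTEXTS = 4
PER_CONTEXT = GPR_BANKS['A'] // CONTEXTS    # 16 relative GPRs per bank

def absolute_gpr(context, rel):
    """Map a context-relative GPR number to an absolute index in a bank.

    Assumes each context owns a contiguous quarter of the bank -- an
    illustrative layout, not necessarily the IXP1200's real encoding.
    It shows the arithmetic behind 'each context can access one fourth
    of the GPR space' in context relative mode.
    """
    if not (0 <= context < CONTEXTS and 0 <= rel < PER_CONTEXT):
        raise ValueError('context or register number out of range')
    return context * PER_CONTEXT + rel

total = sum(GPR_BANKS.values()) + sum(TRANSFER.values())
print(total)   # 256, matching the register count per microengine
```

In absolute addressing mode, by contrast, any context could name all 64 registers of a bank directly, which is what allows independently scheduled contexts to share state within a microengine.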