Queueing Behavior and Packet Delays in Network Processor Systems

Queueing Behavior and Packet Delays in Network Processor Systems Jing Fu Olof Hagsand Gunnar Karlsson KTH, Royal Institute of KTH, Royal Institute of KTH, Royal Institute of Technology Technology Technology SE-100 44, Stockholm, SE-100 44, Stockholm, SE-100 44, Stockholm, Sweden Sweden Sweden [email protected] [email protected] [email protected] emerged to provide a flexible router forwarding plane. The goals are to provide the performance of traditional ASICs ABSTRACT and the programmability of general-purpose processors. To Network processor systems provide the performance of ASICs achieve this, programmable processing elements, special pur- combined with the programmability of general-purpose pro- pose hardware and general-purpose CPUs are used to per- cessors. One of the main challenges in designing these sys- form packet processing tasks. In this work, we model a tems is the memory subsystem used when forwarding and router using a network processor system, that is, a system queueing packets. In this work, we study the queueing constituted by line cards built with network processors. behavior and packet delays in a network processor system which works as a router. The study is based on a system Packets may arrive to a network processor system in bursts model that we have introduced and a simulation tool that and be forwarded to the same limited resource (i.e., an out- is constructed according to the model. Using the simula- going interface) causing congestion. Therefore, packets need tion tool, both best-e®ort and di®serv IPv4 forwarding were to be queued at several stages. Thus, designing the mem- modeled and tested using real-world and synthetically gen- ory subsystem and queueing disciplines inside the system erated packet traces. The results on queueing behavior have becomes an important task. been used to dimension various queues, which can be used as guidelines for designing memory subsystems and queueing We extend our earlier work on network processors systems [1] disciplines. In addition, the results on packet delays show by introducing a revised model. The model is capable of that our di®serv setup provides good service di®erentiation modeling a system with multiple line cards, supporting a for best-e®ort and priority packets. The study also reveals variety of parallel processing approaches, queueing disci- that the choice of traces has a large impact on the results plines and forwarding services. Afterwards, we study the when evaluating router and switch architectures. queueing behavior and packet delays using both real-world and synthetically generated packet traces. Our study covers queueing behavior and packet delays of a best-e®ort IPv4 Keywords forwarding service and an IPv4 forwarding service support- network processor, router, queueing behavior ing di®serv [2]. 1. INTRODUCTION The rest of the paper is organized as follows. Section 2 During the recent years, both the Internet tra±c and packet overviews related work. Section 3 presents a model for net- transmission rates have grown rapidly. At the same time, work processor systems. Section 4 presents and character- new Internet services such as VPNs, QoS, and IPTV are izes the packet traces used in the simulations. Section 5 emerging. These trends have implications in the architecture shows the experimental setup. Section 6 presents and ana- of routers. Ideally, a router should process packets at line lyzes the results. Finally, section 7 concludes the paper. rates of high speed, and at the same time be su±ciently programmable and flexible to support current and future Internet services. 2. RELATED WORK There are a variety of studies focusing on investigating the To meet these requirements, network processor systems have performance of network processor systems in various as- pects. These studies are based on analytical models, simulations, or real experiments. An example of an analytical model is described in [3], where the design space of a network processor is explored. How- ever, the model is based on a high level of abstraction, where the goal is to quickly identify interesting architectures, which may be subject to a more detailed evaluation using simulation. Their ¯nal output is three candidate architectures, representing cost versus performance tradeo®s. The IETF ForCES (Forwarding and Control Element Sep- route aration) group has de¯ned the ForCES forwarding element processor model [4]. The model provides a general management model Source Sink for diverse forwarding elements including network proces- ingress egress sors. The observation that current network processors are line card 1 line card 1 di±cult to program has influenced the work with NetVM, a Source Sink ingress egress network virtual machine [5]. NetVM models the main com- line card 2 line card 2 ponents of a network processor, and aims at influencing the Source Sink design of next-generation future network processor architec- ingress egress tures giving a more uni¯ed programming interface. line card 3 line card 3 Source Sink Many studies on the Intel IXP 1200 network processor have ingress egress been performed. Spalink et al. demonstrate how to build line card 4 switch line card 4 a software-based router using the IXP 1200 [6]. Their anal- fabric ysis partly focuses on queueing disciplines, which includes packet scheduling, queue contention and port mapping. Lin Figure 1: Network processor system overview. et al. present an implementation and evaluation of di®serv over the IXP 1200 [7]. They have showed in detail the design and implementation of di®serv. The throughput of the 3.1 Network Processor Line Cards flows are measured and the performance bottlenecks of the Processing blocks network processor are identi¯ed. For example, they found Processing blocks are abstractions of processing elements SRAM to be one of the major performance bottlenecks. Pa- (PEs) in a line card. In a block, a program runs on the paefstathiou et al. present how to manage queues in network local processing unit and processes the received packets. A processors [8]. The study is performed both at the IXP 1200 block may need to wait for external access to memory or an and a reference prototype architecture. To summarize, the engine in order to complete, thus reducing the utilization. queueing studies performed at the IXP 1200 are focused on Using several threads increases the utilization by processing technical details, including where and how to queue packets. several packets simultaneously: while one thread is blocked, Still, there are no studies on dimensioning the queues in a another may take over the execution. network processor system based on real-world and synthetically generated traces. Engines Finally, in-router queueing behavior and packet delays are Engines are special-purpose hardware available to a network studied in a gateway router of the Sprint IP backbone net- processor that performs speci¯c tasks. They are usually trig- work [9]. The statistics are used to derive a model of router gered by PEs. Examples are TCAM engines and checksum delay performance that accurately predicts packet delays in- engines. side a router. Channels 3. A MODEL FOR NETWORK PROCES- Processing blocks and engines are inter-connected by chan- SOR SYSTEMS nels that represent potential paths for packet transfer. In this section, a model for a network processor system is presented. The major building blocks of the model are line Queues cards using network processors, a switch fabric and a route There are several places in the system where queues are nec- processor. In other words, we have modeled a router using essary. First, packets may arrive to an ingress line card at network processor based line cards. The model is based on a higher rate than the service rate of the line card. Sec- a simpler model, presented in an earlier work [1]. ond, several ingress line cards may simultaneously transmit a large number of packets to the same egress line card. The basic building blocks of a network processor line card Third, the introduction of quality of service may cause best- are processing blocks, engines, channels and queues. Such a e®ort packets to be queued when higher priority tra±c is line card can be logically separated into an ingress and an present. egress line card. Fig. 1 shows a network processor system that represents a router with four ingress and four egress In order to make the processing of packets more e±cient, line cards. We assume that there is only one port inside a special meta-packet is created. This meta-packet includes each line card, and packets arriving at this port are ¯rst the packet header, information about the packet and a pointer processed at the ingress line card, and are then transmitted to the actual packet. While the actual packet resides in through the switch fabric to the egress line cards. Based on slower SDRAM, the meta-packet is stored in faster SRAM the queueing discipline, the packets can be queued either at for faster access. This means that an SRAM operation needs the ingress or at the egress line cards. Moreover, there is to be performed when transferring a packet between process- a route processor whose main task is to handle routing and ing blocks, while an SDRAM operation is needed to transmit management protocols. the entire packet over the backplane. In this model, there is no slow-path or terminating tra±c. All queues are formed from meta packets and are FIFOs im- In other words, packets are only sent between line cards, plementing either tail drop or random early discard (RED) and there is no tra±c to or from the route processor. policies. TCAM Egress Line Card TCAM Table 1: Autocorrelation of the FUNET interarrival Source Ethernet Destination Ethernet Source Ethernet Destination Ethernet Sink times.

Queueing Behavior and Packet Delays in Network Processor Systems

Instability Phenomena in Underloaded Packet Networks with Qos Schedulers M.Ajmone Marsan, M.Franceschinis, E.Leonardi, F.Neri, A.Tarello

Scalable Multi-Module Packet Switches with Quality of Service

Improving the Scalability of High Performance Computer Systems

Fair Scheduling in Input-Queued Switches Under Inadmissible Traffic

Scheduling Algorithm and Evaluating Performance of a Novel 3D-VOQ Switch

On Scheduling Input Queued Cell Switches

Performance Analysis of Networks on Chips

Switch Kenji Yoshigoe University of South Florida

Quantifiable Service Differentiation for Packet Networks

Fabric-On-A-Chip: Toward Consolidating Packet Switching Functions on Silicon

Data Path Processing in Fast Programmable Routers

Msc THESIS Design of a High-Performance Buﬀered Crossbar Switch Fabric Using Network on Chip