Design and Evaluation of Network Systems and Forwarding Applications

JING FU

Licentiate Thesis
TRITA-EE 2006:054
ISSN 1653-5146
School of Electrical Engineering
KTH, Stockholm, Sweden, 2006

Academic dissertation which, with the permission of Kungl Tekniska högskolan (KTH), is presented for public examination for the degree of licentiate on Tuesday, 19 December 2006, in lecture hall Q2.

© Jing Fu, December 2006

Printed by: Universitetsservice US AB

Abstract

During recent years, both traffic volumes and packet transmission rates have been growing rapidly, and new Internet services such as VPNs, QoS and IPTV have emerged. To meet increasing line speed requirements and to support current and future Internet services, improvements and changes are needed in current routers, both with respect to hardware architectures and forwarding applications. High-speed routers are nowadays mainly based on application-specific integrated circuits (ASICs), which are custom made and not flexible enough to support diverse services. General-purpose processors offer flexibility, but have difficulties in handling high data rates. A number of IP-address lookup algorithms have therefore been developed to enable fast packet processing in general-purpose processors. Network processors have recently emerged to provide the performance of ASICs combined with the programmability of general-purpose processors.

This thesis provides an evaluation of router designs, including both hardware architectures and software applications. The first part of the thesis contains an evaluation of various network processor system designs. We introduce a model for network processor systems which is used as a basis for a simulation tool. Thereafter, we study two ways to organize processing elements (PEs) inside a network processor to achieve parallelism: a pipelined and a pooled organization. The impact of using multiple threads inside a single PE is also studied. In addition, we study the queueing behavior and packet delays in such systems. The results show that parallelism is crucial to achieving high performance, but that the pipelined and the pooled processing-element topologies achieve comparable performance. The detailed queueing behavior and packet delay results have been used to dimension queues, which can serve as guidelines for designing memory subsystems and queueing disciplines.
The second part of the thesis contains a performance evaluation of an IP-address lookup algorithm, the LC-trie. The study considers trie search depth, prefix vector access behavior, cache behavior, and packet lookup service time. For the packet lookup service time, the evaluation contains both experimental results and results obtained from a model. The results show that the LC-trie is an efficient route lookup algorithm for general-purpose processors, capable of performing 20 million packet lookups per second on a Pentium 4, 2.8 GHz computer, which corresponds to a 40 Gb/s link for average-sized packets. Furthermore, the results show the importance of the choice of packet traces when evaluating IP-address lookup algorithms: real-world and synthetically generated traces may have very different behaviors. The results presented in the thesis are obtained through studies of both hardware architectures and software applications. They could be used to guide the design of next-generation routers.


Acknowledgements

I would like to thank my main advisor, Professor Gunnar Karlsson, for believing in me and offering me the PhD position in his lab. I would like to express my gratitude for his support, guidance and comments. My second advisor, Associate Professor Olof Hagsand, has guided and helped me throughout the years. Even though he left the lab a year ago, he has continued to support me when needed. I am also grateful to everyone at LCN, including those who have graduated and left the lab. I appreciate the friendship, the discussions and the stimulating working environment. I would also like to thank the organizations and entities that have funded my research: the Winternet project funded by SSF, and the Graduate School of Telecommunication at KTH. Finally, I would like to thank my family for their continuous support and patience.

Contents

Contents vii

1 Introduction 1
1.1 Network processors 3
1.2 Queueing in network processor systems 4
1.3 IP-address lookup 4
1.4 Contributions of the thesis 5

2 Summary of original work 7
2.1 Paper A: Designing and Evaluating Network Processor Applications 7
2.2 Paper B: Performance Evaluation and Cache Behavior of LC-Trie for IP-Address Lookup 8
2.3 Paper C: Queuing Behavior and Packet Delays in Network Processor Systems 8
2.4 Other papers 9

3 Conclusions and future work 11

Bibliography 13


Chapter 1

Introduction

The design of Internet switching devices, such as routers and switches, is becoming increasingly challenging. One challenge comes from the technological trends in optics, processors and memory. During recent years, the capacity of optical links has increased faster than the processing capacity of single-processor systems, which in turn has increased faster than memory access speed. These trends have a significant impact on the architecture of routers, since a single-processor router may not be able to process and buffer packets at high-speed line rates. Therefore, parallel processing, queueing and buffering approaches in routers are explored in various designs.

Another challenge comes from the requirements for new functionality in routers. The main task of a traditional IP router is to perform IP-address lookup to determine the next-hop address for the packets. The Internet has evolved, and numerous new services have been introduced: firewalls limit access to protected networks by filtering unwanted traffic; quality of service (QoS) provides service contracts between networks and users; Internet Protocol television (IPTV) delivers digital television using the IP protocol. Therefore, an ideal router should be flexible and programmable in order to support current and future Internet services.

Routers can be logically separated into a control plane and a forwarding plane. The control plane is responsible for routing, management and signalling protocols. The forwarding plane is responsible for forwarding packets from one port to another. Figure 1.1 shows a route processor executing the control plane, and a switch fabric together with a number of line cards constituting the forwarding plane. The forwarding procedure of a packet can be summarized as follows. A packet first arrives on a port of a line card for ingress processing. Based on forwarding decisions such as IP-address lookup, the outgoing interface and the egress line card of the packet are determined. Thereafter, the packet is transmitted through the switch fabric to the egress line card for egress processing. Finally, the packet is sent out through an output port to the external links.

Routers can be divided into two categories depending on their functionality. An edge router connects clients to the Internet, while a core router serves solely to transmit data between other routers, e.g. inside the network of an Internet service provider.

Figure 1.1: An abstraction of a router. [The figure shows a route processor in the control plane, and a switch fabric connecting four line cards, each attached to external links, in the forwarding plane.]

The requirements on routers differ depending on their type. While an edge router requires flexibility and programmability, the main requirement of a core router is high capacity.

Current packet forwarding devices are typically based on general-purpose processors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs) or network processors. Deciding which technology to use involves a trade-off between flexibility, performance, cost, development time and power consumption.

In a forwarding plane based on general-purpose processors, all networking functionalities must be implemented in software, which leads to low throughput. Recently, open-software PC routers such as XORP [1] and Bifrost [2] have emerged. They put particular focus on code efficiency in order to provide high performance, and could be used as edge routers.

ASICs are most suitable for core routers. They are fast, but neither flexible nor programmable. They are custom made for each application, and all functionality is implemented in hardware. There are configurable ASICs that provide limited configuration, but they are not fully programmable. The design of an ASIC is a long and complex task, involving a large team of engineers passing through several iterative design phases.

FPGAs contain programmable logic components and interconnects. Although they are generally slower than ASICs, they have several advantages, such as a shorter time to market and flexibility.

Network processors are programmable, with hardware optimized for network operations. The main purpose of a network processor is to provide the performance of ASICs combined with the programmability of general-purpose processors. There are also hybrid solutions. First, it is common for network processor systems to have general-purpose processors for control plane operations and slow-path processing [3,5]. Second, a hybrid FPGA/ASIC architecture is also available [7].
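As a concrete illustration, the forwarding procedure summarized above can be sketched in a few lines of code. The class and names below are illustrative assumptions, not taken from any real router, and the lookup is reduced to an exact-match table for brevity (real routers perform a longest-prefix match, discussed later in this chapter):

```python
# Illustrative sketch of the forwarding path of a packet through a router.
# All names and the routing-table contents are assumptions for this example.

class Router:
    def __init__(self, routes):
        # routes: maps a destination address (exact match, for simplicity)
        # to a pair (egress line card, outgoing interface).
        self.routes = routes

    def forward(self, packet):
        # 1. Ingress processing: extract the destination address.
        dst = packet["dst"]
        # 2. Forwarding decision: the lookup determines the outgoing
        #    interface and the egress line card.
        egress_card, out_if = self.routes[dst]
        # 3. The packet crosses the switch fabric to the egress line card,
        #    4. receives egress processing, and is sent on the output port.
        return (egress_card, out_if)

r = Router({"10.0.0.1": ("card3", "eth0")})
assert r.forward({"dst": "10.0.0.1"}) == ("card3", "eth0")
```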
In the following sections, we first present network processors, followed by queueing in network processor systems, and finally IP-address lookup.

1.1 Network processors

Currently, there is a variety of network processor designs [3–6]. Despite the diversity of the designs, there are some architectural similarities among them. First, all network processors make use of processing elements (PEs) to perform packet processing functions. PEs are instruction set processors that decode instruction streams. Second, in order to meet increasing line speed requirements, all network processors use some kind of parallel processing approach by exploiting packet-level and task-level parallelism [6, 8].

1.1.1 Approaches to parallelism

The first approach to parallelism is to use multiple PEs, since the processing capacity of a single PE may not suffice to handle high line rates. There seem to be two main variants for structuring the PEs. The first alternative is to arrange the PEs in a pipeline, where each PE performs a single pipeline stage of packet processing. The processing is then constrained to a single dimension, but this may have advantages in reduced hardware complexity. The second alternative is to arrange the PEs in a general pool, where each PE performs the same functionality. The advantage of the pooled approach is its generality. A hybrid approach combines the two into parallel pipelines. The second approach is to achieve parallelism within a single PE. The Intel IXP 2800 network processor, for example, supports several threads that process packets simultaneously [3], whereas the Xelerated X10 network processor uses a synchronized dataflow mechanism to achieve parallelism within a PE [5].
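Under idealized assumptions (no stalls, perfect load balance, work evenly divisible across stages), a back-of-the-envelope model suggests why the two PE organizations can perform comparably. The functions below are an illustrative sketch, not a model from the enclosed papers:

```python
# Idealized throughput of the two PE organizations described above.
# Units are arbitrary: work_per_packet is in cycles, throughput in
# packets per cycle. These formulas ignore stalls and load imbalance.

def pipelined_throughput(num_pes, work_per_packet):
    # Each PE performs one stage of work_per_packet / num_pes cycles;
    # the pipeline then completes one packet per stage time.
    stage_time = work_per_packet / num_pes
    return 1.0 / stage_time

def pooled_throughput(num_pes, work_per_packet):
    # Each PE runs the full program, so num_pes packets complete
    # per work_per_packet cycles.
    return num_pes / work_per_packet

# Under these ideal assumptions the two organizations match, which is
# consistent with the comparable performance reported in the thesis.
assert pipelined_throughput(8, 400) == pooled_throughput(8, 400)
```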

1.1.2 Modelling and evaluation of network processors

A variety of network processor models have been introduced for management, programming and performance evaluation purposes. The IETF ForCES (forwarding and control element separation) group has defined the ForCES forwarding element model [10]. It provides a general management model for diverse forwarding elements, including network processors. Several programming models have been developed, for example NP-Click [9], a programming model for the Intel IXP 1200 network processor [4]. The observation that current network processors are difficult to program has influenced the work on NetVM, a network virtual machine [11]. NetVM models the main components of a network processor, and aims at influencing the design of next-generation network processor architectures by giving them a more unified programming interface. A number of network processor models have been introduced to study various design approaches [12–14]. Some of the models focus on a particular network processor, such as the Intel IXP 1200, while others are general models.

1.2 Queueing in network processor systems

There are several places in a network processor system where queues are necessary. First, packets may arrive through several ports to an ingress line card of a network processor system at a higher rate than the service rate of the line card. Second, several ingress line cards may simultaneously transmit a large number of packets to the same egress line card. Third, the introduction of quality of service may cause best-effort packets to be queued when higher priority traffic is present. The queueing disciplines normally used between ingress and egress line cards include input queueing, output queueing and virtual output queueing (VOQ) [15]. In input queueing, the queues are placed between the ingress line cards and the switch fabric. Input queueing is not efficient due to head-of-the-line blocking: If the packet at the head of a queue is blocked waiting for access to a particular egress line card, packets in the queue destined to other egress line cards are blocked as well, even if the other egress line cards are ready to receive them. In output queueing, the queues are placed between the switch fabric and the egress line cards. It allows packets from several ingress line cards to be transmitted to a single egress line card simultaneously. The main challenge in output queueing centers on the capacity of the switch fabric and the queue memory: The fabric has to allow multiple packets to be transmitted to an egress line card simultaneously, instead of one at a time as in the input queueing case. In addition, the memory bandwidth must have capacity equal to that of the switch fabric. As an alternative, a system can use virtual output queueing with the queues placed between the ingress line cards and the switch fabric. Unlike input queueing, there is a separate queue for each egress line card. This solves the head-of-the-line blocking problem and does not require high speedup in the switch fabric, but requires arbitration of the queues.
In addition to using a virtual output queue (VOQ) for each egress line card, a VOQ for each traffic class can be used to provide service differentiation.
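The head-of-the-line blocking argument above can be made concrete with a small sketch. The data structures and function names are illustrative assumptions:

```python
from collections import deque

# Why virtual output queueing avoids head-of-line blocking: with a single
# input queue, a packet destined to a busy egress card blocks the packets
# behind it, even when their egress cards are idle. Queue elements here
# are simply the id of the egress line card a packet is destined to.

def single_input_queue_can_send(queue, busy_egress):
    # Only the head of the queue may be considered for transmission.
    return bool(queue) and queue[0] not in busy_egress

def voq_can_send(voqs, busy_egress):
    # One queue per egress card: any non-empty queue whose egress card
    # is idle may send, regardless of the other queues.
    return any(q for egress, q in voqs.items()
               if q and egress not in busy_egress)

# Packets destined to egress cards 1 and 2; card 1 is currently busy.
q = deque([1, 2])
assert not single_input_queue_can_send(q, busy_egress={1})  # HOL blocking

voqs = {1: deque([1]), 2: deque([2])}
assert voq_can_send(voqs, busy_egress={1})  # card 2's packet proceeds
```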

1.3 IP-address lookup

The processing functions inside a packet forwarding device can be categorized into a number of items, including classification, IP-address lookup, modification, metering, scheduling and shaping. The goal of IP-address lookup is to find the outgoing interface and the next-hop address of an IP packet, given the destination IP address as input argument. The procedure is as follows: for each packet, the router extracts the destination IP address, makes a lookup in a routing table, and determines the outgoing interface. Each entry in the routing table consists of a prefix and its length. Instead of an exact match, the IP-address lookup algorithm performs a longest-prefix match. Since each packet needs to be examined at high network speed, the lookup procedure must be very efficient. For a 10 Gb/s interface, 5 million lookups per second are required for average-sized packets.
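The figure of 5 million lookups per second follows from simple arithmetic, assuming an average packet size of 250 bytes (an assumed value, chosen to be consistent with the rates quoted in this thesis):

```python
# Lookup-rate requirement for a given link rate and average packet size.
# The 250-byte average packet size is an assumption for this calculation.

avg_packet_bits = 250 * 8          # 2000 bits per average packet
link_rate = 10e9                   # 10 Gb/s
lookups_per_second = link_rate / avg_packet_bits
assert lookups_per_second == 5e6   # 5 million lookups per second

# The same packet size at 40 Gb/s gives the 20 million lookups per
# second cited later for the LC-trie evaluation.
assert 40e9 / avg_packet_bits == 20e6
```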

1.3.1 Approaches to IP-address lookup

The current approaches for performing IP-address lookup include both hardware and software solutions. The hardware solutions make use of content-addressable memory (CAM) and ternary content-addressable memory (TCAM) to perform IP-address lookup [16–18]. CAM and TCAM provide content-based memory that compares a search key to all slots in memory in parallel. To achieve higher performance and reduce the time for memory accesses, many hardware solutions use a pipelined architecture [19, 20]. There are also many approaches for fast IP-address lookup that are based on software solutions. The majority of these algorithms use a modified version of a trie data structure [21]. Degermark et al. try to fit the data structure into the cache [22]. Srinivasan and Varghese present a modified trie data structure for performing IP-address lookup [23]. Finally, the LC-trie, an efficient IP-address lookup algorithm, is another approach based on a modified version of the trie data structure [24]. The LC-trie algorithm has recently been implemented in the Linux 2.6.13 kernel forwarder.
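To illustrate the longest-prefix match that these trie-based algorithms perform, the sketch below implements a plain (uncompressed) binary trie over address bits. The LC-trie additionally applies level and path compression to shorten searches; those steps are omitted here, and all names are illustrative:

```python
# A plain binary trie performing longest-prefix match, the baseline
# structure that the LC-trie compresses. Prefixes and addresses are
# given as lists of bits for clarity.

class TrieNode:
    def __init__(self):
        self.children = [None, None]
        self.next_hop = None  # set if a routing-table prefix ends here

def insert(root, prefix_bits, next_hop):
    node = root
    for b in prefix_bits:
        if node.children[b] is None:
            node.children[b] = TrieNode()
        node = node.children[b]
    node.next_hop = next_hop

def longest_prefix_match(root, addr_bits):
    node, best = root, None
    for b in addr_bits:
        if node.next_hop is not None:
            best = node.next_hop      # remember longest match so far
        node = node.children[b]
        if node is None:
            return best               # cannot descend further
    return node.next_hop if node.next_hop is not None else best

root = TrieNode()
insert(root, [1, 0], "A")      # prefix 10*
insert(root, [1, 0, 1], "B")   # prefix 101*
assert longest_prefix_match(root, [1, 0, 1, 1]) == "B"  # longest wins
assert longest_prefix_match(root, [1, 0, 0, 0]) == "A"  # falls back to 10*
```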

1.3.2 Performance evaluation of IP-address lookup algorithms

There are several efforts aimed at accurate performance measurement of IP-address lookup algorithms. Narlikar and Zane present an analytical model to predict the performance of software-based IP-address lookup algorithms [25]. Kawabe et al. present a method for predicting IP-address lookup algorithm performance based on statistical analysis of Internet traffic [26]. Ruiz-Sanchez et al. survey existing route lookup algorithms [27]. In particular, they examine the packet lookup service time of lookup algorithms using a random network trace.

1.4 Contributions of the thesis

This thesis provides an evaluation of router designs, including both hardware architectures and software applications. The contributions include work in two areas.

The first area is an investigation of various aspects of network processor systems. A system model is introduced and used as a basis for a simulation tool. In particular, different methods to organize PEs to achieve parallelism are studied, together with the impact of using multiple threads inside a single PE. The queueing behavior and packet delays in a system are studied, given real-world and synthetically generated packet traces as input data. The results on queueing behavior have been used to dimension queues, which can serve as guidelines for designing memory subsystems and selecting queueing disciplines.

The second area is a detailed performance evaluation of an IP-address lookup algorithm, the LC-trie. The performance study includes trie search depth, prefix vector access behavior, cache behavior, and packet lookup service time. For the packet lookup service time, the evaluation contains both experimental results and results obtained from a formal model.

The rest of the thesis is organized as follows. Chapter 2 summarizes the original work of the thesis, presented in three enclosed papers. Chapter 3 concludes the work and proposes future research topics.

Chapter 2

Summary of original work

2.1 Paper A: Designing and Evaluating Network Processor Applications

J. Fu and O. Hagsand, "Designing and Evaluating Network Processor Applications", in Proc. of 2005 IEEE Workshop on High Performance Switching and Routing (HPSR), pp. 142-146, Hong Kong, May 2005.

Summary: This paper introduces a network processor model based on a study of current network processors. A simulation environment following the model is constructed using the Java programming language. The simulation environment provides a programming interface and run-time to evaluate different design approaches, including processing element layout, forwarding application design, and the use of multiple threads inside a single processing element.

The evaluation of the various designs is done by analyzing the resulting saturation throughput, latency, utilization and code complexity. Using the simulation environment, an IPv4 forwarding application is evaluated using two different processing element topologies: a pipelined and a pooled. In addition, the impact of using multiple threads inside a single PE is studied.

The simulation results show that the use of parallelism is crucial to achieve high performance. In particular, using multiple threads inside a processing element has a large impact on the performance. The processing blocks become fully utilized when there are four or more threads. Moreover, both architectures, the pipelined and the pooled, achieve comparable performance.
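The observation that four or more threads saturate a processing element is consistent with a standard latency-hiding argument: if a thread computes for C cycles and then stalls for S cycles on memory, n threads can keep the PE busy once nC ≥ C + S. The numbers below are illustrative assumptions, not measurements from the paper:

```python
# Illustrative latency-hiding model of multithreading inside a PE:
# a thread computes for C cycles, then stalls for S cycles on memory.
# With n threads the stall can overlap other threads' computation, so
# PE utilization is roughly U(n) = min(1, n * C / (C + S)).

def pe_utilization(n_threads, compute_cycles, stall_cycles):
    return min(1.0, n_threads * compute_cycles /
               (compute_cycles + stall_cycles))

# Assumed example: 25 compute cycles and 75 stall cycles per packet.
assert pe_utilization(1, 25, 75) == 0.25
assert pe_utilization(4, 25, 75) == 1.0  # four threads hide the stall fully
assert pe_utilization(8, 25, 75) == 1.0  # additional threads add nothing
```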

Contribution: The original idea of developing a model for network processors came from the second author of the paper. The author of this thesis developed the model, carried out simulation experiments, obtained and analyzed the results and wrote the article under

the supervision of the second author.

2.2 Paper B: Performance Evaluation and Cache Behavior of LC-Trie for IP-Address Lookup

J. Fu, O. Hagsand and G. Karlsson, "Performance Evaluation and Cache Behavior of LC- Trie for IP-Address Lookup", in Proc. of 2006 IEEE Workshop on High Performance Switching and Routing (HPSR), Poznan, Poland, June 2006.

Summary: This paper presents a detailed performance analysis of the LC-trie IP-address lookup algorithm. Both a realistic trace from FUNET [28] and a synthetically generated random network trace are used for the study. The performance measurements include trie search depth, prefix vector access behavior, cache behavior, and packet lookup service time. Search depth and prefix vector access behavior are obtained experimentally by running a modified version of the LC-trie source code [29]. Cache behavior results are obtained by using an introduced cache model supporting two replacement policies: random and least recently used. Finally, the packet lookup service time results are obtained both through an introduced model and through experimental runs.

The results show that for the FUNET trace, the LC-trie algorithm is capable of performing 20 million lookups per second on a Pentium 4, 2.8 GHz computer, which corresponds to a 40 Gb/s link for average-sized packets. Since the FUNET trace contains both university and student dormitory traffic, we believe it is a realistic and representative trace for aggregated Internet traffic. The results also show that the LC-trie performs up to five times better on the realistic trace than on a synthetically generated random network trace, which illustrates that the choice of traces may have a large influence on the results when evaluating IP-address lookup algorithms.

Contribution: The original idea of performing a performance analysis of the LC-trie lookup algorithm came from the third author of the paper. The author of this thesis carried out experiments, developed the analytical model, obtained and analyzed the results and wrote the article under the supervision of the co-authors.

2.3 Paper C: Queuing Behavior and Packet Delays in Network Processor Systems

J. Fu, O. Hagsand and G. Karlsson, "Queuing Behavior and Packet Delays in Network Processor Systems", Technical report, TRITA-EE 2006:050, ISSN 1653-5146, KTH, Royal Institute of Technology, Sweden, October 2006.

Summary: This paper studies the queueing behavior and packet delays in a network processor system. The study is based on an introduced system model and a simulation tool constructed according to the model. Using the simulation tool, both best-effort and diffserv IPv4 forwarding are modeled and tested using real-world and synthetically generated packet traces. By studying the traces, the packet arrival processes are characterized and modeled.

The results include average and 99th-percentile queue lengths for various queues, percentages of dropped packets, and packet delays. The results on queueing behavior have been used to dimension the queues, which can serve as guidelines for designing memory subsystems and queueing disciplines. The results on packet delays show that our diffserv setup provides good service differentiation between best-effort and priority packets. While best-effort packets experience high loss probability, delay and delay jitter, priority packets are transmitted with very low delay and small delay jitter.

The study also reveals that there are large differences in queueing behavior and packet delays between the real-world and the synthetically generated traces. This indicates that the choice of traces has a large impact on the results when evaluating router and switch architectures.
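Dimensioning a queue from simulated queue-length samples can be sketched as follows: the 99th percentile indicates a buffer size that only about 1% of observations exceed. The nearest-rank percentile definition and the sample data below are illustrative assumptions:

```python
# Queue dimensioning from queue-length samples: the 99th percentile
# gives a buffer size exceeded by roughly 1% of observations.

def percentile(samples, p):
    # Nearest-rank definition of the p-th percentile.
    s = sorted(samples)
    k = max(0, int(round(p / 100.0 * len(s))) - 1)
    return s[k]

# 100 invented queue-length samples (packets in queue per observation).
samples = [2, 3, 5, 1, 4] * 20
avg = sum(samples) / len(samples)
assert avg == 3.0
assert percentile(samples, 99) == 5  # dimension the buffer for 5 packets
```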

Contribution: The author of this thesis developed the model, carried out simulation experiments, obtained and analyzed the results and wrote the article under the supervision of the co-authors.

2.4 Other papers

The following papers have also been authored by the author of this thesis:

• J. Fu and O. Hagsand, "A Programming Model for a Forwarding Element", in Proc. of SNCNW 2004, Karlstad, Sweden, November 2004.

• J. Fu and O. Hagsand, "An Evaluation Environment for Programmable Forwarding Elements", Poster in Multimedia Interactive Protocols and Systems 2004, Grenoble, France, November 2004.

• J. Fu, O. Hagsand and G. Karlsson, "An Analysis of Queueing Behavior in Network Processor Systems", in Proc. of SNCNW 2006, Luleå, Sweden, October 2006.

Chapter 3

Conclusions and future work

This thesis presents an evaluation of network processor systems and a performance evaluation of the LC-trie IP-address lookup algorithm. A system model is developed to study various design approaches, and a simulation tool is constructed according to the model. The simulation tool is used to evaluate several aspects of network processor design, including PE layout, forwarding application design, use of multiple threads inside a single PE, and queueing discipline. We evaluate these designs by analyzing the resulting throughput, latency, utilization, code complexity, queueing behavior, and packet delays.

The conclusions from this work cover a number of areas. First, our results show that the use of parallelism is crucial to achieve high performance. The best performance is achieved when there is an adequate number of threads inside each PE. Also, the pipelined and pooled PE topologies achieve comparable performance. Second, the detailed results on queueing behavior are used to dimension various queues, to guide the design of memory subsystems and queueing disciplines. Third, the results on packet delays show that our diffserv setup provides good service differentiation between best-effort and priority packets. While best-effort packets experience high loss probability, delay and delay jitter, priority packets are transmitted with very low delay and small delay jitter. Fourth, the results on queueing behavior reveal that real-world traffic is not only bursty, but that packets are also clumped together when transmitted to outgoing interfaces. When the real-world and synthetically generated traces are compared, large differences in queueing behavior and packet delays are observed. This illustrates that the choice of traces may have a large influence on the results when evaluating router and switch architectures.

The performance measurements on the LC-trie include trie search depth, prefix vector access behavior, cache behavior, and packet lookup service time.
The packet lookup service time is obtained both through experimental runs and a formal model. The results show that the LC-trie algorithm is capable of performing 20 million packet lookups per second on a Pentium 4, 2.8 GHz computer, which corresponds to a 40 Gb/s link for average-sized packets. Further, the results show that the LC-trie performs up to five times better on the realistic FUNET trace compared to a synthetically generated network trace.

There are open issues to address which can be the subject of future study:


• We would like to study in detail the design, implementation and performance of a distributed router. In particular, we would like to study how to perform efficient IP-address lookup. The lookup procedure could be partitioned and performed in both the incoming and outgoing FEs. The separation of the procedure may result in new algorithms and data structures enabling faster lookups.

• We would like to investigate how to support network services using a distributed router. Any user, including computer hosts and routers, could join the distributed router anywhere in the Internet. Network services, including virtual network service, bypassing of Internet filters, and anonymity in the Internet, are provided to the users. The research will focus on how to provide the services and how to perform internal routing in such a router.

Bibliography

[1] XORP, eXtensible Open Router Platform. Available: http://www.xorp.org/.

[2] Bifrost Network Project. Available: http://bifrost.slu.se/index.en.html.

[3] Intel Corp., "Intel IXP 2800 Network Processor Family", Hardware Reference Manual, August 2004.

[4] Intel Corp., "Intel IXP 1200 Network Processor Family", Hardware Reference Manual, December 2001.

[5] Xelerated, "Xelerator X10 Network Processors", Product Brief, October 2003.

[6] N. Shah, "Understanding Network Processors", Master's thesis, Univ. of California, Berkeley, 2001.

[7] P. S. Zuchowski, C. B. Reynolds, R. J. Grupp, S. G. Davis, B. Cremen, and B. Troxel, "A hybrid ASIC and FPGA architecture", in Proc. of the 2002 IEEE/ACM International Conference on Computer-Aided Design, San Jose, California, pp. 187-194, 2002.

[8] D. E. Comer, "Network Systems Design using Network Processors", Prentice Hall, 2004.

[9] N. Shah, W. Plishker, and K. Keutzer, "NP-Click: A programming model for the Intel IXP1200", in 9th International Symposium on High Performance Computer Architectures (HPCA), Feb. 2003.

[10] L. Yang, J. Halpern, R. Gopal, and R. Dantu, "ForCES Forwarding Element Model", Internet Draft, March 2006.

[11] L. Degioanni, M. Baldi, D. Buffa, F. Risso, F. Stirano, and G. Varenni, "Network Virtual Machine (NetVM): A New Architecture for Efficient and Portable Packet Processing Applications", in Proc. of 8th International Conference on Telecommunications (ConTEL 2005), pp. 153-168, Zagreb, Croatia, June 2005.

[12] M. Gries, C. Kulkarni, C. Sauer, K. Keutzer, "Exploring Trade-offs in Performance and Programmability of Processing Element Topologies for Network Processors", Network Processor Design: Issues and Practices, volume 2, Morgan Kaufmann Publishers, Nov. 2003.


[13] L. Thiele, S. Chakraborty, M. Gries, "Design Space Exploration of Network Processor Architectures", First Workshop on Network Processors at the 8th International Symposium on High Performance Computer Architectures, February 2002.

[14] P. Crowley and J.-L. Baer, "A Modeling Framework for Network Processor Systems", First Workshop on Network Processors at the 8th International Symposium on High Performance Computer Architectures, February 2002.

[15] T. Anderson, S. Owicki, J. Saxe, and C. Thacker, "High speed switch scheduling for local area networks," ACM Trans. Comput. Syst., pp. 319-352, Nov. 1993.

[16] A. McAuley and P. Francis, "Fast Routing Table Lookups using CAMs," in Proc. of IEEE Infocom'93, vol. 3, pp. 1382-1391, San Francisco, 1993.

[17] F. Zane, N. Girija and A. Basu, "CoolCAMs: Power-Efficient TCAMs for Forwarding Engines," in Proc. of IEEE Infocom'03, pp. 42-52, San Francisco, May 2003.

[18] E. Spitznagel, D. Taylor and J. Turner, "Packet Classification Using Extended TCAMs", in Proc. of ICNP'03, pp. 120-131, Nov. 2003.

[19] P. Gupta, S. Lin, and N. McKeown, "Routing Lookups in Hardware at Memory Access Speeds," in Proc. of IEEE Infocom'98, pp. 1240-1247, San Francisco, Apr. 1998.

[20] A. Moestedt and P. Sjödin, "IP Address Lookup in Hardware for High-Speed Routing," in Proceedings of Hot Interconnects VI, Stanford, 1998.

[21] E. Fredkin, "Trie memory," Communications of the ACM, pp. 490-499, 1960.

[22] M. Degermark, A. Brodnik, S. Carlsson, and S. Pink, "Small Forwarding Tables for Fast Routing Lookups," in Proc. of ACM SIGCOMM'97, pp. 3-14, Oct. 1997.

[23] V. Srinivasan and G. Varghese, "Faster IP lookups using controlled prefix expansion," in Proc. of ACM SIGMETRICS 1998, Madison, pp. 1-10, 1998.

[24] S. Nilsson and G. Karlsson, "IP-address lookup using LC-tries," IEEE Journal on Selected Areas in Communications, vol. 17, no. 6, pp. 1083-1092, June 1999.

[25] G. Narlikar and F. Zane, "Performance modeling for fast IP lookups", in Proc. of ACM SIGMETRICS 2001, pp. 1-12, 2001.

[26] R. Kawabe, S. Ata and M. Murata, "Performance Prediction Method for IP Lookup Algorithms," in Proc. of IEEE Workshop on High Performance Switching and Routing 2002, pp. 111-115, Kobe, Japan, May 2002.

[27] M. A. Ruiz-Sanchez, E. W. Biersack, and W. Dabbous, "Survey and taxonomy of IP address lookup algorithms," IEEE Network Magazine, vol. 15, no. 2, pp. 8-23, Mar. 2001.

[28] CSC, Finnish IT Center for Science, FUNET Looking Glass. Available: http://www.csc.fi/sumoi/funet/noc/looking-glass/lg.cgi.

[29] LC-trie implementation source code. Available: http://www.nada.kth.se/~snilsson/public/code/router/.