Compact Forest: Scalable architecture for IP Lookup on FPGAs

Oğuzhan Erdem — Electrical and Electronics Engineering, Trakya University, Edirne, TURKEY 22030. Email: [email protected]
Aydin Carus — Computer Engineering, Trakya University, Edirne, TURKEY 22030. Email: [email protected]
Hoang Le — Electrical Engineering, University of Southern California, Los Angeles, USA 90007. Email: [email protected]

Abstract—Memory efficiency with compact data structures for Internet Protocol (IP) lookup has recently regained much interest in the research community. In this paper, we revisit the classic trie-based approach to the longest prefix matching (LPM) problem used in IP lookup. Among the available implementation platforms, the Field Programmable Gate Array (FPGA) is a prevailing choice for SRAM-based pipelined architectures for high-speed IP lookup because of its abundant parallelism and other desirable features. However, due to the limited on-chip memory and number of I/O pins of FPGAs, state-of-the-art designs cannot support the large routing tables, consisting of over 350K prefixes, found in backbone routers.

We propose a search algorithm and data structure, denoted Compact Trie (CT), for IP lookup. Our algorithm demonstrates a substantial reduction in memory footprint compared with state-of-the-art solutions. A parallel architecture on FPGAs, named Compact Trie Forest (CTF), is introduced to support the data structure. Along with pipelining techniques, our optimized architecture also employs multiple memory banks in each stage to further reduce memory and resource redundancy. Implementation on a state-of-the-art FPGA device shows that the proposed architecture can support large routing tables consisting of up to 703K IPv4 or 418K IPv6 prefixes. The post place-and-route result shows that our architecture can sustain a throughput of 420 million lookups per second (MLPS), or 135 Gbps for the minimum packet size of 40 Bytes. This result surpasses the worst-case 150 MLPS required by the standardized 100GbE line cards.

I. INTRODUCTION

Most hardware-based solutions for network routers fall into two main categories: Ternary Content Addressable Memory (TCAM)-based and dynamic/static random access memory (DRAM/SRAM)-based solutions. In TCAM-based solutions, each prefix is stored in a word, and an incoming IP address is compared in parallel with all the entries in TCAM in one clock cycle. TCAM-based solutions are simple, and therefore are the de-facto solution in today's routers. However, TCAMs are expensive, power-hungry, and offer little adaptability to new addressing and routing protocols. On the other hand, SRAM has higher density, lower power consumption, and higher speed. The common data structure in SRAM-based solutions is some form of tree, in which multiple memory accesses are required to find the search result. Therefore, FPGA-based pipelining techniques are used to improve the throughput. However, pipelined hardware implementations of these algorithms suffer from inefficient memory usage due to the unbalanced mapping of the tree onto the pipeline stages.

We propose a compact trie forest for trie-based IP lookup. The search data structure is realized on a scalable, high-throughput, SRAM-based linear pipeline architecture. Additionally, the design exploits the dual-ported feature of on-chip memory on FPGAs to achieve high throughput. This paper makes the following contributions:

1) A Compact Trie (CT) structure that achieves better memory efficiency than the traditional binary trie (Section III).
2) A Compact Trie Forest (CTF) consisting of multiple CTs to eliminate the backtracking problem in CTs. These CTs are searched in parallel for high-performance IP lookup, taking advantage of the abundant parallelism provided by state-of-the-art FPGAs (Section III).
3) A linear pipelined SRAM-based architecture that can be easily implemented in hardware. Our optimized architecture also employs multiple memory banks to further improve memory efficiency (Section IV).
4) A design that can support up to 703K IPv4 and 418K IPv6 prefixes, using a state-of-the-art FPGA device. The post place-and-route result shows that our architecture can sustain a throughput of 420 million lookups per second, or 135 Gbps for the minimum packet size of 40 Bytes (Section V).

The rest of the paper is organized as follows. Section II covers the background and overviews the existing solutions for IP lookup. Section III presents in detail the algorithms and data structures for CTF. Section IV introduces the proposed architecture and its implementation on FPGA. Section V presents the experimental setup and implementation results. Section VI concludes the paper.

II. BACKGROUND

A. IP Lookup Overview

IP packet forwarding, or simply IP lookup, is a classic problem. In computer networking, a routing table is a data table stored in a router or a networked computer.

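As a concrete reference point, longest prefix matching over a small table can be sketched in a few lines. This is a minimal illustration of the LPM semantics only, using some of the sample prefixes from Fig. 1; it is not the lookup algorithm proposed in this paper.

```python
# A minimal sketch of longest prefix matching (LPM), using a few of the
# sample prefixes from Fig. 1. Prefixes and addresses are bit strings.
TABLE = {"11": "P1", "111": "P2", "0101": "P3", "00101": "P4"}

def lpm(addr_bits):
    """Return the next hop of the longest prefix matching addr_bits, or None."""
    best = None
    for prefix, nhop in TABLE.items():
        if addr_bits.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, nhop)
    return best[1] if best else None

print(lpm("11110000"))  # "111" is longer than "11", so P2
```

A real router additionally falls back to a default route when no entry matches; here the sketch simply returns None.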
978-1-4673-2921-7/12/$31.00 © 2012 IEEE

The routing table stores the routes, and the metrics associated with those routes, such as next-hop routing indices, to particular network destinations. The IP lookup problem is referred to as "longest prefix matching" (LPM), which is used by routers in IP networking to select an entry from the given routing table. To determine the outgoing port for a given address, the longest matching prefix among all the prefixes needs to be determined. Routing tables often contain a default route in case matches with all other entries fail.

B. Related Work

Various hardware-based IP lookup solutions on FPGAs have been proposed in recent years. In general, these FPGA-based approaches can be classified into four categories: (1) linear pattern search in TCAM [1], [2], (2) hash-based solutions [3], [4], (3) binary bit traversal in pipelined tries [5]–[8], and (4) binary value search in pipelined trees [9], [10].

TCAM-based solutions are simple, but they are expensive, power-hungry, and offer little adaptability to new addressing and routing protocols. Hash-based IP lookup schemes have several disadvantages: (a) a large number of different hash tables may be required to store routing prefixes of different lengths, (b) the use of separate hash functions for each length is impractical, (c) it is hard to find perfect hash functions that minimize bin overflows, and (d) additional memory (CAM, etc.) needs to be reserved for resolving overflows.

The most common and simple data structure for IP lookup is the binary trie. Trie-based solutions achieve good throughput performance and support quick prefix updates. In such a trie, the path from the root to a node represents a prefix in a routing table. Fig. 1 illustrates a sample prefix table and its corresponding binary trie; each black node corresponds to a prefix. Multiple memory accesses are required to find the longest matched prefix, so pipelining techniques are used to improve the throughput. However, pipelined hardware implementations of this algorithm suffer from inefficient memory usage due to the unbalanced mapping of the tree onto the pipeline stages.

Figure 1. (a) A sample prefix table (b) The corresponding binary trie

In pipelined tree-based IP lookup, each routing prefix is either converted to an L-bit value range, or expanded to some pre-defined lengths to compare directly with the input address. In a binary search tree (BST), each node has a value (prefix) and an associated next-hop index; the left subtree of a node contains only values less than or equal to the node's value, and the right subtree contains values greater than the node's value. Pipelined BST-based IP lookup solutions are limited by the complex pre-processing required to convert the routing prefixes into exclusive ranges and sort them. This structure also makes incremental updates difficult.

III. ALGORITHM AND DATA STRUCTURE

A. Definitions and Notations

The following notations are used throughout the paper: MSB - Most Significant Bit, LSB - Least Significant Bit, MSSB - Most Significant Set Bit, LSSB - Least Significant Set Bit, MSRB - Most Significant Reset Bit, LSRB - Least Significant Reset Bit. For instance, the prefix 00110101∗ has 0 and 1 as its MSB and LSB values, and 2, 7, 0 and 6 as its MSSB, LSSB, MSRB and LSRB positions, respectively.

Definition: A prefix node in a trie is any node for which the path from the root of the trie corresponds to an entry in the routing table. If no valid prefix is stored in a trie node, it is called a non-prefix node.

Definition: The active part (AP) of a prefix is the bit string between
1) the MSSB and LSSB bits for (MSB, LSB) = (0,0)
2) the MSSB and LSRB bits for (MSB, LSB) = (0,1)
3) the MSRB and LSSB bits for (MSB, LSB) = (1,0)
4) the MSRB and LSRB bits for (MSB, LSB) = (1,1)
of the prefix, excluding both the MSS(R)B and LSS(R)B bits. For example, the active parts of the prefixes 011011∗ and 00101100∗ are 1 and 01, respectively.

Definition: If two prefixes have the same active part, they are called conflicted prefixes. For instance, the prefixes 011001∗ and 0011010∗ are conflicted because they both have the active part 10.

B. Prefix Table Conversion

A prefix p can be expressed as the concatenation of three substrings x, y and z, such that p = xyz. In this notation, x is a string composed of only 0's followed by a single 1 (the MSSB), or only 1's followed by a single 0 (the MSRB). z is a string composed of a 1 (the LSSB) followed by only 0's, or a 0 (the LSRB) followed by only 1's. Alternatively, the prefix p = xyz can be represented as the triplet {|x|, y, |z|}. For example, 000010010100∗ can be represented as {5, 0010∗, 3}, with 5 and 3 being the lengths of the preceding and succeeding runs of zeros plus one. Hence, the information lost by removing the preceding and succeeding zeros is recovered from their lengths. Using this representation, the input prefix table is converted to a compact prefix table (C-PT). The fields |x| and |z| can each be stored in ⌈log2 W⌉ bits, where W denotes the length of an IP address (32 in IPv4, and 64 in IPv6).

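The conversion described above can be sketched as follows. This is a best-effort rendering of the decomposition rules from this subsection, not the authors' code:

```python
def decompose(prefix):
    """Split a prefix p into (x, y, z) per Section III-B.

    x: the leading 0s up to and including the MSSB (or leading 1s up to the MSRB);
    z: the LSSB followed by trailing 0s (or the LSRB followed by trailing 1s);
    y: everything in between, i.e. the active part.
    Prefixes whose bits are all equal (e.g. "11") cannot be decomposed.
    """
    msb, lsb = prefix[0], prefix[-1]
    # x ends at the first bit that differs from the MSB
    i = prefix.index("1" if msb == "0" else "0")
    # z starts at the last bit that differs from the LSB
    j = len(prefix) - 1 - prefix[::-1].index("1" if lsb == "0" else "0")
    x, y, z = prefix[: i + 1], prefix[i + 1 : j], prefix[j:]
    return x, y, z

x, y, z = decompose("000010010100")
print((len(x), y, len(z)))  # the triplet (5, '0010', 3), as in the text
```

As a sanity check, the conflicted prefixes 011001∗ and 0011010∗ from Section III-A both yield y = "10" under this decomposition.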
The sample compact prefix table is shown in Table I.

Table I
COMPACT PREFIX TABLE

PT_init          |  PT                  |  C-PT
Prefix     NHop  |  x     y     z      |  MSB,LSB  |x|  y     |z|  NHop
11*        P1    |  -     11*   -      |  -,-      -    11*   -    P1
111*       P2    |  -     111*  -      |  -,-      -    111*  -    P2
0101*      P3    |  01    *     01     |  0,1      2    *     2    P3
00101*     P4    |  001   *     01     |  0,1      3    *     2    P4
01001*     P5    |  01    0*    01     |  0,1      2    0*    2    P5
01110*     P6    |  01    1*    10     |  0,0      2    1*    2    P6
10001*     P7    |  10    0*    01     |  1,1      2    0*    2    P7
11001*     P8    |  110   *     01     |  1,1      3    *     2    P8
11010*     P9    |  110   *     10     |  1,0      3    *     2    P9
000101*    P10   |  0001  *     01     |  0,1      4    *     2    P10
001010*    P11   |  001   0*    10     |  0,0      3    0*    2    P11
010001*    P12   |  01    00*   01     |  0,1      2    00*   2    P12
011101*    P13   |  01    11*   01     |  0,1      2    11*   2    P13
011110*    P14   |  01    11*   10     |  0,0      2    11*   2    P14
100011*    P15   |  10    0*    011    |  1,1      2    0*    3    P15
100110*    P16   |  10    01*   10     |  1,0      2    01*   2    P16
110001*    P17   |  110   0*    01     |  1,1      3    0*    2    P17
111010*    P18   |  1110  *     10     |  1,0      4    *     2    P18

C. Compact Trie Structure

A compact prefix table can be represented by a binary trie constructed using only the active part (AP_prefix), i.e., the y substring, of the prefixes. Two conflicted prefixes are differentiated from each other by their |x|, |z|, MSB, LSB values, or any combination of them. For instance, the two conflicted prefixes 01001∗ and 100100∗ have the same |x| = 2, but different |z| (2 and 3), MSB (0 and 1) and LSB (1 and 0) values. The resulting trie is shallower and denser than a traditional binary trie; we call it a compact trie (CT), shown in Fig. 2a. In a CT, trie traversal is similar to that of the binary trie, but the matching property is similar to that of the BST. As previously stated, in addition to the child pointers and the next-hop information fields, extra information (the |x|, |z|, MSB and LSB values) is stored at each node to differentiate the conflicted prefixes.

However, the variable number of conflicted prefixes at each node results in memory inefficiency, since in hardware implementations the size of every node is determined by that of the largest node. To solve this problem, an auxiliary data structure can be constructed for the conflicted prefixes; yet a secondary search is then required at each node. Alternatively, backtracking can be employed to obtain the correct next-hop result. In backtracking, the search proceeds in the backward direction after it fails to find a match in the forward direction. Backtracking in hardware pipelining requires either stalling the pipeline or duplicating the memory for backward searching; therefore, it is not desirable.

We develop a novel approach to solve the backtracking problem in the CT implementation. We set a limit Ptrie on the number of conflicted prefixes per node, and move the excess conflicted prefixes to a newly generated CT. The final data structure is called a compact trie forest (CTF). IP lookup in a CTF is performed in parallel in all the tries, and the results are fed into a priority encoder. In a CTF, backtracking is eliminated because each node stores at most Ptrie prefix(es), and matching results can be resolved at each node; hence, the search simply moves in the forward direction. The design parameter Ptrie is a trade-off between memory efficiency and the number of CTs in the forest. Our analysis of the real routing tables collected from [11] shows that the number of compact tries is no more than 9 for Ptrie = 2. For the prefixes that cannot be decomposed, such as 11∗ and 111∗ (less than 1% of all prefixes), a single traditional binary trie with no compaction is used. Fig. 2b shows the forest structure composed of CTs for the sample prefixes in Fig. 1, for Ptrie = 2.

Figure 2. (a) Compact Trie (CT) (b) Compact Trie Forest (CTF)

D. IP Lookup Algorithm

For each incoming packet, the destination IP address is extracted. The active part APkey is obtained from the IP address and searched in all the CTs in parallel. The search starts from the root node, and at each node visited, APkey is left-shifted by one bit. The direction of traversal is determined by the most significant bit of APkey (left if 0, right otherwise), as in a binary trie. The outputs from all tries are compared, and the longest match is returned as the final result. The search algorithm for a CT is presented in Alg. 1; the notations used in the algorithm are listed in Table II.

Table II
LIST OF NOTATIONS USED IN THE ALGORITHM

Notation        Meaning
MSB(LSB)key     Most (Least) Significant Bit of the key
MSB(LSB)prefix  Most (Least) Significant Bit of the prefix
APkey           Active part of the IP address
Mkey            MSSB position (for MSBkey = 0) or MSRB position (for MSBkey = 1) of the key
NHIkey          Next-hop information for the key
Lkey^int        Number of consecutive zeros (ones) after the most significant set (reset) bit in APkey, for LSBprefix = 0 (LSBprefix = 1)
B0              Most significant bit value in APkey
Mprefix         |x| of the prefix
Lprefix         |z| of the prefix
NHIprefix       Next-hop information of the prefix

The prefix update complexity of CTF is the same as that of a binary trie; due to space limitations, however, the update procedure is not presented in this paper.

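The spill rule that turns a CT into a CTF can be sketched as follows. This is an illustrative model only: the paper's CTs are real binary tries over the active part, which we flatten here into dictionaries keyed by y, and `P_TRIE` and `build_forest` are our own hypothetical names.

```python
# Sketch of compact-trie-forest construction: each prefix is keyed by its
# active part y; at most P_TRIE conflicted prefixes may share a node, and
# any excess spills into a newly created trie.
P_TRIE = 2

def build_forest(prefixes):
    """prefixes: list of (y_active_part, next_hop). Returns a list of tries,
    each modeled as a dict mapping an active part to a list of next hops."""
    forest = []
    for y, nhop in prefixes:
        for trie in forest:
            if len(trie.setdefault(y, [])) < P_TRIE:
                trie[y].append(nhop)
                break
        else:  # every existing trie is full at node y -> start a new CT
            forest.append({y: [nhop]})
    return forest

# The six sample prefixes with an empty active part spread over three CTs,
# mirroring the root nodes of Fig. 2b.
forest = build_forest([("", p) for p in ["P3", "P4", "P8", "P9", "P10", "P18"]])
print(len(forest))  # 3
```

This makes the Ptrie trade-off visible: a smaller limit bounds the per-node work (no backtracking) but grows the number of tries, and hence pipelines.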
Algorithm 1: IP lookup
Input: Destination IP address
Output: NHIkey
 1: Find Mkey based on MSBkey and extract the active part APkey
 2: Start the search from the root node; traverse the trie nodes using B0
 3: while current node is not a leaf node do
 4:     shift APkey one bit left
 5:     if current node is a prefix node then
 6:         if MSBkey != MSBprefix then
 7:             no match
 8:         else if B0 == LSBprefix then
 9:             no match
10:         else if Mkey != Mprefix then
11:             no match
12:         else if Lprefix == 0 then
13:             match; update NHIkey = NHIprefix
14:         else if Lkey^int(LSBprefix) >= Lprefix then
15:             match; update NHIkey = NHIprefix
16:         else
17:             no match
18:         end if
19:         Update current node
20:     end if
21: end while
22: return NHIkey

Figure 4. A basic stage of the pipeline for the trie lookup. (MSS(R)BP: Most Significant Set (Reset) Bit Position; NHI: next-hop information; MSB: Most Significant Bit; LSB: Least Significant Bit)

Table III
LIST OF NOTATIONS USED IN MEMORY SIZE CALCULATIONS

Notation  Meaning
MCT       Total memory size of a single compact trie
MCTF      Total memory size of the compact trie forest
Ntrie     Total number of nodes
P         Number of pointer bits
NHI       Number of bits to store the next-hop information
M         Number of bits to store Mprefix (⌈log2 W⌉)
L         Number of bits to store Lprefix (⌈log2 W⌉)
S         Number of bits to store MSBprefix and LSBprefix
W         IP address length (32 for IPv4 and 128 for IPv6)

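The per-node matching test of Alg. 1 (lines 6–18) can be sketched in a few lines. The field names (`msb`, `b0`, `m`, `l_int`, `l`, `nhi`) are our own shorthand for the Table II notations, and the exact form of the B0 test reflects our reading of the algorithm:

```python
def node_match(key, node):
    """Per-node match test of Alg. 1 (lines 6-18).

    `key` and `node` are dicts carrying the Table II fields.
    Returns the prefix's next-hop information on a match, else None.
    """
    if key["msb"] != node["msb"]:      # line 6: MSBs must agree
        return None
    if key["b0"] == node["lsb"]:       # line 8: B0 vs. LSBprefix test
        return None
    if key["m"] != node["m"]:          # line 10: |x| values must agree
        return None
    if node["l"] == 0:                 # line 12: no |z| constraint
        return node["nhi"]
    if key["l_int"] >= node["l"]:      # line 14: enough trailing run bits
        return node["nhi"]
    return None                        # line 17

key = {"msb": 0, "b0": 1, "m": 2, "l_int": 3}
node = {"msb": 0, "lsb": 0, "m": 2, "l": 2, "nhi": "P6"}
print(node_match(key, node))  # P6
```

In hardware this test is one match module per stored prefix, which is why the number of match modules grows with Ptrie.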
Figure 3. Block diagram of the IP lookup architecture. (MSSBP/MSRBP: Most Significant Set/Reset Bit Position; LSSBP/LSRBP: Least Significant Set/Reset Bit Position)

IV. ARCHITECTURE AND IMPLEMENTATION ON FPGA

A. Architecture

We use pipelining to improve the throughput. The number of pipelines is equal to the number of CTs, and the number of pipeline stages is determined by the height of the CTs used. Fig. 3 shows the overall architecture of the proposed IP lookup engine.

The IP address extracted from the incoming packet is routed to all pipelines, and the searches are performed in parallel in the forest. The results are fed through a priority resolver, which selects the next-hop index of the longest matched prefix. In Fig. 3, the MSS(R)BP and LSS(R)BP blocks calculate the most and least significant set/reset bit positions of the IP address, respectively. The active part (AP) of the IP address is extracted by the left barrel shifter. The length of the AP is used to terminate the search early, once its last bit has been checked.

There are at most W stages in the pipelines. Fig. 4 presents a single pipeline stage. For Ptrie = 1, each stage includes one match module, whose operation is described in Alg. 1, and a BRAM, where the trie nodes are stored. The number of match modules and the width of the BRAM entries increase linearly with the Ptrie value. This provides a trade-off between the memory and resource requirements: small Ptrie values improve memory efficiency, but increase the number of compact tries and, in turn, the number of pipelines in the architecture. To take advantage of the dual-ported feature of the BRAM in FPGAs, the architecture is configured as dual linear pipelines to double the lookup rate. At each stage, the memory has dual Read/Write ports, so two packets can be input every clock cycle.

B. Implementation on FPGAs

Table III lists the notations used in the analysis. The memory consumption of the CTF can be calculated using Equ. 1 and 2:

MCT = Ntrie × (2P + M + L + NHI + S)   (1)
MCTF = Σi MCTi   (2)

We assign M = ⌈log2 W⌉ = 5, L = ⌈log2 W⌉ = 5 and S = 2. We set NHI = 8 to support up to 256 next-hop entries, and P = 16 to support up to 64K nodes in each level. Substituting these values into Equ. 1 gives MCT = 52 Ntrie bits. We observed that the memory consumption increases linearly with the number of prefixes. Therefore, a state-of-the-art FPGA device with 36 Mb of BRAM (e.g. a Xilinx Virtex-6) can support up to 703K IPv4 or 418K IPv6 prefixes without using external SRAM (assuming the same prefix distribution).

Table IV
NUMBER OF NODES PER LEVEL IN CT0 OF THE CTF CONSTRUCTED USING PREFIX TABLE RRC00

Level    0  1  2  3  4   5   6   7    8    9    10    11    12    13    14     15
Node(0)  1  0  0  0  0   0   0   1    1    9    36    77    291   1505  3439   6970
Node(1)  0  0  0  0  0   2   1   1    15   25   58    166   547   1386  2740   5565
Node(2)  0  2  4  8  16  30  63  126  240  478  924   1777  3093  4246  6484   8313
Total    1  2  4  8  16  32  64  128  256  512  1018  2020  3931  7137  12663  20848

Level    16     17     18     19     20     21    22   23   24   25   26  27  28  29  30  31
Node(0)  11496  15772  16774  6552   767    524   274  153  90   56   32  10  0   0   0   0
Node(1)  9037   13925  19935  28908  13678  599   419  173  82   49   32  19  10  0   0   0
Node(2)  10270  12532  15643  21537  11563  130   108  27   9    9    3   6   1   0   0   0
Total    30803  42229  52352  56997  26008  1253  801  353  181  114  67  35  11  0   0   0

Table V
NUMBER OF PREFIXES OF REAL-LIFE AND SYNTHETIC ROUTING TABLES

IPv4        rrc00    rrc01    rrc02    rrc03    rrc04    rrc05    rrc06    rrc07    rrc10    rrc11
IPv6        rrc00_6  rrc01_6  rrc02_6  rrc03_6  rrc04_6  rrc05_6  rrc06_6  rrc07_6  rrc10_6  rrc11_6
# prefixes  332118   324172   272743   321617   347232   322997   321577   322557   319952   323668

Table VI
NUMBER OF NODES (K) / TOTAL MEMORY (MBIT), IPV4

        rrc00      rrc01      rrc02      rrc03      rrc04      rrc05      rrc06      rrc07      rrc10      rrc11
BT      807/34.7   789/33.9   679/29.2   780/33.5   845/36.3   783/33.7   782/33.6   784/33.7   787/33.8   775/33.3
BTPC    597/46.0   583/45.0   493/38.1   579/44.6   628/48.5   581/44.8   579/44.7   581/44.8   576/44.4   583/44.9
BST     453/35.8   440/34.8   380/30.1   434/34.3   474/37.5   436/34.4   436/34.5   437/34.5   431/34.1   438/34.7
DBPC    530/25.8   519/25.3   446/21.7   513/25.1   545/26.6   514/25.1   514/25.1   516/25.2   518/25.3   509/24.8
CTF     294/19.9   288/19.4   249/16.7   283/19.1   304/20.5   284/19.1   284/19.2   285/19.2   286/19.3   281/18.9
CTFopt  294/16.6   288/16.35  249/13.84  283/15.99  304/17.38  284/16.05  284/16.04  285/16.13  286/15.89  281/16.19

Table VII
NUMBER OF NODES (K) / TOTAL MEMORY (MBIT), IPV6

                     rrc00_6   rrc01_6   rrc02_6   rrc03_6   rrc04_6   rrc05_6   rrc06_6   rrc07_6   rrc10_6   rrc11_6
BT                   7610/341  7476/335  7367/330  8094/363  7391/332  7376/331  7407/332  7327/329  7724/347  7484/336
BTPC                 648/72    633/70    628/69    677/75    630/70    627/69    629/70    624/69    654/72    636/70
DBPC                 698/47    686/46    676/45    743/50    678/45    677/45    680/46    672/45    709/48    687/46
CTFopt (path comp.)  212/27.9  208/27.3  206/27.0  222/29.2  207/27.2  206/27.1  206/27.1  205/26.9  214/28.0  208/27.2

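The per-node bit budget of Equ. 1 can be double-checked with a few lines, using the bit widths chosen in Section IV-B:

```python
# Per-node bit budget from Equ. 1, with the widths chosen in Section IV-B.
P, M, L, NHI, S = 16, 5, 5, 8, 2      # pointers, |x| field, |z| field, next hop, MSB+LSB
node_bits = 2 * P + M + L + NHI + S   # Equ. 1: bits per trie node
print(node_bits)                      # 52, i.e. MCT = 52 * Ntrie

# Rough upper bound on node count for a device with 36 Mb of on-chip BRAM
# (a simplified ceiling; the real capacity depends on the prefix distribution
# and BRAM bank granularity).
bram_bits = 36 * 2**20
print(bram_bits // node_bits)
```

The linear growth of memory with the number of prefixes then translates this per-node figure directly into the 703K IPv4 / 418K IPv6 capacity claims.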
C. Optimization

In existing pipelined architectures, each stage uses a single bank of BRAM to store the trie nodes. In our CTF, the trie nodes do not have a fixed size, because the number of conflicted prefixes varies. Due to this variation, the BRAM entry size is determined by the largest trie node, resulting in poor memory utilization. Therefore, multiple BRAM banks are employed to improve the memory efficiency.

Table IV shows the node distribution of the largest CT (CT0) in the CTF constructed using the real routing table rrc00 from [11]. CT0 contains approximately 90% of the total number of nodes in the overall structure. In this trie, Ptrie = 2, so a trie node can be one of three types: non-prefix nodes (Node(0)), prefix nodes with one next-hop entry (Node(1)), and prefix nodes with two next-hop entries (Node(2)). Each column corresponds to a level in the compact trie. Rows 1, 2 and 3 show the total numbers of non-prefix, one-next-hop, and two-next-hop nodes in the corresponding level, respectively; the last row gives the total number of nodes. The results confirm that, in the first 11 levels, most of the nodes have two next-hop entries; hence, a single BRAM supporting a two-next-hop entry format is enough. Similarly, the last 8 levels have very few nodes, so a single BRAM supporting two next-hop entries per level suffices there as well. For the remaining stages, on the other hand, using multiple BRAMs clearly improves memory efficiency.

V. PERFORMANCE EVALUATION

A. Experimental Setup

Ten experimental IPv4 core routing tables were collected from Project RIS [11] on 06/03/2010 and used to evaluate our algorithm in a realistic networking environment. Additionally, from these core routing tables, we generated the corresponding IPv6 routing tables using the same method as in [15]; the IPv4-to-IPv6 prefix mapping is one-to-one. The numbers of prefixes of the experimental routing tables are shown in Table V.

B. Memory Requirement

Fig. 5 shows the node distribution among the CTs in the forest for the real tables. The results indicate that the majority of nodes and prefixes are stored in CT0.

Figure 5. Number of nodes per trie in the forest

Table VI shows the memory requirements of the state-of-the-art algorithms for the real tables. The candidates are the binary trie (BT), the binary trie with path compression (BTPC), the BST (with leaf pushing), and Distance-Bounded Path Compression (DBPC). The results are presented in an A/B format, where A is the total number of nodes and B is the total memory size (in Mbits). The results in Table VI show that our data structure achieves a substantial memory reduction for IPv4. The performance of our approach for IPv6 was also evaluated with path compression: it achieved a memory reduction of 12.2× over the binary trie and 2.6× over the path-compressed trie.

Table VIII
PERFORMANCE COMPARISON

Architecture    # prefix (K)  Mem. eff. (Bits/prefix)  Throughput (Gbps)  Throughput eff. (Gbps/B)
CTF             324           52.46                    135                2.57
RCST [12]       80            57.15                    135                2.36
Flashtrie [3]   310           39.27                    80                 2.03
DBPC [13]       324           82                       150                1.83
BST [9]         324           113.21                   175                1.55
POLP [6]        80            120                      64                 0.53
Ring [14]       80            120                      64                 0.53
TreeBitmap [5]  310           69.9                     12.8               0.18

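The headline throughput figures quoted for the dual-pipeline configuration (420 MLPS and roughly 135 Gbps at 40-byte packets, as stated in the abstract) reduce to simple arithmetic:

```python
clock_hz = 210e6          # post place-and-route clock (4.750 ns period)
lookups = 2 * clock_hz    # dual pipelines: two lookups per cycle
print(lookups / 1e6)      # 420.0 MLPS

# One minimum-size packet (40 bytes) per lookup:
gbps = lookups * 40 * 8 / 1e9
print(gbps)               # 134.4 Gbps, reported as ~135 Gbps in the paper
```

The 150 MLPS line-card requirement corresponds to 100 Gbps divided by the 40-byte (plus framing) minimum packet; the design's 420 MLPS clears it with headroom.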
C. Throughput

The proposed hardware design was simulated and implemented in Verilog, using Xilinx ISE 12.4, with a Xilinx Virtex-6 XC6VSX475T (−2 speed grade) as the target device. With a clock period of 4.750 ns, the design is capable of running at 210 MHz while utilizing less than 15% of the total logic resources. Using the dual-pipeline configuration, the architecture can support 420 million lookups per second (MLPS), or 135 Gbps for the minimum packet size of 40 Bytes. This result surpasses the worst-case 150 MLPS required by the standardized 100GbE line cards.

D. Performance Comparison

The performance of CTF is compared with state-of-the-art IP lookup approaches with respect to memory efficiency (in Bits/prefix) and throughput (in Gbps). We also use throughput efficiency (in Gbps/Bits), the ratio of the throughput to the memory efficiency, to evaluate the time-storage trade-off of the various designs. Columns 4 and 5 of Table VIII give the throughput and throughput efficiency of the state-of-the-art hardware-based architectures. Column 4 shows that our design achieves good throughput performance; as shown in Column 5, our scheme also outperforms the existing schemes with respect to throughput efficiency.

VI. CONCLUSION

In this paper, we proposed a compact trie forest data structure for IP lookup. The proposed structure achieves substantial memory savings without the need for backtracking. We also designed and implemented a high-throughput, linear-pipeline architecture to support the proposed data structure on FPGAs. Furthermore, we optimized the existing architecture to support multiple banks per pipeline stage to increase memory efficiency. Our algorithm can therefore be used to improve the performance (throughput and memory efficiency) of trie-based IPv4/v6 lookup schemes, satisfying internet link rates up to and beyond 100 Gbps at core routers with a compact memory footprint that can fit in the on-chip caches of multi-core and network processors. In the future, we plan to extend the algorithm to virtual routers to improve their memory efficiency.

REFERENCES

[1] D. Lin, Y. Zhang, C. Hu, B. Liu, X. Zhang, and D. Pao, "Route table partitioning and load balancing for parallel searching with TCAMs," in Proc. IPDPS, 2007, pp. 1–10.
[2] M. J. Akhbarizadeh, M. Nourani, R. Panigrahy, and S. Sharma, "A TCAM-based parallel architecture for high-speed packet forwarding," IEEE Trans. Comput., vol. 56, no. 1, pp. 58–72, 2007.
[3] M. Bando and J. Chao, "Flashtrie: Hash-based prefix-compressed trie for IP route lookup beyond 100Gbps," in Proc. INFOCOM, 2010.
[4] M. Waldvogel, G. Varghese, J. Turner, and B. Plattner, "Scalable high speed IP routing lookups," in Proc. SIGCOMM, 1997, pp. 25–38.
[5] W. Eatherton, G. Varghese, and Z. Dittia, "Tree bitmap: Hardware/software IP lookups with incremental updates," SIGCOMM Comput. Commun. Rev., vol. 34, no. 2, pp. 97–122, 2004.
[6] W. Jiang and V. K. Prasanna, "A memory-balanced linear pipeline architecture for trie-based IP lookup," in Proc. Hot Interconnects (HotI '07), 2007, pp. 83–90.
[7] W. Lu and S. Sahni, "Packet forwarding using pipelined multibit tries," in Proc. ISCC, 2006, pp. 802–807.
[8] H. Song, J. Turner, and J. Lockwood, "Shape shifting tries for faster IP route lookup," in Proc. ICNP, 2005, pp. 358–367.
[9] H. Le and V. K. Prasanna, "Scalable high throughput and power efficient IP-lookup on FPGA," in Proc. 17th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), 2009.
[10] H. Lu and S. Sahni, "A B-tree dynamic router-table design," IEEE Trans. Comput., vol. 54, no. 7, pp. 813–824, 2005.
[11] "RIS Raw Data," http://data.ris.ripe.net.
[12] H. Fadishei, M. S. Zamani, and M. Sabaei, "A novel reconfigurable hardware architecture for IP address lookup," in Proc. ANCS, 2005, pp. 81–90.
[13] H. Le, W. Jiang, and V. K. Prasanna, "Memory-efficient IPv4/v6 lookup on FPGAs using distance-bounded path compression," in Proc. IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), 2011, pp. 242–249.
[14] F. Baboescu, D. M. Tullsen, G. Rosu, and S. Singh, "A tree based router search engine architecture with single port memories," in Proc. ISCA, 2005, pp. 123–133.
[15] M. Wang, S. Deering, T. Hain, and L. Dunn, "Non-random generator for IPv6 tables," in Proc. 12th Annual IEEE Symposium on High Performance Interconnects, 2004, pp. 35–40.