This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON COMPUTERS

On Adding Bloom Filters to Longest Prefix Matching

Hyesook Lim, Member, IEEE, Kyuhee Lim, Nara Lee, and Kyong-hye Park, Student Members, IEEE


Abstract—High-speed IP address lookup is essential to achieve wire-speed packet forwarding in Internet routers. Ternary content addressable memory (TCAM) technology has been adopted to solve the IP address lookup problem because of its ability to perform fast parallel matching. However, the applicability of TCAMs presents difficulties due to cost and power dissipation issues. Various algorithms and hardware architectures have been proposed to perform the IP address lookup using ordinary memories such as SRAMs or DRAMs without using TCAMs. Among these, we focus on two efficient algorithms providing high-speed IP address lookup: parallel multiple-hashing and binary search on levels. This paper shows how effectively an on-chip Bloom filter can improve those algorithms. A performance evaluation using actual backbone routing data with 15,000-220,000 prefixes shows that, by adding a Bloom filter, the complicated hardware for parallel access is removed without a search performance penalty in the parallel multiple-hashing algorithm. Search speed is improved by 30-40% by adding a Bloom filter to the binary search on levels algorithm.

Index Terms—Internet, router, IP address lookup, longest prefix matching, Bloom filter, multi-hashing, binary search on levels, leaf pushing.

Manuscript received 5 Nov. 2011. The authors are with the Department of Electronics Engineering, Ewha W. University, Seoul, Korea (e-mail: [email protected]).

1 INTRODUCTION

Address lookup determines an output port using the destination IP address of an incoming packet. The address aggregation technology currently used for the Internet is a bitwise prefix matching scheme called classless inter-domain routing (CIDR), which uses variable-length subnet masking to allow arbitrary-length prefixes. An IP address is said to match a prefix if the most significant l bits of the address and an l-bit prefix are the same. When an IP address matches more than one prefix, the longest matching prefix is selected as the best matching prefix (BMP) [1]-[4]. IP address lookup is one of the most challenging operations in router design because of the amount of traffic and the number of networks, which have increased dramatically in recent years.

Using application-specific integrated circuits (ASICs) with off-chip ternary content addressable memories (TCAMs) has been the best solution to provide wire-speed packet forwarding. However, TCAMs have some limitations [5]. TCAMs consume 150 times more power per bit than SRAMs, and account for around 30-40% of the total line card power. As line cards are stacked together, TCAMs impose a high cost on the cooling system. System vendors are willing to accept some latency penalty if the power of a line card can be lowered [6]. TCAMs also cost about 30 times more per bit of storage than DDR SRAMs. Various algorithms have been studied to replace TCAMs with ordinary memories such as SRAMs or DRAMs [1]-[4], [6]-[26].

A fast on-chip SRAM is often used in several applications, so that critical data is stored there with a guaranteed fast access time [27], since an access to off-chip memory (usually DRAM) requires a longer access time, 10-20 times slower than an on-chip memory access. It is important to partition the data properly, so that a small part of the data is stored in on-chip memories and most of the data is stored in slower but higher-capacity off-chip memories.

Several metrics are used for evaluating the performance of IP address lookup algorithms and architectures. Since IP address lookup should be performed at wire-speed for every incoming packet, which can mean a hundred million packets per second, search performance is the most important metric. Search performance is often measured by the number of off-chip memory accesses. The next metric is the required memory size for storing a routing table. The incremental update of a routing table is also an important metric. Scalability for large routing data sets and migration to IPv6 should also be considered. The performance in these metrics depends on data structures and search algorithms, and thus it is essential to have an efficient structure and search algorithm to provide high-performance IP address lookup.

IP address lookup algorithms can be roughly categorized into trie (or tree)-based algorithms [2]-[4], [8], [13]-[26], hashing-based algorithms [9]-[11], and bitmap-based algorithms [6], [12]. Recently, dynamic programming-based approaches have been proposed to improve the search performance and/or storage performance [15]-[21].

Hashing is a well-defined procedure for turning each key into a smaller integer called a hash index, which serves as a pointer into an array. Hashing has been

Digital Object Identifier 10.1109/TC.2012.193    0018-9340/12/$31.00 © 2012 IEEE


used mostly in search algorithms to quickly locate a data record for a given search key. For the IP address lookup, hashing is applied to each length of prefixes, and the longest prefix among the matched prefixes is selected as the best match [9]-[11]. Among trie-based algorithms, binary search on hash tables organized by prefix lengths [13]-[15] provides the best IP address lookup performance.

A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set. Bloom filters have been popularly applied to network algorithms [7], [26], [28]-[31]. This paper shows how effectively an on-chip Bloom filter can improve the search performance of known efficient IP address lookup algorithms.

This paper is organized as follows. Section 2 describes the Bloom filter theory. Section 3 introduces two different algorithms providing high-speed IP address lookup: parallel multiple-hashing and binary search on levels. Section 4 describes our proposed method to improve those algorithms using a Bloom filter. Section 5 shows performance evaluation results, and Section 6 concludes the paper.

2 BLOOM FILTER THEORY

A Bloom filter is basically a bit-vector used to represent the membership information of a set of elements. A Bloom filter that represents a set S = {x1, x2, ..., xn} of n elements is described by an array of m bits, initially all set to 0. A Bloom filter supports two different operations: programming and querying. In programming, for an element x in the set S, k different hash functions are computed in such a way that each resulting hash index h_i(x) is in the range 0 <= h_i(x) <= m-1 for i = 1, ..., k. Then all the bit-locations corresponding to the k hash indices are set to 1 in the Bloom filter. The pseudo-code to program a Bloom filter for an element x is as follows [7]:

BFProgramming(x)
    for (i = 1 to k)
        BF[h_i(x)] = 1;

A query is performed to test whether an element y is in S. For an input y, k hash indices are generated using the same hash functions that were used to program the filter. The bit-locations in the Bloom filter corresponding to the hash indices are checked. If at least one of the locations is 0, then y is absolutely not a member of the set S, and the result is termed a negative. If all the hash index locations are set to 1, then the input may be a member of the set, and the result is termed a positive. The querying procedure is as follows [7]:

BFQuery(y)
    for (i = 1 to k)
        if (BF[h_i(y)] == 0) return negative;
    return positive;

However, a positive does not mean that all those bit-locations were set only by the element currently under query; there is a possibility that those locations were set by some other elements in the set. This type of positive result is termed a false positive. It is important to properly control the rate of false positives in designing a Bloom filter. For a given ratio m/n, it is known that the false positive probability is minimized when the number of hash functions k satisfies the following relationship [7]:

    k = (m/n) ln 2        (1)

On the whole, a Bloom filter may produce false positives but never false negatives.

3 RELATED WORKS

The IP address lookup problem can be defined formally as follows [8]. Let P = {P1, P2, ..., PN} be a set of routing prefixes, where N is the number of prefixes. Let A be an incoming IP address and S(A, l) be the substring of the most significant l bits of A. Let n(Pi) be the length of a prefix Pi. A is defined to match Pi if S(A, n(Pi)) = Pi. Let M(A) be the set of prefixes in P that A matches; then M(A) = {Pi in P : S(A, n(Pi)) = Pi}. The longest prefix matching (LPM) problem is to find the prefix Pj in M(A) such that n(Pj) > n(Pi) for all Pi in M(A), i != j. Once the longest matching prefix Pj is determined, the input packet is forwarded to the output port directed by the prefix Pj.

3.1 IP Address Lookup Algorithms Using Bloom Filters

The IP address lookup architecture proposed by Dharmapurikar et al. was the first algorithm employing a Bloom filter [7]. It performs parallel queries on W Bloom filters, sorted by prefix length, to determine the possible lengths of a prefix match, where W is 32 in IPv4. For a given IP address, off-chip hash tables are probed for the prefix lengths that turn out to be positive in the Bloom filters, starting from the longest prefix. This architecture has a high implementation complexity because of the Bloom filters as well as the hash tables in each prefix length. Depending on the prefix distribution, the sizes of the Bloom filters and the sizes of the hash tables can be highly skewed.

In order to bound the worst-case search performance by limiting the number of distinct prefix lengths, which is the same as the worst-case number of probes, controlled prefix expansion (CPE) [22] is suggested in the paper. However, prefix replication is inevitable in CPE. Moreover, the naive hash table employed in this architecture incurs collisions, and resolving the collisions using chaining adversely affects the worst-case lookup-rate guarantees that routers should provide [30], [32].
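The programming and querying operations of Section 2 can be sketched as executable code. This is a minimal illustration only: the array size, the number of hash functions, and the SHA-256-based index derivation are assumptions standing in for the CRC-based hash generator used later in this paper.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash functions over an m-bit array."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = [0] * m

    def _indices(self, x):
        # Derive k indices from a digest; a stand-in for the paper's CRC generator.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{x}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.m

    def program(self, x):  # BFProgramming(x)
        for h in self._indices(x):
            self.bits[h] = 1

    def query(self, y):  # BFQuery(y): True = positive, False = negative
        return all(self.bits[h] == 1 for h in self._indices(y))

bf = BloomFilter(m=64, k=2)
for prefix in ["1*", "00*", "010*", "111*"]:
    bf.program(prefix)

assert bf.query("010*")  # a programmed element is always positive
```

A programmed element can never produce a negative, while an element outside the set may still produce a (false) positive, matching the asymmetry described above.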


3.2 Parallel Multiple-Hashing

A hash function is used to map the search key to a hash index. In general, a hash function may map several different keys to the same hash index. This is called a collision. Collisions are an intrinsic problem of hashing. Broder et al. proposed to use multiple hash functions to reduce collisions [9]. Instead of searching for a perfect hash function in which each distinguishable search key is mapped to a different hash index, a multiple-hashing architecture [9] uses multiple hash functions for each search key. The number of hash tables is equal to the number of hash indices. Assuming two hash indices, the corresponding two hash tables are named the left table and the right table. Each slot of a hash table should contain a set of entries, and for this reason, each slot of a hash table is often called a bucket.

In storing a given prefix, two hash indices are obtained: the hash index from hash function 1 is used to access the left table, and the index from hash function 2 is used to access the right table. Comparing the number of loads already stored in the two accessed buckets, the prefix is stored in the bucket with the smaller number of loads. Through multiple-hashing, prefixes are distributed more evenly across the hash tables. The number of collisions can be controlled by three parameters: the number of hash tables, the number of buckets in a table, and the number of entries in a bucket.

To apply multiple-hashing to an IP address lookup problem with variable-length prefixes, the parallel multiple-hashing (PMH) architecture [10] constructs a separate multi-hash table for each group of prefixes with a distinct length and, additionally, an overflow TCAM. A prefix is stored into the overflow TCAM when both buckets are already full. Figure 1 shows the overall PMH architecture. Multiple hash tables (here two) are constructed for each length, and prefixes of each length are stored into either a left table entry or a right table entry of the corresponding length.

Fig. 1. Parallel multiple-hashing architecture.

The search procedure is as follows. For a given input address, hash indices for all possible lengths are obtained. Using these hash indices, the multi-hashing tables are accessed in parallel, and matching prefixes in each length (if they exist) are returned. The overflow TCAM is also assumed to be accessed in parallel. Among the returned prefixes, the longest matching prefix is selected by the priority encoder. By the parallel access of the multi-hashing tables in every length, the best matching prefix (BMP) is obtained in a single access cycle. However, since the tables of each length should be implemented in separate memories for parallel access, and the sizes of the tables can be highly skewed depending on the prefix distribution, the implementation complexity can become very high.

3.3 Binary Search on Trie Levels by Waldvogel

A binary trie is a tree-based structure which applies linear search on prefix length [4]. Each prefix resides in a node of the trie, in which the level and the path of the node from the root node correspond to the prefix length and the prefix value, respectively. Figure 2 shows the binary trie for an example set of prefixes P = {1*, 00*, 010*, 111*, 1101*, 11111*, 110101*}. In Fig. 2, black nodes represent prefixes, and white nodes represent empty internal nodes. At each node, the search proceeds to the left or right according to sequential inspection of the address bits starting from the most significant bit. If the bit is 0, the search proceeds to the left child; otherwise it proceeds to the right child, until it reaches a leaf node. The binary trie structure is simple and easy to implement. However, the search performance of the binary trie is linearly related to the length of the IP address, since each bit is examined one at a time.

Fig. 2. The binary trie for an example set of prefixes.

As an attempt to improve the search performance of the trie, algorithms performing binary search on trie levels were proposed [13], [14]. The binary search on level structure proposed by Waldvogel et al. [13] separates the binary trie according to the level of the trie and stores the nodes included in each level in a hash table. Binary search is performed on the hash tables of the levels. When accessing the medium-level hash table, if there is a match, the search space becomes the longer half;


otherwise, the search space becomes the shorter half. Figure 3 shows Waldvogel's binary search on length (W-BSL) structure with the denotation of access levels [13]. Level 3 is the first level of access, levels 1 and 5 are the second levels of access, and levels 2, 4, and 6 are the last levels of access. The W-BSL structure uses pre-computed markers and BMPs. Markers are pre-computed in the internal nodes if there is a longer prefix in the levels accessed later. A pre-computed BMP is maintained at each marker, and the pre-computed BMP is returned when there is no match in the longer levels. The markers and the BMPs are not maintained for the last level of access. They are pre-computed for nodes of preceding levels of access, as shown in Fig. 3.

Fig. 3. Waldvogel's binary search on lengths (W-BSL).

As a search example, for a 6-bit input 111000, the most significant 3 bits, which are 111, are used for hashing in accessing level 3. Since the input matches P5, it is remembered as the current BMP, and the search goes to level 5. In level 5, the most significant 5 bits of the input, 11100, do not match any node. The search goes to level 4 and does not match there either. Since level 4 is the last level of access, the search is over and prefix P5 is returned as the BMP. Using the binary search on trie levels, three memory accesses were performed to find the longest matching prefix for this input.

Waldvogel's algorithm provides O(log l_dist) hash-table accesses, where l_dist is the number of distinct prefix lengths. Srinivasan and Varghese have proposed to improve the search performance by the use of controlled prefix expansion, reducing the value of l_dist [22]. Kim and Sahni have proposed to optimize the storage requirement by selecting the prefix lengths that minimize the number of markers and pre-computed BMPs when l_dist is given [15].

3.4 Binary Search on Trie Levels in a Leaf-Pushed Trie

W-BSL requires complex pre-computation because of the prefix nesting relationship in which one prefix becomes a substring of another prefix. That is, since each node can have prefixes in both upper and lower levels, the markers and their BMPs should be pre-computed. If every prefix is disjoint from every other prefix, the prefixes are located only in leaves and are free from the prefix nesting relationship. Hence, the binary search on trie levels for a set of disjoint prefixes can be performed without pre-computation.

Lim's binary search on level structure (L-BSL) [14] uses leaf-pushing [22] to make every prefix disjoint. Figure 4 shows the leaf-pushed binary trie for the same set of prefixes. Leaf-pushed nodes are connected to the trie by dotted edges. The levels of access for performing the binary search on levels are also shown.

Fig. 4. Lim's binary search on lengths (L-BSL).

Assume a 6-bit input 110110. Note that we use a different input example from the W-BSL to show the search procedure for the L-BSL in detail. Since the first level of access is 4, the most significant 4 bits, which are 1101, are used for hashing. As the search encounters an internal node, it proceeds to a longer level. In level 6, the most significant 6 bits of the input, 110110, do not match any node. Hence the search proceeds to a shorter level, which is level 5. The input matches prefix P4 in level 5. The prefix P4 is returned and the search is over. The L-BSL finishes a search either when a match to a prefix occurs or when it reaches the last level of access, while the W-BSL always finishes a search when it reaches the last level of access.

4 THE PROPOSED ARCHITECTURES

4.1 Adding a Bloom Filter to the Parallel Multiple-Hashing Architecture (PMH-BF)

In this subsection, we propose to add an on-chip Bloom filter to the PMH architecture in order to reduce the implementation complexity. In implementing hash tables, previous architectures, such as [7], [9], [10], [11], [13] and [32], require separate hash tables for each length, and this increases the implementation complexity by requiring multiple variable-sized memories. To reduce the required number of memories by reducing the number of distinct prefix lengths, [7] suggested using CPE. However, CPE causes prefix replications and increases the


memory requirement. In [32], prefix collapsing is suggested. However, prefix collapsing increases the number of collisions in hashing. On the whole, in using hashing for IP address lookup, it is essential to reduce the required number of off-chip memories. If there is no need for parallel access in each prefix length, the hash table can be designed to accommodate prefixes of various lengths within a single table.

We describe the proposed architecture using the same example set. We use a single hash function based on a cyclic redundancy check (CRC) generator to obtain multiple hash indices for prefixes of every length [29]. The CRC generator is composed of shift-right registers with XOR logic, and hence it is easy to implement. There is an advantage to using a CRC generator as a hash generator: hash indices can be obtained consistently for prefixes of various lengths. Figure 5 shows an example of an 8-bit CRC generator. All the registers of the CRC generator are initially set to 0. Once a prefix of arbitrary length is serially entered into the CRC generator and XORed with the stored register values, a fixed-length scrambled code is obtained. By selecting a set of registers, or multiple sets of registers, from the scrambled code, we obtain as many hash indices as desired for any length.

In Fig. 6, we show the overall structure of our proposed architecture. The on-chip Bloom filter was programmed using the BF indices in TABLE 1. The multi-hashing table, which is composed of two hash tables, has 8 buckets and 2 entries per bucket storing the prefixes. Prefixes were stored into the multi-hashing table using the indices shown in TABLE 1. In this example, there is no overflow.

Fig. 6. Our proposed multiple hashing architecture (PMH- BF).
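The index-extraction scheme described above can be illustrated with a serial CRC-8 computation. The generator polynomial below (x^8 + x^2 + x + 1, a common CRC-8 choice) is an assumption, since the paper does not name the polynomial of Fig. 5; the codes produced here will therefore not necessarily match those in TABLE 1, and only the extraction mechanics are the point.

```python
def crc8_bits(bits, poly=0x07):
    """Serial (bit-at-a-time) CRC-8 over a string of '0'/'1' characters.
    Registers start at 0; each input bit is XORed into the feedback path,
    mirroring the shift-register-with-XOR generator described in the paper.
    The polynomial 0x07 is an assumption, not taken from the paper."""
    reg = 0
    for b in bits:
        feedback = ((reg >> 7) & 1) ^ int(b)
        reg = (reg << 1) & 0xFF
        if feedback:
            reg ^= poly
    return reg

def extract_indices(prefix_bits):
    """Select register subsets from the 8-bit scrambled code:
    first 4 / last 4 bits for a 16-bit Bloom filter,
    first 3 / last 3 bits for two 8-bucket hash tables."""
    code = crc8_bits(prefix_bits)
    bf_idx = (code >> 4, code & 0x0F)
    ht_idx = (code >> 5, code & 0x07)
    return code, bf_idx, ht_idx

# A prefix of any length yields one fixed-length code and several indices.
code, bf_idx, ht_idx = extract_indices("1101")
```

Because the code length is fixed regardless of how many bits are shifted in, the same generator serves every prefix length, which is exactly the consistency advantage noted above.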

Fig. 5. CRC-8 generator.

Let n, m, and k be the number of prefixes, the number of bits in a Bloom filter, and the number of hash indices, respectively. As an example case, we set the Bloom filter size m to 16. The number of hash indices k should be derived from Eq. (1), and here we set k to 2. Since m is 16, we need 4 bits for each hash index. We arbitrarily select the first 4 bits and the last 4 bits of the 8-bit CRC codes as the Bloom filter indices. Assuming an 8-bucket multi-hashing table, we can also obtain hash indices for the multi-hashing table from the CRC code. We arbitrarily select the first 3 bits and the last 3 bits of the 8-bit CRC codes as the multi-hashing indices. TABLE 1 shows the CRC codes for the example set of prefixes and the selected indices.

TABLE 1
CRC code and Bloom filter index for each prefix

    Prefix     CRC Code    BF Indices    Hash Table Indices
    1*         10101011    10, 11        5, 3
    00*        00000000    0, 0          0, 0
    010*       11010101    13, 5         6, 5
    111*       10010100    9, 4          4, 4
    1101*      00110100    3, 4          1, 4
    11111*     01011011    5, 11         2, 3
    110101*    10100110    10, 6         5, 6

The search procedure for the proposed algorithm is summarized in the following pseudo-code. Let A be the destination address of a given input packet, and S(A, l) be the substring of the most significant l bits of A. Let n(x) be the length of an element x.

SearchMHB(A) {
    TCAM_BMP = TCAM_search(A);
    if (TCAM_BMP != NULL) len = n(TCAM_BMP);
    else len = 0;
    for (l = W to len+1) {
        if (valid[l] == 1) {  // l is a valid length
            inString = S(A, l);
            CRC = crc_gen(inString);
            for (i = 1 to k) bf_idx[i] = extract(CRC);
            rst = probe_BF(bf_idx[1], ..., bf_idx[k]);
            if (rst == positive) {
                for (i = 1 to 2) h_idx[i] = extract(CRC);
                BMP = hash_table(inString, h_idx[1], h_idx[2]);
                if (BMP != NULL) return BMP;
            }
        }
    }
    return TCAM_BMP;
}

In this example set, the valid levels are lengths 6, 5, 4, 3, 2, and 1. As a search example of the 6-bit input address


111000, the CRC code generated for this input using the 8-bit CRC generator shown in Fig. 5 is 10010010. Hence, by selecting the first 4 bits and the last 4 bits, we have BF indices 9 and 2. The Bloom filter shown in Fig. 6 produces a negative, since one of the Bloom filter bits is 0, and hence the hash table is not accessed. Reducing the input by 1 bit, 11100 is entered, and the generated code is 00100101. The BF indices are 2 and 5, the Bloom filter produces a negative, and hence the hash table is not accessed. Next, the input 1110 is tried. The generated CRC is 01001010, and the BF indices are 4 and 10. The Bloom filter produces a positive, and the hash table is accessed using the two hash indices obtained from the first 3 bits and the last 3 bits of the CRC code, which are 2 and 2. There is a match neither in bucket 2 of the left table nor in bucket 2 of the right table for 1110, so it turns out that the Bloom filter produced a false positive. Next, the input 111 is tried. The generated CRC is 10010100, and the BF indices are 9 and 4. The Bloom filter produces a positive, and the hash table is accessed. We obtain a matched prefix in bucket 4 of the left table, and the search is over. The search for a given input is terminated when a true positive occurs. In this example, the Bloom filter was accessed 4 times, for lengths 6, 5, 4, and 3, and it generated 2 negatives, 1 false positive, and 1 true positive. The number of hash table accesses is 2, which is the same as the number of Bloom filter positives.

As will be shown in the simulation section, the false positive rate can be reduced to less than 0.3% by increasing the size of the Bloom filter to 16 times the number of prefixes. The parallel accesses to hash tables in each length are not necessary in the proposed architecture, since the Bloom filter filters out the lengths of the input that do not have a matching prefix. The Bloom filter is small enough to be implemented in a fast embedded memory in a chip. Hence the implementation complexity is significantly reduced by adding a simple Bloom filter, without sacrificing the search performance.

4.2 Adding a Bloom Filter to Binary Search on Trie Levels by Waldvogel (WBSL-BF)

In this subsection, we propose to add an on-chip Bloom filter to Waldvogel's binary search on level algorithm in order to improve the search performance. The preliminary version of this proposal was presented in [26]. Here, the proposed algorithm is described in detail in the context of adding a Bloom filter. The role of the Bloom filter is to filter out the substrings of each input that do not have a node in the binary trie.

TABLE 2 shows the CRC codes and Bloom filter indices for every node in the levels of access (levels 1 through 6) for the W-BSL trie shown in Fig. 3. We use the CRC-8 generator of Fig. 5 to obtain the BF indices. Figure 7 shows the WBSL-BF trie, which has a Bloom filter programmed using these indices.

TABLE 2
CRC codes and Bloom filter indices for WBSL-BF

    Prefix     CRC Code    BF Indices
    0*         00000000    0, 0
    1*         10101011    10, 11
    00*        00000000    0, 0
    01*        10101011    10, 11
    11*        01111110    7, 14
    010*       11010101    13, 5
    110*       00111111    3, 15
    111*       10010100    9, 4
    1101*      00110100    3, 4
    1111*      11100001    14, 1
    11010*     00011010    1, 10
    11111*     01011011    5, 11
    110101*    10100110    10, 6

Fig. 7. Adding a Bloom filter to W-BSL (WBSL-BF).

SearchWBSL(A) {
    TCAM_BMP = TCAM_search(A);
    low = min_level; high = max_level;
    while (low <= high) {
        accLevel = floor((low + high) / 2);
        inString = S(A, accLevel);
        CRC = crc_gen(inString);
        for (i = 1 to k) bf_idx[i] = extract(CRC);
        rst = probe_BF(bf_idx[1], ..., bf_idx[k]);
        if (rst == negative) high = accLevel - 1;
        else {  // positive
            h_idx = extract(CRC);
            node = access_hash_table(inString, h_idx);
            if (node == NULL) high = accLevel - 1;
            else {  // a node exists
                low = accLevel + 1;
                if (prefix node or bmp node) tree_BMP = node.BMP;
            }
        }
    }
    if (n(TCAM_BMP) > n(tree_BMP)) return TCAM_BMP;
    else return tree_BMP;
}


As a search example for the input 111000, the CRC code generated for the 3-bit substring 111 is 10010100. Hence, we have BF indices 9 and 4. The Bloom filter shown in Fig. 7 produces a positive, and hence the hash table is accessed and P5 is obtained as the current best match. For the 5-bit substring of the input, 11100, the CRC code is 00100101, so the BF indices are 2 and 5, and the result is a negative. Hence the hash table is not accessed, and the search space becomes the shorter lengths. For the 4-bit substring of the input, 1110, the CRC code is 01001010, so the BF indices are 4 and 10, and the result is a positive. Hence the hash table is accessed. There is no match, so it turns out to be a false positive. P5 is returned as the BMP. Compared with the search procedure in W-BSL, the number of off-chip hash table accesses is reduced from 3 to 2 because of the Bloom filter, which produced one negative.

Fig. 8. Adding a Bloom filter to L-BSL (LBSL-BF).
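The WBSL-BF search loop can be sketched as follows. The Bloom filter and the off-chip hash table are modeled as a plain Python set and dict (an idealized filter with no false positives), and the node set with its pre-computed marker BMPs is our own illustrative encoding of the Fig. 3 trie, not taken verbatim from the paper. The point is the control flow: a negative prunes the longer half with no off-chip access, a table miss (false positive) does the same after one access, and a found node moves the search toward longer levels.

```python
def bsl_bf_search(addr, min_level, max_level, bloom, table):
    """Binary search on trie levels with an on-chip Bloom pre-filter.
    bloom: set of substrings the filter reports positive for.
    table: dict substring -> pre-computed BMP (None for markers with no BMP)."""
    low, high = min_level, max_level
    best, accesses = None, 0
    while low <= high:
        level = (low + high) // 2
        sub = addr[:level]
        if sub not in bloom:       # negative: skip the off-chip table entirely
            high = level - 1
            continue
        accesses += 1              # positive: one off-chip hash table probe
        if sub not in table:       # miss: the positive was false
            high = level - 1
        else:
            if table[sub] is not None:
                best = table[sub]  # remember the pre-computed BMP
            low = level + 1
    return best, accesses

# Illustrative node set for P = {1*, 00*, 010*, 111*, 1101*, 11111*, 110101*};
# marker BMPs (e.g. "110" -> "1*") are computed here for the example.
nodes = {
    "0": None, "1": "1*", "00": "00*", "01": None, "11": "1*",
    "010": "010*", "110": "1*", "111": "111*",
    "1101": "1101*", "1111": "111*",
    "11010": "1101*", "11111": "11111*",
    "110101": "110101*",
}
best, accesses = bsl_bf_search("111000", 1, 6, set(nodes), nodes)
```

With this idealized filter the input 111000 needs only one table access; a real Bloom filter can add false-positive accesses, as in the paper's example where two accesses were needed.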

4.3 Adding a Bloom Filter to Binary Search on Trie Levels in a Leaf-Pushed Trie (LBSL-BF)

TABLE 3 shows the CRC codes and Bloom filter indices for every node in the levels of access (levels 2 through 6) for the L-BSL trie shown in Fig. 4. Figure 8 shows the LBSL-BF trie, which has a Bloom filter programmed using the Bloom filter indices. The search procedure differs from SearchWBSL only in how a node returned by the hash table is handled:

    h_idx = extract(CRC);
    node = access_hash_table(inString, h_idx);
    if (node == NULL) high = accLevel - 1;
    else if (node.type == internal) low = accLevel + 1;
    else {  // prefix node
        tree_BMP = node.BMP;
        if (n(TCAM_BMP) > n(tree_BMP)) return TCAM_BMP;
        else return tree_BMP;
    }


Throughout our simulation, hash indices were consistently generated using a 32-bit CRC generator [29]. The number of overflows depends on several factors, such as the number of hash tables, the number of buckets in each hash table, and the number of entries in each bucket. For N prefixes, we have two hash tables, each of which has N' = 2^⌈log2 N⌉ buckets. Each bucket has two entries, and each entry has 46 bits. For the 5 different prefix sets, there is no overflow except for a single overflow in PORT80. We assume that overflow prefixes are stored in a TCAM. Figure 9 shows the entry structure of the multi-hashing table. Hence, the memory requirement is 4 x 46 x N' bits.

Fig. 9. Entry structure of the multi-hashing table

From our simulation, we found that the average number of hash table accesses (Havg) was not as low as expected because of the false positives of the Bloom filter. The unexpected false positives are caused by prefixes that have the same bit pattern but different lengths. To eliminate these false positives, the hash key of the Bloom filter should carry more information than the prefix value itself. We therefore padded each prefix with zeros to 32 bits and appended 6 bits of prefix length information, so that each hash key is 38 bits long.

Performance evaluation results for the prefix sets are shown in TABLE 4. The number of prefixes in each routing set is shown inside the parentheses. The number of input traces generated for simulation is 3 times the number of prefixes. The size of the Bloom filter, M, is set proportional to N'. The number of hash indices, K, is calculated using Eq. (1); the result is 2, 2, 3, 6, and 11, respectively, for M = N, 2N, 4N, 8N, and 16N.

An input IP address is probed only for the distinct lengths that exist in the routing data, starting from the longest length. If a positive is returned from the Bloom filter, the hash table is probed. If there is a match in the hash table, the search is over; if not, the same procedure is repeated for the next longest length, and so on. The maximum number of Bloom filter queries (Qmax) can be up to the number of distinct lengths that exist in the routing data. Since Bloom filter querying stops when the longest matching prefix is found, the average number of Bloom filter queries (Qavg) is smaller than the maximum. As the size of the Bloom filter increases, the maximum number of hash table probes (Hmax) and the average number of hash table probes (Havg) decrease as expected, since the number of negatives from the Bloom filter increases and the number of false positives decreases.

The hash table access rate in the last column is the average number of hash table probes divided by the average number of Bloom filter queries, i.e., Havg/Qavg. As the size of the Bloom filter increases from N to 16N, the hash table access rate is reduced exponentially. This means that the Bloom filter effectively avoids unnecessary memory accesses to the off-chip hash table as its size increases. When the size of the Bloom filter is 16N, the maximum number of hash table probes (Hmax) is bigger than one, but the average number of hash table accesses (Havg) is only 1.

For a given input trace, if the Bloom filter produces false positives, the number of hash table probes becomes more than one. For the Bloom filter size 16N, TABLE 5 shows the total number of input traces injected into our simulation and the number of inputs that caused at least one false positive. As shown, the number of inputs with at least one false positive is very small compared with the total number of input traces. Hence the fractional part of the average number of hash table accesses (Havg) is zero up to the second digit after the decimal point. This means that most of the false positives are removed when the size of the Bloom filter is sufficiently large, and each IP address lookup requires only one off-chip hash table access on average. Hence the complex hardware for parallel access and the multiple separate memories required in the previous architecture are effectively removed by adding a simple on-chip Bloom filter of up to 512 Kbytes for about 200K prefixes.

TABLE 6 shows the performance comparison with and without a Bloom filter for the PMH algorithm. When there is no Bloom filter, the average number of off-chip memory accesses for an IP address lookup using the PMH algorithm is 6.77 to 11.96, and the maximum number is 22 to 30. In our PMH-BF algorithm, the size of the Bloom filter is 32 Kbytes to 512 Kbytes, which is 16 times N', where N' = 2^⌈log2 N⌉ for N prefixes. For each given input, off-chip hash table accesses are avoided for every length that returns a negative in the Bloom filter query. A negative means that there is no prefix corresponding to that specific length of the input. The average number of off-chip memory accesses for an IP address lookup becomes 1.00 in every case, and the maximum number is 2 or 3.

Simulations have been performed to compare the search performance of our proposed PMH-BF algorithm with that of Dharmapurikar's algorithm (D-BF) [7]. The D-BF algorithm applies controlled prefix expansion to bound the worst-case search performance to 3 hash table accesses. A direct lookup array is used for prefixes of length less than or equal to 20 bits. Prefixes of length 21 to 23 are expanded to length 24, and prefixes of length 25 to 31 are expanded to length 32. The D-BF algorithm requires two Bloom filters: one for prefixes of length 24 and the other for prefixes of length 32. For a fair comparison, we implemented both algorithms under the same constraints in terms of the memory amount for Bloom filters, the memory amount for hash table implementation, the hash functions, and the handling of collided hash keys.
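The PMH-BF search procedure described above (38-bit keys built from zero-padded prefixes plus 6 length bits, queried longest-first against an on-chip Bloom filter before any off-chip probe) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: a SHA-256-based index generator stands in for the paper's 32-bit CRC hashes, and a Python dict plays the role of the off-chip multi-hashing table.

```python
import hashlib

class BloomFilter:
    """Simplified stand-in for the on-chip Bloom filter: k hash indices
    over an m-bit array (SHA-256 replaces the paper's CRC hashes)."""
    def __init__(self, m_bits, k):
        self.m, self.k, self.bits = m_bits, k, bytearray(m_bits)

    def _indices(self, key):
        for i in range(self.k):
            h = hashlib.sha256(key + bytes([i])).digest()
            yield int.from_bytes(h[:4], "big") % self.m

    def program(self, key):
        for idx in self._indices(key):
            self.bits[idx] = 1

    def query(self, key):
        return all(self.bits[idx] for idx in self._indices(key))

def hash_key(prefix_bits, length):
    """38-bit key: prefix zero-padded to 32 bits, then 6 bits of length."""
    padded = prefix_bits << (32 - length)          # zero-pad to 32 bits
    return ((padded << 6) | length).to_bytes(5, "big")

def lookup(addr, lengths, bf, hash_table):
    """Probe distinct prefix lengths longest-first; skip the off-chip
    table whenever the Bloom filter answers negative."""
    for l in sorted(lengths, reverse=True):
        key = hash_key(addr >> (32 - l), l)
        if bf.query(key):                  # positive: probe off-chip table
            entry = hash_table.get(key)
            if entry is not None:          # true positive: longest match
                return entry
            # false positive: fall through to the next shorter length
    return None                            # no match (default route)
```

A false positive only costs one wasted table probe; the loop then continues with the next shorter length, so correctness never depends on the filter.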


TABLE 4 Performance evaluation results of PMH-BF

Routing Data (N)    N' = Buckets    MHT Size   Qmax   Qavg     M     BF Size   K    Hmax   Havg   HT Access Rate
                    per table       (Mbyte)                          (Kbyte)                      (Havg/Qavg)
MAE-WEST (14553)    16384 (2^14)    0.368      22     8.16     N     2         2    20     5.95   0.73
                                                               2N    4         2    13     3.51   0.43
                                                               4N    8         3    9      1.84   0.23
                                                               8N    16        6    4      1.10   0.14
                                                               16N   32        11   2      1.00   0.12
MAE-EAST (39464)    65536 (2^16)    1.472      22     7.88     N     8         2    17     4.46   0.57
                                                               2N    16        2    12     2.44   0.31
                                                               4N    32        3    6      1.35   0.17
                                                               8N    64        6    3      1.02   0.13
                                                               16N   128       11   2      1.00   0.13
PORT80 (112310)     131072 (2^17)   2.944      25     11.96    N     16        2    22     8.35   0.70
                                                               2N    32        2    16     4.65   0.39
                                                               4N    64        3    11     2.18   0.18
                                                               8N    128       6    5      1.12   0.09
                                                               16N   256       11   3      1.00   0.08
Grouptlcom (170601) 262144 (2^18)   5.888      20     6.77     N     32        2    17     4.08   0.60
                                                               2N    64        2    12     2.33   0.34
                                                               4N    128       3    7      1.34   0.20
                                                               8N    256       6    3      1.02   0.15
                                                               16N   512       11   2      1.00   0.15
Telstra (227223)    262144 (2^18)   5.888      25     9.43     N     32        2    23     6.79   0.72
                                                               2N    64        2    18     3.84   0.41
                                                               4N    128       3    13     1.94   0.21
                                                               8N    256       6    6      1.10   0.12
                                                               16N   512       11   3      1.00   0.11
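The K column above follows standard Bloom filter dimensioning: for m bits and n programmed keys, k = (m/n) ln 2 hash functions minimize the false-positive probability p ≈ (1 - e^(-kn/m))^k. This is presumably what Eq. (1) computes, though the exact rounding used for the smallest filters may differ. A quick sanity check:

```python
import math

def optimal_k(m_bits, n_keys):
    # k = (m/n) * ln 2 minimizes the false-positive rate of a Bloom filter
    # with m bits and n keys (rounded to a whole number of hash functions).
    return max(1, round((m_bits / n_keys) * math.log(2)))

def false_positive_rate(m_bits, n_keys, k):
    # Classic approximation: p = (1 - e^(-k*n/m))^k.
    return (1.0 - math.exp(-k * n_keys / m_bits)) ** k

# For the largest setting in TABLE 4, m = 16n bits, eleven hash functions
# are optimal and the false-positive probability falls below 0.05%, which
# is consistent with Havg converging to 1.00 in the 16N rows.
for ratio in (4, 8, 16):
    k = optimal_k(ratio * 1000, 1000)
    print(ratio, k, false_positive_rate(ratio * 1000, 1000, k))
```

The computed k for m/n = 4, 8, and 16 (3, 6, and 11) matches the K column for the 4N, 8N, and 16N rows.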

TABLE 5 Total number of input traces and the number of inputs with at least one false positive

                                     MAE-WEST   MAE-EAST   PORT80     Grouptlcom   Telstra
No. of inputs                        43659      118392     336930     511803       681669
No. of inputs with false positives   88         16         867        82           1451
Rate                                 0.002016   0.000135   0.002573   0.000160     0.002129

TABLE 6 Comparison in search performance with and without Bloom filter for PMH

             Without Bloom Filter               With Bloom Filter (PMH-BF)
Prefix Set   Avg. Mem. Access  Max. Mem. Access  BF Size (Kbyte)  Avg. Mem. Access  Max. Mem. Access
MAE-WEST     8.16              22                32               1.00              2
MAE-EAST     7.88              22                128              1.00              2
PORT80       11.96             25                256              1.00              3
Grouptlcom   6.77              20                512              1.00              2
Telstra      9.51              30                512              1.00              3

In generating hash indices for a Bloom filter, the ANSI C function rand() was used for both algorithms, as suggested in [7]. Each hash key used as an input to rand() in the D-BF algorithm is less than or equal to 32 bits. In generating hash keys for our proposed algorithm, 6 bits of prefix length information were attached after each zero-padded prefix, so that each hash key is 38 bits long.

TABLE 7 shows the result. The memory size for the Bloom filters was fixed at 16N bits for N prefixes. Since our proposed PMH-BF has a single Bloom filter, the 16N bits were used for that one Bloom filter. For the D-BF algorithm, the bits were allocated to its two Bloom filters in proportion to the number of prefixes.

Multi-hashing tables were used for both algorithms. The memory amount for the hash table implementation was determined as follows. For the D-BF algorithm, we allocated 1 Mbyte for the direct lookup array. The number of hash table entries for the other prefixes was determined as 2(N24 + N32), where N24 is the number of prefixes with length 24 and N32 is the number of prefixes with length 32. The same amount of memory required for the implementation of the D-BF algorithm was allocated to the multi-hashing table of the PMH-BF.

TABLE 7 shows that the D-BF algorithm has a large prefix replication factor; the number of prefixes stored in the hash table is greater than the number of original prefixes. Hence the D-BF algorithm shows slightly worse search performance than our proposed algorithm, both on average and in the worst case. It was assumed that collided prefixes are connected by a linked list; a perfect hash function for the given set of prefixes was not assumed in this simulation. Hence the worst-case number of hash table accesses is not bounded by 3 for [7], as shown for Grouptlcom and Telstra.

5.2 Search Performance Improvement by Adding a Bloom Filter to W-BSL

In implementing binary search on levels algorithms, if there is no prefix in a level of the binary trie, that level is an invalid level, and nodes in invalid levels are not stored in the Bloom filter or the hash table. Every node in a valid level, including both prefix nodes and internal nodes, is stored in the Bloom filter and the hash table. Throughout this simulation, we assume a perfect hash function for storing the nodes of the trie in the off-chip hash table. The worst-case number of off-chip memory accesses in the W-BSL algorithm is ⌈log2(W + 1)⌉, which is 6 for W = 32.

TABLE 8 shows the performance comparison for the W-BSL algorithm. The number of nodes represents the total number of nodes stored. When there is no Bloom filter, the average number of off-chip memory accesses for an IP address lookup using the W-BSL algorithm is 4.33 to 4.78. The simulation result shown in TABLE 8 is for a Bloom filter of size 8U', where U' = 2^⌈log2 U⌉ and U is the total number of nodes. The size of the Bloom filter is 16 to 64 Kbytes. The average number of Bloom filter queries is equal to the average number of memory accesses when there is no Bloom filter. For each given input, the off-chip hash table access is avoided for every length that returns a negative in the Bloom filter query. A negative means that there is no node in the trie corresponding to that specific length of the input. The average number of off-chip memory accesses for an IP address lookup with the WBSL-BF became 2.50 to 3.19; hence the number of memory accesses was reduced by around 40% by adding a Bloom filter.

5.3 Search Performance Improvement by Adding a Bloom Filter to L-BSL

TABLE 9 shows the performance evaluation results for the L-BSL algorithm. As in the W-BSL case, if there is no prefix in a level of the leaf-pushed trie, the level is an invalid level, and nodes in invalid levels are not stored in the Bloom filter or the hash table. The number of nodes represents the total number of nodes, including prefix nodes and internal nodes, in valid levels. The number inside the parentheses represents the number of prefix nodes after leaf-pushing, which is slightly larger than the number of original prefix nodes. When there is no Bloom filter, the average number of off-chip memory accesses for an IP address lookup is 3.57 to 4.49. The search performance of the L-BSL is slightly better than that of the W-BSL, since a search can terminate as soon as a prefix node is encountered, even if it is not the last level of access.

The simulation result shown is also for a Bloom filter of size 8U'. The size of the Bloom filter is 16 to 128 Kbytes. The average number of off-chip memory accesses for an IP address lookup with the LBSL-BF became 2.62 to 3.47; hence the number of memory accesses was reduced by approximately 30% by adding a Bloom filter. The performance improvement is smaller for the LBSL-BF than for the WBSL-BF: because of leaf-pushing, many internal nodes are created, and hence fewer Bloom filter negatives are produced in the LBSL-BF.

5.4 Search Performance Comparison with Other Algorithms

This section presents simulation results comparing the proposed algorithms with other algorithms in terms of required memory amount and search performance. The algorithms in the comparison are the binary trie (B-Trie) [4], priority trie (P-Trie) [23], binary search on range (BSR) [24], binary search with prefix vector (BST-PV) [8], Waldvogel's BSL (W-BSL) [13], Lim's BSL (L-BSL) [14], the logW-Elevator algorithm (logW-E) [25], Dharmapurikar's algorithm (D-BF) [7], and the proposed algorithms (PMH-BF, WBSL-BF, LBSL-BF). The details of the B-Trie, P-Trie, BSR, BST-PV, and logW-E algorithms can be found in [4]. Our simulation used the same prefix sets as those used in [4].

TABLE 10 shows the required memory amount for each algorithm. For the algorithms requiring a Bloom filter, which are the D-BF, PMH-BF, WBSL-BF, and LBSL-BF, the required memory amounts for the Bloom filters are also shown. The sizes of the Bloom filters are reasonably small, so each Bloom filter can be embedded on-chip. The memory amount for the hash table implementation of the WBSL-BF and the LBSL-BF is the same as that of the W-BSL and the L-BSL, respectively. Algorithms requiring a multi-hashing table, such as the D-BF and the PMH-BF, generally consume more memory than the other algorithms, but they provide better search performance, as shown next.

Figures 10 and 11 show the worst-case search performance and the average-case search performance, respectively. The D-BF algorithm and the proposed PMH-BF algorithm provide the best performance in both the worst case and the average case. The WBSL-BF and the LBSL-BF are the next best. The search performance of the known algorithms was effectively improved by adding an on-chip Bloom filter.

6 CONCLUSION

This paper shows how effectively an on-chip Bloom filter can improve the search performance of known efficient IP address lookup algorithms. The parallel multiple-hashing architecture provides high-speed IP address lookup with a single access cycle of off-chip memory, but it requires complicated hardware for parallel accesses to the separate memories storing the prefixes of each length. This paper shows how to avoid the parallel access to


TABLE 7 Search performance comparison with [7]

              BF Size   HT Size   [7] (D-BF)                          Proposed (PMH-BF)
Routing Data  (Kbyte)   (Mbyte)   Replication factor  Hmax  Havg      Replication factor  Hmax  Havg
MAE-WEST      28.4      1.26      73.3                3     1.006     1                   3     1.003
MAE-EAST      77.1      1.77      27.9                3     1.007     1                   3     1.003
PORT80        219.4     3.53      10.8                3     1.010     1                   3     1.006
Grouptlcom    333.2     4.94      7.6                 4     1.014     1                   3     1.003
Telstra       443.8     10.19     7.1                 4     1.051     1                   3     1.006
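The prefix replication factor in TABLE 7 comes from controlled prefix expansion: each original prefix of length 21 to 23 becomes up to 8 prefixes of length 24, and each of length 25 to 31 becomes up to 128 prefixes of length 32. A hedged sketch of this expansion follows; the length boundaries match the D-BF configuration described above, but the flat dict layout is a simplification for illustration.

```python
def expand_prefixes(prefixes):
    """Controlled prefix expansion to lengths 24 and 32. Prefixes of length
    <= 20 are assumed to go to the 1-Mbyte direct lookup array instead.
    `prefixes` is a list of (bits, length) pairs; the result maps each
    expanded (bits, target_length) pair to the original prefix it came
    from, with longer (more specific) originals overwriting shorter ones."""
    expanded = {}
    for bits, length in sorted(prefixes, key=lambda p: p[1]):  # shortest first
        if length <= 20:
            continue                       # handled by the direct lookup array
        target = 24 if length <= 24 else 32
        shift = target - length
        for suffix in range(1 << shift):   # one expanded prefix per suffix
            expanded[((bits << shift) | suffix, target)] = (bits, length)
    return expanded

def replication_factor(prefixes):
    """Stored entries divided by original (length > 20) prefixes."""
    stored = len(expand_prefixes(prefixes))
    originals = sum(1 for _, l in prefixes if l > 20)
    return stored / originals
```

Processing shortest prefixes first lets a more specific original overwrite the expansions it overlaps, which preserves longest-match semantics among the expanded entries.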

TABLE 8 Comparison in average search performance with and without Bloom filter for W-BSL

             Trie Characteristics                            Without Bloom Filter     With Bloom Filter (WBSL-BF)
Prefix Set   No. of Inputs   No. of Nodes   Valid Levels     Max. Access  Avg. Access  BF Size (Kbyte)  Avg. Access
MAE-WEST     43659           76708          22               6            4.36         16               2.50
MAE-EAST     118392          172418         22               6            4.33         32               2.53
PORT80       336930          225050         25               6            4.72         32               2.78
Grouptlcom   511803          314986         20               6            4.67         64               3.15
Telstra      681669          452732         25               6            4.78         64               3.19

TABLE 9 Comparison in average search performance with and without Bloom filter for L-BSL

             Trie Characteristics                                 Without Bloom Filter     With Bloom Filter (LBSL-BF)
Prefix Set   No. of Inputs  No. of Nodes      Valid Levels        Max. Access  Avg. Access  BF Size (Kbyte)  Max. Access  Avg. Access
                            (prefix nodes)
MAE-WEST     43659          82156 (19968)     23                  6            4.06         16               4            2.98
MAE-EAST     118392         191757 (59377)    20                  6            4.49         32               4            3.47
PORT80       336930         299899 (145267)   25                  6            3.73         64               4            2.62
Grouptlcom   511803         411122 (203093)   21                  6            3.57         64               4            2.82
Telstra      681669         576370 (285741)   25                  6            3.95         128              4            3.04
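The mechanism evaluated in TABLES 8 and 9 (binary search over the valid trie levels, with the on-chip filter consulted before each off-chip probe) can be sketched as below. This is a simplified illustration under stated assumptions: markers and best-matching-prefix ("bmp") values are assumed to be precomputed into the node entries as binary search on levels requires, and a plain set stands in for the Bloom filter. The search tolerates false positives either way, since every filter positive is verified in the hash table.

```python
class OnChipFilter:
    """Stand-in for the on-chip Bloom filter: answers membership queries,
    possibly with false positives (a set yields none, which is harmless
    here because every positive is verified off-chip)."""
    def __init__(self):
        self.keys = set()

    def program(self, key):
        self.keys.add(key)

    def query(self, key):
        return key in self.keys

def bsl_lookup(addr, valid_levels, bf, hash_table):
    """Binary search on the sorted list of valid trie levels.
    hash_table maps (level, prefix_bits) -> {"bmp": best_match_or_None}."""
    lo, hi, best = 0, len(valid_levels) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        level = valid_levels[mid]
        key = (level, addr >> (32 - level))
        if not bf.query(key):        # on-chip negative: no node at this level;
            hi = mid - 1             # search shorter levels, no off-chip access
            continue
        node = hash_table.get(key)   # off-chip hash table probe
        if node is None:             # false positive from the filter
            hi = mid - 1
        else:
            if node["bmp"] is not None:
                best = node["bmp"]   # remember the best match seen so far
            lo = mid + 1             # a node exists: a longer match may exist
    return best
```

Each filter negative replaces one off-chip probe with an on-chip query, which is exactly where the 30-40% access reduction in the tables comes from.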

TABLE 10 Memory requirement (Mbyte)

             B-Trie  P-Trie  BSR   BST-PV  logW-E  W-BSL  L-BSL  D-BF           PMH-BF         WBSL-BF       LBSL-BF
Prefix Set                                                       BF*    HT      BF*    HT      BF*   HT      BF*   HT
Mae-West     0.45    0.14    0.17  0.34    1.01    0.45   0.40   28.4   1.26    28.4   1.26    16    0.45    16    0.40
Mae-East     0.99    0.39    0.46  0.92    2.58    0.99   0.71   77.1   1.77    77.1   1.77    32    0.99    32    0.71
PORT80       1.29    1.07    0.99  1.65    3.03    1.29   1.43   219.4  3.53    219.4  3.53    32    1.29    64    1.43
Grouptlcom   1.80    1.67    1.50  2.50    4.36    1.80   1.96   333.2  4.94    333.2  4.94    64    1.80    64    1.96
Telstra      2.59    2.22    2.00  3.77    5.70    2.59   2.75   443.8  10.19   443.8  10.19   64    2.59    128   2.75

*Each Bloom filter size is in Kbyte.

off-chip memories by adding a small on-chip Bloom filter. For a given input, the Bloom filter is queried first, starting from the longest length. If the result is a negative, access to the off-chip hash table is avoided for that specific length. The off-chip hash table is accessed only on a positive result from the Bloom filter, and when the positive turns out to be true, the search for the input is finished. It is shown that, by properly controlling the false positive rate, the proposed architecture provides average search performance comparable to that of the parallel multiple-hashing architecture. The proposed architecture requires much less hardware, since it has only a small on-chip Bloom filter and a single multi-hashing table and does not require complicated hardware or separate memories for parallel access.

Among trie-based algorithms, algorithms based on binary search on trie levels provide the best search performance, since their performance is proportional to O(log ldist), where ldist is the number of distinct prefix lengths. This paper shows how to further improve the search performance of those algorithms by adding a simple on-chip Bloom filter. For each given input, the Bloom filter is queried first for the current level of access. If the result is a negative, there is no node at that level of the trie, and the search can proceed to a shorter level without accessing the off-chip hash table. It is shown that the average search performance is improved by 30-40% by effectively avoiding off-chip hash table accesses when there is no node in the current level.

Multi-bit tries with controlled prefix expansion [22], [33]-[34] provide better search performance than binary tries by reducing the number of distinct levels. Binary search on trie levels can be applied to multi-bit tries without loss of generality, and hence


Fig. 10. Worst-case number of memory accesses for each algorithm. (a) Mae-West (14553 prefixes). (b) Mae-East (39464 prefixes). (c) PORT80 (112310 prefixes). (d) Grouptlcom (170601 prefixes). (e) Telstra (227223 prefixes).

Fig. 11. Average number of memory accesses for each algorithm. (a) Mae-West (14553 prefixes). (b) Mae-East (39464 prefixes). (c) PORT80 (112310 prefixes). (d) Grouptlcom (170601 prefixes). (e) Telstra (227223 prefixes).


our proposed approach using an on-chip Bloom filter can also be applied to binary search on the levels of a multi-bit trie. We believe that the Bloom filter is a simple but extremely powerful data structure that can improve the performance of many other applications as well [31], and we are actively seeking such applications.

ACKNOWLEDGMENTS

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (2012-005945). This research was also supported by the MKE (The Ministry of Knowledge Economy), Korea, under the ITRC support program supervised by the NIPA (NIPA-2012-H0301-12-4004). The preparation of this paper would not have been possible without the efforts of our students in the SoC Design Lab at Ewha W. University on simulations. We are particularly grateful to Jungwon Lee and Youngju Choi.

REFERENCES

[1] H. Jonathan Chao, "Next generation routers," Proceedings of the IEEE, vol. 90, no. 9, pp. 1518-1588, Sept. 2002.
[2] M. A. Ruiz-Sanchez, E. M. Biersack, and W. Dabbous, "Survey and taxonomy of IP address lookup algorithms," IEEE Network, vol. 15, no. 2, pp. 8-23, March/April 2001.
[3] S. Sahni, K. Kim, and H. Lu, "Data structures for one-dimensional packet classification using most-specific-rule matching," International Journal on Foundations of Computer Science, vol. 14, no. 3, pp. 337-358, 2003.
[4] H. Lim and N. Lee, "Survey and proposal on binary search algorithms for longest prefix match," IEEE Communications Surveys and Tutorials, pp. 1-17, 2012 (IEEE early access).
[5] F. Yu, R. H. Katz, and T. V. Lakshman, "Efficient multimatch packet classification and lookup with TCAM," IEEE Micro, vol. 25, no. 1, pp. 50-59, Jan./Feb. 2005.
[6] H. Lu and S. Sahni, "Dynamic tree bitmap for IP lookup and update," International Conference on Networking, 2007.
[7] S. Dharmapurikar, P. Krishnamurthy, and D. Taylor, "Longest prefix matching using Bloom filters," IEEE/ACM Trans. Networking, vol. 14, no. 2, pp. 397-409, Feb. 2006.
[8] H. Lim, H. Kim, and C. Yim, "IP address lookup for Internet routers using balanced binary search with prefix vector," IEEE Trans. on Communications, vol. 57, no. 3, pp. 618-621, Mar. 2009.
[9] A. Broder and M. Mitzenmacher, "Using multiple hash functions to improve IP lookups," IEEE Infocom, vol. 3, pp. 1454-1463, 2001.
[10] H. Lim and Y. J. Jung, "A parallel multiple hashing architecture for IP address lookup," IEEE HPSR, pp. 91-95, 2004.
[11] H. Lim, J. Seo, and Y. Jung, "High speed IP address lookup architecture using hashing," IEEE Communications Letters, vol. 7, no. 10, pp. 502-504, Oct. 2003.
[12] W. Eatherton, G. Varghese, and Z. Dittia, "Tree bitmap: Hardware/software IP lookups with incremental updates," ACM SIGCOMM Computer Communications Review, vol. 34, no. 2, pp. 97-122, Apr. 2004.
[13] M. Waldvogel, G. Varghese, J. Turner, and B. Plattner, "Scalable high speed IP routing lookups," Proc. ACM SIGCOMM, 1997, pp. 25-35.
[14] J. H. Mun, H. Lim, and C. Yim, "Binary search on prefix lengths for IP address lookup," IEEE Communications Letters, vol. 10, no. 6, pp. 492-494, June 2006.
[15] K. Kim and S. Sahni, "IP lookup by binary search on prefix length," Journal of Interconnection Networks, vol. 3, pp. 105-128, 2002.
[16] W. Lu and S. Sahni, "Succinct representation of static packet classifiers," IEEE/ACM Transactions on Networking, vol. 17, no. 3, pp. 803-816, 2009.
[17] W. Lu and S. Sahni, "Recursively partitioned static router tables," IEEE Transactions on Computers, vol. 59, no. 12, pp. 1683-1690, 2010.
[18] S. Sahni and K. Kim, "An O(log n) dynamic router table design," IEEE Trans. on Computers, vol. 53, no. 3, pp. 351-363, Mar. 2004.
[19] H. Lu and S. Sahni, "O(log n) dynamic router-tables for prefixes and ranges," IEEE Trans. on Computers, vol. 53, no. 10, pp. 1217-1230, Oct. 2004.
[20] W. Lu and S. Sahni, "Packet classification using space-efficient pipelined multi-bit tries," IEEE Trans. on Computers, vol. 57, no. 5, pp. 591-605, May 2008.
[21] K. Kim and S. Sahni, "Efficient construction of pipelined multibit-trie router-tables," IEEE Trans. on Computers, vol. 56, no. 1, pp. 32-43, Jan. 2007.
[22] V. Srinivasan and G. Varghese, "Fast address lookups using controlled prefix expansion," ACM Transactions on Computer Systems, vol. 17, no. 1, pp. 1-40, Feb. 1999.
[23] H. Lim, C. Yim, and E. E. Swartzlander, Jr., "Priority trie for IP address lookup," IEEE Trans. on Computers, vol. 59, no. 6, pp. 784-794, Jun. 2010.
[24] B. Lampson, B. Srinivasan, and G. Varghese, "IP lookups using multiway and multicolumn search," IEEE/ACM Trans. Networking, vol. 7, no. 3, pp. 324-334, 1999.
[25] R. Sangireddy, N. Futamura, S. Aluru, and A. K. Somani, "Scalable, memory efficient, high-speed algorithms for IP lookups," IEEE/ACM Trans. on Networking, vol. 13, no. 4, pp. 802-812, Aug. 2005.
[26] K. Lim, K. Park, and H. Lim, "Binary search on levels using a Bloom filter for IPv6 address lookup," IEEE/ACM ANCS, 2009, pp. 185-186.
[27] P. Panda, N. Dutt, and A. Nicolau, "On-chip vs. off-chip memory: The data partitioning problem in embedded processor-based systems," ACM Transactions on Design Automation of Electronic Systems, vol. 5, no. 3, pp. 682-704, July 2000.
[28] H. Lim and S. Kim, "Tuple pruning using Bloom filters for packet classification," IEEE Micro, vol. 30, no. 3, pp. 784-794, May/June 2010.
[29] A. G. Alagu Priya and H. Lim, "Hierarchical packet classification using a Bloom filter and rule-priority tries," Computer Communications, vol. 33, no. 10, pp. 1215-1226, Jun. 2010.
[30] H. Song, S. Dharmapurikar, J. Turner, and J. Lockwood, "Fast hash table lookup using extended Bloom filter: An aid to network processing," Proc. ACM SIGCOMM, Aug. 2005.
[31] S. Taroma, C. E. Rothenberg, and E. Lagerspetz, "Theory and practice of Bloom filters for distributed systems," IEEE Communications Surveys and Tutorials, vol. 14, no. 1, pp. 131-155, first quarter 2012.
[32] J. Hasan, S. Cadambi, V. Jakkula, and S. Chakradhar, "Chisel: A storage-efficient, collision-free hash-based network processing architecture," Proc. ISCA, pp. 203-215, 2006.
[33] W. Lu and S. Sahni, "Packet forwarding using pipelined multibit tries," IEEE Symposium on Computers and Communications, pp. 802-807, May 2006.
[34] W. Lu and S. Sahni, "Packet classification using pipelined two-dimensional multibit tries," IEEE Symposium on Computers and Communications, pp. 808-813, May 2006.
[35] http://www.potaroo.net

Hyesook Lim (M'91) received B.S. and M.S. degrees from the Department of Control and Instrumentation Engineering at Seoul National University, Seoul, Korea, in 1986 and 1991, respectively, and the Ph.D. degree from the University of Texas at Austin in 1996. From 1996 to 2000, she was a member of technical staff at Bell Labs, Lucent Technologies, Murray Hill, New Jersey. From 2000 to 2002, she worked for Cisco Systems, San Jose, California. She is currently a professor in the Department of Electronics Engineering, Ewha Womans University, Seoul, Korea. Her research interests include router design issues such as address lookup and packet classification, Bloom filter application to various distributed algorithms, and the hardware implementation of various network algorithms.


Kyuhee Lim received a B.S. degree from the Department of Electronics Engineering at Ewha Womans University, Seoul, Korea, in 2005. From 2005 to 2009, she was employed at Hynix Semiconductor, Korea, where she worked on memory design. She is currently pursuing a Ph.D. degree at the same university. Her research interests include address lookup and packet classification algorithms and TCAM architecture design.

Nara Lee received a B.S. degree and an M.S. degree from the Department of Electronics En- gineering at Ewha Womans University, Seoul, Korea, in 2009 and 2012, respectively. Her re- search interests include various network algo- rithms such as IP address lookup and packet classification, web caching, and Bloom filter ap- plication to various distributed algorithms.

Kyong-hye Park received a B.S. degree and an M.S. degree from the Department of Electronics Engineering at Ewha Womans University, Seoul, Korea, in 2007 and 2009, respectively. She works for the Mobile Communication Business Unit, Samsung Electronics, Korea, where she is currently developing Android handsets.