
CSE 535 : Lecture 5
String Matching with Bloom Filters
Washington University, Fall 2003
http://www.arl.wustl.edu/arl/projects/fpx/cse535/
Copyright 2003, Sarang Dharmapurikar [Guest Lecture]

Background on Bloom Filter
• Data structure proposed by Burton Bloom
• Randomized data structure
  – Strings are stored using multiple hash functions
  – It can be queried to check the presence of a string
• Membership queries result in rare false positives but never false negatives
• Originally used for UNIX spell check
• Modern applications include:
  – Content networks
  – Summary caches
  – Route trace-back
  – Network measurements
  – Intrusion detection

Hash Functions
• Input: x
• Output: H[x]
• Properties
  – Each value of x maps to a value of H[x]
  – Typically: Size of (x) >> Size of (H[x])
  – H[x] evenly distributed over values of x
• Implementation
  – Hash function
    • XOR of bits, shifts, rotates, ...
[Figure: X → H → H[X]]

Programming a Bloom Filter
• The Bloom filter computes k hash functions on the input
[Figure: input X is hashed by H1, H2, H3, ..., Hk; each hash value addresses one bit of the m-bit vector, and that bit is set to 1]

Programming a Bloom Filter (continued)
[Figure: a second input Y is hashed by H1, H2, H3, ..., Hk; the addressed bits are set to 1 in the same m-bit vector, alongside the bits already set for X]

Querying a Bloom Filter
[Figure: X is hashed by H1, H2, H3, ..., Hk; every addressed bit in the m-bit vector is 1, so the result is a match]

Querying a Bloom Filter (continued)
[Figure: W, which was never programmed, is hashed by H1, H2, H3, ..., Hk; every addressed bit happens to be 1, so the result is a match (false positive)]

Optimal Parameters of a Bloom filter
• n: number of strings to be stored
• k: number of hash functions
• m: the size of the bit array (memory)
• The false positive probability, when k is chosen optimally, is $f = (1/2)^k$
• The optimal number of hash functions, k, is $k = \ln 2 \times m/n \approx 0.693 \times m/n$
[Figure: Y hashed by H1, H2, H3, H4, ..., Hk into the m-bit array]
Key Point: False positive probability decreases exponentially with a linear increase in the number of hash functions and memory

Counting Bloom Filters
• A message once programmed into the Bloom filter cannot be deleted
  – Deleting a message requires clearing the corresponding bits
  – Since a bit can be set by multiple messages, clearing it will disturb other messages
• Counting Bloom filters solve the problem
  – Array of counters instead of array of bits
  – Increment the corresponding counters when a message is added, decrement them when it is deleted
[Figure: messages A and B programmed into a bit array (1 0 1 1 0 0 1 0) with the corresponding off-chip counter array (2 0 1 1 0 0 2 0)]

Counting Bloom Filters (continued)
• Maintain the Bloom filters on the chip and the corresponding counters off the chip
  – Saves the on-chip resources needed to implement counters
  – Addition and deletion of messages are rare
  – Set the bit when the corresponding counter changes from 0 to 1; clear it when the counter changes from 1 to 0
[Figure: on-chip bit array (1 0 1 1 0 0 1 0) next to the off-chip counter array (2 0 1 1 0 0 2 0)]

Using Bloom filters for String Matching
[Figure: a window of bytes b1, b2, b3, b4, b5, ..., bW, with the entering byte at bW and the leaving byte at b1, feeds Bloom filters BF3, BF4, BF5, ..., BFW, one per string length; their matches go to a hash table that acts as the false positives resolver]

Bloom filter for cs535
[Figure: the same byte window b1 ... bW (entering byte at bW, leaving byte at b1) feeding a single Bloom filter, BF16, backed by the hash table]

System Overview
• Off-chip SDRAM: 64 megabytes
• SDRAM Controller: reads and writes the data given by the user component in the off-chip SDRAM
• Hash Table Interface
  – Implements the hash table around the SDRAM
  – Communicates with the SDRAM controller through a request-grant protocol
• Control Packet Processor (CPP): receives the control packets, decodes the commands in them, and accordingly updates either the Bloom filter or the hash table
• Input Controller: sends control packets to the CPP and data packets to the Bloom filter
• Output Controller: sends a notification packet out when the hash table instructs it to
• Protocol Wrappers: process the packet headers
• Bloom Filter

Bloom Filters on the FPX Platform
• Xilinx XCV2000E FPGA
  – Implements the Reconfigurable Application Device (RAD)
  – Contains 160 embedded RAMs
    • Each BlockRAM has dual (2) ports
    • Each BlockRAM stores 4096 bits
  – Enables MP2 to implement large, fast, parallel Bloom filters
[Figure: Field-programmable Port Extender (FPX) Platform. Bloom filters are implemented on the Reconfigurable Application Device (RAD) on the Field-programmable Port Extender (FPX)]

Partial Bloom Filter
[Figure: a Hash Value Calculator computes H1(X) and H2(X) from X; these drive addrA and addrB of a dual-port BlockRAM of 4096 bits, 1 bit wide (ports dinA/weA/doutA and dinB/weB/doutB, with weA and weB tied to '0'); doutA and doutB feed the output (match/no match)]

Partial Bloom Filter (continued)
[Figure: the same partial Bloom filter with a Decoder added on port A; the Decoder takes Request, Address, Valid Bit, and PBF BRAM # and drives dinA and weA, while weB stays tied to '0'; H1(X) and H2(X) still drive addrA and addrB, and the output is match/no match]

Bloom Filter Control Interface
[Figure: a Hash Value Calculator computes H1 through H10 from X; each pair of hash values feeds one partial Bloom filter (H1, H2 to PBF 1; H3, H4 to PBF 2; H5, H6 to PBF 3; H7, H8 to PBF 4; H9, H10 to PBF 5), and the PBF outputs combine into Match]
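
The programming, querying, and deletion steps described in the Bloom filter and counting Bloom filter slides can be sketched in software as follows. This is a minimal Python sketch under stated assumptions, not the FPGA design from the lecture: the names `CountingBloomFilter`, `optimal_num_hashes`, and `false_positive_rate` are hypothetical, and the salted SHA-256 hash family is a stand-in for the XOR/shift/rotate hashes the slides suggest for hardware. The `bits` list plays the role of the on-chip bit array and `counters` the off-chip counter array.

```python
import hashlib
import math


def optimal_num_hashes(m: int, n: int) -> int:
    """k = ln2 * m/n from the slides, rounded to an integer (at least 1)."""
    return max(1, round(math.log(2) * m / n))


def false_positive_rate(k: int) -> float:
    """f = (1/2)^k, the slides' estimate when k is chosen optimally."""
    return 0.5 ** k


class CountingBloomFilter:
    """Bit array for queries plus a counter array that allows deletion."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = [0] * m      # consulted by every query ("on-chip")
        self.counters = [0] * m  # touched only on add/remove ("off-chip")

    def _positions(self, item: bytes) -> list[int]:
        # Assumed hash family: SHA-256 salted with the hash index (k < 256).
        return [
            int.from_bytes(hashlib.sha256(bytes([i]) + item).digest()[:8], "big")
            % self.m
            for i in range(self.k)
        ]

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.counters[p] += 1
            if self.counters[p] == 1:  # counter changed 0 -> 1: set the bit
                self.bits[p] = 1

    def remove(self, item: bytes) -> None:
        # Only valid for items that were previously added.
        for p in self._positions(item):
            self.counters[p] -= 1
            if self.counters[p] == 0:  # counter changed 1 -> 0: clear the bit
                self.bits[p] = 0

    def query(self, item: bytes) -> bool:
        # A match requires every addressed bit to be 1; it may be a false positive.
        return all(self.bits[p] for p in self._positions(item))


k = optimal_num_hashes(m=4096, n=256)  # m/n = 16 bits per string
bf = CountingBloomFilter(m=4096, k=k)
bf.add(b"cs535")
bf.add(b"bloom")
print(k, false_positive_rate(k))
print(bf.query(b"cs535"), bf.query(b"bloom"))
bf.remove(b"cs535")
print(bf.query(b"bloom"))  # still a match: shared bits survive the deletion
```

Keeping the counters separate from the bit array mirrors the design choice on the slides: queries read only the bits, so the counters can live in slower memory because additions and deletions are rare.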