CSE 535 : Lecture 5
String Matching with Bloom Filters
Washington University Fall 2003
http://www.arl.wustl.edu/arl/projects/fpx/cse535/
Copyright 2003, Sarang Dharmapurikar [Guest Lecture]
CSE 535 : Fall 2003 1
Background on Bloom Filter
• Data structure proposed by Burton Bloom
• Randomized data structure
  – Strings are stored using multiple hash functions
  – It can be queried to check the presence of a string
• Membership queries result in rare false positives but never false negatives
• Originally used for UNIX spell check
• Modern applications include:
  – Content networks
  – Summary caches
  – Route trace-back
  – Network measurements
  – Intrusion detection
Hash Functions
• Input: x
• Output: H[x]
• Properties
  – Each value of x maps to a value of H[x]
  – Typically: Size of (x) >> Size of (H[x])
  – H[x] evenly distributed over values of x
• Implementation
  – Hash Function
    • XOR of bits, shifts, rotates, ...
[Figure: X → H → H[X]]
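A minimal sketch of a hash built only from the operations listed above (XOR, shifts, rotates). The names `rotl32` and `xor_rotate_hash`, the 5-bit rotate amount, and the 12-bit output width are illustrative assumptions, not the hash function used in the lecture.

```python
def rotl32(value: int, amount: int) -> int:
    """Rotate a 32-bit value left by `amount` bits (0 < amount < 32)."""
    value &= 0xFFFFFFFF
    return ((value << amount) | (value >> (32 - amount))) & 0xFFFFFFFF


def xor_rotate_hash(data: bytes, out_bits: int = 12) -> int:
    """Map a byte string of any length to an out_bits-wide value H[x]."""
    state = 0
    for byte in data:
        # Rotate the running state, then fold in the next input byte with XOR.
        state = rotl32(state, 5) ^ byte
    # Fold the upper half onto the lower half, then truncate to out_bits.
    state ^= state >> 16
    return state & ((1 << out_bits) - 1)


# Inputs of very different sizes all land in the same small output range.
short_hash = xor_rotate_hash(b"a")
long_hash = xor_rotate_hash(b"a much longer input string" * 10)
```

A hash this simple is cheap in hardware, but it is not guaranteed to spread inputs evenly; it only illustrates the kind of operations involved.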
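Programming a Bloom Filter
• Bloom filter computes k hash functions on input
[Figure: string X is hashed by H1, H2, H3, ..., Hk; each hash value selects one bit of the m-bit vector, and the selected bits are set to 1]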
Programming a Bloom Filter
[Figure: a second string Y is hashed by H1, H2, H3, ..., Hk and its bits are set to 1 in the same m-bit vector]
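Programming can be sketched in a few lines. This is a sketch under stated assumptions: salted SHA-256 stands in for the k hash functions H1 ... Hk, and the names `bit_positions` and `program` as well as the sizes m = 64 and k = 4 are made up for illustration.

```python
import hashlib


def bit_positions(item: bytes, k: int, m: int) -> list:
    """Return the k positions H1(item) ... Hk(item) in an m-bit vector.

    Assumption: each Hi is SHA-256 salted with the index i, a software
    stand-in for k independent hash functions.
    """
    positions = []
    for i in range(k):
        digest = hashlib.sha256(i.to_bytes(2, "big") + item).digest()
        positions.append(int.from_bytes(digest[:8], "big") % m)
    return positions


def program(bits: list, item: bytes, k: int) -> None:
    """Program `item` into the filter by setting its k bits to 1."""
    for pos in bit_positions(item, k, len(bits)):
        bits[pos] = 1


bits = [0] * 64            # the m-bit vector, initially all zeros
program(bits, b"X", k=4)
program(bits, b"Y", k=4)   # bits already set by X simply stay 1
```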
Querying a Bloom Filter
[Figure: the programmed string X is hashed by H1, H2, H3, ..., Hk; every selected bit of the m-bit vector is 1, so the query reports a match]
Querying a Bloom Filter
[Figure: a string W that was never programmed is hashed by H1, H2, H3, ..., Hk; every selected bit of the m-bit vector happens to be 1, so the query reports a match (false positive)]
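The two query outcomes above can be reproduced in software. This is a sketch, not the lecture's implementation: salted SHA-256 stands in for H1 ... Hk, the class name `BloomFilter` and the test strings are hypothetical, and the filter is deliberately small (m = 64, k = 3) so that false positives are likely to appear.

```python
import hashlib


class BloomFilter:
    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = [0] * m

    def _positions(self, item: bytes):
        # Assumption: salted SHA-256 stands in for H1 ... Hk.
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(2, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def query(self, item: bytes) -> bool:
        # Match only if every one of the k selected bits is set.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=64, k=3)
programmed = [f"string{i}".encode() for i in range(20)]
for s in programmed:
    bf.add(s)

# Every programmed string matches: there are no false negatives.
assert all(bf.query(s) for s in programmed)

# Strings that were never programmed can still match when other strings
# happen to have set all k of their bits.
never_programmed = [f"other{i}".encode() for i in range(1000)]
false_positives = sum(bf.query(w) for w in never_programmed)
print(f"{false_positives} of {len(never_programmed)} queries were false positives")
```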
Optimal Parameters of a Bloom filter
• n : number of strings to be stored
• k : number of hash functions
• m : the size of the bit-array (memory)
• The false positive probability, when k is chosen optimally, is f = (½)^k
• The optimal number of hash functions, k, is k = ln 2 × m/n = 0.693 × m/n
[Figure: string Y hashed by H1, H2, H3, H4, ..., Hk into an m-bit array]
Key Point: False positive probability decreases exponentially with a linear increase in the number of hash functions and memory
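The formulas above are easy to check numerically. The sketch below uses the standard approximation f = (1 − e^(−kn/m))^k, which is not stated on the slide but reduces to (½)^k at the optimal k; the function names and the example sizes (n = 1000, 8 to 24 bits per string) are assumptions for illustration.

```python
import math


def optimal_k(m: int, n: int) -> float:
    """Optimal number of hash functions: k = ln 2 * m / n."""
    return math.log(2) * m / n


def false_positive_rate(m: int, n: int, k: float) -> float:
    """Standard approximation f = (1 - e^(-k n / m))^k."""
    return (1.0 - math.exp(-k * n / m)) ** k


n = 1000
for bits_per_string in (8, 16, 24):
    m = bits_per_string * n
    k = optimal_k(m, n)
    # At the optimal k the general formula and (1/2)^k agree.
    print(bits_per_string, round(k, 2), false_positive_rate(m, n, k), 0.5 ** k)
```

Each extra 8 bits of memory per string adds about 5.5 hash functions at the optimum, and the false positive probability shrinks by a roughly constant factor each time, which is the exponential decrease named in the key point.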
Counting Bloom Filters
• A message once programmed in the Bloom filter cannot be deleted
  – Deletion of a message requires clearing the corresponding bits
  – Since a bit can be set by multiple messages, clearing it will disturb other messages
• Counting Bloom filters solve the problem
  – Array of counters instead of array of bits
  – Increment the corresponding counters when a message is added, decrement when deleted
[Figure: strings A and B mapped into a bit array (1 0 1 1 0 0 1 0) and the corresponding off-chip counter array (2 0 1 1 0 0 2 0)]
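A counting Bloom filter can be sketched as follows. The class name `CountingBloomFilter`, the salted SHA-256 stand-in for the hash functions, and the tiny size (m = 8, k = 3) are assumptions chosen so that two messages are likely to share counters.

```python
import hashlib


class CountingBloomFilter:
    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.counters = [0] * m   # array of counters instead of bits

    def _positions(self, item: bytes):
        # Assumption: salted SHA-256 stands in for H1 ... Hk.
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(2, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.counters[pos] += 1

    def delete(self, item: bytes) -> None:
        # Only valid for a message that was previously added.
        for pos in self._positions(item):
            self.counters[pos] -= 1

    def query(self, item: bytes) -> bool:
        return all(self.counters[pos] > 0 for pos in self._positions(item))


cbf = CountingBloomFilter(m=8, k=3)
cbf.add(b"A")
cbf.add(b"B")
cbf.delete(b"A")
# B still matches even if it shared counters with A, because deleting A
# only decremented those counters instead of clearing them.
assert cbf.query(b"B")
```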
Counting Bloom Filters
• Maintain Bloom filters on the chip and corresponding counters off the chip
  – Saves the on-chip resources that would be needed to implement counters
  – Addition and deletion of messages are rare
  – Set the bit when the corresponding counter changes 0 to 1, clear it when the counter changes 1 to 0
[Figure: on-chip bit array (1 0 1 1 0 0 1 0) beside the off-chip counter array (2 0 1 1 0 0 2 0)]
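The update rule in the last bullet can be sketched with two parallel arrays. The function names are hypothetical, and the hash positions assigned to A and B are made up, chosen so that the arrays come out as 1 0 1 1 0 0 1 0 (bits) and 2 0 1 1 0 0 2 0 (counters).

```python
def add_message(bits: list, counters: list, positions: list) -> None:
    """Increment the off-chip counters; set an on-chip bit on a 0 -> 1 change."""
    for pos in positions:
        counters[pos] += 1
        if counters[pos] == 1:
            bits[pos] = 1


def delete_message(bits: list, counters: list, positions: list) -> None:
    """Decrement the off-chip counters; clear an on-chip bit on a 1 -> 0 change."""
    for pos in positions:
        counters[pos] -= 1
        if counters[pos] == 0:
            bits[pos] = 0


on_chip_bits = [0] * 8
off_chip_counters = [0] * 8

# Hypothetical hash positions for two messages (3 hash functions each).
positions_a = [0, 2, 6]
positions_b = [0, 3, 6]

add_message(on_chip_bits, off_chip_counters, positions_a)
add_message(on_chip_bits, off_chip_counters, positions_b)
bits_after_add = list(on_chip_bits)
counters_after_add = list(off_chip_counters)

# Deleting A clears only the bit that no other message uses (position 2).
delete_message(on_chip_bits, off_chip_counters, positions_a)
```

Queries touch only the on-chip bit array; the off-chip counters are read and written only during the rare additions and deletions.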
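Using Bloom filters for String Matching
[Figure: a window of bytes b1 ... bW (entering byte at bW, leaving byte at b1) feeds parallel Bloom filters BF3, BF4, BF5, ..., BFW, one per string length; their matches go to a False Positives Resolver backed by a Hash Table]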
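A software sketch of this arrangement: one Bloom filter per string length, checked at every byte position, with an exact lookup resolving false positives. Several details are assumptions: salted SHA-256 stands in for the hash functions, a Python set stands in for the hash table, all candidate substrings start at the same byte of the window, and the names `BloomFilter`, `build_filters`, and `scan` are hypothetical.

```python
import hashlib


class BloomFilter:
    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = [0] * m

    def _positions(self, item: bytes):
        # Assumption: salted SHA-256 stands in for H1 ... Hk.
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(2, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def query(self, item: bytes) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))


def build_filters(signatures, m: int = 4096, k: int = 4) -> dict:
    """Build one Bloom filter per signature length (BF3, BF4, ... in the figure)."""
    filters = {}
    for sig in signatures:
        if len(sig) not in filters:
            filters[len(sig)] = BloomFilter(m, k)
        filters[len(sig)].add(sig)
    return filters


def scan(data: bytes, signatures) -> list:
    """Return (offset, signature) for every signature occurrence in data."""
    filters = build_filters(signatures)
    hash_table = set(signatures)   # exact store that resolves false positives
    matches = []
    for start in range(len(data)):              # the window advances one byte at a time
        for length, bf in filters.items():      # done in parallel in hardware
            candidate = data[start:start + length]
            if len(candidate) < length:
                continue
            # The cheap Bloom filter check runs first; the exact lookup runs
            # only when the filter reports a match.
            if bf.query(candidate) and candidate in hash_table:
                matches.append((start, candidate))
    return matches


found = scan(b"xxwormyyvirus", [b"worm", b"virus"])
```

Because a Bloom filter never produces false negatives, no occurrence is missed; the hash table is consulted only for the small fraction of windows that match a filter.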
Bloom filter for cs535
[Figure: the window of bytes b1 ... bW (entering byte at bW, leaving byte at b1) feeds a single Bloom filter, BF16, backed by the Hash Table]
System Overview
[Figure: block diagram of the system, with the following annotated components]
• Off-chip SDRAM: 64 megabytes
• SDRAM Controller: a component that reads and writes data given by the user component in the off-chip SDRAM
• Hash Table Interface
  – Implements the hash table around the SDRAM
  – Communicates with the SDRAM controller through a request-grant protocol
• Control Packet Processor (CPP): receives the control packets, decodes the commands in them, and accordingly updates either the Bloom filter or the hash table
• Bloom Filter
• Input Controller: sends control packets to the CPP and data packets to the Bloom filter
• Output Controller: sends a notification packet out when the hash table instructs it to
• Protocol Wrappers: process the packet headers
Bloom Filters on the FPX Platform
• Xilinx XCV2000E FPGA
  – Implements the Reconfigurable Application Device (RAD)
  – Contains 160 embedded RAMs (BlockRAMs)
    • Each BlockRAM has dual (2) ports
    • Each BlockRAM stores 4096 bits
  – Enables MP2 to implement large, fast, parallel Bloom filters
[Figure: Field-programmable Port Extender (FPX) Platform. Bloom filters are implemented on the Reconfigurable Application Device (RAD) on the FPX.]
Partial Bloom Filter
[Figure: a Hash Value Calculator computes H1(X) and H2(X) from X; they drive addrA and addrB of a dual-port, 1-bit-wide, 4096-bit BlockRAM whose write enables (weA, weB) are tied to '0'; doutA and doutB feed the Output (match/no match)]
Partial Bloom Filter
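[Figure: the partial Bloom filter with a Request Decoder added (signals labeled Address, Valid, PBF, BRAM #, Bit); the decoder drives dinA, weA, and addrA of the 4096-bit BlockRAM, while weB stays tied to '0'; H1(X) and H2(X) from the Hash Value Calculator still address the two ports, and doutA and doutB feed the Output (match/no match)]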
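Bloom Filter
[Figure: the Hash Value Calculator computes H1 ... H10 from X; each pair of hash values (H1, H2), (H3, H4), ..., (H9, H10) addresses one of PBF 1 ... PBF 5; the PBF outputs are combined into Match; a Control Interface connects to the PBFs]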
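The composition of five partial Bloom filters can be modeled in software. This is a sketch under stated assumptions: salted SHA-256 stands in for the Hash Value Calculator, the final Match is taken to be the AND of the five PBF outputs, and the class and method names are hypothetical.

```python
import hashlib

BRAM_BITS = 4096  # one BlockRAM, 1 bit wide


class PartialBloomFilter:
    """One 4096-bit memory read at two addresses, like a dual-port BlockRAM."""

    def __init__(self, index: int):
        self.index = index
        self.ram = [0] * BRAM_BITS

    def _addresses(self, item: bytes) -> list:
        # Assumption: salted SHA-256 stands in for the two hash values
        # (for example H1 and H2 for PBF 1) that address ports A and B.
        addresses = []
        for port in range(2):
            salt = bytes([self.index, port])
            digest = hashlib.sha256(salt + item).digest()
            addresses.append(int.from_bytes(digest[:2], "big") % BRAM_BITS)
        return addresses

    def program(self, item: bytes) -> None:
        for address in self._addresses(item):
            self.ram[address] = 1

    def query(self, item: bytes) -> bool:
        addr_a, addr_b = self._addresses(item)
        return bool(self.ram[addr_a] and self.ram[addr_b])


class BloomFilter:
    """Five PBFs in parallel give k = 10 hash functions over 5 * 4096 bits."""

    def __init__(self, num_pbfs: int = 5):
        self.pbfs = [PartialBloomFilter(i) for i in range(num_pbfs)]

    def program(self, item: bytes) -> None:
        for pbf in self.pbfs:
            pbf.program(item)

    def query(self, item: bytes) -> bool:
        # Assumption: Match is asserted only if every PBF reports a match.
        return all(pbf.query(item) for pbf in self.pbfs)


bloom = BloomFilter()
bloom.program(b"signature")
is_match = bloom.query(b"signature")
```

Splitting the bit vector across BlockRAMs lets all ten lookups happen in the same clock cycle, since each dual-port memory serves two reads at once.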