CSE 535 : Lecture 5

String Matching with Bloom Filters

Washington University Fall 2003

http://www.arl.wustl.edu/arl/projects/fpx/cse535/

Copyright 2003, Sarang Dharmapurikar [Guest Lecture]

CSE 535 : Fall 2003 1

Background on Bloom Filters

• Proposed by Burton Bloom
• Randomized data structure
  – Strings are stored using multiple hash functions
  – It can be queried to check for the presence of a string
• Membership queries may produce rare false positives but never false negatives
• Originally used for UNIX spell checking
• Modern applications include:
  – Content networks
  – Summary caches
  – Route trace-back
  – Network measurements
  – Intrusion detection

Hash Functions

• Input: x
• Output: H[x]
• Properties
  – Each value of x maps to a value of H[x]
  – Typically, size of (x) >> size of (H[x])
  – Values of H[x] are evenly distributed as x varies
• Implementation
  – Combinations of XORs, shifts, rotates, ...

(Figure: input X passes through hash function H to produce H[X])
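As a rough illustration of a hash built only from XORs, shifts, and rotates, here is a small Python sketch. The function names, the 32-bit state, the mixing constants, and the 12-bit output (enough to address a 4096-bit memory) are all assumptions for illustration; this is not the hash used in the lecture's hardware, and it has not been checked for the even-distribution property.

```python
def rotl32(v: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits (0 < r < 32)."""
    v &= 0xFFFFFFFF
    return ((v << r) | (v >> (32 - r))) & 0xFFFFFFFF


def xor_shift_hash(data: bytes, seed: int, out_bits: int = 12) -> int:
    """Hypothetical hash built only from XOR, shift and rotate.

    Every input byte is folded into a 32-bit state; only the low
    out_bits bits are returned, so size(H[x]) << size(x).
    """
    h = seed & 0xFFFFFFFF
    for b in data:
        h ^= b              # mix the next byte into the low bits
        h = rotl32(h, 5)    # rotate so earlier bytes move to other bit positions
        h ^= h >> 7         # shift-XOR spreads high bits into low bits
    return h & ((1 << out_bits) - 1)


# Different seeds act as different members of a hash family (H1, H2, ...).
h1 = xor_shift_hash(b"attack", seed=0x1234)
h2 = xor_shift_hash(b"attack", seed=0xBEEF)
print(h1, h2)
```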


Programming a Bloom Filter

• The Bloom filter computes k hash functions on the input

(Figure: input X is hashed by H1, H2, H3, ..., Hk; each hash value sets one bit of the m-bit vector to 1)

Programming a Bloom Filter

(Figure: a second string Y is hashed by H1, H2, H3, ..., Hk, setting its k bits in the same m-bit vector)
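The programming step can be sketched in a few lines of Python. This is a minimal software model under stated assumptions: the k hash functions are simulated by BLAKE2b salted with the hash index i, which is a stand-in chosen for illustration, and the class and method names are hypothetical.

```python
import hashlib


class BloomFilterProgrammer:
    """Programming side of a Bloom filter: k hash functions set k bits of an m-bit vector."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.bits = [0] * m

    def positions(self, s: bytes) -> list[int]:
        # Assumption: hash function Hi is simulated by BLAKE2b salted with i.
        return [
            int.from_bytes(
                hashlib.blake2b(s, digest_size=8, salt=i.to_bytes(16, "little")).digest(),
                "little",
            ) % self.m
            for i in range(self.k)
        ]

    def add(self, s: bytes) -> None:
        for p in self.positions(s):
            self.bits[p] = 1  # a bit already set by an earlier string just stays 1


bf = BloomFilterProgrammer(m=64, k=4)
bf.add(b"X")
bf.add(b"Y")
print(sum(bf.bits), "of", bf.m, "bits set")
```

Because bits are only ever set, programming X and then Y leaves at most 2k bits set; fewer when their hash positions overlap.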


Querying a Bloom Filter

(Figure: querying X computes H1, H2, H3, ..., Hk; all k addressed bits are 1, so the result is a match)

Querying a Bloom Filter

(Figure: querying W, which was never programmed, finds all k addressed bits already set to 1 by other strings, so the result is a match: a false positive)
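The query rule (match only if every addressed bit is 1) can be sketched as below. Simulating the k hash functions with salted BLAKE2b is an assumption, and the class name and test strings are hypothetical. The sketch checks that stored strings always match and counts how many never-stored strings match anyway.

```python
import hashlib


class BloomFilter:
    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = [0] * m

    def _positions(self, s: bytes) -> list[int]:
        # Assumption: Hi is BLAKE2b salted with the index i.
        return [
            int.from_bytes(
                hashlib.blake2b(s, digest_size=8, salt=i.to_bytes(16, "little")).digest(),
                "little",
            ) % self.m
            for i in range(self.k)
        ]

    def add(self, s: bytes) -> None:
        for p in self._positions(s):
            self.bits[p] = 1

    def query(self, s: bytes) -> bool:
        # "Match" only if all k addressed bits are 1; a single 0 proves absence.
        return all(self.bits[p] for p in self._positions(s))


bf = BloomFilter(m=256, k=3)
stored = [f"str{i}".encode() for i in range(40)]
for s in stored:
    bf.add(s)

# Never a false negative: every stored string matches.
assert all(bf.query(s) for s in stored)

# Strings never added can still match (false positives), like W in the figure.
others = [f"other{i}".encode() for i in range(1000)]
false_positives = sum(bf.query(s) for s in others)
print(false_positives, "false positives out of", len(others))
```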


Optimal Parameters of a Bloom filter

• n: number of strings to be stored
• k: number of hash functions
• m: size of the bit array (memory)
• With the optimal k, the false positive probability is $f = (1/2)^k$
• The optimal number of hash functions is $k = \ln 2 \times m/n \approx 0.693 \times m/n$

Key Point: the false positive probability decreases exponentially with a linear increase in the number of hash functions and memory
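The parameter formulas can be checked numerically. The sketch below uses the standard approximation $f = (1 - e^{-kn/m})^k$, which is not on the slide; at $k = \ln 2 \times m/n$ it reduces exactly to the slide's $f = (1/2)^k$. The function names and the sample sizes are assumptions for illustration.

```python
import math


def optimal_k(m: int, n: int) -> float:
    """k = ln 2 * m / n: hash-function count that minimizes false positives."""
    return math.log(2) * m / n


def false_positive_rate(m: int, n: int, k: float) -> float:
    """Standard approximation f = (1 - e^(-k n / m))^k."""
    return (1.0 - math.exp(-k * n / m)) ** k


for bits_per_string in (8, 16, 32):
    n = 1000
    m = bits_per_string * n
    k = optimal_k(m, n)
    print(f"m/n={bits_per_string:2d}  k={k:5.2f}  f={false_positive_rate(m, n, k):.2e}")
```

Doubling m at fixed n doubles the optimal k and squares f, which is the slide's key point expressed in numbers.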

Counting Bloom Filters

• A message, once programmed into the Bloom filter, cannot be deleted
  – Deleting a message requires clearing the corresponding bits
  – Since a bit can be set by multiple messages, clearing it will disturb the other messages

• Counting Bloom filters solve the problem
  – An array of counters instead of an array of bits
  – Increment the corresponding counters when a message is added; decrement them when it is deleted

(Figure: off-chip counter array after programming messages A and B)
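A counting Bloom filter replaces each bit with a counter. This is a minimal Python sketch under the same assumption as before (salted BLAKE2b standing in for the k hash functions); the class name and message names are hypothetical.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter with an m-entry counter array so messages can be deleted."""

    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.counters = [0] * m

    def _positions(self, s: bytes) -> list[int]:
        # Assumption: Hi is BLAKE2b salted with the index i.
        return [
            int.from_bytes(
                hashlib.blake2b(s, digest_size=8, salt=i.to_bytes(16, "little")).digest(),
                "little",
            ) % self.m
            for i in range(self.k)
        ]

    def add(self, s: bytes) -> None:
        for p in self._positions(s):
            self.counters[p] += 1

    def remove(self, s: bytes) -> None:
        # Only valid for a message that was previously added.
        for p in self._positions(s):
            self.counters[p] -= 1

    def query(self, s: bytes) -> bool:
        return all(self.counters[p] > 0 for p in self._positions(s))


cbf = CountingBloomFilter(m=32, k=3)
cbf.add(b"A")
cbf.add(b"B")
cbf.remove(b"A")
print(cbf.query(b"B"))
```

Deleting A leaves B intact even where their counters overlap; clearing bits in a plain Bloom filter could have turned B into a false negative.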

Counting Bloom Filters

• Maintain the Bloom filter bits on the chip and the corresponding counters off the chip
  – Saves the on-chip resources that implementing the counters would need
  – Addition and deletion of messages are rare
  – Set a bit when its counter changes from 0 to 1; clear it when its counter changes from 1 to 0

(Figure: on-chip bit array alongside the off-chip counter array)
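The split between on-chip bits and off-chip counters can be modelled as two arrays kept in sync by the 0→1 and 1→0 transition rule. The Python sketch below is an assumption-laden model, not the lecture's hardware: salted BLAKE2b stands in for the hash functions, and the class name and `onchip_writes` counter are hypothetical (the counter just shows that the on-chip array changes only on transitions).

```python
import hashlib


class SplitCountingBloomFilter:
    """On-chip bit array kept in sync with an off-chip counter array."""

    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.counters = [0] * m   # off-chip counter array
        self.bits = [0] * m       # on-chip bit array used for lookups
        self.onchip_writes = 0    # how often the on-chip array had to change

    def _positions(self, s: bytes) -> list[int]:
        # Assumption: Hi is BLAKE2b salted with the index i.
        return [
            int.from_bytes(
                hashlib.blake2b(s, digest_size=8, salt=i.to_bytes(16, "little")).digest(),
                "little",
            ) % self.m
            for i in range(self.k)
        ]

    def add(self, s: bytes) -> None:
        for p in self._positions(s):
            self.counters[p] += 1
            if self.counters[p] == 1:      # counter went 0 -> 1: set the bit
                self.bits[p] = 1
                self.onchip_writes += 1

    def remove(self, s: bytes) -> None:
        for p in self._positions(s):
            self.counters[p] -= 1
            if self.counters[p] == 0:      # counter went 1 -> 0: clear the bit
                self.bits[p] = 0
                self.onchip_writes += 1

    def query(self, s: bytes) -> bool:
        # Lookups touch only the on-chip bits, never the counters.
        return all(self.bits[p] for p in self._positions(s))


sbf = SplitCountingBloomFilter(m=64, k=4)
for msg in (b"A", b"B", b"C"):
    sbf.add(msg)
sbf.remove(b"B")
print(sbf.onchip_writes, "on-chip writes")
```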

Using Bloom filters for String Matching

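(Figure: Bloom filters BF3, BF4, BF5, ..., BFW read a W-byte window b1 ... bW of the data stream, with bytes entering at bW and leaving at b1; their matches go to a hash table that acts as the false positives resolver)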
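The scheme of one Bloom filter per string length, followed by an exact hash-table check, can be sketched in Python. Several details are assumptions: each filter BF_L is taken to test the L newest bytes of the window, a Python `set` stands in for the hash table, and the values of m and k, the class name, and the salted-BLAKE2b hashes are hypothetical. Hardware would test all lengths in parallel as each byte enters; the sketch loops over them sequentially.

```python
import hashlib


class MultiLengthMatcher:
    """One Bloom filter per pattern length, verified by an exact hash table."""

    def __init__(self, patterns: list[bytes], m: int = 4096, k: int = 4):
        self.m, self.k = m, k
        self.filters: dict[int, list[int]] = {}   # length L -> bit array of BF_L
        self.table = set(patterns)                # stands in for the hash table
        for p in patterns:
            bits = self.filters.setdefault(len(p), [0] * m)
            for i in self._positions(p):
                bits[i] = 1

    def _positions(self, s: bytes) -> list[int]:
        # Assumption: Hi is BLAKE2b salted with the index i.
        return [
            int.from_bytes(
                hashlib.blake2b(s, digest_size=8, salt=i.to_bytes(16, "little")).digest(),
                "little",
            ) % self.m
            for i in range(self.k)
        ]

    def scan(self, data: bytes) -> list[tuple[int, bytes]]:
        found = []
        for end in range(1, len(data) + 1):          # one byte enters per step
            for length in sorted(self.filters):       # hardware checks these in parallel
                if length > end:
                    continue
                sub = data[end - length:end]          # the newest `length` bytes
                bits = self.filters[length]
                if all(bits[i] for i in self._positions(sub)):   # BF_L says "maybe"
                    if sub in self.table:             # resolver drops false positives
                        found.append((end - length, sub))
        return found


matcher = MultiLengthMatcher([b"abc", b"bcd", b"attack"])
print(matcher.scan(b"xxabcdattack"))  # [(2, b'abc'), (3, b'bcd'), (6, b'attack')]
```

Because a Bloom filter never gives false negatives and the table check removes false positives, the reported matches are exact; the filters only reduce how often the slower table is consulted.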


Bloom filter for cs535

(Figure: a single Bloom filter, BF16, reads the byte window b1 ... bW, with bytes entering at bW and leaving at b1; its matches go to a hash table)

System Overview

(Figure: system block diagram with Protocol Wrappers, Input Controller, Output Controller, Control Packet Processor, Bloom Filter, Hash Table Interface, SDRAM Controller, and off-chip 64 MB SDRAM)

• Off-chip SDRAM (64 MB)
• SDRAM Controller
  – A component that reads and writes data given by the user component in the off-chip SDRAM
• Hash Table Interface
  – Implements the hash table around the SDRAM
  – Communicates with the SDRAM controller through a request-grant protocol
• Control Packet Processor (CPP)
  – Receives the control packets, decodes the commands in them, and accordingly either updates the Bloom filter or updates the hash table in the off-chip SDRAM
• Bloom Filter
• Input Controller
  – Sends control packets to the CPP and data packets to the Bloom filter
• Output Controller
  – When the hash table instructs, it sends a notification packet out
• Protocol Wrappers
  – Process the packet headers
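The packet flow between these components can be modelled in software to make the division of labour concrete. Everything in this Python sketch is hypothetical: the `Packet` fields, the command names `program_bf` and `program_table`, the 16-byte window, the salted-BLAKE2b hashes, and the use of a Python `set` for the SDRAM hash table. It only mirrors the roles described in the list above.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Packet:
    kind: str          # hypothetical: "control" or "data"
    body: bytes
    command: str = ""  # hypothetical control commands: "program_bf", "program_table"


class StringMatchingSystem:
    def __init__(self, m: int = 4096, k: int = 4, window: int = 16):
        self.m, self.k, self.window = m, k, window
        self.bf_bits = [0] * m       # Bloom filter (on-chip)
        self.hash_table = set()      # stands in for the hash table in off-chip SDRAM
        self.sent = []               # notifications sent out by the Output Controller

    def _positions(self, s: bytes) -> list[int]:
        # Assumption: Hi is BLAKE2b salted with the index i.
        return [
            int.from_bytes(
                hashlib.blake2b(s, digest_size=8, salt=i.to_bytes(16, "little")).digest(),
                "little",
            ) % self.m
            for i in range(self.k)
        ]

    def input_controller(self, pkt: Packet) -> None:
        # Control packets go to the CPP, data packets to the Bloom filter.
        if pkt.kind == "control":
            self.control_packet_processor(pkt)
        else:
            self.bloom_filter(pkt.body)

    def control_packet_processor(self, pkt: Packet) -> None:
        # Decode the command; update either the Bloom filter or the hash table.
        if pkt.command == "program_bf":
            for p in self._positions(pkt.body):
                self.bf_bits[p] = 1
        elif pkt.command == "program_table":
            self.hash_table.add(pkt.body)

    def bloom_filter(self, data: bytes) -> None:
        w = self.window
        for start in range(len(data) - w + 1):
            sub = data[start:start + w]
            if all(self.bf_bits[p] for p in self._positions(sub)):
                self.hash_table_interface(start, sub)

    def hash_table_interface(self, offset: int, sub: bytes) -> None:
        # Exact lookup; on a true match, the Output Controller sends a notification.
        if sub in self.hash_table:
            self.sent.append((offset, sub))


sig = b"0123456789abcdef"          # hypothetical 16-byte signature
system = StringMatchingSystem()
system.input_controller(Packet("control", sig, "program_bf"))
system.input_controller(Packet("control", sig, "program_table"))
system.input_controller(Packet("data", b"zz" + sig + b"zz"))
print(system.sent)                  # [(2, b'0123456789abcdef')]
```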


Bloom Filters on the FPX Platform

(Figure: Bloom filters implemented on the Reconfigurable Application Device of the Field-programmable Port Extender (FPX) platform)

• Xilinx XCV2000E FPGA
  – Implements the Reconfigurable Application Device (RAD) on the Field-programmable Port Extender (FPX)
  – Contains 160 embedded BlockRAMs
    • Each BlockRAM has dual (2) ports
    • Each BlockRAM stores 4096 bits
  – Enables MP2 to implement large, fast, parallel Bloom filters

Partial Bloom Filter

(Figure: a hash value calculator computes H1(X) and H2(X) from X; they drive addrA and addrB of a dual-port 4096 × 1-bit BlockRAM with weA = weB = ‘0’; doutA and doutB form the match/no match output)
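A partial Bloom filter reads one 4096-bit BlockRAM through both ports at once, one port per hash value. The Python model below is a sketch: the class name, the preloaded bit set (standing in for bits programmed earlier), and combining the two outputs with AND are assumptions consistent with the Bloom filter query rule. In hardware, both reads happen in the same clock cycle.

```python
class PartialBloomFilter:
    """One dual-port 4096 x 1-bit BlockRAM read through both ports at once."""

    SIZE = 4096  # bits per BlockRAM, addressed by 12-bit hash values

    def __init__(self, set_addresses=()):
        # For this read-only view, the programmed bits are simply preloaded.
        self.ram = [0] * self.SIZE
        for a in set_addresses:
            self.ram[a] = 1

    def lookup(self, h1: int, h2: int) -> bool:
        mask = self.SIZE - 1                # keep 12 address bits
        dout_a = self.ram[h1 & mask]        # port A: addrA = H1(X), weA = '0'
        dout_b = self.ram[h2 & mask]        # port B: addrB = H2(X), weB = '0'
        return bool(dout_a and dout_b)      # match only if both bits are 1


pbf = PartialBloomFilter(set_addresses={5, 100})
print(pbf.lookup(5, 100), pbf.lookup(5, 6))   # True False
```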


Partial Bloom Filter

(Figure: the same partial Bloom filter with a request decoder added; its inputs (address, valid bit, PBF/BRAM #) drive port A (dinA, weA, addrA) so the BlockRAM can be written, while weB stays ‘0’)
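The request decoder's job, turning a programming request into port-A write signals for exactly one BlockRAM, can be sketched as follows. How the figure's fields are interpreted is an assumption: `valid` is treated as a request-valid flag that gates weA, and a separate hypothetical `bit` field supplies dinA (1 sets, 0 clears). All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProgramRequest:
    pbf: int        # which PBF / BlockRAM to write
    address: int    # 12-bit bit address inside that BlockRAM
    valid: int      # assumption: 1 means the request is valid and should be written
    bit: int = 1    # assumption: value driven on dinA (1 = set, 0 = clear)


class RequestDecoder:
    def __init__(self, num_pbfs: int, size: int = 4096):
        self.num_pbfs = num_pbfs
        self.rams = [[0] * size for _ in range(num_pbfs)]

    def decode(self, req: ProgramRequest) -> list[tuple[int, int, int]]:
        """Port-A signals (weA, addrA, dinA) for every BlockRAM."""
        return [
            (1 if (req.valid and i == req.pbf) else 0, req.address, req.bit)
            for i in range(self.num_pbfs)
        ]

    def apply(self, req: ProgramRequest) -> None:
        for ram, (we_a, addr_a, din_a) in zip(self.rams, self.decode(req)):
            if we_a:                      # only the addressed BlockRAM is written
                ram[addr_a] = din_a


dec = RequestDecoder(num_pbfs=3)
dec.apply(ProgramRequest(pbf=1, address=42, valid=1))
print([ram[42] for ram in dec.rams])
```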

Bloom Filter

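(Figure: a Bloom filter built from five partial Bloom filters, PBF 1 to PBF 5; a hash value calculator computes H1, ..., H10 from X, each PBF receives two of the hash values, the PBF outputs combine into Match, and a control interface connects to the PBFs)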
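Putting the pieces together, five partial Bloom filters with two hash values each give a Bloom filter with k = 10. This Python sketch assumes PBF j receives the pair (H(2j-1), H(2j)), which is a reading of the figure, and it simulates the hash value calculator with salted BLAKE2b; the function and class names are hypothetical.

```python
import hashlib

NUM_PBFS = 5        # PBF 1 .. PBF 5 in the figure
BRAM_BITS = 4096    # one 4096-bit BlockRAM per PBF


def hash_values(x: bytes) -> list[int]:
    """Hypothetical hash value calculator: H1..H10 as 12-bit addresses."""
    return [
        int.from_bytes(
            hashlib.blake2b(x, digest_size=2, salt=i.to_bytes(16, "little")).digest(),
            "little",
        ) % BRAM_BITS
        for i in range(2 * NUM_PBFS)
    ]


class PBFBloomFilter:
    def __init__(self):
        self.rams = [[0] * BRAM_BITS for _ in range(NUM_PBFS)]

    def program(self, x: bytes) -> None:
        h = hash_values(x)
        for j, ram in enumerate(self.rams):
            ram[h[2 * j]] = 1        # odd-numbered hash of this PBF's pair (port A)
            ram[h[2 * j + 1]] = 1    # even-numbered hash of this PBF's pair (port B)

    def match(self, x: bytes) -> bool:
        h = hash_values(x)
        # Each PBF matches only if both of its bits are 1;
        # the Bloom filter matches only if every PBF matches.
        return all(ram[h[2 * j]] and ram[h[2 * j + 1]] for j, ram in enumerate(self.rams))


bf = PBFBloomFilter()
bf.program(b"attack")
print(bf.match(b"attack"))   # True
```

Under these assumptions m = 5 × 4096 = 20,480 bits and k = 10, so by $k = \ln 2 \times m/n$ the configuration is best suited to roughly n ≈ 0.693 × 20,480 / 10 ≈ 1,420 strings, where $f = (1/2)^{10} \approx 0.001$.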
