Hashing

• Dictionaries
• Chained Hashing
• Linear Probing
• Hash Functions

Philip Bille

Dictionaries

• Dictionaries. Maintain dynamic set S of elements supporting the following operations. Each element x has a key x.key from a universe U and satellite data x.data.
  • SEARCH(k): determine if element with key k exists. If so, return it.
  • INSERT(x): add x to S (we assume x is not already in S).
  • DELETE(x): remove x from S.
• Applications.
  • Basic data structures for representing a set.
  • Used in numerous algorithms and data structures.
• Challenge. How can we solve the problem with current techniques?

• Figure: universe U = {0,..,99} with key(S) = {1, 13, 16, 41, 54, 66, 96}.

Dictionaries

• Solution 1: linked-list. Maintain S as a linked list.
  • SEARCH(k): linear search for key k.
  • INSERT(x): insert x in the front of the list.
  • DELETE(x): remove x from list.
  • Time.
    • SEARCH in O(n) time.
    • INSERT and DELETE in O(1) time.
  • Space.
    • O(n).
  • Figure: head → 66 → 54 → 1 → 96 → 16 → 41 → 13.

• Solution 2: direct addressing.
  • Maintain S in array A of size |U|.
  • Store element x at A[x.key].
  • SEARCH(k): return A[k].
  • INSERT(x): Set A[x.key] = x.
  • DELETE(x): Set A[x.key] = null.
  • Time.
    • SEARCH, INSERT and DELETE in O(1) time.
  • Space.
    • O(|U|).
  • Figure: array A indexed 0, 1, 2, ..., with each key of S (1, 13, 16, ...) stored at its own index.
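The two solutions above can be sketched in Python roughly as follows. The class names `Element`, `LinkedListDict` and `DirectAddressDict` are made up for this illustration and do not come from the slides. Note that the O(1) DELETE bound for the linked list presumes a doubly linked list and a pointer to x's node; the singly linked sketch below scans for x instead.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Element:
    key: int
    data: Any = None


class LinkedListDict:
    """Solution 1: unsorted singly linked list (illustrative sketch)."""

    def __init__(self) -> None:
        self.head = None  # each node is a 2-item list [element, next_node]

    def search(self, k: int) -> Optional[Element]:
        node = self.head
        while node is not None:  # linear scan: O(n)
            if node[0].key == k:
                return node[0]
            node = node[1]
        return None

    def insert(self, x: Element) -> None:
        self.head = [x, self.head]  # new node becomes the head: O(1)

    def delete(self, x: Element) -> None:
        # Scans for x's node; O(1) would need a doubly linked list
        # plus a direct pointer to the node.
        prev, node = None, self.head
        while node is not None and node[0] is not x:
            prev, node = node, node[1]
        if node is None:
            return
        if prev is None:
            self.head = node[1]
        else:
            prev[1] = node[1]


class DirectAddressDict:
    """Solution 2: array A of size |U|, element x stored at A[x.key]."""

    def __init__(self, universe_size: int) -> None:
        self.A: List[Optional[Element]] = [None] * universe_size

    def search(self, k: int) -> Optional[Element]:
        return self.A[k]

    def insert(self, x: Element) -> None:
        self.A[x.key] = x

    def delete(self, x: Element) -> None:
        self.A[x.key] = None
```

With U = {0,..,99}, `DirectAddressDict(100)` allocates 100 slots even though S holds only 7 elements, which is the O(|U|) space cost listed above.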

Dictionaries

Data structure      SEARCH   INSERT   DELETE   space
linked list         O(n)     O(1)     O(1)     O(n)
direct addressing   O(1)     O(1)     O(1)     O(|U|)

• Challenge. Can we do significantly better?

Chained Hashing

• Idea. Find a hash function h : U → {0, ..., m-1}, where m = Θ(n). Hash function should spread keys from S approximately evenly over {0, ..., m-1}.

• Chained hashing.
  • Maintain array A[0..m-1] of linked lists.
  • Store element x in linked list at A[h(x.key)].
• Collision.
  • x and y collide if h(x.key) = h(y.key).
• SEARCH(k): linear search in A[h(k)] for key k.
• INSERT(x): insert x in front of list A[h(x.key)].
• DELETE(x): remove x from list A[h(x.key)].

• Figure: U = {0,..,99}, key(S) = {1, 13, 16, 41, 54, 66, 96}, m = 10, h(k) = k mod 10. Non-empty lists: A[1] holds 41, 1; A[3] holds 13; A[4] holds 54; A[6] holds 66, 16, 96.

• Figure sequence: inserting keys 50, 87, 75, 15, 7, 17, 22 with m = 7 and h(k) = k mod 7, each at the front of its list. Result: A[0] holds 7; A[1] holds 22, 15, 50; A[3] holds 17, 87; A[5] holds 75; all other lists are empty.

Chained Hashing

• SEARCH(k): linear search in A[h(k)] for key k.
• INSERT(x): insert x in front of list A[h(x.key)].
• DELETE(x): remove x from list A[h(x.key)].

• Exercise. Insert sequence of keys K = 5, 28, 19, 15, 20, 33, 12, 17, 10 in an initially empty hash table of size 9 using chained hashing with hash function h(k) = k mod 9.

• Time.
  • SEARCH in O(length of list) time.
  • INSERT and DELETE in O(1) time.
  • Length of list depends on hash function.
• Space.
  • O(m + n) = O(n).
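A minimal Python sketch of the three chained hashing operations, assuming h(k) = k mod m. The class name `ChainedHashTable`, the (key, data) tuple representation, and the use of `collections.deque` as the per-slot list are choices made for this illustration, not taken from the slides. Unlike the slides, `delete` here takes a key and scans the list, so it costs O(length of list) rather than O(1).

```python
from collections import deque


class ChainedHashTable:
    """Chained hashing with h(k) = k mod m (illustrative sketch)."""

    def __init__(self, m):
        self.m = m
        self.A = [deque() for _ in range(m)]  # one list per slot

    def h(self, k):
        return k % self.m

    def search(self, k):
        """Linear search in A[h(k)]; returns (key, data) or None."""
        for key, data in self.A[self.h(k)]:
            if key == k:
                return (key, data)
        return None

    def insert(self, key, data=None):
        """Insert at the front of A[h(key)]; assumes key is not present."""
        self.A[self.h(key)].appendleft((key, data))

    def delete(self, key):
        """Remove key from A[h(key)] by scanning the list."""
        chain = self.A[self.h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return
```

For instance, with `m = 10`, inserting 16, then 66, then 96 places all three in `A[6]`, with the most recently inserted key at the front.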

Chained Hashing

• Def. Load factor α = average length of lists = n/m = O(1).
• Simple uniform hashing. Assume that every key is mapped uniformly at random to {0, .., m-1}.
  • Expected length of list = α.
  • ⇒ expected time for SEARCH is O(1).
• Time (assuming simple uniform hashing).
  • SEARCH in O(1) expected time.
  • INSERT and DELETE in O(1) time.

Dictionaries

Data structure      SEARCH   INSERT   DELETE   space
linked list         O(n)     O(1)     O(1)     O(n)
direct addressing   O(1)     O(1)     O(1)     O(|U|)
chained hashing     O(1)†    O(1)     O(1)     O(n)

† = expected time assuming simple uniform hashing
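The expected list length under simple uniform hashing follows from linearity of expectation. The slot index i and the element name y below are notation introduced only for this calculation. Fix a slot i; each of the n keys in S is mapped to i with probability 1/m, so:

```latex
\[
  \mathbb{E}\bigl[\text{length of list } A[i]\bigr]
  = \sum_{y \in S} \Pr\bigl[h(y.\mathrm{key}) = i\bigr]
  = \sum_{y \in S} \frac{1}{m}
  = \frac{n}{m}
  = \alpha .
\]
```

Since m = Θ(n), α = O(1), so SEARCH scans a list of expected constant length.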

Linear Probing

• Linear probing.
  • Maintain S in array A of size m.
  • Element x stored in A[h(x.key)] or in cluster to the right of A[h(x.key)].
  • Cluster = consecutive (cyclic) sequence of non-empty entries.
• Figure: array A[0..9] containing keys 41, 1, 11, 13, 54, 98, with h(k) = k mod 10.

• SEARCH(k): linear search from A[h(k)] in cluster to the right of A[h(k)].
• INSERT(x): insert x at A[h(x.key)]. If non-empty, insert at next empty entry to the right of A[h(x.key)] (cyclically).
• DELETE(x): remove x from its entry in A. Re-insert all elements to the right of x in the cluster.
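The operations above can be sketched in Python as follows, assuming h(k) = k mod m and storing bare integer keys (no satellite data). The class name `LinearProbingTable`, the use of `None` for an empty entry, and the `OverflowError` on a full table are assumptions of this sketch, not part of the slides.

```python
class LinearProbingTable:
    """Linear probing with h(k) = k mod m (illustrative sketch)."""

    def __init__(self, m):
        self.m = m
        self.A = [None] * m  # None marks an empty entry
        self.n = 0           # number of stored keys

    def h(self, k):
        return k % self.m

    def search(self, k):
        """Return the index holding k, or None if k is not stored."""
        i = self.h(k)
        for _ in range(self.m):      # at most m probes, even if A is full
            if self.A[i] is None:    # reached the end of the cluster
                return None
            if self.A[i] == k:
                return i
            i = (i + 1) % self.m     # move right, cyclically
        return None

    def insert(self, k):
        """Place k in the first empty entry at or to the right of A[h(k)]."""
        if self.n == self.m:
            raise OverflowError("table is full")
        i = self.h(k)
        while self.A[i] is not None:
            i = (i + 1) % self.m
        self.A[i] = k
        self.n += 1

    def delete(self, k):
        """Remove k, then re-insert the rest of its cluster to the right."""
        i = self.search(k)
        if i is None:
            return
        self.A[i] = None
        self.n -= 1
        j = (i + 1) % self.m
        while self.A[j] is not None:
            moved, self.A[j] = self.A[j], None
            self.n -= 1
            self.insert(moved)       # lands at or before its old position
            j = (j + 1) % self.m
```

Assuming the keys 41, 1, 11, 13, 54, 98 are inserted in that order with m = 10, keys 41 through 54 form one cluster in A[1..5] and 98 sits alone in A[8]. Deleting 1 then shifts 11, 13 and 54 one entry to the left.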

Linear Probing

• Theorem. Simple uniform hashing ⟹ expected O(1) time for linear probing operations.
• Caching. Linear probing is cache-efficient.
• Variants.
• Figure: array A[0..10] with h(k) = k mod 11 and keys 5, 1, 27, 32, 54, 11, 19.

Dictionaries

Data structure      SEARCH   INSERT   DELETE   space
linked list         O(n)     O(1)     O(1)     O(n)
direct addressing   O(1)     O(1)     O(1)     O(|U|)
chained hashing     O(1)†    O(1)     O(1)     O(n)
linear probing      O(1)†    O(1)†    O(1)†    O(n)

† = expected time assuming simple uniform hashing

Hash Functions

• Simple hash functions.
  • h(k) = k mod m. Typically, m is prime.
  • h(k) = ⌊m(kZ - ⌊kZ⌋)⌋, for constant Z, 0 < Z < 1.
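Both simple hash functions can be written directly in Python. The function names are made up for this sketch, and the default Z = (√5 - 1)/2 is one commonly suggested constant, used here as an assumption; the slides only require 0 < Z < 1. The version below uses floating point, so it is only meant for moderately sized keys.

```python
import math


def division_hash(k, m):
    """h(k) = k mod m."""
    return k % m


def multiplication_hash(k, m, Z=(math.sqrt(5) - 1) / 2):
    """h(k) = floor(m * (k*Z - floor(k*Z))) for a constant 0 < Z < 1."""
    frac = (k * Z) % 1.0  # fractional part of k*Z, in [0, 1)
    # min() guards against floating-point rounding pushing the result to m.
    return min(math.floor(m * frac), m - 1)
```

For example, `division_hash(96, 10)` gives 6 and `division_hash(50, 7)` gives 1, matching the slot indices used in the chained hashing figures.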

• Universal hash functions.
  • Choose hash functions randomly from family of hash functions.
  • Designed to have strong guarantees on collision probabilities.
  • ⇒ Dictionaries with constant expected time performance.
  • Expectation on random choice of hash function. Independent of input set.
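The slides do not name a specific family. As one standard example (an assumption of this sketch, not the slides' choice), the family h(k) = ((a·k + b) mod p) mod m, with p a prime larger than every key, a drawn from {1, ..., p-1} and b from {0, ..., p-1}, gives collision probability at most 1/m for any two distinct keys. The function names and the default p = 101 (a prime above the example universe U = {0,..,99}) are illustrative.

```python
import random


def hash_ab(k, a, b, p, m):
    """One member of the family: h(k) = ((a*k + b) mod p) mod m."""
    return ((a * k + b) % p) % m


def make_universal_hash(m, p=101, rng=random):
    """Pick a and b at random and return the resulting hash function.

    Assumes p is prime and every key k satisfies 0 <= k < p.
    """
    a = rng.randrange(1, p)  # a in {1, ..., p-1}
    b = rng.randrange(0, p)  # b in {0, ..., p-1}
    return lambda k: hash_ab(k, a, b, p, m)
```

The random choice is made once, when the table is created. The guarantee is then over the choice of a and b, not over the input keys, which is what "independent of input set" refers to.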

• Other hash functions.
  • MurmurHash, SHA-xxx, FNV, ...

• Applications.
  • Cryptography, similarity, coding, ...