Lecture 4: Hashing

Rafael Oliveira

University of Waterloo Cheriton School of Computer Science [email protected]

September 21, 2020

Overview

Introduction
    Hash Functions
    Why is hashing useful?
    How to hash?

Succinctness of Hash Functions
    Coping with randomness
    Universal Hashing
    Hashing using 2-universal families
    Perfect Hashing

Acknowledgements

Computational Model

Before we talk about hash functions, we need to state our model of computation:

Definition (Word RAM model)
In the word RAM model (RAM stands for Random Access Machine):
all elements are integers that fit in a machine word of w bits
basic operations (comparison, arithmetic, bitwise) on such words take Θ(1) time
we can also access any position in an array in Θ(1) time

What is hashing?

We want to store n elements (keys) from the set U = {0, 1, ..., m − 1}, where m >> n, in a data structure that supports insertions, deletions, and search "as efficiently as possible."

Naive approach: use an array A of m elements, initially A[i] = 0 for all i, and when key i is inserted, set A[i] = 1.
Insertion: O(1), Deletion: O(1), Search: O(1)
Memory: O(m log(m)) (this is very bad!)

We also want to achieve optimal memory O(n log(m)). For this we will use a technique called hashing.
A hash function is a function h : U → [0, n − 1], where |U| = m >> n.
A hash table is a data structure that consists of:
a table T with n cells [0, n − 1], each cell storing O(log(m)) bits
a hash function h : U → [0, n − 1]

From now on, we will measure memory as the number of cells.
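To make the naive approach concrete, here is a minimal Python sketch of such a direct-address table (the class and method names are ours, for illustration only):

```python
class DirectAddressTable:
    """Naive scheme: one cell per element of the universe U = {0, ..., m-1}."""

    def __init__(self, m):
        self.A = [0] * m          # Theta(m) cells, no matter how few keys we store

    def insert(self, key):        # O(1)
        self.A[key] = 1

    def delete(self, key):        # O(1)
        self.A[key] = 0

    def search(self, key):        # O(1)
        return self.A[key] == 1
```

Every operation touches a single cell, which is why all time bounds are O(1) while the memory stays proportional to m rather than to the number of stored keys.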

Why is hashing useful?

Designing efficient data structures (dictionaries) for searching
Data streaming algorithms
Derandomization
Complexity Theory
...and many more

Challenges in Hashing

Setup:
Universe U = {0, ..., m − 1} of size m >> n, where n is the size of the range of our hash function h : U → [0, n − 1]
Store O(n) elements of U (keys) in a hash table T (which has n cells)

Ideally, we want the hash function to map different keys into different locations.

Definition (Collision)
We say that a collision happens for hash function h with inputs x, y ∈ U if x ≠ y and h(x) = h(y).

By the pigeonhole principle, avoiding all collisions is impossible without knowing the keys in advance.

We will settle for: the number of collisions is small with high probability.

Our solution: family of hash functions

Construct a family of hash functions H such that the number of collisions is small with high probability when we pick a hash function uniformly at random from the family H.

Simplest version to keep in mind:

Pr_{h ∈R H} [h(x) = h(y)] ≤ 1/poly(n)   for all x ≠ y ∈ U

Assumptions:
the keys are independent of the hash function we choose
we do not know the keys in advance (even if we did, this would be a nontrivial problem!)

Question
We could still have collisions. How do we handle them?

Random Hash Functions?

Natural to consider the following approach: from all functions h : U → [0, n − 1], just pick one uniformly at random.

This setting is the same as our balls-and-bins setting! So, if we have to store n keys:
Expected number of keys in a location: 1
Maximum number of collisions (max load) in one particular location: O(log n / log log n) keys

Solving collisions: store all keys hashed into location i in a linked list. This is known as chain hashing.

Could also pick two random hash functions and use the power of two choices; the collision bound becomes O(log log n).
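A small Python sketch of chain hashing (an illustration, not the lecture's code): we simulate a truly random hash function by memoizing fresh random values, which already hints at the storage problem discussed on the next slide.

```python
import random

class ChainHashTable:
    def __init__(self, n):
        self.n = n
        self.table = [[] for _ in range(n)]   # n cells, each holding a chain
        self.memo = {}                        # simulated random function h

    def _hash(self, key):
        # A "truly random" h: draw a fresh uniform value the first time we
        # see a key. Note the memo itself costs space per distinct key.
        if key not in self.memo:
            self.memo[key] = random.randrange(self.n)
        return self.memo[key]

    def insert(self, key):
        chain = self.table[self._hash(key)]
        if key not in chain:
            chain.append(key)

    def search(self, key):
        return key in self.table[self._hash(key)]

    def delete(self, key):
        chain = self.table[self._hash(key)]
        if key in chain:
            chain.remove(key)
```

Search time is proportional to the length of one chain, i.e., to the load of a single cell.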

Random Hash Functions?

Random hash functions look very good. However, we haven't discussed the following:

Question
How much resource (time & space) does it take to compute a random hash function?

Storing the entire function h : U → [0, n − 1] requires O(m log n) bits (way too much space!)
Even if we only stored the elements we have seen, it would take O(n) time to evaluate h(x) (we need to check whether we have already computed it!)

Remark
Thus, for a random function, all operations (insert, delete, search) take O(n log m) time (at best!)

How do we cope with the computational problem that arose with randomness?

Succinctness of Hash Functions

How to cope with "hardness" of randomness?

We want something that is random-like (few collisions w.h.p.) but easy to compute/represent.

Ideally something that takes O(log m) time to compute (as this is the size of our input).

Question
How many hash functions can we have with the property above?

poly(m) functions, as each function takes at most O(log m) bits to describe. Thus these are succinct functions (easy to describe and compute) which have random-like properties!

This is part of derandomization/pseudorandomness: a huge subfield of TCS!

k-wise independence

A weaker notion of independence.

Definition (Full Independence)
A set of random variables X1, ..., Xn is said to be (fully) independent if

Pr[ ⋂_{i=1}^{n} {Xi = ai} ] = ∏_{i=1}^{n} Pr[Xi = ai]

Definition (k-wise Independence)
A set of random variables X1, ..., Xn is said to be k-wise independent if, for any set J ⊆ [n] such that |J| ≤ k,

Pr[ ⋂_{i∈J} {Xi = ai} ] = ∏_{i∈J} Pr[Xi = ai]

Pairwise independence

When k = 2, k-wise independence is called pairwise independence.

Example (XOR pairwise independence)
Given b uniformly random bits Y1, ..., Yb, we can generate 2^b − 1 pairwise independent random variables as follows:

X_S := ⊕_{i∈S} Yi,   for each S ⊆ [b], S ≠ ∅

Why are they even random? Why are they pairwise independent? Are they also 3-wise independent?
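The first two questions can be settled by brute force for small b. The sketch below (our code) enumerates all 2^b equally likely seeds and verifies that every pair (X_S, X_T) with S ≠ T is uniform over {0, 1}², which is exactly pairwise independence. They are not 3-wise independent: for instance X_{1,2} = X_{1} ⊕ X_{2} is determined by the other two.

```python
import itertools

b = 3
# All nonempty subsets S of {0, ..., b-1}; each gives one variable X_S.
subsets = [S for r in range(1, b + 1)
           for S in itertools.combinations(range(b), r)]

def X(S, Y):
    """X_S = XOR of the bits Y_i for i in S."""
    bit = 0
    for i in S:
        bit ^= Y[i]
    return bit

for S, T in itertools.combinations(subsets, 2):
    counts = {}
    for Y in itertools.product([0, 1], repeat=b):   # all 2^b seeds
        key = (X(S, Y), X(T, Y))
        counts[key] = counts.get(key, 0) + 1
    # Uniform on {0,1}^2: each of the 4 joint outcomes occurs 2^b / 4 times.
    assert all(c == 2 ** b // 4 for c in counts.values()), (S, T)
print("every pair (X_S, X_T) is uniform on {0,1}^2: pairwise independent")
```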

Pairwise independence II

Example (Pairwise independence in Fp)
Let p be a prime number. Given 2 uniformly random variables Y1, Y2 ∼ [0, ..., p − 1], we can generate p pairwise independent random variables as follows:

Xi := Y1 + i · Y2 mod p,   i ∈ [0, p − 1]

Why are they even random? Why are they pairwise independent? Are they also 3-wise independent?

Can think of these random variables as picking a random line over a finite field: if we only know one point of the line, the second point is still uniformly random; however, two points determine the line.
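The same brute-force check works here (our code; p is kept small so all p² seeds can be enumerated). For i ≠ j the map (Y1, Y2) ↦ (Xi, Xj) is a bijection on Fp × Fp, so every pair of values occurs exactly once; any three of the Xi are constrained to lie on a common line, so 3-wise independence fails.

```python
import itertools

p = 5   # a small prime, so we can enumerate all p^2 seeds (Y1, Y2)

for i, j in itertools.combinations(range(p), 2):
    counts = {}
    for y1, y2 in itertools.product(range(p), repeat=2):
        pair = ((y1 + i * y2) % p, (y1 + j * y2) % p)   # (X_i, X_j)
        counts[pair] = counts.get(pair, 0) + 1
    # Bijection: each of the p^2 value pairs appears exactly once.
    assert len(counts) == p * p and all(c == 1 for c in counts.values())
print("every pair (X_i, X_j) is uniform on F_p x F_p: pairwise independent")
```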

Universal Hash Functions

We want hash functions. Why are we talking about random variables?

Definition (Universal Hash Functions)
Let U be a universe with |U| ≥ n. A family of hash functions H = {h : U → [0, n − 1]} is k-universal if, for any distinct elements u1, ..., uk ∈ U, we have

Pr_{h ∈R H} [h(u1) = h(u2) = ... = h(uk)] ≤ 1/n^{k−1}

Definition (Strongly Universal Hash Functions)
H = {h : U → [0, n − 1]} is strongly k-universal if, for any distinct elements u1, ..., uk ∈ U and for any values y1, ..., yk ∈ [0, n − 1], we have

Pr_{h ∈R H} [h(u1) = y1, ..., h(uk) = yk] ≤ 1/n^k

Relation to k-wise independent random variables

What do the previous definitions have to do with random variables?

A family H is strongly k-universal if the random variables h(0), ..., h(|U| − 1), for h drawn uniformly at random from H, are k-wise independent.

So we can use random variables to construct universal hash functions!

Strongly 2-universal families of hash functions

Let p be a prime number and U = [0, p − 1].

Proposition
H = {h_{a,b}(x) := a · x + b mod p | a, b ∈ [0, p − 1]} is strongly 2-universal.

How do we make the domain U much larger than the image of the maps? (As usual in hashing, the size of the universe is much larger than the size of the table.)

Proposition
Let U = [0, p^k − 1] ≡ [0, p − 1]^k \ {(0, ..., 0)}, and for a = (a0, ..., a_{k−1}) write a · x for the inner product ∑ ai · xi mod p. Then

H = {h_{a,b}(x) := a · x + b mod p | a ∈ U, b ∈ [0, p − 1]}

is strongly 2-universal.
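For the first proposition, strong 2-universality can even be verified exhaustively for a small prime (a sketch, our code): for distinct u1, u2 and any targets y1, y2, exactly one pair (a, b) solves both equations, so the probability over a random (a, b) is exactly 1/p².

```python
import itertools

p = 7   # small prime, so exhaustive checking is feasible

for u1, u2 in itertools.combinations(range(p), 2):        # distinct keys
    for y1, y2 in itertools.product(range(p), repeat=2):  # any targets
        hits = sum(1 for a in range(p) for b in range(p)
                   if (a * u1 + b) % p == y1 and (a * u2 + b) % p == y2)
        assert hits == 1   # exactly one (a, b) works: probability 1/p^2
print("H = {a*x + b mod p} is strongly 2-universal")
```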


2-universal families of hash functions

What if my hash table size is not a prime?

Proposition
H = {h_{a,b}(x) := (a · x + b mod p) mod n | a, b ∈ [0, p − 1]}
is 2-universal (but not strongly 2-universal).

Practice problem: prove the proposition above.
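A Python sketch of this family as one would actually use it (the fixed prime P = 2^31 − 1 and the function names are our illustrative choices; any prime p ≥ m works):

```python
import random

P = (1 << 31) - 1   # a Mersenne prime; we assume the universe size m <= P

def make_hash(n):
    """Draw h(x) = ((a*x + b) mod P) mod n from the 2-universal family."""
    a = random.randrange(P)
    b = random.randrange(P)
    return lambda x: ((a * x + b) % P) % n

h = make_hash(n=100)
print(h(42), h(43))   # two table locations in [0, 99]
```

Storing a function costs two words (a and b), matching the succinctness goal from earlier.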

k-universal families of hash functions

Can we construct k-universal families of hash functions like this?

YES! Instead of constructing random lines (degree 1 polynomials), we can construct random univariate polynomials of degree k − 1.
Two points determine a line; similarly, k points determine a univariate polynomial of degree k − 1.
Random degree k − 1 polynomials are k-wise independent! Practice problem: prove this!
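A sketch of the resulting family (our code): draw k random coefficients over Fp and evaluate the degree-(k − 1) polynomial by Horner's rule, so each function needs only O(k) words of storage and O(k) time per evaluation.

```python
import random

def make_poly_hash(k, p):
    """A hash function drawn from the k-wise independent polynomial family."""
    coeffs = [random.randrange(p) for _ in range(k)]   # c_0, ..., c_{k-1}

    def h(x):
        value = 0
        for c in reversed(coeffs):   # Horner: (...(c_{k-1}x + c_{k-2})x...) + c_0
            value = (value * x + c) % p
        return value

    return h

h = make_poly_hash(k=4, p=101)   # a 4-wise independent function into [0, 100]
print([h(x) for x in range(5)])
```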

Efficiency

How did pairwise independence improve the problems we were having with random functions?

Remark
For a random function, all operations (insert, delete, search) take O(n log m) time (at best!)

Remark
In the XOR example, our function takes O(n) storage space and O(n) time to compute (reminder: we assume that n < 2^w).
In the Fp examples, our function takes O(1) storage space and O(1) time to compute (we assume that p < 2^w)!

Theorem from Probability

Theorem (Markov's inequality)
If X is a non-negative random variable and t > 0, we have:

Pr[X ≥ t] ≤ E[X] / t
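For completeness, a one-line proof in the discrete case (the first inequality uses that X is non-negative, the second that x ≥ t in every remaining term):

```latex
\mathbb{E}[X] \;=\; \sum_{x} x \,\Pr[X = x]
        \;\ge\; \sum_{x \ge t} x \,\Pr[X = x]
        \;\ge\; t \sum_{x \ge t} \Pr[X = x]
        \;=\; t \,\Pr[X \ge t].
```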

Hashing with 2-universal families

Let U = [0, m − 1], and let p be a prime number such that m ≤ p < 2m (such a prime exists by Bertrand's postulate).

H = {h_{a,b}(x) := (a · x + b mod p) mod n | a, b ∈ [0, p − 1]}

We only need to choose a, b ∈ [0, p − 1] to store a function from H.
The computation time of h_{a,b} is also O(log m).
Can this hash function match the chain hashing parameters (O(log log n) search time)?

We do not have the same expected search time as chain hashing.

Lemma (Expected number of collisions)
When we store ℓ keys, the expected number of collisions using a 2-universal hash family is at most

ℓ²/2n
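The lemma follows from linearity of expectation together with the 2-universal collision bound Pr[h(x) = h(y)] ≤ 1/n, where S is the set of ℓ stored keys:

```latex
\mathbb{E}[\#\mathrm{collisions}]
  \;=\; \sum_{\{x,y\} \subseteq S,\; x \neq y} \Pr_{h}\big[h(x) = h(y)\big]
  \;\le\; \binom{\ell}{2} \cdot \frac{1}{n}
  \;\le\; \frac{\ell^2}{2n}.
```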

Hashing with 2-universal families

Lemma (Expected number of collisions)
When we store ℓ keys, the expected number of collisions using a 2-universal hash family is at most

ℓ²/2n

Thus, by Markov's inequality, we have:

Lemma (Maximum load of entry of hash table)
With probability ≥ 1/2, the maximum load of any entry of the hash table, using a 2-universal hash family, is

≤ √(2ℓ²/n)

When ℓ ≈ n (as is usually assumed in hashing), we expect a maximum load of about √(2n).
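How this follows (a sketch; we read the bound as a bound on the maximum load L, matching the lemma's title): Markov's inequality caps the number of collisions, while a heavily loaded cell forces many collisions.

```latex
\Pr\Big[\#\mathrm{collisions} \ge \frac{\ell^2}{n}\Big]
  \;\le\; \frac{\ell^2/2n}{\ell^2/n} \;=\; \frac{1}{2},
\qquad
\text{and a cell of load } L \text{ alone creates } \binom{L}{2} \ge \frac{(L-1)^2}{2} \text{ collisions.}
```

So with probability ≥ 1/2, every cell's load satisfies (L − 1)²/2 ≤ ℓ²/n, i.e. L ≤ 1 + √(2ℓ²/n).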

Perfect Hashing

How do we build a hash table with O(1) search time and O(n) memory? Can we still do it with a 2-universal family of hash functions?

Corollary
If h ∈ H is a random hash function from a 2-universal family of hash functions, then for any set S ⊆ U of size ℓ ≤ √n, the probability of h being perfect for S (mapping the keys of S with no collisions) is at least 1/2.

Proof: For ℓ ≤ √n the expected number of collisions is at most ℓ²/2n ≤ 1/2, so by Markov's inequality there is no collision with probability ≥ 1/2.

New idea: build a two-level hash table!

Theorem
The two-level approach gives a perfect hashing scheme.
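A runnable Python sketch of the two-level idea (an illustration with our own names and the fixed prime P = 2^31 − 1; it omits the resampling of the first-level function that the full analysis uses to guarantee O(n) total space):

```python
import random

P = (1 << 31) - 1   # prime assumed larger than the universe

def make_hash(n):
    """h(x) = ((a*x + b) mod P) mod n, drawn from the 2-universal family."""
    a, b = random.randrange(P), random.randrange(P)
    return lambda x: ((a * x + b) % P) % n

def all_placed(bucket, h2, cells):
    for x in bucket:
        i = h2(x)
        if cells[i] is not None:            # collision: caller resamples h2
            return False
        cells[i] = x
    return True

def build_perfect_table(keys):
    n = len(keys)
    h1 = make_hash(n)                       # level 1: n keys into n cells
    buckets = [[] for _ in range(n)]
    for x in keys:
        buckets[h1(x)].append(x)
    second = []
    for bucket in buckets:                  # level 2: b_i keys into b_i^2 cells
        size = len(bucket) ** 2
        while True:                         # expected O(1) retries per cell
            h2 = make_hash(size) if size else (lambda x: 0)
            cells = [None] * size
            if all_placed(bucket, h2, cells):
                second.append((h2, cells))
                break
    return h1, second

def search(table, x):                       # two probes: O(1) time
    h1, second = table
    h2, cells = second[h1(x)]
    return bool(cells) and cells[h2(x)] == x

keys = random.sample(range(10**6), 1000)
table = build_perfect_table(keys)
assert all(search(table, x) for x in keys)
```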

Proof (sketch) of Theorem
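A sketch of the standard argument (our reconstruction of the details behind the theorem): hash the ℓ = n keys into n cells with a first-level function h from a 2-universal family, let b_i be the number of keys landing in cell i, and give cell i its own second-level table of size b_i². The expected total space is

```latex
\mathbb{E}\Big[\, n + \sum_i b_i^2 \Big]
  \;=\; n + \mathbb{E}\Big[\sum_i b_i\Big] + 2\,\mathbb{E}\Big[\sum_i \tbinom{b_i}{2}\Big]
  \;=\; 2n + 2\,\mathbb{E}[\#\mathrm{collisions}]
  \;\le\; 2n + 2\cdot\frac{n^2}{2n} \;=\; 3n.
```

By the Corollary, each second-level function is perfect for its b_i keys with probability ≥ 1/2 (since b_i ≤ √(b_i²)), so expected O(1) resamplings per cell suffice. A search probes one cell at each level, giving O(1) time with O(n) memory in expectation.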

Acknowledgement

Lecture based largely on Lap Chi's notes: https://cs.uwaterloo.ca/~lapchi/cs466/notes/L05.pdf
