CS 224: Advanced Algorithms (Spring 2017)
Lecture 6 — February 9, 2017
Prof. Jelani Nelson
Scribe: Albert Chalom

1 Wrapping up on hashing

Recall from last lecture that we showed the following: when hashing n items into m = Θ(n) bins with chaining, using a hash function h : [u] → [m] drawn from a (C log n / log log n)-wise independent family, with high probability no bin has size > C log n / log log n.

1.1 Using two hash functions

Azar, Broder, Karlin, and Upfal '99 [1] then used two hash functions h, g : [u] → [m] that are fully random. Ball i is inserted into the least loaded of the two bins h(i), g(i), with ties broken randomly.

Theorem: The max load is ≤ log log n / log 2 + O(1) with high probability. When using d hash functions instead of 2, we get max load ≤ log log n / log d + O(1) with high probability.
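A quick simulation illustrates the theorem. This is a sketch, not from the notes: the function name, bin count, and seed are my own choices, and ties are broken by `min` rather than randomly, which does not change the asymptotics.

```python
import random

def max_load(n, d, seed=0):
    """Throw n balls into n bins; each ball picks d uniformly random
    bins and goes into the least loaded one."""
    rng = random.Random(seed)
    loads = [0] * n
    for _ in range(n):
        choices = [rng.randrange(n) for _ in range(d)]
        best = min(choices, key=lambda b: loads[b])
        loads[best] += 1
    return max(loads)

n = 100_000
print("1 choice :", max_load(n, 1))   # grows like log n / log log n
print("2 choices:", max_load(n, 2))   # grows like log log n / log 2 + O(1)
```

Running this, the two-choice max load is dramatically smaller than the one-choice max load, matching the theorem's exponential improvement.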

1.2 Breaking the n bins into d slots

n V¨ocking ’03 [2] then considered breaking our n buckets into d slots, and having d hash functions n h1, h2, ..., hd that hash into their corresponding slot of size d . Then insert(i), loads i in the least loaded of the h(i) bins, and breaks tie by putting i in the left most bin. log log n This improves the performance to Max Load ≤ where 1.618 = φ2 < φ3... ≤ 2. dφd

1.3 Survey: Mitzenmacher, Richa, Sitaraman [3]

Idea: Let B_i denote the number of bins with ≥ i balls, and let H(b) denote the height of ball b in its bin. In order for ball b to reach height i + 1, both bins that b hashed to must have had height at least i, so we get P(H(b) ≥ i + 1) ≤ (B_i/n)^2.

This gives us E[B_{i+1}] ≤ E[number of balls with height ≥ i + 1] ≤ n · (B_i/n)^2, so (heuristically) B_{i+1}/n ≤ (B_i/n)^2. As a base case, since there are only n balls, at most n/2 bins can have height 2, giving us B_2/n ≤ 1/2. Then using our recursion from above we get B_3/n ≤ 1/2^2, B_4/n ≤ 1/2^4, ..., and in general B_{2+j}/n ≤ 1/2^{2^j}.

Once 2^{2^j} ≫ n, the probability that some bin has 2 + j balls is very small, and this happens once j ≫ log log n; so with high probability no bin has 2 + j balls for such j.
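The doubly exponential decay can be checked numerically. This sketch (the helper name is my own) iterates the recursion B_{2+j}/n ≤ 1/2^{2^j} from the base case 1/2 and counts how many squarings it takes for the fraction of heavy bins to drop below 1/n:

```python
import math

def heavy_bin_steps(n):
    """Iterate the heuristic recursion B_{i+1}/n <= (B_i/n)^2 starting
    from B_2/n <= 1/2; count steps until the fraction drops below 1/n."""
    frac, j = 0.5, 0
    while frac >= 1.0 / n:
        frac = frac * frac   # one squaring per level of height
        j += 1
    return j   # bins of height 2 + j are already unlikely to exist

for n in (10**3, 10**6, 10**9):
    print(n, heavy_bin_steps(n), math.log2(math.log2(n)))
```

The step count tracks log log n: even for n = 10^9 only a handful of squarings are needed, matching the claimed max load of 2 + O(log log n).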

Formal argument: Let α_6 = n/(2e), and α_{i+1} = e · α_i^2 / n for i ≥ 6. Let E_i be the event that B_i ≤ α_i. We then need to show that P(∀i ≥ 6, E_i) ≥ 1 − 1/poly(n).

By the chain rule of conditional probability, P(∀i ≥ 6, E_i) = P(E_6) · P(E_7 | E_6) · P(E_8 | E_7, E_6) · ..., so it suffices to show each conditional probability is close to 1.

Claim 1: P(B_{i+1} > α_{i+1} | ∩_{6≤j≤i} E_j) is small. Indeed,

P(¬E_{i+1} | ∩_{6≤j≤i} E_j) = P(¬E_{i+1} ∩ (∩_{6≤j≤i} E_j)) / P(∩_{6≤j≤i} E_j).

The denominator here is close to one, and the numerator is upper bounded by P(Bin(n, (α_i/n)^2) > α_{i+1}). Consider the following experiment. Number the balls 1, ..., n and insert them into the bins one by one in increasing order of ball number. In our experiment we maintain two variables, X and Y. X is a Boolean "flag" that starts as TRUE and becomes FALSE if any of E_6, ..., E_i ever fails to hold. Y is a counter: it counts the number of balls of height at least i + 1. We want to say P((Y > α_{i+1}) AND X) ≤ P(Bin(n, (α_i/n)^2) > α_{i+1}). Let A be the event "(Y > α_{i+1}) AND X" and B be the event "Bin(n, (α_i/n)^2) > α_{i+1}". So we want to show P(A) ≤ P(B).

So, we're going along inserting balls 1, ..., n, and we're going to make what is called a "coupling argument": we will define some Bin(n, (α_i/n)^2) random variable Z, starting at 0 and gradually incremented as we insert our balls, such that Z and our X, Y are defined on the same probability space. This makes A and B dependent and easy to compare, while marginally A and B still have the correct probabilities of occurring. When we assign ball j, there are two cases. Either X is already FALSE, in which case we just set Y to 0 (we already know A failed) and we increment Z with probability p = α_i^2/n^2. Otherwise, if X is TRUE, then we know there are exactly t ≤ α_i bins of load ≥ i for some t. We label these bins 1, ..., t at this time step (and the other bins are labeled t + 1, ..., n in some ordering that doesn't matter). We then pick a uniform random variable U in [0, 1]. If U is in [0, 1/n^2), then we imagine the two bins ball j hashed to are both bin 1. If it's in [1/n^2, 2/n^2), then we imagine it hashed to bins 1 and 2. Etc. That is, we slice [0, 1] into n^2 buckets of size exactly 1/n^2 each, such that the first t^2 buckets correspond to the t^2 different ways ball j can hash to two heavy bins (i.e. bins with load at least i).

The remaining buckets of size 1/n^2 in [0, 1] then correspond to the other n^2 − t^2 hash possibilities for ball j. We then adjust X as needed if some event failed based on the U we drew (and set Y to 0 if X got set to FALSE). We also increment Z iff U ≤ p. This is where the coupling happens, since Z and X, Y are now defined on the same probability space, driven by these U random variables over our iterations. And the key point: event A can only happen if B also happens, since the way we coupled Y and Z maintains the invariant that Y ≤ Z always (note we hold Y at 0 if X is ever set to FALSE). Thus P(A) ≤ P(B).

Claim 2: P(Bin(n, α_i^2/n^2) > α_{i+1}) is small. (For those not aware, Bin(n, p) refers to the distribution of the number of heads when flipping n p-biased coins.)

Proof: Let X_j be an indicator for the j-th coin being heads, so E[X_j] = p = α_i^2/n^2. Noting that α_{i+1} = e · n · p, the Chernoff bound gives

P(∑_j X_j > e · n · p) ≤ e^{−C·α_i^2/n} for some constant C.
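As a sanity check on Claim 2, here is a small Monte Carlo sketch estimating P(Bin(n, p) > e·n·p). The parameter values and function name are illustrative, not from the lecture; p plays the role of (α_i/n)^2:

```python
import math
import random

def tail_estimate(n, p, trials=5_000, seed=1):
    """Monte Carlo estimate of P(Bin(n, p) > e * n * p)."""
    rng = random.Random(seed)
    threshold = math.e * n * p
    hits = 0
    for _ in range(trials):
        # one Bin(n, p) draw: n independent p-biased coin flips
        draws = sum(rng.random() < p for _ in range(n))
        if draws > threshold:
            hits += 1
    return hits / trials

# With n * p around 25 the tail beyond e * n * p is already negligible.
print(tail_estimate(1000, 0.025))
```

With mean n·p = 25, the threshold e·n·p ≈ 68 is far out in the tail, and the empirical estimate is 0: the Chernoff bound e^{−C·np} is already tiny at these scales.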

2 Amortization

2.1 Definition

If a data structure supports operations A_1, A_2, ..., A_r, we say that the amortized cost of A_j is t_j if, for all sequences of operations with n_j operations of type A_j, the total runtime is ≤ ∑_{j=1}^r n_j t_j.

2.2 Potential function:

Let Φ map the states of the data structure to R_{≥0} (the non-negative real numbers), with Φ of the empty data structure defined to be 0. Then a valid amortized cost of an operation is t_actual + ΔΦ = t_actual + Φ(after op) − Φ(before op). The total amortized cost of a sequence of R operations is ∑_{j=1}^R [t_actual(j) + Φ(time j) − Φ(time j − 1)] = ∑_j (actual time for the j-th operation) + Φ_final − Φ_0. Since Φ_0 = 0 and Φ_final ≥ 0, the total amortized cost above is ≥ the total actual time.
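To see the telescoping identity concretely, here is a sketch using table doubling (a standard example, not from this lecture) with the potential Φ = max(0, 2·size − capacity), which is 0 on the empty structure:

```python
def append_cost(size, capacity):
    # actual cost: 1 write, plus copying `size` old elements on a resize
    return 1 + (size if size == capacity else 0)

def phi(size, capacity):
    # potential: 0 for the empty structure, grows as the array fills up
    return max(0, 2 * size - capacity)

size, capacity = 0, 1
total_actual = total_amortized = 0
for _ in range(1000):
    before = phi(size, capacity)
    cost = append_cost(size, capacity)
    if size == capacity:
        capacity *= 2          # double on overflow
    size += 1
    after = phi(size, capacity)
    total_actual += cost
    total_amortized += cost + (after - before)

# telescoping: sum of amortized costs = actual total + Phi_final - Phi_0
assert total_amortized == total_actual + phi(size, capacity) - 0
assert total_actual <= total_amortized
print(total_actual, total_amortized)
```

Each individual append has amortized cost ≤ 3 even though a resize costs Θ(size), and the final assertion is exactly the inequality derived above: total amortized cost upper bounds total actual time.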

2.3 Y-fast

Recall from the first lecture that a Y-fast trie consists of an x-fast trie over Θ(n/w) groups, each pointing to a balanced binary search tree on [w/2, 2w) elements, and we said this supports queries in Θ(log w) time and inserts in Θ(log w) amortized time. From now on, let t refer to actual time and t̃ to amortized time. Now let's analyze Y-fast trie insertion using our potential function:

Φ(state of the structure) = ∑_{groups j} ((size of group j) − w).

For the query operation we get t̃_query = t_query + ΔΦ = t_query + 0 = Θ(log w).

For insertion, t̃_insert = t_insert + ΔΦ. Consider the two cases: either we have to split a group of size 2w into two groups of size w, or we don't. In the first case t_insert = Θ(log w) + w and ΔΦ = −w, giving t̃_insert = Θ(log w). In the second case t_insert = Θ(log w) and ΔΦ = 1, giving t̃_insert = Θ(log w).
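The two insertion cases can be checked with a toy simulation of the group structure. This is a sketch under simplifying assumptions of my own: the lookup cost is modeled as exactly log2 w, the split cost as exactly w extra, and all inserts go to one group.

```python
import math

def simulate_y_fast_inserts(w, n_inserts):
    """Toy model of Y-fast insertion: a group of size 2w splits into two
    of size w. Phi = sum over groups of (size - w)."""
    log_w = int(math.log2(w))
    groups = [w]                  # start with one group holding w keys
    phi = sum(s - w for s in groups)
    worst_amortized = 0
    for _ in range(n_inserts):
        cost = log_w              # modeled BST + x-fast work
        groups[0] += 1            # insert into the first group, say
        if groups[0] == 2 * w:
            cost += w             # pay to split the full group
            groups[0] = w
            groups.append(w)
        new_phi = sum(s - w for s in groups)
        worst_amortized = max(worst_amortized, cost + (new_phi - phi))
        phi = new_phi
    return worst_amortized, log_w

worst, log_w = simulate_y_fast_inserts(w=64, n_inserts=10_000)
print(worst, log_w)
```

In both cases the amortized cost comes out to log2 w + 1: the +w actual cost of a split is exactly canceled (up to 1) by the potential drop of the group going from 2w to two groups of size w.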

3 Heaps:

3.1 Definition:

Heaps maintain keys with comparable values, subject to the following operations:

1. deleteMin(): report the item with the smallest value and delete it from the heap.

2. decKey(*p, v′): reduce the value of the item pointed to by p to v′.

3. insert(k, v): insert key k with value v into the heap.

Binary Heaps: John Williams '64 [4] published the binary heap, which supports t_I (insertion time) = t_D (delete-min time) = t_K (dec-key time) = O(log n).

Binomial Heaps: Vuillemin '78 [5] published the binomial heap, which supports t̃_I = O(1) and t_D = t_K = O(log n).

Fibonacci Heaps: Fredman and Tarjan '87 [6] published the Fibonacci heap, with t̃_I = t̃_K = O(1) and t_D = O(log n).

Strict Fibonacci Heaps: Brodal, Lagogiannis, and Tarjan '12 [7] published the strict Fibonacci heap, with worst-case t_I = t_K = O(1) and t_D = O(log n).

Then Thorup '07 [8] showed that in the word RAM, sorting in time n·S(n) implies a heap with O(1) find-min (note this is not the same as delete-min) and O(S(n)) insert/delete time. As pset 1 showed, there exist sorting algorithms where S(n) is O(log log n) or O(√(log log n)).

3.2 Binomial Heaps

A binomial heap stores its items in a forest. Each tree is assigned a "rank" equal to the number of children of its root, with the requirement that a rank-k tree has k subtrees of ranks 0, 1, 2, ..., k − 1. Each tree is also in heap order (meaning each parent's value is ≤ all of its children's values). The binomial heap is also subject to the invariant that the forest contains at most one rank-k tree for each k. A tree of rank 0 is a single node. A tree of rank 1 is a parent with one child. A tree of rank 2 has one child that is a leaf and one child that itself has a child. And a tree of rank 3 has 3 subtrees (one of rank 2, one of rank 1, and one of rank 0). The heap then performs the following operations:

• Insert: Create a new rank-0 tree, then repeatedly merge trees of equal rank to preserve our invariant.

• Dec key: Bubble up the node to maintain heap order.

• Delete min: Compare the roots, cut the root with the smallest value, make its children new top-level trees, then merge to restore the invariant.

Claim: A rank-k tree has 2^k nodes. Proof: Induction on k (a rank-k tree is two rank-(k − 1) trees linked together). This implies that the largest rank in the forest is ≤ log n. To show our runtimes, let Φ = the number of trees in our forest. Then t̃_I = t_I + ΔΦ = [1 + (number of trees merged)] + [1 − (number of trees merged)] = O(1).

t̃_K = t_K + ΔΦ = t_K + 0 = O(log n). t̃_D = t_D + ΔΦ = [k + (number of trees)] + (at most the number of trees, for the new top-level children) = O(log n + log n + log n) = O(log n).
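The insert and delete-min operations above can be sketched as follows. This is a minimal illustration with names of my own choosing; decKey is omitted since it would need parent pointers for the bubble-up step.

```python
import random

class Node:
    def __init__(self, key):
        self.key, self.rank, self.children = key, 0, []

def link(a, b):
    """Merge two trees of equal rank; the smaller root becomes the parent,
    so heap order is preserved and the rank grows by one."""
    if b.key < a.key:
        a, b = b, a
    a.children.append(b)
    a.rank += 1
    return a

class BinomialHeap:
    def __init__(self):
        self.trees = {}   # rank -> the unique tree of that rank

    def _add_tree(self, t):
        # keep merging while the "<= 1 tree per rank" invariant is violated
        while t.rank in self.trees:
            t = link(t, self.trees.pop(t.rank))
        self.trees[t.rank] = t

    def insert(self, key):
        self._add_tree(Node(key))   # new rank-0 tree, then merge

    def delete_min(self):
        r = min(self.trees, key=lambda rk: self.trees[rk].key)
        root = self.trees.pop(r)
        for child in root.children:  # children become top-level trees
            self._add_tree(child)
        return root.key

rng = random.Random(0)
h = BinomialHeap()
items = rng.sample(range(1000), 100)
for x in items:
    h.insert(x)
out = [h.delete_min() for _ in range(len(items))]
assert out == sorted(items)   # repeated delete-min yields sorted order
```

Since the forest holds at most one tree per rank and ranks are ≤ log n, both the root scan in delete_min and the merge cascade in _add_tree touch O(log n) trees, matching the amortized analysis above.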

References

[1] Yossi Azar, Andrei Z. Broder, Anna R. Karlin, Eli Upfal: Balanced Allocations. SIAM J. Comput., 29(1):180-200, 1999.

[2] Berthold Vöcking: How asymmetry helps load balancing. J. ACM, 50(4):568-589, 2003.

[3] Richard Cole, Alan M. Frieze, Bruce M. Maggs, Michael Mitzenmacher, Andrea W. Richa, Ramesh K. Sitaraman, Eli Upfal: On Balls and Bins with Deletions. RANDOM 1998: 145-158.

[4] John W. J. Williams: Algorithm 232: Heapsort. Commun. ACM 7(6): 347-348, 1964.

[5] Jean Vuillemin: A Data Structure for Manipulating Priority Queues. Commun. ACM 21(4): 309-315, 1978.

[6] Michael L. Fredman, Robert Endre Tarjan: Fibonacci heaps and their uses in improved network optimization algorithms. J. ACM 34(3), 596-615, 1987.

[7] Gerth Stølting Brodal, George Lagogiannis, Robert Endre Tarjan: Strict Fibonacci heaps. STOC 2012: 1177-1184.

[8] Mikkel Thorup: Equivalence between priority queues and sorting. J. ACM 54(6): 28, 2007.
