CS 224: Advanced Algorithms (Spring 2017)
Lecture 6: February 9, 2017
Prof. Jelani Nelson    Scribe: Albert Chalom

1 Wrapping up on hashing

Recall from last lecture: when hashing n items into m bins with chaining, where m = Θ(n) and the hash function h : [u] → [m] is drawn from a (C log n / log log n)-wise independent family, then with high probability no linked list has size greater than C log n / log log n.

1.1 Using two hash functions

Azar, Broder, Karlin, and Upfal '99 [1] used two fully random hash functions h, g : [u] → [m]. Ball i is inserted into the less loaded of the two bins h(i), g(i), with ties broken randomly.

Theorem: The max load is ≤ log log n / log 2 + O(1) with high probability. Using d hash functions instead of 2, the max load is ≤ log log n / log d + O(1) with high probability.

1.2 Breaking the n bins into d slots

Vöcking '03 [2] then considered breaking the n bins into d slots of size n/d each, with hash functions h_1, h_2, ..., h_d, where h_j hashes into the j-th slot. insert(i) places ball i in the least loaded of the d bins h_1(i), ..., h_d(i), breaking ties by choosing the leftmost bin. This improves the max load to ≤ log log n / (d log φ_d) + O(1), where φ_d is the generalized golden ratio (φ_2 = 1.618... < φ_3 < ... < 2).

1.3 Survey: Mitzenmacher, Richa, Sitaraman [3]

Idea: Let B_i denote the number of bins with ≥ i balls, and let H(b) denote the height of ball b in its bin. In order for ball b to reach height i+1, both bins that b hashed to must already have height ≥ i, so

P(H(b) ≥ i+1) ≤ (B_i / n)^2.

This gives E[B_{i+1}] ≤ E[number of balls with height ≥ i+1] ≤ n (B_i/n)^2, so (in expectation) B_{i+1}/n ≤ (B_i/n)^2.

As a base case, since there are only n balls, at most n/2 bins can hold ≥ 2 balls, giving B_2/n ≤ 1/2. Iterating the recursion above, B_3/n ≤ 1/2^2, B_4/n ≤ 1/(2^2)^2, ..., and in general

B_{2+j}/n ≤ 1 / 2^(2^j).

Once 2^(2^j) >> n, the probability that any bin holds 2+j balls is tiny; so for j >> log log n, no bin has 2+j balls with high probability.
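The gap between one choice and two choices is easy to see empirically. The following is a quick simulation sketch (the function `max_load` and all of its parameter names are ours, not from the lecture): with one choice the max load grows like log n / log log n, while with two choices it stays around log log n / log 2.

```python
import random

def max_load(n_balls, n_bins, d, seed=0):
    """Throw n_balls into n_bins; each ball goes into the least loaded of
    d independently and uniformly chosen bins (ties: first choice wins)."""
    rng = random.Random(seed)
    load = [0] * n_bins
    for _ in range(n_balls):
        choices = [rng.randrange(n_bins) for _ in range(d)]
        best = min(choices, key=lambda b: load[b])  # least loaded choice
        load[best] += 1
    return max(load)

n = 100_000
one = max_load(n, n, d=1)  # classic hashing: Θ(log n / log log n) whp
two = max_load(n, n, d=2)  # two choices: log log n / log 2 + O(1) whp
```

For n = 100,000 the one-choice max load typically lands around 7 or 8, while the two-choice max load is around 4, matching the doubly exponential decay of B_i described above.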
Formal argument: Let α_6 = n/(2e), and α_{i+1} = e · α_i^2 / n for i ≥ 6. Let E_i be the event that B_i ≤ α_i. We want to show P(∀ i ≥ 6: E_i) ≥ 1 − 1/poly(n). By the chain rule,

P(∀ i ≥ 6: E_i) = P(E_6) · P(E_7 | E_6) · P(E_8 | E_7, E_6) ⋯

so it suffices to bound each conditional failure probability. Writing ¬E_{i+1} for the event B_{i+1} > α_{i+1},

P(¬E_{i+1} | ∩_{j≤i} E_j) = P(¬E_{i+1} ∩ (∩_{j≤i} E_j)) / P(∩_{j≤i} E_j).

Claim 1: The denominator is close to 1, and the numerator is upper bounded by P(Bin(n, (α_i/n)^2) > α_{i+1}).

Consider the following experiment. We insert the balls one by one into the bins, numbering the balls 1, ..., n and assigning them in increasing order of ball number. The experiment maintains two variables, X and Y. X is a Boolean "flag" that starts as TRUE and becomes FALSE if any of E_6, ..., E_i ever fails to hold. Y is a counter: it counts the number of balls of height at least i+1. We want to show P((Y > α_{i+1}) AND X) ≤ P(Bin(n, (α_i/n)^2) > α_{i+1}). Let A be the event "(Y > α_{i+1}) AND X" and B be the event "Bin(n, (α_i/n)^2) > α_{i+1}"; we want to show P(A) ≤ P(B).

So, we go along inserting balls 1, ..., n, and we use what is called a "coupling argument": we define a Bin(n, (α_i/n)^2) random variable Z, starting at 0 and gradually incremented as we insert the balls. Z and (X, Y) are defined on the same probability space, so that A and B are dependent and easy to compare, while marginally each has the correct probability of occurring.

When we assign ball j, there are two cases. If X is already FALSE, we simply set Y to 0 (we already know A failed) and increment Z with probability p = α_i^2 / n^2. Otherwise, if X is TRUE, then there are exactly t ≤ α_i bins of load ≥ i, for some t. We label these bins 1, ..., t for this time step (the other bins are labeled t+1, ..., n in some order that doesn't matter). We then draw a uniform random variable U in [0,1]. If U ∈ [0, 1/n^2), we imagine the two bins ball j hashed to are (1, 1).
If U ∈ [1/n^2, 2/n^2), we imagine it hashed to bins 1 and 2, and so on: we slice [0,1] into n^2 buckets of size exactly 1/n^2 each, such that the first t^2 buckets correspond to the t^2 different ways ball j can hash to two heavy bins (i.e., bins with load at least i). The remaining n^2 − t^2 buckets of size 1/n^2 correspond to all the other hash possibilities for ball j. We then update X as needed if some event failed based on the U we drew (and set Y to 0 if X became FALSE). We also increment Z iff U ≤ p. ***This is where the coupling happens***: Z and (X, Y) are now defined on the same probability space, driven by the same U random variables over the iterations! AND THE KEY THING: event A can only happen if B also happens, since the coupling maintains the invariant Y ≤ Z at all times (note we hold Y at 0 once X is set to FALSE). Thus P(A) ≤ P(B).

Claim 2: P(Bin(n, α_i^2/n^2) > α_{i+1}) ≤ e^(−C α_i^2 / n). (For those not aware, Bin(n, p) refers to the distribution of the number of heads when flipping n p-biased coins.)

Proof: Let X_j be an indicator for coin C_j coming up heads, so E[X_j] = p = α_i^2/n^2. Then, since e·np = e·α_i^2/n = α_{i+1}, the Chernoff bound gives P(Σ_j X_j > e·np) ≤ e^(−C·np) = e^(−C α_i^2 / n).

2 Amortization

2.1 Definition

If a data structure supports operations A_1, A_2, ..., A_r, we say the amortized cost of A_j is t_j if, for every sequence of operations containing n_j operations of type A_j for each j, the total runtime is ≤ Σ_{j=1}^r n_j t_j.

2.2 Potential functions

Let Φ map the states of the data structure to R_{≥0} (the non-negative real numbers), with Φ(empty data structure) defined to be 0. Then a valid amortized cost of an operation is

t_amortized = t_actual + ΔΦ = t_actual + Φ(after op) − Φ(before op).

The total amortized cost of a sequence of R operations is then

Σ_{j=1}^R [t_actual(j) + Φ(time j) − Φ(time j−1)] = Σ_j (actual time of operation j) + Φ_final − Φ_0.

Since Φ_0 = 0 and Φ_final ≥ 0, the total amortized cost above is ≥ the total actual time.
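As a concrete illustration of the potential-function method (our example, not from the lecture: the classic table-doubling analysis), consider appends to a dynamic array that doubles its capacity when full, with Φ = max(0, 2·size − capacity). Each append then has amortized cost ≤ 3, so the total actual work for n appends is at most 3n:

```python
class DynArray:
    """Toy dynamic array with capacity doubling (tracks sizes only).

    Potential: Φ = max(0, 2*size - capacity). Φ(empty) = 0, and Φ has
    built up to `size` right before a doubling, paying for the copy.
    """
    def __init__(self):
        self.size, self.capacity = 0, 1
        self.actual_work = 0  # total actual cost of all appends so far

    def phi(self):
        return max(0, 2 * self.size - self.capacity)

    def append(self):
        if self.size == self.capacity:     # array is full:
            self.actual_work += self.size  # pay to copy every element
            self.capacity *= 2
        self.actual_work += 1              # pay to write the new element
        self.size += 1

arr = DynArray()
for _ in range(1000):
    arr.append()
# total actual work ≤ total amortized work = 3 * (number of appends)
```

On a doubling step with size s, the actual cost is s + 1 while ΔΦ = 2 − s, so the amortized cost is 3; on a non-doubling step the actual cost is 1 and ΔΦ = 2, again 3. Since Φ starts at 0 and stays non-negative, total actual time ≤ total amortized time, exactly as in the derivation above.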
2.3 Y-fast tries

Recall from the first lecture that a Y-fast trie consists of an x-fast trie over Θ(n/w) groups, each pointing to a balanced binary search tree on [w/2, 2w) elements, and we said this supports queries in Θ(log w) time and inserts in Θ(log w) amortized time. From now on, t refers to actual time and t̃ refers to amortized time.

Let us now analyze Y-fast trie insertion using a potential function:

Φ(state of the structure) = Σ_{groups j} ((size of group j) − w).

For the query operation, t̃_query = t_query + ΔΦ = t_query + 0 = Θ(log w).

For insertion, t̃_insert = t_insert + ΔΦ. Consider the two cases: either we must split a group of size 2w into two groups of size w, or we don't. In the first case t_insert = Θ(log w) + w and ΔΦ = −w, giving t̃_insert = Θ(log w). In the second case t_insert = Θ(log w) and ΔΦ = 1, again giving t̃_insert = Θ(log w).

3 Heaps

3.1 Definition

Heaps maintain keys with comparable values, subject to the following operations:

1. deleteMin(): report the smallest item and delete it from the heap
2. decKey(*p, v'): reduce the value of item p to v'
3. insert(k, v): insert key k with value v into the heap

Binary heaps: Williams '64 [4] published the binary heap, which supports t_I (insert), t_D (deleteMin), and t_K (decKey) all in O(log n).

Binomial heaps: Vuillemin '78 [5] published the binomial heap, which supports t̃_I = O(1) and t_D = t_K = O(log n).

Fibonacci heaps: Fredman and Tarjan '87 [6] published the Fibonacci heap, with t̃_I = t̃_K = O(1) and t̃_D = O(log n).

Strict Fibonacci heaps: Brodal, Lagogiannis, and Tarjan '12 [7] published the strict Fibonacci heap, with worst-case t_I = t_K = O(1) and t_D = O(log n).

Finally, Thorup '07 [8] showed that, in the word RAM model, a sorting algorithm running in time n·S(n) yields a heap with O(1) findMin (note this is not the same as deleteMin) and O(S(n)) insert/delete time.
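The binary heap of Williams '64 is simple enough to sketch in full. Below is a minimal version (all names are ours) supporting the three operations above in O(log n) each; the key-to-index map `pos` is what lets decKey locate its argument inside the array.

```python
class BinaryHeap:
    """Minimal binary min-heap: insert, deleteMin, and decKey in O(log n).

    Items are (value, key) pairs stored in an implicit array; `pos` maps
    each key to its current index so decKey can find it in O(1).
    """
    def __init__(self):
        self.a = []    # heap array of (value, key) pairs
        self.pos = {}  # key -> index in self.a

    def _swap(self, i, j):
        self.a[i], self.a[j] = self.a[j], self.a[i]
        self.pos[self.a[i][1]] = i
        self.pos[self.a[j][1]] = j

    def _sift_up(self, i):
        while i > 0 and self.a[(i - 1) // 2][0] > self.a[i][0]:
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2

    def _sift_down(self, i):
        n = len(self.a)
        while True:
            c = 2 * i + 1                       # left child
            if c >= n:
                return
            if c + 1 < n and self.a[c + 1][0] < self.a[c][0]:
                c += 1                          # right child is smaller
            if self.a[i][0] <= self.a[c][0]:
                return
            self._swap(i, c)
            i = c

    def insert(self, key, value):
        self.a.append((value, key))
        self.pos[key] = len(self.a) - 1
        self._sift_up(len(self.a) - 1)

    def delete_min(self):
        value, key = self.a[0]
        self._swap(0, len(self.a) - 1)
        self.a.pop()
        del self.pos[key]
        if self.a:
            self._sift_down(0)
        return key, value

    def dec_key(self, key, new_value):
        i = self.pos[key]
        assert new_value <= self.a[i][0], "decKey may only decrease"
        self.a[i] = (new_value, key)
        self._sift_up(i)                        # value shrank: only move up
```

For example, inserting ('a', 5), ('b', 3), ('c', 9) and then calling dec_key('c', 1) makes delete_min() return ('c', 1). The binomial and Fibonacci variants cited above improve on this by making insert (and, for Fibonacci, decKey) O(1) amortized.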