Chapter 6

Sampling and other Stuff

By Sariel Har-Peled, September 30, 2008*

6.1 Two-Point Sampling

6.1.1 About Modulo Rings and Pairwise Independence

Let p be a prime number, and let ℤ_p = {0, 1, ..., p − 1} denote the ring of integers modulo p. Two integers x and y are equivalent modulo p if x ≡ y (mod p); namely, the remainder of dividing x by p equals the remainder of dividing y by p.

Lemma 6.1.1 Given y, i ∈ ℤ_p, and choosing a, b randomly and uniformly from ℤ_p, the probability that y ≡ ai + b (mod p) is 1/p.

Proof: Imagine that we first choose a. The required probability is then the probability of choosing b such that y − ai ≡ b (mod p), and this probability is 1/p, as b is chosen uniformly from ℤ_p.

Lemma 6.1.2 Let p be a prime, and a ∈ {1,..., p − 1}. Then,

  { ai (mod p) | i = 0, ..., p − 1 } = ℤ_p.

Putting it differently, for any non-zero a ∈ ℤ_p there is a unique inverse b ∈ ℤ_p such that ab ≡ 1 (mod p).

Proof: Assume, for the sake of contradiction, that the claim is false. Then, by the pigeonhole principle, there must exist i > j such that ai ≡ aj (mod p). Namely, there are k, k′, u such that ai = u + kp and aj = u + k′p. Since i > j, it must be that k > k′. Subtracting the two equalities, we get a(i − j) = (k − k′)p > 0. Now, i − j must be larger than one, since if i − j = 1 then a = (k − k′)p ≥ p, which is impossible. Similarly,

*This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

i − j < p. Also, p can not divide i − j, since p is a prime and 0 < i − j < p. Thus, since a(i − j) = (k − k′)p and gcd(i − j, p) = 1, it must be that i − j divides k − k′. So, let us set α = (k − k′)/(i − j) ≥ 1. This implies that a = αp ≥ p, which is impossible. Thus, our assumption is false.
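Lemma 6.1.2 is easy to verify computationally. The following short Python sketch (the helper names are ours, and the prime is an arbitrary choice) checks that i ↦ ai mod p is a permutation of ℤ_p and finds the inverse of a by brute force; by Fermat's little theorem the inverse can also be computed as a^(p−2) mod p.

def is_permutation_mod_p(a, p):
    """Check that {a*i mod p : i = 0,...,p-1} covers all of Z_p (Lemma 6.1.2)."""
    return sorted((a * i) % p for i in range(p)) == list(range(p))

def inverse_mod_p(a, p):
    """The unique b with a*b = 1 (mod p), found by brute force."""
    for b in range(1, p):
        if (a * b) % p == 1:
            return b
    raise ValueError("no inverse; is p prime and a nonzero?")

p = 13
for a in range(1, p):
    assert is_permutation_mod_p(a, p)
    assert inverse_mod_p(a, p) == pow(a, p - 2, p)  # Fermat's little theorem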

Lemma 6.1.3 Given y, z, x, w ∈ ℤ_p, such that x ≠ w, and choosing a and b randomly and uniformly from ℤ_p, the probability that y ≡ ax + b (mod p) and z ≡ aw + b (mod p) is 1/p².

Proof: This is equivalent to claiming that the system of equations y ≡ ax + b (mod p) and z ≡ aw + b (mod p) has a unique solution in a and b. To see why this is true, subtract one equation from the other. We get y − z ≡ a(x − w) (mod p). Since x − w ≢ 0 (mod p), it has an inverse modulo p by Lemma 6.1.2, and thus there is a unique value of a such that the equation holds. This, in turn, implies a specific value for b. The probability that a and b get those two specific values is 1/p².
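Concretely, the unique solution is a ≡ (y − z)(x − w)^(−1) (mod p) and b ≡ y − ax (mod p). A small Python sketch (the helper name is ours) that recovers (a, b) this way:

def solve_for_a_b(x, y, w, z, p):
    """Given y = a*x + b and z = a*w + b (mod p) with x != w, recover (a, b).

    The inverse of (x - w) exists since p is prime and x - w != 0 (Lemma 6.1.2);
    here it is computed via Fermat's little theorem.
    """
    inv = pow((x - w) % p, p - 2, p)
    a = ((y - z) * inv) % p
    b = (y - a * x) % p
    return a, b

p, a, b = 101, 17, 58
x, w = 3, 97
y, z = (a * x + b) % p, (a * w + b) % p
assert solve_for_a_b(x, y, w, z, p) == (a, b)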

Lemma 6.1.4 Let i and j be two distinct elements of ℤ_p, and choose a and b randomly, uniformly and independently from ℤ_p. Then the two random variables Y_i = ai + b (mod p) and Y_j = aj + b (mod p) are uniformly distributed on ℤ_p, and are pairwise independent.

Proof: The claim about the uniform distribution follows from Lemma 6.1.1, as Pr[Y_i = α] = 1/p, for any α ∈ ℤ_p. As for pairwise independence, observe that

    Pr[Y_i = α | Y_j = β] = Pr[Y_i = α ∩ Y_j = β] / Pr[Y_j = β] = (1/p²) / (1/p) = 1/p = Pr[Y_i = α],

by Lemma 6.1.1 and Lemma 6.1.3. Thus, Y_i and Y_j are pairwise independent.

Remark 6.1.5 It is important to understand what independence between random variables means: it means that having information about the value of X gives you no information about Y. But here we only have pairwise independence. Indeed, consider the variables Y_1, Y_2, Y_3, Y_4 defined above. Every pair of them is independent. But if you give me the values of Y_1 and Y_2, I know the values of Y_3 and Y_4 immediately. Indeed, knowing Y_1 and Y_2 is enough to figure out the values of a and b, and once we know a and b, we can immediately compute all the Y_i's. Thus, the notion of independence can be extended to k-wise independence of n random variables, where every subset of k of the variables is mutually independent, even though knowing the values of k of the variables may determine all the others. More on that later in the course.
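To make the remark concrete, here is a small Python sketch (the prime p = 7 and the code are ours) that enumerates all p² choices of (a, b), verifies by counting that Y_1 and Y_2 are pairwise independent, and then recovers a and b, and hence every Y_i, from the values of Y_1 and Y_2 alone.

from itertools import product

p = 7  # an arbitrarily chosen small prime

# For every equally likely choice of (a, b), record the vector (Y_0, ..., Y_{p-1}).
samples = [tuple((a * i + b) % p for i in range(p))
           for a, b in product(range(p), repeat=2)]

# Pairwise independence: Pr[Y_1 = alpha and Y_2 = beta] = 1/p^2 for all alpha, beta,
# i.e., exactly one of the p^2 choices of (a, b) produces each pair (alpha, beta).
for alpha, beta in product(range(p), repeat=2):
    assert sum(1 for y in samples if y[1] == alpha and y[2] == beta) == 1

# ...but the family is far from fully independent: since Y_1 = a + b and
# Y_2 = 2a + b (mod p), we get a = Y_2 - Y_1 and b = 2*Y_1 - Y_2 (mod p),
# and from (a, b) every other Y_i follows.
for a, b in product(range(p), repeat=2):
    y1, y2 = (a + b) % p, (2 * a + b) % p
    assert (y2 - y1) % p == a and (2 * y1 - y2) % p == b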

Lemma 6.1.6 Let X_1, X_2, ..., X_n be pairwise independent random variables, and let X = ∑_{i=1}^n X_i. Then V[X] = ∑_{i=1}^n V[X_i].

Proof: Observe that

    V[X] = E[(X − E[X])²] = E[X²] − (E[X])².

Let X and Y be pairwise independent variables. Observe that E[XY] = E[X] E[Y], as can be easily verified. Thus,

    V[X + Y] = E[(X + Y − E[X] − E[Y])²]
             = E[(X + Y)² − 2(X + Y)(E[X] + E[Y]) + (E[X] + E[Y])²]
             = E[(X + Y)²] − (E[X] + E[Y])²
             = E[X² + 2XY + Y²] − (E[X])² − 2 E[X] E[Y] − (E[Y])²
             = (E[X²] − (E[X])²) + (E[Y²] − (E[Y])²) + 2 E[XY] − 2 E[X] E[Y]
             = V[X] + V[Y] + 2 E[X] E[Y] − 2 E[X] E[Y] = V[X] + V[Y].

Repeating the above argument for several variables, instead of just two, implies the lemma.
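One can check Lemma 6.1.6 numerically on the pairwise independent family of Lemma 6.1.4. The following Python sketch (exact enumeration over all p² equally likely choices of a and b, with an arbitrarily chosen small prime) compares V[∑_i Y_i] with ∑_i V[Y_i].

from itertools import product
from statistics import pvariance

p = 11  # an arbitrarily chosen small prime

# Enumerate all p^2 equally likely choices of (a, b); Y_i = a*i + b mod p.
sums = []                           # realizations of X = sum_i Y_i
values = [[] for _ in range(p)]     # realizations of each individual Y_i
for a, b in product(range(p), repeat=2):
    ys = [(a * i + b) % p for i in range(p)]
    sums.append(sum(ys))
    for i, y in enumerate(ys):
        values[i].append(y)

# V[X] equals the sum of the V[Y_i], as Lemma 6.1.6 predicts
# (the two printed numbers agree, up to floating-point rounding).
print(pvariance(sums), sum(pvariance(v) for v in values))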

6.1.2 Using Less Randomization for a Randomized Algorithm

We can consider a randomized algorithm to be a deterministic algorithm A(x, r) that receives, together with the input x, a random string of bits r, from which it reads the random bits it needs. Let us redefine RP:

Definition 6.1.7 The class RP (for Randomized Polynomial time) consists of all languages L that have a deterministic algorithm A(x, r) with worst case polynomial running time such that for any input x ∈ Σ∗,

• x ∈ L ⇒ A(x, r) = 1 for at least half the possible values of r.

• x ∉ L ⇒ A(x, r) = 0 for all values of r.

Let us assume that we now want to minimize the number of random bits we use in the execution of the algorithm (why?). If we run the algorithm t times independently, the probability that all t runs fail is at most 2^{−t}, but this uses t log n random bits (assuming our randomized algorithm needs only log n bits in each execution). Alternatively, we could choose two random numbers a, b from ℤ_n and run A(x, a) and A(x, b); this brings the failure probability down only to 1/4, while requiring 2 log n bits. Can we do better?

Let us define r_i = ai + b mod n, where a, b are random values as above (note that we assume here that n is prime), for i = 1, ..., t. Thus Y = ∑_{i=1}^t A(x, r_i) is a sum of random variables which are pairwise independent, as the r_i's are pairwise independent. Assume that x ∈ L. Then we have E[Y] ≥ t/2 and σ_Y² = V[Y] = ∑_{i=1}^t V[A(x, r_i)] ≤ t/4, so σ_Y ≤ √t / 2. The probability that all those executions fail corresponds to the event that Y = 0, and

    Pr[Y = 0] ≤ Pr[ |Y − E[Y]| ≥ t/2 ] ≤ V[Y] / (t/2)² ≤ (t/4) / (t²/4) = 1/t,

by the Chebyshev inequality. Thus, we were able to “extract” from our random bits much more than one would naturally suspect is possible.
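The following Python sketch illustrates the scheme (the names and the prime P are ours, and the "algorithm" is a toy stand-in, since the text does not fix a concrete A): it draws only a and b, derives the pseudo-random values r_i = a·i + b mod P, and accepts if any run accepts.

import random

P = 10007  # a prime; the "random strings" are elements of Z_P

def amplify_two_point(A, x, t):
    """Run a one-sided-error algorithm A(x, r) on t pairwise independent
    values r_i = a*i + b mod P, using only two truly random draws (a and b).
    If x is in the language, at least half the values of r make A accept,
    so by the Chebyshev argument above all t runs reject with prob. <= 1/t."""
    a = random.randrange(P)
    b = random.randrange(P)
    return any(A(x, (a * i + b) % P) for i in range(1, t + 1))

# Toy stand-in for A: "x in L" iff x is odd, and A answers 1 exactly when
# the random value r falls in the lower half of Z_P.
def toy_A(x, r):
    return x % 2 == 1 and r <= P // 2

print(amplify_two_point(toy_A, 17, t=100))  # True with probability >= 1 - 1/100
print(amplify_two_point(toy_A, 16, t=100))  # always False (one-sided error)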

6.2 Chernoff Inequality - A Special Case

Theorem 6.2.1 Let X_1, ..., X_n be n independent random variables, such that Pr[X_i = 1] = Pr[X_i = −1] = 1/2, for i = 1, ..., n. Let Y = ∑_{i=1}^n X_i. Then, for any ∆ > 0, we have

    Pr[Y ≥ ∆] ≤ exp(−∆²/(2n)).

Proof: Clearly, for an arbitrary t > 0, to be specified shortly, we have

    Pr[Y ≥ ∆] = Pr[exp(tY) ≥ exp(t∆)] ≤ E[exp(tY)] / exp(t∆),

where the first part follows by the fact that exp(·) preserves ordering, and the second part follows by the Markov inequality. Observe that

    E[exp(tX_i)] = (1/2) e^t + (1/2) e^{−t} = (e^t + e^{−t}) / 2
                 = (1/2)(1 + t/1! + t²/2! + t³/3! + ···) + (1/2)(1 − t/1! + t²/2! − t³/3! + ···)
                 = 1 + t²/2! + t⁴/4! + ··· + t^{2k}/(2k)! + ···,

by the Taylor expansion of exp(·). Note that (2k)! ≥ (k!) 2^k, and thus

    E[exp(tX_i)] = ∑_{i=0}^{∞} t^{2i}/(2i)! ≤ ∑_{i=0}^{∞} t^{2i}/((i!) 2^i) = ∑_{i=0}^{∞} (1/i!)(t²/2)^i = exp(t²/2),

again by the Taylor expansion of exp(·). Next, by the independence of the X_i's, we have

    E[exp(tY)] = E[exp(∑_i tX_i)] = E[∏_i exp(tX_i)] = ∏_{i=1}^{n} E[exp(tX_i)] ≤ ∏_{i=1}^{n} exp(t²/2) = exp(nt²/2).

We thus have

    Pr[Y ≥ ∆] ≤ exp(nt²/2) / exp(t∆) = exp(nt²/2 − t∆).

Next, we minimize this quantity over t by setting t = ∆/n. We conclude that

    Pr[Y ≥ ∆] ≤ exp( (n/2)(∆/n)² − (∆/n)∆ ) = exp(−∆²/(2n)).

By the symmetry of Y, we get the following:

Corollary 6.2.2 Let X_1, ..., X_n be n independent random variables, such that Pr[X_i = 1] = Pr[X_i = −1] = 1/2, for i = 1, ..., n. Let Y = ∑_{i=1}^n X_i. Then, for any ∆ > 0, we have

    Pr[|Y| ≥ ∆] ≤ 2 exp(−∆²/(2n)).

Corollary 6.2.3 Let X_1, ..., X_n be n independent coin flips, such that Pr[X_i = 0] = Pr[X_i = 1] = 1/2, for i = 1, ..., n. Let Y = ∑_{i=1}^n X_i. Then, for any ∆ > 0, we have

    Pr[ |Y − n/2| ≥ ∆ ] ≤ 2 exp(−2∆²/n).

Remark 6.2.4 Before going any further, it might be instructive to understand what these inequalities imply. Consider the case where each X_i is either zero or one with probability half. In this case µ = E[Y] = n/2. Set ∆ = t√n (note that √µ is, up to a constant factor, the standard deviation of Y when p_i = 1/2). We have, by Corollary 6.2.3, that

    Pr[ |Y − n/2| ≥ ∆ ] ≤ 2 exp(−2∆²/n) = 2 exp(−2(t√n)²/n) = 2 exp(−2t²).

Thus, the Chernoff inequality implies exponential decay (i.e., ≤ 2 exp(−2t²)) once we are t standard deviations away from the expectation, instead of just the polynomial decay (i.e., ≤ 1/t²) implied by Chebyshev's inequality.
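A quick numerical experiment makes the gap concrete. The following Python sketch (the parameters n = 400 and t = 2 are arbitrary choices of ours) estimates Pr[|Y − n/2| ≥ t√n] by simulation and prints it next to the Chebyshev bound 1/(4t²) and the Chernoff bound 2 exp(−2t²) of Corollary 6.2.3.

import random
from math import exp

def tail_probability(n, t, trials=20_000):
    """Estimate Pr[|Y - n/2| >= t*sqrt(n)] for Y a sum of n fair 0/1 coin flips."""
    delta = t * n ** 0.5
    hits = 0
    for _ in range(trials):
        y = sum(random.getrandbits(1) for _ in range(n))
        if abs(y - n / 2) >= delta:
            hits += 1
    return hits / trials

n, t = 400, 2.0
print("empirical :", tail_probability(n, t))
print("Chebyshev :", 1 / (4 * t * t))       # V[Y]/Delta^2 = (n/4)/(t^2 n)
print("Chernoff  :", 2 * exp(-2 * t * t))   # Corollary 6.2.3 with Delta = t*sqrt(n)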

6.2.1 Application – QuickSort is Quick

We revisit QuickSort. We remind the reader that the running time of QuickSort is proportional to the number of comparisons performed by the algorithm. Next, consider an arbitrary element u being sorted. Consider the ith level recursive subproblem that contains u, and let S_i be the set of elements in this subproblem. We say that u is successful in the ith level if |S_{i+1}| ≤ |S_i| / 2. Namely, if u is successful, then the next level in the recursion involving u works on a considerably smaller subproblem.

We first observe that if QuickSort is applied to an array with n elements, then u can be successful at most T = lg n times before the subproblem it participates in is of size one, and the recursion stops. Thus, let X_i be the indicator variable which is 1 if u is successful in the ith level, and zero otherwise. Note that the X_i's are independent, and Pr[X_i = 1] = 1/2. If u participates in v levels, then we have the random variables X_1, X_2, ..., X_v. To make things simpler, we extend this series by adding independent random variables with Pr[X_i = 1] = 1/2, for i > v. Thus, we have an infinite sequence of independent 0/1 random variables, each equal to 1 with probability 1/2. The question is how many elements of the sequence we need to read until we collect T ones.

Lemma 6.2.5 Let X_1, X_2, ... be an infinite sequence of independent random 0/1 variables, each equal to 1 with probability 1/2, and let M be an arbitrary parameter. Then the probability that we need to read more than 2M + 4t√M variables of this sequence until we collect M ones is at most 2 exp(−t²), for t ≤ √M. If t ≥ √M, then this probability is at most 2 exp(−t√M).

Proof: Consider the random variable Y = ∑_{i=1}^{L} X_i, where L = 2M + 4t√M. Its expectation is L/2, and using the Chernoff inequality (Corollary 6.2.3), we get

    α = Pr[Y ≤ M] ≤ Pr[ |Y − L/2| ≥ L/2 − M ] ≤ 2 exp( −(2/L)(L/2 − M)² )
      = 2 exp( −(2/L)(M + 2t√M − M)² ) = 2 exp( −(2/L)(2t√M)² ) = 2 exp( −8t²M/L ).

For t ≤ √M we have L = 2M + 4t√M ≤ 8M, and as such, in this case,

    Pr[Y ≤ M] ≤ 2 exp( −8t²M/(8M) ) = 2 exp(−t²).

If t ≥ √M, then

    α = 2 exp( −8t²M/(2M + 4t√M) ) ≤ 2 exp( −8t²M/(6t√M) ) ≤ 2 exp( −t√M ).

Going back to the QuickSort problem, we have that if we sort n elements, then by Lemma 6.2.5 (applied with M = lg n and t = c√(lg n), for c ≥ 1), the probability that u participates in more than

    L = (2 + 4c) lg n = 2 lg n + 4 · c√(lg n) · √(lg n)

recursive calls is smaller than 2 exp(−c√(lg n) · √(lg n)) = 2 exp(−c lg n) ≤ 2/n^c, since e^{−x} ≤ 2^{−x} for x ≥ 0. There are n elements being sorted; applying this bound with c + 1 instead of c and using the union bound, the probability that any element participates in more than (6 + 4c) lg n recursive calls is smaller than n · 2/n^{c+1} = 2/n^c. Since in each level of the recursion an element is compared only to the pivot of the subproblem containing it, the total number of comparisons is at most the sum, over all elements, of the number of levels in which each element participates. We conclude:

Lemma 6.2.6 For any c ≥ 1, the probability that QuickSort performs more than (6 + 4c) n lg n comparisons is smaller than 2/n^c.
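The bound is easy to probe experimentally. The following Python sketch (a simulation we add for illustration; the values of n and the number of runs are arbitrary) sorts random inputs with a randomized QuickSort and records the maximum recursion depth reached, which the analysis above predicts to be O(lg n) with high probability.

import math
import random

def quicksort_max_depth(items, depth=0):
    """Randomized QuickSort; returns the maximum recursion depth reached
    (i.e., the largest number of levels any element participates in)."""
    if len(items) <= 1:
        return depth
    pivot = random.choice(items)
    left = [x for x in items if x < pivot]
    right = [x for x in items if x > pivot]
    return max(quicksort_max_depth(left, depth + 1),
               quicksort_max_depth(right, depth + 1))

n = 10_000
depths = [quicksort_max_depth(list(range(n))) for _ in range(20)]
print("lg n =", round(math.log2(n), 1), "  max depth over 20 runs =", max(depths))
# Typically the maximum depth stays within a small constant factor of lg n,
# in line with the high-probability bound above.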
