Chapter 6

Sampling and other Stuff

By Sariel Har-Peled, September 30, 2008*

6.1 Two-Point Sampling

6.1.1 About Modulo Rings and Pairwise Independence

Let p be a prime number, and let ℤ_p = {0, 1, ..., p − 1} denote the ring of integers modulo p. Two integers x and y are equivalent modulo p if x ≡ y (mod p); namely, the remainder of dividing x by p equals the remainder of dividing y by p.

Lemma 6.1.1 Given y, i ∈ ℤ_p, and choosing a, b randomly and uniformly from ℤ_p, the probability that y ≡ ai + b (mod p) is 1/p.

Proof: Imagine that we first choose a. The required probability is then the probability of choosing b such that y − ai ≡ b (mod p), and this probability is 1/p, as b is chosen uniformly from ℤ_p.

Lemma 6.1.2 Let p be a prime, and a ∈ {1,..., p − 1}. Then,

  { ai (mod p) | i = 0, ..., p − 1 } = ℤ_p.

Putting it differently, for any non-zero a ∈ ℤ_p there is a unique inverse b ∈ ℤ_p such that ab ≡ 1 (mod p).

Proof: Assume, for the sake of contradiction, that the claim is false. Then, by the pigeonhole principle, there must exist i > j such that ai ≡ aj (mod p). Namely, there are k, k′, u such that ai = u + kp and aj = u + k′p. Since i > j, it must be that k > k′. Subtracting the two equalities, we get a(i − j) = (k − k′)p > 0. Now, i − j must be larger than one, since if i − j = 1 then a = (k − k′)p ≥ p, which is impossible. Similarly,

*This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

i − j < p. Also, p can not divide i − j, since p is a prime and 0 < i − j < p. Thus, since a(i − j) = (k − k′)p and gcd(i − j, p) = 1, it must be that i − j divides k − k′. So, let us set α = (k − k′)/(i − j) ≥ 1. This implies that a = αp ≥ p, which is impossible. Thus, our assumption is false.
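Lemma 6.1.2 is easy to verify computationally. The following short Python sketch (the helper names are ours, and the prime is an arbitrary choice) checks that i ↦ ai mod p is a permutation of ℤ_p and finds the inverse of a by brute force; by Fermat's little theorem the inverse can also be computed as a^(p−2) mod p.

def is_permutation_mod_p(a, p):
    """Check that {a*i mod p : i = 0,...,p-1} covers all of Z_p (Lemma 6.1.2)."""
    return sorted((a * i) % p for i in range(p)) == list(range(p))

def inverse_mod_p(a, p):
    """The unique b with a*b = 1 (mod p), found by brute force."""
    for b in range(1, p):
        if (a * b) % p == 1:
            return b
    raise ValueError("no inverse; is p prime and a nonzero?")

p = 13
for a in range(1, p):
    assert is_permutation_mod_p(a, p)
    assert inverse_mod_p(a, p) == pow(a, p - 2, p)  # Fermat's little theorem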

Lemma 6.1.3 Given y, z, x, w ∈ ℤ_p, such that x ≠ w, and choosing a and b randomly and uniformly from ℤ_p, the probability that y ≡ ax + b (mod p) and z ≡ aw + b (mod p) is 1/p².

Proof: This is equivalent to claiming that the system of equations y ≡ ax + b (mod p) and z ≡ aw + b (mod p) has a unique solution in a and b. To see why this is true, subtract one equation from the other. We get y − z ≡ a(x − w) (mod p). Since x − w ≢ 0 (mod p), it has an inverse modulo p by Lemma 6.1.2, and thus there is a unique value of a such that the equation holds. This, in turn, implies a specific value for b. The probability that a and b get those two specific values is 1/p².
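Concretely, the unique solution is a ≡ (y − z)(x − w)^(−1) (mod p) and b ≡ y − ax (mod p). A small Python sketch (the helper name is ours) that recovers (a, b) this way:

def solve_for_a_b(x, y, w, z, p):
    """Given y = a*x + b and z = a*w + b (mod p) with x != w, recover (a, b).

    The inverse of (x - w) exists since p is prime and x - w != 0 (Lemma 6.1.2);
    here it is computed via Fermat's little theorem.
    """
    inv = pow((x - w) % p, p - 2, p)
    a = ((y - z) * inv) % p
    b = (y - a * x) % p
    return a, b

p, a, b = 101, 17, 58
x, w = 3, 97
y, z = (a * x + b) % p, (a * w + b) % p
assert solve_for_a_b(x, y, w, z, p) == (a, b)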

Lemma 6.1.4 Let i and j be two distinct elements of ℤ_p, and choose a and b randomly, uniformly and independently from ℤ_p. Then the two random variables Y_i = ai + b (mod p) and Y_j = aj + b (mod p) are uniformly distributed on ℤ_p, and are pairwise independent.

Proof: The claim about the uniform distribution follows from Lemma 6.1.1, as Pr[Y_i = α] = 1/p, for any α ∈ ℤ_p. As for pairwise independence, observe that

    Pr[Y_i = α | Y_j = β] = Pr[Y_i = α ∩ Y_j = β] / Pr[Y_j = β] = (1/p²) / (1/p) = 1/p = Pr[Y_i = α],

by Lemma 6.1.1 and Lemma 6.1.3. Thus, Y_i and Y_j are pairwise independent.

Remark 6.1.5 It is important to understand what independence between random variables means: it means that having information about the value of X gives you no information about Y. But here we only have pairwise independence. Indeed, consider the variables Y_1, Y_2, Y_3, Y_4 defined above. Every pair of them is independent. But if you give me the values of Y_1 and Y_2, I know the values of Y_3 and Y_4 immediately. Indeed, knowing Y_1 and Y_2 is enough to figure out the values of a and b, and once we know a and b, we can immediately compute all the Y_i's. Thus, the notion of independence can be extended to k-wise independence of n random variables, where every subset of k of the variables is mutually independent, even though knowing the values of k of the variables may determine all the others. More on that later in the course.
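To make the remark concrete, here is a small Python sketch (the prime p = 7 and the code are ours) that enumerates all p² choices of (a, b), verifies by counting that Y_1 and Y_2 are pairwise independent, and then recovers a and b, and hence every Y_i, from the values of Y_1 and Y_2 alone.

from itertools import product

p = 7  # an arbitrarily chosen small prime

# For every equally likely choice of (a, b), record the vector (Y_0, ..., Y_{p-1}).
samples = [tuple((a * i + b) % p for i in range(p))
           for a, b in product(range(p), repeat=2)]

# Pairwise independence: Pr[Y_1 = alpha and Y_2 = beta] = 1/p^2 for all alpha, beta,
# i.e., exactly one of the p^2 choices of (a, b) produces each pair (alpha, beta).
for alpha, beta in product(range(p), repeat=2):
    assert sum(1 for y in samples if y[1] == alpha and y[2] == beta) == 1

# ...but the family is far from fully independent: since Y_1 = a + b and
# Y_2 = 2a + b (mod p), we get a = Y_2 - Y_1 and b = 2*Y_1 - Y_2 (mod p),
# and from (a, b) every other Y_i follows.
for a, b in product(range(p), repeat=2):
    y1, y2 = (a + b) % p, (2 * a + b) % p
    assert (y2 - y1) % p == a and (2 * y1 - y2) % p == b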

Lemma 6.1.6 Let X_1, X_2, ..., X_n be pairwise independent random variables, and let X = ∑_{i=1}^n X_i. Then V[X] = ∑_{i=1}^n V[X_i].

Proof: Observe that

    V[X] = E[(X − E[X])²] = E[X²] − (E[X])².

Let X and Y be pairwise independent variables. Observe that E[XY] = E[X] E[Y], as can be easily verified. Thus,

    V[X + Y] = E[(X + Y − E[X] − E[Y])²]
             = E[(X + Y)² − 2(X + Y)(E[X] + E[Y]) + (E[X] + E[Y])²]
             = E[(X + Y)²] − (E[X] + E[Y])²
             = E[X² + 2XY + Y²] − (E[X])² − 2 E[X] E[Y] − (E[Y])²
             = (E[X²] − (E[X])²) + (E[Y²] − (E[Y])²) + 2 E[XY] − 2 E[X] E[Y]
             = V[X] + V[Y] + 2 E[X] E[Y] − 2 E[X] E[Y] = V[X] + V[Y].

Repeating the above argument for several variables, instead of just two, implies the lemma.
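One can check Lemma 6.1.6 numerically on the pairwise independent family of Lemma 6.1.4. The following Python sketch (exact enumeration over all p² equally likely choices of a and b, with an arbitrarily chosen small prime) compares V[∑_i Y_i] with ∑_i V[Y_i].

from itertools import product
from statistics import pvariance

p = 11  # an arbitrarily chosen small prime

# Enumerate all p^2 equally likely choices of (a, b); Y_i = a*i + b mod p.
sums = []                           # realizations of X = sum_i Y_i
values = [[] for _ in range(p)]     # realizations of each individual Y_i
for a, b in product(range(p), repeat=2):
    ys = [(a * i + b) % p for i in range(p)]
    sums.append(sum(ys))
    for i, y in enumerate(ys):
        values[i].append(y)

# V[X] equals the sum of the V[Y_i], as Lemma 6.1.6 predicts
# (the two printed numbers agree, up to floating-point rounding).
print(pvariance(sums), sum(pvariance(v) for v in values))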

6.1.2 Using Less Randomization for a Randomized Algorithm

We can consider a randomized algorithm to be a deterministic algorithm A(x, r) that receives, together with the input x, a random string of bits r, from which it reads the random bits it needs. Let us redefine RP:

Definition 6.1.7 The class RP (for Randomized Polynomial time) consists of all languages L that have a deterministic algorithm A(x, r) with worst case polynomial running time such that for any input x ∈ Σ∗,

• x ∈ L ⇒ A(x, r) = 1 for at least half the possible values of r.

• x ∉ L ⇒ A(x, r) = 0 for all values of r.

Let us assume that we now want to minimize the number of random bits we use in the execution of the algorithm (why?). If we run the algorithm t times independently, the probability that all t runs fail is at most 2^{−t}, but this uses t log n random bits (assuming our randomized algorithm needs only log n bits in each execution). Alternatively, we could choose two random numbers a, b from ℤ_n and run A(x, a) and A(x, b); this brings the failure probability down only to 1/4, while requiring 2 log n bits. Can we do better?

Let us define r_i = ai + b mod n, where a, b are random values as above (note that we assume here that n is prime), for i = 1, ..., t. Thus Y = ∑_{i=1}^t A(x, r_i) is a sum of random variables which are pairwise independent, as the r_i's are pairwise independent. Assume that x ∈ L. Then we have E[Y] ≥ t/2 and σ_Y² = V[Y] = ∑_{i=1}^t V[A(x, r_i)] ≤ t/4, so σ_Y ≤ √t / 2. The probability that all those executions fail corresponds to the event that Y = 0, and

    Pr[Y = 0] ≤ Pr[ |Y − E[Y]| ≥ t/2 ] ≤ V[Y] / (t/2)² ≤ (t/4) / (t²/4) = 1/t,

by the Chebyshev inequality. Thus, we were able to “extract” from our random bits much more than one would naturally suspect is possible.
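The following Python sketch illustrates the scheme (the names and the prime P are ours, and the "algorithm" is a toy stand-in, since the text does not fix a concrete A): it draws only a and b, derives the pseudo-random values r_i = a·i + b mod P, and accepts if any run accepts.

import random

P = 10007  # a prime; the "random strings" are elements of Z_P

def amplify_two_point(A, x, t):
    """Run a one-sided-error algorithm A(x, r) on t pairwise independent
    values r_i = a*i + b mod P, using only two truly random draws (a and b).
    If x is in the language, at least half the values of r make A accept,
    so by the Chebyshev argument above all t runs reject with prob. <= 1/t."""
    a = random.randrange(P)
    b = random.randrange(P)
    return any(A(x, (a * i + b) % P) for i in range(1, t + 1))

# Toy stand-in for A: "x in L" iff x is odd, and A answers 1 exactly when
# the random value r falls in the lower half of Z_P.
def toy_A(x, r):
    return x % 2 == 1 and r <= P // 2

print(amplify_two_point(toy_A, 17, t=100))  # True with probability >= 1 - 1/100
print(amplify_two_point(toy_A, 16, t=100))  # always False (one-sided error)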

6.2 Chernoff Inequality - A Special Case

Theorem 6.2.1 Let X_1, ..., X_n be n independent random variables, such that Pr[X_i = 1] = Pr[X_i = −1] = 1/2, for i = 1, ..., n. Let Y = ∑_{i=1}^n X_i. Then, for any ∆ > 0, we have

    Pr[Y ≥ ∆] ≤ exp(−∆²/(2n)).

Proof: Clearly, for an arbitrary t > 0, to be specified shortly, we have

    Pr[Y ≥ ∆] = Pr[exp(tY) ≥ exp(t∆)] ≤ E[exp(tY)] / exp(t∆),

where the first part follows by the fact that exp(·) preserves ordering, and the second part follows by the Markov inequality. Observe that

    E[exp(tX_i)] = (1/2) e^t + (1/2) e^{−t} = (e^t + e^{−t}) / 2
                 = (1/2)(1 + t/1! + t²/2! + t³/3! + ···) + (1/2)(1 − t/1! + t²/2! − t³/3! + ···)
                 = 1 + t²/2! + t⁴/4! + ··· + t^{2k}/(2k)! + ···,

by the Taylor expansion of exp(·). Note that (2k)! ≥ (k!) 2^k, and thus

    E[exp(tX_i)] = ∑_{i=0}^{∞} t^{2i}/(2i)! ≤ ∑_{i=0}^{∞} t^{2i}/((i!) 2^i) = ∑_{i=0}^{∞} (1/i!)(t²/2)^i = exp(t²/2),

again by the Taylor expansion of exp(·). Next, by the independence of the X_i's, we have

    E[exp(tY)] = E[exp(∑_i tX_i)] = E[∏_i exp(tX_i)] = ∏_{i=1}^{n} E[exp(tX_i)] ≤ ∏_{i=1}^{n} exp(t²/2) = exp(nt²/2).

We thus have

    Pr[Y ≥ ∆] ≤ exp(nt²/2) / exp(t∆) = exp(nt²/2 − t∆).

Next, we minimize this quantity over t by setting t = ∆/n. We conclude that

    Pr[Y ≥ ∆] ≤ exp( (n/2)(∆/n)² − (∆/n)∆ ) = exp(−∆²/(2n)).

By the symmetry of Y, we get the following:

Corollary 6.2.2 Let X_1, ..., X_n be n independent random variables, such that Pr[X_i = 1] = Pr[X_i = −1] = 1/2, for i = 1, ..., n. Let Y = ∑_{i=1}^n X_i. Then, for any ∆ > 0, we have

    Pr[|Y| ≥ ∆] ≤ 2 exp(−∆²/(2n)).

Corollary 6.2.3 Let X_1, ..., X_n be n independent coin flips, such that Pr[X_i = 0] = Pr[X_i = 1] = 1/2, for i = 1, ..., n. Let Y = ∑_{i=1}^n X_i. Then, for any ∆ > 0, we have

    Pr[ |Y − n/2| ≥ ∆ ] ≤ 2 exp(−2∆²/n).

Remark 6.2.4 Before going any further, it might be instructive to understand what these inequalities imply. Consider the case where each X_i is either zero or one with probability half. In this case µ = E[Y] = n/2. Set ∆ = t√n (note that √µ is, up to a constant factor, the standard deviation of Y when p_i = 1/2). We have, by Corollary 6.2.3, that

    Pr[ |Y − n/2| ≥ ∆ ] ≤ 2 exp(−2∆²/n) = 2 exp(−2(t√n)²/n) = 2 exp(−2t²).

Thus, the Chernoff inequality implies exponential decay (i.e., ≤ 2 exp(−2t²)) once we are t standard deviations away from the expectation, instead of just the polynomial decay (i.e., ≤ 1/t²) implied by Chebyshev's inequality.
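A quick numerical experiment makes the gap concrete. The following Python sketch (the parameters n = 400 and t = 2 are arbitrary choices of ours) estimates Pr[|Y − n/2| ≥ t√n] by simulation and prints it next to the Chebyshev bound 1/(4t²) and the Chernoff bound 2 exp(−2t²) of Corollary 6.2.3.

import random
from math import exp

def tail_probability(n, t, trials=20_000):
    """Estimate Pr[|Y - n/2| >= t*sqrt(n)] for Y a sum of n fair 0/1 coin flips."""
    delta = t * n ** 0.5
    hits = 0
    for _ in range(trials):
        y = sum(random.getrandbits(1) for _ in range(n))
        if abs(y - n / 2) >= delta:
            hits += 1
    return hits / trials

n, t = 400, 2.0
print("empirical :", tail_probability(n, t))
print("Chebyshev :", 1 / (4 * t * t))       # V[Y]/Delta^2 = (n/4)/(t^2 n)
print("Chernoff  :", 2 * exp(-2 * t * t))   # Corollary 6.2.3 with Delta = t*sqrt(n)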

6.2.1 Application – QuickSort is Quick

We revisit QuickSort. We remind the reader that the running time of QuickSort is proportional to the number of comparisons performed by the algorithm. Next, consider an arbitrary element u being sorted. Consider the ith level recursive subproblem that contains u, and let S_i be the set of elements in this subproblem. We say that u is successful in the ith level if |S_{i+1}| ≤ |S_i| / 2. Namely, if u is successful, then the next level in the recursion involving u works on a considerably smaller subproblem.

We first observe that if QuickSort is applied to an array with n elements, then u can be successful at most T = lg n times before the subproblem it participates in is of size one, and the recursion stops. Thus, let X_i be the indicator variable which is 1 if u is successful in the ith level, and zero otherwise. Note that the X_i's are independent, and Pr[X_i = 1] = 1/2. If u participates in v levels, then we have the random variables X_1, X_2, ..., X_v. To make things simpler, we extend this series by adding independent random variables with Pr[X_i = 1] = 1/2, for i > v. Thus, we have an infinite sequence of independent 0/1 random variables, each equal to 1 with probability 1/2. The question is how many elements of the sequence we need to read until we collect T ones.

Lemma 6.2.5 Let X_1, X_2, ... be an infinite sequence of independent random 0/1 variables, each equal to 1 with probability 1/2, and let M be an arbitrary parameter. Then the probability that we need to read more than 2M + 4t√M variables of this sequence until we collect M ones is at most 2 exp(−t²), for t ≤ √M. If t ≥ √M, then this probability is at most 2 exp(−t√M).

Proof: Consider the random variable Y = ∑_{i=1}^{L} X_i, where L = 2M + 4t√M. Its expectation is L/2, and using the Chernoff inequality (Corollary 6.2.3), we get

    α = Pr[Y ≤ M] ≤ Pr[ |Y − L/2| ≥ L/2 − M ] ≤ 2 exp( −(2/L)(L/2 − M)² )
      = 2 exp( −(2/L)(M + 2t√M − M)² ) = 2 exp( −(2/L)(2t√M)² ) = 2 exp( −8t²M/L ).

For t ≤ √M we have L = 2M + 4t√M ≤ 8M, and as such, in this case,

    Pr[Y ≤ M] ≤ 2 exp( −8t²M/(8M) ) = 2 exp(−t²).

If t ≥ √M, then

    α = 2 exp( −8t²M/(2M + 4t√M) ) ≤ 2 exp( −8t²M/(6t√M) ) ≤ 2 exp( −t√M ).

Going back to the QuickSort problem, we have that if we sort n elements, then by Lemma 6.2.5 (applied with M = lg n and t = c√(lg n), for c ≥ 1), the probability that u participates in more than

    L = (2 + 4c) lg n = 2 lg n + 4 · c√(lg n) · √(lg n)

recursive calls is smaller than 2 exp(−c√(lg n) · √(lg n)) = 2 exp(−c lg n) ≤ 2/n^c, since e^{−x} ≤ 2^{−x} for x ≥ 0. There are n elements being sorted; applying this bound with c + 1 instead of c and using the union bound, the probability that any element participates in more than (6 + 4c) lg n recursive calls is smaller than n · 2/n^{c+1} = 2/n^c. Since in each level of the recursion an element is compared only to the pivot of the subproblem containing it, the total number of comparisons is at most the sum, over all elements, of the number of levels in which each element participates. We conclude:

Lemma 6.2.6 For any c ≥ 1, the probability that QuickSort performs more than (6 + 4c) n lg n comparisons is smaller than 2/n^c.
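The bound is easy to probe experimentally. The following Python sketch (a simulation we add for illustration; the values of n and the number of runs are arbitrary) sorts random inputs with a randomized QuickSort and records the maximum recursion depth reached, which the analysis above predicts to be O(lg n) with high probability.

import math
import random

def quicksort_max_depth(items, depth=0):
    """Randomized QuickSort; returns the maximum recursion depth reached
    (i.e., the largest number of levels any element participates in)."""
    if len(items) <= 1:
        return depth
    pivot = random.choice(items)
    left = [x for x in items if x < pivot]
    right = [x for x in items if x > pivot]
    return max(quicksort_max_depth(left, depth + 1),
               quicksort_max_depth(right, depth + 1))

n = 10_000
depths = [quicksort_max_depth(list(range(n))) for _ in range(20)]
print("lg n =", round(math.log2(n), 1), "  max depth over 20 runs =", max(depths))
# Typically the maximum depth stays within a small constant factor of lg n,
# in line with the high-probability bound above.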
