CS 388R: Randomized Algorithms                                                Fall 2015
Lecture 2: 31 August 2015
Prof. Eric Price                                        Scribes: Sid Kapur, Neil Vyas

1 Overview

In this lecture we will first introduce some probability machinery: linearity of expectation, the central limit theorem, and the discrete-setting Chernoff bound. We will then apply these tools to some simple problems: repeated Bernoulli trials and the coupon collecting problem. Finally, we will discuss how randomized algorithms fit into the landscape of complexity theory.

2 Food for Thought

1. Suppose you flip an unbiased coin 1000 times. How surprised would you be if we observed
   • 500 heads?
   • 510 heads?
   • 600 heads?
   • 1000 heads?

2. Suppose you have a biased coin that lands heads with an unknown probability p. How many flips will it take to learn p to an error of ±ε with probability 1 − δ?

3 Probability Background

Theorem 1 (Linearity of expectation). If X_1, ..., X_n are random variables, then

    E[ Σ_{i=1}^n X_i ] = Σ_{i=1}^n E[X_i].

Example: Suppose X_1, ..., X_n are Bernoulli trials with p = 0.5, and let X = Σ_{i=1}^n X_i. Then

    E[X] = Σ_{i=1}^n E[X_i] = n/2.

Definition 2. The variance of a random variable X, denoted Var[X], is defined as

    Var[X] = E[ (X − E[X])^2 ].

Observation 3. Note that if Y_1 and Y_2 are independent random variables with E[Y_1] = E[Y_2] = 0, then

    Var[Y_1 + Y_2] = E[ (Y_1 + Y_2)^2 ]
                   = E[ Y_1^2 + Y_2^2 + 2 Y_1 Y_2 ]
                   = E[Y_1^2] + E[Y_2^2] + 2 E[Y_1] E[Y_2]    (since Y_1 and Y_2 are independent)
                   = E[Y_1^2] + E[Y_2^2]
                   = Var[Y_1] + Var[Y_2].

Claim 4. We claim (without proof) that if Y_1 and Y_2 are independent, then Var[Y_1 + Y_2] = Var[Y_1] + Var[Y_2] (i.e., E[Y_1] and E[Y_2] need not be zero).

Example: Suppose X_1, ..., X_n are independent Bernoulli trials with p = 0.5, and let X = Σ_{i=1}^n X_i. Then

    Var[X] = Σ_{i=1}^n Var[X_i]                        (by Claim 4)
           = n · [ (1/2)(1/2)^2 + (1/2)(1/2)^2 ]
           = n/4.

If n = 1000, then Var[X] = 250 and the standard deviation of X is √(Var[X]) ≈ 16.

Theorem 5 (Central limit theorem). Suppose X_1, ..., X_n are independent, identically distributed random variables with expected value µ and variance σ^2. Then as n → ∞, the distribution of the sample average S_n = (X_1 + ··· + X_n)/n converges to the Gaussian N(µ, σ^2/n).

Some statements of the central limit theorem give bounds on the rate of convergence, but unfortunately those bounds are not tight enough to be useful in this example.

By the central limit theorem, as the number of coin tosses goes to infinity, the distribution of the number of heads converges to a Gaussian N(µ, σ^2) with µ = n/2 and σ^2 = n/4, as we calculated above.

4 Bounding the Sum of Bernoulli Trials

Let's return to our coin-tossing problem. Can we find an upper bound on the probability that the number of heads exceeds a given threshold?

Claim 6. If Z is drawn from a Gaussian with mean µ and variance σ^2, then for all t ≥ 0,

    P[Z ≥ µ + t] ≤ e^{−t^2/(2σ^2)}    and    P[Z ≤ µ − t] ≤ e^{−t^2/(2σ^2)}.

(We omit the proof.)

If our distribution were Gaussian (or if we were willing to invoke the central limit theorem), we could use the bound from Claim 6 with σ ≈ 16:

    P[X ≥ 500 + σt] ≤ e^{−t^2/2}
    ⟹ P[X ≥ 500 + 16t] ≤ e^{−t^2/2}
    ⟹ P[X ≥ 600] ≤ e^{−18}          (taking t = 6, since 500 + 16·6 ≤ 600).
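As a quick sanity check (this simulation is not part of the original notes; the thresholds, trial count, and seed are arbitrary illustrative choices), the following Python sketch estimates these tail probabilities empirically for 1000 fair coin flips and compares them to the Gaussian tail bound of Claim 6. Since Claim 6 is an upper bound rather than an approximation, the empirical frequencies should come out below it.

```python
import math
import random

def empirical_tail(threshold, n_flips=1000, trials=100_000, seed=0):
    """Estimate P[X >= threshold], where X = number of heads in n_flips fair coin tosses."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # getrandbits(n_flips) gives n_flips independent fair bits; count the 1s (heads).
        heads = bin(rng.getrandbits(n_flips)).count("1")
        if heads >= threshold:
            hits += 1
    return hits / trials

def gaussian_tail_bound(threshold, n_flips=1000):
    """The bound e^{-t^2 / (2 sigma^2)} from Claim 6, with mu = n/2 and sigma^2 = n/4."""
    mu, var = n_flips / 2, n_flips / 4
    t = threshold - mu
    return math.exp(-t * t / (2 * var))

for threshold in (500, 510, 540):
    print(threshold,
          "empirical:", empirical_tail(threshold),
          "Gaussian bound:", round(gaussian_tail_bound(threshold), 4))
```

Note that an event like {X ≥ 600}, whose probability is at most e^{−18} ≈ 10^{−8} by the calculation above, will essentially never appear in a simulation of this size; this is one reason analytic tail bounds are useful.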
But unfortunately, we have a binomial distribution, not a Gaussian, and we don't want to use the central limit theorem. What should we do?

Theorem 7 (Chernoff bound, discrete setting). Let X = Σ_{i=1}^n X_i, where the X_i take values in [0, 1] and are independent. (The X_i can take any value between 0 and 1, and they need not be identically distributed.) Then

    P[X ≥ E[X] + t] ≤ e^{−2t^2/n}    and    P[X ≤ E[X] − t] ≤ e^{−2t^2/n}.

If we use the Chernoff bound on our coin example, we get

    P[X ≥ 500 + σt] ≤ e^{−2(σt)^2/n} = e^{−2t^2(n/4)/n}    (since σ^2 = n/4)
                    = e^{−t^2/2}.

Note that this is the same bound that we got from the Gaussian! So the Chernoff bound is essentially tight in this case.

4.1 Evaluating the Chernoff Bound

There are a few limitations of the Chernoff bound in this situation. The main limitation is that it does not depend on the variance of the X_i. Suppose each X_i is 1 with very low probability, say

    P[X_i = 1] = 1/(10√n).

Then

    E[X] = (1/(10√n)) · n = √n/10.

The Chernoff bound gives a 1 − δ confidence interval as follows: setting

    P[X ≥ E[X] + t] ≤ e^{−2t^2/n} = δ/2

and solving for t gives

    t = √( n log(2/δ) / 2 ).

So the Chernoff bound gives us X ∈ E[X] ± √( n log(2/δ)/2 ) with probability 1 − δ. If we pick δ = 0.10, the bound is X ∈ E[X] ± √(1.5n) ≈ E[X] ± 1.2√n. Note that the width of this confidence interval is more than twenty times the expected value √n/10!

We know that the bound can be made much tighter here because the variance of the X_i is small; the Chernoff bound simply doesn't take the variance into account. Later in this class we will encounter Bernstein-type inequalities, which do account for the variance. Also, note that the confidence interval given by the Chernoff bound can extend below 0 or above n, which does not make sense in this problem.

4.2 Second Food for Thought

Suppose you have a biased coin that lands heads with an unknown probability p. How many flips will it take to learn p to an error of ±ε with probability 1 − δ?

Flip the coin n times and let X be the number of heads, so E[X] = pn and our estimate of p is X/n. We need to find n such that

    P[X ≥ (p + ε)n] ≤ δ/2.

By the Chernoff bound with t = εn,

    P[X ≥ (p + ε)n] ≤ e^{−2(εn)^2/n} = e^{−2ε^2 n}.

So we must pick n such that e^{−2ε^2 n} ≤ δ/2, i.e.,

    n ≥ (1/(2ε^2)) log(2/δ).

(The lower tail P[X ≤ (p − ε)n] is bounded the same way, so the total failure probability is at most δ.)

5 The Coupon Collecting Problem

Suppose there exist n types of coupons, with an infinite supply of each. If you collect coupons uniformly at random, what is the expected number of coupons you must collect to hold at least one of each type?

Let T_i denote the expected number of additional coupons we must collect to see a new type, given that we currently hold i distinct types.

T_0 must be 1, since the next coupon we receive is the first coupon, so it must be a new type.

T_{n−1} must be n, since each coupon we receive has probability 1/n of being the one type we haven't received yet.

In general, T_{n−k} = n/k, because there are k coupon types remaining, so at each step the probability that we receive a new type is k/n.

By linearity of expectation, the expected total number of coupons is

    T = Σ_{i=0}^{n−1} T_i = Σ_{k=1}^{n} n/k = n H_n ≤ n (log n + 1),

where H_n = Σ_{i=1}^{n} 1/i is the nth harmonic number.
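To make the calculation concrete, here is a small Python sketch (not from the lecture; the choice n = 100, the trial count, and the seed are arbitrary) that simulates the coupon collecting process and compares the empirical average number of draws to n·H_n and to the upper bound n(log n + 1), with log taken to be the natural logarithm as in the bound above.

```python
import math
import random

def coupons_needed(n, rng):
    """Draw coupons uniformly at random until all n types have been seen; return the number of draws."""
    seen = set()
    draws = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))  # each draw is a uniformly random coupon type
        draws += 1
    return draws

def average_draws(n, trials=2000, seed=0):
    rng = random.Random(seed)
    return sum(coupons_needed(n, rng) for _ in range(trials)) / trials

n = 100
H_n = sum(1 / k for k in range(1, n + 1))           # nth harmonic number
print("empirical average:", average_draws(n))
print("n * H_n:          ", n * H_n)                # expected value, about 518.7 for n = 100
print("n (log n + 1):    ", n * (math.log(n) + 1))  # upper bound, about 560.5 for n = 100
```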
6 Background on Complexity Theory

Let's begin our whirlwind tour of complexity theory. We'll explore, to varying degrees, P, NP, ZPP, RP, coRP, and BPP. We use Section 1.5 of Randomized Algorithms by Motwani and Raghavan as supplemental material.

Note that, formally, complexity classes are sets of languages, but we'll be looser and also speak in terms of algorithms inhabiting these classes.

6.1 P: Polynomial-time deterministic algorithms

These classes can be described in terms of Turing machines and languages. A language is a set ℓ ⊆ {0, 1}^*. Given an input string x ∈ {0, 1}^*, a Turing machine either accepts the string, rejects the string, or loops. The language of a Turing machine is the set of strings that the Turing machine accepts.

It is possible to phrase many non-decision problems as decision problems. For example, suppose the original problem is to find an integer n that has some property. The corresponding decision problem would be: given an input x, is x less than or equal to n? By answering this question for different values of x, we can use binary search to find n.

Definition 8. The class P consists of all languages ℓ that have a polynomial-time algorithm A such that for any input x,

    If x ∈ ℓ, A accepts x.
    If x ∉ ℓ, A rejects x.

6.2 NP: P with a good "advice" string

Here "advice" given to a Turing machine means an additional input string supplied alongside the input x; it is also called a witness or certificate.

Definition 9. A language ℓ is in NP if there exists a polynomial-time algorithm A such that for any input x,

    If x ∈ ℓ, there exists an advice string y such that A accepts the input (x, y).
    If x ∉ ℓ, then for all strings y, A rejects the input (x, y).

For our purposes, the advice string may be arbitrarily long, but the algorithm can only inspect an amount of it that is polynomial in the length of the input.

Note the resemblance between advice strings and random bits. We will come back to this later.

6.3 RP and coRP: One-sided error

Definition 10. A language ℓ is in RP if there is a randomized algorithm A with polynomial worst-case runtime such that, for all inputs x,

    If x ∈ ℓ, then P[A accepts x] ≥ 1/2.
    If x ∉ ℓ, then P[A accepts x] = 0.

Definition 11. A language ℓ is in coRP if there is a randomized algorithm A with polynomial worst-case runtime such that, for all inputs x,

    If x ∈ ℓ, then P[A accepts x] = 1.
    If x ∉ ℓ, then P[A accepts x] < 1/2.

Note that an RP algorithm corresponds to a Monte Carlo algorithm that can err only when x ∈ ℓ, and, similarly, a coRP algorithm corresponds to a Monte Carlo algorithm that can err only when x ∉ ℓ.
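To make the one-sided-error pattern concrete, here is a classical example that is not from these notes: Freivalds' algorithm for verifying a matrix product. For the language ℓ = {(A, B, C) : AB = C}, it never rejects a correct product and, on inputs with AB ≠ C, each trial accepts with probability at most 1/2, matching the coRP-style error described above. The matrices at the bottom are arbitrary illustrative values.

```python
import random

def freivalds_accepts(A, B, C, trials=20, seed=None):
    """One-sided Monte Carlo test of whether A @ B == C (square matrices of equal size).

    If AB == C, every trial passes, so the test always accepts.
    If AB != C, each trial catches the mismatch with probability >= 1/2,
    so the test wrongly accepts with probability at most 2**(-trials).
    Each trial costs O(n^2) time instead of the O(n^3) of computing AB directly.
    """
    rng = random.Random(seed)
    n = len(A)
    for _ in range(trials):
        r = [rng.getrandbits(1) for _ in range(n)]  # uniformly random 0/1 vector
        Br = [sum(B[i][j] * r[j] for j in range(n)) for i in range(n)]
        ABr = [sum(A[i][j] * Br[j] for j in range(n)) for i in range(n)]
        Cr = [sum(C[i][j] * r[j] for j in range(n)) for i in range(n)]
        if ABr != Cr:
            return False   # a witness that AB != C; this answer is always correct
    return True            # AB == C, except with probability <= 2**(-trials)

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C_good = [[19, 22], [43, 50]]   # equals A @ B
C_bad = [[19, 22], [43, 51]]    # differs from A @ B in one entry
print(freivalds_accepts(A, B, C_good))  # True
print(freivalds_accepts(A, B, C_bad))   # False, except with probability 2**(-20)
```

A single trial already meets the coRP requirement; repeating independent trials drives the acceptance probability on no-instances down geometrically.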