
CS 388R: Randomized Algorithms (Fall 2015)
Lecture 2: 31 August 2015
Prof. Eric Price
Scribes: Sid Kapur, Neil Vyas

1 Overview

In this lecture we will first introduce some probability machinery: linearity of expectation, the central limit theorem, and the discrete-setting Chernoff bound. We will then apply these tools to some simple problems: repeated Bernoulli trials and the coupon collecting problem. Finally, we will discuss how randomized algorithms fit into the landscape of complexity theory.

2 Food for Thought

1. Suppose you flip an unbiased coin 1000 times. How surprised would you be if you observed

   • 500 heads?
   • 510 heads?
   • 600 heads?
   • 1000 heads?

2. Suppose you have a biased coin that lands heads with an unknown probability $p$. How many flips will it take to learn $p$ to an error of $\pm\varepsilon$ with probability $1 - \delta$?

3 Probability Background

Theorem 1 (Linearity of expectation). If $X_1, \dots, X_n$ are random variables, then
$$\mathbb{E}\left[\sum_{i=1}^n X_i\right] = \sum_{i=1}^n \mathbb{E}[X_i].$$

Example: Suppose $X_1, \dots, X_n$ are Bernoulli trials with $p = 0.5$. Let $X = \sum_{i=1}^n X_i$. Then
$$\mathbb{E}[X] = \sum_{i=1}^n \mathbb{E}[X_i] = \frac{n}{2}.$$

Definition 2. The variance of a random variable $X$, denoted $\mathrm{Var}[X]$, is defined as

$$\mathrm{Var}[X] = \mathbb{E}\left[(X - \mathbb{E}[X])^2\right].$$

Observation 3. Note that if $Y_1$ and $Y_2$ are independent random variables with $\mathbb{E}[Y_1] = \mathbb{E}[Y_2] = 0$, then

$$\begin{aligned}
\mathrm{Var}[Y_1 + Y_2] &= \mathbb{E}\left[(Y_1 + Y_2)^2\right] \\
&= \mathbb{E}\left[Y_1^2 + Y_2^2 + 2 Y_1 Y_2\right] \\
&= \mathbb{E}\left[Y_1^2\right] + \mathbb{E}\left[Y_2^2\right] + 2\,\mathbb{E}[Y_1]\,\mathbb{E}[Y_2] \quad \text{(since $Y_1$ and $Y_2$ are independent)} \\
&= \mathbb{E}\left[Y_1^2\right] + \mathbb{E}\left[Y_2^2\right] \\
&= \mathrm{Var}[Y_1] + \mathrm{Var}[Y_2]
\end{aligned}$$

Claim 4. We claim (without proof) that if $Y_1$ and $Y_2$ are independent, then $\mathrm{Var}[Y_1 + Y_2] = \mathrm{Var}[Y_1] + \mathrm{Var}[Y_2]$ (i.e., $\mathbb{E}[Y_1]$ and $\mathbb{E}[Y_2]$ need not be zero).

Example: Suppose $X_1, \dots, X_n$ are independent Bernoulli trials with $p = 0.5$. Let $X = \sum_{i=1}^n X_i$. Then
$$\begin{aligned}
\mathrm{Var}[X] &= \sum_{i=1}^n \mathrm{Var}[X_i] \quad \text{(by Claim 4)} \\
&= n\left[\frac{1}{2}\left(\frac{1}{2}\right)^2 + \frac{1}{2}\left(\frac{1}{2}\right)^2\right] \\
&= \frac{n}{4}
\end{aligned}$$

If $n = 1000$, then $\mathrm{Var}[X] = 250$ and the standard deviation of $X$ is $\sqrt{\mathrm{Var}[X]} \approx 16$.
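As an aside, here is a quick numerical sanity check of this calculation. This is illustrative Python, with arbitrary parameter choices (10,000 repeated experiments); the empirical values should land near $n/2$, $n/4$, and 16.

    import random

    n = 1000          # coin flips per experiment
    trials = 10_000   # number of repeated experiments

    # Simulate `trials` experiments of n fair coin flips each and record
    # the number of heads in each experiment.
    counts = [sum(random.random() < 0.5 for _ in range(n)) for _ in range(trials)]

    mean = sum(counts) / trials
    var = sum((c - mean) ** 2 for c in counts) / trials

    print(f"sample mean     = {mean:.1f}  (theory: {n / 2})")
    print(f"sample variance = {var:.1f}  (theory: {n / 4})")
    print(f"sample std dev  = {var ** 0.5:.1f}  (theory: {(n / 4) ** 0.5:.1f})")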

Theorem 5 (The central limit theorem). Suppose $X_1, \dots, X_n$ are independent identically distributed random variables with expected value $\mu$ and variance $\sigma^2$. Then as $n \to \infty$, the sample average
$$S_n = \frac{X_1 + \cdots + X_n}{n}$$
converges (in distribution) to the Gaussian $N\!\left(\mu, \frac{\sigma^2}{n}\right)$.

Some statements of the central limit theorem give bounds on convergence, but unfortunately these bounds are not tight enough to be useful in this example.

By the central limit theorem, as the number of coin tosses goes to infinity, our binomial distribution of coin tosses converges to a Gaussian $N(\mu, \sigma^2)$ where $\mu = \frac{n}{2}$ and $\sigma^2 = \frac{n}{4}$, as we calculated above.

4 Bounding the Sum of Bernoulli Trials

Let's return to our coin-tossing problem. Can we find an upper bound on the probability that the number of heads exceeds a certain value?

Claim 6. If $Z$ is drawn from a Gaussian with mean $\mu$ and variance $\sigma^2$, then

$$\mathbb{P}[Z \geq \mu + t] \leq e^{-\frac{t^2}{2\sigma^2}} \quad \forall t \geq 0$$
$$\mathbb{P}[Z \leq \mu - t] \leq e^{-\frac{t^2}{2\sigma^2}} \quad \forall t \geq 0$$
(We omit the proof.)
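For intuition, one can compare this bound to the exact Gaussian tail, which for a standard normal is $\mathbb{P}[Z \geq t] = \tfrac{1}{2}\,\mathrm{erfc}(t/\sqrt{2})$. The short check below is my own illustration, using only Python's standard library.

    import math

    # Compare the exact standard-normal tail P[Z >= mu + t] (mu = 0, sigma = 1)
    # with the tail bound exp(-t^2 / (2 sigma^2)) from Claim 6.
    for t in [1, 2, 3, 4]:
        exact = 0.5 * math.erfc(t / math.sqrt(2))  # exact Gaussian tail
        bound = math.exp(-t * t / 2)               # Claim 6 with sigma = 1
        print(f"t = {t}: exact tail = {exact:.2e}, bound = {bound:.2e}")

The bound is loose by a modest constant factor but has the right exponential decay.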

If our distribution were Gaussian (or if we were willing to invoke the central limit theorem), we could use the bound from Claim 6:

$$\begin{aligned}
\mathbb{P}[X \geq 500 + \sigma t] &\leq e^{-\frac{t^2}{2}} \\
\Rightarrow \mathbb{P}[X \geq 500 + 16t] &\leq e^{-\frac{t^2}{2}} \\
\Rightarrow \mathbb{P}[X \geq 600] &\leq e^{-18}
\end{aligned}$$

But unfortunately, we have a binomial distribution, not a Gaussian, and we don't want to use the central limit theorem. What should we do?

Definition 7 (Chernoff bound, for the discrete setting). Let $X = \sum_{i=1}^n X_i$, where the $X_i$ take on values in $[0, 1]$ and are independent. (The $X_i$ can take on any value between 0 and 1. Also, the $X_i$ need not be identically distributed.) Then

$$\mathbb{P}[X \geq \mathbb{E}[X] + t] \leq e^{-\frac{2t^2}{n}}$$
$$\mathbb{P}[X \leq \mathbb{E}[X] - t] \leq e^{-\frac{2t^2}{n}}$$

If we use the Chernoff bound, we get:

$$\begin{aligned}
\mathbb{P}[X \geq 500 + \sigma t] &\leq e^{-\frac{2 t^2 \sigma^2}{n}} \\
&= e^{-\frac{2 t^2 (n/4)}{n}} \quad \text{(since $\sigma^2 = \tfrac{n}{4}$)} \\
&= e^{-\frac{t^2}{2}}
\end{aligned}$$

Note that this is the same bound that we got from the Gaussian! So the Chernoff bound is basically tight in this case.
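As an illustration of how good the bound is here, plugging $t = 100$ and $n = 1000$ directly into the Chernoff bound gives $\mathbb{P}[X \geq 600] \leq e^{-2 \cdot 100^2 / 1000} = e^{-20}$. The sketch below (my own check, not part of the lecture) compares this to the exact binomial tail, computed with Python's math.comb.

    import math

    n, threshold = 1000, 600
    t = threshold - n // 2  # deviation above the mean E[X] = 500

    # Exact tail probability P[X >= 600] for X ~ Binomial(1000, 1/2),
    # computed with integer binomial coefficients to avoid overflow.
    tail = sum(math.comb(n, k) for k in range(threshold, n + 1)) / 2 ** n

    # Chernoff bound: P[X >= E[X] + t] <= exp(-2 t^2 / n).
    chernoff = math.exp(-2 * t * t / n)

    print(f"exact    P[X >= 600] = {tail:.3e}")
    print(f"Chernoff bound       = {chernoff:.3e}")

The bound is larger than the exact tail, as it must be, but it is off by only a modest factor.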

4.1 Evaluating the Chernoff Bound

There are a few limitations of the Chernoff bound in this situation.

The main limitation is that it does not depend on the variance of the $X_i$. Suppose each $X_i$ has a very low probability of being 1, say
$$\mathbb{P}[X_i = 1] = \frac{1}{10\sqrt{n}}.$$
Then
$$\mathbb{E}[X] = \frac{1}{10\sqrt{n}} \cdot n = \frac{\sqrt{n}}{10}.$$

Then the Chernoff bound would give us a $1 - \delta$ confidence interval of

$$\mathbb{P}[X \geq \mathbb{E}[X] + t] \leq e^{-\frac{2t^2}{n}} = \frac{\delta}{2}
\quad \Rightarrow \quad t = \sqrt{\frac{n \log(2/\delta)}{2}}$$

So the Chernoff bound gives us $X \in \mathbb{E}[X] \pm \sqrt{\frac{n \log(2/\delta)}{2}}$ with probability $1 - \delta$. If we pick $\delta = 0.10$, then the bound is $X \in \mathbb{E}[X] \pm \sqrt{10n}$. Note that the width of our confidence interval is about 60 times the expected value!

We know that the bound can be made much tighter here because the variance of the $X_i$ is small; the Chernoff bound simply doesn't take variance into account. Later in this class, we will encounter Bernstein-type inequalities, which do account for the variance. Also, note that the interval given by the Chernoff bound can extend below 0 or above $n$, which does not make sense in this problem.
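A quick numerical illustration of this gap (my own sketch, with arbitrary parameter choices): for $n = 10{,}000$ and $p = \frac{1}{10\sqrt{n}}$, the true standard deviation of $X$ is about 3, while the Chernoff half-width at $\delta = 0.10$ is over 100.

    import math

    n = 10_000
    p = 1 / (10 * math.sqrt(n))   # each X_i is 1 with this small probability
    delta = 0.10

    mean = n * p                          # E[X] = sqrt(n) / 10
    std = math.sqrt(n * p * (1 - p))      # true standard deviation of X
    chernoff_halfwidth = math.sqrt(n * math.log(2 / delta) / 2)

    print(f"E[X]                      = {mean:.2f}")
    print(f"std dev of X              = {std:.2f}")
    print(f"Chernoff half-width (90%) = {chernoff_halfwidth:.2f}")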

4.2 Second Food for Thought

Suppose you have a biased coin that lands heads with an unknown probability $p$. How many flips will it take to learn $p$ to an error of $\pm\varepsilon$ with probability $1 - \delta$?

We need to find $n$ such that
$$\mathbb{P}[X \geq (p + \varepsilon) n] \leq \frac{\delta}{2}.$$
By the Chernoff bound,

$$\mathbb{P}[X \geq (p + \varepsilon) n] \leq e^{-\frac{2(\varepsilon n)^2}{n}} = e^{-2\varepsilon^2 n}$$

So we must pick $n$ such that
$$e^{-2\varepsilon^2 n} \leq \frac{\delta}{2}, \quad \text{i.e.} \quad n \geq \frac{1}{2\varepsilon^2} \log\frac{2}{\delta}.$$
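To make the formula concrete, here is a small Python helper (my own sketch; the particular choices of $p$, $\varepsilon$, and $\delta$ are arbitrary) that computes the required number of flips and checks it empirically.

    import math
    import random

    def flips_needed(eps, delta):
        # n >= (1 / (2 eps^2)) * log(2 / delta), rounded up
        return math.ceil(math.log(2 / delta) / (2 * eps * eps))

    p, eps, delta = 0.3, 0.05, 0.05
    n = flips_needed(eps, delta)   # 738 flips for eps = 0.05, delta = 0.05
    print(f"flips needed: {n}")

    # Empirical check: how often does the empirical mean miss p by more than eps?
    misses = 0
    trials = 2000
    for _ in range(trials):
        heads = sum(random.random() < p for _ in range(n))
        if abs(heads / n - p) > eps:
            misses += 1
    print(f"empirical failure rate: {misses / trials:.3f} (should be at most {delta})")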

5 The Coupon Collecting Problem

Suppose there exist n types of coupons, with an infinite supply of each. If you collect coupons uniformly at random, what is the expected number of coupons you must collect to hold at least one of each type?

Let $T_i$ denote the expected number of additional coupons we must collect to see a new type, given that we currently hold $i$ distinct types. $T_0$ must be 1, since the next coupon we receive is the first coupon, so it must be a new type. $T_{n-1}$ must be $n$, since for each coupon we receive, there is a $\frac{1}{n}$ probability that it is the one type we haven't received yet.

In general, $T_{n-k} = \frac{n}{k}$, because there are $k$ coupon types remaining, so at each step the probability that we receive a new type is $\frac{k}{n}$. So the expected total number of coupons is
$$T = \sum_{i=0}^{n-1} T_i = \sum_{k=1}^{n} \frac{n}{k} = n H_n \leq n(\log n + 1),$$
where $H_n = \sum_{i=1}^n \frac{1}{i}$ is the $n$th harmonic number.
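A simulation sketch (my own; the choice of $n = 50$ and 2,000 trials is arbitrary) that compares the empirical average number of coupons against $n H_n$:

    import random

    def coupons_until_complete(n):
        # Draw coupon types uniformly at random until all n types have appeared.
        seen = set()
        draws = 0
        while len(seen) < n:
            seen.add(random.randrange(n))
            draws += 1
        return draws

    n = 50
    trials = 2000
    avg = sum(coupons_until_complete(n) for _ in range(trials)) / trials
    harmonic = sum(1 / i for i in range(1, n + 1))

    print(f"empirical average : {avg:.1f}")
    print(f"n * H_n           : {n * harmonic:.1f}")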

6 Background on Complexity Theory

Let's begin our whirlwind tour of complexity theory. We'll explore, to varying degrees, P, NP, ZPP, RP, coRP, and BPP. We use as supplemental material Section 1.5 from Randomized Algorithms by Motwani and Raghavan [MR]. Note that formally, complexity classes are sets of languages, but we'll be looser and also speak in terms of algorithms inhabiting these classes.

6.1 P: Polynomial-time deterministic algorithms

These can be described by Turing machines and languages. A language is a set $\ell \subseteq \{0, 1\}^*$. Given an input string $x \in \{0, 1\}^n$, a Turing machine either accepts the string, rejects the string, or loops. The language of a Turing machine is the set of strings that the Turing machine accepts.

It is possible to phrase many non-decision problems as decision problems. For example, suppose the original problem is to find an integer $n$ that has some property. The corresponding decision problem would be: given an input $x$, is $x$ less than or equal to $n$? In this setting, we can use binary search over $x$ to find $n$.
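As a concrete illustration of this reduction (the helper names below are my own, purely hypothetical), suppose we can only ask the decision question "is $x \le n$?" for the unknown integer $n$; binary search then recovers $n$ with a logarithmic number of queries.

    def find_via_decision_oracle(le_n, hi):
        # le_n(x) answers the decision problem "is x <= n?".
        # Binary search for the unknown n, assuming 0 <= n <= hi.
        lo = 0
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if le_n(mid):       # mid <= n, so n lies in [mid, hi]
                lo = mid
            else:               # mid > n, so n lies in [lo, mid - 1]
                hi = mid - 1
        return lo

    secret_n = 137
    print(find_via_decision_oracle(lambda x: x <= secret_n, hi=1000))  # prints 137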

Definition 8. The class P consists of all languages $\ell$ that have a polynomial-time algorithm $A$ such that for any input $x$,

If $x \in \ell$, $A$ accepts.
If $x \notin \ell$, $A$ rejects.

6.2 NP: P with a good “advice” string

"Advice" given to a Turing machine is an additional input string. For NP, this string may depend on the input itself; later, for P/poly, we will consider advice that may depend only on the length $n$ of the input.

Definition 9. A language $\ell$ is in NP if there exists a polynomial-time algorithm $A$ such that for any input $x$,

If $x \in \ell$, there exists an advice string $y$ such that $A$ accepts the input $(x, y)$.
If $x \notin \ell$, then for all strings $y$, $A$ rejects the input $(x, y)$.

For our purposes, the advice string may be arbitrarily long, but the algorithm can only inspect an amount of it that is polynomial in the length of the input. Note the resemblance between advice strings and random bits. We will come back to this later.

6.3 RP and coRP: One-sided error

Definition 10. An algorithm $A$ is in RP if it has polynomial worst-case runtime and, for all inputs $x$,

If $x \in \ell$, then $\mathbb{P}[A \text{ accepts } x] \geq \frac{1}{2}$.
If $x \notin \ell$, then $\mathbb{P}[A \text{ accepts } x] = 0$.

Definition 11. An algorithm A is in coRP if it has polynomial worst-case runtime and, for all inputs x,

If $x \in \ell$, then $\mathbb{P}[A \text{ accepts } x] = 1$.
If $x \notin \ell$, then $\mathbb{P}[A \text{ accepts } x] < \frac{1}{2}$.

Note that an RP algorithm corresponds to a Monte Carlo algorithm that can err only when $x \in \ell$, and, similarly, an algorithm in coRP corresponds to a Monte Carlo algorithm that can err only when $x \notin \ell$ [MR].

6.4 ZPP: Zero-error with expected polynomial time

This corresponds to Las Vegas-type randomized algorithms: algorithms that always answer correctly and whose running time is polynomial in expectation. It turns out that ZPP = RP ∩ coRP; for instance, given an RP algorithm and a coRP algorithm for $\ell$, repeatedly running both yields a certain answer (an accept from the RP algorithm or a reject from the coRP algorithm) within an expected constant number of rounds.

6.5 PP and BPP: Two-sided error

Definition 12. An algorithm $A$ is in PP (probabilistic polynomial time) if it has polynomial worst-case runtime and, for all inputs $x$,

If $x \in \ell$, $\mathbb{P}[A \text{ accepts } x] > \frac{1}{2}$.
If $x \notin \ell$, $\mathbb{P}[A \text{ accepts } x] \leq \frac{1}{2}$.

Unfortunately, PP is not a very useful class. For example, NP ⊆ PP. To show this we consider SAT, the unrestricted satisfiability problem: given a Boolean formula $f(x_1, \dots, x_n)$, is it satisfiable? We propose the following algorithm:

    function RandomizedSAT(f : {True, False}^n → {True, False}) : {True, False}
        for all i ∈ {1, ..., n} do
            x_i ← Uniform(True, False)
        end for
        if f(x_1, ..., x_n) = True then
            return True
        else
            b ← Uniform(True, False)
            return b
        end if
    end function

Suppose that the input is satisfiable; then $\mathbb{P}[\text{RandomizedSAT accepts}] \geq \frac{1}{2} + \frac{1}{2^{n+1}}$. Similarly, if the input is not satisfiable, $\mathbb{P}[\text{RandomizedSAT rejects}] = \frac{1}{2}$. Thus SAT is in PP. Since we can reduce every problem in NP to SAT, every problem in NP is in PP. Thus, PP isn't a very useful class to reason about. We introduce BPP in the hopes of remedying this situation.
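To see the $\frac{1}{2} + \frac{1}{2^{n+1}}$ gap numerically, here is a small Python rendering of the pseudocode above (the example formulas and trial count are my own illustrative choices).

    import random

    def randomized_sat(f, n):
        # Guess a uniformly random assignment; accept if it satisfies f,
        # otherwise accept or reject by a fair coin flip.
        assignment = [random.random() < 0.5 for _ in range(n)]
        if f(assignment):
            return True
        return random.random() < 0.5

    # A satisfiable formula on 3 variables: x1 AND x2 AND x3 (one satisfying assignment).
    sat_formula = lambda x: x[0] and x[1] and x[2]
    # An unsatisfiable formula: x1 AND (NOT x1).
    unsat_formula = lambda x: x[0] and not x[0]

    trials = 200_000
    for name, f in [("satisfiable", sat_formula), ("unsatisfiable", unsat_formula)]:
        accepts = sum(randomized_sat(f, 3) for _ in range(trials))
        print(f"{name}: accept rate = {accepts / trials:.4f}")

With $n = 3$ and a single satisfying assignment, the accept rate should be about $\frac{1}{2} + \frac{1}{2^4} = 0.5625$ for the satisfiable formula and about $0.5$ for the unsatisfiable one.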

Definition 13. An algorithm $A$ is in BPP (bounded-error probabilistic polynomial time) if it has polynomial worst-case runtime and, for all inputs $x$,

If $x \in \ell$, $\mathbb{P}[A \text{ accepts } x] \geq \frac{2}{3}$.
If $x \notin \ell$, $\mathbb{P}[A \text{ rejects } x] \geq \frac{2}{3}$.

In the next section, we will show that we can amplify any BPP algorithm to arbitrary accuracy.

6.6 Amplification of BPP algorithms

Now, let’s get back to that last remark; let’s amplify the probability of success of a BPP algorithm.

Suppose that we want a BPP algorithm with $\mathbb{P}[\text{accept } x \text{ if } x \in \ell] \geq 1 - \delta$ and $\mathbb{P}[\text{reject } x \text{ if } x \notin \ell] \geq 1 - \delta$. We claim that for a BPP algorithm, this can be accomplished with only an $O(\log \frac{1}{\delta})$ factor of overhead in the runtime.

Claim 14 (Amplification). Given a BPP algorithm for some problem that accepts correct answers with probability at least $\frac{2}{3}$, and rejects incorrect answers with probability at least $\frac{2}{3}$, there exists a BPP algorithm that accepts correct answers with probability at least $1 - \delta$ and rejects incorrect answers with probability at least $1 - \delta$, for any $0 < \delta < 1$.

Proof: Consider the algorithm $A'$ that runs $k$ independent iterations of $A$ and takes a majority vote: it accepts if the majority of the runs accepted, and rejects otherwise. We then employ the Chernoff bound to bound the probability that the majority vote across all iterations is incorrect.

Suppose $x \in \ell$. Then $\mathbb{E}[\text{number of runs that accept } x] \geq \frac{2}{3} k$. So

$$\begin{aligned}
\mathbb{P}\left[\text{num accepts} \leq \tfrac{2}{3}k - t\right] &\leq e^{-\frac{2t^2}{k}} \quad \text{(by the Chernoff bound)} \\
\Rightarrow \mathbb{P}\left[\text{num accepts} \leq \tfrac{1}{2}k\right] &\leq e^{-\frac{k}{18}} \quad \text{(letting } t = k/6\text{)} \\
\Rightarrow k &\geq 18 \log\tfrac{1}{\delta} \quad \text{(setting } e^{-k/18} \leq \delta\text{)}
\end{aligned}$$

So, the probability that the majority of the $k$ runs falsely rejects (and thus $A'$ rejects) when $x \in \ell$ is bounded by $\delta$ if we set $k \geq 18 \log\frac{1}{\delta}$.

The proof that $A'$ rejects incorrect answers with probability at least $1 - \delta$ is similar. So, $A'$ is in BPP with the $1 - \delta$ bounds that we desired, if we set $k$ appropriately.
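A minimal sketch of this majority-vote amplification in Python (my own illustration; the "base algorithm" here is a stand-in that answers correctly with probability exactly 2/3, and the trial count is arbitrary):

    import math
    import random

    def amplify(base_algorithm, x, delta):
        # Run the base algorithm k = ceil(18 ln(1/delta)) times and take a majority vote.
        k = math.ceil(18 * math.log(1 / delta))
        accepts = sum(base_algorithm(x) for _ in range(k))
        return accepts > k / 2

    # Stand-in "BPP algorithm": answers correctly with probability exactly 2/3.
    def noisy_membership(x, truth=True):
        correct = random.random() < 2 / 3
        return truth if correct else not truth

    failures = sum(not amplify(noisy_membership, None, delta=0.01) for _ in range(1000))
    print(f"failure rate = {failures / 1000:.3f} (target: at most 0.01)")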

6.7 P/poly

P/poly is the set of all problems that are decidable by deterministic polynomial-time algorithms that can take in an advice string that depends on the size of the input, but not the input itself.

We'll show one relevant theoretical use of P/poly in a later section.

6.8 Containments

It is known that P ⊆ ZPP, but it is not known whether ZPP ⊆ P. It should also be apparent that ZPP = RP ∩ coRP, as stated above.

We claim that P ⊆ RP ⊆ NP, and that RP ⊆ BPP; here we will only prove that RP ⊆ NP.

Claim 15. RP ⊆ NP.

Proof:

Suppose $A$ is an RP algorithm for the language $\ell$. To prove that $\ell \in$ NP, it is sufficient to show that $A$ has polynomial runtime and that:

1. For every $x \in \ell$, there exists an advice string $r$ such that $A(x, r)$ accepts.

2. For every $x \notin \ell$, for all advice strings $r$, $A(x, r)$ rejects.

(1) is true since, by the definition of RP, $A$ accepts $x$ with probability at least $\frac{1}{2} > 0$, so there is at least one random string $r$ such that $A(x, r)$ accepts.

(2) is true since, by the definition of RP, $A$ accepts $x$ with probability 0, so there is no random string $r$ such that $A(x, r)$ accepts. So $\ell \in$ NP.

It is not known whether the following statements hold:

• BPP ⊂ NP?

• RP = ZPP = coRP?

7 Adleman’s Theorem

Theorem 16 (Adleman's Theorem). BPP ⊆ P/poly.

Proof: Let A ∈ BPP.

By Claim 14, there exists an algorithm $A' \in$ BPP for the same language with error probability $\delta = \frac{1}{2^{n+1}}$. Let $\mathrm{Bad}(x)$ be the set of advice strings $r$ such that $A'(x, r)$ is incorrect. So, over a uniformly random advice string $R$,
$$\begin{aligned}
\mathbb{P}[\exists x \text{ s.t. } R \in \mathrm{Bad}(x)] &= \mathbb{P}\left[\bigcup_{x \in \{0,1\}^n} R \in \mathrm{Bad}(x)\right] \\
&\leq \sum_{x \in \{0,1\}^n} \mathbb{P}[R \in \mathrm{Bad}(x)] \quad \text{(union bound)} \\
&\leq 2^n \cdot \frac{1}{2^{n+1}} \\
&= \frac{1}{2} < 1
\end{aligned}$$

Thus, there is a positive probability that $A'$ succeeds on every $x$. Thus there exists a random advice string (call it $r_n$) such that for all inputs $x$ of length $n$, $A'(x, r_n)$ gives the correct answer.

So, the algorithm $A'(x, r_n)$ is a polynomial-time deterministic algorithm that, given the appropriate advice string $r_n$, gives the correct answer for all inputs $x$ of length $n$. So $A' \in$ P/poly.

So, in a sense, randomness doesn't help that much. Every problem in BPP can be solved in polynomial time without randomness (assuming we have an advice string for the P/poly algorithm). However, randomized algorithms often give us a smaller polynomial runtime.

References

[MR] Rajeev Motwani and Prabhakar Raghavan. Randomized Algorithms. Cambridge University Press, 1995. ISBN 0-521-47465-5.
