CS70: Jean Walrand: Lecture 30
Markov & Chebyshev; A brief review

▶ Bounds: 1. Markov 2. Chebyshev
▶ A brief review
▶ A mini-quiz

Bounds: An Overview

Andrey Markov

Andrey Markov is best known for his work on stochastic processes. A primary subject of his research later became known as Markov chains and Markov processes. Pafnuty Chebyshev was one of his teachers. Markov was an atheist. In 1912 he protested Leo Tolstoy's excommunication from the Russian Orthodox Church by requesting his own excommunication. The Church complied with his request.

Monotonicity

Let X be a RV. We write X ≥ 0 if all the possible values of X are nonnegative. That is, Pr[X ≥ 0] = 1. It is obvious that

  X ≥ 0 ⇒ E[X] ≥ 0.

We write X ≥ Y if X − Y ≥ 0. Then,

  X ≥ Y ⇒ X − Y ≥ 0 ⇒ E[X] − E[Y] = E[X − Y] ≥ 0.

Hence,

  X ≥ Y ⇒ E[X] ≥ E[Y].

We say that expectation is monotone. (Which means monotonic, but not monotonous!)

Markov's Inequality

The inequality is named after Andrey Markov, although it appeared earlier in the work of Pafnuty Chebyshev. It should be (and is sometimes) called Chebyshev's first inequality.

Theorem (Markov's Inequality). Assume f : ℝ → [0, ∞) is nondecreasing. Then,

  Pr[X ≥ a] ≤ E[f(X)]/f(a), for all a such that f(a) > 0.

Proof: Observe that

  1{X ≥ a} ≤ f(X)/f(a).

Indeed, if X < a, the inequality reads 0 ≤ f(X)/f(a), which holds since f(·) ≥ 0. Also, if X ≥ a, it reads 1 ≤ f(X)/f(a), which holds since f(·) is nondecreasing, so that f(X) ≥ f(a). Taking the expectation yields the inequality, since E[1{X ≥ a}] = Pr[X ≥ a].

Markov Inequality Example: G(p)

Let X = G(p). Recall that E[X] = 1/p and E[X²] = (2 − p)/p². Choosing f(x) = x, we get

  Pr[X ≥ a] ≤ E[X]/a = 1/(ap).

Choosing f(x) = x², we get

  Pr[X ≥ a] ≤ E[X²]/a² = (2 − p)/(p²a²).

(These bounds are evaluated numerically in the sketch after the summary below.)

Markov Inequality Example: P(λ)

Let X = P(λ). Recall that E[X] = λ and E[X²] = λ + λ². Choosing f(x) = x, we get

  Pr[X ≥ a] ≤ E[X]/a = λ/a.

Choosing f(x) = x², we get

  Pr[X ≥ a] ≤ E[X²]/a² = (λ + λ²)/a².

Chebyshev's Inequality

This is Pafnuty's inequality:

Theorem. Pr[|X − E[X]| > a] ≤ var[X]/a², for all a > 0.

Proof: Let Y = |X − E[X]| and f(y) = y². Then, by Markov's inequality,

  Pr[Y ≥ a] ≤ E[f(Y)]/f(a) = E[(X − E[X])²]/a² = var[X]/a².

This result confirms that the variance measures the "deviations from the mean."

Chebyshev and P(λ)

Let X = P(λ). Then, E[X] = λ and var[X] = λ. Thus,

  Pr[|X − λ| ≥ a] ≤ var[X]/a² = λ/a².

Also,

  a > λ ⇒ Pr[X ≥ a] ≤ Pr[|X − λ| ≥ a − λ] ≤ λ/(a − λ)².

Fraction of H's

Here is a classical application of Chebyshev's inequality. How likely is it that the fraction of H's differs from 50%?

Let Xm = 1 if the m-th flip of a fair coin is H and Xm = 0 otherwise. Define

  Yn = (X1 + ··· + Xn)/n, for n ≥ 1.

We want to estimate

  Pr[|Yn − 0.5| ≥ 0.1] = Pr[Yn ≤ 0.4 or Yn ≥ 0.6].

By Chebyshev,

  Pr[|Yn − 0.5| ≥ 0.1] ≤ var[Yn]/(0.1)² = 100 var[Yn].

Now,

  var[Yn] = (1/n²)(var[X1] + ··· + var[Xn]) = (1/n) var[X1] = 1/(4n).

Hence,

  Pr[|Yn − 0.5| ≥ 0.1] ≤ 25/n.

For n = 1,000, we find that this probability is less than 2.5%. As n → ∞, this probability goes to zero. (A simulation of this example appears after the summary below.) We look at the general case next.

Weak Law of Large Numbers

Theorem (Weak Law of Large Numbers). Let X1, X2, ... be independent with the same distribution. Then, for all ε > 0,

  Pr[|(X1 + ··· + Xn)/n − E[X1]| ≥ ε] → 0, as n → ∞.

Proof: Let Yn = (X1 + ··· + Xn)/n. Then, by Chebyshev,

  Pr[|Yn − E[X1]| ≥ ε] ≤ var[Yn]/ε² = var[X1]/(nε²) → 0, as n → ∞.

Summary

Bounds!

▶ Markov: Pr[X ≥ a] ≤ E[f(X)]/f(a), where f ≥ 0 is nondecreasing and f(a) > 0
▶ Chebyshev: Pr[|X − E[X]| ≥ a] ≤ var[X]/a²
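To see how tight these bounds are, here is a minimal Python sketch (not part of the lecture; all function names and parameter values are illustrative) that compares the exact tail Pr[X ≥ a] of X = G(p) with the two Markov bounds computed above and with the shifted-Chebyshev argument used earlier for P(λ), applied here to G(p).

```python
# Exact tail vs. bounds for X ~ G(p). For the geometric distribution,
# Pr[X >= a] = (1 - p)^(a - 1): the first a - 1 flips must all be T.

def geometric_tail(p: float, a: int) -> float:
    """Exact Pr[X >= a] for X ~ G(p), integer a >= 1."""
    return (1 - p) ** (a - 1)

def markov_first(p: float, a: int) -> float:
    """Markov with f(x) = x: E[X]/a = 1/(a p)."""
    return 1 / (a * p)

def markov_second(p: float, a: int) -> float:
    """Markov with f(x) = x^2: E[X^2]/a^2 = (2 - p)/(p^2 a^2)."""
    return (2 - p) / (p**2 * a**2)

def chebyshev_shifted(p: float, a: int) -> float:
    """Pr[X >= a] <= Pr[|X - 1/p| >= a - 1/p] <= var[X]/(a - 1/p)^2,
    valid when a > E[X] = 1/p, with var[X] = (1 - p)/p^2."""
    mean, var = 1 / p, (1 - p) / p**2
    return var / (a - mean) ** 2 if a > mean else 1.0

p = 0.5
for a in (5, 10, 20):
    print(a, geometric_tail(p, a), markov_first(p, a),
          markov_second(p, a), chebyshev_shifted(p, a))
```

For p = 0.5 and a = 5, for instance, the exact tail is 0.0625 while the three bounds give 0.4, 0.24, and about 0.22: all valid, all loose, with the higher-moment bounds improving as a grows.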
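The fraction-of-H's example can also be checked by simulation. This is an illustrative sketch (not from the lecture) using only Python's standard random module: it estimates Pr[|Yn − 0.5| ≥ 0.1] by repeating the n-flip experiment many times and prints the Chebyshev bound 25/n next to it. The bound holds but is quite loose, especially for large n.

```python
import random

def deviation_probability(n: int, trials: int = 10_000) -> float:
    """Estimate Pr[|Y_n - 0.5| >= 0.1], where Y_n is the fraction of H's
    in n fair coin flips, by repeating the experiment `trials` times."""
    hits = 0
    for _ in range(trials):
        heads = sum(random.random() < 0.5 for _ in range(n))
        if abs(heads / n - 0.5) >= 0.1:
            hits += 1
    return hits / trials

for n in (10, 100, 1000):
    # Empirical probability vs. the Chebyshev bound 25/n derived above.
    print(n, deviation_probability(n), 25 / n)
```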
A brief review - 1

▶ Framework: Ω, Pr[ω], Pr[A] = ∑_{ω∈A} Pr[ω]
▶ Conditional Probability and Bayes' Rule:
  Pr[A|B] := Pr[A ∩ B]/Pr[B], Pr[Am|B] = Pr[Am]Pr[B|Am] / ∑k Pr[Ak]Pr[B|Ak]
▶ Independence: A, B independent ⇔ Pr[A ∩ B] = Pr[A]Pr[B]
▶ Mutual Independence: A, B, C, ... mutually independent ⇔ ···
▶ Random Variable: X : Ω → ℝ

A brief review - 2

▶ Expectation: E[X] := ∑_a a × Pr[X = a] = ∑_ω X(ω)Pr[ω]; linearity: E[aX + bY] = aE[X] + bE[Y]
▶ Independent RVs: X, Y are independent if and only if Pr[X ∈ A, Y ∈ B] = Pr[X ∈ A]Pr[Y ∈ B] for all A, B
▶ If independent, then
  ▶ f(X), g(Y) are independent
  ▶ E[XY] = E[X]E[Y]
  ▶ var[X + Y] = var[X] + var[Y]

A brief review - 3

▶ Uniform distribution: U[1, ..., n]
▶ Random experiment: pick a number uniformly in {1, ..., n}
▶ Definition: Pr[X = m] = 1/n, m = 1, ..., n
▶ Mean: E[X] = (n + 1)/2; calculation
▶ Variance: var[X] = (n² − 1)/12; calculation

A brief review - 4

▶ Geometric distribution: G(p), p ∈ [0, 1]
▶ Random experiment: number of flips until the first H
▶ Definition: Pr[X = m] = (1 − p)^(m−1) p, m ≥ 1
▶ Mean: E[X] = 1/p; "trick" or renewal
▶ Variance: var[X] = (1 − p)/p²; "trick" or renewal
▶ Renewal: X = 1 + ZY with Y ≡ X and Pr[Z = 1] = q := 1 − p, so that
  E[X] = 1 + qE[X] and E[X²] = 1 + 2qE[X] + qE[X²]

A brief review - 5

▶ Binomial distribution: B(n, p), n ≥ 1, p ∈ [0, 1]
▶ Random experiment: number of heads out of n flips
▶ Definition: Pr[X = m] = C(n, m) p^m (1 − p)^(n−m), m = 0, 1, ..., n
▶ Mean: E[X] = np; X = X1 + ··· + Xn, so E[X] = nE[X1] by linearity
▶ Variance: var[X] = np(1 − p); var[X] = n var[X1] by independence

A brief review - 6

▶ Poisson distribution: P(λ), λ > 0
▶ Random experiment: number of H's out of n ≫ 1 flips with p = λ/n, i.e., B(n, λ/n)
▶ Definition: Pr[X = m] = (λ^m/m!) e^(−λ), m ≥ 0
▶ Mean: E[X] = λ; E[X] = n(λ/n) = λ
▶ Variance: var[X] = λ; var[X] = n(λ/n)(1 − λ/n) = λ − λ²/n ≈ λ since n ≫ 1

(A simulation checking these means and variances appears at the end of the section.)

A mini-quiz

True or False:

▶ Probability Space:
  ▶ Pr[A ∪ B] = Pr[A] + Pr[B]. False; true iff disjoint.
  ▶ Pr[A ∩ B] = Pr[A]Pr[B]. False; true iff independent.
  ▶ A ∩ B = ∅ ⇒ A, B independent. False.
  ▶ For all A, B, one has Pr[A|B] ≥ Pr[A]. False.
  ▶ Pr[A ∩ B ∩ C] = Pr[A]Pr[B|A]Pr[C|B]. False.
▶ Random Variables:
  ▶ For all X, Y, one has E[XY] = E[X]E[Y]. False; true if (not iff) independent.
  ▶ var[3X + 1] = 3var[X]. False; = 9var[X].
  ▶ var[2X + 3Y] = 4var[X] + 9var[Y]. False; true if (not iff) independent.
  ▶ E[2X + 3Y + 1] = 2E[X] + 3E[Y] + 1. True.

Inequalities: Some Applications

Below, Y is measured and used to construct a guess g(Y) about X. Here are some possible goals.

▶ Detection: Pr[X ≠ g(Y)] ≤ 10⁻⁸ (e.g., bit error rate)
▶ Estimation: Pr[|X − g(Y)| > a] ≤ 5% (e.g., polling; see the sketch below)
▶ Hypothesis Testing: Pr[g(Y) = 1 | X = 0] ≤ 5% and Pr[g(Y) = 0 | X = 1] ≤ 5% (e.g., fire alarm)
▶ Design: Pr[buffer overflow] ≤ 5%
▶ Control: Pr[crash] ≤ 10⁻⁶

There are many situations where exact probabilities are difficult to compute but bounds can be found.
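As a sanity check on the review slides, here is a short illustrative script (not from the lecture; the sample sizes and parameter values are arbitrary) that estimates the mean and variance of the four distributions by simulation and compares them with the formulas above. The Poisson sample is drawn as B(m, λ/m) with m ≫ 1, exactly the limiting construction from the review slide.

```python
import random
import statistics

N = 10_000  # samples per distribution
n, p, lam = 10, 0.3, 4.0

def check(name, sample, mean_theory, var_theory):
    """Print empirical mean/variance next to the formulas from the slides."""
    print(f"{name}: mean {statistics.mean(sample):.2f} vs {mean_theory:.2f}, "
          f"var {statistics.pvariance(sample):.2f} vs {var_theory:.2f}")

def geometric(p: float) -> int:
    """Number of flips until the first H."""
    flips = 1
    while random.random() >= p:
        flips += 1
    return flips

check("U[1..n]", [random.randint(1, n) for _ in range(N)],
      (n + 1) / 2, (n * n - 1) / 12)
check("G(p)", [geometric(p) for _ in range(N)],
      1 / p, (1 - p) / p**2)
check("B(n,p)", [sum(random.random() < p for _ in range(n)) for _ in range(N)],
      n * p, n * p * (1 - p))
m = 1000  # P(lam) as the limit of B(m, lam/m) with m >> 1, as in the slide
check("P(lam)", [sum(random.random() < lam / m for _ in range(m)) for _ in range(N)],
      lam, lam)
```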
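The "Estimation" goal can be made concrete with Chebyshev, as in the fraction-of-H's example: to guarantee Pr[|Yn − p| ≥ a] ≤ 5% for a poll, it suffices that var[Yn]/a² = p(1 − p)/(na²) ≤ 1/(4na²) ≤ 5%, using the worst-case Bernoulli variance p(1 − p) ≤ 1/4. A hypothetical helper (illustrative, not from the lecture) that computes the resulting sample size:

```python
import math

def chebyshev_sample_size(margin: float, failure_prob: float) -> int:
    """Smallest n such that Chebyshev guarantees
    Pr[|Y_n - p| >= margin] <= failure_prob for any Bernoulli(p) population,
    using the worst-case variance p(1 - p) <= 1/4:
        var[Y_n]/margin^2 <= 1/(4 n margin^2) <= failure_prob.
    """
    return math.ceil(1 / (4 * margin**2 * failure_prob))

# A poll guaranteeing Pr[|Y_n - p| >= 0.1] <= 5%:
print(chebyshev_sample_size(0.1, 0.05))  # -> 500
```

This matches the 25/n bound above: 25/n ≤ 0.05 exactly when n ≥ 500.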