CS70: Jean Walrand: Lecture 30. Bounds: An Overview

Markov & Chebyshev; A brief review Andrey Markov is best known for his work on stochastic processes. A primary subject of his research later became known as Markov I Bounds: chains and Markov processes. 1. Markov was one of his teachers. 2. Chebyshev Markov was an atheist. In 1912 he protested ’s excommunication from the I A brief review by requesting his I A mini-quizz own excommunication. The Church complied with his request.

Monotonicity

Let X be a RV. We write X ≥ 0 if all the possible values of X are nonnegative. That is, Pr[X ≥ 0] = 1.

It is obvious that X ≥ 0 ⇒ E[X] ≥ 0.

We write X ≥ Y if X − Y ≥ 0. Then,

    X ≥ Y ⇒ E[X] ≥ E[Y].

Proof: X ≥ Y ⇒ X − Y ≥ 0 ⇒ E[X] − E[Y] = E[X − Y] ≥ 0.

We say that expectation is monotone. (Which means monotonic, but not monotonous!)

Markov's inequality

The inequality is named after Andrey Markov, although it appeared earlier in the work of Pafnuty Chebyshev. It should be (and is sometimes) called Chebyshev's first inequality.

Theorem (Markov's Inequality)
Assume f : ℜ → [0,∞) is nondecreasing. Then,

    Pr[X ≥ a] ≤ E[f(X)]/f(a), for all a such that f(a) > 0.

Proof: Observe that

    1{X ≥ a} ≤ f(X)/f(a).

Indeed, if X < a, the inequality reads 0 ≤ f(X)/f(a), which holds since f(·) ≥ 0. Also, if X ≥ a, it reads 1 ≤ f(X)/f(a), which holds since f(·) is nondecreasing. Taking the expectation yields the inequality.

A picture

[Figure omitted: a graphical illustration of 1{X ≥ a} ≤ f(X)/f(a).]
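To see the inequality in action, here is a minimal Python sketch (not from the slides; the distribution and the values of a are illustrative) that estimates Pr[X ≥ a] by sampling a nonnegative RV and compares it with the f(x) = x bound E[X]/a:

    # Sanity check of Markov's inequality with f(x) = x:
    # Pr[X >= a] <= E[X]/a for a nonnegative random variable.
    import random

    random.seed(0)
    samples = [random.expovariate(1.0) for _ in range(100_000)]  # X >= 0, E[X] = 1
    mean = sum(samples) / len(samples)

    for a in [1, 2, 5, 10]:
        tail = sum(x >= a for x in samples) / len(samples)
        print(f"a = {a}: Pr[X >= a] ~ {tail:.4f} <= E[X]/a ~ {mean / a:.4f}")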

Markov Inequality Example: G(p)

Let X = G(p). Recall that E[X] = 1/p and E[X²] = (2 − p)/p².

Choosing f(x) = x, we get

    Pr[X ≥ a] ≤ E[X]/a = 1/(ap).

Choosing f(x) = x², we get

    Pr[X ≥ a] ≤ E[X²]/a² = (2 − p)/(p²a²).

Markov Inequality Example: P(λ)

Let X = P(λ). Recall that E[X] = λ and E[X²] = λ + λ².

Choosing f(x) = x, we get

    Pr[X ≥ a] ≤ E[X]/a = λ/a.

Choosing f(x) = x², we get

    Pr[X ≥ a] ≤ E[X²]/a² = (λ + λ²)/a².

Chebyshev's Inequality

This is Pafnuty's inequality:

Theorem:

    Pr[|X − E[X]| > a] ≤ var[X]/a², for all a > 0.

Proof: Let Y = |X − E[X]| and f(y) = y². Then,

    Pr[Y ≥ a] ≤ E[f(Y)]/f(a) = var[X]/a².

This result confirms that the variance measures the "deviations from the mean."
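The two example slides can be checked numerically. A small Python sketch (illustrative parameter values, not from the slides) compares the exact tails of G(p) and P(λ) with the f(x) = x and f(x) = x² bounds:

    # Exact tails vs. the two Markov bounds for G(p) and P(lam).
    import math

    p, lam, a = 0.2, 3.0, 10

    # Geometric: Pr[X >= a] = (1 - p)^(a - 1) for integer a >= 1.
    geo_tail = (1 - p) ** (a - 1)
    print(f"G(p):   tail = {geo_tail:.4f}; "
          f"bounds: {1 / (a * p):.4f} (f=x), {(2 - p) / (p**2 * a**2):.4f} (f=x^2)")

    # Poisson: Pr[X >= a] = 1 - sum_(m < a) lam^m e^(-lam) / m!.
    poi_tail = 1 - sum(lam**m * math.exp(-lam) / math.factorial(m) for m in range(a))
    print(f"P(lam): tail = {poi_tail:.4f}; "
          f"bounds: {lam / a:.4f} (f=x), {(lam + lam**2) / a**2:.4f} (f=x^2)")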

Chebyshev and P(λ)

Let X = P(λ). Then, E[X] = λ and var[X] = λ. Thus,

    Pr[|X − λ| ≥ a] ≤ var[X]/a² = λ/a².

Also,

    a > λ ⇒ Pr[X ≥ a] ≤ Pr[|X − λ| ≥ a − λ] ≤ λ/(a − λ)².

Fraction of H's

Here is a classical application of Chebyshev's inequality. How likely is it that the fraction of H's differs from 50%?

Let Xm = 1 if the m-th flip of a fair coin is H and Xm = 0 otherwise. Define

    Yn = (X1 + ··· + Xn)/n, for n ≥ 1.

We want to estimate

    Pr[|Yn − 0.5| ≥ 0.1] = Pr[Yn ≤ 0.4 or Yn ≥ 0.6].

By Chebyshev,

    Pr[|Yn − 0.5| ≥ 0.1] ≤ var[Yn]/(0.1)² = 100 var[Yn].

Now,

    var[Yn] = (1/n²)(var[X1] + ··· + var[Xn]) = var[X1]/n = 1/(4n).

Hence,

    Pr[|Yn − 0.5| ≥ 0.1] ≤ 25/n.

For n = 1,000, we find that this probability is less than 2.5%. As n → ∞, this probability goes to zero. We look at a general case next.

Weak Law of Large Numbers

Theorem (Weak Law of Large Numbers)
Let X1, X2, ... be independent with the same distribution. Then, for all ε > 0,

    Pr[|(X1 + ··· + Xn)/n − E[X1]| ≥ ε] → 0, as n → ∞.

Proof: Let Yn = (X1 + ··· + Xn)/n. Then

    Pr[|Yn − E[X1]| ≥ ε] ≤ var[Yn]/ε² = var[X1]/(nε²) → 0, as n → ∞.

Summary

Bounds!

- Markov: Pr[X ≥ a] ≤ E[f(X)]/f(a), where f ≥ 0 is nondecreasing and f(a) > 0
- Chebyshev: Pr[|X − E[X]| ≥ a] ≤ var[X]/a²

A brief review - 1

- Framework: Ω, Pr[ω], Pr[A] = ∑_{ω ∈ A} Pr[ω]
- Conditional Probability and Bayes' Rule:

    Pr[A|B] := Pr[A ∩ B]/Pr[B];  Pr[Am|B] = Pr[Am]Pr[B|Am] / ∑_k Pr[Ak]Pr[B|Ak]

- Independence: A, B independent ⇔ Pr[A ∩ B] = Pr[A]Pr[B]
- Mutual Independence: A, B, C, ... mutually independent ⇔ ···
- Random Variable: X : Ω → ℜ
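The 25/n bound is quite loose in practice. A quick Monte Carlo sketch in Python (illustrative, not from the slides) compares the empirical deviation probability with the Chebyshev bound:

    # Empirical Pr[|Yn - 0.5| >= 0.1] vs. the Chebyshev bound 25/n
    # for the fraction of heads in n fair-coin flips.
    import random

    random.seed(0)

    def deviation_prob(n, trials=10_000):
        """Estimate Pr[|Yn - 0.5| >= 0.1] by Monte Carlo."""
        hits = 0
        for _ in range(trials):
            heads = sum(random.random() < 0.5 for _ in range(n))
            if abs(heads / n - 0.5) >= 0.1:
                hits += 1
        return hits / trials

    for n in [10, 100, 1000]:
        print(f"n = {n}: empirical ~ {deviation_prob(n):.4f}, bound = {25 / n:.4f}")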

A brief review - 2

- Expectation: E[X] := ∑_a a × Pr[X = a] = ∑_ω X(ω)Pr[ω]
- Linearity: E[aX + bY] = aE[X] + bE[Y]
- Independent RVs: X, Y are independent if and only if Pr[X ∈ A, Y ∈ B] = Pr[X ∈ A]Pr[Y ∈ B]
- If independent, then
  - f(X), g(Y) independent
  - E[XY] = E[X]E[Y]
  - var[X + Y] = var[X] + var[Y]

A brief review - 3

- Uniform distribution: U[1,...,n]
- Random experiment: Pick a number uniformly in {1,...,n}
- Definition: Pr[X = m] = 1/n, m = 1,...,n
- Mean: E[X] = (n + 1)/2; calculation
- Variance: var[X] = (n² − 1)/12; calculation

A brief review - 4

- Geometric distribution: G(p), p ∈ [0,1]
- Random experiment: Number of flips until first H
- Definition: Pr[X = m] = (1 − p)^(m−1) p, m ≥ 1
- Mean: E[X] = 1/p; "trick" or renewal
- Variance: var[X] = (1 − p)/p²; "trick" or renewal
- Renewal: X = 1 + ZY, Y ≡ X, Pr[Z = 1] = q := 1 − p

    E[X] = 1 + qE[X];  E[X²] = 1 + 2qE[X] + qE[X²].
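The renewal identities can be checked against a direct simulation of "flips until the first H". The following Python sketch (the value of p is illustrative) compares the empirical mean and variance with 1/p and (1 − p)/p²:

    # Geometric G(p): simulate flips until the first H and compare
    # with the renewal answers E[X] = 1/p, var[X] = (1-p)/p^2.
    import random

    random.seed(0)
    p = 0.3

    def flips_until_heads():
        n = 1
        while random.random() >= p:  # tails, with probability q = 1 - p
            n += 1
        return n

    xs = [flips_until_heads() for _ in range(100_000)]
    m1 = sum(xs) / len(xs)
    m2 = sum(x * x for x in xs) / len(xs)

    print(f"E[X]   ~ {m1:.3f} vs 1/p = {1 / p:.3f}")
    print(f"var[X] ~ {m2 - m1**2:.3f} vs (1-p)/p^2 = {(1 - p) / p**2:.3f}")
    # Renewal: solving E[X] = 1 + q E[X] gives E[X] = 1/(1-q) = 1/p.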

A brief review - 5

- Binomial distribution: B(n,p), n ≥ 1, p ∈ [0,1]
- Random experiment: Number of heads out of n flips
- Definition: Pr[X = m] = C(n,m) p^m (1 − p)^(n−m), m = 0,1,...,n
- Mean: E[X] = np; X = X1 + ··· + Xn, E[X] = nE[X1], by linearity
- Variance: var[X] = np(1 − p); X = X1 + ··· + Xn; var[X] = n var[X1], by independence

A brief review - 6

- Poisson distribution: P(λ), λ > 0
- Random experiment: Number of H's out of n ≫ 1 flips with p = λ/n: ≈ B(n, λ/n)
- Definition: Pr[X = m] = (λ^m/m!) e^(−λ), m ≥ 0
- Mean: E[X] = λ; E[X] = n(λ/n) = λ
- Variance: var[X] = λ; var[X] = n(λ/n)(1 − λ/n) = λ − λ²/n ≈ λ since n ≫ 1

A mini-quiz

True or False:

- Probability Space:
  - Pr[A ∪ B] = Pr[A] + Pr[B]. False; true iff disjoint.
  - Pr[A ∩ B] = Pr[A]Pr[B]. False; true iff independent.
  - A ∩ B = ∅ ⇒ A, B independent. False.
  - For all A, B, one has Pr[A|B] ≥ Pr[A]. False.
  - Pr[A ∩ B ∩ C] = Pr[A]Pr[B|A]Pr[C|B]. False.
- Random Variables:
  - For all X, Y, one has E[XY] = E[X]E[Y]. False; true if (not iff) independent.
  - var[3X + 1] = 3var[X]. False; = 9var[X].
  - var[2X + 3Y] = 4var[X] + 9var[Y]. False; true if (not iff) independent.
  - E[2X + 3Y + 1] = 2E[X] + 3E[Y] + 1. True.
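The approximation B(n, λ/n) ≈ P(λ) for n ≫ 1 is easy to see numerically. A short Python sketch (illustrative values, not from the slides) prints both PMFs for small m:

    # Poisson as the limit of B(n, lam/n): compare the PMFs at m = 0..4.
    from math import comb, exp, factorial

    lam = 2.0
    for n in [10, 100, 10_000]:
        p = lam / n
        row = (comb(n, m) * p**m * (1 - p)**(n - m) for m in range(5))
        print(f"B({n:>6}, lam/n):", " ".join(f"{x:.4f}" for x in row))
    print("P(lam):         ",
          " ".join(f"{lam**m * exp(-lam) / factorial(m):.4f}" for m in range(5)))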

Inequalities: Some Applications

Below, Y is measured and used to construct a guess g(Y) about X. Here are some possible goals.

- Detection: Pr[X ≠ g(Y)] ≤ 10⁻⁸ (e.g., bit error rate)
- Estimation: Pr[|X − g(Y)| > a] ≤ 5% (e.g., polling)
- Hypothesis Testing: Pr[g(Y) = 1 | X = 0] ≤ 5% and Pr[g(Y) = 0 | X = 1] ≤ 5% (e.g., fire alarm)
- Design: Pr[buffer overflow] ≤ 5%
- Control: Pr[crash] ≤ 10⁻⁶

Many situations where exact probabilities are difficult to compute but bounds can be found.
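As one concrete illustration of the estimation goal, Chebyshev converts a polling accuracy target into a sample size. A minimal Python sketch (the margin a and the failure probability δ are the 5% example values above; it uses var[X1] ≤ 1/4 for a {0,1} poll response, as in the fraction-of-H's slide):

    # Estimation via Chebyshev: if Yn is the sample fraction, then
    # Pr[|Yn - p| > a] <= var[Yn]/a^2 <= 1/(4 n a^2), since var[X1] <= 1/4.
    # Requiring this to be <= delta gives n >= 1/(4 a^2 delta).
    import math

    a, delta = 0.05, 0.05  # margin of error, failure probability
    n = math.ceil(1 / (4 * a**2 * delta))
    print(f"Chebyshev-safe poll size: n >= {n}")  # prints n >= 2000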