CS70: Jean Walrand: Lecture 30.

Markov & Chebyshev; A brief review

I Bounds: 1. Markov 2. Chebyshev

I A brief review

I A mini-quiz

Bounds: An Overview

Andrey Markov

Andrey Markov is best known for his work on stochastic processes. A primary subject of his research later became known as Markov chains and Markov processes. Pafnuty Chebyshev was one of his teachers. Markov was an atheist. In 1912 he protested Leo Tolstoy’s excommunication from the Russian Orthodox Church by requesting his own excommunication. The Church complied with his request.

Monotonicity

Let X be a RV. We write X ≥ 0 if all the possible values of X are nonnegative. That is, Pr[X ≥ 0] = 1. It is obvious that X ≥ 0 ⇒ E[X] ≥ 0. We write X ≥ Y if X − Y ≥ 0. Then,

X ≥ Y ⇒ X − Y ≥ 0 ⇒ E[X] − E[Y ] = E[X − Y ] ≥ 0.

Hence, X ≥ Y ⇒ E[X] ≥ E[Y ].

We say that expectation is monotone. (Which means monotonic, but not monotonous!)

Markov’s Inequality

The inequality is named after Andrey Markov, although it appeared earlier in the work of Pafnuty Chebyshev. It should be (and is sometimes) called Chebyshev’s first inequality.

Theorem (Markov’s Inequality). Assume f : ℜ → [0,∞) is nondecreasing. Then,

Pr[X ≥ a] ≤ E[f(X)]/f(a), for all a such that f(a) > 0.

Proof: Observe that

1{X ≥ a} ≤ f(X)/f(a).

Indeed, if X < a, the inequality reads 0 ≤ f(X)/f(a), which holds since f(·) ≥ 0. Also, if X ≥ a, it reads 1 ≤ f(X)/f(a), which holds since f(·) is nondecreasing. Taking the expectation yields the inequality.

[Figure: a picture illustrating the proof.]

Markov Inequality Example: G(p)

Let X = G(p). Recall that E[X] = 1/p and E[X²] = (2 − p)/p².

Choosing f(x) = x, we get

Pr[X ≥ a] ≤ E[X]/a = 1/(ap).

Choosing f(x) = x², we get

Pr[X ≥ a] ≤ E[X²]/a² = (2 − p)/(p²a²).
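These bounds are easy to test numerically. Here is a minimal sketch in Python (assuming NumPy is available; p = 0.3 and a = 10 are illustrative choices), comparing both bounds to the exact tail Pr[X ≥ a] = (1 − p)^(a−1):

```python
import numpy as np

# Compare the two Markov bounds for X = G(p) against the exact tail
# Pr[X >= a] = (1 - p)^(a - 1) and an empirical estimate.
p, a = 0.3, 10                                # illustrative parameters
rng = np.random.default_rng(0)
samples = rng.geometric(p, size=100_000)      # flips until first H

exact = (1 - p) ** (a - 1)
empirical = np.mean(samples >= a)
bound_x = 1 / (a * p)                         # Markov with f(x) = x
bound_x2 = (2 - p) / (p**2 * a**2)            # Markov with f(x) = x^2

print(f"exact {exact:.4f}, empirical {empirical:.4f}")
print(f"bound (f=x) {bound_x:.4f}, bound (f=x^2) {bound_x2:.4f}")
```

Note that the f(x) = x² bound is tighter here, though both sit well above the exact tail; Markov trades precision for generality.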

Markov Inequality Example: P(λ)

Let X = P(λ). Recall that E[X] = λ and E[X²] = λ + λ².

Choosing f(x) = x, we get

Pr[X ≥ a] ≤ E[X]/a = λ/a.

Choosing f(x) = x², we get

Pr[X ≥ a] ≤ E[X²]/a² = (λ + λ²)/a².
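The same comparison for the Poisson case, again a sketch with NumPy and illustrative parameters λ = 2 and a = 6:

```python
import numpy as np

# Markov bounds for X = P(lambda) versus an empirical tail estimate.
lam, a = 2.0, 6                               # illustrative parameters
rng = np.random.default_rng(1)
samples = rng.poisson(lam, size=100_000)

empirical = np.mean(samples >= a)
bound_x = lam / a                             # Markov with f(x) = x
bound_x2 = (lam + lam**2) / a**2              # Markov with f(x) = x^2

print(f"empirical {empirical:.4f}, bound (f=x) {bound_x:.4f}, "
      f"bound (f=x^2) {bound_x2:.4f}")
```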

Chebyshev’s Inequality

This is Pafnuty’s inequality:

Theorem: Pr[|X − E[X]| > a] ≤ var[X]/a², for all a > 0.

Proof: Let Y = |X − E[X]| and f(y) = y². Then,

Pr[Y ≥ a] ≤ E[f(Y)]/f(a) = var[X]/a².

This result confirms that the variance measures the “deviations from the mean.”
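A minimal numerical check of the theorem, sketched with NumPy on a Poisson sample (λ = 4 and a = 5 are illustrative; recall that var[X] = λ there):

```python
import numpy as np

# Chebyshev: Pr[|X - E[X]| >= a] <= var[X]/a^2, checked on P(lambda).
lam, a = 4.0, 5.0                             # illustrative parameters
rng = np.random.default_rng(2)
samples = rng.poisson(lam, size=100_000)

empirical = np.mean(np.abs(samples - lam) >= a)
bound = lam / a**2                            # var[X] = lambda for Poisson

print(f"empirical {empirical:.4f} <= bound {bound:.4f}")
```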

Chebyshev and P(λ)

Let X = P(λ). Then, E[X] = λ and var[X] = λ. Thus,

Pr[|X − λ| ≥ a] ≤ var[X]/a² = λ/a².

Moreover, for a > λ,

Pr[X ≥ a] ≤ Pr[|X − λ| ≥ a − λ] ≤ λ/(a − λ)².

Fraction of H’s

Here is a classical application of Chebyshev’s inequality. How likely is it that the fraction of H’s differs from 50%?

Let Xm = 1 if the m-th flip of a fair coin is H and Xm = 0 otherwise. Define

Yn = (X1 + ··· + Xn)/n, for n ≥ 1.

We want to estimate

Pr[|Yn − 0.5| ≥ 0.1] = Pr[Yn ≤ 0.4 or Yn ≥ 0.6].

By Chebyshev,

Pr[|Yn − 0.5| ≥ 0.1] ≤ var[Yn]/(0.1)² = 100·var[Yn].

Now,

var[Yn] = (1/n²)(var[X1] + ··· + var[Xn]) = (1/n)·var[X1] = 1/(4n).

Hence,

Pr[|Yn − 0.5| ≥ 0.1] ≤ 25/n.

For n = 1,000, we find that this probability is less than 2.5%. As n → ∞, this probability goes to zero. We look at a general case next.
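Before moving on, a short simulation sketch (NumPy assumed; 5,000 experiments per n is an arbitrary choice) showing the empirical probability sitting below 25/n:

```python
import numpy as np

# Empirical check that Pr[|Y_n - 0.5| >= 0.1] <= 25/n for a fair coin.
rng = np.random.default_rng(3)
for n in (100, 1_000, 10_000):
    # 5,000 experiments, each consisting of n fair coin flips
    flips = rng.integers(0, 2, size=(5_000, n), dtype=np.int8)
    Y = flips.mean(axis=1)                    # fraction of H's per experiment
    empirical = np.mean(np.abs(Y - 0.5) >= 0.1)
    print(f"n={n:>6}: empirical {empirical:.5f}, bound {25/n:.5f}")
```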

Weak Law of Large Numbers

Theorem (Weak Law of Large Numbers)

Let X1, X2, ... be independent with the same distribution. Then, for all ε > 0,

Pr[|(X1 + ··· + Xn)/n − E[X1]| ≥ ε] → 0, as n → ∞.

Proof: Let Yn = (X1 + ··· + Xn)/n. Then,

Pr[|Yn − E[X1]| ≥ ε] ≤ var[Yn]/ε² = var[X1]/(nε²) → 0, as n → ∞.
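The convergence can be watched directly. A sketch with i.i.d. exponential samples of mean 1 (the distribution is an arbitrary choice; any one with finite variance works):

```python
import numpy as np

# Watch Y_n = (X_1 + ... + X_n)/n approach E[X_1] = 1 as n grows.
rng = np.random.default_rng(4)
samples = rng.exponential(scale=1.0, size=100_000)
running_avg = np.cumsum(samples) / np.arange(1, samples.size + 1)

for n in (10, 1_000, 100_000):
    print(f"n={n:>7}: Y_n = {running_avg[n - 1]:.4f}")
```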

Summary

Bounds!

I Markov: Pr[X ≥ a] ≤ E[f(X)]/f(a), where f is nondecreasing and f(a) > 0
I Chebyshev: Pr[|X − E[X]| ≥ a] ≤ var[X]/a²

A brief review - 1

I Framework: Ω, Pr[ω], Pr[A] = ∑ω∈A Pr[ω]

I Conditional Probability and Bayes’ Rule

Pr[A|B] := Pr[A ∩ B]/Pr[B];  Pr[Am|B] = Pr[Am]Pr[B|Am] / ∑k Pr[Ak]Pr[B|Ak]

(A numerical sketch of Bayes’ Rule follows this list.)

I Independence: A,B ind. ⇔ Pr[A ∩ B] = Pr[A]Pr[B]

I Mutual Independence: A,B,C,... mutually independent ⇔ ···

I Random Variable: X : Ω → ℜ
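As promised, a small numerical illustration of Bayes’ Rule. The scenario and its numbers are hypothetical, chosen only to exercise the formula: two coins, one fair and one with Pr[H] = 0.9; we pick one uniformly at random and observe H.

```python
# Hypothetical two-coin example of Bayes' Rule.
prior = {"fair": 0.5, "biased": 0.5}          # Pr[A_m]
likelihood_H = {"fair": 0.5, "biased": 0.9}   # Pr[B | A_m], B = "we see H"

# Total probability: Pr[B] = sum_k Pr[A_k] Pr[B | A_k]
pr_H = sum(prior[k] * likelihood_H[k] for k in prior)

# Bayes' Rule: Pr[A_m | B] = Pr[A_m] Pr[B | A_m] / Pr[B]
posterior = {k: prior[k] * likelihood_H[k] / pr_H for k in prior}
print(posterior)   # {'fair': 0.357..., 'biased': 0.642...}
```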

A brief review - 2

I Expectation: E[X] := ∑a a·Pr[X = a] = ∑ω X(ω)Pr[ω].

E[aX + bY] = aE[X] + bE[Y].

I Independent RVs:

X, Y are independent if and only if Pr[X ∈ A, Y ∈ B] = ···

I If independent, then

I f(X), g(Y) independent
I E[XY] = E[X]E[Y]
I var[X + Y] = var[X] + var[Y]
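These product rules are easy to check empirically; here is a sketch with two independently generated samples (NumPy assumed; the distributions are arbitrary choices):

```python
import numpy as np

# For independent X, Y: E[XY] = E[X]E[Y] and var[X + Y] = var[X] + var[Y].
rng = np.random.default_rng(5)
X = rng.geometric(0.5, size=200_000)
Y = rng.poisson(3.0, size=200_000)            # drawn independently of X

print(np.mean(X * Y), np.mean(X) * np.mean(Y))   # nearly equal
print(np.var(X + Y), np.var(X) + np.var(Y))      # nearly equal
```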

A brief review - 3

I Uniform distribution: U[1,...,n]

I Random experiment: Pick a number uniformly in {1,...,n}
I Definition: Pr[X = m] = 1/n, m = 1,...,n
I Mean: E[X] = (n + 1)/2
I Variance: var[X] = (n² − 1)/12
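A short check of these two formulas (NumPy assumed; n = 10 is illustrative). Since each value in {1,...,n} is equally likely, the mean and variance of the list of values equal the distribution’s mean and variance:

```python
import numpy as np

# Check E[X] = (n + 1)/2 and var[X] = (n^2 - 1)/12 for U[1, ..., n].
n = 10                                        # illustrative choice
values = np.arange(1, n + 1)
print(values.mean(), (n + 1) / 2)             # 5.5   5.5
print(values.var(), (n**2 - 1) / 12)          # 8.25  8.25
```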

A brief review - 4

I Geometric distribution: G(p), p ∈ [0,1]

I Random experiment: Number of flips until first H.
I Definition: Pr[X = m] = (1 − p)^(m−1) p, m ≥ 1
I Mean: E[X] = 1/p; “trick” or renewal
I Variance: var[X] = (1 − p)/p²; “trick” or renewal
I Renewal: X = 1 + ZY, with Y ≡ X and Pr[Z = 1] = q := 1 − p, so that

E[X] = 1 + qE[X]; E[X²] = 1 + 2qE[X] + qE[X²].
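The renewal equations are linear in E[X] and E[X²], so they can be solved directly; the sketch below (plain Python; p = 0.3 is illustrative) confirms the closed forms:

```python
# Solve the renewal equations for X = G(p) and compare with the
# closed forms E[X] = 1/p and var[X] = (1 - p)/p^2.
p = 0.3                                # illustrative choice
q = 1 - p
EX = 1 / (1 - q)                       # from E[X] = 1 + q E[X]
EX2 = (1 + 2 * q * EX) / (1 - q)       # from E[X^2] = 1 + 2q E[X] + q E[X^2]

print(EX, 1 / p)                       # 3.333...  3.333...
print(EX2 - EX**2, (1 - p) / p**2)     # 7.777...  7.777...
```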

A brief review - 5

I Binomial distribution: B(n,p), n ≥ 1, p ∈ [0,1]

I Random experiment: Number of heads out of n flips.
I Definition: Pr[X = m] = (n choose m) p^m (1 − p)^(n−m), m = 0,1,...,n
I Mean: E[X] = np; writing X = X1 + ··· + Xn gives E[X] = nE[X1], by linearity
I Variance: var[X] = np(1 − p); X = X1 + ··· + Xn gives var[X] = n·var[X1], by independence
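A quick empirical confirmation (NumPy assumed; n = 20 and p = 0.3 are illustrative):

```python
import numpy as np

# Check E[X] = np and var[X] = np(1 - p) for X = B(n, p).
n, p = 20, 0.3                         # illustrative parameters
rng = np.random.default_rng(6)
X = rng.binomial(n, p, size=200_000)

print(X.mean(), n * p)                 # ~6.0   6.0
print(X.var(), n * p * (1 - p))        # ~4.2   4.2
```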

A brief review - 6

I Poisson distribution: P(λ), λ > 0

I Random experiment: Number of H’s out of n ≫ 1 flips with p = λ/n, i.e., X = B(n, λ/n).
I Definition: Pr[X = m] = (λ^m/m!) e^(−λ), m ≥ 0
I Mean: E[X] = λ; indeed, E[X] = n(λ/n) = λ
I Variance: var[X] = λ; indeed, var[X] = n(λ/n)(1 − λ/n) = λ − λ²/n ≈ λ since n ≫ 1.
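The approximation B(n, λ/n) ≈ P(λ) is easy to see numerically; a sketch using only the standard library (λ = 2 and m = 3 are illustrative):

```python
from math import comb, exp, factorial

# B(n, lambda/n) approaches P(lambda) as n grows.
lam, m = 2.0, 3                        # illustrative parameters
for n in (10, 100, 10_000):
    p = lam / n
    binom = comb(n, m) * p**m * (1 - p) ** (n - m)
    print(f"n={n:>6}: B(n, lam/n) gives {binom:.5f}")

poisson = lam**m / factorial(m) * exp(-lam)
print(f"P(lam):           {poisson:.5f}")
```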

A mini-quiz

True or False:

I Probability Space:

I Pr[A ∪ B] = Pr[A] + Pr[B]. False. True iff disjoint.
I Pr[A ∩ B] = Pr[A]Pr[B]. False. True iff independent.
I A ∩ B = ∅ ⇒ A, B independent. False.
I For all A, B, one has Pr[A|B] ≥ Pr[A]. False.
I Pr[A ∩ B ∩ C] = Pr[A]Pr[B|A]Pr[C|B]. False.

I Random Variables:

I For all X, Y, one has E[XY] = E[X]E[Y]. False. True if (not iff) independent.
I var[3X + 1] = 3var[X]. False: it equals 9var[X].
I var[2X + 3Y] = 4var[X] + 9var[Y]. False. True if (not iff) independent.
I E[2X + 3Y + 1] = 2E[X] + 3E[Y] + 1. True.

Inequalities: Some Applications

Below, Y is measured and used to construct a guess g(Y) about X. Here are some possible goals.

I Detection: Pr[X ≠ g(Y)] ≤ 10⁻⁸ (e.g., bit error rate)

I Estimation: Pr[|X − g(Y)| > a] ≤ 5% (e.g., polling)

I Hypothesis Testing: Pr[g(Y) = 1|X = 0] ≤ 5% and Pr[g(Y) = 0|X = 1] ≤ 5% (e.g., fire alarm)

I Design: Pr[buffer overflow] ≤ 5%

I Control: Pr[crash] ≤ 10⁻⁶

There are many situations where exact probabilities are difficult to compute but bounds can be found.