Lecture 11: Using the LLN and CLT, Moments of Distributions
Statistics 104, Colin Rundel, February 22, 2012

Chapter 3.1, 3.3, 3.4: LLN and CLT

Markov's and Chebyshev's Inequalities

Markov's Inequality: for any X ≥ 0 and constant a > 0,

    P(X ≥ a) ≤ E(X) / a

Chebyshev's Inequality:

    P(|X − E(X)| ≥ a) ≤ Var(X) / a²

Statistics 104 (Colin Rundel) Lecture 11 February 22, 2012 1 / 22

Using Markov's and Chebyshev's Inequalities

Suppose it is known that the number of items produced in a factory during a week is a random variable X with mean 50. (Note that we don't know anything else about the distribution of X.)

(a) What can be said about the probability that this week's production will exceed 75 units?

(b) If the variance of a week's production is known to equal 25, what can be said about the probability that this week's production will be between 40 and 60 units?

Accuracy of Chebyshev's inequality

How tight the inequality is depends on the distribution, but we can look at the Normal distribution since it is easy to evaluate. Let X ∼ N(µ, σ²), then

    P(|X − µ| ≥ kσ) ≤ 1/k²

    P(|X − µ|/σ ≥ k) ≤ 1/k²

    P((X − µ)/σ ≤ −k  or  (X − µ)/σ ≥ k) ≤ 1/k²

[Figure: standard normal density with the two tails beyond −k and k shaded]
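The two bounds in the factory example above can be computed directly. A minimal sketch (the numbers 50, 25, 75, 40, and 60 all come from the problem statement):

```python
# (a) Markov: X >= 0 with E(X) = 50, so P(X >= 75) <= 50/75.
mu = 50
markov_bound = mu / 75
print(markov_bound)        # 0.666... -- production exceeds 75 at most 2/3 of the time

# (b) Chebyshev with Var(X) = 25: P(|X - 50| >= 10) <= 25/10^2 = 0.25,
#     so P(40 < X < 60) >= 1 - 0.25.
var = 25
cheby_bound = 1 - var / 10**2
print(cheby_bound)         # 0.75
```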

Accuracy of Chebyshev's inequality, cont.

If X ∼ N(µ, σ²), let U = (X − µ)/σ ∼ N(0, 1). Comparing the exact tail probabilities (the Empirical Rule) with Chebyshev's bounds:

    Empirical Rule:                              Chebyshev's inequality:
    1 − P(−1 ≤ U ≤ 1) ≈ 1 − 0.68  = 0.32         1 − P(−1 ≤ U ≤ 1) ≤ 1/1² = 1
    1 − P(−2 ≤ U ≤ 2) ≈ 1 − 0.95  = 0.05         1 − P(−2 ≤ U ≤ 2) ≤ 1/2² = 0.25
    1 − P(−3 ≤ U ≤ 3) ≈ 1 − 0.997 = 0.003        1 − P(−3 ≤ U ≤ 3) ≤ 1/3² ≈ 0.11

Why do we care if it is so inaccurate?

LLN and CLT

Law of large numbers:

    lim_{n→∞} (S_n − nµ)/n = lim_{n→∞} (X̄_n − µ) → 0

Central limit theorem:

    lim_{n→∞} (S_n − nµ)/√n = lim_{n→∞} √n (X̄_n − µ) →d N(0, σ²)

so that

    P(a ≤ (S_n − nµ)/(σ√n) ≤ b) = P(a ≤ √n(X̄_n − µ)/σ ≤ b) ≈ Φ(b) − Φ(a)
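The comparison above can be reproduced exactly, since Φ is available through the error function; a quick sketch:

```python
import math

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

for k in (1, 2, 3):
    tail = 2 * (1 - Phi(k))   # exact P(|U| >= k) for U ~ N(0, 1)
    bound = 1 / k**2          # Chebyshev's bound
    print(f"k={k}: exact {tail:.4f} <= Chebyshev {bound:.4f}")
```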


Equivalence of de Moivre-Laplace with CLT

We have already seen that a Binomial random variable is equivalent to the sum of n iid Bernoulli random variables.

Let X ∼ Binom(n, p) and Y_1, …, Y_n iid ∼ Bern(p), where X = Σ_{i=1}^n Y_i, E(Y_i) = p, and Var(Y_i) = p(1 − p).

The de Moivre-Laplace theorem tells us:

    P(a ≤ X ≤ b) = P( (a − np)/√(np(1−p)) ≤ (X − np)/√(np(1−p)) ≤ (b − np)/√(np(1−p)) )
                 ≈ Φ( (b − np)/√(np(1−p)) ) − Φ( (a − np)/√(np(1−p)) )

Equivalence of de Moivre-Laplace with CLT, cont.

The Central Limit Theorem tells us:

    P( c ≤ (S_n − nµ)/(σ√n) ≤ d ) ≈ Φ(d) − Φ(c)

Taking S_n = X, µ = p, and σ² = p(1 − p):

    P(a ≤ X ≤ b) = P( (a − nµ)/(σ√n) ≤ (X − nµ)/(σ√n) ≤ (b − nµ)/(σ√n) )
                 = P( (a − np)/√(np(1−p)) ≤ (S_n − np)/√(np(1−p)) ≤ (b − np)/√(np(1−p)) )
                 ≈ Φ( (b − np)/√(np(1−p)) ) − Φ( (a − np)/√(np(1−p)) )

So the de Moivre-Laplace theorem is just the CLT applied to a sum of iid Bernoulli random variables.
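A numerical sketch of this equivalence, with illustrative values n = 100, p = 0.5, a = 45, b = 55 (chosen here, not from the slides); the exact binomial sum and the normal approximation should be close:

```python
import math

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

n, p, a, b = 100, 0.5, 45, 55
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(a, b + 1))

sd = math.sqrt(n * p * (1 - p))
approx = Phi((b - n * p) / sd) - Phi((a - n * p) / sd)

print(f"exact {exact:.4f}  vs  normal approx {approx:.4f}")
# exact is about 0.729, the approximation about 0.683; a continuity
# correction (a - 0.5, b + 0.5) would tighten the match.
```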

Why do we care? (Statistics)

In general we are interested in making statements about how the world works, and we usually do this based on empirical evidence. This tends to involve observing or measuring something a bunch of times.

- The LLN tells us that if we average our results we'll get closer and closer to the true value as we take more measurements.
- The CLT tells us the distribution of our average is approximately normal if we take enough measurements, which quantifies our uncertainty about the 'true' value.

Astronomy Example

An astronomer is interested in measuring, in light years, the distance from his observatory to a distant star. Although the astronomer has a measuring technique, he knows that, because of changing atmospheric conditions and normal error, each measurement will not yield the exact distance but merely an estimate. As a result the astronomer plans to make a series of measurements and then use their average value as his estimate of the actual distance. If the astronomer believes that the measurements are independent and identically distributed random variables having a common mean d (the actual distance) and a common variance of 4 (light years²), how many measurements need he make to be reasonably sure that his estimated distance is accurate to within ±0.5 light years?

Example - Polling: I can't interview everyone about their opinions, but if I interview enough people my estimate should be close to the truth, and I can use the CLT to approximate my margin of error.


Astronomy Example, cont.

We need to decide on a level to account for being "reasonably sure", usually taken to be 0.95, but the choice is arbitrary.

Our previous estimate depends on how long it takes for the distribution of the average of the measurements to converge to the normal distribution. The CLT only guarantees convergence as n → ∞, but most distributions converge much more quickly (think about the np ≥ 10, n(1 − p) ≥ 10 requirement for the Normal approximation to the Binomial).

We can also solve this problem using Chebyshev's inequality, which makes no assumption about the distribution of the average at all.
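A sketch of the sample-size arithmetic both ways (σ² = 4 and ±0.5 come from the problem; 1.96 is the standard 0.975 normal quantile; the CLT route is only approximate for finite n):

```python
import math

sigma = 2.0    # sqrt of the common variance 4
eps = 0.5      # desired accuracy, in light years
z = 1.96       # Phi^{-1}(0.975), for a 0.95 level

# CLT: require z * sigma / sqrt(n) <= eps
n_clt = math.ceil((z * sigma / eps) ** 2)

# Chebyshev: P(|Xbar - d| >= eps) <= sigma^2 / (n * eps^2); require <= 0.05
n_cheb = math.ceil(sigma**2 / (0.05 * eps**2))

print(n_clt, n_cheb)   # 62 vs 320 -- Chebyshev needs far more measurements
```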

Chapter 3.1, 3.3, 3.4: Moments of Distributions

Moments

Some definitions:

Raw moment:
    µ'_n = E(X^n)

Central moment:
    µ_n = E[(X − µ)^n]

Normalized / standardized moment:
    µ_n / σ^n

Note that some moments do not exist, which is the case when E(X^n) does not converge.

Properties of Moments

Zeroth moment:   µ'_0 = µ_0 = 1
First moment:    µ'_1 = E(X) = µ,  µ_1 = E(X − µ) = 0
Second moment:   µ_2 = E[(X − µ)²] = Var(X),  µ'_2 − (µ'_1)² = Var(X)
Third moment:    Skewness(X) = µ_3 / σ³
Fourth moment:   Kurtosis(X) = µ_4 / σ⁴,  Ex. Kurtosis(X) = µ_4/σ⁴ − 3
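These definitions translate directly into code; a minimal sketch computing raw, central, and standardized moments from a pmf (a fair six-sided die, used purely for illustration):

```python
import math

pmf = {k: 1/6 for k in range(1, 7)}   # fair die: P(X = k) = 1/6

def raw_moment(n):
    # mu'_n = E(X^n)
    return sum(x**n * p for x, p in pmf.items())

mu = raw_moment(1)

def central_moment(n):
    # mu_n = E[(X - mu)^n]
    return sum((x - mu)**n * p for x, p in pmf.items())

sigma = math.sqrt(central_moment(2))

def std_moment(n):
    # mu_n / sigma^n
    return central_moment(n) / sigma**n

print(mu, central_moment(2), std_moment(3))   # 3.5, 35/12 ≈ 2.917, ≈ 0 (symmetric)
```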


Third and Fourth Moments

    µ_3 = E[(X − µ)³] = E(X³ − 3X²µ + 3Xµ² − µ³)
        = E(X³) − 3µE(X²) + 3µ³ − µ³
        = E(X³) − 3µσ² − µ³
        = µ'_3 − 3µσ² − µ³

    µ_4 = E[(X − µ)⁴] = E(X⁴ − 4X³µ + 6X²µ² − 4Xµ³ + µ⁴)
        = E(X⁴) − 4µE(X³) + 6µ²E(X²) − 4µ⁴ + µ⁴
        = E(X⁴) − 4µE(X³) + 6µ²(σ² + µ²) − 3µ⁴
        = µ'_4 − 4µµ'_3 + 6µ²σ² + 3µ⁴

Moment Generating Function

The moment generating function of a discrete random variable X is defined for all real values of t by

    M_X(t) = E[e^{tX}] = Σ_x e^{tx} P(X = x)

This is called the moment generating function because we can obtain the moments of X by successively differentiating M_X(t) and evaluating at t = 0:

    M_X(0) = E[e^0] = 1 = µ'_0

    M'_X(t) = (d/dt) E[e^{tX}] = E[(d/dt) e^{tX}] = E[X e^{tX}]
    M'_X(0) = E[X e^0] = E[X] = µ'_1

    M''_X(t) = (d/dt) M'_X(t) = (d/dt) E[X e^{tX}] = E[(d/dt)(X e^{tX})] = E[X² e^{tX}]
    M''_X(0) = E[X² e^0] = E[X²] = µ'_2
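The derivative-moment relationship can be checked numerically with finite differences; a sketch for an assumed X ∼ Pois(λ) with λ = 2, truncating the infinite sum where the tail is negligible:

```python
import math

lam = 2.0

def M(t, terms=60):
    """Truncated sum of e^{tk} P(X = k) for X ~ Pois(lam)."""
    total, term = 0.0, math.exp(-lam)   # term starts at P(X = 0)
    for k in range(terms):
        total += math.exp(t * k) * term
        term *= lam / (k + 1)           # P(X = k+1) from P(X = k)
    return total

h = 1e-4
m1 = (M(h) - M(-h)) / (2 * h)           # central difference ~ M'(0) = E(X)
m2 = (M(h) - 2 * M(0) + M(-h)) / h**2   # second difference ~ M''(0) = E(X^2)
print(m1, m2)   # close to lam = 2 and lam^2 + lam = 6
```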

Moment Generating Function - Poisson

Let X ∼ Pois(λ), then

    M_X(t) = E[e^{tX}] = Σ_{k=0}^∞ e^{tk} e^{−λ} λ^k / k! = e^{−λ} Σ_{k=0}^∞ (λe^t)^k / k! = exp(λe^t − λ)

    M'_X(t) = λe^t exp(λe^t − λ)
    M'_X(0) = µ'_1 = λ = E(X)

    M''_X(t) = (λe^t)² exp(λe^t − λ) + λe^t exp(λe^t − λ)
    M''_X(0) = µ'_2 = λ² + λ = E(X²)

Moment Generating Function - Poisson Skewness

    µ'_3 = M'''_X(0) = λ³ + 3λ² + λ

    Skewness(X) = µ_3/σ³ = (µ'_3 − 3µσ² − µ³)/σ³
                = (λ³ + 3λ² + λ − 3λ² − λ³)/(√λ)³
                = λ/(√λ)³ = λ^{−1/2}
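A sketch verifying Skewness(X) = λ^{−1/2} directly from the pmf, with an illustrative λ = 4 (so the skewness should be 1/2); the sum is truncated where the tail is negligible:

```python
import math

lam = 4.0
mu, sigma = lam, math.sqrt(lam)   # Poisson mean and standard deviation

m3, term = 0.0, math.exp(-lam)    # term = P(X = 0)
for k in range(80):
    m3 += (k - mu) ** 3 * term    # accumulate E[(X - mu)^3]
    term *= lam / (k + 1)         # P(X = k+1) from P(X = k)

skewness = m3 / sigma**3
print(skewness, lam ** -0.5)      # both ≈ 0.5
```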


Moment Generating Function - Poisson Kurtosis

    µ'_3 = M'''_X(0) = λ³ + 3λ² + λ
    µ'_4 = M''''_X(0) = λ⁴ + 6λ³ + 7λ² + λ

    Kurtosis(X) = µ_4/σ⁴ = (µ'_4 − 4µµ'_3 + 6µ²σ² + 3µ⁴)/σ⁴
                = [µ'_4 − 4λ(λ³ + 3λ² + λ) + 6λ³ + 3λ⁴]/λ²
                = (µ'_4 − λ⁴ − 6λ³ − 4λ²)/λ²
                = [(λ⁴ + 6λ³ + 7λ² + λ) − λ⁴ − 6λ³ − 4λ²]/λ²
                = (3λ² + λ)/λ² = 3 + λ^{−1}

    Ex. Kurtosis(X) = λ^{−1}

Moment Generating Function - Binomial

Let X ∼ Binom(n, p), then

    M_X(t) = E[e^{tX}] = Σ_{k=0}^n e^{tk} C(n,k) p^k (1 − p)^{n−k} = Σ_{k=0}^n C(n,k) (pe^t)^k (1 − p)^{n−k}
           = [pe^t + (1 − p)]^n

    M'_X(t) = npe^t (pe^t + 1 − p)^{n−1}
    M'_X(0) = µ'_1 = np = E(X)

    M''_X(t) = n(n − 1)(pe^t)² (pe^t + 1 − p)^{n−2} + npe^t (pe^t + 1 − p)^{n−1}
    M''_X(0) = µ'_2 = n(n − 1)p² + np = E(X²)

    Var(X) = µ'_2 − (µ'_1)² = n(n − 1)p² + np − n²p²
           = np((n − 1)p + 1 − np) = np(1 − p)
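A sketch checking the closed form [pe^t + (1 − p)]^n against the defining sum, and the first two moments, for illustrative values n = 10, p = 0.3 (and an arbitrary t = 0.7):

```python
import math

n, p, t = 10, 0.3, 0.7

def pmf(k):
    # Binomial pmf: C(n, k) p^k (1-p)^(n-k)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

direct = sum(math.exp(t * k) * pmf(k) for k in range(n + 1))
closed = (p * math.exp(t) + 1 - p) ** n
print(abs(direct - closed))   # agree to floating-point precision

EX = sum(k * pmf(k) for k in range(n + 1))
EX2 = sum(k**2 * pmf(k) for k in range(n + 1))
print(EX, n * p)                           # both ≈ 3.0
print(EX2, n * (n - 1) * p**2 + n * p)     # both ≈ 11.1
```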

Moment Generating Function - Normal

Let U ∼ N(0, 1) and X ∼ N(µ, σ²), where X = µ + σU. Then

    M_U(t) = E[e^{tU}] = ∫_{−∞}^{∞} e^{tx} (1/√(2π)) e^{−x²/2} dx
           = (1/√(2π)) ∫_{−∞}^{∞} e^{−(x²/2 − tx)} dx
           = (1/√(2π)) ∫_{−∞}^{∞} e^{−(x−t)²/2 + t²/2} dx
           = e^{t²/2} (1/√(2π)) ∫_{−∞}^{∞} e^{−(x−t)²/2} dx
           = e^{t²/2}

since the remaining integrand is the N(t, 1) density, which integrates to 1. Then

    M_X(t) = E[e^{tX}] = E[e^{t(µ+σU)}] = E[e^{tµ} e^{tσU}]
           = e^{tµ} E[e^{tσU}] = e^{tµ} M_U(tσ)
           = e^{tµ} e^{t²σ²/2} = exp(µt + t²σ²/2)

Moment Generating Function - Normal, cont.

    M'_X(t) = exp(µt + t²σ²/2)(µ + tσ²)
    M'_X(0) = µ'_1 = µ

    M''_X(t) = exp(µt + t²σ²/2) σ² + exp(µt + t²σ²/2)(µ + tσ²)²
    M''_X(0) = µ'_2 = µ² + σ²


Moment Generating Function - Normal, cont.

    µ'_3 = M'''_X(0) = µ³ + 3µσ²
    µ'_4 = M''''_X(0) = µ⁴ + 6µ²σ² + 3σ⁴

    Skewness(X) = µ_3/σ³ = (µ'_3 − 3µσ² − µ³)/σ³
                = (µ³ + 3µσ² − 3µσ² − µ³)/σ³
                = 0

    Kurtosis(X) = µ_4/σ⁴ = (µ'_4 − 4µµ'_3 + 6µ²σ² + 3µ⁴)/σ⁴
                = [µ'_4 − 4µ(µ³ + 3µσ²) + 6µ²σ² + 3µ⁴]/σ⁴
                = [(µ⁴ + 6µ²σ² + 3σ⁴) − 4µ⁴ − 12µ²σ² + 6µ²σ² + 3µ⁴]/σ⁴
                = 3σ⁴/σ⁴ = 3

    Ex. Kurtosis(X) = 0
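These values can be confirmed by crude numerical integration of the normal density (illustrative µ = 1.5, σ = 2; any values give the same skewness and kurtosis):

```python
import math

mu, sigma = 1.5, 2.0
dx = 0.001

def density(x):
    # N(mu, sigma^2) density
    return math.exp(-((x - mu) / sigma) ** 2 / 2) / (sigma * math.sqrt(2 * math.pi))

# Riemann sum over [mu - 10*sigma, mu + 10*sigma]; the tails beyond are negligible.
xs = [mu - 10 * sigma + i * dx for i in range(int(20 * sigma / dx))]
m3 = sum((x - mu) ** 3 * density(x) * dx for x in xs)
m4 = sum((x - mu) ** 4 * density(x) * dx for x in xs)

print(m3 / sigma**3, m4 / sigma**4)   # approximately 0 and 3
```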
