
STA111 - Lecture 8, Central Limit Theorem

1 Law of Large Numbers

Let X1, X2, ..., Xn be independent and identically distributed (iid) random variables with finite expectation µ = E(Xi), and let the sample mean be
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
Then,

P(X̄n → µ) = 1 as n → ∞. In words, the Law of Large Numbers (LLN) shows that sample means converge to the population/theoretical mean µ (with probability 1) as the sample size increases. This might sound kind of obvious, but it is something that has to be proved.

The following picture in the Wikipedia article illustrates the concept. We are rolling a die many times, and every time we roll the die we recompute the average of all the results so far. The x-axis of the graph is the number of trials and the y-axis corresponds to the running average. The sample mean stabilizes to the expected value (3.5) as n increases:
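To see the LLN in action numerically, here is a minimal simulation of the same die-rolling experiment (an illustrative sketch only; it assumes Python with numpy, and the seed and number of rolls are arbitrary):

```python
import numpy as np

# Roll a fair six-sided die many times and track the running average.
rng = np.random.default_rng(seed=0)
rolls = rng.integers(1, 7, size=10_000)                      # 10,000 iid rolls, values 1..6
running_mean = np.cumsum(rolls) / np.arange(1, rolls.size + 1)

# The running average should stabilize near E(X) = 3.5 as the number of rolls grows.
print(running_mean[[9, 99, 999, 9_999]])                     # after 10, 100, 1,000, 10,000 rolls
```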

2 Central Limit Theorem

The Central Limit Theorem (CLT) states that sums and averages of random variables are approximately Normal when the sample size is big enough.

Before we introduce the formal statement of the theorem, let’s think about the distribution of sums and averages of random variables. Assume that X1, X2, ..., Xn are iid random variables with finite mean and variance µ = E(Xi) and σ² = V(Xi), and let Sn = X1 + X2 + ··· + Xn and X̄n = Sn/n. Using properties of expectations and variances,

$$E(S_n) = E(X_1) + E(X_2) + \cdots + E(X_n) = n\mu,$$
$$V(S_n) = V(X_1) + V(X_2) + \cdots + V(X_n) = n\sigma^2,$$
and similarly for X̄n:
$$E(\bar{X}_n) = \frac{1}{n}\left(E(X_1) + E(X_2) + \cdots + E(X_n)\right) = \frac{1}{n}(\mu + \mu + \cdots + \mu) = \mu,$$
$$V(\bar{X}_n) = \frac{1}{n^2}\left(V(X_1) + V(X_2) + \cdots + V(X_n)\right) = \frac{1}{n^2}(\sigma^2 + \sigma^2 + \cdots + \sigma^2) = \sigma^2/n.$$

The variance of the sample mean V(X̄n) = σ²/n shrinks to zero as n → ∞, which makes intuitive sense: as we get more and more data, the sample mean will become more and more precise. This result also gives a handwavy idea as to why the LLN is true: as n → ∞, the sample mean “converges” to a random variable that has mean µ and variance 0, which is just the constant µ (this is not a proof of the LLN, just some handwaving, I must insist).
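As a quick numerical illustration of V(X̄n) = σ²/n (not part of the original notes), the following sketch draws many samples of size n from an Exponential(1) distribution, where µ = σ² = 1, and compares the empirical variance of the sample means with σ²/n; the distribution and sample sizes are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n_reps = 20_000                                   # number of simulated samples per n

for n in (10, 100, 1000):
    # Each row is one sample of size n from Exponential(1); take its mean.
    sample_means = rng.exponential(scale=1.0, size=(n_reps, n)).mean(axis=1)
    # Empirical variance of the sample mean vs. the theoretical value sigma^2 / n = 1 / n.
    print(n, sample_means.var(), 1.0 / n)
```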

Now suppose that X1, X2, ..., Xn are iid Normal(µ, σ²). Since linear combinations of Normals are Normal:

X̄n ∼ Normal(µ, σ²/n) and Sn ∼ Normal(nµ, nσ²), so we can standardize and compute probabilities using the standard Normal table, that is,
$$P\left(\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \le z\right) = P\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} \le z\right) = P(Z \le z),$$
where Z ∼ Normal(0, 1). If the sample size is big enough, we can do the same thing with sums and averages of random variables that are not necessarily Normal. More precisely, let X1, X2, ..., Xn be iid with finite expectation and variance µ = E(Xi) and V(Xi) = σ². Let Sn = X1 + X2 + ··· + Xn, X̄n = Sn/n, and Z ∼ Normal(0, 1). Then,
$$P\left(\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \le z\right) = P\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} \le z\right) \to P(Z \le z)$$
as n → ∞.
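This convergence statement can be checked by simulation. The sketch below (an illustration only; numpy/scipy assumed, with Exponential(1) chosen arbitrarily as a clearly non-Normal distribution) standardizes sums of n = 50 draws and compares the empirical probabilities with P(Z ≤ z):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2)
n, mu, sigma = 50, 1.0, 1.0                       # Exponential(1) has mu = sigma = 1
n_reps = 50_000

sums = rng.exponential(scale=1.0, size=(n_reps, n)).sum(axis=1)
z_scores = (sums - n * mu) / (sigma * np.sqrt(n)) # standardized sums

for z in (-1.0, 0.0, 1.0):
    # Empirical P((S_n - n*mu)/(sigma*sqrt(n)) <= z) vs. the standard Normal CDF.
    print(z, (z_scores <= z).mean(), stats.norm.cdf(z))
```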

In practice, we will use the CLT to approximate the distribution of sums and averages of random variables by the Normals
$$\bar{X}_n \approx \text{Normal}(\mu, \sigma^2/n), \qquad S_n \approx \text{Normal}(n\mu, n\sigma^2).$$
Examples:

• Binomial: If Y ∼ Binomial(n, p), we can write Y = X1 + X2 + ··· + Xn, where the Xi are independent Bernoulli(p) random variables. By the CLT, we know that Y ≈ Normal(np, np(1 − p)). The picture below shows the Binomial PMF and the PDF of the Normal approximation for n = 25 and p ∈ {0.5, 0.15}. The approximation is better for p = 0.5, which is not a coincidence: the distribution is symmetric if p = 0.5 and it becomes more skewed as p approaches extreme values (close to 0 or 1). Nonetheless, the approximation for p = 0.15 is pretty good even if n = 25, and it is very good when n = 50. A quick numerical check follows the figure.

[Figure: Binomial PMF P(Y = k) and the Normal approximation for n = 25, p = 0.5; n = 25, p = 0.15; and n = 50, p = 0.15.]
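A numerical version of the comparison in the figure (a sketch; it assumes scipy, with a few values of k near the mean chosen for illustration):

```python
import numpy as np
from scipy import stats

for n, p in [(25, 0.5), (25, 0.15), (50, 0.15)]:
    mean, sd = n * p, np.sqrt(n * p * (1 - p))
    for k in (int(mean) - 2, int(mean), int(mean) + 2):
        # Exact Binomial PMF vs. the Normal(np, np(1-p)) density evaluated at k.
        print(n, p, k, stats.binom.pmf(k, n, p), stats.norm.pdf(k, loc=mean, scale=sd))
```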

• Poisson: If Y ∼ Poisson(λ), we can write Y = X1 + X2 + ··· + Xn, where the Xi are independent Poisson(λ/n) random variables. By the CLT, we can approximate the Poisson as a Normal(λ, λ) random variable. The figure below shows the Poisson PMF for λ ∈ {25, 50} and the PDF of the corresponding Normal approximation (a similar numerical check is sketched after the figure):

[Figure: Poisson PMF P(Y = k) and the Normal(λ, λ) approximation for λ = 25 and λ = 50.]
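The same kind of check works for the Poisson (again just a sketch with scipy assumed, using the two values of λ from the figure):

```python
import numpy as np
from scipy import stats

for lam in (25, 50):
    for k in (lam - 5, lam, lam + 5):
        # Exact Poisson PMF vs. the Normal(lambda, lambda) density at k.
        print(lam, k, stats.poisson.pmf(k, lam), stats.norm.pdf(k, loc=lam, scale=np.sqrt(lam)))
```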

• Sample averages from a weird distribution: Let X1, X2, ..., Xn be iid draws from a discrete distribution with the following PMF:

[Figure: PMF P(X = k) of an irregular discrete distribution supported on k = 0, ..., 30.]

The following picture shows a histogram obtained after taking 5000 averages from samples of size n = 30 coming from the weird distribution (and the corresponding Normal approximation):

[Figure: Histogram (density scale) of the 5000 sample averages with n = 30, with the Normal approximation overlaid.]

This is telling us that the distribution of sample averages coming from this odd-looking discrete distribution can be approximated well with a Normal distribution, even if n is as small as 30. A simulation along these lines is sketched below.
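Here is that simulation as a sketch only: the PMF below is a made-up stand-in for the "weird" distribution in the figure, since its exact probabilities are not listed in the notes, and numpy/scipy are assumed:

```python
import numpy as np
from scipy import stats

# Hypothetical irregular PMF on a few support points (NOT the exact PMF from the figure).
values = np.array([0, 1, 2, 5, 8, 12, 25])
probs = np.array([0.25, 0.15, 0.10, 0.20, 0.15, 0.10, 0.05])   # sums to 1

mu = np.sum(values * probs)                       # population mean
sigma2 = np.sum((values - mu) ** 2 * probs)       # population variance

rng = np.random.default_rng(seed=3)
n = 30
sample_means = rng.choice(values, size=(5_000, n), p=probs).mean(axis=1)

# The histogram of `sample_means` should be close to Normal(mu, sigma2 / n):
print(sample_means.mean(), mu)                    # close to mu
print(sample_means.var(), sigma2 / n)             # close to sigma^2 / n
# Kolmogorov-Smirnov distance between the simulated means and the Normal approximation.
print(stats.kstest(sample_means, "norm", args=(mu, np.sqrt(sigma2 / n))).statistic)
```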

2.1 Optional Reading: Berry-Esseen Bounds

The CLT is a very cool limiting statement, but how accurate is the approximation in finite samples? The following bound (Berry-Esseen) can give you an idea of what the maximum error can be:
$$\sup_{z \in \mathbb{R}} \left| P\left(\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \le z\right) - P(Z \le z) \right| = \sup_{z \in \mathbb{R}} \left| P\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} \le z\right) - P(Z \le z) \right| \le \frac{0.8\,\alpha}{\sigma^3\sqrt{n}},$$

where α = E[|Xi − µ|³]. This bound works for any sequence of iid random variables (with finite expectation, variance, and α) and all values z ∈ R. Tighter bounds can be found if one is willing to make further assumptions about the distribution of the Xi or is content with a bound that works only for particular values of z.

Example:

• Let Y ∼ Binomial(n, 1/2). As you know, Y = X1 + X2 + ··· + Xn, where the Xi are iid Bernoulli(1/2). In this case, α = E[|Xi − p|³] = 1/8, σ² = 1/4, µ = 1/2, and the Berry-Esseen bound gives us
$$\sup_{z \in \mathbb{R}} \left| P\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} \le z\right) - P(Z \le z) \right| \le \frac{0.8 \cdot (1/8)}{(1/2)^3 \sqrt{n}} = \frac{0.8}{\sqrt{n}}.$$
This bound is still fairly conservative; for example, for n = 100 it only guarantees a maximum error of at most 0.08. The sketch below compares the bound with the actual maximum discrepancy.
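As a sanity check (a sketch, not part of the notes; scipy assumed), the code below computes the actual maximum discrepancy between the standardized Binomial(n, 1/2) CDF and the standard Normal CDF, and compares it with the 0.8/√n bound:

```python
import numpy as np
from scipy import stats

n, p = 100, 0.5
mu, sigma = p, np.sqrt(p * (1 - p))

k = np.arange(n + 1)
z = (k - n * mu) / (sigma * np.sqrt(n))             # standardized jump points of S_n
cdf_right = stats.binom.cdf(k, n, p)                # Binomial CDF at each jump
cdf_left = np.concatenate(([0.0], cdf_right[:-1]))  # value just below each jump
phi = stats.norm.cdf(z)

# The supremum over z is attained at (or just below) one of the jump points.
actual = np.max(np.maximum(np.abs(cdf_right - phi), np.abs(cdf_left - phi)))
bound = 0.8 / np.sqrt(n)                            # Berry-Esseen bound for this case
print(actual, bound)                                # the bound exceeds the actual error
```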

Exercise 1. (Extra Credit) Find the Berry-Esseen bound for the sum of independent Poisson(1) random variables, and plot the value of the bound for n between 30 and 200 (you can create the plot with WolframAlpha, for example).
