STA111 - Lecture 8 Law of Large Numbers, Central Limit Theorem
1 Law of Large Numbers

Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed (iid) random variables with finite expectation $\mu = E(X_i)$, and let the sample mean be $\bar{X}_n = \sum_{i=1}^n X_i / n$. Then,
$$P(\bar{X}_n \to \mu) = 1 \quad \text{as } n \to \infty.$$
In words, the Law of Large Numbers (LLN) shows that sample averages converge to the population/theoretical mean $\mu$ (with probability 1) as the sample size increases. This might sound kind of obvious, but it is something that has to be proved.

The following picture, from the Wikipedia article, illustrates the concept. We are rolling a die many times, and every time we roll the die we recompute the average of all the results. The x-axis of the graph is the number of trials and the y-axis corresponds to the average outcome. The sample mean stabilizes to the expected value (3.5) as $n$ increases.

2 Central Limit Theorem

The Central Limit Theorem (CLT) states that sums and averages of random variables are approximately Normal when the sample size is big enough. Before we introduce the formal statement of the theorem, let's think about the distribution of sums and averages of random variables. Assume that $X_1, X_2, \ldots, X_n$ are iid random variables with finite mean $\mu = E(X_i)$ and variance $\sigma^2 = V(X_i)$, and let $S_n = X_1 + X_2 + \cdots + X_n$ and $\bar{X}_n = S_n / n$. Using properties of expectations and variances,
$$E(S_n) = E(X_1) + E(X_2) + \cdots + E(X_n) = n\mu$$
$$V(S_n) = V(X_1) + V(X_2) + \cdots + V(X_n) = n\sigma^2,$$
and similarly for $\bar{X}_n$:
$$E(\bar{X}_n) = \frac{1}{n}\big(E(X_1) + E(X_2) + \cdots + E(X_n)\big) = \frac{1}{n}(\mu + \mu + \cdots + \mu) = \mu$$
$$V(\bar{X}_n) = \frac{1}{n^2}\big(V(X_1) + V(X_2) + \cdots + V(X_n)\big) = \frac{1}{n^2}(\sigma^2 + \sigma^2 + \cdots + \sigma^2) = \sigma^2 / n.$$

The variance of the sample mean, $V(\bar{X}_n) = \sigma^2 / n$, shrinks to zero as $n \to \infty$, which makes intuitive sense: as we get more and more data, the sample mean becomes more and more precise. This result also gives a handwavy idea as to why the LLN is true: as $n \to \infty$, the sample mean "converges" to a random variable that has mean $\mu$ and variance 0, which is just the constant $\mu$ (this is not a proof of the LLN, just some handwaving, I must insist).

Now suppose that $X_1, X_2, \ldots, X_n$ are iid Normal$(\mu, \sigma^2)$. Since linear combinations of Normals are Normal,
$$\bar{X}_n \sim \text{Normal}(\mu, \sigma^2/n), \qquad S_n \sim \text{Normal}(n\mu, n\sigma^2),$$
so we can standardize and compute probabilities using the standard Normal table, that is,
$$P\left(\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \le z\right) = P\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} \le z\right) = P(Z \le z),$$
where $Z \sim \text{Normal}(0, 1)$.

If the sample size is big enough, we can do the same thing with sums and averages of random variables that are not necessarily Normal. More precisely, let $X_1, X_2, \ldots, X_n$ be iid with finite expectation $\mu = E(X_i)$ and variance $V(X_i) = \sigma^2$. Let $S_n = X_1 + X_2 + \cdots + X_n$, $\bar{X}_n = S_n/n$, and $Z \sim \text{Normal}(0, 1)$. Then,
$$P\left(\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \le z\right) = P\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} \le z\right) \to P(Z \le z)$$
as $n \to \infty$. In practice, we will use the CLT to approximate the distribution of sums and averages of random variables by the Normals
$$\bar{X}_n \approx \text{Normal}(\mu, \sigma^2/n), \qquad S_n \approx \text{Normal}(n\mu, n\sigma^2).$$
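As a quick sanity check on these approximations, here is a minimal simulation sketch in Python with NumPy (not part of the lecture; the Exponential distribution, the seed, the sample size, and the threshold 1.2 are all illustrative choices). It draws many samples of size $n$ from a non-Normal distribution, computes the sample means, and compares their behaviour with what the LLN and CLT predict. The simulated probability and the CLT approximation should come out close.

```python
import numpy as np
from math import erf, sqrt

def normal_cdf(z):
    """Standard Normal CDF, written with the error function to avoid extra dependencies."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Illustrative setup (an assumption, not from the lecture):
# Exponential with rate 1, so mu = 1 and sigma^2 = 1.
mu, sigma2 = 1.0, 1.0
n, reps = 30, 5000

rng = np.random.default_rng(0)
samples = rng.exponential(scale=1.0, size=(reps, n))  # reps samples, each of size n
xbar = samples.mean(axis=1)                           # reps sample means

print(f"mean of sample means: {xbar.mean():.4f}   (LLN: close to mu = {mu})")
print(f"var  of sample means: {xbar.var():.4f}   (close to sigma^2/n = {sigma2 / n:.4f})")

# CLT approximation for P(Xbar_n <= t): standardize and use the Normal CDF.
t = 1.2
z = (t - mu) / sqrt(sigma2 / n)
print(f"simulated  P(Xbar_n <= {t}): {(xbar <= t).mean():.4f}")
print(f"CLT approx P(Xbar_n <= {t}): {normal_cdf(z):.4f}")
```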
Examples:

• Binomial: If $Y \sim \text{Binomial}(n, p)$, we can write $Y = X_1 + X_2 + \cdots + X_n$, where the $X_i$ are independent Bernoulli$(p)$ random variables. By the CLT, we know that $Y \approx \text{Normal}(np, np(1-p))$. The picture below shows the Binomial PMF and the PDF of the Normal approximation for $n = 25$ and $p \in \{0.5, 0.15\}$. The approximation is better for $p = 0.5$, which is not a coincidence: the Binomial distribution is symmetric if $p = 0.5$ and it becomes more skewed as $p$ approaches extreme values (close to 0 or 1). Nonetheless, the approximation for $p = 0.15$ is pretty good even if $n = 25$, and it is very good when $n = 50$.

[Figure: Binomial PMF and Normal approximation for $n = 25, p = 0.5$; $n = 25, p = 0.15$; and $n = 50, p = 0.15$. Axes: $k$ vs. $P(Y = k)$.]

• Poisson: If $Y \sim \text{Poisson}(\lambda)$, we can write $Y = X_1 + X_2 + \cdots + X_n$, where the $X_i$ are independent Poisson$(\lambda/n)$ random variables. By the CLT, we can approximate the Poisson as a Normal$(\lambda, \lambda)$ random variable. The figure below shows the Poisson PMF for $\lambda \in \{25, 50\}$ and the PDF of the corresponding Normal approximation.

[Figure: Poisson PMF and Normal approximation for $\lambda = 25$ and $\lambda = 50$. Axes: $k$ vs. $P(Y = k)$.]

• Sample averages from a weird distribution: Let $X_1, X_2, \ldots, X_n$ be iid draws from a discrete distribution with the following PMF:

[Figure: PMF of the "weird" discrete distribution. Axes: $k$ vs. $P(X = k)$.]

The following picture shows a histogram of 5000 averages, each computed from a sample of size $n = 30$ drawn from the weird distribution, together with the corresponding Normal approximation:

[Figure: histogram of the 5000 sample averages for $n = 30$, with the Normal approximation overlaid. Axes: sample average vs. density.]

This is telling us that the distribution of sample averages coming from this odd-looking discrete distribution can be approximated well with a Normal distribution, even if $n$ is as small as 30.

2.1 Optional Reading: Berry-Esseen Bounds

The CLT is a very cool limiting statement, but how accurate is the approximation in finite samples? The following bound (Berry-Esseen) can give you an idea of what the maximum error can be:
$$\sup_{z \in \mathbb{R}} \left| P\left(\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \le z\right) - P(Z \le z) \right| = \sup_{z \in \mathbb{R}} \left| P\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} \le z\right) - P(Z \le z) \right| \le \frac{0.8\,\alpha}{\sigma^3 \sqrt{n}},$$
where $\alpha = E[|X_i - \mu|^3]$. This bound works for any sequence of iid random variables (with finite expectation, variance, and $\alpha$) and for all values $z \in \mathbb{R}$. Tighter bounds can be found if one is willing to make further assumptions about the distribution of the $X_i$, or is content with a bound that works only for particular values of $z$.

Example:

• Let $Y \sim \text{Binomial}(n, 1/2)$. As you know, $Y = X_1 + X_2 + \cdots + X_n$, where the $X_i$ are iid Bernoulli$(1/2)$. In this case, $\alpha = E[|X_i - p|^3] = 1/8$, $\sigma^2 = 1/4$, $\mu = 1/2$, and the Berry-Esseen bound gives us
$$\sup_{z \in \mathbb{R}} \left| P\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} \le z\right) - P(Z \le z) \right| \le \frac{0.8}{\sqrt{n}}.$$
This is a fairly loose bound; for example, for $n = 100$ the Berry-Esseen bound only guarantees a maximum error of 0.08, while the actual maximum error of the approximation is smaller (see the short numerical check after the exercise below).

Exercise 1. (Extra Credit) Find the Berry-Esseen bound for the sum of independent Poisson(1) random variables, and plot the value of the bound for $n$ between 30 and 200 (you can create the plot with WolframAlpha, for example).
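For the Binomial$(n, 1/2)$ example above (this is a check of that example, not a solution to the extra-credit exercise), here is a small Python sketch, not from the lecture: the helper `binomial_half_max_error` is an illustrative name of my own, and it computes the exact worst-case CDF error of the Normal approximation to $S_n \sim \text{Binomial}(n, 1/2)$ so it can be compared with the Berry-Esseen bound $0.8/\sqrt{n}$.

```python
import numpy as np
from math import comb, erf, sqrt

def normal_cdf(z):
    """Standard Normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def binomial_half_max_error(n):
    """Exact sup_z |P((S_n - n*mu)/(sigma*sqrt(n)) <= z) - Phi(z)| for S_n ~ Binomial(n, 1/2)."""
    mu, sigma = 0.5, 0.5                          # Bernoulli(1/2) mean and standard deviation
    pmf = np.array([comb(n, k) for k in range(n + 1)], dtype=float) / 2.0 ** n
    cdf = np.cumsum(pmf)                          # P(S_n <= k) for k = 0, ..., n
    worst = 0.0
    for k in range(n + 1):
        z = (k - n * mu) / (sigma * sqrt(n))      # standardized jump point of the Binomial CDF
        phi = normal_cdf(z)
        left = cdf[k - 1] if k > 0 else 0.0       # value of the step CDF just below the jump
        worst = max(worst, abs(cdf[k] - phi), abs(left - phi))
    return worst

n = 100
alpha, sigma = 1 / 8, 0.5                          # E|X_i - 1/2|^3 and sd for Bernoulli(1/2)
bound = 0.8 * alpha / (sigma ** 3 * sqrt(n))       # Berry-Esseen bound from the notes
print("Berry-Esseen bound (n = 100):", bound)                      # 0.08
print("exact maximum error         :", binomial_half_max_error(n))  # roughly 0.04
```

With these numbers, the exact maximum error for $n = 100$ is about half of what the bound guarantees, which is what "fairly loose" means in practice here.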