Convergence in Distribution and the Central Limit Theorem
Statistics 110, Summer 2006
Copyright © 2006 by Mark E. Irwin

Convergence in Distribution

Theorem. Let $X \sim \text{Bin}(n, p)$ and let $\lambda = np$. Then
$$\lim_{n\to\infty} P[X = x] = \lim_{n\to\infty} \binom{n}{x} p^x (1-p)^{n-x} = \frac{e^{-\lambda}\lambda^x}{x!}$$
So when $n$ gets large, we can approximate binomial probabilities with Poisson probabilities.

Proof.
$$\lim_{n\to\infty} \binom{n}{x} p^x (1-p)^{n-x} = \lim_{n\to\infty} \binom{n}{x} \left(\frac{\lambda}{n}\right)^x \left(1 - \frac{\lambda}{n}\right)^{n-x}$$
$$= \lim_{n\to\infty} \frac{n!}{x!\,(n-x)!} \frac{1}{n^x}\, \lambda^x \left(1 - \frac{\lambda}{n}\right)^{-x} \left(1 - \frac{\lambda}{n}\right)^{n}$$
$$= \frac{\lambda^x}{x!} \lim_{n\to\infty} \underbrace{\frac{n!}{(n-x)!\,(n-\lambda)^x}}_{\to\, 1}\; \underbrace{\left(1 - \frac{\lambda}{n}\right)^{n}}_{\to\, e^{-\lambda}} = \frac{e^{-\lambda}\lambda^x}{x!}$$

Note that the approximation works better when $n$ is large and $p$ is small, as can be seen in the following plot. If $p$ is relatively large, a different approximation should be used; this is coming later. (In the plot, bars correspond to the true binomial probabilities and the red circles to the Poisson approximation.)

[Figure: binomial pmfs (bars) with Poisson approximations (red circles) for $\lambda = 1$ ($n = 10, p = 0.1$; $n = 50, p = 0.02$; $n = 200, p = 0.005$) and $\lambda = 5$ ($n = 10, p = 0.5$; $n = 50, p = 0.1$; $n = 200, p = 0.025$).]

Example: Let $Y_1, Y_2, \ldots$ be iid $\text{Exp}(1)$. Then $X_n = Y_1 + Y_2 + \ldots + Y_n \sim \text{Gamma}(n, 1)$, which has
$$E[X_n] = n, \qquad \text{Var}(X_n) = n, \qquad \text{SD}(X_n) = \sqrt{n}$$
Thus $Z_n = \frac{X_n - n}{\sqrt{n}}$ has mean 0 and variance 1. Let's compare its distribution to that of $Z \sim N(0, 1)$; i.e., is $P[-1 \le Z_n \le 2] \approx P[-1 \le Z \le 2]$?

Let
$$Z_n = \frac{X_n - n}{\sqrt{n}}, \qquad X_n = n + \sqrt{n}\, Z_n$$
so that
$$f_{Z_n}(z) = f_{X_n}(n + \sqrt{n}\, z) \cdot \sqrt{n}$$
Then
$$P[a \le Z_n \le b] = \int_a^b f_{Z_n}(z)\,dz = \int_a^b \sqrt{n}\, f_{X_n}(n + \sqrt{n}\, z)\,dz = \int_a^b \sqrt{n}\, \frac{(n + \sqrt{n}\, z)^{n-1}}{(n-1)!}\, e^{-(n + \sqrt{n}\, z)}\,dz$$
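Before pushing the analysis further, the question posed above can be checked numerically. The sketch below (not from the original notes; the helper names `erlang_cdf`, `phi`, and `prob_zn` are illustrative) uses the closed-form CDF of a $\text{Gamma}(n,1)$ with integer shape, $P[X_n \le x] = 1 - e^{-x}\sum_{k=0}^{n-1} x^k/k!$, to compute $P[-1 \le Z_n \le 2]$ exactly and compare it with the $N(0,1)$ probability.

```python
import math

def erlang_cdf(x, n):
    # CDF of Gamma(n, 1) with integer shape n (an Erlang distribution):
    # P[X <= x] = 1 - e^{-x} * sum_{k=0}^{n-1} x^k / k!
    if x <= 0:
        return 0.0
    s = sum(x ** k / math.factorial(k) for k in range(n))
    return 1.0 - math.exp(-x) * s

def phi(z):
    # Standard normal CDF, via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_zn(a, b, n):
    # P[a <= Z_n <= b] where Z_n = (X_n - n) / sqrt(n), X_n ~ Gamma(n, 1)
    lo = n + a * math.sqrt(n)
    hi = n + b * math.sqrt(n)
    return erlang_cdf(hi, n) - erlang_cdf(lo, n)

exact = phi(2) - phi(-1)   # P[-1 <= Z <= 2] for Z ~ N(0, 1), about 0.8186
for n in (5, 20, 100):
    print(n, round(prob_zn(-1, 2, n), 4))
```

The printed probabilities approach the normal answer as $n$ grows, which is exactly the convergence the derivation below establishes analytically.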
To go further we need Stirling's formula: $n! \approx n^n e^{-n} \sqrt{2\pi n}$. So
$$\sqrt{n}\, f_{X_n}(n + \sqrt{n}\, z) = \frac{\sqrt{n}}{(n-1)!}\, e^{-n - z\sqrt{n}} (n + z\sqrt{n})^{n-1} \approx \frac{e^{-n - z\sqrt{n}} (n + z\sqrt{n})^{n-1} \sqrt{n}}{(n-1)^{n-1}\, e^{-n+1} \sqrt{2\pi n}} \approx \frac{1}{\sqrt{2\pi}} \underbrace{e^{-z\sqrt{n}} \left(1 + \frac{z}{\sqrt{n}}\right)^{n}}_{g_n(z)}$$

$$\log(g_n(z)) = -z\sqrt{n} + n \log\left(1 + \frac{z}{\sqrt{n}}\right) = -z\sqrt{n} + n\left[\frac{z}{\sqrt{n}} - \frac{1}{2}\frac{z^2}{n} + \frac{1}{3}\frac{z^3}{n^{3/2}} - \ldots\right] = -\frac{1}{2} z^2 + O\left(\frac{1}{\sqrt{n}}\right)$$
so
$$\sqrt{n}\, f_{X_n}(n + z\sqrt{n}) \approx \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$
Thus
$$P[a \le Z_n \le b] \to \int_a^b \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\,dz = P[a \le Z \le b]$$
So as $n$ increases, the distribution of $Z_n$ gets closer and closer to a $N(0, 1)$.

Another way of thinking of this is that the distribution of $X_n = n + Z_n\sqrt{n}$ approaches that of a $N(n, n)$.

[Figure: $\text{Gamma}(n, 1)$ densities for $n = 2, 5, 10, 20, 50, 100$, approaching the corresponding $N(n, n)$ densities.]

Definition. Let $X_1, X_2, \ldots$ be a sequence of RVs with cumulative distribution functions $F_1, F_2, \ldots$ and let $X$ be a RV with distribution $F$. We say $X_n$ converges in distribution to $X$, written $X_n \xrightarrow{D} X$, if
$$\lim_{n\to\infty} F_n(x) = F(x)$$
at every point at which $F$ is continuous.

An equivalent statement is that for all $a$ and $b$ at which $F$ is continuous,
$$P[a \le X_n \le b] \to P[a \le X \le b]$$
Note that if $X_n$ and $X$ are discrete distributions, this condition reduces to $P[X_n = x_i] \to P[X = x_i]$ for all support points $x_i$.

Note that an equivalent definition of convergence in distribution is that $X_n \xrightarrow{D} X$ if $E[g(X_n)] \to E[g(X)]$ for all bounded, continuous functions $g(\cdot)$. This statement of convergence in distribution is needed to help prove the following theorem.

Theorem. [Continuity Theorem] Let $X_n$ be a sequence of random variables with cumulative distribution functions $F_n(x)$ and corresponding moment generating functions $M_n(t)$.
Let $X$ be a random variable with cumulative distribution function $F(x)$ and moment generating function $M(t)$. If $M_n(t) \to M(t)$ for all $t$ in an open interval containing zero, then $F_n(x) \to F(x)$ at all continuity points of $F$. That is, $X_n \xrightarrow{D} X$.

Thus the previous two examples (Binomial/Poisson and Gamma/Normal) could be proved this way.

For the Gamma/Normal example,
$$M_{Z_n}(t) = M_{X_n}\left(\frac{t}{\sqrt{n}}\right) e^{-t\sqrt{n}} = e^{-t\sqrt{n}} \left(\frac{1}{1 - \frac{t}{\sqrt{n}}}\right)^{n}$$
Similarly to the earlier proof, it's easier to work with $\log M_{Z_n}(t)$:
$$\log M_{Z_n}(t) = -t\sqrt{n} - n \log\left(1 - \frac{t}{\sqrt{n}}\right) = -t\sqrt{n} - n\left[-\frac{t}{\sqrt{n}} - \frac{1}{2}\frac{t^2}{n} - \frac{1}{3}\frac{t^3}{n^{3/2}} - \ldots\right] = \frac{1}{2} t^2 + O\left(\frac{1}{\sqrt{n}}\right)$$
Thus
$$M_{Z_n}(t) \to e^{t^2/2}$$
which is the MGF of a standard normal.

Central Limit Theorem

Theorem. [Central Limit Theorem (CLT)] Let $X_1, X_2, X_3, \ldots$ be a sequence of independent RVs having mean $\mu$ and variance $\sigma^2$, a common distribution function $F(x)$, and moment generating function $M(t)$ defined in a neighbourhood of zero. Let
$$S_n = \sum_{i=1}^{n} X_i$$
Then
$$\lim_{n\to\infty} P\left[\frac{S_n - n\mu}{\sigma\sqrt{n}} \le x\right] = \Phi(x)$$
That is,
$$\frac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{D} N(0, 1)$$

Proof. Without loss of generality, we can assume that $\mu = 0$. So let $Z_n = \frac{S_n}{\sigma\sqrt{n}}$. Since $S_n$ is the sum of $n$ iid RVs,
$$M_{S_n}(t) = (M(t))^{n}, \qquad M_{Z_n}(t) = \left(M\left(\frac{t}{\sigma\sqrt{n}}\right)\right)^{n}$$
Taking a Taylor series expansion of $M(t)$ around 0 gives
$$M(t) = M(0) + M'(0)\,t + \frac{1}{2} M''(0)\,t^2 + \epsilon_t = 1 + \frac{1}{2}\sigma^2 t^2 + O(t^3)$$
since $M(0) = 1$, $M'(0) = \mu = 0$, and $M''(0) = \sigma^2$. So
$$M\left(\frac{t}{\sigma\sqrt{n}}\right) = 1 + \frac{1}{2}\sigma^2\left(\frac{t}{\sigma\sqrt{n}}\right)^2 + O\left(\left(\frac{t}{\sigma\sqrt{n}}\right)^3\right) = 1 + \frac{t^2}{2n} + O\left(\frac{1}{n^{3/2}}\right)$$
This gives
$$M_{Z_n}(t) = \left(1 + \frac{t^2}{2n} + O\left(\frac{1}{n^{3/2}}\right)\right)^{n} \to e^{t^2/2}$$

Note that the requirement of an MGF is not needed for the theorem to hold. In fact, all that is needed is that $\text{Var}(X_i) = \sigma^2 < \infty$. A standard proof of this more general theorem uses the characteristic function (which is defined for any distribution)
$$\phi(t) = \int_{-\infty}^{\infty} e^{itx} f(x)\,dx = M(it)$$
instead of the moment generating function $M(t)$, where $i = \sqrt{-1}$.
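The MGF convergence in the Gamma/Normal example is easy to watch numerically. A minimal sketch (not from the notes; the helper name `mgf_zn` is illustrative): evaluate $M_{Z_n}(t) = e^{-t\sqrt{n}}(1 - t/\sqrt{n})^{-n}$ at a fixed $t$ and compare against the limit $e^{t^2/2}$.

```python
import math

def mgf_zn(t, n):
    # MGF of Z_n = (X_n - n)/sqrt(n) with X_n ~ Gamma(n, 1):
    # M_{Z_n}(t) = e^{-t sqrt(n)} (1 - t/sqrt(n))^{-n}, valid for t < sqrt(n)
    r = math.sqrt(n)
    return math.exp(-t * r) * (1.0 - t / r) ** (-n)

t = 1.0
limit = math.exp(t * t / 2.0)   # e^{t^2/2} = e^{1/2}, about 1.6487
for n in (10, 100, 10_000):
    print(n, mgf_zn(t, n))      # values decrease toward the limit
```

This mirrors the Continuity Theorem's hypothesis at a single point $t$; the theorem of course requires convergence on an open interval around zero.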
Thus the CLT holds for distributions such as the log-normal, even though it doesn't have an MGF.

Also, the CLT is often presented in the following equivalent form:
$$Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} = \frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \xrightarrow{D} N(0, 1)$$
To see this is the same, just multiply the numerator and denominator by $n$ in the first form to get the statement about $S_n$.

The common way that this is used is that
$$S_n \overset{\text{approx.}}{\sim} N(n\mu, n\sigma^2) \qquad \text{or} \qquad \bar{X}_n \overset{\text{approx.}}{\sim} N\left(\mu, \frac{\sigma^2}{n}\right)$$

Example: Insurance claims. Suppose that an insurance company has 10,000 policyholders. The expected yearly claim per policyholder is $240, with a standard deviation of $800. What is the approximate probability that the total yearly claims $S_{10{,}000}$ exceed $2.6 million?
$$E[S_{10{,}000}] = 10{,}000 \times 240 = 2{,}400{,}000$$
$$\text{SD}(S_{10{,}000}) = \sqrt{10{,}000} \times 800 = 80{,}000$$
$$P[S_{10{,}000} > 2{,}600{,}000] = P\left[\frac{S_{10{,}000} - 2{,}400{,}000}{80{,}000} > \frac{2{,}600{,}000 - 2{,}400{,}000}{80{,}000}\right] \approx P[Z > 2.5] = 0.0062$$
Note that this probability statement does not use anything about the distribution of the original policy claims except their mean and standard deviation.
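The insurance arithmetic fits in a few lines of standard-library code; a sketch (the function name `normal_tail` is illustrative) using the complementary error function for the upper tail of the standard normal:

```python
import math

def normal_tail(z):
    # P[Z > z] for Z ~ N(0, 1), via the complementary error function
    return 0.5 * math.erfc(z / math.sqrt(2.0))

n, mu, sigma = 10_000, 240.0, 800.0
mean_S = n * mu                        # 2,400,000
sd_S = math.sqrt(n) * sigma            # 80,000
z = (2_600_000 - mean_S) / sd_S        # 2.5
print(round(normal_tail(z), 4))        # prints 0.0062
```

As the notes emphasize, only the mean and standard deviation of the individual claims enter the calculation.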