Large Sample Tools∗ STA 312: Fall 2012

Large Sample Tools∗ STA 312: Fall 2012 Background Reading: Davison's Statistical models • For completeness, look at Section 2.1, which presents some basic applied statistics in an advanced way. • Especially see Section 2.2 (Pages 28-37) on convergence. • Section 3.3 (Pages 77-90) goes more deeply into simulation than we will. At least skim it. Overview Contents 1 Foundations2 2 Law of Large Numbers3 3 Consistency7 4 Central Limit Theorem8 5 Convergence of random vectors 10 6 Delta Method 11 ∗See last slide for copyright information. 1 1 Foundations Sample Space Ω, ! 2 Ω • Observe whether a single individual is male or female: Ω = fF; Mg • Pair of individuals; observe their genders in order: Ω = f(F; F ); (F; M); (M; F ); (M; M)g • Select n people and count the number of females: Ω = f0; : : : ; ng For limits problems, the points in Ω are infinite sequences. Random variables are functions from Ω into the set of real numbers P rfX 2 Bg = P r(f! 2 Ω: X(!) 2 Bg) Random Sample X1(!);:::;Xn(!) • T = T (X1;:::;Xn) • T = Tn(!) • Let n ! 1 to see what happens for large samples Modes of Convergence • Almost Sure Convergence • Convergence in Probability • Convergence in Distribution Almost Sure Convergence a:s: We say that Tn converges almost surely to T , and write Tn ! if P rf! : lim Tn(!) = T (!)g = 1: n!1 • Acts like an ordinary limit, except possibly on a set of probability zero. • All the usual rules apply. • Called convergence with probability one or sometimes strong convergence. 2 2 Law of Large Numbers Strong Law of Large Numbers Let X1;:::;Xn be independent with common expected value µ. a:s: Xn ! E(Xi) = µ The only condition required for this to hold is the existence of the expected value. Probability is long run relative frequency • Statistical experiment: Probability of \success" is p • Carry out the experiment many times independently. • Code the results Xi = 1 if success, Xi = 0 for failure, i = 1; 2;::: Sample proportion of successes converges to the probability of success Recall Xi = 0 or 1. 1 X E(Xi) = x P rfXi = xg x=0 = 0 · (1 − p) + 1 · p = p Relative frequency is n 1 X a:s: X = X ! p n i n i=1 3 Simulation • Estimate almost any probability that's hard to figure out • Power • Weather model • Performance of statistical methods • Confidence intervals for the estimate A hard elementary problem • Roll a fair die 13 times and observe the number each time. • What is the probability that the sum of the 13 numbers is divisible by 3? Simulate from a multinomial > nsim = 1000 # nsim is the Monte Carlo sample size > set.seed(9999) # So I can reproduce the numbers if desired. > kount = numeric(nsim) > for(i in 1:nsim) + { + tot = sum(rmultinom(1,13,die)*(1:6)) + kount[i] = (tot/3 == floor(tot/3)) + # Logical will be converted to numeric + } > kount[1:20] [1] 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 > xbar = mean(kount); xbar [1] 0.329 Check if the sum is divisible by 3 > tot = sum(rmultinom(1,13,die)*(1:6)) > tot [1] 42 > tot/3 == floor(tot/3) [1] TRUE > 42/3 [1] 14 4 Estimated Probability > nsim = 1000 # nsim is the Monte Carlo sample size > set.seed(9999) # So I can reproduce the numbers if desired. > kount = numeric(nsim) > for(i in 1:nsim) + { + tot = sum(rmultinom(1,13,die)*(1:6)) + kount[i] = (tot/3 == floor(tot/3)) + # Logical will be converted to numeric + } > kount[1:20] [1] 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 > xbar = mean(kount); xbar [1] 0.329 q X(1−X) Confidence Interval X ± zα/2 n > z = qnorm(0.995); z [1] 2.575829 > pnorm(z)-pnorm(-z) # Just to check [1] 0.99 > margerror99 = sqrt(xbar*(1-xbar)/nsim)*z; margerror99 [1] 0.03827157 > cat("Estimated probability is ",xbar," with 99% margin of error ", + margerror99,"\n") Estimated probability is 0.329 with 99% margin of error 0.03827157 > cat("99% Confidence interval from ",xbar-margerror99," to ", + xbar+margerror99,"\n") 99% Confidence interval from 0.2907284 to 0.3672716 Recall the Change of Variables formula: Let Y = g(X) Z 1 Z 1 E(Y ) = y fY (y) dy = g(x) fX (x) dx −∞ −∞ Or, for discrete random variables 5 X X E(Y ) = y pY (y) = g(x) pX (x) y x This is actually a big theorem, not a definition. Applying the change of variables formula To approximate E[g(X)] n n 1 X 1 X a:s: g(X ) = Y ! E(Y ) n i n i i=1 i=1 = E(g(X)) So for example n 1 X a:s: Xk ! E(Xk) n i i=1 n 1 X a:s: U 2V W 3 ! E(U 2VW 3) n i i i i=1 That is, sample moments converge almost surely to population moments. R 1 Approximate an integral: −∞ h(x) dx Where h(x) is a nasty function. Let f(x) be a density with f(x) > 0 wherever h(x) 6= 0. Z 1 Z 1 h(x) h(x) dx = f(x) dx −∞ −∞ f(x) h(X) = E f(X) = E[g(X)]; 6 So • Sample X1;:::;Xn from the distribution with density f(x) h(Xi) • Calculate Yi = g(Xi) = for i = 1; : : : ; n f(Xi) a:s: • Calculate Y n ! E[Y ] = E[g(X)] Convergence in Probability P We say that Tn converges in probability to T , and write Tn ! T if for all > 0, lim P fjTn − T j < g = 1 n!1 Convergence in probability (say to a constant θ) means no matter how small the interval around θ, for large enough n (that is, for all n > N1) the probability of getting that close to θ is as close to one as you like. Weak Law of Large Numbers p Xn ! µ • Almost Sure Convergence implies Convergence in Probability • Strong Law of Large Numbers implies Weak Law of Large Numbers 3 Consistency Consistency T = T (X1;:::;Xn) is a statistic estimating a parameter θ P The statistic Tn is said to be consistent for θ if Tn ! θ. lim P fjTn − θj < g = 1 n!1 a:s: The statistic Tn is said to be strongly consistent for θ if Tn ! θ. Strong consistency implies ordinary consistency. 7 Consistency is great but it's not enough. • It means that as the sample size becomes indefinitely large, you probably get as close as you like to the truth. • It's the least we can ask. Estimators that are not consistent are completely unaccept- able for most purposes. 100; 000; 000 T a:s:! θ ) U = T + a:s:! θ n n n n Consistency of the Sample Variance n 1 X σ2 = (X − X)2 bn n i i=1 n 1 X 2 = X2 − X n i i=1 a:s: 1 Pn 2 a:s: 2 2 2 By SLLN, Xn ! µ and n i=1 Xi ! E(X ) = σ + µ . Because the function g(x; y) = x − y2 is continuous, n ! 1 X a:s: σ2 = g X2; X ! g(σ2 + µ2; µ) = σ2 + µ2 − µ2 = σ2 bn n i n i=1 4 Central Limit Theorem Convergence in Distribution Sometimes called Weak Convergence, or Conver- gence in Law Denote the cumulative distribution functions of T1;T2;::: by F1(t);F2(t);::: respectively, and denote the cumulative distribution function of T by F (t). d We say that Tn converges in distribution to T , and write Tn ! T if for every point t at which F is continuous, lim Fn(t) = F (t) n!1 8 Univariate Central Limit Theorem Let X1;:::;Xn be a random sample from a distribution with expected value µ and variance σ2. Then p n(X − µ) Z = n !d Z ∼ N(0; 1) n σ Connections among the Modes of Convergence a:s: p d • Tn ! T ) Tn ! T ) Tn ! T . d p • If a is a constant, Tn ! a ) Tn ! a. Sometimes we say the distribution of the sample mean is approximately normal, or asymptotically normal. • This is justified by the Central Limit Theorem. • But it does not mean that Xn converges in distribution to a normal random variable. • The Law of Large Numbers says that Xn converges in distribution to a constant, µ. • So Xn converges to µ in distribution as well. Why would we say that for large n, the sample mean is approximately N(µ, σ2 )? p n n(Xn−µ) d Have Zn = σ ! Z ∼ N(0; 1). p p n(X − µ) n(x − µ) P rfX ≤ xg = P r n ≤ n σ σ p n(x − µ) = P r Z ≤ n σ p n(x − µ) ≈ Φ σ σ2 Suppose Y is exactly N(µ, n ): p p n(Y − µ) n(x − µ) P rfY ≤ xg = P r ≤ σ σ p n(x − µ) = P r Z ≤ n σ p n(x − µ) = Φ σ 9 5 Convergence of random vectors Convergence of random vectors 1. Definitions (All quantities in boldface are vectors in Rm unless otherwise stated ) a:s: ? Tn ! T means P f! : limn!1 Tn(!) = T(!)g = 1. P ? Tn ! T means 8 > 0; limn!1 P fjjTn − Tjj < g = 1. d ? Tn ! T means for every continuity point t of FT, limn!1 FTn (t) = FT(t).

Large Sample Tools∗ STA 312: Fall 2012

Chapter 8 Large Sample Theory

Applying the Delta Method in Metric Analytics

Chapter 6 Asymptotic Distribution Theory

The 'Delta Method'

Chapter 5 the Delta Method and Applications

Stat 8931 (Aster Models) Lecture Slides Deck 4 [1Ex] Large Sample

Taylor Approximation and the Delta Method

TESTING for the PARETO DISTRIBUTION Suppose That X1

Uniformity and the Delta Method

Calculating Confidence Intervals for Continuous and Discontinuous Functions of Estimated Parameters

MLE Lecture 4.Pdf

1 the Delta Method; a General Approach to Handling Nonlinear Functions