Asymptotic Theory for 240A

Home , Asymptotic distribution, Estimator

Colin Cameron: Brief Asymptotic Theory for 240A

For 240A we do not go in to great detail. Key OLS results are in Section 1 and 4. The theorems cited in sections 2 and 3 are those from Appendix A of Cameron and Trivedi (2005), Microeconometrics: Methods and Applications.

1. A Roadmap for the OLS Estimator

1.1. Estimator and Model Assumptions 1 N 2 N Consider the OLS estimator = i=1 xi i=1 xiyi, where there is just one regressor and no intercept. We wantP to …nd theP properties of as sample size N . b ! 1 First, we need to make assumptions about the data generatingb process (dgp). We assume data are independent over i, the model is correctly speci…ed and the error is well behaved:

yi = xi + ui 2 ui xi [0; ], not necessarily normal. j 1 2 N 2 Then E[ ] = and V[ ] = i=1 xi for any N. b b P 1.2. Consistency of Intuitively collapses on its mean of as N , since V[ ] 0 as N . b ! 1 ! ! 1 The formal term is that converges in probability to , or that plim = , p or that b . We say that is consistent for . b ! The usual method ofb proof is not this simple as our toolkit works separatelyb on eachb average, and then combinesb results. Given the dgp, can be written as

1 N x u = + N i=1 i i : b 1 N x2 NP i=1 i The numerator can be shownb to go to zero,P by a law of large numbers, while the denominator goes to something nonzero. It follows that the ratio goes to zero. Formally:

1 N p plim x u 0 + N i=1 i i = + = : ! plim 1 N x2 plim 1 N x2 NP i=1 i N i=1 i b P P 1.3. Asymptotic Normality of Intuitively, if a central limit theorem can be applied, then b a a 2 N 2 1 [E[ ],V[ ]] [ , ( x ) ]; N N i=1 i a where means is “basymptoticallyb b distributedP as”. Again our toolkit works separately on each average, and then combines results. The method is to rescale by pN, to get something with nondegenerate distribution, and

1 N 0; 2 plim 1 N x2 i=1 xiui N i=1 i pN( ) = pN d N 1 N 2 ! h 1 N 2 i N Pi=1 xi plim N i=1Pxi 1 b 1 d P 0; 2 plim N x2 P : !N N i=1 i " # P The key component is that 1 N x u has a normal distribution as N , pN i=1 i i by a central limit theorem. ! 1 Given this theoretical resultP we convert from pN( ) to and drop the plim, giving a 2 N 2 1 [ , ( x ) ]: b b N i=1 i P 2. Consistency b

The keys are (1) convergence in probability; (2) law of large numbers for an average; (3) combining pieces.

2.1. Convergence in Probability

Consider the limit behavior of a sequence of random variables bN as N . ! 1 This is a stochastic extension of a sequence of real numbers, such as aN = 2 + (3=N).

2 Examples include: (1) bN is an estimator, say ; (2) bN is a component of an 1 estimator, such as N i xiui; (3) bN is a test statistic. Due to sampling randomness we can never be certainb that a random sequence P bN , such as an estimator N , will be within a given small distance of its limit, even if the sample is in…nitely large. But we can be almost certain. Di¤erent ways of expressing this almost certaintyb correspond to di¤erent types of convergence of a sequence of random variables to a limit. The one most used in econometrics is convergence in probability. Recall that a sequence of nonstochastic real numbers aN converges to a if f g for any " > 0, there exists N = N (") such that for all N > N ,

aN a < ": j j e.g. if aN = 2+3=N; then the limit a = 2 since aN a = 2+3=N 2 = 3=N < " j j j j j j for all N > N = 3=": For a sequence of r.v.’swe cannot be certain that bN b < ", even for large N, due to the randomness. Instead, we require that thej probability j of being within " is arbitrarily close to one. Thus bN converges in probability to b if f g

lim Pr[ bN b < "] = 1; N !1 j j for any " > 0. A more formal de…nition is the following. De…nition A1: (Convergence in Probability) A sequence of random variables bN converges in probability to b if for any " > 0 and > 0, there exists f g N = N ("; ) such that for all N > N ,

Pr[ bN b < "] > 1 : j j

We write plim bN = b, where plim is short-hand for probability limit, or p bN b. The limit b may be a constant or a random variable. The usual de…nition of convergence! for a sequence of real variables is a special case of A1. For vector random variables we can apply the theory for each element of 2 bN . [Alternatively replace bN b by the scalar (bN b)0(bN b) = (b1N b1) + 2 j j + (bKN bK ) or by its square root bN b .] jj jj Now consider bN to be a sequence of parameter estimates . f g b 3 De…nition A2:(Consistency) An estimator is consistent for 0 if plim = : 0b Unbiasedness consistency. Unbiasedness states E[] = . Unbiasedness ; b 0 permits variability around 0 that need not disappear as the sample size goes to in…nity. b Consistency ; unbiasedness. e.g. add 1=N to an unbiased and consistent estimator - now biased but still consistent. A useful property of plim is that it can apply to transformations of random variables.

Theorem A3:(Probability Limit Continuity). Let bN be a …nite-dimensional vector of random variables, and g( ) be a real-valued function continuous at a constant vector point b. Then p p bN b g(bN ) g(b): ! ) ! This theorem is often referred to as Slutsky’s Theorem. We instead call Theo- rem A12 Slutsky’stheorem. Theorem A3 is one of the major reasons for the prevalence of asymptotic results versus …nite sample results in econometrics. It states a very convenient property that does not hold for expectations. For example, plim(aN ; bN ) = (a; b) implies plim(aN bN ) = ab, whereas E[aN bN ] generally di¤ers from E[a]E[b]. Similarly plim[aN =bN ] = a=b provided b = 0. There are several ways to establish convergence6 in probability. The brute force method uses De…nition A1, this is rarely done. It is often easier to establish alternative modes of convergence, notably convergence in mean square or use of Chebychev’sinequality, which in turn imply convergence in probability. But it is usually easiest to use a law of large numbers.

2.2. Laws of Large Numbers Laws of large numbers are theorems for convergence in probability (or almost surely) in the special case where the sequence bN is a sample average, i.e. f g bN = XN where 1 N X = X : N N i i=1 X 4 Note that Xi here is general notation for a random variable, and in the regression context does not necessarily denote the regressor variables. For example Xi = xiui. De…nition A7:(Law of Large Numbers)A weak law of large numbers (LLN) speci…es conditions on the individual terms Xi in XN under which p (XN E[XN ]) 0: ! For a strong law of large numbers the convergence is instead almost surely. If a LLN can be applied then

plim XN = lim E[XN ] in general 1 N = lim N i=1 E[Xi] if Xi independent over i = if Xi iid. P The simplest laws of large numbers assume that Xi is iid.

Theorem A8:(Kolmogorov LLN ) Let Xi be iid (independent and identically f g as distributed). If and only if E[Xi] = exists and E[ Xi ] < , then (XN ) 0. j j 1 ! The Kolmogorov LLN gives almost sure convergence. Usually convergence in probability is enough and we can use the weaker Khinchine’s Theorem.

Theorem A8b:(Khinchine’s Theorem) Let Xi be iid (independent and iden- f g p tically distributed). If and only if E[Xi] = exists, then (XN ) 0. ! There are other laws of large numbers. In particular, if Xi are independent but not identically distributed we can use the Markov LLN.

2.3. Consistency of OLS Estimator

1 N 1 N 2 Obtain the probability limit of = + [ N i=1 xiui]=[ N i=1 xi ], under simple random sampling with iid errors. Assume xi iid with mean x and ui iid with mean 0. b P P 1 p As xiui are iid, apply Khinchine’s Theorem yielding N i xiui E[xu] = E[x] E[u] = 0. ! 2 1 P 2 p 2 As xi are iid, apply Khinchine’sTheorem yielding N i xi E[x ] which we assume exists. ! P By Theorem A3 (Probability Limit Continuity) plim[aN =bN ] = a=b if b = 0. Then 6 plim 1 N x u 0 plim = + N i=1 i i = + = : plim 1 N x2 E[x2] NP i=1 i b P5 3. Asymptotic Normality

The keys are (1) convergence in distribution; (2) central limit theorem for an average; (3) combining pieces. Given consistency, the estimator has a degenerate distribution that collapses on 0 as N . So cannot do statistical inference. [Indeed there is no reason to ! 1 do it if N .] Need to magnify orb rescale to obtain a random variable with nondegenerate! 1 distribution as N . ! 1 b 3.1. Convergence in Distribution

Often the appropriate scale factor is pN, so consider bN = pN( 0). bN has an extremely complicated cumulative distribution function (cdf) FN . But like any other function FN it may have a limit function, where convergenceb is in the usual (nonstochastic) mathematical sense. De…nition A10:(Convergence in Distribution) A sequence of random variables bN is said to converge in distribution to a random variable b if f g

lim FN = F; N !1 at every continuity point of F , where FN is the distribution of bN , F is the distribution of b, and convergence is in the usual mathematical sense.

d We write bN b, and call F the limit distribution of bN . p ! d f g bN b implies bN b. ! ! d In general, the reverse is not true. But if b is a constant then bN b implies p ! bN b. ! To extend limit distribution to vector random variables simply de…ne FN and F to be the respective cdf’sof vectors bN and b. A useful property of convergence in distribution is that it can apply to transformations of random variables.

Theorem A11:(Limit Distribution Continuity). Let bN be a …nite-dimensional vector of random variables, and g( ) be a continuous real-valued function. Then d d bN b g(bN ) g(b): (3.1) ! ) !

6 This result is also called the Continuous Mapping Theorem.

d p Theorem A12:(Slutsky’sTheorem) If aN a and bN b, where a is a random variable and b is a constant, then ! !

d (i) aN + bN a + b d ! (ii) aN bN ab (3.2) !d (iii) aN =bN a=b, provided Pr[b = 0] = 0: ! Theorem A12 (also called Cramer’s Theorem) permits one to separately …nd the limit distribution of aN and the probability limit of bN , rather than having to consider the joint behavior of aN and bN . Result (ii) is especially useful and is sometimes called the Product Rule.

3.2. Central Limit Theorems Central limit theorems give convergence in distribution when the sequence bN is a sample average. A CLT is much easier way to get the plim than brute forcef g use of De…nition A10. By a LLN XN has a degenerate distribution as it converges to a constant, limE[XN ]. So scale (XN E[XN ]) by its standard deviation to construct a random variable with unit variance that will have a nondegenerate distribution. De…nition A13:(Central Limit Theorem) Let

XN E[XN ] ZN = ; V[XN ] where XN is a sample average. A centralp limit theorem (CLT) speci…es the conditions on the individual terms Xi in XN under which

d ZN [0; 1]; !N i.e. ZN converges in distribution to a standard normal random variable. Note that

ZN = (XN E[XN ])= V[XN ] in general N N = (Xi E[Xpi])= V[Xi] if Xi independent over i i=1 i=1 = pN(XN )= q if Xi iid. P P 7 If XN satis…es a central limit theorem, then so too does h(N)XN for functions h( ) such as h(N) = pN, since

h(N)XN E[h(N)XN ] ZN = : V[h(N)XN ]

p 1=2 N Often we apply the CLT to the normalization pNXN = N i=1 Xi, since V[pNX ] is …nite. N P The simplest central limit theorem is the following.

Theorem A14:(Lindeberg-Levy CLT) Let Xi be iid with E[Xi] = and 2 d f g V[Xi] = . Then ZN = pN(XN )= [0; 1]. !N Lindberg-Levy is the CLT in introductory statistics. For the iid case the LLN required exists, while CLT also requires 2 exists. There are other central limit theorems. In particular, if Xi are independent but not identically distributed we can use the Liapounov CLT.

3.3. Limit Distribution of OLS Estimator

1 N 1 N 2 Obtain the limit distribution of pN( ) = [ xiui]=[ x ], under pN i=1 N i=1 i simple random sampling with iid errors. Assume xi iid with mean x and second 2 P 2 P moment E[x ], and assume ui iid withb mean 0 and variance . Then xiui are iid, with mean E[xu] =E[x] E[u] = 0 and variance V[xu] = E[(xu)2] (E[xu])2 =E[x2u2] 0 = E[x2]E[u2] = 2E[x2]. Apply Lindeberg-Levy CLT yielding

N 1 N 1 N xiui 0 p i=1 xiui d pN i=1 = N [0; 1]: 2 2 2 2 !N P E[x ] ! P E[x ]

p d p d p Using Slutsky’s theorem that aN bN a b (for aN a and bN b), this implies that ! ! !

1 N N xiui 1 pN i=1 2 2 d 2 2 d 2 2 xiui = E[x ] [0; 1] E[x ] [0; E[x ]]: pN i=1 P2E[x2] !N !N X p p p

8 d d p Then using Slutsky’stheorem that aN =bN a=b (for aN a and bN b) ! ! ! 1 N i=1 xiui pN( ) = pN 1 N 2 N Pi=1 xi 2 2 2 2 b d [0; E[x ]] d [0; E[x ]] d 1 NP N 0; 2 E[x2] ; ! 1 N 2 ! E[x2] !N plim N i=1 xi h i P 1 N 2 2 where we use result from consistency proof that plim N i=1 xi =E[x ]. P 3.4. Asymptotic Distribution of OLS Estimator From consistency we have that has a degenerate distribution with all mass at , while pN( ) has a limit normal distribution. For formal asymptotic theory, such as deriving hypothesis tests,b we work with this limit distribution. But for exposition itb is convenient to think of the distribution of rather than pN( ). We do this by introducing the arti…ce of "asymptotic distribution". Speci…cally we consider N large but not in…nite, andb drop the probabilityb limit in the preceding result, so that

1 N 2 1 2 pN( ) 0; xi : N " N i=1 # X b It follows that the asymptotic distribution of is

N 1 a 2 2 ; xi b : N i=1 X Note that this is exactlyb the same result as we would have got if yi = xi + ui 2 with ui [0; ]: N 4. OLS Estimator with Matrix Algebra

1 Now consider = (X0X) X0y with y = X + u, so

1 b = + (X0X) X0u:

Note that the k k matrix Xb0X = i xixi0 is a sum, where xi is a k 1 vector of regressors for theith observation. Dividing by N gives an average. P 9 4.1. Multivariate CLT Now we need to work with vectors. Two useful results follow. De…nition A16a:(Multivariate Central Limit Theorem) Let N = E[XN ] and VN = V[X N ].A multivariate central limit theorem (CLT) speci…es the conditions on the individual terms Xi in X N under which

1=2 d V (X N ) [0; I]: N N !N 1=2 Note that if we can apply a CLT to X N we can also apply it to N X N . d Theorem A17:(Limit Normal Product Rule) If a vector aN [; A] and p !N a matrix HN H, where H is positive de…nite, then ! d HN aN [H; HAH0]: !N

4.2. Consistency of OLS To prove consistency we rewrite this as

1 1 1 = + N X0X N X0u:

1 1 The reason for renormalizationb in the right-hand side is that N X0X = N i xixi0 is an average that converges in probability to a …nite nonzero matrix if xi satis…es P assumptions that permit a LLN to be applied to xixi0 . Then 1 1 1 plim = + plim N X0X plim N X0u ; using Slutsky’sTheorem (Theorem A.3). The OLS estimator is therefore consis- b tent for (i.e., plim OLS = ) if

1 b plim N X0u = 0:

1 1 If a law of LLN can be applied to the average N X0u = N i xiui then a necessary condition for this to hold is that E[xiui] = 0. The fundamental P condition for consistency of OLS is that E[ui xi] = 0 so that E[xiui] = 0. j

10 4.3. Limit Distribution of OLS Given consistency, the limit distribution of is degenerate with all the mass at

. To obtain a limit distribution we scale OLS up by a multiple pN, so b 1 1 1=2 pN( ) = N Xb0X N X0u: 1 We know plim N X0X existsb and is …nite and nonzero from the proof of consis- 2 tency. For iid errors, E[u X] = 0 and V[X0u X] = E[X0uu0X0 X] = X0X we j j 1=2 j assume that a CLT can be applied to bN = N X0u to yield

1 1=2 1=2 d [N X0X] N X0u [0; I], so by Theorem A17 !N 1=2 d 2 1 N X0u [0; plim N X0X ]: !N Then by Theorem A17 (the limit normal product rule)

d 1 1 2 1 pN( ) (plim N X0X) [0; (plim N X0X)] !d 2 1 N 1 [0; (plim N X0X) ]: b !N 4.4. Asymptotic Distribution of OLS Then dropping the limits

2 1 pN( ) [0; N X0X ]; N so b a 2 1 [ ; (X0X) ]: N The asymptotic variance matrix is b 2 1 V[ ] = (X0X) ; and is consistently estimated by theb estimated variance matrix

2 1 V[ ] = s (X0X) ;

2 2 2 2 where s is consistent for . For example, s = u0u=(N k) or s = u0u=N: b b b b b b

11 4.5. OLS with Heteroskedastic Errors

2 What if the errors are heteroskedastic? If E[uu0 X] = = Diag[i ] then V[X0u X] = N 2 j j E[X0uu0X0 X] = X0X = xix0 . A Multivariate CLT gives j i=1 i i 1 1=2 1=2 d [N XP0X] N X0u [0; I], so !N 1=2 d 1 N X0u [0; plim N X0X]; !N leading to

d 1 1 1 pN( ) (plim N X0X) [0; plim N X0X] !d 2 1 N 1 1 1 1 [0; (plim N X0X) plim N X0X (plim N X0X) ]: b !N Then dropping the limits etcetera a 1 1 [ ; (X0X) X0X(X0X) ]: N The asymptotic variance matrix is b 1 1 V[ ] = (X0X) X0X(X0X) : White (1980) showed that we can use the estimated asymptotic variance matrix b N 1 2 1 V[ ] = (X0X) ( ui xixi0 )(X0X) ; i=1 X where ui = yi xi0 bisb the OLS residual. b 2 Why does this work? There are N variances i and only N observations, 2 so we cannotb consistentlyb estimate by = Diag[ui ]. But this is not neces- 1 sary! We just need to consistently estimate the k k matrix plim N X0X = 1 N 2 plim N i=1 i xixi0 . b b 2 2 Since E[u xix0 xi] = xix0 , by a LLN, if the error ui was observed, we could use iP ij i i N N 1 2 1 2 plim N ui xixi0 = plim N i xixi0 : i=1 i=1 p The error is not observed,X so we instead use the residual.X Formally, since , p u2 u2 0 so replacing u2 by u2 makes no di¤erence asymptotically. This! gives i i ! i i White’sestimator. b b This is a fundamental result.b Without specifying a model for heteroskedasticity we can do OLS and get heteroskedastic robust standard errors. This generalizes to other failures of = 2I, notably serially correlated errors (use Newey-West) and clustered errors. And it generalizes to estimators other than OLS (e.g. IV, MLE).

12 5. Optional Extra: Di¤erent Sampling Schemes

The key for consistency is obtaining the probability of the two averages (of xiui 2 and of xi ), by use of laws of large numbers (LLN). And for asymptotic normality the key is the limit distribution of the average of xiui, obtained by a central limit theorem (CLT). Di¤erent assumptions about the stochastic properties of xi and ui lead to 2 di¤erent properties of xi and xiui and hence di¤erent LLN and CLT.

5.1. Sampling Schemes For the data di¤erent sampling schemes assumptions include:

1. Simple Random Sampling (SRS).

SRS is when we randomly draw (yi; xi) from the population. Then xi are 2 iid. So xi are iid, and xiui are iid if the errors ui are iid. 2. Fixed regressors. This occurs in an experiment where we …x the xi and observe the resulting random yi. Given xi …xed and ui iid it follows that xiui are inid (even if ui 2 are iid), while xi are nonstochastic. 3. Exogenous Strati…ed Sampling This occurs when we oversample some values of x and undersample others. 2 Then xi are inid, so xiui are inid (even if ui are iid) and xi are inid.

The simplest results assume simple random sampling, as we have done.

5.2. LLN and CLT for inid Data

2 2 Suppose rather than Xi iid [; ] we have have Xi inid [; ]. Then we often use the following LLN and CLT.

Theorem A9:(Markov LLN ) Let Xi be inid (independent but not identi- f g 2 1+ 1+ cally distributed) with E[Xi] = i and V[Xi] = i . If i1=1 (E[ Xi i ]=i ) < 1 N as j j , for some > 0, then (XN N E[Xi]) 0: 1 i=1 ! P Compared to Kolmogorov LLN, theP Markov LLN allows nonidentical distribution, at expense of require existence of an absolute moment beyond the …rst.

13 The rest of the side-condition is likely to hold with cross-section data. e.g. if set 2 2 2 = 1, then need variance plus 1 ( =i ) < which happens if is bounded. i=1 i 1 i Theorem A15:(Liapounov CLTP ) Let Xi be independent with E[Xi] = i and f g (2+)=2 2 N 2+ N 2 V[Xi] = . If lim E[ Xi ] = = 0, for some choice i i=1 j ij i=1 i N N 2 d of > 0, then ZN =P (Xi )= P [0; 1]. i=1 i i=1 i !N Compared to Lindberg-Levy,P the LiapounovqP CLT additionally requires existence of an absolute moment of higher order than two.

5.3. Consistency of OLS Estimator

1 N 1 N 2 Obtain probability limit of = + [ N i=1 xiui]=[ N i=1 xi ]: P P 5.3.1. Simple Random Samplingb (SRS) with iid errors

Assume xi iid with mean x and ui iid with mean 0. 1 p As xiui are iid, apply Khinchine’s Theorem yielding N i xiui E[xu] = E[x] E[u] = 0. ! 2 1 P 2 p 2 As xi are iid, apply Khinchine’sTheorem yielding N i xi E[x ] which we assume exists. ! P By Theorem A3 (Probability Limit Continuity) plim[aN =bN ] = a=b if b = 0. Then 6 plim 1 N x u 0 plim = + N i=1 i i = + = : plim 1 N x2 E[x2] NP i=1 i b P 5.3.2. Fixed Regressors with iid errors

2 Assume xi …xed and that ui iid with mean 0 and variance . Then xiui are inid with mean E[xiui] = xiE[ui] = 0 and variance V[xiui] = 2 2 1 1 p 1 p xi . Apply Markov LLN yielding N i xiui N i E[xiui] 0, so N i xiui 2 2 2 ! ! 0. The side-condition with = 1 is i1=1 xi =i which is satis…ed if xi is bounded. 1 2 P P P We also assume lim N x exists. i i P By Theorem A3 (Probability Limit Continuity) plim[aN =bN ] = a=b if b = 0. Then P 6 plim 1 N x u 0 plim = + N i=1 i i = + = : lim 1 N x2 lim 1 N x2 N P i=1 i N i=1 i b P P 14 5.3.3. Exogenous Strati…ed Sampling with iid errors

Assume xi inid with mean E[xi] and variance V[xi] and ui iid with mean 0. 2 2 Now xiui are inid with mean E[xiui] =E[xi]E[ui] = 0 and variance V[xiui] =E[xi ] , 1 p so need Markov LLN. This yields N i xiui 0, with the side-condition satis- 2 ! …ed if E[xi ] is bounded. 2 P And xi are inid, so need Markov LLN with side-condition that requires e.g. 4 existence and boundedness of E[xi ]. Combining again get plim = .

5.4. Limit Distribution ofb OLS Estimator

1 N 1 N 2 Obtain limit distribution of pN( ) = [ xiui]=[ x ]: pN i=1 N i=1 i P P 5.4.1. Simple Random Samplingb (SRS) with iid errors

2 Assume xi iid with mean x and second moment E[x ], and assume ui iid with mean 0 and variance 2. Then xiui are iid, with mean E[xu] =E[x] E[u] = 0 and variance V[xu] = E[(xu)2] (E[xu])2 =E[x2u2] 0 = E[x2]E[u2] = 2E[x2]. Apply Lindeberg-Levy CLT yielding N 1 N 1 N xiui 0 p i=1 xiui d pN i=1 = N [0; 1]: 2 2 2 2 !N P E[x ] ! P E[x ] p d p d p Using Slutsky’s theorem that aN bN a b (for aN a and bN b), this implies that ! ! ! 1 N N xiui 1 pN i=1 2 2 d 2 2 d 2 2 xiui = E[x ] [0; 1] E[x ] [0; E[x ]]: pN i=1 P2E[x2] !N !N X p p d d p Then using Slutsky’stheoremp that aN =bN a=b (for aN a and bN b) ! ! ! 1 N i=1 xiui pN( ) = pN 1 N 2 N Pi=1 xi 2 2 2 2 b d [0; E[x ]] d [0; E[x ]] d 1 NP N 0; 2 E[x2] ; ! 1 N 2 ! E[x2] !N plim N i=1 xi h i 1 N 2 2 where we use result from consistencyP proof that plim N i=1 xi =E[x ].

15 P 5.4.2. Fixed Regressors with iid errors

2 Assume xi …xed and and ui iid with mean 0 and variance . 2 2 Then xiui are inid with mean 0 and variance V[xiui] = xi . Apply Liapounov LLN yielding

N 1 N 1 N xiui 0 p i=1 xiui d pN i=1 = N [0; 1]: 0 1 N 2 2 1 2 1 N 2 !N lim NP i=1 xi limPN i=1 xi @q A q P d P d p Using Slutsky’s theorem that aN bN a b (for aN a and bN b), this implies ! ! !

1 N N xiui N 1 pN i=1 2 1 2 xiui = N xi i=1 i=1 pN 2 limPN 1 N x2 r X i=1 i X q N N 2P 1 2 d 2 1 2 = [0; 1] lim N xi [0; lim xi ]: N r i=1 !N N i=1 X X d d p Then using Slutsky’stheorem that aN =bN a=b (for aN a and bN b) ! ! ! N 1 N 2 1 2 1 xiui 0; lim i=1 xi N pN i=1 d N N d 2 1 2 pN( ) = 0; lim xi : 1 N 2 ! h 1 N 2 i !N N i=1 N Pi=1 xi lim N i=1Pxi " # X b P P 5.4.3. Exogenous Strati…ed Sampling with iid errors

Assume xi inid with mean E[xi] and variance V[xi] and ui iid with mean 0. Similar to …xed regressors will need to use Liapounov CLT. We will get

1 N d 2 1 2 pN( ) 0; plim xi : !N " N i=1 # X b

16 6. Simulation Exercise

6.1. Data Generating Process

2 Suppose yi are iid (1), which has mean 1 and variance 2. 1 N Then y = N i=1 yi has E[y] = 1,V[y] = V[y]=N = 2=N, and using the result that the sum of N independent 2(1) is 2(N), we know that y 2(N)=N. P 6.2. Distribution of y From a CLT we know that as N ! 1 y [1; 2=N]: N Question: how well does this apply in …nite samples? Answer: do a simulation:

th 2 For the s of S times, generate N observations from (1) and calculate ys.

Then does the density of the S generated ys look like the density of a standard normal with mean 1 and variance 2=N? We can do this in several ways: comparing key moments, or comparing key percentiles, or compare a histogram or (kernel) density estimate to the normal.

6.3. Distribution of t-test

The t-test statistic of H0 : = 1 is y 1 t = ; sy

2 2 2 1 N where sy = sy=N and sy = N 1 i=1 yi. From asymptotic theory we know that t [0; 1] as N . As a …nite sample correction we will insteadP suppose it ist(N N 1) degrees! distributed, 1 which is the exact result if yi were normal rather than chi-square. Question: how well does this apply in …nite samples? Answer: do a simulation:

th 2 For the s of S times, generate N observations from (1) and calculate ys, ys 1 sys and hence ts = . sys

Then does the density of the S generated ts look like a t(N 1) density? Again we can test the similarity using moments, percentiles or kernel density estimate.

17 6.4. Size of t-test For the t-test it is behavior in the tails that really matter. In addition to view the 2:5 and 97:5 percentiles, we investigate the size of the t-test. Recall

Size = Pr[reject H0 : = 1 H0 true] j The nominal size of the test is the size that we think we are testing at. Assume that t T (N 1) distributed under H0. Then for two-tailed testing at 5 percent

Nominal size = 0:05 = Pr[ t > t:025;N 1 t T (N 1)]: j j j

But we do not know that t T (N 1) under H0, and in small samples it is not this. The true size of the test is the actual size that we are testing at.

True size = Pr[ t > t:025;N 1 true distribution of t under H0]: j j j To obtain this unknown true distribution we do a simulation. For the sth of 2 ys 1 S times, generate N observations from (1), calculate ts = , and record sys whether ts > t:025;N 1, i.e. whether we reject H0. Thenj thej simulation estimate of true size is the proportion of rejections:

True size = (# simulations with t > t:025;N 1)=(# simulations): j j