Asymptotic Results for the Linear Regression Model
C. Flinn
November 29, 2000

1. Asymptotic Results under Classical Assumptions

The following results apply to the linear regression model

    y = Xβ + ε,

where X is of dimension (n × k), ε is an unknown (n × 1) vector of disturbances, and β is an unknown (k × 1) parameter vector. We assume that n ≫ k and that ρ(X) = k. This implies that ρ(X'X) = k as well. Throughout we assume that the "classical" conditional moment assumptions apply, namely

  • E(ε_i | X) = 0 for all i.
  • V(ε_i | X) = σ² for all i.

We first show that the probability limit of the OLS estimator is β, i.e., that it is consistent. In particular, we know that

    β̂ = β + (X'X)⁻¹X'ε
    ⇒ E(β̂ | X) = β + (X'X)⁻¹X'E(ε | X) = β.

In terms of the (conditional) variance of the estimator β̂,

    V(β̂ | X) = σ²(X'X)⁻¹.

Now we will rely heavily on the following assumption:

    lim_{n→∞} X_n'X_n / n = Q,

where Q is a finite, nonsingular k × k matrix. Then we can write the covariance of β̂_n in a sample of size n explicitly as

    V(β̂_n | X_n) = (σ²/n)(X_n'X_n/n)⁻¹,

so that

    lim_{n→∞} V(β̂_n | X_n) = lim_{n→∞} (σ²/n) × lim_{n→∞} (X_n'X_n/n)⁻¹ = 0 × Q⁻¹ = 0.

Since the asymptotic variance of the estimator is 0 and its distribution is centered on β for all n, we have shown that β̂ is consistent.

Alternatively, we can prove consistency as follows. We need the following result.

Lemma 1.1. plim(X'ε/n) = 0.

Proof. First, note that E(X'ε/n) = 0 for any n. Then the variance of the expression X'ε/n is given by

    V(X'ε/n) = E[(X'ε/n)(X'ε/n)'] = n⁻²E(X'εε'X) = (σ²/n)(X'X/n),

so that lim_{n→∞} V(X'ε/n) = 0 × Q = 0. Since the asymptotic mean of the random variable is 0 and its asymptotic variance is 0, the probability limit of the expression is 0. ∎

Now we can state a slightly more direct proof of consistency of the OLS estimator:

    plim(β̂) = plim(β + (X'X)⁻¹X'ε)
             = β + lim_{n→∞} (X_n'X_n/n)⁻¹ × plim(X'ε/n)
             = β + Q⁻¹ × 0 = β.

Next, consider whether or not s² is a consistent estimator of σ².
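The consistency argument above is easy to see in simulation. The following sketch is illustrative only; the design, seed, and parameter values are assumptions, not from the notes:

```python
import numpy as np

# Monte Carlo sketch of OLS consistency: beta_hat = (X'X)^{-1} X'y should
# approach the true beta as n grows.  All values here are illustrative.
rng = np.random.default_rng(0)
beta = np.array([1.0, 2.0])   # true (k x 1) parameter vector, k = 2
sigma = 1.5                   # disturbance standard deviation

def ols(n):
    # X: intercept plus one bounded regressor, so X'X/n converges to a finite Q
    X = np.column_stack([np.ones(n), rng.uniform(-1.0, 1.0, n)])
    eps = rng.normal(0.0, sigma, n)   # E(eps | X) = 0, V(eps | X) = sigma^2
    y = X @ beta + eps
    return np.linalg.solve(X.T @ X, X.T @ y)

for n in (100, 10_000, 1_000_000):
    print(n, ols(n))   # estimates tighten around (1.0, 2.0) as n grows
```

Each tenfold increase in n shrinks the sampling spread of β̂ by roughly a factor of √10, matching the V(β̂_n | X_n) = (σ²/n)(X_n'X_n/n)⁻¹ calculation above.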
Now

    s² = SSE/(n − k),

where SSE = (y − Xβ̂)'(y − Xβ̂). We showed that E(s²) = σ² for all n; that is, s² is an unbiased estimator of σ² for all sample sizes. Since SSE = ε'Mε, with M = I − X(X'X)⁻¹X', and since (n − k)/n → 1,

    plim s² = plim [ε'Mε/(n − k)]
            = plim [ε'Mε/n]
            = plim(ε'ε/n) − plim(ε'X/n) × lim(X'X/n)⁻¹ × plim(X'ε/n)
            = plim(ε'ε/n) − 0 × Q⁻¹ × 0.

Now

    ε'ε/n = n⁻¹ Σ_{i=1}^n ε_i²,

so that

    E(ε'ε/n) = n⁻¹ Σ_{i=1}^n E(ε_i²) = n⁻¹(nσ²) = σ².

Similarly, under the assumption that the ε_i are i.i.d., the variance of the random variable being considered is given by

    V(ε'ε/n) = n⁻² Σ_{i=1}^n V(ε_i²) = n⁻² · n[E(ε_i⁴) − V(ε_i)²] = n⁻¹[E(ε_i⁴) − V(ε_i)²],

so that the limit of the variance of ε'ε/n is 0 as long as E(ε_i⁴) is finite [we have already assumed that the first two moments of the distribution of ε_i exist]. Thus the asymptotic distribution of ε'ε/n is centered at σ² and is degenerate, proving consistency of s².

2. Testing without Normally Distributed Disturbances

In this section we look at the distribution of test statistics associated with linear restrictions on the β vector when ε_i is not assumed to be distributed N(0, σ²) for all i. Instead, we proceed under the weaker condition that the ε_i are independently and identically distributed with common cumulative distribution function (c.d.f.) F, with E(ε_i) = 0 and V(ε_i) = σ² for all i.

Since we retain the mean independence and variance homogeneity assumptions, and since unbiasedness, consistency, and, for that matter, the Gauss-Markov theorem all rely only on these first two conditional moment assumptions, all of these results continue to hold when we drop normality. However, the small-sample distributions of our test statistics will no longer be accurate, since these were all derived under the assumption of normality.
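The two-step argument for s² (unbiasedness for every n, plus a variance that vanishes whenever the fourth moment is finite) can also be checked numerically. The recentred chi-square errors below are an illustrative assumption, chosen because they are non-normal but have finite fourth moments:

```python
import numpy as np

# Sketch of the s^2 consistency result: s^2 = SSE/(n - k) should settle near
# sigma^2 as n grows, even with non-normal errors.  Illustrative values only.
rng = np.random.default_rng(1)
sigma2 = 2.0   # true disturbance variance

def s_squared(n, k=3):
    X = rng.uniform(-1.0, 1.0, (n, k))
    beta = np.ones(k)
    # chi-square(4) has mean 4, variance 8; rescale to mean 0, variance sigma2
    eps = np.sqrt(sigma2) * (rng.chisquare(4, n) - 4.0) / np.sqrt(8.0)
    y = X @ beta + eps
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    return resid @ resid / (n - k)   # SSE / (n - k)

for n in (50, 5_000, 500_000):
    print(n, s_squared(n))   # drifts toward sigma2 = 2.0
```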
If we made other explicit assumptions regarding F, it would in principle be possible to derive the small-sample distributions of test statistics, though these distributions are not simple to characterize analytically or even to compute. Instead of making explicit assumptions regarding the form of F, we can derive distributions of test statistics that are valid for large n no matter what the exact form of F [except that it must be a member of the class of distributions for which the asymptotic results are valid, of course].

We begin with the following useful lemma, which is associated with Lindeberg and Lévy.

Lemma 2.1. If the ε_i are i.i.d. with E(ε_i) = 0 and E(ε_i²) = σ² for all i; if the elements of the matrix X are uniformly bounded, so that |X_ij| < U for all i and j with U finite; and if lim X'X/n = Q is finite and nonsingular, then

    (1/√n) X'ε → N(0, σ²Q).

Proof. Consider the case of only one regressor for simplicity. Then

    Z_n ≡ (1/√n) Σ_{i=1}^n X_i ε_i

is a scalar. Let G_i be the c.d.f. of X_i ε_i, and let

    S_n² ≡ Σ_{i=1}^n V(X_i ε_i) = σ² Σ_{i=1}^n X_i².

In this scalar case, Q = lim n⁻¹ Σ_i X_i². By the Lindeberg-Feller Theorem, the necessary and sufficient condition for Z_n → N(0, σ²Q) is

    lim (1/S_n²) Σ_{i=1}^n ∫_{|ω| > νS_n} ω² dG_i(ω) = 0        (2.1)

for all ν > 0. Now G_i(ω) = F(ω/X_i). Then rewrite (2.1) as

    lim (n/S_n²) Σ_{i=1}^n (X_i²/n) ∫_{|ω/X_i| > νS_n/|X_i|} (ω/X_i)² dF(ω/X_i) = 0.

Since lim S_n²/n = lim σ² n⁻¹ Σ_i X_i² = σ²Q, lim n/S_n² = (σ²Q)⁻¹, which is a finite and nonzero scalar. Then we need only show that

    lim n⁻¹ Σ_{i=1}^n X_i² δ_{i,n} = 0,

where

    δ_{i,n} ≡ ∫_{|ω/X_i| > νS_n/|X_i|} (ω/X_i)² dF(ω/X_i).

Now lim δ_{i,n} = 0 for all i and any fixed ν, since |X_i| is bounded while lim S_n = ∞ [thus the measure of the set |ω/X_i| > νS_n/|X_i| goes to 0 asymptotically]. Since lim n⁻¹ Σ X_i² is finite and lim δ_{i,n} = 0 for all i, lim n⁻¹ Σ X_i² δ_{i,n} = 0. ∎

For vector-valued X_i the result is identical, of course, with Q being k × k instead of a scalar. The proof is only slightly more involved.
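A quick simulation of the one-regressor case of the lemma: with bounded regressors and decidedly non-normal (uniform) errors, draws of Z_n = (1/√n) Σ X_i ε_i have variance close to σ²Q and near-normal tail behavior. All design choices below are illustrative assumptions:

```python
import numpy as np

# Sketch of Lemma 2.1 with one regressor: Z_n = (1/sqrt(n)) * sum_i X_i eps_i
# should be approximately N(0, sigma^2 Q) even though the errors are uniform,
# not normal.  Regressors are bounded, as the lemma requires.
rng = np.random.default_rng(2)
n, reps = 1_000, 5_000
sigma2 = 1.0 / 3.0                        # variance of Uniform(-1, 1) errors
X = rng.uniform(0.5, 1.5, n)              # bounded regressors, held fixed
Q = np.mean(X**2)                         # sample analogue of lim X'X / n

eps = rng.uniform(-1.0, 1.0, (reps, n))   # i.i.d. non-normal disturbances
Z = eps @ X / np.sqrt(n)                  # one draw of Z_n per replication

print("theoretical variance:", sigma2 * Q)
print("simulated variance:  ", Z.var())
print("share within 1.96 sd:", np.mean(np.abs(Z) < 1.96 * np.sqrt(sigma2 * Q)))
```

The share of draws inside ±1.96 standard deviations should sit near 0.95, the normal benchmark, which is the content of the lemma.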
Now we can prove the following important result.

Theorem 2.2. Under the conditions of the lemma,

    √n(β̂ − β) → N(0, σ²Q⁻¹).

Proof. √n(β̂ − β) = (X'X/n)⁻¹ (1/√n)X'ε. Since lim (X'X/n)⁻¹ = Q⁻¹ and (1/√n)X'ε → N(0, σ²Q), then √n(β̂ − β) → N(0, σ²Q⁻¹QQ⁻¹) = N(0, σ²Q⁻¹). ∎

The results of this proof have the following practical implications. For small n, the distribution of √n(β̂ − β) is not normal, though asymptotically the distribution of this random variable converges to a normal. The variance of this random variable converges to σ²Q⁻¹, which is arbitrarily well approximated by s²(X_n'X_n/n)⁻¹ = s²n(X_n'X_n)⁻¹. But the variance of (β̂ − β) is equal to the variance of √n(β̂ − β) divided by n, so that in large samples the variance of the OLS estimator is approximately equal to s²n(X_n'X_n)⁻¹/n = s²(X_n'X_n)⁻¹, even when F is non-normal.

Usual t tests of one linear restriction on β are no longer exact. However, an analogous large-sample test is readily available.

Proposition 2.3. Let the ε_i be i.i.d. (0, σ²) with σ² < ∞, and let Q be finite and nonsingular. Consider the test

    H₀: Rβ = r,

where R is (1 × k) and r is a scalar, both known. Then

    (Rβ̂ − r) / √[s²R(X'X)⁻¹R'] → N(0, 1).

Proof. Under the null, Rβ̂ − r = Rβ̂ − Rβ = R(β̂ − β), so that the test statistic equals

    √n R(β̂ − β) / √[s²R(X'X/n)⁻¹R'].

Since √n(β̂ − β) → N(0, σ²Q⁻¹), it follows that √nR(β̂ − β) → N(0, σ²RQ⁻¹R'). The denominator of the test statistic has probability limit √(σ²RQ⁻¹R'), which is the standard deviation of the random variable in the numerator. A mean-zero normal random variable divided by its standard deviation has the N(0, 1) distribution. ∎

A similar result holds for the situation in which multiple (nonredundant) linear restrictions on β are tested simultaneously.
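The large-sample test can likewise be checked by simulation: with skewed exponential errors, the statistic (Rβ̂ − r)/√[s²R(X'X)⁻¹R'] behaves like a standard normal under H₀ once n is moderately large. The design below is an illustrative assumption, not from the notes:

```python
import numpy as np

# Sketch of the large-sample t test: under H0 the statistic
#   t = (R b - r) / sqrt(s^2 R (X'X)^{-1} R')
# is approximately N(0, 1) even with skewed, non-normal errors.
rng = np.random.default_rng(3)
n, reps, k = 500, 4_000, 2
beta = np.array([1.0, 0.5])
R, r = np.array([0.0, 1.0]), 0.5          # H0: beta_2 = 0.5 (true here)

stats = np.empty(reps)
for j in range(reps):
    X = np.column_stack([np.ones(n), rng.uniform(-1.0, 1.0, n)])
    eps = rng.exponential(1.0, n) - 1.0   # mean 0, variance 1, heavily skewed
    y = X @ beta + eps
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / (n - k)          # s^2 = SSE / (n - k)
    stats[j] = (R @ b - r) / np.sqrt(s2 * R @ XtX_inv @ R)

print("rejection rate at the 5% level:", np.mean(np.abs(stats) > 1.96))
```

The empirical rejection rate should land near the nominal 5% despite the non-normal disturbances, which is the practical content of the proposition.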