
Asymptotic Results for the Linear Regression Model

C. Flinn November 29, 2000

1. Asymptotic Results under Classical Assumptions

The following results apply to the linear regression model

\[ y = X\beta + \varepsilon, \]
where $X$ is of dimension $(n \times k)$, $\varepsilon$ is an (unknown) $(n \times 1)$ vector of disturbances, and $\beta$ is an (unknown) $(k \times 1)$ parameter vector. We assume that $n \gg k$ and that $\rho(X) = k$. This implies that $\rho(X'X) = k$ as well. Throughout we assume that the "classical" conditional moment assumptions apply, namely

• $E(\varepsilon_i \mid X) = 0$ for all $i$.
• $V(\varepsilon_i \mid X) = \sigma^2$ for all $i$.

We first show that the probability limit of the OLS estimator is $\beta$, i.e., that it is consistent. In particular, we know that

\[ \hat\beta = \beta + (X'X)^{-1} X'\varepsilon \]
\[ \Rightarrow \quad E(\hat\beta \mid X) = \beta + (X'X)^{-1} X' E(\varepsilon \mid X) = \beta. \]
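As an illustrative check (a hypothetical single-regressor simulation with no intercept, not part of the original notes; the design, true $\beta = 2$, and $N(0,1)$ errors are all assumed for the example), averaging $\hat\beta$ over many draws of the disturbance vector recovers the unbiasedness result above:

```python
import random

random.seed(0)
beta = 2.0                                   # assumed true parameter
x = [0.5 + 0.1 * i for i in range(50)]       # fixed design, n = 50
sxx = sum(xi * xi for xi in x)               # scalar analogue of X'X

# E(beta_hat | X) = beta + (X'X)^{-1} X' E(eps | X) = beta, so the
# Monte Carlo mean of beta_hat should be close to the true beta.
reps = 2000
est = []
for _ in range(reps):
    y = [beta * xi + random.gauss(0.0, 1.0) for xi in x]
    est.append(sum(xi * yi for xi, yi in zip(x, y)) / sxx)

mean_est = sum(est) / reps
print(round(mean_est, 3))   # close to the true value 2.0
```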

In terms of the (conditional) variance of the estimator $\hat\beta$,

\[ V(\hat\beta \mid X) = \sigma^2 (X'X)^{-1}. \]
Now we will rely heavily on the following assumption:
\[ \lim_{n \to \infty} \frac{X_n' X_n}{n} = Q, \]
where $Q$ is a finite, nonsingular $k \times k$ matrix. Then we can write the covariance of $\hat\beta_n$ in a sample of size $n$ explicitly as

\[ V(\hat\beta_n \mid X_n) = \frac{\sigma^2}{n} \left( \frac{X_n' X_n}{n} \right)^{-1}, \]
so that

\[ \lim_{n \to \infty} V(\hat\beta_n \mid X_n) = \lim_{n \to \infty} \frac{\sigma^2}{n} \cdot \lim_{n \to \infty} \left( \frac{X_n' X_n}{n} \right)^{-1} = 0 \times Q^{-1} = 0. \]
Since the asymptotic variance of the estimator is $0$ and the distribution is centered on $\beta$ for all $n$, we have shown that $\hat\beta$ is consistent.

Alternatively, we can prove consistency as follows. We need the following result.

Lemma 1.1.
\[ \operatorname{plim} \frac{X'\varepsilon}{n} = 0. \]

Proof. First, note that $E\!\left( \frac{X'\varepsilon}{n} \right) = 0$ for any $n$. Then the variance of the expression $\frac{X'\varepsilon}{n}$ is given by
\[ V\!\left( \frac{X'\varepsilon}{n} \right) = E\!\left[ \left( \frac{X'\varepsilon}{n} \right)\!\left( \frac{X'\varepsilon}{n} \right)' \right] = n^{-2} E(X'\varepsilon\varepsilon'X) = \frac{\sigma^2}{n} \cdot \frac{X'X}{n}, \]
so that $\lim_{n \to \infty} V\!\left( \frac{X'\varepsilon}{n} \right) = 0 \times Q = 0$. Since the asymptotic mean of the random variable is $0$ and the asymptotic variance is $0$, the probability limit of the expression is $0$. ∎
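Lemma 1.1 can be illustrated numerically (a hypothetical sketch, not from the original notes: a bounded scalar regressor and $N(0,1)$ errors are assumed). The Monte Carlo variance of $X'\varepsilon/n$ should fall roughly like $1/n$, as the proof's expression $\frac{\sigma^2}{n}\cdot\frac{X'X}{n}$ predicts:

```python
import random

random.seed(1)

def xe_over_n(n):
    # One draw of X'eps/n for a bounded scalar regressor and i.i.d. N(0,1) errors.
    x = [1.0 + (i % 10) * 0.1 for i in range(n)]
    e = [random.gauss(0.0, 1.0) for _ in range(n)]
    return sum(xi * ei for xi, ei in zip(x, e)) / n

def mc_var(n, reps=400):
    # Monte Carlo variance of X'eps/n across repeated draws of eps.
    draws = [xe_over_n(n) for _ in range(reps)]
    m = sum(draws) / reps
    return sum((d - m) ** 2 for d in draws) / reps

v100, v10000 = mc_var(100), mc_var(10000)
print(v100, v10000)   # the variance shrinks roughly like 1/n
```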

Now we can state a slightly more direct proof of consistency of the OLS estimator, which is

\[ \operatorname{plim}(\hat\beta) = \operatorname{plim}\!\left( \beta + (X'X)^{-1} X'\varepsilon \right) = \beta + \left[ \lim_{n \to \infty} \frac{X_n'X_n}{n} \right]^{-1} \operatorname{plim} \frac{X'\varepsilon}{n} = \beta + Q^{-1} \times 0 = \beta. \]

Next, consider whether or not $s^2$ is a consistent estimator of $\sigma^2$. Now
\[ s^2 = \frac{SSE}{n-k}, \]
where $SSE = (y - X\hat\beta)'(y - X\hat\beta)$. We showed that $E(s^2) = \sigma^2$ for all $n$; that is, $s^2$ is an unbiased estimator of $\sigma^2$ for all sample sizes. Since $SSE = \varepsilon' M \varepsilon$, with $M = I - X(X'X)^{-1}X'$, then
\[ \operatorname{plim} s^2 = \operatorname{plim} \frac{\varepsilon' M \varepsilon}{n-k} = \operatorname{plim} \frac{\varepsilon' M \varepsilon}{n} = \operatorname{plim} \frac{\varepsilon'\varepsilon}{n} - \operatorname{plim}\!\left[ \left( \frac{\varepsilon'X}{n} \right)\!\left( \frac{X'X}{n} \right)^{-1}\!\left( \frac{X'\varepsilon}{n} \right) \right] \]

\[ = \operatorname{plim} \frac{\varepsilon'\varepsilon}{n} - 0' \times Q^{-1} \times 0. \]
Now
\[ \frac{\varepsilon'\varepsilon}{n} = n^{-1} \sum_{i=1}^n \varepsilon_i^2, \]
so that
\[ E\!\left( \frac{\varepsilon'\varepsilon}{n} \right) = n^{-1} \sum_{i=1}^n E(\varepsilon_i^2) = n^{-1}(n\sigma^2) = \sigma^2. \]

Similarly, under the assumption that $\varepsilon_i$ is i.i.d., the variance of the term being considered is given by
\[ V\!\left( \frac{\varepsilon'\varepsilon}{n} \right) = n^{-2} V\!\left( \sum_{i=1}^n \varepsilon_i^2 \right) = n^{-2} \sum_{i=1}^n V(\varepsilon_i^2) = n^{-2} \cdot n\!\left[ E(\varepsilon_i^4) - V(\varepsilon_i)^2 \right] = n^{-1}\!\left[ E(\varepsilon_i^4) - V(\varepsilon_i)^2 \right], \]

so that the limit of the variance of $\frac{\varepsilon'\varepsilon}{n}$ is $0$ as long as $E(\varepsilon_i^4)$ is finite [we have already assumed that the first two moments of the distribution of $\varepsilon_i$ exist]. Thus the asymptotic distribution of $\frac{\varepsilon'\varepsilon}{n}$ is centered at $\sigma^2$ and is degenerate, proving consistency of $s^2$.
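Both consistency results can be seen at once in a small simulation (a hypothetical single-regressor sketch, not from the original notes; the true $\beta = 1.5$ and $\sigma = 3$ are assumed for the example). As $n$ grows, $\hat\beta$ settles down to $\beta$ and $s^2 = SSE/(n-k)$ settles down to $\sigma^2 = 9$:

```python
import random

random.seed(3)
beta, sigma = 1.5, 3.0   # assumed true parameter and disturbance s.d.

def fit(n):
    # Single-regressor OLS: beta_hat = (X'X)^{-1} X'y and s^2 = SSE/(n - k), k = 1.
    x = [1.0 + (i % 7) for i in range(n)]
    y = [beta * xi + random.gauss(0.0, sigma) for xi in x]
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    sse = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
    return b, sse / (n - 1)

results = {n: fit(n) for n in (100, 10000, 200000)}
for n, (b, s2) in results.items():
    print(n, round(b, 3), round(s2, 3))
```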

2. Testing without Normally Distributed Disturbances

In this section we look at the distribution of test statistics associated with linear restrictions on the $\beta$ vector when $\varepsilon_i$ is not assumed to be distributed as $N(0, \sigma^2)$ for all $i$. Instead, we will proceed with the weaker condition that $\varepsilon_i$ is independently and identically distributed with common cumulative distribution function (c.d.f.) $F$. Furthermore, $E(\varepsilon_i) = 0$ and $V(\varepsilon_i) = \sigma^2$ for all $i$.

Since we retain the mean independence and homoskedasticity assumptions, and since unbiasedness, consistency, and the Gauss-Markov theorem for that matter all rely only on these first two conditional moment assumptions, all these results continue to hold when we drop normality. However, the small-sample distributions of our test statistics will no longer be accurate, since these were all derived under the assumption of normality. If we made other explicit assumptions regarding $F$, it would be possible in principle to derive the small-sample distributions of test statistics, though these distributions are not simple to characterize analytically or even to compute. Instead of making explicit assumptions regarding the form of $F$, we can derive distributions of test statistics which are valid for large $n$ no matter what the exact form of $F$ [except that it must be a member of the class of distributions for which the asymptotic results are valid, of course].

We begin with the following useful lemma, which is associated with Lindeberg and Lévy.

Lemma 2.1. If $\varepsilon$ is i.i.d. with $E(\varepsilon_i) = 0$ and $E(\varepsilon_i^2) = \sigma^2$ for all $i$, and if the elements of the matrix $X$ are uniformly bounded, so that $|X_{ij}|$ is bounded for all $i, j$, then

\[ \frac{1}{\sqrt{n}} X'\varepsilon \to N(0, \sigma^2 Q). \]

Proof. Consider the case of only one regressor for simplicity. Then

\[ Z_n \equiv \frac{1}{\sqrt{n}} \sum_{i=1}^n X_i \varepsilon_i \]
is a scalar. Let $G_i$ be the c.d.f. of $X_i \varepsilon_i$. Let
\[ S_n^2 \equiv \sum_{i=1}^n V(X_i \varepsilon_i) = \sigma^2 \sum_{i=1}^n X_i^2. \]

In this scalar case, $Q = \lim n^{-1} \sum_i X_i^2$. By the Lindeberg-Feller theorem, the necessary and sufficient condition for $Z_n \to N(0, \sigma^2 Q)$ is
\[ \lim \frac{1}{S_n^2} \sum_{i=1}^n \int_{|\omega| > \nu S_n} \omega^2 \, dG_i(\omega) = 0 \tag{2.1} \]
for all $\nu > 0$. Now $G_i(\omega) = F(\omega/X_i)$. Then rewrite (2.1) as
\[ \lim \frac{n}{S_n^2} \sum_{i=1}^n \frac{X_i^2}{n} \int_{|\omega/X_i| > \nu S_n/|X_i|} \left( \frac{\omega}{X_i} \right)^2 dF\!\left( \frac{\omega}{X_i} \right) = 0. \]
Since $S_n^2 = n\sigma^2 \left( n^{-1} \sum_{i=1}^n X_i^2 \right)$, we have $\lim \frac{n}{S_n^2} = (\sigma^2 Q)^{-1}$, which is a finite and nonzero scalar. Then we need to show
\[ \lim n^{-1} \sum_{i=1}^n X_i^2 \, \delta_{i,n} = 0, \]

where
\[ \delta_{i,n} \equiv \int_{|\omega/X_i| > \nu S_n/|X_i|} \left( \frac{\omega}{X_i} \right)^2 dF\!\left( \frac{\omega}{X_i} \right). \]
Now $\lim \delta_{i,n} = 0$ for all $i$ and any fixed $\nu$, since $|X_i|$ is bounded while $\lim S_n = \infty$ [thus the measure of the set $|\omega/X_i| > \nu S_n/|X_i|$ goes to $0$ asymptotically]. Since $\lim n^{-1} \sum X_i^2$ is finite and $\lim \delta_{i,n} = 0$ for all $i$, $\lim n^{-1} \sum X_i^2 \, \delta_{i,n} = 0$. ∎
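The lemma's conclusion can be illustrated numerically (a hypothetical sketch, not from the original notes: a bounded nonrandom regressor and uniform, hence non-normal, disturbances are assumed). If $Z_n = X'\varepsilon/\sqrt{n}$ is approximately $N(0, \sigma^2 Q)$, then about 95% of its draws should fall within $1.96$ standard deviations of zero:

```python
import random

random.seed(4)
n, reps = 2000, 2000
x = [1.0 + (i % 5) * 0.5 for i in range(n)]   # bounded, nonrandom regressor
q = sum(xi * xi for xi in x) / n              # sample analogue of Q

def z_draw():
    # Z_n = X'eps / sqrt(n) with uniform (non-normal) disturbances.
    e = [random.uniform(-1.0, 1.0) for _ in range(n)]   # variance 1/3
    return sum(xi * ei for xi, ei in zip(x, e)) / n ** 0.5

sigma2 = 1.0 / 3.0
sd = (sigma2 * q) ** 0.5                      # sd implied by N(0, sigma^2 Q)
cover = sum(abs(z_draw()) < 1.96 * sd for _ in range(reps)) / reps
print(cover)   # should be near 0.95
```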

For vector-valued $X_i$, the result is identical of course, with $Q$ being $k \times k$ instead of a scalar. The proof is only slightly more involved. Now we can prove the following important result.

Theorem 2.2. Under the conditions of the lemma,

\[ \sqrt{n}(\hat\beta - \beta) \to N(0, \sigma^2 Q^{-1}). \]

Proof. $\sqrt{n}(\hat\beta - \beta) = \left( \frac{X'X}{n} \right)^{-1} \frac{X'\varepsilon}{\sqrt{n}}$. Since $\lim \left( \frac{X'X}{n} \right)^{-1} = Q^{-1}$ and $\frac{X'\varepsilon}{\sqrt{n}} \to N(0, \sigma^2 Q)$, then $\sqrt{n}(\hat\beta - \beta) \to N(0, Q^{-1} \sigma^2 Q Q^{-1}) = N(0, \sigma^2 Q^{-1})$. ∎

The results of this proof have the following practical implications. For small $n$, the distribution of $\sqrt{n}(\hat\beta - \beta)$ is not normal, though asymptotically the distribution of this random variable converges to a normal. The variance of this random variable converges to $\sigma^2 Q^{-1}$, which is arbitrarily well approximated by $s^2 \left( \frac{X_n'X_n}{n} \right)^{-1} = s^2 n (X_n'X_n)^{-1}$. But the variance of $(\hat\beta - \beta)$ is equal to the variance of $\sqrt{n}(\hat\beta - \beta)$ divided by $n$, so that in large samples the variance of the OLS estimator is approximately equal to $s^2 n (X_n'X_n)^{-1}/n = s^2 (X_n'X_n)^{-1}$, even when $F$ is non-normal.

Usual $t$ tests of one linear restriction on $\beta$ are no longer exact in finite samples. However, an analogous large-sample test is readily available.
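The claim that $\sigma^2 (X'X)^{-1}$ approximates the variance of $\hat\beta$ even under non-normal $F$ can be checked by simulation (a hypothetical single-regressor sketch, not from the original notes; recentered exponential errors, which are skewed and non-normal with variance 1, are assumed):

```python
import random

random.seed(5)
beta, n, reps = 0.7, 500, 2000
x = [0.5 + (i % 4) * 0.5 for i in range(n)]
sxx = sum(xi * xi for xi in x)               # scalar X'X

draws = []
for _ in range(reps):
    # Exponential(1) errors recentered to mean zero: skewed, non-normal, variance 1.
    e = [random.expovariate(1.0) - 1.0 for _ in range(n)]
    y = [beta * xi + ei for xi, ei in zip(x, e)]
    draws.append(sum(xi * yi for xi, yi in zip(x, y)) / sxx)

m = sum(draws) / reps
var_mc = sum((d - m) ** 2 for d in draws) / reps
asy_var = 1.0 / sxx                          # sigma^2 (X'X)^{-1} with sigma^2 = 1
print(var_mc, asy_var)                       # the two should be close
```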

Proposition 2.3. Let $\varepsilon_i$ be i.i.d. $(0, \sigma^2)$, $\sigma^2 < \infty$, and let $Q$ be finite and nonsingular. Consider the test
\[ H_0: R\beta = r, \]
where $R$ is $(1 \times k)$ and $r$ is a scalar, both known. Then
\[ \frac{R\hat\beta - r}{\sqrt{s^2 R(X'X)^{-1}R'}} \to N(0, 1). \]

Proof. Under the null, $R\hat\beta - r = R\hat\beta - R\beta = R(\hat\beta - \beta)$, so that the test statistic is
\[ \frac{\sqrt{n}\, R(\hat\beta - \beta)}{\sqrt{s^2 R(X'X/n)^{-1}R'}}. \]
Since

\[ \sqrt{n}(\hat\beta - \beta) \to N(0, \sigma^2 Q^{-1}) \quad \Rightarrow \quad \sqrt{n}\, R(\hat\beta - \beta) \to N(0, \sigma^2 R Q^{-1} R'). \]
The denominator of the test statistic has a probability limit equal to $\sqrt{\sigma^2 R Q^{-1} R'}$, which is the standard deviation of the random variable in the numerator. A mean-zero normal random variable divided by its standard deviation has the distribution $N(0, 1)$. ∎
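Computing the statistic of Proposition 2.3 is straightforward (a hypothetical single-regressor sketch, not from the original notes; here $R = 1$, $r = 1$, and the data are generated so that $H_0$ is true, so the resulting $z$ should behave like a standard normal draw):

```python
import random

random.seed(6)
n, beta = 400, 1.0      # assumed true beta satisfies H0: beta = 1

x = [1.0 + (i % 6) * 0.3 for i in range(n)]
sxx = sum(xi * xi for xi in x)               # scalar X'X
y = [beta * xi + random.gauss(0.0, 2.0) for xi in x]

b = sum(xi * yi for xi, yi in zip(x, y)) / sxx
sse = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
s2 = sse / (n - 1)                           # s^2 = SSE/(n - k), k = 1

# z = (R beta_hat - r) / sqrt(s^2 R (X'X)^{-1} R') with R = 1, r = 1.
z = (b - 1.0) / (s2 / sxx) ** 0.5
print(z)
```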

A similar result holds for the situation in which multiple (nonredundant) linear restrictions on $\beta$ are tested simultaneously.

Proposition 2.4. Let $\varepsilon_i$ be i.i.d. $(0, \sigma^2)$, $\sigma^2 < \infty$, and let $Q$ be finite and nonsingular. Consider the test
\[ H_0: R\beta = r, \]
where $R$ is $(m \times k)$ and $r$ is an $(m \times 1)$ vector, both known. Then
\[ \frac{(r - R\hat\beta)'[R(X'X)^{-1}R']^{-1}(r - R\hat\beta)/m}{SSE/(n-k)} \to \frac{\chi_m^2}{m}. \]

Proof. The denominator is a consistent estimator of $\sigma^2$ [as would be $SSE/n$], and has a degenerate limiting distribution. Under the null hypothesis, $r - R\hat\beta = -R(X'X)^{-1}X'\varepsilon$, so that the numerator of the test statistic can be written $\varepsilon' D \varepsilon / m$, where
\[ D \equiv X(X'X)^{-1}R'[R(X'X)^{-1}R']^{-1}R(X'X)^{-1}X'. \]
Now $D$ is symmetric and idempotent with $\rho(D) = m$. Then write

\[ \frac{\varepsilon' D \varepsilon}{m\sigma^2} = \frac{\varepsilon' P P' D P P' \varepsilon}{m\sigma^2} = \frac{1}{m} V' \begin{bmatrix} I_m & 0 \\ 0 & 0 \end{bmatrix} V = \frac{1}{m} \sum_{i=1}^m V_i^2, \]
where $P$ is the orthogonal matrix such that $P'DP = \begin{bmatrix} I_m & 0 \\ 0 & 0 \end{bmatrix}$ and where $V = \frac{P'\varepsilon}{\sigma}$. Thus the $V_i$ are i.i.d. with mean $0$ and standard deviation $1$. Because $V = P'\varepsilon/\sigma$,
\[ V_i = \sum_{j=1}^n \frac{P_{ji}\, \varepsilon_j}{\sigma}, \quad i = 1, \ldots, m. \]

The terms in the summand are independent random variables with mean $0$ and variance $\sigma_j^2 = P_{ji}^2$. Since the $\varepsilon_j$ are i.i.d., the Lindeberg-Feller central limit theorem applies, so that

\[ \frac{\sum_{j=1}^n P_{ji}\, \varepsilon_j / \sigma}{W_n} \to N(0, 1), \]
where $W_n = \sqrt{\sum_{j=1}^n \sigma_j^2} = \sqrt{\sum_{j=1}^n P_{ji}^2} = 1$ because $P$ is orthogonal. Then since each $V_i$ is standard normal,
\[ \frac{1}{m} \sum_{i=1}^m V_i^2 \to \frac{\chi_m^2}{m}. \quad ∎ \]

The practical use of this theorem is as follows. For large samples, the sample distribution of the statistic

\[ \frac{(r - R\hat\beta)'[R(X'X)^{-1}R']^{-1}(r - R\hat\beta)/m}{SSE/(n-k)} \to \frac{\chi_m^2}{m}, \tag{2.2} \]
which means that for large enough $n$

\[ \frac{(r - R\hat\beta)'[R(X'X)^{-1}R']^{-1}(r - R\hat\beta)}{SSE/(n-k)} \to \chi_m^2. \tag{2.3} \]

Now when the disturbances were normally distributed, in a sample of size $n$ the test statistic given by the left-hand side of (2.2) was distributed as $F(m, n-k)$. Note that $\lim_{n \to \infty} F(x; m, n-k) = \chi_m^2(mx)$, where $F(\cdot\,; m, n-k)$ and $\chi_m^2(\cdot)$ here denote c.d.f.s. For example, say that the test statistic associated with a null with $m = 3$ restrictions assumed the value $4$. In a sample of size $n = 8000$, we have (approximately)
\[ 1 - F(4; 3, 8000) = .00741. \]
The asymptotic approximation given in (2.3) in this example yields $1 - \chi_3^2(3 \times 4) = .00738$. In small samples, differences are much greater of course. For example, for the same value of the test statistic, when $n = 20$ we have $1 - F(4; 3, 20-3) = .02523$, which is certainly different from $1 - \chi_3^2(3 \times 4) = .00738$.

In summary, when the sample size is very large, the normality assumption is pretty much inconsequential in the testing of linear restrictions on the parameter vector $\beta$. In small samples, some assumption as to the form of $F(\varepsilon)$ is generally required to compute the distribution of the estimator $\hat\beta$. Under normality, the small-sample distributions of test statistics follow the $t$ or $F$, depending on the number of restrictions being tested. Testing in this environment depends critically on the normality assumption, and if the disturbances are not normally distributed, tests will be biased in general.
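The asymptotic p-value in the numerical example can be verified directly. For odd degrees of freedom the chi-square survival function has a closed form; for three degrees of freedom, $P(\chi_3^2 > x) = \operatorname{erfc}\!\big(\sqrt{x/2}\big) + \sqrt{2x/\pi}\, e^{-x/2}$, which needs only the standard library (a small sketch added for illustration, not part of the original notes):

```python
import math

def chi2_sf_3(x):
    # Survival function P(chi2_3 > x) of a chi-square with 3 degrees of
    # freedom, using the closed form available for odd degrees of freedom.
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

# Test statistic 4 with m = 3 restrictions: the asymptotic p-value in (2.3)
# is P(chi2_3 > 3 * 4) = P(chi2_3 > 12).
p_asym = chi2_sf_3(12.0)
print(round(p_asym, 5))   # 0.00738, matching the text
```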
