Lecture 5 – Large Sample Results for OLS in a Time Series Setting (Reference – Sections 2.3, 2.4, 2.5, 2.6, Hayashi)

We will formulate a set of conditions under which the OLS estimator is consistent and asymptotically normal. These conditions will replace the assumption that the regressors are i.i.d. with the assumption that they are stationary and ergodic. The conditions for consistency will allow for serial correlation and conditional heteroskedasticity in the disturbances. The conditions for asymptotic normality will rule out serial correlation in the disturbances but will allow for conditional heteroskedasticity. The proofs of these theorems, which are provided in the text, rely on the Ergodic Theorem (consistency) and the Martingale Difference Central Limit Theorem (asymptotic normality).

Sufficient Conditions for the Consistency of the OLS Estimator

Assume the univariate stochastic processes $\{y_t\}$ and $\{\varepsilon_t\}$ and the k-dimensional stochastic process $\{x_t\}$ are generated according to:

A.1. (Linearity)

$y_t = x_t'\beta + \varepsilon_t$, t = 1,2,…

A.2. (Stationarity and Ergodicity)

$\{y_t, x_t\}$ [or, equivalently, $\{\varepsilon_t, x_t\}$] is a jointly stationary and ergodic process.

A.3. (Orthogonality Condition)

$E(x_{ti}\varepsilon_t) = 0$ for i = 1,…,k and t = 1,2,…

A.4. (Rank Condition)

$E(x_t x_t') = \Sigma_{XX}$, a finite p.d. matrix, for t = 1,2,…

Under these conditions, the OLS estimator $\hat{\beta}_T \to \beta$ a.s.

Some comments on these assumptions:

1. We have replaced the i.i.d. assumption with stationarity, a more time series friendly assumption that allows for temporal dependence in the regressors and the disturbances (although we will put some additional restrictions on the temporal dependence of the disturbances when we formulate the conditions for asymptotic normality).

2. Assumption A.3 requires that the disturbances are “contemporaneously uncorrelated” with the regressors. This “orthogonality condition” is weaker than the condition that $E(\varepsilon_t \mid x_t, x_{t-1}, \ldots) = 0$, which, in turn, is weaker than the strict exogeneity condition.

Note: As Hayashi correctly points out in footnote 10, the phrase “predetermined regressors” is used differently by different people. Some, like me, use it to mean $E(\varepsilon_t \mid x_t, x_{t-1}, \ldots) = 0$ or the weaker condition $E(x_{t-s}\varepsilon_t) = 0$ for all s > 0. Hayashi prefers to use it to mean $E(\varepsilon_t \mid x_t) = 0$ or the weaker condition $E(x_t\varepsilon_t) = 0$.

3. According to A.3, $E(\varepsilon_t) = 0$ if $x_t$ includes an intercept, and according to A.2, if the variance of $\varepsilon_t$ is finite then it is constant, i.e., $E(\varepsilon_t^2) = \sigma^2$ for all t (and so the disturbances are unconditionally homoskedastic).

- - - - -

Let $x_{t1} = 1$ for all t. Then $E(\varepsilon_t) = 0$ is an immediate consequence of (A.3).

By (A.2), $\varepsilon_t$ is stationary, so if it has a finite variance that variance is the same for all t.

- - - - -

4. The assumptions do not rule out conditional heteroskedasticity in the ε’s. That is, the model does not restrict the dependence of $E(\varepsilon_t^2 \mid \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots, x_t, x_{t-1}, \ldots)$ on current and past x’s or on past ε’s. So, for example, ARCH disturbances are consistent with this set of assumptions.

5. The rank condition is essentially a no multicollinearity (in the limit) condition. [Since the x’s form a stationary and ergodic process, $x_t x_t'$ is also stationary and ergodic. By the Ergodic Theorem,

$\frac{1}{T}\sum_{t=1}^{T} x_t x_t' \to \Sigma_{XX}$ a.s.,

that is, $\frac{1}{T}\sum_{t=1}^{T} x_t x_t'$ is a.s. nonsingular for sufficiently large T.]

Sufficient Conditions for the Asymptotic Normality of the OLS Estimator

We will add the following assumption to (A.1)-(A.4) above. (Or since the added assumption will be stronger than (A.3) we could simply replace (A.3) with this assumption.)

A.5

$\{x_t\varepsilon_t\}$ is a (k-dimensional) martingale difference sequence with finite second moment matrix, S.

Then under (A.1)-(A.5)

$\sqrt{T}(\hat{\beta}_T - \beta) \to_d N(0, \mathrm{avar}(\hat{\beta}))$,

where $\mathrm{avar}(\hat{\beta}) = \Sigma_{XX}^{-1} S \Sigma_{XX}^{-1}$.

Some comments on these assumptions:

1. The m.d.s. assumption on $\{x_t\varepsilon_t\}$ is stronger than A.3 because an m.d.s. is a zero mean sequence. A sufficient condition for A.3:

$E(\varepsilon_t \mid x_t, x_{t-1}, \ldots) = 0$.

A sufficient condition for the m.d.s. part of A.5:

$E(\varepsilon_t \mid x_t, x_{t-1}, \ldots, \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots) = 0$.

2. These assumptions imply that $\varepsilon_t$ is uncorrelated with current and past x’s and with past ε’s. So they do not allow for serial correlation in the ε’s. They do allow for predetermined but not strictly exogenous regressors and they do allow for conditionally heteroskedastic disturbances.

3. $S = E(x_t \varepsilon_t^2 x_t') = E(\varepsilon_t^2 x_t x_t')$. So, the assumption that S is finite is a “fourth-moment” restriction.
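The two limit results can be checked numerically with a small simulation. The sketch below is only an illustration under an assumed design that is not part of these notes: a regression on an intercept and a Gaussian AR(1) regressor z_t (coefficient 0.5, unit innovation variance), with disturbances ε_t = η_t·sqrt(0.5 + 0.5 z_t²), where η_t is i.i.d. N(0,1) and independent of the regressor. This design satisfies A.1 to A.5, with disturbances that are serially uncorrelated but conditionally heteroskedastic. The function names `simulate` and `ols_slope` are hypothetical helpers.

```python
import numpy as np

def simulate(T, reps, rng, beta=(1.0, 2.0), rho=0.5):
    """Draw `reps` samples of length T from y_t = b0 + b1*z_t + eps_t (assumed design)."""
    z = np.empty((reps, T))
    # Start the AR(1) regressor from its stationary distribution.
    z[:, 0] = rng.normal(scale=np.sqrt(1.0 / (1.0 - rho**2)), size=reps)
    for t in range(1, T):
        z[:, t] = rho * z[:, t - 1] + rng.normal(size=reps)
    # Serially uncorrelated, conditionally heteroskedastic disturbances:
    # E(eps_t | z_t, past) = 0 and E(eps_t^2 | z_t) = 0.5 + 0.5*z_t^2.
    eps = rng.normal(size=(reps, T)) * np.sqrt(0.5 + 0.5 * z**2)
    y = beta[0] + beta[1] * z + eps
    return z, y

def ols_slope(z, y):
    """OLS slope from regressing y on (1, z); one estimate per row."""
    zc = z - z.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    return (zc * yc).sum(axis=1) / (zc**2).sum(axis=1)

rng = np.random.default_rng(0)

# Consistency: the estimation error should tend to shrink as T grows.
errors = {}
for T in (100, 2_000, 20_000):
    z, y = simulate(T, reps=1, rng=rng)
    errors[T] = abs(ols_slope(z, y)[0] - 2.0)
print(errors)

# Asymptotic normality: distribution of sqrt(T)*(b1_hat - b1) across replications.
T, reps = 500, 1000
z, y = simulate(T, reps, rng)
scaled = np.sqrt(T) * (ols_slope(z, y) - 2.0)
mc_var = scaled.var()
print(scaled.mean(), mc_var)
```

For this assumed design the slope element of $\Sigma_{XX}^{-1} S \Sigma_{XX}^{-1}$ works out to 1.875, while the homoskedastic formula $\sigma^2 \Sigma_{XX}^{-1}$ would give 0.875, so the Monte Carlo variance should sit near the former rather than the latter.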

To apply this result to hypothesis testing or confidence interval construction, we will need a consistent estimator of $\mathrm{avar}(\hat{\beta}) = \Sigma_{XX}^{-1} S \Sigma_{XX}^{-1}$.

A consistent estimator of $\Sigma_{XX}^{-1}$ is $S_{XX}^{-1}$, where $S_{XX} = \frac{1}{T}\sum_{t=1}^{T} x_t x_t'$. This follows from the Ergodic Theorem. Suppose we have a consistent estimator of S, $\hat{S}_T$. Then $S_{XX}^{-1} \hat{S}_T S_{XX}^{-1}$ is a consistent estimator of $\mathrm{avar}(\hat{\beta})$.

Note that if we add Hayashi’s assumption A.6,

$E[(x_{ti}x_{tj})^2] < \infty$ for all i, j = 1,…,k,

then

$\hat{S}_T = \frac{1}{T}\sum_{t=1}^{T} \hat{\varepsilon}_t^2 x_t x_t' \to_p S$.

In the special case where the ε’s are conditionally homoskedastic with variance $\sigma^2$,

$S = E(\varepsilon_t^2 x_t x_t') = E[E(\varepsilon_t^2 x_t x_t' \mid x_t)] = \sigma^2 E(x_t x_t') = \sigma^2 \Sigma_{XX}$

and so, in this case, $\mathrm{avar}(\hat{\beta}) = \sigma^2 \Sigma_{XX}^{-1}$.

In this case, a consistent estimator of $\mathrm{avar}(\hat{\beta})$ will be $\hat{\sigma}^2 S_{XX}^{-1}$, where $\hat{\sigma}^2$ is any consistent estimator of $\sigma^2$. Consistent estimators of $\sigma^2$ under these assumptions include $\frac{1}{T}SSR$ and $\frac{1}{T-k}SSR$.

Applications to Hypothesis Testing
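Before turning to the tests, the two estimators of avar discussed above can be sketched in a few lines. This is an illustrative sketch, not a reference implementation: the function name `avar_estimates` and the simulated design (an intercept plus a Gaussian AR(1) regressor, with disturbance variance 0.5 + 0.5 z_t² conditional on the regressor) are assumptions introduced here for the example.

```python
import numpy as np

def avar_estimates(X, y):
    """Return OLS coefficients plus the robust and homoskedastic avar estimates."""
    T, k = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    S_xx_inv = np.linalg.inv(X.T @ X / T)
    # S_hat = (1/T) * sum_t resid_t^2 * x_t x_t'
    S_hat = (X * resid[:, None] ** 2).T @ X / T
    avar_robust = S_xx_inv @ S_hat @ S_xx_inv
    sigma2_hat = resid @ resid / (T - k)   # SSR / (T - k)
    avar_homo = sigma2_hat * S_xx_inv
    return beta_hat, avar_robust, avar_homo

# Assumed design for illustration only (not from the notes).
rng = np.random.default_rng(1)
T, rho = 50_000, 0.5
z = np.empty(T)
z[0] = rng.normal(scale=np.sqrt(1.0 / (1.0 - rho**2)))  # stationary start
shocks = rng.normal(size=T)
for t in range(1, T):
    z[t] = rho * z[t - 1] + shocks[t]
eps = rng.normal(size=T) * np.sqrt(0.5 + 0.5 * z**2)    # conditional heteroskedasticity
X = np.column_stack([np.ones(T), z])
y = X @ np.array([1.0, 2.0]) + eps

beta_hat, avar_robust, avar_homo = avar_estimates(X, y)
se_robust = np.sqrt(np.diag(avar_robust) / T)  # heteroskedasticity-robust
se_usual = np.sqrt(np.diag(avar_homo) / T)     # valid only under homoskedasticity
print(beta_hat, se_robust, se_usual)
```

Because the conditional variance of the disturbance depends on the regressor in this design, the homoskedastic formula understates the sampling variability of the slope, and the robust standard error for the slope comes out noticeably larger than the usual one.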

1. Assume that

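$\sqrt{T}(\hat{\beta}_T - \beta) \to_d N(0, \mathrm{avar}(\hat{\beta}))$

and $S_{XX}^{-1} \hat{S}_T S_{XX}^{-1}$ is a consistent estimator of $\mathrm{avar}(\hat{\beta})$, where $\hat{S}_T$ is a consistent estimator of S.

Then

$t_i = (\hat{\beta}_i - \beta_i)/se(\hat{\beta}_i) \to_d N(0,1)$

where

$se(\hat{\beta}_i) = \sqrt{\frac{1}{T}\left(S_{XX}^{-1} \hat{S}_T S_{XX}^{-1}\right)_{ii}}$

Comments –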

i. Recall that the t(n-k) distribution converges to the N(0,1) distribution as n goes to infinity, so that in large samples whether we use the t(n-k) distribution or the N(0,1) distribution will not matter (i.e., will not give different test outcomes).

ii. This standard error is called the heteroskedasticity-consistent standard error (heteroskedasticity-robust standard error, White’s standard error), since it is valid even if there is conditional heteroskedasticity in the disturbances.

iii. If we make the stronger assumption that the disturbances are conditionally homoskedastic, then this standard error reduces to the “usual” form:

$se(\hat{\beta}_i) = \sqrt{\hat{\sigma}^2 \left[(X'X)^{-1}\right]_{ii}}$

2. Under A.1 to A.5 and the restriction that $R\beta = r$, where R is a q×k matrix and r is a q×1 vector of real numbers, where rank(R) = q,

$W \equiv T(R\hat{\beta} - r)'\left(R S_{XX}^{-1} \hat{S}_T S_{XX}^{-1} R'\right)^{-1}(R\hat{\beta} - r) \to_d \chi^2(q)$

where $\hat{S}_T$ is any consistent estimator of S.

Note that if the disturbances are conditionally homoskedastic, this statistic reduces to

$T(R\hat{\beta} - r)'\left(R \hat{\sigma}^2 S_{XX}^{-1} R'\right)^{-1}(R\hat{\beta} - r) = (R\hat{\beta} - r)'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat{\beta} - r)/\hat{\sigma}^2 = q \cdot F$

where F is the usual OLS F-statistic.

{Recall: if F is distributed F(q, T-k), then q·F converges in distribution to $\chi^2(q)$ as T goes to infinity.}

3. Suppose A.1 – A.5 hold and $g(\beta) = 0$, where g is a set of q nonlinear functions such that $G(\beta)$, the q×k matrix of (continuous at β) first derivatives evaluated at β, has rank q. Then

$W \equiv T g(\hat{\beta})'\left(G(\hat{\beta}) S_{XX}^{-1} \hat{S}_T S_{XX}^{-1} G(\hat{\beta})'\right)^{-1} g(\hat{\beta}) \to_d \chi^2(q)$
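The Wald statistics for linear and nonlinear restrictions can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the function names `wald_linear` and `wald_nonlinear` are hypothetical, and the numerical inputs (T, the coefficient estimates, and the estimated avar matrix) are made-up values chosen only to exercise the formulas. The argument `avar_hat` stands for any consistent estimate of avar, for example $S_{XX}^{-1} \hat{S}_T S_{XX}^{-1}$.

```python
import numpy as np

def wald_linear(beta_hat, avar_hat, R, r, T):
    """W = T (R b - r)' (R avar R')^{-1} (R b - r) for H0: R beta = r."""
    R = np.atleast_2d(R)
    diff = R @ beta_hat - np.atleast_1d(r)
    return float(T * diff @ np.linalg.solve(R @ avar_hat @ R.T, diff))

def wald_nonlinear(beta_hat, avar_hat, g, G, T):
    """W = T g(b)' (G(b) avar G(b)')^{-1} g(b) for H0: g(beta) = 0."""
    g_val = np.atleast_1d(g(beta_hat))
    G_val = np.atleast_2d(G(beta_hat))
    return float(T * g_val @ np.linalg.solve(G_val @ avar_hat @ G_val.T, g_val))

# Made-up inputs for illustration only.
T = 400
beta_hat = np.array([1.0, 2.1])
avar_hat = np.array([[1.2, 0.1],
                     [0.1, 1.9]])

# Linear restriction H0: beta_2 = 2 (q = 1). W equals the squared t-ratio.
W_lin = wald_linear(beta_hat, avar_hat, R=[[0.0, 1.0]], r=[2.0], T=T)
t_stat = (beta_hat[1] - 2.0) / np.sqrt(avar_hat[1, 1] / T)

# Nonlinear restriction H0: beta_1 * beta_2 = 2 (q = 1), with Jacobian [beta_2, beta_1].
W_nl = wald_nonlinear(
    beta_hat, avar_hat,
    g=lambda b: b[0] * b[1] - 2.0,
    G=lambda b: np.array([[b[1], b[0]]]),
    T=T,
)

crit_5pct = 3.841  # 5% critical value of the chi-square(1) distribution
print(W_lin, t_stat**2, W_nl, W_lin > crit_5pct, W_nl > crit_5pct)
```

With a single linear restriction that picks out one coefficient, the Wald statistic coincides with the square of the robust t-ratio from item 1, which gives a quick consistency check on any implementation.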