Autocorrelation

S. K. Bhaumik

Reference: Sankar Kumar Bhaumik, Principles of Econometrics: A Modern Approach Using EViews, Oxford University Press, 2015, Ch. 5

❖ Definition

• One of the assumptions of the CLRM → the disturbance terms of the model are serially independent. Symbolically, for the model

$$Y_t = \alpha + \beta X_t + \varepsilon_t$$

Cov( t ,  s ) = E( t s ) = 0 for t  s

This feature of the regression disturbance is known as serial independence or non-autocorrelation. It implies that the value of the disturbance term in one period is not correlated with its value in another period → the disturbance occurring at one point of time does not carry over into another period.

• However, this assumption may not always be fulfilled, particularly when we are dealing with time series data. This is because the disturbance captures the impact of a large number of random and independent factors, which enter into our relationship but are not measurable. Therefore, it is quite possible that the effect of these factors operating in one period would carry over, at least in part, to the following period → the disturbance term at period t may be related to the disturbance terms at t − 1, t − 2, ... and t + 1, t + 2, ..., and so on.

• In that case, $\text{Cov}(\varepsilon_t, \varepsilon_s) \neq 0$ for $t \neq s$ and we say that the disturbances are autocorrelated.

• So autocorrelation represents a lack of independence in the disturbance term of the model, as a result of which its value in one period is correlated with its value in another period.

❖ Sources of autocorrelation

➢ Prolonged influence of shocks: In time series data, random shocks or disturbances have effects that persist for a number of time periods → an earthquake, flood, drought, or war affects the performance of the economy in the periods following its occurrence. The influence of such shocks is all the more visible when we consider shorter time intervals → the problem is therefore more prevalent in weekly, monthly and quarterly data than in yearly data.

➢ Inertia: Past actions often have strong influence on current actions owing to inertia or psychological conditioning → a positive disturbance in one period is likely to influence activities in successive periods resulting in autocorrelation.


➢ Spatial autocorrelation: Although autocorrelation is found more in time series data, some cross-sectional data might also suffer from this problem. For example, while working with household level data, we may observe the disturbances of the households in the same neighbourhood being correlated. This is called ‘spatial autocorrelation’.

➢ Model misspecification: Exclusion of a relevant explanatory variable from the model, whose successive values are correlated, will make the disturbance term associated with the misspecified model autocorrelated. So when that variable is included in the model, the apparent problem of autocorrelation disappears.

❖ Specification of the Autocorrelation Relationship

Suppose two successive values of the disturbance term are correlated so that the autocorrelation relationship is

$$\varepsilon_t = \rho \varepsilon_{t-1} + u_t \quad (1)$$

where $\rho$ is a parameter whose absolute value is less than 1, i.e., $|\rho| < 1$, and $u_t$ is the disturbance term of the above autocorrelation relationship, which has the following properties:

$E(u_t) = 0$ for all $t$

$\text{Var}(u_t) = E(u_t^2) = \sigma_u^2$ for all $t$

$u_t$ is normally distributed

$E(u_t u_{t-i}) = 0$ for $i \neq 0$

Equation (1) is known as the first-order autoregressive scheme, denoted by AR(1).
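The persistence implied by (1) is easy to see in a short simulation. Below is a minimal sketch (not from the text; all names and parameter values are illustrative) that generates AR(1) disturbances in Python:

```python
# A minimal sketch (illustrative): generating disturbances that follow
# the AR(1) scheme in (1).
import numpy as np

rng = np.random.default_rng(42)
n, rho = 200, 0.7                     # |rho| < 1, as assumed above

u = rng.normal(0.0, 1.0, size=n)      # u_t: well-behaved disturbance
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = rho * eps[t - 1] + u[t]  # eps_t = rho * eps_{t-1} + u_t

print(eps[:5])                        # successive values are clearly related
```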

By successive substitution for $\varepsilon_{t-1}, \varepsilon_{t-2}, \dots$ in (1), we obtain

 t = ( t−2 + ut−1 ) + ut 2 =   t−2 + ut−1 + ut 2 =  ( t−3 + ut−2 ) + ut−1 + ut 3 2 =   t−3 +  ut−2 + ut−1 + ut 2 = ut + ut−1 +  ut−2 + .... (2)

This shows that under the first-order autoregressive scheme, the effect of past disturbances wears off gradually since $|\rho| < 1$.

We may use expression (2) to compute the mean, variance and covariances of the disturbances.

Mean of $\varepsilon_t$

$$E(\varepsilon_t) = E(u_t + \rho u_{t-1} + \rho^2 u_{t-2} + \dots) = 0 \quad (\because E(u_t) = 0 \text{ for all } t)$$

Variance of $\varepsilon_t$

$$\text{Var}(\varepsilon_t) = E(u_t + \rho u_{t-1} + \rho^2 u_{t-2} + \dots)^2$$
$$= E(u_t^2 + \rho^2 u_{t-1}^2 + \rho^4 u_{t-2}^2 + \dots + 2\rho u_t u_{t-1} + 2\rho^2 u_t u_{t-2} + \dots)$$
$$= \text{Var}(u_t) + \rho^2 \text{Var}(u_{t-1}) + \rho^4 \text{Var}(u_{t-2}) + \dots \quad (\because E(u_t u_{t-i}) = 0)$$
$$= \sigma_u^2 (1 + \rho^2 + \rho^4 + \dots) \quad (\because E(u_t^2) = \sigma_u^2 \text{ for all } t)$$
$$= \frac{\sigma_u^2}{1 - \rho^2} = \sigma_\varepsilon^2 \quad (3)$$

Covariance of $\varepsilon_t$ and $\varepsilon_{t-1}$

From (2), we have $\varepsilon_t = u_t + \rho u_{t-1} + \rho^2 u_{t-2} + \dots$. Lagging this by one period, $\varepsilon_{t-1} = u_{t-1} + \rho u_{t-2} + \rho^2 u_{t-3} + \dots$. So the covariance between $\varepsilon_t$ and $\varepsilon_{t-1}$ is

Cov( t ,  t−1 ) = E( t t−1 ) 2 2 = E[{ut + ut−1 +  ut−2 + ....}{ut−1 + ut−2 +  ut−3 + ....}] 2 = E[{ut + (ut−1 + ut−2 + ....}{ut−1 + ut−2 +  ut−3 + ....}] 2 2 2 = E[ut ut−1 + ut ut−2 +  ut ut−3 + ....] + E[ut−1 + ut−2  ut−3 + ....] 2 2 2 4 2 = ( u +   u +   u + .... ) (using assumptions of ut mentioned above)  2 = u 1−  2 2 2 2 2 =   (σε = σu (1 − ρ ),as shown in (3) above) (4)

From (4), it follows that

$$\rho = \frac{E(\varepsilon_t \varepsilon_{t-1})}{\sigma_\varepsilon^2} \quad (5)$$

$$\rho^2 = \frac{E(\varepsilon_t \varepsilon_{t-2})}{\sigma_\varepsilon^2}, \quad \dots$$

$$\rho^s = \frac{E(\varepsilon_t \varepsilon_{t-s})}{\sigma_\varepsilon^2} \quad (6)$$

An important observation → if we put $s = 0$ in (6), then

$$\rho^0 = \frac{E(\varepsilon_t^2)}{\sigma_\varepsilon^2} = \frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2} = 1$$

which shows that the zero-order autocorrelation coefficient is unity.

We can rewrite (5) as

Cov( t ,  t−1 )  = Var( t ) Var( t−1 )

This means that $\rho$ also represents the correlation coefficient between $\varepsilon_t$ and $\varepsilon_{t-1}$. Similarly, $\rho^2$ is the correlation coefficient between $\varepsilon_t$ and $\varepsilon_{t-2}$, and $\rho^s$ is the correlation coefficient between $\varepsilon_t$ and $\varepsilon_{t-s}$.
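As a numerical check of this result, the following sketch (illustrative names and values, not from the text) compares the sample correlation between $\varepsilon_t$ and $\varepsilon_{t-s}$ with $\rho^s$:

```python
# Sketch: verifying numerically that Corr(eps_t, eps_{t-s}) ~ rho^s for a
# simulated AR(1) series (a large n keeps the sampling noise small).
import numpy as np

rng = np.random.default_rng(0)
n, rho = 100_000, 0.7
u = rng.normal(size=n)
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = rho * eps[t - 1] + u[t]

for s in (1, 2, 3):
    emp = np.corrcoef(eps[s:], eps[:-s])[0, 1]   # sample Corr(eps_t, eps_{t-s})
    print(f"s={s}: empirical={emp:.3f}, rho^s={rho**s:.3f}")
```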

❖ Consequences of Autocorrelation

✓ Unbiasedness

Consider the model

$$Y_t = \alpha + \beta X_t + \varepsilon_t$$

for which $\hat{\beta}$, the OLS estimator of $\beta$, is given by

ˆ  xt yt  = 2  xt

 xt t =  + 2  xt Taking expectations, E(ˆ) =   ˆ is an unbiased estimator of  . So autocorrelation does not remove unbiasedness property for the OLS estimators. 11 ✓ Bestness To examine this property we compute the variance of ˆ under autocorrelation, which is compared with variance of when there is no autocorrelation.

Var(ˆ) = E[ˆ − ]2 2   x   = E t t   2    xt  1   = E x 2 2 + 2 x x   + 2 x x   + .... 2 2  t t  t t−1 t t−1  t t−2 t t−2  ( xt )  t=1 t=2 t=3 

1 2 2 2 2 2 = 2 2   xt + 2  xt xt−1 + 2   xt xt−2 + ....  ( xt )

 E( t t−k ) k k 2   =   E( t t−k ) =      2 

2    xt xt−1 2  xt xt−2  = 2 1+ 2 2 + 2 2 + ....  (7)  xt   xt  xt 

Since $\rho$ is positive and $\sum x_t x_{t-k}$ is (expectedly) also positive, the expression in brackets is greater than 1. Thus $\text{Var}(\hat{\beta})_{\text{Under AR}} > \text{Var}(\hat{\beta})_{\text{No AR}}$, i.e., $\hat{\beta}$ is no longer a minimum variance or best estimator.

✓ Consistency

Here we have to check whether $\lim_{n \to \infty} \text{Var}(\hat{\beta}) = 0$.

From (7), we have

$$\text{Var}(\hat{\beta}) = \frac{\sigma_\varepsilon^2}{\sum x_t^2}\left[1 + 2\rho \frac{\sum x_t x_{t-1}}{\sum x_t^2} + 2\rho^2 \frac{\sum x_t x_{t-2}}{\sum x_t^2} + \dots\right]$$
$$= \frac{\sigma_\varepsilon^2}{\sum x_t^2} + \frac{2\sigma_\varepsilon^2}{(\sum x_t^2)^2}\left[\rho \sum x_t x_{t-1} + \rho^2 \sum x_t x_{t-2} + \dots\right]$$
$$= \frac{\sigma_\varepsilon^2}{\sum x_t^2} + \frac{2\sigma_\varepsilon^2}{(\sum x_t^2)^2}\left[\rho r_1 \sqrt{\sum x_t^2}\sqrt{\sum x_{t-1}^2} + \rho^2 r_2 \sqrt{\sum x_t^2}\sqrt{\sum x_{t-2}^2} + \dots\right]$$

$$\left(\because r_k = \frac{\sum x_t x_{t-k}}{\sqrt{\sum x_t^2}\sqrt{\sum x_{t-k}^2}} \Rightarrow \sum x_t x_{t-k} = r_k \sqrt{\sum x_t^2}\sqrt{\sum x_{t-k}^2}\right)$$

$$= \frac{\sigma_\varepsilon^2/n}{\sum x_t^2/n} + \frac{2(\sigma_\varepsilon^2/n)}{(\sum x_t^2/n)^2}\left[\rho r_1 \sqrt{\frac{\sum x_t^2}{n}}\sqrt{\frac{\sum x_{t-1}^2}{n}} + \rho^2 r_2 \sqrt{\frac{\sum x_t^2}{n}}\sqrt{\frac{\sum x_{t-2}^2}{n}} + \dots\right]$$

Now, as $n \to \infty$, $\sigma_\varepsilon^2/n \to 0$,

$\sum x_t^2/n,\ \sum x_{t-1}^2/n,\ \sum x_{t-2}^2/n, \dots$ approach some finite number (say $\lambda$), and

$r_1, r_2, \dots$ approach numbers with absolute value $< 1$, say $r_1^*, r_2^*, \dots$

Therefore,

$$\lim_{n\to\infty} \text{Var}(\hat{\beta}) = \frac{\lim_{n\to\infty}(\sigma_\varepsilon^2/n)}{\lambda} + \frac{2\lim_{n\to\infty}(\sigma_\varepsilon^2/n)}{\lambda^2}\left(\rho r_1^* \lambda + \rho^2 r_2^* \lambda + \dots\right) = 0$$

So $\hat{\beta}$ is a consistent estimator even under autocorrelation.

❖ Tests for Autocorrelation

❖ Durbin-Watson (1951) test → simplest and most widely used

Assumptions:
▪ The regression model includes an intercept term.
▪ We are examining the presence of first-order autocorrelation.
▪ The regression model doesn't include a lagged dependent variable as an explanatory variable.

Test procedure: Suppose the model is

$$Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + \dots + \beta_k X_{kt} + \varepsilon_t \quad (8)$$
$$\varepsilon_t = \rho \varepsilon_{t-1} + u_t, \quad |\rho| < 1 \quad (9)$$

[$u_t$ satisfies all the usual OLS assumptions]

The null hypothesis being tested is $H_N: \rho = 0$, where $\rho$ is called the first-order autocorrelation coefficient.

Steps

Step 1: Estimate model (8) by OLS and obtain the estimated residuals $\hat{\varepsilon}_t = e_t$.

Step 2: Compute the Durbin-Watson d-statistic as

$$d = \frac{\sum_{t=2}^{n}(\hat{\varepsilon}_t - \hat{\varepsilon}_{t-1})^2}{\sum_{t=1}^{n}\hat{\varepsilon}_t^2} = \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n}e_t^2} = \frac{\sum_{t=2}^{n}e_t^2 + \sum_{t=2}^{n}e_{t-1}^2 - 2\sum_{t=2}^{n}e_t e_{t-1}}{\sum_{t=1}^{n}e_t^2}$$

For large samples,

$$d \approx \frac{2\sum_{t=1}^{n}e_t^2 - 2\sum_{t=1}^{n}e_t e_{t-1}}{\sum_{t=1}^{n}e_t^2} = 2\left(1 - \frac{\sum_{t=1}^{n}e_t e_{t-1}}{\sum_{t=1}^{n}e_t^2}\right)$$

Or

$$d^* \approx 2(1 - \hat{\rho}) \quad \left(\hat{\rho} = \frac{\frac{1}{n}\sum_{t=1}^{n} e_t e_{t-1}}{\sqrt{\frac{1}{n}\sum_{t=1}^{n} e_t^2}\,\sqrt{\frac{1}{n}\sum_{t=1}^{n} e_{t-1}^2}}\right) \quad (10)$$

The value of $d^*$ lies between 0 and 4.

Testing of $d^*$
▪ Compare $d^*$ with the values of theoretical d (upper and lower), available from the Durbin-Watson d-table.
▪ The tabulated d-values depend on the number of observations (n) and the number of explanatory variables (k) in the model.

The decision rules:
▪ If $0 \le d^* < d_L$ → reject $H_N$ → positive autocorrelation.
▪ If $d_L \le d^* \le d_U$ → no decision possible.
▪ If $d_U < d^* < 4 - d_U$ → do not reject $H_N$ → no significant autocorrelation.
▪ If $4 - d_U \le d^* \le 4 - d_L$ → no decision possible.
▪ If $4 - d_L < d^* \le 4$ → reject $H_N$ → negative autocorrelation.
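A sketch of Steps 1 and 2 in Python using statsmodels (simulated data; all variable names and parameter values are illustrative, not from the text):

```python
# Sketch: OLS residuals (Step 1) and the d-statistic (Step 2) via statsmodels.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(1)
n, rho = 100, 0.6
x = rng.normal(size=n)
u = rng.normal(size=n)
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = rho * eps[t - 1] + u[t]      # AR(1) disturbance
y = 1.0 + 2.0 * x + eps

res = sm.OLS(y, sm.add_constant(x)).fit()  # Step 1: estimate, get residuals
d = durbin_watson(res.resid)               # Step 2: d-statistic
print(d)   # roughly 2(1 - rho) ~ 0.8 here; compare with tabulated d_L, d_U
```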

Limitations of the Durbin-Watson Test
• Unable to test higher-order autocorrelation.
• Not suitable when the model includes a lagged dependent variable as an explanatory variable.
• Not applicable if the model does not contain an intercept term.
• In two of the above cases, the test produces an inconclusive result.

• Not robust for small samples.

❖ Theil-Nagar Correction to Durbin-Watson d-statistic

To make the DW test suitable for small samples, Theil and Nagar modified the estimate of $\rho$ implied by the d-statistic.

As $d \approx 2(1 - \hat{\rho}) \Rightarrow \hat{\rho} = 1 - d/2$, Theil and Nagar suggested that, in the case of a small sample, $\hat{\rho}$ should be corrected as

$$\hat{\rho}_{\text{corrected}} = \frac{n^2\left(1 - \frac{d}{2}\right) + k^2}{n^2 - k^2}$$

where n = number of observations, and k = number of parameters in the model (including the intercept term).
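The correction transcribes directly into code; a minimal sketch (the values of d, n and k below are illustrative only):

```python
# Sketch: Theil-Nagar small-sample correction to rho, per the formula above.
def theil_nagar_rho(d: float, n: int, k: int) -> float:
    """rho_corrected = (n^2 * (1 - d/2) + k^2) / (n^2 - k^2)."""
    return (n**2 * (1.0 - d / 2.0) + k**2) / (n**2 - k**2)

print(theil_nagar_rho(d=0.8, n=25, k=2))  # illustrative values
```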

❖ Durbin’s (1970) h-test

When a lagged dependent variable appears as an explanatory variable in the model, one can use Durbin's h-test to examine the presence of autocorrelation.

This test is described below. Consider the model

$$Y_t = \alpha + \beta X_t + \gamma Y_{t-1} + \varepsilon_t \quad (11)$$

where $\varepsilon_t = \rho \varepsilon_{t-1} + u_t$

The hypothesis to be tested is $H_N: \rho = 0$.

We compute the value of the h-statistic as

$$h = \left(1 - \frac{d}{2}\right)\sqrt{\frac{n}{1 - n\, s_{\hat{\gamma}}^2}} \quad (12)$$

where d = the usual Durbin-Watson d-statistic, and $s_{\hat{\gamma}}^2$ = the estimated variance of the least-squares estimate of $\gamma$. The steps involved in the computation of h are: (i) estimate (11) by the OLS method; (ii) compute the variance of $\hat{\gamma}$; (iii) compute d; (iv) compute $\hat{\rho} = 1 - d/2$; and (v) compute h.

Durbin has shown that, for large samples, the h-statistic follows a standard normal distribution with zero mean and unit variance, i.e., $h \sim N(0, 1)$. Therefore, the statistical significance of the computed h may be tested using the standardized normal distribution table. From the normal distribution, we know that $\text{Prob}[-1.96 \le h \le 1.96] = 0.95$. So, the decision rules are

• If $h > 1.96$, we reject $H_N$ (at the 95% level of confidence) and conclude that first-order positive autocorrelation [AR(1)] is present in the data.

• If $h < -1.96$, we reject $H_N$ (at the 95% level of confidence) and conclude that first-order negative autocorrelation is present in the data.

• When $-1.96 \le h \le 1.96$, $H_N$ is accepted and we conclude that autocorrelation (positive or negative) is absent.

One important limitation of Durbin's h-test is that it cannot be applied if $n s_{\hat{\gamma}}^2 \ge 1$, since the term under the square root in (12) then becomes zero or negative. In that case, the h-test suffers from an 'indeterminacy problem'. Although this situation does not usually happen, we will have to use some other test if it does.
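A sketch of steps (i)-(v) on simulated data (all names and values are illustrative); note the guard against the indeterminacy problem just described:

```python
# Sketch: Durbin's h-statistic from an OLS fit of a model like (11).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 1.0 + 0.5 * x[t] + 0.3 * y[t - 1] + rng.normal()

X = sm.add_constant(np.column_stack([x[1:], y[:-1]]))  # [const, X_t, Y_{t-1}]
res = sm.OLS(y[1:], X).fit()                           # step (i)

var_gamma = res.bse[2] ** 2           # step (ii): variance of gamma-hat
d = durbin_watson(res.resid)          # step (iii)
rho_hat = 1.0 - d / 2.0               # step (iv)
m = len(res.resid)
if m * var_gamma < 1.0:               # h is undefined otherwise (indeterminacy)
    h = rho_hat * np.sqrt(m / (1.0 - m * var_gamma))   # step (v)
    print(h)                          # reject H_N if |h| > 1.96
```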

❖ Breusch-Godfrey (BG) Lagrange Multiplier (LM) Test

▪ This test can pick up higher-order autocorrelation.
▪ It is applicable in the presence of a lagged dependent variable as an explanatory variable.

To understand the BG test procedure, consider the model

$$Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + \dots + \beta_k X_{kt} + \varepsilon_t$$

where $\varepsilon_t = \rho_1 \varepsilon_{t-1} + \rho_2 \varepsilon_{t-2} + \dots + \rho_p \varepsilon_{t-p} + u_t$. The hypotheses are

$H_N: \rho_1 = \rho_2 = \dots = \rho_p = 0$ (no autocorrelation)
$H_A$: at least one of the $\rho_s$ is non-zero (autocorrelation is present)

Steps in BG Test

(1) Estimate the model by OLS and obtain $\hat{\varepsilon}_t = e_t$.

(2) Run the auxiliary regression of $e_t$ on $X_{1t}, X_{2t}, \dots, X_{kt}, e_{t-1}, e_{t-2}, \dots, e_{t-p}$.

(3) To test the validity of $H_N$, compute the LM-statistic $= (n - p)R^2$ using the value of $R^2$ from the auxiliary regression [here n = number of observations, and p = the order of autocorrelation we are examining].

This LM-statistic follows a Chi-square distribution with p degrees of freedom.

(4) Decision rule: If $LM^* = \chi^{2*} > \chi^2_{\text{critical}}$, reject $H_N$ ⟹ autocorrelation is present.
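A sketch using statsmodels' `acorr_breusch_godfrey` (simulated data; names and values are illustrative). Note that statsmodels computes the LM statistic as essentially $nR^2$ rather than $(n - p)R^2$; the two agree asymptotically:

```python
# Sketch: BG LM test (and its F-version) via statsmodels.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(3)
n = 150
x = rng.normal(size=n)
u = rng.normal(size=n)
eps = np.zeros(n)
for t in range(2, n):
    eps[t] = 0.5 * eps[t - 1] + 0.2 * eps[t - 2] + u[t]   # AR(2) disturbance
y = 1.0 + 2.0 * x + eps

res = sm.OLS(y, sm.add_constant(x)).fit()
lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=2)  # p = 2
print(lm_stat, lm_pval, f_stat, f_pval)   # small p-values => reject H_N
```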

F-version of the BG-test

Here we run the auxiliary regression twice – once by including the terms $e_{t-1}, e_{t-2}, \dots, e_{t-p}$ (called the unrestricted regression), and once by dropping these (called the restricted regression).

Next we compute the F-value as

$$F^* = \frac{(R_{UR}^2 - R_R^2)/p}{(1 - R_{UR}^2)/(n - m - 1 - p)}$$

where $R_{UR}^2$ and $R_R^2$ are the values of $R^2$ from the unrestricted and restricted models respectively, n = number of observations, and m = number of explanatory variables in the unrestricted model.

The decision rule: If $F^* > F_\lambda(p,\ n - m - 1 - p)$, reject $H_N$ ⟹ autocorrelation is present.

❖ Remedial Measures

Autocorrelation makes the OLS estimators inefficient.

Therefore, it is necessary to correct this problem.

There are two different approaches to correcting the autocorrelation problem:

❖ When the value of $\rho$ is known

The autocorrelation problem may be corrected easily when the value of the autocorrelation coefficient is known. To understand how, consider the model

$$Y_t = \alpha + \beta X_t + \varepsilon_t \quad (16)$$

where $\varepsilon_t$ is assumed to follow first-order autocorrelation so that

$$\varepsilon_t = \rho \varepsilon_{t-1} + u_t \quad (17)$$

Now lagging (16) by one period and pre-multiplying it by $\rho$, we get

$$\rho Y_{t-1} = \rho\alpha + \rho\beta X_{t-1} + \rho\varepsilon_{t-1} \quad (18)$$

Subtracting (18) from (16) gives

(Yt − Yt−1 ) = (1 − ) +  (Xt − Xt−1 ) + ( t −  t−1 ) (19) * * * So Yt =  + Xt + ut (ut = εt − ρεt−1 (from 17)) (20) * * * where Yt = (Yt − Yt−1 ) , Xt = (Xt − Xt−1 ) and  = (1 − ) .

Note that the disturbance term ut in (20) satisfies all usual assumptions of the CLRM.

So if $\rho$ is known, we can apply OLS to (20) and obtain estimates that are BLUE.
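A sketch of this quasi-differencing transformation (simulated data; names and values are illustrative, and $\rho$ is treated as known). Note how $\alpha$ is recovered from $\alpha^* = \alpha(1 - \rho)$:

```python
# Sketch: estimating (20) by OLS when rho is known.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n, rho = 120, 0.7                          # rho assumed known in this case
x = rng.normal(size=n)
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = rho * eps[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + eps

y_star = y[1:] - rho * y[:-1]              # Y*_t = Y_t - rho * Y_{t-1}
x_star = x[1:] - rho * x[:-1]              # X*_t = X_t - rho * X_{t-1}
res = sm.OLS(y_star, sm.add_constant(x_star)).fit()

beta_hat = res.params[1]                   # estimate of beta
alpha_hat = res.params[0] / (1.0 - rho)    # alpha recovered from alpha*
print(alpha_hat, beta_hat)
```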

❖ When the value of $\rho$ is unknown ⟶ the case faced in reality ⟶ we may apply the Cochrane-Orcutt iterative procedure

Under this method, we first estimate the following equation by the OLS method.

$$Y_t = \alpha + \beta X_t + \varepsilon_t$$

Then we obtain $\hat{\varepsilon}_t = e_t$, which is used to calculate $\hat{\rho}$ as

$$\hat{\rho} = \frac{\sum e_t e_{t-1}}{\sum e_t^2}$$

In the next step, use $\hat{\rho}$ to estimate the model

$$(Y_t - \hat{\rho} Y_{t-1}) = \alpha^* + \beta(X_t - \hat{\rho} X_{t-1}) + (\varepsilon_t - \hat{\rho}\varepsilon_{t-1})$$

Thus, a two-step estimation is involved here.

Suppose $\hat{\alpha}^*$ and $\hat{\beta}$ are the Cochrane-Orcutt estimates of $\alpha^*$ and $\beta$ from the second step. Use them to obtain a new set of residuals $\hat{e}_t$ and a new estimate of $\rho$, say $\hat{\hat{\rho}}$, as

$$\hat{\hat{\rho}} = \frac{\sum \hat{e}_t \hat{e}_{t-1}}{\sum \hat{e}_t^2}$$

Again use $\hat{\hat{\rho}}$ to construct the model

$$(Y_t - \hat{\hat{\rho}} Y_{t-1}) = \alpha^* + \beta(X_t - \hat{\hat{\rho}} X_{t-1}) + (\varepsilon_t - \hat{\hat{\rho}}\varepsilon_{t-1})$$

Continue this procedure until the estimated values of $\hat{\rho}$ and $\hat{\hat{\rho}}$ converge (i.e., successive estimates differ by no more than some pre-selected, very small, value, say 0.001).
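A sketch of the iteration (simulated data; names and values are illustrative). statsmodels' `GLSAR(...).iterative_fit()` packages the same idea:

```python
# Sketch: a hand-rolled Cochrane-Orcutt loop following the steps above.
import numpy as np
import statsmodels.api as sm

def cochrane_orcutt(y, x, tol=0.001, max_iter=50):
    e = sm.OLS(y, sm.add_constant(x)).fit().resid          # step-1 residuals
    rho = np.sum(e[1:] * e[:-1]) / np.sum(e**2)            # initial rho-hat
    res = None
    for _ in range(max_iter):
        y_star = y[1:] - rho * y[:-1]                      # quasi-differencing
        X_star = sm.add_constant(x[1:] - rho * x[:-1])
        res = sm.OLS(y_star, X_star).fit()
        alpha = res.params[0] / (1.0 - rho)                # alpha* = alpha(1 - rho)
        e = y - alpha - res.params[1] * x                  # residuals of original model
        rho_new = np.sum(e[1:] * e[:-1]) / np.sum(e**2)    # updated rho-hat
        if abs(rho_new - rho) < tol:                       # convergence check
            break
        rho = rho_new
    return rho, res

rng = np.random.default_rng(5)
n = 120
x = rng.normal(size=n)
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = 0.6 * eps[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + eps

rho_hat, res = cochrane_orcutt(y, x)
print(rho_hat, res.params)   # rho_hat near 0.6; params ~ (alpha*, beta)
```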

❖ Hildreth-Lu (1960) search procedure

This is a simple approach, which is intuitively appealing.

Here the OLS method is applied repeatedly to estimate the equation

$$(Y_t - \rho Y_{t-1}) = \alpha^* + \beta(X_t - \rho X_{t-1}) + (\varepsilon_t - \rho\varepsilon_{t-1})$$

by assigning different values to $\rho$ between −1 and +1, say $\rho = -0.99, -0.98, -0.97, \dots, 0.97, 0.98, 0.99$.

From all these estimated models, we select the one that gives the lowest value of the residual sum of squares (RSS). Thus, a 'search procedure' is involved here.

Limitation → this search procedure is laborious and involves many calculations, since a separate regression must be run for each candidate value of $\rho$.
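The search is nonetheless easy to automate. A sketch (illustrative; y and x are assumed to be defined as in the Cochrane-Orcutt sketch above):

```python
# Sketch: Hildreth-Lu grid search over candidate rho values, keeping the
# lowest-RSS fit.
import numpy as np
import statsmodels.api as sm

def hildreth_lu(y, x, step=0.01):
    best_rss, best_rho, best_res = np.inf, None, None
    for rho in np.arange(-0.99, 1.0, step):        # rho = -0.99, ..., 0.99
        y_star = y[1:] - rho * y[:-1]              # quasi-differenced data
        X_star = sm.add_constant(x[1:] - rho * x[:-1])
        res = sm.OLS(y_star, X_star).fit()
        rss = np.sum(res.resid ** 2)               # residual sum of squares
        if rss < best_rss:                         # keep the lowest-RSS fit
            best_rss, best_rho, best_res = rss, rho, res
    return best_rho, best_res

# Usage (y, x as in the Cochrane-Orcutt sketch):
# rho_hat, res = hildreth_lu(y, x)
```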

❖ Heteroskedasticity and autocorrelation consistent (HAC) standard errors

When the disturbance term is autocorrelated, instead of using the above methods, we may consider using the 'heteroskedasticity and autocorrelation consistent' (HAC) standard errors, developed by Newey and West (1987).

The HAC standard errors are standard errors for the OLS estimators that remain consistent whether or not the disturbance term of the model is heteroskedastic and/or autocorrelated.

However, the HAC standard errors may not be valid (accurate) when the size of the sample is small.
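A sketch of requesting HAC (Newey-West) standard errors from statsmodels (simulated data; the `maxlags` value is an illustrative choice, and in practice it is often tied to the sample size):

```python
# Sketch: OLS with Newey-West (HAC) standard errors versus ordinary SEs.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 200
x = rng.normal(size=n)
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = 0.6 * eps[t - 1] + rng.normal()   # autocorrelated disturbance
y = 1.0 + 2.0 * x + eps

X = sm.add_constant(x)
res_ols = sm.OLS(y, X).fit()                                   # ordinary SEs
res_hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(res_ols.bse, res_hac.bse)   # HAC SEs are typically larger here
```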
