Chapter 11 Autocorrelation

One of the basic assumptions in the linear regression model is that the random error components, or disturbances, are independently and identically distributed. So in the model $y = X\beta + u$, it is assumed that

$$E(u_t u_{t-s}) = \begin{cases} \sigma_u^2 & \text{if } s = 0 \\ 0 & \text{if } s \neq 0, \end{cases}$$

i.e., the correlation between successive disturbances is zero.

When the part of this assumption stating $E(u_t u_{t-s}) = \sigma_u^2$ for $s = 0$ is violated, i.e., the variance of the disturbance term does not remain constant, the problem of heteroskedasticity arises. When the part stating $E(u_t u_{t-s}) = 0$ for $s \neq 0$ is violated, i.e., the variance of the disturbance term remains constant but the successive disturbance terms are correlated, the problem is termed autocorrelation.

When autocorrelation is present, some or all off-diagonal elements of $E(uu')$ are nonzero.

Sometimes the study and explanatory variables have a natural sequence order over time, i.e., the data are collected with respect to time. Such data are termed time-series data. The disturbance terms in time-series data tend to be serially correlated.

The autocovariance at lag $s$ is defined as

$$\gamma_s = E(u_t u_{t-s}); \quad s = 0, \pm 1, \pm 2, \ldots$$

At zero lag, we have the constant variance, i.e.,

$$\gamma_0 = E(u_t^2) = \sigma_u^2.$$

The autocorrelation coefficient at lag $s$ is defined as

$$\rho_s = \frac{E(u_t u_{t-s})}{\sqrt{\operatorname{Var}(u_t)\operatorname{Var}(u_{t-s})}} = \frac{\gamma_s}{\gamma_0}; \quad s = 0, \pm 1, \pm 2, \ldots$$

Assume $\rho_s$ and $\gamma_s$ are symmetrical in $s$, i.e., these coefficients are constant over time and depend only on the length of the lag $s$. The autocorrelation between the successive terms $(u_2 \text{ and } u_1), (u_3 \text{ and } u_2), \ldots, (u_n \text{ and } u_{n-1})$ gives the autocorrelation of order one, i.e., $\rho_1$. Similarly, the autocorrelation between the terms $(u_3 \text{ and } u_1), (u_4 \text{ and } u_2), \ldots, (u_n \text{ and } u_{n-2})$ gives the autocorrelation of order two, i.e., $\rho_2$.

Regression Analysis | Chapter 11 | Autocorrelation | Shalabh, IIT Kanpur

Source of autocorrelation

Some of the possible reasons for the introduction of autocorrelation in the data are as follows:

1. Carryover of effect, at least in part, is an important source of autocorrelation.
For example, monthly data on household expenditure are influenced by the expenditure of the preceding month. Autocorrelation can be present in cross-section data as well as in time-series data. In cross-section data, neighbouring units tend to be similar with respect to the characteristic under study. In time-series data, time is the factor that produces autocorrelation. Whenever some ordering of the sampling units is present, autocorrelation may arise.

2. Another source of autocorrelation is the effect of the deletion of some variables. In regression modeling, it is not possible to include all the variables in the model. There can be various reasons for this, e.g., some variables may be qualitative, or direct observations on a variable may not be available. The joint effect of such deleted variables gives rise to autocorrelation in the data.

3. Misspecification of the form of the relationship can also introduce autocorrelation in the data. It is assumed that the form of the relationship between the study and explanatory variables is linear. If log or exponential terms are present in the model, so that the linearity of the model is questionable, then this also gives rise to autocorrelation in the data.

4. The difference between the observed and true values of a variable is called measurement error or errors-in-variable. The presence of measurement errors in the dependent variable may also introduce autocorrelation in the data.

Structure of disturbance term:

Consider the situation where the disturbances are autocorrelated,

$$E(uu') = \begin{pmatrix} \gamma_0 & \gamma_1 & \cdots & \gamma_{n-1} \\ \gamma_1 & \gamma_0 & \cdots & \gamma_{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{n-1} & \gamma_{n-2} & \cdots & \gamma_0 \end{pmatrix} = \gamma_0 \begin{pmatrix} 1 & \rho_1 & \cdots & \rho_{n-1} \\ \rho_1 & 1 & \cdots & \rho_{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{n-1} & \rho_{n-2} & \cdots & 1 \end{pmatrix} = \sigma_u^2 \begin{pmatrix} 1 & \rho_1 & \cdots & \rho_{n-1} \\ \rho_1 & 1 & \cdots & \rho_{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{n-1} & \rho_{n-2} & \cdots & 1 \end{pmatrix}.$$

Observe that there are now $(n + k)$ parameters: $\beta_1, \beta_2, \ldots, \beta_k, \sigma_u^2, \rho_1, \rho_2, \ldots, \rho_{n-1}$. These $(n + k)$ parameters are to be estimated on the basis of the available $n$ observations.
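Since the number of parameters exceeds the number of observations, they cannot all be estimated from the data. To handle the situation, some special form and structure of the disturbance term needs to be assumed so that the number of parameters in the covariance matrix of the disturbance term is reduced. The following structures are popular in autocorrelation:

1. Autoregressive (AR) process.
2. Moving average (MA) process.
3. Joint autoregressive moving average (ARMA) process.

Estimation under the first order autoregressive process:

Consider a simple linear regression model

$$y_t = \beta_0 + \beta_1 X_t + u_t, \quad t = 1, 2, \ldots, n.$$

Assume the $u_t$'s follow a first-order autoregressive scheme defined as

$$u_t = \rho u_{t-1} + \varepsilon_t$$

where $|\rho| < 1$, $E(\varepsilon_t) = 0$,

$$E(\varepsilon_t \varepsilon_{t-s}) = \begin{cases} \sigma_\varepsilon^2 & \text{if } s = 0 \\ 0 & \text{if } s \neq 0 \end{cases}$$

for all $t = 1, 2, \ldots, n$, where $\rho$ is the first-order autocorrelation between $u_t$ and $u_{t-1}$, $t = 1, 2, \ldots, n$. Now

$$\begin{aligned} u_t &= \rho u_{t-1} + \varepsilon_t \\ &= \rho(\rho u_{t-2} + \varepsilon_{t-1}) + \varepsilon_t \\ &= \varepsilon_t + \rho \varepsilon_{t-1} + \rho^2 \varepsilon_{t-2} + \cdots \\ &= \sum_{r=0}^{\infty} \rho^r \varepsilon_{t-r}. \end{aligned}$$

Then

$$E(u_t) = 0,$$

$$\begin{aligned} E(u_t^2) &= E(\varepsilon_t^2) + \rho^2 E(\varepsilon_{t-1}^2) + \rho^4 E(\varepsilon_{t-2}^2) + \cdots \\ &= (1 + \rho^2 + \rho^4 + \cdots)\,\sigma_\varepsilon^2 \quad (\varepsilon_t\text{'s are serially independent}), \end{aligned}$$

so that

$$E(u_t^2) = \sigma_u^2 = \frac{\sigma_\varepsilon^2}{1 - \rho^2} \quad \text{for all } t.$$

Further,

$$\begin{aligned} E(u_t u_{t-1}) &= E\left[(\varepsilon_t + \rho \varepsilon_{t-1} + \rho^2 \varepsilon_{t-2} + \cdots)(\varepsilon_{t-1} + \rho \varepsilon_{t-2} + \rho^2 \varepsilon_{t-3} + \cdots)\right] \\ &= E\left[\{\varepsilon_t + \rho(\varepsilon_{t-1} + \rho \varepsilon_{t-2} + \cdots)\}\{\varepsilon_{t-1} + \rho \varepsilon_{t-2} + \cdots\}\right] \\ &= \rho\, E\left[(\varepsilon_{t-1} + \rho \varepsilon_{t-2} + \cdots)^2\right] \\ &= \rho\, \sigma_u^2. \end{aligned}$$

Similarly,

$$E(u_t u_{t-2}) = \rho^2 \sigma_u^2.$$

In general,

$$E(u_t u_{t-s}) = \rho^s \sigma_u^2,$$

$$E(uu') = \Omega = \sigma_u^2 \begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\ \rho & 1 & \rho & \cdots & \rho^{n-2} \\ \rho^2 & \rho & 1 & \cdots & \rho^{n-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \cdots & 1 \end{pmatrix}.$$

Note that the disturbance terms are no longer independent and $E(uu') \neq \sigma^2 I$. The disturbances are nonspherical.

Consequences of autocorrelated disturbances:

Consider the model with first-order autoregressive disturbances

$$\underset{n \times 1}{y} = \underset{n \times k}{X}\,\underset{k \times 1}{\beta} + \underset{n \times 1}{u},$$

$$u_t = \rho u_{t-1} + \varepsilon_t, \quad t = 1, 2, \ldots, n$$

with assumptions

$$E(u) = 0, \quad E(uu') = \Omega,$$

$$E(\varepsilon_t) = 0, \quad E(\varepsilon_t \varepsilon_{t-s}) = \begin{cases} \sigma_\varepsilon^2 & \text{if } s = 0 \\ 0 & \text{if } s \neq 0 \end{cases}$$

where $\Omega$ is a positive definite matrix.

The ordinary least squares estimator of $\beta$ is

$$\begin{aligned} b &= (X'X)^{-1}X'y \\ &= (X'X)^{-1}X'(X\beta + u), \\ b - \beta &= (X'X)^{-1}X'u, \\ E(b - \beta) &= 0. \end{aligned}$$

So the OLSE remains unbiased under autocorrelated disturbances.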
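The covariance structure of first-order autoregressive disturbances derived above can be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the original notes; the function name `ar1_covariance` and the parameter values are illustrative assumptions. It builds $E(uu') = \sigma_u^2[\rho^{|i-j|}]$ with $\sigma_u^2 = \sigma_\varepsilon^2/(1-\rho^2)$.

```python
import numpy as np

def ar1_covariance(rho, sigma_eps2, n):
    """Return the n x n matrix E(uu') for u_t = rho * u_{t-1} + eps_t.

    Hypothetical helper for illustration; assumes |rho| < 1 (stationarity).
    """
    if abs(rho) >= 1:
        raise ValueError("stationarity requires |rho| < 1")
    # Var(u_t) = sigma_eps^2 / (1 - rho^2), from the geometric series
    sigma_u2 = sigma_eps2 / (1.0 - rho**2)
    # Matrix of lag lengths |i - j|
    idx = np.arange(n)
    lags = np.abs(np.subtract.outer(idx, idx))
    # E(u_t u_{t-s}) = rho^s * sigma_u^2
    return sigma_u2 * rho**lags

# Illustrative values (assumed, not from the notes)
omega = ar1_covariance(rho=0.6, sigma_eps2=1.0, n=5)
print(np.round(omega, 4))
```

Only two parameters, $\rho$ and $\sigma_\varepsilon^2$, determine the whole $n \times n$ matrix, which is the reduction in the number of parameters that motivates assuming a specific structure for the disturbances.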
The covariance matrix of $b$ is

$$\begin{aligned} V(b) &= E(b - \beta)(b - \beta)' \\ &= (X'X)^{-1}X'E(uu')X(X'X)^{-1} \\ &= (X'X)^{-1}X'\Omega X(X'X)^{-1} \\ &\neq \sigma_u^2 (X'X)^{-1}. \end{aligned}$$

The residual vector is

$$e = y - Xb = Hy = Hu, \quad H = I - X(X'X)^{-1}X',$$

$$e'e = y'Hy = u'Hu,$$

$$\begin{aligned} E(e'e) &= E(u'u) - E\left[u'X(X'X)^{-1}X'u\right] \\ &= n\sigma_u^2 - \operatorname{tr}\left[(X'X)^{-1}X'\Omega X\right]. \end{aligned}$$

Since $s^2 = \dfrac{e'e}{n-1}$, so

$$E(s^2) = \frac{n}{n-1}\,\sigma_u^2 - \frac{1}{n-1}\operatorname{tr}\left[(X'X)^{-1}X'\Omega X\right],$$

so $s^2$ is a biased estimator of $\sigma_u^2$. In fact, $s^2$ typically has a downward bias when the disturbances are positively autocorrelated.

Application of OLS in the presence of autocorrelation in the data leads to serious consequences:
- an overly optimistic view from $R^2$;
- confidence intervals that are too narrow;
- misleading results from the usual $t$-ratio and $F$-ratio tests;
- predictions that may have large variances.

Since the disturbances are nonspherical, the generalized least squares estimator of $\beta$ is more efficient than the OLSE.

The GLSE of $\beta$ is

$$\hat{\beta} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y,$$

$$E(\hat{\beta}) = \beta,$$

$$V(\hat{\beta}) = (X'\Omega^{-1}X)^{-1},$$

where the factor $\sigma_u^2$ is already contained in $\Omega = E(uu')$. The GLSE is the best linear unbiased estimator of $\beta$.

Durbin Watson test:

The Durbin-Watson (D-W) test is used for testing the hypothesis of lack of first-order autocorrelation in the disturbance term. The null hypothesis is

$$H_0: \rho = 0.$$

Use OLS to estimate $\beta$ in $y = X\beta + u$ and obtain the residual vector

$$e = y - Xb = Hy$$

where $b = (X'X)^{-1}X'y$ and $H = I - X(X'X)^{-1}X'$.

The D-W test statistic is

$$\begin{aligned} d &= \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} \\ &= \frac{\sum_{t=2}^{n} e_t^2}{\sum_{t=1}^{n} e_t^2} + \frac{\sum_{t=2}^{n} e_{t-1}^2}{\sum_{t=1}^{n} e_t^2} - 2\,\frac{\sum_{t=2}^{n} e_t e_{t-1}}{\sum_{t=1}^{n} e_t^2}. \end{aligned}$$

For large $n$,

$$d \approx 1 + 1 - 2r,$$

$$d \approx 2(1 - r)$$

where $r$ is the sample autocorrelation coefficient from the residuals based on the OLSE and can be regarded as the regression coefficient of $e_t$ on $e_{t-1}$. Here

- positive autocorrelation of the $e_t$'s $\Rightarrow d < 2$
- negative autocorrelation of the $e_t$'s $\Rightarrow d > 2$
- zero autocorrelation of the $e_t$'s $\Rightarrow d \approx 2$

As $-1 < r < 1$, so
if $-1 < r < 0$, then $2 < d < 4$, and
if $0 < r < 1$, then $0 < d < 2$.

So $d$ lies between 0 and 4.

Since $e$ depends on $X$, different data sets give different values of $d$. So the sampling distribution of $d$ depends on $X$. Consequently, exact critical values of $d$ cannot be tabulated owing to their dependence on $X$.
Durbin and Watson, therefore, obtained two statistics $\underline{d}$ and $\bar{d}$ such that

$$\underline{d} \le d \le \bar{d}$$

and their sampling distributions do not depend upon $X$.

Considering the distributions of $\underline{d}$ and $\bar{d}$, they tabulated the critical values as $d_L$ and $d_U$ respectively. They prepared the tables of critical values for $15 \le n \le 100$ and $k \le 5$. Now tables are available for $6 \le n \le 200$ and $k \le 10$.

The test procedure for $H_0: \rho = 0$ is as follows:

| Nature of $H_1$ | Reject $H_0$ when | Retain $H_0$ when | The test is inconclusive when |
|---|---|---|---|
| $H_1: \rho > 0$ | $d < d_L$ | $d > d_U$ | $d_L \le d \le d_U$ |
| $H_1: \rho < 0$ | $d > (4 - d_L)$ | $d < (4 - d_U)$ | $(4 - d_U) \le d \le (4 - d_L)$ |
| $H_1: \rho \neq 0$ | $d < d_L$ or $d > (4 - d_L)$ | $d_U < d < (4 - d_U)$ | $d_L \le d \le d_U$ or $(4 - d_U) \le d \le (4 - d_L)$ |

Values of $d_L$ and $d_U$ are obtained from tables.

Limitations of D-W test

1. If $d$ falls in the inconclusive zone, then no conclusive inference can be drawn. This zone becomes fairly large for low degrees of freedom. One solution is to reject $H_0$ if the test is inconclusive. A better solution is to modify the test as
   - Reject $H_0$ when $d < d_U$.
   - Accept $H_0$ when $d \ge d_U$.
   This test gives a satisfactory solution when the values of the $x_i$'s change slowly, e.g., price, expenditure etc.

2. The D-W test is not applicable when the intercept term is absent in the model. In such a case, one can use another critical value, say $d_M$, in place of $d_L$. The tables for the critical values $d_M$ are available.

3. The test is not valid when lagged dependent variables appear as explanatory variables.
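The D-W statistic and the one-sided decision rule against $H_1: \rho > 0$ can be sketched in a few lines. This is a minimal illustration in Python with NumPy, not part of the original notes. The function names, the simulated data, and the seed are assumptions, and the critical values $d_L$ and $d_U$ must be looked up in the D-W tables for the given $n$ and $k$; none are hard-coded here.

```python
import numpy as np

def ols_residuals(X, y):
    """Residual vector e = y - Xb with b the OLS estimate (hypothetical helper)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ b

def durbin_watson(e):
    """d = sum_{t=2}^n (e_t - e_{t-1})^2 / sum_{t=1}^n e_t^2."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def dw_decision_positive(d, d_L, d_U):
    """Decision for H0: rho = 0 against H1: rho > 0.

    d_L and d_U are the tabulated critical values supplied by the caller.
    """
    if d < d_L:
        return "reject H0"
    if d > d_U:
        return "retain H0"
    return "inconclusive"

# Simulated example with AR(1) disturbances (all values are assumptions)
rng = np.random.default_rng(0)
n, rho = 200, 0.7
eps = rng.normal(size=n)
u = np.empty(n)
u[0] = eps[0] / np.sqrt(1.0 - rho**2)   # start from the stationary variance
for t in range(1, n):
    u[t] = rho * u[t - 1] + eps[t]

x = np.linspace(0.0, 10.0, n)
X = np.column_stack([np.ones(n), x])     # intercept included, as D-W requires
y = 1.0 + 2.0 * x + u

e = ols_residuals(X, y)
d = durbin_watson(e)
r = np.sum(e[1:] * e[:-1]) / np.sum(e ** 2)   # sample autocorrelation of residuals
print(f"d = {d:.3f}, 2(1 - r) = {2 * (1 - r):.3f}")
```

The two printed values differ only by the end-point term $(e_1^2 + e_n^2)/\sum_t e_t^2$, which is why $d \approx 2(1 - r)$ holds for large $n$.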
