
Chapter 5
Heteroscedasticity and Autocorrelation

5.1 Nonspherical Disturbances

Assumption 4 in the classical model combines homoscedasticity and nonautocorrelation in a single concept, spherical disturbances. Given the model y = Xβ + ε, spherical disturbances mean that

E[ε|X] = 0 (5.1)
and
E[εε′|X] = σ²I (5.2)

E[εε′|X] = ⎡ σ²  0   ⋯  0  ⎤
           ⎢ 0   σ²  ⋯  0  ⎥
           ⎢ ⋮   ⋮   ⋱  ⋮  ⎥
           ⎣ 0   0   ⋯  σ² ⎦

However, when the variance of the residuals is not constant we have heteroscedasticity, and when the residuals are not independent of each other we have autocorrelation. To understand heteroscedasticity and autocorrelation, it is useful to write the variance-covariance matrix of the residuals in the following way:

ε = y − Xβ (5.3)
E[ε|X] = 0
E[εε′|X] = σ²Ω = Σ.

The matrix Ω is positive definite, and when Ω = I we are back to the special case of spherical disturbances. When the disturbances are uncorrelated across observations but have different variances (heteroscedasticity), σ²Ω can be expressed as


E[εε′|X] = σ²Ω = σ² ⎡ ω1  0   ⋯  0  ⎤   ⎡ σ1²  0    ⋯  0   ⎤
                    ⎢ 0   ω2  ⋯  0  ⎥ = ⎢ 0    σ2²  ⋯  0   ⎥
                    ⎢ ⋮   ⋮   ⋱  ⋮  ⎥   ⎢ ⋮    ⋮    ⋱  ⋮   ⎥
                    ⎣ 0   0   ⋯  ωn ⎦   ⎣ 0    0    ⋯  σn² ⎦

When autocorrelation is present, usually in time-series or panel data, σ²Ω can be expressed as

E[εε′|X] = σ²Ω = σ² ⎡ 1     ρ1    ⋯  ρn−1 ⎤
                    ⎢ ρ1    1     ⋯  ρn−2 ⎥
                    ⎢ ⋮     ⋮     ⋱  ⋮    ⎥
                    ⎣ ρn−1  ρn−2  ⋯  1    ⎦

The off-diagonal elements depend on the model used for the disturbance. Usually the values decrease as we move away from the diagonal, consistent with the notion of fading memory. While heteroscedasticity is usually present in cross-sectional data and autocorrelation in time-series data, panel data sets may exhibit both characteristics.
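Before turning to estimation, a small simulation helps fix ideas. The following sketch is only an illustration with made-up data and parameter values (none of it comes from the text): it generates disturbances whose standard deviation grows with a regressor x, so a plot of the OLS residuals against x fans out, the visual signature of heteroscedasticity.

* Hedged simulation sketch: heteroscedastic disturbances, Var[e|x] grows with x
clear
set seed 12345
set obs 200
generate x = 10*runiform()
generate e = rnormal(0, 1 + 0.5*x)   // disturbance sd increases with x
generate y = 1 + 2*x + e
regress y x
rvpplot x, yline(0)                  // residuals fan out as x grows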

5.2 Estimation

We now turn to analyze the implications of nonspherical disturbances when using the least squares estimator. The OLS estimator will still be unbiased, consistent, and asymptotically normally distributed. However, it will no longer be efficient, and the usual inference procedures are no longer appropriate. The OLS estimator is:

b = (X′X)⁻¹X′y = β + (X′X)⁻¹X′ε (5.4)

Taking expectations, and if E[ε|X] = 0 (the regressors are uncorrelated with the disturbance), then

E[b] = E_X[E[b|X]] = β. (5.5)

Hence, OLS is unbiased. With nonstochastic regressors (or conditional on X), the sampling variance of OLS is given by

Var[b|X] = E[(b − β)(b − β)′|X] (5.6)
         = E[(X′X)⁻¹X′εε′X(X′X)⁻¹|X]
         = (X′X)⁻¹X′(σ²Ω)X(X′X)⁻¹
         = (σ²/n)(X′X/n)⁻¹(X′ΩX/n)(X′X/n)⁻¹.

Because the variance of b is not σ²(X′X)⁻¹, inference based on s²(X′X)⁻¹ may be misleading. Not only is the wrong matrix being used, but s² may be a biased estimator of σ². The inference procedures based on the F and t distributions will no longer be appropriate. While this result is for the finite-sample variance, the asymptotic covariance matrix has a similar structure. If σ²Ω were known, the estimator for the asymptotic variance of b would be

V_OLS = (1/n)(X′X/n)⁻¹((1/n)X′[σ²Ω]X)(X′X/n)⁻¹. (5.7)

The problem, however, is that σ²Ω is unknown.

5.3 Efficient Estimation by Generalized Least Squares (GLS)

5.3.1 Generalized Least Squares

Let Ω⁻¹ = P′P. If we premultiply y = Xβ + ε by P, we obtain

Py = PXβ + Pε (5.8)
y∗ = X∗β + ε∗

Then, the variance-covariance matrix of ε∗ is:

E[ε∗ε∗′|X∗] = σ²PΩP′ = σ²I, (5.9)

meaning that the classical regression model applies to the transformed model. If Ω is known, then y∗ and X∗ are observed data. The OLS estimator (which is now efficient) on the transformed model is

β̂ = (X∗′X∗)⁻¹X∗′y∗ (5.10)
  = (X′P′PX)⁻¹X′P′Py
  = (X′Ω⁻¹X)⁻¹X′Ω⁻¹y.

This is called the Generalized Least Squares (GLS) estimator of β. This estimator differs from the usual OLS estimator in the sense that GLS uses a ‘weighting matrix.’ With spherical disturbances, Ω⁻¹ = I and GLS is equal to OLS. GLS is unbiased and it has a sampling variance given by

Var[β̂|X∗] = σ²(X∗′X∗)⁻¹ = σ²(X′Ω⁻¹X)⁻¹. (5.11)

F and t statistics follow directly from the transformed model.
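As an illustration, Equation 5.10 is straightforward to compute directly when Ω is diagonal. The Mata sketch below is a minimal example under assumed names (y, x1, x2 are variables in memory; w holds the known diagonal elements ωi of Ω); it is a sketch, not the only or necessarily best way to compute GLS in Stata.

* Hedged Mata sketch of Equation 5.10 with a known diagonal Omega
mata:
    y    = st_data(., "y")
    X    = st_data(., ("x1", "x2")), J(st_nobs(), 1, 1)   // append a constant
    w    = st_data(., "w")                // diagonal elements of Omega
    Oinv = diag(1 :/ w)                   // Omega^{-1}
    b    = invsym(X'*Oinv*X)*X'*Oinv*y    // GLS estimator, Equation 5.10
    b
end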

5.3.2 Feasible Generalized Least Squares

To be able to implement the GLS estimator we need to know the matrix Ω. One approach is to estimate a restricted version of Ω that involves a small set of parameters θ such that Ω = Ω(θ). A commonly used formula in time-series settings is

Ω(ρ) = ⎡ 1     ρ     ⋯  ρⁿ⁻¹ ⎤
       ⎢ ρ     1     ⋯  ρⁿ⁻² ⎥
       ⎢ ⋮     ⋮     ⋱  ⋮    ⎥
       ⎣ ρⁿ⁻¹  ρⁿ⁻²  ⋯  1    ⎦

which involves only a single additional parameter ρ. Once θ is estimated, we can use Ω̂ = Ω(θ̂) instead of the true Ω. Then the Feasible Generalized Least Squares (FGLS) estimator is

β̂_FGLS = (X′Ω̂⁻¹X)⁻¹X′Ω̂⁻¹y. (5.12)

We will later cover a simple approach to estimate the weighting matrix P.
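For this AR(1) structure, Stata already ships an FGLS implementation: the prais command estimates ρ and applies the transformation in one step. A minimal sketch with hypothetical variable names:

* Hedged sketch: FGLS under the AR(1) Omega(rho) above
tsset year            // declare the time variable first (assumed name)
prais y x1 x2         // Prais-Winsten FGLS estimates
prais y x1 x2, corc   // Cochrane-Orcutt variant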

5.4 Heteroscedasticity

Heteroscedasticity in the regression model arises when

Var[εi|X] = σi², i = 1, 2, …, n. (5.13)

If we assume that the disturbances are pair-wise uncorrelated, we have

E[εε′|X] = σ²Ω = σ² ⎡ ω1  0   ⋯  0  ⎤   ⎡ σ1²  0    ⋯  0   ⎤
                    ⎢ 0   ω2  ⋯  0  ⎥ = ⎢ 0    σ2²  ⋯  0   ⎥
                    ⎢ ⋮   ⋮   ⋱  ⋮  ⎥   ⎢ ⋮    ⋮    ⋱  ⋮   ⎥
                    ⎣ 0   0   ⋯  ωn ⎦   ⎣ 0    0    ⋯  σn² ⎦

where it is sometimes useful to write σi² = σ²ωi. This arbitrary scaling allows us to have the following normalization

tr(Ω) = ∑ᵢ₌₁ⁿ ωi = n. (5.14)

Here, homoscedastic disturbances are the special case where ωi = 1 for i = 1, 2, …, n. We can think of the ωi as weights, scaled in such a way that they reflect the variation in the variances of the disturbances.
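The normalization in Equation 5.14 is simple to impose in practice. A small hedged sketch (the variable s2, holding assumed values of σi², is hypothetical): dividing by the sample mean makes the resulting ωi average to one, so they sum to n.

* Hedged sketch of the normalization tr(Omega) = n
quietly summarize s2            // s2 = assumed sigma_i^2 (hypothetical)
generate omega = s2 / r(mean)   // omega_i now average to 1 and sum to _N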

5.5 Testing for Heteroscedasticity

5.5.1 Breusch-Pagan Test

Given the linear regression model

y = β0 + β1x1 + β2x2 + ··· + βKxK + ε (5.15)

we know that OLS is unbiased and consistent if we assume E[ε|x1, x2, …, xK] = 0. Let the null hypothesis that we have homoscedastic errors be

H0 : Var[ε|x1, x2, …, xK] = σ². (5.16)

Because we are assuming that ε has zero conditional expectation, Var[ε|x1, x2, …, xK] = E[ε²|x1, x2, …, xK], and so the null hypothesis of homoscedasticity is equivalent to

H0 : E[ε²|x1, x2, …, xK] = σ². (5.17)

This shows that if we want to test for violation of the homoscedasticity assumption, we want to test whether E[ε²|x1, x2, …, xK] is related to one or more of the variables in X. If H0 is false, E[ε²|x1, x2, …, xK] can be any function of X. A simple approach is to assume a linear function

ε² = δ0 + δ1x1 + δ2x2 + ··· + δKxK + v, (5.18)

where v is an error term with mean zero given X. The null hypothesis of homoscedasticity is then:

H0 : δ1 = δ2 = ··· = δK = 0. (5.19)

Under the null, it is reasonable to assume that v is independent of X. To implement this test, we follow a two-step procedure. In the first step we estimate Equation 5.15 via OLS. We obtain the residuals e, square them, and then estimate the following equation:

e² = δ0 + δ1x1 + δ2x2 + ··· + δKxK + error. (5.20)

We can then easily compute the F statistic for the joint significance of all the variables in X. Using the OLS residuals in place of the errors does not affect the large-sample distribution of the F statistic. An additional LM statistic to test for heteroscedasticity can be constructed based on the R² obtained from Equation 5.20 (denoted R²_e²):

LM = n · R²_e². (5.21)

Under the null hypothesis, LM is distributed asymptotically as χ²_K. This LM version of the test is called the Breusch-Pagan test for heteroscedasticity.
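The LM statistic in Equation 5.21 takes only a few lines to compute by hand. A hedged sketch with hypothetical variable names (three regressors, so the statistic is compared against a χ² with 3 degrees of freedom):

* Hedged sketch of the Breusch-Pagan LM statistic in Equation 5.21
quietly regress y x1 x2 x3
predict e, resid
generate e2 = e^2
quietly regress e2 x1 x2 x3
display "LM          = " e(N)*e(r2)
display "Prob > chi2 = " chi2tail(3, e(N)*e(r2))

After the original regression, estat hettest, rhs reports a closely related test that also uses the right-hand-side variables.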

5.5.2 White Test

White (1980) proposed a test for heteroscedasticity that adds the squares and cross products of all the independent variables to Equation 5.20. In a model with K − 1 = 3 independent variables, the White test is based on the estimation of:

e² = δ0 + δ1x1 + δ2x2 + δ3x3 + δ4x1² + δ5x2² + δ6x3² (5.22)
     + δ7x1·x2 + δ8x1·x3 + δ9x2·x3 + error.

Compared with the Breusch-Pagan test, Equation 5.22 has six more regressors. The White test for heteroscedasticity is based on the LM statistic for testing that all the δj in Equation 5.22 are zero, except for the intercept. When there is a large number of regressors in Equation 5.15, the test instead regresses the squared residuals on the fitted values of Equation 5.15, to avoid losing too many degrees of freedom.¹
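That simplified version is short enough to sketch here. One common implementation (a hedged sketch with hypothetical variable names, not necessarily the text's exact variant) uses both the fitted values and their squares as the auxiliary regressors:

* Hedged sketch of the simplified White test
quietly regress y x1 x2 x3
predict yhat, xb
predict e, resid
generate e2 = e^2
generate yhat2 = yhat^2
quietly regress e2 yhat yhat2
display "LM          = " e(N)*e(r2)
display "Prob > chi2 = " chi2tail(2, e(N)*e(r2))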

5.5.3 Breusch-Pagan and White Tests in Stata

Let’s go over the following example.

use http://www.ats.ucla.edu/stat/stata/ado/analysis/hetdata, clear
regress exp age ownrent income incomesq

That yields the following regression output:

      Source |       SS       df       MS              Number of obs =      72
-------------+------------------------------           F(  4,    67) =    5.39
       Model |  1749357.01     4  437339.252           Prob > F      =  0.0008
    Residual |  5432562.03    67  81083.0153           R-squared     =  0.2436
-------------+------------------------------           Adj R-squared =  0.1984
       Total |  7181919.03    71  101153.789           Root MSE      =  284.75

------------------------------------------------------------------------------
         exp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -3.081814   5.514717    -0.56   0.578    -14.08923    7.925606
     ownrent |   27.94091   82.92232     0.34   0.737    -137.5727    193.4546
      income |    234.347   80.36595     2.92   0.005     73.93593    394.7581
    incomesq |  -14.99684   7.469337    -2.01   0.049     -29.9057   -.0879857
       _cons |  -237.1465   199.3517    -1.19   0.238    -635.0541    160.7611
------------------------------------------------------------------------------

The command for the Breusch-Pagan test for heteroscedasticity is

estat hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of exp

         chi2(1)      =    29.23
         Prob > chi2  =   0.0000

¹ The F test can also be used.

The default Stata option of the Breusch-Pagan test has the null hypothesis that the error variances are all equal, versus the alternative that the error variances are a multiplicative function of one or more of the regressors. Specifically, the hettest command uses the alternative that the error variances change as the fitted values ŷ change. A large χ² statistic (with a correspondingly low p-value) indicates that heteroscedasticity is present. In the above example we do have heteroscedastic errors. Step by step, what hettest is doing is the following. First, let’s predict the fitted values and the residuals of the above regression equation:

predict yhat
predict e, resid

Now, we generate the squared residuals and rescale them so that the squared values have a mean equal to one; this rescaling is needed for the eventual computation of the χ² statistic:

gen esquare = e^2 / (e(rss)/e(N))

Then, regress the rescaled squared residuals on the fitted values:

reg esquare yhat

      Source |       SS       df       MS              Number of obs =      72
-------------+------------------------------           F(  1,    70) =    4.46
       Model |  58.4504531     1  58.4504531           Prob > F      =  0.0383
    Residual |  917.248964    70  13.1035566           R-squared     =  0.0599
-------------+------------------------------           Adj R-squared =  0.0465
       Total |  975.699417    71  13.7422453           Root MSE      =  3.6199

------------------------------------------------------------------------------
     esquare |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        yhat |   .0057804   .0027369     2.11   0.038     .0003218    .0112389
       _cons |  -.5175294   .8356209    -0.62   0.538    -2.184123    1.149064
------------------------------------------------------------------------------

To compute the χ² statistic and its corresponding p-value:

display "Chi Square (1) = " e(mss)/2
display "Prob > chi2    = " chi2tail(1, e(mss)/2)

which gives

Chi Square (1) = 29.225227
Prob > chi2    = 6.443e-08

These are the same values as the ones obtained with the Stata command hettest. For the White test the commands are:

quietly regress exp age ownrent income incomesq
estat imtest, white

which yields

White’s test for Ho: homoskedasticity
         against Ha: unrestricted heteroskedasticity

         chi2(12)     =     14.33
         Prob > chi2  =    0.2802

Cameron & Trivedi’s decomposition of IM-test

---------------------------------------------------
              Source |       chi2     df         p
---------------------+-----------------------------
  Heteroskedasticity |      14.33     12    0.2802
            Skewness |       7.89      4    0.0959
            Kurtosis |       1.67      1    0.1964
---------------------+-----------------------------
               Total |      23.88     17    0.1227
---------------------------------------------------

5.6 Weighted Least Squares, Unknown Ω

When there is heteroscedasticity and Ω is unknown, we need to estimate it from the data and use Weighted Least Squares (WLS); WLS is the GLS estimator applied to the case of heteroscedasticity. The idea is the same as in GLS: we weight each of the observations in such a way that the resulting weighted residuals have constant variance. There are many ways in which heteroscedasticity can appear, and a general way to write this is

Var[ε|X] = σ²h(X), (5.23)

where h(X) is some function of the explanatory variables that determines the heteroscedasticity. Because variances must be positive, h(X) > 0. Assume the following flexible functional form for h(X):

Var[ε|X] = σ² exp(Xδ). (5.24)

Under Equation 5.24, we can write

ε² = σ² exp(δ0 + δ1x1 + δ2x2 + ··· + δK−1xK−1)η, (5.25)

where η has a mean equal to unity, conditional on X. If we assume that η is independent of X, we have

log(ε²) = α0 + δ1x1 + δ2x2 + ··· + δK−1xK−1 + u. (5.26)

To be able to implement this procedure, we replace the unobserved ε with the OLS residuals e to estimate:

log(e²) = α0 + δ1x1 + δ2x2 + ··· + δK−1xK−1 + u. (5.27)

Finally, for h we calculate

ĥi = exp(ĝi), (5.28)

where ĝi denotes the fitted values from Equation 5.27. Now we can use WLS with weights 1/ĥi. Using weights 1/ĥi simply means that we transform our model, multiplying each of the variables (including the constant) by the corresponding 1/√ĥi.
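Putting Equations 5.27 and 5.28 together, feasible WLS takes only a few lines in Stata. The sketch below is a minimal illustration with hypothetical variable names; [aweight] tells regress to weight each observation by the inverse of ĥi.

* Hedged sketch of feasible WLS, Equations 5.27-5.28
quietly regress y x1 x2
predict e, resid
generate loge2 = ln(e^2)
quietly regress loge2 x1 x2
predict g, xb                     // fitted values of log(e^2)
generate h = exp(g)               // h-hat from Equation 5.28
regress y x1 x2 [aweight = 1/h]   // WLS with weights 1/h-hat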

5.7 White Standard Errors

From the previous discussion we can see that heteroscedasticity has some potentially serious implications for inferences based on OLS results. A quick (but not generally accepted) solution to the heteroscedasticity problem is to use the White heteroscedasticity-consistent estimator of the asymptotic variance-covariance matrix of b. It is given by

Est. Asy. Var[b] = (1/n)(X′X/n)⁻¹((1/n)∑ᵢ₌₁ⁿ ei²xixi′)(X′X/n)⁻¹. (5.29)

This formula, proposed by White (1980), implies that without actually specifying the type of heteroscedasticity we can still make appropriate inferences based on the OLS results. White heteroscedasticity-robust standard errors can be easily computed in Stata using the following commands:

use http://www.stata-press.com/data/r11/census3
reg brate medage c.medage#c.medage i.region, robust

Linear regression                                      Number of obs =      50
                                                       F(  5,    44) =  332.98
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.9196
                                                       Root MSE      =   8.782

------------------------------------------------------------------------------
             |               Robust
       brate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      medage |  -109.0958   5.624251   -19.40   0.000    -120.4307   -97.76087
             |
    c.medage#|
    c.medage |   1.635209   .0958924    17.05   0.000     1.441951    1.828468
             |
      region |
          2  |   15.00283   3.484837     4.31   0.000     7.979603    22.02606
          3  |   7.366445   3.002913     2.45   0.018     1.314471    13.41842
          4  |   21.39679   2.909907     7.35   0.000     15.53226    27.26132
             |
       _cons |   1947.611   83.05775    23.45   0.000     1780.219    2115.003
------------------------------------------------------------------------------

Compare this regression output with the output shown after Equation 4.13. While the coefficients are obviously the same, the standard errors and their associated t statistics, p-values, confidence intervals, and F statistic are now based on the asymptotic variance-covariance matrix of Equation 5.29. Using a different estimator for the variance-covariance matrix of b can easily flip your results. For example, at a 1% significance level 3.region is statistically significant using the variance-covariance estimator in Equation 2.43, but it is not statistically significant if we use the one in Equation 5.29.
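To see mechanically what Equation 5.29 computes, here is a hedged Mata sketch with hypothetical variable names. Stata’s robust option additionally applies a finite-sample degrees-of-freedom adjustment, so its reported standard errors differ slightly from these.

* Hedged Mata sketch of the White estimator in Equation 5.29
mata:
    y    = st_data(., "y")
    X    = st_data(., ("x1", "x2")), J(st_nobs(), 1, 1)   // append a constant
    b    = invsym(X'*X)*X'*y               // OLS coefficients
    e    = y - X*b                         // OLS residuals
    meat = (X :* (e:^2))' * X              // sum_i e_i^2 x_i x_i'
    V    = invsym(X'*X)*meat*invsym(X'*X)  // the sandwich in Equation 5.29
    sqrt(diagonal(V))                      // White standard errors
end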

Fig. 5.1 Plot of the residuals against income.

5.8 Weighted Least Squares in Stata

The implementation of Weighted Least Squares in Stata is straightforward if we follow the steps detailed before Equation 5.27. A second approach is to use the Weighted Least Squares procedure (wls0) programmed by Philip Ender (from UCLA).² The first step is to install the program on your computer:

net search wls0

Then just follow the instructions. Let’s go over the following example:

use http://www.ats.ucla.edu/stat/stata/ado/analysis/hetdata, clear
quietly regress exp age ownrent income incomesq
rvpplot income, yline(0) scheme(lean1)

We obtain the regression output of expenditures on income, income squared, plus some controls,

      Source |       SS       df       MS              Number of obs =      72
-------------+------------------------------           F(  4,    67) =    5.39
       Model |  1749357.01     4  437339.252           Prob > F      =  0.0008
    Residual |  5432562.03    67  81083.0153           R-squared     =  0.2436
-------------+------------------------------           Adj R-squared =  0.1984
       Total |  7181919.03    71  101153.789           Root MSE      =  284.75

------------------------------------------------------------------------------
         exp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -3.081814   5.514717    -0.56   0.578    -14.08923    7.925606
     ownrent |   27.94091   82.92232     0.34   0.737    -137.5727    193.4546
      income |    234.347   80.36595     2.92   0.005     73.93593    394.7581
    incomesq |  -14.99684   7.469337    -2.01   0.049     -29.9057   -.0879857
       _cons |  -237.1465   199.3517    -1.19   0.238    -635.0541    160.7611
------------------------------------------------------------------------------

² This example was obtained from http://www.ats.ucla.edu/stat/stata/ado/analysis/wls0.htm

Fig. 5.2 Plot of the WLS residuals against the fitted values.

and Figure 5.1, which shows the plot of the residuals against income. This figure is the same as the one presented in Greene (2008), Figure 8.1. There is clear evidence of heteroscedasticity, as the residuals are greater for individuals with higher income. Now, we obtain the WLS estimates using income as the weighting variable:

wls0 exp age ownrent income incomesq, wvar(income) type(abse) noconst graph

The WLS type abse uses the absolute value of the residuals. The plot of the WLS residuals against the fitted values is presented in Figure 5.2, while the regression output is

WLS regression - type: proportional to abs(e)

(sum of wgt is   5.1961e-03)

      Source |       SS       df       MS              Number of obs =      72
-------------+------------------------------           F(  4,    67) =    5.73
       Model |  818838.784     4  204709.696           Prob > F      =  0.0005
    Residual |  2393372.07    67  35721.9713           R-squared     =  0.2549
-------------+------------------------------           Adj R-squared =  0.2104
       Total |  3212210.86    71  45242.4065           Root MSE      =      189

------------------------------------------------------------------------------
         exp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -2.694186   3.807306    -0.71   0.482     -10.2936    4.905229
     ownrent |   60.44878   58.55088     1.03   0.306    -56.41928    177.3168
      income |    158.427   76.39115     2.07   0.042     5.949594    310.9044
    incomesq |  -7.249289   9.724337    -0.75   0.459    -26.65915    12.16057
       _cons |  -114.1089   139.6875    -0.82   0.417    -392.9263    164.7085
------------------------------------------------------------------------------

Figure 5.2 makes it appear as if WLS solved the heteroscedasticity problem. In practice, the graphical approach to assessing heteroscedasticity is not enough, and you should carry out a formal heteroscedasticity test before and after using WLS. Moreover, wls0 has a large number of options, so before using it (as with any built-in command in a statistical package) I recommend reading help wls0. Finally, notice that while the use of White robust standard errors does not change the estimated coefficients, the use of WLS does.
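One hedged way to implement the re-testing advice is to estimate the transformed model explicitly and apply the test to it. In the sketch below (hypothetical variable names, with h obtained as in Equation 5.28), each variable is divided by √ĥi and the rescaled constant sw plays the role of the intercept:

* Hedged sketch: re-testing for heteroscedasticity after WLS
generate sw  = 1/sqrt(h)
generate ys  = sw*y
generate x1s = sw*x1
generate x2s = sw*x2
quietly regress ys x1s x2s sw, noconstant   // explicit WLS transformation
estat hettest                               // re-test on the weighted fit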