Chapter 4

Lecture 3:

In many situations, the Gauss-Markov conditions will not be satisfied. These are:

E[ǫ_i] = 0, i = 1, ..., n,    ǫ ⊥ X,    Var(ǫ) = σ² I_n.

We consider the model

Y = Xβ + ǫ.

Suppose that, conditioned on X, ǫ has

Var(ǫ | X) = σ² Ψ

where Ψ depends on X. Recall that the OLS estimator β̂_{OLS} of β is:

β̂_{OLS} = (X^t X)^{-1} X^t Y = β + (X^t X)^{-1} X^t ǫ.

Therefore, conditioned on X, the covariance matrix of β̂_{OLS} is:

Var(β̂_{OLS} | X) = σ² (X^t X)^{-1} X^t Ψ X (X^t X)^{-1}.

If Ψ ≠ I, then the proof of the Gauss-Markov theorem, that the OLS estimator is BLUE, breaks down; the OLS estimator is unbiased, but no longer best in the sense of minimum variance among linear unbiased estimators.

4.1 Estimation when Ψ is known

If Ψ is known, let P denote a non-singular n × n matrix such that P^t P = Ψ^{-1}. Such a matrix exists, and P Ψ P^t = I. Consequently, E[Pǫ | X] = 0 and Var(Pǫ | X) = σ² I, which does not depend on X. It follows that the Gauss-Markov conditions are satisfied for Pǫ. The entire model may therefore be transformed:

PY = PXβ + Pǫ,

which may be written as:

Y* = X* β + ǫ*.

This model satisfies the Gauss-Markov conditions and the resulting estimator, which is BLUE, is:

β̂_{GLS} = (X*^t X*)^{-1} X*^t Y* = (X^t Ψ^{-1} X)^{-1} X^t Ψ^{-1} Y.

Here GLS stands for Generalised Least Squares. It has covariance:

Var(β̂_{GLS}) = σ² (X*^t X*)^{-1} = σ² (X^t Ψ^{-1} X)^{-1}.

If k = rank(X), so that X is n × k where n > k (here k = p + 1, where p is the number of regressor variables), the unbiased estimator of σ² is:

σ̂² = (1/(n−k)) (Y* − X* β̂_{GLS})^t (Y* − X* β̂_{GLS}) = (1/(n−k)) (Y − X β̂_{GLS})^t Ψ^{-1} (Y − X β̂_{GLS}).
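This transformation is straightforward to carry out in R. The following is a minimal sketch, assuming that a known positive-definite matrix Psi, a response Y and a design matrix X (with the intercept as its first column) are already in the workspace; these names are hypothetical:

> # chol() returns the upper-triangular factor U with t(U) %*% U equal to
> # its argument, so P below satisfies t(P) %*% P = solve(Psi)
> P <- chol(solve(Psi))
> Ystar <- P %*% Y
> Xstar <- P %*% X
> gls_fit <- lm(Ystar ~ Xstar - 1)   # -1: the intercept is already a column of X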
4.2 Heteroskedasticity
Heteroskedasticity refers to the special case where Ψ is diagonal, but the elements are not all equal. The errors are mutually uncorrelated, but the variance may vary between observations. This is frequently encountered in cross-sectional models. For example, suppose yi denotes expenditure on food, while xi is disposable income. Higher income corresponds to higher expenditure on food; variation of food expenditure increases as income increases. A suitable model may be:

Var(ǫ_i | x_i) = σ² exp{α₁ x_i}.

Set

h_i² = (1/σ²) Var(ǫ_i | X_{i.})

where X_{i.} denotes the row-vector of values of the regressor variables for observation i. We assume that, for each i, Var(ǫ_i | X_{i.}) = Var(ǫ_i | X). Under this assumption, Ψ = diag(h₁², ..., h_n²). We also replace the Gauss-Markov assumption of zero expectation with E[ǫ_i | X] = 0. The computation of the BLUE estimator now follows:

β̂_{GLS} = ( Σ_{i=1}^n (1/h_i²) X_{i.}^t X_{i.} )^{-1} Σ_{i=1}^n (1/h_i²) X_{i.}^t Y_i.

This is a Generalised Least Squares (GLS) estimator and is sometimes referred to as weighted least squares. Furthermore, the covariance structure is:

Var(β̂_{GLS}) = σ² ( Σ_{i=1}^n (1/h_i²) X_{i.}^t X_{i.} )^{-1}

and the unbiased estimate of σ² is:

σ̂² = (1/(n−k)) Σ_{i=1}^n (1/h_i²) (Y_i − X_{i.} β̂_{GLS})².

If in addition we make assumptions of normality, then, as before, an F-test can be used to test a number of linear restrictions. Let R be a j × k matrix; we test j linear restrictions on the parameters. Consider H₀ : Rβ = q versus the alternative H₁ : Rβ ≠ q. For example, we could test (simultaneously) β₁ + β₂ + β₄ = 1 and β₅ = 0. This represents two restrictions (j = 2).

Let

ξ = (Rβ̂ − q)^t (R V̂ar(β̂) R^t)^{-1} (Rβ̂ − q).

Under H₀, this has an asymptotic χ² distribution with j degrees of freedom. This test is usually referred to as the Wald test.
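The statistic ξ can be computed directly from a fitted model in R. In this sketch, the fitted model fit, the j × k restriction matrix Rmat and the j-vector q are assumed to exist (hypothetical names); vcov() supplies the estimated covariance matrix of the coefficients:

> b <- coef(fit)
> V <- vcov(fit)                        # estimated covariance of the coefficients
> d <- Rmat %*% b - q
> xi <- t(d) %*% solve(Rmat %*% V %*% t(Rmat)) %*% d
> pchisq(as.numeric(xi), df = nrow(Rmat), lower.tail = FALSE)  # asymptotic p-value

The linearHypothesis() function in the car package automates this computation.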

4.3 Heteroskedasticity: Unknown Form

Following from the formula for the covariance matrix of β̂, a consistent estimator of the k × k matrix

Σ = (1/n) X^t diag(σ_i²) X = (1/n) Σ_{i=1}^n σ_i² X_{i.}^t X_{i.}

is needed. It turns out that (proof omitted), under very general conditions,

S ≡ (1/n) Σ_{i=1}^n R_i² X_{i.}^t X_{i.}

is a consistent estimator of Σ, where R_i is the i-th OLS residual. Therefore:

V̂ar(β̂_{OLS}) = (X^t X)^{-1} ( Σ_{i=1}^n R_i² X_{i.}^t X_{i.} ) (X^t X)^{-1}

can be used as an estimate of the true variance of the OLS estimator. Hence inference can be made using β̂_{OLS} without specifying the form of the heteroskedasticity.

The square roots of the diagonal elements of V̂ar(β̂_{OLS}) are usually referred to as heteroskedasticity-consistent standard errors.
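In R, one way to obtain such standard errors is via the sandwich package, which implements this estimator (among several refinements); assuming a fitted model fit:

> library(sandwich)
> library(lmtest)
> coeftest(fit, vcov = vcovHC(fit, type = "HC0"))  # HC0 is White's original estimator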

4.3.1 Multiplicative Heteroskedasticity

A common form of heteroskedasticity employed in practice is multiplicative heteroskedasticity. The error variance is assumed to be related to a number of exogenous variables, gathered in a j-vector z_i for observation i. It is assumed that:

Var(ǫ_i | X_{i.}) = σ_i² = σ² exp{ Σ_{k=1}^j α_k z_{ik} }.

To compute the EGLS (Estimated Generalised Least Squares) estimator, we need consistent estimators of α₁, ..., α_j. We assume that

log R_i² = const + Σ_{k=1}^j z_{ik} α_k + v_i

where v_i is, asymptotically, homoskedastic. One first obtains (R_i)_{i=1}^n, the residuals from OLS. Next, regress (log R_i²)_{i=1}^n against z_{i.} and a constant. The resulting estimators α̂ of α are consistent. From this, ĥ_i² = exp{z_{i.} α̂} may be computed.

Now run OLS on the transformed model:

y_i / ĥ_i = (X_{i.} / ĥ_i) β + ǫ_i / ĥ_i

This yields the EGLS estimator β̂_{EGLS} of β.

The scalar σ² can be estimated by:

σ̂² = (1/(n−k)) Σ_{i=1}^n (y_i − X_{i.} β̂_{EGLS})² / ĥ_i²

and the estimated covariance matrix of β̂_{EGLS} is given by:

V̂ar(β̂_{EGLS}) = σ̂² ( Σ_{i=1}^n (1/ĥ_i²) X_{i.}^t X_{i.} )^{-1}.
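The two-step procedure is easy to sketch in R. The names below (response y, regressors x1 and x2, variance variables z1 and z2, all in a data frame d) are hypothetical; Tutorial 3, Exercise 1 works through the same idea on real data:

> ols <- lm(y ~ x1 + x2, data = d)
> aux <- lm(log(residuals(ols)^2) ~ z1 + z2, data = d)  # estimates the alpha's
> h2 <- exp(fitted(aux))                                # estimated h_i^2
> egls <- lm(y ~ x1 + x2, data = d, weights = 1/h2)     # lm() minimises sum(w_i * r_i^2)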
4.4 Testing for Heteroskedasticity
There are several tests available for heteroskedasticity.

Two different populations. Suppose the sample variance of group A, based on n₁ observations, is s₁² and the sample variance of group B, based on n₂ observations, is s₂². Suppose that the model is Y = X₁β₁ + ǫ for population 1 and Y = X₂β₂ + ǫ for population 2, where X₁ is n₁ × k₁ and X₂ is n₂ × k₂, n₁ > k₁ and n₂ > k₂, rank(X₁) = k₁ and rank(X₂) = k₂. Then, under the null hypothesis H₀ : σ₁² = σ₂², where σ₁² and σ₂² are the variances for the models for populations 1 and 2 respectively,

s₁² / s₂² ∼ F_{n₁−k₁, n₂−k₂}.
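A sketch of this test in R, assuming the two groups are held in data frames dA and dB (hypothetical names) with the same model formula:

> fitA <- lm(y ~ x1 + x2, data = dA)
> fitB <- lm(y ~ x1 + x2, data = dB)
> s2A <- sum(residuals(fitA)^2) / df.residual(fitA)  # s_1^2, df = n1 - k1
> s2B <- sum(residuals(fitB)^2) / df.residual(fitB)  # s_2^2, df = n2 - k2
> Fstat <- s2A / s2B
> # one-sided p-value; for a two-sided test put the larger variance on top
> pf(Fstat, df.residual(fitA), df.residual(fitB), lower.tail = FALSE)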

Testing for Multiplicative Heteroskedasticity. Following the construction above, with a regression of log R_i² against const + α₁ z_{i1} + ... + α_k z_{ik}, the hypothesis H₀ : α₁ = ... = α_k = 0 can be tested in the same way as for least squares regression.

The Breusch-Pagan Test. The Breusch–Pagan test, developed in 1979 by Trevor Breusch and Adrian Pagan, is used to test for heteroskedasticity in a regression model. It was independently suggested, with some extension, by R. Dennis Cook and Sanford Weisberg in 1983. It tests whether the variance of the errors from a regression depends on the values of the regressor variables; in that case, heteroskedasticity is present. Suppose that we estimate the regression model

Y = Xβ + ǫ

and obtain from this fitted model a set of values for R, the residuals. OLS constrains these so that their mean is 0 and so, given the assumption that their variance does not depend on the regressor variables, an estimate of this variance can be obtained from the average of the squared values of the residuals. If this assumption does not hold, a simple model might be that the variance is linearly related to the independent variables. Such a model can be examined by regressing the squared residuals on the independent variables, using an auxiliary regression equation of the form

R² = Zγ + v

where Z denotes the n × (k+1) matrix whose i-th row is (1, z_{i1}, ..., z_{ik}) and γ = (γ₀, γ₁, ..., γ_k)^t. This is the basis of the Breusch–Pagan test. It is a chi-squared test: under H₀, the test statistic nΞ, where Ξ is the coefficient of determination of the auxiliary regression, has an asymptotic χ² distribution with k degrees of freedom. If the test statistic has a p-value below an appropriate threshold (e.g. p < 0.05) then the null hypothesis of homoskedasticity is rejected and heteroskedasticity assumed.

Procedure. The Breusch–Pagan test is based on models of the type σ_i² = h(z_{i.} γ), where z_{i.} = (1, z_{i1}, ..., z_{ik}) are the explanatory variables for observation i. The null hypothesis is equivalent to the restriction H₀ : γ₁ = ... = γ_k = 0. The steps are:

• Step 1: Apply OLS to the model Y = Xβ + ǫ.

• Step 2: Perform the auxiliary regression:

R_i² = γ₀ + γ₁ z_{i1} + ... + γ_k z_{ik} + η_i

• Step 3: Compute the coefficient of determination of the auxiliary regression, Ξ = 1 − Q_{res}/Q_{total}. Under H₀, asymptotically, nΞ ∼ χ²_k.
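A minimal version of these steps in R, assuming a fitted model fit and variance variables z1, z2 in a data frame d (hypothetical names; here k = 2):

> aux <- lm(residuals(fit)^2 ~ z1 + z2, data = d)   # Step 2
> stat <- nrow(d) * summary(aux)$r.squared          # Step 3: n * Xi
> pchisq(stat, df = 2, lower.tail = FALSE)          # asymptotic p-value

The bptest() function in the lmtest package, used in Tutorial 3 below, performs (a studentised version of) the same test.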

4.5 Example

This section gives an example where the following model is appropriate:

E[ǫ_i² | x_i, z_i] = g(z_i^t α)

where z_i is an l-vector of observations on exogenous or predetermined variables which affect the variance and α is an l-vector of parameters.

Consider journal prices (the Journals data set found in the AER package). The variable subs denotes the number of library subscriptions. Let y_i = log(subs_i), the logarithm of the number of subscriptions for journal i. Let z_i denote the price per citation, which we call citeprice. This is given by price (the price of a library subscription) divided by citations (the number of citations). When regressing log(subs) against log(citeprice), the variance appears to be of the form E[ǫ_i² | z_i] = σ² z_i². Let x_i = log(z_i). The minimisation problem becomes: find β₁ and β₂ which minimise

Σ_{i=1}^n (1/z_i²) (y_i − β₁ − β₂ x_i)².

This is accomplished easily in R using:

> jour_wls1 <- lm(log(subs)~log(citeprice), data=journals, weights = 1/citeprice^2)

Often, we are not sure which form the skedastic function takes, and this leads to Feasible Generalised Least Squares (FGLS). For example, we could start by regressing the logarithm of the squared residuals from the OLS (ordinary least squares) regression on the logarithm of citeprice and a constant. In the second step, we use the fitted values of this auxiliary regression to form the weights in the model of interest. For example, we could start with variances of the form:

E[ǫ_i² | x_i] = σ² x_i^{γ₂} = exp{γ₁ + γ₂ log x_i}

where γ₁ and γ₂ may be estimated from the data by means of an auxiliary regression.

There is a substantial example of this as a tutorial exercise: first, start with a 'naive' ordinary least squares regression. Then iteratively compute γ₁ and γ₂ based on the residuals from the previous regression until a 'fixed point' is reached. In the example, the values of γ₁ and γ₂ for the final model are substantially different from those produced by the initial OLS regression.

4.6 Autocorrelation

We now consider another case where Var(ǫ) ≠ σ² I, in which Cov(ǫ_s, ǫ_t) = Γ(s,t) for some symmetric non-negative definite function Γ. One case, considered in detail in the course, is where

Y = Xβ + ǫ and the noise vector ǫ satisfies:

ǫ_t = ρ ǫ_{t−1} + v_t,    v ∼ N(0, σ² I).

We assume that all ǫ_j have the same variance. When |ρ| < 1, a process {ǫ_t : t ∈ Z₊} which satisfies this is stationary and causal (see the Time Series course). Here

Var(ǫ_t) = ρ² Var(ǫ_{t−1}) + σ².

Set γ(0) = Var(ǫ_t) and γ(h) = Cov(ǫ_t, ǫ_{t+h}); then

γ(0) = σ² / (1 − ρ²).

Also,

γ(h) = ρ γ(h−1),    h ≥ 1,

so that

γ(h) = (σ² / (1 − ρ²)) ρ^{|h|}.

Since ǫ_t − ρ ǫ_{t−1} = v_t, where {v_t} ∼ IID(0, σ²), we can transform the regression problem using

Y_t* = Y_t − ρ Y_{t−1},    X_{t.}* = X_{t.} − ρ X_{t−1,.}

Y* = X* β + v.

This is now in the standard OLS framework.

There are clearly problems with the first observation. This may be solved in the following way: assume ǫ₁ ⊥ v₂, v₃, .... Now, since Var(ǫ₁) = (1/(1−ρ²)) Var(v₂), use:

Y₁* = √(1 − ρ²) Y₁,    X_{1.}* = √(1 − ρ²) X_{1.}

and then the problem is within the appropriate OLS framework.
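For known ρ, the transformation (including the first-observation adjustment, sometimes called the Prais-Winsten transformation) can be sketched in R as follows, assuming vectors y and x and a value rho are already defined (hypothetical names):

> n <- length(y)   # n = T observations
> ystar <- c(sqrt(1 - rho^2) * y[1], y[-1] - rho * y[-n])
> xstar <- c(sqrt(1 - rho^2) * x[1], x[-1] - rho * x[-n])
> ones  <- c(sqrt(1 - rho^2), rep(1 - rho, n - 1))  # transformed intercept column
> fit_tr <- lm(ystar ~ 0 + ones + xstar)            # 0: intercept supplied by `ones`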

4.6.1 Parameter estimation: ρ unknown

Now suppose that we want to model the observations with AR(1) errors, but ρ is unknown. We may start with an OLS regression of Y against X and store the OLS residuals R. The resulting naive estimator for ρ is

ρ̂ = ( Σ_{j=2}^T R_j R_{j−1} ) / ( Σ_{j=2}^T R_{j−1}² )    (4.1)

This estimator is clearly biased if ρ ≠ 0. Furthermore, the estimator β̂ obtained by first estimating ρ in this way and then using the plug-in estimator is no longer BLUE.

Clearly, the procedure should be iterated: first obtain ρ̂ as above, then use this to obtain β̂, then iterate, updating the estimates ρ̂ and β̂ until there is convergence.
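A sketch of this iteration in R, for a single regressor x and response y (hypothetical names); each pass re-estimates ρ from the residuals of the original model and re-fits β on the transformed data:

> beta <- coef(lm(y ~ x))   # naive OLS start
> n <- length(y); rho <- 0
> repeat {
+   R <- y - beta[1] - beta[2] * x                  # residuals of the original model
+   rho_new <- sum(R[-1] * R[-n]) / sum(R[-n]^2)    # equation (4.1)
+   fit <- lm(I(y[-1] - rho_new * y[-n]) ~ I(x[-1] - rho_new * x[-n]))
+   beta <- c(coef(fit)[1] / (1 - rho_new), coef(fit)[2])  # undo (1 - rho) scaling
+   if (abs(rho_new - rho) < 1e-7) break
+   rho <- rho_new
+ }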
4.6.2 Test for first order autocorrelation
It turns out that, under the null hypothesis H₀ : ρ = 0, √T ρ̂ is asymptotically N(0, 1). When T is not large, a popular test is the Durbin-Watson test. The test statistic is:

DW = ( Σ_{t=2}^T (R_t − R_{t−1})² ) / ( Σ_{t=1}^T R_t² ).

Clearly

DW ≃ 2 − 2ρ̂.

A value of DW close to 2 indicates that ρ̂ ≃ 0.

The distribution of the test statistic depends both on T and on p, the number of variables in

X_{i.}. These distributions have been computed and are available in standard statistical software.
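In R, the Durbin-Watson test is available as dwtest() in the lmtest package; it accepts a fitted "lm" object (here fit is hypothetical):

> library(lmtest)
> dwtest(fit)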

4.6.3 ARMA models

There are other error patterns which have a non-trivial autocorrelation structure. The most common model is the ARMA (autoregressive moving average) model, where

ǫ_t − Σ_{j=1}^p φ_j ǫ_{t−j} = v_t + Σ_{j=1}^q θ_j v_{t−j},    {v_t} ∼ WN(0, σ²).

4.6.4 Dealing with Autocorrelation

The computation showing how to transform an AR(1) model into a model where OLS can be used is instructive. Consider the situation where we have a static model:

y_t = x_t′ β + ǫ_t

where ǫ_t = ρ ǫ_{t−1} + v_t. Here:

y_t = x_t′ β + ρ y_{t−1} − ρ x_{t−1}′ β + v_t

which has WN(0, σ²) errors. Adding lagged dependent variables and lagged exogenous variables may be the best way to proceed.

4.6.5 Heteroscedasticity-and-autocorrelation-consistent Standard Errors for OLS

Consider the model

Y = Xβ + ǫ

Let

S* = (1/T) Σ_{t=1}^T R_t² X_{t.} X_{t.}′ + (1/T) Σ_{j=1}^{H−1} w_j Σ_{s=j+1}^T R_s R_{s−j} (X_{s.} X_{s−j,.}′ + X_{s−j,.} X_{s.}′).

We are considering a situation where the autocorrelation is 0 after a lag H. Bartlett suggests using the weights w_j = 1 − j/H for |j| ≤ H. The covariance of β̂_{OLS} is then estimated as:

V̂ar(β̂_{OLS}) = (X^t X)^{-1} T S* (X^t X)^{-1}.
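In R, standard errors of this type are provided by the NeweyWest() function in the sandwich package, which uses Bartlett weights; assuming a fitted model fit, and with the lag argument (related to H above) chosen purely for illustration:

> library(sandwich)
> library(lmtest)
> coeftest(fit, vcov = NeweyWest(fit, lag = 4, prewhite = FALSE))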


Tutorial 3: 2020-03-10

Exercise 1: Weighted Least Squares

We now try an auxiliary regression to estimate the weights for a weighted least squares regression. Consider the journal price data set journals found in AER. We estimate γ₁ and γ₂ for:

E[ǫ_i² | z_i] = exp{γ₁ + γ₂ log z_i}

where z_i is the citeprice variable described in the lecture. First, fit an ordinary least squares regression of log(subs) against log(citeprice). We take a subset of the Journals data set, containing only subs and price, and add a column with the price/citations ratio:

> journals <- Journals[,c("subs","price")]
> journals$citeprice <- Journals$price/Journals$citations

and now fit the linear regression:

> jour_lm <- lm(log(subs)~log(citeprice),data=journals)

Now try the 'auxiliary regression' of the log of the squared residuals against log(citeprice):

> auxreg <- lm(log(residuals(jour_lm)^2)~log(citeprice),data=journals)

and now do a weighted least squares regression using the model above:

> jour_fgls1 <- lm(log(subs)~log(citeprice),weights=1/exp(fitted(auxreg)),data=journals)

It is possible to iterate this: having done a weighted least squares regression, we now use this as the basis for computing the weights. First, set gamma2 <- 0 and gamma2i <- 1 as an initialisation. Then:

> while(abs((gamma2i - gamma2)/gamma2) > 1e-7){
+   gamma2 <- gamma2i
+   fglsi <- lm(log(subs)~log(citeprice),data=journals,weights=1/citeprice^gamma2)
+   gamma2i <- coef(lm(log(residuals(fglsi)^2)~
+     log(citeprice),data=journals))[2]
+ }

Now try

> jour_fgls2 <- lm(log(subs)~log(citeprice),data=journals,
+   weights=1/citeprice^gamma2)

The loop specifies that, as long as the relative change of the slope coefficient exceeds 10^{-7} in absolute value, the iteration continues. What parameter value is used for γ₂ in the final model? What is the parameter value in the first model? Find the coefficients for the final regression model using:

> coef(jour_fgls2)

Exercise 2

The data for this exercise is found in the airq.dat file in the course directory.

> www<-"https://www.mimuw.edu.pl/~noble/courses/Econometrics/data/airq.dat" > air <- read.table(www,header=T)

The data set contains observations for 30 standard metropolitan statistical areas (SMSAs) in California for 1972 on the following variables:

• airq: indicator for air quality (the lower the better)
• vala: value added of companies (in 1000 US$)
• rain: amount of rain (in inches)
• coas: dummy variable, 1 for SMSAs at the coast, 0 for others
• dens: population density (per square mile)
• medi: average income per head (in US$)

1. Estimate a linear regression model that explains airq from the other variables using ordinary least squares. Interpret the coefficient estimates.

2. Test the null hypothesis that average income is not associated with air quality. Test the joint hypothesis that none of the variables are associated with air quality.

3. Test whether the variance of the error term is different for coastal and non-coastal areas, using an F test. In view of the outcome of the test, comment upon the validity of the results of the previous part of the exercise.

4. Perform a Breusch-Pagan test for heteroskedasticity related to all five explanatory variables. You may find the package lmtest useful.

> install.packages("lmtest")

The appropriate function is bptest

bptest(formula, varformula = NULL, studentize = TRUE, data = list())

where the arguments are as follows:

• formula: a symbolic description for the model to be tested (or a fitted "lm" object).
• varformula: a formula describing only the potential explanatory variables for the variance (no dependent variable needed). By default the same explanatory variables are taken as in the main regression model.
• studentize: logical. If set to TRUE, Koenker's studentized version of the test statistic is used.
• data: an optional data frame containing the variables in the model. By default the variables are taken from the environment from which bptest is called.

5. Assuming that we have multiplicative heteroskedasticity related to coas and medi, estimate the coefficients by running a regression of log R_i², where R_i are the residuals from the OLS regression, on these two variables. Test the null hypothesis of homoskedasticity on the basis of this auxiliary regression.

6. Using these results, compute an EGLS (estimated generalised least squares) estimator for the model parameters. Compare your results with those obtained in the first part. Do the hypothesis tests of the second part based on this model.
