
Heteroskedasticity: Cont.

Last time we saw how to construct the heteroskedasticity robust t-test, i.e.

$$t = \frac{n^{1/2}\left(\hat{\beta}_i - \beta_i\right)}{se_{White}\left(n^{1/2}\hat{\beta}_i\right)}$$

where $se_{White}$ denotes the White (heteroskedasticity robust) standard error; a sketch of its computation follows.
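For concreteness, here is a minimal numpy sketch of White standard errors and the implied robust t-statistic. The function name and data layout are assumptions for illustration, not code from the course.

```python
import numpy as np

def white_se(X, y):
    """OLS estimates with White (heteroskedasticity robust) standard errors.

    X: (n, k) design matrix including the intercept column; y: (n,) response.
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    u = y - X @ beta                       # OLS residuals
    meat = X.T @ (u[:, None] ** 2 * X)     # sum_i u_i^2 x_i x_i'
    V = XtX_inv @ meat @ XtX_inv           # White covariance estimator
    return beta, np.sqrt(np.diag(V))

# Robust t-statistic for H0: beta_i = b0 (the n^{1/2} factors cancel):
# beta, se = white_se(X, y); t = (beta[i] - b0) / se[i]
```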

We now see how to construct heteroskedasticity robust Wald and LM tests. That is, we construct tests for multiple linear restrictions which are asymptotically valid even in the presence of conditional heteroskedasticity.

Heteroskedasticity robust Wald tests.

$$H_0: R\beta = r \quad \text{vs} \quad H_1: R\beta \neq r$$

where $R$ is $q \times k$, with $q$ being the number of restrictions. The usual Wald test is constructed as

$$W = \left(R\hat{\beta} - r\right)'\left[\hat{\sigma}_u^2 R(X'X)^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right)$$

Recall that when performing a Wald test we estimate only the unrestricted model. If $E(uu'|X) \neq \sigma_u^2 I_n$, then the usual Wald test no longer has a $\chi^2(q)$ limiting distribution under the null. This is because we are using an estimator of the variance which is not consistent for the true variance. However, we can obtain a robust Wald test if we use a heteroskedasticity robust covariance matrix. How to implement this? We need to replace $\left(\hat{\sigma}_u^2 R(X'X)^{-1}R'\right)^{-1}$ with

$$\left(R(X'X)^{-1}\left(X'\hat{u}\hat{u}'X\right)(X'X)^{-1}R'\right)^{-1},$$

where $X'\hat{u}\hat{u}'X$ is shorthand for $\sum_{i=1}^n \hat{u}_i^2 x_i x_i'$, the middle matrix of the White covariance estimator. Thus, the heteroskedasticity robust Wald test is given by:

$$W_R = \left(R\hat{\beta} - r\right)'\left[R(X'X)^{-1}\left(X'\hat{u}\hat{u}'X\right)(X'X)^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right)$$

Under the null, $W_R$ is asymptotically $\chi^2(q)$. Thus, we can simply compare the value we obtain for $W_R$ with the 90% or 95% critical values of a $\chi^2(q)$.

F-tests

Recall that an easy way of performing tests for linear restrictions is the use of the F test, where
$$F = \frac{(SSR_r - SSR_u)/q}{SSR_u/(n-k)},$$
where $SSR_u$ and $SSR_r$ are respectively the sums of squared errors from the unrestricted and the restricted models. We have seen that in the case of iid normal errors and conditional homoskedasticity, under the null, $F \sim F(q, n-k)$.

If we drop the assumption of normal errors (but keep all the others), it is no longer true that under the null $F \sim F(q, n-k)$; however, it is true that under the null $qF \stackrel{d}{\to} \chi^2(q)$. Thus, we can still perform tests for linear restrictions based on the simple F statistic. However, if the assumption of conditional homoskedasticity is violated, then it is no longer true that under the null $qF \stackrel{d}{\to} \chi^2(q)$; this is because the F statistic implicitly uses the usual variance estimator (i.e. the same one as the usual, non robust Wald test). Thus inference based on the chi-square with $q$ degrees of freedom is not valid even in large samples. For example, the probability of type I error we commit by rejecting the null when we get a value larger than the 95% critical value of a chi-square with $q$ degrees of freedom is not 5%; it may be larger or smaller than 0.05. There is no natural way of making a heteroskedasticity robust F-test.
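Returning to the robust Wald statistic $W_R$ above, a minimal numpy sketch (function names hypothetical; X is assumed to contain the intercept column):

```python
import numpy as np
from scipy import stats

def robust_wald(X, y, R, r):
    """Heteroskedasticity robust Wald test of H0: R beta = r.

    Under H0 the statistic is asymptotically chi2(q), q = number of rows of R.
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    u = y - X @ beta
    meat = X.T @ (u[:, None] ** 2 * X)      # sum_i u_i^2 x_i x_i'
    V = XtX_inv @ meat @ XtX_inv            # robust covariance of beta-hat
    d = R @ beta - r
    W = d @ np.linalg.inv(R @ V @ R.T) @ d
    return W, stats.chi2.sf(W, df=R.shape[0])
```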

Heteroskedasticity Robust LM tests.

Unrestricted model: $y_i = \beta_1 + \beta_2 x_{2,i} + \beta_3 x_{3,i} + \beta_4 x_{4,i} + \beta_5 x_{5,i} + u_i$
Restricted model: $y_i = \beta_1 + \beta_2 x_{2,i} + \beta_3 x_{3,i} + u_i$
We estimate ONLY the restricted model and take the residuals $\hat{u}_{i,r}$. Now, regress $\hat{u}_{i,r}$ on all regressors (or only on the excluded ones), and take the $R^2$ from this regression. If conditional homoskedasticity holds, then under the null, $nR^2$ is asymptotically distributed as $\chi^2(2)$ (we have two restrictions!!). Unfortunately, in the presence of conditional heteroskedasticity, the $nR^2$ test no longer provides valid inference (e.g. it is no longer asymptotically distributed as $\chi^2(2)$ when the restrictions hold). A robust version of the LM test can be constructed, as sketched after the steps below.
Step 1: As before, compute the residuals from the restricted model, $\hat{u}_{i,r}$.
Step 2: Regress $x_{4,i}$ on $x_{2,i}, x_{3,i}$ and take the residuals $\hat{u}_{4,i}$. Also, regress $x_{5,i}$ on $x_{2,i}$ and $x_{3,i}$ and take the residuals $\hat{u}_{5,i}$. In general, each of the excluded regressors is regressed on the included ones and the residuals are taken.
Step 3: Construct the products $\hat{u}_{i,r}\hat{u}_{4,i}$ and $\hat{u}_{i,r}\hat{u}_{5,i}$. In general, we take the products of the residuals from the restricted model and the residuals from the regressions of the excluded variables on all the included variables; there are as many products as the number of excluded variables.
Step 4: Regress 1 on $\hat{u}_{i,r}\hat{u}_{4,i}$ and $\hat{u}_{i,r}\hat{u}_{5,i}$ (yes, the dependent variable is a vector of all ones!). Construct $n - SSR_1$, where $SSR_1$ is the sum of squared residuals from this last regression. Under the null, $n - SSR_1$ is asymptotically distributed as $\chi^2(2)$ (as here we have two restrictions!).
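A minimal numpy sketch of Steps 1-4 (names hypothetical; X_incl must contain the intercept):

```python
import numpy as np
from scipy import stats

def ols_resid(X, y):
    """Residuals from an OLS regression of y on X."""
    return y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

def robust_lm(X_incl, X_excl, y):
    """Heteroskedasticity robust LM test for the exclusion of X_excl."""
    n, q = X_excl.shape
    u_r = ols_resid(X_incl, y)              # Step 1: restricted residuals
    r = np.column_stack([ols_resid(X_incl, X_excl[:, j]) for j in range(q)])  # Step 2
    prods = u_r[:, None] * r                # Step 3: products of residuals
    ssr1 = (ols_resid(prods, np.ones(n)) ** 2).sum()  # Step 4: regress 1 on products
    LM = n - ssr1                           # n - SSR_1, asy. chi2(q) under H0
    return LM, stats.chi2.sf(LM, df=q)
```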

Tests for the Null of (Conditional) Homoskedasticity

We want to test:

$$H_0: E(u_i^2|X) = \sigma_u^2 \quad \text{versus} \quad H_A: E(u_i^2|X) \text{ is a function of } X$$

Thus we want a test for the null of conditional homoskedasticity versus the alternative of a conditional variance which is a function of the regressors.

(i) Breusch and Pagan test
Let $\hat{u}_i$ be the residuals from the regression of $y_i$ on a constant, $x_{2,i},\dots,x_{k,i}$. Regress $\hat{u}_i^2$ on a constant, $x_{2,i},\dots,x_{k,i}$, and take the $R^2$ from this regression. Under the null, $nR^2$ is asymptotically distributed as $\chi^2(k-1)$. Note that the degrees of freedom are given by the number of variables (in addition to the intercept) on which we have regressed $\hat{u}_i^2$.
The logic is simple: under the null, $E(u_i^2|X)$ is constant, and thus $u_i^2$ is uncorrelated with any function of the $X$. As we do not know $u_i^2$, we replace it with $\hat{u}_i^2$, and make use of the fact that, as the OLS estimators for $\beta$ are consistent, tests based on $\hat{u}_i^2$ are asymptotically equivalent to tests based on $u_i^2$.

(ii) White Test
Very similar to Breusch and Pagan. Though, instead of regressing $\hat{u}_i^2$ on a constant, $x_{2,i},\dots,x_{k,i}$, we regress $\hat{u}_i^2$ on a constant, $x_{2,i},\dots,x_{k,i}$, $x_{2,i}^2,\dots,x_{k,i}^2$, $x_{2,i}x_{3,i},\dots,x_{2,i}x_{k,i},\dots,x_{k-1,i}x_{k,i}$. In other words, we regress the squared residuals on all the $x$, on all their squares and on all the possible cross products among the $x$, and take the $R^2$. Under the null, $nR^2$ is a $\chi^2(q)$, where $q = 2(k-1) + (k-1)(k-2)/2$; i.e. $q$ is the number of variables (in addition to the intercept) on which we regress $\hat{u}_i^2$. The logic underlying White's test is that it may be possible that $u_i^2$ is not correlated with $x_i$, but is instead correlated with $x_i^2$ or some nonlinear function of the $x$. Basically, the White test has power against a large spectrum of alternatives, though it has less power against the alternative that $u_i^2$ is a linear function of the $x$.

(iii) Modified White test
Regress $\hat{u}_i^2$ on $\hat{y}_i$ and $\hat{y}_i^2$, where $\hat{y}_i = \hat{\beta}_1 + \sum_{j=2}^k \hat{\beta}_j x_{j,i}$. Take the $R^2$. Under the null of conditional homoskedasticity, $nR^2$ is asymptotically distributed as $\chi^2(2)$.

Important: the tests for conditional heteroskedasticity outlined above have a well defined chi-squared limiting distribution, and so can be used to perform valid (large sample) inference, under the assumption of conditional homokurtosis, i.e. if $E(u_i^4|X_i) = \mu_4$, where $\mu_4$ is a constant. In principle, one can construct tests for conditional homoskedasticity robust to conditional heterokurtosis etc. ... it gets too complicated!

Example
$$\widehat{price} = -21.77_{(29.5)} + 0.0021_{(0.00064)}\, lotsize + 0.123_{(0.013)}\, sqrft + 13.85_{(9.01)}\, bdrms$$
$n = 88$, $R^2 = 0.67$. "Usual" standard errors in brackets. Note the negative intercept (bad!). We do not know whether there is conditional heteroskedasticity or not... Do a Breusch-Pagan test. Hint: if we reject with the BP test, there is no need to perform a White test; if we fail to reject with BP, it is better to also perform the White and modified White tests.
Regress $\hat{u}_i^2$ on a constant, $lotsize$, $sqrft$, $bdrms$; we get $R^2 = 0.16$. Thus, $nR^2 = 88 \times 0.16 = 14.9$. A sample of 88 is not that large, so a little bit of caution is needed in applying asymptotic results. We know that under the null of conditional homoskedasticity, $nR^2 \stackrel{d}{\to} \chi^2(3)$. The 95% critical value of a $\chi^2(3)$ is 7.8; thus we reject the null at 5%. Also, note that $\Pr\left(\chi^2(3) > 14.9\right) = 0.0028$: a very tiny P-value, so we can reject at any level! A sketch of the BP regression follows.
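A minimal numpy sketch of the Breusch-Pagan test just described (names hypothetical; X includes the intercept):

```python
import numpy as np
from scipy import stats

def breusch_pagan(X, y):
    """Breusch-Pagan test: regress squared OLS residuals on the regressors.

    Under the null of conditional homoskedasticity, n*R^2 is asy. chi2(k-1).
    """
    n, k = X.shape
    u2 = (y - X @ np.linalg.lstsq(X, y, rcond=None)[0]) ** 2
    e = u2 - X @ np.linalg.lstsq(X, u2, rcond=None)[0]   # second-stage residuals
    r2 = 1 - (e @ e) / ((u2 - u2.mean()) @ (u2 - u2.mean()))
    return n * r2, stats.chi2.sf(n * r2, df=k - 1)
```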

Very often, by using a log model instead of a linear model, we "reduce" heteroskedasticity. Try...

$$\widehat{\log(price)} = 5.61_{(0.65)} + 0.168_{(0.038)}\log(lotsize) + 0.7_{(0.093)}\log(sqrft) + 0.037_{(0.028)}\log(bdrms)$$

$n = 88$, $R^2 = 0.643$. Take the squared residuals from the log regression, $\hat{u}_i^2$, and regress them on a constant, $\log(lotsize)$, $\log(sqrft)$, $\log(bdrms)$; we get an $R^2 = 0.048$. Now, $nR^2 = 88 \times 0.048 = 4.22$. Since $4.22 < 7.8$, we cannot reject at 5%. Look at the P-value: $\Pr\left(\chi^2(3) > 4.22\right) = 0.239$; we can only reject at significance levels above 24%!
As we do not reject, we also try the modified White test. By regressing $\hat{u}_i^2$ on a constant, $\widehat{\log(price)}$ and $\widehat{\log(price)}^2$ we get an $R^2 = 0.0392$, smaller than above. Thus, we do not reject. Incidentally, it is quite common that the level model displays conditional heteroskedasticity while the logged model does not!
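Analogously, a sketch of the modified White test, regressing the squared residuals on the fitted values and their squares (names hypothetical):

```python
import numpy as np
from scipy import stats

def modified_white(X, y):
    """Modified White test: regress squared residuals on yhat and yhat^2.

    Under the null of conditional homoskedasticity, n*R^2 is asy. chi2(2).
    """
    n = len(y)
    yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    u2 = (y - yhat) ** 2
    Z = np.column_stack([np.ones(n), yhat, yhat ** 2])
    e = u2 - Z @ np.linalg.lstsq(Z, u2, rcond=None)[0]
    r2 = 1 - (e @ e) / ((u2 - u2.mean()) @ (u2 - u2.mean()))
    return n * r2, stats.chi2.sf(n * r2, df=2)
```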

Weighted Least Squares

We have seen how to construct t-tests, Wald tests, and LM tests which are robust to conditional heteroskedasticity. On the other hand, it is better to forget the F-test, as we cannot robustify it. Also, we have seen how to test for the null of conditional homoskedasticity (i.e. NO conditional heteroskedasticity). In practice, first we test for conditional homoskedasticity. If we fail to reject, good... we can proceed using OLS and perform t, F, and Wald tests. If we reject, we can still perform valid inference using the robust tests above; still, it remains the issue that our estimates are no longer efficient. For the time being, suppose we know the "true" conditional variance. That is, suppose that

$$E(u_i^2|X) = E(u_i^2|x_{2,i},\dots,x_{k,i}) = \sigma_u^2\, h(x_{2,i},\dots,x_{k,i})$$

and suppose that we know the functional form of $h$. Note that $h$ is a mapping from $R^{k-1}$ to $R^+$.
Define $y_i^* = y_i/h^{1/2}(x_{2,i},\dots,x_{k,i})$ and $x_{j,i}^* = x_{j,i}/h^{1/2}(x_{2,i},\dots,x_{k,i})$ for $j = 2,\dots,k$. Consider

$$y^* = X^*\beta + u^*$$

where $y^* = (y_1^*,\dots,y_n^*)'$ and $X^*$ is an $n \times k$ matrix with generic element $x_{j,i}^*$ for $j = 2,\dots,k$ and $i = 1,\dots,n$, with $x_{1,i}^* = 1/h^{1/2}(x_{2,i},\dots,x_{k,i})$, and $u_i^* = u_i/h^{1/2}(x_{2,i},\dots,x_{k,i})$.
This is simple... consider

$$y_i = \beta_1 + \beta_2 x_{2,i} + u_i$$

where $E(u_i^2|x_{2,i}) = \sigma_u^2 h(x_{2,i})$. Divide both the left and right hand sides by $h^{1/2}(x_{2,i})$; we have:

$$y_i^* = \beta_1 x_{1,i}^* + \beta_2 x_{2,i}^* + u_i^*$$

where
$$E(u_i^{*2}|x_{2,i}) = E\left(\frac{u_i^2}{h(x_{2,i})}\,\Big|\, x_{2,i}\right) = \frac{1}{h(x_{2,i})}E\left(u_i^2|x_{2,i}\right) = \frac{1}{h(x_{2,i})}h(x_{2,i})\sigma_u^2 = \sigma_u^2.$$
Thus, we are back to the case of conditional homoskedasticity!!!
Now, we run an OLS regression of $y_i^*$ on $1/h^{1/2}(x_{2,i},\dots,x_{k,i})$ and $x_{j,i}/h^{1/2}(x_{2,i},\dots,x_{k,i})$, $j = 2,\dots,k$, and we compute

$$\hat{\beta}_{wls} = \left(X^{*\prime}X^*\right)^{-1}X^{*\prime}y^*$$
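Since dividing by $h^{1/2}$ and running OLS is all there is to it, here is a minimal sketch (assuming the values $h_i$ are known; names hypothetical):

```python
import numpy as np

def wls(X, y, h):
    """Weighted least squares with known conditional-variance function values.

    X: (n, k) design matrix with intercept; h: (n,) values h(x_i).
    Equivalent to OLS on the data divided by sqrt(h_i).
    """
    w = 1.0 / np.sqrt(h)
    return np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]
```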

Result WLS-1: Let A.MLR1-A.MLR3 and A.MLR4 hold. Also, assume that $E(u_i^2|X) = \sigma_u^2\, h(x_{2,i},\dots,x_{k,i})$. Then, in finite samples $\hat{\beta}_{wls}$ is BLUE and in large samples,

$$n^{1/2}\left(\hat{\beta}_{wls} - \beta\right) \stackrel{d}{\to} N\left(0,\; \sigma_u^2\, p\lim\left(X^{*\prime}X^*/n\right)^{-1}\right)$$

Note that we have required A.MLR3, i.e. $E(u_i|X) = 0$, instead of A.MLR3', i.e. $E(X'u/n) = 0$.
Sketch of proof: For simplicity, consider the case of the simple linear model, $y_i = \beta_1 + \beta_2 x_{2,i} + u_i$, where $E(u_i^2|x_{2,i}) = \sigma_u^2 h(x_{2,i})$. We regress $y_i/h^{1/2}(x_{2,i})$ on $1/h^{1/2}(x_{2,i})$ and $x_{2,i}/h^{1/2}(x_{2,i})$, and we obtain

$$\hat{\beta}_{2,wls} = \frac{\sum_{i=1}^n \left(x_{2,i}^* - \bar{x}_2^*\right)\left(y_i^* - \bar{y}^*\right)}{\sum_{i=1}^n \left(x_{2,i}^* - \bar{x}_2^*\right)^2}$$

where $\bar{x}_2^* = n^{-1}\sum_i x_{2,i}^* = n^{-1}\sum_i x_{2,i}/h^{1/2}(x_{2,i})$ and $\bar{y}^* = n^{-1}\sum_i y_i^* = n^{-1}\sum_i y_i/h^{1/2}(x_{2,i})$.
Note that
$$y_i^* - \bar{y}^* = \beta_2\left(x_{2,i}^* - \bar{x}_2^*\right) + \left(u_i^* - \bar{u}^*\right)$$
and that $\sum_i \left(x_{2,i}^* - \bar{x}_2^*\right)\bar{u}^* = 0$, where $\bar{u}^* = n^{-1}\sum_i u_i^*$. Thus,
$$n^{1/2}\left(\hat{\beta}_{2,wls} - \beta_2\right) = \frac{n^{-1/2}\sum_{i=1}^n \left(x_{2,i}^* - \bar{x}_2^*\right) u_i^*}{n^{-1}\sum_{i=1}^n \left(x_{2,i}^* - \bar{x}_2^*\right)^2}$$

Now,

$$E\left[\left(x_{2,i}^* - \bar{x}_2^*\right) u_i^*\right] = E\left[\left(x_{2,i}^* - \bar{x}_2^*\right) \frac{1}{h^{1/2}(x_{2,i})}\, E\left(u_i | x_{2,i}\right)\right] = 0$$

Recalling that $x_{2,i}^*$ and $u_i^*$ are iid because $y_i$, $x_{2,i}$ and $h(x_{2,i})$ are iid,

$$Var\left(n^{-1/2}\sum_{i=1}^n \left(x_{2,i}^* - \bar{x}_2^*\right) u_i^*\right) = n^{-1}\sum_{i=1}^n E\left[\left(x_{2,i}^* - \bar{x}_2^*\right)^2 u_i^{*2}\right]$$
$$= E\left[E\left(\left(x_{2,i}^* - \bar{x}_2^*\right)^2 u_i^{*2}\,\Big|\, x_{2,i}\right)\right] = E\left[\left(x_{2,i}^* - \bar{x}_2^*\right)^2 \frac{1}{h(x_{2,i})}\, E\left(u_i^2|x_{2,i}\right)\right]$$
$$= \sigma_u^2\, E\left[\left(x_{2,i}^* - \bar{x}_2^*\right)^2\right] = \sigma_u^2\, var(x_{2,i}^*)$$

as $E(u_i^2|x_{2,i}) = \sigma_u^2 h(x_{2,i})$. Thus, because of the central limit theorem,
$$n^{-1/2}\sum_{i=1}^n \left(x_{2,i}^* - \bar{x}_2^*\right) u_i^* \stackrel{d}{\to} N\left(0,\; \sigma_u^2\, var(x_{2,i}^*)\right)$$
and so
$$n^{1/2}\left(\hat{\beta}_{2,wls} - \beta_2\right) \stackrel{d}{\to} N\left(0,\; \sigma_u^2 / var(x_{2,i}^*)\right)$$

Example
Consider first the simple saving function,

$$s_i = \beta_1 + \beta_2\, inc_i + u_i$$

where $s_i$ and $inc_i$ are the saving and income of household $i$. We have data on 100 households in 1970. We first estimate the model above by OLS, and then by WLS assuming that $E(u_i^2|inc_i) = \sigma_u^2\, inc_i$. Thus, estimation by WLS entails regressing $s_i/inc_i^{1/2}$ on $1/inc_i^{1/2}$ and on $inc_i/inc_i^{1/2}$. We obtain: for OLS, $\hat{\beta}_1 = 124_{(655)}$, $\hat{\beta}_2 = 0.147_{(0.058)}$, $R^2 = 0.062$. For WLS, $\hat{\beta}_{1,wls} = 124.1_{(480)}$, $\hat{\beta}_{2,wls} = 0.172_{(0.057)}$, $R^2 = 0.085$. Note that the MPS (marginal propensity to save) is substantially higher for WLS. This is not surprising at all, given that we have imposed a given form of heteroskedasticity.
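As a usage sketch (variable names hypothetical), the WLS estimates above correspond to calling the wls function from the earlier sketch with $h(inc_i) = inc_i$:

```python
# s, inc: (n,) arrays of savings and income (hypothetical data)
# X = np.column_stack([np.ones(len(inc)), inc])
# beta_wls = wls(X, s, h=inc)   # regress s/sqrt(inc) on 1/sqrt(inc), sqrt(inc)
```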

Weighted Least Squares: cont.

We have seen that if we knew the true conditional variance, then we could implement Weighted Least Squares and get efficient or asymptotically efficient estimators. Though, it is somewhat unusual to know (up to a multiplicative factor) the conditional variance. There is one case in which indeed we know the functional form of the covariance. This is the case in which we have data on averages over different groups and, at the individual level, we can rely on the homoskedasticity assumption. Suppose that we want to estimate the contribution of an individual to her/his pension plan as a function of how much the employer contributes (this has clear important implications for policy, social security issues, etc.). Ideally we would have data on single employees at different firms, i.e. the equation at the individual level is

$$contr_{i,e} = \beta_1 + \beta_2\, earn_{i,e} + \beta_3\, age_{i,e} + \beta_4\, mrate_i + u_{i,e}$$

where $contr_{i,e}$, $earn_{i,e}$, $age_{i,e}$ are the annual contribution to the pension plan, the annual earnings and the age of employee $e$ at firm $i$, while $mrate_i$ is the amount firm $i$ puts in for every dollar contributed to the pension plan by one of its employees. Suppose that $E(u_{i,e}^2|X) = \sigma_u^2$, i.e. at the individual level conditional homoskedasticity holds. Though, we do not observe individual data, but only averages for each firm, i.e. we can only estimate

$$\overline{contr}_i = \beta_1 + \beta_2\, \overline{earn}_i + \beta_3\, \overline{age}_i + \beta_4\, mrate_i + \bar{u}_i$$

where $\overline{contr}_i = m_i^{-1}\sum_{e=1}^{m_i} contr_{i,e}$, and $\overline{age}_i$, $\overline{earn}_i$ are defined analogously. Thus, note that $\bar{u}_i = m_i^{-1}\sum_{e=1}^{m_i} u_{i,e}$.
If $E(u_{i,e}^2) = \sigma_u^2$, then $E(\bar{u}_i^2) = E\left[\left(m_i^{-1}\sum_{e=1}^{m_i} u_{i,e}\right)^2\right] = \sigma_u^2/m_i$. Thus, we can implement weighted least squares by multiplying all the data by $m_i^{1/2}$; i.e. in this case $h(x_{2,i},\dots,x_{k,i}) = 1/m_i$.
Thus, a straightforward case for the application of WLS is when we have averages of individuals over groups, as in the sketch below.
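A sketch of this grouped-data case, reusing the wls function from the earlier sketch (variable names hypothetical, data simulated purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
m = rng.integers(5, 50, size=n)             # group sizes m_i (hypothetical)
ave_earn, ave_age = rng.normal(30, 8, n), rng.normal(40, 6, n)
mrate = rng.uniform(0, 1, n)
u_bar = rng.normal(0, 1, n) / np.sqrt(m)    # Var(u_bar_i) = sigma^2 / m_i
ave_contr = 1 + 0.05 * ave_earn + 0.02 * ave_age + 0.5 * mrate + u_bar

X = np.column_stack([np.ones(n), ave_earn, ave_age, mrate])
beta = wls(X, ave_contr, h=1.0 / m)         # h_i = 1/m_i: weight sqrt(m_i)
```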

However, in general we do not know the functional form of $h$. Often, we postulate the functional form of $h$, up to some parameters to be estimated. A frequently used model is the following:

$$E(u_i^2|X) = \sigma_u^2 \exp\left(\delta_1 + \delta_2 x_{2,i} + \dots + \delta_k x_{k,i}\right), \qquad (1)$$

that is, $h(x_{2,i},\dots,x_{k,i}) = \exp(\delta_1 + \delta_2 x_{2,i} + \dots + \delta_k x_{k,i})$. The idea is that this model is flexible enough to capture different forms of conditional heteroskedasticity. Basically, we can estimate the $\delta$, and then scale the data by

$$1/\hat{h}_i^{1/2} = 1/\exp\left(\hat{\delta}_1 + \hat{\delta}_2 x_{2,i} + \dots + \hat{\delta}_k x_{k,i}\right)^{1/2}.$$

How to proceed?
Step 1: Estimate the model for the conditional mean, e.g. regress $y_i$ on an intercept, $x_{2,i},\dots,x_{k,i}$. Call $\hat{\beta}_1, \hat{\beta}_2,\dots,\hat{\beta}_k$ the estimated coefficients on the intercept and on $x_{2,i},\dots,x_{k,i}$.
Step 2: Take the residuals $\hat{u}_i$, form $\log(\hat{u}_i^2)$ and regress $\log(\hat{u}_i^2)$ on a constant and $x_{2,i},\dots,x_{k,i}$. Denote by $\hat{\delta}_1, \hat{\delta}_2,\dots,\hat{\delta}_k$ the estimated coefficients, and define $\hat{g}_i$ as the resulting predicted value, i.e.

$$\hat{g}_i = \widehat{\log(u_i^2)} = \hat{\delta}_1 + \hat{\delta}_2 x_{2,i} + \dots + \hat{\delta}_k x_{k,i}$$

Step 3: Form $\hat{h}_i = \exp(\hat{g}_i)$, and construct $y_i^+ = y_i/\hat{h}_i^{1/2}$ and $x_{j,i}^+ = x_{j,i}/\hat{h}_i^{1/2}$, $j = 2,\dots,k$.
Step 4: Regress $y_i^+$ on $1/\hat{h}_i^{1/2}$, $x_{2,i}^+,\dots,x_{k,i}^+$ and form

$$\hat{\beta}_{fwls} = \left(X^{+\prime}X^+\right)^{-1}X^{+\prime}y^+$$

where $\hat{\beta}_{fwls}$ is called the feasible weighted least squares estimator; a code sketch of Steps 1-4 is given below.
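A minimal sketch of Steps 1-4 under the exponential variance model (1) (function name hypothetical; X includes the intercept):

```python
import numpy as np

def feasible_wls(X, y):
    """Feasible WLS with variance model sigma_u^2 * exp(x' delta)."""
    # Step 1: OLS for the conditional mean; residuals u-hat
    u = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    # Step 2: regress log(u^2) on the regressors; g-hat = fitted values
    g = X @ np.linalg.lstsq(X, np.log(u ** 2), rcond=None)[0]
    # Step 3: h-hat = exp(g-hat); scale all data by 1/sqrt(h-hat)
    w = 1.0 / np.sqrt(np.exp(g))
    # Step 4: OLS on the reweighted data
    return np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]
```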

Result WLS-2: Let A.MLR1-A.MLR4 hold. Suppose that
$$E(u_i^2|X) = \sigma_u^2 \exp\left(\delta_1 + \delta_2 x_{2,i} + \dots + \delta_k x_{k,i}\right).$$
If $\hat{\delta} = \left(\hat{\delta}_1, \hat{\delta}_2,\dots,\hat{\delta}_k\right)'$ is consistent for $\delta$, then:
$$n^{1/2}\left(\hat{\beta}_{fwls} - \beta\right) \stackrel{d}{\to} N\left(0,\; \sigma_u^2\, p\lim\left(X^{*\prime}X^*/n\right)^{-1}\right)$$
that is, $n^{1/2}(\hat{\beta}_{fwls} - \beta)$ has the same limiting distribution as $n^{1/2}(\hat{\beta}_{wls} - \beta)$, and so it is asymptotically efficient.
Remark: in finite samples $\hat{\beta}_{fwls}$ is no longer BLUE.
Sketch of Proof: Note that $y_i^+ = y_i^*\, \frac{h_i^{1/2}}{\hat{h}_i^{1/2}}$. Given that $p\lim \hat{\delta} = \delta$, we have $p\lim \frac{\hat{h}_i}{h_i} = 1$. Thus, in large samples, regressing $y_i^+$ on $1/\hat{h}_i^{1/2}, x_{2,i}^+,\dots,x_{k,i}^+$ and regressing $y_i^*$ on $1/h_i^{1/2}, x_{2,i}^*,\dots,x_{k,i}^*$ are equivalent.
From the result above, we see that if we have a correctly specified model for the true conditional variance, then Weighted Least Squares and Feasible Weighted Least Squares have the same limiting distribution. The issue is that often we do not know the functional form of $h$, and have no particular ideas about it. In that case, we use so called nonparametric estimators of the conditional variance. These estimators have the advantage of being able to approximate any function (subject to some regularity conditions, such as twice continuous differentiability). In this sense they are very flexible. The price we pay is that they converge very, very slowly; we need a lot of data to have them working. A well known type of estimator is the so called kernel estimator.

If we do not know the functional form of $h$, the idea is to use a kernel estimator of the conditional variance, in order to get an estimator able to approximate every $h$ function. Suppose that $h_i = h(x_{2,i}, x_{3,i})$. Define

$$\hat{h}_{NP}(x_{2,i}, x_{3,i}) = \frac{\frac{1}{n\lambda_n^2}\sum_{j=1}^n \hat{u}_j^2\, K\left(\frac{x_{2,j} - x_{2,i}}{\lambda_n}\right) K\left(\frac{x_{3,j} - x_{3,i}}{\lambda_n}\right)}{\frac{1}{n\lambda_n^2}\sum_{j=1}^n K\left(\frac{x_{2,j} - x_{2,i}}{\lambda_n}\right) K\left(\frac{x_{3,j} - x_{3,i}}{\lambda_n}\right)},$$

where, for example,
$$K\left(\frac{x_{2,j} - x_{2,i}}{\lambda_n}\right) = 0.75\left(1 - \left(\frac{x_{2,j} - x_{2,i}}{\lambda_n}\right)^2\right) \quad \text{if } \left|\frac{x_{2,j} - x_{2,i}}{\lambda_n}\right| < 1$$
(and zero otherwise), and $\lambda_n \to 0$ as $n \to \infty$, but rather slowly (e.g. $\lambda_n = n^{-1/6}$). Note that $K$ is called the kernel and $\lambda_n$ the bandwidth.
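A sketch of this kernel estimator with the Epanechnikov kernel shown above (function and variable names hypothetical):

```python
import numpy as np

def epanechnikov(z):
    """K(z) = 0.75 * (1 - z^2) for |z| < 1, and 0 otherwise."""
    return np.where(np.abs(z) < 1, 0.75 * (1 - z ** 2), 0.0)

def h_np(u2, x2, x3, x2_0, x3_0, lam):
    """Kernel estimator of h at the point (x2_0, x3_0).

    u2: squared residuals; x2, x3: regressor samples; lam: bandwidth.
    The 1/(n lam^2) factors cancel between numerator and denominator.
    """
    w = epanechnikov((x2 - x2_0) / lam) * epanechnikov((x3 - x3_0) / lam)
    return (u2 * w).sum() / w.sum()

# e.g. lam = n ** (-1/6), evaluated at each sample point (x2[i], x3[i])
```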

Now, provided the true $h$ function is twice continuously differentiable, for any point $(x_{2,i}, x_{3,i})$,
$$p\lim\left(\hat{h}_{NP}(x_{2,i}, x_{3,i}) - h(x_{2,i}, x_{3,i})\right) = 0.$$
Thus, even if we do not know $h$, we can always use a nonparametric estimator and construct $\hat{\beta}_{fwls}$ by regressing $y_i/\hat{h}_{NP}^{1/2}(x_{2,i}, x_{3,i})$ on $1/\hat{h}_{NP}^{1/2}(x_{2,i}, x_{3,i})$, $x_{2,i}/\hat{h}_{NP}^{1/2}(x_{2,i}, x_{3,i})$ and $x_{3,i}/\hat{h}_{NP}^{1/2}(x_{2,i}, x_{3,i})$.
The issue is that $\hat{h}_{NP}(x_{2,i}, x_{3,i})$ converges to the true $h(x_{2,i}, x_{3,i})$ very slowly. And the convergence gets slower as the number of variables on which $h_i$ depends grows. Thus, in small samples $\hat{\beta}_{fwls}$ can behave quite badly. Heuristically, it is a good idea to do weighted least squares with nonparametric estimators if we have a sample of about 400-500 observations or more.
What happens if we compute (feasible) WLS using the wrong weighting matrix? In other words, suppose that $E(u_i^2|X) = \sigma_u^2 g(X)$, but we do WLS using $h(X)$ (or feasible WLS using an estimator consistent for $h(X)$)? Well, consistency is preserved, though the estimator is no longer efficient. In this case, we still need to use White standard errors, using $X^*$ (or $X^+$) and the residuals from the weighted regression.
Example: Demand for cigarettes. $cigs_i$, $inc_i$, $educ_i$, $age_i$ are the number of cigarettes per day, the yearly income, education and age of individual $i$; $pcigs_i$ is the price of cigarettes in the state where individual $i$ lives (US data!), and $rest_i$ is a dummy equal to 1 if smoking is prohibited in restaurants in the state where individual $i$ lives. We have:

$$\widehat{cigs}_i = -3.64_{(24)} + 0.88_{(0.72)}\log(inc_i) - 0.75_{(5.7)}\log(pcigs_i) - 0.5_{(0.167)}\, educ_i + 0.77_{(0.16)}\, age_i - 0.009_{(0.0017)}\, age_i^2 - 2.83_{(1.1)}\, rest_i$$

$n = 807$. We construct a Breusch-Pagan test for heteroskedasticity. We regress $\hat{u}_i^2$ on a constant, $\log(inc_i)$, $\log(pcigs_i)$, $educ_i$, $age_i$, $age_i^2$ and $rest_i$. We get $R^2 = 0.04$; thus $nR^2 = 807 \times 0.04 = 32$, which is larger than the 95% critical value of a $\chi^2(6)$. We reject the null of conditional homoskedasticity. We use the model in (1) and follow Steps 1-4 above to construct the feasible weighted least squares estimates, which are reported below.

$$\widehat{cigs}_i = 5.64_{(18)} + 1.3_{(0.44)}\log(inc_i) - 2.94_{(4.5)}\log(pcigs_i) - 0.46_{(0.12)}\, educ_i + 0.48_{(0.09)}\, age_i - 0.006_{(0.0009)}\, age_i^2 - 3.46_{(0.8)}\, rest_i$$

The first thing we note is that the (feasible) WLS estimates are quite different from OLS. For example, the coefficient on income is more than twice as large as before; the coefficients on age and on the restaurant dummy are also quite different... What should we conclude? We know that both OLS and (feasible) WLS are consistent, thus in large samples (and 800 is large) they should be quite close to each other. One explanation is that the underlying model is misspecified, so that OLS and (feasible) WLS converge to two different probability limits. We'll come back to that when we do the Hausman Test!
