Heteroskedasticity 1 / 50
Heteroskedasticity in Regression
Paul E. Johnson
Department of Political Science
Center for Research Methods and Data Analysis, University of Kansas
2020
Outline
1 Introduction
2 Fix #1: Robust Standard Errors
3 Weighted Least Squares
    Combine Subsets of a Sample
    Random coefficient model
    Aggregate Data
4 Testing for heteroskedasticity
    Categorical Heteroskedasticity
    Checking for Continuous Heteroskedasticity
    Toward a General Test for Heteroskedasticity
5 Appendix: Robust Variance Estimator Derivation
6 Practice Problems
Remember the Basic OLS
Theory behind the Linear Model
\[
y_i = \beta_0 + \beta_1 x_{1i} + e_i
\]
For the error term, we assumed, for all $i$:
\[
E(e_i) = 0 \quad \text{(errors are ``symmetric'' above and below)}
\]
\[
Var(e_i) = E[(e_i - E(e_i))^2] = \sigma^2 \quad \text{(Homoskedasticity: same variance)}
\]
Heteroskedasticity: the assumption of homogeneous variance is violated.
Homoskedasticity means
\[
Var(e) = \begin{bmatrix}
\sigma_e^2 & 0 & 0 & \cdots & 0 \\
0 & \sigma_e^2 & 0 & \cdots & 0 \\
0 & 0 & \sigma_e^2 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \sigma_e^2
\end{bmatrix}
\]
Heteroskedasticity depicted in one of these ways
\[
Var(e) = \begin{bmatrix}
\sigma_e^2 w_1 & & & 0 \\
& \sigma_e^2 w_2 & & \\
& & \ddots & \\
0 & & & \sigma_e^2 w_N
\end{bmatrix}
\quad \text{or} \quad
\begin{bmatrix}
\frac{\sigma_e^2}{w_1} & & & 0 \\
& \frac{\sigma_e^2}{w_2} & & \\
& & \ddots & \\
0 & & & \frac{\sigma_e^2}{w_N}
\end{bmatrix}
\]
I get confused a lot when comparing textbooks because of this problem!
Consequences of Heteroskedasticity 1: βˆOLS still unbiased, consistent
OLS estimates of $\beta_0$ and $\beta_1$ are still unbiased and consistent.
    Unbiased: $E[\hat\beta^{OLS}] = \beta$
    Consistent: as $N \to \infty$, $\hat\beta^{OLS}$ tends to $\beta$ in probability.
If the predicted line was "right" before, it's still right now.
However, these are incorrect:
    the standard error of $\hat\beta$
    RMSE
    confidence / prediction intervals
Proof: βˆOLS Still Unbiased
Begin with “mean centered” data. OLS with one variable:
\[
\hat\beta_1 = \frac{\sum x_i y_i}{\sum x_i^2}
= \frac{\sum x_i(\beta_1 x_i + e_i)}{\sum x_i^2}
= \frac{\beta_1 \sum x_i^2 + \sum x_i e_i}{\sum x_i^2}
= \beta_1 + \frac{\sum x_i e_i}{\sum x_i^2}
\]
Apply the expected value operator to both sides:
\[
E[\hat\beta_1] = \beta_1 + E\left(\frac{\sum x_i e_i}{\sum x_i^2}\right)
= \beta_1 + \frac{\sum E[x_i e_i]}{\sum x_i^2}
\]
Assume $x_i$ is uncorrelated with $e_i$, so $E[x_i e_i] = 0$, and the work is done:
\[
E(\hat\beta_1) = \beta_1
\]
Consequence 2. The OLS formula for $\widehat{Var}(\hat\beta)$ is wrong
1. The usual formula used to estimate $Var(\hat\beta)$, $\widehat{Var}(\hat\beta)$, is wrong. And its square root, $std.err.(\hat\beta)$, is wrong.
2. Thus t-tests are WRONG (too big).
Proof: OLS Var\(βˆ) Wrong
Let $Var(e_i)$ denote the variance of $e_i$.
The variance of the OLS slope estimator, $Var(\hat\beta_1)$, in "mean-centered (or deviations) form":
\[
Var(\hat\beta_1) = Var\left(\frac{\sum x_i e_i}{\sum x_i^2}\right)
= \frac{Var[\sum x_i e_i]}{(\sum x_i^2)^2}
= \frac{\sum Var(x_i e_i)}{(\sum x_i^2)^2}
= \frac{\sum x_i^2 Var(e_i)}{(\sum x_i^2)^2} \tag{1}
\]
We assume all $Var(e_i)$ are equal, and we put in the MSE as an estimate of it:
\[
\widehat{Var}(\hat\beta_1) = \frac{MSE}{\sum x_i^2} \tag{2}
\]
Proof: OLS Var(βˆ)Wrong (page 2)
Instead, suppose the "true" variance is
\[
Var(e_i) = s^2 + s_i^2 \tag{3}
\]
(a common minimum variance $s^2$ plus an additional individualized variance $s_i^2$). Plug this into (1):
\[
\frac{\sum x_i^2 (s^2 + s_i^2)}{(\sum x_i^2)^2}
= \frac{s^2}{\sum x_i^2} + \frac{\sum x_i^2 s_i^2}{(\sum x_i^2)^2} \tag{4}
\]
The first term is "roughly" what OLS would calculate for the variance of $\hat\beta_1$.
The second term is the additional "true variance" in $\hat\beta_1$ that the OLS formula $\widehat{Var}(\hat\beta_1)$ does not include.
Consequence 3: βˆOLS is Inefficient
1. $\hat\beta^{OLS}$ is inefficient: it has higher variance than the "weighted" estimator.
2. Note that to prove an estimator is "inefficient", it is necessary to provide an alternative estimator that has lower variance.
3. WLS: the Weighted Least Squares estimator, $\hat\beta_1^{WLS}$.
4. The sum of squares to be minimized now includes a weight for each case:
\[
SS(\hat\beta_0, \hat\beta_1) = \sum_{i=1}^{N} W_i (y_i - \hat{y}_i)^2 \tag{5}
\]
5. The weights are chosen to "undo" the heteroskedasticity:
\[
W_i = \frac{1}{Var(e_i)} \tag{6}
\]
Covariance matrix of error terms
This thing is a weighting matrix
\[
\begin{bmatrix}
\frac{1}{Var(e_1)} & & & 0 \\
& \frac{1}{Var(e_2)} & & \\
& & \ddots & \\
0 & & & \frac{1}{Var(e_N)}
\end{bmatrix}
\]
It is usually simplified in various ways. Factor out a common parameter, so each individual's error variance is proportional:
\[
\frac{1}{\sigma_e^2}
\begin{bmatrix}
\frac{1}{w_1} & & & 0 \\
& \frac{1}{w_2} & & \\
& & \ddots & \\
0 & & & \frac{1}{w_N}
\end{bmatrix}
\]
Example. Suppose variance proportional to $x_i^2$
The "truth" is
\[
y_i = 3 + 0.25 x_i + e_i \tag{7}
\]
Homoskedastic:
\[
Std.Dev.(e_i) = \sigma_e = 10 \tag{8}
\]
Heteroskedastic:
\[
Std.Dev.(e_i) = 0.05 \, (x_i - \min(x_i)) \, \sigma_e \tag{9}
\]
[Figure: scatterplot of $y$ against $x$; the heteroskedastic errors fan out as $x$ grows]
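The two error structures in (7)–(9) can be simulated in a few lines. This is a minimal sketch, not the original script behind the figures; the sample size and the range of x are made up here:

```r
# Simulate the "truth" in (7) under the two error structures (8) and (9).
# N and the range of x are arbitrary choices for illustration.
set.seed(1234)
N <- 400
x <- runif(N, 20, 80)
sigma_e <- 10
e_homo <- rnorm(N, sd = sigma_e)                          # eq (8)
e_hetero <- rnorm(N, sd = 0.05 * (x - min(x)) * sigma_e)  # eq (9)
y_homo <- 3 + 0.25 * x + e_homo                           # eq (7)
y_hetero <- 3 + 0.25 * x + e_hetero
plot(x, y_hetero, main = "Heteroskedastic errors fan out as x grows")
```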
Compare Lines of 1000 Fits (Homo vs Heteroskedastic)
[Figure: two panels, "Heteroskedastic Errors" and "Homoskedastic Errors", each showing 1000 fitted lines of $y$ against $x$; the heteroskedastic fits are more dispersed]
Histograms of Slope Estimates (w/Kernel Density Lines)
[Figure: histograms of the slope estimates with kernel density lines, "Heteroskedastic Data" and "Homoskedastic Data"; the heteroskedastic slope estimates are more widely spread]
So, the Big Message Is
Heteroskedasticity inflates the amount of uncertainty in the estimates. Distorts t-ratios.
Robust estimate of the variance of bˆ
Replace the OLS formula for $\widehat{Var}(\hat{b})$ with a more "robust" version.
The robust "heteroskedasticity consistent" variance estimator rests on weaker assumptions.
It has no known small-sample properties, but it is consistent / asymptotically valid.
Note: this does not "fix" $\hat{b}^{OLS}$. It just gives us more accurate t-ratios by correcting $std.err(\hat{b})$.
Robust Std.Err. in a Nutshell
Recall: the variance-covariance matrix of the errors assumed by OLS.
\[
Var(e) = E(e \cdot e'|X) = \begin{bmatrix}
\sigma_e^2 & 0 & 0 & \cdots & 0 \\
0 & \sigma_e^2 & 0 & \cdots & 0 \\
0 & 0 & \sigma_e^2 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \sigma_e^2
\end{bmatrix} \tag{10}
\]
Heteroskedastic Covariance Matrix
If there's heteroskedasticity, we have to allow for possibilities like this:
\[
Var(e) = E[e \cdot e'|X] = \begin{bmatrix}
\sigma_1^2 & 0 & \cdots & 0 & 0 \\
0 & \sigma_2^2 & \cdots & 0 & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & \sigma_{N-1}^2 & 0 \\
0 & 0 & \cdots & 0 & \sigma_N^2
\end{bmatrix}
\]
Robust Std.Err. : Use Variance Estimates
Fill in estimates for the case-specific error variances:
\[
\widehat{Var}(e) = \begin{bmatrix}
\hat\sigma_1^2 & 0 & \cdots & 0 & 0 \\
0 & \hat\sigma_2^2 & \cdots & 0 & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & \hat\sigma_{N-1}^2 & 0 \\
0 & 0 & \cdots & 0 & \hat\sigma_N^2
\end{bmatrix}
\]
Embed those estimates into the larger formula that is used to calculate the robust standard errors.
Famous paper: White, Halbert. (1980). A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity. Econometrica, 48(4), 817-838.
The robust estimator was originally proposed by Huber (1967), but was forgotten.
Derivation: Calculating The Robust Estimator of Var(bˆ)
The true variance of the OLS estimator is
\[
Var(\hat{b}_1) = \frac{\sum x_i^2 Var(e_i)}{(\sum x_i^2)^2} \tag{11}
\]
Assuming homoskedasticity, estimate $\sigma_e^2$ with MSE:
\[
\widehat{Var}(\hat{b}_1) = \frac{MSE}{\sum x_i^2}, \quad \text{and the square root of that is } std.err.(\hat{b}_1) \tag{12}
\]
The robust versions replace $Var(e_i)$ with other estimates. White's suggestion was
\[
\text{Robust } \widehat{Var}(\hat{b}_1) = \frac{\sum x_i^2 \, \hat{e}_i^2}{(\sum x_i^2)^2} \tag{13}
\]
$\hat{e}_i^2$: the "squared residual", used in place of the unknown error variance.
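White's formula (13) can be computed directly and checked against a packaged implementation. A sketch with simulated data; the comparison against `sandwich::vcovHC(type = "HC0")` is my choice here, not something from the lecture:

```r
# White's (13) by hand, for one mean-centered predictor, checked against
# sandwich::vcovHC(type = "HC0"). Data are simulated for illustration.
library(sandwich)
set.seed(42)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100, sd = abs(x) + 0.5)   # heteroskedastic errors
m <- lm(y ~ x)
xc <- x - mean(x)                      # mean-centered x
ehat2 <- residuals(m)^2                # squared residuals replace Var(e_i)
v_hand <- sum(xc^2 * ehat2) / sum(xc^2)^2
v_pkg <- vcovHC(m, type = "HC0")["x", "x"]
all.equal(v_hand, v_pkg)               # the two agree exactly
```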
WLS is Efficient
Change OLS:
Change OLS:
\[
SS(\hat\beta) = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
\]
to WLS:
\[
\text{minimize } SS(\hat\beta) = \sum_{i=1}^{N} W_i (y_i - \hat{y}_i)^2
\]
In practice, the weights are guesses about the standard deviation of the error term:
\[
W_i = \frac{1}{\sigma_i} \tag{14}
\]
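A sketch of WLS with `lm()`. One convention detail worth flagging: `lm()`'s `weights` multiply the *squared* residuals, so they belong on the inverse-variance scale, $1/\sigma_i^2$, while (14) is written on the standard-deviation scale. Data simulated for illustration:

```r
# WLS via lm()'s weights argument, on simulated data where the error SD
# grows with x. lm() multiplies each squared residual by its weight, so
# supply 1/variance, not 1/sd.
set.seed(5)
x <- runif(200, 1, 10)
y <- 3 + 0.25 * x + rnorm(200, sd = x)   # Var(e_i) = x^2, known here
ols <- lm(y ~ x)
wls <- lm(y ~ x, weights = 1 / x^2)      # W_i = 1/Var(e_i)
c(ols = coef(ols)["x"], wls = coef(wls)["x"])  # both near 0.25
```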
Feasible Weighted Least Squares.
The analysis proceeds in 2 steps:
1. A regression is estimated to gather information about the error variance.
2. That information is used to fill in the weight matrix for WLS.
One may revise the weights and re-fit the WLS repeatedly, until convergence.
Data From Categorical Groups
Suppose you separately investigate data for men and women
\[
\text{men}: \quad y_i = \beta_0 + \beta_1 x_i + e_i \tag{15}
\]
\[
\text{women}: \quad y_i = c_0 + c_1 x_i + u_i \tag{16}
\]
Then you wonder, "can I combine the data for men and women to estimate one model?"
\[
\text{humans}: \quad y_i = \beta_0 + \beta_1 x_i + \beta_2 sex_i + \beta_3 sex_i x_i + e_i \tag{17}
\]
This “manages” the differences of intercept and slope for men and women by adding coefficients β2 and β3.
But this ASSUMED that $Var(e_i) = Var(u_i)$. We should have tested for homoskedasticity (the ability to pool the 2 samples).
Methods Synonyms
The basic idea is to say that the linear model has "extra" random error terms. This "Laird and Ware" notation has now become a standard. Let the "fixed" coefficients be $\beta$'s, but suppose in addition there are random coefficients $b \sim N(0, \sigma_b^2)$:
\[
y = X\beta + Zb + e \tag{18}
\]
Synonyms: Random effects models; Mixed Models; Hierarchical Linear Models (HLM); Multi-level Models (MLM).
I'll probably write something on the board.
Simple Random Coefficient Model
Start with the regression model that has a different slope for each case:
\[
y_i = \beta_0 + \beta_i x_i + e_i \tag{19}
\]
The slope is a "random coefficient" with 2 parts:
\[
\beta_i = \beta_1 + u_i
\]
$\beta_1$ is the "same" for all cases. $u_i$ is noise in the slope that is individually assigned; it has expected value 0 and variance $\sigma_u^2$.
The regression model becomes
\[
y_i = \beta_0 + (\beta_1 + u_i) x_i + e_i = \beta_0 + \beta_1 x_i + u_i x_i + e_i
\]
Note: the "new" error term is $u_i x_i + e_i$. It is NOT homoskedastic. What's the variance of that? Apply the usual rule:
\[
Var[u_i x_i + e_i] = x_i^2 Var(u_i) + Var(e_i) + 2 x_i Cov(u_i, e_i)
\]
Get rid of the last part by asserting that the 2 random effects are uncorrelated, so we have
\[
= x_i^2 \sigma_u^2 + \sigma_e^2
\]
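The variance rule for the random-slope error term can be checked numerically; the particular values of $x_i$, $\sigma_u$, and $\sigma_e$ below are arbitrary:

```r
# Monte Carlo check that Var(u_i*x_i + e_i) = x_i^2*sigma_u^2 + sigma_e^2
# when u and e are uncorrelated. The specific constants are arbitrary.
set.seed(7)
x_i <- 4; sigma_u <- 2; sigma_e <- 1
draws <- x_i * rnorm(1e5, sd = sigma_u) + rnorm(1e5, sd = sigma_e)
var(draws)                         # close to the formula's value
x_i^2 * sigma_u^2 + sigma_e^2      # 65
```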
With Aggregated Data, the Variance is Almost Never Homogeneous.
Each row in the data set represents a collection of observations ("group averages" like "mean education" or "mean salary"). The averaging process causes heteroskedasticity.
The mean $\bar{y} = \frac{\sum y_i}{N}$ and standard deviation $\sigma_y$ imply that the variance of the mean is
\[
Var(\bar{y}) = \frac{Var(y_i)}{N} = \frac{\sigma_y^2}{N}
\]
Regression weights proportional to $\sqrt{N_{group}}$ should be used.
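A sketch of the aggregated-data case with simulated group means. Because `lm()`'s `weights` apply to squared residuals, the $\sqrt{N_{group}}$ weighting on the residual scale enters `lm()` as $N_{group}$; the variable names below are invented for illustration:

```r
# Aggregation-induced heteroskedasticity, simulated. Individual salaries
# have constant variance; group means then have variance sigma^2/N_g, so
# weights proportional to N_g (lm()'s squared-residual scale) undo it.
set.seed(9)
G <- 50
Ng <- sample(5:100, G, replace = TRUE)   # group sizes
edu <- runif(G, 10, 16)                  # group mean education
ybar <- sapply(seq_len(G), function(g)
  mean(20 + 2 * edu[g] + rnorm(Ng[g], sd = 8)))  # observed group mean salary
wls <- lm(ybar ~ edu, weights = Ng)      # weight by group size
coef(wls)                                # slope near the true value, 2
```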
Idea: estimate the error variances for the subgroups, and try to find out if they are different.
Bartlett's test: assuming normality of observations, derives a statistic that is distributed as a $\chi^2$.

plot(count ~ spray, data = InsectSprays)
bartlett.test(count ~ spray, data = InsectSprays)  ## in base R's stats package

Bartlett, M. S. (1937). Properties of sufficiency and statistical tests. Proceedings of the Royal Society of London Series A 160, 268–282.
Fligner-Killeen test: robust against non-normality (less likely to confuse non-normality for heteroskedasticity).

fligner.test(count ~ spray, data = InsectSprays)  ## also in stats

Adapt tests from Analysis of Variance ...
William J. Conover, Mark E. Johnson and Myrle M. Johnson (1981). A comparative study of tests for homogeneity of variances, with applications to the outer continental shelf bidding data. Technometrics 23, 351–361. Levene’s test
library(car)
leveneTest(y ~ x*z, data = dat)  ## x and z must be factors

Goldfeld-Quandt test
S.M. Goldfeld & R.E. Quandt (1965), Some Tests for Homoskedasticity. Journal of the American Statistical Association 60, 539–547.
Consider a continuous numeric predictor. Exclude observations "in the middle", and then compare the observed variances of the left and right.
Draw a picture on board here!
HOWTO: compare the error sum of squares (ESS) for 2 chunks of data:
\[
F = \frac{ESS_2}{ESS_1} = \frac{\text{the ``upper set'' ESS}}{\text{the ``lower set'' ESS}}
\]
and the degrees of freedom for both numerator and denominator are $(N - d - 4)/2$, where $d$ is the number of excluded observations. The more observations you exclude, the smaller your degrees of freedom, meaning your F value must be larger to reach significance.

library(lmtest)
gqtest(y ~ x, fraction = 0.2, order.by = ~ z)
gqtest(y ~ x, point = 0.4, order.by = ~ z)

Example of Goldfeld-Quandt Test: Continuous X
Use the heteroskedastic model from the previous illustration.

mymod <- lm(y ~ x)
gqtest(mymod, fraction = 0.2, order.by = ~ x)

Goldfeld-Quandt test
data: mymod
GQ = 4.497, df1 = 198, df2 = 198, p-value < 2.2e-16
alternative hypothesis: variance increases from segment 1 to 2

[Figure: "Scatter for Goldfeld-Quandt Test", y against x]
This excludes 20% of the cases from the middle, and compares the variances of the outer sections.
Versions of this test were proposed in Breusch & Pagan (1979) and White (1980).
Basic idea: if the errors are homogeneous, the variance of the residuals should not be predictable with the use of input variables.
T.S. Breusch & A.R. Pagan (1979), A Simple Test for Heteroscedasticity and Random Coefficient Variation. Econometrica 47, 1287–1294.
Breusch-Pagan test
Model the squared residuals with the other predictors ($Z_{1i}$, etc.) like this:
\[
\frac{\hat{e}_i^2}{\hat\sigma^2} = \gamma_0 + \gamma_1 Z_{1i} + \gamma_2 Z_{2i}
\]
Here, $\hat\sigma^2 = MSE$.
If the error is homoskedastic/Normal, the slope coefficients $\gamma_1$ and $\gamma_2$ will equal zero. The input variables $Z$ can be the same as the original regression, but may also include squared values of those variables.
BP contend that $\frac{1}{2} RSS$ (the regression sum of squares) should be distributed as $\chi^2$ with degrees of freedom equal to the number of $Z$ variables.

library(lmtest)
mod <- lm(y ~ x1 + x2 + x3, data = dat)
bptest(mod, studentize = FALSE)  ## for the classic BP test
A Robust Version of the Test
The original form of the BP test assumed Normally distributed errors. Non-normal, but homoskedastic, errors might cause the test to indicate that there is heteroskedasticity. A "studentized" version of the test was proposed by Koenker (1981); that's what lmtest's bptest uses by default.

library(lmtest)
mod <- lm(y ~ x1 + x2 + x3, data = dat)
bptest(mod)  ## Koenker's robust version

White's Version of the Test
White's general test for heteroskedasticity is another view of the same exercise. Run the regression
\[
\hat{e}_i^2 = \gamma_0 + \gamma_1 Z_{1i} + \gamma_2 Z_{2i} + \ldots
\]
The $Z$'s should include the predictors, their squares, and cross products. Under the assumption of homoskedasticity, $N \cdot R^2$ is asymptotically distributed as $\chi^2_p$, where $N$ is the sample size, $R^2$ is the coefficient of determination from the fitted model, and $p$ is the number of $Z$ variables used in the regression.
Algebraically equivalent to the robust version of the BP test (Waldman, 1983).
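A sketch of White's $N \cdot R^2$ statistic by hand. With a single predictor there are no cross products, so the auxiliary regression uses only $x$ and $x^2$; the data are simulated:

```r
# White's N*R^2 test by hand: regress squared residuals on the predictors
# and their squares, then compare N*R^2 to a chi-squared distribution.
set.seed(2)
x <- rnorm(150)
y <- 1 + x + rnorm(150, sd = abs(x) + 0.5)  # heteroskedastic errors
m <- lm(y ~ x)
aux <- lm(residuals(m)^2 ~ x + I(x^2))      # Z = (x, x^2); no cross products
W <- length(y) * summary(aux)$r.squared     # N * R^2
pchisq(W, df = 2, lower.tail = FALSE)       # p-value with p = 2 Z variables
```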
Where Robust Var(βˆ) Comes From
The OLS estimator in matrix form
\[
\hat{b} = (X'X)^{-1} X'Y \tag{20}
\]
If $e_i$ is homoskedastic, the "true variance" of the estimates of the $b$'s is
\[
Var(\hat{b}) = \sigma^2 \cdot (X'X)^{-1} \tag{21}
\]
Replace $\sigma^2$ with the Mean Squared Error (MSE):
\[
\widehat{Var}(\hat{b}) = MSE \cdot (X'X)^{-1} \tag{22}
\]
Where OLS Exploits Homoskedastic Assumption
In the OLS derivation of (22), one arrives at this intermediate step:
\[
OLS: \quad Var(\hat{b}) = (X'X)^{-1}(X' \, Var(e) \, X)(X'X)^{-1} \tag{23}
\]
The OLS derivation exploits homoskedasticity, which appears as
\[
Var(e) = E(e \cdot e'|X) = \begin{bmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{bmatrix} \tag{24}
\]
\[
= \sigma^2 \begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix} = \sigma^2 \cdot I \tag{25}
\]
\[
OLS: \quad Var(\hat{b}) = (X'X)^{-1}(X' \cdot \sigma^2 \cdot X)(X'X)^{-1} = \sigma^2 (X'X)^{-1}
\]
But Heteroskedasticity Implies
If there's heteroskedasticity, we have to allow for possibilities like this:
\[
Var(e) = E[e \cdot e'|X] = \begin{bmatrix}
\sigma_1^2 & 0 & \cdots & 0 & 0 \\
0 & \sigma_2^2 & \cdots & 0 & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & \sigma_{N-1}^2 & 0 \\
0 & 0 & \cdots & 0 & \sigma_N^2
\end{bmatrix}
\]
Those "true variances" are unknown. How can we estimate $Var(\hat{b})$?
White’s Idea
The variance of $e_1$, for example, is never observed, but the best estimate we have for it is the mean square for that one case:
\[
\hat{e}_1^2 = (y_1 - X_1\hat{b})^2
\]
Hence, replace $Var(e)$ with a matrix of estimates like this:
\[
\widehat{Var}(e) = \begin{bmatrix}
\hat{e}_1^2 & & & \\
& \hat{e}_2^2 & & \\
& & \ddots & \\
& & & \hat{e}_N^2
\end{bmatrix}
\]
Heteroskedasticity Consistent Covariance Matrix
The “heteroskedasticity consistent covariance matrix of bˆ” uses Var\(e) in the formula to calculate estimated variance.
\[
hccm \; Var(\hat{b}) = (X'X)^{-1}(X' \, \widehat{Var}(e) \, X)(X'X)^{-1}
\]
White proved that the estimator is consistent, i.e., for large samples, the value converges to the true $Var(\hat{b})$.
It is sometimes called an "information sandwich" estimator. The matrix $(X'X)^{-1}$ is the "information matrix". This equation gives us a "sandwich" of $X'\widehat{Var}(e)X$ between two pieces of the information matrix.
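The sandwich can be computed directly from the formula and compared against a packaged implementation; the data are simulated, and the check against `car::hccm()` is my addition:

```r
# The "information sandwich" in matrix form, checked against
# car::hccm(type = "hc0"). Data simulated for illustration.
library(car)
set.seed(3)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50, sd = abs(x) + 0.2)
m <- lm(y ~ x)
X <- model.matrix(m)
ehat2 <- residuals(m)^2                            # diag of Var-hat(e)
XtXinv <- solve(t(X) %*% X)                        # the "information matrix"
V <- XtXinv %*% (t(X) %*% (ehat2 * X)) %*% XtXinv  # sandwich
all.equal(V, hccm(m, type = "hc0"), check.attributes = FALSE)  # TRUE
```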
Problems
1. Somebody says "your regression results are obviously plagued by heteroskedasticity." And they are correct! Explain what might be wrong, and what the consequences might be.
2. Run a regression on any data set you like. Suppose you call it "mod". Run a few garden-variety heteroskedasticity checks:

library(lmtest)
bptest(mod)

If you have a continuous predictor "MYX" and you want to check for heteroskedasticity with the Goldfeld-Quandt test, it is best to specify a fraction to exclude from the "middle" of the data. If you order the data frame by "MYX" before fitting the model and running the gqtest, it works a bit more smoothly. If you do not do that, you have to tell gqtest to order the data for you.

library(lmtest)
gqtest(mod, fraction = 0.2, order.by = ~ MYX)
Problems ...
On the other hand, the help page for gqtest also suggests using it to test a dichotomous predictor, but in that case don't exclude a fraction in the middle; just specify the division point that splits the range of MYX in two. You had better sort the dataset by MYX before trying this; otherwise it will be tricky.

library(lmtest)
dat <- dat[order(dat$MYX), ]  ## sorts rows by MYX
gqtest(mod, point = 67)  ## splits data at row 67
3. We have quite a few different ways to check for "categorical heteroskedasticity". I think I've never compared them side by side, but maybe you can. Run a regression that has at least one dichotomous predictor, and then run the various tests. I have in mind Bartlett's test, the Fligner-Killeen test, and Levene's test. I noticed at the last minute that we can also use the Goldfeld-Quandt test, in the method demonstrated in ?gqtest (or in the previous question).
Problems ...
Run those tests, check to see if they all lead to the same conclusion or not. Try to understand what they are testing so you could explain them to one of your students.