Robust and Clustered Standard Errors
Robust and Clustered Standard Errors

Molly Roberts

March 6, 2013

An Introduction to Robust and Clustered Standard Errors

Linear Regression with Non-constant Variance

Review: Errors and Residuals

Errors are the vertical distances between observations and the unknown conditional expectation function (CEF); because the CEF is unknown, the errors are unknown.
Residuals are the vertical distances between observations and the estimated regression function; because that function is computed from the data, the residuals are known.

Notation

Errors represent the difference between the outcome and the true mean:

    y = Xβ + u
    u = y − Xβ

Residuals represent the difference between the outcome and the estimated mean:
    y = Xβ̂ + û
    û = y − Xβ̂

Variance of β̂ Depends on the Errors

    β̂ = (X'X)⁻¹X'y
       = (X'X)⁻¹X'(Xβ + u)
       = β + (X'X)⁻¹X'u

Therefore,

    V[β̂] = V[β] + V[(X'X)⁻¹X'u]
          = 0 + V[(X'X)⁻¹X'u]
          = E[(X'X)⁻¹X'uu'X(X'X)⁻¹] − E[(X'X)⁻¹X'u] E[(X'X)⁻¹X'u]'
          = E[(X'X)⁻¹X'uu'X(X'X)⁻¹] − 0
          = (X'X)⁻¹X'E[uu']X(X'X)⁻¹
          = (X'X)⁻¹X'ΣX(X'X)⁻¹

Constant Error Variance and Dependence

Under standard OLS assumptions, u ∼ Nₙ(0, Σ) with

    Σ = Var(u) = E[uu'] = σ²Iₙ = diag(σ², σ², …, σ²)

What does this mean graphically for a CEF with one explanatory variable?
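The derivation above can be checked numerically. The sketch below (variable names are illustrative, not from the slides) simulates a homoskedastic linear model and confirms that β̂ − β equals (X'X)⁻¹X'u exactly, and that the sandwich (X'X)⁻¹X'ΣX(X'X)⁻¹ collapses to the familiar σ²(X'X)⁻¹ when Σ = σ²I:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2 = 200, 0.5
X = np.column_stack([np.ones(n), rng.uniform(0, 1, n)])
beta = np.array([1.0, 2.0])
u = rng.normal(0, np.sqrt(sigma2), n)   # homoskedastic errors
y = X @ beta + u

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# The sampling error of the estimator is exactly (X'X)^{-1} X'u
assert np.allclose(beta_hat - beta, XtX_inv @ X.T @ u)

# With Sigma = sigma^2 I, the sandwich reduces to sigma^2 (X'X)^{-1}
Sigma = sigma2 * np.eye(n)
sandwich = XtX_inv @ X.T @ Sigma @ X @ XtX_inv
assert np.allclose(sandwich, sigma2 * XtX_inv)
```

The second identity is why the sandwich form is usually invisible in introductory treatments: under homoskedasticity it is algebraically the same as the textbook variance.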
Evidence of Non-constant Error Variance

[Figure: four scatterplots of y against x, each with x ranging from 0.0 to 1.0, showing different patterns of non-constant error variance.]

Notation

The constant error variance assumption, sometimes called homoskedasticity, states that

    Var(u) = E[uu'] = diag(σ², σ², …, σ²)

In this section we will allow violations of this assumption in the following heteroskedastic form:

    Var(u) = E[uu'] = diag(σ₁², σ₂², …, σₙ²)

Consequences of Non-constant Error Variance

- σ̂² will not be unbiased for σ².
- For "α"-level tests, the probability of a Type I error will not be α.
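A small simulation (a hypothetical setup, not from the slides) makes the consequence concrete: when the error standard deviation grows with x, the classical estimate σ̂²(X'X)⁻¹ systematically understates the true sampling variance of the slope, which is what breaks test sizes and coverage:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 100, 2000
x = np.linspace(0.01, 1, n)
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)

slopes, naive_vars = [], []
for _ in range(reps):
    u = rng.normal(0, x, n)           # sd grows with x: heteroskedastic
    y = 1.0 + 2.0 * x + u
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / (n - 2)      # classical sigma^2 estimate
    slopes.append(b[1])
    naive_vars.append((s2 * XtX_inv)[1, 1])

true_var = np.var(slopes)             # Monte Carlo sampling variance
avg_naive = np.mean(naive_vars)       # what the classical formula reports
```

In this design `avg_naive` comes out noticeably below `true_var`, so nominal 95% intervals built from the classical formula cover less than 95% of the time.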
- "1 − α" confidence intervals will not have 1 − α coverage probability.
- The LS estimator is no longer BLUE.

However,

- The degree of the problem depends on the amount of heteroskedasticity.
- β̂ is still unbiased for β.

Heteroskedasticity-Consistent Estimator

Suppose

    V[u] = Σ = diag(σ₁², σ₂², …, σₙ²)

Then

    Var(β̂) = (X'X)⁻¹X'ΣX(X'X)⁻¹

and Huber (and then White) showed that

    (X'X)⁻¹X' diag(û₁², û₂², …, ûₙ²) X(X'X)⁻¹

is a consistent estimator of V[β̂].

Things to Note About This Approach

1. It requires a larger sample size:
   - large enough for each estimate (e.g., large enough in both treatment and baseline groups, or in both runoff and non-runoff groups);
   - large enough for consistent estimates (e.g., n ≥ 250 for the Stata default when errors are highly heteroskedastic; Long and Ervin 2000).
2. It doesn't make β̂ BLUE.
3. What are you going to do with predicted probabilities?

GLMs and Non-constant Variance

What happens when the model is not linear?

Huber (1967) developed a general way to find standard errors for models that are specified in the wrong way. Under certain conditions, you can obtain valid standard errors even if your model is misspecified.
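A minimal NumPy sketch of the Huber-White (HC0) estimator, plugging the squared residuals into the middle of the sandwich as above (the function name and simulated data are illustrative):

```python
import numpy as np

def hc0_robust_vcov(X, y):
    """Huber-White (HC0) estimator:
    (X'X)^{-1} X' diag(uhat_i^2) X (X'X)^{-1}."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    uhat = y - X @ beta_hat
    # X' diag(uhat^2) X, computed without forming the n x n matrix
    meat = X.T @ (uhat[:, None] ** 2 * X)
    return XtX_inv @ meat @ XtX_inv, beta_hat

# Example with heteroskedastic errors
rng = np.random.default_rng(2)
n = 500
x = rng.uniform(0, 1, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(0, x, n)

V, b = hc0_robust_vcov(X, y)
robust_se = np.sqrt(np.diag(V))
```

Stata's `, robust` option and R's sandwich package compute small-sample-adjusted variants of this same quantity (HC1-HC3), which matter most at the sample sizes flagged in point 1 above.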
These are the robust standard errors that scholars now use for other GLMs, and they happen to coincide with the linear case.

What does it mean for a non-linear model to have heteroskedasticity?

- Think about the probit model in the latent variable formulation.
- Pretend that there is heteroskedasticity in the linear model for y*.
- Heteroskedasticity in the latent variable formulation will completely change the functional form of P(y = 1 | x).
- What does this mean? P(y = 1 | x) ≠ Φ(xβ). Your model is wrong.

RSEs for GLMs

To derive robust standard errors in the general case, we assume that

    y ∼ fᵢ(y | θ)

Then our likelihood function is

    ∏ᵢ₌₁ⁿ fᵢ(Yᵢ | θ)

and thus the log-likelihood is

    L(θ) = Σᵢ₌₁ⁿ log fᵢ(Yᵢ | θ)

We denote the first and second partial derivatives of L by

    L'(θ) = Σᵢ₌₁ⁿ gᵢ(Yᵢ | θ),   L''(θ) = Σᵢ₌₁ⁿ hᵢ(Yᵢ | θ)

where

    gᵢ(Yᵢ | θ) = [log fᵢ(y | θ)]' = ∂/∂θ log fᵢ(y | θ)

and

    hᵢ(Yᵢ | θ) = [log fᵢ(y | θ)]'' = ∂²/∂θ² log fᵢ(y | θ)

This shouldn't be too unfamiliar. Remember, the Fisher information matrix is −E_θ[hᵢ(Yᵢ | θ)].

- Let's assume the model is correct: there is a true value θ₀ for θ.
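As a concrete scalar instance (a hypothetical Bernoulli example, not from the slides): for log f(y | θ) = y log θ + (1 − θ-complement terms), the derivatives gᵢ and hᵢ can be written in closed form and checked against finite differences, and −E[h] recovers the familiar Bernoulli Fisher information 1/(θ(1 − θ)):

```python
import math

def log_f(y, theta):
    # Bernoulli log density: y*log(theta) + (1-y)*log(1-theta)
    return y * math.log(theta) + (1 - y) * math.log(1 - theta)

def g(y, theta):
    # first derivative with respect to theta
    return y / theta - (1 - y) / (1 - theta)

def h(y, theta):
    # second derivative with respect to theta
    return -y / theta**2 - (1 - y) / (1 - theta)**2

theta, eps = 0.3, 1e-6
for y in (0, 1):
    # check g against a central finite difference of log_f
    fd = (log_f(y, theta + eps) - log_f(y, theta - eps)) / (2 * eps)
    assert abs(fd - g(y, theta)) < 1e-4

# Fisher information: -E[h] = 1/(theta*(1-theta)) for one Bernoulli draw
info = -(theta * h(1, theta) + (1 - theta) * h(0, theta))
assert abs(info - 1 / (theta * (1 - theta))) < 1e-9
```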
- Then we can use a Taylor approximation of the log-likelihood function to estimate what it looks like around θ₀:

    L(θ) ≈ L(θ₀) + L'(θ₀)(θ − θ₀) + ½(θ − θ₀)ᵀL''(θ₀)(θ − θ₀)

- We can use the Taylor approximation to approximate θ̂ − θ₀, and therefore the variance-covariance matrix.
- We want to find the maximum of the log-likelihood function, so we set L'(θ) = 0:

    L'(θ₀) + (θ − θ₀)ᵀL''(θ₀) = 0
    θ̂ − θ₀ ≈ [−L''(θ₀)]⁻¹L'(θ₀)
    Avar(θ̂) = [−L''(θ₀)]⁻¹ [Cov(L'(θ₀))] [−L''(θ₀)]⁻¹

It's the sandwich estimator.
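The general recipe [−L'']⁻¹ Cov(L') [−L''1]⁻¹ with Cov(L') estimated by the outer product of the per-observation scores gᵢ can be sketched for a Poisson regression fit by Newton's method. All names and the simulated overdispersed data below are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
x = rng.uniform(0, 1, n)
X = np.column_stack([np.ones(n), x])
mu_true = np.exp(0.5 + 1.0 * x)
# Overdispersed counts: the Poisson model is misspecified, which is
# exactly the situation robust standard errors are meant to handle
y = rng.negative_binomial(5, 5 / (5 + mu_true))

# Fit the Poisson MLE (log link) by Newton's method
beta = np.zeros(2)
for _ in range(25):
    mu = np.exp(X @ beta)
    score = X.T @ (y - mu)          # L'(beta)
    neg_hess = (X.T * mu) @ X       # -L''(beta) = X' diag(mu) X
    beta = beta + np.linalg.solve(neg_hess, score)

mu = np.exp(X @ beta)
bread = np.linalg.inv((X.T * mu) @ X)   # [-L'']^{-1}
scores = X * (y - mu)[:, None]          # per-observation scores g_i
meat = scores.T @ scores                # estimates Cov(L')
sandwich = bread @ meat @ bread         # robust variance-covariance
robust_se = np.sqrt(np.diag(sandwich))
```

When the Poisson model is correct, bread and meat cancel and the sandwich equals the usual inverse-information variance; under the overdispersion simulated here they do not, and the robust standard errors are the honest ones.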