Regression diagnostics & model fit
Johan A. Elkink
School of Politics & International Relations
University College Dublin
18 November 2019
Outline
1 Bias and efficiency
2 Specification
3 Heteroskedasticity
4 Autocorrelation
5 Multicollinearity
6 Measurement error
1 Bias and efficiency
Unbiasedness
An unbiased estimator of a coefficient $\beta$ is an estimator whose sampling distribution has a mean equal to the true $\beta$, i.e. $E(\hat{\beta}) = \beta$. The bias of an estimator is thus $E(\hat{\beta}) - \beta$.
Best unbiased
Often many estimators can be defined for which $E(\hat{\beta}) - \beta = 0$, i.e. that are unbiased. The best unbiased estimator is the unbiased estimator with the smallest variance. Another term for this is efficiency: an estimator with a smaller standard error is more efficient.
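The definitions above can be made concrete with a small Monte Carlo sketch. It uses only the Python standard library; the data-generating values (intercept 1, slope 2, error standard deviation 3, x = 1, ..., 20) are arbitrary assumptions, and `endpoint_slope` is a hypothetical rival estimator chosen for illustration, not a method from these slides. Both estimators are linear and unbiased, so their simulated means should sit near the true slope, but OLS should show the smaller variance.

```python
import random
import statistics


def ols_slope(x, y):
    """OLS slope from a simple regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def endpoint_slope(x, y):
    """Hypothetical alternative estimator: slope through the first and last point."""
    return (y[-1] - y[0]) / (x[-1] - x[0])


def simulate(n_reps=2000, beta0=1.0, beta1=2.0, sigma=3.0, seed=1):
    """Draw repeated samples from y = beta0 + beta1 * x + e and collect both estimates."""
    rng = random.Random(seed)
    x = [float(i) for i in range(1, 21)]  # fixed design: x = 1, ..., 20
    ols, endpoint = [], []
    for _ in range(n_reps):
        y = [beta0 + beta1 * xi + rng.gauss(0, sigma) for xi in x]
        ols.append(ols_slope(x, y))
        endpoint.append(endpoint_slope(x, y))
    # Mean of each sampling distribution (unbiasedness) and its variance (efficiency)
    return (statistics.fmean(ols), statistics.variance(ols),
            statistics.fmean(endpoint), statistics.variance(endpoint))


mean_ols, var_ols, mean_end, var_end = simulate()
print(f"OLS:      mean {mean_ols:.3f}, variance {var_ols:.4f}")
print(f"endpoint: mean {mean_end:.3f}, variance {var_end:.4f}")
```

With these settings the theoretical sampling variances are $\sigma^2 / \sum_i (x_i - \bar{x})^2 = 9/665 \approx 0.0135$ for OLS and $2\sigma^2 / (x_n - x_1)^2 = 18/361 \approx 0.0499$ for the endpoint estimator, so the simulated values should be close to these.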
If all assumptions underlying OLS hold, OLS is BLUE, i.e. the best linear unbiased estimator of $\beta$.
Mean Squared Error
Although unbiasedness is often considered more important than high efficiency, there is a trade-off between the two. It is often better to have a slightly biased but highly efficient estimator than an unbiased but very inefficient one.
The Mean Squared Error (MSE) of an estimator combines both, weighting the variance and the squared bias equally:
$$\mathrm{MSE}(\hat{\beta}) = E\big[(\hat{\beta} - \beta)^2\big] = \mathrm{Var}(\hat{\beta}) + \big[E(\hat{\beta}) - \beta\big]^2$$
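A minimal sketch of this trade-off, assuming a small noisy sample (true slope 1, error standard deviation 10, x = 1, ..., 10) and a hypothetical shrinkage estimator that multiplies the OLS slope by 0.5. The shrunk estimator is biased, but because its variance is four times smaller its MSE can be lower. The sketch also checks that the simulated MSE splits exactly into variance plus squared bias.

```python
import random
import statistics


def ols_slope(x, y):
    """OLS slope from a simple regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def mse_parts(estimates, true_value):
    """Return (MSE, variance, squared bias) of a set of simulated estimates."""
    mse = statistics.fmean((e - true_value) ** 2 for e in estimates)
    variance = statistics.pvariance(estimates)
    bias_sq = (statistics.fmean(estimates) - true_value) ** 2
    return mse, variance, bias_sq


def simulate(n_reps=4000, beta1=1.0, sigma=10.0, shrink=0.5, seed=2):
    """Compare unbiased OLS with a deliberately biased slope, shrink * OLS."""
    rng = random.Random(seed)
    x = [float(i) for i in range(1, 11)]
    plain, shrunk = [], []
    for _ in range(n_reps):
        y = [beta1 * xi + rng.gauss(0, sigma) for xi in x]
        b = ols_slope(x, y)
        plain.append(b)
        shrunk.append(shrink * b)
    return mse_parts(plain, beta1), mse_parts(shrunk, beta1)


(p_mse, p_var, p_bias2), (s_mse, s_var, s_bias2) = simulate()
print(f"OLS:    MSE {p_mse:.3f} = var {p_var:.3f} + bias^2 {p_bias2:.3f}")
print(f"shrunk: MSE {s_mse:.3f} = var {s_var:.3f} + bias^2 {s_bias2:.3f}")
```

The gain depends on the unknown true value: in theory the MSE is about 1.21 for OLS and 0.55 for the shrunk estimator here, but with a true slope of 2 the same shrinkage gives an MSE of about 1.30, above that of OLS.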
2 Specification
Outliers
An outlier is an observation with a large error, i.e. a point that lies far from the regression line. A point with high leverage lies far from the other points in terms of its value(s) on the independent variable(s).
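Leverage depends only on the independent variable(s). For a simple regression with an intercept, one common measure is the hat value $h_i = 1/n + (x_i - \bar{x})^2 / \sum_j (x_j - \bar{x})^2$. The sketch below computes it, alongside the residuals, for hypothetical data in which the last point lies far out on x but close to the line, i.e. high leverage with a small residual.

```python
def leverage(x):
    """Hat values for a simple regression with an intercept:
    h_i = 1/n + (x_i - mean(x))^2 / sum_j (x_j - mean(x))^2."""
    n = len(x)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    return [1 / n + (xi - mx) ** 2 / sxx for xi in x]


def residuals(x, y):
    """OLS residuals y_i - (b0 + b1 * x_i)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))
    b0 = my - b1 * mx
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# Hypothetical data: the last point lies far out on x but close to the line
x = [3, 4, 5, 6, 7, 8, 20]
y = [7.2, 8.8, 11.1, 12.9, 15.2, 16.8, 41.0]

for xi, h, e in zip(x, leverage(x), residuals(x, y)):
    print(f"x = {xi:2d}  leverage = {h:.3f}  residual = {e:+.3f}")
```

The hat values always sum to the number of estimated parameters (here 2); the last point alone accounts for about 0.92 of that total.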
A high leverage point that strongly influences the regression line is called an influential point.
Outlier, low leverage, low influence
[Figure: y plotted against x]
High leverage, low influence
[Figure: y plotted against x]
High leverage, high influence
[Figure: y plotted against x]
Outliers: solution
“Automatic rejection of outliers is not always a wise procedure. Sometimes the outlier is providing information that other data points cannot due to the fact that it arises from an unusual combination of circumstances which may be of vital interest and requires further investigation rather than rejection. As a general rule, outliers should be rejected out of hand only if they can be traced to causes such as errors of recording the observations or setting up the apparatus [in a physical experiment]. Otherwise, careful investigation is in order.” (Draper & Smith (1998), as cited in Gujarati (2003))
OLS assumptions: specification
• Linear in parameters.
• No extraneous variables in X.
• No omitted independent variables.
• Parameters to be estimated are constant.
• Number of parameters is less than the number of cases, k < n.
Note that this does not mean you cannot include non-linearly transformed variables; e.g. $y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i$ can be estimated with OLS.
Nonlinearity
The estimator assumes a linear relation, since the model is always of the form $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$.
A nonlinear relation can often be made linear, however, by transforming one of the variables, e.g. by taking a square or a log. Other forms of nonlinearity require specialised models.
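As an example of such a transformation, an exponential relation $y = a e^{bx}$ becomes linear after taking logs: $\log y = \log a + bx$. The sketch below assumes hypothetical values $a = 2$, $b = 0.3$ and multiplicative noise, and recovers both parameters with an ordinary simple regression of $\log y$ on $x$.

```python
import math
import random


def ols_fit(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))
    return my - b1 * mx, b1


# Assumed nonlinear relation with multiplicative noise: y = 2 * exp(0.3 * x) * exp(u)
rng = random.Random(4)
x = [0.1 * i for i in range(50)]
y = [2.0 * math.exp(0.3 * xi + rng.gauss(0, 0.05)) for xi in x]

# Taking logs gives log(y) = log(2) + 0.3 * x + u, which is linear in the parameters
a, b = ols_fit(x, [math.log(yi) for yi in y])
print(f"estimated scale {math.exp(a):.3f}, growth rate {b:.3f}")
```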
Omitted relevant variables
A relevant variable is a variable that affects the dependent variable, i.e. one with a non-zero coefficient in the true model. If the omitted variable ($Z$) is correlated with an independent variable ($X$), the estimate $\hat{\beta}_X$ will be biased. If $\bar{Z} \neq 0$, $\hat{\beta}_{\text{intercept}}$ will be biased. If $Z$ is uncorrelated with $X$, $\hat{\beta}_X$ remains unbiased, but its estimated standard error is biased upwards.
Specification tests
F-tests can be used to test a full model against a restricted model.
This is material for a more advanced course.
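The omitted-variable case and the F-test can be illustrated together. This sketch assumes a hypothetical data-generating process $y = 1 + 2x + 3z + \varepsilon$ in which $Z$ is correlated with $X$ (the auxiliary slope of $Z$ on $X$ is 0.8), so omitting $Z$ should push the slope on $X$ towards $2 + 3 \times 0.8 = 4.4$, the usual omitted-variable bias. The F-statistic compares the restricted and full models as $F = [(\mathrm{RSS}_R - \mathrm{RSS}_F)/q] / [\mathrm{RSS}_F/(n-k)]$, with $q$ restrictions and $k$ parameters in the full model. The closed-form two-regressor solution is used only to avoid external libraries.

```python
import random
import statistics


def ols_one(x, y):
    """Simple regression of y on x; returns (slope, residual sum of squares)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sxx
    b0 = my - b1 * mx
    rss = sum((c - b0 - b1 * a) ** 2 for a, c in zip(x, y))
    return b1, rss


def ols_two(x1, x2, y):
    """Regression of y on a constant, x1 and x2 (closed form for two regressors);
    returns (slope on x1, slope on x2, residual sum of squares)."""
    m1, m2, my = (statistics.fmean(v) for v in (x1, x2, y))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    rss = sum((c - b0 - b1 * a - b2 * b) ** 2 for a, b, c in zip(x1, x2, y))
    return b1, b2, rss


# Assumed data-generating process: y = 1 + 2x + 3z + e, with z correlated with x
rng = random.Random(5)
n = 500
x = [rng.gauss(0, 1) for _ in range(n)]
z = [0.8 * xi + rng.gauss(0, 0.6) for xi in x]
y = [1 + 2 * xi + 3 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

b_short, rss_restricted = ols_one(x, y)   # Z omitted (restricted model)
b_full, b_z, rss_full = ols_two(x, z, y)  # Z included (full model)

# F-test: q = 1 restriction (coefficient on Z is zero), k = 3 parameters in the full model
q, k = 1, 3
f_stat = ((rss_restricted - rss_full) / q) / (rss_full / (n - k))
print(f"slope on x, Z omitted:  {b_short:.2f}")
print(f"slope on x, Z included: {b_full:.2f}")
print(f"F = {f_stat:.1f} on ({q}, {n - k}) degrees of freedom")
```

The F-statistic would then be compared with a critical value of the $F(q, n-k)$ distribution; computing that value needs a statistics library and is left out here.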
OLS assumptions: errors
• Errors have an expected value of zero given X.
• Errors are normally distributed.
• Errors have a constant variance.
• Errors are not autocorrelated.
• Errors are not correlated with X.
$E(\varepsilon) = 0$
The mean of the errors is assumed to be zero.
This is violated when
• there is systematic measurement error in the dependent variable
• a relevant variable with a non-zero mean is excluded
• the dependent variable is not continuous or is truncated or censored
• a constant is not included and should have been
If $E(\varepsilon) \neq 0$, the estimate of the intercept is biased.
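A sketch of the intercept effect, assuming a hypothetical systematic error that adds a constant 5 to every observation of the dependent variable, so that the errors have a mean of about 5 instead of 0. The estimated intercept shifts by exactly that constant, while the slope is unaffected.

```python
import random


def ols_fit(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))
    return my - b1 * mx, b1


def shift_demo(shift=5.0, seed=3, n=100):
    """Fit y on x with mean-zero errors, then again after adding a constant
    `shift` to every observation of y (a systematic error in y)."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]
    e = [rng.gauss(0, 1) for _ in range(n)]
    y = [1 + 2 * xi + ei for xi, ei in zip(x, e)]
    y_shifted = [yi + shift for yi in y]  # errors now have a mean of about `shift`
    return ols_fit(x, y), ols_fit(x, y_shifted)


(b0, b1), (b0_s, b1_s) = shift_demo()
print(f"intercept {b0:.3f} -> {b0_s:.3f}, slope {b1:.3f} -> {b1_s:.3f}")
```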
3 Heteroskedasticity
Homoscedasticity
[Figure]
Heteroscedasticity
[Figure]
Residual plots: heteroscedasticity
To detect heteroscedasticity (unequal variances), it is useful to plot:
• Residuals against fitted values
• Residuals against the dependent variable
• Residuals against the independent variable(s)
Residual plots: y by x
[Figure: y plotted against x]
Residual plots: $\varepsilon$ by $\hat{y}$
[Figure: residuals(m) plotted against fitted(m)]
Residual plots: $\varepsilon$ by $y$
[Figure: residuals(m) plotted against y]
Residual plots: $\varepsilon$ by $x$
[Figure: residuals(m) plotted against x]
Residual plots: y by x
[Figure: y plotted against x]
Residual plots: $\varepsilon$ by $\hat{y}$
[Figure: residuals(m) plotted against fitted(m)]
Residual plots: $\varepsilon$ by $y$
[Figure: residuals(m) plotted against y]
Residual plots: $\varepsilon$ by $x$
[Figure: residuals(m) plotted against x]
Residual plots: y by x
[Figure: y plotted against x]
Residual plots: $\varepsilon$ by $\hat{y}$
[Figure: residuals(m) plotted against fitted(m)]
Residual plots: $\varepsilon$ by $y$
[Figure: residuals(m) plotted against y]
Residual plots: $\varepsilon$ by $x$
[Figure: residuals(m) plotted against x]
Heteroscedasticity: effect
With heteroscedastic disturbances, $\hat{\beta}$ will be unbiased but inefficient.
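The claim can be checked with a small simulation, assuming errors whose standard deviation is $0.5x$. Weighted least squares with weights $1/x_i^2$ (proportional to the inverse error variance, and assumed known here, which in practice they rarely are) serves as the efficient benchmark. Both estimators should centre on the true slope of 2, but OLS should show a clearly larger sampling variance.

```python
import random
import statistics


def wls_slope(x, y, w):
    """Weighted least squares slope with weights w (all weights 1 gives OLS)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


def simulate(n_reps=2000, beta1=2.0, seed=6):
    """Errors with standard deviation 0.5 * x: heteroscedastic by construction."""
    rng = random.Random(seed)
    x = [float(i) for i in range(1, 21)]
    ones = [1.0] * len(x)
    inv_var = [1 / xi ** 2 for xi in x]  # weights assumed known in this sketch
    ols, wls = [], []
    for _ in range(n_reps):
        y = [1 + beta1 * xi + rng.gauss(0, 0.5 * xi) for xi in x]
        ols.append(wls_slope(x, y, ones))
        wls.append(wls_slope(x, y, inv_var))
    return (statistics.fmean(ols), statistics.variance(ols),
            statistics.fmean(wls), statistics.variance(wls))


m_ols, v_ols, m_wls, v_wls = simulate()
print(f"OLS: mean {m_ols:.3f}, variance {v_ols:.4f}")
print(f"WLS: mean {m_wls:.3f}, variance {v_wls:.4f}")
```

In theory, the sampling variances under these settings are about 0.064 for OLS and 0.021 for WLS.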
Heteroscedasticity: solution
First, check whether you can see a misspecification (e.g. the relationship is quadratic).
Otherwise, a specialised approach, such as a regression with robust standard errors or a weighted least squares model, can be used.
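As one example of robust standard errors, White's heteroscedasticity-consistent estimator (the basic HC0 variant; statistical software also offers small-sample corrections) replaces the single error variance $s^2$ by each observation's squared residual. The sketch below computes both the conventional and the HC0 standard error of a simple-regression slope; the simulated data are hypothetical.

```python
import math
import random


def slope_with_standard_errors(x, y):
    """Simple OLS slope with the conventional standard error and
    White's heteroscedasticity-consistent (HC0) standard error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Conventional: s^2 / Sxx with s^2 = RSS / (n - 2); assumes a constant error variance
    s2 = sum(ei ** 2 for ei in e) / (n - 2)
    se_conventional = math.sqrt(s2 / sxx)
    # HC0: sum (x_i - mean x)^2 * e_i^2 / Sxx^2; lets the variance differ across cases
    se_hc0 = math.sqrt(sum((xi - mx) ** 2 * ei ** 2 for xi, ei in zip(x, e)) / sxx ** 2)
    return b1, se_conventional, se_hc0


# Hypothetical heteroscedastic sample: error standard deviation grows with x
rng = random.Random(7)
x = [rng.uniform(1, 20) for _ in range(300)]
y = [1 + 2 * xi + rng.gauss(0, 0.5 * xi) for xi in x]
b1, se_conv, se_hc0 = slope_with_standard_errors(x, y)
print(f"slope {b1:.3f}, conventional SE {se_conv:.4f}, HC0 SE {se_hc0:.4f}")
```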
4 Autocorrelation
Autocorrelation
Autocorrelation means that the residuals are correlated with each other. This can occur for various reasons:
• Spatial autocorrelation
• Temporal autocorrelation
• Persistent shocks
• Inertia / psychological conditioning
• Partial adjustments over time
Autocorrelation: consequences
Autocorrelated residuals lead to an inflated $R^2$, and the estimates $\hat{\beta}$ will be unbiased but inefficient.
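The residual plots of $\varepsilon_t$ against $\varepsilon_{t-1}$ can be summarised numerically by the lag-1 correlation of the residuals, or by the Durbin-Watson statistic $d = \sum_{t=2}^{n} (e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$, which is close to 2 without autocorrelation and falls towards 0 under positive autocorrelation. The sketch below assumes a hypothetical trend model with first-order autoregressive errors; `e[1:]` and `e[:-1]` play the role of `residuals(m)[-1]` and `residuals(m)[-n]` in the plot labels.

```python
import math
import random


def ols_residuals(x, y):
    """Residuals from a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    b0 = my - b1 * mx
    return [b - (b0 + b1 * a) for a, b in zip(x, y)]


def durbin_watson(e):
    """Durbin-Watson statistic: sum (e_t - e_{t-1})^2 / sum e_t^2."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(et ** 2 for et in e)


def lag1_correlation(e):
    """Correlation between e_t and e_{t-1}: the pattern in the lag plot."""
    a, b = e[1:], e[:-1]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    return cov / math.sqrt(sum((ai - ma) ** 2 for ai in a) * sum((bi - mb) ** 2 for bi in b))


def simulate_series(rho, n=200, seed=8):
    """y_t = 1 + 0.5 t + u_t with AR(1) errors u_t = rho * u_{t-1} + v_t."""
    rng = random.Random(seed)
    u = rng.gauss(0, 1 / math.sqrt(1 - rho ** 2))  # start from the stationary distribution
    t_vals, y = [], []
    for t in range(n):
        if t > 0:
            u = rho * u + rng.gauss(0, 1)
        t_vals.append(float(t))
        y.append(1 + 0.5 * t + u)
    return t_vals, y


for rho in (0.0, 0.8):
    e = ols_residuals(*simulate_series(rho))
    print(f"rho = {rho}: DW = {durbin_watson(e):.2f}, lag-1 correlation = {lag1_correlation(e):.2f}")
```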
Residual plots: y by x
[Figure: y plotted against x]
Residual plots: $\varepsilon_t$ by $\varepsilon_{t-1}$
[Figure: residuals(m)[-1] plotted against residuals(m)[-n], i.e. each residual against the previous one]
Residual plots: y by x
[Figure: y plotted against x]
Residual plots: $\varepsilon_t$ by $\varepsilon_{t-1}$
[Figure: residuals(m)[-1] plotted against residuals(m)[-n], i.e. each residual against the previous one]
Autocorrelation: solution
Estimating a regression model with autocorrelation is possible, but not straightforward, and many solutions exist. The fields primarily concerned with such data are time-series analysis and spatial econometrics.
Both are well beyond this course.
5 Multicollinearity
OLS assumptions: regressors
• X varies.
• X is of full column rank (note: requires k < n).
• No measurement error in X.
• No endogenous variables in X.
Detecting multicollinearity
To detect multicollinearity problems, you can simply regress each independent variable on all other independent variables, or look at the variance inflation factors (VIFs), which are based on the $R^2$ of these auxiliary regressions:
$$\mathrm{VIF}_k = \frac{1}{1 - R_k^2},$$
with $R_k^2$ the $R^2$ of the $k$th auxiliary regression.
Multicollinearity: solution
If you have very high multicollinearity, you might simply decide to drop one of the offending variables, since it apparently does not provide much additional information. If it still provides some information, however, the estimates will be biased.
Alternatives are to leave the model as is, or to develop an index combining multiple variables.
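The VIF is easy to compute in the two-regressor case, where each auxiliary regression has a single predictor and $R_k^2$ equals the squared correlation between the two regressors. This is a simplified sketch under that assumption (with more regressors, each auxiliary regression needs a full multiple regression); the data are simulated.

```python
import random


def r_squared_simple(x, y):
    """R^2 of a simple regression of y on x (equal to their squared correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def vif_two_regressors(x1, x2):
    """VIF_k = 1 / (1 - R_k^2). With only two regressors each auxiliary regression
    has one predictor, so both VIFs are equal."""
    return 1 / (1 - r_squared_simple(x2, x1))


rng = random.Random(9)
x1 = [rng.gauss(0, 1) for _ in range(200)]
x2_related = [a + rng.gauss(0, 0.1) for a in x1]      # nearly collinear with x1
x2_unrelated = [rng.gauss(0, 1) for _ in range(200)]  # independent of x1
print(f"VIF, nearly collinear: {vif_two_regressors(x1, x2_related):.1f}")
print(f"VIF, unrelated:        {vif_two_regressors(x1, x2_unrelated):.2f}")
```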
6 Measurement error
Measurement error
For measurement error, the important distinctions are between systematic and random measurement error, and between measurement error in the dependent and in the independent variables. The consequences for the estimation differ in each case.
Measurement error
A systematic error in the dependent variable leads to a biased estimate of $\hat{\beta}_{\text{intercept}}$:
[Figure: Y plotted against X]
Measurement error
A systematic error in the independent variable leads to a biased estimate of $\hat{\beta}_{\text{intercept}}$:
[Figure: Y plotted against X]
Measurement error
A random error in the dependent variable leads to a less efficient estimate of $\hat{\beta}$:
[Figure: Y plotted against X]
Measurement error
A random error in the independent variable leads to a biased estimate of $\hat{\beta}$:
[Figure: Y plotted against X]
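The bias from random error in an independent variable is an attenuation towards zero: in the classical errors-in-variables setting, the OLS slope converges to $\beta \cdot \mathrm{Var}(X) / [\mathrm{Var}(X) + \mathrm{Var}(u)]$, where $u$ is the measurement error. The sketch below assumes hypothetical values ($\beta = 2$, $\mathrm{Var}(X) = \mathrm{Var}(u) = 1$), so the slope with error in $X$ should be near 1, while random error in $Y$ leaves the slope near 2 and only makes it noisier.

```python
import random


def ols_slope(x, y):
    """OLS slope from a simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


rng = random.Random(10)
n, beta = 5000, 2.0
x_true = [rng.gauss(0, 1) for _ in range(n)]
y_true = [beta * a + rng.gauss(0, 1) for a in x_true]

# Random measurement error: variance 1 added to X, or variance 4 added to Y
x_obs = [a + rng.gauss(0, 1) for a in x_true]
y_obs = [b + rng.gauss(0, 2) for b in y_true]

b_error_in_x = ols_slope(x_obs, y_true)  # expected near 2 * 1 / (1 + 1) = 1
b_error_in_y = ols_slope(x_true, y_obs)  # expected near 2, only with more noise
print(f"error in X: {b_error_in_x:.3f}; error in Y: {b_error_in_y:.3f}")
```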