Regression diagnostics & model fit

Johan A. Elkink
School of Politics & International Relations
University College Dublin

18 November 2019

Outline

1 Bias and efficiency
2 Specification
3 Heteroskedasticity
4 Autocorrelation
5 Multicollinearity
6 Measurement error

1 Bias and efficiency

Unbiasedness

An unbiased estimator of a coefficient $\beta$ is an estimator for which the mean of its sampling distribution is identical to the true $\beta$, i.e. $E(\hat{\beta}) = \beta$.

The bias of an estimator is thus $E(\hat{\beta}) - \beta$.

Best unbiased estimator

Often, many estimators could be defined whereby $E(\hat{\beta}) - \beta = 0$, i.e. that are unbiased. The best unbiased estimator is the unbiased estimator with the smallest variance.

Another term for this is efficiency: a smaller variance means a more efficient estimator.

If all assumptions underlying OLS hold, OLS is BLUE, i.e. the best linear unbiased estimator of $\beta$.
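The difference between unbiasedness and efficiency can be illustrated with a small simulation. The sketch below is an illustration only and is not taken from the slides: it estimates a mean rather than a regression coefficient, and the true value, sample size, and function names are assumptions chosen for the example. Both estimators are unbiased, but the one that uses all observations has a much smaller variance across repeated samples.

```python
import random
import statistics

TRUE_MEAN = 3.0  # hypothetical "true" parameter for this illustration


def simulate(n_obs=20, n_reps=5000, sigma=2.0, seed=1):
    """Draw repeated samples and apply two unbiased estimators of the mean."""
    rng = random.Random(seed)
    mean_estimates, first_estimates = [], []
    for _ in range(n_reps):
        sample = [rng.gauss(TRUE_MEAN, sigma) for _ in range(n_obs)]
        mean_estimates.append(sum(sample) / n_obs)  # uses all observations
        first_estimates.append(sample[0])           # uses one observation only
    return mean_estimates, first_estimates


mean_estimates, first_estimates = simulate()

# Bias: average estimate across samples minus the true value.
bias_mean = statistics.mean(mean_estimates) - TRUE_MEAN
bias_first = statistics.mean(first_estimates) - TRUE_MEAN

# Efficiency: variance of the estimates across samples (smaller is better).
var_mean = statistics.variance(mean_estimates)
var_first = statistics.variance(first_estimates)

print(f"bias:     sample mean = {bias_mean:.3f}, first obs = {bias_first:.3f}")
print(f"variance: sample mean = {var_mean:.3f}, first obs = {var_first:.3f}")
```

The same logic carries over to regression: among the linear unbiased estimators of $\beta$, OLS is the one with the smallest variance when its assumptions hold.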

Although unbiasedness is often considered more important than high efficiency, there is a certain trade-off between the two. It is better to have a slightly biased but highly efficient estimator than an unbiased but very inefficient estimator.

The Mean Squared Error (MSE) is the expected squared error of an estimator; it gives equal weight to the variance and the squared bias.
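The trade-off can be written down explicitly. The standard decomposition of the MSE, stated here in the notation of the slides, shows how variance and squared bias enter with equal weight:

```latex
\mathrm{MSE}(\hat{\beta})
  = E\big[(\hat{\beta} - \beta)^2\big]
  = \mathrm{Var}(\hat{\beta}) + \big[E(\hat{\beta}) - \beta\big]^2
```

A slightly biased estimator with a small variance can therefore have a lower MSE than an unbiased estimator with a large variance.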

2 Specification

Outliers

An outlier is an observation for which the error is large, i.e. a point that lies far from the regression line.

A point with high leverage is located far from the other points in terms of its x value.

A high leverage point that strongly influences the regression line is called an influential point.

[Figures: three scatter plots of y against x, titled "Outlier, low leverage, low influence", "High leverage, low influence", and "High leverage, high influence".]

Outliers: solution

“Automatic rejection of outliers is not always a wise procedure. Sometimes the outlier is providing information that other data points cannot due to the fact that it arises from an unusual combination of circumstances which may be of vital interest and requires further investigation rather than rejection. As a general rule, outliers should be rejected out of hand only if they can be traced to causes such as errors of recording the observations or setting up the apparatus [in a physical experiment]. Otherwise, careful investigation is in order.” (Draper & Smith (1998), as cited in Gujarati (2003))
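Leverage and influence can be quantified before deciding what to do with a suspicious point. The sketch below is one possible illustration, not the method used in the slides: it computes hat values (leverage) and Cook's distance (a common influence measure, introduced here as an assumption) for a simple regression. The data and function names are made up for the example; the last observation lies far from the others in x and does not follow their trend.

```python
def fit_line(x, y):
    """OLS intercept and slope for a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def outlier_diagnostics(x, y):
    """Residuals, leverage (hat values) and Cook's distance per observation."""
    n, p = len(x), 2  # p = number of estimated coefficients
    intercept, slope = fit_line(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - p)  # residual variance
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    cooks = [(e ** 2 / (p * s2)) * h / (1 - h) ** 2
             for e, h in zip(residuals, leverage)]
    return residuals, leverage, cooks


# Hypothetical data: the last point is far from the rest in x and off-trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 20.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 15.0]

residuals, leverage, cooks = outlier_diagnostics(x, y)
for i, (e, h, d) in enumerate(zip(residuals, leverage, cooks)):
    print(f"obs {i}: residual={e:6.2f}  leverage={h:.3f}  Cook's D={d:8.2f}")
```

Note that the influential point need not have the largest residual: it pulls the line towards itself, which is why leverage and influence are worth inspecting alongside the residuals.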

OLS assumptions: specification

• Linear in parameters.
• No extraneous variables in X.
• No omitted independent variables.
• Parameters to be estimated are constant.
• Number of parameters is less than the number of cases, k < n.

Note that linearity in parameters does not imply that you cannot include non-linearly transformed variables, e.g. $y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i$ can be estimated with OLS.

Nonlinearity

The OLS estimator assumes a linear relation, since the model is always of the form $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$.

A nonlinear relation can often be made linear, however, by transforming one of the variables, e.g. by taking a square or a log. Other forms of nonlinearity require specialised models.
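To see that a squared term poses no problem for OLS, the sketch below fits $y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i$ by treating $x_i^2$ as just another column of X. It is a minimal illustration under stated assumptions: the data are noise-free and invented for the example, and the helper functions `solve` and `ols` are hypothetical names that solve the normal equations directly rather than calling a statistics library.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def ols(columns, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(columns)
    xtx = [[sum(u * v for u, v in zip(columns[i], columns[j]))
            for j in range(k)] for i in range(k)]
    xty = [sum(u * v for u, v in zip(columns[i], y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data generated from y = 1 + 2x + 0.5x^2 (no noise).
x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi + 0.5 * xi ** 2 for xi in x]

ones = [1.0] * len(x)              # intercept column
x_squared = [xi ** 2 for xi in x]  # non-linear transformation of x

b0, b1, b2 = ols([ones, x, x_squared], y)
print(f"b0={b0:.3f}, b1={b1:.3f}, b2={b2:.3f}")
```

The model is non-linear in x but still linear in the parameters, which is all the assumption requires.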

Omitted relevant variables

A relevant variable is a variable that is correlated with the dependent variable.

If the omitted variable (Z) is correlated with an independent variable (X), the estimate $\hat{\beta}_X$ will be biased. If $\bar{Z} \neq 0$, $\hat{\beta}_{\text{intercept}}$ will be biased.

If Z is uncorrelated with X, the estimated standard error for $\hat{\beta}_X$ is biased upwards.

Specification tests

F-tests can be used to test a full model against a restricted model.

This is material for a more advanced course.
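The omitted-variable bias described above can be made concrete with the textbook formula for a single omitted variable. The notation ($\beta_Z$ for the coefficient on the omitted variable) is an assumption added for this illustration, and the covariance and variance should be read as sample moments or as a large-sample approximation:

```latex
y_i = \beta_0 + \beta_X x_i + \beta_Z z_i + \varepsilon_i
\quad\Longrightarrow\quad
E(\hat{\beta}_X) \approx \beta_X + \beta_Z \,
  \frac{\mathrm{Cov}(x, z)}{\mathrm{Var}(x)}
  \quad \text{when } z \text{ is omitted}
```

The bias vanishes if $\beta_Z = 0$ (Z is not relevant) or if $\mathrm{Cov}(x, z) = 0$ (Z is uncorrelated with X), which matches the two cases distinguished above.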


OLS assumptions: errors

• Errors have an expected value of zero given X.
• Errors are normally distributed.
• Errors have a constant variance.
• Errors are not autocorrelated.
• Errors are not correlated with X.

$E(\varepsilon) = 0$

The mean of the errors is assumed to be zero.

This is violated when

• there is systematic measurement error in the dependent variable
• a relevant variable with a non-zero mean is excluded
• the dependent variable is not continuous or is truncated or censored
• a constant is not included and should have been

If $E(\varepsilon) \neq 0$, the estimate of the intercept is biased.
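Why only the intercept is affected can be seen by rewriting the model. The sketch below assumes a constant non-zero error mean $\mu$ (a symbol introduced here for illustration) that does not depend on $x$:

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad E(\varepsilon_i) = \mu \neq 0
\quad\Longrightarrow\quad
y_i = (\beta_0 + \mu) + \beta_1 x_i + (\varepsilon_i - \mu), \qquad E(\varepsilon_i - \mu) = 0
```

OLS then estimates $\beta_0 + \mu$ instead of $\beta_0$, while the slope is unaffected under this assumption.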

3 Heteroskedasticity

Residual plots: heteroskedasticity

To detect heteroskedasticity (unequal variance), it is useful to plot:

• Residuals against fitted values
• Residuals against dependent variable
• Residuals against independent variable(s)

[Figures: three example data sets, each shown in four panels: "y by x"; "ε by ŷ" (residuals(m) against fitted(m)); "ε by y" (residuals(m) against y); "ε by x" (residuals(m) against x).]

Heteroskedasticity: effect

With heteroskedastic disturbances, $\hat{\beta}$ will be unbiased but inefficient.

Heteroskedasticity: solution

First, check whether you can see a misspecification (e.g. the relationship is quadratic).

Otherwise, some specialised model, like a regression with robust standard errors or a weighted least squares model, can be used.
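As an illustration of the robust standard errors mentioned above, the sketch below compares the conventional standard error of the slope with one common heteroskedasticity-robust variant (often labelled HC0, or White's estimator) in a simple regression. The choice of HC0, the simulated data, and the function names are assumptions made for this example; the slides do not specify a particular variant.

```python
import math
import random


def ols_line(x, y):
    """OLS intercept, slope and residuals for a simple regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, residuals


def slope_standard_errors(x, residuals):
    """Conventional and HC0-robust standard errors of the slope."""
    n = len(x)
    x_bar = sum(x) / n
    dev = [xi - x_bar for xi in x]
    sxx = sum(d ** 2 for d in dev)
    # Conventional: assumes one common error variance s^2.
    s2 = sum(e ** 2 for e in residuals) / (n - 2)
    conventional = math.sqrt(s2 / sxx)
    # HC0: lets each observation contribute its own squared residual.
    meat = sum(d ** 2 * e ** 2 for d, e in zip(dev, residuals))
    robust = math.sqrt(meat) / sxx
    return conventional, robust


# Simulated data in which the error spread grows with x (heteroskedastic).
rng = random.Random(42)
x = [0.05 * i for i in range(1, 201)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.5 * xi) for xi in x]

intercept, slope, residuals = ols_line(x, y)
conventional_se, robust_se = slope_standard_errors(x, residuals)
print(f"slope = {slope:.3f}")
print(f"conventional SE = {conventional_se:.4f}, robust SE = {robust_se:.4f}")
```

The coefficient estimate is the same in both cases; only the standard error, and hence any inference based on it, changes.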

4 Autocorrelation

Autocorrelation refers to the fact that residuals might be correlated with each other. This can occur for various reasons:

• Spatial autocorrelation
• Temporal autocorrelation
• Persistent shocks
• Inertia / psychological conditioning
• Partial adjustments over time

Autocorrelation: consequences

Autocorrelated residuals lead to an inflated $R^2$, and the estimates $\hat{\beta}$ will be unbiased but inefficient.

[Figures: two example data sets, each shown as "y by x" and as "ε_t by ε_{t−1}" (residuals(m)[−1] against residuals(m)[−n]).]

Autocorrelation: solution

Estimating a regression model with autocorrelation is possible, but not straightforward. Many solutions exist. The fields that are of primary concern when dealing with such data are time-series analysis and spatial econometrics.

Both are well beyond this course.
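Detecting temporal autocorrelation, at least, is simple. The numerical counterpart of plotting $\varepsilon_t$ against $\varepsilon_{t-1}$ is the lag-1 correlation of the residuals. The sketch below is a minimal illustration with simulated data: the AR(1) error process, the parameter values, and the function names are assumptions made for the example.

```python
import random


def fit_line(x, y):
    """OLS intercept and slope for a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope


def lag1_correlation(residuals):
    """Correlation between e_t and e_{t-1} (OLS residuals have mean zero)."""
    numerator = sum(a * b for a, b in zip(residuals[1:], residuals[:-1]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def simulate_residuals(rho, n=500, seed=7):
    """Residuals from regressing y on time when errors follow an AR(1) process."""
    rng = random.Random(seed)
    t = [float(i) for i in range(n)]
    errors, previous = [], 0.0
    for _ in t:
        previous = rho * previous + rng.gauss(0.0, 1.0)  # persistent shock
        errors.append(previous)
    y = [1.0 + 0.5 * ti + e for ti, e in zip(t, errors)]
    intercept, slope = fit_line(t, y)
    return [yi - intercept - slope * ti for ti, yi in zip(t, y)]


r_ar1 = lag1_correlation(simulate_residuals(rho=0.8))  # autocorrelated errors
r_iid = lag1_correlation(simulate_residuals(rho=0.0))  # independent errors
print(f"lag-1 correlation: AR(1) errors = {r_ar1:.2f}, iid errors = {r_iid:.2f}")
```

A lag-1 correlation far from zero corresponds to the elongated cloud in the $\varepsilon_t$ by $\varepsilon_{t-1}$ plots; a value near zero corresponds to a shapeless cloud.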

5 Multicollinearity

OLS assumptions: regressors

• X varies
• X is of full column rank (note: requires k < n)
• No measurement error in X
• No endogenous variables in X

Detecting multicollinearity

To detect multicollinearity problems, you can simply regress each independent variable on all other independent variables, or look at the variance inflation factors (VIFs), which are based on the $R^2$ of these auxiliary regressions:

$$\mathrm{VIF}_k = \frac{1}{1 - R_k^2},$$

with $R_k^2$ the $R^2$ for the $k$th auxiliary regression.

Multicollinearity: solution

If you have very high multicollinearity, you might simply decide to drop one of the offending variables. It is apparently not providing a lot of additional information. If it is still providing some information, the resulting estimates will be biased.

Alternatives are to leave the model as is, or develop an index combining multiple variables.
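The VIF formula can be tried out on a small example. The sketch below covers only the special case of exactly two regressors, where the $R^2$ of the auxiliary regression equals the squared correlation between them; with more regressors a full auxiliary regression per variable is needed. The data and function names are made up for the illustration.

```python
def correlation(a, b):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / (saa * sbb) ** 0.5


def vif_two_regressors(x1, x2):
    """VIF = 1 / (1 - R^2), with R^2 the squared correlation of x1 and x2."""
    r_squared = correlation(x1, x2) ** 2
    return 1.0 / (1.0 - r_squared)


x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2_collinear = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]     # nearly a copy of x1
x2_unrelated = [1.0, -1.0, 0.0, 0.0, -1.0, 1.0]   # uncorrelated with x1

vif_collinear = vif_two_regressors(x1, x2_collinear)
vif_unrelated = vif_two_regressors(x1, x2_unrelated)
print(f"VIF (nearly collinear) = {vif_collinear:.1f}")
print(f"VIF (uncorrelated)     = {vif_unrelated:.1f}")
```

A VIF of 1 means the regressor shares no linear information with the other; the VIF grows without bound as $R_k^2$ approaches 1.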

6 Measurement error

For measurement error, important distinctions are between systematic and random measurement error and between measurement errors in dependent and in independent variables. The consequences for the estimation are different.

A systematic error in the dependent variable leads to a biased estimate of $\hat{\beta}_{\text{intercept}}$.

A systematic error in the independent variable leads to a biased estimate of $\hat{\beta}_{\text{intercept}}$.

A random error in the dependent variable leads to a less efficient estimate of $\hat{\beta}$.

A random error in the independent variable leads to a biased estimate of $\hat{\beta}$.

[Figures: each of the four cases is illustrated with a scatter plot of Y against X.]
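For the last case, random error in an independent variable, the direction of the bias is known under the classical errors-in-variables assumptions. The sketch below assumes a simple regression in which the observed regressor is $x_i^* = x_i + u_i$, with $u_i$ uncorrelated with both $x_i$ and $\varepsilon_i$; the symbols $x^*$, $u$, $\sigma_x^2$ and $\sigma_u^2$ are introduced here for illustration.

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad x_i^* = x_i + u_i
\quad\Longrightarrow\quad
\operatorname{plim} \hat{\beta}_1 = \beta_1 \, \frac{\sigma_x^2}{\sigma_x^2 + \sigma_u^2}
```

Because the ratio lies between 0 and 1, the estimated slope is pulled towards zero, and the more so the noisier the measurement.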