Regression Diagnostics & Model

Regression diagnostics & model fit
Johan A. Elkink
School of Politics & International Relations, University College Dublin
18 November 2019

Outline
1 Bias and efficiency
2 Specification
3 Heteroskedasticity
4 Autocorrelation
5 Multicollinearity
6 Measurement error

Unbiasedness

An unbiased estimator of a coefficient $\beta$ is an estimator for which the mean of the sampling distribution equals the true $\beta$, i.e. $E(\hat{\beta}) = \beta$.

The bias of an estimator is thus $E(\hat{\beta}) - \beta$.

Best unbiased

Often, many estimators can be defined for which $E(\hat{\beta}) - \beta = 0$, i.e. that are unbiased. The best unbiased estimator is the unbiased estimator with the smallest variance.

Another term for this is efficiency: a smaller standard error means a more efficient estimator.

If all assumptions underlying OLS hold, OLS is BLUE, i.e. the best linear unbiased estimator of $\beta$.

Mean Squared Error

Although unbiasedness is often considered more important than high efficiency, there is a trade-off between the two. A slightly biased but highly efficient estimator can be preferable to an unbiased but very inefficient one.
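The trade-off between bias and efficiency can be illustrated with a small simulation. The sketch below is not from the slides: it assumes normally distributed data and compares two estimators of a variance, one dividing the sum of squares by n - 1 (unbiased) and one dividing by n + 1 (biased towards zero, but less variable). The function name `simulate` and all parameter values are illustrative choices. The mean squared error is computed as the variance of the estimates plus their squared bias.

```python
import random
import statistics

def simulate(n=5, reps=20000, sigma2=1.0, seed=1):
    """Compare an unbiased and a biased estimator of the variance sigma2.

    Illustrative assumptions: normal data, small samples of size n.
    """
    rng = random.Random(seed)
    unbiased, shrunk = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
        mean = sum(sample) / n
        ss = sum((v - mean) ** 2 for v in sample)
        unbiased.append(ss / (n - 1))  # unbiased estimator
        shrunk.append(ss / (n + 1))    # biased, but smaller variance
    results = {}
    for name, estimates in (("unbiased", unbiased), ("shrunk", shrunk)):
        bias = statistics.fmean(estimates) - sigma2
        variance = statistics.pvariance(estimates)
        results[name] = {
            "bias": bias,
            "variance": variance,
            "mse": variance + bias ** 2,  # MSE = variance + squared bias
        }
    return results

results = simulate()
for name, summary in results.items():
    print(name, summary)
```

Under these assumptions the biased estimator should show a clearly non-zero bias but a smaller mean squared error than the unbiased one, which is the sense in which some bias can be worth accepting for a gain in efficiency.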
The Mean Squared Error (MSE) of an estimator combines its variance and its squared bias with equal weights: $\mathrm{MSE}(\hat{\beta}) = \mathrm{Var}(\hat{\beta}) + [E(\hat{\beta}) - \beta]^2$.

Outliers

An outlier is an observation whose error, i.e. its vertical distance from the regression line, is large.

A point with high leverage is located far from the other points on the independent variable(s).

A high leverage point that strongly influences the regression line is called an influential point.

Outlier, low leverage, low influence
[Figure: scatterplot of y against x]

High leverage, low influence
[Figure: scatterplot of y against x]

High leverage, high influence
[Figure: scatterplot of y against x]

Outliers: solution

"Automatic rejection of outliers is not always a wise procedure. Sometimes the outlier is providing information that other data points cannot due to the fact that it arises from an unusual combination of circumstances which may be of vital interest and requires further investigation rather than rejection. As a general rule, outliers should be rejected out of hand only if they can be traced to causes such as errors of recording the observations or setting up the apparatus [in a physical experiment]. Otherwise, careful investigation is in order." (Draper & Smith (1998), as cited in Gujarati (2003))

OLS assumptions: specification

• Linear in parameters.
• No extraneous variables in X.
• No omitted independent variables.
• Parameters to be estimated are constant.
• Number of parameters is less than the number of cases, k < n.

Note that linearity in parameters does not imply that you cannot include non-linearly transformed variables, e.g. $y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i$ can be estimated with OLS.

Nonlinearity

The estimator assumes a linear relation, since the model is always of the form $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$.

A nonlinear relation can often be made linear, however, by transforming one of the variables, e.g. by taking a square or a log.

Other forms of nonlinearity require specialised models.

Omitted relevant variables

A relevant variable is a variable that is correlated with the dependent variable.

If the omitted variable (Z) is correlated with an independent variable (X), the estimate $\hat{\beta}_X$ will be biased. If $\bar{Z} \neq 0$, $\hat{\beta}_{\text{intercept}}$ will be biased.

If Z is uncorrelated with X, the estimated standard error for $\hat{\beta}_X$ is biased upwards.
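The effect of omitting a relevant variable that is correlated with X can be checked by simulation. The following is a minimal sketch under assumed values that are not from the slides: the true model is y = 1 + 2x + 3z + error, and x is constructed so that cov(x, z) = 0.5 and var(x) = 1.25. The helper names (`centered_cross_product`, `slope_short`, `slope_full`) are hypothetical. In large samples the slope of the short regression (omitting z) should approach 2 + 3 · 0.5 / 1.25 = 3.2, while the full regression should recover a slope near 2.

```python
import random
import statistics

def centered_cross_product(a, b):
    """Sum of (a_i - mean(a)) * (b_i - mean(b))."""
    a_bar, b_bar = statistics.fmean(a), statistics.fmean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))

def slope_short(x, y):
    """OLS slope on x when z is omitted from the model."""
    return centered_cross_product(x, y) / centered_cross_product(x, x)

def slope_full(x, z, y):
    """OLS slope on x when z is included (two-regressor normal equations)."""
    sxx = centered_cross_product(x, x)
    szz = centered_cross_product(z, z)
    sxz = centered_cross_product(x, z)
    sxy = centered_cross_product(x, y)
    szy = centered_cross_product(z, y)
    return (szz * sxy - sxz * szy) / (sxx * szz - sxz ** 2)

rng = random.Random(7)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [0.5 * zi + rng.gauss(0, 1) for zi in z]  # x is correlated with z
# Assumed true model: intercept 1, slope 2 on x, slope 3 on z
y = [1 + 2 * xi + 3 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

b_short = slope_short(x, y)
b_full = slope_full(x, z, y)
print("slope on x, z omitted: ", b_short)
print("slope on x, z included:", b_full)
```

The gap between the two slopes equals the coefficient of the omitted variable times the slope from regressing z on x, which is why the bias disappears when Z and X are uncorrelated.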
Specification tests

F-tests can be used to test a full model against a restricted model.

This is material for a more advanced course.

OLS assumptions: errors

• Errors have an expected value of zero given X.
• Errors are normally distributed.
• Errors have a constant variance.
• Errors are not autocorrelated.
• Errors are not correlated with X.

$E(\varepsilon) = 0$

The mean of the errors is assumed to be zero. This is violated when

• there is systematic measurement error in the dependent variable
• a relevant variable with a non-zero mean is excluded
• the dependent variable is not continuous or is truncated or censored
• a constant is not included and should have been

If $E(\varepsilon) \neq 0$, the estimate of the intercept is biased.

Homoscedasticity
[Figure]

Heteroscedasticity
[Figure]

Residual plots: heteroscedasticity

To detect heteroscedasticity (unequal variances), it is useful to plot:

• Residuals against fitted values
• Residuals against dependent variable
• Residuals against independent variable(s)

Residual plots: y by x
[Figure: scatterplot of y against x]

Residual plots: $\varepsilon$ by $\hat{y}$
[Figure: residuals(m) plotted against fitted(m)]

Residual plots: $\varepsilon$ by $y$
[Figure: residuals(m) plotted against y]
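A plot of residuals against fitted values can be complemented by a simple numeric check. The sketch below is an illustration rather than a formal test: it simulates data in which the error standard deviation grows with x (an assumed data-generating process, not the one behind the slides), fits a simple OLS line, and compares the spread of the residuals in the lower and upper halves of the fitted values. The function names are hypothetical.

```python
import random
import statistics

def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def residual_spread_by_fitted(x, y):
    """Residual standard deviation in the lower and upper half of fitted values."""
    a, b = fit_simple_ols(x, y)
    # Pair each fitted value with its residual, then sort by fitted value
    pairs = sorted((a + b * xi, yi - (a + b * xi)) for xi, yi in zip(x, y))
    half = len(pairs) // 2
    low = [res for _, res in pairs[:half]]
    high = [res for _, res in pairs[half:]]
    return statistics.pstdev(low), statistics.pstdev(high)

rng = random.Random(42)
x = [rng.uniform(0, 8) for _ in range(500)]
# Assumed heteroscedastic errors: standard deviation 0.5 + x
y = [2 + 3 * xi + rng.gauss(0, 0.5 + xi) for xi in x]

sd_low, sd_high = residual_spread_by_fitted(x, y)
print("residual sd, low fitted values: ", sd_low)
print("residual sd, high fitted values:", sd_high)
```

With homoscedastic errors the two standard deviations should be similar; a markedly larger spread at high fitted values corresponds to the fan shape seen in a residual plot.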
