Chapter 4: Model Adequacy Checking
In this chapter, we discuss some introductory aspects of model adequacy checking, including:
• Residual analysis,
• Residual plots,
• Detection and treatment of outliers,
• The PRESS statistic,
• Testing for lack of fit.

The major assumptions that we have made in regression analysis are:
1. The relationship between the response $Y$ and the regressors is linear, at least approximately.
2. The error term $\varepsilon$ has zero mean.
3. The error term $\varepsilon$ has constant variance $\sigma^2$.
4. The errors are uncorrelated.
5. The errors are normally distributed.

Assumptions 4 and 5 together imply that the errors are independent. Recall that assumption 5 is required for hypothesis testing and interval estimation.

Residual Analysis: The residuals $e_1, e_2, \ldots, e_n$ have the following important properties:

(a) The mean of the $e_i$ is 0.

(b) The estimate of the population variance computed from the $n$ residuals is
$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} (e_i - \bar{e})^2}{n - p} = \frac{\sum_{i=1}^{n} e_i^2}{n - p} = \frac{SS_{Res}}{n - p} = MS_{Res}.$$

(c) Since the $e_i$ sum to zero, they are not independent. However, if the number of residuals ($n$) is large relative to the number of parameters ($p$), the dependency effect can be ignored in an analysis of residuals.

Standardized Residual: The quantity
$$d_i = \frac{e_i}{\sqrt{MS_{Res}}}, \qquad i = 1, 2, \ldots, n,$$
is called the standardized residual. The standardized residuals have mean zero and approximately unit variance. A large standardized residual ($|d_i| > 3$) potentially indicates an outlier.

Recall that
$$e = (I - H)Y = (I - H)(X\beta + \varepsilon) = (I - H)\varepsilon.$$
Therefore,
$$\operatorname{Var}(e) = \operatorname{Var}[(I - H)\varepsilon] = (I - H)\operatorname{Var}(\varepsilon)(I - H)' = \sigma^2 (I - H),$$
so $\operatorname{Var}(e_i) = \sigma^2 (1 - h_{ii})$, where $h_{ii}$ is the $i$th diagonal element of the hat matrix $H = X(X'X)^{-1}X'$.

Studentized Residual: The quantity
$$t_i = \frac{e_i}{\sqrt{MS_{Res}(1 - h_{ii})}} = \frac{e_i}{\sqrt{S^2 (1 - h_{ii})}}, \qquad i = 1, 2, \ldots, n,$$
is called the studentized residual. The studentized residuals have approximately a Student's $t$ distribution with $n - p$ degrees of freedom.

PRESS Residual: If we delete the $i$th observation, fit the regression model to the remaining $n - 1$ observations, and calculate the predicted value $\hat{y}_{(i)}$ corresponding to the deleted observation $y_i$, the corresponding prediction error is
$$e_{(i)} = y_i - \hat{y}_{(i)}.$$
These prediction errors are usually called PRESS residuals or deleted residuals. Generally, a large difference between the ordinary residual and the PRESS residual indicates a point where the model fits the data well, but a model built without that point predicts poorly. It can be shown that
$$e_{(i)} = \frac{e_i}{1 - h_{ii}}.$$
Therefore,
$$\operatorname{Var}(e_{(i)}) = \operatorname{Var}\!\left(\frac{e_i}{1 - h_{ii}}\right) = \frac{\operatorname{Var}(e_i)}{(1 - h_{ii})^2} = \frac{\sigma^2 (1 - h_{ii})}{(1 - h_{ii})^2} = \frac{\sigma^2}{1 - h_{ii}}.$$
Note that a standardized PRESS residual is
$$\frac{e_{(i)}}{\sqrt{\operatorname{Var}(e_{(i)})}} = \frac{e_i / (1 - h_{ii})}{\sqrt{\sigma^2 / (1 - h_{ii})}} = \frac{e_i}{\sqrt{\sigma^2 (1 - h_{ii})}},$$
which, if we use $MS_{Res}$ to estimate $\sigma^2$, is just the studentized residual.

R-student Residual: The quantity
$$r_i = \frac{e_i}{\sqrt{S_{(-i)}^2 (1 - h_{ii})}}, \qquad i = 1, 2, \ldots, n,$$
is called the R-student residual or jackknife residual, where $S_{(-i)}^2$ is the residual variance computed with the $i$th observation removed. It can be shown that
$$S_{(-i)}^2 = \frac{(n - p)\,MS_{Res} - \dfrac{e_i^2}{1 - h_{ii}}}{n - p - 1}.$$
If the usual assumptions in regression analysis are met, the jackknife residual follows exactly a $t$-distribution with $n - p - 1$ degrees of freedom.
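All of these diagnostics can be computed directly from the hat matrix. The following is a minimal sketch in Python with NumPy; the helper name residual_diagnostics is ours, not part of the text, and X is assumed to already contain a column of ones for the intercept.

```python
import numpy as np

def residual_diagnostics(X, y):
    """Ordinary, standardized, studentized, PRESS, and R-student residuals
    for the linear model y = X beta + eps (X includes the intercept column)."""
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T            # hat matrix H = X (X'X)^{-1} X'
    h = np.diag(H)                                   # leverages h_ii
    e = y - H @ y                                    # ordinary residuals e = (I - H) y
    ms_res = e @ e / (n - p)                         # MS_Res = SS_Res / (n - p)
    d = e / np.sqrt(ms_res)                          # standardized residuals d_i
    t = e / np.sqrt(ms_res * (1 - h))                # studentized residuals t_i
    e_press = e / (1 - h)                            # PRESS (deleted) residuals e_(i)
    s2_del = ((n - p) * ms_res - e**2 / (1 - h)) / (n - p - 1)   # S^2_(-i)
    r = e / np.sqrt(s2_del * (1 - h))                # R-student (jackknife) residuals r_i
    return e, d, t, e_press, r
```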
Example 1: Consider the following data:

 y   x1   x2
16    7    5
11    3    4
12    3    6
14    4    1
10    5    2

$$y = \begin{bmatrix} 16 \\ 11 \\ 12 \\ 14 \\ 10 \end{bmatrix}, \qquad X = \begin{bmatrix} 1 & 7 & 5 \\ 1 & 3 & 4 \\ 1 & 3 & 6 \\ 1 & 4 & 1 \\ 1 & 5 & 2 \end{bmatrix} \;\Rightarrow\; X'X = \begin{bmatrix} 5 & 22 & 18 \\ 22 & 108 & 79 \\ 18 & 79 & 82 \end{bmatrix},$$

$$(X'X)^{-1} = \begin{bmatrix} 2.7155 & -0.3967 & -0.2139 \\ -0.3967 & 0.0893 & 0.0010 \\ -0.2139 & 0.0010 & 0.0582 \end{bmatrix}.$$

The hat matrix is

$$H = X (X'X)^{-1} X' = \begin{bmatrix} 0.9252 & -0.0935 & 0.0748 & -0.1121 & 0.2056 \\ -0.0935 & 0.3832 & 0.4268 & 0.1931 & 0.0903 \\ 0.0748 & 0.4268 & 0.7030 & -0.1101 & -0.0945 \\ -0.1121 & 0.1931 & -0.1101 & 0.6096 & 0.4195 \\ 0.2056 & 0.0903 & -0.0945 & 0.4195 & 0.3790 \end{bmatrix}$$

$$\Rightarrow\; h_{11} = 0.9252, \quad h_{22} = 0.3832, \quad h_{33} = 0.7030, \quad h_{44} = 0.6096, \quad h_{55} = 0.3790.$$

The residuals are

$$e = (I - H)y = \begin{bmatrix} 0.0748 & 0.0935 & -0.0748 & 0.1121 & -0.2056 \\ 0.0935 & 0.6168 & -0.4268 & -0.1931 & -0.0903 \\ -0.0748 & -0.4268 & 0.2970 & 0.1101 & 0.0945 \\ 0.1121 & -0.1931 & 0.1101 & 0.3904 & -0.4195 \\ -0.2056 & -0.0903 & 0.0945 & -0.4195 & 0.6210 \end{bmatrix} \begin{bmatrix} 16 \\ 11 \\ 12 \\ 14 \\ 10 \end{bmatrix} = \begin{bmatrix} 0.84 \\ -0.45 \\ 0.16 \\ 2.26 \\ -2.81 \end{bmatrix}$$

and

$$MS_{Res} = \frac{e'e}{n - p} = \frac{13.9626}{2} = 6.98.$$

The standardized residuals are

$$\begin{bmatrix} d_1 \\ d_2 \\ d_3 \\ d_4 \\ d_5 \end{bmatrix} = \frac{e}{\sqrt{MS_{Res}}} = \frac{1}{\sqrt{6.98}} \begin{bmatrix} 0.84 \\ -0.45 \\ 0.16 \\ 2.26 \\ -2.81 \end{bmatrix} = \begin{bmatrix} 0.32 \\ -0.17 \\ 0.06 \\ 0.86 \\ -1.06 \end{bmatrix}.$$

The studentized residuals are

$$\begin{bmatrix} t_1 \\ t_2 \\ t_3 \\ t_4 \\ t_5 \end{bmatrix} = \begin{bmatrix} e_1 / \sqrt{MS_{Res}(1 - h_{11})} \\ e_2 / \sqrt{MS_{Res}(1 - h_{22})} \\ e_3 / \sqrt{MS_{Res}(1 - h_{33})} \\ e_4 / \sqrt{MS_{Res}(1 - h_{44})} \\ e_5 / \sqrt{MS_{Res}(1 - h_{55})} \end{bmatrix} = \begin{bmatrix} 0.84 / \sqrt{6.98(1 - 0.9252)} \\ -0.45 / \sqrt{6.98(1 - 0.3832)} \\ 0.16 / \sqrt{6.98(1 - 0.7030)} \\ 2.26 / \sqrt{6.98(1 - 0.6096)} \\ -2.81 / \sqrt{6.98(1 - 0.3790)} \end{bmatrix} = \begin{bmatrix} 1.16 \\ -0.22 \\ 0.11 \\ 1.37 \\ -1.35 \end{bmatrix}.$$

For the R-student residuals, first compute each deleted residual variance $S_{(-i)}^2$:

$$S_{(-1)}^2 = \frac{(n - p)\,MS_{Res} - \dfrac{e_1^2}{1 - h_{11}}}{n - p - 1} = \frac{(5 - 3)(6.98) - \dfrac{(0.84)^2}{1 - 0.9252}}{5 - 3 - 1} = 4.5,$$

$$S_{(-2)}^2 = \frac{(5 - 3)(6.98) - \dfrac{(-0.45)^2}{1 - 0.3832}}{5 - 3 - 1} = 13.6,$$

$$S_{(-3)}^2 = \frac{(5 - 3)(6.98) - \dfrac{(0.16)^2}{1 - 0.7030}}{5 - 3 - 1} = 13.9,$$

$$S_{(-4)}^2 = \frac{(5 - 3)(6.98) - \dfrac{(2.26)^2}{1 - 0.6096}}{5 - 3 - 1} = 0.86,$$

$$S_{(-5)}^2 = \frac{(5 - 3)(6.98) - \dfrac{(-2.81)^2}{1 - 0.3790}}{5 - 3 - 1} = 1.22.$$

Then

$$\begin{bmatrix} r_1 \\ r_2 \\ r_3 \\ r_4 \\ r_5 \end{bmatrix} = \begin{bmatrix} e_1 / \sqrt{S_{(-1)}^2 (1 - h_{11})} \\ e_2 / \sqrt{S_{(-2)}^2 (1 - h_{22})} \\ e_3 / \sqrt{S_{(-3)}^2 (1 - h_{33})} \\ e_4 / \sqrt{S_{(-4)}^2 (1 - h_{44})} \\ e_5 / \sqrt{S_{(-5)}^2 (1 - h_{55})} \end{bmatrix} = \begin{bmatrix} 0.84 / \sqrt{4.5(1 - 0.9252)} \\ -0.45 / \sqrt{13.6(1 - 0.3832)} \\ 0.16 / \sqrt{13.9(1 - 0.7030)} \\ 2.26 / \sqrt{0.86(1 - 0.6096)} \\ -2.81 / \sqrt{1.22(1 - 0.3790)} \end{bmatrix} = \begin{bmatrix} 1.45 \\ -0.15 \\ 0.08 \\ 3.90 \\ -3.23 \end{bmatrix}.$$

SAS Output: Residuals, Studentized Residuals and R-student Residuals

Obs    Residuals     student      Rstudent
  1      0.84112     1.16423      1.45010
  2     -0.44860    -0.21618     -0.15468
  3      0.15888     0.11034      0.07826
  4      2.26168     1.36988      3.89917
  5     -2.81308    -1.35107     -3.23320

[Figure: scatter plot of x2 versus x1 for the example data.]
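As a check, the hand computations and the SAS output above can be reproduced with the residual_diagnostics sketch given earlier in the chapter (a hypothetical helper; any regression package yields the same numbers):

```python
import numpy as np

# Data from Example 1 (first column of X is the intercept)
X = np.column_stack([np.ones(5), [7, 3, 3, 4, 5], [5, 4, 6, 1, 2]])
y = np.array([16.0, 11.0, 12.0, 14.0, 10.0])

e, d, t, e_press, r = residual_diagnostics(X, y)
print(np.round(e, 5))  # [ 0.84112 -0.4486   0.15888  2.26168 -2.81308]
print(np.round(t, 5))  # [ 1.16423 -0.21618  0.11034  1.36988 -1.35107]
print(np.round(r, 5))  # [ 1.4501  -0.15468  0.07826  3.89917 -3.2332 ]
```

Note that observations 4 and 5 have R-student residuals much larger in magnitude than their ordinary or studentized residuals, which is exactly the behavior these deletion-based diagnostics are designed to expose.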
Graphical Analysis of Residuals:

(a) Normal probability plot: If the normality assumption is not badly violated, the conclusions reached by a regression analysis in which normality is assumed will generally be reliable and accurate. A very simple method of checking the normality assumption is to construct a normal probability plot of the residuals. Let $e_{(1)}, e_{(2)}, \ldots, e_{(n)}$ be the residuals ranked in increasing order. Note that
$$E(e_{(i)}) = \Phi^{-1}\!\left(\frac{i - \frac{1}{2}}{n}\right),$$
where $\Phi$ denotes the standard normal cumulative distribution function. Normal probability plots are constructed by plotting the ranked residuals $e_{(i)}$ against the expected normal values $\Phi^{-1}\bigl((i - \frac{1}{2})/n\bigr)$. The resulting points should lie approximately on a straight line; substantial departures from a straight line indicate that the distribution is not normal. If normality is deemed unsatisfactory, the $Y$ values may be transformed (using a log, square root, etc.) to see whether the new set of observations is approximately normal.

(b) Plot of residuals versus the fitted values: A plot of the residuals $e_i$ (or the scaled residuals $d_i$, $t_i$, or $r_i$) versus the corresponding fitted values $\hat{y}_i$ is useful for detecting several common types of model inadequacies. If the plot of residuals versus the fitted values can be contained in a horizontal band, there are no obvious model defects. An outward-opening funnel pattern implies that the variance of $\varepsilon$ is an increasing function of $Y$; an inward-opening funnel indicates that the variance of $\varepsilon$ decreases as $Y$ increases. The double-bow pattern often occurs when $Y$ is a proportion between zero and one. The usual approach for dealing with inequality of variance is to apply a suitable transformation to either the regressor or the response variable. A curved plot indicates nonlinearity; this could mean that other regressor variables are needed in the model (for example, a squared term may be necessary). Transformations of the regressor and/or the response variable may be helpful in these cases. A plot of residuals versus the fitted values may also reveal one or more unusually large residuals.
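Both diagnostic plots are easy to produce. The sketch below, assuming NumPy, SciPy, and matplotlib are available (the variable names are ours), draws the normal probability plot and the residuals-versus-fitted plot for the Example 1 fit:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Fit from Example 1; any least-squares fit supplies e and y_hat the same way
X = np.column_stack([np.ones(5), [7, 3, 3, 4, 5], [5, 4, 6, 1, 2]])
y = np.array([16.0, 11.0, 12.0, 14.0, 10.0])
H = X @ np.linalg.inv(X.T @ X) @ X.T
y_hat = H @ y                       # fitted values
e = y - y_hat                       # ordinary residuals

# (a) Normal probability plot: ranked residuals vs Phi^{-1}((i - 1/2)/n)
n = len(e)
expected = norm.ppf((np.arange(1, n + 1) - 0.5) / n)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(expected, np.sort(e))
ax1.set_xlabel("expected normal value")
ax1.set_ylabel("ordered residual")

# (b) Residuals vs fitted values: look for funnels, bows, or curvature
ax2.scatter(y_hat, e)
ax2.axhline(0.0, linestyle="--")
ax2.set_xlabel("fitted value")
ax2.set_ylabel("residual")
plt.show()
```

With only five observations the plots are too sparse to be conclusive; the same code applied to a realistic sample size makes the straight-line and horizontal-band checks described above meaningful.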