Chapter 4: Model Adequacy Checking

In this chapter, we discuss some introductory aspects of model adequacy checking, including:

• Residual analysis,

• Residual plots,

• Detection and treatment of outliers,

• The PRESS statistic,

• Testing for lack of fit.

The major assumptions that we have made so far in our study of regression analysis are:

• The relationship between the response Y and the regressors is linear, at least approximately.

• The error term ε has zero mean.

• The error term ε has constant variance $\sigma^2$.

• The errors are uncorrelated.

• The errors are normally distributed.

Assumptions 4 and 5 together imply that the errors are independent. Recall that assumption 5 is required for hypothesis testing and interval estimation.

Residual Analysis: The residuals $e_1, e_2, \ldots, e_n$ have the following important properties:

(a) The mean of the $e_i$ is 0.

(b) The estimate of population variance computed from the n residuals is:

$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n}(e_i - \bar{e})^2}{n - p} = \frac{\sum_{i=1}^{n}e_i^2}{n - p} = \frac{SS_{Res}}{n - p} = MS_{Res}$$

(c) Since the sum of the $e_i$ is zero, they are not independent. However, if the number of residuals ($n$) is large relative to the number of parameters ($p$), the dependency effect can be ignored in an analysis of residuals.

Standardized Residual: The quantity $d_i = \dfrac{e_i}{\sqrt{MS_{Res}}}$, $i = 1, 2, \ldots, n$, is called the standardized residual. The standardized residuals have mean zero and approximately unit variance. A large standardized residual ($d_i > 3$) potentially indicates an outlier.

Recall that

$$e = (I - H)Y = (I - H)(X\beta + \varepsilon) = (I - H)\varepsilon \quad \text{(since } HX = X\text{)}$$

Therefore,

$$\operatorname{Var}(e) = \operatorname{Var}[(I - H)\varepsilon] = (I - H)\operatorname{Var}(\varepsilon)(I - H)' = \sigma^2 (I - H),$$

since $I - H$ is symmetric and idempotent.

Studentized Residual: The quantity $t_i = \dfrac{e_i}{\sqrt{MS_{Res}(1 - h_{ii})}} = \dfrac{e_i}{\sqrt{S^2(1 - h_{ii})}}$, $i = 1, 2, \ldots, n$, is called the studentized residual. The studentized residuals have approximately a Student's $t$ distribution with $n - p$ degrees of freedom.
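These quantities are straightforward to compute directly from the model matrix. Below is a minimal NumPy sketch (the code and the helper name `residual_diagnostics` are ours, not from the notes), assuming a full-rank model matrix whose first column is the intercept:

```python
import numpy as np

def residual_diagnostics(X, y):
    """Ordinary, standardized, and studentized residuals, plus leverages.

    X is the n x p model matrix (first column all ones for the intercept).
    """
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix H = X(X'X)^{-1}X'
    h = np.diag(H)                         # leverages h_ii
    e = y - H @ y                          # ordinary residuals e = (I - H)y
    ms_res = e @ e / (n - p)               # MS_Res = SS_Res / (n - p)
    d = e / np.sqrt(ms_res)                # standardized residuals d_i
    t = e / np.sqrt(ms_res * (1 - h))      # studentized residuals t_i
    return e, h, d, t
```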

If we delete the $i$th observation, fit the regression model to the remaining $n - 1$ observations, and calculate the predicted value $\hat{y}_{(i)}$ of $y_i$ corresponding to the deleted observation, the corresponding prediction error is $e_{(i)} = y_i - \hat{y}_{(i)}$. Generally, a large difference between the ordinary residual and the PRESS residual indicates a point where the model fits the data well, but a model built without that point predicts poorly.

These prediction errors are usually called PRESS residuals or deleted residuals. It can be shown that $e_{(i)} = \dfrac{e_i}{1 - h_{ii}}$. Therefore,

$$\operatorname{Var}(e_{(i)}) = \operatorname{Var}\left[\frac{e_i}{1 - h_{ii}}\right] = \frac{1}{(1 - h_{ii})^2}\operatorname{Var}(e_i) = \frac{\sigma^2(1 - h_{ii})}{(1 - h_{ii})^2} = \frac{\sigma^2}{1 - h_{ii}}$$

Note that a standardized PRESS residual is

$$\frac{e_{(i)}}{\sqrt{\operatorname{Var}(e_{(i)})}} = \frac{e_i/(1 - h_{ii})}{\sqrt{\sigma^2/(1 - h_{ii})}} = \frac{e_i}{\sqrt{\sigma^2(1 - h_{ii})}},$$

which, if we use $MS_{Res}$ to estimate $\sigma^2$, is just the studentized residual.

R-student Residual: The quantity $r_i = \dfrac{e_i}{\sqrt{S_{(-i)}^2(1 - h_{ii})}}$, $i = 1, 2, \ldots, n$, is called the R-student residual or jackknife residual, where $S_{(-i)}^2$ is the residual variance computed with the $i$th observation removed. It can be shown that

$$S_{(-i)}^2 = \frac{(n - p)\,MS_{Res} - \dfrac{e_i^2}{1 - h_{ii}}}{n - p - 1}$$

If the usual assumptions in regression analysis are met, the jackknife residual follows exactly a $t$-distribution with $n - p - 1$ degrees of freedom.
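Continuing the sketch above (again ours), the PRESS residuals, the deleted variances $S_{(-i)}^2$, and the R-student residuals all follow from the ordinary residuals and the leverages, so no model needs to be refit:

```python
import numpy as np

def deletion_diagnostics(e, h, n, p):
    """PRESS residuals, deleted variances S^2_(-i), and R-student residuals."""
    ms_res = e @ e / (n - p)
    e_press = e / (1 - h)                                       # e_(i) = e_i / (1 - h_ii)
    s2_del = ((n - p) * ms_res - e**2 / (1 - h)) / (n - p - 1)  # S^2_(-i)
    r = e / np.sqrt(s2_del * (1 - h))                           # R-student r_i
    return e_press, s2_del, r
```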

Example 1: Consider the following data:

 y   x1   x2
16    7    5
11    3    4
12    3    6
14    4    1
10    5    2

$$y = \begin{bmatrix} 16 \\ 11 \\ 12 \\ 14 \\ 10 \end{bmatrix}, \quad X = \begin{bmatrix} 1 & 7 & 5 \\ 1 & 3 & 4 \\ 1 & 3 & 6 \\ 1 & 4 & 1 \\ 1 & 5 & 2 \end{bmatrix} \;\Rightarrow\; X'X = \begin{bmatrix} 5 & 22 & 18 \\ 22 & 108 & 79 \\ 18 & 79 & 82 \end{bmatrix}$$

$$(X'X)^{-1} = \begin{bmatrix} 2.7155 & -0.3967 & -0.2139 \\ -0.3967 & 0.0893 & 0.0010 \\ -0.2139 & 0.0010 & 0.0582 \end{bmatrix}$$

$$H = X(X'X)^{-1}X' = \begin{bmatrix} 1 & 7 & 5 \\ 1 & 3 & 4 \\ 1 & 3 & 6 \\ 1 & 4 & 1 \\ 1 & 5 & 2 \end{bmatrix} \begin{bmatrix} 2.7155 & -0.3967 & -0.2139 \\ -0.3967 & 0.0893 & 0.0010 \\ -0.2139 & 0.0010 & 0.0582 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 7 & 3 & 3 & 4 & 5 \\ 5 & 4 & 6 & 1 & 2 \end{bmatrix}$$

$$H = \begin{bmatrix} 0.9252 & -0.0935 & 0.0748 & -0.1121 & 0.2056 \\ -0.0935 & 0.3832 & 0.4268 & 0.1931 & 0.0903 \\ 0.0748 & 0.4268 & 0.7030 & -0.1101 & -0.0945 \\ -0.1121 & 0.1931 & -0.1101 & 0.6096 & 0.4195 \\ 0.2056 & 0.0903 & -0.0945 & 0.4195 & 0.3790 \end{bmatrix}$$

$$\Rightarrow\; h_{11} = 0.9252,\; h_{22} = 0.3832,\; h_{33} = 0.7030,\; h_{44} = 0.6096,\; h_{55} = 0.3790$$

$$e = (I - H)y = \begin{bmatrix} 0.0748 & 0.0935 & -0.0748 & 0.1121 & -0.2056 \\ 0.0935 & 0.6168 & -0.4268 & -0.1931 & -0.0903 \\ -0.0748 & -0.4268 & 0.2970 & 0.1101 & 0.0945 \\ 0.1121 & -0.1931 & 0.1101 & 0.3904 & -0.4195 \\ -0.2056 & -0.0903 & 0.0945 & -0.4195 & 0.6210 \end{bmatrix} \begin{bmatrix} 16 \\ 11 \\ 12 \\ 14 \\ 10 \end{bmatrix} = \begin{bmatrix} 0.84 \\ -0.45 \\ 0.16 \\ 2.26 \\ -2.81 \end{bmatrix}$$

$$MS_{Res} = \frac{e'e}{n - p} = \frac{13.9374}{2} = 6.97$$

$$d = \begin{bmatrix} d_1 \\ d_2 \\ d_3 \\ d_4 \\ d_5 \end{bmatrix} = \frac{1}{\sqrt{MS_{Res}}}\, e = \frac{1}{\sqrt{6.97}} \begin{bmatrix} 0.84 \\ -0.45 \\ 0.16 \\ 2.26 \\ -2.81 \end{bmatrix} = \begin{bmatrix} 0.32 \\ -0.17 \\ 0.06 \\ 0.86 \\ -1.06 \end{bmatrix}$$

$$\begin{bmatrix} t_1 \\ t_2 \\ t_3 \\ t_4 \\ t_5 \end{bmatrix} = \begin{bmatrix} e_1/\sqrt{MS_{Res}(1 - h_{11})} \\ e_2/\sqrt{MS_{Res}(1 - h_{22})} \\ e_3/\sqrt{MS_{Res}(1 - h_{33})} \\ e_4/\sqrt{MS_{Res}(1 - h_{44})} \\ e_5/\sqrt{MS_{Res}(1 - h_{55})} \end{bmatrix} = \begin{bmatrix} 0.84/\sqrt{6.97(1 - 0.9252)} \\ -0.45/\sqrt{6.97(1 - 0.3832)} \\ 0.16/\sqrt{6.97(1 - 0.7030)} \\ 2.26/\sqrt{6.97(1 - 0.6096)} \\ -2.81/\sqrt{6.97(1 - 0.3790)} \end{bmatrix} = \begin{bmatrix} 1.16 \\ -0.22 \\ 0.11 \\ 1.37 \\ -1.35 \end{bmatrix}$$

$$S_{(-1)}^2 = \frac{(n - p)\,MS_{Res} - \dfrac{e_1^2}{1 - h_{11}}}{n - p - 1} = \frac{(5 - 3)\,6.97 - \dfrac{0.84^2}{1 - 0.9252}}{5 - 3 - 1} = 4.5

$$S_{(-2)}^2 = \frac{(5 - 3)\,6.97 - \dfrac{(-0.45)^2}{1 - 0.3832}}{5 - 3 - 1} = 13.6, \qquad S_{(-3)}^2 = \frac{(5 - 3)\,6.97 - \dfrac{0.16^2}{1 - 0.7030}}{5 - 3 - 1} = 13.9,$$

$$S_{(-4)}^2 = \frac{(5 - 3)\,6.97 - \dfrac{2.26^2}{1 - 0.6096}}{5 - 3 - 1} = 0.86, \qquad S_{(-5)}^2 = \frac{(5 - 3)\,6.97 - \dfrac{(-2.81)^2}{1 - 0.3790}}{5 - 3 - 1} = 1.22$$

$$\begin{bmatrix} r_1 \\ r_2 \\ r_3 \\ r_4 \\ r_5 \end{bmatrix} = \begin{bmatrix} e_1/\sqrt{S_{(-1)}^2(1 - h_{11})} \\ e_2/\sqrt{S_{(-2)}^2(1 - h_{22})} \\ e_3/\sqrt{S_{(-3)}^2(1 - h_{33})} \\ e_4/\sqrt{S_{(-4)}^2(1 - h_{44})} \\ e_5/\sqrt{S_{(-5)}^2(1 - h_{55})} \end{bmatrix} = \begin{bmatrix} 0.84/\sqrt{4.5(1 - 0.9252)} \\ -0.45/\sqrt{13.6(1 - 0.3832)} \\ 0.16/\sqrt{13.9(1 - 0.7030)} \\ 2.26/\sqrt{0.86(1 - 0.6096)} \\ -2.81/\sqrt{1.22(1 - 0.3790)} \end{bmatrix} = \begin{bmatrix} 1.45 \\ -0.15 \\ 0.08 \\ 3.90 \\ -3.23 \end{bmatrix}$$

SAS Output: Residuals, Studentized Residuals, and R-student Residuals

Obs    Residual    Student    Rstudent
  1     0.84112    1.16423     1.45010
  2    -0.44860   -0.21618    -0.15468
  3     0.15888    0.11034     0.07826
  4     2.26168    1.36988     3.89917
  5    -2.81308   -1.35107    -3.23320
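The hand calculations and the SAS table above can be checked numerically. A NumPy sketch (ours) with the Example 1 data:

```python
import numpy as np

X = np.array([[1, 7, 5],
              [1, 3, 4],
              [1, 3, 6],
              [1, 4, 1],
              [1, 5, 2]], dtype=float)
y = np.array([16, 11, 12, 14, 10], dtype=float)

n, p = X.shape
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)                     # 0.9252, 0.3832, 0.7030, 0.6096, 0.3790
e = y - H @ y                      # 0.84, -0.45, 0.16, 2.26, -2.81
ms_res = e @ e / (n - p)           # 6.97
t = e / np.sqrt(ms_res * (1 - h))  # studentized: 1.16, -0.22, 0.11, 1.37, -1.35
s2 = ((n - p) * ms_res - e**2 / (1 - h)) / (n - p - 1)  # S^2_(-i)
r = e / np.sqrt(s2 * (1 - h))      # R-student: 1.45, -0.15, 0.08, 3.90, -3.23
```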

[Figure: Scatterplot of X2 versus X1 for the Example 1 data]

Graphical Analysis of Residuals:

(a) Normal probability plot: If the normality assumption is not badly violated, the conclusions reached by a regression analysis in which normality is assumed will generally be reliable and accurate. A very simple method of checking the normality assumption is to construct a normal probability plot of the residuals.

Let $e_{(1)}, e_{(2)}, \ldots, e_{(n)}$ be the residuals ranked in increasing order. Note that

$$E(e_{(i)}) \approx \Phi^{-1}\!\left(\frac{i - \tfrac{1}{2}}{n}\right),$$

where $\Phi$ denotes the standard normal cumulative distribution function. Normal probability plots are constructed by plotting the ranked residuals $e_{(i)}$ against the expected normal value $\Phi^{-1}\!\left(\frac{i - 1/2}{n}\right)$. The resulting points should lie approximately on a straight line. Substantial departures from a straight line indicate that the distribution is not normal.

If normality is deemed unsatisfactory, the Y values may be transformed, using a log, square root, etc., to see whether the new set of observations is approximately normal.
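The plotting coordinates themselves are easy to generate; the sketch below (ours) uses scipy.stats.norm.ppf in the role of $\Phi^{-1}$:

```python
import numpy as np
from scipy.stats import norm

def normal_plot_points(e):
    """Coordinates for a normal probability plot of the residuals e."""
    n = len(e)
    e_ranked = np.sort(e)                          # ranked residuals e_(1) <= ... <= e_(n)
    z = norm.ppf((np.arange(1, n + 1) - 0.5) / n)  # Phi^{-1}((i - 1/2)/n)
    return z, e_ranked  # plot e_ranked against z; points should be roughly linear
```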

(b) Plot of Residuals versus the Fitted Values: A plot of the residuals $e_i$ (or the scaled residuals $d_i$, $t_i$, or $r_i$) versus the corresponding fitted values $\hat{y}_i$ is useful for detecting several common types of model inadequacies.

If the plot of residuals versus the fitted values can be contained in a horizontal band, then there are no obvious model defects.

The outward-opening funnel pattern implies that the variance of ε is an increasing function of Y. An inward-opening funnel indicates that the variance of ε decreases as Y increases. The double-bow pattern often occurs when Y is a proportion between zero and one. The usual approach for dealing with inequality of variance is to apply a suitable transformation to either the regressor or the response variable.

A curved plot indicates nonlinearity. This could mean that other regressor variables are needed in the model. For example, a squared term may be necessary. Transformations on the regressor and/or the response variable may be helpful in these cases.

A plot of residuals versus the predicted values may also reveal one or more unusually large residuals. These points are potential outliers. An extreme predicted value with a large residual could also indicate either that the variance is not constant or that the true relationship between Y and X is not linear. These possibilities should be investigated before the points are considered outliers.

(c) Plot of Residuals versus the Regressors: Plotting the residuals versus the corresponding values of each regressor variable can also be helpful. Once again, a horizontal band containing the residuals is desirable. The funnel and double-bow patterns indicate nonconstant variance. A curved band, or a nonlinear pattern in general, indicates that the assumed relationship between $Y$ and the regressor $X_j$ is not correct. Thus, either higher-order terms in $X_j$ (such as $X_j^2$) or a transformation should be considered.

Note that in simple linear regression it is not necessary to plot residuals versus both the predicted values and the regressor variable, since the predicted values are linear combinations of the regressor values.

(d) Plot of Residuals in Time Sequence: It is a good idea to plot the residuals against time order, if the time sequence in which the data were collected is known. If a horizontal band encloses all of the residuals and the residuals fluctuate in a more or less random fashion within this band, then there is no apparent autocorrelation.

(e) Partial Regression Plots: A limitation of plots of residuals versus regressor variables is that they may not completely show the correct or complete marginal effect of a regressor, given the other regressors in the model. The partial regression plot considers the marginal role of the regressor $X_j$ given the other regressors that are already in the model. In this plot, the response variable $Y$ and the regressor $X_j$ are each regressed against the other regressors in the model, and the residuals are obtained for each regression. The plot of these residuals against each other provides information about the nature of the marginal relationship for the regressor $X_j$ under consideration; a construction sketch in code follows the notes below.

If the regressor $X_j$ enters the model linearly, the partial regression plot should show a linear relationship with slope equal to $\hat{\beta}_j$, the coefficient of $X_j$ in the multiple linear regression model.

Note that:

• Partial regression plots only suggest possible relationships between a regressor and the response. These plots may not give information about the proper form of the relationship if several variables already in the model are incorrectly specified. It will usually be necessary to investigate several alternative forms for the relationship between the regressor and Y, or several transformations. Residual plots for these subsequent models should be examined to identify the best relationship or transformation.

• Partial regression plots will not, in general, detect interaction effects among the regressors.

• The presence of strong collinearity can cause partial regression plots to give incorrect information about the relationship between the response and the regressor variables.
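As promised above, here is a sketch of the partial regression construction (our code; the helper name `partial_regression_points` is hypothetical): regress $Y$ and $X_j$ on the remaining columns and plot the two residual vectors against each other.

```python
import numpy as np

def partial_regression_points(X, y, j):
    """Residual pairs for the partial regression (added-variable) plot of column j.

    X is the full model matrix, including the intercept column.
    """
    others = np.delete(X, j, axis=1)      # every column except X_j
    P = others @ np.linalg.pinv(others)   # projection onto the other columns
    e_y = y - P @ y                       # residuals of Y on the other regressors
    e_x = X[:, j] - P @ X[:, j]           # residuals of X_j on the other regressors
    return e_x, e_y                       # slope of e_y regressed on e_x equals beta_hat_j
```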

(f) Partial Residual Plots: Suppose that the model contains the regressors $X_1, X_2, \ldots, X_k$. The partial residuals for regressor $X_j$ are defined as

$$e_i^*(Y \mid X_j) = e_i + \hat{\beta}_j x_{ij}, \quad i = 1, 2, \ldots, n,$$

where the $e_i$ are the residuals from the model with all $k$ regressors included. The partial residuals are plotted versus $x_{ij}$, and the interpretation of the partial residual plot is very similar to that of the partial regression plot.
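A companion sketch (ours) for the partial residuals just defined:

```python
import numpy as np

def partial_residuals(X, y, j):
    """Partial residuals e_i + beta_hat_j * x_ij for column j of X."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # full-model least-squares fit
    e = y - X @ beta_hat                              # residuals with all k regressors in
    return e + beta_hat[j] * X[:, j]                  # plot these against X[:, j]
```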

Example 2 (Delivery Time Data): A soft drink bottler is analyzing the vending machine service routes in his distribution system. He is interested in predicting the amount of time required by the route driver to service the vending machines in an outlet. This service activity includes stocking the machine with beverage products and minor maintenance or housekeeping. The industrial engineer responsible for the study has suggested that the two most important variables affecting the delivery time ($Y$) are the number of cases of product stocked ($X_1$) and the distance walked by the route driver ($X_2$). The engineer has collected 40 observations on delivery time.

SAS Output:

Fitted model: $\hat{y} = 2.4123 + 1.6392\,x_1 + 0.0136\,x_2$ (N = 20, $R^2$ = 0.9525, Adj $R^2$ = 0.9469, RMSE = 3.7303).

[Figure: Empirical CDF of the R-student residuals]

[Figure: Q-Q plot of the R-student residuals against normal quantiles]

[Figure: R-student residuals versus the predicted values]

[Figure: R-student residuals versus x1]

[Figure: R-student residuals versus x2]

[Figure: Partial residual plot, pr1 versus x1]

[Figure: Partial residual plot, pr2 versus x2]

PRESS Statistic: PRESS residuals are defined by $e_{(i)} = y_i - \hat{y}_{(i)}$, where $\hat{y}_{(i)}$ is the predicted value of the $i$th observed response based on a fit to the remaining $n - 1$ sample points. Large PRESS residuals are potentially useful in identifying observations where the model does not fit the data well, or observations for which the model is likely to provide poor future predictions. The PRESS statistic is defined by

$$PRESS = \sum_{i=1}^{n}\left(y_i - \hat{y}_{(i)}\right)^2 = \sum_{i=1}^{n}\left(\frac{e_i}{1 - h_{ii}}\right)^2$$

PRESS is generally regarded as a measure of how well a regression model will perform in predicting new data. One very important use of the PRESS statistic is in comparing regression models. Generally, a model with a small value of PRESS is desired. The PRESS statistic can also be used to compute an $R^2$-like statistic for prediction, say

$$R^2_{Prediction} = 1 - \frac{PRESS}{SS_T}$$

This statistic gives some indication of the predictive capability of the regression model.
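A short sketch (ours) that computes PRESS through the shortcut $e_{(i)} = e_i/(1 - h_{ii})$, so that no refitting is needed, together with $R^2_{Prediction}$:

```python
import numpy as np

def press_statistic(X, y):
    """PRESS and R^2_prediction for a least-squares fit of y on X."""
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    e = y - H @ y
    press = np.sum((e / (1 - np.diag(H))) ** 2)  # sum of squared PRESS residuals
    ss_t = np.sum((y - y.mean()) ** 2)           # total (corrected) sum of squares
    return press, 1.0 - press / ss_t
```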

Example 2 (Cont.):

$$R^2 = 1 - \frac{SS_{Res}}{SS_T} = 1 - \frac{236.56224}{4977.99610} = 0.9525$$

$$R^2_{Prediction} = 1 - \frac{PRESS}{SS_T} = 1 - \frac{546.03153}{4977.99610} = 0.8903$$

Therefore, we could expect this model to “explain” about 89.03% of the variation in predicting new observations, as compared to approximately 95.25% of the variability in the original data explained by the least-squares fit.

Lack of Fit of the Regression Model: