Residual Plots and Normal Probability Plots

Residual plots and normal probability plots proc reg; title 'normal probability plot'; model X5 LnY=X1 X2 X3 X8; proc univariate data=out1 normal plot; output out=out1 r=residX5 residY var residY; p=FittedX5 FittedY; qqplot residY/ normal (L=1 mu=0 symbol1 c=green v=dot width=3; sigma=est); symbol2 c=green v=circle i=rl; run; axis2 label=('Residual'); title 'Residual plot against predicted'; proc gplot data=out1; plot residY*FittedY=1/ vaxis=axis2; run; title 'Residual plot against X5'; proc gplot data=out1; plot ResidY*X5=1/ vaxis=axis2; run; title 'partial regression plot for X5'; proc gplot data=out1; plot residY*residX5=2/vref=0; run;

6. 0000E-01

4. 0000E-01

2. 0000E-01 R e s i 0 d u a l -2. 000E-01

-4. 000E-01

-6. 000E-01

-3 -2 -1 0 1 2 3

Nor mal Quant i l es Model Diagnostic for Outlying observations, influential cases and Multicollinearity proc reg data=surgical; title 'Leverage values'; model LnY=X1 X2 X3 X8/VIF influence; proc gplot data=out2; output out=out2 rstudent=Rstudent plot leverage*case=3; cookd=CooksD dffits=DFFITS h=leverage; run; run; title 'Cooks Distance'; data out2; axis2 label=('Cooks D'); set out2; proc gplot data=out2; case+1; plot CooksD*case=3/vaxis=axis2; keep Rstudent CooksD DFFITS Leverage run; Case; title 'DFFITS Values'; run; axis2 label=('Standard symbol3 c=red v=dot i=l; Influence'j=r'on Predicted Value'); title 'Student deleted residuals'; proc gplot data=out2; axis2 label=('Studentized Residual' j=r plot DFFITS*case=3/vaxis=axis2; 'Without Current Obs'); run; proc gplot data=out2; plot rstudent*case=3/vaxis=axis2; run;

St udent i zed Resi dual Wit hout Cur r ent Obs 4

3

2

1

0

-1

-2

-3

0 10 20 30 40 50 60

case Lever age 0. 31 0. 30 0. 29 0. 28 0. 27 0. 26 0. 25 0. 24 0. 23 0. 22 0. 21 0. 20 0. 19 0. 18 0. 17 0. 16 0. 15 0. 14 0. 13 0. 12 0. 11 0. 10 0. 09 0. 08 0. 07 0. 06 0. 05 0. 04 0. 03 0. 02

0 10 20 30 40 50 60

case Cooks D 0. 34 0. 33 0. 32 0. 31 0. 30 0. 29 0. 28 0. 27 0. 26 0. 25 0. 24 0. 23 0. 22 0. 21 0. 20 0. 19 0. 18 0. 17 0. 16 0. 15 0. 14 0. 13 0. 12 0. 11 0. 10 0. 09 0. 08 0. 07 0. 06 0. 05 0. 04 0. 03 0. 02 0. 01 0. 00

0 10 20 30 40 50 60

case

St andar d I nf l uence on Pr edi ct ed Val ue 2

1

0

-1

0 10 20 30 40 50 60

case Output Statistics

Hat Diag Cov ------DFBETAS------Obs Residual RStudent H Ratio DFFITS Intercept X1 X2 X3 X8 13 -0.0401 -0.2047 0.1555 1.3070 -0.0879 0.0628 -0.0074 -0.0579 -0.0564 0.0173 14 -0.0609 -0.2940 0.0537 1.1610 -0.0700 0.0352 -0.0042 -0.0466 -0.0247 0.0188 15 -0.1308 -0.6563 0.1174 1.2011 -0.2393 0.0025 -0.0596 -0.0020 0.0575 -0.1842 16 0.1995 0.9713 0.0522 1.0612 0.2280 -0.0777 0.1416 0.0651 -0.0238 -0.0858 17 0.5952 3.3696 0.1499 0.4511 1.4151 0.4437 -0.1538 0.6261 -1.1405 -0.0103 18 -0.1597 -0.8078 0.1271 1.1871 -0.3082 -0.2849 0.1749 0.0737 0.2263 -0.0058 19 -0.1868 -0.9047 0.0452 1.0670 -0.1969 0.0678 -0.1349 -0.0155 -0.0134 0.0882 20 -0.0379 -0.1885 0.1073 1.2372 -0.0653 -0.0086 0.0150 0.0036 -0.0032 -0.0572

Parameter Estimates

Parameter Standard Variance Variable DF Estimate Error t Value Pr > |t| Inflation

Intercept 1 3.85242 0.19270 19.99 <.0001 0 X1 1 0.07332 0.01897 3.86 0.0003 1.10259 X2 1 0.01419 0.00173 8.20 <.0001 1.01992 X3 1 0.01545 0.00140 11.07 <.0001 1.04871 X8 1 0.35297 0.07719 4.57 <.0001 1.09186

Conclusion 1. Outlying Y observation is the 17th case and Outlying X observations are 28th and 38th case. 2. Only 17th observation is identified being influential case since the absolute values of Cooks D, DFFITS and DFBETAS measurements of case 17 are much larger than those of other cases by SAS output and case plots. And by looking at the values of those measurements for case 17, we did not find case 17 make large influence to the model. 3. By looking at the VIF values of the four predictor variables, there is no evidence of severe multicollinearity between them. Direct check of the influence of the influential case to the predictions

It is to measure the absolute percent difference that the influential case brought using the following statistics: ˆ ˆ Yi(17)  Yi ˆ . Yi data surgical1; data out5; set surgical; merge out3 out4; if _n_=17 then LnY=.; run; run; Data out5; proc reg data=surgical; set out5; model LnY =X1 X2 X3 X8; Diff=abs((FitLnY17-FitLnY)/FitLnY); output out=out3 P=FitLnY; run; run; Proc means data=out5; proc reg data=surgical1; var Diff; model LnY=X1 X2 X3 X8; run; output out=out4 P=FitLnY17; run;

The MEANS Procedure Analysis Variable : Diff N Mean Std Dev Minimum Maximum ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ 54 0.0042777 0.0043149 0.000127378 0.0177002 ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

Conclusion Since the largest absolute percent difference that case 17 make to the prediction is only 1.7% and the average percent difference is 0.4%, we can claim that case 17 does not have such a disproportionate influence on the fitted values that remedial action would be required.

Homework #7 The due time is Dec. 10, 2009 by the beginning of the class. Page 416 of textbook: 10.12