Analysis of Covariance Handout

Statistics 701 Analysis of Covariance Handout

Data Set for Illustration: Rehabilitation data set, where Y = Number of Days of Treatment, the qualitative factor is "Physical Fitness" with 1 = Below Average, 2 = Average, and 3 = High Average. The covariate or quantitative factor is Age, in years. The full data set is given below.

YnumDays  AphyFit  Rep  XAge
      29        1    1  18.3
      42        1    2  30.0
      38        1    3  26.5
      40        1    4  28.1
      43        1    5  29.7
      40        1    6  27.8
      30        1    7  19.8
      42        1    8  29.3
      30        2    1  20.8
      35        2    2  25.2
      39        2    3  29.2
      28        2    4  20.0
      31        2    5  21.5
      31        2    6  22.1
      29        2    7  19.7
      35        2    8  24.7
      29        2    9  20.2
      33        2   10  22.9
      26        3    1  22.7
      32        3    2  28.7
      21        3    3  18.9
      20        3    4  18.0
      23        3    5  21.7
      22        3    6  20.0

Here is a plot of these data with NumDays as the dependent variable and Age as the independent variable, where different markers indicate the Physical Fitness level. This plot was generated in Minitab with the grouping option.

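[Figure: scatter plot of YNumDays (vertical axis) versus XAge (horizontal axis), with separate markers for Physical Fitness levels 1, 2, and 3.]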

Clearly, from this plot we note that the response variable is affected by Age, so in analyzing these data we should use a model that takes into account the effect of Age. This is the type of data for which an Analysis of Covariance model is appropriate.

Analysis of Covariance Model (ANOCOVA)

Yij =  + i + Xij + ij, j=1,2,…,ni; i=1,2,…,p=3.

The analysis was performed in Minitab using the General Linear Model option under ANOVA, with Age entered through the Covariate option. The output from this run is given below.

Worksheet size: 100000 cells Retrieving project from file: C:\COURSES\STAT70~1\LECTUR~1\ANALYS~1.MPJ

General Linear Model

Factor    Type   Levels  Values
APhyFit   fixed       3  1 2 3

Analysis of Variance for YNumDays, using Adjusted SS for Tests

Source    DF   Seq SS   Adj SS   Adj MS        F      P
APhyFit    2   672.00   246.08   123.04   399.11  0.000
XAge       1   409.83   409.83   409.83  1329.39  0.000
Error     20     6.17     6.17     0.31
Total     23  1088.00

Term         Coef    StDev      T      P
Constant   3.9083   0.7610   5.14  0.000
XAge      1.16729  0.03201  36.46  0.000

Unusual Observations for YNumDays

Obs  YNumDays      Fit  StDev Fit  Residual  St Resid
 23   23.0000  24.0389     0.2267   -1.0389    -2.05R

R denotes an observation with a large standardized residual.
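The slope, its standard error, and the error sum of squares in the output above can be checked by hand. The following is a minimal sketch in plain Python, not the Minitab procedure itself. It assumes the common slope is estimated by pooling the within-group sums of squares and cross-products, which is the least-squares solution for this model. The names `DATA` and `within_group_sums` are illustrative and do not come from the handout.

```python
from collections import defaultdict

# (YNumDays, APhyFit, XAge) rows copied from the data table in this handout
DATA = [
    (29, 1, 18.3), (42, 1, 30.0), (38, 1, 26.5), (40, 1, 28.1),
    (43, 1, 29.7), (40, 1, 27.8), (30, 1, 19.8), (42, 1, 29.3),
    (30, 2, 20.8), (35, 2, 25.2), (39, 2, 29.2), (28, 2, 20.0),
    (31, 2, 21.5), (31, 2, 22.1), (29, 2, 19.7), (35, 2, 24.7),
    (29, 2, 20.2), (33, 2, 22.9),
    (26, 3, 22.7), (32, 3, 28.7), (21, 3, 18.9), (20, 3, 18.0),
    (23, 3, 21.7), (22, 3, 20.0),
]


def within_group_sums(data):
    """Return the pooled within-group Sxx, Sxy and Syy."""
    groups = defaultdict(list)
    for y, level, x in data:
        groups[level].append((x, y))
    sxx = sxy = syy = 0.0
    for rows in groups.values():
        n = len(rows)
        xbar = sum(x for x, _ in rows) / n
        ybar = sum(y for _, y in rows) / n
        sxx += sum((x - xbar) ** 2 for x, _ in rows)
        sxy += sum((x - xbar) * (y - ybar) for x, y in rows)
        syy += sum((y - ybar) ** 2 for _, y in rows)
    return sxx, sxy, syy


sxx, sxy, syy = within_group_sums(DATA)
beta = sxy / sxx                 # common slope for XAge
ss_age = sxy ** 2 / sxx          # adjusted SS for XAge
sse_full = syy - ss_age          # error SS of the full model
df_error = len(DATA) - 3 - 1     # n - p - 1 = 20
mse = sse_full / df_error
se_beta = (mse / sxx) ** 0.5     # standard error of the slope
f_age = ss_age / mse             # F statistic for XAge

print(f"slope = {beta:.5f}, SE = {se_beta:.5f}")
print(f"SS(XAge) = {ss_age:.2f}, SSE = {sse_full:.2f}, F = {f_age:.2f}")
```

The printed values can be compared with the XAge row of the coefficient table and with the XAge and Error rows of the ANOVA table.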

Conclusions

1. The covariate Age has a significant effect, as can be deduced from the p-value for XAge in the ANOVA table. The estimate of the regression coefficient is 1.16729, and its estimated standard error is 0.03201. Note that this test is performed after removing the effect of the qualitative factor.

2. After removing the effect of the covariate Age, we also find significant differences among the levels of the factor "Physical Fitness", as can be discerned from the p-value associated with APhyFit. To see which levels differ, we examine the results of the Tukey multiple comparison procedure, given below. Since none of the simultaneous confidence intervals for the difference between two means includes zero, we conclude that the three levels of physical fitness have different mean numbers of days of treatment, even after removing the effect of age.

Tukey 95.0% Simultaneous Confidence Intervals
Response Variable YNumDays
All Pairwise Comparisons among Levels of APhyFit

APhyFit = 1 subtracted from:

APhyFit   Lower  Center   Upper
2        -2.574  -1.847  -1.121
3        -9.566  -8.723  -7.880

APhyFit = 2 subtracted from:

APhyFit   Lower  Center   Upper
3        -7.606  -6.876  -6.146
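The Center column of the Tukey output is the difference between two age-adjusted level means, which under a common slope equals the difference between the per-level intercepts. The sketch below reproduces only those centers. The interval half-widths need a studentized range quantile and are not computed here. The function name `adjusted_intercepts` is illustrative and does not come from the handout.

```python
from collections import defaultdict
from itertools import combinations

# (YNumDays, APhyFit, XAge) rows copied from the data table in this handout
DATA = [
    (29, 1, 18.3), (42, 1, 30.0), (38, 1, 26.5), (40, 1, 28.1),
    (43, 1, 29.7), (40, 1, 27.8), (30, 1, 19.8), (42, 1, 29.3),
    (30, 2, 20.8), (35, 2, 25.2), (39, 2, 29.2), (28, 2, 20.0),
    (31, 2, 21.5), (31, 2, 22.1), (29, 2, 19.7), (35, 2, 24.7),
    (29, 2, 20.2), (33, 2, 22.9),
    (26, 3, 22.7), (32, 3, 28.7), (21, 3, 18.9), (20, 3, 18.0),
    (23, 3, 21.7), (22, 3, 20.0),
]


def adjusted_intercepts(data):
    """Per-level intercepts ybar_i - beta * xbar_i under a common slope."""
    groups = defaultdict(list)
    for y, level, x in data:
        groups[level].append((x, y))
    means = {}
    sxx = sxy = 0.0
    for level, rows in groups.items():
        xbar = sum(x for x, _ in rows) / len(rows)
        ybar = sum(y for _, y in rows) / len(rows)
        means[level] = (xbar, ybar)
        sxx += sum((x - xbar) ** 2 for x, _ in rows)
        sxy += sum((x - xbar) * (y - ybar) for x, y in rows)
    beta = sxy / sxx  # pooled within-group slope
    return {lvl: ybar - beta * xbar for lvl, (xbar, ybar) in means.items()}


intercepts = adjusted_intercepts(DATA)

# Key (b, a) holds "level a subtracted from level b", as in the Tukey output.
centers = {
    (b, a): intercepts[b] - intercepts[a]
    for a, b in combinations(sorted(intercepts), 2)
}

for (b, a), diff in centers.items():
    print(f"level {b} minus level {a}: {diff:.3f}")
```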

Also included below are the fitted values as well as the (estimated) residuals.

YnumDays  AphyFit  Rep  XAge    FITS1     RESI1
      29        1    1  18.3  28.7930   0.20697
      42        1    2  30.0  42.4503  -0.45028
      38        1    3  26.5  38.3648  -0.36478
      40        1    4  28.1  40.2324  -0.23244
      43        1    5  29.7  42.1001   0.89991
      40        1    6  27.8  39.8822   0.11775
      30        1    7  19.8  30.5440  -0.54396
      42        1    8  29.3  41.6332   0.36682
      30        2    1  20.8  29.8639   0.13613
      35        2    2  25.2  34.9999   0.00007
      39        2    3  29.2  39.6691  -0.66907
      28        2    4  20.0  28.9300  -0.93004
      31        2    5  21.5  30.6810   0.31903
      31        2    6  22.1  31.3813  -0.38134
      29        2    7  19.7  28.5799   0.42015
      35        2    8  24.7  34.4163   0.58372
      29        2    9  20.2  29.1635  -0.16349
      33        2   10  22.9  32.3152   0.68483
      26        3    1  22.7  25.2062   0.79380
      32        3    2  28.7  32.2099  -0.20991
      21        3    3  18.9  20.7705   0.22949
      20        3    4  18.0  19.7199   0.28005
      23        3    5  21.7  24.0389  -1.03891
      22        3    6  20.0  22.0545  -0.05452

[Figure: plot of fitted values (vertical axis) versus XAge (horizontal axis), with separate markers for Physical Fitness levels 1, 2, and 3.]
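The fitted values and residuals listed above follow from the model once the common slope and the per-level intercepts are known: each fit is the level's intercept plus the slope times Age. The following is a small sketch under that assumption, in plain Python. The name `fit_ancova` is illustrative and does not come from the handout.

```python
from collections import defaultdict

# (YNumDays, APhyFit, XAge) rows copied from the data table in this handout
DATA = [
    (29, 1, 18.3), (42, 1, 30.0), (38, 1, 26.5), (40, 1, 28.1),
    (43, 1, 29.7), (40, 1, 27.8), (30, 1, 19.8), (42, 1, 29.3),
    (30, 2, 20.8), (35, 2, 25.2), (39, 2, 29.2), (28, 2, 20.0),
    (31, 2, 21.5), (31, 2, 22.1), (29, 2, 19.7), (35, 2, 24.7),
    (29, 2, 20.2), (33, 2, 22.9),
    (26, 3, 22.7), (32, 3, 28.7), (21, 3, 18.9), (20, 3, 18.0),
    (23, 3, 21.7), (22, 3, 20.0),
]


def fit_ancova(data):
    """Return (common slope, {level: intercept}) for the parallel-lines model."""
    groups = defaultdict(list)
    for y, level, x in data:
        groups[level].append((x, y))
    means = {}
    sxx = sxy = 0.0
    for level, rows in groups.items():
        xbar = sum(x for x, _ in rows) / len(rows)
        ybar = sum(y for _, y in rows) / len(rows)
        means[level] = (xbar, ybar)
        sxx += sum((x - xbar) ** 2 for x, _ in rows)
        sxy += sum((x - xbar) * (y - ybar) for x, y in rows)
    beta = sxy / sxx
    intercepts = {lvl: yb - beta * xb for lvl, (xb, yb) in means.items()}
    return beta, intercepts


beta, intercepts = fit_ancova(DATA)
fits = [intercepts[level] + beta * x for _, level, x in DATA]
residuals = [y - fit for (y, _, _), fit in zip(DATA, fits)]

# 1-based observation number with the largest absolute residual
worst_obs = max(range(len(DATA)), key=lambda k: abs(residuals[k])) + 1

for (y, level, x), fit, res in zip(DATA, fits, residuals):
    print(f"{y:3d} {level} {x:5.1f} {fit:8.4f} {res:9.5f}")
print("largest |residual| at observation", worst_obs)
```

The observation with the largest raw residual is the one Minitab lists under "Unusual Observations". Minitab's flag is based on the standardized residual, which also depends on leverage and is not computed in this sketch.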

Towards Providing Parameter Estimates

The cell means for the response variable NumDays are given below.

Rows: APhyFit

     YNumDays  YNumDays  YNumDays
            N      Mean     StDev
1           8    38.000     5.477
2          10    32.000     3.464
3           6    24.000     4.427
All        24    32.000     6.878

The Cell Means for the Covariate Age are given below.

Rows: APhyFit

         XAge      XAge      XAge
            N      Mean     StDev
1           8    26.188     4.566
2          10    22.630     2.990
3           6    21.667     3.858
All        24    23.575     4.098

Recall that the estimate of the regression coefficient was: 1.16729

Estimate of the grand mean: 32 - (1.16729)(23.575) = 4.4811

Estimates of the Treatment Effects for levels of Physical Fitness:

Below Average: (38 - 32) - (1.16729)(26.188 - 23.575) = 2.9499
Average:       (32 - 32) - (1.16729)(22.630 - 23.575) = 1.1031
High Average:  (24 - 32) - (1.16729)(21.667 - 23.575) = -5.7728

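Note that (8)(2.9499) + (10)(1.1031) + (6)(-5.7728) = 0, up to rounding error. That is, the estimated treatment effects, weighted by the sample sizes, sum to zero.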
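The arithmetic above can be repeated from the raw data, which avoids the rounding in the printed cell means. The sketch below is plain Python and assumes the weighted constraint used in this section (the effects, weighted by sample size, sum to zero). As a side check, it also averages the three per-level intercepts without weights, which reproduces the value 3.9083 that Minitab reports as Constant. This suggests that Minitab's Constant is based on an unweighted sum-to-zero constraint, which would explain why it differs from 4.4811. Variable names are illustrative.

```python
from collections import defaultdict

# (YNumDays, APhyFit, XAge) rows copied from the data table in this handout
DATA = [
    (29, 1, 18.3), (42, 1, 30.0), (38, 1, 26.5), (40, 1, 28.1),
    (43, 1, 29.7), (40, 1, 27.8), (30, 1, 19.8), (42, 1, 29.3),
    (30, 2, 20.8), (35, 2, 25.2), (39, 2, 29.2), (28, 2, 20.0),
    (31, 2, 21.5), (31, 2, 22.1), (29, 2, 19.7), (35, 2, 24.7),
    (29, 2, 20.2), (33, 2, 22.9),
    (26, 3, 22.7), (32, 3, 28.7), (21, 3, 18.9), (20, 3, 18.0),
    (23, 3, 21.7), (22, 3, 20.0),
]

groups = defaultdict(list)
for y, level, x in DATA:
    groups[level].append((x, y))

# Cell sizes and cell means, plus the pooled within-group slope
cell = {}
sxx = sxy = 0.0
for level, rows in groups.items():
    n_i = len(rows)
    xbar = sum(x for x, _ in rows) / n_i
    ybar = sum(y for _, y in rows) / n_i
    cell[level] = (n_i, xbar, ybar)
    sxx += sum((x - xbar) ** 2 for x, _ in rows)
    sxy += sum((x - xbar) * (y - ybar) for x, y in rows)
beta = sxy / sxx

n = len(DATA)
grand_x = sum(x for _, _, x in DATA) / n
grand_y = sum(y for y, _, _ in DATA) / n

# Estimates under the weighted constraint: sum of n_i * alpha_i equals 0
mu_hat = grand_y - beta * grand_x
alpha_hat = {
    level: (ybar - grand_y) - beta * (xbar - grand_x)
    for level, (_, xbar, ybar) in cell.items()
}
weighted_sum = sum(cell[level][0] * alpha_hat[level] for level in cell)

# Side check: unweighted average of the per-level intercepts
mu_unweighted = sum(yb - beta * xb for _, xb, yb in cell.values()) / len(cell)

print(f"mu_hat = {mu_hat:.4f}")
for level in sorted(alpha_hat):
    print(f"alpha_hat[{level}] = {alpha_hat[level]:.4f}")
print(f"weighted sum of effects = {weighted_sum:.2e}")
print(f"unweighted intercept average = {mu_unweighted:.4f}")
```

Small differences in the last digit from the hand calculation come from using unrounded cell means and slope.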

Remark: The testing procedure could also be performed using the "Extra Sum of Squares" approach. Thus, to test for differences among the levels of the qualitative variable "PhyFit" we fit the two models:

Full Model: includes Age and PhyFit, whose ANOVA was given above.
Reduced Model: only includes Age. The ANOVA for this is given by

Analysis of Variance for YNumDays, using Adjusted SS for Tests

Source    DF   Seq SS   Adj SS   Adj MS      F      P
XAge       1   835.75   835.75   835.75  72.89  0.000
Error     22   252.25   252.25    11.47
Total     23  1088.00

Term        Coef   StDev      T      P
Constant  -2.682   4.121  -0.65  0.522
XAge      1.4711  0.1723   8.54  0.000

The extra sum of squares associated with "PhyFit" is therefore: SSE(Reduced) - SSE(Full) = 252.25 - 6.17 = 246.08, and its degrees-of-freedom is 22 - 20 = 2. Note that this extra sum of squares is also obtainable from the Adjusted SS in the first ANOVA table (this corresponds to the Type III SS). This approach could also be used to perform the test that the regression coefficient is zero, but this test is obtainable from the first ANOVA table.
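The extra sum of squares calculation can be sketched in plain Python as follows. The reduced model is a simple linear regression of NumDays on Age, and the full model uses the pooled within-group slope. The function names `sse_reduced` and `sse_full` are illustrative and do not come from the handout, and the p-value is not computed because it requires the F distribution.

```python
from collections import defaultdict

# (YNumDays, APhyFit, XAge) rows copied from the data table in this handout
DATA = [
    (29, 1, 18.3), (42, 1, 30.0), (38, 1, 26.5), (40, 1, 28.1),
    (43, 1, 29.7), (40, 1, 27.8), (30, 1, 19.8), (42, 1, 29.3),
    (30, 2, 20.8), (35, 2, 25.2), (39, 2, 29.2), (28, 2, 20.0),
    (31, 2, 21.5), (31, 2, 22.1), (29, 2, 19.7), (35, 2, 24.7),
    (29, 2, 20.2), (33, 2, 22.9),
    (26, 3, 22.7), (32, 3, 28.7), (21, 3, 18.9), (20, 3, 18.0),
    (23, 3, 21.7), (22, 3, 20.0),
]


def residual_ss(pairs):
    """Residual SS from a straight-line fit to (x, y) pairs, plus Sxx and Sxy."""
    n = len(pairs)
    xbar = sum(x for x, _ in pairs) / n
    ybar = sum(y for _, y in pairs) / n
    sxx = sum((x - xbar) ** 2 for x, _ in pairs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in pairs)
    syy = sum((y - ybar) ** 2 for _, y in pairs)
    return syy - sxy ** 2 / sxx, sxx, sxy


def sse_reduced(data):
    """Error SS of the reduced model (Age only)."""
    return residual_ss([(x, y) for y, _, x in data])[0]


def sse_full(data):
    """Error SS of the full model (PhyFit plus a common Age slope)."""
    groups = defaultdict(list)
    for y, level, x in data:
        groups[level].append((x, y))
    sxx = sxy = syy = 0.0
    for rows in groups.values():
        sse_i, sxx_i, sxy_i = residual_ss(rows)
        sxx += sxx_i
        sxy += sxy_i
        syy += sse_i + sxy_i ** 2 / sxx_i  # recover within-group Syy
    return syy - sxy ** 2 / sxx


sse_r = sse_reduced(DATA)
sse_f = sse_full(DATA)
df_r = len(DATA) - 2            # 22
df_f = len(DATA) - 3 - 1        # 20
extra_ss = sse_r - sse_f
f_stat = (extra_ss / (df_r - df_f)) / (sse_f / df_f)

print(f"SSE(Reduced) = {sse_r:.2f}, SSE(Full) = {sse_f:.2f}")
print(f"extra SS = {extra_ss:.2f} on {df_r - df_f} df, F = {f_stat:.2f}")
```

The resulting F statistic can be compared with the APhyFit row of the first ANOVA table.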
