Analysis of Covariance (Really a Numerical BLOCKING Factor)
Analysis of covariance is useful when we are interested in comparing treatment effects, but the response is also affected by another numerical variable that we cannot effectively control in the design.

Example: A study of weekly sales Y of some item under different advertising strategies (treatments) in different stores will be more successful if each store's sales of the item the week before, X, are included in the study. Similarly, physicians studying the effects of diets on renal function will want to use the age of the patients as a covariate, since age may also have a large effect on renal function.

The data for the simplest ANCOVA have the following form: $n_i$ observations from the $i$th treatment as pairs $(Y_{ij}, X_{ij})$, $j = 1, \dots, n_i$ and $i = 1, \dots, r$.

The FULL model, or unequal slopes model, for an ANCOVA says simply that each of the $r$ treatments has its own regression line for Y vs. X, but with the same error variance about each line.

The Model: $Y_{ij} = \mu + \tau_i + \beta_i X_{ij} + \varepsilon_{ij}$

Where:
$\mu$ is the overall constant (an average Y intercept over the $r$ regression lines)
$\tau_i$ is an adjustment to the Y intercept for the $i$th regression line
$\beta_i$ is the slope of the $i$th regression line
$X_{ij}$ is the covariate, assumed to be measured without error
$\varepsilon_{ij}$ are independently, normally distributed with mean 0 and variance $\sigma^2$.

The true regression line for treatment 1 is $(\mu + \tau_1) + \beta_1 X$, for treatment 2 it is $(\mu + \tau_2) + \beta_2 X$, and so on.

In most situations we are interested in comparing the mean responses between treatments at a specified value of X, say $X_0$. For treatments 1 and 2 this difference is labeled D:

$D = (\tau_1 - \tau_2) + (\beta_1 - \beta_2) X_0$

Obviously, if we try to do this for all possible values of $X_0$ it is going to be a lot of work. It would be much easier if the lines were parallel ($\beta_1 = \beta_2 = \dots = \beta_r = \beta$), because D then no longer depends on $X_0$ and the comparison reduces to $\tau_1 - \tau_2$. Our model is then of the form:

The Model: $Y_{ij} = \mu + \tau_i + \beta X_{ij} + \varepsilon_{ij}$

So when comparing the mean responses among treatments is of primary interest:
1. Fit the first (unequal slopes) model.
2. Check for equality of slopes.
3. If the test is highly INSIGNIFICANT, fit the second model and proceed with the comparison of means.

How to do this in SAS:

Do a plot to check for the equality of slopes.

    PROC PLOT;
      PLOT Y*X=TRT;
    RUN;

OR you could do a plot separately for each treatment and compare:

    PROC SORT; BY TRT; RUN;
    PROC PLOT; BY TRT;
      PLOT Y*X;
    RUN;

This should give you a rough idea of whether the lines are indeed parallel. For a formal test of equality of slopes:

    PROC GLM;
      CLASS TRT;
      MODEL Y=TRT X X*TRT;
    RUN;

REMEMBER: You should only interpret the TYPE III F test for X*TRT, which tests for equal slopes. Do not interpret anything else, ESPECIALLY the TRT effect. (In this model it tests for equality of the Y intercepts among the treatments, and if X = 0 is not in your data range this test is neither useful nor relevant.)

Equal Slopes Model: If you have decided that the slopes are indeed equal, you can use the following statements:

    PROC GLM;
      CLASS TRT;
      MODEL Y=TRT X;
    RUN;

Hypotheses of interest:
No treatment effects (lines coincide): $\tau_1 = \tau_2 = \dots = \tau_r$ (TYPE III F TEST FOR TRT)
No X effect (slope = 0): $\beta = 0$ (TYPE III F TEST FOR X)

SAS creates the vector of parameters as follows, and you can estimate anything you want with ESTIMATE statements. Consider a situation with 3 treatments and 1 covariate. The vector created by SAS is

$(\mu, \tau_1, \tau_2, \tau_3, \beta)$ = (Intercept, TRT, TRT, TRT, slope)

How to do this (each ESTIMATE statement needs a quoted label, and goes after the MODEL statement):

Intercept of treatment 1:

    ESTIMATE 'Intercept, treatment 1' INTERCEPT 1 TRT 1 0 0;

Common slope:

    ESTIMATE 'Common slope' X 1;

Distance between lines 1 and 2:

    ESTIMATE 'Line 1 minus line 2' TRT 1 -1 0;

Mean response in treatment 1 at X = 50:

    ESTIMATE 'Treatment 1 mean at X=50' INTERCEPT 1 TRT 1 0 0 X 50;

AND SO ON.

LSMEANS, or the adjusted means, are the treatment means evaluated at the most typical value of X, which is the overall mean $\bar{X}_{..}$. If that is of interest to you, add the following statement after the MODEL statement:

    LSMEANS TRT / STDERR PDIFF;

It gives you the estimates of the adjusted means, their standard errors, and the p-values for the non-simultaneous pairwise differences among the means.
You can use these results to do BONFERRONI-type comparisons. HOWEVER, NEVER USE THE MEANS STATEMENT IN SAS WITH ANCOVA: it reports raw treatment means that are not adjusted for the covariate.

Example

A common clinical method to evaluate an individual's cardiovascular capacity is treadmill testing. Maximal oxygen uptake is considered a good index of work capacity and cardiovascular function. The maximal oxygen uptake measured for an individual depends on a number of factors, including the mode of testing, the test protocol, and the subject's physical conditioning and age. A common test protocol on the treadmill is the inclined protocol, where grade and speed are incrementally increased until exhaustion occurs.

Two treatments were of interest to the researcher: a 12-week step aerobic training program and a 12-week outdoor running regimen on flat terrain. It was thought that the step aerobic training better simulated the treadmill inclined protocol than the flat terrain running regimen.

Twelve healthy males who did not participate in a regular exercise program were selected. Six individuals were randomly assigned to the step aerobic treatment and six to the flat terrain running treatment. Various respiratory measurements were made on the subjects while on the treadmill before the 12-week period. There were no differences in the respiratory measurements of the two groups of subjects prior to the treatment. The measurement of interest is the change in maximal ventilation (liters/minute) of oxygen over the 12-week period. The observations on the 12 subjects and their ages are shown in the following table:

    Aerobic Group        Running Group
    Age    Change        Age    Change
    31     17.05         23      -0.87
    23      4.96         22     -10.74
    27     10.40         22      -3.27
    28     11.05         25      -1.97
    22      0.26         27       7.50
    24      2.51         20      -7.25

The experimental design is completely randomized with a one-way treatment structure. We could fit the usual linear model,

$Y_{it} = \mu + \tau_i + \varepsilon_{it}$

However, treadmill performance is related to age, so the analysis should adjust for age-related effects.
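As a quick illustration of why the adjustment matters, the sketch below computes the raw (unadjusted) group means of age and of the response from the table above. It uses only the Python standard library; the variable and function names are illustrative and are not part of the original SAS analysis.

```python
from statistics import mean

# (age, change in maximal ventilation) pairs copied from the data table
aerobic = [(31, 17.05), (23, 4.96), (27, 10.40), (28, 11.05), (22, 0.26), (24, 2.51)]
running = [(23, -0.87), (22, -10.74), (22, -3.27), (25, -1.97), (27, 7.50), (20, -7.25)]

def group_means(pairs):
    """Return (mean age, mean change) for one treatment group."""
    ages = [age for age, _ in pairs]
    changes = [change for _, change in pairs]
    return mean(ages), mean(changes)

age_a, change_a = group_means(aerobic)
age_r, change_r = group_means(running)

print(f"aerobic: mean age {age_a:.2f}, raw mean change {change_a:.3f}")
print(f"running: mean age {age_r:.2f}, raw mean change {change_r:.3f}")
```

If the two groups differ in mean age, part of the difference between the raw mean changes may reflect age rather than the training program, which is what the covariate adjustment is meant to remove.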
A model with a linear predictor for adjusting for age (X) effects is:

$Y_{it} = \mu + \tau_i + \beta X_{it} + \varepsilon_{it}$

Assumptions

In addition to the usual assumptions on the error variables, the analysis of covariance model assumes a linear relationship between the covariate and the mean response, with the same slope for each treatment. First check for model lack of fit by plotting the response versus the covariate for each treatment on the same scale, to make sure the relationship is linear. If the relationship is linear within each treatment, a formal test for equality of slopes across treatments can be conducted. The model for the unequal slopes situation is:

$Y_{it} = \mu + \tau_i + \beta_i X_{it} + \varepsilon_{it}$

Writing each slope as $\beta_i = \beta + (\tau\beta)_i$, this model may be rewritten as:

$Y_{it} = \mu + \tau_i + \beta X_{it} + (\tau\beta)_i X_{it} + \varepsilon_{it}$

Where
$Y_{it}$ = the response for the $t$th replicate of the $i$th level of the treatment
$\mu$ = the overall mean response
$\tau_i$ = deviation of the response from $\mu$ for the $i$th level of treatment
$\beta$ = the common slope of the linear predictor
$(\tau\beta)_i$ = the interaction between the treatment effect and the linear predictor
$X_{it}$ = the $(i,t)$th value of the linear predictor
$\varepsilon_{it}$ = random error; the $\varepsilon_{it} \sim N(0, \sigma^2)$ and are mutually independent

Approach to ANCOVA analysis

Test whether the interaction $(\tau\beta)_i$ between the treatment effect and the linear predictor is significant. Recall that interactions measure whether the treatment means stay parallel across the levels of the second factor, which in this case is the linear predictor. If the interaction is significant, at least one of the treatment regression lines has a different slope. If the interaction is non-significant, the regression lines are assumed to have the same slope for each treatment.
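The interaction test described above can be sketched numerically as a comparison of residual sums of squares between the unequal-slopes (full) and equal-slopes (reduced) models. The following is a minimal standard-library Python sketch for the treadmill data, not the SAS procedure itself; the helper `corrected_sums` and all variable names are illustrative assumptions.

```python
# (age, change) pairs for each treatment, copied from the example data
groups = {
    "aerobic": [(31, 17.05), (23, 4.96), (27, 10.40), (28, 11.05), (22, 0.26), (24, 2.51)],
    "running": [(23, -0.87), (22, -10.74), (22, -3.27), (25, -1.97), (27, 7.50), (20, -7.25)],
}

def corrected_sums(pairs):
    """Within-group corrected sums (Sxx, Sxy, Syy) for a list of (x, y) pairs."""
    n = len(pairs)
    xbar = sum(x for x, _ in pairs) / n
    ybar = sum(y for _, y in pairs) / n
    sxx = sum((x - xbar) ** 2 for x, _ in pairs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in pairs)
    syy = sum((y - ybar) ** 2 for _, y in pairs)
    return sxx, sxy, syy

stats = {name: corrected_sums(pairs) for name, pairs in groups.items()}

# Full model: a separate least-squares line (own slope) for each treatment.
slopes = {name: sxy / sxx for name, (sxx, sxy, _) in stats.items()}
sse_full = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in stats.values())

# Reduced model: one common slope, pooled over the within-group sums.
pooled_sxx = sum(s[0] for s in stats.values())
pooled_sxy = sum(s[1] for s in stats.values())
pooled_syy = sum(s[2] for s in stats.values())
common_slope = pooled_sxy / pooled_sxx
sse_reduced = pooled_syy - pooled_sxy ** 2 / pooled_sxx

# F test for equal slopes: extra residual SS per extra slope over the full-model MSE.
r = len(groups)
n_total = sum(len(pairs) for pairs in groups.values())
df_num = r - 1            # extra slope parameters in the full model
df_den = n_total - 2 * r  # residual df: one intercept and one slope per treatment
f_slopes = ((sse_reduced - sse_full) / df_num) / (sse_full / df_den)

print("slopes by treatment:", slopes)
print("common slope:", common_slope)
print("interaction SS:", sse_reduced - sse_full, "residual SS:", sse_full)
print("F for equal slopes:", f_slopes, "on", df_num, "and", df_den, "df")
```

The printed interaction sum of squares, residual sum of squares, and F value can be compared with the age*trt and Residual lines of the Type 3 table in the PROC MIXED output for the model with interaction. The p-value is not computed here because the standard library has no F distribution.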
An example analysis of covariance

    title1 'Analysis of Covariance for Maximal Oxygen Uptake';
    options linesize=80 pageno=1;

    data ancova;
      input age oxygen trt $;
      cards;
    31  17.05 aerobic
    23   4.96 aerobic
    27  10.40 aerobic
    28  11.05 aerobic
    22   0.26 aerobic
    24   2.51 aerobic
    23  -0.87 running
    22 -10.74 running
    22  -3.27 running
    25  -1.97 running
    27   7.50 running
    20  -7.25 running
    ;

    proc print data = ancova;
    run;

    proc plot data = ancova;
      plot oxygen*age=trt / vpos=19 hpos=50;
    run;

    title2 'Model with Interaction';
    proc mixed data=ancova method=type3;
      class trt;
      model oxygen = trt age trt*age;
    run;

    title2 'Model without Interaction';
    proc mixed data=ancova method=type3;
      class trt;
      model oxygen = trt age / solution;
      lsmeans trt / pdiff;
    run;

Analysis of Covariance for Maximal Oxygen Uptake

    Obs    age    oxygen    trt
      1     31     17.05    aerobic
      2     23      4.96    aerobic
      3     27     10.40    aerobic
      4     28     11.05    aerobic
      5     22      0.26    aerobic
      6     24      2.51    aerobic
      7     23     -0.87    running
      8     22    -10.74    running
      9     22     -3.27    running
     10     25     -1.97    running
     11     27      7.50    running
     12     20     -7.25    running

Plot of oxygen*age. Symbol is value of trt.
[Plot: oxygen (vertical axis, -20 to 20) versus age (horizontal axis, 20 to 31); points are marked a (aerobic) and r (running).]

Model with Interaction

The Mixed Procedure

    Model Information
    Data Set                     WORK.ANCOVA
    Dependent Variable           oxygen
    Covariance Structure         Diagonal
    Estimation Method            Type 3
    Residual Variance Method     Factor
    Fixed Effects SE Method      Model-Based
    Degrees of Freedom Method    Residual

    Class Level Information
    Class    Levels    Values
    trt           2    aerobic running

    Dimensions
    Covariance Parameters     1
    Columns in X              6
    Columns in Z              0
    Subjects                  1
    Max Obs Per Subject      12

    Number of Observations
    Number of Observations Read        12
    Number of Observations Used        12
    Number of Observations Not Used     0

    Type 3 Analysis of Variance
    Source      DF    Sum of Squares    Mean Square    Expected Mean Square
    trt          1          5.907110       5.907110    Var(Residual) + Q(trt)
    age          1        303.176487     303.176487    Var(Residual) + Q(age,age*trt)
    age*trt      1          2.048957       2.048957    Var(Residual) + Q(age*trt)
    Residual     8         68.339814       8.542477    Var(Residual)

    Type 3 Analysis of Variance
    Source      Error Term      Error DF    F Value    Pr > F
    trt         MS(Residual)           8       0.69    0.4298
    age         MS(Residual)           8      35.49    0.0003
    age*trt     MS(Residual)           8       0.24    0.6375
    Residual    .
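Since the age*trt test above is far from significant (F = 0.24, p = 0.6375), the procedure in these notes moves on to the equal-slopes model and compares treatment means adjusted to a common age. The sketch below is a hedged standard-library Python illustration of that step, assuming (as the notes state for LSMEANS) that adjusted means are evaluated at the overall mean of the covariate; the function and variable names are illustrative and this is not the SAS output itself.

```python
# (age, change) pairs for each treatment, copied from the example data
groups = {
    "aerobic": [(31, 17.05), (23, 4.96), (27, 10.40), (28, 11.05), (22, 0.26), (24, 2.51)],
    "running": [(23, -0.87), (22, -10.74), (22, -3.27), (25, -1.97), (27, 7.50), (20, -7.25)],
}

def summarize(pairs):
    """Return (xbar, ybar, Sxx, Sxy) for one treatment's (x, y) pairs."""
    n = len(pairs)
    xbar = sum(x for x, _ in pairs) / n
    ybar = sum(y for _, y in pairs) / n
    sxx = sum((x - xbar) ** 2 for x, _ in pairs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in pairs)
    return xbar, ybar, sxx, sxy

summary = {name: summarize(pairs) for name, pairs in groups.items()}

# Common slope under the equal-slopes model: pooled within-group Sxy / Sxx.
common_slope = (sum(s[3] for s in summary.values())
                / sum(s[2] for s in summary.values()))

# Overall mean age across all subjects, the value at which means are adjusted.
all_ages = [x for pairs in groups.values() for x, _ in pairs]
grand_mean_age = sum(all_ages) / len(all_ages)

# Adjusted mean: slide each group's raw mean along the common slope
# from the group's own mean age to the overall mean age.
adjusted = {
    name: ybar - common_slope * (xbar - grand_mean_age)
    for name, (xbar, ybar, _, _) in summary.items()
}
diff = adjusted["aerobic"] - adjusted["running"]

print("common slope:", common_slope)
print("overall mean age:", grand_mean_age)
print("adjusted means:", adjusted)
print("aerobic minus running (adjusted):", diff)
```

Because the slopes are taken to be equal, the adjusted difference is the same at every age; the overall mean age only fixes the level at which the two adjusted means are reported. Standard errors and p-values for the difference are left to the LSMEANS output.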