
22S:101 Biostatistics: J. Huang

Analysis of Variance (ANOVA)

Analysis of variance (ANOVA) refers to a broad class of methods for studying variation among samples under different conditions (or treatments). The simplest form of ANOVA can be used for testing the equality of three or more population means. It can be viewed as an extension of the two-sample t-test we discussed for comparing two population means.

The basic idea of ANOVA is to partition the total variation in a data set into two or more components. Each component is associated with a specific source of variation, so that the analysis can ascertain how much each source contributes to the total variation.

1. The Completely Randomized Design

In many situations we want to test the null hypothesis that three or more treatments are equally effective. The experiment for this purpose is designed in such a way that the treatments of interest are assigned completely at random to the subjects (or objects). Then the measurements to determine treatment effectiveness are taken. This design is called the completely randomized experimental design.

Example: [Zelazo et al. “Walking” in the newborn. Science 176, 314-315, 1972.]

Zelazo et al. conducted a study to test the generality of the observation that stimulation of the walking and placing reflexes in the newborn promotes increased walking and placing. The subjects in the Table were about 1-week-old male infants. Each infant was assigned to one of the four groups: an experimental group (active-exercise) and three control groups.

Table: Ages of infants for walking alone (months)

        Active-exercise   Passive-exercise   No-exercise   8-week Control
         9.00             11.00              11.50         13.25
         9.50             10.00              12.00         11.50
         9.75             10.00               9.00         12.00
        10.00             11.75              11.50         13.50
        13.00             10.50              13.25         11.50
         9.50             15.00              13.00
mean    10.125            11.375             11.708        12.350
sd       1.447             1.896              1.520         0.962

Grand mean

\[ \bar{x} = \frac{6 \times 10.125 + 6 \times 11.375 + 6 \times 11.708 + 5 \times 12.350}{23} = 11.348. \]

The omnibus hypothesis

In most applications involving more than two groups, the investigator has some specific contrasts in mind for estimation or testing. For instance, in the Example, the comparisons of the active-exercise group with each of the remaining three groups are the three contrasts of principal interest [we will talk more about this below].

However, in some situations, it is of interest to test the omnibus hypothesis that there are no differences among the treatment groups.

H0 : µ1 = µ2 = µ3 = µ4 vs. HA : the opposite of H0, i.e., at least two of the means differ.

Analysis of Variance Table for the Example

Source of Variation    SS       df    MS      F-statistic
Between Samples        14.78     3    4.92    F = 2.14
Within Samples         43.69    19    2.30
Total                  58.47    22

df: degrees of freedom; SS: Sum of Squares; MS: Mean Square.
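The entries of this table can be recomputed from the raw ages in the infant-walking table. The following is a minimal sketch using only the Python standard library; the function name `one_way_anova` and the group labels are illustrative choices, not part of the course material.

```python
from statistics import mean

# Ages (months) at which infants first walked alone, from the table above.
groups = {
    "active-exercise":  [9.00, 9.50, 9.75, 10.00, 13.00, 9.50],
    "passive-exercise": [11.00, 10.00, 10.00, 11.75, 10.50, 15.00],
    "no-exercise":      [11.50, 12.00, 9.00, 11.50, 13.25, 13.00],
    "8-week control":   [13.25, 11.50, 12.00, 13.50, 11.50],
}

def one_way_anova(samples):
    """Return (SSB, SSW, df_between, df_within, F) for a list of samples."""
    k = len(samples)                      # number of groups
    n = sum(len(s) for s in samples)      # total sample size
    grand_mean = sum(sum(s) for s in samples) / n
    # Between-samples sum of squares: group means around the grand mean.
    ssb = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Within-samples sum of squares: observations around their own group mean.
    ssw = sum((x - mean(s)) ** 2 for s in samples for x in s)
    f_stat = (ssb / (k - 1)) / (ssw / (n - k))
    return ssb, ssw, k - 1, n - k, f_stat

ssb, ssw, df_b, df_w, f_stat = one_way_anova(list(groups.values()))
print(f"SSB = {ssb:.2f} (df {df_b}), SSW = {ssw:.2f} (df {df_w}), F = {f_stat:.2f}")
```

The computed values agree with the table up to rounding. A p-value would additionally require the F distribution with (3, 19) degrees of freedom, which the standard library does not provide.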

Analysis of Variance Table

Source of Variation    SS     df       MS                   F-statistic
Between Samples        SSB    k − 1    MSB = SSB/(k − 1)    F = MSB/MSW
Within Samples         SSW    n − k    MSW = SSW/(n − k)
Total                  SST    n − 1

Here k is the number of groups and n is the total sample size, i.e., n = n1 + · · · + nk.

\[ SSB = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x})^2, \]

\[ SSW = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2, \]

\[ SST = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x})^2. \]

Using the notation in the textbook,

\[ s_B^2 = MSB = \frac{SSB}{k - 1}, \qquad s_W^2 = MSW = \frac{SSW}{n - k}. \]
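The partition of the total variation into between-samples and within-samples components follows directly from these definitions. A short derivation, writing x_{ij} for observation i in group j, \bar{x}_j for the mean of group j, and \bar{x} for the grand mean:

```latex
\begin{align*}
SST &= \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x})^2
     = \sum_{j=1}^{k} \sum_{i=1}^{n_j}
       \bigl[ (x_{ij} - \bar{x}_j) + (\bar{x}_j - \bar{x}) \bigr]^2 \\
    &= \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2
       + 2 \sum_{j=1}^{k} (\bar{x}_j - \bar{x}) \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)
       + \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x})^2 \\
    &= SSW + SSB .
\end{align*}
```

The middle term vanishes because the deviations of the observations in a group from their own group mean sum to zero.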

Example (page 287)

Forced expiratory volume in one second (FEV) for patients with coronary artery disease sampled at 3 different medical centers.

Johns Hopkins         Rancho Los Amigos     St. Louis
n1 = 21               n2 = 16               n3 = 23
x̄1 = 2.63 liters      x̄2 = 3.03 liters      x̄3 = 2.88 liters
s1 = 0.496 liters     s2 = 0.523 liters     s3 = 0.498 liters

\[ SSW = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2 = (21 - 1)\,0.496^2 + (16 - 1)\,0.523^2 + (23 - 1)\,0.498^2 = 14.478. \]

\[ \bar{x} = \frac{21 \times 2.63 + 16 \times 3.03 + 23 \times 2.88}{21 + 16 + 23} = 2.83. \]

\[ SSB = n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2 + n_3(\bar{x}_3 - \bar{x})^2 = 21(2.63 - 2.83)^2 + 16(3.03 - 2.83)^2 + 23(2.88 - 2.83)^2 = 1.538. \]

\[ SST = SSB + SSW = 1.538 + 14.478 = 16.016. \]

ANOVA Table

Source of Variation    SS        df    MS       F-statistic
Between Samples         1.538     2    0.769    F = 3.028
Within Samples         14.478    57    0.254
Total                  16.016    59

The F-statistic is

\[ F = \frac{SSB/(k - 1)}{SSW/(n - k)} = \frac{1.538/2}{14.478/57} = \frac{0.769}{0.254} = 3.028. \]

This F-statistic has (2, 57) degrees of freedom. The p-value is 0.056.
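Because this example reports only summary statistics, the ANOVA table has to be built from the group sizes, means, and standard deviations rather than from raw observations. The sketch below shows one way to do that; the function name `anova_from_summary` and the dictionary layout are illustrative assumptions. It keeps the grand mean unrounded, so the results can differ from the hand calculation in the third decimal place.

```python
# (n_j, mean_j, sd_j) in liters for each medical center, from the example.
centers = {
    "Johns Hopkins":     (21, 2.63, 0.496),
    "Rancho Los Amigos": (16, 3.03, 0.523),
    "St. Louis":         (23, 2.88, 0.498),
}

def anova_from_summary(stats):
    """One-way ANOVA quantities from a list of (n_j, mean_j, sd_j) triples."""
    k = len(stats)
    n = sum(n_j for n_j, _, _ in stats)
    # Grand mean is the sample-size-weighted average of the group means.
    grand_mean = sum(n_j * m_j for n_j, m_j, _ in stats) / n
    # SSW pools the within-group variances: sum of (n_j - 1) * s_j^2.
    ssw = sum((n_j - 1) * s_j ** 2 for n_j, _, s_j in stats)
    ssb = sum(n_j * (m_j - grand_mean) ** 2 for n_j, m_j, _ in stats)
    msb = ssb / (k - 1)
    msw = ssw / (n - k)
    return {"SSB": ssb, "SSW": ssw, "SST": ssb + ssw,
            "df": (k - 1, n - k), "MSB": msb, "MSW": msw, "F": msb / msw}

table = anova_from_summary(list(centers.values()))
for name, value in table.items():
    print(name, value)
```

The p-value quoted above comes from the F distribution with (2, 57) degrees of freedom and is not computed in this sketch.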

Multiple Comparisons Procedures

H0 : µ1 = · · · = µk.

There are k(k − 1)/2 pairwise comparisons. One (conservative) way to control the overall type I error rate is to use the Bonferroni correction: the significance level for each individual test is

\[ \alpha^{*} = \frac{0.05}{k(k - 1)/2}. \]

We are often interested in a particular pair-wise comparison.

H0 : µi = µj.

We use the t-statistic

\[ t_{ij} = \frac{\bar{x}_i - \bar{x}_j}{\sqrt{s_W^2\,[(1/n_i) + (1/n_j)]}}. \]

Example: for the FEV example in Chapter 12,

\[ t_{12} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_W^2\,[(1/n_1) + (1/n_2)]}} = \frac{2.63 - 3.03}{\sqrt{0.254\,[(1/21) + (1/16)]}} = -2.39. \]

From the t distribution with df = n − k = 60 − 3 = 57, p = 0.02. Therefore, we reject the null hypothesis at the 0.10/3 = 0.033 level. However, we would not reject the null hypothesis at the 0.05/3 = 0.017 level.

\[ t_{13} = \frac{\bar{x}_1 - \bar{x}_3}{\sqrt{s_W^2\,[(1/n_1) + (1/n_3)]}} = \frac{2.63 - 2.88}{\sqrt{0.254\,[(1/21) + (1/23)]}} = -1.64. \]

From the t distribution with df = n − k = 60 − 3 = 57, p = 0.11.

\[ t_{23} = \frac{\bar{x}_2 - \bar{x}_3}{\sqrt{s_W^2\,[(1/n_2) + (1/n_3)]}} = \frac{3.03 - 2.88}{\sqrt{0.254\,[(1/16) + (1/23)]}} = 0.91. \]

From the t distribution with df = n − k = 60 − 3 = 57, p = 0.37.
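The three pairwise statistics and the Bonferroni per-test level can be computed in one pass. This is a minimal sketch assuming the within-samples mean square 0.254 from the ANOVA table; the helper names `bonferroni_level` and `pairwise_t` are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import sqrt

# (n_j, mean_j) for each medical center, from the FEV example.
centers = {
    "Johns Hopkins":     (21, 2.63),
    "Rancho Los Amigos": (16, 3.03),
    "St. Louis":         (23, 2.88),
}
msw = 0.254  # within-samples mean square s_W^2, with df = n - k = 57

def bonferroni_level(k, alpha=0.05):
    """Per-test significance level when all k(k-1)/2 pairs are compared."""
    return alpha / (k * (k - 1) / 2)

def pairwise_t(stats, msw):
    """t_ij = (mean_i - mean_j) / sqrt(msw * (1/n_i + 1/n_j)) for every pair."""
    t_stats = {}
    for (name_i, (n_i, m_i)), (name_j, (n_j, m_j)) in combinations(stats.items(), 2):
        t_stats[(name_i, name_j)] = (m_i - m_j) / sqrt(msw * (1 / n_i + 1 / n_j))
    return t_stats

alpha_star = bonferroni_level(len(centers))
t_stats = pairwise_t(centers, msw)
for (name_i, name_j), t in t_stats.items():
    print(f"{name_i} vs {name_j}: t = {t:.2f}")
print(f"Bonferroni per-test level: {alpha_star:.3f}")
```

Each statistic is then referred to the t distribution with 57 degrees of freedom, and the resulting p-value is compared with the per-test level rather than with 0.05. The p-values themselves are not computed here because the standard library has no t distribution.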