
Statistical Analysis and Optimal Design for Cluster Randomized Trials


Psychological Methods Copyright 1997 by the Am 1997, Vol. 2, No. 2, 173-185


Stephen W. Raudenbush Michigan State University

In many intervention studies, therapy outcome evaluations, and educational field trials, random treatment assignment of clusters rather than persons is desirable for political feasibility, logistics, or ecological validity. However, cluster randomized designs are widely regarded as lacking statistical precision. This article considers when and to what extent using a pretreatment covariate can increase experimental precision. To answer this question, the author first optimizes allocation of resources within and between clusters for the no-covariate case. Optimal sample sizes at each level depend on variation within and between clusters and on the cost of sampling at each level. Next, the author considers optimal allocation when a covariate is added. In this case, the explanatory power of the covariate at each level becomes highly relevant for choosing optimal sample sizes. A key conclusion is that statistical analysis that fully uses information about the covariate-outcome relationship can substantially increase the efficiency of the cluster randomized trial, especially when the cost of sampling clusters is high and the covariate accounts for substantial variation between clusters. Recent multilevel studies indicate that these conditions are common.

Research reported here was conducted with support from the Project on Human Development in Chicago Neighborhoods with funds from the MacArthur Foundation, the National Institute of Justice, and the National Institute of Mental Health. An earlier version of this article was presented at the Manpower Demonstration Research Corporation's Workshop on Alternatives to Individual Random Assignment. I would like to thank Winston Lin for his extremely helpful critique of an earlier draft and Howard Bloom for providing a helpful example.

Correspondence concerning this article should be addressed to Stephen W. Raudenbush, Department of Counseling, Educational Psychology, and Special Education, Michigan State University, 461 Erickson Hall, East Lansing, Michigan 48824.

The advantages of randomized experiments in facilitating causal inference are widely recognized. Randomization probabilistically equates treatment groups on all pretreatment covariates. As a result, a confidence interval for a treatment contrast quantifies uncertainty about the magnitude of causal impact of the treatments. Such unambiguous causal inferences are not possible in quasi-experiments (intervention studies without randomization) because the confounding effects of measured or unmeasured pretreatment influences cannot be eliminated with certainty.

This article considers design issues that arise in cluster randomized trials, that is, experiments in which clusters of persons rather than persons themselves are assigned at random to treatments. Such clusters might be classrooms, schools, therapy groups, clinics, health maintenance organizations (HMOs), neighborhoods, program sites, or even entire cities or towns. There are many settings in which cluster randomized experiments are more feasible or desirable than experiments based on the randomized assignment of individuals to treatments. First, it may be politically or logistically impossible to assign children to preschools, patients to therapists, or residents to neighborhoods. Yet random assignment of preschools, therapists, or neighborhoods may be quite feasible. Second, even when it is possible to assign persons within clusters to treatments, it may be undesirable because of "diffusion of treatments" (Cook & Campbell, 1979). If half the teachers within a school are assigned to an in-service training program, they may share their new learning with colleagues who do not attend. Third, and perhaps most important, results of demonstration projects based on random assignment of clusters may generalize better to the policy contexts they are designed to inform. The unit of assignment and treatment (the cluster) is often the unit of reform. After the research is completed, a preschool or a therapist will adopt a new approach for all clients, not for a randomly selected half.

Despite its appeal, researchers have often regarded cluster randomization with suspicion because it complicates the analysis and is assumed to lack statistical precision.

Analysis

Walsh (1947) showed that if clusters are the unit of randomization, p values based on conventional analyses will generally be too small. The greater the variability between clusters, the more severe the bias. He showed how to use information about the intracluster correlation to estimate the bias associated with conventional confidence intervals and tests. Kish (1965) related use of the intracluster correlation coefficient in cluster randomized trials to the design effect in sample surveys based on cluster sampling. Cornfield (1978) wrote: "Randomization by cluster accompanied by an analysis appropriate to randomization by individual is an exercise in self-deception" (p. 101).

Variability between clusters can arise for two reasons. First, persons are typically nonrandomly selected into the intact clusters that are the unit of randomization, and these selection effects can create intercluster variation on the outcome. Second, even if no such selection effects are present, the shared experience within a classroom, therapy group, or HMO can create intercluster outcome variation that is extraneous to the treatment effect.

Classical experimental design texts (e.g., Kirk, 1982) recommend the nested analysis of variance (ANOVA) for such designs. Unfortunately, these methods do not apply well when the data are unbalanced (unequal sample sizes per cluster) or when covariates are available. Recently, however, more appropriate analytic methods have become available under the label of multilevel models (Goldstein, 1987), hierarchical linear models (Bryk & Raudenbush, 1992), or random coefficient models (Longford, 1993). Raudenbush (1993) showed how these methods duplicate classical ANOVA results for a variety of designs and generalize application to the more complex data that arise in large-scale field studies.

Statistical Precision

Aversion to cluster randomized designs is based in part on a widespread perception that they lack statistical precision. When the analysis is done correctly, the standard error of the treatment contrast will typically depend more heavily on the number of clusters than on the number of participants per cluster. Studies with large numbers of clusters tend to be expensive. Blair and Higgins (1985) provided a lucid discussion of the influence of the intracluster correlation on precision and power. Murray et al. (1994) provided a thorough review of design options for coping with poor precision in trials for which whole communities are assigned to treatments.

Planning Cluster Randomized Trials

In this article, I argue that sound planning for cluster-randomized trials, based on now-standard hierarchical statistical models, can help ensure the design of affordable studies with adequate statistical precision. Such planning involves careful choice of sample sizes and evaluation of alternative designs.

Choosing Sample Sizes

In planning individualized randomized trials, a key consideration is the total sample size, which will strongly influence experimental precision. Planning of cluster randomized trials is more complex in that two sample sizes (the within-cluster sample size and the total number of clusters) contribute to experimental precision.

In many cluster-based interventions, the sample size per cluster is under the researchers' control. Examples include school-based interventions designed to prevent mental health disorders and related substance abuse (Jones, 1992; Rosenbaum, Flewelling, Bailey, Ringwalt, & Wilkinson, 1994), community-based health-promotion trials (Johnson et al., 1990; Murray et al., 1994), evaluations of residential treatment programs (Friedman & Glickman, 1987), and community-based care (Goldberg, 1994; Lehman, Slaughter, & Myers, 1992). Sampling large numbers of persons per cluster will constrain the number of clusters one can afford to sample, especially when individual-level data collection is expensive as, for example, when individual psychological assessments or biological assays are required. In these cases, I shall approach planning from the perspective of optimal allocation: choosing the optimal within-cluster sample size is a prelude to deciding on the total number of clusters.
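The way the two sample sizes contribute unequally to precision can be illustrated with the standard survey-sampling design effect, 1 + (n - 1)ρ. The article cites this idea through Kish (1965) but does not write the formula out, so the formula and the function names below are assumptions of this sketch rather than the author's notation.

```python
def design_effect(n, rho):
    """Variance inflation from sampling n persons per cluster
    when the intracluster correlation is rho (assumed standard formula)."""
    return 1.0 + (n - 1) * rho


def effective_sample_size(n, num_clusters, rho):
    """Number of independent observations that would carry the same
    information as n * num_clusters clustered observations."""
    return n * num_clusters / design_effect(n, rho)


if __name__ == "__main__":
    rho = 0.05  # illustrative value only
    base = effective_sample_size(20, 10, rho)            # 200 persons in 10 clusters
    more_persons = effective_sample_size(40, 10, rho)    # double persons per cluster
    more_clusters = effective_sample_size(20, 20, rho)   # double the number of clusters
    print(round(base, 1), round(more_persons, 1), round(more_clusters, 1))
```

Under these illustrative numbers, doubling the number of clusters doubles the effective sample size, whereas doubling the number of persons per cluster raises it by only about a third, which is why the number of clusters tends to dominate precision.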

In other cases, cluster size will be immutable as, for example, when married couples are assigned at random to alternative approaches of family therapy. Whether the within-cluster sample size is determined by optimal allocation or by nature, both the within- and between-cluster sample sizes will contribute to precision, but the relative importance of these two sample sizes will depend on whether covariates are used.

Use of Covariates

The precision added by a covariate depends on its explanatory power. For example, Porter and Raudenbush (1987) gave an example in which adding a single covariate has the same effect on precision as doubling the sample size. In cluster randomized trials, assessing the added value of the covariate is more complex because it depends on the explanatory power of the covariate, size of the variance components, and costs of sampling at each level. An analysis of optimal allocation of resources may reveal that the optimal within-cluster sample size when using the covariate is quite different from the optimal sample size for the design without the covariate.

Optimal Design

In choosing between two designs (e.g., a design without a covariate as compared with a design with a covariate), one proceeds as follows. First, one assesses optimal allocation separately for each design, given assumptions about costs and variance components. Second, one computes the standard error of the treatment contrast for each design. I define the optimal design for a given set of assumptions as the design that yields the smaller standard error. (Alternatively, optimality could be defined with respect to maximizing power, but this requires specification of effect size and makes the presentation to follow a bit more complicated.) A design is uniformly optimal if, for all plausible assumptions about costs and variance components, it produces the smaller standard error. This approach is readily generalized to comparisons of multiple designs (e.g., using blocking vs. covariance analysis vs. no covariables). In this article, I restrict attention to two designs (i.e., with and without the covariate) for simplicity.

Optimal Allocation and Precision With No Covariate

Let us consider a simple setting in which one wishes to compare an experimental group E and a control group C on a posttreatment, continuous, interval scale outcome variable, Y. It is not possible to assign persons at random to treatments; rather $J$ clusters, each of size $n$, will be assigned at random into equal size groups of $J/2$. Thus, the total sample size will be $nJ$ with $nJ/2$ persons in E and $nJ/2$ persons in C. In addition, one has access to a prerandomization covariate, X, measured on each of the $nJ$ subjects. A simple model for this scenario, ignoring the covariate, is

$$Y_{ij} = \gamma_0 + \gamma_1 S_j + u_j + e_{ij} \tag{1}$$

for persons $i = 1, \ldots, n$ within clusters $j = 1, \ldots, J$, where $S_j$ takes on a value of 0.5 for clusters in the experimental group and -0.5 for those in the control group; $\gamma_0$ is the grand mean; $\gamma_1$ is the treatment contrast defined as the mean difference between the two groups; it is typically assumed that $u_j \sim N(0, \tau^2)$, $e_{ij} \sim N(0, \sigma^2)$. The $u_j$s are assumed independent for all $j$ and the $e_{ij}$s are assumed independent for all $i$ and $j$ with $u_j$ and $e_{ij}$ independent of each other. Here $\tau^2$ is the between-cluster variance, and $\sigma^2$ is the within-cluster variance.

Equation 1 models the dependence between observations in the same cluster via the random effect, $u_j$. Thus, the covariance between a pair of observations $Y_{ij}$ and $Y_{i'j}$ (outcomes of two persons $i$ and $i'$ located in the same cluster $j$) is $\tau^2$, and the correlation between these two observations is the intracluster correlation

$$\rho = \frac{\tau^2}{\tau^2 + \sigma^2}. \tag{2}$$

As Equation 2 shows, the intracluster correlation is the proportion of variance in the outcome that lies between clusters.

This model assumes conditional independence, that is, given the random effect associated with a cluster, the responses of persons within the cluster are independent. This assumption will be unrealistic in some settings. For example, suppose that classrooms were assigned at random to experimental and control conditions, where the experimental treatment involved cooperative learning (Slavin, 1983). By design, the intervention promotes a high degree of interaction among subgroups of students within the experimental but not within the control condition. An appropriate model for such data would represent the additional clustering in the experimental classrooms, for example, by use of a three-level hierarchical model (Bryk & Raudenbush, 1992, chap. 8) where cooperative learning groups within experimental classrooms represent a level in the model. Although the implications of such a model for research planning are beyond the scope of this article, I caution the reader to consider the realism of model assumptions; I consider the implications of more complex models for future work on planning later (see Final Remarks section).

In the framework of Equation 1, if $\rho = 0$, an "individual-level analysis" that ignores the clusters will give an appropriate estimate of the treatment effect and the standard error. However, if $\rho > 0$, as pointed out by Walsh (1947), such an analysis would produce a negatively biased estimate of the standard error of the treatment contrast, yielding a liberal test of significance and too-short confidence interval. Even small departures of $\rho$ from zero can produce considerable bias.

The standard analysis for Equation 1 in the case of balanced data is the mixed, two-factor nested ANOVA (Kirk, 1982, p. 460, Table 1). Clusters are the random factor and treatments the fixed factor. Under the assumptions associated with Equation 1, this analysis gives estimates and F tests for the between-cluster variance component and the treatment contrast. The F test for treatments is the ratio of the between-treatment mean square to the between-cluster mean square; the F test for residual cluster variation within treatments is the ratio of the between-cluster mean square to the within-cluster mean square (see Table 1).

If interest is confined to the treatment contrast, a simple t test of group differences with the cluster means as the sample data will duplicate the nested ANOVA in the case of balanced data. Hopkins (1982) provided a lucid discussion of the relationship between nested analyses and aggregated analyses.

Some authors have recommended testing residual variation between clusters first; if such variation is found nonsignificant, the variation within and between clusters is pooled to yield a more powerful test of treatment group differences. This type of preliminary testing, however, is known to yield underestimates of uncertainty about treatments when cluster variation is found nonsignificant (Fabian, 1991). Thus, a Type II error in testing for cluster effects increases the probability of a Type I error in testing treatment effects. In the sequel, I avoid the two-step testing procedure on these grounds; retention of the null hypothesis of no cluster effects may be presumed a Type II error in most cases of practical interest.

Estimates

The restricted maximum likelihood (REML) estimate of the treatment contrast and its variance are

$$\hat{\gamma}_1 = \bar{Y}_{\cdot\cdot E} - \bar{Y}_{\cdot\cdot C}, \qquad \operatorname{Var}(\hat{\gamma}_1) = \frac{4\Delta}{J} \tag{3}$$

where

$$\Delta = \tau^2 + \sigma^2/n. \tag{4}$$

Thus, the treatment contrast estimate is the difference between the mean outcomes in the two groups with a variance that is efficiently estimated via REML by

$$\widehat{\operatorname{Var}}(\hat{\gamma}_1) = \frac{4 \times \text{mean square between clusters}}{nJ}. \tag{5}$$

In the case of balanced data (equal ns in all clusters), estimation of Equation 1 by REML duplicates the familiar results of the nested ANOVA, as presented in Table 1 (Rao & Kleffe, 1988, pp. 35-37). When the data are unbalanced, REML estimation requires an iterative procedure because the treatment contrast estimate and the variance components estimates are mutually dependent in the case of unbalanced data (see Raudenbush, 1993, for an extensive discussion of how mixed models estimated via REML duplicate and generalize classical ANOVA procedures).

Table 1
Analysis of Variance for a Balanced, Two-Group Cluster Randomized Design Without Covariates (Case I)

Source                       df          MS              Expected MS
Treatment                    1           MS_treatment    $nJ\gamma_1^2/4 + n\tau^2 + \sigma^2$
Clusters within treatment    J - 2       MS_clusters     $n\tau^2 + \sigma^2$
Persons within clusters      J(n - 1)    MS_persons      $\sigma^2$

Note. Estimates are as follows: $\hat{\gamma}_1 = \bar{Y}_{\cdot\cdot E} - \bar{Y}_{\cdot\cdot C}$; $\hat{\sigma}^2$ = MS_persons; $\hat{\tau}^2$ = (MS_clusters - MS_persons)/n.
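The statement that, for balanced data, the treatment contrast and its variance can be obtained from the cluster means alone can be sketched as follows. This is a minimal illustration, assuming equal cluster sizes and equal numbers of clusters per arm; the function name and the nested-list data layout are hypothetical choices, not part of the article.

```python
from statistics import mean


def treatment_contrast(exp_clusters, ctl_clusters):
    """Estimate the treatment contrast and its variance from balanced data.

    Each argument is a list of clusters; each cluster is a list of n outcomes.
    Assumes every cluster has the same size n and both arms have J/2 clusters.
    """
    n = len(exp_clusters[0])
    exp_means = [mean(cluster) for cluster in exp_clusters]
    ctl_means = [mean(cluster) for cluster in ctl_clusters]
    num_clusters = len(exp_means) + len(ctl_means)  # J

    # Difference between the two group means (mean of the cluster means).
    gamma1_hat = mean(exp_means) - mean(ctl_means)

    # Sum of squares between clusters within treatments, on J - 2 df.
    exp_center, ctl_center = mean(exp_means), mean(ctl_means)
    ss_between = n * (
        sum((m - exp_center) ** 2 for m in exp_means)
        + sum((m - ctl_center) ** 2 for m in ctl_means)
    )
    ms_between = ss_between / (num_clusters - 2)

    # Estimated variance: 4 * (mean square between clusters) / (nJ).
    var_hat = 4.0 * ms_between / (n * num_clusters)
    return gamma1_hat, var_hat


if __name__ == "__main__":
    experimental = [[1, 3], [5, 7]]   # toy data, two clusters of size 2
    control = [[0, 2], [2, 4]]
    print(treatment_contrast(experimental, control))
```

Because the estimated variance uses only the spread of the cluster means within each arm, it coincides with the squared standard error of a pooled two-sample t test applied to those means, which is the equivalence the text describes.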

Cost of Data Collection

In planning the study, one must consider cost. In particular, $T$ monetary units are available. It is estimated that, once one has sampled a cluster, it costs $C_1$ to enroll each subject, while the cost of sampling each additional cluster is $C_2$. Thus, the total cost of the study is

$$T = J(nC_1 + C_2). \tag{6}$$

In many settings, more complex cost functions will be needed. For example, it may be far more costly to sample experimental clusters than control clusters because of large costs associated with implementing the experimental intervention. I consider this issue later (see Final Remarks section), while using the simple cost function of Equation 6 for illustrative purposes.

Optimal Allocation

The cost constraint of Equation 6 defines $J$ as

$$J = \frac{T}{nC_1 + C_2}. \tag{7}$$

Thus the variance of the treatment contrast (Equation 3) is expressible as a function of total resources available, the relative costs of sampling at each level, and $n$:

$$\operatorname{Var}(\hat{\gamma}_1) = \frac{4(\tau^2 + \sigma^2/n)(nC_1 + C_2)}{T}. \tag{8}$$

Minimizing Equation 8 with respect to $n$ yields "n(optimal)," that is, the sample size $n$ per cluster that minimizes the sampling variance of the treatment contrast:

$$n(\text{optimal}) = \frac{\sigma}{\tau}\sqrt{\frac{C_2}{C_1}}. \tag{9}$$

This result duplicates the familiar result of Cochran (1977; see also Snijders & Bosker, 1993) for minimizing the variance of the population mean estimate in a survey using a balanced two-stage cluster sample with simple random sampling of clusters and then simple random sampling of persons within clusters. Equation 9 shows that a large $n$ per cluster is most advisable when the variability within clusters is large in relation to the variability between them and when the cost of sampling additional clusters is large in relation to the cost of sampling persons.

To illustrate, I now set $T = 500$, $C_1 = 1$, and $\sigma^2 + \tau^2 = 1$, so that

$$J = \frac{500}{n + C_2}; \quad \tau^2 = \rho; \quad \sigma^2 = 1 - \rho. \tag{10}$$

Table 2 provides n(optimal), the corresponding $J$, and the sampling variance of the treatment contrast estimates under various assumptions about intracluster correlations and cost. Intracluster correlations ranging from small to large include .01, .05, .10, .20, and .50. The cost of sampling clusters is viewed as twice, 10 times, or 50 times that of sampling persons within clusters.

Table 2 gives the results that are expected. Small intracluster correlations and expensive sampling of clusters favor large ns. The variance estimates can be put in perspective by noting that if the treatment groups are separated by 0.30 standard deviations, a variance of 0.0225 or smaller would be needed to provide a power of roughly 0.50 to detect the treatment effect at the 5% level of significance. Table 2 therefore assigns an asterisk to those scenarios producing a sampling variance less than 0.0225 as a rough indicator of the designs producing a modicum of precision.

Table 2
Optimal Sample Sizes and Corresponding Sampling Variances as a Function of the Intracluster Correlation and Cost Based on an Analysis That Ignores the Covariate

Intracluster        Cluster/person
correlation (ρ)     cost ratio (C2)    n(optimal)    J      Var($\hat{\gamma}_1$)
.01                 2                  14            31     .0103*
.01                 10                 31            12     .0138*
.01                 50                 70            4      .0232
.05                 2                  6             61     .0133*
.05                 10                 14            21     .0226
.05                 50                 31            6      .0522
.10                 2                  4             80     .0156*
.10                 10                 9             26     .0304
.10                 50                 21            7      .0811
.20                 2                  3             104    .0186*
.20                 10                 6             31     .0426
.20                 50                 14            8      .1317
.50                 2                  1             146    .0233
.50                 10                 3             38     .0693
.50                 50                 7             9      .2606

Note. An asterisk indicates a scenario that produces a sampling variance less than 0.0225 as an indicator of the designs producing a modicum of precision.

It is clear that planners run into serious problems when the intracluster correlation and the cost of sampling clusters are simultaneously large. I now consider how using a covariate can improve things.
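The no-covariate allocation just described can be sketched in a few lines: choose n per cluster as (σ/τ)·sqrt(C2/C1), let the budget determine J = T/(nC1 + C2), and evaluate the variance 4(τ² + σ²/n)/J. The sketch below uses the illustrative settings T = 500, C1 = 1, τ² = ρ, and σ² = 1 - ρ. Two things are assumptions of this sketch rather than statements from the article: n and J are left unrounded when the variance is computed, and the power function uses a one-tailed normal approximation with a critical value of 1.96.

```python
import math


def optimal_allocation(rho, c2, total=500.0, c1=1.0):
    """Return (n_opt, num_clusters, variance) for the design without a covariate.

    Variances are standardized so that tau^2 = rho and sigma^2 = 1 - rho.
    """
    tau2, sigma2 = rho, 1.0 - rho
    n_opt = math.sqrt(sigma2 / tau2) * math.sqrt(c2 / c1)     # optimal n per cluster
    num_clusters = total / (n_opt * c1 + c2)                  # J implied by the budget
    variance = 4.0 * (tau2 + sigma2 / n_opt) / num_clusters   # Var of the contrast
    return n_opt, num_clusters, variance


def approx_power(effect, variance, z_crit=1.96):
    """Normal-approximation power for detecting `effect` (assumption of this sketch)."""
    z = effect / math.sqrt(variance) - z_crit
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    for rho in (0.01, 0.05, 0.10, 0.20, 0.50):
        for c2 in (2, 10, 50):
            n_opt, j, var = optimal_allocation(rho, c2)
            print(f"{rho:.2f} {c2:3d} {n_opt:6.1f} {j:6.1f} {var:.4f}")
    print(round(approx_power(0.30, 0.0225), 2))
```

For the rows checked by hand (for example ρ = .01 with C2 = 2, and ρ = .20 with C2 = 10), the unrounded values round to the n(optimal), J, and variance printed in Table 2; other rows may differ in the last digit because of rounding. The power check reflects that a variance of 0.0225 corresponds to a standard error of 0.15, so an effect of 0.30 is about two standard errors.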

Optimal Allocation and Precision With a Covariate

I now add a covariate, X, but now use a hierarchical linear model for multilevel analysis to use all of the information in the covariate to account for variation in the outcome. The analysis may be viewed as a nested analysis of covariance with random effects of clusters and fixed effects of treatment and covariate.

Model

The model can be represented by the mixed linear regression model

$$Y_{ij} = \gamma_0 + \gamma_1 S_j + \gamma_2 X_{ij} + u_j + e_{ij} \tag{11}$$

where $S_j$ again takes on a value of 0.5 for clusters in the experimental group and -0.5 for those in the control group; $X_{ij}$ is the covariate measured at the person level; $\gamma_1$ again is the treatment contrast; $\gamma_2$ is now the regression coefficient for the person-level covariate; and it is assumed that $u_j \sim N(0, \tau^2_{y|x})$, $e_{ij} \sim N(0, \sigma^2_{y|x})$. The symbols $\tau^2$ and $\sigma^2$ now both have the subscript $y|x$ to emphasize that both the between-cluster variance and the within-cluster variances are now residual variances conditional on the effects of the covariate X.

Estimation

An assumption of ordinary least squares regression is that the model residuals are independent. Equation 11 fails to satisfy this assumption, because the covariance between residuals of persons in the same cluster will be $\tau^2_{y|x}$. Efficient estimation requires an iterative algorithm, for example, that based on maximum likelihood.

The Appendix describes maximum likelihood estimation for the treatment contrast and derives the variance of the treatment contrast given the covariate and the variance components:

$$\operatorname{Var}(\hat{\gamma}_1 \mid X) = \frac{4\Delta_{y|x}}{J}\left[1 + \frac{J(M_{\cdot\cdot E} - M_{\cdot\cdot C})^2/4}{SS_{bx} + (\Delta_{y|x}/\sigma^2_{y|x})\,SS_{wx}}\right] \tag{12}$$

where $\Delta_{y|x} = \tau^2_{y|x} + \sigma^2_{y|x}/n$ and

$$SS_{wx} = \sum_{j=1}^{J/2}\sum_{i=1}^{n}(X_{ijE} - M_{\cdot jE})^2 + \sum_{j=J/2+1}^{J}\sum_{i=1}^{n}(X_{ijC} - M_{\cdot jC})^2,$$

$$SS_{bx} = \sum_{j=1}^{J/2}(M_{\cdot jE} - M_{\cdot\cdot E})^2 + \sum_{j=J/2+1}^{J}(M_{\cdot jC} - M_{\cdot\cdot C})^2. \tag{13}$$

Here $X_{ijE}$ and $X_{ijC}$ are the covariate values of person $i$ in cluster $j$ of the experimental group and person $i$ in cluster $j$ of the control group, respectively; $M_{\cdot jE}$ and $M_{\cdot jC}$ are the covariate means for cluster $j$ of the experimental group and cluster $j$ of the control group, respectively; and $M_{\cdot\cdot E}$ and $M_{\cdot\cdot C}$ are the covariate means for the experimental and control groups, respectively. Also, $SS_{wx}$ is the pooled, within-cluster sum of squares of the covariate, and $SS_{bx}$ is the pooled, within-treatment, between-cluster sum of squares of the covariate. Equation 12 has the same form as the variance of the treatment contrast under the model without the covariate (Equation 1) except that (a) the variances are residual variances and (b) there is a correction factor that depends on the distance between the treatment means on the covariate. Thus, the benefit of adding the covariate is greatest when the residual variance, $\Delta_{y|x}$, is much smaller than the unconditional variance, $\Delta$, and when the covariate means of the two treatment groups are similar.

Inferences about the treatment effect under the analysis of covariance are based on the conditional distribution of the outcome given the covariate, X, as in Equation 12. However, in planning research, the difference between covariate means cannot be known in advance. Thus, in planning research, it is natural to treat X as a random variable and to consider the variance of the treatment effect estimate averaged over possible values of the covariate. Treating X as normally distributed produces a useful substitute for Equation 12:

(14)

where

(15)

Here $\Delta_x = \tau^2_x + \sigma^2_x/n$, where $\tau^2_x$ is the between-cluster variance of X, and $\sigma^2_x$ is the within-cluster variance of X. Defining $\rho_x = \tau^2_x/(\tau^2_x + \sigma^2_x)$ as the intracluster correlation on the covariate and $\rho_{y|x} = \tau^2_{y|x}/(\tau^2_{y|x} + \sigma^2_{y|x})$ as the residual intracluster correlation of Y (after adjusting for X), one defines $\theta$ as

(16)

Equation 14 provides a general expression that simplifies in specific interesting cases.

Case 1. X is a cluster-level covariate, so that $SS_{wx} = 0$. Then

$$\operatorname{Var}(\hat{\gamma}_1 \mid X) = \frac{4\Delta_{y|x}}{J}\left[1 + \frac{F(1, J-2)}{J-2}\right] \tag{17}$$

where $F(1, J-2)$ is distributed as F with 1 and $J - 2$ degrees of freedom. In this case, the variance of the treatment effect over possible random samples of the covariate is

$$\operatorname{Var}(\hat{\gamma}_1) = E[\operatorname{Var}(\hat{\gamma}_1 \mid X)] = \frac{4\Delta_{y|x}}{J}\left[1 + \frac{1}{J-4}\right]. \tag{18}$$

Case 2. X is a person-level covariate in a study in which persons are assigned at random to clusters and clusters to treatments, with no independent cluster effects. Here $\rho_{y|x} = \rho_x = 0$; $\theta = 1$. Then

$$\operatorname{Var}(\hat{\gamma}_1 \mid X) = \frac{4\sigma^2_{y|x}}{Jn}\left[1 + \frac{F(1, Jn-2)}{Jn-2}\right] \tag{19}$$

where $F(1, Jn-2)$ is distributed as F with 1 and $Jn - 2$ degrees of freedom, so that the variance of the treatment effect over possible random samples of the covariate is

$$\operatorname{Var}(\hat{\gamma}_1) = \frac{4\sigma^2_{y|x}}{Jn}\left[1 + \frac{1}{Jn-4}\right]. \tag{20}$$

Case 3. Here, $\rho_x = \rho_{y|x}$ with no other restrictions. The assumption that the covariate and outcome have the same (or similar) intracluster correlations is probably quite realistic in many settings. Then, we have $\theta = 1$ so that

$$\operatorname{Var}(\hat{\gamma}_1 \mid X) = \frac{4\Delta_{y|x}}{J}\left[1 + \frac{U}{V + Y}\right]. \tag{21}$$

The ratio $U/(V + Y)$ will not be distributed exactly as F because, although U and Y are independent, neither is independent of V. However, the Appendix provides a very accurate approximation (to the order of $J^{-5}$) of the expectation of Equation 21, namely,

(22)

It is easily shown that Equation 22 converges rapidly to

$$\operatorname{Var}(\hat{\gamma}_1) = \frac{4\Delta_{y|x}}{J}\left[1 + \frac{1}{Jn-4}\right]. \tag{23}$$

This is the expression for the variance of the treatment contrast that is used below in determining optimal allocation.

Optimal Allocation

If one substitutes $T/(nC_1 + C_2)$ for $J$ in Equation 23 and then minimizes with respect to $n$, one obtains, to a close approximation (see the Appendix), n(optimal) as the solution to

(24)

where $k_1 = C_1/T$ and $k_2 = C_2/T$. Given the negligible magnitude of $k_1$ and $k_2$ as $J$ increases, it is readily apparent that the right-hand side of Equation 24 is dominated by $\sqrt{\sigma^2_{y|x} C_2 / (\tau^2_{y|x} C_1)}$, which has the same form as in Equation 9.

To illustrate, I use the same costs as before. However, computation of standard errors requires also that I make assumptions about the magnitudes of the within- and between-cluster variance components adjusted for the covariate. Such assumptions will typically be based on past research. In principle, the adjusted between-cluster variance, $\tau^2_{y|x}$, can be either smaller or larger than the unadjusted between-cluster variance, $\tau^2$. This adjusted variance will be larger than the unadjusted variance when the direction of the association between the covariate and the outcome is different at the two levels. At least in educational research, however, experience shows that prior academic attainment and social background are positively related to valued educational outcomes at both levels. In fact, the relationship between such covariates and the outcome tends to be considerably stronger between clusters than within clusters. In Scotland, prior academic achievement accounts for half the variation in educational attainment within neighborhoods and over 90% of the variation between neighborhoods (Garner & Raudenbush, 1991). In U.S. high schools, socioeconomic status explains only about 7% of the variation within schools but over half the variation between schools (Bryk & Raudenbush, 1992, chap. 4). Prior research can be vital in choosing covariates to maximize the benefits of the multilevel covariance analysis.

Recently, Bloom (1995) has examined past findings from a large-scale evaluation of adult basic education programs to facilitate planning of a cluster randomized trial. He found that the covariate, pretested cognitive achievement, accounted for 73% of the variation between clusters and 48% of the variation within clusters in posttested educational achievement, a finding that appears representative of prior research using cognitive outcomes. Under the same assumptions regarding cost and the unconditional variances as before, but assuming the covariate has the same explanatory power as found by Bloom, one can compute the optimal $n$, corresponding $J$, and the variance of the treatment effect estimate. These are presented in Table 3. Note that the optimal $n$ is larger and the corresponding $J$ is smaller when the covariate is used (Table 3) than when the covariate is not used (Table 2). This occurs because the explanatory power is greater between clusters than within clusters, so that the adjusted intracluster correlation is smaller than the unadjusted intracluster correlation. Most important, note the substantial reduction in the variance of the treatment contrast when the covariate is used. As a result, the analysis using the covariate significantly increases the probability of detecting a non-zero treatment effect.

Table 3
Optimal Sample Sizes and Corresponding Sampling Variances as a Function of the Intracluster Correlation and Cost Based on an Analysis With a Covariate

Unconditional
intracluster        Cluster/person
correlation (ρ)     cost ratio (C2)    n(optimal)    J      Var($\hat{\gamma}_1$)
.01                 2                  19            23     .0050*
.01                 10                 43            9      .0062*
.01                 50                 97            4      .0094*
.05                 2                  9             48     .0060*
.05                 10                 19            17     .0091*
.05                 50                 43            5      .0186*
.10                 2                  6             64     .0067*
.10                 10                 13            22     .0116*
.10                 50                 29            6      .0274
.20                 2                  4             85     .0076*
.20                 10                 9             27     .0152*
.20                 50                 20            7      .0422
.50                 2                  2             126    .0085*
.50                 10                 4             35     .0225*
.50                 50                 10            8      .0784

Note. This table is constructed with Equations 23 and 24 on the basis of prior research of Bloom (1995) showing that a prior measure of cognitive ability accounted for 73% of the between-cluster variation and 48% of the within-cluster variation in an educational achievement posttest. Appropriate assumptions about the explanatory power of the covariate at each level will be case specific. An asterisk indicates a scenario that produces a sampling variance less than 0.0225 as an indicator of the designs producing a modicum of precision.

Optimal Design

Figure 1 graphically illustrates the concept of optimal design. The figure plots the standard error of the treatment contrast as a function of plausible values of the costs and the variance components. (The standard errors in the figure are based on Bloom's estimates of the total unadjusted variance of 1579). For all plausible values of cost and variance, the standard error associated with the no-covariate analysis exceeds that associated with the covariance analysis. Thus the covariance analysis is uniformly optimal. Of course, this conclusion is based on the assumptions concerning the explanatory power of the covariate at each level.

Relative Efficiency of the Two Designs

One defines the relative efficiency of two unbiased estimators to be the ratio of the reciprocal of their variances. Table 4 gives the relative efficiency of the analysis without the covariate in comparison with the analysis with the covariate (under Bloom's, 1995, estimates of variance components). In each case, the variance is computed at n(optimal) and the corresponding $J$.

Table 4 shows that the larger the unconditional intracluster correlation and the more expensive it is to sample clusters in relation to persons within clusters, the greater the benefit of using the covariance analysis as opposed to the analysis with no covariate. This will generally be the case when the covariate more effectively explains variation between clusters than within

clusters, as when using cognitive outcomes where schools or neighborhoods are the clusters. In other cases, of course, this relationship may not hold, and the relative efficiencies will not have the pattern of Table 4.

Table 4
Efficiency of the Design That Ignores the Covariate Relative to That of the Design That Uses the Covariate in Two-Level Analysis

Unconditional
intracluster        Cluster/person       Relative
correlation (ρ)     cost ratio (C2)      efficiency
.01                 2                    .481
.01                 10                   .450
.01                 50                   .405
.05                 2                    .449
.05                 10                   .404
.05                 50                   .356
.10                 2                    .430
.10                 10                   .381
.10                 50                   .337
.20                 2                    .406
.20                 10                   .358
.20                 50                   .320
.50                 2                    .364
.50                 10                   .325
.50                 50                   .301

Note. This table is constructed with Tables 2 and 3 on the basis of prior research of Bloom (1995) showing that a prior measure of cognitive ability accounted for 73% of the between-cluster variation and 48% of the within-cluster variation in an educational achievement posttest. Appropriate assumptions about the explanatory power of the covariate at each level will be case specific.

[Figure 1: two panels, each with vertical axis "Standard Error" (tick values 0.098, 0.174, 0.251, 0.327).]

Figure 1. Standard error of treatment contrast: With (Top) and without (Bottom) the covariate. This figure is constructed on the basis of prior research of Bloom (1995) showing that a prior measure of cognitive ability accounted for 73% of the between-cluster variation and 48% of the within-cluster variation in an educational achievement posttest. Appropriate assumptions about the explanatory power of the covariate at each level will be case specific.

Use of an Aggregated or Between-Cluster Covariate

Suppose that an analyst uses the aggregated outcome and the aggregated covariate to estimate the treatment effect as in Equation 17, Case 1. One sees that the efficiency of the aggregated analysis relative to that of the two-level analysis is

$$\frac{1 + \dfrac{1}{Jn-4}}{1 + \dfrac{1}{J-4}} \tag{25}$$

revealing that, while the aggregated analysis can be quite inefficient when $J$ is very small, as is common in community trials (Murray et al., 1994), it becomes nearly efficient as $J$ becomes large.

Contextual Effects Models

The model of Equation 11 is founded on the assumption that only one regression coefficient is needed to represent the relationship between the covariate and the outcome. Researchers often find, however, that the within-cluster and between-cluster relationships between the covariate and outcome can be quite different (e.g., Willms, 1986). Then the needed model is

$$Y_{ij} = \gamma_0 + \gamma_1 S_j + \gamma_2 (X_{ij} - M_{\cdot j}) + (\gamma_2 + \gamma_3) M_{\cdot j} + u_j + e_{ij}. \tag{26}$$

Here $\gamma_2$ is the within-cluster coefficient, $\gamma_3$ is the contextual coefficient, and $\gamma_2 + \gamma_3$ is the between-cluster coefficient. Clearly, if $\gamma_3$ is zero, the within- and between-cluster coefficients are equal, as is assumed by the basic model (Equation 11). Now averaging the outcome within clusters gives

$$\bar{Y}_{\cdot j} = \gamma_0 + \gamma_1 S_j + (\gamma_2 + \gamma_3) M_{\cdot j} + u_j + \bar{e}_{\cdot j} \tag{27}$$

where the treatment effect will be adjusted for the between-cluster coefficient, $\gamma_2 + \gamma_3$. Thus, the aggregated analysis (Equation 27) will give the same treatment effect estimate as will the two-level contextual effects model (Equation 26). Given relatively large within-cluster sample sizes, this analysis will be more efficient than the analysis based on Equation 11 when

$$R > \frac{1}{J-3} \tag{28}$$

where $R$ is the proportion reduction in between-cluster variance associated with estimating $\gamma_2 + \gamma_3$ rather than just $\gamma_2$. A more refined assessment of the benefit of incorporating the contextual effect as a means of improving estimation of the treatment effect is a useful topic of further research.

Final Remarks

Although cluster randomized trials, like individual randomized trials, provide unbiased estimates of program impact, they are often regarded as weak in precision. The argument presented here is that careful choice of covariates and sound planning combined with efficient analyses that use all of the information at each level can significantly increase the precision of cluster randomization studies.

I have considered the simple case of a single person-level covariate and have assumed that use of that covariate would explain variation at both levels. It is possible, of course, to use multiple person-level and cluster-level covariates in the same analysis. The [...] $n$ substituted for n(optimal). When the fixed sample sizes vary, substituting the harmonic mean $n$, $n_{\text{harmonic}} = J/\sum_j n_j^{-1}$, will give a good approximation to the standard error. In either case, once within-cluster sample sizes are determined, overall precision will depend on the total number of clusters, which is constrained by the resources available for the study. The overall precision may be substantially enhanced by the use of a covariate.

I have also assumed that investigators will have collected some covariate information at the person level. It has become quite routine to collect at least some demographic information. However, a decision to collect effective covariates could significantly increase the cost of data collection. Collecting covariates at the person level would increase $C_1$, while collecting cluster-level covariates would increase $C_2$. Such costs will, of course, affect optimal allocation, and one might well imagine a setting where aggregate covariates would be comparatively cheap, increasing the relative efficiency of the aggregated analysis of covariance.

In many cases, only very rough estimates of cost will be available. In this case, it would be wise to compute n(optimal) across a range of plausible cost assumptions and, in each case, to compute the relevant $J$ and standard error (as in Tables 2-3 and Figure 1). It will often be found that the optimal design and standard error will be quite insensitive to variation in plausible assumptions about cost.

There are cases in which the optimal design will be quite unbalanced. Suppose, as described earlier, that the treatment increases social interactions within experimental clusters, creating a new source of dependence within them.
In this case, within-classroom number of person-level covariates that can be used is variance may be larger in the experimental classrooms constrained by nJ, while the number of cluster-level than in the control classrooms, implying the need for covariates is constrained by J. larger sample sizes within experimental clusters than In many cases, within-cluster sample sizes are un- within control clusters. Also described earlier was a der the control of the researcher. It is then useful to setting in which it is more costly to sample experi- determine the optimal within-cluster sample size mental clusters than to sample control clusters, im- given plausible assumptions about costs and variance plying an optimal design having more control clusters components. In some cases, however, the research than experimental clusters. While it is beyond the setting will fix the within-cluster sample size n. For scope of this article to consider more complex vari- example, when married couples or identical twins are ance and cost functions leading to optimal designs randomized to therapy, n = 2 by definition, and in that are unbalanced, the basic approach outlined here classroom research it may be logistically necessary to can and should be generalized to apply to these cir- include every member of each classroom in the study. cumstances. In these cases, of course, the optimal allocation for- It is also important to extend the kind of analysis mulas presented here will be unnecessary, but the used here to other designs. For example, one might standard error formulas will still apply with the fixed consider designs involving multiple pretests and fol- STATISTICAL ANALYSIS 183 low-ups of persons nested within clusters assigned at analysis. American Journal of , 108(2), random to treatments. In these designs, measures of 100-102. change (e.g., growth rates or acceleration rates) be- Fabian, V. (1991). On the problem of interactions in the come the outcomes. 
Between-cluster variances may analysis of variance. Journal of the American Statistical be much smaller than for cross-sectional status mea- Association, 86(414), 362-367. sures, thus increasing the efficiency of the cluster ran- Feldman, H. A., & McKinlay, S. M. (1994). Cohort versus domized trial in a setting where it is expensive to cross-sectional design in large field trials: Precision, sample large numbers of clusters. However, longitu- sample size, and a unifying model. Statistics in Medicine, dinal follow-up of persons may be expensive. A vari- 13, 61-78. ant involves repeated cross-sectional samples from Friedman, I., & Glickman, N. W. (1987). Residential pro- the same clusters before and after implementation of gram characteristics for completion of treatment by ado- the treatment (Feldman & McKinlay, 1994; Murray et lescent drug abusers. Journal of Nervous and Mental Dis- al., 1994). The relevant error variation for assessing ease, 175(1), 419-424. program effects is the within-cluster variation over Garner, C., & Raudenbush, S. (1991). Educational attain- time rather than the between-cluster variation, possi- ment: A multi-level analysis of the influence of pupil bly leading to gains in precision when sampling large ability, family, school, and neighborhood. Sociology of numbers of clusters is expensive. Education, 64(4), 251-262. Sound planning of cluster randomized trials re- Goldberg, D. (1994). Cost-effectiveness in the treatment of quires collection of data on variance components and patients with schizophrenia. Acta Psychiatrica Scandina- costs of sampling at each level. The levels may in- via, 59(382), 89-92. volve variation between time points within persons, Goldstein, H. (1987). Multilevel models in educational and variation between persons within clusters, variation in . London: Oxford University Press. cluster means over time, or variation between cluster Hopkins, K. (1982). The unit of analysis: Group means means cross-sectionally. 
The levels that are relevant versus individual observations. American Educational depend on the design options under consideration. Research Journal, 19, 5-18. Next, the most efficient statistical analysis for each Johnson, C. A., Pentz, M. A., Weber, M. D., Dwyer, J. H., given design option must be chosen. It is then possible Baer, N., MacKinnon, D. P., Hanson, W. B., & Flay, to estimate the optimal allocation of resources and B. R. (1990). Relative effectiveness of comprehensive power for competing design alternatives in order to community programming for drug abuse prevention with make best use of the resources available for the study. high-risk and low-risk adolescents. Journal of Consulting and Clinical Psychology, 58, 447^156. Jones, B. A. (1992). Collaboration: The case for indigenous References community-based organization support of dropout pre- vention programming and implementation. Journal of Blair, R. C., & Higgins, J. (1985). A comparison of the Negro Education, 61, 496-508. power of the paired samples rank transform to Kirk, R. (1982). Experimental design: Procedures for the that of Wilcoxon's signed ranks statistics. Journal of Ed- behavioral sciences (2nd ed.). Belmont, CA: ucational Statistics, 10, 368-383. Brooks/Cole. Bloom, H. (1995, November). Minimum detectable effects Kish, L. (1965). . New York: Wiley. in a cluster randomized . Presentation at the Lehman, A. F., Slaughter, J. G., & Meyers, C. P. (1992). Manpower Development Research Corporation, New of life experiences of the chronically mentally ill. York. Evaluation and Program Planning, 15, 7—12. Longford, N. (1993). Random coefficient models. Oxford: Bryk, A., & Raudenbush, S. (1992). Hierarchical linear Clarendon Press. models in social and behavioral research: Applications and data analysis methods. Newbury Park, CA: Sage. Murray, D., McKinlay, S., Martin, D., Donner, A., Dwyer, J., Raudenbush, S., & Graubard, B. (1994, August). De- Cochran, W. (1977). Sampling techniques (3rd ed.). 
New sign and analysis issues in community trials. Evaluation York: Wiley. Review, pp. 493-514. Cook, T., & Campbell, D. (1979). Quasi-experimentation. Porter, A. C., & Raudenbush, S. W. (1987). Analysis of New York: Rand McNally. covariance: Its model and use in psychological research. Cornfield, J. (1978). Randomization by group: A formal Journal of Counseling Psychology, 34, 383-392. 184 RAUDENBUSH

Rao, C. R., & Kleffe, J. (1988). Estimation of variance com- Slavin, R. E. (1983). Cooperative learning. New York: ponents and applications. Amsterdam: Elsevier. Longman. Raudenbush, S. W. (1993). Hierarchical linear models and Snijders, T., & Bosker, R. (1993). Standard errors and experimental design. In L. K. Edwards (Ed.), Applied sample sizes for two-level research. Journal of Educa- analysis of variance in behavioral science (pp. 459-495). tional Statistics, IS, 237-259. New York: Marcel Dekker. Walsh, J. (1947). Concerning the effects of the intra-class Rosenbaum, D. P., Flewelling, R. L., Bailey, S. L., Ring- correlation on certain significance tests. Annals of Math- wait, C. L., & Wilkinson, D. L. (1994). Cops in the class- ematical Statistics, 18, 88-96. room: A longitudinal evaluation of drug abuse resistance Willms, J, D. (1986). Social class segregation and its rela- education (DARE). Journal of Research on Crime and tionship to student's examination results in Scotland. Delinquency, 31(1), 3-31. American Sociological Review, 51, 224-241.
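The Final Remarks recommend computing n(optimal) over a range of plausible cost assumptions, then the implied J and standard error, and substituting the harmonic mean when fixed cluster sizes vary. The following is a minimal sketch of that sensitivity check for the no-covariate case only. It rests on assumptions that are not reproduced in this section: a linear cost function T = J(C1 n + C2), a balanced two-arm design with variance 4(tau + sigma^2/n)/J for the treatment contrast, and the standard optimum n = sqrt((sigma^2/tau)(C2/C1)) that follows from minimizing that variance under that budget. The function names, the budget of 500, and the choice of rho = .05 are hypothetical.

```python
import math


def harmonic_mean_n(cluster_sizes):
    """Harmonic mean of unequal cluster sizes: J / sum(1 / n_j)."""
    return len(cluster_sizes) / sum(1.0 / n for n in cluster_sizes)


def optimal_n(sigma2, tau, c1, c2):
    """Within-cluster sample size minimizing (tau + sigma2/n)/J
    under the ASSUMED budget constraint T = J * (c1 * n + c2)."""
    return math.sqrt((sigma2 / tau) * (c2 / c1))


def clusters_affordable(total_budget, n, c1, c2):
    """Number of clusters J implied by the assumed linear cost function
    (left as a real number; round down in practice)."""
    return total_budget / (c1 * n + c2)


def se_contrast(sigma2, tau, n, num_clusters):
    """Standard error of the treatment contrast for a balanced two-arm
    design without a covariate (ASSUMED form: 4 * (tau + sigma2/n) / J)."""
    return math.sqrt(4.0 * (tau + sigma2 / n) / num_clusters)


def sensitivity(rho, total_budget, c1, c2_values):
    """For each cluster cost c2, return (c2, n_optimal, J, standard error).
    Total variance is scaled to 1, so tau = rho and sigma2 = 1 - rho."""
    tau, sigma2 = rho, 1.0 - rho
    rows = []
    for c2 in c2_values:
        n = optimal_n(sigma2, tau, c1, c2)
        num_clusters = clusters_affordable(total_budget, n, c1, c2)
        rows.append((c2, n, num_clusters, se_contrast(sigma2, tau, n, num_clusters)))
    return rows


if __name__ == "__main__":
    # Hypothetical inputs: rho = .05, person cost 1, budget of 500 cost units.
    for c2, n, num_clusters, se in sensitivity(0.05, 500.0, 1.0, (2, 10, 50)):
        print(f"C2={c2:>2}  n(optimal)={n:6.2f}  J={num_clusters:6.2f}  SE={se:.3f}")
    # Fixed but unequal cluster sizes: plug the harmonic mean in for n.
    print(f"harmonic mean n = {harmonic_mean_n([10, 20, 40]):.2f}")
```

Scaling the total variance to 1 keeps the inputs in the same terms as Table 4 (an intracluster correlation and a cluster/person cost ratio). A covariate-adjusted version would need the conditional variance components and the correction terms discussed in the text, which this sketch does not attempt.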

Appendix

Approximations for Experimental Precision

The first aim is to derive Equation 12, the variance of the treatment contrast under the two-level model of Equation 11. The model may be formulated in matrix notation (Equations A1 and A2). For known variance components, algebraic simplification of the resulting expression for the variance of the estimated coefficients leads to Equation A4, where $d = M_{\cdot\cdot E} - M_{\cdot\cdot C}$. Of course, the variance components will not be known; ML estimation of $\gamma$ requires substitution of ML estimates of the variance components in Equation A4. Computing Equation A4 yields Equation 12 for the variance of the treatment effect estimate.

The next aim is to derive the approximation of Equation 22. One begins with Equation 14, expressed as the conditional variance $\mathrm{Var}(\hat{\gamma}_1 \mid X) = \mathrm{Var}(\hat{\gamma}_1 \mid U, Y, V)$ (Equation A5). The distributional assumptions for U, Y, and V are given by Equation 15. Taking the expectation of Equation A5 with respect to U gives Equation A6. Next, one takes the expectation of Equation A6 with respect to Y, giving Equation A7. Finding an expression for the expectation in Equation A7 is difficult, but the expectation can be approximated to any degree required by expanding

$$(V\theta + Y)^{-1} = D^{-1} \tag{A8}$$

in a Taylor series to the fifth order (Equation A9), where

$$D_0 = V\theta + Y_0, \qquad Y_0 = J - 2. \tag{A10}$$

The process must iterate one more time, taking the expectation of Equation A10 with respect to the distribution of V (Equation A11). Again, the expectations in Equation A11 are not expressible in simple form. However, expanding $D_0^{-1}$ in a fifth-degree Taylor series, substituting the resulting expression into Equation A11 (ignoring all terms less than $O(J^{-3})$), and taking the expectation gives Equation A11A, whose terms are defined in Equations A12 and A13, with $V_0 = J(n - 1)$. Setting $\theta = 1$ in Equation A12 gives Equation 22.

Finally, one considers minimization of the variance of the treatment effect estimate based on covariance analysis (Equation 23) subject to the cost constraint (Equation 6). Direct minimization of Equation 23 after substituting Equation 7 for J produces a quartic equation that defies simple solution. However, if one approximates Equation 23 by Equation A14 and minimizes Equation A14, one obtains Equation 24, an approximation that is accurate to $O(J^{-4})$.

Received March 22, 1996
Revision received September 18, 1996
Accepted December 5, 1996
