arXiv:1001.3011v1 [stat.ME] 18 Jan 2010 eois A nttt n. ay ot Carolina e-mail: North USA Cary, 27513, Inc., and Institute Discovery SAS Scientific Genomics, of Russell Director USA. is 14853, Wolfinger York New Ithaca, Cornell University, Hall, Comstock and Biology, Computational Biological of Department Professor, e-mail: Federer Walter including many textbooks, in found be classical can Examples after. experiments soon designed followed of context the in development Russel and Wells T. Martin Federer, in T. Walter Booth, Covariance G. James of Experiments Analysis Designed for Components Model Multivariate A el e-mail: Wells 09 o.2,N.2 223–237 2, DOI: No. 24, Vol. 2009, Science Statistical oecvrae.Aayi fcvrac a long to a back has covariance dating history, of or Analysis one covariates. for account more to experiments designed in means ae ot e-mail: Booth James c ttsia Science article Statistical original the the of by reprint published electronic an is This ern iesfo h rgnli aiainand pagination in detail. original typographic the from differs reprint nttt fMteaia Statistics Mathematical of Institute hsatcecnen h duteto treatment of adjustment the concerns article This 10.1214/09-STS294 [email protected] .INTRODUCTION 1. nttt fMteaia Statistics Mathematical of Institute , oe,otooa ein admzdbok design. blocks randomized design, orthogonal model, phrases: con and ongoing words of Key source the literature. s been statistics have incorrectly the that are issues errors. models some standard used incorrect clarifies and widely estimates values. biased some to covariate m leading that observed components revealed the variance is univariate on it the conditioning r using by with made orthogonal obtained be is t adjustments can design appropriate means the response then if the factors, that, blocking of multivariate (random) shown distribution a is joint propose It the variates. we for covariance, model of components gene analysis coherent a o for develop conditional to ology order inherently In are values. covariate experiments served designed in means Abstract. 09 o.2,N.2 223–237 2, No. 24, Vol. 2009, Russ.Wolfi[email protected] [email protected] [email protected] Fisher 2009 , rdtoa ehd o oait duteto treatment of adjustment covariate for methods Traditional r rfsosand Professors are ( 1934 Federer .Mc fits of Much ). a Emeritus was n Martin and . dutdma,bokn atr conditional factor, blocking mean, Adjusted ( This . 1955 ), in 1 n ono ( Johnson and oeeto h es qae tbcuedifferences because fit squares has least predictor the additional on effect an no as co- means the block mea- Including ac- variate covariate treatments. average to the the among means surements in treatment differences model arithmetic for blocks count the fixed treat- adjust the squares from fit least obtained The means predictor. ment a covariate as the and included de- fixed, this is as treated of are analysis blocks a the classical sign with the In design covariate. (RCB) single blocks complete randomized inferential case. this additional in are is issues there assumption However, this which valid. in not modi- problems be handle can to propose fied we model The were mixed treatments. multivariate the they of because application affected to example, prior not generally measured for are we treatments, values Also, the covariate by design. the the that by vari- suppose random fixed are not covariates the ables, which in settings in ( ( Cox and Cochran 1967 sacnncleapew ics ndti the detail in discuss we example canonical a As ,a ela oercn et uha Milliken as such texts recent more as well as ), 2002 .W r pcfial interested specifically are We ). h approach The .Wolfinger D. l 1957 treatment o a method- ral h ob- the n setto espect However, uinin fusion variance ), pecified, n co- and ndcradCochran and Snedecor odel 2 BOOTH, FEDERER, WELLS AND WOLFINGER at the block level are already accounted for. Treating show that conditional maximum likelihood can elim- blocks as fixed effectively confines the scope of infer- inate the bias. ence to only those blocks and covariate values in the An outline of the remainder of the paper is as fol- study. In particular, the standard errors obtained lows. In the next section we look back historically from the fixed effects model for the and attempt to explain why some fundamental is- treatment means do not account for repeated sam- sues in analysis of covariance are still unresolved. pling of blocks. The randomized complete blocks design is discussed However, in most applications the blocks in the in detail in Section 3. In Section 4 we show how the study can be viewed as a random sample from a pop- bivariate model for the randomized complete blocks ulation of interest, and it is of interest to extend the design can be generalized to orthogonal designs, and scope of inference to the population of blocks. An to allow adjustment for multiple covariates. In the obvious modification of the classical model in this orthogonal case the conditional model for the re- case is to simply treat the block effects as random; sponse given the covariates implied by the multivari- that is, to fit a (univariate) mixed model with two ate mixed model for the joint distribution turns out variance components. Under the standard indepen- to be a univariate mixed model. This implies that dence and normality assumptions, adjusted treat- appropriate adjustment of treatment means can be ment means obtained from the mixed model have accomplished using standard software. The method- the same form as the classical least squares means, ology is applied to some standard examples of or- except that the covariate regression coefficient is thogonal designs. The nonorthogonal case is discussed a weighted average of the estimates obtained from in Section 5. Here we show that appropriate adjust- intra- and inter-block regressions. ment cannot be accomplished by fitting a univari- A key point made in this paper is that both uni- ate mixed model. However, an EM algorithm for variate fixed and mixed models for analysis of co- fitting a general multivariate linear variance com- variance are inherently conditional on the measured ponents model is developed using arguments that covariate values. An obvious question is, therefore, parallel those for the univariate case, as discussed what joint distribution for response and covariate in Searle, Casella and McCulloch (1992), Chapter 8. leads to the conditional models. We show that by The methodology is applied to balanced incomplete simply treating the block effects as random in the block designs and unbalanced designs. We conclude randomized complete blocks design setting, one is with some discussion in Section 6. implicitly assuming that the marginal block vari- ance for the covariate is zero; that is, an implied 2. HISTORICAL PERSPECTIVE model for the joint distribution that is not realistic. As a result, the adjusted treatment means obtained Since analysis of covariance in designed experi- from the naive, univariate mixed model are biased. ments has such a long history, readers may won- However, by starting with a bivariate variance com- der why confusion over such basic modeling issues ponents model for the joint distribution of response persists even today. We attempt to explain this by and covariate, one is led to a sensible univariate con- discussing the topic in its historical context. Ar- ditional model, which properly accounts for the de- guably, the heyday of analysis of covariance was pre- sign with respect to the covariate. The idea of a 1960, predating the widespread use of matrix alge- bivariate model is suggested in Cox and McCullagh bra in statistics and clearly before the availability [(1982), Section 7], but not fully developed. Multi- of high-speed computing. Matrices are two hundred variate variance components models are discussed and some years old but their use in statistics only be- in Khuri, Mathew and Sinha [(1998), Chapter 10], came commonplace in the late 1950s. The very first but the application to analysis of covariance is not paper in the first issue of the Annals of Mathematical considered. The fully conditional approach was also Statistics by Wichsell (1930) is entitled “Remarks advocated by Neuhaus and McCulloch (2006) who on Regression,” yet it has no matrices. It is likely consider the situation where random effects in a that the lateness of adoption of matrices arose from generalized linear mixed model may be correlated their being treated as a topic in pure mathematics with one of the predictors. Classical likelihood ap- and, hence, their practical use in statistical mod- proaches lead to inconsistent estimators in this set- eling remained hidden. In the classic design books ting. Results in Neuhaus and McCulloch (2006) Cochran and Cox (1957) and Federer (1955), there COVARIANCE IN DESIGNED EXPERIMENTS 3 is no mention of matrices, whereas in Kempthorne Some statisticians, however, have advo- (1952) the design problem is formulated as a general cated a more general model which allows linear model but is not applied to analyze any ad- the intra-block regression coefficients to vanced designs. Kempthorne [(1952), page 66] even be different from the inter-block regres- notes that, at the time, there did not appear to sion. In this paper, all models are such be a complete matrix formulation of the general that the intra-block regression is the same linear model anywhere in the literature. In unbal- as that for the inter-block regression. It is anced data it was a longtime puzzle why statistical difficult for this writer to visualize situa- methods gave two different least squares estimates tions allowing separate regressions. of fixed effects in a one-way classification depending It turns out that the “[s]ome statisticians” Zelen upon whether one assumed that one effect was zero, or that all the effects summed to zero. The literature referred to were right, but why? Clearly their argu- had certainly not kept up with R. C. Bose’s (1949) ments were not entirely convincing at the time. The concept of estimability. Nor were 1950s design and bivariate variance components model described in linear model researchers aware that Penrose’s (1955) Section 3.3 reveals that Zelen’s two statements are generalized inverse matrix could be used to solve incompatible. In fact, between block variation in the the normal equations in the nonfull rank setting, as covariate implies that the intra- and inter-block re- in design problems. With the aid of a generalized gressions are not the same. Putting it another way, inverse, Rao (1962) demonstrated how the unique forcing the two slopes to be equal amounts to an as- unbiased estimators of estimable functions could be sumption that there is no variation among the block constructed. In a recent Statistical Science conver- covariate means. This assumption is clearly violated sation (Wells (2009)) Shayle Searle points out that in practical circumstances and therefore leads to bi- random effects modeling in the 1950s was quite lim- ased (adjusted) treatment means as well as incon- ited and mostly for balanced data. One of the early sistent estimates of variance components. formulations of matrix methods in variance and co- variance components analysis can be found in Searle 3. RANDOMIZED COMPLETE BLOCKS (1956). DESIGN An excellent paper by Zelen (1957) reviews the 3.1 Classical Approach thinking at the time for balanced incomplete block (BIB) designs. The algebraic manipulations associ- Consider a randomized complete blocks design with ated with the multivariate model described in this response, Y , and associated concomitant variable, paper would be very difficult, if not impossible, with- Z. Suppose that Z is measured prior to application out the use of matrix algebra and facility with the of the treatment but is possibly correlated with the multivariate normal distribution, mathematical tools response. Let i = 1,...,t be the index for treatment, that were not fully developed in the statistics liter- and j = 1, . . . , b be the index for block. The classical ature at the time of Zelen’s paper. We focus on Ze- fixed effects linear model for this design is as follows: len’s discussion of intra- and inter-block regressions, (1) Yij = µ + τi + βj + γzij + Eij, which we summarize with the following two quotes 2 from Section 3 of his paper: where Eij ∼ N(0,σe ), independently, and i τi = β = 0. Notice that replacing the covariate z in the analysis of covariance, the inter- j j ij with the block centered value z − z¯ has no effect block model will be important if the vari- ij j on the fit of this model because the term, −γz¯ , ability of the concomitant variate is large j can be incorporated into the fixed effect for block j. for “between blocks” as compared to the The adjusted mean for treatment i is the estimated variability “within blocks.” This situation mean response at a fixed value of Z, conventionally may allow more precise inter-block esti- taken to be its average observed value,z ¯ . mates of the regression coefficients as com- Throughout this paper Greek letters represent fixed pared to the corresponding intra-block es- effects (unknown parameters), upper case Roman timates letters are random variables or known matrices, and and later, when discussing the slopes of intra- and lower case Roman letters are either observed values inter-block regressions, of random variables or known constants (or vectors). 4 BOOTH, FEDERER, WELLS AND WOLFINGER

Table 1 Adjusted treatment means and standard errors for Pearce’s apple yield data

Univariate fixed Univariate mixed Bivariate mixed Treatment Adj.Mean Std.Err. Adj.Mean Std.Err. Adj.Mean Std.Err.

A 280.48 6.37 280.41 13.69 280.48 12.98 B 266.57 6.36 266.55 13.68 266.57 12.98 C 274.07 6.36 274.05 13.68 274.07 12.98 D 281.14 6.44 281.32 13.72 281.14 13.02 E 300.92 6.72 301.33 13.87 300.92 13.19 S 251.34 6.86 250.85 13.95 251.34 13.28

Adjusted means and their standard errors for the apple yield data from Pearce (1953, 1982), based on fixed effects, univariate mixed effects and bivariate mixed effects models. The standard errors involve the ML estimates of variance compo- nents.

With these conventions it is implicit in the notation ordinary least squares estimate that model (1) characterizes the conditional distri- T z (Ct ⊗ Cb)y bution of the response given the observed values of (3) γˆols = , zT (C ⊗ C )z the concomitant variable. t b ¯ We define the inter-block regression model to be where Ct = It − Jt is the centering matrix of di- T the implied model for the block means, specifically, mension t, and y = (y11,y12,...,ytb) is the entire response vector, with z defined analogously. Since ¯ ¯ Yj = µ + βj + γz¯j + Ej. CbJ¯b = 0, it follows from (3) thatγ ˆols is independent of the unadjusted treatment mean vector, (I ⊗J¯ )y, Thus, in this context, the inter-block model con- t b with components,y ¯ , i = 1,...,t. Hence, the vari- tains no information about treatment differences, i ance formula for the adjusted means based on the or about the regression parameter, γ, because the traditional model with fixed block effects is terms, γz¯ , are confounded with the block effects. j σ2 σ2 We define the intra-block regression model using the e e 2 (4) var(ˆµi,adj)= + T (¯zi − z¯) . b z (Ct ⊗ C )z t − 1 orthogonal Helmert contrasts, h2,..., ht, be- b tween components of the observation vector, Yj, for For numerical illustration we consider the apple ∗ T block j. Specifically, let Yij = hi Yj, for i = 2,...,t yield data from Pearce (1953, 1982). In this experi- ∗ ∗ and j = 1, . . . , b, and similarly define zij and Eij . ment there were b = 4 blocks, and t = 6 treatments Then the intra-block model in this case is (A, B, C, D, E and S), with S being the standard practice in English apple orchards of keeping the ∗ ∗ ∗ ∗ Yij = τi + γzij + Eij, land clean in the summer. The response, Y , is the

∗ T yield per plot, and the covariate, Z, is the number where τi = hi τ , i = 2,...,t, are t − 1 orthogonal of boxes of fruit, measured to the nearest tenth of a contrasts among the treatment means. It follows box, for the four seasons previous to the application that all the information about treatment differences, of the treatments. The adjusted treatment means and the covariate regression parameter, is contained and their estimated standard errors, based on three in the intra-block model. different models, are reported in Table 1. The adjusted treatment means are estimates of 3.2 Univariate Mixed Model the mean responses when Z =z ¯. These are given by In most applications the blocks can be regarded as a random sample from a population, and it is of (2) µˆi,adj =µ ˆ +τ ˆi +γ ˆz¯ =y ¯i − γˆ(¯zi − z¯), interest to make inferences about the average treat- for i = 1,...,t, whereγ ˆ is the BLUE for γ. These ment effects across the population of potential blocks. do not involve the (estimated) block effects because In such cases it makes sense to treat the block effects they are averages over the blocks and the block ef- as random. Thus, model (1) becomes fects sum to zero. In the fixed effects case,γ ˆ is the (5) Yij = µ + τi + Bj + γzij + Eij, COVARIANCE IN DESIGNED EXPERIMENTS 5

2 where now Bj ∼ i.i.d. N(0,σb ) independently of the case in which the block effects are omitted from the random errors, Eij. Replacing the covariate zij with model. the data centered value, zij −z¯, has no effect on the The adjusted means and their standard errors based fit of this model because the term, −γz¯, can be in- on the mixed effects model (5), for the apple yield corporated into the fixed intercept. However, unlike data, are tabulated in Table 1. The ML variance 2 the fixed block model (1), replacing the covariate component estimates in this example areσ ˆe = 194.55 2 with block centered values, zij − z¯j, does affect the andσ ˆb = 553.98, resulting in a correlation estimate fit. ρˆ = 0.9447, andγ ˆ = 28.89. This compares with the The inter-block regression model derived from (5) ordinary least squares estimateγ ˆ = 28.40. Thus, in is this case the adjusted mean values are quite similar. However, the standard errors reported by the soft- ¯ ¯ (6) Yj = µ + γz¯j + Bj + Ej, ware are quite different. This is because inferences from the fixed effects model (1) are restricted to while the intra-block model is the same as in the the four blocks in the study, whereas those from the fixed effects case. Thus, when the block effects are model (5) apply to the population of blocks. Specif- treated as random, they are incorporated into the ically, since (7) impliesγ ˆmixed is independent of the error term of the inter-block model. As a result, the unadjusted treatment means, inter-block model does contain additional informa- tion about the covariate regression parameter. In var(ˆµi,adj) particular, it would appear from (6) that the infor- 2 2 σe + σ mation in the inter-block model will increase with = b b the variability of the covariate block means. This zT [(I − ρJ¯ ) ⊗ C ]Σ[(I − ρJ¯ ) ⊗ C ]z explains the first quote from Zelen (1957) given in + t t b t t b T ¯ 2 Section 2 (albeit for a BIB design). (z [(It − ρJt) ⊗ Cb]z) The adjusted treatment means based on model 2 (¯zi − z¯) , (5) have the form (2), being the expected treat- 2 2 ment means in repeated sampling (involving differ- where Σ ≡ var(Y)= σe It ⊗ Ib + σb Jt ⊗ Ib. Notice ent blocks) at a common concomitant variable value, that, even as ρ approaches 1, this variance formula Z =z ¯ . However, the BLUE for γ is a weighted av- still differs from the fixed effects variance given in 2 erage of the estimates obtained from the intra- and (4) by an additive amount, σb /b, which accounts for inter-block regression models, where the weights are variation due to sampling of blocks. inversely proportional to their . Let ρ de- 3.3 Bivariate Mixed Model note correlation between any two sample treatment As noted in the Introduction, the models (1) and means, that is, (5) are inherently conditional on the observed val- σ2 ues of the covariate Z. The fixed effects model (1) is ρ = cor(Y¯ , Y¯ )= b . i k 2 2 appropriate if the blocks in the experiment are the σb + σe /t only ones of interest, whereas model (5) is an at- Then, it is shown in the Appendix that the BLUE tempt to broaden the applicability of inferences to of γ based on (5) has the form the hypothetical population from which the blocks were drawn. An obvious question is what model(s) zT [(I − ρJ¯ ) ⊗ C ]y t t b for the joint distribution of (Y,Z) leads to the con- γˆmixed = T z [(It − ρJ¯t) ⊗ Cb]z (7) ditional model (5)? T T Consider a bivariate model in which the distribu- z (Ct ⊗ Cb)y + (1 − ρ)z (J¯t ⊗ Cb)y = . tion of Z is independent of the treatments but allows zT [(I − ρJ¯ ) ⊗ C ]z t t b for random variation between blocks and residual er- If the block variance dominates the error variance, ror. Specifically, and hence ρ ≈ 1, then the mixed effects estimate Y µ τ of γ is close to the ordinary least squares estimate ij = y + i,y Zij µz 0 in (3). On the other hand, if the block variance is (8) dominated by the error variance, then ρ ≈ 0, and B E + j,y + ij,y the estimate in (7) corresponds to the fixed effect B E j,z ij,z 6 BOOTH, FEDERER, WELLS AND WOLFINGER

2 where the block effects are i.i.d. bivariate normal, σb,z It − Jt (σe,zyIt + σ Jt) 2 σ2 + tσ2 b,zy Bj,y 0 σb,y σb,yz e,z b,z ∼ i.i.d. N2 , ΣB = 2 , Bj,z 0 σb,zy σb,z 2 2 = σe It + σb Jt, independently of the bivariate residual errors, 2 2 2 2 where σe = σe,y − σe,yz/σe,z, and E 0 σ2 σ ij,y ∼ i.i.d. N , Σ = e,y e,yz . 2 2 E 2 0 E σ σ2 (13) σb = σb,y − [γeσb,yz + γb(σb,yz + σe,yz/t)]. ij,z e,zy e,z It follows from (10) and (12) that the univariate As before, let Y = (Y ,...,Y )T denote the re- j 1j tj conditional model implied by (9) is sponse vector for the jth block, and similarly define Zj. Then the conditionally specified model implied (14) Yij = µ + τi + Bj + γezij + γbz¯j + Eij, by bivariate model (8) can be formally derived using where µ = µy − (γe + γb)µz, τi ≡ τi,y, and Bj ∼ i.i.d. the fact that 2 2 N(0,σb ) independently of Eij ∼ i.i.d. N(0,σe ). The Yj inter-block regression model implied by (14) is Z j Y¯j = µ + γbez¯j + Bj + E¯j, µ σ2 I + σ2 J (9) ∼ i.i.d. N y , e,y t b,y t 2t µ 1 σ I + σ J where z t e,zy t b,zy t σb,yz + σe,yz/t σe,yzIt + σb,yzJt γbe ≡ γe + γb = 2 2 , σ2 + σ2 /t σ It + σ Jt b,z e,z e,z b,z is the slope of the inter-block regression. Thus, in where µy is the vector of treatment means with com- ponents, µi,y = µy + τi,y. It follows that this case the inter-block model contains no infor- mation about the intra-block covariate regression E(Yj|Zj = zj) coefficient. Similarly, in the case of a generalized 2 2 −1 linear mixed model, where random effects may be = µy + (σe,yzIt + σb,yzJt)(σe,zIt + σb,zJt) correlated with one of the predictors, it is shown (zj − 1tµz) in Neuhaus and McCulloch (2006) that conditional (10) 1 maximum likelihood also leads naturally to the par- µ I J¯ = y + (σe,yz t + tσb,yz t) 2 titioning of the covariate into between- and within- σe,z cluster components. 2 tσb,z Writing (13) in terms of the intra-block and inter- It − J¯t (zj − 1tµz) σ2 + tσ2 block slopes, γe and γbe, we obtain e,z b,z σ2 = σ2 − γ σ /t − γ (σ + σ /t). = µy + γe(zj − 1tµz)+ γb1t(¯zj − µz), b b,y e e,yz be b,yz e,yz 2 Thus, the block variance for the response in the con- where γe = σe,yz/σe,z and ditional model is the marginal block variance ad- 2 2 σe,zσb,yz − σe,yzσ justed for intra- and inter-block regression on the (11) γ = b,z . b 2 2 2 covariate. σe,z(σb,z + σe,z/t) The univariate mixed model (5) is the conditional The conditional variance is model implied by (8) when γb = 0, which only hap- pens if σ2 = 0, an unrealistic assumption in prac- var(Yj|Zj = zj) b,z tice. At this point it is interesting to recall Zelen’s 2 2 = σe,yIt + σb,yJt (1957) comments, quoted earlier, concerning the 2 2 −1 equality of slopes in the inter- and intra-block mod- − (σe,yzIt + σb,yzJt)(σ It + σ Jt) e,z b,z els, and the information in the inter-block model (σe,zyIt + σb,zyJt) about the intra-block slope increasing with the block- (12) to-block variability in the covariate. It is now clear = σ2 I + σ2 J e,y t b,y t that these statements are incompatible. Block-to- 1 block variability in the covariate implies that the I J − (σe,yz t + σb,yz t) 2 σe,z inter- and intra-block slopes are different. For this COVARIANCE IN DESIGNED EXPERIMENTS 7 reason the use of the univariate mixed model (5) 4. GENERAL ORTHOGONAL BLOCKING leads to biased estimates of adjusted means and in- DESIGNS consistent estimates of variance components as the 4.1 Theory number of blocks increases. T We define the adjusted treatment means to be the Let Zij = (Zij1,...,Zijm) be a covariate vector expected responses if the covariate values were all associated with the response Yij, for i = 1,...,k in equal to the average observed covariate value. Thus, replicate j = 1, . . . , b. Thus, the data matrix for repli- the model (14) implies cate j is given by µ = µ + τ + (γ + γ )¯z T i,adj i b e Y1j Z1j (15) T Y2j Z = µi,y + γbe(¯z − µz).  2j  . . = [Yj, Zj], It is shown in the Appendix thatµ ˆ =z ¯ , thatγ ˆ . . z e   equals the ordinary least squares estimate based on  Y ZT   kj kj  univariate fixed effects model, and thatµ ˆ =y ¯ − i,y i say. Let Z∗ denote the rth column of Z . Suppose γˆ (¯z − z¯ ). It follows that the adjusted treatment jr j e i that Y can be decomposed into Y = µ +T +U , means based on (14) are identical to (2). j j y j j where µ is the fixed mean of Y , which depends on Estimates of the adjusted treatment means for the y j T apple yield data based on (14) are given in Table 1. the treatments, j is the sum random factors asso- The estimate of the inter-block slope in this case is ciated with treatments (and therefore independent of Zj), and Uj is the sum of q random design factors γˆbe = 37.25. This is quite different in magnitude (al- though not statistically) from the estimated intra- plus residual errors. We suppose that vec[Uj , Zj], j = 1, . . . , b, are i.i.d. block slope,γ ˆe = 28.40. Since the intra-block esti- mate is identical to those based on the univariate multivariate normal vectors of dimension k(m + 1), T fixed effects model, the standard errors for the ad- with means, vec[0k, µz ⊗1k], where the components justed means are given by of µz are the marginal means of the m covariates, and with , V. We say that the de- σ2 + σ2 σ2 var(ˆµ )= e b + e (¯z − z¯ )2. sign is an “orthogonal blocking design” if the matrix i,adj b zT (C ⊗ C )z i t b V has the following structure. Let A0 ≡ J¯k, and Al, The estimated standard errors are larger than those l = 1,...,q, be k × k matrices with the properties ′ based on the fixed effects model by an additive factor that (a) Al is idempotent, (b) AlAl′ =0 if l = l , of σ2/b. This is as is should be, because the scope q b and l=0 Al = Ik. Then we suppose there exist non- of inference has been broadened to the population singular matrices, G0, G1,..., Gq, each of dimension of blocks. m + 1, such that Up to now we have assumed that the covariate q values are not affected by the treatments. If they (16) V = G ⊗ A . are, then the bivariate model (8) is no longer appro- l l l=0 priate. An obvious modification of (8) in this case is Notice that a design can be orthogonal, in this sense, regardless of the assignment of the treatments. Y µ τ B E ij = y + i,y + j,y + ij,y . In general, the matrix V is a function of (q + Zij µz τi,z Bj,z Eij,z 1 1) 2 (m + 1)(m + 2) free variance and covariance pa- The conditional model for Y given Z implied by rameters which determine the q+1 variance-covariance this model has exactly the same form as (14). How- matrices, Σ0,..., Σq, associated with the residual ever, the treatment effect parameter τi is equal to errors, and the q random design factors. In particu- τi,y − γeτi,z. This makes sense in that what is be- lar, we note that the variance–covariance structure ing estimated is the direct effect of treatments on for Uj is the response mean, as opposed to the indirect effect q through the covariate. As noted by Bartlett (1936), there is reason for caution in this setting due to hid- Vuu = gl,uuAl, den extrapolation. Comparing conditional expecta- l=0 tions of treatment means at equal covariate levels where gl,uu, l = 0,...,q, are scalar parameters, and may not make sense if the treatments affect what that this structure is that implied by orthogonality covariate values are observed. of the random blocking factors. 8 BOOTH, FEDERER, WELLS AND WOLFINGER

Example. Consider the RCB design discussed the covariates are equal to their respective marginal in Section 3. In this case there is only one covariate, means is so m = 1. The vector U associated with the jth j µ = µ − 1 γT (z¯∗ − µ ), block consists of the sum of the block effect and and adj y k 0 z the residual error vector, which generalizes the formula (15) for the RCB de- sign. Uj = 1tBj + Ej. The conditional variance of Uj is given by Finally, the covariance matrix for (YT , ZT )T is given −1 j j var(Uj |Zj = zj)= Vuu − VuzV Vzu by zz q −1 V = ΣE ⊗ It + ΣB ⊗ Jt = (gl,uu − gl,uzGl,zzgl,zu) ⊗ Al l=0 = (ΣE + tΣB) ⊗ J¯t + ΣE ⊗ (I − J¯t). q

The fact that vec[Uj , Zj], j = 1, . . . , b, are i.i.d. = λlAl, multivariate normal vectors implies that marginally l=0 vec(Z ), j = 1, . . . , b, are i.i.d. N(µ ⊗1 , V ), where j z k zz corresponding to an orthogonal design with orthog- u and z subscript combinations are used to denote onal partition {A }. components of the partitioned matrix. Moreover, con- l ditionally upon Z, Uj, j = 1, . . . , b, have indepen- Example. Consider again the RCB design of dent normal distributions with means Section 3. Note that the conditional mean (10) can be reexpressed in the form, E(Uj|Zj = zj) c ¯ ¯ −1 E(Yj |Zj = zj)= µy + γbeJtzj + γe(It − Jt)zj, = VuzVzz (zj − µz ⊗ 1k) µc µ q q where y = y − γbeµz, and γbe = γe + γb. Moreover, −1 the conditional variance (12) can be reexpressed as = gl,uz ⊗ Al Gl,zz ⊗ Al (zj − µz ⊗ 1k) 2 2 2 l=0 l=0 var(Y |Z = z ) = (σ + tσ )J¯ + σ (I − J¯ ). j j j e b t e t t q −1 4.2 Examples = gl,uzGl,zz ⊗ Al (zj − µz ⊗ 1k) 4.2.1 Split-plot designs. Consider a standard split- l=0 q plot experiment with t whole-plot treatments, each T replicated r times, and s split-plot treatments in = γ ⊗ Al (zj − µ ⊗ 1k), l z each wholeplot. Let Y denote response to split- l=0 ijk plot treatment k, in whole-plot j assigned to whole- T −1 where γl = gl,uzGl,zz is a 1 × m parameter vector. plot treatment i. Similarly index the covariate val- T T Since (γl ⊗Al)(µz ⊗1k)= γ µz ⊗Al1k = 0, unless ues Zijk. Then, the marginal models for the response T l = 0, in which case it equals γ0 µz1k, we have and covariate are q m Yijk = µy + αi,y + W(i)j,y + τk,y + ατik,y + Eijk,y E(U |Z = z )= −γT µ 1 + γ A z∗ . j j j 0 z k lr l jr and r=1 l=0 Z = µ + W + E Finally, since Tj has mean zero, and is independent ijk z (i)j,z ijk,z Z of j, respectively. Bivariate normality for the pairs, (W(i)j,y, q m W(i)j,z) and (Eijk,y, Eijk,z), imply that the condi- c ∗ E(Yj|Zj = zj)= µy + γlrAlzjr, tional model for appropriate covariate adjustment r=1 has the form, l=0 c T where µy = µy −γ0 µz1k. Thus, the conditional mean Yijk = µ + αi + W(i)j + τk + ατik of the response is given by a linear model with treat- (17) c + γwz¯ij + γezijk + Eijk, ment effects incorporated into µy, and covariate re- gression effects with slopes, {γlr}, l = 0,...,q, asso- where αi is the fixed main effect of the ith wholeplot ciated with each of the m covariates, r = 1,...,m. treatment, τk is the fixed main effect of the kth split- Since Al1k =0 for l> 0, the expected response if all plot treatment, and W(i)j is the random effect of the COVARIANCE IN DESIGNED EXPERIMENTS 9 jth wholeplot replicate nested within the ith whole- 4.2.3 Incomplete block designs. Consider an incom- plot treatment. Milliken and Johnson [(2002), Sec- plete block design with k

Zijkl = µz + Bi,z + W(ij)k,z + Eijkl,z. noise of resistors. As described by Zelen, the “ge- ometrical shapes were rectangular parallelepipeds Notice that, in this case, there are random inter- (all having the same thickness) formed by taking all actions between the blocking factor and treatments four combinations of 2 widths (w , w ) and 2 lengths that affect the response, but not the covariate. Bi- 1 2 variate normality of the pairs, (B ,B ), (W , (l1, l2).” Three resistors were mounted on each of 12 i,y i,z (ij)k,y ceramic plates according to a BIB design. The re- W(ij)k,z) and (Eijkl,y, Eijkl,z), results in a conditional model for the response with a covariate adjustment sponse was the logarithm of the noise measurement, at the individual response level, as well as adjust- and the covariate was the logarithm of the resis- ments for covariate variation in wholeplot and block tance of each resistor. Estimated treatment effects means, specifically, and their standard errors obtained using the uni- variate mixed model of the form (5), and using the Yijkl = µ + Bi + αj + (Bα)ij + W(ij)k conditional model (14) derived from the bivariate + τl + (Bτ)il + (ατ)jl + (Bατ)ijl model (8), are given in Table 2. There are substan- tial differences in both the estimated effects and the + γ z¯ + γ z¯ + γ z + E . b i w ijk e ijkl ijkl standard errors obtained using the two models. The 4.2.2 Latin square design. Let Yijk denote the re- estimates are also highly correlated, and these cor- sponse in cell (i, j) of a latin square design involving relations must be taken into account in comparisons two random blocking factors and a fixed treatment among the length and width combinations. In par- factor, each with b levels. An appropriate model ig- ticular, Zelen considered the interaction contrast, noring any covariate information is π = (l2w1 − l2w2) − (l1w1 − l1w2). Yrijk = µy + Ri,y + Cj,y + τk + Eijk,y. A marginal model for a random covariate is Estimates of π under the two models (5) and (14) are 0.022 and 0.018 respectively, with standard errors Zrijk = µz + Ri,z + Cj,z + Eijk,z. 0.056 and 0.061. Thus, both models lead to the same Bivariate normality of the pairs, (Ri,y,Ri,z), (Cj,y, conclusion that there is little statistical evidence for Cj,z) and (Eijk,y, Eijk,z), results in a conditional interaction. model for the response with a covariate adjustment at the individual response level, as well as adjust- 5. NONORTHOGONAL DESIGNS ments for covariate variation in row and column 5.1 Factorization means. That is, A key feature of the multivariate mixed model Yijk = µ + Ri + Cj + τk + γrz¯i in the orthogonal design case is that the parame- + γcz¯j + γezijk + Eijk. c ters in the conditional model for Y (µy and γl, l = 10 BOOTH, FEDERER, WELLS AND WOLFINGER

Table 2 to µ + τi. Now suppose that only k 0. The model implies bivariate model has t + 7 parameters which deter- that Z has variance-covariance matrix equal to mine the marginal means and block and error co- q variance matrices, (µ ,µ , Σ , Σ ). The parame- r y z B E V Z C C 2 D Σ I DT terization in terms of the marginal model for Z, ≡ var( )= i iσi + i( i ⊗ di ) i . and the conditional model for Y , is a union of two i=1 i=0 variation independent components of dimensions 3 We define the adjusted response mean vector as 2 2 and t + 4 respectively. Specifically, (µz,σb,z,σe,z) ∪ its given the covariates eval- 2 2 (µ, γe, γb,σe ,σb ), where µ has ith component equal uated at their estimated mean values. If we partition COVARIANCE IN DESIGNED EXPERIMENTS 11 the variance matrix, V, into n × n matrix compo- This implies that nents, the conditional expectation of the response Σ I 2 Σ I D Σ I DT vector is given by | | = |{d ci σi }||{d i ⊗ di }|| 0( 0 ⊗ n) 0 | r q E(Y|Zi = 1nµˆzi , i = 1,...,m) 2ci di n = σi |Σi| |Σ0| , m −1 i=1 i=1 = X0β + [rV0i]([Vij]i,j=1) [c1nµˆz,i − 1nµz,i]. Here, we have used the notational definitions in because D0 = Im+1 ⊗ In. The complete data log- Searle, Casella and McCulloch [(1992), Section 8.3]. likelihood is therefore r q Thus, for example, [rV0i] = [V01,..., V0m]. The es- 1 1 l = − c log σ2 − d log |Σ | timate of the adjusted mean response vector is there- 2 i i 2 i i fore i=1 i=0 r T q ˆ T ˆ −1 −1 T ˆ −1 1 T Ti 1 µˆ adj = X0β = X0(X V X) X V Z. i T −1 − 2 − Bi (Σi ⊗ Idi ) Bi, 2 σi 2 A “naive” variance-covariance formula for the ad- i=1 i=0 justed mean vector, ignoring variability due to the where estimation of V, is given by the conditional variance r q of µˆ adj assuming V is known. Specifically, B0 = Z − Xβ − CiTi − DiBi T −1 −1 T i0 ∗ 0j m i=1 i=1 var(µˆ adj)= X0(X V X) X [V V00V ]i,j=0 depends on the parameter β. It follows that the T −1 −1 T X(X V X) X0 , maximum likelihood estimates based on the com- ij −1 ∗ plete data are where [V ] = V , and V00 = V00 − [rV0i] m −1 ([Vij]i,j=1) [cV0j]. 2 1 T (18) σˆi = Ti Ti, i = 1,...,r, 5.3 EM Algorithm ci 1 m The distributional assumptions described above (19) Σˆ = BT B , i = 0,...,q, i d ij ik imply that the “complete” data vector, i j,k=0 T T T T T T and (Z , T1 ,..., Tr , B1 ,..., Bq ) , ˆ T −1 −1 T −1 has a multivariate normal distribution with mean Xβ = X[X (Σ0 ⊗ In) X] X (Σ0 ⊗ In) T T T T (20) (β X , 0 ) . The assumptions imply the covari- r q ance between Z and Ti and Bi are, respectively, Z − CiTi − DiBi . T 2 i=1 i=1 cov(Z, Ti )= Ciσi , The EM algorithm consists of iteratively replacing and T T Ti and Bi in (20), and Ti Ti and BijBik in (18) T cov(Z, Bi )= Di(Σi ⊗ Idi ). and (19), by their conditional expectations given the observed data Z. These expectations are straightfor- Thus, the joint density of the complete data vector ward to calculate because the conditional distribu- is tions involved are multivariate normal. Specifically, −1/2 f(z, t, b)= |2πΣ| exp(−Q/2), 2 T −1 Ti|Z = z ∼ N[σi Ci V (z − Xβ), T T T −1 T T where Q = [(z − Xβ) , t , b ]Σ [(z − Xβ) , t , 2 4 T −1 σ Ici − σ C V Ci], bT ]T , and i i i 2 independently, for i = 1, . . . , r, and V {rCiσi } T 2 2 T −1 Σ = {cCi σi } {dIci σi } E(Bi|Z = z) = (Σi ⊗ Idi )Di V (z − Xβ),  T T {c(Σi ⊗ Idi )Di } 0 var(Bi|Z = z)= Σi ⊗ Idi  {rDi(Σi ⊗ Idi )} T −1 − (Σi ⊗ Idi )D V Di(Σi ⊗ Idi ), 0 . i  {dΣi ⊗ Idi } independently, for i = 1,...,q.  12 BOOTH, FEDERER, WELLS AND WOLFINGER

Table 3 and where θ = (θ1,θ2) and θ1 and θ2 are variation Adjusted means and standard errors for unbalanced apple yield data independent. Our approach reveals the fact that some widely Covariate Response Response Std.Err. used models generate biased adjusted means and in- Treatment mean mean Adj.Mean correct standard errors because the assumed condi- tional model imposes unrealistic constraints on the A 8.53 283.67 269.29 13.35 B 8.40 266.67 255.69 13.35 joint distribution. Our multivariate model also clar- C 8.35 275.25 271.62 12.73 ifies some issues that have been the source of long- D 7.93 270.25 277.47 12.73 standing confusion in the statistics literature. One E 7.48 277.25 295.96 12.73 such example is in the analysis of balanced incom- S 9.30 279.50 251.63 12.73 plete block designs. As noted by Zelen (1957), “With Adjusted means and their standard errors for the apple yield respect to the non-covariance situation, most statis- data from Pearce (1953, 1982), with covariate and response ticians agree that the inter-block analysis may be data missing for treatments A and B in block 1. The stan- important if the number of blocks is ‘large’ or if dard errors were computed using equation (17) and the ML the variability between blocks is ‘small’.” However, estimates of variance components. what is less understood is that the same statement is true in the analysis of covariance. The multivariate 5.4 An Unbalanced Example analysis makes this clear because it reveals that be- Consider the apple yield data from Pearce (1953) tween block variation in the covariate implies that discussed in Section 2. Suppose that the observa- the slope of the inter-block regression is different tions (both covariate and response) were missing from that in the intra-block regression. for treatments A and B in block number 1. The In the multivariate model discussed in this paper, adjusted means based on this unbalanced data are we assume that the effect of the covariates is the given in Table 3. The adjusted means are evalu- same for all treatments. It is common in the liter- ated at the ML estimate of the covariate population ature for authors to consider models in which this mean,µ ˆz = 8.2080, which is not the same as the is not the case. For example, one can easily mod- overall mean covariate value,z ¯ = 8.3182, because of ify the univariate analysis of covariance model (5), the imbalance with respect to treatments. The stan- for a randomized blocks experiment, to allow the dard errors for the adjusted means for treatments slope of the covariate regression to depend on the A and B are larger than for the other treatments treatment (see, for example, Milliken and Johnson because they are based on observations from three (2002), Chapter 9). However, as we have shown, blocks rather than four. this univariate analysis is incorrect because it fails to account for the block regression with respect to 6. DISCUSSION the covariate. If the block regression components The traditional methods for covariate adjustment are included in the model, should these also de- of treatment means in designed experiments are in- pend on the treatments? It is the opinion of these herently conditional. In order to develop a coher- authors that the correct univariate model for co- ent general methodology, we have proposed a multi- variate adjustment, if one exists, must be motivated variate variance components model for the joint dis- by a multivariate model for the joint distribution tribution of the response and covariates. We have of the response and the covariates. For example, a shown that, if the design is orthogonal with respect conditional model for the response in which the re- to blocking factors, then appropriate adjustments to gression slopes depend on the treatments is implied treatment means can be made using the univariate by a multivariate model in which the error covari- variance components model obtained by condition- ance structure is heterogeneous across treatments, ing on the observed covariate values. As noted in but this model also implies that the conditional er- Section 5, the key to this is the factorization for the ror variances are heterogeneous across treatments, joint distribution of (Y, Z), an assumption that is not typically made. In addi- tion, it seems unnatural to assume heterogeneity in y z y z z f( , ; θ)= fY |Z( | ; θ1)fZ( ; θ2), the error covariance structure unless there is also where the conditional density fY |Z defines a univari- heterogeneity in the block variance–covariance ma- ate linear mixed model for the response variable Y , trices. Thus, it is unclear to these authors if there is COVARIANCE IN DESIGNED EXPERIMENTS 13 a coherent univariate analysis that allows covariate cal models are used. In fact, the batching of ef- effects to depend on the treatments. fects in a hierarchical model has an exact counter- The ideas presented in this paper underscore the part in the rows of the analysis of the variance ta- importance of proper model specification and care- ble. In the case where the batches are nonexchange- ful parameter interpretation in regression analysis of able Gelman (2005) recommends subtracting batch- blocked and clustered data. The formulation of the level regression predictors, then additive effects for multivariate model guards against ad hoc formula- the factor levels in each batch could be modeled tion and misspecification of the regression model by as exchangeable. The proposed multivariate vari- omitting the block-level mean effects that may seri- ance components model for the joint distribution ously bias the estimate of the individual-level effects. of the response and covariates would be a better A number of articles have explored particular types starting point for the hierarchical modeling in the of adjusting and centering for block and cluster means. case of nonexchangeable batches. Assigning prob- There are several reasons for adjusting for the block ability distributions for the treatment effects and and cluster means. As noted by Berlin et al. (1999), variance components automatically leads to coher- variability in block and cluster means is common, ent Bayesian inferences for the analysis of the co- and can confound the estimated association between variance model. the individual-level exposure measurement and out- Our modeling strategy has assumed, as is tradi- come; adjusting for the cluster mean may remove tional in designed experiments, that the covariate confounding bias. Similarly, Neuhaus and Kalbfleisch values are not affected by the treatments, for exam- (1998) argue that inference on the individual-level ple, because they were measured prior to application effects can be misleading without adjustment. Both of the treatments. From a graphical models view- Kreft, de Leeuw and Aiken (1995) and Raudenbush point, our model is B → Y ← Z ← B, where B = and Bryk (2002) articulate the need for evaluating (By,Bz). It is a diversion to try to frame the model block and cluster-level effects as predictor variables in this article on a causal inference scaffold since the in their own right. The paper by Begg and Parides inferential goals are quite different. In this article we (2003) reviews different heuristic adjustment and have outlined a coherent framework for the adjust- centering approaches for the separation of individual- ment of treatment means in designed experiments level and block/cluster-level effects on response and that account for one or more covariates, whereas in their appropriate interpretation. In this paper we causality one is trying to assess an intervention ef- suggest a multivariate model that automatically fect (of Z on Y ) in the presence of a background yields the best adjustment and centering suggested variable (B) (Cox and Wermuth (2004)). The com- by Begg and Parides (2003). monality of the two issues lies in the fact that in both Throughout this article we assumed a joint normal one is trying to sort out a set of consistent condi- multivariate model. It is well known (see Cambanis, tional relations within a system of random variables. Huang and Simons (1981)) that conditional moment A goal in casual modeling is to address the overall calculations are robust with respect to the family of regression coefficient of Y on Z where B has been elliptically contoured distributions. That is, if two decoupled from Z, that is, B and Z are nonadja- random vectors have a joint elliptically contoured cent in the graph. A consequence of this decoupling distribution, then the conditional distribution of one is that γb in (11) equals zero so that the partial and given the other is also elliptically contoured. The lo- overall effect Z on Y coincide, in which case the cation and scale parameters of the conditional dis- conditional model implied by (8) reduces to the uni- tribution do not depend upon auxiliary parameters variate mixed model (5). Separating the block effect of the joint distribution, and consequently, the con- from the covariate massively restricts the scope of ditional mean and covariance calculations which ap- possible models. By starting with a bona fide multi- ply in the normal case are valid in this more general variate model for the joint distribution of response, elliptically contoured setting as well. covariates and blocks, one is led to a sensible uni- In the Bayesian context Gelman (2005) presents a variate conditional model, which properly accounts general hierarchical regression approach for ANOVA for the design with respect to the covariate. problems in which effects are structured into ex- Finally, we note that the multivariate variance changeable batches. In this sense, ANOVA is a spe- component model has interesting applications be- cial case of linear regression, but only if hierarchi- yond just analysis of covariance. For example, the 14 BOOTH, FEDERER, WELLS AND WOLFINGER generalization of a paired t-test for a univariate re- and for i = 2,...,t, sponse to multiple observations per subject is a mixed Y ∗|Z∗ = z∗ ∼ i.i.d. N(θ + γ z∗ ,σ2), effects model with between and within subject error ij ij ij i,y e ij e components. If the response is multivariate, then a j = 1, . . . , b, multivariate variance components model allows the 2 2 where γbe = (σe,yz + tσb,yz)/(σe,z + tσb,z)= γb + γe, same generalization to repeated multivariate mea- 2 2 2 surements. The use of multivatiate variance compo- θ1,yz = θ1,y −γbeθ1,z, and σbe = σe,y +tσb,y −γbe(σe,yz + nents models for repeated measures analysis is con- tσb,yz). sidered in Khuri, Mathew and Sinha (1998), Chap- From these distributional results we can easily ˆ ∗ ter 10. deduce the ML estimates. In particular, θ1,z =z ¯1 which impliesµ ˆz =z ¯, APPENDIX: ML ESTIMATION BASED ON b (z∗ − z¯∗ )y∗ b (¯z − z¯ )¯y THE BIVARIATE MODEL FOR A RCB DESIGN j=1 1j 1 1j j=1 j j γˆbe = = b (z∗ − z¯∗ )2 b (¯z − z¯ )2 The representation of the bivariate model given j=1 1j 1 j=1 j y z T in (9) implies that the joint density of ( , ) can be z(J¯t ⊗ Cb)y factored, = T , z (J¯t ⊗ Cb)z b t b (z∗ − z¯∗ )y∗ f(y, z)= fY |Z(yj|zj)fZ (zj). i=2 j=1 ij i ij γˆe = j=1 t b ∗ ∗ 2 i=2 j=1(zij − z¯i) Likelihood-based inference can equivalently be based t b (zij − z¯i − z¯j +z ¯)yij on the joint density of a 1–1 transformation of the = i=1 j=1 data vector. Specifically, let H denote the Helmet t b 2 t i=1 j=1(zij − z¯i − z¯j +z ¯) matrix of dimension t, and consider the transfor- ∗ ∗ T T zT (C ⊗ C )y mation, (Yj, Zj) → (Yj , Zj ) ≡ (H Yj, H Zj), for t b ∗ ∗ = T , j = 1, . . . , b. The (i, j)th component of Y is Yij = z (Ct ⊗ Cb)z T hi Yj, where hi is the ith column of H. Similarly, ˆ ∗ ∗ ˆ ∗ ∗ ∗ T θ1,yz =y ¯1 − γˆbez¯1 and θi,y =y ¯i − γˆez¯i, i = 2,...,t. Zij = hi Zj. ˆ ∗ T ′ Note that θi,y =y ¯i. Also, the ML estimate of γe h h ′ Now, using the facts that i i =0 for i = i , is identical to the OLS estimate of γ based on the hT h hT 1 i i =1, and i =0, for i = 2,...,t, it is straight- standard fixed effects model (1). Also, θˆ =y ¯∗ = forward to verify that the pairs, (Y ∗,Z∗ ), i = 1,...,t 1,y 1 ij ij hT y¯ , but θˆ =y ¯∗ − γˆ z¯∗ = hT (y¯ − γˆ z¯ ), for i = and j = 1, . . . , b, are mutually independent. Further- 1 i,y i e i i e more, 2,...,t. Hence, ˆ T T ∗ µˆ y = Hθy = HH y¯ − γˆeHH z¯ +γ ˆe1z¯ Y1j θ1,y ∗ ∼ i.i.d. N2 , ΣE + tΣB , Z θ1,z = y¯ − γˆ (z¯ − 1z¯ ), 1j e j = 1, . . . , b, which agrees exactly with the adjusted mean for- and for each i = 2,...,t, mula (2) based on the fixed effects model (1). ∗ Finally, if there is no between block variation in Yij θi,y 2 ∗ ∼ i.i.d. N2 , ΣE , j = 1, . . ., b, the covariate (i.e., σb,z = 0), then γb = 0. In this case, Zij 0 γˆbe andγ ˆe are independent estimates of γe. The ML T T where θi,y = hi µy and θ1,z = h1 1µz. It now follows estimate of γe in this case is a weighted average that of the two independent estimates, with weights in- Z∗ ∼ i.i.d. N(θ ,σ2 + tσ2 ), versely proportional to their conditional variances. 1j 1,z e,z b,z Since j = 1, . . . , b, 2 2 var(Y|z) = (σ It + σ J) ⊗ Ib ∗ ∗ ∗ ∗ 2 e b Y1j|Z1j = z1j ∼ i.i.d. N(θ1,yz + γbez1j,σbe), 2 2 ¯ = (σe + tσb )[(1 − ρ)It + ρJt], j = 1, . . . , b, it follows that Z∗ ∼ i.i.d. N(0,σ2 ), ij e,z σ2 + tσ2 z e b var(ˆγbe| )= T , i = 2,...,t,j = 1, . . . , b, z (J¯t ⊗ Cb)z COVARIANCE IN DESIGNED EXPERIMENTS 15 and Fisher, R. A. (1934). Statistical Methods for Research Work- ers, 5th ed. Oliver and Boyd Ltd., Edinburgh. σ2 + tσ2 z e b Gelman, A. (2005). Analysis of variance: Why it is more var(ˆγe| ) = (1 − ρ) T . z (Ct ⊗ Cb)z important than ever (with discussion). Ann. Statist. 33 1– 53. MR2157795 This implies that Kempthorne, O. (1952). Design and Analysis of Experi- T ments. Wiley, New York. MR0045368 z [(J¯t + 1/(1 − ρ)Ct) ⊗ Cb]y γˆ = Khuri, A. I., Mathew, T. and Sinha, B. K. (1998). Sta- e,ML T ¯ z [(Jt + 1/(1 − ρ)Ct) ⊗ Cb]z tistical Tests for Mixed Linear Models. Wiley, New York. zT I J¯ C y MR1601351 [( t − ρ t) ⊗ b] Kreft, I. G., de Leeuw, J. and Aiken, L. S. (1995). The = T , z [(It − ρJ¯t) ⊗ Cb]z effect of different forms of centering in hierarchical linear models. Multivariate Behavioral Research 30 1–21. which agrees with (7). Milliken, G. A. and Johnson, D. E. (2002). Analyis of Messy Data, Volume III: Analysis of Covariance. Chapman ACKNOWLEDGMENTS and Hall, New York. Neuhaus, J. M. and Kalbfleisch, J. D. (1998). Between- This paper honors the memory of our coauthor and within-cluster covariate effects in the analysis of clus- Walter T. Federer who died April 14, 2008. Booth tered data. Biometrics 54 638–645. supported in part by NSF Grant DMS-08-05865. Neuhaus, J. M. and McCulloch, C. E. (2006). Separat- Wells supported in part by NSF Grant 06-12031 and ing between- and within-cluster covariate effects by using NIH Grant R01-GM083606-01. conditional and partitioning methods. J. Roy. Statist. Soc. Ser. B 68 859–872. MR2301298 Pearce, S. C. (1953). Field Experiments With Fruit REFERENCES Trees and Other Perennial Plants. Commonwealth Bu- reau of Horticulture and Plantation Crops, Farnham Royal, Bartlett, M. S. (1936). A note on the analysis of variance. Slough, England, App. IV. Journal of Agricultural Science 26 388. Pearce, S. C. (1982). Encyclopedia of Statistical Sciences 1 Begg, M. D. and Parides, M. K. (2003). Separation of 61–69. Wiley, New York. individual-level and cluster-level covariate effects in regres- Penrose, R. A. (1955). A generalized inverse for matrices. sion analysis of correlated data. Stat. Med. 22 2591–2602. Proc. Cambridge Philos. Soc. 51 406–413. MR0069793 Berlin, J. A., Kimmel, S. E., Ten Have, T. R. and Sam- Rao, C. R. (1962). A note on a generalized inverse of a matrix mel, M. D. (1999). An empirical comparison of several with applications to problems in mathematical statistics. clustered data approaches under confounding due to clus- J. Roy. Statist. Soc. Ser. B 24 152–158. MR0138149 ter effects in the analysis of complications of coronary an- Raudenbush, S. W. and Bryk, A. S. (2002). Hierarchical gioplasty. Biometrics 55 470–476. Linear Models: Applications and Data Analysis Method, Bose, R. C. (1949). Least Squares Aspects of Analysis of 2nd ed. Sage Publications, Newbury Park, CA. Variance. Institute of Statistics Mimeo Series 9. Univer- Searle, S. R. (1956). Matrix methods in variance and covari- sity North Carolina, Chapel Hill. ance component analysis. Ann. Math. Statist. 27 737–748. Cambanis, S., Huang, S. and Simons, G. (1981). On the MR0081051 theory of elliptically contoured distributions. J. Multivari- Searle, S. R., Casella, G. and McCulloch, C. E. (1992). ate Anal. 11 368–385. MR0629795 Variance Components. Wiley, New York. MR1190470 Cochran, W. G. and Cox, G. M. (1957). Experimental De- Snedecor, G. W. and Cochran, W. G. (1967). Statisti- signs. 2nd ed. Wiley, New York. MR0085682 cal Methods, 6th ed. Iowa State Univ. Press, Ames, IA. Cox, D. R. and McCullagh, P. (1982). Some aspects of MR0381046 analysis of covariance (with discussion). Biometrics 38 Wells, M. T. (2009). A conversation with Shayle R. Searle. 541–561. MR0685170 Statist. Sci. 24 244–254. Cox, D. R. and Wermuth, N. (2004). Causality: A statisti- Wichsell, S. D. (1930). Remarks on regression. Ann. Math. cal view. Int. Statist. Review 72 285–305. Statist. 1 3–13. Federer, W. T. (1955). Experimental Design. MacMillan, Zelen, M. (1957). The analysis of covariance for incomplete New York. block designs. Biometrics 13 309–332. MR0090208