The Intraclass Covariance Matrix
Total Page:16
File Type:pdf, Size:1020Kb
Behavior Genetics, Vol. 35, No. 5, September 2005 (Ó 2005) DOI: 10.1007/s10519-005-5877-1 The Intraclass Covariance Matrix Gregory Carey1,2 Received 03 March 2005—Final 10 May 2005 Introduced by C.R. Rao in 1945, the intraclass covariance matrix has seen little use in behavioral genetic research, despite the fact that it was developed to deal with family data. Here, I reintroduce this matrix, and outline its estimation and basic properties for data sets on pairs of relatives. The intraclass covariance matrix is appropriate whenever the research design or mathematical model treats the ordering of the members of a pair as random. Because the matrix has only one estimate of a population variance and covariance, both the observed matrix and the residual matrix from a fitted model are easy to inspect visually; there is no need to mentally average homologous statistics. Fitting a model to the intraclass matrix also gives the same log likelihood, likelihood-ratio (LR) v2, and parameter estimates as fitting that model to the raw data. A major advantage of the intraclass matrix is that only two factors influence the LR v2—the sampling error in estimating population parameters and the discrepancy between the model and the observed statistics. The more frequently used interclass covariance matrix adds a third factor to the v2—sampling error of homologous statistics. Because of this, the degrees of freedom for fitting models to an intraclass matrix differ from fitting that model to an interclass matrix. Future research is needed to establish differences in power—if any—between the interclass and the intraclass matrix. KEY WORDS: Twins; Sib-Pairs; linear models; covariance matrix; correlation matrix; interclass covariance; intraclass covariance. In current behavioral genetic research on twins or sib- In many other cases, however, the assignment of pairs, scores for the second member of a pair are members to twin 1 or twin 2 is random or arbitrary concatenated to those for the first member and then a with respect to the design or the models fitted to the covariance matrix is calculated. This covariance data. In this case, the vector of means for twin 1 and the matrix is an interclass matrix. In some cases, the vector of means for twin 2 have the same expectation, assignment of members to ‘‘twin 1’’ versus ‘‘twin 2’’ the within-person covariance matrix for twin 1 and the (or sib 1 versus sib 2) is dictated by the research de- within-person covariance matrix for twin 2 have the sign. For example, if pairs are ascertained because of a same expectation, and the cross-twin covariance ma- single affected individual, then the proband may be trix is symmetric. Here, the twins are in an intraclass designated at ‘‘twin 1’’ and the cotwin as ‘‘twin 2.’’ relationship. Note that the choice of interclass or in- Another example is the covariance matrix for oppo- traclass relationships is not a property of the twins per site-sex DZ twins where females, say, are twin 1 and se but depends on the research design and the models males are twin 2. fitted to data. In one analysis of the same data set, opposite-sex DZ pairs could be treated as being in an 1 Department of Psychology and Institute for Behavioral Genetics, interclass relationship while the next analysis could University of Colorado, Boulder, CO, 80309-0345, USA. deal with them as an intraclass relationship. 2 To whom correspondence should be addressed at Department of When the design or model implies an intraclass Psychology and Institute for Behavioral Genetics, University of relationship, then the researcher may calculate the Colorado, Boulder. Tel: +1-303-492-1658; Fax: +1-303-492- 2967; e-mail: [email protected] intraclass covariance matrix and fit models to that 667 0001-8244/05/0900-0667/0 Ó 2005 Springer Science+Business Media, Inc. 668 Carey matrix. This strategy, however, is seldom employed in the phenotype. If the phenotypes are sampled from a behavioral genetic research. The purpose of this multivariate normal, then the log likelihood (less a paper is to explain the intraclass covariance matrix constant) for a sample of N pairs is and outline several of its basic properties.1 N N 1 t 1 log L log R xi l RÀ xi l : ð Þ ¼ À 2 j j À 2 i 1 ð À Þ ð À Þ INTRACLASS STATISTICS X¼ Tedious algebraic reduction for the first derivatives of Let k denote the number of phenotypes. Here, I this function with respect to reveal that the maxi- develop the intraclass model for a single type of twins l (e.g., MZ or DZ pairs) and later extend it to multiple mum likelihood (ML) estimate of the expected value types. In the intraclass model, the expected mean for for the kth phenotype is any phenotype for the first member of the pair is the N same as that for the second member. Letting l denote X1ki X2ki i i 1 þ the mean for the ith phenotype, the vector of expected l^ ¼ : k ¼ P 2N values for an intraclass relationship may be written as This is simply the sample mean treating the raw l l ; l ; . ; l ; l ; l ; . ; l t: ¼ ð 1 2 k 1 2 kÞ scores of twin 1 and twin 2 as if they were indepen- Because of the intraclass relationship, the within- dent observations. person covariance matrix, W, for the first twin will Let D denote the matrix of the deviations of raw scores from their means and let dt be the th row equal that of the second twin, and the cross-twin i i covariance matrix, T, will be symmetric. Hence, the vector from this matrix. Then the log likelihood of the scores corrected for the mean is expected intraclass covariance matrix, R, will be partitioned as N N 1 t 1 log L log R di RÀ di W T ð Þ ¼ À 2 j j À 2 i 1 R : 1 X¼ ¼ T W ð Þ N 1 t 1 log R trace D DRÀ : 2 Note that an interclass matrix contains what I shall call ¼ À 2 j j À 2 ð Þ ð Þ homologous statistics—two estimates of the same Using the derivatives for matrix elements given by population variance or covariance. For example, the Mulaik (1972), it can be shown that the maximum variance of the first phenotype for twin 1 and the var- likelihood estimate for a variance is iance for the first phenotype for twin 2 are homologous N statistics. Usually, these are different numbers, so vi- 2 2 D1ki D2ki sual inspection of the matrix requires the mental 2 i 1 þ r^ ¼ ; averaging of two numbers. The intraclass matrix, on k ¼ P 2N the other hand, has one and only one estimate for a the maximum likelihood estimate of a within-indi- population variance or covariance. Hence, there is no vidual covariance is need for mental averaging in this matrix. N D1jiD1ki D2jiD2ki i 1 þ INTRACLASS ESTIMATORS W^ jk ¼ ; ¼ P 2N Let the vector of scores for the ith twin pair be and the maximum likelihood of a cross-twin covari- represented as ance is t xi X11i; X12i; . ; X1ki; X21i; X22i; . ; X2ki ; N ¼ ð Þ D1jiD2ki D1kiD2ji where the first subscript denotes the first or second i 1 þ T^jk ¼ : member of the pair and the second subscript denotes ¼ P 2N While it is possible to use numerical methods to 1 I cannot detail all of the properties of intraclass relationships here, obtain estimates for the means and the covariance especially the extension to variable sibship size. For further details, matrix, a more efficient strategy is to create a data set the inquisitive reader should consult C.R. Rao (1945), who is credited with developing the multivariate intraclass covariance by double entering the twin pairs (first enter twin 1’s matrix, Konishi et al., (1991) and Srivastava et al., (1988). scores followed by twin 2’s scores, then enter twins 2’s The Intraclass Covariance Matrix 669 scores followed by twin 1’s scores). The observed for numerical error, one will get the same log like- means from these data are the maximum likelihood lihood, the same likelihood ratio v2, and the same estimators of the means. The corrected sum of parameter estimates. The fit statistics and parameter squares and cross products matrix (SSCP) matrix estimates from fitting a model to the interclass ma- from this data set (i.e., corrected for the means) trix will differ from those derived from the same divided by 2N is the maximum likelihood estimate model fitted to raw data. of R. Note that when N is large (as it should be for Another advantage of using the intraclass ML), then the results from computing a covariance covariance matrix comes in assessing the fit of the matrix using standard statistical packages that divide model by inspection of the residual matrix, i.e., dif- the SSCP matrix by 2N 1 will be acceptable for all ference between the observed and the predicted practical purposes. À covariance matrices. With the intraclass matrix, there The estimators point out an important difference is no need to mentally average two homologous between the intraclass and the interclass matrix. For a residuals to assess fit. given data set, there is one and only one intraclass The major difference between fitting models to matrix. On the other hand, there is a large number of an interclass and to an intraclass matrix lies in the interclass matrices, each one dependent on which degrees of freedom (df).