On the Sampling Variance of Intraclass Correlations and Genetic Correlations
Total Page:16
File Type:pdf, Size:1020Kb
Copyright 1998 by the Genetics Society of America On the Sampling Variance of Intraclass Correlations and Genetic Correlations Peter M. Visscher University of Edinburgh, Institute of Ecology and Resource Management, Edinburgh EH9 3JG, Scotland Manuscript received February 3, 1997 Accepted for publication March 23, 1998 ABSTRACT Widely used standard expressions for the sampling variance of intraclass correlations and genetic correlation coef®cients were reviewed for small and large sample sizes. For the sampling variance of the intraclass correlation, it was shown by simulation that the commonly used expression, derived using a ®rst-order Taylor series performs better than alternative expressions found in the literature, when the between-sire degrees of freedom were small. The expressions for the sampling variance of the genetic correlation are signi®cantly biased for small sample sizes, in particular when the population values, or their estimates, are close to zero. It was shown, both analytically and by simulation, that this is because the estimate of the sampling variance becomes very large in these cases due to very small values of the denominator of the expressions. It was concluded, therefore, that for small samples, estimates of the heritabilities and genetic correlations should not be used in the expressions for the sampling variance of the genetic correlation. It was shown analytically that in cases where the population values of the heritabili- ties are known, using the estimated heritabilities rather than their true values to estimate the genetic correlation results in a lower sampling variance for the genetic correlation. Therefore, for large samples, estimates of heritabilities, and not their true values, should be used. HERE are three classic papers on the topic of sam- tions for beef cattle traits, argued that using the esti- Tpling variances of estimates of genetic correlations mates of heritabilities and genetic correlations in the in the 1950s: Reeve (1955), who derived expressions expressions of Reeve (1955) and Robertson (1959a), of the sampling variance for parent-offspring designs; rather than their true (population) values, appeared to Robertson (1959a), who derived general expressions be closer to the observed empirical sampling variance for balanced one-way ANOVA designs with equal popu- of the genetic correlation coef®cient. This was corrobo- lation values for heritabilities; and Tallis (1959), who rated by a small simulation study. Furthermore, these derived general expressions for balanced and unbal- authors implied that the equations derived in Reeve anced one-way designs. Although the latter article is, in (1955) and Robertson (1959a) were inaccurate, be- a sense, the most general, it is not referred to frequently cause the empirical sampling variance of estimated ge- (it does not help that the reference added in the proof netic correlation coef®cients was much larger than the of the Robertson article points to the wrong journal). expected sampling variance, using both their data and It could be argued that the expressions derived in simulations. those articles are no longer relevant, since estimation Three areas of confusion can be identi®ed: techniques have moved on from least-squares methods to likelihood-based methods [mainly residual maximum 1. Should the parameters in the expressions for the likelihood (REML); Patterson and Thompson (1971)]. sampling variance of the genetic correlation coef- Using likelihood methods, sampling variances can be ®cient in Reeve (1955), Robertson (1959a), and approximated from likelihood pro®les (Meyer and Tallis (1959) be the population parameters or Hill 1991). However, numerous publications, particu- the estimates of those parameters? larly in the evolutionary genetics literature, still use the 2. How good are the expressions derived in the trio expressions derived by Reeve, Robertson, and Tallis. of articles? There appears to be some confusion about the use of 3. What is the impact of estimates that are outside expressions for the sampling variance of genetic correla- the parameter space, i.e., negative heritability esti- tions. For example, Koots and Gibson (1996), who mates and/or estimates of genetic correlations performed a meta-analysis of an impressive number ,21or.11? (1500) of estimates of heritabilities and genetic correla- In this article I review the main expressions of Reeve, Robertson, and Tallis and clarify under what circum- stances the expressions should be used. I also review Address for correspondence: University of Edinburgh, Institute of Ecol- ogy and Resource Management, West Mains Rd., Edinburgh EH9 and evaluate the equations for the sampling variance 3JG, Scotland. E-mail: [email protected] of intraclass correlations as a paradigm, since these are Genetics 149: 1605±1614 ( July 1998) 1606 P. M. Visscher central to the understanding of the assumptions and Robertson and Lerner (1949) quoted Fisher, but used methods. c 2 5 1/[(s 2 2)n(n 2 1)]. (5) Finally, following Zerbe and Goldgar (1980), an expression MATERIALS AND METHODS can be derived using To answer the above questions, we ®rst look at the derivation y 5 B/W 5 [1 1 (n 2 1)Ãt]/[1 2 Ãt ] of the sampling variance of intraclass correlation coef®cients z {[1 2 t]/[1 1 (n 2 1)t]} F . because this serves as an appropriate paradigm for the sam- s21, s(n21) pling variance of the genetic correlation coef®cient. Subse- Hence, quently, expressions for the sampling variance of the genetic ≈ 2 correlation are reviewed and evaluated. var(Ãt) [dtÃ/dy] var(y) 5 f(t) c3 with Sampling variance of intraclass correlation coef®cients 2 2 c3 5 [s(n 2 1)] [sn 2 3]/[n (s 2 1) Population parameters known: Consider a simple, balanced (s(n 2 1) 2 2)2 (s(n 2 1) 2 4)]. (6) one-way design with s sires and n progeny per sire, and assume that parameters are estimated using least-squares methods, Although this expression is trivial to derive, to my knowledge e.g., ANOVA. Observations are assumed to be normally distrib- this is the ®rst time that it has been documented following a uted. The expectations and variances of the between- and ®rst-order approximation using the F ratio. within-sire mean squares (MS) are All the above expressions for ci reduce to the one most 2 commonly used (i.e., c0 5 1/[n(s 2 1)(n 2 1)]) for large s E[B ] 5 [(1 2 t) 1 nt]sp , and n. However, even for large n, there are discrepancies 2 E[W] 5 [(1 2 t)]sp , between the formulas depending on the use of s,(s21), or (s 2 2) in the denominator. It is likely that the form used by var(B) 5 2{[(1 2 t) 1 nt]s 2}2/[s 2 1], p Robertson and Lerner is incorrect and probably stems from 2 2 Fisher Fisher var(W ) 5 2{[(1 2 t)]sp} /[s(n 2 1)]. the use of 's z-transformation. showed that with B and W the between- and within-sire MS, t the popula- var(z) ≈ n/[2(n 2 1)(s 2 2)], tion intraclass correlation, and sp the phenotypic standard deviation. The variance of the least-squares estimate of the with intra-class correlation, 1 z 5 ¤2log{[1 1 (n 2 1)Ãt]/[1 2 Ãt]}. Ãt 5 [(B 2 W )/n]/[(B 2 W )/n 1 W ] Hence, 5 [B 2 W ]/[B 1 (n 2 1)W ] 5 A/C , var(Ãt ) ≈ [dtÃ/dz]2 var(z) 5 {4(1 2 t)2[1 1 (n 2 1)t]2/n2]} can be approximated using a ®rst-order Taylor-series expan- 2 2 sion about the expected mean squares, {n/[2(n 1)(s 2)]} 5 var(Ãt) ≈ var(A)/E 2(C) 1 E 2(A)var(C )/E 4(C) f(t)c 2 . Osborne Paterson 2 2E(A)cov(A,C )/E 3(C) However, as pointed out by and (1952), this is a roundabout way of deriving the sampling variance of 5 2(1 2 t)2[1 1 (n 2 1)t]2 [(sn 2 1)/ the estimated intraclass correlation, by ®rst transforming to z and then back again. (n2s(s 2 1)(n 2 1))] Population values unknown: Except when doing power cal- ≈ 2(1 2 t)2[1 1 (n 2 1)t]2/[(s 2 1)n(n 2 1)]. (1) culations and/or investigating the design of experiments (Robertson 1959b) or simulation studies, we do not know Robertson Falconer Equation 1 is well known ( 1959b; and the population values and hence do not know the exact or Mackay 1996). Its derivation, using means and variances of approximate sampling variance of the intraclass correlation. Osborne Pat- mean squares, appears to be ®rst given by and The standard practice is to use the formulae derived in the erson (1952). previous section and to substitute Ãt for t. This is essentially For large n, Equation 1 reduces to based upon the assumption that E(Ãt) 5 t. However, it should var(Ãt) ≈ 2t 2 (1 2 t)2/(s 2 1). (2) be obvious that if, by chance, the estimate of the intraclass correlation is too high (or too low), the resulting estimate of A number of expressions similar to Equation 1, which all the sampling variance will be biased. Using differ in the terms relating to the number of sires and progeny per sire, can be found in the literature. They differ in particu- d{f(t)}/dt 5 4[1 2 t][1 1 (n 2 1)t] lar in the degrees of freedom relating to the between-sire [(n 2 1)(1 2 2t) 2 1] (7) component of variance. The general expression is ≈ gives a maximum estimate of the sampling variance of Ãt 5 var(Ãt) f(t) ci (3) 1 Taylor /2 [(n 2 2)/(n 2 1)] (see also 1976).