Biased Estimators of Quantitative Trait Locus Heritability and Location in Interval Mapping

Biased Estimators of Quantitative Trait Locus Heritability and Location in Interval Mapping

Heredity (2005) 95, 476–484 & 2005 Nature Publishing Group All rights reserved 0018-067X/05 $30.00 www.nature.com/hdy Biased estimators of quantitative trait locus heritability and location in interval mapping M Bogdan1,2 and RW Doerge2,3 1Institute of Mathematics, Wroclaw University of Technology, Wroclaw, Poland; 2Department of Statistics, Purdue University, West Lafayette, IN 47907, USA; 3Department of Agronomy, Purdue University, West Lafayette, IN 47907, USA In many empirical studies, it has been observed that genome the testing bias), and we use interval mapping to estimate its scans yield biased estimates of heritability, as well as genetic location and effect, the estimator of the effect will be biased. effects. It is widely accepted that quantitative trait locus As evidence, we present results of simulations investigating (QTL) mapping is a model selection procedure, and that the the relative importance of the two sources of bias and the overestimation of genetic effects is the result of using the dependence of bias of heritability estimators on the true QTL same data for model selection as estimation of parameters. heritability, sample size, and the length of the investigated There are two key steps in QTL modeling, each of which part of the genome. Moreover, we present results of biases the estimation of genetic effects. First, test proce- simulations demonstrating the skewness of the distribution dures are employed to select the regions of the genome for of estimators of QTL locations and the resulting bias in which there is significant evidence for the presence of QTL. estimation of location. We use computer simulations to Second, and most important for this demonstration, esti- investigate the dependence of this bias on the true QTL mates of the genetic effects are reported only at the locations location, heritability, and the sample size. for which the evidence is maximal. We demonstrate that Heredity (2005) 95, 476–484. doi:10.1038/sj.hdy.6800747; even when we know there is just one QTL present (ignoring published online 28 September 2005 Keywords: quantitative trait locus; interval mapping; biased estimators; model selection Introduction and references given there). It is already known that the bias in the estimation of QTL effects arises naturally Interval mapping (Thoday, 1961; Lander and Botstein, from the fact that standard QTL mapping can be 1989) is a well-known method for identifying and interpreted as a model selection procedure and that it locating quantitative trait loci (QTL). Extended (Zeng, uses the same data to choose the best model and estimate 1993, 1994) to include, and conditional on, additional its parameters (Utz et al, 2000; Broman, 2001; Go¨ring et al, regions of the genomes, interval mapping has been 2001; Ball, 2001; Allison et al, 2002). There are two main expanded to composite interval mapping, and both are steps in the model selection process each of which adds currently enjoying a renewed popularity as statistical to the bias of the estimators of genetic effects. First, in tools for identifying genomic regions associated with performing the genome scan, all QTL are identified when quantitative data (eg gene expression QTL or eQTL). the corresponding test statistic (log-odds score or like- Where QTL mapping was once an end point for lihood-ratio test) exceeds a certain threshold. The bias experimentation, it is now a jumping-off point for resulting from using the same data for testing and exciting genomic applications (Jansen and Nap, 2001; estimation has been very well investigated (Broman, Doerge, 2002; Wayne and McIntyre, 2002; Schadt et al, 2001; Go¨ring et al, 2001; Allison et al, 2002; Xu, 2003). The 2003). Since QTL are genomic regions associated with second, and more subtle point, in QTL mapping relies on variation of a quantitative trait, it is useful to estimate choosing the locations for which the evidence of a QTL is effects of QTL and heritability, as well as their genetic maximal, and then reporting only the estimates of the map location for the purpose of narrowing in on genetic effects obtained for these optimal locations. In important components involved in the genetic architec- QTL mapping, while many test statistics are significant, ture of many experimental populations (Mackay, 2001). only the location that provides the largest test statistic In practice, it has been observed that the estimators of value along with the estimated QTL effect(s) for that effects for QTL estimated via interval mapping, and its location is reported. Although Broman (2001) and extensions (Zeng, 1993, 1994), can be severely inflated Allison et al (2002) suggest that this latter step is also a (see Beavis, 1994, 1998; Utz et al, 2000; Allison et al, 2002, source of the bias, they do not discuss this issue in detail. In this work, we explore the second source of bias. Namely, we demonstrate via interval mapping that even Correspondence: RW Doerge, Department of Statistics, Purdue Univer- when one QTL is assumed present (ie ignoring the sity, 1399 Mathematical Sciences Building, West Lafayette, IN 47907- 1399, USA. E-mail: [email protected] testing step), model selection is still performed and the Received 29 June 2004; accepted 18 July 2005; published online 28 resulting estimator of the genetic effect is biased. September 2005 Furthermore, the relative importance of the two sources Bias in interval mapping M Bogdan and RW Doerge 477 of bias is investigated with respect to the dependence of of this distribution is given by bias of heritability estimators on true QTL heritability, ! 2 sample size, and the length of the genome that is tested. ffiffiffiffiffiffi1 ðy À m À 0:5aÞ fiðyÞ¼pi p exp À We also demonstrate that the distribution of estimators 2p s 2s2 for QTL location is skewed toward the middle of the !ð2Þ 2 chromosome under investigation, which in turn results ffiffiffiffiffiffi1 ðy À m þ 0:5aÞ þð1 À piÞ p exp À in the bias of QTL location. 2p s 2s2 Although we explain the phenomenon of biased QTL effects and location via interval mapping, it is well There are no closed-form calculations for the maximum- known for cases involving multiple QTL that the likelihood estimators of parameters m, a, and s2 relative estimators provided by simple interval mapping can to r1; therefore, the likelihood still be biased because the effects of the other QTL are Yn neglected (ie employing a single QTL model). To address 2 Lðm; a; s Þ¼ fiðYiÞð3Þ this problem improved methods, like composite interval i¼1 mapping (CIM) (Zeng, 1993, 1994), multiple QTL mapping (MQM) (Jansen, 1993), or multiple interval may be maximized by employing the EM algorithm mapping (MIM) (Kao et al, 1999), have been developed. (Dempster et al, 1977). For an application of the EM These methods search the entire genome and report algorithm to interval mapping, see Jansen and Stam QTL effects at the locations where the test statistics are (1994) and Kao and Zeng (1997). The maximization largest and exceed a given threshold. The model procedure is repeated at each increment through the selection process that is incorporated in these more interval, and then over a dense grid of locations, or advanced methods (eg CIM, MQM, MIM) is the same as marker intervals, covering the genome. At each location, that used for traditional interval mapping; therefore, the the likelihood of the fitted model is recorded, and the resulting estimators of QTL effects and heritability will location that maximizes the likelihood function is also suffer from the bias issue that is the focus of this regularly used as a point estimator of QTL location with research. the corresponding estimate a taken as the point estimate for the QTL effect. Therefore, interval mapping can be viewed as a process that selects from statistical models corresponding to different QTL locations. Since the Methods likelihood of a given model depends on the estimate of heritability (see Appendix A1), the model selection process chooses models with the highest heritability Explanation of bias of heritability estimates resulting from estimates, thus leading to overestimation of this quantity. model selection We demonstrate the overestimation of heritability in Consider a backcross population where Xi denotes the more detail by considering a situation where the markers QTL genotype for the ith individual, and allow it to take are densely spaced and no model evaluations within ¼ 1 on two values (Kao et al, 1999): Xi 2 if the ith individual the interval are performed. Essentially, the interval ¼À1 is homozygous at a QTL and Xi 2 if it is hetero- mapping process, in this extreme situation, reduces to a zygous. We assume that the relationship between the regression over markers and chooses the marker that quantitative trait value Yi and a QTL genotype Xi is results in the highest coefficient of determination R2 (ie described by a normal regression model explains the most phenotypic variation) as a candidate Yi ¼ m þ aXi þ xi ð1Þ location for a QTL. To understand this, assume that a where m and a are the overall mean and QTL effect QTL is located at the first marker (M1) such that the quantitative trait data are distributed according to the parameters, respectively, and xi is the error term (environmental noise), which is normally distributed normal regression model with mean 0 and variance s2. Yi ¼ m þ aZ1i þ xi ð4Þ Interval mapping is based on an estimated genetic map and the assumption that a putative QTL is at a where Z1i is the genotype of the ith individual at first B 2 particular location. Each incremental location is flanked marker M1 and xi N(0, s ). If the position of the QTL is by a pair of markers that are related to each other via an a known, then the problem of estimating the genetic effect 2 priori estimated recombination value r.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    9 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us