Additive Genetic Variability and the Bayesian Alphabet
Genetics: Published Articles Ahead of Print, published on July 20, 2009 as 10.1534/genetics.109.103952

Daniel Gianola*,†,‡, Gustavo de los Campos*, William G. Hill§, Eduardo Manfredi‡ and Rohan Fernando#

e-mail: [email protected]

*Department of Animal Sciences, University of Wisconsin-Madison, Madison, Wisconsin 53706
†Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, N-1432 Ås, Norway
‡Institut National de la Recherche Agronomique, UR631 Station d’Amélioration Génétique des Animaux, BP 52627, 32326 Castanet-Tolosan, France
§Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JT, UK
#Department of Animal Science, Iowa State University, Ames, Iowa 50011

July 14, 2009

Corresponding author: Daniel Gianola, Department of Animal Sciences, 1675 Observatory Dr., Madison, WI 53706. Phone: (608) 265-2054. Fax: (608) 262-5157. E-mail: [email protected].

Running head: Quantitative variation and the Bayesian alphabet

Key Words: quantitative genetics, marker assisted selection, genomic selection, single nucleotide polymorphisms, Bayesian methods, prior distributions.

ABSTRACT

The use of all available molecular markers in statistical models for prediction of quantitative traits has led to what could be termed a genomic-assisted selection paradigm in animal and plant breeding. This paper provides a critical review of some theoretical and statistical concepts in the context of genomic-assisted genetic evaluation of animals and crops. First, relationships between the (Bayesian) variance of marker effects in some regression models and additive genetic variance are examined under standard assumptions. Second, the connection between marker genotypes and resemblance between relatives is explored, and linkages between a marker-based model and the infinitesimal model are reviewed.
Third, issues associated with the use of Bayesian models for marker-assisted selection, with a focus on the role of the priors, are examined from a theoretical angle. The sensitivity of a proposed Bayesian specification (called "Bayes A") with respect to priors is illustrated with a simulation. Methods that can solve potential shortcomings of some of these Bayesian regression procedures are discussed briefly.

1 Introduction

In an influential paper in animal breeding, MEUWISSEN et al. (2001) suggested using all available molecular markers as covariates in linear regression models for prediction of genetic value for quantitative traits. This has led to a genome-enabled selection paradigm. For example, major dairy cattle breeding countries are now genotyping elite animals, and genetic evaluations based on SNPs (single nucleotide polymorphisms) are becoming routine (HAYES et al. 2009; VAN RADEN et al. 2009). A similar trend is taking place in poultry (e.g., LONG et al. 2007; GONZÁLEZ-RECIO et al. 2008, 2009), beef cattle (GARRICK 2008 personal communication) and plants (HEFFNER et al. 2009).

The extraordinary speed with which events are taking place hampers the process of relating new developments to extant theory, as well as the understanding of some of the statistical methods proposed so far. These span from Bayesian hierarchical models (e.g., MEUWISSEN et al. 2001) and the Bayesian LASSO (e.g., de los CAMPOS et al. 2009b) to ad hoc procedures (e.g., VAN RADEN 2008). Another issue is how parameters of models for dense markers relate to those of classical models of quantitative genetics.

Many statistical models and approaches have been proposed for marker-assisted selection.
These include multiple regression on marker genotypes (LANDE and THOMPSON 1990), best linear unbiased prediction (BLUP) including effects of a single marker locus (FERNANDO and GROSSMAN 1989), ridge regression (WHITTAKER et al. 2000), Bayesian procedures (MEUWISSEN et al. 2001; GIANOLA et al. 2003; XU 2003; de los CAMPOS et al. 2009b) and semi-parametric specifications (GIANOLA et al. 2006a; GIANOLA and de los CAMPOS 2008). In particular, the methods proposed by MEUWISSEN et al. (2001) have captured enormous attention in animal breeding, for several reasons. First, the procedures cope well with data structures in which the number of markers amply exceeds the number of observations, the so-called "small n, large p" situation. Second, the methods in MEUWISSEN et al. (2001) constitute a logical progression from the standard BLUP widely used in animal breeding to richer specifications, where marker-specific variances are allowed to vary at random over many loci. Third, Bayesian methods have a natural way of taking into account uncertainty about all unknowns in a model (e.g., GIANOLA and FERNANDO 1986) and, when coupled with the power and flexibility of Markov chain Monte Carlo, can be applied to almost any parametric statistical model. In MEUWISSEN et al. (2001) normality assumptions are used together with conjugate prior distributions for variance parameters; this leads to computational representations that are well known and have been widely used in animal breeding (e.g., WANG et al. 1993, 1994). An important question is that of the impact of the prior assumptions made in these Bayesian models on estimates of marker effects and, more importantly, on prediction of future outcomes, which is central in animal and plant breeding.
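
The "small n, large p" situation mentioned above can be illustrated with a minimal numerical sketch. It is not the procedure of any of the cited papers: it is a generic ridge regression on simulated marker codes, and the sample sizes, allele frequencies, effect sizes and the shrinkage parameter `lam` are all hypothetical values chosen only for illustration. The point is that when markers outnumber records the least-squares normal equations are singular, whereas adding a ridge term gives a unique solution.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: fewer records (n) than markers (p).
n, p = 50, 500

# Simulated genotype codes in {-1, 0, 1} (number of M alleles minus 1),
# drawn under Hardy-Weinberg proportions with hypothetical allele frequencies.
freq = rng.uniform(0.1, 0.9, size=p)
W = rng.binomial(2, freq, size=(n, p)).astype(float) - 1.0

# Hypothetical truth: only the first 10 markers have non-zero effects.
beta = np.zeros(p)
beta[:10] = rng.normal(0.0, 1.0, size=10)
y = W @ beta + rng.normal(0.0, 1.0, size=n)

# W'W is p x p but has rank at most n, so ordinary least squares
# has no unique solution when p > n.
rank = np.linalg.matrix_rank(W.T @ W)

# Ridge regression: solve (W'W + lam * I) b = W'y.
lam = 10.0  # hypothetical shrinkage parameter
b_ridge = np.linalg.solve(W.T @ W + lam * np.eye(p), W.T @ y)

# Algebraically equivalent n x n form: b = W'(WW' + lam * I)^{-1} y.
b_dual = W.T @ np.linalg.solve(W @ W.T + lam * np.eye(n), y)

print("rank of W'W:", rank, "out of", p)
print("max |b_ridge - b_dual|:", np.max(np.abs(b_ridge - b_dual)))
```

The second form only requires solving an n x n system, which is one reason such shrinkage estimators remain computable when p is very large. No intercept or centering is included here, to keep the sketch short.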

A second aspect, mentioned above, is how parameters from these marker-based models relate to those of standard quantitative genetics theory, with either a finite or an infinite number (the infinitesimal model) of loci assumed. The relationships depend on the hypotheses made at the genetic level, and some of the formulae presented, e.g., in MEUWISSEN et al. (2001), use linkage equilibrium; however, the reality is that marker-assisted selection relies on the existence of linkage disequilibrium. While a general treatment of linkage disequilibrium is very difficult in the context of models for predicting complex phenotypic traits, it is important to be aware of its potential impact. Another question is the definition of additive genetic variance, according to whether marker effects are assumed fixed or random, with the latter corresponding to the situation in which such effects are viewed as sampled randomly from some hypothetical distribution. MEUWISSEN et al. (2001) employ formulae for eliciting the variance of marker effects, given some knowledge of the additive genetic variance in the population. Their developments begin with the assumption that marker effects are fixed, but these eventually become random without a clear elaboration of why this is so. Since their formulae are used in practice for eliciting priors in the Bayesian treatment (e.g., HAYES et al. 2009), it is essential to understand their ontogeny, especially considering that priors may have an impact on prediction of outcomes.

The objective of this paper is to provide a critical review of some of these aspects in the context of genomic-assisted evaluation of quantitative traits. First, relationships between the (Bayesian) variance of marker effects in some regression models and additive genetic variance are examined. Second, connections between marker genotypes and resemblance between relatives are explored.
Third, the liaisons between a marker-based model and the infinitesimal model are reviewed. Fourth, and in the context of the quantitative genetics theory discussed in the preceding sections, some statistical issues associated with the use of Bayesian models for (massive) marker-assisted selection are examined, with main focus on the role of the priors. The sensitivity of some of these methods with respect to priors is illustrated with a simulation.

2 Relationship between the variance of marker effects and additive genetic variance

2.1 Additive genetic variance

A simple specification serves to set the stage. The phenotypic value ($y$) for a quantitative trait is described by the linear model

$$y = wa + E, \tag{1}$$

where $a$ is the additive effect of a bi-allelic locus on the trait, $w$ is a random indicator variable (covariate) relating $a$ to the phenotype, and $E$ is an independently distributed random deviate, $E \sim (0, V_E)$, where $V_E$ is the environmental or residual variance, provided gene action is additive. Under Hardy-Weinberg (HW) equilibrium, the frequencies of the three possible genotypes are $\Pr(MM) = p^2$, $\Pr(Mm) = 2pq$ and $\Pr(mm) = q^2$, where $p = \Pr(M)$ and $q = 1 - p = \Pr(m)$. Code arbitrarily the states of $w$ such that

$$w = \begin{cases} 1 & \text{with probability } p^2 \\ 0 & \text{with probability } 2pq \\ -1 & \text{with probability } q^2 \end{cases} \tag{2}$$

The genetic values of $MM$, $Mm$ and $mm$ individuals are $a$, $0$ and $-a$, respectively. Then

$$E(w) = p - q, \tag{3}$$

$$E(w^2) = 1 - 2pq \tag{4}$$

and

$$\mathrm{Var}(w) = 2pq. \tag{5}$$

This is the variance among genotypes (not among their genetic values) at the locus under HW equilibrium.

A standard treatment (e.g., FALCONER and MACKAY 1996) regards the additive effect $a$ as a fixed parameter.
The conditional distribution of phenotypes, given $a$ (but unconditionally with respect to $w$, since genotypes vary at random according to HW frequencies), has mean

$$E(y \mid a) = E(w)\,a = (p - q)\,a, \tag{6}$$

and variance

$$\mathrm{Var}(y \mid a) = a^2\,\mathrm{Var}(w) + V_E = 2pqa^2 + V_E = V_A + V_E, \tag{7}$$

where $V_A = 2pqa^2$ is the additive genetic variance at the locus.
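
Expressions (3), (5), (6) and (7) can be checked numerically. The sketch below computes the exact moments of $w$ from the Hardy-Weinberg frequencies in (2) and then simulates phenotypes from model (1) with $a$ held fixed. The values of `p`, `a` and `v_e`, the sample size and the function names are hypothetical; in addition, the residual $E$ is drawn from a normal distribution purely for convenience, an assumption that the text does not require (only mean 0 and variance $V_E$ are specified).

```python
import numpy as np

def genotype_moments(p):
    """Exact mean and variance of the genotype code w under HW frequencies."""
    q = 1.0 - p
    values = np.array([1.0, 0.0, -1.0])           # codes for MM, Mm, mm
    probs = np.array([p**2, 2.0 * p * q, q**2])   # HW genotype frequencies
    mean = np.sum(values * probs)
    var = np.sum(values**2 * probs) - mean**2
    return mean, var

def simulate_phenotypes(p, a, v_e, n=200_000, seed=0):
    """Draw y = w*a + E with a fixed, w sampled under HW, E normal (assumed)."""
    rng = np.random.default_rng(seed)
    q = 1.0 - p
    w = rng.choice([1.0, 0.0, -1.0], size=n, p=[p**2, 2.0 * p * q, q**2])
    e = rng.normal(0.0, np.sqrt(v_e), size=n)
    return w * a + e

# Hypothetical illustrative values.
p, a, v_e = 0.3, 2.0, 1.0
q = 1.0 - p

mean_w, var_w = genotype_moments(p)   # compare with (3) and (5)
v_a = 2.0 * p * q * a**2              # additive genetic variance, as in (7)

y = simulate_phenotypes(p, a, v_e)
sim_mean, sim_var = y.mean(), y.var()

print("E(w):", mean_w, "vs p - q:", p - q)
print("Var(w):", var_w, "vs 2pq:", 2.0 * p * q)
print("simulated E(y|a):", sim_mean, "vs (p - q)a:", (p - q) * a)
print("simulated Var(y|a):", sim_var, "vs V_A + V_E:", v_a + v_e)
```

The simulated mean and variance agree with (6) and (7) only up to Monte Carlo error, which shrinks as `n` grows.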