Copyright © 2009 by the Genetics Society of America DOI: 10.1534/genetics.109.103952

Additive Genetic Variability and the Bayesian Alphabet

Daniel Gianola,*,†,‡,1 Gustavo de los Campos,* William G. Hill,§ Eduardo Manfredi‡ and Rohan Fernando**

*Department of Animal Sciences, University of Wisconsin, Madison, Wisconsin 53706, †Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, N-1432 Ås, Norway, ‡Institut National de la Recherche Agronomique, UR631 Station d'Amélioration Génétique des Animaux, BP 52627, 32326 Castanet-Tolosan, France, §Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom and **Department of Animal Science, Iowa State University, Ames, Iowa 50011

Manuscript received April 14, 2009
Accepted for publication July 16, 2009

ABSTRACT

The use of all available molecular markers in statistical models for prediction of quantitative traits has led to what could be termed a genomic-assisted selection paradigm in animal and plant breeding. This article provides a critical review of some theoretical and statistical concepts in the context of genomic-assisted genetic evaluation of animals and crops. First, relationships between the (Bayesian) variance of marker effects in some regression models and additive genetic variance are examined under standard assumptions. Second, the connection between marker genotypes and resemblance between relatives is explored, and linkages between a marker-based model and the infinitesimal model are reviewed. Third, issues associated with the use of Bayesian models for marker-assisted selection, with a focus on the role of the priors, are examined from a theoretical angle. The sensitivity of a Bayesian specification that has been proposed (called "Bayes A") with respect to priors is illustrated with a simulation. Methods that can solve potential shortcomings of some of these Bayesian regression procedures are discussed briefly.

IN an influential article on animal breeding, Meuwissen et al. (2001) suggested using all available molecular markers as covariates in linear regression models for prediction of genetic value for quantitative traits. This has led to a genome-enabled selection paradigm. For example, major dairy cattle breeding countries are now genotyping elite animals and genetic evaluations based on SNPs (single nucleotide polymorphisms) are becoming routine (Hayes et al. 2009; van Raden et al. 2009). A similar trend is taking place in poultry (e.g., Long et al. 2007; González-Recio et al. 2008, 2009), beef cattle (D. J. Garrick, personal communication), and plants (Heffner et al. 2009). The extraordinary speed with which events are taking place hampers the process of relating new developments to extant theory, as well as the understanding of some of the statistical methods proposed so far. These span from Bayes hierarchical models (e.g., Meuwissen et al. 2001) and the Bayesian Lasso (e.g., de los Campos et al. 2009b) to ad hoc procedures (e.g., van Raden 2008). Another issue is how parameters of models for dense markers relate to those of classical models of quantitative genetics.

1Corresponding author: Department of Animal Sciences, 1675 Observatory Dr., Madison, WI 53706. E-mail: [email protected]

Many statistical models and approaches have been proposed for marker-assisted selection. These include multiple regression on marker genotypes (Lande and Thompson 1990), best linear unbiased prediction (BLUP) including effects of a single-marker locus (Fernando and Grossman 1989), ridge regression (Whittaker et al. 2000), Bayesian procedures (Meuwissen et al. 2001; Gianola et al. 2003; Xu 2003; de los Campos et al. 2009b), and semiparametric specifications (Gianola et al. 2006a; Gianola and de los Campos 2008). In particular, the methods proposed by Meuwissen et al. (2001) have captured enormous attention in animal breeding, for several reasons. First, the procedures cope well with data structures in which the number of markers amply exceeds the number of observations, the so-called "small n, large p" situation. Second, the methods in Meuwissen et al. (2001) constitute a logical progression from the standard BLUP widely used in animal breeding to richer specifications, where marker-specific variances are allowed to vary at random over many loci. Third, Bayesian methods have a natural way of taking into account uncertainty about all unknowns in a model (e.g., Gianola and Fernando 1986) and, when coupled with the power and flexibility of Markov chain Monte Carlo, can be applied to almost any parametric statistical model. In Meuwissen et al. (2001) normality assumptions are used together with conjugate prior distributions for variance parameters; this leads to computational representations that are well known and have been widely used in animal breeding (e.g., Wang et al. 1993, 1994). An important

question is that of the impact of the prior assumptions made in these Bayesian models on estimates of marker effects and, more importantly, on prediction of future outcomes, which is central in animal and plant breeding.

Genetics 183: 347–363 (September 2009)

A second aspect mentioned above is how parameters from these marker-based models relate to those of standard quantitative genetics theory, with either a finite or an infinite number (the infinitesimal model) of loci assumed. The relationships depend on the hypotheses made at the genetic level and some of the formulas presented, e.g., in Meuwissen et al. (2001), use linkage equilibrium; however, the reality is that marker-assisted selection relies on the existence of linkage disequilibrium. While a general treatment of linkage disequilibrium is very difficult in the context of models for predicting complex phenotypic traits, it is important to be aware of its potential impact. Another question is the definition of additive genetic variance, according to whether marker effects are assumed fixed or random, with the latter corresponding to the situation in which such effects are viewed as sampled randomly from some hypothetical distribution. Meuwissen et al. (2001) employ formulas for eliciting the variance of marker effects, given some knowledge of the additive genetic variance in the population. Their developments begin with the assumption that marker effects are fixed, but these eventually become random without a clear elaboration of why this is so. Since their formulas are used in practice for eliciting priors in the Bayesian treatment (e.g., Hayes et al. 2009), it is essential to understand their ontogeny, especially considering that priors may have an impact on prediction of outcomes.

The objective of this article is to provide a critical review of some of these aspects in the context of genomic-assisted evaluation of quantitative traits. First, relationships between the (Bayesian) variance of marker effects in some regression models and additive genetic variance are examined. Second, connections between marker genotypes and resemblance between relatives are explored. Third, the liaisons between a marker-based model and the infinitesimal model are reviewed. Fourth, and in the context of the quantitative genetics theory discussed in the preceding sections, some statistical issues associated with the use of Bayesian models for (massive) marker-assisted selection are examined, with the main focus on the role of the priors. The sensitivity of some of these methods with respect to priors is illustrated with a simulation.

RELATIONSHIP BETWEEN THE VARIANCE OF MARKER EFFECTS AND ADDITIVE GENETIC VARIANCE

Additive genetic variance: A simple specification serves to set the stage. The phenotypic value ($y$) for a quantitative trait is described by the linear model

$$y = wa + E, \tag{1}$$

where $a$ is the additive effect of a biallelic locus on the trait, $w$ is a random indicator variable (covariate) relating $a$ to the phenotype, and $E$ is an independently distributed random deviate, $E \sim (0, V_E)$, where $V_E$ is the environmental or residual variance, provided gene action is additive. Under Hardy–Weinberg (HW) equilibrium, the frequencies of the three possible genotypes are $\Pr(MM) = p^2$, $\Pr(Mm) = 2pq$, and $\Pr(mm) = q^2$, where $p = \Pr(M)$ and $q = 1 - p = \Pr(m)$. Code arbitrarily the states of $w$ such that

$$w = \begin{cases} 1 & \text{with probability } p^2 \\ 0 & \text{with probability } 2pq \\ -1 & \text{with probability } q^2. \end{cases} \tag{2}$$

The genetic values of $MM$, $Mm$, and $mm$ individuals are $a$, $0$, and $-a$, respectively. Then

$$E(w) = p - q, \tag{3}$$

$$E(w^2) = 1 - 2pq, \tag{4}$$

and

$$\mathrm{Var}(w) = 2pq. \tag{5}$$

This is the variance among genotypes (not among their genetic values) at the locus under HW equilibrium.

A standard treatment (e.g., Falconer and Mackay 1996) regards the additive effect $a$ as a fixed parameter. The conditional distribution of phenotypes, given $a$ (but unconditionally with respect to $w$, since genotypes vary at random according to HW frequencies), has mean

$$E(y \mid a) = E(w)a = (p - q)a \tag{6}$$

and variance

$$\mathrm{Var}(y \mid a) = a^2\,\mathrm{Var}(w) + V_E = 2pqa^2 + V_E = V_A + V_E, \tag{7}$$

where $V_A = 2pqa^2$ is the additive genetic variance at the locus. In this standard treatment, the additive genetic variance depends on the additive effect $a$ but not on its variance; unless $a$ is assigned a probability distribution, it does not possess variance.

Uncertainty about the additive effect: Assume now that $a \sim (\theta, \sigma_a^2)$, where $\theta$ is the mean of the distribution. In the Appendix of Meuwissen et al. (2001), $a$ suddenly mutates from fixed to random without explanation. How does the variance of $a$ arise? Parameter $\sigma_a^2$ can be assigned at least two interpretations. The first one is as the variance between $a$ effects in a conceptual sampling scheme in which such effects are drawn at random from a population of loci. In the second (Bayesian), $\sigma_a^2$ represents uncertainty about the true but unknown value of the additive effect of the specific locus, but without invoking a scheme of random sampling over a population of loci. For example, $\sigma_a^2 = 0$ means in a Bayesian sense that $a = \theta$ with complete certainty, but not necessarily that the locus does not have an effect, since $\theta$ may (may not) be distinct from 0; this cannot be overemphasized. With a single locus the Bayesian interpretation is more intuitive, since it is hard to envisage a reference population of loci in this case.

Irrespective of the interpretation assigned to $\sigma_a^2$, the assumption $a \sim (\theta, \sigma_a^2)$ induces another conditional distribution (given $w$) with mean and variance

$$E(y \mid w) = E(wa \mid w) = w\theta \tag{8}$$

and

$$\mathrm{Var}(y \mid w) = w^2\sigma_a^2 + V_E, \tag{9}$$

respectively. However, both $w$ (the genotypes) and $a$ (their effects) are now random variables possessing a joint distribution. Here, it is assumed that $w$ and $a$ are independent, but this may not be so, as in a mutation-stabilizing selection model (e.g., Turelli 1985) or in situations discussed by Zhang and Hill (2005) where the distribution of gene frequencies after selection depends on $a$. Deconditioning over both $a$ and $w$ (that is, averaging over all genotypes at the locus and all values that $a$ can take), the expected value and variance of the marginal distribution of $y$ are

$$E(y) = E_w[E(y \mid w)] = E(w)\theta = (p - q)\theta \tag{10}$$

and

$$\begin{aligned} \mathrm{Var}(y) &= E_w[\mathrm{Var}(y \mid w)] + \mathrm{Var}_w[E(y \mid w)] \\ &= E_w\!\left[w^2\sigma_a^2 + V_E\right] + \mathrm{Var}_w[w\theta] \\ &= (1 - 2pq)\sigma_a^2 + 2pq\theta^2 + V_E. \end{aligned} \tag{11}$$

This distribution is not normal: the phenotype results from multiplying a discrete random variable $w$ by the normal variate $a$, and then adding a deviate $E$, which may or may not be normal, depending on what is assumed about the residual distribution. The genetic variance is now $(1 - 2pq)\sigma_a^2 + 2pq\theta^2$. The term $(1 - 2pq)\sigma_a^2$ stems from randomness (uncertainty) about $a$, and it dissipates from (11) only if $\sigma_a^2 = 0$. Note that there is additive genetic variance even if $\sigma_a^2 = 0$, and it is equal to $2pq\theta^2$. Further, if the standard assumption $\theta = 0$ is adopted, the variance of the marginal distribution of phenotypes becomes

$$\mathrm{Var}(y) = (1 - 2pq)\sigma_a^2 + V_E. \tag{12}$$

Here, the standard term for additive genetic variance disappears and yet $\sigma_a^2$ may not be zero, since one poses that a locus has no effect (on average) but there is some uncertainty or variation among loci effects, as represented by $\sigma_a^2$.

In a nutshell, the additive genetic variance at the locus, $V_A = 2pqa^2$, does not appear in (11), because $a$ has been integrated out; actually, it is replaced by $2pq\theta^2$ in (11). The term $(1 - 2pq)\sigma_a^2$ does not arise in the standard (fixed) model, where additive genetic variance in stricto sensu is $V_A$. When $a$ is known with certainty, then $\sigma_a^2 = 0$ and yet the locus generates additive genetic variance, as measured by $V_A = 2pq\theta^2$. If $\theta = 0$ and $\sigma_a^2 = 0$, then the variance is purely environmental. In short, the connection between the uncertainty variance $\sigma_a^2$ and the additive variance $V_A$ (which involves the effect of the locus) is elusive when both genotypes and effects are random variables.

Several loci: Consider now $K$ loci with additive effect $a_k$ at locus $k$, without dominance or epistasis. The phenotype is expressible as

$$y = \sum_{k=1}^{K} w_k a_k + e, \tag{13}$$

with $E(y \mid a_1, a_2, \ldots, a_K) = \sum_{k=1}^{K}(p_k - q_k)a_k$ under HW equilibrium. If all markers were quantitative trait loci (QTL), this would be a "finite number of QTL" model; it is assumed that these loci are markers hereinafter unless stated otherwise. Under this specification

$$\mathrm{Var}(y \mid a_1, a_2, \ldots, a_K) = \mathrm{Var}\!\left(\sum_{k=1}^{K} w_k a_k \,\Big|\, a_1, a_2, \ldots, a_K\right) + V_E. \tag{14}$$

To deduce the additive variance, some assumption must be made about the joint distribution of $w_1, w_2, \ldots, w_K$, the genotypes at the $K$ loci.

Linkage equilibrium (LE) and HW frequencies are assumed to make the problem tractable. Some expressions are available to accommodate linkage disequilibrium, but parameters are not generally available to evaluate them (see the appendix). When there is selection, genetic drift, or introgression and many loci are considered jointly, some of which will be even physically linked, the LE assumption is unrealistic. Hence, as in the case of many other authors, e.g., Barton and de Vladar (2009) in a study of evolution of traits using statistical mechanics, LE–HW assumptions are used here.

If there is LE, the distributions of genotypes at the $K$ loci are mutually independent, so that

$$\mathrm{Var}(y \mid a_1, a_2, \ldots, a_K) = \sum_{k=1}^{K} \mathrm{Var}(w_k)a_k^2 + V_E = \sum_{k=1}^{K} 2p_kq_ka_k^2 + V_E. \tag{15}$$

The multilocus additive genetic variance under LE–HW is then $V_A = \sum_{k=1}^{K} 2p_kq_ka_k^2$.

Suppose now that all effects $a_k$ $(k = 1, 2, \ldots, K)$ are drawn from the same random process with some distribution function $P(a)$, mean $\theta$, and variance $\sigma_a^2$. Using the same reasoning as before, the variance of the marginal distribution of the phenotypes is

$$\mathrm{Var}(y) = \mathrm{Var}_a\!\left(\sum_{k=1}^{K}(p_k - q_k)a_k\right) + E_a\!\left(\sum_{k=1}^{K} 2p_kq_ka_k^2 + V_E\right). \tag{16}$$

Then

$$\mathrm{Var}(y) = \sigma_a^2\sum_{k=1}^{K}(1 - 2p_kq_k) + \theta^2\sum_{k=1}^{K} 2p_kq_k + V_E, \tag{17}$$

which generalizes (11) to $K$ loci. The first term is variance due to uncertainty, the second term is the standard additive genetic variance, and the third one is residual variation.

What is the relationship between the multilocus additive genetic variance $V_A = \sum_{k=1}^{K} 2p_kq_ka_k^2$ and $\sigma_a^2$? Let $V_A'$ be the mean variance, obtained by averaging $V_A$ over the distribution of the $a$'s. This operation yields

$$V_A' = E\!\left(\sum_{k=1}^{K} 2p_kq_ka_k^2\right) = (\sigma_a^2 + \theta^2)\sum_{k=1}^{K} 2p_kq_k.$$

Hence, if $\theta = 0$ (additive effects expressed as a deviation from their mean), then

$$V_A' = \sigma_a^2\sum_{k=1}^{K} 2p_kq_k. \tag{18}$$

The relationship between the uncertainty variance $\sigma_a^2$ and the marked average additive genetic variance $V_A'$ would then be

$$\sigma_a^2 = \frac{V_A'}{\sum_{k=1}^{K} 2p_kq_k}, \tag{19}$$

in agreement with Habier et al. (2007), but different from Meuwissen et al. (2001). Unless the markers are QTL, $V_A'$ gives only the part of the additive genetic variance captured by markers, and this may be a tiny fraction only (Maher 2008). This makes the connection between additive genetic variance for a trait and marked variance even more elusive.

Corresponding formulas under linkage disequilibrium are in the appendix.

Heterogeneity of variance: Suppose now that locus effects are independently (but not identically) distributed as $a_k \sim N(\theta_k, \sigma_{a_k}^2)$. The mean of the marginal distribution of phenotypes is $E(y) = \sum_{k=1}^{K}(p_k - q_k)\theta_k$, and the variance becomes

$$\mathrm{Var}(y) = \sum_{k=1}^{K}(1 - 2p_kq_k)\sigma_{a_k}^2 + \sum_{k=1}^{K} 2p_kq_k\theta_k^2 + V_E.$$

If all $\sigma_{a_k}^2 = 0$ (complete Bayesian certainty about all marker effects $a_k$), there would still be genetic variance, as measured by the second term in $\mathrm{Var}(y)$. Apart from the difficulty of inferring a given $\sigma_{a_k}^2$ with any reasonable precision, there is the question of possible "commonality" between locus effects. For instance, some loci may have correlated effects, and alternative forms for the prior distribution of the $a$'s have been suggested by Gianola et al. (2003). Also, some of these effects are expected to be identically equal to 0, especially if $a_k$ represents a marker effect, as opposed to being the result of a QTL [if the marker is in linkage disequilibrium (LD) with the QTL, its effect would be expected to be nonnull]. In such a case, a more flexible prior distribution might be useful, such as a mixture of normals, a double exponential, or a Dirichlet process.

If a frequentist interpretation is adopted for the assumption $a_k \sim N(\theta_k, \sigma_{a_k}^2)$, it is difficult to envisage the conceptual population from which $a_k$ is sampled, unless the variances are also random draws from some population. Posing a locus-specific variance is equivalent to assuming that each sire in a sample of Holsteins is drawn from a different conceptual population with sire-specific variance. There would be as many variances as there are sires!

Variation in allelic frequencies: In addition to assuming random variation of $a$ effects over loci, a distribution of gene frequencies may be posed as well. Wright (1937) found that a beta distribution arose from a diffusion equation that was used to study changes in allele frequencies in finite populations, so this is well grounded in population genetics. In a Bayesian context, on the other hand, assigning a beta distribution to gene frequencies would be a (mathematically convenient) representation of uncertainty.

Suppose that allelic frequencies $p_k$ $(k = 1, 2, \ldots, K)$ vary over loci according to the same beta $B(\phi_a, \phi_b)$ process, where $\phi_a$, $\phi_b$ are parameters determining the form of the distribution; Wright (1937) expressed these parameters as functions of effective population size and mutation rates. Standard results yield

$$E(p) = \frac{\phi_a}{\phi_a + \phi_b} = p_0, \tag{20}$$

$$E(q) = \frac{\phi_b}{\phi_a + \phi_b} = q_0, \tag{21}$$

and

$$\mathrm{Var}(p) = \frac{p_0q_0}{\phi_a + \phi_b + 1}. \tag{22}$$

The expected heterozygosity is given by

$$\overline{2pq} = 2p_0q_0\frac{(\phi_a + \phi_b)}{(\phi_a + \phi_b + 1)}. \tag{23}$$

There are now four sources of variation: (1) due to random sampling of genotypes over individuals in the population, (2) due to uncertainty about additive effects (equivalently, variability due to sampling additive effects over loci), (3) due to spread of gene frequencies over loci or about some equilibrium distribution, and (4) environmental variability. The third source contributes to variance under conceptual repeated sampling, since gene frequencies would fluctuate around the equilibrium distribution or over loci.

Consider now additive genetic variance when dispersion in allelic frequencies is taken into account, assuming LE. Let $\mathbf{p} = (p_1, p_2, \ldots, p_K)'$ and $\mathbf{a} = (a_1, a_2, \ldots, a_K)'$. Using previous results, (22) and (23),

$$\begin{aligned} \mathrm{Var}(y \mid \mathbf{a}) &= E_{\mathbf{p}}[\mathrm{Var}(y \mid \mathbf{p}, \mathbf{a})] + \mathrm{Var}_{\mathbf{p}}[E(y \mid \mathbf{p}, \mathbf{a})] \\ &= E_{\mathbf{p}}\!\left[\sum_{k=1}^{K} 2p_kq_ka_k^2 + V_E\right] + \mathrm{Var}_{\mathbf{p}}\!\left[\sum_{k=1}^{K}(p_k - q_k)a_k\right] \\ &= 2p_0q_0\frac{(\phi_a + \phi_b + 2)}{(\phi_a + \phi_b + 1)}\sum_{k=1}^{K} a_k^2 + V_E. \end{aligned} \tag{24}$$

The expected additive genetic variance is now

$$V_A = 2p_0q_0\frac{(\phi_a + \phi_b + 2)}{(\phi_a + \phi_b + 1)}\sum_{k=1}^{K} a_k^2. \tag{25}$$

To arrive at the marginal distribution of phenotypes, variation in $a$ effects is brought into the picture, producing the variance decomposition

$$\mathrm{Var}(y) = E_{\mathbf{a}}[\mathrm{Var}(y \mid \mathbf{a})] + \mathrm{Var}_{\mathbf{a}}[E(y \mid \mathbf{a})], \tag{26}$$

after variation in genotypes and frequencies (i.e., with respect to $w$ and $\mathbf{p}$) has been taken into account. From (13)

$$\begin{aligned} E(y \mid \mathbf{a}, \mathbf{p}) &= E_w[E(y \mid \mathbf{a}, w, \mathbf{p})] = E_w\!\left[\sum_{k=1}^{K} w_ka_k \,\Big|\, \mathbf{a}, \mathbf{p}\right] = \sum_{k=1}^{K}(2p_k - 1)a_k, \\ E(y \mid \mathbf{a}) &= E_{\mathbf{p}}[E(y \mid \mathbf{a}, \mathbf{p})] = (p_0 - q_0)\sum_{k=1}^{K} a_k, \end{aligned}$$

so that in the absence of correlations between locus effects

$$\mathrm{Var}_{\mathbf{a}}[E(y \mid \mathbf{a})] = K(p_0 - q_0)^2\sigma_a^2. \tag{27}$$

Likewise, using (24),

$$E_{\mathbf{a}}[\mathrm{Var}(y \mid \mathbf{a})] = 2p_0q_0\frac{(\phi_a + \phi_b + 2)}{(\phi_a + \phi_b + 1)}K(\sigma_a^2 + \theta^2) + V_E. \tag{28}$$

Combining (27) and (28) as required by (26) leads to

$$\mathrm{Var}(y) = \left[(p_0 - q_0)^2 + 2p_0q_0\frac{(\phi_a + \phi_b + 2)}{(\phi_a + \phi_b + 1)}\right]K\sigma_a^2 + \overline{2pq}\,\frac{(\phi_a + \phi_b + 2)}{(\phi_a + \phi_b)}K\theta^2 + V_E. \tag{29}$$

The variance of the marginal distribution of the phenotypes has, thus, three components. The one involving $\sigma_a^2$ relates to uncertainty about or random variation of marker effects. The second,

$$\overline{2pq}\,\frac{(\phi_a + \phi_b + 2)}{(\phi_a + \phi_b)}K\theta^2,$$

is exactly the additive genetic variance when the mean of the distribution of marker effects $(\theta)$ is known with complete certainty, and the third component is the environmental variance $V_E$. If $\theta = 0$ and the first term of (29) is interpreted as additive genetic variance ($V_A''$), it turns out that

$$\sigma_a^2 = \frac{V_A''}{\left[(p_0 - q_0)^2 + \overline{2pq}\left((\phi_a + \phi_b + 2)/(\phi_a + \phi_b)\right)\right]K}. \tag{30}$$

This is similar to Habier et al. (2007) only if $p_0 = q_0$, $\phi_a + \phi_b$ is large enough, and $K$ is very large, such that

$$\overline{2pq} = \lim_{K \to \infty}\frac{1}{K}\sum_{k=1}^{K} 2p_kq_k = \int 2p(1 - p)f(p \mid \phi_a, \phi_b)\,dp,$$

where $f(p \mid \phi_a, \phi_b)$ is the beta density representing variation of allelic frequencies.

RESEMBLANCE BETWEEN RELATIVES

Standard results: QTL are most often unknown, so their effects and their relationships to those of markers are difficult to model explicitly. An alternative is to focus on effects of markers, whose genotypes are presumably in linkage disequilibrium with one or several QTL, so can be thought of as proxies. Using a slightly different notation, the marker-based linear model suggested by Meuwissen et al. (2001) for genomic-assisted prediction of genetic values is

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{W}\mathbf{a}_m + \mathbf{e}, \tag{31}$$

where $\mathbf{y}$ is an $n \times 1$ vector of phenotypic values, $\boldsymbol{\beta}$ is a vector of macroenvironmental nuisance parameters, $\mathbf{X}$ is an incidence matrix, $\mathbf{a}_m = \{a_{mk}\}$ is a $K \times 1$ vector of additive effects of markers, and $\mathbf{W} = \{w_{ik}\}$ is a known incidence matrix containing codes for marker genotypes, e.g., $-1$, $0$, and $1$ for $mm$, $Mm$, and $MM$, respectively. Let the $i$th row of $\mathbf{W}$ be $\mathbf{w}_i'$ and assume that $\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2)$ is a vector of microenvironmental residual effects.

If genotypes are sampled at random from the population, this induces a probability distribution with mean vector

$$E(\mathbf{y} \mid \boldsymbol{\beta}, \mathbf{a}_m) = \mathbf{X}\boldsymbol{\beta} + E(\mathbf{W})\mathbf{a}_m = \mathbf{X}\boldsymbol{\beta} + (\mathbf{P} - \mathbf{Q})\mathbf{a}_m, \tag{32}$$

where

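$$E(\mathbf{W}) = \begin{bmatrix} p_1 & p_2 & \cdots & p_K \\ p_1 & p_2 & \cdots & p_K \\ \vdots & \vdots & & \vdots \\ p_1 & p_2 & \cdots & p_K \end{bmatrix} - \begin{bmatrix} q_1 & q_2 & \cdots & q_K \\ q_1 & q_2 & \cdots & q_K \\ \vdots & \vdots & & \vdots \\ q_1 & q_2 & \cdots & q_K \end{bmatrix} = \mathbf{P} - \mathbf{Q},$$

and $\mathbf{P}$ and $\mathbf{Q}$ are matrices whose columns contain $p_k$ and $q_k$, respectively, in every position of column $k$. Every element of the column vector $(\mathbf{P} - \mathbf{Q})\mathbf{a}_m$ is the same and equal to $\gamma = \sum_{k=1}^{K}(p_k - q_k)a_k$, a constant that can be absorbed into the intercept element of $\boldsymbol{\beta}$. Hence, and without loss of generality, $E(\mathbf{y} \mid \boldsymbol{\beta}, \mathbf{a}_m) = E(\mathbf{y} \mid \boldsymbol{\beta})$.

The covariance matrix (regarding $\boldsymbol{\beta}$ and marker effects $\mathbf{a}_m$ as fixed parameters) is

$$\mathrm{Var}(\mathbf{y} \mid \boldsymbol{\beta}, \mathbf{a}_m) = \mathrm{Var}(\mathbf{W}\mathbf{a}_m) + \mathbf{I}\sigma_e^2.$$

Here

$$\mathrm{Var}(\mathbf{W}\mathbf{a}_m \mid \mathbf{a}_m) = \begin{bmatrix} \mathbf{a}_m'\mathrm{Var}(\mathbf{w}_1)\mathbf{a}_m & \mathbf{a}_m'\mathrm{Cov}(\mathbf{w}_1, \mathbf{w}_2')\mathbf{a}_m & \cdots & \mathbf{a}_m'\mathrm{Cov}(\mathbf{w}_1, \mathbf{w}_n')\mathbf{a}_m \\ & \mathbf{a}_m'\mathrm{Var}(\mathbf{w}_2)\mathbf{a}_m & \cdots & \mathbf{a}_m'\mathrm{Cov}(\mathbf{w}_2, \mathbf{w}_n')\mathbf{a}_m \\ & & \ddots & \vdots \\ \text{symmetric} & & & \mathbf{a}_m'\mathrm{Var}(\mathbf{w}_n)\mathbf{a}_m \end{bmatrix}.$$

If genotypes are drawn at random from the same population, all $\mathbf{w}_i$ vectors have the same distribution, that is, $\mathbf{w}_i \sim (\mathbf{p}, \mathbf{D})$ for all $i$. If the population is in HW–LE, $\mathbf{D} = \{2p_kq_k\}$ is a diagonal matrix. Further, if individuals are genetically related, off-diagonal terms $\mathbf{a}_m'\mathrm{Cov}(\mathbf{w}_i, \mathbf{w}_j')\mathbf{a}_m$ are not null, because of covariances due to genotypic similarity.

To illustrate, let $i$ be a randomly chosen male mated to a random female, and let $j$ be a randomly chosen descendant. Under HW–LE, genotypes at different marker loci are mutually independent, within and between individuals, so it suffices to consider a single locus. It turns out that

$$\mathrm{Cov}(w_i, w_j) = pq = \tfrac{1}{2}(2pq) = \tfrac{1}{2}\mathrm{Var}(w_i),$$

yielding the standard result that the covariance between genotypes of offspring and parent is $\tfrac{1}{2}$ of the variance between genotypes at the locus in question. This generalizes readily to any type of additive relationships in the population.

Letting $r_{ij}$ be the additive relationship between $i$ and $j$,

$$\mathrm{Var}(\mathbf{W}\mathbf{a}_m \mid \mathbf{a}_m) = \begin{bmatrix} \mathbf{a}_m'\mathbf{D}\mathbf{a}_m & r_{12}\mathbf{a}_m'\mathbf{D}\mathbf{a}_m & \cdots & r_{1n}\mathbf{a}_m'\mathbf{D}\mathbf{a}_m \\ & \mathbf{a}_m'\mathbf{D}\mathbf{a}_m & \cdots & r_{2n}\mathbf{a}_m'\mathbf{D}\mathbf{a}_m \\ & & \ddots & \vdots \\ \text{symmetric} & & & \mathbf{a}_m'\mathbf{D}\mathbf{a}_m \end{bmatrix} = V_A\mathbf{A}_r, \tag{33}$$

where

$$V_A = \mathbf{a}_m'\mathbf{D}\mathbf{a}_m = \sum_{k=1}^{K} 2p_kq_ka_k^2 \tag{34}$$

is the additive genetic variance among multilocus genotypes in HW–LE, and $\mathbf{A}_r = \{r_{ij}\}$ is the matrix of additive relationships between individuals; $\mathbf{D}$ is diagonal in this case. The variance–covariance matrix (33) involves the fixed marker effects $\mathbf{a}_m$, but these get absorbed into $V_A$.

The implication is that a model with the conditional (given marker effects and gene frequencies) expectation function

$$E(\mathbf{y} \mid \boldsymbol{\beta}, \mathbf{a}_m) = \mathbf{X}\boldsymbol{\beta} + (\mathbf{P} - \mathbf{Q})\mathbf{a}_m \tag{35}$$

[recall that $(\mathbf{P} - \mathbf{Q})\mathbf{a}_m = \mathbf{1}\gamma$] and conditional covariance matrix

$$\mathrm{Var}(\mathbf{y} \mid \boldsymbol{\beta}, \mathbf{a}_m) = \left(\sum_{k=1}^{K} 2p_kq_ka_k^2\right)\mathbf{A}_r + \mathbf{I}\sigma_e^2 \tag{36}$$

has the equivalent representation

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{a}^* + \mathbf{e}, \tag{37}$$

with $\mathbf{a}^* = \mathbf{W}\mathbf{a}_m = \left\{a_i^* = \sum_{k=1}^{K} w_{ik}a_k\right\}$ distributed as

$$\mathbf{a}^* \sim (\mathbf{0}, \mathbf{A}_rV_A). \tag{38}$$

This is precisely the standard model of quantitative genetics applied to a finite number of marker loci ($K$), where additive genetic variation stems from the sampling of genotypes but not of their effects. The assumption of normality is not required.

Formulas under LD are given in the second section of the appendix.

Estimating the pedigree-based relationship matrix and expected heterozygosity using markers: Consider model (37):

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{a}^* + \mathbf{e} = \mathbf{X}\boldsymbol{\beta} + \mathbf{W}\mathbf{a}_m + \mathbf{e}.$$

Given the observed marker genotypes $\mathbf{W}$, the distribution of $\mathbf{W}$ is irrelevant with regard to inference about $\mathbf{a}^*$. It is relevant only if one seeks to estimate parameters of the genotypic distribution, e.g., gene frequencies and linkage disequilibrium statistics. Consider, for instance, the expected value of $\mathbf{W}\mathbf{W}'$: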

$$E(\mathbf{W}\mathbf{W}') = \{E(\mathbf{w}_i'\mathbf{w}_j)\}, \quad i, j = 1, 2, \ldots, n,$$

where $\mathbf{w}_i'\mathbf{w}_j = \sum_{k=1}^{K} w_{i,k}w_{j,k}$, and the sum is over markers. If all individuals belong to the same genotypic distribution, as argued above,

$$E(\mathbf{w}_i'\mathbf{w}_j) = \sum_{k=1}^{K} E(w_{i,k}w_{j,k}) = r_{ij}\sum_{k=1}^{K} 2p_kq_k + \sum_{k=1}^{K}(p_k - q_k)^2, \tag{39}$$

as in Habier et al. (2007); if $p_k = q_k = \tfrac{1}{2}$, then $E(\mathbf{w}_i'\mathbf{w}_j) = r_{ij}(K/2)$ so $E((2/K)\mathbf{W}\mathbf{W}') = \mathbf{A}_r$. Under this idealized assumption $(2/K)\mathbf{W}\mathbf{W}'$ provides an unbiased estimator of the pedigree-based additive relationship matrix, but only if HW–LE holds. Note that under a beta distribution of gene frequencies

$$\lim_{K \to \infty}\frac{1}{K}\sum_{k=1}^{K} 2p_kq_k = 2p_0q_0\frac{(\phi_a + \phi_b)}{(\phi_a + \phi_b + 1)}.$$

Likewise

$$\lim_{K \to \infty}\frac{1}{K}\sum_{k=1}^{K}(p_k - q_k)^2 = 1 - 4p_0q_0\frac{(\phi_a + \phi_b)}{(\phi_a + \phi_b + 1)}.$$

Using this continuous approximation, (39) becomes

$$E(\mathbf{w}_i'\mathbf{w}_j) = r_{ij}KH_0 + K(1 - 2H_0) = K + KH_0(r_{ij} - 2),$$

where $H_0 = \overline{2pq} = 2p_0q_0((\phi_a + \phi_b)/(\phi_a + \phi_b + 1))$ is the expected heterozygosity. Since the relationship holds for any combination $i$, $j$ and there are $n(n + 1)/2$ distinct elements in $\mathbf{W}\mathbf{W}'$ and in $\mathbf{A}_r$, one can form an estimator of mean heterozygosity as

$$\hat{H}_0 = \frac{\overline{\mathbf{w}_i'\mathbf{w}_j} - K}{K(\overline{r_{ij}} - 2)},$$

where $\overline{\mathbf{w}_i'\mathbf{w}_j}$ and $\overline{r_{ij}}$ are averages over all distinct elements in $\mathbf{W}\mathbf{W}'$ and $\mathbf{A}_r$. The estimator is simple, but no claim is made about its properties.

CONNECTION WITH THE INFINITESIMAL MODEL

Clearly, (31) or (37) with (38) involves a finite number of markers, and the marked additive genetic variance is a function of allelic frequencies and of effects of individual markers. In the infinitesimal model, on the other hand, the effects of individual loci or of gene frequencies do not appear explicitly. How do these two types of models connect?

The vector of additive genetic effects (or marked breeding value) of all individuals is $\mathbf{a}^* = \mathbf{W}\mathbf{a}_m = \{a_i^*\} = \left\{\sum_{k=1}^{K} w_{ik}a_k\right\}$. Next, assume that effects $a_1, a_2, \ldots, a_K$ are independently and identically distributed as $a_k \sim N(0, \sigma_a^2)$, but this does not need to be so. Again, $\sigma_a^2$ is not the additive genetic variance, which is given by (34) above.

Assuming $\mathbf{a}_m \sim N(\mathbf{0}, \mathbf{I}\sigma_a^2)$ implies that $\mathbf{a}^* = \mathbf{W}\mathbf{a}_m$ must be normal (given $\mathbf{W}$). However, the elements of $\mathbf{W}$ (indicators of genotypes, discrete) are also random so the finite sample distribution of $\mathbf{a}^*$ is not normal; however, as $K$ goes to infinity, the distribution approaches normality, as discussed later in this section. If $\mathbf{a}_m$ and $\mathbf{W}$ are independently distributed, the mean vector and covariance matrix of the distribution of marked breeding values are

$$E(\mathbf{a}^*) = E_{\mathbf{a}_m}[E_{\mathbf{W}}(\mathbf{W}\mathbf{a}_m \mid \mathbf{a}_m)] = E_{\mathbf{a}_m}[(\mathbf{P} - \mathbf{Q})\mathbf{a}_m] = \mathbf{0} \tag{40}$$

and

$$\mathrm{Var}(\mathbf{a}^*) = \mathrm{Var}_{\mathbf{a}_m}[E(\mathbf{W}\mathbf{a}_m \mid \mathbf{a}_m)] + E_{\mathbf{a}_m}[\mathrm{Var}(\mathbf{W}\mathbf{a}_m \mid \mathbf{a}_m)]. \tag{41}$$

Using the fact that $E_{\mathbf{W}}(\mathbf{W}\mathbf{a}_m \mid \mathbf{a}_m) = (\mathbf{P} - \mathbf{Q})\mathbf{a}_m$ and (33), then under LE assumptions

$$\mathrm{Var}(\mathbf{a}^*) = (\mathbf{P} - \mathbf{Q})(\mathbf{P} - \mathbf{Q})'\sigma_a^2 + \mathbf{A}_rE_{\mathbf{a}_m}\!\left(\sum_{k=1}^{K} 2p_kq_ka_k^2\right).$$

Since $E_{\mathbf{a}_m}\!\left(\sum_{k=1}^{K} 2p_kq_ka_k^2\right) = \sigma_a^2\sum_{k=1}^{K} 2p_kq_k$, it follows that

$$\mathrm{Var}(\mathbf{a}^*) = (\mathbf{P} - \mathbf{Q})(\mathbf{P} - \mathbf{Q})'\sigma_a^2 + \mathbf{A}_r\left[\sigma_a^2\sum_{k=1}^{K} 2p_kq_k\right]. \tag{42}$$

This shows that when both genotypes (the $w$'s) and their effects (the $a$'s) vary at random according to independent trinomial (at each locus) and normal distributions, respectively, the variance of the distribution of a marked breeding value $a^*$ is affected by differences in gene frequencies $p_k - q_k$, by the variance of the distribution of marker effects $\sigma_a^2$, and by the level of heterozygosity. In the special case where allelic frequencies are equal to $\tfrac{1}{2}$ at each of the loci, the first term vanishes and one gets $\mathrm{Var}(\mathbf{a}^*) = \mathbf{A}_r\tilde{\sigma}_{a^*}^2$, where $\tilde{\sigma}_{a^*}^2 = K\sigma_a^2/2$. This can be construed as the counterpart of the polygenic additive variance of the standard infinitesimal model, but in a special situation. Again, this illustrates that $\sigma_a^2$ relates to additive genetic variance in a more subtle manner than would appear at first sight.

What is the distribution of marked breeding values $a_i^* = \sum_{k=1}^{K} w_{ik}a_k$ when both $w_{ik}$ and $a_k$ vary at random? Because $a_k$ is normal, the conditional distribution $a_i^* \mid \mathbf{w}_i$ is normal, with mean 0 and variance $\sigma_a^2\sum_{k=1}^{K} w_{ik}^2$. Thus, one can write the density of the marginal distribution of $a_i^*$ as
X X X * * finitesimally small as K /‘, so the covariance matrix pðai Þ¼ ... pðai j wiÞPrðwi1; wi2; ... ; wikÞ; wi1 wi2 wiK tends to

where $\Pr(w_{i1}, w_{i2}, \ldots, w_{iK}) = \Pr(\mathbf{w}_i)$ is the probability of observing the K-dimensional marker genotype $\mathbf{w}_i$ in individual i. If the population is in HW–LE at all K marker loci, the joint distribution of genotypes of an individual over the K loci is the product of the marginal distributions at each of the marker loci; that is,

$$\Pr(\mathbf{w}_i) = \prod_{k=1}^{K} (p_k^2)^{w_{ik1}} (2 p_k q_k)^{w_{ik2}} (q_k^2)^{w_{ik3}}.$$

For example, if individual i is AA at locus k, $w_{ik1} = 1$, $w_{ik2} = 0$, and $w_{ik3} = 0$; if i is heterozygote, $w_{ik1} = 0$, $w_{ik2} = 1$, and $w_{ik3} = 0$; and if i has genotype aa, then $w_{ik1} = 0$, $w_{ik2} = 0$, and $w_{ik3} = 1$. Note that $w_{ik3} = 1 - w_{ik1} - w_{ik2}$, so that there are only two free indicator variates. It follows that the marginal distribution of $a_i^*$ is a mixture of $3^K$ normal distributions each with a null mean, but distribution-specific variance $\sigma_a^2 \sum_{k=1}^{K} w_{ik}^2$. As $K \to \infty$, the mixing probabilities $\Pr(\mathbf{w}_i)$ become infinitesimally small, so that

$$p(a_i^*) = \int \frac{1}{\sqrt{2\pi\sigma_a^2 \sum_{k=1}^{K} w_{ik}^2}} \exp\left[ -\frac{\left( \sum_{k=1}^{K} w_{ik} a_k \right)^2}{2\sigma_a^2 \sum_{k=1}^{K} w_{ik}^2} \right] f(\mathbf{w})\, d\mathbf{w},$$

for some density $f(\mathbf{w})$. The mixture distribution of $a_i^*$ must necessarily converge toward a Gaussian one, because $a_i^* = \sum_{k=1}^{K} w_{ik} a_k$ is the sum of a large (now infinite) number of independent random variates (note that under LE $w_{ik} a_k$ is independent of any other $w_{ik'} a_{k'}$ because genotypes at different loci are mutually independent), so the central limit theorem holds; it holds even under weaker assumptions. Then

$$a_i^* \sim N(0, \sigma^2_{a^*}).$$

Since all components of the mixture have null means, its variance is just the average of the variances of all components of the mixture (Gianola et al. 2006b); that is,

$$\sigma^2_{a^*} = \lim_{K \to \infty} \sum_{w_{i1}} \sum_{w_{i2}} \cdots \sum_{w_{iK}} \Pr(\mathbf{w}_i)\, \sigma_a^2 \sum_{k=1}^{K} w_{ik}^2.$$

This is the additive genetic variance of an infinitesimal model, i.e., one with an infinite number of loci, so that the probability of any genotypic configuration is infinitesimally small.

When the joint distribution of additive genetic values $\mathbf{a}^*$ of a set of individuals is considered, it is reasonable to conjecture that it should be multivariate normal, especially under LE. Its mean vector is $E(\mathbf{a}^*) = \mathbf{0}$, as shown in (40), and the covariance matrix of the limiting process can be deduced from (42). The first term becomes null, because allelic frequencies became in-

$$\mathrm{Var}(\mathbf{a}^*) = \mathbf{A}_r \sigma^2_{a^*}.$$

As shown by Habier et al. (2007), markers may act as a proxy for a pedigree. Hence, unless the pedigree is introduced into the model in some explicit form, markers may be capturing relationships among individuals, as opposed to representing flags for QTL regions. At any rate, it is essential to keep in mind that markers are not genes.

THE BAYESIAN ALPHABET

The preceding part of this article sets the quantitative genetics theory basis upon which marker-assisted prediction of breeding value (using linear models) rests. Specifically, Meuwissen et al. (2001) use this theory to assess values of hyperparameters of some Bayesian models proposed by these authors. In what follows, a critique of some methods proposed for inference is presented.

Bayes A: Meuwissen et al. (2001) suggested using model (31) for marker-enabled prediction of additive genetic effects and proposed two Bayesian hierarchical structures, termed "Bayes A" and "Bayes B." A brief review of these two methods follows, assuming that k denotes a SNP locus. In Bayes A (using notation employed here), the prior distribution of a marker effect $a_k$, given some marker-specific uncertainty variance $\sigma^2_{a_k}$, is assumed to be normal with null mean and dispersion $\sigma^2_{a_k}$. In turn, the variance associated with the effect of each marker k = 1, 2, ..., K is assigned the same scaled inverse chi-square prior distribution $\chi^{-2}(\sigma^2_{a_k} \mid \nu, S^2)$, where $\nu$ and $S^2$ are known degrees of freedom and scale parameters, respectively. This hierarchy is represented as

$$a_k \mid \sigma^2_{a_k} \sim N(0, \sigma^2_{a_k}), \qquad \sigma^2_{a_k} \mid \nu, S^2 \sim \nu S^2 \chi^{-2}_{\nu}.$$

The marginal prior induced for $a_k$ is obtained by integrating the normal density over $\sigma^2_{a_k}$, yielding

$$p(a_k \mid \nu, S^2) = \int_0^{\infty} N(0, \sigma^2_{a_k})\, p(\sigma^2_{a_k} \mid \nu, S^2)\, d\sigma^2_{a_k} \propto \int_0^{\infty} (\sigma^2_{a_k})^{-(1+\nu+2)/2} \exp\left( -\frac{a_k^2 + \nu S^2}{2\sigma^2_{a_k}} \right) d\sigma^2_{a_k} \propto \left( 1 + \frac{a_k^2}{\nu S^2} \right)^{-(\nu+1)/2} \qquad (43)$$

(Box and Tiao 1973; Sorensen and Gianola 2002). This is the kernel of the density of the t-distribution $t[a_k \mid 0, \nu, S^2]$, which is the de facto prior in Meuwissen et al. (2001) assigned to a marker effect. Again, $\nu$ and $S^2$ are assumed known and given arbitrary values by the user; this is a crucial issue.
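The claim in (43), that a normal prior whose variance follows a scaled inverse chi-square distribution integrates to a scaled t prior, can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names, the integration grid, and the test values $\nu = 4$, $S^2 = 1$ are illustrative assumptions, not part of the original derivation.

```python
import math

def scaled_inv_chi2_logpdf(s2, nu, S2):
    """Log density of the scaled inverse chi-square with nu d.f. and scale S2."""
    half = nu / 2.0
    return (half * math.log(half * S2) - math.lgamma(half)
            - (half + 1.0) * math.log(s2) - half * S2 / s2)

def normal_logpdf(a, s2):
    """Log density of N(0, s2) evaluated at a."""
    return -0.5 * math.log(2.0 * math.pi * s2) - a * a / (2.0 * s2)

def marginal_prior_numeric(a, nu, S2, lo=-20.0, hi=20.0, n=20001):
    """Integrate N(a | 0, s2) p(s2 | nu, S2) over s2 with the trapezoid rule
    on the log scale u = log(s2); the '+ u' term is the Jacobian ds2 = s2 du."""
    h = (hi - lo) / (n - 1)
    total = 0.0
    for j in range(n):
        u = lo + j * h
        s2 = math.exp(u)
        f = math.exp(normal_logpdf(a, s2) + scaled_inv_chi2_logpdf(s2, nu, S2) + u)
        total += f * (0.5 if j in (0, n - 1) else 1.0)
    return total * h

def scaled_t_pdf(a, nu, S2):
    """Density of the t-distribution with nu d.f., location 0 and scale S2."""
    log_c = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * math.log(nu * math.pi * S2))
    return math.exp(log_c - (nu + 1.0) / 2.0 * math.log1p(a * a / (nu * S2)))

if __name__ == "__main__":
    for a in (0.0, 0.5, 2.0):
        print(a, marginal_prior_numeric(a, 4.0, 1.0), scaled_t_pdf(a, 4.0, 1.0))
```

The two columns printed by the usage loop should agree to many decimal places, which is the numerical counterpart of the proportionality statement in (43).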

Bayes B: In Bayes B, Meuwissen et al. (2001) proposed

$$a_k \mid \sigma^2_{a_k} = \begin{cases} \text{point mass at some constant } c & \text{if } \sigma^2_{a_k} = 0 \\ N(0, \sigma^2_{a_k}) & \text{if } \sigma^2_{a_k} > 0, \end{cases}$$

$$\sigma^2_{a_k} \mid \pi = \begin{cases} 0 & \text{with probability } \pi \\ \nu S^2 \chi^{-2}_{\nu} & \text{with probability } 1 - \pi. \end{cases}$$

This implies that the joint prior distribution of $a_k$ and $\sigma^2_{a_k}$, given an arbitrary probability parameter $\pi$, is

$$p(a_k, \sigma^2_{a_k} \mid \pi) = \begin{cases} a_k = c \text{ and } \sigma^2_{a_k} = 0 & \text{with probability } \pi \\ N(0, \sigma^2_{a_k})\, p(\nu S^2 \chi^{-2}_{\nu}) & \text{with probability } 1 - \pi, \end{cases}$$

where c is a constant (if $\sigma^2_{a_k} = 0$, this implies that $a_k$ is known with certainty), taken by Meuwissen et al. (2001) to be equal to 0, even though it does not need to be 0. Marginally, after integrating $\sigma^2_{a_k}$ out, the prior takes the form

$$p(a_k \mid \pi) = \begin{cases} a_k = c & \text{with probability } \pi \\ t(0, \nu, S^2) & \text{with probability } 1 - \pi. \end{cases}$$

Then, Bayes B reduces to Bayes A by taking $\pi = 0$.

A critique: Neither Bayes A nor Bayes B (or any variations thereof that assume marker-specific variances) allows Bayesian learning on these variances to proceed far away from the prior. This means that the hyperparameters of the prior assigned to these variances (according to assumptions based on quantitative genetics theory) will always have influence on the extent of shrinkage produced on marker effects. A user can arbitrarily control the extent of shrinkage simply by varying $\nu$ and $S^2$. It suffices to illustrate this problem with Bayes A, since the problem propagates to Bayes B.

It is straightforward to show that the fully conditional (i.e., given all other parameters and the data, a situation denoted as ELSE hereinafter) posterior distribution of $\sigma^2_{a_k}$ is the scaled inverse chi-square process $\chi^{-2}\left(\sigma^2_{a_k} \mid \nu + 1, (\nu S^2 + a_k^2)/(\nu + 1)\right)$, so Bayesian learning "moves" only a single degree of freedom away from the prior distribution $\chi^{-2}(\sigma^2_{a_k} \mid \nu, S^2)$, even though its scale parameter is modified from $S^2$ into $(\nu S^2 + a_k^2)/(\nu + 1)$ (most markers are expected to have nearly null effects). Now, since $a_k$ is unknown, and inferring it consumes information contained in the data, this implies that, unconditionally (that is, a posteriori), inferences about $\sigma^2_{a_k}$ are even more strongly affected by the information encoded by its prior distribution than in the conditional posterior process. For instance, if $\nu = 5$, say, this means that the posterior degree of belief about $\sigma^2_{a_k}$ has an upper bound at 6, irrespective of whether data on millions of markers or of individuals become available.

For a parameter $\theta$ of a model, Bayesian learning should be such that the posterior coefficient of variation, that is, $CV = \sqrt{\mathrm{Var}(\theta \mid \mathrm{DATA})} / E(\theta \mid \mathrm{DATA})$, tends to 0 asymptotically as DATA accrue. This does not happen in Bayes A or Bayes B for $\sigma^2_{a_k}$. In Bayes A, the coefficients of variation of the prior and of the fully conditional posterior distribution are

$$CV(\sigma^2_{a_k}) = \sqrt{\frac{2}{\nu - 4}}, \quad \nu > 4,$$

and

$$CV(\sigma^2_{a_k} \mid \mathrm{ELSE}) = \sqrt{\frac{2}{\nu - 3}}, \quad \nu > 3,$$

respectively, so that $CV(\sigma^2_{a_k} \mid \mathrm{ELSE}) / CV(\sigma^2_{a_k}) = \sqrt{1 - 1/(\nu - 3)}$. This ratio goes to 1 rapidly as the degrees of freedom of the prior increase (meaning that the prior "dominates" inference), as illustrated in Figure 1. For example, if $\nu = 4.1$, $\nu = 5.1$, and $\nu = 6.1$, the ratio between the coefficients of variation of the conditional posterior distribution of $\sigma^2_{a_k}$ and that of its prior is 0.30, 0.72, and 0.82, so that the prior is influential even at mild values of the degrees of freedom. Given a large $\nu$, the conditional posterior essentially copies the prior, with the contribution of DATA being essentially nil. As mentioned, since marginal posterior inferences about $\sigma^2_{a_k}$ require deconditioning over $a_k$ (thus consuming information contained in the data), the impact of the prior will be even more marked at the margins. This is questionable, at least from an inference perspective.

Figure 1. Ratio between coefficients of variation, $CV(\sigma^2_{a_k} \mid \mathrm{ELSE}) / CV(\sigma^2_{a_k}) = \sqrt{1 - 1/(\nu - 3)}$, of the conditional posterior and prior distributions of the variance of the marker effect, as a function of the degrees of freedom $\nu$ of the prior.

Another way of illustrating the same problem is based on computing information gain, i.e., the difference in entropy before and after observing data. Since the posterior distribution of $\sigma^2_{a_k}$ is unknown, we consider the entropy of the fully conditional posterior distribution of $\sigma^2_{a_k}$, instead of that of the marginal process. This provides an upper bound for the information gain. The entropy of the prior is

$$H\left[ p(\sigma^2_{a_k} \mid \nu, S^2) \right] = -\int \log\left[ p(\sigma^2_{a_k} \mid \nu, S^2) \right] p(\sigma^2_{a_k} \mid \nu, S^2)\, d\sigma^2_{a_k} = \frac{\nu}{2} + \log\left[ \frac{\nu S^2}{2}\, \Gamma\left( \frac{\nu}{2} \right) \right] - \left( 1 + \frac{\nu}{2} \right) \frac{d}{d(\nu/2)} \log \Gamma\left( \frac{\nu}{2} \right). \qquad (44)$$
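As a quick numerical check of the coefficient-of-variation argument made in this critique, the sketch below evaluates the prior and fully conditional coefficients of variation and their ratio for a few values of $\nu$. The function names are hypothetical; the formulas are the ones stated in the text.

```python
import math

def cv_prior(nu):
    """CV of the scaled inverse chi-square prior; defined only for nu > 4."""
    if nu <= 4:
        raise ValueError("the prior CV requires nu > 4")
    return math.sqrt(2.0 / (nu - 4.0))

def cv_conditional(nu):
    """CV of the fully conditional posterior, which has nu + 1 d.f."""
    if nu <= 3:
        raise ValueError("the conditional CV requires nu > 3")
    return math.sqrt(2.0 / (nu - 3.0))

def cv_ratio(nu):
    """Ratio CV(conditional) / CV(prior) = sqrt(1 - 1/(nu - 3))."""
    return cv_conditional(nu) / cv_prior(nu)

if __name__ == "__main__":
    for nu in (4.1, 5.1, 6.1, 10.0, 100.0):
        print(f"nu = {nu:6.1f}  ratio = {cv_ratio(nu):.2f}")
```

For $\nu$ = 4.1, 5.1, and 6.1 this reproduces the ratios 0.30, 0.72, and 0.82 quoted in the text, and the ratio approaches 1 as $\nu$ grows.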

In the entropy of the fully conditional posterior distribution of $\sigma^2_{a_k}$, $H\left[ p(\sigma^2_{a_k} \mid \mathrm{ELSE}) \right]$, $\nu$ is replaced by $\nu + 1$ and $S^2$ by $(\nu S^2 + a_k^2)/(\nu + 1)$ (it is expected that $(\nu S^2 + a_k^2)/(\nu + 1) \approx S^2$ for most markers). The relative information gain (fraction of entropy reduced by knowledge encoded in ELSE) is then

$$RIG = \frac{H\left[ p(\sigma^2_{a_k} \mid \nu, S^2) \right] - H\left[ p(\sigma^2_{a_k} \mid \mathrm{ELSE}) \right]}{H\left[ p(\sigma^2_{a_k} \mid \nu, S^2) \right]}. \qquad (45)$$

For instance, RIG = 1 if the entropy of the conditional posterior process is 0. Assume a nil marker effect and a root-scale parameter S = 1. For $a_k = 0$, S = 1, and $\nu = 4$, RIG = 0.125; for $a_k = 0$, S = 1, and $\nu = 10$, RIG = $6.51 \times 10^{-2}$; and for $a_k = 0$, S = 1, and $\nu = 100$, RIG = $9.60 \times 10^{-3}$. Even at mild values of the prior degrees of freedom $\nu$, the extent of uncertainty reduction due to observing data is negligible. Metaphorically, the prior is totalitarian in Bayes A, at least for each one of the $\sigma^2_{a_k}$ parameters.

A third gauge is the Kullback–Leibler distance (KL) between the prior and conditional posterior distributions. The KL metric (Kullback 1968) is the expected logarithmic divergence between two parametric distributions, one taken as reference or point of departure. Using the prior as reference distribution, the expected distance is

$$KL(\text{conditional}, \text{prior}) = \int L(\nu, \nu + p, S^2, a_m, \sigma^2_{a_k})\, p(\sigma^2_{a_k} \mid \nu, S^2)\, d\sigma^2_{a_k},$$

where

$$L(\nu, \nu + p, S^2, a_m, \sigma^2_{a_k}) = \log \frac{p(\sigma^2_{a_k} \mid \nu, S^2)}{p(\sigma^2_{a_k} \mid \mathrm{ELSE})}$$

is a randomly varying distance (randomness is due to uncertainty about $\sigma^2_{a_k}$), and p is the number of markers that are assigned the same variance, so that p = 1 and $a_m = a_k$ in Bayes A; however, p could be much larger if, say, all p markers on the same chromosome were assigned the same variance.

Figure 2. Prior densities of the marker-specific variance $\sigma^2_{a_k}$ (solid circles, $\nu = 4$, $S^2 = 1$; solid curve, $\nu = 10$, $S^2 = 1$; crosses, $\nu = 100$, $S^2 = 1$) and values of the integrand $L(\nu, \nu + 1, S^2, a_k = 0)$ in the Kullback–Leibler distance, for each of the three priors, shown as solid lines. The integrands are essentially indistinguishable from each other for all values of $\sigma^2_{a_k}$. Values of the integrand are drastically different (open circles) when 10 markers are assigned the same variance, so that the integrand is $L(\nu, \nu + 10, S^2, a_k = 0 \text{ for all } k, \sigma^2_{a_k})$.

The impact of the degrees of freedom of the prior on the random quantity L(.) of the integrand in KL was examined by assuming that the conditional posterior distribution of $\sigma^2_{a_k}$ had $a_k = 0$ (again, most marker effects are expected to be tiny, if not null) and that the scale parameter of the prior of the marker-specific variances was S = 1. Figure 2 displays three scaled inverse chi-square densities, all with the same parameter S = 1 and degrees of freedom 4, 10, or 100, as well as values of the random quantity $L(\nu, \nu + p, S^2, a_m, \sigma^2_{a_k})$ with p = 1 and $a_k = 0$ in the KL gauge. Also shown in Figure 2 (in open circles) are values of L(., ., ., .) for p = 10, meaning that, instead of having marker-specific variances, 10 markers would share the same variance. While the three priors are different, and reflect distinct states of prior uncertainty about $\sigma^2_{a_k}$ via their distinct degrees of freedom, the $L(\nu, \nu + p, S^2, a_m, \sigma^2_{a_k})$ values are essentially the same irrespective of $\sigma^2_{a_k}$ and flat for any values of $\sigma^2_{a_k}$ appearing with appreciable density in the priors. On the other hand, if one considers $L(\nu, \nu + 10, S^2, a_k = 0, \sigma^2_{a_k})$, where $\sigma^2_{a_k}$ is now a variance assigned to a group of markers (even assuming that their effects are nil), L(., ., ., .) is steep with respect to $\sigma^2_{a_k}$, taking negative values for $\sigma^2_{a_k} < 0.55$ (roughly). In fact, the KL distances (evaluated with numerical integration) between the prior and conditional posterior distributions are (1) $7.33 \times 10^{-2}$ for $\nu = 4$, S = 1, p = 1, and $a_k = 0$; (2) $2.64 \times 10^{-2}$ for $\nu = 10$, S = 1, p = 1, and $a_k = 0$; and (3) $2.52 \times 10^{-3}$ for $\nu = 100$, S = 1, p = 1, and $a_k = 0$, so that the conditional posterior is very close to the prior even at small values of the degrees of freedom parameter. However, when the number of markers sharing the same variance increases to p = 10 (assuming

all $a_k$'s = 0, as stated), KL = 4.47, so that considerable Bayesian learning about $\sigma^2_{a_k}$ takes place in this situation. Relative to scenario 1 above, the KL distance increases by 61 times.

A pertinent question is whether or not the learned marker effect (i.e., a draw from its conditional posterior distribution) has an important impact on KL via modification of the scale parameter from $S^2$ into $(\nu S^2 + a_k^2)/(\nu + 1)$. Let $c = a_k / S$ be the realized value of the marker effect in units of the "prior standard deviation" S, with c = 0, 0.01, 0.5, 1, and 2; the last two cases would be representative of markers with huge effects. The density of the conditional posterior distribution of $\sigma^2_{a_k}$ is then

$$p(\sigma^2_{a_k} \mid \mathrm{ELSE}) = \frac{\left[ (\nu + c^2) S^2 / 2 \right]^{(\nu + 1)/2}}{\Gamma\left( (\nu + 1)/2 \right)} (\sigma^2_{a_k})^{-(\nu + 1 + 2)/2} \exp\left[ -\frac{(\nu + c^2) S^2}{2 \sigma^2_{a_k}} \right].$$

The KL distances between the conditional posterior and the prior for these five situations, assuming S = 1 and $\nu = 4$, are (1) KL(c = 0) = $7.33 \times 10^{-2}$, (2) KL(c = 0.01) = $7.32 \times 10^{-2}$, (3) KL(c = 0.5) = $4.67 \times 10^{-2}$, (4) KL(c = 1) = $1.54 \times 10^{-2}$, and (5) KL(c = 2) = 0.34. Even though marker effects are drastically different, the conditional posteriors are not too different (in the KL sense) from each other, meaning that the extent of shrinkage in Bayes A (or B) continues to be dominated by the prior. This is illustrated in Figure 3: even when c = 2, the conditional posterior does not differ appreciably from the prior.

Figure 3. Effect of scale parameter on the conditional posterior distribution of the variance of the marker effect. Open boxes, prior distribution; solid circles, conditional posterior distribution for c = 2 (standardized marker effect). The other three conditional distributions (solid lines) are barely distinguishable from the prior.

In short, neither Bayes A nor Bayes B, as formulated by Meuwissen et al. (2001), allows for any appreciable Bayesian learning about marker-specific variances so that, essentially, the extent of shrinkage of effects will always be dictated strongly by the prior, which negates the objective of introducing marker-specific variances into the model. The magnitudes of the estimates of marker effects can be made smaller or larger at will via changes of the degrees of freedom and scale parameters of the prior distribution.

Arguably, Bayes B is not well formulated in a Bayesian context. Meuwissen et al. (2001) interpret that assigning a priori a value $\sigma^2_{a_k} = 0$ with probability $\pi$ means that the specific SNP does not have an effect on the trait. As mentioned earlier in this article, stating that a parameter has 0 variance a priori does not necessarily mean that the parameter takes value 0: it could have any value, but known with certainty. Thus, assuming $\sigma^2_{a_k} = 0$ implies determinism about such an effect. It turns out, however, that their sampler sets $a_k = 0$ when the state $\sigma^2_{a_k} = 0$ is drawn! A more reasonable specification is to place the mixture with a 0 state at the level of the effects, but not at the level of the variances.

Impact on predictions: A counterargument to the preceding critique could be articulated as follows: Even though the prior affects inferences about marker-specific variances, this is practically irrelevant, because one can "kill" the influence of the prior on estimates of marker effects simply by increasing sample size. Superficially, it seems valid, because the fully conditional posterior distribution of $a_k$ (assuming a model with a single location parameter $\mu$) is

$$a_k \mid \mathrm{ELSE} \sim N\left[ \frac{\sum_{i=1}^{n} w_{ik} \left( y_i - \mu - \sum_{k' = 1,\, k' \ne k}^{K} w_{ik'} a_{k'} \right)}{\sum_{i=1}^{n} w_{ik}^2 + \sigma_e^2 / \sigma^2_{a_k}},\; \frac{\sigma_e^2}{\sum_{i=1}^{n} w_{ik}^2 + \sigma_e^2 / \sigma^2_{a_k}} \right], \quad k = 1, 2, \ldots, K.$$

As sample size n increases, $\sum_{i=1}^{n} w_{ik}^2 + \sigma_e^2 / \sigma^2_{a_k}$ tends to $\sum_{i=1}^{n} w_{ik}^2$, so the influence of $\sigma^2_{a_k}$ vanishes asymptotically, given some fixed values of $\nu$, S. This indicates that, in Bayes A, even though Bayesian learning about the $\sigma^2_{a_k}$ parameters is limited, the influence of the prior on the posterior distributions of marker effects and of the genetic values $\sum_{k=1}^{K} w_{ik} a_k$ dissipates in large samples. However, in marker-assisted prediction of genetic values $n \ll p$, so the prior may be influential. The sensitivity of Bayes A with respect to the prior in a finite sample was examined by simulation.

Simulation: Bayes A was fitted under different prior specifications to a simple data structure. Records for 300 individuals were generated under the additive model

$$y_i = \sum_{k=1}^{280} w_{ik} a_k + e_i, \quad i = 1, 2, \ldots, 300,$$

where $y_i$ is the phenotype for individual i, and the rest is as before. Residuals were independently sampled from a standard normal distribution.
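To make the setting concrete, the following is a minimal sketch of how data of this kind might be simulated and how Bayes A could be fitted by Gibbs sampling, using the fully conditional distributions stated in the text. It is not the authors' program. The genotype coding (-1, 0, 1, i.e., centered under allele frequency 0.5 and linkage equilibrium), the marker positions and effect size, the flat prior on $\mu$, the scaled inverse chi-square prior with 1 d.f. and scale 1 on the residual variance, and the small default dimensions and chain length (chosen only to keep pure-Python run time short) are all assumptions made for illustration.

```python
import math
import random

def simulate(n=150, K=40, qtl=(3, 11, 19, 27, 35), effect=0.8, seed=1):
    """Simulate centered genotypes under HW-LE (p = 0.5) and phenotypes y = W a + e."""
    rng = random.Random(seed)
    W = [[sum(rng.random() < 0.5 for _ in range(2)) - 1.0 for _ in range(K)]
         for _ in range(n)]
    a = [0.0] * K
    for j, k in enumerate(qtl):            # alternate signs of the non-null effects
        a[k] = effect if j % 2 == 0 else -effect
    y = [g + rng.gauss(0.0, 1.0) for g in genomic_values(W, a)]
    return W, y, a

def genomic_values(W, a):
    return [sum(w * ak for w, ak in zip(row, a)) for row in W]

def corr(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)

def bayes_a_gibbs(W, y, nu, S2, n_iter=200, burn_in=100, seed=2):
    """Return posterior means of marker effects under Bayes A (illustrative sketch)."""
    rng = random.Random(seed)
    n, K = len(W), len(W[0])
    cols = [[W[i][k] for i in range(n)] for k in range(K)]
    wtw = [sum(w * w for w in col) for col in cols]
    a, s2a, s2e = [0.0] * K, [S2] * K, 1.0
    mu = sum(y) / n
    e = [yi - mu for yi in y]              # current residuals y - mu - W a
    post_mean = [0.0] * K
    for it in range(n_iter):
        # mu | ELSE ~ N(mean(y - W a), s2e / n), flat prior assumed
        new_mu = rng.gauss(mu + sum(e) / n, math.sqrt(s2e / n))
        e = [ei - (new_mu - mu) for ei in e]
        mu = new_mu
        for k in range(K):
            col = cols[k]
            # numerator uses residuals with marker k added back in
            rhs = sum(w * ei for w, ei in zip(col, e)) + wtw[k] * a[k]
            c = wtw[k] + s2e / s2a[k]
            new_a = rng.gauss(rhs / c, math.sqrt(s2e / c))
            delta = new_a - a[k]
            e = [ei - w * delta for w, ei in zip(col, e)]
            a[k] = new_a
            # sigma2_ak | ELSE: (nu S2 + a_k^2) / chi-square with nu + 1 d.f.
            s2a[k] = (nu * S2 + new_a * new_a) / rng.gammavariate((nu + 1.0) / 2.0, 2.0)
        # residual variance: prior with 1 d.f. and scale 1, updated with n residuals
        sse = sum(ei * ei for ei in e)
        s2e = (1.0 + sse) / rng.gammavariate((1.0 + n) / 2.0, 2.0)
        if it >= burn_in:
            for k in range(K):
                post_mean[k] += a[k] / (n_iter - burn_in)
    return post_mean
```

A typical use would be `W, y, a = simulate()` followed by `a_hat = bayes_a_gibbs(W, y, nu=4.0, S2=0.05)` and `corr(genomic_values(W, a), genomic_values(W, a_hat))`, then repeating the fit with other values of `nu` and `S2` to see how the estimates move with the hyperparameters. The study described here used 300 individuals, 280 markers, and much longer chains.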

TABLE 1
Correlation between marker genotypes (average over markers and over 100 Monte Carlo replicates) by scenarios of adjacency between pairs of markers and of linkage disequilibrium (X0, low linkage disequilibrium; X1, high linkage disequilibrium)

Disequilibrium scenario   Adjacency 1   Adjacency 2   Adjacency 3   Adjacency 4
X0                        0.007         0.002         0.002         0.013
X1                        0.722         0.567         0.450         0.356

TABLE 2
Nine different specifications of hyperparameters of the prior distribution of marker variances in Bayes A ($\nu$, prior degrees of freedom; $S^2$, prior scale parameter)

              $S^2 = 10^{-5}$   $S^2 = 10^{-3}$   $S^2 = 5 \times 10^{-2}$
$\nu = 0$     1                 2                 3
$\nu = 1/2$   4                 5                 6
$\nu = 1$     7                 8                 9

Two LD scenarios regarding the distribution of the 280 markers were generated. In scenario X0, markers were in weak LD, with almost no correlation between genotypes of adjacent markers (Table 1). In scenario X1, LD was relatively high: the correlation between markers dropped from 0.772 for adjacent markers to 0.354 for markers separated by three positions (Table 1). Effects of allele substitutions were kept constant across simulations and were set to zero for all markers except for 10, as shown in Figure 4. The locations of markers with nonnull effects were chosen such that different situations were represented. For example (Figure 4), in chromosome 3 there were two adjacent markers with opposite effects, while chromosome 4 had two adjacent markers with equal effects.

Figure 4. Positions (chromosome and marker number) and effects of markers (there were 280 markers, with 270 having no effect).

A Monte Carlo study with 100 replicates was run for each of the two LD scenarios. For each replicate and LD scenario, nine variations of Bayes A were fitted, each defined by a combination of prior values of hyperparameters. In all cases, a scaled inverted chi-square distribution with 1 d.f. and scale parameter equal to 1 was assigned to the residual variance. The nine priors considered are in Table 2. Hyperparameter values were chosen such that the prior had, at most, the same contribution to the degrees of freedom of the fully conditional distribution as the information coming from the remaining components of the model (i.e., 1). Values of $S^2$ were chosen following similar considerations. Note that if the samples of marker effects are equal to their true value, $a_k^2 \le 0.2^2$ (see Figure 4). Priors 1–3 are improper, and the other six priors are proper but do not possess finite means and variances. Therefore, scenarios with $S^2 = 10^{-5}$ correspond to cases of relatively small influence of the prior on the scale parameter of the fully conditional distribution, while $S^2 = 5 \times 10^{-2}$ represents a case where the fully conditional distribution has a strong dependency on the prior specification.

For each of these models and Monte Carlo replicates, 35,000 iterations of the Gibbs sampler were run, and the first 5000 iterations were discarded as burn-in. Inspection of trace plots and other diagnostics (effective sample size, MC standard error) computed using Coda (Plummer et al. 2008) indicated that this was adequate to infer quantities of interest.

Table 3 shows the average (across 100 MC replicates) of posterior means of the residual variance and of the correlation between the true and the estimated quantity of several features. This provides an assessment of goodness of fit, of how well the model estimates genomic values, and of the extent to which the model can uncover relevant marker effects. As expected, Bayes A was sensitive with respect to prior specification for all items monitored. Scenarios 4 and 7 produced overfitting (low estimate of residual variance, whose true value was 1, and high correlation between data and fitted values). They also had a low ability to recover signal (i.e., to estimate marker effects and genomic values), as indicated by the corresponding correlations. Other priors (e.g., 6) produced a model with a better ability to estimate genomic values and marker effects. These results were similar in both scenarios of LD. The results in Table 3 also indicate that it is much more difficult to uncover marker effects than to predict genomic values.

To have a measure of the ability of each model to locate genomic regions affecting the trait, an index was created as follows. For each marker having a nonnull effect and for each replicate, a dummy variable was

created indicating whether or not the marker, or any of its 4 flanking markers, ranked among the top 20 on the basis of the absolute value of the posterior mean of the marker's effect. Averaging across markers and replicates led to an index of "retrieved regions" (Table 4). Results suggest that the ability of Bayes A to uncover relevant genomic regions is also affected by the choice of hyperparameters. For example, in scenarios 1, 4, and 7 only one of five regions was retrieved by Bayes A. On the other hand, the fraction of retrieved regions was twice as large when using other priors (scenarios 2, 3, and 6). The ability to uncover genomic regions affecting a trait was usually worse with high LD, due to redundancy between markers.

TABLE 3
Average (over 100 replicates) of posterior mean estimates of residual variance ($\sigma^2$) and of the correlation between the true and the estimated value for several items (phenotypes, $\mathbf{y}$; true genomic value, $\mathbf{W}\mathbf{a}_m$; fitted genomic value, $\mathbf{W}\hat{\mathbf{a}}_m$; true marker effects, $\mathbf{a}_m$; estimated marker effects, $\hat{\mathbf{a}}_m$)

           $\sigma^2$       Corr($\mathbf{y}$, $\mathbf{W}\hat{\mathbf{a}}_m$)   Corr($\mathbf{W}\mathbf{a}_m$, $\mathbf{W}\hat{\mathbf{a}}_m$)   Corr($\mathbf{a}_m$, $\hat{\mathbf{a}}_m$)
Bayes A    Mean    SD       Mean    SD       Mean    SD       Mean    SD

Low linkage disequilibrium between markers (X0)
1          0.518   0.062    0.839   0.027    0.580   0.063    0.102   0.048
2          0.941   0.089    0.577   0.028    0.721   0.092    0.200   0.022
3          1.074   0.105    0.496   0.032    0.701   0.106    0.199   0.020
4          0.394   0.053    0.895   0.022    0.531   0.060    0.079   0.051
5          0.824   0.077    0.652   0.025    0.699   0.079    0.183   0.028
6          0.950   0.089    0.578   0.027    0.722   0.088    0.201   0.021
7          0.173   0.053    0.966   0.015    0.455   0.057    0.042   0.043
8          0.575   0.056    0.813   0.019    0.606   0.066    0.116   0.044
9          0.710   0.066    0.728   0.020    0.659   0.072    0.152   0.037

High linkage disequilibrium between markers (X1)
1          0.535   0.069    0.824   0.029    0.580   0.070    0.121   0.045
2          0.938   0.076    0.609   0.033    0.677   0.083    0.210   0.026
3          1.093   0.085    0.528   0.034    0.650   0.086    0.211   0.025
4          0.404   0.067    0.888   0.025    0.533   0.067    0.094   0.048
5          0.809   0.069    0.670   0.030    0.659   0.076    0.200   0.030
6          0.948   0.075    0.616   0.031    0.676   0.081    0.211   0.026
7          0.195   0.056    0.960   0.015    0.462   0.060    0.062   0.048
8          0.566   0.058    0.809   0.021    0.593   0.070    0.132   0.042
9          0.689   0.062    0.734   0.024    0.629   0.072    0.173   0.036

SD, among-replicates standard deviation of item.

TABLE 4
Fraction of retrieved regions by set of priors in Bayes A and scenario of linkage disequilibrium (LD)

Set of priors in Bayes A   Low LD   High LD
1                          0.24     0.21
2                          0.43     0.34
3                          0.47     0.33
4                          0.22     0.19
5                          0.36     0.31
6                          0.43     0.34
7                          0.22     0.18
8                          0.26     0.22
9                          0.29     0.26

DISCUSSION

This article examined two main issues associated with the development of statistical models for genome-assisted prediction of quantitative traits using dense panels of markers, such as single-nucleotide polymorphisms. The first one is the relationship between parameters from standard quantitative genetics theory, such as additive genetic variance, and those from marker-based models, i.e., the variance of marker effects. In a Bayesian context, the latter acts mainly as a measure of uncertainty. It was shown that the connection between the variance of marker effects and the additive genetic variance depends on what is assumed about locus effects. For instance, in the classical model of Falconer and Mackay (1996), locus effects are considered as fixed and additive genetic variance stems from random sampling of genotypes. To introduce a variance of marker effects, these must be assumed to be random samples from some distribution that, in the Bayesian setting, is precisely an uncertainty distribution.

The article also discussed assumptions that need to be made to establish a connection between the two sets of parameters and introduced a more general partition of variance, in which genotypes, effects, and allelic frequencies are random variables. Some expressions for relating additive genetic variance and that of marker effects are available under the assumption of linkage equilibrium, as discussed in the article. However, accommodating linkage disequilibrium explicitly into an inferential system suitable for marker-assisted selection represents a formidable challenge.

The second aspect addressed in this study was a critique of methods Bayes A and B as proposed by Meuwissen et al. (2001). These methods require specifying hyperparameters that are elicited using formulas related to those mentioned in the paragraph above; however, the authors did not state the assumptions needed precisely. It was shown here that these hyperparameters can be influential.

The influence of the prior on inferences and predictions via Bayes A can be mitigated in several ways. One way consists of forming clusters of markers such that their effects share the same variance. Thus, shrinkage would be specific to the set of markers entering into the cluster. The clusters could be formed either on the basis of biological information (e.g., according to coding or noncoding regions specific to a given chromosome) or perhaps statistically, using some form of supervised or unsupervised clustering procedure. If clusters of size p were formed, the conditional posterior distribution of the variances of the markers would have $p + \nu$ d.f., instead of $1 + \nu$ in Bayes A. A second way of mitigating the impact of hyperparameters is to assign a noninformative prior to the scale and degrees of freedom parameters of Bayes A. This has been done in quantitative genetics, as demonstrated by Strandén and Gianola (1998) and Rosa et al. (2003, 2004) and discussed in Sorensen and Gianola (2002). For example, Strandén and Gianola (1998) used models with t-distributions for the residuals (the implementation would be similar in Bayes A, with the t-distribution assigned to marker effects instead), with unknown degrees of freedom and unknown parameters. In Strandén and Gianola (1998) a scaled inverted chi-square distribution was assigned to the scale parameter of the t-distribution, and equal prior probabilities were assigned to a set of mutually exclusive and exhaustive values of the degrees of freedom. On the other hand, Rosa et al. (2003, 2004) presented a more general treatment, in which the degrees of freedom were sampled with a Metropolis–Hastings algorithm. A third modification of Bayes A would consist of combining the two preceding options, i.e., assign a common variance to a cluster of marker effects and then use noninformative priors, as in Rosa et al. (2003, 2004), for the parameters of the t-distribution. Applications of thick-tailed priors, such as the t or the double exponential distribution, to models with marker effects are presented in Yi and Xu (2008) and de los Campos et al. (2009a).

Bayes B requires a reformulation (and a new letter, to avoid confusion!), e.g., the mixture with a zero state posed at the level of effects and not at that of the variances, as discussed earlier. For example, one could assume that the marker effect is 0 with probability $\pi$ or that it follows a normal distribution with common variance otherwise. Further, the mixing probability $\pi$ could be assigned a prior distribution, e.g., a beta process, as opposed to specifying an arbitrary value for $\pi$. Mixture models in genetics are discussed, for example, by Gianola et al. (2006b) and some new, yet unpublished, normal mixtures for marker-assisted selection are being developed by R. L. Fernando (R. L. Fernando, unpublished data) (http://dysci.wisc.edu.edu/sglpe/pdf/Fernando.pdf).

A more general solution is to use a nonparametric method, as suggested by Gianola et al. (2006a), Gianola and van Kaam (2008), and Gianola and de los Campos (2008) and cast more generally by de los Campos et al. (2009a). These methods do not make hypotheses about mode of inheritance, contrary to the parametric methods discussed above, where additive action is assumed. Evidence is beginning to emerge that nonparametric methods may have better predictive ability when applied to real data (González-Recio et al. 2008, 2009; N. Long, D. Gianola, G. J. M. Rosa and K. A. Weigel, unpublished results).

In conclusion, this article discussed connections between marker-based additive models and standard models of quantitative genetics. It was argued that the relationship between the variance of marker effects and the additive genetic variance is not as simple as has been reported, becoming especially cryptic if the assumption of linkage equilibrium is violated, which is manifestly the case with dense whole-genome markers. Also, a critique of earlier models for genomic-assisted evaluation in animal breeding was advanced, from a Bayesian perspective, and some possible remedies of such models were suggested.

Hugo Naya, Institut Pasteur of is thanked for having engineered the two linkage disequilibrium scenarios used in the simulation. The Associate Editor is thanked for his careful reading of the manuscript and for help in clarifying some sections of the article. Part of this work was carried out while D. Gianola was a Visiting Professor at Georg-August-Universität, Göttingen, Germany (Alexander von Humboldt Foundation Senior Researcher Award), and a Visiting Scientist at the Station d'Amélioration Génétique des Animaux, Centre de Recherche de Toulouse, France (Chaire D'Excellence Pierre de Fermat, Agence Innovation, Midi-Pyrénées). Support by the Wisconsin Agriculture Experiment Station and by National Science Foundation (NSF) grant Division of Mathematical Sciences NSF DMS-044371 to D.G. and G. de los C. is acknowledged.

LITERATURE CITED

Barton, N. H., and H. P. de Vladar, 2009 Statistical mechanics and the evolution of polygenic quantitative traits. Genetics 181: 997–1011.

Box, G. E. P., and G. C. Tiao, 1973 Bayesian Inference in Statistical Analysis. Addison-Wesley, Reading, MA.
de los Campos, G., D. Gianola and G. J. M. Rosa, 2009a Reproducing kernel Hilbert spaces regression: a general framework for genetic evaluation. J. Anim. Sci. 87: 1883–1887.
de los Campos, G., H. Naya, D. Gianola, J. Crossa, A. Legarra et al., 2009b Predicting quantitative traits with regression models for dense molecular markers and pedigrees. Genetics 182: 375–385.
Falconer, D. S., and T. F. C. Mackay, 1996 Introduction to Quantitative Genetics, Ed. 4. Longmans Green, Harlow, Essex, UK.
Fernando, R. L., and M. Grossman, 1989 Marker-assisted selection using best linear unbiased prediction. Genet. Sel. Evol. 33: 209–229.
Gianola, D., and G. de los Campos, 2008 Inferring genetic values for quantitative traits non-parametrically. Genet. Res. 90: 525–540.
Gianola, D., and R. L. Fernando, 1986 Bayesian methods in animal breeding. J. Anim. Sci. 63: 217–244.
Gianola, D., M. Pérez-Enciso and M. A. Toro, 2003 On marker-assisted prediction of genetic value: beyond the ridge. Genetics 163: 347–365.
Gianola, D., R. L. Fernando and A. Stella, 2006a Genomic assisted prediction of genetic value with semi-parametric procedures. Genetics 173: 1761–1776.
Gianola, D., B. Heringstad and J. Ødegård, 2006b On the quantitative genetics of mixture characters. Genetics 173: 2247–2255.
Gianola, D., and J. B. C. H. M. van Kaam, 2008 Reproducing kernel Hilbert spaces methods for genomic assisted prediction of quantitative traits. Genetics 178: 2289–2303.
González-Recio, O., D. Gianola, N. Long, K. A. Weigel, G. J. M. Rosa et al., 2008 Nonparametric methods for incorporating genomic information into genetic evaluations: an application to mortality in broilers. Genetics 178: 2305–2313.
González-Recio, O., D. Gianola, G. J. M. Rosa, K. A. Weigel and A. Kranis, 2009 Genome-assisted prediction of a quantitative trait in parents and progeny: application to food conversion rate in chickens. Genet. Sel. Evol. 41: 3–13.
Habier, D., R. L. Fernando and J. C. M. Dekkers, 2007 The impact of genetic relationship information on genome-assisted breeding values. Genetics 177: 2389–2397.
Hayes, B. J., P. J. Bowman, A. J. Chamberlain and M. E. Goddard, 2009 Invited review: genomic selection in dairy cattle: progress and challenges. J. Dairy Sci. 92: 433–443.
Heffner, E. L., M. E. Sorrell and J. L. Jannink, 2009 Genomic selection for crop improvement. Crop Sci. 49: 1–12.
Kullback, S., 1968 Information Theory and Statistics, Ed. 2. Dover, New York.
Lande, R., and R. Thompson, 1990 Efficiency of marker-assisted selection in the improvement of quantitative traits. Genetics 124: 743–756.
Long, N., D. Gianola, G. J. M. Rosa, K. A. Weigel and S. Avendaño, 2007 Machine learning classification procedure for selecting SNPs in genomic selection: application to early mortality in broilers. J. Anim. Breed. Genet. 124: 377–389.
Maher, B., 2008 The case of the missing heritability. Nature 456: 18–21.
Meuwissen, T. H., B. J. Hayes and M. E. Goddard, 2001 Prediction of total genetic value using genome-wide dense marker maps. Genetics 157: 1819–1829.
Plummer, M., N. Best, K. Cowles and K. Vines, 2008 Coda: output analysis and diagnostics for MCMC. http://cran.r-project.org/web/packages/coda/index.html.
Rosa, G. J. M., C. R. Padovani and D. Gianola, 2003 Robust linear mixed models with normal/independent distributions and Bayesian MCMC implementation. Biom. J. 45: 573–590.
Rosa, G. J. M., D. Gianola and C. R. Padovani, 2004 Bayesian longitudinal data analysis with mixed models and thick-tailed distributions using MCMC. J. Appl. Stat. 31: 855–873.
Sorensen, D., and D. Gianola, 2002 Likelihood, Bayesian, and MCMC Methods in Quantitative Genetics. Springer, New York.
Strandén, I., and D. Gianola, 1998 Attenuating effects of preferential treatment with Student-t mixed linear models: a simulation study. Genet. Sel. Evol. 30: 565–583.
Turelli, M., 1985 Effects of pleiotropy on predictions concerning mutation selection balance for polygenic traits. Genetics 111: 165–195.
van Raden, P. M., 2008 Efficient methods to compute genomic predictions. J. Dairy Sci. 91: 4414–4423.
van Raden, P. M., C. P. van Tassell, G. R. Wiggans, T. S. Sonstegard, R. D. Schnabel et al., 2009 Reliability of genomic predictions for North American Holstein bulls. J. Dairy Sci. 92: 16–24.
Wang, C. S., J. J. Rutledge and D. Gianola, 1993 Marginal inferences about variance components in a mixed linear model using Gibbs sampling. Genet. Sel. Evol. 25: 41–62.
Wang, C. S., J. J. Rutledge and D. Gianola, 1994 Bayesian analysis of mixed linear models via Gibbs sampling with an application to litter size in Iberian pigs. Genet. Sel. Evol. 26: 91–115.
Whittaker, J. C., R. Thompson and M. C. Denham, 2000 Marker-assisted selection using ridge regression. Genet. Res. 75: 249–252.
Wright, S., 1937 The distribution of gene frequencies in populations. Proc. Natl. Acad. Sci. USA 23: 307–320.
Yi, N., and S. Xu, 2008 Bayesian Lasso for quantitative trait loci mapping. Genetics 179: 1045–1055.
Xu, S., 2003 Estimating polygenic effects using markers of the entire genome. Genetics 163: 789–801.
Zhang, X. S., and W. G. Hill, 2005 Predictions of patterns of response to artificial selection in lines derived from natural populations. Genetics 169: 411–425.

Communicating editor: E. Arjas

APPENDIX

Linkage disequilibrium: The expression $\sum_{k=1}^{K} 2p_kq_k = \sum_{k=1}^{K} \mathrm{Var}(w_k)$ results from jointly sampling genotypes (but not their effects) at K loci in linkage equilibrium (LE). This is a needed assumption for arriving at (19). On the other hand, if there is linkage disequilibrium (LD), the additive genetic variance (under HW equilibrium at each locus) is

$$
V_A(D) = \mathrm{Var}\left(\sum_{k=1}^{K} w_k a_k\right) = \sum_{k=1}^{K} 2p_kq_ka_k^2 + 2\sum_{k=1}^{K}\sum_{l>k} 2D_{kl}a_ka_l, \tag{A1}
$$

where $D_{kl} = \Pr(AB)_{kl} - p_{A,k}\,p_{B,l}$ is the usual LD statistic involving the two loci in question. The first term in (A1) is the additive genetic variance under LE; the second term is a contribution to variance from LD, and it may be negative or positive. It can be shown that the average correlation between genotypes at a pair of loci is (approximately) ,K1.
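As a numerical illustration of (A1), the following is a minimal sketch, assuming NumPy is available. The allele frequencies, LD statistics, and marker effects are hypothetical values chosen only for illustration, and the function names are not from the article. It evaluates the LE and LD terms of (A1) separately and compares their sum with the quadratic form in the genotype covariance matrix.

```python
import numpy as np

def genotype_covariance(p, D):
    """Covariance matrix of genotype codes w under HW equilibrium at each locus.

    p : allele frequencies, length K
    D : symmetric K x K array of pairwise LD statistics D_kl (diagonal is ignored)
    """
    p = np.asarray(p, dtype=float)
    M = 2.0 * np.asarray(D, dtype=float)      # off-diagonals: 2 * D_kl (new array)
    np.fill_diagonal(M, 2.0 * p * (1.0 - p))  # diagonals: 2 * p_k * q_k
    return M

def additive_variance_terms(p, D, a):
    """Return the LE term and the LD term of (A1) separately."""
    p = np.asarray(p, dtype=float)
    D = np.asarray(D, dtype=float)
    a = np.asarray(a, dtype=float)
    K = len(p)
    le_term = np.sum(2.0 * p * (1.0 - p) * a**2)
    ld_term = 2.0 * sum(2.0 * D[k, l] * a[k] * a[l]
                        for k in range(K) for l in range(k + 1, K))
    return le_term, ld_term

# Hypothetical inputs (not taken from the article).
p = [0.3, 0.5, 0.2]
D = np.array([[0.00, 0.05, -0.02],
              [0.05, 0.00, 0.01],
              [-0.02, 0.01, 0.00]])
a = np.array([1.0, -0.5, 2.0])

le_term, ld_term = additive_variance_terms(p, D, a)
quad_form = a @ genotype_covariance(p, D) @ a  # Var(sum_k w_k a_k) given a

print(le_term, ld_term, quad_form)
```

With these hypothetical inputs the LD term is negative, so the additive variance under LD is smaller than its LE counterpart; other choices of $D_{kl}$ and effect signs would give a positive contribution.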

The average variance under LD (taken over the effects $a$) depends on the distribution of the $a$'s. If these are independently and identically distributed with mean $\theta$ and variance $\sigma_a^2$, one has

$$
V'_A(D) = E\left[V_A(D)\right] = \left(\sigma_a^2 + \theta^2\right)\sum_{k=1}^{K} 2p_kq_k + 2\theta^2\sum_{k=1}^{K}\sum_{l>k} 2D_{kl}. \tag{A2}
$$
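Expression (A2) can be checked against the second moments of the effects. The sketch below, assuming NumPy and hypothetical values for the allele frequencies, LD statistics, $\theta$, and $\sigma_a^2$, computes the right-hand side of (A2) directly and compares it with the trace of the genotype covariance matrix times $E(\mathbf{a}\mathbf{a}')$, where $E(\mathbf{a}\mathbf{a}') = \sigma_a^2\mathbf{I} + \theta^2\mathbf{1}\mathbf{1}'$ for independently and identically distributed effects.

```python
import numpy as np

def average_variance_a2(p, D, theta, sigma2_a):
    """Right-hand side of (A2) for i.i.d. effects with mean theta and variance sigma2_a."""
    p = np.asarray(p, dtype=float)
    D = np.asarray(D, dtype=float)
    het_sum = np.sum(2.0 * p * (1.0 - p))                   # sum_k 2 p_k q_k
    ld_sum = np.sum(2.0 * D[np.triu_indices(len(p), k=1)])  # sum_k sum_{l>k} 2 D_kl
    return (sigma2_a + theta**2) * het_sum + 2.0 * theta**2 * ld_sum

def average_variance_moments(p, D, theta, sigma2_a):
    """Same quantity via tr(M E[aa']), with E[aa'] = sigma2_a * I + theta^2 * 11'."""
    p = np.asarray(p, dtype=float)
    K = len(p)
    M = 2.0 * np.asarray(D, dtype=float)
    np.fill_diagonal(M, 2.0 * p * (1.0 - p))
    second_moment = sigma2_a * np.eye(K) + theta**2 * np.ones((K, K))
    return np.trace(M @ second_moment)

# Hypothetical inputs (not taken from the article).
p = [0.3, 0.5, 0.2]
D = np.array([[0.00, 0.05, -0.02],
              [0.05, 0.00, 0.01],
              [-0.02, 0.01, 0.00]])
theta, sigma2_a = 0.4, 0.25

v_a2 = average_variance_a2(p, D, theta, sigma2_a)
v_moments = average_variance_moments(p, D, theta, sigma2_a)
v_zero_mean = average_variance_a2(p, D, 0.0, sigma2_a)  # LD terms drop out

print(v_a2, v_moments, v_zero_mean)
```

Setting `theta` to zero removes every $D_{kl}$ from the result, which is the case in which LD leaves the relationship between additive variance and the variance of marker effects unchanged.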

If $\theta = 0$, then $V'_A(D) = V'_A$, in which case linkage disequilibrium would not affect relationship (19). There is no mechanistic basis for expecting that all loci have the same effects or that these effects are mutually independent. There may be some genomic regions without any effect at all, or some regions may induce similarity (or dissimilarity) of effects; for example, if two genes are responsible for producing a fixed amount of transcript, their effects would be negatively correlated, irrespective of whether or not genotypes are in linkage equilibrium. A more general assumption may be warranted, i.e., that effects follow some multivariate distribution $\mathbf{a} \sim (\boldsymbol{\theta}, \mathbf{V}\sigma_a^2)$, where $\sigma_a^2$ is just a dispersion parameter. Here, with $\mathbf{w}$ being the vector of genotypes for the K loci, one can write the average variance under LD,

$$
V'_A(D) = E\left[\mathrm{Var}\left(\sum_{k=1}^{K} w_k a_k\right)\right] = E\left[\mathbf{a}'\mathbf{M}\mathbf{a}\right],
$$

where $\mathbf{M}$ is the covariance matrix of $\mathbf{w}$ (diagonal under LE),

$$
\mathbf{M} = 2\begin{bmatrix}
p_1q_1 & D_{12} & \cdots & D_{1K} \\
 & p_2q_2 & \cdots & D_{2K} \\
 & & \ddots & \vdots \\
\text{Symmetric} & & & p_Kq_K
\end{bmatrix}.
$$

Further,

$$
V'_A(D) = \boldsymbol{\theta}'\mathbf{M}\boldsymbol{\theta} + \sigma_a^2\,\mathrm{tr}(\mathbf{M}\mathbf{V}),
$$

where $\mathrm{tr}(\cdot)$ is the trace of the matrix in question. The counterpart of (19) is

$$
\sigma_a^2 = \frac{V'_A(D) - \boldsymbol{\theta}'\mathbf{M}\boldsymbol{\theta}}{\mathrm{tr}(\mathbf{M}\mathbf{V})}, \tag{A3}
$$

which is a complex relationship even if $\boldsymbol{\theta} = \mathbf{0}$. It follows that (19), appearing often in the literature, holds only under strong simplifying assumptions. In short, the connection between additive genetic variance and the variance of marker effects depends on the unknown means of the distributions of marker effects, their possible covariances (induced by unknown molecular and chromosomal processes), their gene frequencies, and all pairwise linkage disequilibrium parameters, which are a function of the $D_{kl}$'s. It is not obvious what the effects of using (19) as an approximation are, but the assumptions surrounding it are undoubtedly strong.

Covariance between relatives and linkage disequilibrium: Linkage disequilibrium complicates matters, as noted earlier. The covariance between marked genetic values of individuals i and j, instead of being $r_{ij}\,\mathbf{a}_m'\mathbf{D}\mathbf{a}_m = r_{ij}\sum_{k=1}^{K} 2p_kq_ka_k^2$, takes the form

$$
\begin{aligned}
\mathrm{Cov}\left(\mathbf{w}_i'\mathbf{a}_m, \mathbf{w}_j'\mathbf{a}_m \mid \mathbf{a}_m\right)
&= \sum_{k=1}^{K}\sum_{l=1}^{K} \mathrm{Cov}\left(w_{i,k}, w_{j,l}\right)a_ka_l \\
&= \mathbf{a}_m'\,\mathrm{Cov}\begin{bmatrix}
w_{i,1}, w_{j,1} & w_{i,1}, w_{j,2} & \cdots & w_{i,1}, w_{j,K} \\
 & w_{i,2}, w_{j,2} & \cdots & w_{i,2}, w_{j,K} \\
 & & \ddots & \vdots \\
\text{symmetric} & & & w_{i,K}, w_{j,K}
\end{bmatrix}\mathbf{a}_m \\
&= \mathbf{a}_m'\mathbf{M}^{*}\mathbf{a}_m,
\end{aligned}
$$

and $\mathbf{M}^{*}$ is no longer a diagonal matrix, because LD creates covariances between genotypes at different marker loci. The diagonal elements of $\mathbf{M}^{*}$ have the form (assuming HW frequencies within each locus)

$$
m^{*}_{ij,k} = \mathrm{Cov}\left(w_{i,k}, w_{j,k}\right) = r_{ij}\,2p_kq_k, \qquad k = 1, 2, \ldots, K,
$$

and the off-diagonals are

$$
m^{*}_{ij,k,l} = r_{ij}\,2D_{kl}, \qquad k \neq l.
$$

This implies that disequilibrium statistics D must be brought into the picture when estimating a pedigree relationship matrix using markers in LD.
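The structure of M* described above can be sketched numerically. The example below is illustrative only, assuming NumPy; the relationship coefficient, allele frequencies, LD statistics, and marker effects are hypothetical, and the function names are not from the article. It builds M* from its diagonal and off-diagonal elements and compares the resulting covariance between relatives with the value obtained when LD is ignored.

```python
import numpy as np

def m_star(p, D, r_ij):
    """Matrix M* for a pair of relatives: r_ij*2*p_k*q_k on the diagonal, r_ij*2*D_kl elsewhere."""
    p = np.asarray(p, dtype=float)
    Ms = r_ij * 2.0 * np.asarray(D, dtype=float)     # off-diagonal elements (new array)
    np.fill_diagonal(Ms, r_ij * 2.0 * p * (1.0 - p)) # diagonal elements
    return Ms

def covariance_between_relatives(p, D, a_m, r_ij):
    """Covariance of marked genetic values of i and j given effects a_m, under LD."""
    a_m = np.asarray(a_m, dtype=float)
    return a_m @ m_star(p, D, r_ij) @ a_m

# Hypothetical inputs (not taken from the article).
p = np.array([0.3, 0.5, 0.2])
D = np.array([[0.00, 0.05, -0.02],
              [0.05, 0.00, 0.01],
              [-0.02, 0.01, 0.00]])
a_m = np.array([1.0, -0.5, 2.0])
r_ij = 0.5  # assumed additive relationship between individuals i and j

cov_ld = covariance_between_relatives(p, D, a_m, r_ij)
cov_le = r_ij * np.sum(2.0 * p * (1.0 - p) * a_m**2)  # value when all D_kl = 0

print(cov_ld, cov_le)
```

The difference between the two values comes entirely from the off-diagonal elements of M*, so a marker-based relationship estimate that ignores the $D_{kl}$'s would absorb this difference.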