Genetics: Published Articles Ahead of Print, published on July 20, 2009 as 10.1534/genetics.109.103952

1 Additive genetic variability and the Bayesian alphabet

; ; ; \ 2 Daniel Gianola  y z , Gustavo de los Campos, William G. Hill , # 3 Eduardo Manfrediz and Rohan Fernando

e-mail: [email protected]

Department of Animal Sciences, University of Wisconsin-Madison, Madison, Wisconsin 53706

yDepartment of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, N-1432 Ås, Norway

zInstitut National de la Recherche Agronomique, UR631 Station d’Amélioration Génétique des Animaux, BP 52627, 32326 Castanet-Tolosan, France \Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JT, UK #Department of Animal Science, , Ames, Iowa 50011

4 July 14, 2009

5 Corresponding author: Daniel Gianola, Department of Animal Sciences, 1675 Observatory

6 Dr., Madison, WI 53706. Phone: (608) 265-2054. Fax: (608) 262-5157. E-mail:

7 [email protected].

8 Running head: Quantitative variation and the Bayesian alphabet

9 Key Words: quantitative genetics, marker assisted selection, genomic selection, single nu-

10 cleotide polymorphisms, Bayesian methods, prior distributions.

1 11 ABSTRACT

12 The use of all available molecular markers in statistical models for prediction of quantitative

13 traits has led to what could be termed a genomic-assisted selection paradigm in animal and plant

14 breeding. This paper provides a critical review of some theoretical and statistical concepts in the

15 context of genomic-assisted genetic evaluation of animals and crops. First, relationships between

16 the (Bayesian) variance of marker e¤ects in some regression models and additive genetic variance

17 are examined under standard assumptions. Second, the connection between marker genotypes and

18 resemblance between relatives is explored, and linkages between a marker-based model and the

19 in…nitesimal model are reviewed. Third, issues associated with the use of Bayesian models for

20 marker-assisted selection, with a focus on the role of the priors, are examined from a theoretical

21 angle. The sensitivity of some Bayesian speci…cation that has been proposed (called "Bayes A" )

22 with respect to priors is illustrated with a simulation. Methods that can solve potential shortcomings

23 of some of these Bayesian regression procedures are discussed brie‡y.

24 1 Introduction

25 In an in‡uential paper in , MEUWISSEN et al. (2001) suggested using all avail-

26 able molecular markers as covariates in linear regression models for prediction of genetic value for

27 quantitative traits. This has led to a genome-enabled selection paradigm. For example, major dairy

28 cattle breeding countries are now genotyping elite animals and genetic evaluations based on SNPs

29 (single nucleotide polymorphisms) are becoming routine (HAYES et al. 2009; VAN RADEN et al.

30 2009). A similar trend is taking place in poultry (e.g., LONG et al. 2007; GONZÁLEZ-RECIO et

31 al. 2008, 2009), beef cattle (GARRICK 2008 personal communication) and plants (HEFFNER et

32 al. 2009).

33 The extraordinary speed with which events are taking place hampers the process of relating

34 new developments to extant theory, as well as the understanding of some of the statistical methods

35 proposed so far. These span from Bayes hierarchical models (e.g., MEUWISSEN et al. 2001)

36 and the Bayesian LASSO (e.g., de LOS CAMPOS et al. 2009b) to ad hoc procedures (e.g., VAN

37 RADEN 2008). Another issue is how parameters of models for dense markers relate to those of

38 classical models of quantitative genetics.

39 Many statistical models and approaches have been proposed for marker-assisted selection. These

40 include multiple regression on marker genotypes (LANDE and THOMPSON 1990), best linear unbi-

41 ased prediction (BLUP) including e¤ects of a single marker locus (FERNANDO and GROSSMAN

42 1989), ridge regression (WHITTAKER et al. 2000), Bayesian procedures (MEUWISSEN et al.

43 2001; GIANOLA et al. 2003; XU 2003; de los CAMPOS et al. 2009b) and semi-parametric speci…-

44 cations (GIANOLA et al. 2006a; GIANOLA and de los CAMPOS 2008). In particular, the methods

45 proposed by MEUWISSEN et al. (2001) have captured enormous attention in animal breeding, be-

46 cause of several reasons. First, the procedures cope well with data structures in which the number

47 of markers amply exceeds the number of markers, the so called "small n; large p" situation. Second,

2 48 the methods in MEUWISSEN et al. (2001) constitute a logical progression from the standard BLUP

49 widely used in animal breeding to richer speci…cations, where marker-speci…c variances are allowed

50 to vary at random over many loci. Third, Bayesian methods have a natural way of taking into

51 account uncertainty about all unknowns in a model (e.g., GIANOLA and FERNANDO 1986) and,

52 when coupled with the power and ‡exibility of Markov chain Monte Carlo, can be applied to almost

53 any parametric statistical model. In MEUWISSEN et al. (2001) normality assumptions are used

54 together with conjugate prior distributions for variance parameters; this leads to computational

55 representations that are well known and have been widely used in animal breeding (e.g., WANG

56 et al. 1993, 1994). An important question is that of the impact of the prior assumptions made in

57 these Bayesian models on estimates of marker e¤ects and, more importantly, on prediction of future

58 outcomes, which is central in animal and .

59 A second aspect mentioned above, is how parameters from these marker based models relate

60 to those of standard quantitative genetics theory, with either a …nite or an in…nite number (the

61 in…nitesimal model) of loci assumed. The relationships depend on the hypotheses made at the

62 genetic level and some of the formulae presented, e.g., in MEUWISSEN et al. (2001) use linkage

63 equilibrium; however, the reality is that marker-assisted selection relies on the existence of linkage

64 disequilibrium. While a general treatment of linkage disequilibrium is very di¢ cult in the context of

65 models for predicting complex phenotypic traits, it is important to be aware of its potential impact.

66 Another question is the de…nition of additive genetic variance, according to whether marker e¤ects

67 are assumed …xed or random, with the latter corresponding to the situation in which such e¤ects

68 are viewed as sampled randomly from some hypothetical distribution. MEUWISSEN et al. (2001)

69 employ formulae for eliciting the variance of marker e¤ects, given some knowledge of the additive

70 genetic variance in the population. Their developments begin with the assumption that markers

71 e¤ects are …xed, but these eventually become random without a clear elaboration of why this is so.

72 Since their formulae are used in practice for eliciting priors in the Bayesian treatment (e.g., HAYES

73 et al. 2009), it is essential to understand their ontogeny, especially considering that priors may have

74 an impact on prediction of outcomes.

75 The objective of this paper is to provide a critical review of some of these aspects in the context

76 of genomic-assisted evaluation of quantitative traits. First, relationships between the (Bayesian)

77 variance of marker e¤ects in some regression models and additive genetic variance are examined.

78 Second, connections between marker genotypes and resemblance between relatives are explored.

79 Third, the liaisons between a marker-based model and the in…nitesimal model are reviewed. Fourth,

80 and in the context of the quantitative genetics theory discussed in the preceding sections, some

81 statistical issues associated with the use of Bayesian models for (massive) marker-assisted selection

82 are examined, with main focus on the role of the priors. The sensitivity of some of these methods

83 with respect to priors is illustrated with a simulation.

3 84 2 Relationship between the variance of marker e¤ects and additive

85 genetic variance

86 2.1 Additive genetic variance

87 A simple speci…cation serves to set the stage. The phenotypic value (y) for a quantitative trait is

88 described by the linear model

89 y = wa + E; (1)

90 where a is the additive e¤ect of a bi-allelic locus on the trait, w is a random indicator variable

91 (covariate) relating a to the phenotype, and E is an independently distributed random deviate, E  92 (0;VE) ; where VE is the environmental or residual variance, provided gene action is additive. Under

93 Hardy-Weinberg (HW) equilibrium, the frequencies of the three possible genotypes are Pr (MM) = 2 2 94 p ; Pr (Mm) = 2pq and Pr (mm) = q ; where p = Pr (M) and q = 1 p = Pr (m) : Code arbitrarily 95 the states of w such that 1 with probability p2 96 w = 8 0 with probability 2pq (2) <> 1 with probability q2 97 The genetic values of MM; Mm and mm:> individuals are a; 0 and a; respectively. Then

98 E (w) = p q; (3) 99 2 100 E w = 1 2pq (4)

101 and 

102 V ar (w) = 2pq: (5)

103 This is the variance among genotypes (not among their genetic values) at the locus under HW

104 equilibrium.

105 A standard treatment (e.g., FALCONER AND MACKAY 1996) regards the additive e¤ect a as

106 a …xed parameter. The conditional distribution of phenotypes, given a (but unconditionally with

107 respect to w; since genotypes vary at random according to HW frequencies) has mean

108 E (y a) = E (w) a = (p q) a; (6) j

109 and variance 2 2 110 V ar (y a) = a V ar (w) + VE = 2pqa + VE = VA + VE (7) j 2 111 where VA = 2pqa is the additive genetic variance at the locus. In this standard treatment, the

112 additive genetic variance depends on the additive e¤ect a but not on its variance; unless a is assigned

113 a probability distribution it does not possess variance.

4 114 2.2 Uncertainty about the additive e¤ect

2 115 Assume now that a ;  ; where  is the mean of the distribution. In the Appendix of MEUWIS-  a 116 SEN et al. (2001), a suddenly mutates from …xed to random without explanation. How does the 2 117 variance of a arise? Parameter a can be assigned at least two interpretations. The …rst one is as

118 the variance between a e¤ects in a conceptual sampling scheme in which such e¤ects are drawn at 2 119 random from a population of loci. In the second (Bayesian), a represents uncertainty about the

120 true but unknown value of the additive e¤ect of the speci…c locus, but without invoking a scheme 2 121 of random sampling over a population of loci. For example, a = 0 means in a Bayesian sense that

122 a =  with complete certainty, but not necessarily that the locus does not have an e¤ect, since 

123 may (may not) be distinct from 0; this cannot be overemphasized. With a single locus the Bayesian

124 interpretation is more intuitive, since it is hard to envisage a reference population of loci in this

125 case. 2 2 126 Irrespective of the interpretation assigned to  , the assumption a ;  induces another a  a 127 conditional distribution (given w) with mean and variance 

128 E (y w) = E (wa w) = w; (8) j j

129 and 2 2 130 V ar (y w) = w  + VE; (9) j a

131 respectively. However, both w (the genotypes) and a (their e¤ects) are now random variables

132 possessing a joint distribution. Here, it is assumed that w and a are independent, but this may not

133 be so, as in a mutation-stabilizing selection model (e.g., TURELLI 1985) or in situations discussed

134 by ZHANG and HILL (2005) where the distribution of gene frequencies after selection depends on

135 a. Deconditioning over both a and w (that is, averaging over all genotypes at the locus and all

136 values that a can take), the expected value and variance of the marginal distribution of y are

137 E (y) = Ew [E (y w)] = E (w)  = (p q) ; (10) j

138 and

139 V ar (y) = Ew [V ar (y w)] + V arw [E (y w)] j j 2 2 140 = Ew w a + VE + V arw [w] 2 2 141 = (1  2pq)  + 2pq + VE: (11) a

142 This distribution is not normal: the phenotype results from multiplying a discrete random variable

143 w by the normal variate a; and then adding a deviate E; which may or may not be normal, depending 2 2 144 on what is assumed about the residual distribution. The genetic variance is now (1 2pq) a+2pq . 2 145 The term (1 2pq) a stems from randomness (uncertainty) about a; and it dissipates from (11) 2 2 2 146 only if a = 0. Note that there is additive genetic variance even if a = 0; and it is equal to 2pq .

5 147 Further, if the standard assumption  = 0 is adopted, the variance of the marginal distribution of

148 phenotypes becomes 2 149 V ar (y) = (1 2pq)  + VE: (12) a 2 150 Here, the standard term for additive genetic variance disappears and yet, a may not be zero, since

151 one poses that a locus has no e¤ect (on average) but there is some uncertainty or variation among 2 152 loci e¤ects, as represented by a: 2 153 In a nutshell: the additive genetic variance at the locus, VA = 2pqa ; does not appear in (11), 2 2 154 because a has been integrated out; actually, it is replaced by 2pq in (11). The term (1 2pq)  a 155 does not arise in the standard (…xed) model, where additive genetic variance in stricto sensu is VA . 2 156 When a is known with certainty, then a = 0 and yet the locus generates additive genetic variance, 2 2 157 as measured by VA = 2pq : If  = 0 and a = 0; then the variance is purely environmental. In 2 158 short, the connection between the uncertainty variance a and the additive variance VA (which

159 involves the e¤ect of the locus) is elusive when both genotypes and e¤ects are random variables.

160 2.3 Several loci

161 Consider now K loci with additive e¤ect ak at locus k, without dominance or epistasis. The

162 phenotype is expressible as K 163 y = wkak + e; (13) k=1 X K 164 with E (y a1; a2; :::; ak) = (pk qk) ak under HW equilibrium. If all markers were quantitative j k=1 X 165 trait loci (QTL), this would be a "…nite number of QTL" model; it will be assumed that these loci

166 are markers hereinafter unless stated otherwise. Under this speci…cation

K 167 V ar (y a1; a2; :::; ak) = V ar wkak a1; a2; :::; ak + VE: (14) j j k=1 ! X

168 In order to deduce the additive variance, some assumption must be made about the joint distribution

169 of w1; w2:::; wK , the genotypes at the K loci.

170 Linkage equilibrium (LE) and HW frequencies will be assumed to make the problem tractable.

171 Some expressions are available to accommodate linkage disequilibrium, but parameters are not

172 generally available to evaluate them (see the Appendix). When there is selection, genetic drift or

173 introgression and many loci are considered jointly, some of which will be even physically linked, the

174 LE assumption is unrealistic. Hence, as in the case of many other authors, e.g., BARTON et al.

175 (2009) in a study of evolution of traits using statistical mechanics, LE-HW assumptions are used

176 here.

6 177 If there is LE, the distributions of genotypes at the K loci are mutually independent, so that

K 2 178 V ar (y a1; a2; :::; ak) = V ar (wk) a + VE j k k=1 X K 2 179 = 2pkqkak + VE: (15) k=1 X K 2 180 The multi-locus additive genetic variance under LE-HW is then VA = 2pkqkak: k=1 X

181 Suppose now that all e¤ects ak (k = 1; 2; :::; K) are drawn from the same random process with 2 182 some distribution function P (a), mean  and variance a. Using the same reasoning as before, the

183 variance of the marginal distribution of the phenotypes is

K K 2 184 V ar (y) = V ara (pk qk) ak + Ea 2pkqka + VE : (16) k k=1 ! k=1 ! X X

185 Then K K 2 2 186 V ar (y) =  (1 2pkqk) +  2pkqk + VE; (17) a k=1 k=1 X X 187 which generalizes (11) to K loci. The …rst term is variance due to uncertainty, the second term is

188 the standard additive genetic variance, and the third one is residual variation. K 2 What is the relationship between the multi-locus additive genetic variance VA = 2pkqkak i=1 2 X and a? Let VA0 be the mean variance, obtained by averaging VA over the distribution of the a0s: This operation yields K K 2 2 2 VA0 = E 2pkqkak = a +  2pkqk: i=1 ! i=1 X  X 189 Hence, if  = 0 (additive e¤ects expressed as a deviation from their mean) then

K 2 190 VA0 = a 2pkqk: (18) i=1 X 2 191 The relationship between the uncertainty variance a and the marked average additive genetic

192 variance VA0 would then be 2 VA0 193  = ; (19) a K 2pkqk i=1 X 194 in agreement with HABIER et al. (2007), but di¤erent from MEUWISSEN et al. (2001). Unless

195 the markers are QTL, VA0 gives only the part of the additive genetic variance captured by markers,

7 196 and this may be a tiny fraction only (MAHER 2008). This makes the connection between additive

197 genetic variance for a trait and marked variance even more elusive.

198 Corresponding formulae under linkage disequilibrium are in the Appendix.

199 2.4 Heterogeneity of variance

Suppose now that locus e¤ects are independently (but not identically) distributed as a N  ; 2 : k k ak K   The mean of the marginal distribution of phenotypes is E (y) = (pk qk) k; and the variance k=1 becomes X K K 2 2 V ar (y) = (1 2pkqk)  + 2pkqk + VE: ak k k=1 k=1 X X 2 200 If all  were equal to 0 (complete Bayesian certainty about all marker e¤ects a ), there would ak k 201 still be genetic variance, as measured by the second term in V ar (y) : Apart from the di¢ culty of 2 202 inferring a given  with any reasonable precision, there is the question of possible “commonality” ak 203 between loci e¤ects. For instance, some loci may have correlated e¤ects, and alternative forms for

204 the prior distribution of the a0s have been suggested by GIANOLA et al. (2003). Also, some of

205 these e¤ects are expected to be identically equal to 0; especially if ak represents a marker e¤ect,

206 as opposed to being the result of a QTL (if the marker is in LD with the QTL, its e¤ect would be

207 expected to be non-null). In such a case, a more ‡exible prior distribution might be useful, such as

208 a mixture of normals, a double exponential, or a Dirichlet process. 2 209 If a frequentist interpretation is adopted for the assumption ak N k;  , it is di¢ cult to  ak 210 envisage the conceptual population from which ak is sampled, unless the variances  are also random 211 draws from some population. Posing a locus-speci…c variance is equivalent to assuming that each

212 sire in a sample of Holsteins is drawn from a di¤erent conceptual population with sire-speci…c

213 variance. There would be as many variances as there are sires!

214 2.5 Variation in allelic frequencies

215 In addition to assuming random variation of a e¤ects over loci, a distribution of gene frequencies may

216 be posed as well. WRIGHT (1937) found that a beta distribution arose from a di¤usion equation

217 that was used to study changes in allele frequencies in …nite populations, so this is well grounded

218 in population genetics. In a Bayesian context, on the other hand, assigning a beta distribution to

219 gene frequencies would be a (mathematically convenient) representation of uncertainty.

220 Suppose that allelic frequencies pk (k = 1; 2; :::; K) vary over loci according to the same beta

221 B (a; b) process, where a; b are parameters determining the form of the distribution (it can be

222 made U shaped or skewed by appropriate choice); WRIGHT (1937) expressed these parameters as 223 functions of e¤ective population size and mutation rates. Standard results yield

8 a 224 E (p) = = p0; (20) (a + b) b 225 E (q) = = q0; (21) (a + b)

226 and p0q0 227 V ar (p) = : (22) a + b + 1

228 The expected heterozygosity is given by

(a + b) 229 2pq = 2p0q0 : (23) (a + b + 1)

230 There are now four sources of variation: 1) due to random sampling of genotypes over individuals

231 in the population; 2) due to uncertainty about additive e¤ects (equivalently, variability due to

232 sampling additive e¤ects over loci); 3) due to spread of gene frequencies over loci or about some

233 equilibrium distribution, and 4) environmental variability. The third source contributes to variance

234 under conceptual repeated sampling, since gene frequencies would ‡uctuate around the equilibrium

235 distribution, or over loci.

236 Consider now additive genetic variance when dispersion in allelic frequencies is taken into ac-

237 count, assuming LE. Let p = (p1; p2; :::; pK )0 and a = (a1; a2; :::; aK )0. Using previous results, (22)

238 and (23)

239 V ar (y a) = Ep [V ar (y p; a)] + V arp [E (y p; a)] j j j K K 2 240 = Ep 2pkqka + VE + V arp (pk qk) ak k "k=1 # "k=1 # X X K (a + b + 2) 2 241 = 2p q a + V (24) 0 0 ( +  + 1) k E a b k=1   X

242 The expected additive genetic variance is now

K (a + b + 2) 2 243 V = 2p q a (25) A 0 0 ( +  + 1) k a b k=1 X

244 In order to arrive at the marginal distribution of phenotypes, variation in a e¤ects is brought into

245 the picture, producing the variance decomposition

246 V ar (y) = Ea [V ar (y a)] + V ara [E (y a)] ; (26) j j

247 after variation in genotypes and frequencies (i.e., with respect to w and p) has been taken into

9 248 account. From (13)

K K 249 E (y a; p) = Ew [E (y a; w; p)] = Ew wkak a; p = (2pk 1) ak; j j j "k=1 # k=1 X X K 250 E (y a) = Ep [E (y a; p)] = (p0 q0) ak; j j k=1 X

251 so that in the absence of correlations between locus e¤ects

2 2 252 V ara [E (y a)] = K (p0 q0)  : (27) j a

253 Likewise, using (24)

(a + b + 2) 2 2 254 Ea [V ar (y a)] = 2p0q0 K a +  + VE: (28) j (a + b + 1)  255 Combining (27) and (28) as required by (26) leads to

2 (a + b + 2) 2 256 V ar (y) = (p0 q0) + 2p0q0 K ( +  + 1)  a  a b a  (a + b + 2) 2 257 +2pq K + VE: (29) (a + b)

The variance of the marginal distribution of the phenotypes has, thus, three components. The 2 one involving a relates to uncertainty about or random variation of marker e¤ects. The second,

( +  + 2) 2pq a b K2; (a + b)

258 is exactly the additive genetic variance when the mean of the distribution of marker e¤ects () is

259 known with complete certainty, and the third component is the environmental variance VE: If  = 0

260 and the …rst term of (29) is interpreted as additive genetic variance (VA00), it turns out that

2 VA00 261 a = : (30) 2 (a+b+2) (p0 q0) + 2pq K (a+b) n o This is similar to HABIER et al. (2007) only if p0 = q0; a + b is large enough, and K is very large, such that 1 K lim 2pkqk = 2p(1 p)f (p a; b) dp; K K j !1 k=1 X Z 262 where f (p  ;  ) is the beta density representing variation of allelic frequencies. j a b

10 263 3 Resemblance between relatives

264 3.1 Standard results

265 Quantitative trait loci (QTL) are most often unknown, so their e¤ects and their relationships to

266 those of markers are di¢ cult to model explicitly. An alternative is to focus on e¤ects of markers,

267 whose genotypes are presumably in linkage disequilibrium with one or several QTL, so can be

268 thought of as proxies. Using a slightly di¤erent notation, the marker-based linear model suggested

269 by MEUWISSEN et al. (2001) for genomic-assisted prediction of genetic values is

270 y = X + Wam+e; (31)

271 where y is an n 1 vector of phenotypic values, is a vector of macro-environmental nuisance  272 parameters, X is an incidence matrix, am = amk is a K 1 vector of additive e¤ects of markers, f g  273 and W = wik is a known incidence matrix containing codes for marker genotypes, e.g., -1, 0 and f g th 2 274 1 for mm; Mm and MM; respectively. Let the i row of W be w and assume that e N 0; I i0  e 275 is a vector of micro-environmental residual e¤ects.  276 If genotypes are sampled at random from the population, this induces a probability distribution

277 with mean vector

278 E (y ; a ) = X +E (W) am j m 279 = X + (P Q) am; (32) where p1 p2 : : : pK q1 q2 : : : qK 2 p1 p2 : : : pK 3 2 q1 q2 : : : qK 3 :::::: :::::: 6 7 6 7 E (W) = 6 7 6 7 = P Q; 6 :::::: 7 6 :::::: 7 6 7 6 7 6 7 6 7 6 :::::: 7 6 :::::: 7 6 7 6 7 6 p1 p2 : : : pK 7 6 q1 q2 : : : qK 7 6 7 6 7 4 5 4 5 280 and P and Q are matrices whose columns contain pk and qk; respectively, in every position of column K 281 k. Every element of the column vector (P Q) am is the same and equal to = (pk qk) ak, a k=1 282 constant that can be absorbed into the intercept element of : Hence, and withoutX loss of generality,

283 E (y ; a ) = E (y ). j m j The covariance matrix (regarding and marker e¤ects am as …xed parameters) is

2 V ar (y ; a) = V ar (Wam) + I : j e

11 Here

am0 V ar (w1) am am0 Cov (w1; w20 ) am ::: am0 Cov (w1; wn0 ) am 2 : am0 V ar (w2) am ::: am0 Cov (w2; wn0 ) am 3 ::::: : 6 7 V ar (Wam am) = 6 7 : j 6 ::::: : 7 6 7 6 7 6 ::::: : 7 6 7 6 symmetric : : : : a0 V ar (wn) am 7 6 m 7 4 5 284 If genotypes are drawn at random from the same population, all wi vectors have the same distri-

285 bution, that is, wi (p; D) for all i: If the population is in HW-LE, D = 2pkqk is a diagonal  f g 286 matrix. Further, if individuals are genetically related, o¤-diagonal terms am0 Cov wi; wj0 am are 287 not null, because of covariances due to genotypic similarity.   To illustrate, let i be a randomly chosen male mated to a random female, and let j be a randomly chosen descendant. Under HW-LE, genotypes at di¤erent marker loci are mutually independent, within and between individuals, so it su¢ ces to consider a single locus. It turns out that

1 1 Cov (w ; w ) = pq = (2pq) = V ar (w ) ; i j 2 2 i

1 288 yielding the standard result that the covariance between genotypes of o¤spring and parent is 2 of 289 the variance between genotypes at the locus in question. This generalizes readily to any type of

290 additive relationships in the population.

291 Letting rij be the additive relationship between i and j

am0 Dam r12am0 Dam : : : r1nam0 Dam 2 am0 Dam : : : r2nam0 Dam 3 :::: 6 7 292 V ar (Wam am) = 6 7 j 6 :: 7 6 7 6 7 6 :: 7 6 7 6 symmetric a0 Dam 7 6 m 7 293 = 4VAAr; 5 (33)

294 where K 2 295 VA = am0 Dam= 2pkqkak (34) k=1 X 296 is the additive genetic variance among multi-locus genotypes in HW-LE, and Ar= rij is the matrix f g 297 of additive relationships between individuals; D is diagonal in this case. The variance-covariance

298 matrix (33) involves the …xed marker e¤ects am; but these get absorbed into VA:

299 The implication is that a model with the conditional (given marker e¤ects and gene frequencies)

300 expectation function

301 E (y ; a ) = X + (P Q) am; (35) j m

12 302 (recall that (P Q) am = 1 ) and conditional covariance matrix K 2 2 303 V ar (y ; a ) = 2pkqka Ar + I ; (36) j m k e k=1 ! X

304 has the equivalent representation

305 y = X + a + e; (37)

K 306 with a = Wam = ai = wikak distributed as  k=1  P 307 a (0; ArVA) : (38) 

308 This is precisely the standard model of quantitative genetics applied to a …nite number of marker

309 loci (K) ; where additive genetic variation stems from the sampling of genotypes but not of their

310 e¤ects. The assumption of normality is not required.

311 Formulae under LD are given in the second section of the Appendix.

312 3.2 Estimating the pedigree based relationship matrix and expected heterozy-

313 gosity using markers

Consider model (37)

y = X + a + e = X + Wam + e:

Given the observed marker genotypes W; the distribution of W is irrelevant with regards to inference

about a. It is relevant only if one seeks to estimate parameters of the genotypic distribution, e.g., gene frequencies and linkage disequilibrium statistics. Consider, for instance, the expected value of

WW0:

E WW0 = E wi0wj ; i; j = 1; 2; :::; n: K   314 where wi0wj = wi;kwj;k; and the sum is over markers. If all individuals belong to the same k=1 315 genotypic distribution,P as argued above,

K K K 2 316 E w0wj = E (wi;kwj;k) = rij 2pkqk + (pk qk) ; (39) i k=1 k=1 k=1  X X X 1 K 2 as in HABIER et al. (2007); if pk = qk = 2 ; then E (wi0wj) = rij 2 so E K WW0 = Ar: 2 WW Under this idealized assumption K 0 provides an unbiased estimator of the pedigree based additive relationship matrix, but only if HW-LE holds. Note that under a beta distribution of gene frequencies K 1 (a + b) lim 2pkqk = 2p0q0 : K K ( +  + 1) !1 k=1 a b X

13 Likewise K 1 2 (a + b) lim (pk qk) = 1 4p0q0 : K K ( +  + 1) !1 k=1 a b X 317 Using this continuous approximation (39) becomes

318 E w0wj = rijKH0 + K (1 2H0) i 319  = K + KH0(rij 2);

(a+b) where H0 = 2pq = 2p0q0 is the expected heterozygosity. Since the relationship holds for (a+b+1) any combination i; j and there are n(n + 1)=2 distinct elements in WW0 and in Ar, one can form an estimator of mean heterozygosity as

wi0wj K H0 = ; K(rij 2) b 320 where wi0wj and rij are averages over all distinct elements in WW0 and Ar: The estimator is simple, 321 but no claim is made about its properties.

322 4 Connection with the in…nitesimal model

323 Clearly, (31) or (37) with (38) involve a …nite number of markers, and the marked additive genetic

324 variance is a function of allelic frequencies and of e¤ects of individual markers. In the in…nitesimal

325 model, on the other hand, the e¤ects of individual loci or of gene frequencies do not appear explicitly.

326 How do these two types of models connect?

327 The vector of additive genetic e¤ects (or marked breeding value) of all individuals is a = Wam = K 328 a = wikak : Next, assume that e¤ects a1; a2; :::; aK are independently and identically f ig (k=1 ) X 2 2 329 distributed as ai N 0;  ; but this does not need to be so. Again,  is not the additive genetic  a a 330 variance, which is given by (34) above. 2 331 Assuming am N 0; I implies that a = Wam must be normal (given W). However, the  a  332 elements of W (indicators of genotypes, discrete) are also random so the …nite sample distribution of 333 a is not normal; however, as K goes to in…nity, the distribution approaches normality, as discussed

334 later in this section of the paper. If am and W are independently distributed, the mean vector and

335 covariance matrix of the distribution of marked breeding values are

336 E (a) = Ea [EW (Wam a )] = Ea [(P Q) am] = 0; (40) m j m m

337 and

338 V ar (a) = V ara [E (Wam a )] + Ea [V ar (Wam a )] : (41) m j m m j m

14 Using the fact that EW (Wam a ) = (P Q) am and (33), then under LE assumptions j m K 2 2 V ar (a) = (P Q)(P Q)0  + ArEa 2pkqka : a m k k=1 ! X K K 2 2 339 Since Eam 2pkqkak = a 2pkqk, it follows that k=1 ! k=1 X X K 2 2 340 V ar (a) = (P Q)(P Q)0  + Ar  2pkqk : (42) a a " k=1 # X 341 This shows that when both genotypes (the w0s) and their e¤ects (the a0s) vary at random according

342 to independent trinomial (at each locus) and normal distributions, respectively, the variance of the

343 distribution of a marked breeding value a is a¤ected by di¤erences in gene frequencies pk qk, 2 344 by the variance of the distribution of marker e¤ects a; and by the level of heterozygosity. In the 1 345 special case where allelic frequencies are equal to 2 at each of the loci, the …rst term vanishes and 2 2 2 Ka 346 one gets V ar (a ) = A  ; where  = : This can be construed as as the counterpart of the  r a a 2 347 polygenic additive variance of the standard in…nitesimal model, but in a special situation. Again, 2 e e 348 this illustrates that a relates to additive genetic variance in a more subtle manner than would

349 appear at …rst sight. K

What is the distribution of marked breeding values ai = wikak when both wik and ak vary k=1 X at random? Because ak is normal, the conditional distribution ai wi is normal, with mean 0 and K j 2 2 variance a wik: Thus, one can write the density of the marginal distribution of ai as k=1 X

p (ai) = ::: p (ai wi) Pr (wi1; wi2; :::wik) ; w w w j Xi1 Xi2 XiK

where Pr (wi1; wi2; :::wik) = Pr (wi) is the probability of observing the K-dimensional marker geno-

type wi in individual i: If the population is in HW-LE at all K marker loci, the joint distribution of genotypes of an individual over the K loci is the product of the marginal distributions at each of the marker loci, that is

K 2 wik1 wik2 2 wik3 Pr (wi) = pk (2pkqk) qk : k=1 Y  

For example, if individual i is AA at locus k; wik1 = 1; wik2 = 0 and wik3 = 0; if i is heterozygote

wik1 = 0; wik2 = 1 and wik3 = 0; and if i has genotype aa then wik1 = 0; wik2 = 0 and wik3 = 1.

Note that wik3 = 1 wik1 wik2; so that there are only 2 free indicator variates. It follows that K the marginal distribution of ai is a mixture of 3 normal distributions each with a null mean,

15 K 2 2 but distribution speci…c variance  w : As K ; the mixing probabilities Pr (wi) become a ik ! 1 k=1 in…nitesimally small, so that X

K 2 wikak 1 2 k=1 ! 3 p (ai) exp X f(w)dw;  K 6 K 7 Z 2 6 2 2 7 22 w 6 2a wik 7 v a ik 6 7 u k=1 k=1 u X 6 X 7 t 4 5 for some density f(w): The mixture distribution of ai must necessarily converge towards a Gaussian K

one, because ai = wikak is the sum of a large (now in…nite) number of independent random k=1 X variates (note that under LE wikak is independent of any other wik0 ak0 because genotypes at di¤erent loci are mutually independent), so the central limit theorem holds; it holds even under weaker assumptions. Then 2 a N 0;  : i  a Since all components of the mixture have null means, its variance is just the average of the variances of all components of the mixture (GIANOLA et al. 2006), that is

K 2 2 2 a = lim ::: Pr (wi) a wik :  K !1 wi1 wi2 w k=1 ! X X XiK X

350 This is the additive genetic variance of an in…nitesimal model, i.e., one with an in…nite number of

351 loci, so that the probability of any genotypic con…guration is in…nitesimally small.

When the joint distribution of additive genetic values a of a set of individuals is considered, it is reasonable to conjecture that it should be multivariate normal, especially under LE. Its mean

vector is E (a) = 0, as shown in (40), and the covariance matrix of the limiting process can be deduced from (42). The …rst term becomes null, because allelic frequencies became in…nitesimally small as K ; so the covariance matrix tends to ! 1 V ar (a ) = A 2 :  r a

352 As shown by HABIER et al. (2007), markers may act as a proxy for a pedigree. Hence, unless the

353 pedigree is introduced into the model in some explicit form, markers may be capturing relationships

354 among individuals, as opposed to representing ‡ags for QTL regions. At any rate, it is essential to

355 keep in mind that markers are not genes.

16 356 5 The Bayesian alphabet

357 The preceding part of the paper sets the quantitative genetics theory basis upon which marker

358 assisted prediction of breeding value (using linear models) rests. Speci…cally, MEUWISSEN et al.

359 (2001) use this theory to assess values of hyper-parameters of some Bayesian models proposed by

360 these authors. In what follows, a critique of some methods proposed for inference is presented.

361 5.1 Bayes A

362 MEUWISSEN et al. (2001) suggested using model (31) for marker-enabled prediction of additive

363 genetic e¤ects and proposed two Bayesian hierarchical structures, termed "Bayes A" and "Bayes

364 B". A brief review of these two methods follows, assuming that k denotes a SNP locus.

In Bayes A (using notation employed here), the prior distribution of a marker e¤ect ak, given some marker-speci…c uncertainty variance 2 , is assumed to be normal with null mean and disper- ak sion 2 . In turn, the variance associated to the e¤ect of each marker k = 1; 2; :::; K is assigned the ak same scaled inverse chi-square prior distribution 2 ; S2 where  and S2 are known degrees of ak j freedom and scale parameters, respectively. This hierarchy is represented as:

2 2 2 2 2 2 ak  N 0; ;  ; S S : j ak  ak ak j  

 2 365 The marginal prior induced for a is obtained by integrating the normal density over  ; yielding k ak 366 (BOX and TIAO 1973; SORENSEN and GIANOLA 2002)

1 2 2 2 2 2 367 p ak ; S = N 0; p  ; S  j ak ak j ak Z  0   1 1++2 a2 + S2 2 2 k 2 368  exp d _ ak 22 ak Z  ak  0  +1 2 2 ak 369 1 + : (43) _ S2   2 370 This is the kernel of the density of the t distribution ak 0; ; S ; which is the de facto prior in j 2 371 MEUWISSEN et al. (2001) assigned to a marker e¤ect. Again, ; S are assumed known and given 372 arbitrary values by the user; this is a crucial issue.

373 5.2 Bayes B

374 In Bayes B, MEUWISSEN et al. (2001) proposed

17 point mass at some constant c if 2 = 0 2 ak 375 ak  j ak  N 0;2 if 2 > 0 ( ak ak

2 0 with probability   376   = : ak 2 2 j ( S  with probability 1  This implies that the joint prior distribution of a and 2 ; given an arbitrary probability parameter k ak ; is a = c and 2 = 0 with probability  2 k ak p ak;   = ak j N 0;2 p S2 2 with probability 1 : ( ak   where c is a constant (if 2 = 0; this implies that a is known with certainty), taken by MEUWIS- ak k SEN et al. (2001) to be equal to 0; even though it does not need to be 0. Marginally, after integrating 2 out, the prior takes the form ak

a = c with probability  p (a ) = k k 2 j ( t 0; ; S with probability 1 :  377 Then, Bayes B reduces to Bayes A by taking  = 0:

378 5.3 A critique

379 Neither Bayes A nor Bayes B (or any variations thereof that assume marker-speci…c variances)

380 allow Bayesian learning on these variances to proceed far away from the prior. This means that

381 the hyper-parameters of the prior assigned to these variances (according to assumptions based on

382 quantitative genetics theory) will always have in‡uence on the extent of shrinkage e¤ected of marker 2 383 e¤ects. A user can arbitrarily control the extent of shrinkage simply by varying  and S . It su¢ ces

384 to illustrate this problem with Bayes A, since the problem propagates to Bayes B.

385 It is straightforward to show that the fully conditional (i.e., given all other parameters and the 2 386 data, a situation denoted as ELSE hereinafter) posterior distribution of  is the scaled inverse ak 2 2 2 S +ak 387 chi-square process   + 1; ; so Bayesian learning "moves" only a single degree of freedom ak +1 j 2 2 2 388 away from the prior distribution  ; S ; even though its scale parameter is modi…ed from S h aik 2 2 j S +ak 389 into +1 (most markers are expected to have nearly null e¤ects). Now, since ak is unknown, and 390 inferring it consumes information contained in the data, this implies that, unconditionally (that 2 391 is, a posteriori); inferences about  are even more strongly a¤ected by the information encoded ak 392 by its prior distribution than in the conditional posterior process. For instance, if  = 5; say, this 2 393 means that the posterior degree of belief about  has an upper bound at 6; irrespective of whether ak 394 data on millions of markers or of individuals become available. For parameter  of a model, Bayesian learning should be be such that the posterior coe¢ cient pV ar( DAT A) j of variation, that is CV = E( DAT A) ; tends to 0 asymptotically as DAT A accrue. This does j not happen in Bayes A or Bayes B for 2 : In Bayes A, the coe¢ cients of variation of the prior and ak

18 of the fully conditional posterior distribution are

2 CV 2 = ;  > 4; ak ( 4) s  and 2 CV 2 ELSE = ;  > 3; ak j ( 3) s  2 2 1 395 respectively, so that CV  ELSE =CV  = 1 : This ratio goes to 1 rapidly as the ak ak ( 3) j 396 degrees of freedom of the prior increase (meaning  thatq the prior "dominates" inference), as illustrated 397 in Figure 1. For example, if  = 4:1;  = 5:1 and  = 6:1; the ratio between the coe¢ cients of 2 398 variation of the conditional posterior distribution of  and that of its prior is approximately 0:30; ak 399 0:72 and 0:82, so that the prior is in‡uential even at mild values of the degrees of freedom. Given a

400 large , the conditional posterior essentially copies the prior, with the contribution of DAT A being 2 401 essentially nil. As mentioned, since marginal posterior inferences about  require deconditioning ak 402 over ak (thus consuming information contained in the data), the impact of the prior will be even

403 more marked at the margins. This is questionable, at least from an inference perspective.

404 Another way of illustrating the same problem is based on computing information gain, i.e., the 2 405 di¤erence in entropy before and after observing data. Since the posterior distribution of  is ak 2 406 unknown, we consider the entropy of the fully conditional posterior distribution of  ; instead of ak 407 that of the marginal process. This provides an upper bound for the information gain. The entropy

408 of the prior is

2 2 409 H  ; S ak j 2 2 2 2 2 410 = log p   ; S p  ; S d ak j ak j ak Z   S2     d  411 = log + 1 + log : (44) 2 2 2 2 d  2      2   2  2 412 In the entropy of the fully conditional posterior distribution of  , H  ELSE ;  is replaced ak ak 2 2 2 2 2 2 j 413 by  + 1 and S by S + a (it is expected that S + a S for most markers). The relative k k    414 information gain (fraction of entropy reduced by knowledge encoded in ELSE) is then

H 2 ; S2 H 2 ELSE ak ak 415 RIG = j j : (45) H 2 ; S2   ak j     416 For instance, RIG = 1 if the entropy of the conditional posterior process is 0: Assume a nil marker

417 e¤ect and a root-scale parameter S equal to 1. For ak = 0, S = 1 and  = 4;RIG = 0:125; 2 418 for ak = 0, S = 1 and  = 10; then RIG = 6: 51 10 ; and for ak = 0;S = 1 and  = 100; 3  419 RIG = 9:60 10 : Even at mild values of the prior degrees of freedom , the extent of uncertainty  420 reduction due to observing data is negligible. Metaphorically, the prior is totalitarian in Bayes A,

19 2 421 at least for each one of the  parameters. ak A third gauge is the Kullback-Leibler distance (KL) between the prior and conditional posterior distributions. The KL metric (KULLBACK 1968) is the expected logarithmic divergence between two parametric distributions, one taken as reference or point of departure. Using the prior as reference distribution, the expected distance is

2 2 2 2 KL [conditional; prior] = L ;  + p; S ; am p  ; S d ; ak j ak Z   where p 2 ; S2 2 2 ak L ;  + p; S ; am;  = log j ; ak p 2 ELSE ak   j 2 422 is a randomly varying distance (randomness is due to uncertainty about  ); and p is the number ak 423 of markers that are assigned the same variance, so that p = 1 and am = ak in Bayes A; however, p

424 could be much larger if, say, all p markers on the same chromosome were assigned the same variance.

425 The impact of the degrees of freedom of the prior on the random quantity L (:) of the integrand 2 426 in KL was examined by assuming that the conditional posterior distribution of  had a = 0 ak k 427 (again, most marker e¤ects are expected to be tiny, if not null), and that the scale parameter of the

428 prior of the marker-speci…c variances was S = 1: Figure 2 displays three scaled inverse chi-square

429 densities, all with the same parameter S = 1 and degrees of freedom 4; 10 or 100; as well as values of 2 2 430 the random quantity L ;  + p; S ; a ;  with p = 1 and a = 0 in the KL gauge. Also shown in m ak k 431 Figure 2 (in open circles) are values of L (:; :; :; :) for p = 10; meaning that, instead of having marker- 432 speci…c variances, 10 markers would share the same variance. While the three priors are di¤erent, 2 433 and re‡ect distinct states of prior uncertainty about  via their distinct degrees of freedom, the ak 2 2 2 434 L ;  + p; S ; a ;  values are essentially the same irrespective of  , and ‡at for any values m ak ak 2 435 of  appearing with appreciable density in the priors. On the other hand, if it is assumed that ak  2 2 2 436 L ;  + 10;S ; a = 0;  ; where  is now a variance assigned to a group of markers (even k ak ak 2 437 assuming that their e¤ects are nil), L (:; :; :; :) is steep with respect to  ; taking negative values for  ak 2 438  < 0:55 (roughly). In fact, the KL distances (evaluated with numerical integration) between the ak 2 439 prior and conditional posterior distributions are: 1) 7:33 10 for  = 4;S = 1; p = 1 and ak = 0; 2  3 440 2) 2:64 10 for  = 10;S = 1; p = 1 and ak = 0; and 3) 2:52 10 for  = 100;S = 1; p = 1 and   441 ak = 0; so that the conditional posterior is very close to the prior even at small values of the degrees

442 of freedom parameter. However, when the number of markers sharing the same variance increases

443 to p = 10 (assuming all ak0 s=0; as stated), KL=4:47; so that considerable Bayesian learning about 2 444  takes place in this situation. Relative to scenario (1) above, the KL distance increases by about ak 445 61 times. A pertinent question is whether or not the learned marker e¤ect (i.e., a draw from its conditional posterior distribution) has an important impact on KL via modi…cation of the scale parameter from 2 2 2 S +ak ak S into +1 : Let c = S be the realized value of the marker e¤ect in units of the "prior standard deviation" S, with c = 0; 0:01; 0:5; 1 and 2; the last two cases would be representative of markers

20 with huge e¤ects. The density of the conditional posterior distribution of 2 is then ak

+1 (+c2)S2 2 2 +1+2 2 2 2 2  + c S p  ELSE =    2 exp : ak j +1 ak 22 2 ak !    446 The KL distance between the conditional posterior and the prior for each of these …ve situations, 2 2 447 assuming S = 1 and  = 4 are: 1) KL(c = 0) = 7:33 10 ; 2) KL(c = 0:01) = 7: 32 10 ; 3) 2  2  448 KL(c = 0:5) = 4: 67 10 ; 4) KL(c = 1) = 1: 54 10 and 5) KL(c = 2) = 0:34: Even though   449 marker e¤ects are drastically di¤erent, the conditional posteriors are not too di¤erent (in the KL

450 sense) from each other, meaning that the extent of shrinkage in Bayes A (or B) continues to be

451 dominated by the prior. This is illustrated in Figure 3: even when c = 2 the conditional posterior

452 does not di¤er appreciably from the prior.

453 In short, neither Bayes A nor Bayes B, as formulated by MEUWISSEN et al. (2001) allow for

454 any appreciable Bayesian learning about marker-speci…c variances so that, essentially, the extent

455 of shrinkage of e¤ects will always be dictated strongly by the prior, which negates the objective of

456 introducing marker-speci…c variances into the model. The magnitudes of the estimates of marker ef-

457 fects can be made smaller or larger at will via changes of the degrees of freedom and scale parameters

458 of the prior distribution.

459 Arguably, Bayes B is not well formulated in a Bayesian context. MEUWISSEN et al. (2001) 2 460 interpret that assigning a priori a value  = 0 with probability  means that the speci…c SNP ak 461 does not have an e¤ect on the trait. As mentioned earlier in this paper, stating that a parameter

462 has 0 variance a priori does not necessarily mean that the parameter takes value 0 : it could have 2 463 any value, but known with certainty. Thus, assuming  = 0 implies determinism about such an ak 2 464 e¤ect. It turns out, however, that their sampler sets a = 0 when the state  = 0 is drawn! A k ak 465 more reasonable speci…cation is to place the mixture with a 0 state at the level of the e¤ects, but

466 not at the level of the variances.

467 5.4 Impact on predictions

A counterargument to the preceding critique could be articulated as follows: "Even though the prior a¤ects inferences about marker speci…c variances, this is practically irrelevant, because one can "kill" the in‡uence of the prior on estimates of marker e¤ects simply by increasing sample size".

Super…cially, it seems valid, because the fully conditional posterior distribution of ak (assuming a

21 model with a single location parameter ) is

n K wik 0yi  wik ak1 2 0 3 i=1 k0=1 2 X B kX=k C  6 06 e 7 ak ELSE N 6 B n C; n 7 ; k = 1; 2; :::; K: j  6 @ 2 A 2 7 6 2 e 2 e 7 6 wik + 2 wik + 2 7 6 ak ak 7 6 i=1 i=1 7 6 X X 7 6 7 6 7 4 5 n n 2 2 e 2 2 468 As sample size n increases w + tends to w so the in‡uence of  vanishes asymp- ik 2 ik ak ak i=1 i=1 469 totically, given some …xed valuesX of ; S: This indicatesX that, in Bayes A, even though Bayesian 2 470 learning about the  parameters is limited, the in‡uence of the prior on the posterior distribu- ak K 471 tions of marker e¤ects and of the genetic values wikak dissipates in large samples. However, in k=1 472 marker assisted prediction of genetic values n <

473 of Bayes A with respect to the prior in a …nite sample was examined by simulation.

474 5.5 Simulation

Bayes A was …tted under di¤erent prior speci…cations to a simple data-structure. Records for 300 individuals were generated under the additive model

280 yi = wikak + ei; i = 1; 2; :::; 300 k=1 X

475 where yi is the phenotype for individual i, and the rest is as before. Residuals were independently

476 sampled from a standard normal distribution.

477 Two linkage disequilibrium (LD) scenarios regarding the distribution of the 280 markers were

478 generated. In scenario X0, markers were in weak LD, with almost no correlation between genotypes

479 of adjacent markers (Table 1). In scenario X1 LD was relatively high: the correlation between

480 markers dropped from 0.772 for adjacent markers to 0.354 for markers separated by 3 positions

481 (Table 1). E¤ects of allele substitutions were kept constant across simulations, and were set to zero

482 for all markers except for 10, as shown in Figure 4. The locations of markers with non-null e¤ects

483 were chosen such that di¤erent situations were represented. For example (Figure 4), in chromosome

484 3 there were two adjacent markers with opposite e¤ects, while chromosome 4 had two adjacent

485 markers with equal e¤ects.

486 A Monte Carlo study with 100 replicates was run for each of the two LD scenarios. For each

487 replicate and LD scenario, 9 variations of Bayes A were …tted, each de…ned by a combination of

488 prior values of hyper-parameters. In all cases, a scale inverted chi-squared distribution with 1

489 degree of freedom and scale parameter equal to 1 was assigned to the residual variance. The 9

22 490 priors considered are in Table 2. Hyper-parameter values were chosen such that the prior had, at

491 most, the same contribution to the degrees of freedom of the fully conditional distribution as the 2 492 information coming from the remaining components of the model (i.e., 1). Values of S were chosen

493 following similar considerations. Note that if the samples of marker e¤ects are equal to their true 2 2 494 value ak 0:2 (see Figure 4). Priors 1-3 are improper, and the other six priors are proper but do  2 5 495 not possess …nite means and variances. Therefore, scenarios with S = 10 correspond to cases of

496 relatively small in‡uence of the prior on the scale parameter of the fully conditional distribution, 2 2 497 while S = 5 10 represents a case where the fully conditional distribution has a strong dependency  498 on the prior speci…cation.

499 For each of these models and Monte Carlo replicates 35,000 iterations of the Gibbs Sampler

500 were run, and the …rst 5,000 iterations were discarded as burn-in. Inspection of trace plots and

501 other diagnostics (e¤ective sample size, MC standard error) computed using Coda (PLUMMER et

502 al. 2008) indicated that this was adequate to infer quantities of interest.

503 Table 3 shows the average (across 100 MC replicates) of posterior means of the residual variance

504 and of the correlation between the true and estimated quantity of several features. This provides

505 an assessment of goodness of …t, of how well the model estimates genomic values, and of the extent

506 to which the model can uncover relevant marker e¤ects. As expected, Bayes A was sensitive with

507 respect to prior speci…cation for all items monitored. Scenarios 4 and 7 produced over-…tting (low

508 estimate of residual variance, whose true value was 1, and high correlation between data and …tted

509 values). It also had a low ability to recover signal (i.e., to estimate marker e¤ects and genomic

510 values), as indicated by the corresponding correlations. Other priors (e.g., 6) produced a model

511 with a better ability to estimate genomic values and marker e¤ects. These results were similar in

512 both scenarios of LD. The results in Table 3 also indicate that it is much more di¢ cult to uncover

513 marker e¤ects than to predict genomic values.

514 In order to have a measure of the ability of each model to locate genomic regions a¤ecting

515 the trait, an index was created as follows. For each marker having a non-null e¤ect and for each

516 replicate, a dummy variable was created indicating whether or not the marker, or any of its 4

517 ‡anking markers, ranked among the top-20 based on the absolute value of the posterior mean of

518 the marker’se¤ect. Averaging across markers and replicates led to an index of “retrieved regions”

519 (Table 4). Results suggest that the ability of Bayes A to uncover relevant genomic regions is also

520 a¤ected by the choice of hyper-parameters. For example, in scenarios 1, 4 and 7 only 1 out of …ve

521 regions were retrieved by Bayes A. On the other hand, the fraction of retrieved regions was twice as

522 large when using other priors (scenarios 2, 3 or 6). The ability to uncover genomic regions a¤ecting

523 a trait was usually worse with high LD, due to redundancy between markers.

524 6 Discussion

525 This paper examined two main issues associated with the development of statistical models for

526 genome-assisted prediction of quantitative traits using dense panels of markers, such as single nu-

23 527 cleotide polymorphisms. The …rst one is the relationship between parameters from standard quan-

528 titative genetics theory, such as additive genetic variance, and those from marker-based models,

529 i.e., the variance of marker e¤ects. In a Bayesian context, the latter act mainly as a measure of

530 uncertainty. It was shown that the connection between the variance of marker e¤ects and the addi-

531 tive genetic variance depends on what is assumed about locus e¤ects. For instance, in the classical

532 model of FALCONER and MACKAY (1996), locus e¤ects are considered as …xed and additive ge-

533 netic variance stems from random sampling of genotypes. In order to introduce a variance of marker

534 e¤ects, these must be assumed to be random samples from same distribution which, in the Bayesian

535 setting, is precisely an uncertainty distribution. The paper also discussed assumptions that need to

536 be made in order to establish a connection between the two sets of parameters, and also introduced

537 a more general partition of variance, in which genotypes, e¤ects and allelic frequencies are random

538 variables. Some expressions for relating additive genetic variance and that of marker e¤ects are

539 available under the assumption of linkage equilibrium, as discussed in the paper. However, accom-

540 modating linkage disequilibrium explicitly into an inferential system suitable for marker-assisted

541 selection represents a formidable challenge.

542 The second aspect addressed in the study was a critique of methods Bayes A and B as proposed

543 by MEUWISSEN et al. (2001). These methods require specifying hyper-parameters that are elicited

544 using formulae related to those mentioned in the paragraph above; however, the authors did not

545 state the assumptions needed precisely. It was shown here that these hyper-parameters can be

546 in‡uential.

547 The in‡uence of the prior on inferences and predictions via Bayes A can be mitigated in several

548 ways. One, consists of forming clusters of markers such that their e¤ects share the same variance.

549 Thus, shrinkage would be speci…c to the set of markers entering into the cluster. The clusters could

550 be formed either on the basis of biological information (e.g., according to coding or non-coding

551 regions speci…c to a given chromosome) or perhaps statistically, using some form or supervised

552 or unsupervised clustering procedure. If clusters of size p were formed, the conditional posterior

553 distribution of the variances of the markers would have p +  degrees of freedom, instead of 1 + 

554 in Bayes A. A second way of mitigating impact of hyper-parameters, is to assign a non-informative

555 prior to the scale and degrees of freedom parameters of Bayes A. This has been done in quantitative

556 genetics, as demonstrated by STRANDÉN and GIANOLA (1998) and ROSA et al. (2003, 2004),

557 and discussed in SORENSEN and GIANOLA (2002). For example, STRANDÉN and GIANOLA

558 (1998) used models with t distributions for the residuals (the implementation would be similar 559 in Bayes A, with the t distribution assigned to marker e¤ects instead), with unknown degrees 560 of freedom and unknown parameters. In STRANDÉN and GIANOLA (1998) a scaled inverted

561 chi-square distribution was assigned to the scale parameter of the t distribution, and equal prior 562 probabilities were assigned to a set of mutually exclusive and exhaustive values of the degrees of

563 freedom. On the other hand, ROSA et al. (2003, 2004) presented a more general treatment, in which

564 the degrees of freedom were sampled with a Metropolis-Hastings algorithm. A third modi…cation

565 of Bayes A would consist in combining the two preceding options, i.e., assign a common variance

24 566 to a cluster of marker e¤ects and then use non-informative priors, as in ROSA et al. (2003, 2004)

567 to the parameters of the t distribution. Applications of thick tailed priors, such as the t or the 568 double exponential distribution, to models with marker e¤ects, are presented in YI and XU (2008)

569 and de los CAMPOS et al. (2009a).

570 Bayes B requires a reformulation (and receive a new letter, to avoid confusion!), e.g., the mixture

571 with a zero state posed at the level of e¤ects and not at that of the variances, as discussed earlier.

572 For example, one could assume that the marker e¤ect is 0 with probability ; or that it follows

573 a normal distribution with common variance otherwise. Further, the mixing probability  could

574 be assigned a prior distribution, e.g., a beta process, as opposed to specifying an arbitrary value

575 for : Mixtures models in genetics are discussed, for example, by GIANOLA et al. (2006b) and

576 some new, yet unpublished, normal mixtures for marker-assisted selection are being developed by

577 FERNANDO (http://dysci.wisc.edu.edu/sglpe/pdf/Fernando.pdf)

578 A more general solution is to use a non-parametric method, as suggested by GIANOLA et al.

579 (2006a), GIANOLA and van KAAM (2008), GIANOLA and DE LOS CAMPOS (2008) and casted

580 more generally by de los CAMPOS et al. (2009). These methods do not make hypotheses about

581 mode of inheritance, contrary to the parametric methods discussed above, where additive action is

582 assumed. Evidence is beginning to emerge that non-parametric methods may have better predictive

583 ability when applied to real data (GONZÁLEZ-RECIO et al. 2008, 2009; LONG et al. 2009).

584 In conclusion, this paper discussed connections between marker-based additive models and stan-

585 dard models of quantitative genetics. It was argued that the relationship between the variance of

586 marker e¤ects and the additive genetic variance is not as simple as it has been reported, becoming

587 especially cryptic if the assumption of linkage equilibrium is violated, which is manifestly the case

588 with dense whole-genome markers. Also, a critique of earlier models for genomic-assisted evaluation

589 in animal breeding was advanced, from a Bayesian perspective, and some possible remedies of such

590 models were suggested.

591 Acknowledgements

592 Dr. Hugo Naya, Institut Pasteur of is thanked for having engineered the two linkage disequi-

593 librium scenarios used in the simulation. Part of this work was carried out while the senior author was

594 a Visiting Professor at Georg-August-Universität, Göttingen (Alexander von Humboldt Foundation Senior

595 Researcher Award), and Visiting Scientist at the Station d’Amélioration Génétique des Animaux, Centre de

596 Recherche de Toulouse (Chaire D’Excellence Pierre de Fermat, Agence Innovation, Midi-Pyreneés). Support

597 by the Wisconsin Agriculture Experiment Station, and by grant NSF DMS-NSF DMS-044371 to the …rst and

598 second authors is acknowledged. The Associate Editor is thanked for his careful reading of the manuscript

599 and for help in clarifying some sections of the paper.

600 APPENDIX

601 Linkage disequilibrium

25 K K 602 The expression 2pkqk = V ar (wk) results from jointly sampling genotypes (but not their i=1 k=1 X X 603 e¤ects) at K loci in linkage equilibrium. This is a needed assumption for arriving at (19). On

604 the other hand, if there is linkage disequilibrium (LD), the additive genetic variance (under HW

605 equilibrium at each locus) is

K 606 VA(D) = V ar wkak k=1 ! X K K K 2 607 = 2pkqkak + 2 2Dklakal; (46) k=1 k=1 l>k X X X

608 where Dkl = Pr (AB) pA;k pB;l is the usual LD statistic involving the two loci in question. The kl 609 …rst term in (46) is the additive genetic variance under LE; the second term is a contribution to

610 variance from LD, and it may be negative or positive. It can be shown that the average correlation 1 611 between genotypes at a pair of loci is (approximately) less than K :

612 The average (over a e¤ects) variance under LD depends on the distribution of the a0s: If these 2 613 are independently and identically distributed with mean  and variance a; one has

K K K 2 2 2 614 VA0 (D) = E [VA(D)] = a +  2pkqk + 2 2Dkl: (47) i=1 k=1 l>k  X X X

615 If  = 0; then VA0 (D) = VA0 ; in which case linkage disequilibrium would not a¤ect relationship (19). There is no mechanistic basis for expecting that all loci have the same e¤ects and for these being mutually independent. There may be some genomic regions without any e¤ect at all, or some regions may induce similarity (or dissimilarity) of e¤ects; for example, if two genes are responsible for producing a …xed amount of transcript, their e¤ects would be negatively correlated, irrespective of whether or not genotypes are in linkage equilibrium. A more general assumption may be warranted, i.e., that e¤ects follow some multivariate distribution a ; V2 where 2 is just a dispersion  a a parameter. Here, with w being the vector of genotypes for the K loci, one can write the average variance under LD K

VA0 (D) = E V ar wkak = E a0Ma ; " k=1 !# X   where M is the covariance matrix of w (diagonal under LE)

p1q1 D12 :::D1K 2 p2q2 D2K 3 :::: 6 7 M =2 6 7 : 6 ::: 7 6 7 6 7 6 :: 7 6 7 6 Symmetric pK qK 7 6 7 4 5

26 Further, 2 VA0 (D) = 0M+atr (MV) ;

616 where tr (:) is the trace of the matrix in question. The counterpart of (19) is

2 VA0 (D) 0M 617  = ; (48) a tr (MV)

618 which is a complex relationship even if  = 0: It follows that (19), appearing often in the literature,

619 holds only under strong simplifying assumptions. In short, the connection between additive genetic

620 variance and the variance of marker e¤ects depends on the unknown means of the distributions of

621 marker e¤ects, their possible covariances (induced by unknown molecular and chromosomal process),

622 their gene frequencies, and all pairwise linkage disequilibrium parameters, which are a function of the

623 Dkl0 s: It is not obvious what the e¤ects of using (19) as an approximation are, but the assumptions 624 surrounding it are undoubtedly strong.

625 Covariance between relatives and linkage disequilibrium

626 Linkage disequilibrium complicates matters, as noted earlier. The covariance between marked K 2 627 genetic values of individuals i and j; instead of being rijam0 Dam=rij 2pkqkak takes the form k=1 X K K 628 Cov w0am; w0 am am = Cov (wi;kwj;k) akal; i j j k=1 l=1  X X wi;1;wj;1 wi;1;wj;2 : : : wi;1;wj;K 0 2 wi;2;wj;2 : : : wi;2;wj;K 31 :::: B 6 7C 629 = am0 BCov 6 7C am B 6 ::: 7C B 6 7C B 6 7C B 6 :: 7C B 6 7C B 6 symmetric wi;K;wj;K 7C B 6 7C @ 4 5A 630 = am0 Mam0 :

631 and M is no longer a diagonal matrix, because of LD creating covariances between genotypes at

632 di¤erent marker loci. The diagonal elements of M have the form (assuming HW frequencies within

633 each locus)

634 mij;k = Cov (wi;k;wj;k) = rij2pkqk; k = 1; 2; :::; K

and the o¤-diagonals are

m = rij2Dkl; k = l: ij;k;l 6

635 This implies that disequilibrium statistics D must be brought into the picture when estimating a

636 pedigree relationship matrix using markers in LD.

27 637 LITERATURE CITED

638 BARTON, N. H. and H. P. de VLADAR, 2009 Statistical mechanics and the evolution of poly-

639 genic quantitative traits. Genetics 181: 997 -1011.

640 BOX, G. E. P., and G. C. TIAO, 1973 Bayesian Inference in Statistical Analysis. Addison-

641 Wesley, Reading, MA.

642 DE LOS CAMPOS, G., D. GIANOLA, and G. J. M. ROSA, 2009a Reproducing kernel Hilbert

643 spaces regression: a general framework for genetic evaluation. Journal of Animal Science 87: 1883-

644 1887.

645 DE LOS CAMPOS, G., NAYA, H., D. GIANOLA, J. CROSSA, A. LEGARRA, E. MANFREDI,

646 K. WEIGEL and J. M. COTES, 2009b Predicting Quantitative Traits with Regression Models for

647 Dense Molecular Markers and Pedigrees. Genetics 182: 375-385.

648 FALCONER, D. S., and T. F. C. MACKAY, 1996 Introduction to Quantitative Genetics. Ed.

649 4. Longmans Green, Harlow, Essex, UK.

650 FERNANDO, R. L., and M. GROSSMAN, 1989 Marker-assisted selection using best linear

651 unbiased prediction. Genet. Sel. Evol. 33: 209-229.

652 GIANOLA, D., and de los CAMPOS, G., 2008 Inferring genetic values for quantitative traits

653 non-parametrically. Genetics Research 90: 525-540.

654 GIANOLA, D., and R. L. FERNANDO, 1996 Bayesian methods in animal breeding. J. Anim.

655 Sci. 63: 217-244.

656 GIANOLA, D., R. L. FERNANDO, and A. STELLA, 2006a Genomic assisted prediction of

657 genetic value with semi-parametric procedures. Genetics 173: 1761-1776.

658 GIANOLA, D., B. HERINGSTAD, and J. ØDEGÅRD, 2006b On the quantitative genetics of

659 mixture characters. Genetics 173: 2247-2255.

660 GIANOLA, D., M. PÉREZ-ENCISO, and M. A. TORO, 2003 On marker-assisted prediction of

661 genetic value: beyond the ridge. Genetics 163: 347-365.

662 GIANOLA, D., and J. B. C. H. M. van KAAM, 2008 Reproducing kernel Hilbert spaces methods

663 for genomic assisted prediction of quantitative traits. Genetics 178: 2289-2303.

664 GONZÁLEZ-RECIO, O., D. GIANOLA, N. LONG, K. A. WEIGEL, G. J. M. ROSA, and

665 S. AVENDAÑO, 2008 Nonparametric methods for incorporating genomic information into genetic

666 evaluations: an application to mortality in broilers. Genetics 178: 2305-2313.

667 GONZÁLEZ-RECIO, O., D. GIANOLA, G. J. M. ROSA, K. A. WEIGEL, and A. KRANIS,

668 2009 Genome-assisted prediction of a quantitative trait in parents and progeny: application to food

669 conversion rate in chickens. Genetics, Selection, Evolution 41: 3-13.

670 HABIER, D., R. L. FERNANDO, and J. C. M. DEKKERS, 2007 The impact of genetic rela-

671 tionship Information on genome-assisted breeding values. Genetics 177: 2389 -2397.

672 HAYES, B. J., P. J. BOWMAN, A. J. CHAMBERLAIN, and M. E. Goddard, 2009 Invited

673 review: genomic selection in dairy cattle: progress and challenges. Journal of Dairy Science 92:

674 433-443.

28 675 HEFFNER, E. L, M. E. SORRELL, and J. L. JANNINK, 2009 Genomic Selection for Crop

676 Improvement. Crop Sci 49: 1-12.

677 KULLBACK, S., 1968 Information Theory and Statistics. Ed. 2., Dover, New York.

678 LANDE, R., and R. THOMPSON, 1990 E¢ ciency of marker-assisted selection in the improve-

679 ment of quantitative traits. Genetics 124: 743-756.

680 LONG, N., D, GIANOLA, G. J. M. ROSA, K. A. WEIGEL, and S. AVENDAÑO, 2007 Ma-

681 chine learning classi…cation procedure for selecting SNPs in genomic selection: application to early

682 mortality in broilers. Journal of Animal Breeding and Genetics 124: 377-389.

683 LONG, N., D, GIANOLA, G. J. M. ROSA, and K. A. WEIGEL, 2009 Radial basis function

684 regression methods for predicting quantitative traits using SNP markers. Genetics Research (sub-

685 mitted)

686 MAHER, B. (2008) The case of the missing heritability. Nature 456: 18-21.

687 MEUWISSEN, T. H., B. J. HAYES, and M. E. GODDARD, 2001 Prediction of total genetic

688 value using genome-wide dense marker maps. Genetics 157: 1819-1829.

689 PLUMMER, M., N. BEST, K. COWLES and K. VINES, 2008 Coda: output analysis and

690 diagnostics for MCMC. http://cran.r-project.org/web/packages/coda/index.html

691 ROSA, G. J. M., C. R. PADOVANI, and D. GIANOLA, 2003 Robust linear mixed models with

692 normal/independent distributions and Bayesian MCMC implementation. Biometrical Journal 45:

693 573-590.

694 ROSA, G. J. M., D. GIANOLA , and C. R. PADOVANI, 2004 Bayesian longitudinal data analy-

695 sis with mixed models and thick-tailed distributions using MCMC. Journal of Applied Statistics 31:

696 855-873.

697 SORENSEN, D., and D. GIANOLA, 2002 Likelihood, Bayesian, and MCMC Methods in Quan-

698 titative Genetics. Springer, New York.

699 STRANDÉN, I., and D. GIANOLA, 1998 Attenuating e¤ects of preferential treatment with

700 Student-t mixed linear models: a simulation study. Genetics, Selection, Evolution 30:565-583.

701 TURELLI, M., 1985 E¤ects of pleiotropy on predictions concerning mutation selection balance

702 for polygenic traits. Genetics 111: 165-195.

703 VAN RADEN, P. M., 2008 E¢ cient methods to compute genomic predictions. Journal of Dairy

704 Science 91: 4414-4423.

705 VAN RADEN, P. M., C. P. VAN TASSELL, G. R. WIGGANS, T. S. SONSTEGARD, R. D.

706 SCHNABEL, J. F. TAYLOR, and F. S. SCHENCKEL, 2009 Reliability of genomic predictions for

707 North American Holstein bulls. Journal of Dairy Science 92: 16-24.

708 WANG, C. S., J. J. RUTLEDGE, and D. GIANOLA, 1993 Marginal inferences about variance

709 components in a mixed linear model using Gibbs sampling. Genet. Sel. Evol. 25: 41-62.

710 WANG, C. S., J. J. RUTLEDGE, and D. GIANOLA, 1994 Bayesian analysis of mixed linear

711 models via Gibbs sampling with an application to litter size in Iberian pigs. Genet. Sel. Evol. 26:

712 91-115.

713 WHITTAKER, J. C., R. THOMPSON, and M. C. DENHAM, 2000 Marker-assisted selection

29 714 using ridge regression. Genet. Res. 75: 249-252.

715 WRIGHT, S., 1937 The distribution of gene frequencies in populations. Proc. Nat. Acad. Sci.

716 23: 307-320.

717 YI, N., and S. XU, 2008 Bayesian Lasso for quantitative trait loci mapping. Genetics 179:

718 1045-1055.

719 XU, S., 2003 Estimating polygenic e¤ects using markers of the entire genome. Genetics 163:

720 789-801.

721 ZHANG, X. S., and W.G. HILL, 2005 Predictions of patterns of response to arti…cial selection

722 in lines derived from natural populations. Genetics 169: 411-425.

30 723 Table 1. Correlation between marker genotypes (average over markers and over 100 Monte-

724 Carlo replicates) by scenarios of adjacency between pairs of markers and of linkage disequilibrium

725 (X0: low linkage disequilibrium; X1 high linkage disequilibrium).

Adjacency Adjacency Adjacency Adjacency Disequilibrium scenario 1 2 3 4 726 X0 0.007 0.002 -0.002 0.013 X1 0.722 0.567 0.450 0.356

31 727 Table 2. Nine di¤erent speci…cations of hyper-parameters of the prior distribution of marker 2 728 variances in Bayes A  = prior degrees of freedom; S = prior scale parameter .  S2 = 10 5 S2 = 10 3 S2 = 5 102   = 0 1 2 3 729 1  = 2 4 5 6  = 1 7 8 9

32 2 730 Table 3. Average (over 100 replicates) of posterior mean estimates of residual variance ( )

731 and of the correlation between the true and estimated value for several items (phenotypes: y; true

732 genomic value: Wam; …tted genomic value Wam , true marker e¤ects: am; estimated marker e¤ects

733 am). SD= among-replicates standard deviation of item. b b

734

33 735 Table 4. Fraction of retrieved regions by set of priors in Bayes A and scenario of linkage

736 disequilibrium (LD).

Set of priors in Bayes A Low LD High LD 1 0:24 0:21 2 0:43 0:34 3 0:47 0:33 4 0:22 0:19 737 5 0:36 0:31 6 0:43 0:34 7 0:22 0:18 8 0:26 0:22 9 0:29 0:26 738

34 1.0 Ratio 0.9

0.8

0.7

0.6

0.5

0.4

0.3

0.2

0.1

0.0 4 6 8 10 12 14 16 18 20 Degrees of freedom 739 2 2 1 740 Figure 1. Ratio between coe¢ cients of variation CV  ELSE =CV  = 1 of the ak ak ( 3) j 741 conditional posterior and prior distributions of the variance of the marker e¤ect, q as a function of 742 the degrees of freedom  of the prior.

35 4

3

2

1

0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0 Marker•specific variance •1

•2

743 2 744 Figure 2. Prior densities of the marker speci…c variance  (dots:  = 4;S = 1; solid: ak 2 745  = 10;S = 1; crosses:  = 100;S = 1) and values of integrand L ;  + 1;S ; ak = 0 in the 746 Kullback-Leibler distance, for each of the three priors, in solid lines. The integrands are essentially 2 747 indistinguishable from each other for all values of  . Values of the integrand are drastically ak 748 di¤erent (open circles) when 10 markers are assigned the same variance, so that 2 2 749 L ;  + 10;S ; a = 0 for all k;  k ak 

36 Density 0.9

0.8

0.7

0.6

0.5

0.4

0.3

0.2

0.1

0.0 0 1 2 3 4 5 6 Marker variance 750

751 Figure 3. E¤ect of scale parameter on the conditional posterior distribution of the variance of the

752 marker e¤ect. Prior distribution marked with boxes; conditional posterior distribution for c = 2

753 (standardized marker e¤ect) in dots. The other three conditional distributions (solid lines) are

754 barely distinguisable from the prior.

37 755

756 Figure 4. Positions (chromosome and marker number) and e¤ects of markers (there were 280

757 markers, with 270 having no e¤ect)

758

759

760

761

762

763

764

765

766

767

38