Joint genome-wide prediction in several populations accounting for randomness of genotypes: A hierarchical Bayes approach. II: Multivariate spike and slab priors for marker effects and derivation of approximate Bayes and fractional Bayes factors for the complete family of models

Carlos Alberto Martínez^{a,b}, Kshitij Khare^{b}, Arunava Banerjee^{c}, Mauricio A. Elzo^{a}

^{a}Department of Animal Sciences
^{b}Department of
^{c}Department of Computer and Information Science and Engineering
University of Florida, Gainesville, FL, USA

Correspondence: Carlos Alberto Martínez, Department of Animal Sciences, University of Florida, Gainesville, FL 32611, USA.
Tel: 352-328-1624. Fax: 352-392-7851. E-mail: [email protected]


Abstract

This study is the second part of a companion paper devoted to the development of Bayesian multiple regression models accounting for randomness of genotypes in across-population genome-wide prediction. This family of models considers heterogeneous and correlated marker effects and allelic frequencies across populations, and it can accommodate records from non-genotyped individuals and individuals with missing genotypes at any subset of loci without previous imputation, while taking into account uncertainty about imputed genotypes. This paper extends the family of models by considering multivariate spike and slab conditional priors for marker allele substitution effects, and it contains derivations of approximate Bayes factors and fractional Bayes factors to compare models from part I and those developed here with their null versions. These null versions correspond to simpler models that ignore heterogeneity of populations but still account for randomness of genotypes. For each marker locus, the spike component of the prior is a point mass at $\boldsymbol{0}$ in $\mathbb{R}^{\mathcal{S}}$, where $\mathcal{S}$ is the number of populations, and the slab component is an $\mathcal{S}$-variate Gaussian distribution; independent conditional priors were assumed across loci. For the Gaussian components, covariance matrices were assumed to be either the same for all markers or different for each marker. For null models, the priors were univariate versions of these finite mixture distributions. Approximate algebraic expressions for Bayes factors and fractional Bayes factors were found using the Laplace approximation. Using the simulated datasets described in part I, these models were implemented and compared with models derived in part I using measures of predictive performance based on squared Pearson correlations, the Deviance Information Criterion, Bayes factors, and fractional Bayes factors. The extensions presented here enlarge our family of genome-wide prediction models, making it more flexible in the sense that it now offers more modeling options.


Key words: Bayesian whole-genome regressions; finite mixture priors; genetic heterogeneity; Laplace approximation; multi-population genome-enabled prediction.

1. Introduction

The scenario of across-population genome-wide prediction accounting for randomness of genotypes was addressed in part I of this series of studies. There, we adopted a hierarchical Bayesian modeling strategy to accommodate heterogeneous and correlated marker effects across subpopulations as well as random genotypes. In that companion paper we provided a detailed derivation of the joint pmf of the genotypes conditional on pedigree information and allelic frequencies and discussed some of its properties. Furthermore, the flexibility of hierarchical Bayesian modeling allowed us to account for heterogeneous and correlated allelic frequencies. The “MG-GBLUP” model proposed by Lehermeier et al. (2015) is similar to the models developed in part I of this study, except that they did not consider randomness of genotypes; in addition, they did not consider models with different (heterogeneous) covariance matrices of marker effects. One of the main properties of our models is that individuals with phenotypic records and missing genotypes at any subset of loci (including non-genotyped individuals) can be included in the analysis without previous imputation. Furthermore, due to the use of a Bayesian approach, uncertainty about imputed genotypes is automatically taken into account.

The so-called “spike and slab” priors are finite mixtures of a continuous distribution (the slab) and a point mass at some constant (the spike) (Mitchell and Beauchamp, 1988). A particular case of these priors is the zero-inflated prior, which has point mass at zero. This sort of prior has been used in high-dimensional problems to induce stronger shrinkage and to perform variable selection. In single-population analyses, it has been reported that Bayesian variable selection models tend to perform better when there are genes with major effects controlling the trait under study or when the number of genes controlling the trait is small (Daetwyler et al., 2012; Heslot et al., 2012; Gianola and Rosa, 2015). In the case of multiple-population analyses, van den Berg et al. (2015) studied scenarios under which Bayesian variable selection models outperformed genomic BLUP (GBLUP). They found that GBLUP was outperformed when the number of QTL was smaller than the number of independent chromosome segments. They also found that the difference in accuracy between these models was larger than in the single-population case.

In a Bayesian framework, model comparison can be performed via Bayes factors and some modifications of them known as non-subjective Bayes factors (Ghosh et al., 2006). Bayes factors measure the change in the odds favoring a model once data are observed (Lavine and Schervish, 1999). On the other hand, O'Hagan (1994; 1995) proposed a non-subjective Bayes factor known as the fractional Bayes factor, which uses a fractional part of the likelihood, resulting in a “partial” Bayes factor. Analytical forms of Bayes factors involve integration of the joint distribution of data and parameters over the parameter space of a given model to obtain marginal likelihoods, and even for some simple models these integrals do not have a closed-form solution. One option for obtaining algebraic approximations is to use the Laplace approximation after arranging the integrand in an appropriate form (Ghosh et al., 2006). Another criterion for comparing models is the Deviance Information Criterion (DIC; Spiegelhalter et al., 2002; 2014), which combines measures of model fit and model complexity and, despite some limitations, has been used in several research areas (Spiegelhalter et al., 2014).

Thus, the objectives of this study were to extend the family of models developed in the companion paper (part I) by considering spike and slab priors for marker effects, and to derive approximate expressions for Bayes factors and fractional Bayes factors to compare the proposed models with their corresponding null versions that ignore population structure.

2. Methods

2.1 The models

The complete population, or simply the population, is defined as the set of individuals with phenotypes considered in the analysis; it is comprised of a set of $\mathcal{S}$ subpopulations defined by some criterion such as environment, race, breed, or line. The following assumptions are also made: linkage equilibrium, Hardy–Weinberg equilibrium, no mutation, and, starting from the oldest individuals with phenotypes, a fully known pedigree.

The following is the model describing the relationship between records and mappings of marker genotypes: $\boldsymbol{y}=W\boldsymbol{g}+\boldsymbol{e}$, where $\boldsymbol{y}\in\mathbb{R}^{n}$ is a vector containing response variables (e.g., records corrected for non-genetic factors), $W\in\mathbb{R}^{n\times m\mathcal{S}}$ is an observable random matrix with entries corresponding to a one-to-one mapping from the set of individual marker genotypes to a subset of the integers (defined later), $\boldsymbol{g}\in\mathbb{R}^{m\mathcal{S}}$ is an unknown random vector of average marker allele substitution effects for every population, and $\boldsymbol{e}\in\mathbb{R}^{n}$ is a random vector of residuals. If records are sorted by subpopulation, as well as the columns of $W$ and the elements of $\boldsymbol{g}$, then for every $l=1,2,\ldots,\mathcal{S}$, $\boldsymbol{y}_l=W_l\boldsymbol{g}_l+\boldsymbol{e}_l$, with dimensions $\boldsymbol{y}_l\ (n_l\times 1)$, $W_l\ (n_l\times m)$, $\boldsymbol{g}_l\ (m\times 1)$ and $\boldsymbol{e}_l\ (n_l\times 1)$, where $n_l$ is the sample size of subpopulation $l$ and $m$ is the number of marker loci; therefore, $n=\sum_{l=1}^{\mathcal{S}}n_l$.

In our models, the mapping from the set of genotypes at each locus and each individual into a subset of the integers is defined as follows; biallelic loci are considered. If A and B are the marker alleles at each locus and B is taken as the reference allele, then:

$$W_l=\{w_{ij}^{l}\}_{n_l\times m},\qquad w_{ij}^{l}=\begin{cases}1, & \text{if genotype}=BB\\ 0, & \text{if genotype}=AB\\ -1, & \text{if genotype}=AA.\end{cases}$$
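This coding is a simple lookup. The sketch below implements it; the function name `code_genotypes` and the string representation of genotypes are our own illustrative choices, not from the paper:

```python
import numpy as np

# Coding used in the paper: with B the reference allele,
# BB -> 1, AB (or BA) -> 0, AA -> -1.
CODE = {"BB": 1, "AB": 0, "BA": 0, "AA": -1}

def code_genotypes(genotypes):
    """Map an n_l x m array of genotype strings to the integer matrix W_l."""
    geno = np.asarray(genotypes, dtype=object)
    return np.vectorize(CODE.__getitem__)(geno).astype(int)

W_l = code_genotypes([["BB", "AB", "AA"],
                      ["AB", "AA", "BB"]])
print(W_l.tolist())  # [[1, 0, -1], [0, -1, 1]]
```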

The following is the hierarchical representation of our models. Let $R=(\sigma_{e_1}^2,\ldots,\sigma_{e_{\mathcal{S}}}^2)$ and $V=\operatorname{Block\ Diag.}\{\sigma_{e_l}^2 I_{n_l}\}_{l=1}^{\mathcal{S}}$; then

$$\boldsymbol{y}\mid W,\boldsymbol{g},R\sim MVN(W\boldsymbol{g},V)$$

$$W\mid\boldsymbol{p}_1^{*},\boldsymbol{p}_2^{*},\ldots,\boldsymbol{p}_m^{*}\sim\pi(\cdot\mid\boldsymbol{p}_1^{*},\boldsymbol{p}_2^{*},\ldots,\boldsymbol{p}_m^{*})$$

$$\boldsymbol{p}_j^{*}\overset{iid}{\sim}\pi(\boldsymbol{p}^{*}),\ j=1,2,\ldots,m$$

$$\sigma_{e_1}^2,\ldots,\sigma_{e_{\mathcal{S}}}^2\overset{iid}{\sim}\operatorname{Inverse\ Gamma}\left(\frac{\tau^{2}}{2},\frac{v}{2}\right):=IG\left(\frac{\tau^{2}}{2},\frac{v}{2}\right)$$

$$\boldsymbol{g}_j\mid G_j,\pi_0\overset{ind}{\sim}\begin{cases}\text{point mass at }\boldsymbol{0} & \text{with probability }\pi_0\\ MVN(\boldsymbol{0},G_j) & \text{with probability }1-\pi_0\end{cases}$$

$$G_j\overset{iid}{\sim}\operatorname{Inverse\ Wishart}(a,\boldsymbol{\Sigma}):=IW(a,\boldsymbol{\Sigma})$$

$$G_j=\begin{bmatrix}\sigma_{j_1}^2 & \sigma_{j_{1,2}} & \cdots & \sigma_{j_{1,\mathcal{S}}}\\ & \sigma_{j_2}^2 & \cdots & \sigma_{j_{2,\mathcal{S}}}\\ & & \ddots & \vdots\\ \text{sym} & & & \sigma_{j_{\mathcal{S}}}^2\end{bmatrix}$$

where $\sigma_{e_l}^2$ is the residual variance in subpopulation $l$, $\sigma_{j_l}^2$ is the variance of the effect of the $j^{th}$ marker in the $l^{th}$ subpopulation, $\sigma_{j_{l,l'}}$ is the covariance between effects of marker $j$ in subpopulations $l$ and $l'$, $\boldsymbol{p}_j^{*}$ is a parameter related to allelic frequencies of the $j^{th}$ marker in each subpopulation, and $\pi(\boldsymbol{p}^{*})$ is its probability density function (pdf). This set of parameters and their pdf are described in part I of this series of papers. Here, the parameter $\pi_0$ was assumed to be known.

The model presented above assumes a different covariance matrix for the vector of allele substitution effects of each marker in the slab component of the mixture distribution; consequently, this sort of model will be referred to as a heterogeneous marker effect covariance matrix model. On the other hand, models with $G_1=\cdots=G_m=G^{0}$ will be referred to as homogeneous marker effect covariance matrix models. Moreover, the special case $\sigma_{e_1}^2=\cdots=\sigma_{e_{\mathcal{S}}}^2=\sigma^2$ corresponds to models with homoscedastic residuals.

In part I, it was discussed that the scenario of completely (i.e., at all loci) or partially missing genotypes can be handled because of the use of the pmf $\pi(W\mid P^{*})$, $P^{*}=(\boldsymbol{p}_1^{*},\boldsymbol{p}_2^{*},\ldots,\boldsymbol{p}_m^{*})$, and the fact that these missing genotypes are regarded as model parameters. There, it was also shown that the likelihood can be written as $f(\boldsymbol{y},W^{\mathcal{O}}\mid W^{N},\boldsymbol{g},R,P^{*})=f(\boldsymbol{y}\mid W,\boldsymbol{g},R)f(W^{\mathcal{O}}\mid W^{N},P^{*})$, where $W^{\mathcal{O}}$ is the fraction of $W$ corresponding to observed genotypes, $W^{N}$ the fraction corresponding to missing genotypes, and $f(\boldsymbol{y}\mid W,\boldsymbol{g},R)$ and $f(W^{\mathcal{O}}\mid W^{N},P^{*})$ are referred to as the $\boldsymbol{y}$ component and the $W$ component of the likelihood.

The conditional prior for $\boldsymbol{g}_j$ can be written as:

$$\pi(\boldsymbol{g}_j\mid G_j,\pi_0)=\pi_0 I_{\{\boldsymbol{g}_j=\boldsymbol{0}\}}+(1-\pi_0)MVN(\boldsymbol{g}_j;\boldsymbol{0},G_j)I_{\{\boldsymbol{g}_j\neq\boldsymbol{0}\}}$$

where $I_{\{\cdot\}}$ is the indicator function. This form is more convenient from the algebraic point of view because it allows carrying out computations and writing expressions for the joint conditional prior in an easier way. Under the conditional independence assumption, the joint conditional prior for $\boldsymbol{g}$ is:

$$\pi(\boldsymbol{g}\mid G,\pi_0)=\prod_{j=1}^{m}\left\{\pi_0 I_{\{\boldsymbol{g}_j=\boldsymbol{0}\}}+(1-\pi_0)MVN(\boldsymbol{g}_j;\boldsymbol{0},G_j)I_{\{\boldsymbol{g}_j\neq\boldsymbol{0}\}}\right\}.$$
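To make this prior concrete, the sketch below draws the $m$ marker effect vectors from the spike and slab prior in the homogeneous case $G_j=G^{0}$; the function name and the particular values of $\pi_0$, $\mathcal{S}$ and the covariance matrix are illustrative assumptions, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_spike_slab(m, G0, pi0, rng):
    """Draw g_1, ..., g_m: with probability pi0 the S-vector g_j is exactly 0,
    otherwise g_j ~ MVN(0, G0) (the slab)."""
    S = G0.shape[0]
    g = np.zeros((m, S))
    slab = rng.random(m) >= pi0          # markers with non-null effects
    n_slab = int(slab.sum())
    if n_slab > 0:
        g[slab] = rng.multivariate_normal(np.zeros(S), G0, size=n_slab)
    return g

G0 = np.array([[1.0, 0.5], [0.5, 2.0]])  # S = 2 populations, correlated effects
g = sample_spike_slab(1000, G0, pi0=0.9, rng=rng)
null_fraction = np.mean(np.all(g == 0.0, axis=1))
print(round(null_fraction, 2))  # close to pi0 = 0.9
```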

An explicit form of this prior pdf can be found as follows. Let $i=0,1,\ldots,m$ be the number of markers having a null effect. Consequently, when expanding the product above, for each $i$ there are $\binom{m}{i}$ combinations of $i$ markers with null effect chosen from the $m$ markers. For $l=1,2,\ldots,\binom{m}{i}$, let $\delta_{il}$ denote the event that the $l^{th}$ subset of $i$ markers (i.e., the $l^{th}$ combination of $i$ markers with null effect chosen from the total set of $m$ markers) has null effect, and let $I_{\delta_{il}}$ be the indicator function of this event. Thus, there are $\binom{m}{i}$ terms in the expansion with $\pi_0$ appearing exactly $i$ times; therefore, each one of these $\binom{m}{i}$ terms is of the form:

$$I_{\delta_{il}}\,\pi_0^{i}(1-\pi_0)^{m-i}\prod_{k:\boldsymbol{g}_k\in\delta_{il0}^{c}}MVN(\boldsymbol{g}_k;\boldsymbol{0},G_k)$$

where $\delta_{il0}$ is the set of marker loci with null effects given $\delta_{il}$, and $\delta_{il0}^{c}$ is its complement, i.e., the set of $m-i$ markers with non-null effect under $\delta_{il}$. Therefore, when expanding $\pi(\boldsymbol{g}\mid G,\pi_0)$ for the heterogeneous marker effect covariance matrix model:


$$\pi(\boldsymbol{g}\mid G)=\sum_{i=0}^{m}\pi_0^{i}(1-\pi_0)^{m-i}\sum_{l=1}^{\binom{m}{i}}I_{\delta_{il}}\prod_{k:\boldsymbol{g}_k\in\delta_{il0}^{c}}MVN(\boldsymbol{g}_k;\boldsymbol{0},G_k),$$

while for the homogeneous marker effect covariance matrix model the expression is the same except that $G_j=G^{0}\ \forall\ j=1,2,\ldots,m$.

Regarding the marginal priors, under a homogeneous covariance matrix of marker effects:

$$\pi(\boldsymbol{g})\propto\sum_{i=0}^{m}\pi_0^{i}(1-\pi_0)^{m-i}\sum_{l=1}^{\binom{m}{i}}I_{\delta_{il}}\int_{\mathcal{P}_{\mathcal{S}}^{+}}|G^{0}|^{-\frac{a+\mathcal{S}+m-i+1}{2}}\exp\left(\frac{-1}{2}tr\left(\left(\boldsymbol{\Sigma}+\sum_{k:\boldsymbol{g}_k\in\delta_{il0}^{c}}\boldsymbol{g}_k\boldsymbol{g}_k'\right)(G^{0})^{-1}\right)\right)dG^{0}$$

$$\propto\sum_{i=0}^{m}\pi_0^{i}(1-\pi_0)^{m-i}\sum_{l=1}^{\binom{m}{i}}I_{\delta_{il}}\,2^{\mathcal{S}(m-i)/2}\,\Gamma_{\mathcal{S}}\left(\frac{a+m-i}{2}\right)\left|\boldsymbol{\Sigma}+\sum_{k:\boldsymbol{g}_k\in\delta_{il0}^{c}}\boldsymbol{g}_k\boldsymbol{g}_k'\right|^{-\left(\frac{a+m-i}{2}\right)}.$$

Hence, marker effects are not marginally independent a priori and their joint marginal prior distribution is a mixture of non-standard distributions with mixing probabilities $\pi_0^{i}(1-\pi_0)^{m-i}$.

For the heterogeneous marker effect covariance matrix model:

$$\pi(\boldsymbol{g})\propto\sum_{i=0}^{m}\pi_0^{i}(1-\pi_0)^{m-i}\sum_{l=1}^{\binom{m}{i}}I_{\delta_{il}}\,2^{-\mathcal{S}i/2}\,\Gamma_{\mathcal{S}}\left(\frac{a}{2}\right)^{i}\Gamma_{\mathcal{S}}\left(\frac{a+1}{2}\right)^{m-i}\prod_{k:\boldsymbol{g}_k\in\delta_{il0}^{c}}\frac{1}{\left(1+\frac{\boldsymbol{g}_k'\boldsymbol{\Sigma}_{*}^{-1}\boldsymbol{g}_k}{a+1-\mathcal{S}}\right)^{\frac{a+1}{2}}}.$$

This is a mixture distribution with mixing probabilities $\pi_0^{i}(1-\pi_0)^{m-i}$. Each component in the mixture is a sum of $\binom{m}{i}$ elements. Each one of these elements is the product of $m-i$ multivariate t distributions with scale matrix $\boldsymbol{\Sigma}_{*}=\frac{1}{a+1-\mathcal{S}}\boldsymbol{\Sigma}$ and $a+1-\mathcal{S}$ degrees of freedom for the non-null vectors of marker effects, and point mass at zero for the $i$ null vectors of marker effects, under event $\delta_{il}$. In this case, marker effects are marginally independent a priori.

2.2 Full conditionals

Only the full conditionals that change with respect to those considered in part I are presented.

$$\pi(\boldsymbol{g}\mid Else)=\sum_{i=0}^{m}\pi_0^{i}(1-\pi_0)^{m-i}\sum_{l=1}^{\binom{m}{i}}I_{\delta_{il}}\,MVN\left(\boldsymbol{g}_{\delta_{il0}^{c}};\left(\frac{W_{\delta_{il0}^{c}}'W_{\delta_{il0}^{c}}}{\sigma^{2}}+G_{\delta_{il0}^{c}}^{-1}\right)^{-1}\frac{W_{\delta_{il0}^{c}}'\boldsymbol{y}}{\sigma^{2}},\left(\frac{W_{\delta_{il0}^{c}}'W_{\delta_{il0}^{c}}}{\sigma^{2}}+G_{\delta_{il0}^{c}}^{-1}\right)^{-1}\right)$$

where $\boldsymbol{g}_{\delta_{il0}^{c}}=(\boldsymbol{g}_{k_1}'\ \cdots\ \boldsymbol{g}_{k_{m-i}}')'$, $k:\boldsymbol{g}_k\in\delta_{il0}^{c}$, corresponds to the vector of dimension $\mathcal{S}(m-i)$ containing the non-null marker effects under $\delta_{il}$, $W_{\delta_{il0}^{c}}$ is the submatrix of the design matrix corresponding to $\boldsymbol{g}_{\delta_{il0}^{c}}$, and $G_{\delta_{il0}^{c}}^{-1}=I_{m-i}\otimes(G^{0})^{-1}$, $i=0,1,\ldots,m$.

Remark 1 Notice that each element in the summation above corresponds to a multivariate normal distribution of dimension $\mathcal{S}(m-i)$ for the markers in $\delta_{il0}^{c}$ and point mass at zero for the markers in $\delta_{il0}$. Thus, in each term, the multivariate normal corresponds to the distribution of the effects of the subset of markers with non-null effects given $\delta_{il}$. Therefore, this joint full conditional distribution of $\boldsymbol{g}$ suggests that, for each marker, the full conditional distribution of $\boldsymbol{g}_j$ (given the data and the other parameters in the model, including the remaining components of $\boldsymbol{g}$) is a spike and slab distribution.

Note that it is easier to deal with $\pi(\boldsymbol{g}_j\mid Else)$ than with $\pi(\boldsymbol{g}\mid Else)$. The full conditional $\pi(\boldsymbol{g}_j\mid Else)$ can be found from $\pi(\boldsymbol{g}\mid Else)$ using Bayes' theorem; however, this can be cumbersome because it requires identifying all the cases in which $\boldsymbol{g}_j=\boldsymbol{0}$ and all the cases in which $\boldsymbol{g}_j\neq\boldsymbol{0}$. An easier way is to derive it using the conditional prior for $\boldsymbol{g}_j$. Details are presented in Appendix A. The final result is:

$$\pi(\boldsymbol{g}_j\mid Else)=p(\boldsymbol{g}_j=\boldsymbol{0}\mid Else)\,I_{\{\boldsymbol{g}_j=\boldsymbol{0}\}}+\left(1-p(\boldsymbol{g}_j=\boldsymbol{0}\mid Else)\right)MVN\!\left(G_{F_j}^{-1}\frac{W_j'}{\sigma^{2}}\left(\boldsymbol{y}-W_{(-j)}\boldsymbol{g}_{(-j)}\right),\,G_{F_j}^{-1}\right)I_{\{\boldsymbol{g}_j\neq\boldsymbol{0}\}}$$

$$G_{F_j}=\frac{W_j'W_j}{\sigma^{2}}+(G^{0})^{-1}$$

$$p(\boldsymbol{g}_j=\boldsymbol{0}\mid Else)=\frac{\pi_0}{\pi_0+(1-\pi_0)\left(|G_{F_j}||G^{0}|\right)^{-1/2}\exp\left(\frac{1}{2\sigma^{2}}\left\|G_{*F_j}^{-1/2}W_j'\left(\boldsymbol{y}-W_{(-j)}\boldsymbol{g}_{(-j)}\right)\right\|^{2}\right)}$$

where $G_{*F_j}=\sigma^{2}G_{F_j}=W_j'W_j+\sigma^{2}(G^{0})^{-1}$. Thus, the full conditional distribution of $\boldsymbol{g}_j$ is a spike and slab distribution where the slab component is a $MVN\!\left(G_{F_j}^{-1}\frac{W_j'}{\sigma^{2}}(\boldsymbol{y}-W_{(-j)}\boldsymbol{g}_{(-j)}),\,G_{F_j}^{-1}\right)$ and the spike is a point mass at $\boldsymbol{0}$ in $\mathbb{R}^{\mathcal{S}}$. On the other hand,

$$\pi(G^{0}\mid Else)\propto\sum_{i=0}^{m}\pi_0^{i}(1-\pi_0)^{m-i}\sum_{l=1}^{\binom{m}{i}}I_{\delta_{il}}\,|G^{0}|^{-\frac{m-i+a+\mathcal{S}+1}{2}}\exp\left(\frac{-1}{2}tr\left(\left(\boldsymbol{\Sigma}+\sum_{k:\boldsymbol{g}_k\in\delta_{il0}^{c}}\boldsymbol{g}_k\boldsymbol{g}_k'\right)(G^{0})^{-1}\right)\right),$$

a mixture of sums of inverse Wishart distributions with mixing probabilities $\pi_0^{i}(1-\pi_0)^{m-i}$, $i=0,1,\ldots,m$. The $i^{th}$ component of the mixture is a sum of $\binom{m}{i}$ inverse Wishart distributions with parameters $\left(m-i+a,\ \boldsymbol{\Sigma}+\sum_{k:\boldsymbol{g}_k\in\delta_{il0}^{c}}\boldsymbol{g}_k\boldsymbol{g}_k'\right)I_{\delta_{il}}$, $l=1,2,\ldots,\binom{m}{i}$.
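Evaluating the spike probability $p(\boldsymbol{g}_j=\boldsymbol{0}\mid Else)$ is inexpensive, which is what makes single-site updates of the marker effects practical. The following minimal sketch evaluates that probability for one marker in the homogeneous case; the function name, data and hyperparameter values are arbitrary illustrations, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(3)

def spike_prob(Wj, resid, sigma2, G0, pi0):
    """p(g_j = 0 | Else): resid stands for y - W_(-j) g_(-j)."""
    G_Fj = Wj.T @ Wj / sigma2 + np.linalg.inv(G0)
    G_star = sigma2 * G_Fj                    # G*_Fj = W_j'W_j + sigma^2 (G0)^-1
    r = Wj.T @ resid
    quad = r @ np.linalg.solve(G_star, r)     # ||G*_Fj^{-1/2} W_j' resid||^2
    ratio = (np.linalg.det(G_Fj) * np.linalg.det(G0)) ** (-0.5) \
            * np.exp(quad / (2.0 * sigma2))
    return pi0 / (pi0 + (1.0 - pi0) * ratio)

Wj = rng.choice([-1.0, 0.0, 1.0], size=(6, 2))   # marker-j columns, S = 2
resid = rng.normal(size=6)
p_half = spike_prob(Wj, resid, sigma2=1.0, G0=0.5 * np.eye(2), pi0=0.5)
p_high = spike_prob(Wj, resid, sigma2=1.0, G0=0.5 * np.eye(2), pi0=0.9)
print(0.0 < p_half < 1.0, p_half < p_high)  # True True
```

The second check reflects that the posterior spike probability is strictly increasing in the prior weight $\pi_0$.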

For the heterogeneous marker effect covariance matrix model, the full conditional $\pi(\boldsymbol{g}_j\mid Else)$ has the same form as for the homogeneous marker effect covariance matrix model, except that now $G_{F_j}=\frac{W_j'W_j}{\sigma^{2}}+G_j^{-1}$ and $G_{*F_j}=W_j'W_j+\sigma^{2}G_j^{-1}$, and

$$\pi(G_j\mid Else)=\begin{cases}IW(a+1,\boldsymbol{\Sigma}+\boldsymbol{g}_j\boldsymbol{g}_j'), & \text{if }\boldsymbol{g}_j\neq\boldsymbol{0}\\ IW(a,\boldsymbol{\Sigma}), & \text{if }\boldsymbol{g}_j=\boldsymbol{0}.\end{cases}$$

The expressions for models with heteroscedastic residuals are very similar and are therefore omitted; they can be found in Appendix A, along with the joint posterior distributions.

2.3 Model comparison

2.3.1 Theoretical approximation to model comparison via Bayes factors and fractional Bayes factors

Here, the term null model refers to simplified versions of the proposed models in two scenarios. The first one corresponds to the case in which all data are pooled and the factor splitting the complete population into subpopulations is ignored. In the second scenario, the complete population is split into subpopulations and each one of them is analyzed independently. The null model corresponding to the first scenario was already presented in part I; for the second scenario, the model for each subpopulation is the same, but only data from the corresponding subpopulation are considered. This model is referred to as the independent subpopulations model.

In order to find a theoretical approach for comparing the full models with their null versions, approximate Bayes factors and fractional Bayes factors are derived in this section. To this end, analytical approximations to the multivariate integrals that have to be solved to find marginal likelihoods are derived. The Laplace approximation (Ghosh et al., 2006) is used to solve some of these multivariate integrals. As will be shown in this section, the use of the Laplace approximation requires the matrix $W$ to be of full column rank. This assumption does not hold in many real-life situations where $m>n$ and therefore this matrix cannot be of full column rank. However, as more individuals are genotyped, the full rank situation will be found more frequently, especially for chips of intermediate density. Notice that for matrix $W$ to be of full rank, the number of observations in each subpopulation cannot be smaller than $m$; therefore, the requirement is that $n_l\geq m\ \forall\ l=1,2,\ldots,\mathcal{S}$. As a matter of fact, in countries like the US there already exist data sets in which the number of genotyped animals exceeds the number of molecular markers in chips like the Illumina 50K (CDCB, 2016). Moreover, in certain cases, filtering or preselection criteria reduce the set of markers included in the analyses, and for populations with a large number of genotyped individuals this could also lead to the full rank scenario. More comments on this are made in the discussion. Therefore, in real-life situations like across-country or across-breed analyses, the condition $n_l\geq m\ \forall\ l=1,2,\ldots,\mathcal{S}$ could be observed, and thus the assumption of matrix $W$ having full column rank could be satisfied. Of course, $n_l\geq m\ \forall\ l=1,2,\ldots,\mathcal{S}$ is not a sufficient condition for matrix $W$ to be of full column rank, but given the structure of this matrix, this will generally be the case except in certain situations, for example, having clones in the same subpopulation.

Bayes factors

Bayes factors have generally been interpreted as measures of the support in favor of a model provided by the data. Lavine and Schervish (1999) showed that Bayes factors actually measure the change in the odds favoring a model once data are observed. The Bayes factor comparing two models denoted as $M_1$ and $M_0$ is defined as:

$$BF_{10}=\frac{f(\boldsymbol{y}\mid M_1)}{f(\boldsymbol{y}\mid M_0)}=\frac{\int_{\Theta_1}\pi_1(\boldsymbol{\theta}_1)f_1(\boldsymbol{y}\mid\boldsymbol{\theta}_1)d\boldsymbol{\theta}_1}{\int_{\Theta_0}\pi_0(\boldsymbol{\theta}_0)f_0(\boldsymbol{y}\mid\boldsymbol{\theta}_0)d\boldsymbol{\theta}_0}$$

where $\boldsymbol{\theta}_i$, $\pi_i(\boldsymbol{\theta}_i)$, $f_i(\boldsymbol{y}\mid\boldsymbol{\theta}_i)$ and $\Theta_i$ are the parameters, prior, likelihood and parameter space under model $i$, respectively, $i=0,1$.
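As a point of reference before the Laplace-based derivations, in simple conjugate settings both marginal likelihoods are available in closed form and the Bayes factor is just a density ratio. The toy sketch below (a single observation with known variance, comparing $M_1:\mu\sim N(0,\tau^2)$ against $M_0:\mu=0$) is our own illustration and is not one of the models compared in this paper:

```python
import numpy as np

def log_normal_pdf(y, var):
    return -0.5 * (np.log(2 * np.pi * var) + y ** 2 / var)

def bf10(y, sigma2, tau2):
    """BF_10 = f(y|M1)/f(y|M0): marginally, f(y|M1) = N(0, sigma2 + tau2)
    and f(y|M0) = N(0, sigma2), so both marginals are closed-form here."""
    return np.exp(log_normal_pdf(y, sigma2 + tau2) - log_normal_pdf(y, sigma2))

print(bf10(0.1, 1.0, 1.0) < 1.0)  # small y: odds shift toward the null
print(bf10(4.0, 1.0, 1.0) > 1.0)  # large y: odds shift toward the alternative
```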

Approximate Bayes factors comparing homogeneous marker effect covariance matrix models (Gaussian and spike and slab priors, homoscedastic residuals) and heterogeneous marker effect covariance matrix models (Gaussian and spike and slab priors, homoscedastic residuals) to their null versions were derived. Also, an approximate Bayes factor comparing the heterogeneous marker effect covariance matrix model with heteroscedastic residuals to the independent subpopulations model was found. These approximate Bayes factors are conditioned on the genotypes (i.e., conditioned on $W$ and $W_0$); therefore, the $\boldsymbol{y}$ component of the likelihood is used. The case when part of $W$ is not observed is treated at the end of this section.

A brief outline of the derivation of these approximate Bayes factors is presented. In each case, model sub-index 1 corresponds to the full model while sub-index 0 denotes the null model. The Bayes factor comparing the homogeneous marker effect covariance matrix model with its null version is denoted $BF_{10GW}$ when a Gaussian prior is posed over $\boldsymbol{g}$ and residuals are homoscedastic. Whenever residuals are heteroscedastic the letter $H$ appears in the subindex, and when the prior posed over $\boldsymbol{g}$ is spike and slab the letter $G$ is replaced by $SS$. Moreover, the superindex $*$ is used to identify models with heterogeneous marker effect covariance matrices. The same subindex notation is used for fractional Bayes factors.

In general, let:

$$BF_{10W}=\frac{f(\boldsymbol{y}\mid W,M_1)}{f(\boldsymbol{y}\mid W_0,M_0)}.$$

For the homogeneous marker effect covariance matrix model, $\boldsymbol{\theta}_1:=(\boldsymbol{\theta},\boldsymbol{\phi})=(\{\boldsymbol{g},\sigma^{2},W\},\{G^{0},P^{*}\})$ and $\boldsymbol{\theta}_0:=(\boldsymbol{\theta}_0,\boldsymbol{\phi}_0)=(\{\boldsymbol{g}_0,\sigma^{2},W_0\},\{\sigma_g^{2},\boldsymbol{p}_0^{*}\})$. Let $\mathbb{R}_{+}$ denote the positive reals. Then

$$\pi(\boldsymbol{y},\boldsymbol{\theta}_1)=f(\boldsymbol{y}\mid\boldsymbol{\theta})\pi(\boldsymbol{\theta},\boldsymbol{\phi})=f(\boldsymbol{y}\mid\boldsymbol{g},\sigma^{2},W)\pi(\boldsymbol{g},G^{0})\pi(\sigma^{2})\pi(W,P^{*}),$$

and hence

$$f(\boldsymbol{y}\mid W)=\int_{\mathcal{P}_{\mathcal{S}}^{+}}\int_{\mathbb{R}^{m\mathcal{S}}}\int_{\mathbb{R}_{+}}f(\boldsymbol{y}\mid\boldsymbol{g},\sigma^{2},W)\pi(\boldsymbol{g},G^{0})\pi(\sigma^{2})\,d\sigma^{2}\,d\boldsymbol{g}\,dG^{0}=\int_{\mathcal{P}_{\mathcal{S}}^{+}}\pi(G^{0})\left(\int_{\mathbb{R}^{m\mathcal{S}}}\int_{\mathbb{R}_{+}}f(\boldsymbol{y}\mid\boldsymbol{g},\sigma^{2},W)\pi(\boldsymbol{g}\mid G^{0})\pi(\sigma^{2})\,d\sigma^{2}\,d\boldsymbol{g}\right)dG^{0}.$$

Thus, the previous multiple integral has to be solved in order to find $f(\boldsymbol{y}\mid W)$. An analytic expression for the inner integral $\int_{\mathbb{R}^{m\mathcal{S}}}\int_{\mathbb{R}_{+}}f(\boldsymbol{y}\mid\boldsymbol{g},\sigma^{2},W)\pi(\boldsymbol{g}\mid G^{0})\pi(\sigma^{2})\,d\sigma^{2}\,d\boldsymbol{g}$ is obtained using the Laplace approximation (Ghosh et al., 2006). As shown in Appendix B, once this approximation is obtained, the outer integral can be found in closed form. The Laplace method is based on a second-order Taylor series expansion and allows finding an approximation to integrals of the form:

$$I=\int_{\mathbb{R}^{p}}q(\boldsymbol{\theta})e^{nh(\boldsymbol{\theta})}d\boldsymbol{\theta},$$

where $q$ and $h$ are smooth functions of $\boldsymbol{\theta}$ and $h$ has a unique maximum at $\hat{\boldsymbol{\theta}}$. In practice, $nh(\boldsymbol{\theta})$ is usually taken to be the log-likelihood or the log of the unnormalized posterior; hence, $\hat{\boldsymbol{\theta}}$ can be the MLE or the posterior mode when the posterior is unimodal. The Laplace approximation has the form (Ghosh et al., 2006):

$$I=e^{nh(\hat{\boldsymbol{\theta}})}(2\pi)^{p/2}n^{-p/2}\left|\Delta_h(\hat{\boldsymbol{\theta}})\right|^{-1/2}q(\hat{\boldsymbol{\theta}})\left(1+O(n^{-1})\right),$$

where $p=\dim(\boldsymbol{\theta})$ and $|\Delta_h(\hat{\boldsymbol{\theta}})|$ is the determinant of the Hessian matrix of $-h$ evaluated at $\hat{\boldsymbol{\theta}}$. The inner integral in $f(\boldsymbol{y}\mid W)$ can be written as:

$$\int_{\mathbb{R}^{m\mathcal{S}}}\int_{\mathbb{R}_{+}}\pi(\boldsymbol{g}\mid G^{0})\pi(\sigma^{2})e^{\ln f(\boldsymbol{y}\mid\boldsymbol{g},\sigma^{2},W)}\,d\sigma^{2}\,d\boldsymbol{g}:=\int_{\mathbb{R}^{m\mathcal{S}+1}}q(\boldsymbol{\theta}^{*})e^{nh(\boldsymbol{\theta}^{*})}d\boldsymbol{\theta}^{*},$$

where $\boldsymbol{\theta}^{*}:=(\boldsymbol{g},\sigma^{2})$.
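As a sanity check of the approximation formula, in one dimension with $h(\theta)=-\theta^{2}/2$, $q\equiv 1$ and maximizer $\hat{\theta}=0$, the integrand is exactly Gaussian and the Laplace formula recovers the true value $\sqrt{2\pi/n}$. The sketch below (names are ours) verifies this numerically:

```python
import numpy as np

def laplace_approx(q, h, neg_hess_h, theta_hat, n, p=1):
    """I ~ e^{n h(theta_hat)} (2 pi)^{p/2} n^{-p/2} |Hess(-h)|^{-1/2} q(theta_hat),
    with the Hessian of -h a scalar here (p = 1)."""
    return (np.exp(n * h(theta_hat)) * (2.0 * np.pi) ** (p / 2.0)
            * n ** (-p / 2.0) * neg_hess_h(theta_hat) ** (-0.5) * q(theta_hat))

n = 50
approx = laplace_approx(q=lambda t: 1.0, h=lambda t: -t ** 2 / 2.0,
                        neg_hess_h=lambda t: 1.0, theta_hat=0.0, n=n)

# Brute-force Riemann sum of the integral of e^{-n t^2 / 2} over a wide grid.
grid = np.linspace(-5.0, 5.0, 200001)
numeric = np.sum(np.exp(-n * grid ** 2 / 2.0)) * (grid[1] - grid[0])
print(abs(approx - numeric) < 1e-6)  # True: exact here, up to quadrature error
```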

Under the assumption that $f(\boldsymbol{y}\mid\boldsymbol{g},\sigma^{2},W)$ has a unique maximum at $\hat{\boldsymbol{\theta}}^{*}:=(\hat{\boldsymbol{g}},\hat{\sigma}^{2})$, the Laplace approximation can be used. The $\boldsymbol{y}$ component of the likelihood is a $MVN(W\boldsymbol{g},\sigma^{2}I)$ density. Therefore, following standard results from linear model theory, if $W$ is of full column rank, then $\hat{\boldsymbol{g}}=(W'W)^{-1}W'\boldsymbol{y}$ is the MLE of $\boldsymbol{g}$, and $\hat{\sigma}^{2}=\frac{\|\boldsymbol{y}-W\hat{\boldsymbol{g}}\|^{2}}{n}=\frac{\boldsymbol{y}'(I-H_W)\boldsymbol{y}}{n}=\frac{(n-r)}{n}S^{2}$ is the MLE of $\sigma^{2}$, where $S^{2}=\frac{\boldsymbol{y}'(I-H_W)\boldsymbol{y}}{n-r}$ is the usual unbiased estimator of $\sigma^{2}$, $r=rank(W'W)=m\mathcal{S}$, and $H_W=W(W'W)^{-1}W'$ is the projection matrix onto the column space of $W$.
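These standard least squares quantities are straightforward to compute; a short sketch with arbitrary simulated data (not the paper's simulation):

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 30, 4
W = rng.normal(size=(n, p))                 # full column rank with probability 1
y = rng.normal(size=n)

g_hat = np.linalg.solve(W.T @ W, W.T @ y)   # MLE of g
H_W = W @ np.linalg.inv(W.T @ W) @ W.T      # projection onto C(W)
SSR = y @ (np.eye(n) - H_W) @ y             # y'(I - H_W)y
sigma2_hat = SSR / n                        # MLE of sigma^2
S2 = SSR / (n - p)                          # unbiased estimator of sigma^2

# SSR equals the squared residual norm ||y - W g_hat||^2
print(np.isclose(SSR, np.sum((y - W @ g_hat) ** 2)))  # True
```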

After computing all the required expressions and making algebraic simplifications (see Appendix B), it follows that:

$$BF_{10GW}\approx\left(\frac{|\boldsymbol{\Sigma}|}{b}\right)^{\frac{a}{2}}\left(\frac{\left|\boldsymbol{\Sigma}+\sum_{j=1}^{m}\hat{\boldsymbol{g}}_j\hat{\boldsymbol{g}}_j'\right|}{b+\sum_{j=1}^{m}\hat{g}_{0j}^{2}}\right)^{-\left(\frac{a+m}{2}\right)}\left(\frac{SSR}{SSR_0}\right)^{-\left(\frac{n+v+2}{2}\right)}\frac{SSR^{(m\mathcal{S}+2)/2}}{SSR_0^{(m+2)/2}}\left(\frac{|W_0'W_0|}{|W'W|}\right)^{\frac{1}{2}}\times\exp\left(\frac{-n\tau^{2}}{2}\left(\frac{1}{SSR}-\frac{1}{SSR_0}\right)\right)\left(\frac{1}{n}\right)^{\frac{m(\mathcal{S}-1)}{2}}\left(\prod_{l=2}^{\mathcal{S}}\frac{\Gamma\left(\frac{a+m+1-l}{2}\right)}{\Gamma\left(\frac{a+1-l}{2}\right)}\right)$$

where $SSR=\boldsymbol{y}'(I-H_W)\boldsymbol{y}$, $SSR_0=\boldsymbol{y}'(I-H_{W_0})\boldsymbol{y}$, $H_{W_0}=W_0(W_0'W_0)^{-1}W_0'$, $S_0^{2}=\frac{\|\boldsymbol{y}-W_0\hat{\boldsymbol{g}}_0\|^{2}}{n-r_0}$, $r_0=rank(W_0'W_0)=m$, and $\hat{\boldsymbol{g}}_0=(W_0'W_0)^{-1}W_0'\boldsymbol{y}$.

Following similar steps (see Appendix B),

$$BF_{10GW}^{*}=\frac{f(\boldsymbol{y}\mid W,M_1^{*})}{f(\boldsymbol{y}\mid W_0,M_0^{*})}\approx\left(\frac{|\boldsymbol{\Sigma}|}{b}\right)^{\frac{am}{2}}\prod_{j=1}^{m}\left(\frac{\left|\boldsymbol{\Sigma}+\hat{\boldsymbol{g}}_j\hat{\boldsymbol{g}}_j'\right|}{b+\hat{g}_{0j}^{2}}\right)^{-\left(\frac{a+1}{2}\right)}\left(\frac{SSR}{SSR_0}\right)^{-\left(\frac{n+v+2}{2}\right)}\frac{SSR^{(m\mathcal{S}+2)/2}}{SSR_0^{(m+2)/2}}\left(\frac{|W_0'W_0|}{|W'W|}\right)^{\frac{1}{2}}\times\exp\left(\frac{-n\tau^{2}}{2}\left(\frac{1}{SSR}-\frac{1}{SSR_0}\right)\right)\left(\frac{1}{n}\right)^{\frac{m(\mathcal{S}-1)}{2}}\left(\prod_{l=2}^{\mathcal{S}}\frac{\Gamma\left(\frac{a+2-l}{2}\right)}{\Gamma\left(\frac{a+1-l}{2}\right)}\right)^{m}.$$

$$BF_{10GWH}^{*}\approx\left(\frac{|\boldsymbol{\Sigma}|}{b^{\mathcal{S}}}\right)^{\frac{am}{2}}\prod_{j=1}^{m}\left(\frac{\left|\boldsymbol{\Sigma}+\hat{\boldsymbol{g}}_j\hat{\boldsymbol{g}}_j'\right|}{\prod_{l=1}^{\mathcal{S}}\left(b+\hat{g}_{0j_l}^{2}\right)}\right)^{-\left(\frac{a+1}{2}\right)}\left(\frac{\Gamma\left(\frac{a}{2}\right)}{\Gamma\left(\frac{a+1}{2}\right)}\prod_{l=2}^{\mathcal{S}}\frac{\Gamma\left(\frac{a+2-l}{2}\right)}{\Gamma\left(\frac{a+1-l}{2}\right)}\right)^{m}$$

$$BF_{10SSW}\approx(2\pi)^{m(\mathcal{S}+1)/2}\left(\frac{SSR}{SSR_0}\right)^{-\left(\frac{n+v+2}{2}\right)}\frac{SSR^{(m\mathcal{S}+2)/2}}{SSR_0^{(m+2)/2}}\left(\frac{|W_0'W_0|}{|W'W|}\right)^{\frac{1}{2}}\times\exp\left(\frac{-n\tau^{2}}{2}\left(\frac{1}{SSR}-\frac{1}{SSR_0}\right)\right)\left(\frac{1}{n}\right)^{\frac{m(\mathcal{S}-1)}{2}}\left(\frac{|\boldsymbol{\Sigma}|}{b}\right)^{\frac{a}{2}}\frac{\Gamma\left(\frac{a}{2}\right)}{\Gamma_{\mathcal{S}}\left(\frac{a}{2}\right)}\times\frac{\sum_{i=0}^{m}\sum_{l=1}^{\binom{m}{i}}I_{\delta_{il}}\pi_0^{i}(1-\pi_0)^{m-i}2^{\mathcal{S}(m-i)/2}\,\Gamma_{\mathcal{S}}\left(\frac{a+m-i}{2}\right)\left|\boldsymbol{\Sigma}+\sum_{k:\boldsymbol{g}_k\in\delta_{il0}^{c}}\hat{\boldsymbol{g}}_k\hat{\boldsymbol{g}}_k'\right|^{-\left(\frac{a+m-i}{2}\right)}}{\sum_{i=0}^{m}\sum_{l=1}^{\binom{m}{i}}I_{\delta_{il}}\pi_0^{i}(1-\pi_0)^{m-i}2^{(m-i)/2}\,\Gamma\left(\frac{a+m-i}{2}\right)\left(b+\sum_{k:\boldsymbol{g}_k\in\delta_{il0}^{c}}\hat{g}_{k0}^{2}\right)^{-\left(\frac{a+m-i}{2}\right)}}$$

$$BF_{10SSW}^{*}\approx(2\pi)^{m(\mathcal{S}+1)/2}\left(\frac{SSR}{SSR_0}\right)^{-\left(\frac{n+v+2}{2}\right)}\frac{SSR^{(m\mathcal{S}+2)/2}}{SSR_0^{(m+2)/2}}\left(\frac{|W_0'W_0|}{|W'W|}\right)^{\frac{1}{2}}\times\exp\left(\frac{-n\tau^{2}}{2}\left(\frac{1}{SSR}-\frac{1}{SSR_0}\right)\right)\left(\frac{1}{n}\right)^{\frac{m(\mathcal{S}-1)}{2}}\left(\left(\frac{|\boldsymbol{\Sigma}|}{b}\right)^{\frac{a}{2}}\frac{\Gamma\left(\frac{a}{2}\right)}{\Gamma_{\mathcal{S}}\left(\frac{a}{2}\right)}\right)^{m}\times\frac{\sum_{i=0}^{m}\sum_{l=1}^{\binom{m}{i}}I_{\delta_{il}}\pi_0^{i}(1-\pi_0)^{m-i}2^{-\mathcal{S}i/2}\left(\Gamma_{\mathcal{S}}\left(\frac{a}{2}\right)\right)^{i}\left(\Gamma_{\mathcal{S}}\left(\frac{a+1}{2}\right)\right)^{m-i}\prod_{k:\boldsymbol{g}_k\in\delta_{il0}^{c}}\left(1+\hat{\boldsymbol{g}}_k'\boldsymbol{\Sigma}^{-1}\hat{\boldsymbol{g}}_k\right)^{-\left(\frac{a+1}{2}\right)}}{\sum_{i=0}^{m}\sum_{l=1}^{\binom{m}{i}}I_{\delta_{il}}\pi_0^{i}(1-\pi_0)^{m-i}2^{-i/2}\left(\Gamma\left(\frac{a}{2}\right)\right)^{i}\left(\Gamma\left(\frac{a+1}{2}\right)\right)^{m-i}\prod_{k:\boldsymbol{g}_k\in\delta_{il0}^{c}}\left(1+\hat{g}_{0k}^{2}b^{-1}\right)^{-\left(\frac{a+1}{2}\right)}}$$

Before presenting fractional Bayes factors, the following result comparing $SSR_0$ and $SSR$ in the particular case of our models is presented and proven. This result will be used in the discussion section to help in the interpretation of Bayes factors and fractional Bayes factors.

Result 1 For the models considered in this study, the following inequality holds: $SSR_0\geq SSR$.

Proof

Let $SSM_1=\boldsymbol{y}'H_W\boldsymbol{y}$ and $SSM_0=\boldsymbol{y}'H_{W_0}\boldsymbol{y}$. Thus, proving that $SSR_0\geq SSR$ is equivalent to proving that $SSM_1\geq SSM_0$. Let $C(W_0)$ be the column space of $W_0$ and $C(W)$ the column space of $W$. It is now proven that $C(W_0)\preccurlyeq C(W)$, where the notation $C(W_0)\preccurlyeq C(W)$ means that $C(W_0)$ is a subspace of $C(W)$. Let $\boldsymbol{z}\in C(W_0)$; then $\exists\,\boldsymbol{a}\in\mathbb{R}^{m}$ such that $\boldsymbol{z}=W_0\boldsymbol{a}$, that is,

$$\boldsymbol{z}=\begin{bmatrix}W_1\boldsymbol{a}\\ \vdots\\ W_{\mathcal{S}}\boldsymbol{a}\end{bmatrix}.$$

Similarly, let $\boldsymbol{w}\in C(W)$; then $\exists\,\boldsymbol{b}\in\mathbb{R}^{m\mathcal{S}}$ such that $\boldsymbol{w}=W\boldsymbol{b}$. Without loss of generality, vector $\boldsymbol{b}$ can be partitioned as $\boldsymbol{b}=(\boldsymbol{b}_1',\ldots,\boldsymbol{b}_{\mathcal{S}}')'$, where $\boldsymbol{b}_l\in\mathbb{R}^{m}\ \forall\ l=1,2,\ldots,\mathcal{S}$. Then $\boldsymbol{w}$ is of the form

$$\boldsymbol{w}=\begin{bmatrix}W_1\boldsymbol{b}_1\\ \vdots\\ W_{\mathcal{S}}\boldsymbol{b}_{\mathcal{S}}\end{bmatrix}.$$

In particular, if $\boldsymbol{b}_l=\boldsymbol{a}\ \forall\ l=1,2,\ldots,\mathcal{S}$, it follows that $\boldsymbol{z}$ also has the form of an element of $C(W)$, that is, $\boldsymbol{z}\in C(W)$. Conversely, a general $\boldsymbol{w}$ cannot be written as a linear combination of the columns of $W_0$; therefore, $C(W_0)\preccurlyeq C(W)$. Applying Theorem B.47 of Christensen (2011), it follows that $H_W-H_{W_0}$ is an orthogonal projection matrix. By properties of orthogonal projections (Harville, 2000), it follows that $H_W-H_{W_0}$ is a positive semi-definite matrix, and consequently $\boldsymbol{y}'(H_W-H_{W_0})\boldsymbol{y}\geq 0\iff\boldsymbol{y}'H_W\boldsymbol{y}\geq\boldsymbol{y}'H_{W_0}\boldsymbol{y}$.
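Result 1 can also be checked numerically: with $W_0$ obtained by stacking the subpopulation blocks (pooled model) and $W$ block-diagonal (one effect vector per subpopulation), the residual sums of squares satisfy $SSR_0\geq SSR$. A minimal sketch with arbitrary dimensions and data:

```python
import numpy as np

rng = np.random.default_rng(0)

def hat_matrix(X):
    """Projection H_X = X (X'X)^+ X' onto the column space of X
    (pseudo-inverse, so the non-full-rank case is also covered)."""
    return X @ np.linalg.pinv(X.T @ X) @ X.T

S, m, n_l = 3, 4, 10                      # S subpopulations, m markers
blocks = [rng.choice([-1, 0, 1], size=(n_l, m)) for _ in range(S)]
W0 = np.vstack(blocks)                    # pooled (null) design: one common g
W = np.zeros((S * n_l, S * m))            # full design: one g_l per subpopulation
for l, B in enumerate(blocks):
    W[l * n_l:(l + 1) * n_l, l * m:(l + 1) * m] = B

y = rng.normal(size=S * n_l)
SSR = y @ (np.eye(S * n_l) - hat_matrix(W)) @ y
SSR0 = y @ (np.eye(S * n_l) - hat_matrix(W0)) @ y
print(SSR0 >= SSR)  # True: C(W0) is a subspace of C(W)
```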

Fractional Bayes factors

O'Hagan (1994; 1995) proposed a non-subjective Bayes factor known as the fractional Bayes factor, which uses a fraction $c$ of the likelihood, resulting in a “partial” Bayes factor of the following form:

$$FBF_{10}=BF_{10}\frac{\int_{\Theta_0}\pi_0(\boldsymbol{\theta}_0)\left(f_0(\boldsymbol{y}\mid\boldsymbol{\theta}_0)\right)^{c}d\boldsymbol{\theta}_0}{\int_{\Theta_1}\pi_1(\boldsymbol{\theta}_1)\left(f_1(\boldsymbol{y}\mid\boldsymbol{\theta}_1)\right)^{c}d\boldsymbol{\theta}_1}.$$

Thus, given $W$, the fractional Bayes factor for the homogeneous marker effect covariance matrix model with homoscedastic residuals and Gaussian prior for $\boldsymbol{g}$ has the form:

$$FBF_{10GW}=BF_{10GW}\frac{f_c(\boldsymbol{y}\mid W_0,M_{0G})}{f_c(\boldsymbol{y}\mid W,M_{1G})}=BF_{10GW}\frac{\int_{\mathbb{R}_{+}}\pi(\sigma_g^{2})\left(\int_{\mathbb{R}^{m}}\int_{\mathbb{R}_{+}}\left(f_0(\boldsymbol{y}\mid\boldsymbol{g}_0,\sigma_e^{2},W_0)\right)^{c}\pi(\boldsymbol{g}_0\mid\sigma_g^{2})\pi(\sigma_e^{2})\,d\sigma_e^{2}\,d\boldsymbol{g}_0\right)d\sigma_g^{2}}{\int_{\mathcal{P}_{\mathcal{S}}^{+}}\pi(G^{0})\left(\int_{\mathbb{R}^{m\mathcal{S}}}\int_{\mathbb{R}_{+}}\left(f_1(\boldsymbol{y}\mid\boldsymbol{g},\sigma^{2},W)\right)^{c}\pi(\boldsymbol{g}\mid G^{0})\pi(\sigma^{2})\,d\sigma^{2}\,d\boldsymbol{g}\right)dG^{0}}.$$

Hence, $\ln(f_i(\boldsymbol{y}\mid\boldsymbol{\theta}_i))^{c}$, $i=0,1$, and their corresponding Hessian matrices evaluated at the MLE have to be found in order to obtain the Laplace approximations to the integrals inside the brackets in the numerator and denominator of $FBF_{10GW}$. This is easily done because $\ln(f_i(\boldsymbol{y}\mid\boldsymbol{\theta}_i))^{c}=c\ln f_i(\boldsymbol{y}\mid\boldsymbol{\theta}_i)$. The determinants of the negative Hessian matrices are now denoted $\tilde{D}_0$ and $\tilde{D}_1$, and they satisfy $\tilde{D}_0=c^{m+1}D_0$ and $\tilde{D}_1=c^{m\mathcal{S}+1}D_1$. The approximate $FBF_{10GW}$ is denoted $\overline{FBF}_{10GW}$.

The fractional Bayes factors derived in this study were $\overline{FBF}_{10SSW}$, $\overline{FBF}_{10SSW}^{*}$ and $\overline{FBF}_{10GHW}^{*}$. It turned out that $\overline{FBF}_{10GW}^{*}=\overline{FBF}_{10GW}$ because the components making $\overline{BF}_{10GW}^{*}$ different from $\overline{BF}_{10GW}$ cancelled when multiplying them by $\frac{f_c(\boldsymbol{y}\mid W_0,M_0)}{f_c(\boldsymbol{y}\mid W,M_1)}$ and $\frac{f_c(\boldsymbol{y}\mid W_0,M_0^{*})}{f_c(\boldsymbol{y}\mid W,M_1^{*})}$, respectively. For details on the derivation see Appendix B. Moreover, the same cancellation happened when deriving $\overline{FBF}_{10SSW}$ and $\overline{FBF}_{10SSW}^{*}$. The resulting expression was:

$$\overline{FBF}_{10GW}=\overline{FBF}_{10GW}^{*}=\overline{FBF}_{10SSW}=\overline{FBF}_{10SSW}^{*}:=\overline{FBF}_{10W}=c^{m(\mathcal{S}-1)/2}\left(\frac{SSR}{SSR_0}\right)^{\frac{n(c-1)}{2}}\frac{SSR^{(m\mathcal{S}+2)/2}}{SSR_0^{(m+2)/2}}.$$
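Since the approximate fractional Bayes factor reduces to a closed-form function of $SSR$, $SSR_0$, $n$, $m$, $\mathcal{S}$ and $c$, it is trivial to evaluate. The helper below is an illustrative sketch of that final expression on the log scale (to avoid overflow); the numerical inputs are arbitrary, and the choice $c=n_0/n$ for a minimal training sample size $n_0$ is one convention discussed in the fractional Bayes factor literature:

```python
import numpy as np

def log_fbf_10w(ssr, ssr0, n, m, S, c):
    """log of FBF_10W = c^{m(S-1)/2} (SSR/SSR0)^{n(c-1)/2}
       * SSR^{(mS+2)/2} / SSR0^{(m+2)/2}."""
    return (0.5 * m * (S - 1) * np.log(c)
            + 0.5 * n * (c - 1) * (np.log(ssr) - np.log(ssr0))
            + 0.5 * (m * S + 2) * np.log(ssr)
            - 0.5 * (m + 2) * np.log(ssr0))

val = log_fbf_10w(ssr=40.0, ssr0=55.0, n=100, m=10, S=2, c=10 / 100)
print(val)
```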

Notice that in the case $m>n$, where $W$ and $W_0$ are not of full column rank, this expression is invariant to the choice of the generalized inverses $(W'W)^{-}$ and $(W_0'W_0)^{-}$. This follows from the uniqueness of the projection operator onto the column space of $W$, $H_W$ (Harville, 2000), which implies that $SSR$ and $SSR_0$ are invariant to the choice of the generalized inverses. The approximate fractional Bayes factor $\overline{FBF}_{10GHW}^{*}$ was equal to 1 (see Appendix B for details); thus, it does not provide information for comparing the corresponding models.

Based on the fact that $\overline{FBF}_{10W}$ is invariant to the choice of generalized inverses of $W'W$ and $W_0'W_0$ when $m>n$, a brief discussion about the possible use of this criterion in the non-full-rank case is provided in Appendix C. The issue is that the derivation that led to the fractional Bayes factor in the full rank case cannot be applied to the non-full-rank case because $|W'W|=|W_0'W_0|=0$ and $(W'W)^{-1}$ and $(W_0'W_0)^{-1}$ do not exist. Although expressions involving these quantities cancel later on, it is clear that the derivations presented in Appendix B do not justify using $\overline{FBF}_{10W}$ in the non-full-rank case.

These Bayes factors are useful for carrying out the conventional analysis conditioned on $W$, that is, conditioned on the observed genotypes. When part of $W$ is not observed, the joint distribution of $\boldsymbol{y}$ and $W^{N}$ given $W^{\mathcal{O}}$ can be obtained, and then summing over the set $\mathcal{G}_N$ yields Bayes factors and fractional Bayes factors conditioned on $W^{\mathcal{O}}$. Recall that $BF_{10W}=\frac{f(\boldsymbol{y}\mid W,M_1)}{f(\boldsymbol{y}\mid W_0,M_0)}$; to find $BF_{10W^{\mathcal{O}}}=\frac{f(\boldsymbol{y}\mid W^{\mathcal{O}},M_1)}{f(\boldsymbol{y}\mid W_0^{\mathcal{O}},M_0)}$ the following computation has to be performed:

$$f(\boldsymbol{y}\mid W^{\mathcal{O}},M_1)=\sum_{\mathcal{G}_N}\pi(\boldsymbol{y},W^{N}\mid W^{\mathcal{O}},M_1)=\sum_{\mathcal{G}_N}f(\boldsymbol{y}\mid W,M_1)\pi(W^{N}\mid W^{\mathcal{O}},M_1)=\sum_{\mathcal{G}_N}\left\{f(\boldsymbol{y}\mid W,M_1)\int_{\Omega}\pi(W^{N}\mid W^{\mathcal{O}},P^{*})\pi(P^{*})dP^{*}\right\}.$$

For 풓 known:
\[
\pi(W^N|W^{\mathcal{o}}, M_1) = \prod_{j=1}^{m} \int_{\Omega_j^{\boldsymbol{r}}} \pi(\boldsymbol{w}_j^N|\boldsymbol{w}_j^{\mathcal{o}}, \boldsymbol{p}_j)\,\pi(\boldsymbol{p}_j|\boldsymbol{r})\,d\boldsymbol{p}_j
\]
where $\Omega_j^{\boldsymbol{r}} := \{\boldsymbol{p}_j \in \mathbb{R}^{\mathcal{S}} \,|\, 0 < p_{lj} \le r_l\ \forall\, l,\ \sum_{l=1}^{\mathcal{S}} r_l = 1\}$ and $\Omega^{\boldsymbol{r}} = \Omega_1^{\boldsymbol{r}} \times \cdots \times \Omega_m^{\boldsymbol{r}}$ is the support of the distribution of 푃. For all 푗, the pmf $\pi(\boldsymbol{w}_j^N|\boldsymbol{w}_j^{\mathcal{o}}, \boldsymbol{p}_j)$ can be found using Bayes' theorem as $\pi(\boldsymbol{w}_j^N|\boldsymbol{w}_j^{\mathcal{o}}, \boldsymbol{p}_j) = \pi(\boldsymbol{w}_j|\boldsymbol{p}_j)/\pi(\boldsymbol{w}_j^{\mathcal{o}}|\boldsymbol{p}_j)$, but computing $\pi(\boldsymbol{w}_j^{\mathcal{o}}|\boldsymbol{p}_j)$ requires $\sum_{\mathcal{G}_N} \pi(\boldsymbol{w}_j|\boldsymbol{p}_j)$, which can be unfeasible from the computational point of view. Alternatively, $\pi(W^N|W^{\mathcal{o}}, P^*)$ can be derived from first principles by noticing that the dependence on $W^{\mathcal{o}}$ comes from the term where genotypes of individuals are conditioned on parental genotypes, and then proceeding as in section 2.1.1 of part I.
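The terms $\pi(w_{i'j}|w_{S_{i'}j}, w_{D_{i'}j})$ that condition an individual's genotype on its parents' genotypes are Mendelian transmission probabilities. A small sketch of how they can be computed, under the assumption that genotypes are coded as counts of a reference allele (0, 1, 2) and each parent transmits one of its two alleles at random:

```python
def mendel_prob(child, sire, dam):
    """P(child genotype | sire, dam), genotypes coded as counts of a
    reference allele (0, 1, 2); each parent transmits the reference
    allele with probability (own count)/2, independently."""
    ps, pd = sire / 2.0, dam / 2.0
    probs = {
        0: (1 - ps) * (1 - pd),
        1: ps * (1 - pd) + (1 - ps) * pd,
        2: ps * pd,
    }
    return probs[child]
```

For two heterozygous parents this reproduces the familiar 1/4, 1/2, 1/4 genotype ratios.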

Using the expressions derived in section 2.2.1 of part I and assuming 풓 known:
\[
\pi(W^N|W^{\mathcal{o}}, M_1) \propto \prod_{j=1}^{m} \int_{\Omega_j^{\boldsymbol{r}}} p_{(\mathcal{S}+1)j}^{\alpha_{\mathcal{S}+1}-1} \prod_{l=1}^{\mathcal{S}} \left\{ \frac{2^{n_{lN}^{H}}}{r_l^{2f_{ljN}}}\, p_{lj}^{n_{lN}^{B_j}+\alpha_l-1} (r_l - p_{lj})^{n_{lN}^{A_j}} \prod_{i'=f_{ljN}+1}^{n_{ljN}} \pi(w_{i'j}|w_{S_{i'}j}, w_{D_{i'}j}) \right\} d\boldsymbol{p}_j
\]

where $f_{ljN}$ is the number of founders with missing genotypes at locus 푗 in subpopulation 푙, and $n_{ljN}$ is the total number of individuals with missing genotypes at locus 푗 in subpopulation 푙. Given that $\prod_{i'=f_{ljN}+1}^{n_{ljN}} \pi(w_{i'j}|w_{S_{i'}j}, w_{D_{i'}j})$ does not depend on $P^*$, the problem of finding $\pi(W^N|W^{\mathcal{o}}, M_1)$ involves the evaluation of 푚 integrals of the form:


\[
\int_{\Omega_j^{\boldsymbol{r}}} p_{(\mathcal{S}+1)j}^{\alpha_{\mathcal{S}+1}-1} \prod_{l=1}^{\mathcal{S}} \left\{ p_{lj}^{n_{lN}^{B_j}+\alpha_l-1} (r_l - p_{lj})^{n_{lN}^{A_j}} \right\} d\boldsymbol{p}_j.
\]
This integral corresponds to the expectation of the function $\prod_{l=1}^{\mathcal{S}} (r_l - p_{lj})^{n_{lN}^{A_j}}$ of the random vector $\boldsymbol{p}_j$ taken over $\pi(\boldsymbol{p}_j|\boldsymbol{r})$. It does not have a closed-form solution, but these integrals could be evaluated numerically in order to find a numerical approximation to $\pi(W^N|W^{\mathcal{o}}, M_1)$. A similar situation occurs when 풓 is not known; that is, integrals with no closed-form solution have to be evaluated in order to find $\pi(W^N|W^{\mathcal{o}}, M_1)$.
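Such integrals can be approximated, for instance, by plain Monte Carlo quadrature over the box $\prod_l (0, r_l]$. A sketch, assuming numpy; the integrand below is a stand-in with the same product form, and the hyper-parameter values are arbitrary:

```python
import math
import numpy as np

def mc_box_integral(alpha, n_exp, r, draws=200_000, seed=1):
    """Plain Monte Carlo estimate of
       integral over (0, r_1] x ... x (0, r_S] of
       prod_l p_l^(alpha_l - 1) (r_l - p_l)^(n_l) dp
    (integrand shape chosen for illustration)."""
    rng = np.random.default_rng(seed)
    alpha, n_exp, r = (np.asarray(v, float) for v in (alpha, n_exp, r))
    p = rng.uniform(0.0, r, size=(draws, r.size))
    f = np.prod(p ** (alpha - 1.0) * (r - p) ** n_exp, axis=1)
    return float(np.prod(r) * f.mean())

# S = 1 sanity check: int_0^r p^(a-1) (r-p)^n dp = r^(a+n) * Beta(a, n+1)
est = mc_box_integral([2.0], [3.0], [0.5])
exact = 0.5 ** 5 * math.gamma(2) * math.gamma(4) / math.gamma(6)
```

The one-dimensional case has a closed form against which the estimator can be validated before it is used for the 풮-dimensional integrals that do not.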

Notice that matrices 푊 and 푊0 contain the same random variables, but in different arrays. Consequently, $W^N$ and $W^{\mathcal{o}}$ are the same in both cases, and the analytic form of $\pi(W_0|\boldsymbol{p}_0)$ can be easily derived from $\pi(W|P^*)$ by setting $\mathcal{S} = 1$ and taking into account that the prior posed over $\boldsymbol{p}_0$ is the product of 푚 $Beta(\alpha, \beta)$ densities.

\[
f(\boldsymbol{y}|W^{\mathcal{o}}, M_0) = \sum_{\mathcal{G}_N} f(\boldsymbol{y}|W_0, M_0)\,\pi(W^N|W^{\mathcal{o}}, M_0)
\]
where $\pi(W^N|W^{\mathcal{o}}, M_0) = \int_{\Omega_0} \pi(W^N|W^{\mathcal{o}}, \boldsymbol{p}_0)\,\pi(\boldsymbol{p}_0)\,d\boldsymbol{p}_0$ and $\Omega_0 = [0,1] \times [0,1] \times \cdots \times [0,1]$, an 푚-dimensional unit hypercube. Also,
\[
\pi(W^N|W^{\mathcal{o}}, \boldsymbol{p}_0) = 2^{n_N^H} \prod_{j=1}^{m} \left\{ p_j^{n_N^{B_j}} (1 - p_j)^{n_N^{A_j}} \prod_{i'=f_{Nj}+1}^{n_{Nj}} \pi(w_{i'j}|w_{S_{i'}j}, w_{D_{i'}j}) \right\};
\]

then, using the fact that $n_N^{B_j} + n_N^{A_j} = 2f_{Nj}$ (which is twice the total number of founders with missing genotypes at locus 푗), it follows that:
\[
\pi(W^N|W^{\mathcal{o}}, M_0) = \frac{2^{n_N^H}}{B(\alpha,\beta)^m} \prod_{j=1}^{m} \left\{ \prod_{i'=f_{Nj}+1}^{n_{Nj}} \pi(w_{i'j}|w_{S_{i'}j}, w_{D_{i'}j}) \int_0^1 p_j^{n_N^{B_j}+\alpha-1} (1-p_j)^{n_N^{A_j}+\beta-1}\,dp_j \right\}
= \frac{2^{n_N^H}}{B(\alpha,\beta)^m} \prod_{j=1}^{m} \left\{ \prod_{i'=f_{Nj}+1}^{n_{Nj}} \pi(w_{i'j}|w_{S_{i'}j}, w_{D_{i'}j}) \frac{\Gamma(n_N^{B_j}+\alpha)\,\Gamma(n_N^{A_j}+\beta)}{\Gamma(2f_{Nj}+\alpha+\beta)} \right\},
\]

where $n_{Nj}$ is the total number of individuals with missing genotypes at locus 푗. Applying properties of the Gamma function (Casella and Berger, 2002; Kosmala, 2004), this can be reduced to (see Appendix A):
\[
\pi(W^N|W^{\mathcal{o}}, M_0) = 2^{n_N^H} \prod_{j=1}^{m} \left\{ \frac{\prod_{k=1}^{n_N^{B_j}} (n_N^{B_j} - k + \alpha) \prod_{k=1}^{n_N^{A_j}} (n_N^{A_j} - k + \beta)}{\prod_{k=1}^{2f_{Nj}} (2f_{Nj} - k + \alpha + \beta)} \prod_{i'=f_{Nj}+1}^{n_{Nj}} \pi(w_{i'j}|w_{S_{i'}j}, w_{D_{i'}j}) \right\}.
\]
Therefore, in the case $\mathcal{S} = 1$, there is an explicit expression for $\pi(W^N|W^{\mathcal{o}}, M_0)$.
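The reduction from the Gamma-function form to the finite products can be checked numerically. A quick sketch with arbitrary counts satisfying $n_N^{B_j} + n_N^{A_j} = 2f_{Nj}$:

```python
import math

def gamma_form(nB, nA, f, a, b):
    """Beta-integral form: Gamma(nB+a) Gamma(nA+b) / (Gamma(2f+a+b) B(a,b))."""
    B = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return math.gamma(nB + a) * math.gamma(nA + b) / (math.gamma(2 * f + a + b) * B)

def product_form(nB, nA, f, a, b):
    """Same quantity written with finite products, as in Appendix A:
    Gamma(n + x) / Gamma(x) = prod_{k=1}^{n} (n - k + x)."""
    num = math.prod(nB - k + a for k in range(1, nB + 1))
    num *= math.prod(nA - k + b for k in range(1, nA + 1))
    den = math.prod(2 * f - k + a + b for k in range(1, 2 * f + 1))
    return num / den
```

The product form avoids evaluating Gamma functions at large arguments, which is why it is the more convenient expression computationally.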

Notice that obtaining an approximation to the pdf $f(\boldsymbol{y}|W^{\mathcal{o}}, M_1)$ involves computing $SSR$, $\hat{\boldsymbol{g}}$, $|\boldsymbol{\Sigma} + \sum_{j=1}^{m} \hat{\boldsymbol{g}}_j \hat{\boldsymbol{g}}_j'|^{(a+m)/2}$ and $|W'W|$ for every possible value of $W^N$. Thus, this could be computationally unfeasible even for small or moderate sample sizes and chip densities.

Regarding the interpretation of Bayes factors, their values can be classified according to the recommendations of Raftery (1996). This author proposed a scale to interpret Bayes factors based on a previous scale proposed by Jeffreys (1961); however, Raftery's scale is more granular and more conservative (Raftery, 1996). The scale is as follows: if $BF_{10} < 1$, the evidence is negative (i.e., against model 1); values between 1 and 3 indicate that the evidence for model 1 is not worth more than a bare mention; values between 3 and 20 indicate positive evidence in favor of model 1; values between 20 and 150 indicate strong evidence in favor of model 1; and values greater than 150 suggest very strong evidence for model 1.
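A small helper for the scale above (boundary values are assigned to the weaker category here; the verbal scale itself leaves them open):

```python
def raftery_evidence(bf10):
    """Interpret a Bayes factor BF_10 on Raftery's (1996) scale."""
    if bf10 < 1:
        return "negative (against model 1)"
    if bf10 <= 3:
        return "not worth more than a bare mention"
    if bf10 <= 20:
        return "positive"
    if bf10 <= 150:
        return "strong"
    return "very strong"
```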

2.3.2 Deviance information criterion


As in part I, another criterion used to compare models is the Deviance Information Criterion (DIC; Spiegelhalter et al., 2002). It combines a measure of goodness of fit based on the posterior distribution with a penalty for model complexity. In part I it was shown that for our family of models the DIC can be written as the sum of two components, one computed from the 풚 component of the likelihood and the other from the 푊 component of the likelihood:
\[
DIC = -2\log f(\boldsymbol{y}|W^{\mathcal{o}}, \hat{W}_B^N, \hat{\boldsymbol{g}}_B, \hat{R}_B) + 2p_{DIC-\boldsymbol{y}} - 2\log f(W^{\mathcal{o}}|\hat{W}_B^N, \hat{P}_B^*) + 2p_{DIC-W} := DIC_{\boldsymbol{y}} + DIC_W
\]
where $p_{DIC-\boldsymbol{y}} = 2\big(\log f(\boldsymbol{y}|W^{\mathcal{o}}, \hat{W}_B^N, \hat{\boldsymbol{g}}_B, \hat{R}_B) - E_{W^N,\boldsymbol{g},R,P^*|\boldsymbol{y},W^{\mathcal{o}}}[\log f(\boldsymbol{y}|W, \boldsymbol{g}, R)]\big)$ and $p_{DIC-W} = 2\big(\log f(W^{\mathcal{o}}|\hat{W}_B^N, \hat{P}_B^*) - E_{W^N,P^*|\boldsymbol{y},W^{\mathcal{o}}}[\log f(W^{\mathcal{o}}|W^N, P^*)]\big)$.
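Given MCMC output, each DIC component reduces to the log-likelihood at the Bayes estimates and its posterior average over the draws. A generic sketch with illustrative numbers only:

```python
import numpy as np

def dic_component(loglik_at_bayes_est, loglik_draws):
    """One DIC component (y or W):
       p_DIC = 2 * (log f(y|theta_hat_B) - E_post[log f(y|theta)]),
       DIC   = -2 * log f(y|theta_hat_B) + 2 * p_DIC.
    Returns (DIC, p_DIC)."""
    p_dic = 2.0 * (loglik_at_bayes_est - float(np.mean(loglik_draws)))
    return -2.0 * loglik_at_bayes_est + 2.0 * p_dic, p_dic

# total DIC is the sum of the y and W components
dic_y, p_y = dic_component(-9.0, [-10.0, -12.0])
```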

2.4 Analysis of simulated data
With the aim of providing an example of the implementation of some of the proposed models and comparing their performance, the two small simulated datasets described in part I were used here as well. For the sake of completeness, some minor details about the simulation are provided. After simulating a historical population using a forward-in-time approach, subpopulations were created using individuals from the historical population as founders. Each subpopulation had different selection criteria, selection pressures, and mating systems. Dataset 1 comprised three subpopulations with different numbers of generations; migration was allowed and the heritability of the trait was high. Dataset 2 consisted of two subpopulations with two generations each; migration was not allowed and the heritability of the trait was low (see Table 2 of the companion paper for further details). These simulations were performed using the software QMSim (Sargolzaei and Schenkel, 2013). For further details, see part I.

These datasets were used to carry out analyses using the following models: spike and slab prior and heterogeneous marker effect covariance matrices with 휋0 = 0.5, 휋0 = 0.9 and 휋0 = 0.2, and their null versions. All models assumed homoscedastic residuals. In the results and discussion sections, results from the models fitted to these datasets in part I will also be considered. The models fit in part I were: multivariate Gaussian prior with homogeneous marker effect covariance matrices and multivariate Gaussian prior with heterogeneous marker effect covariance matrices, both with homoscedastic residuals. Not all models were used to analyze these data, for the following reasons. Firstly, given that the simulations did not consider heteroscedastic residuals, only models with homoscedastic residuals were fit. Secondly, some models have computational issues that make their implementation intractable. This is the case for models with a spike and slab prior over 품 with homogeneous marker effect covariance matrices. In these models, the full conditional distribution of the covariance matrix 퐺0 involves all the combinations of 푖 out of 푚 markers with null effects for 푖 = 0, 1, …, 푚; therefore, it is not easy to sample from 휋(퐺0|퐸푙푠푒) because the number of combinations is exponential in 푚. As shown in section 2.2.2, for the model with heterogeneous marker effect covariance matrices it is easy to sample from the full conditional distribution of the covariance matrix of each marker locus, which makes its implementation possible.

Data were analyzed using the MCMC algorithm described in part I, assuming that $\boldsymbol{r} = (1/\mathcal{S}, \ldots, 1/\mathcal{S})$ and using the product of $\mathcal{S}$ independent uniform$(0, 1/\mathcal{S})$ distributions as proposal for 휋(푃|퐸푙푠푒). The following criteria for model comparison were computed: approximate Bayes factors and fractional Bayes factors derived in section 2.3.1, the squared correlation between predicted breeding values and phenotypes in the testing populations (predictive ability), squared correlations between true and predicted breeding values in the testing and training populations (accuracy), and DIC.
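The correlation-based criteria are straightforward to compute from predicted breeding values. A sketch with toy vectors (in the paper these correlations are computed within the testing and training subpopulations):

```python
import numpy as np

def squared_corr(a, b):
    """Squared Pearson correlation: used for predictive ability
    (predicted breeding values vs. phenotypes) and for accuracy
    (predicted vs. true breeding values)."""
    return float(np.corrcoef(a, b)[0, 1] ** 2)

# toy example: nearly linear relation gives a squared correlation near 1
r2 = squared_corr([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
```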

The hyper-parameter 휋0 was assumed to be given. In practice, values close to 1 are used, reflecting the belief that many of the SNP do not have an effect. Alternatively, this parameter can be tuned, or a prior can be posed over it in order to reflect uncertainty. Here, three values of this parameter were implemented in the analyses: 0.9, 0.5 and 0.2. This does not correspond to a tuning procedure; it was done only for illustrative purposes. The three values were chosen to reflect situations in which the prior belief is that a high proportion of marker loci do not have an effect (휋0 = 0.9), approximately half of them have an effect (휋0 = 0.5), and a high proportion of markers have an effect (휋0 = 0.2). In dataset 2, the full genotypes of three individuals (one founder from each subpopulation and a non-founder from subpopulation 1) were not included in the analysis in order to simulate the case of missing genotypes.

For each analysis, 20,000 iterations were run, with the first 10,000 discarded as burn-in. In-house R scripts (R Core Team, 2015) were written to accommodate spike and slab priors and to compute Bayes factors, fractional Bayes factors and DIC. Analyses were performed using the University of Florida's high-performance computing cluster.

3. Results

3.1 Bayes factors

Using the expressions derived in section 2.3.1, approximate Bayes factors and fractional Bayes factors were computed for dataset 1. Recall that $\overline{FBF}_{10GW} = \overline{FBF}_{10GW}^{*} = \overline{FBF}_{10SSW} = \overline{FBF}_{10SSW}^{*}$; therefore, the same expression permits the comparison of models $M_{1G}$, $M_{1G}^{*}$, $M_{1SS}$ and $M_{1SS}^{*}$ with their corresponding null models. For the same reason that makes sampling from the full conditional distribution of $G_0$ under model $M_{1SS}$ difficult, approximate Bayes factors for models with spike and slab priors were not computed. According to Raftery's scale, $\overline{BF}_{10GW}$ and $\overline{BF}_{10GW}^{*}$ suggested very strong evidence in favor of all full models (they were greater than 150) in dataset 1. The same result was found with the fractional Bayes factor, which was computed with 푐 = 0.5.

In dataset 2, computation of Bayes factors was not possible because 푚 > 푛1. Furthermore, even though only three individuals were assumed to be non-genotyped and the number of markers was small, computation of the fractional Bayes factor was not performed due to its computational demands. All evidence provided by the approximate fractional Bayes factors computed using the posterior means of $W^N$ (which can be seen as a sort of plug-in criterion) was against the full models; that is, all fractional Bayes factors were smaller than 1.

3.2 DIC, predictive ability and accuracies of predicted breeding values

In dataset 1, $DIC_W$ is common to all full models and to all null models, i.e., there are only two values. This is due to the fact that there were no missing genotypes (see part I for details). The values were 4717671 for full models and 6589105 for null models; that is, information coming from observed genotypes provided evidence in favor of the full models. This means that in this population, genotypic data supported the assumption of heterogeneous and correlated allelic frequencies over the competing assumption that allelic frequencies are the same in all subpopulations.

Tables 2 and 3 contain DIC values for datasets 1 and 2, respectively, whereas Table 4 shows predictive abilities and accuracies for the two datasets. In Tables 2 to 4, the abbreviations for the models fitted to datasets 1 and 2 are as follows: $M_{1G}$ = full model with multivariate Gaussian prior and homogeneous marker effect covariance matrices; $M_{1G}^{*}$ = full model with multivariate Gaussian prior and heterogeneous marker effect covariance matrices; $M_{1SS0.5}^{*}$ = full model with spike and slab prior, 휋0 = 0.5, and heterogeneous marker effect covariance matrices; $M_{1SS0.9}^{*}$ = full model with spike and slab prior, 휋0 = 0.9, and heterogeneous marker effect covariance matrices; $M_{1SS0.2}^{*}$ = full model with spike and slab prior, 휋0 = 0.2, and heterogeneous marker effect covariance matrices. The remaining models, with subindex 1 replaced by 0, correspond to the null versions of the corresponding full models.

Table 2. 풚 component and total DIC for dataset 1

Model        DIC_y      Total DIC
M_1G         33702.55   4751373.55
M*_1G        11599.05   4729270.05
M*_1SS0.5    11604.09   4729275.09
M*_1SS0.9    11648.94   4729319.94
M*_1SS0.2    11437.05   4729108.05
M_0G         15396.32   6604501.32
M*_0G        13008.42   6602113.42
M*_0SS0.5    12502.17   6601607.17
M*_0SS0.9    12625.29   6601730.29
M*_0SS0.2    12137.88   6601242.88

Therefore, according to the component of total DIC computed from the 풚 component of the likelihood, except for the models with homogeneous marker effect covariance matrices ($M_{1G}$ and $M_{0G}$), full models should be preferred over their null versions in this dataset. When considering total DIC, all full models had a smaller DIC. Additionally, the model with the smallest DIC, and therefore the one to be preferred, was model $M_{1SS0.2}^{*}$, followed by model $M_{1G}^{*}$. Notwithstanding, the DIC values for models $M_{1G}^{*}$, $M_{1SS0.5}^{*}$, $M_{1SS0.9}^{*}$ and $M_{1SS0.2}^{*}$ were close.

Table 3. 풚 component, 푊 component and total DIC for dataset 2

Model        DIC_y    DIC_W     Total DIC
M_1G         1314.0   38367.4   39681.4
M*_1G        1328.8   38356.4   39684.2
M*_1SS0.5    1313.6   38394.9   39708.5
M*_1SS0.9    1304.8   38382.7   39687.5
M*_1SS0.2    1323.4   38373.8   39697.2
M_0G         1365.6   38180.3   39545.9
M*_0G        1370.1   38179.0   39549.1
M*_0SS0.5    1350.4   38173.4   39523.8
M*_0SS0.9    1361.2   38195.8   39557.0
M*_0SS0.2    1245.5   38178.4   39432.9

In this dataset, the two components of the DIC, and therefore the total DIC values, were similar for all models. The 풚 components of DIC were smaller for the full models, except for the model with a spike and slab prior for 품 and 휋0 = 0.2. Conversely, the 푊 components, as well as the total DIC values, were smaller for the null models.

Table 4. Predictive abilities and accuracies in datasets 1 and 2

             Predictive ability      Accuracy in testing     Accuracy in training
             ------------------      population              population
Model        Dataset 1   Dataset 2   Dataset 1   Dataset 2   Dataset 1   Dataset 2
M_1G         0.29        0.019       0.27        0.04        0.32        0.17
M*_1G        0.76        0.016       0.83        0.03        0.94        0.21
M*_1SS0.5    0.81        0.017       0.88        0.04        0.92        0.19
M*_1SS0.9    0.81        0.018       0.88        0.04        0.90        0.14
M*_1SS0.2    0.79        0.016       0.86        0.03        0.94        0.20
M_0G         0.53        0.004       0.50        0.07        0.55        0.24
M*_0G        0.83        0.013       0.88        0.05        0.88        0.23
M*_0SS0.5    0.72        0.003       0.77        0.06        0.86        0.24
M*_0SS0.9    0.69        0.008       0.76        0.05        0.85        0.20
M*_0SS0.2    0.72        0.009       0.79        0.05        0.79        0.24

According to the behavior of predictive abilities in dataset 1, the performance of the different models was similar except for $M_{1G}$. The model with the best predictive ability was $M_{0G}^{*}$, while model $M_{1G}$ had the worst. The accuracies in testing dataset 1 showed a pattern similar to that of the predictive abilities. The performance of the models was similar, except for model $M_{1G}$, which performed the worst when predicting breeding values, and model $M_{0G}$, which had the worst performance among the null models. The highest accuracies of predicted breeding values in testing population 1 were observed for models $M_{1SS0.5}^{*}$, $M_{1SS0.2}^{*}$ and $M_{0G}^{*}$. Finally, the accuracies of predicted breeding values in the training population showed the same behavior as the other measures: a poorer performance for models with a homogeneous covariance matrix (or variance, for null models) of marker effects, with model $M_{1G}$ having the smallest accuracy. The models with the highest accuracies were $M_{1G}^{*}$ and $M_{1SS0.2}^{*}$.


For dataset 2, predictive abilities and accuracies in the testing sets were very low. Accuracies in the training set were slightly larger. These measures based on squared correlations did not show marked differences between models. Full models had higher predictive abilities and smaller accuracies in the testing and training sets.

4. Discussion

4.1 General features of the models

The set of hierarchical Bayesian models for simultaneous genome-wide prediction in several subpopulations accounting for randomness of genotypes developed in part I was extended by incorporating spike and slab priors. The slab components of the conditional priors for marker effects were 풮-variate Gaussian distributions with homogeneous or heterogeneous covariance matrices (or variances), and the spike component was a multivariate point mass at zero for full models and a univariate point mass at zero for null models. Then, in order to provide general criteria for comparing the proposed models with some of their null versions, approximate Bayes factors and fractional Bayes factors were derived under the assumption that $n_l \ge m\ \forall\, l = 1, 2, \ldots, \mathcal{S}$, and the possible use of fractional Bayes factors for the case 푚 > 푛 was briefly discussed. These Bayes factors and fractional Bayes factors are approximations because some of the multiple integrals required to find the marginal distribution of the data given a model were approximated via the Laplace method.

Spike and slab priors assign positive mass at zero; therefore, models using this class of priors can be used for variable selection, and they induce a stronger shrinkage towards zero (Gianola, 2013; Xu and Ghosh, 2015). Our spike and slab models can perform variable selection at the marker level; that is, it is assumed that a given marker either has effects in all subpopulations or has no effect in any subpopulation. In statistics, this is known as sparsity at the group level (Xu and Ghosh, 2015). Xu and Ghosh (2015) reparameterized the coefficients of the multiple linear regression as the product of a positive diagonal matrix and a vector, i.e., $g_j := V_g b_j$, $j = 1, 2, \ldots, m$. They then posed independent univariate spike and slab priors on the elements of the positive diagonal matrix and independent multivariate spike and slab priors on $b_j$. This strategy induces two kinds of sparsity: at the group level and within groups. Thus, an extension of our models that would induce sparsity at the group (i.e., marker) and within-group levels would be to consider conditional priors similar to those developed in Xu and Ghosh (2015). Then a given marker would have positive probability of having null effects in only a proper subset of subpopulations.
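A loose sketch of a prior draw exhibiting both levels of sparsity, in the spirit of the Xu and Ghosh (2015) reparameterization $g_j = V_g b_j$; this is a simplification for illustration, not their exact hierarchy, and all hyper-parameter names are ours:

```python
import numpy as np

def draw_effects(m, S, pi_group, pi_within, sigma=1.0, seed=3):
    """Illustrative prior draw: pi_group is the probability a whole marker
    (group) is null; surviving markers get an S-variate Gaussian slab whose
    components are individually zeroed with probability pi_within."""
    rng = np.random.default_rng(seed)
    G = np.zeros((m, S))
    for j in range(m):
        if rng.random() < pi_group:                       # group-level spike
            continue
        b = rng.normal(0.0, sigma, size=S)                # S-variate Gaussian slab
        v = (rng.random(S) >= pi_within).astype(float)    # within-group indicators
        G[j] = v * b
    return G
```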

Uncertainty about the hyper-parameter 휋0 can be accounted for by posing a prior over it. A usual choice is a Beta distribution or its special case, the Uniform(0,1). Implementation of this approach in the models presented here is straightforward; it implies adding one more level in the hierarchy. In this case, the question that arises is the impact of this on inferences. Using the Kullback-Leibler divergence, Lehmann and Casella (1998, Theorem 5.7) provide theoretical justification for the idea that parameters in lower levels of the hierarchy have a smaller impact on inference. Notwithstanding, this does not imply that the impact of this extra level in the hierarchy is negligible; therefore, if prior knowledge about 휋0 is poor or null, it may be worth accounting for uncertainty. As mentioned before, this parameter can alternatively be tuned.
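Under a Beta prior, this extra level is conjugate: given the vector of spike/slab indicators, the full conditional of 휋0 is again Beta. A sketch, assuming 휋0 is the prior probability of the spike (i.e., of a null marker), as in the text:

```python
import numpy as np

def update_pi0(delta, a=1.0, b=1.0, rng=None):
    """One Gibbs update: pi0 | delta ~ Beta(a + #null, b + #non-null),
    where delta[j] = 1 if marker j is currently in the spike (null)
    component.  Beta(1, 1) is the Uniform(0, 1) special case."""
    rng = np.random.default_rng(0) if rng is None else rng
    delta = np.asarray(delta)
    n_null = int(delta.sum())
    return float(rng.beta(a + n_null, b + delta.size - n_null))

# with 900 of 1000 markers currently null, the draw concentrates near 0.9
pi0_draw = update_pi0(np.r_[np.ones(900), np.zeros(100)])
```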

Regarding approximate Bayes factors and fractional Bayes factors, those derived here were obtained via the Laplace approximation, which has an error of order $O(n^{-1})$ (Ghosh et al., 2006). This means that the approximation error is bounded from above by a constant times $n^{-1}$. There is a refinement based on the Laplace method that yields an approximation with error of order $O(n^{-2})$ when $q(\boldsymbol{\theta})$ is a positive function (Tierney and Kadane, 1986), a condition that is always satisfied in the context of this study (see section 2.3.1). This refinement could be implemented to obtain more accurate approximations of Bayes factors and fractional Bayes factors.
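The Laplace method expands the log-integrand around its mode; a classical one-line illustration is Stirling's approximation to $\Gamma(n+1) = \int_0^\infty e^{n\log t - t}\,dt$, whose $O(n^{-1})$ relative error matches the rate quoted above:

```python
import math

def laplace_gamma(n):
    """Laplace approximation to Gamma(n+1) = int exp(n*log t - t) dt.
    The exponent's mode is at t = n with second derivative -1/n there,
    giving the Stirling form sqrt(2*pi*n) * (n/e)^n."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n
```

The relative error shrinks roughly like $1/(12n)$, so doubling $n$ roughly halves it, consistent with the $O(n^{-1})$ rate.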

Other authors, e.g., Raftery (1996) and Lewis and Raftery (1997), have also used the Laplace method or modifications of it (DiCiccio et al., 1997) to derive approximate Bayes factors. The following comments regarding the algebraic expressions of the Bayes factors and fractional Bayes factors are made for a given dataset, that is, given 풚, 푛, 푚 and $W^{\mathcal{o}}$. It is well known that for nested models (i.e., when the null model corresponds to the full model with some parameters set to zero) $SSR_0 > SSR$ (Searle, 1971). In this case the models are not nested; therefore, this standard result cannot be used. However, Result 1 establishes the relationship between $SSR_0$ and $SSR$ for our models.

Thus, by Result 1, the following component of the algebraic expression for $BF_{10GW}$ is always greater than or equal to 1, and as a consequence it never provides evidence against model 1:
\[
\left(\frac{SSR}{SSR_0}\right)^{-\left(\frac{n+v+2}{2}\right)}.
\]
Conversely, for $n \ge 2$ the following component is always smaller than or equal to 1; that is, it never provides evidence in favor of model 1:
\[
\exp\!\left(\frac{-n\tau^2}{2}\left(\frac{1}{SSR} - \frac{1}{SSR_0}\right)\right)\left(\frac{1}{n}\right)^{\frac{m(\mathcal{S}-1)}{2}}.
\]
Of course, the strength of the evidence in favor of or against model 1 (when $SSR_0 > SSR$) depends on the observed data. Both expressions depend on the data and on the hyper-parameters assigned to the residual variance. On the other hand, the following expression

\[
\left(\frac{|W_0'W_0|}{|W'W|}\right)^{1/2} = \left(\frac{|W_1'W_1 + W_2'W_2 + \cdots + W_{\mathcal{S}}'W_{\mathcal{S}}|}{|W_1'W_1|\,|W_2'W_2| \cdots |W_{\mathcal{S}}'W_{\mathcal{S}}|}\right)^{1/2},
\]

depends only on the data. However, there are no general results establishing the relationship between the determinants inside the parentheses, and this is why it cannot be established whether this component is always smaller or greater than 1. Of course, these determinants are always positive because of the assumption that all submatrices $W_1, \ldots, W_{\mathcal{S}}$ are of full column rank. Thus, whether this component favors model 1 or not depends on each dataset. The following component depends on both the priors and the data:

\[
\left(\prod_{l=2}^{\mathcal{S}} \frac{\Gamma\!\left(\frac{a+m+1-l}{2}\right)}{\Gamma\!\left(\frac{a+1-l}{2}\right)}\right) \frac{\left|\boldsymbol{\Sigma} + \sum_{j=1}^{m} \hat{\boldsymbol{g}}_j \hat{\boldsymbol{g}}_j'\right|^{-\left(\frac{a+m}{2}\right)}}{\left(b + \sum_{j=1}^{m} \hat{g}_{0j}^2\right)^{-\left(\frac{a+m}{2}\right)}} \frac{SSR^{(m\mathcal{S}+2)/2}}{SSR_0^{(m+2)/2}}.
\]


The relative value of this component with respect to 1 cannot be established; thus, as with the previous component, its contribution to the evidence in favor of or against model 1 varies with each dataset. A similar situation occurs with $BF_{10GW}^{*}$ and $BF_{10GWH}^{*}$, while for $BF_{10SSW}$ and $BF_{10SSW}^{*}$ there are new terms, induced by the spike and slab priors posed over 품 and 품0, whose relative value with respect to 1 depends on the observed data. However, the following statement can be made for the term involving Gamma functions. In its positive domain, the Gamma function has a minimum at approximate coordinates (1.461, 0.885) (Kosmala, 2004), which implies that beyond 1.461 the function is increasing. Furthermore, as $x \downarrow 0$, $\Gamma(x) \to \infty$. Note that for $l = 2, 3, \ldots, \mathcal{S}$ and $a > \mathcal{S} - 1$ (recall that the inverse Wishart distribution requires this condition), $\frac{a+m+1-l}{2} > \frac{m}{2}$ and $\frac{a+1-l}{2} > 0$. Therefore, given that in genome-wide prediction 푚 has an order of magnitude of at least $10^2$, for values of 푎 such that $a + 1 - \mathcal{S} \ge 1.461$ this term is always greater than 1.
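The Gamma-function term is best evaluated on the log scale with `math.lgamma` to avoid overflow; for 푚 of the sizes used in genome-wide prediction and $a + 1 - \mathcal{S} \ge 1.461$, its log is positive, consistent with the statement above (the values of 푎, 푚 and 풮 below are arbitrary examples):

```python
import math

def log_gamma_term(a, m, S):
    """log of  prod_{l=2}^{S} Gamma((a+m+1-l)/2) / Gamma((a+1-l)/2),
    computed stably via log-Gamma."""
    return sum(math.lgamma((a + m + 1 - l) / 2) - math.lgamma((a + 1 - l) / 2)
               for l in range(2, S + 1))
```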

Regarding fractional Bayes factors, as mentioned before,
\[
FBF_{10GW} = FBF_{10GW}^{*} = FBF_{10SSW} = FBF_{10SSW}^{*} = c^{m(\mathcal{S}-1)/2}\left(\frac{SSR}{SSR_0}\right)^{\frac{n(c-1)}{2}}\frac{SSR^{(m\mathcal{S}+2)/2}}{SSR_0^{(m+2)/2}}
\]
due to cancellation of the terms that make the approximate Bayes factors differ. Recall that $c \in (0,1)$. As $c \uparrow 1$, with 푚 and 푛 held constant, the fractional Bayes factor approaches $SSR^{(m\mathcal{S}+2)/2}/SSR_0^{(m+2)/2}$. For $c \in (0,1)$ the exponent $\frac{n(c-1)}{2}$ is always negative, and therefore $\left(\frac{SSR}{SSR_0}\right)^{\frac{n(c-1)}{2}}$ never provides evidence against model 1. On the contrary, $c^{m(\mathcal{S}-1)/2}$ provides evidence against model 1; however, as noted before, given 푚 and 풮, as $c \uparrow 1$ the evidence provided by this component becomes negligible because this term approaches 1.
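On the log scale the shared fractional Bayes factor is easy to evaluate, and its $c \uparrow 1$ limit can be checked directly. A sketch of the expression as derived in section 2.3.1:

```python
import math

def log_fbf10(ssr, ssr0, n, m, S, c):
    """log of  c^{m(S-1)/2} (SSR/SSR0)^{n(c-1)/2} SSR^{(mS+2)/2} / SSR0^{(m+2)/2}."""
    return (0.5 * m * (S - 1) * math.log(c)
            + 0.5 * n * (c - 1) * math.log(ssr / ssr0)
            + 0.5 * (m * S + 2) * math.log(ssr)
            - 0.5 * (m + 2) * math.log(ssr0))
```

As c approaches 1, the first two terms vanish and only the SSR ratio term remains, as noted in the text.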


Some recommendations on choosing the value of 푐 are given in O'Hagan (1994) and Ghosh et al. (2006). Finally, the behavior of $SSR^{(m\mathcal{S}+2)/2}/SSR_0^{(m+2)/2}$ depends on the magnitude of the difference between $SSR$ and $SSR_0$ and on the number of subpopulations.

An important aspect of these approximations is that they require $n_l \ge m\ \forall\, l = 1, 2, \ldots, \mathcal{S}$. As discussed in section 2.3.1, the fast growth in the number of genotyped individuals may make this assumption realistic for SNP chips of moderate size (i.e., 50 to 100k). However, the availability of denser chips and full sequences implies that 푚 also grows. On one hand, it is argued that the higher the number of SNP, the better the accuracy of genome-wide predictions, because more LD between markers and QTL is "captured". On the other hand, some studies with real data, such as Vázquez et al. (2010) in Holstein cattle and de los Campos et al. (2013) in humans, have found that using subsets of SNP yields reasonable accuracy of genome-wide predictions. Moreover, the curve relating accuracy to marker density has been reported to reach a plateau for traits such as height in humans (Vázquez et al., 2012), which suggests that in some cases not much accuracy is lost when selecting subsets of SNP using some criteria.

Finally, the ability of our models to include non-genotyped individuals allows a larger 푛, which, combined with the factors mentioned before, increases the likelihood of situations with $n_l \ge m\ \forall\, l = 1, 2, \ldots, \mathcal{S}$. The approximate fractional Bayes factor $\overline{FBF}_{10W}$ could be used in the case 푚 > 푛, but there is no formal mathematical justification for it. A brief discussion with an outline of the steps required to justify its use in that case is provided in Appendix C. Thus, the use of this expression for model comparison in the non-full-rank case has to be seen as an ad hoc approach, because there is no formal proof of its validity yet. Therefore, the question of whether the approximate fractional Bayes factor derived here is also valid in the non-full-rank case remains to be formally answered; refuting this result or establishing a rigorous proof of it is an open problem.

4.2 Simulation results


Our small simulations correspond to two populations comprising three and two subpopulations, respectively. One trait per population was simulated. In both cases, subpopulations had different mating designs, selection criteria, selection pressures and heritabilities. However, these populations display two contrasting scenarios. The first one (dataset 1) corresponded to a population composed of three subpopulations that diverged over several generations; heritabilities were high, migration was allowed, the number of individuals in each subpopulation was larger than the number of SNP, and there were no missing genotypes. Conversely, the second scenario (dataset 2) considered a population composed of two subpopulations that diverged by only two generations; trait heritabilities were low, there was no migration, the number of individuals was smaller than the number of SNP in one subpopulation (hence the model was not of full rank), and there were missing genotypes.

In dataset 1, predictive ability did not suggest a superior predictive capability of the full models, that is, of the models accounting for the potential heterogeneity induced by the existence of subpopulations. As shown in Table 4, its values were very similar across models (except for the model with a homogeneous covariance matrix of marker effects, which had considerably lower predictive ability). In this dataset, the number of marker loci considered in the analysis was equal to the number of QTL; therefore, it could be expected that the smallest value of 휋0 would have the best performance. The different squared correlations between predicted and observed values yielded similar results for the three values of 휋0 used here, with a slightly better performance for the model with the smallest value of 휋0. While this set of correlations did not provide conclusive evidence in favor of the full models, the DIC, Bayes factors and fractional Bayes factors favored the full models.

Due to the low heritabilities in the two subpopulations forming dataset 2, predictive ability and accuracies were very low (Table 4). In this dataset, full models had slightly higher predictive abilities than their null versions. Conversely, accuracies of predicted breeding values in the training and validation datasets suggested a slight superiority of the null models. Total DIC and $DIC_W$ provided evidence in favor of the null models, but the differences were not substantial. In addition, the "plug-in" fractional Bayes factors also gave evidence in favor of the null models. As in part I, the performance of the fitted models was more similar in dataset 2 than in dataset 1.

A broad observation is that, when combining the results obtained here with those obtained in the companion paper, the overall behavior observed in part I was preserved. In general, what was observed in these small simulations was that under the biological scenario simulated in dataset 1, full models tended to have better performance, whereas in the setting simulated in dataset 2, null models tended to outperform full models. In all cases, differences were small (except for models $M_{1G}$ and $M_{0G}$ in dataset 1). Therefore, after including the outputs of the spike and slab models, our results remain in agreement with those found by Olson et al. (2012), Makgahlela et al. (2013), de los Campos et al. (2015) and Lehermeier et al. (2015).

5. Conclusions

This study enlarges the family of hierarchical Bayesian models for across-population genome-wide prediction accounting for randomness of genotypes derived in the companion paper (part I) by considering the so-called spike and slab priors (multivariate and univariate) for marker allele substitution effects. This class of priors allows a stronger shrinkage towards zero and variable selection at the group level. This development adds further flexibility to our family of models because the user has more modeling options with which to cope with a wider spectrum of biological scenarios. For example, for traits controlled by genes with major effects or by a small number of genes, using spike and slab priors is theoretically advantageous.

The approximate Bayes factors and fractional Bayes factors derived here can be used to complement other criteria, such as measures of accuracy of predicted breeding values and correlations between predicted breeding values and phenotypes, when comparing models. These criteria were derived under the assumption of a full rank model, which is currently satisfied in certain populations, and we believe this will become an increasingly frequent situation as more individuals are genotyped. The invariance of our approximate fractional Bayes factor to the choice of the generalized inverses of $W'W$ and $W_0'W_0$ seems promising because it allows the use of this criterion in the non-full rank case. However, a formal justification or rejection of this criterion remains an open problem. For now, this criterion might be used ad hoc, always keeping in mind the risks that this implies.

In addition to all the possible extensions and refinements of our models discussed in the companion paper, the modification of the spike and slab priors presented here to allow sparsity within group (marker) is another aspect that opens a path for further research.

Author Contributions
C.A. Martínez developed modeling strategies, carried out the derivations, wrote the R scripts, designed and performed the simulations, and wrote the paper. K. Khare advised on modeling strategies and reviewed, corrected and discussed the derivations and the statistical aspects of the paper. A. Banerjee advised on modeling strategies and reviewed, corrected and discussed the derivations and the statistical aspects of the paper. M.A. Elzo designed the simulation and reviewed, corrected and discussed the genetic aspects of the paper.

Acknowledgments
The authors thank Dr. Malay Ghosh from the Department of Statistics of the University of Florida for useful comments and discussions, and for pointing out relevant references. C.A. Martínez also thanks Fulbright Colombia and the "Departamento Administrativo de Ciencia, Tecnología e Innovación" (COLCIENCIAS) for supporting his PhD and Master's programs at the University of Florida through a scholarship, and Bibiana Coy for her love, support and constant encouragement.

References

Casella, G., Berger, R.L. (2002). Statistical Inference (2nd ed.). Duxbury, Pacific Grove, CA, USA.
Council on Dairy Cattle Breeding (CDCB) (2016). Genetic evaluations-genotype counts. Retrieved January 10, 2016 from https://www.cdcb.us/Genotype/counts.html.

Christensen, R. (2011). Plane Answers to Complex Questions: The Theory of Linear Models (4th ed.). Springer, New York, NY, USA.
de los Campos, G., Vázquez, A.I., Fernando, R., Klimentidis, Y.C., Sorensen, D. (2013). Prediction of complex human traits using the genomic best linear unbiased predictor. PLOS Genetics, 9(7), e1003608.
de los Campos, G., Veturi, Y., Vázquez, A.I., Lehermeier, C., Pérez-Rodríguez, P. (2015). Incorporating genetic heterogeneity in whole-genome regressions using interactions. Journal of Agricultural, Biological and Environmental Statistics, 20(4), 467-490.
Daetwyler, H.D., Pong-Wong, R., Villanueva, B., Woolliams, J.A. (2012). The impact of genetic architecture on genome-wide evaluation methods. Genetics, 185, 1021-1031.

DiCiccio, T.J., Kass, R.E., Raftery, A., Wasserman, L. (1997). Computing Bayes factors by combining simulations and asymptotic approximations. Journal of the American Statistical Association, 92(439), 903-915.
Ghosh, J.K., Delampady, M., Samanta, T. (2006). An Introduction to Bayesian Analysis: Theory and Methods. Springer, New York, NY, USA.
Gianola, D. (2013). Priors in whole-genome regression: The Bayesian alphabet returns. Genetics, 194, 573-596.
Gianola, D., Rosa, G. (2015). One hundred years of statistical developments in animal breeding. Annual Review of Animal Biosciences, 3, 19-56.
Harville, D. (2000). Matrix Algebra From a Statistician's Perspective. Springer, New York, NY, USA.
Heslot, N., Yang, H.P., Sorrells, M.E., Jannink, J.L. (2012). Genomic selection in plant breeding: A comparison of models. Crop Science, 52, 146-160.

Jeffreys, H. (1961). Theory of Probability (3rd ed.). Oxford University Press, Oxford, UK.
Kosmala, W.A.J. (2004). A Friendly Introduction to Analysis: Single and Multivariable (2nd ed.). Pearson, Upper Saddle River, NJ, USA.
Lavine, M., Schervish, M.J. (1999). Bayes factors: What they are and what they are not. The American Statistician, 53(2), 119-122.
Lehermeier, C., Schön, C.C., de los Campos, G. (2015). Assessment of genetic heterogeneity in structured plant populations using multivariate whole-genome regression models. Genetics, 201, 323-337.
Lehmann, E.L., Casella, G. (1998). Theory of Point Estimation. Springer, New York, NY, USA.
Lewis, S.M., Raftery, A.E. (1997). Estimating Bayes factors via posterior simulation with the Laplace-Metropolis estimator. Journal of the American Statistical Association, 92(438), 648-655.
Makgahlela, M.L., Mäntysaari, E.A., Strandén, I., Koivula, M., Nielsen, U.S., Sillanpää, M.J., Juga, J. (2013). Across breed multi-trait random regression genomic predictions in the Nordic Red dairy cattle. Journal of Animal Breeding and Genetics, 130, 10-19.
Mitchell, T.J., Beauchamp, J.J. (1988). Bayesian variable selection in linear regression. Journal of the American Statistical Association, 83(404), 1023-1032.
O'Hagan, A. (1994). Bayesian Inference. Kendall's Advanced Theory of Statistics, Vol. 2B. Halsted Press, New York, NY, USA.
O'Hagan, A. (1995). Fractional Bayes factors for model comparison. Journal of the Royal Statistical Society, Series B, 57(1), 99-138.
Olson, K.M., VanRaden, P.M., Tooker, M.E. (2012). Multibreed genomic evaluations using purebred Holsteins, Jerseys, and Brown Swiss. Journal of Dairy Science, 95, 5378-5383.
R Core Team (2015). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

Raftery, A.E. (1996). Approximate Bayes factors and accounting for model uncertainty in generalised linear models. Biometrika, 83(2), 251-266.
Sargolzaei, M., Schenkel, F.S. (2013). QMSim User's Guide Version 1.10. Centre for Genetic Improvement of Livestock, Department of Animal and Poultry Science, University of Guelph, Guelph, Canada.
Searle, S.R. (1971). Linear Models. John Wiley & Sons, New York, NY, USA.
Spiegelhalter, D.J., Best, N.G., Carlin, B.P., van der Linde, A. (2002). Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, Series B, 64, 583-639.
Spiegelhalter, D.J., Best, N.G., Carlin, B.P., van der Linde, A. (2014). The deviance information criterion: 12 years on. Journal of the Royal Statistical Society, Series B, 76, 485-493.
Tierney, L., Kadane, J.B. (1986). Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association, 81, 82-86.
van den Berg, S., Calus, M.P.L., Meuwissen, T.H.E., Wientjes, Y.C.J. (2015). Across population genomic prediction scenarios in which Bayesian variable selection outperforms GBLUP. BMC Genetics, 16:416.
Vázquez, A.I., Rosa, G.J.M., Weigel, K.A., de los Campos, G., Gianola, D., Allison, D.B. (2010). Predictive ability of subsets of single nucleotide polymorphisms with and without parent average in US Holsteins. Journal of Dairy Science, 93(12), 5942-5949.
Vázquez, A.I., de los Campos, G., Klimentidis, Y.C., Rosa, G.J.M., Gianola, D., Yi, N., Allison, D.B. (2012). A comprehensive genetic approach for improving prediction of skin cancer risk in humans. Genetics, 192, 1493-1502.
Xu, X., Ghosh, M. (2015). Bayesian variable selection and estimation for group lasso. Bayesian Analysis, 10(4), 909-936.

Appendix A: Joint posteriors, full conditionals and details of some derivations

Joint posteriors

Spike and slab prior for $\mathbf{g}$, homogeneous marker effect covariance matrix and homoscedastic residuals:
\[
\pi(\mathbf{g},\sigma^2,W^{N},G,P \mid \mathbf{y},W^{o}) \propto (\sigma^2)^{-n/2}\exp\left(\frac{-1}{2\sigma^2}(\mathbf{y}-W\mathbf{g})'(\mathbf{y}-W\mathbf{g})\right)
\sum_{i=0}^{m}\sum_{l=1}^{\binom{m}{i}} I_{\delta_{il}}\,\pi_0^{i}(1-\pi_0)^{m-i} \prod_{k:\,\mathbf{g}_k\in\delta_{il0}^{c}} MVN(\mathbf{g}_k;\mathbf{0},G^{0})
\]
\[
\times\, |G^{0}|^{-(a+\mathcal{S}+1)/2}\exp\left(\frac{-1}{2}\,tr\big(\boldsymbol{\Sigma}(G^{0})^{-1}\big)\right)
\times (\sigma^2)^{-(v/2+1)}\exp\left(\frac{-\tau^2}{2\sigma^2}\right)\times \pi(W\mid P^{*})\,\pi(P^{*}).
\]

Spike and slab prior for $\mathbf{g}$, heterogeneous marker effect covariance matrices and homoscedastic residuals:
\[
\pi(\mathbf{g},\sigma^2,W^{N},G,P \mid \mathbf{y},W^{o}) \propto (\sigma^2)^{-n/2}\exp\left(\frac{-1}{2\sigma^2}(\mathbf{y}-W\mathbf{g})'(\mathbf{y}-W\mathbf{g})\right)
\sum_{i=0}^{m}\sum_{l=1}^{\binom{m}{i}} I_{\delta_{il}}\,\pi_0^{i}(1-\pi_0)^{m-i} \prod_{k:\,\mathbf{g}_k\in\delta_{il0}^{c}} MVN(\mathbf{g}_k;\mathbf{0},G_k)
\]
\[
\times \prod_{j=1}^{m}|G_j|^{-(a+\mathcal{S}+1)/2}\exp\left(\frac{-1}{2}\,tr\big(\boldsymbol{\Sigma}G_j^{-1}\big)\right)
\times (\sigma^2)^{-(v/2+1)}\exp\left(\frac{-\tau^2}{2\sigma^2}\right)\times \pi(W\mid P^{*})\,\pi(P^{*}).
\]

Full conditionals

Spike and slab prior for $\mathbf{g}$

Derivations for the heterogeneous marker effect covariance matrix model are presented. The homogeneous marker effect covariance matrix model is simply a special case with covariance matrices satisfying $G_1=G_2=\cdots=G_m=G^{0}$.
\[
\pi(\mathbf{g}\mid Else) \propto \sum_{i=0}^{m}\sum_{l=1}^{\binom{m}{i}}\left[I_{\delta_{il}}\,\pi_0^{i}(1-\pi_0)^{m-i}\prod_{k:\,\mathbf{g}_k\in\delta_{il0}^{c}} MVN(\mathbf{g}_k;\mathbf{0},G_k)\right] f(\mathbf{y}\mid W,\mathbf{g},\sigma^2)
\]

\[
\propto \exp\left(\frac{-1}{2\sigma^2}\big(\mathbf{g}'W'W\mathbf{g}-2\mathbf{g}'W'\mathbf{y}\big)\right)\sum_{i=0}^{m}\sum_{l=1}^{\binom{m}{i}} I_{\delta_{il}}\,\pi_0^{i}(1-\pi_0)^{m-i}\prod_{k:\,\mathbf{g}_k\in\delta_{il0}^{c}}\exp\left(\frac{-1}{2}\mathbf{g}_k'G_k^{-1}\mathbf{g}_k\right).
\]
Notice that under a particular $\delta_{il}$:
\[
\mathbf{g}'W'W\mathbf{g} = \sum_{j=1}^{m}\sum_{h=1}^{m}\mathbf{g}_j'W_j'W_h\mathbf{g}_h
= \sum_{j=1}^{m}\mathbf{g}_j'W_j'\sum_{h=1}^{m}W_h\mathbf{g}_h
= \sum_{k:\,\mathbf{g}_k\in\delta_{il0}^{c}}\mathbf{g}_k'W_k'\sum_{h=1}^{m}W_h\mathbf{g}_h + \sum_{k:\,\mathbf{g}_k\in\delta_{il0}}\mathbf{g}_k'W_k'\sum_{h=1}^{m}W_h\mathbf{g}_h,
\]
but for $\mathbf{g}_k\in\delta_{il0}$, $\mathbf{g}_k=\mathbf{0}$; hence, under $\delta_{il}$:
\[
\mathbf{g}'W'W\mathbf{g} = \sum_{k:\,\mathbf{g}_k\in\delta_{il0}^{c}}\ \sum_{h:\,\mathbf{g}_h\in\delta_{il0}^{c}}\mathbf{g}_k'W_k'W_h\mathbf{g}_h = \mathbf{g}_{\delta_{il0}^{c}}'W_{\delta_{il0}^{c}}'W_{\delta_{il0}^{c}}\mathbf{g}_{\delta_{il0}^{c}},
\]
where $\mathbf{g}_{\delta_{il0}^{c}} = (\mathbf{g}_{k_1}',\ldots,\mathbf{g}_{k_{m-i}}')'$, $k:\mathbf{g}_k\in\delta_{il0}^{c}$, is the vector of dimension $\mathcal{S}(m-i)$ containing the non-null marker effects under $\delta_{il}$, and $W_{\delta_{il0}^{c}}$ is the submatrix of the design matrix corresponding to $\mathbf{g}_{\delta_{il0}^{c}}$. Similarly,
\[
\mathbf{g}'W'\mathbf{y} = \sum_{k:\,\mathbf{g}_k\in\delta_{il0}^{c}}\mathbf{g}_k'W_k'\mathbf{y} = \mathbf{g}_{\delta_{il0}^{c}}'W_{\delta_{il0}^{c}}'\mathbf{y};
\]
in addition, notice that:
\[
\sum_{k:\,\mathbf{g}_k\in\delta_{il0}^{c}}\mathbf{g}_k'G_k^{-1}\mathbf{g}_k = \mathbf{g}_{\delta_{il0}^{c}}'G_{\delta_{il0}^{c}}^{-1}\mathbf{g}_{\delta_{il0}^{c}},
\qquad
G_{\delta_{il0}^{c}}^{-1} = \begin{pmatrix} G_{k_1}^{-1} & & \\ & \ddots & \\ & & G_{k_{m-i}}^{-1}\end{pmatrix};
\]
therefore,
\[
\pi(\mathbf{g}\mid Else) \propto \sum_{i=0}^{m}\sum_{l=1}^{\binom{m}{i}} I_{\delta_{il}}\,\pi_0^{i}(1-\pi_0)^{m-i}\exp\left(\frac{-1}{2}\left(\mathbf{g}_{\delta_{il0}^{c}}'\left(\frac{W_{\delta_{il0}^{c}}'W_{\delta_{il0}^{c}}}{\sigma^2}+G_{\delta_{il0}^{c}}^{-1}\right)\mathbf{g}_{\delta_{il0}^{c}} - 2\,\mathbf{g}_{\delta_{il0}^{c}}'\frac{W_{\delta_{il0}^{c}}'\mathbf{y}}{\sigma^2}\right)\right),
\]
i.e.,
\[
\pi(\mathbf{g}\mid Else) = \sum_{i=0}^{m}\pi_0^{i}(1-\pi_0)^{m-i}\sum_{l=1}^{\binom{m}{i}} I_{\delta_{il}}\, MVN\!\left(\mathbf{g}_{\delta_{il0}^{c}};\left(\frac{W_{\delta_{il0}^{c}}'W_{\delta_{il0}^{c}}}{\sigma^2}+G_{\delta_{il0}^{c}}^{-1}\right)^{-1}\frac{W_{\delta_{il0}^{c}}'\mathbf{y}}{\sigma^2},\ \left(\frac{W_{\delta_{il0}^{c}}'W_{\delta_{il0}^{c}}}{\sigma^2}+G_{\delta_{il0}^{c}}^{-1}\right)^{-1}\right).
\]

In the case of the full conditional distribution of a single marker effect:
\[
\pi(\mathbf{g}_j\mid Else) \propto \big(\pi_0 I_{\{\mathbf{g}_j=\mathbf{0}\}} + (1-\pi_0)MVN(\mathbf{g}_j;\mathbf{0},G_j)I_{\{\mathbf{g}_j\neq\mathbf{0}\}}\big)\, f(\mathbf{y}\mid W,\mathbf{g},\sigma^2)
\]
\[
\propto \pi_0 \exp\left(\frac{-1}{2\sigma^2}\big(\mathbf{g}_{(-j)}'W_{(-j)}'W_{(-j)}\mathbf{g}_{(-j)} - 2\mathbf{g}_{(-j)}'W_{(-j)}'\mathbf{y}\big)\right) I_{\{\mathbf{g}_j=\mathbf{0}\}}
+ (1-\pi_0)\exp\left(\frac{-1}{2}\big(\mathbf{g}_j'(W_j'W_j+G_j^{-1})\mathbf{g}_j - 2\mathbf{g}_j'W_j'(\mathbf{y}-W_{(-j)}\mathbf{g}_{(-j)})\big)\right) I_{\{\mathbf{g}_j\neq\mathbf{0}\}},
\]
where $\mathbf{g}_{(-j)}$ denotes the vector $\mathbf{g}$ without subvector $\mathbf{g}_j$, $W_{(-j)}$ is the design matrix after deleting the columns corresponding to marker $j$, and $W_j$ is the design matrix corresponding to $\mathbf{g}_j$. Thus, the full conditional of $\mathbf{g}_j$ is also a spike and slab distribution. A more explicit form of this distribution can be found by computing the mixing probabilities. To this end:
\[
\pi(\mathbf{g}_j\mid Else) = \frac{\pi_0 f(\mathbf{y}\mid W,\mathbf{g},\sigma^2)I_{\{\mathbf{g}_j=\mathbf{0}\}} + (1-\pi_0)\pi(\mathbf{g}_j)f(\mathbf{y}\mid W,\mathbf{g},\sigma^2)I_{\{\mathbf{g}_j\neq\mathbf{0}\}}}{\pi_0 f(\mathbf{y}\mid W,\mathbf{g}_{(-j)},\mathbf{g}_j=\mathbf{0},\sigma^2) + (1-\pi_0)\int_{\mathbf{g}_j\neq\mathbf{0}}\pi(\mathbf{g}_j)f(\mathbf{y}\mid W,\mathbf{g},\sigma^2)\,d\mathbf{g}_j}
\]
\[
\Rightarrow\ p(\mathbf{g}_j=\mathbf{0}\mid Else) = \frac{\pi_0 f(\mathbf{y}\mid W,\mathbf{g}_{(-j)},\mathbf{g}_j=\mathbf{0},\sigma^2)}{\pi_0 f(\mathbf{y}\mid W,\mathbf{g}_{(-j)},\mathbf{g}_j=\mathbf{0},\sigma^2) + (1-\pi_0)\int_{\mathbf{g}_j\neq\mathbf{0}}\pi(\mathbf{g}_j)f(\mathbf{y}\mid W,\mathbf{g},\sigma^2)\,d\mathbf{g}_j}.
\]
Let $m(\mathbf{y}\mid W,\mathbf{g}_{(-j)},\sigma^2) = \int_{\mathbf{g}_j\neq\mathbf{0}}\pi(\mathbf{g}_j)f(\mathbf{y}\mid W,\mathbf{g},\sigma^2)\,d\mathbf{g}_j$; then:
\[
\pi(\mathbf{g}_j\mid Else) = \frac{\pi_0 f(\mathbf{y}\mid W,\mathbf{g},\sigma^2)I_{\{\mathbf{g}_j=\mathbf{0}\}} + (1-\pi_0)\pi(\mathbf{g}_j)f(\mathbf{y}\mid W,\mathbf{g},\sigma^2)I_{\{\mathbf{g}_j\neq\mathbf{0}\}}}{\pi_0 f(\mathbf{y}\mid W,\mathbf{g}_{(-j)},\mathbf{g}_j=\mathbf{0},\sigma^2) + (1-\pi_0)m(\mathbf{y}\mid W,\mathbf{g}_{(-j)},\sigma^2)}.
\]
Also notice that the pdf $\pi(\mathbf{g}_j\mid Else)$ can be written as
\[
p(\mathbf{g}_j=\mathbf{0}\mid Else)\,I_{\{\mathbf{g}_j=\mathbf{0}\}} + \big(1-p(\mathbf{g}_j=\mathbf{0}\mid Else)\big)\frac{\pi(\mathbf{g}_j)f(\mathbf{y}\mid W,\mathbf{g},\sigma^2)}{m(\mathbf{y}\mid W,\mathbf{g}_{(-j)},\sigma^2)}\,I_{\{\mathbf{g}_j\neq\mathbf{0}\}}.
\]
This follows because, writing $f_0 := f(\mathbf{y}\mid W,\mathbf{g}_{(-j)},\mathbf{g}_j=\mathbf{0},\sigma^2)$ and $m := m(\mathbf{y}\mid W,\mathbf{g}_{(-j)},\sigma^2)$,
\[
p(\mathbf{g}_j=\mathbf{0}\mid Else)I_{\{\mathbf{g}_j=\mathbf{0}\}} + \big(1-p(\mathbf{g}_j=\mathbf{0}\mid Else)\big)\frac{\pi(\mathbf{g}_j)f(\mathbf{y}\mid W,\mathbf{g},\sigma^2)}{m}I_{\{\mathbf{g}_j\neq\mathbf{0}\}}
\]
\[
= \frac{\pi_0 f(\mathbf{y}\mid W,\mathbf{g},\sigma^2)}{\pi_0 f_0 + (1-\pi_0)m}\,I_{\{\mathbf{g}_j=\mathbf{0}\}}
+ \frac{(1-\pi_0)m}{\pi_0 f_0 + (1-\pi_0)m}\cdot\frac{\pi(\mathbf{g}_j)f(\mathbf{y}\mid W,\mathbf{g},\sigma^2)}{m}\,I_{\{\mathbf{g}_j\neq\mathbf{0}\}}
\]
\[
= \frac{\pi_0 f(\mathbf{y}\mid W,\mathbf{g},\sigma^2)I_{\{\mathbf{g}_j=\mathbf{0}\}} + (1-\pi_0)\pi(\mathbf{g}_j)f(\mathbf{y}\mid W,\mathbf{g},\sigma^2)I_{\{\mathbf{g}_j\neq\mathbf{0}\}}}{\pi_0 f_0 + (1-\pi_0)m} = \pi(\mathbf{g}_j\mid Else).
\]

Therefore, the remaining task is finding the explicit form of $p(\mathbf{g}_j=\mathbf{0}\mid Else)$ under the model being considered. Doing some algebra, completing the quadratic form, and using properties of the multivariate normal distribution, it can be shown that:
\[
\int_{\mathbf{g}_j\neq\mathbf{0}}\pi(\mathbf{g}_j)f(\mathbf{y}\mid W,\mathbf{g},\sigma^2)\,d\mathbf{g}_j = (2\pi\sigma^2)^{-n/2}|G_j|^{-1/2}\exp\left(\frac{-1}{2\sigma^2}\big\|\mathbf{y}-W_{(-j)}\mathbf{g}_{(-j)}\big\|_2^2\right)|G_{Fj}|^{-1/2}\exp\left(\frac{1}{2\sigma^2}\big\|G_{Fj}^{-1/2}W_j'(\mathbf{y}-W_{(-j)}\mathbf{g}_{(-j)})\big\|_2^2\right),
\]
where $G_{Fj} := W_j'W_j + G_j^{-1}$. Now, using the fact that
\[
f(\mathbf{y}\mid W,\mathbf{g}_{(-j)},\mathbf{g}_j=\mathbf{0},\sigma^2) = (2\pi\sigma^2)^{-n/2}\exp\left(\frac{-1}{2\sigma^2}\big\|\mathbf{y}-W_{(-j)}\mathbf{g}_{(-j)}\big\|_2^2\right),
\]
it follows that
\[
p(\mathbf{g}_j=\mathbf{0}\mid Else) = \frac{\pi_0}{\pi_0 + (1-\pi_0)\big(|G_{Fj}||G_j|\big)^{-1/2}\exp\left(\frac{1}{2\sigma^2}\big\|G_{Fj}^{-1/2}W_j'(\mathbf{y}-W_{(-j)}\mathbf{g}_{(-j)})\big\|_2^2\right)}.
\]
Thus, the full conditional distribution of $\mathbf{g}_j$ is a spike and slab distribution where the slab component is a $MVN\big(G_{Fj}^{-1}W_j'(\mathbf{y}-W_{(-j)}\mathbf{g}_{(-j)}),\,G_{Fj}^{-1}\big)$ and the spike is a point mass at $\mathbf{0}$ in $\mathbb{R}^{\mathcal{S}}$.

Full conditionals for models with heteroscedastic residuals

In this case:
\[
f(\mathbf{y}\mid W,\mathbf{g},R) \propto |V|^{-1/2}\exp\left(\frac{-1}{2}(\mathbf{y}-W\mathbf{g})'V^{-1}(\mathbf{y}-W\mathbf{g})\right) = \prod_{l=1}^{\mathcal{S}}(\sigma_l^2)^{-n_l/2}\exp\left(\frac{-1}{2\sigma_l^2}(\mathbf{y}_l-W_l\mathbf{g}_l)'(\mathbf{y}_l-W_l\mathbf{g}_l)\right).
\]
In addition,
\[
\pi(R) \propto \prod_{l=1}^{\mathcal{S}}(\sigma_l^2)^{-(v/2+1)}\exp\left(\frac{-\tau^2}{2\sigma_l^2}\right).
\]
In the following, only the full conditionals that change with respect to the homoscedastic models and those presented in part I are shown. Under a spike and slab prior for $\mathbf{g}$, the full conditionals that change with respect to the homoscedastic residuals model are:

\[
\pi(\mathbf{g}_j\mid Else) = p(\mathbf{g}_j=\mathbf{0}\mid Else)\,I_{\{\mathbf{g}_j=\mathbf{0}\}} + \big(1-p(\mathbf{g}_j=\mathbf{0}\mid Else)\big)\,MVN\big(G_{VFj}^{-1}W_j'V^{-1}(\mathbf{y}-W_{(-j)}\mathbf{g}_{(-j)}),\,G_{VFj}^{-1}\big)\,I_{\{\mathbf{g}_j\neq\mathbf{0}\}},
\]
where $G_{VFj} = W_j'V^{-1}W_j + (G^{0})^{-1}$ and
\[
p(\mathbf{g}_j=\mathbf{0}\mid Else) = \frac{\pi_0}{\pi_0 + (1-\pi_0)\big(|G_{VFj}||G^{0}|\big)^{-1/2}\exp\left(\frac{1}{2}\big\|G_{VFj}^{-1/2}W_j'V^{-1}(\mathbf{y}-W_{(-j)}\mathbf{g}_{(-j)})\big\|_2^2\right)}.
\]
The full conditionals $\pi(R\mid Else)$ and $\pi(W^{N}\mid Else)$ change with respect to the homoscedastic model, but they are the same as for the model with Gaussian prior presented in part I. For the model with spike and slab prior for $\mathbf{g}$ and heterogeneous marker effect covariance matrices, $\pi(R\mid Else)$ and $\pi(W\mid Else)$ also remain unchanged with respect to the model with Gaussian prior.
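A single-site Gibbs update follows directly from these spike and slab full conditionals. The sketch below implements the homoscedastic-residuals case for brevity, mirroring the displayed expressions ($G_{Fj}=W_j'W_j+G_j^{-1}$ and the mixing probability $p(\mathbf{g}_j=\mathbf{0}\mid Else)$); it is a Python stand-in for the paper's R scripts, with purely illustrative dimensions and parameter values:

```python
import numpy as np

def gibbs_update_gj(y, W_j, W_rest, g_rest, G_j, pi0, sigma2, rng):
    """One Gibbs step for marker j: with probability p0 the whole S-vector g_j
    is set to zero (spike); otherwise it is drawn from the MVN slab."""
    r = y - W_rest @ g_rest                        # residual excluding marker j
    GFj = W_j.T @ W_j + np.linalg.inv(G_j)         # G_Fj = W_j'W_j + G_j^{-1}
    GFj_inv = np.linalg.inv(GFj)
    b = W_j.T @ r
    quad = b @ GFj_inv @ b                         # ||G_Fj^{-1/2} W_j' r||^2
    # log of the slab-vs-spike factor in the mixing probability
    log_ratio = (-0.5 * (np.linalg.slogdet(GFj)[1] + np.linalg.slogdet(G_j)[1])
                 + quad / (2.0 * sigma2))
    p0 = pi0 / (pi0 + (1.0 - pi0) * np.exp(log_ratio))
    if rng.random() < p0:
        return np.zeros(W_j.shape[1]), p0          # spike: point mass at 0
    return rng.multivariate_normal(GFj_inv @ b, GFj_inv), p0

rng = np.random.default_rng(7)
S, n = 2, 50
W_j = rng.standard_normal((n, S))
W_rest = rng.standard_normal((n, 3 * S))
g_rest = rng.standard_normal(3 * S)
y = rng.standard_normal(n)
g_j, p0 = gibbs_update_gj(y, W_j, W_rest, g_rest, np.eye(S), 0.5, 1.0, rng)
print(p0, g_j)
```

Log-determinants are used instead of raw determinants purely for numerical stability; the formula is otherwise a transcription of the mixing probability given above.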

Regarding the full conditional of the marker additive effects $\pi(\mathbf{g}_j\mid Else)$, it is similar to the case of the model with homogeneous marker effect covariance matrices; the only difference is that in this model $G_{VFj} = W_j'V^{-1}W_j + G_j^{-1}$.

Algebraic simplification of $\pi(W^{N}\mid W^{o},M_0)$

Here we recursively use the well-known property of the Gamma function, $\Gamma(\alpha+1)=\alpha\Gamma(\alpha)$, as well as the definition of the Beta function.
\[
\pi(W^{N}\mid W^{o},M_0) = 2^{n_N^{H}}\prod_{j=1}^{m}\left\{\frac{\Gamma\big(n_N^{B_j}+\alpha\big)\Gamma\big(n_N^{A_j}+\beta\big)}{B(\alpha,\beta)\,\Gamma(2f_{Nj}+\alpha+\beta)}\prod_{i'=f_{Nj}+1}^{n_{Nj}}\pi\big(w_{i'j}\mid w_{S_{i'}j},w_{D_{i'}j}\big)\right\};
\]
notice that
\begin{align*}
\Gamma(2f_{Nj}+\alpha+\beta) &= (2f_{Nj}-1+\alpha+\beta)\,\Gamma(2f_{Nj}-1+\alpha+\beta)\\
&= (2f_{Nj}-1+\alpha+\beta)(2f_{Nj}-2+\alpha+\beta)\,\Gamma(2f_{Nj}-2+\alpha+\beta)\\
&= (2f_{Nj}-1+\alpha+\beta)\cdots\big(2f_{Nj}-(2f_{Nj}-1)+\alpha+\beta\big)\,\Gamma\big(2f_{Nj}-(2f_{Nj}-1)+\alpha+\beta\big)\\
&= \Gamma(\alpha+\beta)\prod_{k=1}^{2f_{Nj}}(2f_{Nj}-k+\alpha+\beta)
\end{align*}
\[
\Rightarrow\ B(\alpha,\beta)\,\Gamma(2f_{Nj}+\alpha+\beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}\,\Gamma(\alpha+\beta)\prod_{k=1}^{2f_{Nj}}(2f_{Nj}-k+\alpha+\beta) = \Gamma(\alpha)\Gamma(\beta)\prod_{k=1}^{2f_{Nj}}(2f_{Nj}-k+\alpha+\beta). \quad (A.1)
\]
Similarly,
\[
\Gamma\big(n_N^{B_j}+\alpha\big) = \Gamma(\alpha)\prod_{k=1}^{n_N^{B_j}}\big(n_N^{B_j}-k+\alpha\big), \quad (A.2)
\qquad
\Gamma\big(n_N^{A_j}+\beta\big) = \Gamma(\beta)\prod_{k=1}^{n_N^{A_j}}\big(n_N^{A_j}-k+\beta\big); \quad (A.3)
\]
consequently, from $(A.1)$, $(A.2)$ and $(A.3)$ it follows that
\[
\frac{\Gamma\big(n_N^{B_j}+\alpha\big)\Gamma\big(n_N^{A_j}+\beta\big)}{B(\alpha,\beta)\,\Gamma(2f_{Nj}+\alpha+\beta)} = \frac{\prod_{k=1}^{n_N^{B_j}}\big(n_N^{B_j}-k+\alpha\big)\prod_{k=1}^{n_N^{A_j}}\big(n_N^{A_j}-k+\beta\big)}{\prod_{k=1}^{2f_{Nj}}(2f_{Nj}-k+\alpha+\beta)}.
\]
Plugging this expression into $\pi(W^{N}\mid W^{o},M_0)$, it follows that:
\[
\pi(W^{N}\mid W^{o},M_0) = 2^{n_N^{H}}\prod_{j=1}^{m}\left\{\frac{\prod_{k=1}^{n_N^{B_j}}\big(n_N^{B_j}-k+\alpha\big)\prod_{k=1}^{n_N^{A_j}}\big(n_N^{A_j}-k+\beta\big)}{\prod_{k=1}^{2f_{Nj}}(2f_{Nj}-k+\alpha+\beta)}\prod_{i'=f_{Nj}+1}^{n_{Nj}}\pi\big(w_{i'j}\mid w_{S_{i'}j},w_{D_{i'}j}\big)\right\}.
\]

Appendix B: Details of the derivation of Bayes factors and fractional Bayes factors

For model $M_{1G}$, the Hessian matrix of the log-likelihood is:
\[
H_{1G} = \frac{1}{\sigma^2}\begin{pmatrix} -W'W & \dfrac{1}{\sigma^2}\big(W'W\mathbf{g}-W'\mathbf{y}\big) \\[4pt] Sym & \dfrac{1}{\sigma^2}\left(\dfrac{n}{2}-\dfrac{(\mathbf{y}-W\mathbf{g})'(\mathbf{y}-W\mathbf{g})}{\sigma^2}\right)\end{pmatrix};
\]
thus, matrix $\Delta_h(\hat{\boldsymbol{\theta}}^{*})$ is:
\[
\Delta_h(\hat{\boldsymbol{\theta}}^{*}) = \begin{pmatrix} \dfrac{W'W}{(n-r)S^2} & \dfrac{n\big(W'W(W'W)^{-1}W'\mathbf{y}-W'\mathbf{y}\big)}{\big((n-r)S^2\big)^2} \\[4pt] Sym & \dfrac{n^2}{2\big((n-r)S^2\big)^2}\end{pmatrix} = \begin{pmatrix}\dfrac{W'W}{SSR} & 0 \\[4pt] 0 & \dfrac{n^2}{2\,SSR^2}\end{pmatrix};
\]
therefore:
\[
D_1 := \big|\Delta_h(\hat{\boldsymbol{\theta}}^{*})\big|^{1/2} = \left(\frac{n^2}{2}\right)^{1/2}\frac{|W'W|^{1/2}}{SSR^{(m\mathcal{S}+2)/2}},
\]
\[
\pi(\hat{\boldsymbol{\theta}}^{*}) = \pi(\hat{\mathbf{g}}\mid G^{0})\,\pi(\hat{\sigma}^2) = (2\pi)^{-m\mathcal{S}/2}|G^{0}|^{-m/2}\exp\left(\frac{-1}{2}\hat{\mathbf{g}}'\big((G^{0})^{-1}\otimes I\big)\hat{\mathbf{g}}\right)\frac{(\tau^2)^{v/2}}{\Gamma\big(\frac{v}{2}\big)2^{v/2}}(\hat{\sigma}^2)^{-(v/2+1)}\exp\left(\frac{-\tau^2}{2\hat{\sigma}^2}\right);
\]
thus,
\[
\int_{\mathbb{R}^{m\mathcal{S}}}\int_{\mathbb{R}_+}\pi(\mathbf{g}\mid G^{0})\,\pi(\sigma^2)\,e^{\ln f(\mathbf{y}\mid\mathbf{g},\sigma^2,W)}\,d\sigma^2 d\mathbf{g} \approx \frac{1}{D_1}(2\pi)^{-(n+1)/2}\left(\frac{SSR}{n}\right)^{-(n+v+2)/2}\exp\left(\frac{-n}{2}\right)\frac{(\tau^2)^{v/2}}{\Gamma\big(\frac{v}{2}\big)2^{v/2}}\exp\left(\frac{-n\tau^2}{2\,SSR}\right)n^{-(m\mathcal{S}+1)/2}
\]
\[
\times\,|G^{0}|^{-m/2}\exp\left(\frac{-1}{2}\hat{\mathbf{g}}'\big((G^{0})^{-1}\otimes I\big)\hat{\mathbf{g}}\right)
:= \frac{K_1}{D_1}|G^{0}|^{-m/2}\exp\left(\frac{-1}{2}\,\mathbf{y}'C(W,G^{0})\mathbf{y}\right),
\]
where
\[
K_1 = (2\pi)^{-(n+1)/2}\left(\frac{SSR}{n}\right)^{-(n+v+2)/2}\exp\left(\frac{-n}{2}\right)\frac{(\tau^2)^{v/2}}{\Gamma\big(\frac{v}{2}\big)2^{v/2}}\exp\left(\frac{-n\tau^2}{2\,SSR}\right)n^{-(m\mathcal{S}+1)/2},
\]
\[
SSR = \mathbf{y}'(I-H_W)\mathbf{y} = (n-r)S^2, \qquad C(W,G^{0}) = W(W'W)^{-1}\big(I\otimes(G^{0})^{-1}\big)(W'W)^{-1}W'.
\]
Then:

\[
f(\mathbf{y}\mid W,M_{1G}) \approx \frac{K_1}{D_1}\int_{\mathcal{P}_{\mathcal{S}}^{+}}\pi(G^{0})\,|G^{0}|^{-m/2}\exp\left(\frac{-1}{2}\mathbf{y}'C(W,G^{0})\mathbf{y}\right)dG^{0}
\]
\[
= \frac{K_1|\boldsymbol{\Sigma}|^{a/2}}{D_1\,2^{a\mathcal{S}/2}\,\Gamma_{\mathcal{S}}\big(\frac{a}{2}\big)}\int_{\mathcal{P}_{\mathcal{S}}^{+}}|G^{0}|^{-(a+m+\mathcal{S}+1)/2}\exp\left(\frac{-1}{2}\,tr\left(\Big(\boldsymbol{\Sigma}+\sum_{j=1}^{m}\hat{\mathbf{g}}_j\hat{\mathbf{g}}_j'\Big)(G^{0})^{-1}\right)\right)dG^{0}
= 2^{m\mathcal{S}/2}\,\frac{K_1}{D_1}\,\frac{|\boldsymbol{\Sigma}|^{a/2}\,\Gamma_{\mathcal{S}}\big(\frac{a+m}{2}\big)}{\Gamma_{\mathcal{S}}\big(\frac{a}{2}\big)\,\big|\boldsymbol{\Sigma}+\sum_{j=1}^{m}\hat{\mathbf{g}}_j\hat{\mathbf{g}}_j'\big|^{(a+m)/2}}.
\]
The second equality follows by noticing that $\mathbf{y}'C(W,G^{0})\mathbf{y} = \hat{\mathbf{g}}'\big(I_m\otimes(G^{0})^{-1}\big)\hat{\mathbf{g}}$.

The univariate version of the $IW(a,\boldsymbol{\Sigma})$ prior posed over the covariance matrix of marker effects $G^{0}$ is an $IG\big(\frac{a}{2},\frac{b}{2}\big)$ prior for the marker effect variance $\sigma_g^2$. Therefore, the expression for the null model is easily found by replacing $\mathcal{S}$ by 1 and $W$ by $W_0$, replacing the $IW(a,\boldsymbol{\Sigma})$ density by an $IG\big(\frac{a}{2},\frac{b}{2}\big)$ density, and integrating with respect to $\sigma_g^2$. The resulting expression is completely analogous, with matrix $\boldsymbol{\Sigma}$ replaced by the scalar $b$ and vectors $\hat{\mathbf{g}}_j$ by scalars $\hat{g}_{0j}$. This relationship holds for other models and their null versions; hence, hereinafter the derivations for the null models are not presented. Thus,
\[
f(\mathbf{y}\mid W_0,M_{0G}) \approx 2^{m/2}\,\frac{K_0}{D_0}\,\frac{b^{a/2}\,\Gamma\big(\frac{a+m}{2}\big)}{\Gamma\big(\frac{a}{2}\big)\,\big(b+\sum_{j=1}^{m}\hat{g}_{0j}^2\big)^{(a+m)/2}},
\]
where $K_0$ is $K_1$ with $\mathcal{S}=1$, $SSR$ replaced by $SSR_0 = \mathbf{y}'(I-H_{W_0})\mathbf{y}$, $H_{W_0} = W_0(W_0'W_0)^{-1}W_0'$, and $S^2$ replaced by $S_0^2 = \|\mathbf{y}-W_0\hat{\mathbf{g}}_0\|_2^2/(n-r_0)$, $r_0 = rank(W_0'W_0) = m$.

Using these results it follows that:
\[
BF_{10GW} \approx \frac{K_1 D_0}{K_0 D_1}\,\frac{2^{m\mathcal{S}/2}}{2^{m/2}}\left(\frac{\Gamma_{\mathcal{S}}\big(\frac{a+m}{2}\big)}{\Gamma_{\mathcal{S}}\big(\frac{a}{2}\big)}\Big/\frac{\Gamma\big(\frac{a+m}{2}\big)}{\Gamma\big(\frac{a}{2}\big)}\right)\frac{|\boldsymbol{\Sigma}|^{a/2}}{b^{a/2}}\,\frac{\big(b+\sum_{j=1}^{m}\hat{g}_{0j}^2\big)^{(a+m)/2}}{\big|\boldsymbol{\Sigma}+\sum_{j=1}^{m}\hat{\mathbf{g}}_j\hat{\mathbf{g}}_j'\big|^{(a+m)/2}}
\]
\[
= \left(\frac{|\boldsymbol{\Sigma}|}{b}\right)^{a/2}\left(\frac{\big|\boldsymbol{\Sigma}+\sum_{j=1}^{m}\hat{\mathbf{g}}_j\hat{\mathbf{g}}_j'\big|}{b+\sum_{j=1}^{m}\hat{g}_{0j}^2}\right)^{-(a+m)/2}\left(\frac{SSR}{SSR_0}\right)^{-(n+v+2)/2}\frac{SSR^{(m\mathcal{S}+2)/2}}{SSR_0^{(m+2)/2}}\left(\frac{|W_0'W_0|}{|W'W|}\right)^{1/2}
\]
\[
\times \exp\left(\frac{-n\tau^2}{2}\left(\frac{1}{SSR}-\frac{1}{SSR_0}\right)\right)\left(\frac{2}{n}\right)^{m(\mathcal{S}-1)/2}\prod_{l=2}^{\mathcal{S}}\frac{\Gamma\big(\frac{a+m+1-l}{2}\big)}{\Gamma\big(\frac{a+1-l}{2}\big)}.
\]
The Hessian matrix for model $M_{1G}^{*}$ does not change with respect to model $M_{1G}$ because the likelihood remains the same; thus:

\[
\int_{\mathbb{R}^{m\mathcal{S}}}\int_{\mathbb{R}_+}\pi(\mathbf{g}\mid G)\,\pi(\sigma^2)\,e^{\ln f(\mathbf{y}\mid\mathbf{g},\sigma^2,W)}\,d\sigma^2 d\mathbf{g} \approx \frac{1}{D_1}(2\pi)^{-(n+1)/2}\left(\frac{SSR}{n}\right)^{-(n+v+2)/2}e^{-n/2}\,\frac{(\tau^2)^{v/2}}{\Gamma\big(\frac{v}{2}\big)2^{v/2}}\exp\left(\frac{-n\tau^2}{2\,SSR}\right)n^{-(m\mathcal{S}+1)/2}\prod_{j=1}^{m}|G_j|^{-1/2}\exp\left(\frac{-1}{2}\hat{\mathbf{g}}_j'G_j^{-1}\hat{\mathbf{g}}_j\right).
\]
Using this it follows that:
\[
\pi(\mathbf{y}\mid W,M_{1G}^{*}) \approx \frac{K_1\,|\boldsymbol{\Sigma}|^{am/2}}{D_1\,2^{am\mathcal{S}/2}\big(\Gamma_{\mathcal{S}}\big(\frac{a}{2}\big)\big)^m}\int_{\mathcal{P}_{\mathcal{S}}^{+}\times\cdots\times\mathcal{P}_{\mathcal{S}}^{+}}\prod_{j=1}^{m}|G_j|^{-(a+\mathcal{S}+2)/2}\exp\left(\frac{-1}{2}\,tr\big(\boldsymbol{\Sigma}G_j^{-1}+\hat{\mathbf{g}}_j\hat{\mathbf{g}}_j'G_j^{-1}\big)\right)dG
\]
\[
= \frac{K_1\,|\boldsymbol{\Sigma}|^{am/2}}{D_1\,2^{am\mathcal{S}/2}\big(\Gamma_{\mathcal{S}}\big(\frac{a}{2}\big)\big)^m}\prod_{j=1}^{m}\int_{\mathcal{P}_{\mathcal{S}}^{+}}|G_j|^{-(a+\mathcal{S}+2)/2}\exp\left(\frac{-1}{2}\,tr\big((\boldsymbol{\Sigma}+\hat{\mathbf{g}}_j\hat{\mathbf{g}}_j')G_j^{-1}\big)\right)dG_j
= \frac{K_1\,|\boldsymbol{\Sigma}|^{am/2}}{D_1\,2^{am\mathcal{S}/2}\big(\Gamma_{\mathcal{S}}\big(\frac{a}{2}\big)\big)^m}\prod_{j=1}^{m}\frac{2^{(a+1)\mathcal{S}/2}\,\Gamma_{\mathcal{S}}\big(\frac{a+1}{2}\big)}{\big|\boldsymbol{\Sigma}+\hat{\mathbf{g}}_j\hat{\mathbf{g}}_j'\big|^{(a+1)/2}}
\]
\[
= \frac{K_1}{D_1}\,2^{m\mathcal{S}/2}\left(\frac{\Gamma_{\mathcal{S}}\big(\frac{a+1}{2}\big)}{\Gamma_{\mathcal{S}}\big(\frac{a}{2}\big)}\right)^{m}\frac{|\boldsymbol{\Sigma}|^{am/2}}{\prod_{j=1}^{m}\big|\boldsymbol{\Sigma}+\hat{\mathbf{g}}_j\hat{\mathbf{g}}_j'\big|^{(a+1)/2}}.
\]

The following are the details of the computation of matrix $\Delta_h(\hat{\boldsymbol{\theta}}^{*})$ and its determinant for model $M_{1GH}^{*}$. Using matrix differentiation it follows that:
\[
\frac{\partial^2 l}{\partial\mathbf{g}\,\partial\mathbf{g}'} = -W'V^{-1}W,
\]
\[
\frac{\partial l}{\partial\sigma_l^2} = -\frac{1}{2}\,tr\left(V^{-1}\frac{\partial V}{\partial\sigma_l^2}\right) + \frac{1}{2}(\mathbf{y}-W\mathbf{g})'V^{-1}\frac{\partial V}{\partial\sigma_l^2}V^{-1}(\mathbf{y}-W\mathbf{g})
\]
\[
= -\frac{1}{2}\,tr\left(V^{-1}\begin{pmatrix}0&&\\&I_{n_l}&\\&&0\end{pmatrix}\right) + \frac{1}{2}(\mathbf{y}-W\mathbf{g})'V^{-1}\begin{pmatrix}0&&\\&I_{n_l}&\\&&0\end{pmatrix}V^{-1}(\mathbf{y}-W\mathbf{g})
\]
\[
= -\frac{1}{2}\,tr\begin{pmatrix}0&&\\&\sigma_l^{-2}I_{n_l}&\\&&0\end{pmatrix} + \frac{1}{2}(\mathbf{y}-W\mathbf{g})'\begin{pmatrix}0&&\\&\sigma_l^{-4}I_{n_l}&\\&&0\end{pmatrix}(\mathbf{y}-W\mathbf{g})
= -\frac{1}{2}\big(n_l\sigma_l^{-2} - \sigma_l^{-4}(\mathbf{y}_l-W_l\mathbf{g}_l)'(\mathbf{y}_l-W_l\mathbf{g}_l)\big),
\]
\[
\frac{\partial^2 l}{\partial(\sigma_l^2)^2} = \frac{n_l}{2\sigma_l^4} - \frac{(\mathbf{y}_l-W_l\mathbf{g}_l)'(\mathbf{y}_l-W_l\mathbf{g}_l)}{\sigma_l^6},
\qquad
\frac{\partial^2 l}{\partial\sigma_l^2\,\partial\sigma_{l'}^2} = 0 \ (l\neq l'),
\]
\[
\frac{\partial^2 l}{\partial\sigma_l^2\,\partial\mathbf{g}} = -\left(W'V^{-1}\frac{\partial V}{\partial\sigma_l^2}V^{-1}\mathbf{y} - W'V^{-1}\frac{\partial V}{\partial\sigma_l^2}V^{-1}W\mathbf{g}\right) = \sigma_l^{-4}\big(W_l'W_l\mathbf{g}_l - W_l'\mathbf{y}_l\big).
\]
Here $\hat{\boldsymbol{\theta}}^{*} = (\hat{\mathbf{g}},\hat{V}) = (\hat{\mathbf{g}},\hat{\sigma}_1^2 I_{n_1},\ldots,\hat{\sigma}_{\mathcal{S}}^2 I_{n_{\mathcal{S}}})$, $\hat{\sigma}_l^2 := S_l^2 = (\mathbf{y}_l-W_l\hat{\mathbf{g}}_l)'(\mathbf{y}_l-W_l\hat{\mathbf{g}}_l)/(n_l-r_l)$; thus
\[
\Delta_h(\hat{\boldsymbol{\theta}}^{*}) = \frac{1}{n}\begin{pmatrix} W'\hat{V}^{-1}W & B \\ Sym & \operatorname{diag}\Big(\dfrac{n_l^3}{2\,SSR_l^2}\Big)\end{pmatrix},
\qquad
B = \left[\frac{n_l^2\big(W_l'W_l(W_l'W_l)^{-1}W_l'\mathbf{y}_l-W_l'\mathbf{y}_l\big)}{SSR_l^2}\right]_{l=1,\ldots,\mathcal{S}} = 0,
\]
so that
\[
\Delta_h(\hat{\boldsymbol{\theta}}^{*}) = \frac{1}{n}\begin{pmatrix} W'\hat{V}^{-1}W & 0 \\ 0 & \operatorname{diag}\Big(\dfrac{n_l^3}{2\,SSR_l^2}\Big)\end{pmatrix};
\]
then:
\[
\big|\Delta_h(\hat{\boldsymbol{\theta}}^{*})\big| = \left|\frac{W'\hat{V}^{-1}W}{n}\right|\prod_{l=1}^{\mathcal{S}}\frac{n_l^3}{2n\,SSR_l^2}
= \prod_{l=1}^{\mathcal{S}}\left|\frac{n_l W_l'W_l}{n\,SSR_l}\right|\frac{n_l^3}{2n\,SSR_l^2}
= \frac{1}{n^{\mathcal{S}(m+1)}}\prod_{l=1}^{\mathcal{S}}\frac{n_l^{m+3}}{2\,SSR_l^{m+2}}\,|W_l'W_l|,
\]
\[
\therefore\ D_1^{*} := \big|\Delta_h(\hat{\boldsymbol{\theta}}^{*})\big|^{1/2} = \frac{1}{n^{\mathcal{S}(m+1)/2}\,2^{\mathcal{S}/2}}\prod_{l=1}^{\mathcal{S}}\frac{n_l^{(m+3)/2}}{SSR_l^{(m+2)/2}}\,|W_l'W_l|^{1/2}.
\]

Using this result, the Laplace approximation of the integral
\[
\int_{\mathbb{R}^{m\mathcal{S}}}\int_{\mathbb{R}_+}\cdots\int_{\mathbb{R}_+} f(\mathbf{y}\mid W,\mathbf{g},\sigma_1^2,\ldots,\sigma_{\mathcal{S}}^2)\,\pi(\mathbf{g}\mid G)\,\pi(\sigma_1^2,\ldots,\sigma_{\mathcal{S}}^2)\,d\sigma_1^2\cdots d\sigma_{\mathcal{S}}^2\,d\mathbf{g}
\]
is:
\[
\frac{1}{D_1^{*}}(2\pi)^{-(n-\mathcal{S})/2}\prod_{l=1}^{\mathcal{S}}\left\{\left(\frac{SSR_l}{n_l}\right)^{-n_l/2}\exp\left(\frac{-n_l}{2}\right)\right\}|G|^{-1/2}\exp\left(\frac{-1}{2}\hat{\mathbf{g}}'G^{-1}\hat{\mathbf{g}}\right)n^{-\mathcal{S}(m+1)/2}\,\frac{(\tau^2)^{\mathcal{S}v/2}}{\big(\Gamma\big(\frac{v}{2}\big)2^{v/2}\big)^{\mathcal{S}}}\prod_{l=1}^{\mathcal{S}}\exp\left(\frac{-n_l\tau^2}{2\,SSR_l}\right)\left(\frac{SSR_l}{n_l}\right)^{-(v/2+1)}
\]
\[
= \frac{1}{D_1^{*}}(2\pi)^{-(n-\mathcal{S})/2}\,\frac{(\tau^2)^{\mathcal{S}v/2}}{\big(\Gamma\big(\frac{v}{2}\big)2^{v/2}\big)^{\mathcal{S}}}\,n^{-\mathcal{S}(m+1)/2}\exp\left(\frac{-n}{2}\right)\prod_{l=1}^{\mathcal{S}}\exp\left(\frac{-n_l\tau^2}{2\,SSR_l}\right)\left(\frac{SSR_l}{n_l}\right)^{-\left(\frac{n_l+v}{2}+1\right)}\prod_{j=1}^{m}|G_j|^{-1/2}\exp\left(\frac{-1}{2}\hat{\mathbf{g}}_j'G_j^{-1}\hat{\mathbf{g}}_j\right);
\]
consequently:
\[
\pi(\mathbf{y}\mid W,M_{1SSH}^{*}) \approx \frac{K_1^{*}}{D_1^{*}}\,\frac{|\boldsymbol{\Sigma}|^{am/2}}{2^{am\mathcal{S}/2}\big(\Gamma_{\mathcal{S}}\big(\frac{a}{2}\big)\big)^m}\prod_{j=1}^{m}\int_{\mathcal{P}_{\mathcal{S}}^{+}}|G_j|^{-(a+\mathcal{S}+2)/2}\exp\left(\frac{-1}{2}\,tr\big((\boldsymbol{\Sigma}+\hat{\mathbf{g}}_j\hat{\mathbf{g}}_j')G_j^{-1}\big)\right)dG_j
= \frac{K_1^{*}}{D_1^{*}}\,2^{m\mathcal{S}/2}\left(\frac{\Gamma_{\mathcal{S}}\big(\frac{a+1}{2}\big)}{\Gamma_{\mathcal{S}}\big(\frac{a}{2}\big)}\right)^{m}\frac{|\boldsymbol{\Sigma}|^{am/2}}{\prod_{j=1}^{m}\big|\boldsymbol{\Sigma}+\hat{\mathbf{g}}_j\hat{\mathbf{g}}_j'\big|^{(a+1)/2}},
\]
where
\[
K_1^{*} = (2\pi)^{-(n-\mathcal{S})/2}\,\frac{(\tau^2)^{\mathcal{S}v/2}}{\big(\Gamma\big(\frac{v}{2}\big)2^{v/2}\big)^{\mathcal{S}}}\,n^{-\mathcal{S}(m+1)/2}\exp\left(\frac{-n}{2}\right)\prod_{l=1}^{\mathcal{S}}\exp\left(\frac{-n_l\tau^2}{2\,SSR_l}\right)\left(\frac{SSR_l}{n_l}\right)^{-\left(\frac{n_l+v}{2}+1\right)}.
\]

The null model here is actually not a single model, but $\mathcal{S}$ independent models, each fitting one subpopulation. Thus, the predicted vector of allelic effects is formed by putting together the vectors $\hat{\mathbf{g}}_1,\ldots,\hat{\mathbf{g}}_{\mathcal{S}}$ obtained from the individual analyses. Of course, in this situation heterogeneous residual variances are assumed, because if the analyses for different subpopulations are independent, imposing the same residual variance on all subpopulations does not seem to be the best approach. Notice that for each subpopulation, the $\mathbf{y}_l$ component of the likelihood has the same form as the $\mathbf{y}$ component of the likelihood of any null model with homogeneous marker effects and residual variances, but here we are considering a subvector of $\mathbf{y}$ containing phenotypes from subpopulation $l$ and the appropriate rows of $W_0$. Then, the approximation of $\pi(\mathbf{y}\mid W,M_{0SSH}^{*})$ is computed as the product of the approximations of $\pi(\mathbf{y}_l\mid W_l,M_{0SSH}^{*})$, $l=1,2,\ldots,\mathcal{S}$, which, after some simplifications, yields the expression for $BF_{10GWH}^{*}$ presented in section 2.3.1.

Regarding fractional Bayes factors, as mentioned in the paper, $FBF_{10GW} = FBF_{10GW}^{*} = FBF_{10SSW} = FBF_{10SSW}^{*}$; here we present some details on the derivation of $FBF_{10GW}$, that is, using the Gaussian prior with homogeneous marker effect covariance matrices. In the other cases the procedure is analogous; the key step is the cancellation of the terms coming from the priors. Let

\[
I_1 := \int_{\mathbb{R}^{m\mathcal{S}}}\int_{\mathbb{R}_+}\big(f_1(\mathbf{y}\mid\mathbf{g},\sigma^2,W)\big)^{c}\,\pi(\mathbf{g}\mid G^{0})\,\pi(\sigma^2)\,d\sigma^2 d\mathbf{g}
\approx \frac{\tilde{K}_1}{\tilde{D}_1}|G^{0}|^{-m/2}\exp\left(\frac{-1}{2}\hat{\mathbf{g}}'\big((G^{0})^{-1}\otimes I\big)\hat{\mathbf{g}}\right),
\]
where
\[
\tilde{K}_1 = (2\pi)^{-(nc+1)/2}\left(\frac{SSR}{n}\right)^{-(nc+v+2)/2}\exp\left(\frac{-nc}{2}\right)\frac{(\tau^2)^{v/2}}{\Gamma\big(\frac{v}{2}\big)2^{v/2}}\exp\left(\frac{-n\tau^2}{2\,SSR}\right)n^{-(m\mathcal{S}+1)/2},
\]
\[
\tilde{D}_1 = c^{(m\mathcal{S}+1)/2}\,\frac{n\,|W'W|^{1/2}}{\sqrt{2}\,SSR^{(m\mathcal{S}+2)/2}} = c^{(m\mathcal{S}+1)/2}D_1.
\]
Then
\[
f_c(\mathbf{y}\mid W,M_{1G}) \approx 2^{m\mathcal{S}/2}\,\frac{\tilde{K}_1}{c^{(m\mathcal{S}+1)/2}D_1}\,\frac{|\boldsymbol{\Sigma}|^{a/2}\,\Gamma_{\mathcal{S}}\big(\frac{a+m}{2}\big)}{\Gamma_{\mathcal{S}}\big(\frac{a}{2}\big)\,\big|\boldsymbol{\Sigma}+\sum_{j=1}^{m}\hat{\mathbf{g}}_j\hat{\mathbf{g}}_j'\big|^{(a+m)/2}}.
\]
As mentioned in the derivation of the Bayes factors, $f_c(\mathbf{y}\mid W_0,M_{0G})$ can be easily obtained from the previous expression. Let $\overline{BF}_{10GW}$ represent the approximation of $BF_{10GW}$; then
\[
FBF_{10GW} \approx \overline{BF}_{10GW}\,\frac{\tilde{K}_0\,c^{(m\mathcal{S}+1)/2}D_1}{\tilde{K}_1\,c^{(m+1)/2}D_0}\left(\frac{b}{|\boldsymbol{\Sigma}|}\right)^{a/2}\left(\frac{b+\sum_{j=1}^{m}\hat{g}_{0j}^2}{\big|\boldsymbol{\Sigma}+\sum_{j=1}^{m}\hat{\mathbf{g}}_j\hat{\mathbf{g}}_j'\big|}\right)^{-(a+m)/2} 2^{m(1-\mathcal{S})/2}\prod_{l=2}^{\mathcal{S}}\frac{\Gamma\big(\frac{a+1-l}{2}\big)}{\Gamma\big(\frac{a+m+1-l}{2}\big)}.
\]
Noticing that
\[
\frac{\tilde{K}_0}{\tilde{K}_1} = \left(\frac{SSR_0}{SSR}\right)^{-(nc+v+2)/2}n^{m(\mathcal{S}-1)/2}\exp\left(\frac{-n\tau^2}{2}\left(\frac{1}{SSR_0}-\frac{1}{SSR}\right)\right),
\]
all the terms coming from the priors, the determinants $|W'W|$ and $|W_0'W_0|$, the powers of $n$ and $2$, and the exponential terms cancel with the corresponding terms of $\overline{BF}_{10GW}$, so that
\[
\therefore\ FBF_{10GW} \approx c^{m(\mathcal{S}-1)/2}\left(\frac{SSR}{SSR_0}\right)^{-(n+v+2)/2}\left(\frac{SSR_0}{SSR}\right)^{-(nc+v+2)/2} = c^{m(\mathcal{S}-1)/2}\left(\frac{SSR}{SSR_0}\right)^{n(c-1)/2}.
\]

Now we present some details on the derivation of $FBF_{10WH}^{*}$:
\[
f_c(\mathbf{y}\mid W_1,M_{1GH}^{*}) \approx \frac{\tilde{K}_1^{*}}{\tilde{D}_1^{*}}\,2^{m\mathcal{S}/2}\left(\frac{\Gamma_{\mathcal{S}}\big(\frac{a+1}{2}\big)}{\Gamma_{\mathcal{S}}\big(\frac{a}{2}\big)}\right)^{m}\frac{|\boldsymbol{\Sigma}|^{am/2}}{\prod_{j=1}^{m}\big|\boldsymbol{\Sigma}+\hat{\mathbf{g}}_j\hat{\mathbf{g}}_j'\big|^{(a+1)/2}},
\]
\[
\tilde{K}_1^{*} = (2\pi)^{-(nc-\mathcal{S})/2}\,n^{-\mathcal{S}(m+1)/2}\exp\left(\frac{-nc}{2}\right)\frac{(\tau^2)^{\mathcal{S}v/2}}{\big(\Gamma\big(\frac{v}{2}\big)2^{v/2}\big)^{\mathcal{S}}}\prod_{l=1}^{\mathcal{S}}\exp\left(\frac{-n_l\tau^2}{2\,SSR_l}\right)\left(\frac{SSR_l}{n_l}\right)^{-(n_l c+v+2)/2},
\qquad
\tilde{D}_1^{*} = c^{\mathcal{S}(m+1)/2}D_1^{*}.
\]

After some simplifications it follows that
\[
FBF_{10WH}^{*} \approx \frac{\tilde{K}_0^{*}\,\tilde{D}_1^{*}}{\tilde{K}_1^{*}\,\tilde{D}_0^{*}} = \frac{\tilde{K}_0^{*}\,D_1^{*}}{\tilde{K}_1^{*}\,D_0^{*}},
\qquad
\frac{\tilde{K}_0^{*}}{\tilde{K}_1^{*}} = \frac{n^{\mathcal{S}(m+1)/2}}{\prod_{l=1}^{\mathcal{S}}n_l^{(m+1)/2}} = \left(\frac{D_1^{*}}{D_0^{*}}\right)^{-1},
\]
\[
\therefore\ FBF_{10WH}^{*} \approx 1.
\]
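The Laplace device used throughout this appendix replaces each inner integral by $(2\pi)^{d/2}n^{-d/2}\big|\Delta_h(\hat{\boldsymbol{\theta}}^{*})\big|^{-1/2}e^{l(\hat{\boldsymbol{\theta}}^{*})}\pi(\hat{\boldsymbol{\theta}}^{*})$, where $d$ is the parameter dimension. As a quick sanity check of this approximation, the following one-parameter sketch in Python (a stand-in for the paper's R scripts; the Gaussian likelihood, the $N(0,10^2)$ prior, and all numbers are illustrative assumptions, not quantities from the paper) compares it against brute-force numerical integration:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
y = rng.normal(1.5, 1.0, size=n)

def loglik(theta):           # n iid N(theta, 1) observations
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * np.sum((y - theta) ** 2)

def prior(theta):            # illustrative N(0, 10^2) prior density
    return np.exp(-0.5 * (theta / 10.0) ** 2) / (10.0 * np.sqrt(2 * np.pi))

theta_hat = y.mean()         # mode of the likelihood
# d = 1, Delta_h = -(1/n) * d^2 loglik / dtheta^2 = 1 for unit residual variance
laplace = np.sqrt(2 * np.pi / n) * np.exp(loglik(theta_hat)) * prior(theta_hat)

# brute-force Riemann sum of the exact marginal likelihood near the mode
grid = np.linspace(theta_hat - 1.0, theta_hat + 1.0, 20001)
exact = np.sum(np.exp([loglik(t) for t in grid]) * prior(grid)) * (grid[1] - grid[0])

print(laplace / exact)       # close to 1
```

With a Gaussian likelihood and a slowly varying prior the approximation error is of smaller order than $1/n$, which is why the ratio is essentially one here; the appendix applies the same device in higher dimension.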

Appendix C: Comments about the use of $\overline{FBF}_{10W}$ in the non-full rank case under the priors used in this study

Notice that $\overline{FBF}_{10W}$ is an approximation of $FBF_{10W}$ for a full rank linear regression model. Here, $\overline{FBF}_{10W}$ is seen as a limiting form of the approximate fractional Bayes factor of a function referred to as the "augmented likelihood", and we present an outline of how its convergence to the true fractional Bayes factor could be justified when the model is not of full rank. Take into account that what follows is not rigorous enough to fully justify this approach.

Consider the "augmented likelihood":
\[
L_a(\mathbf{g},\sigma^2,W,\rho\mid\mathbf{y}) := (2\pi)^{-n/2}(\sigma^2)^{-n/2}\exp\left(\frac{-1}{2\sigma^2}\big((\mathbf{y}-W\mathbf{g})'(\mathbf{y}-W\mathbf{g})+\rho\,\mathbf{g}'\mathbf{g}\big)\right), \quad \rho>0;
\]
notice that $\lim_{\rho\to 0}L_a(\mathbf{g},\sigma^2,W,\rho\mid\mathbf{y}) = f(\mathbf{y}\mid\mathbf{g},\sigma^2,W) := L(\mathbf{g},\sigma^2,W\mid\mathbf{y})$, the $\mathbf{y}$ component of the likelihood. Using this augmented likelihood instead of the proper likelihood, the integral that has to be solved is:
\[
I_\rho := \int_{\mathcal{P}^{+}}\pi(G)\left(\int_{\mathbb{R}^{m\mathcal{S}}}\int_{\mathbb{R}_+}L_a(\mathbf{g},\sigma^2,W,\rho\mid\mathbf{y})\,\pi(\mathbf{g}\mid G^{0})\,\pi(\sigma^2)\,d\sigma^2 d\mathbf{g}\right)dG.
\]
Under the regularity condition that limits and integrals can be interchanged (satisfied in exponential families), it follows that:
\[
\lim_{\rho\to 0} I_\rho = \int_{\mathcal{P}^{+}}\pi(G)\left(\int_{\mathbb{R}^{m\mathcal{S}}}\int_{\mathbb{R}_+}L(\mathbf{g},\sigma^2,W\mid\mathbf{y})\,\pi(\mathbf{g}\mid G^{0})\,\pi(\sigma^2)\,d\sigma^2 d\mathbf{g}\right)dG := I;
\]
thus, under the stated regularity condition, for small $\rho$, using $L_a(\mathbf{g},\sigma^2,W,\rho)$ instead of the proper likelihood is a proxy to compute $I$. Then, the Laplace approximation of $\int_{\mathbb{R}^{m\mathcal{S}}}\int_{\mathbb{R}_+}L_a(\mathbf{g},\sigma^2,W,\rho)\,\pi(\mathbf{g}\mid G)\,\pi(\sigma^2)\,d\sigma^2 d\mathbf{g} := I_\rho^{*}$ (recall that the Laplace approximation is used only for this inner integral) can be computed, because it involves inverting a matrix of the form $W'W+\rho I$, which is positive definite and therefore invertible with non-null determinant. This approximation is denoted as $\bar{I}_\rho^{*}$. Subsequently, the same steps used before are performed in order to obtain a pseudo fractional Bayes factor:
\[
\overline{FBF}_{10W}^{\rho} = c^{m(\mathcal{S}-1)/2}\left(\frac{SSR_\rho}{SSR_{\rho 0}}\right)^{n(c-1)/2},
\]
where $SSR_\rho = \mathbf{y}'\big(I - W(W'W+\rho I)^{-1}W'\big)\mathbf{y}$ and $SSR_{\rho 0} = \mathbf{y}'\big(I - W_0(W_0'W_0+\rho I)^{-1}W_0'\big)\mathbf{y}$. We know that:
\[
\bar{I}_\rho^{*} \sim I_\rho^{*} \ \text{as } n\to\infty \quad \text{(by the Laplace approximation)},
\]
where $\bar{I}_\rho^{*}\sim I_\rho^{*}$ means $\bar{I}_\rho^{*}/I_\rho^{*}\to 1$ as $n\to\infty$. Under the regularity condition mentioned above, this implies that
\[
\bar{I}_\rho \sim I_\rho \ \text{as } n\to\infty, \qquad (C.1)
\]
where $\bar{I}_\rho = \int_{\mathcal{P}^{+}}\pi(G)\,\bar{I}_\rho^{*}\,dG$. We also know that
\[
I_\rho^{*} \sim I^{*} = \int_{\mathbb{R}^{m\mathcal{S}}}\int_{\mathbb{R}_+}L(\mathbf{g},\sigma^2,W\mid\mathbf{y})\,\pi(\mathbf{g}\mid G^{0})\,\pi(\sigma^2)\,d\sigma^2 d\mathbf{g} \ \text{as } \rho\to 0 \ \Longrightarrow\ I_\rho \sim I \ \text{as } \rho\to 0; \qquad (C.2)
\]
however, $(C.1)$ and $(C.2)$ do not necessarily imply that
\[
\bar{I}_\rho \sim I \ \text{as } (n,\rho)\to(\infty,0), \qquad (C.3)
\]
because iterated limits are not always equal to multivariate (bivariate, in this case) limits. Thus, it has to be shown that $\lim_{\rho\to 0}\big(\lim_{n\to\infty}\bar{I}_\rho^{*}/I\big) = \lim_{(n,\rho)\to(\infty,0)}\bar{I}_\rho^{*}/I = 1$. To prove this, the first step is to show that the bivariate limit exists, and then that it is equal to the iterated limit. If this is shown, then $\lim_{(n,\rho)\to(\infty,0)}\bar{I}_\rho^{*}/I = \lim_{n\to\infty}\big(\lim_{\rho\to 0}\bar{I}_\rho^{*}/I\big) = 1$, and $\overline{FBF}_{10W}$, which clearly satisfies $\overline{FBF}_{10W} = \lim_{\rho\to 0}\overline{FBF}_{10W}^{\rho}$, is an approximation to the fractional Bayes factor in the non-full rank scenario.
