Joint genome-wide prediction in several populations accounting for randomness of genotypes: A hierarchical Bayes approach. II: Multivariate spike and slab priors for marker effects and derivation of approximate Bayes and fractional Bayes factors for the complete family of models

Carlos Alberto Martínez^{a,b}, Kshitij Khare^{b}, Arunava Banerjee^{c}, Mauricio A. Elzo^{a}

^{a}Department of Animal Sciences
^{b}Department of Statistics
^{c}Department of Computer and Information Science and Engineering
University of Florida, Gainesville, FL, USA

Correspondence: Carlos Alberto Martínez, Department of Animal Sciences, University of Florida, Gainesville, FL 32611, USA.
Tel: 352-328-1624.
Fax: 352-392-7851.
E-mail: [email protected]
Abstract

This study is the second part of a companion paper devoted to the development of Bayesian multiple regression models accounting for randomness of genotypes in across-population genome-wide prediction. This family of models considers heterogeneous and correlated marker effects and allelic frequencies across populations, and can incorporate records from non-genotyped individuals and individuals with missing genotypes in any subset of loci without the need for previous imputation, taking into account uncertainty about imputed genotypes. This paper extends the family of models by considering multivariate spike and slab conditional priors for marker allele substitution effects, and contains derivations of approximate Bayes factors and fractional Bayes factors to compare the models from part I and those developed here with their null versions. These null versions are simpler models that ignore heterogeneity of populations but still account for randomness of genotypes. For each marker locus, the spike component of the prior was a point mass at $0$ in $\mathbb{R}^{\mathcal{S}}$, where $\mathcal{S}$ is the number of populations, the slab component was an $\mathcal{S}$-variate Gaussian distribution, and independent conditional priors were assumed across loci. For the Gaussian components, covariance matrices were assumed to be either the same for all markers or different for each marker. For null models, the priors were simply univariate versions of these finite mixture distributions. Approximate algebraic expressions for Bayes factors and fractional Bayes factors were found using the Laplace approximation. Using the simulated datasets described in part I, these models were implemented and compared with the models derived in part I using measures of predictive performance based on squared Pearson correlations, the Deviance Information Criterion, Bayes factors, and fractional Bayes factors. The extensions presented here enlarge our family of genome-wide prediction models, making it more flexible in the sense that it now offers more modeling options.
Key words: Bayesian whole-genome regressions; finite mixture priors; genetic heterogeneity; Laplace approximation; multi-population genome-enabled prediction.

1. Introduction
The scenario of across-population genome-wide prediction accounting for randomness of genotypes was addressed in part I of this series of studies. There, we adopted a hierarchical Bayesian modeling strategy to accommodate heterogeneous and correlated marker effects across subpopulations and random genotypes. In that companion paper we provided a detailed derivation of the joint pmf of the genotypes conditional on pedigree information and allelic frequencies, and discussed some of its properties. Furthermore, the flexibility of hierarchical Bayesian modeling allowed us to account for heterogeneous and correlated allelic frequencies. The "MG-GBLUP" model proposed by Lehermeier et al. (2015) is similar to the models developed in part I of this study, except that they did not consider randomness of genotypes. In addition, they did not consider models with different (heterogeneous) covariance matrices of marker effects. One of the main properties of our models is that individuals with phenotypic records and missing genotypes at any subset of loci (including non-genotyped individuals) can be considered in the analysis without previous imputation. Furthermore, due to the use of a Bayesian approach, uncertainty about imputed genotypes is automatically taken into account.

The so-called "spike and slab" priors are finite mixtures of a continuous distribution (the slab) and a point mass at some constant (the spike) (Mitchell and Beauchamp, 1988). A particular case of these priors is the zero-inflated priors, which have point mass at zero. This sort of prior has been used in high-dimensional problems to induce stronger shrinkage and to perform variable selection. In single-population analyses, it has been reported that when there are genes with major effects controlling the trait under study, or when the number of genes controlling the trait is low, Bayesian variable selection models tend to perform better (Daetwyler et al., 2012; Heslot et al., 2012; Gianola and Rosa, 2015). In the case of multiple-population analyses, van den Berg et al. (2015) studied scenarios under which
Bayesian variable selection models outperformed genomic BLUP (GBLUP). They found that GBLUP was outperformed when the number of QTL was smaller than the number of independent chromosome segments. They also found that the difference in accuracy between these models was larger than in the single-population case.

In a Bayesian framework, model comparison can be performed via Bayes factors and some modifications of them known as non-subjective Bayes factors (Ghosh et al., 2006). Bayes factors measure the change in the odds favoring a model once data are observed (Lavine and Schervish, 1999). On the other hand, O'Hagan (1994; 1995) proposed a non-subjective Bayes factor known as the fractional Bayes factor, which uses a fractional part of the likelihood, resulting in a "partial" Bayes factor. Analytical forms of Bayes factors involve integration of the joint distribution of data and parameters over the parameter space of a given model to obtain marginal likelihoods, and even for some simple models these integrals do not have a closed-form solution. One option to obtain algebraic approximations is to use the Laplace approximation after arranging the integrand in an appropriate form (Ghosh et al., 2006). Another criterion to compare models is the Deviance Information Criterion (DIC; Spiegelhalter et al., 2002; 2014), which combines measures of model fit and model complexity and, despite some limitations, has been used in several research areas (Spiegelhalter et al., 2014).

Thus, the objectives of this study were to extend the family of models developed in a companion paper (part I) by considering the so-called spike and slab priors for marker effects, and to derive approximate expressions for Bayes factors and fractional Bayes factors to compare the proposed models with their corresponding null versions that ignore population structure.
2. Methods

2.1 The models
The complete population, or simply the population, is defined as the set of individuals with phenotypes considered in the analysis, which comprises a set of $\mathcal{S}$ subpopulations defined by some criterion such as environment, race, breed, or line. The following assumptions are also made: linkage equilibrium, Hardy-Weinberg equilibrium, no mutation, and, starting from the oldest individuals with phenotypes, a fully known pedigree.

The following linear model describes the relationship between records and mappings of marker genotypes: $\boldsymbol{y} = W\boldsymbol{g} + \boldsymbol{e}$, where $\boldsymbol{y} \in \mathbb{R}^n$ is a vector containing response variables (e.g., records corrected for non-genetic factors), $W \in \mathbb{R}^{n \times m}$ is an observable random matrix with entries corresponding to a one-to-one mapping from the set of individual marker genotypes to a subset of the integers (defined later), $\boldsymbol{g} \in \mathbb{R}^m$ is an unknown random vector of average marker allele substitution effects for every population, and $\boldsymbol{e} \in \mathbb{R}^n$ is a random vector of residuals. If records are sorted by subpopulation, as well as the columns of $W$ and the elements of $\boldsymbol{g}$, then for every $l = 1,2,\ldots,\mathcal{S}$, $\boldsymbol{y}_l = W_l\boldsymbol{g}_l + \boldsymbol{e}_l$, with dimensions $(\boldsymbol{y}_l)_{n_l \times 1}$, $(W_l)_{n_l \times m}$, $(\boldsymbol{g}_l)_{m \times 1}$ and $(\boldsymbol{e}_l)_{n_l \times 1}$, where $n_l$ is the sample size of subpopulation $l$ and $m$ is the number of marker loci; therefore, $n = \sum_{l=1}^{\mathcal{S}} n_l$.
In our models, the mapping from the set of genotypes at each locus and each individual into a subset of the integers is defined as follows; biallelic loci are considered. If A and B are the marker alleles at each locus and B is taken as the reference allele, then:

$$W_l = \{w_{ij}^l\}_{n_l \times m}, \qquad w_{ij}^l = \begin{cases} 1, & \text{if genotype} = BB \\ 0, & \text{if genotype} = AB \\ -1, & \text{if genotype} = AA. \end{cases}$$
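As a concrete illustration of this coding (a sketch of ours, not code from the paper; the function name and the string representation of genotypes are hypothetical):

```python
import numpy as np

# Hypothetical genotype strings -> integer codes, with B the reference allele.
CODE = {"BB": 1, "AB": 0, "BA": 0, "AA": -1}

def build_W(genotypes):
    """Map a list of genotype-string rows (individuals x loci) to W_l."""
    return np.array([[CODE[g] for g in row] for row in genotypes])

W_l = build_W([["BB", "AB", "AA"],
               ["AA", "BB", "AB"]])   # n_l = 2 individuals, m = 3 loci
```

Each row of the resulting matrix corresponds to one individual of subpopulation $l$ and each column to one marker locus.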
The following is the hierarchical representation of our models. Let $R = (\sigma_{e_1}^2, \ldots, \sigma_{e_{\mathcal{S}}}^2)$ and $V = \text{Block Diag.}\{\sigma_{e_l}^2 I_{n_l}\}_{l=1}^{\mathcal{S}}$; then

$$\boldsymbol{y}\,|\,W, \boldsymbol{g}, R \sim MVN(W\boldsymbol{g}, V)$$
$$W\,|\,\boldsymbol{p}_1^*, \boldsymbol{p}_2^*, \ldots, \boldsymbol{p}_m^* \sim \pi(\cdot\,|\,\boldsymbol{p}_1^*, \boldsymbol{p}_2^*, \ldots, \boldsymbol{p}_m^*)$$
$$\boldsymbol{p}_j^* \overset{iid}{\sim} \pi(\boldsymbol{p}^*), \quad j = 1,2,\ldots,m$$
$$\sigma_{e_1}^2, \ldots, \sigma_{e_{\mathcal{S}}}^2 \overset{iid}{\sim} \text{Inverse Gamma}\left(\frac{\tau^2}{2}, \frac{v}{2}\right) := IG\left(\frac{\tau^2}{2}, \frac{v}{2}\right)$$
$$\boldsymbol{g}_j\,|\,G_j, \pi_0 \overset{ind}{\sim} \begin{cases} \text{point mass at } \boldsymbol{0} & \text{with probability } \pi_0 \\ MVN(\boldsymbol{0}, G_j) & \text{with probability } 1 - \pi_0 \end{cases}$$
$$G_j \overset{iid}{\sim} \text{Inverse Wishart}(a, \boldsymbol{\Sigma}) := IW(a, \boldsymbol{\Sigma})$$
$$G_j = \begin{bmatrix} \sigma_{j1}^2 & \sigma_{j1,2} & \cdots & \sigma_{j1,\mathcal{S}} \\ & \sigma_{j2}^2 & \cdots & \sigma_{j2,\mathcal{S}} \\ & & \ddots & \vdots \\ sym & & & \sigma_{j\mathcal{S}}^2 \end{bmatrix}$$
where $\sigma_{e_l}^2$ is the residual variance in subpopulation $l$, $\sigma_{jl}^2$ is the variance of the effect of the $j^{th}$ marker in the $l^{th}$ subpopulation, $\sigma_{jl,l'}$ is the covariance between effects of marker $j$ in subpopulations $l$ and $l'$, $\boldsymbol{p}_j^*$ is a parameter related to allelic frequencies of the $j^{th}$ marker in each subpopulation, and $\pi(\boldsymbol{p}^*)$ is its probability density function (pdf). This set of parameters and their pdf are described in part I of this series of papers. Here, the parameter $\pi_0$ was assumed to be known.
The model presented above assumes a different covariance matrix for the vector of allele substitution effects of each marker in the slab component of the mixture distribution; consequently, such models will be referred to as heterogeneous marker effects covariance matrix models. On the other hand, models with $G_1 = \cdots = G_m = G^0$ will be referred to as homogeneous marker effects covariance matrix models. Moreover, the special case $\sigma_{e_1}^2 = \cdots = \sigma_{e_{\mathcal{S}}}^2 = \sigma^2$ corresponds to models with homoscedastic residuals.
In part I, it was discussed that the scenario of completely (i.e., at all loci) or partially missing genotypes can be handled because of the use of the pmf $\pi(W|P^*)$, $P^* = (\boldsymbol{p}_1^*, \boldsymbol{p}_2^*, \ldots, \boldsymbol{p}_m^*)$, and the fact that these missing genotypes are regarded as model parameters. There, it was also shown that the likelihood can be written as $f(\boldsymbol{y}, W^{\mathcal{O}}|W^N, \boldsymbol{g}, R, P^*) = f(\boldsymbol{y}|W, \boldsymbol{g}, R)f(W^{\mathcal{O}}|W^N, P^*)$, where $W^{\mathcal{O}}$ is the fraction of $W$ corresponding to observed genotypes, $W^N$ is the fraction corresponding to missing genotypes, and $f(\boldsymbol{y}|W, \boldsymbol{g}, R)$ and $f(W^{\mathcal{O}}|W^N, P^*)$ are referred to as the $\boldsymbol{y}$ component and the $W$ component of the likelihood.
The conditional prior for $\boldsymbol{g}_j$ can be written as:

$$\pi(\boldsymbol{g}_j|G_j, \pi_0) = \pi_0 I_{\{\boldsymbol{g}_j = \boldsymbol{0}\}} + (1 - \pi_0)MVN(\boldsymbol{g}_j; \boldsymbol{0}, G_j)I_{\{\boldsymbol{g}_j \neq \boldsymbol{0}\}}$$

where $I_{\{\cdot\}}$ is the indicator function. This form is more convenient from the algebraic point of view because it allows carrying out computations and writing expressions for the joint conditional prior in an easier way. Under the conditional independence assumption, the joint conditional prior for $\boldsymbol{g}$ is:

$$\pi(\boldsymbol{g}|G, \pi_0) = \prod_{j=1}^m \left\{\pi_0 I_{\{\boldsymbol{g}_j = \boldsymbol{0}\}} + (1 - \pi_0)MVN(\boldsymbol{g}_j; \boldsymbol{0}, G_j)I_{\{\boldsymbol{g}_j \neq \boldsymbol{0}\}}\right\}.$$
An explicit form of this prior pdf can be found as follows. Let $i = 0,1,\ldots,m$ be the number of markers having a null effect. Consequently, when expanding the product above, for each $i$ there are $\binom{m}{i}$ combinations of $i$ markers with null effect chosen from $m$ markers. For $l = 1,2,\ldots,\binom{m}{i}$, let $\delta_{il}$ denote the event that the $l^{th}$ subset of $i$ markers (i.e., the $l^{th}$ combination of $i$ markers with null effect chosen from the total set of $m$ markers) has null effect, and let $I_{\delta_{il}}$ be the indicator function of this event. Thus, there are $\binom{m}{i}$ terms in the expansion with $\pi_0$ appearing exactly $i$ times; therefore, each one of these $\binom{m}{i}$ terms is of the form:

$$I_{\delta_{il}}\pi_0^i(1 - \pi_0)^{m-i}\prod_{k:\,\boldsymbol{g}_k \in \delta_{il0}^c} MVN(\boldsymbol{g}_k; \boldsymbol{0}, G_k)$$

where $\delta_{il0}$ is the set of marker loci with null effects given $\delta_{il}$, and $\delta_{il0}^c$ is its complement, i.e., the set of $m - i$ markers with non-null effect under $\delta_{il}$. Therefore, when expanding $\pi(\boldsymbol{g}|G, \pi_0)$ for the heterogeneous marker effect covariance matrix model:
$$\pi(\boldsymbol{g}|G, \pi_0) = \sum_{i=0}^m \pi_0^i(1 - \pi_0)^{m-i}\sum_{l=1}^{\binom{m}{i}} I_{\delta_{il}}\prod_{k:\,\boldsymbol{g}_k \in \delta_{il0}^c} MVN(\boldsymbol{g}_k; \boldsymbol{0}, G_k),$$

while for the homogeneous marker effect covariance matrix model the expression is the same except that $G_j = G^0\ \forall\ j = 1,2,\ldots,m$.
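Although this expansion has $2^m$ terms, a draw from the conditional prior only requires one independent spike-or-slab decision per marker. A minimal sketch under the homogeneous-covariance model (sizes, seed, and function name are illustrative, not from the paper):

```python
import numpy as np

def sample_g(pi0, G_list, rng):
    """Draw g = (g_1', ..., g_m')' from the conditional spike and slab prior:
    each S-vector g_j is 0 with probability pi0 and MVN(0, G_j) otherwise."""
    draws = []
    for Gj in G_list:
        S = Gj.shape[0]
        if rng.random() < pi0:
            draws.append(np.zeros(S))                               # spike
        else:
            draws.append(rng.multivariate_normal(np.zeros(S), Gj))  # slab
    return np.concatenate(draws)

rng = np.random.default_rng(1)
G0 = np.array([[1.0, 0.5],
               [0.5, 1.0]])                          # S = 2 subpopulations
g = sample_g(pi0=0.9, G_list=[G0] * 1000, rng=rng)   # m = 1000 markers
```

With $\pi_0 = 0.9$, roughly 90% of the $\boldsymbol{g}_j$ come out exactly zero, which is the variable-selection behavior the mixture is designed to induce.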
Regarding the marginal priors, under a homogeneous covariance matrix of marker effects:

$$\pi(\boldsymbol{g}) \propto \sum_{i=0}^m \pi_0^i(1 - \pi_0)^{m-i}\sum_{l=1}^{\binom{m}{i}} I_{\delta_{il}}\int_{\mathcal{P}_{\mathcal{S}}^+} |G^0|^{-\frac{a + \mathcal{S} + m - i + 1}{2}}\exp\left(-\frac{1}{2}tr\left(\left(\boldsymbol{\Sigma} + \sum_{k:\,\boldsymbol{g}_k \in \delta_{li0}^c}\boldsymbol{g}_k\boldsymbol{g}_k'\right)(G^0)^{-1}\right)\right)dG^0$$

$$\propto \sum_{i=0}^m \pi_0^i(1 - \pi_0)^{m-i}\sum_{l=1}^{\binom{m}{i}} I_{\delta_{il}}\,2^{\mathcal{S}(m-i)/2}\,\Gamma_{\mathcal{S}}\left(\frac{a + m - i}{2}\right)\left|\boldsymbol{\Sigma} + \sum_{k:\,\boldsymbol{g}_k \in \delta_{li0}^c}\boldsymbol{g}_k\boldsymbol{g}_k'\right|^{-\left(\frac{a + m - i}{2}\right)}.$$

Hence, marker effects are not marginally independent a priori, and their joint marginal prior distribution is a mixture of non-standard distributions with mixing probabilities $\pi_0^i(1 - \pi_0)^{m-i}$.
For the heterogeneous marker effect covariance matrix model:

$$\pi(\boldsymbol{g}) \propto \sum_{i=0}^m \pi_0^i(1 - \pi_0)^{m-i}\sum_{l=1}^{\binom{m}{i}} I_{\delta_{il}}\,2^{-\mathcal{S}i/2}\left(\Gamma_{\mathcal{S}}\left(\frac{a}{2}\right)\right)^i\left(\Gamma_{\mathcal{S}}\left(\frac{a + 1}{2}\right)\right)^{m-i}\prod_{k:\,\boldsymbol{g}_k \in \delta_{il0}^c}\frac{1}{\left|1 + \dfrac{\boldsymbol{g}_k'\boldsymbol{\Sigma}_*^{-1}\boldsymbol{g}_k}{a + 1 - \mathcal{S}}\right|^{\frac{a+1}{2}}}.$$

This is a mixture distribution with mixing probabilities $\pi_0^i(1 - \pi_0)^{m-i}$. Each component in the mixture is a sum of $\binom{m}{i}$ elements. Each one of these elements is the product of $m - i$ multivariate t distributions with scale matrix $\boldsymbol{\Sigma}_* = \frac{1}{a + 1 - \mathcal{S}}\boldsymbol{\Sigma}$ and degrees of freedom $a + 1 - \mathcal{S}$ for the non-null vectors of marker effects, and point mass at zero for the $i$ null vectors of marker effects, under event $\delta_{il}$. In this case, marker effects are marginally independent a priori.
2.2 Full conditionals

Only full conditionals that change with respect to those considered in part I are presented.
$$\pi(\boldsymbol{g}|Else) = \sum_{i=0}^m \pi_0^i(1 - \pi_0)^{m-i}\sum_{l=1}^{\binom{m}{i}} I_{\delta_{il}}\,MVN\left(\boldsymbol{g}_{\delta_{il0}^c};\left(\frac{W_{\delta_{il0}^c}'W_{\delta_{il0}^c}}{\sigma^2} + G_{\delta_{il0}^c}^{-1}\right)^{-1}\frac{W_{\delta_{il0}^c}'\boldsymbol{y}}{\sigma^2},\left(\frac{W_{\delta_{il0}^c}'W_{\delta_{il0}^c}}{\sigma^2} + G_{\delta_{il0}^c}^{-1}\right)^{-1}\right)$$

where $\boldsymbol{g}_{\delta_{il0}^c} = (\boldsymbol{g}_{k_1}'\ \cdots\ \boldsymbol{g}_{k_{m-i}}')'$, $k: \boldsymbol{g}_k \in \delta_{il0}^c$, corresponds to the vector of dimension $\mathcal{S}(m - i)$ with the non-null marker effects under $\delta_{il}$, $W_{\delta_{il0}^c}$ is the submatrix of the design matrix corresponding to $\boldsymbol{g}_{\delta_{il0}^c}$, and $G_{\delta_{il0}^c}^{-1} = I_{m-i}\otimes(G^0)^{-1}$, $i = 0,1,\ldots,m$.
Remark 1. Notice that each element in the summation above corresponds to a multivariate normal distribution of dimension $\mathcal{S}(m - i)$ for those markers in $\delta_{il0}^c$ and point mass at zero for those markers in $\delta_{il0}$. Thus, in each term, the multivariate normal corresponds to the distribution of the effects of the subset of markers with non-null effects given $\delta_{il}$. Therefore, this joint full conditional distribution of $\boldsymbol{g}$ suggests that for each marker, the full conditional distribution of $\boldsymbol{g}_j$ (given the data and the other parameters in the model, including the remaining components of $\boldsymbol{g}$) is a spike and slab distribution.

Note that it is easier to deal with $\pi(\boldsymbol{g}_j|Else)$ than with $\pi(\boldsymbol{g}|Else)$. The full conditional $\pi(\boldsymbol{g}_j|Else)$ can be found from $\pi(\boldsymbol{g}|Else)$ using Bayes' theorem. However, this could be complex because it requires identifying all the cases in which $\boldsymbol{g}_j = \boldsymbol{0}$ and all the cases in which $\boldsymbol{g}_j \neq \boldsymbol{0}$. An easier way is to derive it using the conditional prior for $\boldsymbol{g}_j$. Details are presented in Appendix A. The final result is:
$$\pi(\boldsymbol{g}_j|Else) = p(\boldsymbol{g}_j = \boldsymbol{0}|Else)I_{\{\boldsymbol{g}_j = \boldsymbol{0}\}} + (1 - p(\boldsymbol{g}_j = \boldsymbol{0}|Else))\,MVN\left(G_{F_j}^{-1}\frac{W_j'}{\sigma^2}(\boldsymbol{y} - W_{(-j)}\boldsymbol{g}_{(-j)}),\, G_{F_j}^{-1}\right)I_{\{\boldsymbol{g}_j \neq \boldsymbol{0}\}}$$

$$G_{F_j} = \frac{W_j'W_j}{\sigma^2} + (G^0)^{-1}$$

$$p(\boldsymbol{g}_j = \boldsymbol{0}|Else) = \frac{\pi_0}{\pi_0 + (1 - \pi_0)\left(|G_{F_j}||G^0|\right)^{-1/2}\exp\left(\frac{1}{2\sigma^2}\left\|G_{*F_j}^{-1/2}W_j'(\boldsymbol{y} - W_{(-j)}\boldsymbol{g}_{(-j)})\right\|^2\right)},$$

where $G_{*F_j} = \sigma^2 G_{F_j} = W_j'W_j + \sigma^2(G^0)^{-1}$. Thus, the full conditional distribution of $\boldsymbol{g}_j$ is a spike and slab distribution where the slab component is a $MVN\left(G_{F_j}^{-1}\frac{W_j'}{\sigma^2}(\boldsymbol{y} - W_{(-j)}\boldsymbol{g}_{(-j)}), G_{F_j}^{-1}\right)$ and the spike is a point mass at $\boldsymbol{0}$ in $\mathbb{R}^{\mathcal{S}}$. On the other hand,
$$\pi(G^0|Else) \propto \sum_{i=0}^m \pi_0^i(1 - \pi_0)^{m-i}\sum_{l=1}^{\binom{m}{i}} I_{\delta_{il}}\,|G^0|^{-\frac{m - i + a + \mathcal{S} + 1}{2}}\exp\left(-\frac{1}{2}tr\left(\left(\boldsymbol{\Sigma} + \sum_{k:\,\boldsymbol{g}_k \in \delta_{il0}^c}\boldsymbol{g}_k\boldsymbol{g}_k'\right)(G^0)^{-1}\right)\right),$$

a mixture of sums of inverse Wishart distributions with mixing probabilities $\pi_0^i(1 - \pi_0)^{m-i}$, $i = 0,1,\ldots,m$. The $i^{th}$ component of the mixture is the sum of $\binom{m}{i}$ inverse Wishart distributions with parameters $\left(m - i + a,\ \boldsymbol{\Sigma} + \sum_{k:\,\boldsymbol{g}_k \in \delta_{il0}^c}\boldsymbol{g}_k\boldsymbol{g}_k'\right)I_{\delta_{il}}$, $l = 1,2,\ldots,\binom{m}{i}$.
For the heterogeneous marker effect covariance matrix model, the full conditional $\pi(\boldsymbol{g}_j|Else)$ has the same form as for the homogeneous marker effect covariance matrix model, except that now $G_{F_j} = \frac{W_j'W_j}{\sigma^2} + G_j^{-1}$ and $G_{*F_j} = W_j'W_j + \sigma^2 G_j^{-1}$, and

$$\pi(G_j|Else) = \begin{cases} IW(a + 1,\ \boldsymbol{\Sigma} + \boldsymbol{g}_j\boldsymbol{g}_j'), & \text{if } \boldsymbol{g}_j \neq \boldsymbol{0} \\ IW(a, \boldsymbol{\Sigma}), & \text{if } \boldsymbol{g}_j = \boldsymbol{0}. \end{cases}$$
The expressions for models with heteroscedastic residuals are very similar and are therefore omitted. Such expressions can be found in Appendix A along with the joint posterior distributions.
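The spike and slab full conditional of $\boldsymbol{g}_j$ lends itself to a simple Gibbs step: compute $p(\boldsymbol{g}_j=\boldsymbol{0}|Else)$, flip a coin, and either set $\boldsymbol{g}_j = \boldsymbol{0}$ or draw from the slab normal. The sketch below assumes the homogeneous-covariance, homoscedastic model; the toy data, sizes, and function name are illustrative, not the authors' implementation:

```python
import numpy as np

def update_gj(j, y, W, g, G0inv, sigma2, pi0, rng):
    """One Gibbs update of g_j, following the spike and slab full
    conditional above. W is n x mS with the S columns of marker j
    stored contiguously; g is the current mS-vector of effects."""
    S = G0inv.shape[0]
    cols = slice(j * S, (j + 1) * S)
    Wj = W[:, cols]
    resid = y - W @ g + Wj @ g[cols]        # y - W_(-j) g_(-j)
    GFj = Wj.T @ Wj / sigma2 + G0inv        # slab precision G_Fj
    GFj_inv = np.linalg.inv(GFj)
    b = Wj.T @ resid / sigma2               # slab mean is GFj_inv @ b
    # p(g_j = 0 | Else): spike weight from the ratio given in the text,
    # using (|G_Fj||G^0|)^(-1/2) exp(b' G_Fj^{-1} b / 2) on the log scale
    log_ratio = 0.5 * (np.linalg.slogdet(GFj_inv)[1]
                       + np.linalg.slogdet(G0inv)[1]) + 0.5 * b @ GFj_inv @ b
    p0 = pi0 / (pi0 + (1.0 - pi0) * np.exp(log_ratio))
    if rng.random() < p0:
        return np.zeros(S)                                  # spike
    return rng.multivariate_normal(GFj_inv @ b, GFj_inv)    # slab

# Toy data (illustrative only)
rng = np.random.default_rng(0)
n, m, S = 30, 4, 2
W = rng.integers(-1, 2, size=(n, m * S)).astype(float)
y = rng.normal(size=n)
g = np.zeros(m * S)
g_new = update_gj(0, y, W, g, np.eye(S), sigma2=1.0, pi0=0.5, rng=rng)
```

The exponent in `log_ratio` uses the algebraically equivalent form $\tfrac{1}{2}\boldsymbol{b}'G_{F_j}^{-1}\boldsymbol{b}$ with $\boldsymbol{b} = W_j'(\boldsymbol{y} - W_{(-j)}\boldsymbol{g}_{(-j)})/\sigma^2$, which avoids forming $G_{*F_j}^{-1/2}$ explicitly.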
2.3 Model comparison

2.3.1 Theoretical approximation to model comparison via Bayes factors and fractional Bayes factors
Here, the term null model refers to simplified versions of the proposed models in two scenarios. The first corresponds to the case in which all data are pooled and the factor splitting the complete population into subpopulations is ignored. In the second scenario, the complete population is split into subpopulations and each one of them is analyzed independently. The null model corresponding to the first scenario was already presented in part I; for the second scenario, the model for each subpopulation is the same, but only data from the corresponding subpopulation are considered. This model is referred to as the independent subpopulations model.
In order to find a theoretical approach to compare the full models with their null versions, approximate Bayes factors and fractional Bayes factors are derived in this section. To this end, analytical approximations of the multivariate integrals that have to be solved to find marginal likelihoods are derived. The Laplace approximation (Ghosh et al., 2006) is used to solve some of these multivariate integrals. As will be shown in this section, the use of the Laplace approximation requires the matrix $W$ to be of full column rank. This assumption does not hold in many real-life situations where $m > n$, and therefore this matrix cannot be of full column rank. However, as more individuals are genotyped, the full rank situation can be found more frequently, especially for chips of intermediate density. Notice that for matrix $W$ to be of full rank, the number of observations in each subpopulation cannot be smaller than $m$; therefore, the requirement is that $n_l \geq m\ \forall\ l = 1,2,\ldots,\mathcal{S}$. As a matter of fact, in countries like the US there exist datasets where the number of genotyped animals exceeds the number of molecular markers in chips like the Illumina 50k (CDCB, 2016). Moreover, in certain cases, some filtering or preselection criteria reduce the set of markers to be included in the analyses, and for populations with a large number of genotyped individuals this could also lead to the full rank scenario. More comments on this will be made in the discussion. Therefore, in real-life situations such as across-country or across-breed analyses, the situation $n_l \geq m\ \forall\ l = 1,2,\ldots,\mathcal{S}$ could be observed, and thus the assumption of matrix $W$ being of full column rank could be satisfied. Of course, $n_l \geq m\ \forall\ l = 1,2,\ldots,\mathcal{S}$ is not a sufficient condition for matrix $W$ to be of full column rank, but given the structure of this matrix, this would generally be the case except in certain situations, for example, having clones in the same subpopulation.
Bayes factors
Bayes factors have generally been interpreted as measures of the support in favor of a model provided by the data. Lavine and Schervish (1999) showed that Bayes factors actually measure the change in the odds favoring a model once data are observed. The Bayes factor comparing two models denoted as $M_1$ and $M_0$ is defined as:

$$BF_{10} = \frac{f(\boldsymbol{y}|M_1)}{f(\boldsymbol{y}|M_0)} = \frac{\int_{\Theta_1}\pi_1(\boldsymbol{\theta}_1)f_1(\boldsymbol{y}|\boldsymbol{\theta}_1)d\boldsymbol{\theta}_1}{\int_{\Theta_0}\pi_0(\boldsymbol{\theta}_0)f_0(\boldsymbol{y}|\boldsymbol{\theta}_0)d\boldsymbol{\theta}_0}$$

where $\boldsymbol{\theta}_i$, $\pi_i(\boldsymbol{\theta}_i)$, $f_i(\boldsymbol{y}|\boldsymbol{\theta}_i)$ and $\Theta_i$ are the parameters, prior, likelihood and parameter space under model $i$, respectively, $i = 0, 1$.
Approximate Bayes factors comparing homogeneous marker effect covariance matrix models (Gaussian and spike and slab priors, homoscedastic residuals) and heterogeneous marker effect covariance matrix models (Gaussian and spike and slab priors, homoscedastic residuals) to their null versions were derived. Also, an approximate Bayes factor comparing the heterogeneous marker effect covariance matrix model with heteroscedastic residuals to the independent subpopulations model was found. These approximate Bayes factors were conditioned on the genotypes (i.e., conditioned on $W$ and $W_0$). Therefore, the $\boldsymbol{y}$ component of the likelihood is used. The case when a part of $W$ is not observed is treated at the end of this section.
A brief outline of the derivation of these approximate Bayes factors is presented. In each case, model sub-index 1 corresponds to the full model while sub-index 0 denotes the null model. The Bayes factor comparing the homogeneous marker effect covariance matrix model with its null version is denoted $BF_{10GW}$ when a Gaussian prior is posed over $\boldsymbol{g}$ and residuals are homoscedastic. Whenever residuals are heteroscedastic, the letter $H$ appears in the subindex, and when the prior posed over $\boldsymbol{g}$ is spike and slab, the letter $G$ is replaced by $SS$. Moreover, the superindex $*$ is used to identify models with heterogeneous marker effect covariance matrices. The same subindex notation is used for fractional Bayes factors.
In general, let:

$$BF_{10W} = \frac{f(\boldsymbol{y}|W, M_1)}{f(\boldsymbol{y}|W_0, M_0)}.$$

For the homogeneous marker effect covariance matrix model, $\boldsymbol{\theta}_1 := (\boldsymbol{\theta}, \boldsymbol{\phi}) = (\{\boldsymbol{g}, \sigma^2, W\}, \{G^0, P^*\})$ and $\boldsymbol{\theta}_0 := (\boldsymbol{\theta}_0, \boldsymbol{\phi}_0) = (\{\boldsymbol{g}_0, \sigma^2, W_0\}, \{\sigma_g^2, \boldsymbol{p}_0\})$. Let $\mathbb{R}_+$ denote the positive reals. Then:

$$\pi(\boldsymbol{y}, \boldsymbol{\theta}_1) = f(\boldsymbol{y}|\boldsymbol{\theta})\pi(\boldsymbol{\theta}, \boldsymbol{\phi}) = f(\boldsymbol{y}|\boldsymbol{g}, \sigma^2, W)\pi(\boldsymbol{g}, G^0)\pi(\sigma^2)\pi(W, P^*);$$

then

$$\int_{\mathcal{P}_{\mathcal{S}}^+}\int_{\mathbb{R}^{m\mathcal{S}}}\int_{\mathbb{R}_+} f(\boldsymbol{y}|\boldsymbol{g}, \sigma^2, W)\pi(\boldsymbol{g}, G^0)\pi(\sigma^2)\,d\sigma^2\,d\boldsymbol{g}\,dG^0 = f(\boldsymbol{y}|W)$$
$$= \int_{\mathcal{P}_{\mathcal{S}}^+}\pi(G^0)\left(\int_{\mathbb{R}^{m\mathcal{S}}}\int_{\mathbb{R}_+} f(\boldsymbol{y}|\boldsymbol{g}, \sigma^2, W)\pi(\boldsymbol{g}|G^0)\pi(\sigma^2)\,d\sigma^2\,d\boldsymbol{g}\right)dG^0.$$
Thus, the previous multiple integral has to be solved in order to find $f(\boldsymbol{y}|W)$. An analytic expression for the inner integral $\int_{\mathbb{R}^{m\mathcal{S}}}\int_{\mathbb{R}_+} f(\boldsymbol{y}|\boldsymbol{g}, \sigma^2, W)\pi(\boldsymbol{g}|G^0)\pi(\sigma^2)\,d\sigma^2\,d\boldsymbol{g}$ is approximated using the Laplace approximation (Ghosh et al., 2006). As shown in Appendix B, after obtaining this approximation, the external integral can be found in closed form. The Laplace method is based on a second-order Taylor series expansion and allows finding an approximation to integrals of the form:
$$I = \int_{\mathbb{R}^p} q(\boldsymbol{\theta})e^{nh(\boldsymbol{\theta})}d\boldsymbol{\theta},$$

where $q$ and $h$ are smooth functions of $\boldsymbol{\theta}$ and $h$ has a unique maximum at $\hat{\boldsymbol{\theta}}$. In Bayesian statistics, $nh(\boldsymbol{\theta})$ is usually taken to be the log-likelihood or the log of the unnormalized posterior. Hence, $\hat{\boldsymbol{\theta}}$ can be the MLE or the posterior mode when the posterior is unimodal. The Laplace approximation has the form (Ghosh et al., 2006):

$$I = e^{nh(\hat{\boldsymbol{\theta}})}(2\pi)^{p/2}n^{-p/2}|\Delta_h(\hat{\boldsymbol{\theta}})|^{-1/2}q(\hat{\boldsymbol{\theta}})(1 + O(n^{-1})),$$

where $p = \dim(\boldsymbol{\theta})$ and $|\Delta_h(\hat{\boldsymbol{\theta}})|$ is the determinant of the Hessian matrix of $-h$ evaluated at $\hat{\boldsymbol{\theta}}$.
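As a quick numerical illustration of this formula (ours, not an example from the paper), take $q(\theta) = 1$ and $h(\theta) = \ln\theta - \theta$ on $(0,\infty)$, so that $I = \int_0^\infty e^{n(\ln\theta - \theta)}d\theta = \Gamma(n+1)/n^{n+1}$ has a known exact value; the maximizer is $\hat{\theta} = 1$, $h(\hat{\theta}) = -1$, and $\Delta_h(\hat{\theta}) = 1/\hat{\theta}^2 = 1$:

```python
import math

# Laplace approximation with q = 1, h(theta) = ln(theta) - theta, p = 1:
# I ~ e^{n h_hat} (2*pi)^{1/2} n^{-1/2} |Hessian|^{-1/2}
def laplace_approx(n):
    h_hat, hessian = -1.0, 1.0
    return math.exp(n * h_hat) * math.sqrt(2 * math.pi / n) / math.sqrt(hessian)

def exact(n):
    # Gamma(n + 1) / n^{n + 1} for integer n
    return math.factorial(n) / n ** (n + 1)

n = 50
rel_err = abs(laplace_approx(n) - exact(n)) / exact(n)   # O(1/n)
```

For $n = 50$ the relative error is on the order of $1/n$ (well below 1%), consistent with the $1 + O(n^{-1})$ term in the formula.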
252 inner integral in 푓(풚|푊) can be written as:
2 ∗ ∫ ∫ 휋(품|퐺0)휋(휎2)푒ln 푓(풚|품,휎 ,푊) 푑휎2푑품 ≔ ∫ 푞(휽∗)푒푛ℎ(휽 )푑휽∗,
푚풮 푚풮+1 ℝ ℝ+ ℝ
253 where 휽∗ ≔ (품, 휎2).
Under the assumption that $f(\boldsymbol{y}|\boldsymbol{g}, \sigma^2, W)$ has a unique maximum at $\hat{\boldsymbol{\theta}}^* := (\hat{\boldsymbol{g}}, \hat{\sigma}^2)$, the Laplace approximation can be used. The $\boldsymbol{y}$ component of the likelihood function is a $MVN(W\boldsymbol{g}, \sigma^2 I)$. Therefore, following standard results from linear models theory, if $W$ is of full column rank then $\hat{\boldsymbol{g}} = (W'W)^{-1}W'\boldsymbol{y}$ is the MLE of $\boldsymbol{g}$, and $\hat{\sigma}^2 = \frac{\|\boldsymbol{y} - W\hat{\boldsymbol{g}}\|^2}{n} = \frac{\boldsymbol{y}'(I - H_W)\boldsymbol{y}}{n} = \frac{(n - r)}{n}S^2$ is the MLE of $\sigma^2$, where $S^2 = \frac{\boldsymbol{y}'(I - H_W)\boldsymbol{y}}{n - r}$ is the least squares estimator of $\sigma^2$, $r = rank(W'W) = m\mathcal{S}$, and $H_W = W(W'W)^{-1}W'$ is the projection matrix onto the column space of $W$.
After computing all the required expressions and making algebraic simplifications (see Appendix B), it follows that:

$$BF_{10GW} \approx \left(\frac{|\boldsymbol{\Sigma}|}{b}\right)^{a/2}\left(\frac{\left|\boldsymbol{\Sigma} + \sum_{j=1}^m\hat{\boldsymbol{g}}_j\hat{\boldsymbol{g}}_j'\right|}{b + \sum_{j=1}^m\hat{g}_{0j}^2}\right)^{-\left(\frac{a+m}{2}\right)}\left(\frac{SSR}{SSR_0}\right)^{-\left(\frac{n+v+2}{2}\right)}\frac{SSR^{(m\mathcal{S}+2)/2}}{SSR_0^{(m+2)/2}}\left(\frac{|W_0'W_0|}{|W'W|}\right)^{1/2}$$
$$\times\ \exp\left(\frac{-n\tau^2}{2}\left(\frac{1}{SSR} - \frac{1}{SSR_0}\right)\right)\left(\frac{1}{n}\right)^{\frac{m(\mathcal{S}-1)}{2}}\left(\prod_{l=2}^{\mathcal{S}}\frac{\Gamma\left(\frac{a+m+1-l}{2}\right)}{\Gamma\left(\frac{a+1-l}{2}\right)}\right)$$

where $SSR = \boldsymbol{y}'(I - H_W)\boldsymbol{y}$, $SSR_0 = \boldsymbol{y}'(I - H_{W_0})\boldsymbol{y}$, $H_{W_0} = W_0(W_0'W_0)^{-1}W_0'$, $S_0^2 = \frac{\|\boldsymbol{y} - W_0\hat{\boldsymbol{g}}_0\|^2}{n - r_0}$, $r_0 = rank(W_0'W_0) = m$, and $\hat{\boldsymbol{g}}_0 = (W_0'W_0)^{-1}W_0'\boldsymbol{y}$.
Following similar steps (see Appendix B),

$$BF_{10GW}^* = \frac{f(\boldsymbol{y}|W, M_1^*)}{f(\boldsymbol{y}|W_0, M_0^*)}$$
$$\approx \left(\frac{|\boldsymbol{\Sigma}|}{b}\right)^{am/2}\prod_{j=1}^m\left(\frac{|\boldsymbol{\Sigma} + \hat{\boldsymbol{g}}_j\hat{\boldsymbol{g}}_j'|}{b + \hat{g}_{0j}^2}\right)^{-\left(\frac{a+1}{2}\right)}\left(\frac{SSR}{SSR_0}\right)^{-\left(\frac{n+v+2}{2}\right)}\frac{SSR^{(m\mathcal{S}+2)/2}}{SSR_0^{(m+2)/2}}\left(\frac{|W_0'W_0|}{|W'W|}\right)^{1/2}$$
$$\times\ \exp\left(\frac{-n\tau^2}{2}\left(\frac{1}{SSR} - \frac{1}{SSR_0}\right)\right)\left(\frac{1}{n}\right)^{\frac{m(\mathcal{S}-1)}{2}}\left(\prod_{l=2}^{\mathcal{S}}\frac{\Gamma\left(\frac{a+2-l}{2}\right)}{\Gamma\left(\frac{a+1-l}{2}\right)}\right)^m.$$

$$BF_{10GWH}^* \approx \left(\frac{|\boldsymbol{\Sigma}|}{b^{\mathcal{S}}}\right)^{am/2}\prod_{j=1}^m\left(\frac{|\boldsymbol{\Sigma} + \hat{\boldsymbol{g}}_j\hat{\boldsymbol{g}}_j'|}{\prod_{l=1}^{\mathcal{S}}(b + \hat{g}_{0jl}^2)}\right)^{-\left(\frac{a+1}{2}\right)}\left(\frac{\Gamma\left(\frac{a}{2}\right)}{\Gamma\left(\frac{a+1}{2}\right)}\prod_{l=2}^{\mathcal{S}}\frac{\Gamma\left(\frac{a+2-l}{2}\right)}{\Gamma\left(\frac{a+1-l}{2}\right)}\right)^m$$

$$BF_{10SSW} \approx (2\pi)^{m(\mathcal{S}+1)/2}\left(\frac{SSR}{SSR_0}\right)^{-\left(\frac{n+v+2}{2}\right)}\frac{SSR^{(m\mathcal{S}+2)/2}}{SSR_0^{(m+2)/2}}\left(\frac{|W_0'W_0|}{|W'W|}\right)^{1/2}$$
$$\times\ \exp\left(\frac{-n\tau^2}{2}\left(\frac{1}{SSR} - \frac{1}{SSR_0}\right)\right)\left(\frac{1}{n}\right)^{\frac{m(\mathcal{S}-1)}{2}}\left(\frac{|\boldsymbol{\Sigma}|}{b}\right)^{a/2}\frac{\Gamma\left(\frac{a}{2}\right)}{\Gamma_{\mathcal{S}}\left(\frac{a}{2}\right)}$$
$$\times\ \frac{\displaystyle\sum_{i=0}^m\sum_{l=1}^{\binom{m}{i}}I_{\delta_{il}}\pi_0^i(1-\pi_0)^{m-i}2^{\mathcal{S}(m-i)/2}\Gamma_{\mathcal{S}}\left(\frac{a+m-i}{2}\right)\left|\boldsymbol{\Sigma} + \sum_{k:\,\boldsymbol{g}_k\in\delta_{l0}^c}\hat{\boldsymbol{g}}_k\hat{\boldsymbol{g}}_k'\right|^{-\left(\frac{a+m-i}{2}\right)}}{\displaystyle\sum_{i=0}^m\sum_{l=1}^{\binom{m}{i}}I_{\delta_{il}}\pi_0^i(1-\pi_0)^{m-i}2^{(m-i)/2}\Gamma\left(\frac{a+m-i}{2}\right)\left|b + \sum_{k:\,g_k\in\delta_{l0}^c}\hat{g}_{k0}^2\right|^{-\left(\frac{a+m-i}{2}\right)}}$$

$$BF_{10SSW}^* \approx (2\pi)^{m(\mathcal{S}+1)/2}\left(\frac{SSR}{SSR_0}\right)^{-\left(\frac{n+v+2}{2}\right)}\frac{SSR^{(m\mathcal{S}+2)/2}}{SSR_0^{(m+2)/2}}\left(\frac{|W_0'W_0|}{|W'W|}\right)^{1/2}$$
$$\times\ \exp\left(\frac{-n\tau^2}{2}\left(\frac{1}{SSR} - \frac{1}{SSR_0}\right)\right)\left(\frac{1}{n}\right)^{\frac{m(\mathcal{S}-1)}{2}}\left(\frac{|\boldsymbol{\Sigma}|}{b}\right)^{-1/2}\left(\frac{\Gamma\left(\frac{a}{2}\right)}{\Gamma_{\mathcal{S}}\left(\frac{a}{2}\right)}\right)^m$$
$$\times\ \frac{\displaystyle\sum_{i=0}^m\sum_{l=1}^{\binom{m}{i}}I_{\delta_{il}}\pi_0^i(1-\pi_0)^{m-i}2^{-\mathcal{S}i/2}\left(\Gamma_{\mathcal{S}}\left(\frac{a}{2}\right)\right)^i\left(\Gamma_{\mathcal{S}}\left(\frac{a+1}{2}\right)\right)^{m-i}\prod_{k:\,g_k\in\delta_{l0}^c}\frac{1}{\left|1 + \hat{\boldsymbol{g}}_k'\boldsymbol{\Sigma}^{-1}\hat{\boldsymbol{g}}_k\right|^{\frac{a+1}{2}}}}{\displaystyle\sum_{i=0}^m\sum_{l=1}^{\binom{m}{i}}I_{\delta_{il}}\pi_0^i(1-\pi_0)^{m-i}2^{-i/2}\left(\Gamma\left(\frac{a}{2}\right)\right)^i\left(\Gamma\left(\frac{a+1}{2}\right)\right)^{m-i}\prod_{k:\,g_k\in\delta_{l0}^c}\frac{1}{\left|1 + \hat{g}_{0k}^2 b^{-1}\right|^{\frac{a+1}{2}}}}$$
Before presenting fractional Bayes factors, the following result comparing $SSR_0$ and $SSR$ in the particular case of our models is presented and proven. This result will be used in the discussion section to help interpret Bayes factors and fractional Bayes factors.

Result 1. For the models considered in this study, the following inequality holds: $SSR_0 \geq SSR$.

Proof
Let $SSM_1 = \boldsymbol{y}'H_W\boldsymbol{y}$ and $SSM_0 = \boldsymbol{y}'H_{W_0}\boldsymbol{y}$. Thus, proving that $SSR_0 \geq SSR$ is equivalent to proving that $SSM_1 \geq SSM_0$. Let $C(W_0)$ be the column space of $W_0$ and $C(W)$ the column space of $W$. It is now proven that $C(W_0) \preccurlyeq C(W)$, where the notation "$C(W_0) \preccurlyeq C(W)$" means that $C(W_0)$ is a subspace of $C(W)$. Let $\boldsymbol{z} \in C(W_0)$; then $\exists\ \boldsymbol{a} \in \mathbb{R}^m$ such that $\boldsymbol{z} = W_0\boldsymbol{a}$, that is,

$$\boldsymbol{z} = \begin{bmatrix} W_1\boldsymbol{a} \\ \vdots \\ W_{\mathcal{S}}\boldsymbol{a} \end{bmatrix}.$$

Similarly, let $\boldsymbol{w} \in C(W)$; then $\exists\ \boldsymbol{b} \in \mathbb{R}^{m\mathcal{S}}$ such that $\boldsymbol{w} = W\boldsymbol{b}$. Without loss of generality, the vector $\boldsymbol{b}$ can be partitioned as $\boldsymbol{b} = (\boldsymbol{b}_1, \ldots, \boldsymbol{b}_{\mathcal{S}})$, where $\boldsymbol{b}_l \in \mathbb{R}^m\ \forall\ l = 1,2,\ldots,\mathcal{S}$. Then $\boldsymbol{w}$ is of the form

$$\boldsymbol{w} = \begin{bmatrix} W_1\boldsymbol{b}_1 \\ \vdots \\ W_{\mathcal{S}}\boldsymbol{b}_{\mathcal{S}} \end{bmatrix}.$$

In particular, if $\boldsymbol{b}_l = \boldsymbol{a}\ \forall\ l = 1,2,\ldots,\mathcal{S}$, it follows that $\boldsymbol{z}$ also has the form of an element of $C(W)$, that is, $\boldsymbol{z} \in C(W)$; on the other hand, in general $\boldsymbol{w}$ cannot be written as a linear combination of the columns of $W_0$. Therefore, $C(W_0) \preccurlyeq C(W)$. Applying theorem B.47 of Christensen (2011), it follows that $H_W - H_{W_0}$ is an orthogonal projection. By properties of orthogonal projections (Harville, 2000), it follows that $H_W - H_{W_0}$ is a positive semi-definite matrix, and consequently $\boldsymbol{y}'(H_W - H_{W_0})\boldsymbol{y} \geq 0 \iff \boldsymbol{y}'H_W\boldsymbol{y} \geq \boldsymbol{y}'H_{W_0}\boldsymbol{y}$. ∎
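Result 1 is easy to verify numerically. The sketch below (illustrative sizes and simulated data, not from the paper) builds $W$ as the block-diagonal full-model design and $W_0$ as the pooled design, and checks $SSR_0 \geq SSR$ via projections:

```python
import numpy as np

rng = np.random.default_rng(0)
S, m, nl = 3, 4, 20                    # illustrative sizes, n_l > m
W_blocks = [rng.integers(-1, 2, size=(nl, m)).astype(float) for _ in range(S)]
W0 = np.vstack(W_blocks)               # pooled (null) design, n x m
W = np.zeros((S * nl, S * m))          # full design: block diagonal
for l, Wl in enumerate(W_blocks):
    W[l * nl:(l + 1) * nl, l * m:(l + 1) * m] = Wl
y = rng.normal(size=S * nl)

def ssr(X, y):
    """y'(I - H_X)y, with H_X = X X^+ the orthogonal projector onto C(X)."""
    H = X @ np.linalg.pinv(X)
    return float(y @ (np.eye(len(y)) - H) @ y)

SSR0, SSR = ssr(W0, y), ssr(W, y)      # Result 1: SSR0 >= SSR
```

Using the pseudoinverse makes the check valid whether or not the designs are of full column rank, since $H_X$ is unique in either case.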
Fractional Bayes factors

O'Hagan (1994; 1995) proposed a non-subjective Bayes factor known as the fractional Bayes factor, which uses a fraction $c$ of the likelihood, resulting in a "partial" Bayes factor having the following form:

$$FBF_{10} = BF_{10}\frac{\int_{\Theta_0}\pi_0(\boldsymbol{\theta}_0)(f_0(\boldsymbol{y}|\boldsymbol{\theta}_0))^c d\boldsymbol{\theta}_0}{\int_{\Theta_1}\pi_1(\boldsymbol{\theta}_1)(f_1(\boldsymbol{y}|\boldsymbol{\theta}_1))^c d\boldsymbol{\theta}_1}.$$

Thus, given $W$, the fractional Bayes factor for the homogeneous marker effect covariance matrix model with homoscedastic residuals and Gaussian prior for $\boldsymbol{g}$ has the form:

$$FBF_{10GW} = BF_{10GW}\frac{f_c(\boldsymbol{y}|W_0, M_{0G})}{f_c(\boldsymbol{y}|W, M_{1G})}$$
$$= BF_{10GW}\frac{\int_{\mathbb{R}_+}\pi(\sigma_g^2)\left(\int_{\mathbb{R}^m}\int_{\mathbb{R}_+}(f_0(\boldsymbol{y}|\boldsymbol{g}_0, \sigma_e^2, W_0))^c\pi(\boldsymbol{g}_0|\sigma_g^2)\pi(\sigma_e^2)\,d\sigma_e^2\,d\boldsymbol{g}_0\right)d\sigma_g^2}{\int_{\mathcal{P}_{\mathcal{S}}^+}\pi(G^0)\left(\int_{\mathbb{R}^{m\mathcal{S}}}\int_{\mathbb{R}_+}(f_1(\boldsymbol{y}|\boldsymbol{g}, \sigma^2, W))^c\pi(\boldsymbol{g}|G^0)\pi(\sigma^2)\,d\sigma^2\,d\boldsymbol{g}\right)dG^0}.$$
Hence, $\ln(f_i(\boldsymbol{y}|\boldsymbol{\theta}_i))^c$, $i = 0,1$, and their corresponding Hessian matrices evaluated at the MLE have to be found in order to obtain the Laplace approximation to the integrals inside the brackets in the numerator and denominator of $FBF_{10GW}$. This is easily done because $\ln(f_i(\boldsymbol{y}|\boldsymbol{\theta}_i))^c = c\ln f_i(\boldsymbol{y}|\boldsymbol{\theta}_i)$. The determinants of the negative Hessian matrices are now denoted by $\tilde{D}_0$ and $\tilde{D}_1$, and they satisfy $\tilde{D}_0 = c^{m+1}D_0$ and $\tilde{D}_1 = c^{m\mathcal{S}+1}D_1$. The approximate $FBF_{10GW}$ is denoted as $\overline{FBF}_{10GW}$.
The fractional Bayes factors derived in this study were $\overline{FBF}_{10SSW}$, $\overline{FBF}_{10SSW}^*$ and $\overline{FBF}_{10GHW}^*$. It turned out that $\overline{FBF}_{10GW}^* = \overline{FBF}_{10GW}$, because the components making $\overline{BF}_{10GW}^*$ different from $\overline{BF}_{10GW}$ cancelled when multiplying them by $\frac{f_c(\boldsymbol{y}|W_0, M_0)}{f_c(\boldsymbol{y}|W, M_1)}$ and $\frac{f_c(\boldsymbol{y}|W_0, M_0^*)}{f_c(\boldsymbol{y}|W, M_1^*)}$, respectively. For details on the derivation see Appendix B. Moreover, the same cancellation happened when deriving $\overline{FBF}_{10SSW}$ and $\overline{FBF}_{10SSW}^*$. The resulting expression was:

$$\overline{FBF}_{10GW} = \overline{FBF}_{10GW}^* = \overline{FBF}_{10SSW} = \overline{FBF}_{10SSW}^* := \overline{FBF}_{10W} = c^{m(\mathcal{S}-1)/2}\left(\frac{SSR}{SSR_0}\right)^{\frac{n(c-1)}{2}}\frac{SSR^{(m\mathcal{S}+2)/2}}{SSR_0^{(m+2)/2}}.$$
Notice that in the case $m > n$, where $W$ and $W_0$ are not of full column rank, this expression is invariant to the choice of the generalized inverses $(W'W)^-$ and $(W_0'W_0)^-$. This follows from the uniqueness of the projection operator onto the column space of $W$, $H_W$ (Harville, 2000), which implies that $SSR$ and $SSR_0$ are invariant to the choice of the generalized inverses. The approximate fractional Bayes factor $\overline{FBF}_{10GHW}^*$ was equal to 1 (see Appendix B for details); thus, it does not provide information for comparing the corresponding models.

Based on the fact that $\overline{FBF}_{10W}$ is invariant to the choice of generalized inverses of $W'W$ and $W_0'W_0$ when $m > n$, a brief discussion about the possible use of this criterion in the non-full rank case is provided in Appendix C. The issue is that the derivation that led to the fractional Bayes factor in the full rank case cannot be applied to the non-full rank case, due to the fact that $|W'W| = |W_0'W_0| = 0$ and $(W'W)^{-1}$ and $(W_0'W_0)^{-1}$ do not exist. Although expressions involving these quantities cancel later on, it is clear that the derivations presented in Appendix B do not justify using $\overline{FBF}_{10W}$ in the non-full rank case.
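Because the factors in $\overline{FBF}_{10W}$ can be astronomically large or small, in practice it is safer to evaluate the expression on the log scale. A minimal sketch (the numeric inputs are arbitrary illustrations; the choice $c = n_0/n$ for a minimal training-sample size $n_0$ follows O'Hagan's usual suggestion):

```python
import numpy as np

def log_fbf_10w(SSR, SSR0, n, m, S, c):
    """log of the approximate fractional Bayes factor FBF_10W given above:
    c^{m(S-1)/2} (SSR/SSR0)^{n(c-1)/2} SSR^{(mS+2)/2} / SSR0^{(m+2)/2}."""
    return (0.5 * m * (S - 1) * np.log(c)
            + 0.5 * n * (c - 1) * (np.log(SSR) - np.log(SSR0))
            + 0.5 * (m * S + 2) * np.log(SSR)
            - 0.5 * (m + 2) * np.log(SSR0))

# Example: full model fits better (SSR < SSR0, as Result 1 guarantees),
# with c = n0/n for a hypothetical minimal training-sample size n0 = 10.
lf = log_fbf_10w(SSR=120.0, SSR0=150.0, n=200, m=10, S=2, c=10 / 200)
```

For sample sizes like these (with $n(1 - c) > m + 2$), a larger $SSR_0$ relative to $SSR$, i.e., a worse-fitting null model, increases $\overline{FBF}_{10W}$.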
These Bayes factors are useful for carrying out conventional model selection conditioned on $W$, that is, conditioned on the observed genotypes. When part of $W$ is not observed, the joint distribution of $\boldsymbol{y}$ and $W^N$ given $W^{\mathcal{O}}$ can be obtained, and then summing over the set $\mathcal{G}^N$ yields Bayes factors and fractional Bayes factors conditioned on $W^{\mathcal{O}}$. Recall that $BF_{10W} = \frac{f(\boldsymbol{y}|W, M_1)}{f(\boldsymbol{y}|W_0, M_0)}$; to find $BF_{10W^{\mathcal{O}}} = \frac{f(\boldsymbol{y}|W^{\mathcal{O}}, M_1)}{f(\boldsymbol{y}|W^{\mathcal{O}}, M_0)}$, the following computation has to be performed:

$$f(\boldsymbol{y}|W^{\mathcal{O}}, M_1) = \sum_{\mathcal{G}^N}\pi(\boldsymbol{y}, W^N|W^{\mathcal{O}}, M_1) = \sum_{\mathcal{G}^N} f(\boldsymbol{y}|W, M_1)\pi(W^N|W^{\mathcal{O}}, M_1) = \sum_{\mathcal{G}^N}\left\{f(\boldsymbol{y}|W, M_1)\int_{\Omega}\pi(W^N|W^{\mathcal{O}}, P^*)\pi(P^*)dP^*\right\}.$$
For $\boldsymbol{r}$ known:

$$\pi(W^N|W^{\mathcal{O}}, M_1) = \prod_{j=1}^m\int_{\Omega_j^{\boldsymbol{r}}}\pi(\boldsymbol{w}_j^N|\boldsymbol{w}_j^{\mathcal{O}}, \boldsymbol{p}_j)\pi(\boldsymbol{p}_j|\boldsymbol{r})d\boldsymbol{p}_j$$

where $\Omega_j^{\boldsymbol{r}} := \{\boldsymbol{p}_j \in \mathbb{R}^{\mathcal{S}}\,|\,0 < p_{lj} \leq r_l\ \forall\ l,\ \sum_{l=1}^{\mathcal{S}} r_l = 1\}$ and $\Omega = \Omega_1^{\boldsymbol{r}}\times\cdots\times\Omega_m^{\boldsymbol{r}}$ is the support of the distribution of $P$. For all $j$, the pmf $\pi(\boldsymbol{w}_j^N|\boldsymbol{w}_j^{\mathcal{O}}, \boldsymbol{p}_j)$ can be found using Bayes' theorem as $\pi(\boldsymbol{w}_j^N|\boldsymbol{w}_j^{\mathcal{O}}, \boldsymbol{p}_j) = \pi(\boldsymbol{w}_j|\boldsymbol{p}_j)/\pi(\boldsymbol{w}_j^{\mathcal{O}}|\boldsymbol{p}_j)$, but computing $\pi(\boldsymbol{w}_j^{\mathcal{O}}|\boldsymbol{p}_j)$ requires $\sum_{\mathcal{G}^N}\pi(\boldsymbol{w}_j|\boldsymbol{p}_j)$, which can be unfeasible from the computational point of view. Alternatively, $\pi(W^N|W^{\mathcal{O}}, P^*)$ can be derived from first principles by noticing that the dependence on $W^{\mathcal{O}}$ comes from the term where genotypes of individuals are conditioned on parental genotypes, and then proceeding as in section 2.1.1 of part I.
Using the expressions derived in section 2.2.1 of part I and assuming r known:
$$\pi(W^N|W^o, M_1) \propto \prod_{j=1}^{m}\int_{\Omega_j^{\boldsymbol{r}}} p_{(\mathcal{S}+1)j}^{\alpha_{\mathcal{S}+1}-1}\prod_{l=1}^{\mathcal{S}}\left\{\frac{2^{n_{lN}^{H}}}{r_l^{2f_{ljN}}}\, p_{lj}^{n_{lN}^{B_j}+\alpha_l-1}(r_l-p_{lj})^{n_{lN}^{A_j}}\prod_{i'=f_{ljN}+1}^{n_{ljN}}\pi^{l}(w_{i'j}|w_{S_{i'}j}, w_{D_{i'}j})\right\}d\boldsymbol{p}_j$$
where $f_{ljN}$ is the number of founders with missing genotypes at locus j in subpopulation l, and $n_{ljN}$ is the total number of individuals with missing genotypes at locus j in subpopulation l. Given that $\prod_{i'=f_{ljN}+1}^{n_{ljN}}\pi^{l}(w_{i'j}|w_{S_{i'}j}, w_{D_{i'}j})$ does not depend on $P^*$, the problem of finding $\pi(W^N|W^o, M_1)$ involves the evaluation of m integrals of the form:
$$\int_{\Omega_j^{\boldsymbol{r}}} p_{(\mathcal{S}+1)j}^{\alpha_{\mathcal{S}+1}-1}\prod_{l=1}^{\mathcal{S}}\left\{p_{lj}^{n_{lN}^{B_j}+\alpha_l-1}(r_l-p_{lj})^{n_{lN}^{A_j}}\right\}d\boldsymbol{p}_j.$$
This integral corresponds to the expectation of the function $\prod_{l=1}^{\mathcal{S}}(r_l-p_{lj})^{n_{lN}^{A_j}}$ of the random vector $\boldsymbol{p}_j$ taken over $\pi(\boldsymbol{p}_j|\boldsymbol{r})$. It does not have a closed form solution, but these integrals could be evaluated numerically in order to find a numerical approximation to $\pi(W^N|W^o, M_1)$. A similar situation occurs when r is not known; that is, integrals with no closed form solutions have to be evaluated in order to find $\pi(W^N|W^o, M_1)$.
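Since the integral above is an expectation under $\pi(\boldsymbol{p}_j|\boldsymbol{r})$, it can be estimated by Monte Carlo whenever draws from that prior are available. The sketch below, in Python rather than the R used for the actual analyses, uses independent Uniform(0, r_l) draws as a hypothetical stand-in sampler for the constrained prior (replacing `draw_p` with a sampler for $\pi(\boldsymbol{p}_j|\boldsymbol{r})$ would give the quantity in the text); the values of `r` and `n_exp` and all function names are illustrative.

```python
import random
random.seed(42)

# Stand-in for pi(p_j | r): independent Uniform(0, r_l) draws (illustrative only;
# the paper's actual prior on p_j is the constrained density given above).
def draw_p(r):
    return [random.uniform(0.0, rl) for rl in r]

def mc_expectation(r, n_exp, n_draws=200_000):
    """Monte Carlo estimate of E[ prod_l (r_l - p_l)^{n_l} ] over the stand-in prior."""
    total = 0.0
    for _ in range(n_draws):
        p = draw_p(r)
        term = 1.0
        for rl, pl, nl in zip(r, p, n_exp):
            term *= (rl - pl) ** nl
        total += term
    return total / n_draws

r = [0.5, 0.3, 0.2]          # upper bounds r_l on the allelic frequencies, summing to 1
n_exp = [2, 1, 3]            # hypothetical counts n_{lN}^{A_j}
est = mc_expectation(r, n_exp)

# For Uniform(0, r_l): E[(r_l - p_l)^n] = r_l^n / (n + 1), so the exact value is known here.
exact = 1.0
for rl, nl in zip(r, n_exp):
    exact *= rl ** nl / (nl + 1)
assert abs(est - exact) / exact < 0.05
```

The same estimator applies locus by locus, so the m integrals can be approximated independently.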
Notice that matrices W and $W_0$ contain the same random variables, but in different arrays. Consequently, $W^N$ and $W^o$ are the same in both cases, and the analytic form of $\pi(W_0|\boldsymbol{p}_0)$ can be easily derived from $\pi(W|P^*)$ by setting $\mathcal{S} = 1$ and taking into account that the prior posed over $\boldsymbol{p}_0$ is the product of m Beta(α, β) densities.
$$f(\boldsymbol{y}|W^o, M_0) = \sum_{\mathcal{G}^N} f(\boldsymbol{y}|W_0, M_0)\,\pi(W^N|W^o, M_0)$$
where $\pi(W^N|W^o, M_0) = \int_{\Omega_0} \pi(W^N|W^o, \boldsymbol{p}_0)\pi(\boldsymbol{p}_0)\,d\boldsymbol{p}_0$ and $\Omega_0 = [0,1]\times[0,1]\times\cdots\times[0,1]$, an m-dimensional unit hypercube. Here,
$$\pi(W^N|W^o, \boldsymbol{p}_0) = 2^{n_N^H}\prod_{j=1}^{m}\left\{p_j^{\,n_N^{B_j}}(1-p_j)^{\,n_N^{A_j}}\prod_{i'=f_{Nj}+1}^{n_{Nj}}\pi(w_{i'j}|w_{S_{i'}j}, w_{D_{i'}j})\right\};$$
then, using the fact that $n_N^{B_j} + n_N^{A_j} = 2f_{Nj}$ (which is twice the total number of founders with missing genotypes at locus j), it follows that:
$$\pi(W^N|W^o, M_0) = \frac{2^{n_N^H}}{B(\alpha,\beta)^m}\prod_{j=1}^{m}\left\{\prod_{i'=f_{Nj}+1}^{n_{Nj}}\pi(w_{i'j}|w_{S_{i'}j}, w_{D_{i'}j})\int_0^1 p_j^{\,n_N^{B_j}+\alpha-1}(1-p_j)^{\,n_N^{A_j}+\beta-1}dp_j\right\} = \frac{2^{n_N^H}}{B(\alpha,\beta)^m}\prod_{j=1}^{m}\left\{\frac{\Gamma(n_N^{B_j}+\alpha)\,\Gamma(n_N^{A_j}+\beta)}{\Gamma(2f_{Nj}+\alpha+\beta)}\prod_{i'=f_{Nj}+1}^{n_{Nj}}\pi(w_{i'j}|w_{S_{i'}j}, w_{D_{i'}j})\right\},$$
where $n_{Nj}$ is the total number of individuals with missing genotypes at locus j. Applying properties of the Gamma function (Casella and Berger, 2002; Kosmala, 2004), this can be reduced to (see Appendix A):
$$\pi(W^N|W^o, M_0) = 2^{n_N^H}\prod_{j=1}^{m}\left\{\frac{\prod_{k=1}^{n_N^{B_j}}(n_N^{B_j}-k+\alpha)\ \prod_{k=1}^{n_N^{A_j}}(n_N^{A_j}-k+\beta)}{\prod_{k=1}^{2f_{Nj}}(2f_{Nj}-k+\alpha+\beta)}\prod_{i'=f_{Nj}+1}^{n_{Nj}}\pi(w_{i'j}|w_{S_{i'}j}, w_{D_{i'}j})\right\}.$$
Therefore, in the case $\mathcal{S} = 1$, there is an explicit expression for $\pi(W^N|W^o, M_0)$.
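The Gamma-function reduction above can be spot-checked numerically. The following Python sketch (function names are ours, not the paper's) verifies that the ratio of Gamma functions, divided by B(α, β), equals the finite-product form for counts satisfying $n_N^{B_j} + n_N^{A_j} = 2f_{Nj}$.

```python
from math import gamma, prod

def beta_fn(a, b):
    """Beta function B(a, b) = Γ(a)Γ(b)/Γ(a+b)."""
    return gamma(a) * gamma(b) / gamma(a + b)

def gamma_ratio(n_B, n_A, alpha, beta):
    """Per-locus factor written with Gamma functions (before the reduction)."""
    f2 = n_B + n_A  # equals 2 f_{Nj}: twice the number of founders missing at locus j
    return gamma(n_B + alpha) * gamma(n_A + beta) / (gamma(f2 + alpha + beta) * beta_fn(alpha, beta))

def product_form(n_B, n_A, alpha, beta):
    """Per-locus factor after the reduction: finite products, no Gamma evaluations."""
    f2 = n_B + n_A
    num = (prod(n_B - k + alpha for k in range(1, n_B + 1))
           * prod(n_A - k + beta for k in range(1, n_A + 1)))
    den = prod(f2 - k + alpha + beta for k in range(1, f2 + 1))
    return num / den

# spot-check over a few hypothetical counts and hyper-parameters
for (nb, na, a, b) in [(3, 5, 1.0, 1.0), (0, 4, 2.5, 0.7), (6, 0, 0.3, 1.2)]:
    assert abs(gamma_ratio(nb, na, a, b) - product_form(nb, na, a, b)) < 1e-12
```

The product form avoids evaluating Gamma functions at large arguments, which is convenient when the counts are large.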
Notice that obtaining an approximation to the pdf $f(\boldsymbol{y}|W^o, M_1)$ involves computation of SSR, $\hat{\boldsymbol{g}}$, $|\boldsymbol{\Sigma} + \sum_{j=1}^{m}\hat{\boldsymbol{g}}_j\hat{\boldsymbol{g}}_j'|^{(a+m)/2}$ and $|W'W|$ for every possible value of $W^N$. Thus, this could be computationally unfeasible even for small or moderate sample sizes and chip densities.
Regarding the interpretation of Bayes factors, their values can be classified according to the recommendations of Raftery (1996), who proposed a scale based on the earlier scale of Jeffreys (1961); Raftery's scale is more granular and more conservative (Raftery, 1996). The scale is as follows: if $BF_{10} < 1$, the evidence is negative (i.e., against model 1); values between 1 and 3 indicate that the evidence for model 1 is not worth more than a bare mention; values between 3 and 20 indicate positive evidence in favor of model 1; values between 20 and 150 indicate strong evidence in favor of model 1; and values greater than 150 suggest very strong evidence for model 1.
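Raftery's scale is easy to encode; a minimal Python helper (ours, not from the paper) mapping $BF_{10}$ to the categories just listed:

```python
def raftery_evidence(bf10):
    """Classify a Bayes factor BF10 on Raftery's (1996) scale, as described above."""
    if bf10 < 1:
        return "negative (against model 1)"
    if bf10 <= 3:
        return "not worth more than a bare mention"
    if bf10 <= 20:
        return "positive"
    if bf10 <= 150:
        return "strong"
    return "very strong"

assert raftery_evidence(0.4).startswith("negative")
assert raftery_evidence(10) == "positive"
assert raftery_evidence(200) == "very strong"
```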
2.3.2 Deviance information criterion
As in part I, another criterion used to compare models is the Deviance Information Criterion (DIC; Spiegelhalter et al., 2002). It combines a measure of goodness of fit based on the posterior distribution with a penalty for model complexity. In part I it was shown that for our family of models the DIC can be written as the sum of two components, one computed from the y component of the likelihood and the other from the W component of the likelihood:
$$DIC = -2\log f(\boldsymbol{y}|W^o, \hat{W}_B^N, \hat{\boldsymbol{g}}_B, \hat{R}_B) + 2p_{DIC-\boldsymbol{y}} - 2\log f(W^o|\hat{W}_B^N, \hat{P}_B^*) + 2p_{DIC-W} := DIC_{\boldsymbol{y}} + DIC_W$$
where $p_{DIC-\boldsymbol{y}} = 2\left(\log f(\boldsymbol{y}|W^o, \hat{W}_B^N, \hat{\boldsymbol{g}}_B, \hat{R}_B) - E_{W^N,\boldsymbol{g},R,P^*|\boldsymbol{y},W^o}\left[\log f(\boldsymbol{y}|W, \boldsymbol{g}, R)\right]\right)$ and $p_{DIC-W} = 2\left(\log f(W^o|\hat{W}_B^N, \hat{P}_B^*) - E_{W^N,P^*|\boldsymbol{y},W^o}\left[\log f(W^N|W^o, P^*)\right]\right)$.
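Each DIC component is computed from MCMC output as a plug-in log-likelihood at a posterior point estimate plus twice the effective number of parameters. A toy Python sketch, with a univariate Gaussian likelihood standing in for the log-likelihood terms above (all names and values hypothetical, not the paper's implementation):

```python
import math
import random
random.seed(1)

def log_lik(mu, y, sigma=1.0):
    """Gaussian log-likelihood; stands in for a generic log f(y | theta)."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (yi - mu) ** 2 / (2 * sigma ** 2) for yi in y)

def dic(posterior_draws, y):
    """DIC = -2 log f(y | theta_hat) + 2 p_DIC, with theta_hat the posterior mean."""
    theta_bar = sum(posterior_draws) / len(posterior_draws)
    mean_loglik = sum(log_lik(t, y) for t in posterior_draws) / len(posterior_draws)
    p_dic = 2.0 * (log_lik(theta_bar, y) - mean_loglik)  # effective number of parameters
    return -2.0 * log_lik(theta_bar, y) + 2.0 * p_dic

y = [0.3, -0.1, 0.8, 1.2, 0.5]
draws = [random.gauss(0.5, 0.2) for _ in range(2000)]  # pretend MCMC output for mu
val = dic(draws, y)
assert val > 0
```

In our setting the same computation is applied twice, once to the y component and once to the W component of the likelihood, and the two results are summed.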
2.4 Analysis of simulated data
With the aim of providing an example of the implementation of some of the proposed models and of comparing their performance, the two small simulated datasets described in part I were used here as well. For the sake of completeness, some minor details about the simulation are provided. After simulating a historical population using a forward-in-time approach, subpopulations were created using individuals pertaining to the historical population as founders. Each subpopulation had different selection criteria, selection pressures, and mating systems. Dataset 1 was comprised of three subpopulations with different numbers of generations, migration was allowed, and the heritability of the trait was high. Dataset 2 consisted of two subpopulations with two generations each, migration was not allowed, and the heritability of the trait was low (see Table 2 of the companion paper for further details). These simulations were performed using the software QMSim (Sargolzaei and Schenkel, 2013). For further details, see part I.
These datasets were used to carry out analyses using the following models: spike and slab priors with heterogeneous marker effect covariance matrices and π0 = 0.5, π0 = 0.9, or π0 = 0.2, and their null versions. All models assumed homoscedastic residuals. In the results and discussion sections, results from the models fitted to these datasets in part I will also be considered. The models fit in part I were the multivariate Gaussian prior with homogeneous marker effect covariance matrices and the multivariate Gaussian prior with heterogeneous marker effect covariance matrices, both with homoscedastic residuals. Not all models were used to analyze these data, for the following reasons. Firstly, because the simulations did not consider heteroscedastic residuals, only models with homoscedastic residuals were fit. Secondly, some models have computational issues that make their implementation intractable. This is the case for models with a spike and slab prior over g with homogeneous marker effect covariance matrices. In these models, the full conditional distribution of the covariance matrix $G_0$ involves all combinations of i out of m markers with null effects for i = 0, 1, …, m; therefore, it is not easy to sample from $\pi(G_0|Else)$ because the number of combinations is exponential in m. As shown in section 2.2.2, for the model with heterogeneous marker effect covariance matrices it is easy to sample from the full conditional distribution of the covariance matrix of each marker locus, which makes its implementation possible.
Data were analyzed using the MCMC algorithm described in part I, assuming that $\boldsymbol{r} = (\frac{1}{\mathcal{S}}, \ldots, \frac{1}{\mathcal{S}})$ and using the product of $\mathcal{S}$ independent Uniform$(0, \frac{1}{\mathcal{S}})$ distributions as proposal for $\pi(P|Else)$. The following criteria for model comparison were computed: the approximate Bayes factors and fractional Bayes factors derived in section 2.3.1, the squared correlation between predicted breeding values and phenotypes in the testing populations (predictive ability), the squared correlations between true and predicted breeding values in the testing and training populations (accuracy), and the DIC.
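Both predictive ability and accuracy are squared Pearson correlations; a small Python helper (ours, with hypothetical values) shows the computation:

```python
def squared_corr(x, y):
    """Squared Pearson correlation, used above for predictive ability and accuracy."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# hypothetical predicted breeding values vs. phenotypes in a testing set
pred = [1.0, 0.2, -0.5, 0.8, 0.1]
phen = [1.2, 0.1, -0.3, 1.1, -0.2]
r2 = squared_corr(pred, phen)
assert 0.0 <= r2 <= 1.0
assert squared_corr(pred, pred) == 1.0  # perfect prediction gives 1
```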
The hyper-parameter π0 was assumed to be given. In practice, values close to 1 are used, reflecting the belief that many of the SNP do not have an effect. Alternatively, this hyper-parameter can be tuned, or a prior can be posed over it in order to reflect uncertainty. Here, three values of this parameter were used in the analyses: 0.9, 0.5 and 0.2. This does not correspond to a tuning procedure; it was done only for illustrative purposes. The three values were chosen to reflect situations in which the prior belief is that a high proportion of marker loci do not have an effect (π0 = 0.9), approximately half of them have an effect (π0 = 0.5), or a high proportion of markers have an effect (π0 = 0.2). In dataset 2, the full genotypes of three individuals (one founder from each subpopulation and a non-founder from subpopulation 1) were not included in the analysis, in order to simulate the case of missing genotypes.
For each analysis, 20,000 iterations were run, with the first 10,000 discarded as burn-in. In-house R scripts (R Core Team, 2015) were written to accommodate spike and slab priors and to compute Bayes factors, fractional Bayes factors, and the DIC. Analyses were performed using the University of Florida's high performance computing cluster.
3. Results
3.1 Bayes factors
Using the expressions derived in section 2.3.1, approximate Bayes factors and fractional Bayes factors were computed for dataset 1. Recall that $\overline{FBF}_{10GW} = \overline{FBF}_{10G^*W} = \overline{FBF}_{10SSW} = \overline{FBF}_{10SS^*W}$; therefore, the same expression permits the comparison of models $M_{1G}$, $M_{1G}^*$, $M_{1SS}$ and $M_{1SS}^*$ with their corresponding null models. For the same reason that makes sampling from the full conditional distribution of $G^0$ under model $M_{1SS}$ difficult, approximate Bayes factors for models with spike and slab priors were not computed. According to Raftery's scale, $\overline{BF}_{10GW}$ and $\overline{BF}_{10G^*W}$ suggested very strong evidence in favor of all full models (they were greater than 150) in dataset 1. The same result was found when using the fractional Bayes factor, which was computed with c = 0.5.
In dataset 2, computation of Bayes factors was not possible because $m > n_1$. Furthermore, even though only three individuals were assumed to be non-genotyped and the number of markers was small, computation of the fractional Bayes factor was not performed due to its computational demands. All evidence provided by the approximate fractional Bayes factors computed using the posterior means of $W^N$ (which can be seen as a sort of plug-in criterion) was against the full models; that is, all fractional Bayes factors were smaller than 1.
3.2 DIC, predictive ability and accuracies of predicted breeding values
In dataset 1, $DIC_W$ is common to all full models and to all null models, i.e., there are only two values. This is due to the fact that there were no missing genotypes (see part I for details). The values were 4717671 for full models and 6589105 for null models; that is, information coming from observed genotypes provided evidence in favor of the full models. This means that in this population, genotypic data supported the assumption of heterogeneous and correlated allelic frequencies over the competing assumption that allelic frequencies are the same in all subpopulations.
Tables 2 and 3 contain DIC values for datasets 1 and 2 respectively, whereas Table 4 shows predictive abilities and accuracies for the two datasets. In Tables 2 to 4, the abbreviations for the different models fitted to datasets 1 and 2 are as follows: $M_{1G}$ = full model with multivariate Gaussian prior and homogeneous marker effect covariance matrices; $M_{1G}^*$ = full model with multivariate Gaussian prior and heterogeneous marker effect covariance matrices; $M_{1SS0.5}^*$ = full model with spike and slab prior, π0 = 0.5, and heterogeneous marker effect covariance matrices; $M_{1SS0.9}^*$ = full model with spike and slab prior, π0 = 0.9, and heterogeneous marker effect covariance matrices; $M_{1SS0.2}^*$ = full model with spike and slab prior, π0 = 0.2, and heterogeneous marker effect covariance matrices. The models with subindex 1 replaced by 0 are the null versions of the corresponding full models.
Table 2. y component and total DIC for dataset 1

Model        DIC_y      Total DIC
M_1G         33702.55   4751373.55
M*_1G        11599.05   4729270.05
M*_1SS0.5    11604.09   4729275.09
M*_1SS0.9    11648.94   4729319.94
M*_1SS0.2    11437.05   4729108.05
M_0G         15396.32   6604501.32
M*_0G        13008.42   6602113.42
M*_0SS0.5    12502.17   6601607.17
M*_0SS0.9    12625.29   6601730.29
M*_0SS0.2    12137.88   6601242.88
Therefore, according to the component of the total DIC computed from the y component of the likelihood, full models should be preferred over their null versions in this dataset, except for the models with homogeneous marker effect covariance matrices (variances). When considering total DIC, all full models had a smaller DIC. Additionally, the model with the smallest DIC, and therefore the one to be preferred, was model $M_{1SS0.2}^*$, followed by model $M_{1G}^*$. Notwithstanding, the DIC values for models $M_{1G}^*$, $M_{1SS0.5}^*$, $M_{1SS0.9}^*$ and $M_{1SS0.2}^*$ were close.
Table 3. y component, W component and total DIC for dataset 2

Model        DIC_y    DIC_W     Total DIC
M_1G         1314.0   38367.4   39681.4
M*_1G        1328.8   38356.4   39684.2
M*_1SS0.5    1313.6   38394.9   39708.5
M*_1SS0.9    1304.8   38382.7   39687.5
M*_1SS0.2    1323.4   38373.8   39697.2
M_0G         1365.6   38180.3   39545.9
M*_0G        1370.1   38179.0   39549.1
M*_0SS0.5    1350.4   38173.4   39523.8
M*_0SS0.9    1361.2   38195.8   39557.0
M*_0SS0.2    1245.5   38178.4   39432.9
In this dataset, the two components of the DIC, and therefore the total DIC values, were similar for all models. The y components of DIC were smaller for the full models, except for the model with a spike and slab prior for g and π0 = 0.2. Conversely, the W components, as well as the total DIC values, were smaller for the null models.
Table 4. Predictive abilities and accuracies in datasets 1 and 2

             Predictive ability      Accuracy in testing     Accuracy in training
                                     population              population
Model        Dataset 1  Dataset 2    Dataset 1  Dataset 2    Dataset 1  Dataset 2
M_1G         0.29       0.019        0.27       0.04         0.32       0.17
M*_1G        0.76       0.016        0.83       0.03         0.94       0.21
M*_1SS0.5    0.81       0.017        0.88       0.04         0.92       0.19
M*_1SS0.9    0.81       0.018        0.88       0.04         0.90       0.14
M*_1SS0.2    0.79       0.016        0.86       0.03         0.94       0.20
M_0G         0.53       0.004        0.50       0.07         0.55       0.24
M*_0G        0.83       0.013        0.88       0.05         0.88       0.23
M*_0SS0.5    0.72       0.003        0.77       0.06         0.86       0.24
M*_0SS0.9    0.69       0.008        0.76       0.05         0.85       0.20
M*_0SS0.2    0.72       0.009        0.79       0.05         0.79       0.24
According to the behavior of predictive abilities in dataset 1, the performance of the different models was similar except for $M_{1G}$. The model with the best predictive ability was $M_{0G}^*$, while model $M_{1G}$ had the worst. The accuracies in testing dataset 1 showed a pattern similar to that of the predictive abilities: the performance of the models was similar, except for model $M_{1G}$, which did the poorest job when predicting breeding values, and model $M_{0G}$, which had the worst performance of all null models. The highest accuracies of predicted breeding values in testing population 1 were observed for models $M_{1SS0.5}^*$, $M_{1SS0.2}^*$, and $M_{0G}^*$. Finally, the accuracies of predicted breeding values in the training population showed the same behavior as the other measures, with a poorer performance for models with a homogeneous covariance matrix (or variance, for null models) of marker effects, model $M_{1G}$ having the smallest accuracy. The models with the highest accuracies were $M_{1G}^*$ and $M_{1SS0.2}^*$.
For dataset 2, predictive abilities and accuracies in the testing sets were very low; accuracies in the training set were slightly larger. None of these measures based on squared correlations showed marked differences between models. Full models had higher predictive abilities and smaller accuracies in the testing and training sets.
4. Discussion
4.1 General features of the models
The set of hierarchical Bayesian linear regression models for simultaneous genome-wide prediction in several subpopulations accounting for randomness of genotypes developed in part I was extended by incorporating spike and slab priors. The slab components of the conditional priors for marker effects were $\mathcal{S}$-variate Gaussian distributions considering homogeneous or heterogeneous covariance matrices (or variances), and the spike component was a multivariate point mass at zero for full models and a univariate point mass at zero for null models. Then, in order to provide general criteria for comparison of the proposed models with some of their null versions, approximate Bayes factors and fractional Bayes factors were derived under the assumption that $n_l \ge m\ \forall\, l = 1, 2, \ldots, \mathcal{S}$, and the possible use of fractional Bayes factors for the case m > n was briefly discussed. These Bayes factors and fractional Bayes factors are approximations because some of the multiple integrals required to find the marginal distribution of the data given a model were approximated via the Laplace method.
Spike and slab priors assign positive mass at zero; therefore, models considering this class of priors can be used for variable selection, and they induce a stronger shrinkage towards zero (Gianola, 2013; Xu and Ghosh, 2015). Our spike and slab models can perform variable selection at the marker level; that is, it is assumed that a given marker either has effects in all subpopulations or has no effect in any subpopulation. In statistics, this is known as sparsity at the group level (Xu and Ghosh, 2015). Xu and Ghosh (2015) reparameterized the coefficients of the multiple linear regression as the product of a positive diagonal matrix and a vector, i.e., $g_j := V_g b_j,\ j = 1, 2, \ldots, m$. Then, they posed independent univariate spike and slab priors on the elements of the positive diagonal matrix and independent multivariate spike and slab priors on $b_j$. This strategy induces two kinds of sparsity: at the group level and within groups. Thus, an extension of our models that would induce sparsity at the group (i.e., marker) and within-group levels would be to consider conditional priors similar to those developed in Xu and Ghosh (2015). Under such an extension, a given marker would have positive probability of having null effects in only a proper subset of subpopulations.
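The group-level sparsity just described can be illustrated by drawing marker effects from a simplified spike and slab prior. The Python sketch below uses an uncorrelated Gaussian slab as a stand-in for the $\mathcal{S}$-variate Gaussian slab with a full covariance matrix used in the text; all names and values are illustrative.

```python
import random
random.seed(3)

S = 3            # number of subpopulations
pi0 = 0.9        # prior probability that a marker has a null effect in ALL subpopulations

def draw_marker_effect(pi0, S, slab_sd=1.0):
    """Draw one marker's S-variate effect from a spike-and-slab prior:
    point mass at the zero vector with probability pi0, otherwise an
    independent Gaussian slab per subpopulation (simplified, uncorrelated
    stand-in for the S-variate Gaussian slab)."""
    if random.random() < pi0:
        return [0.0] * S                      # spike: null effect in every subpopulation
    return [random.gauss(0.0, slab_sd) for _ in range(S)]

effects = [draw_marker_effect(pi0, S) for _ in range(10_000)]
frac_null = sum(e == [0.0] * S for e in effects) / len(effects)
assert abs(frac_null - pi0) < 0.02            # group sparsity: the whole vector is zero or not
```

Note that under this prior a marker is never null in only some subpopulations, which is exactly the group-level (rather than within-group) sparsity discussed above.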
Uncertainty about the hyper-parameter π0 can be accounted for by posing a prior over it. A usual choice is a Beta distribution or its special case, the Uniform(0,1). Implementation of this approach in the models presented here is straightforward; it implies adding one more level to the hierarchy. In this case, the question that arises is the impact of this on inferences. Using the Kullback-Leibler divergence, Lehmann and Casella (1998, Theorem 5.7) provide theoretical justification for the idea that parameters in lower levels of the hierarchy have a smaller impact on inference. Notwithstanding, this does not mean that the impact of this extra level in the hierarchy is negligible, and therefore, if the prior knowledge about π0 is poor or null, it may be worth accounting for uncertainty. As mentioned before, this parameter can alternatively be tuned.
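If a Beta(a, b) prior were placed on π0, its full conditional given the current spike/slab indicators would again be a Beta distribution, so the extra level of the hierarchy costs a single conjugate Gibbs update. A hedged Python sketch (ours, not the paper's implementation; the indicator convention is an assumption):

```python
import random
random.seed(7)

def update_pi0(deltas, a=1.0, b=1.0):
    """One Gibbs update for pi0 under a Beta(a, b) prior, where deltas[j] = 1
    if marker j is currently assigned to the spike (null effect).
    Full conditional: Beta(a + sum(deltas), b + m - sum(deltas))."""
    m = len(deltas)
    k = sum(deltas)
    return random.betavariate(a + k, b + m - k)

deltas = [1] * 90 + [0] * 10          # 90 of 100 markers currently in the spike
draws = [update_pi0(deltas) for _ in range(5000)]
post_mean = sum(draws) / len(draws)
# conditional posterior is Beta(91, 11), with mean 91/102 ≈ 0.892
assert abs(post_mean - 91 / 102) < 0.01
```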
Regarding the approximate Bayes factors and fractional Bayes factors, those derived here were obtained via the Laplace approximation, which has an error of order $O(n^{-1})$ (Ghosh et al., 2006). This means that the approximation error is bounded from above by a constant times $n^{-1}$. There is a refinement based on the Laplace method that yields an approximation with error of order $O(n^{-2})$ when $q(\theta)$ is a positive function (Tierney and Kadane, 1986), a condition that is always satisfied in the context of this study (see section 2.3.1). This refinement could be implemented to obtain more accurate approximations of Bayes factors and fractional Bayes factors.
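The Laplace method referred to above approximates an integral of the form $\int e^{n h(\theta)}d\theta$ by a Gaussian integral around the mode $\hat{\theta}$. A one-dimensional Python check against an integral with a known closed form illustrates the $O(n^{-1})$ error (the example and names are ours, not from the paper):

```python
import math

def laplace_approx(h, h2, theta_hat, n):
    """Laplace approximation to I(n) = ∫ exp(n·h(θ)) dθ around the mode θ̂:
    I(n) ≈ exp(n·h(θ̂)) · sqrt(2π / (n·|h''(θ̂)|))."""
    return math.exp(n * h(theta_hat)) * math.sqrt(2 * math.pi / (n * abs(h2(theta_hat))))

# Worked check: ∫_0^∞ exp(n(ln θ − θ)) dθ = n! / n^{n+1}, with mode θ̂ = 1.
h = lambda t: math.log(t) - t
h2 = lambda t: -1.0 / t ** 2        # second derivative of h
n = 50
exact = math.factorial(n) / n ** (n + 1)
approx = laplace_approx(h, h2, 1.0, n)
rel_err = abs(approx - exact) / exact
assert rel_err < 1.0 / n            # error of order O(1/n), as stated above
```

For this integral the Laplace approximation reproduces Stirling's formula, and the relative error decays roughly like 1/(12n).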
Other authors, e.g., Raftery (1996) and Lewis and Raftery (1997), have also used the Laplace method or modifications of it (DiCiccio et al., 1997) to derive approximate Bayes factors. The following comments regarding the algebraic expressions of the Bayes factors and fractional Bayes factors are made for a given dataset, that is, given y, n, m and $W^o$. It is well known that for nested models (i.e., when the null model corresponds to the full model with some parameters set to zero) $SSR_0 > SSR$ (Searle, 1971). In this case the models are not nested; therefore, this standard result cannot be used. However, Result 1 establishes the relationship between $SSR_0$ and SSR for our models.
Thus, by Result 1, the following component of the algebraic expression for $BF_{10GW}$ is always greater than or equal to 1, and as a consequence it never provides evidence against model 1:
$$\left(\frac{SSR}{SSR_0}\right)^{-\left(\frac{n+v+2}{2}\right)}.$$
Conversely, for $n \ge 2$ the following component is always smaller than or equal to 1; that is, it never provides evidence in favor of model 1:
$$\exp\left(\frac{-n\tau^2}{2}\left(\frac{1}{SSR}-\frac{1}{SSR_0}\right)\right)\left(\frac{1}{n}\right)^{\frac{m(\mathcal{S}-1)}{2}}.$$
Of course, the strength of the evidence in favor of or against model 1 (when $SSR_0 > SSR$) depends on the observed data. Both expressions depend on the data and on the hyper-parameters assigned to the residual variance. On the other hand, the following expression depends only on the data:
$$\left(\frac{|W_0'W_0|}{|W'W|}\right)^{\frac{1}{2}} = \left(\frac{|W_1'W_1 + W_2'W_2 + \cdots + W_{\mathcal{S}}'W_{\mathcal{S}}|}{|W_1'W_1||W_2'W_2|\cdots|W_{\mathcal{S}}'W_{\mathcal{S}}|}\right)^{\frac{1}{2}}.$$
However, there are no general results establishing the relationship between the determinants inside the parentheses, and this is why it cannot be established whether this component is always smaller or greater than 1. Of course, these determinants are always positive because of the assumption that all submatrices $W_1, \ldots, W_{\mathcal{S}}$ are of full column rank. Thus, whether this component favors model 1 depends on each dataset. The following component depends on both the priors and the data:
$$\left(\prod_{l=2}^{\mathcal{S}}\frac{\Gamma\left(\frac{a+m+1-l}{2}\right)}{\Gamma\left(\frac{a+1-l}{2}\right)}\right)\frac{\left|\boldsymbol{\Sigma}+\sum_{j=1}^{m}\hat{\boldsymbol{g}}_j\hat{\boldsymbol{g}}_j'\right|^{-\left(\frac{a+m}{2}\right)}}{\left(b+\sum_{j=1}^{m}\hat{g}_{0j}^{2}\right)^{-(m+2)/2}}\cdot\frac{SSR^{(m\mathcal{S}+2)/2}}{SSR_0^{(m+2)/2}}.$$
The relative value of this component with respect to 1 cannot be established. Thus, as with the previous component, its contribution to the evidence in favor of or against model 1 varies with each dataset. A similar situation occurs with $BF_{10G^*W}$ and $BF_{10G^*WH}$, while for $BF_{10SSW}$ and $BF_{10SS^*W}$ there are new terms, induced by the spike and slab priors posed over g and $g_0$, whose relative value with respect to 1 depends on the observed data. However, the following statement can be made for the term involving Gamma functions. In its positive domain, the Gamma function has a minimum at approximate coordinates (1.461, 0.885) (Kosmala, 2004), which implies that beyond 1.461 the function is increasing. Furthermore, as $x \downarrow 0$, $\Gamma(x) \to \infty$. Note that for $l = 2, 3, \ldots, \mathcal{S}$ and $a > \mathcal{S}-1$ (recall that the inverse Wishart distribution requires this condition), $\frac{a+m+1-l}{2} > \frac{m}{2}$ and $\frac{a+1-l}{2} > 0$. Therefore, given that in genome-wide prediction m has an order of magnitude of at least $10^2$, for values of a such that $a + 1 - \mathcal{S} \ge 1.461$ this term is always greater than 1.
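The stated minimum of the Gamma function can be checked numerically with a coarse grid search using only the Python standard library:

```python
from math import gamma

# Grid search for the minimum of Γ(x) on (0, ∞); the text cites (1.461, 0.885).
xs = [0.5 + 1e-4 * k for k in range(30000)]    # grid over (0.5, 3.5)
x_min = min(xs, key=gamma)
assert abs(x_min - 1.4616) < 1e-3
assert abs(gamma(x_min) - 0.8856) < 1e-3
# Hence Γ is increasing on [1.4616, ∞), which is what the argument above uses.
assert gamma(2.0) < gamma(3.0) < gamma(4.0)
```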
Regarding fractional Bayes factors, as mentioned before,
$$\overline{FBF}_{10GW} = \overline{FBF}_{10G^*W} = \overline{FBF}_{10SSW} = \overline{FBF}_{10SS^*W} = c^{m(\mathcal{S}-1)/2}\left(\frac{SSR}{SSR_0}\right)^{\frac{n(c-1)}{2}}\frac{SSR^{(m\mathcal{S}+2)/2}}{SSR_0^{(m+2)/2}}$$
due to cancellation of the terms that make the approximate Bayes factors different. Recall that $c \in (0,1)$. As $c \uparrow 1$, with m and n constant, the fractional Bayes factor approaches $\frac{SSR^{(m\mathcal{S}+2)/2}}{SSR_0^{(m+2)/2}}$. For $c \in (0,1)$ the exponent $\frac{n(c-1)}{2}$ is always negative, and therefore $\left(\frac{SSR}{SSR_0}\right)^{\frac{n(c-1)}{2}}$ never provides evidence against model 1. On the contrary, $c^{m(\mathcal{S}-1)/2}$ provides evidence against model 1; however, as noted before, given m and $\mathcal{S}$, as $c \uparrow 1$ the evidence provided by this component is negligible because the whole expression approaches 1.
Some recommendations for choosing the value of c are given in O'Hagan (1994) and Ghosh et al. (2006). Finally, the behavior of $\frac{SSR^{(m\mathcal{S}+2)/2}}{SSR_0^{(m+2)/2}}$ depends on the magnitude of the difference between SSR and $SSR_0$ and on the number of subpopulations.
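For illustration, the fractional Bayes factor above can be evaluated in log scale to avoid overflow for realistic m and n. The Python sketch below (with hypothetical SSR values; the function name is ours) also checks the $c \uparrow 1$ limit noted in the text:

```python
import math

def log_fbf(ssr, ssr0, n, m, S, c):
    """log of the approximate fractional Bayes factor given above (full-rank case)."""
    assert 0.0 < c < 1.0
    return (0.5 * m * (S - 1) * math.log(c)
            + 0.5 * n * (c - 1) * (math.log(ssr) - math.log(ssr0))
            + 0.5 * (m * S + 2) * math.log(ssr)
            - 0.5 * (m + 2) * math.log(ssr0))

# hypothetical values: a smaller residual sum of squares under the full model
ssr, ssr0, n, m, S = 80.0, 120.0, 100, 10, 3
lf_half = log_fbf(ssr, ssr0, n, m, S, c=0.5)

# as c ↑ 1 the factor approaches SSR^{(mS+2)/2} / SSR0^{(m+2)/2}, as noted above
limit = 0.5 * (m * S + 2) * math.log(ssr) - 0.5 * (m + 2) * math.log(ssr0)
assert abs(log_fbf(ssr, ssr0, n, m, S, c=1 - 1e-9) - limit) < 1e-4
```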
An important aspect of these approximations is that they require $n_l \ge m\ \forall\, l = 1, 2, \ldots, \mathcal{S}$. As discussed in section 2.3.1, the fast growth in the number of genotyped individuals may make this assumption attainable for SNP chips of moderate size (i.e., 50 to 100k). However, the availability of denser chips and full sequences implies that m also grows. On one hand, it is said that the higher the number of SNP, the better the accuracy of genome-wide predictions, because more LD between markers and QTL is "captured". On the other hand, some studies with real data, such as Vázquez et al. (2010) in Holstein cattle and de los Campos et al. (2013) in humans, have found that using subsets of SNP yields reasonable accuracy of genome-wide predictions. Moreover, the curve relating accuracy to marker density has been reported to reach a plateau for traits such as height in humans (Vázquez et al., 2012), which suggests that in some cases not much accuracy is lost when selecting subsets of SNP using some criteria.
Finally, the ability of our models to include non-genotyped individuals allows a larger n, which, combined with the factors mentioned before, increases the likelihood of having situations with $n_l \ge m\ \forall\, l = 1, 2, \ldots, \mathcal{S}$. The approximate fractional Bayes factor $\overline{FBF}_{10W}$ could be used for the case m > n, but there is no formal mathematical justification for it. A brief discussion with an outline of the steps required to justify its use in such a case is provided in Appendix C. Thus, the use of this expression for model comparison in the non-full rank case has to be seen as an ad hoc approach, because there is no formal proof of its validity yet. Therefore, the question of whether the approximate fractional Bayes factor derived here is also valid for the non-full rank case remains to be formally answered; refuting this result or establishing a rigorous proof of it is an open problem.
4.2 Simulation results
Our small simulations correspond to two populations comprised of three and two subpopulations, respectively. One trait per population was simulated. In both cases, subpopulations had different mating designs, selection criteria, selection pressures and heritabilities. However, these populations display two contrasting scenarios. The first one (dataset 1) corresponded to a population comprised of three subpopulations that diverged over several generations; heritabilities were high, migration was allowed, the number of individuals in each subpopulation was larger than the number of SNP, and there were no missing genotypes. Conversely, the second scenario (dataset 2) considered a population comprised of two subpopulations that diverged over only two generations; trait heritabilities were low, there was no migration, the number of individuals was smaller than the number of SNP in one subpopulation (hence the model was not of full rank), and there were missing genotypes.
In dataset 1, predictive ability did not suggest a superior predictive capability of the full models, that is, the models accounting for potential heterogeneity induced by the existence of subpopulations. As shown in Table 4, its values were very similar across models (except for the model with a homogeneous covariance matrix of marker effects, which had considerably lower predictive ability). In this dataset, the number of marker loci considered in the analysis was equal to the number of QTL; therefore, it could be expected that the smallest value of π0 would have the best performance. The different squared correlations between predicted and observed values yielded similar results for the three values of π0 used here, with a slightly better performance for the model with the smallest value of π0. While this set of correlations did not provide conclusive evidence in favor of the full models, the DIC, Bayes factors and fractional Bayes factors favored the full models.
Due to the low heritabilities in the two subpopulations forming dataset 2, predictive ability and accuracies were very low (Table 4). In this dataset, full models had slightly higher predictive abilities than their null versions. Conversely, accuracies of predicted breeding values in the training and validation datasets suggested a tiny superiority of the null models. Total DIC and $DIC_W$ provided evidence in favor of the null models, but the differences were not substantial. In addition, the "plug-in" fractional Bayes factors also gave evidence in favor of the null models. As in part I, the performance of the fitted models was more similar in dataset 2 than in dataset 1.
A broad observation is that, when combining the results obtained here with those obtained in the companion paper, the overall behavior observed in part I was maintained. In general, what was observed in these small simulations was that under the biological scenario simulated in dataset 1, full models tended to have better performance, whereas in the setting simulated in dataset 2, null models tended to outperform full models. In all cases differences were small (except for models $M_{1G}$ and $M_{0G}$ in dataset 1). Therefore, after including the outputs of the spike and slab models, our results are still in agreement with those found by Olson et al. (2012), Makgahlela et al. (2013), de los Campos et al. (2015) and Lehermeier et al. (2015).
5. Conclusions
This study enlarges the family of hierarchical Bayesian models for across-population genome-wide prediction accounting for randomness of genotypes derived in the companion paper (part I) by considering so-called spike and slab priors (multivariate and univariate) for marker allele substitution effects. This class of priors allows a stronger shrinkage towards zero and variable selection at the group level. This development gives even more flexibility to our family of models, because the user has more modelling options with which to cope with a wider spectrum of biological scenarios. For example, for traits controlled by genes with major effects, or controlled by a small number of genes, using spike and slab priors is theoretically advantageous.
The approximate Bayes factors and fractional Bayes factors derived here can be used to complement other criteria, such as measures of accuracy of predicted breeding values and correlations between predicted breeding values and phenotypes, when comparing models. These criteria were derived under the assumption of a full rank model, which is currently satisfied in certain populations, and we believe that this will become an increasingly frequent situation as more individuals are genotyped. The invariance of our approximate fractional Bayes factor to the choice of the generalized inverses of $W'W$ and $W_0'W_0$ seems promising because it would allow the use of this criterion in the non-full rank case. However, a formal justification or rejection of this criterion remains an open problem. For now, this criterion might be used ad hoc, always keeping in mind the risks that this implies.
In addition to all the possible extensions and refinements of our models discussed in the companion paper, the modification of the spike and slab priors presented here to allow sparsity within groups (markers) is another aspect that opens a path for further research.
Author Contributions
C.A. Martínez developed modeling strategies, carried out the derivations, wrote the R scripts, designed and ran the simulations, and wrote the paper. K. Khare advised on modeling strategies and reviewed, corrected, and discussed the derivations and the statistical aspects of the paper. A. Banerjee advised on modeling strategies and reviewed, corrected, and discussed the derivations and the statistical aspects of the paper. M.A. Elzo designed the simulation and reviewed, corrected, and discussed the genetic aspects of the paper.
Acknowledgments
The authors thank Dr. Malay Ghosh of the Department of Statistics of the University of Florida for useful comments and discussions and for pointing out relevant references. C.A. Martínez also thanks Fulbright Colombia and the "Departamento Administrativo de Ciencia, Tecnología e Innovación" (COLCIENCIAS) for supporting his PhD and Master's programs at the University of Florida through a scholarship, and Bibiana Coy for her love, support, and constant encouragement.
References
Casella, G., Berger, R. (2002). Statistical Inference (2nd ed.). Duxbury, Pacific Grove, CA, USA.
CDCB Council on Dairy Cattle Breeding (2016). Genetic evaluations - genotype counts. Retrieved January 10, 2016 from https://www.cdcb.us/Genotype/counts.html.
Christensen, R. (2011). Plane Answers to Complex Questions: The Theory of Linear Models (4th ed.). Springer, New York, NY, USA.
de los Campos, G., Vázquez, A.I., Fernando, R., Klimentidis, Y.C., Sorensen, D. (2013). Prediction of complex human traits using the genomic best linear unbiased predictor. PLOS Genetics, 9(7), e1003608.
de los Campos, G., Veturi, Y., Vázquez, A.I., Lehermeier, C., Pérez-Rodríguez, P. (2015). Incorporating genetic heterogeneity in whole-genome regressions using interactions. Journal of Agricultural, Biological and Environmental Statistics, 20(4), 467-490.
Daetwyler, H.D., Pong-Wong, R., Villanueva, B., Woolliams, J.A. (2010). The impact of genetic architecture on genome-wide evaluation methods. Genetics, 185, 1021-1031.
DiCiccio, T.J., Kass, R.E., Raftery, A., Wasserman, L. (1997). Computing Bayes factors by combining simulations and asymptotic approximations. Journal of the American Statistical Association, 92(439), 903-915.
Ghosh, J.K., Delampady, M., Samanta, T. (2006). An Introduction to Bayesian Analysis: Theory and Methods. Springer, New York, NY, USA.
Gianola, D. (2013). Priors in whole-genome regression: The Bayesian alphabet returns. Genetics, 194, 573-596.
Gianola, D., Rosa, G. (2015). One hundred years of statistical developments in animal breeding. Annual Review of Animal Biosciences, 3, 19-56.
Harville, D. (2000). Matrix Algebra from a Statistician's Perspective. Springer, New York, NY, USA.
Heslot, N., Yang, H.P., Sorrells, M.E., Jannink, J.L. (2012). Genomic selection in plant breeding: A comparison of models. Crop Science, 52, 146-160.
Jeffreys, H. (1961). Theory of Probability (3rd ed.). Oxford University Press, Oxford, UK.
Kosmala, W.A.J. (2004). A Friendly Introduction to Analysis: Single and Multivariate (2nd ed.). Pearson, Upper Saddle River, NJ, USA.
Lavine, M., Schervish, M.J. (1999). Bayes factors: What they are and what they are not. The American Statistician, 53(2), 119-122.
Lehermeier, C., Schön, C., de los Campos, G. (2015). Assessment of genetic heterogeneity in structured plant populations using multivariate whole-genome regression models. Genetics, 201, 323-337.
Lehmann, E.L., Casella, G. (1998). Theory of Point Estimation. Springer, New York, NY, USA.
Lewis, S.M., Raftery, A.E. (1997). Estimating Bayes factors via posterior simulation with the Laplace-Metropolis estimator. Journal of the American Statistical Association, 92(438), 648-655.
Makgahlela, M.L., Mäntysaari, E.A., Strandén, I., Koivula, M., Nielsen, U.S., Sillanpää, M.J., Juga, J. (2013). Across breed multi-trait random regression genomic predictions in the Nordic Red dairy cattle. Journal of Animal Breeding and Genetics, 130, 10-19.
Mitchell, T.J., Beauchamp, J.J. (1988). Bayesian variable selection in linear regression. Journal of the American Statistical Association, 83(404), 1023-1032.
O'Hagan, A. (1994). Kendall's Advanced Theory of Statistics, Vol. 2B: Bayesian Inference. Halsted Press, New York, NY, USA.
O'Hagan, A. (1995). Fractional Bayes factors for model comparison. Journal of the Royal Statistical Society, Series B, 57(1), 99-138.
Olson, K.M., VanRaden, P.M., Tooker, M.E. (2012). Multibreed genomic evaluations using purebred Holsteins, Jerseys, and Brown Swiss. Journal of Dairy Science, 95, 5378-5383.
R Core Team (2015). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Raftery, A.E. (1996). Approximate Bayes factors and accounting for model uncertainty in generalised linear models. Biometrika, 83(2), 251-266.
Sargolzaei, M., Schenkel, F.S. (2013). QMSim User's Guide Version 1.10. Centre for Genetic Improvement of Livestock, Department of Animal and Poultry Science, University of Guelph, Guelph, Canada.
Searle, S.R. (1971). Linear Models. John Wiley & Sons, New York, NY, USA.
Spiegelhalter, D.J., Best, N.G., Carlin, B.P., van der Linde, A. (2002). Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, Series B, 64, 583-639.
Spiegelhalter, D.J., Best, N.G., Carlin, B.P., van der Linde, A. (2014). The deviance information criterion: 12 years on. Journal of the Royal Statistical Society, Series B, 76, 485-493.
Tierney, L., Kadane, J.B. (1986). Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association, 81, 82-86.
van den Berg, S., Calus, M.P.L., Meuwissen, T.H.E., Wientjes, Y.C.J. (2015). Across population genomic prediction scenarios in which Bayesian variable selection outperforms GBLUP. BMC Genetics, 16:146.
Vázquez, A.I., Rosa, G.J.M., Weigel, K.A., de los Campos, G., Gianola, D., Allison, D.B. (2010). Predictive ability of subsets of single nucleotide polymorphisms with and without parent average in US Holsteins. Journal of Dairy Science, 93(12), 5942-5949.
Vázquez, A.I., de los Campos, G., Klimentidis, Y.C., Rosa, G.J.M., Gianola, D., Yi, N., Allison, D.B. (2012). A comprehensive genetic approach for improving prediction of skin cancer risk in humans. Genetics, 192, 1493-1502.
Xu, X., Ghosh, M. (2015). Bayesian variable selection and estimation for group lasso. Bayesian Analysis, 10(4), 909-936.
Appendix A: Joint posteriors, full conditionals and details of some derivations

Joint posteriors

Spike and slab prior for $\boldsymbol{g}$, homogeneous marker effect covariance matrix and homoscedastic residuals:
\[
\pi(\boldsymbol g,\sigma^2,W^N,G^0,P\mid\boldsymbol y,W^{\mathcal o})\propto(\sigma^2)^{-n/2}\exp\!\left(\frac{-1}{2\sigma^2}(\boldsymbol y-W\boldsymbol g)'(\boldsymbol y-W\boldsymbol g)\right)
\times\sum_{i=0}^{m}\sum_{l=1}^{\binom{m}{i}}I_{\delta_{il}}\,\pi_0^{\,i}(1-\pi_0)^{m-i}\prod_{k:\,\boldsymbol g_k\in\delta_{il0}^{c}}MVN(\boldsymbol g_k;\boldsymbol 0,G^0)
\]
\[
\times|G^0|^{-\frac{1}{2}(a+\mathcal S+1)}\exp\!\left(\frac{-1}{2}\operatorname{tr}\big(\boldsymbol\Sigma(G^0)^{-1}\big)\right)
\times(\sigma^2)^{-\left(\frac{v}{2}+1\right)}\exp\!\left(\frac{-\tau^2}{2\sigma^2}\right)
\times\pi(W\mid P^*)\,\pi(P^*).
\]
Spike and slab prior for $\boldsymbol{g}$, heterogeneous marker effect covariance matrices and homoscedastic residuals:
\[
\pi(\boldsymbol g,\sigma^2,W^N,G,P\mid\boldsymbol y,W^{\mathcal o})\propto(\sigma^2)^{-n/2}\exp\!\left(\frac{-1}{2\sigma^2}(\boldsymbol y-W\boldsymbol g)'(\boldsymbol y-W\boldsymbol g)\right)
\times\sum_{i=0}^{m}\sum_{l=1}^{\binom{m}{i}}I_{\delta_{il}}\,\pi_0^{\,i}(1-\pi_0)^{m-i}\prod_{k:\,\boldsymbol g_k\in\delta_{il0}^{c}}MVN(\boldsymbol g_k;\boldsymbol 0,G_k)
\]
\[
\times\prod_{j=1}^{m}|G_j|^{-\frac{1}{2}(a+\mathcal S+1)}\exp\!\left(\frac{-1}{2}\operatorname{tr}\big(\boldsymbol\Sigma G_j^{-1}\big)\right)
\times(\sigma^2)^{-\left(\frac{v}{2}+1\right)}\exp\!\left(\frac{-\tau^2}{2\sigma^2}\right)
\times\pi(W\mid P^*)\,\pi(P^*).
\]

Full conditionals

Spike and slab prior for $\boldsymbol{g}$

Derivations for the heterogeneous marker effect covariance matrix model are presented. The homogeneous marker effect covariance matrix model is simply a special case with covariance matrices satisfying $G_1=G_2=\cdots=G_m=G^0$.
\[
\pi(\boldsymbol g\mid Else)\propto\sum_{i=0}^{m}\sum_{l=1}^{\binom{m}{i}}\left[I_{\delta_{il}}\,\pi_0^{\,i}(1-\pi_0)^{m-i}\prod_{k:\,\boldsymbol g_k\in\delta_{il0}^{c}}MVN(\boldsymbol g_k;\boldsymbol 0,G_k)\right]\times f(\boldsymbol y\mid W,\boldsymbol g,\sigma^2)
\]
\[
\propto\exp\!\left(-\frac{1}{2\sigma^2}\big(\boldsymbol g'W'W\boldsymbol g-2\boldsymbol g'W'\boldsymbol y\big)\right)\sum_{i=0}^{m}\sum_{l=1}^{\binom{m}{i}}I_{\delta_{il}}\,\pi_0^{\,i}(1-\pi_0)^{m-i}\prod_{k:\,\boldsymbol g_k\in\delta_{il0}^{c}}\exp\!\left(-\frac12\boldsymbol g_k'G_k^{-1}\boldsymbol g_k\right).
\]
Notice that under a particular $\delta_{il}$:
\[
\boldsymbol g'W'W\boldsymbol g=\sum_{j=1}^{m}\sum_{h=1}^{m}\boldsymbol g_j'W_j'W_h\boldsymbol g_h=\sum_{j=1}^{m}\boldsymbol g_j'W_j'\sum_{h=1}^{m}W_h\boldsymbol g_h
=\sum_{k:\,\boldsymbol g_k\in\delta_{il0}^{c}}\boldsymbol g_k'W_k'\sum_{h=1}^{m}W_h\boldsymbol g_h+\sum_{k:\,\boldsymbol g_k\in\delta_{il0}}\boldsymbol g_k'W_k'\sum_{h=1}^{m}W_h\boldsymbol g_h,
\]
but for $\boldsymbol g_k\in\delta_{il0}$, $\boldsymbol g_k=\boldsymbol 0$; hence, under $\delta_{il}$:
\[
\boldsymbol g'W'W\boldsymbol g=\sum_{k:\,\boldsymbol g_k\in\delta_{il0}^{c}}\boldsymbol g_k'W_k'\sum_{h:\,\boldsymbol g_h\in\delta_{il0}^{c}}W_h\boldsymbol g_h
=\sum_{k:\,\boldsymbol g_k\in\delta_{il0}^{c}}\;\sum_{h:\,\boldsymbol g_h\in\delta_{il0}^{c}}\boldsymbol g_k'W_k'W_h\boldsymbol g_h
=\boldsymbol g_{\delta_{il0}^{c}}'W_{\delta_{il0}^{c}}'W_{\delta_{il0}^{c}}\boldsymbol g_{\delta_{il0}^{c}},
\]
where $\boldsymbol g_{\delta_{il0}^{c}}=(\boldsymbol g_{k_1}',\ldots,\boldsymbol g_{k_{m-i}}')'$, $k:\boldsymbol g_k\in\delta_{il0}^{c}$, is the vector of dimension $\mathcal S(m-i)$ containing the non-null marker effects under $\delta_{il}$, and $W_{\delta_{il0}^{c}}$ is the submatrix of the design matrix corresponding to $\boldsymbol g_{\delta_{il0}^{c}}$. Similarly,
\[
\boldsymbol g'W'\boldsymbol y=\sum_{k:\,\boldsymbol g_k\in\delta_{il0}^{c}}\boldsymbol g_k'W_k'\boldsymbol y=\boldsymbol g_{\delta_{il0}^{c}}'W_{\delta_{il0}^{c}}'\boldsymbol y;
\]
in addition, notice that
\[
\sum_{k:\,\boldsymbol g_k\in\delta_{il0}^{c}}\boldsymbol g_k'G_k^{-1}\boldsymbol g_k=\boldsymbol g_{\delta_{il0}^{c}}'G_{\delta_{il0}^{c}}^{-1}\boldsymbol g_{\delta_{il0}^{c}},
\qquad
G_{\delta_{il0}^{c}}^{-1}=\begin{pmatrix}G_{k_1}^{-1}&&\\&\ddots&\\&&G_{k_{m-i}}^{-1}\end{pmatrix};
\]
therefore,
\[
\pi(\boldsymbol g\mid Else)\propto\sum_{i=0}^{m}\sum_{l=1}^{\binom{m}{i}}I_{\delta_{il}}\,\pi_0^{\,i}(1-\pi_0)^{m-i}\exp\!\left(-\frac12\left(\boldsymbol g_{\delta_{il0}^{c}}'\Big(\frac{W_{\delta_{il0}^{c}}'W_{\delta_{il0}^{c}}}{\sigma^2}+G_{\delta_{il0}^{c}}^{-1}\Big)\boldsymbol g_{\delta_{il0}^{c}}-2\frac{\boldsymbol g_{\delta_{il0}^{c}}'W_{\delta_{il0}^{c}}'\boldsymbol y}{\sigma^2}\right)\right),
\]
i.e., $\pi(\boldsymbol g\mid Else)=$
\[
\sum_{i=0}^{m}\pi_0^{\,i}(1-\pi_0)^{m-i}\sum_{l=1}^{\binom{m}{i}}I_{\delta_{il}}\,MVN\!\left(\boldsymbol g_{\delta_{il0}^{c}};\Big(\frac{W_{\delta_{il0}^{c}}'W_{\delta_{il0}^{c}}}{\sigma^2}+G_{\delta_{il0}^{c}}^{-1}\Big)^{-1}\frac{W_{\delta_{il0}^{c}}'\boldsymbol y}{\sigma^2},\,\Big(\frac{W_{\delta_{il0}^{c}}'W_{\delta_{il0}^{c}}}{\sigma^2}+G_{\delta_{il0}^{c}}^{-1}\Big)^{-1}\right).
\]

In the case of the full conditional distribution of a single marker effect:
\[
\pi(\boldsymbol g_j\mid Else)\propto\big(\pi_0 I_{\{\boldsymbol g_j=\boldsymbol 0\}}+(1-\pi_0)MVN(\boldsymbol g_j;\boldsymbol 0,G_j)I_{\{\boldsymbol g_j\neq\boldsymbol 0\}}\big)f(\boldsymbol y\mid W,\boldsymbol g,\sigma^2)
\]
\[
\propto\pi_0\exp\!\left(-\frac12\big(\boldsymbol g_{(-j)}'W_{(-j)}'W_{(-j)}\boldsymbol g_{(-j)}-2\boldsymbol g_{(-j)}'W_{(-j)}'\boldsymbol y\big)\right)I_{\{\boldsymbol g_j=\boldsymbol 0\}}
+(1-\pi_0)\exp\!\left(-\frac12\big(\boldsymbol g_j'(W_j'W_j+G_j^{-1})\boldsymbol g_j-2\boldsymbol g_j'W_j'(\boldsymbol y-W_{(-j)}\boldsymbol g_{(-j)})\big)\right)I_{\{\boldsymbol g_j\neq\boldsymbol 0\}},
\]
where $\boldsymbol g_{(-j)}$ is the vector $\boldsymbol g$ without the subvector $\boldsymbol g_j$, $W_{(-j)}$ is the design matrix after deleting the columns corresponding to marker $j$, and $W_j$ is the design matrix corresponding to $\boldsymbol g_j$. Thus, the full conditional of $\boldsymbol g_j$ is also a spike and slab distribution. A more explicit form of this distribution can be found by computing the mixing probabilities. To this end:
\[
\pi(\boldsymbol g_j\mid Else)=\frac{\pi_0 f(\boldsymbol y\mid W,\boldsymbol g,\sigma^2)I_{\{\boldsymbol g_j=\boldsymbol 0\}}+(1-\pi_0)\pi(\boldsymbol g_j)f(\boldsymbol y\mid W,\boldsymbol g,\sigma^2)I_{\{\boldsymbol g_j\neq\boldsymbol 0\}}}{\pi_0 f(\boldsymbol y\mid W,\boldsymbol g_{(-j)},\boldsymbol g_j=\boldsymbol 0,\sigma^2)+(1-\pi_0)\int_{\boldsymbol g_j\neq\boldsymbol 0}\pi(\boldsymbol g_j)f(\boldsymbol y\mid W,\boldsymbol g,\sigma^2)\,d\boldsymbol g_j}
\]
\[
\Rightarrow\;p(\boldsymbol g_j=\boldsymbol 0\mid Else)=\frac{\pi_0 f(\boldsymbol y\mid W,\boldsymbol g_{(-j)},\boldsymbol g_j=\boldsymbol 0,\sigma^2)}{\pi_0 f(\boldsymbol y\mid W,\boldsymbol g_{(-j)},\boldsymbol g_j=\boldsymbol 0,\sigma^2)+(1-\pi_0)\int_{\boldsymbol g_j\neq\boldsymbol 0}\pi(\boldsymbol g_j)f(\boldsymbol y\mid W,\boldsymbol g,\sigma^2)\,d\boldsymbol g_j}.
\]
Let $m(\boldsymbol y\mid W,\boldsymbol g_{(-j)},\sigma^2)=\int_{\boldsymbol g_j\neq\boldsymbol 0}\pi(\boldsymbol g_j)f(\boldsymbol y\mid W,\boldsymbol g,\sigma^2)\,d\boldsymbol g_j$; then
\[
\pi(\boldsymbol g_j\mid Else)=\frac{\pi_0 f(\boldsymbol y\mid W,\boldsymbol g,\sigma^2)I_{\{\boldsymbol g_j=\boldsymbol 0\}}+(1-\pi_0)\pi(\boldsymbol g_j)f(\boldsymbol y\mid W,\boldsymbol g,\sigma^2)I_{\{\boldsymbol g_j\neq\boldsymbol 0\}}}{\pi_0 f(\boldsymbol y\mid W,\boldsymbol g_{(-j)},\boldsymbol g_j=\boldsymbol 0,\sigma^2)+(1-\pi_0)m(\boldsymbol y\mid W,\boldsymbol g_{(-j)},\sigma^2)}.
\]
Also notice that the pdf $\pi(\boldsymbol g_j\mid Else)$ can be written as
\[
p(\boldsymbol g_j=\boldsymbol 0\mid Else)I_{\{\boldsymbol g_j=\boldsymbol 0\}}+\big(1-p(\boldsymbol g_j=\boldsymbol 0\mid Else)\big)\frac{\pi(\boldsymbol g_j)f(\boldsymbol y\mid W,\boldsymbol g,\sigma^2)}{m(\boldsymbol y\mid W,\boldsymbol g_{(-j)},\sigma^2)}I_{\{\boldsymbol g_j\neq\boldsymbol 0\}}.
\]
This follows because
\[
p(\boldsymbol g_j=\boldsymbol 0\mid Else)I_{\{\boldsymbol g_j=\boldsymbol 0\}}+\big(1-p(\boldsymbol g_j=\boldsymbol 0\mid Else)\big)\frac{\pi(\boldsymbol g_j)f(\boldsymbol y\mid W,\boldsymbol g,\sigma^2)}{m(\boldsymbol y\mid W,\boldsymbol g_{(-j)},\sigma^2)}I_{\{\boldsymbol g_j\neq\boldsymbol 0\}}
\]
\[
=\frac{\pi_0 f(\boldsymbol y\mid W,\boldsymbol g,\sigma^2)I_{\{\boldsymbol g_j=\boldsymbol 0\}}}{\pi_0 f(\boldsymbol y\mid W,\boldsymbol g_{(-j)},\boldsymbol g_j=\boldsymbol 0,\sigma^2)+(1-\pi_0)m(\boldsymbol y\mid W,\boldsymbol g_{(-j)},\sigma^2)}
+\frac{(1-\pi_0)m(\boldsymbol y\mid W,\boldsymbol g_{(-j)},\sigma^2)}{\pi_0 f(\boldsymbol y\mid W,\boldsymbol g_{(-j)},\boldsymbol g_j=\boldsymbol 0,\sigma^2)+(1-\pi_0)m(\boldsymbol y\mid W,\boldsymbol g_{(-j)},\sigma^2)}\cdot\frac{\pi(\boldsymbol g_j)f(\boldsymbol y\mid W,\boldsymbol g,\sigma^2)}{m(\boldsymbol y\mid W,\boldsymbol g_{(-j)},\sigma^2)}I_{\{\boldsymbol g_j\neq\boldsymbol 0\}}
\]
\[
=\frac{\pi_0 f(\boldsymbol y\mid W,\boldsymbol g,\sigma^2)I_{\{\boldsymbol g_j=\boldsymbol 0\}}+(1-\pi_0)\pi(\boldsymbol g_j)f(\boldsymbol y\mid W,\boldsymbol g,\sigma^2)I_{\{\boldsymbol g_j\neq\boldsymbol 0\}}}{\pi_0 f(\boldsymbol y\mid W,\boldsymbol g_{(-j)},\boldsymbol g_j=\boldsymbol 0,\sigma^2)+(1-\pi_0)m(\boldsymbol y\mid W,\boldsymbol g_{(-j)},\sigma^2)}
=\pi(\boldsymbol g_j\mid Else).
\]
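As a concrete illustration of this two-component full conditional, the following is a minimal Python sketch of a per-marker Gibbs step computing the spike probability, using the closed form for $p(\boldsymbol g_j=\boldsymbol 0\mid Else)$ derived in this appendix (with $G_{Fj}=W_j'W_j+G_j^{-1}$, as printed). All data below are hypothetical toy values, not from the paper.

```python
import numpy as np

def spike_probability(Wj, resid, Gj, sigma2, pi0):
    """p(g_j = 0 | Else) for one marker under the spike and slab full
    conditional: pi0 / (pi0 + (1 - pi0) * (|G_Fj||G_j|)^(-1/2)
    * exp(r' G_Fj^{-1} r / (2 sigma2))), with G_Fj = Wj'Wj + Gj^{-1}
    and r = Wj'(y - W_(-j) g_(-j)), as in this appendix."""
    GFj = Wj.T @ Wj + np.linalg.inv(Gj)
    r = Wj.T @ resid                        # Wj'(y - W_(-j) g_(-j))
    quad = r @ np.linalg.solve(GFj, r)      # ||G_Fj^(-1/2) r||^2
    # work on the log scale for numerical stability
    log_slab = (np.log1p(-pi0)
                - 0.5 * (np.linalg.slogdet(GFj)[1] + np.linalg.slogdet(Gj)[1])
                + quad / (2.0 * sigma2))
    log_spike = np.log(pi0)
    return 1.0 / (1.0 + np.exp(log_slab - log_spike))

rng = np.random.default_rng(1)
S, n = 2, 20                     # hypothetical: 2 populations, 20 records
Wj = rng.normal(size=(n, S))     # design-matrix columns for marker j
Gj = np.eye(S)
resid = rng.normal(size=n)       # y - W_(-j) g_(-j): residual w.r.t. other markers
p0 = spike_probability(Wj, resid, Gj, sigma2=1.0, pi0=0.9)
print(0.0 < p0 < 1.0)            # True
```

In a Gibbs sampler one would set $\boldsymbol g_j=\boldsymbol 0$ with this probability and otherwise draw $\boldsymbol g_j$ from the slab; the larger the residual signal carried by marker $j$, the smaller the spike probability.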
Therefore, the remaining task is finding the explicit form of $p(\boldsymbol g_j=\boldsymbol 0\mid Else)$ under the model being considered. Doing some algebra, using the technique of completing the quadratic form and properties of the multivariate normal distribution, it can be shown that:
\[
\int_{\boldsymbol g_j\neq\boldsymbol 0}\pi(\boldsymbol g_j)f(\boldsymbol y\mid W,\boldsymbol g,\sigma^2)\,d\boldsymbol g_j=(2\pi\sigma^2)^{-n/2}|G_j|^{-1/2}\exp\!\left(-\frac{1}{2\sigma^2}\big\|\boldsymbol y-W_{(-j)}\boldsymbol g_{(-j)}\big\|_2^2\right)
\times|G_{Fj}|^{-1/2}\exp\!\left(\frac{1}{2\sigma^2}\big\|G_{Fj}^{-1/2}W_j'(\boldsymbol y-W_{(-j)}\boldsymbol g_{(-j)})\big\|_2^2\right),
\]
where $G_{Fj}:=W_j'W_j+G_j^{-1}$. Now, using the fact that
\[
f(\boldsymbol y\mid W,\boldsymbol g_{(-j)},\boldsymbol g_j=\boldsymbol 0,\sigma^2)=(2\pi\sigma^2)^{-n/2}\exp\!\left(-\frac{1}{2\sigma^2}\big\|\boldsymbol y-W_{(-j)}\boldsymbol g_{(-j)}\big\|_2^2\right),
\]
it follows that
\[
p(\boldsymbol g_j=\boldsymbol 0\mid Else)=\frac{\pi_0}{\pi_0+(1-\pi_0)\big(|G_{Fj}||G_j|\big)^{-1/2}\exp\!\left(\frac{1}{2\sigma^2}\big\|G_{Fj}^{-1/2}W_j'(\boldsymbol y-W_{(-j)}\boldsymbol g_{(-j)})\big\|_2^2\right)}.
\]
Thus, the full conditional distribution of $\boldsymbol g_j$ is a spike and slab distribution where the slab component is a $MVN\big(G_{Fj}^{-1}W_j'(\boldsymbol y-W_{(-j)}\boldsymbol g_{(-j)}),\,G_{Fj}^{-1}\big)$ and the spike is a point mass at $\boldsymbol 0$ in $\mathbb R^{\mathcal S}$.

Full conditionals for models with heteroscedastic residuals

In this case:
\[
f(\boldsymbol y\mid W,\boldsymbol g,R)\propto|V|^{-1/2}\exp\!\left(-\frac12(\boldsymbol y-W\boldsymbol g)'V^{-1}(\boldsymbol y-W\boldsymbol g)\right)
=\prod_{l=1}^{\mathcal S}(\sigma_l^2)^{-n_l/2}\exp\!\left(-\frac{1}{2\sigma_l^2}(\boldsymbol y_l-W_l\boldsymbol g_l)'(\boldsymbol y_l-W_l\boldsymbol g_l)\right).
\]
In addition,
\[
\pi(R)\propto\prod_{l=1}^{\mathcal S}(\sigma_l^2)^{-\left(\frac v2+1\right)}\exp\!\left(-\frac{\tau^2}{2\sigma_l^2}\right).
\]
In the following, only the full conditionals that change with respect to the homoscedastic models and those presented in part I are shown. Under a spike and slab prior for $\boldsymbol g$, the full conditionals that change with respect to the homoscedastic residuals model are:
\[
\pi(\boldsymbol g_j\mid Else)=p(\boldsymbol g_j=\boldsymbol 0\mid Else)I_{\{\boldsymbol g_j=\boldsymbol 0\}}+\big(1-p(\boldsymbol g_j=\boldsymbol 0\mid Else)\big)MVN\big(G_{VFj}^{-1}W_j'V^{-1}(\boldsymbol y-W_{(-j)}\boldsymbol g_{(-j)}),\,G_{VFj}^{-1}\big)I_{\{\boldsymbol g_j\neq\boldsymbol 0\}},
\]
where $G_{VFj}=W_j'V^{-1}W_j+(G^0)^{-1}$ and
\[
p(\boldsymbol g_j=\boldsymbol 0\mid Else)=\frac{\pi_0}{\pi_0+(1-\pi_0)\big(|G_{VFj}||G_j|\big)^{-1/2}\exp\!\left(\frac12\big\|G_{VFj}^{-1/2}W_j'V^{-1}(\boldsymbol y-W_{(-j)}\boldsymbol g_{(-j)})\big\|_2^2\right)}.
\]
The full conditionals $\pi(R\mid Else)$ and $\pi(W^N\mid Else)$ change with respect to the homoscedastic model, but they are the same as for the model with Gaussian prior presented in part I. For the model with spike and slab prior for $\boldsymbol g$ and heterogeneous marker effect covariance matrices, $\pi(R\mid Else)$ and $\pi(W\mid Else)$ also remain unchanged with respect to the model with Gaussian prior. Regarding the full conditional of marker additive effects, $\pi(\boldsymbol g_j\mid Else)$ is similar to the case of the model with homogeneous marker effect covariance matrices; the only difference is that in this model $G_{VFj}=W_j'V^{-1}W_j+G_j^{-1}$.

Algebraic simplification of $\pi(W^N\mid W^{\mathcal o},M_0)$

Here we recursively use the well-known property of the Gamma function $\Gamma(\alpha+1)=\alpha\Gamma(\alpha)$, as well as the definition of the Beta function.
\[
\pi(W^N\mid W^{\mathcal o},M_0)=\frac{2^{n_N^H}}{B(\alpha,\beta)^m}\prod_{j=1}^{m}\frac{\Gamma\big(n_N^{B_j}+\alpha\big)\Gamma\big(n_N^{A_j}+\beta\big)}{\Gamma(2f_{Nj}+\alpha+\beta)}\left\{\prod_{i'=f_{Nj}+1}^{n_{Nj}}\pi\big(w_{i'j}\mid w_{S_{i'}j},w_{D_{i'}j}\big)\right\}
=2^{n_N^H}\prod_{j=1}^{m}\frac{\Gamma\big(n_N^{B_j}+\alpha\big)\Gamma\big(n_N^{A_j}+\beta\big)}{B(\alpha,\beta)\Gamma(2f_{Nj}+\alpha+\beta)}\left\{\prod_{i'=f_{Nj}+1}^{n_{Nj}}\pi\big(w_{i'j}\mid w_{S_{i'}j},w_{D_{i'}j}\big)\right\};
\]
notice that
\[
\Gamma(2f_{Nj}+\alpha+\beta)=(2f_{Nj}-1+\alpha+\beta)\Gamma(2f_{Nj}-1+\alpha+\beta)=(2f_{Nj}-1+\alpha+\beta)(2f_{Nj}-2+\alpha+\beta)\Gamma(2f_{Nj}-2+\alpha+\beta)=\cdots=\Gamma(\alpha+\beta)\prod_{k=1}^{2f_{Nj}}(2f_{Nj}-k+\alpha+\beta)
\]
\[
\Rightarrow\;B(\alpha,\beta)\Gamma(2f_{Nj}+\alpha+\beta)=\frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}\Gamma(\alpha+\beta)\prod_{k=1}^{2f_{Nj}}(2f_{Nj}-k+\alpha+\beta)=\Gamma(\alpha)\Gamma(\beta)\prod_{k=1}^{2f_{Nj}}(2f_{Nj}-k+\alpha+\beta);\quad(A.1)
\]
similarly,
\[
\Gamma\big(n_N^{B_j}+\alpha\big)=\Gamma(\alpha)\prod_{k=1}^{n_N^{B_j}}\big(n_N^{B_j}-k+\alpha\big),\quad(A.2)
\qquad
\Gamma\big(n_N^{A_j}+\beta\big)=\Gamma(\beta)\prod_{k=1}^{n_N^{A_j}}\big(n_N^{A_j}-k+\beta\big);\quad(A.3)
\]
consequently, from $(A.1)$, $(A.2)$ and $(A.3)$ it follows that
\[
\frac{\Gamma\big(n_N^{B_j}+\alpha\big)\Gamma\big(n_N^{A_j}+\beta\big)}{B(\alpha,\beta)\Gamma(2f_{Nj}+\alpha+\beta)}=\frac{\prod_{k=1}^{n_N^{B_j}}\big(n_N^{B_j}-k+\alpha\big)\prod_{k=1}^{n_N^{A_j}}\big(n_N^{A_j}-k+\beta\big)}{\prod_{k=1}^{2f_{Nj}}\big(2f_{Nj}-k+\alpha+\beta\big)};
\]
plugging this expression into $\pi(W^N\mid W^{\mathcal o},M_0)$ it follows that:
\[
\pi(W^N\mid W^{\mathcal o},M_0)=2^{n_N^H}\prod_{j=1}^{m}\frac{\prod_{k=1}^{n_N^{B_j}}\big(n_N^{B_j}-k+\alpha\big)\prod_{k=1}^{n_N^{A_j}}\big(n_N^{A_j}-k+\beta\big)}{\prod_{k=1}^{2f_{Nj}}\big(2f_{Nj}-k+\alpha+\beta\big)}\left\{\prod_{i'=f_{Nj}+1}^{n_{Nj}}\pi\big(w_{i'j}\mid w_{S_{i'}j},w_{D_{i'}j}\big)\right\}.
\]

Appendix B: Details of the derivation of Bayes factors and fractional Bayes factors
For model $M_{1G}$ the Hessian matrix of the log-likelihood is:
\[
H_{1G}=\frac{1}{\sigma^2}\begin{pmatrix}-W'W&\dfrac{1}{\sigma^2}\big(W'W\boldsymbol g-W'\boldsymbol y\big)\\[4pt]\text{Sym}&\dfrac{1}{\sigma^2}\left(\dfrac n2-\dfrac{(\boldsymbol y-W\boldsymbol g)'(\boldsymbol y-W\boldsymbol g)}{\sigma^2}\right)\end{pmatrix};
\]
thus, the matrix $\Delta_h(\hat{\boldsymbol\theta}^*)$ is:
\[
\Delta_h(\hat{\boldsymbol\theta}^*)=\begin{pmatrix}\dfrac{W'W}{(n-r)S^2}&\dfrac{n}{\big((n-r)S^2\big)^2}\big(W'W(W'W)^{-1}W'\boldsymbol y-W'\boldsymbol y\big)\\[4pt]\text{Sym}&\dfrac{n^2}{2\big((n-r)S^2\big)^2}\end{pmatrix}
=\begin{pmatrix}\dfrac{W'W}{SSR}&0\\[4pt]0&\dfrac{n^2}{2\,SSR^2}\end{pmatrix};
\]
therefore:
\[
D_1:=\big|\Delta_h(\hat{\boldsymbol\theta}^*)\big|^{1/2}=\frac{n}{\sqrt2\,SSR^{(m\mathcal S+2)/2}}\,|W'W|^{1/2},
\]
\[
\pi(\hat{\boldsymbol\theta}^*)=\pi(\hat{\boldsymbol g}\mid G^0)\pi(\hat\sigma^2)=(2\pi)^{-m\mathcal S/2}|G^0|^{-m/2}\exp\!\left(-\frac12\hat{\boldsymbol g}'\big((G^0)^{-1}\otimes I\big)\hat{\boldsymbol g}\right)\frac{(\tau^2)^{v/2}}{\Gamma(v/2)2^{v/2}}(\hat\sigma^2)^{-\left(\frac v2+1\right)}\exp\!\left(\frac{-\tau^2}{2\hat\sigma^2}\right);
\]
thus
\[
\int_{\mathbb R^{m\mathcal S}}\int_{\mathbb R_+}\pi(\boldsymbol g\mid G^0)\pi(\sigma^2)e^{\ln f(\boldsymbol y\mid\boldsymbol g,\sigma^2,W)}\,d\sigma^2 d\boldsymbol g
\approx(2\pi)^{-(n+1)/2}\frac{1}{D_1}\left(\frac{SSR}{n}\right)^{-\frac{n+v+2}{2}}\exp\!\left(-\frac n2\right)\frac{(\tau^2)^{v/2}}{\Gamma(v/2)2^{v/2}}\exp\!\left(-\frac{n\tau^2}{2SSR}\right)n^{-(m\mathcal S+1)/2}|G^0|^{-m/2}\exp\!\left(-\frac12\hat{\boldsymbol g}'\big((G^0)^{-1}\otimes I\big)\hat{\boldsymbol g}\right)
\]
\[
=(2\pi)^{-(n+1)/2}\frac{1}{D_1}\left(\frac{SSR}{n}\right)^{-\frac{n+v+2}{2}}\exp\!\left(-\frac n2\right)\frac{(\tau^2)^{v/2}}{\Gamma(v/2)2^{v/2}}\exp\!\left(-\frac{n\tau^2}{2SSR}\right)n^{-(m\mathcal S+1)/2}|G^0|^{-m/2}\exp\!\left(-\frac12\boldsymbol y'W(W'W)^{-1}\big((G^0)^{-1}\otimes I\big)(W'W)^{-1}W'\boldsymbol y\right)
\]
\[
=:\frac{K_1}{D_1}|G^0|^{-m/2}\exp\!\left(-\frac12\boldsymbol y'C(W,G^0)\boldsymbol y\right),
\]
where
\[
K_1=(2\pi)^{-(n+1)/2}\left(\frac{SSR}{n}\right)^{-\frac{n+v+2}{2}}\exp\!\left(-\frac n2\right)\frac{(\tau^2)^{v/2}}{\Gamma(v/2)2^{v/2}}\exp\!\left(-\frac{n\tau^2}{2SSR}\right)n^{-(m\mathcal S+1)/2},
\]
\[
SSR=\boldsymbol y'(I-H_W)\boldsymbol y=(n-r)S^2,\qquad C(W,G^0)=W(W'W)^{-1}\big(I\otimes(G^0)^{-1}\big)(W'W)^{-1}W'.
\]
Then:
\[
f(\boldsymbol y\mid W,M_{1G})\approx\frac{K_1}{D_1}\int_{\mathcal P^+_{\mathcal S}}\pi(G^0)|G^0|^{-m/2}\exp\!\left(-\frac12\boldsymbol y'C(W,G^0)\boldsymbol y\right)dG^0
\]
\[
=\frac{K_1|\boldsymbol\Sigma|^{a/2}}{D_1\,2^{a\mathcal S/2}\Gamma_{\mathcal S}\big(\frac a2\big)}\int_{\mathcal P^+_{\mathcal S}}|G^0|^{-\frac{a+m+\mathcal S+1}{2}}\exp\!\left(-\frac12\operatorname{tr}\Big(\big(\boldsymbol\Sigma+\sum_{j=1}^{m}\hat{\boldsymbol g}_j\hat{\boldsymbol g}_j'\big)(G^0)^{-1}\Big)\right)dG^0
=2^{m\mathcal S/2}\frac{K_1}{D_1}\frac{|\boldsymbol\Sigma|^{a/2}}{\Gamma_{\mathcal S}\big(\frac a2\big)}\frac{\Gamma_{\mathcal S}\big(\frac{a+m}{2}\big)}{\big|\boldsymbol\Sigma+\sum_{j=1}^{m}\hat{\boldsymbol g}_j\hat{\boldsymbol g}_j'\big|^{\frac{a+m}{2}}}.
\]
The second equality follows by noticing that $\boldsymbol y'C(W,G^0)\boldsymbol y=\hat{\boldsymbol g}'\big(I_m\otimes(G^0)^{-1}\big)\hat{\boldsymbol g}$.

The univariate version of the $IW(a,\boldsymbol\Sigma)$ prior posed over the covariance matrix of marker effects $G^0$ is an $IG\big(\frac a2,\frac b2\big)$ prior for the marker effect variance $\sigma_g^2$. Therefore, the expression for the null model is easily found by replacing $\mathcal S$ by 1 and $W$ by $W_0$, replacing the $IW(a,\boldsymbol\Sigma)$ density by an $IG\big(\frac a2,\frac b2\big)$ density, and integrating with respect to $\sigma_g^2$. The resulting expression is completely analogous, with matrix $\boldsymbol\Sigma$ replaced by the scalar $b$ and vectors $\hat{\boldsymbol g}_j$ replaced by scalars $\hat g_{0j}$. This relationship holds for other models and their null versions; hence, hereinafter the derivations for the null models are not presented. Thus,
\[
f(\boldsymbol y\mid W_0,M_{0G})\approx 2^{m/2}\frac{K_0}{D_0}\frac{b^{a/2}}{\Gamma\big(\frac a2\big)}\frac{\Gamma\big(\frac{a+m}{2}\big)}{\big(b+\sum_{j=1}^{m}\hat g_{0j}^2\big)^{\frac{a+m}{2}}},
\]
where $K_0$ is $K_1$ with $\mathcal S=1$, $SSR$ replaced by $SSR_0=\boldsymbol y'(I-H_{W_0})\boldsymbol y$, $H_{W_0}=W_0(W_0'W_0)^{-1}W_0'$, and $S^2$ replaced by $S_0^2=\frac{\|\boldsymbol y-W_0\hat{\boldsymbol g}_0\|_2^2}{n-r_0}$, $r_0=rank(W_0'W_0)=m$. Using these results it follows that:
\[
BF_{10GW}\approx\frac{K_1D_0}{K_0D_1}\left(\frac{\Gamma_{\mathcal S}\big(\frac{a+m}{2}\big)}{\Gamma_{\mathcal S}\big(\frac a2\big)}\Big/\frac{\Gamma\big(\frac{a+m}{2}\big)}{\Gamma\big(\frac a2\big)}\right)\left(\frac{|\boldsymbol\Sigma|^{a/2}}{\big|\boldsymbol\Sigma+\sum_{j=1}^m\hat{\boldsymbol g}_j\hat{\boldsymbol g}_j'\big|^{\frac{a+m}{2}}}\Big/\frac{b^{a/2}}{\big(b+\sum_{j=1}^m\hat g_{0j}^2\big)^{\frac{a+m}{2}}}\right)\frac{2^{m\mathcal S/2}}{2^{m/2}}
\]
\[
=\left(\frac{|\boldsymbol\Sigma|}{b}\right)^{a/2}\left(\frac{\big|\boldsymbol\Sigma+\sum_{j=1}^m\hat{\boldsymbol g}_j\hat{\boldsymbol g}_j'\big|}{b+\sum_{j=1}^m\hat g_{0j}^2}\right)^{-\frac{a+m}{2}}\left(\frac{SSR}{SSR_0}\right)^{-\frac{n+v+2}{2}}\frac{SSR^{(m\mathcal S+2)/2}}{SSR_0^{(m+2)/2}}\left(\frac{|W_0'W_0|}{|W'W|}\right)^{1/2}
\times\exp\!\left(\frac{-n\tau^2}{2}\Big(\frac{1}{SSR}-\frac{1}{SSR_0}\Big)\right)\left(\frac 2n\right)^{\frac{m(\mathcal S-1)}{2}}\left(\prod_{l=2}^{\mathcal S}\frac{\Gamma\big(\frac{a+m+1-l}{2}\big)}{\Gamma\big(\frac{a+1-l}{2}\big)}\right).
\]

The Hessian matrix for model $M^*_{1G}$ does not change with respect to model $M_{1G}$ because the likelihood remains the same; thus:
\[
\int_{\mathbb R^{m\mathcal S}}\int_{\mathbb R_+}\pi(\boldsymbol g\mid G)\pi(\sigma^2)e^{\ln f(\boldsymbol y\mid\boldsymbol g,\sigma^2,W)}\,d\sigma^2 d\boldsymbol g
\approx(2\pi)^{-(n+1)/2}\frac{1}{D_1}\left(\frac{SSR}{n}\right)^{-\frac{n+v+2}{2}}e^{-n/2}\frac{(\tau^2)^{v/2}}{\Gamma(v/2)2^{v/2}}\exp\!\left(-\frac{n\tau^2}{2SSR}\right)n^{-(m\mathcal S+1)/2}\prod_{j=1}^{m}|G_j|^{-1/2}\exp\!\left(-\frac12\hat{\boldsymbol g}_j'G_j^{-1}\hat{\boldsymbol g}_j\right).
\]
Using this it follows that:
\[
\pi(\boldsymbol y\mid W,M^*_{1G})\approx\frac{K_1|\boldsymbol\Sigma|^{am/2}}{D_1\,2^{am\mathcal S/2}\big(\Gamma_{\mathcal S}(\frac a2)\big)^m}\int_{\mathcal P^+_{\mathcal S}\times\cdots\times\mathcal P^+_{\mathcal S}}\prod_{j=1}^{m}|G_j|^{-\frac{a+\mathcal S+2}{2}}\exp\!\left(-\frac12\operatorname{tr}\big(\boldsymbol\Sigma G_j^{-1}+\hat{\boldsymbol g}_j\hat{\boldsymbol g}_j'G_j^{-1}\big)\right)dG
\]
\[
=\frac{K_1|\boldsymbol\Sigma|^{am/2}}{D_1\,2^{am\mathcal S/2}\big(\Gamma_{\mathcal S}(\frac a2)\big)^m}\prod_{j=1}^{m}\int_{\mathcal P^+_{\mathcal S}}|G_j|^{-\frac{a+\mathcal S+2}{2}}\exp\!\left(-\frac12\operatorname{tr}\big((\boldsymbol\Sigma+\hat{\boldsymbol g}_j\hat{\boldsymbol g}_j')G_j^{-1}\big)\right)dG_j
\]
\[
=\frac{K_1|\boldsymbol\Sigma|^{am/2}}{D_1\,2^{am\mathcal S/2}\big(\Gamma_{\mathcal S}(\frac a2)\big)^m}\prod_{j=1}^{m}\frac{2^{(a+1)\mathcal S/2}\Gamma_{\mathcal S}\big(\frac{a+1}{2}\big)}{\big|\boldsymbol\Sigma+\hat{\boldsymbol g}_j\hat{\boldsymbol g}_j'\big|^{\frac{a+1}{2}}}
=\frac{K_1}{D_1}2^{m\mathcal S/2}\left(\frac{\Gamma_{\mathcal S}\big(\frac{a+1}{2}\big)}{\Gamma_{\mathcal S}\big(\frac a2\big)}\right)^{m}\frac{|\boldsymbol\Sigma|^{am/2}}{\prod_{j=1}^{m}\big|\boldsymbol\Sigma+\hat{\boldsymbol g}_j\hat{\boldsymbol g}_j'\big|^{\frac{a+1}{2}}}.
\]
The following are the details of the computation of the matrix $\Delta_h(\hat{\boldsymbol\theta}^*)$ and its determinant for model $M^*_{1GH}$. Using matrix differentiation it follows that:
\[
\frac{\partial^2 l}{\partial\boldsymbol g\,\partial\boldsymbol g'}=-W'V^{-1}W,
\]
\[
\frac{\partial l}{\partial\sigma_l^2}=-\frac12\operatorname{tr}\!\left(V^{-1}\frac{\partial V}{\partial\sigma_l^2}\right)+\frac12(\boldsymbol y-W\boldsymbol g)'V^{-1}\frac{\partial V}{\partial\sigma_l^2}V^{-1}(\boldsymbol y-W\boldsymbol g)
\]
\[
=-\frac12\operatorname{tr}\!\left(V^{-1}\begin{pmatrix}0&&\\&I_{n_l}&\\&&0\end{pmatrix}\right)+\frac12(\boldsymbol y-W\boldsymbol g)'V^{-1}\begin{pmatrix}0&&\\&I_{n_l}&\\&&0\end{pmatrix}V^{-1}(\boldsymbol y-W\boldsymbol g)
\]
\[
=-\frac12\operatorname{tr}\!\begin{pmatrix}0&&\\&\sigma_l^{-2}I_{n_l}&\\&&0\end{pmatrix}+\frac12(\boldsymbol y-W\boldsymbol g)'\begin{pmatrix}0&&\\&\sigma_l^{-4}I_{n_l}&\\&&0\end{pmatrix}(\boldsymbol y-W\boldsymbol g)
=-\frac12\big(n_l\sigma_l^{-2}-\sigma_l^{-4}(\boldsymbol y_l-W_l\boldsymbol g_l)'(\boldsymbol y_l-W_l\boldsymbol g_l)\big),
\]
\[
\frac{\partial^2 l}{\partial(\sigma_l^2)^2}=\frac{n_l}{2\sigma_l^4}-\frac{(\boldsymbol y_l-W_l\boldsymbol g_l)'(\boldsymbol y_l-W_l\boldsymbol g_l)}{\sigma_l^6},\qquad\frac{\partial^2 l}{\partial\sigma_l^2\,\partial\sigma_{l'}^2}=0\quad(l\neq l'),
\]
\[
\frac{\partial^2 l}{\partial\sigma_l^2\,\partial\boldsymbol g}=-\left(W'V^{-1}\frac{\partial V}{\partial\sigma_l^2}V^{-1}\boldsymbol y-W'V^{-1}\frac{\partial V}{\partial\sigma_l^2}V^{-1}W\boldsymbol g\right)
=-\left(W'\begin{pmatrix}0&&\\&\sigma_l^{-4}I_{n_l}&\\&&0\end{pmatrix}\boldsymbol y-W'\begin{pmatrix}0&&\\&\sigma_l^{-4}I_{n_l}&\\&&0\end{pmatrix}W\boldsymbol g\right)
=\sigma_l^{-4}\big(W_l'W_l\boldsymbol g_l-W_l'\boldsymbol y_l\big).
\]
Here $\hat{\boldsymbol\theta}^*=(\hat{\boldsymbol g},\hat V)=(\hat{\boldsymbol g},\hat\sigma_1^2I_{n_1},\ldots,\hat\sigma_{\mathcal S}^2I_{n_{\mathcal S}})$, $\hat\sigma_l^2:=S_l^2=(\boldsymbol y_l-W_l\hat{\boldsymbol g}_l)'(\boldsymbol y_l-W_l\hat{\boldsymbol g}_l)/(n-r_l)$; thus
\[
\Delta_h(\hat{\boldsymbol\theta}^*)=\frac1n\begin{pmatrix}W'\hat V^{-1}W&\dfrac{n_1^2}{SSR_1^2}\big(W_1'W_1(W_1'W_1)^{-1}W_1'\boldsymbol y_1-W_1'\boldsymbol y_1\big)&\cdots&\dfrac{n_{\mathcal S}^2}{SSR_{\mathcal S}^2}\big(W_{\mathcal S}'W_{\mathcal S}(W_{\mathcal S}'W_{\mathcal S})^{-1}W_{\mathcal S}'\boldsymbol y_{\mathcal S}-W_{\mathcal S}'\boldsymbol y_{\mathcal S}\big)\\&\dfrac{n_1^3}{2SSR_1^2}&\cdots&0\\&&\ddots&\vdots\\\text{Sym}&&&\dfrac{n_{\mathcal S}^3}{2SSR_{\mathcal S}^2}\end{pmatrix}
=\frac1n\begin{pmatrix}W'\hat V^{-1}W&0&\cdots&0\\&\dfrac{n_1^3}{2SSR_1^2}&\cdots&0\\&&\ddots&\vdots\\\text{Sym}&&&\dfrac{n_{\mathcal S}^3}{2SSR_{\mathcal S}^2}\end{pmatrix};
\]
then:
\[
\big|\Delta_h(\hat{\boldsymbol\theta}^*)\big|=\left|\frac{W'\hat V^{-1}W}{n}\right|\prod_{l=1}^{\mathcal S}\frac{n_l^3}{2nSSR_l^2}
=\prod_{l=1}^{\mathcal S}\left|\frac{n_lW_l'W_l}{nSSR_l}\right|\frac{n_l^3}{2nSSR_l^2}
=\frac{1}{n^{\mathcal S(m+1)}}\prod_{l=1}^{\mathcal S}\frac{n_l^{m+3}}{2SSR_l^{m+2}}|W_l'W_l|
\]
\[
\therefore\;D_1^*:=\big|\Delta_h(\hat{\boldsymbol\theta}^*)\big|^{1/2}=\frac{1}{n^{\mathcal S(m+1)/2}2^{\mathcal S/2}}\prod_{l=1}^{\mathcal S}\frac{n_l^{(m+3)/2}}{SSR_l^{(m+2)/2}}|W_l'W_l|^{1/2}.
\]

Using this result, the Laplace approximation of the integral
\[
\int_{\mathbb R^{m\mathcal S}}\int_{\mathbb R_+}\cdots\int_{\mathbb R_+}f(\boldsymbol y\mid W,\boldsymbol g,\sigma_1^2,\ldots,\sigma_{\mathcal S}^2)\,\pi(\boldsymbol g\mid G)\,\pi(\sigma_1^2,\ldots,\sigma_{\mathcal S}^2)\,d\sigma_1^2\cdots d\sigma_{\mathcal S}^2\,d\boldsymbol g
\]
is:
\[
\frac{1}{D_1^*}(2\pi)^{-\frac{n-\mathcal S}{2}}\prod_{l=1}^{\mathcal S}\left\{\left(\frac{SSR_l}{n_l}\right)^{-\frac{n_l}{2}}\exp\!\left(-\frac{n_l}{2}\right)\right\}|G|^{-1/2}\exp\!\left(-\frac12\hat{\boldsymbol g}'G^{-1}\hat{\boldsymbol g}\right)n^{-\frac{\mathcal S(m+1)}{2}}\frac{(\tau^2)^{\mathcal Sv/2}}{\big(\Gamma(\frac v2)2^{v/2}\big)^{\mathcal S}}\prod_{l=1}^{\mathcal S}\exp\!\left(-\frac{n_l\tau^2}{2SSR_l}\right)\left(\frac{SSR_l}{n_l}\right)^{-\left(\frac v2+1\right)}
\]
\[
=\frac{1}{D_1^*}(2\pi)^{-\frac{n-\mathcal S}{2}}\frac{(\tau^2)^{\mathcal Sv/2}}{\big(\Gamma(\frac v2)2^{v/2}\big)^{\mathcal S}}n^{-\frac{\mathcal S(m+1)}{2}}\exp\!\left(-\frac n2\right)\prod_{l=1}^{\mathcal S}\exp\!\left(-\frac{n_l\tau^2}{2SSR_l}\right)\left(\frac{SSR_l}{n_l}\right)^{-\left(\frac{n_l+v}{2}+1\right)}\prod_{j=1}^{m}|G_j|^{-1/2}\exp\!\left(-\frac12\hat{\boldsymbol g}_j'G_j^{-1}\hat{\boldsymbol g}_j\right);
\]
consequently:
\[
\pi(\boldsymbol y\mid W,M^*_{1SSH})\approx\frac{1}{D_1^*}(2\pi)^{-\frac{n-\mathcal S}{2}}\frac{(\tau^2)^{\mathcal Sv/2}}{\big(\Gamma(\frac v2)2^{v/2}\big)^{\mathcal S}}n^{-\frac{\mathcal S(m+1)}{2}}\exp\!\left(-\frac n2\right)\prod_{l=1}^{\mathcal S}\exp\!\left(-\frac{n_l\tau^2}{2SSR_l}\right)\left(\frac{SSR_l}{n_l}\right)^{-\left(\frac{n_l+v}{2}+1\right)}
\times\frac{|\boldsymbol\Sigma|^{am/2}}{2^{am\mathcal S/2}\big(\Gamma_{\mathcal S}(\frac a2)\big)^m}\prod_{j=1}^{m}\int_{\mathcal P^+_{\mathcal S}}|G_j|^{-\frac{a+\mathcal S+2}{2}}\exp\!\left(-\frac12\operatorname{tr}\big((\boldsymbol\Sigma+\hat{\boldsymbol g}_j\hat{\boldsymbol g}_j')G_j^{-1}\big)\right)dG_j
\]
\[
=\frac{K_1^*}{D_1^*}2^{m\mathcal S/2}\left(\frac{\Gamma_{\mathcal S}\big(\frac{a+1}{2}\big)}{\Gamma_{\mathcal S}\big(\frac a2\big)}\right)^{m}\frac{|\boldsymbol\Sigma|^{am/2}}{\prod_{j=1}^{m}\big|\boldsymbol\Sigma+\hat{\boldsymbol g}_j\hat{\boldsymbol g}_j'\big|^{\frac{a+1}{2}}},
\]
where
\[
K_1^*=(2\pi)^{-\frac{n-\mathcal S}{2}}\frac{(\tau^2)^{\mathcal Sv/2}}{\big(\Gamma(\frac v2)2^{v/2}\big)^{\mathcal S}}n^{-\frac{\mathcal S(m+1)}{2}}\exp\!\left(-\frac n2\right)\prod_{l=1}^{\mathcal S}\exp\!\left(-\frac{n_l\tau^2}{2SSR_l}\right)\left(\frac{SSR_l}{n_l}\right)^{-\left(\frac{n_l+v}{2}+1\right)}.
\]

The null model here is actually not a single model but $\mathcal S$ independent models, each fitting a subpopulation. Thus, the predicted vector of allelic effects is formed by putting together the vectors $\hat{\boldsymbol g}_1,\ldots,\hat{\boldsymbol g}_{\mathcal S}$ obtained from each individual analysis. Of course, in this situation heterogeneous residual variances are assumed, because if the analyses for the different subpopulations are independent, imposing the same residual variance on all subpopulations does not seem to be the best approach. Notice that for each subpopulation the $\boldsymbol y_l$ component of the likelihood has the same form as the $\boldsymbol y$ component of the likelihood of any null model with homogeneous marker effects and residual variances, but here we are considering a subvector of $\boldsymbol y$ containing phenotypes from subpopulation $l$ and the appropriate rows of $W_0$. Then, the approximation of $\pi(\boldsymbol y\mid W,M^*_{0SSH})$ is computed as the product of the approximations of $\pi(\boldsymbol y_l\mid W_l,M^*_{0SSH})$, $l=1,2,\ldots,\mathcal S$, which, after some simplifications, yields the expression for $BF^*_{10GWH}$ presented in section 2.3.1.

Regarding fractional Bayes factors, as mentioned in the paper, $FBF_{10GW}=FBF^*_{10GW}=FBF_{10SSW}=FBF^*_{10SSW}$; here we present some details of the derivation of $FBF_{10GW}$, that is, using the Gaussian prior with homogeneous marker effect covariance matrices. In the other cases the procedure is analogous; the key step is the cancellation of terms coming from the priors. Let
\[
I_1:=\int_{\mathbb R^{m\mathcal S}}\int_{\mathbb R_+}\big(f_1(\boldsymbol y\mid\boldsymbol g,\sigma^2,W)\big)^{c}\,\pi(\boldsymbol g\mid G^0)\pi(\sigma^2)\,d\sigma^2 d\boldsymbol g,
\qquad
I_1\approx\frac{\tilde K_1}{\tilde D_1}|G^0|^{-m/2}\exp\!\left(\frac{-1}{2}\hat{\boldsymbol g}'\big((G^0)^{-1}\otimes I\big)\hat{\boldsymbol g}\right),
\]
where
\[
\tilde K_1=(2\pi)^{-\frac{nc+1}{2}}\left(\frac{SSR}{n}\right)^{-\frac{nc+v+2}{2}}\exp\!\left(-\frac{nc}{2}\right)\frac{(\tau^2)^{v/2}}{\Gamma(\frac v2)2^{v/2}}\exp\!\left(-\frac{n\tau^2}{2SSR}\right)n^{-\frac{m\mathcal S+1}{2}},
\qquad
\tilde D_1=\frac{c^{\frac{m\mathcal S+1}{2}}n}{\sqrt2\,SSR^{(m\mathcal S+2)/2}}|W'W|^{1/2}=c^{\frac{m\mathcal S+1}{2}}D_1.
\]
Then
\[
f_c(\boldsymbol y\mid W,M_{1G})\approx 2^{m\mathcal S/2}\frac{\tilde K_1}{c^{\frac{m\mathcal S+1}{2}}D_1}\frac{|\boldsymbol\Sigma|^{a/2}}{\Gamma_{\mathcal S}\big(\frac a2\big)}\frac{\Gamma_{\mathcal S}\big(\frac{a+m}{2}\big)}{\big|\boldsymbol\Sigma+\sum_{j=1}^m\hat{\boldsymbol g}_j\hat{\boldsymbol g}_j'\big|^{\frac{a+m}{2}}}.
\]
As mentioned in the derivation of the Bayes factors, $f_c(\boldsymbol y\mid W_0,M_{0G})$ is easily obtained from the previous expression. Let $\overline{BF}_{10GW}$ represent the approximation of $BF_{10GW}$; then
\[
FBF_{10GW}\approx\overline{BF}_{10GW}\,\frac{\tilde K_0\,c^{\frac{m\mathcal S+1}{2}}D_1}{\tilde K_1\,c^{\frac{m+1}{2}}D_0}\left(\frac{b+\sum_{j=1}^m\hat g_{0j}^2}{\big|\boldsymbol\Sigma+\sum_{j=1}^m\hat{\boldsymbol g}_j\hat{\boldsymbol g}_j'\big|}\right)^{-\frac{a+m}{2}}\times2^{\frac{m(1-\mathcal S)}{2}}\prod_{l=2}^{\mathcal S}\frac{\Gamma\big(\frac{a+1-l}{2}\big)}{\Gamma\big(\frac{a+m+1-l}{2}\big)},
\]
with
\[
\frac{\tilde K_0}{\tilde K_1}=\left(\frac{SSR_0}{SSR}\right)^{-\frac{nc+v+2}{2}}n^{-\frac{m(1-\mathcal S)}{2}}\exp\!\left(-\frac{n\tau^2}{2}\Big(\frac{1}{SSR_0}-\frac{1}{SSR}\Big)\right);
\]
\[
\therefore\;FBF_{10GW}\approx c^{\frac{m(\mathcal S-1)}{2}}\frac{SSR^{(m\mathcal S+2)/2}}{SSR_0^{(m+2)/2}}\left(\frac{SSR_0}{SSR}\right)^{-\frac n2(c-1)}
=c^{\frac{m(\mathcal S-1)}{2}}\frac{SSR^{(m\mathcal S+2)/2}}{SSR_0^{(m+2)/2}}\left(\frac{SSR}{SSR_0}\right)^{\frac n2(c-1)}.
\]

Now we present some details of the derivation of $FBF^*_{10WH}$:
\[
f_c(\boldsymbol y\mid W,M^*_{1GH})\approx\frac{\tilde K_1^*}{\tilde D_1^*}2^{m\mathcal S/2}\left(\frac{\Gamma_{\mathcal S}\big(\frac{a+1}{2}\big)}{\Gamma_{\mathcal S}\big(\frac a2\big)}\right)^{m}\frac{|\boldsymbol\Sigma|^{am/2}}{\prod_{j=1}^m\big|\boldsymbol\Sigma+\hat{\boldsymbol g}_j\hat{\boldsymbol g}_j'\big|^{\frac{a+1}{2}}},
\]
\[
\tilde K_1^*=(2\pi)^{-\frac{nc-\mathcal S}{2}}n^{-\frac{\mathcal S(m+1)}{2}}\exp\!\left(-\frac{nc}{2}\right)\frac{(\tau^2)^{\mathcal Sv/2}}{\big(\Gamma(\frac v2)2^{v/2}\big)^{\mathcal S}}\prod_{l=1}^{\mathcal S}\exp\!\left(-\frac{n_l\tau^2}{2SSR_l}\right)\left(\frac{SSR_l}{n_l}\right)^{-\frac{n_lc+v+2}{2}},
\qquad
\tilde D_1^*=c^{\frac{\mathcal S(m+1)}{2}}D_1^*.
\]
After some simplifications it follows that
\[
FBF^*_{10WH}\approx\frac{\tilde K_0^*\tilde D_1^*}{\tilde K_1^*\tilde D_0^*}=\frac{\tilde K_0^*D_1^*}{\tilde K_1^*D_0^*}
\]
\[
=\frac{\tilde K_0^*}{\tilde K_1^*}\,\frac{D_1^*}{D_0^*}=\frac{n^{\mathcal S(m+1)/2}}{\prod_{l=1}^{\mathcal S}n_l^{(m+1)/2}}\cdot\frac{\prod_{l=1}^{\mathcal S}n_l^{(m+1)/2}}{n^{\mathcal S(m+1)/2}}=1,
\]
since $\tilde K_0^*/\tilde K_1^*=n^{\mathcal S(m+1)/2}\big/\prod_{l=1}^{\mathcal S}n_l^{(m+1)/2}=\big(D_1^*/D_0^*\big)^{-1}$;
\[
\therefore\;FBF^*_{10WH}\approx1.
\]
Appendix C: Comments on the use of $\overline{FBF}_{10W}$ in the non-full rank case under the priors used in this study
Notice that $\overline{FBF}_{10W}$ is an approximation of $FBF_{10W}$ for a full rank linear regression model. Here, $\overline{FBF}_{10W}$ is seen as a limiting form of the approximate fractional Bayes factor of a function referred to as the "augmented likelihood", and we present an outline of how to justify its convergence to the true fractional Bayes factor when the model is not of full rank. Take into account that what follows is not rigorous enough to fully justify this approach.

Consider the "augmented likelihood":
\[
L_a(\boldsymbol g,\sigma^2,W,\rho\mid\boldsymbol y):=(2\pi)^{-n/2}(\sigma^2)^{-n/2}\exp\!\left(\frac{-1}{2\sigma^2}\big((\boldsymbol y-W\boldsymbol g)'(\boldsymbol y-W\boldsymbol g)+\rho\boldsymbol g'\boldsymbol g\big)\right),\quad\rho>0;
\]
notice that $\lim_{\rho\to0}L_a(\boldsymbol g,\sigma^2,W,\rho\mid\boldsymbol y)=f(\boldsymbol y\mid\boldsymbol g,\sigma^2,W)=:L(\boldsymbol g,\sigma^2,W\mid\boldsymbol y)$, the $\boldsymbol y$ component of the likelihood. Using this augmented likelihood instead of the proper likelihood, the integral that has to be solved is:
\[
I_\rho:=\int_{\mathcal P^+}\pi(G)\left(\int_{\mathbb R^{m\mathcal S}}\int_{\mathbb R_+}L_a(\boldsymbol g,\sigma^2,W,\rho\mid\boldsymbol y)\,\pi(\boldsymbol g\mid G^0)\pi(\sigma^2)\,d\sigma^2 d\boldsymbol g\right)dG;
\]
under the regularity condition that limits and integrals can be interchanged (satisfied in exponential families), it follows that:
\[
\lim_{\rho\to0}I_\rho=\int_{\mathcal P^+}\pi(G)\left(\int_{\mathbb R^{m\mathcal S}}\int_{\mathbb R_+}L(\boldsymbol g,\sigma^2,W\mid\boldsymbol y)\,\pi(\boldsymbol g\mid G^0)\pi(\sigma^2)\,d\sigma^2 d\boldsymbol g\right)dG=:I;
\]
thus, under the stated regularity condition, for small $\rho$, using $L_a(\boldsymbol g,\sigma^2,W,\rho)$ instead of the proper likelihood is a proxy to compute $I$. Then, the Laplace approximation of
\[
I_\rho^*:=\int_{\mathbb R^{m\mathcal S}}\int_{\mathbb R_+}L_a(\boldsymbol g,\sigma^2,W,\rho)\,\pi(\boldsymbol g\mid G)\pi(\sigma^2)\,d\sigma^2 d\boldsymbol g
\]
(recall that the Laplace approximation is used only for this inner integral) can be computed, because it implies inverting a matrix of the form $W'W+\rho I$, which is positive definite and therefore invertible with non-null determinant. This approximation is denoted $\bar I_\rho^*$. Subsequently, the same steps used before are performed in order to obtain a pseudo fractional Bayes factor:
\[
\overline{FBF}^{\,\rho}_{10W}=c^{\frac{m(\mathcal S-1)}{2}}\frac{SSR_\rho^{(m\mathcal S+2)/2}}{SSR_{\rho0}^{(m+2)/2}}\left(\frac{SSR_\rho}{SSR_{\rho0}}\right)^{\frac n2(c-1)},
\]
where $SSR_\rho=\boldsymbol y'\big(I-W(W'W+\rho I)^{-1}W'\big)\boldsymbol y$ and $SSR_{\rho0}=\boldsymbol y'\big(I-W_0(W_0'W_0+\rho I)^{-1}W_0'\big)\boldsymbol y$. We know that:
\[
\bar I_\rho^*\sim I_\rho^*\ \text{as}\ n\to\infty\quad\text{(by the Laplace approximation)},
\]
where $\bar I_\rho^*\sim I_\rho^*$ means $\bar I_\rho^*/I_\rho^*\to1$ as $n\to\infty$. Under the regularity condition mentioned above, this implies that
\[
\bar I_\rho\sim I_\rho\ \text{as}\ n\to\infty,\quad(C.1)
\]
where $\bar I_\rho=\int_{\mathcal P^+}\pi(G)\bar I_\rho^*\,dG$. We also know that
\[
I_\rho^*\sim I^*=\int_{\mathbb R^{m\mathcal S}}\int_{\mathbb R_+}L(\boldsymbol g,\sigma^2,W\mid\boldsymbol y)\,\pi(\boldsymbol g\mid G^0)\pi(\sigma^2)\,d\sigma^2 d\boldsymbol g\ \text{as}\ \rho\to0
\;\Longrightarrow\;I_\rho\sim I\ \text{as}\ \rho\to0;\quad(C.2)
\]
however, $(C.1)$ and $(C.2)$ do not necessarily imply that
\[
\bar I_\rho\sim I\ \text{as}\ (n,\rho)\to(\infty,0),\quad(C.3)
\]
because iterated limits are not always equal to multivariate (bivariate in this case) limits. Thus, it has to be shown that $\lim_{\rho\to0}\big(\lim_{n\to\infty}\bar I_\rho^*/I\big)=\lim_{(n,\rho)\to(\infty,0)}\bar I_\rho^*/I=1$. To prove this, the first step is to show that the bivariate limit exists and then that it is equal to the iterated limit. If this is shown, then $\lim_{(n,\rho)\to(\infty,0)}\bar I_\rho^*/I=\lim_{n\to\infty}\big(\lim_{\rho\to0}\bar I_\rho^*/I\big)=1$, and $\overline{FBF}_{10W}$, which clearly satisfies
\[
\overline{FBF}_{10W}=\lim_{\rho\to0}\overline{FBF}^{\,\rho}_{10W},
\]
is an approximation to the fractional Bayes factor in the non-full rank scenario.
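The key quantity in the limiting argument above, $SSR_\rho=\boldsymbol y'(I-W(W'W+\rho I)^{-1}W')\boldsymbol y$, can be checked numerically. The following is a minimal Python sketch (toy data, not from the paper) illustrating that, for a full rank $W$, $SSR_\rho$ converges to the ordinary residual sum of squares as $\rho\to0$; in the non-full rank case $SSR$ is not defined via $(W'W)^{-1}$, but $SSR_\rho$ still is.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 4                     # hypothetical toy dimensions
W = rng.normal(size=(n, p))      # full rank almost surely
y = rng.normal(size=n)

def ssr_rho(W, y, rho):
    """y'(I - W (W'W + rho I)^{-1} W') y; rho > 0 keeps W'W + rho I
    positive definite even when W'W itself is singular."""
    WtW = W.T @ W + rho * np.eye(W.shape[1])
    return float(y @ (y - W @ np.linalg.solve(WtW, W.T @ y)))

ssr = ssr_rho(W, y, 0.0)         # ordinary SSR (W'W invertible here)
gaps = [abs(ssr_rho(W, y, rho) - ssr) for rho in (1e-1, 1e-3, 1e-6)]
print(gaps[0] > gaps[1] > gaps[2])   # True: the gap shrinks as rho -> 0
```

The monotone shrinkage of the gap follows from the spectral decomposition of $W'W$: each eigencomponent contributes $\rho z_i^2/\big(d_i(d_i+\rho)\big)$, which decreases with $\rho$, consistent with $(C.2)$ above.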