| INVESTIGATION
A Multiple-Trait Bayesian Lasso for Genome-Enabled Analysis and Prediction of Complex Traits
Daniel Gianola*,†,‡,§,1 and Rohan L. Fernando‡ *Department of Animal Sciences, and †Department of Dairy Science, University of Wisconsin-Madison, Wisconsin 53706, ‡Department of Animal Science, Iowa State University, Ames, Iowa 50011, and §Department of Plant Sciences, Technical University of Munich (TUM), TUM School of Life Sciences, Freising, 85354 Germany ORCID IDs: 0000-0001-8217-2348 (D.G.); 0000-0001-5821-099X (R.L.F.)
ABSTRACT A multiple-trait Bayesian LASSO (MBL) for genome-based analysis and prediction of quantitative traits is presented and applied to two real data sets. The data-generating model is a multivariate linear Bayesian regression on possibly a huge number of molecular markers, and with a Gaussian residual distribution posed. Each (one per marker) of the T 3 1 vectors of regression coefficients (T: number of traits) is assigned the same T2variate Laplace prior distribution, with a null mean vector and unknown scale matrix Σ. The multivariate prior reduces to that of the standard univariate Bayesian LASSO when T ¼ 1: The covariance matrix of the residual distribution is assigned a multivariate Jeffreys prior, and Σ is given an inverse-Wishart prior. The unknown quantities in the model are learned using a Markov chain Monte Carlo sampling scheme constructed using a scale-mixture of normal distributions representation. MBL is demonstrated in a bivariate context employing two publicly available data sets using a bivariate genomic best linear unbiased prediction model (GBLUP) for benchmarking results. The first data set is one where wheat grain yields in two different environments are treated as distinct traits. The second data set comes from genotyped Pinus trees, with each individual measured for two traits: rust bin and gall volume. In MBL, the bivariate marker effects are shrunk differentially, i.e., “short” vectors are more strongly shrunk toward the origin than in GBLUP; conversely, “long” vectors are shrunk less. A predictive comparison was carried out as well in wheat, where the comparators of MBL were bivariate GBLUP and bivariate Bayes Cp—a variable selection procedure. A training- testing layout was used, with 100 random reconstructions of training and testing sets. For the wheat data, all methods produced similar predictions. In Pinus, MBL gave better predictions that either a Bayesian bivariate GBLUP or the single trait Bayesian LASSO. MBL has been implemented in the Julia language package JWAS, and is now available for the scientific community to explore with different traits, species, and environments. It is well known that there is no universally best prediction machine, and MBL represents a new resource in the armamentarium for genome-enabled analysis and prediction of complex traits.
KEYWORDS Bayesian Lasso; complex traits; GWAS; multiple-traits; prediction; quantitative genetics; quantitative genomics
WO main paradigms have been employed for investi- Gianola et al. (2016) formulate a statistically orientated Tgating statistical associations between molecular critique. WGR was developed mostly in animal and plant markers and complex traits: marker-by-marker genome- breeding (e.g., Lande and Thompson 1990; Meuwissen wide association studies (GWAS) and whole-genome re- et al. 2001; Gianola et al. 2003) primarily for predicting gression approaches (WGR). GWAS is dominant in human future performance, but it has received some attention in genetics; Visscher et al. (2017) present a perspective and human genetics as well (e.g.,delosCamposet al. 2010; Yang et al. 2010; Lee et al. 2011; Makowsky et al. 2011; López de Maturana et al. 2014). de los Campos et al. Copyright © 2020 by the Genetics Society of America doi: https://doi.org/10.1534/genetics.119.302934 (2013), Gianola (2013) and Isik et al. (2017) reviewed Manuscript received July 5, 2019; accepted for publication December 20, 2019; an extensive collection of WGR approaches. Other studies published Early Online December 26, 2019. “ ” Available freely online through the author-supported open access option. notedthatWGRcanbeusedbothfor discovery of asso- Supplemental material available at figshare: https://doi.org/10.25386/genetics. ciations and for prediction (Moser et al. 2015; Goddard 10689380. 1Corresponding author: University of Wisconsin-Madison, 1675 Observatory Dr., et al. 2016; Fernando et al. 2017). Hence, WGR methodol- Madison, WI 53706. E-mail: [email protected] ogy is an active area of research.
Genetics, Vol. 214, 305–331 February 2020 305 Multiple-trait analysis has long been of great interest in observations. Tibshirani (1996) noted that the LASSO solu- plant and animal breeding, mainly from the point of view of tions can also be obtained by calculating the mode of the joint selection for many traits (Smith 1936; Hazel 1943; conditional posterior distribution of the regression coeffi- Walsh and Lynch 2018). Henderson and Quaas (1976) de- cients in a Bayesian model in which each coefficient is veloped multi-trait best linear unbiased prediction of breed- assigned the same conditional double exponential (DE) or ing values for all individuals and traits measured in a Laplace prior. Using a ridge regression reformulation of population of animals—a methodology that has gradually LASSO, it can be seen that its Bayesian version shrinks become routine in the field. For example, Gao et al. (2018) small-value regression coefficients very strongly toward zero, described an application of a nine-variate model to data rep- whereas large-effect variants are regularized to a much lesser resenting close to 7 million and 4 million Holstein and Nor- extent than in ridge regression (Tibshirani 1996; Gianola dic Red cattle, respectively; the nine traits were milk, fat, and 2013). Yuan and Lin (2006) and Yuan et al. (2007) consid- protein yields in each of the first three lactations of the cows. ered the problem of clustering regression coefficients into A multiple-trait analysis is also a natural choice in quests groups (factors), with the focus becoming factor selection, for understanding and dissecting genetic correlations be- as opposed to the predictor variable selection that takes place tween traits using molecular markers, e.g.,evaluating in LASSO. For instance, a cluster could consist of a group of whether pleiotropy or linkage disequilibrium are at the markers in tight physical linkage. These authors noted that, in root of between-trait associations (Gianola et al. 2015; some instances, grouping enhances prediction performance Cheng et al. 2018a). For instance, Galesloot et al. (2014) over ridge regression, while in others, it does not. Such find- compared six methods of multivariate GWAS via simula- ings are consistent with knowledge accumulated in close to tion and found that all delivered a higher power than sin- two decades of experience with genome-enabled prediction in gle-trait GWAS, even when genetic correlations were animal breeding: there is no universally best prediction ma- weak. Many single-trait WGR methods extend directly to the chine. A multiple-trait application of a LASSO penalty on re- multiple-trait domain, e.g., genomic best linear unbiased pre- gression coefficients was presented by Li et al. (2015). These diction (GBLUP; VanRaden 2007). Other procedures, such as authors assigned a multivariate Laplace distribution to the Bayesian mixture models, are more involved, but extensions model residuals, and a group-LASSO penalty (Yuan and Lin are available (Calus and Veerkamp 2011; Jia and Jannink 2006) to the regression coefficients. The procedure differs 2012; Cheng et al. 2018a). The mixture model of Cheng from Tibshirani’s LASSO in that the model selects vectors of et al. (2018a) is particularly interesting because it provides regressors (corresponding to regressions of a given marker insight into whether markers affect all, some, or none of the over traits) as opposed to single-trait predictor variables. traits addressed. For example, the proportion of markers in Park and Casella (2008) introduced a fully Bayesian LASSO each of the ð0; 0Þ; ð0; 1Þ; ð1; 0Þ and ð1; 1Þ categories, where (BL). Contrary to LASSO, BL produces a model where all re- ð0; 0Þ means “no effect,” and ð1; 1Þ denotes “effect” on each of gression coefficients are non-null (even if p . N); most regres- two disease traits in Pinus taeda was estimated by Cheng et al. sions are often tiny in value, except those associated with (2018a) using single nucleotide polymorphisms (SNPs). The covariates (markers) with strong effects. In short, LASSO pro- proportion of Markov chain Monte Carlo (MCMC) samples duces a sparse model, whereas BL yields an effectively sparse falling into the ð1; 1Þ class was ,3%, with 140 markers specification, similar to Bayesian mixture models such as Bayes appearing as candidates for further scrutiny of pleiotropy; B (Meuwissen et al. 2001). The first application of BL in quan- 97% of the SNP were in the ð0; 0Þ class, and 0.5% were in titative genetics was made by Yi and Xu (2008) in the context the ð0; 1Þ and ð1; 0Þ classes. It must be noted that Cheng of quantitative trait locus (QTL) mapping, with subsequent et al. (2018a) used Bayesian model averaging, so posterior applications reported in de los Campos et al. (2009), Legarra estimates of effects, and of their uncertainties, constitute av- et al. (2011), Li et al. (2011) and Lehermeier et al. (2013). erages over all possible configurations. The resulting “average It appearsthat no multiple-trait generalizationof the BL has yet model” is not truly sparse, as Bayesian mixture models always been reported. The present paper describes a multi-trait Bayesian assign some posterior probability to each of the possible con- LASSO (MBL) model based on adopting a multivariate Laplace figurations. An alternative to a mixture is to use a prior distri- distribution with unknown scale matrix as prior distribution bution that produces strong shrinkage toward the origin of for the markers or variants under scrutiny. The MBL is intro- “weak” vectors of marker effects; here, each marker has a duced and compared with a multiple-trait GBLUP (MTGBLUP) vector with dimension equal to the number of traits. using wheat and pine tree data sets. The section Multi-Trait The LASSO (least absolute shrinkage and selection oper- Regression Model describes MBL, including a MCMC sampling ator) method presented by Tibshirani (1996) is a single- algorithm. Subsequently, MBL is compared with MTGBLUP response method based on minimizing a linear regression using a wheat data set. Finally, bivariate MBL and bivariate residual sum of squares subject to a constraint based on an MTGBLUP are contrasted from a predictive perspective, show- L1 norm. It can produce a sparse model, i.e., if the linear ing a better performance of MBL over BLUP and over a single- regression model has p regression coefficients, the LASSO trait Bayesian LASSO specification, corroborating the usefulness yields a smaller model (i.e., model selection), but with a of multiple-trait analyses. The paper concludes with a general complexity that cannot exceed N; the number of discussion and technical details are presented in Appendices.
306 D. Gianola and R. L. Fernando b* b*; b* ; b* 3 Multi-Trait Regression Model trait t, respectively. Above, ¼ vecð 1 2 3Þ is a 3p 1 vector and e* ¼ vecðe*; e* ; e*Þ has dimension 3N 3 1. Note Assume there are T traits observed in each of N individuals, 1 2 3 that Varðe*Þ¼R 5I ¼ R. Putting y *ðm; b*Þ¼ðI 51 Þmþ and let b b be a T 3 1 vector of allelic substitution 0 3 N j ¼ f jtg 5 b*; ; ; ...; ; b ðI3 XÞ the probability model is effects at marker j ¼ 1 2 p with jt representing the ef- fect of marker j on trait t ðt ¼ 1; 2; ...; TÞ. The multi-trait 1 1 * m; b*; } 2 *2 * m; b* 9 regression model (assuming no nuisance location effects p y R0 N exp ðy y ð ÞÞ jR j2 2 other than a mean) for the T responses is 0 Xp 3 21 * 2 * m; b* : m b ; ; :::; ; ; ; ...; ; R ðy y ð ÞÞ (5) yi ¼ þ xij j þ ei i ¼ 1 2 N j ¼ 1 2 p (1) j¼1 We will work with either (1) or (4), depending on the context. 3 where yi is a T 1 vector of responses observed in individual ; m m Bayesian prior assumptions i ¼f tg is the vector of trait means, and xij is the genotype individual i possesses at marker locus j. The residual vector Parameters m and R : The vector m will be assigned a “flat” 3 0 ei ðT 1Þ is assumed to follow the Gaussian distribution improper prior, and Jeffreys noninformative prior (e.g., e jR Nð0; R Þ; where R is a T 3 T covariance matrix. i 0 0 0 Sorensen and Gianola 2002) will be adopted for R0 so that All ei vectors are assumed to be mutually independent and their joint prior density is identically distributed. If traits are sorted within individuals, the probability model 2 T þ 1 associated with Equation 1 can be represented as 2 pðm; R0Þ } jR0j : (6) ; ; ...; m; b ; b ; ...; b ; p y1 y2 yN 1 2 p R0 Multivariate laplace prior distribution for marker effects: " ! XN Xp 9 The same T-variate Laplace prior distribution with a null } 1 21 2m2 b mean vector will be assigned to each of the T 3 1 vectors b N exp yi xij j j 2 2 jR0j i¼1 j¼1 ð j ¼ 1; 2; ...; pÞ, assumed mutually independent, a priori. !# Xp Gómez et al. (1998) presented a multi-dimensional version 3 21 2 m 2 b of the power exponential family of distributions; one special R0 yi xij j j¼1 case is the multivariate Laplace distribution (MLAP). The density of the MLAP with a zero-mean vector used here is } 1 21 21 ; N exp tr R0 Se (2) T 2 G jR0j 2 T qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi b Σ 2 21 b9Σ21b ; pð jj Þ¼ 1 T exp j j 1 T where jΣj2 p2 Gð1 þ TÞ2ð þ Þ 2 ! ! j ¼ 1; 2; ...; p; XN Xp Xp 9 2 m 2 b 2m2 b (7) Se ¼ yi xij j yi xij j (3) i¼1 j¼1 j¼1 where Σ ¼ fΣtt9g is a T 3 T positive-definite scale matrix. The is a T 3 T matrix of sums of squares and products of the un- variance–covariance matrix of MLAP is observed regression residuals. Varðb jΣÞ¼4ðT þ 1ÞΣ ¼ B; (8) The regression model can be formulated in an equivalent j manner by sorting individuals within traits; hereafter, we will note that the absolute values of the elements of B; the inter- *; * * use T ¼ 3. Let y1 y2 and y3 be response vectors of order N trait variance–covariance of marker effects, are larger each observed for traits 1, 2, and 3, respectively. The repre- Σ b ; s2 " ; than those of .Hence, jt ð0 b;tÞ for j where sentation of the model is s2 Σ ; b;t ¼ 4ðT þ 1Þ tt is the appropriate diagonal element of B 2 3 2 32 3 2 32 3 2 3 likewise, sb;tt9¼ 4ðT þ 1ÞΣtt 9is the covariance of marker ef- y * 1 00 m X 00 b* e* 6 1 7 6 N 76 1 7 6 76 1 7 6 1 7 fects between traits t and t9; for all j: Putting T ¼ 1 in (7) 4 y* 5 ¼ 4 01 0 54 m 5 þ 4 0 X 0 54 b* 5 þ 4 e* 5 2 N 2 2 2 yields y* 001 m 00X b* * 3 N 3 3 e3 1 jbj pðb ΣÞ¼ pffiffiffiffiffiffi exp 2pffiffiffiffiffiffi : (9) * * j ¼ ðI351NÞm þ ðI35XÞb þ e ; (4) 2 4Σ 4Σ 3 9 ; 3 where 1N is an N 1 vector of 1 s X ¼ fxijg is an N p The preceding is thep densityffiffiffiffiffiffi of a DE distribution with b* 3 * 3 Σ b Σ matrix of marker genotypes, and t ðp 1Þ and et ðN 1Þ null mean, parameter 4 and variance Varð Þ¼8 .As are vectors of regression coefficients and of residuals for mentioned earlier, Tibshirani (1996) and Park and Casella
Multiple-Trait Bayesian Lasso 307 Σ 2; 2; ...; 2 (2008) used the DE distribution as conditional (given ) where D ¼ diagðv1 v2 vpÞ is a diagonal matrix. Hence, prior for regression coefficients in the BL—a member of the À Á “Bayesian Alphabet” (Gianola et al. 2009). Gianola et al. 1 2 2 p b* Σ; v2 } exp 2 b*9 Σ 15D 1 b* : (15) (2018) assigned the DE distribution to residuals of a linear 2 model for the purpose of attenuating outliers, and Li et al. (2015) used the MLAP distribution for the residuals in a Σ Σ “robust” linear regression model for QTL mapping. Scale matrix : The scale matrix of MLAP can be given a fi MLAP is therefore an interesting candidate prior for multi- xed value (becoming a hyper-parameter), or inferred, in trait marker effects in a multiple trait generalization of the which case a prior distribution is needed. Here, an in- V Bayesian LASSO (MBL). A zero-mean MLAP distribution has verse-Wishart ðIWÞ distribution with scale matrix b and nb degrees of freedom will be assigned as prior. The a sharp peak at the 0 coordinates, although, when T ¼ 1 MLAP reduces to a DE distribution, the marginal and condi- density is tional densities of MLAP are not DE. Gómez et al. (1998) T þ nb þ 1 showed that such densities are elliptically contoured, and, À Á 2 À Á 1 21 p Σ Vb; nb } jΣj 2 exp 2 tr Σ Vb : thus, not DE. Appendix A and Supplemental Material, Fig- 2 – ures S1 S3 in the Supplemental material give background (16) on MLAP. Gómez-Sánchez-Manzano et al. (2008) showed that Joint posterior and fully conditional distributions MLAP can be represented as a scaled mixture of normal dis- b Σ; 2 ; 2Σ ; 2 2 tributions under the hierarchy: (1) ½ j vj ¼NTð0 vj Þ The joint posterior distribution, including v ¼ fvj g from the 2 Tþ1; 1 ; ; ...; ; ; Σ 2 scale-mixture of normals representation of the prior distribu- and (2) vj Gamma 2 8 for j ¼ 1 2 p NT ð0 vj Þ denotes a T2variate normal distribution with null mean tion of b, was assumed to take the form Σ 2: 2 and covariance matrix vj The density of vj is À Á À Á * 2 * * * p m; b ; R0; Σ; v y ; H } p y m; b ; R0 pðR0jHÞ 2 À Á À Á (17) À Á À ÁTþ121 À v Á 3 b* Σ; 2 2 Σ ; h v2 } v2 2 exp 2 j : (10) p v p v pð jHÞ j j 8 where H denotes the hyper-parameters; recall that y* is Let the collection of all marker effects over traits be repre- the data vector sorted by individuals within trait. The 3 sented by the Tp 1 vector fully conditional distributions are presented below, 9 with ELSE used to denote all parameters that are b ¼ b9 b9 :::b9 : (11) 1 2 p kept fixed, together with H,inaspecificconditional If independent and identical MLAP prior distributions are distribution. assigned to each of the subvectors, the joint prior density of all marker effects, given Σ, can be represented as Parameters m and b* given ELSE: From (17,) and using representations (4) and (15), the fully conditional posterior Yp distribution of m and b* has density pðbjΣÞ¼ pðb jΣÞ j À Á À Á À Á j¼1 2 p m; b* ELSE } p y* m; b*; R0 p b* Σ; v Z Yp N À Á À Á b ; Σ 2 2 2; 1 2 ¼ NT j 0 vj h vj dvj (12) } exp 2 ðy*2y*ðm; b*ÞÞ9R 1ðy*2 y*ðm; b*ÞÞ j¼1 0 2 h i 9 b 2 2; 2 ::: 2 À Á and the joint density of and v ¼ v1 v2 vp is 1 2 2 3 exp 2 b*9 Σ 15D 1 b* : (18) À Á Yp À Á À Á 2 b; 2 Σ ; Σ 2 2 : p v ¼ NT 0 vj h vj (13) j¼1 The preceding is a multivariate normal density (e.g., Sorensen and Gianola 2002). The mean vector of the distri- When individuals are sorted within traits (e.g., T ¼ 3), note bution is that ½b* Σ; v2 is a Tp2dimensional normal distribution, with 2 2 9 21 null mean vector and covariance matrix m R 1N R 151 X ¼ 0 0 N b* 215 9 215 9 Σ215 21 02 3 1 2 3 R0 X 1N R0 X X þ D * "#À Á (19) b Σ Σ Σ 21 9 B6 1 7 C D 11 D 12 D 13 R 51 y* @4 b* 5 Σ; 2A 4 Σ Σ Σ 5 Σ5 ; 3 À 0 NÁ ; Var 2 j v ¼ D 21 D 22 D 23 ¼ D 21 9 * R 5X9 y* b DΣ31 DΣ32 DΣ32 0 3 (14) and the variance–covariance matrix is
308 D. Gianola and R. L. Fernando " # 2 2 9 21 Recall that m R 1N R 151 X Var jELSE ¼ 0 0 N b 21 21 21 21 R 5X91N R 5X9X þ Σ 5D À Á Yp À Á 0 0 2 2 2 p b* Σ; v ¼ N b 0; Σv ; (26) ¼ C 1: T j j j¼1 (20) so, from (17) A more explicit representation is presented in Appendix B for À Á : the case T ¼ 3 pðΣjELSEÞ } p b* Σ; v2 pðΣjHÞ Yp Σ21 Fully conditional distributions of partitions of the loca- } 1 21b9 b 1 exp j 2 j tion vector: For details, see Van Tassell and Van Vleck (1996) Σ 2 2 2 v j¼1 vj j and Sorensen and Gianola (2002). Since the joint posterior of the location parameters, given Σ; v2; and R ; is multivariate 0 T þ nb þ 1 normal, all conditionals and linear combinations thereof are 2 À Á 1 21 3 jΣj 2 exp 2 tr Σ Vb normal as well. In particular ðT ¼ 3Þ, 2 2 3 X3 X n 1 9 9 p þ T þ b þ 1 E m ELSE 419 rtt y* 2 Xbt 2 N rtt m 5; 2 ð tj Þ¼ tt N t t9 1 21 r N } jΣj 2 exp 2 tr Σ Sb ; t9¼1 t96¼t 2 i ¼ 1; 2; 3; (21) (27) and where Xp b b9 1 j j m ; ; ; : Sb ¼ þ Vb (28) Varð tjELSEÞ¼ tt t ¼ 1 2 3 (22) 2 r N j¼1 vj
Likewise is a T 3 T matrix. Hence the conditional posterior distribution of Σ is IW p T nb; Sb . The kernel of the density of Σ is 2 21 ð þ þ Þ À Á D 1 1 21 b* tt 9 often represented as expf2 tr½Σ ðp þ T þ nbÞSb g, where E t ELSE ¼ r X X þ Σ 2 0 tt Sb ¼ Sb=ðp þ T þ nbÞ: 2 3 X3 X Fully conditional distribution of v2: From (17), and 3 4 9 tt9 * 2 m 2 tt9 9 21Σtt9 b* 5; X r yj 1N j r X X þ D t9 using (13), t9¼1 t96¼t À Á À Á À Á p v2 ELSE } p b* Σ; v2 p v2 i ¼ 1; 2; 3; (23) Yp À Á À Á and } b ; Σ 2 2 NT j 0 vj h vj j¼1 À Á 21 21 b* tt 9 D ; ; ; : Var t ELSE ¼ r X X þ t ¼ 1 2 3 (24) " # Σ 2 2 tt Yp b9Σ 1b À ÁT 1 À Á 1 j þ 21 v } 2 j 2 2 2 j À ÁT exp 2 vj exp 2 2 2v 8 j¼1 vj j
" # Σ 4 Fully conditional distributions of R0 and : From (17), Yp À Á b9Σ21b v 21 j j þ using (2) and (6) } v2 2exp 2 4 : (29) j 2v2 j¼1 j N þ T þ 1 2 1 2 The preceding density is not in a recognizable form. Appen- pðR jELSEÞ } jR j 2 exp 2 tr R 1S ; 0 0 2 0 e dix C gives details of a Metropolis-Hastings algorithm tailored (25) for making draws from the distribution having density (29). A brief description of the procedure follows. so ½R jELSE is an IW distribution with N þ T degrees of 0 MCMC algorithm freedom and scale matrix Se.InIW; the kernel of the 1 21 density is often written as expf2 tr½R ðN þ TÞÞSe g; where Starting values for R0 and Σ can be obtained “externally” 2 0 Se ¼ Se=ðN þ TÞ: from some estimates of R0 and B (the T 3 T matrix of
Multiple-Trait Bayesian Lasso 309 Figure 1 Bivariate Bayesian LASSO: trace plot and posterior density of residual correlation.
variances and covariances of marker effects) calculated of marker effects. A vector of effects of a marker with a short with standard methods such as maximum likelihood. Re- Mahalanobis distance away from 0 is more strongly shrunk call that Σ ¼ B=½4ðT þ 1Þ : toward the origin (i.e., the mean of prior distribution) than 2 ; ; ...; Sample each vj ð j ¼ 1 2 pÞ using the following Metropolis- vectors containing strong effects on at least one trait. MLAP Hastings sampler: preserves the spirit of BL, producing “pseudoselection” of a 1; b 1 ; 1. At round t, draw y from Y IGð ¼ 2 ¼ 4Þ and eval- covariates: all markers stay in the model, but some are effec- uate y as proposal; IG stands for an inverse-gamma tively nullified. A marker with strong marginal and joint ef- distribution. fects on the traits under consideration could flag potentially 2. Draw U Uð0; 1Þ; with the probability of move being pleiotropic regions. minð1; RÞ, with R as in Appendix C. , ; ½tþ1 2½tþ1 = ½t 3. If U R set wj ¼ y and form vj ¼ 2 wj as a Missing records for some traits new state. Otherwise, set v2½tþ1 ¼ v2½t ; j ¼ 1; 2; ...; p: j j Often, not all traits are measured in all individuals, a situation that is more common in animal breeding than in plant In a “single-pass” sampler, use (19) and (20) for sampling breeding. A standard approach (“data augmentation”) treats the entire location vector jointly. Otherwise, adopt a missing phenotypes as unknowns in an expanded joint blocking strategy; for example draw m and b* using posterior distribution. As shown in Appendix F, a predictive (21), (22), (23) and (24). distribution can be used to produce an imputation of missing Sample R from IW N T; S and Σ from IW p T nb; Sb . 0 ð þ eÞ ð þ þ Þ data.
Remarks Alternative Formulation in TN Dimensions Appendix E shows that the degree of shrinkage of marker The MCMC sampler described above is based on a regression effects results from a joint action between Σ and the strength on markers formulation stemming from either (1) or (4). In a
310 D. Gianola and R. L. Fernando Figure 2 Bivariate Bayesian LASSO: trace plot and posterior density of correlation between marker effects.
“single-pass” sampler, Tð1 þ pÞ parameters must be drawn dairy cattle, where milk production of daughters of bulls in together; when p is very large, direct inversion is typically different countries are regarded as different traits and in unfeasible, so the scheme must be reformulated into a multi-environment situations in plant breeding; both in- “block-sampling” one, i.e., by drawing some of the location stances can be represented as special cases of a multiple-trait parameters jointly by conditioning on the other location mixed effects model. parameters, or by using a single-site sampler (Sorensen A publicly available Loblolly pine ðPinus taedaÞ data de- and Gianola 2002). Blocking or single-site sampling facili- scribed in Cheng et al. (2018a) was used to carry out a pre- tate computation at the expense of slowing down conver- dictive comparison between a Bayesian bivariate GBLUP with gence to the target distribution. Appendix D gives a scheme the bivariate Bayesian LASSO, as well as the latter vs. a single- in which Tð1 þ NÞ effects (trait means and bivariate geno- trait Bayesian LASSO. After edits, there were n ¼ 807 indi- mic breeding values) are inferred, and the Tp marker effects viduals with p ¼ 4828 SNP markers with measurements on are calculated indirectly, following ideas of Henderson rust bin scores and rust gall volume—two disease traits; see (1977) and adapted by Goddard (2009) to a genome-based Cheng et al. (2018a). Supplemental material available at fig- model. share: https://doi.org/10.25386/genetics.10689380.
Data availability Bivariate Analysis of Wheat Yield: MBL vs. GBLUP The authors state that all data necessary for confirming the conclusions presented in the article are represented fully Genomic BLUP and Bayesian BLUP within the article. The wheat yield data set in the R package The bivariate model was BGLR (Pérez and de los Campos 2014) was employed to contrast MBL with GBLUP and Bayes Cp. This wheat data m y1 1 1 g1 e1 ; ¼ m þ þ (30) set has been studied extensively, e.g., by Crossa et al. y2 1 2 g2 e2 (2010), Gianola et al. (2011, 2016), and Long et al. (2011).
There are n ¼ 599 wheat inbred lines, each genotyped with where y1ðy2Þ is the vector of grain yields in environment 1ð2Þ m m p ¼ 1279 DArT (Diversity Array Technology) markers, and of the 599 inbred lines; 1 and 2 are the trait means in the each planted in four environments. The DArT markers are two environments, and 1 is a 599 3 1 incidence vector of ; ; “ ” binary ð0 1Þ and denote the presence or absence of an allele ones; g1 and g2 are the additive genomic values of the lines, at a marker locus in a given line. Grain yields in environments and e1 and e2 are model residuals. In GBLUP (VanRaden 1 and 2 were employed to compare outcomes between anal- 2008) the genetic signals captured by markers are repre- b b ; yses based on bivariate GBLUP and the bivariate BL. In the sented as g1 ¼ X 1 and g2 ¼ X 2 where X is a bivariate model, yields in the two environments are treated 599 3 1279 centered and scaled matrix of genotype codes, — b b as distinct traits, conceptually an idea that dates back to and 1ð 2Þ contains the marker allele substitution effects on Falconer (1952). This type of setting arises in breeding of trait 1ð2Þ: The residual distribution was
Multiple-Trait Bayesian Lasso 311 Figure 3 Bivariate GBLUP vs. bivariate Bayesian LASSO (posterior mean) estimates of marker effects on wheat grain yield.