Metafounders Are Related to Fst Fixation Indices and Reduce Bias in Single-Step Genomic Evaluations Carolina A
Total Page:16
File Type:pdf, Size:1020Kb
Metafounders are related to Fst fixation indices and reduce bias in single-step genomic evaluations Carolina A. Garcia-Baccino, Andres Legarra, Ole F. Christensen, Ignacy Misztal, Ivan Pocrnic, Zulma G. Vitezica, Rodolfo J. C. Cantet To cite this version: Carolina A. Garcia-Baccino, Andres Legarra, Ole F. Christensen, Ignacy Misztal, Ivan Pocrnic, et al.. Metafounders are related to Fst fixation indices and reduce bias in single-step genomic evaluations. Genetics Selection Evolution, BioMed Central, 2017, 49 (1), pp.34. 10.1186/s12711-017-0309-2. hal- 01487094 HAL Id: hal-01487094 https://hal.archives-ouvertes.fr/hal-01487094 Submitted on 10 Mar 2017 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Distributed under a Creative Commons Attribution| 4.0 International License Garcia‑Baccino et al. Genet Sel Evol (2017) 49:34 DOI 10.1186/s12711-017-0309-2 Genetics Selection Evolution RESEARCH ARTICLE Open Access Metafounders are related to Fst fixation indices and reduce bias in single‑step genomic evaluations Carolina A. Garcia‑Baccino1,2, Andres Legarra3* , Ole F. Christensen4, Ignacy Misztal5 , Ivan Pocrnic5 , Zulma G. Vitezica3 and Rodolfo J. C. Cantet1,2 Abstract Background: Metafounders are pseudo-individuals that encapsulate genetic heterozygosity and relationships within and across base pedigree populations, i.e. ancestral populations. This work addresses the estimation and use‑ fulness of metafounder relationships in single-step genomic best linear unbiased prediction (ssGBLUP). Results: We show that ancestral relationship parameters are proportional to standardized covariances of base allelic frequencies across populations, such as Fst fixation indexes. These covariances of base allelic frequencies can be estimated from marker genotypes of related recent individuals and pedigree. Simple methods for their estimation include naïve computation of allele frequencies from marker genotypes or a method of moments that equates aver‑ age pedigree-based and marker-based relationships. Complex methods include generalized least squares (best linear unbiased estimator (BLUE)) or maximum likelihood based on pedigree relationships. To our knowledge, methods to infer Fst coefficients from marker data have not been developed for related individuals. We derived a genomic relationship matrix, compatible with pedigree relationships, that is constructed as a cross-product of { 1,0,1} codes and that is equivalent (apart from scale factors) to an identity-by-state relationship matrix at genome-−wide markers. Using a simulation with a single population under selection in which only males and youngest animals are geno‑ typed, we observed that generalized least squares or maximum likelihood gave accurate and unbiased estimates of the ancestral relationship parameter (true value: 0.40) whereas the naïve method and the method of moments were biased (average estimates of 0.43 and 0.35). We also observed that genomic evaluation by ssGBLUP using metafound‑ ers was less biased in terms of estimates of genetic trend (bias of 0.01 instead of 0.12), resulted in less overdispersed (0.94 instead of 0.99) and as accurate (0.74) estimates of breeding values than ssGBLUP without metafounders and provided consistent estimates of heritability. Conclusions: Estimation of metafounder relationships can be achieved using BLUP-like methods with pedigree and markers. Inclusion of metafounder relationships reduces bias of genomic predictions with no loss in accuracy. Background genotyped populations, i.e. some individuals are geno- Metafounders are pseudo-individuals that describe rela- typed, using high-density genetic markers across the tionships within and across pedigree base populations. genome, others are not, and phenotypes may be recorded The concept of metafounders provides a coherent frame- in either of the two subsets. An integrated solution called work for the theory of genomic evaluation [1]. Genomic single-step genomic best linear unbiased prediction (ssG- evaluation in agricultural species often implies partially BLUP) has been proposed [2–4]. This solution uses the following integrated relationship matrix: A − A A−1A + A A−1GA−1A A A−1G *Correspondence: [email protected] H = 11 12 22 21 12 22 22 21 12 22 , 3 −1 GenPhySE, INRA, INPT, ENVT, Université de Toulouse, GA22 A21 G 31326 Castanet‑Tolosan, France Full list of author information is available at the end of the article © The Author(s) 2017. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/ publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Garcia‑Baccino et al. Genet Sel Evol (2017) 49:34 Page 2 of 14 with inverse: pseudo-individuals that encapsulate three ideas: (a) separate means for each base population [4, 12, 13], (b) −1 −1 00 H = A + −1 −1 , randomness of these means [5] and (c) propagation of 0G− A22 the randomness of these means to the progeny [11], where G is the genomic relationship matrix, A is the while accommodating several populations with complex pedigree-based relationship matrix, and matrices crosses, e.g. [14]. Legarra et al. [1] also generalized one A11, A12, A21, A22 are submatrices of A with labels 1 and relationship between founders (scalar γ) to several rela- 2 denoting non-genotyped and genotyped individuals, tionships between founders in the pedigree, i.e. ancestral respectively. relationships (matrix Ŵ), and suggested simple methods Because genotyped animals are often not a random to estimate them. Legarra et al. [1] showed that construc- Ŵ sample from the analyzed populations (they tend to be tion of A from Ŵ and a pedigree reduces to the use of younger or selected), it was quickly acknowledged that the tabular rules [15] for construction of relationships, a proper analysis requires specifying different means for and its inversion is achieved by inversion of Ŵ followed by genotyped and non-genotyped individuals for the trait Henderson’s rules [16]. We provide an example of matri- Ŵ under consideration. These different means can be con- ces A and Ŵ in “Appendix”. However, the performance of sidered as parameters of the model, which are either fixed their model has not been tested so far, either for estima- [4] or random [5, 6] effects. In the latter case, the random tion of ancestral relationships or for genomic evaluation. variables induce covariances between individuals, a situ- This work has two objectives. The first is to show that ation that is informally referred to as “compatibility” of the structure of the metafounder approach yields an genomic and pedigree relationships. In fact, compatibility alternative parameterization and method for estimation implies equality of the average breeding value of the base of ancestral relationships. By doing so, we found that population and of the genetic variance [7] across the dif- ancestral relationships are generalizations of Wright’s Fst ferent measures of relationships. Numerically, the prob- fixation index [17]. The second goal is to test, by simu- lem appears as follows. The formulae for matrixH and its lation, (1) methods to estimate ancestral relationship − −1 − −1 inverse contain (G A22) and (G A22 ) (assuming G parameters, (2) the quality of genomic predictions using is full rank), respectively. This suggests that ifG and A22 metafounders, and (3) the quality of variance component differ too much, biases may appear. estimation. For the second goal, the simulated popula- Genomic relationships are usually computed in one of tion is undergoing selection and with a complete partially two manners: using “cross-products” [8] or “corrected genotyped pedigree. identity-by-state (IBS)” [9]. Both depend critically on assumed allele frequencies at markers in the pedigree Methods base population [10]. Base allele frequencies are often Relationship between metafounders and allele frequencies unavailable. However, for most purposes, allele fre- in the pedigree base population quencies are not of interest per se and can be treated as Single population nuisance parameters that can be marginalized. Chris- Let M be a matrix of genotypes coded as gene con- tensen [11] achieved an algebraic integration of allele tent, i.e. {0,1,2} and′ the genomic relationship matrix frequencies, leading to a very simple covariance struc- G = (M − J)(M − J) /s, with J a matrix of 1s, with ref- ture with allele frequencies in genomic relationships erence alleles taken at random, so that the expected p fixed at 0.5 (e.g., using genotypes coded as −{ 1,0,1} in allele frequency is 0.5 for a random locus [11]. In the cross-product method) and a parameter called γ other words, the matrix Z