INVESTIGATION
The Impact of Non-additive Effects on the Genetic Correlation Between Populations
Pascal Duenk,*,1 Piter Bijma,* Mario P. L. Calus,* Yvonne C. J. Wientjes,* and Julius H. J. van der Werf† *Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH Wageningen, the Netherlands and †School of Environmental and Rural Science, University of New England, Armidale 2351 NSW, Australia ORCID IDs: 0000-0001-8403-968X (P.D.); 0000-0002-3213-704X (M.P.L.C.); 0000-0002-0681-2902 (Y.C.J.W.)
ABSTRACT Average effects of alleles can show considerable differences between populations. The magnitude of these differences can be measured by the additive genetic correlation between populations ($r_g$). This $r_g$ can be lower than one due to the presence of non-additive genetic effects together with differences in allele frequencies between populations. However, the relationship between the nature of non-additive effects, differences in allele frequencies, and the value of $r_g$ remains unclear, and was therefore the focus of this study. We simulated genotype data of two populations that have diverged under drift only, or under drift and selection, and we simulated traits where the genetic model and magnitude of non-additive effects were varied. Results showed that larger differences in allele frequencies and larger non-additive effects resulted in lower values of $r_g$. In addition, we found that with epistasis, $r_g$ decreases with an increase of the number of interactions per locus. For both dominance and epistasis, we found that, when non-additive effects became extremely large, $r_g$ had a lower bound that was determined by the type of inter-allelic interaction, and the difference in allele frequencies between populations. Given that dominance variance is usually small, our results show that it is unlikely that true $r_g$ values lower than 0.80 are due to dominance effects alone. With realistic levels of epistasis, $r_g$ dropped as low as 0.45. These results may contribute to the understanding of differences in genetic expression of complex traits between populations, and may help in explaining the inefficiency of genomic trait prediction across populations.

KEYWORDS dominance; epistasis; genomic prediction; average effects; population divergence
Copyright © 2020 Duenk et al.
doi: https://doi.org/10.1534/g3.119.400663
Manuscript received August 25, 2019; accepted for publication December 18, 2019; published Early Online December 19, 2019.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Supplemental material available at figshare: https://doi.org/10.25387/g3.10252856.
1Corresponding author: Animal Breeding and Genomics, Wageningen University & Research, Droevendaalsesteeg 1, P.O. Box 338, 6700 AH, Wageningen, the Netherlands. E-mail: [email protected]

Populations can differ considerably in the average effects of loci (i.e., $\alpha$, the difference between average effects of the two alleles; Falconer and Mackay 1996). For a given genotype (i.e., individual), differences in $\alpha$ between two populations lead to differences in the additive genetic values of that genotype, as expressed in both populations. The magnitude of these differences can be measured by the additive genetic correlation between populations ($r_g$), defined as the correlation between the additive genetic values of a genotype expressed in population 1 and population 2. In reality, a single genotype cannot belong to two populations at the same time. This means that a trait expressed in two populations can be seen as a pair of traits that cannot be measured on the same individual, analogous to, e.g., age at sexual maturity in males and females (Falconer and Mackay 1996). Although no phenotypic correlation exists between such pairs of traits, they can nevertheless be genetically correlated.

The $r_g$ can be lower than one due to genotype by environment interaction (GxE) (Falconer 1952), or due to non-additive genetic effects (GxG-interaction) together with differences in allele frequencies between populations (Fisher 1918). Knowledge of this correlation contributes to the understanding of the genetic architectures of polygenic traits (de Candia et al. 2013; Brown et al. 2016). Such understanding may lead to improved knowledge of genetics and can facilitate accurate prediction of traits, such as disease risk in humans and yield traits in crops (Forsberg et al. 2017). Furthermore, understanding the genetic mechanisms that determine $r_g$ may help in explaining the inefficiency of trait prediction across populations (Wientjes et al. 2015).

Following Falconer (1952), we can interpret a metric trait expressed in two populations as two different, genetically correlated traits.
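Volume 10 | February 2020 | 783

Table 1 Overview of scenarios with the genetic architecture of the trait, and their parameters for distributions of sampled dominance coefficients and epistatic coefficients

Parameters for distributions of non-additive effects, by magnitude (Small, Intermediate, Large):

| Scenario | Configuration | Small | Intermediate | Large |
|---|---|---|---|---|
| D | Dominance | $\mu_\delta = 0.2$, $\sigma_\delta = 0.30$ | $\mu_\delta = 0.2$, $\sigma_\delta = 0.70$ | $\mu_\delta = 0.2$, $\sigma_\delta = 1.50$ |
| EAA | A × A | $\mu_\gamma = 0.0$, $\sigma_\gamma = 0.16$ | $\mu_\gamma = 0.0$, $\sigma_\gamma = 0.33$ | $\mu_\gamma = 0.0$, $\sigma_\gamma = 0.68$ |
| EDD | D × D | $\mu_\gamma = 0.0$, $\sigma_\gamma = 0.16$ | $\mu_\gamma = 0.0$, $\sigma_\gamma = 0.33$ | $\mu_\gamma = 0.0$, $\sigma_\gamma = 0.68$ |
| EC | Complementary | $\mu_\gamma = 0.0$, $\sigma_\gamma = 0.16$ | $\mu_\gamma = 0.0$, $\sigma_\gamma = 0.33$ | $\mu_\gamma = 0.0$, $\sigma_\gamma = 0.68$ |
| EM | Multiplicative | $\mu_\gamma = 0.0$, $\sigma_\gamma = 0.16$ | $\mu_\gamma = 0.0$, $\sigma_\gamma = 0.33$ | $\mu_\gamma = 0.0$, $\sigma_\gamma = 0.68$ |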
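The epistatic standard deviations in Table 1 follow from the scaling rule given in the Methods, where the standard deviation of epistatic coefficients is the square root of (squared mean plus variance of dominance coefficients) divided by the number of interactions per QTL. The short Python sketch below recomputes the three values from the dominance parameters in the table. The function and variable names are illustrative assumptions, not taken from the authors' software.

```python
import math

# Values taken from Table 1 and the Methods text.
MU_DELTA = 0.2            # mean dominance coefficient
N_INTERACTIONS = 5        # epistatic interactions per QTL
SIGMA_DELTA = {"small": 0.30, "intermediate": 0.70, "large": 1.50}


def sigma_gamma(mu_delta, sigma_delta, n_interactions):
    """SD of epistatic coefficients: sqrt((mu_delta^2 + sigma_delta^2) / N_gamma)."""
    return math.sqrt((mu_delta ** 2 + sigma_delta ** 2) / n_interactions)


# Hypothetical helper dictionary: one rounded value per magnitude.
table_sigma_gamma = {
    label: round(sigma_gamma(MU_DELTA, sd, N_INTERACTIONS), 2)
    for label, sd in SIGMA_DELTA.items()
}
print(table_sigma_gamma)
```

Rounded to two decimals, the computed values agree with the 0.16, 0.33, and 0.68 listed for the epistatic scenarios.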
The additive genetic value of individual $i$ for the trait expressed in the population that $i$ belongs to (say, population 1) is

$$v_i^{P1} = \mathbf{h}_{a,i}' \boldsymbol{\alpha}^{P1},$$

where $\mathbf{h}_{a,i}$ is a column vector of additive genotypes (measured as allele counts, minus the mean allele count in the population) of individual $i$ at quantitative trait loci (QTL), and $\boldsymbol{\alpha}^{P1}$ is a column vector of average effects at those QTL in population 1. The additive genetic value of individual $i$ for another population (say, population 2) is

$$v_i^{P2} = \mathbf{h}_{a,i}' \boldsymbol{\alpha}^{P2},$$

where $\boldsymbol{\alpha}^{P2}$ is a column vector of average effects in population 2. Conceptually, this $v_i^{P2}$ can be thought of as the additive genetic value for an individual in population 2 that has the same genotype as individual $i$. Here we define the additive genetic correlation between population 1 and population 2 ($r_g$) as the correlation between both additive genetic values for the individuals in population 1,

$$r_g = \mathrm{cor}\left(v_i^{P1}, v_i^{P2}\right) = \mathrm{cor}\left(\mathbf{h}_{a,i}' \boldsymbol{\alpha}^{P1}, \mathbf{h}_{a,i}' \boldsymbol{\alpha}^{P2}\right). \quad (1)$$

In other words, the $r_g$ is defined for individuals coming from population 1, which may be different from the $r_g$ defined for individuals coming from population 2 (see Discussion).

Equation (1) illustrates that the value of $r_g$ depends on the differences in average effects between populations. With non-additive effects, average effects depend on the allele frequencies in the population, and, therefore, larger differences in allele frequencies between populations are expected to result in lower values of $r_g$.

Note that $r_g$ is the correlation between the additive genetic values, not the genotypic values (i.e., additive plus non-additive genetic values). In the absence of GxE-interaction, the genotypic correlation between both populations is equal to one irrespective of the presence of GxG-interactions, because the genotypic value of a genotype (i.e., individual) is the same in both populations. The additive genetic correlation ($r_g$) may, however, be smaller than one because the partitioning of genotypic values into additive genetic values, dominance deviations and epistatic deviations depends on the allele frequencies (Fisher 1918; Cockerham 1954; Kempthorne 1954).

A deeper understanding of the relationship between non-additive genetic effects, allele frequencies and $r_g$ may help geneticists to predict the value of $r_g$ based on the importance of dominance and epistasis in the expression of the trait, and the genetic distance between populations. Wei et al. (1991) studied the impact of dominance on the additive genetic correlation between a purebred and crossbred population, known as $r_{pc}$. Using a two-locus model, they showed that $r_{pc}$ indeed depends on both the magnitude of the dominance effect ($d$), and on the difference in allele frequencies between the populations. We are not aware of any theoretical studies that investigated the relationship between the importance of dominance and $r_g$ between two purebred populations.

With epistasis, $r_g$ is also expected to depend on the magnitude of epistatic effects and on the difference in allele frequencies between populations. Epistasis in the functional (i.e., biological) sense means that the genotypic values of individuals depend on interactions between alleles or genotypes at different loci (Bateson and Mendel 1909), and there is substantial evidence for the existence of functional epistasis across species (Carlborg et al. 2003; Le Rouzic et al. 2008; Pettersson et al. 2011; Mackay 2015). Epistasis in the statistical sense is measured as the deviation of multi-locus genotypic values from the sum of the marginal effects (i.e., average and dominance effects) of the individual loci (Fisher 1918; Cockerham 1954). Although functional epistatic interactions do not necessarily lead to substantial statistical epistasis (Cheverud and Routman 1995; Hill et al. 2008; Mäki-Tanila and Hill 2014), epistasis can contribute significantly to the additive genetic variance because average effects of individual loci may capture a substantial part of the functional epistasis (Hill et al. 2008; Mäki-Tanila and Hill 2014; Monnahan and Kelly 2015). Furthermore, epistatic variance may be 'converted' into additive genetic variance due to genetic drift or due to selection (Cheverud and Routman 1996; Hill 2017). Thus, epistatic interactions modify average effects of individual loci when allele frequencies change, and may therefore play an important role in the value of $r_g$ and its change over time.

In summary, the $r_g$ between populations is affected by non-additive effects in combination with differences in allele frequencies between populations. For populations in the same environment (i.e., in the absence of GxE), $r_g$ is equal to 1 in the absence of non-additive effects or in the absence of allele frequency differences. So far, the relationship between the nature and magnitude of non-additive effects, differences in allele frequencies, and the value of $r_g$ remains unclear. Our objective was therefore to investigate the impact of non-additive effects on $r_g$ for populations that have diverged either under drift only, or under both drift and selection.

METHODS
We aimed to investigate the relationship between non-additive effects and the additive genetic correlation between populations ($r_g$) with small effective size, as observed in livestock. For this purpose, we simulated genotypes of quantitative trait loci (QTL) for two populations that have diverged for a number of generations under either pure drift, or under drift and selection. The populations were assumed to be kept in the same environment, so there was no GxE. We simulated traits following several scenarios that differed in the type (i.e., genetic model) and the magnitude of non-additive effects (Table 1).

Figure 1 Epistatic contrasts for four biological epistatic configurations.

We considered six genetic models: a basic model with additive effects only (A), which served as a basis for comparison, and five alternative models with non-additive effects: one with only dominance effects (D), and four with only epistatic effects. With epistasis, we simulated interactions between pairs of loci that followed one of the configurations
presented in Figure 1. We chose these genetic models so that there were scenarios with only dominance variance (D), scenarios with only additive by additive epistatic variance (EAA and EM), and scenarios with all types of non-additive variance (EC and EDD). For each genetic model, we considered three magnitudes of non-additive effects, labeled as small, intermediate, and large.

Simulation
We simulated genotypes of two livestock populations (1 and 2) that diverged for 50 generations (Figure 2). For divergence, we considered two situations: one where the populations diverged due to drift only, and one where the populations diverged also due to selection in population 1 and drift in population 2.

Figure 2 Overview of the simulated population structure.

Populations: We simulated a historical population with QMSim (Sargolzaei and Schenkel 2009) by randomly mating 100 males and 500 females starting in generation -3001. From generation -3000 to generation -2501, we simulated a bottleneck by gradually decreasing population size to 150 (25 males and 125 females) to create initial linkage disequilibrium (LD), and population size was gradually increased again to 600 during the next 100 generations. The population size remained constant from generation -2400 until -1 to allow for the development of mutation-drift equilibrium. To provide a sufficient number of individuals for the development of populations 1 and 2, we doubled the number of individuals in the last historical generation (generation 0) to 200 males and 1,000 females. This simulation resulted in an average effective population size ($N_e$) of 285 in generation 0, calculated as the harmonic mean of $4 N_m N_f / (N_m + N_f)$ in each preceding generation, where $N_m$ is the number of males and $N_f$ is the number of females that become parents in a generation (Falconer and Mackay 1996).

After simulating the historical population, we simulated two current populations (1 and 2). We randomly sampled 100 males and 500 females from the last historical generation to become founders of population 1. The remaining 100 males and 500 females were the founders of population 2. We will refer to the generation of founders as generation 0. Within each population, simulation continued for 50 generations by randomly mating 100 selected males with 500 selected females. Each mating resulted in 5 offspring, resulting in a total of 2,500 offspring (exactly 1,250 males and 1,250 females) in each generation. Generations were non-overlapping, meaning that in each generation, the parents were selected from the previous generation only. In the drift scenario, animals in both populations were randomly selected to become parents of the next generation. Effective population size ($N_e$) in the drift scenario was 285 in the two populations. In the selection-drift scenario, animals in population 1 were selected based on their own phenotype (mass selection), while in population 2, selection was random. In this scenario, effective population size ($N_e$) in population 1 was 250, which was calculated as $1/(2\Delta F)$, where $\Delta F$ is the inbreeding rate estimated from the pedigree (Falconer and Mackay 1996). We simulated selection only in population 1 to reduce computation time.

Genome: The simulated genome consisted of 10 chromosomes of 1 Morgan that each had 200 randomly positioned bi-allelic loci. In the first historical generation (generation -3001), we randomly sampled the allele frequencies of loci from a uniform distribution. Mutation rate was $2.5 \times 10^{-5}$ during the historical generations. In generation 0, the distribution of allele frequencies had evolved to a U-shape, and we randomly selected 500 segregating loci to become QTL, which resulted in low linkage disequilibrium between QTL. There was no mutation from generation 0 to 50, because the QMSim software does not allow for mutation after the last historical generation.

Functional genetic effects: Additive effects ($a$) of all 500 QTL were sampled from $N(0, 1)$. We assumed that the size of the dominance and epistatic effects were proportional to the additive effects of the QTL involved in the interaction (Wellmann and Bennewitz 2011). We therefore sampled dominance coefficients ($\delta$) for all QTL from $N(\mu_\delta, \sigma_\delta^2)$, from which dominance effects ($d$) were computed as $\delta |a|$. Similarly, we sampled epistatic coefficients ($\gamma$) for all pairwise epistatic interactions from $N(\mu_\gamma, \sigma_\gamma^2)$, from which functional epistatic effects ($e$) were computed as $\gamma_{kl} \sqrt{|a_k a_l|}$, where $k$ and $l$ denote the QTL involved in the interaction. Each QTL had an epistatic interaction with 5 randomly sampled other QTL, resulting in a total of 1250 pairwise interactions.

For both dominance and epistasis, we considered 3 magnitudes of effects: small, intermediate, and large. For all magnitudes, the mean dominance coefficient ($\mu_\delta$) was 0.2, and the mean epistatic coefficient ($\mu_\gamma$) was 0.0. The magnitude of dominance and epistatic effects were controlled by changing the standard deviation of dominance coefficients ($\sigma_\delta$) or epistatic coefficients ($\sigma_\gamma$). For dominance, $\sigma_\delta$ was 0.3 with small effects, 0.7 with intermediate effects, and 1.5 with large effects (Table 1). The mean and standard deviation of small dominance coefficients were chosen based on empirical results of Bennewitz and Meuwissen (2010) and Sun and Mumm (2016). For epistasis, $\sigma_\gamma$ was scaled such that the total functional epistatic variance was comparable to the total functional dominance variance in
the scenario with the same magnitude. To this end, $\sigma_\gamma$ was computed as $\sqrt{(\mu_\delta^2 + \sigma_\delta^2)/N_\gamma}$, where $N_\gamma$ is the number of epistatic interactions per QTL, and $\mu_\delta^2$ and $\sigma_\delta^2$ are the squared mean and variance of dominance coefficients in the scenario with the corresponding magnitude. For example, with small epistatic effects, $\sigma_\gamma$ was computed as $\sqrt{(0.2^2 + 0.3^2)/5} \approx 0.16$ (Table 1).

From functional dominance and epistatic effects to statistically orthogonal effects: We simulated dominance and epistasis by introducing functional dominance and epistatic effects that are independent of allele and genotype frequencies. Our interest, however, is in statistical average, dominance and epistatic effects of QTL, which do depend on genotype frequencies (Fisher 1918; Cheverud and Routman 1995). We will describe the general procedure to obtain these statistical effects, for a general situation where there can be dominance, epistasis, or both. Note, however, that our scenarios had either dominance or epistasis, but never both. After obtaining statistical effects, we describe how we computed the additive genetic value, genotypic value and phenotype for each individual. Although genotypic values themselves are independent of genotype frequencies, the partitioning of these genotypic values into additive, dominance, and epistatic components does depend on genotype frequencies. Additive genetic values of individuals in population 1 were needed to compute $r_g$, and genotypic values and phenotypes were needed because selection in population 1 was based on own performance. In the following, we describe the procedure to obtain the average effects and dominance effects in population 1 ($\boldsymbol{\alpha}^{P1}$). The procedure to obtain these effects in population 2 ($\boldsymbol{\alpha}^{P2}$) follows naturally by replacing the genotype and allele frequencies of population 1 with the frequencies in population 2.

The procedure starts by applying the natural and orthogonal interactions (NOIA) model (Álvarez-Castro and Carlborg 2007) for each epistatic interaction between two QTL. First, functional epistatic values for the 9 possible two-locus genotypes at QTL $k$ and $l$ were collected in a vector $\mathbf{c}_{kl} = \mathbf{t} e_{kl}$, where $e_{kl}$ is a scalar representing the functional epistatic effect between QTL $k$ and $l$, and $\mathbf{t}$ is a 9 × 1 vector of epistatic contrasts for the 9 two-locus genotypes, ordered as (WWYY, WwYY, wwYY, WWYy, ..., wwyy). The simulated epistatic contrasts in $\mathbf{t}$ followed one of four configurations: additive × additive (EAA), dominance × dominance (EDD), complementary (EC), or multiplicative (EM) (Figure 1). The contrasts in $\mathbf{t}$ were centered and scaled to a standard deviation of one, so that the contrasts were comparable between configurations. We then used genotype frequencies of QTL $k$ and $l$ to partition the functional epistatic values in $\mathbf{c}_{kl}$ into 9 statistical genetic effects (Álvarez-Castro and Carlborg 2007; Vitezica et al. 2017)

$$\mathbf{b}_{kl} = \left(\mathbf{W}_{kl}' \mathbf{D}_{kl} \mathbf{W}_{kl}\right)^{-1} \mathbf{W}_{kl}' \mathbf{D}_{kl} \mathbf{c}_{kl},$$

where $\mathbf{D}_{kl}$ is a 9 × 9 diagonal matrix with each of the nine genotype frequencies in the same order as in $\mathbf{t}$. Matrix $\mathbf{W}_{kl} = \mathbf{W}_k \otimes \mathbf{W}_l$, where $\otimes$ denotes the Kronecker product, and $\mathbf{W}_k$ and $\mathbf{W}_l$ are constructed as

$$\mathbf{W}_x = \begin{bmatrix} \mathbf{1} & \mathbf{w}_a & \mathbf{w}_d \end{bmatrix} = \begin{bmatrix} 1 & -(-p_{Xx} - 2p_{xx}) & \dfrac{-2 p_{Xx} p_{xx}}{p_{XX} + p_{xx} - (p_{XX} - p_{xx})^2} \\ 1 & -(1 - p_{Xx} - 2p_{xx}) & \dfrac{4 p_{XX} p_{xx}}{p_{XX} + p_{xx} - (p_{XX} - p_{xx})^2} \\ 1 & -(2 - p_{Xx} - 2p_{xx}) & \dfrac{-2 p_{XX} p_{Xx}}{p_{XX} + p_{xx} - (p_{XX} - p_{xx})^2} \end{bmatrix}, \quad (1)$$

where columns relate to orthogonal contrasts for the mean ($\mathbf{1}$), average effect ($\mathbf{w}_a$), and dominance effect ($\mathbf{w}_d$) of QTL X and where $p_{XX}$, $p_{Xx}$, and $p_{xx}$ are the genotype frequencies of QTL X. The resulting vector of statistical genetic effects is

$$\mathbf{b}_{kl} = \left[\mu, \alpha_{kl}^{k}, d_{kl}^{k}, \alpha_{kl}^{l}, (\alpha\alpha)_{kl}, (d\alpha)_{kl}, d_{kl}^{l}, (\alpha d)_{kl}, (dd)_{kl}\right]', \quad (2)$$

where $\alpha_{kl}^{k}$ and $\alpha_{kl}^{l}$ are the terms that contribute to average effects of QTL $k$ and $l$. The other terms in $\mathbf{b}_{kl}$ contribute to dominance effects ($d_{kl}^{k}$, $d_{kl}^{l}$) of individual QTL and to epistatic effects of interacting QTL ($(\alpha\alpha)_{kl}$, $(d\alpha)_{kl}$, $(\alpha d)_{kl}$, $(dd)_{kl}$). We repeated this procedure of partitioning functional epistatic effects into statistical genetic effects for all pairwise interactions between QTL. Each QTL was involved in 5 epistatic interactions and therefore has 5 terms that contribute to its average effect. Following this reasoning, the average effect of QTL $k$ in population 1 with epistasis is

The genotypic value of individual $i$ in population 1 was computed as

$$g_i^{P1} = v_i^{P1} + \mathbf{h}_{d,i}' \mathbf{d}^{P1} + \left(\mathbf{h}_{a,i}' \otimes \mathbf{h}_{a,i}'\right) (\boldsymbol{\alpha\alpha})^{P1} + \left(\mathbf{h}_{a,i}' \otimes \mathbf{h}_{d,i}'\right) (\boldsymbol{\alpha d})^{P1} + \left(\mathbf{h}_{d,i}' \otimes \mathbf{h}_{a,i}'\right) (\boldsymbol{d\alpha})^{P1} + \left(\mathbf{h}_{d,i}' \otimes \mathbf{h}_{d,i}'\right) (\boldsymbol{dd})^{P1},$$

where $\mathbf{h}_{a,i}$ is a column vector of additive genotype indicators for individual $i$, and $\mathbf{h}_{d,i}$ is a column vector of dominance genotype indicators for individual $i$. These indicators were coded following the NOIA parameterization as denoted in the rows of $\mathbf{w}_a$ and $\mathbf{w}_d$ (Equation 1) for genotypes XX, Xx, and xx, respectively. Phenotypes with a broad sense heritability of 0.5 were computed as $y^{P1} = g^{P1} + e^{P1}$, where $e^{P1} \sim N(0, \sigma_e^2)$, and $\sigma_e^2$ was equal to the variance of genotypic values ($\sigma_g^2$).

Computing parameters of interest
The parameters of interest were (1) the genetic correlation between the trait in population 1 and the trait in population 2 ($r_g$), and (2) the average absolute difference in allele frequencies between populations ($\Delta p$). For each generation, we computed $r_g$ as the Pearson correlation between the additive genetic values of individuals in population 1 for the trait expressed in the two populations (equation (1)). Effectively, this $r_g$ is a weighted correlation between $\boldsymbol{\alpha}^{P1}$ and $\boldsymbol{\alpha}^{P2}$, where the weights depend on the allele frequencies in population 1. Hence, the $r_g$ computed as the correlation of additive genetic values of individuals in population 2 may give different results because the genotypes sampled from population 2 result in different weights than those sampled from population 1 (see Discussion). For each generation, we computed $\Delta p$ as $\sum_k |p_k^A - p_k^B| / 500$, where $p_k^A$ and $p_k^B$ are the allele frequencies of QTL $k$ in the two populations. We chose this parameter as a measure for population divergence, because we expect that there is a linear relationship between $\Delta p$ and $r_g$. These parameters were computed for generation 1 to 5, and for every 5th generation after generation 5, to limit computation time.