ARTICLE IN PRESS
Journal of Theoretical Biology 235 (2005) 207–219 www.elsevier.com/locate/yjtbi
Background selection and population differentiation
Xin-Sheng Hu*, Fangliang He
Department of Renewable Resources, 751 General Services Building, University of Alberta, Edmonton, AB Canada T6G 2H1
Received 23 July 2004; received in revised form 24 November 2004; accepted 6 January 2005. Available online 8 March 2005.
Abstract
A general analytical formula is derived, which predicts the effects of background selection on population differentiation at a neutral locus as a result of its linkage with selected loci of deleterious mutations. The theory is based on the assumptions of random mating, multiplicative fitness, and weak selection in hermaphrodite plants in the island model of population structure. The analytical results show that Fst at the neutral locus increases as a result of the effects of background selection, regardless of the dependence or independence among linked background selective loci. The increment in Fst is closely related to the magnitude of linkage disequilibria between the neutral locus and selected loci, and can be estimated by the ratio of Fst with background selection to Fst without background selection minus one. The steady-state linkage disequilibrium between a neutral locus and a selected locus in subpopulations, primarily attained by gene flow, decreases with the recombination rate, and can be enhanced when there is dependence among linked selected loci. Monte Carlo computer simulations with two- and three-locus models show that the analytical formulae perform well under general conditions. Application of the present theory may aid in analyzing the genome-wide mapping of the effect of background selection in terms of Fst. © 2005 Elsevier Ltd. All rights reserved.
Keywords: Background selection; Population differentiation; Linkage disequilibrium; Gene flow; Selection
*Corresponding author. Tel.: +1 780 492 0715; fax: +1 780 492 4323. E-mail address: [email protected] (X.-S. Hu).
0022-5193/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.jtbi.2005.01.004

1. Introduction

Like the selectively favored mutations that cause hitchhiking effects on linked neutral loci (Maynard Smith and Haigh, 1974), selectively disfavored mutations can also change gene frequencies and reduce genetic diversities at linked neutral loci ("background selection", Charlesworth et al., 1993). Early studies showed that a substantial reduction in genetic diversity at a neutral locus can result from its linkage to deleterious mutations (e.g., Charlesworth et al., 1993; Hudson and Kaplan, 1995; Nordborg et al., 1996). The genetic basis for maintaining both kinds of effects is the persistence of the linkage disequilibrium (LD) between neutral and selected loci (see the review by Barton, 2000). When LD equals zero, both kinds of effects disappear.

In a natural population without subdivision the LD between two linked neutral loci dissipates with generation as the consequence of recombination, and eventually approaches zero (e.g., Bennett, 1954; Hill and Robertson, 1968; Hill, 1974). In the population with subdivision the dissipation of global LD with generation is enhanced since the inter-subpopulation gene flow can reduce the effective size of the whole population and hence increase the drift speed (Wright, 1943). However, a certain amount of LD in local subpopulations can be attained owing to the effects of inter-subpopulation gene flow that counteracts genetic drift. Stable LD between selected nuclear loci without epistasis can be maintained in subdivided populations (Li and Nei, 1974). When the recombination fraction between selected nuclear loci is of the same order or smaller than the selection coefficient, a substantial amount of LD can be present in a cline (Slatkin, 1975). The LD between selected nuclear and cytoplasmic loci that are physically unlinked can even be generated when inter-population gene flow (seed and pollen flow) takes place (e.g., Hu and Li, 2002).

Similarly, the persistence of the LD between one neutral locus and another selected locus is expected in a local population owing to the inter-subpopulation gene flow. As long as a certain amount of LD between neutral and selected loci is preserved, the effect of background selection should be present. The persistence of LD can cause an increase in variance of neutral allele frequencies and hence increase its population differentiation (Barton, 2000).

Previous studies have often examined LD in terms of genes or molecular markers as an "observation unit" at the equilibrium between gene flow and genetic drift. At a fine scale, the length of genomes for maintaining a certain amount of LD can be long in terms of the number of base pairs and so is the length within which background selection has a significant effect. It is meaningful to examine the effect of background selection in terms of single nucleotide polymorphisms (SNP) as an observation unit/marker. For example, 1 percent of recombination fraction (1 centiMorgan or cM) is equal to 1 million base pairs on the physical map in human genomes and contains about 1000 SNP (e.g., Wang et al., 1998). Within a few cMs of genetic distance the LD between one selected nucleotide site and another neutral site is likely substantial, and the effect of background selection on population differentiation at the individual neutral sites can be significant. Evidence indicates that high LD may extend over several centiMorgans in cattle and human genomes (e.g., Farnir et al., 2000; Abecasis et al., 2001). Needless to say, LD distribution along genomes varies with populations (e.g., Goddard et al., 2000; Shifman et al., 2003).

SNP are abundant in various organisms, such as in Arabidopsis thaliana and the rice genome (see the review by Rafalski, 2002). The genetic diversities of SNP within either the coding or non-coding regions of a gene are affected by their physical distances from the selected sites that may be located within the same gene or in the regions of other genes. For multiple linked genes with unequal numbers of SNP, the spatial pattern of genetic diversity across SNP could exhibit a patchy pattern along chromosomes. These naturally occurring patterns of SNP diversity provide a tool for mapping the effect of background selection in terms of population differentiation.

The purpose of this study is to develop further population genetic theory required for understanding the effect of background selection on population differentiation at a neutral locus. Although the effect of background selection on genetic diversity of a neutral gene has widely been appreciated, the theoretical investigation of such effect on population differentiation is much less fully explored (Charlesworth et al., 1997; Nordborg, 1997; Barton, 2000). In this study, we analytically derive the population differentiation at a neutral site due to its linkage with the sites that are subject to unfavorable mutations. Computer simulations are conducted to validate the analytical results that demonstrate the increase in Fst owing to the effect of background selection.

2. Assumptions

Based on the classical island model of population structure (Wright, 1969), here we consider diallelic selected nuclear loci (diploid) that are linked with a neutral locus in a hermaphrodite population of plants. For simplicity the selected loci addressed throughout this study refer to those with selectively disfavored mutation. Weak selection is considered in modeling so that all terms containing the second or higher order of selection coefficient are neglected. Like Nordborg et al. (1996), the selected loci are subject to a balance of mutation–selection–migration, and genetic drift effects are assumed negligible. The dependence among selected loci, caused by gene flow, is considered, relaxing the independence assumption made by Nordborg et al. (1996) and Hudson and Kaplan (1995).

The modeling procedure is based on a sequence of events in the life cycle of hermaphrodite plants: pollen flow, random combination between pollen and ovules (random mating), seed flow, mutation, natural selection, genetic drift, and next adults. This procedure is similar to Hu and Ennos (1999) except that mutation and background selection are included and also similar to Nordborg et al. (1996) except that migration is considered. The gene frequencies in migrants of pollen grains or seeds are equal to the average of gene frequencies over all subpopulations. The gene frequency in ovules before random combination with pollen grains is assumed to be the same as that in the preceding generation.

In the following we first derive the change of gene frequency at a neutral locus as a result of linkage to one and two selected loci, and then give a general expression for the change due to the background selection from an arbitrary number of selected loci. Wright's Fst is then employed to describe the population differentiation.

3. Allele frequency

3.1. Two-locus case

Consider a selected locus A that is linked to a neutral locus C in the ith subpopulation. The wild-type allele at the A locus is denoted by $A_i$, and its mutant allele by $a_i$; their frequencies are $p_{A_i}$ and $p_{a_i}$ ($p_{A_i} + p_{a_i} = 1$), respectively. Let the mutation rate from the wild-type allele to the mutant allele be $u_1$ at the A locus. The fitness of genotypes is assumed to be 1, $1 - s_{1i}$, and $1 - 2s_{1i}$ for the genotypes of $A_iA_i$, $A_ia_i$, and $a_ia_i$, respectively. The migration rates of pollen and seeds into each subpopulation are denoted by $m_P$ and $m_S$, respectively. According to the life cycle mentioned in the assumptions, the change in the allele frequency at the A locus is given by

$$\Delta p_{A_i} = p_{A_i} p_{a_i} s_{1i} - u_1 p_{A_i} - \tilde{m}\,(p_{A_i} - \bar{p}_A), \tag{1}$$

where $\tilde{m} = m_S + m_P/2$ and $\bar{p}_A$ is the frequency of the allele $A_i$ in migrants (seeds and pollen grains). This equation can also be implied from Wright's general expression (Wright 1969, p. 474). The first term on the right-hand side of Eq. (1) represents the increment in $p_{A_i}$ due to selection, the second term is the reduction due to mutation, and the third is the change due to immigration. At steady state, $\Delta p_{A_i} = 0$, the allele frequencies at the A locus can be analytically solved from Eq. (1),

$$p_{A_i} = \frac{(s_{1i} - u_1 - \tilde{m}) \pm \sqrt{(s_{1i} - u_1 - \tilde{m})^2 + 4 s_{1i}\tilde{m}\bar{p}_A}}{2 s_{1i}},$$

with the condition of $0 \le p_{A_i} \le 1$.

Consider the neutral locus that has alleles $C_i$ and $c_i$ in the ith subpopulation. Let the mutation rate from $C_i$ to $c_i$ be $v$. There are four types of two-locus gametes: $A_iC_i$, $A_ic_i$, $a_iC_i$, and $a_ic_i$, with frequencies of $P_{A_iC_i}$, $P_{A_ic_i}$, $P_{a_iC_i}$, and $P_{a_ic_i}$, respectively. Let $x_{0i}$ ($0 \le x_{0i} \le 1$) be the probability that the allele $C_i$ is linked with a mutant-free background of gametes with respect to the A locus, $p(C_i|A_i) = x_{0i}$. Let $x_{1i}$ ($0 \le x_{1i} \le 1$) be the probability that the allele $C_i$ is linked with the mutant allele $a_i$, $p(C_i|a_i) = x_{1i}$. The conditional probabilities for $x_{0i}$ and $x_{1i}$ in migrants are denoted as $\bar{x}_0$ and $\bar{x}_1$, respectively. According to the Bayesian theorem, the frequencies of the four types of gametes can be expressed as $P_{A_iC_i} = p_{A_i}x_{0i}$, $P_{A_ic_i} = p_{A_i}(1 - x_{0i})$, $P_{a_iC_i} = p_{a_i}x_{1i}$, and $P_{a_ic_i} = p_{a_i}(1 - x_{1i})$.

Let $r_1$ be the recombination fraction between the A and C loci. Following the approach similar to Nordborg et al. (1996, p. 170), the changes in the conditional probabilities of $x_{0i}$ and $x_{1i}$ due to the joint effects of migration, selection, and mutation at the A locus are derived as Eqs. (A.3) and (A.4) in Appendix A. When there is no effect of migration, Eqs. (A.3) and (A.4) reduce to the previous results of Nordborg et al. (1996).

Let $\Delta p'_{C_i}$ be the change in the frequency of the neutral allele $C_i$ due to the joint effects of background selection, mutation, and migration. According to Eqs. (A.3) and (A.4) in Appendix A and the relation of $p_{C_i} = P_{A_iC_i} + P_{a_iC_i}$, the analytical expression for $\Delta p'_{C_i}$ is given by

$$\Delta p'_{C_i} = \Delta P_{A_iC_i} + \Delta P_{a_iC_i} = p_{A_i}\Delta x_{0i} + p_{a_i}\Delta x_{1i} = -\tilde{m}\,(p_{C_i} - \bar{p}_C) - v p_{C_i} + s_{1i} D_{AC(i)}, \tag{2}$$

where $D_{AC(i)}$ is the LD between the A and C loci in the ith subpopulation. The first term on the right-hand side of Eq. (2) is the change due to migration (seed and pollen flow), the second term is the change due to the mutation of the allele $C_i$ to other alleles, and the third term is the change due to the linkage to the selected A locus. If the linkage disequilibrium is of the order similar to the selection coefficient ($s_{1i}$), the third term on the right-hand side of Eq. (2) is negligible. Since LD is primarily generated by the inter-subpopulation gene flow, its magnitude can be much greater than the order of selection coefficient when the recombination fraction is very small, say within a few cMs of genome.

From the setting of the conditional probabilities of $x_{0i}$ and $x_{1i}$, $D_{AC(i)}$ can be expressed by

$$D_{AC(i)} = p_{A_i} p_{a_i} (x_{0i} - x_{1i}). \tag{3}$$

When the neutral allele $C_i$ is equally distributed under the mutant and mutant-free backgrounds of the A locus, i.e. $x_{0i} = x_{1i}$, the effect of background selection equals zero ($D_{AC(i)} = 0$).

At the steady state the changes in the conditional probabilities $x_{0i}$ and $x_{1i}$ per generation are equal to zero. Genetic drift does not change the means of the conditional probabilities $x_{0i}$ and $x_{1i}$ and hence the mean of $D_{AC(i)}$, although it alters the distributions of these variables. Instead of using the diffusion model (e.g. Nordborg et al., 1996), the steady-state $x_{0i}$ and $x_{1i}$ can be calculated by letting $\Delta x_{0i} = \Delta x_{1i} = 0$ according to Eqs. (A.3) and (A.4) in Appendix A, that is

$$\begin{pmatrix} p_{a_i}s_{1i} - u_1 - v - \tilde{r}_{1i}p_{a_i} - \tilde{m} & \tilde{r}_{1i}p_{a_i} \\ \tilde{r}_{1i}p_{A_i} + u_1 p_{A_i}/p_{a_i} & -p_{A_i}(s_{1i} + \tilde{r}_{1i}) - v - \tilde{m} \end{pmatrix} \begin{pmatrix} x_{0i} \\ x_{1i} \end{pmatrix} = \begin{pmatrix} -\tilde{m}\bar{P}_{AC}/p_{A_i} \\ -\tilde{m}\bar{P}_{aC}/p_{a_i} \end{pmatrix}, \tag{4}$$

where $\tilde{r}_{1i} = r_1(1 - (1 - 2p_{a_i})s_{1i})$. From Eq. (4), we obtained

$$x_{0i} = \frac{\tilde{m}\left(\tilde{r}_{1i}\bar{p}_C + \bar{P}_{AC}(p_{A_i}s_{1i} + v + \tilde{m})/p_{A_i}\right)}{\tilde{r}_{1i}(\tilde{m} + v)}, \tag{5a}$$

$$x_{1i} = \frac{\tilde{m}\left(\tilde{r}_{1i}\bar{p}_C - \left(-u_1\bar{P}_{AC} + \bar{P}_{aC}(p_{a_i}s_{1i} - u_1 - v - \tilde{m})\right)/p_{a_i}\right)}{\tilde{r}_{1i}(\tilde{m} + v)}. \tag{5b}$$

According to Eqs. (1) and (3) and the relation $\bar{p}_C = \bar{P}_{AC} + \bar{P}_{aC}$, the steady-state $D_{AC(i)}$ can be derived as

$$D_{AC(i)} = \frac{\tilde{m}}{\tilde{r}_{1i}}\left(\bar{D}_{AC} - \frac{v}{\tilde{m} + v}\,(p_{A_i} - \bar{p}_A)\,\bar{p}_C\right), \tag{6}$$

where $\bar{D}_{AC}$ is the LD in migrants. Eq. (6) explicates that $D_{AC(i)}$ reduces with the increasing recombination fraction, but increases with the increasing migration rate or the selection coefficient.
3.2. Three-locus case

Assume that the neutral locus is linked on either side to a selected locus each with disfavored mutations. The difference between the two- and three-locus cases is that the effects of LD between selected loci generated by gene flow and the double crossover among the three loci are included. Assume that another diallelic selected locus B links to the neutral locus C at the opposite side to the A locus, i.e. the order of ACB. The wild-type allele at the B locus in the ith subpopulation is denoted by $B_i$, and its mutant allele is denoted by $b_i$; their frequencies are $p_{B_i}$ and $p_{b_i}$ ($p_{B_i} + p_{b_i} = 1$), respectively. Let the mutation rate from the wild-type allele to the mutant be $u_2$ at the B locus. The fitness is assumed to be 1, $1 - s_{2i}$, and $1 - 2s_{2i}$ for the genotypes of $B_iB_i$, $B_ib_i$, and $b_ib_i$, respectively. Using the assumption of multiplicative viability, the fitness for each two-locus genotype can be readily calculated. Following the life cycle the steady-state allele frequencies at the A and B loci at the balance of migration–selection–mutation are shown to have the following relations:

$$p_{A_i} p_{a_i} s_{1i} - u_1 p_{A_i} - \tilde{m}\,(p_{A_i} - \bar{p}_A) - D_{AB(i)} s_{2i} = 0, \tag{7a}$$

$$p_{B_i} p_{b_i} s_{2i} - u_2 p_{B_i} - \tilde{m}\,(p_{B_i} - \bar{p}_B) - D_{AB(i)} s_{1i} = 0, \tag{7b}$$

where $\bar{p}_B$ is the frequency of the allele $B_i$ in migrants (seeds and pollen grains), and $D_{AB(i)}$ is the steady-state LD between the A and B loci.

The steady-state $D_{AB(i)}$ can be obtained from Eq. (B.2) in Appendix B, that is

$$D_{AB(i)} = \frac{\tilde{m}\left(\bar{D}_{AB} + (p_{A_i} - \bar{p}_A)(p_{B_i} - \bar{p}_B)\right)}{1 - \left(1 - \tilde{m} - (p_{A_i} - p_{a_i})s_{1i} - (p_{B_i} - p_{b_i})s_{2i} - u_1 - u_2\right)(1 - r)}, \tag{8}$$

where $r$ is the recombination rate between the A and the B loci. The analytical expressions for the allele frequencies at the A and B loci are hard to obtain using the joint Eqs. (7) and (8). In the specific case where allele frequencies at the A and B loci are coincident, i.e. $s_{1i} = s_{2i} = s_i$, $u_1 = u_2 = u$, $\bar{p}_A = \bar{p}_B = \bar{p}$, and $p_{A_i} = p_{B_i} = p_i$, we obtained a cubic equation

$$d_0 p_i^3 + d_1 p_i^2 - d_2 p_i - d_3 = 0, \tag{9}$$

where

$$d_0 = 4(1 - r)s_i^2,$$

$$d_1 = s_i\left(1 + \tilde{m} - (1 - \tilde{m} - 2u + 2s_i)(1 - r) - 4(s_i - u - \tilde{m})(1 - r)\right),$$

$$d_2 = \left(1 - (1 - \tilde{m} - 2u + 2s_i)(1 - r)\right)(s_i - u - \tilde{m}) + 4 s_i \tilde{m}(1 - r)\bar{p} + 2 s_i \tilde{m}\bar{p},$$

and

$$d_3 = \tilde{m}\left(\left(1 - (1 - \tilde{m} - 2u + 2s_i)(1 - r)\right)\bar{p} - s_i\bar{D}_{AB} - s_i\bar{p}^2\right).$$

Solutions to Eq. (9) can be calculated with the Mathematica tool.

There are eight types of three-locus gametes: $A_iC_iB_i$, $A_iC_ib_i$, $A_ic_iB_i$, $A_ic_ib_i$, $a_iC_iB_i$, $a_iC_ib_i$, $a_ic_iB_i$, and $a_ic_ib_i$, with frequencies of $P_{A_iC_iB_i}$, $P_{A_iC_ib_i}$, $P_{A_ic_iB_i}$, $P_{A_ic_ib_i}$, $P_{a_iC_iB_i}$, $P_{a_iC_ib_i}$, $P_{a_ic_iB_i}$, and $P_{a_ic_ib_i}$, respectively. Let the probability that the allele $C_i$ is linked with a mutant-free background of gametes with respect to the two selected loci, $p(C_i|A_iB_i)$, be $y_{0i}$. Similarly, let $p(C_i|a_iB_i) = y_{1i}$, $p(C_i|A_ib_i) = y_{2i}$, and $p(C_i|a_ib_i) = y_{3i}$. All these conditional probabilities are in the range of 0 to 1 ($0 \le y_{ji} \le 1$, $j = 0, 1, 2, 3$). The frequencies of the three-locus gametes $A_iC_iB_i$ and $A_ic_iB_i$ can be written as $P_{A_iC_iB_i} = P_{A_iB_i}y_{0i}$ and $P_{A_ic_iB_i} = P_{A_iB_i}(1 - y_{0i})$, respectively, where $P_{A_iB_i}$ is the frequency of gamete $A_iB_i$. The expressions for the remaining six three-locus gametes ($A_iC_ib_i$, $A_ic_ib_i$, $a_iC_iB_i$, $a_iC_ib_i$, $a_ic_iB_i$, and $a_ic_ib_i$) can be written in a similar way. The conditional probabilities for $y_{0i}$, $y_{1i}$, $y_{2i}$, and $y_{3i}$ in migrants are denoted as $\bar{y}_0$, $\bar{y}_1$, $\bar{y}_2$, and $\bar{y}_3$, respectively.

Let $r_2$ be the recombination fraction between the B and C loci. The changes in the conditional probabilities of $y_{0i}$, $y_{1i}$, $y_{2i}$, and $y_{3i}$ due to the joint effects of migration, selection, and mutation are given in Appendix C. According to Appendix C the analytical expression for $\Delta p'_{C_i}$ is

$$\begin{aligned}
\Delta p'_{C_i} &= \Delta P_{A_iC_iB_i} + \Delta P_{a_iC_iB_i} + \Delta P_{A_iC_ib_i} + \Delta P_{a_iC_ib_i} \\
&= P_{A_iB_i}\Delta y_{0i} + P_{a_iB_i}\Delta y_{1i} + P_{A_ib_i}\Delta y_{2i} + P_{a_ib_i}\Delta y_{3i} \\
&= -\tilde{m}\,(p_{C_i} - \bar{p}_C) - v p_{C_i} + s_{1i}D_{AC(i)} + s_{2i}D_{CB(i)},
\end{aligned} \tag{10}$$

where $D_{CB(i)}$ is the LD between the C and B loci in the ith subpopulation.

According to the conditional probabilities of $y_{0i}$, $y_{1i}$, $y_{2i}$, and $y_{3i}$, $D_{AC(i)}$ and $D_{CB(i)}$ can be, respectively, given by

$$D_{AC(i)} = p_{A_i}p_{a_i}\left(p_{B_i}(y_{0i} - y_{1i}) + p_{b_i}(y_{2i} - y_{3i})\right) + \left(p_{a_i}(y_{0i} - y_{2i}) + p_{A_i}(y_{1i} - y_{3i})\right)D_{AB(i)}, \tag{11a}$$

$$D_{CB(i)} = p_{B_i}p_{b_i}\left(p_{A_i}(y_{0i} - y_{2i}) + p_{a_i}(y_{1i} - y_{3i})\right) + \left(p_{b_i}(y_{0i} - y_{1i}) + p_{B_i}(y_{2i} - y_{3i})\right)D_{AB(i)}. \tag{11b}$$

The first part on the right-hand side of Eq. (11a) or (11b) is the amount without the influence of the LD between the A and B loci, and the second part is the increment due to the LD between the A and B loci generated by gene flow. These equations analytically demonstrate that the presence of the LD among linked selected loci can enhance the effects of background selection. The proportions of $D_{AC(i)}$ and $D_{CB(i)}$ explained by the component of $D_{AB(i)}$ can be assessed by $\left(p_{a_i}(y_{0i} - y_{2i}) + p_{A_i}(y_{1i} - y_{3i})\right)D_{AB(i)}/D_{AC(i)}$ and $\left(p_{b_i}(y_{0i} - y_{1i}) + p_{B_i}(y_{2i} - y_{3i})\right)D_{AB(i)}/D_{CB(i)}$, respectively.
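The symmetric cubic in Eq. (9) can also be solved without a computer algebra system. The sketch below is an illustration only: the function names are hypothetical, plain bisection on [0, 1] is swapped in for the Mathematica solution mentioned in the text, and the parameter values in the usage note are assumptions borrowed from the figure settings. It assumes the cubic changes sign exactly once between 0 and 1, which the code checks only at the end points.

```python
def cubic_coefficients(s, u, m_tilde, r, p_bar, D_bar_AB):
    """Coefficients d0..d3 of Eq. (9) for the symmetric three-locus case."""
    k = 1.0 - r
    q = 1.0 - (1.0 - m_tilde - 2.0 * u + 2.0 * s) * k
    d0 = 4.0 * k * s * s
    d1 = s * (q + m_tilde - 4.0 * (s - u - m_tilde) * k)
    d2 = q * (s - u - m_tilde) + 4.0 * s * m_tilde * k * p_bar + 2.0 * s * m_tilde * p_bar
    d3 = m_tilde * (q * p_bar - s * D_bar_AB - s * p_bar * p_bar)
    return d0, d1, d2, d3


def solve_symmetric_p(s, u, m_tilde, r, p_bar, D_bar_AB, iterations=200):
    """Bisection for the root of Eq. (9) in [0, 1]."""
    d0, d1, d2, d3 = cubic_coefficients(s, u, m_tilde, r, p_bar, D_bar_AB)

    def f(p):
        # Horner form of d0*p^3 + d1*p^2 - d2*p - d3.
        return ((d0 * p + d1) * p - d2) * p - d3

    lo, hi = 0.0, 1.0
    hi_positive = f(hi) > 0.0
    if (f(lo) > 0.0) == hi_positive:
        raise ValueError("Eq. (9) does not change sign on [0, 1]")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0.0) == hi_positive:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def D_AB_eq8(p, s, u, m_tilde, r, p_bar, D_bar_AB):
    """Eq. (8) with equal parameters and allele frequencies at A and B."""
    denom = 1.0 - (1.0 - m_tilde - 2.0 * (2.0 * p - 1.0) * s - 2.0 * u) * (1.0 - r)
    return m_tilde * (D_bar_AB + (p - p_bar) ** 2) / denom
```

A root returned by `solve_symmetric_p(0.02, 1e-5, 0.05, 0.02, 0.8, 0.03)` can be checked by substituting it, together with `D_AB_eq8`, back into Eq. (7a); the left-hand side should then be close to zero.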
Letting $\Delta y_{ji} = 0$ ($j = 0, 1, 2, 3$) in Appendix C, we obtain four non-linear equations for calculating the steady-state conditional probabilities ($y_{0i}$, $y_{1i}$, $y_{2i}$, and $y_{3i}$),

$$\begin{pmatrix} h_{00} & \cdots & h_{05} \\ \vdots & & \vdots \\ h_{30} & \cdots & h_{35} \end{pmatrix} \begin{pmatrix} y_{0i} & y_{1i} & y_{2i} & y_{3i} & y_{0i}y_{3i} & y_{1i}y_{2i} \end{pmatrix}^{T} = \begin{pmatrix} g_0 \\ \vdots \\ g_3 \end{pmatrix}, \tag{12}$$

where

$$h_{00} = s_{1i}p_{a_i} + s_{2i}p_{b_i} - u_1 - u_2 - v - \tilde{m} - (1 + 2s_{1i}p_{a_i} + 2s_{2i}p_{b_i})\left(P_{a_iB_i}(1 - s_{1i})r_1 + P_{A_ib_i}(1 - s_{2i})r_2 + P_{a_ib_i}(1 - s_{1i} - s_{2i})(r_1 + r_2 - r_1r_2)\right), \; \ldots,$$

$$h_{35} = \frac{1 + 2s_{1i}p_{a_i} + 2s_{2i}p_{b_i}}{P_{a_ib_i}}\left(r_1r_2(1 - s_{1i} - s_{2i})P_{A_ib_i}P_{a_iB_i}\right),$$

$$g_0 = -\frac{\tilde{m}\bar{P}_{ACB}}{P_{A_iB_i}}, \; \ldots, \; g_3 = -\frac{\tilde{m}\bar{P}_{aCb}}{P_{a_ib_i}}.$$

The analytical solution is algebraically complicated, but the equations can be solved numerically with Mathematica.

Our numerical examples demonstrate that both $D_{AB(i)}$ and $D_{AC(i)}$ increase with the migration rate (Fig. 1a). The proportion of $D_{AC(i)}$ explained by the component of $D_{AB(i)}$ can be more than 20% when $\tilde{m} = 0.1$ (Fig. 1b). Both $D_{AB(i)}$ and $D_{AC(i)}$ decrease with the recombination rate (Fig. 2a,b). The proportion of $D_{AC(i)}$ explained by the component of $D_{AB(i)}$ can be substantially increased when the three loci are tightly linked, and more than 20% of $D_{AC(i)}$ can be brought about when $r_1 = r_2 = 0.0005$ (Fig. 2c).

Fig. 1. Effects of the dependence between the selected A and B loci: (a) changes in $D_{AC(i)}$ and $D_{AB(i)}$ (linkage disequilibrium) with the migration rate $m_S + m_P/2$; (b) the proportion (%) of $D_{AC(i)}$ explained by the component of $D_{AB(i)}$, plotted against $m_S + m_P/2$. $D_{AB(i)}$ is calculated according to Eqs. (7) and (8) while $D_{AC(i)}$ is calculated according to Eq. (11a). The settings of other parameters are the mutation rate for the A and B loci $u_1 = u_2 = 10^{-5}$ and for the neutral locus $v = 10^{-4}$, the selection coefficients $s_{1i} = s_{2i} = 0.02$, the recombination rates between the A and C loci or the B and C loci $r_1 = r_2 = 0.01$ and between the A and B loci $r = 2r_1$, the LD between the A and B loci in migrants $\bar{D}_{AB} = 0.03$, the migrant allele frequencies $\bar{p}_A = \bar{p}_B = 0.8$, and the conditional probabilities in migrants for the neutral locus C under different backgrounds $\bar{y}_0 = 0.8$, $\bar{y}_1 = \bar{y}_2 = 0.7$, and $\bar{y}_3 = 0.2$.

3.3. General case

From the preceding two- and three-locus analyses, extension can be obtained to a more general case where a neutral locus links to an arbitrary number of selected loci among which the interaction may exist. The change in the frequency of the neutral locus $C_i$ in the ith subpopulation, regardless of the magnitude of the linkage disequilibria among selected loci, can be generally expressed as

$$\Delta p'_{C_i} = -\tilde{m}\,(p_{C_i} - \bar{p}_C) - v p_{C_i} + L_i, \tag{13}$$

where $L_i = \sum_{j=1}^{L} s_{ji}D_{M_jC(i)}$, in which $M_j$ represents the wild-type allele at the jth selected locus. Effects of the LD among multiple selected loci are included in $D_{M_jC(i)}$. The cumulative effects $L_i$ from multiple selected loci are likely substantial when they closely link to the neutral locus.

4. Population differentiation

4.1. Equal spatial selection

We now examine the effects of background selection on population differentiation using the classical island model (Wright, 1969). Background selection can change both the effective population size and the neutral allele frequency. In the preceding section we have shown the systematical change of allele frequencies at a neutral locus as a result of linkage to selected loci. Previous
studies showed that background selection can reduce the effective subpopulation size ($N_e$) for the neutral locus (e.g. Nordborg et al., 1996) and hence affect the genetic drift process. Here we include both effects in deriving the expression for population differentiation. According to Eq. (4) of Nordborg et al. (1996), the steady-state effective size for the ith subpopulation under the impacts of background selection, denoted by $N'_{ei}$, can be approximated by

$$N'_{ei} = N_e e^{-\lambda_i}, \tag{14}$$

where

$$\lambda_i = \sum_{j=1}^{L}\frac{1 - p_{M_j}}{(1 + \tilde{r}_j/s_{ji})^2}.$$

We assume that the effective subpopulation size is not affected by gene flow although the effective size of the whole population is changed (Wright, 1943).

Fig. 2. Effects of the dependence between the selected A and B loci: (a) changes in $D_{AB(i)}$ with the recombination fraction ($r$) between the A and B loci; (b) changes in $D_{AC(i)}$ with the recombination fraction ($r_1$) between the A and C loci; (c) the proportion (%) of $D_{AC(i)}$ explained by the component of $D_{AB(i)}$, plotted against the recombination rate ($r_1$) between the A and C loci. $D_{AB(i)}$ is calculated according to Eqs. (7) and (8), while $D_{AC(i)}$ is calculated according to Eq. (11a). The settings of other parameters are the mutation rate for the A and B loci $u_1 = u_2 = 10^{-5}$ and for the neutral locus $v = 10^{-4}$, the selection coefficients $s_{1i} = s_{2i} = 0.02$, the recombination rates between the A and C loci or the B and C loci $r_1 = r_2$ and between the A and B loci $r = 2r_1$, the LD between the A and B loci in migrants $\bar{D}_{AB} = 0.03$, the migration rate $\tilde{m} = 0.05$, the migrant allele frequencies $\bar{p}_A = \bar{p}_B = 0.8$, and the conditional probabilities in migrants for the neutral locus C under different backgrounds $\bar{y}_0 = 0.8$, $\bar{y}_1 = \bar{y}_2 = 0.7$, and $\bar{y}_3 = 0.2$.

Denote by $\sigma'^2$ the variance of allele frequencies among subpopulations after pollen and seed flow and background selection. Suppose that the number of subpopulations ($n$) is large. According to Eq. (13), $\sigma'^2$ can be calculated by

$$\sigma'^2 = \frac{1}{n}\sum_{i=1}^{n}(p'_{C_i} - \bar{p}_C)^2 \approx E_F\left(\left((1 - \tilde{m})(p_{C_i} - \bar{p}_C) - v p_{C_i} + L_i\right)^2\right) = \left((1 - \tilde{m})^2 - 2v\right)\sigma^2 + P + L^2, \tag{15}$$

where $E_F$ represents the expectation with respect to the allele frequency distribution among subpopulations, $P = 2(1 - \tilde{m})E_F((p_{C_i} - \bar{p}_C)L_i)$, and $L^2 = E_F(L_i^2)$. Note that the expectations of the terms involving coefficients of $v^2$, $mv$, and $vs$ are neglected in deriving Eq. (15).

When the selection coefficients for the same allele at any selected locus are equal among all subpopulations, i.e. $s_{1j} = \cdots = s_{nj}$, then $\lambda_1 = \cdots = \lambda_n = \lambda$ and the effective subpopulation size is the same, i.e. $N'_{e1} = \cdots = N'_{en} = N'_e$. The effects of background selection are equal among subpopulations, i.e. $L_1 = \cdots = L_n = L$, so that $E_F(L_i^2)$ reduces to the square of this common value. The second term on the right-hand side of Eq. (15) vanishes, i.e. $P = 0$. According to Hu and Ennos (1999), the steady-state variance of allele frequencies after genetic drift can be written as

$$\sigma^2 = \left(1 - \frac{1}{2N'_e}\right)\left(\left((1 - \tilde{m})^2 - 2v\right)\sigma^2 + L^2\right) + \frac{\bar{p}_C(1 - \bar{p}_C)}{2N'_e}. \tag{16}$$

Population differentiation at the neutral locus, denoted by $F_{st1}$ ($= \sigma^2/\bar{p}_C(1 - \bar{p}_C)$), can be obtained by substituting Eq. (14) into Eq. (16),

$$F_{st1} = \frac{1}{1 + 4N_e(\tilde{m} + v)e^{-\lambda}}\left(1 + \frac{2N_e e^{-\lambda} - 1}{\bar{p}_C(1 - \bar{p}_C)}L^2\right). \tag{17}$$

Clearly, $F_{st1}$ is greater than the population differentiation under the purely neutral process, denoted by $F_{st.b}$ ($= 1/(1 + 4N\tilde{m})$) for diploid nuclear genes (Hu and Ennos, 1999).
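To make the comparison between Eq. (17) and the neutral expectation concrete, the following Python sketch evaluates both. It is a hedged illustration rather than the authors' code: the function names, the argument layout, and the parameter values in the usage note are assumptions, and the per-locus inputs to `lambda_i` are taken to be the wild-type frequencies $p_{M_j}$, the effective recombination fractions $\tilde{r}_j$, and the selection coefficients $s_{ji}$ as defined under Eq. (14).

```python
import math


def lambda_i(wild_type_freqs, r_tildes, s_coeffs):
    """Exponent of Eq. (14): sum over loci of (1 - p_Mj) / (1 + r_j / s_ji)^2."""
    return sum(
        (1.0 - p) / (1.0 + r / s) ** 2
        for p, r, s in zip(wild_type_freqs, r_tildes, s_coeffs)
    )


def fst_neutral(N, m_tilde):
    """Purely neutral differentiation, 1 / (1 + 4 N m)."""
    return 1.0 / (1.0 + 4.0 * N * m_tilde)


def fst_equal_selection(N_e, m_tilde, v, lam, L, p_bar_C):
    """F_st1 of Eq. (17) for equal selection among subpopulations."""
    n_eff = N_e * math.exp(-lam)  # reduced effective size, Eq. (14)
    base = 1.0 / (1.0 + 4.0 * n_eff * (m_tilde + v))
    shift = (2.0 * n_eff - 1.0) * L * L / (p_bar_C * (1.0 - p_bar_C))
    return base * (1.0 + shift)
```

For instance, with assumed values `N_e = 50` and `m_tilde = 0.05`, calling `fst_equal_selection(50, 0.05, 0.0, 0.0, 0.0, 0.5)` reproduces `fst_neutral(50, 0.05)`, and any positive `lam` raises the result because the effective size in the denominator shrinks.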
4.2. Unequal spatial selection

The more general situation is that the selection coefficients of the same allele at any selected locus are unequal among subpopulations, and so is the effective subpopulation size, i.e. $N'_{e1} \ne \cdots \ne N'_{en}$. Eq. (15) remains effective since it is derived before the occurrence of genetic drift. The steady-state increment in the variance of allele frequencies after genetic drift, denoted by $\Delta\sigma_d^2$, is

$$\begin{aligned}
\Delta\sigma_d^2 &= E_F\left(\frac{p'_{C_i}(1 - p'_{C_i})}{2N_e e^{-\lambda_i}}\right) \\
&= E_F\left(\frac{p'_{C_i}(1 - p'_{C_i})}{2N_e} + (\lambda_i + \lambda_i^2/2! + \cdots)\frac{p'_{C_i}(1 - p'_{C_i})}{2N_e}\right) \\
&= \frac{1}{2N_e}\left(\bar{p}_C(1 - \bar{p}_C) - \sigma'^2\right) + \rho,
\end{aligned} \tag{18}$$

where $\rho = E_F\left((\lambda_i + \lambda_i^2/2! + \cdots)\,p'_{C_i}(1 - p'_{C_i})/2N_e\right)$ is the increment part due to the background selection. Therefore, the steady-state equation for the variance of allele frequencies, equivalent to Eq. (16), is expressed as

$$\sigma^2 = \left(1 - \frac{1}{2N_e}\right)\left(\left((1 - \tilde{m})^2 - 2v\right)\sigma^2 + P + L^2\right) + \frac{\bar{p}_C(1 - \bar{p}_C)}{2N_e} + \rho. \tag{19}$$

Denote by $F_{st2}$ the population differentiation at the neutral locus. Rearranging Eq. (19) yields Eq. (20), in which $F_{st2}$ equals the factor $1/(1 + 4N_e(\tilde{m} + v))$ multiplied by one plus a correction term; the numerator of the correction term is $(2N_e - 1)(P + L^2) + 2N_e\rho$

[Fig. 3, panel (a): the proportion of increment in $F_{st}$ (%) plotted against the recombination rate ($r_1$).]

selection. The steady-state $F_{st}$ under the purely neutral process equals 0.2320 for $\tilde{m} = 0.015$, 0.0847 for $\tilde{m} = 0.05$, and 0.0444 for $\tilde{m} = 0.1$, with the settings of other parameters $N_e = 50$, $n = 30$, and $v = 5 \times 10^{-4}$. In the two-locus case, the proportion of increment in $F_{st}$ is about 1.68% for $\tilde{m} = 0.015$, 3.23% for $\tilde{m} = 0.05$, and 4.10% for $\tilde{m} = 0.1$ when the neutral locus tightly links to the selected locus A ($r_1 = 0.0001$) (Fig. 3a). The proportion of increment in $F_{st}$ decreases with the recombination rate. In the three-locus case, the proportion of increment in $F_{st}$ is about 4.22% for $\tilde{m} = 0.015$, 7.86% for $\tilde{m} = 0.05$, and 9.16% for $\tilde{m} = 0.1$ when $r_1 =$