Background Selection and Population Differentiation


Journal of Theoretical Biology 235 (2005) 207–219 www.elsevier.com/locate/yjtbi


Xin-Sheng Hu*, Fangliang He

Department of Renewable Resources, 751 General Services Building, University of Alberta, Edmonton, AB Canada T6G 2H1

Received 23 July 2004; received in revised form 24 November 2004; accepted 6 January 2005. Available online 8 March 2005.

Abstract

A general analytical formula is derived, which predicts the effects of background selection on population differentiation at a neutral locus as a result of its linkage with selected loci of deleterious mutations. The theory is based on the assumptions of random mating, multiplicative fitness, and weak selection in hermaphrodite plants in the island model of population structure. The analytical results show that $F_{st}$ at the neutral locus increases as a result of the effects of background selection, regardless of the dependence or independence among linked background selective loci. The increment in $F_{st}$ is closely related to the magnitude of linkage disequilibria between the neutral locus and selected loci, and can be estimated by the ratio of $F_{st}$ with background selection to $F_{st}$ without background selection minus one. The steady-state linkage disequilibrium between a neutral locus and a selected locus in subpopulations, primarily attained by gene flow, decreases with the recombination rate, and can be enhanced when there is dependence among linked selected loci. Monte Carlo computer simulations with two- and three-locus models show that the analytical formulae perform well under general conditions. Application of the present theory may aid in analyzing the genome-wide mapping of the effect of background selection in terms of $F_{st}$. © 2005 Elsevier Ltd. All rights reserved.

Keywords: Background selection; Population differentiation; Linkage disequilibrium; Gene flow; Selection

*Corresponding author. Tel.: +1 780 492 0715; fax: +1 780 492 4323. E-mail address: [email protected] (X.-S. Hu).

doi:10.1016/j.jtbi.2005.01.004

1. Introduction

Like the selectively favored mutations that cause hitchhiking effects on linked neutral loci (Maynard Smith and Haigh, 1974), selectively disfavored mutations can also change gene frequencies and reduce genetic diversities at linked neutral loci ("background selection", Charlesworth et al., 1993). Early studies showed that a substantial reduction in genetic diversity at a neutral locus can result from its linkage to deleterious mutations (e.g., Charlesworth et al., 1993; Hudson and Kaplan, 1995; Nordborg et al., 1996). The genetic basis for maintaining both kinds of effects is the persistence of the linkage disequilibrium (LD) between neutral and selected loci (see the review by Barton, 2000). When LD equals zero, both kinds of effects disappear.

In a natural population without subdivision the LD between two linked neutral loci dissipates with generation as the consequence of recombination, and eventually approaches zero (e.g., Bennett, 1954; Hill and Robertson, 1968; Hill, 1974). In the population with subdivision the dissipation of global LD with generation is enhanced since the inter-subpopulation gene flow can reduce the effective size of the whole population and hence increase the drift speed (Wright, 1943). However, a certain amount of LD in local subpopulations can be attained owing to the effects of inter-subpopulation gene flow that counteracts genetic drift. Stable LD between selected nuclear loci without epistasis can be maintained in subdivided populations (Li and Nei, 1974). When the recombination fraction between selected nuclear loci is of the same order or smaller than the selection coefficient, a substantial amount of LD can be present in a cline (Slatkin, 1975). The LD between selected nuclear and cytoplasmic loci that are physically unlinked can even be generated when inter-population gene flow (seed and pollen flow) takes place (e.g., Hu and Li, 2002).

Similarly, the persistence of the LD between one neutral locus and another selected locus is expected in a local population owing to the inter-subpopulation gene flow. As long as a certain amount of LD between neutral and selected loci is preserved, the effect of background selection should be present. The persistence of LD can cause an increase in variance of neutral allele frequencies and hence increase its population differentiation (Barton, 2000).

Previous LD studies are often examined in terms of genes or molecular markers as an "observation unit" at the equilibrium between gene flow and genetic drift. At a fine scale, the length of genomes for maintaining a certain amount of LD can be long in terms of the number of base pairs and so is the length within which background selection has a significant effect. It is meaningful to examine the effect of background selection in terms of single nucleotide polymorphisms (SNP) as an observation unit/marker. For example, 1 percent of recombination fraction (1 centiMorgan or cM) is equal to 1 million base pairs on the physical map in human genomes and contains about 1000 SNP (e.g., Wang et al., 1998). Within a few cMs of genetic distance the LD between one selected nucleotide site and another neutral site is likely substantial, and the effect of background selection on population differentiation at the individual neutral sites can be significant. Evidence indicates that high LD may extend over several centiMorgans in cattle and human genomes (e.g., Farnir et al., 2000; Abecasis et al., 2001). Needless to say, LD distribution along genomes varies with populations (e.g., Goddard et al., 2000; Shifman et al., 2003).

SNP are abundant in various organisms, such as in Arabidopsis thaliana and the rice genome (see the review by Rafalski, 2002). The genetic diversities of SNP within either the coding or non-coding regions of a gene are affected by their physical distances from the selected sites that may be located within the same gene or in the regions of other genes. For multiple linked genes with unequal numbers of SNP, the spatial pattern of genetic diversity across SNP could exhibit a patchy pattern along chromosomes. These naturally occurring patterns of SNP diversity provide a tool for mapping the effect of background selection in terms of population differentiation.

The purpose of this study is to develop further population genetic theory required for understanding the effect of background selection on population differentiation at a neutral locus. Although the effect of background selection on genetic diversity of a neutral gene has widely been appreciated, the theoretical investigation of such effect on population differentiation is much less fully explored (Charlesworth et al., 1997; Nordborg, 1997; Barton, 2000). In this study, we analytically derive the population differentiation at a neutral site due to its linkage with the sites that are subject to unfavorable mutations. Computer simulations are conducted to validate the analytical results that demonstrate the increase in $F_{st}$ owing to the effect of background selection.

2. Assumptions

Based on the classical island model of population structure (Wright, 1969), here we consider diallelic selected nuclear loci (diploid) that are linked with a neutral locus in a hermaphrodite population of plants. For simplicity the selected loci addressed throughout this study refer to those with selectively disfavored mutation. Weak selection is considered in modeling so that all terms containing the second or higher order of selection coefficient are neglected. Like Nordborg et al. (1996), the selected loci are subject to a balance of mutation–selection–migration, and genetic drift effects are assumed negligible. The dependence among selected loci, caused by gene flow, is considered, relaxing the independence assumption made by Nordborg et al. (1996) and Hudson and Kaplan (1995).

The modeling procedure is based on a sequence of events in the life cycle of hermaphrodite plants: pollen flow, random combination between pollen and ovules (random mating), seed flow, mutation, natural selection, genetic drift, and next adults. This procedure is similar to Hu and Ennos (1999) except that mutation and background selection are included and also similar to Nordborg et al. (1996) except that migration is considered. The gene frequencies in migrants of pollen grains or seeds are equal to the average of gene frequencies over all subpopulations. The gene frequency in ovules before random combination with pollen grains is assumed to be the same as that in the preceding generation.

In the following we first derive the change of gene frequency at a neutral locus as a result of linkage to one and two selected loci, and then give a general expression for the change due to the background selection from an arbitrary number of selected loci. Wright's $F_{st}$ is then employed to describe the population differentiation.

3. Allele frequency

3.1. Two-locus case

Consider a selected locus A that is linked to a neutral locus C in the ith subpopulation. The wild-type allele at the A locus is denoted by $A_i$, and its mutant allele by $a_i$; their frequencies are $p_{A_i}$ and $p_{a_i}$ ($p_{A_i} + p_{a_i} = 1$), respectively. Let the mutation rate from the wild-type allele to the mutant allele be $u_1$ at the A locus. The fitness of genotypes is assumed to be $1$, $1 - s_{1i}$, and $1 - 2s_{1i}$ for the genotypes of $A_iA_i$, $A_ia_i$, and $a_ia_i$, respectively. The migration rates of pollen and seeds into each subpopulation are denoted by $m_P$ and $m_S$, respectively. According to the life cycle mentioned in the assumptions, the change in the allele frequency at the A locus is given by

$$\Delta p_{A_i} = p_{A_i} p_{a_i} s_{1i} - u_1 p_{A_i} - \tilde{m}\,(p_{A_i} - \bar{p}_A), \tag{1}$$

where $\tilde{m} = m_S + m_P/2$ and $\bar{p}_A$ is the frequency of the allele $A_i$ in migrants (seeds and pollen grains). This equation can also be implied from Wright's general expression (Wright 1969, p. 474). The first term on the right-hand side of Eq. (1) represents the increment in $p_{A_i}$ due to selection, the second term is the reduction due to mutation, and the third is the change due to immigration. At steady state $\Delta p_{A_i} = 0$, and the allele frequencies at the A locus can be analytically solved from Eq. (1),

$$p_{A_i} = \frac{(s_{1i} - u_1 - \tilde{m}) \pm \sqrt{(s_{1i} - u_1 - \tilde{m})^2 + 4 s_{1i}\tilde{m}\bar{p}_A}}{2 s_{1i}}$$

with the condition of $0 \le p_{A_i} \le 1$.

Consider the neutral locus that has alleles $C_i$ and $c_i$ in the ith subpopulation. Let the mutation rate from $C_i$ to $c_i$ be $v$. There are four types of two-locus gametes: $A_iC_i$, $A_ic_i$, $a_iC_i$, and $a_ic_i$, with frequencies of $P_{A_iC_i}$, $P_{A_ic_i}$, $P_{a_iC_i}$, and $P_{a_ic_i}$, respectively. Let $x_{0i}$ ($0 \le x_{0i} \le 1$) be the probability that the allele $C_i$ is linked with a mutant-free background of gametes with respect to the A locus, $p(C_i|A_i) = x_{0i}$. Let $x_{1i}$ ($0 \le x_{1i} \le 1$) be the probability that the allele $C_i$ is linked with the mutant allele $a_i$, $p(C_i|a_i) = x_{1i}$. The conditional probabilities for $x_{0i}$ and $x_{1i}$ in migrants are denoted as $\bar{x}_0$ and $\bar{x}_1$, respectively. According to the Bayesian theorem, the frequencies of the four types of gametes can be expressed as $P_{A_iC_i} = p_{A_i}x_{0i}$, $P_{A_ic_i} = p_{A_i}(1 - x_{0i})$, $P_{a_iC_i} = p_{a_i}x_{1i}$, and $P_{a_ic_i} = p_{a_i}(1 - x_{1i})$.

Let $r_1$ be the recombination fraction between the A and C loci. Following the approach similar to Nordborg et al. (1996, p. 170), the changes in the conditional probabilities of $x_{0i}$ and $x_{1i}$ due to the joint effects of migration, selection, and mutation at the A locus are derived as Eqs. (A.3) and (A.4) in Appendix A. When there is no effect of migration, Eqs. (A.3) and (A.4) reduce to the previous results of Nordborg et al. (1996).

Let $\Delta p'_{C_i}$ be the change in the frequency of the neutral allele $C_i$ due to the joint effects of background selection, mutation, and migration. According to Eqs. (A.3) and (A.4) in Appendix A and the relation of $p_{C_i} = P_{A_iC_i} + P_{a_iC_i}$, the analytical expression for $\Delta p'_{C_i}$ is given by

$$\Delta p'_{C_i} = \Delta P_{A_iC_i} + \Delta P_{a_iC_i} = p_{A_i}\Delta x_{0i} + p_{a_i}\Delta x_{1i} = -\tilde{m}\,(p_{C_i} - \bar{p}_C) - v p_{C_i} + s_{1i} D_{AC(i)}, \tag{2}$$

where $D_{AC(i)}$ is the LD between the A and C loci in the ith subpopulation. The first term on the right-hand side of Eq. (2) is the change due to migration (seed and pollen flow), the second term is the change due to the mutation of the allele $C_i$ to other alleles, and the third term is the change due to the linkage to the selected A locus. If the linkage disequilibrium is of the order similar to the selection coefficient ($s_{1i}$), the third term on the right-hand side of Eq. (2) is negligible. Since LD is primarily generated by the inter-subpopulation gene flow, its magnitude can be much greater than the order of selection coefficient when the recombination fraction is very small, say within a few cMs of genome.

From the setting of the conditional probabilities of $x_{0i}$ and $x_{1i}$, $D_{AC(i)}$ can be expressed by

$$D_{AC(i)} = p_{A_i} p_{a_i} (x_{0i} - x_{1i}). \tag{3}$$

When the neutral allele $C_i$ is equally distributed under the mutant and mutant-free backgrounds of the A locus, i.e. $x_{0i} = x_{1i}$, the effect of background selection equals zero ($D_{AC(i)} = 0$).

At the steady state the changes in the conditional probabilities $x_{0i}$ and $x_{1i}$ per generation are equal to zero. Genetic drift does not change the means of the conditional probabilities $x_{0i}$ and $x_{1i}$ and hence the mean of $D_{AC(i)}$, although it alters the distributions of these variables. Instead of using the diffusion model (e.g. Nordborg et al., 1996), the steady state $x_{0i}$ and $x_{1i}$ can be calculated by letting $\Delta x_{0i} = \Delta x_{1i} = 0$ according to Eqs. (A.3) and (A.4) in Appendix A, that is

$$\begin{pmatrix} p_{a_i}s_{1i} - u_1 - v - \tilde{r}_{1i}p_{a_i} - \tilde{m} & \tilde{r}_{1i}p_{a_i} \\ \tilde{r}_{1i}p_{A_i} + u_1 p_{A_i}/p_{a_i} & -p_{A_i}(s_{1i} + \tilde{r}_{1i}) - v - \tilde{m} \end{pmatrix} \begin{pmatrix} x_{0i} \\ x_{1i} \end{pmatrix} = \begin{pmatrix} -\tilde{m}\bar{P}_{AC}/p_{A_i} \\ -\tilde{m}\bar{P}_{aC}/p_{a_i} \end{pmatrix}, \tag{4}$$

where $\tilde{r}_{1i} = r_1(1 - (1 - 2p_{a_i})s_{1i})$. From Eq. (4), we obtained

$$x_{0i} = \frac{\tilde{m}\left(\tilde{r}_{1i}\bar{p}_C + \bar{P}_{AC}(p_{A_i}s_{1i} + v + \tilde{m})/p_{A_i}\right)}{\tilde{r}_{1i}(\tilde{m} + v)}, \tag{5a}$$

$$x_{1i} = \frac{\tilde{m}\left(\tilde{r}_{1i}\bar{p}_C - \left(-u_1\bar{P}_{AC} + \bar{P}_{aC}(p_{a_i}s_{1i} - u_1 - v - \tilde{m})\right)/p_{a_i}\right)}{\tilde{r}_{1i}(\tilde{m} + v)}. \tag{5b}$$

According to Eqs. (1) and (3) and the relation $\bar{p}_C = \bar{P}_{AC} + \bar{P}_{aC}$, the steady-state $D_{AC(i)}$ can be derived as

$$D_{AC(i)} = \frac{\tilde{m}}{\tilde{r}_{1i}}\left(\bar{D}_{AC} - \frac{v}{\tilde{m} + v}(p_{A_i} - \bar{p}_A)\bar{p}_C\right), \tag{6}$$

where $\bar{D}_{AC}$ is the LD in migrants. Eq. (6) explicates that $D_{AC(i)}$ reduces with the increasing recombination fraction, but increases with the increasing migration rate or the selection coefficient.

3.2. Three-locus case

Assume that the neutral locus is linked on either side to a selected locus each with disfavored mutations. The difference between the two- and three-locus cases is that the effects of LD between selected loci generated by gene flow and the double crossover among the three loci are included. Assume that another diallelic selected locus B links to the neutral locus C at the opposite side to the A locus, i.e. the order of ACB. The wild-type allele at the B locus in the ith subpopulation is denoted by $B_i$, and its mutant allele is denoted by $b_i$; their frequencies are $p_{B_i}$ and $p_{b_i}$ ($p_{B_i} + p_{b_i} = 1$), respectively. Let the mutation rate from the wild-type allele to the mutant be $u_2$ at the B locus. The fitness is assumed to be $1$, $1 - s_{2i}$, and $1 - 2s_{2i}$ for the genotypes of $B_iB_i$, $B_ib_i$, and $b_ib_i$, respectively. Using the assumption of multiplicative viability, the fitness for each two-locus genotype can be readily calculated. Following the life cycle the steady-state allele frequencies at the A and B loci at the balance of migration–selection–mutation are shown to have the following relations:

$$p_{A_i}p_{a_i}s_{1i} - u_1p_{A_i} - \tilde{m}\,(p_{A_i} - \bar{p}_A) - D_{AB(i)}s_{2i} = 0, \tag{7a}$$

$$p_{B_i}p_{b_i}s_{2i} - u_2p_{B_i} - \tilde{m}\,(p_{B_i} - \bar{p}_B) - D_{AB(i)}s_{1i} = 0, \tag{7b}$$

where $\bar{p}_B$ is the frequency of the allele $B_i$ in migrants (seeds and pollen grains), and $D_{AB(i)}$ is the steady-state LD between the A and B loci.

The steady-state $D_{AB(i)}$ can be obtained from Eq. (B.2) in Appendix B, that is

$$D_{AB(i)} = \frac{\tilde{m}\left(\bar{D}_{AB} + (p_{A_i} - \bar{p}_A)(p_{B_i} - \bar{p}_B)\right)}{1 - \left(1 - \tilde{m} - (p_{A_i} - p_{a_i})s_{1i} - (p_{B_i} - p_{b_i})s_{2i} - u_1 - u_2\right)(1 - r)}, \tag{8}$$

where $r$ is the recombination rate between the A and the B loci. The analytical expressions for the allele frequencies at the A and B loci are hard to obtain using the joint Eqs. (7) and (8). In the specific case where allele frequencies at the A and B loci are coincident, i.e. $s_{1i} = s_{2i} = s_i$, $u_1 = u_2 = u$, $\bar{p}_A = \bar{p}_B = \bar{p}$, and $p_{A_i} = p_{B_i} = p_i$, we obtained a cubic equation

$$d_0p_i^3 + d_1p_i^2 - d_2p_i - d_3 = 0, \tag{9}$$

where

$$d_0 = 4(1 - r)s_i^2,$$

$$d_1 = s_i\left(1 + \tilde{m} - (1 - \tilde{m} - 2u + 2s_i)(1 - r) - 4(s_i - u - \tilde{m})(1 - r)\right),$$

$$d_2 = \left(1 - (1 - \tilde{m} - 2u + 2s_i)(1 - r)\right)(s_i - u - \tilde{m}) + 4s_i\tilde{m}(1 - r)\bar{p} + 2s_i\tilde{m}\bar{p},$$

and

$$d_3 = \tilde{m}\left(\left(1 - (1 - \tilde{m} - 2u + 2s_i)(1 - r)\right)\bar{p} - s_i\bar{D}_{AB} - s_i\bar{p}^2\right).$$

Solutions to Eq. (9) can be calculated with the Mathematica tool.

There are eight types of three-locus gametes: $A_iC_iB_i$, $A_iC_ib_i$, $A_ic_iB_i$, $A_ic_ib_i$, $a_iC_iB_i$, $a_iC_ib_i$, $a_ic_iB_i$, and $a_ic_ib_i$, with frequencies of $P_{A_iC_iB_i}$, $P_{A_iC_ib_i}$, $P_{A_ic_iB_i}$, $P_{A_ic_ib_i}$, $P_{a_iC_iB_i}$, $P_{a_iC_ib_i}$, $P_{a_ic_iB_i}$, and $P_{a_ic_ib_i}$, respectively. Let the probability that the allele $C_i$ is linked with a mutant-free background of gametes with respect to the two selected loci, $p(C_i|A_iB_i)$, be $y_{0i}$. Similarly, let $p(C_i|a_iB_i) = y_{1i}$, $p(C_i|A_ib_i) = y_{2i}$, and $p(C_i|a_ib_i) = y_{3i}$. All these conditional probabilities are in the range of 0 to 1 ($0 \le y_{ji} \le 1$, $j = 0, 1, 2, 3$). The frequencies of the three-locus gametes $A_iC_iB_i$ and $A_ic_iB_i$ can be written as $P_{A_iC_iB_i} = P_{A_iB_i}y_{0i}$ and $P_{A_ic_iB_i} = P_{A_iB_i}(1 - y_{0i})$, respectively, where $P_{A_iB_i}$ is the frequency of gamete $A_iB_i$. The expressions for the remaining six three-locus gametes ($A_iC_ib_i$, $A_ic_ib_i$, $a_iC_iB_i$, $a_iC_ib_i$, $a_ic_iB_i$, and $a_ic_ib_i$) can be written in a similar way. The conditional probabilities for $y_{0i}$, $y_{1i}$, $y_{2i}$, and $y_{3i}$ in migrants are denoted as $\bar{y}_0$, $\bar{y}_1$, $\bar{y}_2$, and $\bar{y}_3$, respectively.

Let $r_2$ be the recombination fraction between the B and C loci. The changes in the conditional probabilities of $y_{0i}$, $y_{1i}$, $y_{2i}$, and $y_{3i}$ due to the joint effects of migration, selection, and mutation are given in Appendix C. According to Appendix C the analytical expression for $\Delta p'_{C_i}$ is

$$\begin{aligned} \Delta p'_{C_i} &= \Delta P_{A_iC_iB_i} + \Delta P_{a_iC_iB_i} + \Delta P_{A_iC_ib_i} + \Delta P_{a_iC_ib_i} \\ &= P_{A_iB_i}\Delta y_{0i} + P_{a_iB_i}\Delta y_{1i} + P_{A_ib_i}\Delta y_{2i} + P_{a_ib_i}\Delta y_{3i} \\ &= -\tilde{m}\,(p_{C_i} - \bar{p}_C) - vp_{C_i} + s_{1i}D_{AC(i)} + s_{2i}D_{CB(i)}, \end{aligned} \tag{10}$$

where $D_{CB(i)}$ is the LD between the C and B loci in the ith subpopulation.

According to the conditional probabilities of $y_{0i}$, $y_{1i}$, $y_{2i}$, and $y_{3i}$, $D_{AC(i)}$ and $D_{CB(i)}$ can be, respectively, given by

$$D_{AC(i)} = p_{A_i}p_{a_i}\left(p_{B_i}(y_{0i} - y_{1i}) + p_{b_i}(y_{2i} - y_{3i})\right) + \left(p_{a_i}(y_{0i} - y_{2i}) + p_{A_i}(y_{1i} - y_{3i})\right)D_{AB(i)}, \tag{11a}$$

$$D_{CB(i)} = p_{B_i}p_{b_i}\left(p_{A_i}(y_{0i} - y_{2i}) + p_{a_i}(y_{1i} - y_{3i})\right) + \left(p_{b_i}(y_{0i} - y_{1i}) + p_{B_i}(y_{2i} - y_{3i})\right)D_{AB(i)}. \tag{11b}$$

The first part on the right-hand side of Eqs. (11a) or (11b) is the amount without the influence of the LD between the A and B loci, and the second part is the increment due to the LD between the A and B loci generated by gene flow. The above equations analytically demonstrate that the presence of the LD among linked selected loci can enhance the effects of background selection. The proportion of $D_{AC(i)}$ and $D_{CB(i)}$ explained by the component of $D_{AB(i)}$ can be assessed by $\left(p_{a_i}(y_{0i} - y_{2i}) + p_{A_i}(y_{1i} - y_{3i})\right)D_{AB(i)}/D_{AC(i)}$ and $\left(p_{b_i}(y_{0i} - y_{1i}) + p_{B_i}(y_{2i} - y_{3i})\right)D_{AB(i)}/D_{CB(i)}$, respectively.
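For the symmetric case, the joint solution of Eqs. (7a) and (8) can be checked against the cubic of Eq. (9) without a computer algebra system. The sketch below is a minimal illustration under stated assumptions: it uses plain bisection on $[0, 1]$ in place of the Mathematica solver mentioned in the text, the function names are hypothetical, and the parameter values are taken from the Fig. 1 caption with $\tilde{m} = 0.05$ and $r = 2r_1 = 0.02$.

```python
def d_ab(p, s, u, m_tilde, r, p_bar, D_bar_AB):
    """Eq. (8) in the symmetric case s1i = s2i = s, u1 = u2 = u, pA = pB = p."""
    numerator = m_tilde * (D_bar_AB + (p - p_bar) ** 2)
    denominator = 1.0 - (1.0 - m_tilde - 2.0 * (2.0 * p - 1.0) * s - 2.0 * u) * (1.0 - r)
    return numerator / denominator


def balance_7a(p, s, u, m_tilde, r, p_bar, D_bar_AB):
    """Left-hand side of Eq. (7a) with D_AB substituted from Eq. (8)."""
    D = d_ab(p, s, u, m_tilde, r, p_bar, D_bar_AB)
    return p * (1.0 - p) * s - u * p - m_tilde * (p - p_bar) - D * s


def cubic_9(p, s, u, m_tilde, r, p_bar, D_bar_AB):
    """Left-hand side of the cubic in Eq. (9)."""
    k = 1.0 - (1.0 - m_tilde - 2.0 * u + 2.0 * s) * (1.0 - r)
    d0 = 4.0 * (1.0 - r) * s * s
    d1 = s * (k + m_tilde - 4.0 * (s - u - m_tilde) * (1.0 - r))
    d2 = k * (s - u - m_tilde) + 4.0 * s * m_tilde * (1.0 - r) * p_bar + 2.0 * s * m_tilde * p_bar
    d3 = m_tilde * (k * p_bar - s * D_bar_AB - s * p_bar ** 2)
    return d0 * p ** 3 + d1 * p ** 2 - d2 * p - d3


def solve_allele_frequency(s, u, m_tilde, r, p_bar, D_bar_AB, iterations=200):
    """Bisection on [0, 1] for a root of Eq. (7a); assumes a sign change exists."""
    args = (s, u, m_tilde, r, p_bar, D_bar_AB)
    lo, hi = 0.0, 1.0
    if balance_7a(lo, *args) * balance_7a(hi, *args) > 0.0:
        raise ValueError("no sign change of Eq. (7a) on [0, 1]")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if balance_7a(lo, *args) * balance_7a(mid, *args) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Illustrative values from the Fig. 1 caption (m_tilde chosen for the example).
params = dict(s=0.02, u=1e-5, m_tilde=0.05, r=0.02, p_bar=0.8, D_bar_AB=0.03)
p_star = solve_allele_frequency(**params)
D_star = d_ab(p_star, **params)
print(p_star, D_star, cubic_9(p_star, **params))
```

Multiplying Eq. (7a) by the denominator of Eq. (8) gives the cubic exactly, so the cubic evaluated at the bisection root should be zero up to rounding error.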

Letting $\Delta y_{ji} = 0$ ($j = 0, 1, 2, 3$) in Appendix C, we obtain four non-linear equations for calculating the steady-state conditional probabilities ($y_{0i}$, $y_{1i}$, $y_{2i}$, and $y_{3i}$),

$$\begin{pmatrix} h_{00} & \cdots & h_{05} \\ \vdots & & \vdots \\ h_{30} & \cdots & h_{35} \end{pmatrix}_{4 \times 6} \begin{pmatrix} y_{0i} & y_{1i} & y_{2i} & y_{3i} & y_{0i}y_{3i} & y_{1i}y_{2i} \end{pmatrix}^{T} = \begin{pmatrix} g_0 \\ \vdots \\ g_3 \end{pmatrix}_{4 \times 1}, \tag{12}$$

where

$$h_{00} = s_{1i}p_{a_i} + s_{2i}p_{b_i} - u_1 - u_2 - v - \tilde{m} - (1 + 2s_{1i}p_{a_i} + 2s_{2i}p_{b_i})\left(P_{a_iB_i}(1 - s_{1i})r_1 + P_{A_ib_i}(1 - s_{2i})r_2 + P_{a_ib_i}(1 - s_{1i} - s_{2i})(r_1 + r_2 - r_1r_2)\right), \ \ldots,$$

$$h_{35} = \frac{1 + 2s_{1i}p_{a_i} + 2s_{2i}p_{b_i}}{P_{a_ib_i}}\left(r_1r_2(1 - s_{1i} - s_{2i})P_{A_ib_i}P_{a_iB_i}\right),$$

$$g_0 = -\frac{\tilde{m}\bar{P}_{ACB}}{P_{A_iB_i}}, \ \ldots, \ g_3 = -\frac{\tilde{m}\bar{P}_{aCb}}{P_{a_ib_i}}.$$

The analytical solution is algebraically complicated, but the equations can be solved numerically with Mathematica.

Our numerical examples demonstrate that both $D_{AB(i)}$ and $D_{AC(i)}$ increase with the migration rate (Fig. 1a). The proportion of $D_{AC(i)}$ explained by the component of $D_{AB(i)}$ can be more than 20% when $\tilde{m} = 0.1$ (Fig. 1b). Both $D_{AB(i)}$ and $D_{AC(i)}$ decrease with the recombination rate (Fig. 2a,b). The proportion of $D_{AC(i)}$ explained by the component of $D_{AB(i)}$ can be substantially increased when the three loci are tightly linked, and more than 20% of $D_{AC(i)}$ can be brought about when $r_1 = r_2 = 0.0005$ (Fig. 2c).

Fig. 1. Effects of the dependence between the selected A and B loci: (a) changes in $D_{AC(i)}$ and $D_{AB(i)}$ (vertical axis: linkage disequilibrium) with the migration rate; (b) the proportion (%) of $D_{AC(i)}$ explained by the component of $D_{AB(i)}$. The horizontal axis in both panels is $m_S + m_P/2$. $D_{AB(i)}$ is calculated according to Eqs. (7) and (8), while $D_{AC(i)}$ is calculated according to Eq. (11a). The settings of other parameters are the mutation rate for the A and B loci $u_1 = u_2 = 10^{-5}$ and for the neutral locus $v = 10^{-4}$; the selection coefficients $s_{1i} = s_{2i} = 0.02$; the recombination rates between the A and C loci or the B and C loci $r_1 = r_2 = 0.01$ and between the A and B loci $r = 2r_1$; the LD between the A and B loci in migrants $\bar{D}_{AB} = 0.03$; the migrant allele frequencies $\bar{p}_A = \bar{p}_B = 0.8$; and the conditional probabilities in migrants for the neutral locus C under different backgrounds $\bar{y}_0 = 0.8$, $\bar{y}_1 = \bar{y}_2 = 0.7$, and $\bar{y}_3 = 0.2$.

Fig. 2. Effects of the dependence between the selected A and B loci: (a) changes in $D_{AB(i)}$ with the recombination fraction ($r$) between the A and B loci; (b) changes in $D_{AC(i)}$ with the recombination fraction ($r_1$) between the A and C loci; (c) the proportion (%) of $D_{AC(i)}$ explained by the component of $D_{AB(i)}$, plotted against the recombination rate ($r_1$) between the A and C loci. $D_{AB(i)}$ is calculated according to Eqs. (7) and (8), while $D_{AC(i)}$ is calculated according to Eq. (11a). The settings of other parameters are the mutation rate for the A and B loci $u_1 = u_2 = 10^{-5}$ and for the neutral locus $v = 10^{-4}$; the selection coefficients $s_{1i} = s_{2i} = 0.02$; the recombination rates between the A and C loci or the B and C loci $r_1 = r_2$ and between the A and B loci $r = 2r_1$; the LD between the A and B loci in migrants $\bar{D}_{AB} = 0.03$; the migration rate $\tilde{m} = 0.05$; the migrant allele frequencies $\bar{p}_A = \bar{p}_B = 0.8$; and the conditional probabilities in migrants for the neutral locus C under different backgrounds $\bar{y}_0 = 0.8$, $\bar{y}_1 = \bar{y}_2 = 0.7$, and $\bar{y}_3 = 0.2$.

3.3. General case

From the preceding two- and three-locus analyses, extension can be obtained to a more general case where a neutral locus links to an arbitrary number of selected loci among which the interaction may exist. The change in the frequency of the neutral allele $C_i$ in the ith subpopulation, regardless of the magnitude of the linkage disequilibria among selected loci, can be generally expressed as

$$\Delta p'_{C_i} = -\tilde{m}\,(p_{C_i} - \bar{p}_C) - vp_{C_i} + \Lambda_i, \tag{13}$$

where $\Lambda_i = \sum_{j=1}^{L} s_{ji}D_{M_jC(i)}$, in which $M_j$ represents the wild-type allele at the jth selected locus. Effects of the LD among multiple selected loci are included in $D_{M_jC(i)}$. The cumulative effects $\Lambda_i$ from multiple selected loci are likely substantial when they closely link to the neutral locus.

4. Population differentiation

4.1. Equal spatial selection

We now examine the effects of background selection on population differentiation using the classical island model (Wright, 1969). Background selection can change both the effective population size and the neutral allele frequency. In the preceding section we have shown the systematical change of allele frequencies at a neutral locus as a result of linkage to selected loci. Previous studies showed that background selection can reduce the effective subpopulation size ($N_e$) for the neutral locus (e.g. Nordborg et al., 1996) and hence affect the genetic drift process. Here we include both effects in deriving the expression for population differentiation. According to Eq. (4) of Nordborg et al. (1996), the steady-state effective size for the ith subpopulation under the impacts of background selection, denoted by $N'_{ei}$, can be approximated by

$$N'_{ei} = N_e e^{-\lambda_i}, \tag{14}$$

where

$$\lambda_i = \sum_{j=1}^{L} \frac{1 - p_{M_j}}{(1 + \tilde{r}_j/s_{ji})^2}.$$

We assume that the effective subpopulation size is not affected by gene flow although the effective size of the whole population is changed (Wright, 1943).

Denote by $\sigma'^2$ the variance of allele frequencies among subpopulations after pollen and seed flow and background selection. Suppose that the number of subpopulations ($n$) is large. According to Eq. (13), $\sigma'^2$ can be calculated by

$$\begin{aligned} \sigma'^2 &= \frac{1}{n}\sum_{i=1}^{n}(p'_{C_i} - \bar{p}_C)^2 \\ &\approx E_F\left(\left((1 - \tilde{m})(p_{C_i} - \bar{p}_C) - vp_{C_i} + \Lambda_i\right)^2\right) \\ &= \left((1 - \tilde{m})^2 - 2v\right)\sigma^2 + \Pi + \overline{\Lambda^2}, \end{aligned} \tag{15}$$

where $E_F$ represents the expectation with respect to the allele frequency distribution among subpopulations, $\Pi = 2(1 - \tilde{m})E_F\left((p_{C_i} - \bar{p}_C)\Lambda_i\right)$, and $\overline{\Lambda^2} = E_F(\Lambda_i^2)$. Note that the expectations of the terms involving coefficients of $v^2$, $mv$, and $vs$ are neglected in deriving Eq. (15).

When the selection coefficients for the same allele at any selected locus are equal among all subpopulations, i.e. $s_{j1} = \cdots = s_{jn}$, then $\lambda_1 = \cdots = \lambda_n$ and the effective subpopulation size is the same, i.e. $N'_{e1} = \cdots = N'_{en} = N'_e$. The effects of background selection are equal among subpopulations, i.e. $\Lambda_1 = \cdots = \Lambda_n = \Lambda$, and $\overline{\Lambda^2} = \Lambda^2$. The second term on the right-hand side of Eq. (15) vanishes, i.e. $\Pi = 0$. According to Hu and Ennos (1999), the steady-state variance of allele frequencies after genetic drift can be written as

$$\sigma^2 = \left(1 - \frac{1}{2N'_e}\right)\left(\left((1 - \tilde{m})^2 - 2v\right)\sigma^2 + \Lambda^2\right) + \frac{\bar{p}_C(1 - \bar{p}_C)}{2N'_e}. \tag{16}$$

Population differentiation at the neutral locus, denoted by $F_{st1}$ ($= \sigma^2/\bar{p}_C(1 - \bar{p}_C)$), can be obtained by substituting Eq. (14) into Eq. (16),

$$F_{st1} = \frac{1}{1 + 4N_e(\tilde{m} + v)e^{-\lambda}}\left(1 + \frac{2N_ee^{-\lambda} - 1}{\bar{p}_C(1 - \bar{p}_C)}\Lambda^2\right). \tag{17}$$

Clearly, $F_{st1}$ is greater than the population differentiation under the purely neutral process, denoted by $F_{st.b}$ ($= 1/(1 + 4N\tilde{m})$) for diploid nuclear genes (Hu and Ennos, 1999).
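To make the roles of $\lambda$ and $\Lambda$ in Eqs. (14) and (17) concrete, the following sketch evaluates $F_{st1}$ for a single linked selected locus and compares it with the $\lambda = 0$, $\Lambda = 0$ limit of the same formula. It is an illustration under assumed inputs: the function names are hypothetical, and the mutant frequency, adjusted recombination fraction, and LD value are round numbers chosen for the example rather than results reported in the paper.

```python
import math


def background_lambda(mutant_freqs, r_adj, s):
    """Exponent in Eq. (14): sum over loci of (1 - p_Mj) / (1 + r_j / s_ji)^2."""
    return sum(q / (1.0 + r / sj) ** 2 for q, r, sj in zip(mutant_freqs, r_adj, s))


def fst_equal_selection(N_e, m_tilde, v, lam, Lambda, p_bar_C):
    """Eq. (17): F_st1 under equal selection among subpopulations."""
    N_eff = N_e * math.exp(-lam)  # Eq. (14)
    drift_part = 1.0 / (1.0 + 4.0 * N_eff * (m_tilde + v))
    selection_part = (2.0 * N_eff - 1.0) * Lambda ** 2 / (p_bar_C * (1.0 - p_bar_C))
    return drift_part * (1.0 + selection_part)


# Assumed illustrative inputs for one selected locus:
# mutant frequency 0.036, adjusted recombination 0.0098, s = 0.02, D_AC = 0.018.
lam = background_lambda([0.036], [0.0098], [0.02])
Lambda = 0.02 * 0.018  # Lambda = s * D_AC for a single selected locus, Eq. (13)

fst_bs = fst_equal_selection(50, 0.05, 5e-4, lam, Lambda, 0.77)
fst_0 = fst_equal_selection(50, 0.05, 5e-4, 0.0, 0.0, 0.77)
increment = fst_bs / fst_0 - 1.0  # proportional increment, in the spirit of Eq. (21)
print(lam, fst_0, fst_bs, increment)
```

In this form the two routes by which background selection raises $F_{st}$ are visible separately: a smaller effective size through $e^{-\lambda}$, and the additional variance term proportional to $\Lambda^2$.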

4.2. Unequal spatial selection selection. The steady-state F st under the purely neutral process equals 0.2320 for m~ ¼ 0:015; 0.0847 for m~ ¼ The more general situation is that the selection 0:05; and 0.0444 for m~ ¼ 0:1; with the settings of other À4 coefﬁcients of the same allele at any selected locus parameters Ne ¼ 50; n ¼ 30, and v ¼ 5 10 : In the are unequal among subpopulations, and so is the two-locus case, the proportion of increment in F st is effective subpopulation size among subpopulations, i.e. about 1.68% for m~ ¼ 0:015; 3.23% for m~ ¼ 0:05; and 0 a a 0 Ne1 Nen: Eq. (15) remains effective since it is 4.10% for m~ ¼ 0:1 when the neural locus tightly links to derived before the occurrence of genetic drift. The the selected locus A (r1 ¼ 0:0001) (Fig. 3a). The steady-state increment in the variance of allele frequen- proportion of increment in F st decreases with the cies after genetic drift, denoted by Ds2 ; is recombination rate. In the three-locus case, the propor- d p ð1 À p Þ tion of increment in F st is about 4.22% for m~ ¼ 0:015; 2 Ci Ci Ds ¼ EF 7.86% for m~ ¼ 0:05; and 9.16% for m~ ¼ 0:1 when r1 ¼ d 2N eÀli e p ð1 À p Þ p ð1 À p Þ ¼ E Ci Ci þðl þ l2=2! þÞ Ci Ci F 2N i i 2N e e 5 1 2 ¼ ðp¯Cð1 À p¯CÞÀs Þþr, ð18Þ 2Ne 4 p ð1Àp Þ where r ¼ E ððl þ l2=2! þÞ Ci Ci Þ is the incre- F i i 2Ne ment part due to the background selection. Therefore, 3 (%) the steady-state equation for the variance of allele st frequencies, equivalent to Eq. (16), is expressed as F 2 1 s2 ¼ 1 À ððð1 À m~ Þ2 À 2vÞs2 þ P þ L2Þ 1 2Ne p¯ ð1 À p¯ Þ The proportion of increment in þ C C þ r. ð19Þ 0 2Ne 0 0.10.2 0.3 0.4 0.5

Denote by F st2 the population differentiation at the (a) Recombination rate (r 1) neutral locus. Rearranging Eq. (19) yields 10 1 F st2 ¼ ð1 þ Þ, (20) 1 þ 4Neðm~ þ vÞ 8 where 6

2 (%) ð2Ne À 1ÞðP þ L Þþ2Ner st

¼ . F 4 p¯Cð1 À p¯CÞ In the presence of inbreeding in each subpopulation, 2 the variance effective population size reduces to Neð1 þ

À1 The proportion of increment in F isÞ where Fis is the inbreeding coefficient (Caballero 0 and Hill, 1992). Thus, the assumption of random mating 0 0.10.2 0.3 0.4 0.5 can be relaxed by replacing Ne in Eqs. (17) or (20) with (b) Recombination rate (r ) À1 1 Neð1 þ F isÞ : Also the assumption of the large number of subpopulations can be relaxed by replacing m~ with Fig. 3. The proportion of increment in Fst due to background 2 selection: (a) two-locus case, with the selection coefficient s1i ¼ ðn=ðn À 1ÞÞ m~ in which n can be an arbitrary number of À5 0:02 ði ¼ 1; ...; 30Þ; the mutation rate u1 ¼ 10 for the A locus and subpopulations (Hu, 2000). v ¼ 5 10À4 for the C locus, the conditional probabilities in migrants ^ ^ When the estimates of F st2 and F st:b are available, the x¯ 0 ¼ 0:8 and x¯ 1 ¼ 0:2; the migrant gamete frequencies P¯ AC ¼ 0:76 and ¯ increment in F st due to the effect of background PaC ¼ 0:01; the migrant allele frequencies p¯A ¼ 0:95 and p¯C ¼ 0:8; and selection can be estimated according to the general the effective population size Ne ¼ 50; (b) three-locus case, with the À5 À4 formula of Eq. (20), mutation rate u1 ¼ u2 ¼ 10 and v ¼ 5 10 , the selection coeffi- cients s1i ¼ s2i ¼ 0:02 ði ¼ 1; ...; 30Þ; the recombination rates between the A and C loci or the B and C loci r ¼ r and between the A and B F^ st2 1 2 ^ ¼ À 1. 
(21) loci r ¼ 2r1, the LD between the A and B loci in migrants D¯ AB ¼ 0:03; F^ st:b the migrant allele frequencies p¯A ¼ p¯B ¼ 0:95; the conditional probabilities in migrants y¯ ¼ 0:82; y¯ ¼ y¯ ¼ 0:7; and y¯ ¼ 0:2; and the In order to look at the amount of increment in F st at 0 1 2 3 effective population size N ¼ 50: In each figure the line with circles the neutral locus due to its linkage to selected loci, the e represents the case of m~ ¼ 0:015 (mS ¼ mP ¼ 0:01), the line with above analytical results are applied to the two- and blocks for m~ ¼ 0:05 (mS ¼ 0:04; mP ¼ 0:02), and the line with triangles three-locus cases under the model of equal spatial for m~ ¼ 0:10 (mS ¼ 0:05; mP ¼ 0:1). ARTICLE IN PRESS 214 X.-S. Hu, F. He / Journal of Theoretical Biology 235 (2005) 207–219

0.0001 (Fig. 3b). Although the proportion of increment in $F_{st}$ decreases with the recombination rate, the values are greater than those in the two-locus case (Fig. 3a,b). Note that the parameters for the neutral locus C and its linkage to the A (or B) locus in the above two cases are comparable: $\bar{p}_C = 0.8$, $\bar{P}_{AC} = 0.76$, and $\bar{P}_{aC} = 0.01$ in the two-locus case; $\bar{p}_C = 0.7957$, $\bar{P}_{AC} = 0.7769$, and $\bar{P}_{aC} = 0.0187$ in the three-locus case. These numerical results indicate that a certain amount of increment in $F_{st}$ can be attained when the neutral locus is tightly linked to the selected loci or when the migration rate is high.

5. Simulations

5.1. Method

To confirm the analytical results, a simulation study was conducted according to the sequence of events in the life cycle of hermaphrodite plants. Simulation starts from an initial adult reference population that begins subdivision and produces many subpopulations. The allele frequencies at the selected loci are initially set to be the same as in the reference population. The conditional probabilities for a diallelic neutral gene in the reference population under different backgrounds of selected gametes are also assumed, thus the allele frequencies of the neutral locus can be calculated. The frequencies of two- or three-locus gametes in migrants (seeds and pollen grains) are assumed to be equal to those in the initial reference population. Constant selection coefficients among subpopulations are examined although a more complicated pattern of selection can be modeled.

The detailed simulation procedure is as follows. Given the initial parameter settings, calculate the frequencies of two- or three-locus gametes in pollen and ovules according to the assumption of Wright–Fisher's model. Then calculate the gamete frequencies after pollen flow. According to the assumption of random combination between pollen and ovules, calculate the genotype frequencies in seeds so formed. Seed flow is then considered and the genotype frequencies are calculated after seed flow. Assume that the mutation at the neutral locus is a deterministic process, with a probability of $v$ from the allele $C_i$ to the allele $c_i$ per generation, and then calculate all genotype frequencies after mutation. The model of multiplicative viability is employed to calculate the fitness of each genotype in any subpopulation and the genotype frequencies after selection. A sampling process (genetic drift) is then conducted according to the phenotype frequencies after selection, given an effective subpopulation size ($N_{ei}$). Gamete frequencies in ovules and pollen grains are then calculated according to the segregation ratios of gametes from individual cross (10 crosses in the two-locus case and 36 crosses in the three-locus case). The above steps are repeated until the population differentiation reaches steady distribution from generation to generation.

Five thousand independent data sets are created per generation, and each is used to calculate population differentiation. From these replicated datasets, means and standard deviations of $F_{st}$ are calculated. The predicted results are obtained according to Eq. (17), where the LD is calculated from Eq. (6) in the two-locus case and Eq. (11a, b) in the three-locus case. The steady-state values of $y_{ji}$ ($j = 0, 1, 2, 3$) in calculating the expected $F_{st}$ values are calculated according to Eq. (12) with the Mathematica tool.

5.2. Results

Let $n = 30$, $v = 5 \times 10^{-4}$, and $N_e = 50$. In the purely neutral process, the theoretical expectation of $F_{st}$ at steady state equals 0.2320 when $\tilde{m} = 0.015$; 0.0847 when $\tilde{m} = 0.05$; and 0.0444 when $\tilde{m} = 0.10$. In the presence of background selection, our simulations clearly show that population differentiation is enhanced, especially when the neutral locus is closely linked to the selected loci or when migration rate is high. In the two-locus case, for example, the average proportion of increment in $F_{st}$ at the 200th generation (steady-state value) reduces from 9.48% for $r_1 = 0.01$ to 2.97% for $r_1 = 0.1$ (Fig. 4a,b), although these estimates are greater than the expected values of 1.12% and 0.25% (Fig. 3a), respectively. The expected values of $F_{st}$ are within the range of one standard deviation of empirical results. The general pattern for the change of $F_{st}$ with generation is that the average $F_{st}$ is initially small when the reference population starts subdivision and all subpopulations are formed by sampling from the same reference population. Then population differentiation gradually increases and approaches a steady distribution after 160 generations (Fig. 4a). Compared with the results in the case of loose linkage ($r_1 = 0.1$), population differentiation displays a greater fluctuation in the case of tight linkage ($r_1 = 0.01$; Fig. 4b).

Population differentiation quickly reaches a steady distribution with the increase of migration rate (Fig. 5a,b). The average proportion of increment in $F_{st}$ generally increases with migration rate. For example, the average proportion of increment in $F_{st}$ at the 200th generation increases from 9.48% for $\tilde{m} = 0.015$ to 10.5% for $\tilde{m} = 0.05$ and to 16.9% for $\tilde{m} = 0.10$ (Fig. 5a), although these estimates are greater than the expected values of 1.12%, 2.28%, and 2.74% (Fig. 3a), respectively. All expected $F_{st}$ values are within the range of one standard deviation of empirical results.

The change of $F_{st}$ with generation in the three-locus case has a pattern similar to that in the two-locus case, which displays a gradual increase with time and eventually approaches a stable distribution (Fig. 6a), but has a greater fluctuation than the latter (Fig. 6b).
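Two of the steps in the simulation procedure of Section 5.1, migration and the drift sampling, can be sketched in a few lines. This is a minimal illustration under stated assumptions and not the authors' program. The mixing rule follows the pollen-flow expression in Appendix B (migrant frequencies weighted by the migration rate, local frequencies by its complement). The drift step assumes plain multinomial sampling of $N_e$ individuals. The function names and the local frequencies are hypothetical.

```python
import random


def migrate(local, migrant, m):
    """Frequencies after gene flow: m * migrant + (1 - m) * local, class by class."""
    return [m * y + (1.0 - m) * x for x, y in zip(local, migrant)]


def drift_step(freqs, n_e, rng):
    """Resample frequencies by drawing n_e individuals (multinomial sampling, an assumption)."""
    draws = rng.choices(range(len(freqs)), weights=freqs, k=n_e)
    counts = [0] * len(freqs)
    for g in draws:
        counts[g] += 1
    return [c / n_e for c in counts]


rng = random.Random(1)                       # fixed seed so the run is repeatable
local = [0.90, 0.04, 0.04, 0.02]             # hypothetical local class frequencies
migrant = [0.9325, 0.0175, 0.0175, 0.0325]   # migrant gametes for p = 0.95, D = 0.03
after_flow = migrate(local, migrant, 0.01)   # pollen flow at m_P = 0.01
after_drift = drift_step(after_flow, 50, rng)  # sampling with N_e = 50
print(after_flow, after_drift)
```

A full replicate would chain these steps with random union of gametes, seed flow, mutation, and selection, and repeat them over generations in each of the $n$ subpopulations.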

[Figure 4: panels (a) average $F_{st}$ and (b) standard deviation of $F_{st}$ plotted against generation (0 to 200), with curves for $r_1 = 0.01$ and $r_1 = 0.10$.]

Fig. 4. Effects of recombination rate on background selection in the two-locus case: (a) average $F_{st}$; (b) standard deviation of $F_{st}$. Results are obtained from 5000 independent simulations, with $m_S = m_P = 0.01$, $u_1 = 10^{-5}$, $N_e = 50$, $v = 5 \times 10^{-4}$, $\bar{p}_A = 0.95$, $s_{1i} = 0.02$ ($i = 1, \ldots, 30$), and the conditional probabilities in migrants: $\bar{x}_0 = 0.8$ and $\bar{x}_1 = 0.2$. The dashed line in (a) refers to the $F_{st}$ value (= 0.2320) under the purely neutral process.

[Figure 5: panels (a) average $F_{st}$ and (b) standard deviation of $F_{st}$ plotted against generation (0 to 200), with curves for $\tilde{m} = 0.015$, $\tilde{m} = 0.05$, and $\tilde{m} = 0.10$.]

Fig. 5. Effects of migration on background selection in the two-locus case: (a) average $F_{st}$; (b) standard deviation of $F_{st}$. Results are obtained from 5000 independent simulations, with $u_1 = 10^{-5}$, $N_e = 50$, $v = 5 \times 10^{-4}$, $\bar{p}_A = 0.95$, $s_{1i} = 0.02$ ($i = 1, \ldots, 30$), $r_1 = 0.01$, and the conditional probabilities in migrants: $\bar{x}_0 = 0.8$ and $\bar{x}_1 = 0.2$. The dashed lines in (a) at the positions of $F_{st} = 0.2320$, 0.0847, and 0.0444 refer to the expected values under the purely neutral process with migration rates of $\tilde{m} = 0.015$ ($m_S = m_P = 0.01$), $\tilde{m} = 0.05$ ($m_S = 0.04$; $m_P = 0.02$), and $\tilde{m} = 0.10$ ($m_S = 0.05$; $m_P = 0.1$), respectively.
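Two quantities recur in these captions and in the results: the combined migration rate $\tilde{m}$ and the proportional increment in $F_{st}$ over the neutral baseline. The sketch below is a reading aid with illustrative function names. The relation $\tilde{m} = m_S + m_P/2$ is inferred from the three quoted parameter sets and is not restated from the paper's equations. The increment follows the Abstract's description (the ratio of $F_{st}$ with background selection to $F_{st}$ without it, minus one). The "observed" value 0.2540 is hypothetical, chosen only to show the arithmetic.

```python
def effective_migration(m_seed, m_pollen):
    """Combined rate m~ = m_S + m_P / 2 (relation inferred from the quoted parameter sets)."""
    return m_seed + m_pollen / 2.0


def proportional_increment(fst_with_bs, fst_neutral):
    """Ratio of F_st with background selection to F_st without it, minus one."""
    return fst_with_bs / fst_neutral - 1.0


# (m_S, m_P, quoted m~) triples from the captions.
quoted = [(0.01, 0.01, 0.015), (0.04, 0.02, 0.05), (0.05, 0.10, 0.10)]
for m_s, m_p, m_tilde in quoted:
    assert abs(effective_migration(m_s, m_p) - m_tilde) < 1e-12

# Hypothetical average F_st of 0.2540 against the neutral expectation 0.2320.
increment = proportional_increment(0.2540, 0.2320)
print(f"{100 * increment:.2f}%")  # prints 9.48%
```

Whether the simulated increments reported in the text were measured against the theoretical neutral expectation or against a simulated neutral run is not stated here, so the baseline in the example is an assumption.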

Compared with the results in the two-locus case, the average $F_{st}$ in the three-locus case is increased. For example, the average proportion of increment in $F_{st}$ at the 250th generation (steady-state value) is about 13.8% for $r_1 = 0.01$ and 11.6% for $r_1 = 0.1$ (Fig. 6a), although these estimates are greater than the expected values of 3.92% and 2.97% (Fig. 3b), respectively. The general pattern for the effect of migration on increasing the proportion of increment in $F_{st}$ can be observed (Fig. 7a). For example, the average proportions of increment in $F_{st}$ at the 250th generation are 13.8% for $\tilde{m} = 0.015$; 15.7% for $\tilde{m} = 0.05$; and 26.4% for $\tilde{m} = 0.10$ (Fig. 7a,b), although these estimates are greater than the expected values of 3.92%, 7.53%, and 8.88%, respectively (Fig. 3a). Again, all predicted $F_{st}$ values are in the range of one standard deviation of the empirical results. In summary, these simulation results demonstrate that the cumulative effect of background selection on $F_{st}$ at a neutral locus can be substantial if the neutral locus is closely linked to multiple selected loci.

6. Discussion

In this paper we have obtained the analytical expressions for population differentiation at a neutral locus in the island model of population structure. The increase in population differentiation as a result of background selection is analytically demonstrated under very general conditions. Although the individual effects of a single selected locus are likely to be small, the cumulative effect of multiple selected loci could be substantial. Our theoretical results can be applied to a

wide situation to map the effects of background selection in terms of population differentiation at a fine genome scale.

[Figure 6: panels (a) average $F_{st}$ and (b) standard deviation of $F_{st}$ plotted against generation (0 to 250), with curves for $r_1 = 0.01$ and $r_1 = 0.10$.]

Fig. 6. Effects of recombination rate on background selection in the three-locus case: (a) average $F_{st}$; (b) standard deviation of $F_{st}$. Results are obtained from 5000 independent simulations, with $m_S = m_P = 0.01$, $u_1 = u_2 = 10^{-5}$, $N_e = 50$, $v = 5 \times 10^{-4}$, $\bar{p}_A = \bar{p}_B = 0.95$, $s_{1i} = s_{2i} = 0.02$ ($i = 1, \ldots, 30$), $r_1 = r_2 = r/2$, $\bar{D}_{AB} = 0.03$, and the conditional probabilities in migrants for the neutral locus under different backgrounds $\bar{y}_0 = 0.82$, $\bar{y}_1 = \bar{y}_2 = 0.7$, and $\bar{y}_3 = 0.2$. The dashed line in (a) refers to the expected $F_{st}$ (0.2320) under the purely neutral process.

[Figure 7: panels (a) average $F_{st}$ and (b) standard deviation of $F_{st}$ plotted against generation (0 to 250), with curves for $\tilde{m} = 0.015$, $\tilde{m} = 0.05$, and $\tilde{m} = 0.10$.]

Fig. 7. Effects of migration on background selection in the three-locus case: (a) average $F_{st}$; (b) standard deviation of $F_{st}$. Results are obtained from 5000 independent simulations, with $u_1 = u_2 = 10^{-5}$, $N_e = 50$, $v = 5 \times 10^{-4}$, $\bar{p}_A = \bar{p}_B = 0.95$, $s_{1i} = s_{2i} = 0.02$ ($i = 1, \ldots, 30$), $r_1 = r_2 = r/2 = 0.01$, $\bar{D}_{AB} = 0.03$, and the conditional probabilities in migrants for the neutral locus under different backgrounds $\bar{y}_0 = 0.82$, $\bar{y}_1 = \bar{y}_2 = 0.7$, and $\bar{y}_3 = 0.2$. The dashed lines in (a) at the positions of $F_{st} = 0.2320$, 0.0847, and 0.0444 refer to the expected values under the purely neutral process with migration rates of $\tilde{m} = 0.015$ ($m_S = m_P = 0.01$), $\tilde{m} = 0.05$ ($m_S = 0.04$; $m_P = 0.02$), and $\tilde{m} = 0.10$ ($m_S = 0.05$; $m_P = 0.1$), respectively.

Although the present result is qualitatively the same as a previous study (Charlesworth et al., 1997), our approach is fundamentally different in theoretical deduction. By partitioning the total genetic diversity into the components of between and within subpopulations, Charlesworth et al. (1997) showed that the increase of $F_{st}$ due to background selection is mainly caused by the decreased diversity within populations. Our general expression of $F_{st}$ is derived using Wright's approach and seems more rigorous.

The expression of $F_{st}$ given by Charlesworth et al. (1997) is only suitable for a pair of subpopulations. Slatkin and Wiehe (1998) have investigated the hitchhiking effects on population differentiation using the island and stepping-stone models of population structure, and presented the analytical expression of $F_{st}$ suitable for the case of two subpopulations. The present model is an extension of the classical island model (Wright, 1969) to incorporate effects of background selection and necessarily expands the study of Charlesworth et al. (1997) to an arbitrary number of subpopulations simultaneously. Further, when the haploid dispersal of pollen is set to zero, our $F_{st}$ formula can be applied to animal populations where only diploid dispersal occurs.

Until now empirical studies have been concentrated on the observations of high $F_{st}$ in the regions of low recombination as a result of hitchhiking effect, such as in Drosophila species (e.g., Stephan and Mitchell, 1992; Begun and Aquadro, 1993). Much less attention has been paid to the effects of background selection on $F_{st}$. A recent study shows that the average level of nucleotide diversity in regions of low recombination can be used to distinguish background selection from hitchhiking effects (Innan and Stephan, 2003). Because population differentiation of a neutral locus is negatively related to the genetic diversity within subpopulations given a total genetic variation, an interesting question is whether the measure of $F_{st}$ can be used to distinguish background selection from hitchhiking effects. The properties of $F_{st}$ under hitchhiking effects (Slatkin and Wiehe, 1998) and background selection (present study) are very similar, displaying a negative correlation with recombination fraction. The "spatial" pattern of $F_{st}$ along chromosomes is likely very similar between these two types of processes, and this problem presents a challenge for future study.

The present theory provides an implemental technique to examine the magnitude of background selection effects. The conventional approaches for estimating population differentiation at neutral loci remain valid (e.g., Weir, 1996). The problem is how to distinguish which loci are purely neutral and not affected by background selection or hitchhiking effects. With molecular genome sequence data, $F_{st}$ can be calculated for all individual SNPs that are distributed on the same chromosome, using the method introduced by Hudson et al. (1992), and hence the pattern of genome-wide $F_{st}$ can be mapped. Those neutral loci with the smallest $F_{st}$ values can be used for approximating $F_{st.b}$, and the effects of background selection at other individual SNPs can be estimated according to Eq. (21). The prerequisites for such genome-wide $F_{st}$ mapping are the tests of neutrality and background selection.

Acknowledgements

We sincerely appreciate Richard A. Ennos and two referees for insightful comments. This work was financially supported by the Department of Renewable Resources, University of Alberta.

Appendix A. Changes in the conditional probability in the two-locus case

Let $P^{*}_{A_iC_i}$ and $P^{*}_{a_iC_i}$ be the frequencies of gametes $A_iC_i$ and $a_iC_i$ after selection in the $i$th population, respectively. The mean fitness in the $i$th subpopulation, denoted by $\bar{w}_i$, is approximated by $\bar{w}_i = 1 - 2p_{a_i}s_{1i}$. The frequency of each two-locus gamete after selection ($P^{*}_{A_iC_i}$ and $P^{*}_{a_iC_i}$) can be calculated using the conventional method,

$$P^{*}_{A_iC_i} = \frac{(1 - p_{a_i}s_{1i})P_{A_iC_i} + d_0r_1}{\bar{w}_i} - (u_1 + v)P_{A_iC_i} - \tilde{m}(P_{A_iC_i} - \bar{P}_{AC}), \qquad \text{(A.1)}$$

$$P^{*}_{a_iC_i} = \frac{(1 - (1 + p_{a_i})s_{1i})P_{a_iC_i} - d_0r_1}{\bar{w}_i} + u_1P_{A_iC_i} - vP_{a_iC_i} - \tilde{m}(P_{a_iC_i} - \bar{P}_{aC}), \qquad \text{(A.2)}$$

where $d_0 = p_{A_i}p_{a_i}(x_{1i} - x_{0i})(1 - s_{1i})$. The changes in gamete frequency ($\Delta P_{A_iC_i} = P^{*}_{A_iC_i} - P_{A_iC_i}$ and $\Delta P_{a_iC_i} = P^{*}_{a_iC_i} - P_{a_iC_i}$) can be calculated from Eqs. (A.1) and (A.2).

From the relation of $P_{A_iC_i} = p_{A_i}x_{0i}$, we obtained $\Delta P_{A_iC_i} = \Delta p_{A_i}x_{0i} + p_{A_i}\Delta x_{0i}$. Since $\Delta p_{A_i} = 0$ at steady state, the change in the conditional probability of $x_{0i}$ is given by

$$\Delta x_{0i} = \Delta P_{A_iC_i}/p_{A_i} = r_1p_{a_i}(1 - s_{1i})(1 + 2p_{a_i}s_{1i})(x_{1i} - x_{0i}) + (p_{a_i}s_{1i} - u_1 - v - \tilde{m})x_{0i} + \tilde{m}\bar{p}_{A_iC_i}/p_{A_i}, \qquad \text{(A.3)}$$

where $\bar{p}_{A_iC_i}$ is the frequency of the gamete $A_iC_i$ in migrants. Similarly, the change in the conditional probability $x_{1i}$ is given by

$$\Delta x_{1i} = \Delta P_{a_iC_i}/p_{a_i} = r_1p_{A_i}(1 - s_{1i})(1 + 2p_{a_i}s_{1i})(x_{0i} - x_{1i}) + (p_{a_i}s_{1i} - s_{1i} - v - \tilde{m})x_{1i} + u_1p_{A_i}x_{0i}/p_{a_i} + \tilde{m}\bar{p}_{a_iC_i}/p_{a_i}, \qquad \text{(A.4)}$$

where $\bar{p}_{a_iC_i}$ is the frequency of the gamete $a_iC_i$ in migrants. Since the genetic drift process does not change the average gamete frequency, Eqs. (A.3) and (A.4) actually represent the per-generation changes in the conditional probabilities of $x_{0i}$ and $x_{1i}$, respectively.

Appendix B. Recurrent equation for the LD between two selected loci

Let $D_{AB(i)}$ be the LD between the A and B loci in the $i$th subpopulation in the current generation (adults), and $r$ be the recombination rate between them. The gamete frequencies at the current generation, denoted by $P_{jl}$ ($j = A_i, a_i$; $l = B_i, b_i$), can be expressed as $P_{jl} = p_jp_l + d_{jl}D_{AB(i)}$, where $d_{A_iB_i} = d_{a_ib_i} = 1$ and $d_{A_ib_i} = d_{a_iB_i} = -1$. The gamete frequencies in pollen and ovules in the next generation, denoted by $P'_{jl}$, can be expressed by $P'_{jl} = p_jp_l + (1 - r)d_{jl}D_{AB(i)}$. After pollen flow the gamete frequencies in pollen, denoted by $P''_{jl}$, can be expressed by $P''_{jl} = m_P\bar{P}_{jl} + (1 - m_P)P'_{jl}$, where $\bar{P}_{jl}$ is the gamete frequency in migrants. The gamete frequencies in ovules remain the same as those in the preceding adults.

After random combination between pollen and ovules, the nine genotypic frequencies in seeds so formed can be readily calculated. For example, the frequency for the genotype $A_iA_iB_iB_i$ in seeds, denoted by $P'_{A_iA_iB_iB_i}$, equals $P'_{A_iB_i}P''_{A_iB_i}$. The frequency for the genotype $A_iA_iB_iB_i$ after seed flow can be expressed as $P''_{A_iA_iB_iB_i} = m_S\bar{P}_{AABB} + (1 - m_S)P'_{A_iA_iB_iB_i}$, where $\bar{P}_{AABB}$ is the frequency in migrating seeds. The frequencies for other genotypes after seed flow can be expressed in a similar way. Following the procedure of mutation and selection, the genotypic frequencies in adults in the next generation can be derived using the conventional method. The frequencies of four gametes after selection in adults, denoted by $P^{*}_{jl}$ ($j = A_i, a_i$; $l = B_i, b_i$), are

$$P^{*}_{A_iB_i} = \tilde{m}\bar{P}_{AB} + (1 - \tilde{m} + p_{a_i}s_{1i} + p_{b_i}s_{2i} - u_1 - u_2)P'_{A_iB_i}, \qquad \text{(B.1a)}$$

$$P^{*}_{A_ib_i} = \tilde{m}\bar{P}_{Ab} + (1 - \tilde{m} + p_{a_i}s_{1i} - p_{B_i}s_{2i} - u_1)P'_{A_ib_i} + u_2P'_{A_iB_i}, \qquad \text{(B.1b)}$$

$$P^{*}_{a_iB_i} = \tilde{m}\bar{P}_{aB} + (1 - \tilde{m} - p_{A_i}s_{1i} + p_{b_i}s_{2i} - u_2)P'_{a_iB_i} + u_1P'_{A_iB_i}, \qquad \text{(B.1c)}$$

$$P^{*}_{a_ib_i} = \tilde{m}\bar{P}_{ab} + (1 - \tilde{m} - p_{A_i}s_{1i} - p_{B_i}s_{2i})P'_{a_ib_i} + u_1P'_{A_ib_i} + u_2P'_{a_iB_i}. \qquad \text{(B.1d)}$$

Let $D^{*}_{AB(i)}$ be the LD in the $i$th subpopulation in the next adult generation. According to Eqs. (B.1a)–(B.1d), the recurrent equation for LD between the A and B loci is derived as

$$D^{*}_{AB(i)} = P^{*}_{A_iB_i}P^{*}_{a_ib_i} - P^{*}_{A_ib_i}P^{*}_{a_iB_i} = \tilde{m}\left(\bar{D}_{AB} + (p_{A_i} - \bar{p}_A)(p_{B_i} - \bar{p}_B)\right) + \left(1 - \tilde{m} - (p_{A_i} - p_{a_i})s_{1i} - (p_{B_i} - p_{b_i})s_{2i} - u_1 - u_2\right)(1 - r)D_{AB(i)}, \qquad \text{(B.2)}$$

where $\bar{D}_{AB}$ is the LD in migrants.

Appendix C. Changes in the conditional probability in the three-locus case

Let $P^{*}_{A_iC_iB_i}$, $P^{*}_{A_iC_ib_i}$, $P^{*}_{a_iC_iB_i}$, and $P^{*}_{a_iC_ib_i}$ be the frequencies of gametes $A_iC_iB_i$, $A_iC_ib_i$, $a_iC_iB_i$, and $a_iC_ib_i$ after selection in the $i$th population, respectively. Using the assumption of multiplicative viability model, the mean fitness in the $i$th subpopulation equals $\bar{W}_i = 1 - 2p_{a_i}s_{1i} - 2p_{b_i}s_{2i}$. Using the same approach as in the two-locus case, the frequencies of the four three-locus gametes are given by

$$P^{*}_{A_iC_iB_i} = \frac{(1 - s_{1i}p_{a_i} - s_{2i}p_{b_i})P_{A_iC_iB_i} - d_1r_1 - d_2r_2 + d_0}{\bar{W}_i} - (u_1 + u_2 + v)P_{A_iC_iB_i} - \tilde{m}(P_{A_iC_iB_i} - \bar{P}_{ACB}), \qquad \text{(C.1a)}$$

$$P^{*}_{a_iC_iB_i} = \frac{(1 - s_{1i}(1 + p_{a_i}) - s_{2i}p_{b_i})P_{a_iC_iB_i} + d_1r_1 + d_3r_2 - d_0}{\bar{W}_i} + u_1P_{A_iC_iB_i} - u_2P_{a_iC_iB_i} - vP_{a_iC_iB_i} - \tilde{m}(P_{a_iC_iB_i} - \bar{P}_{aCB}), \qquad \text{(C.1b)}$$

$$P^{*}_{A_iC_ib_i} = \frac{(1 - s_{1i}p_{a_i} - s_{2i}(1 + p_{b_i}))P_{A_iC_ib_i} + d_4r_1 + d_2r_2 - d_0}{\bar{W}_i} - u_1P_{A_iC_ib_i} + u_2P_{A_iC_iB_i} - vP_{A_iC_ib_i} - \tilde{m}(P_{A_iC_ib_i} - \bar{P}_{ACb}), \qquad \text{(C.1c)}$$

$$P^{*}_{a_iC_ib_i} = \frac{(1 - s_{1i}(1 + p_{a_i}) - s_{2i}(1 + p_{b_i}))P_{a_iC_ib_i} - d_4r_1 - d_3r_2 + d_0}{\bar{W}_i} + u_1P_{A_iC_ib_i} + u_2P_{a_iC_iB_i} - vP_{a_iC_ib_i} - \tilde{m}(P_{a_iC_ib_i} - \bar{P}_{aCb}), \qquad \text{(C.1d)}$$

where

$$d_1 = P_{A_iB_i}P_{a_iB_i}(1 - s_{1i})(y_{0i} - y_{1i}) + (P_{A_iB_i}P_{a_ib_i}y_{0i} - P_{A_ib_i}P_{a_iB_i}y_{1i})(1 - s_{1i} - s_{2i}),$$

$$d_2 = P_{A_iB_i}P_{A_ib_i}(1 - s_{2i})(y_{0i} - y_{2i}) + (P_{A_iB_i}P_{a_ib_i}y_{0i} - P_{A_ib_i}P_{a_iB_i}y_{2i})(1 - s_{1i} - s_{2i}),$$

$$d_3 = P_{a_ib_i}P_{a_iB_i}(1 - 2s_{1i} - s_{2i})(y_{3i} - y_{1i}) + (P_{A_iB_i}P_{a_ib_i}y_{3i} - P_{A_ib_i}P_{a_iB_i}y_{1i})(1 - s_{1i} - s_{2i}),$$

$$d_4 = P_{A_ib_i}P_{a_ib_i}(1 - s_{1i} - 2s_{2i})(y_{3i} - y_{2i}) + (P_{A_iB_i}P_{a_ib_i}y_{3i} - P_{A_ib_i}P_{a_iB_i}y_{2i})(1 - s_{1i} - s_{2i}),$$

$$d_0 = r_1r_2(1 - s_{1i} - s_{2i})\left(P_{A_iB_i}P_{a_ib_i}(y_{0i} + y_{3i} - y_{0i}y_{3i}) - P_{A_ib_i}P_{a_iB_i}(y_{1i} + y_{2i} - y_{1i}y_{2i})\right).$$

$d_0$ is associated with the effect of double crossover among the three loci.

From the expression of $P_{A_iC_iB_i} = P_{A_iB_i}y_{0i}$, we obtain

$$\Delta P_{A_iC_iB_i} = \Delta P_{A_iB_i}y_{0i} + P_{A_iB_i}\Delta y_{0i}. \qquad \text{(C.2)}$$

Since $\Delta P_{A_iB_i} = 0$ at steady state, the change in the conditional probability of $y_{0i}$ is

$$\Delta y_{0i} = \Delta P_{A_iC_iB_i}/P_{A_iB_i} = (s_{1i}p_{a_i} + s_{2i}p_{b_i} - u_1 - u_2 - v - \tilde{m})y_{0i} + (-d_1r_1 - d_2r_2 + d_0)(1 + 2s_{1i}p_{a_i} + 2s_{2i}p_{b_i})/P_{A_iB_i} + \tilde{m}\bar{P}_{ACB}/P_{A_iB_i}, \qquad \text{(C.3a)}$$

where $\Delta P_{A_iC_iB_i} = P^{*}_{A_iC_iB_i} - P_{A_iC_iB_i}$ and $\bar{p}_{ACB}$ is the frequency of gamete $A_iC_iB_i$ in migrants.

Similarly, the changes in the conditional probabilities of $y_{1i}$, $y_{2i}$, and $y_{3i}$ are derived as

$$\Delta y_{1i} = (-s_{1i}p_{A_i} + s_{2i}p_{b_i} + u_1 - u_2 - v - \tilde{m})y_{1i} + (d_1r_1 + d_3r_2 - d_0)(1 + 2s_{1i}p_{a_i} + 2s_{2i}p_{b_i})/P_{a_iB_i} + \tilde{m}\bar{P}_{aCB}/P_{a_iB_i}, \qquad \text{(C.3b)}$$

$$\Delta y_{2i} = (s_{1i}p_{a_i} - s_{2i}p_{B_i} - u_1 + u_2 - v - \tilde{m})y_{2i} + (d_4r_1 + d_2r_2 - d_0)(1 + 2s_{1i}p_{a_i} + 2s_{2i}p_{b_i})/P_{A_ib_i} + \tilde{m}\bar{P}_{ACb}/P_{A_ib_i}, \qquad \text{(C.3c)}$$

$$\Delta y_{3i} = (-s_{1i}p_{A_i} - s_{2i}p_{B_i} + u_1 + u_2 - v - \tilde{m})y_{3i} + (-d_4r_1 - d_3r_2 + d_0)(1 + 2s_{1i}p_{a_i} + 2s_{2i}p_{b_i})/P_{a_ib_i} + \tilde{m}\bar{p}_{a_iC_ib_i}/p_{a_ib_i}, \qquad \text{(C.3d)}$$

where $\bar{p}_{ACb}$, $\bar{p}_{aCB}$, and $\bar{p}_{aCb}$ are the frequencies of gametes $A_iC_ib_i$, $a_iC_iB_i$, and $a_iC_ib_i$ in migrants, respectively.

References

Abecasis, G.R., Noguchi, E., Heinzmann, A., Traherne, J.A., Bhattacharyya, S., Leaves, N.I., Anderson, G.G., Zhang, Y.M., Lench, N.J., Carey, A., Cardon, L.R., Moffatt, M.F., Cookson, W.O.C., 2001. Extent and distribution of linkage disequilibrium in three genomic regions. Am. J. Hum. Genet. 68, 191–197.
Barton, N.H., 2000. Genetic hitchhiking. Philos. Trans. R. Soc. Lond. B 355, 1553–1562.
Begun, D.J., Aquadro, C.F., 1993. African and North American populations of Drosophila melanogaster are very different at DNA level. Nature 365, 548–550.
Bennett, J.H., 1954. On the theory of random mating. Ann. Eugenic. 18, 311–317.
Caballero, A., Hill, W.G., 1992. Effective size of non-random mating populations. Genetics 130, 909–916.
Charlesworth, B., Morgan, M.T., Charlesworth, D., 1993. The effect of deleterious mutations on neutral molecular variation. Genetics 134, 1289–1303.
Charlesworth, B., Morgan, M.T., Charlesworth, D., 1997. The effects of local selection, balanced polymorphism and background selection on equilibrium patterns of genetic diversity in subdivided populations. Genet. Res. 70, 155–174.
Farnir, F., Coppieters, W., Arranz, J.J., Berzi, P., Cambisano, N., Grisart, B., Karim, L., Marcq, F., Moreau, L., Mni, M., Nezer, C., Simon, P., Vanmanshoven, P., Wagenaar, D., Georges, M., 2000. Extensive genome-wide linkage disequilibrium in Cattle. Genome Res. 10, 220–227.
Goddard, K.A., Hopkins, P.J., Hall, J.M., Witte, J.S., 2000. Linkage disequilibrium and allele-frequency distribution for 114 single-nucleotide polymorphisms in five populations. Am. J. Hum. Genet. 66, 216–234.
Hill, W.G., 1974. Disequilibrium among several linked neutral genes in finite populations. I. Mean changes in disequilibrium. Theor. Popul. Biol. 5, 366–392.
Hill, W.G., Robertson, A., 1968. Linkage disequilibrium in finite populations. Theor. Appl. Genet. 38, 226–231.
Hu, X.S., 2000. A preliminary approach to the theory of geographical gene genealogy for plant genomes with three different models of inheritance and its application. Acta Genetica Sinica 27, 440–448.
Hu, X.S., Ennos, R.A., 1999. Impacts of seed and pollen flow on population differentiation for plant genomes with three contrasting modes of inheritance. Genetics 152, 441–450.
Hu, X.S., Li, B.L., 2002. Seed and pollen flow and cline discordance among genes with different modes of inheritance. Heredity 88, 212–217.
Hudson, R.R., Kaplan, N.L., 1995. Deleterious background selection with recombination. Genetics 141, 1605–1617.
Hudson, R.R., Slatkin, M., Aguadé, M., 1992. Estimation of levels of gene flow from DNA sequence data. Genetics 132, 583–589.
Innan, H., Stephan, W., 2003. Distinguishing the hitchhiking and background selection models. Genetics 165, 2307–2312.
Li, W.H., Nei, M., 1974. Stable linkage disequilibrium without epistasis in subdivided populations. Theor. Popul. Biol. 6, 173–183.
Maynard Smith, J., Haigh, J., 1974. The hitch-hiking effect of a favorable gene. Genet. Res. 23, 23–35.
Nordborg, M., 1997. Structured coalescent process on different time scales. Genetics 146, 1501–1514.
Nordborg, M., Charlesworth, B., Charlesworth, D., 1996. The effect of recombination on background selection. Genet. Res. 67, 159–174.
Rafalski, J.A., 2002. Novel genetic mapping tools in plants: SNPs and LD-based approaches. Plant Sci. 162, 329–333.
Shifman, S., Kuypers, J., Kokoris, M., Yakir, B., Darvasi, A., 2003. Linkage disequilibrium patterns of the human genome across populations. Hum. Mol. Genet. 12, 771–776.
Slatkin, M., 1975. Gene flow and selection in a two-locus system. Genetics 81, 787–802.
Slatkin, M., Wiehe, T., 1998. Genetic hitch-hiking in a subdivided population. Genet. Res. 71, 155–160.
Stephan, W., Mitchell, S.J., 1992. Reduced levels of DNA polymorphism and fixed between-population differences in the centromeric region of Drosophila ananassae. Genetics 132, 1039–1045.
Wang, D.G., Fan, J.B., Siao, C.J., et al., 1998. Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science 280, 1077–1082.
Weir, B.S., 1996. Genetic Data Analysis II. Methods for Discrete Population Genetic Data. Sinauer Associates, Sunderland, MA.
Wright, S., 1943. Isolation by distance. Genetics 28, 114–138.
Wright, S., 1969. Evolution and the Genetics of Populations. vol. 2. The Theory of Gene Frequencies. The University of Chicago Press, Chicago.