INVESTIGATION

Hitchhiking Effect of a Beneficial Mutation Spreading in a Subdivided Population

Yuseob Kim*,†,1 and Takahiro Maruki* *Center for Evolutionary Medicine and Informatics and School of Life Sciences, Arizona State University, Tempe, Arizona 85287, and †Department of Life Science, Ewha Womans University, Seoul, Korea 120-750

ABSTRACT A central problem in population genetics is to detect and analyze positive natural selection by which beneficial mutations are driven to fixation. The hitchhiking effect of a rapidly spreading beneficial mutation, which results in local removal of standing genetic variation, allows such an analysis using DNA sequence polymorphism. However, the current mathematical theory that predicts the pattern of genetic hitchhiking relies on the assumption that a beneficial mutation increases to a high frequency in a single random-mating population, which is certainly violated in reality. Individuals in natural populations are distributed over a geographic space. The spread of a beneficial allele can be delayed by limited migration of individuals over the space and its hitchhiking effect can also be affected. To study this effect of geographic structure on genetic hitchhiking, we analyze a simple model of directional selection in a subdivided population. In contrast to previous studies on hitchhiking in subdivided populations, we mainly investigate the range of sufficiently high migration rates that would homogenize genetic variation at neutral loci. We provide a heuristic mathematical analysis that describes how the genealogical structure at a neutral locus linked to the locus under selection is expected to change in a population divided into two demes. Our results indicate that the overall strength of genetic hitchhiking (the degree to which expected heterozygosity decreases) is diminished by population subdivision, mainly because opportunity for the breakdown of hitchhiking by recombination increases as the spread of the beneficial mutation across demes is delayed when migration rate is much smaller than the strength of selection.
Furthermore, the amount of genetic variation after a selective sweep is expected to be unequal over demes: a greater reduction in expected heterozygosity occurs in the subpopulation from which the beneficial mutation originates than in its neighboring subpopulations. This raises a possibility of detecting a “hidden” geographic structure of population by carefully analyzing the pattern of a selective sweep.

Copyright © 2011 by the Genetics Society of America
doi: 10.1534/genetics.111.130203
Manuscript received May 1, 2011; accepted for publication June 2, 2011
1Corresponding author: Department of Life Science and Division of EcoScience, Ewha Womans University, 11-1 Daehyun-dong, Seodaemun-Ku, Seoul, Korea 120-750. E-mail: [email protected]

When a beneficial mutation arises in a population and rapidly increases to high frequency by directional selection, it also increases the frequency of neutral alleles on the same chromosome at linked polymorphic loci, resulting in a sudden reduction in genetic variation. This effect, termed genetic hitchhiking (Maynard Smith and Haigh 1974) or selective sweep, provides a powerful means to identify and study recent episodes of adaptive evolution (reviewed in Nielsen 2005; Sabeti et al. 2006; Thornton et al. 2007; Akey 2009; Stephan 2010). Numerous studies advanced the mathematical model of this evolutionary process and now provide accurate theoretical predictions and tools for genomic data analyses (Maynard Smith and Haigh 1974; Kaplan et al. 1989; Stephan et al. 1992; Fay and Wu 2000; Kim and Stephan 2002; Hermisson and Pennings 2005; Etheridge et al. 2006). However, major theoretical results were obtained from models that consider the spread of a beneficial mutation in a single random-mating population.

Natural populations are composed of individuals that are distributed over geographical space. Therefore, with limited migration, mating occurs more frequently among individuals that are close to each other than among those that are far apart on the space. This geographic structure is often reduced to a simple demographic model in which a number of small populations (or demes), each of which is panmictic, are connected to each other by limited migration (Wright 1940). In this model, a proportion $m$ of individuals in a deme is replaced by migrants from other demes each generation. A fundamental result of spatial population genetics is that, if $m$ is sufficiently large so that $Nm > 1$, where $N$ is the population size of a deme, neutral genetic variation is homogenized over the entire system of connected demes at equilibrium

under mutation–drift–migration balance (Slatkin 1987). Therefore, even when a population has a clear geographic structure ($m \ll 0.5$), its pattern of polymorphism at many neutral markers may not deviate from that under panmixia. In this case, modeling a natural population to be panmictic would not cause much problem.

Genetics, Vol. 189, 213–226 September 2011

However, if an evolutionary process occurs at a timescale that is much shorter than that of the neutral coalescent, limited migration may lead to a heterogeneous footprint of the process over geographic space. The spread of a beneficial mutation is a case of such fast evolutionary change. Let the frequency of a new beneficial mutation initially increase by a factor of $1 + s$ per generation in a local deme. Then, its spread to the entire population will be limited by migration if $m < s$, regardless of the value of $Nm$. Namely, the geographical structure of natural populations is expected to cause the allele frequency trajectory of a beneficial allele to deviate from that in a panmictic population. Then, the hitchhiking effect of the beneficial mutation spreading over a subdivided population may be different from that in a panmictic population. For example, Barton (2000) predicted that a beneficial mutation spreading over a geographic distance will produce a weaker hitchhiking effect because the time taken for the mutation to reach fixation is longer compared to the case of panmixia.

The theory of genetic hitchhiking in a subdivided population was developed in previous studies, notably by Slatkin and Wiehe (1998) and Santiago and Caballero (2005). They mainly analyzed the model in which subpopulations are isolated with weak migration ($Nm \ll 1$), under which a beneficial mutation goes to fixation in one population before it starts to increase in the next population (i.e., the locus under selection is polymorphic in one deme only at a given time). Their results indicated that, if neutral variation near the target of selection is initially homogeneous among subpopulations, the sequential fixations of the beneficial allele may create genetic differentiation, because occasional recombination may cause different neutral alleles at a locus to hitchhike in different subpopulations. Therefore, Wright's $F_{ST}$ would increase from a small to an intermediate value by hitchhiking (Slatkin and Wiehe 1998; Bierne 2010). On the other hand, if subpopulations are highly differentiated initially, the fixation of a beneficial allele in the entire population reduces the level of differentiation at linked neutral loci, because common neutral alleles are likely to hitchhike to the beneficial mutation with limited recombination. Therefore, $F_{ST}$ decreases from a high to an intermediate value (Santiago and Caballero 2005). These theoretical studies helped interpret the geographic pattern of neutral polymorphism shaped by natural selection in a highly structured species such as Drosophila ananassae (Stephan et al. 1998; Baines et al. 2004; Das et al. 2004) and others (Faure et al. 2008).

In this study, we are interested in a subdivided population with more frequent migration ($Nm > 1$), under which the pattern of long-term neutral variation would be similar to that under a panmictic population and, more importantly, the frequency of a beneficial allele may be rising concurrently, if not in complete synchrony, in multiple demes. This biological condition is important because, as explained above, natural populations have a geographic structure that can modify the frequency trajectories of beneficial mutations over space while this structure is often undetected at the neutral loci (thus the genomic average pattern of variation) as $Nm > 1$. We are investigating the effect of this hidden geographic structure on the pattern of selective sweeps. Our analysis begins with a simple model in which two equal-sized panmictic populations are connected by limited migration and a beneficial mutation occurs in one deme and then spreads to the entire population. Even with this simple model, we could provide only heuristic mathematical analysis based on numerous simplifying assumptions and then compare the solution with computer simulations. However, our limited results still clearly demonstrate the important effect of population subdivision on the pattern of genetic hitchhiking.

Model

A brief overview of the model of selective sweep in a subdivided population, in comparison with that in a panmictic population, is presented in Figure 1. A positively selected allele, B, rapidly spreads in the population in association with a particular neutral "hitchhiker" allele at a linked locus. Such an association is broken down by recombination events, which allow the residual levels of polymorphism after the fixation of the B allele. Key differences in the process of genetic hitchhiking between panmictic and subdivided populations are identified in Figure 1. First, it takes longer for B to be fixed in a subdivided population because the spread of B in the second deme starts only after B has increased to a sufficiently high frequency in the first deme. Second, in the panmictic population, opportunity for the recombination breakdown of allelic association monotonically decreases while the frequency of B increases. On the other hand, such recombination occurs as a two-step process in the subdivided population, first in deme 1 and later in deme 2. This may effectively increase the overall rate of recombination breakdown and thus weaken hitchhiking. Importantly, depending on which B-bearing chromosomes migrate and increase under positive selection in the second deme, the same or different alleles at the neutral locus may be hitchhiked to high frequency in the two demes.

In our model of a subdivided population, $2NK$ haploid individuals are subdivided equally into $K$ demes. Unless stated otherwise, demes are structured according to the circular stepping-stone model if $K > 2$. Demes are indexed by 1 to $K$, indicating their spatial order. Demes 1 and $K$ are neighboring each other, thus making a circular population structure. Generations are not overlapping and, in each generation, haploids reproduce in the order of selection, recombination, and migration. During migration, the proportion $m$ of haploids in deme $j$ moves to neighboring demes ($m/2$ to deme $j - 1$ and

$m/2$ to deme $j + 1$). Note that for $K = 2$, two demes exchange $2Nm$ migrants per generation. For modeling the hitchhiking effect, we consider two biallelic loci, a selected locus and a neutral locus, that are partially linked with the probability of recombination $r$ per generation. At the selected locus, mutation from the wild-type allele b to a beneficial allele B, with selective advantage $s$, arises on a particular chromosome in deme 1. At the time of this mutation, each of $K$ demes is polymorphic at the neutral locus, with A and a alleles in frequencies $p_0$ and $1 - p_0$, respectively. After genetic hitchhiking, the frequency of A in deme $j$ changes to $p_j$ and $p = \sum_j p_j / K$. We are mainly concerned about the change in heterozygosity in the entire population (from $\tilde{H} = 2p_0(1 - p_0)$ to $H^{(T)} = 2p(1 - p)$).

Figure 1 Schematic illustration of the two-locus two-allele model of selective sweeps in panmictic and subdivided ($K = 2$) populations. Chromosomes found in populations are shown to carry alleles at a locus under selection (wild-type allele b or beneficial allele B) and a neutral locus (allele A or a). The panmictic population is initially fixed for the b allele and a single copy of the B allele appears on a chromosome carrying allele A (the "hitchhiker" allele), which exists in equal frequency with a (stage 1a). The frequency of haplotype BA thus increases rapidly under positive selection and limited recombination. It is shown that a BA chromosome undergoes recombination with a ba chromosome to allow the spread of Ba haplotype in the population (1b indicated by "×"). Allele a thus survives the wipeout but exists in low frequency when B reaches fixation (1c). In the subdivided population, the rapid spread of B, also in association with A, is initially limited to the first deme (2a and 2b). While B is increasing in frequency, its association with A is broken by recombination (2b) and a chromosome carrying the B and the A allele migrates to the second deme and starts increasing there (2c). B in the second deme is also subject to recombination with a (2c). B is fixed in the first deme while it is still in intermediate frequency in the second deme (2d). When B becomes fixed in the entire population (2e), allele a is in low frequency in both demes. Note that it could have been a Ba chromosome instead of a BA chromosome that migrated (in stage 2c) and initiated the spread of B in the second deme, in which case allele a would become dominant in the second deme and result in much less change in the overall allele frequency at the neutral locus in the entire population.

A forward-in-time simulation is built directly on this discrete-time genetic model. At each generation, haplotype frequencies at $K$ demes are changed deterministically by selection, recombination, and migration, followed by the step of random sampling that uses a random binomial number generator (Kim and Wiehe 2009). The initial allele frequency at the neutral locus is given as a fixed value (0.2) for all demes. We also specified initial frequencies for $K$ demes sampled from equilibrium distribution at the balance of mutation, migration, and drift (obtained by separate forward-in-time simulations) but this did not yield different outcomes when the hitchhiking effect was measured by $H^{(T)}/\tilde{H}$ (data not shown). Initial haplotypes were given such that it simulates the occurrence of a beneficial mutation at a randomly chosen chromosome. If the beneficial mutation is lost, the simulation run is repeated from a new initial condition until it reaches fixation in the entire population. All simulation results are based on 10,000 replicates for each parameter combination.

Migration-Limited Trajectory of Beneficial Mutation

The frequency of beneficial mutation rapidly increases within a deme by positive selection. However, its spread into the entire population might be limited by the rate of migration between demes. Namely, there might be a "delay" in the fixation of beneficial mutation in a subdivided population compared to the panmictic population of equal size. To analyze the hitchhiking effect in this model, we first need to understand how much delay in the spread of beneficial mutation is caused by geographical structure of the population. We first consider the case of $K = 2$. It is assumed that the beneficial mutation arising in deme 1 is eventually fixed in both demes 1 and 2. Let $X_j(T)$ be the frequency of allele B in deme $j$ at time $T$, when time is counted forward in generations and $T = 0$ when the mutation to allele B happened in deme 1. Then, we define $\hat{T}_j = \max_T\,[X_j(T) > 0 \text{ and } X_j(T-1) = 0]$. Namely, $\hat{T}_j$ marks the time when the copy of allele B that survives loss by genetic drift is established in deme $j$. We define the delay in the spread of allele B by

$$
d = |\hat{T}_2 - \hat{T}_1|. \tag{1}
$$

(Note that, with $m \ll s$, $\hat{T}_2 > \hat{T}_1 = 0$ in most cases. However, with more frequent migration, deme 1 may lose the B allele by genetic drift but later receive the allele from deme 2. In

this case the roles of deme 1 and 2 are reversed and the delay becomes $\hat{T}_1 - \hat{T}_2$.)

The expected value of $d$ can be solved by approximating $X_1(T)$ by the deterministic trajectory of a beneficial mutation, starting from frequency $\varepsilon$ ($\ll 1$), in a single panmictic population of size $2N$. It is therefore assumed that migration between two populations does not affect the trajectory, which might be justified for $m < s$. Therefore, we define

$$
X_1(T) = \frac{\varepsilon}{\varepsilon + \exp(-sT)} \tag{2}
$$

(Stephan et al. 1992). In each generation, on average $2NmX_1(T)$ copies of allele B enter deme 2 from deme 1. Each copy of B is lost by genetic drift with probability $1 - 2s$, approximately, assuming $s \ll 1$. At least one copy of B successfully establishes in deme 2 at time $T$ with probability $1 - (1 - 2s)^{2NmX_1(T)}$. Therefore, the mean waiting time until the first occurrence of such a migrant copy is given by (using $\sum_{n=1}^{\infty} n(1 - a_n)\prod_{i=0}^{n-1} a_i = \sum_{n=1}^{\infty}\prod_{i=0}^{n} a_i$ if $|a_n| < 1$ for all $n$, and making a Taylor series approximation)

$$
\begin{aligned}
E[d] &= \sum_{T=1}^{\infty} T\left[1 - (1-2s)^{2NmX_1(T)}\right]\prod_{i=0}^{T-1}(1-2s)^{2NmX_1(i)} \\
&= \sum_{T=1}^{\infty}\prod_{i=0}^{T}(1-2s)^{2NmX_1(i)} \\
&\approx \sum_{T=1}^{\infty}\exp\!\left(-4Nsm\sum_{i=0}^{T}X_1(i)\right) \\
&\approx \int_0^{\infty}\exp\!\left(-4NsmY_1(T)\right)dT.
\end{aligned}
$$

Here,

$$
Y_1(T) = \sum_{i=0}^{T} X_1(i) \approx \int_0^T X_1(z)\,dz = \int_0^T \frac{\varepsilon}{\varepsilon + e^{-sz}}\,dz = \frac{1}{s}\log\frac{1 + \varepsilon e^{sT}}{1 + \varepsilon}.
$$

Therefore,

$$
E[d] \approx \int_0^{\infty}\left(\frac{1 + \varepsilon}{1 + \varepsilon e^{sT}}\right)^{4Nm} dT \approx \int_0^{\infty}\frac{1}{(1 + \varepsilon e^{sT})^{4Nm}}\,dT \equiv \bar{d}. \tag{3}
$$

An essentially identical result was obtained by Slatkin (1976). With $4Nm \gg 1$, $\bar{d}$ is mostly determined by the above integration in the interval $[0, -(1/s)\ln\varepsilon]$, where we may use $(1 + \varepsilon e^{sT})^{4Nm} \approx 1 + 4Nm\,\varepsilon e^{sT}$. Then, a simpler approximation (however, an overestimate of $\bar{d}$) is obtained as

$$
\hat{d} \equiv \int_0^{\infty}\frac{1}{1 + 4Nm\,\varepsilon e^{sT}}\,dT = \frac{1}{s}\log\frac{4Nm\varepsilon + 1}{4Nm\varepsilon}. \tag{4}
$$

Considering that the allele frequency of a beneficial mutation starting from one copy is elevated by the inverse of its fixation probability (i.e., probability of surviving extinction by genetic drift, $\approx 2s$) relative to its deterministic trajectory, we may use $\varepsilon = 1/(2N)/(2s) = 1/(4Ns)$ to maximize the fit of Equation 2 to the stochastic trajectory of allele B (Maynard Smith 1971; Kim and Nielsen 2004). Then, we obtain the approximation of the mean delay time

$$
\hat{d} \equiv \frac{1}{s}\log\left(1 + \frac{s}{m}\right). \tag{5}
$$

This result shows that the delay time is on the same order ($1/s$) as the duration of the fixation process ($\tau = -(2/s)\log(\varepsilon)$) and critically depends on the relative ratio of $s$ and $m$. Figure 2 shows the comparison of these approximations to the delay time observed in frequency-based stochastic simulations.

Hitchhiking Effect of the Beneficial Mutation: "Marked" Coalescent

Next, we analyze the hitchhiking effect of a beneficial mutation that spreads across demes in the manner described above. Our goal is to obtain an approximate solution for the change of heterozygosity due to hitchhiking. The analysis is based on modeling genetic hitchhiking by the "marked" coalescent, which is derived from the structural coalescent model of genetic hitchhiking (Kaplan et al. 1989). In the structural coalescent model, gene lineages at a neutral locus traced from a present-day sample into the past are described to jump between two genetic backgrounds (corresponding to beneficial, B, and ancestral, b, alleles) by recombination, while coalescence among lineages is allowed only when they are on the same background. It can be shown that, during the selective phase (the period between the birth and fixation of the B allele), lineages that arrive in the b background rarely move back to the B background and also rarely coalesce to each other (Kaplan et al. 1989; Durrett and Schweinsberg 2004; Etheridge et al. 2006; Pfaffelhuber et al. 2006). If we thus model that lineages moving from the B to the b background by recombination remain distinct until they exit the selective phase, the genealogy at the neutral locus is fully described by events of such recombination mapped along the genealogy at the selected locus (referred to below as the B genealogy). Therefore, one may first obtain a B genealogy and then calculate how often it is marked by recombination events to predict the pattern of genetic variation at a linked neutral locus (Pfaffelhuber et al. 2006). Here we first show how a marked coalescent allows a simple derivation of the standard result of genetic hitchhiking [hard selective sweep in a single random mating population (Maynard Smith and Haigh 1974)].

As we are interested in obtaining the heterozygosity after the fixation of the beneficial mutation, we obtain the B genealogy starting from two distinct beneficial mutations on sampled chromosomes. We consider a single panmictic population of size $2N$ in which the beneficial mutation is quasi-fixed with frequency $1 - \varepsilon$. Time, $t$, is now counted backward in generations, $t = 0$ being the present (time of sampling). Then, the frequency of beneficial mutation, B, in the population is modeled to decrease from $1 - \varepsilon$ to $\varepsilon$ by

$$
x(t) = \frac{\varepsilon}{\varepsilon + \exp(-s(\tau - t))}, \quad (0 \le t \le \tau), \tag{6}
$$

where $\tau = -(2/s)\log(\varepsilon)$ specifies the time when the B allele is introduced in the population. The lineages of the two B alleles will coalesce at $t = t_C$ ($\le \tau$). While tracing each lineage of B, the associated allele at the neutral locus may recombine onto a chromosome carrying allele b at some time between $t = 0$ and $t_C$. The probability that this event happens at time $t$, given that it did not happen in the previous $t - 1$ generations, is $(1 - x(t))r$, which is an increasing function with $t$. If no recombination event occurs on either lineage, the two linked neutral lineages will coalesce at $t = t_C$, which happens with probability $P_{\mathrm{coal}}$,

$$
P_{\mathrm{coal}} = \int_0^{\tau} f_c(t)\prod_{z=0}^{t}\left(1 - (1 - x(z))r\right)^2 dt, \tag{7}
$$

where $f_c(t)$ is the probability distribution of the coalescent time at the selected locus. It is given approximately by

$$
f_c(t) = \frac{1}{2Nx(t)}\prod_{i=0}^{t-1}\left(1 - \frac{1}{2Nx(i)}\right), \quad (0 \le t \le \tau). \tag{8}
$$

Figure 2 Delay in the spread of a beneficial mutation in a subdivided population with two demes as the function of migration rate, for $s = 0.01$ (gray) and 0.1 (black) and $2N = 10^4$. Expected delay, $\bar{d}$, by Equation 3 is shown by solid curves and its approximation, $\hat{d}$, is given by dashed curves. Results from frequency-based forward simulations (mean ± SD) are also shown.

This function has a peak close to $\tau$. Then, we may further approximate $f_c(t)$ to be 1 for $t = \tau$ and 0 for $t < \tau - 1$: the coalescent occurs at the time of beneficial mutation. This is equivalent to ignoring genetic drift within the allelic class of B (Maynard Smith and Haigh 1974; Barton 1998; Nielsen et al. 2005). Then,

$$
\begin{aligned}
P_{\mathrm{coal}} &\approx \prod_{t=0}^{\tau}\left(1 - (1 - x(t))r\right)^2 \approx \exp\!\left[-2r\int_0^{\tau}(1 - x(t))\,dt\right] \\
&= \exp\!\left[-2r\int_0^{\tau}\frac{\exp(-s(\tau - t))}{\varepsilon + \exp(-s(\tau - t))}\,dt\right] = \varepsilon^{2r/s}.
\end{aligned} \tag{9}
$$

Ignoring the probability of mutation in the period between $t = 0$ and $t = \tau$, the two neutral alleles in the sample will be observed different only if their ancestors at $t = \tau$ are distinct, which happens with probability $1 - P_{\mathrm{coal}}$, and different. Let $\tilde{H}$ be the expected heterozygosity at the neutral locus at $t = \tau$. Then, the mean heterozygosity at the neutral locus at present is given approximately by

$$
H = \tilde{H}\left(1 - \varepsilon^{2r/s}\right). \tag{10}
$$

Using $\varepsilon = 1/(2N)$, we obtain the solution of Maynard Smith and Haigh (1974). As explained above, a better solution that corrects for the stochastic effect of conditioning on the fixation of beneficial mutation is obtained by using $\varepsilon = 1/(4Ns)$, which agrees reasonably well with stochastic simulation results (Kim and Nielsen 2004). More accurate solutions available so far effectively take the full distribution of the coalescent time (Equation 8) (Stephan et al. 1992; Barton 1998) and furthermore the full stochastic trajectory of beneficial mutation (Barton 1998; Etheridge et al. 2006; Pfaffelhuber et al. 2006) into account. However, in this study, we aim to derive solutions for the accuracy of simple approximation $H/\tilde{H} = 1 - (4Ns)^{-2r/s}$, i.e., ignoring genetic drift within the subpopulation of beneficial alleles, as our primary aim is to examine the relative strengths of hitchhiking in panmictic vs. subdivided populations.

Hitchhiking Effect in a Subdivided Population, K = 2

Next, we come back to the model of a subdivided population with $K = 2$ in which a beneficial mutation, B, arising first in deme 1, propagates into deme 2 with a delay of $d$ generations. We assume that $m < s \ll 1$ but $4Nm \gg 1$. With this migration rate, the pattern of variation at a neutral locus without selection is close to that of neutral equilibrium in a panmictic population. We thus assume that the heterozygosity at the neutral locus immediately before the time of beneficial mutation ($t = d + \tau$ as defined below) is given by $\tilde{H}$ regardless of whether two chromosomes are sampled from deme 1 only, deme 2 only, or from demes 1 and 2, respectively. However, the corresponding expected heterozygosities may not be equal after the allele B is fixed in the entire population. We denote them as $H^{(11)}$, $H^{(22)}$, and $H^{(12)}$, respectively.

Again, time is counted backward from the present ($t = 0$), at which the allele B is fixed in the entire population and two haploids are sampled randomly. Assuming migration is weaker than selection (once a copy of B enters a deme, its frequency increases mostly due to selection without being affected by continuous in- and outflow of B), we may model the trajectory of allele B frequency in deme 1 and deme 2 by

$$
x_1(t) = \frac{\varepsilon}{\varepsilon + \exp(-s(\tau + d - t))}, \quad (0 \le t \le \tau + d) \tag{11}
$$

and

$$
x_2(t) =
\begin{cases}
\dfrac{\varepsilon}{\varepsilon + \exp(-s(\tau - t))}, & (0 \le t \le \tau) \\[2ex]
0, & (\tau < t \le \tau + d),
\end{cases} \tag{12}
$$

respectively. On the basis of this trajectory of the frequency of B, its hitchhiking effect on the neutral locus is determined by first obtaining the coalescent tree at the selected locus and then marking the recombination events on this tree. Below, approximations are obtained for strong selection relative to migration.

Figure 3 Gene genealogy at the locus under positive directional selection in a subdivided population. Time is counted backward from present (time 0). Gray curves show the frequency trajectory of allele B. Three lineages traced from three chromosomes that are sampled in deme 2 are labeled by a, b, and c. The a and b lineages coalesce in deme 1 after they migrate to deme 1 at time $t_{m1}$ and $t_{m2}$, respectively. The b and c lineages coalesce in deme 2.

The lineage of a copy of a B allele (a "B lineage") in deme 1 will jump to deme 2 with probability $2Nmx_2(t+1)/(2Nx_1(t)) \approx mx_2(t)/x_1(t)$ and that in deme 2 will jump to deme 1 with probability $mx_1(t)/x_2(t)$. Then, if selection is strong such that $m \ll 1/\tau$ (but still $Nm > 1$), migrations of B lineages in either direction will be rare until $x_2(t)$ becomes close to zero, which drastically increases the migration rate from deme 2 to deme 1. On the other hand, if the strength of selection is moderate such that $m > 1/\tau \approx s/\log[4Ns]$, there might be multiple jumps of B lineages in both directions between times 0 and $\tau$. To derive simple approximations, we mainly consider the range of large $s/m$ (or $m \ll 1/\tau$). The applicability of our approximations to a wider parameter range is examined by computer simulations. We also provide an alternative derivation for $m > 1/\tau$ in the Appendix.

i. Two gene copies from deme 2

As will be demonstrated shortly, the effect of population subdivision on the hitchhiking effect, examined by relative reduction in expected heterozygosity, may be most pronounced in deme 2 (i.e., effect on $H^{(22)}/\tilde{H}$). Consider a sampling of two chromosomes from deme 2 (choosing two of three lineages shown in Figure 3). The "B genealogy" is determined as we trace the lineages of two B alleles on these chromosomes backward in time until they coalesce. Two mutually exclusive genealogical events occur. One event is the separate migrations of two lineages, from deme 2 to deme 1, followed by the coalescence in deme 1. The other event is the coalescence of two lineages in deme 2, followed by the migration of the common ancestor into deme 1. As $m \ll 1/\tau$, we ignore the possibility that a B lineage migrates to deme 1 and then later migrates back to deme 2 and coalesces. Therefore, once any of these two lineages migrates to deme 1, the coalescence of the two lineages should happen in deme 1. The coalescence in deme 1 must occur before $t = \tau + d$ and that in deme 2 before $t = \tau$. The probabilities of these two events are given by $1 - Q$ and $Q$, respectively.

Again, the probability that a given lineage in deme 2 migrates to deme 1 is $mx_1(t)/x_2(t)$. The probability that two lineages at deme 2, which remained distinct until time $t - 1$, coalesce at time $t$ is $1/(2Nx_2(t))$. Then, the probability that the coalescent event occurs at deme 2 at time $t$ is given by

$$
\begin{aligned}
f_{2c}(t) &= \frac{1}{2Nx_2(t)}\prod_{i=0}^{t-1}\left(1 - \frac{1}{2Nx_2(i)} - \frac{2mx_1(i)}{x_2(i)}\right) \\
&\approx \frac{1}{2Nx_2(t)}\exp\!\left[-\sum_{i=0}^{t-1}\left(\frac{1}{2Nx_2(i)} + \frac{2mx_1(i)}{x_2(i)}\right)\right] \\
&\approx \frac{1}{2Nx_2(t)}\exp\!\left[-\int_0^{t}\frac{1 + 4Nmx_1(z)}{2Nx_2(z)}\,dz\right] \\
&= \frac{1}{2Nx_2(t)}\,x_1(t)^{(2m/s)(e^{sd} - 1)} \times \exp\!\left[-\left(\frac{1}{2N} + 2m\right)t - \frac{\varepsilon}{2Ns}\left(e^{st} - 1\right)\right].
\end{aligned} \tag{13}
$$

The total probability of the coalescence in deme 2 is therefore

$$
Q = \sum_{t=1}^{\tau} f_{2c}(t) \approx \int_0^{\tau} f_{2c}(t)\,dt. \tag{14}
$$

With probability $1 - Q$, the coalescence of B lineages occurs at deme 1. In this case, it is modeled that one B lineage first migrates to deme 1 at time $t_{m1}$ and the other later at $t_{m2}$ ($t_{m1} < t_{m2}$). The probability distribution of $t_{m1}$ is obtained using a similar approximation as

$$
\begin{aligned}
f_{m1}(t) = \mathrm{Prob}[t_{m1} = t] &= \frac{2mx_1(t)}{x_2(t)}\prod_{i=0}^{t-1}\left(1 - \frac{1}{2Nx_2(i)} - \frac{2mx_1(i)}{x_2(i)}\right) \\
&\approx \frac{2m}{x_2(t)}\,x_1(t)^{1 + (2m/s)(e^{sd} - 1)} \times \exp\!\left[-\left(\frac{1}{2N} + 2m\right)t - \frac{\varepsilon}{2Ns}\left(e^{st} - 1\right)\right] \\
&= 4Nmx_1(t)\,f_{2c}(t).
\end{aligned} \tag{15}
$$
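The competing coalescence and migration probabilities of Equations 13 to 15 can be checked numerically. The following is a minimal sketch, not the authors' simulation code: it evaluates the discrete product forms (the first lines of Equations 13 and 15) on a grid of generations, using the trajectories of Equations 11 and 12, $\varepsilon = 1/(4Ns)$, $\tau = -(2/s)\log\varepsilon$, and the delay of Equation 5. The function name `sweep_quantities`, its argument names, and the default parameter values (taken from the Figure 4 legend) are assumptions made for illustration.

```python
import math

def sweep_quantities(two_n=1e5, s=0.01, m=1e-4):
    """Evaluate f_2c(t) and f_m1(t) on a generation grid (illustrative sketch).

    two_n is the deme size 2N; parameter defaults are assumed from Figure 4.
    """
    eps = 1.0 / (2.0 * two_n * s)        # eps = 1/(4Ns), written with 2N
    tau = -(2.0 / s) * math.log(eps)     # duration of the selective phase
    d = math.log(1.0 + s / m) / s        # delay approximation, Equation 5
    f2c, fm1 = [], []
    surv = 1.0  # probability of neither coalescence nor migration before t
    for t in range(int(tau) + 1):
        x1 = eps / (eps + math.exp(-s * (tau + d - t)))  # Equation 11
        x2 = eps / (eps + math.exp(-s * (tau - t)))      # Equation 12, t <= tau
        h_coal = 1.0 / (two_n * x2)      # coalescence of two lineages in deme 2
        h_mig = 2.0 * m * x1 / x2        # either lineage migrating to deme 1
        f2c.append(h_coal * surv)        # Equation 13, product form
        fm1.append(h_mig * surv)         # Equation 15, product form
        surv *= 1.0 - h_coal - h_mig
    return tau, d, f2c, fm1, surv

tau, d, f2c, fm1, surv = sweep_quantities()
Q = sum(f2c)  # Equation 14: total probability of coalescence in deme 2
print(f"tau = {tau:.1f}, d = {d:.1f}, Q = {Q:.3f}, 1 - Q = {1.0 - Q:.3f}")
```

Because the two events are mutually exclusive at every step, the sums of `f2c` and `fm1` plus the residual `surv` add up to one, so the sum of `fm1` approximates $1 - Q$ whenever `surv` is negligible at $t = \tau$.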

Figure 4 Probability of coalescence and lineage migration as functions of time. The gray curve, the black curve, and the gray dashed curve plot $f_{2c}(t)$ (Equation 13), $f_{m1}(t)$ (Equation 15), and $f_m(t)$ (Equation 20), respectively. The x-axis shows scaled time (backward) during the selective phase, with 0 and 1 corresponding to unscaled time 0 and $\tau$ (= 1520), respectively, with $K = 2$, $2N = 10^5$, $s = 0.01$, and $m = 10^{-4}$. The peaks of $f_{m1}(t)$ and $f_m(t)$ move closer to 1.0 ($\tau$) as $m$ is further reduced (not shown).

Figure 4 shows that the peaks of $f_{2c}(t)$ and $f_{m1}(t)$ both occur between $\tau/2$ and $\tau$. It is because both probabilities for the coalescent and migration increase substantially when $x_2(t)$ becomes small. Given the first migration occurs at $t_{m1}$, $t_{m2}$ should be distributed between $t_{m1}$ and $\tau$. However, as $m \ll 1/\tau$ we may assume that $t_{m2}$ is simply equal to $\tau$ (the remaining B lineage traces back to the first B allele that, forward in time, entered deme 2 and survived stochastic loss). As demonstrated below, this simplification leads to an overestimation of heterozygosity at the linked neutral locus. Furthermore, we make another simplifying assumption that two B lineages now in deme 1 coalesce at time $t = \tau + d$, which also leads to an overestimation of heterozygosity at the neutral locus by the same degree that Equation 10 overestimates the heterozygosity in the standard (panmixia) model of genetic hitchhiking.

Heterozygosity at the neutral locus is determined by the probability that the B genealogy described above is marked by a recombination event by which a linked neutral lineage moves to a chromosome carrying allele b. The probability of this recombination event at time $t$ is $(1 - x_1(t))r$ if the lineage is in deme 1 and $(1 - x_2(t))r$ if the lineage is in deme 2. Therefore, no recombination event is marked on a lineage that remains in deme $i$ from time $t_1$ to $t_2$ with probability $\prod_{t=t_1}^{t_2}\{1 - r(1 - x_i(t))\} \approx \exp[-r\int_{t_1}^{t_2}(1 - x_i(t))\,dt]$. Given that B lineages coalesce at deme 1 (with probability $\int_0^{\tau} f_{m1}(t)\,dt = 1 - Q$), the probability that this genealogy is marked by zero recombination events (i.e., neutral lineages coalesce at the same time B lineages coalesce) is approximated by

$$
\begin{aligned}
P^{(1)}_{\mathrm{coal}} &= \int_0^{\tau} f_{m1}(t)\exp\!\left[-r\int_0^{t}(1 - x_2(z))\,dz - r\int_t^{\tau+d}(1 - x_1(z))\,dz \right. \\
&\qquad\qquad \left. - r\int_0^{\tau}(1 - x_2(z))\,dz - r\int_{\tau}^{\tau+d}(1 - x_1(z))\,dz\right]dt \Big/ (1 - Q) \\
&= \int_0^{\tau} f_{m1}(t)\left\{\left(\frac{1 + \varepsilon e^{st}}{1 + \varepsilon}\right)\left(\frac{1 + \varepsilon e^{s\tau}}{1 + \varepsilon e^{s(t-d)}}\right)\left(\frac{1 + \varepsilon e^{s\tau}}{1 + \varepsilon}\right)\left(\frac{1 + \varepsilon e^{s\tau}}{1 + \varepsilon e^{s(\tau-d)}}\right)\right\}^{-(r/s)} dt \Big/ (1 - Q) \\
&= \left(\frac{1 + \varepsilon}{\varepsilon^2(\varepsilon + e^{-sd})}\right)^{-(r/s)}\int_0^{\tau} f_{m1}(t)\left(\frac{x_1(t)}{x_2(t)}\right)^{-(r/s)} dt \Big/ (1 - Q).
\end{aligned} \tag{16}
$$

If $d$ is smaller than $\tau/2$ (i.e., $\varepsilon \ll e^{-sd}$), we get

$$
P^{(1)}_{\mathrm{coal}} \approx \varepsilon^{2r/s}e^{-rd}\int_0^{\tau} f_{m1}(t)\left(\frac{x_1(t)}{x_2(t)}\right)^{-(r/s)} dt \Big/ (1 - Q). \tag{17}
$$

If $t_{m1} \approx \tau$ (for very small $m$), $P^{(1)}_{\mathrm{coal}}$ is further approximated to $\varepsilon^{2r/s}e^{-2rd}$, as $x_1(\tau) \approx \varepsilon e^{sd}$. Therefore, the hitchhiking effect (the relative reduction in heterozygosity at the neutral locus) given such genealogy at the selected locus is $1 - \varepsilon^{2r/s}e^{-2rd}$. Compared with the solution in the panmixia model of hitchhiking (Equation 10), the homozygosity decreases by a factor of $e^{-2rd}$, which clearly represents the additional opportunity for recombination due to the extended length of B genealogy ($= d$) due to migration. {Note that more accurate comparison between panmictic vs. subdivided populations should take the difference in $\varepsilon$ [e.g., $1/(8Ns)$ vs. $1/(4Ns)$, respectively, for a total population size of $4N$] into account. Therefore, the comparison in this case should be made between $1 - (\varepsilon/2)^{2r/s}$ and $1 - \varepsilon^{2r/s}e^{-2rd}$.} If $d \gg \tau$ [$\varepsilon \gg e^{-sd}$ and $x_1(\tau) \approx 1$], Equation 16 is similarly approximated to $\varepsilon^{4r/s}$, which basically means a twofold increase in the neutral lineages' opportunity to dissociate from B lineages. However, this radical effect on genetic hitchhiking may not happen frequently because $d \gg \tau$ requires a very low migration rate, which causes B lineages to coalesce in deme 2 rather than in deme 1.

If B lineages coalesce in deme 2, the probability that this genealogy is marked by zero recombination events is approximated by

$$
P^{(2)}_{\mathrm{coal}} = \frac{\int_0^{\tau} f_{2c}(t)\exp\!\left[-2r\int_0^{t}(1 - x_2(z))\,dz\right]dt}{Q} = \frac{\int_0^{\tau} f_{2c}(t)\left(\dfrac{1 + \varepsilon e^{st}}{1 + \varepsilon}\right)^{-(2r/s)} dt}{Q}, \tag{18}
$$

which is slightly greater than $\varepsilon^{2r/s}$ as the time of the coalescent is smaller than $\tau$. Finally, the hitchhiking effect on the heterozygosity for two gene copies sampled in deme 2 is given by

$$
\frac{H^{(22)}}{\tilde{H}} = (1 - Q)\left(1 - P^{(1)}_{\mathrm{coal}}\right) + Q\left(1 - P^{(2)}_{\mathrm{coal}}\right). \tag{19}
$$

Figures 5 and 6 show the comparison between this approximation and results from frequency-based simulation. As we took steps in simplifying formulas that consistently lead to the overestimation of relative heterozygosity after a selective sweep, our approximation predicts greater elevation of heterozygosity (i.e., greater decline of the hitchhiking effect) due to population subdivision than suggested by simulation results.

ii. One gene copy from deme 1 and the other from deme 2

Next, the approximate solution for the expected heterozygosity when one chromosome is sampled from deme 1 and the other from deme 2, $H^{(12)}$, is similarly obtained. Ignoring the possibility that the B lineage starting in deme 1 enters

Selective Sweep Over a Structured Population 219 Figure 5 Relative heterozygosity ðHð22Þ= H~Þ for two chromosomes sampled in deme 2 with increasing recombination from the selected locus. Analytic approximations and the results of frequency-based simulations are shown for an identical set of parameters [2N ¼ 105, s ¼ 0.01 (black) or 0.1 (gray), and m/s ¼ 0.01]. On the left, solid and dashed curves show the hitchhiking effect in the subdivided (K ¼ 2) population [Equation 19 using d ¼ d (Equa- 2 = tion 3)] and in the panmictic population ð12ð8NsÞ r sÞ; respectively. Corresponding simulation results are shown on the right.

sd 2rd deme 2 and coalesces to the other B lineage there, we may assuming x1ðtÞee : The factor e indicates that only consider a model in which the coalescence of two B lineages one B lineage, starting from deme 2, is subject to the addi- occurs in deme 1 only: one lineage always remains in deme tional marking of a recombination event due to its extension 1 and the other lineage migrates from deme 2 to deme 1 at by d. Figure 7 shows the analytic approximation to the rel- (12) (12) time tm. This will lead to an overestimation of H because ative increase of H due to population subdivision [Equa- 2r=s the coalescence in deme 2, which would produce smaller B tion 23, with d ¼ d; divided by H~ð12ðe=2Þ Þ; which is the genealogy with less opportunity for recombination, is pre- prediction for panmixia after merging demes 1 and 2] and cluded. Furthermore, we again make the simplifying as- the corresponding quantity observed in simulations. As sumption that the coalescent event in deme 1 happens at expected, our approximation overestimates the elevation t = t + d. Then, the probability that this genealogy is of H(12) caused by population subdivision. marked by zero recombination events is approximately iii. Two gene copies from deme 1 ð ð t tþd ð12Þ 2 2 fi Pcoal ¼ fmðtÞexp r ð1 x1ðzÞÞdz We nally investigate the expected heterozygosity when 0 ð 0 ð t tþd both chromosomes are sampled from deme 1. When it is 2r ð1 2 x2ðzÞÞdz 2 r ð1 2 x1ðzÞÞdz dt 0 t ( )2 = assumed that lineages starting in deme 1 do not migrate to ðt ðr sÞ (20) e þ 1 e þ e2sðt 2 tÞ e þ 1 ¼ f ðtÞ dt m 2sðtþdÞ 2st 2sðtþd 2 tÞ deme 2, the hitchhiking effect is determined solely due to 0 e þ e e þ e e þ e ð t 2ðr=sÞ 2r=s x1ðtÞ the trajectory of B in deme 1 only. In that case e fmðtÞ dt; 0 x2ðtÞ ð11Þ ~ 2r=s where fm(t) is the probability distribution of tm. Therefore, H H 1 2 e ; (24) ð ! 
t 2ðr=sÞ ð12Þ ~ 2r=s x1ðtÞ which is basically the result of a selective sweep as it occurs H H 1 2 e fmðtÞ dt : (21) 0 x2ðtÞ in deme 1 in isolation. Note that this level of heterozygosity is lower than that in the panmictic population that would be created if two demes are merged via free migration. The We may use 2r=s latter is given by H~ð12ðe=2Þ Þ because the initial fre- Yt21 quency of B is halved as the population size doubles. It is 2mx1ðtÞ 2mx1ðiÞ fmðtÞ 1 2 well known that, if all else is equal, reduction in polymor- x ðtÞ x ðiÞ 2 i¼0 ð2 phism by a single selective sweep is greater in a smaller t 2mx1ðtÞ x1ðzÞ population because the frequency trajectory of the beneficial exp 2 2m dz (22) x2ðtÞ 0x2ðzÞ allele is shorter (Barton 2000). Therefore, the effect of pop- 2m 1þð2m=sÞðesd 2 1Þ 22mt ulation subdivision with low migration is to increase the ¼ x1ðtÞ e : x2ðtÞ hitchhiking effect when two chromosomes are sampled from the deme where the beneficial allele originated. However, as the migration rate increases, H(11) should approach that However, when m/s is very small, which causes the dis- in the merged panmictic population. Therefore, it is expected tribution of t to be sharply concentrated very closely to t, m that the relative effect of population subdivision on hitchhik- the above equation fails to accurately describe the probabil- ing, H(11) /H(11) ,is(12e2r/s)/(12(e/2)2r/s)with ity density (data not shown). We thus take another step of subdiv panmic m > 1/t but gradually increases to 1 as m increases. Sim- simplification by using f (t)=1ift = t and 0 otherwise: the m ulation results confirm this prediction (Figure 8). In the migration of the lineage from deme 2 to 1 happens at time t. following, we use Equation 24 for H(11) as we are mainly This again leads to the overestimation of H(12) because the concerned in the case of m> s. length of the lineage staying in deme 2 and being subject to recombination marking is maximized. H(12) is then finally iv. 
Random sample from the entire population reduced to The expected effect of hitchhiking on the entire population, given that two chromosomes are sampled randomly over ð12Þ ~ 2r=s 2rd H Hð1 2 e e Þ; (23) two demes, is therefore

$$
\frac{H^{(T)}}{\tilde{H}} = \left(\frac{H^{(11)}}{4} + \frac{H^{(12)}}{2} + \frac{H^{(22)}}{4}\right)\Big/\tilde{H}.
\tag{25}
$$

Figures 9 and 10 show that the relative heterozygosity after genetic hitchhiking is larger in the subdivided population with small $m$ than in the panmictic population. Namely, population subdivision weakens the strength of genetic hitchhiking. Compared to simulation results, our approximation overestimates the elevation of relative heterozygosity for small $m$, as expected from our consistent use of assumptions that lead to the overestimation of heterozygosity. However, we also note that this effect of population subdivision is limited to very small $m/s$ values. With intermediate $m$ ($\approx 0.1s$), both the approximation (although expected to be inaccurate) and simulation results suggest that only minor elevation of heterozygosity occurs. An alternative derivation for $m > 1/\tau$, provided in the Appendix, also suggests that no significant increase of heterozygosity occurs due to population subdivision.

Figure 6 The relative effect of population subdivision [$H^{(22)}_{\mathrm{subdiv}}/H^{(22)}_{\mathrm{panmic}}$] for two chromosomes sampled from deme 2 as a function of migration rate. $K = 2$, $2N = 10^5$, $r/s = 0.01$, and $s = 0.01$ (black) and 0.1 (gray). Approximations are given by the ratio of Equation 19 with $\delta = \hat{\delta}$ and $1 - (8Ns)^{-2r/s}$. Heterozygosity (mean ± 2 SE) obtained in the simulation of the subdivided population ($K = 2$) was divided by the mean heterozygosity obtained from the simulation of genetic hitchhiking in the panmictic population (the population with $4N$ chromosomes).

Figure 8 The relative effect of population subdivision [$H^{(11)}_{\mathrm{subdiv}}/H^{(11)}_{\mathrm{panmic}}$] for two chromosomes sampled from deme 1 as a function of migration rate. $K = 2$, $2N = 10^5$, $r/s = 0.01$, and $s = 0.01$ (black) and 0.1 (gray). Dashed-dotted lines mark the approximate low-migration limit of the heterozygosity ratio: $(1 - \varepsilon^{2r/s})/(1 - (\varepsilon/2)^{2r/s})$ (see text for explanation). Heterozygosity (mean ± 2 SE) obtained in the simulation of the subdivided population ($K = 2$) was divided by the mean heterozygosity obtained from the simulation of genetic hitchhiking in the panmictic population (the population with $4N$ chromosomes).

We also plotted in Figure 10 the expected increase in time taken for a beneficial mutation to become fixed in the entire population ("fixation time"), given by the ratio of the expectation under population subdivision, $\delta + 2\log[4Ns]/s$, and that under panmixia, $2\log[8Ns]/s$. As the relative length of the fixation time increases in the subdivided population with decreasing $m$, $H^{(T)}/\tilde{H}$ also increases. This result is compatible with the conjecture that the extension of fixation time allows more recombination that breaks the association between beneficial and neutral alleles. However, decreasing the migration rate has a greater effect on fixation time than on heterozygosity. Figures 6 and 7 show that $H^{(22)}$ and $H^{(12)}$ respond differently to decreasing $m$, the overall effect being a less drastic increase of $H^{(T)}$ than that of fixation time.
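As a rough numerical illustration of how Equations 23, 24, and 25 combine, the following Python sketch evaluates the small-$m$ limiting forms of the three pairwise heterozygosities, their weighted average, and the fixation-time ratio plotted in Figure 10. It is only a sketch under stated assumptions: $H^{(22)}/\tilde{H}$ is replaced by its limiting form $1 - \varepsilon^{2r/s}e^{-2r\delta}$ (that is, $Q \approx 0$ and $t_{m1} \approx \tau$) rather than the full Equation 19; $\varepsilon = 1/(4Ns)$ and natural logarithms are assumed; and the delay $\delta$ is passed in as a plain number because Equation 3 is not reproduced here. The function names are hypothetical.

```python
import math


def epsilon(two_n: float, s: float) -> float:
    """Initial frequency of the beneficial allele in one deme, 1/(4Ns)."""
    return 1.0 / (2.0 * two_n * s)


def relative_heterozygosities(two_n, s, r, delta):
    """Small-m limiting forms of H/H~ for the three sampling schemes (K = 2)."""
    eps_pow = epsilon(two_n, s) ** (2.0 * r / s)
    h11 = 1.0 - eps_pow                               # Equation 24
    h12 = 1.0 - eps_pow * math.exp(-r * delta)        # Equation 23
    h22 = 1.0 - eps_pow * math.exp(-2.0 * r * delta)  # assumed limit of Eq. 19 (Q ~ 0)
    h_total = h11 / 4.0 + h12 / 2.0 + h22 / 4.0       # Equation 25
    return h11, h12, h22, h_total


def panmictic_relative_heterozygosity(two_n, s, r):
    """1 - (eps/2)^(2r/s): merged population with 4N chromosomes."""
    return 1.0 - (epsilon(two_n, s) / 2.0) ** (2.0 * r / s)


def fixation_time_ratio(two_n, s, delta):
    """(delta + 2 log[4Ns]/s) / (2 log[8Ns]/s), with natural logarithms."""
    four_ns = 2.0 * two_n * s
    subdivided = delta + 2.0 * math.log(four_ns) / s
    panmictic = 2.0 * math.log(2.0 * four_ns) / s
    return subdivided / panmictic
```

For example, `relative_heterozygosities(1e5, 0.01, 1e-4, 182.5)` uses the parameter values quoted in the text, with $\delta = 182.5$ borrowed from the Appendix example purely for illustration (that value corresponds to $m/s = 0.1$, which lies outside the small-$m$ regime the limiting forms assume).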

Subdivided Population With K = 10

Next, we expand our analysis to the stepping-stone model of a subdivided population with more than two demes. The beneficial mutation arising in deme 1 is expected to propagate symmetrically in both directions, and the last deme where the mutation is fixed is $\lfloor K/2 \rfloor$ steps away from deme 1. We therefore predict that, at least for $m \ll s$, the mean total time taken for the beneficial mutation to become fixed in the entire population ("fixation time") is approximately

$$
t_K = \left\lfloor \frac{K}{2} \right\rfloor \delta + \frac{2\log[4Ns]}{s}.
\tag{26}
$$

Compared to simulation results, this approximation works well asymptotically for very small $m/s$ while the panmictic approximation works for larger $m$ (Figure 11). As the migration rate increases, the fixation time approaches that in the corresponding panmictic population.

Simulation results were obtained only for the hitchhiking effect with $K = 10$. The pattern of heterozygosity immediately after the fixation of the beneficial allele in the entire population is similar to that with $K = 2$ (Figure 12A): a smaller migration rate between neighboring demes results in higher heterozygosity (for two chromosomes randomly selected from the entire population; shaded curve in Figure 12A). However, similar to the case of $K = 2$, the elevation of heterozygosity is less pronounced than the increase of fixation time. For example, comparing Figures 11 and 12A, while fixation time increases by 68%, heterozygosity increases by 34%. When two chromosomes are sampled from each subpopulation, heterozygosity greatly depends on the location of sampling if $m \ll s$: heterozygosity is lowest in the deme of the beneficial mutation's origin (deme 1) and progressively increases as distance from deme 1 increases (Figure 12A). This generalizes the pattern obtained above with $K = 2$ that the strength of genetic hitchhiking diminishes as the beneficial mutation spreads across neighboring demes. As explained above, genetic hitchhiking reduces heterozygosity in the first deme of a subdivided population with small $m/s$ more than it reduces heterozygosity in the merged (panmictic) population because deme 1 behaves similarly to a small isolated population. However, in other demes (e.g., demes 3–6 in Figure 12A), heterozygosity greatly increases as migration rate decreases. This produces an overall negative correlation between heterozygosity (over the entire population) and migration rate. We also note that, as migration rate decreases, heterozygosity over the entire population increases much faster than heterozygosity for a single deme. This reflects the fact that genetic hitchhiking spreading over a subdivided population creates genetic differentiation among subpopulations that were initially homogeneous (see below), as predicted by Slatkin and Wiehe (1998).

For a comparison, we also simulated selective sweeps spreading over $K = 10$ demes that are arranged according to Wright's island model, in which a given haploid chromosome enters a common migrant pool with probability $m$ per generation and then migrates to a randomly chosen deme. Except for this change in spatial arrangement, the same sets of parameters used in the simulations of the stepping-stone model were used. Figure 12B shows that the mean heterozygosity over the entire population after a selective sweep (shaded curve) increases with decreasing $m$.

Figure 7 The relative effect of population subdivision [$H^{(12)}_{\mathrm{subdiv}}/H^{(12)}_{\mathrm{panmic}}$] for a pair of chromosomes sampled from different demes as a function of migration rate. $K = 2$, $2N = 10^5$, $r/s = 0.01$, and $s = 0.01$ (black) and 0.1 (gray). Approximations are given by the ratio of Equation 23 with $\delta = \hat{\delta}$ and $1 - (8Ns)^{-2r/s}$. Heterozygosity (mean ± 2 SE) obtained in the simulation of the subdivided population ($K = 2$) was divided by the mean heterozygosity obtained from the simulation of genetic hitchhiking in the panmictic population (the population with $4N$ chromosomes).

Figure 9 Relative heterozygosity ($H^{(T)}/\tilde{H}$) for two chromosomes randomly sampled from the entire population with increasing recombination from the selected locus. Analytic approximations and the results of frequency-based simulations are shown for an identical set of parameters [$2N = 10^5$, $s = 0.01$ (black) or 0.1 (gray), and $m/s = 0.01$]. On the left, solid and dashed curves show the hitchhiking effect in the subdivided ($K = 2$) population [Equation 25 in which $H^{(11)}$, $H^{(12)}$, and $H^{(22)}$ are given by Equations 24, 23, and 19, respectively, and $\delta = \hat{\delta}$] and that in the panmictic population ($1 - (8Ns)^{-2r/s}$), respectively. Corresponding simulation results are shown on the right.

Figure 10 The relative effect of population subdivision [$H^{(T)}_{\mathrm{subdiv}}/H^{(T)}_{\mathrm{panmic}}$ and the relative increase of fixation time] for a pair of chromosomes randomly sampled from the entire population as a function of migration rate. $K = 2$, $2N = 10^5$, $r/s = 0.01$, and $s = 0.01$ (black) and 0.1 (gray). Approximations for the heterozygosity ratio (solid curves) are obtained by Equation 25 divided by $1 - (8Ns)^{-2r/s}$, where $\delta = \hat{\delta}$ and $H^{(11)}$, $H^{(12)}$, and $H^{(22)}$ are given by Equations 24, 23, and 19, respectively. Heterozygosity (mean ± 2 SE) obtained in the simulation of the subdivided population ($K = 2$) was divided by the mean heterozygosity obtained from the simulation of genetic hitchhiking in the panmictic population (the population with $4N$ chromosomes). Dashed curves show $(\delta + 2\log[4Ns]/s)/(2\log[8Ns]/s)$, the expectation for the relative increase in time taken for the beneficial allele to become fixed in the entire population.

Figure 11 Time taken for a beneficial mutation to become fixed in the subdivided population with $K = 10$, as a function of migration rate. The simple approximation (Equation 26; black curve) is compared with simulation results (gray curve). $2N = 10^5$ and $s = 0.01$. Mean values were obtained from 10,000 replicates for each parameter set. The expected time for the corresponding panmictic population, $2\log[4NKs]/s$, is shown by the dashed line.
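The two curves compared in Figure 11 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: it assumes natural logarithms, takes the per-step delay $\delta$ as an input because Equation 3 is not reproduced here, and uses hypothetical function names.

```python
import math


def stepping_stone_fixation_time(k: int, delta: float, two_n: float, s: float) -> float:
    """Equation 26: floor(K/2) * delta + 2 log[4Ns] / s."""
    four_ns = 2.0 * two_n * s
    return (k // 2) * delta + 2.0 * math.log(four_ns) / s


def panmictic_fixation_time(k: int, two_n: float, s: float) -> float:
    """Expected time in the merged panmictic population: 2 log[4NKs] / s."""
    four_ns = 2.0 * two_n * s
    return 2.0 * math.log(four_ns * k) / s
```

With $K = 2$ and $\delta = 0$ the first function reduces to $2\log[4Ns]/s$, the value of $\tau$ (about 1520 generations for $2N = 10^5$ and $s = 0.01$) quoted in the Figure 4 caption. Any specific $\delta$ supplied for $K = 10$ is an assumption of the caller.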

Figure 12 Relative heterozygosity ($H^{(\cdot)}/\tilde{H}$) for two chromosomes randomly sampled within individual demes (from deme 1 to deme 6 shown by curves with increasing dash sizes) and from the entire population (gray curve) for the stepping-stone model (A) and the island model (B) with $K = 10$ and $r/s = 0.01$. Note that, for the stepping-stone model, the fixation of the beneficial mutation occurs first in deme 1 and last in deme 6. Parameters for simulations are identical to those used for simulations described in Figure 11.

However, this increase is much smaller than that for the stepping-stone model. Similar to the stepping-stone model, the mean heterozygosity in deme 1, the birthplace of the beneficial allele, is reduced with low migration. Mean heterozygosities in other demes (results for only three of nine demes are shown in Figure 12B) slightly increase with decreasing migration. Much smaller heterozygosity for an individual deme than that for the entire population again reflects genetic differentiation among demes. Comparing Figure 12A and 12B, we conclude that the pattern of a selective sweep is more severely affected in the stepping-stone model than in the island model, which demonstrates the complexity of genetic hitchhiking caused by a beneficial mutation that spreads in one-dimensional space.

Further results, regarding the variance of heterozygosity and population differentiation, from the simulation of the stepping-stone model are given in Table 1. The coefficient of variation, $cv_{ij} = \sqrt{\mathrm{Var}[H^{(ij)}]}/H^{(ij)}$, where $H^{(ij)}$ is the expected heterozygosity for one chromosome sampled in deme $i$ and another in deme $j$, was obtained over 10,000 replicates for a given parameter set. We also calculated $F_{ST} = (H^{(T)} - H^{(S)})/H^{(T)}$, where $H^{(T)}$ is the expected heterozygosity in the entire population and $H^{(S)} = \sum_{i=1}^{K} H^{(ii)}/K$. It is shown that, for a given $r/s$, population subdivision (reduction in $m$) causes a moderate increase in the coefficient of variation while it causes a drastic increase in $F_{ST}$. Note that simulation started with uniform frequencies of a neutral allele in all demes ($F_{ST} = 0$). As predicted by Slatkin and Wiehe (1998) and Bierne (2010), the effect of population subdivision on genetic differentiation after a selective sweep critically depends on the recombination rate: Table 1 shows that, for a given $m$, intermediate values of $r/s$ (0.03–0.1) produce the largest $F_{ST}$.

Discussion

Complex geographic and demographic structures of natural populations have potentially great impact on evolutionary genetic processes and thus on the interpretation of DNA sequence polymorphism (Jensen et al. 2005; Nielsen et al. 2005; Li and Stephan 2006; Kim and Gulisija 2010; Stephan 2010). To examine the effect of spatial demographic structure on the pattern of selective sweeps, we used a simplified model of population subdivision. While a relatively slow evolutionary process such as the coalescence of neutral gene lineages may be less sensitive to spatial structure, a fast nonequilibrium process that sweeps across the entire population can be greatly affected by population subdivision, as demonstrated in this study. On the basis of results obtained above, we find that population subdivision causes several important modifications in the strength and pattern of a selective sweep.

First, the strength of genetic hitchhiking, measured by the reduction of expected heterozygosity, is diminished in a subdivided population if the migration rate is much smaller than the selective advantage of the beneficial mutation, while the population structure shaped by the same migration rate might be undetected in the examination of neutral polymorphism. As briefly argued by Barton (2000), this effect is attributed to increased time taken for the beneficial mutation to reach high frequency, which provides more opportunity for the breakdown of hitchhiking by recombination. This result implies that the strength of directional selection estimated from the chromosomal span of reduced polymorphism (Kim and Stephan 2002; Thornton et al. 2007) assuming panmixia might be an underestimate. However, we also find that the relative increase in the fixation time of a beneficial mutation under population subdivision is much greater than the relative increase in expected heterozygosity. This implies that, if the strength of selection is estimated separately from the chromosomal span of reduced polymorphism and from the age of the sweeping haplotype estimated by rare mutations (for example, Sáez et al. 2003; Meiklejohn et al. 2004; Xue et al. 2006), the former estimate might be greater than the latter (and closer to the true value). It is not, however, clear whether such a discrepancy can be detected with a reasonable statistical power.

The second important result is that a beneficial mutation leaves a weaker signature of selection as it spreads across distant demes after its origin in the first population. With $K = 2$, while population subdivision results in a much stronger hitchhiking effect in the first population relative to that expected under panmixia (Equation 24 with $m \ll s$), it causes a much weaker hitchhiking effect in the second population: the relative level of expected polymorphism is much higher than under panmixia. This is because the extension of neutral lineages exposed to hitchhike-breaking recombination applies only to those descending to the second population. Comparing Figures 6 and 8 we find that the expected heterozygosity can be up to 30% higher in the second population than in the first. Figure 12A also indicates that

Table 1 Simulation of stepping-stone model (K = 10, s = 0.01, 2N = 10^5)

r/s     m/s    H(11) (cv11)    H(33) (cv33)    H(66) (cv66)    H(13) (cv13)    H(16) (cv16)    H(36) (cv36)    FST
0.01    0.01   0.041 (1.81)    0.057 (1.72)    0.080 (1.51)    0.057 (1.78)    0.082 (1.76)    0.089 (1.65)    0.068
0.01    0.03   0.042 (1.66)    0.054 (1.60)    0.078 (1.43)    0.052 (1.59)    0.070 (1.55)    0.075 (1.49)    0.037
0.01    0.1    0.047 (1.49)    0.054 (1.45)    0.069 (1.33)    0.052 (1.43)    0.061 (1.36)    0.064 (1.35)    0.015
0.01    0.3    0.045 (1.40)    0.051 (1.39)    0.057 (1.33)    0.051 (1.38)    0.054 (1.33)    0.055 (1.34)    0.0047
0.01    1      0.049 (1.34)    0.049 (1.34)    0.049 (1.34)    0.049 (1.34)    0.049 (1.34)    0.049 (1.33)    0.00086
0.001   0.1    0.0049 (4.18)   0.0054 (4.27)   0.0077 (4.06)   0.0053 (4.03)   0.0066 (3.99)   0.0069 (4.03)   0.0060
0.003   0.1    0.014 (2.52)    0.016 (2.51)    0.022 (2.36)    0.016 (2.44)    0.019 (2.42)    0.020 (2.37)    0.0088
0.01    0.1    0.047 (1.49)    0.054 (1.45)    0.069 (1.33)    0.052 (1.43)    0.061 (1.36)    0.064 (1.35)    0.015
0.03    0.1    0.119 (0.95)    0.133 (0.92)    0.161 (0.84)    0.130 (0.92)    0.149 (0.87)    0.155 (0.86)    0.022
0.1     0.1    0.246 (0.55)    0.258 (0.52)    0.281 (0.43)    0.257 (0.52)    0.276 (0.48)    0.279 (0.46)    0.021
0.3     0.1    0.312 (0.23)    0.313 (0.21)    0.316 (0.17)    0.315 (0.20)    0.319 (0.16)    0.319 (0.16)    0.0098
1       0.1    0.317 (0.14)    0.317 (0.14)    0.318 (0.14)    0.319 (0.11)    0.321 (0.10)    0.320 (0.10)    0.0063
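The two summary statistics reported in Table 1 can be computed from simulation output along the following lines. This is a minimal sketch, not the authors' simulation code: the function names are hypothetical, the inputs are assumed to be plain lists of heterozygosity values, and the population (rather than sample) variance is an assumption, since the text does not say which was used.

```python
import math
from statistics import mean, pvariance


def coefficient_of_variation(replicates):
    """sqrt(Var[H]) / mean(H) over replicate values of one H^(ij)."""
    # pvariance is the population variance; this choice is an assumption.
    return math.sqrt(pvariance(replicates)) / mean(replicates)


def f_st(h_total, within_deme_h):
    """(H_T - H_S) / H_T, where H_S averages the within-deme H^(ii) over K demes."""
    h_s = mean(within_deme_h)
    return (h_total - h_s) / h_total
```

When every within-deme heterozygosity equals the total heterozygosity, `f_st` returns 0, matching the homogeneous starting condition ($F_{ST} = 0$) described in the text.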

a gradient of heterozygosity will be observed along the path of the beneficial mutation's propagation in a geographically structured population. Slatkin and Wiehe (1998) and Bierne (2010) showed that genetic hitchhiking can introduce population differentiation (large $F_{ST}$) in an initially homogeneous population (small $F_{ST}$) and also implicitly suggested that this population differentiation occurs as a selective sweep leaves asymmetric levels of variation, in accordance with our conclusion. Their results and ours make it clear that an evolutionary process can lead to heterogeneous outcomes across different demes if it occurs faster than migration. We may use this prediction of the gradient of heterozygosity along the pathway of a sweep to infer the geographical origin of a beneficial mutation and reveal the pattern of migration in a hidden geographic structure by analyzing DNA sequence polymorphism that harbors a signature of a selective sweep distributed over multiple demes.

This study used simple models of a structured population, in which demes are of equal size and a given structure of division remains unchanged through time. Actual populations in nature experience complex demographic changes. An important case of demographic complexity is the split of an ancestral population into several demes that are dispersed over geographic structure, as in the evolutionary history of humans. We argue that our analysis of selective sweeps in subdivided populations is still applicable to this demographic scenario. If a beneficial mutation spreads very rapidly across subpopulations, which should be true for strong selection even with a certain delay by limited migration, the selective sweep would effectively occur under constant geographic structure as the timescale of the sweep is much shorter than that of major demographic processes. Our analysis can be applied as long as demes are genetically homogeneous at the time of mutation to the beneficial allele. This condition will be met if different demes were recently established from the split of the ancestral population and thus contain similar profiles of genetic variation.

Although it was not explicitly addressed in this study, our analysis suggests that population subdivision may also significantly modify the other ("higher-order") patterns of polymorphism as the signature of a selective sweep. The site frequency spectrum and linkage disequilibrium are critically dependent on the shape (branching pattern) of the coalescent trees at the neutral loci, which in turn depends on the shape of genealogy at the locus under selection (B genealogy) (Fay and Wu 2000; Kim and Nielsen 2004; McVean 2007; Pfaffelhuber et al. 2008). For example, Pfaffelhuber et al. (2006) showed that approximating the B genealogy by a Yule process corrects the error introduced by the assumption of star-like B genealogy. Population subdivision is likely to introduce a much greater deviation from the star-like genealogy that cannot be corrected by a Yule tree. Consider a sample of chromosomes that are distributed randomly over multiple demes. The probability that two B lineages coalesce in a deme outside its mutational origin (Equation 14), thus preventing a star-like tree, becomes nonnegligible if $s$ is sufficiently larger than $m$. Furthermore, while the markings of recombination events are concentrated near the root of a Yule tree in a single panmictic population, they are expected to occur at many other points along a B genealogy, corresponding to times when a B lineage is in a deme with low frequency of B, in a subdivided population. How these factors modify the patterns of frequency spectrum and linkage disequilibrium, particularly along the demes in the path of the beneficial mutation's propagation, remains to be investigated. This requires the development of computer simulation methods that can generate multisite (realistic DNA sequence) polymorphism data, which is planned for a follow-up article.

Acknowledgments

We thank Wolfgang Stephan and two anonymous reviewers whose comments greatly improved the manuscript. This work was supported by National Institutes of Health grant R01GM084320 and an Ewha Womans University Research Grant of 2010.

Literature Cited

Akey, J. M., 2009 Constructing genomic maps of positive selection in humans: Where do we go from here? Genome Res. 19: 711–722.

Baines, J. F., A. Das, S. Mousset, and W. Stephan, 2004 The role of natural selection in genetic differentiation of worldwide populations of Drosophila ananassae. Genetics 168: 1987–1998.
Barton, N. H., 1998 The effect of hitch-hiking on neutral genealogies. Genet. Res. 72: 123–133.
Barton, N. H., 2000 Genetic hitchhiking. Philos. Trans. R. Soc. B Biol. Sci. 355: 1553.
Bierne, N., 2010 The distinctive footprints of local hitchhiking in a varied environment and global hitchhiking in a subdivided population. Evolution 64: 3254–3272.
Das, A., S. Mohanty, and W. Stephan, 2004 Inferring population structure and demography of Drosophila ananassae from multilocus data. Genetics 168: 1975–1985.
Durrett, R., and J. Schweinsberg, 2004 Approximating selective sweeps. Theor. Popul. Biol. 66: 129–138.
Etheridge, A., P. Pfaffelhuber, and A. Wakolbinger, 2006 An approximate sampling formula under genetic hitchhiking. Ann. Appl. Probab. 16: 685–729.
Faure, M. F., P. David, F. Bonhomme, and N. Bierne, 2008 Genetic hitchhiking in a subdivided population of Mytilus edulis. BMC Evol. Biol. 8: 164.
Fay, J. C., and C. I. Wu, 2000 Hitchhiking under positive Darwinian selection. Genetics 155: 1405–1413.
Hermisson, J., and P. S. Pennings, 2005 Soft sweeps: molecular population genetics of adaptation from standing genetic variation. Genetics 169: 2335–2352.
Jensen, J. D., Y. Kim, V. B. Dumont, C. F. Aquadro, and C. D. Bustamante, 2005 Distinguishing between selective sweeps and demography using DNA polymorphism data. Genetics 170: 1401–1410.
Kaplan, N. L., R. R. Hudson, and C. H. Langley, 1989 The "hitchhiking effect" revisited. Genetics 123: 887–899.
Kim, Y., and D. Gulisija, 2010 Signatures of recent directional selection under different models of population expansion during colonization of new selective environments. Genetics 184: 571–585.
Kim, Y., and R. Nielsen, 2004 Linkage disequilibrium as a signature of selective sweeps. Genetics 167: 1513–1524.
Kim, Y., and W. Stephan, 2002 Detecting a local signature of genetic hitchhiking along a recombining chromosome. Genetics 160: 765–777.
Kim, Y., and T. Wiehe, 2009 Simulation of DNA sequence evolution under models of recent directional selection. Brief. Bioinform. 10: 84–96.
Li, H., and W. Stephan, 2006 Inferring the demographic history and rate of adaptive substitution in Drosophila. PLoS Genet. 2: e166.
Maynard Smith, J., 1971 What use is sex. J. Theor. Biol. 30: 319–335.
Maynard Smith, J., and J. Haigh, 1974 The hitch-hiking effect of a favorable gene. Genet. Res. 23: 23–35.
McVean, G., 2007 The structure of linkage disequilibrium around a selective sweep. Genetics 175: 1395.
Meiklejohn, C. D., Y. Kim, D. L. Hartl, and J. Parsch, 2004 Identification of a locus under complex positive selection in Drosophila simulans by haplotype mapping and composite-likelihood estimation. Genetics 168: 265–279.
Nielsen, R., 2005 Molecular signatures of natural selection. Annu. Rev. Genet. 39: 197–218.
Nielsen, R., S. Williamson, Y. Kim, M. J. Hubisz, A. G. Clark et al., 2005 Genomic scans for selective sweeps using SNP data. Genome Res. 15: 1566.
Pfaffelhuber, P., B. Haubold, and A. Wakolbinger, 2006 Approximate genealogies under genetic hitchhiking. Genetics 174: 1995–2008.
Pfaffelhuber, P., A. Lehnert, and W. Stephan, 2008 Linkage disequilibrium under genetic hitchhiking in finite populations. Genetics 179: 527.
Sabeti, P. C., S. F. Schaffner, B. Fry, J. Lohmueller, P. Varilly et al., 2006 Positive natural selection in the human lineage. Science 312: 1614–1620.
Sáez, A. G., A. Tatarenkov, E. Barrio, N. H. Becerra, and F. J. Ayala, 2003 Patterns of DNA sequence polymorphism at Sod vicinities in Drosophila melanogaster: unraveling the footprint of a recent selective sweep. Proc. Natl. Acad. Sci. USA 100: 1793–1798.
Santiago, E., and A. Caballero, 2005 Variation after a selective sweep in a subdivided population. Genetics 169: 475–483.
Slatkin, M., 1976 The rate of spread of an advantageous allele in a subdivided population, pp. 767–780 in Population Genetics and Ecology, edited by S. Karlin and E. Nevo. Academic Press, New York.
Slatkin, M., 1987 Gene flow and the geographic structure of natural populations. Science 236: 787.
Slatkin, M., and T. Wiehe, 1998 Genetic hitch-hiking in a subdivided population. Genet. Res. 71: 155–160.
Stephan, W., 2010 Detecting strong positive selection in the genome. Mol. Ecol. Res. 10: 863–872.
Stephan, W., T. H. E. Wiehe, and M. W. Lenz, 1992 The effect of strongly selected substitutions on neutral polymorphism: analytical results based on diffusion-theory. Theor. Popul. Biol. 41: 237–254.
Stephan, W., L. Xing, D. A. Kirby, and J. M. Braverman, 1998 A test of the background selection hypothesis based on nucleotide data from Drosophila ananassae. Proc. Natl. Acad. Sci. USA 95: 5649–5654.
Thornton, K. R., J. D. Jensen, C. Becquet, and P. Andolfatto, 2007 Progress and prospects in mapping recent selection in the genome. Heredity 98: 340–348.
Williamson, S. H., M. J. Hubisz, A. G. Clark, B. A. Payseur, C. D. Bustamante et al., 2007 Localizing recent adaptive evolution in the human genome. PLoS Genet. 3: e90.
Wright, S., 1940 Breeding structure of populations in relation to speciation. Am. Nat. 74: 232–248.
Xue, Y., A. Daly, B. Yngvadottir, M. Liu, G. Coop et al., 2006 Spread of an inactive form of caspase-12 in humans is due to recent positive selection. Am. J. Hum. Genet. 78: 659–670.

Communicating editor: M. W. Feldman

Appendix: Deriving an Approximate Solution to Relative Heterozygosity for $\tau > 1/m$

When $s$ is not large enough to satisfy $\tau \ll 1/m$, while it may still be greater than $m$ such that $\delta$ is nonnegligible, we cannot assume that a B lineage stays in one deme until $x_2(t)$ becomes small. B lineages will jump frequently across demes if $m$ is greater than $1/\tau$. (For example, with $2N = 10^5$, $s = 0.01$, and $m = 0.001$, it is calculated that $\hat{\delta} = 182.5$ while $\tau = 1520 > 1/m$.) In such a case, we may ignore the probability that two B lineages will coalesce in deme 2 [Equations 13 and 14 show that $Q$ is a decreasing function of $m$ for a given $\tau$, as the dominant factor of $f_c(t)$ between time 0 and $\tau/2$ is $e^{-2mt}$]. B lineages are then modeled to coalesce in deme 1 only, approximately at $t = \tau + \delta$. Let $l(t)$ (= 1 or 2) be the location (deme) of a randomly chosen B lineage at time $t$, and let $I_1[l]$ be 1 if $l = 1$ and 0 otherwise. Then, considering two randomly chosen chromosomes from the entire population and a very small recombination rate $r$, the probability of coalescence at the neutral locus is given by

$$
\begin{aligned}
P_{\mathrm{coal}} &= \prod_{t=1}^{\tau+\delta} \bigl\{ 1 - I_1[l(t)]\, r\,(1 - x_1(t)) - (1 - I_1[l(t)])\, r\,(1 - x_2(t)) \bigr\}^2 \\
&\approx 1 - 2r \sum_{t=1}^{\tau+\delta} \bigl\{ I_1[l(t)]\,(1 - x_1(t)) + (1 - I_1[l(t)])\,(1 - x_2(t)) \bigr\} \\
&= 1 - 2r \sum_{t=1}^{\tau+\delta} \bigl\{ \langle I_1[l(t)] \rangle\,(1 - x_1(t)) + (1 - \langle I_1[l(t)] \rangle)\,(1 - x_2(t)) \bigr\},
\end{aligned}
\tag{A1}
$$

where $\langle \cdot \rangle$ indicates the mean obtained by integration over all possible sample paths of $l(t)$. If the jumps of B lineages are frequent enough, a given lineage would be in deme 1 or deme 2 with probabilities $x_1(t)/(x_1(t)+x_2(t))$ and $x_2(t)/(x_1(t)+x_2(t))$, respectively, at time $t$. Therefore, using the approximation $\langle I_1[l(t)] \rangle \approx x_1(t)/(x_1(t)+x_2(t))$,

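$$
\begin{aligned}
P_{\mathrm{coal}} &\approx 1 - 2r \sum_{t=1}^{\tau+\delta} \left[ 1 - x_1(t) - x_2(t) + \frac{2 x_1(t) x_2(t)}{x_1(t) + x_2(t)} \right] \\
&\approx 1 - 2r \int_0^{\tau+\delta} \left[ 1 - x_1(t) - x_2(t) + \frac{2 x_1(t) x_2(t)}{x_1(t) + x_2(t)} \right] dt \\
&= 1 - 2r(\tau+\delta) + \frac{2r}{s} \left\{ \log\frac{1 + \varepsilon e^{-s\delta}}{\varepsilon^2 e^{-s\delta} + \varepsilon e^{-s\delta}} + \log\frac{1 + \varepsilon}{\varepsilon^2 e^{-s\delta} + \varepsilon} - \log\frac{2 + \varepsilon(1 + e^{-s\delta})}{2\varepsilon^2 e^{-s\delta} + \varepsilon(1 + e^{-s\delta})} \right\} \\
&\approx 1 - 2r(\tau+\delta) + \frac{2r}{s} \log\frac{e^{s\delta} + 1}{2\varepsilon} \\
&\approx \exp\left[ -2r(\tau+\delta) + \frac{2r}{s} \log\frac{e^{s\delta} + 1}{2\varepsilon} \right] \\
&= \left( \frac{1 + e^{-s\delta}}{2}\,\varepsilon \right)^{2r/s} \approx \left( \frac{\varepsilon}{2} \right)^{2r/s}.
\end{aligned}
\tag{A2}
$$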
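As a reading aid, the three logarithms in Equation A2 and the final power form can be recovered with elementary antiderivatives. This sketch assumes logistic trajectories $x_1(t) = \varepsilon/(\varepsilon + e^{-st})$ and $x_2(t) = x_1(t - \delta)$, and a sweep duration $\tau = -(2/s)\log\varepsilon$; these forms are not restated in this appendix and are adopted here only because they reproduce the printed terms.

```latex
% Assumptions (not restated in the appendix):
%   x_1(t) = \varepsilon / (\varepsilon + e^{-st}),  x_2(t) = x_1(t - \delta),
%   \tau = -(2/s)\log\varepsilon, so that e^{s(\tau+\delta)} = e^{s\delta}/\varepsilon^2.
\begin{align*}
\int_0^{\tau+\delta} x_1(t)\,dt
  &= \frac{1}{s}\Bigl[\log\bigl(1 + \varepsilon e^{st}\bigr)\Bigr]_0^{\tau+\delta}
   = \frac{1}{s}\log\frac{1 + \varepsilon e^{-s\delta}}
                        {\varepsilon^2 e^{-s\delta} + \varepsilon e^{-s\delta}}, \\
\int_0^{\tau+\delta} x_2(t)\,dt
  &= \frac{1}{s}\Bigl[\log\bigl(1 + \varepsilon e^{-s\delta} e^{st}\bigr)\Bigr]_0^{\tau+\delta}
   = \frac{1}{s}\log\frac{1 + \varepsilon}
                        {\varepsilon^2 e^{-s\delta} + \varepsilon}, \\
\int_0^{\tau+\delta} \frac{2x_1(t)x_2(t)}{x_1(t)+x_2(t)}\,dt
  &= \frac{1}{s}\Bigl[\log\bigl(1 + e^{-s\delta}
        + 2\varepsilon e^{-s\delta} e^{st}\bigr)\Bigr]_0^{\tau+\delta}
   = \frac{1}{s}\log\frac{2 + \varepsilon(1 + e^{-s\delta})}
                        {2\varepsilon^2 e^{-s\delta} + \varepsilon(1 + e^{-s\delta})}, \\
e^{-2r(\tau+\delta)}
  &= \bigl(\varepsilon^2 e^{-s\delta}\bigr)^{2r/s}
  \quad\Longrightarrow\quad
  e^{-2r(\tau+\delta)} \left(\frac{e^{s\delta}+1}{2\varepsilon}\right)^{2r/s}
   = \left(\frac{1 + e^{-s\delta}}{2}\,\varepsilon\right)^{2r/s}.
\end{align*}
```

Under the same assumptions, dropping terms of relative order $\varepsilon$ inside the three logarithms gives the single logarithm $\log[(e^{s\delta}+1)/(2\varepsilon)]$ on the fourth line of Equation A2.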

The last approximation in Equation A2 results from the earlier assumption that $\delta$ is nonnegligible compared to $\tau$ ($e^{-s\delta} \ll 1$). Therefore, the hitchhiking effect approaches that of the panmictic population that would form if the two demes merged. When $e^{-s\delta}$ cannot be ignored relative to 1, the above derivation yields $P_{\mathrm{coal}} > (\varepsilon/2)^{2r/s}$, which means a stronger hitchhiking effect under a subdivided population, contrary to the expectation that population subdivision would cause an overall increase in heterozygosity (and thus a decrease in homozygosity). Similar to the underestimation of H(11) (Equation 24) relative to a panmictic expectation with intermediate $m$, this error results from specifying the initial frequency of the beneficial mutation in deme 1 to be $\varepsilon = 1/(4Ns)$: with frequent migration that satisfies $\tau > 1/m$, the stochastic dynamics of the beneficial mutation would conform to that in a panmictic population of $4N$ chromosomes. In that case, specifying the initial frequency to be $\varepsilon$ would underestimate the length of the selective phase, which leads to the overestimation of the hitchhiking effect. In conclusion, this derivation, despite errors, suggests that a clear decline in hitchhiking effect due to population subdivision is not expected under $m > 1/\tau$.
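The chain of approximations in Equation A2 can be checked numerically with the example parameters quoted in this appendix. The sketch below is illustrative only: it assumes logistic allele-frequency trajectories, with deme 2 lagging deme 1 by the delay, and a sweep duration of -(2/s) ln(eps), which reproduces the quoted value of about 1520. The delay 182.5 is taken from the text rather than computed, the recombination rate r = 1e-6 is an arbitrary small value, and all function names are hypothetical.

```python
import math


def logistic(eps, s, t):
    """Assumed deterministic trajectory: eps / (eps + exp(-s*t))."""
    return eps / (eps + math.exp(-s * t))


def sweep_duration(eps, s):
    """Assumed sweep duration tau = -(2/s) * ln(eps)."""
    return -2.0 / s * math.log(eps)


def p_coal_sum(r, s, eps, delta):
    """Discrete-sum form: the first line of Equation A2."""
    n_gen = int(round(sweep_duration(eps, s) + delta))
    total = 0.0
    for t in range(1, n_gen + 1):
        x1 = logistic(eps, s, t)
        x2 = logistic(eps, s, t - delta)  # deme 2 lags deme 1 by delta
        total += 1.0 - x1 - x2 + 2.0 * x1 * x2 / (x1 + x2)
    return 1.0 - 2.0 * r * total


def p_coal_closed(r, s, eps, delta):
    """Power form: ((1 + exp(-s*delta)) * eps / 2) ** (2r/s)."""
    return ((1.0 + math.exp(-s * delta)) * eps / 2.0) ** (2.0 * r / s)


def p_coal_panmictic(r, s, eps):
    """Limit when exp(-s*delta) is negligible: (eps / 2) ** (2r/s)."""
    return (eps / 2.0) ** (2.0 * r / s)


# Example parameters from the appendix text; r is an illustrative choice.
two_n, s, m = 1e5, 0.01, 0.001
eps = 1.0 / (2.0 * two_n * s)  # 1/(4Ns), with 4N = 2 * (2N)
delta = 182.5                  # quoted delay, not derived here
r = 1e-6

tau = sweep_duration(eps, s)
print("tau =", tau, " 1/m =", 1.0 / m)
print("sum form      :", p_coal_sum(r, s, eps, delta))
print("closed form   :", p_coal_closed(r, s, eps, delta))
print("panmictic form:", p_coal_panmictic(r, s, eps))
```

With these inputs the sum and the power form should agree closely, and the power form should sit slightly above the panmictic limit because exp(-s*delta) is not negligible for this parameter set, which is the situation the paragraph above describes.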
