INVESTIGATION
Hitchhiking Effect of a Beneficial Mutation Spreading in a Subdivided Population
Yuseob Kim*,†,1 and Takahiro Maruki* *Center for Evolutionary Medicine and Informatics and School of Life Sciences, Arizona State University, Tempe, Arizona 85287, and †Department of Life Science, Ewha Womans University, Seoul, Korea 120-750
ABSTRACT A central problem in population genetics is to detect and analyze positive natural selection by which beneficial mutations are driven to fixation. The hitchhiking effect of a rapidly spreading beneficial mutation, which results in local removal of standing genetic variation, allows such an analysis using DNA sequence polymorphism. However, the current mathematical theory that predicts the pattern of genetic hitchhiking relies on the assumption that a beneficial mutation increases to a high frequency in a single random-mating population, which is certainly violated in reality. Individuals in natural populations are distributed over a geographic space. The spread of a beneficial allele can be delayed by limited migration of individuals over the space and its hitchhiking effect can also be affected. To study this effect of geographic structure on genetic hitchhiking, we analyze a simple model of directional selection in a subdivided population. In contrast to previous studies on hitchhiking in subdivided populations, we mainly investigate the range of sufficiently high migration rates that would homogenize genetic variation at neutral loci. We provide a heuristic mathematical analysis that describes how the genealogical structure at a neutral locus linked to the locus under selection is expected to change in a population divided into two demes. Our results indicate that the overall strength of genetic hitchhiking (the degree to which expected heterozygosity decreases) is diminished by population subdivision, mainly because opportunity for the breakdown of hitchhiking by recombination increases as the spread of the beneficial mutation across demes is delayed when migration rate is much smaller than the strength of selection. 
Furthermore, the amount of genetic variation after a selective sweep is expected to be unequal over demes: a greater reduction in expected heterozygosity occurs in the subpopulation from which the beneficial mutation originates than in its neighboring subpopulations. This raises a possibility of detecting a “hidden” geographic structure of population by carefully analyzing the pattern of a selective sweep.
Copyright © 2011 by the Genetics Society of America
doi: 10.1534/genetics.111.130203
Manuscript received May 1, 2011; accepted for publication June 2, 2011
1Corresponding author: Department of Life Science and Division of EcoScience, Ewha Womans University, 11-1 Daehyun-dong, Seodaemun-Ku, Seoul, Korea 120-750. E-mail: [email protected]
Genetics, Vol. 189, 213–226 September 2011

When a beneficial mutation arises in a population and rapidly increases to high frequency by directional selection, it also increases the frequency of neutral alleles on the same chromosome at linked polymorphic loci, resulting in a sudden reduction in genetic variation. This effect, termed genetic hitchhiking (Maynard Smith and Haigh 1974) or selective sweep, provides a powerful means to identify and study recent episodes of adaptive evolution (reviewed in Nielsen 2005; Sabeti et al. 2006; Thornton et al. 2007; Akey 2009; Stephan 2010). Numerous studies advanced the mathematical model of this evolutionary process and now provide accurate theoretical predictions and tools for genomic data analyses (Maynard Smith and Haigh 1974; Kaplan et al. 1989; Stephan et al. 1992; Fay and Wu 2000; Kim and Stephan 2002; Hermisson and Pennings 2005; Etheridge et al. 2006). However, major theoretical results were obtained from models that consider the spread of a beneficial mutation in a single random-mating population.

Natural populations are composed of individuals that are distributed over geographical space. Therefore, with limited migration, mating occurs more frequently among individuals that are close to each other than among those that are far apart on the space. This geographic structure is often reduced to a simple demographic model in which a number of small populations (or demes), each of which is panmictic, are connected to each other by limited migration (Wright 1940). In this model, a proportion $m$ of individuals in a deme is replaced by migrants from other demes each generation. A fundamental result of spatial population genetics is that, if $m$ is sufficiently large so that $Nm > 1$, where $N$ is the population size of a deme, neutral genetic variation is homogenized over the entire system of connected demes at equilibrium under mutation–drift–migration balance (Slatkin 1987). Therefore, even when a population has a clear geographic structure ($m \ll 0.5$), its pattern of polymorphism at many neutral markers may not deviate from that under panmixia. In this case, modeling a natural population to be panmictic would not cause much problem.

However, if an evolutionary process occurs at a timescale that is much shorter than that of the neutral coalescent, limited migration may lead to a heterogeneous footprint of the process over geographic space. The spread of a beneficial mutation is a case of such fast evolutionary change. Let the frequency of a new beneficial mutation initially increase by a factor of $1 + s$ per generation in a local deme. Then, its spread to the entire population will be limited by migration if $m < s$, regardless of the value of $Nm$. Namely, the geographical structure of natural populations is expected to cause the allele frequency trajectory of a beneficial allele to deviate from that in a panmictic population. Then, the hitchhiking effect of the beneficial mutation spreading over a subdivided population may be different from that in a panmictic population. For example, Barton (2000) predicted that a beneficial mutation spreading over a geographic distance will produce a weaker hitchhiking effect because the time taken for the mutation to reach fixation is longer compared to the case of panmixia.

The theory of genetic hitchhiking in a subdivided population was developed in previous studies, notably by Slatkin and Wiehe (1998) and Santiago and Caballero (2005). They mainly analyzed the model in which subpopulations are isolated with weak migration ($Nm \ll 1$), under which a beneficial mutation goes to fixation in one population before it starts to increase in the next population (i.e., the locus under selection is polymorphic in one deme only at a given time). Their results indicated that, if neutral variation near the target of selection is initially homogeneous among subpopulations, the sequential fixations of the beneficial allele may create genetic differentiation, because occasional recombination may cause different neutral alleles at a locus to hitchhike in different subpopulations. Therefore, Wright's $F_{ST}$ would increase from a small to an intermediate value by hitchhiking (Slatkin and Wiehe 1998; Bierne 2010). On the other hand, if subpopulations are highly differentiated initially, the fixation of a beneficial allele in the entire population reduces the level of differentiation at linked neutral loci, because common neutral alleles are likely to hitchhike to the beneficial mutation with limited recombination. Therefore, $F_{ST}$ decreases from a high to an intermediate value (Santiago and Caballero 2005). These theoretical studies helped interpret the geographic pattern of neutral polymorphism shaped by natural selection in a highly structured species such as Drosophila ananassae (Stephan et al. 1998; Baines et al. 2004; Das et al. 2004) and others (Faure et al. 2008).

In this study, we are interested in a subdivided population with more frequent migration ($Nm > 1$), under which the pattern of long-term neutral variation would be similar to that under a panmictic population and, more importantly, the frequency of a beneficial allele may be rising concurrently, if not in complete synchrony, in multiple demes. This biological condition is important because, as explained above, natural populations have a geographic structure that can modify the frequency trajectories of beneficial mutations over space while this structure is often undetected at the neutral loci (thus the genomic average pattern of variation) as $Nm > 1$. We are investigating the effect of this hidden geographic structure on the pattern of selective sweeps. Our analysis begins with a simple model in which two equal-sized panmictic populations are connected by limited migration and a beneficial mutation occurs in one deme and then spreads to the entire population. Even with this simple model, we could provide only heuristic mathematical analysis based on numerous simplifying assumptions and then compare the solution with computer simulations. However, our limited results still clearly demonstrate the important effect of population subdivision on the pattern of genetic hitchhiking.

Model

A brief overview of the model of selective sweep in a subdivided population, in comparison with that in a panmictic population, is presented in Figure 1. A positively selected allele, B, rapidly spreads in the population in association with a particular neutral “hitchhiker” allele at a linked locus. Such an association is broken down by recombination events, which allow the residual levels of polymorphism after the fixation of the B allele. Key differences in the process of genetic hitchhiking between panmictic and subdivided populations are identified in Figure 1. First, it takes longer for B to be fixed in a subdivided population because the spread of B in the second deme starts only after B has increased to a sufficiently high frequency in the first deme. Second, in the panmictic population, opportunity for the recombination breakdown of allelic association monotonically decreases while the frequency of B increases. On the other hand, such recombination occurs as a two-step process in the subdivided population, first in deme 1 and later in deme 2. This may effectively increase the overall rate of recombination breakdown and thus weaken hitchhiking. Importantly, depending on which B-bearing chromosomes migrate and increase under positive selection in the second deme, the same or different alleles at the neutral locus may be hitchhiked to high frequency in the two demes.

In our model of a subdivided population, $2NK$ haploid individuals are subdivided equally into $K$ demes. Unless stated otherwise, demes are structured according to the circular stepping-stone model if $K > 2$. Demes are indexed by 1 to $K$, indicating their spatial order. Demes 1 and $K$ are neighboring each other, thus making a circular population structure. Generations are not overlapping and, in each generation, haploids reproduce in the order of selection, recombination, and migration. During migration, the proportion $m$ of haploids in deme $j$ moves to neighboring demes ($m/2$ to deme $j - 1$ and
$m/2$ to deme $j + 1$). Note that for $K = 2$, two demes exchange $2Nm$ migrants per generation. For modeling the hitchhiking effect, we consider two biallelic loci, a selected locus and a neutral locus, that are partially linked with the probability of recombination $r$ per generation. At the selected locus, mutation from the wild-type allele b to a beneficial allele B, with selective advantage $s$, arises on a particular chromosome in deme 1. At the time of this mutation, each of $K$ demes is polymorphic at the neutral locus, with A and a alleles in frequencies $p_0$ and $1 - p_0$, respectively. After genetic hitchhiking, the frequency of A in deme $j$ changes to $p_j$ and $p = \sum_j p_j / K$. We are mainly concerned about the change in heterozygosity in the entire population (from $\tilde{H} = 2p_0(1 - p_0)$ to $H^{(T)} = 2p(1 - p)$).

Figure 1 Schematic illustration of the two-locus two-allele model of selective sweeps in panmictic and subdivided ($K = 2$) populations. Chromosomes found in populations are shown to carry alleles at a locus under selection (wild-type allele b or beneficial allele B) and a neutral locus (allele A or a). The panmictic population is initially fixed for the b allele and a single copy of the B allele appears on a chromosome carrying allele A (the “hitchhiker” allele), which exists in equal frequency with a (stage 1a). The frequency of haplotype BA thus increases rapidly under positive selection and limited recombination. It is shown that a BA chromosome undergoes recombination with a ba chromosome to allow the spread of Ba haplotype in the population (1b indicated by “·”). Allele a thus survives the wipeout but exists in low frequency when B reaches fixation (1c). In the subdivided population, the rapid spread of B, also in association with A, is initially limited to the first deme (2a and 2b). While B is increasing in frequency, its association with A is broken by recombination (2b) and a chromosome carrying the B and the A allele migrates to the second deme and starts increasing there (2c). B in the second deme is also subject to recombination with a (2c). B is fixed in the first deme while it is still in intermediate frequency in the second deme (2d). When B becomes fixed in the entire population (2e), allele a is in low frequency in both demes. Note that it could have been a Ba chromosome instead of a BA chromosome that migrated (in stage 2c) and initiated the spread of B in the second deme, in which case allele a would become dominant in the second deme and result in much less change in the overall allele frequency at the neutral locus in the entire population.

A forward-in-time simulation is built directly on this discrete-time genetic model. At each generation, haplotype frequencies at $K$ demes are changed deterministically by selection, recombination, and migration, followed by the step of random sampling that uses a random binomial number generator (Kim and Wiehe 2009). The initial allele frequency at the neutral locus is given as a fixed value (0.2) for all demes. We also specified initial frequencies for $K$ demes sampled from equilibrium distribution at the balance of mutation, migration, and drift (obtained by separate forward-in-time simulations) but this did not yield different outcomes when the hitchhiking effect was measured by $H^{(T)}/\tilde{H}$ (data not shown). Initial haplotypes were given such that it simulates the occurrence of a beneficial mutation at a randomly chosen chromosome. If the beneficial mutation is lost, the simulation run is repeated from a new initial condition until it reaches fixation in the entire population. All simulation results are based on 10,000 replicates for each parameter combination.

Migration-Limited Trajectory of Beneficial Mutation

The frequency of beneficial mutation rapidly increases within a deme by positive selection. However, its spread into the entire population might be limited by the rate of migration between demes. Namely, there might be a “delay” in the fixation of beneficial mutation in a subdivided population compared to the panmictic population of equal size. To analyze the hitchhiking effect in this model, we first need to understand how much delay in the spread of beneficial mutation is caused by geographical structure of the population. We first consider the case of $K = 2$. It is assumed that the beneficial mutation arising in deme 1 is eventually fixed in both demes 1 and 2. Let $X_j(T)$ be the frequency of allele B in deme $j$ at time $T$, when time is counted forward in generations and $T = 0$ when the mutation to allele B happened in deme 1. Then, we define $\hat{T}_j = \max_T [X_j(T) > 0 \text{ and } X_j(T - 1) = 0]$. Namely, $\hat{T}_j$ marks the time when the copy of allele B that survives loss by genetic drift is established in deme $j$. We define the delay in the spread of allele B by

\[ d = |\hat{T}_2 - \hat{T}_1|. \tag{1} \]

(Note that, with $m \ll s$, $\hat{T}_2 > \hat{T}_1 = 0$ in most cases. However, with more frequent migration, deme 1 may lose the B allele by genetic drift but later receive the allele from deme 2. In
this case the roles of demes 1 and 2 are reversed and the delay becomes $\hat{T}_1 - \hat{T}_2$.)

The expected value of $d$ can be solved by approximating $X_1(T)$ by the deterministic trajectory of a beneficial mutation, starting from frequency $\epsilon$ ($\ll 1$), in a single panmictic population of size $2N$. It is therefore assumed that migration between two populations does not affect the trajectory, which might be justified for $m < s$. Therefore, we define

\[ X_1(T) = \frac{\epsilon}{\epsilon + \exp(-sT)} \tag{2} \]

(Stephan et al. 1992). In each generation, on average $2NmX_1(T)$ copies of allele B enter deme 2 from deme 1. Each copy of B is lost by genetic drift with probability $1 - 2s$, approximately, assuming $s \ll 1$. At least one copy of B successfully establishes in deme 2 at time $T$ with probability $1 - (1 - 2s)^{2NmX_1(T)}$. Therefore, the mean waiting time until the first occurrence of such a migrant copy is given by (using $\sum_{n=1}^{\infty} n(1 - a_n)\prod_{i=0}^{n-1} a_i = \sum_{n=1}^{\infty} \prod_{i=0}^{n} a_i$ if $|a_n| < 1$ for all $n$, and making a Taylor series approximation)

\[
\begin{aligned}
E[d] &= \sum_{T=1}^{\infty} T \left(1 - (1 - 2s)^{2NmX_1(T)}\right) \prod_{i=0}^{T-1} (1 - 2s)^{2NmX_1(i)} \\
&= \sum_{T=1}^{\infty} \prod_{i=0}^{T} (1 - 2s)^{2NmX_1(i)} \\
&\approx \sum_{T=1}^{\infty} \exp\left(-4Nsm \sum_{i=0}^{T} X_1(i)\right) \\
&\approx \int_0^{\infty} \exp\left(-4Nsm\,Y_1(T)\right) dT.
\end{aligned}
\]

Here,

\[ Y_1(T) = \sum_{i=0}^{T} X_1(i) \approx \int_0^{T} X_1(z)\,dz = \int_0^{T} \frac{\epsilon}{\epsilon + e^{-sz}}\,dz = \frac{1}{s} \log\left(\frac{1 + \epsilon e^{sT}}{1 + \epsilon}\right). \]

Therefore,

\[ E[d] \approx \int_0^{\infty} \left(\frac{1 + \epsilon}{1 + \epsilon e^{sT}}\right)^{4Nm} dT \approx \int_0^{\infty} \frac{1}{(1 + \epsilon e^{sT})^{4Nm}}\,dT \equiv \bar{d}. \tag{3} \]

An essentially identical result was obtained by Slatkin (1976). With $4Nm \gg 1$, $\bar{d}$ is mostly determined by the above integration in the interval $[0, -(1/s)\ln \epsilon]$, where we may use $(1 + \epsilon e^{sT})^{4Nm} \approx 1 + 4Nm\,\epsilon e^{sT}$. Then, a simpler approximation (however, an overestimate of $\bar{d}$) is obtained as

\[ \hat{d} \equiv \int_0^{\infty} \frac{1}{1 + 4Nm\,\epsilon e^{sT}}\,dT = \frac{1}{s} \log\left(\frac{4Nm\epsilon + 1}{4Nm\epsilon}\right). \tag{4} \]

Considering that the allele frequency of a beneficial mutation starting from one copy is elevated by the inverse of its fixation probability (i.e., probability of surviving extinction by genetic drift, $\approx 2s$) relative to its deterministic trajectory, we may use $\epsilon = 1/(2N)/(2s) = 1/(4Ns)$ to maximize the fit of Equation 2 to the stochastic trajectory of allele B (Maynard Smith 1971; Kim and Nielsen 2004). Then, we obtain the approximation of the mean delay time

\[ \hat{d} \equiv \frac{1}{s} \log\left(1 + \frac{s}{m}\right). \tag{5} \]

This result shows that the delay time is on the same order ($1/s$) as the duration of the fixation process ($\tau = -(2/s)\log(\epsilon)$) and critically depends on the relative ratio of $s$ and $m$. Figure 2 shows the comparison of these approximations to the delay time observed in frequency-based stochastic simulations.

Hitchhiking Effect of the Beneficial Mutation: “Marked” Coalescent

Next, we analyze the hitchhiking effect of a beneficial mutation that spreads across demes in the manner described above. Our goal is to obtain an approximate solution for the change of heterozygosity due to hitchhiking. The analysis is based on modeling genetic hitchhiking by the “marked” coalescent, which is derived from the structural coalescent model of genetic hitchhiking (Kaplan et al. 1989). In the structural coalescent model, gene lineages at a neutral locus traced from a present-day sample into the past are described to jump between two genetic backgrounds (corresponding to beneficial, B, and ancestral, b, alleles) by recombination, while coalescence among lineages is allowed only when they are on the same background. It can be shown that, during the selective phase (the period between the birth and fixation of the B allele), lineages that arrive in the b background rarely move back to the B background and also rarely coalesce to each other (Kaplan et al. 1989; Durrett and Schweinsberg 2004; Etheridge et al. 2006; Pfaffelhuber et al. 2006). If we thus model that lineages moving from the B to the b background by recombination remain distinct until they exit the selective phase, the genealogy at the neutral locus is fully described by events of such recombination mapped along the genealogy at the selected locus (referred to below as the B genealogy). Therefore, one may first obtain a B genealogy and then calculate how often it is marked by recombination events to predict the pattern of genetic variation at a linked neutral locus (Pfaffelhuber et al. 2006). Here we first show how a marked coalescent allows a simple derivation of the standard result of genetic hitchhiking [hard selective sweep in a single random-mating population (Maynard Smith and Haigh 1974)].

As we are interested in obtaining the heterozygosity after the fixation of the beneficial mutation, we obtain the B genealogy starting from two distinct beneficial mutations on sampled chromosomes. We consider a single panmictic population of size $2N$ in which the beneficial mutation is quasi-fixed with frequency $1 - \epsilon$. Time, $t$, is now counted backward in generations, $t = 0$ being the present (time of sampling). Then, the frequency of the beneficial mutation, B, in the population is modeled to decrease from $1 - \epsilon$ to $\epsilon$.
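As a numerical companion to Equations 2 to 5 and to the backward-time trajectory, the following Python sketch evaluates the delay approximations and a mirrored trajectory. It is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the parameter values are arbitrary, and `x_backward` simply assumes that the backward trajectory is Equation 2 reflected about the duration $\tau = -(2/s)\log(\epsilon)$, which may differ from the expression used in the paper.

```python
import math


def x_forward(T, s, eps):
    """Equation 2: deterministic frequency of B at forward time T."""
    return eps / (eps + math.exp(-s * T))


def x_backward(t, s, eps):
    """ASSUMPTION: Equation 2 mirrored in time, with t counted backward
    from the present and tau = -(2/s) log(eps) the sweep duration."""
    tau = -(2.0 / s) * math.log(eps)
    return x_forward(tau - t, s, eps)


def delay_eq3(N, m, s, eps, steps=100_000):
    """Equation 3 (last form) by trapezoidal integration of
    (1 + eps * exp(sT)) ** (-4Nm) over T."""
    k = 4.0 * N * m
    # Truncate where eps * exp(sT) = exp(40); the integrand is negligible there.
    t_max = (math.log(1.0 / eps) + 40.0) / s
    h = t_max / steps

    def integrand(T):
        # exp/log1p form avoids overflow from raising a large number to a power.
        return math.exp(-k * math.log1p(eps * math.exp(s * T)))

    total = 0.5 * (integrand(0.0) + integrand(t_max))
    for i in range(1, steps):
        total += integrand(i * h)
    return total * h


def delay_eq4(N, m, s, eps):
    """Equation 4: closed form of the simpler approximation."""
    c = 4.0 * N * m * eps
    return math.log((c + 1.0) / c) / s


def delay_eq5(m, s):
    """Equation 5: Equation 4 evaluated at eps = 1/(4Ns)."""
    return math.log(1.0 + s / m) / s


if __name__ == "__main__":
    N, m, s = 10_000, 0.001, 0.05  # illustrative values only
    eps = 1.0 / (4.0 * N * s)
    print("Eq. 3 (numerical):", delay_eq3(N, m, s, eps))
    print("Eq. 4 (closed form):", delay_eq4(N, m, s, eps))
    print("Eq. 5 (closed form):", delay_eq5(m, s))
    print("backward frequency at t = 0:", x_backward(0.0, s, eps))
```

With $\epsilon = 1/(4Ns)$, the values from `delay_eq4` and `delay_eq5` should coincide, and the numerical value of Equation 3 should fall below them whenever $4Nm \ge 1$, in line with the remark that Equation 4 is an overestimate.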
Ignoring the probability of mutation in the period between $t = 0$ and $\tau$, the two neutral alleles in the sample will be observed different only if their ancestors at $t = \tau$ are distinct, which happens with probability $1 - P_{\mathrm{coal}}$, and different. Let $\tilde{H}$ be the expected heterozygosity at the neutral locus at $t = \tau$. Then, the mean heterozygosity at the neutral locus at present is given approximately by