Population Genetics of Polymorphism and Divergence in Rapidly Evolving Populations
Total Page:16
File Type:pdf, Size:1020Kb
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.28.450258; this version posted June 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Population genetics of polymorphism and divergence in rapidly evolving populations Matthew J. Melissa1, Benjamin H. Good2, Daniel S. Fisher2, and Michael M. Desai1∗ 1Department of Organismic and Evolutionary Biology, Department of Physics, Quantitative Biology Initiative, and NSF-Simons Center for Mathematical and Statistical Analysis of Biology, Harvard University 2Department of Applied Physics and Department of Bioengineering, Stanford University ∗[email protected] (Dated: June 29, 2021) In rapidly evolving populations, numerous beneficial and deleterious mutations can arise and segregate within a population at the same time. In this regime, evolutionary dy- namics cannot be analyzed using traditional population genetic approaches that assume that sites evolve independently. Instead, the dynamics of many loci must be analyzed simultaneously. Recent work has made progress by first analyzing the fitness variation within a population, and then studying how individual lineages interact with this travel- ing fitness wave. However, these \traveling wave" models have previously been restricted to extreme cases where selection on individual mutations is either much faster or much slower than the typical coalescent timescale Tc. In this work, we show how the traveling wave framework can be extended to intermediate regimes in which the scaled fitness effects of mutations (Tcs) are neither large nor small compared to one. This enables us to describe the dynamics of populations subject to a wide range of fitness effects, and in particular, in cases where it is not immediately clear which mutations are most important in shaping the dynamics and statistics of genetic diversity. We use this ap- proach to derive new expressions for the fixation probabilities and site frequency spectra of mutations as a function of their scaled fitness effects, along with related results for the coalescent timescale Tc and the rate of adaptation or Muller's ratchet. We find that competition between linked mutations can have a dramatic impact on the proportions of neutral and selected polymorphisms, which is not simply summarized by the scaled selection coefficient Tcs. We conclude by discussing the implications of these results for population genetic inferences. I. INTRODUCTION with fitness effect s will fix. From this quantity, we can compute the rate at which mutations will accu- Models of evolutionary dynamics and population mulate (i.e. the rate of genotypic divergence from an genetics aim to predict how observable properties ancestor) and the rate, v, at which the mean fitness of the distribution of genotypes within a population of the population changes over time (McCandlish are shaped by evolutionary forces such as mutations, and Stoltzfus, 2014). In addition to these properties natural selection, and genetic drift. We focus here on of long-term divergence, we wish to predict expected the simplest possible models of these three core evo- patterns of genetic diversity within the population lutionary forces, which consider a population con- at any given time. For example, we aim to compute sisting of N individuals, with each individual sub- the coalescence timescale Tc (the typical time since ject to new mutations at rate U per generation, and individuals are related by common ancestry, which with the fitness effect of each mutation drawn from determines the expected overall level of genetic di- some distribution ρ(s). More complicated models of versity in the population), as well as other readily evolutionary dynamics can incorporate the effects of observable quantities such as the site frequency spec- additional evolutionary forces (e.g., spatial structure trum (which describes the relative abundances of and migration, environmental fluctuations in time polymorphisms at different allele frequencies). Ob- or space, and ecological interactions). However, the taining predictions for patterns of genetic diversity dynamics arising from even the simplest models| is of particular interest because of the potential to incorporating only the forces of mutation, selection, use these predictions, in combination with measured genetic drift|are already rather complex. levels of contemporary sequence diversity, to infer the strength of the various evolutionary forces which Within the context of these simple models, we have acted on a population. wish to understand how several types of observables depend on the key parameters. Specifically, we aim Much classical work proceeds by assuming the dif- to predict the probability, pfix(s), that a mutation ferent loci in the genome evolve completely indepen- bioRxiv preprint doi: https://doi.org/10.1101/2021.06.28.450258; this version posted June 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. 2 dently of one another (i.e., by ignoring the physical vian coalescent (McVean and Cardin, 2005)), and linkage of mutations within the same genomic seg- can also incorporate other complications such as ge- ment). The dynamics are then relatively straightfor- ographic subdivision, population structure, and de- ward to analyze (Fisher, 1931; Wright, 1931). This mographic changes (Nordborg, 2004). However, the assumption is reasonable in sexually evolving pop- backwards-time nature of the coalescent approach ulations where recombination is sufficiently rapid makes it difficult to incorporate the effects of se- compared to the timescales on which mutation, se- lection: genealogies cannot be considered indepen- lection and genetic drift generate correlations be- dently of the selected mutations which occur on their tween loci (Falconer and Mackay, 1996). In this case, branches (Kaplan et al., 1988). Efforts to do so have a given mutation is shuffled onto a large number of largely been limited to simulation-based or essen- genetic backgrounds over the course of its evolution- tially numerical approaches, such as the ancestral se- ary trajectory, and any correlations in the fates of lection graph (Krone and Neuhauser, 1997), though mutations at different loci become negligible. analytical progress has been made in certain cases, In asexual populations, however, mutations are particularly in the presence of purifying selection on are fully linked and do not evolve independently of deleterious mutations (Charlesworth, 1994; Hudson one another. Even in the genomes of obligately sex- and Kaplan, 1995). ual organisms, loci that are physically close within To model the effects of selection on beneficial mu- a chromosome will be broken up by recombination tations, much work has instead been done using only over very long timescales, and cannot necessar- forward-time approaches. Broadly speaking, these ily be treated as independently evolving (Weissman approaches seek to characterize the trajectories of and Barton, 2012). A useful opposite regime to con- mutant lineages in a probabilistic sense. Provided sider is then the limit of a genomic region that is selection is sufficiently strong and selected mutations small enough that recombination can be entirely ne- are sufficiently rare (more precisely, if Ns 1 and glected. One can imagine that on sufficiently short NU log Ns < 1), a beneficial mutation typically ei- genomic distance scales (within so-called \linkage ther sweeps to fixation or is purged before another blocks"), evolution proceeds entirely asexually, while such mutation arises within the same linkage block on larger genomic distance scales evolution acts on (Desai and Fisher, 2007). Thus, within this strong- some collection of recombining blocks. It is not en- selection weak-mutation (SSWM) regime, multiple tirely clear how this division between asexual linkage selected mutations are unlikely to segregate simulta- blocks and larger-scale regions consisting of multiple neously within the block. In this case, the dynamics recombining blocks is best modeled, though some re- of each selected mutation can be treated indepen- cent work has begun to address this question (Good dently of one another, using the single-locus meth- et al., 2014; Neher et al., 2013; Weissman and Hal- ods described above (Gillespie, 1983). The impact latschek, 2014). However, we focus in this paper on of these mutations on linked neutral diversity can an even simpler question: how to model the evolu- also be calculated from these dynamics (e.g., to an- tion within a single nonrecombining linkage block. alyze the signature of an isolated selective sweep) As we will see, even within the context of the simple (Smith and Haigh, 1974). However, recent work models described above, evolutionary dynamics even has shown that in a wide range of microbial and within perfectly asexual linkage blocks are surpris- viral populations, and potentially in many linked re- ingly complex, and remain incompletely understood. gions of the genomes of obligately sexual organisms There is a long history of efforts to analyze the such as humans, multiple beneficial mutations often dynamics of purely asexual evolution, and numer- segregate simultaneously (Lieberman et al., 2014; ous approaches