Soft Sweeps and Beyond: Understanding the Patterns and Probabilities of Selection Footprints Under Rapid Adaptation
Total Page:16
File Type:pdf, Size:1020Kb
Methods in Ecology and Evolution 2017, 8, 700–716 doi: 10.1111/2041-210X.12808 HOW TO MEASURE NATURAL SELECTION Soft sweeps and beyond: understanding the patterns and probabilities of selection footprints under rapid adaptation Joachim Hermisson*,1 and Pleuni S. Pennings2 1Department of Mathematics and Max F. Perutz Laboratories, University of Vienna, Vienna, Austria; and 2Department of Biology, San Francisco State University, San Francisco, CA, USA Summary 1. The tempo and mode of adaptive evolution determine how natural selection shapes patterns of genetic diver- sity in DNA polymorphism data. While slow mutation-limited adaptation leads to classical footprints of ‘hard’ selective sweeps, these patterns are different when adaptation responds quickly to a novel selection pressure, act- ing either on standing genetic variation or on recurrent new mutation. In the past decade, corresponding foot- prints of ‘soft’ selective sweeps have been describedbothintheoreticalmodelsandinempiricaldata. 2. Here, we summarize the key theoretical concepts and contrast model predictions with observed patterns in Drosophila, humans, and microbes. 3. Evidence in all cases shows that ‘soft’ patterns of rapid adaptation are frequent. However, theory and data also point to a role of complex adaptive histories in rapid evolution. 4. While existing theory allows for important implications on the tempo and mode of the adaptive process, com- plex footprints observed in data are, as yet, insufficiently covered by models. They call for in-depth empirical study and further model development. Key-words: adaptation, Drosophila,genetichitch-hiking,HIV,lactase,malaria,selectivesweep origins of the same allele have long been ignored in molecu- Hard and soft selective sweeps lar evolution. The view of adaptation in molecular evolution has long Around a decade ago, several publications started to explore focused almost exclusively on a mutation-limited scenario. selective sweeps outside the mutation-limitated scenario The assumption in such a scenario is that beneficial muta- (Innan & Kim 2004; Hermisson & Pennings 2005; Przeworski, tions are rare, so that they are unlikely to be present in the Coop & Wall 2005; Pennings , & Hermisson 2006a, b). These population as standing genetic variation (SGV) or to occur papers described novel patterns for genetic footprints termed multiple times in a short window of time. Mutation-limited ‘soft sweeps’. It has since become clear that non-mutation-lim- adaptation therefore occurs from single new beneficial muta- ited adaptation and soft sweeps are probably much more com- tions that enter the population only after the onset of the mon than originally thought. However, recent years have also selection pressure (e.g. due to environmental change). When seen some lively debate about whether soft sweeps are every- such a beneficial mutation fixes in the population, it reduces where (Messer & Petrov 2013) or rather a chimera (Jensen the genetic diversity at linked neutral loci according to the 2014). Some aspects of this dispute root in diverging interpreta- classical model of a ‘hard’ selective sweep (Maynard Smith & tion of the available data, others go back to conceptual differ- Haigh 1974; Kaplan, Hudson & Langley 1989; Barton 1998). ences. With this review, we aim to provide an overview and If recurrent beneficial mutation is considered at all in a muta- intuitive understanding of the relevant theory and the patterns tion-limitedscenario,eachsuchmutationisassumedtocre- that are observed in model species. ate a new allele. Single adaptive steps then either proceed independently of each other or compete due to clonal inter- DEFINITIONS ference, where adaptation is slowed by linkage (Gerrish & Lenski 1998; Desai & Fisher 2007). Despite evidence for Selective sweeps refer to patterns in genomic diversity that are rapid adaptation from quantitative genetics and phenotypic caused by recent adaptation. Characteristic patterns (foot- studies (Messer, Ellner & Hairston 2016), SGV or recurrent prints) arise if rapid changes in the frequency of a beneficial allele, driven by positive selection, distort the genealogical his- *Correspondence author. E-mail: [email protected] tory of samples from the region around the selected locus. We Both authors contributed equally. can therefore understand sweep footprints via properties of © 2017 The Authors. Methods in Ecology and Evolution published by John Wiley & Sons Ltd on behalf of British Ecological Society. This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes. Beyond soft sweeps 701 their underlying genealogy or coalescent history (Wakeley a process and the type of selective sweep that results, see 2008). In this context, the key genealogical implication of Fig. (2). In particular, adaptation from SGV can either lead to mutation-limited adaptation is that, at the selected locus itself, a hard sweep (if a single copy from the standing variation is the the time to the most recent common ancestor (MRCA) of the ancestor of all beneficial alleles in the sample), or to a soft sample, TMRCA, is shorter than the time that has elapsed since sweep, which can either be single-origin or multiple-origin.The the onset of the new selection pressure, TS. For recent adapta- notion of a ‘soft sweep’ as defined here following previous tion, we thus obtain coalescent histories that are much shorter work (e.g. Hermisson & Pennings 2005; Messer & Petrov 2013; than the expected neutral coalescent time TN (2Ne generations Jensen 2014; Berg & Coop 2015) is therefore not synonymous for a pair of lineages in a diploid population of effective size for ‘adaptation footprint from SGV’. Sweep types refer to Ne). It is this shortened genealogy that is responsible for the classes of patterns that result from characteristic coalescent characteristic footprint of a hard sweep (see the Footprints sec- genealogies, not to evolutionary processes. This leaves us the tion below). task to explore the probability of each sweep type under a given • We define a hard sweep based on the sample genealogy of evolutionary scenario and to use this information for statistical the beneficial allele, requiring that (i) TMRCA TN (recent inference of process from pattern. adaptation), and (ii) TMRCA ≤ TS (a single recent ances- tor), see Fig. 1a. Footprints of hard and soft sweeps If adaptation is not mutation limited because the benefi- cial allele was already present in the population prior to the In order to distinguish footprints of hard and soft sweeps, onset of selection, or because the mutation occurs recur- we need to understand how these footprints are shaped by rently, both assumptions of a hard sweep can be violated: the different characteristic features of the underlying ThetimetotheMRCAtypicallypredatestheonsetofthe genealogies. We assume the following model. A diploid selection pressure, TMRCA > TS, and can approach the population of constant size Ne experiences a new selection neutral coalescent time, TMRCA TN. Still, if adaptation is pressure at time TS. The derived allele A is generated from recent, we can observe a characteristic footprint: a soft the wild type a with fitness 1 by recurrent mutation of rate selective sweep. u. The frequency of A is x = x(t). Prior to TS, A is neutral • We define a soft sweep as a pattern of recent adaptation with or deleterious with fitness disadvantage 1 À sd, sd ≥ 0. After more than a single ancestor prior to the onset of selection in time TS, allele A is beneficial with fitness 1 + sb.Forsim- the underlying genealogy of the beneficial allele. In formal plicity, we assume codominance. terms, we require that (i) TS TN (recent adaptation), and (ii) T > T (more than a single ancestor). MRCA S FOOTPRINTS OF HARD SWEEPS There are two different ways how soft sweep genealogies can comeabout,whichleadtodifferentpatterns(Hermisson& The hallmark of a hard sweep genealogy is a very recent Pennings 2005). common ancestor of all beneficial alleles in the sample. 1. Single-origin soft sweeps refer to a genealogy that traces back Among carriers of the beneficial allele, ancestral variation to a single mutational origin of the beneficial allele, but com- prior to the onset of selection can only be preserved if there prises multiple copies of this allele in the SGV when positive is recombination between the polymorphic locus and the selection sets in at time TS, see Fig. 1b. selection target. Between the closest recombination break- 2. Multiple-origin soft sweep: In this case, the sample genealogy points to the left and to the right of the selected allele, we of the beneficial allele comprises multiple origins of this allele obtain a core region without ancestral variation. In the from recurrent mutation (or from some other source like flanking regions, ancestral variation is maintained on some migration). These origins can occur prior to the onset of haplotypes due to recombination. These recombination hap- selection (standing variation), but also only after that (new lotypes generate characteristic signals in the site-frequency variation), Fig. 1c. spectrum. For both, hard and soft sweeps, the definitions apply irre- We can understand the characteristics of hard sweep foot- spective of whether the beneficial allele has reached fixation prints from the shape of typical genealogies as sketched in (complete sweep) or still segregates in the population (partial Fig. 1a: The genealogy directly at the selected site is ‘star-like’, sweep), as long as we restrict the sample to carriers of the bene- with all coalescence events happening in a short time interval ficial allele only. Note that, for a given sweep locus, we may when the frequency x of the beneficial allele is very small. This observe a soft sweep in one sample, but a hard sweep in a dif- is because all ancestors (between now and the MRCA) in a ferent sample. The probability to observe a soft sweep hard sweep genealogy must carry the beneficial allele.