Slim: Simulating Evolution with Selection and Linkage

NOTE SLiM: Simulating Evolution with Selection and Linkage Philipp W. Messer1 Department of Biology, Stanford University, Stanford, California 94305 ABSTRACT SLiM is an efficient forward population genetic simulation designed for studying the effects of linkage and selection on a chromosome-wide scale. The program can incorporate complex scenarios of demography and population substructure, various models for selection and dominance of new mutations, arbitrary gene structure, and user-defined recombination maps. ECENT studies suggest that linkage effects such as genetic of demography and population substructure, various models Rdraft and background selection can profoundly alter the for selection and dominance, realistic gene structure, and patterns of genetic variation in many species (Sella et al. user-defined recombination maps. Special emphasis was 2009; Lohmueller et al. 2011; Weissman and Barton 2012; further placed on the ability to model and track individual Messer and Petrov 2013). Understanding the potential im- selective sweeps—both complete and partial. While retain- pact of these linkage effects on population genetic methods ing all capabilities of a forward simulation, SLiM utilizes requires efficient forward simulations that can model the evo- sophisticated algorithms and optimized data structures that lution of whole chromosomes with realistic gene structure. enable simulations in reasonably large populations. All these Forward simulations have a long-standing tradition in features are implemented in an easy-to-use C++ command population genetics and many programs have been developed line program. The source code is freely available under the (Carvajal-Rodriguez 2010; Hoban et al. 2011). For any such GNU General Public License and can be downloaded from program, there is typically a trade-off between efficiency and http://www.stanford.edu/~messer/software. flexibility. Simulations based on combined forward-backward SLiM simulates the evolution of diploid genomes in a approaches, such as MSMS (Ewing and Hermisson 2010), can population of hermaphrodites under an extended Wright– be very fast but remain limited to scenarios with only a single Fisher model with selection (Figure 1A). In each generation, selected locus. Current programs that can model scenarios a new set of offspring is created, descending from the indi- with multiple linked selected polymorphisms, such as FRE- viduals in the previous generation. The probability of becom- GENE (Chadeau-Hyam et al. 2008), GENOMEPOP (Carvajal- ing a parent is proportional to an individual’s fitness, which is Rodriguez 2008), simuPOP (Peng and Kimmel, 2005), forwsim determined by the selection and dominance effects of the (Padhukasahasram et al. 2008), or SFS_CODE (Hernandez mutations present in its diploid genome. Gametes are gener- 2008), either lack the ability to model realistic gene structure ated by recombining parental chromosomes (both crossing or are not efficient enough to allow for simulations on the scale over and gene conversion can be modeled) and adding new of a whole chromosome in reasonably large populations. mutations. Here I present SLiM, a population genetic simulation tar- Mutations can be of different user-defined mutation types, geted at bridging the gap between efficiency and flexibility for specified by their dominance coefficients and distributions of the simulation of linkage and selection on a chromosome- fitness effects (DFE); examples could be synonymous, adaptive, wide scale. The program can incorporate complex scenarios and lethal mutations. The possibility to specify arbitrary dominance effects allows for the simulation of a variety of Copyright © 2013 by the Genetics Society of America evolutionary scenarios, including balancing selection and re- doi: 10.1534/genetics.113.152181 Manuscript received April 11, 2013; accepted for publication May 20, 2013 cessive deleterious mutations. Genomic regions can be of dif- Supporting information is available online at http://www.genetics.org/lookup/suppl/ ferent user-defined genomic element types, specified by the doi:10.1534/genetics.113.152181/-/DC1. particular mutation types that can occur in such elements and 1Address for correspondence: Department of Biology, Stanford University, 371 Serra Mall, Stanford, CA 94305. E-mail: [email protected] their relative proportions; examples could be exons and introns. Genetics, Vol. 194, 1037–1039 August 2013 1037 Figure 1 (A) Illustration of SLiM’s core algorithm for a scenario with two subpopulations. In each generation, a new set of offspring is created from the individuals in the previous generation. Parents are drawn with probabilities proportional to their fitness. Gametes are generated by recombining parental chromosomes and adding new mutations. Migration is modeled by choosing parents from the source subpopulation. (B) Comparison of runtimes and peak resident set size memory requirements between SLiM and SFS_CODE for different evolutionary scenarios. In each column, one of the four parameters L, N, u, and r was varied, while the others were kept constant at their respective base values (L = 5 Mbp, N = 500, u =1029, and r = 1028). Simulations were run for 10N generations. Data points show the observed runtimes and memory requirements averaged over 10 simulation runs. Each mutation has a specified position along the chro- in terms of all mutations and genomes present in the mosome but remains abstract in the sense that the simula- population; (ii) random samples of specific sizes drawn from tion does not specify the particular nucleotide states of a subpopulation at given time points; (iii) lists of all mutations ancestral and derived alleles. Note, however, that the user that have become fixed, together with the times when each has the freedom to associate abstract mutation types with mutation became fixed; and (iv) frequency trajectories of specific classes of events. There are no back mutations in the particular mutations over time. simulations. Fixed mutations are removed from the popula- I ran SLiM under various test scenarios to check whether tion and recorded as substitutions. its output agrees with theoretical predictions. Specifically, I SLiM is capable of modeling complex scenarios of de- analyzed levels of neutral heterozygosity under different mography and population substructure. The simulation can scenarios of demography, population substructure, and self- include arbitrary numbers of subpopulations that can be ing, the fixation probabilities of new mutations of different added at user-defined times, initialized either with new selective effects, and the reduction in neutral diversity individuals or with individuals drawn from another sub- around adaptive substitutions. Simulation results conformed population to model a population split or colonization event. with the respective theoretical predictions in all tests (Supporting Subpopulation sizes, migration rates, and selfing rates can Information, File S1, section 6). be changed over time. SLiM utilizes sophisticated algorithms and optimized To establish genetic diversity, simulations first have to go data structures to achieve its high computational efficiency through a burn-in. Alternatively, simulations can be initialized (File S1, section 7.1). The simulation is based on a hierarchi- with a set of predefined genomes provided by the user or with cal data architecture that minimizes the amount of informa- the output of a previous simulation run. The user can further tion stored redundantly. At many stages of the program, specify predetermined mutations to be introduced at certain large quantities of random numbers have to be drawn from time points. Such mutations can be used, for example, to general probability distributions. To do this efficiently, SLiM investigate individual selective sweeps or to track the frequency uses lookup tables that are precomputed only once per gen- trajectories of particular mutations in the population. Predeter- eration and then allow one to draw random numbers in O mined adaptive mutations can also be limited to partial (1) time. Most algorithms are implemented using fast rou- selective sweeps, where positive selection ceases once the tines provided in the GNU scientific library (Galassi et al. mutation has reached a particular population frequency. 2009). SLiM provides a variety of output options, including (i) To evaluate SLiM’s performance I compared its runtime the complete state of the population at specified time points, and memory requirements with those of SFS_CODE, a popular 1038 P. W. Messer forward simulation of similar scope (Hernandez 2008). For Acknowledgments these tests, a chromosome of length L was simulated with The author thanks Dmitri Petrov for continuous support uniform mutation rate u and recombination rate r in a pop- throughout the project; members of the Petrov lab, espe- ulation of size N over the course of 10N generations, assum- cially Zoe Assaf, David Enard, and Nandita Garud for testing ing an exponential DFE with 2Ns ¼ 2 5(File S1, section the program; and three anonymous reviewers for their 7.3). The base scenario used values L = 5 Mbp, N = 500, valuable comments on program and documentation. Part of u =1029, and r =1028 per site per generation. I then this research was funded by the National Institutes of Health varied these four parameters independently to analyze (grants GM089926 and HG002568 to Dmitri Petrov). how each individually affects the performance of either program. Simulations were conducted on a standard iMac desk- top with

Slim: Simulating Evolution with Selection and Linkage

Determining the Factors Driving Selective Effects of New Nonsynonymous Mutations

I Xio- and Made the Rather Curious Assumption That the Mutant Is

It's About Time: the Temporal Dynamics of Phenotypic Selection in the Wild

Fixation of New Mutations in Small Populations

Selection and Genome Plasticity As the Key Factors in the Evolution of Bacteria

Dynamics of Adaptation in Sexual and Asexual Populations Joachim Krug Institut Für Theoretische Physik, Universität Zu Köln

Selection in a Subdivided Population with Local Extinction and Recolonization

Lecture on Positive and Negative Selection

Determining the Factors Driving Selective Effects of New Nonsynonymous Mutations

Application Evolution: Part 1.1 Basics of Coevolution Dynamics

Hot Spots, Cold Spots, and the Geographic Mosaic Theory of Coevolution

An Experimental Test on the Probability of Extinction of New Genetic Variants