Computational Methods in Population Genetics

Total Page:16

File Type:pdf, Size:1020Kb

Computational Methods in Population Genetics Computational Methods in Population Genetics Dirk Metzler February 1, 2017 Contents 1 Introduction 2 1.1 Examples . .2 1.2 Some recommended books on population genetics . .3 1.3 Wright Fisher model and Kingman's Coalescent . .3 1.4 Estimators for θ and Tajima's π ................................6 1.5 Outline of methods . .8 1.5.1 ML with Importance Sampling . .8 1.5.2 MCMC for frequentists and Bayesians . .8 1.5.3 Approximate Bayesian Computation (ABC) . .9 2 Importance sampling for genealogies 9 2.1 Griffiths und Tavar´e . 12 3 Lamarc (and Migrate) 13 4 IM, IMa, IMa2 19 5 Approximate Bayesian Computation (ABC) 22 5.1 ABC with local regression correction . 23 5.2 MCMC without likelihoods . 25 5.3 Sequential / Adaptive ABC . 26 5.4 Optimizing sets of summary statistics with PLS . 27 6 Jaatha 30 6.1 Wild Tomatoes and Jaatha 1.0 . 30 6.2 Jaatha 2.0 . 38 6.3 Application to genome-wide data . 41 6.4 Statistics in jaatha beyond jsfs . 42 6.5 Conclusions . 42 7 The program STRUCTURE 42 7.1 no admixture, no sampling locations . 43 7.2 with admixture . 45 7.3 taking sampling locations into account . 47 7.4 Faster alternatives to STRUCTURE for large datasets . 49 7.4.1 ADMIXTURE . 49 7.4.2 fastSTRUCTURE . 49 7.4.3 snmf() in the Bioconductor R package LEA . 50 8 Simulating and detecting selection 52 8.1 Ancestral Selection Graphs . 53 8.2 Simulating selective sweeps and other kinds of strong selection . 54 1 9 Stats for Selection 56 9.1 Selective Sweeps . 56 9.1.1 Detecting classical selective sweeps . 56 9.1.2 Stats for detecting incomplete sweeps . 58 9.1.3 Soft Sweeps . 60 9.1.4 Signals of sweeps and demography . 61 9.2 Balancing selection . 63 9.3 Does the gene list make sense? . 64 10 Phasing and PAC 65 10.1 Classical methods for phasing . 65 10.1.1 Excoffier and Slatkin's EM algorithm . 65 10.1.2 Excursus: EM algorithm . 66 10.1.3 Basic algorithms in PHASE . 68 10.2 Li&Stephens' PAC approach . 71 10.2.1 Excursus: Stephens and Donnelly's Importance Sampling . 71 10.2.2 Estimating LD and recombination hotspots . 75 10.2.3 PAC in PHASE . 79 10.2.4 Population splitting and recombination . 80 10.2.5 Diversifying selection and recombination . 82 10.3 Phasing large genomic datasets . 83 10.3.1 fastPHASE . 83 10.3.2 Phasing with Beagle software package . 85 10.3.3 IMPUTE version 2 . 86 10.3.4 MaCH . 86 10.3.5 polyHAP . 86 1 Introduction 1.1 Examples Complex Demography ? substructure population growth recent speciation introgression? recombination within loci can we still detect selection? 2 1.2 Some recommended books on population genetics References [1] R. Nielsen, M. Slatkin (2013) An Introduction to Population Genetics: Theory and Applications. Sinauer [2] J.H. Gillespie (2004) Population Genetics { A Concise Guide. 2nd Ed., John Hopkins University Press [3] J. Felsenstein (2016+) Theoretical Evolutionary Genetics. http://evolution.genetics.washington.edu/pgbook/pgbook.html [4] J. Wakeley (2008) Coalescent Theory: An Introduction. Roberts & Company Publishers [5] J. Hein, M.H. Schierup, C. Wiuf (2005) Gene Genealogies, Variation, and Evolution: A Primer in Coalescent Theory. Oxford University Press [6] J. Felsenstein (2002) Inferring Phylogenies, Chapters 26, 27 and 28. Sinauer Associates 1.3 Wright Fisher model and Kingman's Coalescent Basic assumptions of the Wright Fisher model • non-overlapping generations • constant population size • panmictic • neutral (i.e. no selection) • no recombination • N diploid individuals population of 2N haploid alleles (in case of autosomal DNA) Wright Fisher model Each allele chooses an ancestor in the generation before. Generation 0 − 1 − 2 − 3 − 4 −− 5 − 6 − 7 − 8 − 9 Samples are assumed to be taken purely randomly from the population. 3 Generation 0 ABCD − 1 − 2 − 3 − 4 −− 5 − 6 − 7 − 8 − 9 This induces a specific random distribution for the genealogies of the sampled alleles. Generation 0 ABCD − 1 − 2 − 3 − 4 −− 5 − 6 − 7 − 8 − 9 Haploid population of size Ne Average time until two ancestral lineages coalesce: Ne generations. Scale time: (1 time unit) = (Ne generations) ) pairwise coalescence rate = 1 µ := mutation rate per generation θ := 2Ne · µ is the expected number of mutations between 2 random individuals Let Ne −! 1 The Kingman Coalescent 4 Zeit in Zeit in 12 3 ........ k−1 k Generationen N Generatioen 2N/(k(k−1)) 2/(k(k−1)) = 2/(7*6) = 0,0476 2N/(6*5) 2/(6*5) = 0,667 2N/(5*4) 2/(4*5) = 0,1 2N/(4*3) 2/(4*3) = 0,167 2N/(3*2) 2/(3*2) = 0,333 E(total length) k−1 X = 2 · 1=i 2N/(2*1) 2/(2*1) = 1 i=1 typical coalescent trees for n = 8: 6 6 6 7 2 1 2 3 5 4 3 8 3 3 7 6 8 8 5 4 1 5 4 1 7 2 1 5 4 7 8 2 7 6 2 8 6 1 1 4 3 2 3 3 5 8 6 2 4 3 4 7 8 7 8 1 2 5 5 6 1 4 7 5 4 5 7 7 3 4 3 3 8 8 5 1 5 2 6 8 6 3 4 5 2 1 8 2 7 6 1 6 1 7 2 4 simulated coalescent tree with n = 500: 5 1.4 Estimators for θ and Tajima's π Two estimators of θ θπ (\Tajima's π") Average number of pairwise differences. number of mutations θW (\Watterson's θ") = Pk−1 i=1 1=i Both are unbiased estimators of θ, i.e. EθW = Eθπ = θ. Example: Ward et al. (1991) sampled 360 bp sequences from mtDNA control region of n = 63 Nuu Chah Nulth and observed 26 mutations. 26 θ = = 5:5123 W P63 i=1 1=i This corresponds to 0.0153 Mutations per base and per 2 · Ne generations.Assuming a mutation rate −6 µb ≈ 6:6 · 10 per generation per site this leads to an effective population size of θW =360 Nce = ≈ 1150 females 2 · µb How precise is this estimation? θ Pn 1=i2 var(θ ) = + θ2 · i=1 W Pn 1=i Pn 2 i=1 ( i=1 1=i) Theorem 1 Any unbiased estimator of θ has variance at least θ : Pn−1 1 k=1 k+θ (Here, we assume that the estimation is based on a single locus without recombination). 6 For the Nuu Chah Nulth data we get: θW = 5:5123 σθW = 3:42 Confidence range? (2σ-rule would leed to negative values...) Conclusion: Ne could perhaps also be 200 or 3000 females. How can we improve this estimate? Sample more individuals? How many individuals n would we 100/θ need to get σθW = 0:1 · θ? From the formula for varθW follows that we need n ≈ 2 · e . For θ = 5, this is n ≈ 109. For θ = 1, this is n ≈ 1043. number of water molecules on earth≈ 1047 number of seconds since big bang≈ 4:3 · 1017 Solution: sample many loci! References [Fel06] J. Felsenstein (2006) Accuracy of Coalescent Likelihood Estimates: Do We Need More Sites, More Sequences, Or More Loci?Mol. Biol. Evol., 23.3: 691{700. How to sample if • one read is 600 bp long • costs for developing a new locus is 40$ • costs for collecting a sample is 10 or 0.10$ • costs for a single read is 6$ • you can spend 1000$ • true θ is 1.8 (per locus) Optimal sampling scheme: n = 7 or n = 8 , respectively, individuals and 11 loci. With this sampling scheme we get: σθW ≈ 0:2 · θ and σθπ ≈ 0:22 · θ (all this is based on infinte-sites assumptions) Tajima's D θ π > θ W : θ π < θ W : θπ −θW D := σ bθπ −θW substructure? population growth? selection? 7 1.5 Outline of methods 1.5.1 ML with Importance Sampling The Likelihood = ( i)i vector of model parameters D sequence data Z LD( ) = Pr (D) = Pr (D j G) · P (dG): all Genealogies G Importance Sampling Draw G1;:::;Gk (approx.) i.i.d. with density Q and approximate Z k 1 X Pr (D j Gi) · P (Gi) Pr (D j G) P (dG) ≈ : k Q(G ) i=1 i efficient for with Pr (D j Gi) · P (Gi) ≈ Q(Gi) Methods differ in their choice of Q. Griffiths & Tavar´e(1994) Q: Generate G backwards in time, greedy proportional to coalescence and mutation probabilities. Choose between all allowed events. Good for infinite sites models, inefficient if back-mutations are allowed. 1.5.2 MCMC for frequentists and Bayesians Felsenstein, Kuhner, Yamato, Beerli,. For some initial 0, sample Genealogies G approx. i.i.d. according to Pr 0 (G j D) by Metropolis- Hastings MCMC. Coalescent is a natural prior for G! Two flavours: for frequentists: use G1;:::;Gk for Importance Sampling Optimize approx. Likelihood ! 1 Iterate with 0 replaced by 1 for Baysians: Then sample conditioned on Genealogies and iterate to do Gibbs-sampling from Pr( ; G j D). Problems of full-data methods • usual runtime for one dataset: several weeks or months • complex software, development takes years • most programs not flexible, hard to write extensions 8 1.5.3 Approximate Bayesian Computation (ABC) Pritchard et al. (1999) Approximate Bayesian Computation 1. Select summary statistics S = (Si)i and compute their values s = (si)i for given data set 2. Choose tolerance δ 3. repeat until k accepted 0: • Simulate 0 from prior distribution of • Simulate genealogy G according to Pr 0 (G).
Recommended publications
  • Population Size and the Rate of Evolution
    Review Population size and the rate of evolution 1,2 1 3 Robert Lanfear , Hanna Kokko , and Adam Eyre-Walker 1 Ecology Evolution and Genetics, Research School of Biology, Australian National University, Canberra, ACT, Australia 2 National Evolutionary Synthesis Center, Durham, NC, USA 3 School of Life Sciences, University of Sussex, Brighton, UK Does evolution proceed faster in larger or smaller popu- mutations occur and the chance that each mutation lations? The relationship between effective population spreads to fixation. size (Ne) and the rate of evolution has consequences for The purpose of this review is to synthesize theoretical our ability to understand and interpret genomic varia- and empirical knowledge of the relationship between tion, and is central to many aspects of evolution and effective population size (Ne, Box 1) and the substitution ecology. Many factors affect the relationship between Ne rate, which we term the Ne–rate relationship (NeRR). A and the rate of evolution, and recent theoretical and positive NeRR implies faster evolution in larger popula- empirical studies have shown some surprising and tions relative to smaller ones, and a negative NeRR implies sometimes counterintuitive results. Some mechanisms the opposite (Figure 1A,B). Although Ne has long been tend to make the relationship positive, others negative, known to be one of the most important factors determining and they can act simultaneously. The relationship also the substitution rate [5–8], several novel predictions and depends on whether one is interested in the rate of observations have emerged in recent years, causing some neutral, adaptive, or deleterious evolution. Here, we reassessment of earlier theory and highlighting some gaps synthesize theoretical and empirical approaches to un- in our understanding.
    [Show full text]
  • Joint Effects of Genetic Hitchhiking and Background Selection on Neutral Variation
    Copyright 2000 by the Genetics Society of America Joint Effects of Genetic Hitchhiking and Background Selection on Neutral Variation Yuseob Kim and Wolfgang Stephan Department of Biology, University of Rochester, Rochester, New York 14627 Manuscript received December 7, 1999 Accepted for publication March 20, 2000 ABSTRACT Due to relatively high rates of strongly selected deleterious mutations, directional selection on favorable alleles (causing hitchhiking effects on linked neutral polymorphisms) is expected to occur while a deleteri- ous mutation-selection balance is present in a population. We analyze this interaction of directional selection and background selection and study their combined effects on neutral variation, using a three- locus model in which each locus is subjected to either deleterious, favorable, or neutral mutations. Average heterozygosity is measured by simulations (1) at the stationary state under the assumption of recurrent hitchhiking events and (2) as a transient level after a single hitchhiking event. The simulation results are compared to theoretical predictions. It is shown that known analytical solutions describing the hitchhiking effect without background selection can be modi®ed such that they accurately predict the joint effects of hitchhiking and background on linked, neutral variation. Generalization of these results to a more appro- priate multilocus model (such that background selection can occur at multiple sites) suggests that, in regions of very low recombination rates, stationary levels of nucleotide diversity are primarily determined by hitchhiking, whereas in regions of high recombination, background selection is the dominant force. The implications of these results on the identi®cation and estimation of the relevant parameters of the model are discussed.
    [Show full text]
  • Detecting the Form of Selection from DNA Sequence Data
    Outlook COMMENT Male:female mutation ratio muscular dystrophy, neurofibromatosis and retinoblast- outcome of further investigation, this unique transposition oma), there is a large male excess. So, I believe the evidence event and perhaps others like it are sure to provide impor- is convincing that the male base-substitution rate greatly tant cytogenetic and evolutionary insights. exceeds the female rate. There are a number of reasons why the historical Acknowledgements male:female mutation ratio is likely to be less than the I have profited greatly from discussions with my colleagues current one, the most important being the lower age of B. Dove, B. Engels, C. Denniston and M. Susman. My reproduction in the earliest humans and their ape-like greatest debt is to D. Page for a continuing and very fruitful ancestors. More studies are needed. But whatever the dialog and for providing me with unpublished data. References Nature 406, 622–625 6 Engels, W.R. et al. (1994) Long-range cis preference in DNA 1 Vogel, F. and Motulsky, A. G. (1997) Human Genetics: 4 Shimmin, L.C. et al. (1993) Potential problems in estimating homology search over the length of a Drosophila chromosome. Problems and Approaches. Springer-Verlag the male-to-female mutation rate ratio from DNA sequence Science 263, 1623–1625 2 Crow, J. F. (1997) The high spontaneous mutation rate: is it a data. J. Mol. Evol. 37, 160–166 7 Crow, J.F. (2000) The origins, patterns and implications of health risk? Proc. Natl. Acad. Sci. U. S. A. 94, 8380–8386 5 Richardson, C. et al.
    [Show full text]
  • 1 1 Title: Background Selection and the Statistics of Population Differentiation: 2 Consequences for Detecting Local
    bioRxiv preprint doi: https://doi.org/10.1101/326256; this version posted May 24, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 1 2 Title: Background selection and the statistics of population differentiation: 3 consequences for detecting local adaptation 4 5 Authors: Remi Matthey-Doret1 and Michael C. Whitlock 6 7 Affiliation: Department of Zoology and Biodiversity Research Centre, University of 8 British Columbia, Vancouver, British Columbia V6T 1Z4, Canada 9 10 1 Corresponding author: [email protected] 11 12 1 bioRxiv preprint doi: https://doi.org/10.1101/326256; this version posted May 24, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 13 Abstract 14 Background selection is a process whereby recurrent deleterious mutations cause a 15 decrease in the effective population size and genetic diversity at linked loci. Several 16 authors have suggested that variation in the intensity of background selection could 17 cause variation in FST across the genome, which could confound signals of local 18 adaptation in genome scans. We performed realistic simulations of DNA sequences, 19 using parameter estimates from humans and sticklebacks, to investigate how 20 variation in the intensity of background selection affects different statistics of 21 population differentiation.
    [Show full text]
  • Arxiv:1302.1148V1 [Q-Bio.PE] 5 Feb 2013 I
    Genetic draft, selective interference, and population genetics of rapid adaptation Richard A. Neher Max Planck Institute for Developmental Biology, T¨ubingen, 72070, Germany. [email protected] (Dated: January 8, 2018) To learn about the past from a sample of genomic sequences, one needs to understand how evolutionary processes shape genetic diversity. Most population genetic inference is based on frameworks assuming adaptive evolution is rare. But if positive selection operates on many loci simultaneously, as has recently been suggested for many species including animals such as flies, a different approach is necessary. In this review, I discuss recent progress in characterizing and understanding evolution in rapidly adapting populations where random associations of mutations with genetic backgrounds of different fitness, i.e., genetic draft, dominate over genetic drift. As a result, neutral genetic diversity depends weakly on population size, but strongly on the rate of adaptation or more generally the variance in fitness. Coalescent processes with multiple merg- ers, rather than Kingman's coalescent, are appropriate genealogical models for rapidly adapting populations with important implications for population genetic inference. Contents I. Introduction 1 II. Adaptation of large and diverse asexual populations 2 A. Clonal Interference 3 B. Genetic background and multiple mutations 5 III. Evolution of facultatively sexual populations 6 IV. Selective interference in obligately sexual organisms 8 V. Genetic diversity, draft, and coalescence 9 A. Genetic draft in obligately sexual populations 9 B. Soft sweeps 10 C. The Bolthausen-Sznitman coalescent and rapidly adapting populations 11 D. Background selection and genetic diversity 11 VI. Conclusions and future directions 12 Acknowledgments 13 References 13 A.
    [Show full text]
  • Genomic Insights Into Positive Selection
    Review TRENDS in Genetics Vol.22 No.8 August 2006 Genomic insights into positive selection Shameek Biswas and Joshua M. Akey Department of Genome Sciences, University of Washington, 1705 NE Pacific, Seattle, WA 98195, USA The traditional way of identifying targets of adaptive more utilitarian benefits, each target of positive selection evolution has been to study a few loci that one has a story to tell about the historical forces and events hypothesizes a priori to have been under selection. that have shaped the history of a population. This approach is complicated because of the confound- Several genome-wide analyses for positive selection ing effects that population demographic history and have been performed in a variety of species. In this review, selection have on patterns of DNA sequence variation. In we summarize some of the recent studies, primarily principle, multilocus analyses can facilitate robust focusing on humans, critically evaluate what genome- inferences of selection at individual loci. The deluge of wide scans for selection are and are not likely to find and large-scale catalogs of genetic variation has stimulated suggest future avenues of research. A brief overview of many genome-wide scans for positive selection in statistical methods used to detect deviations from several species. Here, we review some of the salient neutrality is summarized in Box 1. For more detailed observations of these studies, identify important chal- discussions, see Refs [6,7]. lenges ahead, consider the limitations of genome-wide scans for selection and discuss the potential significance Thinking genomically of a comprehensive understanding of genomic patterns Positive selection perturbs patterns of genetic variation of selection for disease-related research.
    [Show full text]
  • Background Selection Does Not Mimic the Patterns of Genetic Diversity Produced by Selective Sweeps
    bioRxiv preprint doi: https://doi.org/10.1101/2019.12.13.876136; this version posted June 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 Background selection does not mimic the patterns of genetic diversity 2 produced by selective sweeps 1 3 Daniel R. Schrider 1 4 Department of Genetics, University of North Carolina, Chapel Hill, NC 27514 5 1 Abstract 6 It is increasingly evident that natural selection plays a prominent role in shaping patterns of diversity across the genome. The 7 most commonly studied modes of natural selection are positive selection and negative selection, which refer to directional 8 selection for and against derived mutations, respectively. Positive selection can result in hitchhiking events, in which a 9 beneficial allele rapidly replaces all others in the population, creating a valley of diversity around the selected site along with 10 characteristic skews in allele frequencies and linkage disequilibrium (LD) among linked neutral polymorphisms. Similarly, 11 negative selection reduces variation not only at selected sites but also at linked sites|a phenomenon called background 12 selection (BGS). Thus, discriminating between these two forces may be difficult, and one might expect efforts to detect 13 hitchhiking to produce an excess of false positives in regions affected by BGS. Here, we examine the similarity between BGS 14 and hitchhiking models via simulation. First, we show that BGS may somewhat resemble hitchhiking in simplistic scenarios in 15 which a region constrained by negative selection is flanked by large stretches of unconstrained sites, echoing previous results.
    [Show full text]
  • On the Importance of Skewed Offspring Distributions and Background Selection in Virus Population Genetics
    Heredity (2016) 117, 393–399 & 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved 0018-067X/16 www.nature.com/hdy REVIEW On the importance of skewed offspring distributions and background selection in virus population genetics KK Irwin1,2, S Laurent1,2, S Matuszewski1,2, S Vuilleumier1,2, L Ormond1,2, H Shim1,2, C Bank1,2,3 and JD Jensen1,2,4 Many features of virus populations make them excellent candidates for population genetic study, including a very high rate of mutation, high levels of nucleotide diversity, exceptionally large census population sizes, and frequent positive selection. However, these attributes also mean that special care must be taken in population genetic inference. For example, highly skewed offspring distributions, frequent and severe population bottleneck events associated with infection and compartmentalization, and strong purifying selection all affect the distribution of genetic variation but are often not taken into account. Here, we draw particular attention to multiple-merger coalescent events and background selection, discuss potential misinference associated with these processes, and highlight potential avenues for better incorporating them into future population genetic analyses. Heredity (2016) 117, 393–399; doi:10.1038/hdy.2016.58; published online 21 September 2016 INTRODUCTION taxa—demonstrating that the input of deleterious mutations far Viruses appear to be excellent candidates for studying evolution in outnumbers the input of beneficial mutations (Acevedo et al.,2014; real time; they have short generation times, high levels of diversity Bank et al., 2014; Bernet and Elena, 2015; Jiang et al.,2016).The often driven by very large mutation rates and population sizes (both purging of these deleterious mutants through purifying selection can census and effective), and they experience frequent positive selection affect other areas in the genome through a process known as in response to host immunity or antiviral treatment.
    [Show full text]
  • Estimating the Genome-Wide Contribution of Selection to Temporal Allele Frequency Change
    Estimating the genome-wide contribution of selection to temporal allele frequency change Vince Buffaloa,b,1 and Graham Coopb aPopulation Biology Graduate Group, University of California, Davis, CA 95616; and bCenter for Population Biology, Department of Evolution and Ecology, University of California, Davis, CA 95616 Edited by Montgomery Slatkin, University of California, Berkeley, CA, and approved July 13, 2020 (received for review October 31, 2019) Rapid phenotypic adaptation is often observed in natural popula- a polygenic trait (such as fitness) is distributed across numerous tions and selection experiments. However, detecting the genome- loci. This can lead to subtle allele frequency shifts on standing wide impact of this selection is difficult since adaptation often variation that are difficult to distinguish from background lev- proceeds from standing variation and selection on polygenic els of genetic drift and sampling variance. Increasingly, genomic traits, both of which may leave faint genomic signals indistin- experimental evolution studies with multiple time points, and guishable from a noisy background of genetic drift. One promis- in some cases multiple replicate populations, are being used ing signal comes from the genome-wide covariance between to detect large-effect selected loci (30, 31) and differentiate allele frequency changes observable from temporal genomic data modes of selection (32–34). In addition, these temporal–genomic (e.g., evolve-and-resequence studies). These temporal covariances studies have begun in wild populations, some with the goal of reflect how heritable fitness variation in the population leads finding variants that exhibit frequency changes consistent with changes in allele frequencies at one time point to be predic- fluctuating selection (35, 36).
    [Show full text]
  • Revisiting Current Methods of Population Genetic Selection Inference
    Review Thinking too positive? Revisiting current methods of population genetic selection inference 1,2 1,2 1,2,3 Claudia Bank , Gregory B. Ewing , Anna Ferrer-Admettla , 1,2 1,2 Matthieu Foll , and Jeffrey D. Jensen 1 School of Life Sciences, Ecole Polytechnique Fe´ de´ rale de Lausanne (EPFL), 1015 Lausanne, Switzerland 2 Swiss Institute of Bioinformatics (SIB), 1015 Lausanne, Switzerland 3 Department of Biology and Biochemistry, University of Fribourg, 1700 Fribourg, Switzerland In the age of next-generation sequencing, the availability data, and those that make use of between-population/ of increasing amounts and improved quality of data at species data. While each approach has its respective mer- decreasing cost ought to allow for a better understand- its, in this review, we focus on recent developments in ing of how natural selection is shaping the genome than population genetic inference from polymorphism data in ever before. However, alternative forces, such as demog- both natural and experimental settings (see [1,2] for more raphy and background selection (BGS), obscure the general reviews, and [3,4] for recent and specific literature footprints of positive selection that we would like to on divergence-based selection inference). For population identify. In this review, we illustrate recent develop- genetic inference from single time point polymorphism ments in this area, and outline a roadmap for improved data (as is most commonly the case), this includes not only selection inference. We argue (i) that the development sophisticated statistical methods, but also simulation pro- and obligatory use of advanced simulation tools is nec- grams that enable us to model expected genomic signa- essary for improved identification of selected loci, (ii) tures under a wide variety of possible scenarios.
    [Show full text]
  • An Introduction to Coalescent Theory
    An introduction to coalescent theory Nicolas Lartillot May 26, 2014 Nicolas Lartillot (CNRS - Univ. Lyon 1) Coalescent May 26, 2014 1 / 39 De Novo Population Genomics in Animals Table 3. Coding sequence polymorphism and divergence patterns in five non-model animals. species #contigs #SNPs pS (%) pN (%) pN/pS dN/dS aa0.2 aEWK vA turtle 1 041 2 532 0.43 0.05 0.12 0.17 0.01 0.43 0.92 0.17 60.03 60.007 60.02 60.03 60.18 60.15 hare 524 2 054 0.38 0.05 0.12 0.15 20.11 0.30 ,0 ,0 60.04 60.008 60.02 60.03 60.22 60.23 ciona 2 004 11 727 1.58 0.15 0.10 0.10 20.28 0.10 0.34 0.04 60.06 60.011 60.01 60.01 60.10 60.11 termite 4 761 5 478 0.12 0.02 0.18 0.26 0.08 0.28 0.74 0.20 60.01 60.002 60.02 60.02 60.10 60.11 oyster 994 3 015 0.59 0.09 0.15 0.21 0.13 0.22 0.79 0.21 60.05 60.011 60.02 60.02 60.12 60.13 doi:10.1371/journal.pgen.1003457.t003 Explaining genetic variation Figure 4. Published estimates of genome-wide pS, pN and pN/pS in animals. a. pN as function of pS; b. pN/pS as function of pS; Blue: vertebrates; Red: invertebrates;Gayral Full circles: et species al, analysed2013, in PLoS this study, Genetics designated by 4:e1003457 their upper-case initial (H: hare; Tu: turtle; O: oyster; Te: termite; C: ciona); Dashed blue circles: non-primate mammals (from left to right: mouse, tupaia, rabbit).
    [Show full text]
  • Population Genetics of Rare Variants and Complex Diseases Authors
    Title: Population genetics of rare variants and complex diseases Authors: M. Cyrus Mahera+, Lawrence H. Uricchiob+, Dara G. Torgersonc, and Ryan D. Hernandezd*. aDepartment of Epidemiology and Biostatistics, University of California, San Francisco. bUC Berkeley & UCSF Joint Graduate Group in Bioengineering, University of California, San Francisco. cDepartment of Medicine, University of California San Francisco. bDepartment of Bioengineering and Therapeutic Sciences, University of California, San Francisco. +These authors contributed equally. *Corresponding author: Ryan D. Hernandez, PhD. Department of Bioengineering and Therapeutic Sciences University of California San Francisco 1700 4th St., San Francisco, CA 94158. Tel. 415-514-9813, E-Mail: [email protected] Running title: Natural selection and complex disease Key words: Natural selection, deleterious, simulation, population genetics, rare variants. 1 Abstract: Objectives: Identifying drivers of complex traits from the noisy signals of genetic variation obtained from high throughput genome sequencing technologies is a central challenge faced by human geneticists today. We hypothesize that the variants involved in complex diseases are likely to exhibit non-neutral evolutionary signatures. Uncovering the evolutionary history of all variants is therefore of intrinsic interest for complex disease research. However, doing so necessitates the simultaneous elucidation of the targets of natural selection and population-specific demographic history. Methods: Here we characterize the action of natural selection operating across complex disease categories, and use population genetic simulations to evaluate the expected patterns of genetic variation in large samples. We focus on populations that have experienced historical bottlenecks followed by explosive growth (consistent with most human populations), and describe the differences between evolutionarily deleterious mutations and those that are neutral.
    [Show full text]