Supplemental Materials

Total Page:16

File Type:pdf, Size:1020Kb

Supplemental Materials Supplemental Materials Contents Page Experimental Methods 17 1 Study design and sample collection 17 2 High molecular weight DNA extraction 17 3 Linked-read library preparation and sequencing 18 4 Postprocessing and read cloud assignment 18 Computational Methods and Analysis 19 5 Metagenomic pipeline 19 5.1 Quantifying relative abundance at the species level................... 19 5.2 Quantifying gene content for each species........................ 19 5.3 Identifying SNVs within species.............................. 20 5.3.1 Quantifying genetic diversity over time...................... 20 5.3.2 Synonymous and nonsynonymous variants.................... 20 5.3.3 Allele prevalence across hosts........................... 21 5.4 Tracking read clouds associated with SNVs and genes................. 21 5.5 Quantifying frequency trajectories of SNVs....................... 22 5.5.1 Detecting SNV differences over time....................... 22 5.5.2 Private marker SNVs............................... 24 6 Antibiotic resistance gene profiling 24 7 Inferring genetic linkage from shared read clouds 25 7.1 Empirical estimates of read cloud impurity....................... 25 7.2 Empirical estimates of long-range linkage within read clouds.............. 26 7.3 Statistical null model of read cloud sharing....................... 27 7.4 Inferring linkage between pairs of SNVs......................... 29 7.4.1 Quantifying linkage at the SNV level....................... 29 7.4.2 Quantifying linkage at the allelic level...................... 30 7.5 Inferring linkage between SNVs and species backbones................. 31 8 Clustering SNVs into haplotypes 32 9 Distinguishing between genetic drift and natural selection in allele frequency time series 35 1 Supplemental Data Files 40 List of Figures S1 Read cloud statistics for each sample...........................3 S2 Microbiota density over time...............................4 S3 Species abundance dynamics before, during, and after antibiotics...........5 S4 Relative abundance of antibiotic resistance genes as a function of time........6 S5 Reference genome coverage as a function of species relative abundance........6 S6 Analogous version of Fig. 3 for more example species (1/3)..............7 S7 Analogous version of Fig. 3 for more example species (2/3)..............8 S8 Analogous version of Fig. 3 for more example species (3/3)..............9 S9 Read clouds link putative within-species sweeps to the correct genomic backbone.. 10 S10 Read clouds reveal haplotype structure in B. vulgatus ................. 11 S11 Estimated fragment length distribution from HMW DNA extraction protocol.... 12 S12 Linkage disequilibrium between SNVs that fail the 4 haplotype test.......... 13 S13 Peak-to-trough ratio and relative abundance for select species............. 14 S14 Schematic illustration of private marker SNV sharing over time............ 15 S15 Co-inheritance of A. finegoldii SNV differences across a larger cohort......... 16 S16 Calibrated null model of read cloud sharing as a function of read cloud coverage.. 29 S17 Putative B. vulgatus strains detected by the StrainFinder algorithm......... 33 S18 Distribution of frequency trajectory distances between pairs of SNVs in B. vulgatus 35 List of Tables S1 List of samples and associated metadata......................... 39 S2 DNA yield and purity from different extraction methods described in Supplemental Materials, Section2.................................... 39 S3 Evidence for natural selection in SNV frequency trajectories in Fig. 3......... 39 2 Day 1 (84 Gbp) Day 5 (85 Gbp) Day 11 (8 Gbp) Day 26 (87 Gbp) Total reads contributed 101 102 103 101 102 103 101 102 103 101 102 103 Day 36 (90 Gbp) Day 38 (18 Gbp) Day 47 (181 Gbp) Day 50 (15 Gbp) Total reads contributed 101 102 103 101 102 103 101 102 103 101 102 103 Day 68 (87 Gbp) Day 70 (32 Gbp) Day 73 (166 Gbp) Day 75 (26 Gbp) Total reads contributed 101 102 103 101 102 103 101 102 103 101 102 103 Day 78 (22 Gbp) Day 81 (84 Gbp) Day 93 (18 Gbp) Day 111 (45 Gbp) Total reads contributed 101 102 103 101 102 103 101 102 103 101 102 103 Day 119 (40 Gbp) Day 130 (22 Gbp) Day 154 (165 Gbp) Total reads contributed 101 102 103 101 102 103 101 102 103 Read pairs per barcode Read pairs per barcode Read pairs per barcode Figure S1: Read cloud statistics for each sample. Analogous versions of Fig. 1E for each of the 19 samples. 3 Baseline Disease Antibiotics Intermediate Final Bacteroides vulgatus 80 Bacteroides uniformis 60 Bacteroides coprocola 40 Bacteroides cellulosilyticus (ng/ul) 20 Bacteroides eggerthii DNA conc 0 Bacteroides faecis Bacteroides caccae Bacteroides massiliensis Alistipes sp Alistipes onderdonkii Alistipes finegoldii Parabacteroides distasonis Paraprevotella clara Butyrivibrio crossotus Coprococcus comes Coprococcus sp Eubacterium rectale Eubacterium eligens Eubacterium siraeum Phascolarctobacterium sp Absolute species abundance (a.u.) Other 0 20 40 60 80 100 120 140 160 Day Figure S2: Microbiota density over time. Top: Microbiota density estimated by microbial DNA quantification (concentration of extracted DNA per ∼200mg feces) for a subset of the study timepoints. Vertical lines indicate ±20% error bars, as estimated from the precision of our sam- ple mass measurements. The shaded region denotes the antibiotic treatment period. These data show that overall microbiota density was not significantly reduced at the end of antibiotic treat- ment. Bottom: Absolute species abundance dynamics obtained by scaling the relative abundance measurements in Fig. 1C by the microbiota density measurements in the top panel. 4 100 10 1 Bacteroidaceae 10 2 Rikenellaceae Eubacteriaceae 10 3 Lachnospiraceae (baseline units) ABX Abundance Prevotellaceae 4 10 Acidaminococcaceae Porphyromonadaceae 5 10 Ruminococcaceae Oscillospiraceae 0 10 Sutterellaceae Burkholderiales 1 10 Guyana Clostridiales 2 10 Erysipelotrichaceae Oxalobacteraceae 10 3 Pseudomonadaceae Clostridiaceae (baseline units) Final Abundance 10 4 Bifidobacteriaceae Synergistaceae 10 5 10 4 10 3 10 2 10 1 100 Avg Baseline Abundance Figure S3: Species abundance dynamics before, during, and after antibiotics. Absolute abundances of individual species at the antibiotic (top) and final (bottom) timepoint as a function of the average absolute abundance during the two baseline timepoints in Supplemental Fig. S2. Absolute abundances are estimated by multiplying relative abundance estimates by the microbial density measurements in Supplemental Fig. S2, and scaling by the average microbial density at the two baseline timepoints. Symbols denote individual species and are colored by taxonomic family. Open symbols denote species that fall below the detection threshold (≈ 2×10−5). These data show that only a minority of species experienced dramatic reductions in absolute abundance by the end of antibiotic treatment; several of these species recovered in abundance by the end of the study. 5 tetracycline 500 beta-lactam (tet-linked) beta-lactam bacitracin 400 300 200 100 Reads (per million) mapped to ABX genes 0 0 20 40 60 80 100 120 140 160 Day Figure S4: Relative abundance of antibiotic resistance genes as a function of time. Lines show fraction of reads mapping to different classes of antibiotic resistance genes in the ARGs-OAP [1] database (Methods). The shaded region indicates the antibiotic treatment period. 103 t Full lane samples , 1/3 lane samples e g a r 102 e v o c e d i w - 1 e 10 m o n e G 100 10 3 10 2 10 1 100 Species relative abundance Figure S5: Reference genome coverage as a function of species relative abundance. Symbols denote the typical number of unique read clouds per site, Dt, for each reference genome in each sample as a function of the estimated relative abundance of the species in that sample. Blue points indicate samples that were sequenced to a high target coverage (Supplemental Table S1). 6 A Alistipes finegoldii D Eubacterium eligens 100 10 2 Relative abundance 1.0 0.5 Allele frequency 0.0 B Alistipes sp E Phascolarctobacterium sp 100 10 2 Relative abundance 1.0 0.5 Allele frequency 0.0 C Alistipes onderdonkii F Bacteroides coprocola 100 10 2 Relative abundance 1.0 0.5 Allele frequency 0.0 -2yr 0 20 40 60 80 100 120 140 -2yr 0 20 40 60 80 100 120 140 Time (days) Time (days) Figure S6: Analogous version of Fig. 3 for more example species (Part 1/3). The -2yr timepoint is estimated using data from a previous study that sampled the same individual [3]. As in Fig. 3, the shaded region indicates the antibiotic treatment period. 7 A Bacteroides faecis D Bacteroides stercoris 100 10 2 Relative abundance 1.0 0.5 Allele frequency 0.0 B Bacteroides eggerthii E Bacteroides finegoldii 100 10 2 Relative abundance 1.0 0.5 Allele frequency 0.0 C Bacteroides caccae F Bacteroides massiliensis 100 10 2 Relative abundance 1.0 0.5 Allele frequency 0.0 -2yr 0 20 40 60 80 100 120 140 -2yr 0 20 40 60 80 100 120 140 Time (days) Time (days) Figure S7: Analogous version of Fig. 3 for more example species (Part 2/3). 8 A Eubacterium siraeum D Butyrivibrio crossotus 100 10 2 Relative abundance 1.0 0.5 Allele frequency 0.0 B Eubacterium rectale E Paraprevotella clara 100 10 2 Relative abundance 1.0 0.5 Allele frequency 0.0 C Barnesiella intestinihominis F Alistipes senegalensis 100 10 2 Relative abundance 1.0 0.5 Allele frequency 0.0 -2yr 0 20 40 60 80 100 120 140 -2yr 0 20 40 60 80 100 120 140 Time (days) Time (days) Figure S8: Analogous version of Fig. 3 for more example species (Part 3/3). 9 SNVs tested ( 1000) Confirmed linked to correct core genome Different core genome 103 with most of theobserve negatives an clustering average in positive justalleles a confirmation had few rate only species. of acore minority of genes shared in read thespecies clouds correct from was species confirmed the if, correct (blue).clouds for species Linkage both (red). across to alleles, all Across a the species, timepointsalternate different majority we alleles (and species of of the was at these shared inferred least≥ read SNVs, if 5 clouds we one shared were retained or derived the readof both from read five clouds cloud genes in sharing with total).
Recommended publications
  • Direct Solutions of the Wright-Fisher Model
    University of Calgary PRISM: University of Calgary's Digital Repository Graduate Studies The Vault: Electronic Theses and Dissertations 2019-12-23 Direct Solutions of the Wright-Fisher Model Kryukov, Ivan Kryukov, I. (2019). Direct Solutions of the Wright-Fisher Model (Unpublished doctoral thesis). University of Calgary, Calgary, AB. http://hdl.handle.net/1880/111434 doctoral thesis University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission. Downloaded from PRISM: https://prism.ucalgary.ca UNIVERSITY OF CALGARY Direct Solutions of the Wright-Fisher Model by Ivan Kryukov A THESIS SUBMITTED TO THE FACULTY OF GRADUATE STUDIES IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY GRADUATE PROGRAM IN BIOCHEMISTRY AND MOLECULAR BIOLOGY CALGARY, ALBERTA DECEMBER, 2019 c Ivan Kryukov 2019 Contents List of figures v List of tables viii Abstract x Preface xii Acknowledgements xiii 1 Introduction 1 1.1 The Wright-Fisher model . .1 1.1.1 Transition probability matrix of the Wright-Fisher model . .2 1.1.2 Selection and mutation . .3 1.1.3 Probability of fixation . .6 1.2 Applications in evolutionary biology . .9 1.2.1 Mutation-selection limitations . 11 1.3 Direct computation of substitution rate . 12 2 Wright-Fisher Exact Solver 16 2.1 Introduction . 17 2.2 Results . 18 2.2.1 Implementation .
    [Show full text]
  • Joint Bayesian Estimation of Mutation Location and Age Using Linkage Disequilibrium
    Joint Bayesian Estimation of Mutation Location and Age Using Linkage Disequilibrium B. Rannala, J.P. Reeve Pacific Symposium on Biocomputing 8:526-534(2003) JOINT BAYESIAN ESTIMATION OF MUTATION LOCATION AND AGE USING LINKAGE DISEQUILIBRIUM B. RANNALA, J.P. REEVE Department of Medical Genetics, University of Alberta, Edmonton Alberta T6G2H7, Canada Associations between disease and marker alleles on chromosomes in populations can arise as a consequence of historical forces such as mutation, selection and genetic drift, and is referred to as \linkage disequilibrium" (LD). LD can be used to estimate the map position of a disease mutation relative to a set of linked markers, as well as to estimate other parameters of interest, such as mutation age. Parametric methods for estimating the location of a disease mutation using marker linkage disequilibrium in a sample of normal and affected individuals require a detailed knowledge of population demography, and in particular require users to specify the postulated age of a mutation and past population growth rates. A new Bayesian method is presented for jointly estimating the position of a disease mutation and its age. The method is illustrated using haplotype data for the cystic fibrosis ∆F 508 mutation in europe and the DTD mutation in Finland. It is shown that, for these datasets, the posterior probability distribution of disease mutation location is insensitive to the population growth rate when the model is averaged over possible mutation ages using a prior probability distribution for the mutation age based on the population frequency of the disease mutation. Fewer assumptions are therefore needed for parametric LD mapping.
    [Show full text]
  • A Maximum Likelihood Approach
    Proc. Natl. Acad. Sci. USA Vol. 95, pp. 15452–15457, December 1998 Evolution Using rare mutations to estimate population divergence times: A maximum likelihood approach GIORGIO BERTORELLE*† AND BRUCE RANNALA‡ *Department of Integrative Biology, University of California, Berkeley, CA 94720-3140; and ‡Department of Ecology and Evolution, State University of New York, Stony Brook, NY 11794-5245 Communicated by Henry C. Harpending, University of Utah, Salt Lake City, UT, October 23, 1998 (received for review July 27, 1998) ABSTRACT In this paper we propose a method to estimate size N. This estimator is based on a model of genetic drift without by maximum likelihood the divergence time between two popu- mutation, and it is therefore most suitable for populations that lations, specifically designed for the analysis of nonrecurrent have recently separated. However, when the differences between rare mutations. Given the rapidly growing amount of data, rare alleles are taken into account by using equivalents of Fst that allow f disease mutations affecting humans seem the most suitable for mutation [such as st (7) or Rst (8)], equivalent estimators candidates for this method. The estimator RD, and its conditional become feasible for older population subdivision events (8). version RDc, were derived, assuming that the population dynam- More recently, Nielsen et al. (9) have proposed a maximum ics of rare alleles can be described by using a birth–death process likelihood estimator based on the coalescent process and assum- approximation and that each mutation arose before the split of ing no mutation. When applied to simulated data from stable a common ancestral population into the two diverging popula- populations, this estimator appears to have less bias and lower tions.
    [Show full text]
  • The Evolutionary History of the CCR5-Δ32 HIV-Resistance Mutation
    Microbes and Infection 7 (2005) 302–309 www.elsevier.com/locate/micinf Review The evolutionary history of the CCR5-D32 HIV-resistance mutation Alison P. Galvani a,*, John Novembre b a Department of Epidemiology and Public Health, Yale School of Medicine, New Haven, CT 06520, USA b Department of Integrative Biology, University of California, Berkeley, CA 94720, USA Available online 08 January 2005 Abstract The CCR5 chemokine receptor is exploited by HIV-1 to gain entry into CD4+ T cells. A deletion mutation (D32) confers resistance against HIV by obliterating the expression of the receptor on the cell surface. Intriguingly, this allele is young in evolutionary time, yet it has reached relatively high frequencies in Europe. These properties indicate that the mutation has been under intense positive selection. HIV-1 has not exerted selection for long enough on the human population to drive the CCR5-D32 allele to current frequencies, fueling debate regarding the selective pressure responsible for rise of the allele. The allele exists at appreciable frequencies only in Europe, and within Europe, the fre- quency is higher in the north. Here we review the population genetics of the CCR5 locus, the debate over the historical selective pressure acting on CCR5-D32, the inferences that can potentially be drawn from the geographic distribution of CCR5-D32 and the role that other genetic polymorphisms play in conferring resistance against HIV. We also discuss parallel evolution that has occurred at the CCR5 locus of other primate species. Finally, we highlight the promise that therapies based on interfering with the CCR5 receptor could have in the treatment of HIV.
    [Show full text]
  • Population Genetics of Rare Variants and Complex Diseases Authors
    Title: Population genetics of rare variants and complex diseases Authors: M. Cyrus Mahera+, Lawrence H. Uricchiob+, Dara G. Torgersonc, and Ryan D. Hernandezd*. aDepartment of Epidemiology and Biostatistics, University of California, San Francisco. bUC Berkeley & UCSF Joint Graduate Group in Bioengineering, University of California, San Francisco. cDepartment of Medicine, University of California San Francisco. bDepartment of Bioengineering and Therapeutic Sciences, University of California, San Francisco. +These authors contributed equally. *Corresponding author: Ryan D. Hernandez, PhD. Department of Bioengineering and Therapeutic Sciences University of California San Francisco 1700 4th St., San Francisco, CA 94158. Tel. 415-514-9813, E-Mail: [email protected] Running title: Natural selection and complex disease Key words: Natural selection, deleterious, simulation, population genetics, rare variants. 1 Abstract: Objectives: Identifying drivers of complex traits from the noisy signals of genetic variation obtained from high throughput genome sequencing technologies is a central challenge faced by human geneticists today. We hypothesize that the variants involved in complex diseases are likely to exhibit non-neutral evolutionary signatures. Uncovering the evolutionary history of all variants is therefore of intrinsic interest for complex disease research. However, doing so necessitates the simultaneous elucidation of the targets of natural selection and population-specific demographic history. Methods: Here we characterize the action of natural selection operating across complex disease categories, and use population genetic simulations to evaluate the expected patterns of genetic variation in large samples. We focus on populations that have experienced historical bottlenecks followed by explosive growth (consistent with most human populations), and describe the differences between evolutionarily deleterious mutations and those that are neutral.
    [Show full text]
  • Joint Nonparametric Coalescent Inference of Mutation Spectrum
    bioRxiv preprint doi: https://doi.org/10.1101/2020.06.16.153452; this version posted June 16, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. 1 Joint nonparametric coalescent inference of mutation spectrum 2 history and demography 1;4;∗ 2;3 1;4;∗ 3 William S. DeWitt , Kameron Decker Harris , and Kelley Harris 1Department of Genome Sciences, 2Paul G. Allen School of Computer Science & Engineering, and 3Department of Biology, University of Washington, Seattle, WA 4Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA ∗ Corresponding authors: [email protected] (WSD), [email protected] (KH) 4 June 16, 2020 5 Abstract 6 Booming and busting populations modulate the accumulation of genetic diversity, encoding 7 histories of living populations in present-day variation. Many methods exist to decode these 8 histories, and all must make strong model assumptions. It is typical to assume that mutations 9 accumulate uniformly across the genome at a constant rate that does not vary between closely 10 related populations. However, recent work shows that mutational processes in human and great 11 ape populations vary across genomic regions and evolve over time. This perturbs the mutation 12 spectrum: the relative mutation rates in different local nucleotide contexts. Here, we develop 13 theoretical tools in the framework of Kingman's coalescent to accommodate mutation spectrum 14 dynamics. We describe mushi: a method to perform fast, nonparametric joint inference of 15 demographic and mutation spectrum histories from allele frequency data.
    [Show full text]
  • Direct Solutions of the Wright-Fisher Model
    University of Calgary PRISM: University of Calgary's Digital Repository Graduate Studies The Vault: Electronic Theses and Dissertations 2019-12-23 Direct Solutions of the Wright-Fisher Model Kryukov, Ivan Kryukov, I. (2019). Direct Solutions of the Wright-Fisher Model (Unpublished doctoral thesis). University of Calgary, Calgary, AB. http://hdl.handle.net/1880/111434 doctoral thesis University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission. Downloaded from PRISM: https://prism.ucalgary.ca UNIVERSITY OF CALGARY Direct Solutions of the Wright-Fisher Model by Ivan Kryukov A THESIS SUBMITTED TO THE FACULTY OF GRADUATE STUDIES IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY GRADUATE PROGRAM IN BIOCHEMISTRY AND MOLECULAR BIOLOGY CALGARY, ALBERTA DECEMBER, 2019 c Ivan Kryukov 2019 Contents List of figures v List of tables viii Abstract x Preface xii Acknowledgements xiii 1 Introduction 1 1.1 The Wright-Fisher model . .1 1.1.1 Transition probability matrix of the Wright-Fisher model . .2 1.1.2 Selection and mutation . .3 1.1.3 Probability of fixation . .6 1.2 Applications in evolutionary biology . .9 1.2.1 Mutation-selection limitations . 11 1.3 Direct computation of substitution rate . 12 2 Wright-Fisher Exact Solver 16 2.1 Introduction . 17 2.2 Results . 18 2.2.1 Implementation .
    [Show full text]
  • The Linked Selection Signature of Rapid Adaptation in Temporal Genomic Data
    HIGHLIGHTED ARTICLE | INVESTIGATION The Linked Selection Signature of Rapid Adaptation in Temporal Genomic Data Vince Buffalo*,†,1 and Graham Coop† *Population Biology Graduate Group and †Center for Population Biology, Department of Evolution and Ecology, University of California, Davis, California 95616 ORCID IDs: 0000-0003-4510-1609 (V.B.); 0000-0001-8431-0302 (G.C.) ABSTRACT The majority of empirical population genetic studies have tried to understand the evolutionary processes that have shaped genetic variation in a single sample taken from a present-day population. However, genomic data collected over tens of generations in both natural and laboratory populations are increasingly used to find selected loci underpinning adaptation over these short timescales. Although these studies have been quite successful in detecting selection on large-effect loci, the fitness differences between individuals are often polygenic, such that selection leads to allele frequency changes that are difficult to distinguish from genetic drift. However, one promising signal comes from polygenic selection’s effect on neutral sites that become stochastically associated with the genetic backgrounds that lead to fitness differences between individuals. Previous theoretical work has established that the random associ- ations between a neutral allele and heritable fitness backgrounds act to reduce the effective population size experienced by this neutral allele. These associations perturb neutral allele frequency trajectories, creating autocovariance in the allele frequency changes across generations. Here, we show how temporal genomic data allow us to measure the temporal autocovariance in allele frequency changes and characterize the genome-wide impact of polygenic selection. We develop expressions for these temporal autocovariances, showing that their magnitude is determined by the level of additive genetic variation, recombination, and linkage disequilibria in a region.
    [Show full text]
  • Convergent Adaptation of Human Lactase Persistence in Africa and Europe
    Europe PMC Funders Group Author Manuscript Nat Genet. Author manuscript; available in PMC 2009 April 23. Published in final edited form as: Nat Genet. 2007 January ; 39(1): 31–40. doi:10.1038/ng1946. Europe PMC Funders Author Manuscripts Convergent adaptation of human lactase persistence in Africa and Europe Sarah A Tishkoff1,9, Floyd A Reed1,9, Alessia Ranciaro1,2, Benjamin F Voight3, Courtney C Babbitt4, Jesse S Silverman4, Kweli Powell1, Holly M Mortensen1, Jibril B Hirbo1, Maha Osman5, Muntaser Ibrahim5, Sabah A Omar6, Godfrey Lema7, Thomas B Nyambo7, Jilur Ghori8, Suzannah Bumpstead8, Jonathan K Pritchard3, Gregory A Wray4, and Panos Deloukas8 1Department of Biology, University of Maryland, College Park, Maryland 20742, USA 2Department of Biology, University of Ferrara, 44100 Ferrara, Italy 3Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA 4Institute for Genome Sciences & Policy and Department of Biology, Duke University, Durham, North Carolina 27708, USA 5Department of Molecular Biology, Institute of Endemic Diseases, University of Khartoum, 15-13 Khartoum, Sudan 6Kenya Medical Research Institute, Centre for Biotechnology Research and Development, 54840-00200 Nairobi, Kenya 7Department of Biochemistry, Muhimbili University College of Health Sciences, Dar es Salaam, Tanzania Europe PMC Funders Author Manuscripts 8Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK Abstract A SNP in the gene encoding lactase (LCT) (C/T-13910) is associated with the ability to digest milk as adults (lactase persistence) in Europeans, but the genetic basis of lactase persistence in Africans was previously unknown. We conducted a genotype-phenotype association study in 470 Correspondence should be addressed to S.A.T.
    [Show full text]
  • Estimating Time to the Common Ancestor for a Beneficial Allele
    bioRxiv preprint doi: https://doi.org/10.1101/071241; this version posted August 24, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 1 Estimating time to the common ancestor for a beneficial allele 1 2 3,4 3* 2 Joel Smith , Graham Coop , Matthew Stephens , John Novembre , 3 1 Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, United States of 4 America 5 2 Center for Population Biology, Department of Evolution and Ecology, University of California, Davis, 6 California, United States of America 7 3 Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America 8 4 Department of Statistics, University of Chicago, Chicago, Illinois, United States of America 9 *[email protected] 10 Abstract 11 The haplotypes of a beneficial allele carry information about its history that can shed light on its age and 12 putative cause for its increase in frequency. Specifically, the signature of an allele's age is contained in 13 the pattern of local ancestry that mutation and recombination impose on its haplotypic background. We 14 provide a method to exploit this pattern and infer the time to the common ancestor of a positively selected 15 allele following a rapid increase in frequency. We do so using a hidden Markov model which leverages 16 the length distribution of the shared ancestral haplotype, the accumulation of derived mutations on 17 the ancestral background, and the surrounding background haplotype diversity.
    [Show full text]
  • The Geographic Spread of the CCR5 D32 HIV-Resistance Allele
    Open access, freely available online PLoS BIOLOGY The Geographic Spread of the CCR5 D32 HIV-Resistance Allele John Novembre*, Alison P. Galvani, Montgomery Slatkin Department of Integrative Biology, University of California, Berkeley, California, United States of America The D32 mutation at the CCR5 locus is a well-studied example of natural selection acting in humans. The mutation is found principally in Europe and western Asia, with higher frequencies generally in the north. Homozygous carriers of the D32 mutation are resistant to HIV-1 infection because the mutation prevents functional expression of the CCR5 chemokine receptor normally used by HIV-1 to enter CD4þ T cells. HIV has emerged only recently, but population genetic data strongly suggest D32 has been under intense selection for much of its evolutionary history. To understand how selection and dispersal have interacted during the history of the D32 allele, we implemented a spatially explicit model of the spread of D32. The model includes the effects of sampling, which we show can give rise to local peaks in observed allele frequencies. In addition, we show that with modest gradients in selection intensity, the origin of the D32 allele may be relatively far from the current areas of highest allele frequency. The geographic distribution of the D32 allele is consistent with previous reports of a strong selective advantage (.10%) for D32 carriers and of dispersal over relatively long distances (.100 km/generation). When selection is assumed to be uniform across Europe and western Asia, we find support for a northern European origin and long-range dispersal consistent with the Viking- mediated dispersal of D32 proposed by G.
    [Show full text]
  • Linkage Disequilibrium — Understanding the Evolutionary Past and Mapping the Medical Future
    REVIEWS FUNDAMENTAL CONCEPTS IN GENETICS Linkage disequilibrium — understanding the evolutionary past and mapping the medical future Montgomery Slatkin Abstract | Linkage disequilibrium — the nonrandom association of alleles at different loci — is a sensitive indicator of the population genetic forces that structure a genome. Because of the explosive growth of methods for assessing genetic variation at a fine scale, evolutionary biologists and human geneticists are increasingly exploiting linkage disequilibrium in order to understand past evolutionary and demographic events, to map genes that are associated with quantitative characters and inherited diseases, and to understand the joint evolution of linked sets of genes. This article introduces linkage disequilibrium, reviews the population genetic processes that affect it and describes some of its uses. At present, linkage disequilibrium is used much more extensively in the study of humans than in non-humans, but that is changing as technological advances make extensive genomic studies feasible in other species. Linkage disequilibrium (LD) is one of those unfortu- that cause gene-frequency evolution. How these fac- nate terms that does not reveal its meaning. As every tors affect LD between a particular pair of loci or in a instructor of population genetics knows, the term is a genomic region depends on local recombination rates. barrier not an aid to understanding. LD means simply The population genetics theory of LD is well developed a nonrandom association of alleles at two or more loci, and is being widely used to provide insight into evolu- and detecting LD does not ensure either linkage or a tionary history and as the basis for mapping genes in lack of equilibrium.
    [Show full text]