The Utility of Ancient Human DNA for Improving Allele Age Estimates, with Implications for Demographic Models and Tests of Natural Selection

Journal of Human Evolution 79 (2015) 64e72 Contents lists available at ScienceDirect Journal of Human Evolution journal homepage: www.elsevier.com/locate/jhevol The utility of ancient human DNA for improving allele age estimates, with implications for demographic models and tests of natural selection * Aaron J. Sams a, b, , John Hawks a, 1, Alon Keinan b, 1 a Department of Anthropology, University of Wisconsin-Madison, Madison, WI, USA b Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY, USA article info abstract Article history: The age of polymorphic alleles in humans is often estimated from population genetic patterns in extant Received 14 January 2014 human populations, such as allele frequencies, linkage disequilibrium, and rate of mutations. Ancient Accepted 22 October 2014 DNA can improve the accuracy of such estimates, as well as facilitate testing the validity of demographic Available online 15 November 2014 models underlying many population genetic methods. Specifically, the presence of an allele in a genome derived from an ancient sample testifies that the allele is at least as old as that sample. In this study, we Keywords: € consider a common method for estimating allele age based on allele frequency as applied to variants Otzi from the US National Institutes of Health (NIH) Heart, Lung, and Blood Institute (NHLBI) Exome Neandertals Sequencing Project. We view these estimates in the context of the presence or absence of each allele in Europe € the genomes of the 5300 year old Tyrolean Iceman, Otzi, and of the 50,000 year old Altai Neandertal. Our results illuminate the accuracy of these estimates and their sensitivity to demographic events that were not included in the model underlying age estimation. Specifically, allele presence in the Iceman genome provides a good fit of allele age estimates to the expectation based on the age of that specimen. The equivalent based on the Neandertal genome leads to a poorer fit. This is likely due in part to the older age of the Neandertal and the older time of the split between modern humans and Neandertals, but also due to gene flow from Neandertals to modern humans not being considered in the underlying demographic model. Thus, the incorporation of ancient DNA can improve allele age estimation, demographic modeling, and tests of natural selection. Our results also point to the importance of considering a more diverse set of ancient samples for understanding the geographic and temporal range of individual alleles. © 2014 Elsevier Ltd. All rights reserved. Introduction timing of natural selection that has shaped present-day human populations. Herein, we illustrate the importance of aDNA for The quality of ancient human DNA (aDNA) sequencing has addressing questions of allele age and timing of selection by first steadily improved during the past decade, culminating recently in briefly reviewing related literature, and by analyzing allele age in the high-coverage sequencing of two archaic hominin genomes the context of two ancient genomes from different time periods from human skeletal remains pre-dating 40,000 years ago (Meyer and different phylogenetic distance to present-day Europeans. By et al., 2012; Prüfer et al., 2014). Additionally, Meyer et al. (2013) comparing with allele age estimates based on allele frequency in an recently sequenced a mitochondrial genome from the Middle extant population, we conclude that consideration of whether an Pleistocene site of Sima de los Huesos in Spain, pushing the oldest allele is present or absent in aDNA can provide important infor- sequenced human DNA beyond 300,000 years. In addition to mation about allele age and improve on the former type of providing information about human demographic history, the estimates. growing sample of aDNA is useful for understanding the age of mutations that segregate in extant populations and, therefore, the Presence of recently selected alleles in ancient European specimens A straightforward means of testing selective hypotheses, e.g., * Corresponding author. E-mail address: [email protected] (A.J. Sams). based on long-range haplotype methods (Sabeti et al., 2002; Voight 1 Co-project leaders. et al., 2006), is to analyze putatively selected haplotypes and http://dx.doi.org/10.1016/j.jhevol.2014.10.009 0047-2484/© 2014 Elsevier Ltd. All rights reserved. A.J. Sams et al. / Journal of Human Evolution 79 (2015) 64e72 65 single-nucleotide variants (SNVs) in ancient human specimens. agricultural lifestyles from the Near East. The earlier and more Perhaps the most successful application to date of aDNA to a po- popular viewpoint argues that this transition is characterized by a tential case of recent natural selection pertains to the genetic vari- large amount of demic diffusion involving a large influx of agricul- ants underlying lactase persistence. The ability to digest lactose, a tural populations across Europe (Childe, 1958). Others view this disaccharide found in milk, typically slows in most mammals around transition as being dominated by cultural diffusion, such that the time of weaning. In many humans the ability to produce the Mesolithic hunter-gatherers in Europe embraced Neolithic life- enzyme lactase, which digests lactose into glucose and galactose styles with little genetic input from Near Eastern farmers (Zvelebil sugars, continues into adulthood, a phenotype known as ‘lactase and Dolukhanov, 1991). persistence’. The timing and evolution of lactase persistence was Considering the two viewpoints, using phylogeographic ana- documented first in Europeans, where the genomic region sur- lyses of present human DNA samples has led to contrasting results rounding LCT, the gene encoding lactase, has been consistently re- (Sokal and Menozzi, 1982; Ammerman and Cavalli-Sforza, 1984; ported as one of the most extreme long-range haplotype based Sokal et al., 1991; Puit et al., 1994; Richards et al., 2000; Rosser et al., examples of recent selection in Europe (Enattah et al., 2002, 2007, 2000; Simoni et al., 2000; Belle et al., 2006; Balaresque et al., 2010). 2008; Bersaglieri et al., 2004; Tishkoff et al., 2006). The selection It is important to note that estimates based solely on phylogeo- has been shown to be in the nearby MCM6 gene and results in graphic analysis of genetic variation in extant humans can be biased downregulation of the cessation of lactase production after weaning by several factors. Importantly, recent demographic events, such as (Enattah et al., 2002). Similar disruptive changes to the MCM6 gene gene-flow and back-migration between Europe and the Near East have convergently evolved in both African (Tishkoff et al., 2006) and can result in the geography of extant human populations mis- Middle Eastern (Enattah et al., 2007) populations. Selection for representing that of their perceived ancestors (Richards et al., lactase persistence shows the importance of comparing genetic data 2000; Balaresque et al., 2010). Additionally, when considering a to known cultural changes in the past, such as the timing and limited number of genetic loci, clines of genetic variation can be geographic distribution of cattle and camel pastoralism and milk sensitive to stochastic processes such as isolation by distance consumption (Holden and Mace,1997; Gerbault et al., 2009). The age (Novembre and Stephens, 2008). of the mutation and subsequent beginning of the selective sweep Ancient DNA is an ideal data source to circumvent these issues underlying lactase persistence in Europeans (C/T-13910) has been and settle this debate because genetic differences between skeletal estimated between 3000 and 12,000 years, which seems to coincide samples associated with Mesolithic and Neolithic technologies can with the presence of domesticated cattle (Bollongino et al., 2006) be directly assayed and correlated with cultural differences. A and a record of increasing pastoralism and dairying in several hu- model of cultural diffusion predicts little to no major genetic dif- man populations, particularly in northern Europe. For example, ferences associated with culture change in Holocene samples, Tishkoff and colleagues (2006) estimated the age, using a coalescent while the demic diffusion model predicts substantial influx of novel simulation model that incorporated selection and recombination, at genetic variation. The majority of aDNA studies of Europe have approximately 8000 to 9000 years depending on the degree of involved mtDNA analysis, primarily from hunter-gatherer and dominance assumed for the allele. While consistent with the farming populations of north and central Europe. These studies anthropological record, the confidence intervals spanning 2000 to have revealed that approximately ~83% of pre-Neolithic peoples of 19,000 years points to the large uncertainty in the estimates. This is Europe carried mtDNA haplogroup U and none belong to hap- consistent with the large range of variation in coalescence times logroup H, a composition that is markedly different from present (Slatkin and Rannala, 2000). samples in which haplogroup H is dominant (Haak et al., 2005, Estimates of allele age and timing of selection based on popu- 2008, 2010; Bramanti, 2008; Bramanti et al., 2009; Guba et al., lation genetic patterns observed in extant humans depend heavily 2011; Fu et al., 2012; Lee et al., 2012; Nikitin et al., 2012). On the on assumptions about the demographic history of human pop- other hand, ~12% from early farming populations belong to hap- ulations and are often associated with large ranges of error (as logroup U, while haplogroup H is present in between 25 and 37% of illustrated above for the timing of C/T-13910). By testing whether mtDNA from early farming samples, both consistent with the specific genetic variants were absent or present in an ancient haplotype frequencies of extant Europeans. Combined, these re- sample, aDNA can be used to test hypotheses about the timing of sults from ancient mtDNA analyses suggest that pre-Neolithic selective changes in past human populations (Burger et al., 2007; hunter-gatherers contributed at most 20% to the mtDNA genetic Malmstrom€ et al., 2010; Plantinga et al., 2012).

The Utility of Ancient Human DNA for Improving Allele Age Estimates, with Implications for Demographic Models and Tests of Natural Selection

Direct Solutions of the Wright-Fisher Model

Joint Bayesian Estimation of Mutation Location and Age Using Linkage Disequilibrium

A Maximum Likelihood Approach

The Evolutionary History of the CCR5-Δ32 HIV-Resistance Mutation

Population Genetics of Rare Variants and Complex Diseases Authors

Joint Nonparametric Coalescent Inference of Mutation Spectrum

Direct Solutions of the Wright-Fisher Model

The Linked Selection Signature of Rapid Adaptation in Temporal Genomic Data

Convergent Adaptation of Human Lactase Persistence in Africa and Europe

Estimating Time to the Common Ancestor for a Beneficial Allele

The Geographic Spread of the CCR5 D32 HIV-Resistance Allele

Supplemental Materials