<<

Journal of Human Evolution 79 (2015) 64e72

Contents lists available at ScienceDirect

Journal of Human Evolution

journal homepage: www.elsevier.com/locate/jhevol

The utility of ancient human DNA for improving allele age estimates, with implications for demographic models and tests of

* Aaron J. Sams a, b, , John Hawks a, 1, Alon Keinan b, 1 a Department of Anthropology, University of Wisconsin-Madison, Madison, WI, USA b Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY, USA article info abstract

Article history: The age of polymorphic alleles in humans is often estimated from population genetic patterns in extant Received 14 January 2014 human populations, such as allele frequencies, , and rate of . Ancient Accepted 22 October 2014 DNA can improve the accuracy of such estimates, as well as facilitate testing the validity of demographic Available online 15 November 2014 models underlying many population genetic methods. Specifically, the presence of an allele in a genome derived from an ancient sample testifies that the allele is at least as old as that sample. In this study, we Keywords: € consider a common method for estimating allele age based on allele frequency as applied to variants Otzi from the US National Institutes of Health (NIH) Heart, Lung, and Blood Institute (NHLBI) Exome Neandertals Sequencing Project. We view these estimates in the context of the presence or absence of each allele in Europe € the genomes of the 5300 year old Tyrolean Iceman, Otzi, and of the 50,000 year old Altai Neandertal. Our results illuminate the accuracy of these estimates and their sensitivity to demographic events that were not included in the model underlying age estimation. Specifically, allele presence in the Iceman genome provides a good fit of allele age estimates to the expectation based on the age of that specimen. The equivalent based on the Neandertal genome leads to a poorer fit. This is likely due in part to the older age of the Neandertal and the older time of the split between modern humans and Neandertals, but also due to gene flow from Neandertals to modern humans not being considered in the underlying demographic model. Thus, the incorporation of ancient DNA can improve allele age estimation, demographic modeling, and tests of natural selection. Our results also point to the importance of considering a more diverse set of ancient samples for understanding the geographic and temporal range of individual alleles. © 2014 Elsevier Ltd. All rights reserved.

Introduction timing of natural selection that has shaped present-day human populations. Herein, we illustrate the importance of aDNA for The quality of ancient human DNA (aDNA) sequencing has addressing questions of allele age and timing of selection by first steadily improved during the past decade, culminating recently in briefly reviewing related literature, and by analyzing allele age in the high-coverage sequencing of two archaic hominin genomes the context of two ancient genomes from different time periods from human skeletal remains pre-dating 40,000 years ago (Meyer and different phylogenetic distance to present-day Europeans. By et al., 2012; Prüfer et al., 2014). Additionally, Meyer et al. (2013) comparing with allele age estimates based on allele frequency in an recently sequenced a mitochondrial genome from the Middle extant population, we conclude that consideration of whether an Pleistocene site of Sima de los Huesos in Spain, pushing the oldest allele is present or absent in aDNA can provide important infor- sequenced human DNA beyond 300,000 years. In addition to mation about allele age and improve on the former type of providing information about human demographic history, the estimates. growing sample of aDNA is useful for understanding the age of mutations that segregate in extant populations and, therefore, the Presence of recently selected alleles in ancient European specimens

A straightforward means of testing selective hypotheses, e.g., * Corresponding author. E-mail address: [email protected] (A.J. Sams). based on long-range haplotype methods (Sabeti et al., 2002; Voight 1 Co-project leaders. et al., 2006), is to analyze putatively selected haplotypes and http://dx.doi.org/10.1016/j.jhevol.2014.10.009 0047-2484/© 2014 Elsevier Ltd. All rights reserved. A.J. Sams et al. / Journal of Human Evolution 79 (2015) 64e72 65 single-nucleotide variants (SNVs) in ancient human specimens. agricultural lifestyles from the Near East. The earlier and more Perhaps the most successful application to date of aDNA to a po- popular viewpoint argues that this transition is characterized by a tential case of recent natural selection pertains to the genetic vari- large amount of demic diffusion involving a large influx of agricul- ants underlying lactase persistence. The ability to digest lactose, a tural populations across Europe (Childe, 1958). Others view this disaccharide found in milk, typically slows in most mammals around transition as being dominated by cultural diffusion, such that the time of weaning. In many humans the ability to produce the Mesolithic hunter-gatherers in Europe embraced Neolithic life- enzyme lactase, which digests lactose into glucose and galactose styles with little genetic input from Near Eastern farmers (Zvelebil sugars, continues into adulthood, a phenotype known as ‘lactase and Dolukhanov, 1991). persistence’. The timing and evolution of lactase persistence was Considering the two viewpoints, using phylogeographic ana- documented first in Europeans, where the genomic region sur- lyses of present human DNA samples has led to contrasting results rounding LCT, the gene encoding lactase, has been consistently re- (Sokal and Menozzi, 1982; Ammerman and Cavalli-Sforza, 1984; ported as one of the most extreme long-range haplotype based Sokal et al., 1991; Puit et al., 1994; Richards et al., 2000; Rosser et al., examples of recent selection in Europe (Enattah et al., 2002, 2007, 2000; Simoni et al., 2000; Belle et al., 2006; Balaresque et al., 2010). 2008; Bersaglieri et al., 2004; Tishkoff et al., 2006). The selection It is important to note that estimates based solely on phylogeo- has been shown to be in the nearby MCM6 gene and results in graphic analysis of genetic variation in extant humans can be biased downregulation of the cessation of lactase production after weaning by several factors. Importantly, recent demographic events, such as (Enattah et al., 2002). Similar disruptive changes to the MCM6 gene gene-flow and back-migration between Europe and the Near East have convergently evolved in both African (Tishkoff et al., 2006) and can result in the geography of extant human populations mis- Middle Eastern (Enattah et al., 2007) populations. Selection for representing that of their perceived ancestors (Richards et al., lactase persistence shows the importance of comparing genetic data 2000; Balaresque et al., 2010). Additionally, when considering a to known cultural changes in the past, such as the timing and limited number of genetic loci, clines of genetic variation can be geographic distribution of cattle and camel pastoralism and milk sensitive to stochastic processes such as isolation by distance consumption (Holden and Mace,1997; Gerbault et al., 2009). The age (Novembre and Stephens, 2008). of the and subsequent beginning of the selective sweep Ancient DNA is an ideal data source to circumvent these issues underlying lactase persistence in Europeans (C/T-13910) has been and settle this debate because genetic differences between skeletal estimated between 3000 and 12,000 years, which seems to coincide samples associated with Mesolithic and Neolithic technologies can with the presence of domesticated cattle (Bollongino et al., 2006) be directly assayed and correlated with cultural differences. A and a record of increasing pastoralism and dairying in several hu- model of cultural diffusion predicts little to no major genetic dif- man populations, particularly in northern Europe. For example, ferences associated with culture change in Holocene samples, Tishkoff and colleagues (2006) estimated the age, using a coalescent while the demic diffusion model predicts substantial influx of novel simulation model that incorporated selection and recombination, at genetic variation. The majority of aDNA studies of Europe have approximately 8000 to 9000 years depending on the degree of involved mtDNA analysis, primarily from hunter-gatherer and dominance assumed for the allele. While consistent with the farming populations of north and central Europe. These studies anthropological record, the confidence intervals spanning 2000 to have revealed that approximately ~83% of pre-Neolithic peoples of 19,000 years points to the large uncertainty in the estimates. This is Europe carried mtDNA haplogroup U and none belong to hap- consistent with the large range of variation in coalescence times logroup H, a composition that is markedly different from present (Slatkin and Rannala, 2000). samples in which haplogroup H is dominant (Haak et al., 2005, Estimates of allele age and timing of selection based on popu- 2008, 2010; Bramanti, 2008; Bramanti et al., 2009; Guba et al., lation genetic patterns observed in extant humans depend heavily 2011; Fu et al., 2012; Lee et al., 2012; Nikitin et al., 2012). On the on assumptions about the demographic history of human pop- other hand, ~12% from early farming populations belong to hap- ulations and are often associated with large ranges of error (as logroup U, while haplogroup H is present in between 25 and 37% of illustrated above for the timing of C/T-13910). By testing whether mtDNA from early farming samples, both consistent with the specific genetic variants were absent or present in an ancient haplotype frequencies of extant Europeans. Combined, these re- sample, aDNA can be used to test hypotheses about the timing of sults from ancient mtDNA analyses suggest that pre-Neolithic selective changes in past human populations (Burger et al., 2007; hunter-gatherers contributed at most 20% to the mtDNA genetic Malmstrom€ et al., 2010; Plantinga et al., 2012). This can lead to composition of present European populations (Fu et al., 2012). much more precise time estimates, though these depend on the These ancient mtDNA results were recently combined with a ability to accurately date ancient skeletal materials. For example, dataset of 1151 complete mtDNAs from across Europe (Fu et al., the derived allele (-13910*T) that underlies lactase persistence in 2012). Fu and colleagues (2012) found evidence for a population Northern Europeans was found in only one copy out of 20 (~5%) in a expansion between 15,000 and 10,000 years ago in mtDNA typical 5000 year old skeletal sample from Sweden (Malmstrom€ et al., of pre-Neolithic hunter-gatherers and a subsequent contraction of 2010), in ~27% of a sample of 26 Basque individuals dating be- these haplotypes between 10,000 and 5000 years ago, consistent tween 4500 and 5000 years ago (Plantinga et al., 2012), and was with the expansion of mtDNA from agricultural populations from completely absent from a skeletal sample of nine individuals from the Near East. eastern Europe dating between 5000 and 5800 years ago (Burger In addition to ancient mtDNA data, results from whole genome et al., 2007). sequencing of ancient Holocene-aged individuals have been applied to the question of Neolithic population replacement in Holocene demography of Europe and ancient DNA Europe. Within the past two years, substantial amounts of auto- somal DNA have been reported for seven ancient individuals in € Archaeological evidence suggests that the transition from a Europe. These include Otzi (the Tyrolean Iceman) dating to roughly hunting and gathering lifestyle to a more sedentary agricultural 5300 years ago (Keller et al., 2012), three hunter-gatherers associ- ‘Neolithic’ lifestyle, which began in the Near East by 10,000 years ated with the Pitted Ware culture and one farmer associated with ago, spread across Europe between 8000 and 4000 years ago (Price, the Funnel Beaker culture from Scandinavia dating to approxi- 2000). Archaeological and genetic evidence has traditionally been mately 5000 years ago (Skoglund et al., 2012), and two 7000 year divided between two viewpoints regarding the spread of old Iberian hunter-gatherers (Sanchez-Quinto et al., 2012). 66 A.J. Sams et al. / Journal of Human Evolution 79 (2015) 64e72

Genome-wide analysis of the Iceman's genome revealed that of (Rannala and Reeve, 2001), or shared haplotypes (Genin et al., extant populations, the Iceman appears most closely related to 2004). We chose to utilize a set of allele age estimates generated Sardinians, which has been argued to reflect the common ancestry from allele frequencies (using the approach developed by Griffiths between the Sardinian and Alpine populations of ~5000 years ago and Tavare, 1998) as part of the US National Institutes of Health (Keller et al., 2012). The study of the four ancient Scandinavians (NIH) Heart, Lung, and Blood Institute (NHLBI)-sponsored Exome surprisingly revealed that the hunter-gatherers of this sample are Sequencing Project (ESP). We focus specifically on the set of allele more similar to (though still distinct from) Northern Europeans, ages for approximately 1.15 million exonic SNVs estimated from a while the farmer is more similar to present-day Southern Euro- sample of 4298 European Americans (Fu et al., 2013). peans, with a high degree of genetic similarity to present-day in- We considered two whole-genomes of skeletal samples with € dividuals in Greece and Cyprus. Finally, analysis of the two Iberian known geographic and temporal provenience, namely Otzi, the individuals showed no close relationship between these individuals Tyrolean Iceman, estimated to be approximately 5300 years old and present-day populations in Iberia or southern Europe. The (Keller et al., 2012) and an approximately 50,000 year old Nean- general picture emerging from analysis of aDNA is that the spread dertal from the Altai Mountains of Siberia (Prüfer et al., 2014). The of farming technology during the Neolithic was associated with accuracy of population genetic estimates of derived allele age de- significant population movements from the Near East across pends both on the details of the method applied for their estima- Europe. This population replacement should be taken under tion, as well as on selection operating on the allele and the global consideration in utilizing data from aDNA to address questions effects of demography. For example, an allele has been previously € about recent natural selection in Europe, particularly in studies of a reported (Sams and Hawks, 2013) that is present in Otzi but esti- single locus. mated to be younger than 5000 using an LD-based method. How- ever, we hypothesize that derived allele ages estimated from the Ancient DNA, demography, and selection extant European American population will be much older on average when they are present in an ancient genome, compared To what extent should the evidence of demographic population with derived alleles that are not observed in the ancient individual. replacement affect the application of aDNA in testing hypotheses of Empirical testing and quantification of this hypothesis can weigh recent natural selection? As a hypothetical example, consider the into how commonly-used methods for timing selection and age of analysis by Malmstrom€ et al. (2010), which revealed that the mutations can be biased by demographic factors (such as popula- -13910*T lactase persistence allele was relatively rare (5%) in a tion replacements, bottlenecks, and population growth) as well as middle Neolithic sample of hunter-gatherers associated with the by other factors, and how accurate they are when considering Pitted Ware culture. This result seemingly strengthens the argu- variability across loci in coalescence times, mutation and recom- ment for the recent origin and rapid increase in frequency of the bination rates, etc. lactase persistence allele in populations of Europe. However, given that these mid-Neolithic hunter-gatherer populations may not be Materials and methods closely related to present day populations of northern Europe, the rapid shift in allele frequency may reflect the demographic shift Allele-ages rather than natural selection. In light of this, it is important that applications of aDNA to questions related to selection consider a We extracted text-based (.txt) Exome-Sequencing Project data wider geographic range of aDNA samples. For example, the fre- from the NHLBI Exome Sequencing Project's Exome Variant Server quency of the -13910*T allele in ancient populations of the Near (http://evs.gs.washington.edu/EVS/, accessed December 2013). We East can shed important additional light on the timing and geog- filtered the data to exclude duplicate SNVs, SNVs representing raphy of selection for lactase persistence. transition polymorphisms, and any SNVs that did not include an allele age estimate for the European sub-sample of the ESP. Estimating allele age The ESP data files do not include the project's calls for the ancestral state used to determine derived allele-age. Therefore, we One of the exciting outcomes of the development of aDNA used our own inferred ancestral state (described below) to polarize technologies is the use of ancient samples to improve allele age the frequency at each resulting SNV. During this process, we estimation from modern samples, as recently demonstrated by noticed that a small proportion (~0.04%) of SNVs had listed ages Malaspinas et al. (2012). To examine uncertaintiesddue to de- clearly corresponding to the frequency of our inferred ancestral mographic eventsdassociated with using aDNA to assess allele age, allele, which likely results from differences in calling the ancestral be it a neutral allele or one under recent selection, we aim to first state. To alleviate this problem, we removed these miscalled sites determine how closely estimates of allele age from extant pop- from our analysis. In total, we analyzed a sample of 119,421 poly- ulations conform to ancient specimens with known geographic and morphic sites. temporal provenience. Lactase persistence provides one example where the absence of the derived allele in ancient samples is Inference of ancestral state consistent with a recent mutation. However, discovery of the allele in additional ancient populations could potentially push back the We inferred ancestral state by alignment of the human reference timing of the allele. We focus here on the information that can be (hg19) to three primates [chimpanzee (pantro2), orangutan gleaned from the presence or absence of derived alleles in ancient (ponabe1), and rhesus macaque (rhemac2)]. Alignment gaps and samples on the genome-wide distribution of allele age. We further non-syntenic regions were removed. Ancestral state was called by compare the information gleaned from aDNA with estimates of the chimpanzee allele as long as that allele was also present in a allele age based on extant population genetic information, which second primate outgroup. can provide insight into the accuracy of the latter. Several methods have been developed to estimate allele age PhyloP data using allele frequency (Kimura and Ohta, 1973; Griffiths and Tavare, 1998), intra-allelic variation (Serre et al., 1990; Risch et al., 1995; PhyloP scores generated from the EPO 36 eutherian mammal Slatkin and Rannala, 2000), patterns of linkage disequilibrium alignment with human sequence masked were generated and A.J. Sams et al. / Journal of Human Evolution 79 (2015) 64e72 67 provided by Wenqing Fu and Joshua Akey. We used the 95th Ascertainment in the Iceman genome shows reasonable fit to allele percentile to differentiate highly conserved sites (top 5%) and not age estimation highly conserved sites. The distribution of ESP age estimates for derived alleles that are Iceman genome data present in the Iceman genome exhibits a significant shift towards older ages compared with that for derived alleles present in the We obtained aligned genome reads from the Tyrolean Iceman Iceman genome (Fig. 1; median difference, p < 10 10). This result € (Otzi) from the European Short Read Archive. At the time of our holds also after removing the low frequency variants (derived allele € download, the Otzi data were provided as three BAM files; we used count below 10), which make up a majority of the dataset (data not samtools (http://samtools.org) to merge these into a single dataset shown). The overall proportion of derived alleles estimated to be and extracted the SNV sites for each allele with a valid age estimate younger than 5000 years old and present in the Iceman genome is € from the European portion of the ESP data. In order to align the Otzi very small (Table 1; 0.024%). Stated differently, alleles estimated to reads (hg18) to the HapMap/1000 Genomes data (hg19), we first be younger than 5000 years old are associated with 67-fold had to perform a genome build liftover using the UCSC Genome decreased odds (OR ¼ 0.0015) of being observed in the Iceman Browser Batch Coordinate Conversion utility (http://genome.ucsc. genome relative to alleles estimated to be older than 5000 years. edu/cgi-bin/hgLiftOver). While only 0.024% of alleles were estimated as younger than The extracted set of reads were filtered as in previous studies to 5000 years and present in the Iceman, this represents a significant include only reads with base quality >40 and mapping quality >37. deviation from the expectation of no such alleles (p ¼ 3.5 10 6). Additionally, we filtered out sites with collapsed coverage greater Therefore, we must explain the presence of the small fraction of than 15. We chose not to use a minimum read depth for the Iceman young alleles discovered in the Iceman. Assuming unbiased age genome in order to maximize the number of available sites repre- estimations, the problem must lie either in statistical error of age sented. Because of this, we chose a single random read from each estimation or with error in the aDNA, including deamidation of € SNV and joined the data from Otzi with the ESP allele ages. cytosine residues, sequencing error, and contamination. The esti- mated fraction of the Iceman sequence subject to these sorts of Neandertal genome data errors has been estimated to be approximately 2.5% of the genome (Keller et al., 2012). Our analysis should be confounded by these to a The Altai Neandertal VCF files were downloaded from the Max lesser extent since we only considered sites with known poly- Planck Institute's Department of Evolutionary Genetics server morphisms in present-day populations and removed transition (http://cdna.eva.mpg.de/neandertal/altai/AltaiNeandertal/VCF/). polymorphisms (C-T, A-G) to reduce the impact of cytosine dea- We filtered the data using minimum quality filters, which included midation errors. Only 0.86% of derived alleles that are present in the the following filters: removal of repeat sites from the Tandem Iceman are estimated to be younger than 5000 years, and they Repeat Finder, removal of sites with mapping quality <30, inclusion show a different distribution, shifted towards older ages, compared of only sites that pass a genome alignability filter that requires that with absent alleles that are also estimated to be younger than 5000 50% of 35-mers overlap a position map uniquely allowing up to one years (Fig. 1). Combined, these results support that the vast ma- mismatch, and removal of sites with read depth in the most jority of derived alleles observed in the Iceman are genuine. extreme 5% of the genome-wide coverage distribution. In order to Another piece of evidence supporting these conclusions stems compare the presence-absence distributions of the Iceman and from the frequency spectrum of present alleles: Under neutrality, Neandertal, we extracted only SNVs that were also ascertained in the probability that an allele is observed in a single ancient genome the Iceman. Rather than choosing a single read as in the Iceman, we from an ancestral population can be approximated by the present- simply scored (as 0 or 1) each genotype for the presence or absence day frequency of the allele. Fig. 2A shows the correlation of of the derived allele.

Block bootstrap tests

We used a block bootstrap approach (Lahiri, 2003; Keinan et al., 2007) to assess the significance of the median difference in allele age between presence/absence categories from zero and to assess the significance of deviations of odds-ratios from one (after dividing the sample into four young/old, present/absent bins). We resampled the ESP dataset (with replacement) in blocks of 200 Kb a total of 10,000 times and used the mean and standard error of these bootstrap distributions to calculate Z-scores and p-values.

Results and discussion

We analyzed, in total, 119,421 derived allele age estimates as estimated by the ESP for exonic single-nucleotide variants (Fu et al., 2013). These estimates are based on coalescent simulations using a model developed by Tennessen et al. (2012) and allele frequencies Figure 1. Empirical cumulative distributions of mean allele age divided by its presence from a sample of 4298 European Americans. For each SNV, we or absence in the Iceman and the Altai Neandertal. Empirical distributions of derived considered whether the derived allele is observed (present) or not allele age estimates from ESP for each subset of alleles based on their presence/absence (absent) in each of two ancient genomes, the approximately 5300 in the Iceman and the Altai Neandertal. As expected, alleles observed in ancient ge- € nomes are older than those not observed, and the distribution of the latter conforms to year old Tyrolean Iceman (Otzi) (Keller et al., 2012) and an observations from extant allele frequencies of most alleles being young. Inset bar chart approximately 50,000 year old Neandertal from the Altai Moun- summarizes the distributions at several time points (in ka). Note in particular the tains of Siberia (Prüfer et al., 2014). excess proportion of ‘young present’ alleles in the Altai genome. 68 A.J. Sams et al. / Journal of Human Evolution 79 (2015) 64e72

Table 1 Number of alleles for each combination of presence (P) or absence (A) in the Iceman and older (O) or younger (Y) age (in ka) compared to the threshold indicated in each row.

Meana SDa Y þ PYþ AOþ POþ AORb SE Z p-valuec

5 0 24 99,573 2770 17,054 0.0015 0.00032 4.64 3.5 10 6 5 2 0 0 2794 116,627 N/A N/A N/A N/A 8 0 27 102,670 2767 13,957 0.0013 0.00026 4.9 9.8 10 7 8 2 17 75,282 2777 41,345 0.0034 0.00078 3.85 0.0001

a Mean plus standard deviation (in ka) of coalescent age from ESP used as age cutoff. b Mean odds-ratio from block-bootstrap test. c p-value for difference of odds-ratio from 0. present-day frequency against the fraction of derived alleles that conservative upper bound estimate of allele age, we lose all power are observed in the Iceman. The slope of the best-fit regression line to differentiate alleles by the age of the Iceman sample since that is 1.03, reflecting a slight and insignificant increase compared with upper bound is lower than 5000 years for no alleles in our sample the neutral expectation, potentially due to the averaging of allele (Table 1). As the Iceman is unlikely to be directly ancestral to any frequencies that fit into a certain frequency bin or the small number individuals in the modern sample, the age of this sample (and other of total sites in the higher frequency bins. ancient samples) may be significantly younger than the time of the Another explanation for the small surplus of alleles estimated at population split between the present-day population and the younger than 5000 years and present in the Iceman genome population of the Iceman. In this context, the dating of the skeletal involved statistical error in allele age estimation. Indeed, when we sample is a conservative lower bound for the minimum age esti- considered the mean age plus two standard deviations as a mates we should expect to find in the ancient sample. Hence, we repeated analyses with a slightly less conservative age of 8000 years (Table 1).

Ascertainment in a Neandertal genome provides a poorer fit to allele age estimation

To bring more perspective to the results from the Iceman genome, we additionally examined patterns of presence-absence in the recently published (Prüfer et al., 2014) high-coverage genome of a Neandertal discovered in Denisova Cave, Russia. As with the Iceman, the distribution of ages for alleles present in the Altai Neandertal is significantly shifted towards older ages compared with that of alleles that are absent (Fig. 1; median difference, p < 10 10). For this genome, we used a conservatively young esti- mated age of 50,000 years. In comparing with age estimates from ESP bases on allele frequencies, it is important to recognize that they are based on a model that does not include a contribution of Neandertal ancestry to present-day European populations (Fu et al., 2013); therefore, we might expect to see a larger bias in the dis- tribution of presence-absence by age than observed in the much younger Iceman genome. Table 2 illustrates that the model still performs relatively well, given that the fraction of derived alleles observed in modern humans and the Neandertal genome is ex- pected to be small, since the majority of derived alleles in the ESP European sample are expected to have arisen since the population divergence of ancestral modern humans and Neandertals (Coventry et al., 2010; Keinan and Clark, 2012; Tennessen et al., 2012; Fu et al., 2013; Kiezun et al., 2013). Additionally, the distribution of ages relative to whether an allele is observed in the Neandertal genome suggests that the age estimates are relatively accurate (Fig. 1). Similarly, the overall fraction of derived alleles observed in the Neandertal genome (0.7%) is substantially reduced compared with Figure 2. Proportion of derived alleles present in the Iceman and the Altai Neandertal by derived frequency. Plots illustrate the fraction of derived alleles observed in the the Iceman (2.3%), in line with its older age. Iceman (A) and Altai Neandertal (B) at each 0.5% frequency bin. The dashed red line However, the fraction of sites estimated to be younger than (y ¼ x) illustrates the expectation from purely neutral evolution, assuming the ancient 50,000 years that are present in the Neandertal genome is 0.08% individual is from a population directly ancestral to the modern sample. Note the and the overall odds of observing a derived allele younger than extreme skew from this expectation for the Neandertal in panel B. Solid blue lines in both panels indicate linear regression ‘best-fit’ of the data. The linear regression slope 50,000 years old relative to those older than 50,000 years old for the Iceman is not significantly different from the neutral expectation (B ¼ 1.03, (OR ¼ 0.01) are less reduced compared with the Iceman. Compared SE ¼ 0.02). The linear regression slope for Altai is significantly different from 1 with the Iceman, this represents an even greater excess of ‘young (B ¼ 0.35, SE ¼ 0.018). The green line in panel B illustrates nonlinear (quadratic) present’ alleles above zero (p ¼ 5.35 10 12). One explanation for regression, which better fits the data (R2 ¼ 0.85 versus 0.66) and illustrates a more this pattern is the fact that statistical uncertainty in allele age in- complex population history with the present sample. Insets in both panels show the lowest ten frequency bins. (For interpretation of the references to color in this figure creases with allele age. Therefore, the pattern of presence/absence legend, the reader is referred to the web version of this article.) in the Altai genome could be explained by this increased statistical A.J. Sams et al. / Journal of Human Evolution 79 (2015) 64e72 69

Table 2 Number of alleles for each combination of presence (P) or absence (A) in the Altai Neandertal and older (O) or younger (Y) age (in ka) compared to the threshold indicated in each row.

Meana SDa Y þ PYþ AOþ POþ AORb SE Z p-valuec

50 0 93 109,707 763 8858 0.01 0.0015 6.9 5.35 10 12 50 2 79 100,573 777 17,992 0.018 0.0029 6.31 2.74 10 10 100 0 101 111,767 755 6798 0.008 0.001 7.24 4.38 10 13 100 2 86 104,775 770 13,790 0.015 0.0023 6.45 1.11 10 10

a Mean plus standard deviation (in ka) of coalescent age from ESP used as age cutoff. b Mean odds-ratio from block-bootstrap test. c p-value for difference of odds-ratio from 0. uncertainty. Consequently, we also measured the same statistics Considering the large range of error implicit in the coalescent using an upper bound of mean age plus two standard deviations process, none of the 24 young observed alleles are significantly (Table 2), holding the age cutoff constant at 50,000 years and the younger than 5000 years (mean þ 2SD). It is also worthy of note odds-ratio remains highly significant (OR ¼ 0.018, p ¼ 2.74 10 10 that the true number of independent loci is 20 (several of the 24 for difference from zero). This excess of ‘young present’ alleles in SNPs are in LD). Similarly, 79 of 93 of the young alleles observed in the Altai genome is visually evident in Fig. 1. the Neandertal are significantly younger than 50,000 years The distribution of ages of alleles that are present in the Altai (Tables 1 and 2). This highlights an important problem with allele- genome before the 400 ka (thousands of years) cross over is age estimation for which aDNA will be especially useful in the essentially flat, reflecting the fact that the majority of ‘present’ al- future. Extensions of our allele presence-absence procedure to leles in this genome are lower frequency compared with those larger samples of high quality and geologically well-dated samples ‘present’ in the Iceman. We should not expect the Neandertal of aDNA will facilitate the reduction of these large ranges of error by genome to carry a much higher proportion of truly ancient alleles empirically documenting the geographic and temporal locality of compared with the Iceman for alleles that are of very low fre- specific alleles. quency. This excess of lower frequency alleles likely reflects Neandertal ancestry in the present-day sample. We conclude, therefore, that the observation of derived alleles in the Neandertal genome given ages estimated from present-day samples is more biased. The amount of sequencing error/contam- ination necessary to explain the fraction of incorrectly estimated ages (0.11) is 11 times greater than the 0.01 inferred amount of error in this high-coverage genome, which is further reduced by geno- type calling (Prüfer et al., 2014), making errors in the Neandertal sequence itself an unlikely explanation for this bias. This skew in the presence-absence distribution for the Altai genome should not be surprising, as the Neandertals have a mostly separate population history from our African ancestors after at least 300 ka. This different population history is strongly reflected in the frequency distribution of ‘present’ Altai alleles (Fig. 2B). If more recent admixture from this archaic population is not explicitly modeled to produce the allele age estimates, we should expect the low- frequency variants contributed to modern populations by Nean- dertals to be estimated to be much younger than they truly are.

General performance of allele age estimation relative to ancient genomes

Our results suggest that, overall, the frequency- and coalescent- based allele age estimation employed by Fu et al. (2013) performs relatively well compared to empirical observations in aDNA. We note that our results described above and in the tables and figures in this manuscript refer to the mean allele ages (and mean þ 2SD where noted) estimated by Fu et al. (2013). In reality, a small pro- portion of rare derived alleles (proportional to the frequency) are expected to be much older than the average age for that frequency under neutrality (Kimura and Ohta, 1973), which may explain a portion of our observed ‘young present’ alleles. Additionally, age estimation can be biased for alleles in which frequency has been Figure 3. Empirical cumulative distributions of age divided by presence-absence in affected by natural selection. In fact, we observe in the Iceman that Iceman and high-low phyloP scores. Empirical distributions of derived alleles in our alleles at conserved sites (high phyloP scores) have systematically sample divided into four categories based on presence-absence in the Iceman (A) and younger age estimates than at non-conserved sites (Fig. 3), a result Altai Neandertal (B), and an estimate of high base conservation (as measured by the sample 95th percentile of phyloP scores). More highly conserved sites on average are that is consistent with purifying selection pushing alleles towards lower in frequency and, therefore, have younger estimated ages. More conserved sites low frequencies, and with previous empirical observations (Fu also contain a higher proportion of ‘young present’ alleles in the ancient genomes, et al., 2013; Kiezun et al., 2013). which likely reflects deviations of these sites as a group from neutral expectations. 70 A.J. Sams et al. / Journal of Human Evolution 79 (2015) 64e72

Towards improved demographic models approaches are based on the availability of several samples with genetic continuity through time. Similar to some approaches based In addition to utilizing ancient genomes to narrow the range of on extant population genetic data, these time-series approaches allele ages estimated from population genetic data, we envision a assume a model of demographic history, including population sizes process by which allele presence-absence in ancient genomes can at each sampled time point. Consider the case of lactase persistence be used to create and fit more realistic demographic models. The described in the Introduction. In this case, current sampling of importance of using an ancient genome in this context is that if age aDNA supports selection time as estimated from present-day estimations were absolutely accurate, demography could not pro- samples and points to the lactase persistence allele having risen duce our observed results. We should never observe alleles that to present frequency rapidly in situ in Europe. However, at present, arose in the past 5000 years in a 5300 year-old genome. Therefore, no DNA from ancient agricultural or pre-agricultural populations of we must account for the potential sources of bias in these age the Near East has been recovered. The lack of observation of the estimates. lactase persistence allele in ancient northern Europe does not, by A major source of bias on allele age estimates generated from itself, establish just how recent selection for lactase persistence has frequency and coalescent simulations is the demographic model been. This case is, therefore, a perfect illustration of the need for used to generate the coalescent trees. Fu et al. (2013) rely on a larger, more geographically diverse samples of aDNA. With demographic model of the history of European and African pop- moderately sized samples of aDNA from northern Europe, southern ulations generated by Tennessen et al. (2012), which has several Europe, and the Near East from approximately 8000 years ago we key parameters that affect the average age of alleles, most impor- could empirically test hypotheses about positive selection on al- tantly the timing of recent population expansion, but also the leles genome-wide in a probabilistic framework. Failure to observe timing of the Out-of-Africa event and the rate of growth before and the lactase persistence allele in such larger samples from different during the recent population expansion. The effects of the different locales will go a long way in solidifying the observation of very parameterization of several recently published models that include recent, strong positive selection on this allele. recent population growth (Coventry et al., 2010; Nelson et al., 2012; Tennessen et al., 2012) leads to as much as a two-fold difference in Conclusions average allele age (Fu et al., 2013). However, several events that have not been included in any recently published models may also We have demonstrated the utility of ancient human genomes substantially bias allele age when not included in the underlying for examining the accuracy of allele age estimates derived from model used for simulation. For example, recent expansion of the extant samples and a coalescent-based framework under an European population is typically modeled as in situ growth when, assumed demographic model. Our results illustrate that such allele in all likelihood, this expansion event included or was preceded by age estimates in European Americans, based on recently published an influx of gene flow from agricultural populations in the Near models of European demographic history, fit well with expecta- East, which themselves may have experienced substantial gene tions based on the observation or lack thereof of these alleles in a flow from Africa (Lazaridis et al., 2013). Fu et al., 2013 have exam- 5300 year old European genome (Fig. 2A). However, comparison to ined the effects of modest migration rates on allele age during a high-coverage Neandertal genome illustrates at least one major recent population expansion (between 0 and 15 10 5 per chro- gap in the models of European demographic history that have been mosome, per generation), the effective admixture rate from pop- used for allele age estimation (Fig. 2B). Our results and discussion ulations outside of ancient Europe may have been much larger. paint a picture of improved demographic modeling, allele age Our observations of alleles in the Neandertal genome highlights estimation, and tests of natural selection via increased inclusion of another major issue with these demographic models, which is that the growing sample of ancient human genomes. They also point to they do not include the contribution of archaic human populations the importance of considering a more diverse set of aDNA sam- to present-day human ancestry. By the most recent estimates, these plesdboth geographically and temporallydfor making significant ancient gene flow events contributed, on average, ~2% of the improvements in these areas. ancestry of present-day Europeans (Prüfer et al., 2014). This sug- gests that a small but substantial fraction of low frequency variants, introduced by gene flow from Neandertals, will have true ages that Acknowledgements are much older than the average age of alleles with similar fre- quency under a model without Neandertal admixture. While We would like to thank Karen Strier, Bret Payseur, Henry Bunn, models have been developed that include archaic introgression (for and Travis Pickering for comments on an earlier version of this example, see Harris and Nielsen, 2013), none have been explicitly paper. We would also like to thank Sriram Sankararaman and David applied to the allele age problem. The inclusion of more aDNA in the Reich for providing a set of quality filters for the Altai Neandertal way that we have illustrated here will allow improving the VCFs and Wenqing Fu and Joshua Akey for providing phyloP data parameterization of human demographic models. for the ESP dataset. Finally, we would like to thank Leonardo Arbiza for providing alignments of the inferred ancestral genome. This Towards estimates of recent positive natural selection work was supported in part by NIH grant R01GM108805 (A.K.) and by an award from the Ellison Medical Foundation (A.K.). Coalescent/frequency based estimates of allele age typically assume an absence of positive selection. However, in many cases References we are most interested in understanding whether a particular allele has been subject to recent positive selection. Moreover, it has been Ammerman, A.J., Cavalli-Sforza, L.L., 1984. The Neolithic Transition and the Genetics proposed that the amount of positive selection increased during of Populations in Europe. Princeton University Press, Princeton. Balaresque, P., Bowden, G.R., Adams, S.M., Leung, H.-Y., King, T.E., Rosser, Z.H., the recent epoch of population expansion (Hawks et al., 2007). Can Goodwin, J., Moisan, J.-P., Richard, C., Millward, A., Demaine, A.G., Barbujani, G., ancient genomes be used to systematically identify cases of recent Previdere, C., Wilson, I.J., Tyler-Smith, C., Jobling, M.A., 2010. A predominantly positive selection? There is a quickly growing literature on inferring Neolithic origin for European paternal lineages. PLoS Biol. 8, e1000285. Belle, E.M.S., Landry, P.A., Barbujani, G., 2006. Origins and evolution of the Euro- natural selection from genetic time-series data (Bollback et al., peans' genome: evidence from multiple microsatellite loci. Proc. R. Soc. B 273, 2008; Malaspinas et al., 2012; Feder et al., 2014). These 1595e1602. A.J. Sams et al. / Journal of Human Evolution 79 (2015) 64e72 71

Bersaglieri, T., Sabeti, P., Patterson, N., Vanderploeg, T., Schaffner, S., Drake, J., origin and phenotype as inferred by whole-genome sequencing. Nat. Commun. Rhodes, M., Reich, D., Hirschhorn, J., 2004. Genetic signatures of strong recent 3, 698. positive selection at the lactase gene. Am. J. Hum. Genet. 74, 1111e1120. Kiezun, A., Pulit, S.L., Francioli, L.C., van Dijk, F., Swertz, M., Boomsma, D.I., van Bollback, J.P., York, T.L., Nielsen, R., 2008. Estimation of 2Nes from temporal allele Duijn, C.M., Slagboom, P.E., van Ommen, G.J.B., Wijmenga, C., Consortium, G.O.T.N., frequency data. Genetics 179, 497e502. De Bakker, P.I.W., Sunyaev, S.R., 2013. Deleterious alleles in the human genome are Bollongino, R., Edwards, C.J., Alt, K.W., Burger, J., Bradley, D., 2006. Early history of on average younger than neutral alleles of the same frequency. PLoS Genet. 9, European domestic cattle as revealed by ancient DNA. Biol. Lett. 2, 155e159. e1003301. Bramanti, B., 2008. Ancient DNA: genetic analysis of aDNA from sixteen skeletons of Kimura, M., Ohta, T., 1973. The age of a neutral mutant persisting in a finite pop- the Vedrovice. Anthropologie 46, 153e160. ulation. Genetics 75, 199e212. Bramanti, B., Thomas, M.G., Haak, W., Unterlaender, M., Jores, P., Tambets, K., Lahiri, S.N., 2003. Resampling Methods for Dependent Data. Springer, Dordrecht. Antanaitis-Jacobs, I., Haidle, M.N., Jankauskas, R., Kind, C.J., Lueth, F., Lazaridis, I., Patterson, N., Mittnik, A., Renaud, G., Mallick, S., Sudmant, P.H., Terberger, T., Hiller, J., Matsumura, S., Forster, P., Burger, J., 2009. Genetic Schraiber, J.G., Castellano, S., Kirsanow, K., Economou, C., Bollongino, R., Fu, Q., discontinuity between local hunter-gatherers and central Europe's first farmers. Bos, K., Nordenfelt, S., de Filippo, C., Prüfer, K., Sawyer, S., Posth, C., Haak, W., Science 326, 137e140. Hallgren, F., Fornander, E., Ayodo, G., Babiker, H.A., Balanovska, E., Burger, J., Kirchner, M., Bramanti, B., Haak, W., Thomas, M.G., 2007. Absence of the Balanovsky, O., Ben-Ami, H., Bene, J., Berrada, F., Brisighelli, F., Busby, G.B.J., lactase-persistence-associated allele in early Neolithic Europeans. Proc. Natl. Cali, F., Churnosov, M., Cole, D.E.C., Damba, L., Delsate, D., van Driem, G., Acad. Sci. 104, 3736e3741. Dryomov, S., Fedorova, S.A., Francken, M., Romero, I.G., Gubina, M., Guinet, J.-M., Childe, V.G., 1958. The Dawn of European Civilization, Sixth edition. Knopf, New Hammer, M., Henn, B., Helvig, T., Hodoglugil, U., Jha, A.R., Kittles, R., York. Khusnutdinova, E., Kivisild, T., Kucinskas, V., Khusainova, R., Kushniarevich, A., Coventry, A., Bull-Otterson, L.M., Liu, X., Clark, A.G., Maxwell, T.J., Crosby, J., Laredj, L., Litvinov, S., Mahley, R.W., Melegh, B., Metspalu, E., Mountain, J., Hixson, J.E., Rea, T.J., Muzny, D.M., Lewis, L.R., Wheeler, D.A., Sabo, A., Lusk, C., Nyambo, T., Osipova, L., Parik, J., Platonov, F., Posukh, O.L., Romano, V., Rudan, I., Weiss, K.G., Akbar, H., Cree, A., Hawes, A.C., Newsham, I., Varghese, R.T., Ruizbakiev, R., Sahakyan, H., Salas, A., Starikovskaya, E.B., Tarekegn, A., Villasana, D., Gross, S., Joshi, V., Santibanez, J., Morgan, M., Chang, K., Hale IV, W., Toncheva, D., Turdikulova, S., Uktveryte, I., Utevska, O., Voevoda, M., Wahl, J., Templeton, A.R., Boerwinkle, E., Gibbs, R., Sing, C.F., 2010. Deep resequencing Zalloua, P., Yepiskoposyan, L., Zemunik, T., Cooper, A., Capelli, C., Thomas, M.G., reveals excess rare recent variants consistent with explosive population Tishkoff, S.A., Singh, L., Thangaraj, K., Villems, R., Comas, D., Sukernik, R., growth. Nat. Commun. 1, 131. Metspalu, M., Meyer, M., Eichler, E.E., Burger, J., Slatkin, M., Pa€abo,€ S., Kelso, J., Enattah, N., Sahi, T., Savilahti, E., Terwilliger, J., 2002. Identification of a variant Reich, D., Krause, J., 2013. Ancient human genomes suggest three ancestral associated with adult-type hypolactasia. Nature 30, 233e237. populations for present-day Europeans. Nature 513, 409e413. Enattah, N., Trudeau, A., Pimenoff, V., Maiuri, L., 2007. Evidence of still-ongoing Lee, E.J., Makarewicz, C., Renneberg, R., Harder, M., Krause-Kyora, B., Müller, S., convergence evolution of the lactase persistence T-13910 alleles in humans. Ostritz, S., Fehren-Schmitz, L., Schreiber, S., Müller, J., Wurmb-Schwark, N., von Am. J. Hum. Genet. 81, 615e625. Nebel, A., 2012. Emerging genetic patterns of the European Neolithic: Per- Enattah, N.S., Jensen, T.G.K., Nielsen, M., Lewinski, R., Kuokkanen, M., Rasinpera,€ H., spectives from a late Neolithic Bell Beaker burial site in Germany. Am. J. Phys. El-Shanti, H., Seo, J.K., Alifrangis, M., Khalil, I.F., 2008. Independent introduction Anthropol. 148, 571e579. of two lactase-persistence alleles into human populations reflects different Malaspinas, A.-S., Malaspinas, O., Evans, S.N., Slatkin, M., 2012. Estimating allele age history of adaptation to milk culture. Am. J. Hum. Genet. 82, 57e72. and selection coefficient from time-serial data. Genetics 192, 599e607. Feder, A.F., Kryazhimskiy, S., Plotkin, J.B., 2014. Identifying signatures of selection in Malmstrom,€ H., Linderholm, A., Liden, K., Storå, J., Molnar, P., Holmlund, G., genetic time series. Genetics 196, 509e522. Jakobsson, M., Gotherstr€ om,€ A., 2010. High frequency of lactose intolerance in a Fu, Q., Rudan, P., Pa€abo,€ S., Krause, J., 2012. Complete mitochondrial genomes reveal prehistoric hunter-gatherer population in northern Europe. BMC Evol. Biol.10, 89. Neolithic expansion into Europe. PLoS One 7, e32473. Meyer, M., Kircher, M., Gansauge, M.T., Li, H., Racimo, F., Mallick, S., Schraiber, J.G., Fu, W., O'Connor, T.D., Jun, G., Kang, H.M., Abecasis, G., Leal, S.M., Gabriel, S., Jay, F., Prüfer, K., de Filippo, C., 2012. A high-coverage genome sequence from an Rieder, M.J., Altshuler, D., Shendure, J., Nickerson, D.A., Bamshad, M.J., NHLBI archaic Denisovan individual. Science 338, 222e226. Exome Sequencing Project, Akey, J.M., 2013. Analysis of 6,515 exomes reveals Meyer, M., Fu, Q., Aximu-Petri, A., Glocke, I., Nickel, B., Arsuaga, J.L., Martínez, I., the recent origin of most human protein-coding variants. Nature 493, 216e220. Gracia, A., de Castro, J.M.B., Carbonell, E., Pa€abo,€ S., 2013. A mitochondrial Genin, E., Tullio-Pelet, A., Begeot, F., Lyonnet, S., Abel, L., 2004. Estimating the age of genome sequence of a hominin from Sima de los Huesos. Nature 505, 403e406. rare disease mutations: the example of Triple-A syndrome. J. Med. Genet. 41, Nelson, M.R., Wegmann, D., Ehm, M.G., Kessner, D., St. Jean, P., Verzilli, C., Shen, J., 445e449. Tang, Z., Bacanu, S.A., Fraser, D., Warren, L., Aponte, J., Zawistowski, M., Liu, X., Gerbault, P., Moret, C., Currat, M., Sanchez-Mazas, A., 2009. Impact of selection and Zhang, H., Zhang, Y., Li, J., Li, Y., Li, L., Woollard, P., Topp, S., Hall, M.D., Nangle, K., demography on the diffusion of lactase persistence. PLoS One 4, e6369. Wang, J., Abecasis, G., Cardon, L.R., Zollner, S., Whittaker, J.C., Chissoe, S.L., Griffiths, R.C., Tavare, S., 1998. The age of a mutation in a general coalescent tree. Novembre, J., Mooser, V., 2012. An abundance of rare functional variants in 202 Communications in Statistics. Stoch. Models 14, 273e295. drug target genes sequenced in 14,002 people. Science 337, 100e104. Guba, Z., Hadadi, E., Major, A., Furka, T., Juhasz, E., Koos, J., Nagy, K., Zeke, T., 2011. Nikitin, A.G., Newton, J.R., Potekhina, I.D., 2012. Mitochondrial haplogroup C in HVS-I polymorphism screening of ancient human mitochondrial DNA provides ancient mitochondrial DNA from Ukraine extends the presence of East Eurasian evidence for N9a discontinuity and East Asian haplogroups in the Neolithic genetic lineages in Neolithic Central and Eastern Europe. J. Hum. Genet. 57, Hungary. J. Hum. Genet. 56, 784e796. 610e612. Haak, W., Forster, P., Bramanti, B., Matsumura, S., Brandt, G., Tanzer,€ M., Villems, R., Novembre, J., Stephens, M., 2008. Interpreting principal component analyses of Renfrew, C., Gronenborn, D., Alt, K.W., 2005. Ancient DNA from the first Euro- spatial population genetic variation. Nat. Genet. 40, 646e649. pean farmers in 7500-year-old Neolithic sites. Science 310, 1016e1018. Plantinga, T.S., Alonso, S., Izagirre, N., Hervella, M., Fregel, R., van der Meer, J.W., Haak, W., Brandt, G., Jong, H.N.D., Meyer, C., Ganslmeier, R., Heyd, V., Netea, M.G., de la Rúa, C., 2012. Low prevalence of lactase persistence in Hawkesworth, C., Pike, A.W.G., Meller, H., Alt, K.W., 2008. Ancient DNA, Neolithic South-West Europe. European J. Hum. Genet. 20, 778e782. Strontium isotopes, and osteological analyses shed light on social and kinship Price, T.D., 2000. Europe's First Farmers. Cambridge University Press, New York. organization of the Later Stone Age. Proc. Natl. Acad. Sci. 105, 18226e18231. Prüfer, K., Racimo, F., Patterson, N., Jay, F., Sankararaman, S., Sawyer, S., Heinze, A., Haak, W., Balanovsky, O., Sanchez, J.J., Koshel, S., Zaporozhchenko, V., Adler, C.J., Renaud, G., Sudmant, P.H., de Filippo, C., Li, H., Mallick, S., Dannemann, M., Sarkissian Der, C.S.I., Brandt, G., Schwarz, C., Nicklisch, N., Dresely, V., Fritsch, B., Fu, Q., Kircher, M., Kuhlwilm, M., Lachmann, M., Meyer, M., Ongyerth, M., Balanovska, E., Villems, R., Meller, H., Alt, K.W., Cooper, A., The Genographic Siebauer, M., Theunert, C., Tandon, A., Moorjani, P., Pickrell, J., Mullikin, J.C., Consortium, 2010. Ancient DNA from European early Neolithic farmers reveals Vohr, S.H., Green, R.E., Hellmann, I., Johnson, P.L.F., Blanche, H., Cann, H., their Near Eastern affinities. PLoS Biol. 8, e1000536. Kitzman, J.O., Shendure, J., Eichler, E.E., Lein, E.S., Bakken, T.E., Golovanova, L.V., Harris, K., Nielsen, R., 2013. Inferring demographic history from a spectrum of Doronichev, V.B., Shunkov, M.V., Derevianko, A.P., Viola, B., Slatkin, M., Reich, D., shared haplotype lengths. PLoS Genet. 9, e1003521. Kelso, J., Pa€abo,€ S., 2014. The complete genome sequence of a Neanderthal from Hawks, J., Wang, E.T., Cochran, G.M., Harpending, H.C., Moyzis, R.K., 2007. Recent the Altai Mountains. Nature 505, 43e49. acceleration of human adaptive evolution. Proc. Natl. Acad. Sci. 104, Puit, I., Sajantila, A., Simanainen, J., Georgiev, O., Schaffner, W., Pa€€abo, S., 1994. 20753e20758. Mitochondrial DNA sequences from Switzerland reveal striking homogeneity of Holden, C., Mace, R., 1997. Phylogenetic analysis of the evolution of lactose digestion European populations. Biol. Chem. Hoppe-Seyler 375, 837e840. in adults. Hum. Biol. 69, 605e628. Rannala, B., Reeve, J.P., 2001. High-resolution multipoint linkage-disequilibrium Keinan, A., Clark, A.G., 2012. Recent explosive human population growth has mapping in the context of a human genome sequence. Am. J. Hum. Genet. 69, resulted in an excess of rare genetic variants. Science 336, 740e743. 159e178. Keinan, A., Mullikin, J.C., Patterson, N., Reich, D., 2007. Measurement of the human Richards, M., Macaulay, V., Hickey, E., Vega, E., Sykes, B., Guida, V., Rengo, C., allele frequency spectrum demonstrates greater in East Asians Sellitto, D., Cruciani, F., Kivisild, T., Villems, R., Thomas, M., Rychkov, S., than in Europeans. Nat. Genet. 39, 1251e1255. Rychkov, O., Rychkov, Y., lge, M.G., Dimitrov, D., Hill, E., Bradley, D., Romano, V., Keller, A., Graefen, A., Ball, M., Matzas, M., Boisguerin, V., Maixner, F., Leidinger, P., Calı, F., Vona, G., Demaine, A., Papiha, S., Triantaphyllidis, C., Stefanescu, G., Backes, C., Khairat, R., Forster, M., Stade, B., Franke, A., Mayer, J., Spangler, J., Hatina, J., Belledi, M., Di Rienzo, A., Novelletto, A., Oppenheim, A., Nørby, S., Al- McLaughlin, S., Shah, M., Lee, C., Harkins, T.T., Sartori, A., Moreno-Estrada, A., Zaheri, N., Santachiara-Benerecetti, S., Scozzari, R., Torroni, A., Bandelt, H.-J.R., Henn, B., Sikora, M., Semino, O., Chiaroni, J., Rootsi, S., Myres, N.M., 2000. Tracing European founder lineages in the Near Eastern mtDNA pool. Am. Cabrera, V.M., Underhill, P.A., Bustamante, C.D., Vigl, E.E., Samadelli, M., J. Hum. Genet. 67, 1251. Cipollini, G., Haas, J., Katus, H., O'Connor, B.D., Carlson, M.R.J., Meder, B., Blin, N., Risch, N., de Leon, D., Ozelius, L., Kramer, P., Almasyz, L., Singer, B., Fahn, S., Meese, E., Pusch, C.M., Zink, A., 2012. New insights into the Tyrolean Iceman's Breakefield, X., Bressman, S., 1995. Genetic analysis of idiopathic torsion 72 A.J. Sams et al. / Journal of Human Evolution 79 (2015) 64e72

dystonia in Ashkenazi Jews and their recent descent from a small founder linked to the cystic fibrosis locus throughout Europe lead to new considerations population. Nat. Genet. 9, 152e159. in populations genetics. Hum. Genet. 84, 449e454. Rosser, Z.H., Zerjal, T., Hurles, M.E., Adojaan, M., Alavantic, D., Amorim, A.N., Simoni, L., Calafell, F., Pettener, D., Bertranpetit, J., Barbujani, G., 2000. Geographic Amos, W., Armenteros, M., Arroyo, E., Barbujani, G., Beckman, G., Beckman, L., patterns of mtDNA diversity in Europe. Am. J. Hum. Genet. 66, 262e278. Bertranpetit, J., Bosch, E., Bradley, D.G., Brede, G., Cooper, G., Corte- Skoglund, P., Malmstrom,€ H., Raghavan, M., Storå, J., Hall, P., Willerslev, E., Real, H.B.S.M., de Knijff, P., Decorte, R., Dubrova, Y.E., Evgrafov, O., Gilissen, A., Gilbert, M.T.P., Gotherstr€ om,€ A., Jakobsson, M., 2012. Origins and genetic legacy Glisic, S., lge, M.G., Hill, E.W., Jeziorowska, A., Kalaydjieva, L., Kayser, M., of Neolithic farmers and hunter-gatherers in Europe. Science 336, 466e469. Kivisild, T., Kravchenko, S.A., Krumina, A., Kucinskas, V., Lavinha, J.O., Slatkin, M., Rannala, B., 2000. Estimating allele age. A. Rev. Genom. Hum. Genet. 1, Livshits, L.A., Malaspina, P., Maria, S., McElreavey, K., Meitinger, T.A., 225e249. Mikelsaar, A.-V., Mitchell, R.J., Nafa, K., Nicholson, J., Nørby, S., Pandya, A., Sokal, R.R., Menozzi, P., 1982. Spatial autocorrelations of HLA frequencies in Europe Parik, J.R., Patsalis, P.C., Pereira, L.I., Peterlin, B., Pielberg, G., Prata, M.J.O., support demic diffusion of early farmers. Am. Nat. 119, 1e17. Previdere, C., Roewer, L., Rootsi, S., Rubinsztein, D.C., Saillard, J., Santos, F.R., Sokal, R.R., Oden, N.L., Wilson, C., 1991. Genetic evidence for the spread of agri- Stefanescu, G., Sykes, B.C., Tolun, A., Villems, R., Tyler-Smith, C., Jobling, M.A., culture in Europe by demic diffusion. Nature 351, 143e145. 2000. Y-chromosomal diversity in Europe is clinal and influenced primarily by Tennessen, J.A., Bigham, A.W., O'Connor, T.D., Fu, W., Kenny, E.E., Gravel, S., geography, rather than by language. Am. J. Hum. Genet. 67, 1526. McGee, S., Do, R., Liu, X., Jun, G., Kang, H.M., Jordan, D., Leal, S.M., Gabriel, S., Sabeti, P.C., Reich, D.E., Higgins, J.M., Levine, H.Z.P., Richter, D.J., Schaffner, S.F., Rieder, M.J., Abecasis, G., Altshuler, D., Nickerson, D.A., Boerwinkle, E., Gabriel, S.B., Platko, J.V., Patterson, N.J., McDonald, G.J., 2002. Detecting recent Sunyaev, S., Bustamante, C.D., Bamshad, M.J., Akey, J.M., Broad, G.O., positive selection in the human genome from haplotype structure. Nature 419, Seattle, G.O., NHLBI Exome Sequencing Project, 2012. Evolution and functional 832e837. impact of rare coding variation from deep sequencing of human exomes. Sci- Sams, A., Hawks, J., 2013. Patterns of population differentiation and natural selec- ence 337, 64e69. tion on the celiac disease background risk network. PLoS One 8, e70564. Tishkoff, S.A., Reed, F.A., Ranciaro, A., Voight, B.F., Babbitt, C.C., Silverman, J.S., Sanchez-Quinto, F., Schroeder, H., Ramirez, O., Avila-Arcos, M.C., Pybus, M., Powell, K., Mortensen, H.M., Hirbo, J.B., Osman, M., 2006. Convergent adapta- Olalde, I., Velazquez, A., Marcos, M.E.P., Encinas, J.M.V., Bertranpetit, J., 2012. tion of human lactase persistence in Africa and Europe. Nat. Genet. 39, 31e40. Genomic affinities of two 7,000-year-old Iberian hunter-gatherers. Curr. Biol. Voight, B.F., Kudaravalli, S., Wen, X., Pritchard, J.K., 2006. A map of recent positive 22, R631eR633. selection in the human genome. PLoS Biol. 4, e72. Serre, J.L., Simon-Bouy, B., Mornet, E., Jaume-Roig, B., Balassopoulou, A., Zvelebil, M., Dolukhanov, P., 1991. The transition to farming in Eastern and Northern Schwartz, M., Taillandier, A., Boue, J., Boue, A., 1990. Studies of RFLP closely Europe. J. World Prehist. 5, 233e278.