Confidential Genomic Structure in Europeans
Total Page:16
File Type:pdf, Size:1020Kb
Submitted Manuscript: Confidential Genomic Structure in Europeans dating back at least 36,200 years Andaine Seguin-Orlando1*, Thorfinn S. Korneliussen1*, Martin Sikora1*, Anna-Sapfo Malaspinas1, Andrea Manica2, Ida Moltke3,4, Anders Albrechtsen4, Amy Ko5, Ashot Margaryan1,Vyacheslav Moiseyev6, Ted Goebel7, Michael Westaway8, David Lambert8, Valeri Khartanovich6, Jeffrey D. Wall9, Philip R. Nigst10,11, Robert A. Foley1,12, Marta Mirazon Lahr1,12†, Rasmus Nielsen5†, Ludovic Orlando1, Eske Willerslev1† 1Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5–7, 1350 Copenhagen, Denmark 2Department of Zoology, University of Cambridge, Downing St, Cambridge, CB2 3EJ, UK 3 Department of Human Genetics, University of Chicago, 920 E. 58th Street, CLSC, Chicago, IL 60637 4The Bioinformatics Center, University of Copenhagen, Ole Maaløes Vej 5, 2200 København N, Denmark 5Department of Integrative Biology, University of California, Berkeley, California 94720, USA 6Department of Physical Anthropology, Kunstkamera, Peter the Great Museum of Anthropology and Etnography, Russian Academy of Sciences, 24 Srednii prospect, Vassilievskii Island, St. Peterburg, Russia. 7Center for the Study of the First Americans and Department of Anthropology, Texas A&M University, TAMU-4352, College Station, Texas 77845-4352, USA 8Griffith University, Nathan Campus, 170 Kessels Road, Nathan, Brisbane, Queensland 4111, Australia 9Department of Epidemiology and Biostatistics, University of California San Francisco, 185 Berry Street, Lobby 5, Suite 5700, San Francisco, CA 94107, USA 10Division of Archaeology, University of Cambridge, Cambridge, Downing Street, CB2 3DZ, UK 11Department of Human Evolution, Max-Planck-Institute for Evolutionary Anthropology, Leipzig, Deutscher Platz 6, D-04103, Germany 12 Leverhulme Centre for Human Evolutionary Studies, Department of Archaeology & Anthropology, University of Cambridge, Cambridge, Fitzwilliam Street, CB2 1QH, UK *These authors contributed equally to the work and should be regarded as joint first authors. † Corresponding authors: [email protected], [email protected], [email protected] Abstract The origin of contemporary Europeans remains contentious. We obtain a genome sequence from Kostenki 14 in European Russia dating to 38,700-36,200 years ago, one of the oldest fossils of Anatomically Modern Humans from Europe. We find that K14 shares a close ancestry with the 24,000 year old Mal’ta boy from central Siberia, European Mesolithic hunter-gatherers, some contemporary western Siberians and many Europeans, but not eastern Asians. Additionally, the Kostenki 14 genome shows evidence of shared ancestry with a population basal to all Eurasians that also relates to later European Neolithic farmers. We find that Kostenki 14 contains more Neandertal DNA that is contained in longer tracts than present Europeans. Our findings reveal the timing of divergence of western Eurasians and East Asians to be >36,200 years ago, and that European genomic structure today dates back to the Upper Paleolithic and derives from a meta-population that at times stretched from Europe to central Asia. One Sentence Summary: The genome of Kostenki 14, a ~37 ka-old modern human from European Russia, reveals that European genomic structure dates back to the Upper Paleolithic. Main Text: The ancestors of contemporary Eurasians are believed to have left Africa some 60,000-50,000 years ago (60-50 ka) (1, 2), possibly 30-40 ka later than Australo-Melanesian ancestors (3). Despite controversies about routes out of Africa, the first Upper Paleolithic (UP) industries of Eurasia are found in the Levant from c. 48 ka (4, 5). Expansion into Europe took place through multiple events that by c. 40 ka had generated a spatially and culturally structured Anatomically Modern Human (AMH) population – from Russia (6), to Georgia (7), Bulgaria (8), southern Europe (9, 10) and the UK (11). The few AMH fossils associated with these initial UP industries are morphologically variable (9, 12–17). In western Eurasia, the distinctive Aurignacian toolkit, first observed at Willendorf (Austria) by 43.5 ka (18), becomes predominant across the earlier range by 39 ka. Although analyses of ancient human genomes have advanced our understanding of the European past, revealing contributions from Paleolithic Siberians, European Mesolithic and Near Eastern Neolithic groups to the European gene pool (19–23), the possible contribution of the earliest Eurasians to these later cultures and to contemporary human populations remains unknown. To investigate this, we sequenced the genome of Kostenki 14 (K14, Markina Gora, Figure 1A). The locality of Kostenki-Borshchevo on the Middle Don River, Russia, has one of the most extensive Paleolithic records in eastern Europe. The K14 human skeleton was excavated in 1954 (24) and recently dated to 33,250 ± 500 radiocarbon years BP (25), 38.7-36.2 thousand calendar years BP (ka cal BP), in agreement with the stratigraphic position of the burial that cuts into the Campanian Ignimbrite ash layer dated to c. 39.3 ka cal BP (26). Below the skeleton there is a distinctive early UP industry, with end scrapers, burins, prismatic cores and bone artifacts (Layer IV); the cultural layer above (Layer III) has a regionally local character (27, 28) (SOM S1-S2). We performed 13 DNA extractions from a total of 1.285 grams of the left tibia (dorsal side of the shaft), using two extraction methods based on silica purification (29, 30). We first constructed 7 Illumina libraries and validated the presence of typical signatures of post-mortem DNA damage, using a fraction of DNA extracts (SOM S3). The remaining extracts were built into 63 libraries following enzymatic USER treatment to limit the impact of nucleotide mis-incorporations in downstream analyses (31) (SOM Table S2). Additionally, a limited fraction of two DNA extracts was purified for methylated DNA fragments using Methyl Binding Domain (MBD)-enrichment (32) before USER treatment and library building, for a total of 8 DNA libraries. Following stringent quality criteria for read alignment, we identified a total number of 175.2 million unique reads aligning against the human reference genome hg19, representing an average depth-of- coverage of 2.84X (SOM S4). The eight USER treated DNA libraries that exhibited limited error rate and contamination levels were selected for further analyses. This restricted the dataset to 148.9 million unique reads, representing a final depth-of-coverage of 2.42X. We exploited the fact that K14 was a male and used the heterozygosity levels present in the X chromosome to estimate overall levels of contamination around 2.0% (SOM S5-S6;Table S5). Note that the population genetics analyses results are robust to contamination of that level. In particular we replicated the main analyses with selected libraries with varying contamination levels and observed no qualitative effect on the results (see SOM S9 for details). Mitochondrial analyses confirmed the sequence previously reported for K14 (haplogroup U2, (33)), which supports data authenticity. The Y chromosome belongs to haplogroup C M130, the same as in La Braña – a late Mesolithic hunter-gatherer (MHG) from northern Spain (22) (SOM S7). To identify patterns of shared ancestry and admixture among K14, other ancient genomes and contemporary Eurasians (based on a single-nucleotide polymorphism (SNP) array panel of 2091 individuals from 167 populations), we carried out a series of analyses – model-based clustering and principal component analysis (PCA) - to show the contribution of diverse genetic components within K14; D-statistics to explore the affinity of K14 to pairs of populations (using Mbuti Pygmy as an outgroup); f4 statistics to test whether a given modern population is equidistant to an ancient individual and a particular recent group (here Sardinians), given an outgroup (here Papuans); and f3 statistics to explore both patterns of admixture (“admixture” f3) and shared ancestry (“outgroup” f3). Key results were also replicated using two whole-genome sequencing datasets of modern individuals from worldwide populations (23, 34). Model-based clustering analyses (35) show that K14 has different genetic components of substantial size (Fig. 1B, SOM S10), suggesting the sharing of sets of alleles with different Eurasian groups. The largest fraction of K14’s ancestry derives from a component that is maximized in European MHGs, and also predominant in contemporary northern and eastern Europeans. The genetic affinity of K14 to contemporary Europeans is also observed using “outgroup” f3-statistics (36). Using Mbuti Pygmy as outgroup, we find that among a panel of 167 contemporary populations, Europeans have the greatest affinity (i.e. the largest f3) to K14 (Figure 1C). This conclusion is also formally supported by comparing pairs of populations to K14 using the D statistics of the form D(Mbuti Pygmy, K14; Population 1, Population 2). This statistic is expected to be equal to zero if K14 is symmetrically related to Population 1 and Population 2, whereas its expectation is negative (positive) if K14 is more closely related to Population 1 (Population 2). For pairs of populations involving East Asians (Population 1) and Europeans (Population 2), K14 is always significantly closer related to Europeans (e.g. Z = 12.1, (Han, Lithuanians)), in all datasets analyzed (SOM S9;Table S7). We also confirm that these results are robust to possible contamination from a modern DNA source, by