<<

Article Million-year-old DNA sheds light on the genomic history of

https://doi.org/10.1038/s41586-021-03224-9 Tom van der Valk1,2,3,17 ✉, Patrícia Pečnerová2,4,5,17, David Díez-del-Molino1,2,4,17, Anders Bergström6, Jonas Oppenheimer7, Stefanie Hartmann8, Georgios Xenikoudakis8, Received: 3 July 2020 Jessica A. Thomas8, Marianne Dehasque1,2,4, Ekin Sağlıcan9, Fatma Rabia Fidan9, Ian Barnes10, Accepted: 11 January 2021 Shanlin Liu11, Mehmet Somel9, Peter D. Heintzman12, Pavel Nikolskiy13, Beth Shapiro14,15, Pontus Skoglund6, Michael Hofreiter8, Adrian M. Lister10, Anders Götherström1,16,18 & Published online: 17 February 2021 Love Dalén1,2,4,18 ✉ Check for updates

Temporal genomic data hold great potential for studying evolutionary processes such as speciation. However, sampling across speciation events would, in many cases, require genomic time series that stretch well back into the Early subepoch. Although theoretical models suggest that DNA should survive on this timescale1, the oldest genomic data recovered so far are from a horse specimen dated to 780– 560 thousand years ago2. Here we report the recovery of genome-wide data from three specimens dating to the Early and Middle Pleistocene subepochs, two of which are more than one million years old. We fnd that two distinct mammoth lineages were present in eastern during the Early Pleistocene. One of these lineages gave rise to the and the other represents a previously unrecognized lineage that was ancestral to the frst mammoths to colonize North America. Our analyses reveal that the of North America traces its ancestry to a Middle Pleistocene hybridization between these two lineages, with roughly equal admixture proportions. Finally, we show that the majority of protein-coding changes associated with cold adaptation in woolly mammoths were already present one million years ago. These fndings highlight the potential of deep-time palaeogenomics to expand our understanding of speciation and long-term adaptive evolution.

The recovery of genomic data from specimens that are many thousands later gave rise to the Columbian mammoth (Mammuthus columbi) and of years old has improved our understanding of prehistoric popula- woolly mammoth (Mammuthus primigenius)10. Although the exact tion dynamics, ancient introgression events and the demography of relationships among these taxa are uncertain, the prevailing view is extinct species3–5. However, some evolutionary processes occur over that the Columbian mammoth evolved during an early colonization timescales that have often been considered beyond the temporal limits of North America about 1.5 Ma and that the woolly mammoth first of ancient DNA research. For example, many present-day and appeared in northeastern Siberia about 0.7 Ma8,10. Mammoths simi- bird species originated during the Early and Middle Pleistocene6,7. lar to M. trogontherii (and considered conspecific with it) inhabited Palaeogenomic investigations of their speciation process would thus Eurasia from at least around 1.7 Ma; the last populations went extinct require recovery of ancient DNA from specimens that are at least several in Europe about 0.2 Ma8. hundreds of thousands of years old. To investigate the origin and evolution of woolly and Columbian Mammoths (Mammuthus sp.) appeared in Africa approximately mammoths, we recovered genomic data from three mammoth molars five million years ago (Ma), and subsequently colonized much of the from northeastern Siberia that date to the Early and Middle Pleisto- Northern Hemisphere8,9. During the Pleistocene epoch (2.6 Ma to cene (Fig. 1a, Extended Data Figs. 1, 2). These molars originate from 11.7 thousand years ago (ka)), the mammoth lineage underwent evo- the well-documented and fossiliferous Olyorian Suite of northeastern lutionary changes that produced the southern mammoth (Mammuthus Siberia11, which has been dated using rodent biostratigraphy tied to the meridionalis) and steppe mammoth (Mammuthus trogontherii), which global sequence of palaeomagnetic reversals as well as to correlated

1Centre for Palaeogenetics, Stockholm, Sweden. 2Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden. 3Department of Cell and Molecular Biology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, Uppsala, Sweden. 4Department of Zoology, Stockholm University, Stockholm, Sweden. 5Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen, Denmark. 6The Francis Crick Institute, London, UK. 7Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA. 8Institute for Biochemistry and Biology, University of Potsdam, Potsdam, Germany. 9Department of Biological Sciences, Middle East Technical University, Ankara, Turkey. 10Department of Earth Sciences, Natural History Museum, London, UK. 11College of Plant Protection, China Agricultural University, Beijing, China. 12The Arctic University Museum of Norway, UiT – The Arctic University of Norway, Tromsø, Norway. 13Geological Institute, Russian Academy of Sciences, Moscow, Russia. 14Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, USA. 15Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA, USA. 16Department of Archaeology and Classical Studies, Stockholm University, Stockholm, Sweden. 17These authors contributed equally: Tom van der Valk, Patrícia Pečnerová, David Díez-del-Molino. 18These authors jointly supervised this work: Anders Götherström, Love Dalén. ✉e-mail: [email protected]; [email protected]

Nature | Vol 591 | 11 March 2021 | 265 Article Estimated sample age (Myr) a c d 2.5 1.5 0.5 –0.5 –1.5

Kanchalan Adycha Chukochya Krestovka Wyoming Wrangel Chukochya Oimyakon Adycha M. columbi Density

Kanchalan

0.005 0.006 0.007 0.008 0.009 0.010

Clade 1 Substitutions to L. africana per bp Scotland

e b Chukochya Adycha 100 100 Krestovka 88 100 64 E. maximus

Krestovka M. columb i M. columbi 100 Adycha 100 Chukochya 100 Wyoming Scotland 94 Scotland Density 100 Oimyakon

91 Clade 3 80 Kanchalan 72 Wrangel Chukochya Adycha Clade 2 Krestovka E. maximus 3.0 2.5 2.0 1.5 1.0 0.5 0 Tip calibrated age (Myr) 6 5 4 3 2 1 0 Date (Ma)

Fig. 1 | DNA-based phylogenies and specimen age estimates. a, Geographical highest posterior densities. Circles depict the position of the newly sequenced origin of the mammoth genomes analysed in this study. b, Phylogenetic tree genomes. d, Densities for age estimates of the Adycha and Chukochya samples built in FASTME on the basis of pairwise genetic distances, assuming balanced on the basis of autosomal divergence to African savannah (L. africana). minimum evolution using all nuclear sites as well as 100 resampling replicates Owing to stochasticity among the tested blocks, a subset of genomic regions in (based on 100,000 sites each). c, Bayesian reconstruction of the mitochondrial the Krestovka and Adycha genomes are estimated as younger than the tree, with the molecular clock calibrated using radiocarbon dates of ancient corresponding genomic region in the Wrangel mammoth genome, resulting in samples for which a finite radiocarbon date was available, as well as assuming a negative values. Myr, million years. e, Densities for age estimates of the log-normal prior on the divergence between the African savannah elephant (not Krestovka, Adycha and Chukochya samples on the basis of mitochondrial shown in the tree) and mammoths with a mean of 5.3 Ma. Blue bars reflect 95% genomes, as inferred from the Bayesian mitochondrial reconstruction. faunas with absolute dating from eastern (Extended Data Fig. 2, and mapped them against the African savannah elephant (Loxodonta Supplementary Section 1). One of the specimens (which we refer to as africana) genome (‘LoxAfr4’)15 and an Asian elephant (Elephas maxi- ‘Krestovka’ on the basis of its find locality) is morphologically similar mus) mitochondrial genome16. We found that the DNA recovered from to the steppe mammoth (a species that was originally defined from the the Early and Middle Pleistocene specimens was considerably more Middle Pleistocene of Europe (Supplementary Information section 1)), fragmented and had higher levels of cytosine deamination than DNA and was collected from Lower Olyorian deposits that have been dated to from permafrost-preserved samples dating to the Late Pleistocene 1.2–1.1 Ma. The second specimen (referred to as ‘Adycha’), which is also subepoch (Extended Data Figs. 3, 4, Supplementary Information sec- of M. trogontherii-like morphology (Supplementary Information sec- tion 4). To circumvent this, we used conservative filters and an iterative tion 1), is of a less-certain age within the Olyorian Suite (1.2–0.5 million approach that was designed to minimize spurious mappings of short years old). However, the morphology of the Adycha specimen (Extended reads (Supplementary Information section 5). This approach allowed Data Fig. 1) strongly suggests that it dates to the Early Olyorian, and us to recover complete (over 37× coverage) mitogenomes from all probably to between 1.2 and 1.0 Ma. The third specimen (referred to as three specimens, and 49 million, 884 million and 3,671 million base ‘Chukochya’) has a morphology consistent with being an early form of pairs of nuclear genomic data for the Krestovka, Adycha and Chukochya woolly mammoth (Extended Data Fig. 1) and was discovered in a section specimens, respectively (Supplementary Table 3). in which only Upper Olyorian deposits are exposed, which implies that it dates to 0.8–0.5 Ma (Supplementary Section 1). We extracted DNA from the three molars using methods designed to DNA-based age estimates recover highly degraded DNA fragments12,13, converted the extracts into To estimate specimen ages using mitogenome data, we conducted a libraries14 and sequenced these on Illumina platforms (Supplementary Bayesian molecular clock analysis that was calibrated using samples Information section 2, Supplementary Table 1). We merged the reads with finite radiocarbon dates (tip calibration) and a log-normal prior

266 | Nature | Vol 591 | 11 March 2021 a c D statistics Genetic Geological (M. primigenius,P ,P : M. americanum) a 2 3 c timescale i timescale 0 0.1 0.2 0.3 0.4 0.5 er ngel g pe ra eria th Am b r Mammuthus sp.|L. africana W o Si N Euro s i 11–13% b s Mammuthus sp.|P. antiquus m America mu u i l M. trogon o M. primigeniu c M. primigenius|Krestovka ax . M m ~50% E. ~ M. columbi|Krestovka 0.5 Ma 50% therii 0.5 Ma Wyoming M. primigenius|M. columbi s

M. primigeniu M. primigenius|Adycha Chukochya M. primigenius|Chukochya 1.0 Ma ?

Adycha b 1.0 Ma E. maximus M. primigenius M. columbi 1.5 Ma

Chuk ~60% ~40%

Adycha Eurasia Krestovka ochya

2.0 Ma Elephas s

Krestovka i l a 1.5 Ma n io d ri e m M. Time Africa

Fig. 2 | Inferred genomic history of mammoths. a, D statistics, in which each for one admixture event, suggesting a hybrid origin for the Columbian dot reflects a comparison involving one woolly mammoth genome and the two mammoth. c, Hypothesized evolutionary history of mammoths during the past genomes depicted on the right, iterating through all possible sample 3 million years on the basis of currently available genomic data. Brown dots combinations using the mastodon (Mammut americanum) as an outgroup. represent mammoth specimens for which genomic data have been analysed in No elevated allele-sharing between any of the mammoth genomes and the this study; error bars represent 95% highest posterior density intervals from reference (African savannah elephant) is observed, suggesting no pronounced the mitogenome-based age estimates obtained for the three Early and Middle reference biases in the Early and Middle Pleistocene genomes. A strong affinity Pleistocene specimens. Arrows depict gene flow events identified from the between Columbian mammoths and the Krestovka sample is observed, as well autosomal genomic data. The European steppe mammoth (M. trogontherii) as a relationship between the North American woolly mammoth (Wyoming) survived well into the later stages of the Middle Pleistocene, and we and the Columbian mammoth. The abbreviation P. antiquus denotes the hypothesize that it most probably branched off from a common ancestor straight-tusked elephant ( antiquus), and Mammuthus sp. refers shared with the woolly mammoth about 1 Ma. to all mammoth specimens in this study. b, Best-fitting admixture graph model

that assumed a genomic divergence between the African savannah 0.05 million years, and all estimates support an age greater than one elephant and mammoth lineages at 5.3 Ma15 (root calibration). On the million years. basis of this analysis, the specimens were estimated to date to 1.65 Ma (95% highest posterior density, 2.08–1.25 Ma), 1.34 Ma (1.69–1.06 Ma) and 0.87 Ma (1.07–0.68 Ma) for Krestovka, Adycha and Chukochya, A genetically divergent mammoth lineage respectively (Fig. 1c, e). We also used the autosomal genomic data A phylogeny based on autosomal data shows that the three Early and to investigate the age of the higher-coverage Adycha (0.3×) and Middle Pleistocene samples fall outside the diversity of all Eurasian Chukochya (1.4×) specimens, by estimating the number of derived mammoth genomes dating to the Late Pleistocene (Fig. 1b), includ- changes since their most recent common ancestor with the African ing two woolly mammoth genomes from Europe (Scotland, dating to savannah elephant (Supplementary Information section 6). We used 48 ka) and Siberia (Kanchalan, dating to 24 ka) that were generated an approach based on the accumulation of derived variants over time17, as part of this study. The phylogenetic positions of the Adycha and assuming a constant mutation rate. This analysis suggested that the Chukochya specimens are consistent with their genomes being from Adycha and Chukochya specimens date to 1.28 Ma (95% confidence a population directly ancestral to all Late Pleistocene woolly mam- interval, 1.64–0.92 Ma) and 0.62 Ma (95% confidence interval, moths, whereas the Krestovka mammoth genome diverged before the 1.00–0.24 Ma), respectively (Fig. 1d). Although we caution that this split between the Columbian and woolly mammoth genomes (Fig. 1b). analysis is based on low-coverage data and the confidence intervals Similarly, Bayesian reconstruction of a mitogenome phylogeny that are wide, these estimates are similar to those obtained from the mito- included 168 Late Pleistocene mammoth specimens18,19 places the Early chondrial data. Pleistocene Krestovka and Adycha specimens as basal to all previously The DNA-based age estimates for the Chukochya and Adycha spec- published mammoth mitogenomes, whereas the Middle Pleistocene imens are consistent with the geological age inferences that were Chukochya mitogenome is basal to one of the three clades that have independently derived from biostratigraphy and palaeomagnetism, previously been described for Late Pleistocene woolly mammoths20 whereas the molecular clock dating of the Krestovka specimen sug- (Fig. 1c). gests an older age than that obtained from biostratigraphy. This could Estimates of sequence divergence times on the basis of both mean that the Krestovka specimen had been reworked from an older genome-wide and mitochondrial data indicate a deep split between geological deposit or that the mitochondrial clock rate has been under- the Krestovka specimen and all other mammoths analysed in this estimated. However, the confidence intervals of the genetic and geo- study. We estimate that the Krestovka mitogenome diverged from all logical age estimates of the Krestovka specimen are separated by only other mammoth mitogenomes between 2.66 and 1.78 Ma (95% highest

Nature | Vol 591 | 11 March 2021 | 267 Article posterior density) (Fig. 1c). We obtained a similar divergence time esti- built pairwise-distance phylogenetic trees for the genomic regions mate (between 2.65 and 1.96 Ma based on the 95% confidence interval) identified as being the result of ghost admixture and found them to be from the autosomal data, but caution that this analysis is based closely related to the Krestovka genome (Extended Data Fig. 7b, Sup- on limited genomic data (Supplementary Information section 7). plementary Information section 9). By contrast, when excluding these Moreover, estimates of relative divergence using F(A|B) statistics4 regions, the remaining part of the Columbian mammoth genome falls show that the Krestovka nuclear genome carries fewer derived alleles within the diversity of Late Pleistocene woolly mammoths (Extended than any other mammoth genome at sites at which the high-coverage Data Fig. 7c, Supplementary Information section 9). woolly mammoth genomes are heterozygous, which provides further Finally, our D statistics analysis also identified higher levels of derived support for the notion that the Krestovka mammoth lineage diverged allele-sharing between the Columbian mammoth and a woolly mam- after the split with Asian elephant but before any of the other mam- moth from Wyoming (Fig. 2a). On the basis of f4 ratios, we estimate moth genomes analysed here (Extended Data Fig. 5, Supplementary 10.7–12.7% excess shared ancestry between these genomes (Supple- Information section 8). mentary Section 9), consistent with a previous study15. Because the Overall, these analyses suggest that two evolutionary lineages (that Columbian mammoth carries a large proportion of Krestovka ancestry, is, two isolated populations persisting through time) of mammoths gene flow from the Columbian mammoth into North American woolly inhabited eastern Siberia during the later stages of the Early Pleisto- mammoths would have resulted in a larger proportion of allele-sharing cene. One of these lineages, which is represented by the Krestovka between Krestovka and the Wyoming woolly mammoth. Our finding of specimen, diverged from other mammoths before the first appearance no excess allele-sharing between the Krestovka genome and any of the of mammoths in North America. The second lineage comprises the sequenced woolly mammoths—including the individual from Wyoming Adycha specimen along with all Middle and Late Pleistocene woolly (Supplementary Table 7)—therefore indicates that this second phase of mammoths. gene flow may have been unidirectional, from woolly mammoth into the Columbian mammoth. This implies that the composition of the genome of the Columbian mammoth (as identified in the D statistics, Origin of the Columbian mammoth admixture graph models and ghost-admixture analysis) is the result Several lines of evidence suggest that—compared to all other mam- of two admixture events, in which an initial approximately 50% con- moths—the Columbian mammoth derives a much higher proportion of tribution from each of the Krestovka and woolly mammoth lineages its ancestry from the lineage represented by the Krestovka mammoth. was followed by an additional approximately 12% gene flow from North We performed analyses using D statistics4, which revealed a strong American woolly mammoths (Fig. 2c). signal of excess derived allele-sharing between the Columbian mam- moth and the Krestovka specimen (Fig. 2a, Supplementary Information section 8). This is at odds with the average phylogenetic position of Insights into mammoth adaptive evolution the Krestovka genome being basal to all other mammoth genomes, as The woolly mammoth evolved into a cold-tolerant, open-habitat under a scenario without subsequent admixture the D statistic would specialist through a series of adaptive changes8. The antiquity of our not deviate from zero. We further investigated this pattern using Tree- genomes makes it possible to investigate when these adaptations Mix21. Without modelling migration (admixture) events, none of the evolved. To do this, we identified protein-coding changes for which models fit the data (residuals > 10× s.e.). Instead, we observed a good fit all Late Pleistocene woolly mammoths carried the derived allele and when modelling one migration event (admixture weight = 42%, residu- all African savannah and Asian carried the ancestral allele als < 2× s.e.) (Supplementary Information section 8), which indicates (n = 5,598) (Supplementary Table 8). Among the variants that could be that part of the ancestry of the Columbian mammoth is derived from called in the Early and Middle Pleistocene genomes, we find that 85.2% the Krestovka lineage. (782 out of 918) and 88.7% (2,578 out of 2,906) of the mammoth-specific To further assess the evolutionary context of the Krestovka lineage protein-coding changes were already present in the genomes of Ady- within the population history of mammoths, we used two complemen- cha (M. trogontherii-like) and Chukochya (early woolly mammoth), tary admixture graph model approaches22,23. We exhaustively tested all respectively (Supplementary Information section 10, Supplementary possible phylogenetic combinations relating the three ancient individu- Table 9). Moreover, we did not detect significant differences in the ratio als with one Siberian woolly mammoth, one Columbian mammoth and of shared nonsynonymous to synonymous sites among our sequenced one Asian elephant. We set the latter as outgroup, including only sites Early, Middle and Late Pleistocene genomes (Supplementary Table 9). identified as polymorphic in six Asian elephant genomes to limit the Thus, despite the transitions in climate and mammoth morphology effects of incorrectly called genotypes (Supplementary Information at the onset of the Middle Pleistocene, we do not observe any marked section 8). None of the graph models without admixture events pro- change in the rate of protein-coding mutations during this time period. vided a good fit to the data, thus ruling out a simple tree-like popula- Previous analyses have identified specific genetic changes that are tion history. By contrast, graph models with only one admixture event thought to underlie a suite of woolly mammoth adaptations to the 25 provided a perfect fit, explaining all 45 f4-statistic combinations without Arctic environment . For these variants (n = 91), we assessed whether significant outliers. On the basis of point estimates obtained from the the Adycha and Chukochya genomes shared the same amino acid two admixture graph model approaches, we estimate the Columbian changes as those observed in Late Pleistocene woolly mammoths mammoth to be the result of an admixture event in which 38–43% (Supplementary Table 10). We found that among genes that are pos- of its ancestry was derived from a lineage related to the Krestovka sibly involved in hair growth, circadian rhythm, thermal sensation and genome, and 57–62% from the woolly mammoth lineage (Fig. 2b, white and brown fat deposits, the vast majority of coding changes were Extended Data Fig. 6). present in both the Adycha (87%) and Chukochya (89%) genomes (Sup- We obtained additional support for the complex ancestry of the plementary Table 10). This suggests that Siberian M. trogontherii-like Columbian mammoth by using a hidden Markov model that aimed at mammoths (that is, Adycha) had already developed a woolly fur as well identifying admixed genomic regions from an unknown source (that as several physiological adaptations to a cold, high-latitude environ- is, ghost admixture)24 (Supplementary Information section 9). This ment (Supplementary Information section 11). However, in one of the analysis, which was done without including any of the Early and Middle best-studied genes in the woolly mammoth (TRPV3, which encodes a Pleistocene specimens, suggested that roughly 41% of the Columbian temperature-sensitive transient receptor channel that is potentially mammoth genome originates from a lineage genetically differentiated involved in thermal sensation and hair growth25), we find that only two from the woolly mammoth (Extended Data Fig. 7a). We subsequently out of four amino acid changes identified in Late Pleistocene woolly

268 | Nature | Vol 591 | 11 March 2021 mammoths were present in the early woolly mammoth genome (Chu- kochya). This indicates that nonsynonymous changes in this gene Online content occurred over several hundreds of thousands of years, rather than Any methods, additional references, Nature Research reporting sum- during a single brief burst of adaptive evolution. maries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author con- tributions and competing interests; and statements of data and code Discussion availability are available at https://doi.org/10.1038/s41586-021-03224-9. Our genomic analyses suggest that the Columbian mammoth is a product of admixture between woolly mammoths and a previously 1. Allentoft, M. E. et al. The half-life of DNA in bone: measuring decay kinetics in 158 dated unrecognized ancient mammoth lineage, represented by the Krestovka fossils. Proc. R. Soc. Lond. B 279, 4724–4733 (2012). specimen. Given the finding that each of these lineages initially con- 2. Orlando, L. et al. Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. Nature 499, 74–78 (2013). tributed roughly half of their genome to this ancient admixture, we 3. Skoglund, P. et al. Origins and genetic legacy of Neolithic farmers and hunter-gatherers propose that the origin of the Columbian mammoth constitutes a in Europe. Science 336, 466–469 (2012). hybrid speciation event26. This hybridization event appears not to have 4. Green, R. E. et al. A draft sequence of the Neandertal genome. Science 328, 710–722 (2010). imparted any shift in the average molar morphology of North American 5. Palkopoulou, E. et al. Complete genomes reveal signatures of demographic and genetic populations10, but can explain the mitochondrial–nuclear discordance declines in the woolly mammoth. Curr. Biol. 25, 1395–1400 (2015). in the Columbian mammoth18, in which all known Columbian mam- 6. Weir, J. T. & Schluter, D. Ice sheets promote speciation in boreal birds. Proc. R. Soc. Lond. B 271, 1881–1887 (2004). moth mitogenomes are nested within the mitogenome diversity of the 7. Lister, A. M. The impact of Quaternary Ice Ages on mammalian evolution. Phil. Trans. R. woolly mammoth (Fig. 1c). On the basis of the mitogenome phylogeny, Soc. Lond. B 359, 221–241 (2004). we estimate that the most recent common female ancestor of all Late 8. Lister, A. M., Sher, A. V., van Essen, H. & Wei, G. The pattern and process of mammoth evolution in Eurasia. Quat. Int. 126–128, 49–64 (2005). Pleistocene Columbian mammoths lived approximately 420 ka (95% 9. Werdelin, L. & Sanders, W. J. (eds) Cenozoic of Africa (Univ. California Press, highest posterior density, 511–338 ka), providing a likely minimum 2010). date for when this hybridization event occurred (Fig. 1c). Because 10. Lister, A. M. & Sher, A. V. Evolution and dispersal of mammoths across the Northern Hemisphere. Science 350, 805–809 (2015). mammoths had already appeared in North America by 1.5 Ma, these 11. Repenning, C. A. Allophaiomys and the Age of the Olyor Suite, Krestovka Sections, Yakutia findings imply that before the hybridization event North American (US Government Printing Office, 1992). mammoths belonged to the Krestovka lineage. Given the morphology 12. Dabney, J. et al. Complete mitochondrial genome sequence of a Middle Pleistocene cave bear reconstructed from ultrashort DNA fragments. Proc. Natl Acad. Sci. USA 110, of the Krestovka specimen, this corroborates a previously proposed 15758–15763 (2013). model10 that the earliest North American mammoths were derived 13. Briggs, A. W. et al. Removal of deaminated cytosines and detection of in vivo methylation from an M. trogontherii-like Eurasian ancestor, rather than originating in ancient DNA. Nucleic Acids Res. 38, e87 (2010). 14. Meyer, M. & Kircher, M. Illumina sequencing library preparation for highly multiplexed from an expansion of the southern mammoth (M. meridionalis) into target capture and sequencing. Cold Spring Harb. Protoc. 2010, db.prot5448 North America27. (2010). Our findings demonstrate that genomic data can be recovered from 15. Palkopoulou, E. et al. A comprehensive genomic history of extinct and living elephants. Proc. Natl Acad. Sci. USA 115, E2566–E2574 (2018). Early Pleistocene specimens, which opens up the possibility of studying 16. Rohland, N. et al. Proboscidean mitogenomics: chronology and mode of elephant adaptive evolution across speciation events. The mammoth genomes evolution using mastodon as outgroup. PLoS Biol. 5, e207 (2007). 17. Meyer, M. et al. A high-coverage genome sequence from an archaic Denisovan individual. presented here offer a glimpse of this potential. Even though the transi- Science 338, 222–226 (2012). tion from an M. trogontherii-like (Adycha) to woolly (Chukochya) mam- 18. Chang, D. et al. The evolutionary and phylogeographic history of woolly mammoths: a moth represents a marked change in molar morphology (Extended Data comprehensive mitogenomic analysis. Sci. Rep. 7, 44585 (2017). 19. Pečnerová, P. et al. Mitogenome evolution in the last surviving woolly mammoth Fig. 1), we do not observe an increased rate of genome-wide selection population reveals neutral and functional consequences of small population size. Evol. during this time period. Moreover, many key adaptations identified in Lett. 1, 292–303 (2017). Late Pleistocene mammoth genomes were already present in the Early 20. Barnes, I. et al. Genetic structure and extinction of the woolly mammoth, Mammuthus primigenius. Curr. Biol. 17, 1072–1075 (2007). Pleistocene Adycha genome. We thus find no evidence for an increased 21. Pickrell, J. K. & Pritchard, J. K. Inference of population splits and mixtures from rate of adaptive evolution associated with the origin of the woolly mam- genome-wide allele frequency data. PLoS Genet. 8, e1002967 (2012). moth. This is consistent with previous work that suggested that the 22. Patterson, N. et al. Ancient admixture in human history. Genetics 192, 1065–1093 (2012). major shift in habitat and morphology of mammoths happened earlier, 23. Leppälä, K., Nielsen, S. V. & Mailund, T. admixturegraph: an R package for admixture 8,10 between M. meridionalis-like and M. trogontherii-like mammoths . graph manipulation and fitting. Bioinformatics 33, 1738–1740 (2017). The retrieval of DNA that is more than one million years old confirms 24. Skov, L. et al. Detecting archaic introgression using an unadmixed outgroup. PLoS Genet. 1 14, e1007641 (2018). previous theoretical predictions that the ancient genetic record can 25. Lynch, V. J. et al. Elephantid genomes reveal the molecular bases of woolly mammoth be extended beyond what has been previously shown. We anticipate adaptations to the Arctic. Cell Rep. 12, 217–228 (2015). that the additional recovery and analysis of Early and Middle Pleisto- 26. Mallet, J. Hybrid speciation. Nature 446, 279–283 (2007). 27. Lucas, S. G., Morgan, G. S., Love, D. W. & Connell, S. D. The first North American cene genomes will further improve our understanding of the complex mammoths: taxonomy and chronology of early Irvingtonian (Early Pleistocene) nature of evolutionary change and speciation. Our results highlight Mammuthus from New Mexico. Quat. Int. 443, 2–13 (2017). the value of perennially frozen environments for extending the tem- poral limits of DNA recovery, and hint at a future deep-time chapter Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. of ancient DNA research in which specimens from high latitudes will have an important role. © The Author(s), under exclusive licence to Springer Nature Limited 2021

Nature | Vol 591 | 11 March 2021 | 269 Article Methods uniquely to nonrepetitive regions of the LoxAfr4 reference and had a mapping quality ≥30 were retained; reads that mapped equally well to No statistical methods were used to predetermine sample size. the human genome reference (hg19) in our composite reference were The experiments were not randomized, and investigators were not removed to reduce possible biases caused by contaminant human blinded to allocation during experiments and outcome assessment. reads32. Second, we used a method based on the rate of mismatches per base pair to the reference to assess the rate of spurious mappings for all Morphometry of mammoth molars reads between 20 and 35 bp and at 5-bp intervals between 35 and 50 bp Mammoth molars were measured according to a previously described (Supplementary Information section 4). This enabled us to identify a method10 (Supplementary Information section 1). Samples considered sample-specific minimum read length cut-off, above which we con- are as follows: M. meridionalis, about 2.0 Ma, Upper Valdarno, Italy (type sider reads to be correctly mapped and endogenous (Supplementary locality) (n = 34); M. trogontherii, about 0.6 Ma, Süssenborn, Germany Information section 4, Supplementary Table 3). On the basis of this, we (type locality) (n = 48); and M. primigenius, Late Pleistocene of north- applied the longest sample-specific cut-off (≥35 bp, for the Krestovka eastern Siberia (Russia) and Alaska (USA) (n = 28). Early (n = 8) and specimen) for all samples. We used mapDamage v.2.0.634 to obtain Late (n = 15) Olyorian samples are from localities in the Yana–Kolyma read-length distributions for all ancient samples. Finally, an assessment lowland (Lower Olyorian Suite is about 1.2–0.8 Ma; Upper Olyorian Suite of cytosine deamination profiles at CpG sites, which are unaffected is 0.8–0.5 Ma) (Extended Data Fig. 2). Early to early Middle Pleistocene by UDG treatment13, was done using the platypus option in PMDtools samples (about 1.5–0.5 Ma) from North America are from Old Crow (https://github.com/pontussk/PMDtools)35. A full set of ancient DNA (Yukon, Canada), Leisey Shell Pit 1A and Punta Gorda (both in Florida, quality statistics are available in Supplementary Tables 1–3. USA), and the Ocotillo Formation (California, USA) (combined, n = 16). Original data have previously been published10, along with further Allele sampling details on sites and collections. To minimize coverage-related biases, all subsequent analyses were based on pseudo-haploidized sequences that were generated by DNA extraction and sequencing randomly selecting a single high-quality base call at each autosomal Samples from Early and Middle Pleistocene mammoth molars (Krestovka, genomic site using ANGSD v.0.92136. For base-calling, we considered Adycha and Chukochya specimens) as well as Late Pleistocene samples only reads ≥ 35 bp, a mapping and base quality ≥ 30 and reads without (Scotland and Kanchalan specimens) were processed in dedicated multiple best hits (-uniqueOnly 1). Finally, we masked all sites within ancient DNA laboratories following standard ancient DNA practices repetitive regions as identified with RepeatMasker v.4.0.737, CpG sites, (Supplementary Information section 2). Following DNA extraction12, sites with more than two alleles among all individuals and sites with we constructed double- or single-stranded Illumina libraries14,28, which coverage above the 95th percentile of the genome-wide average, to were treated to remove uracil caused by post-mortem cytosine deami- reduce false calls from duplicated genomic regions. nation13. We subsequently sequenced these libraries using Illumina platforms, generating from 200 to 2,350 million paired-end reads Reconstruction of mitogenomes, tip-dating and mitochondrial (2 × 50 or 2 × 150 bp) per specimen (Supplementary Table 1). DNA phylogeny Mitochondrial genomes for the five newly sequenced samples were Sequence data processing and mapping assembled using MIA38 with the Asian elephant (NC_005129)16 mitog- We combined our sequence data with previously published genomic enome as reference for Adycha, Krestovka and Chukochya specimens, data from elephantids15 (Supplementary Table 2). For the five samples and the mammoth mitogenome (NC_007596) as reference for the Late sequenced in this study, we trimmed adapters and merged paired-end Pleistocene woolly mammoth samples from Scotland and Kanchalan, reads using SeqPrep v.1.129, initially retaining reads either ≥25 bp restricting the input reads to those ≥ 35 bp for each (Supplementary (Krestovka, Adycha and Chukochya specimens) or ≥30 bp (Scotland Section 5). This yielded mitochondrial assemblies with coverage and Kanchalan specimens), and with a minor modification in the source of 37.8×, 47.5× and 77.1× for the Adycha, Krestovka and Chukochya code that enabled us to choose the best base-quality score in the merged specimens, and 99.6× and 179.5× for the Scotland and Kanchalan sam- region instead of aggregating the scores5 (Supplementary Informa- ples, respectively. These assemblies were then aligned using Muscle tion section 3). For genomic data from the straight-tusked elephant v.3.8.3139 together with previously published elephantid mitoge- and the Scotland and Kanchalan mammoths (which had been treated nomes18,19,40. Following alignment partitioning, the HKY model with a with Afu uracil-DNA glycosylase (UDG), leaving post-mortem DNA gamma-distributed rate heterogeneity41 and a proportion of invariant damage at the ends of the molecules (Supplementary Tables 2, 3)), we sites or just a proportion of invariant sites, was identified as best-fitting removed the first and last two base pairs from all reads before mapping. for each alignment partition using jModelTest v.2.1.1042 (Supplemen- The merged reads were mapped to a composite reference, consisting tary Information section 5). To estimate the age of the three oldest Mam- of the African savannah elephant nuclear genome (LoxAfr4), woolly muthus samples (Adycha, Krestovka and Chukochya), we performed a mammoth mitogenome (DQ188829) and the human genome (hg19) Bayesian reconstruction of the phylogenetic tree using BEAST v.1.10.443. using BWA aln v.0.7.8 with deactivated seeding (-l 16,500), allowing for We calibrated the molecular clock using tip ages for all ancient samples more substitutions (-n 0.01) and up to two gaps (-o 2)30,31. The human with a finite radiocarbon date, as well as a log-normal prior of 5.3 Ma genome was included as a decoy to filter out spurious mappings in on the genetic divergence of Loxodonta and Elephas–Mammuthus as genomic conserved regions32. Next, we removed PCR duplicates from obtained from previous genomic studies15 (Supplementary Table 4). the alignments using a custom Python script5. After obtaining initial In addition, we tested for an older divergence (7.6 Ma) between Loxo- quality metrics for the genomes, we removed reads <35 base pairs from donta and Mammuthus that is more consistent with the fossil record16 the BAM files using samtools v.1.1033 and awk for all remaining analysis (Supplementary Information section 5). For both priors, we used a (Supplementary Section 4). standard deviation of 500,000 years. We assumed a strict molecular clock and the flexible skygrid coalescent model44 to account for the Ancient DNA authenticity and quality assessment complex cross-generic demographic history of the included taxa. The All ancient genomes were treated to reduce post-mortem DNA damage. ages of all samples beyond the limit of radiocarbon dating were esti- For the most ancient samples (Krestovka, Adycha and Chukochya), we mated by sampling from log-normal distributions with priors based on took several steps to assess the authenticity and quality of the data stratigraphic context and previous genetic studies, using two Markov (Supplementary Information section 4). First, only reads that mapped chain Monte Carlo (MCMC) chains of 100 million generations, sampling every 10,000 and discarding the first 10% as burn-in (Supplementary (with up to 2 admixture events) relating these 3 individuals, 1 woolly Table 5, Supplementary Information section 5). mammoth (Wrangel), 1 Columbian mammoth and 1 Asian elephant, setting the latter as outgroup (Supplementary Information section 8). Genetic dating on the basis of autosomal data We repeated the admixturegraph analysis using the above-described 46 Age estimates for the Adycha and Chukochya specimens (the Krestovka f4 statistic with qpBrute , which in addition enabled us to estimate specimen was excluded as too few autosomal bases were available shared genetic drift and branch lengths using f2 and f3 statistics. At for this analysis) were estimated on the basis of autosomal data fol- each step, insertion of a new node was tested at all branches of the lowing a previously described method17, using the American mas- graph, except the outgroup branch. In cases in which a node could not todon (Mammut americanum; an outgroup to all elephantids) and be inserted without producing f4 outliers (that is, |Z| ≥ 3), all possible the African savannah and Asian elephant genomes as outgroups. We admixture combinations were also attempted. The resulting list of all inferred the ancestral state for a given base in the African savannah fitted graphs was then passed to the MCMC algorithm implemented elephant reference genome by requiring that the alignments of the in the admixturegraph R package, to compute the marginal likelihood mastodon, two African savannah elephants and five Asian elephants of the models and their Bayes factors. Finally, we estimated genetic are present and identical at that nucleotide. We used the high-coverage relationships and admixture among the Mammuthus samples using and radiocarbon-dated Wrangel Island woolly mammoth genome as TreeMix v.1.1221. We first estimated the allele frequencies among the a calibration point5. Each difference to the ancestral state was then randomly sampled alleles and subsequently ran the TreeMix model counted for the Wrangel genome and the focal Mammuthus genome accounting for linkage disequilibrium by grouping sites in blocks of for all sites at which both genomes had a called base. We calculated 1,000 single-nucleotide polymorphisms (-k 1,000) setting the E. maxi- the relative age of each individual as (nW – nM)/nW, on the basis of the mus samples as root. Standard errors (-SE) and bootstrap replicates number of derived changes in the Wrangel genome (nW) and the other (-bootstrap) were used to evaluate the confidence in the inferred tree

Mammuthus genome (nM), using an assumed divergence time of 5.3 mil- topology. After constructing a maximum-likelihood tree, migration lion years15 to the common ancestor of African savannah elephant and events were added (−m) and iterated 10 times for each value of m (1–10) woolly mammoth. Age variance estimates were calculated in windows to check for convergence in the likelihood of the model as well as the of 5 Mb and we computed bootstrap confidence intervals as 1.96× s.e. explained variance following each addition of a migration event. The around the date estimates (Supplementary Information section 6). inferred maximum-likelihood trees were visualized with the in-built TreeMix R script plotting functions. Nuclear genetic relationships and phylogeny We reconstructed phylogenetic trees on the basis of the whole-genome Introgression in the Columbian mammoth identical-by-state matrix for all individuals using the doIBS function We further tested for admixture in the Columbian and Scotland mam- in ANGSD. We calculated pairwise genetic distances between indi- moths using a hidden Markov model24. This method identifies genomic viduals using the full dataset, as well as 100 resampling replicates regions within a given individual that possibly came from an admixture based on 100,000 sites each. Second, we obtained the phylogenetic event with a distant lineage not present in the dataset, on the basis of tree using a balanced minimum evolution method as implemented in on the distribution of private sites. In brief, we estimated the number FASTME45 (Fig. 1b, Supplementary Information section 7). Next, we of callable sites, the single-nucleotide polymorphism density (as a inferred relative population split times using an approach that exam- proxy for per-window mutation rate) and the number of private vari- ines single-nucleotide polymorphic positions that are heterozygous in ants with respect to all other elephant genomes except Krestovka in an individual from one population and measures the fraction of these 1-kb windows. We applied settings without gene flow, or with one gene sites at which a randomly sampled allele from an individual of a second flow event with starting probabilities and decoding described in Sup- population carries the derived variant, polarized by an outgroup (F(A|B) plementary Information section 9. We tested for ghost admixture in the statistics)4. We ascertained heterozygous sites in three high-coverage Columbian mammoth using sites private to the Columbian mammoth genomes—E. maximus and M. primigenius Oimyakon and Wrangel5— with respect to all other genomes in this study except Krestovka. We using the SAMtools v.1.1033 mpileup command and bcftools. We only subsequently obtained fasta alignments for those autosomal regions included single-nucleotide polymorphisms with a quality ≥ 30, and identified as ‘unadmixed’ and ‘ghost-admixed’ in the Columbian mam- filtered out all single-nucleotide polymorphisms in repetitive regions, moths by calling a random base at each covered position using ANGSD. within 5 bp of insertions and/or deletions, at CpG sites and sites below Minimal evolution phylogenies were then obtained for both alignments 1/3 or above 2× the genome-wide average coverage. For each of the as described in ‘Nuclear genetic relationships and phylogeny’. Mammuthus genomes, we then estimated the proportion of sites for which a randomly drawn allele at the ascertained heterozygous sites Genetic adaptations of the woolly mammoth matches the derived state. To investigate the timing of genetic adaptations in the woolly mammoth lineage, we used last v.117047 to build a chain file to lift over our sampled

D statistics, f4 statistics, AdmixtureGraphs and TreeMix allele dataset mapped to LoxAfr4 to the annotated LoxAfr3 reference 22 We first used Admixtools v.5 to calculate D statistics and f4 statistics for genome. Following construction of a reference index using lastdb (-P0 all possible quadruple combinations of samples iterating through the -uNEAR -R01), we aligned the two references using lastal (-m50 -E0.05 three different groups (P1, P2 and P3) on the basis of randomly sampled -C2). The alignment was converted to MAF format (last-split -m1) and alleles, conditioning on all sites that are polymorphic among the six finally to a chain file with the maf-convert tool (http://last.cbrc.jp/). The Asian elephant genomes22. The mastodon was used as an outgroup Picard Liftover tool (https://broadinstitute.github.io/picard/) was then in all comparisons (Supplementary Tables 6, 7). Direct estimates of used to lift over the identified variants to the LoxAfr3 reference. Using genomic ancestries using f4 ratios were additionally calculated for the African savannah elephant genome annotation (LoxAfr3.gff), we specific pairs in AdmixTools22 (Supplementary Information section 9). identified all amino acid changes in which all Late Pleistocene woolly Second, we used the admixturegraph R package23 to assess the genetic mammoth genomes carry the derived state and all other elephantid relationship among the Mammuthus genomes using admixture graph genomes carry the ancestral allele using VariantEffectPredictor48. For models, fitting graphs to all possible f4 statistics involving a given set all identified amino acid changes, we assessed the state (derived or of genomes. To resolve the relationships of the Adycha, Krestovka ancestral) among the three oldest samples (Krestovka, Adycha and Chu- and Chukochya individuals within the population history of mam- kochya) and the Columbian mammoth (Supplementary Tables 8–10). moths, we exhaustively tested all 135,285 possible admixture graphs In addition, we conducted a Gene Ontology enrichment on all genes Article for which the woolly mammoth genomes (including Chukochya and 43. Suchard, M. A. et al. Bayesian phylogenetic and phylodynamic data integration using 49 50 BEAST 1.10. Virus Evol. 4, vey016 (2018). Adycha) are derived, using GOrilla . Finally, we used PAML v.1.3.1 to 44. Gill, M. S. et al. Improving Bayesian population dynamics inference: a coalescent-based identify genes that have potentially been under positive selection in model for multiple loci. Mol. Biol. Evol. 30, 713–724 (2013). Late Pleistocene woolly mammoths (Supplementary Table 11, Sup- 45. Lefort, V., Desper, R. & Gascuel, O. FastME 2.0: a comprehensive, accurate, and fast distance-based phylogeny inference program. Mol. Biol. Evol. 32, 2798–2800 (2015). plementary Information section 10). 46. Liu, L. et al. Genomic analysis on pygmy hog reveals extensive interbreeding during wild boar expansion. Nat. Commun. 10, 1992 (2019). Reporting summary 47. Frith, M. C., Hamada, M. & Horton, P. Parameters for accurate genome alignment. BMC Bioinformatics 11, 80 (2010). Further information on research design is available in the Nature 48. McLaren, W. et al. The Ensembl variant effect predictor. Genome Biol. 17, 122 (2016). Research Reporting Summary linked to this paper. 49. Eden, E., Navon, R., Steinfeld, I., Lipson, D. & Yakhini, Z. GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics 10, 48 (2009). 50. Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, Data availability 1586–1591 (2007). All sequence data (in .fastq format) for samples sequenced in this study are available through the European Nucleotide Archive under acces- Acknowledgements T.v.d.V., P.P., D.D.-d.-M., M.D. and L.D. acknowledge support from the sion number PRJEB42269. Previously published data used in this study Swedish Research Council (2012-3869 and 2017-04647), FORMAS (2018-01640) and the Tryggers Foundation (CTS 17:109). A.G. is supported by the Knut and Alice Wallenberg are available under accession numbers PRJEB24361 and PRJEB7929. Foundation (1,000 Ancient Genomes project). A.B. and P.S. were supported by the Francis Crick Institute (FC001595), which receives its core funding from Cancer Research UK, the UK Medical Research Council and the Wellcome Trust. P.S. was supported by the European Code availability Research Council (grant no. 852558), the Wellcome Trust (217223/Z/19/Z) and the Vallee Foundation. M.H., J.A.T., I.B., A.M.L. and G.X. were supported by NERC (grant no. NE/J010480/1) The custom code used in this study to evaluate read length cut-offs and the ERC StG grant GeneFlow (no. 310763). B.S. and J.O. were supported by the US National is available from GitHub (https://github.com/stefaniehartmann/ Science Foundation (DEB-1754451). P.N. was supported by RFBR (grant no. 13-05-01128). The authors also acknowledge support from Science for Life Laboratory, the Knut and Alice readLengthCutoff). Wallenberg Foundation, the National Genomics Infrastructure funded by the Swedish Research Council, and Uppsala Multidisciplinary Center for Advanced Computational Science 28. Gansauge, M.-T. & Meyer, M. Single-stranded DNA library preparation for the sequencing for assistance with massively parallel sequencing and access to the UPPMAX computational of ancient or damaged DNA. Nat. Protocols 8, 737–748 (2013). infrastructure. N. Clark at the Hunterian Museum provided access to the Scotland mammoth 29. John, J. S. SeqPrep: tool for stripping adaptors and/or merging paired reads with overlap sample. Finally, we thank our late friend and colleague A. Sher, who defined and described the into single reads. GitHub https://github.com/jstjohn/SeqPrep (2011). Olyorian sequence, collected large quantities of fossil vertebrate material (including all of the 30. Schubert, M. et al. Improving ancient DNA read mapping against modern reference Early and Middle Pleistocene specimens studied here) and consistently promoted genomes. BMC Genomics 13, 178 (2012). multidisciplinary studies on his finds. 31. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013). Author contributions L.D., A.M.L., B.S., M.H. and I.B. conceived the project. L.D., A.G., P.P. and 32. Feuerborn, T. R. et al. Competitive mapping allows for the identification and exclusion of D.D.-d.-M. designed the study together with P.N. and A.M.L. Laboratory work on Early and human DNA contamination in ancient faunal genomic datasets. BMC Genomics 21, 844 Middle Pleistocene samples was done by P.P., L.D., A.G. and M.D., and G.X. and J.A.T. conducted (2020). laboratory work on Late Pleistocene samples. P.P., T.v.d.V. and D.D.-d.-M. processed and 33. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, mapped sequence data. T.v.d.V., S.H. and P.D.H. performed tests on DNA authenticity. T.v.d.V., 2078–2079 (2009). J.O. and S.L. conducted phylogenetic and Treemix analyses. J.O. and T.v.d.V. computed 34. Jónsson, H., Ginolhac, A., Schubert, M., Johnson, P. L. F. & Orlando, L. mapDamage2.0: genomic age estimates. T.v.d.V., A.B. and D.D.-d.-M. performed analyses on D statistics and f fast approximate Bayesian estimates of ancient DNA damage parameters. Bioinformatics 4 statistics and admixture graph models. T.v.d.V. performed analyses on population structure, 29, 1682–1684 (2013). and ghost admixture. T.v.d.V., E.S., F.R.F. and M.S. performed analysis on selection. L.D., P.D.H., 35. Skoglund, P. et al. Separating endogenous ancient DNA from modern day contamination M.H., B.S., A.G., M.S., P.S., P.N. and A.M.L. provided advice on the bioinformatic analyses and/or in a Siberian Neandertal. Proc. Natl Acad. Sci. USA 111, 2229–2234 (2014). helped to interpret the results. P.N. and A.M.L. provided morphological analyses as well as 36. Korneliussen, T. S., Albrechtsen, A. & Nielsen, R. ANGSD: analysis of next generation palaeontological and geological information. The manuscript was written by T.v.d.V., P.P., sequencing data. BMC Bioinformatics 15, 356 (2014). D.D.-d.-M., P.N. and L.D., with contributions from all co-authors. 37. Smit, A. F. A., Hubley, R. & Green, P. RepeatMasker Open-4.0, 2013–2015. http://www. repeatmasker.org (2015). 38. Green, R. E. et al. A complete Neandertal mitochondrial genome sequence determined Competing interests The authors declare no competing interests. by high-throughput sequencing. Cell 134, 416–426 (2008). 39. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high Additional information throughput. Nucleic Acids Res. 32, 1792–1797 (2004). Supplementary information The online version contains supplementary material available at 40. Meyer, M. et al. Palaeogenomes of Eurasian straight-tusked elephants challenge the https://doi.org/10.1038/s41586-021-03224-9. current view of elephant evolution. eLife 6, e25413 (2017). Correspondence and requests for materials should be addressed to T.v.d.V. or L.D. 41. Yang, Z. Maximum likelihood phylogenetic estimation from DNA sequences with variable Peer review information Nature thanks Gloria Cuenca-Bescós, David Lambert, Krishna rates over sites: approximate methods. J. Mol. Evol. 39, 306–314 (1994). Veeramah and the other, anonymous, reviewer(s) for their contribution to the peer review of 42. Darriba, D., Taboada, G. L., Doallo, R. & Posada, D. jModelTest 2: more models, new this work. heuristics and parallel computing. Nat. Methods 9, 772 (2012). Reprints and permissions information is available at http://www.nature.com/reprints. Extended Data Fig. 1 | Mammoth molars and morphometric comparisons. early Middle Pleistocene (about 1.5–0.5 Ma) North American Mammuthus a, b, Upper third molars in lateral and cross-sectional views. c, Partial lower samples (data points not shown). Green and blue squares, Early and Late third molar in lateral and occlusal views. a, Chukochya (accession number PIN Olyorian northeastern Siberian samples, respectively. Red and green circles, 3341-737). b, Krestovka (accession number PIN 3491-3) flipped horizontally. European M. meridionalis and M. trogontherii, respectively. Blue circles, c, Adycha (accession number PIN 3723-511), occlusal view flipped horizontally. M. primigenius from northeastern Siberia and Alaska. Note (i) the similarity of The lamellae are more closely spaced, and the enamel is thinner, in a Krestovka and Adycha to other molars from the Early Olyorian, and to European (M. primigenius-like) than in b, c (M. trogontherii-like). d, Hypsodonty index steppe mammoths (M. trogontherii); (ii) the similarity of early North American versus lamellar length index of upper M3. e, Enamel thickness index versus mammoths to these (to molars of the Early Olyorian, in particular); and (iii) the basal lamellar length index of lower M3. Olyorian specimens that yielded DNA similarity of Chukochya to M. primigenius. For site details, measurement are labelled by site name. Green dashed line, convex hull summarizing Early to definitions and data, see Supplementary Information section 1. Article

Extended Data Fig. 2 | Sample age on the basis of biostratigraphy, mammal ages (small mammals); LMA, land mammal ages (large mammals); palaeomagnetic reversals and genomic data. Chart shows the stratigraphic MN and MQ, European small mammal biozones; EEBU, East European position of the Kutuyakhian fauna, Phenacomys complex, and Early Olyorian biochronological units. Biostratigraphic- and palaeomagnetic-based and Late Olyorian faunas in relation to important European, northwest Asian chronological constraints for the specimens are provided, in comparison and northern North American stratigraphic benchmarks. ELMA, European land with the DNA-based age estimations. Extended Data Fig. 3 | DNA-fragment length distributions for nine samples. Ultrashort reads (<35 bp) are denoted in red; these were shown to be mammoths. Reads are aligned to the LoxAfr4 autosomes. For the three Early enriched for spurious alignments, and therefore excluded from downstream and Middle Pleistocene samples (Krestovka, Adycha and Chukochya), reads of analyses (Supplementary Information section 4). The mean read lengths (μ) 25–200-bp length are shown; 30–200-bp reads are shown for the remaining were calculated using only the retained reads (≥35 bp). Article

Extended Data Fig. 4 | Post-mortem cytosine deamination damage profiles permafrost-preserved woolly mammoth samples (Oimyakon and Wrangel) and at CpG sites. The most ancient samples (Krestovka, Adycha and Chukochya) the Columbian mammoth (M. columbi) specimen. carry a greater frequency of cytosine deamination compared to younger Extended Data Fig. 5 | F(A|B) statistics. The statistics reflect relative left and the right of the graph, at sites for which the genome on the right panel is divergence between the genomes on the left and the right side. Lower values heterozygous. The lower the value, the more drift has occurred between the indicate reduced derived allele-sharing between the sample indicated on the genomes (and thus the older their genetic divergence). Article

Extended Data Fig. 6 | qpGraph model. The most parsimonious graph model f-statistic units multiplied by 1,000. Discontinuous lines show admixture (highest Bayes factor) of the phylogenetic relationships among mammoth events between lineages, and percentages represent admixture proportions. lineages augmented with one admixture event. Branch lengths are given in Extended Data Fig. 7 | Ghost introgression analysis of the Columbian identified as ghost ancestry in the Colombian mammoth (M. columbi) genome. mammoth genome. a, The number of private alleles per 1,000 bp within c, Maximum-likelihood phylogenies for regions identified as unadmixed genomic regions identified as woolly mammoth (M. primigenius) ancestry or ancestry. ghost ancestry. b, Maximum-likelihood phylogenies for those genomic regions      U%FP401'&e4QS   $%&&'()%012034567%&8(9@ A%P'f4Qf0    

A4(65)146'1BC4567%&8(9@f'HghWgigi 

   

D')%&6203E5FF4&C     

G465&'D'('4&H7I2(7'(6%2F)&%P'67'&')&%15H2B2Q26C%R67'I%&S6746I')5BQ2(7TU72(R%&F)&%P21'((6&5H65&'R%&H%0(2(6'0HC4016&40()4&'0HC  

20&')%&6203TV%&R5&67'&20R%&F462%0%0G465&'D'('4&H7)%Q2H2'(W(''%5&X126%&24QY%Q2H2'(40167'X126%&24QY%Q2HC$7'HSQ2(6T     E6462(62H( V%&4QQ(6462(62H4Q404QC('(WH%0R2&F674667'R%QQ%I20326'F(4&')&'('062067'R235&'Q'3'01W64BQ'Q'3'01WF4206'`6W%&a'67%1(('H62%0T

0b4 $%0R2&F'1

j U7''`4H6(4F)Q'(2c'8d9R%&'4H7'`)'&2F'064Q3&%5)bH%01262%0W32P'04(412(H&'6'05FB'&4015026%RF'4(5&'F'06

j e(646'F'06%0I7'67'&F'4(5&'F'06(I'&'64S'0R&%F12(620H6(4F)Q'(%&I7'67'&67'(4F'(4F)Q'I4(F'4(5&'1&')'46'1QC U7'(6462(62H4Q6'(68(95('1eGfI7'67'&67'C4&'%0'g%&6I%g(21'1 j hdipqrsttsdquvwuwqwxsyi€qvq€vwr‚ƒv€qwsivipqpqd„tv q€vwr‚ƒvqts‚vqrst†iv‡quvrxdƒˆyvwqƒdquxvq‰vuxs€wqwvruƒsd

j e1'(H&2)62%0%R4QQH%P4&246'(6'(6'1

j e1'(H&2)62%0%R40C4((5F)62%0(%&H%&&'H62%0(W(5H74(6'(6(%R0%&F4Q26C40141‘5(6F'06R%&F5Q62)Q'H%F)4&2(%0( eR5QQ1'(H&2)62%0%R67'(6462(62H4Q)4&4F'6'&(20HQ51203H'06&4Q6'01'0HC8'T3TF'40(9%&%67'&B4(2H'(62F46'(8'T3T&'3&'((2%0H%'RR2H2'069 j eGfP4&2462%08'T3T(64014&11'P2462%09%&4((%H246'1'(62F46'(%R50H'&64206C8'T3TH%0R21'0H'206'&P4Q(9

V%&05QQ7C)%67'(2(6'(6203W67'6'(6(6462(62H8'T3T’WuW‚9I267H%0R21'0H'206'&P4Q(W'RR'H6(2c'(W1'3&''(%RR&''1%F401“P4Q5'0%6'1 j ”ƒ•vq“q•„iyvwq„wqv‡„ruq•„iyvwq–xvdv•v‚qwyƒu„iv

j V%&—4C'(240404QC(2(W20R%&F462%0%067'H7%2H'%R)&2%&(401a4&S%PH7420a%06'$4&Q%('66203(

j V%&72'&4&H72H4Q401H%F)Q'`1'(230(W21'062R2H462%0%R67'4))&%)&246'Q'P'QR%&6'(6(401R5QQ&')%&6203%R%56H%F'(

j X(62F46'(%R'RR'H6(2c'(8'T3T$%7'0˜(€WY'4&(%0˜(‚9W2012H462037%I67'CI'&'H4QH5Q46'1

hy‚q–vqrsiivruƒsdqsdqwu„uƒwuƒrwq™s‚qƒsisdƒwuwqrsdu„ƒdwq„‚uƒrivwqsdqt„dpqs™quxvq†sƒduwq„s•v E%R6I4&'401H%1' Y%Q2HC20R%&F462%04B%564P42Q4B2Q26C%RH%F)56'&H%1' f464H%QQ'H62%0 G%(%R6I4&'I4(5('1R%&1464H%QQ'H62%0

f464404QC(2( E'kY&')lTl 4ISPmTiTg —ne4Q0PiToTh E4F6%%Q(lTli YC67%0gToTlp DPTqTrTl 41F2`65&'3&4)78D)4HS43'9 766)(@bb32675BTH%Fb'S2&P203bk)B&56'T326 32675BTH%Fb(6'R402'74&6F400b&'41A'0367$56%RR 32675BTH%Fb)%065((SbYaf6%%Q( F4)f4F43'PTgTiTr 

eGsEfPTiTtgl  

!

D')'46a4(S'&PTmTiTo " # "

766)(@bb32675BTH%FbF)2'P4bF4))203g26'&462P'g4(('FBQ'&8aue9 # a5(HQ'PqThTql ‘a%1'QU'(6PgTlTli —XeEUPlTliTm VeEUaXPTgTi aXsevPTliTl Y2H4&1U%%QS26PTgilt

e1F2`6%%Q(Pp

U&''a2`PTlTlg  

32675BTH%FbA45&26(ES%Pbu06&%3&'((2%0g1'6'H62%0   

Q4(6PTlloi 

F4RgH%0P'&6PTlloi   

YeaAPTlTqTl  

s|&2QQ4PTgiit 



V%&F405(H&2)6(562Q2c203H5(6%F4Q3%&267F(%&(%R6I4&'67464&'H'06&4Q6%67'&'('4&H7B560%6C'61'(H&2B'120)5BQ2(7'1Q26'&465&'W(%R6I4&'F5(6B'F41'4P42Q4BQ'6%'126%&(401  &'P2'I'&(Tn'(6&%03QC'0H%5&43'H%1'1')%(262%0204H%FF5026C&')%(26%&C8'T3Ts26x5B9TE''67'G465&'D'('4&H73521'Q20'(R%&(5BF266203H%1'y(%R6I4&'R%&R5&67'&20R%&F462%0T       

f464  

Y%Q2HC20R%&F462%04B%564P42Q4B2Q26C%R1464  eQQF405(H&2)6(F5(620HQ51'414644P42Q4B2Q26C(646'F'06TU72((646'F'06(7%5Q1)&%P21'67'R%QQ%I20320R%&F462%0WI7'&'4))Q2H4BQ'@

geHH'((2%0H%1'(W502k5'21'062R2'&(W%&I'BQ20S(R%&)5BQ2HQC4P42Q4BQ'1464('6(  geQ2(6%RR235&'(674674P'4((%H246'1&4I1464  ge1'(H&2)62%0%R40C&'(6&2H62%0(%014644P42Q4B2Q26C

eQQ('k5'0H'1464820R4(6kR%&F469R%&(4F)Q'(('k5'0H'120672((651C4&'4P42Q4BQ'67&%53767'X5&%)'40G5HQ'%621'e&H72P'501'&4HH'((2%005FB'& YD}X—mggrtTY&'P2%5(QC)5BQ2(7'114645('120672((651C4&'4P42Q4BQ'46YD}X—gmqrl401YD}X—otgtT

V2'Q1g()'H2R2H&')%&6203 YQ'4('('Q'H667'%0'B'Q%I67462(67'B'(6R26R%&C%5&&'('4&H7TuRC%54&'0%6(5&'W&'4167'4))&%)&246'('H62%0(B'R%&'F4S203C%5&('Q'H62%0T A2R'(H2'0H'( —'74P2%5&4Qy(%H24Q(H2'0H'( j XH%Q%32H4QW'P%Q562%04&Cy'0P2&%0F'064Q(H2'0H'( V%&4&'R'&'0H'H%)C%R67'1%H5F'06I2674QQ('H62%0(W(''0465&'TH%Fb1%H5F'06(b0&g&')%&6203g(5FF4&CgRQ46T)1R

XH%Q%32H4QW'P%Q562%04&Cy'0P2&%0F'064Q(H2'0H'((651C1'(230 eQQ(6512'(F5(612(HQ%('%067'(')%206('P'0I7'067'12(HQ%(5&'2(0'3462P'T E651C1'(H&2)62%0 s'0'62H404QC(2(I'&')'&R%&F'1%0fGe14643'0'&46'1R&%F%P'&lF2QQ2%0C'4&%Q1F4FF%67F%Q4&(W4(I'QQ4(R&%F6I%A46' YQ'2(6%H'0'F4FF%67(TY%)5Q462%03'0'62H(6462(62H(401)7CQ%3'02'(W)&2F4&2QCR%H5('1%03'0'62H&'Q462%0(72)(40141F2`65&' B'6I''067'(4F)Q'(I4(4(('(('15(203(203Q'g05HQ'%621')%QCF%&)72(F(T

D'('4&H7(4F)Q' U7&''X4&QCYQ'2(6%H'0'F4FF%67(4F)Q'(R&%FE2B'&24W4016I%A46'YQ'2(6%H'0'F4FF%67(4F)Q'(R&%FE2B'&24401EH%6Q401 &'()'H62P'QCTu0411262%0W)&'P2%5(QC)5BQ2(7'18Y4QS%)%5Q5'64QTgilpWgilhWAC0H7'64QTgilp940H2'06401F%1'&03'0%F2H1464 R&%F'Q')7406(401F4FF%67(I4(20HQ51'120%5&404QC('(T

E4F)Q203(6&46'3C n'(4F)Q'1R2P'X4&QC6%A46'YQ'2(6%H'0'F4FF%67(4F)Q'(Tn')'&R%&F'1(7%6350('k5'0H2036%%B64204(H%F)Q'6'3'0%F'(4( )%((2BQ'R%&67'('(4F)Q'(T

f464H%QQ'H62%0 fGeR&%F40H2'06(4F)Q'(I4('`6&4H6'1W('k5'0H'1W401)&%H'(('1206%EGY3'0%6C)'H4QQ(%&3'0%6C)'Q2S'Q27%%1(

U2F203401()4624Q(H4Q' E)'H2F'0(%&232046'1R&%FE2B'&24401EH%6Q401WR&%FF4FF%67(6746Q2P'1B'6I''0~lTga4401~gm443%T

f464'`HQ5(2%0( U7'('k5'0H'1464%B6420'1I4()&%H'(('14HH%&12036%(64014&140H2'06fGe)2)'Q20'(W'`HQ51203&'41(B'Q%IqpB4(')42&(20 Q'036740150F'&3'1&'41(Tu0411262%0W&'41(I267Q%IF4))203k54Q26C8a€qi974P'B''0'`HQ51'1TU7''`HQ5(2%0%R&'41(6746 74P'6%%Q%Ik54Q26CW401b%&4&'6%%(7%&66%4HH5&46'QCF4)4&'(64014&1)&4H62H'2067'R2'Q1Tn'4Q(%4))Q2'140%P'Q4))&%4H76% 21'062RC7%I(7%&6&'41(6746H%5Q1B'F4))'14HH5&46'QCW4(%56Q20'12067'E5))Q'F'064&Cu0R%&F462%0T

D')&%15H2B2Q26C eQQF'67%1(5('14&'1'(H&2B'1201'642QW(%674667'404QC('(H40B'&')&%15H'1BC%67'&&'('4&H7'&(BC4HH'((20367'&4I1464 1')%(26'120XGe8(''4B%P'9TeQQ466'F)6(6%&')&%15H'67'&'(5Q6(401404QC('(W1%0'BC(')4&46'H%g4567%&(WI'&'(5HH'(R5Q

D401%F2c462%0 G%&401%F2c462%0I4(1%0'TE4F)Q'(I'&'3&%5)'1BC()'H2'(R%&404QC(2(%R41F2`65&'8Rmg(646(40141F2`65&'3&4)7(9 

—Q201203 —Q201203I4(0%6&'Q'P4066%672((651CTe04QC('(I'&')'&R%&F'1'267'&R%&4QQ2012P2154Q((')4&46'QC%&3&%5)'1BC()'H2'(T|67'&  

(4F)Q'()'H2R2HR'465&'(I'&'0%6&'Q'P4066%67'&'(5Q6(T ! " # " f2167'(651C20P%QP'R2'Q1I%&Sz {'( j G% #

D')%&6203R%&()'H2R2HF46'&24Q(W(C(6'F(401F'67%1(

w n'&'k52&'20R%&F462%0R&%F4567%&(4B%56(%F'6C)'(%RF46'&24Q(W'`)'&2F'064Q(C(6'F(401F'67%1(5('120F40C(6512'(Tx'&'W2012H46'I7'67'&'4H7F46'&24QW

(C(6'F%&F'67%1Q2(6'12(&'Q'P4066%C%5&(651CTuRC%54&'0%6(5&'2R4Q2(626'F4))Q2'(6%C%5&&'('4&H7W&'4167'4))&%)&246'('H62%0B'R%&'('Q'H62034&'()%0('T    

a46'&24Q(y'`)'&2F'064Q(C(6'F( a'67%1(    0b4 u0P%QP'12067'(651C 0b4 u0P%QP'12067'(651C   

j e062B%12'( j $7uYg('k  

j X5S4&C%62HH'QQQ20'( j VQ%IHC6%F'6&C

j Y4Q4'%06%Q%3C4014&H74'%Q%3C j aDugB4('10'5&%2F43203   

j e02F4Q(401%67'&%&3402(F(    

j x5F40&'('4&H7)4&62H2)406(   j $Q202H4Q1464   

j f54Q5('&'('4&H7%RH%0H'&0   Y4Q4'%06%Q%3C401e&H74'%Q%3C 

E)'H2F'0)&%P'040H' U7')&%P'040H'%R67'()'H2F'0(2(1'(H&2B'1201'642Q2067'E5))Q'F'064&C20R%&F462%0%R67'F405(H&2)6TU7'()'H2F'0(4&' H5&&'06QC7'Q14667's'%Q%32H4Qu0(62656'4667'D5((240eH41'FC%REH2'0H'(WU7'G%&67gX4(6u06'&12(H2)Q204&CEH2'062R2HD'('4&H7 u0(62656'0T4TGTeTE72Q%8V4&X4(6—&40H7WD5((240eH41'FC%REH2'0H'(9W40167'x506'&240a5('5F2067'ƒT

U7'()'H2F'0(I'&'(4F)Q'1I26767'R%QQ%I203)'&F2((2%0(@

DXEU|eeWef{$xe401$xƒ|$x{e

E4F)Q'uf@YuGgqmtlgqWqogqgpllWYuGqqmlgoqo

u((52034567%&26C@Y4Q'%06%Q%32H4Qu0(62656'0T4TeTeT—%&2((24S8D5((240eH41'FC%REH2'0H'(9T%REH2'0H'(9Wa%(H%IWD5((24

E4F)Q'1I267)'&F2((2%0R&%F@XTGTa4(H7'0S%

f46'@lq67|H6%B'&Wgilr

E$|UAeGf@

E4F)Q'uf@eplgq

u((52034567%&26C@U7'x506'&240F5('5FWsQ4(3%IWƒ026'12031%F

E4F)Q'1I267)'&F2((2%0R&%F@GT$Q4&S

f46'@li67a4&H7Wgiih

eG$xeAeG@

E4F)Q'uf@mgili

u((52034567%&26C@u06'&12(H2)Q204&CEH2'062R2HD'('4&H7u0(62656'0T4TGTeTE72Q%8V4&X4(6—&40H7WD5((240eH41'FC%REH2'0H'(9T

E4F)Q'1I267)'&F2((2%0R&%F@eTGT%6%P

f46'@gh67G%P'FB'&Wgiim

E)'H2F'01')%(262%0 U7'()'H2F'0(4&')4&6%R67'H%QQ'H62%0(%R67's'%Q%32H4Qu0(62656'4667'D5((240eH41'FC%REH2'0H'(WU7'G%&67gX4(6 u06'&12(H2)Q204&CEH2'062R2HD'('4&H7u0(62656'0T4TGTeTE72Q%8V4&X4(6—&40H7WD5((240eH41'FC%REH2'0H'(9W40167'x506'&240 a5('5F2067'ƒW401H40B'4HH'(('1R%QQ%I20367'(4F')&%H'15&'(4(R%&%67'&()'H2F'0(2067'('H%QQ'H62%0(T   

f46203F'67%1( e3''(62F46'(R%&67'A46'YQ'2(6%H'0'(4F)Q'(741)&'P2%5(QCB''0)'&R%&F'15(203&412%H4&B%01462038&')%&6'120Y4QS%)%5Q%5'6 ! " #

4QTgilq9TU7'X4&QC401a211Q'YQ'2(6%H'0'(4F)Q'(I'&'146'15(2033'%Q%32H4QF'67%1(8(''E5))Q'F'064&Cu0R%&F462%09W401 " 3'0'62H146203R%QQ%I203a'C'&'6T4Q8gilg94(I'QQ4(5(20367'(%R6I4&'—XeEUT #

j U2HS672(B%`6%H%0R2&F674667'&4I401H4Q2B&46'1146'(4&'4P42Q4BQ'2067')4)'&%&20E5))Q'F'064&Cu0R%&F462%0T

X672H(%P'&(2376 G%'672H4Q)'&F26(I'&'0''1'1R%&672((651CW(20H'4QQ404QC('(74P'B''01%0'%0'`620H6()'H2'(T G%6'6746R5QQ20R%&F462%0%067'4))&%P4Q%R67'(651C)&%6%H%QF5(64Q(%B')&%P21'12067'F405(H&2)6T

‚