Genomic insights into population history and biological adaptation in Oceania Jeremy Choin, Javier Mendoza-Revilla, Lara Arauna, Sebastian Cuadros-Espinoza, Olivier Cassar, Maximilian Larena, Albert Min-Shan Ko, Christine Harmant, Romain Laurent, Paul Verdu, et al.

To cite this version:

Jeremy Choin, Javier Mendoza-Revilla, Lara Arauna, Sebastian Cuadros-Espinoza, Olivier Cassar, et al.. Genomic insights into population history and biological adaptation in Oceania. Nature, Nature Publishing Group, 2021, 592 (7855), pp.583-589. ￿10.1038/s41586-021-03236-5￿. ￿pasteur-03205291￿

HAL Id: pasteur-03205291 https://hal-pasteur.archives-ouvertes.fr/pasteur-03205291 Submitted on 11 May 2021

HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés.

Distributed under a Creative Commons Attribution| 4.0 International License Article Genomic insights into population history and biological adaptation in Oceania

https://doi.org/10.1038/s41586-021-03236-5 Jeremy Choin1,2,16, Javier Mendoza-Revilla1,16, Lara R. Arauna1,16, Sebastian Cuadros-Espinoza1,3, Olivier Cassar4, Maximilian Larena5, Albert Min-Shan Ko6, Christine Harmant1, Romain Laurent7, Received: 20 May 2020 Paul Verdu7, Guillaume Laval1, Anne Boland8, Robert Olaso8, Jean-François Deleuze8, Accepted: 13 January 2021 Frédérique Valentin9, Ying-Chin Ko10, Mattias Jakobsson5,11, Antoine Gessain4, Laurent Excoffier12,13, Mark Stoneking14, Etienne Patin1,17 ✉ & Lluis Quintana-Murci1,15,17 ✉ Published online: 14 April 2021

Check for updates The Pacifc region is of major importance for addressing questions regarding dispersals, interactions with archaic hominins and natural selection processes1. However, the demographic and adaptive history of Oceanian populations remains largely uncharacterized. Here we report high-coverage genomes of 317 individuals from 20 populations from the Pacifc region. We fnd that the ancestors of Papuan-related (‘Near Oceanian’) groups underwent a strong bottleneck before the settlement of the region, and separated around 20,000–40,000 years ago. We infer that the East Asian ancestors of Pacifc populations may have diverged from Taiwanese Indigenous peoples before the Neolithic expansion, which is thought to have started from Taiwan around 5,000 years ago2–4. Additionally, this dispersal was not followed by an immediate, single admixture event with Near Oceanian populations, but involved recurrent episodes of genetic interactions. Our analyses reveal marked diferences in the proportion and nature of Denisovan heritage among Pacifc groups, suggesting that independent interbreeding with highly structured archaic populations occurred. Furthermore, whereas introgression of Neanderthal genetic information facilitated the adaptation of modern related to multiple phenotypes (for example, metabolism, pigmentation and neuronal development), Denisovan introgression was primarily benefcial for immune-related functions. Finally, we report evidence of selective sweeps and polygenic adaptation associated with pathogen exposure and lipid metabolism in the Pacifc region, increasing our understanding of the mechanisms of biological adaptation to island environments.

Archaeological data indicate that Near Oceania, which includes New Guinea, island environments, and whether archaic introgression facilitated this the Bismarck archipelago and the Solomon Islands, was peopled around process in Oceanian individuals, who present the highest levels of com- 45 thousand years ago (ka)5.The rest of the Pacific—known as Remote Oce- bined Neanderthal and Denisovan ancestry worldwide14–17. We report here ania, and including Micronesia, Santa Cruz, Vanuatu, New Caledonia, Fiji a whole-genome-based survey that addresses a wide range of questions and Polynesia—was not settled until around 35 thousand years later. This relating to the demographic and adaptive history of Pacific populations. dispersal, associated with the spread of Austronesian languages and the Lapita cultural complex, is thought to have started in Taiwan around 5 ka, reaching Remote Oceania by about 0.8–3.2 ka6. Although genetic studies Genomic dataset and population structure of Oceanian populations have revealed admixture with populations of We sequenced the genomes of 317 individuals from 20 populations East Asian origin7–13, attributed to the Austronesian expansion, questions spanning a geographical transect that is thought to underlie the regarding the peopling history of Oceania remain. It is also unknown how peopling history of Near and Remote Oceania (Fig. 1a and Supple- the settlement of the Pacific was accompanied by genetic adaptation to mentary Note 1). These high-coverage genomes (around 36×) were

1Human Evolutionary Unit, Institut Pasteur, UMR 2000, CNRS, Paris, France. 2Université Paris Diderot, Sorbonne Paris Cité, Paris, France. 3Sorbonne Université, Collège doctoral, Paris, France. 4Oncogenic Virus and Pathophysiology, Institut Pasteur, UMR 3569, CNRS, Paris, France. 5Human Evolution, Department of Organismal , Uppsala University, Uppsala, Sweden. 6Key Laboratory of Vertebrate Evolution and Human Origins, Institute of Vertebrate and Paleoanthropology, Chinese Academy of Sciences, Beijing, China. 7Muséum National d’Histoire Naturelle, UMR7206, CNRS, Université de Paris, Paris, France. 8Centre National de Recherche en Génomique Humaine (CNRGH), Institut de Biologie François Jacob, CEA, Université Paris-Saclay, Evry, France. 9Maison de l’Archéologie et de l’Ethnologie, UMR 7041, CNRS, Nanterre, France. 10Environment--Disease Research Center, China Medical University and Hospital, Taichung, Taiwan. 11Science for Life Laboratory, Uppsala University, Uppsala, Sweden. 12Institute of and Evolution, University of Bern, Bern, Switzerland. 13Swiss Institute of , Lausanne, Switzerland. 14Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany. 15Collège de France, Paris, France. 16These authors contributed equally: Jeremy Choin, Javier Mendoza-Revilla, Lara R. Arauna. 17These authors jointly supervised this work: Etienne Patin, Lluis Quintana-Murci. ✉e-mail: [email protected]; [email protected]

Nature | Vol 592 | 22 April 2021 | 583 Article

a SantaCruz (14) b All genomes dbSNP ) 7 New genomes Tikopia (10) New genomes New variants Ureparapara (20) 4 Atayal (10) Ambae (15) 20° N Maewo (8) Paiwan (9) Agta (20) Santo (20) Pentecost 3 (20) of SNPs (×10 549.8 1.24 0. Malakula (20) Ambrym 2 10° N Cebuano (20) (20) Emae (15) 1 Efate (21) Tanna (20) Number 0° 0

Latitude Malaita (18) d 0.15 10° S Vella Lavella Polynesians (18) 0.10 Near and western Rennell (4) Remote Oceanians 20° S Bellona (15) 0.05 0 East/Southeast

110° W 130° W 150° W 170° W PC2 (1.8%) Asians c Longitude –0.05 Australians –0.10 –0.10 –0.05 00.05 20° N PC1 (6.3%) e

) 8.0

10° N –4

7.5 0° Latitude 7.0 10° S

Heterozygosity (×10 6.5 20° S

l a z 110° W 130° W 150° W 170° W n u o Agta anai uano k alaita Cr anna Ataya b Bundi ikopia T Longitude Paiwan ndiawa Lavella M T alakula u Na Ce a M K ll Taiwanese Indigenous peoples Australians Bismarck Islanders SantaCruz Islanders Santa Ve Philippine populations Papua New Guinea Bougainville Islanders Ni-Vanuatu Renell Bell Malaysians Highlanders Solomon Islanders Polynesians

Fig. 1 | Whole-genome variation in Pacific Islanders. a, Location of studied all K values, see Extended Data Fig. 1). ADMIXTURE results for Australian populations. The indented map is a magnification of western Remote Oceania. populations are discussed in Supplementary Note 3. d, PCA of Pacific Islanders Circles indicate newly generated genomes. Sample sizes are indicated in and East Asian individuals. The proportion of variance explained is indicated in parentheses. Squares, triangles and diamonds indicate genomes from Mallick parentheses. e, Population levels of heterozygosity (for all populations, see et al.19, Vernot et al.16 and Malaspinas et al.18, respectively. b, The number of Supplementary Fig. 9). Population samples were randomly down-sampled to SNPs (left), expressed in tens of millions, and comparison with dbSNP (right). obtain equal sizes (n = 5). The line, box, whiskers and points indicate the New variants are SNPs that are absent from available datasets16,18,19 and dbSNP. median, interquartile range, 1.5× the interquartile range and outliers, c, ADMIXTURE ancestry proportions at K = 6 (lowest cross-validation error; for respectively. a, c, Maps were generated using the maps R package51.

analysed with the genomes of selected populations—including Papua probably reflect low effective population sizes. Notably, F-statistics New Guinean Highlanders and Bismarck Islanders16,18,19—and archaic show a higher genetic affinity of ni-Vanuatu from Emae to Polynesian hominins20–22 (Supplementary Note 2 and Supplementary Table 1). The individuals, relative to other ni-Vanuatu, which suggests flow final dataset involves 462 unrelated individuals, including 355 individu- from Polynesia6,23. als from the Pacific region, and 35,870,981 single- poly- morphisms (SNPs) (Fig. 1b). Using ADMIXTURE, principal component analysis (PCA) and a measure of genetic distance (FST), we found that The settlement of Near and Remote Oceania population variation is explained by four components, associated with To explore the peopling history of Oceania, we investigated a set of (1) East and Southeast Asian individuals; (2) Papua New Guinean High- demographic models—driven by several evolutionary hypotheses—with landers; (3) Bismarck Islanders, Solomon Islanders and ni-Vanuatu; a composite likelihood method24 (Supplementary Note 4). We first and (4) Polynesian outliers (here ‘Polynesian individuals’) (Fig. 1c, d, determined the relationship between Papua New Guinean Highland- Extended Data Fig. 1 and Supplementary Note 3). The largest differ- ers and other modern and archaic hominins, and replicated previous ences are between East and Southeast Asian individuals and Papua findings18 (Extended Data Fig. 2a and Supplementary Table 2). We New Guinean Highlanders, the remaining populations show various next investigated the relationship between Near Oceanian groups, proportions of the two components, supporting the Austronesian assuming a three-epoch demography with gene flow. Observed site expansion model8,10,11. Strong similarities are observed between Bis- frequency spectra were best explained by a strong bottleneck before marck Islanders and ni-Vanuatu, consistent with an expansion from the settlement of Near Oceania (effective population size (Ne) = 214; 95% the Bismarck archipelago into Remote Oceania at the end of the Lapita confidence interval, 186–276). The separation of Papua New Guinean period8,10. Levels of heterozygosity differ markedly among Oceanian Highlanders from Bismarck and Solomon Islanders dated back to 39 ka populations (Kruskal–Wallis test, P = 1.4 × 10−12) (Fig. 1e), and correlate (95% confidence interval, 34–45 ka), and that of Bismarck Islanders from with individual admixture proportions (ρ = 0.89, P < 2.2 × 10−16). The low- Solomon Islanders to 20 ka (95% confidence interval, 16–30 ka) (Fig. 2a, est heterozygosity and highest were observed Supplementary Tables 3, 4), shortly after the human settlement of the in Papua New Guinean Highlanders and Polynesian individuals, which region around 30–45 ka5,6.

584 | Nature | Vol 592 | 22 April 2021 a Time (ka) Admixture rate (%) c Time (ka) Bismarck Archipelago 58 (53–59) 4.0 57 (52–59) 3.5

3.0 2.5 39 (34–45) 2.0

25 (20–36) 1.0 20 (16–30)

0 43 (27–58) 15.0510.00.0 4.1 (3.2–5.5) –4 35 (31–39) Posterior probability density (×10 ) SAR HAN TWN PNG NOC SLI BKA Present GST b 58 (53–59) Time (ka) Solomon Islands 57 (52–59) 4.0

3.3.5

3.3.0

252.5

18 (14–22) 2.0 14 (11–18)

8.2 (4.8–12) 1.0 0.3 (0.0–26) 0.0 (0.0–11) 4.3 (0.0–11) 3.9 (2.7–9.1) 1.8 (1.3–2.2) 0 15.0510.00.0 0.0 (0.0–6.4) 0.6 (0.2–1.5) Posterior probability density (×10–4) SAR NEA GST HANTWN PHP POL PNG NOC GST Present Present

Fig. 2 | Demographic models of the human settlement of the Pacific. two-directional arrows indicate asymmetric and symmetric gene flow, a, Maximum-likelihood model for Near Oceanian populations. Point estimates respectively; grey and black arrows indicate continuous and single-pulse gene of parameters and 95% confidence intervals are reported in Supplementary flow, respectively. The 95% confidence intervals are indicated in parentheses. Table 4. The grey area indicates the archaeological period for the settlement of We assumed a mutation rate of 1.25 × 10−8 mutations per generation per site and Near Oceania. b, Maximum-likelihood model for Formosan-speaking (TWN) a generation time of 29 years. We limited the number of parameter estimations and Malayo-Polynesian-speaking (PHP and POL) populations. Point estimates by making simplifying assumptions concerning the recent demography of of parameters and 95% confidence intervals are reported in Supplementary East-Asian-related and Near Oceanian populations in a and b, respectively Table 7 (‘3-pulse model’). a, b, BKA, Bismarck Islanders; HAN, Han Chinese (Supplementary Note 4). Sample sizes are reported in Supplementary Note 4. individuals; NEA GST, a northeast Asian unsampled population; NOC GST, a c, Posterior (coloured lines) and prior (grey areas) distributions for the times of Near Oceanian meta-population; PHP, Philippine individuals; PNG, Papua New admixture between Near Oceanian and East-Asian-related populations, under Guinean Highlanders; POL, Polynesian individuals from the Solomon Islands; the double-pulse most-probable model, obtained by ABC (Supplementary SAR, Sardinian individuals; SLI, Solomon Islanders; TWN, Taiwanese Notes 5, 6). Point estimates and 95% credible intervals are indicated by Indigenous peoples. Rectangle width indicates the estimated effective horizontal lines and rectangles, respectively. The grey rectangle indicates the population size. Black rectangles indicate bottlenecks. One- and archaeological period of the Lapita cultural complex in Near Oceania27.

We then incorporated western Remote Oceanian populations into individuals from the Solomon Islands) diverged around 7.3 ka (95% con- the model, represented by ni-Vanuatu individuals from Malakula. fidence interval, 6.4–11 ka) (Extended Data Fig. 2c), in agreement with a We estimated that the ancestors of ni-Vanuatu individuals received recent genetic study of Philippine populations25. Similar estimates were migrants from the Bismarck that contributed more than 31% of their obtained when modelling other Austronesian-speaking groups (>8 ka) gene pool (95% confidence interval, 31–48%) less than 3 ka (Extended (Supplementary Table 6). These dates are at odds with the out-of-Taiwan Data Fig. 2b and Supplementary Table 5), which is consistent with model—that is, a dispersal event starting from Taiwan around 4.8 ka ancient DNA results8–10. However, the best-fitted model revealed that that brought agriculture and Austronesian languages to Oceania2–4. the Papuan-related population who entered Vanuatu less than 3 ka However, unmodelled gene flow from northeast Asian populations was a mixture of other Near Oceanian sources8,23: the Papuan-related into Austronesian-speaking groups26 could bias parameter estima- ancestors of ni-Vanuatu diverged from Papua New Guinean Highland- tion. When accounting for such gene flow, we obtained consistently ers and later received approximately 24% (95% confidence interval, older divergence times than expected under the out-of-Taiwan model4, 14–41%) of Solomon Islander-related lineages. Interestingly, we found but with overlapping confidence intervals (approximately 8.2 ka; 95% a minimal (<3%) direct contribution of Taiwanese Indigenous peoples confidence interval, 4.8–12 ka) (Fig. 2b and Supplementary Tables 7–9). to ni-Vanuatu individuals, dating back to around 2.7 ka (95% confidence Although this suggests that the ancestors of Austronesian speakers interval, 1.1–7.5 ka). This suggests that the East-Asian-related ancestry separated before the Taiwanese Neolithic2, given the uncertainty in of modern western Remote Oceanian populations has mainly been parameter estimation, further investigation is needed using ancient inherited from admixed Near Oceanian individuals. genomes. We next estimated the time of admixture between Near Oceanian individuals and populations of East Asian origin under various admix- Insights into the Austronesian expansion ture models, using an approximate Bayesian computation (ABC) We characterized the origin of the East Asian ancestry in Oceanian approach (Supplementary Notes 5, 6 and Supplementary Table 10). populations by incorporating Philippine and Polynesian Austronesian We found that a two-pulse model best matched the summary statistics speakers into our models (Supplementary Note 4). Assuming isolation for Bismarck and Solomon Islanders. The oldest pulse occurred after with migration, we estimated that Taiwanese Indigenous peoples and the Lapita emergence in the region around 3.5 ka27 (2.2 ka (95% credible Malayo-Polynesian speakers (Philippine Kankanaey and Polynesian interval, 1.7–3.0) and 2.5 ka (95% credible interval, 2.2–3.4) for Bismarck

Nature | Vol 592 | 22 April 2021 | 585 Article

a e 250 b 75 200 ASN 50 150 20° N 20° N 100 25 50 0 0 30 300 TWN 10° N 10° N 20 200 10 100 0 0 0° 0° 40 150 AGT 30 100 20 Neanderthal ancestry Denisovan ancestry 50 Latitude Latitude 10 0.03 0.03 0 0 10° S 10° S 50 0.02 0.02 40 200 PHP 30 150 0.01 0.01 20 100 10 50 20° S 0 20° S 0 0 0 150 PNG 110° E 120° E 130° E 140° E 150° E 160° E 170° E 110° E 120° E 130° E 140° E 150° E 160° E 170° E 100 100 Longitude Longitude 50 50 0 0 c Density 1.00 d Atayal (TWN) 40 200 BKA Paiwan (TWN) 30 150 Agta (AGT) 100 a Cebuano (PHP) 20 Papuans (PNG) 10 50 Nakanai Bileki (BKA) 0 0 0.75 Vella Lavella (SLI) 150 Malaita (SLI) 100 SLI Santa Cruz (SCI) 100 Rennel Bellona (POL) 50 50 Tikopia (POL) 0.50 0 0 Ureparapara (VAN) Denisovan 200 Maewo (VAN) 60 POL Santo (VAN) Neanderthal 150 Ambae (VAN) 40 100 Pentecost (VAN) 20 50 0.25 Ambrym (VAN) Malakula (VAN) 0 0 Emae (VAN) 150 VAN/

Denisovan ancestry (as 400 Efate (VAN) 100 Tanna (VAN) 200 SCI

percentage of Papuan ancestry) 0 East Asians (ASN) 50 West Eurasians (EUR) 0 0 0.97 0.98 0.99 1.00 0.97 0.98 0.99 1.00 0 0.25 0.50 0.75 1.00 0 500 1,000 1,500 Match rate Match rate Papuan ancestry Introgressed (Mb) (Neanderthal) (Denisovan)

Fig. 3 | Neanderthal and Denisovan introgression across the Pacific. Denisovan (right) genomes, based on long (>2,000 sites), high-confidence a, b, Estimates of Neanderthal (a) and Denisovan (b) ancestry on the basis of archaic haplotypes, to remove false-positive values attributable to incomplete 51 f4-ratio statistics. Maps were generated using the maps R package . lineage sorting. Fitted density curves for populations with significant bimodal c, Correlation between Papuan ancestry and Denisovan ancestry (as a match rate distributions are shown. AGT, Philippine Agta; ASN, East Asian percentage of Papuan ancestry; n = 20 populations). The black line is the individuals (Simons Genome Diversity Project samples only19); EUR, western identity line. Bars denote 2 s.e. of the estimate. d, Cumulative length of the Eurasian individuals; SCI, Santa Cruz Islanders; VAN, ni-Vanuatu. The remaining high-confidence archaic haplotypes retrieved in Pacific, East Asian and west acronyms are as in Fig. 2. Population sample sizes are reported in Eurasian populations. e, Match rate to the Vindija Neanderthal (left) and Altai Supplementary Table 1. and Solomon Islanders, respectively) (Fig. 2c). This reveals that the genome) as previously reported28, but was also found in Taiwanese separation of Malayo-Polynesian peoples from Taiwanese Indigenous Indigenous peoples, Philippine Cebuano and Polynesian individuals. peoples was not followed by an immediate, single admixture episode Haplotypes with a match of approximately 99.4% were significantly with Near Oceanian populations, suggesting that Austronesian speak- longer than those with a match of approximately 98.6% (one-tailed ers went through a maturation phase during their dispersal. Mann–Whitney U-test; P = 5.14 × 10−4), suggesting that—in East Asian populations—introgression from a population closely related to the Altai Denisovan occurred more recently than introgression from the Neanderthal and Denisovan heritage more-distant archaic group. Pacific Islanders have substantial Neanderthal and Denisovan ances- We also observed two Denisovan peaks in Papuan-related popu- 29 −4 try, as indicated by PCA, D-statistics and f4-ratio statistics (Supple- lations (Gaussian mixture model P < 1.68 × 10 ) (Supplementary mentary Note 7). Whereas Neanderthal ancestry is homogeneously Table 11), with match rates of around 98.2% and 98.6% (Fig. 3e). Con- distributed (around 2.2–2.9%), Denisovan ancestry differs markedly sistently, we confirmed using ABC that Papua New Guinean Highlanders between groups (approximately 0–3.2%) and is highly correlated with received two distinct pulses (posterior probability = 99%) (Supplemen- Papuan-related ancestry14,15 (R2 = 0.77, P < 2.1 × 10−7) (Fig. 3a–c). A notable tary Note 12). Haplotypes with an approximately 98.6% match were of exception is the Philippine Agta (who self-identify as ‘Negritos’) and, similar length in all populations (Kruskal–Wallis test, P > 0.05), whereas to a lesser extent, the Cebuano, who have high Denisovan but little haplotypes with a match of around 98.2% were significantly longer in Papuan-related ancestry (R2 = 0.99, P < 2.2 × 10−16, after excluding Agta Papuan-related populations than those with a match of about 98.6% in and Cebuano). other populations (Supplementary Note 10). ABC parameter inference To explore the sources of archaic ancestry, we inferred high- supported a first pulse around 46 ka (95% credible interval, 39–56 ka), confidence introgressed haplotypes (Fig. 3d and Supplementary from a lineage that diverged 222 ka from the Altai Denisovan (95% cred- Note 8) and estimated match rates to the Vindija Nean- ible interval, 174–263 ka) (Supplementary Note 12 and Supplementary derthal and Altai Denisovan genomes. Neanderthal match rates were Table 12) and a second pulse into Papuan-related populations around unimodal in all groups (Fig. 3e) and Neanderthal segments signifi- 25 ka (95% confidence interval, 15–35 ka) from a lineage that separated cantly overlapped between population pairs (permutation-based 409 ka from the Altai Denisovan (95% credible interval, 335–497 ka). P = 1 × 10−4) (Supplementary Notes 9–11), which is consistent with a This model was more-supported than a previously reported model in unique introgression event in the ancestors of non-African populations which the pulse from distantly related Denisovans occurred around from a single Neanderthal population. Conversely, different peaks were 46 ka29 (ABC posterior probability = 99%) (Supplementary Note 12). apparent for Denisovan-introgressed segments (Fig. 3e and Extended Our results document multiple interactions of Denisovans with the Data Fig. 3). A two-peak signal was not only detected in East Asian indi- ancestors of Papuan-related groups and a deep structure of introgress- viduals (around 98.6% and about 99.4% match rate to the Denisovan ing archaic humans.

586 | Nature | Vol 592 | 22 April 2021 a Chr. 15: 28.16 OCA2 Chr. 13: 51.04 DLEU1 b Chr. 2: 150.04 LYPD6B Chr. 3: 68.36 FAM19A1 ** Waist circumference Chr. 6: 39.08 SAYSD1 Chr. 3: 77.32 ROBO2 Triglycerides Chr. 1: 209.8 LAMB3 Chr. 1: 23 EPHB2 Chr. 2: 241.92 SNED1 Chr. 13: 108.2 FAM155A Systolic pressure Chr. 21: 15.56 LIPI, RBM11 Chr. 13: 51.04 DLEU1 * Chr. 10: 64.28 ZNF365 Chr. 17: 40.8 TUBG2, PLEKHH3, CCR10, CNTNAP1 Standing height Chr. 10: 64.32 ZNF365 Chr. 17: 40.84 CNTNAP1, EZH1 Chr. 16: 77.4 ADAMTS18 Chr. 6: 0.4 IRF4 Skin colour Chr. 12: 47.16 SLC38A4 Chr. 12: 13.2 KIAA1467, GSG1 * ** Chr. 4: 183.52 TENM3 Chr. 13: 51.04 DLEU1 Leukocyte count Chr. 12: 129.88 TMEM132D Chr. 13: 108.16 FAM155A Chr. 11: 99.08 CNTN5 Chr. 4: 102.72 BANK1 LDL Chr. 15: 28.16 OCA2 Chr. 4: 102.92 BANK1 Chr. 16: 78.32 WWOX Chr. 4: 102.96 BANK1 Hip circumference Chr. 5: 56.56 GPBP1 Chr. 4: 110.6 CCDC109B, CASP6, PLA2G12A * * Chr. 2: 242.04 MTERFD2, PASK Chr. 3: 4.04 SUMF1 BMD Chr. 18: 69.36 RP11-723G8.2 Chr. 3: 133.56 RAB6B Chr. 1: 3.6 TP73 Chr. 3: 3.96 SUMF1 * ** ** HDL Chr. 2: 242.16 ANO7, HDLBP Chr. 3: 133.28 CDV3, TOPBP1 Chr. 14: 66.4 CTD-2014B16.3 Chr. 1: 65.36 JAK1 ** Hair colour Chr. 4: 111.52 PITX2 Chr. 6: 153.56 AL590867.1 Chr. 9: 114.8 SUSD1 Chr. 6: 153.44 RGS17 Diastolic pressure Chr. 5: 43.6 NNT Chr. 19: 51.8 IGLON5, VSIG10L Chr. 5: 43.64 NNT Chr. 1: 65.4 JAK1 * CAD Chr. 18: 60.16 ZCCHC2 Chr. 19: 51.72 CD33, SIGLECL1 Chr. 2: 242.04 MTERFD2, PASK Chr. 6: 138.2 TNFAIP3 * Height at age 10 Chr. 14: 66.4 CTD-2014B16.3 Chr. 4: 110.6 CCDC109B, CASP6, PLA2G12A Chr. 5: 150.84 SLC36A1 Chr. 6: 138.16 TNFAIP3 ** ** Cholesterol Chr. 12: 57.96 KIF5A, PIP4K2C, DTX3 Chr. 19: 51.76 SIGLECL1 Chr. 16: 89.68 DPEP1, CHMP1A Chr. 5: 73.16 ARHGEF28 ** BMI Chr. 2: 161.08 ITGB6 Chr. 5: 73.12 ARHGEF28 Chr. 4: 183.52 TENM3 Chr. 13: 51.04 DLEU1 Age of menopause Chr. 16: 89.8 ZNF276, FANCA Chr. 15: 31.72 KLF13 Chr. 16: 89.84 FANCA Chr. 11: 22.2 ANO5 ** Age of menarche I I P L L N L R LI C –log [P value] O P value P value 10 H SLI SCI S SC SLI PO AGT TWNAGTP PNG - P TWNAGTPHPPNG PO AS TWNPHP PNG- EU N N A 0.005 0.01 0.05 >0.05 0.005 0.01 0.05 >0.05 V VAN-S VA 5.0 2.5 0 2.5 5.0

Fig. 4 | Mechanisms of genetic adaptation to Pacific environments. CCDC109B is also known as MCUB, KIAA1467 is also known as FAM234B, FAM19A1 a, Genomic regions showing the strongest evidence of adaptive introgression is also known as TAFA1, MTERFD2 is also known as MTERF4, RP11-723G8.2 is also from Neanderthals (red) and Denisovans (purple). Each row is a 40-kb window, known as LINC01899. b, Signals of polygenic adaptation. Blue and brown colours each column is a Pacific population group, and each cell is coloured according indicate the −log10(P value) for a significant decrease (trait iHS > 0) or increase to whether the window is in the top 0.5%, 1%, 5%, >5% of the empirical (trait iHS < 0) in the candidate trait. *P < 0.025; **P < 0.005. BMD, heel-bone distributions of the adaptive introgression Q95 and U-statistics (Supplementary mineral density; BMI, body mass index; CAD, coronary atherosclerosis; Note 14). The starting position and of each genomic window are indicated. HDL high-density lipoprotein levels; LDL, low-density lipoprotein levels. Only the five most extreme windows are shown for each population group. All a, b, Population acronyms are as in Figs. 2, 3. results are reported in Supplementary Note 14 and Supplementary Tables 14, 15.

For the Philippine Agta, we also observed two Denisovan-related and SUMF1)29,32 genes. Our most-extreme candidates comprise 14 previ- peaks, with match rates of around 98.6% and 99.4% (Fig. 3e). We found ously unreported signals in genes relating to the regulation of innate that the 99.4% peak is probably due to gene flow from East Asian popu- and adaptive immunity, including ARHGEF28, BANK1, CCR10, CD33, lations (Supplementary Note 10). Introgressed haplotypes in the Agta DCC, DDX60, EPHB2, EVI5, IGLON5, IRF4, JAK1, LRRC8C and LRRC8D, overlap significantly with those in Papuan-related populations (Sup- and VSIG10L (Fig. 4a and Supplementary Table 15). For example, CD33— plementary Note 11), but their high Papuan-independent Denisovan which mediates cell–cell interactions and keeps immune cells in a rest- ancestry (Fig. 3c) suggests additional interbreeding. This, together ing state34—contains an approximately 30-kb-long haplotype with seven with the discovery of Homo luzonensis in the Philippines30, prompted high-frequency, introgressed variants, including an Oceanian-specific us to search for introgression from other archaic hominins. Using the nonsynonymous variant (rs367689451-A; derived frequency S′ method28, and filtering Neanderthal and Denisovan haplotypes, we (DAF) > 66%) (Extended Data Fig. 5) predicted to be deleterious (SIFT retained 59 archaic haplotypes spanning a total of 4.99 megabases score = 0). Similarly, IRF4—which regulates Toll-like receptor signalling (Mb), around 50% of which were common to most groups (Extended and interferon responses to viral infections35—has an around 29-kb-long Data Fig. 4 and Supplementary Note 13). Focusing on the Agta and haplotype containing 13 high-frequency (DAF > 64%) variants in the Cebuano, we retained only around 1 Mb of introgressed haplotypes that Agta. These results suggest that Denisovan introgression has facili- were private to these groups. This suggests that Homo luzonensis made tated human adaptation by serving as a reservoir of resistance little or no contribution to the genetic make-up of modern humans or against pathogens. that this hominin was closely related to Neanderthals or Denisovans.

Genetic adaptation to island environments The adaptive nature of archaic introgression Finally, we searched for signals of classic sweeps and polygenic adap- Although evidence of archaic adaptive introgression exists31,32, few tation in Pacific populations (Supplementary Notes 16–18 and Sup- studies have evaluated its role in Oceanian populations. We first tested plementary Tables 19–25). We found 44 sweep signals common to all 5,603 biological pathways for enrichment in adaptive introgression Papuan-related groups (empirical P < 0.01) (Extended Data Fig. 6), signals (Supplementary Notes 14, 15). For Neanderthal and Denisovan including the TNFAIP3 gene, which was identified as adaptively intro- segments, a significant enrichment was observed for 24 and 15 path- gressed from Denisovans31 (Extended Data Fig. 7). The strongest hit ways, respectively, of which 9 were related to metabolic and immune (empirical P < 0.001) included GABRP, which mediates the anticon- functions (Supplementary Tables 13–18). Focusing on Neanderthal vulsive effects of endogenous pregnanolone during pregnancy36, and adaptive introgression, we replicated genes such as OCA2, CHMP1A or RANBP17, which is associated with body mass index and high-density LYPD6B31,32 (Fig. 4a). We also identified previously unreported signals lipoprotein cholesterol37 (Extended Data Fig. 8a, b). The highest score in genes relating to immunity (CNTN5, IL10RA, TIAM1 and PRSS57), neu- identified a nonsynonymous, probably damaging variant (rs79997355) ronal development (TENM3, UNC13C, SEMA3F and MCPH1), metabolism in GABRP at more than 70% frequency in Papua New Guinean High- (LIPI, ZNF444, TBC1D1, GPBP1, PASK, SVEP1, OSBPL10 and HDLBP) and landers and ni-Vanuatu, and low frequency (less than 5%) in East and dermatological or pigmentation phenotypes (LAMB3, TMEM132D, Southeast Asian populations. Among population-specific signals, ATG7, PTCH1, SLC36A1, KRT80, FANCA and DBNDD1) (Extended Data Fig. 5), which regulates cellular responses to nutrient deprivation38 and is further supporting the notion that Neanderthal variants, beneficial or associated with blood pressure39, presented high selection scores in not, have influenced numerous human phenotypes31–33. Solomon Islanders. For Denisovans, we replicated signals for immune-related (TNFAIP3, Among populations with high East Asian ancestry, we identified 29 SAMSN1, ROBO2 and PELI2)29,31 and metabolism-related (DLEU1, WARS2 shared sweep signals (P < 0.01) (Extended Data Fig. 9). The highest

Nature | Vol 592 | 22 April 2021 | 587 Article scores (P < 0.001) overlapped with an approximately 1-Mb haplotype Sahul. Third, another pulse28,29—which was specific to Papuan-related containing multiple genes, including ALDH2. ALDH2 deficiency results groups—is derived from a clade more distantly related to Altai Deniso- in adverse reactions to alcohol and is associated with increased survival vans. We date this introgression to approximately 25 ka, suggesting it in Japanese individuals40. The ALDH2 rs3809276 variant occurs in more occurred in Sundaland or further east. Archaic hominins found east than 60% and less than 15% in East-Asian-related and Papuan-related of the Wallace line include Homo floresiensis and Homo luzonensis30,48, groups, respectively. We also detected a strong signal around OSBPL10, suggesting that either these lineages were related to Altai Denisovans, associated with dyslipidaemia and triglyceride levels41 and protection or Denisovan-related hominins were also present in the region. The against dengue42, which we found to have been adaptively introgressed recent dates of Denisovan introgression that we detect in East Asian from Neanderthals (Extended Data Fig. 7). Population-specific signals and Papuan populations indicate that these archaic humans may have included LHFPL2 in Polynesian individuals (Extended Data Fig. 8c, d), persisted as late as around 21–25 ka. Finally, the high Denisovan-related variation in which is associated with eye macula thickness—a highly ancestry in the Agta14,15 suggests that they experienced a different, variable trait involved in sharp vision43. LHFPL2 variants reach around independent pulse. Collectively, our analyses show that interbreed- 80% frequency in Polynesian individuals, but are absent from databases, ing between modern humans and highly structured groups of archaic highlighting the need to characterize genomic variation in understud- hominins was a common phenomenon in the Asia–Pacific region. ied populations. This study reports more than 100,000 undescribed genetic variants Because most adaptive traits are expected to be polygenic44, we in Pacific Islanders at a frequency of more than 1%, some of which are tested for directional selection of 25 complex traits with a well-studied expected to affect phenotype variation. Candidate variants for positive genetic architecture45, by comparing the integrated haplotype scores selection are observed in genes relating to immunity and metabolism, (iHS) of trait-associated alleles to those of matched, random SNPs46. which suggests genetic adaptation to pathogens and food sources that Focusing on European individuals as a control, we found signals of are characteristic of Pacific islands. The finding that some of these polygenic adaptation for lighter skin and hair pigmentation but not for variants were inherited from Denisovans highlights the importance increased height (Fig. 4b), as previously reported46,47. In Pacific popula- of archaic introgression as a source of adaptive variation in modern tions, we detected a strong signal for lower levels of high-density lipo- humans29,31,32,49. Finally, the signal of polygenic adaptation related to cholesterol in Solomon Islanders and ni-Vanuatu (P = 1 × 10−5). levels of high-density lipoprotein cholesterol suggests that there are population differences in lipid metabolism, potentially accounting for the contrasting responses to recent dietary changes in the region50. Implications for human history and health Large genomic studies in the Pacific region are required to understand The peopling of Oceania raises questions about the ability of our the causal links between past genetic adaptation and present-day dis- to inhabit and adapt to insular environments. Using current estimates ease risk, and to promote the translation of medical genomic research of the human mutation rate and generation time18 (Supplementary in understudied populations. Note 4 and Supplementary Tables 2–7), we find that the settlement of Near Oceania 30–45 ka5,6 was rapidly followed by genetic isolation between archipelagos, suggesting that navigation during the Pleisto- Online content cene epoch was possible but limited. Furthermore, our study reveals Any methods, additional references, Nature Research reporting sum- that genetic interactions between East Asian and Oceanian populations maries, source data, extended data, supplementary information, may have been more complex than predicted by the strict out-of-Taiwan acknowledgements, peer review information; details of author con- model4, and suggests that at least two different episodes of admixture tributions and competing interests; and statements of data and code occurred in Near Oceania after the emergence of the Lapita culture11,27. availability are available at https://doi.org/10.1038/s41586-021-03236-5. Our analyses also provide insights into the settlement of Remote Oce- ania. Ancient DNA studies have proposed that Papuan-related peoples 1. Gosling, A. L. & Matisoo-Smith, E. A. The evolutionary history and human settlement of expanded to Vanuatu shortly after the initial settlement, replacing local Australia and the Pacific. Curr. Opin. Genet. Dev. 53, 53–59 (2018). Lapita groups8,10,23. We suggest that most East-Asian-related ancestry in 2. Hung, H.-C. & Carson, M. T. Foragers, fishers and farmers: origins of the Taiwanese modern ni-Vanuatu individuals results from gene flow from admixed Neolithic. Antiquity 88, 1115–1131 (2014). 3. Gray, R. D., Drummond, A. J. & Greenhill, S. J. Language phylogenies reveal expansion Near Oceanian populations, rather than from the early Lapita settlers. pulses and pauses in Pacific settlement. Science 323, 479–483 (2009). These results, combined with evidence of back migrations from Poly- 4. Bellwood, P. First Farmers: the Origins of Agricultural Societies (Blackwell, 2005). 6,10,23 5. O’Connell, J. F. et al. When did Homo sapiens first reach Southeast Asia and Sahul? Proc. nesia , support a scenario of repeated population movements in Natl Acad. Sci. USA 115, 8482–8490 (2018). the Vanuatu region. Given that we explored a relatively limited number 6. Kirch, P. V. On the Road of the Winds: An Archeological History of the Pacific Islands before of models, archaeological, morphometric and palaeogenomic studies European Contact (Univ. California Press, 2017). 7. Wollstein, A. et al. Demographic history of Oceania inferred from genome-wide data. are required to elucidate the complex peopling history of the region. Curr. Biol. 20, 1983–1992 (2010). The recovery of diverse Denisovan-introgressed material in our data- 8. Lipson, M. et al. Population turnover in Remote Oceania shortly after initial settlement. set, together with previous studies28,29, shows that modern humans Curr. Biol. 28, 1157–1165 (2018). 9. Skoglund, P. et al. Genomic insights into the peopling of the Southwest Pacific. Nature received multiple pulses from different Denisovan-related groups 538, 510–513 (2016). (Extended Data Fig. 10). First, we estimate that the East-Asian-specific 10. Posth, C. et al. Language continuity despite population replacement in Remote Oceania. pulse28, derived from a clade closely related to the Altai Denisovan, Nat. Ecol. Evol. 2, 731–740 (2018). 11. Pugach, I. et al. The gateway from Near into Remote Oceania: new insights from occurred around 21 ka. The geographical distribution of haplotypes genome-wide data. Mol. Biol. Evol. 35, 871–886 (2018). from this clade indicates that it probably occurred in mainland East 12. Bergström, A. et al. A Neolithic expansion, but strong genetic structure, in the Asia. Second, another clade distantly related to Altai Denisovans28,29 independent history of New Guinea. Science 357, 1160–1163 (2017). 13. Ioannidis, A. G. et al. Native American gene flow into Polynesia predating Easter Island contributed haplotypes of similar length to Near Oceanian populations, settlement. Nature 583, 572–577 (2020). East Asian populations and Philippine Agta. Because our models do 14. Qin, P. & Stoneking, M. Denisovan ancestry in East Eurasian and Native American not support a recent common origin of Near Oceanian and East Asian populations. Mol. Biol. Evol. 32, 2665–2674 (2015). 15. Reich, D. et al. Denisova admixture and the first modern human dispersals into Southeast populations, we suggest that East Asian populations inherited these Asia and Oceania. Am. J. Hum. Genet. 89, 516–528 (2011). archaic segments indirectly, via gene flow from a population ancestral 16. Vernot, B. et al. Excavating Neandertal and Denisovan DNA from the genomes of to the Agta and/or Near Oceanian populations. Assuming a pulse into Melanesian individuals. Science 352, 235–239 (2016). 17. Sankararaman, S., Mallick, S., Patterson, N. & Reich, D. The combined landscape of the ancestors of Near Oceanian individuals, we date this introgres- Denisovan and Neanderthal ancestry in present-day humans. Curr. Biol. 26, 1241–1247 (2016). sion to around 46 ka, possibly in Southeast Asia, before migrations to 18. Malaspinas, A. S. et al. A genomic history of Aboriginal Australia. Nature 538, 207–214 (2016).

588 | Nature | Vol 592 | 22 April 2021 19. Mallick, S. et al. The Simons Genome Diversity Project: 300 genomes from 142 diverse 36. Hedblom, E. & Kirkness, E. F. A novel class of GABAA receptor subunit in tissues of the populations. Nature 538, 201–206 (2016). reproductive system. J. Biol. Chem. 272, 15346–15350 (1997). 20. Prüfer, K. et al. The complete genome sequence of a Neanderthal from the Altai 37. Hoffmann, T. J. et al. A large multiethnic genome-wide association study of adult body Mountains. Nature 505, 43–49 (2014). mass index identifies novel loci. Genetics 210, 499–515 (2018). 21. Prüfer, K. et al. A high-coverage Neandertal genome from Vindija Cave in Croatia. Science 38. Lee, I. H. et al. Atg7 modulates p53 activity to regulate cell cycle and survival during 358, 655–658 (2017). metabolic stress. Science 336, 225–228 (2012). 22. Meyer, M. et al. A high-coverage genome sequence from an archaic Denisovan individual. 39. Giri, A. et al. Trans-ethnic association study of blood pressure determinants in over Science 338, 222–226 (2012). 750,000 individuals. Nat. Genet. 51, 51–62 (2019). 23. Lipson, M. et al. Three phases of ancient migration shaped the ancestry of human 40. Sakaue, S. et al. Functional variants in ADH1B and ALDH2 are non-additively associated populations in Vanuatu. Curr. Biol. 30, 4846–4856 (2020). with all-cause mortality in Japanese population. Eur. J. Hum. Genet. 28, 378–382 (2020). 24. Excoffier, L., Dupanloup, I., Huerta-Sánchez, E., Sousa, V. C. & Foll, M. Robust 41. Perttilä, J. et al. OSBPL10, a novel candidate gene for high triglyceride trait in dyslipidemic demographic inference from genomic and SNP data. PLoS Genet. 9, e1003905 (2013). Finnish subjects, regulates cellular lipid metabolism. J. Mol. Med. 87, 825–835 (2009). 25. Larena, M. et al. Multiple migrations to the Philippines during the last 50,000 years. Proc. 42. Sierra, B. et al. OSBPL10, RXRA and lipid metabolism confer African-ancestry protection Natl Acad. Sci. USA, https://doi.org/10.1073/pnas.2026132118 (2021). against dengue haemorrhagic fever in admixed Cubans. PLoS Pathog. 13, e1006220 (2017). 26. Yang, M. A. et al. Ancient DNA indicates human population shifts and admixture in 43. Gao, X. R., Huang, H. & Kim, H. Genome-wide association analyses identify 139 loci northern and southern China. Science 369, 282–288 (2020). associated with macular thickness in the UK Biobank cohort. Hum. Mol. Genet. 28, 27. Rieth, T. M. & Athens, J. S. Late Holocene human expansion into Near and Remote 1162–1172 (2019). Oceania: a Bayesian model of the chronologies of the Mariana Islands and Bismarck 44. Sella, G. & Barton, N. H. Thinking about the evolution of complex traits in the era of Archipelago. J. Island Coast. Archaeol. 14, 5–16 (2019). genome-wide association studies. Annu. Rev. Hum. Genet. 20, 461–493 (2019). 28. Browning, S. R., Browning, B. L., Zhou, Y., Tucci, S. & Akey, J. M. Analysis of human 45. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. sequence data reveals two pulses of archaic Denisovan admixture. Cell 173, 53–61 (2018). Nature 562, 203–209 (2018). 29. Jacobs, G. S. et al. Multiple deeply divergent Denisovan ancestries in Papuans. Cell 177, 46. Field, Y. et al. Detection of human adaptation during the past 2000 years. Science 354, 1010–1021 (2019). 760–764 (2016). 30. Détroit, F. et al. A new species of Homo from the Late Pleistocene of the Philippines. 47. Berg, J. J. et al. Reduced signal for polygenic adaptation of height in UK Biobank. eLife 8, Nature 568, 181–186 (2019). e39725 (2019). 31. Gittelman, R. M. et al. Archaic hominin admixture facilitated adaptation to out-of-Africa 48. Brown, P. et al. A new small-bodied hominin from the Late Pleistocene of Flores, environments. Curr. Biol. 26, 3375–3382 (2016). Indonesia. Nature 431, 1055–1061 (2004). 32. Racimo, F., Marnetto, D. & Huerta-Sánchez, E. Signatures of archaic adaptive 49. Gouy, A. & Excoffier, L. Polygenic patterns of adaptive introgression in modern humans introgression in present-day human populations. Mol. Biol. Evol. 34, 296–317 (2017). are mainly shaped by response to pathogens. Mol. Biol. Evol. 37, 1420–1433 (2020). 33. Simonti, C. N. et al. The phenotypic legacy of admixture between modern humans and 50. Gosling, A. L., Buckley, H. R., Matisoo-Smith, E. & Merriman, T. R. Pacific populations, Neandertals. Science 351, 737–741 (2016). metabolic disease and ‘just-so stories’: a critique of the ‘thrifty genotype’ hypothesis in 34. Vitale, C. et al. Surface expression and function of p75/AIRM-1 or CD33 in acute myeloid Oceania. Ann. Hum. Genet. 79, 470–480 (2015). leukemias: engagement of CD33 induces apoptosis of leukemic cells. Proc. Natl Acad. 51. R Core Team. R: A language and environment for statistical computing. http:// Sci. USA 98, 5764–5769 (2001). www.R-project.org/ (R Foundation for Statistical Computing, 2013). 35. Negishi, H. et al. Negative regulation of Toll-like-receptor signaling by IRF-4. Proc. Natl Acad. Sci. USA 102, 15989–15994 (2005). © The Author(s), under exclusive licence to Springer Nature Limited 2021

Nature | Vol 592 | 22 April 2021 | 589 Article Methods detected with KING v.2.157. Previously unknown SNPs were identified by comparison with available datasets16,18,19 and dbSNP58. Data reporting No statistical methods were used to predetermine sample size. The Genetic structure analyses experiments were not randomized and the investigators were not PCAs were performed with the ‘SmartPCA’ algorithm implemented blinded to allocation during experiments and outcome assessment. in EIGENSOFT v.6.1.459. The genetic structure was determined with the unsupervised model-based clustering algorithm implemented in Sample collection and approvals ADMIXTURE60, which was run—assuming K = 1 to K = 12—100 times with Samples were obtained from 317 adult volunteers in Taiwan, the Philip- different random seeds. Linkage disequilibrium (r2) between SNP pairs pines, the Solomon Islands and Vanuatu from 1998 to 2018. DNA was was estimated with Haploview61, which was averaged per bin of genetic extracted from blood, saliva or cheek swabs (Supplementary Note 1). distance using the phase 3 genetic map62.

Informed consent was obtained from each participant, including con- FST values were estimated by analysis of molecular variance (AMOVA) sent for genetics research, after the nature and scope of the research as previously described63 (Supplementary Note 3). was explained in detail. The study received approval from the Institu- tional Review Board of Institut Pasteur (2016-02/IRB/5), the Ethics Com- Demographic inference mission of the University of Leipzig Medical Faculty (286-10-04102010), Demographic parameters were estimated with the simulation-based the Ethics Committee of Uppsala University ‘Regionala Etikprövn- framework implemented in fastsimcoal v.2.624. We filtered out sites ingsnämnden Uppsala’ (Dnr 2016/103) and from the local authori- (1) within CpG islands64; (2) within genes; and (3) outside of Vindija ties, including the China Medical University Hospital Ethics Review Neanderthal and Altai Denisovan accessibility masks. These masks Board, the National Commission for Culture and the Arts (NCCA) of the exclude sites (1) at which at least 18 out of 35 overlapping 35-mers Philippines, the Solomon Islands Ministry of Education and Training are mapped elsewhere in the genome with zero or one mismatch; and the Vanuatu Ministry of Health (Supplementary Note 1). The con- (2) with coverage of less than 10; (3) with mapping quality less than 25; sent process, sampling and/or subsequent validation in the Philippines (4) within tandem repeats; (5) within small insertions or deletions; and were performed in coordination with the NCCA and, in Cagayan val- (6) within coverage filters stratified by GC content. For each demo- ley region, with local partners or agencies, including Cagayan State graphic model, we performed 600,000 simulations, 65 conditional University, Quirino State University, Indigenous Cultural Community maximization cycles and 100 replicate runs starting from different Councils, Local Government Units and/or regional office of National random initial values. We limited overfitting by considering only Commission on Indigenous Peoples. More details about the sampling site frequency spectrum (SFS) entries with more than five counts for in the Philippines can be found in ref. 25. Research was conducted in parameter estimation. We optimized the fit between expected and accordance with: (i) ethical principles set forth in the Declaration observed SFS values following a previously described approach18,65,66. of Helsinki (version: Fortaleza October 2013), (ii) European direc- Specifically, we first calculated and optimized the likelihood with all of tives 2001/20/CE and 2005/28/CE, (iii) principles promulgated in the SFS entries for the first 25 cycles. We then used only polymorphic the UNESCO International Declaration on Human Genetic Data and sites for the remaining 40 cycles. We obtained maximum-likelihood (iv) principles promulgated in the Universal Declaration on the Human estimates of demographic parameters, by first selecting the 10 runs Genome and Human Rights. with the highest likelihoods from the 100 replicate runs. To account for the stochasticity that is inherent to the approximation of the like- Whole-genome sequencing data lihood using coalescent simulations, we re-estimated the likelihood Whole-genome sequencing was performed on the 317 individual sam- of each of the 10 best runs, using 100 expected SFS obtained using ples (Supplementary Table 1), with the TruSeq DNA PCR-Free or Nano 600,000 simulations. Finally, we re-estimated again the likelihood Library Preparation kits (Illumina). After quality control, qualified of the three runs with the highest average, this time using 107 simu- libraries were sequenced on a HiSeq X5 Illumina platform to obtain lations, and considered the run with the highest likelihood as the paired-end 150-bp reads with an average sequencing depth of 30× per maximum-likelihood run. We corrected for the different numbers sample. FASTQ files were converted to unmapped BAM files (uBAM), of SNPs in the expected and observed SFS, by rescaling parameters read groups were added and Illumina adapters were tagged with by a rescaling factor defined as Sobs/Sexp: the Ne and generation times Picard Tools version 2.8.1 (http://broadinstitute.github.io/picard/). were multiplied by the rescaling factor, whereas migration rates were Read pairs were mapped onto the human reference genome divided by the rescaling factor. For all inferences, we considered a (hs37d5), with the ‘mem’ algorithm from Burrows–Wheeler Aligner mutation rate of 1.25 × 10−8 mutations per generation per site19,67 and v.0.7.1352 and duplicates were marked with Picard Tools. Base quality a generation time of 29 years68. We also provide estimates of diver- scores were recalibrated with the Genomic Analysis ToolKit (GATK) gence and admixture times assuming a mutation rate of 1.4 × 10−8 software v.3.853. mutations per generation per site69 (Supplementary Tables 3–7). Whole-genome data for Bismarck Islanders16 were processed in the Model assumptions and parameter search ranges can be found in same manner as the newly generated genomes, while for Papua New Supplementary Note 4. Guinean Highlanders18 and other populations of interest19, raw BAM We checked the fit of each best-fit model, by comparing all entries files were converted into uBAM files, and processed as described above. of the observed SFS against simulated entries, averaged over 100 Variant calling was performed following the GATK best-practice recom- expected SFS obtained with fastsimcoal224 (Supplementary Note 4). 54 mendations . All samples were genotyped individually with ‘Haplo- We also compared observed and simulated FST values, computed with typeCaller’ in gvcf mode. The raw multisample VCF was then generated vcftools v.0.1.1370, for all population pairs. We checked that parameter with the ‘GenotypeGVCFs’ tool. Using BCFtools v.1.8 (http://www.htslib. estimates were not affected by background selection and biased gene org/), we applied different hard quality filters on invariant and variant conversion (Supplementary Note 4). We calculated confidence inter- sites, based on coverage depth, genotype quality, Hardy–Weinberg vals with a nonparametric block bootstrap approach; we generated 100 equilibrium and genotype missingness (Supplementary Note 2). The bootstrapped datasets by randomly sampling with replacement the same sequencing quality was assessed by several statistics (that is, breadth number of 1-Mb blocks of concatenated genomic regions as were present of coverage 10×, transition/transversion ratio and per-sample miss- in the observed data. For each bootstrapped dataset, we obtained ingness) computed with GATK54 and BCFtools. Heterozygosity was multi-SFS with Arlequin v.3.5.2.271 and re-estimated parameters with assessed with PLINK v.1.9055,56 and cryptically related samples were the same settings as for the observed dataset, with 20 replicate runs. Finally, to obtain the 95% confidence intervals, we calculated the 2.5% We performed 10,000 additional simulations for parameter estimation and 97.5% percentile of the estimate distribution obtained by nonpara- under the winning model. As summary statistics, we used the mean and metric bootstrapping. variance, across the 100 5-Mb regions, of the mean, minimum and maxi- For model selection, classical model choice procedures, such as mum of the distribution of the length of admixture tracts across Near the likelihood ratio tests, could not be used because the likelihood Oceanian populations. The six resulting summary statistics were com- function used in fastsimcoal224 is a composite likelihood (owing to puted based on local ancestry inference, with RFMix v.1.5.480, which was the presence of linked SNPs in the data). Instead, we compared the run with three expectation-maximization steps, a window of 0.03 cM, likelihoods of the most likely runs between the alternative models, and Taiwanese Indigenous peoples and Papua New Guinean Highlanders estimated from 600,000 simulations. We also compared the distribu- as source populations. The performance of the method is described in tion of the log10(likelihood) of the observed SFS based on 100 expected Supplementary Note 6. We used the logistic multinomial regression and SFS computed with 107 coalescent simulations, using parameters maxi- the neural-network ABC methods implemented in the abc R package78 mizing the likelihood under each scenario. A model was considered for model choice and parameter estimation, respectively. the most likely if its mean log10(likelihood) was 50 units larger than that of the second most likely model66. We estimated by simulations Archaic introgression that this criterion results in an 81% probability to select the true model Before performing archaic introgression analyses, we masked our (Supplementary Note 4). whole-genome sequencing dataset for regions non-accessible in archaic We evaluated the accuracy of demographic parameter estimation, genomes. We merged the masked dataset with the high-coverage using a parametric bootstrap approach. We simulated, with fastsim- genomes of Vindija and Altai Neanderthals and the Altai Denisovan20–22. coal224, x 1-Mb DNA loci, with x chosen to obtain the same numbers We assessed introgression between archaic hominins and modern of segregating SNPs and monomorphic sites as in the observed data, humans with D-statistics81. We computed a D-statistic of the form assuming parameters maximizing the likelihood under each model. We D(X, West Eurasians/East Asians/Africans; Neanderthal Vindija, chim- then generated 20 simulated SFS by random sampling and used boot- panzee) and D(X, West Eurasians/East Asians/Africans; Neanderthal strapped SFS to re-estimate parameters under the same settings as for Vindija, Denisova Altai) to test for introgression from Neanderthal; the original dataset (65 expectation conditional maximization cycles, and D-statistics of the form D(X, West Eurasians/East Asians; Denisova 600,000 simulations and 100 runs per simulated SFS). We calculated Altai, chimpanzee) and D(X, West Eurasians/East Asians; Denisova Altai, the mean, median and the 2.5% and 97.5% percentiles of the distribu- Neanderthal Vindija) to test introgression from Denisovans. The last tion of parameter estimates obtained by parametric bootstrapping, two D-statistics were used to account for the more-recent common and checked that they included the true (simulated) parameter value. ancestor between Neanderthals and Denisovans. We computed f4-ratios to estimate the proportion of genome-wide Neanderthal and Den- Admixture models isovan introgression in a modern human population (Supplementary 72 We applied two ABC approaches to test for different admixture models Note 7). All D- and f4-ratio statistics were computed with ‘qpDstat’ and for Near Oceanian populations and estimated parameters under the ‘qpF4ratio’ implemented in ADMIXTOOLS v.5.1.181. A weighted-block most probable model. Model choice and posterior parameter estima- jackknife procedure dropping 5-cM blocks of the genome in each run tion by ABC are based on summary statistics73. The first approach, was used to compute standard errors. developed in the MetHis method74, is based on the moments of the We used two statistical methods to identify archaic sequences in distribution of admixture proportions and explicit forward-in-time modern human genomes. The first, S-prime (S′), identifies introgressed simulations that follow a general mechanistic admixture model75. sequences without the use of an archaic reference genome28. For the The second approach uses—as summary statistics—the moments of identification of S′ introgressed segments in Pacific genomes, we only the distribution of the length of admixture tracts76,77. We assumed considered variants with a frequency less than 1% in African individu- three competing models of admixture: a single-pulse, a two-pulse or als from the Simons Genome Diversity Project (SGDP) dataset19, and a constant-recurring model (Supplementary Notes 5, 6). We checked segments were detected in each population separately. Genetic dis- a priori the goodness-of-fit of simulated and observed statistics with tances between sites were estimated from the 1000 Genomes Project the gfit function implemented in the abc R package78. Method perfor- phase 3 genetic map62. After retrieving empirical S′ scores, we estimated mance was assessed by estimating the error rates by cross-validation, a null distribution of S′ scores by simulating—with fastsimcoal224—2,500 and by checking a posteriori that the statistics simulated under the 10-Mb genomic regions under the best-fitted demographic model most probable model closely fitted the observed statistics. for western Remote Oceanian populations (Supplementary Note 4). For the MetHis approach, we simulated 100,000 independent We fixed all parameters to maximum-likelihood estimates, but SNPs segregating in the two source populations with fastsimcoal224, removed the simulated introgression pulses from Neanderthals under the refined demographic model for Near Oceanian populations and Denisovans. On the basis of these null distributions of S′ scores, (Fig. 2a). From the foundation of the admixed population to the present we estimated the threshold giving a false-positive rate of less than generation, the forward-in-time evolution of the 100,000 SNPs in the 0.01, to retain significantly introgressed S′ haplotypes (Supplemen- admixed population was simulated with MetHis74, under the classical tary Note 8). Wright–Fisher model. For model choice, we conducted 10,000 inde- The second method, based on conditional random fields (CRF), pendent simulations under each of the three competing models. On the identifies introgressed archaic haplotypes in phased genomic data, basis of 30,000 simulations, we used the random-forest ABC approach79 using a reference archaic genome17,82. We phased the data with SHA- implemented in the abcrf R package. For the best scenario identified, PEIT283,84, using 200 conditioning states, 10 burn-in steps and 50 we conducted an additional 20,000 simulations with MetHis. We then Markov chain Monte Carlo main steps, for a window length of 0.5 cM used all 30,000 simulations computed under the winning scenario for and an effective population size of 15,000. For the detection of joint posterior parameter estimation, with the neural-network ABC Neanderthal-introgressed haplotypes, we used as reference panels approach implemented in the abc R package78. The performance of the Vindija Neanderthal genome and SGDP African individuals19 merged the method is described in Supplementary Note 5. with the Altai Denisovan genome. To detect Denisovan-introgressed For the approach based on admixture tract length, we performed— haplotypes, we used as reference panel the Altai Denisovan genome under each alternative admixture model—5,000 simulations of 100 5-Mb and SGDP African individuals19 merged with the Vindija Neanderthal linked DNA loci with fastsimcoal224, assuming a variable recombination genome. Results from the two independent runs were analysed jointly rate sampled from the 1000 Genomes Project phase 3 genetic map62. to keep those containing alleles with a marginal posterior probability Article

PNeanderthal ≥ 0.9 and PDenisova < 0.5 as Neanderthal-introgressed haplotypes and those containing alleles with PDenisova ≥ 0.9 and PNeanderthal < 0.5 as Adaptive introgression Denisovan-introgressed haplotypes. Candidate regions for adaptive introgression were detected on the We computed a match rate between each detected S′ or CRF seg- basis of the number and derived allele frequency of sites common to ment and the Vindija Neanderthal and Altai Denisovan genomes modern and archaic humans (Supplementary Note 14), with Q95 and as previously described28 (Supplementary Note 9). We considered U-statistics32. We computed these statistics in 40-kb non-overlapping that a site matches if the putative introgressed allele is observed in windows along the genome of all target populations, using SGDP Afri- the archaic genome. The match rate was calculated as the number can individuals19 as the outgroup. We used the chimpanzee reference of matches divided by the total number of compared sites. Because genome to determine the ancestral or derived states of alleles, removed longer S′ haplotypes carry more information on the archaic ori- sites with any missing genotypes, and discarded genomic windows gin of introgressed segments, we computed only match rates for with fewer than five sites. Candidate genomic windows were defined as S′ haplotypes with more than 40 unmasked sites. For the statisti- those with both U and Q95 statistics in the top 0.5% of their respective cal assessment and assignment of introgressed haplotypes to dif- genome-wide distributions. ferent Denisovan components, we fitted single Gaussian versus We assessed the enrichment of introgressed genes in various biologi- two-component Gaussian mixtures to the Denisovan match rate cal pathways, including the Kyoto Encyclopedia of Genes and Genomes distributions (Supplementary Note 10). (KEGG)85, Wikipathways86, the genome-wide association studies We estimated the sharing of introgressed haplotypes between (GWAS) catalogue87, Gene Ontology88, and manually curated lists of populations by first retaining S′ introgressed haplotypes with a score innate immunity genes89 and virus-interacting proteins90. We merged >190,000 and a length of at least 40 kb (Supplementary Note 11). We Pacific populations into three population groups (Supplementary then classified each haplotype as of either Neanderthal or Denisovan Note 15). We assessed statistical significance using a resampling-based origin, as previously described28. For each haplotype present in a given enrichment test that compares the number of introgressed genes in a population, we then estimated the fraction of base-pair overlap with given gene set to that observed in randomly sampled sets of genes that the haplotypes present in a second population, with respect to the are matched for different genomic features (that is, recombination length of the segments in the first. As a test statistic, we computed the rate, PhastCons91, combined annotation-dependent depletion (CADD) proportion of segments with a fraction of base-pair overlap greater scores92, density of DNase I segments93 and number of SNPs). We also than 0.5. We assessed significance by performing 10,000 bootstrap determined whether a given gene set was enriched in adaptively intro- iterations, in which we randomly placed introgressed segments with gressed genes, by comparing the number of genes overlapping an the same number and of the same length as observed along the callable adaptively introgressed segment in the gene set with that observed genome (around 2.1 Gb). For each population pairwise comparison, we in randomly sampled sets of matched genes. Adaptively introgressed reported the highest P value of the two. All P values were adjusted for segments were defined as those intersecting with genomic windows multiple testing with the Benjamini–Hochberg method. with Q95 and U-statistics in the top 5% of their respective genome-wide We formally tested for the presence of two distinct Denisovan distributions. lineages in Papuan-related populations with an ABC approach72, by performing 50,000 independent simulations of 64 DNA sequences Classic sweeps of 10 Mb each with fastsimcoal224. We simulated the demographic For the detection of classic sweep signals, we combined the inter- model for Near Oceanian populations (Fig. 2a), introducing one or two population locus-specific branch lengths (LSBL)94 and cross-population Denisovan pulses into the Papua New Guinean branch, and a popula- extended haplotype homozygosity (XP-EHH)95 statistics into a Fisher’s tion resize in Papua New Guinea to capture the demographic effect score (FCS). We estimated the FCS as the sum of the −log10(percentile of the agricultural transition12 (Supplementary Note 12). As summary rank of the statistic for a given SNP) of all statistics, and defined ‘outlier statistics, we used the moments of the distribution of the S′ scores, SNPs’ as those with a FCS among the 1% highest genome-wide. Putatively S′ haplotype length and S′ match rate to the Altai Denisovan genome. selected regions were defined as genomic windows with a proportion We determined which of the single- and double-pulse introgression of outlier SNPs within the 1% highest genome-wide, after partition- models was the most probable, using a logistic multinomial regression ing all windows into five bins based on the number of SNPs. The test, algorithm with a tolerance rate set to 5%. We estimated the performance reference and outgroup populations used are described in Supple- of our ABC model choice by cross-validation. Parameter estimation mentary Note 16. LSBL and XP-EHH statistics were computed with the under the double-pulse winning model was performed on the basis optimized, window-based algorithms implemented in selink (https:// of an additional 150,000 independent simulations, using the neural github.com/h-e-g/selink). network algorithm with a tolerance rate set to 5%. We used the same procedure to test whether our two-pulse model, in which the pulse Polygenic adaptation from a more-distant Denisovan lineage occurs later than the other We searched for evidence of polygenic adaptation, using an approach pulse, fits the data better than a previous model in which the pulse testing whether the mean integrated haplotype score (iHS) of from a more-distant Denisovan lineage occurs earlier than the other trait-increasing alleles differed significantly from that of random pulse29. Introgression parameter values were sampled from uniform SNPs with a similar allele frequency46,96. We obtained GWAS summary priors limited by the previously obtained 95% confidence intervals statistics for 25 candidate complex traits from the UK Biobank data- (Supplementary Note 12). base45, including traits relating to morphology, metabolism and immu- We investigated whether Pacific populations had received gene nity, as these phenotypic traits are strong candidates for responses, flow from an unknown archaic hominin, by retaining S′ haplotypes through natural selection, to changes in climatic, nutritional and unlikely to be of Neanderthal or Denisovan origin, through the removal pathogenic environments. We classified SNPs as ‘trait-increasing’ or of Neanderthal and Denisovan haplotypes inferred by the CRF approach ‘trait-decreasing’ based on UK Biobank effect size (β) estimates. We (Supplementary Note 13). We characterized these S′ haplotypes fur- computed iHS with selink, for each SNP and population, and standard- ther by estimating their match rates to the Vindija Neanderthal and ized scores in 100 bins of DAF. We then polarized the iHS, such that posi- Altai Denisovan genomes and retaining only those with a match rate tive iHS values indicated directional selection of the trait-decreasing of less than 1% to either of these archaic hominins. The remaining S′ allele, whereas negative iHS values indicated directional selection of haplotypes represent putatively introgressed material from outside the trait-increasing allele. We called the resulting statistic the polar- the Neanderthal and Denisovan branch. ized trait iHS (tiHS). For each trait, we assessed significance keeping only unlinked 61. Barrett, J. C., Fry, B., Maller, J. & Daly, M. J. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21, 263–265 (2005). trait-associated variants (Supplementary Note 18). We then compared 62. 1000 Genomes Project Consortium. A global reference for human . the mean tiHS of the x independent, trait-associated alleles with the Nature 526, 68–74 (2015). mean tiHS of 100,000 random samples of x SNPs with similar DAF, 63. Excoffier, L., Smouse, P. E. & Quattro, J. M. Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA genomic evolutionary rate profiling (GERP) score and surrounding restriction data. Genetics 131, 479–491 (1992). recombination rate, to account for the effects of background selection. 64. Meyer, L. R. et al. The UCSC Genome Browser database: extensions and updates 2013. We considered that directional selection has increased (or decreased) Nucleic Acids Res. 41, D64–D69 (2013). 65. de Manuel, M. et al. Chimpanzee genomic diversity reveals ancient admixture with a given trait if less than 2.5% (or 0.5%) of the resampled sets had a mean bonobos. Science 354, 477–481 (2016). tiHS that is lower (or higher) than that observed. We adjusted P val- 66. Sikora, M. et al. The population history of northeastern Siberia since the Pleistocene. ues for multiple testing with the Benjamini–Hochberg method. The Nature 570, 182–188 (2019). 67. Schiffels, S. & Durbin, R. Inferring human population size and separation history from false-positive rate of the approach at a P value of 2.5% (or 0.5%) was multiple genome sequences. Nat. Genet. 46, 919–925 (2014). estimated by resampling (Supplementary Note 18). 68. Fenner, J. N. Cross-cultural estimation of the human generation interval for use in genetics- Because this approach assumes that alleles affecting traits are the based population divergence studies. Am. J. Phys. Anthropol. 128, 415–423 (2005). 69. Fu, Q. et al. Genome sequence of a 45,000-year-old modern human from western same in Oceanian and European populations and that they affect traits Siberia. Nature 514, 445–449 (2014). in the same direction, we used another approach, which tests for the 70. Danecek, P. et al. The and VCFtools. Bioinformatics 27, 2156–2158 (2011). co-localization of selection signals and trait-associated genomic 71. Excoffier, L. & Lischer, H. E. Arlequin suite ver 3.5: a new series of programs to perform analyses under Linux and Windows. Mol. Ecol. Resour. 10, 564–567 regions. We partitioned the genome into 100-kb non-overlapping (2010). contiguous windows and considered a window to be associated with 72. Beaumont, M. A., Zhang, W. & Balding, D. J. Approximate Bayesian computation in a trait if at least one SNP within the window was genome-wide significant population genetics. Genetics 162, 2025–2035 (2002). −8 73. Tavaré, S., Balding, D. J., Griffiths, R. C. & Donnelly, P. Inferring coalescence times from (P < 5 × 10 ). For each window, we estimated the mean tiHS for each DNA sequence data. Genetics 145, 505–518 (1997). population. We then tested whether the mean tiHS of trait-associated 74. Fortes-Lima, C. A., Laurent, L., Thouzeau, V., Toupance, B. & Verdu, P. Complex genetic windows was greater than that for a null distribution, obtained from admixture histories reconstructed with approximate Bayesian computations. Mol. Ecol. Resour. https://doi.org/10.1111/1755-0998.13325 (2021). 100,000 sets of randomly sampled windows, each set being matched to 75. Verdu, P. & Rosenberg, N. A. A general mechanistic model for admixture histories of trait-associated windows in terms of mean GERP score, recombination hybrid populations. Genetics 189, 1413–1426 (2011). rate, DAF and number of SNPs. 76. Gravel, S. Population genetics models of local ancestry. Genetics 191, 607–619 (2012). 77. Liang, M. & Nielsen, R. The lengths of admixture tracts. Genetics 197, 953–967 (2014). 78. Csilléry, K., François, O. & Blum, M. G. B. abc: an R package for approximate Bayesian Reporting summary computation (ABC). Methods Ecol. Evol. 3, 475–479 (2012). Further information on research design is available in the Nature 79. Pudlo, P. et al. Reliable ABC model choice via random forests. Bioinformatics 32, 859–866 (2016). Research Reporting Summary linked to this paper. 80. Maples, B. K., Gravel, S., Kenny, E. E. & Bustamante, C. D. RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference. Am. J. Hum. Genet. 93, 278–288 (2013). 81. Patterson, N. et al. Ancient admixture in human history. Genetics 192, 1065–1093 (2012). Data availability 82. Sankararaman, S. et al. The genomic landscape of Neanderthal ancestry in present-day The whole-genome sequencing dataset generated and analysed humans. Nature 507, 354–357 (2014). in this study is available from the European Genome-Phenome 83. Delaneau, O., Marchini, J. & Zagury, J. F. A linear complexity phasing method for thousands of genomes. Nat. Methods 9, 179–181 (2012). Archive (EGA; https://www.ebi.ac.uk/ega/), under accession code 84. Delaneau, O., Zagury, J. F. & Marchini, J. Improved whole-chromosome phasing for EGAS00001004540. Data access and use is restricted to academic disease and population genetic studies. Nat. Methods 10, 5–6 (2013). research in population genetics, including research on population 85. Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000). origins, ancestry and history. The SGDP genome data were retrieved 86. Kutmon, M. et al. WikiPathways: capturing the full diversity of pathway knowledge. from the EBI European Nucleotide Archive (accession codes PRJEB9586 Nucleic Acids Res. 44, D488–D494 (2016). and ERP010710). The genome data from Malaspinas et al.18 were 87. Buniello, A. et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 retrieved from the EGA (accession code EGAS00001001247). The (2019). genome data from Vernot et al.16 were retrieved from dbGAP (acces- 88. The Gene Ontology Consortium. Gene ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000). sion code phs001085.v1.p1). 89. Deschamps, M. et al. genomic signatures of selective pressures and introgression from archaic hominins at human innate immunity genes. Am. J. Hum. Genet. 98, 5–21 (2016). 90. Enard, D. & Petrov, D. A. Evidence that RNA viruses drove adaptive introgression between Code availability Neanderthals and modern humans. Cell 175, 360–371 (2018). 91. Siepel, A. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast Neutrality statistics were computed with the optimized, window-based genomes. Genome Res. 15, 1034–1050 (2005). algorithms implemented in selink (https://github.com/h-e-g/selink). 92. Rentzsch, P., Witten, D., Cooper, G. M., Shendure, J. & Kircher, M. CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res. 47, All other custom-generated computer codes or algorithms used in this D886–D894 (2019). study are available on GitHub (https://github.com/h-e-g/evoceania). 93. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012). 94. Shriver, M. D. et al. The genomic distribution of population substructure in four 52. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler populations using 8,525 autosomal SNPs. Hum. Genomics 1, 274–286 (2004). transform. Bioinformatics 25, 1754–1760 (2009). 95. Sabeti, P. C. et al. Genome-wide detection and characterization of positive selection in 53. DePristo, M. A. et al. A framework for variation discovery and using human populations. Nature 449, 913–918 (2007). next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011). 96. Speidel, L., Forest, M., Shi, S. & Myers, S. R. A method for genome-wide genealogy 54. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing estimation for thousands of samples. Nat. Genet. 51, 1321–1329 (2019). next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010). 97. GenomeAsia100K Consortium. The GenomeAsia 100K Project enables genetic 55. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based discoveries across Asia. Nature 576, 106–111 (2019). linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007). 56. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015). Acknowledgements We thank all volunteers and Indigenous communities participating in this 57. Manichaikul, A. et al. Robust relationship inference in genome-wide association studies. research; S. Créno and the HPC Core Facility of Institut Pasteur (Paris) for the management of Bioinformatics 26, 2867–2873 (2010). computational resources; F. Mendoza de Leon Jr, NCCA chairperson 2010–2016, for his 58. Sherry, S. T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, support; C. Ebeo, O. Casel, K. Pullupul Hagada, D. Guilay, A. Manera and R. Quilang of Cagayan 308–311 (2001). State University, Lahaina Sue Azarcon and Samuel Benigno of Quirino State University, and the 59. Patterson, N., Price, A. L. & Reich, D. Population structure and eigenanalysis. PLoS Genet. regional and provincial offices of the National Commission for Indigenous Peoples (NCIP)– 2, e190 (2006). Cagayan Valley for their support and assistance. J.C. is supported by the INCEPTION 60. Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in programme ANR-16-CONV-0005 and the Ecole Doctorale FIRE-CRI-Programme Bettencourt unrelated individuals. Genome Res. 19, 1655–1664 (2009). and L.R.A. by a Pasteur-Roux-Cantarini fellowship. The CNRGH sequencing platform was Article supported by the France Génomique National infrastructure, funded as part of the samples; C.H., A.B., R.O. and J.-F.D. coordinated and performed sample preparation and « Investissements d’Avenir » programme managed by the Agence Nationale pour la Recherche sequencing; F.V. provided the archaeological and anthropological context; G.L. and L.E. (ANR-10-INBS-09). M.J. is supported by the Knut and Alice Wallenberg foundation. M.S. is provided the theoretical and methodological context; J.C., J.M.-R., L.R.A., E.P. and L.Q.-M. supported by the Max Planck Society. The laboratory of L.Q.-M. is supported by the Institut wrote the manuscript, with critical input from all authors. Pasteur, the Collège de France, the CNRS, the Fondation Allianz-Institut de France and the French Government’s Investissement d’Avenir programme, Laboratoires d’Excellence Competing interests The authors declare no competing interests. ‘Integrative Biology of Emerging Infectious Diseases’ (ANR-10-LABX-62-IBEID) and ‘Milieu Intérieur’ (ANR-10-LABX-69-01). Additional information Supplementary information The online version contains supplementary material available at Author contributions E.P. and L.Q.-M. conceived and supervised the project; J.C. led and https://doi.org/10.1038/s41586-021-03236-5. performed the processing of the genetic data as well as the analyses of population structure Correspondence and requests for materials should be addressed to E.P. or L.Q.-M. and demographic inference; J.M.-R. led and performed the analyses of archaic and adaptive Peer review information Nature thanks Patrick Kirch, Cosimo Posth and the other, anonymous, introgression; L.R.A. led and performed the analyses of genetic adaptation; S.C.-E., R.L. and P.V. reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are performed the analyses of admixture models; E.P. coordinated all genetic analyses; O.C., M.L., available. A.M.-S.K., Y.-C.K., M.J., A.G. and M.S collaborated with local groups to collect population Reprints and permissions information is available at http://www.nature.com/reprints. (bottom) for the 462 unrelated individuals. The lowest cross-validation error error cross-validation lowest The individuals. unrelated 462 the for (bottom) K = 2 from (top) K = 10 to shown are components ancestry ADMIXTURE populations. Pacific of structure 1|Genetic Fig. Data Extended K=10 K=9 K=8 K=7 K=6 K=5 K=4 K=3 K=2

Africa SGDP (35) Western Eurasia SGDP (36) EastAsia SGDP (34) Ami SGDP (2) Atayal SGDP (1) Atayal (10) Paiwan (7) Igorot SGDP (2) Cebuano (20) Agta (18) Dusun SGDP (2) Australian SGDP (2) Papuan SGDP (15) Bundi (5) Kundiawa (5) Marawaka (5) Mendi (5) Tari (5) Ata (2) Baining (2) Melamala (1) Mamusi (2)

which is indicated in parentheses. in indicated is which size, sample population to proportional not is width Population borders. black by delimited are Populations Fig. 5). K = 6 at (Supplementary obtained was Nakanai Bileki (15) Pasismanua (1) Mussau (1) Nailik (1) Lavongai (1) Bougainville (2) Vella Lavella (17) Malaita (17) Rennell (4) Bellona (14) Tikopia (10) Santa Cruz (14) Ureparapara (14) Maewo (8) Ambae (13) Santo (17) Pentecost (13) Ambrym (16) Malakula (17) Emae (13) Efate (19) Tanna (17) Hawaiian (1) Maori (1) Article

a Time (kya) Archaic Introgression Introgression rate (%) Time (kya)

127

66[62-69] 2.2[1.5-2.7] 58[53-59] 61[56-62] 57[52-59] 0.36[0.35-1.9] Neanderthal 52[47-54] 47[42-49] 3.6[3.2-4.1] 42[35-44] Denisova

YRB OoA GST SAR HAN TWN PNG Present Present

b

?

Log10 Likelihood: -10,332,265Log10 Likelihood: -10,332,302 SAR HANTWN VAN PNG NOC SLI BKA GST Time (kya) Admixture rate (%) Time (kya) Admixture rate (%)

16[12-18] 12[10-16]

9.1[6.3-13] 44[27-57] 8.4[5.8-13] 8.4[5.8-13] 6.8[4.1-11] 24[14-41] 42[25-59] 37[27-56] 4.7[2.9-6.3] 4.3[2.6-5.9] 35[31-40] 36[30-38] 3.4[1.1-7.3] 0.04[0.004-1.6] 2.0[1.1-7.5] 0.2[0.008-2.5] 1.1[0.1-1.5] 0.4[0.2-1.5] 34[31-43] 39[34-48] Present TWNPVAN NG NOC SLI BKA Present TWN VAN PNG NOC SLI BKA GST GST c Time (kya) Admixture rate (%)

58[53-59] 57[52-59]

37[29-41]

28[22-30]

7.3[6.4-11]

1.5[1.4-2.4] 35[32-36] 0.5[0.4-1.1] SAR HAN TWN PHPPOL PNG NOC Present GST

Extended Data Fig. 2 | See next page for caption. Extended Data Fig. 2 | Demographic models for Pacific populations. of Near Oceanian individuals; OoA GST, an unsampled population to represent a, Maximum-likelihood demographic model for baseline populations. Point the Out-of-Africa exodus; PHP, Philippine individuals; PNG, Papua New Guinean estimates of parameters and 95% confidence intervals are shown in Highlanders; POL, Polynesian individuals from the Solomon Islands; SAR, Supplementary Table 2. b, Maximum-likelihood demographic models for Sardinian individuals (Italy); SLI, Solomon Islanders; TWN, Taiwanese western Remote Oceanian individuals (VAN). The likelihoods of the two models Indigenous peoples; VAN, ni-Vanuatu; YRB, Yoruba individuals (Nigeria). We are not considered to be different. Point estimates of parameters and 95% assumed a mutation rate of 1.25 × 10−8 mutations per generation per site and a confidence intervals are shown in Supplementary Table 5. The (VAN, PNG) generation time of 29 years. Single-pulse introgression rates are reported as a model (left) assumes that the ni-Vanuatu diverged from Papua New Guinean percentage. The 95% confidence intervals are shown in square brackets. The Highlanders and then received gene flow from Solomon Islanders, Bismarck larger the rectangle width, the larger the estimated effective population size

Islanders and Austronesian-speaking Taiwanese Indigenous peoples. The (Ne), except for b. Bottlenecks are indicated by black rectangles. Grey and black (VAN, SLI) (right) model assumes that the ni-Vanuatu diverged from the arrows represent continuous and single pulse gene flow, respectively. One- and Solomon Islanders and then received gene flow from the other three groups. two-directional arrows indicate asymmetric and symmetric gene flow, For the sake of clarity, only Taiwanese Indigenous, Near Oceanian and western respectively. We limited the number of parameter estimations by making Remote Oceanian populations are shown. c, Maximum-likelihood model for simplifying assumptions regarding the recent demography of East-Asian- Austronesian-speaking populations, represented by Taiwanese Indigenous, related and Near Oceanian populations in a and c, respectively (Supplementary Philippine Kankanaey and Tikopia Polynesian individuals. BKA, Bismarck Note 4). Sample sizes are described in Supplementary Note 4. Islanders; HAN, Han Chinese individuals (China); NOC GST, a meta-population Article Atayal (TWN) Paiwan (TWN) Agta (AGT) Cebuano (PHP) Papuans (PNG) Nakanai Bileki (BKA) .0 .0 .0 .0 .0 .0 0.4 0.5 0.3 0.5 0.3 0.5 0. n n n n n n 0.3 0.3 1 0.7 5 3 0 0.6 0.7 0.7 61 61 61 61 61 61 va va va va va va 0.4 0.5 0 0.3 1 2 2 1 0 0. 0. 0. 0. 0. .3 3 2 0. 4 . 0.4 0.4 0. 9 0.5 0.4 0.3 3 3 0.7 0.6 1 0.6 1 3 0.6 0.3 4 2 1 0.9 1 0.3 4 0.8 1 Deniso Deniso Deniso Deniso Deniso Deniso 0.9 0.7 1 2 6 0.2 0.2 0.2 0.2 0.6 0.2 0.3 0.2 2 0 0.4 0.4 0.3 .8 0.3 0. 2 5 0.6 0. 0.3 2 0.4 5 5 4 0 8 0.4 2 0.5 0.2 0.6 1.0 0.2 0.6 1.0 0.2 0.6 1.0 0.20.6 1.0 0.2 0.6 1.0 0.2 0.6 1.0 Vindija Neanderthal Vindija Neanderthal Vindija Neanderthal Vindija Neanderthal Vindija Neanderthal Vindija Neanderthal Vella Lavella (SLI) Malaita (SLI) Santa Cruz (SCI) Rennell/Bellona (POL) Tikopia (POL) Ureparapara (VAN) .0 1. 0 1. 0 1. 0 1. 0 1. 0 0.3 0.4 0.3 0.4 0.4 n 0.5 n n 0.5 0.5 0.6 0.5 0.7 0.3 0.3 an an an 1 1 1 0.6 va 0.7 va va 0.8 2 2 0.7 0.3 0.8 0.6 0. 61 0.6 0.6 0.6 0.6 0.5 0.4 0.6 3 0.3 7 0.4 0.4 0.5 3 0.7 1 0.6 0.6 0.6 0. 1 1 0.81 2 0.9 1 0.9 2 1 2 0.9 0.7 0.8 Deniso Denis ov Deniso Deniso Denis ov 1 0.9 0.9 10.8 Denis ov 0.8 0.7 1 0.8 0.2 0.2 0.2 0.2 0.6 0.2 0.2 0. 0.5 0.4 0 0. 0.3 2 0.3 0. 0.4 0.3 .6 0.3 0.3 0.4 0. 7 4 2 2 0.3 2 0.4 7 2 9 2 0.2 0.6 1.0 0.2 0.6 1.0 0.2 0.6 1.0 0.20.6 1.0 0.2 0.6 1.0 0.2 0.6 1.0 Vindija Neanderthal Vindija Neanderthal Vindija Neanderthal Vindija Neanderthal Vindija Neanderthal Vindija Neanderthal Maewo (VAN) Santo (VAN) Ambae (VAN) Pentecost (VAN) Ambrym (VAN) Malakula (VAN) .0 .0 .0 .0 .0 1. 0 0.3 0.3 0.5 0.5 0.5 0.4 0.3 n n 0.5 n 0.4 n n 0.4 0.3 0.3 n 0.6 1 0. 0.7 0.8 0.7 0.7 va va va 2 va va va 2 2 2 2 .6 7 0. 61 0. 61 0. 61 0.3 0. 61 0. 61 0.4 0.4 3 0.4 0.6 0.6 0.6 0. 3 0.5 0.6 3 0.6 1 0.8 1 0.5 1 1 1 7 3 0.91 0.7 2 0.7 9 0.8 1 . Deniso Deniso 0.9 0.9 Deniso Deniso 0.8 Deniso 0.9 1 Deniso 2 1 1 0.9 1 0 0 0.2 0.2 0.2 .5 0.2 0.6 0.2 0. 20 0.4 0.3 0.5 0.3 0.5 0.4 2 0.3 0 0.3 0. 2 0.4 0. 0.3 0.3 2 1 .4 2 4 5 2 0.4 1 0.2 0.6 1.0 0.2 0.6 1.0 0.2 0.6 1.0 0.20.6 1.0 0.2 0.6 1.0 0.2 0.6 1.0 Vindija Neanderthal Vindija Neanderthal Vindija Neanderthal Vindija Neanderthal Vindija Neanderthal Vindija Neanderthal Emae (VAN) Efate (VAN) Tanna (VAN) East Asians (ASN) West Eurasians (EUR) .0 .0 1. 0 1. 0 1. 0 0 0.4 .5 0.5 0.3 0.3 0.5 0.4 0.3 n n n n n 1 0.4 4 0.5 0.6 0.7

0 va va va va va 0. 2 2 .3 1 1 . 9 0.6 0.6 0.6 0.7 0. 61 0. 61 0 0.3

3 0.4 0.6 2 0.5 0 3 1 1 .7 0.7 0.3 2 2 0.6 1 0.3 4 5 0.8 Deniso Deniso Deniso Deniso Deniso 0.8 0.8 0.8 4 7 1 1 5 0.4 0.5 0.2 0.2 0.2 0.2 0.2 0.3 0. 0.3 0. 6 0.4 6 0.7 0.3 0. 5 2 3 2 6 1 0.4 3 6 0.2 0.6 1.0 0.2 0.6 1.0 0.2 0.6 1.0 0.20.6 1.0 0.2 0.6 1.0 Vindija Neanderthal Vindija Neanderthal Vindija Neanderthal Vindija Neanderthal Vindija Neanderthal

Extended Data Fig. 3 | Match rate of introgressed S′ haplotypes in Pacific more than 40 sites outside archaic genome masks were included in the analysis. populations to the Vindija Neanderthal and Altai Denisovan genomes. The The numbers indicate the height of the density corresponding to each contour match rate is the proportion of putative archaic alleles matching a given line. Contour lines are shown for multiples of 1 (solid lines) and multiples of 0.1 archaic genome, excluding sites at masked positions. Only S′ haplotypes with between 0.3 and 0.9 (dashed lines). a

300

S'

200 S'NoNeanderthal

S'NoDenisova

S'NoArchaic S' 100 NoArchaicLowMatch S' haplotypes (Mb)

0 AN ) AN) AN) AN ) AN ) AN) GT ) AN) AN ) AN) AN) uz (SCI ) al (TWN) ella (SLI) wo (V ym (V ate (V an (TWN) v y opia (POL) Agta (A anna (V Ef uano (PHP) iw Emae (V T Malaita (SLI) puans (PNG) Ata Ambae (V Mae Tik Pa Ambr entecost (V itu Santo (V Malakula (V Pa Ceb P ella La Santa Cr V East Asians (ASN) Ureparapara (V est Eurasians (EUR) Espir Nakanai Bileki (BKA ) Rennell/Bellona (POL) W

b 1.00

haplotypes 0.75 ch t Ma

0.50 Shared chaicLow r

A Private o N

0.25 tion of S'

0.00 Propor GT (N=5) SLI (N=3 ) AN (N=25) BKA (N=2) POL (N=1) A PHP (N=6 ) EUR (N=4) PNG (N=6 ) SCI + V ASN + TWN (N=7)

Extended Data Fig. 4 | Detection of introgressed haplotypes from an unknown archaic hominin. a, Cumulative length of S′ haplotypes retrieved among modern human populations (S′), after removing Neanderthal CRF haplotypes (S′NoNeanderthal) or Denisovan CRF haplotypes (S′NoDenisova) or both

(S′NoArchaic), and removing from the S′NoArchaic haplotypes those with a match rate higher than 1% to either the Vindija Neanderthal or Altai Denisovan genomes

(S′NoArchaicLowMatch). These S′ haplotypes are, therefore, putatively introgressed haplotypes from hominins outside of the Neanderthal and Denisovan branch

(Supplementary Note 13). b, Proportion of S′NoArchaicLowMatch haplotypes common or private (that is, unique) to populations. Total numbers of S′NoArchaicLowMatch haplotypes are shown above the population labels. Article

a 0.8 rs367689451   1000 Genomes Project This study           0.6     

0.4                    

aSNPs      0.2                      Denisovan Denisovan        0.0              0.8 0.6 0.4 HC CRF Denisova 0.2 0.0 SIGLECL1 VSIG10L NKG7 CD33 ETFB IGLON5 CLDND2 51.76 51.8 51.84 51.88 b Chromosome 19 (Mb) 0.8 rs376970869      1000 Genomes Project This study  0.6  

             0.4                            aSNPs             0.2                            

Denisovan Denisovan                             0.0                           0.8 0.6 0.4 HC CRF Denisova 0.2 0.0 IRF4 HUS1B DUSP22 EXOC2

0.3 0.4 0.5 0.6 0.7 c Chromosome 6 (Mb) 0.8  rs2360653         1000 Genomes Project This study 0.6 



        0.4                    aSNPs       0.2       Neanderthal        0.0   0.8 0.6 0.4

HC CRF 0.2 Neanderthal 0.0 LINC00592 KRT7 KRT80 C12orf80 52.56 52.58 52.6 52.62 52.64 d Chromosome 12 (Mb) 0.8 rs2303423                   1000 Genomes Project This study             0.6                                                                            0.4                                                           aSNPs           0.2                 

Neanderthal     0.0  0.8 0.6 0.4

HC CRF 0.2 Neanderthal 0.0 PTTG2 TBC1D1 PGM2

37.9 38 38.1 e Chromosome 4 (Mb) 0.8 rs368334238 1000 Genomes Project This study 0.6                     0.4                       aSNPs      0.2                    Denisovan Denisovan 0.0                   0.8 0.6 0.4 HC CRF Denisova 0.2 0.0 JAK1 MIR101-1 RAVER2 MIR3671 AK128734 L6INC01359 U

65.2 65.3 65.4 65.5 Chromosome 1 (Mb) f 0.8 rs17031656 1000 Genomes Project This study 0.6  



      0.4               aSNPs                0.2                

Denisovan Denisovan                                                                  0.0                            0.8 0.6 0.4 HC CRF Denisova 0.2 0.0 FLJ20021 BANK1 SLC39A8 PPP3CA

102 102.5 103

Extended Data Fig. 5 | Examples of candidate loci for adaptive proportion of high-confidence introgressed haplotypes (HC CRF) and the gene introgression in Pacific populations. a, Adaptive introgression of Denisovan isoforms at the locus (in Mb, based on hg19 coordinates). Middle, derived allele origin at the CD33 locus. b, Adaptive introgression of Denisovan origin at the frequencies of the top archaic SNP in 1000 Genomes Project phase 3 IRF4 locus. c, Adaptive introgression of Neanderthal origin at the KRT80 locus. populations (excluding recently admixed populations). Right, derived allele d, Adaptive introgression of Neanderthal origin at the TBC1D1 locus. frequencies of the top archaic SNP in populations from this study. Colours in e, Adaptive introgression of Denisovan origin at the JAK1 locus. f, Adaptive the left panels indicate populations as in Fig. 1. Pie charts indicate the derived introgression of Denisovan origin at the BANK1 locus. a–f, Left, local Manhattan allele frequency in purple, and are centred on the approximate geographical plot showing the derived allele frequency of archaic SNPs (aSNPs), the location of each population. Maps were generated using the maps R package51. Extended Data Fig. 6 | Classic sweep signals detected in Papuan-related Agta (d). a–d, The y axis shows the −log10(P value) for the number of outlier populations. a–d, Manhattan plots of classic sweep signals in Papua New SNPs per window. Each point is a 100-kb window. The names of genes Guinean Highlanders (a), Solomon Islanders (b), ni-Vanuatu (c) and Philippine associated with windows with significant sweep signals are shown. Article

a Neanderthal

TWN: classic sweep TWN: introgression

PHP: classic sweep PHP: introgression

POL: classic sweep POL: introgression

VAN: classic sweep VAN: introgression

SLI: classic sweep SLI: introgression

PNG: classic sweep PNG: introgression

AGT: classic sweep AGT: introgression

2 0 3 B 6 1 1 1 F A 1 0 P 1 4 N D N P A T2 5A PK 18 A4 OX PA3 A F CS S OAF TDGIARS OD2 OR2 AN2 OGOM IP LIFR T8D2 NOL8 OCA2 PASK PAI ANO7R APOIL23P ST DTX3KI 4K2C LRIG3 ASPECM2 EY STT3 CNTN5 GNATITGB1WWOSMYPD6BR TENM3 KCNH7 RPS1NDST4 1 ALNT HDLB PTCH1 CNPY2 BICD2 AM155AGL POU2FSIPA1L2 L NT5DC350K20.3 C5orf28C5orf34CR ZNF444 NFE2L2 C9orf8 CENPP NCOR2LZTFL1AMT COL25A1F OSBPL1 SEMA3F 6 SLC35F4 SLC36 FRMD4AUNC13C PIP ONECUT − MA HNRN SLC26A1 C104841.2SLCO3A AC024580. ARHGEF25B4G A AD RP11 b Denisovan

TWN: classic sweep TWN: introgression

PHP: classic sweep PHP: introgression

POL: classic sweep POL: introgression

VAN: classic sweep VAN: introgression

SLI: classic sweep SLI: introgression

PNG: classic sweep PNG: introgression

AGT: classic sweep AGT: introgression

2 1 B 0 D 6 2 1 1 3 5 P 3 B 1 9 1 K X3 M2 P1 A V3 NP ER2 O AR AS1 B AC2 AIP P ING4 − KTN1 JAK1 ERI1 EZH1 LT8D SGCZ HRH CUBN N TIAL CRB COLQ CD RPL1ST PIA DLEU1G HCFC2AC BAN THEM4CASP WARSEFR3B CSMD A SAMDKCNTIQSEC12 SRPRRAB6B C3orf3PDE76 BGSDMB FAM155A S100A1 ZNF385 TNF UNC13C LRRC8CTOP ORMDL3 CRISPLD2SLCO2 CNTNAP CCDC109 PLA2G12A ARHGEF28 KTN1 AL590867.1

Extended Data Fig. 7 | Classic sweeps and adaptive archaic introgression. Denisovans (b). Yellow and blue frames indicate genomic regions identified in a, b, Coloured squares indicate genomic regions displaying signals of both a East-Asian- and Papuan-related populations, respectively. AGT, Philippine selective sweep and adaptive introgression from Neanderthals (a) or Agta; PHP, Philippine individuals. Extended Data Fig. 8 | Examples of candidate loci for classic sweeps in (LHFPL2) (d). Pie charts indicate the derived allele frequency in purple, in which Pacific populations. a, c, Sweep signals detected in Papuan-related the radius is proportional to the sample size (Supplementary Table 1). The pie populations at the GABRP locus (a) and in Polynesian populations at the LHFPL2 charts for the populations of Santa Cruz and Vanuatu were moved from their locus (c). Manhattan plots shows the −log10(P value) of the Fisher’s scores for sampling locations for convenience. Maps were generated using the maps R each SNP (Supplementary Note 16). b, d, Maps showing the population allele package51. frequencies for candidate SNPs rs79997355 (GABRP) (b) and rs117421341 Article

Extended Data Fig. 9 | Classic sweep signals detected in East-Asian-related individuals (d). a–d, The y axis shows the −log10(P value) for the number of outlier populations. Manhattan plots of classic sweep signals in East Asian individuals (a), SNPs per window. Each point is a 100-kb window. The names of genes associated Taiwanese Indigenous peoples (b), Philippine Cebuano (c) and Polynesian with windows with significant sweep signals are shown. Extended Data Fig. 10 | Schematic model of the history of archaic Australian individuals and Philippine Agta populations14,15,17,97; a Denisovan introgression in modern humans. The phylogenetic tree depicts introgression event that occurred only in the ancestors of Papuan individuals relationships among archaic and modern humans. Estimates for the splits around 25 ka; a Denisovan introgression event in the ancestors of East Asian between archaic, introgressing populations and for introgression episodes are individuals around 21 ka, the legacy of which is also observed in Philippine Agta shown. Five introgression events are consistent with our data: a Neanderthal and western Eurasian individuals due to subsequent gene flow (solid purple introgression event into the common ancestors of non-African individuals arrows); and a Denisovan introgression event into the ancestors of the around 61 ka; a Denisovan introgression event into the ancestors of Papuan Philippine Agta at an unknown date. individuals approximately 46 ka, which is shared with the ancestral Indigenous        $%&&'()%012034567%&8(9@ AQ52(j5206404ga5&H2    

A4(65)146'1BC4567%&8(9@f'HklWmnmn 

   

D')%&6203E5FF4&C     

G465&'D'('4&H7I2(7'(6%2F)&%P'67'&')&%15H2B2Q26C%R67'I%&S6746I')5BQ2(7TU72(R%&F)&%P21'((6&5H65&'R%&H%0(2(6'0HC4016&40()4&'0HC  

20&')%&6203TV%&R5&67'&20R%&F462%0%0G465&'D'('4&H7)%Q2H2'(W(''%5&X126%&24QY%Q2H2'(40167'X126%&24QY%Q2HC$7'HSQ2(6T     E6462(62H( V%&4QQ(6462(62H4Q404QC('(WH%0R2&F674667'R%QQ%I20326'F(4&')&'('062067'R235&'Q'3'01W64BQ'Q'3'01WF4206'`6W%&a'67%1(('H62%0T

0b4 $%0R2&F'1

o U7''`4H6(4F)Q'(2c'8d9R%&'4H7'`)'&2F'064Q3&%5)bH%01262%0W32P'04(412(H&'6'05FB'&4015026%RF'4(5&'F'06

o e(646'F'06%0I7'67'&F'4(5&'F'06(I'&'64S'0R&%F12(620H6(4F)Q'(%&I7'67'&67'(4F'(4F)Q'I4(F'4(5&'1&')'46'1QC U7'(6462(62H4Q6'(68(95('1eGfI7'67'&67'C4&'%0'g%&6I%g(21'1 o hdipqrsttsdquvwuwqwxsyi€qvq€vwr‚ƒv€qwsivipqpqd„tv q€vwr‚ƒvqts‚vqrst†iv‡quvrxdƒˆyvwqƒdquxvq‰vuxs€wqwvruƒsd

o e1'(H&2)62%0%R4QQH%P4&246'(6'(6'1

o e1'(H&2)62%0%R40C4((5F)62%0(%&H%&&'H62%0(W(5H74(6'(6(%R0%&F4Q26C40141‘5(6F'06R%&F5Q62)Q'H%F)4&2(%0( eR5QQ1'(H&2)62%0%R67'(6462(62H4Q)4&4F'6'&(20HQ51203H'06&4Q6'01'0HC8'T3TF'40(9%&%67'&B4(2H'(62F46'(8'T3T&'3&'((2%0H%'RR2H2'069 o eGfP4&2462%08'T3T(64014&11'P2462%09%&4((%H246'1'(62F46'(%R50H'&64206C8'T3TH%0R21'0H'206'&P4Q(9

V%&05QQ7C)%67'(2(6'(6203W67'6'(6(6462(62H8'T3T’WuW‚9I267H%0R21'0H'206'&P4Q(W'RR'H6(2c'(W1'3&''(%RR&''1%F401“P4Q5'0%6'1 o ”ƒ•vq“q•„iyvwq„wqv‡„ruq•„iyvwq–xvdv•v‚qwyƒu„iv

o V%&—4C'(240404QC(2(W20R%&F462%0%067'H7%2H'%R)&2%&(401a4&S%PH7420a%06'$4&Q%('66203(

o V%&72'&4&H72H4Q401H%F)Q'`1'(230(W21'062R2H462%0%R67'4))&%)&246'Q'P'QR%&6'(6(401R5QQ&')%&6203%R%56H%F'(

o X(62F46'(%R'RR'H6(2c'(8'T3T$%7'0˜(€WY'4&(%0˜(‚9W2012H462037%I67'CI'&'H4QH5Q46'1

hy‚q–vqrsiivruƒsdqsdqwu„uƒwuƒrwq™s‚qƒsisdƒwuwqrsdu„ƒdwq„‚uƒrivwqsdqt„dpqs™quxvq†sƒduwq„s•v E%R6I4&'401H%1' Y%Q2HC20R%&F462%04B%564P42Q4B2Q26C%RH%F)56'&H%1' f464H%QQ'H62%0 Y2H4&1U%%Q(PTmTpTkW—eePTnTqTkrWfeUsPTrTpWPHR6%%Q(nTkTkrW—$V6%%Q(PTkTpWYAtGsPTkTuWstGfPTmTk

f464404QC(2( XtfXGEvVUPTqTmTkWefatwUxDXPTkTmmWYvGfPTkTlWg4)Q%P2'IPTlTmWEgeYXtUmWR4(6(2FH%4QPTmTyWDPTrTl%&Q46'&W4BHD)4HS43'PTmTkWa'6g2( PTkTnWefatwUvvAEPTzTkTkWEg)&2F'PTnqf'HkpTz'mW$DV8E40S4&4&4F40'64QTWG465&'mnkl9W('Q20SPTm8IIIT32675BTH%Fb7g'g3b('Q20S9W e&Q'i520PTrTzTmTmW%67'&H5(6%Fg3'0'&46'1(H&2)6(4&'1')%(26'1%0f26g5B8IIIT32675BTH%Fb7g'g3b'P%H'40249 V%&F405(H&2)6(562Q2c203H5(6%F4Q3%&267F(%&(%R6I4&'67464&'H'06&4Q6%67'&'('4&H7B560%6C'61'(H&2B'120)5BQ2(7'1Q26'&465&'W(%R6I4&'F5(6B'F41'4P42Q4BQ'6%'126%&(401 &'P2'I'&(Te'(6&%03QC'0H%5&43'H%1'1')%(262%0204H%FF5026C&')%(26%&C8'T3Tf26g5B9TE''67'G465&'D'('4&H73521'Q20'(R%&(5BF266203H%1'h(%R6I4&'R%&R5&67'&20R%&F462%0T

f464 Y%Q2HC20R%&F462%04B%564P42Q4B2Q26C%R1464 eQQF405(H&2)6(F5(620HQ51'414644P42Q4B2Q26C(646'F'06U72((646'F'06(7%5Q1)&%P21'67'R%QQ%I20320R%&F462%0WI7'&'4))Q2H4BQ'@

geHH'((2%0H%1'(W502i5'21'062R2'&(W%&I'BQ20S(R%&)5BQ2HQC4P42Q4BQ'1464('6(   

geQ2(6%RR235&'(674674P'4((%H246'1&4I1464 ! "

ge1'(H&2)62%0%R40C&'(6&2H62%0(%014644P42Q4B2Q26C # " # U7'I7%Q'3'0%F'1464('63'0'&46'1401404QC('120672((651C2(4P42Q4BQ'R&%F67'X5&%)'40f'0%F'gY7'0%F'e&H72P'8Xfe9W501'&4HH'((2%0H%1' XfeEnnnnknnlzlnTU7'EffY3'0%F'1464I'&'&'6&2'P'1R&%F67'X—tX5&%)'40G5HQ'%621'e&H72P'84HH'((2%005FB'&(@YD{X—uzpy401XDYnknqkn9TU7' 3'0%F'1464R&%Fa4Q4()204('64QTWG465&'mnkyI'&'&'6&2'P'1R&%FXfe84HH'((2%005FB'&@Xfefnnnnknnkyrl9TU7'3'0%F'1464R&%F|'&0%6'64QTWEH2'0H' mnkyI'&'&'6&2'P'1R&%F1BfeY84HH'((2%005FB'&@)7(nnknpzTPkT)k9T

   

V2'Q1g()'H2R2H&')%&6203    YQ'4('('Q'H667'%0'B'Q%I67462(67'B'(6R26R%&C%5&&'('4&H7TtRC%54&'0%6(5&'W&'4167'4))&%)&246'('H62%0(B'R%&'F4S203C%5&('Q'H62%0T    

A2R'(H2'0H'( —'74P2%5&4Qh(%H24Q(H2'0H'( o XH%Q%32H4QW'P%Q562%04&Ch'0P2&%0F'064Q(H2'0H'( 

V%&4&'R'&'0H'H%)C%R67'1%H5F'06I2674QQ('H62%0(W(''0465&'TH%Fb1%H5F'06(b0&g&')%&6203g(5FF4&CgRQ46T)1R        

XH%Q%32H4QW'P%Q562%04&Ch'0P2&%0F'064Q(H2'0H'((651C1'(230  

eQQ(6512'(F5(612(HQ%('%067'(')%206('P'0I7'067'12(HQ%(5&'2(0'3462P'T   E651C1'(H&2)62%0 e'('i5'0H'167'3'0%F'%R€rnnY4H2R2Ht(Q401'&(467237H%P'&43'8€rn`9W6%1'(H&2B'67'3'0'62H12P'&(26C%R75F40)%)5Q462%0( R&%F672(501'&(6512'1&'32%0TY%)5Q462%03'0'62H(404QC('(I'&'5('16%20R'&82967'3'0'62H(6&5H65&'W82291'F%3&4)72H72(6%&CW82229  67'Q'P'Q(%R4&H742H206&%3&'((2%040182P9H4012146'Q%H24016&426(501'&)%(262P'('Q'H62%020G'4&401D'F%6'vH'40240(T 

D'('4&H7(4F)Q' e'('i5'0H'167'3'0%F'%Rrkq2012P2154Q(R&%Fmn75F40)%)5Q462%0(6746I'&'H7%('06%H%P'&43'%3&4)72H6&40('H667%5376 6%501'&Q2'67')'%)Q20372(6%&C%RG'4&401D'F%6'vH'4024TU72(20HQ51'(U42I40W67'Y72Q2))20'(W67'E%Q%F%0t(Q401(WE4064$&5c 40167'|405465e&H72)'Q43%TU7'('0'IQC(4F)Q'1)%)5Q462%0(I'&'404QC('120H%FB20462%0I267%67'&)%)5Q462%0(R&%F67'e(24g Y4H2R2H&'32%0R%&I72H73'0%F'(4&'4P42Q4BQ'W20HQ51203Y4)54G'If520'4W67'—2(F4&HSe&H72)'Q43%401X4(6e(24TE4F)Q'1 2012P2154Q(4&'F'4066%&')&'('06G'4&vH'40240(8Y4)54G'If520'40(W—2(F4&HS401E%Q%F%02(Q401'&(9WI'(6'&0D'F%6' vH'40240(802g|4054659We5(6&%0'(240g()'4S2033&%5)(8U42I40'('4B%&2320'(WY72Q2))20'$'B540%9WY%QC0'(240g()'4S203)%)5Q462%0( 8Y%QC0'(240%56Q2'&(R&%F67'E%Q%F%0t(Q401(9W401Y72Q2))20'˜G'3&26%(˜8Y72Q2))20'e3649TU7'(651C(4F)Q'I4(4Q(%H7%('06% H74&4H6'&2c'203&'461'642Q67'3'0%F2H12P'&(26C%R75F40)%)5Q462%0(67464&'501'&(6512'12075F403'0%F2H(T

E4F)Q203(6&46'3C Y%)5Q462%0(I'&'(4F)Q'16%H%P'&43'%3&4)72H6&40('H667%53766%501'&Q2'67')'%)Q20372(6%&C%RG'4&401D'F%6'vH'4024T E4F)Q203%R&'Q46'12012P2154Q(I4(4P%21'1WB'H45('&'Q46'10'((H40H%0R%501)%)5Q462%03'0'62H(404QC('(TU7''670%gQ20352(62H 3&%5)%R(4F)Q'12012P2154Q(I4(1'R20'1B4('1%067'('QRg1'HQ4&'13&%5)%R67'2&)4&'06(4013&401g)4&'06(Te04P'&43'%R0ky 50&'Q46'12012P2154Q(I'&'(4F)Q'1)'&)%)5Q462%0TE4F)Q'(2c'R%&1'F%3&4)72H20R'&'0H'I267R4(6(2FH%4Qm2(5(54QQC0z 8a4Q4()204('64QTWG465&'mnky9TV%&4&H742H206&%3&'((2%0401)%(262P'('Q'H62%0404QC('(W)%I'&F420QC1')'01(%0%67'&R4H6%&( 6740(4F)Q'(2c'WB56262(H%FF%0QC4HH')6'167460mn)&%P21'(7237)%I'&8Y2HS&'QQ'64QTWf'0%F'D'(mnnu9Te'675(F'&3'1 HQ%('QCg&'Q46'1)%)5Q462%0(206%)%)5Q462%03&%5)(R%&67'('404QC('(T

f464H%QQ'H62%0 eQQ1'F%3&4)72H20R%&F462%0I4(H%QQ'H6'167&%5374(6&5H65&'1i5'(62%0042&'401b%&'670%3&4)72H206'&P2'I(TfGeI4(%B6420'1 R&%F)'&2)7'&4QI7%Q'BQ%%1BCP'0')50H65&'W%&(4Q2P4BCv&43'0'S26(401H7''S(I4B(TU7'(4F)Q203(5&P'C%RU42I40'(' 4B%&2320'(I4(H%015H6'1BCeQB'&6s%8t0(62656'%R|'&6'B&46'Y4Q'%06%Q%3C401Y4Q'%4067&%)%Q%3CW$72049TU7'(4F)Q203(5&P'C%R E%Q%F%0t(Q401'&(I4(H%015H6'1BCa4&SE6%0'S2038a4`YQ40HSt0(62656'R%&XP%Q562%04&Ce067&%)%Q%3CWf'&F40C9TU7'(4F)Q203 (5&P'C%RG2g|405465I4(H%015H6'1BCvQ2P2'&$4((4&401e06%20'f'((4208t0(62656Y4(6'5&WY4&2(9TU7'(4F)Q203(5&P'C%RY72Q2))20' G'3&26%(I4(H%015H6'1BCa4`2F2Q240A4&'048g5F40XP%Q562%0Wf')4&6F'06%Rv&3402(F4Q—2%Q%3CWx))(4Q4x02P'&(26CWEI'1'09T

U2F203401()4624Q(H4Q' U7'(4F)Q203(5&P'C%RU42I40'('4B%&2320'(I4(H%015H6'1B'6I''0kuup401mnnkTU7'(4F)Q203(5&P'C%RE%Q%F%02(Q401'&(I4( H%015H6'120e535(6401E')6'FB'&mnnlTU7'(4F)Q203(5&P'C%R02g|405465I4(H%015H6'1B'6I''0e)&2Qmnnr401e535(6mnnzT U7'(4F)Q203(5&P'C%RY72Q2))20'G'3&26%(I4(H%015H6'1B'6I''0mnkz401mnkpTU7'62F203%R(4F)Q203(5&P'C(I4(1'6'&F20'1 B4('1%0Q%32(62H&'i52&'F'06(67461')'01'1%067'4HH'((2B2Q26C%R(4F)Q203(26'(401R2040H24Q&'(%5&H'(T

f464'`HQ5(2%0( E4F)Q'(I'&''`HQ51'12R67'C(7%I'1'P21'0H'%R829fGeH%064F20462%0W8229)4&'064Q&'Q46'10'((W82229&'Q46'10'((6%%67'&(4F)Q'(W %&82P93'0'62H40H'(6&CR&%F)%)5Q462%0(%56(21'%RvH'4024401X4(6bE%567'4(6e(24TeQQ'`HQ5(2%0H&26'&24I'&')&'g'(64BQ2(7'1T

D')&%15H2B2Q26C e'H%F)4&'13'0%6C)'H4QQ(%B6420'1BC0'`6g3'0'&462%0('i5'0H2036%EGY3'0%6C)2034&&4C(R%&67'(4F'2012P2154Q( 850)5BQ2(7'114649401R%501P'&C7237H%0H%&140H'&46'(8€uuTuu‚9TG%%67'&'`)'&2F'064Q1464I'&'H%QQ'H6'1T

D401%F2c462%0 U%4P%21B46H7'RR'H6(W2012P2154Q(I'&'&401%F2c'14HH%&12036%67'2&)%)5Q462%0%R%&2320W4H&%((Q2B&4&C)&')4&462%0B46H7'(T

—Q201203 —Q201203I4(0%6&'Q'P40620672((651CB'H45('0%H%01262%0%&(6465(I4(H%F)4&'14H&%(((4F)Q'12012P2154Q(T

f2167'(651C20P%QP'R2'Q1I%&S~ '( o G%

D')%&6203R%&()'H2R2HF46'&24Q(W(C(6'F(401F'67%1( e'&'i52&'20R%&F462%0R&%F4567%&(4B%56(%F'6C)'(%RF46'&24Q(W'`)'&2F'064Q(C(6'F(401F'67%1(5('120F40C(6512'(Tg'&'W2012H46'I7'67'&'4H7F46'&24QW 

(C(6'F%&F'67%1Q2(6'12(&'Q'P4066%C%5&(651CTtRC%54&'0%6(5&'2R4Q2(626'F4))Q2'(6%C%5&&'('4&H7W&'4167'4))&%)&246'('H62%0B'R%&'('Q'H62034&'()%0('T  

! " # " #

} a46'&24Q(h'`)'&2F'064Q(C(6'F( a'67%1(   

0b4 t0P%QP'12067'(651C 0b4 t0P%QP'12067'(651C   o e062B%12'( o $7tYg('i   

o X5S4&C%62HH'QQQ20'( o VQ%IHC6%F'6&C   

o Y4Q4'%06%Q%3C4014&H74'%Q%3C o aDtgB4('10'5&%2F43203 

o e02F4Q(401%67'&%&3402(F(  

o g5F40&'('4&H7)4&62H2)406(  

o $Q202H4Q1464     o f54Q5('&'('4&H7%RH%0H'&0     g5F40&'('4&H7)4&62H2)406(   Y%Q2HC20R%&F462%04B%56(6512'(20P%QP20375F40&'('4&H7)4&62H2)406(  Y%)5Q462%0H74&4H6'&2(62H( e3'W3'01'&W'670%gQ20352(62H3&%5)4013'0%6C)2H20R%&F462%0I'&'H%QQ'H6'1R%&4QQ75F40&'('4&H7)4&62H2)406(T Y4&62H2)406(20HQ51'kqrF4Q'(401llR'F4Q'(W401I'&'R&%Fkp6%qyC'4&(%R43'TX670%gQ20352(62H3&%5)(4&'1'(H&2B'120 E5))Q'F'064&CU4BQ'kTf'0%6C)203&46'I4(€uz‚R%&4QQ)4&62H2)406(W'`H')6%0'T

D'H&526F'06 t0'4H7)%)5Q462%0W%0QC50&'Q46'1P%Q506''&(I2674('QRg&')%&6'1'670%gQQ20352(62H3&%5)I'&'&'H&526'1R&%FQ%H4QP2QQ43'(T E4F)Q203%R&'Q46'12012P2154Q(I4(4P%21'1B'H45('&'Q46'10'((H40H%0R%501)%)5Q462%03'0'62H(404QC('(TU7''670%g Q20352(62H3&%5)%R(4F)Q'12012P2154Q(I4(1'R20'1B4('1%067'('QRg1'HQ4&'13&%5)%R67'2&)4&'06(4013&401g)4&'06(Te' 1%0%64062H2)46'40CB24(20%5&&'(5Q6(6746H%5Q1B'15'6%672(&'H&526F'06(6&46'3CT

X672H(%P'&(2376 U7'(651C&'H'2P'14))&%P4QR&%F67't0(626562%04QD'P2'I—%4&1%Rt0(62656Y4(6'5&80„mnkygnmbtD—bz9W67'X672H($%FF2((2%0 %R67'x02P'&(26C%RA'2)c23a'12H4QV4H5Q6C80„mpygkngnlknmnkn9W67'X672H($%FF266''%Rx))(4Q4x02P'&(26C D'32%04Q4 X62S)&†P0203(0‡F01'0x))(4Q4ˆ8f0&mnkybknr9W4(I'QQ4(R&%FQ%H4Q4567%&262'(20HQ5120367'|405465a202(6&C%Rg'4Q67W 67'$7204a'12H4Qx02P'&(26Cg%()264QX672H(D'P2'I—%4&1W67'G462%04Q$%FF2((2%0R%&$5Q65&'40167'e&6(%R67' Y72Q2))20'(8204HH%&140H'I26767')&%P2(2%0(%RY72Q2))20'D')5BQ2HeH6qrzyW%&67'A4I$&'4620367'G$$e9W40167' E%Q%F%0t(Q401(a202(6&C%RX15H462%0401U&420203T G%6'6746R5QQ20R%&F462%0%067'4))&%P4Q%R67'(651C)&%6%H%QF5(64Q(%B')&%P21'12067'F405(H&2)6T   

! " # " #

ƒ