The mixed genetic origin of the first farmers of Europe - Supplementary Information -

Nina Marchi1,2#, Laura Winkelbach3#, Ilektra Schulz2,4#, Maxime Brami3#, Zuzana Hofmanová2,4, Jens Blöcher3, Carlos S. Reyna-Blanco2,4, Yoan Diekmann3, Alexandre Thiéry1,2, Adamandia Kapopoulou1,2, Vivian Link2,4, Valérie Piuz1, Susanne Kreutzer3, Sylwia M. Figarska3, Elissavet Ganiatsou5, Albert Pukaj3, Necmi Karul6, Fokke Gerritsen7, Joachim Pechtl8, Joris Peters9,10, Andrea Zeeb-Lanz11, Eva Lenneis12, Maria Teschler-Nicola13,14, Sevasti Triantaphyllou15, Sofija Stefanović16, Christina Papageorgopoulou5, Daniel Wegmann2,4^*, Joachim Burger3^*, Laurent Excoffier1,2^* # These authors contributed equally. ^ These authors contributed equally. * Corresponding authors ([email protected]; [email protected]; [email protected]) Affiliations: 1CMPG, Institute for Ecology and Evolution, University of Berne, Berne, . 2Swiss Institute of Bioinformatics, Lausanne, Switzerland. 3Palaeogenetics Group, Institute of Organismic and Molecular Evolution (iomE), Johannes Gutenberg University Mainz, Mainz, . 4Department of Biology, University of Fribourg, Fribourg, Switzerland. 5Laboratory of Physical Anthropology, Department of History and Ethnology, Democritus University of Thrace, Komotini, . 6Department of Prehistory, İstanbul University, . 7Netherlands Institute in Turkey, Istanbul, Turkey. 8Institute of Archaeology, Innsbruck University, . 9Institute of Palaeoanatomy, Domestication Research and the History of Veterinary Medicine, LMU Munich, Munich, Germany. 10SNSB, State Collection of Anthropology and Palaeoanatomy Munich, Germany. 11Generaldirektion Kulturelles Erbe Rheinland-Pfalz, Speyer, Germany. 12Department of Prehistoric and Historical Archaeology, University of Vienna, Austria. 13Department of Anthropology, Natural History Museum of Vienna, Austria. 14Department of Evolutionary Anthropology, University of Vienna, Austria. 15Department of History and Archaeology, Aristotle University of Thessaloniki, Thessaloniki, Greece. 16Biosense Institute, University of Novi Sad/Department of Archaeology, Laboratory for Bioarchaeology, University of , .

Alexandre Thiéry’s present address: Cardio-CARE AG, Davos-Wolfgang, Switzerland. Susanne Kreutzer’s present address: Functional Genomics Center Zurich/GEML, ETH Zurich, Zurich, Switzerland & Department of Biology, ETH, Zurich, Switzerland. Sylwia M. Figarska’s present address: Department of Cardiology, University Medical Centre Groningen, University of Groningen, Groningen, the Netherlands. 1 Marchi et al. The mixed genetic origin of the first farmers of Europe

Table of contents Section 1: Archaeological context of the samples ______3 Archaeological background ...... 3 Akopaklk ...... 7 Bacn ...... 8 Nea Nikomedeia ...... 8 Vlasac ...... 9 Lepenski Vir ...... 10 Grad-Saeo ...... 12 Vina-Belo Bdo (Saeo phae) ...... 13 Asparn-Schletz ...... 15 Kleinhadersdorf...... 16 Essenbach-Ammerbreite ...... 17 Dillingen-Steinheim ...... 18 Herxheim ...... 19 Radiocarbon dating and stable isotope analysis ...... 20 Section 2: DNA preparation and sequencing ______24 Sample preparation ...... 24 DNA extraction ...... 24 Library preparation ...... 25 Sample Screening ...... 26 Whole-genome sequencing ...... 26 Section 3: Genotyping, quality checks ______28 Samples processed ...... 28 Programs used ...... 30 Bioinformatics Pipeline ...... 31 Reference samples ...... 41 Data filtering ...... 42 Sequence annotations...... 44 Phasing and imputation ...... 44 Section 4: Genetic structure analysis ______45 Multi-Dimensional Scaling (MDS) ...... 45 Genetic diversity ...... 46 Runs of Homozygosity (ROHs)...... 46 Admixture analysis ...... 47 Sex-specific markers and haplogroups...... 49 Neanderthal introgression ...... 50 Section 5: Phenotypic analysis ______52 Pigmentation ...... 52 Additional phenotypic variation ...... 55 Polygenic Score for Height ...... 57 Section 6: Demographic inference ______59 MSMC2 analysis ...... 59 fastsimcoal2 analyses...... 68 Section 7: Validating the proposed demographic model ______100 Validation of parameter estimates from parametric bootstraps ...... 100 Predictive Simulations ...... 100 f-statistics...... 104 References ...... 116

2 Marchi et al. The mixed genetic origin of the first farmers of Europe

Section 1: Archaeological context of the samples In this study, we present new whole-genome sequences for 15 ancient human individuals (see Table S1 for an overview). Our sample set consists of 2 individuals from Vlasac (Serbia), 2 individuals from Transitional and layers at Lepenski Vir (Serbia) and 11 Early Neolithic individuals. The Neolithic samples originate from Aktopraklık and Barcın in Turkey (1 individual each), from Nea Nikomedeia in Greece (2 individuals), from Vinča-Belo Brdo and Grad-Starčevo in Serbia (1 each), from Kleinhadersdorf and Asparn-Schletz in Austria (1 each) and Essenbach-Ammerbreite, Dillingen-Steinheim and Herxheim in Germany (1 each).

Archaeological background

Anatolia and the Aegean appear to have been sparsely populated at the height of the , ca. 26-20 kya1–7. The few sites that are securely dated to this period, including Asprochaliko and Kastritsa in Epirus, Theopetra in and Franchthi in Argolis, are thought to have been mainly used seasonally as hunting stations and do not show much stratigraphic time-depth5. Aurignacian industries are succeeded by Gravettian industries ca. 26 kya in mainland Greece, which however lack the typical Gravette points observed elsewhere in the Balkans4,5.

The onset of the northern Hemisphere deglaciation, ca. 20-19 kya8, is associated with major changes in the Aegean region. Eustatic rise in sea-level, in the order of 120 m by the mid- Holocene9,10, is thought to have caused the breakup of the Cycladic mega-island11,12 and, from ca. 9.5 kya, the flooding of the Black Sea13. Epigravettian industries, characterized by small backed blades with abrupt retouch, perhaps used as inserts for composite projectiles, are found on both sides of the Aegean Basin and in the Antalya Bay14,15. The emergence of new seafaring networks in the Aegean Basin is indicated by the long-distance procurement of obsidian, a volcanic glass found on the island of Melos, from the Final Palaeolithic onwards16.

First settled communities appear on the Central Anatolian Plateau ca. 15.5-15 kya, shortly before or at the time of the Bølling-Allerød warming phase29. The Epipalaeolithic site of Pınarbaşı in the Konya Plain, has produced geometric microliths, including lunates, showing influence from contemporary Levantine pre-Natufian and Natufian communities29. Similar tools appear later on in the Antalya region at Öküzini30. Large numbers of marine shells at the site of Pınarbaşı 1,000 m above sea level suggest strong links with the Mediterranean area during the Late Pleistocene,

3 Marchi et al. The mixed genetic origin of the first farmers of Europe despite the natural boundary formed by the Taurus mountains31. It is not clear how the cold spell of the Younger Dryas affected interactions between the Anatolian Plateau and the Aegean Basin. There is limited evidence for occupation during the interval 12.9-11.7 kya at Pınarbaşı29,31.

Recent surveys in Western , in the Karaburun and Bozborun peninsulas32,33, fill an important gap between Epipalaeolithic traditions on the Mediterranean sea shore, e.g. Öküzini Cave in the Antalya Bay30, and contemporary traditions in the North Aegean, e.g. Ouriakos, on the island of Lemnos34. Characteristic lithic scatters found in the course of these surveys indicate the presence of foragers in Western Anatolia at the end of the Pleistocene, who shared similar tool sets (e.g. geometric microliths, backed blades and bladelets) and lithic technology, and were presumably integrated in wider Levantine-East Mediterranean-Aegean maritime interactions during the Younger Dryas34. The picture changes at the beginning of the Holocene (coinciding with the start of the Aegean Mesolithic) when Western Anatolia shows stronger connections with the Aegean islands and the Greek mainland. There was limited interaction at that time with aceramic Neolithic sites of Southwest Asia32.

In the Fertile Crescent, the Neolithic or agricultural ‘revolution’ is thought to have reached a tipping point by about 11.7 kya, at a time of dramatic post-Pleistocene climate change and expanding carrying capacities35–37. Sites in what is today Southeast Turkey, Northern Syria and Western , are among the first to show unambiguous signs of ungulate domestication38,39. From about 10.6 kya, a range of founder crops and animals from the Fertile Crescent are introduced piecemeal among sedentarizing communities of Central Anatolia and Cyprus29,40,41. The small ‘aceramic’ site of Boncuklu, is thought to be one of the oldest communities on the Anatolian plateau to practice small-scale plant cultivation, starting ca. 10.3 kya42.

The Western half of the Anatolian Peninsula does not transition to farming before about 8.7/8.6 kya (43,44 and references therein). Barcın and Aktopraklık are among the first Neolithic communities in NW Anatolia45–47. The new subsistence economy, characterized by a near- complete ‘package’ of domestic crops and animals, is abruptly introduced by farmers, who live on settlement mounds, build elaborate rectilinear houses, and use pressure technology to create sets of tools for domestic activities, such as plant harvesting41,48–50. Broadly comparable Early Neolithic communities appear all over mainland Greece and Crete approximately within a century51–56. Nea Nikomedeia belongs to an advanced phase of that expansion.

4 Marchi et al. The mixed genetic origin of the first farmers of Europe

From about 8.2 kya, at a time of rapid climate change, farming spreads northwards to the Central , along the Struma, Mesta and Vardar-Morava-Danube corridors57,58. The distribution of 14C-dated Neolithic sites shows a gradient away from the main river axes, with Alpine regions left mostly off-course during the initial phase of Danubian expansion59. Neolithic groups settle in enclaves along the main river system all the way to the Carpathian Basin, for instance at Vinča- Belo Brdo and Grad-Starčevo near today’s Belgrade. The narrow stretch of the Danube river known as the Danube Gorges or Iron Gates bears witness to important cultural and genetic interactions between indigenous sedentary fishers and incoming farmers at Vlasac and Lepenski Vir22,60–63.

The emergence of the Linearbandkeramik (LBK) in Central Europe ca. 7.5 kya involves a major redefinition of the Neolithic pattern of existence, with settlement mounds now abandoned in favour of flat extended sites. A steep decline in crop diversity is observed north of the Alps, with a number of southerly-oriented species like lentils and chickpeas dropped from the Neolithic ‘package’64. Typical long houses appear in this period. The dead are increasingly buried in formal cemeteries outside settlement areas, such as Kleinhadersdorf in Lower Austria65. Most LBK burials date to an advanced phase of the Early Neolithic after ca. 7.2 kya.

Increased competition between different LBK communities of Central Europe is thought to have resulted in the first large-scale ‘massacres’37. Asparn-Schletz and Herxheim show signs of interpersonal violence on a scale that has never been observed before in the Neolithic66–69. At Asparn-Schletz, one LBK community is thought to have attacked another, resulting in tens of dead lying unburied in the latest ditch system70. In Herxheim, several LBK groups are thought to have come together to participate in a large-scale ritual event, involving the systematic killing and dismembering of hundreds of people from outside the community, whose bones were then meticulously smashed into small fragments whereas the skulls were manipulated leaving only the calottes intact. Besides the human victims, which are thought to represent human sacrifices, precious and stone tools were also destroyed, underlining the ritualistic character of the event. Body fragments, calottes and artefact remains were finally deposited in large finds concentrations in two ditches surrounding the former settlement71,72.

5 Marchi et al. The mixed genetic origin of the first farmers of Europe

Table S1 - Archeological information on the new palaeogenomes included in this study. The 14C dates were calibrated in OxCal 4.4.217 using the IntCal20 calibration curve18. Dates from Danube Gorges individuals with associated δ15N>8.3‰ were corrected for freshwater reservoir effect (FRE) following the method described by Cook et al.19, later amended by Bonsall et al.20. Both new (marked with an asterisk) and previously published dates are reported here.

Sample Archaeological/ 14C Lab Number, Site Phase cal. BP 2σ ID Burial ID uncal. age BP

AA-57777: 8196 ± 6921 VLASA7 Vlasac Late Mesolithic Burial 31 8764-8340 FRE corrected: 7707 ± 93

*MAMS-46044: 9064 ± 27 VLASA32 Vlasac Late Mesolithic Burial 16 9741-9468 FRE corrected: 8596 ± 66

Anatolian Late Neolithic AKT16 Aktopraklık 89 D 14.1 MAMS-25475: 7792 ± 2824 8635-8460 (Aegean Early Neolithic)

Anatolian Late Neolithic M10-455 (BH 43347 Bar25 Barcın *MAMS-46043: 7506 ± 27 8384-8205 (Aegean Early Neolithic) (lot. 1856))

Nea2 Nea Nikomedeia Early Neolithic #7 MAMS-24004: 7290 ± 3125 8173-8023

Nea3 Nea Nikomedeia Early Neolithic T XII *MAMS-46042: 7388 ± 27 8327-8040

OxA-16005: 7190 ± 4522 8017-7844 FRE corrected: 7115 ± 46

Transformational/ OxA-16006: 7190 ± 4022 LEPE48 Lepenski Vir Burial 122 8020-7860 Early Neolithic FRE corrected: 7127 ± 41

OxA-16005 + OxA-16006: 7122 ± 31 8012-7867 X2-Test: df=1 T=0.0(5% 3.8)

BA-10652: 7265 ± 3023 LEPE52 Lepenski Vir Early-Middle Neolithic Burial 73 7931-7693 FRE corrected: 6983 ± 47

STAR1 Grad-Starčevo Early Neolithic (Starčevo) Grave 1 BRAMS-2407: 6671 ± 2726 7589-7476

VC3-2 Vinča-Belo Brdo Early Neolithic (Starčevo) Grave V (group burial) OxA-28634/UBA-22463: 6581 ± 3428 7565-7426

*MAMS-46038: 6657 ± 26 7580-7473

Ind 44 (646 Part 152, *MAMS-48728: 6627 ± 35 7573-7430 Asp6 Asparn-Schletz Early Neolithic (LBK) Schnitt 10 LM70) MAMS-46038 + MAMS-48728: 6646 ± 21; 7575-7474 X2-Test: df=1 T=0.5(5% 3.8)

*MAMS-46040: 6208 ± 27 7244-7000

VERA-2167: 6090 ± 5027 7158-6796 Klein7 Kleinhadersdorf Early Neolithic (LBK) Grave 56 (25,937) MAMS-46040 + VERA-2167: X2-Test: df=1 ⎯ T=4.277(5% 3.8) - combination fails at 5%

Essenbach- Ess7 Early Neolithic (LBK) Grave 2 ⎯ ⎯ Ammerbreite

Grave 24, Befund 24 Dil16 Dillingen-Steinheim Early Neolithic (LBK) *MAMS-46039: 6200 ± 25 7235-6998 (1997)

Herx Herxheim Early Neolithic (LBK) 281-19-6 *MAMS-46041: 6189 ± 25 7164-6993

6 Marchi et al. The mixed genetic origin of the first farmers of Europe

Aktopraklık

Excavated since 2004 by a team from the Prehistory Department of the University of Istanbul, Aktopraklık is one of the oldest Neolithic sites in Northwest Anatolia46. The site is located on one of the terraces overlooking the eastern edge of Lake Uluabat, ca. 25 km west of the city of Bursa. Unusually for the Anatolian Neolithic, Aktopraklık is a flat extended settlement and cemetery, consisting of three distinct areas (Aktopraklık A, B and C). Previously-published AMS dates on human bones indicate that occupation spanned from the early 9th to mid-8th millennium BP24,46. The oldest levels (Area C) are characterized by circular semi-subterranean structures, ca. 3-6 m in diameter, which were perhaps used as houses or ‘huts’ and fit the coastal Fikirtepe tradition defined by Mehmet Özdoğan in the NW Anatolia region45. Previous studies have indicated that the first inhabitants at Aktopraklık, including the individual sampled here, 49,73,74 consumed terrestrial C3 food and practiced animal husbandry . In the Neolithic phase, the dead were buried in close proximity to houses or immediately beneath them, sometimes with monochrome pottery, bone tools and stone beads75. Individual 89D 14.1. (AKT16) stems from a double burial and was identified genetically as female (Fig. S1). This middle-aged to old adult was buried in a contracted position on her left side with the head oriented to the east – facing south76. The head of individual 14.1 was placed on the feet of adult female 17.1, who shared the same grave. There were no grave goods as such except faunal remains in the pit fill and on the skeletons. Direct dating of this skeleton (MAMS-25475: 7792 ± 28 BP) indicates that she died ca. 8635-8460 cal. BP at 2σ - in other words at the very beginning of farming expansion in the region24.

Figure S1 - Aktopraklık: double burial 89D 17.1 and 14.176

7 Marchi et al. The mixed genetic origin of the first farmers of Europe

Barcın

Barcın Höyük is a multi-layered mound in Northwest Anatolia, excavated from 2005 to 2015 by an international team under the auspices of the Netherlands Institute in Turkey47. The site, which has already featured in several ancient DNA studies77–79, is located in the Yenişehir Basin, between the Iznik Lake and the Uludağ Mountains, providing convenient access to the heart of the Anatolian Peninsula. A series of over 30 radiocarbon dates place the beginning of the Barcın occupation (levels VIe and VId1) ca 8,550 cal. BP43. Despite its early date, Barcın is characterized from the outset by a fully agrarian economy, based on cultivated cereals and pulses, while domestic cattle and sheep dominate the faunal assemblage41. The settlement structure in the early phases consists of a row of rectilinear mud and timber houses facing a central courtyard. The dead are generally buried in the courtyard, except neonates and infants who are also buried within or directly outside houses, next to walls. The individual Bar25 sampled here (M10-455) was found with several other infant burials in a narrow annex of structure 19, a structure which had an unusual red-coloured floor – a feature often associated with “special purpose” buildings in Southeast and Central Anatolian Neolithic sites80. This and neighbouring buildings of level VId1 were all destroyed by a fire, which happened no later than 8350-8325 cal. BP according to the latest radiocarbon determinations. Culturally speaking, Barcın demonstrates early and ongoing connections with Central Anatolia, visible in such elements as the production and uses of ceramics, the incorporation of cattle and goat horns in the architecture and the predominance of Central Anatolian sources in the obsidian assemblage47. Burial M10-455 is a single primary inhumation in a small pit. The young infant, genetically identified as male, was placed in an east-west orientation with cranium to the east. There were no grave goods associated with the burial. The skeleton has been directly dated to 8384-8205 cal. BP at 2σ (MAMS 46043: 7506 ± 27 BP).

Nea Nikomedeia

The Early Neolithic mound of Nea Nikomedeia in Northern Greece was excavated from 1961 to 1964 by Robert J. Rodden under the auspices of the British School at Athens. The site, which is located ca. 10.5 km north-east of the city of Veroia in the plain of , has produced three building phases consisting of broadly overlapping rectilinear timber structures with foundation trenches81–84. The excavation project marked an important turning point for research on Macedonian prehistory, due to the involvement of multiple teams of international scientists and

8 Marchi et al. The mixed genetic origin of the first farmers of Europe the recognition of Nea Nikomedeia as one of “the oldest Neolithic sites found in Europe”83. An unusually large building at the site, described by the excavators as a “shrine”, was destroyed in a violent fire with all its contents, including elaborate female figurines85. Based on a series of radiocarbon dates obtained in the 1990s on archival samples of domestic cereals and bones, the Early Neolithic site of Nea Nikomedeia has been dated to the second half of the 9th millennium BP, perhaps starting ca. 8400 cal. BP. The funerary record of the site has never been published in detail, although preliminary reports suggest that at least 35 individuals came from regular burials inside the Early Neolithic settlement, including 13 adults, 13 children and nine infants; in addition, 31 adults and 21 children were represented by single bones or partial skeletons86,87. Most burials were single primary inhumations placed in earth pits, in contracted position under house floors or in-between houses82. The two individuals sampled here, both genetically females, have been radiocarbon dated to 8327-8040 cal. BP (Nea3, MAMS-46042: 7388 ± 27 BP) and 8173-8023 cal. BP (Nea2, MAMS- 24004: 7290 ± 31 BP). Based on the macroscopic examination of the skeletal remains, which are in extremely poor condition, both individuals appear to be early infants, respectively ~6-9 (T XII, Nea3) and ~18 months old (#7, Nea2). The latter shows enlarged nutrient foramina on the diaphyses and flaring of the epiphyseal ends of the long bones. These skeletal changes have been linked to hereditary anemias such as thalassemia and sickle cell anemia88–90. Differential diagnosis for flaring and frayed metaphyseal cortex also includes rickets/osteomalacia and genetic syndromes89.

Vlasac

Set like its neighbour Lepenski Vir in the spectacular landscape of the Danube Gorges, Vlasac has been first salvage-excavated by D. Srejović and Z. Letica in 1970-1971; a team led out by D. Borić returned to the site in 2006-2009 to conduct new excavations on the unsubmerged section of the settlement. The individuals analysed in this study both originate from the old Srejović excavation area. Remarkably, some of the architectural practices observed at Lepenski Vir, such as trapezoidal floor plans, rectangular hearths and red limestone floors, are already present (if only in basic form) in Late Mesolithic contexts at Vlasac. Burial 31 (VLASA7) is a typical Mesolithic burial in extended supine position (Fig. S2). The burial, which belongs to Phase I, truncated the eastern side of Dwelling 2, which is dated to the first half of the 9th millennium BP according to Borić et al.21. A single radiocarbon date (AA- 9 Marchi et al. The mixed genetic origin of the first farmers of Europe

57777: 8196 ± 69 corrected 7707 ± 93 BP) from the skull provided a range of 8764-8340 cal. BP at 2σ after correction for freshwater reservoir effect22. This adult individual, identified as genetically male24, has produced high stable isotope values (δ15N: 16.1‰ and δ13C: -20.7‰) consistent with substantial intake of aquatic proteins22. Previously published 87Sr/86Sr ratios indicate a local range for this individual23. Mesolithic Burial 16 (VLASA32) consists of the disarticulated remains of an old adult male (Fig. S2), radiocarbon dated to 9741-9468 cal. BP at 2σ after correction for freshwater reservoir effect (MAMS-46044: 9064 ± 27 corrected 8596 ± 66 BP). The previously published 87Sr/86Sr ratio for Burial 16 is consistent with a local range23.

Figure S2 - Vlasac. a: Burial 31 in extended supine position; b: Remains of Burial 16 (image credit: Documentation of the Archaeological collection, Faculty of Philosophy, University of Belgrade).

Lepenski Vir

Lepenski Vir is one of the most emblematic Mesolithic and Early Neolithic sites in Europe. Completely excavated from 1965 to 1970 by Dragoslav Srejović, in advance of the construction of the hydroelectric Đerdap I dam in the Danubian Iron Gates, Lepenski Vir has since been extensively re-studied by teams from the Universities of Belgrade and Cambridge. The osteological and mortuary evidence have been the focus of a number of recent high-impact publications, making the site an ideal case-study to address the transition from Mesolithic to Neolithic24,60,91. Occupied from the 12th millennium BP by foragers and fishers, Lepenski Vir bears witness to important changes after 8150 cal. BP, when pottery appears in the Central Balkans. The more iconic trapezoidal houses with limestone floors, sculpted boulders and 10 Marchi et al. The mixed genetic origin of the first farmers of Europe subfloor burials belong to this period. Despite the arrival of early Starčevo farming communities in the region, there are no domesticates beside dogs at Lepenski Vir before Phase III, dated to the early 8th millennium BP92. The two individuals reported in this study, which are very close chronologically, belong to the Transformational/Early Neolithic phase I-II (Burial 122) and the Early-Middle Neolithic Phase III (Burial 73). Burial 122 (LEPE48) is a stray sub-adult skull without a mandible (15-18 year old, genetically male) discovered in the packing layer between the superimposed floors of trapezoidal houses 47 and 47’, at the centre of the settlement (Fig. S3). Two radiocarbon dates (OxA-16005: 7190 ± 45 corrected 7115 ± 46 BP and OxA-16006: 7190 ± 40 corrected 7127 ± 41 BP) directly date this skull to 8012-7867 cal. BP at 2σ after correction for reservoir effect60. Strontium isotope analysis has previously indicated that skull burial 122, which bears intriguing cut marks, falls just outside the local range23. Burial 73 (LEPE52) is a single primary inhumation of a genetically adult male24, ca. 30-40 years old, crouched on his right side in the northern zone of Lepenski Vir, outside the habitation space (Fig. S4). A single radiocarbon date (BA-10652: 7265 ± 30 corrected 6983 ± 47 BP) places Burial 73 to 7931-7693 cal. BP at 2σ after correction for freshwater reservoir effect23,60. Isotopically, Burial 73 is in the local range23. Starčevo pottery and a green stone pendant were recovered in the fill of this burial.

Figure S3 - Lepenski Vir. a: Burial 122; the skull with house 47’; b: detail; c: frontal, occipital, vertical and lateral projection (image credit: reproduced after 61). 11 Marchi et al. The mixed genetic origin of the first farmers of Europe

Figure S4 - Lepenski Vir. a: Burial 73; b: skull of Burial 73, frontal, occipital, vertical and lateral projection and maxilla and mandible (image credit: reproduced after 61).

Grad-Starčevo

Situated just across Vinča on the other side of the Danube, along an old bank of the river that was liable to flooding until the turn of the 20th century, the type-site for the Starčevo culture in the Central Balkans has been repeatedly excavated since 1928 by teams from the National Museum in Belgrade93. The first systematic excavations were carried out in 1932 by Vladimir Fewkes with funding from the Peabody and Fogg Museums at Harvard and from the American School of Prehistoric Research93. New excavations took place in 1969-1970 under the direction of Draga Garašanin and Robert Ehrich. In 2003-2004, the Institute for the Heritage Protection in Pančevo conducted a series of small trench excavations totaling about 70 m2 94. The site, which has been traditionally ascribed to the later Starčevo period, is radiocarbon-dated to ca. 7900-7350 cal. BP95. Grad-Starčevo has documented a range of domestic activities typically associated with Neolithic societies, including food-production, some degree of sedentism with ‘pit-houses’ (though no postholes can convincingly be associated with them) and a rich ceramic assemblage including monochrome and painted vases93. Domestic animals include cattle, sheep/goat and pigs96.

12 Marchi et al. The mixed genetic origin of the first farmers of Europe

The individual sampled for this project, Skeleton 1, Grave 1 (STAR1), which is now radiocarbon dated to 7589-7476 cal. BP at 2σ (BRAMS-2407: 6671 ± 27 BP), was discovered during the 2004 excavation season at the bottom of the pit dwelling in trench 5/200494. Identified as an older female, ca. 55-60 years old, Skeleton 1 was buried in a contracted position on the right side, without any grave goods (Fig. S5). New isotopic values for this individual indicate that she had a mostly terrestrial diet26. A 5-7 year old child was found nearby, at a distance of 0.6 m on the same level94.

Figure S5 - Grad-Starčevo. a: Grave 1; b: detail (image credit: Documentation of the Monument Protection Institute - Pančevo/Jugoslav Pendić).

Vinča-Belo Brdo (Starčevo phase)

Located on the right bank of the Danube, ca. 14 km east of Belgrade, the site of Belo Brdo in Vinča has been excavated from 1908 onward by Miloje Vasić, rapidly becoming the type-site for the eponymous Vinča culture in the Central Balkans. The site is unusual in being the only known tell or settlement mound in a radius of 100 km, at 8 m above the Danube river valley97. New excavations began in 1978 under the auspices of the Serbian Academy of Sciences and Arts98.

13 Marchi et al. The mixed genetic origin of the first farmers of Europe

The site was first occupied in an advanced phase of the Starčevo period before the development of the Vinča culture. The individual sampled for this project, individual V (VC3-2), stems from a collective tomb (pit Z) in the lowermost level of the mound (Fig. S6). The partial commingled remains of at least 12 individuals, including two females and nine indeterminate adults and sub-adults, were associated with this feature94, which has been variously interpreted as an “ossuary”, a “pit-dwelling” and a “tomb with entrance hall – dromos” 99. Individual V forms part of the collection studied by I. Schwidetzki in 1937, which was damaged during the bombing of Belgrade in WWII94. In her PhD thesis, Jelena Jovanović indicates that individual V consists exclusively of skull fragments; age could be inferred to between 25-35 years based on tooth abrasion; isotopic values of δ13C, δ15N and δ34S provided by Nehlich et al.100 are consistent with a terrestrial diet with a small intake of aquatic proteins94. Individual V has a genetic signature consistent with male24. The Starčevo burials have been radiocarbon dated in the 2000s to between 7669-7324 cal. BP at 2σ after correction for reservoir effect28,101. Given the δ15N values of between +10.3-13.6‰ (Δ = 3.3‰; n = 6) for some of these individuals, possibly indicating some intake of freshwater fish62,99,100 , a small reservoir correction factor equivalent to that used in the Danube Gorges was applied by Tasić et al.28.

Figure S6 - Vinča: collective tomb of the Starčevo period (image credit: Documentation of the Archaeological collection, Faculty of Philosophy, University of Belgrade) 14 Marchi et al. The mixed genetic origin of the first farmers of Europe

Asparn-Schletz

Excavated from 1983 to 2005 by Helmut Windl, Niederösterreichisches Landesmuseum, the site of Asparn-Schletz, in the Lower Austrian Weinviertel, is one of the most enigmatic Linearbandkeramik culture (LBK) sites. Schletz is a large settlement enclosed by two ditch systems, a trapezoidal one, ca. 400 m in diameter, and an oval one, ca. 330 m in diameter70. The latter was renewed several times, broken by several entranceways or earthen bridges and probably reinforced by a palisade. There is evidence for habitation, in the form of long houses, and food production at the site102. In its latest phase, the oval ditch system provided a final resting place for approximately 130 individuals, who were given no formal inhumation and often lay in unnatural contortions. The skeletons, which have been studied by anthropologist Maria Teschler-Nicola and colleagues, bear multiple traumas of perimortem origin, mostly caused by blunt force to the skull and arrow piercing69. Some perimortem fractures on postcranial remains were also identified. It is likely that the corpses remained unburied for a period of time as limb extremities are often missing, and bones bear gnaw marks. Presumably the ‘massacre’ was a single event which happened some time towards the end of the LBK, ca. 7150-6900 BP, based on a sequence of 15 internally consistent 14C dates on human bones103. Young females are underrepresented in the mortality profile (only five females to 17 males in the 20-40 years age bracket). The abandonment of the settlement appears to have coincided with the ‘massacre’ itself; the final ditch system was partially infilled with settlement debris70. Up to now ca. 20 regular graves have been recorded in the ditch and the settlement beside the individuals of the massacre. Most individuals buried in graves were subadults. Regular graves came from an earlier LBK level. The individual sampled for this project, ind. 44 (Asp6) from the 1987 excavation season, comes from the juncture of the two ditches, which precisely overlap at this point, and possibly belongs to the ‘massacre’ context. The individual has been identified, both anthropologically and genetically, as male. Age at death has been recorded as between 35-45 years. The cranial remains exhibit not only an old healed trauma, but also perimortal fractures. In the course of the new Forschung-Technologie-Innovation (FTI) Strategy project, supported by the Lower Austrian government, the excavation records and other documents are being systematically re- investigated. For this burial, earlier excavation records indicate that the individual was “buried in the ditch, E-half”.

15 Marchi et al. The mixed genetic origin of the first farmers of Europe

The new 14C date reported here, which stems directly from the skeleton sampled (MAMS-46038: 6657 ± 26 BP, 7580-7473 cal. BP at 2σ) is significantly older than anticipated, based on its stratigraphic location on top of other slain individuals (F. Pieler, comm.). The 14C date was confirmed by repeating the graphitization and measurement, using collagen prepared for MAMS- 46038 (MAMS-48728: 6627 ± 35 BP, 7573-7430 cal. BP at 2σ). The ongoing FTI project, led by Franz Pieler and Maria Teschler-Nicola, will help to clarify the dating of this individual.

Kleinhadersdorf

Discovered in 1911, the Linearbandkeramik (LBK) cemetery of Kleinhadersdorf near Poysdorf, in the Lower Austrian Weinviertel, has been first excavated in 1931; more systematic excavations to rescue the site from erosion and agricultural activities took place in 1987-1991, under the directorship of Johannes Wolfgang Neugebauer from the Austrian Bundesdenkmalamt. The findings, including the results of anthropological analyses, have been recently published in a monograph65. The cemetery was located some 150-200 m north of the nearest LBK settlement, on a slope at the edge of a very fertile loess basin. Based on ceramic typology and a sequence of 19 radiocarbon dates, the site has been dated to ca. 7250-6750 BP, spanning from the ‘Lower Austrian LBK I/II’ to the ‘Moravian LBK III’; at least four burial phases could be identified65. Of the more than 60 graves excavated at the site, 41 have provided detailed anthropological information104. Most were single primary inhumations in a contracted position on the left side65. There was some variation in the layout and orientation of the bodies, and at least 12 were buried on their back. Beside regular graves, there were 26 ‘empty’ ones, which contained no or very few human remains, bringing the total number of individuals recorded at the site to 62. Intriguingly, several features point to a local continuation of Mesolithic traditions, including microliths, arrowheads, and deposits of red ochre in some of the grave pits105. The individual sampled, Klein7, comes from grave 56 (Fig. S7) and was buried in a contracted position on the left side, with the head to the northeast, the mouth wide open65. The arms were folded together, with the left hand in front of the face covered in red ochre powder. The individual was identified as a mature female, ca. 40-50 years old, and shows signs of degeneration on the spine104. There were perimortem fractures on the right humerus and right fibula. Ceramic sherds from the grave fill show an early Želiezovce ornament and should be dated to shortly after phase IIb of the site65. The new radiocarbon date (MAMS-46040: 6208 ± 27 BP) is slightly older than the one published before for grave 5627 (VERA-2167: 6090 ± 50 BP) and is consistent with the stratigraphy of the site, indicating that Klein7 should be dated to 7244-7000 cal. BP at 2σ. 16 Marchi et al. The mixed genetic origin of the first farmers of Europe

Previously published δ13C (-20.0‰), δ15N (9.8‰) 87Sr/86Sr (0.709612) values for this individual suggest that she sourced her food nearby and probably had access to meat106.

Figure S7 - Kleinhadersdorf: grave 56 (image credit: J.-W. Neugebauer).

Essenbach-Ammerbreite

The LBK cemetery and settlement of ‘Ammerbreite’ in Essenbach, Landshut county, Lower Bavaria, was discovered in 1981 during construction work on the north-western edge of the modern village. Subsequent excavations until 1986 by Henriette Brink-Kloke uncovered remains of LBK habitation and a concentration of 29 graves, ca. 40 m away, attributed to an advanced phase of the LBK. One additional burial (grave 7) was found inside a reused settlement pit in the direction of the cemetery107. Only a section of the cemetery could be excavated. Graves were damaged by ploughing activities. Anthropological study of the skeletons determined that 10 were children or infants, and 15 adults or sub-adults, of which seven were female and six were male107.

The individual sampled for this project (Ess7) was the child in grave 2, ca. 9-10 years old, buried in contracted position on the left side, with the head oriented to the northeast (Fig. S8). Grave 2 was one of the more richly furnished graves, including objects normally associated with adult males (graves 16 and 24), such as a stone axe and a complete vessel. The vessel contained the fragment of a perforated bone comb and the left, upper incisor tooth of the child107. The dating 17 Marchi et al. The mixed genetic origin of the first farmers of Europe of the skeleton has not been established through 14C dating. The objects deposited in the grave are typical of the LBK, with the comb suggesting a ‘young’ or ‘late LBK’ date, possibly in the range 7050-6900 BP (J. Pechtl, comm.).

Figure S8 - Essenbach-Ammerbreite: content of grave 2 (image credit: Zeichnungen Bayerisches Landesamt für Denkmalpflege, Außenstelle Landshut, reproduced with permission of Germania107).

Dillingen-Steinheim

Located on a high terrace with thick loess soils, between the Danube and the Egau rivers, the LBK site of ‘Steinheim’ near Dillingen, Swabia, was excavated during a seven-month long season in 1987 by the Bavarian State Office for the Preservation of Monuments – Augsburg108,109. The site consists of a short-lived LBK settlement possibly enclosed by a ditch. Two groups of burials were reported for this site – a small burial cluster, comprising up to 27 Early Neolithic graves, some distance away from the settlement108 and a group of regular graves within the ditch (J. Pechtl, comm.). There is no evidence of massacre at Steinheim; formal inhumations in LBK ditches are common in Southern Germany. The individual sampled (Dil16) comes from the ditch, belonging to an 11-13 year old, genetically male. The new radiocarbon date obtained from this

18 Marchi et al. The mixed genetic origin of the first farmers of Europe skeleton (MAMS-46039: 6200 ± 25 BP, 7235-6998 cal. BP at 2σ) confirms that the ditch was infilled in an ‘older LBK’ phase, ca. 7150-7050 BP110.

Herxheim

Located in the south of the Rhineland-Palatinate State in Western Germany, Herxheim is one the most frequently discussed LBK sites in relation to Neolithic interpersonal violence alongside Schletz and Talheim. Initially a rescue excavation under Annemarie Häußer, in 1996-1999, the scientific responsibility of the project has since been entrusted to Andrea Zeeb-Lanz, from the General Directorate for Cultural Heritage Rhineland-Palatinate, who oversaw from 2005 to 2008 a new excavation. Herxheim may have started off as a fairly conventional LBK site, enclosed during the so-called “ritual phase” by a series of elongated pits, which ended up forming two broadly parallel trapezoidal ditches or ditch segments, ca. 250 x 230 m in size; in places these run up to 4 m beneath the settlement level. Twenty-nine radiocarbon dates place occupation toward the end of the LBK, ca. 7160-7000 cal. BP at two standard deviations, with suggestions of a much shorter use and infilling of the ditch segments ca. 7000 cal. BP based on interpretation of the 14C dates with Bayesian methods111, pottery typology and ceramic re-fits112. The highly fragmented remains of over 500 individuals were found deposited in the two ‘ditches’ (Fig. S9), usually in heaps or scatters that also include pottery, faunal remains, stone and bone tools71,113. The human bone assemblage is dominated by skulls, especially skull caps, which were in some cases deposited separately in little clusters. These were shaped in a regular manner, probably with the help of an adze, and often bear cut marks that indicate removal of the scalp. Individuals of all age groups and sexes are represented among the deceased. Evidence for violence around the time of death is compounded by the fact that the skulls often bear traces of peri-mortem traumas67. Patterns of systematic and targeted bone breakage and defleshing have led some to infer mass cannibalism at the sit67, though the same evidence can be used to suggest ritual killing or ‘sacrifice’72. Much of the pottery deposited in the pits appears to be exotic, with stylistic elements indicating connections with communities as far east as the Elbe Valley in Germany. Hence a question arises as to whether the dead might have come from as far as the pottery itself. Strontium isotope analysis on tooth enamel of eight individuals, including two regular LBK ‘hocker’ graves from the inner pit ring, shows an intriguing pattern in which those individuals that have received formal inhumation fall within the local range of variation, while disarticulated or dismembered human remains belong to ‘non-locals’114. The Pars petrosa sampled for this project (Herx), which 19 Marchi et al. The mixed genetic origin of the first farmers of Europe belongs to a genetically female individual, comes from a stray skull fragment recovered in the northern end of the outer trench ring, section 281-19, excavated in 1998. The ‘ditch’ is about 2.5 m deep and V-shaped in this section, yielding only isolated human bones, faunal remains, loam and pottery fragments from a depth of 40 cm down. The infilling is likely to have taken place in the late/final LBK, ca. 7000 cal. BP, based on ceramic finds including decorated sherds. The individual reported here is now 14C dated to 7164-6993 cal. BP at 2σ (MAMS 46041: 6189 ± 25 BP).

Figure S9 - Herxheim: one of the bigger find concentrations (image credit: GDKE - Landesarchäologie Speyer).

Radiocarbon dating and stable isotope analysis

We report new radiocarbon dates (see Table S1) and stable isotope values (carbon and nitrogen; see Table S2) for seven individuals. The analyses were performed at the Curt-Engelhorn- Zentrum Archäometrie gGmbH (Mannheim, Germany). The collagen used for the analyses was extracted from fragments of the petrous bones used for palaeogenomic analysis. The isotope results were used as a basis for palaeodietary inference (see Fig. S10) and for freshwater reservoir effect correction in the case of the Vlasac sample (see Table S1). Radiocarbon dates on human bones from archaeological sites in the Danube Gorges are known to be influenced by a large freshwater 14C reservoir effect, due to high intake of freshwater food, and need to be corrected 20 Marchi et al. The mixed genetic origin of the first farmers of Europe

before calibration19. This was done here following the method described by Cook et al.19, later amended by Bonsall et al.20: any Danube Gorges date with associated δ15N>8.3‰ is corrected using a weighted mean age offset of 545 ± 70 radiocarbon years for 100% fish-based diet (corresponding to δ15N=17‰).

For the palaeodietary reconstruction, we analyzed the delta values of carbon (δ¹³C) and nitrogen (δ15Ν)115. Carbon isotopes are generally used to deterine the amount of marine and terrestrial

proteins in ancient diets, as well as C3 and C4 photosynthetic pathways. Marine endpoints of δ¹³C are -12 ± 1‰ while values from terrestrial proteins centre around -21 ± 1‰116,117. Nitrogen isotopes are often used to determine the trophic level of the organism within a specific ecosystem and for distinguishing between aquatic and terrestrial food sources. Terrestrial herbivores δ15Ν values typically range between 4 and 7‰, whereas carnivores - including humans - show values about 3-4‰ higher on average than the organisms they consume116,118.

The δ13C values of the seven individuals reported here range from -19.13‰ to -20.54‰, while the δ15N values range from 8.73‰ to 15.77‰ (see Table S2). In order to place the sampled individuals within a broader dietary context, we plotted their values together with published isotopic data from Mesolithic and Neolithic individuals of Central and Southeast Europe, Northwestern (NW) Anatolia, together with faunal remains from the same sites (Fig. S10).

Table S2 - Summary of the δ13C and δ15N values obtained from the 7 individuals analyzed in this study. The values given are the nitrogen content (N [%], average of triple determination) with standard deviation (1σ; N [%] SD), the carbon content (C [%], average of triple determination) with standard deviation (1σ; C [%] SD), the atomic C:N ratio, the 15N/14N isotopic ratio (D/C δ15N [‰ AIR]) relative to the standard AIR with drift correction (D/C) (average of triple determination) with standard deviation (1σ; D/C δ15N [‰ AIR] SD) and the 13C/12C isotopic ratio (D/C δ¹³C [‰ VPDB]) relative to the standard VPDB with drift correction (D/C) (average of triple determination) with standard deviation (1σ; D/C δ¹³C [‰ VPDB] SD). D/C δ¹⁵N D/C δ¹³C Collagen N [%] C [%] D/C δ¹⁵N [‰ AIR] D/C δ¹³C [‰ VPDB] Sample ID yield [%] N [%] SD C [%] SD C:N [‰ AIR] SD [‰ VPDB] SD

VLASA32 2.76 10.71 0.69 29.71 1.94 3.24 15.77 0.12 -20.13 0.09 Bar25 7.07 15.97 0.08 43.79 0.09 3.20 10.20 0.01 -19.13 0.04 Nea3 5.18 12.87 0.79 35.37 1.88 3.21 11.88 0.03 -19.91 0.07 Klein7 1.00 10.07 0.29 28.38 0.80 3.29 9.65 0.02 -20.31 0.06 Dil16 2.76 13.57 0.23 36.95 0.72 3.18 8.73 0.08 -20.52 0.02 Asp6 1.16 7.17 0.97 21.54 2.81 3.51 10.01 0.14 -19.97 0.07 Herx 6.75 14.39 0.13 39.32 0.16 3.19 10.28 0.02 -20.54 0.01

21 Marchi et al. The mixed genetic origin of the first farmers of Europe

Figure S10 - Scatterplot of δ13C and δ15N values obtained from the individuals analyzed in this study and published data from 613 Neolithic and Mesolithic humans from Central Europe122, the Danube Gorge19,20,22,23,62,100,123–127, the Aegean Basin 127–129 and NW Anatolia73,74,127. Published faunal values from the same period/regions are included for comparison22,62,73,74,100,130,131.

The Neolithic individuals from Herxheim (Herx), Asparn-Schletz (Asp6) and Kleinhadersdorf (Klein7) cluster with other Neolithic individuals from Central Europe (Fig. S10). Without detailed examination of local food web variations, inferences made in this section remain tentative. The δ13C values observed indicate a diet based on terrestrial animal protein with less plant contribution. Herx shows values suggestive of higher C3 plant (e.g. wheat, barley, leafy vegetables) consumption, when compared with other sampled individuals. Asp6 presumably consumed more animal than plant proteins. The individual from Dillingen-Steinheim (Dil16) has lower δ15N values compared to central European Neolithic individuals, but overall falls within the range of distribution of Neolithic farmers in central Europe. The lower δ15N may be influenced by the quality of the terrestrial proteins, meaning that the proxy includes animals lower in the food chain, i.e. herbivores (δ15N values typically +5.3 ±1.9). It is worth bearing in mind that nitrogen isotopic variations differ greatly within and between ecosystems and trophic levels; herbivore bone collagen values are affected by many factors including climate, organism’s 22 Marchi et al. The mixed genetic origin of the first farmers of Europe physiology etc119. The Neolithic individual from Barcın (Bar25) has the lowest δ13C value in the dataset and plots outside the Neolithic group. The sample’s δ13C intermediate value may be influenced by a mixed C3 and C4 (e.g. millet) plant consumption, as well as animal proteins. Nea3, the ~6-9 month old infant from Nea Nikomedeia has higher δ15N value compared to other Neolithic individuals of this study, related to breastmilk consumption (breastfed infants typically have 2-3‰ higher δ15N values than their mother120,121). The Mesolithic individual from the Danube Gorges (VLASA32) has the highest δ15N value, and plots outside the range of published Mesolithic individuals from Central Europe122. Its high δ15N value suggests a large intake of freshwater food, primarily large fish (e.g. anadromous salmonids), and is consistent with recent published values from the Danube Gorges Mesolithic sites62.

23 Marchi et al. The mixed genetic origin of the first farmers of Europe

Section 2: DNA preparation and sequencing

Paleogenetic analyses were conducted at the Institute of Organismic and Molecular Evolution (iomE) at the Johannes Gutenberg University, Mainz. All laboratory steps prior to PCR amplification (sample preparation, DNA extraction and library preparation) were carried out in the dedicated ancient DNA facilities of the Palaeogenetics Group, which are separated from post- PCR areas. Strict aDNA protocols to prevent and detect contamination with modern DNA as well as cross contamination between samples were applied as previously described132–134. These aDNA standards include decontamination of workspace, labware and samples, independent DNA extractions and the processing of blank controls during sample pulverization, DNA extraction, library preparation and PCR reactions to monitor contaminations.

Sample preparation

Human petrous bones (Pars petrosa) were prepared as described in Hofmanová et al.77. After documentation, the bone samples were sterilized under ultraviolet light (254 nm) from 2 sides for 45 minutes per side. In order to remove superficial contaminants, soil remains and the outer bone surface were removed using a sandblasting machine (P-G 400, Harnisch & Rieth, Winterbach, Germany) with Spezial-Edelkorund (EW60/250 my and 30B/50 my; Harnisch+Rieth). The densest, inner part of the petrous bone was then isolated and cut in cubes using a disk saw (Electer Emax IH-300, MAFRA). After irradiation with ultraviolet light (254 nm) from two sides, for 45 minutes per side, the densest bone cubes were pulverized using a mixer mill (MM200, Retsch). To control contamination blank milling controls containing hydroxyapatite were processed in parallel.

DNA extraction

DNA extraction followed the protocol by Yang et al.135 with the modifications described in MacHugh et al.136 and Gamba et al.137 as well as additional modifications described below. For each sample, 0.15-0.31 g of bone powder was used for extraction. Prior to extraction, a pre-lysis was performed by adding 1 mL of EDTA (0.5 M, pH8, Ambion/Applied Biosystems, Life technologies, Darmstadt, Germany) to the bone powder and incubating at room temperature for 10 minutes. The solution was centrifuged at maximum speed to pellet the powder and the

24 Marchi et al. The mixed genetic origin of the first farmers of Europe supernatant was removed. Lysis was performed on rocking shakers at 37°C for 24 hours (900 rpm) using 1 ml of extraction buffer containing EDTA (950 μl, 0.5 M, pH8, Ambion/Applied Biosystems, Life technologies, Darmstadt, Germany), Tris-HCl (20 μl, 1 M, pH8, Life Technologies, Carlsbad, ), N-Laurylsarcosine (17 µl, 5%, Merck Millipore, Darmstadt, Germany) and Proteinase K (13 µl; 20 mg/ml, Roche, Mannheim, Germany). After 24 hours of incubation, the samples were centrifuged for 10 minutes at 10,000 rpm, the supernatant was removed, transferred into new tubes and stored in a fridge until further processing. A second lysis step was performed following the same procedure. Following lysis, the supernatants from the two lysis steps were merged on an Amicon Filter (Amicon Ultra-4 30 kDA, 15 ml, Merck Millipore, Darmstadt, Germany) and centrifuged for 10 min at 2500 rpm. The DNA was then washed twice with 3 ml 1X Tris-EDTA, followed by centrifugation at 2500 rpm for 20 minutes and discarding of the flow-through in between. After washing, the extract was concentrated to 100 µl and subsequently purified with the QIAgen MinElute kit (Qiagen, Venlo, Netherlands) following the manufacturer’s instructions but incubating for 5 minutes during elution with 44 µl elution buffer (preheated to 65°C). Blank controls were processed during DNA extraction and incorporated into all further steps of the analyses.

Library preparation

Double-indexed Libraries were prepared according to the protocol by Kircher et al.138 with slight modifications. Prior to library preparation the DNA extracts were treated with USERTM enzyme: 5 µl of USERTM enzyme (New England Biolabs, Ipswich, Massachusetts, United States) was added to 16.25 μl of DNA extract (exception: 17µl of two merged extracts were used for library SL3_5 of sample AKT16, see Supp. Table 1) and the mixture was incubated for three hours at 37°C 139. The blunt-end repair step followed immediately and was performed using the NEBNext End Repair Module (New England Biolabs, Ipswich, Massachusetts, United States): the DNA extract was mixed with NEBNext End Repair Reaction Buffer (10X, 7 µl), NEBNext End Repair Enzyme Mix (3.5 µl) and nuclease-free water (38.25 µl; for a final reaction volume of 70 µl) and incubated for 15 minutes at 25°C followed by 5 minutes at 12°C. Deviating from this procedure, two libraries (SL2.12_MU and SL3.12_MU of sample VC3-2) were produced using 20 µl DNA extract and without USERTM enzyme treatment of the DNA extract (see Supp. Table 1). In the adapter ligation step hybridized adapters P5 and P7 (IDT, Leuven, ) were used at a concentration of 0.75 µM. 3 µl of Fill-In product (total volume: 40 µl) were amplified using

25 Marchi et al. The mixed genetic origin of the first farmers of Europe

AccuPrimeTM Pfx SuperMix (20 µl; Thermo Fisher Scientific, Waltham, Massachusetts, United States) in one PCR parallel adding unique and sample-specific index combinations to the library molecules (final reaction volume: 25 µl; final primer concentration: 200 nM or 160 nM each). Double indexing followed Kircher et al.138, but using index sequences from the NexteraXT index Kit v2 (Illumina, San Diego, California, United States; barcode length 8 bp; ordered at IDT, Leuven, Belgium). The PCR was performed in 9-13 cycles; the PCR temperature profile followed the manufacturer’s recommendations but using an annealing temperature of 60°C, extending for 30 seconds during each cycle and performing a final elongation step for 5 minutes. For purification during library preparation the MinElute PCR Purification Kit (Qiagen, Hilden, Germany) was used, while amplified libraries were purified with the MSB® Spin PCRapace (Invitek, Stratec Molecular, Berlin, Germany). Libraries were quantified by Qubit® Fluorometric quantitation (dsDNA HS assay, Invitrogen, Carlsbad, California, United States) and measurement on the Agilent 2100 Bioanalyzer System (HS DNA, Agilent Technologies, Waldbronn, Germany). Blank controls as well as positive controls (nonsense hybrids) of known concentration, were processed in every library step including PCR amplification to verify the success of the library preparation and to monitor contamination. Quantification of blank controls did not indicate significant contamination during any laboratory step (pulverization, DNA extraction, library preparation and amplification).

Sample Screening

Using shallow shotgun sequencing on an Illumina MiSeqTM platform at StarSEQ GmbH (Mainz, Germany), all samples were screened for their endogenous DNA preservation. Libraries were equimolarly pooled, subsequently purified with magnetic beads (Agencourt® AMPure® XP beads, Beckmann Coulter) and sequenced in single-end runs with 50 bp read length. Demultiplexing was performed by the sequencing facility using the MiSeq Reporter and allowing one mismatch in the barcode. Blank controls from pulverization, DNA extraction, library preparation and PCR-steps were sequenced alongside to estimate the fraction of potentially contaminating reads introduced in the lab (see Section 3).

Whole-genome sequencing

For deeper shotgun sequencing 2-5 DNA extracts and 3-9 libraries of each sample were prepared (as detailed above). The libraries were amplified in up to 13 PCR parallels, each with an 26 Marchi et al. The mixed genetic origin of the first farmers of Europe individual index combination, to increase the complexity of the libraries. Subsequently, PCR parallels were purified and quantified individually as described above. For sequencing, PCR parallels were pooled equimolarly according to their concentrations measured on Qubit® and purified with magnetic beads (Agencourt® AMPure® XP beads, Beckmann Coulter). The samples were sequenced on an Illumina HiSeq3000 (SE, 100 cycles or PE, 150 cycles) at the Next Generation Sequencing Platform at the University of Bern, Switzerland. The lab work and the sequencing strategy for each sample are summarized in Table S3; for details see Supp. Table 1.

Table S3 - Summary of lab work and sequencing strategy.

Sample ID # Extracts # Libraries # PCR parallels # HiSeq3000 Lanes AKT16 4 (+1 merged) 8 98 8 Bar25 5 5 58 5 Nea2 4 4 41 3 Nea3 4 4 42 3 VLASA32 2 4 52 4 VLASA7 2 4 52 4 LEPE48 2 3 35 3 LEPE52 5 7 72 4 STAR1 2 3 36 3 VC3-2 4 6 60 4 Asp6 2 3 35 3 Klein7 4 5 50 3 Ess7 4 6 69 4 Dil16 4 5 47 3 Herx 5 9 110 6

27 Marchi et al. The mixed genetic origin of the first farmers of Europe

Section 3: Genotyping, quality checks

Samples processed

We analyzed a total of 101 high-depth samples, consisting of: 1. 77 modern individuals from the Simons Genome Diversity Project (SGDP, https://www.simonsfoundation.org/simons-genome-diversity-project/)140, mostly from Europe and SW Asia (Fig. S11b), but we also included two Africans (Mende), six Asians (two Tubalars, two Kapus, two Hans), two Oceanians (Papuans) and two Amerindians (Karitianas) for comparison. 2. 15 ancient individuals sequenced for this study at 10X - 15X (see Table 1) 3. The following nine other high-quality ancient (Palaeolithic, Mesolithic and Neolithic) individuals previously published (see Table S4; Fig. S11a): a. Bichon141, demultiplexed FASTQ files downloaded from ENA with run accessions ERR1078331-ERR1078351 b. Loschbour and Stuttgart142, demultiplexed FASTQ files kindly provided by the authors (contact: Kay Prüfer). c. SF12143, BAM file downloaded from ENA with run accession ERR2060277. d. Bon002144, unaligned FASTQ file downloaded from European Nucleotide Archive (ENA) with run accession ERR1514027. e. WC1145, demultiplexed FASTQ file kindly provided by the authors (contact: Jens Blöcher, co-author of the study). f. Bar877, demultiplexed FASTQ file kindly provided by the authors (contact: Zuzana Hofmanová, co-author of the study). g. NE1137, raw sequencing FASTQ file kindly provided by the authors (contact: Lara Cassidy). h. CarsPas1146, demultiplexed FASTQ file kindly provided by the authors (contact: Yoan Diekmann, co-author of this study).

28 Marchi et al. The mixed genetic origin of the first farmers of Europe

Table S4 - General and genetic information on the 9 ancient genomes from the literature used in our study. Note that the mean depth was recalculated from the pipeline described in this study.

Individual Period Site Country Age Mean Genetic Haplogroups Phenotype Publication (culture) (cal. BP) Depth (X) sex mtDNA Y (Eye; Hair)

Bichon LUP Grotte du Bichon Switzerland 13,770-13,560 7.47 XY U5b1h I2 Br; D Jones, 2015 SF12 Meso Stora Förvar Sweden 9,033–8,757 51.21 XX U4a1 ⎯ Br; D Günther, 2018 Loschbour LM Heffingen 8,170-7,940 22.43 XY U5b1a I2a1b* Bl; D Lazaridis, 2014 Bon002 Neo Boncuklu Turkey 10,229-9,927 5.45 XX K1a ⎯ ⎯ Kilinc, 2016 WC1 EN Cave Iran 9,405-9,032 12.66 XY J1d6 G2b Bl; D Broushaki, 2016 Bar8 EN Barcın Turkey 8,162-7,980 7.37 XX K1a2 ⎯ Br; D Hofmanova, 2016 NE1 MN (ALP) Polgar-Ferenci-hat 7,260-7,020 17.66 XX U5b2c ⎯ Br; D Gamba, 2014 Stuttgart Neo (LBK) Viesenhäuser Hof, Stuttgart-Mühlhausen Germany 7,050-6,750 15.58 XX T2c1d1 ⎯ Br; D Lazaridis, 2014 CarsPas1 EN Carsington Pasture Cave, Brassington England 5,606-5,471 10.14 XY J1c1 I2a2a1a1a Br; D Brace, 2019

LUP, Late Upper Palaeolithic; Meso, Mesolithic; LM, Late Mesolithic; Neo, Neolithic; EN/MN, Early/Middle Neolithic; LBK, Linearbandkeramik; ALP, Alföld Linear Pottery Br, Brown eyes; Bl, Blue eyes; D, Dark hair (listed as "Brown or Black", "Black/Dark", "Dark" in the original publications)

Paleolithic b Saami (2) a Mesolithic Saami (2) SF12 ● Neolithic Icelandic (2) Icelandic (2)

Finnish (3) Russian (2) Norwegian (1) ● Orcadian (2) CarsPas1 Estonian (2)

Loschbour Herx ● Ess7 Stuttgart● ● ● ● Klein7 Dil16 Asp6 ● English (2) Polish (1) NE1 Bichon Czech (1) STAR1 LEPE48; LEPE52 ● Hungarian (2) VC3_2 VLASA32; French (3) VLASA7 Bergamo (2)

Nea2; Tuscan (2) Adygei (2) Chechen (1) Nea3 Bulgarian (2) Basque (2) Abkhasian (2) Georgian (2) ● ●● Bar25; Bar8 Albanian (1) AKT16 Armenian (2) Spanish (2) Sardinian (3) Greek (2) Turkish (2) ●Bon002 Iranian (2) Crete (2) Druze (2) Iraqi_Jew (2) WC1 ● Palestinian (3) Jordanian (2) Bedouin (2) 0 500 1000 km 0 500 1000 1500 km Figure S11 - Geographical distribution of all genomes included in this study. a: 24 high-quality ancient genomes: 15 newly sequenced (bold) and 9 from the literature. b: 65 modern genomes from Europe and SW Asia that were selected within the SGDP panel. 29 Marchi et al. The mixed genetic origin of the first farmers of Europe

Programs used

All bioinformatic steps were conducted with the following programs and versions: ● fastqc - version 0.11.5, (www.bioinformatics.babraham.ac.uk/projects/fastqc/) ● Trim Galore! - version 0.4.3, (https://github.com/FelixKrueger/TrimGalore) ● bwa - Burrows-Wheeler Alignment Tool - version 0.7.15147 ● SAMtools - version 1.3148 ● Picard-tools - version 2.9, http://broadinstitute.github.io/picard/ ● GATK - version 3.7149 ● ATLAS - version 1.0, commit 6bd2482150 ● Snakemake - version 4.0151 ● seqtk - version 1.2 (https://github.com/lh3/seqtk) ● ContamMix - version 1.0152 ● MIA (Mapping Iterative Assembler) - version 1.0 https://github.com/mpieva/mapping-iterative-assembler ) ● mafft - version 7.31153 ● ANGSD - version 0.917154 Some of the steps were performed with commit 6df90e7 of the snakemake script ATLAS-Pipeline available at bitbucket.org/wegmannlab/atlas-pipeline, as indicated below.

30 Marchi et al. The mixed genetic origin of the first farmers of Europe

Bioinformatics Pipeline

All 101 ancient and modern samples were processed with the same pipeline, albeit minor changes due to variation in how the raw data were obtained. We therefore first describe the pipeline as used on the 15 samples sequenced in this study, followed by a description of how previously published data were analyzed. The pipeline consisted of the following 8 steps.

Step 1: Raw data handling Raw, demultiplexed sequencing data were processed per PCR-parallel as follows: 1. The sequencing quality was checked with fastqc. 2. Raw reads were trimmed using trimgalore with no quality filter and a length filter of 30 bp ( -q 0, --length 30, -a ‘AGATCGGAAGAGCACACGTCTGAACTCC’ ). For paired-end libraries the additional option --retain unpaired was used and the second adapter sequence (-a2 'AGATCGGAAGAGCGTCGTGTAGGGAAAG') was provided. 3. A second fastqc analysis was performed to verify trimming and the quality of the remaining sequences. 4. Reads were then aligned to the 1000 genomes version of the human reference genome hs37d5 (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly _sequence/hs37d5.fa.gz155) using bwa -mem with options -t 8 -M. Reads with a mapping quality below 30 were filtered out using SAMtools. 5. Duplicate reads were marked but not removed using picard-tools MarkDuplicates with VALIDATION_STRINGENCY=SILENT and AS=TRUE. 6. Unmapped reads were removed with SAMtools with option -F4. Informative read groups were added to keep track of individual PCR-parallels with picard-tools AddOrReplaceReadGroups. 7. All necessary sorting and indexing in-between the steps was performed with SAMtools, library-files from the same samples were merged using SAMtools merge.

31 Marchi et al. The mixed genetic origin of the first farmers of Europe

Step 2: Local Realignment Local indel-realignment is an important but computationally expensive step in our pipeline. In order to allow for parallelization, we proceeded as follows: First, we identified potential target intervals using GATK RealignerTargetCreator from a set of ten ancient and ten modern samples: Asp6, Bar25, Dil16, Ess7, Klein7, LEPE48, Nea2, Nea3, STAR1, VC3, VLASA32, VLASA7, French-1, Sardinian-1, Russian-1, Spanish-1, Georgian-1, Hungarian-1, Icelandic-1, Finnish-1, English-1, Greek-1, Mende-1, Estonian-1, Polish-1 and providing known indel sites identified by the 1000 Genomes project (ftp://gsapubftp- [email protected]/bundle/b37/1000G_phase1.indels.b37.vcf.gz, ftp://gsapubftp- [email protected]/bundle/b37/Mills_and_1000G_gold_standard.indels.b37.vcf.gz155). We refer to the obtained target set as TargetsBase. Second, we created a set of reads by downsampling the following 5 ancient and 5 modern samples to a depth of 4X: Bar25, Klein7, STAR1, VLASA32, VLASA7, French-1, Georgian-1, Finnish-1, Greek-1, Polish-1). We refer to this set as the GuidanceSet. Third, we run local indel-realignment for each sample individually, proceeding as follows: 1. We identified private target positions for the sample using GATK RealignerTargetCreator and providing the known indel sites from the 1000 Genomes project (as above). We then unified the identified targets with TargetsBase to obtain a sample-specific target set. 2. We run GATK IndelRealigner on this sample together with the GuidanceSet on the sample- specific targets. We parallelized this step additionally by contig, merging the output with picard-tools. 3. After local-realignment, we used picard-tools FilterSamReads to filter out reads that contained soft-clipped bases (identified with ATLAS assessSoftClipping) or did not pass picard-tools ValidateSamFile. We further used SAMtools to keep only primary alignments and, for paired-end libraries, proper pairs.

32 Marchi et al. The mixed genetic origin of the first farmers of Europe

Step 3: Sample Stats We determined read counts, sequencing depth, endogenous DNA-content and other statistics using ATLAS BAMDiagnostics, ATLAS depthPerSiteDist and SAMtools flagstat. These statistics are given per library-parallel for all ancient samples in Supp. Table 1 and merged for the 15 samples produced here in Table S5, Fig. S12. 1.0 a AKT16 Herx Nea3 b Asp6 Klein7 STAR1 Bar25 LEPE48 VC3−2 Dil16 LEPE52 VLASA32 0.8

3e+08 Ess7 Nea2 VLASA7 0.6 2e+08 count 0.4 normalized cumulative depth normalized cumulative 1e+08 0.2 0.0

0e+00 0 10 20 30 40 50 0 10 20 30 40 50 depth depth 60 c AKT16* Herx Nea3 d ● Asp6 Klein7 STAR1 ● ● ● ● ● Bar25 LEPE48 VC3−2 ● ● 50 Dil16 LEPE52 VLASA32 ● 5e+07 ● Ess7 Nea2 VLASA7 ● ● ● ● ●

● 40 ● ● 4e+07

● ● ● ● 30 count

3e+07 ● ● % endogenous 20 2e+07 10 1e+07 0 2 0e+00

30 40 50 60 70 80 90 100 − Herx Ess7 Dil16 Asp6

read length Nea2 Nea3 Bar25 Klein7 AKT16 VC3 STAR1 LEPE48 LEPE52 VLASA7 VLASA32

Figure S12 - Sample Statistics. a: Distribution of read depth; b: normalized cumulative depth distribution for all samples produced in this study; c: Length distribution of sequenced single-end samples; d: Percentage of endogenous reads among all sequencing reads per library-parallel (black) and for whole samples (green points) for all samples produced in this study. The vertical dashed red line in c) indicates where the samples were split for post-mortem damage estimation. Reads above a length of 95 bp are assumed to not have been sequenced to their full length and may therefore miss the typical PMD-pattern on their 3’ side. (*only single-end libraries plotted for AKT16) 33 Marchi et al. The mixed genetic origin of the first farmers of Europe

Table S5 - Sequencing data produced in this study (n = 15). No. of reads mapping % of endogenous Total no. of Mean sequencing Sample to hs37d5 after reads without sequenced reads depth (X) filtering duplicates AKT16 3,484,002,250 709,352,253 20.36027 12.25 Asp6 1,139,245,581 689,412,495 52.80776 12.11 Bar25 1,991,972,802 1,176,984,160 43.04312 12.65 Dil16 1,164,789,725 652,044,979 48.95333 10.60 Ess7 1,562,375,278 749,310,530 41.48627 12.34 Herx 2,356,604,701 698,260,906 25.51949 11.46 Klein 1,174,296,074 679,244,954 50.60461 11.30 LEPE48 1,148,525,245 622,825,286 48.02362 10.92 LEPE52 1,541,932,690 902,777,757 47.21246 12.37 Nea2 1,168,563,125 723,747,890 52.05976 12.51 Nea3 1,193,896,217 751,809,835 52.52959 11.57 STAR1 1,167,689,785 673,467,996 49.30335 10.55 VC3-2 1,537,245,748 765,893,269 42.79341 11.22 VLASA32 1,603,082,543 899,098,270 46.05057 12.65 VLASA7 1,601,828,507 989,180,954 53.10986 15.21

Steps 4-7: ATLAS-Pipeline We used commit 6df90e7 of the snakemake script ATLAS-Pipeline available at bitbucket.org/wegmannlab/atlas-pipeline with the following entries in the config-file: - sample_file: Input-table with sample paths atlas: location of ATLAS executable; ref: location of reference genome (hs37d5); UCNE: location of ultraconserved sites for recal - sequence: single, split=T, cutoff:5 for single-end libraries sequence: single, split=F for pre-merged reads from reference-data sequence: paired for paired-end libraries - recal: NO-file - PMD: separate

34 Marchi et al. The mixed genetic origin of the first farmers of Europe

Step 4: Post-Mortem-Damage (PMD) This step was performed with the snakemake script ATLAS-Pipeline. We proceeded as follows to deal with post-mortem damage (PMD) affecting the ancient samples: 1. Since PMD is most prevalent at read ends, the PMD pattern is different if a read spans the entire fragment or not. Following 156, we therefore split single-end reads by length using ATLAS task=splitRGbyLength with the option allowForLarger. We identified reads that span the entire fragment as those shorter than the maximum read length minus 5 bases to account for variation introduced by adapter-trimming (Fig. S12). 2. We identified PMD patterns for each read group (PCR-parallel) using ATLAS task=estimatePMD, providing the reference genome with fasta=hs37d5.fasta and limiting the analysis to the chromosomes 1-22, X and Y with option chr. As shown in Fig. S13, all samples show typical post-mortem damage patterns. VC3 contains libraries that were not UDG-treated and therefore shows higher PMD-patterns than the other samples from this study.

AKT16 Dil16 Klein7 Nea2 VC3−2 Asp6 Ess7 LEPE48 Nea3 VLASA32 0.025 Bar25 Herx LEPE52 STAR1 VLASA7 0.025 damage rate − 0.020 0.020 mortem − 0.015 0.015 0.010 0.010 0.005 0.005 Weighted average of empiric post average Weighted

0.000 0 10 20 30 40 5050 40 30 20 10 0 0.000 Position in the read (5' ® 3') Position in the read (3' ® 5')

Figure S13 - Estimated post mortem damage (PMD) patterns for the first 50 bp (C→ T, left) and the last 50 bp (G → A, right). PMD was estimated per PCR-parallel. Shown here is the weighted average per sample. VC3-2 was not completely treated with UDG resulting in visibly higher PMD- patterns. All samples show a pattern of elevated PMD towards the end of the reads.

35 Marchi et al. The mixed genetic origin of the first farmers of Europe

Step 5: Recalibration This step was performed with the snakemake script ATLAS-Pipeline. We performed base-quality recalibration as described in 156 using ATLAS task=recal on known ultraconserved sites, obtained from https://ccg.epfl.ch/UCNEbase/data/download/ucnes/hg19_UCNE_coord.bed157. We assumed equal base frequencies (equalBaseFreq) and provided the PMD estimates obtained above for each PCR-parallel (pmdFile). We inferred recalibration parameters for each PCR-parallel from all sites with a minimum depth of 2 (minDepth=2). Sequencing- and PCR-duplicates were not excluded from the estimation of recal parameters (keepDuplicates) since they add additional power. The quality transformation by base quality recalibration (created with ATLAS task=qualityTransformation) can be seen in Fig. S14.

Asp6 Bar25 Dil16 50 50 50 40 40 40 30 30 30 20 20 20 10 10 10 0 0 0

0 10 20 30 40 50 0 10 20 30 40 50 0 10 20 30 40 50 Ess7 Herx Klein7 50 50 50 40 40 40 30 30 30 20 20 20 10 10 10 0 0 0

0 10 20 30 40 50 0 10 20 30 40 50 0 10 20 30 40 50 LEPE48 LEPE52 Nea2 50 50 50 40 40 40 30 30 30 20 20 20 10 10 10 0 0 0

0 10 20 30 40 50 0 10 20 30 40 50 0 10 20 30 40 50 Nea3 STAR1 VC3−2 50 50 50 Recalibrated quality scores Recalibrated 40 40 40 30 30 30 20 20 20 10 10 10 0 0 0

0 10 20 30 40 50 0 10 20 30 40 50 0 10 20 30 40 50 VLASA32 VLASA7 50 50 0 1e−1 40 40 1e−2 30 30 1e−3

20 20 1e−4 1e−5 10 10 1e−6 0 0

0 10 20 30 40 50 0 10 20 30 40 50 Machine reported quality scores Figure S14 - Quality Transformation for 14 genomes. Base quality scores before and after recalibration. A general trend is visible, where the sequencing machine mostly overestimated the quality of the sequenced bases.

36 Marchi et al. The mixed genetic origin of the first farmers of Europe

Step 6: Read-merging This step was performed with the snakemake script ATLAS-Pipeline. For paired-end samples and samples with partly paired-end sequenced libraries, the estimated recalibration parameters were directly corrected in the BAM file using ATLAS recalBAM providing the recal-parameters (recal) and the reference (fasta=hs37d5.fasta). Paired-end reads with corrected base qualities were then physically merged with ATLAS task=mergeReads to avoid the double-use of bases in the overlapping part. Reads that did not pass picart-tools ValidateSamFile were omitted. As a result of the physical merging, their read lengths changed, and we thus re-estimated PMD patterns as outlined above. We also created recalibrated BAM files from single-end libraries, which we used in any subsequent analysis in which ATLAS was not involved (e.g. contamination estimation).

Step 7: Genotype calling This step was performed with the snakemake script ATLAS-Pipeline. We called Bayesian genotypes at all sites in the reference genome with ATLAS task=callNew and parameters method=Bayesian, prior=theta, fixedTheta=0.001, infoFields=DP, equalBaseFreq. The reference genome, PMD patterns and recal-patterns (for single-end read groups) were provided to the caller with fasta, pmdFile and recal, respectively. The first and last 2 bp of each read were ignored (trim5=2, trim3=2).

37 Marchi et al. The mixed genetic origin of the first farmers of Europe

Step 8: Molecular sexing The molecular sex (Table 1) was inferred using a script from 158 version 4.0, based on the ratio of reads aligning to the X and Y chromosomes (Fig. S15).

XY

● ● ● ● ● ● ● ● ● 0.075 chromosomal reads − 0.05 to (X+Y) − 0.016 Fraction of Y Fraction XX

● ● ● ● ● ● 0 2 − Herx Ess7 Dil16 Asp6 Nea2 Nea3 Bar25 Klein7 AKT16 VC3 STAR1 LEPE48 LEPE52 VLASA7 VLASA32 Figure S15 - Inferred chromosomal sex for the 15 newly sequenced genomes, based on the share of reads aligning to Y-chromosome from the reads aligning to X and Y chromosomes. A ratio below 0.016 indicates an XX genotype, a ratio above 0.075 indicates an XY genotype. Note that error bars showing 95% confidence intervals are too small to be visible in this plot.

38 Marchi et al. The mixed genetic origin of the first farmers of Europe

Step 9: Contamination estimation Blank controls of milling, extraction, library-preparation and index-PCR were analyzed alongside the screening process. The concentration of potential contaminants was never higher than 0.81 ng/µl. A bioanalyzer run and the screening results confirm the detected DNA to consist of primer- and adapter-dimers with a maximum of 55 aligning reads per blank control (out of a potential share of 200,000 reads). To estimate modern human contamination in the sequencing results, two standard approaches were used: ContamMix: The ContamMix R-script estimate.R estimates the amount of authentically mapping mitochondrial reads at 311 modern diagnostic marker positions. We run ContamMix with the option --baseq20 on the following two input-files that we generated from all reads in the BAM file that mapped against the MT-Genome (SAMtools): 1. An alignment of mitochondrial reads against their own consensus (-samFn). This consensus was constructed by extracting iteratively mapping the reads with MIA (mia) and mapping the last iteration with MIA (ma) to obtain one FASTA-sequence. The MT-reads were mapped against their consensus using bwa aln and samse (V.0.7.17) and filtered for a minimum mapping quality of 30 with SAMtools. 2. A multiple alignment of the consensus genome and a fasta-file containing the diagnostic marker positions against each other (--malnFn) obtained with mafft. ANGSD: For male individuals, contamination was additionally estimated using the ANGSD contamination script based on haploid X-chromosomal regions (X:5000000-154900000). We run ANGSD with the options -doCounts 1 and -iCounts 1, followed by the executable misc/contamination, providing the publicly available HapMap file HapMapChrx.gz. The contamination estimates obtained with both methods, the percentage of not authentically mapping reads by ContamMix and the estimated contamination rate by ANGSD, are shown in Fig. S16 and Supp. Table 1. While the two methods use complementary approaches (i.e. comparing mtGenomes at Neanderthal-specific sites vs heterozygosity on X-chromosomes in males), resulting contamination estimates are generally low (<2% in most cases, <4% in all cases).

39 Marchi et al. The mixed genetic origin of the first farmers of Europe

● 0.04 contamMix ● ANGSD 0.03

● 0.02 contamination

● ● ●

0.01 ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● 0.00 2 − Herx Ess7 Dil16 Asp6 Nea2 Nea3 Bar25 Klein7 AKT16 VC3 STAR1 LEPE48 LEPE52 VLASA7 VLASA32 Figure S16 - Percentage of contamination in the 15 newly sequenced samples. All samples show a contamination rate below 4%. Grey: ContamMix, 1-mapAuthentic, error bars represent 95% CI. Blue: ANGSD (male individuals only), Method1, new_llh, ML, standard error for all ANGSD analysis is below 2e-12 and therefore not visible in this plot.

40 Marchi et al. The mixed genetic origin of the first farmers of Europe

Reference samples

To minimize reference biases159, we committed to analyze all previously published samples with the same pipeline outlined above. However, some adaptations were necessary for some samples: - Blank characters in read names (e.g. from ENA accession numbers) were removed to ensure the proper detection of optical duplicates. - The quality scores of the Bichon FASTQ files were converted from illumina 1.5 to illumina 1.9 with setqk seq -V 64. - Stuttgart and Loschbour were received as unaligned and untrimmed demultiplexed raw data in BAM file format and transformed to unaligned FASTQ format using picard-tools SamToFastq. - For the Loschbour files, individual library information was not recoverable. Therefore duplicates across all sequencing files were marked in this step of the pipeline. - Raw sequencing data was not obtainable for Bon002 and SF12. For Bon002, we downloaded the published BAM files converted back into FASTQ files from ENA (run accession ERR1514027). For SF12, we used published BAM files with already physically merged reads and converted them back to FASTQ files using picard-tools SamToFastq. For both, we split the FASTQ files by run identifiers to treat each run separately throughout the pipeline. Since those reads were already pre-processed, we omitted adapter trimming, removed orphaned reads and otherwise followed the pipeline for single-end samples without splitting read groups by length. - For NE1, one of the FASTQ files was corrupted and only 25% of the contained reads could be recovered. As a result, we could only use 97% of the total number of reads generated for that sample137. We then demultiplexed the fixed FASTQ-files with a custom bash script, allowing one mismatch at the first position and one mismatch at any other position of the index. - Since the modern samples from the SGDP dataset were already aligned to the desired reference genome, we abstained from remapping them and analyzed them along the ancient samples, starting with marking optical and PCR duplicates (picard-tools) and local realignment. For each ancient sample, all differences from the standard pipeline are detailed in Supp. Table 1.

41 Marchi et al. The mixed genetic origin of the first farmers of Europe

Data filtering

All VCF file processing was performed using bcftools v1.9. Individual VCFs obtained after Bayesian genotype calling were first filtered according to their read depth (DP): for each individual we excluded sites that had DP < 8, and those that had a DP more than 2.5 s.d. away from the mean DP for each individual (see Supp. Table 2). We also excluded sites that had poor genotype quality (GQ < 30), and we only kept autosomal sites. Furthermore, heterozygous sites were considered as homozygous if they had a significant allelic imbalance (p-value ≤ 0.1) tested by Fisher’s exact test. After filtering, the modern individuals had a read depth of 37X on average and the ancient individuals of 14X (ranging from 6 to 70X), with 2,549,812,798 sites passing the filters on average for the modern genomes and 1,983,419,630 for the ancients (ranging from 333,061,590 to 2,573,873,256). Applying these filters led to a substantial increase in the quality of our heterozygous sites, reducing the amount of sites with high allelic imbalances and decreasing the asymmetry between singleton- and non-singleton sites to a minimum (Fig. S17). To quantify this improvement explicitly, let us denote by Histhe set of loci at which individual i was called heterozygous either for a singleton (only non-reference allele across all samples, s=1) or not (s=0). We then define the imbalance statistics

&$! ∑!∈# $( &'.)) $% '$! !!" = &$! , ∑!∈# $( ,'.)) $% '$! where ril denotes the number of reference alleles observed in individual i at site l out of the total number of observed alleles dil at this site. To compare singleton (s=1) against non-singleton (s=0) positions, we further quantify #! = !-.⁄!-'. As shown in Fig. S18, the chosen filters guarantee comparable imbalances at singleton and non-singleton sites, indicating similar quality of both categories. A striking outlier was samples CarsPas1, which was not included in any demographic analysis.

42 Marchi et al. The mixed genetic origin of the first farmers of Europe

filter: none filter: DP filter: DP, GQ filter: DP, GQ, AI 10 10 10 10 a b c d AKT16 Asp6 8 8 8 8 Bar25 Bar8

6 6 6 6 Bichon Bon002

density CarsPas1 4 4 4 4

Single singletons Dil16 Ess7

2 2 2 2 Herx Klein7 LEPE48 0 0 0 0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 LEPE52 10 10 10 10 e f g h Loschbour NE1

8 8 8 8 Nea2 Nea3 SF12 6 6 6 6 STAR1 Stuttgart density

4 4 4 4 VC3_2 VLASA32 Other heterozygous sites

2 2 2 2 VLASA7 WC1 modern 0 0 0 0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Allelic Imbalance Allelic Imbalance Allelic Imbalance Allelic Imbalance Figure S17 - Successive effects of filtering on allelic imbalance. Density of the allelic imbalance for singleton positions (top line panel) and other heterozygous positions (bottom line panel). a, e: No filter; b, f: Depth filter (DP) as described in the text; c, g: DP filter and genotype filter (GQ ≤ 30); d, h: DP, GQ and allelic imbalance filter (AI > 0.1).

Heterozygous non−singletons Singletons Ratio singletons/non−singletons 4 4 4 ● ● a AKT16 Loschbour b ● c ● ● Asp6 NE1_1 ● ● ● BAR25−1 Nea2 ● ● ● 3 3 3 Bar8 Nea3 ● ● ● ● Bichon SF12 ● ● ● ● ● Bon002 STAR1 ● ● ● ● CarsPas1 Stuttgart ● ● ● ● ● ● ● ) ) ● ) 2 2 2 ● ● i i0 Dil16 VC3−2 i1 ● ● ● r r r ● ● ( ( ( ● ● ● Ess7 VLASA32−2 ● ● ● ● log log log ● ● Herx VLASA7−3 ● ● ● ● ● ● ● Klein7 WC1 ● ● ● ● ● ● ● ● 1 1 ● ● 1 ● ● LEPE48−2 modern ● ● ● ● ● ● LEPE52−1 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 0 0 ● ● ● ● ● ●

raw DP GQ AI raw DP GQ AI raw DP GQ AI Filter Filter Filter

Figure S18 - Imbalance statistics. The imbalance statistics ris calculated for non-singleton (a, ri0) and singleton (b, ri1) sites. c: The imbalance ratio #! = !-.⁄!-'. The range of these statistics observed for modern samples is shown as grey shades in all panels.

43 Marchi et al. The mixed genetic origin of the first farmers of Europe

The filtered individual VCFs were then merged into a single VCF, and polarised with the Chimpanzee reference genome (panTro4). This merged VCF includes 15,001,235 polymorphic sites (14,865,831 strictly polymorphic within the non-missing genotypes, and 135,404 sites for which all the non-missing genotypes are homozygous for the derived allele). On average, the modern genomes show 4% of missing sites (with a maximum of 6%) and the ancients 29%, ranging from 3 to 87%. All in all, the majority (51%) of the sites have less than 5% of missing genotypes (76% of the sites have a missing rate ≤ 10%; 96% have a missing rate ≤ 50%), and 57,449 of the sites show no missing data among the 101 individuals. Additional filtering, including missing data removal, was done for demographic analyses performed on subsets of the newly sequenced individuals, described in the section on demographic inference.

Sequence annotations

Additional information was obtained for the genotyped sites, such as recombination rates, nucleotide states for the Chimpanzee and Gorilla reference genome, GERP scores. Recombination maps for the YRI, CEU and JPT populations were obtained from the 1000G project (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20130507_omni_recombination_rat es/). The Chimpanzee and the Gorilla hg19 nucleotide states were obtained from the chained and netted alignments http://hgdownload.cse.ucsc.edu/goldenpath/hg19/vsPanTro4/axtNet/ and http://hgdownload.cse.ucsc.edu/goldenpath/hg19/vsGorGor5/axtNet/, respectively.

Phasing and imputation

Genotype imputation and phasing were done for each chromosome on the merged VCF that was polarised back with hg37d5 human reference genome. We used SHAPEIT4 v1.2160 with by-default parameters for sequencing data, the HapMap phase II b37 genetic map, and the Haplotype Reference Consortium161 dataset (accession number EGAD00001002729 on the European Genome-phenome Archive) as reference panel. All the chromosomal phased-imputed VCFs were then concatenated into a single phased VCF with bcftools v1.9, which includes 11,745,372 polymorphic sites in total.

44 Marchi et al. The mixed genetic origin of the first farmers of Europe

Section 4: Genetic structure analysis

Multi-Dimensional Scaling (MDS)

For each pair of individuals X and Y, we identified the sites showing no missing data for the two

162 individuals, and we computed their average nucleotide divergence %XY over these sites. We then represented the relationships between the ancient genomes (Fig. S19a) or modern and ancient genomes from Europe and SW Asia (Fig. S19b), using a classical multidimensional scaling (MDS) approach implemented in R (cmdscale function). We also performed the same analyses on a restricted set of sites defined as neutral according to Pouyet et al.’s163 definition (Fig. 2a, Fig. S50a).

● Palaeo/Mesolithic: a b Loschbour Bichon ● VC3-2 ●

VLASA7 ST VLASA32 ● ● Sardinian ● ● Neolithic: ● ● ● ● ● Ess7 ● ● ● ● Stuttgart ● ● ● ●

Bar8 ● ● ● Spanish ● ● ●● ● ● ● ● ● ● ● Albanian● ● Englishrian ● ● ● ●●● ● ● Hunga ● ● ● ●● uscan ● T ● ● Polish ● w ● rian ● ● ● ● ● Nor ● ● ● ● ●Bulga Cz ● Estonian ● ●

● z Orcadian ru D ● ●● ● ●● ● Finnish P ● ● ●● ●● ●● ● Jordanianw ● ● Russian ● ● Iraqi_J rkish ● ● Tu ● r ● ● ● ● A ● ● ● ● ● ● ● ● ● ● ● ● Abkhasian● Saami Iranian● ● ●

Figure S19 - 2D-MDS performed on the average nucleotide divergence (%XY ) matrix computed over the whole genome. a: ancient individuals only (n=24); b: all Europeans and SW Asians (n=89), ancient and modern (shown with small circles).

The analysis shown in Fig. S19a reveals a slightly different picture than Fig. 3 performed only on neutrally evolving sites, as the Upper Palaeolithic sample of Bichon (shown as an inverted triangle) is here well within the cluster of the other four hunter-gatherers (HGs; triangles with the tip up). Furthermore, Fig. S19b suggests strongest affinities of European Neolithic farmers with modern individuals from Southern Europe (Crete, Greece, , , ) rather than with Sardinians, at odds with Fig. 2a and previous analyses based on the projections of ancient individuals on Principal Components computed from modern individuals only164. Note also that NW Anatolian Early Neolithic individuals (AKT16 and Bar8) slightly diverge from the Neolithic cluster and are 45 Marchi et al. The mixed genetic origin of the first farmers of Europe closer to modern individuals from Turkey and , thus showing some surprising genetic continuity across millennia within this region. CarsPas1 from England is also an outlier found in- between Southern Europeans and a Bulgarian individual.

Genetic diversity

We estimated the level of genetic diversity of each individual as the proportion of heterozygous sites found in the neutrally evolving portion of its genome. Thus, for every genome, we divided the amount of observed neutral heterozygous sites by the expected number of neutral sites. This number is obtained by considering all those sites that i) have the same reference allele for both the chimpanzee and gorilla reference genomes (for polarization), ii) are not CpG sites and out of CpG islands, iii) are in regions with recombination rate > 1.5 cM/Mb based on 163. Finally, in order to consider only sites unaffected by BGC, we divided the number of these previously defined sites by 3, as only one third of all mutations should be BGC free (i.e. A↔T and G↔C polymorphisms). We represented this genomic neutral heterozygosity as a function of the geographical air distance to Boncuklu, Central Anatolia (Fig. 2b).

Runs of Homozygosity (ROHs)

We inferred the segments of homozygosity-by-descent or Runs of Homozygosity (ROHs) from the imputed genomes (step described in Section 3) of the 89 European and SW Asian modern and ancient individuals by using IBDSeq v. r1206 165 with default parameters but errormax = 0.005 and ibdlod = 2. We further processed the artificially long tracts spanning assembly gaps or centromeres (their genomic locations were obtained from UCSC genome browser: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/gap.txt.gz) and split them into shorter tracks excluding the gap stretch, inspired by 166. As advised when genomes do not come from an homogeneous population165, we focused on short (2-10 Mb) and long (>10 Mb) ROHs similarly to 167. We show their number and total length in ancient genomes (Fig. 2c), as well as in moderns for comparison (Fig. S20). In Fig. S20a, we expect the number and total genome size made up of the short ROHs to be negatively correlated with population size, so that a larger number of short ROHs indicate smaller population sizes. Consequently, the HG individuals seem to be drawn from small populations, as well as Bon002, WC1 and LEPE52 among Early Neolithic individuals. In Fig. S20b, long ROHs are indicative of recent inbreeding between close relatives, and therefore WC1 and LEPE52 are 46 Marchi et al. The mixed genetic origin of the first farmers of Europe

inferred to be the most inbred ancient individuals, followed by Stuttgart, Bichon and Loschbour. Note however that these ancient individuals appear less inbred than some modern individuals (Jordanian-2 or BedouinB-1 for instance). a b Paleolithic Paleolithic ● 200 Loschbour Mesolithic Mesolithic ● Neolithic ● Neolithic ● SF12 ● Moderns Moderns ●

● 300 Bichon 150

P ● Druz ● ●

● 200 100

● Bon002 ● ● ● otal Length (Mb)

otal Length (Mb) ●

Iraqi_Je T T VLASA32 ● ● WC1 ● VLASA7 Tur ● Druz ● LEPE52 ● 100 ● 50 ● ● STAR1 ● ● LEPE52 ● ● ● ● ● ● ● ● ● ● AKT16 ●● WC1 Stuttgart ● ● ● ● ● ●● ● ● CarsPas1 ● ●● ● ● ● Loschbour ●● ● ● ● ● ● ● ● ● ●● ● Bichon ●● 0 ●● 0 ●●

0 10 20 30 40 50 60 0 5 10 15

Nb of ROH Nb of ROH Figure S20 - Number and cumulated length of ROHs found in the imputed European and SW Asian genomes (n=89). a: short ROHs (size between 2 and 10M; b: long ROHs (>10Mb).

Admixture analysis

We calculated levels of admixture for 20 ancient individuals: the Late Palaeolithic individual (Bichon), 3 Mesolithic individuals (VLASA7, VLASA32 and Loschbour but not SF12 from Sweden, as it is a geographical outlier), 16 Early Neolithic (the lowest quality Neolithic genome Bon002 was not considered for this analysis in order to maximize the number of genomic sites to be used, as well as Lepenski Vir genomes as it is unclear if their cultural attribution corresponds to their genetic relatedness, Table 1, Fig. 2a). We restricted our analyses to sites without missing data among all 20 genomes, i.e. in total 742,403. Admixture coefficients of each genome were estimated using the R package LEA168 with parameters K=2 or 3, alpha = 100, and number of repetitions = 5. The function snmf calculates the fit (entropy) of each run and the admixture coefficients in an unsupervised manner. This admixture analysis reveals for K=3, a cluster corresponding to hunter-gatherers (HGs); a second with the Neolithic individuals; and a last cluster including the Iranian Neolithic individual. A few Neolithic individuals (CarsPas1, NE1, AKT16 and Bar8) show some HG background

47 Marchi et al. The mixed genetic origin of the first farmers of Europe admixture. NW Anatolia in particular shows some heterogeneity: AKT16 and Bar8 are found admixed between the Neolithic and the HG component while Bar25 is found with only the Neolithic component, and to lower extents with Iranian Neolithic. Those results are in agreement with the MDS analysis (Fig. 2a) and the demographic inferences (Fig. 3a, Fig. S37).

1.0

0.8

0.6

0.4 Admixture coefficients

0.2

0.0 t r as1 AR1 NE1 Bar8 Herx WC1 Ess7 P Dil16 Asp6 Nea2 Nea3 T Bar25 Klein7 AKT16 Bichon S VLASA7 Stuttga VLASA32 Cars Loschbour

1.0

0.8

0.6

0.4 Admixture coefficients

0.2

0.0 t r as1 AR1 NE1 Herx Bar8 Ess7 WC1 P Dil16 Asp6 Nea2 Nea3 T Bar25 Klein7 Bichon AKT16 S VLASA7 Stuttga VLASA32 Cars Loschbour Figure S21 - Admixture coefficients for 20 Neolithic, Mesolithic, and Palaeolithic individuals, for K=2 (top) and K=3 (bottom). Each color corresponds to a different ancestral component.

48 Marchi et al. The mixed genetic origin of the first farmers of Europe

Sex-specific markers and haplogroups

We used phy-mer169 to determine mitochondrial haplogroups from BAM files for the 15 newly sequenced genomes, with the minimal number of occurrences of a K-mer set to 10 (Table 1; Supp. Table 3). Y-chromosomal haplogroups were determined from BAM files using Yleaf170, using the recommended minimal base-quality of 20 (-q 20) and base-majority to determine an allele of 90% (-b 90). The two newly sequenced HGs from the Danube Gorges have mitochondrial haplogroup U5 (VLASA32: U5a2a, VLASA7: U5a2a), the most common haplogroup in individuals from the Danube Gorges published to date127,171. Y-chromosomal haplogroups of these two individuals (VLASA32: R1b1, VLASA7: I2) were also consistent, as all Mesolithic individuals from the Danube Gorges published to date127,171 exclusively belong to haplogroups I or R (R1b*). In contrast, Y-chromosomal haplogroups of the non-Mesolithic males in our dataset either belong to haplogroup G (Asp6: G2a2b2a3, Bar25: G2a2b2a1, Ess7: G2a2b2a1a1, LEPE52: G2a2b2a1a1c, VC3-2: G2a2a1a3~) or C (LEPE48: C1a2b, Dil16: C1a2b). While haplotypes from haplogroup G (G2a*) were common among Early Neolithic farmers127, haplogroup C was less frequent in comparison (G: 56.52%, C: 17.39%, estimated based on version v42.4 of https://reichdata.hms.harvard.edu/pub/datasets/amh_repo/curated_releases/V42/V42.4/SHARE/pu blic.dir/v42.4.1240K.anno). Mitochondrial haplogroup frequencies observed in the Neolithic individuals were consistent with those previously reported from a Neolithic context. We found five individuals with haplogroup K (Nea2: K1a, Nea3: K1a2c, LEPE48: K1a1, Herx: K1a4a1i, AKT16: K1a3), as well as one individual each with haplotypes N1a1a1 (Bar25), J1c6 (Dil16), W1-119 (Klein7), T2e2 (STAR1) and HV- 16311 (VC3-2). U5 haplotypes, common among HGs127 and increasing in frequencies during the middle and later Neolithic periods, were found in Early Neolithic individuals from Southern Germany and Lower Austria (Ess7: U5b2c1, Asp6: U5a1c1). Additionally, haplogroup H3 was inferred for an individual from the Early-Middle Neolithic in the Danube Gorges (LEPE52). H3* haplotypes are rare or even absent in Early Neolithic individuals, but are found more frequently in the Middle Neolithic in Germany172,173 as well as the Iberian Peninsula78,174,175. While it was previously suggested that H3* haplogroups were associated with a glacial Iberian refugium and have spread throughout Europe from there176, our data indicates that H3* haplotypes were also already present in Neolithic Serbia.

49 Marchi et al. The mixed genetic origin of the first farmers of Europe

Neanderthal introgression

As a result of interbreeding between Neanderthals and modern humans some time after the out-of- Africa migration, all non-African individuals derive a small portion of their genome from Neanderthals. 177 We quantified proportions of Vindija Neanderthal introgression by computing f4 ratio statistics of the form /((0123!,56!78; :,;!<=3) as previously suggested178 with qpF4ratio from the /((0123!,56!78; >!

Figure S22 - Inferred Vindija Neanderthal ancestry proportions for ancient individuals (n=17; individuals used to select and fit the fastsimcoal2 demographic scenario). HGs are shown in light green.

50 Marchi et al. The mixed genetic origin of the first farmers of Europe

For the Neanderthal admixture proportions inferred for the ancient individuals used in fastsimcoal2 demographic modelling (Fig. S22), besides a notable outlier (Bon002), proportions roughly lie in an interval between between 2% and 3%, without any clear pattern correlated to the population history inferred in this study. However, comparisons of Neanderthal ancestry levels between ancient hunter-gatherers (HGs), Neolithic (NEO) and modern European and SW Asian (MOD) individuals (Fig. S23) reveal a significantly lower proportion of Neanderthal ancestry in modern individuals (Kolmogorov-Smirnov tests: HGs vs. MOD, p-value=0.0184, NEO vs. MOD, p-value=0.0002), yet no significant difference between HG and Neolithic groups (p-value=0.5114). Our observations support previous results by Petr et al.178 (Fig. 2 and Fig. S7). The high ancestry proportions inferred in Bichon and especially in Bon002 are likely artifacts from lower data quality.

Bon002

p = 0.00017 [KStest] Vindija ancestry Vindija proportion 0.02 0.03 0.04 0.05

HG Neo. Eurasia (modern) Figure S23 - Comparison of Vindija Neanderthal ancestry proportions between five ancient HGs, 19 Neolithic individuals, and 65 modern individuals from Europe and SW Asia.

51 Marchi et al. The mixed genetic origin of the first farmers of Europe

Section 5: Phenotypic analysis

Pigmentation

We used the HirisPlexS webtool181,182 to predict pigmentation phenotypes of hair, skin and eyes for each of the newly sequenced individuals. In case of missing data in the merged VCF for any of the 38 SNPs used by HirisPlexS, i.e. sites with low or no depth, the positions were looked up in the BAM files directly (see Table S6). In BAMs, only sites at least 3 bases away from either end of the read, with a base quality ≥ 25, and no C/T and G/A SNPs to avoid any effect of PMD on the prediction were considered. In order to deal with the uncertainty associated to observing alleles in BAM files directly, two HirisPlex input files were created for each individual: one in which all sites missing from the VCF and meeting the criteria above were assumed to be homozygous for the allele found in the BAM, and another in which all positions were assumed to be heterozygous. Running HirisPlexS twice for each sample resulted in ranges of probabilities for each phenotype (see Table S6). A prediction was accepted without further explanation if in both runs the same phenotype showed a probability ≥ 0.7182. If predictions differed between runs, the most parsimonious phenotype was chosen, following the approach of Walsh in 146. This approach allowed to predict pigmentation phenotypes for all individuals, except for eye-color in VLASA32 for which no allele at the SNP rs12913832 could be retrieved.

Table S6 - Phenotypes for eye and hair color as well as skin tone, inferred from HirisPlexS and their probabilities for the 15 newly sequenced individuals. The resulting HirisPlexS input and results file can be found in Supp. Table 3.

Sample BAM SNPsa Eye p(Eye_Color)b Hair p(Hair Color)b p(Shade)b Skin Tone p(Skin Tone)b,c Color Color

AKT16 rs885479, Brown pBrown(0.993); Dark pBrown(0.387), pD(0.910); Intermediate/ pInt(0.776); rs12821256, pBrown(0.993) Brown pBlack(0.586); pD(0.800) Pale pP(0.563), rs4959270, pBrown(0.420), pInt(0.410) rs3114908, pBlack(0.543) rs2238289, rs1126809, rs1545397, rs8051733

Asp6 rs885479, Brown pBrown(0.996); Dark pBrown(0.387), pD(0.910); Intermediate/ pInt(0.776); rs1805008, pBrown(0.996) Brown pBlack(0.586); pD(0.800) Pale pP(0.563), rs2228479, to Black pBrown(0.420), pInt(0.410) rs1110400, pBlack(0.543) rs2378249, 52 Marchi et al. The mixed genetic origin of the first farmers of Europe

rs1426654

Bar25 rs683, Brown pBrown(0.954); Dark pBrown(0.426), pD(0.916); Intermediate pInt(0.799); rs3114908, pBrown(0.954) Brown pBlack(0.550); pD(0.936) to Dark pInt(0.560), rs2238289 to Black pBrown(0.408), pD(0.395) pBlack(0.573)

Dil16 rs11547464, Brown pBrown(0.938); Dark pBrown(0.600), pD(0.809); Pale/ pInt(0.455), rs885479, pBrown(0.921) Brown pBlack(0.311); pL(1.000) Intermediate pD(0.512); rs1805008, to pRed(0.995) to Dark pVP(0.444), rs1805007, Auburn pP(0.385) rs1110400, rs4959270, rs1393350, rs683, rs1126809, rs1470608, rs1545397, rs8051733

Ess7 rs11547464, Brown pBrown(0.997); Dark pBrown(0.332), pD(0.994); Pale/ pD(0.972); rs885479, pBrown(0.997) Brown pBlack(0.663); pL(1.000) Dark pVP(0.658), rs1805008, to pRed(0.966) pP(0.129) rs1805007, Auburn rs1110400, rs2238289, rs1126809

Herx rs11547464, Brown pBrown(0.986); Dark pBrown(0.604), pD(0.817); Pale/ pInt(0.507), rs885479, pBrown(0.986) Brown pBlack(0.261); pL(1.000) Intermediate pD(0.436); rs1805008, to pRed(1.000) to Dark pVP(0.671), rs1805007, Auburn pP(0.270) rs1805009, rs2228479, rs1110400, rs683, rs1800414, rs2238289, rs1126809, rs1545397

Klein7 rs885479, Brown pBrown(0.938); Dark pBrown(0.538), pD(0.658); Pale/ pInt(0.760); rs1805008, pBrown(0.938) Brown pBlack(0.370); pL(0.938) Intermediate pP(0.792) rs1805007, to pRed(0.865) rs1110400, Auburn rs4959270, rs10756819, rs2238289, rs1126809, rs6059655

LEPE48 rs885479, Brown pBrown(0.997); Black pBlack(0.729); pD(0.984); Intermediate pInt(0.784); rs1805008, pBrown(0.997) pBlack(0.778) pD(0.971) pInt(0.913) rs10756819, rs2238289

LEPE52 rs885479, Brown pBrown(0.767); Light pBlond(0.298), pL(0.593), Intermediate pInt(0.651), rs1805008, pBrown(0.767) Brown pBrown(0.514), pD(0.407); pD(0.146); rs2238289, pBlond(0.357), pL(0.727) pP(0.530), rs17128291, pBrown(0.427); pInt(0.258) rs1126809, rs1470608

53 Marchi et al. The mixed genetic origin of the first farmers of Europe

Nea2 rs1805009, Brown pBrown(0.989); Dark pBrown(0.484), pD(0.980); Intermediate pP(0.198), rs2228479, pBrown(0.979) Brown/ pBlack(0.497); pD(0.996) pInt(0.508); rs12203592, Black pBrown(0.513), pInt(0.289), rs2238289, pBlack(0.468) pD(0.683) rs1126809

Nea3 rs885479, Brown pBrown(0.968); Dark pBrown(0.386), pD(0.952); Intermediate pInt(0.554), rs12203592, pBrown(0.941) Brown/ pBlack(0.598); pD(0.994) pD(0.386); rs2238289, Black pBlack(0.787) pP(0.190), rs17128291, pInt(0.657) rs1126809, rs1545397

STAR1 rs2378249, Dark pBlue(0.605), Brown/ pBrown(0.480), pL(0.523), Pale/ pP(0.389), rs2238289, Blue pBrown(0.296); Dark pBlack(0.430); pD(0.477); Intermediate pInt(0.535); rs1126809 pBlue(0.605), Brown pBrown(0.540), pL(0.475), pInt(0.709) pBrown(0.296) pBlack(0.367) pD(0.525)

VC3-2 rs1393350, Blue pBlue(0.784); Light pBlond(0.303), pL(0.790); Pale/ pP(0.294), rs3114908, pBlue(0.836) Brown pBrown(0.523); pL(0.790) Intermediate pInt(0.683); rs2238289, pBlond(0.303), pP(0.642), rs1126809, pBrown(0.523) pInt(0.279) rs1470608, rs1545397, rs8051733

VLASA32 rs885479, NA NA Dark pBrown(0.406), pD(0.989); Intermediate/ pVD(0.983); rs1805008, Brown/ pBlack(0.588); pD(0.983) Very Dark pInt(1.000) rs1110400, Black pBlack(0.768) rs1800407, rs1126809, rs1545397

VLASA7 rs1805008, Brown pBrown(0.998); Dark pBrown(0.319), pD(0.997); Intermediate/ pInt(1.000); rs2238289, pBrown(0.998) Brown/ pBlack(0.678); pD(0.992) Dark pD(0.821) rs6497292 Black pBrown(0.391), pBlack(0.602) a SNPs for which the allele was taken from the BAM file, because they were not present in the VCF. Not listed SNPs were present in the VCF. b For each phenotype two sets of probabilities are given, separated by a semicolon: the first were obtained assuming that all the BAM SNPs were homozygous for the found allele and the second assuming a heterozygous state. If per set the highest probability was not larger than 0.7, the two highest values are reported separated by a comma. c (D = Dark, VD = Very Dark , Int = Intermediate, L = Light, P = Pale, VP = Very Pale).

For some individuals, one or several SNPs in the MC1R gene associated with hair color were missing in the VCF. As described above, we therefore considered both homozygous and heterozygous states, resulting in predictions of red hair in some cases. Given the overall low frequency of the derived alleles associated with light pigmentation phenotypes in European populations183,184 as well as in ancient data78, a red haired phenotype is highly unlikely and we considered as an artifact. We followed a similar reasoning for skin pigmentation when MC1R SNPs had to be looked up in the BAM files, as variation in MC1R can also affect the pigmentation level of the skin185,186. We found that the vast majority of Early Neolithic individuals in our dataset most likely had an intermediate to light skin complexion, while the two Mesolithic individuals 54 Marchi et al. The mixed genetic origin of the first farmers of Europe were inferred to have darker skin tone in comparison. A dark (brown to black) hair color was inferred for all but two individuals: for LEPE52 and VC3-2, a light brown phenotype was more likely. Eye color variation was similarly low, with the majority of individuals showing highest probabilities for brown eyes, except STAR1 and VC3-2, which are inferred to have been blue- eyed. Interestingly, the highest phenotypic variation in our dataset seems to originate from Serbian individuals.

Additional phenotypic variation

Lactase persistence All individuals investigated in this study had the ancestral G allele at rs4988235. The derived allele at this SNP is highly associated with lactase persistence into adulthood. Since no derived alleles were found, it is unlikely that any of the individuals was able to digest lactose.

EDAR/ABCC11 We analyzed two SNPs associated with incisor shape/hair thickness (rs3827760, EDAR; derived alleles are associated with thick straight hair and shovel shaped incisors187) and earwax type/body odor (rs17822931, ABCC11; derived alleles at rs17822931 are associated with dry earwax and reduced body odor188), both having high derived alleles frequencies in modern East-Asian populations189–191. All ancient individuals presented here only had ancestral alleles for both SNPs.

Fatty acids synthesis We further investigated seven SNPs located in the FADS1/2 gene complex, reported to have been selected for in various populations192,193. Variation in FADS1/2 is known to influence the ability to synthesize omega-3 and omega-6 long-chain polyunsaturated fatty acids (LC-PUFAs) from precursor molecules193,194. Since LC-PUFAs are also found in food, especially from animal sources, the dietary shift during the Neolithic transition is hypothesised to have created a selective pressure that lowered diversity at FADS associated loci. Genotypes for each SNP were looked up in the filtered VCF. Individuals were grouped by subsistence into hunter-gatherers (HGs) and Neolithic farmers (excluding WC1 and the Lepenski Vir individuals, see Table S7). A two-sided binomial test was used to compare counts of derived alleles in our ancient individuals with frequencies in a modern population. We used the Central Europeans from Utah (CEU, N=99) from the 1000Genomes project as a proxy for a modern Central European population. 55 Marchi et al. The mixed genetic origin of the first farmers of Europe

For rs74771917, while the derived T alleles is almost lost in CEU (0.030), it is found in significantly larger number in farmers (3/26 = 0.115, p<0.043). No differences could be found between CEU and farmers for any of the other SNPs. For HGs, the allele counts suggest lower frequencies for the SNPs rs174546_C (fCEU=0.641; HG: 1/10=0.100, p<0.001), while higher frequencies in comparison to CEU are inferred for rs174570_T (fCEU= 0.162; HG: 3/4 = 0.750, p<0.015) and rs174594_A (fCEU= 0.616, HG: 10/10 =1.000, p<0.009). For rs174594_A, counts suggest a higher frequency in HGs compared to farmers (Farmer: 17/28 = 0.607, HG: 10/10 = 1.000, p<0.008). A similar result was found for rs97384_C (Farmer: 12/26 = 0.462, HG: 7/8 = 0.875, p<0.028). Table S7 - Derived allele frequencies for seven SNPs located in the FADS1/2 gene complex for modern Europeans (CEU) and counts for the derived (p) and total number of alleles for the ancient samples, grouped by subsistence (nHG = 5, nFarmers = 19). SNP Derived Allele CEU derived frequency No. of derived alleles Farmers HGs rs174546 C 0.641 12/24 1/10 rs174570 T 0.162 8/28 3/4 rs174594 A 0.616 17/28 10/10 rs97384 C 0.606 12/26 7/8 rs74771917 T 0.030 3/26 1/8 rs174455 A 0.647 13/20 5/10 rs174465 T 0.702 7/12 4/8

These results are consistent with selection affecting variation in FADS1/2 already during the Early Neolithic78, rather than starting later in the Bronze Age as proposed more recently193. This is because, with the exception of one SNP, frequencies in farmers are close to those in modern populations, but multiple SNPs differ in frequencies between HGs and modern populations. While it is possible that migration and admixture during and after the Late Neolithic led to a shift in frequencies at FADS related loci, our observations are also consistent with selective pressure associated with the Neolithic diet having shaped allele frequencies in Early Neolithic populations to approach the levels observed in modern populations today.

56 Marchi et al. The mixed genetic origin of the first farmers of Europe

Polygenic Score for Height

Height is a classical polygenic trait, which is highly heritable, but also strongly influenced by environmental variables195. Hundreds of variants are known to influence height in humans, with the majority alleles having only a tiny effect196. To measure the distribution of height-related alleles between our samples, we computed polygenic scores (PS) for standing height using a set of 670 SNPs197. To account for missing data, common even in high depth aDNA samples, we applied the generalized risk score approach described by 198 on samples where genotypes could be determined for at least ≥ 75% of the SNPs. This led to the exclusion of five individuals (Bar8, Bon002, Bichon, CarsPas1, AKT16). The highest scores were found for the two individuals from Vlasac, Serbia (VLASA32 - 20.033, VLASA7 - 20.147), while the lowest score was obtained for the individual from Herxheim, Germany (Herx - 18.086). We found a strong positive correlation of individual height PS with the average age of the samples (Pearson's r = 0.7383, p<0.0003, Fig. S24). The correlation remained strong when considering only Neolithic individuals, both with (Pearson's r = 0.6537, p<0.0082) and without the Lepenski Vir individuals (Pearson's r: 0.6565, p<0.0148). These findings are consistent with previous studies reporting selection for decreasing predicted height in Early Neolithic populations from Southern and Central Europe78 as well as modern populations from Sardinia199, but differ from a morphometric analysis of prehistoric skeletons200. Considering the high amount of ancestry attributed to Early Neolithic farmers in modern Sardinians142, these signals may be related. Although we find significant differences between mean scores for Mesolithic and Early Neolithic individuals including Lepenski Vir (Student t-test, t = - 2.427, p<0.0273), these differences can be explained by the older date (except for Loschbour) of the Mesolithic individuals in comparison. The decrease in height PS values could be a continuation of a trend of decreasing stature between the Upper Palaeolithic and the Mesolithic201, a result supported both by genetics and physical measurements. Interestingly, 201 did not find any differences between Mesolithic and Neolithic individuals.

57 Marchi et al. The mixed genetic origin of the first farmers of Europe

Figure S24 - Scatter plot of the average age in BP vs. height PS values for each individual (n=19; Bichon, Bon002, Bar8, CarsPas1, AKT16 being excluded based on genotype-filtering thresholds). The best-fit straight line illustrates the Pearson correlation (r2 = 0.5450) between height PS and sample age, indicating a decline in PS values over time.

58 Marchi et al. The mixed genetic origin of the first farmers of Europe

Section 6: Demographic inference

MSMC2 analysis

Data processing We used MSMC2202 to infer past effective sizes of ancestral populations and their split times for all high-quality ancient and some representative modern individuals (see Table S8) on the phased- imputed dataset prepared with SHAPEIT4 (Supplementary Section 3). To ensure high data quality especially high mappability and genotype quality, we followed 202 and used two masks: 1) per chromosome mappability masks (um75-hs37d5.bed.gz) for the human reference genome hs37d5 downloaded from https://github.com/wangke16/MSMC-IM/tree/master/masks; 2) sample-specific masks that we generated as suggested by 202. We then ran the “generate_multihetsep.py” script from MSMC-tools (https://github.com/stschiff/msmc-tools, commit 07bc8a9) to get single, multi- sample and pair population input files for MSMC2. Example of a command line for two, four or eight haplotypes: generate_multihetsep.py --chr 1 --mask ind1.chr1.mask.bed --mask mappabilityMaskperChr/chr1_m75- hs37d5.bed ind1.chr1.phased.vcf > ind1.chr1.multihetsep.txt generate_multihetsep.py --chr 1 --mask ind1.chr1.mask.bed --mask ind2.chr1.mask.bed --mask mappabilityMaskperChr/chr1_m75-hs37d5.bed ind1.chr1.phased.vcf ind2.chr1.phased.vcf > pop1.chr1.multihetsep.txt generate_multihetsep.py --chr 1 --mask ind1.chr1.mask.bed --mask ind2.chr1.mask.bed --mask ind3.chr1.mask.bed --mask ind4.chr1.mask.bed --mask mappabilityMaskperChr/chr1_m75-hs37d5.bed ind1.chr1.phased.vcf ind2.1.phased.vcf ind3.chr1.phased.vcf ind4.chr1.phased.vcf > pop1_pop2.chr1.multihetsep.txt

Table S8 - Samples considered in the MSMC2 analyses (n=29). Population Period Samples Africa Modern Mende-1, Mende-2 Western Europe Modern French-1, French-2 Eastern Asia Modern Han-1, Han-2 Southern America Modern Karitiana-1, Karitiana-2 Central Serbia Neolithic STAR1, VC3-2 NW Anatolia Neolithic AKT16, Bar25 Hungary-Neo Neolithic NE1 Lower Austria Neolithic Asp6, Klein7 Northern Greece Neolithic Nea2, Nea3 Southern Germany1 Neolithic Dil16, Ess7 Southern Germany2 Neolithic Herx, Stuttgart Zagros Region Neolithic WC1 Danube Gorges-Meso Mesolithic VLASA7, VLASA32 Northern Europe-Meso Mesolithic SF12 Western Europe-Meso Upper Palaeolithic - Mesolithic Bichon, Loschbour Lepenski Vir Transformational - Neolithic LEPE48, LEPE52

59 Marchi et al. The mixed genetic origin of the first farmers of Europe

Inference of human population size We inferred past effective population size for each diploid individual separately (Fig. S25a) as well as using two samples per population (Fig. S25b), if available (see Table S8), using command lines such as: msmc2 -t11 -s -o ind1.2haps.msmc2 ind1.chr*.multihetsep.txt msmc2 -t11 -I 0,1,2,3 -s -o pop1.4haps.msmc2 pop1.chr*.multihetsep.txt All MSMC2 results were scaled using a mutation rate of 1.25 × 10−8 per base pair per generation202,203, and a generation time of 29 years204.

Mende Nea3 Han Dil16 Karitiana Ess7 French Herx STAR1 Stuttgart VC32 WC1 AKT16 VLASA32 N B25 VLASA7 NE1 Asp6 Bichon Klein7 Loschbour Nea2 LEPE48 LEPE52 0 10000 20000 30000

1e+04 2e+04 5e+04 1e+05 2e+05 5e+05 1e+06 2e+06 Years ago Africa EAsia SAmerica WEurope CentralSerbia Hungary−Neo

LowerAustria

N NorthernGreece SouthernGermany1 SouthernGermany2 ZagrosRegion DanubeGorges−Meso NorthernEurope−Meso WesternEurope−Meso 0 10000 20000 30000 1e+04 2e+04 5e+04 1e+05 2e+05 5e+05 1e+06 2e+06 Years ago

Figure S25 - MSMC2 population size estimates scaled using a mutation rate of 1.25×10−8 per generation per site and a generation time of 29 years. a: Past demography obtained for each individual (two haplotypes; nind=29). b: Past demography obtained for two individuals (four haplotypes) per population, where available (i.e. not for NE1, SF12 and WC1, who represent a unique group each; npop=16). Both analyses suggest smaller population sizes in the most recent times for HG populations compared to farmers.

60 Marchi et al. The mixed genetic origin of the first farmers of Europe

The MSMC2 results (Fig. S25) were similar for all non-African populations before 35 kya and included a deep bottleneck at roughly 50-60 kya, a pattern that has been reported previously205. As expected, we were not able to get good recent estimates from a single diploid individual, although a clear difference in the population size between farmers and hunter-gatherers (HGs) is observable. The resolution at recent times was much improved when using two individuals per population, in which case we found that all the Upper Paleo-Mesolithic populations and Zagros Region exhibit a small population size of roughly Ne = 3,000 up to 35 kya. In contrast, we estimated a constant increase of population size after 35 kya in all Neolithic populations except Zagros Region (WC1).

This increase resulted in population sizes around Ne = 10,000 - 15,000 about 10 kya, with NW Anatolia showing the smallest population size, Lepenski Vir and Central Serbia showing similar sizes also around Ne = 10,000 and smaller than the Austrian-German Neolithic populations (Lower Austria, Southern Germany1, Southern Germany2).

Divergence time between populations To estimate split times between population pairs, we used MSMC2 1) to estimate coalescent rates among the samples of the first population, 2) to estimate coalescent rates among the samples of the second population, and 3) to estimate coalescent rates across the two populations. Example command lines used for two populations pop1 and pop2 were: msmc2 -t11 -I 0,1,2,3 -s -o pop1.4haps.msmc2 pops.chr*.multihetsep.txt, msmc2 -t11 -I 4,5,6,7 -s -o pop2.4haps.msmc2 pops.chr*.multihetsep.txt, msmc2 -t11 -I 0-4,0-5,0-6,0-7,1-4,1-5,1-6,1-7,2-4,2-5,2-6,2-7,3-4,3-5,3-6,3-7 -s -o pop1- pop2.8haps.cross.msmc2 pops.chr*.multihetsep.txt. We then used the MSMC2 script combineCrossCoal.py to create a single output file with all three rates: combineCrossCoal.py pop1-pop2.*.cross.msmc2.final.txt pop1.*.msmc2.final.txt pop2.*.msmc2.final.txt > pop1-pop2.combined.msmc2.final.txt For each population-pair, we then plotted the relative cross-coalescence rate (CCR), which is estimated by taking the ratio of the across-rate and the mean within-rate (Fig. S26)202. The relative CCR indicates when two populations were a single population (values around 1) and when they were well separated into two isolated populations (values close to zero)202. However, translating the relative CCR into estimates of split times is difficult for two reasons: First, if a population split was followed by migration, the relative CCR will remain high even after the split. Second, the uncertainty associated with the different colaescent rates translates into a gradual change in the

61 Marchi et al. The mixed genetic origin of the first farmers of Europe relative CCR, even under a hard split. As a rough estimate, 202 recommends estimating split times as the time when the relative CCR hits 0.5, which we show in Fig. S27a-b for all pairwise comparisons.

STAR1−VC3 Nea2−Nea3 VLASA7−VLASA32 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 1000 2000 5000 10000 20000 50000 1000 2000 5000 10000 20000 50000 1000 2000 5000 10000 20000 50000 Years ago Years ago Years ago

AKT16−B25 Dil16−Ess7 SF12

0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 1000 2000 5000 10000 20000 50000 1000 2000 5000 10000 20000 50000 1000 2000 5000 10000 20000 50000

Years ago Years ago Years ago

Neo

− NE1 Herx−Stuttgart Bichon−Loschbour CCR from Hungary 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 1000 2000 5000 10000 20000 50000 1000 2000 5000 10000 20000 50000 1000 2000 5000 10000 20000 50000

Years ago Years ago Years ago

Asp6−Klein7 WC1 LEPE48−LEPE52 CCR from 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 1000 2000 5000 10000 20000 50000 1000 2000 5000 10000 20000 50000 1000 2000 5000 10000 20000 50000 Years ago Years ago Years ago

Figure S26 - Relative cross coalescence rate (CCR) estimated with MSMC2 using pairs of individuals per population where available (NE1, SF12 and WC1 are the unique representative of one population each). We estimated the relative CCR for all possible pairwise group combinations. The results were scaled using a mutation rate of 1.25×10−8 per generation per site and a generation time of 29 years. Each subplot shows the pairwise CCR estimates of one population against all others. First two columns: Populations from the Neolithic period (panels a - h). Last column: Populations from the Upper Paleo-Mesolithic period (panels i - k) and transitional and Neolithic layers at Lepenski Vir in the Danube Gorges (panel l).

62 Marchi et al. The mixed genetic origin of the first farmers of Europe

Meso Meso Meso − − − Neo Germany2 Germany1 Europe Greece Europe − Gorges Serbia Region Austria Danube Western Northern Southern Southern Hungary Lower Central Northern Zagros

SouthernGermany2 23006 38501 21737 21784 21185 23141 20991 34426 33811 39403 46912

SouthernGermany1 23006 38891 21208 19358 20152 20650 20172 33111 32873 37253 44464

Hungary−Neo 38501 38891 40048 38089 37146 40066 37270 43945 41561 45885 56140

LowerAustria 21737 21208 40048 19505 19673 22579 23507 33792 33275 37564 46496

CentralSerbia 21784 19358 38089 19505 20878 20587 21262 33373 32331 35900 45281

21185 20152 37146 19673 20878 22866 23134 34963 34582 36385 46029

NorthernGreece 23141 20650 40066 22579 20587 22866 21591 35267 34150 37655 47395

20991 20172 37270 23507 21262 23134 21591 29786 30386 34072 43501

DanubeGorges−Meso 34426 33111 43945 33792 33373 34963 35267 29786 14592 40020 30499

WesternEurope−Meso 33811 32873 41561 33275 32331 34582 34150 30386 14592 39457 35591

ZagrosRegion 39403 37253 45885 37564 35900 36385 37655 34072 40020 39457 49618

NorthernEurope−Meso 46912 44464 56140 46496 45281 46029 47395 43501 30499 35591 49618

SouthernGermany2 ● ● ●● ●●

SouthernGermany1 ● ●●● ● ● ● ● SouthernGermany2 Hungary−Neo ●●●●● ● ● SouthernGermany1 LowerAustria ● ●● ● ● ● ● ● Hungary−Neo ● LowerAustria CentralSerbia ●● ●●● ● ● ● CentralSerbia ●● ●● ●● ●● NorthernGreece ● ● ●● ● ● ● NorthernGreece

● Population ● ●●● ● ● ● DanubeGorges−Meso DanubeGorges−Meso ● ●●●●● ● ● WesternEurope−Meso

WesternEurope−Meso ● ●●●●● ● ● ● ZagrosRegion NorthernEurope−Meso ZagrosRegion ● ● ●● ● ●

NorthernEurope−Meso ●●●●●● ● ●

20000 30000 50000 Split time estimated at CCR of 0.5 Figure S27 - Divergence time estimates obtained with MSMC2 at the relative CCR of 0.5 for all pairwise comparisons. a: Square matrix showing the corresponding split time values between all pairs. b: Point split time estimates for all pairwise comparisons. Populations with only one individual (Hungary-Neo, Zagros Region and Northern Europe-Meso) are marked in red and those comparisons are plotted using symbols with outlines only (not filled). 63 Marchi et al. The mixed genetic origin of the first farmers of Europe

As a first observation, we note that split time estimates involving populations with a single sample (Northern Europe-Meso, Zagros Region and Hungary-Neo) tend to be much older. Indeed, CCR estimates from one-sample populations drop to zero more quickly than corresponding estimates from two-sample populations among Neolithic and Mesolithic populations. We therefore caution against interpreting the older split time estimates of Northern Europe-Meso from all other populations as strong evidence for the influence of Eastern HGs, which diverged from Western HGs at an earlier period. Focusing on two-sample Neolithic populations, the estimated split times capture the old split time between Upper Palaeolithic-Mesolithic and Neolithic populations, and also show that all Neolithic populations split at roughly the same time. However, no clear order of split times emerges among the Neolithic populations, with large variations depending on the exact comparison. Among the most consistent results across several comparisons is a relatively old split of NW Anatolia, followed by populations from what is today Greece, the Balkans and finally Austria-Germany, in accordance with a stepwise migration between neighbouring regions. Lepenski Vir for instance, split first from NW Anatolia and Northern Greece (~24 kya), then from Central Serbia (~21 kya) and from the Austro-German Neolithic populations (~19-22 kya). But we note that several individual estimates are at odds with this sequence of events: NW Anatolia, for instance, was inferred to have split from Lower Austria and Lepenski Vir (~24 kya) earlier than from Northern Greece and Central Serbia (~22 kya). Similarly, Northern Greece split from NW Anatolia earlier than from Central Serbia and Southern Germany1 (~21 kya). Our CCR results may thus not provide sufficient resolution at this scale. In addition, they are likely affected by recent admixture. The relatively old split time we estimated between NW Anatolia and the other Neolithic populations, along with the more recent split time we estimated between NW Anatolia and the two Upper Palaeolithic-Mesolithic populations (Danube Gorges-Meso and Western Europe-Meso), might be well explained by recent HG admixture into NW Anatolia, in particular AKT16 (Fig. S21). Interestingly, the split time between the Danube Gorges-Meso and Western Europe-Meso is estimated the most recent (~15 kya) among all populations, suggesting they share more recent common ancestors. While this observation is in line with their clustering in our MDS analysis (Fig. S19), it is somewhat younger than estimates obtained with fastsimcoal2 (see below), and potentially a result of the very estimates of their recent population sizes (and hence low intra- coalescent rates) and the uncertainty associated with cross-coalescent rates.

64 Marchi et al. The mixed genetic origin of the first farmers of Europe

In line with the final demographic scenario inferred from SFS analyses and shown in Fig. 3, MSMC2 infers split times of ~22 kya among Neolithic populations and ~35 kya between Neolithic and Mesolithic samples (Fig. S27). We note, however, that Bichon and Loschbour were estimated to have split from the Danube gorge Mesolithic samples more recently (~15 kya) than what is inferred from the site frequency spectrum (23.3 kya). These estimates obtained from the analysis of whole genomes are older than what is inferred with fastsimcoal2 on the neutrally evolving sites only, but are not incompatible given that they assume no gene flow or admixture between samples. Previous analyses have also shown that while MSMC and SFS-based inference programs were giving very congruent results on simulated data, they could lead to inconsistent result when applied to real genomes206, implying that the two approaches were differentially affected by genomic factors not included in their assumptions.

Bootstrap analysis for the Ne and CCR estimates

To obtain confidence intervals around coalescence rate estimates, we generated 20 artificial genomes by block-bootstrapping MSMC2 input files in 5Mb blocks as suggested in 202. We show the bootstrap estimates together with the original estimates of the effective population size for all ancient populations (Fig. S28) as well as the bootstrap estimates and original values of the relative CCR between NW Anatolia and all other ancient populations (Fig. S29). We observed that there is little uncertainty around the estimates of the original dataset based on the bootstrap estimates, although the uncertainty increases for the most recent times (after 10 kya), for the cases where there are fewer haplotypes per run, or samples of low depth (e.g. Bon002, Fig. S29l), as expected202.

65 Marchi et al. The mixed genetic origin of the first farmers of Europe

Nea2−Nea3 VLASA7−VLASA32 0 10000 20000 30000 0 10000 20000 30000 0 10000 20000 30000 1e+04 5e+04 2e+05 5e+05 2e+06 1e+04 5e+04 2e+05 5e+05 2e+06 1e+04 5e+04 2e+05 5e+05 2e+06 Years ago Years ago Years ago

Dil16−Ess7 SF12 0 10000 20000 30000 0 10000 20000 30000 0 10000 20000 30000 1e+04 5e+04 2e+05 5e+05 2e+06 1e+04 5e+04 2e+05 5e+05 2e+06 1e+04 5e+04 2e+05 5e+05 2e+06 Years ago Years ago Years ago

NE1 Herx−Stuttgart Bichon−Loschbour Neo − Hungary −

e N 0 10000 20000 30000 0 10000 20000 30000 0 10000 20000 30000

1e+04 5e+04 2e+05 5e+05 2e+06 1e+04 5e+04 2e+05 5e+05 2e+06 1e+04 5e+04 2e+05 5e+05 2e+06

Years ago Years ago Years ago

Asp6−Klein7 WC1 LEPE48−LEPE52 Austria Lower −

e N 0 10000 20000 30000 0 10000 20000 30000 0 10000 20000 30000

1e+04 5e+04 2e+05 5e+05 2e+06 1e+04 5e+04 2e+05 5e+05 2e+06 1e+04 5e+04 2e+05 5e+05 2e+06 Years ago Years ago Years ago

Figure S28 - Variation in estimates of past effective sizes for all ancient populations obtained from 20 artificial data sets per population generated by bootstrapping 5Mb blocks. First two columns: Populations from the Neolithic period (panels a - h). Last column: Populations from the Upper Paleo-Mesolithic period (panels i - k) and transitional and Neolithic layers at Lepenski Vir in the Danube Gorges (panel l).

66 Marchi et al. The mixed genetic origin of the first farmers of Europe

STAR1−VC3 Dil16−Ess7 SF12 CCR from 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 1000 2000 5000 10000 20000 50000 1000 2000 5000 10000 20000 50000 1000 2000 5000 10000 20000 50000 Years ago Years ago Years ago

NE1 Herx−Stuttgart Bichon−Loschbour CCR from 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 1000 2000 5000 10000 20000 50000 1000 2000 5000 10000 20000 50000 1000 2000 5000 10000 20000 50000 Years ago Years ago Years ago

Asp6−Klein7 WC1 LEPE48−LEPE52 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 1000 2000 5000 10000 20000 50000 1000 2000 5000 10000 20000 50000 1000 2000 5000 10000 20000 50000 Years ago Years ago Years ago

Nea2−Nea3 VLASA7−VLASA32 Bon002 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 1000 2000 5000 10000 20000 50000 1000 2000 5000 10000 20000 50000 1000 2000 5000 10000 20000 50000 Years ago Years ago Years ago

Figure S29 - Variation in estimates of coalescence rates (CCR) for all comparisons involving NW Anatolia and obtained from 20 artificial data sets per population generated by bootstrapping 5Mb blocks. First two columns: Populations from the Neolithic period (panels a - f) and Mesolithic period at Vlasac (panel h). Last column: Populations from the Upper Paleo- Mesolithic period (panels i - j), transitional and Neolithic layers at Lepenski Vir in the Danube Gorges (panel k) and a low depth sample from Boncuklu in the Konya Plain of Central Anatolia (panel l).

67 Marchi et al. The mixed genetic origin of the first farmers of Europe fastsimcoal2 analyses

Given the relatively high depth (>10X) of our newly sequenced individuals, it was possible to perform demographic inferences on ancient genomes in a way similar to what is done on modern individuals. By computing the site frequency spectrum (SFS) on unascertained polymorphic “neutral sites”, we could evaluate the likelihood of various historical scenarios and infer parameter values under these models with fastsimcoal2207. We nevertheless had to make several adjustments to take into account the temporal and spatial heterogeneity of the collected samples. First, in order to perform population level analyses, we had to pool geographically and culturally similar individuals into the same population, even though they would have been living centuries apart (Table S9). Since this temporal heterogeneity can lead to a Wahlund effect208 that translates into a faster rate of coalescence between the two homologous gene copies of an individual relative to two gene copies from different individuals, we introduced population inbreeding coefficients in our modeling as nuisance parameters to specifically account for this population subdivision effect209. Second, we introduced unsampled populations from which sampled populations could receive migrants, to reflect the fact that human populations are seldom isolated when not living on islands, which has a direct effect on the shape of the SFS within populations210 and could thus introduce biases in demographic inferences if not properly taken into account. These unsampled populations can also be considered as a set of populations (a metapopulation) surrounding sampled populations (i.e. other unsampled farmer populations still exchanging genes with sampled farmer populations, or surrounding hunter-gatherer (HG) groups contributing to the early farmer gene pool by admixture)211–213. Note also that we have modeled gene flow between metapopulations and sampled populations as a single pulse for modeling convenience, even though gene flow might have been continuous over several generations.

68 Marchi et al. The mixed genetic origin of the first farmers of Europe

Data sets and additional filtering We prepared seven datasets for the fastsimcoal2207 demographic analyses of the newly sequenced individuals. The first dataset (DS11) included the following 11 individuals (two Mesolithics from the Danube Gorges: VLASA7, VLASA32 and 9 : Nea2, Nea3 from Northern Greece; STAR1, VC3-2 from Central Serbia; Asp6, Klein7 from Lower Austria; Dil16, Ess7, Herx from Southern Germany). One Neolithic individual from NW Anatolia (Bar25) was added to the second dataset (DS12). The other data sets included a reduced number of samples to simplify the multidimensional SFS, i.e. the two Mesolithic and the two Greek Neolithic individuals plus some selected genomes to address specific questions: two NW Anatolian individuals (Bar25, AKT16) in DS6_AKT; one Central Anatolian individual (Bon002) in DS5_Bon; one Early Neolithic Iranian (WC1) in DS5_WC; one Upper Palaeolithic (Bichon) and one Mesolithic individual (Loschbour) from Western Europe in DS6_BL; four genomes from Central Serbia and Serbian Lepenski Vir site in DS8_LV.

Table S9 - Population composition and ages used for demographic modeling. Populations that are archeologically defined as Palaeolithic or Mesolithic are highlighted in light grey; the other populations belong to Neolithic cultures. The age of the population, given in generations with a generation time of 29 years204, was taken within the range of the ages of its constituent samples.

Population Age Genomes dataset (DS) BP 11 12 6_AKT 5_Bon 5_WC 6_BL 8_LV Konya Plain 347 Bon002 X Zagros region 317 WC1 X NW Anatolia (AKT) 295 AKT16 X NW Anatolia (BAR) 295 Bar25 X Northern Greece 280 Nea2; Nea3 X X X X X X X Central Serbia 260 STAR1; VC3-2 X X X Lepenski Vir 274 LEPE48; LEPE52 X Lower Austria 245 Klein7; Asp6 X X Southern Germany 240 Dil16; Ess7; Herx X X Bichon 472 Bichon X Loschbour 278 Loschbour X Danube Gorges Mesolithic 295 VLASA7; VLASA32 X X X X X X X

69 Marchi et al. The mixed genetic origin of the first farmers of Europe

From the merged VCF, and for each data set, we excluded sites with any missing data in any individual of the dataset, those that could not be polarized by the use of the chimp and gorilla reference genomes, CpG sites, sites found in CpG islands or in genomic regions with recombination rate less than 1.5 cM/Mb (such as to minimize the impact of background 163 selection ). We thus kept a total of TX sites for the DSX dataset, among which SX were

polymorphic and MX monomorphic (the properties of each dataset are found in Table S10). Second, in order to have neutral data sets best suited for demographic inference163, we also removed sites that were potentially affected by GC-biased gene conversion (BGC) and thus only

kept Sneutral_X BGC-free A↔T and G↔C polymorphic sites, which represented &:~ 0.2 of the

filtered polymorphic sites for any DSX. Since we would have expected them to represent one third of all filtered sites if mutation rates had been equal for all mutation types, we computed the ratio

!: = &:/(1/3) = 3 &:, which represents the reduction in mutation rates that has occurred at those

sites in dataset DSX. We then estimated the number of neutrally evolving monomorphic sites in

each dataset as Mneutral_X = MX/3, since we expect that one third of all mutations should be BGC- free. Table S10 - Properties of the different data sets used for the demographic inferences.

Data sets Labela Description DS11 DS12 DS5_Bon DS5_WC DS6_BL DS6_AKT DS8_LV

Total number of sites TX 138,324,714 13,0276,135 27,997,322 212,687,126 133,371,094 180,733,357 169,075,132 passing filtering criteriab

SX Polymorphic among TX 253,439 237,454 42,508 362,864 221,040 298,653 301,083

Sneutral_X WWSS SX sites 48,024 45,247 8901 68,818 41,381 54,968 56,279

MX Monomorphic among TX 138,071,275 130,038,681 27,954,814 212,324,262 133,150,054 180,434,704 168,774,049

MX/3 46,023,758 43,346,227 931,8271 70,774,754 44,383,351 60,144,901 56,258,016 Mneutral_X

Tneutral_X Mneutral_X+Sneutral_X 46,071,782 43,391,474 9,327,172 70,843,572 44,424,732 60,199,869 56,314,295

&X Sneutral_X/SX = rX/3 0.1894 0.1906 0.2094 0.1897 0.18721 0.1841 0.1869

μneutral_X rX*μtot = rX*1.25e-8 7.11E-09 7.15E-09 7.85E-09 7.11E-09 7.02E-09 6.90E-09 7.01E-09

a X subscripts indicate a given dataset. b All sites found without missing data for all individuals of a given dataset, that are non CpG sites nor in CpG islands, with similar reference allele in the chimpanzee and gorilla reference genomes and a recombination rate ≥ 1.5 cM/Mb 70 Marchi et al. The mixed genetic origin of the first farmers of Europe

Parameter inference via maximum likelihood fastsimcoal2207 was used to estimate parameters from the multidimensional derived site frequency spectrum (dSFS) computed on each dataset with a dedicated R program on Tneutral_X neutral sites matching the filtering criteria listed above. Since we have restricted our analyses to a set of neutral

203,214 variants not influenced by BGS, we had to adjust the basal mutation rate -2A2 =1.25e-8 to take into account the potentially lower rate of A↔T and G↔C mutations. The neutral mutation rates -

fsc -t xxx.tpl -n500000 -d -e xxx.est --initvalues xxx.pv -M -l60 -L60 -q -C5 --multiSFS --logprecision 18 The limits of 95% confidence intervals were finally estimated by computing the 2.5% and 97.5% quantiles of the distribution of the 100 newly estimated ML parameter values.

Demographic models and parameter estimation Expansion of early farmers into Europe

Tested models We have investigated the fit to data of several models of the spread of Neolithic farmer populations from the Aegean-Marmaran basin to central Europe along the Danubian corridor. The data consisted in the multidimensional dSFS computed on the dataset DS11. Note that we did not consider the two individuals from Lepenski Vir in this analysis, because one belongs stratigraphically to the Transformational/Early Neolithic layer, which is thought to be ‘transitional’ between Mesolithic and Neolithic (for instance, domesticates are absent); hence it is unclear if their cultural attribution corresponds to their genetic relatedness, although they are found genetically closer to Neolithic farmers than to Mesolithic foragers on the MDS analyses (Fig. S19). One class of models assumed that the spread of farmer populations consisted in a series of stepping-stone (SS) migration events from Northern Greece to Central Serbia, then from Central Serbia to Lower Austria, and finally from Lower Austria to Southern Germany. Another class of models assumed a partial SS model in that Lower Austria could be directly settled from Northern Greece, thus bypassing Central Serbia (BP class of models). In all tested models shown in Fig. S30a-d, we also assumed that Neolithic and Mesolithic sampled populations belonged to two larger pools of unsampled populations from which they could potentially receive some migrants. We here model these sets of unsampled populations as metapopulations (called MetaEast and MetaCentral), but they are usually described as ghost populations211,212 or continents207,213.

72 Marchi et al. The mixed genetic origin of the first farmers of Europe

TBot TBot Ancestral a Ancestral >TDiv NBot b Ancestral >TDiv NBot Meta X Meta X Ancestral

TDiv N N Meta Ancestral TDivMeta Ancestral >TDivMeso&Neo >TDivMeso&Neo TBot TBot Meta =TDiv -1 NBot Meta =TDiv -1 NBot Meta NBot MetaEast Meta NBot MetaEast X MetaCentral X X MetaCentral X

TDivNeo TDivNeo

admHG admHG AncNeo AncNeo TAdmAncNeo TAdmAncNeo Neo Neo

TDivMeso TDivMeso admHGMeso admHGMeso NBot admNeo admNeo NGreece NGreece NGreece X admHG admNeo admNeo CSerbia CSerbia admHGCSerbia CSerbia TDivCSerbiaNGreece TDivCSerbiaNGreece admHG admHG LAustria admNeo LAustria LAustria admNeo LAustria TDivLAustriaCSerbia TDivLAustriaCSerbia admHGSGermany admNeo SGermany admHG admNeo SGermany SGermany TDivSGermanyLAustria TDivSGermanyLAustria

Meso Meso N CSerbia N Meso LAustria NGreece LAustria CSerbia NGreece f N N N f Meso N SGermany CSerbia MetaEast SGermany N N MetaEast MetaCentral f LAustria f NGreece N LAustria f CSerbia N f MetaCentral N f f NGreece N f SGermany MetaEast f SGermany MetaEast N N MetaCentral MetaCentral Lower Austria Lower Austria Central Serbia Central Serbia Northern Greece Northern Greece Southern Germany Southern Germany Danube Gorges Meso Danube Gorges Meso c TBot d TBot Ancestral Ancestral >TDiv NBot >TDiv NBot Meta X Ancestral Meta X Ancestral

N N TDivMeta Ancestral TDivMeta Ancestral >TDivMeso&Neo >TDivMeso&Neo TBot TBot Meta =TDiv -1 NBot Meta =TDiv -1 NBot Meta NBot MetaEast Meta NBot MetaEast X MetaCentral X X MetaCentral X

TDivNeo TDivNGeo

admHGAncNeo TAdmAncNeo TAdmAncNeo

Neo Neo

TDivMeso TDivMeso admHGMeso admHGMeso NBot NGreece NBotNGreece admNeo X X NGreece

admHGCSerbia admNeo CSerbia TDivCSerbiaNGreece TDivCSerbiaNGreece admHG LAustria admNeo LAustria TDivLAustriaCSerbia TDivLAustriaCSerbia admHGSGermany admNeo SGermany TDivSGermanyLAustrriaia TDivSGermanyLAustria

N Meso Meso LAustria CSerbia NGreece N CSerbia f Meso N Meso LAustria NGreece SGermany N N MetaEast f N N N f LAustria f CSerbia NGreece N SGermany CSerbia MetaEast MetaCentral N f MetaCentral N f LAustria f f NGreece N f SGermany MetaEast MetaEast N N f SGermany MetaCentral MetaCentral Lower Austria Central Serbia Lower Austria Central Serbia Northern Greece Northern Greece Southern Germany Southern Germany Danube Gorges Meso e Danube Gorges Meso

a b c d a b c d SS BP

73 Marchi et al. The mixed genetic origin of the first farmers of Europe

Figure S30 - Schematic description of stepping stone models tested for the DS11 dataset (a-d) and comparison of their likelihood (e). a: Pure stepping-stone model. The East and Central metapopulations diverged TDivMeta generations ago followed by bottlenecks (X) one generation later. The sampled Mesolithic population then diverged from its metapopulation TDivMeso generations ago. The Northern Greek

Neolithic population diverged from its metapopulation TDivNeo generations ago; an admixture in

Northern Greece from Central metapopulation was possible at a variable time TAdmAncNeo and from East metapopulation 310 generations ago which corresponds to the onset of farming in this area 43,55. Then the Central Serbian Neolithic population derived from Northern Greece

TDivCSerbiaNGreece generations ago, the Lower Austrian Neolithic from Central Serbia TDivLAustriaCSerbia generations ago and Southern German Neolithic population from Lower Austria

TDivSGermanyLAustria generations ago. Admixture with Central and East metapopulations was possible at the time of the founding of the Neolithic populations from Central Serbia, Lower Austria and Southern Germany. All migration events are modeled as single pulses of admixture. b: Pure stepping-stone model with bottleneck. Same as a, except that we authorize a bottleneck in the Northern Greek population at the time of introduction of Neolithic in Greece 310 generations ago. c: Pure stepping-stone with bottleneck, no Eastern gene flow. Same as b, except that there is no admixture from the East metapopulation into Neolithic samples. d: Pure stepping-stone with bottleneck, no HG gene flow. Same as b, except that there is no admixture from the Central metapopulation into Neolithic samples. Bypassing models (BP) are similar to these models except that Lower Austria is directly derived from Northern Greece and not from Central Serbia. The parameters search ranges are shown between brackets on the figures, but parameters are further described in Supp. Table 4 and properties of the sampled populations can be found in Table S9. e: Distribution of the log10-likelihoods obtained by simulation from the maximum likelihood point estimates for the eight tested DS11 models, the Stepping-Stone scenarios on the left and the Bypassing ones in red, on the right. Higher likelihoods (top of the plot) reflect better simulations.

74 Marchi et al. The mixed genetic origin of the first farmers of Europe

Best model parameters estimates and SFS fit Within the 8 tested models, we find that each stepping-stone model (SS) is better supported than its corresponding bypassing (BP) model (Fig. S30e, Supp. Table 4), suggesting some stepwise expansion of the early farmers in Europe along the Danube. Within the SS models, the B model is the best (Fig. S30e, Supp. Table 4, Fig. S31) with an excellent fit between the observed multi and marginal SFS and the ones simulated under this model during the best run (Fig. S32). As compared to the second best model (Fig. S30a), the best model includes a bottleneck on the Northern Greek branch 310 generations ago, allowing for a potential reduction of the population size induced by the introduction of Neolithic practices in Greece (~9 kya43,55). However, this bottleneck modeled as lasting a single generation is relatively weak (~ 227 individuals), compared to the ancestral bottleneck, the bottleneck occurring on the Central metapopulation branch or even on the East metapopulation branch after the split between the metapopulations. Interestingly, the population size of the Mesolithic population is in the range of the Neolithic population sizes. As expected for the metapopulations, the model shows larger sizes (factor of 10 or more) than for the sampled populations and a deep divergence (~20 kya, 95% CI 15-38 kya). Additionally, the model finds out a deep divergence too between the Northern Greek population and the East metapopulation (~19 kya, 95% CI 11-29 kya), but a recent split time between the Mesolithic population from the Danube Gorges and the Central metapopulation but with a broad 95% CI (~10 kya, 95% CI 9-21 kya). Interestingly, this Danube Gorges Mesolithic population remained well isolated ever since, as they show very little later admixture (1%) (Fig. S31). Contrastingly, the Early Neolithic populations were well connected to other farmer communities, as modeled by gene flow coming from the late East metapopulation with an intensity of 2 to 13%. They even incorporated a few HG individuals (2-6% from the Central metapopulation) at all stages of the dispersal along the Danubian corridor. Thus, the models without HG gene flow (Fig. S30d) or without gene flow from the East metapopulation (Fig. S30c) into the Neolithic populations are less likely than the best model, showing that human populations are seldom isolated and the need of modeling surrounding unsampled populations. Beside these moderate and recent gene flow events, we detected a massive (34%, 95% CI 30-59%) and old (~13 kya, 95% CI 9-17 kya) admixture from the Central metapopulation into population ancestral to the Northern Greek, suggesting that all the early farmers of Europe have a mixed ancestry.

75 Marchi et al. The mixed genetic origin of the first farmers of Europe

TBot Ancestral 1.6 [1.2-2.5] >TDiv 8283 [6750-9431] X Meta

37,403 [31,927-44,098] TDiv Meta 689 [532-1316] >TDivMeso&Neo TBot =TDiv -1 Meta Meta X 4.5 [4-200] 78 [7-500] X

TDiv Neo 661 [390-1015]

34% [30-59] TAdm AncNeo 446 [310-572] [300-TDivNeo

TDivMeso 352 [309-736] 1% [1-8] 320

13% [2-38] 310 227 [10-500]X TDiv 6% [2-10] 4% [1-8] CSerbiaNGreece 280 [280-281] 5% [1-8] TDiv 2% [1-8] LAustriaCSerbia 261 [260-264] TDiv 3% [2-8] 2% [1-7] SGermanyLAustria 255 [255-255]

MetaEast MetaCentral Lower Austria Central Serbia Northern Greece 2570 [672-6703] 1998 [593-3862] 0.05 [0.02-0.16]Southern1580 Germany 0.04[1646-7821] [0.01-0.07]3050 0.10[1056-6842] [0.03-0.10]2335 0.09[1497-5998] [0.04-0.10] 0.07 [0.02-0.07] 21,345 [4169-80,838]Danube Gorges Meso 61,009 [11,575-88,929] Figure S31 - Schematic representation of the ML demographic model best explaining the observed multi-dimensional SFS of DS11: model b_SS ML. Values of various parameters are reported with their 95% CI estimated from 100 parametric bootstrap estimations. Close to the bottleneck symbol (X), is indicated the size of the population if the bottleneck lasted for a single generation (instbot fastsimcoal2 option) for modeling convenience.

76 Marchi et al. The mixed genetic origin of the first farmers of Europe

30 worse fitted SFS entries Northern Greece Central Serbia

● ● rv ● rv

● ● sfs[i] sfs[i] 0

● ● ● ●

● ● f ● ● ● ●

0 3 0 3

i i

Lower Austria Southern Germany

● ● rv ● rv

● sfs[i] sfs[i] ● 30 worse fitted SFS entries ● 6000 ● ● ● rv ● ● ● ●

● ● ●

● ● 0 3 0 3 6

i i Danube Gorges Mesolithic

● ● rv

sfs[i]

● 0

● ● 0 3 i Figure S32 - Fit between the observed and estimated SFS under the best ML model presented in Fig. S31 (total on the left panel; marginal for each population on the right).

77 Marchi et al. The mixed genetic origin of the first farmers of Europe

Neolithic Aegean diversity

Tested models In a second step, we extended the best model inferred in the previous step from the dataset DS11 by including an individual from NW Anatolia (dataset DS12) in order to estimate the divergence times between the sampled Aegean populations, i.e. from Greece and NW Anatolia. In this model, the ancestral Aegean population would have diverged from the Eastern metapopulation and then split into NW Anatolian and Northern Greek Neolithic populations (Fig. S33). These two divergence times, as well as the divergence between the Central and East metapopulations, are let free to allow for old or recent divergence between Aegeans without any a priori assumption. We placed a bottleneck on the ancestral Aegean population one generation after its split from East metapopulation. Furthermore, in this new model, we authorized three separate admixture events: one occurring from the Central metapopulation towards the ancestral population at an arbitrary time, and two others into the Northern Greek and NW Anatolian populations 310 generations ago (i.e. approximately when Neolithic was introduced in these areas and, therefore, when the Neolithic populations were founded43).

Best model parameters estimates and SFS fit Under this scenario, the Barcın individual Bar25 from NW Anatolia seems drawn from a small population distinct from the Northern Greek population, from which it would have diverged recently (342 generations ago, corresponding to ~9.9 kya, 95% CI 9.1-10.8 kya) (Fig. S34, Supp. Table 4). As in the former model, the Aegean ancestral population is found to have diverged from the East metapopulation a long time ago (761 generations, ~22.1 kya), only a few generations after the split between the metapopulations (807 generations, ~23.4 kya). Furthermore, we found that the old and large admixture with Central metapopulation previously observed in the ancestors of the European Neolithics (Fig. S31) actually occurred on the ancestral branch to all Neolithic populations, 669 generations ago (~19.4 kya) with a broad confidence interval ranging from 10.4 to 23 kya and at a slightly higher rate (41%, 95% CI 38-50%) than in the previous model DS11. As for the other Neolithic populations, the Northern Greek and NW Anatolian populations were therefore of admixed ancestry, and were later connected to HGs and other farmer communities (3- 9%). Note that this approach of extending a former model with some fixed and other newly estimated parameters leads to a good fit between the observed and simulated SFS (Fig. S35).

78 Marchi et al. The mixed genetic origin of the first farmers of Europe

TBot Ancestral 8283 X 1.6

37,403 TDivMeta

>TDivNeo&TDivMeso TBot =TDiv -1 4.5 Meta Meta X 78 X

TDivNeo >TDivNGreeceNWAnatolia =TDiv -1 NBotAncNeo TBotAncNeo Neo X admHGAncNeo TAdmAncNeo

[TDivNGreeceNWAnatolia-TDivNeo]

TDivMeso 352 NAncNeo TDivNGreeceNWAnatolia [310-2000] 0.5% NBot 320 NGreece NBot admHG NWAnatolia NGreece admNeo 310 X X NGreece admHGNWAnatolia admNeoNWAnatolia TDiv 6% 4% CSerbiaNGreece 280 5% TDiv 2% LAustriaCSerbia 261

TDiv 3% 2% SGermanyLAustria 255

2570 1580 3050 2335 21,345 0.05 0.04 0.10 0.09 N NGreece f NGreece MetaEast61,009 N NWAnatolia MetaCentral f NWAnatolia Lower Austria Central Serbia Northern Greece NWAnatolia (Bar) Southern Germany Danube Gorges Meso Figure S33 - Schematic description of the model tested for the DS12 dataset. This model is derived from the best model inferred for DS11: parameter values estimated in the previous scenario are fixed in this analysis and shown in light fonts while newly estimated parameters are shown in bold fonts. Under this model, the Aegean ancestral population has diverged TDivNeo generations ago from the Eastern metapopulation. A single pulse of admixture coming from the

Central metapopulation occurred on this branch TAdmAncNeo generations ago. Then, the ancestral population split into Northern Greece and NW Anatolia TDivNGreeceNWAnatolia generations ago, and both populations underwent a bottleneck (X) and received admixture from Central and Eastern metapopulations 310 generations ago. Finally, as in the former scenario, other European Neolithic populations emerged from the ancestors of the Greek population in a stepping-stone fashion. The parameters search ranges are shown between brackets on the figure, but parameters are further described in Supp. Table 4 and properties of the sampled populations can be found in Table S9.

79 Marchi et al. The mixed genetic origin of the first farmers of Europe

TBot Ancestral 8283 X 1.6

TDiv 37,403 Meta 807 [732-1087] >TDivNeo&TDivMeso TBot =TDiv -1 Meta Meta X 4.5 78 X

TDivNeo >TDivNGreeceNWAnatolia 761 [365-845] TBot AncMeta =TDiv -1 274 [12-500] Neo X TAdm 41% [38-50] AncNeo 669 [359-802]

[TDivNGreeceNWAnatolia-TDivNeo]

TDiv 352 Meso 2428 [125-3457] TDivNGreeceNWAnatolia 342 [314-374] [310-2000] 0.5% 320 379 [100-1000] 118 [27-500] 3% [3-9] 9% [6-14] 310 5% [1-8] X X 5% [1-8] TDiv 6% 4% CSerbiaNGreece 280 5% TDiv 2% LAustriaCSerbia 261

TDiv 3% 2% SGermanyLAustria 255

2570 1580 3050 2335 0.05 0.04 0.10 0.09 21,345 MetaEast61,009 MetaCentral Lower Austria Central Serbia Northern Greece NWAnatolia721 [526-7301] (Bar) Southern Germany 9509 [3219-8652] 0.01 [0.00-0.01] 0.04 [0.01-0.07] Danube Gorges Meso Figure S34 - Schematic representation of the ML demographic model best explaining the observed multi-dimensional SFS of DS12 when adding the Bar25 individual from NW Anatolia to the scenario shown in Fig. S31. ML values of newly estimated parameters and their 95% CI estimated from 100 parametric bootstrap are shown in bold fonts. Close to the bottleneck symbol (X), is indicated the size of the population if the bottleneck lasted for a single generation (instbot fastsimcoal2 option) for modeling convenience.

80 Marchi et al. The mixed genetic origin of the first farmers of Europe

30 worse fitted SFS entries Northern Greece Central Serbia

60 ● ● ● rved ● rved e e 40 15000 15000 20 ● ● ● 0 10000 10000

● ● ● ●

● ● ● f

5000 ● ● ● 5000 ● 0 1 2 3 4 0 1 2 3 4

i i

logLhood dif Lower Austria Southern Germany

● ● rved ● rved e e 16000 12000 ● ● (0,0,0,1,0,0) (0,1,0,0,0,0) (0,0,0,0,1,0) (2,0,0,0,0,0) (0,0,0,0,0,1) (0,0,1,0,0,0) (3,4,4,6,4,2) (4,4,3,6,4,2) (4,4,4,6,3,2) (1,0,0,0,0,0) (0,0,0,1,0,1) (0,2,0,0,0,0) (4,3,4,6,4,2) (1,1,0,1,0,0) (3,4,4,5,4,2) (0,0,0,0,2,0) (0,1,1,0,0,0) (2,0,0,1,1,0) (4,4,4,5,4,2) (1,0,0,2,0,0) (0,0,0,2,0,0) (0,0,0,0,4,0) (2,4,2,5,4,2) (1,1,2,3,1,0) (4,4,4,5,3,2) (4,4,3,5,3,2) (0,0,0,1,1,0) (2,4,4,6,4,2) (3,3,3,5,4,2) (1,0,0,0,1,0) 12000 ●

30 worse fitted SFS entries ● 6000 ● ● ● rved ● 2500 ● ● 4000 ● ● ● ● 4000 2000 0 1 2 3 4 0 1 2 3 4 5 6 i i

Danube Gorges Mesolithic NW Anatolia (Bar) 1500 ● ● rved ● rved 20000 e e 25000 1000 n 15000 20000 500 15000

10000 ● 0 ● ● ●

● 10000 ● ● ● 5000

0 1 2 3 4

(0,0,0,1,0,0) (0,1,0,0,0,0) (0,0,0,0,1,0) (2,0,0,0,0,0) (0,0,0,0,0,1) (0,0,1,0,0,0) (3,4,4,6,4,2) (4,4,3,6,4,2) (4,4,4,6,3,2) (1,0,0,0,0,0) (0,0,0,1,0,1) (0,2,0,0,0,0) (4,3,4,6,4,2) (1,1,0,1,0,0) (3,4,4,5,4,2) (0,0,0,0,2,0) (0,1,1,0,0,0) (2,0,0,1,1,0) (4,4,4,5,4,2) (1,0,0,2,0,0) (0,0,0,2,0,0) (0,0,0,0,4,0) (2,4,2,5,4,2) (1,1,2,3,1,0) (4,4,4,5,3,2) (4,4,3,5,3,2) (0,0,0,1,1,0) (2,4,4,6,4,2) (3,3,3,5,4,2) (1,0,0,0,1,0) i i

Figure S35 - Fit between the observed and estimated SFS under the best ML model presented in Fig. S34 (total on the left panel; marginal for each population on the right).

In addition to this approach where DS11 served as a backbone for DS12, we tried to estimate all parameters from DS12 with the same model as before (i.e. with Northern Greece as the source of a stepping-stone Neolithic settlement along the Danube, after its split from NW Anatolia) and with an alternative model where Neolithic groups from Serbia, Austria and Germany were modeled as stemming from the NW Anatolian population rather than from Northern Greece. As these scenarios included more parameters than before (42 vs. 34 max for DS11_b_SS), we used 150 ECM cycles per run instead of 100 for maximizing the likelihood of the parameters. Under this setting, we only observed a slight advantage for the model where European farmers originate from Greece (maximum log10Lhood of -255,538 vs. -255,541 for an origin from NW Anatolia, see Supp. Table 4). Given the very similar likelihoods, these two origins seem equally likely, and we thus prefer to propose a relatively broad Aegean origin of the Neolithic settlement along the Danube. However, when studying more complex models, we will keep a model where the origin of the stepping stone process for the settlement of Europe is in Northern Greece.

81 Marchi et al. The mixed genetic origin of the first farmers of Europe

NW Anatolia genetic structure

Tested models In a third step, we used the DS12 best model parameters (Fig. S34) as a backbone to examine more complex models and address specific questions by adding relevant ancient individuals. For instance, in addition to being one of the oldest Neolithic sites in NW Anatolia, the site of Aktopraklık belongs to the local ‘Fikirtepe culture’, which is thought to have been influenced by both Mesolithic and Neolithic traditions45. In order to investigate the possibly complex settlement history of this site, we tested three models allowing AKT16 to branch off the Central metapopulation, off the East metapopulation, or off the ancestral Aegean population (Fig. S36).

Best model parameters estimates and SFS fit The best model explaining the DS6_AKT dataset (Supp. Table 4, Fig. S36d) is when the Aktopraklık individual is drawn from a small population emerging as a sister population of Barcın (model b, Fig. S37). Also, under this model, the Aktopraklık-related population is clearly distinct from other NW Anatolian individuals from Barcın, as 25% of its genome would be drawn from surrounding Neolithic populations (modeled by East metapopulation), and another 16% from some Mesolithic related population (the Central metapopulation), the rest (58%) coming directly from the ancestral NW Anatolian population from which it would have diverged shortly before sampling times (317 generations ago, corresponding to ~9.2 kya). The quite important HG contribution to its genome is in line with the admixture analysis (Fig. S21). A good fit for the observed and simulated SFS is also obtained using the backbone approach (Fig. S38).

82 Marchi et al. The mixed genetic origin of the first farmers of Europe

TBot 8283 1.6 TBot 8283 1.6 a Ancestral X b Ancestral X 37,403 37,403 TDivMeta 807 TDivMeta 807 TBot 4.5 78 TBot 4.5 78 Meta 806 X X Meta 806 X X TDiv 61,009 TDiv 61,009 Neo 761 Neo 761 TBot TBot AncNeo 760 274 X AncNeo 760 274 X

TAdmAncNeo 669 41% TAdmAncNeo 669 41%

TDivAKT

TDivMeso 352 TDivMeso 352

TDivNGreeceNWAnatolia 342 2428 TDivNGreeceNWAnatolia 342 2428

0.5% 0.5% 320 320 admHG TAdm AKT TDiv AKT admNeo AKT AKT 379 118 379 118 AKT 3% 9% admNeo 3% 9% 310 310 AKT 5% X X 5% 5% X X 5% admHGAKT TAdmAKT

AKT

N AKT N AKT f AKT 2570 9509 721 f AKT 2570 9509 721 21,345 0.01 21,345 0.01 0.051 0.04 MetaEast61,009 0.051 0.04 MetaEast61,009 MetaCentral MetaCentral

Northern Greece Northern Greece NWAnatolia (AKT) NWAnatolia (Bar) NWAnatolia (AKT) NWAnatolia (Bar)

Danube Gorges Meso Danube Gorges Meso

TBot 8283 1.6 c Ancestral X d 37,403 TDivMeta 807 TBot 4.5 78 Meta 806 X X TDiv 61,009 Neo 761 TBot AncNeo 760 274 X

TAdmAncNeo 669 41%

TDivMeso 352

TDivNGreeceNWAnatolia 342 2428 log10Lhood

0.5% 320

TDivAKT 379 118 admNeo 3% 9% 310 AKT 5% X X 5% admHGAKT TAdmAKT AKT

N AKT f AKT 2570 9509 721 21,345 0.01 0.051 0.04 MetaEast61,009 MetaCentral a b c Northern Greece NWAnatolia (AKT) NWAnatolia (Bar) Danube Gorges Meso Figure S36 - Schematic description of the three models tested for the DS6_AKT dataset and comparison of their likelihood (d). These models are derived from the best model inferred for DS12: parameter values estimated in the previous scenario are fixed in this analysis and shown in light fonts while newly estimated parameters are shown in bold fonts. The AKT-related population could have emerged from the Central metapopulation (a); from the Aegean ancestral population (b); from East metapopulation (c). In all the three scenarios, the AKT-related population can receive a pulse of admixture from the Central and East metapopulations, at 320

and TAdmAKT generations ago, respectively, in model a; at TAdmAKT and 310 generations ago (i.e. 25 generations before its sampling) in models b and c. The parameters search ranges are shown between brackets on the figures, but parameters are further described in Supp. Table 4 and properties of the sampled populations can be found in Table S9. In d, the distribution of the log10- likelihoods were obtained by simulation from the maximum likelihood point estimates for each of the three models. Higher likelihoods (top of the plot) reflect better simulations.

83 Marchi et al. The mixed genetic origin of the first farmers of Europe

TBot 8283 1.6 Ancestral X 37,403 TDivMeta 807 TBot 4.5 78 Meta 806 X X TDiv 61,009 Neo 761 TBot AncNeo 760 274 X

TAdmAncNeo 669 41%

TDivMeso 352 TDiv NGreeceNWAnatolia 342 2428

0.5% 320 TDiv AKT 317 [313-329] 379 118 25% [18-28] 3% 9% 310 5% X X 5% 16% [14-22] TAdmAKT 304 [297-315]

AKT]

2570 9509 721 21,345 0.01 0.04 0.051 MetaEast61,009 MetaCentral

749 [123-6185] Northern Greece NWAnatolia0.13 (AKT) [0.04-0.15] NWAnatolia (Bar) Danube Gorges Meso Figure S37 - Schematic representation of the ML demographic model best explaining the observed multi-dimensional SFS of DS6_AKT (b) when adding the AKT16 individual from NW Anatolia to the scenario shown in Fig. S34. ML values of newly estimated parameters and their 95% CI estimated from 100 parametric bootstrap are shown in bold fonts. Close to the bottleneck symbol (X), is indicated the size of the population if the bottleneck lasted for a single generation (instbot fastsimcoal2 option) for modeling convenience.

84 Marchi et al. The mixed genetic origin of the first farmers of Europe

30 worse fitted SFS entries

NW Anatolia (Bar) Northern Greece

0 ● ● ● rved ● rved ● e ● e f 12000 20000 ●

● ● ● logLhood dif ● ●

0 1 2 3 4 (0,0,1,0) (1,0,0,0) (0,0,0,1) (1,1,0,0) (0,2,0,0) (0,2,1,0) (1,0,1,0) (0,0,3,0) (2,3,4,2) (2,4,3,2) (0,0,4,0) (0,0,0,2) (1,2,0,0) (2,2,4,2) (0,2,1,1) (2,3,3,2) (0,1,1,0) (2,0,2,0) (0,3,0,0) (0,1,2,0) (1,0,0,1) (1,1,2,0) (1,3,3,1) (0,0,2,0) (1,4,4,2) (1,3,2,1) (0,2,0,1) (2,4,0,2) (1,3,1,0) (1,1,0,2) i i

Danube Gorges Mesolithic NW Anatolia (AKT)

● ● ● rved ● rved e e

30 worse fitted SFS entries rved ● 4000 14000 ● 20000

● ●

3000 10000 ●

● ● 2000 0 1 2 3 4

n i i

1000

0

(0,0,1,0) (1,0,0,0) (0,0,0,1) (1,1,0,0) (0,2,0,0) (0,2,1,0) (1,0,1,0) (0,0,3,0) (2,3,4,2) (2,4,3,2) (0,0,4,0) (0,0,0,2) (1,2,0,0) (2,2,4,2) (0,2,1,1) (2,3,3,2) (0,1,1,0) (2,0,2,0) (0,3,0,0) (0,1,2,0) (1,0,0,1) (1,1,2,0) (1,3,3,1) (0,0,2,0) (1,4,4,2) (1,3,2,1) (0,2,0,1) (2,4,0,2) (1,3,1,0) (1,1,0,2) Figure S38 - Fit between the observed and estimated SFS under the best ML model presented in Fig. S37 (total on the left panel; marginal for each population on the right).

85 Marchi et al. The mixed genetic origin of the first farmers of Europe

Lepenski Vir and European Neolithic populations relationships

Tested models Lastly, we included in the dataset DS8_LV two newly sequenced genomes from Lepenski Vir of very similar age, ascribed respectively to the ‘Transformational/Early Neolithic’ phase (LEPE48) and to the ‘Early-Middle Neolithic’ phase (LEPE52). These two genomes were found to be close to the other Neolithic genomes from Europe on the MDS shown in Fig. 2a, and do not show the typical forager signature found in older (Mesolithic) samples from the same site24. Here we investigated the likely origin of these individuals, i.e. if they came with the same expansion wave that brought farming to Central Europe or if they split from the pioneer Neolithic population to form a separate branch that remained off-course and did not contribute significantly to the European Early Neolithic gene pool. We therefore tested a scenario in which the Lepenski Vir individuals we newly sequenced branched off the ancestral Neolithic population at a free time (Fig. S39).

Best model parameters estimates and SFS fit Under this scenario, the population related to the two Lepenski Vir individuals sequenced in this study and from the Neolithic epoch is found to have diverged extremely recently from the Northern Greek populations, 282 generations ago (95% CI 280-341) (Fig. S40, Supp. Table 4), thus after the split between Aegeans from Greece and NW Anatolia 342 generations ago, and potentially after the bottleneck mimicking the beginning of Neolithic in the region 310 generations ago, and only 2 generations before the split between the Northern Greek and Central Serbian populations (Central Serbia represented here by the STAR1 and VC3-2 genomes, which are both related to the Neolithic Starčevo culture). The two Lepenski Vir individuals could thus have come from a population that would have moved to the Central Balkans a few generations before the population to which the two other Serbian Neolithic individuals belong, but we may not have enough power here to clearly say if there were two waves or a single wave of settlement into the Balkans after the settlement in the Aegean.

86 Marchi et al. The mixed genetic origin of the first farmers of Europe

TBot 1.6 Ancestral 8283 X 37,403 TDivMeta 807

TBotMeta 806 4.5 X X 78 TDiv 61,009 Neo 761 TBot 274 AncNeo 760 X 41% TAdmAncNeo 669

TDivMeso 352 TDiv NGreeceNWAnatolia 342 2428 0.5% 320 379 3% 9% 310 X TDiv -1 LV LV NBot TBot =TDiv LV LV X admNeo 4% 280 LV 6%

TAdmLP admHGLV LV, TDivMeta

N LV f LV 2570 2335 9509 21,345 0.09 0.01 0.051 61,009 Lepenski Vir MetaEast MetaCentral Central Serbia Northern Greece Danube Gorges Meso Figure S39 - Schematic description of the model tested for the DS8_LV dataset. This model is derived from the best model inferred for DS12: parameter values estimated in the previous scenario are fixed in this analysis and shown in light fonts while newly estimated parameters are shown in bold fonts. The parameters search ranges are shown between brackets on the figures, but parameters are further described in Supp. Table 4 and properties of the sampled populations can be found in Table S9.

87 Marchi et al. The mixed genetic origin of the first farmers of Europe

TBot 1.6 Ancestral 8283 X 37,403 TDivMeta 807

TBotMeta 806 4.5 X X 78 TDiv 61,009 Neo 761 TBot 274 AncNeo 760 X

TAdm 41% AncNeo 669

TDivMeso 352 TDiv NGreeceNWAnatolia 342 2428 0.5% 320 379 3% 9% 310 X TDiv LV 282 [280-341] 80 [33-1000] TBot =TDiv -1 LV LV X 12% [7-14] 4% 280 6% 4% [1-5] TAdm 276 [275-310] LP

TDivLV,TDivMeta)]

2570 2335 9509 21,345 0.09 0.01 0.051 MetaEast61,009 Lepenski Vir MetaCentral Central Serbia 5218 [851-6648] 0.12 [0.02-0.14] Northern Greece Danube Gorges Meso

Figure S40 - Schematic representation of the ML demographic model best explaining the observed multi-dimensional SFS of DS8_LV when adding the two Lepenski Vir samples to the scenario shown in Fig. S34. ML values of newly estimated parameters and their 95% CI estimated from 100 parametric bootstrap are shown in bold fonts. Close to the bottleneck symbol (X), is indicated the size of the population if the bottleneck lasted for a single generation (instbot fastsimcoal2 option) for modeling convenience.

88 Marchi et al. The mixed genetic origin of the first farmers of Europe

30 worse fitted SFS entries

Northern Greece Central Serbia ● rv ● rv

f ● ●

sfs[i] sfs[i]

● ●

● ● ●

● ●

i i

Danube Gorges Mesolithic Lepenski Vir

● rv ● rv 30 worse fitted SFS entries rv ● sfs[i] sfs[i] ● ●

● ●

● ● ●

i i

Figure S41 - Fit between the observed and estimated SFS under the best ML model presented in Fig. S40 (total on the left panel; marginal for each population on the right).

89 Marchi et al. The mixed genetic origin of the first farmers of Europe

Central Anatolian and Iranian early farmers relationships

Tested models The large Mesolithic contribution found in the ancestors of all European and Western Anatolian Neolithics raises the question of the geographic location of this HG input, and whether it can also be observed in Central Anatolia and further East. We examined these questions by including the Bon002 (Boncuklu) Early Neolithic individual from the Konya Plain in Central Anatolia (Fig. S42a) and the WC1 (Wezmeh Cave) Early Neolithic Iranian individual from the Zagros Region (Fig. S42b-c).

Best model parameters estimates and SFS fit For DS5_Bon (Fig. S43, Supp. Table 4), the Boncuklu individual is found to have diverged from the Aegean ancestral population some 462 generations ago (~13.4 kya) during the cold Younger Dryas period, thus after the admixture with central HGs. An additional HG genetic input, received more recently (about 406 generations ago, ~11.8 kya), contributed to 8% of its genome. It appears clearly with this model that Bon002 from Central Anatolia shares an ancestry with the NW Anatolian and European Neolithics, but that this divergence clearly predates the introduction of farming. Constratingly, the Wezmeh Cave individual is found to have diverged from the Aegean ancestral population too (instead of coming from the East metapopulation, Fig. S42d, Supp. Table 4) but 701 generations ago (∼20.3 kya), thus before the ancestral Aegean population got admixed with the Central metapopulation (Fig. S44). It would have received only later on (539 generations ago, ∼14.3 kya) some relatively minor (10%) contribution from Central metapopulation, suggesting it has a clearly distinct ancestry than Anatolian and European Neolithic individuals.

90 Marchi et al. The mixed genetic origin of the first farmers of Europe

TBot 8283 1.6 a Ancestral X d TDivMeta 807 TBot 4.5 78 Meta 806 X X

TDivNeo 761 TBot AncNeo 760 274 X

41% TAdmAncNeo 669

TDivKP

log10Lhood admHGKP TAdmKP KP admNeoKP Neolithic in Konya Plain 353 X NBot TDiv 352 KP Meso TDivNGreeceNWAnatolia 342 0.5% 320 379 3% 9% 310 X

b c N KP f KP 2570 9509 0.01 0.051 MetaEast Konya Plain MetaCentral

Northern Greece b Danube Gorges Meso c 1.6 TBot 8283 TBot 8283 1.6 Ancestral X Ancestral X TDiv Meta 807 TDivMeta 807

TBotMeta 806 4.5 78 TBot 4.5 78 X X Meta 806 X X TDiv Neo 761 TDivNeo 761 TBot TBot AncNeo 760 274 AncNeo 760 274 TDiv X X ZR TDivZR 41% 41% TAdm AncNeo 669 TAdmAncNeo 669

admHG KP admHGKP TAdm ZR TAdmZR ZR ZR

admNeoKP admNeo Neolithic in Zagros 360 X Neolithic in Zagros 360 X KP NBotZR NBot TDiv 352 ZR Meso TDivMeso 352 TDiv NGreeceNWAnatolia 342 TDivNGreeceNWAnatolia 342 0.5% 0.5% 320 320 379 379 3% 9% 3% 9% 310 X 310 X

ZR N N ZR f ZR 2570 9509 f ZR 2570 9509 0.01 0.01 0.051 MetaEast 0.051 MetaEast MetaCentral MetaCentral Zagros region Zagros region Northern Greece Northern Greece Danube Gorges Meso Danube Gorges Meso Figure S42 - Schematic description of the models tested for the DS5_Bon (a) and DS5_WC (b- c) datasets, and comparison of DS5_WC models’ likelihood (d). These models are derived from the best model inferred for DS12: parameter values estimated in the previous scenario are fixed in this analysis and shown in light fonts while newly estimated parameters are shown in bold fonts. Namely, we only estimate the size and inbreeding coefficient of the population including the Central Anatolian genome Bon002 (a) or the Iranian genome WC1 (b-c), the time of their divergence from the Aegean ancestral population (a-b) or from the East metapopulation (c) and the time and rate of admixture they received from the Central metapopulation. The time of admixture with the East Metapopulation and of the bottlenecks correspond roughly to the time of uptake of Neolithic practices in both areas (353 generations ago in the Konya Plain; 360 generations ago in the Zagros region42,216,217). The parameters search ranges are shown between brackets on the figures, but parameters are further described in Supp. Table 4 and properties of

91 Marchi et al. The mixed genetic origin of the first farmers of Europe the sampled populations can be found in Table S9. In d, the distribution of the log10-likelihoods were obtained by simulation from the maximum likelihood point estimates for each of the two DS5_WC models. Higher likelihoods (top of the plot) reflect better simulations.

TBot 8283 1.6 Ancestral X TDivMeta 807 TBot 4.5 78 Meta 806 X X

TDivNeo 761 TBot AncNeo 760 274 X

41% TAdmAncNeo 669

TDivKP 462 [398-502] 8% [1-17] TAdmKP 406 [357-449] KP 2% [0-9] Neolithic in Konya Plain 353 X 78 [16-500] TDivMeso 352 TDivNGreeceNWAnatolia 342 0.5% 320 379 3% 9% 310 X

2570 9509 0.01 0.051 MetaEast Konya Plain MetaCentral

Northern Greece Danube Gorges Meso Figure S43 - Schematic representation of the ML demographic model best explaining the observed multi-dimensional SFS of DS5_Bon when adding the Bon002 individual from Central Anatolia to the scenario shown in Fig. S34. ML values of newly estimated parameters and their 95% CI estimated from 100 parametric bootstrap are shown in bold fonts. Close to the bottleneck symbol (X), is indicated the size of the population if the bottleneck lasted for a single generation (instbot fastsimcoal2 option) for modeling convenience.

92 Marchi et al. The mixed genetic origin of the first farmers of Europe

TBot 8283 1.6 Ancestral X 37,403 TDivMeta 807 TBot 4.5 78 Meta 806 X X

TDivNeo 761 61,009 TBot AncNeo 760 274 TDiv X ZR 693 [670-716] 41% TAdmAncNeo 669

10% [8-13] 458 [343-646] TAdmZR

ZR

5% [2-10] Neolithic in Zagros 360 X 500 [25-500] TDivMeso 352 2,428 TDivNGreeceNWAnatolia 342 0.5% 320 379 3% 9% 310 X

2570 9509 21,345 0.01 0.051 MetaEast61,009 MetaCentral Zagros region Northern Greece Danube Gorges Meso Figure S44 - Schematic representation of the ML demographic model best explaining the observed multi-dimensional SFS of DS5_WC (Fig. S42b) when adding the WC1 individual from the Zagros region to the scenario shown in Fig. S34. ML values of newly estimated parameters and their 95% CI estimated from 100 parametric bootstrap are shown in bold fonts. Close to the bottleneck symbol (X), is indicated the size of the population if the bottleneck lasted for a single generation (instbot fastsimcoal2 option) for modeling convenience.

93 Marchi et al. The mixed genetic origin of the first farmers of Europe

30 worse fitted SFS entries

20 Northern Greece Danube Gorges Mesolithic

● ● ● rved ● rved 10 e e ●

● ● 2000 f 0 2000 ● ●

● ●

● 1000 logLhood dif

● ● 1000

0 1 2 3 4 0 1 2 3 4

(0,3,0) (2,0,0) (0,2,1) (0,0,1) (2,2,1) (3,4,1) (1,2,1) (4,3,2) (2,1,1) (0,2,0) (0,1,1) (0,1,0) (0,0,2) (3,3,0) (2,3,0) (1,0,2) (2,2,0) (4,4,1) (1,3,0) (1,1,0) (4,2,2) (2,3,1) (1,4,0) (3,1,1) (2,3,2) (1,2,0) (4,1,2) (1,2,2) (2,1,0) (0,4,0) i i

Konya Plain

● ● rved e

30 worse fitted SFS entries 4000

rved 800

3000 ● ● 600 2000 ●

400 i n

200

0 (0,3,0) (2,0,0) (0,2,1) (0,0,1) (2,2,1) (3,4,1) (1,2,1) (4,3,2) (2,1,1) (0,2,0) (0,1,1) (0,1,0) (0,0,2) (3,3,0) (2,3,0) (1,0,2) (2,2,0) (4,4,1) (1,3,0) (1,1,0) (4,2,2) (2,3,1) (1,4,0) (3,1,1) (2,3,2) (1,2,0) (4,1,2) (1,2,2) (2,1,0) (0,4,0)

Figure S45 - Fit between the observed and estimated SFS under the best ML model presented in Fig. S43 (total on the left panel; marginal for each population on the right).

30 worse fitted SFS entries

Northern Greece Danube Gorges Mesolithic

● ● ● ● rv ● rv

f ●

● sfs[i] ● sfs[i]

● ●

● ●

● ●

1 1

i i

Zagros region

● ● rv 30 worse fitted SFS entries rv sfs[i] ●

● ●

i

Figure S46 - Fit between the observed and estimated SFS under the best ML model presented in Fig. S44 (total on the left panel; marginal for each population on the right).

94 Marchi et al. The mixed genetic origin of the first farmers of Europe

We additionally tested if the source for the HG genetic contribution to Boncuklu and Wezmeh Cave genomes could be the admixed ancestral Neolithic (AAN) population (i.e. on the Aegean branch between 669 and 342 generations in the former models). However, a model with an admixture from the AAN population is never better than a model with admixture from the Central metapopulation, as modeled before. Indeed, the likelihoods of the different scenarios are really similar in the case of DS5_Bon (log10LHood = -45,209 from AAN vs -45,208 from MetaCentral) but more distinct for DS5_WC (log10LHood = -350,986 from AAN vs -350,979 from MetaCentral). Furthermore, we explored a more refined model where most of the parameters previously fixed that could be impacted by the presence of WC1 (e.g. metapopulations size, bottleneck intensity and time of divergence; see Supp. Table 4) were re-estimated. We tested two models similar to those presented in Fig. S42b&c (i.e. WC1 branching off the ancestral Neolithic population or off the Eastern metapopulation with some admixture coming from Central metapopulation). In this case, we were not able to distinguish the two scenarios (log10LHood = -350,674 and -350,673). Overall, in both cases we still find an old divergence between the Wezmeh Cave individual and the branch leading to Aegeans and European Neolithic individuals (701 generations ago, i.e. before the HG admixture on the ancestral Neolithic branch in model b or 710 generations in model c, at

TDivNeo, i.e. when the ancestral Neolithic population split from the Eastern metapopulation from which WC1 has not diverged yet). In any case, in both scenarios WC1 looks closer to the Eastern metapopulation than to any other Neolithic individual. This is clearly visible on the MDS plot when simulating data for the unsampled metapopulations (Fig. S50b).

95 Marchi et al. The mixed genetic origin of the first farmers of Europe

Western European and Central European Mesolithic populations relationships

Tested models We included two previously published good quality Upper Palaeolithic (Bichon) and Mesolithic (Loschbour) genomes to investigate their relationship with the two Mesolithic genomes from the Danube Gorges and with the Central metapopulation used in our previous models. We evaluated two scenarios: one assuming that Bichon and Loschbour populations were related and that it is their common ancestor that diverged from the other HG populations (Central metapopulation, Fig. S47a). In the second scenario (Fig. S47b), Bichon and Loschbour would have diverged independently from the Central metapopulation.

TBot 1.6 TBot 1.6 a Ancestral X b Ancestral X c TDivMeta 807 TDivMeta 807

TBotMeta 806 4.5 78 TBotMeta 806 4.5 78 TDiv X X X X Anc TDiv Bichon [>TDivWEur TDiv NAnc TDiv WEur Loschbour 761 761 TDivNeo TDivNeo 760 760 TBotAncNeo X TBotAncNeo X 669 41% 669 41% TAdmAncNeo TAdmAncNeo

admHGBichon admHGBichon 497 497

log10Lhood

TDivMeso TDivMeso TDivNGreeceNWAnatolia TDivNGreeceNWAnatolia 0.5% 0.5% 379 379 310 admHG 3% 9% 310 admHG 3% 9% Loschbour X Loschbour X 303 303 a b

BichonBichon BichonBichon N 9509 N 9509 f Bichon Loschbour 0.01 f Bichon Loschbour 0.01 LoschbourN 0.051 MetaEast LoschbourN 0.051 MetaEast f Loschbour f Loschbour MetaCentral MetaCentral

Northern Greece Northern Greece Danube Gorges Meso Danube Gorges Meso Figure S47 - Schematic description of the models tested for the DS6_BL dataset, and comparison of their likelihood (c). These models are derived from the best model inferred for DS12: parameter values estimated in the previous scenario are fixed in this analysis and shown in light fonts while newly estimated parameters are shown in bold fonts. The populations could have emerged together from the Central metapopulation via an ancestral population TDivAnc of size NAnc (a) or independently at TDivBichon and TDivLoschbour respectively (b). In both scenarios, each population can receive a pulse of admixture from the Central metapopulation 25 generations before their sampling (i.e. 497 generations ago for Bichon and 303 for Loschbour). The parameters search ranges are shown between brackets on the figures, but parameters are further described in Supp. Table 4 and properties of the sampled populations can be found in Table S9. In c, the distribution of the log10- likelihoods were obtained by simulation from the maximum likelihood point estimates for each model. Higher likelihoods (top of the plot) reflect better simulations. 96 Marchi et al. The mixed genetic origin of the first farmers of Europe

Best model parameters estimates and SFS fit We find that the model where Bichon and Loschbour have a common ancestry is best supported (Fig. S47c, Supp. Table 4). In this scenario, the ancestral population of these two individuals would have diverged during the LGM, 803 generations ago (∼23.3 kya), from the Central European HGs, almost at the same time as these latter ones diverged from East metapopulation. Note that our modeling did not constrain them to have diverged from the Central branch as they could have diverged before the LGM from the ancestors of the two metapopulations. It implies that during the Last Glacial Maximum (LGM), the HG population split into (at least) three components (Western European, Central European and Near-Eastern HGs). Interestingly, the divergence of these Western European HGs from the Central metapopulation has occurred after the very severe bottleneck found at the real beginning of the Central branch (about a single couple for one generation or 22 individuals for 10 generations). This bottleneck seems to have affected all Central and Western HGs and can be responsible for the reduced diversity observed within European HGs as compared to most Neolithics, e.g HGs clustering all together on MDS plots (Fig. 2a-b and Fig. S19). We also note that the Loschbour and Bichon population seem to have split during the LGM some 763 generations ago (∼22.1 kya). However, this splitting time is not well supported by our bootstrap estimates, as the 95% CI indicates a splitting time of 14-20.5 kya, and thus that this divergence could have occurred after the LGM. However, the discrepancy between point estimates and CI limits can be solved by considering the population branch lengths expressed in units of population size (t/(2N)). With this rescaling, the Bichon-Loschbour ancestral branch has a length of 0.08 (95% CI 0.07-0.10), the Bichon specific branch length is 0.04 (95% CI 0.01-0.09) and the Loschbour-specific branch length is 0.12 (95% CI 0.04-0.18). Therefore, a similar amount of divergence between Loschbour and Bichon can be obtained either by a combination of long divergence times and large population sizes or by short divergence times and small population sizes. In all cases, these two Western European HG populations then received limited gene flow (4-5%) from the Central metapopulation, suggesting they remained relatively isolated after the LGM.

97 Marchi et al. The mixed genetic origin of the first farmers of Europe

TBot 1.6 Ancestral X TDivMeta 807 TBot 4.5 78 Meta 806 X X TDivAnc >TDivWEur 803 [691-804] TDiv 484 [1024-3637] WEur 763 [509-693] TDivNeo 761 TBot 760 274 AncNeo X 41% TAdmAncNeo 669

5% [1-9] 497 TAdmBichon

TDivMeso TDivNGreeceNWAnatolia 0.5% 379 310 3% 4% [1-11] X 9% 303 TAdmLoschbour

Bichon 9509 0.01 0.051 MetaEast Loschbour MetaCentral

Northern Greece Danube Gorges Meso Figure S48 - Schematic representation of the ML demographic model best explaining the observed multi-dimensional SFS of DS6_BL when adding the Bichon and Loschbour individuals to the scenario shown in Fig. S34. ML values of newly estimated parameters and their 95% CI estimated from 100 parametric bootstrap are shown in bold fonts. Close to the bottleneck symbol (X), is indicated the size of the population if the bottleneck lasted for a single generation (instbot fastsimcoal2 option) for modeling convenience.

98 Marchi et al. The mixed genetic origin of the first farmers of Europe

30 worse fitted SFS entries

0 Northern Greece Danube Gorges Mesolithic

● ● ● rved 14000 rved ● e e 12000 12000 10000 ● f 10000 ● ● ● ●

logLhood dif ●

4000 ● ● ●

0 1 2 3 4 0 1 2 3 4

(1,0,0,0) (2,1,0,0) (1,1,0,0) (0,1,0,0) (0,0,1,0) (1,0,0,1) (2,0,0,1) (1,0,1,0) (3,0,0,0) (2,2,0,0) (4,3,2,2) (0,1,1,0) (0,3,2,2) (1,3,2,2) (2,0,1,0) (3,3,2,2) (4,2,0,0) (1,2,1,0) (0,0,1,1) (4,4,2,1) (4,2,2,0) (4,4,1,2) (0,3,2,1) (2,2,1,2) (1,2,0,1) (0,1,1,1) (1,3,1,1) (3,1,0,0) (0,4,2,2) (4,2,2,2) i i

Bichon Loschbour

● rved ● rved e e 30 worse fitted SFS entries

rved

4000 14000 14000

● 3000 ●

10000 ● ● 10000

2000

n i i

1000

0

(1,0,0,0) (2,1,0,0) (1,1,0,0) (0,1,0,0) (0,0,1,0) (1,0,0,1) (2,0,0,1) (1,0,1,0) (3,0,0,0) (2,2,0,0) (4,3,2,2) (0,1,1,0) (0,3,2,2) (1,3,2,2) (2,0,1,0) (3,3,2,2) (4,2,0,0) (1,2,1,0) (0,0,1,1) (4,4,2,1) (4,2,2,0) (4,4,1,2) (0,3,2,1) (2,2,1,2) (1,2,0,1) (0,1,1,1) (1,3,1,1) (3,1,0,0) (0,4,2,2) (4,2,2,2) Figure S49 - Fit between the observed and estimated SFS under the best ML model presented in Fig. S48 (total on the left panel; marginal for each population on the right).

99 Marchi et al. The mixed genetic origin of the first farmers of Europe

Section 7: Validating the proposed demographic model

We validated the scenario from Fig. 3a-b using several complementary approaches, as outlined below.

Validation of parameter estimates from parametric bootstraps

In addition to the clear good fit found between the expected and observed SFS for every model, the parametric bootstrap approach used to compute parameters confidence intervals actually brings a different validation of our estimation procedure. The fact that we can infer confidence intervals around our estimated parameters shows that we can correctly recover the simulated parameters with our approach. Indeed, for every dataset except DS6_BL (see above), the point estimates were found to lie within the CI, showing that if the true model was that we estimate, we should be able to recover it with our maximum-likelihood estimation procedure. Note that for DS6_BL, the discrepancy between point estimates and CI limits can be solved by rescaling the population branch lengths expressed in units of population size (t/(2N)), showing that we can correctly recover the amount of drift having occurred in these branches.

Predictive Simulations

We tested if we were able to reproduce the actual genetic diversity observed within ancient samples under the scenario shown in Fig. 3a-b. We assembled as a single fastsimcoal2 input file (.par) a complex scenario including the maximum-likelihood parameter values of the best models for the 6 data sets (all except DS6-LV), corresponding to what is described in Fig. 3a-b. We simulated data from the 11 sampled populations as well as from the two unsampled metapopulations and from the unsampled “Admixed” population (2 diploid individuals for each), and 2 diploid African- like genomes for later use (see section on f-statistics below). In order to access how the genetic data evolved through drift, we sampled the two metapopulations at the estimated time of the massive HG admixture 669 generations ago, and then at 295 generations; the admixed population was sampled one generation before the massive HG admixture (670 generations ago), one generation after (668 generations ago) and then every 100 generations until the Aegean split (i.e. 568, 468 and 368 generations ago). The genetic data were simulated as 10 million independent segments of 100bp for 10 different replicates and a mutation rate of 7.5e-9 (chosen within the range of rates of the 7 data sets). We

100 Marchi et al. The mixed genetic origin of the first farmers of Europe

then computed the nucleotide divergence "XY between all diploid samples (for a number of polymorphic sites ranging between 1,764,987 and 1,768,249 among the 10 replicates). We found a significant correlation between simulated and observed distances computed over the neutral portion of the genome (Mantel test: r = 0.7, p-value = 0.001, with similar values for the 10

replicates). We then performed a MDS analysis of the "XY genetic distances computed among the Europeans and SW Asians simulated and plotted the results in Fig. S50, next to the same analysis done on all ancient genomes. We see that the simulated MDS is very similar to the observed one, showing that our model can reproduce very well the observed genetic structure of the ancient samples (see Fig. 3c for a comparison of MDS performed on exactly the same individuals for simulated and observed data, where the fit is even better).

Sampled: Unsampled: a ● b ●●●●●●●● Bichon r ● Loschbour ● r ● ● Dan ● ● Kon Palaeo/Mesolithic: LAustria Bichon

Loschbour rn VLASA32 Neolithic: ● Zagros ● Bar8 ● ● NWAnatolia LEPE52 ● Bar25 ● ●

● Stuttgart ● ● CarsP ● ● ST ● LAustria ● ● ● ● ●

Figure S50 - 2D-MDS performed on the average nucleotide divergence ("XY ) matrix computed over the neutral portion of the genome of ancient samples (a; n=24) or over simulated diploid genetic data for the samples used in the demographic models (listed in the left column in panel a’s legend, n=17) and the unsampled populations (b, 2 diploid samples simulated per population). The b plot is shown for one simulated dataset but the 9 others gave similar plots. Note that a rotation of the y-axis was done to better see the similarities with a). Dash arrows highlight the admixture event resulting into the Admixed population, and solid arrows show its evolutionary trajectory over 300 generations.

Fig. S50b shows that the MetaCentral population is genetically similar to all European Hunter- Gatherers (HGs) but that the MetaEast population is even further apart from all other samples than the Iranian Early Neolithic genome. In both metapopulations, old samples are slightly less differentiated than most simulated ancient samples, confirming that little drift occurred in these

101 Marchi et al. The mixed genetic origin of the first farmers of Europe large metapopulations. Importantly, we gain some interesting insight by modeling the admixture process between the Central and the Eastern metapopulations. Indeed, we find that the pre- Admixed Eastern HG individuals, sampled one generation before the HG admixture (670g) are close to the Iranian Early Neolithic genome (WC1) and that the Admixed population sampled one generation after the admixture event is, as expected, on a straight line between the pre-Admixed and the MetaCentral populations. By following genomes from the admixed population sampled at different times, we see that these populations progressively drift away from the ancestral admixed population towards the top left corner of the MDS, where the European and Anatolian Neolithic samples are located. This plot therefore shows that data simulated under our demographic scenario almost perfectly matches the real data and their pattern on MDS plots, and that the current position of the early farmers is explained by drift having occurred after the initial admixture event.

By simulating data from the inferred model, we were also able to obtain the admixture patterns seen on real data (Fig. S51a) with the simulated data (Fig. S51b) with, for K=3, the HGs as the blue component, WC1 close to Eastern HGs represented by the green component, the Neolithic European and Anatolian populations clustering with the red component, and some evidence of admixture with HGs in AKT16. However, the large late Pleistocene HG admixture is not visible with real nor simulated data in the Admixture Analyses. Given the large admixture between the Central and Eastern HGs, one would have expected to see for K=2 a 40-60% mixture of blue/red components in the Neolithic samples and not almost exclusively the red component. Interestingly, with simulations of the Admixed population at different sampling times, we are able to show that the signal of admixture is indeed visible in the Admixed individuals simulated just after the admixture, and remains up to 200 generations after the admixture event. However, this signal then disappears such that individuals from the Admixed population sampled at about the same time as early Aegean farmers (348 generations ago) only show the red component. It shows that the initial admixture signal is progressively erased by drift and an absence of this signal in the early European and Anatolian farmers is indeed expected under our scenario.

102 Marchi et al. The mixed genetic origin of the first farmers of Europe

a 1.0 1.0

0.8 0.8

0.6 0.6

0.4 0.4 Admixture coefficients Admixture coefficients

0.2 0.2

0.0 0.0 AR1 AR1 Herx Herx WC1 Ess7 WC1 Ess7 Dil16 Asp6 Dil16 Asp6 Nea2 Nea3 Nea2 Nea3 T T Bar25 Bar25 Klein7 Klein7 AKT16 Bichon AKT16 Bichon S S VLASA7 VLASA7 VLASA32 VLASA32 Loschbour Loschbour Danube Zagros NWAnatolia NGreece CSerbia LAustria SGermany Danube Zagros NWAnatolia NGreece CSerbia LAustria SGermany Gorges Gorges b 1.0 1.0

0.8 0.8

0.6 0.6

0.4 0.4 Admixture coefficients Admixture coefficients

0.2 0.2

0.0 0.0 y y y ia ia y y y n n n ia ia r r n n n r r ma ma ma ust ust ma ma ma ust ust r r r Bichon A A r r r Bichon A A CSerbia CSerbia L L CSerbia CSerbia L L NGreece NGreece NGreece NGreece Loschbour SGe SGe SGe a (Bon002) Loschbour SGe SGe SGe ube Gorges ube Gorges a (Bon002) y ube Gorges ube Gorges n n y n n n Zagros (WC1) n o Zagros (WC1) o Da Da K Anatolia (Bar25) Da Da K Anatolia (Bar25) Anatolia (AKT16) Anatolia (AKT16) W W W N W N N N c 1.0 1.0

0.8 0.8

0.6 0.6

0.4 0.4 Admixture coefficients Admixture coefficients

0.2 0.2

0.0 0.0 y y 3 ia ia y y 3 n n n r r ia ia n n n r r ma ma ust ust ma r r ma ma ust ust r ma Bichon A A r r r Bichon A A CSerbia CSerbia L L ed 668g ed 668g ed 568g ed 568g ed 468g ed 468g ed 368g ed 368g ed 670g ed 670g NGreece NGreece CSerbia CSerbia L L ed 668g ed 668g ed 568g ed 568g ed 468g ed 468g ed 368g ed 368g ed 670g ed 670g x x x x x x x x x x NGreece NGreece x x x x x x x x x x Loschbour SGe SGe SGe a (Bon002) Loschbour SGe SGe SGe ube Gorges ube Gorges a (Bon002) y ube Gorges ube Gorges n n y n n n Zagros (WC1) Admi Admi Admi Admi Admi Admi Admi Admi n o Zagros (WC1) Admi Admi Admi Admi Admi Admi Admi Admi o Da Da K Anatolia (Bar25) Da Da K Anatolia (AKT16) Anatolia (Bar25) Anatolia (AKT16) W pre-Admi pre-Admi W W pre-Admi pre-Admi N W N N N Figure S51 - Admixture plot for K=2 (left panel) and K=3 (right panel) carried on a) observed non-missing data for 16 ancient genomes included into fastsimcoal2 demographic analyses and best scenario (1,728,857 sites; Bon002 was not included because of too low quality); b) simulated data from the 11 sampled populations represented by 17 individuals; c) simulated data from the 11 sampled populations and the Admixed population at 5 different sampling times (2 individuals each time).

103 Marchi et al. The mixed genetic origin of the first farmers of Europe f-statistics

We compared the samples obtained in this study with previously published, target-enriched datasets available through the reference 1240K dataset v42.4 at https://reich.hms.harvard.edu/downloadable-genotypes-present-day-and-ancient-dna-data- compiled-published-papers. We refer to this dataset as 1240K in the following. To ensure our data is comparable to the pseudo-haploid calls in 1240K, we generated majority calls for all our ancient individuals with ATLAS (task=majorityBase; commit 7cfc9008a6af1938f8353ad5ca3ef5a0b3117d3d) at the 1,233,013 SNP positions present in this reference dataset. We then used these calls for the following f-statistics analyses, always using Mbuti as the outgroup.

Confirming population assignment We first used f-statistics to confirm that the assignment of samples to specific populations is not violated by the presence of shared drift with external samples. For this, we calculated all possible f-statistics of the form of D(Individual 1 from the population tested, Individual 2 from the population tested; Other samples, Outgroup) using ADMIXTOOLS179 and Mbuti as the outgroup.

Southern Germany Lower Austria |3| Central Serbia Lepenski Vir

Northern Greece |1| NW Anatolia |0| Danube Gorges Mesolithic t r as1 AR1 Herx Bar8 WC1 Ess7 P Dil16 Asp6 Nea3 T Klein7 AKT16 Bichon S LEPE48 VLASA7 Stuttga Cars Loschbour Figure S52 - Heatmap of absolute Z-Scores of f-statistics of the form D(Individual 1 from the population tested, Individual 2 from the population tested; Other samples, Outgroup) for each population with two or more individuals and using Mbuti as the outgroup. Red shades indicate significant absolute Z-scores above 3.0, purple shades indicate Z-scores below this threshold. For comparisons with the population Germany that contained more than two samples, the highest Z- score of all possible two-way comparisons is shown.

As shown in Fig. S52, only very few comparisons violated our assignment. First, AKT16 from the NW Anatolia region showed some HG_West ancestry (represented by Loschbour and Bichon) not present in Bar25. As a result, we modelled the two Marmara individuals as independent 104 Marchi et al. The mixed genetic origin of the first farmers of Europe populations in the fastsimcoal2 analysis. Second, more HG_Central ancestry was evidenced for Asp6 than Klein7 from Austria. However, variation in HG ancestry is expected in Early Neolithic populations due to the ongoing process of admixture. Since these samples did not show variation in their affinities to other Neolithic samples, modelling them as a single population seems justified.

Global model-fit using admixture graphs To validate the model inferred by fastsimcoal2, we compared f-statistics predicted under the model against those calculated from the data. For this, we first created an admixture graph matching the model in Fig. 3a-b. We validated this graph using data simulated with fastsimcoal2 under the model shown in Fig. 3a-b and including an African outgroup population as outlined above (see section Predictive Simulations). The graph shown in Fig. S53 is the simplest graph (in terms of admixture edges) for which no major differences between f-statistics predicted under the model and those calculated from the simulated data were detected with qpGraph from ADMIXTOOLS v7300179. Specifically, we fitted the graph 100 times with 1,000 initial conditions each. We identified the 20 best runs using the final score and calculated the median for each branch. In that graph, only three f-statistics had barely significant Z-scores: 1. f4(NGreece, Bichon; SGermany, Bar25); predicted: -0.0075, estimated: -0.0099, Z-score: - 3.010. 2. f4(NGreece, Bichon; Bar25, AKT16); predicted: 0.0138, estimated: 0.0166; Z-score: 3.011. 3. f4(SGermany, MetaEast; WC1, oldMetaEast); predicted: 0.0631; estimated: 0.0608; Z-score: -3.143.

We note that in contrast to the fastsimcoal2 analysis, the admixture graph analysis did not result in accurate estimates of all admixture proportions (Fig. S53), suggesting that the SFS captures more information of the full data than f-statistics. However, some admixture proportions were estimated relatively accurately, and in particular the admixture between the eastern and central hunter-gatherers leading to the admixed population ancestral to early farmers (Fig. S53). We next tested for discrepancies between f-statistics predicted under this model and those calculated using either 1) the same data as used in model fitting with fastsimcoal2 (DS11, see Table S9, Fig. S54) or 2) calls for the 1240K SNPs obtained as described above (Fig. S55). In both cases we refitted the admixture graph 100 times with qpGraph using 1,000 initial conditions to match specificities of the datasets and calculated the mean over the 20 best graphs using the final score. 105 Marchi et al. The mixed genetic origin of the first farmers of Europe

On the DS11 dataset, no f-statistics was violated in the best graph. Among the 100 replicates, 42 had significant outlier f-statistics, and in 41 of those the largest Z-score was calculated for the comparison f4(NGreece, AKT16; LAustria, SGermany) with Z-scores between -3.000 and -3.116.

On the 1240K SNPs, five f-statistics were significant in the best run: 1. f4(Vlasac, Bichon; SGermany, Bar25); predicted: -0.0004, estimated: -0.0066, Z-score: -3.332. 2. f4(CSerbia, Bichon; SGermany, Bar25); predicted: -0.0001; estimated: -0.0081; Z-score: -3.676. 3. f4(Mbuti, Bichon; SGermany, Bar25); predicted: 0.0019; estimated: -0.0058; Z-score: -3.626. 4. f4(SGermany, Bon002; Loschbour, Bichon); predicted: 0.0028; estimated: -0.0041; Z-score: -3.4. 5. f4(SGermany, Bar25; Loschbour, Bichon); predicted: 0.0014; estimated: -0.0053; Z-score: -3.271.

We note that all these include Bichon and most include Bar25, suggesting that the model does not capture their relationship perfectly. However, we stress that these estimates are on an ascertained set of SNPs different from the neutral set used when fitting the fastsimcoal2 model. As with simulated data, most admixture proportions could not be inferred accurately from real data. An important exception was the admixture event resulting in the source population of Aegean and European farmers, which was inferred to have received a roughly 40% contribution from a central hunter-gatherer population, which is well in line with the 38-50% 95% CI previously estimated with fastsimcoal2 (Fig. S34).

106 Marchi et al. The mixed genetic origin of the first farmers of Europe

R

16 16

African MetaBoth

8 97

MetaEast MetaC_W

0 15 3 32

MetaEast1 MetaCentral MetaWest

MetaEast2 57% 43% 17 29

40 MetaAdmixed 6% Bichon_drift

44 0-1 94% Loschbour_drift

MetaAdmixed_drift Bichon_MIX

4 WC1_drift 19 19

91% Aegean 90% MetaCentral1 Bichon 97%

9% 14 2 10% 5-207

WC1_MIX1 Marmara Greece_MIX0 Bon002_MIX1 MetaCentral2 14%

14 81% 97% 19% 3% 3% 86%

MetaEast3 100% Bar25_drift AKT16_MIX1 94% Greece_MIX1 5% Loschbour_MIX DanubeGorges_MIX1

21% 0% 79% 6% 11% 95% 89% 31 16

7% AKT16_MIX2 WC1_MIX2 Bon002_MIX2 Greece_MIX2 Bar25_MIX1 6% Loschbour DanubeGorges

65 48 56 93% 3

5% Akt16 Bar25_MIX2 WC1 Bon002 Greece_MIX3

18 1 94% 2%

Bar25 NGreece Serbia_MIX1

5% 95%

Serbia_MIX2 3%

4

3% Serbia_drift

13 98%

CSerbia Austria_MIX1

95%

Austria_MIX2

1

Austria_drift

15 97%

LAustria Germany_MIX1

97%

Germany_MIX2

7

SGermany

Figure S53 - Admixture graph fitted to data simulated under the fastsimcoal2 model. Shown are the median branch lengths and mixture proportions among the best 20 graphs (lowest final score). Branches and mixture proportions are shown in red if the 90% quantile among the best 20 graphs was larger than given by half and twice the median (factor two) of the branch length or odds ratios, respectively. For outlier branches, the full range of estimates among the best 20 graphs are shown. 107 Marchi et al. The mixed genetic origin of the first farmers of Europe

R

61 61

Mende MetaBoth

0-73 92

MetaEast MetaC_W

0-72 0-57 0-1 118

MetaEast1 MetaCentral MetaWest

MetaEast2 12-40 74% 26% 0-5

MetaAdmixed 39% 8 Bichon_drift

WC1_drift 47 6 61%

MetaAdmixed_drift Bichon_MIX

0-72 91% 0-24 Loschbour_drift 0-19

90% Aegean MetaCentral1 Bichon

0-4 9% 0-1 10% 23-1000 46%

WC1_MIX1 Bon002_MIX1 Marmara Greece_MIX0 MetaCentral2 100

25 86% 14% 92% 8% 0 54%

MetaEast3 100 87 Bar25_drift AKT16_MIX1 Greece_MIX1 7% DanubeGorges_MIX1 Loschbour_MIX

0 13 10% 90% 24% 76% 93% 23 40

WC1_MIX2 Bon002_MIX2 19% AKT16_MIX2 Greece_MIX2 Bar25_MIX1 0% DanubeGorges Loschbour

47 53 81% 48 0-2

1% WC1 Bon002 Bar25_MIX2 Akt16 Greece_MIX3

44 19 100% 2%

Bar25 NGreece Serbia_MIX1

0% 99%

Serbia_MIX2 3%

0-2

9% Serbia_drift

21 98%

CSerbia Austria_MIX1

100%

Austria_MIX2

0

Austria_drift

20 97%

LAustria Germany_MIX1

91%

Germany_MIX2

11

SGermany

Figure S54 - Admixture graph fitted to the DS11 data set. Color and labels as in Fig. S53.

108 Marchi et al. The mixed genetic origin of the first farmers of Europe

R

80 80

Mbuti_DG MetaBoth

0-57 92

MetaEast MetaC_W

0-65 43 32 32-270

MetaEast1 MetaCentral MetaWest

MetaEast2 0-234 60% 40% 125

MetaAdmixed 39% 88 Bichon_drift

WC1_drift 65 0 61%

MetaAdmixed_drift Bichon_MIX

0-65 81% 0 Loschbour_drift 334

87% Aegean MetaCentral1 Bichon

0-102 19% 2 13% 19-1000 49%

WC1_MIX1 Bon002_MIX1 Marmara Greece_MIX0 MetaCentral2 100%

201 84% 16% 90% 10% 0% 51%

MetaEast3 95% 76% Bar25_drift AKT16_MIX1 Greece_MIX1 7% DanubeGorges_MIX1 Loschbour_MIX

5% 24% 10% 90% 24% 76% 93% 129 363

WC1_MIX2 Bon002_MIX2 11% AKT16_MIX2 Greece_MIX2 Bar25_MIX1 2% Meso10x Loschbour

363 459 89% 439 5

0 WC1 Bon002 Bar25_MIX2 AKT16 Greece_MIX3

288 149 98% 1%

Bar25 Greece10x Serbia_MIX1

5% 100

Serbia_MIX2 1%

1

0 Serbia_drift

150 99%

CSerbia Austria_MIX1

95%

Austria_MIX2

0-7

Austria_drift

152 99%

LAustria Germany_MIX1

100

Germany_MIX2

149

SGermany

Figure S55 - Admixture graph fitted to the 1240k data set. Color and labels as in Fig. S53.

109 Marchi et al. The mixed genetic origin of the first farmers of Europe

Early separation of Konya Plain from Northern Greece The fastsimcoal2 analysis revealed an early split between the Konya Plain and the population leading to the Neolithic samples in the Aegean (see Fig. S43). In the same analysis, admixture from MetaEast into Konya Plain was estimated to be substantially lower (2%) than admixture from MetaEast into Northern Greece (9%). In contrast, the estimates for admixture from MetaCentral into Konya Plain was estimated to be substantially larger (8%) than admixture from MetaCentral in Northern Greece (3%). Here we used two sets of f-statistics to support these findings.

Excess HG Central/West ancestry in the Konya Plain To confirm the excess HG ancestry in the Konya Plain, we used f-statistics of the form D(NGreece/NW Anatolia, Konya Plain; HG_Central/West, Outgroup). Here, HG_Central/West is represented by the Vlasac individuals (VLASA7 and VLASA32) from this study and Bichon and Loschbour. The Konya Plain is represented by 1) the four Anatolia_Boncuklu samples available in 1240k144 consisting of the Bon002 sample used in the fastsimcoal2 analysis and three additional low-depth samples, and 2) the additional sample Pınarbaşı, an Epipalaeolithic HG from the Konya Plain dating to 15,592-15,023 cal BP218. As shown in Fig. S56, we observed a significant excess of shared drift between HG_Central/West and Konya Plain compared to NGreece/NW Anatolia. This signal was particularly strong for Pınarbaşı (all comparisons involving > 800,000 SNPs, Supp. Table 5). For NW Anatolia_Boncuklu, the signal was only present when compared to the late neolithic Greek samples (Pal7 and Klei10). A notable exception from the general pattern is AKT16 that shows excess shared drift with HG_Central/West compared to Konya Plain (Fig. S56), in line with elevated HG admixture estimated by fastsimcoal2 (Fig. 3) and also evidenced by the admixture analysis (see Fig. S21). We conducted similar comparisons using Neolithic Iranian and HG East samples from 1240K of the form D(NGreece/NW Anatolia, Konya Plain; Neolithic Iranian/HG East, Outgroup). Here, Neolithic Iranian are represented by WC1, Iran_TepeAbdulHosein_N and Iran_GanjDareh_N219 and HG East is represented by Georgia_Kotias and Georgia_Satsurblia141. None of these comparisons revealed a significant difference between the samples from Northern Greece and NW Anatolia and those from the Konya Plain in their level of shared drift with neither Neolithic Iranian nor HG East (Supp. Table 5), in line with the fastsimcoal2 model that places these populations as sister groups.

110 Marchi et al. The mixed genetic origin of the first farmers of Europe

VLASA7 TepecikCiftlik_N

VLASA32

Bichon

VLASA7 6 Boncuklu_N

VLASA32 3

0

Bichon

VLASA7 VLASA32

Bichon v5 e Bar8 R Nea2 Nea3 Bar25 AKT16 Diros_EN Anatolia_N Greece_LN Kumtepe_N Boncuklu_N eloponnese_N epecikCiftlik_N P T

Figure S56 - Heatmap of Z-Scores of f-statistics of the form D(NGreece/NW Anatolia, Test; HG_Central/West, Outgroup) using Mbuti as the outgroup and where Test is (listed from the top): Tepecik-Çiftlik, NW Anatolia_Boncuklu and Pınarbaşı. Red shades indicate significant positive Z- scores above 3, purple shades are for Z-scores below 3 and above 0, blue shades for Z-scores below 0 and above -3 and green shades indicate significant negative Z-scores below -3.

Scarcity of MetaEast ancestry in the Konya Plain

To confirm the scarcity of MetaEast ancestry in the Konya Plain, we used individuals from from the Chalcolithic period (Israel_C), the PPNB period (Israel_PPNB) and the Natufian culture (Israel_Natufian; all Israel individuals were from 180) as eastern proxies and we performed f- statistics of the form D(NGreece/NW Anatolia, Konya Plain; Israel, Outgroup). Most of the comparisons involving NW Anatolia_Boncuklu showed a strongly significant excess of shared drift between Israel and NGreece/NW Anatolia compared to NW Anatolia_Boncuklu (Fig. S57), in line 111 Marchi et al. The mixed genetic origin of the first farmers of Europe

with the scarcity of MetaEast admixture in the Konya Plain inferred by fastsimcoal2. Interestingly, however, this signal was not replicated when representing the Konya Plain by Pınarbaşı (Fig. S57): in that case, none of the comparisons was significant. Similarly, no significant comparisons were found when using either Neolithic Iranian or HG-East instead of Israel (Supp. Table 5). P eloponnese_N Israel_PPNB

Israel_Natufian

Israel_C

Israel_PPNB Di r os_EN Israel_Natufian

Israel_C

Israel_PPNB R e

Israel_Natufian v5

Israel_C

6 Israel_PPNB

Nea2 3

Israel_Natufian 0

Israel_C Israel_PPNB Nea3 Israel_Natufian

Israel_C

Israel_PPNB Boncuklu_N

Israel_Natufian

Israel_C

Israel_PPNB

Israel_Natufian

Israel_C AR1 NE1 Herx Bar8 Ess7 Dil16 Asp6 Rev5 Nea3 T Klein7 AKT16 S LEPE48 Stuttgart Diros_EN Greece_LN Boncuklu_N epecikCiftlik_N T Peloponnese_N Figure S57 - Heatmap of Z-Scores of f-statistics of the form D(NGreece/NW Anatolia/Early European Farmers, Test; Israel_PPNB/Natufian/C, Outgroup) using Mbuti as the Outgroup and Test as (listed from the top): Peloponnese_N, Diros_EN, Rev5, Nea2, Nea3, NW Boncuklu and Pınarbaşı. Red shades indicate significant positive Z-scores above 3, purple shades are for Z- scores below 3 and above 0, blue shades for Z-scores below 0 and above -3 and green shades indicate significant negative Z-scores below -3.

112 Marchi et al. The mixed genetic origin of the first farmers of Europe

Structure among HG individuals The model inferred by fastsimcoal2 predicts a deep split between Loschbour, Bichon and the Central metapopulation. To validate these findings, we investigated the relationships among HG samples using f-statistics. First, we calculated f-statistics of the form D(Loschbour, Bichon; HG_Central, Outgroup), representing HG_Central by the two Danube Gorges Mesolithic individuals VLASA7 and VLASA32. In line with their deep split inferred by fastsimcoal2, none of these comparisons was significant. We next investigated if the HG ancestry in NW Anatolian individuals can be attributed more to HG_Central than to Bichon, as predicted by the fastsimcoal2 model. In line with these predictions, f-statistics of the form D(Bichon, HG_Central; AKT16/Bar25, Outgroup) resulted in significant, negative values (Z-score from -3.218 to -4.826). Interestingly, the comparisons of the form D(Loschbour, HG_Central; AKT16/Bar25, Outgroup) were not significant and additional comparisons of the form D(Loschbour, Bichon; AKT16/Bar25, Outgroup) resulted in significant positive values. We interpret this as a result of sample age (Bichon is much older than Loschbour) and the admixture from HG_Central that was consequently estimated at a much older time for Bichon than Loschbour, allowing for shared drift in the HG_Central component in both Loschbour and AKT16/Bar25. Noteworthy are further the many significant and positive f-statistics of the form D(Bichon/Loschbour, HG_Central; HG_Adriatic, Outgroup), where HG_Adriatic is represented by Italian and Croatian Mesolithic and Epipalaeolithic HGs (see Supp. Table 5). This may point to an enhanced level of drift in the Danube Gorges Mesolithic samples (representing HG_Central here) and their relative uniqueness when compared to other HG populations in the region.

Differentiation within Greece and Konya Plain

Peloponnese: Based on differences between Neolithic samples from the Peloponnese to those from other regions, it was suggested that the Peloponnese might constitute a wave of neolithization other than the Balkan route127. Specifically, it was shown that Neolithic samples from the Peloponnese are shifted towards CHG-like ancestry and away from WHG-like ancestry, potentially due to contacts with populations like Kumtepe or Tepecik-Çiftlik that show similar patterns. We benefited from the additional samples generated in this study to shed additional light on the relationship between Neolithic samples from the Peloponnese and Northern Greece. We confirm that Greece_Peloponnese (four individuals from Diros Cave and one from

113 Marchi et al. The mixed genetic origin of the first farmers of Europe labelled as Greece_Peloponnese_N, all Middle or Late Neolithic) are indeed more CHG-like than Rev5. However, this pattern is stronger when CHG is substituted with Iran_GanjDareh individuals (interestingly not WC1): D(NGreece, Greece_Peloponnese; Iran_GanjDareh, Mbuti) is significant and positive for all Greek Neolithic samples (Supp. Table 5). However, this signal is not obviously connected to Kumtepe or Tepecik-Çiftlik, as Greece Peloponnese samples also show more influence from the Levant (with Israel samples as proxies) than Marmara, Greece (Late Greece) and central European samples (Klein7), see Fig. S57. In addition, the same analyses conducted with the Peloponnese Early Neolithic sample I5427 (labelled here Diros_EN) showed that higher amounts of Levantine ancestry was already present in the Early Neolithic, at least compared to the Barcın samples (Bar8 and Bar25). The sample Diros_EN also shows more shared drift with Iran_GanjDareh than Nea2 and more influence from Pınarbaşı than the later Peloponnese group (Greece_Peloponnese_N).

Northern Greece and NW Anatolia: In general, samples from Northern Greece and NW Anatolia show a relatively high level of heterogeneity in shared drift with samples from other populations (Fig. S58, Fig. S57). Particularly interesting differences include the excess shared drift with Levantine samples found for the Nea Nikomedeia sample Nea3 (the older) but not Nea2 (the younger) when comparing these to other samples from the region (Fig. S57). Rev5, which is of similar age as Nea3, also shows a bit more of the Levantine ancestry (Fig. S57), suggesting a decrease in that ancestry over time. While there is no significant f-statistics when directly comparing Rev5 to Nea2 or Nea3, it does seem that all comparisons with the 4 HG_Central and HG_West genomes in our dataset show a trend for an excess shared drift with HG samples for Rev5 and Nea3 (Supp. Table 5).

Konya Plain: We investigated the relationships inside Anatolia by comparing our samples from Northern Greece and NW Anatolia to Pınarbaşı (Epipalaeolithic), Boncuklu (Neolithic), Kumtepe (Chalcolithic) and Tepecik-Çiftlik (Neolithic). A first result was that Tepecik-Çiftlik is cladding outside the branch represented by Boncuklu and Pınarbaşı as D(Boncuklu, Pınarbaşı, Tepecik- Çiftlik, Outgroup) is not significant (Z-score 0.76), while the two alternative f-statistics with either Boncuklu or Pınarbaşı as a sister group were highly significant (Z-score < -6.29, Supp. Table 5). Tepecik-Çiftlik also seems to clad outside a branch leading to Boncuklu, Pınarbaşı and all Neolithic Aegean samples as all f-statistics of the form D(Aegean, Boncuklu / Pınarbaşı, Tepecik-Çiftlik,

114 Marchi et al. The mixed genetic origin of the first farmers of Europe

Outgroup) were not significant (Z-scores < |2.051|), but 8 of 24 f-statistics of the form D(Aegean, Tepecik-Çiftlik, Boncuklu / Pınarbaşı, Outgroup) and D(Boncuklu / Pınarbaşı, Tepecik-Çiftlik, Aegean, Outgroup) were significant, in particular where Aegean is represented by the Marmara samples and an Early Neolithic sample from Peloponnese (AKT16, Bar8, Anatolia_N and Diros_EN, Supp. Table 5).

Herx Dil16 VLASA32 Greece_LN Peloponnese_N Diros_EN Rev5 Nea2 Nea3 Bar25 Bar8 AKT16 Anatolia_N Kotias Losc Bichon Greece_LN Peloponnese_N 10 Diros_EN Rev5 5 Nea2 Nea3 0 Bar25 Bar8 AKT16 Anatolia_N Iran_GanjDareh_N Boncuklu_N Greece_LN Peloponnese_N Diros_EN Rev5 Nea2 Nea3 Bar25 Bar8 AKT16 Anatolia_N v5 v5 v5 e e e Bar8 Bar8 Bar8 R R R Nea2 Nea3 Nea2 Nea3 Nea2 Nea3 Bar25 Bar25 Bar25 AKT16 AKT16 AKT16 Diros_EN Diros_EN Diros_EN Anatolia_N Anatolia_N Anatolia_N Greece_LN Greece_LN Greece_LN eloponnese_N eloponnese_N eloponnese_N P P P Figure S58 - Heatmap of Z-scores of f-statistics of the form D(Greece/NW Anatolia 1; Greece/NW Anatolia 2; Selected sample; Outgroup) with Mbuti as an outgroup and selected sample marked in the graph. Red shades indicate significant positive Z-scores above 3, purple shades are for Z-scores below 3 and above 0, blue shades for Z-scores below 0 and above -3 and green shades indicate significant negative Z-scores below -3.

115 Marchi et al. The mixed genetic origin of the first farmers of Europe

References

1. Runnels, C. A Prehistoric Survey of Thessaly: New Light on the Greek Middle Paleolithic. Journal of Field Archaeology 15, 277–290 (1988). 2. Özdoǧan, M. Anatolia from the Last Glacial Maximum to the Holocene Climatic Optimum: cultural formations and the impact of the environmental setting. Paléorient 23, 25–38 (1997). 3. Runnels, C. & Özdoğan, M. The Palaeolithic of the Bosphorus Region NW Turkey. Journal of Field Archaeology 28, 69–92 (2001). 4. Kozłowski, J. K. Gravettian/Epigravettian sequences in the Balkans: environment, technologies, hunting strategies and raw material procurement. British School at Athens Studies 3, 319–329 (1999). 5. Perlès, C. Greece, 30,000-20,000 bp. in Hunters of the Golden Age. The Mid-Upper Palaeolithic of Eurasia 30,000 - 20,000 BP (eds. Roebroeks, W., Mussi, M., Svoboda, J. & Fennema, K.) vol. 31 375–397 (University of Leiden, 2000). 6. Adam, E. Looking out for the Gravettian in Greece. Paléo 145–158 (2007). 7. Tourloukis, V. & Harvati, K. The Palaeolithic record of Greece: a synthesis of the evidence and a research agenda for the future. Quat. Int. 466, 48–65 (2018). 8. Clark, P. U. et al. The Last Glacial Maximum. Science 325, 710–714 (2009). 9. Lambeck, K. Late Pleistocene and Holocene sea-level change in Greece and south-western Turkey: a separation of eustatic, isostatic and tectonic contributions. Geophys. J. Int. 122, 1022–1044 (1995). 10. Benjamin, J. et al. Late Quaternary sea-level changes and early human societies in the central and eastern Mediterranean Basin: An interdisciplinary review. Quat. Int. 449, 29–57 (2017). 11. Papoulia, C. Late Pleistocene to Early Holocene Sea-crossings in the Aegean: direct, indirect and controversial evidence. Géoarchéologie des Îles de Méditerranée. CNRS Editions, Paris 33–46 (2016). 12. Carter, T. et al. Earliest occupation of the Central Aegean (Naxos), Greece: Implications for hominin and Homo sapiens’ behavior and dispersals. Sci Adv 5, eaax0997 (2019). 13. Esin, N. V., Esin, N. I. & Yanko-Hombach, V. The Black Sea basin filling by the Mediterranean salt water during the Holocene. Quat. Int. 409, 33–38 (2016). 14. Kozłowski, J. K. & Kaczanowska, M. Gravettian/Epigravettian sequences in the Balkans and Anatolia. Mediterranean Archaeology and Archaeometry 4, 5–18 (2004). 15. Kozłowski, J. K. The Importance of the Aegean Basin for the Neolithization of South-Eastern Europe. Journal of the Israel Prehistoric Society 35, 409–424 (2005). 16. Carter, T. Obsidian consumption in the Late Pleistocene--Early Holocene Aegean: contextualising new data from Mesolithic Crete. Annual of the British School at Athens 111, 13–34 (2016). 17. Ramsey, C. B. Bayesian Analysis of Radiocarbon Dates. Radiocarbon 51, 337–360 (2009). 18. Reimer, P. J. et al. The IntCal20 Northern Hemisphere Radiocarbon Age Calibration Curve (0–55 cal kBP). Radiocarbon 62, 725–757 (2020). 19. Cook, G. T. et al. A Freshwater Diet-Derived 14C Reservoir Effect at the Stone Age Sites in the Iron Gates Gorge. Radiocarbon 43, 453–460 (2001). 20. Bonsall, C. et al. New AMS 14C dates for human remains from Stone Age sites in the Iron Gates reach of the Danube, southeast Europe. Radiocarbon 57, 33–46 (2015). 21. Borić, D., French, C. & Dimitrijević, V. Vlasac revisited: formation processes, stratigraphy and dating. Documenta Praehistorica 35, 261–287 (2008). 22. Borić, D. Adaptations and transformations of the Danube Gorges foragers (c. 13,000–5500 cal. BC): an overview. in Beginnings – New Research in the Appearance of the Neolithic between Northwest Anatolia and the Carpathian Basin. Papers of the International Workshop 8th - 9th April 2009, Istanbul. Organized by Dan Ciobotaru, Barbara Horejs und Raiko Krauß. (ed. Krauß, R.) 157–203 (Verlag Marie Leidorf GmbH, 2011). 23. Borić, D. & Price, T. D. Strontium isotopes document greater human mobility at the start of the Balkan Neolithic. Proceedings of the National Academy of Sciences of the United States of America 116 Marchi et al. The mixed genetic origin of the first farmers of Europe

110, 3298–3303 (2013). 24. Hofmanová, Z. Palaeogenomic and biostatistical analysis of ancient DNA data from Mesolithic and Neolithic skeletal remains. (2017). 25. Kreutzer, S. Populationsgenetische Analyse prähistorischer Individuen aus Griechenland. (2017). 26. Porčić, M., Blagojević, T., Pendić, J. & Stefanović, S. The Neolithic Demographic Transition in the Central Balkans: population dynamics reconstruction based on new radiocarbon evidence. Philos. Trans. R. Soc. Lond. B Biol. Sci. (2020) doi:10.1098/rstb.2019.0712. 27. Stadler, P. Versuch einer Auswertung der 14C-Proben von Kleinhadersdorf mittels Bayes’scher Statistik. in Das Linearbandkeramische Gräberfeld von Kleinhadersdorf. Mitteilungen der Prähistorischen Kommission 82 (eds. Neugebauer-Maresch, C. & Lenneis, E.) 149–151 (Verlag der Österreichischen Akademie der Wissenschaften, 2015). 28. Tasić, N. et al. Vinča-Belo Brdo, Serbia: The times of a tell [Dataset]. vols 1-75 (Heidelberg Research Data Repository, University of Heidelberg, 2016). 29. Baird, D. The Late Epipaleolithic, Neolithic, and Chalcolithic of the Anatolian Plateau, 13,000-- 4000 BC. in A Companion to the Archaeology of the Ancient (ed. Potts, D. T.) vol. 1 431– 465 (Wiley Online Library, 2012). 30. Kartal, M. Anatolian epi-paleolithic period assemblages: problems, suggestions, evaluations and various approaches. Anadolu / Anatolia 24, 45–62 (2003). 31. Baird, D. et al. Juniper smoke, skulls and wolves’ tails. The Epipalaeolithic of the Anatolian plateau in its South-west Asian context; insights from Pınarbaşı. Levantina 45, 175–209 (2013). 32. Çilingiroğlu, Ç. et al. Between Anatolia and the Aegean: Epipalaeolithic and Mesolithic Foragers of the Karaburun Peninsula. J. Field Archaeol. 45, 479–497 (2020). 33. Atakuman, Ç. et al. Before the Neolithic in the Aegean: The Pleistocene and the Early Holocene record of Bozburun - Southwest Turkey. The Journal of Island and Coastal Archaeology (2020) doi:10.1080/15564894.2020.1803458. 34. Efstratiou, N., Biagi, P. & Starnini, E. The Epipalaeolithic site of Ouriakos on the island of Lemnos and its place in the Late Pleistocene peopling of the east Mediterranean region. Adalya 17, 1–13 (2014). 35. Bocquet-Appel, J.-P. & Bar-Yosef, O. The Neolithic Demographic Transition and its Consequences. (Springer Science & Business Media, 2008). 36. Bar-Yosef, O. Climatic Fluctuations and Early Farming in West and East Asia. Curr. Anthropol. 52, S175–S193 (2011). 37. Shennan, S. The First Farmers of Europe: An Evolutionary Perspective. (Cambridge University Press, 2018). 38. Peters, J., von den Driesch, A., Helmer, D. & Saña Segui, M. Early Animal Husbandry in the Northern Levant. Paléorient 25, 27–48 (1999). 39. Zeder, M. A. Out of the Fertile Crescent: The dispersal of domestic livestock through Europe and Africa. in Human Dispersal and Species Movement: From Prehistory to the Present (eds. Boivin, N., Crassard, R. & Petraglia, M.) 261–303 (Cambridge University Press, 2017). 40. Vigne, J.-D. et al. First wave of cultivators spread to at least 10,600 y ago. Proc. Natl. Acad. Sci. U. S. A. 109, 8445–8449 (2012). 41. Arbuckle, B. S. et al. Data sharing reveals complexity in the westward spread of domestic animals across Neolithic Turkey. PLoS One 9, e99845 (2014). 42. Baird, D. et al. Agricultural origins on the Anatolian plateau. Proc. Natl. Acad. Sci. U. S. A. 115, E3077–E3086 (2018). 43. Weninger, B. et al. Neolithisation of the Aegean and Southeast Europe during the 6600-6000 calBC period of Rapid Climate Change. Documenta Praehistorica 41, (2014). 44. The Central/Western Anatolian Farming Frontier. Proceedings of the Neolithic Workshop held at 10th ICAANE in Vienna. vol. 12 (Austrian Academy of Sciences Press, 2019). 45. Özdoğan, M. Archaeological Evidence on the Westward Expansion of Farming Communities from Eastern Anatolia to the Aegean and the Balkans. Current Anthropology 52, S415–S430 (2011). 46. Karul, N. & Avci, M. B. Neolithic communities in the Eastern Marmara region: Aktopraklik C. 117 Marchi et al. The mixed genetic origin of the first farmers of Europe

Anatolica 37, 1–15 (2011). 47. Gerritsen, F. & Özbal, R. Barcın Höyük, a seventh millennium settlement in the Eastern Marmara region of Turkey. Documenta Praehistorica 46, 58–67 (2019). 48. Çilingiroğlu, Ç. & Çakırlar, C. Towards configuring the neolithisation of Aegean Turkey. Documenta Praehistorica 40, 21–29 (2013). 49. Çakırlar, C. Rethinking Neolithic subsistence at the gateway to Europe with new archaeozoological evidence from Istanbul. in Barely surviving or more than enough? (eds. Groot, M., Lentjes, D. & Zeiler, J.) 59–79 (Sidestone Press, 2013). 50. Horejs, B. et al. The Aegean in the Early 7th Millennium BC: Maritime Networks and Colonization. J World Prehist 28, 289–330 (2015). 51. van Andel, T. H. & Runnels, C. N. The earliest farmers in Europe. Antiquity 69, 481–500 (1995). 52. Perlès, C. The Early Neolithic in Greece: The First Farming Communities in Europe. (Cambridge University Press, 2001). 53. Perlès, C., Quiles, A. & Valladas, H. Early seventh-millennium AMS dates from domestic seeds in the Initial Neolithic at Franchthi Cave (Argolid, Greece). Antiquity 87, 1001–1015 (2013). 54. Douka, K., Efstratiou, N., Hald, M. M., Henriksen, P. S. & Karetsou, A. Dating Knossos and the arrival of the earliest Neolithic in the southern Aegean. Antiquity 91, 304–321 (2017). 55. Weiberg, E. et al. Long-term trends of land use and demography in Greece: A comparative study. Holocene 29, 742–760 (2019). 56. Halstead, P. & Isaakidou, V. Pioneer farming in earlier . in Farmers at the Frontier: A Pan European Perspective on Neolithisation (eds. Gron, K. J., Sørensen, L. & Rowley-Conwy, P.) 77–100 (Oxbow Books, 2020). 57. Krauß, R., Marinova, E., De Brue, H. & Weninger, B. The rapid spread of early farming from the Aegean into the Balkans via the Sub-Mediterranean-Aegean Vegetation Zone. Quat. Int. 496, 24–41 (2018). 58. Ivanova, M., De Cupere, B., Ethier, J. & Marinova, E. Pioneer farming in southeast Europe during the early sixth millennium BC: Climate-related adaptations in the exploitation of plants and animals. PLoS One 13, e0197225 (2018). 59. Davison, K., Dolukhanov, P. & Sarson, G. R. The role of waterways in the spread of the Neolithic. J. Archaeol. Sci. 33, 641–652 (2006). 60. Borić, D. Deathways at Lepenski Vir: Patterns in Mortuary Practice. (Serbian Archaeological Society, 2016). 61. Stefanović, S. People of the Lepenski Vir: bioanthropological analysis of human skeletal remains (in Serbian). (Laboratory for Bioarchaeology, Faculty of Philosophy, 2016). 62. Jovanović, J. et al. Last hunters - first farmers: new insight into subsistence strategies in the Central Balkans through multi-isotopic analysis. Archaeological and Anthropological Sciences 11, 3279– 3298 (2019). 63. de Becdelièvre, C., Jovanović, J., Hofmanová, Z., Goude, G. & Stefanović, S. Direct insight into dietary adaptations and the individual experience of Neolithisation: comparing subsistence, provenance and ancestry of Early Neolithic humans from the Danube Gorges c. 6200--5500 cal BC. in Farmers at the Frontier: A Pan European Perspective on Neolithisation (eds. Gron, K. J., Sørensen, L. & Rowley-Conwy, P.) 45–76 (Oxbow Books, 2020). 64. Colledge, S., Conolly, J., Crema, E. & Shennan, S. Neolithic population crash in northwest Europe associated with agricultural crisis. Quat. Res. 92, 686–707 (2019). 65. Neugebauer-Maresch, C. & Lenneis, E. Das Linearbandkeramische Gräberfeld von Kleinhadersdorf. Mitteilungen der Prähistorischen Kommission 82. (Verlag der Österreichischen Akademie der Wissenschaften, 2015). 66. Boulestin, B. et al. Mass cannibalism in the Linear Pottery Culture at Herxheim (Palatinate, Germany). Antiquity 83, 968–982 (2009). 67. Boulestin, B. & Coupey, A.-S. Cannibalism in the linear pottery culture: The human remains from Herxheim. (Archaeopress, 2015). 68. Orschiedt, J. & Haidle, M. N. Violence against the living, violence against the dead on the human 118 Marchi et al. The mixed genetic origin of the first farmers of Europe

remains from Herxheim, Germany. Evidence of a crisis and mass cannibalism? in Sticks, Stones, and Broken Bones: Neolithic Violence in a European Perspective (eds. Schulting, R. J. & Fibiger, L.) 121–138 (Oxford University Press, 2012). 69. Teschler-Nicola, M. The Early Neolithic site Asparn/Schletz (Lower Austria). in Sticks, Stones, and Broken Bones: Neolithic Violence in a European Perspective (eds. Schulting, R. J. & Fibiger, L.) 101–120 (Oxford University Press, 2012). 70. Windl, H. J. Zur Stratigraphie der bandkeramischen Grabenwerke von Asparn an der Zaya-Schletz. in Krisen – Kulturwandel – Kontinuitäten. Zum Ende der Bandkeramik in Mitteleuropa. Beiträge der internationalen Tagung in Herxheim bei Landau (Pfalz) vom 14.-17.06.2007. (ed. Zeeb-Lanz, A.) 191–196 (Verlag Marie Leidorf, 2009). 71. Ritualised Destruction in the Early Neolithic – the Exceptional Site of Herxheim (Palatinate, Germany) Vol. 2. vol. 8.2 (Generaldirektion Kulturelles Erbe, Direktion Landesarchäologie, 2019). 72. Zeeb-Lanz, A. The Herxheim ritual enclosure – a synthesis of results and interpretative approaches. in Ritualised Destruction in the Early Neolithic – the Exceptional Site of Herxheim (Palatinate, Germany) Vol. 2 (ed. Zeeb-Lanz, A.) vol. 8.2 423–482 (Generaldirektion Kulturelles Erbe, Direktion Landesarchäologie, 2019). 73. Budd, C., Lillie, M., Alpaslan-Roodenberg, S., Karul, N. & Pinhasi, R. Stable isotope analysis of Neolithic and Chalcolithic populations from Aktopraklık, northern Anatolia. Journal of Archaeological Science 40, 860–867 (2013). 74. Budd, C. et al. Diet uniformity at an early farming community in northwest Anatolia (Turkey): carbon and nitrogen isotope studies of bone collagen at Aktopraklık. Archaeol. Anthropol. Sci. 10, 2123–2135 (2018). 75. Alpaslan-Roodenberg, M. S. & Roodenberg, J. In the light of new data: The population of the first farming communities in the eastern Marmara region. Praehistorische Zeitschrift 95, (2020). 76. Alpaslan-Roodenberg, S. M. A Preliminary Study of the Burials from Late Neolithic-Early Chalcolithic Aktopraklık. Anatolica 37, 17–43 (2011). 77. Hofmanová, Z. et al. Early farmers from across Europe directly descended from Neolithic Aegeans. Proc. Natl. Acad. Sci. U. S. A. 113, 6886–6891 (2016). 78. Mathieson, I. et al. Genome-wide patterns of selection in 230 ancient Eurasians. Nature 528, 499– 503 (2015). 79. Lazaridis, I. et al. Genetic origins of the Minoans and Mycenaeans. Nature 548, 214–218 (2017). 80. Gerritsen, F., Özbal, R. & Gerrits, P. A red floor at Neolithic Barcın Höyük: Special or Not? in Metallurgica Anatolica. Festschrift für Ünsal Yalçın anlässlich seines 65. Geburtstag. Ünsal Yalçın 65. Yaşgünü Armağan Kitabı (eds. Yalçın, G. & Stegemeier, O.) 35–43 (2020). 81. Rodden, R. J. et al. Excavations at the Early Neolithic Site at Nea Nikomedeia, Greek Macedonia (1961 season). Proceedings of the Prehistoric Society 28, 267–288 (1962). 82. Rodden, R. J. Recent discoveries from prehistoric Macedonia: an interim report. Balkan Studies 5, 109–124 (1964). 83. Rodden, R. J. An Early Neolithic Village in Greece. Sci. Am. 212, 82–93 (1965). 84. Pyke, G. & Yiouni, P. Nea Nikomedeia I: The Excavation of an Early Neolithic Village in Northern Greece 1961-1964. The Excavation and the Ceramic Assemblage. vol. 25 (The British School at Athens, 1996). 85. Rodden, R. J. & Rodden, J. M. A European Link with Chatal Huyuk: Uncovering a 7th Millennium Settlement in Macedonia. Part I - Site and Pottery. The Illustrated London News 564–567 (1964). 86. Angel, J. L. Early Neolithic people of Nea Nikomedeia. in Die Anfänge des Neolithikums vom Orient bis Nordeuropa. Teil VIIIa Anthropologie (ed. Schwidetzky, I.) vol. 3 103–112 (Böhlau Verlag, 1973). 87. Triantaphyllou, S. A Bioarchaeological Approach to Prehistoric Cemetery Populations from Western and Central Greek Macedonia. vol. 976 (Archaeopress, 2001). 88. Hershkovitz, I. et al. Recognition of sickle cell anemia in skeletal remains of children. Am. J. Phys. Anthropol. 104, 213–226 (1997). 89. Ortner, D. J. Identification of Pathological Conditions in Human Skeletal Remains. (Academic Press, 119 Marchi et al. The mixed genetic origin of the first farmers of Europe

2003). 90. Lagia, A., Eliopoulos, C. & Manolis, S. Thalassemia: macroscopic and radiological study of a case. Int. J. Osteoarchaeol. 17, 269–285 (2007). 91. Stefanović, S. & Borić, D. The newborn infant burials from Lepenski Vir: in pursuit of contextual meanings. in The Iron Gates in Prehistory: new perspectives, BAR Int. Series 1893 (eds. Bonsall, C., Boroneanţ, V. & Radovanović, I.) 131–169 (Archaeopress, 2008). 92. Borić, D. & Dimitrijević, V. When did the ‘Neolithic package’ reach Lepenski Vir? Radiometric and faunal evidence. Documenta Praehistorica 34, 53–72 (2007). 93. Ehrich, R. W. Starčevo revisited. in Ancient Europe and the Mediterranean (ed. Markotić, V.) 59–67 (Aris and Phillips, 1977). 94. Jovanović, J. D. The Diet and Health Status of the Early Neolithic Communities of the Central Balkans (6200-5200 BC). (2017). 95. Whittle, A., Bartosiewicz, L., Borić, D., Pettitt, P. & Richards, M. In the beginning: new radiocarbon dates for the Early Neolithic in Northern Serbia and South-East Hungary. Antaeus 25, 63–117 (2002). 96. Clason, A. T. Padina and Starčevo: game, fish and cattle. Palaeohistoria 22, 141–173 (1980). 97. Chapman, J. Fragmentation in Archaeology. People, Places and Broken Objects in the Prehistory of South Eastern Europe. (Routledge, 2000). 98. Tasić, N., Srejović, D. & Stojanović, B. Vinča: Centre of the Neolithic Culture of the Danubian Region. (Project Rastko, 1990). 99. Borić, D. Absolute dating of metallurgical innovations in the Vinča Culture of the Balkans. in Metals and Societies: Studies in honour of Barbara S. Ottaway (eds. Kienlin, T. L. & Roberts, B. W.) 191– 245 (Verlag Dr. Rudolf Habelt, 2009). 100. Nehlich, O., Borić, D., Stefanović, S. & Richards, M. P. Sulphur isotope evidence for freshwater fish consumption: a case study from the Danube Gorges, SE Europe. Journal of Archaeological Science 37, 1131–1139 (2010). 101. Tasić, N. et al. Vinča-Belo Brdo, Serbia: The times of a tell. Germania 93, 1–75 (2015). 102. Windl, H. J. Makabres Ende einer Kultur? Archäologie in Deutschland 1, 54–57 (1999). 103. Wild, E. M. et al. Neolithic Massacres: Local Skirmishes or General Warfare in Europe? Radiocarbon 46, 377–385 (2004). 104. Tiefenböck, B. & Teschler-Nicola, M. Teil II: Anthropologie. in Das Linearbandkeramische Gräberfeld von Kleinhadersdorf. Mitteilungen der Prähistorischen Kommission 82 (eds. Neugebauer-Maresch, C. & Lenneis, E.) 297–392 (Verlag der Österreichischen Akademie der Wissenschaften, 2015). 105. Mateiciucová, I. Silices. in Das Linearbandkeramische Gräberfeld von Kleinhadersdorf (eds. Neugebauer-Maresch, C. & Lenneis, E.) vol. 82 111–122 (Verlag der Österreichischen Akademie der Wissenschaften, 2015). 106. Bickle, P. et al. The Isotope Results from Kleinhadersdorf. in Das Linearbandkeramische Gräberfeld von Kleinhadersdorf. Mitteilungen der Prähistorischen Kommission 82 (eds. Neugebauer-Maresch, C. & Lenneis, E.) 173–177 (Verlag der Österreichischen Akademie der Wissenschaften, 2015). 107. Brink-Kloke, H. Das linienbandkeramische Gräberfeld von Essenbach-Ammerbreite, Ldkr. Landshut, Niederbayern. Germania 68, 427–481 (1990). 108. Nieszery, N. Linearbandkeramische Gräberfelder in Bayern. (Verlag Marie Leidorf, 1995). 109. Dietrich, H. & Kociumaka, C. Jungsteinzeitliche Befunde aus Steinheim: Stadt und Landkreis Dillingen ad Donau, Schwaben. Das Archäologische Jahr in Bayern 2000 32–35 (2001). 110. Pechtl, J. Zeit und Umwelt. Ein Forschungsprojekt zur ersten bäuerlichen Kultur in Bayerisch- Schwaben. Denkmalpflege Informationen 161, 14–17 (2015). 111. Riedhammer, K. The radiocarbon dates from Herxheim and their archaeological interpretation. in Ritualised Destruction in the Early Neolithic – the Exceptional Site of Herxheim (Palatinate, Germany), Vol. 2 (ed. Zeeb-Lanz, A.) vol. 8.2 285–304 (Generaldirektion Kulturelles Erbe, Direktion Landesarchäologie, 2019). 120 Marchi et al. The mixed genetic origin of the first farmers of Europe

112. Haack, F. The early Neolithic ditched enclosure of Herxheim - architecture, fill formation processes and service life. in Ritualised Destruction in the Early Neolithic - The Exceptional Site of Herxheim (Palatinate, Germany) (ed. Zeeb-Lanz, A.) 19–118 (Generaldirektion Kulturelles Erbe Rheinland- Pfalz, 2016). 113. Ritualised Destruction in the Early Neolithic – the Exceptional Site of Herxheim (Palatinate, Germany) Vol. 1. vol. 8.1 (Generaldirektion Kulturelles Erbe, Direktion Landesarchäologie, 2016). 114. Turck, R. Where did the dead from Herxheim originate? Isotope analyses of human individuals from the find concentrations in the ditches. in Ritualised Destruction in the Early Neolithic – the Exceptional Site of Herxheim (Palatinate, Germany) Vol. 2 (ed. Zeeb-Lanz, A.) vol. 8.2 313–421 (Generaldirektion Kulturelles Erbe, Direktion Landesarchäologie, 2019). 115. Hoefs, J. Stable Isotope Geochemistry. (Springer-Verlag Berlin Heidelberg, 2009). 116. Cristiani, E. et al. Dental calculus and isotopes provide direct evidence of fish and plant consumption in Mesolithic Mediterranean. Scientific Reports 8, 8147 (2018). 117. Richards, M. P. & Hedges, R. E. M. Stable Isotope Evidence for Similarities in the Types of Marine Foods Used by Late Mesolithic Humans at Sites Along the Atlantic Coast of Europe. Journal of Archaeological Science vol. 26 717–722 (1999). 118. Schoeninger, M. J. & DeNiro, M. J. Carbon isotope ratios of apatite from fossil bone cannot be used to reconstruct diets of animals. Nature 297, 577–578 (1982). 119. Ambrose, S. H. Effects of diet, climate and physiology on nitrogen isotope abundances in terrestrial foodwebs. J. Archaeol. Sci. 18, 293–317 (1991). 120. Fogel, M. L., Tuross, N. & Owsley, D. W. Nitrogen isotope tracers of human lactation in modern and archaeological populations. Carnegie Institution of Washington Yearbook 88, 111–117 (1989). 121. Fuller, B. T., Fuller, J. L., Harris, D. A. & Hedges, R. E. M. Detection of breastfeeding and weaning in modern human infants with carbon and nitrogen stable isotope ratios. Am. J. Phys. Anthropol. 129, 279–293 (2006). 122. Bickle, P. Stable isotopes and dynamic diets: The Mesolithic-Neolithic dietary transition in terrestrial central Europe. Journal of Archaeological Science: Reports 22, 444–451 (2018). 123. Bonsall, C. et al. Mesolithic and Early Neolithic in the Iron Gates: A Paiaeodietary Perspective. Journal of European Archaeology 5, 50–92 (1997). 124. Bonsall, C., Boroneanţ, V. & Radovanović, I. The Iron Gates in prehistory: new perspectives. vol. 1893 (Archaeopress, 2008). 125. Bonsall, C., Boroneanț, A., Simalcsik, A. & Higham, T. Radiocarbon dating of Mesolithic burials from Ostrovul Corbului, southwest . in Southeast Europe and Anatolia in prehistory. Essays in honor of Vassil Nikolov on his 65th anniversary (eds. Bacvarov, K. & Gleser, R.) vol. 293 41–50 (Verlag Dr. Rudolf Habelt, 2016). 126. Borić, D. Lepenski Vir chronology and stratigraphy revisited. Starinar 9–60 (2019). 127. Mathieson, I. et al. The genomic history of southeastern Europe. Nature 555, 197–203 (2018). 128. Papathanasiou, A. Stable isotope analysis in Neolithic Greece and possible implications on human health. Int. J. Osteoarchaeol. 13, 314–324 (2003). 129. Papathanasiou, A. Health, diet and social implications in Neolithic Greece from the study of human osteological material. in Human Bioarchaeology of the Transition to Agriculture (eds. Pinhasi, R. & Stock, J.) 87–106 (John Wiley & Sons, Ltd Chichester, 2011). 130. Vaiglova, P. et al. An integrated stable isotope study of plants and animals from Kouphovouno, southern Greece: a new look at Neolithic farming. J. Archaeol. Sci. 42, 201–215 (2014). 131. Papathanasiou, A. Stable Isotope Analyses in Neolithic and Bronze Age Greece: An Overview. Hesperia Supplements 49, 25–55 (2015). 132. Bollongino, R. et al. 2000 years of parallel societies in Stone Age Central Europe. Science 342, 479– 481 (2013). 133. Bramanti, B. et al. Genetic discontinuity between local hunter-gatherers and central Europe’s first farmers. Science 326, 137–140 (2009). 134. Scheu, A. et al. The genetic prehistory of domesticated cattle from their origin to the spread across Europe. BMC Genet. 16, 54 (2015). 121 Marchi et al. The mixed genetic origin of the first farmers of Europe

135. Yang, D. Y., Eng, B., Waye, J. S., Dudar, J. C. & Saunders, S. R. Improved DNA extraction from ancient bones using silica-based spin columns. American Journal of Physical Anthropology: The Official Publication of the American Association of Physical Anthropologists 105, 539–543 (1998). 136. MacHugh, D. E., Edwards, C. J., Bailey, J. F., Bancroft, D. R. & Bradley, D. G. The extraction and analysis of ancient DNA from bone and teeth: a survey of current methodologies. Anc. Biomol. 3, 81–102 (2000). 137. Gamba, C. et al. Genome flux and stasis in a five millennium transect of European prehistory. Nat. Commun. 5, 5257 (2014). 138. Kircher, M., Sawyer, S. & Meyer, M. Double indexing overcomes inaccuracies in multiplex sequencing on the Illumina platform. Nucleic Acids Res. 40, e3 (2012). 139. Verdugo, M. P. et al. Ancient cattle genomics, origins, and rapid turnover in the Fertile Crescent. Science 365, 173–176 (2019). 140. Mallick, S. et al. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538, 201–206 (2016). 141. Jones, E. R. et al. Upper Palaeolithic genomes reveal deep roots of modern Eurasians. Nat. Commun. 6, 8912 (2015). 142. Lazaridis, I. et al. Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature 513, 409–413 (2014). 143. Günther, T. et al. Population genomics of Mesolithic Scandinavia: Investigating early postglacial migration routes and high-latitude adaptation. PLoS Biol. 16, e2003703 (2018). 144. Kılınç, G. M. et al. The Demographic Development of the First Farmers in Anatolia. Curr. Biol. 26, 2659–2666 (2016). 145. Broushaki, F. et al. Early Neolithic genomes from the eastern Fertile Crescent. Science 353, 499–503 (2016). 146. Brace, S. et al. Ancient genomes indicate population replacement in Early Neolithic Britain. Nature Ecology & Evolution 3, 765–771 (2019). 147. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv [q- bio.GN] (2013). 148. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009). 149. DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011). 150. Link, V. et al. ATLAS: Analysis Tools for Low-depth and Ancient Samples. doi:10.1101/105346. 151. Koster, J. & Rahmann, S. Snakemake--a scalable bioinformatics workflow engine. Bioinformatics vol. 28 2520–2522 (2012). 152. Fu, Q. et al. A revised timescale for human evolution based on ancient mitochondrial genomes. Curr. Biol. 23, 553–559 (2013). 153. Katoh, K., Misawa, K., Kuma, K.-I. & Miyata, T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059–3066 (2002). 154. Rasmussen, M. et al. An Aboriginal Australian genome reveals separate human dispersals into Asia. Science 334, 94–98 (2011). 155. Consortium, T. 1000 G. P. & The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature vol. 526 68–74 (2015). 156. Kousathanas, A. et al. Inferring Heterozygosity from Ancient and Low Coverage Genomes. Genetics 205, 317–332 (2017). 157. Dimitrieva, S. & Bucher, P. UCNEbase--a database of ultraconserved non-coding elements and genomic regulatory blocks. Nucleic Acids Res. 41, D101–9 (2013). 158. Skoglund, P., Storå, J., Götherström, A. & Jakobsson, M. Accurate sex identification of ancient human remains using DNA shotgun sequencing. Journal of Archaeological Science 40, 4477–4482 (2013). 159. Günther, T. & Nettelblad, C. The presence and impact of reference bias on population genomic studies of prehistoric human populations. PLoS Genet. 15, e1008302 (2019). 122 Marchi et al. The mixed genetic origin of the first farmers of Europe

160. Delaneau, O., Zagury, J.-F., Robinson, M. R., Marchini, J. L. & Dermitzakis, E. T. Accurate, scalable and integrative haplotype estimation. Nat. Commun. 10, 5436 (2019). 161. McCarthy, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet. 48, 1279–1283 (2016). 162. Nei, M. & Li., W. H. Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc. Natl. Acad. Sci. U. S. A. 76, 5269–5273 (1979). 163. Pouyet, F., Aeschbacher, S., Thiéry, A. & Excoffier, L. Background selection and biased gene conversion affect more than 95% of the human genome and bias demographic inferences. Elife 7, (2018). 164. Skoglund, P. et al. Origins and genetic legacy of Neolithic farmers and hunter-gatherers in Europe. Science 336, 466–469 (2012). 165. Browning, B. L. & Browning, S. R. Detecting identity by descent and estimating genotype error rates in sequence data. Am. J. Hum. Genet. 93, 840–851 (2013). 166. Sikora, M. et al. Ancient genomes show social and reproductive behavior of early foragers. Science 358, 659–662 (2017). 167. Racimo, F., Sikora, M., Linden, M. V., Schroeder, H. & Lalueza-Fox, C. Beyond broad strokes: sociocultural insights from the study of ancient genomes. Nature Reviews Genetics vol. 21 355–366 (2020). 168. Frichot, E. & François, O. LEA: An R package for landscape and ecological association studies. Methods in Ecology and Evolution vol. 6 925–929 (2015). 169. Navarro-Gomez, D. et al. Phy-Mer: a novel alignment-free and reference-independent mitochondrial haplogroup classifier. Bioinformatics 31, 1310–1312 (2015). 170. Ralf, A., González, D. M., Zhong, K. & Kayser, M. Yleaf: Software for Human Y-Chromosomal Haplogroup Inference from Next-Generation Sequencing Data. Molecular Biology and Evolution vol. 35 1291–1294 (2018). 171. González-Fortes, G. et al. Paleogenomic Evidence for Multi-generational Mixing between Neolithic Farmers and Mesolithic Hunter-Gatherers in the Lower Danube Basin. Curr. Biol. 27, 1801– 1810.e10 (2017). 172. Lipson, M. et al. Parallel palaeogenomic transects reveal complex genetic history of early European farmers. Nature 551, 368–372 (2017). 173. Mittnik, A. et al. Kinship-based social inequality in Bronze Age Europe. Science 366, 731–734 (2019). 174. Olalde, I. et al. The Beaker phenomenon and the genomic transformation of northwest Europe. Nature 555, 190–196 (2018). 175. Olalde, I. et al. The genomic history of the Iberian Peninsula over the past 8000 years. Science 363, 1230–1234 (2019). 176. Brotherton, P. et al. Neolithic mitochondrial haplogroup H genomes and the genetic origins of Europeans. Nat. Commun. 4, 1764 (2013). 177. Prüfer, K. et al. A high-coverage Neandertal genome from in . Science 358, 655–658 (2017). 178. Petr, M., Pääbo, S., Kelso, J. & Vernot, B. Limits of long-term selection against Neandertal introgression. Proc. Natl. Acad. Sci. U. S. A. 116, 1639–1644 (2019). 179. Patterson, N. et al. Ancient admixture in human history. Genetics 192, 1065–1093 (2012). 180. Lazaridis, I. et al. Genomic insights into the origin of farming in the ancient Near East. Nature 536, 419–424 (2016). 181. Walsh, S. et al. The HIrisPlex system for simultaneous prediction of hair and eye colour from DNA. Forensic Sci. Int. Genet. 7, 98–115 (2013). 182. Chaitanya, L. et al. The HIrisPlex-S system for eye, hair and skin colour prediction from DNA: Introduction and forensic developmental validation. Forensic Sci. Int. Genet. 35, 123–135 (2018). 183. Harding, R. M. et al. Evidence for variable selective pressures at MC1R. Am. J. Hum. Genet. 66, 1351–1361 (2000). 184. Cavalli-Sforza, L. L., Moroni, A. & Zei, G. Consanguinity, Inbreeding, and Genetic Drift in Italy. 123 Marchi et al. The mixed genetic origin of the first farmers of Europe

(Princeton University Press, 2004). 185. Latreille, J. et al. MC1R gene polymorphism affects skin color and phenotypic features related to sun sensitivity in a population of French adult women. Photochem. Photobiol. 85, 1451–1458 (2009). 186. Yamaguchi, K. et al. Association of melanocortin 1 receptor gene (MC1R) polymorphisms with skin reflectance and freckles in Japanese. J. Hum. Genet. 57, 700–708 (2012). 187. Park, J.-H. et al. Effects of an Asian-specific nonsynonymous EDAR variant on multiple dental traits. J. Hum. Genet. 57, 508–514 (2012). 188. Shang, D., Zhang, X., Sun, M., Wei, Y. & Wen, Y. Strong association of the SNP rs17822931 with wet earwax and bromhidrosis in a Chinese family. J. Genet. 92, 289–291 (2013). 189. Fujimoto, A. et al. A scan for genetic determinants of human hair morphology: EDAR is associated with Asian hair thickness. Hum. Mol. Genet. 17, 835–843 (2008). 190. Xue, Y. et al. Population differentiation as an indicator of recent positive selection in humans: an empirical evaluation. Genetics 183, 1065–1077 (2009). 191. Ohashi, J., Naka, I. & Tsuchiya, N. The Impact of Natural Selection on an ABCC11 SNP Determining Earwax Type. Mol. Biol. Evol. 28, 849–857 (2010). 192. Fumagalli, M. et al. Greenlandic Inuit show genetic signatures of diet and climate adaptation. Science 349, 1343–1347 (2015). 193. Buckley, M. T. et al. Selection in Europeans on Fatty Acid Desaturases Associated with Dietary Changes. Mol. Biol. Evol. 34, 1307–1318 (2017). 194. Ameur, A. et al. Genetic adaptation of fatty-acid metabolism: a human-specific haplotype increasing the biosynthesis of long-chain omega-3 and omega-6 fatty acids. Am. J. Hum. Genet. 90, 809–820 (2012). 195. McEvoy, B. P. & Visscher, P. M. Genetics of human height. Econ. Hum. Biol. 7, 294–306 (2009). 196. Marouli, E. et al. Rare and low-frequency coding variants alter human adult height. Nature 542, 186–190 (2017). 197. Chan, Y. et al. Genome-wide Analysis of Body Proportion Classifies Height-Associated Variants by Mechanism of Action and Implicates Genes Important for Skeletal Development. Am. J. Hum. Genet. 96, 695–708 (2015). 198. Veeramah, K. R. et al. Population genomic analysis of elongated skulls reveals extensive female- biased immigration in Early Medieval Bavaria. Proc. Natl. Acad. Sci. U. S. A. 115, 3494–3499 (2018). 199. Zoledziewska, M. et al. Height-reducing variants and selection for short stature in Sardinia. Nature Genetics vol. 47 1352–1356 (2015). 200. Rosenstock, E. et al. Human stature in the Near East and Europe ca. 10,000–1000 BC: its spatiotemporal development in a Bayesian errors-in-variables model. Archaeol. Anthropol. Sci. 11, 5657–5690 (2019). 201. Cox, S. L., Ruff, C. B., Maier, R. M. & Mathieson, I. Genetic contributions to variation in human stature in prehistoric Europe. Proc. Natl. Acad. Sci. U. S. A. 116, 21484–21492 (2019). 202. Schiffels, S. & Wang, K. MSMC and MSMC2: The Multiple Sequentially Markovian Coalescent. Methods Mol. Biol. 2090, 147–166 (2020). 203. Scally, A. & Durbin, R. Revising the human mutation rate: implications for understanding human evolution. Nat. Rev. Genet. 13, 745–753 (2012). 204. Fenner, J. N. Cross-cultural estimation of the human generation interval for use in genetics-based population divergence studies. Am. J. Phys. Anthropol. 128, 415–423 (2005). 205. Malaspinas, A.-S. et al. A genomic history of Aboriginal . Nature 538, 207–214 (2016). 206. Beichman, A. C., Phung, T. N. & Lohmueller, K. E. Comparison of single genome and allele frequency data reveals discordant demographic histories. G3: Genes, Genomes, Genetics 7, 3605– 3620 (2017). 207. Excoffier, L., Dupanloup, I., Huerta-Sanchez, E., Sousa, V. C. & Foll, M. Robust demographic inference from genomic and SNP data. PLoS Genet. 9, e1003905 (2013). 208. Wahlund, S. Zusammensetzung von Populationen und Korrelationerscheinungen vom Standpunkt 124 Marchi et al. The mixed genetic origin of the first farmers of Europe

der Vererbungslehre aus betrachtet. Hereditas 11, 65–106 (1928). 209. de Manuel, M. et al. Chimpanzee genomic diversity reveals ancient admixture with bonobos. Science 354, 477–481 (2016). 210. Marchi, N. & Excoffier, L. Gene flow as a simple cause for an excess of high‐frequency‐derived alleles. Evol. Appl. 146, 295 (2020). 211. Slatkin, M. Seeing ghosts: the effect of unsampled populations on migration rates estimated for sampled populations. Mol. Ecol. 14, 67–73 (2005). 212. Beerli, P. Effect of unsampled populations on the estimation of population sizes and migration rates between sampled populations. Mol. Ecol. 13, 827–836 (2004). 213. Excoffier, L. Patterns of DNA sequence diversity and genetic structure after a range expansion: lessons from the infinite-island model. Mol. Ecol. 13, 853–864 (2004). 214. Spence, J. P. & Song, Y. S. Inference and analysis of population-specific fine-scale recombination maps across 26 diverse human populations. Sci Adv 5, eaaw9206 (2019). 215. Akaike, H. A new look at the statistical model identification. IEEE Transactions on Automatic Control vol. 19 716–723 (1974). 216. Matthews, R. & Nashli, H. F. The Neolithisation of Iran. The Formation of New Societies. vol. 3 (Oxbow Books, 2013). 217. Zeder, M. A. Domestication and early agriculture in the Mediterranean Basin: Origins, diffusion, and impact. Proc. Natl. Acad. Sci. U. S. A. 105, 11597–11604 (2008). 218. Feldman, M. et al. Late Pleistocene human genome suggests a local origin for the first farmers of central Anatolia. Nat. Commun. 10, 1218 (2019). 219. Gallego-Llorente, M. et al. The genetics of an early Neolithic pastoralist from the Zagros, Iran. Sci. Rep. 6, 31326 (2016).

125 Marchi et al. The mixed genetic origin of the first farmers of Europe