ID    HGDP ID   Origins (parental F, M; paternal GF, GM; maternal GF, GM; Call)     Seq?
171   1062      Tortoli, Tortoli, Tortoli, Tortoli, Tortoli, Tortoli                Yes
832   1073      Lanusei, Lanusei, Lanusei, Ilbono, Lanusei                          Yes
833   1074      Loceri, Loceri, Loceri, Loceri, Loceri, Loceri                      Yes
2306  1076      Lanusei, Lanusei, -, Lanusei, Lanusei, Lanusei, Lanusei             No
2483  1072      Lanusei, Lanusei, Lanusei, Lanusei, Lanusei, Lanusei, Lanusei       No
4676  1064      Ilbono, Ilbono, Ilbono, Ilbono, Ilbono, Loceri, Ilbono              Yes
4841  1066      Lanusei, Gairo, Lanusei, Lanusei, Gairo, Gairo, NA                  Yes
4842  1065      Ulassai, Perdasdefogu, Ulassai, Ulassai, Ulassai, Ulassai           Yes

Table S1: Parental and grandparental origin of samples duplicated between our whole-genome sequencing data and the HGDP Sardinians. Of the 27 HGDP Sardinians in the Human Origins Array dataset, 8 were also included in our sample. For seven of these, the grandparental information implies that the samples derive from the villages of Tortoli, Lanusei, Loceri, Ilbono, and Ulassai in the Ogliastra province. These samples were likely shared with the HGDP by early researchers collecting for the Lanusei/SardiNIA project.

Cluster A (N = 12): HGDP01062, HGDP01063, HGDP01064, HGDP01065, HGDP01066, HGDP01067, HGDP01071, HGDP01072, HGDP01073, HGDP01074, HGDP01075, HGDP01076
Cluster B (N = 15): HGDP00666, HGDP00667, HGDP00668, HGDP00669, HGDP00670, HGDP00671, HGDP00672, HGDP00673, HGDP00674, HGDP01068, HGDP01069, HGDP01070, HGDP01077, HGDP01078, HGDP01079

Table S2: HGDP Sardinians divided into two clusters based on a joint PCA with the sequenced Sardinians (Figure S2). We named the two sub-clusters Cluster A (SarHGDPa) and Cluster B (SarHGDPb). Cluster A HGDP Sardinians more closely resemble those from the Gennargentu region in our dataset.

Population  Present size (N)          Bottleneck size (N)     Growth rate (%)           Divergence time (gen)
Sardinia    35,492 (19,162-84,406)    6,491 (5,460-19,468)    0.00487 (0.00055-0.020)   349 (89-403)
Tuscan      124,393 (59,416-161,973)  71,957 (3,952-97,085)   0.00271 (0.0017-0.018)    202 (55-374)

Table S3: Demographic parameters inferred by fastsimcoal2. Confidence intervals (in parentheses) are the 5th and 95th percentiles of 100 bootstrap runs. In 78% of the bootstrap replicates, the divergence time for Sardinia is earlier (higher) than that for TSI.
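The intervals and the 78% figure summarize bootstrap output and can be recomputed with a few lines of code. The sketch below is a minimal illustration under stated assumptions: it takes the divergence-time estimates from the bootstrap replicates as two paired lists (one Sardinian and one Tuscan value per replicate, which assumes both are estimated in each run), uses linear-interpolation percentiles (other percentile conventions give slightly different bounds), and counts replicates with a larger Sardinian value. The function names are hypothetical, and the fastsimcoal2 fitting itself is not shown.

```python
import math


def percentile(values, q):
    """Percentile with linear interpolation between order statistics (q in 0..100)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("empty input")
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def bootstrap_summary(sardinia_tdiv, tuscan_tdiv):
    """Return the 5th-95th percentile interval for each population and the
    fraction of paired replicates with an older (larger) Sardinian divergence."""
    if len(sardinia_tdiv) != len(tuscan_tdiv) or not sardinia_tdiv:
        raise ValueError("need the same, non-zero number of replicates for both")
    ci_sar = (percentile(sardinia_tdiv, 5), percentile(sardinia_tdiv, 95))
    ci_tus = (percentile(tuscan_tdiv, 5), percentile(tuscan_tdiv, 95))
    older = sum(s > t for s, t in zip(sardinia_tdiv, tuscan_tdiv))
    return ci_sar, ci_tus, older / len(sardinia_tdiv)
```

Applied to the 100 paired replicates, the third returned value would correspond to the fraction quoted in the caption, provided the replicates are paired as assumed.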

Test pop.    Reference 1    Reference 2                 Admixture date (gen)  Fitted amplitude        P-value    α1 (s.e.)         α2 (s.e.)
             Esan           Spanish (Castilla y Leon)   56.88 +/- 5.60        2.46e-05 +/- 2.22e-06   6.45E-20   0.0096 (0.0057)   0.011 (0.0073)
CAGLIARI     Luhya          Spanish (Castilla y Leon)   67.90 +/- 10.84       2.46e-05 +/- 4.11e-06   5.10E-05   0.0098 (0.0058)   0.012 (0.0075)
CAGLIARI     Luhya          Russian                     68.41 +/- 8.06        2.45e-05 +/- 3.21e-06   5.85E-10   0.027 (0.0037)    0.062 (0.0048)
CAGLIARI     Mende          Spanish (Castilla y Leon)   58.41 +/- 7.51        2.39e-05 +/- 2.37e-06   1.65E-10   0.0093 (0.0055)   0.011 (0.0070)
CAGLIARI     Mende          Orcadian                    55.78 +/- 7.33        2.33e-05 +/- 2.69e-06   6.15E-10   0.031 (0.0037)    0.031 (0.0048)
CAMPIDANO    Luhya          Tuscan                      61.38 +/- 12.22       2.50e-05 +/- 5.08e-06   0.021      0.0091 (0.0045)   0.018 (0.0059)
CAMPIDANO    Yoruba         Tuscan                      50.39 +/- 10.00       2.25e-05 +/- 4.17e-06   0.0108     0.0089 (0.0044)   0.018 (0.0058)
CARBONIA *   Mende          Saudi                       68.78 +/- 13.46       2.37e-05 +/- 4.88e-06   0.027      0.016 (0.14)      0.022 (0.10)
ORISTANO     Hadza          Hungarian                   108.60 +/- 11.74      4.02e-05 +/- 7.51e-06   0.0021     0.022 (0.0038)    0.029 (0.0049)
ORISTANO     Ju_hoan_North  Hungarian                   92.51 +/- 15.34       3.31e-05 +/- 6.90e-06   0.036      0.014 (0.0025)    0.017 (0.0029)
ORISTANO     BantuSA        Estonian                    88.13 +/- 16.98       3.18e-05 +/- 6.15e-06   0.0054     0.036 (0.0042)    0.039 (0.0053)
ORISTANO *   BantuSA        Syrian                      106.70 +/- 20.40      2.97e-05 +/- 6.19e-06   0.0375     -0.044 (0.046)    -0.083 (0.038)
ORISTANO *   BantuSA        Jordanian                   106.83 +/- 21.18      2.94e-05 +/- 5.72e-06   0.0105     -0.064 (0.043)    -0.070 (0.034)
SASSARI      BantuSA        Lithuanian                  68.64 +/- 9.62        2.46e-05 +/- 3.39e-06   2.25E-08   0.036 (0.0041)    0.030 (0.0054)
SASSARI      Mbuti          Lithuanian                  68.77 +/- 13.15       2.41e-05 +/- 3.29e-06   0.0039     0.030 (0.0034)    0.025 (0.0044)
SASSARI      Luhya          Lithuanian                  73.87 +/- 12.69       2.23e-05 +/- 3.96e-06   0.00042    0.039 (0.0045)    0.034 (0.0060)
SASSARI      Biaka          Lithuanian                  60.98 +/- 12.81       2.18e-05 +/- 3.76e-06   0.045      0.031 (0.0036)    0.026 (0.0046)
SASSARI      BantuSA        Finnish                     55.80 +/- 11.69       1.96e-05 +/- 3.61e-06   0.042      0.037 (0.0048)    0.044 (0.0057)

Table S4: Admixture proportions estimated by the f4-ratio test. α1 and α2 are the estimated sub-Saharan admixture proportions computed with different outgroups: for α1 the outgroups were Finnish and Chimp, and for α2 they were Han and Chimp. * In the rows marked with an asterisk (Carbonia and Oristano), the non-African source population is Middle Eastern, so we used Esan and Gambian as the outgroups instead of Finnish and Han. The estimated admixture proportion for Carbonia is consistent with 0.
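For reference, the generic f4-ratio estimator (Patterson et al. 2012) models an admixed population X as a mixture of a B-related and a C-related source and estimates the B-related proportion as α = f4(A, O; X, C) / f4(A, O; B, C), where A is a relative of B and O is an outgroup. The sketch below computes this from per-SNP allele frequencies only; it omits standard errors (normally obtained by block jackknife), and how the references and outgroups in Table S4 map onto A, O, B, and C is not spelled out in the caption, so that mapping is left as an assumption. All names are hypothetical.

```python
def f4(pa, pb, pc, pd):
    """f4(A, B; C, D): mean over SNPs of (pA - pB)(pC - pD), from allele frequencies."""
    if not (len(pa) == len(pb) == len(pc) == len(pd)) or not pa:
        raise ValueError("frequency lists must be non-empty and of equal length")
    return sum((a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd)) / len(pa)


def f4_ratio(pa, po, px, pb, pc):
    """alpha = f4(A, O; X, C) / f4(A, O; B, C).

    Estimated fraction of X's ancestry from the B-related source, where A is a
    relative of B, O is an outgroup, and C stands in for the other source.
    """
    den = f4(pa, po, pb, pc)
    if den == 0:
        raise ZeroDivisionError("f4(A, O; B, C) is zero; the ratio is undefined")
    return f4(pa, po, px, pc) / den
```

If X's frequencies are exactly α·pB + (1 − α)·pC, the numerator equals α times the denominator, so the ratio recovers α; real data add drift and sampling noise on top of this.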

f3(San; Stuttgart, Sardinian)
Population              f3 statistic   Standard error
SarHGDPa                0.213699       0.003046
                        0.213258       0.003044
SarHGDPb                0.213233       0.003034
GAIRO                   0.213016       0.003058
VILLAGRANDE             0.212972       0.003001
LOCERI                  0.212967       0.002998
BARISARDO               0.212886       0.003045
LANUSEI                 0.212836       0.003016
ILBONO                  0.212805       0.003009
                        0.212759       0.002978
TORTOLI                 0.212545       0.002991
ORISTANO                0.212294       0.00298
CAMPIDANO               0.212063       0.00297
CAGLIARI                0.211977       0.002964
SASSARI                 0.211816       0.002984
CARBONIA                0.211571       0.002993
OLBIATEMPIO             0.211258       0.002996

f3(San; Loschbour, Sardinian)
Population              f3 statistic   Standard error
GAIRO                   0.209925       0.002977
LOCERI                  0.209595       0.002983
LANUSEI                 0.209496       0.002949
ARZANA                  0.209433       0.002973
NUORO                   0.209379       0.002937
BARISARDO               0.209214       0.002992
VILLAGRANDE_STRISAILI   0.209143       0.002979
SarHGDPb                0.208838       0.002977
TORTOLI                 0.208826       0.002936
ILBONO                  0.208738       0.00296
SarHGDPa                0.208532       0.002991
OLBIATEMPIO             0.208516       0.002933
ORISTANO                0.208376       0.002944
SASSARI                 0.208286       0.00294
CAGLIARI                0.208146       0.002923
CARBONIA                0.20774        0.002939
CAMPIDANO               0.207687       0.002919

Table S5: Outgroup f3 statistics measuring the shared drift between each Sardinian subpopulation and the Neolithic European farmer (Stuttgart) or the pre-Neolithic hunter-gatherer (Loschbour).
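Outgroup f3 statistics of this kind measure the drift shared by A and B after their divergence from an outgroup O: f3(O; A, B) is the average over SNPs of (pO − pA)(pO − pB). The sketch below is an illustration only, assuming per-SNP allele-frequency lists; it estimates the standard error with an unweighted delete-one-block jackknife over equal-sized blocks of consecutive SNPs, whereas tools such as ADMIXTOOLS use a weighted block jackknife over genomic blocks, so its standard errors would not match the table exactly. The function names are hypothetical.

```python
import math


def outgroup_f3(po, pa, pb):
    """f3(O; A, B): mean over SNPs of (pO - pA)(pO - pB); larger means more shared drift."""
    return sum((o - a) * (o - b) for o, a, b in zip(po, pa, pb)) / len(po)


def block_jackknife_se(stat, arrays, block_size):
    """Delete-one-block jackknife standard error of stat(*arrays).

    arrays: equal-length lists (one per population), SNPs in genomic order.
    Blocks are consecutive runs of block_size SNPs (the last may be shorter).
    """
    n = len(arrays[0])
    starts = range(0, n, block_size)
    g = len(starts)
    if g < 2:
        raise ValueError("need at least two blocks")
    # Recompute the statistic with each block left out in turn.
    loo = [stat(*[arr[:s] + arr[s + block_size:] for arr in arrays]) for s in starts]
    mean_loo = sum(loo) / g
    var = (g - 1) / g * sum((v - mean_loo) ** 2 for v in loo)
    return math.sqrt(var)
```

For example, `block_jackknife_se(outgroup_f3, [p_san, p_stuttgart, p_arzana], 1000)` would give an SE for f3(San; Stuttgart, Arzana), with the three frequency lists as hypothetical inputs.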

Table S6: an Excel spreadsheet listing information on all populations used.

[Figure S1, panel A: ADMIXTURE bar plots at K = 2 and K = 3 (y-axis: ancestry, 0 to 1) for Ilb, Ori, Tor, Arz, Bar, Gai, Olb, Loc, Car, Lan, Vgs, Sas, Nuo, Cag, HGa, HGb, Cmp.]

Panel B: Fst x 1000 (symmetric matrix, lower triangle shown)
             ARZANA  LOCERI  BARISARDO  LANUSEI  ILBONO  SarHGa  TORTOLI  GAIRO  VILLAGRANDE
LOCERI       4.8
BARISARDO    5.2     0.3
LANUSEI      5.4     1.5     2.7
ILBONO       5.4     2.9     3.6        3.9
SarHGa       5.4     0       1.5        1.4      3.4
TORTOLI      6       1.7     2.2        3.4      4.4     1.6
GAIRO        6.6     1.6     2.5        2.6      5.2     −0.4    2.1
VILLAGRANDE  6.7     4.1     4.2        4.8      5.9     3.7     3.8      4.3
SarHGb       9.2     3.5     4.3        6.5      7.6     2.2     3        4.4    6.8

Figure S1: Relationship between the HGDP Sardinians and the newly sequenced Sardinian dataset. The HGDP Sardinians are recorded as having been collected from the Gennargentu region (A Piazza, personal communication). Consistent with this, the HGDP Sardinians show close affinity to the Ogliastra individuals. (A) Admixture analysis at K = 2 shows that the HGDP Sardinians, here divided into HGa and HGb (see Figure S2), cluster with Sardinians from Ogliastra. At K = 3 the HGDP Sardinians stand out from the rest of the Sardinians, likely reflecting differences in data generation. (B) Fst (x 1000) matrix within Sardinia, including the HGDP Sardinians (SarHGa and SarHGb).
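The caption does not state which Fst estimator was used. As one possibility, the sketch below implements Hudson's estimator in the ratio-of-averages form recommended by Bhatia et al. (2013), from per-SNP sample allele frequencies and haplotype sample sizes; multiplying by 1000 gives the scale used in panel B. Because the estimator subtracts a sample-size correction, it can return small negative values for nearly undifferentiated samples. The function name is hypothetical.

```python
def hudson_fst(p1, p2, n1, n2):
    """Hudson's Fst as a ratio of averages over SNPs.

    p1, p2: per-SNP sample allele frequencies in populations 1 and 2.
    n1, n2: number of sampled haplotypes (2 x individuals for diploids), each > 1.
    """
    num = 0.0
    den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference minus the within-sample correction terms.
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that one allele drawn from each population differs.
        den += a * (1 - b) + b * (1 - a)
    return num / den
```

Summing numerators and denominators across SNPs before dividing, rather than averaging per-SNP ratios, keeps rare variants from dominating the estimate.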

[Figure S2: (A) PC1 (0.293%) vs. PC2 (0.242%); (B) PC1 (0.293%) vs. PC3 (0.124%). Each point is an individual labeled by population abbreviation (e.g., Arz, Ilb, Lan, Gai, Loc, Bar, Tor, Vgs, Cag, Cmp, Car, Ori, Sas, Olb, Nuo, HGa, HGb).]

Figure S2: Relationship between the HGDP Sardinians and the newly sequenced Sardinians. The HGDP Sardinians are recorded as having been collected from the Gennargentu region (A Piazza, personal communication). Consistent with this, the HGDP Sardinians show close affinity to the Ogliastra individuals. (A) In a PCA of the merged HGDP Sardinians (Human Origins Array data) and the sequenced Sardinians, PC2 appears to be driven by technical differences in data generation between the two datasets. (B) PC1 vs. PC3 better recapitulates the geographic variation observed in Figure 2 and shows the HGDP individuals scattered across Sardinia: roughly half of them are as differentiated as, or more differentiated than, the Gennargentu region samples, while the other half are more similar to the broader sample of Sardinians. We therefore labeled the two subgroups of HGDP Sardinians “HGa” and “HGb”.

Figure S3: Admixture results for the merged dataset of Sardinia and the Human Origins Array data, for K = 2 to K = 15. For K = 4 to K = 15, Sardinians (HGDP Sardinians, Arzana, and Cagliari) all cluster together with nearly 100% of a unique ancestry component that is not found at such high levels outside Sardinia. Visualization was done using Pong (Behr et al. 2016 Bioinformatics).
[Figure S4 heatmap: normalized doubleton sharing (x 10) between each pair of populations. Sardinian populations in order: ARZANA, ILBONO, LOCERI, BARISARDO, LANUSEI, VILLAGRANDE, TORTOLI (within Ogliastra); CAGLIARI, ORISTANO, NUORO, CAMPIDANO, CARBONIA, SASSARI, OLBIATEMPIO (outside Ogliastra); followed by the 26 1000 Genomes phase 3 populations.]

Figure S4: Doubleton sharing between pairs of populations in the merged dataset of Sardinia and 1000 Genomes. The first seven columns are Sardinian populations within Ogliastra, the next seven are Sardinian populations outside Ogliastra, and the remaining columns are 1000 Genomes phase 3 populations. There is clear enrichment of doubleton sharing within Ogliastra, among Sardinian populations outside Ogliastra, and among continental populations. By contrast, there is comparatively little sharing between Ogliastra and the rest of the island, and very little sharing between Sardinia and the other 1000 Genomes populations, with the exception of TSI. Of note, there is a small amount of elevated sharing between Sardinians outside Ogliastra and the 1000 Genomes Latino populations MXL, PUR, and CLM.
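Doubleton sharing can be tabulated by finding sites where the alternative allele is observed exactly twice in the merged sample and recording which pair of populations carries the two copies. The sketch below is illustrative and rests on several assumptions: biallelic genotypes coded as alternative-allele counts (0/1/2), the alternative allele treated as the rare one, and doubletons carried by a single homozygote skipped. The normalization actually behind the "Normalized Sharing x 10" scale is not described here, so dividing by the number of pairs of individuals is a hypothetical choice, as are the function names.

```python
from collections import Counter


def doubleton_pairs(genotypes, pop_of):
    """Count doubletons by the (sorted) pair of populations carrying the two copies.

    genotypes: one list per site of alternative-allele counts (0, 1, 2) per individual.
    pop_of: population label of each individual, in the same order as each site list.
    """
    counts = Counter()
    for site in genotypes:
        if sum(site) != 2 or max(site) == 2:
            # Not a doubleton, or both copies sit in one homozygous individual.
            continue
        carriers = sorted(pop_of[i] for i, g in enumerate(site) if g == 1)
        counts[tuple(carriers)] += 1
    return counts


def normalize_by_pairs(counts, pop_sizes):
    """Hypothetical normalization: divide each count by the number of pairs of individuals."""
    out = {}
    for (a, b), c in counts.items():
        if a == b:
            n_pairs = pop_sizes[a] * (pop_sizes[a] - 1) / 2
        else:
            n_pairs = pop_sizes[a] * pop_sizes[b]
        out[(a, b)] = c / n_pairs if n_pairs else float("nan")
    return out
```

Some normalization of this kind is needed because raw counts grow with sample size, so large panels would otherwise dominate the heatmap.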

[Figure S5 panels: observed (top), fitted (middle), and residual (bottom) 2D site frequency spectra. x-axis: allele count in pop 1; y-axis: allele count in pop 2; color: number of sites per bin (log scale, 1e+00 to 1e+06) for the spectra, and residual (−200 to 200) for the bottom panel.]

Figure S5: Inference of population demographic history parameters using fastsimcoal2. Population growth and divergence parameters were estimated from a 2D site frequency spectrum of 67 high-coverage Sardinians and 4 Tuscans from Complete Genomics. The observed 2D site frequency spectra, based on ~9.4 Mb of neutral sequence, are shown in the top row, the spectra fitted by fastsimcoal2 in the middle row, and the residuals in the bottom row.
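For orientation, a 2D site frequency spectrum is a matrix whose entry (i, j) counts sites with derived-allele count i in population 1 and j in population 2; with 67 diploid Sardinians and 4 diploid Tuscans it has 135 x 9 cells. The sketch below builds that matrix from per-site derived-allele counts and takes cell-wise residuals against an expected spectrum scaled to the same total. It assumes unfolded (polarized) counts and defines residuals as simple differences, which may not be the exact definition used in the figure; it does not read or write fastsimcoal2's input files. Names are hypothetical.

```python
def joint_sfs(counts1, counts2, n1, n2):
    """Unfolded 2D SFS: sfs[i][j] = number of sites with derived-allele count i
    in population 1 (n1 haplotypes) and j in population 2 (n2 haplotypes)."""
    sfs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for i, j in zip(counts1, counts2):
        if not (0 <= i <= n1 and 0 <= j <= n2):
            raise ValueError(f"allele count out of range: ({i}, {j})")
        sfs[i][j] += 1
    return sfs


def scale_expected(probs, total):
    """Turn a model's expected cell probabilities into expected site counts."""
    return [[p * total for p in row] for row in probs]


def residuals(observed, expected):
    """Cell-wise observed minus expected counts."""
    return [[o - e for o, e in zip(orow, erow)] for orow, erow in zip(observed, expected)]
```

With the samples described above, `joint_sfs(sar_counts, tsi_counts, 134, 8)` would produce the observed matrix (the two count lists are hypothetical inputs).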

[Figure S6 panels: (A) map of f3(Ju_hoan_North; ARZANA, X) and (B) map of f3(Ju_hoan_North; CAGLIARI, X), color scale 0.179 to 0.216, populations labeled by abbreviation; (C) D(Ju_hoan_North, ARZANA; Bergamo, X) and (D) D(Ju_hoan_North, ARZANA; Tuscan, X), x-axis: D statistic (−0.04 to 0.04), y-axis: mainland populations X, including SarHGDPa and SarHGDPb.]

Figure S6: Relationship between Sardinia and mainland populations. (A) Outgroup f3 results of the form f3(San; Arzana, X), where X is a mainland population from the Human Origins Array data. Sardinians show the highest amount of shared drift with other Sardinians and with the Basque. (B) The same display for f3(San; Cagliari, X). (C) To formally test for excess sharing between Sardinians and Basque, we computed D statistics of the form D(San, Arzana; Tuscan, X), where X is a mainland European population. Relative to Arzana-Tuscan (or Arzana-Bergamo) sharing, excess sharing between Arzana and X yields significantly positive values of this D statistic, while a dearth of sharing between Arzana and X yields significantly negative values. Mainland populations with significant results (|Z| > 4) are bolded on the y-axis.
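The D statistic can be computed from allele frequencies as D(P1, P2; P3, P4) = Σ (p1 − p2)(p3 − p4) / Σ (p1 + p2 − 2p1p2)(p3 + p4 − 2p3p4), summing over SNPs. With P1 = San, P2 = Arzana, P3 = Tuscan, and P4 = a mainland population X, allele-frequency shifts shared by Arzana and X make the numerator positive, matching the sign convention in the caption. The sketch below adds a Z score from an unweighted delete-one-block jackknife over equal-sized SNP blocks; this is a simplification of the weighted block jackknife used by standard tools, and the function names are hypothetical.

```python
import math


def d_stat(p1, p2, p3, p4):
    """D(P1, P2; P3, P4) from per-SNP allele frequencies."""
    num = 0.0
    den = 0.0
    for a, b, c, d in zip(p1, p2, p3, p4):
        num += (a - b) * (c - d)
        den += (a + b - 2 * a * b) * (c + d - 2 * c * d)
    return num / den


def _drop(values, start, size):
    """Copy of values without the block values[start:start + size]."""
    return values[:start] + values[start + size:]


def d_with_jackknife_z(p1, p2, p3, p4, block_size):
    """Return (D, Z), with Z = D / SE from a delete-one-block jackknife."""
    full = d_stat(p1, p2, p3, p4)
    loo = [
        d_stat(*(_drop(p, s, block_size) for p in (p1, p2, p3, p4)))
        for s in range(0, len(p1), block_size)
    ]
    g = len(loo)
    if g < 2:
        raise ValueError("need at least two blocks")
    mean = sum(loo) / g
    se = math.sqrt((g - 1) / g * sum((v - mean) ** 2 for v in loo))
    return full, full / se
```

Under this sketch, a population X would be flagged as in the figure when the returned Z satisfies |Z| > 4.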

[Figure S7: cumulative density (y-axis) of IBD segment length (Mb, x-axis) shared with each population; left panel ARZANA, right panel Tuscan. Legend, median segment length (Mb) in increasing order:
ARZANA: Mordovian 1.216, Croatian 1.22, Spanish 1.233, Hungarian 1.237, Sicilian 1.249, Belarusian 1.271, French 1.279, Lithuanian 1.285, Norwegian 1.289, Ukrainian 1.301, Russian 1.306, Bergamo 1.344, Tuscan 1.354, Orcadian 1.356, Basque_Spanish 1.369, Bulgarian 1.395, English 1.408, Estonian 1.447, Chuvash 1.484, Czech 1.501, Basque_French 1.525, CAGLIARI 2.245, Sardinian 2.488, ARZANA 2.937.
Tuscan: Chuvash 1.084, Sicilian 1.101, Bulgarian 1.128, Croatian 1.184, Czech 1.184, Belarusian 1.194, Lithuanian 1.199, Mordovian 1.205, Hungarian 1.21, Estonian 1.235, Orcadian 1.239, CAGLIARI 1.253, Basque_French 1.274, Ukrainian 1.28, French 1.323, English 1.33, Spanish 1.341, ARZANA 1.354, Bergamo 1.354, Russian 1.397, Basque_Spanish 1.413, Norwegian 1.453, Sardinian 1.49, Tuscan 1.652.]

Figure S7: Pattern of IBD sharing between Sardinians and mainland populations. Cumulative density of the lengths of IBD segments shared with each population; the median segment length for each population is listed in order in the legend. Outside of Sardinia, the Sardinians share the longest median IBD segments with the French Basque. Only populations with sample size greater than 8 are shown.

[Figure S8: for each target population (y-axis), the source pair giving the lowest f3(Target; Source1, Source2) (x-axis, 0.00 to 0.03); * marks significantly negative values.
Tuscan: BedouinB, Norwegian *
Sicilian: BantuSA, Basque_French *
English: Lithuanian, Mozabite
Basque_Spanish: Basque_French, Italian_South
Basque_French: Basque_Spanish, Mende
SarHGDPb: Canary_Islanders, Italian_South
SarHGDPa: Canary_Islanders, Italian_South
SASSARI: Italian_South, Spanish_Aragon
ORISTANO: Italian_South, Spanish_Aragon
OLBIATEMPIO: Italian_South, Spanish_Aragon
NUORO: Italian_South, Spanish_Aragon
CARBONIA: Datog, Spanish_North
CAMPIDANO: Basque_Spanish, Gambian
CAGLIARI: Italian_South, Spanish_Aragon
VILLAGRANDE, TORTOLI, LOCERI, LANUSEI, ILBONO, GAIRO, BARISARDO, ARZANA: Canary_Islanders, Italian_South]

Figure S8: Analysis of admixture signal using f3 statistics. For each target population (Sardinian populations with sample size greater than 8, and selected European populations from the Human Origins Array data), we computed f3 statistics of the form f3(Target; Source1, Source2), where Source1 and Source2 range over all possible pairs of populations in the Human Origins Array data. For each target population we display the pair of source populations that produced the lowest (not necessarily negative) f3 value. Significantly negative f3 statistics (bolded and marked with an asterisk) are evidence of admixture, which we observe for the mainland European populations Tuscan and Sicilian (Z = -4.259 and Z = -8.29, respectively). There is also marginally significant evidence of admixture for English (Z = -3.20). None of the Sardinian or Basque populations showed evidence of admixture by this test.
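A sketch of the search behind this figure, assuming per-SNP allele frequencies for the target and for each candidate source: it computes f3(Target; S1, S2) with the standard correction for the target's finite sample size (subtracting pT(1 − pT)/(nT − 1) per SNP, where nT is the number of sampled haplotypes) and returns the source pair with the lowest value. Assessing significance, as with the Z scores in the caption, would additionally require a block-jackknife standard error, which is omitted here. The function names are hypothetical.

```python
from itertools import combinations


def admixture_f3(pt, p1, p2, nt):
    """f3(T; S1, S2): mean over SNPs of (pT - p1)(pT - p2) - pT(1 - pT)/(nT - 1)."""
    total = 0.0
    for t, a, b in zip(pt, p1, p2):
        total += (t - a) * (t - b) - t * (1 - t) / (nt - 1)
    return total / len(pt)


def lowest_pair(pt, nt, sources):
    """Return (f3, name1, name2) for the source pair giving the lowest f3.

    sources: dict mapping population name -> per-SNP allele frequencies.
    """
    best = None
    for (n1, f1), (n2, f2) in combinations(sorted(sources.items()), 2):
        val = admixture_f3(pt, f1, f2, nt)
        if best is None or val < best[0]:
            best = (val, n1, n2)
    return best
```

The lowest f3 is only evidence of admixture when it is significantly negative; for unadmixed targets the minimum simply picks the pair whose drift path best brackets the target.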

[Figure S9 map: f3(San; Loschbour, X) for each population X, plotted at its geographic location and labeled by abbreviation; color scale from 0.178 to 0.215.]

Figure S9: Relationship of the ancient pre-Neolithic hunter-gatherer to populations across Europe and within Sardinia. This map visualizes outgroup f3 statistics of the form f3(San; Loschbour, X), where X is a population in the merged dataset of Sardinia and Human Origins Array data. Population abbreviations are the same as in Figure 4.

[Figure S10: stacked bars of LBK_EN, Loschbour, and Yamnaya ancestry (x-axis: proportion of ancestry, 0 to 1) for GAIRO, VILLAGRANDE, TORTOLI, SarHGDPa, LOCERI, ARZANA, CAMPIDANO, NUORO, BARISARDO, LANUSEI, SarHGDPb, ORISTANO, CAGLIARI, ILBONO, CARBONIA, OLBIATEMPIO, SASSARI.]

Figure S10: Mixture proportions of three ancestry components among Sardinian populations. Using the f4-statistic-based method first presented in Haak et al. (Nature 2015), we computed unbiased estimates of mixture proportions that do not require an accurate model of the relationships between the test populations and the outgroup populations. The three ancestry components were represented by early Neolithic individuals from the LBK culture (LBK_EN), a pre-Neolithic hunter-gatherer (Loschbour), and Bronze Age steppe pastoralists (Yamnaya).

[Figure S11: (A) ADMIXTURE bar plots at K = 3 (y-axis: ancestry, 0 to 1) for TSI, split, no_info, NUORO, ILBONO, ARZANA, LANUSEI, SASSARI, CAGLIARI, ORISTANO, CARBONIA, CAMPIDANO; top: chr1-22, bottom: chrX. (B) Scatter plot of per-individual chrX_K=3 (x-axis) vs. chr1-22_K=3 (y-axis) estimates; NOBS = 1572 (100% of data), COR = 0.79273, XAVG = 0.3705, YAVG = 0.3015.]

Figure S11: Admixture analysis based on 1577 unrelated Sardinians and TSI individuals at K = 3. (A) Top: result using the autosomal data; bottom: result using the chrX data. (B) The scatter plot shows the correlation between chrX (x-axis) and autosomal (y-axis) estimates of the “red” ancestry component for each individual. Overall, the red ancestry component (found most prominently among Sardinians from Ogliastra) is enriched on chrX relative to the autosomes. We observed qualitatively similar results whether we used only the females in the dataset, removed the Arzana individuals, or compared chrX estimates only to those for chromosome 7, which is the autosome most closely matched to chrX in sequenced bp and gene density in the human reference genome (Figure S13).
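The summary numbers printed on panel B (a correlation and the mean of each axis) can be obtained from per-individual ADMIXTURE estimates. ADMIXTURE writes one row per individual with K whitespace-separated ancestry fractions in its .Q output; the sketch below assumes one such output from the autosomal run and one from the chrX run, with individuals in the same order, and assumes the "red" component sits in the same, hypothetical, column in both runs (component order is not guaranteed to match across runs, so this needs checking). It reports Pearson's r and the two means; the function names are hypothetical.

```python
import math


def read_q_column(lines, column):
    """Extract one ancestry component from ADMIXTURE .Q-style lines
    (one individual per line, K whitespace-separated fractions)."""
    return [float(line.split()[column]) for line in lines if line.strip()]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def compare_x_autosome(q_x, q_auto):
    """Return (r, mean on chrX, mean on autosomes) for one ancestry component."""
    return pearson(q_x, q_auto), sum(q_x) / len(q_x), sum(q_auto) / len(q_auto)
```

A chrX mean above the autosomal mean, as on panel B, is what the caption describes as enrichment of the red component on chrX.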

[Figure S12 plot: two panels, "Neolithic Farmers" and "Hunter−gatherers", each plotting D (chrX) against D (Autosome) for Ogl, non-Ogl, and 1000G samples, with points colored by outgroup (CHS, ESN, LWK, PJL, YRI).]

Figure S12: Evaluating the impact of different outgroups used for D statistics. We computed D statistics of the same form as in Figure 7, but using different outgroups from 1000 Genomes (indicated by the color of the points): CHS, ESN, LWK, and PJL. The dashed y = x line represents the expectation if there is no excess sharing with ancient samples on the X chromosome relative to the autosomes (i.e., no sex-biased demography). In general, the D statistics remain more negative on chrX than on the autosomes for Sardinian populations, regardless of the outgroup used. In contrast, the 1000G sample consistently falls on the y = x line.
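For readers who want to reproduce the comparison, the sketch below computes a frequency-based D statistic of the form D(W, X; Y, Z) = Σ(w − x)(y − z) / Σ(w + x − 2wx)(y + z − 2yz), the estimator described by Patterson et al. (2012) and used in ADMIXTOOLS, separately for chrX and autosomal sites, and reports the chrX-minus-autosome difference × 100 (the scale of Figure S14). This is an illustrative sketch, not the authors' pipeline: the function names, the per-site allele-frequency input, and the toy numbers are assumptions, and a real analysis would also need a block jackknife for standard errors.

```python
from typing import Sequence, Tuple

# Assumed input: one population's derived-allele frequency per site, same site order.
Freqs = Sequence[float]


def d_statistic(w: Freqs, x: Freqs, y: Freqs, z: Freqs) -> float:
    """Frequency-based D(W, X; Y, Z) summed over sites.

    Numerator:   sum of (w - x) * (y - z)
    Denominator: sum of (w + x - 2wx) * (y + z - 2yz)
    In expectation, excess allele sharing between X and Y pushes D below zero.
    """
    if not (len(w) == len(x) == len(y) == len(z)):
        raise ValueError("all populations must cover the same sites")
    num = 0.0
    den = 0.0
    for fw, fx, fy, fz in zip(w, x, y, z):
        num += (fw - fx) * (fy - fz)
        den += (fw + fx - 2 * fw * fx) * (fy + fz - 2 * fy * fz)
    if den == 0:
        raise ValueError("no informative sites")
    return num / den


def x_minus_a(chrx: Tuple[Freqs, Freqs, Freqs, Freqs],
              auto: Tuple[Freqs, Freqs, Freqs, Freqs]) -> float:
    """(D on chrX - D on autosomes) * 100; a hypothetical helper, not from the paper."""
    return 100 * (d_statistic(*chrx) - d_statistic(*auto))


if __name__ == "__main__":
    # Toy frequencies (not real data), ordered as W=YRI, X=Stuttgart, Y=TEST, Z=CEU.
    auto = ([0.0, 0.0], [1.0, 0.0], [0.5, 0.5], [0.5, 0.5])
    chrx = ([0.0, 0.0], [1.0, 1.0], [0.8, 0.6], [0.4, 0.4])
    print(d_statistic(*auto), d_statistic(*chrx), x_minus_a(chrx, auto))
```

With the populations ordered as D(YRI, Stuttgart, TEST, CEU), a more negative value on chrX than on the autosomes corresponds to points falling below the y = x line.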

[Figure S13 plot: two panels, "Anc = Neolithic, OG = YRI" and "Anc = HG, OG = YRI", each plotting D statistics from the chr7 vs. chrX comparison (y-axis, chr7_v.s._chrX) against those from the primary analysis (x-axis); legend entries: Ogl A, non-Ogl A, 1000G A, Ogl X, non-Ogl X, 1000G X.]

Figure S13: Evaluating the impact of using chr7 instead of the entire autosome. We repeated the primary analysis (Figure 7), but used only chr7 data in place of the autosomes. The results are qualitatively unchanged.

[Figure S14 plot: heatmap of D(YRI,Stuttgart,TEST,CEU) computed on chrX minus that computed on the autosomes (color scale: Dstat (X − A) × 100). Rows: ARZANA, BARISARDO, GAIRO, ILBONO, LANUSEI, LOCERI, TORTOLI, VILLAGRANDE, CAGLIARI, CAMPIDANO, CARBONIA, NUORO, OLBIATEMPIO, ORISTANO, SASSARI, GBR, FIN, IBS, TSI. Columns (x-axis, SNP ascertainment panel): 13 single African individuals from Luo, Esan, Mbuti, Luhya, Somali, Mende, Yoruba, Gambian, Mozabite, Mandenka, BantuKenya, and Khomani_San (two individuals). All Sardinian values are negative (about −0.4 to −5.9); values for GBR, FIN, IBS, and TSI lie between about −0.8 and 1.0.]

Figure S14: Examining the impact of different SNP ascertainment panels. We repeated the primary analysis (Figure 7), but instead of using all SNPs ascertained from 13 African individuals from the SGDP panel, we used only SNPs ascertained from a single individual at a time. For each comparison, we visualize the difference between the D statistic calculated on chrX and that calculated on the autosomes. In general, the choice of SNP ascertainment scheme did not affect our conclusion.

[Figure S15 plot: heatmap of the Z score per bin (color scale from −10 to 10) as a function of arrival time (gen ago; x-axis, 100 to 400) and initial haplotype frequency (y-axis, 0.02 to 0.98).]

Figure S15: Forward simulation to assess the likelihood that the I2a1a1 haplotype drifted to its observed present-day frequency. For each pair of initial parameters (the arrival time of the haplotype on Sardinia, in generations, and the initial haplotype frequency), we forward-simulated the frequency of the haplotype 100,000 times in a Wright-Fisher model with the population size trajectory estimated for Lanusei by MSMC (Figure 5). Based on the 100,000 simulated present-day frequencies, we then computed the likelihood of the haplotype reaching 39%, the observed frequency of I2a1a1 today. A positive Z score (red) indicates that the present-day frequency of 39% is too high to be plausibly reached by drift; conversely, a negative Z score (blue) indicates that it is too low to be plausibly reached by drift. We assumed that I2a1a1 is absent among Neolithic individuals but at 100% frequency among pre-Neolithic hunter-gatherers. The black box marks a plausible initial condition based on the Stuttgart sample, which is ~7.5-8 ky old and contains ~10% hunter-gatherer ancestry (Haak et al.). We therefore assumed that Neolithic farmers carrying the I2a1a1 haplotype at an average frequency of 10% colonized and populated the island approximately 250 to 300 generations ago. The yellow box marks a second plausible initial condition based on a Spanish Middle Neolithic individual (Haak et al.), which is ~5-6 ky old and carries ~20% hunter-gatherer ancestry. For both scenarios, we find that the present-day I2a1a1 frequency is too high to have arisen under neutral demography (Z ~ 4.7-5.8 for the black box; Z ~ 2.4-3.0 for the yellow box). Repeating the simulation with the Arzana population size trajectory (Figure 5) did not qualitatively change the results.
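The simulation for a single bin of the grid can be sketched as below. This is a minimal sketch under stated assumptions: the haplotype is treated as one non-recombining locus with one copy per individual (a haploid Wright-Fisher model with binomial resampling each generation), the population-size function `toy_size` is a hypothetical stand-in for the MSMC trajectory, and the Z score is assumed to be (observed − mean) / SD of the simulated present-day frequencies, since the caption does not spell out how Z was derived.

```python
import numpy as np


def simulate_present_freq(p0, arrival_gen, size_at, n_reps=100_000, seed=0):
    """Forward-simulate a haplotype frequency from `arrival_gen` generations ago
    to the present under an assumed haploid Wright-Fisher model.

    size_at(t) returns the population size t generations before the present.
    Returns one simulated present-day frequency per replicate.
    """
    rng = np.random.default_rng(seed)
    freq = np.full(n_reps, float(p0))
    for t in range(arrival_gen - 1, -1, -1):
        n = max(1, int(size_at(t)))          # size of the generation being drawn
        freq = rng.binomial(n, freq) / n     # drift: resample n copies
    return freq


def drift_z_score(observed, p0, arrival_gen, size_at, **kwargs):
    """Assumed Z score: (observed - mean of simulations) / SD of simulations."""
    sims = simulate_present_freq(p0, arrival_gen, size_at, **kwargs)
    sd = sims.std()
    if sd == 0:
        raise ValueError("simulated frequencies show no variance")
    return (observed - sims.mean()) / sd


def toy_size(t):
    """Hypothetical size trajectory (NOT the Lanusei MSMC estimate):
    exponential growth toward the present, floored at 500."""
    return max(500, int(20_000 * np.exp(-0.01 * t)))


if __name__ == "__main__":
    # Black-box-like setting: ~10% initial frequency, arrival ~275 generations ago.
    z = drift_z_score(0.39, 0.10, 275, toy_size, n_reps=20_000)
    print(f"Z = {z:.2f}")
```

Sweeping `p0` and `arrival_gen` over the grid of the figure and recording one Z per cell would produce a heatmap of the same shape; the numbers will differ from the figure unless the real size trajectory is substituted for `toy_size`.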

[Figure S16 plot: two panels plotting the proportion of genotypes with posterior probability < 0.9 (y-axis) against mean depth (x-axis; 0-50 in the left panel, 0-10 in the right panel), with CSCT and Progenia samples in different colors and individual outliers labeled (csct_000832, csct_000838, csct_000844, csct_007728, csct_000805, csct_004829, csct_000391, csct_000141).]

Figure S16: Individual-level quality control based on imputation quality. For each of the 3,514 sequenced individuals, we plotted mean depth of coverage against imputation quality, measured as the proportion of genotypes with a maximum posterior genotype probability < 0.9. In general, Progenia samples (orange), which consist mostly of individuals from the Ogliastra province, had higher imputation quality, suggesting that the greater relatedness among these individuals improved imputation at a given depth of coverage. In contrast, CSCT samples (blue), which consist mostly of individuals from outside the Ogliastra province, showed slightly worse imputation quality. Individuals above the dashed line were removed from the analysis. The right panel zooms in on individuals with coverage < 10x.
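The per-individual metric on the y-axis can be sketched as follows. The sketch assumes each genotype's posterior is available as a (P(0), P(1), P(2)) triple, for example parsed from a GP-style field; the function names, the input layout, and the fixed `cutoff` used for flagging are illustrative assumptions, since the exact position of the dashed line is not given in the text.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed layout: posterior probabilities of genotypes 0, 1, 2 at one site.
Posterior = Tuple[float, float, float]


def low_confidence_fraction(posteriors: Sequence[Posterior],
                            threshold: float = 0.9) -> float:
    """Fraction of sites whose maximum posterior genotype probability < threshold."""
    if not posteriors:
        raise ValueError("no genotypes")
    low = sum(1 for gp in posteriors if max(gp) < threshold)
    return low / len(posteriors)


def flag_individuals(data: Dict[str, Sequence[Posterior]], cutoff: float) -> List[str]:
    """IDs whose low-confidence fraction exceeds a hypothetical fixed QC cutoff."""
    return sorted(ind for ind, gps in data.items()
                  if low_confidence_fraction(gps) > cutoff)


if __name__ == "__main__":
    # Toy posteriors (not real data) for two individuals.
    data = {
        "ind1": [(0.95, 0.05, 0.0), (0.5, 0.4, 0.1), (0.0, 0.1, 0.9), (0.3, 0.6, 0.1)],
        "ind2": [(1.0, 0.0, 0.0), (0.0, 0.02, 0.98)],
    }
    print({k: low_confidence_fraction(v) for k, v in data.items()})
    print(flag_individuals(data, cutoff=0.1))
```

A site whose best genotype has posterior exactly 0.9 is counted as confident here, matching the strict "< 0.9" in the caption.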

[Figure S17 plot: histogram of PI_HAT values; number of pairwise relationships (y-axis) against pi_hat (x-axis, 0.0 to 1.0).]

Figure S17: Distribution of pihat among the 3,514 sequenced Sardinians. Among all pairs examined, pihat values peak around 0.5, 0.25, and 0.125, corresponding to first-, second-, and third-degree relationships in the dataset. We also observe a large number of pairs with low pihat values. We used a cutoff of 0.07 to identify the 1,577 unrelated Sardinians used in downstream analyses.
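The text does not say how individuals were chosen from related pairs, so the sketch below uses one common option, stated here as an assumption: a greedy pruning that repeatedly removes the individual involved in the most pairs with pi_hat above the cutoff, until no retained pair exceeds it. The input format, (id1, id2, pi_hat) tuples such as might be parsed from a PLINK `.genome` file, and the tie-breaking rule are also assumptions.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def unrelated_set(individuals: Iterable[str],
                  pairs: Iterable[Tuple[str, str, float]],
                  cutoff: float = 0.07) -> Set[str]:
    """Greedily prune individuals until no retained pair has pi_hat > cutoff.

    This greedy rule is an illustrative assumption, not the paper's stated method.
    """
    keep = set(individuals)
    related = [(a, b) for a, b, pi_hat in pairs if pi_hat > cutoff]
    while True:
        active = [(a, b) for a, b in related if a in keep and b in keep]
        if not active:
            return keep
        degree = Counter()
        for a, b in active:
            degree[a] += 1
            degree[b] += 1
        # Drop the individual in the most related pairs; ties go to the smallest ID.
        worst = max(sorted(degree), key=degree.__getitem__)
        keep.discard(worst)


if __name__ == "__main__":
    pairs = [("A", "B", 0.5), ("A", "C", 0.25), ("C", "D", 0.01)]
    print(sorted(unrelated_set(["A", "B", "C", "D"], pairs)))
```

Removing the most-connected individual first tends to retain more samples than dropping one member of each pair at random, which matters in a dataset with many related pairs such as the Ogliastra villages.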

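[Figure S18 plot: admixture ancestry proportions at K = 2 (y-axis, Ancestry, 0.0 to 1.0) for individuals grouped by majority grandparental location and the number of grandparents from that location (such as 4:NUORO, 3:NUORO, 2:NUORO, for ARZANA, CAGLIARI, CAMPIDANO, CARBONIA, ILBONO, LANUSEI, NUORO, ORISTANO, SASSARI, TORTOLI, and VILLAGRANDE), plus the groups 1:split, 2:split, 1N:split, 4:no_info, others − CSCT, and others − Progenia.]

Figure S18: Admixture analysis on Sardinian individuals stratified by majority grandparental ancestry. For each individual, we designated the majority grandparental origin by both the number of grandparents and their geographic location. Two grandparents from a particular location may constitute the majority grandparental origin if the other two grandparents originate from different locations. In general, we observe that the distribution of ancestry components did not differ significantly with the number of grandparents from the majority location; nevertheless, we assigned an individual to a particular subpopulation only if at least 3 of the 4 grandparents came from that location. Bars for some groups are widened to aid visualization.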
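The grouping rule in the caption can be sketched as a small function returning the majority location and how many grandparents share it. The 2-grandparent rule follows the caption, read here as requiring the other two grandparents to come from two distinct locations (so that there is no 2-2 tie); that reading, the use of `None` for a missing grandparent, and the `(0, None)` result for ties or insufficient data are our assumptions, and the function names are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def majority_origin(grandparents: Sequence[Optional[str]]) -> Tuple[int, Optional[str]]:
    """Return (count, location) of the majority grandparental origin.

    grandparents: birthplaces of the 4 grandparents, with None for missing data.
    A count of 2 qualifies only when the other two grandparents come from two
    different locations (assumed reading). Returns (0, None) when no majority
    can be assigned (hypothetical convention for ties and missing data).
    """
    if len(grandparents) != 4:
        raise ValueError("expected exactly 4 grandparents")
    known = [g for g in grandparents if g is not None]
    if not known:
        return 0, None
    ranked = Counter(known).most_common()
    location, count = ranked[0]
    if count >= 3:
        return count, location
    if count == 2 and len(known) == 4 and ranked[1][1] == 1:
        return 2, location
    return 0, None


def assign_subpopulation(grandparents: Sequence[Optional[str]],
                         min_count: int = 3) -> Optional[str]:
    """Assign a subpopulation only if >= min_count grandparents share a location."""
    count, location = majority_origin(grandparents)
    return location if count >= min_count else None


if __name__ == "__main__":
    print(majority_origin(["Lanusei", "Lanusei", "Gairo", "Tortoli"]))
    print(assign_subpopulation(["Ilbono", "Ilbono", "Ilbono", "Loceri"]))
```

Keeping the count alongside the location makes it straightforward to build both the per-count groups shown in the figure (4:, 3:, 2:) and the stricter 3-of-4 assignment used downstream.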