Table S1: Parental and Grandparental Origin of Duplicate Samples Between Sardinia Whole Genome Sequencing Data and HGDP Sardinians
| ID | HGDP ID | Father | Mother | Paternal GF | Paternal GM | Maternal GF | Maternal GM | Call | Seq? |
|---|---|---|---|---|---|---|---|---|---|
| 171 | 1062 | Tortoli | Tortoli | Tortoli | Loceri | Tortoli | Tortoli | Tortoli | Yes |
| 832 | 1073 | Lanusei | Ilbono | Lanusei | Lanusei | Lanusei | Ilbono | Lanusei | Yes |
| 833 | 1074 | Loceri | Seui | Loceri | Loceri | Loceri | Loceri | Loceri | Yes |
| 2306 | 1076 | Lanusei | Lanusei | - | Lanusei | Lanusei | Lanusei | Lanusei | No |
| 2483 | 1072 | Lanusei | Lanusei | Lanusei | Lanusei | Lanusei | Lanusei | Lanusei | No |
| 4676 | 1064 | Ilbono | Ilbono | Ilbono | Ilbono | Ilbono | Loceri | Ilbono | Yes |
| 4841 | 1066 | Lanusei | Gairo | Lanusei | Lanusei | Gairo | Gairo | NA | Yes |
| 4842 | 1065 | Ulassai | Ulassai | Perdasdefogu | Ulassai | Ulassai | Ulassai | Ulassai | Yes |

Table S1: Parental and grandparental origin of duplicate samples between the Sardinia whole genome sequencing data and the HGDP Sardinians. GF = grandfather, GM = grandmother. Of the 27 HGDP Sardinians in the Human Origins Array dataset, 8 were included in our sample. For seven of these we had grandparental information that implies the samples derive from the villages of Tortoli, Lanusei, Loceri, Ilbono, and Ulassai in the Ogliastra province. These samples were likely shared with the HGDP by early researchers collecting for the Lanusei/SardiNIA project.

Cluster A (N = 12): HGDP01062, HGDP01063, HGDP01064, HGDP01065, HGDP01066, HGDP01067, HGDP01071, HGDP01072, HGDP01073, HGDP01074, HGDP01075, HGDP01076

Cluster B (N = 15): HGDP00666, HGDP00667, HGDP00668, HGDP00669, HGDP00670, HGDP00671, HGDP00672, HGDP00673, HGDP00674, HGDP01068, HGDP01069, HGDP01070, HGDP01077, HGDP01078, HGDP01079

Table S2: List of HGDP Sardinians divided into two clusters based on joint PCA with the sequenced Sardinians (Figure S3). We named the two sub-clusters Cluster A (SarHGDPa) and Cluster B (SarHGDPb). Cluster A HGDP Sardinians more closely resemble those from the Gennargentu region in our dataset.
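The "Call" column of Table S1 can be reproduced from the four grandparental origins with a simple threshold rule. The sketch below is an illustration only: the source does not state how the calls were made, so the rule (at least three of four grandparents from the same village, missing entries ignored), the function name `call_origin`, and the `min_count` parameter are all assumptions. Grandparents are read in the order paternal GF, paternal GM, maternal GF, maternal GM.

```python
from collections import Counter

# Markers treated as "no information" (assumption: "-" and "NA" both mean missing).
MISSING = {"-", "NA", ""}


def call_origin(grandparents, min_count=3):
    """Return the village shared by at least `min_count` grandparents, else "NA".

    Hypothetical rule for illustration; the source does not describe its rule.
    """
    known = [g for g in grandparents if g not in MISSING]
    if not known:
        return "NA"
    village, count = Counter(known).most_common(1)[0]
    return village if count >= min_count else "NA"


# Grandparental origins from Table S1, keyed by HGDP ID:
# (paternal GF, paternal GM, maternal GF, maternal GM)
TABLE_S1 = {
    1062: ("Tortoli", "Loceri", "Tortoli", "Tortoli"),
    1073: ("Lanusei", "Lanusei", "Lanusei", "Ilbono"),
    1074: ("Loceri", "Loceri", "Loceri", "Loceri"),
    1076: ("-", "Lanusei", "Lanusei", "Lanusei"),
    1072: ("Lanusei", "Lanusei", "Lanusei", "Lanusei"),
    1064: ("Ilbono", "Ilbono", "Ilbono", "Loceri"),
    1066: ("Lanusei", "Lanusei", "Gairo", "Gairo"),
    1065: ("Perdasdefogu", "Ulassai", "Ulassai", "Ulassai"),
}

calls = {hgdp: call_origin(gps) for hgdp, gps in TABLE_S1.items()}

for hgdp, call in calls.items():
    print(f"HGDP0{hgdp}: {call}")
```

Under this assumed rule, the sample with two Lanusei and two Gairo grandparents (HGDP01066) receives no call, which matches the single NA entry in the table.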
| Population | Present Size (N) | Bottleneck Size (N) | Growth Rate (%) | Divergence Time (gen) |
|---|---|---|---|---|
| Sardinia | 35,492 (19,162-84,406) | 6,491 (5,460-19,468) | 0.00487 (0.00055-0.020) | 349 (89-403) |
| Tuscan | 124,393 (59,416-161,973) | 71,957 (3,952-97,085) | 0.00271 (0.0017-0.018) | 202 (55-374) |

Table S3: Demographic parameters inferred by fastsimcoal2. Confidence intervals (in parentheses) are based on the 5th and 95th percentile values from 100 bootstrap runs. 78% of the bootstrap replicates support an earlier (higher value) divergence time for Sardinia relative to the Tuscans (TSI).

| Test Population | Reference 1 | Reference 2 | Admixture Date (gen) | Fitted amplitude | P-value | α1 (s.e.) | α2 (s.e.) |
|---|---|---|---|---|---|---|---|
| CAGLIARI | Esan | Spanish (Castilla y Leon) | 56.88 ± 5.60 | 2.46e-05 ± 2.22e-06 | 6.45E-20 | 0.0096 (0.0057) | 0.011 (0.0073) |
| CAGLIARI | Luhya | Spanish (Castilla y Leon) | 67.90 ± 10.84 | 2.46e-05 ± 4.11e-06 | 5.10E-05 | 0.0098 (0.0058) | 0.012 (0.0075) |
| CAGLIARI | Luhya | Russian | 68.41 ± 8.06 | 2.45e-05 ± 3.21e-06 | 5.85E-10 | 0.027 (0.0037) | 0.062 (0.0048) |
| CAGLIARI | Mende | Spanish (Castilla y Leon) | 58.41 ± 7.51 | 2.39e-05 ± 2.37e-06 | 1.65E-10 | 0.0093 (0.0055) | 0.011 (0.0070) |
| CAGLIARI | Mende | Orcadian | 55.78 ± 7.33 | 2.33e-05 ± 2.69e-06 | 6.15E-10 | 0.031 (0.0037) | 0.031 (0.0048) |
| CAMPIDANO | Luhya | Tuscan | 61.38 ± 12.22 | 2.50e-05 ± 5.08e-06 | 0.021 | 0.0091 (0.0045) | 0.018 (0.0059) |
| CAMPIDANO | Yoruba | Tuscan | 50.39 ± 10.00 | 2.25e-05 ± 4.17e-06 | 0.0108 | 0.0089 (0.0044) | 0.018 (0.0058) |
| CARBONIA* | Mende | Saudi | 68.78 ± 13.46 | 2.37e-05 ± 4.88e-06 | 0.027 | 0.016 (0.14) | 0.022 (0.10) |
| ORISTANO | Hadza | Hungarian | 108.60 ± 11.74 | 4.02e-05 ± 7.51e-06 | 0.0021 | 0.022 (0.0038) | 0.029 (0.0049) |
| ORISTANO | Ju_hoan_North | Hungarian | 92.51 ± 15.34 | 3.31e-05 ± 6.90e-06 | 0.036 | 0.014 (0.0025) | 0.017 (0.0029) |
| ORISTANO | BantuSA | Estonian | 88.13 ± 16.98 | 3.18e-05 ± 6.15e-06 | 0.0054 | 0.036 (0.0042) | 0.039 (0.0053) |
| ORISTANO* | BantuSA | Syrian | 106.70 ± 20.40 | 2.97e-05 ± 6.19e-06 | 0.0375 | -0.044 (0.046) | -0.083 (0.038) |
| ORISTANO* | BantuSA | Jordanian | 106.83 ± 21.18 | 2.94e-05 ± 5.72e-06 | 0.0105 | -0.064 (0.043) | -0.070 (0.034) |
| SASSARI | BantuSA | Lithuanian | 68.64 ± 9.62 | 2.46e-05 ± 3.39e-06 | 2.25E-08 | 0.036 (0.0041) | 0.030 (0.0054) |
| SASSARI | Mbuti | Lithuanian | 68.77 ± 13.15 | 2.41e-05 ± 3.29e-06 | 0.0039 | 0.030 (0.0034) | 0.025 (0.0044) |
| SASSARI | Luhya | Lithuanian | 73.87 ± 12.69 | 2.23e-05 ± 3.96e-06 | 0.00042 | 0.039 (0.0045) | 0.034 (0.0060) |
| SASSARI | Biaka | Lithuanian | 60.98 ± 12.81 | 2.18e-05 ± 3.76e-06 | 0.045 | 0.031 (0.0036) | 0.026 (0.0046) |
| SASSARI | BantuSA | Finnish | 55.80 ± 11.69 | 1.96e-05 ± 3.61e-06 | 0.042 | 0.037 (0.0048) | 0.044 (0.0057) |

Table S4: Admixture proportions estimated by f4 ratio test. The αi values are the estimated sub-Saharan admixture proportions with respect to different outgroups. For α1 the outgroups used were Finnish and Chimp; for α2 the outgroups used were Han and Chimp. * For Carbonia and Oristano, because the non-African source population is a Middle Eastern population, we used Esan and Gambian as the outgroups rather than Finnish and Han. The estimated admixture proportion for Carbonia is consistent with 0.

f3(San; Stuttgart, Sardinian)

| Population | f3 statistic | Standard error |
|---|---|---|
| SarHGDPa | 0.213699 | 0.003046 |
| ARZANA | 0.213258 | 0.003044 |
| SarHGDPb | 0.213233 | 0.003034 |
| GAIRO | 0.213016 | 0.003058 |
| VILLAGRANDE | 0.212972 | 0.003001 |
| LOCERI | 0.212967 | 0.002998 |
| BARISARDO | 0.212886 | 0.003045 |
| LANUSEI | 0.212836 | 0.003016 |
| ILBONO | 0.212805 | 0.003009 |
| NUORO | 0.212759 | 0.002978 |
| TORTOLI | 0.212545 | 0.002991 |
| ORISTANO | 0.212294 | 0.00298 |
| CAMPIDANO | 0.212063 | 0.00297 |
| CAGLIARI | 0.211977 | 0.002964 |
| SASSARI | 0.211816 | 0.002984 |
| CARBONIA | 0.211571 | 0.002993 |
| OLBIATEMPIO | 0.211258 | 0.002996 |

f3(San; Loschbour, Sardinian)

| Population | f3 statistic | Standard error |
|---|---|---|
| GAIRO | 0.209925 | 0.002977 |
| LOCERI | 0.209595 | 0.002983 |
| LANUSEI | 0.209496 | 0.002949 |
| ARZANA | 0.209433 | 0.002973 |
| NUORO | 0.209379 | 0.002937 |
| BARISARDO | 0.209214 | 0.002992 |
| VILLAGRANDE_STRISAILI | 0.209143 | 0.002979 |
| SarHGDPb | 0.208838 | 0.002977 |
| TORTOLI | 0.208826 | 0.002936 |
| ILBONO | 0.208738 | 0.00296 |
| SarHGDPa | 0.208532 | 0.002991 |
| OLBIATEMPIO | 0.208516 | 0.002933 |
| ORISTANO | 0.208376 | 0.002944 |
| SASSARI | 0.208286 | 0.00294 |
| CAGLIARI | 0.208146 | 0.002923 |
| CARBONIA | 0.20774 | 0.002939 |
| CAMPIDANO | 0.207687 | 0.002919 |

Table S5: Measure of shared drift between each Sardinian subpopulation and the Neolithic European farmer (Stuttgart) and the pre-Neolithic hunter-gatherer (Loschbour).

Table S6: An Excel spreadsheet listing information on all populations used.

[Figure S1, panel A: admixture bar plots at K = 2 and K = 3; y-axis: Ancestry (0.0 to 1.0); x-axis: populations Ilb, Ori, Tor, Arz, Bar, Gai, Olb, Loc, Car, Lan, Vgs, Sas, Nuo, Cag, HGa, HGb, Cmp]

[Figure S1, panel B: pairwise Fst × 1000; cell values of the symmetric matrix, lower triangle shown]

| | ARZANA | LOCERI | BARISARDO | LANUSEI | ILBONO | SarHGa | TORTOLI | GAIRO | VILLAGRANDE |
|---|---|---|---|---|---|---|---|---|---|
| LOCERI | 4.8 | | | | | | | | |
| BARISARDO | 5.2 | 0.3 | | | | | | | |
| LANUSEI | 5.4 | 1.5 | 2.7 | | | | | | |
| ILBONO | 5.4 | 2.9 | 3.6 | 3.9 | | | | | |
| SarHGa | 5.4 | 0 | 1.5 | 1.4 | 3.4 | | | | |
| TORTOLI | 6 | 1.7 | 2.2 | 3.4 | 4.4 | 1.6 | | | |
| GAIRO | 6.6 | 1.6 | 2.5 | 2.6 | 5.2 | −0.4 | 2.1 | | |
| VILLAGRANDE | 6.7 | 4.1 | 4.2 | 4.8 | 5.9 | 3.7 | 3.8 | 4.3 | |
| SarHGb | 9.2 | 3.5 | 4.3 | 6.5 | 7.6 | 2.2 | 3 | 4.4 | 6.8 |

Figure S1: Relationship between the HGDP Sardinians and the newly sequenced Sardinian dataset. The HGDP Sardinians are recorded as being collected from the Gennargentu region (A Piazza, personal communication). Consistent with this, the HGDP Sardinians show close affinity to the Ogliastra individuals. (A) Admixture analysis at K = 2 shows that the HGDP Sardinians, here divided into HGa and HGb (see Figure S2), cluster with Sardinians from Ogliastra. At K = 3 the HGDP Sardinians stand out from the rest of the Sardinians, likely reflecting differences in data generation. (B) Fst matrix within Sardinia, including the HGDP Sardinians (SarHGa and SarHGb).
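The percentile intervals reported with Table S3 (5th and 95th percentiles over 100 bootstrap runs, plus the share of replicates in which the Sardinian divergence time is the larger one) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the linear-interpolation percentile rule, the pairing of replicates by run, and the randomly generated replicate values are all assumptions made for the example.

```python
import random


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty sequence."""
    xs = sorted(values)
    if len(xs) == 1:
        return xs[0]
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def percentile_interval(replicates, lower=5, upper=95):
    """Return the (lower, upper) percentile bounds of bootstrap replicates."""
    return percentile(replicates, lower), percentile(replicates, upper)


def fraction_greater(a, b):
    """Share of paired replicates in which the value in a exceeds the value in b."""
    pairs = list(zip(a, b))
    return sum(x > y for x, y in pairs) / len(pairs)


# Hypothetical replicate values, generated only to exercise the functions.
# They are NOT the study's bootstrap results.
rng = random.Random(0)
div_time_a = [rng.gauss(300, 80) for _ in range(100)]
div_time_b = [rng.gauss(220, 80) for _ in range(100)]

ci_a = percentile_interval(div_time_a)
share_a_earlier = fraction_greater(div_time_a, div_time_b)
print(f"5th-95th percentile interval: {ci_a[0]:.0f}-{ci_a[1]:.0f}")
print(f"share of replicates with larger value in a: {share_a_earlier:.2f}")
```

With real fastsimcoal2 bootstrap output, the two lists would hold one divergence-time estimate per run for each population, in the same run order.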