average +2SD −2SD sdThres nCap Y chromosome no rmalized c overage in the region 28457493 − 28806243 b p, sdThres= 0.4 nCap= 25 1 0 −1 −2
log10 C overage −3
Y chromosome no rmalized c overage in the region 28457493 − 28806243 b p, sdThres= 0.8 nCap= 80 1 0 −1 −2
log10 C overage −3 28.46 28.54 28.62 28.7 28.78 Y chromosome position (Millions of base pairs)
Figure S1. An example of a Y chromosome region filtering by local unique sequence cov- erage (UC). Combinations of filtering parameters (sdThres/nCap) applied to the same chrY region – upper panel 0.4/25; lower panel 0.8/80. In a sample of 307 Y chromosomes we nor- malized each individual’s average UC in 1000 base pair (bp) windows (sliding by 50 bps) to that of the individual’s average UC on chrX, then calculated the log10 average and standard deviation (SD) across all individuals. We then established two parameters for excluding sequence stretches of poor unique coverage: 1) sdThres – wide SD margins (exceeding a set value) indicate that coverage varies between individuals in the region; 2) nCap – very low or very high coverage (+2 SD < 0 or -2 SD > 0) in a set number of consecutive windows suggests a deletion/duplication in the region. Figure S2. The effect of different combinations of nCap/sdThres on the average number of sites retained. The effect of different combinations of nCap/sdThres (filter b) on the sequence length (average number of sites) retained. Within 10.79 MB of chrY regions we applied a re-mapping filter, removed individual bases with less than 5x unique coverage and then tested the effect of nCap/sdThres combina- tions on the average number of nucleotides in the sample of 307 Ychromosomes for which coverage data was available. The combination of these two parameters reaches a plateau approximately at nCap=75/sdThres=0.75 where the gains in retain- ing the sequence length from further relaxing the parameters is negligible – the difference between nCap100/sdThres1.0 and nCap300/sdThres 3.0 is only 0.01%. To retain at least 80% of the original data we picked a point between nCap75/sdThres0.75 and nCap100/sdThres1.0 – and made final analyses with the parameters nCap80/sdThres0.8. Dashed line marks the 10.79MB, orange line marks the point at which 80% of the original sequence length was retained. 25 Kyrgyz_Tdj; KyrgzTJ2 24 Kyrgyz_Tdj; KyrgzTJ3 23 Kyrgyz_Tdj; KyrgzTJ1 26 Altaians; Altai4 22 Altaians; Altai3 21 Ishkasim, Tajiks; Ishk2 27 Assyrians; Assyr1 20 Rushan-Vanch, Tajiks; RVnch2 30 Kyrgyz; Kyrgz3 29 Kyrgyz; Kyrgz4 28 Kyrgyz; Kyrgz2 Altaians; Altai5 36 Kazakhs; Kaz1 19 35 West-Bengal; Beng2 34 Gujaratis; GIH_2 37 Gupta, Uttar-Pradesh; Gupta1 18 33 Gujaratis; GIH_4 38 Middle-caste, Orissa; OrMc1 32 Kurmi, Uttar-Pradesh; Kurmi1 31 Ishkasim, Tajiks; Ishk1 Cossacks; Cosk2 17 Iranians; Iran3 42 Circassians; Cirkas3 41 Shugnan, Tajiks; Shugn1 40 Bashkirs; Bashk4 43 Azerbaijanis; Azerb24 39 Rushan-Vanch, Tajiks; RVnch1 Balkars; Balkar2 52 Russians-North; RusPi2 51 Komis; Komi2 50 Russians-West; RusPs2 53 Karelians; Karel2 49 Karelians; Karel1 R1a1’3 48 Ukrainians_west; UkrnW2 16 47 Ukrainians; Ukr7 54 Ukrainians_east; UkrnE1 Russians-Central; RusRz1 46 60 Cossacks_Kuban; CoskK2 59 Hungarians; Hun_3 56 Ukrainians_north; UkrN1 55 58 Ukrainians_east; UkrnE2 57 Mordvins; Mord2 45 Ukrainians_east; UkrnE3 Ukrainians_west; UkrnW3 62 Circassians; Cirkas2 44 Cossacks; Cosk1 61 Tatars; Tatar3 64 Swedes; Swe2 63 Swedes; Swe1 Saami; Saami6 R1 15 75 Colla, Andean-HA; AndT13 74 Colla, Andean-HA; AndT7 73 Germans; Ger3 72 Portugese; Port1 76 Mordvins; Mord1 71 French; Fra1 78 NW-Europeans; CEU_3 70 77 Christian-Arabs-Israel; CArab2 69 Italians; TSI_2 79 NW-Europeans; CEU_4 United-Kingdom, Britans; UK1 R 14 68 NW-Europeans; CEPH_13 67 Ukrainians_west; UkrnW1 82 Armenians; Armen1 81 Assyrians; Assyr3 66 Tabasarans; Tabas17 80 Avars; Avar9 65 83 Mongolians; Mong4 Bashkirs; Bashk3 Tajiks; Tajik1 R1b 87 Brahmin, Uttar-Pradesh; Brahm2 85 Thakur, Uttar-Pradesh; Thak1 R2 86 Gujaratis; GIH_1 84 Santhal, Jharkhand; Santh1 Brahmin, Uttar-Pradesh; Brahm1 98 Colla, Andean-HA; AndS47 96 Colla, Andean-HA; AndS23 P1 97 Colla, Andean-HA; Colla4 13 95 Andean-Cachi; Cachi5 94 Colla, Andean-HA; AndT12 93 Andean-Cachi; Cachi4 100 Wichi, Andean-LA; Wichi3 99 Wichi, Andean-LA; Wichi4 92 Colla, Andean-HA; AndS41 Q1a 91 101 Andean-Cachi; Cachi3 Colla, Andean-HA; AndS11 90 Eskimo; Esk2 104 Kets; Ket1 103 Selkups; Selkp3 12 Q1 89 Q1c 102 Kets; Ket2 P# Uzbek; Uzbk3 105 Croats; croat11 Brahmin, Nepalese; NepBr1 Q 88 110 Koryaks; KrkG55 109 Koryaks; KrkG11 108 Koryaks; KrkG13 107 Koryaks; KrkG14 11 106 Murut; Murut19 112 Turkmens; TrkUzb2 MR 111 Turkmens; TrkUzb1 Q2 Turkmens; TrkUzb3 Agta; Agta1 116 Bajo; Bajo17 114 Lebbo; Lebb2 115 Aeta; Aeta2 113 Aeta; Aeta1 118 Kosipe; Kosip1 117 Kosipe; Kosip3 Kosipe; Kosip2 137 Buryats; Bur336 MS 136 Buryats; Bur406 135 Kazakhs; Kaz3 134 Mongolians; Mong6 133 Mongolians; Mong1 Mongolians; Mong2 131 127 Koryaks; KrkG233 130 Chukchis; Chuk5 129 Eskimo; Esk3 128 Koryaks; KrkG152 126 132 Chukchis; Chuk9 Koryaks; KrkG294 140 Karelians; Karel3 139 Estonians; Est17 138 Vepsas; Veps2 Cossacks_Kuban; CoskK1 125 146 Latvians; Lat2 145 Estonians; Est2 144 Russians-West; RusPs1 143 Latvians; Lat3 142 Evens_Magadan; Evn31 10 149 Estonians; Est6 147 Estonians; Est21 NR 124 141 148 Saami; Saami5 Saami; Saami4 Russians-North; RusPi1 153 Yakuts; YakM1 152 Yakuts_Sakha; YakS8 N3a 123 151 Yakuts_Kr; YakK3 150 Evens_Sakha; EvenS2 Lebanese; Lebn1 N3# 122 156 Maris; Mari1 155 Udmurds; Udmrd4 154 Udmurds; Udmrd2 Maris; Mari2 Shor; Shor1 Africa 162 Evenks; Evk55 161 Yakuts_Kr; YakK4 121 160 Mongolians; Mong5 163 Tuvinians; Tuva2 159 Tuvinians; Tuva1 165 Forest-Nenets; NenetF3 N# 164 Tundra-Nenets; NentT2 Near–East 158 Tundra-Nenets; NentT1 120 N2a1 157 Evens_Sakha; EvenS3 167 Maris; Mari3 166 Vepsas; Veps4 K# Udmurds; Udmrd3 9 N4 Han; CHB_4 Europe 174 Dusun; Dusun12 172 Dusun; Dusun7 173 Lebbo; Lebb1 171 Gond, Madhya-Pradesh; Gond1 NO 119 176 Vietnamese_south; VietS2 O2a1’2 170 175 Vietnamese_south; VietS1 South–Asia Ho, Jharkhand; Ho1 177 Vietnamese_north; VietN4 Vietnamese_south; VietS3 169 181 Batak; Batak1 180 Murut; Murut13 179 Murut; Murut6 O1 178 Batak; Batak2 Central–Asia Agta; Agta2 182 Agta; Agta3 O 168 189 Burmese; Burm4 188 Burmese; Burm3 187 Burmese; Burm20 O3a1 186 Burmese; Burm8 East & Southeast–Asia 185 Burmese; Burm12 184 Burmese; Burm15 190 Bajo; Bajo21 O3 183 Batak; Batak3 Igorot; Igrt1 192 Tatars; Tatar2 Siberia LT 191 Druze; ISR34 IT 8 Muslim-Arabs-Israel; MArab1 203 Armenians; Armen3 201 Muslim-Arabs-Israel; MArab2 202 Saudi-Arabians; Saudi1 200 Saudi-Arabians; Saudi2 199 Druze; ISR35 Oceania 205 Albanians; Alban3 204 Armenians; Armen2 J1b 198 Jordanians; Jord2 197 Armenians; Armen5 Armenians; Armen4 196 208 Avars; Avar1 Andes J1 J1a 207 Kumyks; Kumk2 195 206 Tabasarans; Tabas4 Lezgins; Lezg2 Azerbaijanis; Azerb13 214 Kyrgyz; Kyrgz1 213 Iranians; Iran1 J 194 212 Kabardins; Kabard3 211 Kapu, Andhra-Pradesh; Kapu1 216 Georgians; Georg2 J2a 210 215 Babtized-Tatars; Kryash4 Assyrians; Assyr4 J2 209 Abkhazians; Abkhaz6 218 Jordanians; Jord1 7 J2b 217 Armenians; Armen7 HT Albanians; Alban2 IJ 193 225 Belarusians; BelaR1 224 Belarusians; BelaR3 223 Chuvashes; Chuv1 222 Lithuanians; Lit2 221 Vepsas; Veps3 227 Croats; croat12 I2’3 220 226 Croats; croat13 Ukrainians; Ukr9 Georgians; Georg1 I 219 231 Mordvins; Mord3 230 Komis; Komi1 229 NW-Europeans; CEPH_15 I1 228 Belarusians; BelaR2 233 Italians; TSI_4 GT 232 Hungarians; Hun2 6 NW-Europeans; CEU_2 240 Dhaka-mixed-popul; Bangl1 238 Kol, Uttar-Pradesh; Kol1 239 Asur, Jharkhand; Asur1 H1a 237 Kshatriya, Uttar-Pradesh; Khsat1 236 Burmese; Burm14 242 Middle-caste, Punjab; PuMc1 235 241 Dhaka-mixed-popul; Bangl2 Balija-Middle-caste, Andhra-Pradesh; Balija1 H 234 Burmese; Burm10 Malayan, Kerala; Mal1 247 North-Ossetians; NOsset2 246 North-Ossetians; NOsset6 245 Abkhazians; Abkhaz5 244 Moldavians; Mold3 G2a 243 Lezgins; Lezg3 249 Kabardins; Kabard4 248 Poles; Pole5 Iranians; Iran2 259 Evens_Magadan; EvenM1 258 Evens_Magadan; EvenM3 CT 257 Evens_Sakha; EvenS1 5 260 Evens_Magadan; EvenM2 256 Evens_Magadan; Evn21 262 Buryats; Bur530 Evenks; Evk14 C3c 255 261 Mongolians; Mong3 265 Koryaks; koryak42 254 264 Koryaks; koryak33 263 Koryaks; Krk119 253 Evenks; Evnk2 Koryaks; Krk116 C3c’h 252 Kazakhs; Kaz2 266 Altaians; Altai6 Koryaks; KrkG3 272 Buryats; Bur383 271 Buryats; Bur578 C3 251 270 Buryats; Bur11 273 Buryats; Bur361 269 Buryats; Bur350 274 Buryats; Bur318 268 Buryats; Bur398 C3f1 267 275 Buryats; Bur2 Buryats; Bur355 C Chinese_PGP, Chinease; Chn1 250 281 Aeta; Aeta3 280 Murut; Murut5 279 Murut; Murut3 282 Lebbo; Lebb3 DT 278 Lebbo; Lebb4 4 277 Low-caste, Madhya-Pradesh; MPlc1 Dusun; Dusun8 276 286 Koinanbe; Koinb3 285 Koinanbe; Koinb1 284 Koinanbe; Koinb2 C2 283 Bajo; Bajo2 C1’7 Bajo; Bajo19 296 Italians; TSI_3 294 United-Kingdom, Britans; UK2 295 Albanians; Alban1 293 Estonians; Est4 292 Mishar-Tatars; TatarM1 E2a 297 Maasai; MKK_1 291 Maasai; MKK_3 290 Iranians; Iran4 299 Christian-Arabs-Israel; CArab3 E2b 298 Christian-Arabs-Israel; CArab1 Sandawe; Sandw2 BT E1’2 300 Sandawe; Sandw3 289 305 Bakola-Pygmies; BakoPy1 3 304 Hadza; Hadza1 303 Baka-Pygmies; BakaPy2 302 Luhya; LWK_4 E# 306 Congo-pygmies; CongPy1 288 Yoruba; YRI_3 E1a 301 310 Sandawe; Sandw1 309 Luhya; LWK_2 308 Luhya; LWK_3 287 307 Bedzan-Pygmies; BedzPy1 Yoruba; YRI_1 AT# Yoruba; YRI_12 312 Chinese_PGP, Chinease; Chn2 2 DE D 311 Tamang, low-caste; NplTA1 Japanese; JPT_1 318 Hadza; Hadza5 316 Hadza; Hadza2 317 Hadza; Hadza3 315 Sandawe; Sandw5 Baka-Pygmies; BakaPy3 1 B4’5 314 319 Baka-Pygmies; BakaPy1 B2’5 313 320 Sandawe; Sandw4 Hadza; Hadza4 Congo-pygmies; CongPy3 A2 321 Congo-pygmies; CongPy6 A00 Ethiopian-Jews; ISR07 Mbos; GRC13292545 322 Mbos; GRC13292546 250000 225000 200000 175000 150000 125000 100000 75000 50000 25000 0
Figure S3. Y chromosome phylogeny based on BEAST analyses. Sample names corre- spond to those reported in Table S6. Age and posterior support estimates for each num- bered clade are reported in Table S7. Six clades that showed significant (p_Delta1 <0.05) phylogenetic asymmetry in SymmeTree analyses are highlighted by # after the haplogroup label. Africa Near East 108 108
105 105 e e N N
107 107
106 106 150 100 50 0 60 40 20 0 Thousands of Years ago Thousands of Years ago Europe Central Asia 108 108
105 105 e e N N
107 107
106 106 60 40 20 0 60 40 20 0 Thousands of Years ago Thousands of Years ago South Asia Southeast & East Asia 108 108
105 105 e e N N
107 107
106 106 60 40 20 0 60 40 20 0 Thousands of Years ago Thousands of Years ago Siberia Andes 108 108
105 105 e e N N
107 107
106 106 60 40 20 0 60 40 20 0 Thousands of Years ago Thousands of Years ago
Figure S4A. NRY and mtDNA Bayesian Skyline Plots for each geographical region. The 95% CI for mtDNA BSPs are represented in red and for NRY in yellow. The BSPs were created using a piecewise-linear coalescence model. 2.0 Africa Andes 1.8 Central Asia Europe 1.6 Near East S-E&E Asia 1.4 Siberia South Asia 1.2 Ne
1.0
0.8 Relative
0.6
0.4
0.2
0.0 012345678910111213 Thousands of Years Ago
Figure S4B. Variation of Y chromosome effective population size over the past 13 kya. The regional NRY BSPs presented in Figure S4A are shown relative to their values at 12 kya. 16 Nf/Nm
12
8
4
0102030405060708090100110120130140150 Thousands of Years Ago
Figure S5 – The temporal dynamics of the ratio of female (Nf) and male (Nm) effective population size in the last 140KY. The ratios of the global accumulative Ne estimates of mtDNA (Nf) and Y chromosome (Nm), as presented in Figure 2, are plotted against the time (in thousands of years) back from the present (0). The BSPs estimates of Ne were obtained in BEAST using a piecewise-linear coalescence model. ) 5 6 Autosomal (predicted) mtDNA 5 NRY
4
3
Effective Population Size (x 10 Size Population Effective 2
1
0 140 120 100 80 60 40 20 0 Thousands of Years Ago
Figure S6 – The temporal dynamics of the autosomal effective population size in the last 140KY as predicted from female (Nf) and male (Nm) effective population sizes. The autosomal estimates were derived applying the 4NfNm/(Nf+Nm) formula on the mtDNA and Y chromosome based estimates. 1 deme, constant, Nm/Nf=1 1 deme, constant, Nm/Nf=0.5 1 deme, constant, Nm/Nf=0.1 Frequency Frequency Frequency 0 100000 250000 0e+00 2e+05 0e+00 2e+05 4e+05 0 200 400 600 800 1000 0 200 400 600 800 1000 0 200 400 600 800 1000
Node age Node age Node age
100 demes, constant, Nm/Nf=1 100 demes, constant, Nm/Nf=0.5 100 demes, constant, Nm/Nf=0.1 Frequency Frequency Frequency 0 100000 250000 0 100000 250000 0e+00 2e+05 0 200 400 600 800 1000 0 200 400 600 800 1000 0 200 400 600 800 1000
Node age Node age Node age
100 demes, exp after, Nm/Nf=1 100 demes, exp after, Nm/Nf=0.5 100 demes, exp after, Nm/Nf=0.1 Frequency Frequency Frequency 0 40000 100000 0 50000 150000 0e+00 4e+04 8e+04 0 200 400 600 800 1000 0 200 400 600 800 1000 0 200 400 600 800 1000
Node age Node age Node age
100 demes, exp, Nm/Nf=1 100 demes, exp, Nm/Nf=0.5 100 demes, exp, Nm/Nf=0.1 Frequency Frequency Frequency 0 20000 60000 0 20000 60000 0 20000 60000
0 200 400 600 800 1000 0 200 400 600 800 1000 0 200 400 600 800 1000
Node age Node age Node age Figure S7 - Distribution of node ages for 1000 coalescent simulation replicates of Y chromosome (red) and mtDNA (blue) lineages under varying scenarios of population growth, deme formation, and effective number of males (Nm) and females (Nf). Plots show histograms of node ages, measured in generations, from fastsimcoal2 coalescent simulations run for four scenarios: Row 1 - one deme with a constant population size; Row 2 - 1 deme divided into 100 equally-sized demes formed 400 generations ago with a constant population size; Row 3 - formation of 100 demes 400 generations ago, with exponential growth starting 200 generations ago; Row 4 - formation of 100 demes 400 generations ago, with exponential growth starting 400 generations ago. The first column shows simulations run, assuming the effective number of males relative to the effective number of females (Nm/Nf) is equal. The second column shows simulations where Nm/Nf = 0.5, and the third column shows simulations with an extreme Nm/Nf=0.1. Nm and Nf are computed assuming a constant autosomal effective size of 10,000. All simulations sample 500 individuals, distributed evenly between demes. 60000 0.00350
H1 H 0.00300 50000
K P 0.00250 E2 IJ mate (years) 40000 O G2a NO G2a1 R I 0.00200 on rate me es me R1 30000 J R2 C J2a J2 R1b Q1 Q 0.00150 N O3 C3
O2a J1 STR muta 20000 R1a2 R1a1 R1a Q1a J1b O3a'b 0.00100 R1a2d O3a1 N3 N2a R1b11 Q2b'c 10000 I2a-L621
STR-based coalescent 0.00050 I1 C3f1 M407 0 0.00000 0 5000 10000 15000 20000 25000 30000 35000 40000 45000 50000 Resequencing-based (BEAST) coalescent me es mate (years)
Figure S8. Comparison of STR and sequence data based coalescent time estimates. STR-based estimates: 21 STR markers, “evolutionary” mutation rate 6.9x10-4 per 25 years (Zhivotovsky et al. 2004). Sequence based estimates: BEAST v.1.8.0, GTR substitution model, lognormal relaxed clock, mutation rate 7.4x10-10 mut/bp/year (SI3). The majority of the STR-based age estimates of “young” haplogroups deviate from the expected linear relationship (black line) with age estimates derived from sequence data. Adjusted STR mutation rates (green line, and the right y-axis) were calculated by 5000 year bins of haplogroup ages assuming linear relationship with sequence data based haplogroup age estimates. The “evolutionary” STR mutation rate that was used to calculate the raw STR- based age estimates is shown as a red line. 8 African Number of coalescent events in clades Time (KYA) African root Non-African involving African and non-African samples 6 70 4
2 60 E
Number of coalescent events Number of coalescent 0 30 35 40 45 50 55 60 65 70 75 52 DC F Time, thousands of years . 50 . K . 47
40
30 D1 D2 D3 D4 D* C3fC3c'h C2 C5 C7a C7b C9 C1 F2 GH1a H1b H1c H2 H4 H3 I J1 J2a J2b L1 L2 TNO1 O2 O3 NO2 K2c K2d S1 S2 S3 M P2 QR
Europe West Asia Caucasus South Asia Northeast Asia SE Asia Oceania Americas
Figure S9. Coalescent times of the oldest Y chromosome haplogroups outside Africa. The tree was generated and coalescent times (presented in thousands of years, KY) calculated in BEAST. Shown are only those 32 branches that had coalescent date older than 30 KY. These branches are shown in solid lines. The plausible branching structure of 11 additional branches that have been reported in the literature is shown in dotted lines. The distribution of relevant clades by geographic regions is presented on the basis of our data and inferences made from literature ignoring cases of the presence of the haplogroup in a region that can be explained by historic gene flow. The distribution of coalescent events in time for African and non-African populations was obtained using a sliding window of 2000 and step size 1000 years. The red and blue line connecting D and E reflects the proposed model of an early back to Africa event involving putative DE* lineages (Hammer et al. 1998). 100 Worldwide Africa Near East Europe South Asia SE&E