Title: Convergence of Y chromosome STR haplotypes from different SNP haplogroups compromises accuracy of haplogroup prediction

Authors: Chuan-Chao Wang1, Ling-Xiang Wang1, Rukesh Shrestha1, Shaoqing Wen1, Manfei Zhang1, Xinzhu Tong1, Li Jin1,2,3, Hui Li1*

Affiliations: 1. State Key Laboratory of Genetic Engineering and MOE Key Laboratory of Contemporary Anthropology, School of Life Sciences and Institutes of Biomedical Sciences, Fudan University, Shanghai 200433, China; 2. CAS-MPG Partner Institute for Computational Biology, SIBS, CAS, Shanghai, China; 3. Institute of Health Sciences, China Medical City, Taizhou, Jiangsu, China. * Correspondence to: [email protected]

Abstract

Short tandem repeats (STRs) and single nucleotide polymorphisms (SNPs) are two kinds of markers commonly used in forensic and population genetics. There has been increasing interest in the cost-saving strategy of using STR haplotypes to predict SNP haplogroups. However, the convergence of Y chromosome STR haplotypes from different haplogroups might compromise the accuracy of haplogroup prediction. Here, we compared worldwide Y chromosome lineages at both the haplogroup level and the haplotype level to search for possible haplotype similarities among haplogroups. We found similar haplotypes between haplogroups B and I2, C1 and E1b1b1, C2 and E1b1a1, H1 and J, L and O3a2c1, O1a and N, O3a1c and O3a2b, and M1 and O3a2; these similarities reduce the accuracy of prediction.

Keywords: Y chromosome, SNP, STR, haplotype, haplogroup prediction

Introduction

The paternally inherited Y chromosome has been widely used in forensics for personal identification, in anthropology to understand the origin and migration of human populations, and in medical and clinical studies. The Y chromosome carries two extremely useful kinds of markers, SNPs and STRs1,2. With a very low mutation rate, on the order of 3.0 × 10⁻⁸ per nucleotide per generation3, SNP markers have been used to construct a robust phylogenetic tree linking all the Y chromosome lineages from world populations4,5. The lineages determined by the pattern of SNPs are called haplogroups; that is, we have to type an appropriate number of SNPs in order to assign a given Y chromosome to a haplogroup. Compared with SNPs, the mutation rates of STR markers are about four to five orders of magnitude higher. For characterizing a Y chromosome, typing STRs saves time and cost compared with typing SNPs6. A set of STR values for an individual is called a haplotype. Because of the disparity in mutation rates between SNPs and STRs, one SNP haplogroup can comprise many STR haplotypes6. Most interestingly, STR variability is clustered more by haplogroup than by population7,8, which indicates that STR haplotypes could be used to infer the haplogroup of a given Y chromosome. There has been increasing interest in this cost-effective strategy of predicting the haplogroup from a given STR haplotype when SNP data are unavailable. For instance, Vadim Urasin's YPredictor (http://predictor.ydna.ru/), Whit Athey's haplogroup predictor (http://www.hprg.com/hapest5/)9,10, and the haplogroup classifier of the University of Arizona11 have been widely employed in previous studies for haplogroup prediction12-15.
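The rate comparison above can be made concrete with a small calculation. The sketch below takes the SNP rate quoted in the text, scales it by four and five orders of magnitude to get an implied range of STR rates, and estimates how often a 17-locus haplotype changes in a single generation. Treating every locus as mutating independently at the same rate is a simplifying assumption made here for illustration, and the function names are hypothetical.

```python
SNP_RATE = 3.0e-8  # per nucleotide per generation, as quoted in the text


def str_rate_range(snp_rate=SNP_RATE):
    """Return (low, high) STR rates, 10^4 to 10^5 times the SNP rate."""
    return snp_rate * 1e4, snp_rate * 1e5


def prob_haplotype_changes(rate_per_locus, n_loci=17):
    """Probability that at least one of n_loci STRs mutates in one generation.

    Assumes independent loci with one shared rate (an illustrative simplification).
    """
    return 1.0 - (1.0 - rate_per_locus) ** n_loci


low, high = str_rate_range()
for rate in (low, high):
    p = prob_haplotype_changes(rate)
    print(f"rate per locus = {rate:.1e}, P(haplotype changes) = {p:.4f}")
```

Under these assumptions a 17-locus haplotype changes in roughly 0.5% to 5% of father-to-son transmissions, which is why many haplotypes accumulate within a single haplogroup.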

Although this approach to haplogroup prediction is correct in principle, there are still ongoing debates about the accuracy of using STR haplotypes for haplogroup assignment16,17. In particular, it has been reported that the number of STRs used in prediction and the available STR-SNP associated reference data have a significant impact on the accuracy of haplogroup prediction11. Furthermore, taking the mutation rates of the STRs and the time depth of the haplogroup ramifications into consideration, it is possible to find the same or similar haplotypes in different haplogroups16. However, the possible bias caused by the convergence of STR haplotypes in haplogroup prediction has not been discussed before. Here, we use a large amount of worldwide Y chromosome SNP and STR data to address this question.

Materials and methods

Altogether, 20,403 Y chromosome samples with informative SNP and STR markers were included in this study (Table S1): unpublished data of 231 East Asian samples from our lab, unpublished data of 101 samples from the Genographic Consortium (haplogroup B), and other data retrieved from the literature4,17-80. We updated the haplogroup names according to the nomenclature of the Y Chromosome Consortium and the ISOGG Y-DNA Haplogroup Tree 2013 (http://www.isogg.org/)4,5,81. Different authors typed different STR markers, which limits the comparability of STR haplotype reference data. Here, we use the seventeen Y chromosomal STRs of the AmpFlSTR® Yfiler™ kit (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS385a, DYS385b, DYS438, DYS439, DYS437, DYS448, DYS456, DYS458, DYS635, and YGATAH4) as the standard.
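Because different studies typed different subsets of these markers, any comparison has to be restricted to loci typed in both samples. The following is a minimal sketch of one way to represent a haplotype over the seventeen loci listed above and to count repeat differences between two samples. The helper names and the repeat counts are hypothetical, and treating DYS385a and DYS385b as two ordinary loci is a simplification assumed for illustration.

```python
# The 17 Yfiler loci, in the order listed in the text.
YFILER_LOCI = (
    "DYS19", "DYS389I", "DYS389II", "DYS390", "DYS391", "DYS392",
    "DYS393", "DYS385a", "DYS385b", "DYS438", "DYS439", "DYS437",
    "DYS448", "DYS456", "DYS458", "DYS635", "YGATAH4",
)


def to_haplotype(record):
    """Turn a {locus: repeat count} dict into a fixed-order tuple.

    Loci that were not typed are stored as None.
    """
    return tuple(record.get(locus) for locus in YFILER_LOCI)


def step_distance(hap_a, hap_b):
    """Return (sum of absolute repeat differences, number of loci compared).

    Only loci typed in both haplotypes are compared.
    """
    pairs = [(a, b) for a, b in zip(hap_a, hap_b)
             if a is not None and b is not None]
    return sum(abs(a - b) for a, b in pairs), len(pairs)


# Hypothetical repeat counts, for illustration only.
sample_1 = to_haplotype({"DYS19": 15, "DYS390": 24, "DYS391": 10})
sample_2 = to_haplotype({"DYS19": 14, "DYS390": 24, "DYS392": 11})
print(step_distance(sample_1, sample_2))
```

Returning the number of compared loci alongside the distance makes it visible when two samples look similar only because few markers overlap.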

Comparisons of STR frequency distributions among haplogroups were performed using the Arlequin ver 3.5 software package (arlecore3513_64bit in Linux)82. The neighbor-joining tree was constructed in MEGA83 (version 5.10) from the matrix of Slatkin's distance (Rst) calculated in Arlequin. A Markov Chain Monte Carlo analysis of haplogroup structure was carried out using the program Structure84 (version 2.3.4). YPredictor v1.5.0 by Vadim Urasin (http://predictor.ydna.ru/) was used for haplogroup prediction.
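To give a feel for what an Rst-type distance measures, here is a simplified variance-based sketch: the share of the total variance in repeat number that lies between two groups rather than within them, summed over loci. This is not Arlequin's implementation, which estimates Rst through an analysis of molecular variance, and the function name is hypothetical. It assumes haploid data with every locus typed in every sample.

```python
from statistics import pvariance


def rst_sketch(group_a, group_b):
    """Simplified Rst-like statistic for two groups of haplotypes.

    Each group is a non-empty list of equal-length tuples of repeat counts.
    Returns (total variance - within-group variance) / total variance,
    with variances summed across loci. Illustrative only.
    """
    n_a, n_b = len(group_a), len(group_b)
    n_loci = len(group_a[0])
    total = 0.0
    within = 0.0
    for i in range(n_loci):
        a = [hap[i] for hap in group_a]
        b = [hap[i] for hap in group_b]
        # Variance of repeat number in the pooled sample.
        total += pvariance(a + b)
        # Size-weighted mean of the two within-group variances.
        within += (n_a * pvariance(a) + n_b * pvariance(b)) / (n_a + n_b)
    if total == 0:
        return 0.0  # no variation at all, so no differentiation
    return (total - within) / total
```

Two groups with identical repeat-count distributions give a value of 0, while two internally uniform groups that differ from each other give 1. Convergent haplotypes in unrelated haplogroups push the value toward 0, which is how such pairs end up as neighbors in a distance-based tree.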

Results and Discussion

Our dataset covers all the main haplogroups and almost all their sublineages in the Y chromosome phylogenetic tree (Fig. 1a, Table S1). Although the high mutation rates of STR markers make it difficult to construct phylogenetic trees, the neighbor-joining tree of the STR data (Fig. 1b) shows a pattern similar to the trunk of the Y chromosome haplogroup tree (Fig. 1a). There are four main branches in the neighbor-joining tree, namely I, II, III, and IV. The three ancient haplogroups C, D, and E were clustered in branch I, representing the out of Africa migration. Haplogroups F, G, H, I, and J were mainly clustered in branch II, which might indicate the possible population expansion in the and . Haplogroups L, T, K, N, and O were clustered in branch III, representing the peopling of the Far East. Haplogroups P, Q, and R were mainly clustered in branch IV, demonstrating the population expansion from Central Asia and the migration through northern Eurasia into the Americas.

Obvious haplogroup divisions were also observed in the neighbor-joining tree, in which haplogroups R1b1+s (+s denotes a haplogroup together with its sublineages), Q1+s, G2a1+s, J2+s, E1b1b1+s, D+s, and E1b1a1+s were each clustered tightly together, demonstrating the specific or even exclusive STR haplotypes of those haplogroups. However, haplogroups A and B were scattered across the tree, probably because of the high diversification among different sublineages of the two oldest haplogroups. Haplogroup C+s was mainly clustered with haplogroups D and E. Haplogroups C1 and C3 showed strong affinity with haplogroup E1b1b1, whereas C2a* and C2a1 tended to cluster with E1b1a1. The three main sublineages of haplogroup O, namely O1, O2, and O3, tended to cluster together. Haplogroup O1a+s showed very strong affinity with haplogroups N* and N1c. Haplogroups O2a1 and O2a1a fell outside the range of STR patterns of haplogroup O. Haplogroups O2b* and O2b1a were clustered with T* and T1a*, and also showed affinity with O3a* and O3a1c*. Haplogroup O3a2+s formed a tight cluster, indicating high similarity between those lineages, although haplogroup L+s was also placed in the O3a2 cluster. Haplogroup P was clustered with haplogroup Q+s in a separate small branch. Haplogroup R1a1 was clustered with E1b1a1 and C2a in branch I, while its sister clade R1b1+s was grouped with R2 and M1b in branch IV.

The neighbor-joining tree based on pairwise comparisons gives an overall clustering pattern of the worldwide haplogroups. However, results at the haplogroup level could be misleading because of the highly diversified STR haplotypes within haplogroups, especially in very ancient lineages. We therefore used the Structure software to show the STR haplotype patterns among haplogroups at the individual level (Fig. 2). We also used YPredictor to infer a haplogroup for each haplotype and then compared the inferred haplogroups with the genotyped haplogroups to estimate the error rates.

The most ancient lineage, A00, also has exclusive STR haplotypes. The haplotypes of haplogroups A1a and A1b1b2b show similarities with those of haplogroups DE and E1b1a1; thus, about 30% of A1a and A1b1b2b samples were mistaken as DE by YPredictor. Haplogroup B+s is probably the most diverse clade, sharing similar haplotypes with various haplogroups, such as I2a1, R1a1, D2a, E1b1b1, and L. In fact, only 18% of haplogroup B samples were correctly inferred; 26% were mistaken as I2 or IJ, and 21% were assigned to haplogroup R by YPredictor. As with haplogroup B, the haplotypes of F*, H*, and K* are also too diverse to be used in haplogroup prediction. Most haplotypes of haplogroup C1 are similar to those of E1b1b1, and 22% of C1 samples were mistaken as E1b1b1 in prediction. Similarly, haplogroup C2 and its sublineages C2a and C2a1 shared most haplotypes with E1b1a1, and therefore 37% of those C2 samples were mistaken as E1b1a1. The haplotype pattern of haplogroups H1 and H1a is very similar to that of haplogroup J, resulting in the erroneous assignment of about 20% of H1 and H1a samples to J. Haplogroups I*, I1*, and I1a1b1 share some haplotypes with haplogroup G+s. The haplotypes of haplogroup L+s are similar to those of haplogroups O3a2c1* and O3a2c1a. The haplotypes of O1a+s bear some similarity to those of haplogroup N. Similarly, haplotypes of haplogroup O3a1c show some similarity to those of O3a2b, and those of M1a and M1b are similar to those of O3a2+s. Such haplotype sharing or similarity among different haplogroups often misleads haplogroup prediction. In contrast, haplogroups D2a, D3a, O2a1, O2b, R, and Q have haplogroup-specific haplotypes and could be predicted from STRs with high accuracy. It is still worth noting that the affinity between haplogroups D and E might cause the two to be confused in some cases.
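The error rates reported above come from comparing each sample's predicted haplogroup with its genotyped haplogroup. A minimal sketch of that tally is shown below. The function name and the toy input are hypothetical and do not reproduce the study's data, and the sketch assumes that predicted and genotyped labels have already been brought to the same level of resolution.

```python
from collections import Counter, defaultdict


def misassignment_rates(pairs):
    """Tabulate predictions per genotyped haplogroup.

    pairs: iterable of (genotyped_haplogroup, predicted_haplogroup).
    Returns {genotyped: {predicted: fraction of that haplogroup's samples}}.
    """
    counts = defaultdict(Counter)
    for genotyped, predicted in pairs:
        counts[genotyped][predicted] += 1
    rates = {}
    for genotyped, predicted_counts in counts.items():
        total = sum(predicted_counts.values())
        rates[genotyped] = {
            predicted: n / total for predicted, n in predicted_counts.items()
        }
    return rates


# Toy input, invented for illustration (not the study's data).
toy_pairs = [
    ("C2", "C2"), ("C2", "C2"), ("C2", "C2"),
    ("C2", "E1b1a1"), ("C2", "E1b1a1"),
    ("Q1", "Q1"),
]
for genotyped, row in misassignment_rates(toy_pairs).items():
    print(genotyped, row)
```

Keeping the full row of predictions for each haplogroup, rather than a single accuracy figure, shows which haplogroup absorbs the misassigned samples.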

The purpose of our paper is not to assess the quality of haplogroup prediction software, as no algorithm could be powerful enough to distinguish identical or very similar haplotypes and assign them to different haplogroups. The convergence of Y chromosome STR haplotypes among different haplogroups compromises the accuracy of haplogroup prediction. For samples with ambiguous STR haplotypes, typing SNPs is the only reliable method of determining the haplogroup.
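The point that identical haplotypes cannot be separated by any predictor can be checked directly in a reference dataset by listing the haplotypes that occur in more than one haplogroup. The sketch below does this for exact matches only. The function name is hypothetical, the haplogroup labels are borrowed from a pair discussed in the text, and the three-locus repeat counts are invented for illustration.

```python
from collections import defaultdict


def shared_haplotypes(samples):
    """Find haplotypes observed in more than one haplogroup.

    samples: iterable of (haplogroup, haplotype), where haplotype is a
    sequence of repeat counts in a fixed locus order.
    Returns {haplotype tuple: sorted list of haplogroups carrying it}.
    """
    groups_by_haplotype = defaultdict(set)
    for haplogroup, haplotype in samples:
        groups_by_haplotype[tuple(haplotype)].add(haplogroup)
    return {
        haplotype: sorted(groups)
        for haplotype, groups in groups_by_haplotype.items()
        if len(groups) > 1
    }


# Invented three-locus haplotypes, for illustration only.
toy_samples = [
    ("C2", (15, 21, 10)),
    ("E1b1a1", (15, 21, 10)),
    ("E1b1a1", (15, 21, 11)),
    ("Q1", (13, 24, 10)),
]
print(shared_haplotypes(toy_samples))
```

Any haplotype returned by this check is one for which STR-based prediction is ambiguous by construction, so those samples are natural candidates for SNP typing. Near matches would need a distance threshold, which this sketch does not attempt.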

Fig.1 (a) The trunk of the Y chromosome haplogroup tree; (b) Neighbor-joining tree of 146 Y chromosome haplogroups based on Rst distance of 10 commonly used STRs (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, and DYS439).

Fig.2 STR structure (K=14) of 146 Y chromosome haplogroups using 10 STRs as in Fig.1b

Acknowledgments

This work was supported by the National Excellent Youth Science Foundation of China (31222030), National Natural Science Foundation of China (31071098, 91131002), Shanghai Rising-Star Program (12QA1400300), Shanghai Commission of Education Research Innovation Key Project (11zz04), and Shanghai Professional Development Funding (2010001).

Supplementary Material

Table S1. Y SNP and STR dataset

References

1. Wang CC, Li H (2013) Inferring human history in East Asia from Y chromosomes. Investig Genet 4:11.

2. Hughes JF, Rozen S (2012) and genetics of human and primate Y chromosomes. Annu Rev Genomics Hum Genet 13:83-108. 3. Xue Y, Wang Q, Long Q, Ng BL, Swerdlow H, Burton J, Skuce C, Taylor R, Abdellah Z, Zhao Y, Asan, MacArthur DG, Quail MA, Carter NP, Yang H, Tyler-Smith C (2009) Human Y chromosome base-substitution measured by direct sequencing in a deep-rooting pedigree. Curr Biol 19(17):1453-1457. 4. Y Chromosome Consortium (2002) A nomenclature system for the tree of human Y-chromosomal binary haplogroups. Genome Res 12(2):339-348. 5. Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL, Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830–838. 6. Wang CC, Yan S, Li H (2010) Surnames and the Y chromosomes. Commun. Contemp. Anthropol 4:26-33. 7. Bosch E, Calafell F, Santos F et al (1999) Variation in short tandem repeats is deeply structured by genetic background on the human Y chromosome. Am J Hum Genet 65:1623–1638. 8. Behar DM, Garrigan D, Kaplan M et al (2004) Contrasting patterns of the Y chromosome variation in Ashkenazi Jewish and host non-Jewish European populations. Hum Genet 114:354–365. 9. Athey WT (2005) Haplogroup prediction from Y-STR values using an allele-frequency approach. J Genet Geneal 1:1–7. 10. Athey WT (2006) Haplogroup prediction from Y-STR values using a Bayesian-allele-frequency approach. J Genet Geneal 2:34–39. 11. Schlecht J, Kaplan ME, Barnard K, Karafet T, Hammer MF, Merchant NC (2008) Machine-Learning approaches for classifying Haplogroup from Y Chromosome STR data. PloS Comput Biol 4(6):e1000093. 12. Tarlykov PV, Zholdybayeva EV, Akilzhanova AR, Nurkina ZM, Sabitov ZM, Rakhypbekov TK, Ramanculov EM (2013) Mitochondrial and Y-chromosomal profile of the Kazakh population from East . Croat Med J 54(1):17-24. 13. 
Larmuseau MH, Vanderheyden N, Jacobs M, Coomans M, Larno L, Decorte R (2010) Micro-geographic distribution of Y-chromosomal variation in the central-western European region Brabant. Forensic Sci Int Genet 5(2):95-99. 14. Bembea M, Patocs A, Kozma K, Jurca C, Skrypnyk C (2011) Y-chromosome STR haplotype diversity in three ethnically isolated population from North-Western . Forensic Sci Int Genet 5(3):e99-100. 15. Larmuseau MH, Ottoni C, Raeymaekers JA, Vanderheyden N, Larmuseau HF, Decorte R (2012) Temporal differentiation across a West-European Y-chromosomal cline: genealogy as a tool in human population genetics. Eur J Hum Genet 20(4):434-440. 16. Muzzio M, Ramallo V, Motti JM, Santos MR, López Camelo JS, Bailliet G (2011) Software for Y-haplogroup predictions: a word of caution. Int J Legal Med 125(1):143-147. 17. Athey W (2011) Comments on the article, "Software for Y haplogroup predictions, a word of caution". Int J Legal Med 125(6):901-903. 18. Mendez FL, Krahn T, Schrack B, Krahn AM, Veeramah KR, Woerner AE, Fomine FL, Bradman N, Thomas MG, Karafet TM, Hammer MF (2013) An African American paternal lineage adds an extremely ancient root to the human Y chromosome . Am J Hum Genet

92(3):454-459. 19. Rosa A, Ornelas C, Jobling MA, Brehm A, Villems R (2007) Y-chromosomal diversity in the population of Guinea-Bissau: a multiethnic perspective. BMC Evol Biol 7:124. 20. King TE, Jobling MA (2009) Founders, drift, and infidelity: the relationship between Y chromosome diversity and patrilineal surnames. Mol Biol Evol 26(5):1093-1102. 21. Tishkoff SA, Gonder MK, Henn BM, Mortensen H, Knight A, Gignoux C, Fernandopulle N, Lema G, Nyambo TB, Ramakrishnan U, Reed FA, Mountain JL (2007) History of click-speaking populations of Africa inferred from mtDNA and Y chromosome genetic variation. Mol Biol Evol 24(10):2180-2195. 22. Arredi B, Poloni ES, Paracchini S, Zerjal T, Fathallah DM, Makrelouf M, Pascali VL, Novelletto A, Tyler-Smith C (2004) A predominantly neolithic origin for Y-chromosomal DNA variation in North Africa. Am J Hum Genet 75(2):338-345. 23. Järve M, Zhivotovsky LA, Rootsi S, Help H, Rogaev EI, Khusnutdinova EK, Kivisild T, Sanchez JJ (2009) Decreased rate of in Y chromosome STR loci of increased size of the repeat unit. PLoS One 4(9):e7276. 24. Haber M, Platt DE, Ashrafian Bonab M, Youhanna SC, Soria-Hernanz DF, Martínez-Cruz B, Douaihy B, Ghassibe-Sabbagh M, Rafatpanah H, Ghanbari M, Whale J, Balanovsky O, Wells RS, Comas D, Tyler-Smith C, Zalloua PA; Genographic Consortium (2012) 's ethnic groups share a Y-chromosomal heritage structured by historical events. PLoS One 7(3):e34288. 25. Coelho M, Sequeira F, Luiselli D, Beleza S, Rocha J (2009) On the edge of Bantu expansions: mtDNA, Y chromosome and lactase persistence genetic variation in southwestern Angola. BMC Evol Biol 9:80. 26. Batini C, Ferri G, Destro-Bisol G, Brisighelli F, Luiselli D, Sánchez-Diz P, Rocha J, Simonson T, Brehm A, Montano V, Elwali NE, Spedini G, D'Amato ME, Myres N, Ebbesen P, Comas D, Capelli C (2011) Signatures of the preagricultural peopling processes in sub-Saharan Africa as revealed by the phylogeography of early Y chromosome lineages. 
Mol Biol Evol 28(9):2603-2613. 27. Coelho M, Sequeira F, Luiselli D, Beleza S, Rocha J (2009) On the edge of Bantu expansions: mtDNA, Y chromosome and lactase persistence genetic variation in southwestern Angola. BMC Evol Biol 9:80. 28. Gomes V, Sánchez-Diz P, Amorim A, Carracedo A, Gusmão L (2010) Digging deeper into East African human Y chromosome lineages. Hum Genet 127(5):603-613. 29. Berniell-Lee G, Calafell F, Bosch E, Heyer E, Sica L, Mouguiama-Daouda P, van der Veen L, Hombert JM, Quintana-Murci L, Comas D (2009) Genetic and demographic implications of the Bantu expansion: insights from human paternal lineages. Mol Biol Evol 26(7):1581-1589. 30. Pijpe J, de Voogt A, van Oven M, Henneman P, van der Gaag KJ, Kayser M, de Knijff P (2013) Indian ocean crossroads: Human genetic origin and population structure in the maldives. Am J Phys Anthropol 151(1):58-67. 31. Park MJ, Lee HY, Yang WI, Shin KJ (2012) Understanding the Y chromosome variation in --relevance of combined haplogroup and haplotype analyses. Int J Legal Med 126(4):589-599. 32. Roewer L, Willuweit S, Krüger C, Nagy M, Rychkov S, Morozowa I, Naumova O, Schneider Y, Zhukova O, Stoneking M, Nasidze I (2008) Analysis of Y chromosome STR haplotypes in the European part of reveals high diversities but non-significant genetic distances between populations. Int J Legal Med 122(3):219-223.

33. Pamjav H, Zalán A, Béres J, Nagy M, Chang YM (2011) Genetic structure of the paternal lineage of the Roma people. Am J Phys Anthropol 145(1):21-29. 34. Zalloua PA, Xue Y, Khalife J, Makhoul N, Debiane L, Platt DE, Royyuru AK, Herrera RJ, Hernanz DF, Blue-Smith J, Wells RS, Comas D, Bertranpetit J, Tyler-Smith C; Genographic Consortium (2008) Y-chromosomal diversity in is structured by recent historical events. Am J Hum Genet 82(4):873-882. 35. Kim SH, Kim KC, Shin DJ, Jin HJ, Kwak KD, Han MS, Song JM, Kim W, Kim W (2011) High frequencies of Y-chromosome haplogroup O2b-SRY465 lineages in Korea: a genetic perspective on the peopling of Korea. Investig Genet 2(1):10. 36. Delfin F, Myles S, Choi Y, Hughes D, Illek R, van Oven M, Pakendorf B, Kayser M, Stoneking M (2012) Bridging near and remote : mtDNA and NRY variation in the Solomon Islands. Mol Biol Evol 29(2):545-564. 37. Mizuno N, Kitayama T, Fujii K, Nakahara H, Yoshida K, Sekiguchi K, Yonezawa N, Nakano M, Kasai K (2010) A forensic method for the simultaneous analysis of biallelic markers identifying Y chromosome haplogroups inferred as having originated in Asia and the Japanese archipelago. Forensic Sci Int Genet 4(2):73-79. 38. Malyarchuk BA, Derenko M, Denisova G (2012) On the Y-chromosome haplogroup C3c classification. J Hum Genet 57(10):685-686. 39. Roewer L, Nothnagel M, Gusmão L, Gomes V, González M, Corach D, Sala A, Alechine E, Palha T, Santos N, Ribeiro-Dos-Santos A, Geppert M, Willuweit S, Nagy M, Zweynert S, Baeta M, Núñez C, Martínez-Jarreta B, González-Andrade F, Fagundes de Carvalho E, da Silva DA, Builes JJ, Turbón D, Lopez Parra AM, Arroyo-Pardo E, Toscanini U, Borjas L, Barletta C, Ewart E, Santos S, Krawczak M (2013) Continent-wide decoupling of y-chromosomal genetic variation from language and geography in native South americans. PLoS Genet 9(4):e1003460. 40. 
Lacau H, Gayden T, Regueiro M, Chennakrishnaiah S, Bukhari A, Underhill PA, Garcia-Bertrand RL, Herrera RJ (2012) Afghanistan from a Y-chromosome perspective. Eur J Hum Genet 20(10):1063-1070. 41. Adamov DS, Volkov VG (2012) About C3d (M407) haplotype of the Teleut clan Choros. Russian J of Genet Genealogy 5(1): 17-21. 42. Gayden T, Cadenas AM, Regueiro M, Singh NB, Zhivotovsky LA, Underhill PA, Cavalli-Sforza LL, Herrera RJ (2007) The as a directional barrier to flow. Am J Hum Genet 0(5):884-894. 43. Nonaka I, Minaguchi K, Takezaki N (2007) Y-chromosomal binary haplogroups in the Japanese population and their relationship to 16 Y-STR polymorphisms. Ann Hum Genet 71(Pt 4):480-495. 44. Karachanak S, Grugni V, Fornarino S, Nesheva D, Al-Zahery N, Battaglia V, Carossa V, Yordanov Y, Torroni A, Galabov AS, Toncheva D, Semino O (2013) Y-chromosome diversity in modern Bulgarians: new clues about their ancestry. PLoS One 8(3):e56779. 45. Robino C, Crobu F, Di Gaetano C, Bekada A, Benhamamouch S, Cerutti N, Piazza A, Inturri S, Torre C (2008) Analysis of Y-chromosomal SNP haplogroups and STR haplotypes in an Algerian population sample. Int J Legal Med 122(3):251-255. 46. Sanchez-Faddeev H, Pijpe J, van der Hulle T, Meij HJ, J van der Gaag K, Slagboom PE, Westendorp RG, de Knijff P (2013) The influence of clan structure on the genetic variation in a single Ghanaian village. Eur J Hum Genet, doi: 10.1038/ejhg.2013.12. [Epub ahead of print]

47. Simms TM, Wright MR, Hernandez M, Perez OA, Ramirez EC, Martinez E, Herrera RJ (2012) Y-chromosomal diversity in Haiti and Jamaica: contrasting levels of sex-biased gene flow. Am J Phys Anthropol 148(4):618-631. 48. Onofri V, Alessandrini F, Turchi C, Fraternale B, Buscemi L, Pesaresi M, Tagliabracci A (2007) Y-chromosome genetic structure in sub-Apennine populations of Central by SNP and STR analysis. Int J Legal Med 121(3):234-237. 49. Regueiro M, Rivera L, Damnjanovic T, Lukovic L, Milasin J, Herrera RJ (2012) High levels of Paleolithic Y-chromosome lineages characterize Serbia. Gene 498(1):59-67. 50. Mielnik-Sikorska M, Daca P, Woźniak M, Malyarchuk BA, Bednarek J, Dobosz T, Grzybowski T (2013) Genetic data from Y chromosome STR and SNP loci in Ukrainian population. Forensic Sci Int Genet 7(1):200-203. 51. Ambrosio B, Dugoujon JM, Hernández C, De La Fuente D, González-Martín A, Fortes-Lima CA, Novelletto A, Rodríguez JN, Calderón R (2010) The Andalusian population from Huelva reveals a high diversification of Y-DNA paternal lineages from haplogroup E: Identifying human male movements within the Mediterranean space. Ann Hum Biol 37(1):86-107. 52. Völgyi A, Zalán A, Szvetnik E, Pamjav H (2009) Hungarian population data for 11 Y-STR and 49 Y-SNP markers. Forensic Sci Int Genet 3(2):e27-28. 53. López-Parra AM, Gusmão L, Tavares L, Baeza C, Amorim A, Mesa MS, Prata MJ, Arroyo-Pardo E (2009) In search of the pre- and post-neolithic genetic substrates in Iberia: evidence from Y-chromosome in Pyrenean populations. Ann Hum Genet 73(1):42-53. 54. Varzari A, Kharkov V, Nikitin AG, Raicu F, Simonova K, Stephan W, Weiss EH, Stepanov V (2013) Paleo-Balkan and Slavic contributions to the genetic pool of Moldavians: insights from the Y chromosome. PLoS One 8(1):e53731. 55. 
King RJ, Di Cristofaro J, Kouvatsi A, Triantaphyllidis C, Scheidel W, Myres NM, Lin AA, Eissautier A, Mitchell M, Binder D, Semino O, Novelletto A, Underhill PA, Chiaroni J (2011) The coming of the to Provence and Corsica: Y-chromosome models of archaic Greek colonization of the western Mediterranean. BMC Evol Biol 11:69. 56. Rootsi S, Myres NM, Lin AA, Järve M, King RJ, Kutuev I, Cabrera VM, Khusnutdinova EK, Varendi K, Sahakyan H, Behar DM, Khusainova R, Balanovsky O, Balanovska E, Rudan P, Yepiskoposyan L, Bahmanimehr A, Farjadian S, Kushniarevich A, Herrera RJ, Grugni V, Battaglia V, Nici C, Crobu F, Karachanak S, Hooshiar Kashani B, Houshmand M, Sanati MH, Toncheva D, Lisa A, Semino O, Chiaroni J, Di Cristofaro J, Villems R, Kivisild T, Underhill PA (2012) Distinguishing the co-ancestries of haplogroup G Y-chromosomes in the populations of Europe and the . Eur J Hum Genet 20(12):1275-1282. 57. Basu A, Mukherjee N, Roy S, Sengupta S, Banerjee S, Chakraborty M, Dey B, Roy M, Roy B, Bhattacharyya NP, Roychoudhury S, Majumder PP (2003) Ethnic India: a genomic view, with special reference to peopling and structure. Genome Res 13(10):2277-2290. 58. Sengupta S, Zhivotovsky LA, King R, Mehdi SQ, Edmonds CA, Chow CE, Lin AA, Mitra M, Sil SK, Ramesh A, Usha Rani MV, Thakur CM, Cavalli-Sforza LL, Majumder PP, Underhill PA (2006) Polarity and temporality of high-resolution y-chromosome distributions in India identify both indigenous and exogenous expansions and reveal minor genetic influence of Central Asian pastoralists. Am J Hum Genet 8(2):202-221. 59. Klarić IM, Salihović MP, Lauc LB, Zhivotovsky LA, Rootsi S, Janićijević B (2009) Dissecting the molecular architecture and origin of Bayash Romani patrilineages: genetic influences from

South-Asia and the Balkans. Am J Phys Anthropol 138(3):333-342. 60. Rai N, Chaubey G, Tamang R, Pathak AK, Singh VK, Karmin M, Singh M, Rani DS, Anugula S, Yadav BK, Singh A, Srinivasagan R, Yadav A, Kashyap M, Narvariya S, Reddy AG, van Driem G, Underhill PA, Villems R, Kivisild T, Singh L, Thangaraj K (2012) The phylogeography of Y-chromosome haplogroup h1a1a-m82 reveals the likely Indian origin of the European Romani populations. PLoS One 7(11):e48477. 61. Gusmão A, Gusmão L, Gomes V, Alves C, Calafell F, Amorim A, Prata MJ (2008) A perspective on the history of the Iberian gypsies provided by phylogeographic analysis of Y-chromosome lineages. Ann Hum Genet. 72(Pt 2):215-227. 62. Regueiro M, Stanojevic A, Chennakrishnaiah S, Rivera L, Varljen T, Alempijevic D, Stojkovic O, Simms T, Gayden T, Herrera RJ (2011) Divergent patrilineal signals in three Roma populations. Am J Phys Anthropol 144(1):80-91. 63. Thangaraj K, Naidu BP, Crivellaro F, Tamang R, Upadhyay S, Sharma VK, Reddy AG, Walimbe SR, Chaubey G, Kivisild T, Singh L (2010) The influence of natural barriers in shaping the genetic structure of Maharashtra populations. PLoS One 5(12):e15283. 64. Chiaroni J, King RJ, Myres NM, Henn BM, Ducourneau A, Mitchell MJ, Boetsch G, Sheikha I, Lin AA, Nik-Ahd M, Ahmad J, Lattanzi F, Herrera RJ, Ibrahim ME, Brody A, Semino O, Kivisild T, Underhill PA (2010) The emergence of Y-chromosome haplogroup J1e among -speaking populations. Eur J Hum Genet 18(3):348-353. 65. Qamar R, Ayub Q, Mohyuddin A, Helgason A, Mazhar K, Mansoor A, Zerjal T, Tyler-Smith C, Mehdi SQ (2002) Y-chromosomal DNA variation in . Am J Hum Genet 70(5):1107-1124. 66. Mirabal S, Regueiro M, Cadenas AM, Cavalli-Sforza LL, Underhill PA, Verbenko DA, Limborska SA, Herrera RJ (2009) Y-chromosome distribution within the geo-linguistic landscape of northwestern Russia. Eur J Hum Genet 17(10):1260-1273. 67. 
Rootsi S, Zhivotovsky LA, Baldovic M, Kayser M, Kutuev IA, Khusainova R, Bermisheva MA, Gubina M, Fedorova SA, Ilumäe AM, Khusnutdinova EK, Voevoda MI, Osipova LP, Stoneking M, Lin AA, Ferak V, Parik J, Kivisild T, Underhill PA, Villems R (2007) A counter-clockwise northern route of the Y-chromosome haplogroup N from towards Europe. Eur J Hum Genet 15(2):204-211. 68. Cai X, Qin Z, Wen B, Xu S, Wang Y, Lu Y, Wei L, Wang C, Li S, Huang X, Jin L, Li H; Genographic Consortium (2011) Human migration through bottlenecks from Southeast Asia into during revealed by Y chromosomes. PLoS One 6(8):e24282. 69. Gan RJ, Pan SL, Mustavich LF, Qin ZD, Cai XY, Qian J, Liu CW, Peng JH, Li SL, Xu JS, Jin L, Li H; Genographic Consortium (2008) Pinghua population as an exception of 's coherent genetic structure. J Hum Genet 53(4):303-313. 70. Lu Y, Pan SL, Qin SM, Qin ZD, Wang CC, Gan RJ, Li H, the Genographic Consortium (2013) Genetic evidence for the multiple origins of Pinghua Chinese. J Syst Evol 51 (3): 271–279. 71.Chaubey G, Metspalu M, Choi Y, Mägi R, Romero IG, Soares P, van Oven M, Behar DM, Rootsi S, Hudjashov G, Mallick CB, Karmin M, Nelis M, Parik J, Reddy AG, Metspalu E, van Driem G, Xue Y, Tyler-Smith C, Thangaraj K, Singh L, Remm M, Richards MB, Lahr MM, Kayser M, Villems R, Kivisild T (2011) Population genetic structure in Indian Austroasiatic speakers: the role of landscape barriers and sex-specific admixture. Mol Biol Evol 28(2):1013-1024. 72. Kim SH, Han MS, Kim W, Kim W (2010) Y chromosome homogeneity in the Korean population.

Int J Legal Med 124(6):653-657. 73. Wang CC, Yan S, Qin ZD, Lu Y, Ding QL, Wei LH, Li SL, Yang YJ, Jin L, Li H, the Genographic Consortium (2013) Late Neolithic expansion of ancient Chinese revealed by Y chromosome haplogroup O3a1c-002611. J Syst Evol 51 (3): 280–286. 74. Malyarchuk B, Derenko M, Denisova G, Maksimov A, Wozniak M, Grzybowski T, Dambueva I, Zakharov I (2011) Ancient links between Siberians and Native Americans revealed by subtyping the Y chromosome haplogroup Q1a. J Hum Genet 56(8):583-588. 75. Balaresque P, Bowden GR, Adams SM, Leung HY, King TE, Rosser ZH, Goodwin J, Moisan JP, Richard C, Millward A, Demaine AG, Barbujani G, Previderè C, Wilson IJ, Tyler-Smith C, Jobling MA (2010) A predominantly neolithic origin for European paternal lineages. PLoS Biol 8(1):e1000285. 76. Busby GB, Brisighelli F, Sánchez-Diz P, Ramos-Luis E, Martinez-Cadenas C, Thomas MG, Bradley DG, Gusmão L, Winney B, Bodmer W, Vennemann M, Coia V, Scarnicci F, Tofanelli S, Vona G, Ploski R, Vecchiotti C, Zemunik T, Rudan I, Karachanak S, Toncheva D, Anagnostou P, Ferri G, Rapone C, Hervig T, Moen T, Wilson JF, Capelli C (2012) The peopling of Europe and the cautionary tale of Y chromosome lineage R-M269. Proc Biol Sci 279(1730):884-892. 77. King RJ, Di Cristofaro J, Kouvatsi A, Triantaphyllidis C, Scheidel W, Myres NM, Lin AA, Eissautier A, Mitchell M, Binder D, Semino O, Novelletto A, Underhill PA, Chiaroni J (2011) The coming of the Greeks to Provence and Corsica: Y-chromosome models of archaic Greek colonization of the western Mediterranean. BMC Evol Biol 11:69. 78. Dulik MC, Osipova LP, Schurr TG (2011) Y-chromosome variation in Altaian reveals a common paternal gene pool for Kazakhs and the influence of Mongolian expansions. PLoS One 6(3):e17548. 79. Deng QY, Wang CC, Wang XQ, Wang LX, Wang ZY, Wu WJ, Li H, the Genographic Consortium (2013) Genetic affinity between the Kam-Sui speaking Chadong and Mulam people. J Syst Evol 51(3): 263–270. 80. 
Li DN, Wang CC, Yang K, Qin ZD, Lu Y, Lin XJ, Li H (2013) Substitution of indigenous genetic lineage in the Utsat people, exiles of the Champa kingdom. J Syst Evol 51(3): 287–294. 81. Yan S, Wang CC, Li H, Li SL, Jin L, Genographic Consortium (2011) An updated tree of Y-chromosome Haplogroup O and revised phylogenetic positions of mutations P164 and PK4. Eur J Hum Genet 19(9):1013-1015. 82. Excoffier L, Lischer HE (2010) Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol Ecol Resour 10(3):564-567. 83. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S (2011) MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol 28(10):2731-2739. 84. Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155(2):945-959.
