View metadata, citation and similar papers at core.ac.uk brought to you by CORE

provided by University of Huddersfield Repository

University of Huddersfield Repository

Fornarino, Simona, Pala, Maria, Battaglia, Vincenza, Maranta, Ramona, Achilli, Alessandro, Modiano, Guido, Torroni, Antonio, Semino, Ornella and Santachiara-Benerecetti, Silvana A

Mitochondrial and Y-chromosome diversity of the Tharus (): a reservoir of genetic variation

Original Citation

Fornarino, Simona, Pala, Maria, Battaglia, Vincenza, Maranta, Ramona, Achilli, Alessandro, Modiano, Guido, Torroni, Antonio, Semino, Ornella and Santachiara-Benerecetti, Silvana A (2009) Mitochondrial and Y-chromosome diversity of the Tharus (Nepal): a reservoir of genetic variation. BMC Evolutionary Biology, 9 (1). p. 154. ISSN 1471-2148

This version is available at http://eprints.hud.ac.uk/15482/

The University Repository is a digital collection of the research output of the University, available on Open Access. Copyright and Moral Rights for the items on this site are retained by the individual author and/or other copyright owners. Users may access full items free of charge; copies of full text items generally can be reproduced, displayed or performed and given to third parties in any format or medium for personal research or study, educational or not-for-profit purposes without prior permission or charge, provided:

• The authors, title and full bibliographic details is credited in any copy; • A hyperlink and/or URL is included for the original metadata page; and • The content is not changed in any way.

For more information, including our policy and submission procedure, please contact the Repository Team at: [email protected].

http://eprints.hud.ac.uk/ BMC Evolutionary Biology BioMed Central

Research article Open Access Mitochondrial and Y-chromosome diversity of the Tharus (Nepal): a reservoir of genetic variation Simona Fornarino1,4, Maria Pala1, Vincenza Battaglia1, Ramona Maranta1, Alessandro Achilli1,2, Guido Modiano3, Antonio Torroni1, Ornella Semino*1 and Silvana A Santachiara-Benerecetti*1

Address: 1Dipartimento di Genetica e Microbiologia, Università di Pavia, 27100 Pavia, , 2Dipartimento di Biologia Cellulare e Ambientale, Università di Perugia, 06123 Perugia, Italy, 3Dipartimento di Biologia, Università di Roma 'Tor Vergata', 00173 Roma, Italy and 4Current address: Human Evolutionary Genetics, CNRS URA 3012, Institut Pasteur, Paris, France Email: Simona Fornarino - [email protected]; Maria Pala - [email protected]; Vincenza Battaglia - [email protected]; Ramona Maranta - [email protected]; Alessandro Achilli - [email protected]; Guido Modiano - [email protected]; Antonio Torroni - [email protected]; Ornella Semino* - [email protected]; Silvana A Santachiara- Benerecetti* - [email protected] * Corresponding authors

Published: 2 July 2009 Received: 22 December 2008 Accepted: 2 July 2009 BMC Evolutionary Biology 2009, 9:154 doi:10.1186/1471-2148-9-154 This article is available from: http://www.biomedcentral.com/1471-2148/9/154 © 2009 Fornarino et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract Background: and the Indian subcontinent represent an area considered as a source and a reservoir for human genetic diversity, with many markers taking root here, most of which are the ancestral state of eastern and western , while others are local. Between these two regions, Terai (Nepal) is a pivotal passageway allowing, in different times, multiple population interactions, although because of its highly malarial environment, it was scarcely inhabited until a few decades ago, when malaria was eradicated. One of the oldest and the largest indigenous people of Terai is represented by the malaria resistant Tharus, whose gene pool could still retain traces of ancient complex interactions. Until now, however, investigations on their genetic structure have been scarce mainly identifying East Asian signatures. Results: High-resolution analyses of mitochondrial-DNA (including 34 complete sequences) and Y-chromosome (67 SNPs and 12 STRs) variations carried out in 173 Tharus (two groups from Central and one from Eastern Terai), and 104 Indians (Hindus from Terai and New Delhi and tribals from Andhra Pradesh) allowed the identification of three principal components: East Asian, West Eurasian and Indian, the last including both local and inter-regional sub-components, at least for the Y chromosome. Conclusion: Although remarkable quantitative and qualitative differences appear among the various population groups and also between sexes within the same group, many mitochondrial-DNA and Y-chromosome lineages are shared or derived from ancient Indian haplogroups, thus revealing a deep shared ancestry between Tharus and Indians. Interestingly, the local Y-chromosome Indian component observed in the Andhra-Pradesh tribals is present in all Tharu groups, whereas the inter-regional component strongly prevails in the two Hindu samples and other Nepalese populations. The complete sequencing of mtDNAs from unresolved haplogroups also provided informative markers that greatly improved the mtDNA phylogeny and allowed the identification of ancient relationships between Tharus and Malaysia, the Andaman Islands and Japan as well as between India and North and East Africa. Overall, this study gives a paradigmatic example of the importance of genetic isolates in revealing variants not easily detectable in the general population.

Page 1 of 16 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:154 http://www.biomedcentral.com/1471-2148/9/154

Background [5,9] by the different distribution of the malarial related Terai, a highly malarial region of South Nepal bordering α-thal gene [10]. on India (Figure 1), was until a few decades ago, when malaria was eradicated, inhabited almost exclusively by Even in more recent phylogeographic studies encompass- Tharus, one of the oldest and the largest indigenous peo- ing a large number of populations and including Tharu ple of Terai. This group is known for their resistance to samples, mostly from Uttar Pradesh [11-16], the Tharu malaria as evidenced by their decreased malarial morbid- genetic structure was not completely clarified. ity compared to sympatric Nepalese populations [1], a phenomenon not completely clarified at the genetic level. The present availability of more advanced techniques, It was only after substantially full malaria eradication, which allow molecular analyses at a much higher level of through a program for malaria control started in 1956, resolution with extremely small amounts of DNA, that several other Nepalese populations migrated and set- prompted us to once again address the issue of the genetic tled in Terai. Tharus live throughout the length of the origin of the Tharus, by analyzing both their mtDNA country (mainly in the northern strip of Terai) in villages (including sequencing of entire mtDNAs) and Y-chromo- very close to, or even inside, the previously malarial for- some (SNPs and STRs) variation. ested zones. Although culturally and linguistically very heterogeneous, they consider themselves as a unique Methods tribal entity subdivided into three main groups (western, The sample central and eastern). The sample consisted of 173 Tharu DNAs from male blood specimens collected more than 25 years ago, soon Because of its geographic position in a boundary area of after the massive immigrations of other populations into Central Asia, Terai was a preferential passageway during Terai following malaria eradication, and 104 Indians. The the dispersal of many prehistoric and historic popula- Tharu sample was composed of three groups from differ- tions, thus Tharus might have retained genetic traces of ent villages: two in the of Central Terai ancient migratory events. Until 1980, however, their (Th-CI and Th-CII) and one in the Morang district of East- genetic structure was almost unknown and, on the basis ern Terai (Th-E) (Figure 1). The Indian sample also was of some classical serum markers [2] and physical features composed of three groups: Hindus from Terai (H-Te, col- [3], they were considered a 'Mongoloid' tribe. Subsequent lected in the Chitwan district), Hindus from New Delhi studies, carried out on mitochondrial DNA (mtDNA) (H-ND) and tribals from Andhra Pradesh (T-AP). Absence RFLPs, however, provided further support for the presence of close relationships between the individuals was ascer- of a Tharu East Asian component [4-8] and showed other tained through interview data. When necessary, genomic genetic characteristics of unclear origin [9]. In addition, amplification of DNA was performed by using the Amer- heterogeneity among the three groups was also evidenced sham GenomiPhi kit.

Chitwan

Morang

GeographicFigure 1 map of Nepal Geographic map of Nepal. Sampled areas, in circles.

Page 2 of 16 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:154 http://www.biomedcentral.com/1471-2148/9/154

This research has been approved by the Ethic Committee 0.00069 per locus per 25 years [40]. hetero- for Clinical Experimentation of the University of Pavia, geneity (H) was computed using Nei's standard method after having verified the conformity to the international [41]. Principal Component (PC) analysis was performed rules. on the mtDNA and Y-chromosome haplogroup frequen- cies using Excel software implemented by Xlstat. MtDNA analyses Affiliation within mtDNA haplogroups was first inferred Web Resource through the sequencing of a region ranging from 630–876 Accession numbers and the URL for data presented herein base pairs (bps) from the control region that, according to are as follows: the rCRS [17], encompasses the entire hypervariable seg- ment I (HVS-I) and part of HVS-II, then confirmed GenBank, http://www.ncbi.nlm.nih.gov/Genbank/ for through a hierarchical survey by PCR-RFLP/DHPLC/ mtDNA complete sequences [GenBank: FJ770939– sequencing of haplogroup diagnostic markers in the cod- FJ770973]). ing regions [see Additional file 1]. The 9-bp deletion/ insertion polymorphism, already studied in a subset of Network software: www.fluxus-engineer-ing.com these populations [6], was also evaluated in all samples. STR information: http://www.cstl.nist.gov/biotech/str MtDNAs not ascribable to any known or well-defined base/y20prim.htm haplogroup/subhaplogroup were completely sequenced according to Torroni et al. [18]. Overall, 34 novel com- Results plete sequences were produced in the course of this study. mtDNA The assignment of sequences to specific haplogroups was The mtDNA haplogroups of the examined populations, performed as reported in Figure 2, according to the most- together with their frequencies, are illustrated in the phy- recent classifications of Eurasian haplogroups and sub- logeny of Figure 2. All M* mtDNAs were sequenced, and haplogroups [16,19-31]. only five (1–5 in Figures 4 and 5) did not cluster with other complete sequences. These are reported together as Phylogenetic trees were constructed manually and vali- "M others" in Figure 2. The control-region motifs are dated by the Network 4.500 program software. Coales- given in Additional file 1. cence times for mtDNA haplogroups were calculated by the rho (ρ) statistic according to the mutation-rate estima- Super-haplogroups M (55.7%) and, to a lesser extent, R tion of Mishmar et al. [32]. (39.3%) are the most represented in the dataset. The M lineages were predominant (>50%) in all populations Y-chromosome analyses with highest values in the Tharu and Andhra Pradesh sam- Y-chromosome haplogroups were defined by the hierar- ples (75–88% and 76%, respectively). By contrast, the R chical order analysis of the 67 MSY bi-allelic markers lineages were present at higher frequencies among Hindus reported in Figure 3. The YAP, 12f2.1, LLY22g, PK3, PK4, (43.7%) than among the Tharu and the Andhra Pradesh P47 and M429 polymorphisms were analyzed according tribals (19.1% and 24.1%) with a few overlaps in the hap- to Hammer and Horai [33], Rosser et al. [34], Zerjal et al. logroup distribution. The N(xR) lineages were observed [35], Mohyuddin et al. [36], Gayden et al. [37] and Under- only in three Hindus (4.9%). hill et al. [38]. All other mutations were detected by PCR/ DHPLC, according to Underhill et al. [39] and, when nec- The 9-bp polymorphism was found exclusively in the Tha- essary, results were verified by sequencing fragments of rus, associated with three different haplogroups: the dele- interest. tion (6.4%) with haplogroups B5a (eight subjects) and M33 (three subjects), and the insertion (one subject – Twelve STR loci (DYS19, YCAIIa/b, DYS388, DYS389I/II, 0.6%) with haplogroup M38 (Figures 2 &4). DYS390, DYS391, DYS392, DYS393, DYS439 and DYS460) were also analysed in the majority of the sam- Based on their known or supposed origin [11,20,42-45] it ples using two multiplex reactions according to http:// is possible to identify among these haplogroups three www.cstl.nist.gov/biotech/strbase/y20prim.htm informa- main components – East Asian, West Eurasian and Indian tion and by using ABI PRISM® 377 DNA Sequencer, inter- – that show a very skewed distribution (Figure 6a). nal size standard and GeneScan fragment analysis software. The East Asian component This is represented by nine M mtDNAs belonging to Hgs The age of microsatellite variation within haplogroups C, D, G, M9, M21 and Z, and four R mtDNAs belonging was evaluated in samples of five or more subjects accord- to Hgs B5a, and F1. This component, which amounts to ing to Sengupta et al. [15] using the mutation rate of about 65% in the two groups of Central Tharus and 33%

Page 3 of 16 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:154 http://www.biomedcentral.com/1471-2148/9/154

Th-CI Th-CII Th-E H-Te H-ND T-AP N=57 N=76 N=40 N=24 N=48 N=29 M others 1.3 2.1 10.3 10632 M51 2.5 4.2 3.4 1462T M52 2.5 8502 M2a 3.4 M3 8.3 6.3 M4 2.6 6.9

* 10.4 3.4 M30b 15431 M30 3.4 M30c 1.3 M30d 12007 25.0 M18 1.8 5.0 6.9

15487 M38 * 1.3 17.5 2.1 9 bp ins (np: 8289) 2.5 M43 11696 10.5 9.2 2.5 M53 4734 2.1 1888 M5 1.8 11.8 5.0 4.2 8.3 6.9 3537 M6a 13.8 C M8 13263 5.3 2.5 10400 M 9090 Z 1.8 3.9

* 15.8 11.8 1041 M9a M9a1 3.5 15468 M21b 2.6 15928 M25 8.3 15440 M31b 1.8 1.3

2361 M33 * 7.0 1.3 4.2 6.9 9 bp del (np: 8281-8289) 7.5 12561 M35 1.3 5.0 6.3 6.9 M39 4.2 8925 M40a 2.1 3.4 1.8 7.9 5.0 4.2 * 3010 D4 11696 D4j 3.5 3.9 L3 D4e1a 5.0

* 10.5 7.9 2.5 2.1 4833 G 7600 G2a 15.8 13.2 10.0 953 N1d 2.1 10398 I 4.2 8994 W 2.1

* 2.1 4216 R2 2.1 2.1 8594 R5 * R5a 8.3 R6 12285 2.5 12.5 2.1 13105 R7 3.4 10873 N 2755 R8 2.1 3.4 8584 R30 3.5 5.0 2.1 15235 B5a 9 bp del (np: 8281-8289) 8.8 2.6 2.5 1.8 * 6962 F1 9053 F1c 5.3 5.3 2.5 15402 F1d 2.5 13104 U1 4.2 2.1 12705 R U2a 4.2 2.1 U2b U2 15049 3.5 2.6 7.5 4.2 6.9 15061 U2c 2.5 8.3 4.2 12308 U 13734 U2e 2.1 3.4 4646 U4 2.1 15218 U5a1 3.4 5360 U7 10.4 5999 U9a 3.4 2.1 11719 R0 * 7028 H 1.8 2.1 14233 T2 1.3 4.2 2.1

FigurePhylogeny 2 and frequencies (%) of mtDNA haplogroups in the populations studied Phylogeny and frequencies (%) of mtDNA haplogroups in the populations studied. Haplogroups (East Asian in grey; West Eurasian in white; Indian in black) were assigned on the basis of both the control-region motifs and the coding-region polymorphisms [see Additional file 1] following published criteria (see Materials and Methods). Coding-region markers are reported as mutated nucleotide positions according to the rCRS [17] Mutations are transitions unless a base change is explic- itly indicated. The 9-bp polymorphism: deletion = del; insertion = ins. Haplogroups with an asterix (*) include samples negative for the examined sub-groups.

Page 4 of 16 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:154 http://www.biomedcentral.com/1471-2148/9/154

(M168)

RPS4Y711 YAP M89

M217 M356 M174 M35 * M481M201 M69 M429 M9 M285 * P47 M78 * M52 APT M17012f2.1 * M70 M20 M214 M74 M15

M267 * M370 M172 * M76LLY22g M175 M242 M207 P36 M317 TAT P31 M82 M410 M12 M349 M122 M173M124 Tdel * * M102 M357 SRY10831.2 M241 PK3 M95 M17 M36. * M68 PK4 M134 M197 * M56 M97 M47 M99 M64 M119 M138 M67 * M117 PK5 M158

C C D D E F F G H H H H H J J J K L L N O O O O Q R R R 3 5 * 1 1 * 5 * * 1 1 1 2 2 2 2 * * 1 1 2 3 3 3 1 * 1 2 b * a a a a b * a * a a a 1 * 1 * 3 2 1 3 3 1 b * * a c c * H Number

Population 1* 1 * 1 Th-C I 57 3.5 8.8 7.0 10.5 1.8 3.5 3.5 49.1 10.5 1.8 0.733 Th-C II 77 1.3 5.2 9.1 18.2 11.7 7.8 3.9 28.6 2.6 3.9 7.8 0.855 Th-E 37 8.1 8.1 5.4 13.5 8.1 2.7 5.4 10.8 18.9 16.2 2.7 0.906 H-Te 26 3.8 7.7 3.8 3.8 3.8 3.8 69.2 3.8 0.527 H-ND 49 2.02.0 6.1 10.2 2.0 2.0 4.1 4.1 2.0 2.0 6.1 2.0 34.7 20.4 0.831 T-AP 29 6.9 3.4 3.4 6.9 27.6 3.4 6.9 3.4 3.4 27.6 6.9 0.852

PhylogenyFigure 3 and frequencies (%) of Y-chromosome haplogroups in the populations studied Phylogeny and frequencies (%) of Y-chromosome haplogroups in the populations studied. Haplogroups: East Asian in grey; West Eurasian in white; Indian in black. The nomenclature and the hierarchical order of the mutations are according to the Y-Chromosome Consortium [62,67,73], updated with more recent markers: M429 [38]; M481 and P31 T-del (present study). The nomenclature of haplogroup H differs from that presented by Karafet et al. [73], in that all of our M82 samples were also M370 positive. H: intra-population haplogroup diversity, according to Nei [41]. In italics: markers not found. In parentheses: markers inferred. Haplogroups with an asterix* include samples negative for the examined sub-groups. in the Eastern Tharus, was not observed in the tribals of whose high frequency was reported in Tibeto-Burman Andhra Pradesh, and was seen only in two Hindus, one tribes of Thailand [50], China [51] and India [52], is (from Terai) as D4* and the other (from New Delhi) as found in all Tharu groups. G*. These two haplogroups, together with the M9a, are among the most frequent in the Tharus, especially group The West Eurasian component G that includes the G* and G2a, and accounts for 20.8% This component comprises the N haplogroups I and W, of the total sample, and for 26.3% of the Th-CI. Interest- and the R haplogroups R0, H, T2 and U (xU2a,b,c) and is ingly, on the basis of the sequence information of the almost absent in the Tharus (only one H and one T2 mtD- mtDNA control region (16223, 16274, 16362), the NAs from Chitwan). In contrast, it reaches a high fre- Indian G* appears different from all the other G* haplo- quency (25.0%) in New Delhi, where most of the groups examined [see Additional file 1], and could belong haplogroups of this component are found, and is also to haplogroup G3 [29,46]. Haplogroup M21, previously common in Indians from Terai (12.5%) and Andhra described in Malaysia where it is present with different Pradesh (10.3%). However, in spite of the similar fre- sub-clades [24,47] has been observed in two Central Tha- quencies, the two latter populations are remarkably differ- rus, thus establishing a deep correlation with the people ent in their composition: Hgs I, U1 and T2 characterize of that area. Haplogroup B5a is present in all Tharus, with the Terai Hindus, whereas Hgs U2e, U5a1 and U9a the the highest frequency (8.8%) in the Th-CI group. All these Andhra Pradesh tribals. share the 9-bp deletion and the HVS-I motif 16140- 16189-16266A, which corresponds to the Nicobar Island Among the West Eurasian U sub-clades, particularly inter- B5a1 [25] and is closely related to the Chinese B5a esting are U7 and U9. In the New Delhi sample, U7 shows [48,49]. Haplogroup F1 is also present among Tharus as a frequency (10.4%) that is quite similar to that of F1*, F1c and F1d. In particular, the subhaplogroup F1c, (9.4%) and close to its peak (12.3%) in the West Indian

Page 5 of 16 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:154 http://www.biomedcentral.com/1471-2148/9/154

East India + Bengal L3 263 Indian Area 315+C 489 8701 73 Malaysia 750 Tharu 10400 9540 2706 12705 1438 14783 10398 7028 rCRS East Asia East Africa N 16223 R H 4769 M 15043 10873 11719 15301 14766 8860 Symbols surrounded by a circle are from literature 15326

249d 1211 185 195 195 152 1462T 482 12007 3780 9569 195 930 472 522-523d 1598 16126 4769@ 10514 199 3714 575 9064 5460 8008 11016 204 4973 3203 10365 6020 M3 16311 246 709 4734 10373 12924 228 6203 5147 10632 10750 M3a 195 4099 10316 9180 11542 13032 262 10084 5319 14440 M4 15314 11696 M52 30.8 (a) 15184 13759 522-523d 10310 6905 15097 15487 12636 M53 33.4 195 151 15286 14632 1313 14374 7388 16183C 310 150 2000 93 16519 14305 4059A 146 M4a, b 1888 152 16234 15924 4802 16189 9861 16189 (315+C)d 593 3394 30.8 ± 7.3 16519 ± 10238 4541 2534 5319 152 21.8 21.8 16243 16234 6305T 16215 13368 16294 522-523d 2880 6713 11827 M38 5585 2706@ ± 8.9 6827 4258 M43 16244 16519 7469 16368 14864 16519 4760 8143 189 9.3 11167 13820 9061 4859 199 5910 6872 16519 8281 16519 15896 8557 12279 1808 309+C M51 12.0 11914 16344 16184 7853 522-523d 153 6827 7049 10754 16129 13815 12792 6267 11963 ± 2 * 13743 16519 16519 10149T 721 9095 11914 482 151 23.1 11197 16240 14623A 14687 13766 6899 M38a 13656 11.2 1 AP 4 9929 10556 793G 9509 14212 11410 16291 15815 15859 9966 488 13966 15.4 15.4 6297 ± 15951 14687 16129 AP H-ND 16223@ 12 M4c 16520 4659 11908 16362 16129 16092 16185 7356 2833 16311 16189 5097 4.5 6713 16260 H-ND ± 12067 16519 3618 16183C 16111 523+CA 14944 4654 16362 6011 16209 150 15 7270 24 16192 12366 9300 16189 16192 6221 7444 152 6815 7.7

± 16093 8269 16297 198 11617 16300 12918 16172 8 16275 16293 12051 NW-In 9129 204 16184 8289+9bp M43a PK 5 14319 6.3 3652 12696 16362 14299 16256 16438 16519 16189 11221 214 16260A 9438 Th-CII 16051 AP 11 5460 16129 16519 15930 16111 4099@ 16249 9139 16266 14706 16234 709@ 16213 16086 S-In 7149 7 ** 9 10 16311@ 15562 16311 8281-89d 16223@ 25 16218 13323 6 16355 16 12858 16249 16270 H-Te Th-E AP 15940 18 20 H-ND 16359 16093 16293C 16362 Th-E 13 UP 16319 H-ND 19 Th-CI 21 Th-CII Th-E 14 Th-CII 22 23 3 17 AP Th-U UP AP AP

1888 2361 12561 16129 16519 16519 16519 42.5 ± 8.4 38.2 ± 7.2 M33 M35 M5 8562 1719 152 199 2442A 709 15908 3221 204 32.5 ± 7.8 3200 3921 25.7 ± 4.1 6293 207 3540 12477 M5 M33a 16324 513 482 15928 5027 14323 others 16362 3116 5432 6566 5423 199 150 207 146 30.8 27.0 ± 5.4 4024 10670 M35b 6719 30.8 30.8 13731 8705 293 195 41.1 M5a 6866 16169 10598 4117 198 M33b 6563 15924 152 3368 4454 4916 189 146 146 1664 8668 16093 709 6830 7954G 16172 11569 10310 195 8194 ±

18.0 18.0 15287 3407 3757 152 676 4814 9966 15.4 ± 5.4 4143 8844 9983

573+6C ± 12696 14290 12501 ± 8.9

M5a2 11365 3826 20.6 10325 M33a1 13105 16316 482 8572 15236 5192 10007 M35a 5363 9804 15.4 8930A 15262 12501 M5a1 5450A 14.8 11353 152 150 16140 3918 12153 16124 234 60 6125 10969 5426 151 8584 11311 ± 11986 14509 7669 11914 13500 9064 523+CA 462 16311 37 5351 13912 16304 2068 93 8856 13788 152 8709 ± 6.0 7975 16136 13734 2206A 5124A 9194 14053 6164 95C 10685 14950 204 9615 8886 11016 14381 3399 30 10604 10.3 NE-In 3391 7609 16095 16209 26 2217 10166 36 10951 15052 8270 152 10724 16178 146 207 6905 7.7 12477@ 40 16111 N-In M5a1a 4949 16278 13065 15466 8281-89d 3798 12361 16288 571 16293C 8392 5118 16335 16263 UP 7226h 13020 S-In 16304 16327 5894C 16519@ 15796 16294 Th-CI 10939 8276+C 13914 4991 16296+C 16093@ 5774 8676 7358 16144A 185 8697 16398

± 13215 5895 16172 9344 14693 44 12425 16189 8056 10334 9456 16148 334 16093 49 4.4 5899+7C 16271 41 11893 15530 13928C 16192 16264 15731 1780 16311 35 39 H-ND 46 AP 50 32 7533 16399 12477 15934 16319 16311 51 16265C 15172 UP Th-CI Th-E AP 47 Th-E 29 S-In 16354 13329 16259 Th-CII 16319 28 16291 33 H-ND 48 16365 UP 38 16260 45 H-ND Th-CII 34 43 AP 31 H-ND 42 S-In 27 AP Egypt NE-In NE-In UP

PhylogeneticFigure 4 tree of 51 mtDNA sequences Phylogenetic tree of 51 mtDNA sequences. Mutations are scored relative to the rCRS [17]For the tree construction, the length variation in the poly-C stretch at np 16193 was not used, while the variation at np 309 is indicated only when phyloge- netically relevant. Mutations are shown on the branches. They are transitions unless a base change is explicitly indicated; inser- tions are suffixed with a plus sign (+) and the inserted nucleotide(s), and deletions are characterized by "d"; back mutations are highlighted by "@"; recurrent mutations are underlined. Dating is reported in kilo years. Some sequences are incomplete: * from 3694 to 3738 nps; ** from 8353 to 8472 nps. For the ethnic/geographic origins of the samples, see Additional file 2. Pop- ulation codes: Th-CI and Th-CII: Central Tharus; Th-E: Eastern Tharus; H-ND: Hindus from New Delhi; H-Te: Hindus from Terai; AP: Andhra Pradesh; UP: Uttar Pradesh; N-In, NE-In, S-In: North, North-East, South Indians, respectively; PK: . Symbols surrounded by a circle are from literature. (a) Nomenclature different from that (M45) reported in the “mtDNA tree Build 5 (8 Jul 2009)” (http://www.phylotree.org/). state of [12]. U9 is a rare haplogroup previously The Indian component observed in Pakistan [42], Yemen and Ethiopia [23,53]. This is the major component of the Indian groups and Interestingly, the U9 mtDNA that we found in Andhra also of the Eastern Tharus (Figure 6a) and is represented Pradesh, together with an Ethiopian mtDNA, defines the by 36 haplogroups, one third of which are shared between new U9a sub-group (Figure 5), thus confirming the Tharus and Indians and seven are present only among ancient genetic links between East Africa, Southwest Asia Tharus (Figure 2). Since this component showed a more and India. complex genetic relationship between Tharus and Indi- ans, in addition to the M* samples, other selected mtD- Although the West Eurasian component is probably pri- NAs were completely sequenced, to obtain a deeper marily related to migrations during the Holocene period, haplogroup phylogeny. The parsimony trees, illustrated in the exact source and time of such migrations is difficult to Figures 4 and 5, include 81 sequences, 34 of which are establish [12,45]. from the present study comprising one exogenous

Page 6 of 16 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:154 http://www.biomedcentral.com/1471-2148/9/154

L3 263 8701 315+C 489 73 9540 750 10400 2706 10398 12705 1438 14783 7028 rCRS 10873 N 16223 H 4769 M 15043 R 11719 15301 14766 8860 15326

11482 16519 3970 499 61.7 ¢ 11.1 13928C 1811 M21 16304 5999 13105 8584 709 200 11467 16319 15924 204 12308 16362 R30 R9 51.4 ¢ 11.4 2222 1598 12372 3915 152 152 R7 2056 16183C 249d 6023 5093 309+C R30 U4’9 3796 3316 16189 6392

5108 150 83.9 6253 1442 others 6962 10202 7765 152 4232 9182 6248 10310 3531 11287 7861 231 5442 373 10172 7870 103 10609 3834 11611 8718 240 6764 4062 ¢ ¢ 11665A ¢ ¢ 189 6386 16093 9051 4225 12406 15468 9116 1007 9156 199 14094 9110 14.4 16129 9242 5836 12882 15930 12940 6959 10289 203

16256 11047A 7490 25.7 16263 15106 7660 13830 522-23d U9 16271 12714 8805 F1 16519 15511 8697 16260 709 16362 15055 9174 16051 195 15734 9017 16261 2626

30.8 146

10631A ¢ ¢ (a) 16278 573+4C ¢ ¢ M21b 16242 9531 16519@ 6962 29.1 11009 522-23d 1005 16319 11815 6.6 54 R30a 8281-89d 23.1 3378 5177 12406 1734 U9a 9299 6413 5899+C 16519 12867 R7a’b 8829

¢ ¢ 5628 ¢ ¢ Mal 15262 3158+T 10688 13236 11350 7789 6164 8856 7738 5460 1734 ¢ ¢ ¢ ¢ 14194 8.9 15758 4225 12615

9950 ¢ ¢ 55 ¢ ¢ 8974 3290 8277 12501 14305 15402 15769 7.5 5237 10103 12852 5306 13111 8279 15553 16183C 7.7 Mal 16182C 7274 152@ 16298 15930 11482@ 16093 522-23d 10398 16261 7870 9966 207 16299 16189 16242 12621 16129 16183C 6164 10907 25.7 16311 9554 11506 5147 16519 14831 16381 16189 13474 12361 15077 13758 6215 152 2156+A 6.9 68 16239 16231 13477 143 F1d 16147 4907 16126 6266 299d 66 ¢ ¢ 16325 53 58 15223 299A ¢ ¢ 16193 PK 11176 56 16181 7244 480 309+C 152 Ethiopia

15508 8.1

302C ¢ ¢ 16357 Mal AP 13135 ¢ ¢ 52 15440 16209 10736 15662 14577 4951

AP 4.2 15530 16362 14161 15851 15777 4977 65 67 Th-CII 16126 15442 5099 15927 16519@ 7191 Japan 57 16140 7268G 16086 AP M31 59 * 9277G 16284 13251 UP 16194C 61 16183C@ (2156+A)d 3999 16195 13857 Th-CI Th-E 15307 63 16136 27.4 ¢ 8.0 12876 16243 64 37.1¢ 9.4 16256 16092 Th-E M31b M31a 16182C Japan 249d 188 152 195 1524 60 234 808 62 16145 2045 Japan 282 3337 Pun 9269 M31a2 3975 3834 868 14152 16093 8973 8093 4775 15935 9581 14212 9966 16126@ 11014 15676 11023T 76 16311 14407 15258 E-In 16126@ 71 M31b1 16017 16311 Indian Area East India + Bengal NE-In 16519 6323 15300 M31a2a 3.8 ¢ 2.2 7.2 ¢ 4.0 7805 M31a1 150 Tharu Malaysia 11903 14178 70 6683 6158 9617 13710 16093 16093 75 NE-In East Asia East Africa 72 73 And 69 74 7785 And And 8108 Th-CI And 77 78 200 80 Symbols surrounded by a circle are from literature And And 79 And 81 And And

PhylogeneticFigure 5 tree of 30 mtDNA sequences Phylogenetic tree of 30 mtDNA sequences. Mutations are scored relative to the rCRS [17]For the tree construction, the length variation in the poly-C stretch at np 16193 was not used, while the variation at np 309 is indicated only when phyloge- netically relevant. Mutations are shown on the branches. They are transitions unless a base change is explicitly indicated; inser- tions are suffixed with a plus sign (+) and the inserted nucleotide(s), and deletions are characterized by “d”; back mutations are highlighted by “@”; recurrent mutations (considered in the global phylogeny of the 81 mtDNAs) are underlined. Dating is reported in kilo years. * Sequence incomplete from 411 to 628 and from 16189 to 16290 nps. For the ethnic/geographic ori- gins of the samples, see Additional file 2. Population codes: Th-CI and Th-CII: Central Tharus; Th-E: Eastern Tharus; AP: And- hra Pradesh; UP: Uttar Pradesh; Pun: Punjab; NE-In: North-East Indians; E-In: East Indians; PK: Pakistan; Mal: Malaysia; And: Andaman Islands. Symbols surrounded by a circle are from literature. (a) Nomenclature different from that (M13b) reported in the “mtDNA tree Build 5 (8 Jul 2009)” (http://www.phylotree.org/). mtDNA (from Egypt), and 47 are from the literature [see components of subgroups, contributing to an improved Additional file 2] with four exogenous mtDNAs (three definition of the mtDNA tree and a refinement of age esti- from Japan and one from Ethiopia). Ten sequences mates. belonging to Hg M did not enter in any of the previously described haplogroups: five clustered as three new haplo- mtDNA phylogeography groups M51, M52 and M53 and five resulted as single lin- The new haplogroups M51 and M52 were detected in the eages. The latter five can be used as references for new eastern part of the Indian subcontinent, while M53 seems haplogroup by affiliation of mtDNAs classified only for to belong to the West Indian area. As for the new sub- the control region such as, for example, sequence #3, clades of previously described haplogroups, M4c, linking whose HVS-I motif has been described in one Koya of one Tharu of Chitwan with one Indian from Andhra Andhra Pradesh [11] and one Sudra of [52]. Pradesh [30], could be typical of Tribal groups, and M43a, All the other sequences could be assigned to known M is observed at the Indian border with Nepal. Sub-clade and R haplogroups, either as direct basal derivatives or as M5a1 characterizes peoples from (New Delhi

Page 7 of 16 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:154 http://www.biomedcentral.com/1471-2148/9/154

M31a2 of the tribal Lodha, Lambadi and Chenchu popu- mtDNA components lations, represents the Indian counterparts of the M31a1 a) 100% Andaman lineages [27], further supporting a common 80% ancestry of the Indian subcontinent and people of the 60% Bengal Bay islands. 40% 20% As for the R haplogroups, R7 and R30 are of particular 0% Th-CI Th-CII Th-E H-Te H-ND T-AP interest. Very informative for the structure and for the age (57) (76) (40) (24) (48) (29) evaluation of haplogroup R7 is the Andhra Pradesh sequence #56 (Figure 5) that defines an extremely deep West Eurasia (% ) 1.8 1.3 0.0- 12.5 25.0 10.3 branch of the R7 in India. This branch shares with the root Indian area (% ) 29.8 34.2 67.5 83.3 72.9 89.7 of the phylogeny of Chaubey et al.[54] only the mutations East Asia (% ) 68.4 64.5 32.5 4.2 2.1 0.0- 13105, 16319 and, in addition, it does not display the 16260 and 16261 mutations characterizing the R7a and R7b branches observed in different R samples from Indian Y-chromosome components b) groups [11,52,54-57] and, interestingly, in one R7 Tutsi 100% from Rwanda (unpublished data). Two Tharu mtDNAs, 80% one from Chitwan and one from Eastern Terai, belong to 60% the R30 haplogroup. The first is closely related to two 40% Indian sequences, one from Andhra Pradesh and the other 20% from Uttar Pradesh, and contributes to define a sub-clade 0% Th-CI Th-CII Th-E H-Te H-ND T-AP of the R30a [54]. The second joins a Punjab sequence [54] (57) (77) (37) (26) (49) (29) with a Japanese deep lineage [22] indicating an ancient West Eurasia (% ) 12.3 11.7 29.7 3.8 10.2 6.9 link between India and Japan. A more recent connection Indian area (% ) 31.6 52.0 48.6 84.6 79.6 89.7 with Japan is, in turn, revealed by the F1d haplogroup East Asia (% ) 56.1 36.4 21.6 11.5 10.2 3.4 showing a tight linkage between an Eastern Tharu sequence and two Japanese mtDNAs. Another noteworthy connection with outside areas is evidenced by the U9 hap- HistogramsponentsFigure 6observed of the inmtDNA the populations (a) and Y-chromosome studied (b) com- Histograms of the mtDNA (a) and Y-chromosome logroup that, being shared by an Ethiopian and an Andhra (b) components observed in the populations studied. Pradesh mtDNA, reveals a not recent link between Ethio- Sample sizes are in parentheses. pia and India.

Even if the PC analysis of mtDNA haplogroup frequencies and Uttar Pradesh [30]), whereas M5a2 is present in observed in the present study compared with those of rel- Southern India [28,30]. Both haplogroups M33 and M35 evant populations accounts for only about a quarter of the show many inner branches, but while M35 is diffused variance, four main clusters are defined: West Eurasian inside the Indian subcontinent, relating the Tharu groups [12], Indian area [12,42,55,56], East Asian [58-60], and and the Hindu from New Delhi with populations of Southeast Asian [44] (Figure 7). The first two are well-dis- , M33 is also spread elsewhere. Indeed, its sub- tinguished from the others by the first PC, which points clade M33a includes one Egyptian mtDNA, thus connect- out a separation between the West and the East Eurasian ing the Indian subcontinent with North Africa, whereas gene pools; afterwards, the second PC distinguishes West M33b, described in Western Bengalese [30] and in the Eurasians from Indians and East Asians from Southeast Indian region of Megalaya [31], has been observed in Asians. Tharu groups are located in the middle of the area Eastern Tharus. Therefore, it may represent a clade of the among the clusters but, while the central groups are closer Northeast Indian subcontinent. to East Asians, Eastern Tharus turned out to be closer to the Indians. Other samples from the border between India Of particular interest is the detection of haplogroups M21 and Nepal, such as those from Uttar Pradesh, remain and M31 (two subjects each) among the central Tharus. inside the Indian cluster (including the group Th-Up com- The Tharu M21 sequence (Figure 5) shares nine mutations posed of marginal "Hinduized" Tharus [12]. As for Indi- with one of the three M21 lineages found in all Orang Asli ans, they all group together, in agreement with a deep groups of Malaysia [24] and in other groups from South- (Late Pleistocene) common maternal ancestry of caste and east Asia [44], belonging to the sub-group M21b. The tribal populations [11,60], perhaps due to some accepted Tharu M31 sequence, together with one Megalaya mtDNA practices (such as the anuloma) that allow a woman of a [31], clusters with one West Bengal Rajbhansi [21,27] and lower social level to enter a higher level by marriage defines a sub-group of M31b. This subclade, together with [55,61].

Page 8 of 16 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:154 http://www.biomedcentral.com/1471-2148/9/154

6 additional DHPLC/sequencing analyses of P31 chromo- somes are necessary to evaluate the extent of the contem-

Kaz porary presence of the two mutations. It is worth noting Irn-NE 4 Uzb that these samples were also all positive for the PK4 Irn-NW Uyg marker recently observed in four Pakistani Pathans [36]. Irn-C Mong Irn-SW Another variation, consisting of an A to G transition at np Hui 2 Irn-SE 147, was observed in two H-M82 samples while sequenc- ing the M89 marker. This mutation, which was not found Th-CII Han-SE Th-CI either in H-M69* or in H2-APT chromosomes, character- 0 Th-UP Pa H-ND Pj izes the H1 subgroup but, due to the impossibility of typ- Pk Th-E UP-1 Gj ing all the M82 samples, as well as any M370* and M52* Lb Jv Am

-- axis --> F2 (11 %) UP-2 WB-1 Lk Su Y chromosome, at present, we cannot define the precise -2 WB-3 Rj Bo phylogenetic position of this novel transition inside the H-Te T-AP WB-2 Sm sub-haplogroup.

-4 Ba On the basis of known or supposed haplogroup origin [11,14,15,36,56,62,64-72], three main components (East

-6 Asian, West Eurasian and Indian) can also be identified -6 -4 -2 0 2 4 6 for the Y chromosome. The incidence of the various com- -- axis F1 (15 %) --> ponents in each population is depicted in the histograms of figure 6b. PrincipalciesFigure 7 component analysis of mtDNA haplogroup frequen- Principal component analysis of mtDNA haplogroup frequencies. Comparison samples from Western Eurasia The East Asian component made up by haplogroups (Iran): Irn-W, Irn-E, Irn-C, Irn-SW, Irn-SE [12]; Indian sub- C(xC5), D, N, O3, Q, and K*, and mainly represented by continent: AP, Andhra Pradesh [55]; WB-1, Castes from Hg O3, is, on the whole, much more frequent among Tha- Bengal; WB-2, Kurmis from West Bengal; WB-3, Lodhas rus (39.8%) than among Indians (7.7%). The high Tharu from West Bengal; Pj, Punjab; Rj, Rajput; Pa, Parsi; Gj, frequency, mostly accounted for by the subgroup O3- Gujarat; UP-2, Brahmins from Uttar Pradesh [12]; Th-UP, M117 (83.8%), shows a wide range in the three groups Tharus from Uttar Pradesh [12,56]; UP-1, Uttar Pradesh; Lb, with significant differences between Th-CI vs both Th-CII Lobana group [56]; Pk, Karachis [42]; East Asia: Han-SE, (P < 0.02) and Th-E (P = 0.001). Among the less repre- Guandong [58]; Uzb, Uzbek; Uyg, Uygur; Kaz, Kazak; Mong, sented East Asian markers of interest is Hg D that is very ; Hui, Xinjiang [59,60]; and Indonesia: Su, Sumatra; frequent in Tibet, absent in other Nepalese populations Bo, Borneo; Jv, Java; Ba, Bali; Lk, Lombok; Sm, Sumba; Am, [37] but present in six Central Tharus: as D1-M15 in two Ambon [44]. Data have been normalized to the common level of analysis. On the whole, 26% of the total variance is Th-CI subjects and as D*-M174 in four Th-CII subjects. represented: 15% by the first PC and 11% by the second PC. The latter, by showing the DYS392 -7 repeat allele that characterizes the D3-P47 chromosomes [37], could belong to the recently identified Hg D3* [73]. In addition, The Y-chromosome two other haplogroups were encountered: K-M9* in a sin- The phylogeny and frequencies of the 28 Y-chromosome gle Eastern Tharus and Q1-P36 in two Tharus-CII. Hg Q, haplogroups observed in the present study are shown in which is present in Tibetans, was seen in only one sample Figure 3. from [37]. In Indians, the very scarce East Asian component was represented by three Hg O3 (each Two new variants are reported. The first, M481, defines belonging to a different sub-haplogroup and to a different the new haplogroup F5 and consists of a C→T transition Indian sample), one C3-M217 in Terai (previously at np 163 within the STS containing the P36 mutation observed only in a few Kathmandu and Tibetan samples [62]. The second, Tdel, was first noticed in haplogroup [37]), two N1-LLY22g*, one in Terai and one in New O2-P31 while typing the P31 marker and was confirmed Delhi and by three Q1-P36 in New Delhi. Only three East by sequencing. This is due to a T deletion in the 6T stretch Asian haplogroups, Q1-P36, O3-M134* and O3-M117, starting at np 127, adjacent to the P31 T to C transition are shared between Tharus and Indians. [63]. The T deletion, not found in the other examined Hg O derivatives, is always present in our O2 samples (all tri- The West Eurasian component, represented by haplo- bals; four of the Eastern Tharus and one from Andhra groups E, G, and J, shows a higher incidence among Tha- Pradesh). Taking into account that this haplogroup is rus (15.9%) than among Indians (7.7%). With the often recognized through markers different from P31 and exception of three E3-M35* Eastern Tharus and two G- that in other studies, where the P31 was examined M201 (one in New Delhi and the other in Andhra [64,65], a technique not detecting Tdel was employed, Pradesh), the main part of this component is accounted

Page 9 of 16 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:154 http://www.biomedcentral.com/1471-2148/9/154

for by haplogroup J (Tharus 14.0%, Indians 5.8%), [14,71,74], and also as an autochthonous Indian Austro- present only as J2, namely J2-M410* and J2-M241*. Asiatic population marker [72]. The remaining endog- Whereas the latter haplogroup is shared by all Indian and enous haplogroups include haplogroup C5-M356, shared Tharu samples, the J2-M410* was found in all Tharus but between Indians and Tharus (two in the Terai Hindus and in only one Hindu of New Delhi, where one sample of its one in the Tharus-CII), haplogroup F-M89* and its new derivative J2-M68 was also present. If one considers the derivative F5-M481, both considered as tribal markers total frequency of this component in each sub-group, and observed in Andhra Pradesh (10.3%). As for the inter- among Indians the highest value is observed in the Hin- regional haplogroups L-M20, R1-M17 and R2-M124, they dus of New Delhi (10%), and, among Tharus, in the display within India a considerable frequency and haplo- group of Eastern Terai (30%). It is noteworthy that the fre- type associated high microsatellite variance. However, quency of Eastern Tharus is about three times higher than whereas this observation for the subgroup L1-M76 of L- that of the other two Tharu samples (P ~ 0.03 vs Th-CI and M20 and for R2-M124 showing lower frequencies outside 0.02 vs Th-CII). This component may reflect several events this region, is considered indicative of a local origin, for of gene flow from the Early Holocene to the present, pass- R1-M17 the situation is more complex, as well as the posi- ing through farmers. tion of L-M20*. Actually, the high frequency of the R1- M17 haplogroup found in the Central Eurasian territory, The Indian subcontinent component includes lineages of together with its gradient of diffusion that was associated haplogroups C, F, H, L, O, R and among Indians it ranges with the Indo-European expansion [74-76], would leave from 80% in the New Delhi sample to 85% in Terai, and some uncertainty about its geographic origin. However, to 90% in the Andhra Pradesh. Among Tharus, with the the high microsatellite variation supports an ancient pres- exception of an incidence of ~32% in the Th-CI group, it ence, dated in our samples over 14 ky [see Additional file reaches values around 50% in the other two groups. Hgs 3] of the M17 marker in the Indian subcontinent, as sug- H and R are the most frequent haplogroups of this com- gested by Kivisild et al. [11], and sustained by Sengupta et ponent. Hg H (Tharus: 25.7% Indians: 18.3%) is repre- al. [15] and Thanseem et al. [71], who consider the Indo- sented by five sub-groups: H-M69*, H1-M52*, H1- European M17 only a contribution to a local Early M370*, H1-M82* and H2-APT. Whereas H-M69* was Holocene pre-existing Indian M17. Thus, it is reasonable detected at similar frequencies (mean 8.8%) in all the to assume that even this inter-regional haplogroup has Tharu sub-groups, and in two Indians of Andhra Pradesh ancient relationships with the Indian area. Interestingly, (6.9%), H1-M82* was seen in all Tharus and Indians. By the M17 Y-chromosomes of the Indian subcontinent dif- contrast, H1-M52* (2.0%) and H1-M370* (6.1%) were ferentiate from those of Central Eurasia in that they are seen only in the New Delhi Hindus, and H2-APT (11.7%) virtually all 49a,f/TaqI Ht 11 [77]. only in the Tharus-CII. As to the rare haplogroup L-M20*, it was present in two Hg R, besides a single R* from New Delhi, was detected individuals of the New Delhi sample. Only one of these Y- in all groups as R1a1-M17* and R2-M124 with important chromosomes could be analyzed for the microsatellites differences between Tharus (13.5%) and Indians and compared in a network with other seven available (52.9%), mainly due to R1-M17* (8.8% vs 41.3%). samples L-M20* of Turkish and Italian origin (unpub- Within the two populations, significant differences were lished data), showing that it was very distant from the oth- also observed: the Tharu-CII sample differs from the East- ers. ern one (3.9% vs 16.2%, P ~ 0.05); the Hindus from Terai (69.2%) appear very distant from both the New Delhi Age estimates of the main haplogroups with some com- Hindus (34.7%, P < 0.01) and the Andhra Pradesh tribals parative data [15] are reported in Additional file 3. (27.6%, P ~ 0.005). However, this important difference Although age estimates deserve caution, particularly when could be, at least partially, influenced by the genetic back- samples are small and standard errors large, a good gen- ground of the sample that in recent times moved from eral agreement between the two datasets is observed. As India to Nepal after malaria eradication. for haplogroup H1-M82*, not reported by Sengupta et al. [15], its age is very similar in all groups, with variance The Indian component can be resolved into the most (0.093–0.110) lower than that (0.19) previously observed likely endogenous (local) haplogroups (C5, F*, H, the in some Indian groups [11]. Special attention is deserved two new F5-M481 and O2a1a-Tdel), and the inter- by haplogroups J2-M410*and R1-M17*, showing vari- regional ones (L, R1 and R2). In the first group we have ances very different in the various Tharu and Indian sub- included the lineage HgO2-P31-Tdel found in the tribals groups and the highest values in the Eastern Tharus and of both Eastern Tharu and AP Indian samples. The T dele- tribals of Andhra Pradesh. Interesting is also Hg R2-M124 tion further characterizes the HgO2-M95 clade that is con- for which the Tharu total variance rises to 0.271, a value sidered a genetic footprint of the earliest Palaeolithic obtained by adding just two samples from the other Tharu Austro-Asiatic settlers in the Indian subcontinent

Page 10 of 16 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:154 http://www.biomedcentral.com/1471-2148/9/154

groups to six homogeneous Th-CII samples (variance 8 0.033), thus stressing again the Tharu heterogeneity. a)

H-ND The PC analyses of the haplogroup frequencies, which 6 were performed with the Nepalese and Tibetan data of Gayden et al. [37] and the Indian caste and tribal groups of Sengupta et al. [15], are illustrated in Figure 8a,b. In 4 both plots, a cluster of tribals, including Tharus and the Indians from Andhra Pradesh, is evident and separated from the caste groups. As for the Nepalese populations, all 2 Kat are very distant from Tibetans. Tharus, with the Eastern H-Te New group always in a peripheral position, cluster together in 0 the same quadrant of the plot, distinct from those occu- Tam Tib pied by the other three Nepalese groups. -- axis --> F2 (20 %) T-AP Th-CII -2 Discussion Th-CI The analysis of mtDNA and Y chromosome polymor- Th-E phisms in three Tharu samples from Central and Eastern -4 -4 -2 0 2 4 6 8 10 Terai has enlightened the presence of three main compo- -- axis F1 (23 %) --> nents, Oriental, West Eurasian and Indian, that show remarkable quantitative and qualitative differences among the three groups as well as between sexes within b) the same group. 4 DC

The East Asian signature of the Tharus Like Tibetans and other people of Nepal [37] the greater IEC part of the East Asian influence in the Tharus may be 2 mainly traced back to Tibeto-Burman speakers who Th-E entered Northeast India within the last 4.2 ky [78] and H-Te likely influenced them through a founder effect. Indeed, T-AP 0 East Asian mtDNA haplogroups present in the Tharu Th-CI H-ND samples show lower genetic variation: all control-region AAT haplotypes are similar [see Additional file 1] and do not Th-CII TBT cover the variety found within the Tibeto-Burman popula- -- axis-2 --> F2 (17 %) DT tions [79]. In particular, B5a, D4, G2a mtDNAs are present among Tharus, whereas B4, D5 mtDNAs as well as IET haplogroups A, M7 and R10 were not observed. Signa- -4 tures of this influence are also seen in the Tharu Y chro- -4 -2 0 2 4 6 8 mosomes that are almost completely represented by -- axis F1 (22 %) --> haplogroup O3-M117. Interestingly, Tibetan markers not present in the other Nepalese populations [37] are PrincipalfrequenciesFigure 8 component analysis of Y-chromosome haplogroup revealed in the Central Tharus by haplogroups D (4.5%) Principal component analysis of Y-chromosome hap- and Q (0.7%) of the Y chromosome. logroup frequencies. (a) Comparison with Nepalese and Tibetan groups [37]; (b) Comparison with some Indian caste The Middle Eastern signature of the Tharus and tribal groups [15] where our data have been normalized West Eurasian markers are virtually absent in the mtDNA to the Sengupta level of resolution. Populations: Kat, Kath- of Tharus, whereas they are present in their Y chromo- mandu; New, Newar; Tam, Tamang; Tib, Tibet; DC Dravid- somes essentially as J2-M410* and J2-M241*, with a fre- ian castes; IEC, Indo-European castes; AAT, Austro-Asiatic quency peak (30%) in the eastern sample, where three E- tribals; TBT, Tibeto-Burman tribals; DT, Dravidian tribals; M35 chromosomes were also observed. These latter, all IET, Indo-European tribals. displaying the same microsatellite haplotype, could be attributed to recent gene flow from the Middle East or, as previously reported for the Indian , from Africa [13,82-84], shows variance values of 0.346 in the [80,81]. By contrast, both sub-haplogroups of J are indic- Tharus and 0.339 in Indian groups [15]. These values are ative of various connections with the Middle East. J-M410, lower than those (0.467 and 0.479) observed in Anatolia which was associated with the first farmer dispersal in [13,82] and (0.410) in Southeast Europe [83,84] and

Page 11 of 16 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:154 http://www.biomedcentral.com/1471-2148/9/154

therefore are compatible with a dispersal of this lineage The links between the Central Tharus and the Andaman from somewhere in the Middle East/Asia Minor. The situ- Islanders through Northeast India (Hg M31), between the ation of J-M241* is more difficult to interpret. The vari- Eastern Tharus and Japan (Hg R30) and between Central ance of this lineage shows a value of 0.437 in the Tharus Tharus and Malaysia (Hg M21), are ancient. However, which is higher than that (0.328) obtained from the whereas our results are in agreement with an Indian ances- Indian data of Sengupta et al. [15], thus suggesting a pre- tor for haplogroup M31 [27], they are not informative Neolithic presence of J-M241* in the Indian subconti- about the origin of haplogroup M21 (observed in two nent. Tharus-CII), given its Southeast Asian frequency and vari- ation [44]. Haplogroup R30 could represent a relic of the The Indian background postulated out-of-Africa South Coastal Route [24], A great majority of the Tharu mtDNA and Y-chromosome whereas M33, together with U9a, indicate ancient links of gene pools is represented by lineages shared or derived India with North and East Africa. These events of gene from Indian haplogroups. In particular, Tharus share with flow, however, according to the divergence times (20.6 + Indians ancient mtDNA haplogroups (see for example, 10.3 and 23.1 + 7.7 ky, respectively), would have occurred the M clades M31, M33, M35, M38, the new M52 and also more recently than those previously described and dated the R30, almost all dated ~30 ky) and Y-chromosome to about 40–45 ky [43]. haplogroups (such as H-M69, O2-P31Tdel, R1-M17* and R2-M124) that, in the isolated malaria-resistant Tharus of Sex-specific influences Terai, could be retained. Therefore, Tharus might have Clear sex-biased frequencies emerged from these analyses. been structured in situ by major demographic episodes of This is particularly evident for the East Asian contribution the past, and then by relatively minor gene flows due to that shows a decreasing trend from Central to Eastern Tha- subsequent migrations. rus and is more strongly represented in the mtDNA than in the Y-chromosome data set. By contrast, the West Eur- Tharu gene pool: a reservoir of variation generated by asian contribution, extremely scarce and even absent in local differentiations and by traces of different migratory the Tharu mtDNA, accounts from 12% to 30% of the Y- routes chromosome data set. As for the Indian component, it is The remarkable qualitative heterogeneity of the three well represented in all groups, with the highest frequen- components and of the age of their haplogroups in the cies in the Eastern Tharu mtDNA and in the Y chromo- total populations and in their sub-groups [see Figures 4 somes of Tharu-CII. and 5 and Additional file 3] makes it possible to set them in a temporal background and to identify links between Apart from genetic drift, these sex-specific influences can the various populations of the Indian subcontinent, as be ascribed to all those human movements with different well as with populations outside this area. male/female composition. Thus, whereas the first human dispersals involved both males and females, more recent Of particular interest is the link emerging between Tharus immigrations, involving mainly men [85], gradually and tribals from Andhra Pradesh, as well illustrated by the diluted the ancient local Y-chromosome pool. A clear Y-chromosome PCA plots (Figure 8) and by the high prev- example of a recent sex-biased influence emerged in the alence in these two populations of the local Y-chromo- comparison between lower and the northern upper casts, some haplogroup component (Figure 9), in comparison the latter receiving in the last few thousand years, a Indo- to the Hindus and to the other populations of Nepal [37] European male genetic input from the North [86,87]. where the inter-regional component is clearly predomi- Thus, the differentiation between tribal and non tribal nant. This further supports a deep common ancestry groups is evident for the Y chromosome (Figure 8) between Tharus and Indians, probably due to the legacy of whereas a major similarity characterizes the two groups the first settlers who arrived from the Indian coasts during for mitochondrial DNA (Figures 7). the out-of-Africa dispersal. Subsequently, the high level of consanguinity inside numerous social boundaries, along Comparison with other Nepalese populations with the influences of evolutionary forces such as long- By considering the Nepalese populations examined by term isolation, could be responsible for the development Gayden et al. [37], apart from the homogeneous Tamang of local genetic variants stemming out from the same sample that displays almost exclusively the East Asian founders, as seen for mtDNA haplogroups M43, M51, haplogroup O3-M134, the Newar and Kathmandu M52, R30a in figures 4 and 5. groups, like Tharus, show an important Indian compo- nent. However, whereas in the first two, the inter-regional Useful in further elucidating and deepening these proc- haplogroups are most represented, in the Tharus the local esses has been the complete sequencing of informative ones are prevalent (Figure 9). Both quantitative and qual- mtDNAs, especially belonging to haplogroup M. itative differences emerge from the East Asian component:

Page 12 of 16 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:154 http://www.biomedcentral.com/1471-2148/9/154

100%

80%

60%

40%

20%

0% Th-CI Th-CII Th-E T-AP H-ND H-Te Tam*a New*a Kat*a (18) (40) (18) (26) (39) (22) (6) (40) (46) Local (%) 61.1 77.5 61.1 61.5 23.0 13.6 33.3 15.0 21.7 Inter-Regional (%) 38.9 22.5 38.9 38.5 77.0 86.4 66.7 85.0 78.3

ulationsFigureHistograms 9of the of thepresent Indian study local comp (Hgs:ared C5, withF, H, otherL1, O2a1a1) Nepalese and groups inter-regional (Hgs: L* and R) components observed in the pop- Histograms of the Indian local (Hgs: C5, F, H, L1, O2a1a1) and inter-regional (Hgs: L* and R) components observed in the populations of the present study compared with other Nepalese groups. Sample sizes are in parentheses. (a) Gayden et al. [37] on the whole it is most frequent and heterogeneous Particularly informative has been the complete mtDNA among Tharus, especially in the Chitwan groups which, in sequencing that further supports a deep differentiation of addition to the frequent Hg-O3-M117, show the Hgs D mtDNA haplogroups in the Indian subcontinent, indicat- and Q, reflecting a Tibetan influence. The West Eurasian ing that some branches are geographically or socially spe- component, virtually absent in the Tibetan sample, is rep- cific, while others are widespread. The improvement in resented in Newar and Kathmandu groups with frequen- the mtDNA phylogeny has also allowed the identification cies of 7.6% and 10.4%, respectively. It is interesting to of ancient relationships between Tharus, not only with note however, that the Newar sample in addition shows a the Indian subcontinent area, including Pakistan, but also substantial presence (10.6%) of the R1-M269 haplogroup with the Andaman Islands, Malaysia, and Japan, as well as not found in all the other examined populations. between India and North and East Africa. The new sequence data also allow a better definition of the genetic Conclusion relationships among Indian populations at the microgeo- The analyses carried out on the mtDNA and Y chromo- graphic level. Indeed many control-region data from the some of the Tharus, one of the oldest and the largest indig- literature, if compared to the mtDNA sequences of the enous people of Terai, have shown a complex genetic present study can now be classified into known haplo- structure within which are identifiable: i) a deep common groups. ancestry between Tharus and Indians, not previously reported, more evident for mtDNA but also revealed by Moreover, the importance of genetic isolates in revealing the prevalence of the local Indian Y-chromosome sub- variants not easily detectable in the general population component, as in the tribals of Andhra Pradesh; ii) a sig- has clearly emerged. nificant East Asian genetic contribution both in the male and female gene pool; iii) a western heritage, clearly evi- Authors' contributions dent for the Y-chromosome; iv) a remarkable heterogene- SAS-B and OS, designed the research; GM collected sam- ity of the Tharu population (with the Eastern Tharus more ples; SF, MP, RM, generated the mtDNA data; SF, VB, RM dissimilar to the others) ascribable both to various exoge- generated the Y-chromosomal data; OS, SF, MP, VB, and nous influences and to subgroup specific lineages stem- AA carried out the statistical analyses. SAS-B, OS and AT ming from a shared genetic background with Indians. wrote the paper. All authors discussed the results and commented the manuscript.

Page 13 of 16 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:154 http://www.biomedcentral.com/1471-2148/9/154

Additional material comparison with the Tharus of Nepal. Ann Hum Genet 1991, 55:123-136. 10. Modiano G, Morpurgo G, Terrenato L, Novelletto A, Di Rienzo A, Colombo B, Purpura M, Mariani M, Santachiara-Benerecetti AS, Brega Additional file 1 A: Protection against malaria morbidity, near-fixation of the MtDNA control and coding regions information of the population alpha-thalassemia gene in a Nepalese population. Am J Hum samples examined. The data provide the markers examined in the sub- Genet 1991, 48:390-397. jects of the present study. 11. Kivisild T, Roosti S, Metspalu M, Mastana S, Kaldma K, Parik J, Met- spalu E, Adojaan M, Tolk HV, Stepanov V, Golge M, Usanga E, Papiha Click here for file SS, Cinnioğlu C, King R, Cavalli Sforza L, Underhill PA, Villems R: The [http://www.biomedcentral.com/content/supplementary/1471- genetic heritage of earliest settlers persist in both the Indian 2148-9-154-S1.xls] tribal and caste populations. Am J Hum Genet 2003, 72:313-332. 12. Metspalu M, Kivisild T, Metspalu E, Parik J, Hudjashov G, Kaldma K, Serk P, Karmin M, Behar DM, Gilbert MT, Endicott P, Mastana S, Additional file 2 Papiha SS, Skorecki K, Torroni A, Villems R: Most of the extant Additional file 2. Origin of the Figure 4mtDNA complete-sequences. mtDNA boundaries in south and southwest Asia were likely The data provide information on the completely sequenced mtDNA mole- shaped during the initial settlement of Eurasia by anatomi- cules of Figures 4 and 5. cally modern humans. BMC Genet 2004, 5:26. Click here for file 13. Semino O, Magri C, Benuzzi G, Lin AA, Al-Zahery N, Battaglia V, Mac- [http://www.biomedcentral.com/content/supplementary/1471- cioni L, Triantaphyllidis C, Shen P, Oefner PJ, Zhivotovsky LA, King R, Torroni A, Cavalli-Sforza LL, Underhill PA, Santachiara-Benerecetti 2148-9-154-S2.doc] AS: Origin, diffusion, and differentiation of Y-chromosome haplogroups E and J, inferences on the neolithization of Additional file 3 Europe and later migratory events in the Mediterranean area. Am J Hum Genet 2004, 74:1023-1034. Ages of the main Y-chromosome haplogroups in the samples of the 14. Sahoo S, Singh A, Himabindu G, Banerjee J, Sitalaxmi T, Gaikwad S, present study together with relevant comparative data from Sengupta Trivedi R, Endicott P, Kivisild T, Metspalu M, Villems R, Kashyap VK: et al. [15]. Age estimates of the main Y-chromosome haplogroups in the A prehistory of Indian Y chromosomes, Evaluating demic dif- different population samples of the present study compared with those fusion scenarios. Proc Nat Acad Sci USA 2006, 103:843-848. reported by Sengupta et al. [15]. 15. Sengupta S, Zhivotovsky LA, King R, Mehdi SQ, Edmonds CA, Chow Click here for file CT, Lin AA, Mitra M, Sil SK, Ramesh A, Usha Rani MV, Thakur CM, Cavalli-Sforza LL, Majumder PP, Underhill PA: Polarity and tempo- [http://www.biomedcentral.com/content/supplementary/1471- rality of high resolution Y chromosome distributions in India 2148-9-154-S3.xls] identify both indigenous and exogenous expansions and reveal minor genetic influence of Central Asian pastoralists. Am J Hum Genet 2006, 78:202-221. 16. Thangaraj K, Chaubey G, Kivisild T, Selvi Rani D, Singh VK, Ismail T, Carvalho-Silva D, Metspalu M, Bhaskar LV, Reddy AG, Chandra S, Acknowledgements Pande V, Prathap Naidu B, Adarsh N, Verma A, Jyothi IA, Mallick CB, This research received support from Progetti Ricerca Interesse Nazionale Shrivastava N, Devasena R, Kumari B, Singh AK, Dwivedi SK, Singh S, Rao G, Gupta P, Sonvane V, Kumari K, Basha A, Bhargavi KR, Lalrem- 2007 (Italian Ministry of the University) (to O.S. and A.T.), Ministero degli ruata A, Gupta AK, Kaur G, Reddy KK, Rao AP, Villems R, Tyler- Affari Esteri (to O.S.) and Compagnia di San Paolo (to O.S. and A.T.). Smith C, Singh L: Maternal footprints of Southeast Asians in North India. Hum Hered 2008, 66:1-9. 17. Andrews RM, Kubacka I, Chinnery PF, Lightowlers RN, Turnbull DM, References Howell N: Reanalysis and revision of the Cambridge reference 1. Terrenato L, Shrestha S, Dixit KA, Luzzatto L, Modiano G, Morpurgo sequence for human mitochondrial DNA. Nat Genet 1999, G, Arese P: Decreased malaria morbidity in the 23:147. compared to sympatric populations in Nepal. Ann Trop Med 18. Torroni A, Rengo C, Guida V, Cruciani F, Sellitto D, Coppa A, Cal- Parasitol 1988, 82:1-11. deron FL, Simionati B, Valle G, Richards M, Macaulay V, Scozzari R: 2. Chopra VP: Studies on serum groups in the Kumaon region, Do the four clades of the mtDNA haplogroup L2 evolve at India. Humangenetik 1970, 10:35-43. different rates? Am J Hum Genet 2001, 69:1348-1356. 3. Bista DB: People of Nepal. Kathmandu, Nepal: Ratna Pustak 19. Ingman M, Gyllensten U: Mitochondrial genome variation and Bhandar; 1980. evolutionary history of Australian and New Guinean aborig- 4. Brega A, Gardella R, Semino O, Morpurgo G, Astaldi Ricotti GB, Wal- ines. Genome Res 2003, 13:1600-1606. lace DC, Santachiara-Benerecetti AS: Genetic studies on the 20. Palanichamy MG, Sun C, Agrawal S, Bandelt HJ, Kong QP, Khan F, Tharu population of Nepal, restriction endonuclease poly- Wang CY, Chaudhuri TK, Palla V, Zhang YP: Phylogeny of mito- morphisms of mitochondrial DNA. Am J Hum Genet 1986, chondrial DNA macrohaplogroup N in India, based on com- 39:502-512. plete sequencing, implications for the peopling of South 5. Passarino G, Semino O, Pepe G, Shrestha SL, Modiano G, Santachiara Asia. Am J Hum Genet 2004, 75:966-978. Benerecetti AS: MtDNA polymorphisms among Tharus of 21. Palanichamy MG, Agrawal S, Yao YG, Kong QP, Sun C, Khan F, eastern Terai (Nepal). Gene Geography 1992, 6:139-147. Chaudhuri TK, Zhang YP: Comment on "Reconstructing the 6. Passarino G, Semino O, Modiano G, Santachiara-Benerecetti AS: origin of Andaman islanders". Science 2006, 311:470. COII/tRNA(Lys) intergenic 9-bp deletion and other mtDNA 22. Tanaka M, Cabrera VM, González AM, Larruga JM, Takeyasu T, Fuku markers clearly reveal that the Tharus (southern Nepal) N, Guo LJ, Hirose R, Fujita Y, Kurata M, Shinoda K, Umetsu K, have Oriental affinities. Am J Hum Genet 1993, 53:609-618. Yamada Y, Oshida Y, Sato Y, Hattori N, Mizuno Y, Arai Y, Hirose N, 7. Passarino G, Semino O, Bernini LF, Santachiara-Benerecetti AS: Pre- Ohta S, Ogawa O, Tanaka Y, Kawamori R, Shamoto-Nagai M, Maru- Caucasoid and Caucasoid genetic features of the Indian pop- yama W, Shimokata H, Suzuki R, Shimodaira H: Mitochondrial ulation, revealed by mtDNA polymorphisms. Am J Hum Genet genome variation in eastern Asia and the peopling of Japan. 1996, 59:927-934. Genome Res 2004, 14:1832-1850. 8. Passarino G, Semino O, Modiano G, Bernini LF, Santachiara- 23. Achilli A, Rengo C, Battaglia V, Pala M, Olivieri A, Fornarino S, Magri Benerecetti AS: MtDNA provides the first known marker dis- C, Scozzari R, Babudri N, Santachiara-Benerecetti AS, Bandelt HJ, tinguishing proto-Indians from the other Caucasoids; it Semino O, Torroni A: Saami and Berbers-an unexpected mito- probably predates the diversification between Indians and chondrial DNA link. Am J Hum Genet 2005, 76:883-886. Orientals. Ann Hum Biol 1996, 23:121-126. 24. Macaulay V, Hill C, Achilli A, Rengo C, Clarke D, Meehan W, Black- 9. Semino O, Torroni A, Scozzari R, Brega A, Santachiara-Benerecetti burn J, Semino O, Scozzari R, Cruciani F, Taha A, Shaari NK, Raja JM, AS: Mitochondrial DNA polymorphisms among Hindus, a

Page 14 of 16 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:154 http://www.biomedcentral.com/1471-2148/9/154

Ismail P, Zainuddin Z, Goodwin W, Bulbeck D, Bandelt H, Oppenhe- 39. Underhill PA, Shen P, Lin AA, Jin L, Passarino G, Yang WH, Kauffman imer S, Torroni A, Richards M: Single, rapid coastal settlement E, Bonne-Tamir B, Bertranpetit J, Francalacci P, Ibrahim M, Jenkins T, of Asia revealed by analysis of complete mitochondrial Kidd JR, Mehdi SQ, Seielstad MT, Wells RS, Piazza A, Davis RW, Feld- genomes. Science 2005, 308:1034-1036. man MW, Cavalli-Sforza LL, Oefner PJ: Y chromosome sequence 25. Thangaraj K, Chaubey G, Kivisild T, Reddy AG, Singh VK, Rasalkar variation and the history of human populations. Nat Genet AA, Singh L: Reconstructing the origin of Andaman Islanders. 2000, 26:358-361. Science 2005, 308:996. 40. Zhivotovsky LA, Underhill PA, Cinnioğlu C, Kayser M, Morar B, Kivis- 26. Thangaraj K, Chaubey G, Singh VK, Vanniarajan A, Thanseem I, Reddy ild T, Scozzari R, Cruciani F, Destro-Bisol G, Spedini G, Chambers AG, Singh L: In situ origin of deep rooting lineages of mito- GK, Herrera RJ, Yong KK, Gresham D, Tournev I, Feldman MW, chondrial Macrohaplogroup 'M' in India. BMC Genomics 2005, Kalaydjieva L: The effective mutation rate at Y chromosome 7:151. short tandem repeats, with application to human population 27. Endicott P, Metspalu M, Stringer C, Macaulay V, Cooper A, Sanchez divergence time. Am J Hum Genet 2004, 74:50-61. JJ: Multiplexed SNP typing of ancient DNA clarifies the origin 41. Nei M: Molecular Evolutionary Genetics. New York, Columbia of Andaman mtDNA haplogroups amongst south Asian University; 1987. tribal populations. PLoS ONE 2006, 1:e81. 42. Quintana-Murci L, Chaix R, Wells RS, Behar DM, Sayar H, Scozzari R, 28. Kivisild T, Shen P, Wall DP, Do B, Sung R, Davis K, Passarino G, Rengo C, Al-Zahery N, Semino O, Santachiara-Benerecetti AS, Coppa Underhill PA, Scharfe C, Torroni A, Scozzari R, Modiano D, Coppa A, A, Ayub Q, Mohyuddin A, Tyler-Smith C, Qasim Mehdi S, Torroni A, de Knijff P, Feldman M, Cavalli-Sforza LL, Oefner PJ: The role of McElreavey K: Where west meets east, the complex mtDNA selection in the evolution of human mitochondrial genomes. landscape of the southwest and Central Asian corridor. Am J Genetics 2006, 172:373-387. Hum Genet 2004, 74:827-845. 29. Kong QP, Bandelt HJ, Sun C, Yao YG, Salas A, Achilli A, Wang CY, 43. Olivieri A, Achilli A, Pala M, Battaglia V, Fornarino S, Al-Zahery N, Zhong L, Zhu CL, Wu SF, Torroni A, Zhang YP: Updating the East Scozzari R, Cruciani F, Behar DM, Dugoujon JM, Coudray C, Santachi- Asian mtDNA phylogeny, a prerequisite for the identifica- ara-Benerecetti AS, Semino O, Bandelt HJ, Torroni A: The mtDNA tion of pathogenic mutations. Hum Mol Genet 2006, legacy of the Levantine early Upper Palaeolithic in Africa. 15:2076-2086. Science 2006, 314:1767-1770. 30. Sun C, Kong QP, Palanichamy MG, Agrawal S, Bandelt HJ, Yao YG, 44. Hill C, Soares P, Mormina M, Macaulay V, Clarke D, Blumbach PB, Khan F, Zhu CL, Chaudhuri TK, Zhang YP: The dazzling array of Vizuete-Forster M, Forster P, Bulbeck D, Oppenheimer S, Richards basal branches in the mtDNA macrohaplogroup M from M: A mitochondrial stratigraphy for island southeast Asia. India as inferred from complete genomes. Mol Biol Evol 2006, Am J Hum Genet 2007, 80:29-43. 23:683-690. 45. Chaubey G, Metspalu M, Kivisild T, Villems R: Peopling of South 31. Reddy BM, Langstieh BT, Kumar V, Nagaraja T, Reddy AN, Meka A, Asia, investigating the caste-tribe continuum in India. Bioes- Reddy AG, Thangaraj K, Singh L: Austro-Asiatic tribes of North- says 2007, 29:91-100. east India provide hitherto missing genetic link between 46. Lee HY, Yoo JE, Park MJ, Chung U, Kim CY, Shin KJ: East Asian South and Southeast Asia. PLoS ONE 2007:e1141. mtDNA haplogroup determination in Koreans, haplogroup- 32. Mishmar D, Ruiz-Pesini E, Golik P, Macaulay V, Clark AG, Hosseini S, level coding region SNP analysis and subhaplogroup-level Brandon M, Easley K, Chen E, Brown MD, Sukernik RI, Olckers A, control region sequence analysis. Electrophoresis 2006, Wallace DC: Natural selection shaped regional mtDNA varia- 27:4408-4418. tion in humans. Proc Natl Acad Sci USA 2003, 100:171-176. 47. Hill C, Soares P, Mormina M, Macaulay V, Meehan W, Blackburn J, 33. Hammer MF, Horai S: Y chromosomal DNA variation and the Clarke D, Raja JM, Ismail P, Bulbeck D, Oppenheimer S, Richards M: peopling of Japan. Am J Hum Genet 1995, 56:951-962. Phylogeography and ethnogenesis of aboriginal Southeast 34. Rosser ZH, Zerjal T, Hurles ME, Adojaan M, Alavantic D, Amorim A, Asians. Mol Biol Evol 2006, 23:2480-2491. Amos W, Armenteros M, Arroyo E, Barbujani G, Beckman G, Beck- 48. Kong QP, Yao YG, Liu M, Shen SP, Chen C, Zhu CL, Palanichamy MG, man L, Bertranpetit J, Bosch E, Bradley DG, Brede G, Cooper G, Zhang YP: Mitochondrial DNA sequence polymorphisms of Côrte-Real HB, de Knijff P, Decorte R, Dubrova YE, Evgrafov O, Gilis- five ethnic populations from northern China. Hum Genet 2003, sen A, Glisic S, Gölge M, Hill EW, Jeziorowska A, Kalaydjieva L, Kay- 113:391-405. ser M, Kivisild T, Kravchenko SA, Krumina A, Kucinskas V, Lavinha J, 49. Kong QP, Yao YG, Sun C, Bandelt HJ, Zhu CL, Zhang YP: Phylogeny Livshits LA, Malaspina P, Maria S, McElreavey K, Meitinger TA, Mikel- of East Asian mitochondrial DNA lineages inferred from saar AV, Mitchell RJ, Nafa K, Nicholson J, Nørby S, Pandya A, Parik J, complete sequences. Am J Hum Genet 2003, 73:671-676. Patsalis PC, Pereira L, Peterlin B, Pielberg G, Prata MJ, Previderé C, 50. Oota H, Settheetham-Ishida W, Tiwawech D, Ishida T, Stoneking M: Roewer L, Rootsi S, Rubinsztein DC, Saillard J, Santos FR, Stefanescu Human mtDNA and Y-chromosome variation is correlated G, Sykes BC, Tolun A, Villems R, Tyler-Smith C, Jobling MA: Y-chro- with matrilocal versus patrilocal residence. Nat Genet 2001, mosomal diversity in Europe is clinal and influenced prima- 29:20-21. rily by geography, rather than by language. Am J Hum Genet 51. Yao YG, Kong QP, Bandelt HJ, Kivisild T, Zhang YP: Phylogeo- 2000, 67:1526-1543. graphic differentiation of mitochondrial DNA in Han Chi- 35. Zerjal T, Beckman L, Beckman G, Mikelsaar AV, Krumina A, Kucins- nese. Am J Hum Genet 2002, 70:635-651. kas V, Hurles ME, Tyler-Smith C: Geographical, linguistic, and 52. Cordaux R, Saha N, Bentley GR, Aunger R, Sirajuddin SM, Stoneking cultural influences on genetic diversity, Y-chromosomal dis- M: Mitochondrial DNA analysis reveals diverse histories of tribution in Northern European populations. Mol Biol Evol tribal populations from India. Eur J Hum Genet 2003, 11:253-264. 2001, 18:1077-1087. 53. Kivisild T, Reidla M, Metspalu E, Rosa A, Brehm A, Pennarun E, Parik 36. Mohyuddin A, Ayub Q, Underhill PA, Tyler-Smith C, Mehdi SQ: J, Geberhiwot T, Usanga E, Villems R: Ethiopian mitochondrial Detection of novel Y SNPs provides further insights into Y DNA heritage, tracking gene flow across and around the chromosomal variation in Pakistan. J Hum Genet 2006, gate of tears. Am J Hum Genet 2004, 75:752-770. 51:375-378. 54. Chaubey G, Karmin M, Metspalu E, Metspalu M, Selvi-Rani D, Singh 37. Gayden T, Cadenas AM, Regueiro M, Singh NB, Zhivotovsky LA, VK, Parik J, Solnik A, Naidu BP, Kumar A, Adarsh N, Mallick CB, Underhill PA, Cavalli-Sforza LL, Herrera RJ: The Himalayas as a Trivedi B, Prakash S, Reddy R, Shukla P, Bhagat S, Verma S, Vasnik S, directional barrier to gene flow. Am J Hum Genet 2007, Khan I, Barwa A, Sahoo D, Sharma A, Rashid M, Chandra V, Reddy 80:884-894. AG, Torroni A, Foley RA, Thangaraj K, Singh L, Kivisild T, Villems R: 38. Underhill PA, Myres NM, Rootsi S, Chow CET, Lin AA, Otillar RP, Phylogeography of mtDNA haplogroup R7 in the Indian King R, Zhivotovsky LA, Balanovsky O, Pshenichnov A, Ritchie KH, peninsula. BMC Evol Biol 2008, 8:227. Cavalli-Sforza LL, Kivisild T, Villems R, Woodward SR: New phylo- 55. Bamshad MJ, Watkins WS, Dixon ME, Jorde LB, Rao BB, Naidu JM, genetic relationships for Y-chromosome haplogroup I, reap- Prasad BV, Rasanayagam A, Hammer MF: Female gene flow strat- praising its phylogeography and prehistory In Rethinking the ifies Hindu castes. Nature 1998, 395:651-652. human revolution. In Rethinking the Human Revolution Edited by: 56. Kivisild T, Bamshad MJ, Kaldma K, Metspalu M, Metspalu E, Reidla M, Mellars P, Boyle K, Bar-Yosef O, Stringer C. Cambridge, UK McDon- Laos S, Parik J, Watkins WS, Dixon ME, Papiha SS, Mastana SS, Mir ald Institute for Archaeological Research; 2007:33-42. MR, Ferak V, Villems R: Deep common ancestry of indian and

Page 15 of 16 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:154 http://www.biomedcentral.com/1471-2148/9/154

western-Eurasian mitochondrial DNA lineages. Curr Biol 1999, Read M, Pearson NM, Zerjal T, Webster MT, Zholoshvili I, Jamarjas- 9:1331-1334. hvili E, Gambarov S, Nikbin B, Dostiev A, Aknazarov O, Zalloua P, 57. Roychoudhury S, Roy S, Basu A, Banerjee R, Vishwanathan H, Usha Tsoy I, Kitaev M, Mirrakhimov M, Chariev A, Bodmer WF: The Eur- Rani MV, Sil SK, Mitra M, Majumder PP: Genomic structures and asian Heartland, A Continental Perspective on Y-chromo- population histories of linguistically distinct tribal groups of some Diversity. Proc Nat Acad Sci USA 2001, 98:10244-10249. India. Hum Genet 2001, 109:339-350. 77. Passarino G, Semino O, Magri C, Al-Zahery N, Benuzzi G, Quintana- 58. Kivisild T, Tolk HV, Parik J, Wang Y, Papiha SS, Bandelt HJ, Villems R: Murci L, Andellnovic S, Bullc-Jakus F, Liu A, Arslan A, Santachiara- The emerging limbs and twigs of the East Asian mtDNA Benerecetti AS: The 49a,f haplotype 11 is a new marker of the tree. Mol Biol Evol 2002, 19:1737-1751. Erratum in Mol Biol Evol 2003, EU19 lineage that traces migrations from northern regions 20:162 of the Black Sea. Hum Immunol 2001, 62:922-932. Erratum in, Hum 59. Yao YG, Lü XM, Luo HR, Li WH, Zhang YP: Gene admixture in Immunol 62:1313–1314 the silk road region of China, evidence from mtDNA and 78. Cordaux R, Weiss G, Saha N, Stoneking M: The Northeast Indian melanocortin 1 receptor polymorphism. Genes Genet Syst 2000, passageway, A barrier or corridor for human migrations? 75:173-178. Mol Biol Evol 2004, 21:1525-1533. 60. Yao YG, Kong QP, Wang CY, Zhu CL, Zhang YP: Different matri- 79. Wen B, Xie X, Gao S, Li H, Shi H, Song X, Qian T, Xiao C, Jin J, Su B, lineal contributions to genetic structure of ethnic groups in Lu D, Chakraborty R, Jin L: Analyses of genetic structure of the silk road region in China. Mol Biol Evol 2004, 21:2265-2280. Tibeto-Burman populations reveals sex-biased admixture in 61. Misra VN: Prehistoric human colonization of India. J Biosci southern Tibeto-Burmans. Am J Hum Genet 2004, 74:856-865. 2001, 26:491-531. 80. Thangaraj K, Ramana GV, Singh L: Y-chromosome and mitochon- 62. Y Chromosome Consortium: A nomenclature system for the drial DNA polymorphisms in Indian populations. Electrophore- tree of human Y chromosomal binary haplogroups. Genome sis 1999, 20:1743-1747. Res 2002, 12:339-348. 81. Ramana GV, Su B, Jin L, Singh L, Wang N, Underhill PA, Chakraborty 63. Hammer MF, Karafet TM, Redd AJ, Jarjanazi H, Santachiara- R: Y-chromosome SNP haplotypes suggest evidence of gene Benerecetti AS, Soodyall H, Zegura SL: Hierarchical patterns of flow among caste, tribe, and the migrant populations of global human Y-chromosome diversity. Mol Biol Evol 2001, Andhra Pradesh, South India. Eur J Hum Genet 2001, 9:695-700. 18:1189-1203. 82. Cinnioğlu C, King R, Kivisild T, Kalfoglu E, Atasoy S, Cavalleri GL, Lil- 64. Hurles ME, Sykes BC, Jobling MA, Forster P: The dual origin of the lie AS, Roseman CC, Lin AA, Prince K, Oefner PJ, Shen P, Semino O, Malagasy in Island Southeast Asia and East Africa, evidence Cavalli-Sforza LL, Underhill PA: Excavating Y-chromosome hap- from maternal and paternal lineages. Am J Hum Genet 2005, lotype strata in Anatolia. Hum Genet 2004, 114:127-148. 76:894-901. 83. King RJ, Ozcan SS, Carter T, Kalfoğlu E, Atasoy S, Triantaphyllidis C, 65. Xue Y, Zerjal T, Bao W, Zhu S, Shu Q, Xu J, Du R, Fu S, Li P, Hurles Kouvatsi A, Lin AA, Chow CE, Zhivotovsky LA, Michalodimitrakis M, ME, Yang H, Tyler-Smith C: Male demography in East Asia, a Underhill PA: Differential Y-chromosome Anatolian influ- north-south contrast in human population expansion times. ences on the Greek and Cretan Neolithic. Ann Hum Genet Genetics 2006, 172:2431-2439. 2008, 72:205-214. 66. Underhill PA, Passarino G, Lin AA, Shen P, Lahr MM, Foley RA, Oef- 84. Battaglia V, Fornarino S, Al-Zahery N, Olivieri A, Pala M, Myres NM, ner PJ, Cavalli Sforza LL: The phylogeography of Y chromosome King RJ, Rootsi S, Marjanovic D, Primorac D, Hadziselimovic R, binary haplotypes and the origins of modern human popula- Vidovic S, Drobnic K, Durmishi N, Torroni A, Santachiara- tions. Ann Hum Genet 2001, 65:43-62. Benerecetti AS, Underhill PA, Semino O: Y-Chromosomal Evi- 67. Jobling MA, Tyler-Smith C: The human Y chromosome, an evo- dence of the Cultural Diffusion of Agriculture in Southeast lutionary marker comes of age. Nature Rev Genet 2003, Europe. Eur J Hum Genet 2009, 17:820-830. 4:598-612. 85. Mukherjee N, Nebel A, Oppenheim A, Majumder PP: High-resolu- 68. Shi H, Dong Y, Wen B, Xiao C, Underhill PA, Shen P, Chakraborty R, tion analysis of Y-chromosomal polymorphisms reveals sig- Jin L, Su B: Y-Chromosome evidence of southern origin of the natures of population movements from Central Asia and East Asian-specific haplogroup O3-M122. Am J Hum Genet West Asia into India. J Genet 2001, 80:125-135. 2005, 77:408-419. 86. Bamshad MJ, Kivisild T, Watkins WS, Dixon ME, Ricker CE, Rao BB, 69. Hammer MF, Karafet TM, Park H, Omoto K, Harihara S, Stoneking M, Naidu JM, Prasad BV, Reddy PG, Rasanayagam A, Papiha SS, Villems R, Horai S: Dual origins of the Japanese, common ground for Redd AJ, Hammer MF, Nguyen SV, Carroll ML, Batzer MA, Jorde LB: hunter-gatherer and farmer Y chromosomes. J Hum Genet Genetic evidence on the origins of Indian caste populations. 2006, 51:47-58. Genome Res 2001, 11:994-1004. 70. Kayser M, Brauer S, Cordaux R, Casto A, Lao O, Zhivotovsky LA, 87. Zhao Z, Khan F, Borkar M, Herrera R, Agrawal S: Presence of three Moyse-Faurie C, Rutledge RB, Schiefenhoevel W, Gil D, Lin AA, different paternal lineages among North Indians: a study of Underhill PA, Oefner PJ, Trent RJ, Stoneking M: Melanesian and 560 Y chromosomes. Ann Hum Biol 2009, 36:46-59. Asian origins of Polynesians, mtDNA and Y chromosome gradients across the Pacific. Mol Biol Evol 2006, 23:2234-2244. 71. Thanseem I, Thangaraj K, Chaubey G, Singh VK, Bhaskar LV, Reddy BM, Reddy AG, Singh L: Genetic affinities among the lower castes and tribal groups of India, inference from Y chromo- some and mitochondrial DNA. BMC Genet 2006, 7:42. 72. Kumar V, Reddy AN, Babu JP, Rao TN, Langstieh BT, Thangaraj K, Reddy AG, Singh L, Reddy BM: Y-chromosome evidence sug- gests a common paternal heritage of Austro-Asiatic popula- Publish with BioMed Central and every tions. BMC Evol Biol 2007, 7:47. 73. Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL, scientist can read your work free of charge Hammer MF: New binary polymorphisms reshape and "BioMed Central will be the most significant development for increase resolution of the human Y chromosomal haplo- disseminating the results of biomedical research in our lifetime." group tree. Genome Res 2008, 18:830-838. 74. Cordaux R, Aunger R, Brentley G, Nasidze I, Sirajuddin SM, Stoneking Sir Paul Nurse, Cancer Research UK M: Independent origins of Indian caste and tribal paternal lin- Your research papers will be: eages. Current Biol 2004, 14:231-235. 75. Quintana-Murci L, Chaix R, Wells RS, Behar DM, Sayar H, Scozzari R, available free of charge to the entire biomedical community Rengo C, Al Zahery N, Semino O, Santachiara-Benerecetti AS, Coppa peer reviewed and published immediately upon acceptance A, Ayub Q, Mohyuddin A, Tyler-Smith C, McElreavey K: Y-Chromo- some lineages trace diffusion of people and languages in cited in PubMed and archived on PubMed Central southwestern Asia. Am J Hum Genet 2001, 68:537-542. yours — you keep the copyright 76. Wells RS, Yuldasheva N, Ruzibakiev R, Underhill PA, Evseeva I, Blue- Smith J, Jim L, Su B, Pitchappan R, Shanmuglakshmi S, Balakrisnan K, Submit your manuscript here: BioMedcentral http://www.biomedcentral.com/info/publishing_adv.asp

Page 16 of 16 (page number not for citation purposes)