<<

Journal of Genetics (2012) 57, 228–234 & 2012 The Japan Society of All rights reserved 1434-5161/12 $32.00 www.nature.com/jhg

ORIGINAL ARTICLE

Revisiting the role of the Himalayas in peopling Nepal: insights from mitochondrial

Hua-Wei Wang1,2, Yu-Chun Li1, Fei Sun3, Mian Zhao1,4, Bikash Mitra2,5, Tapas Kumar Chaudhuri5, Pasupati Regmi6, Shi-Fang Wu1,4, Qing-Peng Kong1,4 and Ya-Ping Zhang1,2,4

Himalayas was believed to be a formidably geographical barrier between South and East Asia. The observed high frequency of the East Eurasian paternal lineages in Nepal led some researchers to suggest that these lineages were introduced into Nepal from Tibet directly; however, it is also possible that the East Eurasian genetic components might trace their origins to northeast India where abundant East Eurasian maternal lineages have been detected. To trace the origin of the Nepalese maternal genetic components, especially those of East Eurasian ancestry, and then to better understand the role of the Himalayas in peopling Nepal, we have studied the matenal genetic composition extensively, especially the East Eurasian lineages, in Nepalese and its surrounding populations. Our results revealed the closer affinity between the Nepalese and the Tibetans, specifically, the Nepalese lineages of the East Eurasian ancestry generally are phylogenetically closer with the ones from Tibet, albeit a few mitochondrial DNA haplotypes, likely resulted from recent flow, were shared between the Nepalese and northeast Indians. It seems that Tibet was most likely to be the homeland for most of the East Eurasian in the Nepalese. Taking into account the previous observation on Y , now it is convincing that bearer of the East Eurasian genetic components had entered Nepal across the Himalayas around 6 kilo years ago (kya), a scenario in good agreement with the previous results from linguistics and archeology. Journal of Human Genetics (2012) 57, 228–234; doi:10.1038/jhg.2012.8; published online 22 March 2012

Keywords: mtDNA; Nepalese; origin

INTRODUCTION DNA (mtDNA) haplogroups M31, M33, M35, M38, R6 and R305,7) Understanding how the ancestors of modern had successfully are prevalent in the Nepalese, reflecting extensive connections between settled and adapted to areas with extreme conditions, for example, Nepal and India; whereas the East Eurasian influence on the Nepalese highlands, continues to be a hot issue of wide interest. For instance, a is surprisingly substantial, as manifested by the presence of Y-chromo- recent study on mitochondrial information revealed that the some haplogroup O3-M1174,5 and mtDNA haplogroups B5, D4 modern humans had successfully colonized the Tibetan Plateau since and G.5,7 As the Himalayas acts as a formidably geographical barrier the Late Pleistocene.1 As a neighbor of Tibet, Nepal is located at the between Nepal and East Asia (Tibet), how these East Eurasian southern piedmont of the Himalayas and is bordered by India and components dispersed to Nepal becomes an issue of hot debate. China. Similar to the Tibetans, the Nepalese are also famous for their Recently, by comparing the Y-chromosome lineages between the high-altitude living conditions. In consistent with the language Nepalese and the Tibetan populations, Gayden et al. have proposed affiliation of the Nepalese populations (viz. Indo-European and that the East Eurasian genetic components had been introduced into Tibeto-Burman),2–3 recent studies on the Nepalese have detected Nepal from Tibet directly,4,6 a scenario seemingly in agreement with substantial genetic contribution from South Asians, East and West the hypothesized retreat of Baric speakers.8 However, considering the Eurasians,4,5 it becomes evident that the genetic landscape of the close vicinity between Nepal and northeast India in geography, a Nepalese has been largely shaped by later immigrants from the possibility that these genetic components of the East Eurasian ancestry neighboring regions.4–6 For instance, South Asian genetic components might trace their origins to northeast India could not be ruled out (for example, Y-chromosome haplogroups H-M52*, H-M69*, completely. Indeed, abundant East Eurasian mtDNA lineages have H1-M82*, H1-M370*, R1a1-M198 and R2-M124;4,6 mitochondrial already been detected in the northeast Indian populations,9–12 raising

1State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan Province, China; 2Laboratory for Conservation and Utilization of Bio-Resources & Key Laboratory for Microbial Resources of the Ministry of Education, Yunnan University, Kunming, PRChina;3Department of Parasitology, Medical Teaching Institute, Xinxiang Medical University, Xinxiang, China; 4KIZ/CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming, China; 5Cellular Immunology Laboratory, Department of Zoology, University of North Bengal Siliguri, Siliguri, India and 6Little Angel’s College Hattiban, Lalitpur, Nepal Correspondence: Dr Q-P Kong or Dr Y-P Zhang, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, Yunnan Province, China. E-mail: [email protected] or [email protected] Received 30 September 2011; revised 5 January 2012; accepted 8 January 2012; published online 22 March 2012 Reconstructing the origin of the Nepalese H-W Wang et al 229

Figure 1 Sampling locations of the populations analyzed in this study. The Nepalese populations collected in this study are highlighted with solid pentacles, and the black dots represent the populations retrieved from the literature.

Table 1 Information of the 43 populations analyzed in this study

Population Code Location Country Group Sample size Reference

Nepal_Kat 1 Kathmandu Nepal Nepalese 200 This study Nepal_Mix 2 Eastern Nepal Nepal Nepalese 46 This study Tharu_CI 3 Central Terai Nepal Nepalese 57 Fornarino et al.5 Tharu_CII 4 Central Terai Nepal Nepalese 76 Fornarino et al.5 Tharu_E 5 Eastern Terai Nepal Nepalese 40 Fornarino et al.5 Kanet 6 Himachal India Northwest Indian 37 Metspalu et al.11 Kashmir 7 Kashmir India Northwest Indian 19 Kivisild et al.14 Lobana 8 Punjab India Northwest Indian 65 Kivisild et al.14 Brahmin 9 Punjab India Northwest Indian 26 Metspalu et al.11 Jat Sikh 10 Punjab India Northwest Indian 30 Metspalu et al.11 Kshatriya 11 Punjab India Northwest Indian 34 Metspalu et al.11 Scheduled caste 12 Punjab India Northwest Indian 20 Metspalu et al.11 Sikh 13 Punjab India Northwest Indian 40 Cordaux et al.9 Tharu_UP 14 Uttar Pradesh India North Indian 38 Metspalu et al.11;Kivisildet al.14 Kharia 15 Bihar India North Indian 39 Kumar et al.17; Banerjee et al.16 Bhumij 16 Bihar India North Indian 60 Kumar et al.17; Banerjee et al.16 Santhal 17 Bihar India North Indian 45 Kumar et al.17 Garo 18 Meghalaya India Northeast Indian 76 Reddy et al.12 Lyngnga 19 Meghalaya India Northeast Indian 74 Reddy et al.12 Nongtrai 20 Meghalaya India Northeast Indian 27 Reddy et al.12 Maram 21 Meghalaya India Northeast Indian 60 Reddy et al.12 Bhoi 22 Meghalaya India Northeast Indian 29 Reddy et al.12 Khynriam 23 Meghalaya India Northeast Indian 82 Reddy et al.12 War_Khas 24 Meghalaya India Northeast Indian 29 Reddy et al.12 Pnar 25 Meghalaya India Northeast Indian 51 Reddy et al.12 War_Jaint 26 Meghalaya India Northeast Indian 17 Reddy et al.12 Adi 27 Assam India Northeast Indian 45 Cordaux et al.9 Apa_1 28 Arunachal Pradesh India Northeast Indian 26 Cordaux et al.9 Apa_2 29 Tripura India Northeast Indian 21 Cordaux et al.9 Nishi 30 Tripura India Northeast Indian 44 Cordaux et al.9 Tipperah 31 Tripura India Northeast Indian 20 Roychoudhury et al.15 Naga 32 Nagaland India Northeast Indian 43 Cordaux et al.9 Tib_Ng 33 Ngari China Tibetan 46 Qin et al.13 Tib_R1 34 Shigatse China Tibetan 59 Qin et al.13 Tib_R2 35 Shigatse China Tibetan 220 Zhao et al.1 Tib_L 36 Lhasa China Tibetan 59 Qin et al.13 Tib_N1 37 Nagqu China Tibetan 168 Zhao et al.1 Tib_N2 38 Nagqu China Tibetan 58 Qin et al.13 Tib_S 39 Shannan China Tibetan 56 Qin et al.13 Monba 40 Nyingchi China Tibetan 51 Qin et al.13 Tib_Ny 41 Nyingchi China Tibetan 53 Qin et al.13 Lhoba 42 Shannan China Tibetan 20 Qin et al.13 Tib_C 43 Chamdo China Tibetan 61 Qin et al.13

Journal of Human Genetics Reconstructing the origin of the Nepalese H-W Wang et al 230

the possibility that the East Eurasian influence could trace its origin from Tibet directly (across the Himalayas) or from northeast India back to the dispersal of Tibeto-Burman people who arrived at Nepal instead. To test both scenarios and get deeper insights into the origin via northeast India.5 Therefore, it is plausible that the bearer of the of the Nepalese, a simple way is to compare the phylogenetic affinity of East Eurasian genetic components might have arrived at Nepal either the East Eurasian lineages, respectively, observed in Nepal, northern

Table 2 The genetic distance between populations from Nepal, China (Tibet), northern India based on the East Eurasian matrilineal Table 3 Admixture analysis of the Nepalese by comparing with its components potential parental populations

northwest north northeast Hybrid Nepalese population Nepalese Indian Indian Indian Tibetan Parental group Kathmandu eastern Nepal Tharu_CI Tharu_CII Tharu_E Nepalese 0.00 northwest Indian 0.03 0.00 Tibetan 0.58±0.16 0.45±0.17 0.87±0.23 0.75±0.19 0.12±0.24 north Indian 0.12 0.17 0.00 northeast Indian 0.24±0.14 0.17±0.14 0.00±0.19 0.00±0.16 0.12±0.20 northeast Indian 0.04 0.05 0.12 0.00 northwest Indian 0.19±0.12 0.39±0.13 0.00±0.17 0.20±0.14 0.56±0.18 Tibetan 0.02 0.03 0.13 0.03 0.00 north Indian 0.01±0.06 0.00±0.07 0.13±0.09 0.06±0.07 0.19±0.09

Nep CS# IG# CS# CS# Nep Nep Nep CS# Nep Nep CS# CS# CS# CS# CS# TK# CS# TK# Nep SF# TK# Nep CS# Nep CS# CS# Nep CS# Nep 152 R58 m306 R134 R114 056 102 198 T17 005 159 T13 A64/ B47/ C8 C48 OR89 R61 AS32 057 Sin11 AS16 079 A68 193 T21 C182 118 SW23 166 B26 B66 14031 16484G 16291 16275 16293 16319 16311 16278 16189 16335 12280 16260 15868 12007 15172 16519 16189 16192 13928C 16296+C 16192 16239 11887 16095 13858 16519 16297 13329 15514 11061 16365 1780 8697 16294 16183C 16111 12425 16293C 8056 16093 10565 9615 466 16390 13708 14509 16344 16209 13105 16519 15323 16400 12477 10680 15951 7546 16319 334 16519 15466 15712 16092 4991 3391 5774 12727 9776 8709 464C 16302 9632 16148 12501 16301 16185 10845 16497 7756 16298 16278 11893 8802 15289 456 16265C 185 @16126 15052 11287 15859 571 5118 4052 7961 8584 455+TT 10906 6413 16144A 11365 16102 15951 16093 2056 12879 16172 8276+C 6261 9966 146 374 16264 16169 14053 3798 14687 207 3654 5436d 5363 392 8572 5417 10334 h7226 3407 15203 13766 4353 522- 10727 12153 3469 8473 12792 204 267 4143 195 8281- 5319 6905 189 14060 13743 3915 523d 8639 13731 152 3202 8886 10166 11893 95C 7867 12279 152 709 M5b1 8289d 4851 3399 13820 11914 5783 960d 269 5426 8676 5423 8572 93 8143 151 16304 152 8005 3483 12813 11167 3766 M35a 11986 5124A 60 6020 16519 6260 3144 11827 6713 207 2880 14410C 1763 575 8930A 15287 10238 3394 462 16093 16519 146 150 204 593 12279 522- 523+CA 15262 11016 297 2000 195 150 15924 10166 523d 333 9064 195 16519 10670 6293 150 4916 M3a1 204 16362 5432 M35b 4454 460 M5b 146 M5a 16324 482 15928 M52 4703 9344 M35 M3a 16048 14323 15908 6293 13368 12477 8562 3221 10750 8784T 3921 4580 1719 5460 3744 M5 709 M3 676 1598 12561 M33 1462T 199 16126 16129 2361 1888 482

Revised Nep MG# Nep MG# Nep Nep TK# CS# IG# IG# CS# CS# CS# NM# CS# CS# CS# Nep CS# Nep Nep CS# CS# Nep Nep Nep CRS 007 A156 138 A190 028 022 AS15 R64 k11b t1331 T153 T14 B107 m2 T8 A47 B156 045 C64 107 168 B177 T6 076 190 075 16325 16224 16311 16311 16519 16093 199 16185 16145 16179d16399 15326 16519 16193 16497 16311 8273 12945 11946 16302 16293 16293C 12858 @15326 14550 14155 16362 8860 16271 15791 16362 16227 14053 4703 9966 8870 16093 16189 16249 8281- 12960 16360 13368 3335 16311 750 15787 243 16390 16261 5805 8491 4343 9653 204 10160 9055 16223 8289d 11437 11928 13143 1598 16234 16357 263 13968 15928 16227 152 6242 956 152 3504 6119 16213 @709 9098 16344 15148 385 9992 16093 16266 522- 13194 16189 3381 214 @522 204 16129 16318 7444 16093 11260 8911 16318T 12136 523d 12007 16183C 183 151 -523d 16318C 12696 16294 5875A 14233 10954 15924 6485 16182C 93 11617 16114 8723 4695 8580 3381 14148 2392 16181 16000T 7270 16093 7853 195 1019 16129 482 1760 8950 M30d 15184 6713 16362 15856 5471 456 15670 207 6185 H2 15326 3866 4659 16311 7444 4047 356+C 15467 16166d 3540 13782 793G 13966 4936 3894 234 15110 356+C 4769 9716 M30b 721 13656 4736 195 12768 3592 1438 8646 153 11963 3184 154 12285 1664 6131 16278 M30c 195 10003 522- H 5911 M30a 16192 9153 5510 13980 523d 8022 12234 14305 7028 709 6366 5147 195 R8 4456A 146 15259 2706 513 152 189 4388 16519 3645 M18 HV 13215 3310 M43 9449 13135 D4 M30 14766 7759 12498 12636 11719 3384 15431 246 11696 M81 73 2755 522-523d 194 10316 16362 M60 522-523d 195A 709 14668 11827 8414 R 9266 5178A 16223 12007 4883 8345 12705 3010 7912

16519 N M 15301 10873 15043 10398 14783 9540 10400 489 8701 L3

Figure 2 Reconstructed mtDNA tree of the completely sequenced representatives of the major Nepalese mtDNA lineages. The phylogenetic tree was reconstructed on the basis of 58 mitochondrial genomes, among which 21 mtDNAs were sampled from Katmandu, Nepal and generated in this study, whereas the rest 37 related mtDNAs were collected from the literature.5,26,27,31–33 are recorded according to the rCRS.19 SuffixesA,C,G,andT indicate transversions, ‘‘d’’ signifies a and a plus (+) signs an insertion; recurrent mutations are underlined. The prefix ‘‘h’’ indicates heteroplasmy and ‘‘@’’ highlights back . The length polymorphisms (for example, 309+C, 309+2C, 315+C and 315+2C) are ignored during the tree reconstruction. The reconstruction of highly recurrent mutations (for example, 16519, and the insertion/deletion of ‘‘CA’’ repeats in region 514–523) is tentative at best.

Journal of Human Genetics Reconstructing the origin of the Nepalese H-W Wang et al 231

India (including northeast, north and northwest India) and Tibet at Table 4 Estimated ages of haplogroups G2a2 and M9a1a2a based on higher molecular resolution, and which becomes feasible with the the calibration rates proposed in Forster et al.44 and Soares et al.46 recently released Tibetan mtDNA data.1,13 To achieve this objective, mtDNA variation within a number of Soares et al.’s rate46 Forster et al.’s rate44 246 Nepalese individuals from Kathmandu and eastern Nepal were collected and studied in this study (Supplementary Table S1, Supple- Haplogroup N r±s T(ky) r±s T(ky) mentary Material online), and the recently released mtDNA data from G2a2 23 0.30±0.20 5.74 (1.98–9.49) 0.30±0.20 6.14 (2.12–10.16) 5 1,13 Nepal and the neighboring regions, especially those from Tibet, M9a1a2a 20 0.30±0.19 5.65 (2.13–9.18) 0.30±0.19 6.05 (2.28–9.83) northeast and northwest India9–12,14–17 were also considered. To Abbreviation: ky, kilo years. further understand the Nepalese mtDNA landscape, a total of 21 The r and s values were obtained by considering the substitutions in segment 16090–16365 representative individuals were selected for completely mtDNA as fully described previously.44,46 genome sequencing in order to unambiguously determine their phylogenetic status (Figure 2).

Group Nepal Group Tri pura MATERIALS AND METHODS 0.750 Apatani North India Group Kashmir Jat Sikh Northeast India Group Sample collection Northwest India Group Tibet of China Group Blood samples of 246 unrelated individuals were collected from Kathmandu Monba Nishi and eastern Nepal with informed consent (Figure 1 and Table 1), and this Nep_Mix 0.500 Scheduled caste project was approved by the Ethics Committee at Kunming Institute of Kanet

Zoology, Chinese Academy of Sciences. The geographical locations of the Lhasa populations are displayed in Figure 1. Shigatse 0.250 Nagqu_2 Nagqu_1 Tharu_CI Nep_Kat Rikaze DNA amplification, sequencing and quality control Adi Lobana Naga Sikh Total DNA was isolated with the standard phenol/chloroform method and Ngari Sa Charndo stored at À80 1C. An mtDNA segment (spanning nucleotide position 16024– 0.000 Apatani Tharu_CII PC 2 (16.841%) Bhumij Lhoba 16569/1–576 and covering the whole-mtDNA control region) was amplified Tharu_E Nyingchi and sequenced as fully described in our previous works.1,18 Mutations are Garo Bhoi recorded by comparing with the revised Cambridge reference sequence 19 -0.250 Shannan (rCRS). All the individuals were allocated into specific haplogroup based War_Jaint on their control-region information; the assignments were further confirmed Tharu by typing additional diagnostic coding-region mutations according to the Pnar reconstructed phylogenetic trees of East Asian,1,20–25 South Asian5,7,12,26–33 -0.500 Kshatriya Brahmin Lyngnga 34–38 War_Khas and Southeast Asian (Supplementary Table S1, Supplementary Material Nongtrai Kharia online). For the mtDNA sample of interest, entire genome was amplified and Maram Khynriam sequenced as described elsewhere.1,18 To avoid any potential problems in -0.200 0.000 0.200 0.400 0.600 0.800 1.000 mtDNA data quality, necessary quality control measures21 and some caveats39 PC 1 (37.109%) were followed as described previously. The new mtDNA sequences reported in Figure 3 Principle component analysis (PCA) of the populations under this study have been deposited in GenBank under accession numbers study. JF742217–JF742461 (for control-region sequences) and JF742196–JF742216 (for whole-mitochondrial genomes). (36.59%) and South Asian (51.63%) ancestry have comprised the vast Data analysis majority of the Nepalese gene pool (Supplementary Table S1, Supple- 5 The principle component analysis (PCA) was conducted based on the East mentary Material online), and this pattern remains almost stable for Eurasian haplogroup frequencies as described previously40 (Supplementary both the East Eurasian (45.11%; 189/419) and South Asian (47.49%; Table S2, Supplementary Material online). Genetic distances (Table 2) were 199/419) components after taking into account the recently reported estimated by using the package Arlequin 3.11.41 Admixture estimation was Nepalese mtDNA data.5 As for the 21 samples with ambiguously performed by the Weighted Least Squares (WLS) Method using the Statistical phylogenetic status, completely sequencing their mtDNA genomes 42 Package for the Social Sciences (SPSS) 14.0 software (Table 3). The reduced revealed that virtually all of these samples in fact belong to the already median networks of haplogroups of interest were constructed by using the defined haplogroups, such as M3, M5, M18, M30, M35, M43, D4, R8 network 4.510 program (http://www.fluxus-engineering.com/sharenet.htm) and M60. Of note is that, beside two singular branches identified in and adjusted manually (Figures 4 and 5).43 The ages of the lineages G2a2 (further defined by mutation 16193 and named G2a2 tentatively) and M9a1a2a this study, we also defined a novel haplogroup characterized by (characterized by mutations 16145, 16316 and a back mutation at site 16362, variations 9266 and 11827, which was named M81 here (Figure 2). and designated as M9a1a2a) (Table 4) were estimated by using the r To get more insights into the origin of the East Eurasian maternal statistic44,45 with the suggested calibration rates.44,46 components observed in the Nepalese and therefore test the two competing scenarios about how these components had been intro- RESULTS AND DISCUSSION duced into Nepal,4,5,8 we focused on the phylogenetic affinity between On the basis of the combined information from control-region and the East Eurasian haplogroups identified in the Nepalese and those partial coding-region segments, the majority (96.34%; 237/246) of the from the Tibetan, northeast and northwest Indian populations. Nepalese mtDNAs could unambiguously be allocated into the defined Figure 3 illustrates the principle component analysis plot of the 43 haplogroups of East Eurasian (36.59%; 90/246),1,20–25 South Asian populations under study, which was constructed based merely on the (51.63%; 127/246)5,7,11,12,26–31 and West Eurasian ancestries (8.13%; East Eurasian lineages. Among the five Nepalese populations under 20/246)26,31,47,48 (Supplementary Table S2, Supplementary Material study, three clustered with the Tibetans (Figure 3). After we consid- online). It is apparent that the genetic components of East Eurasian ered all the Nepalese regional populations as a whole and calculated its

Journal of Human Genetics Reconstructing the origin of the Nepalese H-W Wang et al 232

Fst value with the populations from its neighboring regions, the from the nodes occupied almost exclusively by the Tibetan lineages smallest genetic distance was observed between the Nepalese and the and (3) only a few haplotypes are shared sporadically between the Tibetans (Table 2). By taking the Tibetans and northern Indians as the Nepalese and the northern Indians. Taken together, the Nepalese parental populations, the results of the admixture estimation analysis lineages of East Eurasian ancestry generally show much closer affinity revealed that the Tibetans made major contribution to virtually all with the ones from Tibet, albeit a few mtDNA haplotypes, likely Nepalese populations (except for the eastern Tharu population; resulted from recent gene flow, were shared between the Nepalese and Table 3). Afterwards, we further compared the phylogenetic affinity northern (including northeast) Indians (Figures 4 and 5). of the East Eurasian lineages observed in Nepalese (including hap- Even though we focused on the East Eurasian lineages identified in logroups A11, C, G2a, M9a, F1c and Z; Figures 4 and 5) with those the Nepalese populations, we did observe a number of the Nepalese- from the neighboring regions, for example, Tibet, northeast and specific haplotypes, strongly suggesting their rather ancient origin and northwest India, by means of median networks.43 On the basis of most plausibly de novo differentiation in Nepal. To get some hint at the constructed networks (Figures 4 and 5), several features could be the arrival time of the lineages, we have focused on two clades from observed: (1) the Nepalese share some basal or internal haplotypes haplogroups G2a and M9a1a2 simply because both clades contain the with the Tibetans; (2) the Nepalese harbor a number of unique Nepalese haplotypes at their terminal branch or basal node and likely haplotypes at the terminal level, most of which branched off directly have differentiated in Nepal; estimating their ages would then help to

N Nis @16093 R R L 16278 2R L Ng S 16374C R Ng 16243 5R 16172 16302 K 16129 C S 16300 16291 Adi 16069 16288 16456 16400 K Kan Ng 16126 16354 K L N-M @16223 4 C II E Ny 16086 Adi 16294 Ng K 16212 2Ny @16298 4Nis 2Nag @16093 Kan K R 16239 R 16362 R 5N 16319 Adi 12R 4S N 2R 16150 Ng 16362 3S Nag 16224 Ng Ny 3R 2Mon Khy R 16297 16357 R 3R 16284 16093 16129 S K 16384 16311 Tha 16093 16359 S Khy 2R 16318T 16355 16220+A 16093 2Nag 16051 16325 C Bho 16311 16234 16096 16288 2R R @16223 16164 R 2S 16085 Tri C 16126 16148 3N Lho 16327 16298 2R S 16223 3N 16319 N Ny C 16293C Ng 16290 Haplogroup C (N=64) 16223 Haplogroup A11 (N=61)

C II 5 Apa K

16092

2 R 16343 R C I

2 N-M 2 C II 6 R 3 K S 16362 3 Nis Apa K K N 2 K 16189 E 4 C II 16150 Ny 3C I 16355 16291 R 16051 16215 16093 R 16058C L L Lho 3 R 16145 16126 Note 16107 16284 16212 16302 16266 2 K C Nepalese K E 16311 16261 S L Adi Bho 3Nis Ny 16224 4N Tibetan Lyn Kan C N Nis Gar R Northeast Indian 16298 16304 Northwest Indian 16129 16260 16111 16223 16185 North Indian Haplogroup F1c (N=39) Haplogroup Z (N=36)

Figure 4 Median networks of haplogroups A11, C, F1c and Z. The reduced median networks of haplogroups of interest were constructed by using the network 4.510 program (http://www.fluxus-engineering.com/sharenet.htm) and adjusted manually according to Bandelt et al.43 The data used here were collected from this study (Supplementary Table S1) and the literature (Table 1). The sequence variation used for network construction was confined to segment 16047–16497. Suffixes T, C and G refer to transversions; recurrent mutations are underlined and ‘‘@’’ denotes a reverse mutation, 16193+C was omitted in the median network. Code in the circles refers to the population abbreviation as displayed in Supplementary Table S2.

Journal of Human Genetics Reconstructing the origin of the Nepalese H-W Wang et al 233

N-M K Non 8 WK Bho 2 C I @16093 WK 5 WJ 3 N N-M E 5 Lyn C I C II K K 5 C II Tha Mar 5 Pna 15 Khy Mar Khy 16348 16086 16093 C I N 16319 @16362 E N K 16051 16081 16311 16311 16356 Ny C I C II K 16188+C 2 N K L N C I 3 C II C II 16168 L 16300 2 C I C R C II R C Ng 3 N 2 E 3 C I Gar 16337 2 C I Kha 16293 16256 Adi Lho @16223 16390 Tha 16215 16271 L 16158G 16344 16129 R 16173 16278 16256 16384 16114 16311 R G2a2 16093 3 R 16172 16209 16129 16193 2N R 16093 N L 16288 Tha @16223 16265C Adi 16474T Ng 16189 R C 16319 R 16234 16272 16172 Bho 2 Adi 16325 R 16356 Mar 5 Lyn 3 M Ng 7 C I 7 C II @16227 C 16243 16317 R 3 Ny Khy 2 R 16288 16311 16224 16189 Pna Lho 16168 16093 9 R 16274 16188 4 N R L 16051 3 Ng Note 16185 16218 16114 16291 16172 2 Pna S M9a1a2a Nepalese 16289 6 S 4 C 16362 N 16362 16311 @16362 S N 16158 16278 16316 16145 Tibetan 16227 S 16223 Haplogroup G2a (N=56) Northeast Indian 2 R L 16234 16223 North Indian Haplogroup M9a (N=143) Figure 5 Median networks of haplogroups G2a2 and M9a1a2a. For more information, see Figure 4. date the arrival time of the migration from Tibet. In fact, time components had been introduced into Nepal. Taking into account the estimation results revealed that haplogroups G2a2 and M9a1a2a previous observation on ,4 now it is convincing that the have very similar ages of B5.7 kya, and this age becomes a little East Eurasian had entered Nepal across the Himalayas around 6 kya, a older (B6 kya) when calibration rate proposed by Forster et al.44 was scenario in good agreement with the previous findings from linguistics used. To this end, the very similar ages of both haplogroups, which and archeology. likely had in situ differentiated in Nepal, strongly suggest that the bearers of these East Eurasian maternal components would have CONFLICT OF INTEREST arrived at Nepal no later than 5.7 kya (Table 4). In retrospect, previous The authors declare no conflict of interest. work has suggested that the maternal genetic components from the northern East Eurasian was introduced into Tibet around 8.2 kya,1 and our time estimation results fit this dating frame very well. It is then ACKNOWLEDGEMENTS conceivable that the settlement of Nepal by the bearer of the East We thank Wen-Zhi Wang, Chun-Ling Zhu, Yang Yang, Shi-Kang Gou and Tao Sha for their technical assistant. We are grateful to the volunteers for providing Eurasian genetic components occurred likely before 5.7 kya, a result in the blood samples to the project. This work was supported by grants from the good agreement with the archeological findings reporting shared the Natural Science Foundation of Yunnan Province (No. 2011FB106) and the 49 Neolithic features between Nepal and Tibet (references therein). National Natural Science Foundation of China (NSFC, Nos. 30900797 and Previous studies have observed substantial East Eurasian genetic 30621092). components in the Nepalese populations;4,5 however, it remains controversial whether the East Eurasian lineages have been introduced into Nepal from Tibet directly (across the Himalayas)4,6 or via 5,8,50 northeast India. By extensively analyzing the mtDNA variation 1 Zhao, M., Kong, Q. P., Wang, H. W., Peng, M. S., Xie, X. D., Wang, W. Z. et al. in Nepal, Tibet, northern India populations, our observations, based Mitochondrial genome evidence reveals successful Late Paleolithic settlement on the Tibetan Plateau. Proc. Natl. Acad. Sci. USA 106, 21230–21235 (2009). on the principle component analysis, Fst and admixture estimation, 2 Wang, H. W. Nepal (Social Sciences Documentation Publishing House, Beijing, China, revealed the closer genetic affinity between the Nepalese and the 2004). Tibetans, and this result was further substantiated by the median 3 Xue, K. Q., Zhao, C. Q., Sun, S. H., Ge, W. J., Liu, Y., Xing, G. C. et al. Concise Encyclopaedia Of South Asia and Central Asia (China Social Science Press, Beijing, networks, (Figures 4 and 5) in which most of the Nepalese mtDNAs China, 2004). prevalent among northern Asian populations shared the haplotypes 4 Gayden, T., Cadenas, A. M., Regueiro, M., Singh, N. B., Zhivotovsky, L. A., Underhill, P. A. et al. The Himalayas as a directional barrier to gene flow. Am. J. Hum. Genet. 80, with the Tibetans at root level or branched off directly from the nodes 884–894 (2007). consisting almost exclusively of the Tibetan lineages. Our results 5 Fornarino, S., Pala, M., Battaglia, V., Maranta, R., Achilli, A., Modiano, G. et al. strongly suggest that most of the East Eurasian maternal components Mitochondrial and Y-chromosome diversity of the Tharus (Nepal): a reservoir of genetic 4,6 variation. BMC Evol. Biol. 9, 154 (2009). identified in the Nepalese were introduced directly from Tibet, and 6 Gayden, T., Mirabal, S., Cadenas, A. M., Lacau, H., Simms, T. M., Morlote, D. et al. the time estimation results further date that this peopling scenario Genetic insights into the origins of Tibeto-Burman populations in the Himalayas. plausibly occurred about 6 kya. Indeed, this inference seems to be in J. Hum. Genet. 54, 216–223 (2009). 7 Thangaraj, K., Chaubey, G., Kivisild, T., Rani, D. S., Singh, V. K., Ismail, T. et al. striking accordance with the historically recorded passes (such as the Maternal footprints of Southeast Asians in North India. Hum. Hered. 66, 1–9 (2008). Kodari and Rasuwa Passes), which bridged the Nepalese and the 8 Su, B., Xiao, C. J., Deka, R., Seielstad, M. T., Kangwanpong, D., Xiao, J. H. et al. 3 Y chromosome haplotypes reveal prehistorical migrations to the Himalayas. Hum. Tibetans since the ancient time. However, the observed gene flow Genet. 107, 582–590 (2000). from northeast India suggests genetic contribution, albeit limited, 9 Cordaux, R., Saha, N., Bentley, G. R., Aunger, R., Sirajuddin, S. M. & Stoneking, M. from this region, a scenario echoing the proposed inland dispersal Mitochondrial DNA analysis reveals diverse histories of tribal populations from India. 50 Eur. J. Hum. Genet. 11, 253–264 (2003). route. In this spirit, our findings complete the understanding of the 10 Cordaux, R., Weiss, G., Saha, N. & Stoneking, M. The Northeast Indian passageway: a origin of the Nepalese and the way how the East Eurasian genetic barrier or corridor for human migrations? Mol. Biol. Evol. 21, 1525–1533 (2004).

Journal of Human Genetics Reconstructing the origin of the Nepalese H-W Wang et al 234

11 Metspalu, M., Kivisild, T., Metspalu, E., Parik, J., Hudjashov, G., Kaldma, K. et al. Most 29 Chandrasekar, A., Kumar, S., Sreenath, J., Sarkar, B. N., Urade, B. P., Mallick, S. et al. of the extant mtDNA boundaries in south and southwest Asia were likely shaped during Updating phylogeny of mitochondrial DNA macrohaplogroup m in India: dispersal of the initial settlement of Eurasia by anatomically modern humans. BMC Genet. 5, 26 modern human in South Asian corridor. PLoS ONE 4, e7447 (2009). (erratum 26:41) (2004). 30 Thangaraj, K., Chaubey, G., Kivisild, T., Reddy, A., Singh, V. K., Rasalkar, A. A. et al. 12 Reddy, B. M., Langstieh, B. T., Kumar, V., Nagaraja, T., Reddy, A. N. S., Meka, A. et al. Reconstructing the origin of Andaman Islanders. Science 308, 996 (2005). Austro-Asiatic tribes of Northeast India provide hitherto missing genetic link between 31 Kivisild, T., Shen, P. D., Wall, D., Do, B., Sung, R., Davis, K. et al. The role of selection South and Southeast Asia. PLoS ONE 2, e1141 (2007). in the evolution of human mitochondrial genomes. Genetics 172, 373–387 (2006). 13 Qin, Z. D., Yang, Y. J., Kang, L. L., Yan, S., Cho, K., Cai, X. Y. et al. A mitochondrial 32 Maca-Meyer, N., Gonza´lez, A. M., Larruga, J. M., Flores, C. & Cabrera, V. M. Major revelation of early human migrations to the Tibetan Plateau before and after the last genomic mitochondrial lineages delineate early human expansions. BMC Genet. 2, 13 (2001). glacial maximum. Am. J. Phys. Anthropol. 143, 555–569 (2010). 33 Ingman, M. & Gyllensten, U. Mitochondrial genome variation and evolutionary history of 14 Kivisild, T., Bamshad, M. J., Kaldma, K., Metspalu, M., Metspalu, E., Reidla, M. et al. Australian and New Guinean aborigines. Genome Res. 13, 1600–1606 (2003). Deep common ancestry of Indian and western-Eurasian mitochondrial DNA lineages. 34 Macaulay, V., Hill, C., Achilli, A., Rengo, C., Clarke, D., Meehan, W. et al. Single, rapid Curr. Biol. 9, 1331–1334 (1999). coastal settlement of Asia revealed by analysis of complete mitochondrial genomes. 15 Roychoudhury, S., Roy, S., Basu, A., Banerjee, R., Vishwanathan, H., Usha Rani, M. V. Science 308, 1034–1036 (2005). et al. Genomic structures and population histories of linguistically distinct tribal groups 35 Hill, C., Soares, P., Mormina, M., Macaulay, V., Meehan, W., Blackburn, J. et al. of India. Hum. Genet. 109, 339–350 (2001). Phylogeography and ethnogenesis of aboriginal Southeast Asians. Mol. Biol. Evol. 23, 16 Banerjee, J., Trivedi, R. & Kashyap, V. K. Mitochondrial DNA control region sequence 2480–2491 (2006). polymorphism in four indigenous tribes of Chotanagpur plateau, India. Forensic Sci. 36 Hill, C., Soares, P., Mormina, M., Macaulay, V., Clarke, D., Blumbach, P. B. et al. Int. Genet. 149, 271–274 (2005). A mitochondrial stratigraphy for island southeast Asia. Am.J.Hum.Genet.80, 29–43 17 Kumar, V., Langstieh, B. T., Madhavi, K. V., Naidu, V. M., Singh, H. P., Biswas, S. et al. (2007). Global patterns in human mitochondrial DNA and Y-chromosome variation 37 Peng, M. S., Quang, H. H., Dang, K. P., Trieu, A. V., Wang, H. W., Yao, Y. G. et al. Tracing caused by spatial instability of the local cultural processes. PLoS Genet. 2, e53 the Austronesian Footprint in Mainland Southeast Asia: A Perspective from Mitochon- (2006). drial DNA. Mol. Biol. Evol. 27, 2417–2430 (2010). 18 Wang, H. W., Jia, X. Y., Ji, Y. L., Kong, Q. P., Zhang, Q. J., Yao, Y. G. et al. Strikingly 38 Tabbada, K. A., Trejaut, J., Loo, J. H., Chen, Y. M., Lin, M., Mirazo´n-Lahr, M. et al. different penetrance of LHON in two Chinese families with primary mutation G11778A Philippine mitochondrial DNA diversity: a populated viaduct between Taiwan and is independent of mtDNA haplogroup background and secondary mutation G13708A. Indonesia? Mol. Biol. Evol. 27, 21–31 (2010). Mutat. Res. 643, 48–53 (2008). 39 Kong, Q. P., Salas, A., Sun, C., Fuku, N., Tanaka, M., Zhong, L. et al. Distilling artificial 19 Andrews, R. M., Kubacka, I., Chinnery, P. F., Lightowlers, R. N., Turnbull, D. M. & recombinants from large sets of complete mtDNA genomes. PLoS ONE 3, e3016 Howell, N. Reanalysis and revision of the Cambridge reference sequence for human (2008). mitochondrial DNA. Nat. Genet. 23, 147 (1999). 40 Yao, Y. G., Kong, Q. P., Bandelt, H. J., Kivisild, T. & Zhang, Y. P. Phylogeographic 20 Kivisild, T., Tolk, H. V., Parik, J., Wang, Y. M., Papiha, S. S., Bandelt, H. J. et al. differentiation of mitochondrial DNA in Han Chinese. Am.J.Hum.Genet.70, The emerging limbs and twigs of the East Asian mtDNA tree. Mol. Biol. Evol. 19, 635–651 (2002). 1737–1751 (2002). 41 Excoffier, L., Laval, G. & Schneider, S. Arlequin (version 3.0): an integrated software 21 Kong, Q. P., Yao, Y. G., Sun, C., Bandelt, H. J., Zhu, C. L. & Zhang, Y. P. Phylogeny of package for population genetics data analysis. Evol. Bioinform. Online 1, 47–50 (2005). east Asian mitochondrial DNA lineages inferred from complete sequences. Am. J. Hum. 42 Long, J. C., Williams, R. C., McAuley, J. E., Medis, R., Partel, R., Tregellas, W. M. et al. Genet. 73, 671–676 (2003). Genetic variation in Arizona Mexican Americans: estimation and interpretation of 22 Tanaka, M., Cabrera, V. M., Gonza´lez, A. M., Larruga, J. M., Takeyasu, T., Fuku, N. et al. admixture proportions. Am. J. Phys. Anthropol. 84, 141–157 (1991). Mitochondrial genome variation in eastern Asia and the peopling of Japan. Genome 43 Bandelt, H. J., Macaulay, V. & Richards, M. Median networks: speedy construction and Res. 14, 1832–1850 (2004). greedy reduction, one simulation, and two case studies from human mtDNA. Mol. 23 Kong, Q. P., Bandelt, H. J., Sun, C., Yao, Y. G., Salas, A., Achilli, A. et al. Updating the Phylogenet. Evol. 16, 8–28 (2000). East Asian mtDNA phylogeny: a prerequisite for the identification of pathogenic 44 Forster, P., Harding, R., Torroni, A. & Bandelt, H. J. Origin and evolution of Native mutations. Hum. Mol. Genet. 15, 2076–2086 (2006). American mtDNA variation: a reappraisal. Am. J. Hum. Genet. 59, 935–945 (1996). 24 Derenko, M., Malyarchuk, B., Grzybowski, T., Denisova, G., Dambueva, I., Perkova, M. 45 Saillard, J., Forster, P., Lynnerup, N., Bandelt, H. J. & Nørby, S. mtDNA variation among et al. Phylogeographic analysis of mitochondrial DNA in northern Asian populations. Greenland Eskimos: the edge of the Beringian expansion. Am. J. Hum. Genet. 67, Am.J.Hum.Genet.81, 1025–1041 (2007). 718–726 (2000). 25 Kong, Q. P., Sun, C., Wang, H. W., Zhao, M., Wang, W. Z., Zhong, L. et al. 46 Soares, P., Ermini, L., Thomson, N., Mormina, M., Rito, T., Ro¨hl, A. et al. Correcting for Large-scale mtDNA screening reveals a surprising matrilineal complexity in East Asia purifying selection: an improved human mitochondrial molecular . Am. J. Hum. and its implications to the peopling of the region. Mol. Biol. Evol. 28, 513–522 Genet. 84, 740–759 (2009). (2011). 47 Finnila¨, S., Lehtonen, M. S. & Majamaa, K. Phylogenetic network for European mtDNA. 26 Palanichamy, M. G., Sun, C., Agrawal, S., Bandelt, H. J., Kong, Q. P., Khan, F. et al. Am. J. Hum. Genet. 68, 1475–1484 (2001). Phylogeny of mitochondrial DNA macrohaplogroup N in India, based on complete 48 Herrnstadt, C., Elson, J. L., Fahy, E., Preston, G., Turnbull, D. M., Anderson, C. et al. sequencing: implications for the peopling of South Asia. Am. J. Hum. Genet. 75, Reduced-median-network analysis of complete mitochondrial DNA coding-region 966–978 (2004). sequences for the major African, Asian, and European haplogroups. Am. J. Hum. 27 Sun, C., Kong, Q. P., Palanichamy, M. G., Agrawal, S., Bandelt, H. J., Yao, Y. G. et al. Genet. 70, 1152–1171 (2002). The dazzling array of basal branches in the mtDNA macrohaplogroup M from India as 49 Sharmai, D. R. Archaeological Remains of the Dang Valley. Ancient Nepal. 88, 8–15 inferred from complete genomes. Mol. Biol. Evol. 23, 683–690 (2006). (1988). 28 Thangaraj, K., Chaubey, G., Singh, V. K., Vanniarajan, A., Thanseem, I., Reddy, A. G. 50 Peng, M. S., Palanichamy, M. G., Yao, Y. G., Mitra, B., Cheng, Y. T., Zhao, M. et al. et al. In situ origin of deep rooting lineages of mitochondrial Macrohaplogroup ‘M’ in Inland post-glacial dispersal in East Asia revealed by mitochondrial haplogroup M9a’b. India. BMC Genomics 7, 151–156 (2006). BMC Biol. 9, 2(2011).

Supplementary Information accompanies the paper on Journal of Human Genetics website (http://www.nature.com/jhg)

Journal of Human Genetics