4. Mitochondrial Diversity: Haplogroups
The second approach to studying mitochondrial DNA data is based on the analysis of haplogroups. While the phylogenetics of haplogroups is itself an important part of this approach, here we report mtDNA haplogroups with a view to answering questions about population affinities, demic or cultural diffusion of language and agricultural technology, and the antiquity and unity of Indian populations.
This chapter thus presents haplogroup frequencies and haplogroup sharing among the tribal populations of Maharashtra, followed by haplogroup descriptions focussing on their estimated time to most recent common ancestor (TMRCA), the sharing of haplogroups among populations and, lastly, the West Eurasian haplogroups, which are important markers for shedding light on demic or cultural diffusion of language and agricultural technology and on population movements into the subcontinent around the Holocene. The chapter also presents an assessment of population affinities based on haplogroup frequencies and comments on the utility of mtDNA data vis-à-vis genome-wide autosomal data for understanding population histories.
A principal component analysis of genome-wide SNP data, undertaken to understand the population structure among South Asian populations together with the tribal and caste populations of Maharashtra, has been published (Jonnalagadda et al., 2019) as part of a larger analysis.
Haplogroups and their clades observed in the study populations
One of the objectives of the present research was to characterise the mtDNA haplogroups among the selected tribes of Maharashtra, viz. Bhil, Pawara, Kokana and Warli. As described earlier in the methodology chapter, sequences of the mtDNA control region (nucleotide positions 16024-576) were used to characterise the haplogroups.
Following the steps described in the methodology chapter, it was possible to classify almost all the mtDNA sequences from the four tribal populations under study into haplogroups without ambiguity (Figure 20, Table 24, Table 25, Table 26, Table 27).
In total, 40 haplogroups were observed among the tribal populations of West
Maharashtra (Figure 20).
South Asia specific haplogroups and their clades observed are:
M2a, M2b, M3, M30, M33, M35, M37, M*, M38, M53, M39, M4, M57, M5,
M6, M64, M65, N5, R5, R6, R8, R30, U2a, U2b, U2c'd
Haplogroups shared with Western Eurasia are:
X2d, J, T, HV14, H2, H3, H6, H13, U1, U5, K2a, U4a, U7
Haplogroups shared with Eastern Eurasia are: G2, M73, A1
A large proportion of haplogroups among the tribal populations belong to South Asia specific branches of M, N, R and U (333 sequences, 89.037 %). All of the South Asia specific haplogroups exhibit a deep coalescence, emphasising the autochthonous nature of the tribal populations of Maharashtra. A minor fraction (30 sequences, 8.021 %) of the total sequences belongs to West Eurasian haplogroups shared with South Asia.
These haplogroups are crucial for understanding the putative agriculture and/or language related admixture and migrations from West Eurasia. Therefore, because the research questions of the present study ask whether the tribal populations of Maharashtra show any signs of agriculture or language related admixture, only the West Eurasian haplogroups are discussed further in the relevant sections.
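The proportions quoted above can be re-derived from the "Total" columns of Tables 24 to 27. The snippet below is a minimal sketch of that arithmetic; the grouping of haplogroups into West and East Eurasian follows the lists given earlier in this section, and the variable and function names are illustrative only.

```python
# Total counts per haplogroup, copied from the "Total" column of Tables 24-27.
WEST_EURASIAN = {
    "X2d": 3, "J1b1a1": 1, "T1a": 1, "HV14a": 3, "H": 7,
    "U1a": 3, "U5b2a": 2, "K2a5": 1, "U4a1": 3, "U7": 6,
}
EAST_EURASIAN = {"G2b": 5, "M73a": 2, "A1a": 4}
N_TOTAL = 374  # all control region sequences in the study


def share(count, total=N_TOTAL):
    """Percentage of the total, rounded to three decimals as in the tables."""
    return round(100 * count / total, 3)


west = sum(WEST_EURASIAN.values())
east = sum(EAST_EURASIAN.values())
south_asian = N_TOTAL - west - east  # remainder: South Asia specific branches

for label, count in [
    ("South Asia specific", south_asian),
    ("West Eurasian", west),
    ("East Eurasian", east),
]:
    print(f"{label:<20} {count:>3} sequences  {share(count):6.3f} %")
```

The South Asian count is taken as the remainder after the West and East Eurasian lineages are subtracted, which reproduces the 333 and 30 sequences reported in the text.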
Figure 20: mtDNA haplogroups observed in the present study
Table 24: Frequency and Percentages of Macrohaplogroup M lineages
Branch Haplogroup     Bhil           Kokana         Pawara         Warli          Total
                      Freq. %        Freq. %        Freq. %        Freq. %        Freq. %
M*     M*             0     0        0     0        1     1.099    0     0        1     0.267
       Total M*       0     0        0     0        1     1.099    0     0        1     0.267
M2a    M2a1           0     0        1     1.136    0     0        0     0        1     0.267
       M2a1a          0     0        0     0        1     1.099    3     3.093    4     1.07
       M2a1a+207      0     0        5     5.682    1     1.099    2     2.062    8     2.139
       M2a1a3         0     0        1     1.136    0     0        0     0        1     0.267
       M2a1a3+16093   0     0        5     5.682    0     0        3     3.093    8     2.139
       M2a1b          5     5.102    0     0        6     6.593    0     0        11    2.941
       Total M2a      5     5.102    12    13.636   8     8.791    8     8.248    33    8.823
M2b    M2b            1     1.02     1     1.136    0     0        1     1.031    3     0.802
       M2b2           0     0        0     0        0     0        2     2.062    2     0.535
       Total M2b      1     1.02     1     1.136    0     0        3     3.093    5     1.337
M3     M3             0     0        0     0        4     4.396    0     0        4     1.07
       M3a1+204       4     4.082    2     2.273    8     8.791    4     4.124    18    4.813
       M3c1b          0     0        0     0        0     0        1     1.031    1     0.267
       M3c2           0     0        3     3.409    0     0        0     0        3     0.802
       M3d            1     1.02     2     2.273    0     0        0     0        3     0.802
       Total M3       5     5.102    7     7.955    12    13.187   5     5.155    29    7.754
M30    M30            1     1.02     0     0        0     0        1     1.031    2     0.535
       M30+16234      4     4.082    0     0        0     0        1     1.031    5     1.337
       M30a           2     2.041    0     0        0     0        0     0        2     0.535
       M30b           1     1.02     0     0        0     0        0     0        1     0.267
       M30c1          0     0        0     0        2     2.198    0     0        2     0.535
       M30d           0     0        0     0        2     2.198    0     0        2     0.535
       M30f           0     0        0     0        5     5.495    0     0        5     1.337
       M30g           0     0        2     2.273    0     0        0     0        2     0.535
       Total M30      8     8.163    2     2.273    9     9.89     2     2.062    21    5.615
M33    M33a1b         1     1.02     1     1.136    0     0        0     0        2     0.535
       M33a2a         1     1.02     0     0        1     1.099    0     0        2     0.535
       M33b+16362     0     0        1     1.136    0     0        0     0        1     0.267
       Total M33      2     2.041    2     2.273    1     1.099    0     0        5     1.337
M35    M35+199        0     0        0     0        1     1.099    0     0        1     0.267
       M35b           0     0        0     0        0     0        1     1.031    1     0.267
       M35b+16304     2     2.041    0     0        0     0        6     6.186    8     2.139
       M35b1          2     2.041    0     0        0     0        1     1.031    3     0.802
       M35c           10    10.204   0     0        2     2.198    0     0        12    3.209
       Total M35      14    14.286   0     0        3     3.297    8     8.247    25    6.684
M37    M37+152+151    2     2.041    0     0        0     0        14    14.433   16    4.278
       M37e2          1     1.02     0     0        0     0        0     0        1     0.267
       Total M37      3     3.061    0     0        0     0        14    14.433   17    4.545
M38    M38c           0     0        0     0        0     0        1     1.031    1     0.267
       Total M38c     0     0        0     0        0     0        1     1.031    1     0.267
M39    M39            0     0        0     0        1     1.099    0     0        1     0.267
       M39b           0     0        1     1.136    0     0        0     0        1     0.267
       M39b1          3     3.061    0     0        3     3.297    0     0        6     1.604
       Total M39      3     3.061    1     1.136    4     4.396    0     0        8     2.139
M4a    M4a            3     3.061    5     5.682    0     0        2     2.062    10    2.674
       Total M4a      3     3.061    5     5.682    0     0        2     2.062    10    2.674
M53    M53            0     0        0     0        1     1.099    0     0        1     0.267
       Total M53      0     0        0     0        1     1.099    0     0        1     0.267
M57    M57+152        3     3.061    0     0        1     1.099    0     0        4     1.07
       M57a           1     1.02     0     0        6     6.593    0     0        7     1.872
       M57b           0     0        2     2.273    0     0        0     0        2     0.535
       M57b1          5     5.102    0     0        8     8.791    0     0        13    3.476
       Total M57      9     9.184    2     2.273    15    16.484   0     0        26    6.952
M5     M5a2a1a2       0     0        2     2.273    0     0        0     0        2     0.535
       M5a3b          0     0        0     0        1     1.099    0     0        1     0.267
       M5a4           0     0        1     1.136    2     2.198    1     1.031    4     1.07
       M5a'd          5     5.102    8     9.091    2     2.198    4     4.124    19    5.08
       M5b2a          0     0        0     0        2     2.198    1     1.031    3     0.802
       Total M5       5     5.102    11    12.5     7     7.692    6     6.186    29    7.754
M6     M6             0     0        0     0        0     0        7     7.216    7     1.872
       Total M6       0     0        0     0        0     0        7     7.216    7     1.872
M64    M64            0     0        5     5.682    0     0        0     0        5     1.337
       Total M64      0     0        5     5.682    0     0        0     0        5     1.337
M65    M65            0     0        1     1.136    0     0        0     0        1     0.267
       M65a+@16311    1     1.02     0     0        0     0        0     0        1     0.267
       M65b           1     1.02     0     0        2     2.198    0     0        3     0.802
       Total M65      2     2.041    1     1.136    2     2.198    0     0        5     1.337
M73    M73a           2     2.041    0     0        0     0        0     0        2     0.535
       Total M73a     2     2.041    0     0        0     0        0     0        2     0.535
G2b    G2b1a1         1     1.02     0     0        2     2.198    0     0        3     0.802
       G2b2a          0     0        0     0        0     0        2     2.062    2     0.535
       Total G2b      1     1.02     0     0        2     2.198    2     2.062    5     1.337
Table 25: Frequency and Percentages of Macrohaplogroup N (xR) lineages
Branch Haplogroup     Bhil           Kokana         Pawara         Warli          Total
                      Freq. %        Freq. %        Freq. %        Freq. %        Freq. %
N5     N5             0     0        0     0        0     0        1     1.031    1     0.267
       Total N5       0     0        0     0        0     0        1     1.031    1     0.267
A1a    A1a            4     4.082    0     0        0     0        0     0        4     1.07
       Total A1a      4     4.082    0     0        0     0        0     0        4     1.07
X2d    X2d            0     0        3     3.409    0     0        0     0        3     0.802
       Total X2d      0     0        3     3.409    0     0        0     0        3     0.802
Table 26: Frequency and Percentages of haplogroup R (xU) lineages
Branch Haplogroup     Bhil           Kokana         Pawara         Warli          Total
                      Freq. %        Freq. %        Freq. %        Freq. %        Freq. %
R30    R30            0     0        0     0        4     4.396    0     0        4     1.07
       R30a1b         0     0        5     5.682    0     0        3     3.093    8     2.139
       R30a1b1        0     0        0     0        0     0        1     1.031    1     0.267
       R30b1          0     0        3     3.409    0     0        0     0        3     0.802
       R30b2          6     6.122    0     0        4     4.396    0     0        10    2.674
       Total R30      6     6.122    8     9.091    8     8.791    4     4.124    26    6.952
R5     R5a1           3     3.061    3     3.409    0     0        5     5.155    11    2.941
       R5a2           3     3.061    0     0        1     1.099    2     2.062    6     1.604
       R5a2b          3     3.061    5     5.682    5     5.495    0     0        13    3.476
       Total R5       9     9.184    8     9.091    6     6.593    7     7.216    30    8.021
R6     R6a            1     1.02     0     0        0     0        0     0        1     0.267
       R6a1           1     1.02     0     0        0     0        0     0        1     0.267
       R6a2           1     1.02     0     0        1     1.099    0     0        2     0.535
       Total R6       3     3.061    0     0        1     1.099    0     0        4     1.07
R8     R8a1           1     1.02     0     0        0     0        0     0        1     0.267
       R8a1a1a1       1     1.02     2     2.273    3     3.297    1     1.031    7     1.872
       R8b1           0     0        2     2.273    0     0        0     0        2     0.535
       Total R8       2     2.041    4     4.545    3     3.297    1     1.031    10    2.674
HV14   HV14a          0     0        0     0        0     0        3     3.093    3     0.802
       Total HV14a    0     0        0     0        0     0        3     3.093    3     0.802
H13    H13a1d         1     1.02     0     0        0     0        0     0        1     0.267
H2a    H2a2a          0     0        0     0        2     2.198    0     0        2     0.535
H3b    H3b6           2     2.041    0     0        0     0        0     0        2     0.535
H6     H6             0     0        2     2.273    0     0        0     0        2     0.535
       Total H        3     3.061    2     2.273    2     2.198    0     0        7     1.872
J1b1a1 J1b1a1         0     0        1     1.136    0     0        0     0        1     0.267
T1a    T1a+152        0     0        1     1.136    0     0        0     0        1     0.267
       Total JT       0     0        2     2.273    0     0        0     0        2     0.535
Table 27: Frequency and Percentages of haplogroup U lineages
Branch Haplogroup     Bhil           Kokana         Pawara         Warli          Total
                      Freq. %        Freq. %        Freq. %        Freq. %        Freq. %
U1a    U1a            0     0        1     1.136    2     2.198    0     0        3     0.802
       Total U1a      0     0        1     1.136    2     2.198    0     0        3     0.802
U5     U5b2a          0     0        0     0        0     0        2     2.062    2     0.535
       Total U5b2a    0     0        0     0        0     0        2     2.062    2     0.535
U2a    U2a1           1     1.02     1     1.136    0     0        0     0        2     0.535
       U2a1a          2     2.041    0     0        2     2.198    2     2.062    6     1.604
       U2a2           1     1.02     0     0        0     0        0     0        1     0.267
       Total U2a      4     4.081    1     1.136    2     2.198    2     2.062    9     2.406
U2b    U2b            1     1.02     0     0        0     0        0     0        1     0.267
       U2b2           0     0        3     3.409    0     0        0     0        3     0.802
       Total U2b      1     1.02     3     3.409    0     0        0     0        4     1.069
U2c'd  U2c1a          0     0        0     0        2     2.198    0     0        2     0.535
       U2c'd          0     0        5     5.682    0     0        14    14.433   19    5.08
       Total U2c'd    0     0        5     5.682    2     2.198    14    14.433   21    5.615
U4a1   U4a1           0     0        0     0        0     0        3     3.093    3     0.802
       Total U4a1     0     0        0     0        0     0        3     3.093    3     0.802
U7     U7             1     1.02     0     0        0     0        0     0        1     0.267
       U7a            0     0        2     2.273    0     0        2     2.062    4     1.07
       U7a3b          1     1.02     0     0        0     0        0     0        1     0.267
       Total U7       2     2.041    2     2.273    0     0        2     2.062    6     1.604
K2a    K2a5           1     1.02     0     0        0     0        0     0        1     0.267
       Total K2a5     1     1.02     0     0        0     0        0     0        1     0.267
n                     98    100      88    100      91    100      97    100      374   100
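As a consistency check on Tables 24 to 27, the branch totals in the "Total" column can be summed per table and compared with the overall sample size of 374. The following is a minimal sketch of that check; the dictionary names are illustrative, and the counts are copied from the "Total" rows of the four tables.

```python
# Branch totals ("Total" column) for each table.
TABLES = {
    "M": {  # Table 24
        "M*": 1, "M2a": 33, "M2b": 5, "M3": 29, "M30": 21, "M33": 5,
        "M35": 25, "M37": 17, "M38c": 1, "M39": 8, "M4a": 10, "M53": 1,
        "M57": 26, "M5": 29, "M6": 7, "M64": 5, "M65": 5, "M73a": 2, "G2b": 5,
    },
    "N (xR)": {"N5": 1, "A1a": 4, "X2d": 3},  # Table 25
    "R (xU)": {  # Table 26
        "R30": 26, "R5": 30, "R6": 4, "R8": 10, "HV14a": 3, "H": 7, "JT": 2,
    },
    "U": {  # Table 27
        "U1a": 3, "U5b2a": 2, "U2a": 9, "U2b": 4, "U2c'd": 21,
        "U4a1": 3, "U7": 6, "K2a5": 1,
    },
}
N_TOTAL = 374

# Sum the branch totals within each table.
totals = {name: sum(branches.values()) for name, branches in TABLES.items()}

for name, count in totals.items():
    print(f"{name:<7} {count:>3} sequences  {100 * count / N_TOTAL:6.3f} %")
print("sum of all tables:", sum(totals.values()))
```

If the tables are transcribed correctly, the four table totals add up to the 374 sequences given in the final row of Table 27.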
Further, it was also observed that each tribal community has a different assemblage of mtDNA haplogroups and sub-haplogroups.
Figure 21: Venn diagram showing number of shared and unique mtDNA basal haplogroups among the Tribal Populations of West Maharashtra. (See Description)
Figure 21 depicts the number of shared and unique mtDNA basal haplogroups among the tribal populations of West Maharashtra. Eight haplogroups were seen in all four populations (Bhil, Pawara, Kokana and Warli): M2, M3, M30, M5, R30, R5, R8 and U2. All of these are typical South Asian haplogroups with older coalescence ages. Bhil and Pawara, which are geographically contiguous tribal communities, share R6 exclusively between themselves, whereas Kokana and Warli do not share any haplogroup exclusively. Bhil and Warli share M37, and Pawara and Kokana share the West Eurasian haplogroup U1a. Neither Bhil and Kokana nor Pawara and Warli have a haplogroup common to only that pair. Four haplogroups (M33, M39, M57 and M65) are shared among Bhil, Pawara and Kokana but are absent in Warli. No haplogroup is shared by Warli, Pawara and Kokana to the exclusion of Bhil. Bhil, Pawara and Warli share G2b and M35, whereas Bhil, Kokana and Warli share M4a and U7. Additionally, each community harbours some haplogroups that it does not share with the other three. Bhil have five exclusive haplogroups: A1a and M73, which are East Eurasian, and H13, H3b and K2a, which are West Eurasian. Pawara exhibit M53 and a single unclassified M* mtDNA along with the West Eurasian haplogroup H2a. The South Asian haplogroup M64 and the West Eurasian haplogroups H6, J1b1a1, T1a and X2d are observed exclusively in Kokana. Warli have the highest number of private haplogroups (six): the South Asia specific M6 and M38, the rare N5 haplogroup represented by a single mtDNA, and the West Eurasian haplogroups U4a1, U5 and HV14.
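The sharing pattern described above amounts to grouping each basal haplogroup by the exact set of populations that carry it. The sketch below illustrates this with presence/absence sets read from Tables 24 to 27 (a haplogroup is treated as present when its frequency is above zero); it is not the procedure used to draw Figure 21, and the names `PRESENCE` and `sharing_patterns` are illustrative.

```python
from collections import defaultdict

# Basal haplogroups with a non-zero count in each population (Tables 24-27).
PRESENCE = {
    "Bhil": {"M2", "M3", "M30", "M33", "M35", "M37", "M39", "M4a", "M57",
             "M5", "M65", "M73", "G2b", "A1a", "R30", "R5", "R6", "R8",
             "H13", "H3b", "U2", "U7", "K2a"},
    "Kokana": {"M2", "M3", "M30", "M33", "M39", "M4a", "M57", "M5", "M64",
               "M65", "X2d", "R30", "R5", "R8", "H6", "J1b1a1", "T1a",
               "U1a", "U2", "U7"},
    "Pawara": {"M*", "M2", "M3", "M30", "M33", "M35", "M39", "M53", "M57",
               "M5", "M65", "G2b", "R30", "R5", "R6", "R8", "H2a", "U1a",
               "U2"},
    "Warli": {"M2", "M3", "M30", "M35", "M37", "M38", "M4a", "M5", "M6",
              "G2b", "N5", "R30", "R5", "R8", "HV14", "U5", "U2", "U4a1",
              "U7"},
}


def sharing_patterns(presence):
    """Map each exact set of carrier populations to the haplogroups it carries."""
    carriers = defaultdict(set)
    for population, haplogroups in presence.items():
        for haplogroup in haplogroups:
            carriers[haplogroup].add(population)
    patterns = defaultdict(set)
    for haplogroup, populations in carriers.items():
        patterns[frozenset(populations)].add(haplogroup)
    return dict(patterns)


patterns = sharing_patterns(PRESENCE)
# List the widest sharing patterns first.
for populations, haplogroups in sorted(
    patterns.items(), key=lambda item: (-len(item[0]), sorted(item[0]))
):
    print(" + ".join(sorted(populations)), "->", ", ".join(sorted(haplogroups)))
```

Population combinations that do not appear as a key in `patterns` (for example Kokana and Warli as a pair) are those with no haplogroup exclusive to that combination.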
It is interesting to note that haplogroups shared by three or four communities are largely South Asia specific haplogroups with older coalescence ages, whereas haplogroups exclusive to a single community are largely West Eurasian. This pattern may indicate an ancient mtDNA stratum shared among the tribal communities of Maharashtra. The West Eurasian haplogroups exclusive to individual communities may indicate differential gene flow into each community, or loss of lineages due to drift. Additionally, the sharing of deep-rooted South Asian lineages together with the exclusivity of West Eurasian lineages may indicate a common origin of these tribal communities followed by strict endogamy. This is further accentuated when the shared and unique branches of haplogroups are compared among the four tribal populations: the number of shared haplogroup branches is small relative to the number of branches exclusive to each community, indicating recent but strong endogamy.
To infer the phylogeny of the 374 mtDNA control region sequences from the present study, and to contextualise them within the larger subcontinental and global mtDNA picture, median joining networks were drawn for each haplogroup separately. For these analyses, mtGenomes with fully resolved haplogroup classification were selected from the published literature (references are given in the relevant sections and accession numbers in Annexure 5). The control region (16024-576) of the selected mtGenomes and the mtDNA sequences from the present study were then used to draw median joining networks using Network 5.0.1.1 (fluxus-engineering; Bandelt et al., 1999). This analysis also helped to confirm the haplogroup assignments made by Haplogrep2 (Weissensteiner et al., 2016). The following sections describe the haplogroups and their median joining networks.
Macrohaplogroup M
All mtDNA lineages outside Africa fall into two macrohaplogroups, M and N. Macrohaplogroup M is defined by the substitutions 489-10400-14783-15043. M and its sub-branches constitute ~60% to 80% of South Asian mtDNA lineages.
Based on haplogroup estimation using Haplogrep2, median joining networks and manual near-matching with published mtGenomes, a total of 18 mtDNA haplogroups belonging to macrohaplogroup M were observed in the present study. Two haplogroups (M73 and G2) are East Asian, whereas the remaining 16 are South Asia specific (M*, M38, M4a, M30, M37, M64, M65, M39, M2, M3, M5, M6, M33, M35, M53 and M57). One sequence from the Pawara tribal community could not be classified beyond the M node on the basis of control region mutations alone. This sequence has therefore been reported as M* and was not used in the median joining analyses.
M2
Haplogroup M2 is defined by the substitutions 447G-1780-11083-15670-16274.
M2 is one of the oldest haplogroups and is specific to the Indian subcontinent (Kumar et al., 2008; Chandrasekar et al., 2009).
In the present study, M2 mtDNA was observed in all the populations under study, and it was the most frequently observed haplogroup (38 out of 374, 10.16 %; Table 24). The following subsections describe its two branches, M2a and M2b, separately.
M2a
The M2a lineage of haplogroup M2 is defined by the M2a'b substitutions 8502-16319 and the M2a-specific substitutions 7961-12810.
Figure 22: Median joining network of M2a [clade labels: M2a1a, M2a1b]
M2a is a South Asia specific haplogroup. A total of 33 sequences (8.823 %) were classified as M2a, and the haplogroup was seen in all four tribal populations: Bhil (5 samples), Kokana (12 samples), Pawara (8 samples) and Warli (8 samples) (Table 24).
M2a1, M2a1a, M2a1a+207, M2a1a3, M2a1a3+16093, M2a1b lineages were observed among the four tribal populations included in the present study (Figure 22).
In addition to the sequences generated in the present study, 63 published sequences were used to construct the median joining network and to refine the haplogroup classification (Kumar et al., 2008, 2009; Chandrasekar et al., 2009; Sharma et al., 2012; Khan et al., 2013; Lippold et al., 2014; Palanichamy et al., 2015a; Hartmann et al., 2016; Kutanan et al., 2018).
The TMRCA (± SD) of haplogroup M2a in the present study, estimated using the ρ statistic, is 39629 ± 11492 years. Published age estimates of M2a are 29633.4 ± 5904 years (Behar et al., 2012), 20300 years (11400-29600) (Soares et al., 2009) and 34400 (21400 - 48000) years (Silva et al., 2017). The estimate from the present study is higher than that of Behar et al. (2012).
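For readers unfamiliar with the ρ statistic, the sketch below shows the usual form of the calculation: ρ is the average number of mutations separating the sampled sequences from the root haplotype of the network, and its standard deviation is obtained from the branch lengths weighted by the squared number of descendant samples. The toy network and the `YEARS_PER_MUTATION` value are hypothetical placeholders; the calibration actually used in this study is given in the methodology chapter and is not reproduced here.

```python
import math


def rho_statistic(branches, n_samples):
    """Return (rho, sd) for a rooted network.

    branches: iterable of (length_in_mutations, n_descendant_samples),
    one entry per branch of the network.
    """
    rho = sum(length * n_desc for length, n_desc in branches) / n_samples
    variance = sum(length * n_desc ** 2 for length, n_desc in branches) / n_samples ** 2
    return rho, math.sqrt(variance)


def to_years(rho, sd, years_per_mutation):
    """Scale rho and its SD by a mutation rate expressed as years per mutation."""
    return rho * years_per_mutation, sd * years_per_mutation


# Hypothetical toy network with 4 samples: one sample sits on the root,
# two samples descend through a 1-mutation branch and one sample descends
# through a 2-mutation branch.
TOY_BRANCHES = [(1, 2), (2, 1)]
YEARS_PER_MUTATION = 10_000  # PLACEHOLDER, not the rate used in this study

rho, sd = rho_statistic(TOY_BRANCHES, n_samples=4)
age, age_sd = to_years(rho, sd, YEARS_PER_MUTATION)
print(f"rho = {rho:.3f} ± {sd:.3f}; age = {age:.0f} ± {age_sd:.0f} years")
```

Because the SD depends on how many samples share each branch, a network dominated by a few long, well-populated branches gives a wider interval than a star-like network with the same ρ.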
M2b
M2b is defined by 152-182-195-1453-2831T-3630-5744-6647-9899-13254-@14766-16169.1C-16189-16320.
M2b is a South Asia specific haplogroup. A total of 5 sequences (1.337 %) were classified as M2b. It was observed among Bhil (1 sample), Kokana (1 sample) and Warli (3 samples) and was absent in the Pawara tribal population (Table 24).
Figure 23: Median Joining Network of M2b [clade labels: M2b, M2b2]
The M2b and M2b2 lineages were observed among the tribal populations included in the present study (Figure 23).
In addition to the sequences generated in the present study, 17 published sequences were used to construct the median joining network and to refine the haplogroup classification (Rajkumar et al., 2005; Sun et al., 2006; Kumar et al., 2008; Chandrasekar et al., 2009; Palanichamy et al., 2015a).
The TMRCA (± SD) of haplogroup M2b in the present study, estimated using the ρ statistic, is 29081 ± 9863 years. Published age estimates of M2b are 14245.4 ± 3926.4 years (Behar et al., 2012), 12800 years (5500-20400) (Soares et al., 2009) and 14400 (9100 - 19700) years (Silva et al., 2017). The estimate from the present study is higher than that of Behar et al. (2012).
M3
M3 haplogroup is defined by 482-16126.
M3 is a South Asia specific haplogroup. A total of 29 sequences (7.754 %) were classified as M3, and the haplogroup was seen in all four tribal populations: Bhil (5 samples), Kokana (7 samples), Pawara (12 samples) and Warli (5 samples) (Table 24).
M3, M3a1+204, M3c1b, M3c2, M3d lineages were observed among the four tribal populations included in the present study (Figure 24).
In addition to the sequences generated in the present study, 117 published sequences were used to construct the median joining network and to refine the haplogroup classification (Rajkumar et al., 2005; Sun et al., 2006; Thangaraj et al., 2006; Chandrasekar et al., 2009; Fornarino et al., 2009; Govindaraj et al., 2011; Schönberg et al., 2011; Behar et al., 2012; Wang et al., 2012; Khan et al., 2013; Lippold et al., 2014; Zheng et al., 2014; Palanichamy et al., 2015a; Pradutkanchana et al., 2016; Sharma et al., 2017; Silva et al., 2017; Kutanan et al., 2018).
Figure 24: Median Joining Network of M3 [clade labels: M3a1+204, M3c2, M3d]
The TMRCA (± SD) of haplogroup M3 in the present study, estimated using the ρ statistic, is 23580 ± 6347 years. Published age estimates of M3 are 23904.4 ± 7132.8 years (Behar et al., 2012), 35300 years (21400-50000) (Soares et al., 2009) and 25800 (19000 - 32800) years (Silva et al., 2017). The estimate from the present study is lower than that of Behar et al. (2012).
M30
195A-15431 substitutions define M30 haplogroup.
M30 is a South Asia specific haplogroup. A total of 21 sequences (5.615 %) were classified as M30, and the haplogroup was seen in all four tribal populations: Bhil (8 samples), Kokana (2 samples), Pawara (9 samples) and Warli (2 samples) (Table 24).
M30, M30+16234, M30a, M30b, M30c1, M30d, M30f, M30g lineages were observed among the four tribal populations included in the present study (Figure 25).
Figure 25: Median joining network of M30 [clade labels: M30+16234, M30a, M30b, M30f]
In addition to the sequences generated in the present study, 89 published sequences were used to construct the median joining network and to refine the haplogroup classification (Maca-Meyer et al., 2001; Rajkumar et al., 2005; Sun et al., 2006; Chandrasekar et al., 2009; Govindaraj et al., 2011; Behar et al., 2012; Wang et al., 2012; Khan et al., 2013; Lippold et al., 2014; Zheng et al., 2014; Li et al., 2015; Palanichamy et al., 2015a; Hartmann et al., 2016; Marrero et al., 2016; Sharma et al., 2016c; Peng et al., 2017; Sharma et al., 2017; Kutanan et al., 2018).
The TMRCA (± SD) of haplogroup M30 in the present study, estimated using the ρ statistic, is 20094 ± 3616 years. Published age estimates of M30 are 17431.1 ± 4012.8 years (Behar et al., 2012), 22300 years (14600-30200) (Soares et al., 2009) and 15200 (12200 - 18300) years (Silva et al., 2017). The estimate from the present study is higher than that of Behar et al. (2012).
M33
A single substitution 2361 characterises M33 haplogroup. In the present study, the branches were identified by control region mutations and near matching.
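Whether a haplogroup can be recognised from control region data depends on how many of its defining positions fall inside the sequenced range 16024-576, which wraps around the end of the circular mtDNA. The sketch below filters the defining motifs quoted in this chapter by that range; it is an illustration rather than the classification procedure used by Haplogrep2, the function names are hypothetical, and 16569 is the length of the rCRS reference sequence.

```python
import re

MTDNA_LENGTH = 16569           # length of the rCRS reference sequence
CR_START, CR_END = 16024, 576  # sequenced control region; wraps past 16569


def motif_positions(motif):
    """Extract positions from a motif such as '447G-1780-@14766-16169.1C'."""
    positions = []
    for token in motif.split("-"):
        match = re.match(r"\D*(\d+)", token)  # skip prefixes such as '@' or '('
        if match:
            positions.append(int(match.group(1)))
    return positions


def in_control_region(position):
    """True if the position lies within 16024-16569 or 1-576."""
    return CR_START <= position <= MTDNA_LENGTH or 1 <= position <= CR_END


def control_region_positions(motif):
    return [p for p in motif_positions(motif) if in_control_region(p)]


# Defining motifs as quoted in this chapter.
MOTIFS = {
    "M": "489-10400-14783-15043",
    "M2": "447G-1780-11083-15670-16274",
    "M3": "482-16126",
    "M33": "2361",
    "M35": "12561",
    "M57": "3483-4020-13651-16311",
}

for haplogroup, motif in MOTIFS.items():
    visible = control_region_positions(motif)
    print(f"{haplogroup:<4} {len(visible)} of {len(motif_positions(motif))} "
          f"defining positions in the control region: {visible}")
```

Haplogroups whose defining motif leaves no position inside the control region, such as M33 with its single coding region substitution, are the ones for which branch-level control region mutations and near matching carry the classification.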
M33 is a South Asia specific haplogroup. A total of 5 sequences (1.337 %) were classified as M33. It was observed among Bhil (2 samples), Kokana (2 samples) and Pawara (1 sample) and was absent in the Warli tribal population (Table 24).
The M33a1b, M33a2a and M33b+16362 lineages were observed among the tribal populations included in the present study (Figure 26).
In addition to the sequences generated in the present study, 31 published sequences were used to construct the median joining network and to refine the haplogroup classification (Sun et al., 2006; Thangaraj et al., 2006; Abu-Amero et al., 2008; Fornarino et al., 2009; Kumar et al., 2009; Al-Zahery et al., 2011; Wang et al., 2012; Li et al., 2015; Palanichamy et al., 2015a).
Figure 26: Median joining network of M33 [clade labels: M33a, M33b]
The TMRCA (± SD) of haplogroup M33 in the present study, estimated using the ρ statistic, is 33515 ± 7716 years. Published age estimates of M33 are 42331.8 ± 9388.8 years (Behar et al., 2012), 44900 years (32900-57300) (Soares et al., 2009) and 38000 (29300 - 47000) years (Silva et al., 2017). The estimate from the present study is lower than that of Behar et al. (2012).
M35
M35 is defined by 12561, with a further branch, M35+199, defined by an additional transition at 199.
M35 is a South Asia specific haplogroup. A total of 25 sequences (6.684 %) were classified as M35. It was observed among Bhil (14 samples), Pawara (3 samples) and Warli (8 samples) and was absent in the Kokana tribal population (Table 24).
The M35+199, M35b, M35b+16304, M35b1 and M35c lineages were observed among the tribal populations included in the present study (Figure 27).
In addition to the sequences generated in the present study, 22 published sequences were used to construct the median joining network and to refine the haplogroup classification (Sun et al., 2006; Fornarino et al., 2009; Kumar et al., 2009; Govindaraj et al., 2011; Wang et al., 2012; Khan et al., 2013; Palanichamy et al., 2015a; Pradutkanchana et al., 2016).
The TMRCA (± SD) of haplogroup M35 in the present study, estimated using the ρ statistic, is 38849 ± 10804 years. Published age estimates of M35 are 39085.2 ± 9964.8 years (Behar et al., 2012), 39600 years (26600-53100) (Soares et al., 2009) and 26900 (18500 - 35600) years (Silva et al., 2017). The estimate from the present study is lower than that of Behar et al. (2012).
Figure 27: Median joining network of M35 [clade labels: M35b1, M35b+16304, M35c]
M37
M37 is defined by 10556 followed by yet unresolved branches with 152 and 151 transitions.
M37 is a South Asia specific haplogroup. A total of 17 sequences (4.545 %) were classified as M37. It was observed among Bhil (3 samples) and Warli (14 samples) and was absent in the Kokana and Pawara tribal populations (Table 24).
The M37+152+151 and M37e2 lineages were observed among the tribal populations included in the present study (Figure 28).
Figure 28: Median joining network of M37 [clade labels: M37+152+151, M37e2]
In addition to the sequences generated in the present study, 20 published sequences were used to construct the median joining network and to refine the haplogroup classification (Sun et al., 2006; Thangaraj et al., 2006; Chandrasekar et al., 2009; Sharma et al., 2012; Palanichamy et al., 2015a; Kutanan et al., 2017, 2018).
The TMRCA (± SD) of haplogroup M37 in the present study, estimated using the ρ statistic, is 29796 ± 9997 years. Published age estimates of M37 are 29269 ± 7027.2 years (Behar et al., 2012), 34700 years (22800-47200) (Soares et al., 2009) and 18200 (11500 - 25200) years (Silva et al., 2017). The estimate from the present study is higher than that of Behar et al. (2012).
M*, M38, M53, M73
Macrohaplogroup M is defined by the substitutions 489-10400-14783-15043 and stems from the L3 lineage. A single sample from the Pawara tribal population could not be classified beyond the M node and has been tentatively classified as M*.
M38 is a clade of M4”67 and M18’38, defined by 12498-13135-16318T substitutions.
M38c is a South Asia specific haplogroup. A single sequence (0.267 %), from the Warli tribal population, was classified as M38c (Table 24; Figure 29).
Figure 29: Median joining network of M*, M38, M53 and M73 [clade label: M53]
The ρ statistic, and therefore the TMRCA, was not calculated for M38 owing to the small sample size. Published age estimates of M38 are 26724.8 ± 5529.6 years (Behar et al., 2012), 16800 years (7900-26300) (Soares et al., 2009) and 32500 (23600 - 41700) years (Silva et al., 2017).
M53 is defined by the mutations 240-390T-572-5493-5821-9302-11560-16051-16189-16316. A single Pawara mtDNA sequence (0.267 %) belonged to haplogroup M53 (Table 24).
The ρ statistic, and therefore the TMRCA, was not calculated for M53 owing to the small sample size. Published age estimates of M53 are 19904.6 ± 7084.8 years (Behar et al., 2012) and 20600 (9000 - 33000) years (Silva et al., 2017).
M73'79 is defined by 14034-16278, but M73 does not have any haplogroup-defining mutations in the control region. Sequences belonging to M73a in the present study were classified using near matching with other sequences and the 16184A mutation, which characterises the M73a lineage.
M73a is an East Asian haplogroup that is also observed in South Asia. A total of 2 sequences (0.535 %) were classified as M73a, both among the Bhil; the haplogroup was absent in the Kokana, Pawara and Warli tribal populations (Table 24; Figure 29).
The ρ statistic, and therefore the TMRCA, was not calculated for M73 owing to the small sample size. Published age estimates of M73 are 33192.8 ± 6326.4 years (Behar et al., 2012).
Sixteen additional sequences were used to construct a composite median joining network diagram (Sun et al., 2006; Chandrasekar et al., 2009; Kumar et al., 2009; Tabbada et al., 2010; Sharma et al., 2012; Palanichamy et al., 2015a).
The ρ statistic, and therefore the TMRCA, was not calculated for M*, M38, M53 or M73 owing to the small sample sizes.
M39
M39’70 haplogroup contains M39 clade, which is defined by characteristic indels and other substitutions 55.1T-59-60d-65.1T-(66T)-1811-15938.
M39 is a South Asia specific haplogroup. A total of 8 sequences (2.139 %) were classified as M39. It was observed among Bhil (3 samples), Kokana (1 sample) and Pawara (4 samples) and was absent in the Warli tribal population (Table 24).
Figure 30: Median joining network of M39 [clade labels: M, M39, M39b, M39b1]
The M39, M39b and M39b1 lineages were observed among the tribal populations included in the present study (Figure 30).
In addition to the sequences generated in the present study, 33 published sequences were used to construct the median joining network and to refine the haplogroup classification (Sun et al., 2006; Chandrasekar et al., 2009; Sharma et al., 2012; Bhandari et al., 2015; Palanichamy et al., 2015a).
The TMRCA (± SD) of haplogroup M39 in the present study, estimated using the ρ statistic, is 67431 ± 14434 years. Published age estimates of M39 are 26638.7 ± 5750.4 years (Behar et al., 2012), 32300 years (20700-44400) (Soares et al., 2009) and 23700 (15300 - 32500) years (Silva et al., 2017). The estimate from the present study is higher than that of Behar et al. (2012).
M4
The M4”67 clade is defined by the recurrent mutation 16311, and 6620-7859-16145-16261 are M4-specific mutations.
M4a is a South Asia specific haplogroup. A total of 10 sequences (2.674 %) were classified as M4a. It was observed among Bhil (3 samples), Kokana (5 samples) and Warli (2 samples) and was absent in the Pawara tribal population (Table 24). The M4a lineages observed in the present study are shown in Figure 31.
In addition to the sequences generated in the present study, 16 published sequences were used to construct the median joining network and to refine the haplogroup classification (Sun et al., 2006; Thangaraj et al., 2006; Chandrasekar et al., 2009; Derenko et al., 2013b; Lippold et al., 2014; Li et al., 2015; Palanichamy et al., 2015a; Kutanan et al., 2017).
Figure 31: Median joining network of M4 [clade labels: M, M4, M4a]
The TMRCA (± SD) of haplogroup M4a in the present study, estimated using the ρ statistic, is 26312 ± 10592 years. Published age estimates of M4a are 12734.3 ± 7315.2 years (Behar et al., 2012), 36500 years (26100-47300) (Soares et al., 2009) and 11300 (7300 - 15500) years (Silva et al., 2017). The estimate from the present study is higher than that of Behar et al. (2012).
M57
3483-4020-13651-16311 characterise haplogroup M57.
M57 is a South Asia specific haplogroup. A total of 26 sequences (6.952 %) were classified as M57. It was observed among Bhil (9 samples), Kokana (2 samples) and Pawara (15 samples) and was absent in the Warli tribal population (Table 24).
The M57+152, M57a, M57b and M57b1 lineages were observed among the tribal populations included in the present study (Figure 32).
Figure 32: Median joining network of M57
In addition to the sequences generated in the present study, 13 published sequences were used to construct the median joining network and to refine the haplogroup classification (Thangaraj et al., 2006; Chandrasekar et al., 2009; Lippold et al., 2014; Palanichamy et al., 2015a; Kutanan et al., 2017).
The TMRCA (± SD) of haplogroup M57 in the present study, estimated using the ρ statistic, is 31354 ± 9688 years. Published age estimates of M57 are 30220.7 ± 8448 years (Behar et al., 2012) and 28800 (19000 - 38900) years (Silva et al., 2017). The estimate from the present study is higher than that of Behar et al. (2012).
M5
M5 is defined by 1888-16129 substitutions.
Figure 33: Median joining network of M5 [clade labels: M5, M5a3b, M5a2a1a, M5a'd, M5b2]
M5 is a widely distributed South Asia specific haplogroup. A total of 29 sequences (7.754 %) were classified as M5, and the haplogroup was seen in all four tribal populations: Bhil (5 samples), Kokana (11 samples), Pawara (7 samples) and Warli (6 samples) (Table 24). The M5a2a1a2, M5a3b, M5a4, M5a'd and M5b2a lineages were observed among the four tribal populations included in the present study (Figure 33).
In addition to the sequences generated in the present study, 125 published sequences were used to construct the median joining network and to refine the haplogroup classification (Sun et al., 2006; Thangaraj et al., 2006; Behar et al., 2008; Chandrasekar et al., 2009; Fornarino et al., 2009; Govindaraj et al., 2011; Kong et al., 2011; Behar et al., 2012; Sharma et al., 2012; Wang et al., 2012; Derenko et al., 2013b; Gómez-Carballa et al., 2013; Kang et al., 2013; Lippold et al., 2014; Li et al., 2015; Palanichamy et al., 2015a; Hartmann et al., 2016; Marrero et al., 2016; Sharma et al., 2016c; Vyas et al., 2016; Kutanan et al., 2017; Malyarchuk et al., 2017; Peng et al., 2017; Sharma et al., 2017).
The TMRCA (± SD) of haplogroup M5 in the present study, estimated using the ρ statistic, is 19316 ± 3633 years. Published age estimates of M5 are 37067.2 ± 14803.2 years (Behar et al., 2012), 39600 years (27600-52100) (Soares et al., 2009) and 32100 (21400 - 43200) years (Silva et al., 2017). The estimate from the present study is lower than that of Behar et al. (2012).
M6
461-5301-5558-10640-14128-16362 characterise M6 haplogroup.
M6 is a South Asia specific haplogroup. A total of 7 sequences (1.872 %) were classified as M6, all of them among the Warli; the haplogroup was absent in the Bhil, Kokana and Pawara tribal populations (Table 24). The M6 lineages observed in the present study are shown in Figure 34.
25 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification. (Khan et al., 2013; Palanichamy et al., 2015a)
TMRCA and its SD in years, estimated using ρ statistic, for M65 haplogroup in the present study is 24343 ± 7752 years. Published age estimates of M65 are 25256 ± 6528 years (Behar et al., 2012), 20600 (12600 - 29000) years (Silva et al., 2017). TMRCA for M65 haplogroup from the present study is lower than Behar et al. (2012) estimate.
Figure 34: Median joining network of M6
M64
M64, nested in M4”67, is defined by 152-5201-8843-9947-10685-13105-15355-15968-16263-16527.
M64 is a South Asia specific haplogroup. A total of 5 (1.337 %) sequences were classified as M64, all from the Kokana; the haplogroup was absent in the Bhil, Pawara and Warli (Table 24). The M64 lineages observed in the present study are shown in Figure 35.
3 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification. (Behar et al., 2008; Chandrasekar et al., 2009)
Figure 35: Median joining network of M64
The TMRCA of the M64 haplogroup in the present study, estimated using the ρ statistic, is 7926 ± 5189 years (± SD). Published age estimates of M64 are 12624.2 ± 5289.6 years (Behar et al., 2012) and 18100 (8100 - 28500) years (Silva et al., 2017). The TMRCA from the present study is lower than the Behar et al. (2012) estimate.
M65
M65 is defined by a single 511 transition.
M65b is a South Asia specific haplogroup. A total of 3 (0.802 %) sequences were classified as M65b. The M65 haplogroup as a whole, with a total of 5 samples, was observed among the Bhil (2 samples), Kokana (1 sample) and Pawara (2 samples); it was absent in the Warli (Table 24).
The lineages M65, M65a+@16311 and M65b were observed in the present study (Figure 36).
28 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification. (Sun et al., 2006; Abu-Amero et al., 2008; Chandrasekar et al., 2009; Kong et al., 2011; Khan et al., 2013; Lippold et al., 2014; Palanichamy et al., 2015a; Sharma et al., 2016c; Peng et al., 2017; Sharma et al., 2017)
TMRCA and its SD in years, estimated using ρ statistic, for M65 haplogroup in the present study is 24343 ± 7752 years. Published age estimates of M65 are 25256 ± 6528 years (Behar et al., 2012), 20600 (12600 - 29000) years (Silva et al., 2017). TMRCA for M65 haplogroup from the present study is lower than Behar et al. (2012) estimate.
Figure 36: Median joining network of M65
G2
709-4833-5108-16362 defines G, and 5601-13563 define G2 haplogroup.
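Because only the control region (nucleotide positions 16024-576) was sequenced in the present study, only part of a defining motif such as the ones above can be checked directly in these data. The following minimal sketch lists which positions of a motif fall inside that range. The function names are illustrative and the parsing is a simplifying assumption: it reads only the numeric position and ignores decorations such as "@", parentheses and base suffixes.

```python
import re

MTDNA_LENGTH = 16569  # length of the rCRS reference sequence

def in_control_region(position, start=16024, end=576):
    """True if a position lies in the sequenced range, which wraps around the origin."""
    if not 1 <= position <= MTDNA_LENGTH:
        return False
    return position >= start or position <= end

def observable_positions(motif):
    """Return the positions of a motif that fall inside the sequenced control region."""
    positions = [int(p) for p in re.findall(r"\d+", motif)]
    return [p for p in positions if in_control_region(p)]

print(observable_positions("709-4833-5108-16362"))  # motif of G  -> [16362]
print(observable_positions("5601-13563"))           # motif of G2 -> []
```

For G2, for example, none of the G2-specific positions lie in the control region, which illustrates why several assignments in this chapter are described as needing confirmation by complete mtDNA sequencing.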
G2b is an East Asian haplogroup, also observed in South Asia. A total of 5 (1.337 %) sequences were classified as G2b (Table 24).
The lineages G2b1a1 and G2b2a were observed in the present study (Figure 37).
Figure 37: Median joining network of G2
37 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification. (Kong et al., 2003; Zhang et al., 2008; Chandrasekar et al., 2009; Wang et al., 2011; Ji et al.,
2012; Jiang et al., 2014; Ko et al., 2014; Summerer et al., 2014; Hartmann et al., 2016;
Kutanan et al., 2017, 2018; Peng et al., 2017; Derenko et al., 2018; Zheng et al., 2018)
The TMRCA of the G2b haplogroup in the present study, estimated using the ρ statistic, is 14055 ± 4406 years (± SD). The published age estimate of G2b is 22776.4 ± 5059.2 years (Behar et al., 2012). The TMRCA from the present study is lower than the Behar et al. (2012) estimate.
Macrohaplogroup N
Haplogroup N is defined by 8701-9540-10398-10873-15301, and harbours derived clades of macrohaplogroup R.
In the present study, the South Asia specific haplogroup N5, the A1a haplogroup shared with East Asia, and X2d, which is common in West Asia, were observed (Table 25, Table 26, Table 27, Figure 20). Haplogroups shared with Western Eurasia are important for the questions posed in the present study about language shifts and the agricultural transition, with or without admixture.
N5
N5 is defined by 5063-7076-9545-11626-(13434)-16111-16311 substitutions.
N5 is a rare South Asia specific haplogroup. One (0.267 %) sequence, from the Warli population, was classified as N5 (Table 25, Figure 38).
6 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification.
(Palanichamy et al., 2004, 2015a; Sharma et al., 2012)
Figure 38: Median joining network of N5
The ρ statistic, and therefore the TMRCA, was not calculated for N5 owing to the small sample size.
Published age estimates of N5 are 36712.4 ± 8202.5 years (Behar et al., 2012) and 45300 (28600 - 62800) years (Silva et al., 2017).
A1a
Haplogroup A is defined by 235-663-1736-4248-4824-8794-16290-16319 and A1a by 9713-16249.
A1a is an East Asian haplogroup, also observed in South Asia. A total of 4 (1.07 %) sequences were classified as A1a, all from the Bhil; the haplogroup was absent in the Kokana, Pawara and Warli (Table 25). The A1a lineages observed in the present study are shown in Figure 39.
Figure 39: Median joining network of A1a
4 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification.
(Tanaka et al., 2004; Derenko et al., 2007; Bilal et al., 2008; Peng et al., 2017)
The ρ statistic, and therefore the TMRCA, was not calculated for A1a owing to the small sample size.
The published age estimate of A1a is 12987.6 ± 5404.2 years (Behar et al., 2012).
X2d
X2+125 is characterised by 195-1719-225 and X2d by @153-@225-6791-8503
X2d is a West Asian haplogroup, also observed in South Asia. A total of 3 (0.802 %) sequences were classified as X2d, all from the Kokana; the haplogroup was absent in the Bhil, Pawara and Warli (Table 25). The X2d lineages observed in the present study are shown in Figure 40.
X2 has a wide but intermittent distribution across west Eurasia (Reidla et al., 2003)
4 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification. (Shlush et al., 2008; Kloss-Brandstätter et al., 2010; Schönberg et al., 2011; Behar et al., 2012)
The ρ statistic, and therefore the TMRCA, was not calculated for X2d owing to the small sample size.
The published age estimate of X2d is 10870.2 ± 3497.9 years (Behar et al., 2012).
Figure 40: Median joining network of X2d
Macrohaplogroup R
Within macrohaplogroup R (excluding U), the South Asia specific haplogroups R5, R6, R8 and R30 were observed, together with the J, T, HV14, H2, H3, H6 and H13 haplogroups, which are shared with West Eurasia (Table 26, Figure 20).
R5
Haplogroup R5 is defined by 8594-10754-14544-16304-16524.
R5 is a South Asia specific haplogroup. A total of 30 (8.021 %) sequences were classified as R5, and the haplogroup was seen in all four tribal populations: Bhil (9 samples), Kokana (8 samples), Pawara (6 samples) and Warli (7 samples) (Table 26).
Figure 41: Median joining network of R5
R5a1, R5a2, R5a2b lineages were observed among the four tribal populations included in the present study (Figure 41).
In addition to the presently generated sequences, 20 published sequences were used for the Median joining tree construction and for refining the haplogroup classification (Palanichamy et al., 2004; Behar et al., 2008; Chaubey et al., 2008a; Govindaraj et al., 2011; Sharma et al., 2012; Derenko et al., 2013b; Khan et al., 2013; Sharma et al., 2016c; Kutanan et al., 2017).
TMRCA and its SD in years, estimated using ρ statistic, for R5a haplogroup in the present study is 38689 ± 10877 years. Published age estimates of R5a are 30665.3 ±
8552.3 years (Behar et al., 2012), 19100 years (11200-27200) (Soares et al., 2009),
32000 (19100 - 45500) years (Silva et al., 2017). TMRCA for R5a haplogroup from the present study is higher than Behar et al. (2012) estimate.
R6
(195)-12285-(16266)-16362 substitutions define R6.
R6 is a South Asia specific haplogroup. A total of 4 (1.07 %) sequences were classified as R6, from the Bhil (3 samples) and Pawara (1 sample); the haplogroup was absent in the Kokana and Warli (Table 26).
The lineages R6a, R6a1 and R6a2 were observed in the present study (Figure 42).
21 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification.
(Palanichamy et al., 2004; Govindaraj et al., 2011; Sharma et al., 2012; Wang et al.,
2012; Fregel et al., 2014; Kutanan et al., 2017; Silva et al., 2017)
Figure 42: Median joining network of R6
TMRCA and its SD in years, estimated using ρ statistic, for R6a haplogroup in the present study is 50121 ± 13435 years. Published age estimates of R6a are 41310.8 ±
8832.2 years (Behar et al., 2012), R6 51100 years (35900-66800) (Soares et al., 2009),
33600 (22900 - 44700) years (Silva et al., 2017). TMRCA for R6a haplogroup from the present study is higher than Behar et al. (2012) estimate.
R8
R8 is defined by 195-2755-3384-7759-9449-13215.
R8 is a South Asia specific haplogroup. A total of 10 (2.674 %) sequences were classified as R8, and the haplogroup was seen in all four tribal populations: Bhil (2 samples), Kokana (4 samples), Pawara (3 samples) and Warli (1 sample) (Table 26).
Figure 43: Median joining network of R8
R8a1, R8a1a1a1, R8b1 lineages were observed among the four tribal populations included in the present study (Figure 43).
40 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification.
(Chaubey et al., 2008a; Thangaraj et al., 2009; Khan et al., 2013; Pradutkanchana et al.,
2016)
The TMRCA of the R8 haplogroup in the present study, estimated using the ρ statistic, is 23668 ± 8887 years (± SD). Published age estimates of R8 are 32783.3 ± 6890.8 years (Behar et al., 2012), 42100 (26700-58300) years (Soares et al., 2009) and 33200 (22900 - 43900) years (Silva et al., 2017). The TMRCA from the present study is lower than the Behar et al. (2012) estimate.
R30
R30 is defined by a single 8584 transition.
R30 is a South Asia specific haplogroup. A total of 26 (6.952 %) sequences were classified as R30, and the haplogroup was seen in all four tribal populations: Bhil (6 samples), Kokana (8 samples), Pawara (8 samples) and Warli (4 samples) (Table 26).
R30, R30a1b, R30a1b1, R30b1, R30b2 lineages were observed among the four tribal populations included in the present study (Figure 44).
51 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification.
(Palanichamy et al., 2004; Behar et al., 2008; Chaubey et al., 2008a; Fornarino et al.,
2009; Rani et al., 2010; Sharma et al., 2012; Derenko et al., 2013b; Khan et al., 2013;
Kutanan et al., 2017, 2018; Silva et al., 2017; Zheng et al., 2018)
Figure 44: Median joining network of R30
TMRCA and its SD in years, estimated using ρ statistic, for R30 haplogroup in the present study is 37714 ± 7152 years. Published age estimates of R30 are 53576.8 ±
3961.4 years (Behar et al., 2012), 64000 years (49900-78600) (Soares et al., 2009),
53000 (40600 - 65800) years (Silva et al., 2017). TMRCA for R30 haplogroup from the present study is lower than Behar et al. (2012) estimate.
J, T
Haplogroup JT is defined by 11251-15452A-16126, with J further characterised by 295-489-10398-12612-13708-16069 and T defined by 709-1888-4917-8697-10463-13368-14905-15607-15928-16294.
J1b1a1 and T1a+152 are West Eurasian haplogroups, also observed in South Asia. A single Kokana sequence (0.267 %) was classified as J1b1a1 and another Kokana sequence (0.267 %) was classified as T1a+152 (Table 26, Figure 45).
37 (T- 4, J- 33) published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification. (Coble et al., 2004; Palanichamy et al., 2004; Kujanová et al., 2009; Li et al., 2014; Just et al., 2015; Gomez-Duran, 2016; Malyarchuk et al., 2017; Pereira et al.,
2017; Peng et al., 2018; Piotrowska-Nowak et al., 2019)
Figure 45: Median joining network of J and T
The ρ statistic, and therefore the TMRCA, was not calculated for J and T owing to the small sample size.
Published age estimates of J are 34258.3 ± 4886.2 years (Behar et al., 2012) and 32600 (22400-43200) years (Soares et al., 2009), and those for T are 25149.4 ± 4668.3 years (Behar et al., 2012) and 26800 (18100-35800) years (Soares et al., 2009).
HV14
480-15115 defines haplogroup HV14.
HV14a is a West Eurasian haplogroup, also observed in South Asia. A total of 3 (0.802 %) sequences were classified as HV14a, all from the Warli; the haplogroup was absent in the Bhil, Kokana and Pawara (Table 26). The HV14a lineages observed in the present study are shown in Figure 46.
Figure 46: Median joining network of HV14
50 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification.
(Palanichamy et al., 2004; Derenko et al., 2013b; Khan et al., 2013; Margaryan et al.,
2017; Matisoo-Smith et al., 2018; Peng et al., 2018; Sylvester et al., 2018)
The ρ statistic, and therefore the TMRCA, was not calculated for HV14a owing to the small sample size.
The published age estimate of HV14a is 6366.8 ± 3904.1 years (Behar et al., 2012).
H2, H3, H6, H13
Figure 47: Median joining network of H2, H3, H6, H13
The H haplogroup, with a total of 7 samples, was observed among the Bhil (3 samples), Kokana (2 samples) and Pawara (2 samples); it was absent in the Warli.
H2a2a differs from rCRS by 263-8860-15326.
H2a2a is a West Eurasian haplogroup, also observed in South Asia. A total of 2 (0.535 %) sequences, both from the Pawara, were classified as H2a2a (Table 26, Figure 47).
The ρ statistic, and therefore the TMRCA, was not calculated for H2a2a owing to the small sample size.
Published age estimates of H2 are 11905.3 ± 1364.4 years (Behar et al., 2012), 11700 years (6500-17100) (Soares et al., 2009).
H3 is characterised by 6776.
H3b6 is a West Eurasian haplogroup, also observed in South Asia. A total of 2 (0.535 %) sequences, both from the Bhil, were classified as H3b6 (Table 26, Figure 47).
The ρ statistic, and therefore the TMRCA, was not calculated for H3b6 owing to the small sample size.
Published age estimates of H3 are 8919 ± 1062.6 years (Behar et al., 2012), 11800 years (8400-15400) (Soares et al., 2009).
H6 is defined by 239-16362-(16482).
H6 is a West Eurasian haplogroup, also observed in South Asia. A total of 2 (0.535 %) sequences, both from the Kokana, were classified as H6 (Table 26, Figure 47).
The ρ statistic, and therefore the TMRCA, was not calculated for H6 owing to the small sample size.
Published age estimates of H6 are 10945.6 ± 1873.7 years (Behar et al., 2012), 15300 years (10500-20300) (Soares et al., 2009).
14872 defines H13.
H13a1d is a West Eurasian haplogroup, also observed in South Asia. A single (0.267 %) sequence, from the Bhil, was classified as H13a1d (Table 26, Figure 47).
The ρ statistic, and therefore the TMRCA, was not calculated for H13a1d owing to the small sample size.
Published age estimates of H13 are 12475.9 ± 867.7 years (Behar et al., 2012), 17500 years (13300-21700) (Soares et al., 2009).
6 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification.
(Achilli et al., 2004; Roostalu et al., 2007; Behar et al., 2012; Raule et al., 2014;
Malyarchuk et al., 2017)
Haplogroup U
The South Asia specific haplogroups U2a, U2b and U2c’d (previously known as U2i), and the haplogroups U1, U5, K2a, U4a and U7, which are shared with West Eurasia, were also observed (Table 27, Figure 20).
Haplogroup U is defined by 11467-12308-12372 substitutions. The U2a, U2b and U2c’d branches are South Asia specific, whereas the other lineages are common in West Eurasia.
U2a
U2 is defined by 16051, with U2a further characterised by 16206C and U2b by 146-@2706-5186T-12106-13194-15049 substitutions.
U2a is a South Asia specific haplogroup. A total of 9 (2.406 %) sequences were classified as U2a, and the haplogroup was seen in all four tribal populations: Bhil (4 samples), Kokana (1 sample), Pawara (2 samples) and Warli (2 samples) (Table 27).
U2a1, U2a1a, U2a2 lineages were observed among the four tribal populations included in the present study (Figure 48).
12 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification.
(Palanichamy et al., 2004; Achilli et al., 2005; van der Walt et al., 2012; Kang et al.,
2013; Zheng et al., 2014; Palanichamy et al., 2015a; Sharma et al., 2016b; Olivieri et al., 2017; Kutanan et al., 2018)
Figure 48: Median joining network of U2a
The TMRCA of the U2a haplogroup in the present study, estimated using the ρ statistic, is 41667 ± 11448 years (± SD). Published age estimates of U2a are 22693.8 ± 8274.7 years (Behar et al., 2012), 27500 (13200-42800) years (Soares et al., 2009) and 35200 (24400 - 46400) years (Silva et al., 2017). The TMRCA from the present study is higher than the Behar et al. (2012) estimate.
U2b
U2b is a South Asia specific haplogroup. A total of 4 (1.069 %) sequences were classified as U2b, from the Bhil (1 sample) and Kokana (3 samples); the haplogroup was absent in the Pawara and Warli (Table 27).
The lineages U2b and U2b2 were observed in the present study (Figure 49).
21 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification.
(Palanichamy et al., 2004; Achilli et al., 2005; Govindaraj et al., 2011; Khan et al.,
2013; Lippold et al., 2014; Palanichamy et al., 2015a; Sharma et al., 2016b; Kutanan et al., 2017; Peng et al., 2018; Zheng et al., 2018)
Figure 49: Median joining network of U2b
TMRCA and its SD in years, estimated using ρ statistic, for U2b haplogroup in the present study is 63044 ± 19726 years. Published age estimates of U2b are 29253.5 ±
5815 years (Behar et al., 2012), 34300 years (22300-46900) (Soares et al., 2009), 39100
(23300 - 55800) years (Silva et al., 2017). TMRCA for U2b haplogroup from the present study is higher than Behar et al. (2012) estimate.
U2c’d
16234 defines U2c’d, with U2c further characterised by 5790A-14935-15061 and U2d by 199-@263-471-1700-4025-8938-11893-14926-16189-16294.
U2c'd is a South Asia specific haplogroup. A total of 21 (5.615 %) sequences were classified as U2c'd, from the Kokana (5 samples), Pawara (2 samples) and Warli (14 samples); the haplogroup was absent in the Bhil (Table 27).
The lineages U2c1a and U2c'd were observed in the present study (Figure 50).
18 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification.
(Palanichamy et al., 2004; Achilli et al., 2005; van der Walt et al., 2012; Derenko et al.,
2013b; Palanichamy et al., 2015a; Sharma et al., 2016c; b; Kutanan et al., 2017; Peng et
al., 2018)
Figure 50: Median joining network of U2c’d
TMRCA and its SD in years, estimated using ρ statistic, for U2c'd haplogroup in the present study is 26936 ± 5590 years. Published age estimates of U2c'd are 39454.7 ±
6042.7 years (Behar et al., 2012), 46600 (33200 - 60500) years (Silva et al., 2017).
TMRCA for U2c'd haplogroup from the present study is lower than Behar et al. (2012) estimate.
U1
Haplogroup U1 is defined by 285-12879-13104-14070-15148-15954C-16249 and U1a by 2218-14364-16189.
U1a is a West Eurasian haplogroup, also observed in South Asia. A total of 3 (0.802 %) sequences were classified as U1a, from the Kokana (1 sample) and Pawara (2 samples); the haplogroup was absent in the Bhil and Warli (Table 27). The U1a lineage observed in the present study is shown in Figure 51.
Figure 51: Median joining network of U1a
27 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification.(Palanichamy et al., 2004, 2015a; Ingman and Gyllensten, 2007; Derenko et al., 2013b; Khan et al., 2013; Lippold et al., 2014; Zheng et al., 2014; Skonieczna et al., 2015; Malyarchuk et al., 2017; Sharma et al., 2017; Kutanan et al., 2018)
The ρ statistic, and therefore the TMRCA, was not calculated for U1a owing to the small sample size.
Published age estimates of U1 are 31955.4 ± 5352.5 years (Behar et al., 2012), 36900 years (25700-48600) (Soares et al., 2009).
U5
U5 is defined by control region motif 16192-16270, U5a’b by 150-7768-14182 and
U5b2a1 by 4732-16189-16270@.
U5b2a is a West Eurasian haplogroup, also observed in South Asia. A total of 2 (0.535 %) sequences were classified as U5b2a, both from the Warli; the haplogroup was absent in the Bhil, Kokana and Pawara (Table 27). The U5b2a lineages observed in the present study are shown in Figure 52.
Figure 52: Median joining network of U5
17 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification.
(Palanichamy et al., 2004; Montiel-Sosa et al., 2006; Behar et al., 2012; Khan et al.,
2013; Hartmann et al., 2016; Malyarchuk et al., 2017; Marchi et al., 2017; Margaryan et al., 2017; Peng et al., 2018; Piotrowska-Nowak et al., 2019).
The ρ statistic, and therefore the TMRCA, was not calculated for U5b2a owing to the small sample size.
Published age estimates of U5 are 30248.3 ± 5330.5 years (Behar et al., 2012), 36000 years (25300-47200) (Soares et al., 2009).
K2a
K2a is a branch of U8b through K (10550-11299-14798-16224-16311) and K2 (146-9716). K2a is defined by 152-709-4561 and K2a5 is further characterised by 324.
A single sequence from the Bhil tribal community belonged to the K2a5 haplogroup (Table 27, Figure 53).
16 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification.
(Palanichamy et al., 2004; Behar et al., 2012; Costa et al., 2013; Derenko et al., 2013b;
Li et al., 2014; Lippold et al., 2014; Zheng et al., 2014; Hartmann et al., 2016; Sharma et al., 2016a; Malyarchuk et al., 2017; Marchi et al., 2017; Piotrowska-Nowak et al.,
2019)
Figure 53: Median joining network of K2a
TMRCA and its SD in years, estimated using ρ statistic, for K2a5 haplogroup in the present study is 5823 ± 2965 years. Published age estimates of K2a5 are 6045.5 ±
2848.8 years (Behar et al., 2012), K2a 14100 years (7600-21000) (Soares et al., 2009).
TMRCA for K2a5 haplogroup from the present study is lower than Behar et al. (2012) estimate.
U4a
U4’9 is defined by 195-499-5999, U4 by 4646-6047-11332-14620-15693-16356, and U4a1 by 152-12937-16134.
U4a1 is a West Eurasian haplogroup, also observed in South Asia. A total of 3 (0.802 %) sequences were classified as U4a1, all from the Warli; the haplogroup was absent in the Bhil, Kokana and Pawara (Table 27). The U4a1 lineage observed among the Warli is shown in Figure 54.
15 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification.
(Schönberg et al., 2011; Behar et al., 2012; Khan et al., 2013; Li et al., 2014;
Skonieczna et al., 2015; Hartmann et al., 2016; Marchi et al., 2017; Matisoo-Smith et al., 2018; Peng et al., 2018)
Figure 54: Median joining network of U4a
The ρ statistic, and therefore the TMRCA, was not calculated for U4a1 owing to the small sample size.
Published age estimates are 14949.4 ± 3348.1 years for U4a (Behar et al., 2012) and 20900 (11000-31200) years for U4 (Soares et al., 2009).
U7
152-980-3741-5360-8137-8684-10142-13500-14569-(16309)-16318T defines U7 haplogroup.
U7 is a West Eurasian haplogroup, also observed in South Asia. A total of 6 (1.604 %) sequences were classified as U7, from the Bhil (2 samples), Kokana (2 samples) and Warli (2 samples); the haplogroup was absent in the Pawara (Table 27).
The lineages U7, U7a and U7a3b were observed in the present study (Figure 55).
48 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification.
(Palanichamy et al., 2004, 2015b; Behar et al., 2008, 2012; Schönberg et al., 2011;
Derenko et al., 2013b; Khan et al., 2013; Lippold et al., 2014; Sharma et al., 2015;
Larruga et al., 2017; Margaryan et al., 2017; Sahakyan et al., 2017; Zheng et al., 2018)
TMRCA and its SD in years, estimated using ρ statistic, for U7a haplogroup in the present study is 12542 ± 3784 years. Published age estimates of U7a are 16718.3 ±
3017.7 years (Behar et al., 2012), U7 21800 years (11500-32600) (Soares et al., 2009).
TMRCA for U7a haplogroup from the present study is lower than Behar et al. (2012) estimate.
Figure 55: Median joining network of U7a
West Eurasian Haplogroups among Tribal Populations of Maharashtra
West Eurasian haplogroups seen among the tribal populations of Maharashtra are X2d, J, T, HV14, H2, H3, H6, H13, U1, U5, K2a, U4a and U7.
West Eurasian lineages in India are more common in the north-western region, and their frequency declines as one goes eastwards and southwards (Metspalu et al., 2004).
The arrival of West Eurasian haplogroups in India has been associated with agricultural migrations (Kivisild et al., 1999a, 1999b, 2000; Palanichamy et al., 2015b), or with an ‘Indo-Aryan invasion’ associated with the introduction of the caste system (Bamshad et al., 2001).
There is no agreement on whether the proliferation of West Eurasian lineages is linked to the spread of agriculture, the proto-Elamo-Dravidian language, or the Indo-Aryan migration (Palanichamy et al., 2015b). However, the West Eurasian connection is suggested to be stronger with Central Asia and the Caucasus than with any other region, and the admixture to stem from multiple arrivals from the northwest rather than being limited to the Neolithic or Bronze Age (Silva et al., 2017).
X2d
X2 has a wide but intermittent distribution across west Eurasia (Reidla et al., 2003), and it needs to be further explored by complete sequencing of mtDNA.
J
The J haplogroup in India is found predominantly in the southern (Andhra Pradesh) and northern (Punjab and Uttar Pradesh) regions (Palanichamy et al., 2015b). J1b1a1, observed among the Kokana, is also predominant in Pakistanis, Europeans and Central Asians (Palanichamy et al., 2015b). Its deep coalescence age may suggest its arrival ~20000 years ago, in a period of relative warmth (Silva et al., 2017).
T
T branches are common among Andhra Pradesh, Tamil Nadu, Uttar Pradesh, Punjab and Maharashtra populations. Indian T1-derived lineages cluster mainly with Near Eastern populations, particularly from Iran, Iraq and Azerbaijan (Palanichamy et al., 2015b). The median joining network shows the Kokana sample sharing basal mutations with samples from Egypt (Kujanová et al., 2009), and the tentative age of T in South Asia has been placed in the Holocene (Silva et al., 2017).
HV14a
HV14a and its linkage with the Iranian population have been used to suggest its introduction along with the proto-Dravidian language by Neolithic pastoralists (Palanichamy et al., 2015b). The presence of HV14 among the Dravidian speaking Melkudiya tribe also supports the Elamo-Dravidian linguistic connection (Sylvester et al., 2018), with shared ancestry with populations of Iran (Derenko et al., 2013a) and clustering with Central Asian populations (Margaryan et al., 2017; Peng et al., 2018). The Warli sample from the present study occupies the basal position in the median joining tree, and complete sequencing may be necessary to shed further light on the arrival of HV14a in India. This haplogroup provides an important link between the Warli and Iranian populations.
H2, H3, H6, H13
H2a2, H3b6, H6 and H13a1d need to be further confirmed by complete mtDNA sequencing. However, they may indicate an origin in the Caucasus, Iran and Anatolia (Roostalu et al., 2007; Silva et al., 2017) and a Neolithic arrival in India (Silva et al., 2017).
U1a
U1a, like HV14, has been suggested to be associated with spread of Dravidian languages from west Asia (Palanichamy et al., 2015b). Pawara and Kokana samples from the present study are derived lineages of basal U1a sample from Iran in the mtDNA CR region based median joining network. This haplogroup provides the important link of Pawara and Kokana with Iranian populations and needs to be completely sequenced to shed light on their presence in tribal populations of India.
U5b2a
While the observed samples share basal mutations with Indian, West Eurasian and
Central Asian samples, this haplogroup needs to be further confirmed with complete mtDNA sequencing, as it has several derived mutations not seen presently in U5b2a.
K2a5
K2a5, like X2d, provides a link to Middle Eastern and Iranian populations. The control region sequence of the Bhil sample is an exact match with Iranian (Derenko et al., 2013a), Ashkenazi (Costa et al., 2013) and Indian (Palanichamy et al., 2004) sequences. It might have arrived in India in the Neolithic period (Silva et al., 2017).
U4a1
Frequency of U4 lineages is low in India (Palanichamy et al., 2015b), and while the
Warli samples share basal mutations with South Asian, West Eurasian and Central
Asian samples in the control region based median joining tree, it needs to be confirmed further.
U7
The spread of U7 and its derived branches is a complex phenomenon, with at least two dispersal events from the Near East (Sahakyan et al., 2017). At the resolution provided by the control region alone, only one Bhil sample has been allocated to a terminal branch, U7a3b. U7a3 has its expansion time prior to the Holocene. Other lineages are also likely to have pre-Holocene origins, though they need to be further characterised by complete mtDNA sequencing. Thus U7a sequences cannot be associated with the arrival of Indo-European speakers. That association was proposed by Palanichamy et al. (2015b) before the extensive study by Sahakyan et al. (2017) established the complexity of U7 lineages.
In summary, the West Eurasian lineages seen in the tribal populations of Maharashtra may derive from a variety of geographical sources (Iran, the Middle and Near East, the Caucasus and Europe) and over time scales ranging from ~20,000 years to possibly the Holocene.
Analysis of Population affinities based on Haplogroup Frequencies
Principal component analysis (PCA) was conducted on the haplogroup frequencies of 38 populations (4 from the present study and 34 from secondary data). A total of 152 haplogroups from the present study as well as published sources (Rakha et al., 2010; Thangaraj et al., 2010; Shah et al., 2011b; Sharma et al., 2012; Derenko et al., 2013a; The 1000 Genomes Project Consortium et al., 2015; Chaubey et al., 2016; Tamang et al., 2018) were used for this analysis. The PCA plot is shown in Figure 56.
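To illustrate the kind of computation involved, the following is a minimal sketch of a PCA on a populations by haplogroups frequency matrix. It is not the procedure or software used in this study, which is not specified here. The population names, haplogroup labels and counts are invented toy values, and the function names are hypothetical. The sketch uses only the Python standard library and extracts the first two components by power iteration with deflation; a statistical package would normally be used instead.

```python
import math


def haplogroup_frequencies(counts):
    """Turn {population: {haplogroup: count}} into a frequency matrix.

    Rows follow the sorted population names, columns the sorted haplogroups.
    """
    populations = sorted(counts)
    haplogroups = sorted({hg for per_pop in counts.values() for hg in per_pop})
    matrix = []
    for pop in populations:
        total = sum(counts[pop].values())
        matrix.append([counts[pop].get(hg, 0) / total for hg in haplogroups])
    return populations, haplogroups, matrix


def center_columns(matrix):
    """Subtract each haplogroup's mean frequency across populations."""
    n = len(matrix)
    means = [sum(column) / n for column in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]


def covariance(centered):
    """Sample covariance matrix between the (centered) haplogroup columns."""
    n = len(centered)
    columns = list(zip(*centered))
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in columns]
            for ci in columns]


def top_components(cov, k=2, iterations=500):
    """Leading k eigenvectors of cov by power iteration with deflation."""
    p = len(cov)
    cov = [row[:] for row in cov]  # work on a copy; the input is left unchanged
    components, eigenvalues = [], []
    for _ in range(k):
        # Fixed, non-symmetric start vector, normalised to unit length.
        v = [1.0 / (i + 1) for i in range(p)]
        start_norm = math.sqrt(sum(x * x for x in v))
        v = [x / start_norm for x in v]
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
        components.append(v)
        eigenvalues.append(lam)
        # Deflation: remove the component just found before the next pass.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(p)]
               for i in range(p)]
    return components, eigenvalues


def project(centered, components):
    """Coordinates of each population on the principal components."""
    return [[sum(x * c for x, c in zip(row, comp)) for comp in components]
            for row in centered]


# Invented toy counts (NOT data from this study).
toy_counts = {
    "pop_A": {"hg1": 8, "hg2": 2},
    "pop_B": {"hg1": 7, "hg2": 3},
    "pop_C": {"hg2": 1, "hg3": 9},
    "pop_D": {"hg2": 2, "hg3": 8},
}

populations, haplogroups, freqs = haplogroup_frequencies(toy_counts)
centered = center_columns(freqs)
components, eigenvalues = top_components(covariance(centered), k=2)
scores = project(centered, components)

for pop, (pc1, pc2) in zip(populations, scores):
    print(f"{pop}: PC1={pc1:+.3f} PC2={pc2:+.3f}")
```

In this toy example, populations with similar haplogroup profiles (pop_A and pop_B, or pop_C and pop_D) fall close together on the first component, which is the sense in which proximity on such a plot is read as similarity in haplogroup composition. The sign of each component is arbitrary.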
The plot indicates two distinct clusters: one of north-eastern, Tibeto-Burman language speaking populations, and the other of Dravidian and Indo-European language speaking populations. The presence of Austro-Asiatic Khasi and Kra-Dai speaking populations within the cluster of north-eastern populations further substantiates the influence of geographical proximity. Dravidian language speaking populations from South India and Indo-European language speakers from Maharashtra as well as from Gujarat and Pakistan (PJL) are tightly clustered, indicating similarity among them. This may be explained by the substantial share of South Asia specific haplogroups present among all these populations, with only a minor fraction differing among them.
The inset shows populations from the current study. The Warli, Mahadeo Koli and Bhils from Madhya Pradesh are very closely placed, with Bhils from Maharashtra slightly away from them. The Kokana are close to the Dravidian speaking Kare Vokkal population from Karnataka, followed by the Pawara.
Figure 56: PCA map of first two components among populations from South Asia. Inset map focuses on populations under study.
Comparison of Genome wide Autosomal data with mtDNA data
Principal component plot of Genome wide analysis is shown in Figure 57.
The plot, owing to the power of autosomal markers, distinguishes the tribal populations from the caste and 1000 Genomes samples. Unlike the mtDNA MDS and PCA plots, this plot is based on individuals rather than populations. Four clearly demarcated clusters can be seen, with Bhils and Pawara overlapping, and Kokana and Warli clearly separated from the rest of the individuals. However, the caste samples, Deshastha Brahmins and Maratha, overlap with the 1000 Genomes populations without forming any apparent separate cluster.
Figure 57: PCA plot of Genome wide autosomal markers among Tribal and caste populations of Maharashtra and South Asian populations from 1000 Genomes Phase 3
(Population codes: Bhil - IN-BH, Pawara - IN-PW, Kokana - IN-KN, Warli - IN-WR, Deshastha Brahmins - IN-DB, Kunabi Maratha - IN-KM, GIH - Gujarati Indian from Houston, Texas, PJL - Punjabi from Lahore, Pakistan, BEB - Bengali from Bangladesh, STU - Sri Lankan Tamil from the UK, ITU - Indian Telugu from the UK)
A comparison with the mtDNA data shows that the autosomal data bring out the same population affinities among the four tribal populations of Maharashtra as were seen when the mtDNA data were analysed without any secondary data. The mtDNA-based population affinities indicated that Bhils and Pawara are virtually indistinguishable from each other, and the autosomal data reflect the same. In addition, the genome-wide autosomal data bring out a clear distinction between caste and tribal populations.