4. Mitochondrial Diversity:

The second approach to studying mitochondrial DNA data is based on the analysis of haplogroups. While the phylogenetics of haplogroups is itself an important part of this approach, here we report mtDNA haplogroups with a view to answering questions about population affinities, demic or cultural diffusion of language and agricultural technology, and the antiquity and unity of Indian populations.

This chapter therefore presents haplogroup frequencies and haplogroup sharing among the tribal populations of Maharashtra, followed by a description of each haplogroup that focuses on its estimated time to most recent common ancestor (TMRCA) and its sharing among populations. Lastly, it discusses the West Eurasian haplogroups, which are important markers for demic or cultural diffusion of language and agricultural technology and for population movements into the subcontinent around the Holocene. The chapter also presents an assessment of population affinities based on haplogroup frequencies and comments on the utility of mtDNA data vis-à-vis genome-wide autosomal data for understanding population histories.

A principal component analysis of genome-wide SNP data, undertaken to understand population structure among South Asian populations including the tribal and caste populations of Maharashtra, has been published as part of a larger analysis (Jonnalagadda et al., 2019).

Haplogroups and their clades observed in the study populations

One of the objectives of the present research was to characterise the mtDNA haplogroups among the selected tribes of Maharashtra, viz. Bhil, Pawara, Kokana and Warli. As described earlier in the methodology chapter, the sequenced control region of mtDNA (nucleotide positions 16024-576) was used to characterise the haplogroups.

Following the steps described in the methodology chapter, it was possible to characterise almost all the mtDNA sequences from the four tribal populations under study into haplogroups without ambiguity (Figure 20, Table 24, Table 25, Table 26, Table 27).

In total, 40 haplogroups were observed among the tribal populations of West Maharashtra (Figure 20).

South Asia specific haplogroups and their clades observed are:

M2a, M2b, M3, M30, M33, M35, M37, M*, M38, M53, M39, M4, M57, M5, M6, M64, M65, N5, R5, R6, R8, R30, U2a, U2b, U2c'd

Haplogroups shared with Western Eurasia are:

X2d, J, T, HV14, H2, H3, H6, H13, U1, U5, K2a, U4a, U7

Haplogroups shared with Eastern Eurasia are: G2, M73, A1

A large proportion of the sequences from the tribal populations belong to South Asia specific branches of M, N, R and U (333 sequences, 89.037 %). All of the South Asia specific haplogroups exhibit deep coalescence, emphasising the autochthonous nature of the tribal populations of Maharashtra. A minor fraction of the total sequences (30 sequences, 8.021 %) belongs to West Eurasian haplogroups shared with South Asia.

These haplogroups are crucial for understanding the putative agriculture and/or language related admixture and migrations from West Eurasia. Therefore, since the present study asks whether the tribal populations of Maharashtra show any signs of agriculture or language related admixture, only the West Eurasian haplogroups are discussed further in the relevant sections.

Figure 20: mtDNA haplogroups observed in the present study

Table 24: Frequency and Percentages of Macrohaplogroup M lineages

Branch  Haplogroup              Bhil        Kokana        Pawara         Warli         Total
                        Freq       %  Freq       %  Freq       %  Freq       %  Freq       %
M*      M*                 0       0     0       0     1   1.099     0       0     1   0.267
M*      Total              0       0     0       0     1   1.099     0       0     1   0.267
M2a     M2a1               0       0     1   1.136     0       0     0       0     1   0.267
M2a     M2a1a              0       0     0       0     1   1.099     3   3.093     4    1.07
M2a     M2a1a+207          0       0     5   5.682     1   1.099     2   2.062     8   2.139
M2a     M2a1a3             0       0     1   1.136     0       0     0       0     1   0.267
M2a     M2a1a3+16093       0       0     5   5.682     0       0     3   3.093     8   2.139
M2a     M2a1b              5   5.102     0       0     6   6.593     0       0    11   2.941
M2a     Total              5   5.102    12  13.636     8   8.791     8   8.248    33   8.823
M2b     M2b                1    1.02     1   1.136     0       0     1   1.031     3   0.802
M2b     M2b2               0       0     0       0     0       0     2   2.062     2   0.535
M2b     Total              1    1.02     1   1.136     0       0     3   3.093     5   1.337
M3      M3                 0       0     0       0     4   4.396     0       0     4    1.07
M3      M3a1+204           4   4.082     2   2.273     8   8.791     4   4.124    18   4.813
M3      M3c1b              0       0     0       0     0       0     1   1.031     1   0.267
M3      M3c2               0       0     3   3.409     0       0     0       0     3   0.802
M3      M3d                1    1.02     2   2.273     0       0     0       0     3   0.802
M3      Total              5   5.102     7   7.955    12  13.187     5   5.155    29   7.754
M30     M30                1    1.02     0       0     0       0     1   1.031     2   0.535
M30     M30+16234          4   4.082     0       0     0       0     1   1.031     5   1.337
M30     M30a               2   2.041     0       0     0       0     0       0     2   0.535
M30     M30b               1    1.02     0       0     0       0     0       0     1   0.267
M30     M30c1              0       0     0       0     2   2.198     0       0     2   0.535
M30     M30d               0       0     0       0     2   2.198     0       0     2   0.535
M30     M30f               0       0     0       0     5   5.495     0       0     5   1.337
M30     M30g               0       0     2   2.273     0       0     0       0     2   0.535
M30     Total              8   8.163     2   2.273     9    9.89     2   2.062    21   5.615
M33     M33a1b             1    1.02     1   1.136     0       0     0       0     2   0.535
M33     M33a2a             1    1.02     0       0     1   1.099     0       0     2   0.535
M33     M33b+16362         0       0     1   1.136     0       0     0       0     1   0.267
M33     Total              2   2.041     2   2.273     1   1.099     0       0     5   1.337
M35     M35+199            0       0     0       0     1   1.099     0       0     1   0.267
M35     M35b               0       0     0       0     0       0     1   1.031     1   0.267
M35     M35b+16304         2   2.041     0       0     0       0     6   6.186     8   2.139
M35     M35b1              2   2.041     0       0     0       0     1   1.031     3   0.802
M35     M35c              10  10.204     0       0     2   2.198     0       0    12   3.209
M35     Total             14  14.286     0       0     3   3.297     8   8.247    25   6.684
M37     M37+152+151        2   2.041     0       0     0       0    14  14.433    16   4.278
M37     M37e2              1    1.02     0       0     0       0     0       0     1   0.267
M37     Total              3   3.061     0       0     0       0    14  14.433    17   4.545
M38     M38c               0       0     0       0     0       0     1   1.031     1   0.267
M38     Total              0       0     0       0     0       0     1   1.031     1   0.267
M39     M39                0       0     0       0     1   1.099     0       0     1   0.267
M39     M39b               0       0     1   1.136     0       0     0       0     1   0.267
M39     M39b1              3   3.061     0       0     3   3.297     0       0     6   1.604
M39     Total              3   3.061     1   1.136     4   4.396     0       0     8   2.139
M4a     M4a                3   3.061     5   5.682     0       0     2   2.062    10   2.674
M4a     Total              3   3.061     5   5.682     0       0     2   2.062    10   2.674
M53     M53                0       0     0       0     1   1.099     0       0     1   0.267
M53     Total              0       0     0       0     1   1.099     0       0     1   0.267
M57     M57+152            3   3.061     0       0     1   1.099     0       0     4    1.07
M57     M57a               1    1.02     0       0     6   6.593     0       0     7   1.872
M57     M57b               0       0     2   2.273     0       0     0       0     2   0.535
M57     M57b1              5   5.102     0       0     8   8.791     0       0    13   3.476
M57     Total              9   9.184     2   2.273    15  16.484     0       0    26   6.952
M5      M5a2a1a2           0       0     2   2.273     0       0     0       0     2   0.535
M5      M5a3b              0       0     0       0     1   1.099     0       0     1   0.267
M5      M5a4               0       0     1   1.136     2   2.198     1   1.031     4    1.07
M5      M5a'd              5   5.102     8   9.091     2   2.198     4   4.124    19    5.08
M5      M5b2a              0       0     0       0     2   2.198     1   1.031     3   0.802
M5      Total              5   5.102    11    12.5     7   7.692     6   6.186    29   7.754
M6      M6                 0       0     0       0     0       0     7   7.216     7   1.872
M6      Total              0       0     0       0     0       0     7   7.216     7   1.872
M64     M64                0       0     5   5.682     0       0     0       0     5   1.337
M64     Total              0       0     5   5.682     0       0     0       0     5   1.337
M65     M65                0       0     1   1.136     0       0     0       0     1   0.267
M65     M65a+@16311        1    1.02     0       0     0       0     0       0     1   0.267
M65     M65b               1    1.02     0       0     2   2.198     0       0     3   0.802
M65     Total              2   2.041     1   1.136     2   2.198     0       0     5   1.337
M73     M73a               2   2.041     0       0     0       0     0       0     2   0.535
M73     Total              2   2.041     0       0     0       0     0       0     2   0.535
G2b     G2b1a1             1    1.02     0       0     2   2.198     0       0     3   0.802
G2b     G2b2a              0       0     0       0     0       0     2   2.062     2   0.535
G2b     Total              1    1.02     0       0     2   2.198     2   2.062     5   1.337

Table 25: Frequency and Percentages of Macrohaplogroup N (xR) lineages

Branch  Haplogroup              Bhil        Kokana        Pawara         Warli         Total
                        Freq       %  Freq       %  Freq       %  Freq       %  Freq       %
N5      N5                 0       0     0       0     0       0     1   1.031     1   0.267
N5      Total              0       0     0       0     0       0     1   1.031     1   0.267
A1a     A1a                4   4.082     0       0     0       0     0       0     4    1.07
A1a     Total              4   4.082     0       0     0       0     0       0     4    1.07
X2d     X2d                0       0     3   3.409     0       0     0       0     3   0.802
X2d     Total              0       0     3   3.409     0       0     0       0     3   0.802

Table 26: Frequency and Percentages of haplogroup R (xU) lineages

Branch  Haplogroup              Bhil        Kokana        Pawara         Warli         Total
                        Freq       %  Freq       %  Freq       %  Freq       %  Freq       %
R30     R30                0       0     0       0     4   4.396     0       0     4    1.07
R30     R30a1b             0       0     5   5.682     0       0     3   3.093     8   2.139
R30     R30a1b1            0       0     0       0     0       0     1   1.031     1   0.267
R30     R30b1              0       0     3   3.409     0       0     0       0     3   0.802
R30     R30b2              6   6.122     0       0     4   4.396     0       0    10   2.674
R30     Total              6   6.122     8   9.091     8   8.791     4   4.124    26   6.952
R5      R5a1               3   3.061     3   3.409     0       0     5   5.155    11   2.941
R5      R5a2               3   3.061     0       0     1   1.099     2   2.062     6   1.604
R5      R5a2b              3   3.061     5   5.682     5   5.495     0       0    13   3.476
R5      Total              9   9.184     8   9.091     6   6.593     7   7.216    30   8.021
R6      R6a                1    1.02     0       0     0       0     0       0     1   0.267
R6      R6a1               1    1.02     0       0     0       0     0       0     1   0.267
R6      R6a2               1    1.02     0       0     1   1.099     0       0     2   0.535
R6      Total              3   3.061     0       0     1   1.099     0       0     4    1.07
R8      R8a1               1    1.02     0       0     0       0     0       0     1   0.267
R8      R8a1a1a1           1    1.02     2   2.273     3   3.297     1   1.031     7   1.872
R8      R8b1               0       0     2   2.273     0       0     0       0     2   0.535
R8      Total              2   2.041     4   4.545     3   3.297     1   1.031    10   2.674
HV14    HV14a              0       0     0       0     0       0     3   3.093     3   0.802
HV14    Total              0       0     0       0     0       0     3   3.093     3   0.802
H13     H13a1d             1    1.02     0       0     0       0     0       0     1   0.267
H2a     H2a2a              0       0     0       0     2   2.198     0       0     2   0.535
H3b     H3b6               2   2.041     0       0     0       0     0       0     2   0.535
H6      H6                 0       0     2   2.273     0       0     0       0     2   0.535
H       Total              3   3.061     2   2.273     2   2.198     0       0     7   1.872
J1b1a1  J1b1a1             0       0     1   1.136     0       0     0       0     1   0.267
T1a     T1a+152            0       0     1   1.136     0       0     0       0     1   0.267
JT      Total              0       0     2   2.273     0       0     0       0     2   0.535

Table 27: Frequency and Percentages of haplogroup U lineages

Branch  Haplogroup              Bhil        Kokana        Pawara         Warli         Total
                        Freq       %  Freq       %  Freq       %  Freq       %  Freq       %
U1a     U1a                0       0     1   1.136     2   2.198     0       0     3   0.802
U1a     Total              0       0     1   1.136     2   2.198     0       0     3   0.802
U5      U5b2a              0       0     0       0     0       0     2   2.062     2   0.535
U5      Total              0       0     0       0     0       0     2   2.062     2   0.535
U2a     U2a1               1    1.02     1   1.136     0       0     0       0     2   0.535
U2a     U2a1a              2   2.041     0       0     2   2.198     2   2.062     6   1.604
U2a     U2a2               1    1.02     0       0     0       0     0       0     1   0.267
U2a     Total              4   4.081     1   1.136     2   2.198     2   2.062     9   2.406
U2b     U2b                1    1.02     0       0     0       0     0       0     1   0.267
U2b     U2b2               0       0     3   3.409     0       0     0       0     3   0.802
U2b     Total              1    1.02     3   3.409     0       0     0       0     4   1.069
U2c'd   U2c1a              0       0     0       0     2   2.198     0       0     2   0.535
U2c'd   U2c'd              0       0     5   5.682     0       0    14  14.433    19    5.08
U2c'd   Total              0       0     5   5.682     2   2.198    14  14.433    21   5.615
U4a1    U4a1               0       0     0       0     0       0     3   3.093     3   0.802
U4a1    Total              0       0     0       0     0       0     3   3.093     3   0.802
U7      U7                 1    1.02     0       0     0       0     0       0     1   0.267
U7      U7a                0       0     2   2.273     0       0     2   2.062     4    1.07
U7      U7a3b              1    1.02     0       0     0       0     0       0     1   0.267
U7      Total              2   2.041     2   2.273     0       0     2   2.062     6   1.604
K2a     K2a5               1    1.02     0       0     0       0     0       0     1   0.267
K2a     Total              1    1.02     0       0     0       0     0       0     1   0.267
n                         98     100    88     100    91     100    97     100   374     100
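The percentage columns in Tables 24 to 27 are each frequency divided by the sample size of the corresponding population (the n row of Table 27). The following Python sketch illustrates that arithmetic with the M3 and M30 totals from Table 24; the function and variable names are illustrative and are not taken from the original analysis.

```python
# Sample sizes per population, from the "n" row of Table 27.
SAMPLE_SIZES = {"Bhil": 98, "Kokana": 88, "Pawara": 91, "Warli": 97}

# Total counts per population for two haplogroups, copied from Table 24.
COUNTS = {
    "M3": {"Bhil": 5, "Kokana": 7, "Pawara": 12, "Warli": 5},
    "M30": {"Bhil": 8, "Kokana": 2, "Pawara": 9, "Warli": 2},
}


def percentages(counts, sample_sizes):
    """Return per-population and pooled percentages, rounded to 3 decimals."""
    result = {
        pop: round(100 * counts[pop] / n, 3) for pop, n in sample_sizes.items()
    }
    pooled_n = sum(sample_sizes.values())
    result["Total"] = round(100 * sum(counts.values()) / pooled_n, 3)
    return result


for haplogroup, counts in COUNTS.items():
    print(haplogroup, percentages(counts, SAMPLE_SIZES))
```

Because each population has a different sample size, the same count gives a different percentage in each column, and the Total percentage is computed on the pooled 374 sequences rather than by averaging the four population percentages.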

Further, it was also observed that each tribal community has a different assemblage of mtDNA haplogroups and sub-haplogroups.

Figure 21: Venn diagram showing number of shared and unique mtDNA basal haplogroups among the Tribal Populations of West Maharashtra. (See Description)

Figure 21 depicts the number of shared and unique basal mtDNA haplogroups among the tribal populations of West Maharashtra. Eight haplogroups were seen in all four populations (Bhil, Pawara, Kokana and Warli): M2, M3, M30, M5, R30, R5, R8 and U2. All of these have older coalescence ages, and all are typical South Asian haplogroups. Bhil and Pawara, which are geographically contiguous tribal communities, share R6 between themselves. Kokana and Warli, however, do not share any haplogroup exclusively between themselves. Bhil and Warli share M37, whereas Pawara and Kokana share the West Eurasian haplogroup U1a. Neither Bhil and Kokana nor Pawara and Warli have haplogroups common only to the pair. Four haplogroups (M33, M39, M57 and M65) are shared among Bhil, Pawara and Kokana; Warli do not harbour these haplogroups. No haplogroup is shared exclusively by Warli, Pawara and Kokana. Bhil, Pawara and Warli share G2b and M35, whereas Bhil, Kokana and Warli share M4a and U7.

Additionally, each community harbours some haplogroups that it does not share with the other three communities. Bhil have five exclusive haplogroups: A1a and M73, which are East Eurasian, and H13, H3b and K2a, which are West Eurasian. Pawara exhibit M53 and a single unclassified M* mtDNA, along with the West Eurasian haplogroup H2a. The South Asian haplogroup M64 and the West Eurasian haplogroups H6, J1b1a1, T1a and X2d are observed exclusively in Kokana. Warli have the highest number of private haplogroups (six): the South Asia specific M6 and M38, the rare N5 haplogroup represented by a single mtDNA, and the West Eurasian haplogroups U4a1, U5 and HV14.
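The sharing pattern summarised for Figure 21 can be recomputed with simple set operations. The sketch below is illustrative only: the presence lists were read off Tables 24 to 27 at the basal level used in the text (for example, U2a, U2b and U2c'd are pooled as U2), and the function and variable names are hypothetical rather than part of the original analysis.

```python
from collections import defaultdict

# Basal haplogroups present (frequency > 0) in each population,
# transcribed from Tables 24-27; U2a, U2b and U2c'd are pooled as U2.
PRESENCE = {
    "Bhil": {"M2", "M3", "M30", "M33", "M35", "M37", "M39", "M4a", "M57",
             "M5", "M65", "M73", "G2b", "A1a", "R30", "R5", "R6", "R8",
             "H13", "H3b", "U2", "U7", "K2a"},
    "Kokana": {"M2", "M3", "M30", "M33", "M39", "M4a", "M57", "M5", "M64",
               "M65", "X2d", "R30", "R5", "R8", "H6", "J1b1a1", "T1a",
               "U1a", "U2", "U7"},
    "Pawara": {"M*", "M2", "M3", "M30", "M33", "M35", "M39", "M53", "M57",
               "M5", "M65", "G2b", "R30", "R5", "R6", "R8", "H2a", "U1a",
               "U2"},
    "Warli": {"M2", "M3", "M30", "M35", "M37", "M38", "M4a", "M5", "M6",
              "G2b", "N5", "R30", "R5", "R8", "HV14", "U2", "U4a1", "U5",
              "U7"},
}


def sharing_pattern(presence):
    """Map each set of populations to the haplogroups found in exactly that set."""
    carriers = defaultdict(set)
    for population, haplogroups in presence.items():
        for haplogroup in haplogroups:
            carriers[haplogroup].add(population)
    pattern = defaultdict(set)
    for haplogroup, populations in carriers.items():
        pattern[frozenset(populations)].add(haplogroup)
    return dict(pattern)


pattern = sharing_pattern(PRESENCE)
shared_by_all = pattern[frozenset(PRESENCE)]
private = {pop: pattern.get(frozenset([pop]), set()) for pop in PRESENCE}

print("Shared by all four:", sorted(shared_by_all))
print("Private counts:", {pop: len(hgs) for pop, hgs in private.items()})
```

Each key of `pattern` corresponds to one region of the Venn diagram, so a combination of populations that is absent from the dictionary (for example Kokana and Warli alone) is an empty region.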

It is interesting to note that the haplogroups shared by three or four communities are largely South Asia specific haplogroups with older coalescence ages. On the other hand, the haplogroups exclusive to each community are largely West Eurasian. This pattern may indicate an ancient mtDNA stratum among the tribal communities of Maharashtra. The West Eurasian haplogroups exclusive to individual communities may indicate differential gene flow into each community, or loss of lineages due to drift. Additionally, the sharing of deep-rooted South Asian lineages together with the exclusivity of West Eurasian lineages may indicate a common origin of these tribal communities, followed by strict endogamy. This is further accentuated when the shared and unique branches of haplogroups are compared among the four tribal populations: the number of haplogroup branches shared is minimal compared with the number of branches exclusive to each community, indicating recent but strong endogamy.

To infer the phylogeny of 374 mtDNA control region sequences from the present study, and to contextualise them in the larger subcontinental as well as global mtDNA picture, median joining networks were drawn for each haplogroup separately. For such analyses, mtGenomes with complete classification resolutions were selected from published literature (references are given in relevant sections and accession numbers are given in Annexure 5). Control region, 16024-576, of selected mtGenomes and mtDNA sequences from the present study were then utilised to draw median joining trees using Network 5.0.1.1 (fluxus-engineering; Bandelt et al., 1999). This analysis also helped to confirm the haplogroup assignment done by Haplogrep2 (Weissensteiner et al., 2016). Following sections describe the haplogroups and median joining trees.
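Because mtDNA is circular, the control region 16024-576 wraps around the end of the reference numbering. A minimal sketch of how that range could be cut out of a complete mtGenome is given below; it assumes the sequence is already in rCRS coordinates (16569 positions, no insertions or deletions), which real mtGenomes generally need an alignment step to satisfy. The function name is hypothetical and this is not the procedure used by Network or Haplogrep2.

```python
RCRS_LENGTH = 16569  # length of the revised Cambridge Reference Sequence


def control_region(mtgenome, start=16024, end=576):
    """Return positions start..16569 followed by 1..end (1-based, inclusive).

    Assumes `mtgenome` is a string in rCRS coordinates with no indels.
    """
    if len(mtgenome) != RCRS_LENGTH:
        raise ValueError("sequence must be in rCRS coordinates (16569 bases)")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return mtgenome[start - 1:] + mtgenome[:end]


# Synthetic sequence for illustration only (not real mtDNA).
genome = "".join("ACGT"[i % 4] for i in range(RCRS_LENGTH))
region = control_region(genome)
print(len(region))
```

With the default arguments the extracted segment is 546 + 576 = 1122 positions long, covering both hypervariable segments on either side of the numbering origin.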

Macrohaplogroup M

All mtDNA lineages outside Africa fall into two macrohaplogroups, M and N. Haplogroup M is defined by the substitutions 489-10400-14783-15043. M and its sub-branches constitute ~60% to 80% of South Asian mtDNA lineages.

Based on haplogroup estimation using Haplogrep2, median joining networks and manual near-matching with published mtGenomes, a total of 18 mtDNA haplogroups belonging to macrohaplogroup M were observed in the present study. Two haplogroups (M73, G2) are East Asian, whereas the remaining 16 are South Asia specific (M*, M38, M4a, M30, M37, M64, M65, M39, M2, M3, M5, M6, M33, M35, M53, M57). One sequence, belonging to the Pawara tribal community, could not be classified beyond the M node solely on the basis of control region mutations. This sequence has therefore been reported as M* and was not used for median joining analyses.

M2

M2 haplogroup is defined by 447G-1780-11083-15670-16274 substitutions.

M2 is one of the oldest haplogroups specific to the Indian subcontinent (Kumar et al., 2008; Chandrasekar et al., 2009).

In the present study, M2 mtDNA was observed among all the populations under study, and it was the most frequently observed haplogroup (38 out of 374, 10.16%; Table 24). The following subsections describe its two branches, M2a and M2b, separately.

M2a

The M2a lineage of haplogroup M2 is defined by the M2a'b substitutions 8502-16319 together with the M2a-defining substitutions 7961-12810.

Figure 22: Median joining network of M2a

M2a is a South Asia specific haplogroup. A total of 33 sequences (8.823 %) were classified as M2a, and the haplogroup was seen in all four tribal populations: Bhil (5 samples), Kokana (12 samples), Pawara (8 samples) and Warli (8 samples) (Table 24).

M2a1, M2a1a, M2a1a+207, M2a1a3, M2a1a3+16093, M2a1b lineages were observed among the four tribal populations included in the present study (Figure 22).

63 published sequences, in addition to the presently generated sequences, were used for the median joining tree construction and for refining the haplogroup classification (Kumar et al., 2008, 2009; Chandrasekar et al., 2009; Sharma et al., 2012; Khan et al., 2013; Lippold et al., 2014; Palanichamy et al., 2015a; Hartmann et al., 2016; Kutanan et al., 2018).

The TMRCA of the M2a haplogroup in the present study, estimated using the ρ statistic, is 39629 ± 11492 years (estimate ± SD). Published age estimates of M2a are 29633.4 ± 5904 years (Behar et al., 2012), 20300 years (11400-29600) (Soares et al., 2009) and 34400 (21400-48000) years (Silva et al., 2017). The TMRCA for M2a from the present study is higher than the Behar et al. (2012) estimate.
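The chapter reports ρ-based ages without restating the calculation, so the following Python sketch only illustrates the general idea and should not be read as the procedure used in this study. It assumes that ρ is the average number of mutations separating the sampled sequences from the root haplotype of a network, that the tree is star-like so that the standard deviation simplifies to sqrt(ρ/n), and that a calibration of years per mutation is supplied by the user. The value 1000 used below is a placeholder, not a published rate, and the function name is hypothetical.

```python
import math


def rho_tmrca(mutations_from_root, years_per_mutation):
    """Estimate TMRCA and its SD (in years) from the rho statistic.

    mutations_from_root: one entry per sampled sequence, giving the number
        of mutations between that sequence and the root haplotype.
    years_per_mutation: calibration for the sequenced segment (assumed).
    Uses the star-like simplification sigma = sqrt(rho / n).
    """
    n = len(mutations_from_root)
    if n == 0:
        raise ValueError("at least one sequence is required")
    rho = sum(mutations_from_root) / n
    sigma = math.sqrt(rho / n)
    return rho * years_per_mutation, sigma * years_per_mutation


# Toy input: four sequences that are 1, 2, 3 and 2 mutations from the root.
age, sd = rho_tmrca([1, 2, 3, 2], years_per_mutation=1000)
print(round(age), round(sd))
```

Under these assumptions the SD shrinks as more sequences are sampled, which is consistent with the broad intervals reported for haplogroups represented by only a few sequences.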

M2b

M2b is defined by 152-182-195-1453-2831T-3630-5744-6647-9899-13254-@14766-16169.1C-16189-16320.

M2b is a South Asia specific haplogroup. A total of 5 sequences (1.337 %) were classified as M2b. The haplogroup was observed among the Bhil (1 sample), Kokana (1 sample) and Warli (3 samples) tribal populations and was absent in the Pawara tribal population (Table 24).

Figure 23: Median Joining Network of M2b

The M2b and M2b2 lineages were observed among the tribal populations included in the present study (Figure 23).

17 published sequences, in addition to the presently generated sequences, were used for the median joining tree construction and for refining the haplogroup classification (Rajkumar et al., 2005; Sun et al., 2006; Kumar et al., 2008; Chandrasekar et al., 2009; Palanichamy et al., 2015a).

The TMRCA of the M2b haplogroup in the present study, estimated using the ρ statistic, is 29081 ± 9863 years (estimate ± SD). Published age estimates of M2b are 14245.4 ± 3926.4 years (Behar et al., 2012), 12800 years (5500-20400) (Soares et al., 2009) and 14400 (9100-19700) years (Silva et al., 2017). The TMRCA for M2b from the present study is higher than the Behar et al. (2012) estimate.
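The comparison between a present-study estimate and the published values can be made a little more explicit by checking the estimate against the published ranges. The sketch below does this for M2b using only figures quoted above. Treating the bracketed values as interval bounds, and using estimate ± 1 SD as a rough interval for the present study, are simplifying assumptions made for illustration, and the function name is hypothetical.

```python
def compare_with_published(estimate, sd, low, high):
    """Compare an estimate (with SD) against a published range [low, high]."""
    lower, upper = estimate - sd, estimate + sd
    return {
        # Does the point estimate itself fall inside the published range?
        "estimate_in_range": low <= estimate <= high,
        # Does the estimate +/- 1 SD interval overlap the published range?
        "one_sd_overlaps_range": lower <= high and upper >= low,
    }


# M2b values quoted in the text above.
M2B_ESTIMATE, M2B_SD = 29081, 9863
PUBLISHED_RANGES = {
    "Soares et al., 2009": (5500, 20400),
    "Silva et al., 2017": (9100, 19700),
}

results = {
    source: compare_with_published(M2B_ESTIMATE, M2B_SD, low, high)
    for source, (low, high) in PUBLISHED_RANGES.items()
}
for source, outcome in results.items():
    print(source, outcome)
```

For M2b the point estimate lies above both published ranges, while the lower end of the ± 1 SD interval (19218 years) still falls just inside them.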

M3

M3 haplogroup is defined by 482-16126.

M3 is a South Asia specific haplogroup. A total of 29 sequences (7.754 %) were classified as M3, and the haplogroup was seen in all four tribal populations: Bhil (5 samples), Kokana (7 samples), Pawara (12 samples) and Warli (5 samples) (Table 24).

M3, M3a1+204, M3c1b, M3c2, M3d lineages were observed among the four tribal populations included in the present study (Figure 24).

117 published sequences, in addition to the presently generated sequences, were used for the median joining tree construction and for refining the haplogroup classification (Rajkumar et al., 2005; Sun et al., 2006; Thangaraj et al., 2006; Chandrasekar et al., 2009; Fornarino et al., 2009; Govindaraj et al., 2011; Schönberg et al., 2011; Behar et al., 2012; Wang et al., 2012; Khan et al., 2013; Lippold et al., 2014; Zheng et al., 2014; Palanichamy et al., 2015a; Pradutkanchana et al., 2016; Sharma et al., 2017; Silva et al., 2017; Kutanan et al., 2018).

Figure 24: Median Joining Network of M3

The TMRCA of the M3 haplogroup in the present study, estimated using the ρ statistic, is 23580 ± 6347 years (estimate ± SD). Published age estimates of M3 are 23904.4 ± 7132.8 years (Behar et al., 2012), 35300 years (21400-50000) (Soares et al., 2009) and 25800 (19000-32800) years (Silva et al., 2017). The TMRCA for M3 from the present study is lower than the Behar et al. (2012) estimate.

M30

195A-15431 substitutions define M30 haplogroup.

M30 is a South Asia specific haplogroup. A total of 21 sequences (5.615 %) were classified as M30, and the haplogroup was seen in all four tribal populations: Bhil (8 samples), Kokana (2 samples), Pawara (9 samples) and Warli (2 samples) (Table 24).

M30, M30+16234, M30a, M30b, M30c1, M30d, M30f, M30g lineages were observed among the four tribal populations included in the present study (Figure 25).

Figure 25: Median joining network of M30

89 published sequences, in addition to the presently generated sequences, were used for the median joining tree construction and for refining the haplogroup classification (Maca-Meyer et al., 2001; Rajkumar et al., 2005; Sun et al., 2006; Chandrasekar et al., 2009; Govindaraj et al., 2011; Behar et al., 2012; Wang et al., 2012; Khan et al., 2013; Lippold et al., 2014; Zheng et al., 2014; Li et al., 2015; Palanichamy et al., 2015a; Hartmann et al., 2016; Marrero et al., 2016; Sharma et al., 2016c; Peng et al., 2017; Sharma et al., 2017; Kutanan et al., 2018).

The TMRCA of the M30 haplogroup in the present study, estimated using the ρ statistic, is 20094 ± 3616 years (estimate ± SD). Published age estimates of M30 are 17431.1 ± 4012.8 years (Behar et al., 2012), 22300 years (14600-30200) (Soares et al., 2009) and 15200 (12200-18300) years (Silva et al., 2017). The TMRCA for M30 from the present study is higher than the Behar et al. (2012) estimate.

M33

A single substitution 2361 characterises M33 haplogroup. In the present study, the branches were identified by control region mutations and near matching.

M33 is a South Asia specific haplogroup. A total of 5 sequences (1.337 %) were classified as M33. The haplogroup was observed among the Bhil (2 samples), Kokana (2 samples) and Pawara (1 sample) tribal populations and was absent in the Warli tribal population (Table 24).

The M33a1b, M33a2a and M33b+16362 lineages were observed among the tribal populations included in the present study (Figure 26).

31 published sequences, in addition to the presently generated sequences, were used for the median joining tree construction and for refining the haplogroup classification (Sun et al., 2006; Thangaraj et al., 2006; Abu-Amero et al., 2008; Fornarino et al., 2009; Kumar et al., 2009; Al-Zahery et al., 2011; Wang et al., 2012; Li et al., 2015; Palanichamy et al., 2015a).

Figure 26: Median joining network of M33

The TMRCA of the M33 haplogroup in the present study, estimated using the ρ statistic, is 33515 ± 7716 years (estimate ± SD). Published age estimates of M33 are 42331.8 ± 9388.8 years (Behar et al., 2012), 44900 years (32900-57300) (Soares et al., 2009) and 38000 (29300-47000) years (Silva et al., 2017). The TMRCA for M33 from the present study is lower than the Behar et al. (2012) estimate.

M35

M35 is defined by 12561, with a further branch, M35+199, defined by an additional transition at position 199.

M35 is a South Asia specific haplogroup. A total of 25 sequences (6.684 %) were classified as M35. The haplogroup was observed among the Bhil (14 samples), Pawara (3 samples) and Warli (8 samples) tribal populations and was absent in the Kokana tribal population (Table 24).

The M35+199, M35b, M35b+16304, M35b1 and M35c lineages were observed among the tribal populations included in the present study (Figure 27).

22 published sequences, in addition to the presently generated sequences, were used for the median joining tree construction and for refining the haplogroup classification (Sun et al., 2006; Fornarino et al., 2009; Kumar et al., 2009; Govindaraj et al., 2011; Wang et al., 2012; Khan et al., 2013; Palanichamy et al., 2015a; Pradutkanchana et al., 2016).

The TMRCA of the M35 haplogroup in the present study, estimated using the ρ statistic, is 38849 ± 10804 years (estimate ± SD). Published age estimates of M35 are 39085.2 ± 9964.8 years (Behar et al., 2012), 39600 years (26600-53100) (Soares et al., 2009) and 26900 (18500-35600) years (Silva et al., 2017). The TMRCA for M35 from the present study is lower than the Behar et al. (2012) estimate.

Figure 27: Median joining network of M35

M37

M37 is defined by 10556 followed by yet unresolved branches with 152 and 151 transitions.

M37 is a South Asia specific haplogroup. A total of 17 sequences (4.545 %) were classified as M37. The haplogroup was observed among the Bhil (3 samples) and Warli (14 samples) tribal populations and was absent in the Kokana and Pawara tribal populations (Table 24).

The M37+152+151 and M37e2 lineages were observed among the tribal populations included in the present study (Figure 28).

Figure 28: Median joining network of M37

20 published sequences, in addition to the presently generated sequences, were used for the median joining tree construction and for refining the haplogroup classification (Sun et al., 2006; Thangaraj et al., 2006; Chandrasekar et al., 2009; Sharma et al., 2012; Palanichamy et al., 2015a; Kutanan et al., 2017, 2018).

The TMRCA of the M37 haplogroup in the present study, estimated using the ρ statistic, is 29796 ± 9997 years (estimate ± SD). Published age estimates of M37 are 29269 ± 7027.2 years (Behar et al., 2012), 34700 years (22800-47200) (Soares et al., 2009) and 18200 (11500-25200) years (Silva et al., 2017). The TMRCA for M37 from the present study is higher than the Behar et al. (2012) estimate.

M*, M38, M53, M73

Macrohaplogroup M, which stems from the L3 lineage, is defined by the substitutions 489-10400-14783-15043. A single sample from the Pawara tribal population could not be classified beyond the M node and has been tentatively classified as M*.

M38 is a clade of M4”67 and M18’38, defined by 12498-13135-16318T substitutions.

M38c is a South Asia specific haplogroup. A single sequence (0.267 %), from the Warli tribal population, was classified as M38c (Table 24; Figure 29).

Figure 29: Median joining network of M*, M38, M53 and M73

The ρ statistic, and therefore the TMRCA, was not calculated for M38 owing to the small sample size.

Published age estimates of M38 are 26724.8 ± 5529.6 years (Behar et al., 2012), 16800 years (7900-26300) (Soares et al., 2009) and 32500 (23600-41700) years (Silva et al., 2017).

M53 is defined by the mutations 240-390T-572-5493-5821-9302-11560-16051-16189-16316. A single Pawara mtDNA sequence (0.267 %) belonged to haplogroup M53 (Table 24).

The ρ statistic, and therefore the TMRCA, was not calculated for M53 owing to the small sample size.

Published age estimates of M53 are 19904.6 ± 7084.8 years (Behar et al., 2012) and 20600 (9000-33000) years (Silva et al., 2017).

M73’79 is defined by 14034-16278, but M73 does not have any haplogroup-defining mutations in the control region. Sequences belonging to M73a in the present study were classified using near matching with other sequences and 16184A, which characterises the M73a lineage.

M73a is an East Asian haplogroup that is also observed in South Asia. A total of 2 sequences (0.535 %) were classified as M73a, both from the Bhil tribal population; the haplogroup was absent in the Kokana, Pawara and Warli tribal populations (Table 24; Figure 29).

The ρ statistic, and therefore the TMRCA, was not calculated for M73 owing to the small sample size.

Published age estimates of M73 are 33192.8 ± 6326.4 years (Behar et al., 2012).

16 additional sequences were used to construct a composite median joining network diagram (Sun et al., 2006; Chandrasekar et al., 2009; Kumar et al., 2009; Tabbada et al., 2010; Sharma et al., 2012; Palanichamy et al., 2015a).

The ρ statistic, and therefore the TMRCA, was not calculated for M*, M38, M53 and M73 owing to the small sample size.

M39

M39’70 haplogroup contains M39 clade, which is defined by characteristic indels and other substitutions 55.1T-59-60d-65.1T-(66T)-1811-15938.

M39 is a South Asia specific haplogroup. A total of 8 sequences (2.139 %) were classified as M39. The haplogroup was observed among the Bhil (3 samples), Kokana (1 sample) and Pawara (4 samples) tribal populations and was absent in the Warli tribal population (Table 24).

Figure 30: Median joining network of M39

The M39, M39b and M39b1 lineages were observed among the tribal populations included in the present study (Figure 30).

33 published sequences, in addition to the presently generated sequences, were used for the median joining tree construction and for refining the haplogroup classification (Sun et al., 2006; Chandrasekar et al., 2009; Sharma et al., 2012; Bhandari et al., 2015; Palanichamy et al., 2015a).

The TMRCA of the M39 haplogroup in the present study, estimated using the ρ statistic, is 67431 ± 14434 years (estimate ± SD). Published age estimates of M39 are 26638.7 ± 5750.4 years (Behar et al., 2012), 32300 years (20700-44400) (Soares et al., 2009) and 23700 (15300-32500) years (Silva et al., 2017). The TMRCA for M39 from the present study is higher than the Behar et al. (2012) estimate.

M4

M4”67 is a clade defined by the recurrent mutation 16311; 6620-7859-16145-16261 are the M4-specific mutations.

M4a is a South Asia specific haplogroup. A total of 10 sequences (2.674 %) were classified as M4a. The haplogroup was observed among the Bhil (3 samples), Kokana (5 samples) and Warli (2 samples) tribal populations and was absent in the Pawara tribal population (Table 24). The M4a lineages observed in the present study are shown in Figure 31.

16 published sequences, in addition to the presently generated sequences, were used for the median joining tree construction and for refining the haplogroup classification (Sun et al., 2006; Thangaraj et al., 2006; Chandrasekar et al., 2009; Derenko et al., 2013b; Lippold et al., 2014; Li et al., 2015; Palanichamy et al., 2015a; Kutanan et al., 2017).

Figure 31: Median joining network of M4

The TMRCA of the M4a haplogroup in the present study, estimated using the ρ statistic, is 26312 ± 10592 years (estimate ± SD). Published age estimates of M4a are 12734.3 ± 7315.2 years (Behar et al., 2012), 36500 years (26100-47300) (Soares et al., 2009) and 11300 (7300-15500) years (Silva et al., 2017). The TMRCA for M4a from the present study is higher than the Behar et al. (2012) estimate.

M57

3483-4020-13651-16311 characterise haplogroup M57.

M57 is a South Asia specific haplogroup. A total of 26 sequences (6.952 %) were classified as M57. The haplogroup was observed among the Bhil (9 samples), Kokana (2 samples) and Pawara (15 samples) tribal populations and was absent in the Warli tribal population (Table 24).

The M57+152, M57a, M57b and M57b1 lineages were observed among the tribal populations included in the present study (Figure 32).

Figure 32: Median joining network of M57

13 published sequences, in addition to the presently generated sequences, were used for the median joining tree construction and for refining the haplogroup classification (Thangaraj et al., 2006; Chandrasekar et al., 2009; Lippold et al., 2014; Palanichamy et al., 2015a; Kutanan et al., 2017).

The TMRCA of the M57 haplogroup in the present study, estimated using the ρ statistic, is 31354 ± 9688 years (estimate ± SD). Published age estimates of M57 are 30220.7 ± 8448 years (Behar et al., 2012) and 28800 (19000-38900) years (Silva et al., 2017). The TMRCA for M57 from the present study is higher than the Behar et al. (2012) estimate.

M5

M5 is defined by 1888-16129 substitutions.

Figure 33: Median joining network of M5

M5 is a widely distributed South Asia specific haplogroup. A total of 29 sequences (7.754 %) were classified as M5, and the haplogroup was seen in all four tribal populations: Bhil (5 samples), Kokana (11 samples), Pawara (7 samples) and Warli (6 samples) (Table 24). The M5a2a1a2, M5a3b, M5a4, M5a'd and M5b2a lineages were observed among the four tribal populations included in the present study (Figure 33).

125 published sequences, in addition to the presently generated sequences, were used for the median joining tree construction and for refining the haplogroup classification (Sun et al., 2006; Thangaraj et al., 2006; Behar et al., 2008; Chandrasekar et al., 2009; Fornarino et al., 2009; Govindaraj et al., 2011; Kong et al., 2011; Behar et al., 2012; Sharma et al., 2012; Wang et al., 2012; Derenko et al., 2013b; Gómez-Carballa et al., 2013; Kang et al., 2013; Lippold et al., 2014; Li et al., 2015; Palanichamy et al., 2015a; Hartmann et al., 2016; Marrero et al., 2016; Sharma et al., 2016c; Vyas et al., 2016; Kutanan et al., 2017; Malyarchuk et al., 2017; Peng et al., 2017; Sharma et al., 2017).

The TMRCA of the M5 haplogroup in the present study, estimated using the ρ statistic, is 19316 ± 3633 years (estimate ± SD). Published age estimates of M5 are 37067.2 ± 14803.2 years (Behar et al., 2012), 39600 years (27600-52100) (Soares et al., 2009) and 32100 (21400-43200) years (Silva et al., 2017). The TMRCA for M5 from the present study is lower than the Behar et al. (2012) estimate.

M6

461-5301-5558-10640-14128-16362 characterise M6 haplogroup.

M6 is a South Asia specific haplogroup. A total of 7 (1.872 %) sequences were classified as M6. It was observed only among the Warli (7 samples) and was absent in the other three tribal populations: Bhil, Kokana and Pawara (Table 24). The M6 lineages observed in the present study are shown in Figure 34.

25 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification (Khan et al., 2013; Palanichamy et al., 2015a).

TMRCA and its SD in years, estimated using ρ statistic, for M65 haplogroup in the present study is 24343 ± 7752 years. Published age estimates of M65 are 25256 ± 6528 years (Behar et al., 2012), 20600 (12600 - 29000) years (Silva et al., 2017). TMRCA for M65 haplogroup from the present study is lower than Behar et al. (2012) estimate.

Figure 34: Median joining network of M6

M64

M64, nested in M4”67, is defined by 152-5201-8843-9947-10685-13105-15355-15968-16263-16527.

M64 is a South Asia specific haplogroup. A total of 5 (1.337 %) sequences were classified as M64. It was observed only among the Kokana (5 samples) and was absent in the other three tribal populations: Bhil, Pawara and Warli (Table 24). The M64 lineages observed in the present study are shown in Figure 35.

3 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification (Behar et al., 2008; Chandrasekar et al., 2009).

Figure 35: Median joining network of M64

The TMRCA and its SD in years, estimated using the ρ statistic, for the M64 haplogroup in the present study is 7926 ± 5189 years. Published age estimates of M64 are 12624.2 ± 5289.6 years (Behar et al., 2012) and 18100 (8100-28500) years (Silva et al., 2017). The TMRCA for the M64 haplogroup from the present study is lower than the Behar et al. (2012) estimate.

M65

M65 is defined by a single 511 transition.

M65b is a South Asia specific haplogroup. A total of 3 (0.802 %) sequences were classified as M65b. The M65 haplogroup as a whole, with a total of 5 samples, was observed among the Bhil (2 samples), Kokana (1 sample) and Pawara (2 samples) tribal populations. It was absent in the Warli tribal population (Table 24).

The M65, M65a+@16311 and M65b lineages were observed in the present study (Figure 36).

28 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification. (Sun et al., 2006; Abu-Amero et al., 2008; Chandrasekar et al., 2009; Kong et al., 2011; Khan et al., 2013; Lippold et al., 2014; Palanichamy et al., 2015a; Sharma et al., 2016c; Peng et al., 2017; Sharma et al., 2017)

TMRCA and its SD in years, estimated using ρ statistic, for M65 haplogroup in the present study is 24343 ± 7752 years. Published age estimates of M65 are 25256 ± 6528 years (Behar et al., 2012), 20600 (12600 - 29000) years (Silva et al., 2017). TMRCA for M65 haplogroup from the present study is lower than Behar et al. (2012) estimate.

Figure 36: Median joining network of M65

G2

709-4833-5108-16362 defines G, and 5601-13563 define G2 haplogroup.

G2b is an East Asian haplogroup, also observed in South Asia. A total of 5 (1.337 %) sequences were classified as G2b (Table 24).

G2b1a1, G2b2a lineages were observed among the four tribal populations included in the present study (Figure 37).

Figure 37: Median joining network of G2

37 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification (Kong et al., 2003; Zhang et al., 2008; Chandrasekar et al., 2009; Wang et al., 2011; Ji et al., 2012; Jiang et al., 2014; Ko et al., 2014; Summerer et al., 2014; Hartmann et al., 2016; Kutanan et al., 2017, 2018; Peng et al., 2017; Derenko et al., 2018; Zheng et al., 2018).

The TMRCA and its SD in years, estimated using the ρ statistic, for the G2b haplogroup in the present study is 14055 ± 4406 years. The published age estimate of G2b is 22776.4 ± 5059.2 years (Behar et al., 2012). The TMRCA for the G2b haplogroup from the present study is lower than the Behar et al. (2012) estimate.

Macrohaplogroup N

Haplogroup N is defined by 8701-9540-10398-10873-15301, and harbours derived clades of macrohaplogroup R.

In the present study, the South Asia specific N5 haplogroup, the A1a haplogroup shared with East Asia, and X2d, which is common in West Asia, were observed (Table 25, Table 26, Table 27, Figure 20). Haplogroups shared with Western Eurasia are important for the questions posed in the present study about language shifts and the agricultural transition, with or without admixture.

N5

N5 is defined by 5063-7076-9545-11626-(13434)-16111-16311 substitutions.

N5 is a rare South Asia specific haplogroup. One (0.267 %) sequence from the Warli population was classified as N5 (Table 25, Figure 38).

6 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification (Palanichamy et al., 2004, 2015a; Sharma et al., 2012).

Figure 38: Median joining network of N5

The ρ statistic, and therefore the TMRCA, was not calculated due to the small sample size.

Published age estimates of N5 are 36712.4 ± 8202.5 years (Behar et al., 2012) and 45300 (28600-62800) years (Silva et al., 2017).

A1a

Haplogroup A is defined by 235-663-1736-4248-4824-8794-16290-16319 and A1a by 9713-16249.

A1a is an East Asian haplogroup, also observed in South Asia. A total of 4 (1.07 %) sequences were classified as A1a. It was observed only among the Bhil (4 samples) and was absent in the other three tribal populations: Kokana, Pawara and Warli (Table 25). The A1a lineages observed in the present study are shown in Figure 39.

Figure 39: Median joining network of A1a

4 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification (Tanaka et al., 2004; Derenko et al., 2007; Bilal et al., 2008; Peng et al., 2017).

The ρ statistic, and therefore the TMRCA, was not calculated due to the small sample size.

Published age estimates of A1a are 12987.6 ± 5404.2 years (Behar et al., 2012).

X2d

X2+125 is characterised by 195-1719-225 and X2d by @153-@225-6791-8503

X2d is a West Asian haplogroup, also observed in South Asia. A total of 3 (0.802 %) sequences were classified as X2d. It was observed only among the Kokana (3 samples) and was absent in the other three tribal populations: Bhil, Pawara and Warli (Table 25). The X2d lineages observed in the present study are shown in Figure 40.

X2 has a wide but intermittent distribution across West Eurasia (Reidla et al., 2003).

4 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification (Shlush et al., 2008; Kloss-Brandstätter et al., 2010; Schönberg et al., 2011; Behar et al., 2012).

The ρ statistic, and therefore the TMRCA, was not calculated due to the small sample size.

Published age estimates of X2d are 10870.2 ± 3497.9 years (Behar et al., 2012).

Figure 40: Median joining network of X2d

Macrohaplogroup R

South Asia specific haplogroups of R (xU), namely R5, R6, R8 and R30, and the J, T, HV14, H2, H3, H6 and H13 haplogroups, which are shared with West Eurasia, were observed (Table 26, Figure 20).

R5

Haplogroup R5 is defined by 8594-10754-14544-16304-16524.

R5 is a South Asia specific haplogroup. A total of 30 (8.021 %) sequences were classified as R5, and it was seen among all four tribal populations: Bhil (9 samples), Kokana (8 samples), Pawara (6 samples) and Warli (7 samples) (Table 26).

Figure 41: Median joining network of R5

R5a1, R5a2, R5a2b lineages were observed among the four tribal populations included in the present study (Figure 41).

20 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification (Palanichamy et al., 2004; Behar et al., 2008; Chaubey et al., 2008a; Govindaraj et al., 2011; Sharma et al., 2012; Derenko et al., 2013b; Khan et al., 2013; Sharma et al., 2016c; Kutanan et al., 2017).

The TMRCA and its SD in years, estimated using the ρ statistic, for the R5a haplogroup in the present study is 38689 ± 10877 years. Published age estimates of R5a are 30665.3 ± 8552.3 years (Behar et al., 2012), 19100 (11200-27200) years (Soares et al., 2009) and 32000 (19100-45500) years (Silva et al., 2017). The TMRCA for the R5a haplogroup from the present study is higher than the Behar et al. (2012) estimate.

R6

(195)-12285-(16266)-16362 substitutions define R6.

R6 is a South Asia specific haplogroup. A total of 4 (1.07 %) sequences were classified as R6. It was observed among the Bhil (3 samples) and Pawara (1 sample) tribal populations and was absent in the other two, Kokana and Warli (Table 26).

The R6a, R6a1 and R6a2 lineages were observed in the present study (Figure 42).

21 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification (Palanichamy et al., 2004; Govindaraj et al., 2011; Sharma et al., 2012; Wang et al., 2012; Fregel et al., 2014; Kutanan et al., 2017; Silva et al., 2017).

Figure 42: Median joining network of R6

The TMRCA and its SD in years, estimated using the ρ statistic, for the R6a haplogroup in the present study is 50121 ± 13435 years. Published age estimates are 41310.8 ± 8832.2 years for R6a (Behar et al., 2012), 51100 (35900-66800) years for R6 (Soares et al., 2009) and 33600 (22900-44700) years (Silva et al., 2017). The TMRCA for the R6a haplogroup from the present study is higher than the Behar et al. (2012) estimate.

R8

R8 is defined by 195-2755-3384-7759-9449-13215.

R8 is a South Asia specific haplogroup. A total of 10 (2.674 %) sequences were classified as R8, and it was seen among all four tribal populations: Bhil (2 samples), Kokana (4 samples), Pawara (3 samples) and Warli (1 sample) (Table 26).

Figure 43: Median joining network of R8

R8a1, R8a1a1a1, R8b1 lineages were observed among the four tribal populations included in the present study (Figure 43).

40 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification (Chaubey et al., 2008a; Thangaraj et al., 2009; Khan et al., 2013; Pradutkanchana et al., 2016).

The TMRCA and its SD in years, estimated using the ρ statistic, for the R8 haplogroup in the present study is 23668 ± 8887 years. Published age estimates of R8 are 32783.3 ± 6890.8 years (Behar et al., 2012), 42100 (26700-58300) years (Soares et al., 2009) and 33200 (22900-43900) years (Silva et al., 2017). The TMRCA for the R8 haplogroup from the present study is lower than the Behar et al. (2012) estimate.

R30

R30 is defined by a single 8584 transition.

R30 is a South Asia specific haplogroup. A total of 26 (6.952 %) sequences were classified as R30, and it was seen among all four tribal populations: Bhil (6 samples), Kokana (8 samples), Pawara (8 samples) and Warli (4 samples) (Table 26).

R30, R30a1b, R30a1b1, R30b1, R30b2 lineages were observed among the four tribal populations included in the present study (Figure 44).

51 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification (Palanichamy et al., 2004; Behar et al., 2008; Chaubey et al., 2008a; Fornarino et al., 2009; Rani et al., 2010; Sharma et al., 2012; Derenko et al., 2013b; Khan et al., 2013; Kutanan et al., 2017, 2018; Silva et al., 2017; Zheng et al., 2018).

Figure 44: Median joining network of R30

The TMRCA and its SD in years, estimated using the ρ statistic, for the R30 haplogroup in the present study is 37714 ± 7152 years. Published age estimates of R30 are 53576.8 ± 3961.4 years (Behar et al., 2012), 64000 (49900-78600) years (Soares et al., 2009) and 53000 (40600-65800) years (Silva et al., 2017). The TMRCA for the R30 haplogroup from the present study is lower than the Behar et al. (2012) estimate.

J, T

Haplogroup JT is defined by 11251-15452A-16126, with J further characterised by 295-489-10398-12612-13708-16069 and T defined by 709-1888-4917-8697-10463-13368-14905-15607-15928-16294.

J1b1a1 and T1a+152 are West Eurasian haplogroups, also observed in South Asia. A single Kokana (0.267 %) sequence was classified as J1b1a1 and another Kokana (0.267 %) sequence was classified as T1a+152 (Table 26, Figure 45).

37 (T: 4, J: 33) published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification (Coble et al., 2004; Palanichamy et al., 2004; Kujanová et al., 2009; Li et al., 2014; Just et al., 2015; Gomez-Duran, 2016; Malyarchuk et al., 2017; Pereira et al., 2017; Peng et al., 2018; Piotrowska-Nowak et al., 2019).

Figure 45: Median joining network of J and T

The ρ statistic, and therefore the TMRCA, was not calculated due to the small sample size.

Published age estimates of J are 34258.3 ± 4886.2 years (Behar et al., 2012) and 32600 (22400-43200) years (Soares et al., 2009), and those for T are 25149.4 ± 4668.3 years (Behar et al., 2012) and 26800 (18100-35800) years (Soares et al., 2009).

HV14

480-15115 defines haplogroup HV14.

HV14a is a West Eurasian haplogroup, also observed in South Asia. A total of 3 (0.802 %) sequences were classified as HV14a. It was observed only among the Warli (3 samples) and was absent in the other three tribal populations: Bhil, Kokana and Pawara (Table 26). The HV14a lineages observed in the present study are shown in Figure 46.

Figure 46: Median joining network of HV14

50 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification (Palanichamy et al., 2004; Derenko et al., 2013b; Khan et al., 2013; Margaryan et al., 2017; Matisoo-Smith et al., 2018; Peng et al., 2018; Sylvester et al., 2018).

The ρ statistic, and therefore the TMRCA, was not calculated due to the small sample size.

Published age estimates of HV14a are 6366.8 ± 3904.1 years (Behar et al., 2012).

H2, H3, H6, H13

Figure 47: Median joining network of H2, H3, H6, H13

Haplogroup H, with a total of 7 samples, was observed among the Bhil (3 samples), Kokana (2 samples) and Pawara (2 samples) tribal populations. It was absent in the Warli tribal population.

H2a2a differs from rCRS by 263-8860-15326.

H2a2a is a West Eurasian haplogroup, also observed in South Asia. A total of 2 Pawara (0.535 %) sequences were classified as H2a2a (Table 26).

The H2a2a lineage was observed among the Pawara tribal population included in the present study (Figure 47).

The ρ statistic, and therefore the TMRCA, was not calculated due to the small sample size.

Published age estimates of H2 are 11905.3 ± 1364.4 years (Behar et al., 2012), 11700 years (6500-17100) (Soares et al., 2009).

H3 is characterised by 6776.

H3b6 is a West Eurasian haplogroup, also observed in South Asia. A total of 2 (0.535 %) sequences were classified as H3b6 (Table 26). The H3b6 lineage was observed among the Bhil tribal population included in the present study (Figure 47).

The ρ statistic, and therefore the TMRCA, was not calculated due to the small sample size.

Published age estimates of H3 are 8919 ± 1062.6 years (Behar et al., 2012), 11800 years (8400-15400) (Soares et al., 2009).

H6 is defined by 239-16362-(16482).

H6 is a West Eurasian haplogroup, also observed in South Asia. A total of 2 (0.535 %) sequences were classified as H6 (Table 26), both among the Kokana tribal population included in the present study (Figure 47).

The ρ statistic, and therefore the TMRCA, was not calculated due to the small sample size.

Published age estimates of H6 are 10945.6 ± 1873.7 years (Behar et al., 2012), 15300 years (10500-20300) (Soares et al., 2009).

14872 defines H13.

H13a1d is a West Eurasian haplogroup, also observed in South Asia. One (0.267 %) sequence, from the Bhil tribal population included in the present study, was classified as H13a1d (Table 26, Figure 47).

The ρ statistic, and therefore the TMRCA, was not calculated due to the small sample size.

Published age estimates of H13 are 12475.9 ± 867.7 years (Behar et al., 2012), 17500 years (13300-21700) (Soares et al., 2009).

6 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification (Achilli et al., 2004; Roostalu et al., 2007; Behar et al., 2012; Raule et al., 2014; Malyarchuk et al., 2017).

Haplogroup U

The South Asia specific U2a, U2b and U2c’d (previously known as U2i), and U1, U5, K2a, U4a and U7, which are shared with West Eurasia, were also observed (Table 27, Figure 20).

Haplogroup U is defined by 11467-12308-12372 substitutions. The U2a, U2b and U2c’d branches are South Asia specific, whereas the other lineages are common in West Eurasia.

U2a

U2 is defined by 16051, with U2a further characterised by 16206C and U2b by 146-@2706-5186T-12106-13194-15049 substitutions.

U2a is a South Asia specific haplogroup. A total of 9 (2.406 %) sequences were classified as U2a, and it was seen among all four tribal populations: Bhil (4 samples), Kokana (1 sample), Pawara (2 samples) and Warli (2 samples) (Table 27).

U2a1, U2a1a, U2a2 lineages were observed among the four tribal populations included in the present study (Figure 48).

12 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification (Palanichamy et al., 2004; Achilli et al., 2005; van der Walt et al., 2012; Kang et al., 2013; Zheng et al., 2014; Palanichamy et al., 2015a; Sharma et al., 2016b; Olivieri et al., 2017; Kutanan et al., 2018).

Figure 48: Median joining network of U2a

The TMRCA and its SD in years, estimated using the ρ statistic, for the U2a haplogroup in the present study is 41667 ± 11448 years. Published age estimates of U2a are 22693.8 ± 8274.7 years (Behar et al., 2012), 27500 (13200-42800) years (Soares et al., 2009) and 35200 (24400-46400) years (Silva et al., 2017). The TMRCA for the U2a haplogroup from the present study is higher than the Behar et al. (2012) estimate.

U2b

U2b is a South Asia specific haplogroup. A total of 4 (1.07 %) sequences were classified as U2b. It was observed among the Bhil (1 sample) and Kokana (3 samples) tribal populations and was absent in the other two, Pawara and Warli (Table 27).

The U2b and U2b2 lineages were observed in the present study (Figure 49).

21 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification (Palanichamy et al., 2004; Achilli et al., 2005; Govindaraj et al., 2011; Khan et al., 2013; Lippold et al., 2014; Palanichamy et al., 2015a; Sharma et al., 2016b; Kutanan et al., 2017; Peng et al., 2018; Zheng et al., 2018).

Figure 49: Median joining network of U2b

The TMRCA and its SD in years, estimated using the ρ statistic, for the U2b haplogroup in the present study is 63044 ± 19726 years. Published age estimates of U2b are 29253.5 ± 5815 years (Behar et al., 2012), 34300 (22300-46900) years (Soares et al., 2009) and 39100 (23300-55800) years (Silva et al., 2017). The TMRCA for the U2b haplogroup from the present study is higher than the Behar et al. (2012) estimate.

U2c’d

16234 defines U2c’d, with U2c further characterised by 5790A-14935-15061 and U2d by 199-@263-471-1700-4025-8938-11893-14926-16189-16294.

U2c'd is a South Asia specific haplogroup. A total of 21 (5.615 %) sequences were classified as U2c'd. It was observed among the Kokana (5 samples), Pawara (2 samples) and Warli (14 samples) tribal populations and was absent in the Bhil (Table 27).

The U2c1a and U2c'd lineages were observed in the present study (Figure 50).

18 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification (Palanichamy et al., 2004; Achilli et al., 2005; van der Walt et al., 2012; Derenko et al., 2013b; Palanichamy et al., 2015a; Sharma et al., 2016b, 2016c; Kutanan et al., 2017; Peng et al., 2018).

Figure 50: Median joining network of U2c’d

The TMRCA and its SD in years, estimated using the ρ statistic, for the U2c'd haplogroup in the present study is 26936 ± 5590 years. Published age estimates of U2c'd are 39454.7 ± 6042.7 years (Behar et al., 2012) and 46600 (33200-60500) years (Silva et al., 2017). The TMRCA for the U2c'd haplogroup from the present study is lower than the Behar et al. (2012) estimate.

U1

The U1 haplogroup is defined by 285-12879-13104-14070-15148-15954C-16249 and U1a by 2218-14364-16189.

U1a is a West Eurasian haplogroup, also observed in South Asia. A total of 3 (0.802 %) sequences were classified as U1a. It was observed among the Kokana (1 sample) and Pawara (2 samples) tribal populations and was absent in the other two, Bhil and Warli (Table 27). The U1a lineage observed in the present study is shown in Figure 51.

Figure 51: Median joining network of U1a

27 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification (Palanichamy et al., 2004, 2015a; Ingman and Gyllensten, 2007; Derenko et al., 2013b; Khan et al., 2013; Lippold et al., 2014; Zheng et al., 2014; Skonieczna et al., 2015; Malyarchuk et al., 2017; Sharma et al., 2017; Kutanan et al., 2018).

The ρ statistic, and therefore the TMRCA, was not calculated due to the small sample size.

Published age estimates of U1 are 31955.4 ± 5352.5 years (Behar et al., 2012), 36900 years (25700-48600) (Soares et al., 2009).

U5

U5 is defined by the control region motif 16192-16270, U5a’b by 150-7768-14182 and U5b2a1 by 4732-16189-16270@.

U5b2a is a West Eurasian haplogroup, also observed in South Asia. A total of 2 (0.535 %) sequences were classified as U5b2a. It was observed only among the Warli (2 samples) and was absent in the other three tribal populations: Bhil, Kokana and Pawara (Table 27). The U5b2a lineages observed in the present study are shown in Figure 52.

Figure 52: Median joining network of U5

17 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification (Palanichamy et al., 2004; Montiel-Sosa et al., 2006; Behar et al., 2012; Khan et al., 2013; Hartmann et al., 2016; Malyarchuk et al., 2017; Marchi et al., 2017; Margaryan et al., 2017; Peng et al., 2018; Piotrowska-Nowak et al., 2019).

The ρ statistic, and therefore the TMRCA, was not calculated due to the small sample size.

Published age estimates of U5 are 30248.3 ± 5330.5 years (Behar et al., 2012), 36000 years (25300-47200) (Soares et al., 2009).

K2a

K2a is a branch of U8b through K (10550-11299-14798-16224-16311) and K2 (146-9716). K2a is defined by 152-709-4561, and K2a5 is further characterised by 324.

A single sequence from the Bhil tribal community belonged to the K2a5 haplogroup (Table 27, Figure 53).

16 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification (Palanichamy et al., 2004; Behar et al., 2012; Costa et al., 2013; Derenko et al., 2013b; Li et al., 2014; Lippold et al., 2014; Zheng et al., 2014; Hartmann et al., 2016; Sharma et al., 2016a; Malyarchuk et al., 2017; Marchi et al., 2017; Piotrowska-Nowak et al., 2019).

Figure 53: Median joining network of K2a

The TMRCA and its SD in years, estimated using the ρ statistic, for the K2a5 haplogroup in the present study is 5823 ± 2965 years. The published age estimate of K2a5 is 6045.5 ± 2848.8 years (Behar et al., 2012), and that of K2a is 14100 (7600-21000) years (Soares et al., 2009). The TMRCA for the K2a5 haplogroup from the present study is lower than the Behar et al. (2012) estimate.
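The "higher than" and "lower than" comparisons made throughout this chapter can be supplemented by checking whether the two estimates overlap within one SD. The sketch below is illustrative only; the function name and the one-SD overlap criterion are assumptions and not part of the analysis reported here.

```python
def compare_estimates(age_a, sd_a, age_b, sd_b):
    """Compare two age estimates given as mean and SD (years).

    Returns the direction of estimate A relative to estimate B and
    whether the two mean +/- 1 SD intervals overlap (assumed criterion).
    """
    if age_a > age_b:
        direction = "higher"
    elif age_a < age_b:
        direction = "lower"
    else:
        direction = "equal"
    overlap = (age_a - sd_a) <= (age_b + sd_b) and (age_b - sd_b) <= (age_a + sd_a)
    return direction, overlap


# Values reported in this chapter: present study vs Behar et al. (2012).
k2a5 = compare_estimates(5823, 2965, 6045.5, 2848.8)
r30 = compare_estimates(37714, 7152, 53576.8, 3961.4)
```

For K2a5 the two one-SD intervals overlap, whereas for R30 they do not, although both present-study estimates are lower than the published ones.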

U4a

U4’9 is defined by 195-499-5999, U4 by 4646-6047-11332-14620-15693-16356, and U4a1 by 152-12937-16134.

U4a1 is a West Eurasian haplogroup, also observed in South Asia. A total of 3 (0.802 %) sequences were classified as U4a1. It was observed only among the Warli (3 samples) and was absent in the other three tribal populations: Bhil, Kokana and Pawara (Table 27). The U4a1 lineage observed among the Warli is shown in Figure 54.

15 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification (Schönberg et al., 2011; Behar et al., 2012; Khan et al., 2013; Li et al., 2014; Skonieczna et al., 2015; Hartmann et al., 2016; Marchi et al., 2017; Matisoo-Smith et al., 2018; Peng et al., 2018).

Figure 54: Median joining network of U4a

The ρ statistic, and therefore the TMRCA, was not calculated due to the small sample size.

The published age estimate of U4a is 14949.4 ± 3348.1 years (Behar et al., 2012), and that of U4 is 20900 (11000-31200) years (Soares et al., 2009).

U7

152-980-3741-5360-8137-8684-10142-13500-14569-(16309)-16318T defines U7 haplogroup.

U7 is a West Eurasian haplogroup, also observed in South Asia. A total of 6 (1.604 %) sequences were classified as U7. It was observed among the Bhil (2 samples), Kokana (2 samples) and Warli (2 samples) tribal populations and was absent in the Pawara tribal population (Table 27).

The U7, U7a and U7a3b lineages were observed in the present study (Figure 55).

48 published sequences in addition to the presently generated sequences were used for the Median joining tree construction and refining the haplogroup classification (Palanichamy et al., 2004, 2015b; Behar et al., 2008, 2012; Schönberg et al., 2011; Derenko et al., 2013b; Khan et al., 2013; Lippold et al., 2014; Sharma et al., 2015; Larruga et al., 2017; Margaryan et al., 2017; Sahakyan et al., 2017; Zheng et al., 2018).

The TMRCA and its SD in years, estimated using the ρ statistic, for the U7a haplogroup in the present study is 12542 ± 3784 years. The published age estimate of U7a is 16718.3 ± 3017.7 years (Behar et al., 2012), and that of U7 is 21800 (11500-32600) years (Soares et al., 2009). The TMRCA for the U7a haplogroup from the present study is lower than the Behar et al. (2012) estimate.

Figure 55: Median joining network of U7a

West Eurasian Haplogroups among Tribal Populations of Maharashtra

The West Eurasian haplogroups seen among the tribal populations of Maharashtra are X2d, J, T, HV14, H2, H3, H6, H13, U1, U5, K2a, U4a and U7.

West Eurasian lineages in India are more common in the north-western region, and their frequency declines as one goes eastwards and southwards (Metspalu et al., 2004).

The arrival of West Eurasian haplogroups in India has been associated with agricultural migrations (Kivisild et al., 1999a, 1999b, 2000; Palanichamy et al., 2015b) or with an ‘Indo-Aryan invasion’ associated with the introduction of the caste system (Bamshad et al., 2001).

There is no agreement on whether the proliferation of West Eurasian lineages is linked to the spread of agriculture, the proto-Elamo-Dravidian language or the Indo-Aryan migration (Palanichamy et al., 2015b). However, the West Eurasian connection is suggested to derive more from the Central Asian and Caucasus regions than from any other region, and the admixture to result from multiple arrivals from the northwest rather than being limited to the Neolithic or Bronze Age (Silva et al., 2017).

X2d

X2 has a wide but intermittent distribution across west Eurasia (Reidla et al., 2003), and it needs to be further explored by complete sequencing of mtDNA.

J

The J haplogroup in India is found predominantly in the southern (Andhra Pradesh) and northern (Punjab and Uttar Pradesh) regions (Palanichamy et al., 2015b), and J1b1a1, observed among the Kokana, is also predominant in Pakistanis, Europeans and Central Asians (Palanichamy et al., 2015b). Its deep coalescence age may suggest its arrival ~20000 years ago, in a period of relative warmth (Silva et al., 2017).

T

T branches are common among Andhra Pradesh, Tamil Nadu, Uttar Pradesh, Punjab and Maharashtra populations. Indian T1-derived lineages cluster mainly with Near Eastern populations, particularly from Iran, Iraq and Azerbaijan (Palanichamy et al., 2015b). The median joining network shows the Kokana sample sharing basal mutations with samples from Egypt (Kujanová et al., 2009), and its tentative age in South Asia has been placed in the Holocene (Silva et al., 2017).

HV14a

HV14a and its linkage with the Iranian population have been used to suggest its introduction along with the proto-Dravidian language by Neolithic pastoralists (Palanichamy et al., 2015b). The presence of HV14 among the Dravidian speaking Melkudiya tribe also supports the Elamo-Dravidian linguistic connections (Sylvester et al., 2018), with shared ancestry with populations of Iran (Derenko et al., 2013a); it also clusters with Central Asian populations (Margaryan et al., 2017; Peng et al., 2018). The Warli sample from the present study occupies the basal position in the median joining tree, and complete sequencing may be necessary to shed further light on the arrival of HV14a in India. This haplogroup provides an important link between the Warli and Iranian populations.

H2, H3, H6, H13

H2a2, H3b6, H6 and H13a1d need to be further confirmed by complete mtDNA sequencing. However, they may indicate an origin in the Caucasus, Iran and Anatolia (Roostalu et al., 2007; Silva et al., 2017) and a Neolithic arrival in India (Silva et al., 2017).

U1a

U1a, like HV14, has been suggested to be associated with spread of Dravidian languages from west Asia (Palanichamy et al., 2015b). Pawara and Kokana samples from the present study are derived lineages of basal U1a sample from Iran in the mtDNA CR region based median joining network. This haplogroup provides the important link of Pawara and Kokana with Iranian populations and needs to be completely sequenced to shed light on their presence in tribal populations of India.

U5b2a

While the observed samples share basal mutations with Indian, West Eurasian and Central Asian samples, this haplogroup needs to be further confirmed with complete mtDNA sequencing, as it has several derived mutations not seen presently in U5b2a.

K2a5

K2a5, like X2d, provides a link to Middle Eastern and Iranian populations. The control region sequence of the Bhil sample is an exact match with Iranian (Derenko et al., 2013a), Ashkenazi (Costa et al., 2013) and Indian (Palanichamy et al., 2004) sequences. It might have arrived in India in the Neolithic period (Silva et al., 2017).

U4a1

The frequency of U4 lineages is low in India (Palanichamy et al., 2015b). While the Warli samples share basal mutations with South Asian, West Eurasian and Central Asian samples in the control region based median joining tree, their classification needs to be confirmed further.

U7

The spread of U7 and its derived branches is a complex phenomenon, with at least two dispersal events from the Near East (Sahakyan et al., 2017). At the resolution provided by the control region alone, only one Bhil sample has been allocated to a terminal branch, U7a3b. U7a3 has an expansion time prior to the Holocene. Other lineages are also likely to have pre-Holocene origins, though they need to be further characterised by complete mtDNA sequencing. Thus U7a sequences cannot be associated with the arrival of Indo-European speakers, as proposed by Palanichamy et al. (2015b). That association was made before the extensive study of Sahakyan et al. (2017) established the complexity of U7 lineages.

In summary, the West Eurasian lineages seen in the tribal populations of Maharashtra may derive from a variety of geographical sources (Iran, the Middle East, the Caucasus and Europe) and over varied time scales, ranging from ~20,000 years ago to possibly the Holocene. The lineages observed here link these populations to possible sources in Iran, the Middle and Near East and the Caucasus.

Analysis of Population affinities based on Haplogroup Frequencies

Principal component analysis (PCA) was conducted on the haplogroup frequencies of 38 populations (4 from the present study and 34 from secondary data). A total of 152 haplogroups from the present study as well as published sources (Rakha et al., 2010; Thangaraj et al., 2010; Shah et al., 2011b; Sharma et al., 2012; Derenko et al., 2013a; The 1000 Genomes Project Consortium et al., 2015; Chaubey et al., 2016; Tamang et al., 2018) were used for this analysis. The PCA plot is shown in Figure 56.
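A PCA of this kind, with populations as rows and haplogroup frequencies as columns, can be illustrated with a minimal sketch. The function name `haplogroup_pca`, the population labels and the counts below are hypothetical and are not data from this study. The sketch also assumes a covariance-based PCA on column-centred relative frequencies, computed by singular value decomposition; the software and scaling used for Figure 56 may differ.

```python
import numpy as np

def haplogroup_pca(counts, n_components=2):
    """PCA of populations (rows) on haplogroup relative frequencies (columns)."""
    counts = np.asarray(counts, dtype=float)
    # Convert raw haplogroup counts to within-population frequencies.
    freqs = counts / counts.sum(axis=1, keepdims=True)
    # Centre each haplogroup column (covariance PCA; an assumption here).
    centred = freqs - freqs.mean(axis=0)
    # SVD of the centred matrix gives the principal axes and scores.
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

# Toy counts for illustration only (hypothetical populations and values).
populations = ["PopA", "PopB", "PopC", "PopD"]
haplogroups = ["M", "R", "U", "A"]
counts = [
    [30, 15, 5, 0],
    [28, 16, 6, 0],
    [5, 4, 1, 40],
    [6, 3, 2, 39],
]

scores, explained = haplogroup_pca(counts)
for name, (pc1, pc2) in zip(populations, scores):
    print(f"{name}: PC1={pc1:+.3f} PC2={pc2:+.3f}")
```

In the toy input, PopA and PopB share one haplogroup profile and PopC and PopD another, so the two pairs fall on opposite sides of the first component. The sign of each component is arbitrary; only relative positions are meaningful.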

The plot indicates two distinct clusters: one of north-eastern, Tibeto-Burman language speaking populations, and the other of Dravidian and Indo-European language speaking populations. The presence of the Austro-Asiatic Khasi and Kra-Dai speaking populations within the cluster of north-eastern populations further substantiates the influence of geographical proximity. Dravidian language speaking populations from South India and Indo-European language speakers from Maharashtra, Gujarat and Pakistan (PJL) are tightly clustered, indicating similarity among them. This may be explained by the substantial share of South Asia specific haplogroups present in all these populations, with only a minor fraction differing among them.

The inset shows the populations from the current study. Warli, Mahadeo Koli and Bhils from Madhya Pradesh are placed very close together, with Bhils from Maharashtra slightly away from them. Kokana are close to the Dravidian speaking Kare Vokkal population from Karnataka, followed by Pawara.

Figure 56: PCA map of the first two components among populations from South Asia. The inset map focuses on the populations under study.

Comparison of Genome wide Autosomal data with mtDNA data

The principal component plot of the genome wide analysis is shown in Figure 57.

The plot, owing to the power of autosomal markers, distinguishes the tribal populations from the caste and 1000 Genomes samples. Unlike the mtDNA MDS and PCA plots, this plot is based on individuals rather than populations. Four clearly demarcated clusters can be seen, with Bhils and Pawara overlapping and Kokana and Warli clearly separated from the rest of the individuals. However, the caste samples, Deshastha Brahmins and Maratha, overlap with the 1000 Genomes populations without forming any apparent separate cluster.

Figure 57: PCA plot of genome wide autosomal markers among tribal and caste populations of Maharashtra and South Asian populations from 1000 Genomes Phase 3

(Population codes: IN-BH - Bhil, IN-PW - Pawara, IN-KN - Kokana, IN-WR - Warli, IN-DB - Deshastha Brahmins, IN-KM - Kunabi Maratha, GIH - Gujarati Indian from Houston, Texas, PJL - Punjabi from Lahore, Pakistan, BEB - Bengali from Bangladesh, STU - Sri Lankan Tamil from the UK, ITU - Indian Telugu from the UK)
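An individual-based PCA such as the one in Figure 57 differs from the haplogroup frequency PCA in that each row is a single person's genotypes rather than a population's frequency profile. The following is a minimal sketch on simulated data, not the pipeline used for the published analysis. The function name `genotype_pca`, the two simulated groups and their allele frequencies are hypothetical. The per-SNP scaling by sqrt(2p(1-p)) is a commonly used convention and is assumed here.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """PCA of individuals (rows) on SNP genotypes coded 0/1/2 (columns)."""
    g = np.asarray(genotypes, dtype=float)
    # Estimated allele frequency per SNP.
    p = g.mean(axis=0) / 2.0
    # Drop monomorphic SNPs, which carry no information and would divide by zero.
    keep = (p > 0) & (p < 1)
    g, p = g[:, keep], p[keep]
    # Centre by 2p and scale by the binomial standard deviation (assumed convention).
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Simulated genotypes for illustration only: two hypothetical groups of
# 10 individuals with strongly different allele frequencies at 200 SNPs.
rng = np.random.default_rng(0)
group_a = rng.binomial(2, 0.2, size=(10, 200))
group_b = rng.binomial(2, 0.8, size=(10, 200))
coords = genotype_pca(np.vstack([group_a, group_b]))

for i, (pc1, pc2) in enumerate(coords):
    label = "A" if i < 10 else "B"
    print(f"individual {i:2d} (group {label}): PC1={pc1:+.2f} PC2={pc2:+.2f}")
```

Because every individual contributes its own point, groups that are genetically distinct appear as separate clouds, while groups that are similar overlap, which is the pattern described for the tribal and caste samples above.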

A comparison with the mtDNA data shows that the autosomal data bring out the same population affinities that were seen among the four tribal populations of Maharashtra when they were analysed without any secondary data. The mtDNA based population affinities indicated that Bhils and Pawara are virtually indistinguishable from each other, and the autosomal data reflect the same. In addition, the genome wide autosomal data bring out a clear distinction between caste and tribal populations.
