1. Introduction

India harbours tremendous cultural, social, linguistic and genetic diversity. In India, almost all the major religions are represented. The people of India speak languages belonging to four language families: Austro-Asiatic, Dravidian, Tibeto-Burman and Indo-European. Socially, the population of India can be divided into tribals and non-tribals. The non-tribal population mostly comprises various communities from the Hindu religious fold, hierarchically arranged in relation to each other through the varna and caste system. It also includes various religious communities such as Muslims, Christians, Sikhs, Buddhists and Jains (Reddy et al., 2010). Tribal populations constitute about 8% of the total population. There are approximately 400 tribal groups in India.

Genetically, India is quite heterogeneous, exhibiting more diversity than any other comparable region of the world (Majumder, 1998). The origins of these diverse populations remain obscure. However, four potential sources have been suggested: the first corresponds to the early Palaeolithic occupation of India; the second to a migration of farmers, possibly speaking proto-Dravidian, from western Iran during the Neolithic; the third to the arrival of Indo-European speaking pastoral nomads roughly 3,500 years ago; and the last to the arrival of Sino-Tibetan speakers in North-Eastern India (Vidyarthi, 1983; Cavalli-Sforza et al., 1994; Gadgil et al., 1997; Cordaux and Stoneking, 2003). A composite picture has emerged out of these studies: that the Indian subcontinent was peopled rapidly by the first wave of modern humans out of Africa, and probably much earlier than Europe (Thangaraj et al., 2005). The majority of Indian mtDNA belongs to India-specific haplogroups, which show a deep coalescence age and have diversified in situ without influence from outside regions (Kivisild et al., 1999, 2003; Metspalu et al., 2004). The mtDNA haplogroups are shared between populations and, remarkably, do not cluster according to language families or social groups (castes, tribes, religion etc.) (Cordaux et al., 2003; Majumder, 2001; Roychoudhury et al., 2000, 2001). Indian Y chromosome haplogroups too show a deep coalescence age, implying that pre-Holocene and Holocene migrations have primarily shaped the Y chromosome boundaries (Sengupta et al., 2006). Thus, in essence, these findings emphasize the continuity of the Indian gene pool, contrary to traditional arguments and models, in which the genetic structure and heterogeneity of Indian populations were often attributed to multiple migrations (Vidyarthi, 1983; Cavalli-Sforza et al., 1994; Gadgil et al., 1997). Studies using genome-wide SNPs (Reich et al., 2009; Moorjani et al., 2013; Basu et al., 2016) have shown a much more complex history: despite the antiquity and continuity of the early gene pool of Indians, there have been multiple admixture events, and Indian populations exhibit at least four ancestry sources. This admixture may have happened ~1,900 to 4,200 years ago, followed by extensive adoption of the practice of endogamy. Admixture events with ‘Iranian Agriculturists’ and ‘Steppe pastoralists’ have also been indicated by ancient DNA (Narasimhan et al., 2018). Thus, the peopling of India has a much more complicated history, which has also been reshaped by cultural and demographic events.

Such migrations cannot be studied in isolation, but only within the framework of the evolution of modern humans. It is now widely accepted that modern humans evolved in Africa relatively recently and spread to other parts of the world through many dispersal events (Foley, 1998). However, the routes by which modern humans spread to other parts of the world remain poorly understood. The Indian subcontinent is a crucial geographic area in this regard, as it is located at the crossroads of Africa, Eurasia and the Pacific (Cavalli-Sforza et al., 1994; Cann, 2001), and it has been suggested that India served as a corridor for the dispersal of modern humans that started from Africa 100,000 years ago (Cann, 2001). It has also been suggested that colonisation of the Indian subcontinent may have happened ~60,000 years ago through the southern route (Mellars et al., 2013), and that a demographic expansion of populations may have taken place in the subcontinent (Atkinson et al., 2008).

Thus, lying at the crossroads, India appears to have been peopled at different times by various human groups carrying a diversity of genes and cultures. How was India peopled? When? What routes did different people take? Where did the people move? Such issues remain at the centre of anthropological enquiry today.

Migration of people is not merely a matter of demographic change; it also implies migration of genes, ideas, customs, languages and technology. Apart from changes in the gene pool, it involves cultural contacts that produce a variety of responses and thus bring about socio-cultural changes in both cultures, the migrant and the autochthonous. For example, the formation of the varna-based hierarchical society and caste system is attributed to the arrival of Indo-European language speakers. Among many other socio-cultural changes, the introduction of agricultural technology and various crops, and linguistic changes, are important. This scenario raises a few questions. Assuming that such contacts involved the introduction and exchange of genes, can the traces of migrations and past cultural contacts be discerned using molecular techniques? What were those interactions? Were language change and the introduction of agriculture accompanied by changes in the gene pool, or were they only a process of cultural diffusion? Did these contacts wipe out the original populations, or was there genetic admixture at varying levels?

Interdisciplinary perspective on Indian Subcontinent

The Indian subcontinent was peopled rapidly by the first wave of modern humans out of Africa (Thangaraj et al., 2005). The majority of Indian mtDNA belongs to India-specific haplogroups, which show a deep coalescence age and have diversified in situ (Kivisild et al., 1999a, 2003; Metspalu et al., 2004). mtDNA haplogroups are shared between populations and do not cluster according to language families or social groups (castes, tribes, religion etc.) (Roychoudhury et al., 2000, 2001; Majumder, 2001b; Cordaux et al., 2003). Indian Y chromosome haplogroups too show a deep coalescence age, implying that pre-Holocene and Holocene migrations have shaped the Y chromosome boundaries (Sengupta et al., 2006). Thus, these findings emphasize the continuity of the Indian gene pool, contrary to traditional arguments and models, in which the genetic structure and heterogeneity of Indian populations were often attributed to several migrations (Vidyarthi, 1983; Cavalli-Sforza et al., 1994; Gadgil et al., 1997).

Recent examination of the skeletal record from the subcontinent has also emphasised such continuity of populations (Walimbe, 2007). Today, India has 50,000 to 60,000 endogamous groups (Gadgil and Malhotra, 1983). It follows that these endogamous groups were formed primarily through a process of fission with strong founder effects (Nakatsuka et al., 2017), possibly after large-scale admixture with migrating populations (Reich et al., 2009; Basu et al., 2016). It is suggested that fissioning of groups and lack of fusion with other groups of the same occupation and social rank, maintained through strict endogamy, created several endogamous groups (Karve, 1961; Karve and Malhotra, 1968; Malhotra and Vasulu, 1993). mtDNA sequences have been shown to be useful in detecting past demographic events (Harpending et al., 1998; Harpending and Rogers, 2000), apart from their use in deciphering population affinities. Application of such techniques has demonstrated different demographic histories for several tribal populations from India (Cordaux et al., 2003). Tribal populations of southern India showed reduced diversity and no signal of population expansion; in contrast, North Indian populations showed signatures of population expansion (Cordaux et al., 2003). Different demographic histories were discerned even for two castes inhabiting the same geographic area (Mountain et al., 1995a).
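As a rough illustration of how sequence data can carry a demographic signal, the sketch below computes a mismatch distribution (the distribution of pairwise differences between sequences), a summary commonly used in this literature. A smooth, unimodal distribution is usually read as a signature of past population expansion, and a ragged, multimodal one as a signature of long-term stable population size. The function names and the toy sequences are hypothetical, and it is an assumption, not a statement from the cited studies, that they relied on exactly this summary.

```python
from collections import Counter
from itertools import combinations


def pairwise_differences(seq_a, seq_b):
    """Count the sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def mismatch_distribution(sequences):
    """Map each number of differences to the number of sequence pairs showing it."""
    return Counter(
        pairwise_differences(a, b) for a, b in combinations(sequences, 2)
    )


def mean_pairwise_difference(sequences):
    """Average number of differences over all pairs of sequences."""
    distribution = mismatch_distribution(sequences)
    n_pairs = sum(distribution.values())
    if n_pairs == 0:
        raise ValueError("at least two sequences are required")
    return sum(k * count for k, count in distribution.items()) / n_pairs


# Hypothetical toy alignment (not real HVR-1 data).
toy_population = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "TCGTACGT"]

if __name__ == "__main__":
    print(dict(mismatch_distribution(toy_population)))
    print(mean_pairwise_difference(toy_population))
```

With real data, the input would be aligned control region sequences from one population, and the shape of the resulting histogram would be compared across populations.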

Taking advantage of the global mtDNA phylogenetic tree (Torroni et al., 2006), studies using a phylogeographic approach have produced a detailed timeline of the evolution of various mtDNA haplogroups native to South Asia (Palanichamy et al., 2004; Sun et al., 2006; Chandrasekar et al., 2009) and of haplogroups that may have entered later (Palanichamy et al., 2015b; Silva et al., 2017; Sylvester et al., 2019).

Thus, today, advances in molecular genetics and population genetic analyses allow us to study not only population affinities but also the demographic changes that might have happened in the past. It is possible to date demographic and mutational events. Such dating is vital for contextualising genetic data within archaeological evidence or hypotheses. For example, is the adoption of agriculture associated with population fissioning? What were the demographic consequences: population expansions or bottlenecks? Are there signatures of admixture, population replacement or language replacement?

Within this broader framework, this study specifically aims to understand the genetic affinities and demographic history of tribal communities of Maharashtra, which have not been studied so far using molecular markers.

Genetic implications of Migrations: Some Scenarios

The change from a hunting-gathering lifestyle to agriculture led to an explosive increase in population (Cavalli-Sforza et al., 1994 and references therein). After its initial advent in a few areas (such as the Fertile Crescent and China, among others), agriculture soon spread around the globe. Was this spread a process of cultural diffusion, or was agriculture carried by the migrating farmers themselves (Jobling et al., 2004: pp 300-308)?

Today, around 6,500 languages are spoken around the world. Linguists believe that no language family that exists today is older than 10,000 years. Further, the likely homelands of the largest language families appear to be around the centres of agricultural innovation. This has led to the hypothesis that languages also moved with farmers (Renfrew, 1988; Bellwood, 2001).

Cultural traits such as agricultural technology and languages can spread without a concomitant spread of genes. Further, since a language can be imposed through technological dominance alone, without the need for numerical dominance, language spread could be achieved with negligible gene flow. However, agricultural populations would be in conflict with hunter-gatherers for the control of land, and this could result in assimilation or total replacement of populations, or even in the adoption of agriculture by the hunter-gatherers. Such population admixture scenarios have been postulated for Europe using ancient DNA studies (Lazaridis et al., 2014; Lazaridis, 2018) and for populations of South and Central Asia (Narasimhan et al., 2018).

Tribal Populations of Maharashtra

Tribal populations are considered relic populations of unknown origin (Cavalli-Sforza et al., 1994). It is generally accepted that tribal populations are the original inhabitants of the subcontinent (Thapar, 1966; Ray, 1973) and are minorities that have not been absorbed into the caste system (Vidyarthi, 1983). It can be speculated that they adopted agriculture through contacts with people having advanced agricultural technology in prehistoric times, with or without gene flow.

The culturally distinct tribal communities of Maharashtra share several cultural traits (Bokil, 2006). Tribal communities such as the Bhil, Pawara, Warli, Kokna and Thakar, among a few others, speak dialects of Marathi, a language belonging to the Indo-European language family. Some of their deities are common. Their principal occupations are agriculture and wage labour. It is suggested that these agricultural tribal communities are the extant representatives of the Chalcolithic cultures of Maharashtra (Bokil, 2006; Jonnalagadda et al., 2013).

During prehistoric times, contacts with communities possessing advanced agricultural technology and/or the scarcity of ecological resources may have forced different groups to adopt various subsistence strategies (e.g. pastoralism, agriculture on hill slopes or extensive agriculture in river valleys, with many others choosing to become artisans), thus creating separate ecological niches for themselves. Further, several other groups may have taken up occupations such as carpentry, pottery, ironsmithing and cobbling. This may be responsible for the formation of several endogamous groups (Bokil, 2006). These endogamous groups were perhaps incipient castes (Karve, 1961). Such events may also have resulted in genetic drift and founder events, thereby changing the genetic and demographic structure.

Statement of Research Problem

Maharashtra lies at the junction of Northern and Southern India and is a culture-contact region that shows a thorough mingling of Sanskritic and Dravidian traditions, as evidenced by its kinship organisation, which is modelled on the Dravidian south (Karve, 1968), and by typical South Indian village place-name endings found throughout Maharashtra (Southworth, 2005). In such a culture-contact region, the Indo-European language and agriculture of the tribal populations of Maharashtra raise a few questions. What are the genetic affinities of the tribal populations of Maharashtra? How are they related to other caste groups? Do they derive from a common population that fissioned into several groups? What was the demographic outcome of such fissions? Is the signature of admixture with immigrant agriculturists and Indo-European language speaking migrants seen among the tribal populations? Have they acquired agriculture and language through demic diffusion, cultural diffusion or a combination of both?

To address these issues, the present research aims to understand the genetic affinities and demographic history of select tribal communities of Maharashtra using mitochondrial DNA.

The specific objectives of the study are as follows:

1) To document mtDNA diversity, in terms of mitochondrial control region sequences, among select tribal populations of Maharashtra.

2) To partially reconstruct, using the sequence data, the demographic histories (such as past demographic expansions or bottlenecks) of select tribal populations of Maharashtra.

3) To understand the genetic affinities of the select tribal populations of Maharashtra.

4) To compare the generated sequences with published sequences from previous studies in order to understand the affinities of these populations with other population groups and to shed light on the issues of migration and its outcomes.

5) As mtDNA provides only the maternal view, to compare genome-wide autosomal data with the mtDNA data in order to assess whether similar population affinities are expressed by the autosomal data.

Mitochondrial DNA

Mitochondria are cell organelles that produce energy for the cell. Each mitochondrion has its own circular DNA molecule (mtDNA), which is ~16,569 base pairs long. Further, mtDNA is transmitted exclusively from mother to child; males do not transmit their mtDNA to their offspring. In other words, mtDNA is maternally inherited. Unlike the nuclear genome (with the exception of the non-recombining region of the Y chromosome), mtDNA does not recombine. Thus, new variation in the mtDNA molecule can arise only through mutation. mtDNA evolves rapidly, owing to its elevated mutation rate. Exclusive maternal inheritance helps to chart the history of a population by tracing mutations back along the maternal line, leading to a common ancestor famously termed ‘Mitochondrial Eve’. These properties make mtDNA a good candidate for analysing population histories. Sequencing hypervariable segment 1 (HVS-1, also referred to as HVR-1), one of the most polymorphic regions, was the most popular method for assaying mtDNA diversity, and hundreds of thousands of HVR-1 sequences are available in databases (Jobling et al., 2013a). With the declining cost of sequencing, sequencing the whole mitochondrial genome (mtGenome) has also become common. This has resulted in haplogroup databases like Phylotree, which presents the global mitochondrial phylogenetic tree using more than 20,000 mtGenomes, helping researchers “harvest the fruit of mtDNA tree” (Torroni et al., 2006) to answer questions related to migrations.

Approaches to study mtDNA

Initially, mtDNA was studied using Restriction Fragment Length Polymorphisms (RFLP) alone. Later, as PCR technology became available, RFLP analysis of PCR-amplified products became common. With the advent of DNA sequencing technology, sequence analysis of HVR-1 took over. With the cost of sequencing going down and high-throughput sequencing becoming available, recent studies are increasingly sequencing whole mtDNA genomes. While whole genomes are required for identifying new clades of the mtDNA phylogeny, once these are identified, the defining SNPs can be typed with PCR-RFLP and multiplex SNaPshot assays (Pakendorf and Stoneking, 2005).

There are two basic approaches to studying mtDNA from an evolutionary perspective: the lineage-based approach and the population-based approach.

Phylogenetic or lineage-based approach

In this approach, researchers type various SNPs in mtDNA, or sequence the mitochondrial genome, and reconstruct the phylogenetic tree of mtDNA. Based on combinations of several SNPs, mtDNA sequences can be classified into haplogroups (Bandelt et al., 2012). Haplogroups tend to be geographically specific and hence can be used to reconstruct routes of migration, admixture proportions for communities around the known routes of migration, demographic events and the age of a lineage. However, it must be borne in mind that the age of a lineage is not the age of the population.
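The idea of classifying sequences into haplogroups from combinations of SNPs can be sketched as follows. This is a minimal illustration under stated assumptions: the haplogroup labels (HgA, HgA1, HgB), the positions and the alleles are all invented for the example and are not real diagnostic mutations; an actual analysis would take its motifs from a curated phylogeny such as Phylotree.

```python
from collections import Counter

# Hypothetical diagnostic motifs: each haplogroup is defined by a set of
# (position, allele) pairs. Labels, positions and alleles are invented.
HYPOTHETICAL_MOTIFS = {
    "HgA": {(101, "T")},
    "HgA1": {(101, "T"), (250, "C")},  # sub-clade of HgA: carries one extra SNP
    "HgB": {(333, "G")},
}


def assign_haplogroup(variants, motifs=HYPOTHETICAL_MOTIFS):
    """Return the most specific haplogroup whose full motif is present.

    `variants` is the set of (position, allele) differences a sample shows
    relative to a reference sequence. If several motifs of equal size match,
    the first one in dictionary order is returned.
    """
    matches = [name for name, motif in motifs.items() if motif <= variants]
    if not matches:
        return "unclassified"
    return max(matches, key=lambda name: len(motifs[name]))


def haplogroup_frequencies(samples, motifs=HYPOTHETICAL_MOTIFS):
    """Relative frequency of each assigned haplogroup in a list of samples."""
    counts = Counter(assign_haplogroup(v, motifs) for v in samples)
    total = sum(counts.values())
    return {haplogroup: n / total for haplogroup, n in counts.items()}


# Hypothetical samples, each given as its set of variants.
toy_samples = [
    {(101, "T"), (250, "C"), (999, "A")},
    {(101, "T")},
    {(333, "G")},
    {(333, "G"), (400, "A")},
]

if __name__ == "__main__":
    print(haplogroup_frequencies(toy_samples))
```

Haplogroup frequencies computed in this way for several populations are the raw material for the geographic comparisons described above.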

Population Genetic approach

In contrast, the population genetic approach studies the prehistory of individual populations using distance statistics and other population genetic methods. Pakendorf and Stoneking (2005) suggest that “To study the prehistory of human populations, it is crucial to use statistical methods that examine population relationships, e.g., calculating genetic distance values and displaying population relationships in trees or multidimensional scaling (MDS) plots”.
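A genetic distance between populations can be sketched as below. The example uses one simple choice, a net distance defined as the mean number of pairwise differences between two populations minus the average of the two within-population means; this choice, the function names and the toy sequences are assumptions made for illustration and are not necessarily the statistics used in this study.

```python
from itertools import combinations, product


def differences(seq_a, seq_b):
    """Count the sites at which two aligned sequences differ."""
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def mean_within(population):
    """Mean pairwise differences among sequences of one population."""
    pairs = list(combinations(population, 2))
    if not pairs:
        raise ValueError("each population needs at least two sequences")
    return sum(differences(a, b) for a, b in pairs) / len(pairs)


def mean_between(pop_x, pop_y):
    """Mean pairwise differences between sequences of two populations."""
    pairs = list(product(pop_x, pop_y))
    return sum(differences(a, b) for a, b in pairs) / len(pairs)


def net_distance(pop_x, pop_y):
    """Between-population mean minus the average within-population mean."""
    within = (mean_within(pop_x) + mean_within(pop_y)) / 2
    return mean_between(pop_x, pop_y) - within


def distance_matrix(populations):
    """Pairwise net distances, keyed by (name, name) tuples."""
    names = sorted(populations)
    return {
        (p, q): 0.0 if p == q else net_distance(populations[p], populations[q])
        for p in names
        for q in names
    }


# Hypothetical toy populations (not real data).
toy_populations = {
    "PopA": ["AAAAAA", "AAAAAT"],
    "PopB": ["AAAATT", "AAATTT"],
    "PopC": ["TTTTTT", "TTTTTA"],
}

if __name__ == "__main__":
    for pair, value in distance_matrix(toy_populations).items():
        print(pair, value)
```

A matrix of this kind is the usual input to tree-building or MDS routines, which place populations with small distances close to one another.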

More and more researchers today are combining both approaches, analysing haplogroup affiliations as well as population affinities using population genetic methods. It must be noted that while HVR-1 alone cannot resolve the haplogroups, it provides enough polymorphic sites for population genetic analyses (Pakendorf and Stoneking, 2005).

The present study primarily uses the population genetic approach to decipher the diversity and affinities among the select tribal populations. However, it also uses haplogroup classification and frequencies to understand affinities and migrations.

mtDNA diversity in India: Picture from Haplogroups

Haplogroup M is the most frequent mtDNA cluster in India. It has been suggested that M represents the earliest wave of anatomically modern humans out of Africa (Kivisild et al., 1999a; Quintana-Murci et al., 1999; Kivisild et al., 2000), through the ‘southern route’ (Cavalli-Sforza et al., 1994; Lahr and Foley, 1998; Mellars et al., 2013). However, lineages of M from India differ from those of eastern and central Asian populations. This most likely represents the in situ differentiation of maternal lineages since Palaeolithic