GENETIC CHARACTERIZATION OF MITOCHONDRIAL DNA IN MAKRANI AND KALASHI POPULATION FROM PAKISTAN

BY

MUHAMMAD HASSAN SIDDIQI

DEPARTMENT OF ZOOLOGY UNIVERSITY OF THE PUNJAB QUAID-I-AZAM CAMPUS LAHORE, PAKISTAN (2014)

A THESIS SUBMITTED TO UNIVERSITY OF THE PUNJAB IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN ZOOLOGY BY MUHAMMAD HASSAN SIDDIQI

SUPERVISOR PROF. DR. TANVEER AKHTAR

IN THE NAME OF ALLAH, THE MOST BENEFICENT, THE MOST MERCIFUL

AL QURAN

Translation: O ye, who believe, stand out firmly for justice, as witnesses to Allah, even as against yourselves, or your parents, or your kin, or whether it be (against) rich or poor: for Allah can best protect both. Follow not the lusts (of your hearts), lest ye swerve and if ye distort (justice) or decline to do justice, verily Allah is well acquainted with all that ye do (Quran 4:135).

CERTIFICATE

This is to certify that the research work described in this thesis is the original work of the author, Mr. Muhammad Hassan Siddiqi, and has been carried out under my direct supervision. I have personally gone through all the data/results/materials reported in the manuscript and certify their correctness/authenticity. I further certify that the material included in this thesis has not been used, in part or in full, in a manuscript already submitted or in the process of submission in partial/complete fulfillment of the award of any other degree from any other institution. I also certify that the thesis has been prepared under my supervision according to the prescribed format, and I endorse its evaluation for the award of the Ph.D. degree through the official procedures of the University.

Prof. Dr. Tanveer Akhtar
Supervisor
Department of Zoology, University of the Punjab, Lahore

DEDICATION

This work is dedicated to my Father, Mother (Late) and brother Muhammad Aslam (Late) who have been the source of inspiration since my childhood and who were always there to help me, gave me courage and strength to accomplish all the goals of my life including this prestigious research achievement. You are a part of every page, every thought and all the work.

CONTENTS

Title Page No.

SUMMARY ...... i

1. INTRODUCTION...... 1

2. LITERATURE REVIEW ...... 6

2.1 Hypervariable Sites ...... 10

2.2 Haplogroups ...... 10

2.2.1 African Haplogroups ...... 11

2.2.2 West Eurasian Haplogroups ...... 11

2.2.3 Southeast Asian ...... 13

2.3 The Role of the mtDNA in Ancestry Studies ...... 14

3. MATERIALS AND METHODS ...... 16

3.1 Sample Collection Areas...... 16

3.2 Makrani Population ...... 17

3.2.1. Sample Collection ...... 17

3.3. Kalash Population ...... 20

3.3.1. Sample Collection ...... 20

3.4 DNA Extraction and Quantification ...... 25

3.5 PCR Amplification...... 25

3.5.1. Preparation of Agarose Gel ...... 27

3.6. Sequencing ...... 27

3.7. Statistical Analysis ...... 28

4. RESULTS ...... 29

4.1. Sampled populations ...... 29

4.2. Genomic DNA quality and PCR amplification of mtDNA control region ...... 29

4.3. Sequencing the control region of mitochondrial DNA ...... 29

4.4. Reconstruction and alignment with rCRS...... 39

4.5. Identification of haplotypes and assignment of haplogroups ...... 45

4.6. Frequency of mtDNA haplogroups ...... 55

4.7 The Haplogroups Diversity within Sub-ethnic group of Kalash Population ...... 56

4.8 Frequency of mtDNA Haplogroups ...... 58

4.9. The Construction of Median Joining (MJ) Networks ...... 59

4.10 The Occurrence and Distribution of Nucleotide Variations in mtDNA Control Region ...... 61

4.11. Heteroplasmy ...... 65

4.11.1. Point heteroplasmy...... 65

4.11.2. Length heteroplasmy ...... 68

4.12. Comparison of haplogroup frequencies and continental origins in Subpopulations of Pakistan ...... 70

4.13. Comparative statistical analyses of different Pakistani subpopulations ...... 74

5. DISCUSSION ...... 75

REFERENCES ...... 86

APPENDIX

SUMMARY

Mitochondrial DNA (mtDNA) analysis has gained importance in forensic investigations, especially in cases where the genomic DNA recovered is highly degraded or present in very small quantity. The high copy number of mtDNA in a cell increases the likelihood that some copies remain intact in such samples. Variation in the mitochondrial genome has proven to be a powerful genetic marker for investigating gene pools and for tracing the maternal genetic relatedness of a suspect. The control region of mtDNA, which includes the hypervariable segments (HVSI, HVSII and HVSIII), is considered the most important polymorphic region of the mitochondrial genome. This study reports haplotype data for the mtDNA control region (spanning positions 16,024–16,569 and 1–576), including the hypervariable segments, for two genetically distinct and isolated populations of Pakistan, the Makrani and the Kalashi. Genetic and forensic parameters were studied by sequencing the entire mtDNA control region of 100 unrelated Makrani individuals (males, n = 96; females, n = 4) and 111 Kalashi individuals (males, n = 63; females, n = 48).

A total of 149 polymorphic positions were detected in the Makrani population. Based on the entire profile of mutations along the mtDNA control region relative to the revised Cambridge Reference Sequence (rCRS), 70 different haplotypes were observed in the Makrani, of which 54 were unique and 16 were shared by more than one individual. Point heteroplasmy was observed at 5 different positions in the Makrani, accounting for 13% of the individuals; only one individual presented more than one point heteroplasmy. Median-joining network analysis showed substantial divergence among the haplotypes in the Makrani population.

In the Kalashi population, a total of 47 polymorphic positions were detected. After comparison with the rCRS, 14 different haplotypes were observed, of which 5 were unique and 9 were shared by more than one individual. Point heteroplasmy was observed at 6 different positions, accounting for 58.56% of the individuals, and three individuals presented more than one point heteroplasmy. The median-joining network showed limited divergence among the haplotypes in the Kalashi population.

Based on the identified haplotypes, the Makranis showed an admixed mtDNA pool consisting of African haplogroups (28%), West Eurasian haplogroups (26%), South Asian haplogroups (24%) and East Asian haplogroups (2%); the origin of the remaining individuals (20%) could not be confidently assigned. Moreover, two haplotypes observed in the Makranis, both carrying a characteristic combination of two mutations in HVSII (154C and 194T), could not be confidently assigned to a known (sub)haplogroup, although the presence of both 16223T and 489C indicates membership within macro-haplogroup M; this lineage was therefore tentatively named "M-154-194". Future studies based on complete mitogenome sequencing may elucidate the precise phylogenetic position of this lineage. The high frequency of African mtDNA haplogroups in the Makranis points to a major genetic contribution from Bantu people of southeastern Africa and Fulani people of West Africa as a result of the African slave trade.

In the Kalashi population, the dominant haplogroups were West Eurasian (98.2%), while a small proportion (0.9%) of South Asian haplogroups was also observed. One Kalashi sample could not be confidently assigned to any known sub-haplogroup. The greater frequency of West Eurasian haplogroups in the Kalash might be a consequence of the Arab and Muslim conquests, the rise of the British Indian Empire and the invasion by the armies of Alexander the Great.

The high genetic diversity (0.9688), and consequently the high power of discrimination (0.9592) and low random match probability (0.048), reflect intense gene flow in the Makrani population. In contrast, much lower genetic diversity (0.8393), a lower power of discrimination (0.832) and a higher probability of a match between two random individuals (0.168) were observed in the Kalashi population. The low genetic diversity in the Kalash may be explained by genetic drift due to either small population size or endogamy. These data are a valuable contribution towards a database of entire mtDNA control-region sequences, which can help to estimate the rarity of an mtDNA profile under investigation in Pakistan for both populations.
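The three forensic parameters quoted above are closely related. The following is a minimal sketch, assuming the commonly used definitions: random match probability as the sum of squared haplotype frequencies, power of discrimination as its complement, and genetic diversity as Nei's unbiased estimator. This section does not state which formulas were actually applied, and the haplotype counts in the example are hypothetical.

```python
def forensic_parameters(haplotype_counts):
    """Return (genetic_diversity, power_of_discrimination, random_match_probability).

    haplotype_counts: number of individuals carrying each distinct haplotype.
    The formulas are assumed standard definitions, not taken from the thesis.
    """
    n = sum(haplotype_counts)
    if n < 2:
        raise ValueError("at least two individuals are required")
    # Probability that two randomly drawn individuals share a haplotype.
    rmp = sum((count / n) ** 2 for count in haplotype_counts)
    # Probability that two randomly drawn individuals differ.
    pd = 1.0 - rmp
    # Nei's unbiased estimator applies a small-sample correction n / (n - 1).
    gd = n / (n - 1) * pd
    return gd, pd, rmp


# Hypothetical sample: four individuals, one haplotype seen twice.
gd, pd, rmp = forensic_parameters([2, 1, 1])
print(f"{gd:.4f} {pd:.4f} {rmp:.4f}")  # 0.8333 0.6250 0.3750
```

Under these definitions, a population in which many individuals share a few haplotypes has a larger sum of squared frequencies, and therefore a higher random match probability and a lower diversity.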

ACKNOWLEDGEMENTS

All acclamations and appreciations are for ALMIGHTY ALLAH, the Omnipotent, the Omnipresent, the Compassionate, the Beneficent and the source of all knowledge and wisdom, who bestowed upon me the intellectual ability, courage and strength to complete this humble contribution towards knowledge. I am proud of being a follower of the Holy Prophet Hazrat Muhammad (PBUH), the most perfect and exalted among all those ever born on the surface of the earth, who declared it to be an obligatory duty of every man and woman to seek and acquire knowledge.

My wholehearted thanks go to the worthy Chairman, Department of Zoology, Professor Dr. Muhammad Akhtar, for providing an established and inspirational environment and somewhat of a second home to me and other researchers. He has been helpful in every facet of my graduate studies.

Furthermore, I feel highly privileged to take this opportunity to express my profound gratitude, with a deep sense of obligation, to my doctoral research supervisor, Dr. Tanveer Akhtar, Professor, Department of Zoology, for her personal interest, inspiring guidance, helping attitude, and above all for providing the necessary laboratory facilities during the whole span of this research work.

I would like to thank all the people who have helped and inspired me during my doctoral study. My cordial thanks go to Dr. Fazle Majid Khan, Dr. Allah Rakha, Dr. Muhammad Akram Tariq and Dr. Muhammad Farooq Sabar for their guidance and cooperation whenever needed.

I would like to thank Sher Khan Kalash, Faizi Khan Kalash, Syed Said Hussain Shah, Subhan Shah, Abid Naqvi, Dr. Jamil Ahmad, Akram Ali, Ghazanfar Abbas, Sikandar Hayat, Dr. Muhammad Irfan, Syed Yasir Abbas Bokhari, Afia M Akram, Sana Shahbaz, Naeem Haider, Ali Akhtar, Usman Akhtar, Farooq Akhtar, Dr. Muhammad Akbar, Dr. Khurrum Shahzad, Imran Hussain Bhatti, Faizan Riaz Cheema, Shahid Yar Khan and Khadija Fazal Karim, who have helped and inspired me.

I am grateful to my senior friends Dr. Umar Farooq, Dr. Abdul Majid Khan, Javed Akram, and Dr. Zafar Iqbal for helping me get through the difficult times and for all the emotional support, camaraderie, entertainment, and caring they provided.

I wish to extend my thanks to the members of the Paleontology Lab., Physiology Lab., Cell and Molecular Biology Lab., Wildlife and Environmental Health Lab., Biochemistry Lab., Microbiology Lab., Developmental Biology Lab., Entomology Lab. and Fisheries Lab., and to all the scientific staff, especially Mr. Abbas Anjum, the para-scientific staff, especially Ashfaq Ahmad, and the administrative staff of the Department of Zoology, who have been directly and indirectly instrumental in my research work.

My utmost gratitude goes to Mannis van Oven and Oscar Lao, Department of Forensic Molecular Biology Erasmus MC, University Medical Center Rotterdam, The Netherlands, for their helpful discussion.

No words can express and no deeds can return the love, affection, amiable attitude, sacrifices, advice, unceasing prayers, support and inspiration that my Father, my brothers and my sister gave me during my whole academic career.

Muhammad Hassan Siddiqi

LIST OF TABLES

Table No. Title Page No.

3.1 The detailed data of consent forms from Makrani population ...... 17

3.2 The summarized information about sampling of Makrani population from different cities of three provinces of Pakistan ...... 20

3.3 The detailed data of consent form from Kalashi population ...... 21

3.4 The summarized information about sampling from three different valleys of Kalash population ...... 25

3.5 List of oligonucleotides, along with melting temperatures (Tm), concentrations and sequences used for amplification and sequencing of the mtDNA control regions ...... 26

4.1a The estimated haplotypes and haplogroups in Makrani population ...... 46

4.1b The estimated haplotypes and haplogroups in Kalashi population ...... 51

4.2a Differences observed in haplogroup estimation of Makrani population either manually or by HaploGrep ...... 55

4.2b Differences observed in haplogroup estimation of Kalashi population either manually or by HaploGrep ...... 56

4.3 The haplogroups diversity in each maternal sub-ethnic group of Kalash ...... 57

4.4a The occurrence and distribution of nucleotide variations in the entire mtDNA control region of Makrani population ...... 63

4.4b The occurrence and distribution of nucleotide variations in the entire mtDNA control region of Kalashi population ...... 64

4.5 Point heteroplasmy in the Makrani and the Kalashi populations ...... 65

4.6 The length heteroplasmy distribution along the mtDNA control region of the Makrani and Kalashi populations ...... 69

4.7 The comparison of mtDNA haplogroups' frequencies and their continental origins among subpopulations of Pakistan ...... 71

4.8 The comparison of diversity parameters estimated from the entire mtDNA control region among subpopulations of Pakistan ...... 74

LIST OF FIGURES

Fig. No. Title Page No.

2.1 Human mitochondrial DNA map showing CR (control region) ...... 7

2.2 mtDNA control region schematic diagram ...... 8

3.1 Map of Pakistan showing its administrative regions and neighboring countries ...... 16

4.1 Agarose gel electrophoretic analysis of genomic DNA extracted from blood samples ...... 30

4.2 Agarose gel electrophoretic analysis of the mtDNA control region PCR products ...... 30

4.3 (a) Chromatogram of Makrani individual (MKH080) for entire mtDNA control region sequenced by forward primer (F15975) ...... 32

4.3 (b) Chromatogram of Makrani individual (MKH080) for mtDNA control region sequenced by reverse primer (R635) ...... 34

4.4 (a) Chromatogram of Kalashi individual (KLH015) for the entire mtDNA control region sequenced by forward primer (F15975) ...... 36

4.4 (b) Chromatogram of Kalashi individual (KLH015) for the entire mtDNA control region sequenced by reverse primer (R635) ...... 38

4.5 (a) The haplotype of Makrani individual (MKH080) for entire mtDNA control region ...... 42

4.5 (b) The haplotype of Kalashi individual (KLH016) for entire mtDNA control region ...... 45

4.6 (a) Graphical illustration of frequencies of mtDNA-based haplogroups in Makrani population ...... 58

4.6 (b) Graphical illustration of frequencies of mtDNA-based haplogroups in Kalashi population ...... 59

4.7 (a) Median-joining haplotype network of the Makrani population (70 haplotypes) ...... 60

4.7 (b) Median-joining haplotype network of the Kalashi population (14 haplotypes) ...... 61

4.8 Point heteroplasmy observed at different positions of mtDNA control region in the Makrani population ...... 66

4.9 Point heteroplasmies observed at different positions of mtDNA control region in the Kalashi population ...... 67

4.10 Chromatograms showing the homopolymeric patterns of length heteroplasmy in the Makrani population ...... 69

ABBREVIATIONS

EDTA    Ethylene diamine tetra acetic acid
KPK     Khyber Pakhtunkhwa
nDNA    Nuclear DNA
mtDNA   Mitochondrial DNA
L-strand  Light strand
H-strand  Heavy strand
D-loop  Displacement loop
STR     Short tandem repeat
SNP     Single nucleotide polymorphism
RFLPs   Restriction fragment length polymorphisms
PCR     Polymerase chain reaction
HVR     Hypervariable region
pM      Picomol
µl      Microliter
MgCl2   Magnesium chloride
mM      Milli mole
V       Voltage
UV      Ultraviolet
rpm     Revolutions per minute
HG      Haplogroup
SA      South Asian
WEA     West Eurasian
SEA     South East Asian
EEA     East Eurasian
WA      West Asian
SWA     South West Asia
EA      East Asia
AF      Africa
KYA     Thousand years ago
MCL     Maximum Composite Likelihood
rCRS    Revised Cambridge Reference Sequence
HVSI    Hypervariable Segment I
HVSII   Hypervariable Segment II
HVSIII  Hypervariable Segment III

1-INTRODUCTION

Genetic information is stored within cells in the form of DNA sequences, either in the 23 pairs of chromosomes in the human cell nucleus or in the DNA molecules of the mitochondria. With growing knowledge of genetic differences among humans, the analysis of the silent biological witness, the DNA molecule recovered from a crime scene, has become very important (Bandelt et al., 2012). Two kinds of DNA markers are used in DNA studies: autosomal markers and uniparental markers (Y chromosome and mitochondrial DNA). Variations found within the autosomal chromosomes are called "autosomal DNA markers"; they provide high discrimination power and are considered a powerful tool for human identification. Y-chromosomal markers, being uniparental, are a valuable tool in certain criminal investigations because of their male specificity, as males are the culprits in most sexual assault cases. Recently, the analysis of the second type of uniparental marker, mitochondrial DNA (mtDNA), has gained remarkable importance in forensic investigations, especially in cases where the DNA found is highly degraded (such as ancient samples) or present in very small quantity (such as stains, cigarette butts and fingernails). The advantage of mtDNA lies in the presence of 1000–2000 mitochondria per human cell (e.g. in liver cells) and five to ten copies of mtDNA per mitochondrion, which increases the chance of obtaining some copies of mtDNA for analysis from such samples. In addition to this copy-number advantage, mtDNA studies can also reveal the pattern of historical events (Legros et al., 2004; Chong et al., 2005; Nilsson et al., 2008; Kavlick et al., 2011; Adachi et al., 2014).

Mitochondria are unique among cell organelles in that they contain their own genome, which is quite distinct from nuclear DNA. The human mtDNA is a circular double-stranded DNA molecule composed of ~16,569 nucleotides (Taylor and Turnbull, 2005; Lan et al., 2008). The Cambridge Reference Sequence (CRS) of mtDNA, published in 1981, established the number of base pairs and functional genes in mtDNA (Anderson et al., 1981). After re-sequencing, the CRS was revised and named the revised Cambridge Reference Sequence (rCRS), which is used as the standard for comparisons (Andrews et al., 1999). The mtDNA contains 37 genes, of which 28 are part of the H-strand and 9 are part of the L-strand. Of these, 13 genes encode different proteins that play roles in respiration. The remaining 24 genes encode mature RNA products: 22 encode mitochondrial tRNA molecules and two encode mitochondrial rRNA molecules (16S rRNA and 12S rRNA) (Andrews et al., 1999). Deletions or point mutations in mtDNA have been shown to be involved in human genetic defects (Holt et al., 1988; Shoffner et al., 1989); they are also responsible for genetic differences between populations and advance the knowledge of their phylogenetic relationships (Guha et al., 2013). mtDNA has provided a wealth of interesting molecular enigmas since its discovery and is used in fields such as evolution, anthropology, history, inheritance and forensics (Brendan et al., 2013).

Both mtDNA and Y-chromosome analyses are used to assess continental origin or ancestry. mtDNA is advantageous for understanding the ancestry component, as it provides valuable information about maternal inheritance as well as the intercontinental movements of humans. Continent-specific polymorphisms in mtDNA are key indicators for determining historical human migration routes and assessing population affinities (Chaitanya et al., 2014). The control region, also called the displacement loop (D-loop), is the most polymorphic region of the mitochondrial genome. This region is ~1122 bp in size (spanning positions 16024–16569 and 1–576) and is a hot spot for mtDNA alterations (Michikawa et al., 1992; Tipirisetti et al., 2014). It covers ~7% of the total mitochondrial genome (Andrews et al., 1999) and contains three hypervariable segments: HVSI with a fragment length of 342 bp (nps 16024–16365), HVSII with 268 bp (nps 73–340) and HVSIII with 137 bp (nps 438-576). Two of these segments, HVSI and HVSII, are the most polymorphic sites in mtDNA (Cano et al., 2014). Thus, knowledge of the polymorphisms in the control region of mtDNA and the classification of these sequences into haplogroups can be of great importance in forensic identification cases, such as mass disasters and missing persons (Chong et al., 2005; Nilsson et al., 2008). The mtDNA haplogroups are considered maternally derived ancestral genomic markers (Ma et al., 2014). Analysis of mutational events along the human mtDNA has shown that individuals from the same maternal lineage share the same set of mutations (Senafi et al., 2014).
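Because the mtDNA molecule is circular, the control region runs past the last numbered position and continues at position 1. The short sketch below shows how the quoted size of the control region and of HVSI and HVSII follows from the position ranges. It is only an illustration; the helper name `span_length` is hypothetical.

```python
MTDNA_LENGTH = 16569  # number of positions in the reference sequence


def span_length(start, end, genome_length=MTDNA_LENGTH):
    """Length of an inclusive, 1-based interval on a circular genome."""
    if start <= end:
        return end - start + 1
    # The interval wraps from the last position back to position 1.
    return (genome_length - start + 1) + end


control_region = span_length(16024, 576)  # 546 bp + 576 bp
hvs1 = span_length(16024, 16365)
hvs2 = span_length(73, 340)

print(control_region)                                 # 1122
print(hvs1, hvs2)                                     # 342 268
print(round(100 * control_region / MTDNA_LENGTH, 1))  # 6.8
```

The last line gives the share of the genome occupied by the control region, which agrees with the figure of roughly 7%.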

The length of the mtDNA control region sequence varies among populations owing to the presence of indels and variable numbers of tandem repeats in the three hypervariable segments. The characteristic properties of mtDNA, namely exclusively maternal inheritance (Budowle et al., 2010), absence of recombination and a mutation rate in the control region that is 10–200-fold higher than that of nuclear DNA, form the basis for the high polymorphism of mtDNA (Goncalves et al., 2011). This polymorphism results from free radical production by the electron transport chain combined with a limited DNA repair mechanism (Larsen et al., 2005; Yu, 2011; Wallace, 2011). These features have made mtDNA a useful tool for exploring origin and migration in human populations, and it is widely applied to the evolutionary relationships of human ethnic groups (Singh and Kulawiec, 2009).

Another source of polymorphism in mtDNA is heteroplasmy, the occurrence of different types of mtDNA, or a population of discrete mtDNA genomes, in an individual (Melton et al., 2004). In heteroplasmy, two or more types of mtDNA may be present in an individual, in a single cell or in a single mitochondrion. Heteroplasmy is more frequent in the control region than in the coding region of mtDNA (Santos et al., 2008; Li et al., 2010), and its level varies among tissues (Irwin et al., 2009; He et al., 2010; Goto et al., 2011) and populations. Two types of heteroplasmy have been reported in the mitochondrial genome: sequence heteroplasmy and length heteroplasmy (Bendall et al., 1996; Melton et al., 2004). The homopolymeric C-stretches at positions 16184–16193 (HVSI) and 303–310 (HVSII) are the usual source of length heteroplasmy (Stewart et al., 2001). Sequence heteroplasmy, in contrast, is the occurrence of two nucleotides at one position in the mtDNA, which results in overlapping peaks in an electropherogram. A mixture of wild-type and variant mtDNA (heteroplasmic mtDNA) has been reported in a significant proportion (25%) of healthy individuals (Schonberg et al., 2010). Moreover, homoplasmic variation of mtDNA, owing to negligible or no recombination at the population level, has been instrumental in determining global-scale migrations of populations (Torroni et al., 2006).

The mtDNA haplotypes have become popular tools for tracing maternal ancestry (Ely et al., 2006). A haplotype represents the entire profile of mutations along the mtDNA molecule in comparison to the rCRS (Andrews et al., 1999). Similar haplotypes that share a common ancestor and the same single nucleotide polymorphism (SNP) mutations form a haplogroup (Rosa and Brehm, 2011). There are seven macro-haplogroups, L0, L1, L2, L3, L4, L5 and L6, which are African-specific mtDNA haplogroups, while M, N and R, with their subgroups, are found in the rest of the world (Behar et al., 2008). Macro-haplogroup L lineages have been predominantly reported in western Africa (Barbieri et al., 2014). Previous studies have suggested that the presence of African mtDNA lineages in the Makranis indicates a recent origin, as Makrani haplotypes have also been observed in modern sub-Saharan African populations from Mozambique (Salas et al., 2002; Barbieri et al., 2014). Moreover, the Makrani lineages L1, L2 and L3 have also been found in Mozambican samples, the most frequent haplotypes being L1a2, L2a1a and L2a1b (Pereira et al., 2001; Salas et al., 2002). Previous studies of the Kalash reported a predominant contribution of western Eurasian haplogroups to their maternal lineages, with no evidence of East or South Asian lineages. The western Eurasian contribution reached a frequency of 100% in the Kalash population, with U4 (34%) being the most frequent mtDNA haplogroup (Quintana-Murci et al., 2004). However, molecular genetic studies of mtDNA in the Makrani and Kalashi peoples have been relatively limited so far. The present study reports the largest mtDNA survey to date of the Makrani and Kalashi peoples of Pakistan.

The aim of this study was to evaluate the genetic variability within and between the Makrani and Kalashi populations using the mtDNA control region. The entire mtDNA control region (spanning positions 16,024–16,569 and 1–576), including the hypervariable segments (HVSI, HVSII and HVSIII), was sequenced for 100 Makrani individuals (males, n = 96; females, n = 4) and 111 Kalashi individuals (males, n = 63; females, n = 48). The Makranis are descendants of Bantu-speaking people and live in the Turbat, Panjgur, Awaran, Kharan, Nasirabad, Gwadar and Buleda cities of Baluchistan province, Burewala city of Punjab province and Karachi city of Sindh province of Pakistan. The Kalasha (Kalash) are a group of Indo-European and Indo-Iranian speaking people living in the Bumburet, Birir and Rumbur valleys of Chitral district, Khyber Pakhtunkhwa province of Pakistan. The mtDNA variations of the Makrani and Kalashi populations relative to the rCRS were used to infer mtDNA haplogroups. The haplogroup profiles of Makrani and Kalashi individuals were compared with those of different populations, which may provide insight into the history of their settlement in Pakistan.
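The idea of a haplotype as a list of differences from the rCRS, with point heteroplasmy recorded as a two-base mixture, can be illustrated with a small sketch. It makes several simplifying assumptions: the sequences are already aligned and of equal length, indels are ignored, and mixed bases are written with IUPAC ambiguity codes. The toy sequences are not real rCRS fragments, and the function names are hypothetical. The marker check uses the two mutations (16223T and 489C) named in the summary as indicating macro-haplogroup M.

```python
# IUPAC codes for two-base mixtures, as seen at point-heteroplasmic positions.
IUPAC_MIXTURES = {"R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}


def call_variants(reference, sample, first_position):
    """List substitutions between two aligned sequences of equal length.

    Variants are written as position followed by the observed base,
    e.g. '16223T'. Positions count upward from first_position.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    for offset, (ref_base, obs_base) in enumerate(zip(reference, sample)):
        if obs_base != ref_base:
            variants.append(f"{first_position + offset}{obs_base}")
    return variants


def heteroplasmic_sites(variants):
    """Keep only variants whose observed base is a two-base mixture code."""
    return [v for v in variants if v[-1] in IUPAC_MIXTURES]


def carries_all(variants, markers):
    """True if every diagnostic marker is present in the haplotype."""
    return set(markers) <= set(variants)


# Toy aligned fragments starting at an arbitrary position 100.
variants = call_variants("ACCTCA", "ATCTYA", first_position=100)
print(variants)                       # ['101T', '104Y']
print(heteroplasmic_sites(variants))  # ['104Y']

# Hypothetical haplotype carrying both markers of macro-haplogroup M.
print(carries_all(["16223T", "489C", "154C", "194T"], ["16223T", "489C"]))  # True
```

A real assignment would compare the full variant list against a curated phylogeny rather than a single pair of markers; the check above only shows the principle.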

2-LITERATURE REVIEW

Pakistan is hypothesized to be one of the first regions where modern humans settled (Qamar et al., 1999; Rakha et al., 2011). It is a South Asian federation of four provinces, i.e. Punjab, Sindh, Khyber Pakhtunkhwa (KPK) and Baluchistan, together with the Islamabad Territory, Gilgit Baltistan (formerly known as the Northern Areas) and the Tribal Areas in the northwest including the Frontier Regions. It is located around 33.6667°N, 73.1667°E and covers an area of 796,095 km². Pakistan is considered the sixth most populous country in the world, with a population of about 180 million people (Press Information Department, Pakistan 2009). The Pakistani population is usually divided into more than 18 ethnic and 60 linguistic groups (Grimes, 1992). The major ethnic groups include the Punjabis, Pathans, Sindhi, Saraiki, Muhajir, Balochi, Kalashi and Makrani (Rakha et al., 2011). The Makrani people, sometimes also called "Negroid Makrani", inhabit the Makran coast of Baluchistan (Quintana-Murci et al., 2004). Baluchistan is the largest Pakistani province by area and the smallest by population, with about 80% inter-mountainous area, central Makran and the Makran coast. The Kalashi population lives in the Hindu Kush mountains of present-day Pakistan, in three remote mountain valleys at an elevation of 1900–2200 m, and exhibits genes that are believed to have originated in Europe and may have been carried east by Alexander the Great.

Mitochondrial genome analysis is considered a useful tool for studying the ancestral relationships among populations and their migratory routes across the globe (Cossins, 2014). Mitochondria are unique among animal organelles in possessing their own genome, which is markedly distinct from the nuclear genome. The human mtDNA is a closed circular double-stranded DNA molecule composed of ~16,569 base pairs (Taylor and Turnbull, 2005; Lan et al., 2008). The application of mtDNA analysis in forensic investigations has gained importance because, in cases where the DNA found is highly degraded (such as stains, bones, saliva and fingernails) or present in very small quantity, mtDNA has proved very useful owing to the presence of 1000–2000 mitochondria per human cell, which increases the chance of obtaining some copies of mtDNA for analysis (Chong et al., 2005; Nilsson et al., 2008; Kavlick et al., 2011; Adachi et al., 2014). Each human cell contains many mitochondria (50-100s), and each mitochondrion contains 5 to 10 copies of mtDNA (Legros et al., 2004). Circular mtDNA is more resistant to the nucleases that readily degrade nuclear DNA. Its non-recombining inheritance makes it a valuable tool for tracking human evolutionary history. Owing to the high copy number of mtDNA in each cell, mtDNA analysis is more reliable than nuclear DNA analysis in cases where the amount of starting material is low.

Figure 2.1: Human mitochondrial DNA map showing CR (control region). The hypervariable segments (HVSI, HVSII and HVSIII) in the control region that are important for forensic mtDNA analyses are shown at the top.

The complete sequence of the mitochondrial genome revealed two different strands of mtDNA, the purine-rich heavy strand and the pyrimidine-rich light strand (Anderson et al., 1981). Nucleotide positions in the mtDNA genome are numbered according to the convention of Anderson et al. (1981), with small changes. Each base on the heavy strand is numbered from 1 to 16,569. The low fidelity of the polymerase and the apparent lack of repair mechanisms are the major factors behind the high mutation rate of mtDNA compared with nuclear DNA. The evolutionary rate of some regions of the mitochondrial genome has been reported to be about ten times higher than that of nuclear genes. Because of their hypervariability, these regions are used for human identity testing. Most variation between individuals is found in two particular regions of the D-loop (Greenberg et al., 1983), known as hypervariable segment I (HVSI) and hypervariable segment II (HVSII). Generally, HVSI spans positions 16,024 to 16,365 and HVSII positions 73 to 340. Because of the small PCR product size of both regions, HVSI and HVSII are generally used in forensic casework (Figure 2.1).

Figure 2.2: mtDNA control region schematic diagram (Courtesy: http://forensic.yonsei.ac.kr/protocols.html).

It has been found that mtDNA inherited maternally (Case and Wallace, 1981). Without alteration, mtDNA sequences of all maternal relations are identical, due to which mtDNA has become an effective tool in forensic investigations about missing persons with the known maternal links. The source of evidence can be effectively analyzed with reference to the maternal relatives that are several generations apart due to lack of recombination. Hence nuclear DNA markers cannot present this characteristic (Ginther et

8 al.,1992). The presence of mtDNA in the hair shaft has made it an important tool for forensic analysis besides bones and teeth. The mtDNA has also proved its importance when it was used as a tool in the identification of persons from bones found from the places of American war. In addition to this, mtDNA has been utilized to find evolutionary links and anthropological studies as well as in the analyses of very old samples such as old brain tissues (about 7000 years ago) (Yang et al., 2012). The interpretations of mtDNA sequencing results have been simplified on account of haploid and monoclonal characteristics of mtDNA. Although most of the individuals are homoplasmic and heteroplasmy may be found at some sites (Bendall et al., 1997). If a person carries more than one noticeable mtDNA types, he/she is considered to be heteroplasmic. Due to its high copy number mtDNA molecule has become more valuable as compared to nuclear DNA, for specific kinds of forensic analysis. However, interpretational difficulties can be removed by keeping in mind the knowledge of heteroplasmies about the populations when analyzing a sample under question(Alonso et al., 2002). The associations of mtDNA haplogroups of different populations build the base for the structures of genetic variations in human (Anderson et al., 1981; Andrew et al., 1999). The variants sharing common ancestors cluster together with common mutations forming a group called a haplogroup. In order to recognize the major human haplogroups, phylogenetic methods have been used (Andrew et al., 1999). Several methods are well characterized by the systematic community and used to construct a phylogenetic tree of the mtDNA sequences, including neighbor-joining, minimum spanning networks, maximum likelihood and maximum parsimony. 
Phylogenetic methods have also been used extensively to examine and describe variation in the control region of mtDNA, and a large amount of data is available for comparative studies (Sobrino and Carracedo, 2005). Studies of mtDNA variation have been quite useful in the last decade for understanding human origins and diffusion patterns. Surveys of mtDNA in populations worldwide have shown continent-specific distributions of mtDNA lineages (Mishmar et al., 2003).

2.1 Hypervariable Sites

Hypervariable sites have been identified in mtDNA during evolutionary studies; these sites show variable nucleotides in different populations. Single nucleotide polymorphisms have been used in many applications in medical genetics, human genetics, evolutionary genetics and forensics (Quintana-Murci et al., 2004). mtDNA polymorphisms have been reported to be valuable for identity testing of degraded samples or samples with a low quantity of starting material (Quintana-Murci et al., 2004). In the absence of other evidence, the hypervariable sites were considered mutational hotspots (Hagelberg et al., 1999). More recently, however, it was hypothesized that early mtDNA mutations were shuffled among different lineages via recombination, which on this view accounts for the hypervariable sites. The recent claims of recombination in human mtDNA (Eyre-Walker et al., 1999) could not be verified in some cases (Hagelberg et al., 2000) and have remained controversial (Kumar et al., 2000; Parsons and Irwin, 2000). The persistence of hypervariable sites in the non-coding mtDNA control region has been well established by different studies of mtDNA variation (Hasegawa et al., 1993; Meyer et al., 1999).

2.2 Haplogroups

In molecular evolution, a haplogroup is a group of related haplotypes that share a common ancestor and therefore all carry the same single nucleotide polymorphism mutation. Since a haplogroup comprises similar haplotypes, it is possible to predict a haplogroup from its haplotypes. The evolution of the human mitochondrial genome is characterized by the appearance of ethnically diverse haplogroups. The haplogroups determined from mtDNA SNPs form the major clades of the mtDNA tree (Torroni et al., 2000). Nine European, seven Asian and three African mitochondrial DNA (mtDNA) haplogroups have been recognized previously on the basis of the presence or absence of a comparatively small number of restriction enzyme sites or on the basis of nucleotide sequences of the D-loop region. The origin and distribution of some haplogroups are as follows.

2.2.1. African Haplogroups

Haplogroup L1 is known as an African haplogroup and is believed to have appeared around 110,000 to 170,000 years ago. The name is normally used to refer to a family of lineages found in Africa and is sometimes given as haplogroup L1-6, the major haplogroup that includes mostly African lineages with subclades L1, L2, L4, L5, L6 and also L3. It is also found mainly in Central Africa and . The populations carrying haplogroup M1 mostly favor Africa rather than Asia as its place of origin. The population distributions of both M1 and M1a indicate that the majority of M1 haplogroups occur across Sub-Saharan Africa, not in Asia and . Olivieri et al. (2006) claim that haplogroup M1 originated in Asia and that the haplogroup represents a back-migration to . Haplogroup L2 is a human mitochondrial DNA (mtDNA) haplogroup that is normally found in Africa; its subclade L2a is fairly common and prevalent in Africa as well as among Americans of the African diaspora (Salas et al., 2002), and L2a1b and L2a1c appeared in Southeastern Africa. Haplogroup L3 is believed to have originated in East Africa (Gonder et al., 2006) and is divided into numerous clades, two of which, M and N, became the source of the non-African haplogroups (Wallace et al., 1999).

2.2.2. West Eurasian Haplogroups

Haplogroup HV is believed to be a West Eurasian haplogroup and is normally found throughout western Asia and southern and eastern Europe, especially Iran, Anatolia, the Caucasus Mountains of southern Russia and the Republic of Georgia. U7 has been reported as a West Eurasian haplogroup with its origin in the Black Sea region (Metspalu et al., 2004). At present, haplogroup U7 is found in the western Siberian tribes (Rudbeck et al., 2005), West Asia, , in Iran (Metspalu et al., 2004) and (Metspalu et al., 2004).
Haplogroups U7, W and R2 constitute about one third of the West Eurasian-specific haplogroups in India. Haplogroup W6, together with W3, W4 and W5, is believed to descend from a woman who lived in northwest India around 14,000 years ago. Haplogroup W6 appeared in the area between the Black and Caspian Seas and also in Georgia about 10,000 years ago.

The J haplogroup is found in about 12% of native Europeans (Sykes, 2001). Its average frequency is highest in the Near East, followed by Europe, the Caucasus and North Africa. Haplogroup J1 accounts for four-fifths of the total and is spread across the continent, while haplogroup J2 is more localized around the Mediterranean, Greece, and . However, this haplogroup has also been found in the Kalash (Quintana-Murci et al., 2004). Haplogroup U4 originated about 25,000 years ago in the Upper Palaeolithic and has been present since modern humans settled in Europe (Richards et al., 2000). Haplogroup U4 is found at its highest frequency in Scandinavia and the Baltic states and is also linked to ancient European hunters conserved in Siberia (Sarkissian et al., 2013). Haplogroup R shows extensive diversity across ethnic groups and language families in South Asia. In the western regions of India, different castes and tribes show higher haplogroup diversity than in other regions, which suggests their autochthonous status (Maji, 2008). Haplogroup F, a descendant of haplogroup R, is found in Asia and is common throughout East Asia and Southeast Asia (David et al., 2004; Hill et al., 2006). Its distribution extends at low frequency to the Tharu of southern Nepal and the Bashkirs of the southern Urals (Fornarino et al., 2009). Haplogroup T is found in about 1% of the native European population (Sykes, 2001). This haplogroup is rare in Africa and is absent from most populations there; its highest frequencies are in the Amhara and the Tigrai people (Kivisild, 2004). Haplogroup U2 is most common in South Asia (Metspalu et al., 2004), with low frequencies in Central Asia, West Asia and Europe (U2e) (Maji et al., 2008). Haplogroup R0 is more derived than haplogroup R.
Haplogroup R0 is found frequently in the Arabian Plate, with its highest frequency in the Socotri (Cerny et al., 2009), and has also been found at high frequency in the Kalash of Pakistan (Quintana-Murci et al., 2004). Haplogroup R2 has been reported at low frequencies in the and India and is almost absent elsewhere. In Europe, haplogroup R2 has been found only in a few populations of the Volga. Haplogroup H2 is fairly common in Eastern Europe and the Caucasus (Pereira et al., 2005). It is the most widespread H subclade among Central Asians and has also been observed in West Asia (Loogvali et al., 2004). Haplogroup H2a5 has been found in Spain (Alvarez-Iglesias et al., 2009), Norway, Ireland and Slovakia (van Oven and Kayser, 2009).

2.2.3. Southeast Asian Haplogroup

Haplogroup B is thought to have arisen in Asia about 50,000 years ago from its ancestral haplogroup R. Its greatest diversity is found in China, and it is found frequently in southeastern Asia (Yao et al., 2002). Haplogroup H is considered a descendant of haplogroup HV. A number of studies have concluded that haplogroup H probably evolved in West Asia about 25,000 years ago and was then carried to Europe by migrations about 20,000-25,000 years ago, spreading with the population of the southwest of the continent (Richards et al., 2000; Pereira et al., 2005; Behar et al., 2012). Haplogroup HV is derived from haplogroup R0 and is considered the ancestral haplogroup of haplogroups H and V. HV is a West Eurasian haplogroup found throughout western Asia and southern and eastern Europe, especially Iran, Anatolia, the Caucasus Mountains of southern Russia and the Republic of Georgia. It has also been observed in some parts of northeast Africa, in , while the Eurasian frequency was found to be 22.5% (Afonso et al., 2008). Some other haplogroups and their places of origin are as follows.

Haplogroup    Place of origin                           Time of origin (years ago)
K             South Asia or West Asia                   40,000
T             West Asia                                 30,000
J             Middle East                               30,000
R             South Asia or Central Asia                28,000
E1b1b-M35     East Africa                               26,000
I             Balkans                                   25,000
R1a1          Southern Russia                           21,000
R1b           Around the Caspian Sea or Central Asia    20,000
E1b1b-M78     /                                         18,000
G             Between India and the Caucasus            17,000
I2            Balkans                                   17,000
J2            Northern Mesopotamia                      15,000
I2b           Central Europe                            13,000
N1c1          Siberia                                   12,000
I2a           Balkans                                   11,000
R1b1b2        North or south of the Caucasus            10,000
J1                                                      10,000
E1b1b-V13     Balkans                                   10,000
I2b1          Central Europe                            9,000
I2a1          Pyrenees                                  8,000
I2a2          Dinaric Alps                              7,500
E1b1b-M81     Maghreb                                   5,500
I1            Scandinavia or Central Europe             5,000
R1b-L21       Central or Eastern Europe                 4,000
R1b-S21       Central Europe                            3,000
I2b1a         Britain                                   < 3,000

2.3. The Role of the mtDNA in Ancestry Studies

mtDNA is characterized by strictly maternal inheritance, limited recombination, a high mutation rate and a high level of population-specific polymorphisms. Mutations accumulate in mtDNA about tenfold faster than in nuclear DNA. These features have produced groups defined by a shared maternal lineage, making mtDNA a useful tool for studying the origin and migration of human populations; it has been used extensively in studies of the evolutionary relationships among different ethnic groups and their global migrations (Singh et al., 2009). The most variable segment of mtDNA is the control region, and the most polymorphic nucleotide sites are concentrated in two hypervariable segments (HVS-I and HVS-II). The geographical origin of individuals has been identified by high-resolution RFLP analysis and HVS-I sequencing (Carracedo et al., 2000). mtDNA population databases serve as a means to approximate the expected frequency of a haplotype when an individual’s mtDNA sequence matches that of a particular sample (Butler, 2009). Many researchers around the world have spent a great amount of time and resources compiling mtDNA data from thousands of maternally unrelated samples to create databases. The databases must contain high-quality information so that the random match probability can be estimated reliably (Butler, 2009). The ability to generate accurate frequency estimates for random matches in a forensic setting is of the utmost significance to forensic analysts (Butler, 2009). Mitochondrial DNA forensic databases are drastically lacking in sample size as well as population diversity. This study reports an mtDNA survey of the Makrani and Kalashi peoples of Pakistan. The genetic variability within and between the Makrani and Kalashi populations was evaluated using the mtDNA control region. The entire mtDNA control region was sequenced for 100 Makrani individuals and 111 Kalashi individuals. Differences from the revised Cambridge Reference Sequence (rCRS) in the Makrani and Kalashi populations were used to infer mtDNA haplogroups. The haplogroup profiles of the Makrani and Kalashi individuals were compared with those of different populations to infer the ancestry of the Kalashi and Makrani peoples of Pakistan.
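The idea of inferring a haplogroup from differences relative to a reference sequence can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the sequences are toy strings rather than rCRS, the haplogroup names and defining variants are invented, and indels are ignored. The tools actually used in this study (listed in the Materials and Methods) work against PhyloTree and are far more elaborate.

```python
# Toy illustration only: sequences, positions, haplogroup names and motifs
# below are hypothetical and do not correspond to real mtDNA data.

def differences_from_reference(sample, reference, offset=1):
    """List substitutions as '<position><base>' for two aligned sequences
    of equal length (indels are not handled in this sketch)."""
    if len(sample) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{index + offset}{s_base}"
        for index, (s_base, r_base) in enumerate(zip(sample, reference))
        if s_base != r_base
    ]

# Hypothetical defining variants; "HgA1" is nested within "HgA".
HYPOTHETICAL_MOTIFS = {
    "HgA": {"102A"},
    "HgA1": {"102A", "105T"},
    "HgB": {"101T"},
}

def assign_haplogroup(variants, motifs=HYPOTHETICAL_MOTIFS):
    """Return the haplogroup whose full motif is present in the haplotype,
    preferring the motif with the most defining variants; None if no match."""
    observed = set(variants)
    matches = [(len(motif), name) for name, motif in motifs.items()
               if motif <= observed]
    return max(matches)[1] if matches else None

toy_reference = "ACGTAC"   # stands in for a stretch of the reference
toy_sample = "ACATAT"      # differs at the 3rd and 6th bases
toy_variants = differences_from_reference(toy_sample, toy_reference, offset=100)
toy_haplogroup = assign_haplogroup(toy_variants)
```

Preferring the largest matching motif mirrors the nesting of subclades within haplogroups: a haplotype that carries the variants of both a clade and its subclade is placed in the subclade.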

3-MATERIALS AND METHODS

3.1. Sample Collection Areas

Blood samples were collected from 211 maternally unrelated individuals from two isolated populations of Pakistan, viz. Makrani and Kalashi. The sampling areas for both populations are shown in Fig. 3.1. mtDNA control region sequence data were generated for all 211 individuals.

[Map of Pakistan: sampling areas and number of samples per area]
Makrani population: Turbat 73; Awaran 3; Buleda 2; Panjgur 2; Kharan 1; Nasirabad 1; Gwadar 1; Burewala 14; Karachi (Lyari) 3; total 100.
Kalashi population (Chitral): (i) Bumburet 24; (ii) Rumbur 45; (iii) Birir 42; total 111.

Figure 3.1: Map of Pakistan showing its administrative regions and neighboring countries. Triangles represent sampling areas for Makrani and Kalashi populations.

3.2. Makrani Population

3.2.1. Sample Collection

Blood samples (3-5 ml) were collected from 100 healthy, unrelated Makrani individuals (males, n=96; females, n=4) from Pakistan after obtaining oral and written consent in accordance with the Declaration of Helsinki. Donors’ information was collected individually using the consent forms (Annexure 1 and 2). The detailed data from the consent forms of the Makrani population are presented in Table 3.1. The summarized information about sampling of the Makrani population from different cities of three provinces of Pakistan is presented in Table 3.2.

Table 3.1: The detailed data of consent forms from Makrani population.

Sr. No.  Donor’s ID  Gender  Age (Yrs.)  Donor’s Birth Place  Donor’s Ethnic Group  Mother’s Birth Place  Mother’s Ethnic Group  Father’s Birth Place  Father’s Ethnic Group
1  MKH001  M  23  TRB  MKB  TRB  MKB  TRB  MKB
2  MKH002  M  50  TRB  MKB  TRB  MKB  TRB  MKB
3  MKH003  M  18  TRB  MKB  TRB  MKB  TRB  MKB
4  MKH004  M  22  TRB  MKB  TRB  MKB  TRB  MKB
5  MKH005  M  60  TRB  MKB  TRB  MKB  TRB  MKB
6  MKH006  M  28  TRB  MKB  TRB  MKB  TRB  MKB
7  MKH007  M  28  TRB  MKB  TRB  MKB  TRB  MKB
8  MKH008  F  50  TRB  MKB  TRB  MKB  TRB  MKB
9  MKH009  M  24  TRB  MKB  TRB  MKB  TRB  MKB
10  MKH011  M  27  TRB  MKB  GWD  MKB  TRB  MKB
11  MKH012  M  21  TRB  MKB  TRB  MKB  TRB  MKB
12  MKH013  M  35  TRB  MKB  TRB  MKB  TRB  MKB
13  MKH014  M  21  PJR  MKB  PJR  MKB  TRB  MKB
14  MKH015  M  25  TRB  MKB  TRB  MKB  TRB  MKB
15  MKH016  M  23  TRB  MKB  TRB  MKB  TRB  MKB
16  MKH017  M  22  TRB  MKB  TRB  MKB  TRB  MKB
17  MKH018  M  26  TRB  MKB  TRB  MKB  TRB  MKB
18  MKH019  M  18  TRB  MKB  TRB  MKB  TRB  MKB
19  MKH020  M  18  TRB  MKB  TRB  MKB  TRB  MKB
20  MKH021  M  19  TRB  MKB  TRB  MKB  TRB  MKB
21  MKH022  M  20  TRB  MKB  KRC  MKB  KRC  MKB
22  MKH023  M  30  TRB  MKB  TRB  MKB  TRB  MKB
23  MKH024  M  28  TRB  MKB  TRB  MKB  TRB  MKB
24  MKH025  M  28  TRB  MKB  TRB  MKB  TRB  MKB

25  MKH026  M  33  TRB  MKB  TRB  MKB  TRB  MKB
26  MKH027  M  32  TRB  MKB  TRB  MKB  TRB  MKB
27  MKH028  M  45  TRB  MKB  TRB  MKB  TRB  MKB
28  MKH029  M  30  TRB  MKB  TRB  MKB  TRB  MKB
29  MKH030  M  26  TRB  MKB  TRB  MKB  TRB  MKB
30  MKH031  M  30  TRB  MKB  TRB  MKB  TRB  MKB
31  MKH032  M  30  TRB  MKB  TRB  MKB  TRB  MKB
32  MKH033  M  18  TRB  MKB  TRB  MKB  TRB  MKB
33  MKH034  M  35  TRB  MKB  TRB  MKB  TRB  MKB
34  MKH035  M  30  TRB  MKB  TRB  MKB  TRB  MKB
35  MKH036  M  32  TRB  MKB  TRB  MKB  TRB  MKB
36  MKH037  M  22  KRC  MKB  KRC  MKB  KRC  MKB
37  MKH038  M  22  AWN  MKB  AWN  MKB  AWN  MKB
38  MKH039  M  32  AWN  MKB  AWN  MKB  AWN  MKB
39  MKH040  M  25  TRB  MKB  TRB  MKB  TRB  MKB
40  MKH041  M  32  KRC  MKB  KRC  MKB  KRC  MKB
41  MKH042  M  22  TRB  MKB  TRB  MKB  TRB  MKB
42  MKH043  M  32  TRB  MKB  TRB  MKB  TRB  MKB
43  MKH044  M  19  TRB  MKB  TRB  MKB  TRB  MKB
44  MKH045  M  43  KRC  MKB  KRC  MKB  KRC  MKB
45  MKH046  M  22  AWN  MKB  AWN  MKB  AWN  MKB
46  MKH047  M  18  TRB  MKB  TRB  MKB  TRB  MKB
47  MKH048  M  18  TRB  MKB  TRB  MKB  TRB  MKB
48  MKH049  M  22  TRB  MKB  TRB  MKB  TRB  MKB
49  MKH050  M  18  TRB  MKB  TRB  MKB  TRB  MKB
50  MKH052  M  51  BRW  MKB  OKR  MKB  OKR  MKB
51  MKH055  M  21  BRW  MKB  OKR  MKB  OKR  MKB
52  MKH056  M  18  TRB  MKB  TRB  MKB  TRB  MKB
53  MKH057  M  27  BRW  MKB  OKR  MKB  OKR  MKB
54  MKH058  F  35  TRB  MKB  TRB  MKB  TRB  MKB
55  MKH061  F  18  TRB  MKB  TRB  MKB  TRB  MKB
56  MKH062  F  30  GWD  MKB  GWD  MKB  GWD  MKB
57  MKH063  M  19  BRW  MKB  OKR  MKB  OKR  MKB
58  MKH067  M  24  BRW  MKB  BRW  MKB  BRW  MKB
59  MKH068  M  70  BRW  MKB  BRW  MKB  BRW  MKB
60  MKH069  M  55  BRW  MKB  OKR  MKB  OKR  MKB
61  MKH071  M  25  BRW  MKB  OKR  MKB  OKR  MKB
62  MKH072  M  60  BRW  MKB  OKR  MKB  OKR  MKB
63  MKH073  M  48  BRW  MKB  OKR  MKB  OKR  MKB
64  MKH074  M  65  BRW  MKB  OKR  MKB  OKR  MKB
65  MKH075  M  27  BRW  MKB  OKR  MKB  OKR  MKB

66  MKH077  M  24  BRW  MKB  OKR  MKB  OKR  MKB
67  MKH078  M  47  BRW  MKB  OKR  MKB  OKR  MKB
68  MKH079  M  26  TRB  MKB  TRB  MKB  TRB  MKB
69  MKH080  M  30  TRB  MKB  TRB  MKB  TRB  MKB
70  MKH081  M  20  TRB  MKB  TRB  MKB  TRB  MKB
71  MKH082  M  19  TRB  MKB  TRB  MKB  TRB  MKB
72  MKH083  M  20  PJR  MKB  PJR  MKB  PJR  MKB
73  MKH084  M  22  TRB  MKB  BLD  MKB  BLD  MKB
74  MKH085  M  24  TRB  MKB  TRB  MKB  TRB  MKB
75  MKH086  M  32  TRB  MKB  TRB  MKB  TRB  MKB
76  MKH087  M  22  TRB  MKB  TRB  MKB  TRB  MKB
77  MKH088  M  22  TRB  MKB  TRB  MKB  PJR  MKB
78  MKH089  M  20  TRB  MKB  TRB  MKB  TRB  MKB
79  MKH090  M  20  TRB  MKB  TRB  MKB  TRB  MKB
80  MKH091  M  18  TRB  MKB  TRB  MKB  TRB  MKB
81  MKH092  M  18  TRB  MKB  TRB  MKB  TRB  MKB
82  MKH093  M  23  TRB  MKB  TRB  MKB  TRB  MKB
83  MKH094  M  50  TRB  MKB  TRB  MKB  TRB  MKB
84  MKH095  M  30  TRB  MKB  TRB  MKB  TRB  MKB
85  MKH096  M  22  TRB  MKB  TRB  MKB  TRB  MKB
86  MKH097  M  20  TRB  MKB  TRB  MKB  TRB  MKB
87  MKH098  M  23  TRB  MKB  TRB  MKB  TRB  MKB
88  MKH099  M  20  TRB  MKB  TRB  MKB  TRB  MKB
89  MKH100  M  18  KHN  MKB  KHN  MKB  KHN  MKB
90  MKH101  M  20  BLD  MKB  TRB  MKB  TRB  MKB
91  MKH102  M  22  TRB  MKB  TRB  MKB  TRB  MKB
92  MKH103  M  21  NSD  MKB  NSD  MKB  NSD  MKB
93  MKH104  M  21  BLD  MKB  BLD  MKB  BLD  MKB
94  MKH105  M  21  TRB  MKB  TRB  MKB  TRB  MKB
95  MKH106  M  23  TRB  MKB  TRB  MKB  TRB  MKB
96  MKH107  M  24  TRB  MKB  TRB  MKB  TRB  MKB
97  MKH108  M  22  TRB  MKB  TRB  MKB  TRB  MKB
98  MKH109  M  22  TRB  MKB  TRB  MKB  TRB  MKB
99  MKH112  M  24  TRB  MKB  TRB  MKB  TRB  MKB
100  MKH113  M  21  TRB  MKB  TRB  MKB  TRB  MKB

Abbreviations: M, Male; F, Female; Yrs., Years; MKB, Makrani Baloch; TRB, Turbat; BLD, Buleda; NSD, Nasirabad; KHN, Kharan; PJR, Panjgur; BRW, Burewala; GWD, Gwadar; AWN, Awaran; KRC, Karachi; OKR, Okara.


Table 3.2: The summarized information about sampling of Makrani population from different cities of three provinces of Pakistan

Province      City              Male (n=96)   Female (n=4)
Baluchistan   Turbat            70            3
Baluchistan   Awaran            3             0
Baluchistan   Buleda            2             0
Baluchistan   Panjgur           2             0
Baluchistan   Kharan            1             0
Baluchistan   Gwadar            0             1
Baluchistan   Nasirabad         1             0
Punjab        Burewala          14            0
Sindh         Karachi (Lyari)   3             0

3.3. Kalash Population

3.3.1. Sample Collection

Blood samples (3-5 ml) were collected from 111 maternally unrelated, healthy Kalashi individuals (males, n=63; females, n=48) from Pakistan after obtaining oral and written consent in accordance with the Declaration of Helsinki. Donors’ information was collected individually using the consent forms (Annexure 1 and 2). The sampling areas for the Kalashi population are shown in Fig. 3.1 and the detailed data from the consent forms in Table 3.3. For simplicity, the summarized data for samples from the three valleys of Kalash are shown in Table 3.4.


Table 3.3: The detailed data of consent forms from Kalashi population.

Sr. No.  Donor’s ID  Donor’s Sub-Ethnic Group  Donor’s Ethnic Group  Donor’s Birth Village  Donor’s Birth Valley  Gender  Age (Years)  Mother’s Sub-Ethnic Group  Mother’s Birth Place  Father’s Sub-Ethnic Group  Father’s Birth Place
1  KLH001  KLS  KLS  GBG  BRR  M  29  KLG  BRR  KLS  BRR
2  KLH002  SKT  KLS  KRK  BMT  M  26  RJW  BTK  SKT  KRK
3  KLH003  BBR  KLS  KRK  BMT  M  35  BLS  BRN  BBR  KRK
4  KLH004  SKT  KLS  KRK  BMT  M  50  BZK  BRN  SKT  KRK
5  KLH005  DHM  KLS  KRK  RBR  M  60  MTM  RBR  DHM  KRK
6  KLH006  SKT  KLS  KRK  BMT  M  39  SHY  DGU  SKT  KRK
7  KLH007  SKT  KLS  KRK  BMT  M  27  RJW  BTK  SKT  KRK
8  KLH008  RJW  KLS  BTK  BMT  M  19  MHD  BRR  RJW  BTK
9  KLH009  RJW  KLS  BTK  BMT  M  35  SHY  DGU  RJW  BTK
10  KLH019  BMK  KLS  ANH  BMT  M  19  ASN  ANH  BMK  ANH
11  KLH011  BZK  KLS  BRN  BMT  M  29  BGL  RBR  BZK  BRN
12  KLH012  BZK  KLS  BRN  BMT  M  45  BRK  ANH  BZK  BRN
13  KLH014  BD  KLS  KRK  BMT  M  35  BLS  BRN  BBR  KRK
14  KLH015  BLS  KLS  BRN  BMT  M  50  SHY  DGU  BLS  BRN
15  KLH016  BZK  KLS  BRN  BMT  M  28  BLO  RBR  BZK  BRN
16  KLH018  BZK  KLS  BRN  BMT  M  18  SKT  BRN  BZK  BRN
17  KLH019  BD  KLS  KRK  BMT  M  18  LGY  AYN  BBR  KRK
18  KLH020  SKT  KLS  KRK  BMT  M  21  SKT  KRK  SKT  KRK
19  KLH021  SKT  KLS  KRK  BMT  M  26  QRH  SKH  SKT  KRK
20  KLH022  SKT  KLS  KRK  BMT  M  45  SHY  DGU  SKT  KRK
21  KLH023  RJW  KLS  BTK  BMT  M  80  GSD  BRR  RJW  BTK
22  KLH024  RJW  KLS  BTK  BMT  M  40  BLS  BRN  RJW  BTK
23  KLH026  BLS  KLS  BRN  BMT  M  30  MTM  RBR  BLS  BRN
24  KLH027  BLS  KLS  BRN  BMT  M  26  BZK  BRN  BLS  BRN
25  KLH028  BBR  KLS  KRK  BMT  M  30  SKT  KRK  BBR  KRK
26  KLH029  RJW  KLS  BTK  BMT  M  38  SKT  KRK  RJW  BTK
27  KLH030  MTM  KLS  KTD  RBR  F  28  OMH  MLD  MTM  KTD
28  KLH031  MTM  KLS  KTD  RBR  F  28  BBR  KRK  MTM  KTD
29  KLH032  BLO  KLS  KLG  RBR  F  17  BLS  BRN  BLO  KLG
30  KLH033  DHM  KLS  BGU  RBR  F  20  DHM  BGU  DHM  BGU
31  KLH034  MTM  KLS  KTD  RBR  F  29  BBR  KRK  MTM  BTT
32  KLH035  BLO  KLS  KLG  RBR  M  18  DHM  BRR  BLO  KLG

33  KHL036  BLO  KLS  KLG  RBR  M  19  BZK  BRN  BLO  KLG
34  KLH037  BLO  KLS  BGU  RBR  F  29  WKY  BGU  BLO  GRM
35  KLH038  MTM  KLS  KLG  RBR  M  45  DHM  BGU  MTM  KLG
36  KLH039  DHM  KLS  BGU  RBR  F  45  MTM  KLG  DHM  BGU
37  KLH040  DHM  KLS  BGU  RBR  F  23  BLO  GRM  DHM  BGU
38  KLH041  BGL  KLS  BGU  RBR  M  35  BLO  GRM  BGL  BGU
39  KLH044  DHM  KLS  BGU  RBR  F  26  BGL  BGU  DHM  BGU
40  LKH045  BLO  KLS  BTT  RBR  M  20  SKT  KRK  BLO  BTT
41  KLH046  WKY  KLS  BGU  RBR  M  25  DHM  BGU  WKY  BGU
42  KLH047  JRY  KLS  BGU  RBR  M  45  DHM  BGU  JRY  BGU
43  KLH048  BGL  KLS  BGU  RBR  F  15  DHM  MLD  BGL  BGU
44  KLH049  MTM  KLS  KTD  RBR  M  25  BLO  KLS  MTM  KTD
45  KLH050  DHM  KLS  GRM  RBR  F  21  MTM  BGU  DHM  GRM
46  KLH051  BLO  KLS  GRM  RBR  M  15  WKY  BGU  BLO  GRM
47  KLH052  BLO  KLS  GRM  RBR  M  16  RJW  BTT  BLO  GRM
48  KLH053  WKY  KLS  BGU  RBR  F  18  BLO  KLS  WKY  BGU
49  KLH054  DHM  KLS  KLG  RBR  M  34  WKY  KLS  DHM  KLG
50  KLH056  DHM  KLS  BGU  RBR  M  18  BGL  BGU  DHM  BGU
51  KLH057  BGL  KLS  BGU  RBR  F  26  BLO  KLS  BGL  BGU
52  KLH058  DHM  KLS  BGU  RBR  F  27  BLO  GRM  DHM  BGU
53  KLH059  WKY  KLS  BGU  RBR  M  15  JRY  BGU  WKY  BGU
54  KLH060  DHM  KLS  BGU  RBR  M  35  BLO  GRM  DHM  BGU
55  KLH061  WKY  KLS  BGU  RBR  M  35  DHM  BGU  WKY  BGU
56  KLH062  DHM  KLS  BGU  RBR  F  28  BGL  BGU  DHM  BGU
57  KLH063  DHM  KLS  BGU  RBR  F  25  BLO  GRM  DHM  BGU
58  KLH064  BGL  KLS  BGU  RBR  F  35  WKY  BGU  BGL  BGU
59  KLH065  ZHO  KLS  BGU  RBR  F  18  BLO  BTT  ZHO  BTT
60  KLH066  BGL  KLS  BGU  RBR  M  36  WKY  BGU  BGL  BGU
61  KLH067  ZHO  KLS  BGU  RBR  M  30  BGL  BGU  ZHO  MLD
62  KLH068  DHM  KLS  BGU  RBR  F  28  BGL  BGU  DHM  BGU
63  KLH079  JRY  KLS  BGU  RBR  M  39  DHM  BGU  JRY  BGU
64  KLH070  BGL  KLS  BGU  RBR  F  40  WKY  BGU  BGL  BGU
65  KLH071  DHM  KLS  BGU  RBR  M  39  BGL  BGU  DHM  BGU
66  KLH072  DHM  KLS  BGU  RBR  F  15  BLO  GRM  DHM  BGU

67  KLH073  JRY  KLS  BGU  RBR  F  39  DHM  BGU  JRY  BGU
68  KLH074  BGL  KLS  BGU  RBR  F  35  WKY  BGU  BGL  BGU
69  KLH075  DHM  KLS  GRM  RBR  F  25  BLO  KLG  DHM  MLD
70  KLH076  MTM  KLS  KLG  RBR  F  28  BZK  BRN  MTM  KLG
71  KLH077  ASN  KLS  GBG  BRR  M  45  BDI  KDR  ASN  GMK
72  KLH078  LKD  KLS  NSP  BRR  M  19  LKD  BPL  RMI  NSP
73  KLH079  AKW  KLS  ASP  BRR  F  19  MDI  BWO  AKW  ASP
74  KLH080  TRK  KLS  GBG  BRR  F  35  LTK  GBL  TRK  GBL
75  KLH081  TRK  KLS  GBG  BRR  F  32  LRH  GBL  TRK  GBL
76  KLH082  LKD  KLS  Guru  BRR  F  40  LRK  GBL  TRK  Guru
77  KLH083  LKD  KLS  NSB  BRR  F  35  BDI  SWR  LTK  NSB
78  KLH084  AKW  KLS  ASP  BRR  M  19  LRK  GBL  AKW  ASP
79  KLH085  GSD  KLS  ASP  BRR  M  20  DMD  BSH  GSD  ASP
80  KLH086  TRK  KLS  GBG  BRR  M  18  GSD  Guru  TRK  GBL
81  KLH087  LKD  KLS  Guru  BRR  M  18  RMI  NSB  LTK  Guru
82  KLH088  CHS  KLS  GB  BRR  M  19  TRK  ASP  CHS  GBL
83  KLH089  DRD  KLS  NSB  BRR  M  19  LRK  GBL  DRD  NSB
84  KLH090  PNW  KLS  WDN  BRR  M  22  DMW  BSH  PNW  WDN
85  KLH091  TRK  KLS  BRR  BRR  M  24  LTK  RBR  TRK  Guru
86  KLH092  PNW  KLS  BRR  BRR  M  19  DMD  BSH  PNW  WDN
87  KLH093  LKD  KLS  BRR  BRR  M  18  CHS  BRR  LTK  GBL
88  KLH094  TRK  KLS  BRR  BRR  M  20  RKD  DGU  TRK  GBL
89  KLH095  BRD  KLS  NSB  BRR  M  18  TRK  Guru  BDR  GBL
90  KLH097  MHD  KLS  BWO  BRR  F  19  LTK  GBL  MHD  BWO
91  KLH098  BP  KLS  BWO  BRR  F  19  AKW  ASP  AKW  BWO
92  KLH099  TRK  KLS  Guru  BRR  F  20  LTK  Guru  TRK  Guru
93  KLH100  AKW  KLS  ASP  BRR  F  19  LTK  GBL  AKW  ASP
94  KLH102  RKD  KLS  NSB  BRR  F  18  CHS  GBL  LTK  NSB
95  KLH103  LKD  KLS  GBG  BRR  F  19  LTK  GBL  AKW  GBL
96  KLH04  CHS  KLS  GBG  BRR  F  20  LTK  GBL  CHS  GBL
97  KLH105  BRD  KLS  BWO  BRR  F  20  TRK  GBL  LTK  BWO
98  KLH106  TRK  KLS  GBG  BRR  F  24  BRD  BWO  TRK  GBL
99  KLH107  TRK  KLS  GBG  BRR  F  19  AKW  GBL  AKW  ASP
100  KLH109  PNY  KLS  WDN  BRR  F  24  RNW  BSH  AKW  WDN

101  KLH110  AKW  KLS  ASP  BRR  F  19  TRK  ASP  AKW  ASP
102  KLH111  PNY  KLS  WDN  BRR  F  19  ASN  ABR  PNW  WDN
103  KLH112  DNW  KLS  BSH  BRR  F  20  AKW  ASP  DMW  BSH
104  KLH113  LKD  KLS  Guru  BRR  F  19  TRK  Guru  LTK  Guru
105  KLH114  LKD  KLS  GBG  BRR  F  20  RKD  DGU  LTK  GBL
106  KLH115  TRK  KLS  GBG  BRR  F  19  LTK  GBL  TRK  GBL
107  KLH116  MHD  KLS  BWO  BRR  F  20  MHD  BWO  PPW  BWO
108  KLH117  TRK  KLS  GBG  BRR  M  21  TRK  BWO  TRK  GBL
109  KLH118  TRK  KLS  BRR  BRR  M  19  RKD  NSB  TRK  GBL
110  KLH119  LKD  KLS  PPG  BRR  M  19  TRK  GBL  LTK  BPL
111  KLH121  AKW  KLS  ASP  BRR  M  19  TRK  ASP  AKW  ASP

Abbreviations: BMT, Bumburete; BRR, Birir; RMB, Rumbur; KLS, Kalash; GBG, Grambetgol; MTM, Mutimir; MHD, Mahadari; BGL, Bangalie; BRK, Baramuk; SKH, Shahkhandeh; DGU, Darasguru; RUR, Rumbur; MLD, Malidesh; BDI, Budadari; BMD, BumburDari; DHM, Dhramese; LTK, L’atharuk; SKT, Sharakat; AKW, Al’ukshernawaw; RJW, Rajaway; GSD, Gil’asurdari; CHS, Chagansey; DRD, Barburadari; RKD, Rashmukdari; DNW, DumuNawaw; MDI, Mahadari; LRK, Latharuk; DMD, Damundari; RMI, Rashmukdari; TRK, Thararaik; DMW, Damunawawa; RNW, RomaNawaw; KLG, Kalashgram; BBR, Bumboor; KDR, Kandisaar; BPL, phishpagole; WDN, Waridon; ANH, Anish; KRK, Krakal; ASP, Asper; BLS, Bulasing; BTK, Batrik; BRN, Broon; NSB, Nos’biaw; BSH, Bishal; PPG, Phishpagol; BWO, Biawo; BZK, Bazik; BLO, Balo; KTD, Kotdish; BMK, Barmuk; SHY, Sharey; AYN, Ayun; QRH, Quresh; WKY, Wakokay; ASN, Aspan’i; GRM, Groom; BTT, Battet; JRY, Jaro’e’; LGY, L’agay; ZHE, Zohe; NSP, Nosbiaw; PNY, Paney; OMH, Ohramsh; GMK, Gumbak; SWR, Saweri; PNW, PanaiNawawo; PPW, Ponchapanaw.

Table 3.4: The summarized information about sampling from three different valleys of Kalash population

Valley     Village       Male (n=63)   Female (n=48)
Bumburet   Anish         1             0
Bumburet   Batrik        5             0
Bumburet   Broon         7             0
Bumburet   Krakal        11            0
Rumbur     Battit        1             0
Rumbur     Groom         2             2
Rumbur     Kalash Gram   4             2
Rumbur     Kotdish       1             3
Rumbur     Rumbur        1             0
Rumbur     Balanguru     11            18
Birir      Asper         0             0
Birir      Biawo         0             4
Birir      Bishala       0             1
Birir      Grambetgol    8             0
Birir      Guru          2             3
Birir      Nosbia        2             2
Birir      Nospeawo      1             0
Birir      Pishpagol     1             0
Birir      Waridon       2             2

3.4. DNA Extraction and Quantification

DNA was extracted from blood samples using the QIAamp DNA Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. The extracted DNA was incubated at 70°C for 1 hour to avoid degradation by nucleases and then stored at -40°C. The quantity of the extracted DNA was determined with a NanoDrop™ 1000 Spectrophotometer (Thermo Scientific, Wilmington, DE). The quality of the DNA was assessed by visualizing it on a 0.8% agarose gel.

3.5. PCR Amplification

PCR was performed on an Applied Biosystems 2720 thermal cycler in a 50 µl reaction volume containing 1-2 ng of genomic DNA, 0.4 μM of each primer, and AmpliTaq Gold® 360 Master Mix (Applied Biosystems, Foster City, CA, USA), used according to the manufacturer’s instructions. The amplification program consisted of pre-denaturation at 95°C for 11 min, followed by 35 cycles of denaturation at 95°C for 30 s, annealing at 56°C for 30 s and extension at 72°C for 90 s, with a final extension at 72°C for 7 min. The primers listed in Table 3.5 were used for the amplification and sequencing of the entire mtDNA control region in both Makrani and Kalashi populations (http://forensic.yonsei.ac.kr/protocol/mtDNA-CR.pdf).

Table 3.5: List of oligonucleotides, along with melting temperatures (Tm), concentrations and sequences used for amplification and sequencing of the mtDNA control regions.

Sr. No.  Primer name (control region)                 Primer sequence (5'→3')          Concentration of each primer (μM)  Melting temperature (°C)
1        Amplification and sequencing primer-F15975   CTC CAC CAT TAG CAC CCA AA       0.2                                55.1
2        Sequencing primer-F16327                     CCG TAC ATA GCA CAT TAC AGT C    0.2                                53.0
3        Sequencing primer-F155                       TAT TTA TCG CAC CTA CGT TC       0.2                                49.6
4        Sequencing primer-R16419m                    GAG GAT GGT GGT CAA GGG A        0.2                                56.5
5        Sequencing primer-R042                       AGA GCT CCC GTG AGT GGT TA       0.2                                57.8
6        Amplification and sequencing primer-R635     GAT GTG AGC CCG TCT AAA CA       0.2                                54.7
7        Sequencing primer-F403                       CCG CTT CTG GCC ACA GCA CT       0.2                                55.3
8        Sequencing primer-R389                       CTG GTT AGG CTG GTG TTA GG       0.2                                55.1
9        Sequencing primer-F16524                     AAG CCT AAA TAG CCC ACA CG       0.2                                55.1

3.5.1. Preparation of Agarose Gel

The amplified PCR products were electrophoresed in a 2% agarose gel stained with ethidium bromide and detected by UV transillumination (UVI doc gel documentation systems UK). To prepare the gel, 2 g of molecular biology grade agarose (Sigma Chem. Co) was mixed with 100 ml of TAE electrophoresis buffer and melted in a microwave oven. Once the agarose had dissolved completely and the solution had cooled to a moderate temperature, ethidium bromide (Sigma-Aldrich, St. Louis, USA) was added and mixed thoroughly. A gel tray was sealed with rubber clamps and placed on a level horizontal surface. The required combs were placed at appropriate positions (0.5-1.0 mm above the base of the gel). The gel was poured into the gel tray. After the gel solidified, the combs and clamps were removed from the gel tray. The gel was placed in an electrophoresis tank containing 1X TAE electrophoresis buffer. 6X DNA loading dye (Thermo Scientific, USA) was added to each sample and the samples were loaded on the gel. A 100 bp DNA ladder (Thermo Scientific GeneRuler 100 bp Plus DNA Ladder, #SM0323) was loaded in the first well. Electrophoresis was carried out for 40 minutes at 100 volts using a PowerPac Basic (Bio-Rad). Photographs were taken under a UV transilluminator (PhotoDoc-It™ Imaging System, UK).

3.6. Sequencing

Unincorporated primers and dNTPs were removed from the amplified PCR products using ExoSAP-IT® (USB, Cleveland, OH, USA) according to the manufacturer’s instructions. Reactions were mixed briefly and incubated at 37°C for 90 min, then at 80°C for 20 min. The extended incubation at 37°C was used to ensure digestion of all unincorporated PCR primers (Peter et al., 2004).
The entire mtDNA control region (spanning nucleotide positions 16024-16569 and 1-576) was sequenced using the Big Dye Terminator Cycle Sequencing v3.1 Ready Reaction Kit (Applied Biosystems; Carlsbad, CA, USA) according to the manufacturer’s instructions. Commercial sequencing facilities at 1stBASE (http://www.base-asia.com) and the National Center of Excellence in Molecular Biology (CEMB), Lahore, Pakistan, were also used for this research work.
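Because the mitochondrial genome is circular, the control region wraps around the numbering origin, which is why it is given as two ranges. The short sketch below works out its length from the positions stated above. The function name is illustrative, and the genome length of 16,569 bp is that of the rCRS numbering, which the upper bound of the first range already implies.

```python
RCRS_LENGTH = 16569  # length of the rCRS numbering system, in base pairs

def circular_span_length(start, end, genome_length=RCRS_LENGTH):
    """Length of an inclusive span on a circular genome with 1-based positions.

    If start > end, the span is taken to wrap around the numbering origin.
    """
    if start <= end:
        return end - start + 1
    return (genome_length - start + 1) + end

# Control region: positions 16024-16569 followed by 1-576.
control_region_bp = circular_span_length(16024, 576)
print(control_region_bp)  # 546 + 576 = 1122
```

The result, 1122 bp, is the sum of the two stated ranges (546 bp and 576 bp).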

3.7. Statistical Analysis

All samples were sequenced bi-directionally and evaluated twice, as recommended (Parson and Bandelt, 2007), using the sequence analysis software Geneious (Version 7.0.3, Biomatters Ltd, New Zealand) (Drummond et al., 2009), and were also checked by two independent researchers. MitoTool (Fan and Yao, 2011), mtDNA profiler (Yang et al., 2013) and HaploGrep (Kloss-Brandstätter et al., 2011), which use PhyloTree Build 16 (http://www.phylotree.org) (van Oven et al., 2008) as the classification tree, were used to assess the quality of the mtDNA data (Fan and Yao, 2011; Yang et al., 2013). The Makrani mtDNA sequences were assigned to haplogroups according to the published data (Metspalu et al., 2004; van Oven et al., 2008; Behar et al., 2008; van Oven et al., 2011; Mostafa et al., 2013). Population statistical parameters such as haplotype diversity, random match probability and power of discrimination were calculated as described in previous studies (Tajima, 1989; Prieto, 2011). The recommendations and guidelines of the ISFG regarding the reporting of mtDNA population data were followed in this study (Parson and Bandelt, 2007). Median-joining haplotype networks (Bandelt et al., 1999) were constructed using the software NETWORK (http://www.fluxus-engineering.com/sharenet.htm).
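The population parameters named above can be sketched in a few lines of Python. The formulas used here are the commonly used definitions and are an assumption, since the text cites Tajima (1989) and Prieto (2011) without spelling them out: random match probability as the sum of squared haplotype frequencies, power of discrimination as one minus that sum, and haplotype diversity as the same quantity with the n/(n-1) sample-size correction. The function name and the input labels are purely illustrative.

```python
from collections import Counter

def forensic_parameters(haplotypes):
    """Compute commonly used mtDNA population parameters.

    `haplotypes` is a list with one haplotype label per individual.
    Assumed definitions (not confirmed by the text):
      random match probability = sum(p_i ** 2)
      power of discrimination  = 1 - sum(p_i ** 2)
      haplotype diversity      = n / (n - 1) * (1 - sum(p_i ** 2))
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two individuals are required")
    counts = Counter(haplotypes)
    sum_sq = sum((count / n) ** 2 for count in counts.values())
    return {
        "random_match_probability": sum_sq,
        "power_of_discrimination": 1 - sum_sq,
        "haplotype_diversity": n / (n - 1) * (1 - sum_sq),
    }

# Illustrative input: four individuals, two of whom share a haplotype.
example = forensic_parameters(["hapA", "hapA", "hapB", "hapC"])
```

For the illustrative input the haplotype frequencies are 0.5, 0.25 and 0.25, so the sum of squares is 0.375, the power of discrimination is 0.625 and the haplotype diversity is 4/3 × 0.625.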

4-RESULTS

4.1. Sampled Populations
In the present study, the maternal genetic ancestry of the Makrani and Kalashi populations living in Pakistan was characterized by analyzing the mtDNA control region (spanning positions 16,024–16,569 and 1–576), including the hypervariable segments (HVSI, HVSII and HVSIII). mtDNA sequences were obtained from 211 healthy, unrelated individuals belonging to the two ethnic populations.

4.2. Genomic DNA Quality and PCR Amplification of mtDNA Control Region
Genomic DNA was extracted from blood samples of Makrani and Kalashi individuals; the quality of the DNA is shown in figure 4.1a & b. The mtDNA control region was amplified for all samples, and the fragment size of the PCR products (~1122 bp) was determined by agarose gel electrophoresis. Because of indels in the mtDNA control region, the amplified PCR product was not always exactly 1122 bp in length. The fragment size and quality of the mtDNA control region PCR products for the Makrani and Kalashi populations are shown in figure 4.2a & b.

4.3. Sequencing the Control Region of Mitochondrial DNA
The amplified PCR products of the mitochondrial DNA control region for all samples were purified and then Sanger sequenced in both directions. Each sample was processed twice and the chromatograms were cross-checked independently by laboratory fellows. To confirm haplotypes, additional sequencing was carried out to identify relevant SNPs. Most of the questioned haplogroups were assigned on the basis of the control region together with relevant SNPs from the coding region. The bidirectional chromatograms of one Makrani (MKH080) and one Kalashi (KLH015) individual for the entire mitochondrial DNA control region, obtained with the forward primer (F15975) and the reverse primer (R635), are shown as examples in figures 4.3 a-b and 4.4 a-b, respectively.
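As a quick consistency check on the ~1122 bp length quoted in section 4.2, the size of the control region can be computed from the stated coordinates (16024-16569 and 1-576) on the circular rCRS, whose total length is 16569 bp. The helper name below is hypothetical; the sketch only illustrates the arithmetic.

```python
RCRS_LENGTH = 16569  # total length of the circular rCRS in base pairs


def span_length(start, end, genome_length=RCRS_LENGTH):
    """Number of bases from start to end (inclusive) on a circular genome."""
    if start <= end:
        return end - start + 1
    # The span wraps around the origin: tail of the genome plus the head.
    return (genome_length - start + 1) + end


# Control region as defined in the text: 16024-16569 plus 1-576.
control_region_bp = span_length(16024, 576)
print(control_region_bp)  # 546 + 576 = 1122
```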


Figure 4.1: Agarose gel electrophoretic analysis of genomic DNA extracted from blood samples. (a) Makrani samples: (1) MKH001, (2) MKH002, (3) MKH080, (4) negative control; M = 100 bp DNA marker (Thermo Scientific #SM0323). (b) Kalashi samples: (1) KLH001, (2) KLH002, (3) KLH003, (4) KLH004, (5) KLH005, (6) negative control; M = 100 bp DNA marker (Thermo Scientific #SM0323).


Figure 4.2: Agarose gel electrophoretic analysis of the mtDNA control region PCR products. (a) Makrani samples: (1) MKH001, (2) MKH002, (3) MKH003, (4) MKH004, (5) MKH005, (6) MKH080, (7) negative control; M = 100 bp DNA marker (Thermo Scientific #SM0323). (b) Kalashi samples: (1) KLH001, (2) KLH002, (3) KLH003, (4) KLH004, (5) KLH005, (6) KLH006, (7) negative control; M = 100 bp DNA marker (Thermo Scientific #SM0323).


Figure 4.3 (a): Chromatogram of Makrani individual (MKH080) for the entire mtDNA control region sequenced by forward primer (F15975).


Figure 4.3 (b): Chromatogram of Makrani individual (MKH080) for mtDNA control region sequenced by reverse primer (R635).


Figure 4.4 (a): Chromatogram of Kalashi individual (KLH015) for the entire mtDNA control region sequenced by forward primer (F15975).


Figure 4.4 (b): Chromatogram of Kalashi individual (KLH015) for the entire mtDNA control region sequenced by reverse primer (R635).

4.4. Reconstruction and Alignment with rCRS
The expected sequence (~1122 bp) for each sample was reconstructed from the fragment sequences obtained from multiple amplification reactions using the "mtDNA assembly tool", one of the tools in mtDNA profiler (Yang et al., 2013). The reconstructed sequences of all Makrani and Kalashi individuals were then aligned to the rCRS to identify differences from the reference. Representative rCRS-aligned sequences for one Makrani (MKH080) and one Kalashi individual (KLH016) are shown in figure 4.5 a and b, respectively.
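The comparison against the rCRS described above can be sketched as a walk over two pre-aligned strings. This is a minimal illustration and not the mtDNA profiler implementation: the function names are hypothetical, "-" is assumed as the gap character, insertions are labelled in the position.n style used in this chapter (for example 315.1C), deletions with a trailing "d", and the wrap from 16569 to 1 is not handled. The example fragment is the MKH080 alignment block that contains 16145A and 16176T; its start position (16144) is inferred from those two reported variants.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(ref_base, sample_base):
    """Return 'S' for a transition and 'V' for a transversion."""
    pair = {ref_base, sample_base}
    return "S" if pair <= PURINES or pair <= PYRIMIDINES else "V"


def call_differences(ref_aln, sample_aln, start_pos):
    """List (variant, type) tuples for two aligned strings of equal length.

    start_pos is the reference position of the first reference base.
    Types: S transition, V transversion, I insertion, D deletion.
    """
    if len(ref_aln) != len(sample_aln):
        raise ValueError("aligned strings must have the same length")
    calls = []
    pos = start_pos - 1  # position of the last reference base consumed
    ins_count = 0        # inserted bases seen since that reference base
    for ref_base, sample_base in zip(ref_aln, sample_aln):
        if ref_base == "-":
            if sample_base != "-":
                ins_count += 1
                calls.append((f"{pos}.{ins_count}{sample_base}", "I"))
            continue
        pos += 1
        ins_count = 0
        if sample_base == "-":
            calls.append((f"{pos}d", "D"))
        elif sample_base != ref_base:
            calls.append((f"{pos}{sample_base}", classify(ref_base, sample_base)))
    return calls


rcrs_block = "TGACCACCTGTAGTACATAAAAACCCAATCCACATCAAAACCCCCTCCCCATGCTTACAA"
mkh080_block = "TAACCACCTGTAGTACATAAAAACCCAATCCATATCAAAACCCCCTCCCCATGCTTACAA"
print(call_differences(rcrs_block, mkh080_block, 16144))
# [('16145A', 'S'), ('16176T', 'S')]
```

A gap in the reference string is reported against the preceding reference position, so an extra C after position 315 comes out as 315.1C.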

Positions 16024-16083
rCRS    TTCTTTCATGGGGAAGCAGATTTGGGTACCACCCAAGTATTGACTCACCCATCAACAACC
MKH080  TTCTTTCATGGGGAAGCAGATTTGGGTACCACCCAAGTATTGACTCACCCATCAACAACC
Diff.   none

Positions 16084-16143
rCRS    GCTATGTATTTCGTACATTACTGCCAGCCACCATGAATATTGTACGGTACCATAAATACT
MKH080  GCTATGTATTTCGTACATTACTGCCAGCCACCATGAATATTGTACGGTACCATAAATACT
Diff.   none

Positions 16144-16203
rCRS    TGACCACCTGTAGTACATAAAAACCCAATCCACATCAAAACCCCCTCCCCATGCTTACAA
MKH080  TAACCACCTGTAGTACATAAAAACCCAATCCATATCAAAACCCCCTCCCCATGCTTACAA
Diff.   16145 (S), 16176 (S)

Positions 16204-16263
rCRS    GCAAGTACAGCAATCAACCCTCAACTATCACACATCAACTGCAACTCCAAAGCCACCCCT
MKH080  GCAAGTACAGCAATCAACCTTCAACTATCACACATCAACTGCAACTCCAAAGCCACCTCT
Diff.   16223 (S), 16261 (S)

Positions 16264-16323
rCRS    CACCCACTAGGATACCAACAAACCTACCCACCCTTAACAGTACATAGTACATAAAGCCAT
MKH080  CACCCACTAGGATACCAACAAACCTACCCACCCTTAACAGTACATAGCACATAAAGCCAT
Diff.   16311 (S)

Positions 16324-16383
rCRS    TTACCGTACATAGCACATTACAGTCAAATCCCTTCTCGTCCCCATGGATGACCCCCCTCA
MKH080  TTACCGTACATAGCACATTACAGTCAAATCCCTTCTCGTCCCCATGGATGACCCCCCTCA
Diff.   none

Positions 16384-16443
rCRS    GATAGGGGTCCCTTGACCACCATCCTCCGTGAAATCAATATCCCGCACAAGAGTGCTACT
MKH080  GATAGGGGTCCCTTGACCACCATCCTCCGTGAAATCAATATCCCGCACAAGAGTGCTACT
Diff.   none

Positions 16444-16503
rCRS    CTCCTCGCTCCGGGCCCATAACACTTGGGGGTAGCTAAAGTGAACTGTATCCGACATCTG
MKH080  CTCCTCGCTCCGGGCCCATAACACTTGGGGGTAGCTAAAGTGAACTGTATCCGACATCTG
Diff.   none

Positions 16504-16563
rCRS    GTTCCTACTTCAGGGTCATAAAGCCTAAATAGCCCACACGTTCCCCTTAAATAAGACATC
MKH080  GTTCCTACTTCAGGGCCATAAAGCCTAAATAGCCCACACGTTCCCCTTAAATAAGACATC
Diff.   16519 (S)

Positions 16564-16569 and 1-54
rCRS    ACGATGGATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGG
MKH080  ACGATGGATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGG
Diff.   none

Positions 55-114
rCRS    TATTTTCGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCC
MKH080  TATTTTCGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCC
Diff.   73 (S)

Positions 115-174
rCRS    TATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATCCTATTATTTATCGCACCTACGTTC
MKH080  TATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATCCTATTATTTATCGCACCTACGTTC
Diff.   none

Positions 175-234
rCRS    AATATTACAGGCGAACATACTTACTAAAGTGTGTTAATTAATTAATGCTTGTAGGACATA
MKH080  AATATTACAGGCGAACATACTTACTAAAGTGTGTTAATTAATTAATGCTTGTAGGACATA
Diff.   none

Positions 235-294
rCRS    ATAATAACAATTGAATGTCTGCACAGCCACTTTCCACACAGACATCATAACAAAAAATTT
MKH080  ATAATAACAATTGAATGTCTGCACAGCCGCTTTCCACACAGACATCATAACAAAAAATTT
Diff.   263 (S)

Positions 295-353
rCRS    CCACCAAACCCCCCCTCCCCC-GCTTCTGGCCACAGCACTTAAACACATCTCTGCCAAAC
MKH080  CCACCAAACCCCCCCTCCCCCCGCTTCTGGCCACAGCACTTAAACACATCTCTGCCAAAC
Diff.   315.1 (I)

Positions 354-413
rCRS    CCCAAAAACAAAGAACCCTAACACCAGCCTAACCAGATTTCAAATTTTATCTTTTGGCGG
MKH080  CCCAAAAACAAAGAACCCTAACACCAGCCTAACCAGATTTCAAATTTTATCTTTTGGCGG
Diff.   none

Positions 414-473
rCRS    TATGCACTTTTAACAGTCACCCCCCAACTAACACATTATTTTCCCCTCCCACTCCCATAC
MKH080  TATGCACTTTTAACAGTCACCCCCCAACTAACACATTATTTTCCCCTCCCACTCCCATAC
Diff.   none

Positions 474-533
rCRS    TACTAATCTCATCAATACAACCCCCGCCCATCCTACCCAGCACACACACACCGCTGCTAA
MKH080  TACTAATCTCATCAACACAACCCCCGCCCATCCTACCCAGCACACACACACCGCTGCTAA
Diff.   489 (S)

Positions 534-576
rCRS    CCCCATACCCCGAACCAACCAAACCCCAAAGACACCCCCCACA
MKH080  CCCCATACCCCGAACCAACCAAACCCCAAAGACACCCCCCACA
Diff.   none

Figure 4.5 (a): The haplotype of Makrani individual (MKH080) for the entire mtDNA control region. The haplotype observed for MKH080 after alignment of the reconstructed sequence with the rCRS was 16145A, 16176T, 16223T, 16261T, 16311C, 16519C, 73G, 263G, 315.1C, 489C. I: Insertion, S: Transition, V: Transversion.

Positions 16024-16083
rCRS    TTCTTTCATGGGGAAGCAGATTTGGGTACCACCCAAGTATTGACTCACCCATCAACAACC
KLH016  TTCTTTCATGGGGAAGCAGATTTGGGTGCCACCCAAGTATTGACTCACCCATCAACAACC
Diff.   16051 (S)

Positions 16084-16143
rCRS    GCTATGTATTTCGTACATTACTGCCAGCCACCATGAATATTGTACGGTACCATAAATACT
KLH016  GCTATGTATTTCGTACATTACTGCCAGCCACCATGAATATTGTACCGTACCATAAATACT
Diff.   16129 (V)

Positions 16144-16203
rCRS    TGACCACCTGTAGTACATAAAAACCCAATCCACATCAAAACCCCCTCCCCATGCTTACAA
KLH016  TGACCACCTGCAGTACATAAAAACCCAATCCACATCAAAACCCCCTCCCCATGCTTACAA
Diff.   16154 (S)

Positions 16204-16263
rCRS    GCAAGTACAGCAATCAACCCTCAACTATCACACATCAACTGCAACTCCAAAGCCACCCCT
KLH016  GCAAGTACAGCAATCAACCCTCAACTATCACACATCAACTGCAATTCCAAAGCCACCCCT
Diff.   16248 (S)

Positions 16264-16323
rCRS    CACCCACTAGGATACCAACAAACCTACCCACCCTTAACAGTACATAGTACATAAAGCCAT
KLH016  CACCCACTAGGATACCAACAAACCTACCCACCCTTAACAGTACATAGTACATAAAGCCAT
Diff.   none

Positions 16324-16383
rCRS    TTACCGTACATAGCACATTACAGTCAAATCCCTTCTCGTCCCCATGGATGACCCCCCTCA
KLH016  TTACCGTACATAGCACATTACAGTCAAATCCCTTCTCGCCCCCATGGATGACCCCCCTCA
Diff.   16362 (S)

Positions 16384-16443
rCRS    GATAGGGGTCCCTTGACCACCATCCTCCGTGAAATCAATATCCCGCACAAGAGTGCTACT
KLH016  GATAGGGGTCCCTTGACCACCATCCTCCGTGAAATCAATATCCCGCACAAGAGTGCTACT
Diff.   none

Positions 16444-16503
rCRS    CTCCTCGCTCCGGGCCCATAACACTTGGGGGTAGCTAAAGTGAACTGTATCCGACATCTG
KLH016  CTCCTCGCTCCGGGCCCATAACACTTGGGGGTAGCTAAAGTGAACTGTATCCGACATCTG
Diff.   none

Positions 16504-16563
rCRS    GTTCCTACTTCAGGGTCATAAAGCCTAAATAGCCCACACGTTCCCCTTAAATAAGACATC
KLH016  GTTCCTACTTCAGGGCCATAAAGCCTAAATAGCCCACACGTTCCCCTTAAATAAGACATC
Diff.   16519 (S)

Positions 16564-16569 and 1-54
rCRS    ACGATGGATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGG
KLH016  ACGATGGATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGG
Diff.   none

Positions 55-114
rCRS    TATTTTCGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCC
KLH016  TATTTTCGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCC
Diff.   73 (S)

Positions 115-174
rCRS    TATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATCCTATTATTTATCGCACCTACGTTC
KLH016  TATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATCCCATTATTTATCGCACCTACGTTC
Diff.   152 (S)

Positions 175-234
rCRS    AATATTACAGGCGAACATACTTACTAAAGTGTGTTAATTAATTAATGCTTGTAGGACATA
KLH016  AATATTACAGGCGAACATACTTACTAAAGTGTGTTAATTAATCAATGCTTGTAGGACATA
Diff.   217 (S)

Positions 235-294
rCRS    ATAATAACAATTGAATGTCTGCACAGCCACTTTCCACACAGACATCATAACAAAAAATTT
KLH016  ATAATAACAATTGAATGTCTGCACAGCCGCTTTCCACACAGACATCATAACAAAAAATTT
Diff.   263 (S)

Positions 295-353
rCRS    CCACCAAACCCCCCCTCCCCC-GCTTCTGGCCACAGCACTTAAACACATCTCTGCCAAAC
KLH016  CCACCAAACCCCCCCTCCCCCCGCTTCTGGCCACAGCACTTAAACATATCTCTGCCAAAC
Diff.   315.1 (I), 340 (S)

Positions 354-413
rCRS    CCCAAAAACAAAGAACCCTAACACCAGCCTAACCAGATTTCAAATTTTATCTTTTGGCGG
KLH016  CCCAAAAACAAAGAACCCTAACACCAGCCTAACCAGATTTCAAATTTTATCTTTTGGCGG
Diff.   none

Positions 414-473
rCRS    TATGCACTTTTAACAGTCACCCCCCAACTAACACATTATTTTCCCCTCCCACTCCCATAC
KLH016  TATGCACTTTTAACAGTCACCCCCCAACTAACACATTATTTTCCCCTCCCACTCCCATAC
Diff.   none

Positions 474-533
rCRS    TACTAATCTCATCAATACAACCCCCGCCCATCCTACCCAGCACACACACACCGCTGCTAA
KLH016  TACTAATCTCATCAATACAACCCCCGCCCATCCTGCCCAGCACACACACACCGCTGCTAA
Diff.   508 (S)

Positions 534-576
rCRS    CCCCATACCCCGAACCAACCAAACCCCAAAGACACCCCCCACA
KLH016  CCCCATACCCCGAACCAACCAAACCCCAAAGACACCCCCCACA
Diff.   none

Figure 4.5 (b): The haplotype of Kalashi individual (KLH016) for the entire mtDNA control region. The haplotype observed for KLH016 after alignment of the reconstructed sequence with the rCRS was 16051G, 16129C, 16154C, 16248T, 16362C, 16519C, 73G, 152C, 217C, 263G, 315.1C, 340T, 508G. I: Insertion, S: Transition, V: Transversion.

4.5. Identification of Haplotypes and Assignment of Haplogroups
Based on the differences from the rCRS, 70 different haplotypes (54 of them unique) were observed in the Makrani population and 14 different haplotypes (5 of them unique) in the Kalashi population. In the Makrani population, 16 haplotypes were shared by more than one individual, whereas 9 haplotypes were shared by more than one individual in the Kalashi population. Based on the mtDNA control region haplotypes, a haplogroup was assigned to each individual in both populations (Table 4.1 a and b). Four variant positions in the mtDNA control region (309 C insertions, 315 C insertions, 524 indels and site 16519) were disregarded during haplogroup assignment. The first three are length polymorphisms, which are often difficult to determine precisely by Sanger sequencing, so it was safer to exclude them. Site 16519 is straightforward to determine by sequencing, but it is a hypermutable position (a mutational hotspot, as evidenced by its extreme recurrence in the mtDNA phylogeny) and is therefore of very little phylogenetic value in haplogroup assignment. Two haplotypes observed in the Makranis, both carrying a characteristic combination of two mutations in HVS-II (154C and 194T), could not be confidently assigned to any known (sub)haplogroup, although the presence of both 16223T and 489C indicates membership within macrohaplogroup M; this lineage was therefore tentatively assigned to haplogroup "M-154-194".
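The two bookkeeping steps described in this section, dropping the disregarded positions before haplogroup assignment and counting unique versus shared haplotypes, can be sketched as follows. The function names are hypothetical, and treating "309ins", "315ins" and "524ins" as point-insertion labels of the form 309.1C, 315.1C or 524.1A is an assumption of this sketch; deletions near 523-524 are not handled.

```python
import re
from collections import Counter

# Sites named in the text as disregarded during haplogroup assignment.
LENGTH_VARIANT_SITES = {309, 315, 524}  # insertions after these positions
HOTSPOT_SITES = {16519}                 # hypermutable site


def strip_disregarded(variants):
    """Return the variants that remain after removing the disregarded sites."""
    kept = []
    for variant in variants:
        match = re.match(r"^(\d+)(\.\d+)?", variant)
        if match is None:
            raise ValueError(f"unrecognised variant label: {variant}")
        position = int(match.group(1))
        is_insertion = match.group(2) is not None
        if position in HOTSPOT_SITES:
            continue
        if is_insertion and position in LENGTH_VARIANT_SITES:
            continue
        kept.append(variant)
    return kept


def unique_and_shared(haplotype_ids):
    """Return (haplotypes seen once, haplotypes seen in more than one individual)."""
    counts = Counter(haplotype_ids)
    unique = sum(1 for c in counts.values() if c == 1)
    return unique, len(counts) - unique


# Haplotype of MKH080 as listed in the caption of figure 4.5 (a).
mkh080 = ["16145A", "16176T", "16223T", "16261T", "16311C",
          "16519C", "73G", "263G", "315.1C", "489C"]
print(strip_disregarded(mkh080))

# Toy list of haplotype labels (hypothetical, not study data).
print(unique_and_shared(["h1", "h2", "h2", "h3", "h5", "h5", "h5"]))
```

Other insertions, such as 573.1C, are kept because only the named sites are filtered.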

Table 4.1a: The estimated haplotypes and haplogroups in Makrani population

Sr. No. | Sample ID | Haplotype ID | Differences to the rCRS (309ins, 315ins, 524ins and 16519 were disregarded) | Haplogroup | GenBank Accession Number
1 | MKH001 | h57 | 16071T, 16093C, 16129A, 16145A, 16187T, 16189C, 16213A, 16223T, 16234T, 16265C, 16278T, 16286G, 16294T, 16311C, 16360T, 16527T, 73G, 151T, 152C, 182T, 186A, 189C, 195C, 198T, 247A, 263G, 297G, 316A | L1c2a1a | KM358171
2 | MKH002 | h31 | 16209C, 152C, 263G | R30a1b | KM358172
3 | MKH003 | h19 | 16129A, 16189C, 16223T, 16249C, 16311C, 16359C, 16519C, 73G, 195C, 263G, 489C | M1a1 | KM358173
4 | MKH004 | h25 | 16182C, 16183C, 16189C, 16223T, 16278T, 16290T, 16294T, 16309G, 16390A, 73G, 146C, 152C, 195C, 263G | L2a1b1a | KM358174
5 | MKH005 | h11 | 16124C, 16223T, 16319A, 73G, 150T, 152C, 263G | L3d1a1a | KM358175
6 | MKH006 | h36 | 16243C, 16519C, 153G, 263G | ? | KM358176
7 | MKH007 | h25 | 16182C, 16183C, 16189C, 16223T, 16278T, 16290T, 16294T, 16309G, 16390A, 73G, 146C, 152C, 195C, 263G | L2a1b1a | KM358177
8 | MKH008 | h24 | 16172C, 16189C, 16192T, 16223T, 16292T, 16325C, 16519C, 73G, 146C, 189G, 194T, 195C, 204C, 207A, 263G | W6 | KM358178
9 | MKH009 | h40 | 16519C, 199C, 263G | ? | KM358179
10 | MKH011 | h63 | 16069T, 16093C, 16126C, 16145A, 16172C, 16222T, 16259T, 16261T, 73G, 146C, 242T, 263G, 295T, 462T, 489C | J1b1a1 | KM358180
11 | MKH012 | h41 | 16519C, 263G | ? | KM358181
12 | MKH013 | h4 | 16093C, 16189C, 263G | ? | KM358182
13 | MKH014 | h17 | 16126C, 16163G, 16186T, 16189C, 16294T, 16325C, 16519C, 73G, 152C, 195C, 263G | T1a8a | KM358183
14 | MKH015 | h21 | 16129A, 16148T, 16168T, 16172C, 16187T, 16188G, 16189C, 16223T, 16230G, 16278T, 16293G, 16311C, 16320T, 93G, 95C, 185A, 189G, 236C, 247A, 263G | L0a1b | KM358184
15 | MKH016 | h30 | 16182C, 16183C, 16189C, 16192T, 16223T, 16278T, 16290T, 16294T, 16309G, 16390A, 73G, 146C, 152C, 195C, 263G | L2a1b1a | KM358185
16 | MKH017 | h30 | 16182C, 16183C, 16189C, 16192T, 16223T, 16278T, 16290T, 16294T, 16309G, 16390A, 73G, 146C, 152C, 195C, 263G | L2a1b1a | KM358186
17 | MKH018 | h18 | 16126C, 16163G, 16186T, 16189C, 16294T, 16519C, 73G, 152C, 195C, 263G, 372C | T1a1'3 | KM358187
18 | MKH019 | h34 | 16243C, 16311C, 16519C, 204C, 263G | ? | KM358188
19 | MKH020 | h30 | 16182C, 16183C, 16189C, 16192T, 16223T, 16278T, 16290T, 16294T, 16309G, 16390A, 73G, 146C, 152C, 195C, 263G | L2a1b1a | KM358189
20 | MKH021 | h53 | 16192T, 16270T, 73G, 150T, 152C, 263G, 264T, 275A | U5b | KM358190
21 | MKH022 | h35 | 16243C, 16519C, 153G, 200G, 263G | ? | KM358191
22 | MKH023 | h37 | 16309G, 16318T, 16519C, 73G, 151T, 152C, 263G | U7a | KM358192
23 | MKH024 | h31 | 16209C, 152C, 263G | R30a1b | KM358193
24 | MKH025 | h30 | 16182C, 16183C, 16189C, 16192T, 16223T, 16278T, 16290T, 16294T, 16309G, 16390A, 73G, 146C, 152C, 195C, 263G | L2a1b1a | KM358194
25 | MKH026 | h58 | 16071T, 16172C, 16223T, 16293T, 16311C, 16355T, 16362C, 16399G, 16519C, 73G, 189G, 244G, 263G | L4b2a2 | KM358195
26 | MKH027 | h14 | 16126C, 16223T, 16311C, 16519C, 73G, 204C, 217C, 263G, 482C, 489C | M3a1 | KM358196
27 | MKH028 | h9 | 16114A, 16129A, 16213A, 16223T, 16278T, 16355T, 16362C, 16390A, 73G, 150T, 152C, 182T, 195C, 198T, 204C, 263G, 418T | L2b1a | KM358197
28 | MKH029 | h68 | 16051G, 16145A, 16206C, 73G, 152C, 263G | U2a | KM358198
29 | MKH030 | h25 | 16182C, 16183C, 16189C, 16223T, 16278T, 16290T, 16294T, 16309G, 16390A, 73G, 146C, 152C, 195C, 263G | L2a1b1a | KM358199
30 | MKH031 | h10 | 16124C, 16223T, 16294T, 16319A, 73G, 150T, 152C, 263G | L3d1a1a | KM358200
31 | MKH032 | h54 | 16188T, 16223T, 16231C, 16264T, 16362C, 16519C, 73G, 146C, 263G, 461T, 489C | M6a1b | KM358201
32 | MKH033 | h33 | 16209C, 16223T, 16311C, 16519C, 73G, 150T, 189G, 200G, 263G | L3f1b4a | KM358202
33 | MKH034 | h50 | 16220C, 16265G, 16298C, 16362C, 73G, 150T, 152C, 249d, 263G | F3b1 | KM358203
34 | MKH035 | h16 | 16126C, 16163G, 16186T, 16189C, 16274A, 16294T, 16519C, 73G, 263G, 512G, 519G | T1a7 | KM358204
35 | MKH036 | h50 | 16220C, 16265G, 16298C, 16362C, 73G, 150T, 152C, 249d, 263G | F3b1 | KM358205
36 | MKH037 | h11 | 16124C, 16223T, 16319A, 73G, 150T, 152C, 263G | L3d1a1a | KM358206
37 | MKH038 | h62 | 16071T, 16111T, 16278T, 16311C, 16519C, 73G, 150T, 152C, 263G | R2 | KM358207
38 | MKH039 | h47 | 16223T, 16278T, 16286T, 16294T, 16309G, 16390A, 16519C, 73G, 146C, 152C, 195C, 263G | L2a1a2 | KM358208
39 | MKH040 | h37 | 16309G, 16318T, 16519C, 73G, 151T, 152C, 263G | U7a | KM358209
40 | MKH041 | h45 | 16223T, 16297C, 16318T, 16519C, 73G, 93G, 194T, 246C, 263G, 489C | M18a | KM358210
41 | MKH042 | h2 | 16092C, 16311C, 16356C, 263G, 550G | ? | KM358211
42 | MKH043 | h32 | 16209C, 16311C, 73G, 152C, 263G | R30a1b | KM358212
43 | MKH044 | h49 | 16223T, 16234T, 16249C, 16278T, 16294T, 16295T, 16390A, 73G, 146C, 152C, 195C, 263G | L2a1 | KM358213
44 | MKH045 | h64 | 16069T, 16126C, 16145A, 16222T, 16261T, 16295T, 16301T, 16519C, 73G, 263G, 271T, 295T, 462T, 489C | J1b1b | KM358214
45 | MKH046 | h31 | 16209C, 152C, 263G | R30a1b | KM358215
46 | MKH047 | h67 | 16069T, 16114T, 16126C, 16193T, 16519C, 73G, 152C, 263G, 295T, 462T, 489C | J1d | KM358216
47 | MKH048 | h37 | 16309G, 16318T, 16519C, 73G, 151T, 152C, 263G | U7a | KM358217
48 | MKH049 | h46 | 16223T, 16519C, 73G, 154C, 194T, 263G, 489C | M-154-194 | KM358218
49 | MKH050 | h65 | 16069T, 16126C, 16239T, 16366T, 16399G, 73G, 150T, 195C, 263G, 295T, 489C, 573.1C, 573.2C | J2a2 | KM358219
50 | MKH052 | h17 | 16126C, 16163G, 16186T, 16189C, 16294T, 16325C, 16519C, 73G, 152C, 195C, 263G | T1a8a | KM358220
51 | MKH055 | h17 | 16126C, 16163G, 16186T, 16189C, 16294T, 16325C, 16519C, 73G, 152C, 195C, 263G | T1a8a | KM358221
52 | MKH056 | h15 | 16126C, 16223T, 16311C, 16519C, 73G, 263G, 482C, 489C | M3 | KM358222
53 | MKH057 | h69 | 16051G, 16172C, 73G, 263G | U2b1 | KM358223
54 | MKH058 | h35 | 16243C, 16519C, 153G, 200G, 263G | ? | KM358224
55 | MKH061 | h15 | 16126C, 16223T, 16311C, 16519C, 73G, 263G, 482C, 489C | M3 |
56 | MKH062 | h15 | 16126C, 16223T, 16311C, 16519C, 73G, 263G, 482C, 489C | M3 | KM358225
57 | MKH063 | h17 | 16126C, 16163G, 16186T, 16189C, 16294T, 16325C, 16519C, 73G, 152C, 195C, 263G | T1a8a | KM358226
58 | MKH067 | h41 | 16519C, 263G | ? | KM358227
59 | MKH068 | h38 | 16311C, 16519C, 93G, 150T, 263G | ? | KM358228
60 | MKH069 | h7 | 16093C, 16189C, 16192T, 16223T, 16278T, 16284G, 16294T, 16309G, 16390A, 73G, 143A, 146C, 152C, 195C, 263G | L2a1 | KM358229
61 | MKH071 | h69 | 16051G, 16172C, 73G, 263G | U2b1 | KM358230
62 | MKH072 | h17 | 16126C, 16163G, 16186T, 16189C, 16294T, 16325C, 16519C, 73G, 152C, 195C, 263G | T1a8a | KM358231
63 | MKH073 | h20 | 16129A, 16223T, 16519C, 73G, 154C, 194T, 263G, 489C | M-154-194 | KM358232
64 | MKH074 | h26 | 16182C, 16183C, 16189C, 16223T, 16278T, 16290T, 16294T, 16309G, 16390A, 73G, 146C, 152C, 195C, 263G, 498.1C, 526.1G, 573.1C | L2a1b1a | KM358233
65 | MKH075 | h69 | 16051G, 16172C, 73G, 263G | U2b1 | KM358234
66 | MKH077 | h25 | 16182C, 16183C, 16189C, 16223T, 16278T, 16290T, 16294T, 16309G, 16390A, 73G, 146C, 152C, 195C, 263G | L2a1b1a | KM358235
67 | MKH078 | h23 | 16145A, 16176T, 16223T, 16261T, 16311C, 16519C, 73G, 263G, 489C | M4 | KM358236
68 | MKH079 | h51 | 16214T, 16217C, 16335G, 16519C, 73G, 152C, 246C, 263G | HV2a | KM358237
69 | MKH080 | h23 | 16145A, 16176T, 16223T, 16261T, 16311C, 16519C, 73G, 263G, 489C | M4 | KM358238
70 | MKH081 | h41 | 16519C, 263G | ? | KM358239
71 | MKH082 | h70 | 16051G, 16114T, 16189C, 16192.1T, 16223T, 16293T, 16311C, 16316G, 16355T, 16362C, 16399G, 16519C, 73G, 146C, 152C, 195C, 244G, 263G, 340T | L4b2b1 | KM358240
72 | MKH083 | h6 | 16093C, 16182C, 16183C, 16189C, 16223T, 16278T, 16290T, 16294T, 16390A, 73G, 146C, 152C, 195C, 263G | L2a1b1a | KM358241
73 | MKH084 | h39 | 16355T, 73G, 195C, 263G, 343T, 499A, 573.1C, 573.2C | U4'9 | KM358242
74 | MKH085 | h52 | 16214T, 16217C, 16335G, 16519C, 73G, 152C, 246C, 263G, 573.1C, 573.2C | HV2a | KM358243
75 | MKH086 | h48 | 16223T, 16278T, 16294T, 16309G, 16390A, 16519C, 73G, 146C, 152C, 195C, 263G | L2a1 | KM358244
76 | MKH087 | h27 | 16182C, 16183C, 16189C, 16210C, 16223T, 16278T, 16290T, 16294T, 16309G, 16390A, 73G, 146C, 152C, 195C, 263G | L2a1b1a | KM358245
77 | MKH088 | h13 | 16126C, 16270T, 16278T, 16294T, 16357C, 16519C, 73G, 150T, 153G, 263G | U5b | KM358246
78 | MKH089 | h66 | 16069T, 16126C, 16193T, 16266T, 16519C, 73G, 152C, 263G, 295T, 324G, 376C, 462T, 489C | J1d | KM358247
79 | MKH090 | h3 | 16092C, 16207G, 16309G, 16318T, 16519C, 73G, 151T, 152C, 263G | U7a | KM358248
80 | MKH091 | h1 | 16086C, 16148T, 16184T, 16223T, 16259T, 16278T, 16319A, 16399G, 16526A, 73G, 150T, 200G, 263G, 489C | M32c | KM358249
81 | MKH092 | h43 | 16256T, 263G | ? | KM358250
82 | MKH093 | h43 | 16256T, 263G | ? | KM358251
83 | MKH094 | h8 | 16093C, 16189C, 16192T, 16223T, 16278T, 16294T, 16309G, 16390A, 16519C, 73G, 146C, 152C, 195C, 263G | L2a1 | KM358252
84 | MKH095 | h12 | 16126C, 16294T, 16296T, 16519C, 73G, 217C, 263G, 490G | T2 | KM358253
85 | MKH096 | h22 | 16145A, 16223T, 16234T, 16261T, 16311C, 16519C, 73G, 263G, 489C | M4 | KM358254
86 | MKH097 | h56 | 16148T, 16172C, 16187T, 16188G, 16189C, 16223T, 16230G, 16242T, 16311C, 16320T, 16519C, 64T, 93G, 152C, 189G, 204C, 207A, 236C, 247A, 263G | L0a2a2 | KM358255
87 | MKH098 | h42 | 263G | ? | KM358256
88 | MKH099 | h49 | 16223T, 16234T, 16249C, 16278T, 16294T, 16295T, 16390A, 73G, 146C, 152C, 195C, 263G | L2a1 | KM358257
89 | MKH100 | h60 | 16071T, 16519C, 73G, 152C, 263G | R2 | KM358258
90 | MKH101 | h29 | 16183C, 16189C, 16207G, 16309G, 16318C, 16519C, 73G, 151T, 152C, 263G, 573.1C, 573.2C | U7a | KM358259
91 | MKH102 | h61 | 16071T, 16519C, 73G, 263G | R2 | KM358260
92 | MKH103 | h5 | 16093C, 16189C, 93G, 263G | ? | KM358261
93 | MKH104 | h28 | 16189C, 16207C, 16309G, 16318C, 16519C, 73G, 151T, 152C, 263G, 573.1C, 573.2C | U7a | KM358262
94 | MKH105 | h35 | 16243C, 16519C, 153G, 200G, 263G | ? | KM358263
95 | MKH106 | h44 | 16234T, 152C, 263G | ? | KM358264
96 | MKH107 | h31 | 16209C, 152C, 263G | R30a1b | KM358265
97 | MKH108 | h60 | 16071T, 16519C, 73G, 152C, 263G | R2 | KM358266
98 | MKH109 | h59 | 16071T, 16519C, 73G, 152C, 185A, 263G | R2 | KM358267
99 | MKH112 | h61 | 16071T, 16519C, 73G, 263G | R2 | KM358268
100 | MKH113 | h55 | 16179T, 16356C, 16519C, 73G, 195C, 228A, 263G, 499A | U4c1 | KM358269

The mtDNA control-region sequences reported here are available in GenBank under the given accession numbers. Polymorphic sites are numbered according to the revised Cambridge Reference Sequence (rCRS) (Andrews et al., 1999). Haplotypes were arranged according to assigned haplogroups.


Table 4.1b: Kalashi samples with estimated haplotypes and haplogroups

Sr. No. | Sample ID | Haplotype ID | Differences to the rCRS (309ins, 315ins, 524ins and 16519 were disregarded) | Haplogroup | GenBank Accession Number
1 | KLH001 | h13 | 16051G, 16129C, 16154C, 16248T, 16362C, 16391A, 16519C, 73G, 152C, 217C, 263G, 340T, 508G | U2e1 | KM358270
2 | KLH002 | h12 | 16069T, 16126C, 16193T, 16274A, 16278T, 73G, 150T, 152C, 263G, 295T, 489C | J2b1a | KM358271
3 | KLH003 | h5 | 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358272
4 | KLH004 | h5 | 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358273
5 | KLH005 | h5 | 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358274
6 | KLH006 | h5 | 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358275
7 | KLH007 | h5 | 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358276
8 | KLH008 | h13 | 16051G, 16129C, 16154C, 16248T, 16362C, 16391A, 16519C, 73G, 152C, 217C, 263G, 340T, 508G | U2e1 | KM358277
9 | KLH009 | h11 | 16071T, 16111T, 16147T, 16203G, 16311C, 16519C, 73G, 150T, 263G | R2 | KM358278
10 | KLH010 | h5 | 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358279
11 | KLH011 | h4 | 16356C, 16519C, 73G, 195C, 198T, 263G, 499A | U4 | KM358280
12 | KLH012 | h5 | 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358281
13 | KLH014 | h14 | 16051G, 16129C, 16154C, 16248T, 16362C, 16519C, 73G, 152C, 217C, 263G, 340T, 508G | U2e1 | KM358282
14 | KLH015 | h5 | 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358283
15 | KLH016 | h14 | 16051G, 16129C, 16154C, 16248T, 16362C, 16519C, 73G, 152C, 217C, 263G, 340T, 508G | U2e1 | KM358284
16 | KLH018 | h5 | 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358285
17 | KLH019 | h7 | 16223T, 16289G, 16519C, 73G, 263G, 489C, 511T | M65a | KM358286
18 | KLH020 | h14 | 16051G, 16129C, 16154C, 16248T, 16362C, 16519C, 73G, 152C, 217C, 263G, 340T, 508G | U2e1 | KM358287
19 | KLH021 | h8 | 16223T, 55.1T, 57C, 59C, 62T, 73G, 146C, 152C, 195C, 263G, 489C | ? | KM358288
20 | KLH022 | h9 | 16134T, 16356C, 16519C, 73G, 152C, 195C, 263G, 499A | U4 | KM358289
21 | KLH023 | h13 | 16051G, 16129C, 16154C, 16248T, 16362C, 16391A, 16519C, 73G, 152C, 217C, 263G, 340T, 508G | U2e1 | KM358290
22 | KLH024 | h5 | 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358291
23 | KLH026 | h12 | 16069T, 16126C, 16193T, 16274A, 16278T, 73G, 150T, 152C, 263G, 295T, 489C | J2b1a | KM358292
24 | KLH027 | h5 | 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358293
25 | KLH028 | h5 | 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358294
26 | KLH029 | h9 | 16134T, 16356C, 16519C, 73G, 152C, 195C, 263G, 499A | U4 | KM358295
27 | KLH030 | h12 | 16069T, 16126C, 16193T, 16274A, 16278T, 73G, 150T, 152C, 263G, 295T, 489C | J2b1a | KM358296
28 | KLH031 | h14 | 16051G, 16129C, 16154C, 16248T, 16362C, 16519C, 73G, 152C, 217C, 263G, 340T, 508G | U2e1 | KM358297
29 | KLH032 | h9 | 16134T, 16356C, 16519C, 73G, 152C, 195C, 263G, 499A | U4 | KM358298
30 | KLH033 | h11 | 16071T, 16111T, 16147T, 16203G, 16311C, 16519C, 73G, 150T, 263G | R2 | KM358299
31 | KLH034 | h12 | 16069T, 16126C, 16193T, 16274A, 16278T, 73G, 150T, 152C, 263G, 295T, 489C | J2b1a | KM358300
32 | KLH035 | h6 | 16240G, 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358301
33 | KLH036 | h13 | 16051G, 16129C, 16154C, 16248T, 16362C, 16391A, 16519C, 73G, 152C, 217C, 263G, 340T, 508G | U2e1 | KM358302
34 | KLH037 | h14 | 16051G, 16129C, 16154C, 16248T, 16362C, 16519C, 73G, 152C, 217C, 263G, 340T, 508G | U2e1 | KM358303
35 | KLH038 | h5 | 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358304
36 | KLH039 | h12 | 16069T, 16126C, 16193T, 16274A, 16278T, 73G, 150T, 152C, 263G, 295T, 489C | J2b1a | KM358305
37 | KLH040 | h13 | 16051G, 16129C, 16154C, 16248T, 16362C, 16391A, 16519C, 73G, 152C, 217C, 263G, 340T, 508G | U2e1 | KM358306
38 | KLH041 | h12 | 16069T, 16126C, 16193T, 16274A, 16278T, 73G, 150T, 152C, 263G, 295T, 489C | J2b1a | KM358307
39 | KLH044 | h5 | 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358308
40 | KLH045 | h12 | 16069T, 16126C, 16193T, 16274A, 16278T, 73G, 150T, 152C, 263G, 295T, 489C | J2b1a | KM358309
41 | KLH046 | h9 | 16134T, 16356C, 16519C, 73G, 152C, 195C, 263G, 499A | U4 | KM358310
42 | KLH047 | h9 | 16134T, 16356C, 16519C, 73G, 152C, 195C, 263G, 499A | U4 | KM358311
43 | KLH048 | h12 | 16069T, 16126C, 16193T, 16274A, 16278T, 73G, 150T, 152C, 263G, 295T, 489C | J2b1a | KM358312
44 | KLH049 | h9 | 16134T, 16356C, 16519C, 73G, 152C, 195C, 263G, 499A | U4 | KM358313
45 | KLH050 | h11 | 16071T, 16111T, 16147T, 16203G, 16311C, 16519C, 73G, 150T, 263G | R2 | KM358314
46 | KLH051 | h4 | 16356C, 16519C, 73G, 195C, 198T, 263G, 499A | U4 | KM358315
47 | KLH052 | h5 | 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358316
48 | KLH053 | h3 | 16354T, 263G | H2a1 | KM358317
49 | KLH054 | h5 | 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358318
50 | KLH056 | h13 | 16051G, 16129C, 16154C, 16248T, 16362C, 16391A, 16519C, 73G, 152C, 217C, 263G, 340T, 508G | U2e1 | KM358319
51 | KLH057 | h5 | 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358320
52 | KLH058 | h12 | 16069T, 16126C, 16193T, 16274A, 16278T, 73G, 150T, 152C, 263G, 295T, 489C | J2b1a | KM358321
53 | KLH059 | h5 | 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358322
54 | KLH060 | h12 | 16069T, 16126C, 16193T, 16274A, 16278T, 73G, 150T, 152C, 263G, 295T, 489C | J2b1a | KM358323
55 | KLH061 | h12 | 16069T, 16126C, 16193T, 16274A, 16278T, 73G, 150T, 152C, 263G, 295T, 489C | J2b1a | KM358324
56 | KLH062 | h5 | 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358325
57 | KLH063 | h4 | 16356C, 16519C, 73G, 195C, 198T, 263G, 499A | U4 | KM358326
58 | KLH064 | h12 | 16069T, 16126C, 16193T, 16274A, 16278T, 73G, 150T, 152C, 263G, 295T, 489C | J2b1a | KM358327
59 | KLH065 | h4 | 16356C, 16519C, 73G, 195C, 198T, 263G, 499A | U4 | KM358328
60 | KLH066 | h13 | 16051G, 16129C, 16154C, 16248T, 16362C, 16391A, 16519C, 73G, 152C, 217C, 263G, 340T, 508G | U2e1 | KM358329
61 | KLH067 | h4 | 16356C, 16519C, 73G, 195C, 198T, 263G, 499A | U4 | KM358330
62 | KLH068 | h12 | 16069T, 16126C, 16193T, 16274A, 16278T, 73G, 150T, 152C, 263G, 295T, 489C | J2b1a | KM358331
63 | KLH069 | h4 | 16356C, 16519C, 73G, 195C, 198T, 263G, 499A | U4 | KM358332
64 | KLH070 | h12 | 16069T, 16126C, 16193T, 16274A, 16278T, 73G, 150T, 152C, 263G, 295T, 489C | J2b1a | KM358333
65 | KLH071 | h4 | 16356C, 16519C, 73G, 195C, 198T, 263G, 499A | U4 | KM358334
66 | KLH072 | h13 | 16051G, 16129C, 16154C, 16248T, 16362C, 16391A, 16519C, 73G, 152C, 217C, 263G, 340T, 508G | U2e1 | KM358335
67 | KLH073 | h5 | 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358336
68 | KLH074 | h11 | 16071T, 16111T, 16147T, 16203G, 16311C, 16519C, 73G, 150T, 263G | R2 | KM358337
69 | KLH075 | h13 | 16051G, 16129C, 16154C, 16248T, 16362C, 16391A, 16519C, 73G, 152C, 217C, 263G, 340T, 508G | U2e1 | KM358338
70 | KLH076 | h12 | 16069T, 16126C, 16193T, 16274A, 16278T, 73G, 150T, 152C, 263G, 295T, 489C | J2b1a | KM358339
71 | KLH077 | h5 | 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358340
72 | KLH078 | h13 | 16051G, 16129C, 16154C, 16248T, 16362C, 16391A, 16519C, 73G, 152C, 217C, 263G, 340T, 508G | U2e1 | KM358341
73 | KLH079 | h9 | 16134T, 16356C, 16519C, 73G, 152C, 195C, 263G, 499A | U4 | KM358342
74 | KLH080 | h9 | 16134T, 16356C, 16519C, 73G, 152C, 195C, 263G, 499A | U4 | KM358343
75 | KLH081 | h9 | 16134T, 16356C, 16519C, 73G, 152C, 195C, 263G, 499A | U4 | KM358344
76 | KLH082 | h9 | 16134T, 16356C, 16519C, 73G, 152C, 195C, 263G, 499A | U4 | KM358345
77 | KLH083 | h1 | 16354T, 16519C, 199C, 263G | H2a1 | KM358346
78 | KLH084 | h13 | 16051G, 16129C, 16154C, 16248T, 16362C, 16391A, 16519C, 73G, 152C, 217C, 263G, 340T, 508G | U2e1 | KM358347
79 | KLH085 | h9 | 16134T, 16356C, 16519C, 73G, 152C, 195C, 263G, 499A | U4 | KM358348
80 | KLH086 | h5 | 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358349
81 | KLH087 | h2 | 16354T, 16519C, 263G | H2a1 | KM358350
82 | KLH088 | h13 | 16051G, 16129C, 16154C, 16248T, 16362C, 16391A, 16519C, 73G, 152C, 217C, 263G, 340T, 508G | U2e1 | KM358351
83 | KLH089 | h5 | 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358352
84 | KLH090 | h9 | 16134T, 16356C, 16519C, 73G, 152C, 195C, 263G, 499A | U4 | KM358353
85 | KLH091 | h9 | 16134T, 16356C, 16519C, 73G, 152C, 195C, 263G, 499A | U4 | KM358354
86 | KLH092 | h9 | 16134T, 16356C, 16519C, 73G, 152C, 195C, 263G, 499A | U4 | KM358355
87 | KLH093 | h12 | 16069T, 16126C, 16193T, 16274A, 16278T, 73G, 150T, 152C, 263G, 295T, 489C | J2b1a | KM358356
88 | KLH094 | h9 | 16134T, 16356C, 16519C, 73G, 152C, 195C, 263G, 499A | U4 | KM358357
89 | KLH095 | h9 | 16134T, 16356C, 16519C, 73G, 152C, 195C, 263G, 499A | U4 | KM358358
90 | KLH097 | h9 | 16134T, 16356C, 16519C, 73G, 152C, 195C, 263G, 499A | U4 | KM358359
91 | KLH098 | h5 | 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358360
92 | KLH099 | h2 | 16354T, 16519C, 263G | H2a1 | KM358361
93 | KLH100 | h9 | 16134T, 16356C, 16519C, 73G, 152C, 195C, 263G, 499A | U4 | KM358362
94 | KLH102 | h9 | 16134T, 16356C, 16519C, 73G, 152C, 195C, 263G, 499A | U4 | KM358363
95 | KLH103 | h14 | 16051G, 16129C, 16154C, 16248T, 16362C, 16519C, 73G, 152C, 217C, 263G, 340T, 508G | U2e1 | KM358364
96 | KLH104 | h14 | 16051G, 16129C, 16154C, 16248T, 16362C, 16519C, 73G, 152C, 217C, 263G, 340T, 508G | U2e1 | KM358365
97 | KLH105 | h5 | 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358366
98 | KLH106 | h9 | 16134T, 16356C, 16519C, 73G, 152C, 195C, 263G, 499A | U4 | KM358367
99 | KLH107 | h9 | 16134T, 16356C, 16519C, 73G, 152C, 195C, 263G, 499A | U4 | KM358368
100 | KLH109 | h9 | 16134T, 16356C, 16519C, 73G, 152C, 195C, 263G, 499A | U4 | KM358369
101 | KLH110 | h5 | 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358370
102 | KLH111 | h5 | 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358371
103 | KLH112 | h11 | 16071T, 16111T, 16147T, 16203G, 16311C, 16519C, 73G, 150T, 263G | R2 | KM358372
104 | KLH113 | h9 | 16134T, 16356C, 16519C, 73G, 152C, 195C, 263G, 499A | U4 | KM358373
105 | KLH114 | h9 | 16134T, 16356C, 16519C, 73G, 152C, 195C, 263G, 499A | U4 | KM358374
106 | KLH115 | h5 | 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358375
107 | KLH116 | h10 | 16071T, 16519C, 16527T, 73G, 152C, 263G | R2 | KM358376
108 | KLH117 | h5 | 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358377
109 | KLH118 | h5 | 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358378
110 | KLH119 | h5 | 16362C, 16519C, 58C, 60.1T, 64T, 263G | R0a'b | KM358379
111 | KLH121 | h10 | 16071T, 16519C, 16527T, 73G, 152C, 263G | R2 | KM358380

The mtDNA control-region sequences reported herein are available in GenBank under the given accession numbers. Polymorphic sites are numbered according to the revised Cambridge Reference Sequence (rCRS) (Andrews et al., 1999). Haplotypes are arranged according to their assigned haplogroups.

4.6 Differences Observed in Haplogroups Estimation Either Manually (PhyloTree mtDNA tree Build 16) or by HaploGrep

During the haplogroup determination of the Makrani samples, discrepancies were observed between the HaploGrep estimates and the manual calls in 22 of the 38 haplogroups found. Furthermore, 8 haplogroup calls made by HaploGrep could not be confirmed manually and remained unassigned (Table 4.2a). In the Kalash population, a total of seven haplogroups were found, and haplogroup estimation was done manually because of discrepancies in all of the haplogroup calls made by HaploGrep (Table 4.2b). The observations of this study therefore suggest that there is room to improve haplogroup estimation by HaploGrep.

Table 4.2a: Differences observed in haplogroup estimation of Makrani population either manually (PhyloTree mtDNA tree Build 16) or by HaploGrep

Sample ID                               Haplogroup estimation   Manual haplogroup
                                        by HaploGrep            determination
MKH006, MKH022, MKH058, MKH105          H13a1b                  ?
MKH009, MKH012, MKH067, MKH081, MKH098  H2a2a                   ?
MKH013, MKH103                          H1f+16093               ?
MKH019                                  H1e1a4                  ?
MKH027                                  M3a1+204                M3a1
MKH042                                  H1b                     ?
MKH044                                  L2a1+143+@..            L2a1
MKH049                                  D4b2b                   M-154-194
MKH057                                  U2                      U2b1
MKH068                                  H1e1a1                  ?
MKH069                                  L2a1b+143               L2a1
MKH071                                  U2                      U2b1
MKH073                                  M5                      M-154-194
MKH075                                  U2                      U2b1
MKH084                                  U9b1                    U4'9
MKH088                                  T                       U5b
MKH092, MKH093                          H1e5                    ?
MKH094                                  L2a1+16189              L2a1
MKH099                                  L2a1+143+@..            L2a1
MKH101                                  U8c                     U7a
MKH104                                  U8c                     U7a
MKH106                                  H13a1d                  ?
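The comparison summarized in Table 4.2a can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary holds a small subset of the rows of Table 4.2a, and the function name `summarize_discrepancies` is hypothetical, not part of HaploGrep or of the workflow used in this study.

```python
# Illustrative subset of Table 4.2a: sample ID -> (HaploGrep call, manual call).
# "?" marks a sample whose HaploGrep call could not be confirmed manually.
calls = {
    "MKH019": ("H1e1a4", "?"),
    "MKH027": ("M3a1+204", "M3a1"),
    "MKH049": ("D4b2b", "M-154-194"),
    "MKH057": ("U2", "U2b1"),
    "MKH084": ("U9b1", "U4'9"),
    "MKH106": ("H13a1d", "?"),
}


def summarize_discrepancies(calls):
    """Split discordant samples into manually reassigned and unconfirmed ones."""
    reassigned, unconfirmed = [], []
    for sample, (haplogrep_call, manual_call) in sorted(calls.items()):
        if haplogrep_call == manual_call:
            continue  # concordant calls are not listed
        if manual_call == "?":
            unconfirmed.append(sample)
        else:
            reassigned.append(sample)
    return reassigned, unconfirmed


reassigned, unconfirmed = summarize_discrepancies(calls)
print("reassigned manually:", reassigned)
print("left unassigned:", unconfirmed)
```

Keeping the two groups separate mirrors the distinction made in the text between calls that were corrected manually and calls that could not be confirmed at all.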


Table 4.2b: Differences observed in haplogroup estimation of Kalashi population either manually (PhyloTree mtDNA tree Build 16) or by HaploGrep

Sample IDs                                        Haplogroup estimation   Manual haplogroup
                                                  by HaploGrep            determination
KLH001, KLH008, KLH014, KLH016, KLH020, KLH023,   U2e1h                   U2e1
KLH031, KLH036, KLH036, KLH040, KLH036, KLH056,
KLH066, KLH072, KLH075, KLH078, KLH084, KLH088,
KLH103, KLH104
KLH003, KLH004, KLH005, KLH006, KLH007, KLH010,   H57                     R0a'b
KLH012, KLH015, KLH018, KLH027, KLH028, KLH044,
KLH054, KLH057, KLH059, KLH073, KLH077, KLH086,
KLH089, KLH098, KLH105, KLH110, KLH111, KLH115,
KLH117, KLH118, KLH119
KLH019                                            M65a+@16311             M65a
KLH021                                            M24                     ?
KLH022, KLH029, KLH032, KLH046, KLH047, KLH049,   U4a1                    U4
KLH079, KLH080, KLH081, KLH082, KLH085, KLH090,
KLH091, KLH092, KLH094, KLH095, KLH097, KLH100,
KLH102, KLH106, KLH107, KLH109, KLH113, KLH114
KLH024, KLH038, KLH052, KLH062                    R0a+60.1T               R0a'b
KLH035                                            H1j8                    R0a'b

4.7 The Haplogroups Diversity within Sub-ethnic group of Kalash Population

The maternal sub-ethnic groups of the Kalash and their haplogroups were tabulated to show the haplogroup diversity within each maternal sub-ethnic group (Table 4.3).

Table 4.3: The haplogroup diversity in each maternal sub-ethnic group of Kalash

Maternal sub-ethnicity   Haplogroups                     Total number of samples/
                                                         sub-ethnic group
Al’ukSher                R2, U4                          2
Al’ukShernawaw           R0a'b                           1
Aspan’i                  R0a'b                           2
Babura dari              U4, H2a1                        4
Bagalie                  J2b1a, R0a'b, U4                7
Bal’o’e                  R0a'b, H2a1, U4, U2e1           9
Baramuk                  R0a'b                           1
Bazik                    J2b1a, U2e1, R0a'b              4
Budadari                 R0a'b                           1
Bulasing                 U4, U2e1, R0a'b                 4
Bumboor                  J2b1a, U2e1                     2
Chagans’eynawaw          J2b1a                           1
Chagansey                U4                              1
Damunawaw                U4                              2
Damudari                 U4                              1
Dhrames                  J2b1a, U4, R0a'b, R2            9
Gil’asur                 R0a'b, U2e1                     2
Jaro’e’                  R0a'b, U2e1                     2
L’agay                   M65a                            1
L’atharuk                U2e1, H2a1, R0a'b, U4           13
Mahadari                 R2, U4, U2e1                    3
Mutimir                  R2, J2b1a, R0a'b                4
Ohramsh                  J2b1a                           1
Quresh                   ?                               1
Raja way                 R0a'b, J2b1a, U2e1              4
Rashmukdari              R0a'b, U4, H2a1                 4
RumoNowaw                U4                              1
Sharakat                 J2b1a, U4, U2e1, R0a'b          5
Sharey                   U4, R2, R0a'b                   4
Tharariek                U2e1, U4, R0a'b, R2             8
Wakoke                   U2e1, U4, R0a'b, J2b1a, R2      7

4.8. Frequency of mtDNA Haplogroups

The frequencies of mtDNA haplogroups in the Makrani and Kalashi populations were calculated. The most frequent haplogroup in the Makrani population was L2a1b1a (a southeastern African haplogroup found mostly in Mozambique), carried by 11% of the samples, whereas the most frequent haplogroup in the Kalashi population was R0a'b, carried by 28.8% of the samples. In the Makrani population, 17% of the mtDNA profiles could not be confidently assigned to any known haplogroup, whereas only 1% of the mtDNA profiles remained unassigned in the Kalashi population. The frequencies of mtDNA haplogroups for the Makrani and Kalashi populations are summarized in pie charts (Fig. 4.6 a and b, respectively).
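As a worked example of the frequency calculation, the short Python sketch below converts haplogroup counts into percentages. The counts are those listed for the Kalashi population (N = 111) in Table 4.7; the function name `haplogroup_frequencies` is illustrative only and does not refer to any software used in this study.

```python
from collections import Counter

# Haplogroup counts for the Kalashi population as listed in Table 4.7.
kalashi_counts = Counter({
    "R0a'b": 32, "U4": 31, "U2e1": 19, "J2b1a": 16,
    "R2": 7, "H2a1": 4, "M65a": 1, "?": 1,
})


def haplogroup_frequencies(counts):
    """Return {haplogroup: percentage}, most frequent haplogroup first."""
    total = sum(counts.values())
    return {hg: round(100.0 * n / total, 1) for hg, n in counts.most_common()}


kalashi_freqs = haplogroup_frequencies(kalashi_counts)
for haplogroup, percent in kalashi_freqs.items():
    print(f"{haplogroup:8s}{percent:5.1f}%")
```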


Figure 4.6 (a): Graphical illustration of frequencies of mtDNA based haplogroups in Makrani population.


Figure 4.6 (b): Graphical illustration of frequencies of mtDNA-based haplogroups in Kalashi population.

4.9. The Construction of Median Joining (MJ) Networks

Median joining (MJ) networks were constructed from all control region haplotypes to reveal the possible relationships among haplotypes in the Makrani and Kalashi populations (Fig. 4.7 a and b, respectively). The Makrani haplotypes h43, h41, h40, h36, h35, h32, h5 and h4 clustered with the rCRS haplotype in the middle of the network, with h44 being the most frequent haplotype. Substantial divergence was observed among the haplotypes in this population, with several independent branches: the major haplotypes h25 and h30 in the first branch, h15 and h28 in the second branch, and h2 and h17 in the third branch. The median joining network analysis of the Kalash population also showed considerable divergence between haplotypes. This network shows a large number of independent branches giving rise to many sub-branches that are separated by several mutations.


Figure 4.7 (a): Median-joining haplotype network of the Makrani population (70 haplotypes). Mutations 309.1C, 309.1CC, 315.1C, 16182C, 16183C, 16519C, as well as length variation in the AC stretch spanning pos. 515-524, were ignored for network construction. Circle sizes are proportional to the number of mtDNAs with that haplotype and branch lengths are proportional to nucleotide changes.


Figure 4.7 (b): Median-joining haplotype network of the Kalashi population (14 haplotypes). Mutations 309.1C, 309.1CC, 315.1C, 16182C, 16183C, 16519C, as well as length variation in the AC stretch spanning pos. 515-524, were ignored for network construction. Circle sizes are proportional to the number of mtDNAs with that haplotype and branch lengths are proportional to nucleotide changes.
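The branch lengths in these networks reflect the number of differences between haplotypes. The Python sketch below is not the median-joining algorithm itself; it only illustrates, under simplifying assumptions, how pairwise differences between haplotypes can be counted once the mutations ignored for network construction are removed. Haplotypes are represented as lists of variant labels relative to the rCRS (h4, h5 and h9 are taken from the Kalashi haplotype table), each differing label is counted as one change, and length variation in the AC stretch (pos. 515-524) is not handled. The names `IGNORED` and `pairwise_differences` are hypothetical.

```python
# Mutations ignored for network construction (see the captions of Fig. 4.7).
IGNORED = {"309.1C", "309.1CC", "315.1C", "16182C", "16183C", "16519C"}


def pairwise_differences(hap_a, hap_b, ignored=IGNORED):
    """Count variant labels present in exactly one of the two haplotypes."""
    a = set(hap_a) - ignored
    b = set(hap_b) - ignored
    return len(a ^ b)  # symmetric difference


# Kalashi haplotypes expressed as differences to the rCRS.
h4 = ["16356C", "16519C", "73G", "195C", "198T", "263G", "499A"]
h5 = ["16362C", "16519C", "58C", "60.1T", "64T", "263G"]
h9 = ["16134T", "16356C", "16519C", "73G", "152C", "195C", "263G", "499A"]

print("h4 vs h9:", pairwise_differences(h4, h9))
print("h4 vs h5:", pairwise_differences(h4, h5))
```

In this simplified count, the two U4 haplotypes (h4 and h9) lie much closer to each other than either does to the R0a'b haplotype h5.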

4.10. The Occurrence and Distribution of Nucleotide Variations in mtDNA Control Region

Comparison of the sequence profiles of Makrani individuals with the rCRS revealed 149 variable sites in the entire mtDNA control region. The distribution of polymorphic sites across the control region of the Makrani samples confirms that the control region of human mitochondrial DNA is highly polymorphic. Three hypervariable segments have been described: HVSI (np16024–16365), HVSII (np73–340) and HVSIII (np438–574). The highest density of polymorphic sites was obtained for HVSI, which contains 86 variable positions in a total length of 342 bp (25.14%); HVSII displays 39 variable sites in 268 bp (14.55%), and HVSIII exhibits a comparatively lower variability with 13 polymorphic sites in 137 bp (9.48%). In contrast, the segment located between HVSI and HVSII (np16366–16569, 1–72) shows 7 polymorphic sites in 275 bp (2.54%), and the segment located between HVSII and HVSIII (np341–437) deviates from the rCRS at 4 positions in 98 bp (4.08%) (Table 4.4a).

Comparison of the sequence data of the Makrani population with the rCRS in the region studied here showed that nucleotide substitutions, insertions and deletions had taken place. Sequence deviations caused by nucleotide substitutions predominate over insertions and deletions. Transitions make up the majority of the nucleotide substitutions; transversions, insertions and deletions were observed at markedly lower frequencies. This excess of transitions may indicate that mispairing during replication is the major source of spontaneous mutations in mitochondrial DNA. Among the transitions, pyrimidine substitutions predominate and T–C transitions occur with particularly high frequency.

Comparison of the sequence profiles of Kalashi individuals with the rCRS revealed 47 variable sites in the entire mtDNA control region. Here too, the distribution of polymorphic sites shows that the control region of human mitochondrial DNA is highly polymorphic. The highest density of polymorphic sites was again obtained for HVSI (np16024–16365), which contains 21 variable positions in a total length of 342 bp (6.1%); HVSII (np73–340) displays 11 variable sites in 268 bp (4.1%), and HVSIII (np438–574) exhibits a slightly lower variability with 4 polymorphic sites in 137 bp (2.91%). The segment located between HVSI and HVSII (np16366–16569, 1–72) shows 11 polymorphic sites in 275 bp (4%), and the segment located between HVSII and HVSIII (98 bp; np341–437) did not show any deviation from the rCRS (Table 4.4b).

Comparison of the sequence data of the Kalashi population with the rCRS in the region studied here showed that nucleotide substitutions and insertions had taken place, whereas no deletion was observed. Transitions make up the majority of the nucleotide substitutions; transversions and insertions were observed at markedly lower frequencies. This excess of transitions may indicate that mispairing during replication is the major source of spontaneous mutations in mitochondrial DNA. Among the transitions, pyrimidine substitutions were predominant and C–T transitions occurred with particularly high frequency.
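The density of polymorphic sites quoted for each segment is the number of variable positions divided by the segment length. The following Python sketch reproduces that calculation from the figures given in the text; the segment lengths are taken as stated above, and the names `SEGMENTS` and `density` are illustrative only.

```python
# (segment, length in bp, variable sites in Makrani, variable sites in Kalashi),
# all values as quoted in the text of this section.
SEGMENTS = [
    ("HVSI",                        342, 86, 21),
    ("between HVSI and HVSII",      275,  7, 11),
    ("HVSII",                       268, 39, 11),
    ("between HVSII and HVSIII",     98,  4,  0),
    ("HVSIII",                      137, 13,  4),
]


def density(variable_sites, length_bp):
    """Percentage of positions in a segment that are polymorphic."""
    return 100.0 * variable_sites / length_bp


for name, length_bp, makrani, kalashi in SEGMENTS:
    print(f"{name:26s} Makrani {density(makrani, length_bp):6.2f}%   "
          f"Kalashi {density(kalashi, length_bp):6.2f}%")

# The per-segment counts add up to the totals reported for each population.
print("Makrani total:", sum(s[2] for s in SEGMENTS))
print("Kalashi total:", sum(s[3] for s in SEGMENTS))
```

Differences in the last decimal place relative to the percentages in the text can arise from rounding as opposed to truncation.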

Table 4.4a: The occurrence and distribution of nucleotide variations in the entire mtDNA control region of Makrani population

Pos.: number of positions; Mut.: total number of mutations.

                      HVSI            Segment between Segment between HVSII           HVSIII
                      (np16024-       HVSI & HVSII    (np73-340)      HVSII & HVSIII  (np438-574)
                      16365)          (np16366-                       (np341-437)
                                      16569, 1-72)
Mutation type         Pos.    Mut.    Pos.    Mut.    Pos.    Mut.    Pos.    Mut.    Pos.    Mut.

Substitutions
Transitions
  Py-Py C–T           37      186     3       3       10      37      2       2       2       5
        T–C           20      124     1       57      9       120     1       1       1       4
  Pu-Pu A–G           10      43      1       4       8       200     0       0       3       3
        G–A           5       19      2       19      7       11      0       0       1       2
  Total               72      372     7       83      34      368     3       3       7       14
Transversions
  Pu-Py A–C           8       31      0       0       2       2       1       1       1       18
        G–C           0       0       0       0       0       0       0       0       0       0
        A–T           2       7       0       0       0       0       0       0       0       0
        G–T           0       0       0       0       0       0       0       0       0       0
  Py-Pu C–A           1       1       0       0       1       1       0       0       0       0
        C–G           2       3       0       0       1       1       0       0       1       1
        T–A           0       0       0       0       0       0       0       0       0       0
        T–G           0       0       0       0       0       0       0       0       0       0
  Total               13      42      0       0       4       4       1       1       2       19
Insertions
  C                   0       0       0       0       0       0       0       0       3       12
  G                   0       0       0       0       0       0       0       0       1       1
  T                   1       1       0       0       0       0       0       0       0       0
  AC                  0       0       0       0       0       0       0       0       0       0
  CA                  0       0       0       0       0       0       0       0       0       0
  Total               1       1       0       0       0       0       0       0       4       13
Deletions
  -G                  0       0       0       0       0       0       0       0       0       0
  -A                  0       0       0       0       1       2       0       0       0       0
  -C                  0       0       0       0       0       0       0       0       0       0
  -T                  0       0       0       0       0       0       0       0       0       0
  Total               0       0       0       0       1       2       0       0       0       0

Abbreviations: HVSI: Hypervariable Segment I, HVSII: Hypervariable Segment II, HVSIII: Hypervariable Segment III, Py: pyrimidine base, Pu: purine base, A: adenine, C: cytosine, G: guanine, T: thymine
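The classification used in Tables 4.4a and 4.4b follows from the chemical class of the reference and variant bases: a substitution within the same class (Pu-Pu or Py-Py) is a transition, and a substitution between classes (Pu-Py or Py-Pu) is a transversion. A minimal Python sketch of this rule is given below; the function name `classify_substitution` is illustrative and does not refer to any tool used in this study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Classify a single-base substitution ref -> alt.

    Returns (kind, classes), e.g. ("transition", "Py-Py").
    """
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a substitution between A, C, G, T: {ref}>{alt}")
    ref_class = "Pu" if ref in PURINES else "Py"
    alt_class = "Pu" if alt in PURINES else "Py"
    kind = "transition" if ref_class == alt_class else "transversion"
    return kind, f"{ref_class}-{alt_class}"


for ref, alt in [("C", "T"), ("A", "G"), ("A", "C"), ("C", "G")]:
    print(f"{ref}-{alt}:", *classify_substitution(ref, alt))
```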

Table 4.4b: The occurrence and distribution of nucleotide variations in the entire mtDNA control region of Kalashi population

Pos.: number of positions; Mut.: total number of mutations.

                      HVSI            Segment between Segment between HVSII           HVSIII
                      (np16024-       HVSI & HVSII    (np73-340)      HVSII & HVSIII  (np438-574)
                      16365)          (np16366-                       (np341-437)
                                      16569, 1-72)
Mutation type         Pos.    Mut.    Pos.    Mut.    Pos.    Mut.    Pos.    Mut.    Pos.    Mut.

Substitutions
Transitions
  Py-Py C–T           10      114     2       34      4       61      0       0       1       1
        T–C           5       122     5       128     5       115     0       0       1       18
  Pu-Pu A–G           4       26      0       0       2       196     0       0       1       19
        G–A           1       16      1       12      0       0       0       0       1       31
  Total               20      278     8       174     11      372     0       0       4       69
Transversions
  Pu-Py A–C           0       0       0       0       0       0       0       0       0       0
        G–C           1       19      0       0       0       0       0       0       0       0
        A–T           0       0       0       0       0       0       0       0       0       0
        G–T           0       0       1       1       0       0       0       0       0       0
  Py-Pu C–A           0       0       0       0       0       0       0       0       0       0
        C–G           0       0       0       0       0       0       0       0       0       0
        T–A           0       0       0       0       0       0       0       0       0       0
        T–G           0       0       0       0       0       0       0       0       0       0
  Total               1       19      1       1       0       0       0       0       0       0
Insertions
  C                   0       0       0       0       0       0       0       0       0       0
  2C                  0       0       0       0       0       0       0       0       0       0
  3C                  0       0       0       0       0       0       0       0       0       0
  T                   0       0       2       33      0       0       0       0       0       0
  AC                  0       0       0       0       0       0       0       0       0       0
  CA                  0       0       0       0       0       0       0       0       0       0
  2CA                 0       0       0       0       0       0       0       0       0       0
  Total               0       0       2       33      0       0       0       0       0       0
Deletions
  -G                  0       0       0       0       0       0       0       0       0       0
  -A                  0       0       0       0       0       0       0       0       0       0
  -C                  0       0       0       0       0       0       0       0       0       0
  -CA                 0       0       0       0       0       0       0       0       0       0
  Total               0       0       0       0       0       0       0       0       0       0

Abbreviations: HVSI: Hypervariable Segment I, HVSII: Hypervariable Segment II, HVSIII: Hypervariable Segment III, Py: pyrimidine base, Pu: purine base, A: adenine, C: cytosine, G: guanine, T: thymine

4.11. Heteroplasmy

4.11.1. Point Heteroplasmy

Point heteroplasmy was observed at 5 different positions (16497A/G, 199C/T, 16168C/A, 16173C/A, 16249T/A), accounting for 13% of the individuals in the Makrani population (Table 4.5). Examples of point heteroplasmy at different positions in the Makrani population are shown in Fig. 4.8 (MKH009, MKH010, MKH029, MKH026). Only one individual in this population (MKH026) presented more than one point heteroplasmy, at positions 16168 and 16173 in HVSI of the control region (Fig. 4.8). In the Kalashi population, a large proportion of the samples showed point heteroplasmy, at 6 different positions (199C/T, 16168C/A, 16169C/A, 297A/C, 412G/A, 16295C/T), accounting for 58.56% of the individuals (Table 4.5). Examples of point heteroplasmy at different positions in the Kalashi population are shown in Fig. 4.9 (KLH025, KLH121, KLH007, KLH112, KLH052, KLH119, KLH066 and KLH087). In this population, three individuals (KLH007, KLH011, KLH081) presented more than one point heteroplasmy in the HVSI and HVSII segments of the control region (Fig. 4.9).

Table 4.5: Point heteroplasmy in the Makrani and the Kalashi populations

Sr. No.   Heteroplasmic   Symbol   No. of samples
          positions                Kalashi   Makrani
1         16497 A/G       R        0         1
2         199 C/T         Y        2         2
3         16168 C/A       M        58        8
4         16173 C/A       M        0         1
5         16169 C/A       M        3         0
6         297 A/C         M        3         0
7         412 G/A         R        1         0
8         16295 C/T       Y        1         0
9         16249 T/A       W        0         1
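The symbols in Table 4.5 are the standard IUPAC ambiguity codes for a mixture of two bases. The Python sketch below shows how a heteroplasmic call written as two bases separated by a slash maps onto its symbol; the function name `heteroplasmy_symbol` is illustrative and the input format is an assumption based on the notation used in the table.

```python
# IUPAC ambiguity codes for mixtures of two bases.
IUPAC_TWO_BASE = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("AC"): "M",
    frozenset("GT"): "K",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
}


def heteroplasmy_symbol(call):
    """Return the IUPAC symbol for a two-base call such as 'C/A'."""
    bases = frozenset(base.strip().upper() for base in call.split("/"))
    if bases not in IUPAC_TWO_BASE:
        raise ValueError(f"not a mixture of two different bases: {call!r}")
    return IUPAC_TWO_BASE[bases]


# Heteroplasmic positions listed in Table 4.5.
table_4_5 = {
    16497: "A/G", 199: "C/T", 16168: "C/A", 16173: "C/A", 16169: "C/A",
    297: "A/C", 412: "G/A", 16295: "C/T", 16249: "T/A",
}
for position, call in table_4_5.items():
    print(position, call, heteroplasmy_symbol(call))
```

The order of the two bases does not matter, so C/A and A/C both map to M.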


Figure 4.8: Point heteroplasmy observed at different positions of mtDNA control region in the Makrani population. rCRS; revised Cambridge Reference Sequence


Figure 4.9: Point heteroplasmies observed at different positions of mtDNA control region in the Kalashi population. rCRS; revised Cambridge Reference Sequence.

4.11.2. Length Heteroplasmy

In the Makrani population, length heteroplasmy was observed in all samples (Table 4.6). Each of the 100 individuals showed at least one length heteroplasmy, and two of them possessed more than one. Length heteroplasmy was observed with the highest frequency (100%) in the common poly-C tract of HVSII (between positions 303–315) of the control region; two Makrani individuals are shown as examples in Fig. 4.10. Moreover, two individuals presented length heteroplasmy in the common poly-C tracts of two different segments of the mtDNA control region (HVSI: np16184–16193 and HVSII: np303–315); one such Makrani individual is shown as an example in Fig. 4.10.

Similarly, length heteroplasmy of the poly-C tract in HVSII (between positions 303–315) of the control region was observed in all Kalashi individuals (n=111). Of these 111 individuals, 16 also showed length heteroplasmy in the poly-C tract of HVSI (np16184–16193) and 23 in the poly-AC tract of HVSIII (np514–525), in addition to the poly-C tract of HVSII (np303–315) (Table 4.6).

During the sequencing of the mtDNA control region, it was observed that the homopolymeric patterns of length heteroplasmy (poly-C types) affected the quality of the sequencing profiles. Different homopolymeric C patterns were observed in the sequencing results: 7+6C (seven Cs, interrupted by one T, followed by six more Cs), 8+6C and 9+6C. All samples with the 7+6C pattern produced good-quality sequencing chromatograms. However, the 8+6C and 9+6C homopolymers produced noise after the poly-C stretch (Fig. 4.10). The quality of sequencing for such homopolymeric patterns was improved by using additional sequencing primers close to the poly-C stretches.
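The poly-C notation used above can be derived directly from the sequence of the tract. The Python sketch below labels an HVSII poly-C tract in the "7+6C" style and flags the patterns that produced noisy chromatograms in this study. The function names and the input format (the tract given as a plain string of C residues interrupted by a single T) are assumptions made for illustration.

```python
import re

# Patterns reported in this study to produce noise after the poly-C stretch.
NOISY_PATTERNS = {"8+6C", "9+6C"}


def poly_c_pattern(tract):
    """Label a poly-C tract interrupted by one T, e.g. 'CCCCCCCTCCCCCC' -> '7+6C'."""
    match = re.fullmatch(r"(C+)T(C+)", tract.upper())
    if match is None:
        raise ValueError(f"not a poly-C tract interrupted by a single T: {tract!r}")
    return f"{len(match.group(1))}+{len(match.group(2))}C"


def needs_additional_primer(tract):
    """True if the tract matches a pattern reported as noisy in this study."""
    return poly_c_pattern(tract) in NOISY_PATTERNS


for tract in ("CCCCCCCTCCCCCC", "CCCCCCCCTCCCCCC", "CCCCCCCCCTCCCCCC"):
    print(poly_c_pattern(tract), needs_additional_primer(tract))
```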

Table 4.6: The length heteroplasmy distribution along the mtDNA control region of the Makrani and Kalashi populations

                                    Numbers of heteroplasmy and frequency
                                    Makrani   Kalashi
HVSI     poly-C (16184–16193)       02        16
HVSII    poly-C (303–315)           100       111
HVSIII   poly-AC (514–525)          00        23
         poly-C (568–573)           00        00

Figure 4.10: Chromatograms showing the homopolymeric patterns of length heteroplasmy in the Makrani Population


4.12. Comparison of Haplogroup Frequencies and Continental Origins in Subpopulations of Pakistan

The haplogroup frequencies and continental origins observed in the Makranis and Kalashi during this study were compared with those of other previously studied subpopulations of Pakistan. The comparative analysis revealed that the most frequent haplogroup in the Makrani population is L2a1b1a (a southeastern African haplogroup found mostly in Mozambique), showing a high degree of genetic association with southeast Africans. In the Makranis, African haplogroups (28%) including L2a1b1a, L2a1, L3d1a1a, L2b1a, L3f1b4a, L4b2a2, M1a1, L0a1b, L1c2a1a, L4b2b1, L2a1a2 and L0a2a2; West Eurasian haplogroups (26%) including U7a, T1a8a, U5b, J1d, HV2a, T1a7, U4c1, U4'9, J1b1b, W6, T2, T1a1'3, J2a2 and J1b1a1; South Asian haplogroups (24%) including R2, R30a1b, M3, M4, U2b1, M3a1, M6a1b, M18a and U2a; and the East Asian haplogroup F3b1 (2%) were observed. The mtDNA of the remaining 20% of the sampled individuals could not be confidently assigned to a continental origin. On the other hand, Western Eurasian mtDNA haplogroups (U4, U2e1, R0a'b, R2, H2a1 and J2b1a) were found to be the most prominent in the Kalash. Only one haplogroup found in the Kalashi (M65a) is of South Asian origin, and one sample could not be confidently assigned to any known sub-haplogroup or origin. The haplogroups R0a'b and U4 were the most frequent in the Kalash (28.8% and 27.9%, respectively). Of these two, only U4 was found in Pathans, at low frequency (0.9%), and neither was found in the Makranis or Saraikis of Pakistan. L2a1b1a is the most frequent haplogroup (11%) in the Makranis and was not found in the Kalashi, Saraikis or Pathans (Table 4.7).

Table 4.7: The comparison of mtDNA haplogroups’ frequencies and their continental origins among subpopulations of Pakistan

                         Kalash          Makrani         Saraiki         Pathan
                         N=111           N=100           N=85            N=230
                         (Present        (Present        (Hayat et al.,  (Rakha et al.,
                         study)          study)          2014)           2011)
Haplogroup      Origin   n     %         n     %         n     %         n     %
B4b1                     0     0         0     0         0     0         1     0.4
C4a1                     0     0         0     0         0     0         2     0.9
D4                       0     0         0     0         0     0         4     1.7
D4a             EA       0     0         0     0         1     1.1       0     0
D4j1a           EA       0     0         0     0         1     2.3       0     0
D4j1b2          SEA      0     0         0     0         1     1.1       0     0
D4q                      0     0         0     0         0     0         1     0.4
F3b1            EA       0     0         2     2         0     0         0     0
H               WE       0     0         0     0         0     0         9     3.9
H1                       0     0         0     0         0     0         2     0.9
H14a                     0     0         0     0         0     0         1     0.4
H2a+152,16311   SWA      0     0         0     0         1     1.1       0     0
H2a1            WEA      4     3.6       0     0         0     0         1     0.4
H2a2a           SWA      0     0         0     0         3     1.1       0     0
H2b                      0     0         0     0         0     0         6     2.6
H5                       0     0         0     0         0     0         2     0.9
H6                       0     0         0     0         0     0         1     0.4
H7a                      0     0         0     0         0     0         1     0.4
HV              WEA      0     0         0     0         0     0         24    10.4
HV0             WE       0     0         0     0         0     0         1     0.4
HV2             WE       0     0         0     0         0     0         2     0.9
HV2a            WEA      0     0         2     2         1     1.1       0     0
I               WA       0     0         0     0         1     1.1       1     0.4
I1                       0     0         0     0         0     0         4     1.7
J1                       0     0         0     0         0     0         1     0.4
J1b                      0     0         0     0         0     0         2     0.9
J1b1a                    0     0         0     0         0     0         1     0.4
J1b1a1          WEA      0     0         1     1         0     0         0     0
J1b1b           WEA      0     0         1     1         1     1.1       0     0
J1d             WEA      0     0         2     2         0     0         0     0
J2a2            WEA      0     0         1     1         0     0         0     0
J2b                      0     0         0     0         0     0         1     0.4
J2b1a           WEA      16    14.4      0     0         0     0         0     0
K1a                      0     0         0     0         0     0         4     1.7
K1a11                    0     0         0     0         0     0         1     0.4
K2a5                     0     0         0     0         0     0         1     0.4
L0a1b           AF       0     0         1     1         0     0         0     0
L0a2a2          AF       0     0         1     1         0     0         0     0
L1c2a1a         AF       0     0         1     1         0     0         0     0
L2a1            AF       0     0         5     5         0     0         0     0
L2a1a2          AF       0     0         1     1         0     0         0     0
L2a1b1a         AF       0     0         11    11        0     0         0     0
L2b1a           AF       0     0         1     1         0     0         0     0
L3d1a1a         AF       0     0         3     3         0     0         0     0
L3e'i'k'x       EEA/SA   0     0         0     0         3     2.3       0     0
L3f1b4a         AF       0     0         1     1         0     0         0     0
L4b2b1          AF       0     0         1     1         0     0         0     0
M               EE       0     0         0     0         0     0         5     2.2
M-154-194                0     0         2     2         0     0         0     0
M12a                     0     0         0     0         0     0         2     0.9
M18                      0     0         0     0         0     0         1     0.4
M18a            SA       0     0         1     1         2     2.3       0     0
M1a1            AF       0     0         1     1         0     0         0     0
M25                      0     0         0     0         0     0         1     0.4
M2a1a           SA       0     0         0     0         1     1.1       0     0
M2a1a2                   0     0         0     0         0     0         1     0.4
M3              SA       0     0         3     3         0     0         20    8.7
M30             SA       0     0         0     0         1     1.1       7     3
M30+16234       SA       0     0         0     0         1     1.1       0     0
M30b                     0     0         0     0         0     0         1     0.4
M30c1                    0     0         0     0         0     0         2     0.9
M32c                     0     0         1     1         0     0         0     0
M37e                     0     0         0     0         0     0         1     0.4
M3a1            SA       0     0         1     1         0     0         0     0
M3a1+204        SA/AF    0     0         0     0         1     1.1       0     0
M3c1                     0     0         0     0         0     0         2     0.9
M4              SA       0     0         3     3         1     1.1       0     0
M4a                      0     0         0     0         0     0         2     0.9
M4b1                     0     0         0     0         0     0         3     1.3
M4b2                     0     0         0     0         0     0         1     0.4
M5              EEA/SA   0     0         0     0         2     2.3       5     2.2
M5a1                     0     0         0     0         0     0         4     1.7
M5a2a1a         EEA/SA   0     0         0     0         1     3.5       0     0
M5b2            EEA/SA   0     0         0     0         1     1.1       0     0
M5c1            EEA/SA   0     0         0     0         10    11.7      0     0
M65a            SA       1     0.9       0     0         0     0         0     0
M6a1b           SA       0     0         1     1         0     0         0     0
M7b                      0     0         0     0         0     0         1     0.4
N                        0     0         0     0         0     0         1     0.4
N10a            SEA      0     0         0     0         1     1.1       0     0
N1b                      0     0         0     0         0     0         1     0.4
N5                       0     0         0     0         0     0         1     0.5
Pre-M30e                 0     0         0     0         0     0         4     1.7
Pre-R0a2, 3              0     0         0     0         0     0         4     1.7
R0a'b           WEA      32    28.8      0     0         0     0         0     0
R2              WEA      7     6.3       6     6         2     2.3       4     1.7
R30                      0     0         0     0         0     0         1     0.4
R30a                     0     0         0     0         0     0         3     1.3
R30a1b          SA       0     0         5     5         0     0         0     0
R31             SA       0     0         0     0         1     1.1       0     0
R5a1                     0     0         0     0         0     0         1     0.4
R5a2                     0     0         0     0         0     0         2     0.9
R5a2b                    0     0         0     0         0     0         1     0.4
R6                       0     0         0     0         0     0         2     0.9
R9              SA       0     0         0     0         1     1.1       0     0
T1                       0     0         0     0         0     0         3     1.3
T1a                      0     0         0     0         0     0         6     2.6
T1a1'3          WEA      0     0         1     1         0     0         0     0
T1a7            WEA      0     0         1     1         0     0         0     0
T2              WEA      0     0         1     1         0     0         3     1.3
T2b                      0     0         0     0         0     0         3     1.3
T2c                      0     0         0     0         0     0         2     0.9
T2c1b                    0     0         0     0         0     0         1     0.4
U1a                      0     0         0     0         0     0         1     0.4
U2              WEA/SA   0     0         0     0         1     1.1       1     0.4
U2+152          SA       0     0         0     0         2     1.1       0     0
U2a             WEA      0     0         1     1         0     0         1     0.4
U2a1a           WEA/SA   0     0         0     0         3     3.5       0     0
U2b             WE       0     0         0     0         0     0         2     0.9
U2b1            WEA      0     0         3     3         0     0         0     0
U2b2            SA       0     0         0     0         8     9.4       5     2.2
U2c                      0     0         0     0         0     0         5     2.2
U2c'd           SA       0     0         0     0         2     2.3       0     0
U2e1            WEA      19    17.1      0     0         0     0         0     0
U4              WEA      31    27.9      0     0         0     0         2     0.9
U4'9            WEA      0     0         1     1         0     0         0     0
U4a2                     0     0         0     0         0     0         1     0.4
U4a2a           SA       0     0         0     0         2     1.1       0     0
U4c1            WEA      0     0         1     1         0     0         0     0
U5b             WEA      0     0         2     2         0     0         1     0.4
U7              WEA/SA   0     0         0     0         6     7         26    11.3
U7a             WEA      0     0         6     6         7     8.2       0     0
U8c             SA       0     0         0     0         1     2.3       0     0
W                        0     0         0     0         0     0         3     1.3
W3, 5                    0     0         0     0         0     0         3     1.3
W5a                      0     0         0     0         0     0         1     0.4
W6              WEA      0     0         1     1         11    12.9      3     1.3
X2              WA       0     0         0     0         1     1.1       0     0
X2d             WA       0     0         0     0         1     1.1       0     0
Z               EE       0     0         0     0         0     0         1     0.4
?                        1     0.9       17    17        0     0         0     0

N, total number of individuals for each population; n, number of times the haplogroup was observed. Major haplogroups and their frequencies are represented in bold italics.

4.13. Comparative Statistical Analyses of Different Pakistani Subpopulations

The statistical parameters observed in the Makrani and Kalashi populations during this study were compared with those of other previously studied subpopulations of Pakistan. The comparative analysis revealed that, among the Makrani, Kalashi, Saraiki and Pathan subpopulations, the Pathans showed the highest power of discrimination (0.9978) and genetic diversity (0.993) and the lowest random match probability (0.0065). The Kalashi population, in contrast, was the least diverse of the present and previously studied subpopulations of Pakistan, with a genetic diversity of 0.8393, a power of discrimination of 0.832 and a random match probability of 0.16824 (Table 4.8).

Table 4.8: The comparison of diversity parameters estimated from the entire mtDNA control region among subpopulations of Pakistan

                           Makrani            Kalashi            Saraiki            Pathans
                           (Present study)    (Present study)    (Hayat et al.,     (Rakha et al.,
                                                                 2014)              2011)
Sample size                100                111                85                 230
Number of different        70 (of which       14 (of which       63 (of which       193 (of which
haplotypes                 54 unique)         5 unique)          58 unique)         171 unique)
Polymorphic positions      149                47                 140                215
Random match probability   0.0408             0.1682             0.0542             0.0065
Power of discrimination    0.9592             0.832              0.9458             0.9978
Genetic diversity          0.9688             0.8393             0.957              0.993
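The three diversity parameters in Table 4.8 are closely related. The Python sketch below uses the formulas commonly applied to haplotype data: random match probability as the sum of squared haplotype frequencies, power of discrimination as one minus that sum, and genetic diversity as the same quantity scaled by n/(n-1). These formulas are an assumption here, since this section does not state them, although the Makrani and Kalashi values in Table 4.8 appear consistent with them to within rounding. The haplotype counts in the example are a toy data set, not data from this study.

```python
def random_match_probability(counts):
    """Sum of squared haplotype frequencies."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)


def power_of_discrimination(counts):
    """Probability that two randomly drawn individuals differ."""
    return 1.0 - random_match_probability(counts)


def genetic_diversity(counts):
    """Unbiased haplotype diversity: n / (n - 1) * (1 - sum of p_i^2)."""
    n = sum(counts)
    return n / (n - 1) * (1.0 - random_match_probability(counts))


# Toy example: 8 individuals carrying 5 haplotypes observed 3, 2, 1, 1 and 1 times.
toy_counts = [3, 2, 1, 1, 1]
print("RMP:", random_match_probability(toy_counts))
print("PD: ", power_of_discrimination(toy_counts))
print("GD: ", round(genetic_diversity(toy_counts), 4))
```

A population dominated by a few shared haplotypes, as observed in the Kalashi, yields a large sum of squared frequencies and therefore a high random match probability and a low diversity.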

5-DISCUSSION

DNA serves as a molecular passport that records the intercontinental movements of early humans. In addition to the nuclear genome, humans have a second genome, the mitochondrial DNA (mtDNA), which follows a maternal mode of inheritance. Random mutations accumulate over generations to form a familial signature, so a comparison of the mutations in two mtDNA samples reveals ancestral origin and degree of relationship (Radford, 2011; Hellenthal et al., 2014). The present study reports haplotype data for the mtDNA control region (spanning positions 16,024–16,569 and 1–576), including the hypervariable segments (HVSI, HVSII and HVSIII), for two isolated populations of Pakistan, the Makrani and the Kalashi. The entire mitochondrial DNA control region of 100 unrelated Makrani and 111 Kalashi individuals was sequenced. In the Makrani data set, 149 polymorphic positions were observed (Siddiqi et al., 2014), while 47 polymorphic positions were detected in the Kalashi population (Table 4.8).

Overall, the analyses of the entire mtDNA control region sequence revealed an extremely high level of genetic diversity in the Makrani population (0.9688), with a high number of unique haplotypes (54) (Table 4.8). These results are consistent with two other studies involving the Saraiki ethnic group (Hayat et al., 2014) and Pakistani-Karachi (Quintana-Murci et al., 2004). The highest number of unique haplotypes has been reported previously in Pathans, but this was based on a larger sample size (n = 230). The high number of unique haplotypes in the Pathan population is also reflected in its high genetic diversity (Rakha et al., 2011) among the different ethnic groups of Pakistan, closely followed by Hazara, Sindhi and Pakistani-Karachi (Quintana-Murci et al., 2004). The high genetic diversity in the Makrani population (0.9688) is comparable to that of regional ethnic groups from Iran such as Qashqais (0.996), Persians (0.999) and Azeris (1.00) (Derenko et al., 2013).
Similarly high genetic diversities were also observed in Tajiks from Tajikistan (Ovchinnikov et al., 2014) and in Mansi populations (Pimenoff et al., 2008). Moreover, median joining network analysis showed substantial divergence among the haplotypes in the Makrani population (Fig. 4.7 a).

In the case of the Kalash, however, low genetic diversity (0.8393; Table 4.8) was observed, which might be due either to the low number of haplotypes (14, of which only 5 are unique) or to the presence of shared haplotypes. For example, only two shared haplotypes, h5 (16362C, 16519C, 58C, 60.1T, 64T, 263G) and h9 (16134T, 16356C, 16519C, 73G, 152C, 195C, 263G, 499A), represent >50% of the Kalash population. The high frequency of a few haplogroups in the Kalash provides further evidence of low genetic diversity. For example, two haplogroups, R0a'b and U4, constituted ~60% of the Kalash population and may be regarded as Kalash-specific haplogroups (Fig. 4.6 b). Extremely low genetic diversity has been reported in the Waorani population as a consequence of genetic drift due to low population size (Beckerman et al., 2009; Cardoso et al., 2012). Moreover, the lowest possible genetic diversity (0.00), based on analysis of hypervariable segment I (HVSI) of mtDNA, has been reported in the Mlabri population from Thailand (Oota et al., 2005). Out of a total population of about 300 Mlabri individuals, only one haplotype was found when the HVSI of 58 individuals was analyzed. The authors suggested that the exceptionally poor gene pool of this population might be the result of a founder event or bottleneck. Therefore, the low genetic diversity in the Kalash may be explained by genetic drift due to either low population size (Oota et al., 2005) or endogamy. Furthermore, median joining network analysis showed limited divergence among the haplotypes of the Kalash (Fig. 4.7 b).

Of the seven different haplogroups found in the Kalash, a diverse haplogroup distribution was observed within each maternal sub-ethnic group (1-5 haplogroups per maternal sub-ethnic group), despite the Kalash being an isolated population (Table 4.3). In contrast to our study, only a few haplotypes were reported in all analyzed samples from two other isolated populations, the Waorani (three haplotypes) and the Mlabri (one haplotype) (Beckerman et al., 2009; Cardoso et al., 2012).
Therefore, the diversity within the sub-ethnic groups of the Kalash may be an indicator of their strict custom prohibiting marriages within their own sub-ethnic groups.

In the Makranis, the present study revealed a strongly admixed mtDNA pool composed of 28% African haplogroups (L2a1b1a, L2a1, L3d1a1a, L2b1a, L3f1b4a, L4b2a2, M1a1, L0a1b, L1c2a1a, L4b2b1, L2a1a2, L0a2a2), 26% West Eurasian haplogroups (U7a, T1a8a, U5b, J1d, HV2a, T1a7, U4c1, U4'9, J1b1b, W6, T2, T1a1'3, J2a2, J1b1a1) and 24% South Asian haplogroups (R2, R30a1b, M3, M4, U2b1, M3a1, M6a1b, M18a, U2a). The mtDNA of the remaining 20% of the sampled individuals could not be confidently assigned a continental origin (Table 4.1a).

Among the African haplogroups, the L2 lineage was dominant, with eighteen mtDNA sequences (18%) in the Makranis and no traces of this lineage in other subpopulations of Pakistan such as Pathans (Rakha et al., 2011), Saraiki (Hayat et al., 2014) and Kalash (present study). Among the L2 sub-clades, L2a1b1a was highly frequent (11%) and L2a1 moderately frequent (5%), while the remaining sub-clades, L2a1a2 and L2b1a, were found at 1% each in the Makranis. In some earlier investigations, haplogroup L2a has been reported as the most frequent and widespread mtDNA cluster (reaching over 40%) in different ethnic groups from Africa such as Tuareg, Mali, Fali, Western Pygmies and Mozambique Bantu (Watson et al., 1997; Pereira et al., 2001; Salas et al., 2002; Quintana-Murci et al., 2004; Coia et al., 2005). During the present study, the haplotype (16189 16192 16223 16278 16294 16309 16390) of haplogroup L2a1 was observed in the Makranis. This haplotype has been considered the most extensive haplotype in pan-African individuals, especially people belonging to the - Congo family from West Africa, the Afro-Asiatic family including the Hausa from North Africa, and the Niger-Congo family including the Bamileke from Central Africa (Ely et al., 2006). However, a very low frequency of haplogroup L2a has been reported in Persians (1.11%), indicating little gene flow from Africa to Persia, in contrast to the Makranis. Thus, identical mitochondrial haplotypes are often shared among ethnic groups with a common origin, as seen in the case of the Makranis and African populations (Derenko et al., 2013).

The L3 lineage had a frequency of 4% in the Makranis, with two sub-clades, L3d1a1a (3%) and L3f1b4a (1%), and no evidence of this lineage was reported in other subpopulations of Pakistan (Rakha et al., 2011; Hayat et al., 2014; Kalash: present study).
L3 is more closely related to Eurasian haplogroups than to the most divergent African clusters L1 and L2 (Meyer et al., 2001). L3 is the haplogroup from which the mtDNA lineages of all modern humans outside Africa derive. In a recent study, L3d1a1a was seen predominantly in the Fulani people of West-Central Africa (John et al., 2014). L3d has also been found in the Damara people of Africa, in whom L3f was entirely absent (Barbieri et al., 2014), in contrast to the Makranis. The presence of L3f in the Makranis may suggest a genetic contribution from southwestern Africans and some populations of Cameroon, as the frequency of L3f has been reported to be >20% in both southwestern Africans and Cameroonians (Cerny et al., 2009; Cerezo et al., 2011). The sub-clade L3f1b4a has also been reported in Himba and Herero populations as well as several other Bantu-speaking populations from Namibia and of Africa (Barbieri et al., 2014). L3f1b6, found at 1% in Asturias, Spain, diverged from African L3 lineages at least 10,000 years ago (Pardinas et al., 2014). L3d has also been found (1.79%) in the Qashqai people of Iran, living in areas near Baluchistan (Makran Coast), which may further strengthen the suggestion of a common African genetic influence in both the Makranis and the Qashqai as a result of the slave trade from Africa (Derenko et al., 2013). The L0 lineage was observed in the Makranis with two sub-clades, L0a1b (1%) and L0a2a2 (1%), and was not observed in other subpopulations of Pakistan (Rakha et al., 2011; Hayat et al., 2014; Kalash: present study). The L0 lineage predominates in parts of the sub-Saharan African mtDNA gene pool, where the L0d or L0k lineages account for more than 60% (Vigilant et al., 1991; Chen et al., 2000; Knight et al., 2003; Tishkoff et al., 2007). However, haplogroup L0a is most prevalent in Southeast African populations (25% in Mozambique) (Rosa et al., 2004). Moreover, L0a has been reported at low frequency in some other ethnic groups, including the Balanta (5%) and Guineans (1%) (Ridl et al., 2009). In particular, L0a2a1 is the most prevalent sub-clade in Central/West Africa, and L0a2a2a is mostly associated with Bantu-speaking populations (Rito et al., 2013). The L4 lineage was observed in the Makranis in the form of two sub-clades, L4b2a2 (1%) and L4b2b1 (1%), and was not detected in other subpopulations of Pakistan (Rakha et al., 2011; Hayat et al., 2014; Kalash: present study).
Haplogroup L4 is a sister clade of L3, typically found at low frequencies in East and Northeast Africa (Watson et al., 1997; Krings et al., 1999; Kivisild et al., 2004; Tishkoff et al., 2007; Castri et al., 2009; Rito et al., 2013). In some earlier studies, the L4a motif has been found in (Salas et al., 2002) and is also frequent in Tanzania and among the Amhara and Gurages of Ethiopia (Salas et al., 2002; Kivisild et al., 2004; Gonder et al., 2006). Furthermore, this haplogroup is most concentrated in the southern tip of the Arabian Peninsula. Consistent with the present study, there is also evidence of a low frequency (1.8%) of this haplogroup in Rio de Janeiro, Brazil (Bernardo et al., 2014).

Haplogroup L1c reaches its highest frequencies in West and Central Africa, notably among the Mbenga (Quintana-Murci et al., 2008). Other populations in which L1c is particularly prevalent include the Tikar (100%), Baka people from (97%) and (90%). Furthermore, it is also common in São Tomé (20%) and Angola (16-24%). However, L1c2a1a, a sub-clade of L1, was found in the Makranis at a very low frequency (1%) during the present investigation. The macrohaplogroup M, like its sibling N, is a descendant of haplogroup L3. The geographical distribution of M and N is the result of out-of-Africa migrations and the subsequent colonization of the rest of the world. The highest frequencies of macrohaplogroup M worldwide have been observed in Asia, specifically in Bangladesh, India, Japan, Nepal and Tibet, where it ranges from 60% to 80% (Maruyama et al., 2003; Rajkumar et al., 2005; Thangaraj et al., 2006). The sub-clade M1 is the only variant of macrohaplogroup M that has been reported in Africa (Metspalu et al., 2004). Furthermore, India has shown several M lineages that may have arisen directly from the root of haplogroup M. The percentage of individuals carrying this haplogroup in those studies ranged from 13.0 to 55.4% for M1 and from 11.0 to 68.0% for M1a1 (Rajkumar et al., 2005; Winters, 2010). In the present study, a very low frequency (1%) of haplogroup M1a1 was observed in the Makranis, in contrast to previous studies in which no evidence of this haplogroup was found in other subpopulations of Pakistan (Rakha et al., 2011; Hayat et al., 2014; Kalash: present study). The Western Eurasian component was represented by haplogroups J, T and U of the macrohaplogroup R (van Oven and Kayser, 2008). Among the West Eurasian haplogroups observed in the Makranis, haplogroup U was predominant, with 10 mtDNA sequences (10%). Among the U sub-clades, U7a displays the highest percentage (6%) in the Makranis.
A few sequences were assigned to other branches of U, such as U5b (2%), U4c1 (1%) and U4'9 (1%), in the Makranis. U5b has also been reported at a very low frequency (0.4%) in Pathans (Rakha et al., 2011). The sub-clade U5b2a is characterized by its presence in Poland, Slovakia and the Czech Republic, showing its Central European origin (Malyarchuk et al., 2010). The sub-clade U7a (the most frequent U sub-clade) has also been found in Saraikis (8.7%), another ethnic group of Pakistan (Hayat et al., 2014), and was not observed in Pathans (Rakha et al., 2011) or Kalash (present study). The highest frequencies (up to 10%) of haplogroup U7a have also been recorded in some Iranian populations and in Gujarat (over 12%), the westernmost state of India (Quintana-Murci et al., 2004; Metspalu et al., 2004; Terreros et al., 2011; Derenko et al., 2013). Haplogroup U7 has been reported in three phylogenetic clusters based on its distribution: either Southwest Asian/Indian, such as U7a and U7c, or European, such as U7b. Haplogroup U7a has been reported in Tajiks at a frequency of 4.4% (Ovchinnikov et al., 2014), which is consistent with the Makranis (Siddiqi et al., 2014). The T lineage was observed in the Makranis in the form of four sub-clades: T1a8a (5%), T1a1'3 (1%), T1a7 (1%) and T2 (1%). This lineage was not found in Saraikis (Hayat et al., 2014) or Kalash (present study) of Pakistan. However, only the T2 sub-clade of the T lineage was observed in Pathans (1.3%) of Pakistan (Rakha et al., 2011) and in Indo-European-speaking Persians (7.18%) and Azeris (18.2%) living in Iran (Derenko et al., 2013). This lineage has been detected in the Amhara and the Tigrai of Ethiopia, suggesting that its contribution may be due to bidirectional migrations between Europe and Africa (Kivisild, 2004). The presence of the T lineage in the Makranis may support the idea of slave trade from Africa to South Asia (India), as suggested previously (Kivisild, 2004). Two sub-clades of the T lineage (8.8%), T1a1'3 and T2b, have been found in Tajiks with equal frequencies in a recent study (Ovchinnikov et al., 2014). In this study, the sequence similarity cannot be ignored between the rare T1a1'3 haplotype (16126C, 16163G, 16186T, 16189C, 16294T, 16519C, 73G, 152C, 195C, 263G, 309.1C, 315.1C, 372C) in the Makrani sample MKH018 and a sample from the Region of Republican Subordination of Tajikistan (16126, 16163, 16186, 16189, 16294, 16519, 73, 152, 195, 263, 309.1C, 315.1C) (Ovchinnikov et al., 2014), which may suggest their common genetic origin.
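The similarity between the two T1a1'3 haplotypes listed above can be made explicit with a short script. The following is only a minimal sketch and not the procedure used in this study: it assumes that a variant written without a base (e.g. 16126) refers to the same position as one written with a base (e.g. 16126C), and the names `position_key` and `compare_haplotypes` are hypothetical.

```python
import re

# Variants exactly as listed in the text for the two samples.
MAKRANI_MKH018 = ["16126C", "16163G", "16186T", "16189C", "16294T", "16519C",
                  "73G", "152C", "195C", "263G", "309.1C", "315.1C", "372C"]
TAJIK_SAMPLE = ["16126", "16163", "16186", "16189", "16294", "16519",
                "73", "152", "195", "263", "309.1C", "315.1C"]


def position_key(variant):
    """Return the position part of a variant: '16126C' -> '16126', '309.1C' -> '309.1'."""
    match = re.match(r"\d+(?:\.\d+)?", variant)
    if match is None:
        raise ValueError(f"unrecognised variant: {variant!r}")
    return match.group(0)


def compare_haplotypes(a, b):
    """Compare two variant lists by position only (assumption: bases are not compared)."""
    keys_a = {position_key(v) for v in a}
    keys_b = {position_key(v) for v in b}
    return {"shared": keys_a & keys_b,
            "only_a": keys_a - keys_b,
            "only_b": keys_b - keys_a}


result = compare_haplotypes(MAKRANI_MKH018, TAJIK_SAMPLE)
print(len(result["shared"]), sorted(result["only_a"]), sorted(result["only_b"]))
```

Under these assumptions the two haplotypes share twelve positions and differ only at position 372, which is listed for the Makrani sample alone.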
The J lineages were found at moderate frequency (5%) in the Makranis, with the sub-clades J1b1b (1%), J2a2 (1%), J1b1a1 (1%) and J1d (2%). The two sub-clades J1b1a and J2a1a have been considered indicators of the expansions from the Near East towards Europe in the Late Glacial period. Furthermore, J2b1a has been suggested as an exclusive mtDNA marker for Europeans (Pala et al., 2012). In a recent study, haplogroup J1 was found (2.2%) in Tajiks (Ovchinnikov et al., 2014). As a result of northwest European expansion, the J1b1a1a sub-clade has been found in Georgians and Russians (Forster et al., 2004), similar to the Makranis, which may suggest their common genetic origin. The present study reports HV2a (2%), which has been found in Saraikis (1.1%) (Hayat et al., 2014) but was not observed in Pathans (Rakha et al., 2011) or Kalash (present study). The South Asian component comprised twenty-four mtDNA sequences in the Makranis, and it has been suggested that the macrohaplogroups M and R are South Asian specific lineages (van Oven and Kayser, 2008). Macrohaplogroup R was represented by eleven mtDNA sequences (11%) belonging to R2 (6%) and R30a1b (5%) in the Makranis. Haplogroup R2 was found at 6.3% in Kalash (present study), 2.3% in the Saraiki population (Hayat et al., 2014) and 1.7% in Pathans (Rakha et al., 2011). Furthermore, haplogroup R2 has been reported at 3.31% in Persians and 2.68% in Azeris (Derenko et al., 2013). Haplogroup R2 is concentrated in southern Pakistan and in India and is present at low frequencies in most of the adjacent regions, including the Near East, the Caucasus, the Iranian Plateau, the Arabian Peninsula and Central Asia (Quintana-Murci et al., 2004; Al-Abri et al., 2012). The extensive sequencing of complete mtDNAs from a large part of the Iranian Plateau led to the identification of several highly divergent Qashqai lineages within the entire haplogroup R2 and revealed a new Persian-specific sub-clade within haplogroup R2a (Derenko et al., 2013). Furthermore, the sublineages of a South Asian autochthonous subhaplogroup of the macrohaplogroup R, including U2b1 (3%) and U2a (1%), were found in the Makranis. The other lineage of the South Asian component, M, was found at 9%, including M3 (3%), M4 (3%), M3a1 (1%), M6a1b (1%) and M18a (1%). The sublineages M18a (2.3%) and M4 (1.1%) have been found in Saraikis (Hayat et al., 2014). Moreover, M3 (8.7%) has been reported in Pathans (Rakha et al., 2011).
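The continental components reported for the Makranis can be cross-checked by summing the sub-clade frequencies stated in this discussion. The sketch below is illustrative only; the dictionary layout and the function name `component_totals` are assumptions, and the frequency of W6 is not stated in the text, so W6 is left out of the West Eurasian tally.

```python
# Sub-clade frequencies (% of Makrani sequences) as stated in the discussion.
# W6 is listed among the West Eurasian haplogroups, but its frequency is not
# given in the text, so it is deliberately omitted here.
MAKRANI = {
    "African": {"L2a1b1a": 11, "L2a1": 5, "L2a1a2": 1, "L2ba1": 1,
                "L3d1a1a": 3, "L3f1b4a": 1, "L0a1b": 1, "L0a2a2": 1,
                "L4b2a2": 1, "L4b2b1": 1, "L1c2a1a": 1, "M1a1": 1},
    "West Eurasian": {"U7a": 6, "U5b": 2, "U4c1": 1, "U4'9": 1,
                      "T1a8a": 5, "T1a1'3": 1, "T1a7": 1, "T2": 1,
                      "J1b1b": 1, "J2a2": 1, "J1b1a1": 1, "J1d": 2,
                      "HV2a": 2},
    "South Asian": {"R2": 6, "R30a1b": 5, "U2b1": 3, "U2a": 1,
                    "M3": 3, "M4": 3, "M3a1": 1, "M6a1b": 1, "M18a": 1},
}


def component_totals(freqs):
    """Sum the sub-clade percentages within each continental component."""
    return {name: sum(sub.values()) for name, sub in freqs.items()}


totals = component_totals(MAKRANI)
for name, total in totals.items():
    print(f"{name}: {total}%")
```

The African and South Asian sums reproduce the reported 28% and 24%. The West Eurasian sum comes to 25% rather than the reported 26% because W6 is not included.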
In this study, the similarity of haplogroup-defining nucleotides cannot be ignored between the rare lineage M3a1 (16126C, 16223T, 16311C, 16519C, 73G, 204C, 217C, 263G, 482C, 489C) in the Makrani sample MKH027 from Turbat and a sample (16126, 16223, 16519, 73, 204, 263, 315.1C, 482, 489) from Pamir, Tajikistan (Ovchinnikov et al., 2014). Moreover, the presence of haplogroup M in the Saudi Arabian gene pool also suggested gene flow from South Asia (Abu-Amero et al., 2007). Two haplotypes observed in the Makranis, both carrying a characteristic combination of two mutations in HVS-II (154C and 194T), could not be confidently assigned to a known (sub)haplogroup, although the presence of both 16223T and 489C indicates membership within macrohaplogroup M; this lineage was therefore tentatively assigned to haplogroup "M-154-194". Future studies performing complete mitogenome sequencing will be needed to elucidate the precise phylogenetic position of this lineage. Another isolated population, the Kalash, living in the three valleys of Bumburet, Birrir and Rumbur in Chitral district, was also analyzed by mtDNA-based haplogrouping to determine its origin. The Kalash showed greater affinity with West Eurasia, their mtDNA pool comprising mainly West Eurasian haplogroups (98.2%), including R0a'b, U4, U2e1, J2b1a, R2 and H2a1 (Table 4.1b). Among the West Eurasian haplogroups, the highest frequency was observed for haplogroup R0a'b (28.8%) in the Kalash population (present study), and no evidence of this haplogroup was found in Makranis (Siddiqi et al., 2014), Pathans (Rakha et al., 2011) or Saraikis (Hayat et al., 2014). Similarly, a high frequency (23%) of this haplogroup was observed in a previous study on the Kalash population (Quintana-Murci et al., 2004). In South Asia, haplogroup R has been reported in western and southern India (Maji et al., 2008). Haplogroup R0 has been frequently reported in the Arabian Plate, with its highest frequency (38%) in the Socotri (Torroni et al., 2006; Cerny et al., 2009), and is least frequently observed in North Africa, the Horn of Africa, Anatolia, the Iranian Plateau and Dalmatia (Achilli et al., 2011). R0a'b, a node of R, has been observed in Italian individuals. However, the greater frequency of macrohaplogroup R in the Arabian Plate suggests that the origin of R0a is the Arabian Peninsula (Abu-Amero et al., 2008). mtDNA haplogroup U4 was the second most frequent haplogroup (27.9%) in the Kalash population and has not been observed in Makranis (Siddiqi et al., 2014), Pathans (Rakha et al., 2011) or Saraikis (Hayat et al., 2014).
Similarly, haplogroup U4 was observed with the highest frequency, 34%, in a previous study on the Kalashi population (Quintana-Murci et al., 2004). Haplogroup U4 has been associated with the remnants of ancient European hunter-gatherers preserved in the indigenous populations of Siberia (Sarkissian et al., 2013) and with Europeans, having its highest concentrations in Scandinavia and the Baltic states. Isolated populations such as the Mansi (16.3%), an endangered people of Russia (Derbeneva et al., 2002), and the Ket people (28.9%) of the Yenisey River (which rises in the heart of Mongolia and flows north through Siberia to the Arctic Ocean) (Derbeneva et al., 2002) also showed a high frequency of haplogroup U4, which is consistent with the present study. U4 is an ancient mitochondrial haplogroup and is relatively rare in modern populations (Malyarchuk et al., 2010), indicating an ancient origin of the Kalashi population. Further evidence of U4 as a Western Eurasian haplogroup has been found in the form of relatively high frequencies (up to 18%) in Altaians and Khakassians of the Altai-Sayan region (Derenko et al., 2013). Therefore, the high frequency of haplogroup U4 in the Kalash may suggest Western Eurasian influence in the population (Quintana-Murci et al., 2004). Another West Eurasian haplogroup, U2e1, was found at 17.1% in the Kalash population, but no evidence of this haplogroup has been found in the Makranis (Siddiqi et al., 2014), Pathans (Rakha et al., 2011) or Saraikis (Hayat et al., 2014) of Pakistan. Haplogroup U2 is most common in South Asia (Metspalu et al., 2004) but is also found at low frequency in Central and West Asia, as well as in Europe (Maji et al., 2008). U2i, a sub-clade of U2, largely accounts for the overall higher frequency in South Asians (Indians), whereas the sub-clade U2e, common in Europe, is entirely absent from the region. These lineages diverged approximately 50,000 years ago, and very low maternal-line gene flow has been observed between South Asia and Europe throughout this period (Metspalu et al., 2004). In Siberia, the average frequency of U2e has been reported as about 0.9%, but in some Altaian populations (such as Altaian-Kizhi, Teleuts and Telenghits) this haplogroup reached higher than average frequencies (up to 3%) (Dulik et al., 2012). The higher frequency of haplogroup U2e1 found in the Kalash population may suggest its West Eurasian origin, with a contribution from the Altaian population.
Haplogroup J2b1a (14.4%) was observed in the Kalash population and was not found in Makranis, Pathans (Rakha et al., 2011) or Saraikis (Hayat et al., 2014) of Pakistan. Haplogroup J has been observed to be most frequent in the Near East (12%), followed by Europe (11%), the Caucasus (8%) and North Africa (6%). J1, a subgroup of J, is spread across the European continent, accounting for up to four-fifths of the total, while J2 is more localized around the Mediterranean, Greece, Italy/Sardinia and Spain. Haplogroup J1 has also been reported in Tajiks (West Eurasians) at a frequency of 2.2% (Ovchinnikov et al., 2014). J2 has been reported at a considerable frequency (9%) in the Kalashi during a previous investigation (Quintana-Murci et al., 2004), which is consistent with the findings of this study. The high frequency (14.4%) of a Western Eurasian lineage (J2b1a) in the Kalash supports the idea of a genetic contribution from West Eurasians. A moderate frequency (6.3%) of haplogroup R2 was observed in the Kalash people during the present study. R2 is an example of a West Eurasian haplogroup that has been found in other ethnic groups of Pakistan, such as Makranis (6%), Pathans (1.7%) (Rakha et al., 2011) and Saraikis (2.3%) (Hayat et al., 2014). In Europe, haplogroup R2 has been observed in a few populations of the Volga basin, Russia; it occurs at low frequencies in the Near and Middle East and India and is virtually absent elsewhere. The phylogeographic analysis of haplogroup R2 has pointed out its presence in southern Arabia, and the Near East has been considered a possible place of origin for R2 (Al-Abri et al., 2012). A high concentration of haplogroup R2 has been reported in southern Pakistan and in India, while low frequencies were observed in adjacent areas: the Near East, the Caucasus, the Iranian Plateau, the Arabian Peninsula and Central Asia. Another study of the Persian Peninsula provided evidence for its possible origin in Iran, as its sub-clade has been found mostly in southern Arabians and Persians. Previous studies thus associate haplogroup R2 with the Persian Peninsula, which may have received Western Eurasian influence through the conquest of Persia by Alexander the Great. Hence, the presence of haplogroup R2 in the Kalash may support the idea of a greater genetic contribution of West Eurasians to the Kalash (Hellenthal et al., 2014). A low frequency (3.6%) of haplogroup H2a1 was observed in the Kalash people during this study. H2a1 is another example of a West Eurasian haplogroup that is absent in Makranis and Saraikis (Hayat et al., 2014) of Pakistan.
However, H2a1 has been reported in Pathans, although at a very low frequency (0.7%) (Rakha et al., 2011). Previous studies have reported that the likely origin of haplogroup H is in Southwest Asia (Achilli et al., 2004). In a recent study on Tajiks, haplogroup H was reported as the predominant haplogroup, with a frequency of 23.1%. Furthermore, H2, a sub-clade of haplogroup H, displayed the highest percentage (4.4%) among Tajiks (Ovchinnikov et al., 2014), consistent with the findings of this study. According to Ovchinnikov et al. (2014), the majority of Minoans (Greece) were classified in haplogroup H (43.2%). Moreover, none of the Minoans carried an African haplogroup such as haplogroup L (Jeffery et al., 2013), which is consistent with the population data of the Kalash.

CONCLUSION

1) Makrani population

• The high frequency of African mtDNA haplogroups such as L2, L1, L3 and L0 in the Makrani population points to an origin with a major genetic contribution from the Mozambique Bantu of southeastern Africa and the Fulani people of West-Central Africa. This may be the result of the strong slave-trade connection between the Makran ports of Gwadar and the principal ports of Oman during the Omani Empire. The African component in the Makrani community may therefore represent the genetic legacy of this slave trade.

• The presence of West Eurasian lineages such as U7a, T and J in the Makranis, similar to Iranians, Tajiks and westernmost Indians, may suggest a common genetic contribution and may indicate that their likely origin is within West Eurasia.

• Another major contribution, from South Asia, including the M and R lineages, was observed in the Makranis, which may support previous suggestions of mating between African women and autochthonous males, resulting in genetic admixture.

2) Kalash population

• The mtDNA genetic analysis of the Kalashi population revealed that the frequency of West Eurasian haplogroups, including R0a'b, U4, U2e1, J2b1a, R2 and H2a1, reaches 98.2%.

• Only one South Asian haplogroup (M65a) was found in the population, at the lowest frequency (0.9%). African haplogroup L and East Asian haplogroups were not observed in the Kalash at all.

• Hence, the greater frequency of West Eurasian haplogroups in the Kalash may suggest a West Eurasian origin, which might be the consequence of major historical movements during the Arab and Muslim conquests, the rise of the British Indian Empire and the invasion by the armies of Alexander the Great.
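The West Eurasian total reported for the Kalash can be compared with the sum of the six individual haplogroup frequencies given in the discussion. This is a minimal sketch for illustration; the variable names are hypothetical, and rounding the sum to one decimal place is an assumption made to match the precision of the reported figures.

```python
# Kalash haplogroup frequencies (%) as stated in the discussion.
KALASH_WEST_EURASIAN = {"R0a'b": 28.8, "U4": 27.9, "U2e1": 17.1,
                        "J2b1a": 14.4, "R2": 6.3, "H2a1": 3.6}
REPORTED_TOTAL = 98.2  # West Eurasian share reported in the text

# Round to one decimal place to avoid floating-point noise in the sum.
summed = round(sum(KALASH_WEST_EURASIAN.values()), 1)
difference = round(REPORTED_TOTAL - summed, 1)
print(summed, difference)
```

The individual frequencies sum to 98.1%, which differs from the reported 98.2% by 0.1 percentage points, a discrepancy of the size that can arise when each frequency is rounded to one decimal place.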

REFERENCES

ABU-AMERO, K. K., GONZALEZ, A. M., LARRUGA, J. M., BOSLEY, T. M. AND CABRERA, V. M., 2007. "Eurasian and African mitochondrial DNA influences in the Saudi Arabian population". BMC Evolutionary Biology 7: 32. doi:10.1186/1471-2148-7-32.

ACHILLI, A., OLIVIERI, A., PALA, M., KASHANI, B. H. AND CAROSSA, V., 2011. Mitochondrial DNA Backgrounds Might Modulate Diabetes Complications Rather than T2DM as a Whole. PLoS ONE 6(6): e21029. doi:10.1371/journal.pone.0021029

ADACHI, N., UMETSU, K. AND SHOJO, H., 2014. Forensic strategy to ensure the quality of sequencing data of mitochondrial DNA in highly degraded samples. Leg. Med., 16: 52-55.

AFONSO, C., ALSHAMALI, F., PEREIRA, J. B., FERNANDES, V., COSTA, M. AND PEREIRA, L., 2008. "MtDNA diversity in Sudan (East Africa)". Forensic Science International: Genetics Supplement Series 1: 257. doi:10.1016/j.fsigss.2007.10.118.

AL-ABRI, A., PODGORNA, E., ROSE, J. L., PEREIRA, L. AND MULLIGAN, C. J., 2012. Pleistocene-Holocene boundary in Southern Arabia from the perspective of human mtDNA variation. Am. J. Phys. Anthropol., 149: 291-298.

ALONSO, A. H., IVANOV, V. D., JAYAWARDHANA, R. AND HOSOKAWA, T., 2002. The Relation between Mid Infrared Emission and Black Hole Mass in Active Galactic Nuclei: A Direct Way to Probe Black Hole Growth. The Astrophysical Journal, 571: L1-L5.

ALLARD, M. W., MILLER, K., WILSON, M., MONSON, K., AND BUDOWLE, B. 2002. Characterization of the Caucasian haplogroups present in the SWGDAM forensic mtDNA dataset for 1771 human control region sequences. Scientific Working Group on DNA Analysis Methods. Journal of Forensic Sciences, 47(6), 1215–1223.

AL-ZAHERY, N., SAUNIER, J., ELLINGSON, K., PARSON, W., PARSONS, T. J. AND IRWIN, J. A., 2013. Characterization of mitochondrial DNA control region lineages in Iraq. International Journal of Legal Medicine, 127(2): 373–375.

ANDERSON, S. A., BANKIER, T. B., BARRELL, G. M., BRUIJN, H. L., COULSON, A. R., DROUIN, J., EPERON, L. C., NIERLICH, D. P., ROE, B. A., SANGER, F., SCHREIER, P. H., SMITH, A. J. H., STADEN, R. AND YOUNG, L. G., 1981. Sequence and organization of the human mitochondrial genome. Nature, 290: 457-465. doi:10.1038/290457a0

ANDREWS, R. M., KUBACKA, L. AND CHINNERY, P. F., 1999. Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA. Nat. Genet., 23: 147.

BANDELT, H. J., FORSTER, P. AND ROHL, A., 1999. Median-Joining Networks for Inferring Intraspecific Phylogenies. Mol. Biol. Evol., 16 (1): 37–48.

BANDELT, H. J., VAN OVEN, M., AND SALAS, A. 2012. Haplogrouping mitochondrial DNA sequences in Legal Medicine/Forensic Genetics. International Journal of Legal Medicine, 126(6): 901–916.

BARBIERI, C., VICENTE, M., OLIVEIRA, S., BOSTOEN, K., ROCHA, J., STONEKING, M. AND PAKENDORF, B., 2014. Migration and Interaction in a Contact Zone: mtDNA Variation among Bantu-Speakers in Southern Africa. PLoS One, 9(6): e99117

BECKERMAN, S., ERICKSON, P. I., YOST, J., REGALADO, J., JARAMILLO, L. AND SPARKS, C., 2009. Life histories, blood revenge, and reproductive success among the Waorani of Ecuador. Proc. Natl. Acad. Sci. USA, 106: 8134–8139.

BEHAR, D. M., VILLEMS, R., SOODYALL, H., BLUE-SMITH, J., PEREIRA, L., METSPALU, E., SCOZZARI, R., MAKKAN, H., TZUR, S., COMAS, D., BERTRANPETIT, J., QUINTANA-MURCI, L., TYLER-SMITH, C., WELLS, R. S. AND ROSSET, S., 2008. Genographic Consortium. The dawn of human matrilineal diversity. Am. J. Hum. Genet., 82(5): 1130-40.

BENDALL, K. E., MACAULAY, V. A., BAKER, J. R. AND SYKES, B. C., 1996. Heteroplasmic point mutations in the human mtDNA control region. Am. J. Hum. Genet., 59: 1276–1287.

BENDALL, K. E., MACAULAY, V. A. AND SYKES, B. C., 1997. Variable levels of a heteroplasmic point mutation in individual hair roots. Am. J. Hum. Genet., 61: 1303–1308.

BEHAR, DORON, M., RICHARD, V., HIMLA, S, BLUE-SMITH, J., LUISA, P., ENE, M., ROSARIA, S., HEERAN, M. AND SHAY, T., 2008. "The Dawn of Human Matrilineal Diversity". The American Journal of Human Genetics, 82(5): 1130– 40. doi:10.1016/j.ajhg.2008.04.002. PMC 2427203. PMID 18439549.

BEHAR, D. M., VAN OVEN, M., ROSSET, S. AND METSPALU, M., 2012. "A "Copernican" reassessment of the human mitochondrial DNA tree from its root". Am. J. Hum. Genet., 90 (4): 675–684. doi:10.1016/j.ajhg.2012.03.002. PMC 3322232. PMID 22482806.

BERNARDO, S., HERMIDA, R., DESIDERIO, M., SILVA, D. A., ELIZEU, F. AND DE CARVALHO, 2014. mtDNA ancestry of Rio de Janeiro population, Brazil. Mol. Biol. Rep., 41:1945–1950 DOI 10.1007/s11033-014-3041-9.

BOGENHAGEN, D. AND DAVID, C. A., 1974. The number of mitochondrial deoxyribonucleic acid genomes in mouse L and human Hela cells. Quantitative isolation of mitochondrial deoxyribonucleic acid. J. Biol. Chem., 249: 7991– 7995.

BOWCOCK, A. M., RUIZ-LINARES, A., TOMFOHRDE, J., MINCH, E. AND KIDD, J. R., 1994. High resolution of human evolutionary trees with polymorphic microsatellites. Nature, 368:455–457.

BRENDAN A. I. PAYNE, I. J. WILSON, COXHEAD, J. DEEHAN, D, HORVATH, R, ROBERT, W., TAYLOR, DAVID, C., SAMUELS, BANDELT, H. J., LAHERMO, P., RICHARDS, M., MACAULAY, V., 2013. Detecting errors in mtDNA data by phylogenetic analysis. Int. J. Legal Med., 115: 64–69.

BUDOWLE, J. G. E. AND CHAKRABORTY, B. R., 2010. DNA identification by pedigree likelihood ratio accommodating population substructure and mutations. Investig. Genet., 1: 8.

BUTLER, J. M., 2009. Fundamentals of forensic DNA typing. Burlington, MA: Academic Press. Pages: 500.

CASE, J. T. AND WALLACE, D. C., 1981. Maternal inheritance of mitochondrial DNA polymorphisms in cultured human fibroblasts. Somatic Cell Genet, 7(1):103-8.

CARRACEDO, M. C., ASENJO, A. AND CASARES, P., 2000. Location of Shfr, a new gene that rescues hybrid female viability in crosses between Drosophila simulans females and D. melanogaster males. Heredity, 84(6): 630-638.

CARDOSO, S., ALFONSO-SANCHEZ, M. A., VALVERDE, L., SANCHEZ, D., ZARRABEITIA, M. T., ODRIOZOLA, A., MARTINEZ-JARRETA, B., AND DE PANCORBO, M. M., 2012. Genetic uniqueness of the Waorani tribe from the Ecuadorian Amazon. Heredity (Edinb). 108(6): 609–615.

CASTRI, L., TOFANELLI, S., GARAGNANI, P., BINI, C., FOSELLA, X., PELOTTI, S., PAOLI, G., PETTENER, D. AND LUISELLI, D., 2009. mtDNA variability in two Bantu-speaking populations (Shona and Hutu) from Eastern Africa: implications for peopling and migration patterns in sub-Saharan Africa. Am. J. Phys. Anthropol., 140: 302-311.

CANO, C. F., GOMEZ, N., OSPINA, J. A., CAJIGAS, H., GROOT, R. E., ANDRADE AND TORRES, M. M., 2014. Mitochondrial DNA Haplogroups and Susceptibility to Prostate Cancer in a Colombian Population. ISRN Oncology.

CERNY, V., PEREIRA, L., KUJANOVA, M., VASIKOVA, HAJEK, M., MORRIS, M., CONNIE, J., AND MULLIGAN., 2009. Out of Arabia—The settlement of Island Soqotra as revealed by mitochondrial and Y chromosome genetic diversity. Am. J. Phys. Anthropol., 138(4): 439-447. doi:10.1002/ajpa.20960 Key: citeulike: 3801223

CEREZO, M., ACHILLI, A., OLIVIERI, A., PEREGO, U. A., GOMEZ-CARBALLA, A., BRISIGHELLI, F., LANCIONI, H., WOODWARD, S. R., LÓPEZ-SOTO, M., CARRACEDO, A., CAPELLI, C., TORRONI, A. AND SALAS, A., 2012. Reconstructing ancient mitochondrial DNA links between Africa and Europe. Genome Res., 22(5): 821–826.

CHAITANYA, L., VAN OVEN, M., WEILER, N., HARTEVELD, J., WIRKEN, L., SIJEN, T., KNIJFF, P., AND KAYSER, M., 2014. Developmental validation of mitochondrial DNA genotyping assays for adept matrilineal inference of biogeographic ancestry at a continental level. Foren. Sci. Int. Genet., 11: 39-51.

CHEN, Y. S., OLCKERS, A., SCHURR, T. G., KOGELNIK, A. M., HUOPONEN, K. AND WALLACE, D. C., 2000. mtDNA variation in the South African Kung and Khwe and their genetic relationships to other African populations. Am. J. Hum. Genet., 66: 1362-1383.

CHONG, M. D., CALLOWAY, C. D., KLEIN, S.B., ORREGO, C. AND BUONCRISTIANI, M. R., 2005. Optimization of a duplex amplification and sequencing strategy for the HVI/HVII regions of human mitochondrial DNA for forensic casework, Forensic Sci. Int., 154: 137–148.

COIA, V., WALLACE, D. C., OEFNER, P. J., TORRONI, A., CAVALLI-SFORZA, L. L. AND SCOZZARI, R., 2005. A back migration from Asia to Sub-Saharan Africa is supported by high resolution analysis of human Y-chromosome haplotypes. American Journal of Human Genetics, 70: 1197-214.

COSSINS, D., 2014. COSMOS, The Science of Every thing.

DAVID, C., STEPHANIE, P. AND SPENCER, W. R., 2004. "Admixture, migrations, and dispersals in Central Asia: evidence from maternal DNA lineages", European Journal of Human Genetics, 12: 495–504. doi:10.1038/sj.ejhg.5201160

DERENKO, M., MALYARCHUK, B., BAHMANIMEHR, A., DENISOVA, G., PERKOVA, M., FARJADIAN, S. AND YEPISKOPOSYAN, L., 2013. Complete Mitochondrial DNA Diversity in Iranians. PLoS ONE, 8(11): e80673. doi:10.1371/journal.pone.0080673.

DERBENEVA, O. A., SUKERNIK, R. I., VOLODKO, N. V., HOSSEINI, S. H., LOTT, M. T. AND WALLACE, D. C., 2002. Analysis of Mitochondrial DNA Diversity in the Aleuts of the Commander Islands and Its Implications for the Genetic History of Beringia. Am. J. Hum. Genet., 71(2): 415–421.

DIB, C., FAURE, S., FIZAMES, C., SAMSON, D., DROUOT, N., VIGNAL, A., MILLASSEAU, P., MARC, S., HAZAN, J., SEBOUN, E., LATHROP, M., GYAPAY, G., MORISSETTE, J. AND WEISSENBACH, J., 1996. A comprehensive genetic map of the human genome based on 5,264 microsatellites. Nature, 380: 152–154.

DRUMMOND, A. J., ASHTON, B. AND CHEUNG, M., 2009. Geneious v4.8, http://www.geneious.com

DULIK, M. C., ZHADANOV, S. I., OSIPOVA, L. P., ASKAPULI, A., GAU, L., GOKCUMEN, O., RUBINSTEIN, S. AND SCHURR, T. G., 2012. Mitochondrial DNA and Y Chromosome Variation Provides Evidence for a Recent Common Ancestry between Native Americans and Indigenous Altaians. The American Journal of Human Genetics, 90: 229–246.

ELY, B., WILSON, J. L., JACKSON, F. AND JACKSON, B. A., 2006. African- American mitochondrial DNAs often match mtDNAs found in multiple African ethnic groups. BMC Biology, 4: 34 doi: 10.1186/1741-7007-4-34

EYRE-WALKER, A., SMITH, N. H., AND SMITH, J. M., 1999. How clonal are human mitochondria? Proc R Soc Lond B Biol. Sci. 266: 477–483.

FAN, L. AND YAO, Y. G., 2011. MitoTool: A web server for the analysis and retrieval of human mitochondrial DNA sequence variations. Mitochondrion, 2: 351-356.

FORNARINO, S., PALA, M., BATTAGLIA, V., MARANTA, R., ACHILLI, A., MODIANO, G., TORRONI, A., SEMINO, O. AND SANTACHIARA-BENERECETTI, S. A., 2009. Mitochondrial and Y-chromosome diversity of the Tharus (Nepal): a reservoir of genetic variation. BMC Evolutionary Biology, 9: 154.

GILL, P., IVANOV, P. L., KIMPTON, C., PIERCY, R., BENSON, N. AND TULLY, G., 1994. Identification of the remains of the Romanov family by DNA analysis. Nature Genetics, 6: 130–135.

GINTHER, C., ISSEL-TARVER, L., AND KING, M. C., 1992. Identifying individuals by sequencing mitochondrial DNA from teeth. Nature Genetics, 2(2): 135–138.

GOTO, H., DICKINS, B., AFGAN, E., PAUL, I. M., AND TAYLOR, J., 2011. Dynamics of mitochondrial heteroplasmy in three families investigated via a repeatable re- sequencing study. Genome Biol., 12: R59.

GONDER, M. K., MORTENSEN, H. M., REED, F. A., SOUSA, A. AND TISHKOFF, S. A., 2006. "Whole-mtDNA Genome Sequence Analysis of Ancient African Lineages". Mol. Biol. and Evol., 24 (3): 757–68. doi:10.1093/molbev/msl209. PMID 17194802.

GONCALVES, F. T., CARDENA, M., GONZALEZ, R., KRIEGER, J. E., PEREIRA, A. C. AND FRIDMAN, C., 2011. The discrimination power of the hypervariable regions HV1, HV2 and HV3 of mitochondrial DNA in the Brazilian population. Forensic Sci. Int. Genet. Supplement Ser., 3: e311-e312.

GREENBERG, B. D., NEWBOLD, J. E., AND SUGINO, A. 1983. Intraspecific nucleotide sequence variability surrounding the origin of replication in human mitochondrial DNA. Gene, 21(1-2): 33–49.

GRIMES, B. F., 1992. “Ethnologue: Languages of the World,” 12th ed., Summer Institute of Linguistics, Inc., Dallas, Texas, USA.

HASEGAWA, M., RIENZO, A. D., KOCHER, T. D. AND WILSON, A. C., 1993. Toward a more accurate time scale for the human mitochondrial DNA tree. J. Mol. Evol., 37: 347–354.

HAGELBERG, E., GOLDMAN, N., LIO, P., WHELAN, S., SCHIEFENHOVEL, W., CLEGG, J. B. AND BOWDEN, D. K., 1999. Evidence for mitochondrial DNA recombination in a human population of island Melanesia. Proc R Soc Lond B Biol. Sci., 266: 485–492.

HAGELBERG, E., GOLDMAN, N., LIO, P., WHELAN, S., SCHIEFENHOVEL, W., CLEGG, J. B. AND BOWDEN, D. K., 2000. Evidence for mitochondrial DNA recombination in a human population of island Melanesia [correction]. Proc R Soc Lond B Biol. Sci., 267: 1595–1596.

HAYAT, S., TANVEER, A., SIDDIQI, M. H., RAKHA, A., NAEEM, H., TAYYAB, M., GHAZANFAR, A., AZAM, A., YASSIR, A. AND TARIQ, A. M., 2014. Mitochondrial DNA Control Region Sequences Study in Saraiki Population from Pakistan. LEGMED-D-14-00137 [in process].

HE, Y., WU, J., DRESSMAN, D. C., IACOBUZIO-DONAHUE, C. AND MARKOWITZ, S. D., 2010. Heteroplasmic mitochondrial DNA mutations in normal and tumour cells. Nature, 464: 610–614.

HELLENTHAL, G., BUSBY, G. B. J., BAND, G., WILSON, J. F., CAPELLI, C., FALUSH, D. AND MYERS, S., 2014. A genetic atlas of human admixture history. Science, 343: 747–751.

HILL, C., SOARES, P. AND MORMINA, M., 2006. "Phylogeography and Ethnogenesis of Aboriginal Southeast Asians". Mol. Biol. Evol., 23(12): 2480–2491. doi:10.1093/molbev/msl124. PMID 16982817.

HOLT, I. J., HARDING, A. E., AND MORGAN-HUGHES, J. A., 1988. Deletions of muscle mitochondrial DNA in patients with mitochondrial myopathies. Nature, 331: 717–719.

IRWIN, J. A., SAUNIER, J. L., NIEDERSTATTER, H., STROUSS, K. M. AND STURK, K. A., 2009. Investigation of heteroplasmy in the human mitochondrial DNA control region: a synthesis of observations from more than 5000 global population samples. J. Mol. Evol., 68: 516–527.

HUGHEY, J. R., PASCHOU, P., DRINEAS, P., MASTROPAOLO, D., LOTAKIS, D. M., NAVAS, P. A., MICHALODIMITRAKIS, M., STAMATOYANNOPOULOS, J. A. AND STAMATOYANNOPOULOS, G., 2013. A European population in Minoan Bronze Age Crete. Nature Communications, 4: 1861. doi:10.1038/ncomms2871

JOHN, S. E., THAREJA, G., HEBBAR, P., BEHBEHANI, K., THANARAJ, T. A. AND ALSMADI, O., 2014. Kuwaiti population subgroup of nomadic Bedouin ancestry—Whole genome sequence and analysis. Legal Medicine, [in press].

KAVLICK, M. F., LAWRENCE, H. S., MERRITT, R. T., FISHER, C., ISENBERG, A., ROBERTSON, J. M. AND BUDOWLE, B., 2011. Quantification of human mitochondrial DNA using synthesized DNA standards. J. of foren. Scien., 6: 1457-63.

KIVISILD, T., REIDLA, M., METSPALU, E., ROSA, A., BREHM, A., PENNARUN, E., PARIK, J., GEBERHIWOT, T., USANGA, E. AND VILLEMS, R., 2004. Ethiopian mitochondrial DNA heritage: tracking gene flow across and around the gate of tears. Am. J. Hum. Genet., 75: 752-770.

KLOSS-BRANDSTATTER, A., PACHER, D., SCHONHERR, S., WEISSENSTEINER, H., BINNA, R., SPECHT, G. AND KRONENBERG, F., 2011. HaploGrep: a fast and reliable algorithm for automatic classification of mitochondrial DNA haplogroups. Hum. Mutat., 32(1): 25–32.

KNIGHT, A., UNDERHILL, P. A., MORTENSEN, H. M., ZHIVOTOVSKY, L. A., LIN, A. A., HENN, B. M., LOUIS, D., RUHLEN, M. AND MOUNTAIN, J. L., 2003. African Y chromosome and mtDNA divergence provides insight into the history of click languages. Curr. Biol., 13:464–473.

KOHNEMANN, S., SIBBING, U., PFEIFFER, H. AND HOHOFF, C., 2008. A rapid mtDNA assay of 22 SNPs in one multiplex reaction increases the power of forensic testing in European Caucasians. Int. J. of leg. Med., 122: 517-23.

KRINGS, M., SALEM, A. E., BAUER, K., GEISERT, H., MALEK, A. K., CHAIX, L., SIMON, C., WELSBY, D., DI RIENZO, A., UTERMANN, G., SAJANTILA, A., PAABO, S. AND STONEKING, M., 1999. mtDNA analysis of Nile River Valley populations: a genetic corridor or a barrier to migration? Am. J. Hum. Genet., 64(4): 1166–1176.

KUMAR, S., HEDRICK, P., DOWLING, T. AND STONEKING, M., 2000. Questioning evidence for recombination in human mitochondrial DNA. Science, 288: 1931.

LANDER, E. S., LINTON, L. M., BIRREN, B., NUSBAUM, C., ZODY, M. C., BALDWIN, J., DEVON, K., DEWAR, K., DOYLE, M. AND FITZHUGH, W., 2001. Initial sequencing and analysis of the human genome. Nature, 409: 860–921.

LAN, Q., LIM, U., LIU, C. S., WEINSTEIN, S. J., CHANOCK, S., BONNER, M. R., VIRTAMO, J., ALBANES, D. AND ROTHMAN, N., 2008. A prospective study of mitochondrial DNA copy number and risk of non-Hodgkin lymphoma. Blood, 112: 4247–4249.

LARSEN, N. B., RASMUSSEN, M. AND RASMUSSEN, L. J., 2005. Nuclear and mitochondrial DNA repair: similar pathways? Mitochondrion, 5(2): 89-108.

LEGROS, F., MALKA, F., FRACHON, P., LOMBES, A. AND ROJO, M., 2004. Organization and dynamics of human mitochondrial DNA. J. Cell Sci., 117: 2653–2662.

LI, M., SCHONBERG, A., SCHAEFER, M., SCHROEDER, R. AND NASIDZE, I., 2010. Detecting heteroplasmy from high-throughput sequencing of complete human mitochondrial DNA genomes. Am. J. Hum. Genet., 87: 237–249.

LOOGVALI, E. L., ROOSTALU, U., MALYARCHUK, B. A., DERENKO, M. V., KIVISILD, T., METSPALU, E., TAMBETS, K., REIDLA, M., TOLK, H. V., PARIK, J., PENNARUN, E., LAOS, S., LUNKINA, A., GOLUBENKO, M., BARAC, L., PERICIC, M., BALANOVSKY, O. P., GUSAR, V., KHUSNUTDINOVA, E. K., STEPANOV, V., PUZYREV, V., RUDAN, P., BALANOVSKA, E. V., GRECHANINA, E., RICHARD, C., MOISAN, J. P., CHAVENTRE, A., ANAGNOU, N. P., PAPPA, K. I., MICHALODIMITRAKIS, E. N., CLAUSTRES, M., GOLGE, M., MIKEREZI, I., USANGA, E. AND VILLEMS, R., 2004. Disuniting uniformity: a pied cladistic canvas of mtDNA haplogroup H in Eurasia. Mol. Biol. Evol., 21(11): 2012–2021.

MAJI, S., KRITHIKA, S. AND VASULU, T. S., 2008. Distribution of Mitochondrial DNA Macrohaplogroup N in India with Special Reference to Haplogroup R and its Sub-Haplogroup U. Int. J. Hum. Genet., 8(1-2): 85–96.

MALYARCHUK, B., DERENKO, M., GRZYBOWSKI, T., PERKOVA, M. AND ROGALLA, U., 2010. The Peopling of Europe from the Mitochondrial Haplogroup U5 Perspective. PLoS ONE, 5(4): e10285. doi:10.1371/journal.pone.0010285

MA, J., COARFA, C., QIN, X., BONNEN, P. E., MILOSAVLJEVIC, A., VERSALOVIC, J. AND AAGAARD, K., 2014. mtDNA haplogroup and single nucleotide polymorphisms structure human microbiome communities. BMC Genomics, 15: 257.

METSPALU, M., KIVISILD, T., METSPALU, E., PARIK, J., HUDJASHOV, G. AND KALDMA, K., 2004. Most of the extant mtDNA boundaries in South and Southwest Asia were likely shaped during the initial settlement of Eurasia by anatomically modern humans. BMC Genet., 5: 26.

MELTON, T., CLIFFORD, S., MARTINSON, J., BATZER, M. AND STONEKING, M., 2004. Genetic evidence for the proto-Austronesian homeland in Asia: mtDNA and nuclear DNA variation in Taiwanese aboriginal tribes. Am. J. Hum. Genet., 63(6):1807-1823.

MEYER, S., WEISS, G. AND HAESELER, V. A., 1999. Pattern of nucleotide substitution and rate heterogeneity in the hypervariable regions I and II of human mtDNA. Genetics, 152: 1103–1110.

MACA-MEYER, N., GONZALEZ, A. M., LARRUGA, J. M., FLORES, C. AND CABRERA, V. M., 2001. "Major genomic mitochondrial lineages delineate early human expansions". BMC Genetics, 2: 13. doi:10.1186/1471-2156-2-13. PMC 55343. PMID 11553319.

MICHIKAWA, Y., MAZZUCCHELLI, F., BRESOLIN, N., SCARLATO, G. AND ATTARDI, G., 1999. Aging-dependent accumulation of point mutations in the human mtDNA control region for replication. Science, 286: 774–779.

MISHMAR, D., RUIZ-PESINI, E., GOLIK, P., MACAULAY, V., CLARK, A. G., HOSSEINI, S., BRANDON, M., EASLEY, K., CHEN, E., BROWN, M. D., SUKERNIK, R. I., OLCKERS, A. AND WALLACE, D. C., 2003. Natural selection shaped regional mtDNA variation in humans. Proc. Natl. Acad. Sci. USA, 100: 171–176.

MOSTAFA, A. E., NAGAI, A., GHADA, M., GOMAA, HANAA, M. R., HEGAZY, SHAABAN, F. E. AND BUNAI, Y., 2013. Investigation of mtDNA control region sequences in an Egyptian population sample. Legal Medicine, 15: 338–341.

NADIR, E., MARGALIT, H., GALLILY, T. AND BEN-SASSON, S. A., 1996. Microsatellite spreading in the human genome: evolutionary mechanisms and structural implications. Proc. Natl. Acad. Sci. USA 93: 6470-6475.

NILSSON, M., ANDREASSON-JANSSON, H., INGMAN, M. AND ALLEN, M., 2008. Evaluation of mitochondrial DNA coding region assays for increased discrimination in forensic analysis. Forensic Sci. Int. Genet., 2: 1–8.

OVCHINNIKOV, I. V., MALEK, M. J., DREES, K. AND KHOLINA, O. I., 2014. Mitochondrial DNA variation in Tajiks living in Tajikistan. Legal Medicine, 16: 390–395.

OLIVIERI, A., ACHILLI, A., PALA, M., BATTAGLIA, V., FORNARINO, S., AL-ZAHERY, N., SCOZZARI, R., CRUCIANI, F., BEHAR, D. M., DUGOUJON, J. M., COUDRAY, C., SANTACHIARA-BENERECETTI, A. S., SEMINO, O., BANDELT, H. J. AND TORRONI, A., 2006. The mtDNA legacy of the Levantine early Upper Palaeolithic in Africa. Science, 314: 1767–1770.

PALA, M., OLIVIERI, A., ACHILLI, A., ACCETTURO, J., METSPALU, M., REIDLA, M., TAMM, E., KARMIN, M., REISBERG, T., KASHANI, B. H., PEREGO, U. A., CAROSSA, V., GANDINI, F., PEREIRA, J. B., SOARES, P., ANGERHOFER, N., RYCHKOV, S., AL-ZAHERY, N., CARELLI, V., SANATI, M. H., HOUSHMAND, M., HATINA, J., MACAULAY, V., PEREIRA, L., WOODWARD, S. R., DAVIES, W., GAMBLE, C., BAIRD, D., SEMINO, O., 2012. Mitochondrial DNA Signals of Late Glacial Recolonization of Europe from Near Eastern Refugia. The American Journal of Human Genetics, 90: 915–924.

PARSONS, T. AND IRWIN, J., 2000. Questioning evidence for recombination in human mitochondrial DNA. Science, 288:1931.

PARSON, W. AND BANDELT, H. J., 2007. Extended guidelines for mtDNA typing of population data in forensic science. Forensic Sci. Int. Genet., 1: 13–19.

PARDINAS, A. F., MARTINEZ, J. L., ROCA, A., GARCIA-VAZQUEZ, E. AND LOPEZ, B., 2014. "Over the sands and far away: Interpreting an Iberian mitochondrial lineage with ancient Western African origins". Am. J. Hum. Biol., 26(6): 777–83.

PEREIRA, L., MACAULAY, V., TORRONI, A., SCOZZARI, R., PRATA, M. J., AND AMORIM, A. 2001. Prehistoric and historic traces in the mtDNA of Mozambique: insights into the Bantu expansions and the slave trade. Annals of Human Genetics, 65(Pt 5), 439–458.

PEREIRA, L., RICHARDS, M. AND GOIOS, A., 2005. "High-resolution mtDNA evidence for the late-glacial resettlement of Europe from an Iberian refugium". Genome Research 15(1): 19–24. doi:10.1101/gr.3182305. PMC 540273. PMID 15632086.

PIMENOFF, V. N., COMAS, D., PALO, J. U., VERSHUBSKY, G., KOZLOV, A. AND SAJANTILA, A., 2009. Northwest Siberian Khanty and Mansi in the junction of West and East Eurasian gene pools as revealed by uniparental markers. European Journal of Human Genetics, 16: 1254–1264.

PRIETO, L., ZIMMERMANN, B., GOIOS, A., RODRIGUEZ-MONGE, A., PANETO, G. G., ALVES, C., ALONSO, A., FRIDMAN, C., CARDOSO, S., LIMA, G., ANJOS, M. J., WHITTLE, M. R., MONTESINO, M., CICARELLI, R. M. B., ROCHA, A. M., ALBARRA, C., PANCORBO, M. M., PINHEIRO, M. F., CARVALHO, M., SUMITA, D. R., PARSON, W., 2011. The GHEP–EMPOP collaboration on mtDNA population data: a new resource for forensic casework. Foren. Sci. Int. Gen., 5: 146–151.

QAMAR, R., AYUB, Q., KHALIQ, S., MANSOOR, A., KARAFET, T. AND MEHDI, S. Q., 1999. African and Levantine origins of Pakistani YAP + Y chromosomes. Hum. Biol., 71: 745–55.

QUINTANA-MURCI, L., CHAIX, R., WELLS, R. S., AND BEHAR, D. M., 2004. Where west meets east: the complex mtDNA landscape of the southwest and Central Asian corridor. The American Journal of Human Genetics, 74(5): 827–845.

RADFORD, T., 2011. Mitochondrial DNA and mysteries of human evolution.

RAJKUMAR, R., BANERJEE, J., GUNTURI, H. B., TRIVEDI, R. AND KASHYAP, V. K., 2005. "Phylogeny and antiquity of M macrohaplogroup inferred from complete mtDNA sequence of Indian specific lineages". BMC Evolutionary Biology, 5: 26. doi:10.1186/1471-2148-5-26. PMC 1079809. PMID 15804362.

RAKHA, A., SHIN, K. J., YOON, J. A., KIM, N. Y. AND HASSAN, M. S., 2011. Forensic and genetic characterization of mtDNA from Pathans of Pakistan. International Journal of Legal Medicine, 125: 841–848.

RICHARDS, M., MACAULAY, V., HICKEY, E., VEGA, E., SYKES, B., GUIDA, V. AND RENGO, C., 2000. Tracing European founder lineages in the Near Eastern mtDNA pool. Am. J. Hum. Genet., 67: 1251-1276.

RIDL, J., EDENS, C. M. AND CERNY, V., 2009. Mitochondrial DNA Structure of Yemeni Population: Regional Differences and the Implications for Different Migratory Contributions. Vertebrate Paleobiology and Paleoanthropology, 69-78.

RITO, T., RICHARDS, M. B., FERNANDES, V., ALSHAMALI, F., CERNY, V., PEREIRA, L., AND SOARES, P., 2013. The First Modern Human Dispersals across Africa. PLoS One, 8(11): e80031.

ROSA, A., BREHM, A., KIVISILD, T., METSPALU, E. AND VILLEMS, R., 2004. MtDNA profile of West Africa Guineans: towards a better understanding of the Senegambia region. Ann. Hum. Genet., 68: 340–352.

RUDBECK, L., GILBERT, M. T., WILLERSLEV, E., HANSEN, A. J., LYNNERUP, N., CHRISTENSEN, T. AND DISSING, J., 2005. mtDNA analysis of human remains from an early Danish Christian cemetery. Am. J. Phys. Anthropol., 128(2):424-9.

SARKISSIAN, C. D., BALANOVSKY, O., BRANDT, G., KHARTANOVICH, V., BUZHILOVA, A., KOSHEL, S., ZAPOROZHCHENKO, V., GRONENBORN, D., MOISEYEV, V., KOLPAKOV, E., SHUMKIN, V., ALT, K. W., BALANOVSKA, E., COOPER, A. AND HAAK, W., 2013. Ancient DNA Reveals Prehistoric Gene-Flow from Siberia in the Complex Human Population History of North East Europe. PLoS Genet., 9(2): e1003296.

SALAS, A., RICHARDS, M., DELA, F. T., LAREU, M. V., SOBRINO, B., SANCHEZ, D. P, MACAULAY, V. AND CARRACEDO, A., 2002. The making of the African mtDNA landscape. Am. J. Hum. Genet., 71: 1082-1111.

SANTOS, C., MONTIEL, R., ARRUDA, A., ALVAREZ, L. AND ALUJA M. P., 2008. Mutation patterns of mtDNA: empirical inferences for the coding region. BMC Evol. Biol., 8: 167.

SCHONBERG, T., O'DOHERTY, J. P., JOEL, D., INZELBERG, R., SEGEV, Y. AND DAW, N. D., 2010. Selective impairment of prediction error signaling in human dorsolateral but not ventral striatum in Parkinson's disease patients: evidence from a model-based fMRI study. NeuroImage 49: 772–781

SENAFI, S., ARIFFIN, S. H. Z., DIN, R. D. R., MEGAT, R., WAHAB, A., ABIDIN, I. Z. Z. AND ARIFFIN, Z. Z., 2014. Haplogroup determination using hypervariable region 1 and 2 of human mitochondrial DNA. Journal of Applied Sciences, 14(2): 197–200.

SHOFFNER, J. M., LOTT, M. T., VOLJAVEC, A. S. AND COSTIGAN, D. A., 1989. Spontaneous Kearns-Sayre/chronic external ophthalmoplegia plus syndrome is associated with a mtDNA deletion: A slip-replication model and metabolic therapy. Proc. Natl. Acad. Sci. USA, 86: 7952–7956.

SIDDIQI, M. H., TANVEER, A., RAKHA, A., GHAZANFAR, A., AKRAM, A., NAEEM, H., AZAM, A., SIKANDAR, H., MASOOMA, S., JAMIL, A., TARIQ, A. M., VAN OVEN, M. AND KHAN, F. M., 2014. Genetic characterization of the Makrani people of Pakistan from mitochondrial DNA sequence data. LEGMED-D-14-00135 [in process].

SINGH, K. K. AND KULAWIEC, K. K., 2009. “Mitochondrial DNA polymorphism and risk of cancer,” Methods in Molecular Biology, 471: 291–303.

SOBRINO, B., AND CARRACEDO, A., 2005. SNP typing in forensic genetics: a review. Methods in Molecular Biology (Clifton, N.J.), 297: 107–126.

STEWART, J. E. B., FISHER, C. L., AAGAARD, P. J., WILSON, M. R., ISENBERG, A. R., POLANSKEY, D., POKORAK, E., DIZINNO, J. A., AND BUDOWLE, B. 2001. Length variation in HV2 of the human mitochondrial DNA control region. Journal of Forensic Sciences, 46:862-870.

SUBRAMANIAN, S., HAY, J. M., MOHANDESAN, E., MILLAR, C. D. AND LAMBERT, D. M., 2009. Molecular and morphological evolution in tuatara are decoupled. Trends Genet, 25:16–18

SYKES, B., 2001. The Seven Daughters of Eve. London; New York: Bantam Press. ISBN 0393020185

TAJIMA, F., 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics, 123: 585–595.

TAYLOR, R.W. AND TURNBULL, D. M. 2005. Mitochondrial DNA mutations in human disease. Nat. Rev. Genet., 6: 389–402.

TERREROS, M. C., ROWOLD, D. J., MIRABAL, S. AND HERRERA, R. J., 2011. Mitochondrial DNA and Y-chromosomal stratification in Iran: relationship between Iran and the Arabian Peninsula. Journal of Human Genetics, 56: 235–246. doi:10.1038/jhg.2010.174

THANGARAJ, K., CHAUBEY, G., SINGH, V., VANNIARAJAN, A., THANSEEM, I., REDDY, A. G. AND SINGH, L., 2006. In situ origin of deep rooting lineages of mitochondrial macrohaplogroup M in India. BMC. Genom., 7: 151.

TISHKOFF, S. A., GONDER, M. K., HENN, B. M., MORTENSEN, H., KNIGHT, A., GIGNOUX, C., FERNANDOPULLE, N. AND LEMA, G., 2007. "History of Click-Speaking Populations of Africa Inferred from mtDNA and Y Chromosome Genetic Variation". Molecular Biology and Evolution, 24(10): 2180–95.

TORRONI, A., RICHARDS, M., MACAULAY, V., FORSTER, P., VILLEMS, R., NORBY, S. AND SAVONTAUS, M. L., 2000. mtDNA haplogroups and frequency patterns in Europe. Am. J. Hum. Genet., 66: 000–000 (in this issue)

TORRONI, A., ACHILLI, A., MACAULAY, V., RICHARDS, M. AND BANDELT, H., 2006. Harvesting the fruit of the human mtDNA tree. Trends in Genetics, 22: 339–345.

van OVEN, M. AND KAYSER, M., 2008. Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum. Mutat., 29: E386-94.

van OVEN, M. AND KAYSER, M., 2009. Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum. Mutat., 30: 386–394.

van OVEN, M., VERMEULEN, M. AND KAYSER, M., 2011. Multiplex genotyping system for efficient inference of matrilineal genetic ancestry with continental resolution. Investigative Genetics, 2: 6.

VIGILANT, L., STONEKING, M., HARPENDING, H., HAWKES, K. AND WILSON, A. C., 1991. African populations and the evolution of human mitochondrial DNA. Science, 253: 1503–1507.

WALLACE, D., BROWN, M. D. AND LOTT, M. T., 1999. "Mitochondrial DNA variation in human evolution and disease". Gene, 238 (1): 211–30. doi:10.1016/S0378-1119(99)00295-4. PMID 10570998.

WALLACE, D. C., 2011. Colloquium paper: bioenergetics, the origins of complexity and the ascent of man. Proc. Natl. Acad. Sci. USA, 107 (suppl. 2), 8947–8953.

WATSON, E., FORSTER, P., RICHARDS, M. AND BANDELT, H. J., 1997. Mitochondrial footprints of human expansions in Africa. Am. J. Hum. Genet., 61: 691-704.

WINTERS, C., 2010. The African Origin of mtDNA Haplogroup M1. Journal of Biological Sciences, 2(6): 380-389.

YANG, H. C., WANG, P. L., LIN, C. W., CHEN, C. H., AND CHEN, C. H., 2012. Integrative analysis of single nucleotide polymorphisms and gene expression efficiently distinguishes samples from closely related ethnic populations. BMC Genomics., 13: 346.

YANG, I. S., LEE, H. Y., YANG, W. I. AND SHIN, K. J., 2013. mtDNAprofiler: A Web Application for the Nomenclature and Comparison of Human Mitochondrial DNA Sequences. J. of Foren. Sci. [Epub ahead of print].

YAO, Y. G. AND ZHANG, Y. P., 2002. Phylogeographic analysis of mtDNA variation in four ethnic populations from Yunnan Province: new data and a reappraisal. J. Hum. Genet., 47: 311-318.

YU, M., 2011. Generation, function and diagnostic value of mitochondrial DNA copy number alterations in human cancers. Life Sci., 89(3-4): 65-71.


Characterization of Genetic Markers in Pakistani Population

CONSENT FORM

Donor’s Name: ______
Donor’s ID: ______
Gender: ______
Age: ______
Place of Birth: ______
Ethnic Group: ______
Education: ______
Religion: ______
Mother’s Place of Birth: ______
Mother’s Ethnic Group: ______
Grand Mother’s Place of Birth: ______
Grand Mother’s Ethnic Group: ______
Father’s Place of Birth: ______
Father’s Ethnic Group: ______
Grand Father’s Place of Birth: ______
Grand Father’s Ethnic Group: ______
Mother Language: ______
Other Languages: ______
Contact Number: ______

Consanguinity of Parents (Father married to):
First Cousin: Khalazad / Mamonzad / Chachazad / Phupizad
Second Cousin: Khalazad / Mamonzad / Chachazad / Phupizad
Any Other: Within Caste (Sub-caste) / Outside Caste (Sub-caste)

I am a Ph.D. Scholar at the Department of Zoology, University of the Punjab, Lahore, working on the research project “Characterization of Genetic Markers in Pakistani Population”. I assure you that the samples collected during this study will be utilized solely for research purposes.

______
Muhammad Hassan Siddiqi
Ph.D. Research Scholar

For Donor: I voluntarily agree to take part in this research project. My blood sample may be collected and used in population genetic studies as defined in this consent form. A person, well versed in my native language, has conveyed to me the aim of this study.

______
Native Person’s Name & Signature        Donor’s Signature        Date

PUBLICATIONS

(A) From the Thesis

1) Siddiqi, M. H., Akhtar, T., Rakha, A., Abbas, G., Ali, A., Haider, N., Ali, A., Hayat, S., Masooma, S., Ahmad, J., Tariq, M. A. and Khan, F. M., 2014. Genetic characterization of the Makrani people of Pakistan from mitochondrial DNA control region data. J. Leg. Med., [In press] http://dx.doi.org/10.1016/j.legalmed.2014.09.007

(B) Other than the Thesis

1) Hayat, S., Akhtar, T., Siddiqi, M. H., Rakha, A., Haider, N., Tayyab, M., Abbas, G., Ali, A., Yassir, S. A. B., Tariq, M. A. and Khan, F. M., 2014. Mitochondrial DNA Control Region Sequences Study in Saraiki Population from Pakistan. J. Legal Med., [In press].

2) Jahan, S., Khaliq, S., Siddiqi, M. H., Ijaz, B., Ahmad, W., Ashfaq, U. A. and Hassan, S., 2011. Anti-apoptotic effect of HCV Core gene of Genotype 3a in Huh-7 cell line. Virology Journal.

3) Jahan, S., Samreen, B., Khaliq, S., Ijaz, B., Khan, M., Siddiqi, M. H., Ahmad, W. and Hassan, S., 2011. HCV entry receptor as potential targets for siRNA based inhibition of HCV. Journal Genetic Vaccines and Therapy.

4) Rakha, A., Kyoung-Jin S., Yoon, J. A., Kim, Na, Y., Siddiqi, M. H., Yang, S., Yang, W. L. and Lee, H. Y., 2010. Forensic and genetic characterization of mtDNA from Pathans of Pakistan. Int. J. Legal Med.

5) Malik, F., Kayani, M. A., Ansar, M., Obaid-ullah, M. S., Chohan, S., Abbas, Y., Shahzad, S., Raza, A., Rehman, R., Qurat-ul-ain, Siddiqi, M. H., Rakha, A., Zia ur Rehman and Ahmad, Z., 2008. Development of 19-plex YSTR system and polymorphism studies in Pakistani population. J ACAD J XJTU.