Analysis of Y-Chromosome Polymorphisms in Pakistani
ANALYSIS OF Y-CHROMOSOME POLYMORPHISMS IN PAKISTANI POPULATIONS

Thesis submitted to the Sindh Institute of Medical Sciences for the degree of Doctor of Philosophy

BY
Sadaf Firasat

Centre of Human Genetics and Molecular Medicine
Sindh Institute of Medical Sciences
Sindh Institute of Urology and Transplantation (SIUT)
Karachi, Pakistan
2010

TABLE OF CONTENTS

Title page
Acknowledgements
List of Tables
List of Figures
Summary
Introduction
Literature Review
Materials and Methods
Results
    Phylogeography of Pakistani ethnic groups
    Comparison between the Pakistani and Greek populations
Discussion
    Comparison within Pakistan
    Comparison between the Pakistani and Greek populations
    Comparison with world populations
    Insight into population origins
Conclusions
References
Appendix

ACKNOWLEDGEMENT

I thank Prof. Dr. Syed Qasim Mehdi H.I. S.I., for his support and encouragement, and for providing all the facilities for doing scientific work in his laboratory. The work presented in this thesis was done under the supervision of Dr. Qasim Ayub T.I. It is a great pleasure for me to acknowledge the keen interest, advice, patient guidance and kindness that I have received from him during the course of this work. I would like to thank Dr. Shagufta Khaliq (PoP) for teaching me all the molecular genetics lab techniques, and Dr. Aiysha Abid for her comments on this manuscript and suggestions for its improvement. I am also grateful to Mrs. Ambreen Ayub for her help in making the contour map. I thank my colleague Ms. Sadia Ajaz for her help and cooperation in proofreading the thesis. It has been an honor for me to work at SIUT, and I thank Prof. Dr. Adeeb Rizvi H.I. S.I., Director, SIUT, for his constant support and guidance. Finally, I would like to thank my parents; without their love and support the completion of this work would not have been possible.

LIST OF TABLES

I.
The possible origins and language affinities of Pakistani populations.
II. A list of Y haplogroups, markers, type of polymorphism and genotyping methods used in this study. Y haplogroups were determined in a hierarchical manner, screening initially with markers that identified deep lineages (bold) and subsequently genotyping markers that further delineated the tree in the target population. The typing methods were amplified fragment length polymorphism (AFLP), denaturing high performance liquid chromatography (DHPLC), amplification refractory mutation system polymerase chain reaction (ARMS-PCR) or dideoxy DNA sequencing (Seq).
III. List of SNPs typed by the AFLP method.
IV. Y-STR primer sequences.
V. Frequency of haplogroups B*, C*, E* and F* in ethnic groups from Pakistan.
VI. Number and frequencies of populations falling in haplogroup B-T.
VII. Y lineages found in the three Punjabi castes examined in this study.
VIII. Percentage of variation obtained by AMOVA at three levels of population hierarchy in ethnic groups from Pakistan.
IX. Population pairwise FSTs between Pakistani ethnic groups computed from Y haplogroup frequencies. FST p values (based upon 110 permutations) are given above the diagonal, with * indicating significant pairwise differences.
X. Matrix of significant FST p values (significance level = 0.0500) based upon 110 permutations among the ethnic groups of Pakistan.
XI. Weighted population pairwise ρ genetic distances (below diagonal) and FST values (above diagonal) based on STR variation within haplogroups.
XII. Description of world populations.
XIII. Y-STR data of clade B lineages in Pakistani and African populations.

LIST OF FIGURES

I. Map of Pakistan showing its neighbors, administrative regions and the geographical distribution of the populations that are included in this study.
II. Phylogenetic tree.
III.
Distribution of haplogroups B*, C*, E* and F* in populations from northern and southern Pakistan.
IV. Y haplogroup frequency distribution in ethnic groups of Pakistan.
V. Frequency distribution of major Y lineages (PK2, M52, M67, M27) in Pakistan.
VI. Frequency distribution of major Y lineages (M357, M173, M17 and M124) in Pakistan.
VII. Principal component analysis based on Y haplogroup frequencies in Pakistani populations.
VIII. Median-joining network of lineage L individuals based on Y-STR haplotypes.
IX. A rooted maximum-parsimony tree of Y lineages found in the Greek, Burusho, Kalash, Pathan and Pakistani populations.
X. A plot of the first two principal coordinates based upon the analysis of Y haplogroup frequencies in Pakistani and Greek populations.
XI. A plot of the first two principal coordinates based upon the analysis of Y haplogroup frequencies in Pakistani and Greek samples (1 = this study; 2 = Francalacci et al., 2003) using comparable biallelic markers.
XII. Neighbor-joining tree showing the relationship between the Greek and three Pakistani ethnic groups. The tree is based on ρ genetic distances.
XIII. Median-joining network of clade E lineages in Pakistan (open circles) and Greece (hatched circles). Circles represent haplotypes and have an area proportional to frequency. The Pathan individuals are shown in black.
XIV. Contour map showing the frequency distribution of the 9 Y-STR haplotype in Eurasia and northern Africa. This haplotype was shared between three Greeks and a Pathan individual belonging to clade E1b1b1a.
XV. Frequencies of major haplogroups in Asian populations.
XVI. Median-joining network of the C lineage.
XVII. Distribution of haplogroup L in the Indo-Pak subcontinent.
XVIII. Median-joining network of clade B lineages in Pakistani and African populations. Circles represent haplotypes and have an area proportional to frequency.
The Pakistani individuals are shown in orange and light blue.
XIX. Geographic distribution of haplogroup O.
XX. Median-joining network of H1-M52 lineages found in the Burusho, Kalash and Pathan, based on their Y-STR haplotypes.
XXI. Possible origins of the a) Hazara b) Kalash c) Parsi d) Makrani – Negroid.

SUMMARY

The data presented in this thesis provide a comprehensive report on Y chromosomal diversity among different ethnic groups from Pakistan. They provide insights into the genetic variation in Pakistan in a global context and also shed light on the patrilineal origins of these populations. The major conclusions are summarized as follows:

1. Genetic relationships in Pakistan are dictated primarily by geographic proximity rather than linguistics: The results suggest that within Pakistan male genetic relationships are dictated primarily by geographic proximity. Ethnic groups speaking Dravidian (Brahui), Sino-Tibetan (Balti) or the language isolate Burushaski (Burusho) share genetic affinity with their Indo-European speaking geographic neighbors. Although the isolation of the Hunza Burusho in the mountains of northern Pakistan has led to the preservation of their language, it has not made them genetically distinct from their neighbors in Pakistan. Based on Y haplogroup frequencies, the majority of the ethnic groups from Pakistan show evidence of admixture, mostly with Central/South Asian and European populations. This is illustrated by the fact that major haplogroups such as E*, J* and R*, which are frequent in West Asians and Europeans, together constitute 65% of the total. Haplogroups L1 and R2 are shared with populations from India and constitute 11% of the Pakistani population.

2. The Karakoram Mountains form a formidable barrier to gene flow from China: Haplogroups such as C3 and O*, which are commonly observed in East Asians, are rare or absent in the Pakistani populations and constitute < 1.5% of the total.
Populations living in these mountain valleys, such as the Hunza Burusho, Balti and Kashmiri, are all genetically closer to other ethnic groups in Pakistan. This low prevalence, or absence, of East Asian haplotypes in Pakistan indicates that the Karakoram Mountains, which separate Pakistan and China, form a formidable barrier to gene flow from the north. The Hazara are the only population with significant East Asian ancestry, but historical records indicate that they did not cross this geographical boundary and arrived in the subcontinent from the west.

3. Genetic signatures of invasions: The Indo-European contribution to the Y gene pool in Pakistan is substantial and is probably a reflection of the colonization of the subcontinent by invaders from West and Central Asia. These probably replaced the indigenous Y haplogroups, which are now mostly found in South Indians and isolated populations in the Andaman Islands. Three populations (Burusho, Kalash and Pathan) also claim Greek ancestry following Alexander's invasion of the subcontinent. However, the results shown here provide strong support only for a minor Greek genetic contribution to the Pathan gene pool. The presence of a unique star cluster based on Y-STR haplotypes in haplogroup C3 Y chromosomes in the Hazara population has been linked to the male descendants of Genghis Khan (1162-1227). These Y chromosomes are prevalent in Mongolia and are observed at a frequency of 60% in a much larger sample of Hazara males from northern and southern Pakistan that were analyzed in this study. Although this haplogroup was also observed in the Burusho (8.2%), these samples did not share the star haplotype, pointing towards separate origins for these populations. Historical records also support the genetic relatedness between East Asians and the Hazara.

4. The Kalash as genetic outliers: This study also demonstrates that the Kalash have a distinct genetic identity within Pakistan.
Located in the remote valleys of the Hindu Kush Mountains, they show significant Caucasian ancestry but also have a high proportion of the population-specific haplogroup L3a, which is not found elsewhere in Pakistan. Their genetic uniqueness is a reflection of genetic drift in an isolated population struggling to maintain its distinct cultural and religious identity.

Future Prospects: This endeavour expands our knowledge about Pakistani populations and complements data obtained from analyzing autosomal and mitochondrial markers.