SAPIENZA Università di Roma, Facoltà di Scienze Matematiche Fisiche e Naturali
PhD PROGRAMME IN GENETICS AND MOLECULAR BIOLOGY (DOTTORATO DI RICERCA IN GENETICA E BIOLOGIA MOLECOLARE)
XXIX Cycle (Academic Year 2016/2017)

TRANS-SAHARAN CONNECTIONS: HIGH-RESOLUTION ANALYSIS OF HUMAN Y CHROMOSOME DIVERSITY

PhD candidate: Eugenia D’Atanasio
Supervisor: Prof. Fulvio Cruciani
Tutor: Prof. Andrea Novelletto
Coordinator: Prof. Silvia Bonaccorsi


TABLE OF CONTENTS

GLOSSARY
SUMMARY
INTRODUCTION
    The human Y chromosome
    Structure of the human Y chromosome
    Variation in the human Y chromosome
    Biallelic polymorphisms
    Multiallelic polymorphisms
    Copy number variations
    Uniparental markers and phylogenetic trees
    Geographic distribution of Y chromosome haplogroups
    Time estimates for the MSY phylogeny
    Peopling of Africa
    Distribution of Y chromosome haplogroups in Africa
    The role of the Sahara in the peopling of Africa
AIMS
RESULTS
    Targeted next generation sequencing
    Sample selection
    Samples from our lab collection
    Publicly available whole Y chromosomes
    Ancient specimens
    Region selection
    Phylogenetic tree and time estimates
    General features of the phylogeny
    Comparison with literature
    Dating
    The four trans-Saharan haplogroups
    A3-M13
    E-M2
    E-M78
    R-V88
    Geographic distribution and further molecular dissection of the trans-Saharan clades
    Molecular dissection of A3-M13
    Molecular dissection of E-M2
    Molecular dissection of E-M78
    Molecular dissection of R-V88
DISCUSSION
    The advantages of the targeted sampling approach
    The Green Sahara and the four trans-Saharan clades
    The central Sahara
    A3-M13 during the Green Sahara
    E-M2 during the Green Sahara
    R-V88 during the Green Sahara
    The eastern Sahara
    The eastern corridor
    The western corridor
    General overview of the Sahara
    1) First occupation of the Green Sahara
    2) Expansions within the Green Sahara
    3) Regional differences at the end of the Green Sahara
    Beyond the Green Sahara: other movements within and outside the African continent
    The Mediterranean basin
    Sardinia
    The coastal region of northern Africa, the Near East and southern Europe
    The Sahel
    The Fulbe people
    Links between eastern and central Africa
    Sub-Saharan Africa
    The Horn of Africa
    The Bantu expansion
MATERIALS AND METHODS
    The sample
    Sample quality and quantity control
    Selection of the unique MSY regions
    Targeted Next Generation Sequencing
    Targeting and library preparation
    Sequencing and alignment
    Regional filtering
    Analysis of the average depth
    Analysis of putative deletions/duplications
    SNP calling and filtering
    SNP calling
    SNP filtering
    Direct filtering
    Manual filtering
    Cluster filtering
    Tree reconstruction and validation
    Reconstruction of the phylogenetic relations
    Check of published data
    Mutation rate and dating
    Mutation rate estimate
    Time estimates
    Nodes from NGS data
    Nodes from genotyping
    Genotyping of informative markers
    Selection of markers
    Analysis of the selected SNPs
    Amplification
    RFLP
    Sanger sequencing
    Population data from literature
    Frequency maps
REFERENCES
APPENDICES
LIST OF PUBLICATIONS


GLOSSARY

ALT = Alternative
ASD = Average of the squared distance
CNV = Copy number variations
bp = base pair
BWA = Burrows-Wheeler aligner
DHPLC = Denaturing high performance liquid chromatography
DP = Depth
FilDP4 = Filter based on DP4
kya = Kilo years ago
Mb = Mega base pairs
MQ = Mapping quality
MSY = Male-specific region of the Y chromosome
mtDNA = Mitochondrial DNA
NGS = Next generation sequencing
PAR = Pseudoautosomal region
REF = Reference
RFLP = Restriction fragment length polymorphism
SD = Standard deviation
SINE = Short interspersed nuclear element
SNP = Single nucleotide polymorphism
SNS = Single nucleotide substitution
WGA = Whole genome amplification
YAP = Y Alu polymorphism


SUMMARY

Over the past millennia, the Sahara underwent strong climatic fluctuations. During the humid phases, the desert became fertile, a condition referred to as the Green Sahara. During these periods, it was populated by fauna and hominins. The last humid phase occurred between 12 and 5 kya, and human occupation of the Sahara in that period is attested by a wealth of archaeological and paleoanthropological evidence. About 5