The Evolutionary Role of Human-Specific Genomic Events

Yuval Itan

SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY COLLEGE LONDON (UCL)
September 2009

Centre for Mathematics & Physics in the Life Sciences and Experimental Biology (CoMPLEX)
Department of Genetics, Evolution and Environment, UCL

Supervisor: Prof. Mark G. Thomas
Second Supervisor: Dr. Kevin Bryson

Declaration of Ownership.

I, Yuval Itan, confirm that the work presented in this thesis is my own. Where information has been derived from other sources, I confirm that this has been indicated in the thesis.

Abstract.

In the short evolutionary time since the human-chimpanzee divergence, approximately 6.6 million years ago, humans have acquired a range of traits that are unique among primates. These include a tripling of brain size, enhanced cognitive abilities, complex culture, a descended larynx that enables spoken language, longevity, specific diseases, inferior olfaction and (in some human populations) adult lactase persistence. These traits are likely to have evolved through various genomic mechanisms, among them gene duplication and gene-culture co-evolution. Several studies have estimated the dates of some of these human-lineage genomic events. However, no study to date has performed a genome-wide estimate of the dates of all human gene duplications. Moreover, as many of these traits are likely to have evolved through gene-culture co-evolutionary mechanisms, investigating the evolution of one such human-specific trait, lactase persistence, provides a model for future in-depth investigations of specific human phenotypes.

In this study I have investigated an important class of human-specific genomic events: gene duplications (also known as human inparalogues). I have developed a new bioinformatics approach for detecting human lineage-specific inparalogues and estimating the dates at which those genes were duplicated.
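One common way to date such a duplication is to calibrate a molecular clock against a known divergence. As a minimal sketch, assuming a strict clock with equal substitution rates in both paralogue copies and in the human and chimpanzee lineages, the divergence between two paralogues accumulates in the same way as the divergence between human and chimpanzee orthologues, so the duplication date is the human-chimpanzee split time scaled by the ratio of the two divergences. The function `estimate_duplication_date` and its parameter names are hypothetical and for illustration only; this is not necessarily the procedure used in this thesis.

```python
def estimate_duplication_date(d_paralogues: float,
                              d_human_chimp: float,
                              t_split_myr: float = 6.6) -> float:
    """Estimate a duplication date (millions of years ago) under a strict clock.

    Hypothetical helper for illustration only.
    d_paralogues:  substitutions per site between the two human paralogues
    d_human_chimp: substitutions per site between human and chimpanzee orthologues
    t_split_myr:   assumed human-chimpanzee divergence time (Myr ago)
    """
    if d_human_chimp <= 0:
        raise ValueError("human-chimpanzee divergence must be positive")
    if d_paralogues < 0:
        raise ValueError("paralogue divergence cannot be negative")
    # Both divergences accumulate along two branches at the same assumed rate r:
    # d_paralogues = 2 * r * T_dup and d_human_chimp = 2 * r * T_split,
    # so r cancels and T_dup = T_split * d_paralogues / d_human_chimp.
    return t_split_myr * d_paralogues / d_human_chimp


if __name__ == "__main__":
    # Paralogues half as divergent as human-chimpanzee orthologues
    # give roughly half the assumed split time.
    print(estimate_duplication_date(0.006, 0.012))
```

Under these assumptions a human lineage-specific inparalogue pair should yield a date between zero and the human-chimpanzee split; a value beyond the split would suggest that the pair is not lineage-specific or that the strict-clock assumption does not hold for it.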
I show that human-specific inparalogues are non-randomly distributed among biological function classes, and that their duplication dates are non-randomly distributed along the timeline between the human-chimpanzee split and the present. I have also investigated the evolution of a human-specific polymorphic trait, lactase persistence. I have performed a worldwide correlation analysis comparing frequency data for all currently known lactase persistence-associated alleles with the distribution of the lactase persistence phenotype in different human populations. I have also performed a gene-culture co-evolution analysis, employing spatially explicit simulation and Approximate Bayesian Computation to condition simulations on genetic and archaeological data, in order to make inferences about the evolution of lactase persistence and dairying in Europe.

Table of Contents.

Title Page.
Declaration of Ownership.
Abstract.
Table of Contents.
List of Figures.
List of Tables.
Abbreviations.
Acknowledgements.
1. Introduction.
   1.1. Rationale of the Study.
   1.2. The Human-Specific Phenotype: Human-Chimpanzee Differences.
   1.3. The Human Phenotype Evolution.
      1.3.1. Palaeoanthropology Perspective on Human Phenotype Evolution – Fossil Record and Morphology.
      1.3.2. Genomic Perspective on Human Phenotype Evolution – The Various Types of Genomic Events.
      1.3.3. Cultural Perspective on Human Phenotype Evolution – From Early Humans to Farming and Modernity.
   1.4. Notable Genomic Events Contributing to the Human Phenotype.
      1.4.1. Notable Genomic Events in Early Humans.
      1.4.2. Notable Genomic Events in Modern Humans.
   1.5. Integrating Early and Modern Human Genomic Studies.
2. Detecting Human-Chimpanzee Lineage Inparalogues.
   2.1. Introduction.
      2.1.1. Definitions of Evolutionary Terms Employed.
      2.1.2. Review of Orthologue and Paralogue Detection Methods.
      2.1.3. Problems with Inparalogue Detection Using InParanoid.
         2.1.3.1. Human Haplotype Data.
         2.1.3.2. Proteome Data.
         2.1.3.3. Ambiguous Data.
         2.1.3.4. Gene Conversion.
         2.1.3.5. Non-Model Organisms.
   2.2. The Human Inparalogues Detection Algorithm.
      2.2.1. Choosing an Outgroup and Filtering Data.
      2.2.2. Human-Mouse InParanoid Run.
      2.2.3. Human-Chimpanzee BLAT Run.
      2.2.4. Finding the Full Extent of Human Duplicated Regions.
      2.2.5. Alignment, Phylogenetic Trees and Molecular Clock Testing.
      2.2.6. Gene Conversion.
   2.3. The Final Candidate Human Inparalogues Set.
   2.4. Discussion.
3. Estimating Dates of Human Lineage-Specific Gene Duplications.
   3.1. Introduction.
      3.1.1. Primate Evolution and Human-Chimpanzee Divergence.
      3.1.2. Hypothesis and Rationale – Clusters of Duplication Events in the Human Lineage.
      3.1.3. Molecular Clocks and Estimating Duplication Times.
      3.1.4. Studies Dating Divergence Events.
      3.1.5. The Novelty of the Study – Correlating Genomics with the Fossil Record.
   3.2. Materials and Methods.
      3.2.1. Human Inparalogues Input.
      3.2.2. Estimating Gene Duplication Times.
      3.2.3. Detecting Duplication Date Clusters.
      3.2.4. Assigning Biological Function to Duplications.
      3.2.5. Detecting Gene Enrichment in Human Inparalogues.
   3.3. Results.