Evolutionary Analyses of Orphan Genes in Mouse Lineages in the Context of De Novo Gene Birth
Dissertation submitted for the doctoral degree of the Faculty of Mathematics and Natural Sciences of the Christian-Albrechts-Universität zu Kiel

Submitted by Rafik Tarek Neme Garrido
Plön, April 2014

First reviewer: Prof. Dr. Diethard Tautz
Second reviewer: Prof. Dr. Thomas C. G. Bosch
Date of oral examination: 07.07.2014
Approved for printing: 07.07.2014
Signed: Prof. Dr. Wolfgang Duschl (Dean)

Contents

Summary of the thesis
Zusammenfassung der Dissertation (summary of the thesis in German)
Acknowledgements
General introduction
  A brief historic perspective on the concepts of gene birth
    Gene duplication is the main source of new genes
    Orphan genes and the genomics era
    Phylostratigraphy and the continuous emergence of new genes
    Not all genes come from other genes
  Considering gene birth from molecular and evolutionary perspectives
  Overprinting: true innovation from existing genes
  The life cycle of genes
  Overview
Chapter 1: Phylogenetic patterns of emergence of new genes support a model of frequent de novo evolution
  Introduction
  Results
    Phylostratigraphy of mouse genes
    Genomic features across ages
    Chromosomal distribution
    Association with transcriptionally active sites
    Testis expressed genes
    Alternative reading frames
  Discussion
    De novo evolution versus duplication-divergence
    Regulatory evolution
    Overprinting
  Conclusion
  Methods
    Phylostratigraphy
    Gene structure analyses
    Transcription associated regions
    Expression data for testis
    Secondary reading frames
  Acknowledgements
Chapter 2: Sequencing of genomes and transcriptomes of closely related mouse species
  Introduction
    Using wild mice to understand gene birth at the transcriptome level
    Phylogeographic distribution of the samples
  Methods
    Biological material
    Transcriptome sequencing
    Genome sequencing
    Raw data processing
    Transcriptome read mapping, annotation and quantification
    Genome read mapping
  Available resources
Chapter 3: Differential selective constraints across phylogenetic ages and their impact on the turnover of protein-coding genes
  Introduction
  Methods
    Transcriptome assembly
    Generation of ortholog pairs and rate analyses
    Overlapping genes
    Reading frame polymorphism detection and annotation
    Statistical analyses
  Results
    Rate differences between genes of different ages
    Overlapping genes are an unlikely source of bias
    Impact of reading frame polymorphisms across phylogenetic time
  Discussion
  Acknowledgements
Chapter 4: A transcriptomics approach to the gain and loss of de novo genes in mouse lineages
  Introduction
    How is a gene made?
    The early phase of new gene emergence
    Pervasive transcription and junk-DNA as raw material for new genes
  Methods
    Transcriptome presence/absence matrix and mapping of gains and losses
  Results
    How much of the mouse genome has evidence of transcription?
    Genome-wide