Evolutionary Analyses of Orphan Genes in Mouse Lineages in the Context of De Novo Gene Birth

Evolutionary analyses of orphan genes in mouse lineages in the context of de novo gene birth Dissertation zur Erlangung des Doktorgrades der Mathematisch-Naturwissenschaftlichen Fakultät der Christian-Albrechts-Universität zu Kiel vorgelegt von Rafik Tarek Neme Garrido Plön, April, 2013 Erstgutachter: Prof. Dr. Diethard Tautz Zweitgutachter: Prof. Dr. Thomas C. G. Bosch Tag der mündlichen Prüfung: 07.07.2014 Zum Druck genehmigt: 07.07.2014 gez. Prof. Dr. Wolfgang Duschl (Dekan) 2 Contents Contents .......................................................................................................................................... 3 Summary of the thesis .................................................................................................................... 6 Zusammenfassung der Dissertation............................................................................................... 7 Acknowledgements ....................................................................................................................... 10 General introduction...................................................................................................................... 12 A brief historic perspective on the concepts of gene birth .................................................... 12 Gene duplication is the main source of new genes .............................................................. 12 Orphan genes and the genomics era .................................................................................... 14 Phylostratigraphy and the continuous emergence of new genes ......................................... 16 Not all genes come from other genes ................................................................................... 17 Considering gene birth from molecular and evolutionary perspectives ................................... 19 Overprinting: true innovation from existing genes .................................................................... 20 The life cycle of genes .............................................................................................................. 22 Overview.................................................................................................................................... 24 Chapter 1: Phylogenetic patterns of emergence of new genes support a model of frequent de novo evolution ............................................................................................................................... 26 Introduction................................................................................................................................ 26 Results....................................................................................................................................... 27 Phylostratigraphy of mouse genes ........................................................................................ 27 Genomic features across ages.............................................................................................. 29 Chromosomal distribution ...................................................................................................... 33 Association with transcriptionally active sites ....................................................................... 33 Testis expressed genes......................................................................................................... 35 Alternative reading frames..................................................................................................... 36 Discussion ................................................................................................................................. 39 De novo evolution versus duplication-divergence ................................................................ 40 Regulatory evolution .............................................................................................................. 40 Overprinting ........................................................................................................................... 41 Conclusion................................................................................................................................. 42 Methods ..................................................................................................................................... 43 Phylostratigraphy ................................................................................................................... 43 3 Gene structure analyses........................................................................................................ 43 Transcription associated regions........................................................................................... 44 Expression data for testis ...................................................................................................... 44 Secondary reading frames .................................................................................................... 44 Acknowledgements ................................................................................................................... 45 Chapter 2: Sequencing of genomes and transcriptomes of closely related mouse species....... 46 Introduction................................................................................................................................ 46 Using wild mice to understand gene birth at the transcriptome level ................................... 46 Phylogeographic distribution of the samples ........................................................................ 47 Methods ..................................................................................................................................... 49 Biological material.................................................................................................................. 49 Transcriptome sequencing .................................................................................................... 49 Genome sequencing.............................................................................................................. 49 Raw data processing ............................................................................................................. 50 Transcriptome read mapping, annotation and quantification................................................ 50 Genome read mapping .......................................................................................................... 51 Available resources ................................................................................................................... 51 Chapter 3: Differential selective constrains across phylogenetic ages and their impact on the turnover of protein-coding genes. ................................................................................................. 53 Introduction................................................................................................................................ 53 Methods ..................................................................................................................................... 53 Transcriptome assembly ....................................................................................................... 53 Generation of ortholog pairs and rate analyses .................................................................... 54 Overlapping genes................................................................................................................. 54 Reading frame polymorphism detection and annotation ...................................................... 55 Statistical analyses ................................................................................................................ 55 Results....................................................................................................................................... 55 Rate differences between genes of different ages ............................................................... 55 Overlapping genes are an unlikely source of bias ................................................................ 57 Impact of reading frame polymorphisms across phylogenetic time...................................... 59 Discussion ................................................................................................................................. 64 Acknowledgements ................................................................................................................... 66 Chapter 4: A transcriptomics approach to the gain and loss of de novo genes in mouse lineages ....................................................................................................................................................... 67 Introduction................................................................................................................................ 67 4 How is a gene made? ............................................................................................................ 67 The early phase of new gene emergence............................................................................. 69 Pervasive transcription and junk-DNA as raw material for new genes ................................ 70 Methods ..................................................................................................................................... 71 Transcriptome presence/absence matrix and mapping of gains and losses ......................

Evolutionary Analyses of Orphan Genes in Mouse Lineages in the Context of De Novo Gene Birth

Identification of an Overprinting Gene in Merkel Cell Polyomavirus Provides Evolutionary Insight Into the Birth of Viral Genes

Examining the Process of De Novo Gene Birth: an Educational Primer on “Integration of New Genes Into Cellular Networks, and Their Structural Maturation”

Uncovering De Novo Gene Birth in Yeast Using Deep Transcriptomics

Forest Dormouse (Dryomys Nitedula, Rodentia, Gliridae) – a Highly Contagious Rodent in Relation to Zoonotic Diseases

Genesis of Non-Coding RNA Genes in Human Chromosome 22—A Sequence Connection with Protein Genes Separated by Evolutionary Time

Seasonal Changes in Tawny Owl (Strix Aluco) Diet in an Oak Forest in Eastern Ukraine

The Turkmen Lake Altyn Asyr and Water Resources in Turkmenistan the Handbook of Environmental Chemistry

Intergenic Orfs As Elementary Structural Modules of De Novo Gene Birth And

Evolution of Novel Genes in Three-Spined Stickleback Populations

Turnover of Ribosome-Associated Transcripts

Biosystematics Study of Steppe Field Mouse Apodemus Witherbyi (Rodentia: Muridae) from North-West Iran

From De Novo to 'De Nono': Most Novel Protein Coding Genes Identified With