Evolutionary Analyses of Orphan Genes in Mouse Lineages in the Context of De Novo Gene Birth

Total Page:16

File Type:pdf, Size:1020Kb

Evolutionary Analyses of Orphan Genes in Mouse Lineages in the Context of De Novo Gene Birth Evolutionary analyses of orphan genes in mouse lineages in the context of de novo gene birth Dissertation zur Erlangung des Doktorgrades der Mathematisch-Naturwissenschaftlichen Fakultät der Christian-Albrechts-Universität zu Kiel vorgelegt von Rafik Tarek Neme Garrido Plön, April, 2013 Erstgutachter: Prof. Dr. Diethard Tautz Zweitgutachter: Prof. Dr. Thomas C. G. Bosch Tag der mündlichen Prüfung: 07.07.2014 Zum Druck genehmigt: 07.07.2014 gez. Prof. Dr. Wolfgang Duschl (Dekan) 2 Contents Contents .......................................................................................................................................... 3 Summary of the thesis .................................................................................................................... 6 Zusammenfassung der Dissertation............................................................................................... 7 Acknowledgements ....................................................................................................................... 10 General introduction...................................................................................................................... 12 A brief historic perspective on the concepts of gene birth .................................................... 12 Gene duplication is the main source of new genes .............................................................. 12 Orphan genes and the genomics era .................................................................................... 14 Phylostratigraphy and the continuous emergence of new genes ......................................... 16 Not all genes come from other genes ................................................................................... 17 Considering gene birth from molecular and evolutionary perspectives ................................... 19 Overprinting: true innovation from existing genes .................................................................... 20 The life cycle of genes .............................................................................................................. 22 Overview.................................................................................................................................... 24 Chapter 1: Phylogenetic patterns of emergence of new genes support a model of frequent de novo evolution ............................................................................................................................... 26 Introduction................................................................................................................................ 26 Results....................................................................................................................................... 27 Phylostratigraphy of mouse genes ........................................................................................ 27 Genomic features across ages.............................................................................................. 29 Chromosomal distribution ...................................................................................................... 33 Association with transcriptionally active sites ....................................................................... 33 Testis expressed genes......................................................................................................... 35 Alternative reading frames..................................................................................................... 36 Discussion ................................................................................................................................. 39 De novo evolution versus duplication-divergence ................................................................ 40 Regulatory evolution .............................................................................................................. 40 Overprinting ........................................................................................................................... 41 Conclusion................................................................................................................................. 42 Methods ..................................................................................................................................... 43 Phylostratigraphy ................................................................................................................... 43 3 Gene structure analyses........................................................................................................ 43 Transcription associated regions........................................................................................... 44 Expression data for testis ...................................................................................................... 44 Secondary reading frames .................................................................................................... 44 Acknowledgements ................................................................................................................... 45 Chapter 2: Sequencing of genomes and transcriptomes of closely related mouse species....... 46 Introduction................................................................................................................................ 46 Using wild mice to understand gene birth at the transcriptome level ................................... 46 Phylogeographic distribution of the samples ........................................................................ 47 Methods ..................................................................................................................................... 49 Biological material.................................................................................................................. 49 Transcriptome sequencing .................................................................................................... 49 Genome sequencing.............................................................................................................. 49 Raw data processing ............................................................................................................. 50 Transcriptome read mapping, annotation and quantification................................................ 50 Genome read mapping .......................................................................................................... 51 Available resources ................................................................................................................... 51 Chapter 3: Differential selective constrains across phylogenetic ages and their impact on the turnover of protein-coding genes. ................................................................................................. 53 Introduction................................................................................................................................ 53 Methods ..................................................................................................................................... 53 Transcriptome assembly ....................................................................................................... 53 Generation of ortholog pairs and rate analyses .................................................................... 54 Overlapping genes................................................................................................................. 54 Reading frame polymorphism detection and annotation ...................................................... 55 Statistical analyses ................................................................................................................ 55 Results....................................................................................................................................... 55 Rate differences between genes of different ages ............................................................... 55 Overlapping genes are an unlikely source of bias ................................................................ 57 Impact of reading frame polymorphisms across phylogenetic time...................................... 59 Discussion ................................................................................................................................. 64 Acknowledgements ................................................................................................................... 66 Chapter 4: A transcriptomics approach to the gain and loss of de novo genes in mouse lineages ....................................................................................................................................................... 67 Introduction................................................................................................................................ 67 4 How is a gene made? ............................................................................................................ 67 The early phase of new gene emergence............................................................................. 69 Pervasive transcription and junk-DNA as raw material for new genes ................................ 70 Methods ..................................................................................................................................... 71 Transcriptome presence/absence matrix and mapping of gains and losses ......................
Recommended publications
  • Identification of an Overprinting Gene in Merkel Cell Polyomavirus Provides Evolutionary Insight Into the Birth of Viral Genes
    Identification of an overprinting gene in Merkel cell polyomavirus provides evolutionary insight into the birth of viral genes Joseph J. Cartera,b,1,2, Matthew D. Daughertyc,1, Xiaojie Qia, Anjali Bheda-Malgea,3, Gregory C. Wipfa, Kristin Robinsona, Ann Romana, Harmit S. Malikc,d, and Denise A. Gallowaya,b,2 Divisions of aHuman Biology, bPublic Health Sciences, and cBasic Sciences and dHoward Hughes Medical Institute, Fred Hutchinson Cancer Research Center, Seattle, WA 98109 Edited by Peter M. Howley, Harvard Medical School, Boston, MA, and approved June 17, 2013 (received for review February 24, 2013) Many viruses use overprinting (alternate reading frame utiliza- mammals and birds (7, 8). Polyomaviruses leverage alternative tion) as a means to increase protein diversity in genomes severely splicing of the early region (ER) of the genome to generate pro- constrained by size. However, the evolutionary steps that facili- tein diversity, including the large and small T antigens (LT and ST, tate the de novo generation of a novel protein within an ancestral respectively) and the middle T antigen (MT) of murine poly- ORF have remained poorly characterized. Here, we describe the omavirus (MPyV), which is generated by a novel splicing event and identification of an overprinting gene, expressed from an Alter- overprinting of the second exon of LT. Some polyomaviruses can nate frame of the Large T Open reading frame (ALTO) in the early drive tumorigenicity, and gene products from the ER, especially region of Merkel cell polyomavirus (MCPyV), the causative agent SV40 LT and MPyV MT, have been extraordinarily useful models of most Merkel cell carcinomas.
    [Show full text]
  • Examining the Process of De Novo Gene Birth: an Educational Primer on “Integration of New Genes Into Cellular Networks, and Their Structural Maturation”
    PRIMER Examining the Process of de Novo Gene Birth: An Educational Primer on “Integration of New Genes into Cellular Networks, and Their Structural Maturation” Seth Frietze and Judith Leatherman1 School of Biological Sciences, University of Northern Colorado, Greeley, Colorado 80639 SUMMARY New genes that arise from modification of the noncoding portion of a genome rather than being duplicated from parent genes are called de novo genes. These genes, identified by their brief evolution and lack of parent genes, provide an opportunity to study the timeframe in which emerging genes integrate into cellular networks, and how the characteristics of these genes change as they mature into bona fide genes. An article by G. Abrusán provides an opportunity to introduce students to fundamental concepts in evolutionary and comparative genetics and to provide a technical background by which to discuss systems biology approaches when studying the evolutionary process of gene birth. Basic background needed to understand the Abrusán study and details on comparative genomic concepts tailored for a classroom discussion are provided, including discussion questions and a supplemental exercise on navigating a genome database. Related article in GENETICS: Abrusán, G., 2013 Integration of New Genes into Cellular Networks, and Their Structural Maturation. Genetics 195: 1407–1417. Background duplication has occurred over and over during evolutionary history, leaving most species with “families” of related genes IFE as we know it is encoded by the DNA of our genomes. arising from multiple duplication events (Demuth and Hahn Every cellular being has its own genome specifying the L 2009). When an organism duplicates a gene, one gene copy information required for constructing and maintaining its exis- can continue to perform the original function of that gene tence (its genotype).
    [Show full text]
  • Uncovering De Novo Gene Birth in Yeast Using Deep Transcriptomics
    ARTICLE https://doi.org/10.1038/s41467-021-20911-3 OPEN Uncovering de novo gene birth in yeast using deep transcriptomics William R. Blevins 1,2,3, Jorge Ruiz-Orera 1,4, Xavier Messeguer5, Bernat Blasco-Moreno 6, ✉ José Luis Villanueva-Cañas 1,7, Lorena Espinar2,8, Juana Díez 6, Lucas B. Carey 2,9 & M. Mar Albà 1,10 De novo gene origination has been recently established as an important mechanism for the formation of new genes. In organisms with a large genome, intergenic and intronic regions 1234567890():,; provide plenty of raw material for new transcriptional events to occur, but little is know about how de novo transcripts originate in more densely-packed genomes. Here, we identify 213 de novo originated transcripts in Saccharomyces cerevisiae using deep transcriptomics and genomic synteny information from multiple yeast species grown in two different conditions. We find that about half of the de novo transcripts are expressed from regions which already harbor other genes in the opposite orientation; these transcripts show similar expression changes in response to stress as their overlapping counterparts, and some appear to translate small proteins. Thus, a large fraction of de novo genes in yeast are likely to co-evolve with already existing genes. 1 Evolutionary Genomics Group, Research Programme on Biomedical Informatics, Hospital del Mar Research Institute (IMIM) and Universitat Pompeu Fabra (UPF), Barcelona, Spain. 2 Single Cell Behavior Group, Department of Experimental and Health Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain. 3 Single Cell Genomics Group, Centro Nacional de Análisis Genómico (CNAG), Barcelona, Spain. 4 Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany.
    [Show full text]
  • Forest Dormouse (Dryomys Nitedula, Rodentia, Gliridae) – a Highly Contagious Rodent in Relation to Zoonotic Diseases
    FORESTRY IDEAS, 2020, vol. 26, No 1 (59): 262–269 FOREST DORMOUSE (DRYOMYS NITEDULA, RODENTIA, GLIRIDAE) – A HIGHLY CONTAGIOUS RODENT IN RELATION TO ZOONOTIC DISEASES Alexey Andreychev1* and Ekaterina Boyarova2 1Department of Zoology, National Research Mordovia State University, Saransk 430005, Russia. *E-mail: [email protected] 2Center for Hygiene and Epidemiology in the Republic of Mordovia, Saransk 430030, 1a Dal’naya Str., Russia. Received: 22 March 2020 Accepted: 23 May 2020 Abstract Republic of Mordovia in Russia is a historical focus for hemorrhagic fever with renal syndrome and tularemia. This study aimed at assessing the current status of these foci by studying their rodent reservoirs. Among the small rodents in Mordovia, the red bank vole, the common vole and the house mouse play an important role as carriers of zoonotic diseases. However, it is neces- sary to take into account the role of such а small species as forest dormouse (Dryomys nitedula), which has a high percentage of infection. Of all examined dormice, 66.7 % were found to have antigens of hemorrhagic fever with renal syndrome viral pathogen, and 33.3 % - to have antigens of tularemia pathogen. Only one specimen (16.7 %) was found to have no antigens of zoonotic diseases. Our study concluded that the forest dormouse in the Republic of Mordovia was incrim- inated as a pathogen reservoir causing infectious diseases in human. Key words: hemorrhagic fever, Mordovia, rodents, tularemia. Introduction gion of Middle Volga (Ulyanovsk region) (Nafeev et al. 2009). Small mammals are Hemorrhagic fever with renal syndrome the most incriminated reservoirs for viral (HFRS) is the most common natural ro- pathogen agents causing human HFRS dent borne disease of viral etiology in the in the natural environments (Tsvirko and Republic of Mordovia.
    [Show full text]
  • Genesis of Non-Coding RNA Genes in Human Chromosome 22—A Sequence Connection with Protein Genes Separated by Evolutionary Time
    non-coding RNA Perspective Genesis of Non-Coding RNA Genes in Human Chromosome 22—A Sequence Connection with Protein Genes Separated by Evolutionary Time Nicholas Delihas Department of Microbiology and Immunology, Renaissance School of Medicine, Stony Brook University, Stony Brook, New York, NY 11794-5222, USA; [email protected] Received: 16 July 2020; Accepted: 1 September 2020; Published: 3 September 2020 Abstract: A small phylogenetically conserved sequence of 11,231 bp, termed FAM247, is repeated in human chromosome 22 by segmental duplications. This sequence forms part of diverse genes that span evolutionary time, the protein genes being the earliest as they are present in zebrafish and/or mice genomes, and the long noncoding RNA genes and pseudogenes the most recent as they appear to be present only in the human genome. We propose that the conserved sequence provides a nucleation site for new gene development at evolutionarily conserved chromosomal loci where the FAM247 sequences reside. The FAM247 sequence also carries information in its open reading frames that provides protein exon amino acid sequences; one exon plays an integral role in immune system regulation, specifically, the function of ubiquitin-specific protease (USP18) in the regulation of interferon. An analysis of this multifaceted sequence and the genesis of genes that contain it is presented. Keywords: de novo gene birth; gene evolution; protogene; long noncoding RNA genes; pseudogenes; USP18; GGT5; Alu sequences 1. Introduction The genesis of genes has been a major topic of interest for several decades [1,2]. One mechanism of gene formation is by duplication of existing genes [1,3].
    [Show full text]
  • Seasonal Changes in Tawny Owl (Strix Aluco) Diet in an Oak Forest in Eastern Ukraine
    Turkish Journal of Zoology Turk J Zool (2017) 41: 130-137 http://journals.tubitak.gov.tr/zoology/ © TÜBİTAK Research Article doi:10.3906/zoo-1509-43 Seasonal changes in Tawny Owl (Strix aluco) diet in an oak forest in Eastern Ukraine 1, 2 Yehor YATSIUK *, Yuliya FILATOVA 1 National Park “Gomilshanski Lisy”, Kharkiv region, Ukraine 2 Department of Zoology and Animal Ecology, Faculty of Biology, V.N. Karazin Kharkiv National University, Kharkiv, Ukraine Received: 22.09.2015 Accepted/Published Online: 25.04.2016 Final Version: 25.01.2017 Abstract: We analyzed seasonal changes in Tawny Owl (Strix aluco) diet in a broadleaved forest in Eastern Ukraine over 6 years (2007– 2012). Annual seasons were divided as follows: December–mid-April, April–June, July–early October, and late October–November. In total, 1648 pellets were analyzed. The most important prey was the bank vole (Myodes glareolus) (41.9%), but the yellow-necked mouse (Apodemus flavicollis) (17.8%) dominated in some seasons. According to trapping results, the bank vole was the most abundant rodent species in the study region. The most diverse diet was in late spring and early summer. Small forest mammals constituted the dominant group in all seasons, but in spring and summer their share fell due to the inclusion of birds and the common spadefoot (Pelobates fuscus). Diet was similar in late autumn, before the establishment of snow cover, and in winter. The relative representation of species associated with open spaces increased in winter, especially in years with deep snow cover, which may indicate seasonal changes in the hunting habitats of the Tawny Owl.
    [Show full text]
  • The Turkmen Lake Altyn Asyr and Water Resources in Turkmenistan the Handbook of Environmental Chemistry
    The Handbook of Environmental Chemistry 28 Series Editors: Damià Barceló · Andrey G. Kostianoy Igor S. Zonn Andrey G. Kostianoy Editors The Turkmen Lake Altyn Asyr and Water Resources in Turkmenistan The Handbook of Environmental Chemistry Founded by Otto Hutzinger Editors-in-Chief: Damia` Barcelo´ l Andrey G. Kostianoy Volume 28 Advisory Board: Jacob de Boer, Philippe Garrigues, Ji-Dong Gu, Kevin C. Jones, Thomas P. Knepper, Alice Newton, Donald L. Sparks The Handbook of Environmental Chemistry Recently Published and Forthcoming Volumes The Turkmen Lake Altyn Asyr and Emerging and Priority Pollutants in Water Resources in Turkmenistan Rivers: Bringing Science into River Volume Editors: I.S. Zonn Management Plans and A.G. Kostianoy Volume Editors: H. Guasch, A. Ginebreda, Vol. 28, 2014 and A. Geiszinger Vol. 19, 2012 Oil Pollution in the Baltic Sea Global Risk-Based Management of Volume Editors: A.G. Kostianoy Chemical Additives I: Production, and O.Yu. Lavrova Usage and Environmental Occurrence Vol. 27, 2014 Volume Editors: B. Bilitewski, R.M. Darbra, and D. Barcelo´ Urban Air Quality in Europe Vol. 18, 2012 Volume Editor: M. Viana Vol. 26, 2013 Polyfluorinated Chemicals and Transformation Products Climate Change and Water Resources Volume Editors: T.P. Knepper Volume Editors: T. Younos and C.A. Grady and F.T. Lange Vol. 25, 2013 Vol. 17, 2012 Emerging Organic Contaminants in Brominated Flame Retardants Sludges: Analysis, Fate and Biological Volume Editors: E. Eljarrat and D. Barcelo´ Treatment Vol. 16, 2011 Volume Editors: T. Vicent, G. Caminal, E. Eljarrat, and D. Barcelo´ Effect-Directed Analysis of Complex Vol. 24, 2013 Environmental Contamination Volume Editor: W.
    [Show full text]
  • Intergenic Orfs As Elementary Structural Modules of De Novo Gene Birth And
    bioRxiv preprint doi: https://doi.org/10.1101/2021.04.13.439703; this version posted April 13, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. 1 Title 2 3 Intergenic ORFs as elementary structural modules of de novo gene birth and 4 protein evolution 5 6 Running Title 7 de novo gene birth and protein evolution 8 9 Authors 10 11 Chris Papadopoulos1, Isabelle Callebaut2, Jean-Christophe Gelly3,4,5, Isabelle Hatin1, Olivier 12 Namy1, Maxime Renard1, Olivier Lespinet1, Anne Lopes1 13 14 15 Affiliations 16 17 1 Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France. 18 2 Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique 19 des Matériaux et de Cosmochimie, IMPMC, 75005, Paris, France 20 3 Université de Paris, Biologie Intégrée du Globule Rouge, UMR_S1134, BIGR, INSERM, F-75015, Paris, France. 21 4 Laboratoire d’Excellence GR-Ex, Paris, France 22 5 Institut National de la Transfusion Sanguine, F-75015, Paris, France 23 24 25 26 Corresponding author 27 28 Anne Lopes, Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ. Paris-Sud, 29 Université Paris-Saclay, 91198, Gif-sur-Yvette, cedex, France 30 Tel: +33 1 69 82 62 23 31 email: [email protected] 32 33 34 35 36 Keywords 37 de novo genes, noncoding genome, foldability, genome evolution, protein evolution, protein bricks 38 39 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.13.439703; this version posted April 13, 2021.
    [Show full text]
  • Evolution of Novel Genes in Three-Spined Stickleback Populations
    Heredity (2020) 125:50–59 https://doi.org/10.1038/s41437-020-0319-7 ARTICLE Evolution of novel genes in three-spined stickleback populations 1 2 1 Jonathan F. Schmitz ● Frédéric J. J. Chain ● Erich Bornberg-Bauer Received: 24 November 2019 / Revised: 27 April 2020 / Accepted: 30 April 2020 / Published online: 4 June 2020 © The Author(s) 2020. This article is published with open access Abstract Eukaryotic genomes frequently acquire new protein-coding genes which may significantly impact an organism’s fitness. Novel genes can be created, for example, by duplication of large genomic regions or de novo, from previously non-coding DNA. Either way, creation of a novel transcript is an essential early step during novel gene emergence. Most studies on the gain-and-loss dynamics of novel genes so far have compared genomes between species, constraining analyses to genes that have remained fixed over long time scales. However, the importance of novel genes for rapid adaptation among populations has recently been shown. Therefore, since little is known about the evolutionary dynamics of transcripts across natural populations, we here study transcriptomes from several tissues and nine geographically distinct populations of an ecological model species, the three-spined stickleback. Our findings suggest that novel genes typically start out as transcripts with low expression and high tissue specificity. Early expression regulation appears to be mediated by gene-body methylation. 1234567890();,: 1234567890();,: Although most new and narrowly expressed genes are rapidly lost, those that survive and subsequently spread through populations tend to gain broader and higher expression levels. The properties of the encoded proteins, such as disorder and aggregation propensity, hardly change.
    [Show full text]
  • Turnover of Ribosome-Associated Transcripts
    Turnover of ribosome-associated transcripts from de novo ORFs produces gene-like characteristics available for de novo gene emergence in wild yeast populations Eléonore Durand, Isabelle Gagnon-Arsenault, Johan Hallin, Isabelle Hatin, Alexandre Dubé, Lou Nielly-Thibault, Olivier Namy, Christian Landry To cite this version: Eléonore Durand, Isabelle Gagnon-Arsenault, Johan Hallin, Isabelle Hatin, Alexandre Dubé, et al.. Turnover of ribosome-associated transcripts from de novo ORFs produces gene-like characteristics available for de novo gene emergence in wild yeast populations. Genome Research, Cold Spring Harbor Laboratory Press, 2019, 29, Epub ahead of print. 10.1101/gr.239822.118. hal-02146516 HAL Id: hal-02146516 https://hal.archives-ouvertes.fr/hal-02146516 Submitted on 17 Nov 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Research Turnover of ribosome-associated transcripts from de novo ORFs produces gene-like characteristics available for de novo gene emergence in wild yeast populations Éléonore Durand,1,4 Isabelle Gagnon-Arsenault,1,2
    [Show full text]
  • Biosystematics Study of Steppe Field Mouse Apodemus Witherbyi (Rodentia: Muridae) from North-West Iran
    Iranian Journal of Animal Biosystematics(IJAB) Vol.5, No.1, 47-58, 2009 ISSN: 1735-434X Biosystematics study of Steppe Field Mouse Apodemus witherbyi (Rodentia: Muridae) from North-West Iran MOHAMADALI HOSSEINPOUR FEIZI1, JAMSHID DARVISH2, NASER POULADI4, SAFIE AKBARI RAD2 AND ROOHOLLAH SIAHSARVIE2,3 1- Center of Excellence for Biodiversity, University of Tabriz, Iran 2- Rodentology Research Dept., Faculty of Science, Ferdowsi University of Mashhad, 91775-1436, Mashhad, Iran. 3- Institut des Sciences de l’Evolution, Cc 064, Université de Montpellier 2, Place Eugene Bataillon, 34095 Montpellier Cedex 5, France 4- Azarbaijan University of Tarbiat Moallem, Iran Apodemus witherbyi is a species with wide distribution and geographic variation in the Iranian plateau. This species exists in syntopic, sympatric or parapatric with other four reported Sylvaemus species from Iran, i.e. A. hyrcanicus, A. uralensis, A. flavicolis and A. avicennicus. In this study 33 specimens from different localities in NW Iran were examined to study the taxonomic status and the intra-specific variation of the populations. The study was carried out based on the RFLP analysis of cytochrome b (mtDNA), as well as the morphological and morphometric analyses external, cranial and dental characters. The results reveal that all the specimens studied belong to a same species, A. witherbyi. Variation range of the morphometric characters and the frequency of the morphological traits are provided. Key words: wood mouse, Apodemus witherbyi, RFLP, morphology, morphometry INTRODUCTION Wood mice of the genus Apodemus Kaup, 1829 have been the subject of many systematic and evolutionary studies in the last decades (e.g. Mezhzherin 1997; Mezhzherin et al 1989 and 1992; Fillipucci et al.
    [Show full text]
  • From De Novo to 'De Nono': Most Novel Protein Coding Genes Identified With
    bioRxiv preprint doi: https://doi.org/10.1101/287193; this version posted March 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 1 From de novo to ‘de nono’: most novel protein coding genes 2 identified with phylostratigraphy represent old genes or recent 3 duplicates 4 5 Claudio Casola1* 6 7 1 Department of Ecosystem Science and Management, Texas A&M University, 8 College Station, TX 77843-2138 9 10 11 Abstract 12 The evolution of novel protein-coding genes from noncoding regions of the 13 genome is one of the most compelling evidence for genetic innovations in nature. 14 One popular approach to identify de novo genes is phylostratigraphy, which 15 consists of determining the approximate time of origin (age) of a gene based on 16 its distribution along a species phylogeny. Several studies have revealed 17 significant flaws in determining the age of genes, including de novo genes, using 18 phylostratigraphy alone. However, the rate of false positives in de novo gene 19 surveys, based on phylostratigraphy, remains unknown. Here, I re-analyze the 20 findings from three studies, two of which identified tens to hundreds of rodent- 21 specific de novo genes adopting a phylostratigraphy-centered approach. Most of 22 the putative de novo genes discovered in these investigations are no longer 23 included in recently updated mouse gene sets. Using a combination of synteny 24 information and sequence similarity searches, I show that about 60% of the 25 remaining 381 putative de novo genes share homology with genes from other 26 vertebrates, originated through gene duplication, and/or share no synteny 27 information with non-rodent mammals.
    [Show full text]