Hundreds of Novel Composite Genes and Chimeric Genes with Bacterial

Hundreds of novel composite genes and chimeric genes with bacterial origins contributed to haloarchaeal evolution Raphaël Méheust, Andrew Watson, Francois-Joseph Lapointe, R. Thane Papke, Philippe Lopez, Eric Bapteste To cite this version: Raphaël Méheust, Andrew Watson, Francois-Joseph Lapointe, R. Thane Papke, Philippe Lopez, et al.. Hundreds of novel composite genes and chimeric genes with bacterial origins contributed to haloarchaeal evolution. Genome Biology, BioMed Central, 2018, 19, pp.75. 10.1186/s13059-018- 1454-9. hal-01830047 HAL Id: hal-01830047 https://hal.sorbonne-universite.fr/hal-01830047 Submitted on 4 Jul 2018 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Distributed under a Creative Commons Attribution| 4.0 International License Méheust et al. Genome Biology (2018) 19:75 https://doi.org/10.1186/s13059-018-1454-9 RESEARCH Open Access Hundreds of novel composite genes and chimeric genes with bacterial origins contributed to haloarchaeal evolution Raphaël Méheust1, Andrew K. Watson1, François-Joseph Lapointe3, R. Thane Papke2, Philippe Lopez1 and Eric Bapteste1* Abstract Background: Haloarchaea, a major group of archaea, are able to metabolize sugars and to live in oxygenated salty environments. Their physiology and lifestyle strongly contrast with that of their archaeal ancestors. Amino acid optimizations, which lowered the isoelectric point of haloarchaeal proteins, and abundant lateral gene transfers from bacteria have been invoked to explain this deep evolutionary transition. We use network analyses to show that the evolution of novel genes exclusive to Haloarchaea also contributed to the evolution of this group. Results: We report the creation of 320 novel composite genes, both early in the evolution of Haloarchaea during haloarchaeal genesis and later in diverged haloarchaeal groups. One hundred and twenty-six of these novel composite genes derived from genetic material from bacterial genomes. These latter genes, largely involved in metabolic functions but also in oxygenic lifestyle, constitute a different gene pool from the laterally acquired bacterial genes formerly identified. These novel composite genes were likely advantageous for their hosts, since they show significant residence times in haloarchaeal genomes—consistent with a long phylogenetic history involving vertical descent and lateral gene transfer—and encode proteins with optimized isoelectric points. Conclusions: Overall, our work encourages a systematic search for composite genes across all archaeal major groups, in order to better understand the origins of novel prokaryotic genes, and in order to test to what extent archaea might have adjusted their lifestyles by incorporating and recycling laterally acquired bacterial genetic fragments into new archaeal genes. Background transitions (a process we termed “haloarchaeal genesis”) Haloarchaea (also called Halobacteria) is an archaeal implied that Haloarchaea faced at least two major issues. class in which all members thrive in oxygenated hypersa- It involved numerous genetic events to transform their line environments using aerobic respiration and reduced physiology, as well as amino acid optimizations, which carbon sources. This lifestyle is in distinct contrast with allowed their proteins to remain soluble, resulting in the physiology of their methanogenic ancestors, which lower isoelectric points than their homologs outside this were autotrophic, and lived in oxygen-free habitats [1]. group [3]. While the latter changes can result from point Furthermore, Haloarchaea adapted to extreme osmotic mutation, abundant lateral gene transfers (LGT) from challenges by adopting a salt-in strategy making their bacteria have repeatedly been invoked to explain the cytosolic salinity equal to that of their environment – evolution and adaptation to oxygenic lifestyle of this ar- halophilic methanogens use compatible solutes to chaeal lineage [4]. balance their osmotic pressures [2]. These major lifestyle Phylogenetic studies, largely focused on the acquisition of full-sized genes by Haloarchaea from bacterial donors, * Correspondence: [email protected] proposed either a sudden and massive introgressive 1Sorbonne Universités, UPMC Univ Paris 06, Institut de Biologie Paris Seine, process [5, 6], or a more gradual and piecemeal process Centre National de la Recherche Scientifique, Unité Mixte de Recherche 7138 [7, 8] to explain the gains of a thousand gene families Evolution Paris Seine, 75005 Paris, France Full list of author information is available at the end of the article with bacterial origins in the haloarchaeal group [5, 6]. © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Méheust et al. Genome Biology (2018) 19:75 Page 2 of 12 Integrative modeling of gene and genome evolution in networks [19] that rely on both full and partial (e.g. the archaea has also suggested that, though gene families protein domain) pairwise similarity values to analyze are largely vertically transmitted within archaea, LGT similarity between sequences and to test whether gene has had a significant impact on archaeal genome remodeling was involved in the emergence of Haloarch- evolution, outnumbering expansion of genomes by du- aea and in their subsequent evolution. We report the plication of existing archaeal gene families in the major- creation of hundreds of novel composite genes, both ity of branches of the archaeal tree, including in the early in the evolution of Haloarchaea (during haloarch- haloarchaea where rates of LGT were particularly high aeal genesis) and later in diverged haloarchaeal groups. [9]. Importantly, in addition to these recognized LGTs, Based on the taxonomic assignment of the components the evolution of novel genes within the group could also of the composite genes, we distinguish three classes of explain how Haloarchaea arose and thrive, and this is composite genes, exclusively found in Haloarchaea. First, supported by the discovery of gene families correspond- we identified class I composite genes, which are formed ing to potential de novo origins in the Haloarchaea [9]. from unique associations of DNA components from ar- This type of change has rarely been assessed because lit- chaeal genomes (i.e. ARC-ARC and ARC-HALO com- tle is known about the origins of novel genes in prokary- posite genes). Second, we identified class II composite otic genomes [10, 11]. There are a range of different genes, which are derived from unique combinations of mechanisms that can produce novel genes, including de prokaryotic DNA, since these components cannot be novo genes, synthesized either partly or completely from confidently assigned either to a bacterial or an archaeal non-coding DNA [12], from the divergence of an exist- host (i.e. PROK-PROK, PROK-ARC, PROK-HALO com- ing protein-coding sequence beyond the point at which posite genes). Third, we identified chimeric composite it is recognizable as a homologue (e.g. following gene (ChiC) genes in Haloarchaea, which are made up of (at duplication events), or by fusion or fission of existing least one) component from bacterial genomes (i.e. protein-coding sequences [13]. New genes have been BAC-HALO, ARC-BAC, and PROK-BAC composite shown to be able to fulfil crucial roles in biological genes). Importantly, these latter genes constitute a differ- processes after relatively short evolutionary times for ent gene pool from the laterally acquired bacterial genes different lineages, highlighting their potential importance in detected by Nelson-Sathi [5]. Hence, haloarchaeal ChiC biological transitions [14]. Yet, events of gene remodeling genes reveal an additional substantial bacterial contribu- leading to the creation of novel genes, possibly contributing tion during the evolution of Haloarchaea. Many of these to haloarchaeal genesis and to the success of their descen- novel composite genes, i.e. ChiC genes and class I and II dants, remain to be systematically investigated. composite genes, were likely advantageous for these Gene remodeling has been described in prokaryotes, hosts. They showed significant residence times in mainly resulting from the fusion and fission of full-sized haloarchaeal genomes, as assessed by their taxonomic genes [15]. These two processes produce detectable distributions, consistent with a long phylogenetic history composite genes, i.e. genes composed of dissociable/as- involving vertical descent, and by the optimized isoelec- sociable parts, called components. Moreover, the transfer tric points

Hundreds of Novel Composite Genes and Chimeric Genes with Bacterial

The Biocyc Collection of Microbial Genomes and Metabolic Pathways Peter D

Biological Capacities Clearly Define a Major Subdivision in Domain Bacteria 4 5 Raphaël Méheust+,1,2, David Burstein+,1,3,8, Cindy J

Psortm: a Bacterial and Archaeal Protein Subcellular Localization Prediction Tool for Metagenomics Data Michael A

Pathway Tools Version 24.0: Integrated Software for Pathway/Genome Informatics and Systems Biology

A Sample Article Title

Computational Prediction and Comparative Analysis of Protein Subcellular Localization in Bacteria

Ontoloki: an Automatic, Instance-Based Method for the Evaluation of Biological Ontologies on the Semantic Web1

View Course Material

Dbmloc: a Database of Proteins with Multiple Subcellular Localizations Song Zhang†, Xuefeng Xia†, Jincheng Shen, Yun Zhou and Zhirong Sun*

Psortdb—An Expanded, Auto-Updated, User-Friendly Protein Subcellular Localization Database for Bacteria and Archaea Nancy Y

Ortholugedb: a Bacterial and Archaeal Orthology Resource for Improved Comparative Genomic Analysis Matthew D

Computational Ortholog Prediction: Evaluating Use Cases and Improving High-Throughput Performance