The Integrated Microbial Genomes Database and Comparative Analysis System

Total Page:16

File Type:pdf, Size:1020Kb

The Integrated Microbial Genomes Database and Comparative Analysis System University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln US Department of Energy Publications U.S. Department of Energy 2012 IMG: the integrated microbial genomes database and comparative analysis system Victor M. Markowitz Lawrence Berkeley National Laboratory, [email protected] I-Min A. Chen Lawrence Berkeley National Laboratory, [email protected] Krishna Palaniappan Lawrence Berkeley National Laboratory Ken Chu Lawrence Berkeley National Laboratory Ernest Szeto Lawrence Berkeley National Laboratory See next page for additional authors Follow this and additional works at: https://digitalcommons.unl.edu/usdoepub Part of the Bioresource and Agricultural Engineering Commons Markowitz, Victor M.; Chen, I-Min A.; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Grechkin, Yuri; Ratner, Anna; Jacob, Biju; Huang, Jinghua; Williams, Peter; Huntemann, Marcel; Anderson, Iain; Mavromatis, Konstantinos; Ivanova, Natalia N.; and Kyrpides, Nikos C., "IMG: the integrated microbial genomes database and comparative analysis system" (2012). US Department of Energy Publications. 291. https://digitalcommons.unl.edu/usdoepub/291 This Article is brought to you for free and open access by the U.S. Department of Energy at DigitalCommons@University of Nebraska - Lincoln. It has been accepted for inclusion in US Department of Energy Publications by an authorized administrator of DigitalCommons@University of Nebraska - Lincoln. Authors Victor M. Markowitz, I-Min A. Chen, Krishna Palaniappan, Ken Chu, Ernest Szeto, Yuri Grechkin, Anna Ratner, Biju Jacob, Jinghua Huang, Peter Williams, Marcel Huntemann, Iain Anderson, Konstantinos Mavromatis, Natalia N. Ivanova, and Nikos C. Kyrpides This article is available at DigitalCommons@University of Nebraska - Lincoln: https://digitalcommons.unl.edu/ usdoepub/291 Nucleic Acids Research, 2012, Vol. 40, Database issue D115–D122 doi:10.1093/nar/gkr1044 IMG: the integrated microbial genomes database and comparative analysis system Victor M. Markowitz1,*, I-Min A. Chen1, Krishna Palaniappan1, Ken Chu1, Ernest Szeto1, Yuri Grechkin1, Anna Ratner1, Biju Jacob1, Jinghua Huang1, Peter Williams2, Marcel Huntemann2, Iain Anderson2, Konstantinos Mavromatis2, Natalia N. Ivanova2 and Nikos C. Kyrpides2,* 1Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley and 2Microbial Genomics and Metagenomics Program, Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, USA Received September 15, 2011; Accepted October 24, 2011 ABSTRACT from RefSeq including its organization into chromosomal replicons (for finished genomes) and scaffolds and/or The Integrated Microbial Genomes (IMG) system contigs (for draft genomes), together with predicted serves as a community resource for comparative protein-coding sequences (CDSs), some RNA-coding analysis of publicly available genomes in a compre- genes and protein product names that are provided by hensive integrated context. IMG integrates publicly the genome sequence centres. available draft and complete genomes from all three IMG’s data integration pipeline associates every domains of life with a large number of plasmids genome with metadata from GOLD (2), and fills in add- and viruses. IMG provides tools and viewers for itional information potentially missing from the RefSeq analyzing and reviewing the annotations of genes files such as CRISPR repeats (3), signal peptides and genomes in a comparative context. IMG’s data computed using SignalP (4) and transmembrane helices content and analytical capabilities have been con- computed using TMHMM (5). Missing RNAs are identified using tRNAS-can-SE-1.23 (6) for tRNAs, in tinuously extended through regular updates since house developed HMMs for rRNAs (7), and Rfam (8) its first release in March 2005. IMG is available at and INFERNAL v1.0 (9) for other small RNAs. Genes http://img.jgi.doe.gov. Companion IMG systems are associated with ‘secondary’ functional annotations provide support for expert review of genome anno- and lists of related (e.g. homologue, paralogue) genes. tations (IMG/ER: http://img.jgi.doe.gov/er), teaching IMG generated annotations consist of protein family courses and training in microbial genome analysis and domain characterizations based on COG clusters (IMG/EDU: http://img.jgi.doe.gov/edu) and analysis and functional categories (10), Pfam (11), TIGRfam and of genomes related to the Human Microbiome TIGR role categories (12), InterPro domains (13), Gene Project (IMG/HMP: http://www.hmpdacc-resources Ontology (GO) terms (14) and KEGG Ortholog (KO) .org/img_hmp). terms and pathways (15). The association of KEGG pathways with IMG genomes is based on the assignment of KEGG Orthology (KO) terms to IMG genes via a mapping of INTRODUCTION IMG genes to KEGG genes. The MetaCyc collection of The Integrated Microbial Genomes (IMG) system inte- pathways (16) is also available in IMG, whereby the asso- grates publicly available draft and complete microbial ciation of MetaCyc pathways with IMG genomes is based genomes from all three domains of life with a large on correlating enzyme EC numbers in MetaCyc reactions number of plasmids and viruses. IMG employs NCBI’s with EC numbers associated with IMG genes via KO RefSeq resource (1) as its main source of public genome terms. Genes are further characterized using an IMG sequence data, and ‘primary’ annotations consisting of native collection of generic (protein cluster-independent) predicted genes and protein products. For every genome, functional roles called IMG terms that are defined by IMG records its primary genome sequence information their association with generic (organism-independent) *To whom correspondence should be addressed. Tel: +510 486 7073; Fax: +1 510 486 5812; Email: [email protected] Correspondence may also be addressed to Nikos C. Kyrpides. Tel: +925 296 5718; Fax: +925 296 5666; Email: [email protected] ß The Author(s) 2011. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/ by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. D116 Nucleic Acids Research, 2012, Vol. 40, Database issue functional hierarchies, called IMG pathways (17). IMG The ‘Expert Review’ version of IMG, IMG/ER (24), terms and pathways are specified by domain experts at allows individual scientists or groups of scientists to DOE-JGI as part of the process of annotating specific review and curate the functional annotation of microbial genomes of interest, and are subsequently propagated to genomes in the context of IMG’s public genomes. all the genomes in IMG using a rule based methodology Scientists can submit their private genome data sets into (18). Transporter genes are linked to the Transport Classi- IMG ER (using password protected access) prior to their fication Database (19) based on their assignment to COG, public release either with their original annotations Pfam or TIGRfam domains or IMG Terms that or with annotations generated by IMG’s annotation correspond to transporter families. pipeline (18). Since August 2009, close to 750 private For each gene, IMG provides lists of related (e.g. can- genomes have been reviewed and curated using IMG/ER. didate homologue, paralogue, orthologue) genes that are Genomes generated as part of the Human Microbiome based on sequence similarities computed using NCBI Project (HMP) (25) and the Genome Encyclopedia of BLASTp for protein coding genes and BLASTn for Bacterial and Archaea Genomes (GEBA) project (26) RNA genes. Such lists of genes can be filtered using are of special interest. With the goal of characterizing percent identity, bit score and more stringent E-values. microbial communities found at multiple human body IMG’s data integration pipeline identifies gene fusions sites, HMP has initially focused on the sequencing of and conserved gene cassettes (putative operons). A fused reference genomes from both cultured and uncultured gene (fusion) is defined as a gene that is formed from the bacteria (25). Over 550 reference genomes sequenced as composition (fusion) of two or more previously separate part of the HMP initiative, as well as over 1500 genomes genes (20). Transposases and integrases, pseudogenes, and associated with a human host and thus relevant to HMP, genes from draft genomes are not considered as putative can be examined and analyzed using IMG/HMP fusion components in order to avoid false positives caused (http://www.hmpdacc-resources.org/img_hmp/), which is by gene fragmentation. A ‘chromosomal cassette’ is provided as part of the HMP Data Analysis and defined as a stretch of genes with intergenic distance Coordination Center (DACC). smaller or equal to 300 bp (21), whereby the genes can The aim of the GEBA is to fill systematically the be on the same or different strands of the chromosome. sequencing gaps along the bacterial and archaeal Chromosomal cassettes with a minimum size of two genes branches of the tree of life. After a pilot project in 2009 common in at least two separate genomes are defined as that generated complete genomes for about 100 organisms ‘conserved chromosomal cassettes’. The identification of (26), the number of sequenced GEBA genomes has common genes across organisms is based on three gene steadily increased and stands at 205 as of August 2011. clustering methods, namely
Recommended publications
  • Comparative Genomic Analysis of Three Pseudomonas
    microorganisms Article Comparative Genomic Analysis of Three Pseudomonas Species Isolated from the Eastern Oyster (Crassostrea virginica) Tissues, Mantle Fluid, and the Overlying Estuarine Water Column Ashish Pathak 1, Paul Stothard 2 and Ashvini Chauhan 1,* 1 Environmental Biotechnology Laboratory, School of the Environment, 1515 S. Martin Luther King Jr. Blvd., Suite 305B, FSH Science Research Center, Florida A&M University, Tallahassee, FL 32307, USA; [email protected] 2 Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB T6G2P5, Canada; [email protected] * Correspondence: [email protected]; Tel.: +1-850-412-5119; Fax: +1-850-561-2248 Abstract: The eastern oysters serve as important keystone species in the United States, especially in the Gulf of Mexico estuarine waters, and at the same time, provide unparalleled economic, ecological, environmental, and cultural services. One ecosystem service that has garnered recent attention is the ability of oysters to sequester impurities and nutrients, such as nitrogen (N), from the estuarine water that feeds them, via their exceptional filtration mechanism coupled with microbially-mediated denitrification processes. It is the oyster-associated microbiomes that essentially provide these myriads of ecological functions, yet not much is known on these microbiota at the genomic scale, especially from warm temperate and tropical water habitats. Among the suite of bacterial genera that appear to interplay with the oyster host species, pseudomonads deserve further assessment because Citation: Pathak, A.; Stothard, P.; of their immense metabolic and ecological potential. To obtain a comprehensive understanding on Chauhan, A. Comparative Genomic this aspect, we previously reported on the isolation and preliminary genomic characterization of Analysis of Three Pseudomonas Species three Pseudomonas species isolated from minced oyster tissue (P.
    [Show full text]
  • Senegalemassilia Anaerobia Gen. Nov., Sp. Nov
    Standards in Genomic Sciences (2013) 7:343-356 DOI:10.4056/sigs.3246665 Non contiguous-finished genome sequence and description of Senegalemassilia anaerobia gen. nov., sp. nov. Jean-Christophe Lagier1, Khalid Elkarkouri1, Romain Rivet1, Carine Couderc1, Didier Raoult1 and Pierre-Edouard Fournier1* 1 Aix-Marseille Université, URMITE, Faculté de médecine, Marseille, France *Corresponding author: Pierre-Edouard Fournier ([email protected]) Keywords: Senegalemassilia anaerobia, genome Senegalemassilia anaerobia strain JC110T sp.nov. is the type strain of Senegalemassilia anaer- obia gen. nov., sp. nov., the type species of a new genus within the Coriobacteriaceae family, Senegalemassilia gen. nov. This strain, whose genome is described here, was isolated from the fecal flora of a healthy Senegalese patient. S. anaerobia is a Gram-positive anaerobic coccobacillus. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,383,131 bp long genome contains 1,932 protein- coding and 58 RNA genes. Introduction Classification and features Senegalemassilia anaerobia strain JC110T (= CSUR A stool sample was collected from a healthy 16- P147 = DSMZ 25959) is the type strain of S. anaer- year-old male Senegalese volunteer patient living obia gen. nov., sp. nov. This bacterium was isolat- in Dielmo (rural village in the Guinean-Sudanian ed from the feces of a healthy Senegalese patient. zone in Senegal), who was included in a research It is a Gram-positive, anaerobic, indole-negative protocol. Written assent was obtained from this coccobacillus. Classically, the polyphasic taxono- individual. No written consent was needed from his my is used to classify the prokaryotes by associat- guardians for this study because he was older than ing phenotypic and genotypic characteristics [1].
    [Show full text]
  • Three New Genome Assemblies Support a Rapid Radiation in Musa Acuminata (Wild Banana)
    GBE Three New Genome Assemblies Support a Rapid Radiation in Musa acuminata (Wild Banana) Mathieu Rouard1,*, Gaetan Droc2,3, Guillaume Martin2,3,JulieSardos1, Yann Hueber1, Valentin Guignon1, Alberto Cenci1,Bjo¨rnGeigle4,MarkS.Hibbins5,6, Nabila Yahiaoui2,3, Franc-Christophe Baurens2,3, Vincent Berry7,MatthewW.Hahn5,6, Angelique D’Hont2,3,andNicolasRoux1 1Bioversity International, Parc Scientifique Agropolis II, Montpellier, France 2CIRAD, UMR AGAP, Montpellier, France 3AGAP, Univ Montpellier, CIRAD, INRA, Montpellier SupAgro, France 4Computomics GmbH, Tuebingen, Germany 5Department of Biology, Indiana University 6Department of Computer Science, Indiana University 7LIRMM, Universite de Montpellier, CNRS, Montpellier, France *Corresponding author: E-mail: [email protected]. Accepted: October 10, 2018 Data deposition: Raw sequence reads for de novo assemblies were deposited in the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI) (BioProject: PRJNA437930 and SRA: SRP140622). Genome Assemblies and gene annotation data are available on the Banana Genome Hub (Droc G, Lariviere D, Guignon V, Yahiaoui N, This D, Garsmeur O, Dereeper A, Hamelin C, Argout X, Dufayard J-F, Lengelle J, Baurens F–C, Cenci A, Pitollat B, D’Hont A, Ruiz M, Rouard M, Bocs S. The Banana Genome Hub. Database (2013) doi:10.1093/ database/bat035) (http://banana-genome-hub.southgreen.fr/species-list). Cluster and gene tree results are available on a dedicated database (http://panmusa.greenphyl.org) hosted on the South Green Bioinformatics Platform (Guignon et al. 2016). Additional data sets are made available on Dataverse: https://doi.org/10.7910/DVN/IFI1QU. Abstract Edible bananas result from interspecific hybridization between Musa acuminata and Musa balbisiana,aswellasamongsubspeciesin M.
    [Show full text]
  • UNIVERSITY of CALIFORNIA, SAN DIEGO the Comparative Genomics
    UNIVERSITY OF CALIFORNIA, SAN DIEGO The Comparative Genomics of Salinispora and the Distribution and Abundance of Secondary Metabolite Genes in Marine Plankton A Dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Marine Biology by Kevin Matthew Penn Committee in charge: Paul R. Jensen, Chair Eric Allen Lin Chao Bradley Moore Brian Palenik Forest Rohwer 2012 UMI Number: 3499839 All rights reserved INFORMATION TO ALL USERS The quality of this reproduction is dependent on the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion. UMI 3499839 Copyright 2012 by ProQuest LLC. All rights reserved. This edition of the work is protected against unauthorized copying under Title 17, United States Code. ProQuest LLC. 789 East Eisenhower Parkway P.O. Box 1346 Ann Arbor, MI 48106 - 1346 Copyright Kevin Matthew Penn, 2012 All rights reserved The Dissertation of Kevin Matthew Penn is approved, and it is acceptable in quality and form for publication on microfilm and electronically: Chair University of California, San Diego 2012 iii DEDICATION I dedicate this dissertation to my Mom Gail Penn and my Father Lawrence Penn they deserve more credit then any person could imagine. They have supported me through the good times and the bad times. They have never given up on me and they are always excited to know that I am doing well. They just want the best for me.
    [Show full text]
  • Compact Graphical Representation of Phylogenetic Data and Metadata with Graphlan
    Compact graphical representation of phylogenetic data and metadata with GraPhlAn The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters Citation Asnicar, Francesco, George Weingart, Timothy L. Tickle, Curtis Huttenhower, and Nicola Segata. 2015. “Compact graphical representation of phylogenetic data and metadata with GraPhlAn.” PeerJ 3 (1): e1029. doi:10.7717/peerj.1029. http://dx.doi.org/10.7717/ peerj.1029. Published Version doi:10.7717/peerj.1029 Citable link http://nrs.harvard.edu/urn-3:HUL.InstRepos:17820708 Terms of Use This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http:// nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of- use#LAA Compact graphical representation of phylogenetic data and metadata with GraPhlAn Francesco Asnicar1, George Weingart2, Timothy L. Tickle3, Curtis Huttenhower2,3 and Nicola Segata1 1 Centre for Integrative Biology (CIBIO), University of Trento, Italy 2 Biostatistics Department, Harvard School of Public Health, USA 3 Broad Institute of MIT and Harvard, USA ABSTRACT The increased availability of genomic and metagenomic data poses challenges at multiple analysis levels, including visualization of very large-scale microbial and microbial community data paired with rich metadata. We developed GraPhlAn (Graphical Phylogenetic Analysis), a computational tool that produces high-quality, compact visualizations of microbial genomes and metagenomes. This includes phylogenies spanning up to thousands of taxa, annotated with metadata ranging from microbial community abundances to microbial physiology or host and environmental phenotypes. GraPhlAn has been developed as an open-source command-driven tool in order to be easily integrated into complex, publication- quality bioinformatics pipelines.
    [Show full text]
  • Across Bacterial Phyla, Distantly-Related Genomes with Similar Genomic GC Content Have Similar Patterns of Amino Acid Usage
    University of South Carolina Scholar Commons Faculty Publications Biological Sciences, Department of 3-10-2011 Across Bacterial Phyla, Distantly-Related Genomes with Similar Genomic GC Content Have Similar Patterns of Amino Acid Usage John Lightfield Noah R. Fram Bert Ely University of South Carolina - Columbia, [email protected] Follow this and additional works at: https://scholarcommons.sc.edu/biol_facpub Part of the Biology Commons Publication Info Published in PLoS ONE, Volume 6, Issue 3, 2011, pages e17677-. © 2011 Lightfield et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. This Article is brought to you by the Biological Sciences, Department of at Scholar Commons. It has been accepted for inclusion in Faculty Publications by an authorized administrator of Scholar Commons. For more information, please contact [email protected]. Across Bacterial Phyla, Distantly-Related Genomes with Similar Genomic GC Content Have Similar Patterns of Amino Acid Usage John Lightfield¤a, Noah R. Fram¤b, Bert Ely* Department of Biological Sciences, University of South Carolina, Columbia, South Carolina, United States of America Abstract The GC content of bacterial genomes ranges from 16% to 75% and wide ranges of genomic GC content are observed within many bacterial phyla, including both Gram negative and Gram positive phyla. Thus, divergent genomic GC content has evolved repeatedly in widely separated bacterial taxa. Since genomic GC content influences codon usage, we examined codon usage patterns and predicted protein amino acid content as a function of genomic GC content within eight different phyla or classes of bacteria.
    [Show full text]
  • Various Complexes of the Oral Microbial Flora in Periodontal Disease
    ISSN: 2394-8418 DOI: https://doi.org/10.17352/jdps CLINICAL GROUP Received: 01 April, 2021 Short Communication Accepted: 08 April, 2021 Published: 10 April, 2021 *Corresponding author: Dr. Sukhvinder Singh Oberoi, Various complexes of the oral BDS, MDS, Associate Professor, Public Health Den- tistry, ESIC Dental College and Hospital, Rohini, Guru microbial fl ora in periodontal Gobind Singh Indraprastha University, India, E-mail: disease Keywords: Periodontal diseases; Red complex; Orange complex Sukhvinder Singh Oberoi1*, Shabina Sachdeva2 and https://www.peertechzpublications.com Shibani Grover3 1Associate Professor, Public Health Dentistry, ESIC Dental College and Hospital, Rohini, Guru Gobind Singh Indraprastha University, India 2Professor, Prosthodontics, Faculty of Dentistry, Jamia Milia Islamia, India 3Dean and Director Professor, Conservative Dentistry and Endodontics, ESIC Dental College and Hospital, Rohini, Guru Gobind Singh Indraprastha University, India Abstract Periodontal diseases, is the infection of the periodontal tissues which eventually can lead to loss of teeth, is a form of aberrant infl ammation resulting from a complex biofi lm of friendly commensal and periodontopathic bacteria and their products, triggering the human infl ammatory response. The cluster analysis has shown that 6 closely associated bacterial complexes are associated with it which are designated with different color codes. The early colonizers are “Blue complex” consisting of Actinomyces species, “Yellow complex” comprising of various Streptococci, “Green complex” comprising Eiknella corrodens and Capnocytophaga species, and “Purple complex” comprising Veillonella parvula and Actinomyces odontolyticus. The late colonizers are “Orange complex” comprising Prevotella, Fusobacterium, Campylobacter and other bacteria and the “Red complex” chiefl y consisting of Porphyromonas gingivalis, Tannerella forsythia, and Treponema Denticola. Periodontal disease is the commonest oral disease affecting are facultative, spirochetes and motile rods.
    [Show full text]
  • Comparative and Genetic Analysis of the Four Sequenced Paenibacillus
    Eastman et al. BMC Genomics 2014, 15:851 http://www.biomedcentral.com/1471-2164/15/851 RESEARCH ARTICLE Open Access Comparative and genetic analysis of the four sequenced Paenibacillus polymyxa genomes reveals a diverse metabolism and conservation of genes relevant to plant-growth promotion and competitiveness Alexander W Eastman1,2, David E Heinrichs2 and Ze-Chun Yuan1,2* Abstract Background: Members of the genus Paenibacillus are important plant growth-promoting rhizobacteria that can serve as bio-reactors. Paenibacillus polymyxa promotes the growth of a variety of economically important crops. Our lab recently completed the genome sequence of Paenibacillus polymyxa CR1. As of January 2014, four P. polymyxa genomes have been completely sequenced but no comparative genomic analyses have been reported. Results: Here we report the comparative and genetic analyses of four sequenced P. polymyxa genomes, which revealed a significantly conserved core genome. Complex metabolic pathways and regulatory networks were highly conserved and allow P. polymyxa to rapidly respond to dynamic environmental cues. Genes responsible for phytohormone synthesis, phosphate solubilization, iron acquisition, transcriptional regulation, σ-factors, stress responses, transporters and biomass degradation were well conserved, indicating an intimate association with plant hosts and the rhizosphere niche. In addition, genes responsible for antimicrobial resistance and non-ribosomal peptide/polyketide synthesis are present in both the core and accessory genome of each strain. Comparative analyses also reveal variations in the accessory genome, including large plasmids present in strains M1 and SC2. Furthermore, a considerable number of strain-specific genes and genomic islands are irregularly distributed throughout each genome. Although a variety of plant-growth promoting traits are encoded by all strains, only P.
    [Show full text]
  • Downloaded from a Variety of Sources (For Details, See Table S1) and Between Metazoa and Dictyostelium
    UC Riverside UC Riverside Previously Published Works Title This Déjà vu feeling--analysis of multidomain protein evolution in eukaryotic genomes. Permalink https://escholarship.org/uc/item/5398f2x3 Journal PLoS computational biology, 8(11) ISSN 1553-734X Authors Zmasek, Christian M Godzik, Adam Publication Date 2012 DOI 10.1371/journal.pcbi.1002701 Peer reviewed eScholarship.org Powered by the California Digital Library University of California This De´ja` Vu Feeling—Analysis of Multidomain Protein Evolution in Eukaryotic Genomes Christian M. Zmasek*, Adam Godzik* Program in Bioinformatics and Systems Biology, Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America Abstract Evolutionary innovation in eukaryotes and especially animals is at least partially driven by genome rearrangements and the resulting emergence of proteins with new domain combinations, and thus potentially novel functionality. Given the random nature of such rearrangements, one could expect that proteins with particularly useful multidomain combinations may have been rediscovered multiple times by parallel evolution. However, existing reports suggest a minimal role of this phenomenon in the overall evolution of eukaryotic proteomes. We assembled a collection of 172 complete eukaryotic genomes that is not only the largest, but also the most phylogenetically complete set of genomes analyzed so far. By employing a maximum parsimony approach to compare repertoires of Pfam domains and their combinations, we show that independent evolution of domain combinations is significantly more prevalent than previously thought. Our results indicate that about 25% of all currently observed domain combinations have evolved multiple times. Interestingly, this percentage is even higher for sets of domain combinations in individual species, with, for instance, 70% of the domain combinations found in the human genome having evolved independently at least once in other species.
    [Show full text]
  • Analysis of Class C G-Protein Coupled Receptors Using Supervised
    Analysis of class C G-Protein Coupled Receptors using supervised classification methods Caroline Leonore König Supervised by: Dr. René Alquézar Mancho and Dr. Alfredo Vellido Alcacena Computer Science Department Universitat Politècnica de Catalunya A thesis submitted for the degree of Ph.D. in Artificial Intelligence Acknowledgments I would like to thank my advisors Dr. Alfredo Vellido and Dr. René Alquézar from the SOCO research group of the UPC, for giving me the opportunity to develop my Ph.D. thesis as part of the KAPPA-AIM1 project and work on the investigation of G protein-coupled receptors with Artificial Intelligence methods. Both of them helped me with questions and provided me useful feedback, as well as valuable advice and input at every stage of this thesis, spending a long time during the preparation of this work. As well, I would like to thank to Dr. Jesús Giraldo from the ’Institut de Neurociencies’ of the ’Universitat Autònoma de Barcelona’ (UAB) for the large collaboration in this PhD research providing so many biological insight to the study. 1KAPPA-AIM: Knowledge Acquisition in Pharmacoproteomics using Advanced Artificial Intel- ligence Methods i Abstract G protein-coupled receptors (GPCRs) are cell membrane proteins with a key role in regulating the function of cells. This is the result of their ability to transmit extracellular signals, which makes them relevant for pharmacology and has led, over the last decade, to active research in the field of proteomics. The current thesis specifically targets class C of GPCRs, which are relevant in therapies for various central nervous system disorders, such as Alzheimer’s disease, anxiety, Parkinson’s disease and schizophrenia.
    [Show full text]
  • Comparative Analysis of Plant Genomes Through Data Integration
    Comparative Analysis of Plant Genomes through Data Integration Michiel Van Bel Promoter: Prof. Dr. Yves Van de Peer Co-Promoter: Prof. Dr. Klaas Vandepoele Ghent University Faculty of Sciences Department of Plant Biotechnology and Bioinformatics VIB Department of Plant Systems Biology Bioinformatics and Systems Biology Dissertation submitted in fulfillment of the requirements for the degree of Doctor (PhD) in Sciences, Bioinformatics). Academic year: 2012-2013 Examination Committee Prof. Dr. Geert De Jaeger (chair) Faculty of Sciences, Department of Plant Biotechnology and Bioinformatics, Ghent University Prof. Dr. Yves Van de Peer (promoter) Faculty of Sciences, Department of Plant Biotechnology and Bioinformatics, Ghent University Prof. Dr. Klaas Vandepoele (co-promoter) Faculty of Sciences, Department of Plant Biotechnology and Bioinformatics, Ghent University Prof. Dr. Jan Fostier Faculty of Engineering, Department of Information Technology, Ghent University Prof. Dr. Peter Dawyndt Faculty of Science, Department of Applied Mathematics and Computer Science, Ghent University Dr. Steven Robbens Bayer Cropscience, Belgium Dr. Matthieu Conte Syngenta Seeds, France II Acknowledgements While the cover of this book carries my name, this thesis did not come to fruition by my hand only. These past years have been a great experience, for which I would like to express my gratitude to several people. First of all, I would like to thank Thomas Abeel, for getting me in touch with Yves’ research group, and encouraging me to start a PhD in bioinformatics. Without a chance encounter with him, I never would have dreamed obtaining a PhD would be possible. Secondly, I would like to thank my promoter and co-promoter, Yves Van de Peer and Klaas Vande- poele.
    [Show full text]
  • Combining Morphological and Metabarcoding Approaches Reveals the Freshwater Eukaryotic Phytoplankton Community
    Prime Archives in Environmental Research Book Chapter Combining Morphological and Metabarcoding Approaches Reveals the Freshwater Eukaryotic Phytoplankton Community Shouliang Huo1, Xiaochuang Li1*, Beidou Xi1, Hanxiao Zhang1,2, Chunzi Ma1 and Zhuoshi He1 1State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, PR China 2College of Water Sciences, Beijing Normal University, PR China *Corresponding Author: Xiaochuang Li, State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, PR China Published December 10, 2020 This Book Chapter is a republication of an article published by Xiaochuang Li, et al. at Environmental Sciences Europe in March 2020. (Huo, S., Li, X., Xi, B. et al. Combining morphological and metabarcoding approaches reveals the freshwater eukaryotic phytoplankton community. Environ Sci Eur 32, 37 (2020). https://doi.org/10.1186/s12302-020-00321-w) How to cite this book chapter: Shouliang Huo, Xiaochuang Li, Beidou Xi, Hanxiao Zhang, Chunzi Ma, Zhuoshi He. Combining Morphological and Metabarcoding Approaches Reveals the Freshwater Eukaryotic Phytoplankton Community. In: Anna Strunecka, editor. Prime Archives in Environmental Research. Hyderabad, India: Vide Leaf. 2020. 1 www.videleaf.com Prime Archives in Environmental Research © The Author(s) 2020. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Acknowledgements and Funding: This work was supported by the National Key Research and Development program of China [No. 2017YFA0605003] and the National Natural Science Foundation of China [No.
    [Show full text]