Improving the Usage of Ribosomal RNA Gene in Microbiology and Microbial Ecology Importance of Standardization and Biocuration

lk Improving the usage of ribosomal RNA gene in microbiology and microbial ecology Importance of standardization and biocuration by Pelin Yilmaz, M.Sc. A thesis submitted in partial fulllment of requirements for the degree of DOCTOR OF PHILOSOPHY in Bioinformatics Approved, Thesis Committee: Prof. Dr. Frank Oliver Glöckner (chair) Max Planck Institute for Marine Microbiology Jacobs University Prof. Dr. Matthias Ullrich (member) Jacobs University Dr. Wolfgang Ludwig (member) Technical University of Munich Date of Defense: December, 9 2011 School of Engineering and Science Statement of Sources DECLARATION I declare that this thesis is my own work and has not been submitted in any form for another degree or diploma at any university or other institution of tertiary education. Information derived from published or unpublished scientic work has been cited in the text and listed in the references. Signature Date First classification of Bacteria: 1. Bacterium, inflexible rod forms; 2. Vibrio, flexible rod forms; 3. Spirillum, spiral, inflexible; 4. Spirochaeta, spiral, flexible 5. Spirodiscus, flattened spirals. * Adapted from; Die Infusionsthierchen als vollkommene Organismen: ein Blick in das tiefere organische Leben der Natur, Christian Gottfried Ehrenberg (1838). Thesis Abstract Environmental surveys of ribosomal RNA genes have become the standard approach to microbial ecology in the last twenty-five years. Although the ribosomal RNA approach has been competing with metagenomics during the past years, with high-throughput sequencing technologies presenting the ability to document the diversity and richness of the microbial biome at an astonishing rate, it is once again under spotlight. High-throughput sequencing technologies bring with them a sequence data deluge, and present challenges to the traditional data management, analysis and retrieval practices. Biocuration, which leads to specialized ribosomal RNA datasets, has become an important part of microbial ecology. More recently, the development and use of metadata standards has emerged as an important aid for biocuration in microbiology. With the help of such standards, it will be possible to curate specialized datasets, which integrate data from even more diverse resources, and bring the organisms, its functions, and the environment together. The work accomplished during this Ph.D. thesis aimed in improving the usage of ribosomal RNA gene in microbiology and microbial ecology by means of curating high- quality specialized datasets for ribosomal RNA gene sequences, and performing research that demonstrates the value of such datasets. More importantly, a metadata standard for marker genes sequences, such as ribosomal RNA, was developed and disseminated to a variety of bioinformatics service providers for implementation. This standard will greatly aid biocuration work in microbiology, and will improve the quantity and quality of integrative resources for microbiology, ultimately providing researchers the ability to understand the diversity and functioning of the microbial biosphere. Table of Contents Introduction ................................................................................... 1 Early Microbiology and Microbial Ecology ........................................... 1 The Molecular Dawn for Bacteria ............................................................ 3 Understanding the organism ................................................................................................... 3 Revisiting evolution ............................................................................................................... 3 Accessing the microbial biosphere ......................................................................................... 5 Cataloging Bacterial and Archaeal Diversity .......................................... 7 Primary databases ................................................................................................................... 7 Secondary databases ............................................................................................................... 8 rRNA Approach in the Age of “Omics” ................................................ 10 Is the rRNA approach still relevant? ..................................................................................... 10 High-throughput microbial diversity ..................................................................................... 13 Missing ingredient; contextual data ....................................................................................... 14 Motivation and Research Aims .............................................................. 18 Results and Discussion ................................................................ 21 Overview ................................................................................................... 22 Contextual Data Standards for Biocuration ......................................... 25 I. Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications .................................................. 27 II. The genomic standards consortium: bringing standards to life for microbial ecology .... 43 III. MetaBar - a tool for consistent contextual data acquisition and standards compliant submission ............................................................................................................................. 51 IV. CDinFusion – Submission-ready, on-line Integration of Sequence and Contextual Data ............................................................................................................................................... 71 Curation of rRNA Datasets ..................................................................... 89 V. SILVA: Comprehensive databases for quality checked and aligned ribosomal RNA sequence data compatible with ARB .................................................................................... 91 VI. Megx.net: integrated database resource for marine ecological genomics ..................... 105 Usage of Curated Datasets in Microbial Ecology ............................... 119 VII. Analysis of 23S rRNA genes in metagenomes – A case study from the Global Ocean Sampling Expedition ........................................................................................................... 121 VIII. Ecological structuring of bacterial and archaeal taxa in ocean surface waters .......... 143 Summary .................................................................................... 177 Contextual Data Standards for Biocuration ....................................... 177 Minimum information about any (x) sequence (MIxS) ...................................................... 177 Implementation of MIxS ..................................................................................................... 180 Curation of rRNA Datasets .................................................................. 183 SILVA taxonomy ................................................................................................................ 183 Georeferenced diversity in megx.net ................................................................................... 186 Usage of Curated Datasets in Microbial Ecology ............................... 188 Investigating the potential of 23S rRNA genes in metagenomes ........................................ 188 Ecological structuring of bacterial and archaeal taxa in the marine environment .............. 189 Outlook ....................................................................................... 191 Appendix .................................................................................... 195 Acknowledgements .................................................................... 197 Bibliography ............................................................................... 199 INTRODUCTION Early Microbiology and Microbial Ecology The story of Bacteria started as a curiosity of nature with Leeuwenhoek’s microscopy observations in 1670s. Until mid-1800s, Bacteria were largely ignored; in fact it was not even sure that they were living organisms. The actual realization of the significance of Bacteria, without doubt, came with Robert Koch’s postulation of the “Germ Theory of Disease”. Having found an “applied” perspective, microbiology (specifically the study of Bacteria and Archaea) made a kick-start, and numerous findings ranging from biogenesis to development of vaccines dotted the early timeline of this field. Due to their implications in human health, other aspects of microbiology lagged behind, but did attain pace with the seminal works of Sergei Winogradsky and Martinus Beijerinck. Perhaps, those of the medical microbiology unfairly dominated their achievements. In fact, microbial ecology’s existence would not have been, if it were not for their discoveries of nitrogen fixation or chemoautotrophy, which are cornerstones in today’s understanding of the biogeochemical cycling of elements in the environment. Medical microbiology not only overshadowed Winogradsky and Beijericks’s achievements, but also made it difficult for microbiology to be realized as a “basic science”. However, Beijerinck had another vision for microbiology, rather than confining it to the applications of medicine. In the statement he made during his speech at the Dutch Royal Academy of Sciences in 1905, he outlines the best practices to studying microorganisms [1]: “[My] approach can

Improving the Usage of Ribosomal RNA Gene in Microbiology and Microbial Ecology Importance of Standardization and Biocuration

Insights Into the Dynamics Between Viruses and Their Hosts in a Hot Spring Microbial Mat

GSC Workshop 10 Grant Finances Monthly Teleconference Calls

The Minimum Information About a Genome Sequence (MIGS) Specification

Meeting Report: “Metagenomics, Metadata and Meta-Analysis” (M3) Workshop at the Pacific Symposium on Biocomputing 2010

The Positive Role of the Ecological Community in the Genomic Revolution

F. Nucleatum Sub Spp Vincentii and Its Comparison with the Genome of F

Volume 12, Issue 2

High-Quality Permanent Draft Genome Sequence of Bradyrhizobium Sp Th.B2, a Microsymbiont of Amphicarpaea Bracteata Collected in Johnson City, New York

Third Annual DOE Joint Genome Institute User Meeting

The Future of Microbial Metagenomics (Or Is Ignorance Bliss?)

GSC 6 and GSC 7 Meeting Reports Lent Speakers and Talks

The Genome Sequence of the Facultative Intracellular Pathogen Brucella Melitensis