How Do Biologists Assemble Genomes?
Graph Algorithms
Phillip Compeau and Pavel Pevzner
Bioinformatics Algorithms: An Active Learning Approach
©2015 by Compeau and Pevzner. All rights reserved.

The Newspaper Problem

Outline
• What Is Genome Sequencing?
• The String Reconstruction Problem
• String Reconstruction as a Walk in the Overlap Graph
• Another Graph for String Reconstruction
• The Seven Bridges of Königsberg
• Euler’s Theorem
• De Bruijn Graphs Face Harsh Realities of Assembly

What’s Inside the Cell Nucleus?

Contents of the Nucleus
• The nucleus contains chromosomes.
• Humans have 23 pairs of chromosomes (one in each pair comes from each parent).
• But what are chromosomes made of?

DNA: The Building Block of Life
• One more zoom, and we reach the molecular level.
• Early 1950s: Researchers start uncovering properties of the chromosomal substance now called “deoxyribonucleic acid”: DNA.
• 1953: Watson and Crick publish the “double helix” structure of DNA.

Molecular Structure of DNA
[Figure: DNA’s double helix and DNA’s molecular structure]
• Nucleotide: Half of one “rung” of DNA.
• Four choices for the base of a nucleotide:
1. Adenine (A)
2. Cytosine (C)
3. Guanine (G): bonds to C
4. Thymine (T): bonds to A

Why Is DNA Important?
Central Dogma of Molecular Biology: DNA is transcribed into RNA, which is then translated into protein (chain of amino acids).
Courtesy: Rachel Raynes

Genome: A Long DNA “Book”
• Genome: The nucleotide sequence read down one side of an organism’s chromosomal DNA.
…CCGTAGTCGCATGGAACAGTATACGAGACAGTACAGATACGATACGATACGATCATTAACCGAGAGTACCAGATTCCAGATCATAC
TTACGCTTAGCTACGGACGTACGATACCCAGATTACGATCCATATAGATATAACCGGTGTGTCTTGCTAATACGTAACGGGGTGCCT
TCGATAGGTCAGAATACCAGATCTCTCGATCTTCTTACAGATACTACGATCCCCAGATACTACCCCTACTGACCCATCGTACGGGTA
CTACTACGGATATGATACCGATGTAGAGGGATCCATATATCCCGAGACGTCTCGCGCATAAGATCATCGTCTAGATACACGTACGTA
CTAGACTAGCGTATGCCTCTTATGATCGTCCCGATCGAGTCGCGTGCTCAGAAAAGCTACGATACGATACCCGATACTAGACCATAG…
• A human genome has about 3 billion nucleotides.
• Biologists want to be able to read this book. This is what it means to sequence a genome.

We Share 99.9% of Our Genomes
CTGATGATGGACTACGCTACTACTGCTAGCTGTATTACGA
TCAGCTACCACATCGTAGCTACGATGCATTAGCAAGCTAT
CGATCGATCGATCGATTATCTACGATCGATCGATCGATCA
CTATACGAGCTACTACGTACGTACGATCGCGGGACTATTA
TCGACTACAGATAAAACATGCTAGTACAACAGTATACATA
GCTGCGGGATACGATTAGCTAATAGCTGACGATATCCGAT

CTGATGATGGACTACGCTACTACTGCTAGCTGTATTACGA
TCAGCTACAACATCGTAGCTACGATGCATTAGCAAGCTAT
CGATCGATCGATCGATTATCTACGATCGATCGATCGATCA
CTATACGAGCTACTACGTACGTACGATCGCGTGACTATTA
TCGACTACAGATGAAACATGCTAGTACAACAGTATACATA
GCTGCGGGATACGATTAGCTAATAGCTGACGATATCCGAT

Species vs. Individual Sequencing
Species Sequencing: What is the “consensus” genome of an entire species?
Individual Sequencing: What makes an individual unique within their species?

Why Sequence a Species’s Genome?
[Figure: tree of life spanning Bacteria (including the Candidate Phyla Radiation), Archaea, and Eukaryotes. Hug et al., 2016, Nature Microbiology, DOI: 10.1038/NMICROBIOL.2016.48. © 2016 Macmillan Publishers Limited. All rights reserved.]
Figure 1 | A current view of the tree of life, encompassing the total diversity represented by sequenced genomes. The tree includes 92 named bacterial phyla, 26 archaeal phyla and all five of the Eukaryotic supergroups. Major lineages are assigned arbitrary colours and named, with well-characterized lineage names, in italics. Lineages lacking an isolated representative are highlighted with non-italicized names and red dots. For details on taxon sampling and tree inference, see Methods. The names Tenericutes and Thermodesulfobacteria are bracketed to indicate that these lineages branch within the Firmicutes and the Deltaproteobacteria, respectively. Eukaryotic supergroups are noted, but not otherwise delineated due to the low resolution of these lineages. The CPR phyla are assigned a single colour as they are composed entirely of organisms without isolated representatives, and are still in the process of definition at lower taxonomic levels. The complete ribosomal protein tree is available in rectangular format with full bootstrap values as Supplementary Fig. 1 and in Newick format in Supplementary Dataset 2.

Why Sequence an Individual’s Genome?
Personalized Medicine: Tailoring medical treatment to the individual based on their genetics.
2010: First person whose life was saved due to genome sequencing.

Brief History of Genome Sequencing
1977: Walter Gilbert and Frederick Sanger develop sequencing techniques independently.
1980: They share the Nobel Prize. The resulting sequencing methods cost $1 per nucleotide.
1990: The public Human Genome Project, headed by Francis Collins, aims to sequence the human genome.
1997: Craig Venter founds Celera Genomics, a private firm with the same goal.
2000: Draft of human genome is simultaneously completed by the Human Genome Consortium (public) and Celera Genomics (private).
2000s: Race is on to sequence other mammalian genomes.
2008: US passes Genetic Information Nondiscrimination Act.
2013: UK declares public funding to sequence 100,000 human genomes.
2015: Illumina reduces cost of sequencing an individual human genome to $1,000.

The Future of Genomics

What Makes Genome Sequencing Hard?
Sequencing machines can only read short pieces of DNA (~250 nucleotides long), called reads.

General Idea of Genome Assembly
• Multiple identical copies of a genome
• Shatter the genome into reads
• Sequence the reads: AGAATATCA, TGAGAATAT, GAGAATATC
• Assemble the genome using overlapping reads:
     TGAGAATAT
      GAGAATATC
       AGAATATCA
  ...TGAGAATATCA...

STOP and Think: What does this remind you of?

Complications in Genome Assembly
1. DNA is double-stranded (and may consist of multiple chromosomes).
2. Reads have imperfect coverage of the underlying genome.
3. Sequencing machines are error-prone.

Assumptions for Genome Assembly
1. DNA is single-stranded (and consists of a single chromosome, as in bacteria).
2. Reads have perfect coverage of the underlying genome: every k-mer in the genome is present.
3. Sequencing machines are error-free.

k-mer Composition
The k-mer composition of a string Text, denoted Composition_k(Text), is the collection of all k-mer substrings of Text (including repeats).
Composition_3(TATGGGGTGC) = {ATG, GGG, GGG, GGT, GTG, TAT, TGC, TGG}

String Reconstruction Problem
String Reconstruction Problem: Reconstruct a string from its k-mer composition.
• Input: An integer k and a collection Patterns of k-mers.
• Output: A string Text with k-mer composition equal to Patterns (if such a string exists).
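The definition of k-mer composition can be made concrete with a short Python sketch. This is an illustration, not code from the slides: the function names composition and has_composition are hypothetical, and listing the k-mers in lexicographic order is a choice made here to match how the slide writes Composition_3(TATGGGGTGC). The second function only checks the output condition of the String Reconstruction Problem for a candidate Text; it does not solve the problem.

```python
from collections import Counter


def composition(text: str, k: int) -> list[str]:
    """Return every k-mer substring of text, repeats included, sorted lexicographically."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A string of length n has n - k + 1 substrings of length k (none if k > n).
    return sorted(text[i:i + k] for i in range(len(text) - k + 1))


def has_composition(text: str, k: int, patterns: list[str]) -> bool:
    """Check whether the k-mer composition of text equals patterns as a multiset."""
    # Counter comparison respects repeats, so GGG appearing twice must be matched twice.
    return Counter(composition(text, k)) == Counter(patterns)


if __name__ == "__main__":
    # The example from the k-mer Composition slide.
    print(composition("TATGGGGTGC", 3))

    # The three distinct reads from the assembly slide are exactly the 9-mers
    # of the assembled fragment TGAGAATATCA.
    reads = ["AGAATATCA", "TGAGAATAT", "GAGAATATC"]
    print(has_composition("TGAGAATATCA", 9, reads))
```

Comparing multisets rather than sets matters because the composition includes repeats: a string containing GGG once would otherwise be accepted for a collection that lists it twice.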
Exercise Break: Reconstruct a string having the 3-mer composition {AAT, ATG, GTT, TAA, TGT}. What algorithm did you use?

HOW DO WE ASSEMBLE GENOMES?
Solving the String Composition Problem is a straightforward exercise, but in order to model genome