Genome-Based Microbial Taxonomy Coming of Age
Downloaded from http://cshperspectives.cshlp.org/ on September 24, 2021 - Published by Cold Spring Harbor Laboratory Press

Philip Hugenholtz, Adam Skarshewski, and Donovan H. Parks

Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, St Lucia QLD 4072, Australia

Correspondence: [email protected]

Reconstructing the complete evolutionary history of extant life on our planet will be one of the most fundamental accomplishments of scientific endeavor, akin to the completion of the periodic table, which revolutionized chemistry. The road to this goal is via comparative genomics because genomes are our most comprehensive and objective evolutionary documents. The genomes of plant and animal species have been systematically targeted over the past decade to provide coverage of the tree of life. However, multicellular organisms only emerged in the last 550 million years of more than three billion years of biological evolution and thus comprise a small fraction of total biological diversity. The bulk of biodiversity, both past and present, is microbial. We have only scratched the surface in our understanding of the microbial world, as most microorganisms cannot be readily grown in the laboratory and remain unknown to science. Ground-breaking, culture-independent molecular techniques developed over the past 30 years have opened the door to this so-called microbial dark matter with an accelerating momentum driven by exponential increases in sequencing capacity. We are on the verge of obtaining representative genomes across all life for the first time. However, historical use of morphology, biochemical properties, behavioral traits, and single-marker genes to infer organismal relationships means that the existing highly incomplete tree is riddled with taxonomic errors.
Concerted efforts are now needed to synthesize and integrate the burgeoning genomic data resources into a coherent universal tree of life and genome-based taxonomy.

Editor: Howard Ochman
Additional Perspectives on Microbial Evolution available at www.cshperspectives.org
Copyright © 2016 Cold Spring Harbor Laboratory Press; all rights reserved; doi: 10.1101/cshperspect.a018085
Cite this article as Cold Spring Harb Perspect Biol 2016;8:a018085

SETTING THE STAGE FOR A TAXONOMIC CLASSIFICATION BASED ON EVOLUTIONARY RELATIONSHIPS

Closely following on from Darwin's thesis that all life forms on our planet arose from a common ancestor (Darwin 1859), have been biologists' attempts to classify life naturalistically according to evolution. Initially and understandably, phenotype (morphology, development, etc.) was the primary basis for inferring relationships between organisms, resulting in a schema that lumped all microbial life (which had been discovered 200 years earlier through the advent of the microscope) into a single "primitive" kingdom at the base of the tree (Fig. 1A) (Haeckel 1866). The discovery of the structure of DNA in the latter half of the 20th century and its role as the heritable blueprint of life led to the proposal that genes are a more objective basis than phenotype for inferring evolutionary (phylogenetic) relationships (Zuckerkandl and Pauling 1965). Carl Woese ran with this idea, focusing on comparison of universally conserved components of the protein manufacturing machinery in the cell, the ribosomal RNA (rRNA) genes, which he correctly reasoned would produce an objective tree of life. His first trees and all subsequent efforts turned the phenotype-based tree on its head; instead of microorganisms occupying a lowly corner of the tree, all multicellular life clustered together in a corner of one of three newly described primary lines of descent (Fig. 1B) (Woese and Fox 1977). Small subunit ribosomal RNA (16S rRNA)-based classification of bacteria and archaea was enthusiastically embraced by microbiologists following Woese's discoveries, in large part because natural relationships between microbes are virtually undetectable using phenotypic properties (Stanier and Van Niel 1962). Thirty years on, 16S rRNA sequences form the basis of microbial classification; however, vast numbers of discrepancies exist between taxonomy and phylogeny, with many currently defined taxa not forming evolutionarily coherent (monophyletic) groups. A conspicuous case in point is the genus Clostridium, which is superficially united by a common morphology and ability to produce endospores, but represents dozens of phylogenetically distinct groups within the phylum Firmicutes (Yutin and Galperin 2013). This greatly impedes our ability to understand the ecology and evolution of ecosystems, such as mammalian guts, where clostridia are important functional populations. The large number of unresolved taxonomic errors is due to a combination of historical artifacts (phenotypic classification) and limitations with rRNA gene trees such as poor phylogenetic resolution, inadequate reference sequences, and sequencing artifacts (notably chimeras, as discussed below). Genome trees inferred using multiple marker genes offer greater phylogenetic resolution than 16S rRNA and other single-marker gene trees (Ciccarelli et al. 2006; Lang et al. 2013) and are, therefore, a more reliable basis for taxonomic classification. However, publicly available genome sequences are still far from representative of microbial diversity as a whole, as revealed by the development of culture-independent methods.

Figure 1. Two representations of the tree of life. The first (A) is based on phenotypic comparisons, resulting in lumping of microorganisms into a single undifferentiated mass at the base of the tree (circled in red), and the second (B) is based on genotypic (rRNA genes) comparisons, revealing that most diversity (including the Eucarya) is actually microbial, with multicellular life forms only emerging relatively recently in evolutionary history (circled in red). (Panel A is reproduced from Haeckel 1866, and is freely available in the public domain and free of known restrictions under copyright law. Panel B is based on Figure 2 in Barns et al. 1996.)
[Figure 1 image not reproduced. Panel B labels the three domains Bacteria, Archaea, and Eucarya; scale bar, 0.1 changes/site.]

MOST MICROBIAL DIVERSITY IS UNCULTURED BUT HAS RECENTLY BECOME READILY ACCESSIBLE GENOMICALLY

Culture-independent molecular techniques founded on 16S rRNA in the mid-1980s (Olsen et al. 1986) highlighted our ignorance of most of the tree of life by crudely outlining its borders. This was achieved by sequencing 16S rRNA genes from bulk DNAs extracted directly from environmental sources (Pace 1997). This type of microbial community profiling has improved with increased sequencing and computing capacity. The startling conclusion from more than two decades of such culture-independent environmental sequence surveys is that >80% of microbial evolutionary diversity is represented by uncultured microorganisms distributed across upward of 100 major lines of descent within the Bacteria and Archaea (Harris et al. 2013) and that the amount of recognized microbial dark matter is still increasing (Fig. 2).

The task of obtaining representative genomic coverage of all recognized microbial diversity, estimated conservatively to represent hundreds of thousands of species (Curtis and Sloan 2005), is daunting. However, two promising culture-independent approaches have emerged to achieve this goal. The first is metagenomics, the application of high-throughput sequencing of DNA extracted directly from environmental samples, and the second is single-cell genomics, the physical separation and amplification of cells before sequencing. Initially, it was unfeasible to extract genomes of individual populations from metagenomic data (a bioinformatic process called binning) because of insufficient sequencing depth and inadequate binning tools. Only in some instances could complete or near-complete genomes be reconstructed from environmental sequences, and these were typically of dominant populations with minimal genomic heterogeneity (Tyson et al. 2004; Garcia Martin et al. 2006; Elkins et al. 2008). In contrast, most populations in early metagenomic studies remained unidentified, being