Metagenomics and the Unveiling of Biological Dark Matter Robert J
Total Page:16
File Type:pdf, Size:1020Kb
Robbins et al. Standards in Genomic Sciences (2016) 11:69 DOI 10.1186/s40793-016-0180-8 REVIEW ARTICLE Open Access Advances in biodiversity: metagenomics and the unveiling of biological dark matter Robert J. Robbins1,2, Leonard Krishtalka1* and John C. Wooley2 Abstract Background: Efforts to harmonize genomic data standards used by the biodiversity and metagenomic research communities have shown that prokaryotic data cannot be understood or represented in a traditional, classical biological context for conceptual reasons, not technical ones. Results: Biology, like physics, has a fundamental duality—the classical macroscale eukaryotic realm vs. the quantum microscale microbial realm—with the two realms differing profoundly, and counter-intuitively, from one another. Just as classical physics is emergent from and cannot explain the microscale realm of quantum physics, so classical biology is emergent from and cannot explain the microscale realm of prokaryotic life. Classical biology describes the familiar, macroscale realm of multi-cellular eukaryotic organisms, which constitute a highly derived and constrained evolutionary subset of the biosphere, unrepresentative of the vast, mostly unseen, microbial world of prokaryotic life that comprises at least half of the planet’s biomass and most of its genetic diversity. The two realms occupy fundamentally different mega-niches: eukaryotes interact primarily mechanically with the environment, prokaryotes primarily physiologically. Further, many foundational tenets of classical biology simply do not apply to prokaryotic biology. Conclusions: Classical genetics one held that genes, arranged on chromosomes like beads on a string, were the fundamental units of mutation, recombination, and heredity. Then, molecular analysis showed that there were no fundamental units, no beads, no string. Similarly, classical biology asserts that individual organisms and species are fundamental units of ecology, evolution, and biodiversity, composing an evolutionary history of objectively real, lineage-defined groups in a single-rooted tree of life. Now, metagenomic tools are forcing a recognition that there are no completely objective individuals, no unique lineages, and no one true tree. The newly revealed biosphere of microbial dark matter cannot be understood merely by extending the concepts and methods of eukaryotic macrobiology. The unveiling of biological dark matter is allowing us to see, for the first time, the diversity of the entire biosphere and, to paraphrase Darwin, is providing a new view of life. Advancing and understanding that view will require major revisions to some of the most fundamental concepts and theories in biology. Keywords: Prokaryotes, Multi-cellular eukaryotes, Metagenomics, Biodiversity, Tree of life Abbreviations: BSC, Biological species concept; CRT, Conditionally rare taxa; HGT, Horizontal gene transfer; MCE, Multicellular eukaryote Background eukaryotes into biodiversity data sets was straightforward, deal- This paper grew out of our participation in an NSF-funded ing with metagenomic data in a traditional biodiversity context effort to harmonize data standards used by the biodiversity proved to be difficult for conceptual, not technical, reasons. research community with those being developed and used in Because scientific data models should correspond to the the genomic research community and, in particular, with the reality being modeled, good models usually contain data emerging and growing metagenomic research community. objects that are formal abstractions of fundamental scientific Although integrating genomic data from multicellular concepts. When new discoveries challenge these fundamen- tal concepts, the underlying data models must adapt or become obsolete. For example, the discovery that a gene * Correspondence: [email protected] could occur within another gene [1] rendered obsolete any 1Biodiversity Institute, University of Kansas, 1345 Jayhawk Blvd., Lawrence, KS 66045, USA Full list of author information is available at the end of the article © 2016 The Author(s). Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Robbins et al. Standards in Genomic Sciences (2016) 11:69 Page 2 of 17 data model that was based on the fundamental assumption instead have profound effects upon the function and fitness that genes are non-overlapping objects. of the multicellular organism. We now know that all MCEs Although non-eukaryotic microbes have long been known are actually functional composites, holobionts, composed of to be a ubiquitous and substantial component of the bio- more prokaryotic cells than eukaryotic cells and expressing sphere, the vast majority have been unculturable and thus more prokaryotic genes than eukaryotic genes. A full under- completely resistant to study, rendering them effectively in- standing of the biology of “individual” eukaryotes will now visible. Until recently, a microscopic examination of any en- depend on an understanding of their associated microbiomes. vironment revealed a teeming, but largely mysterious In 1908, William Bateson [8] described the significance prokaryotic world. (Note: Although “prokaryote” has been of the emerging new methods of Mendelian genetics: deprecated as a taxonomic concept [2], in this document we use the word in the informal sense of “not a eukaryote.”)A In research, as in all business of exploration, the stirring 1997 colloquium report from The American Academy of times come when a fresh region is suddenly unlocked by Microbiology [3] noted: the discovery of a new key. Then conquest is easy and there are prizes for all. … It is no hyperbolical figure that I use [I]n contrast to plants and vertebrate animals in when I speak of Mendelian discovery leading us into a new which 85 to 90 % have been described, it is world, the very existence of which was unsuspected before. conservatively estimated that less than 1 % of the bacterial species … are currently known. As genetics did for the 20th century, so metagenomics is doing for the 21st century—it is leading us into a new In 1998, however, a new technology began to emerge that world of discovery, with prizes for all. Applied to the would transform our ability to study prokaryotes: biosphere, two genomoscope discoveries are fundamental: Handelsman et al. [4] coined the term “metagenome” to first, most of the world’s biodiversity occurs in the characterize the collective genomes of all microbes in a soil prokaryotic realm, the full nature of which was unsus- sample. They proposed to analyze this soil metagenome pected before. Second, many “foundational” biological directly, without first having to culture the organisms in the concepts—such as the objective reality of individual orga- soil. Over the next five years a handful of papers explored nisms—are inadequate to describe and explain this emer- the possibilities of this new method, which (thanks to the ging new world. The full incorporation of the prokaryotic constant drop in cost of sequencing) moved from creating realm into our understanding of the biosphere will enrich clone libraries to the generation of full sequences. In 2003, biology, but at the expense of our having to rethink and Schloss and Handelsman [5] offered the term “metage- reconceptualize some of our most basic notions. nomics” to describe “the genomic analysis of uncultured The resulting conceptual adjustments will offer real chal- microorganisms.” lenges to our current view of biodiversity and will greatly As the utility of metagenomics became apparent, its use complicate the informatics tools needed to document bio- grew rapidly. A PubMed query shows that the number of diversity. Not only will biodiversity informatics need to deal published papers involving metagenomes and metagenomics with an explosion in the amount of biodiversity-relevant has grown from one in 1998 to nearly ten thousand today. data, it may well need to accommodate data that are of a Metagenomics has become almost a new kind of instru- conceptually different form. As any informatics professional ment—asortof“genomoscope”—that allows researchers to knows, making changes to an underlying data model is see the prokaryotic world for the first time. The view is always difficult and fraught with risk. Making changes to stunning. It is essentially Humboldt in a new biosphere (cf., conceptual base classes is the hardest of all. This will be the [6,7]):whereverthegenomoscopeispointed,newdiscover- world of 21st Century biodiversity informatics. ies swarm into view. These discoveries are showing that the full biosphere is sub- The quantum-classical duality in biology stantially more complex than previously conceived and that a Steven Weinberg elegantly captured the duality between conception of nature based only on the biology of multicellu- classical physics and the new quantum cosmology: “Our lar eukaryotes (MCEs) is decidedly inadequate. Attempting to expanding universe is mostly dark energy and dark mat- understand and describe the entire biosphere