Strategies to Improve Reference Databases for Soil Microbiomes

The ISME Journal (2017) 11, 829–834; doi:10.1038/ismej.2016.168; published online 9 December 2016
OPEN © 2017 International Society for Microbial Ecology All rights reserved 1751-7362/17
www.nature.com/ismej

PERSPECTIVE

Jinlyung Choi1, Fan Yang1, Ramunas Stepanauskas2, Erick Cardenas3, Aaron Garoutte4, Ryan Williams1, Jared Flater1, James M Tiedje4, Kirsten S Hofmockel5,6, Brian Gelder1 and Adina Howe1

1Department of Agricultural and Biosystems Engineering, Iowa State University, Ames, IA, USA
2Bigelow Laboratory for Ocean Sciences, East Boothbay, ME, USA
3Department of Microbiology & Immunology, University of British Columbia, Vancouver, BC, Canada
4Center for Microbial Ecology, Michigan State University, East Lansing, MI, USA
5Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA, USA
6Department of Ecology, Evolution and Organismal Biology, Iowa State University, Ames, IA, USA

Correspondence: A Howe, Department of Agricultural and Biosystems Engineering, Iowa State University, 1201 Sukup Hall, Ames IA 50011 USA. E-mail: [email protected]
Received 28 June 2016; revised 10 October 2016; accepted 21 October 2016; published online 9 December 2016

Introduction

Microbial populations in the soil are critical in our lives. The soil microbiome helps to grow our food, nourishing and protecting plants, while also providing important ecological services such as erosion protection, water filtration and climate regulation. We are increasingly aware of the tremendous microbial diversity that has a role in soil health; yet, despite significant efforts to isolate microbes from the soil, we have accessed only a small fraction of its biodiversity. Even with novel cell isolation techniques, <1–50% of soil species have been cultivated (Janssen et al., 2002; Van Pham and Kim, 2012). Metagenomic sequencing has accelerated our access to environmental microbes, allowing us to characterize soil communities without the need to first cultivate isolates. However, our ability to annotate and characterize the retrieved genes is dependent on the availability of informative reference gene or genome databases.

The current genomic databases are not representative of soil microbiomes. Contributions to the existing databases have largely originated from human health and biotechnology research efforts and can mislead annotations of genes originating from soil microbiomes (for example, annotations that are clearly not compatible with life in soil). Soil microbiologists are not the first to face the problem of a limited reference database. The NIH Human Microbiome Project (HMP) recognized the critical need for a well-curated reference genome dataset and developed a reference catalog of 3000 genomes that were isolated and sequenced from human-associated microbial populations (Huttenhower et al., 2012). This publicly available reference set of microbial isolates and their genomic sequences aids in the analysis of human microbiome sequencing data (Wu et al., 2009; Segata et al., 2012) and also provides strains for which isolates (both culture collections and nucleic acids) are available as resources for experiments.

Our increasing awareness of the links between microbial communities and soil health has resulted in significant investments in using sequence-based approaches to understand the soil microbiome. The Earth Microbiome Project (www.earthmicrobiome.org) alone is characterizing 200 000 samples from researchers all over the world. Despite increasing volumes of soil sequencing datasets, we currently lack soil-specific genomic resources to inform these studies. To fill this need, we have curated RefSoil (See Supplementary Methods) from the genomic data of cultured representatives originating from soil. RefSoil (both its genomes and associated strain isolates) provides a soil-specific framework with which to annotate and understand soil sequencing projects. Additionally, its curation is the first step in identifying strains that currently represent gaps in our understanding of soil microbiology, allowing us to strategically target them for cultivation and characterization. In this perspective, we introduce RefSoil and highlight several examples of its applications that would benefit diverse users.

RefSoil: a soil microbiome database

We have curated a reference database of sequenced genomes of organisms from the soil, naming it RefSoil (See Supplementary Methods). The RefSoil genomes are a subset of NCBI's database of sequenced genomes, RefSeq (release 74), and have been manually screened to include only organisms that have previously been associated with soils. RefSoil contains a total of 922 genomes, 888 bacteria and 34 archaea (Supplementary Table 1). While sharing similar dominant organisms to the RefSeq database (for example, Proteobacteria, Firmicutes and Actinobacteria), RefSoil contains higher proportions of Armatimonadetes, Gemmatimonadetes, Thermodesulfobacteria, Acidobacteria, Nitrospirae and Chloroflexi, suggesting that these phyla may be enriched in the soil or under-represented in RefSeq. A total of 11 RefSeq-associated phyla are not included in RefSoil and these phyla are most likely absent or difficult to cultivate in soil environments (Supplementary Figure 1).

RefSoil can be used to define a representative framework that can provide insight into potential soil functions and genes, and phyla that are associated with encoding functions. We observe that genes related to microbial growth and reproduction (for example, DNA, RNA and protein metabolism) are associated with diverse RefSoil phyla; in contrast, key functions related to metabolism of aromatic compounds and iron metabolism are enriched in Proteobacteria and Actinobacteria. Similarly, dormancy and sporulation genes are enriched in Firmicutes (Supplementary Figure 2, Supplementary Tables 2 and 3). Many of the broader functions encoded by RefSoil genes are unsurprising (for example, photosynthesis in Cyanobacteria), but as a collective framework, RefSoil genomes and their associated isolated strains can allow us to look deeper into soil functions. Specifically, understanding the functions encoded by specific soil membership can guide the selection and design of representative mock communities for soil processes. For example, an experimental community of isolates known for participating in nitrogen cycling could include RefSoil strains associated with assimilatory nitrate reductase, nitric and nitrous oxide reductase, ammonia monooxygenase and nitrogen fixation (selected from Supplementary Figure 2).

Another potential opportunity for RefSoil is to provide context that can help improve functional annotation of genomes. The large majority of genes in previously published soil metagenomes (65–90%) cannot be annotated against known genes (Delmont

How representative are our existing references in natural soils?

While we are able to glimpse into soil microbial ecology through RefSoil's genomes, its ability to inform natural soils depends on the representation of laboratory isolates in our soils. There are now datasets to assess global soil microbiomes through efforts like the Earth Microbiome Project (EMP) (Gilbert et al., 2014; Rideout et al., 2014), which has collected a total of 3035 soil samples and sequenced their associated 16S rRNA gene amplicons. Clustering at 97% sequence similarity, these EMP OTUs represent 2158 unique taxonomic assignments (See Supplementary Methods), with varying abundances estimated in each soil sample (for example, total count of amplicons). We observed that the majority of these OTUs are rare (for example, only observed in a few samples) with 76% of OTUs observed in <10 soil samples, and 1% of OTUs representing 81% of total sequence abundance in EMP.

To evaluate the presence of RefSoil genomes in soil samples, EMP 16S rRNA gene amplicons and RefSoil 16S rRNA genes were compared, requiring an alignment with > 97% similarity, a minimum alignment of 72 bp, and E-value ≤ 1e-5. Using these criteria, a total of 53 538 EMP OTUs shared similarity with RefSoil 16S rRNA genes. These OTUs represent a meager 1.4% of all EMP diversity (unique OTUs) or 10.2% of all EMP amplicon sequences. Overall, we observe that 99% (2 442 432 of 2 476 795) of observed EMP amplicons do not share > 97% similarity to RefSoil genes, suggesting that EMP soil samples contain much higher diversity than represented within RefSoil (Figure 1) and highlighting the poor representation of our current reference genomes. Notably, Firmicutes are observed frequently in the RefSoil database (Supplementary Figure 3) but are not observed to be highly abundant in soil environments (5.7% of all EMP amplicons). Firmicutes have been well-studied as pathogens (Rupnik et al., 2009; Buffie and Pamer, 2013), likely resulting in their biased representation in our databases and consequently also biased annotations in soil studies. A key advantage to the development of the RefSoil database is the opportunity to identify these biases and to ensure increasingly representative targets for future curation efforts. In annotating soil metagenomes with public databases, organisms and genes that are not associated with soils can consistently be identified; for example, in an Iowa corn metagenome annotated by the MG-RAST database, we identified

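The alignment criteria used above to compare EMP amplicons with RefSoil 16S rRNA genes (> 97% similarity, a minimum alignment of 72 bp, E-value ≤ 1e-5) can be illustrated with a minimal sketch. It assumes the alignments are available in a BLAST-style 12-column tabular format, which the text does not specify; the names Hit, parse_tabular, passes, matched_queries and percent, as well as the example identifiers and values, are hypothetical and are not taken from the authors' pipeline.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Set

# Thresholds as stated in the text.
MIN_IDENTITY = 97.0      # similarity must be strictly greater than 97%
MIN_ALIGNMENT_BP = 72    # minimum alignment length in base pairs
MAX_EVALUE = 1e-5        # E-value must be at most 1e-5


@dataclass(frozen=True)
class Hit:
    query_id: str            # EMP OTU identifier (hypothetical)
    subject_id: str          # RefSoil 16S rRNA gene identifier (hypothetical)
    percent_identity: float
    alignment_length: int
    evalue: float


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse tab-separated hits, assuming the 12-column BLAST tabular layout
    (query, subject, % identity, length, ..., E-value in column 11)."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        yield Hit(f[0], f[1], float(f[2]), int(f[3]), float(f[10]))


def passes(hit: Hit) -> bool:
    """Apply all three criteria from the text to one alignment."""
    return (hit.percent_identity > MIN_IDENTITY
            and hit.alignment_length >= MIN_ALIGNMENT_BP
            and hit.evalue <= MAX_EVALUE)


def matched_queries(hits: Iterable[Hit]) -> Set[str]:
    """Return the query identifiers with at least one passing alignment."""
    return {h.query_id for h in hits if passes(h)}


def percent(part: int, total: int) -> float:
    return 100.0 * part / total


# Made-up example alignments, one per failure mode plus one passing hit.
EXAMPLE_LINES = [
    "otu1\trefA\t99.2\t150\t1\t0\t1\t150\t1\t150\t1e-70\t270",  # passes
    "otu2\trefB\t96.5\t150\t5\t0\t1\t150\t1\t150\t1e-60\t240",  # identity too low
    "otu3\trefC\t98.3\t60\t1\t0\t1\t60\t1\t60\t1e-20\t100",     # alignment too short
    "otu4\trefA\t97.5\t100\t2\t0\t1\t100\t1\t100\t1e-3\t40",    # E-value too high
]

print(sorted(matched_queries(parse_tabular(EXAMPLE_LINES))))  # ['otu1']
print(round(percent(2_442_432, 2_476_795), 1))                # 98.6
```

The last line recomputes the reported share of unmatched amplicons from the two counts given in the text; the result, about 98.6%, is consistent with the rounded 99% and with the 1.4% of matching diversity.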