Genome‐Centric Resolution of Microbial Diversity, Metabolism And
Genome-centric resolution of microbial diversity, metabolism and interactions in anaerobic digestion
Running title: Genome-centric resolution through deep metagenomics
Inka Vanwonterghem1,2, Paul D Jensen1, Korneel Rabaey1,3 and Gene W Tyson1,2*
1Advanced Water Management Centre (AWMC), The University of Queensland, St Lucia, QLD 4072, Australia; 2Australian Centre for Ecogenomics (ACE), School of Chemistry and Molecular Biosciences, The University of Queensland, St Lucia, QLD 4072, Australia; 3Laboratory for Microbial Ecology and Technology (LabMET), Ghent University, Coupure Links 653, 9000 Ghent, Belgium
*Corresponding author: Prof. Gene W. Tyson. Mailing address: Australian Centre for Ecogenomics (ACE), School of Chemistry and Molecular Biosciences, The University of Queensland, St Lucia, QLD 4072, Australia. Phone: +617 3365 3829 Fax: +617 336 54511 Email: [email protected]
Keywords: metagenomics / genome-centric / functional redundancy / metabolic network / novel diversity / anaerobic digestion
This article has been accepted for publication and undergone full peer review but has not been through the copyediting, typesetting, pagination and proofreading process which may lead to differences between this version and the Version of Record. Please cite this article as an ‘Accepted Article’, doi: 10.1111/1462-2920.13382 This article is protected by copyright. All rights reserved.
Abstract
Our understanding of the complex interconnected processes performed by microbial communities is hindered by our inability to culture the vast majority of microorganisms. Metagenomics provides a way to bypass this cultivation bottleneck and recent advances in this field now allow us to recover a growing number of genomes representing previously uncultured populations from increasingly complex environments.
In this study, a temporal genome-centric metagenomic analysis was performed on lab-scale anaerobic digesters that host complex microbial communities fulfilling a series of interlinked metabolic processes to enable the conversion of cellulose to methane. In total, 101 moderately complete to near-complete population genomes were recovered, based primarily on differential coverage binning. These populations span 19 phyla, represent mostly novel species and expand the genomic coverage of several rare phyla. Classification into functional guilds based on their metabolic potential revealed metabolic networks with a high level of functional redundancy as well as niche specialization, and allowed us to identify potential roles such as hydrolytic specialists for several rare, uncultured populations. Genome-centric analyses of complex microbial communities across diverse environments provide the key to understanding the phylogenetic and metabolic diversity of these interactive communities.
Introduction
Microorganisms are ubiquitous in the environment and play key roles in global biogeochemical cycles. As the majority of microbial life has eluded cultivation in the laboratory, culture-independent techniques have been developed to study their diversity and functions (Tringe and Rubin, 2005; Albertsen et al., 2013; Vanwonterghem et al., 2014a). Metagenomics, the sequencing of bulk DNA extracted directly from environmental samples, provides direct access to the metabolic potential of a microbial community. Advances in sequence throughput, read length and quality, and bioinformatics tools have contributed to a more widespread application of metagenomics to study natural and engineered systems.
Early metagenomic studies relied largely on gene-centric analyses (Venter et al., 2004; Tringe et al., 2005) with the recovery of individual genomes limited to environments dominated by few distinct populations (Tyson et al., 2004). These gene-centric approaches are biased towards existing databases, thereby overlooking a significant fraction of the novel diversity (Jaenicke et al., 2010; Wong et al., 2013). In addition, as only an overview of the metabolic potential of the community is provided without assigning functions to individual populations, important metabolic interactions may remain undetected. The development of improved sequencing technologies and population genome binning algorithms (Wrighton et al., 2012; Albertsen et al., 2013; Imelfort et al., 2014) has allowed us to move beyond gene-centric approaches and recover population genomes from increasingly complex environments. This has led to the discovery of novel lineages (Brown et al., 2015; Castelle et al., 2015), and insight into the metabolic processes (Raghoebarsing et al., 2006; Haroon et al., 2013) and microbial interactions (Wrighton et al., 2014; Baker et al., 2015) taking place in these environments.
Engineered systems offer a controlled environment in which to study complex microbial communities, test hypotheses and explore the efficacy of new metagenomic approaches. Anaerobic digestion provides an interesting study environment as it consists of a series of metabolic processes carried out by a consortium of interdependent microorganisms. This process is a critical component of the global carbon cycle and is industrially relevant as a waste management strategy and for the production of bioenergy (Amani et al., 2010). Due to the complexity of the communities involved, anaerobic digesters (ADs) remain genomically underexplored and most metagenomic studies have relied on gene-centric approaches (Jaenicke et al., 2010; Hanreich et al., 2013; Wong et al., 2013; Solli et al., 2014; Stolze et al., 2015). The recovery of population genomes from various engineered systems has provided genomic insight into candidate phyla such as TM7 (Albertsen et al., 2013) and KSB3 (Sekiguchi et al., 2015), which is responsible for filamentous bulking in anaerobic wastewater treatment, and microbial interactions such as synergistic networks within terephthalate-degrading bioreactors (Nobu et al., 2014). Genome-centric approaches can thus provide a powerful means of understanding the phylogenetic and metabolic diversity in anaerobic digestion.
Here, a detailed genome-centric exploration of complex microbial communities in ADs was performed to reconstruct the metabolic network by gaining access to the functional potential of the individual populations involved in the conversion of cellulose to methane. ADs were operated in triplicate for a year and supplied with cellulose. Metagenomic sequencing was performed on samples taken at two time points (spanning ~8 months), characterized by differences in performance. Co-assembly of the six generated metagenomes followed by differential coverage-based binning resulted in the recovery of 101 population genomes that constitute the majority of the community. These genomes represent 19 phyla and expand the genomic diversity of several lineages with few sequenced representatives. The metabolic reconstruction of individual populations combined with their relative abundance estimates allowed us to study ecological theories through the identification of a high level of functional redundancy, and construct an interaction network for the flow of carbon through the community. These results demonstrate the importance of genome-centric analyses when studying complex communities that harbor novel diversity, and provide the foundation for further hypothesis-driven experiments.
Results
Metagenomic sequencing and assembly
The phylogenetic and metabolic diversity of microbial communities involved in anaerobic digestion was studied using a genome-centric metagenomic approach. Three lab-scale ADs (designated AD1, AD2 and AD3) were used as controlled systems in which to study the community dynamics and reconstruct the metabolic network. The ADs were inoculated with a mixture of eight samples taken from anaerobic environmental and engineered systems (Table S1). They were operated for 362 days and supplied with cellulose as the sole carbon and energy source. Samples for metagenomic sequencing were collected from the reactors at two time points (T1: Day 96; T2: Day 362), chosen for their differences in the structure and performance of the microbial communities, which are summarized in Fig. S1 and Table S2, and have been described in detail previously (Vanwonterghem et al., 2014b). Briefly, cellulose hydrolysis was stable at both time points at an average efficiency of 86 ± 4%. Accumulation of predominantly acetate and propionate was observed at T1, with the highest volatile fatty acid (VFA) concentrations measured for AD1, which correlated with lower methane production. At T2, VFAs were efficiently converted to methane and only minor differences were observed between the reactors. The six metagenomes (111 Gb total raw reads) from the triplicate ADs at these two time points were co-assembled, generating 494,042 contigs with a combined length of 908 Mb (Table S3). On average, >85% of the metagenomic reads from each dataset mapped onto the contigs (>500 bp) from the combined assembly (Table S4).
Microbial community composition and population genome recovery
The community composition was determined by extracting the 16S rRNA gene sequences from the metagenomes (Fig. 1) and compared to previously reported amplicon-based community profiles (Fig. S1) (Vanwonterghem et al., 2014b).
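The principle behind differential coverage binning, on which the population genome recovery relies, can be illustrated with a minimal sketch. It is not the binning software or procedure used in this study. It assumes only that contigs from the same population show proportional coverage across the six metagenomes. The contig names, coverage values, correlation threshold and the function names `pearson` and `bin_contigs` are all hypothetical and chosen for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length coverage profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no co-variation signal
    return cov / (sd_x * sd_y)


def bin_contigs(coverage, min_r=0.95):
    """Greedy toy binning: a contig joins the first bin whose seed contig
    has a coverage profile correlated at >= min_r; otherwise it seeds a
    new bin. min_r is an arbitrary illustrative threshold."""
    bins = []
    for name, profile in coverage.items():
        for members in bins:
            seed_profile = coverage[members[0]]
            if pearson(profile, seed_profile) >= min_r:
                members.append(name)
                break
        else:
            bins.append([name])
    return bins


# Hypothetical mean coverage per contig in six samples
# (three reactors at T1, then three reactors at T2).
coverage = {
    "contig_a": [10.0, 12.0, 30.0, 5.0, 6.0, 4.0],
    "contig_b": [20.5, 24.0, 61.0, 9.8, 12.5, 8.1],
    "contig_c": [3.0, 2.5, 2.0, 40.0, 38.0, 45.0],
    "contig_d": [6.2, 5.1, 4.3, 79.0, 77.5, 91.0],
}

bins = bin_contigs(coverage)
for i, members in enumerate(bins, start=1):
    print(f"bin {i}: {', '.join(members)}")
```

In this toy input, contigs whose coverage rises and falls together across samples end up in the same bin. Dedicated binning tools typically combine such coverage signals with sequence composition and more robust clustering, so the sketch should be read as an illustration of the idea only.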
The most abundant populations belonged to the phyla Euryarchaeota, Actinobacteria, Bacteroidetes, Fibrobacteres, Firmicutes, Spirochaetes and Verrucomicrobia,