Comparative and Joint Analysis of Two Metagenomic Datasets from a Biogas Fermenter Obtained by 454- Pyrosequencing

Sebastian Jaenicke1, Christina Ander1, Thomas Bekel1, Regina Bisdorf1, Marcus Dro¨ ge2, Karl-Heinz Gartemann3, Sebastian Ju¨ nemann1,4, Olaf Kaiser2, Lutz Krause5, Felix Tille1, Martha Zakrzewski1, Alfred Pu¨ hler6, Andreas Schlu¨ ter6., Alexander Goesmann1*. 1 Computational Genomics, Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany, 2 Roche Diagnostics GmbH, Penzberg, Germany, 3 Department of Genetechnology/Microbiology, Bielefeld University, Bielefeld, Germany, 4 Department of Periodontology, University Hospital Mu¨nster, Mu¨nster, Germany, 5 Division of Genetics and Population Health, Queensland Institute of Medical Research, Herston, Australia, 6 Institute for Genome Research and Systems Biology, Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany

Abstract Biogas production from renewable resources is attracting increased attention as an alternative energy source due to the limited availability of traditional fossil fuels. Many countries are promoting the use of alternative energy sources for sustainable energy production. In this study, a metagenome from a production-scale biogas fermenter was analysed employing Roche’s GS FLX Titanium technology and compared to a previous dataset obtained from the same community DNA sample that was sequenced on the GS FLX platform. Taxonomic profiling based on 16S rRNA-specific sequences and an Environmental Gene Tag (EGT) analysis employing CARMA demonstrated that both approaches benefit from the longer read lengths obtained on the Titanium platform. Results confirmed as the most prevalent taxonomic class, whereas of the order Methanomicrobiales are dominant among methanogenic Archaea. However, the analyses also identified additional taxa that were missed by the previous study, including members of the genera Streptococcus, Acetivibrio, Garciella, Tissierella, and Gelria, which might also play a role in the fermentation process leading to the formation of methane. Taking advantage of the CARMA feature to correlate taxonomic information of sequences with their assigned functions, it appeared that , followed by Bacteroidetes and Proteobacteria, dominate within the functional context of polysaccharide degradation whereas Methanomicrobiales represent the most abundant taxonomic group responsible for methane production. Clostridia is the most important class involved in the reductive CoA pathway (Wood-Ljungdahl pathway) that is characteristic for acetogenesis. Based on binning of 16S rRNA-specific sequences allocated to the dominant genus Methanoculleus, it could be shown that this genus is represented by several different species. Phylogenetic analysis of these sequences placed them in close proximity to the hydrogenotrophic methanogen Methanoculleus bourgensis. While rarefaction analyses still indicate incomplete coverage, examination of the GS FLX Titanium dataset resulted in the identification of additional genera and functional elements, providing a far more complete coverage of the community involved in anaerobic fermentative pathways leading to methane formation.

Citation: Jaenicke S, Ander C, Bekel T, Bisdorf R, Dro¨ge M, et al. (2011) Comparative and Joint Analysis of Two Metagenomic Datasets from a Biogas Fermenter Obtained by 454-Pyrosequencing. PLoS ONE 6(1): e14519. doi:10.1371/journal.pone.0014519 Editor: Ramy K. Aziz, Cairo University, Egypt Received July 15, 2010; Accepted December 14, 2010; Published January 26, 2011 Copyright: ß 2011 Jaenicke et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: T. Bekel acknowledges financial support from the Bundesministerium fur Bildung und Forschung (BMBF), SysMAP project (grant 0313704). S. Jaenicke acknowledges financial support by the BMBF grant 0313805A ‘GenoMik-Plus’. S. Ju¨nemann was supported by the BMBF PathoGeno-Mik Plus project 0313801N. The METAEXPLORE grant of the European Commission (KBBE-222625) is gratefully acknowledged. These funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. M. Dro¨ge and O. Kaiser are employees of Roche Diagnostics GmbH; sequencing of the obtained DNA sample was funded by Roche Diagnostics GmbH. Competing Interests: Marcus Dro¨ge and Olaf Kaiser are employees of Roche Diagnostics GmbH, Germany. This does not affect the authors’ adherence to all the PLoS ONE policies on sharing data and materials. * E-mail: [email protected] . These authors contributed equally to this work.

Introduction mentally compatible energy source [1,2]. Moreover, generation of energy from biogas relies on a balanced carbon dioxide cycle. In The fraction of renewable energy forms for energy supply is Germany biogas is mainly produced from energy crops such as constantly increasing since fossil fuels are running short and energy maize and liquid manure in medium-sized agricultural biogas production from fossil fuels brings about emissions of the plants [1]. The microbiology of biogas formation from organic greenhouse gas carbon dioxide which has implications on the matter is complex and involves interaction of different microor- climate. In this context the production of biogas by means of ganisms. In the first step of the digestion process, organic polymers fermentation of biomass becomes more and more important of the substrate such as cellulose, other carbohydrates, proteins and because biogas is regarded as a clean, renewable and environ- lipids are hydrolysed to low-molecular weight compounds [3–5].

PLoS ONE | www.plosone.org 1 January 2011 | Volume 6 | Issue 1 | e14519 Biogas

Cellulolytic Clostridia and Bacilli are among other important FLX platform. This paper describes an integrated analysis of the for this step. Subsequently, fermentative bacteria convert low- GS FLX and the GS FLX Titanium datasets with the objective to molecular weight metabolites into volatile fatty acids, alcohols, and deepen the knowledge on the taxonomic structure and composi- other compounds which are then predominantly metabolised to tion of a microbial community involved in biogas production acetate, carbon dioxide and hydrogen by syntrophic bacteria [6– within an agricultural, production-scale biogas plant. Moreover, 11]. These latter compounds are in fact the substrates for methane the described analysis intends to elucidate the metabolic capacity synthesis which is accomplished by methanogenic Archaea [12,13]. of the community, functional roles of specific microorganisms and Hydrogenotrophic Archaea are able to reduce carbon dioxide to key organisms for the biogas production process. methane using hydrogen as an electron donor, whereas aceticlastic Archaea convert acetate to methane [14–18]. The biochemistry and Methods enzymology of methanogenesis is well known for model organisms, but the functioning of biogas-producing microbial communities on Total community DNA preparation and sequencing the whole is insufficiently explored. Community structures of Total community DNA of a biogas fermentation sample obtained biogas-producing microbial consortia were analysed for different from an agricultural biogas plant was prepared by a CTAB-based systems and settings including a thermophilic municipal biogas DNA-isolation method [36] as described recently. More detailed plant [19], a thermophilic anaerobic municipal solid-waste digester information on the origin of the fermentation sample is given in a [20], thermophilic upflow anaerobic filter reactors [21], a previous publication [33]. An aliquot of the DNA preparation that completely stirred tank reactor fed with fodder beet silage [22], a was recently used for whole genome shotgun sequencing on the two-phase biogas reactor system operated with plant biomass [23], Genome Sequencer (GS) FLX platform now served as template DNA an anaerobic sludge digester [24], mesophilic anaerobic chemostats for sequencing on the GS FLX Titanium platform. The sequencing [25,26], a packed-bed reactor degrading organic solid waste [27] library was constructed according to the protocol of the GS Titanium and many other habitats. Most of these studies were based on the General Library Prep Kit (Roche Applied Science). After titration of construction of 16S rRNA clone libraries and subsequent the library using the GS Titanium SV emPCR Kit, a full sequencing sequencing of individual 16S rRNA clones. The resulting nucleotide run was carried out on the GS FLX Titanium platform. sequences were then taxonomically and phylogenetically classified to deduce the structure of the underlying community. Also, mcrA Data normalization clone libraries were used to elucidate methanogenic archaeal To assess the overall comparability of the pyrosequencing communities of different habitats [28–32]. The mcrA gene encodes datasets obtained from the GS FLX and Titanium platforms, the the alpha subunit of methyl-coenzyme M reductase representing the average GC content of all reads was determined for each dataset. final enzyme in the methanogenesis pathway. Since mcrA is present For this, various Perl scripts were developed to determine the in all methanogenic Archaea analysed so far, it serves as a overall and individual GC content of the obtained reads. Results phylogenetic marker for this group of Archaea. Usually, analyses of were visualized using the statistical computing software R [37]. mcrA and 16S rRNA clone libraries do not cover the whole To normalize the data with respect to the observed GC bias (see complexity of the respective habitats since sequencing can only be Results), an outlier detection approach was applied and the done for limited numbers of clones. Moreover, results of clone endpoints of the linear phases were determined; sequences longer library analyses are always biased by the choice of primers that are than the computed thresholds were excluded. A linear regression used for amplification of marker gene fragments and cloning was calculated for the GC plot beginning with 30 data points efficiencies. In recent years, microbial communities have been starting at the 100 bp position to exclude the portion of the dataset studied on the basis of their metagenomes which became accessible with high variance in GC content. In the next step, externally by applying high-throughput sequencing technologies. Recently, studentized residuals [38] were computed for the linear regression. the first metagenome sequencing approach for a biogas-producing Each data point was inspected whether it deviated from the linear community was described [33]. Community DNA isolated from a trend using a Student-t-test. A data point was regarded as being an production-scale biogas plant fed with maize silage, green rye and outlier if the p{value of the studentized residuals test was below low amounts of chicken manure was sequenced on the Genome 0.05. If ten outliers in a row were found, the first one of these is Sequencer FLX platform which resulted in 142 million base pairs of representing the end of the linear phase and was taken as threshold sequence information. Bioinformatic methods were employed to value for filtering of the dataset. deduce the taxonomic composition and functional characteristics of To rule out sequencing errors as the reason for the observed the intrinsic biogas community [34]. Analysis of the community decreasing GC content of the longer reads, the analysis was revealed Clostridia as the most prevalent phylogenetic class, whereas repeated for publicly available pyrosequencing datasets from both species of the order Methanomicrobiales are dominant among metagenome as well as genome sequencing projects which were methanogenic Archaea. obtained from NCBI’s Short-Read Archive (SRA). The effect in Similar results were obtained by parallel construction of 16S question could be observed for all metagenome datasets, where the rRNA and mcrA amplicon libraries and subsequent sequencing of DNA fragments from a mixture of organisms have a broad cloned fragments [35]. Moreover, bioinformatics results indicated distribution concerning their GC contents. Single genome that Methanoculleus species play a dominant role in methanogenesis sequencing projects, on the other hand, did not show this effect, and that Clostridia are important for hydrolysis of plant biomass in thereby ruling out sequencing errors as an alternative explanation the analysed fermentation sample. Rarefaction analysis of the (see File S1). The comparably narrow GC distribution in the metagenome data showed that the sequencing approach was not sequence data from single genome projects does not reveal this carried out to saturation. Sufficient coverage of non-abundant effect, and while backfolding might limit maximum obtainable microbial groups in the fermentation sample would require deeper read length, similar filtering steps would not be a prerequisite sequencing. Therefore, the available total community DNA before e.g. subsequent assembly. preparation from the biogas fermentation sample was additionally A recent study [39] described the occurence of artificially created sequenced on the GS FLX Titanium platform, which provides duplicate reads in datasets generated using the pyrosequencing longer read lengths and increased throughput compared to the GS method. It is assumed that the duplication of individual DNA

PLoS ONE | www.plosone.org 2 January 2011 | Volume 6 | Issue 1 | e14519 Biogas fragments occurs during the emulsion PCR reaction, which is a step homology search and classified as belonging to the genus of the library preparation. These nearly identical sequences might Methanoculleus, as described above, were assembled into longer lead to inflated estimates of functional genetic elements or introduce contigs. an artificial shift in taxonomic profiles, unless they are filtered from Here, the 16S rRNA nucleotide sequence from Methanoculleus the dataset. In this study, these duplicates were removed using the bourgensis (GenBank accession AY196674) was used as an assembly cdhit-454 [40] program, which filters almost identical reads template. The pyrosequencing reads were aligned to the reference beginning at the same position. Each dataset was processed sequence using ‘align0’ from the FASTA3 package [48] and then separately using the accurate mode (option ‘-g 1’) of the software. clustered into several groups based on their SNP (single nucleotide polymorphism) content. Afterwards, each of the resulting groups of Identification and taxonomic classification of 16S rRNA reads was separately assembled employing Roche’s GS de novo fragments Assembler software. To put the resulting consensus sequences into A BLAST [41] search versus the RDP database (Release 10.10) context, a phylogenetic characterization was performed. Together was conducted to identify reads carrying fragments of 16S rRNA with other 16S rRNA sequences obtained from GenBank, a genes. Since BLAST excludes regions with low sequence complexity multiple alignment of all sequences was computed using the by default, the sequence complexity filter was explicitly disabled MUSCLE tool [49] and a phylogenetic tree was generated with (option ‘-F F’). Alignments with an E-value of 10{5 or better and a the fdnapars program [50], which implements the DNA minimum length of 50 bp were extracted and processed using the parsimony algorithm. RDP classifier [42], which employs a naive bayesian classifier to assign the sequences to taxonomic categories. Only 16S rRNA Mapping of pyrosequencing reads to Methanoculleus fragments with at least 80% assignment confidence were considered. marisnigri JR1 To identify the overall sequence similarity between the Taxonomic classification of Environmental Gene Tags metagenomic reads and the genome of Methanoculleus marisnigri (EGTs) JR1, all reads were mapped to the published DNA sequence of this Metagenomic sequences were filtered using a BlastX search genome [51]. In a previous study [34], reads were aligned to reference sequences using BLAST with rather relaxed settings (E- against the Pfam database [43]. Reads without a hit were {4 additionally scanned for conserved protein domains by conducting Value ,10 , aligned region .100 bp, sequence identity .80%). a search for protein family members using Pfam HMMs. Since this approach did not account for the relative fraction of a Community sequence reads that were predicted to contain read taking part in an alignment, in this study the gsMapper fragments of genes (environmental gene tags, EGTs) were software from Roche was employed. Coverage information for the subsequently classified on different taxonomic ranks based on a genome and individual genes was extracted from the gsMapper phylogenetic tree of the metagenomic read itself and the matching output using various Perl scripts; the results for gene coverage were Pfam protein family member sequences. A full description of the normalized and divided into two groups based on the prediction CARMA pipeline can be found in the original publication [44]. found in the HGT-DB [52], a database of procaryotic genomes that uses different statistical properties of coding sequences to Here, an unpublished improved version of CARMA was predict whether they may have been acquired by horizontal gene applied to process both pyrosequencing datasets using the default transfer. While GC content is one of these parameters, several settings. For reads containing more than one EGT and therefore others such as codon and amino acid usage are also used. The R more than one classification, the taxonomic classifications were statistical software was employed to compute the kernel density merged into a single one that was determined as the lowest estimates for both groups; for this, an Epanechnikov kernel with a common ancestor of all classifications. While this approach bandwith of 0.4 was used. decreases the number of classifications on the lower taxonomic ranks, it simultaneously eliminates contradicting classifications and reduces the influence of potential mis-classifications. Detection of hydrogenase gene fragments To analyse the occurrence of metagenomic reads encoding Community participation in substrate decomposition, hydrogenases or proteins involved in hydrogen uptake systems, corresponding genes annotated as hydrogenases or hydrogen fermentation and methane production uptake systems were extracted from the reference genome of For further analysis both datasets were merged together, as no Methanoculleus marisnigri JR1 [51]. Related gene fragments in the significant differences in the as well as in the related metagenome datasets were identified by a BLAST search using an enzyme composition could be observed. Orthologous groups of e-value cutoff of 10{5 and a disabled sequence complexity filter genes were retrieved by comparing the sequences to the eggNOG (option ‘-F F’); for each sequence, only the best BLAST hit was database [45]. For this, sequences were assigned to COGs and considered. NOGs using BlastX with an E-value cutoff of 10{6 and annotated according to their best hit. Additionally, taxonomic information Biodiversity and Rarefaction analysis was added using the results obtained from the CARMA pipeline. Enzymes relevant for biochemical processes were identified with To gain an overview of the biodiversity of the studied microbial the help of MetaCyc [46] and the corresponding COGs (Clusters community, the Shannon index was computed based on 16S rRNA fragments classified on rank genus with at least 80% of Orthologous Groups) were selected and grouped. The confidence as conclusive tree construction based on the NCBI taxonomy [47], visualisation and data analysis were performed using unpublished Xn software. : H~{ pi ln pi, i~1 Identification of Methanoculleus variants To confirm the presence of different Methanoculleus species in the where pi is the relative abundance of sequences assigned to genus biogas fermenter, 16S rRNA fragments that were identified by a i, and n denotes the total number of different genera.

PLoS ONE | www.plosone.org 3 January 2011 | Volume 6 | Issue 1 | e14519 Biogas

Additionally, a rarefaction analysis was employed to assess the hand, DNA fragments with lower GC content are not affected by coverage of the microbial community by the datasets. The number this problem and can be amplified over their full length. of genera that would be observed for different sample sizes was Previous studies have already identified GC biases in reads estimated using Analytic Rarefaction (version 1.3). Rarefaction generated on the Illumina platform [53], but no such findings have curves were obtained by plotting the sample sizes versus the yet been reported for Roche’s pyrosequencing method. Such biases estimated number of genera. caused by the sequencing chemistry and protocol are likely to A separate rarefaction analysis was conducted to assess the remain unnoticed in many cases, where research is focused on coverage of the collective gene content of the microbial analysis of single datasets with no additional data available for community. The number of Pfam protein families that would be comparison. In the context of metagenome studies, these effects can identified in metagenomes of different sizes was estimated using easily lead to a distortion of results, where e.g. taxonomic profiles Analytic Rarefaction. Rarefaction curves were generated by would underestimate the relative abundance of GC-rich organisms. plotting the number of Pfam families versus the sample sizes. In this study, both datasets were filtered to minimize the impact of the identified GC bias and exclude the artificial duplicate sequences. Data availability The endpoints of the linear phases of the GC plot were determined Sequence data from both the GS FLX and Titanium run has at 262 bp for the GS FLX and 535 bp for the Titanium dataset; been deposited at the NCBI Short Read Archive (SRA) under the subsequent removal of sequences longer than the computed accessions SRR030746.1 for the GS FLX and SRR034130.1 for thresholds resulted in the exclusion of 169,569 (27.52%) of the the Titanium dataset. Assembled 16S rRNA consensus sequences GS FLX reads and 26,795 (1.98%) sequences from the Titanium of the different Methanoculleus variants have been submitted to the dataset, which is only marginally affected. This shows a significant GenBank database (accessions GU731070 to GU731076). improvement regarding GC bias for the GS FLX Titanium chemistry compared to the previous GS FLX chemistry. Results After the removal of technical replicates, 407,558 reads from the GS FLX dataset and 1,019,333 of the Titanium reads passed all Sequencing of the biogas microbial community on the filtering steps and are suitable for further analysis. Throughout this GS FLX Titanium platform and data preprocessing study, these normalized datasets were used for the computation of Metagenomic DNA from a biogas-producing microbial com- relative abundances of either taxonomic groups or the functional munity residing in the fermenter of an agricultural biogas plant fed characterization. with maize silage, green rye and low amounts of chicken manure was recently sequenced on the Genome Sequencer (GS) FLX Taxonomic composition of the microbial community platform [33]. This approach resulted in 616,072 sequence reads based on 16S rRNA analysis with an average read length of 230 bases, accounting for The DNA sequence of the 16S rRNA gene has found wide approximately 142 million bases sequence information. Analysis application for taxonomic and phylogenetic studies. It is highly of the obtained sequence data revealed that sequencing was not conserved between both Archaea and Bacteria, but also contains carried out to saturation [34]. To achieve a deeper coverage of the hypervariable regions that can be exploited for accurate intrinsic biogas community and to elucidate its whole complexity, taxonomic assignments. Using the previously filtered pyrosequenc- the same community DNA preparation that was used for ing reads to avoid a distortion of taxonomic profiles, a BLAST sequencing on the Genome Sequencer FLX was now sequenced homology search using the RDP database was performed. In this on the GS FLX Titanium platform. The single Titanium run step, 616 16S rRNA fragments with an average length of 159.5 bp yielded 1,347,644 reads with an average length of 368 bases were identified in the GS FLX dataset, while 2,709 fragments with resulting in 495.5 million bases sequence information, which an average length of 245.2 bp from the Titanium reads could be represents a 3.5-fold increase in coverage of the sample compared detected. The differing number of 16S rRNA sequences identified to the previous sequence dataset. Statistical data of the Titanium in the GS FLX dataset in comparison to the findings of a previous run are summarized and compared to the sequencing approach on study [34] can be explained by the data normalization step as the GS FLX platform in Table 1. Although aliquots of the same described above. Furthermore, the ARB database [54] was used DNA sample were used for sequencing on both platforms, the instead of the RDP database. All identified 16S rRNA sequences average GC content of the reads generated on the GS FLX were taxonomically classified by means of the RDP classifier [42]. platform was determined as 51.7%, while the GC content of the For all taxonomic ranks except domain, the RDP classifier was Titanium reads was determined as only 47.4%. Also, the average able to assign a larger fraction of 16S rRNA fragments from the GC content for different read lengths was analysed. Results Titanium dataset with at least 80% confidence than from the FLX showed that GC content and obtained read length clearly data. While 18.2% of the 16S rRNA fragments from the GS FLX correlate for both GS FLX and the Titanium platform. The reads could be classified on rank order, 25.3% of the Titanium applied sequencing methods did not only generate reads with 16S rRNA fragments were assigned on this rank, showing that the different average levels of GC content, but also both pyrose- RDP classifier benefits from the longer reads generated on the quencing platforms show a significant decline in GC content once Titanium platform. On rank genus, 6.0% of the GS FLX 16S the read length exceeds a certain value (Figure 1). rRNA fragments and 10.6% of the Titanium 16S rRNA fragments While the differences in average GC content can be explained could be assigned. Nevertheless, it has to be noted that only a low by variations in the sequencing chemistry and protocol used for fraction of pyrosequencing reads actually contains 16S rRNA sequencing on the GS FLX and Titanium platforms, the sharp fragments (0.15% of the filtered GS FLX reads; 0.26% of the decline of the GC content present in the longer reads is most filtered Titanium reads). While 16S rRNA genes are reliable probably caused by backfolding. It is assumed that the GC-rich phylogenetic markers, the low number that could be detected in parts of the synthesized ssDNA are forming stable secondary the present metagenome datasets gives only a broad overview of structures during the emulsion PCR process, which subsequently the most abundant taxonomic groups and may be insufficient to leads to a termination of the PCR reaction and only the non- obtain a detailed picture of the taxonomic composition of a folded part of the DNA fragment being amplified. On the other metagenome.

PLoS ONE | www.plosone.org 4 January 2011 | Volume 6 | Issue 1 | e14519 Biogas

Table 1. Sequencing results. that CARMA as well benefits from the increased read lengths generated by the Titanium platform. The community composition as deduced from EGTs identified in both datasets is almost number of number of avg. read identical: The majority of EGTs was classified as belonging to the reads bases length superkingdoms Bacteria (GS FLX: 67%; Titanium: 70%) and Archaea (11% and 8%, respectively). This conformity of taxonomic GS FLX 616,072 141,685,079 230.0 bp composition was also observed for almost all the prevalent and Titanium 1,347,644 495,506,659 367.7 bp rare taxa at other taxonomic ranks (Figure 2). Despite this general ratio Titanium/GS FLX 2.19 3.50 1.60 accordance, there are some noteworthy differences between the GS FLX and Titanium datasets. While the summarized Overview of sequence data obtained from the studied biogas fermenter employing different pyrosequencing technologies. percentage of identified taxa at rank genus is roughly the same doi:10.1371/journal.pone.0014519.t001 in both datasets, the actual abundance of specific genera differs between both datasets (Figure 3). A higher amount of variation Taxonomic classification of Environmental Gene Tags exists especially for the most abundant genera between the two datasets: Both datasets support Methanoculleus as the most abundant (EGTs) genus, but a higher fraction of GS FLX reads was assigned to this One of the major limitations of the taxonomic classification of genus than from the Titanium reads (4.36% of the GS FLX 16S rRNA fragments is the low fraction of reads actually dataset; 3.46% of the Titanium data). For the genus Clostridium, containing 16S rRNA specific sequences. The CARMA software this ratio is reversed: only 2.78% of the GS FLX reads, but 3.19% [44] overcomes this limitation as it is based on the identification of of the Titanium sequences were allocated to this genus. Similar environmental gene tags (EGTs) in community sequences using differences can also be noted for Bacteroides and Bacillus. Even profile hidden Markov models (pHMMs) and the phylogenetic though differences exist between the two taxonomic profiles, deduction of the taxonomic origin of these fragments. The major results based on analysis of the Titanium dataset can be considered strength of this approach is the high accuracy of pHMMs for the more reliable since they were deduced from a larger amount of detection of short functional segments of Pfam protein family sequence information. The discrepancy of the taxonomic profile members. The search for conserved Pfam protein fragments for the GS FLX dataset as shown here in contrast to previously resulted in a total of 100,546 identified EGTs (25% of all reads) for reported results [34] is due to the applied GC filtering step and the GS FLX dataset and 329,550 (32% of all reads) for the adjustments within the CARMA software. Applying an unpub- Titanium dataset, respectively. At the taxonomic rank super- lished, enhanced version of CARMA to the unfiltered GS FLX kingdom, 88,528 of the GS FLX reads (22%) and 290,008 of the dataset, EGTs were identified for 167,134 (32%) of the reads, Titanium reads (28%) could be successfully classified. The fact that while only 133,337 (22%) EGTs were reported in the previous a higher fraction of Titanium reads could be assigned indicates study.

Figure 1. Read length and average GC content of pyrosequencing reads. A sharp decline in GC content can be seen once the read length exceeds a certain value. The vertical bars indicate the computed filtering thresholds for the GS FLX and the Titanium dataset, respectively. doi:10.1371/journal.pone.0014519.g001

PLoS ONE | www.plosone.org 5 January 2011 | Volume 6 | Issue 1 | e14519 Biogas

Figure 2. Characterization of the GS FLX and Titanium datasets based on the taxonomic classification of Environmental Gene Tags (EGTs). Displayed are only the most abundant taxa among Bacteria and Archaea lineages at various taxonomic ranks. For each group, the first number represents the number of assigned EGTs from the GS FLX dataset, the second number the EGTs from the Titanium dataset, respectively. Due to the different number of reads obtained from each sequencing platform, the amount of EGTs listed for the Titanium dataset is typically the fourfold of the number listed for the GS FLX dataset. doi:10.1371/journal.pone.0014519.g002

New genera identified in the GS FLX Titanium dataset mesophilic conditions [58]. Acetivibrio species most probably play a Due to the deeper coverage of the metagenome by the dataset role in cellulose degradation [58,59]. Species of the genera obtained on the GS FLX Titanium system, new genera were Garciella, Tissierella, Gracilibacter and Gelria (all Firmicutes) are also identified in the corresponding taxonomic profile (see Table 4.1, adapted to anaerobic habitats where they are involved in different File S1). These include Streptococcus, Acetivibrio, Garciella, Tissierella, fermentative pathways [60–65]. Interestingly, a reference species Gracilibacter, Gelria, Dysgonomonas and Arcobacter to mention just a of the genus Geleria, namely G. glutamica, was isolated from a few. Streptococcus species were previously detected in different propionate-oxidizing methanogenic enrichment culture and rep- anaerobic habitats, especially in a mesophilic hydrogen-producing resents an obligately synthrophic, glutamate-degrading bacterium sludge and a glucose-fed methanogenic bioreactor [55–57]. In the that is able to grow in co-culture with a hydrogenotrophic latter habitat, an acetate- and propionate-based food chain was methanogen [65]. In this context it should be mentioned that prevalent but the specific functions of the Streptococcus members hydrogenotrophic methanogens are dominant in the fermentation dominating the bioreactor are not known [55]. Sequences related sample analysed in this study. The genus Dysgonomonas (see Table to the genus Acetivibrio (Firmicutes) were recently recovered from a 4.1, File S1) belongs to the family Porphyromonadaceae of the order community involved in methanogenesis utilizing cellulose under Bacteroidales. Members of the genus Dysgonomonas were inter alia

PLoS ONE | www.plosone.org 6 January 2011 | Volume 6 | Issue 1 | e14519 Biogas

Figure 3. Comparison of taxonomic profiles on rank genus. The taxonomic profiles for the GS FLX (black bars) and the Titanium (lightgray bars) datasets were computed employing the CARMA pipeline. The percentage values correspond to the total amount of reads in the filtered datasets; included are the ten most abundant genera. doi:10.1371/journal.pone.0014519.g003 isolated from stool samples and are able to ferment glucose processes. For this purpose all reads were classified according to resulting in the production of acids [66]. Likewise, bacteria of the Cluster of Orthologous Groups (COG) categories to infer the genus Alkaliflexus also cluster within the order Bacteroidales and functional potential of the underlying community. In a second represent anaerobic saccharolytic organisms [67]. Other genera step, obtained COG results were annotated with the taxonomic such as Arcobacter were indeed found in an anaerobic community information generated by the CARMA software. This approach digesting a model substrate for maize but seem to play no led to the identification of 292,782 reads (about 21% of the dominant role in degradation of polysaccharides [68]. Some of the dataset) for which both functional as well as taxonomic genera listed in Table 4.1 (File S1) have not been described for information could be retrieved. Moreover, a subset of COG anaerobic, methanogenic consortia so far and hence their entries representing (a) the process of ‘polysaccharide degradation’, involvement in the biogas production process remains unknown. (b) ‘acetogenesis’ and (c) the ‘methanogenesis’ step within the In summary, the more detailed taxonomic analysis presented in fermentation process were chosen for a more detailed taxonomic this study also revealed less-abundant genera that were missed in analysis (see File S1). Even though not all reads classified into the the previous taxonomic profile for the same community. Some of selected COG categories may actually represent the pathways in the newly identified genera presumably are of importance for the the focus of this approach, this analysis provides insights into the biogas production process. relevance of different taxonomic groups for the hydrolysis, acetogenesis, and methanogenesis steps in fermentation of Community participation in substrate decomposition, biomass. Misallocation of reads to these processes can be due to fermentation and methane production the fact that some COG entries include enzymes involved in As described in a previous study, Firmicutes and Methanomicrobiales different, but functionally related pathways. play a crucial role in hydrolysis, acetogenesis and methanogenesis One of the first steps in the fermentation process is the representing key steps in anaerobic degradation of plant biomass breakdown of polysaccharide components of plant cell material. [34]. In this study corresponding pathways are investigated in Especially, the cell wall polymers cellulose, hemicellulose and more detail using the combined dataset consisting of the GS FLX pectin constitute a high amount of the carbon and energy resource and newly acquired Titanium reads, and an elaborated method- that is available for bacteria in the fermentation sample. ology to identify key organisms involved in the above mentioned Accordingly, hydrolysis of plant biomass is considered to be the

PLoS ONE | www.plosone.org 7 January 2011 | Volume 6 | Issue 1 | e14519 Biogas rate-limiting step in biogas production. Based on the assignment to Methanoculleus by the RDP classifier, but could not be assembled COG entries (see File S1), a total of 4,762 reads were classified as into a single consensus sequence due to differences in sequence potentially coding for enzymes involved in the degradation of composition. Binning of the individual reads based on their SNP complex polymers, namely cellulose, hemicellulose, and lignin content (Figure 5A) and subsequent assembly resulted in seven (Figure 4a). Reads allocated to the cellulose degradation process consensus sequences, each of which comprised at least the partial account for 71% of the 4,762 reads. Hence, they constitute the sequence of a specific 16S rRNA gene. Subsequently, the most prevalent group followed by reads predicted to encode consensus sequences were characterized in terms of their hemicellulose degrading enzymes (27%). As expected, reads phylogeny together with several reference sequences obtained related to lignin degradation (2%) are only rarely found. This is from GenBank (Figure 5B). All sequences except one were placed due to the fact that lignin degradation predominantly occurs under in close phylogenetic distance to Methanoculleus bourgensis; the one aerobic conditions, whereas fermentation of plant biomass for remaining sequence was placed in a remote branch formed by biogas production is an anaerobic process. Most of the reads Methanoculleus marisnigri, Methanoculleus palmolei, Methanoculleus chiku- (1,571) assigned to the ‘polysaccharide degradation’ context goensis and Methanoculleus thermophilus (Figure 5B). These findings are originate from members of the Firmicutes making this phylum the consistent with the results of a previous study [35], where a most important one. Additionally, Bacteroidetes (661) and Proteobac- phylogenetic characterization of the same biogas plant based on teria (319) are involved in this process, since significant numbers of 16S rRNA clone library sequences was performed. reads were grouped into these taxa. The phylum Actinobacteria For verification, the same analysis was repeated for the 5S includes a high fraction of reads encoding fragments of lignin rRNA and the mtrB gene. While the mcr operon is partially degrading enzymes. This is in accordance with literature, since duplicated in the genome of Methanoculleus marisnigri JR1, only one Actinobacteria are known to express different ligninases [69]. copy of the mtrB gene exists. Four different variants of the mtrB The reductive acetyl-CoA pathway (also called Wood-Ljung- gene could be assembled, confirming the presence of several dahl pathway) is important for acetogenic bacteria to autotrophi- Methanoculleus species/strains (see File S1). Assembly of the 5S cally produce acetate from hydrogen and carbon dioxide or rRNA gene confirmed three variants (data not shown); down- carbon monoxide, respectively [70], whereas aceticlastic methan- stream of the 5S rRNA gene, all three variants differ from ogens reversely use this pathway to generate methane from Methanoculleus marisnigri JR1, indicating a different genomic context acetate. Several COG classifications representing the Wood- for the rrn cluster. Ljungdahl and the hydrogenotrophic methanogenesis pathway were taxonomically analysed to distinguish between aceticlastic Mapping of metagenome reads to the Methanoculleus [16] and hydrogenotrophic methanogens [71]. The fact that the Wood-Ljungdahl pathway is used by acetogenic bacteria like marisnigri JR1 reference genome Clostridia as well as methanogenic Archaea is shown in Figure 4b. To In an attempt to reconstruct the genome sequence of one of the address the total number of COG hits representing methanogen- dominant methanogenic species, both datasets were mapped onto esis, only Archaea with 720 assignments were taken into account for the published genome of Methanoculleus marisnigri JR1, the only subsequent analysis. The order Methanomicrobiales (520) constitutes Methanoculleus strain for which a completely sequenced genome 72% of all COG hits relevant for methanogenesis assigned to the currently exists. Using only the sequence data from the GS FLX superkingdom Archaea. Thus, the order Methanomicrobiales is by far dataset, 39.8% of the reference genome was covered; mapping of the most abundant taxonomic group producing methane using the Titanium dataset resulted in 41.7% coverage. A joint mapping CO2 as a carbon source. Besides, it is noticeable that 25% of the of both datasets produced a genomic coverage of 45.4%. reads assigned to Methanomicrobiales (131) could not be classified at Upon further analysis, a correlation between poorly covered lower taxonomic levels, indicating that either corresponding regions and areas with relatively low GC content was discovered proteins originate from new taxa which are not represented in (data not shown). Since variations in GC content often hint at COG or highly conserved proteins shared by more than one horizontal gene transfer, the number of bases that could be taxon. Acetate as source for methane production seems to play a mapped to the coding sequence of each gene in the genome of minor role, which is taxonomically indicated by the low fraction of Methanoculleus marisnigri JR1 was determined and normalized with Methanosarcinales (2,704 reads) within the group of known respect to the gene length. The results were divided into two methanogens (27,693 reads). This observation correlates with the different groups depending on the prediction found in the HGT- low fraction of hits (3.5%) indicative for acetate conversion to DB: one containing all genes that potentially have been acquired methane within the group of Archaea as well as the fact that by horizontal gene transfer, and another one for all remaining reductive acetyl-CoA pathway enzymes are mainly found in the genes. For both groups, the density function of all results was group of Bacteria (95%) with Clostridia as their most abundant class. plotted (Figure 6). A comparison of the results for genes predicted as acquired by horizontal gene transfer in the HGT-DB and the Identification of several different Methanoculleus species remaining genes suggests that some mobile DNA segments are Analysis of the community participation in methanogenesis missing in one or several of the Methanoculleus species present in the revealed that members of the order Methanomicrobiales are studied biogas fermenter. The coverage of the Methanoculleus dominant among the methanogenic Archaea. Recently, the marisnigri JR1 genome as determined from the available metagen- complete genome sequence of Methanoculleus marisnigri JR1 of the ome sequence data does not suffice to determine the absence or order of Methanomicrobiales became available [51]. To gain an presence of individual genes in the Methanoculleus species residing in insight into the relatedness of dominant methanogens within the the biogas fermenter. This is particularly relevant because several analysed fermentation sample to this reference species, several different Methanoculleus species/strains were identified in the genes were analysed in more detail. studied biogas fermenter. Even with the currently available From both the (unfiltered) GS FLX and Titanium datasets, a sequence data, the reconstruction of a complete genome of one total amount of 5,266 reads containing 16S rRNA sequence of the dominant species still remains unfeasible due to the fragments was identified. Out of these, 44 of the GS FLX and 88 complexity of the microbial community and the close relationship of the Titanium reads were taxonomically assigned to the genus of some of the dominant species.

PLoS ONE | www.plosone.org 8 January 2011 | Volume 6 | Issue 1 | e14519 Biogas

Figure 4. Taxonomic and physiological overview of relevant members in ‘polysaccharide degradation’ (a) and ‘acetogenesis/ methanogenesis’ (b). The trees are based on reads classified into the NCBI taxonomy by the CARMA software. For each taxonomic group the underlying number of reads is given. Numbers in brackets refer to the amount of reads which could not be classified at corresponding lower taxonomic ranks. Associated COG entries are depicted as pie charts, where the interpretation of the numbers is equivalent. doi:10.1371/journal.pone.0014519.g004

Hydrogenases of Methanomicrobiales produced via the hydrogenotrophic pathway by reduction of This study revealed that species of the genus Methanoculleus carbon dioxide. Hydrogenotrophic methanogenesis usually is dominate among methanogenic Archaea. High abundance of accompanied by syntrophic acetate oxidation which necessitates Methanoculleus members has also been shown for other communities that released hydrogen from acetate oxidation is efficiently involved in fermentation of maize silage and related substrates consumed by cooperating hydrogenotrophic methanogens. It is [58,68,72,73] suggesting that in these habitats methane is mainly assumed that Methanoculleus species have a high hydrogen affinity

PLoS ONE | www.plosone.org 9 January 2011 | Volume 6 | Issue 1 | e14519 Biogas

Figure 5. Identification of Methanoculleus variants. Partial view (A) of pyrosequencing reads aligned to the 16S rRNA sequence of Methanoculleus bourgensis (Genbank accession AY196674). Colored bases indicate differences between reads and the reference (shown in the bottom line). In the depicted part, two of the seven different variants are visible. To characterize the variants, a phylogenetic tree (B) was constructed together with various reference sequences. Most variants show close relationship to M. bourgensis; only variant VAR2 was placed in another branch formed by M. marisnigri, M. palmolei, M. chikugoensis and M. thermophilus. Several 16S rRNA sequences from the genus Methanoculleus were used: M. olentangyi (AF095270), M. bourgensis (AY196674), M. palmaeoli (Y16382), M. thermophilus (AB065297), M. chikugoensis MG62 (AB038795) and M. marisnigri JR1 (CP000562 (Memar_R0043)). Additional sequences in increasing taxonomic distance were included as outgroups: Methanosarcina mazeii (MMU20151), Methanococcus vannielii SB (CP000742 (Mevan_R0025)), Clavibacter michiganensis ssp. michiganensis NCPPB 382 (AM711867 (CMM_RNA_0001)) and two sequences from Escherichia coli K12 DH10B (NC_010473 (ECDH10B_3945 and ECDH10B_2759)). doi:10.1371/journal.pone.0014519.g005

PLoS ONE | www.plosone.org 10 January 2011 | Volume 6 | Issue 1 | e14519 Biogas

Figure 6. Comparison of kernel density estimates for metagenome reads mapped to coding regions of the Methanoculleus marisnigri JR1 genome. The figure shows the mapped bases per position in the genome separated into different groups: genes that have potentially been acquired by horizontal gene transfer (HGT, depicted as dot-dashed line) and all other genes without a prediction for horizontal gene transfer (CDS, shown as dashed line). For comparison, the density function for all genes combined is shown as well (solid line). Poor coverage of HGT genes hints at genomic features probably missing in the Methanoculleus species present in the studied biogas fermenter, further supporting their difference to Methanoculleus marisnigri JR1. doi:10.1371/journal.pone.0014519.g006

[73]. BLAST analyses revealed that several metagenome reads Since both datasets differ in size, the Shannon index was also from the biogas community correspond to M. marisnigri JR1 computed for random subsets of 407,558 reads (i.e. the size of the hydrogenase genes (see Table 2). Among the identified genes are normalized GS FLX dataset) extracted from the Titanium data; those possibly encoding the membrane-bound hydrogenases Eha 5,000 iterations were calculated and the average of all results (Memar_1172 - Memar_1185) and Ech (Memar_0359 - taken. The resulting index value of 2.58 shows that even for Memar_0364), which were predicted to participate in methano- equally sized datasets, the Titanium dataset provides an higher genesis [51]. It is likely that Eha and Ech represent high-affinity estimate of biodiversity. hydrogenases contributing to efficient hydrogen oxidation in the Rarefaction analyses of both unfiltered datasets revealed that the course of methanogenesis. At least some Methanoculleus members biogas-producing community is covered much deeper by the within the community of the analysed fermentation sample possess Titanium than by the GS FLX sequences (Figure 7). 16S rRNA membrane-bound hydrogenases related to Eha and Ech of fragments identified in the Titanium sequences were assigned to 38 Methanoculleus marisnigri JR1. different genera, but only 16 genera were observed for the GS FLX dataset. Three genera were specific for the GS FLX dataset, 13 were Coverage of the microbial community shared, and 25 were specific for the Titanium data. A full list of all To analyse whether the diversity of the biogas-producing microbial identified genera can be found in the File S1. The rarefaction curve community is sufficiently covered by the sequence data, species computed for the Titanium dataset was also far from reaching the richness, diversity and rarefaction calculations were conducted. plateau phase, indicating that considerably higher sequencing effort The Shannon index [74] was used to estimate the biological would be required to cover all phylogenetic groups of the underlying diversity of the underlying microbial community in the analyzed microbial community. Rarefaction analysis conducted at rank fermenter. It is widely applied in ecological studies as a family is in accordance with this conclusion (see File S1). measurement of biodiversity, accounting for both the number of The gene content of the studied metagenome samples was different taxa as well as their relative abundances. For this characterized by assigning reads to Pfam protein families. As purpose, 16S rRNA fragments were detected in the normalized expected, the collective gene content of the studied microbial pyrosequencing datasets and classified with the RDP classifier. community was covered much deeper by the Titanium than by the Subsequently the Shannon index was computed considering only GS FLX sequence reads. In total, 4,759 different protein families 16S rRNA sequences that could be classified on rank genus with at were identified in the Titanium dataset, while 3,844 could be least 80% confidence. detected in the GS FLX reads. However, the rarefaction curves This approach resulted in a Shannon index value of 1.90 for the computed for the number of protein families also did not reach the GS FLX and 2.51 for the Titanium dataset, showing that sequence plateau phase, suggesting that additional sampling would be data obtained from the GS FLX platform clearly underestimates required to capture the entire gene content of the underlying the biodiversity of the underlying community. community (Figure 8). Protein families identified in the Titanium

PLoS ONE | www.plosone.org 11 January 2011 | Volume 6 | Issue 1 | e14519 Biogas

Table 2. Metagenome reads comprising hydrogenase gene fragments.

Titaniuma GS FLXa locus description

462 134 Memar_0359b 4Fe-4S ferredoxin iron-sulfur binding domain-containing protein 66 44 Memar_0360b NADH-ubiquinone oxidoreductase, chain 49kDa 14 16 Memar_0361b ech hydrogenase, subunit EchD, putative 40 17 Memar_0362b NADH ubiquinone oxidoreductase, 20 kDa subunit 221 111 Memar_0417 (NiFe) hydrogenase maturation protein HypF 46 32 Memar_0470 hydrogenase accessory protein HypB 111 46 Memar_0622 methyl-viologen-reducing hydrogenase, delta subunit 80 59 Memar_1007c nickel-dependent hydrogenase, large subunit 55 28 Memar_1008c NADH ubiquinone oxidoreductase, 20 kDa subunit 21 21 Memar_1014 hydrogenase maturation protease 146 51 Memar_1022 hydrogenase expression/formation protein HypE 11 10 Memar_1023 hydrogenase expression/synthesis, HypA 16 7 Memar_1024 hydrogenase assembly chaperone hypC/hupF 87 49 Memar_1044 coenzyme F420 hydrogenase/dehydrogenase beta subunit 72 45 Memar_1140 hydrogenase expression/formation protein HypD 21 24 Memar_1172d hypothetical protein 12 8 Memar_1173d hypothetical protein 14 10 Memar_1174d hypothetical protein 18 5 Memar_1175d hypothetical protein 37 18 Memar_1176d hypothetical protein 18 15 Memar_1177d uncharacterized membrane protein 3 6 Memar_1179d hypothetical protein 3 5 Memar_1181d hypothetical protein 28 13 Memar_1182d hypothetical protein 24 14 Memar_1183d NADH ubiquinone oxidoreductase, 20 kDa subunit 281 68 Memar_1185d 4Fe-4S ferredoxin iron-sulfur binding domain-containing protein 49 38 Memar_1378 coenzyme F420 hydrogenase/dehydrogenase beta subunit 17 6 Memar_1380 coenzyme F420 hydrogenase/dehydrogenase beta subunit 79 43 Memar_1623 coenzyme F420 hydrogenase/dehydrogenase beta subunit 93 45 Memar_2174 nickel-dependent hydrogenase, large subunit 31 21 Memar_2175 hydrogenase maturation protease 544 178 Memar_2176 coenzyme F420 hydrogenase 54 29 Memar_2177 coenzyme F420-reducing hydrogenase subunit beta

anumber of reads assigned to a specific M. marisnigri JR1 locus (Memar); bEch operon encoding a membrane-bound hydrogenase [51]; c F420 non-reducing hydrogenase; d Eha operon encoding a membrane-bound hydrogenase [51]. doi:10.1371/journal.pone.0014519.t002 reads were annotated with 1,623 different GO terms, whereas both sequencing runs, thus indicating the importance of thorough 1,338 GO terms were observed for the GS FLX dataset. data screening and filtering to avoid a contortion of results. Compared to the GS FLX data, the Titanium dataset provides However, sequence data generated using the GS FLX Titanium a far more detailed overview of the gene content and metabolic chemistry was only marginally affected by this issue and only a potential of the studied microbial community. very small fraction of reads had to be excluded from further analysis. These differences result from improvements in the Discussion Titanium chemistry, leading to less bias compared to the previous GS FLX technology. Meanwhile, a new emPCR kit containing a In a recent study, the metagenome of a biogas-producing specific additive has been launched by Roche Applied Science. microbial community was sequenced employing the GS FLX Initial studies based on microbial genome sequencing revealed an pyrosequencing platform. Since analysis results showed insufficient almost bias free sequencing of even very high GC content regions coverage of the community, the study was complemented by (data not shown). additional sequencing of the same DNA preparation on the GS The composition of the microbial community was deduced FLX Titanium platform. from both the taxonomic classification of 16S rRNA fragments as During the analysis, a previously unreported GC bias in well as the assignment of Environmental Gene Tags (EGTs) on pyrosequencing data was identified, which affected sequences from different taxonomic ranks Obtained results essentially confirmed

PLoS ONE | www.plosone.org 12 January 2011 | Volume 6 | Issue 1 | e14519 Biogas

Figure 7. Rarefaction analysis of observed genera. The rarefaction curves represent the estimated number of genera that would be observed in biogas fermenter metagenomes of different sizes. The values were determined based on 16S rRNA fragments classified at rank genus identified in the entire Titanium and GS FLX datasets, respectively. doi:10.1371/journal.pone.0014519.g007

Figure 8. Rarefaction analysis of Pfam families. The estimated number of Pfam protein families that would be identified in biogas fermenter metagenomes of different sizes is shown. The values were computed based on the number of protein families identified in the entire Titanium and FLX metagenomes, respectively. doi:10.1371/journal.pone.0014519.g008

PLoS ONE | www.plosone.org 13 January 2011 | Volume 6 | Issue 1 | e14519 Biogas taxonomic profiles of the previous study. However, less abundant culleus species within the analysed sample and the reference species taxa could be identified by analyzing the GS FLX Titanium M. marisnigri JR1 revealed that there are several differences mainly dataset thus justifying the additional sequencing effort. The fact concerning genes that might have been acquired by horizontal that only a very small fraction of metagenomic reads actually gene transfer. Metagenome reads assigned to the genus Methano- contains fragments of the 16S rRNA gene emphasizes the culleus represent inter alia methanogenesis and membrane-bound advantages of using software such as the CARMA pipeline, which hydrogenase genes predicted to be of importance for the pathway accurately classifies gene fragments detected in metagenome leading to the formation of methane within biogas. The close sequence data. A rarefaction analysis was performed to estimate relationship of the Methanoculleus species in the studied biogas the coverage of the microbial community in both sequencing fermenter makes reconstruction of the genomic sequence of one of datasets; as expected, sequencing data from the GS FLX Titanium the dominant Methanoculleus species from the metagenomic platform provides a far more complete view of the underlying sequence reads rather unlikely, since a reliable distinction between community, while the GS FLX sequencing run was not carried out the most abundant strains can not be assured. In comparison, the to saturation. GS FLX Titanium data offers a far more complete view of the During the functional characterization of the community, analysed fermenter, even though analysis results give evidence that members of the phylum Firmicutes could be confirmed to represent the available sequence data still does not fully cover the microbial the dominant organisms involved in the breakdown of polysac- community. charides together with Bacteroidetes. Beyond that, the novel analyses showed that Proteobacteria also play an important role in Supporting Information polysaccharide degradation. Clostridia were found to dominate within the functional context ‘acetogenesis’, as deduced by File S1 Supporting material. mapping of bacterial taxa to metagenome hits representing the Found at: doi:10.1371/journal.pone.0014519.s001 (0.26 MB Wood-Ljungdahl pathway which is also known as the reductive PDF) acetyl-CoA pathway. Methanomicrobiales are the most abundant order involved in methanogenesis using CO2 as a carbon source, Acknowledgments while acetate only seems to play a minor role, as indicated by a low The authors thank the BRF system administrators for technical support fraction of Methanosarcinales. and Jochen Blom for computing the linear regression. Aileen O’Connell Based on the identification of 16S rRNA fragments from the and Dan Brooks are thanked for proof-reading the manuscript. Methanoculleus genus and subsequent assembly, the presence of several Methanoculleus species closely related to Methanoculleus Author Contributions bourgensis in the studied biogas fermenter could be demonstrated. Conceived and designed the experiments: S. Jaenicke AP AS AG. A rough characterization of the genomic content of these Performed the experiments: S. Jaenicke MD OK. Analyzed the data: S. Methanoculleus species was conducted by mapping the metagenome Jaenicke CA TB RB KHG S. Ju¨nemann LK FT MZ AS AG. Contributed sequence reads to the published genome of Methanoculleus marisnigri reagents/materials/analysis tools: S. Jaenicke MD OK. Wrote the paper: JR1. Comparison of the genomic content of dominant Methano- S. Jaenicke S. Ju¨nemann LK AS.

References 1. Weiland P (2003) Production and energetic use of biogas from energy crops 14. Blaut M (1994) Metabolism of methanogens. Antonie van Leeuwenhoek 66: and wastes in Germany. Applied Biochemistry and Biotechnology 109: 263– 187–208. 274. 15. Deppenmeier U (2002) The unique biochemistry of methanogenesis. Progress in 2. Yadvika S, Sreekrishnan T, Kohli S, Rana V (2004) Enhancement of biogas nucleic acid research and molecular biology 71: 223–283. production from solid substrates using different techniques - a review. 16. Ferry JG (1999) Enzymology of one-carbon metabolism in methanogenic Bioresource Technology 95: 1–10. pathways. FEMS Microbiol Rev 23: 13–38. 3. Bayer E, Belaich J, Shoham Y, Lamed R (2004) The cellulosomes: multienzyme 17. Liu Y, Whitman W (2008) Metabolic, phylogenetic, and ecological diversity of machines for degradation of plant cell wall polysaccharides. Annual Review of the methanogenic archaea. Annals of the New York Academy of Sciences 1125: Microbiology 58: 521–54. 171–189. 4. Cirne D, Lehtomaki A, Bjornsson L, Blackall L (2007) Hydrolysis and microbial 18. Reeve J, No¨lling J, Morgan R, Smith D (1997) Methanogenesis: genes, genomes, community analyses in two-stage anaerobic digestion of energy crops. Journal of and who’s on first? Journal of bacteriology 179: 5975–5986. Applied Microbiology 103: 516–527. 19. Weiss A, Je´roˆme V, Freitag R, Mayer H (2008) Diversity of the resident 5. Lynd L, Weimer P, van Zyl W, Pretorius I (2002) Microbial cellulose utilization: microbiota in a thermophilic municipal biogas plant. Applied Microbiology and fundamentals and biotechnology. Microbiology and molecular biology reviews Biotechnology 81: 163–173. 66: 506–577. 20. Tang Y, Shigematsu T, Morimura S, Kida K (2004) The effects of micro- 6. Drake H, Ku¨sel K, Matthies C (2002) Ecological consequences of the aeration on the phylogenetic diversity of microorganisms in a thermophilic phylogenetic and physiological diversities of acetogens. Antonie Van Leeuwen- anaerobic municipal solid-waste digester. Water Research 38: 2537–2550. hoek 81: 203–213. 21. Tang Y, Fujimura Y, Shigematsu T, Morimura S, Kida K (2007) Anaerobic 7. Drake H, Daniel S, Ku¨sel K, Matthies C, Kuhner C, et al. (1997) Acetogenic treatment performance and microbial population of thermophilic upflow bacteria: what are the in situ consequences of their diverse metabolic anaerobic filter reactor treating awamori distillery wastewater. Journal of versatilities? Biofactors 6: 13–24. Bioscience and Bioengineering 104: 281–287. 8. Myint M, Nirmalakhandan N, Speece R (2007) Anaerobic fermentation of cattle 22. Klocke M, Ma¨hnert P, Mundt K, Souidi K, Linke B (2007) Microbial manure: Modeling of hydrolysis and acidogenesis. Water Research 41: 323–332. community analysis of a biogas-producing completely stirred tank reactor fed 9. Schink B (1997) Energetics of syntrophic cooperation in methanogenic continuously with fodder beet silage as mono-substrate. Systematic and Applied degradation. Microbiology and Molecular Biology Reviews 61: 262–280. Microbiology 30: 139–151. 10. Shin H, Youn J (2005) Conversion of food waste into hydrogen by thermophilic 23. Klocke M, Nettmann E, Bergmann I, Mundt K, Souidi K, et al. (2008) acidogenesis. Biodegradation 16: 33–44. Characterization of the methanogenic Archaea within two-phase biogas reactor 11. Sousa D, Smidt H, Alves M, Stams A (2009) Ecophysiology of syntrophic systems operated with plant biomass. Systematic and Applied Microbiology 31: communities that degrade saturated and unsaturated long-chain fatty acids. 190–205. FEMS Microbiology Ecology 68: 257–272. 24. Chouari R, Le Paslier D, Daegelen P, Ginestet P, Weissenbach J, et al. (2005) 12. Deppenmeier U, Mu¨ller V (2008) Life close to the thermodynamic limit: how Novel predominant archaeal and bacterial groups revealed by molecular analysis methanogenic archaea conserve energy. Results and problems in cell of an anaerobic sludge digester. Environmental Microbiology 7: 1104–1115. differentiation 45: 123–152. 25. Shigematsu T, Era S, Mizuno Y, Ninomiya K, Kamegawa Y, et al. (2006) 13. Thauer R, Kaster A, Seedorf H, Buckel W, Hedderich R (2008) Methanogenic Microbial community of a mesophilic propionate-degrading methanogenic archaea: ecologically relevant differences in energy conservation. Nature consortium in chemostat cultivation analyzed based on 16S rRNA and acetate Reviews Microbiology 6: 579–591. kinase genes. Applied Microbiology and Biotechnology 72: 401–415.

PLoS ONE | www.plosone.org 14 January 2011 | Volume 6 | Issue 1 | e14519 Biogas

26. Tang Y, Shigematsu T, Morimura S, Kida K (2005) Microbial community 51. Anderson I, Ulrich L, Lupa B, Susanti D, Porat I, et al. (2009) Genomic analysis of mesophilic anaerobic protein degradation process using bovine serum characterization of Methanomicrobiales reveals three classes of methanogens. PloS albumin (BSA)-fed continuous cultivation. Journal of Bioscience and Bioengi- One 4: e5797. neering 99: 150–164. 52. Garcia-Vallve S, Guzman E, Montero M, Romeu A (2003) HGT-DB: a 27. Sasaki K, Haruta S, Ueno Y, Ishii M, Igarashi Y (2007) Microbial population in database of putative horizontally transferred genes in prokaryotic complete the biomass adhering to supporting material in a packed-bed reactor degrading genomes. Nucleic Acids Research 31: 187–189. organic solid waste. Applied Microbiology and Biotechnology 75: 941–952. 53. Dohm J, Lottaz C, Borodina T, Himmelbauer H (2008) Substantial biases in 28. Friedrich M (2005) Methyl-coenzyme M reductase gene: unique functional ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids markers for methanogenic and anaerobic methane-oxidizing Archaea. Methods in Research 36: e105. Enzymology 397: 428–442. 54. Ludwig W, Strunk O, Westram R, Richter L, Meier H, et al. (2004) ARB: a 29. Juottonen H, Galand P, Yrja¨la¨ K (2006) Detection of methanogenic Archaea in software environment for sequence data. Nucleic Acids Research 32: peat: comparison of PCR primers targeting the mcrA gene. Research in 1363–1371. Microbiology 157: 914–921. 55. Dollhopf S, Hashsham S, Dazzo F, Hickey R, Criddle C, et al. (2001) The 30. Lueders T, Chin K, Conrad R, Friedrich M (2001) Molecular analyses of impact of fermentative organisms on carbon flow in methanogenic systems methyl-coenzyme M reductase alpha-subunit (mcrA) genes in rice field soil and under constant low-substrate conditions. Applied microbiology and biotechnol- enrichment cultures reveal the methanogenic phenotype of a novel archaeal ogy 56: 531–538. lineage. Environmental Microbiology 3: 194–204. 56. Fang H, Zhang T, Liu H (2002) Microbial diversity of a mesophilic hydrogen- 31. Luton P, Wayne J, Sharp R, Riley P (2002) The mcrA gene as an alternative to producing sludge. Applied microbiology and biotechnology 58: 112–118. 16S rRNA in the phylogenetic analysis of methanogen populations in landfill. 57. Fernandez A, Hashsham S, Dollhopf S, Raskin L, Glagoleva O, et al. (2000) Microbiology 148: 3521–3530. Flexible community structure correlates with stable community function in 32. Rastogi G, Ranade D, Yeole T, Patole M, Shouche Y (2008) Investigation of methanogenic bioreactor communities perturbed by glucose. Applied and methanogen population structure in biogas reactor by molecular characteriza- Environmental Microbiology 66: 4058. tion of methyl-coenzyme M reductase A (mcrA) genes. Bioresource Technology 58. Li T, Maze´as L, Sghir A, Leblon G, Bouchez T (2009) Insights into networks of 99: 5317–5326. functional microbes catalysing methanization of cellulose under mesophilic 33. Schlu¨ter A, Bekel T, Diaz N, Dondrup M, Eichenlaub R, et al. (2008) The conditions. Environmental Microbiology 11: 889–904. metagenome of a biogas-producing microbial community of a production-scale 59. Noach I, Levy-Assaraf M, Lamed R, Shimon L, Frolow F, et al. (2010) Modular biogas plant fermenter analysed by the 454-pyrosequencing technology. Journal arrangement of a cellulosomal scaffoldin subunit revealed from the crystal of Biotechnology 136: 77–90. structure of a cohesin dyad. Journal of Molecular Biology 2: 294–305. 34. Krause L, Diaz N, Edwards R, Gartemann K, Kro¨meke H, et al. (2008) 60. Bae J, Park J, Chang Y, Rhee S, Kim B, et al. (2004) Clostridium hastiforme is a Taxonomic composition and gene content of a methane-producing microbial later synonym of Tissierella praeacuta. International journal of systematic and community isolated from a biogas reactor. Journal of Biotechnology 136: evolutionary microbiology 54: 947. 91–101. 61. Kaster K, Voordouw G (2006) Effect of nitrite on a thermophilic, methanogenic 35. Kro¨ber M, Bekel T, Diaz N, Goesmann A, Jaenicke S, et al. (2009) Phylogenetic consortium from an oil storage tank. Applied microbiology and biotechnology characterization of a biogas plant microbial community integrating clone library 72: 1308–1315. 16S-rDNA sequences and metagenome sequence data obtained by 454- 62. Kim W, Hwang K, Shin S, Lee S, Hwang S (2010) Effect of high temperature on pyrosequencing. Journal of Biotechnology 142: 38–49. bacterial community dynamics in anaerobic acidogenesis using mesophilic 36. Zhou J, Bruns M, Tiedje J (1996) DNA recovery from soils of diverse sludge inoculum. Bioresource Technology 101: S17–S22. composition. Applied and Environmental Microbiology 62: 316–322. 63. Lee Y, Romanek C, Mills G, Davis R, Whitman W, et al. (2006) Gracilibacter thermotolerans gen. nov., sp. nov., an anaerobic, thermotolerant bacterium from a 37. R Development Core Team (2008) R: A Language and Environment for constructed wetland receiving acid sulfate water. International journal of Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. systematic and evolutionary microbiology 56: 2089. ISBN 3-900051-07-0. 64. Miranda-Tello E, Fardeau M, Sepulveda J, Fernandez L, Cayol J, et al. (2003) 38. Kutner M, Nachtsheim C, Neter J, Li W (2005) Applied linear statistical models. Garciella nitratireducens gen. nov., sp. nov., an anaerobic, thermophilic, nitrate-and McGraw Hill. thiosulfate-reducing bacterium isolated from an oilfield separator in the Gulf of 39. Gomez-Alvarez V, Teal T, Schmidt T (2009) Systematic artifacts in Mexico. International journal of systematic and evolutionary microbiology 53: metagenomes from complex microbial communities. The ISME Journal 3: 1509. 1314–1317. 65. Plugge C, Balk M, Zoetendal E, Stams A (2002) Gelria glutamica gen. nov., sp. 40. Beifang N, Limin F, Shulei S, Weizhong L (2010) Artificial and natural nov., a thermophilic, obligately syntrophic, glutamate-degrading anaerobe. duplicates in pyrosequencing reads of metagenomic data. BMC Bioinformatics International journal of systematic and evolutionary microbiology 52: 401. 11: 187. 66. Shah H, Olsen I, Bernard K, Finegold S, Gharbia S, et al. (2009) Approaches to 41. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local the study of the systematics of anaerobic, Gram-negative, non-sporeforming alignment search tool. J Mol Biol 215: 403–410. rods: Current status and perspectives. Anaerobe 15: 179–194. 42. Wang Q, Garrity G, Tiedje J, Cole J (2007) Naive bayesian classifier for rapid 67. Zhilina T, Appel R, Probian C, Brossa E, Harder J, et al. (2004) Alkaliflexus assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ imshenetskii gen. nov. sp. nov., a new alkaliphilic gliding carbohydrate-fermenting Microbiol 73: 5261–5267. bacterium with propionate formation from a soda lake. Archives of microbiology 43. Finn R, Tate J, Mistry J, Coggill P, Sammut S, et al. (2008) The Pfam protein 182: 244–253. families database. Nucleic Acids Research 36: D281. 68. Pobeheim H, Munk B, Mu¨ller H, Berg G, Guebitz G (2010) Characterization of 44. Krause L, Diaz N, Goesmann A, Kelley S, Nattkemper T, et al. (2008) an anaerobic population digesting a model substrate for maize in the presence of Phylogenetic classification of short environmental DNA fragments. Nucleic trace metals. Chemosphere. Acids Research 36: 2230–2239. 69. Kirby B, Le Roes M, Meyers P (2006) Kribbella karoonensis sp. nov. and Kribbella 45. Muller J, Szklarczyk D, Julien P, Letunic I, Roth A, et al. (2010) eggNOG v2. 0: swartbergensis sp. nov., isolated from soil from the Western Cape, South Africa. extending the evolutionary genealogy of genes with enhanced non-supervised Int J Syst Evol Microbiol 56: 1097–1101. orthologous groups, species and functional annotations. Nucleic Acids Research 70. Pierce E, Xie G, Barabote R, Saunders E, Han C, et al. (2008) The complete 38: D190. genome sequence of Moorella thermoacetica (f. Clostridium thermoaceticum). Environ 46. Caspi R, Foerster H, Fulcher C, Kaipa P, Krummenacker M, et al. (2008) The Microbiol 10: 2550–2573. MetaCyc Database of metabolic pathways and enzymes and the BioCyc 71. Ladapo J, Whitman WB (1990) Method for isolation of auxotrophs in the collection of Pathway/Genome Databases. Nucleic Acids Research 36: D623. methanogenic archaebacteria: role of the acetyl-CoA pathway of autotrophic 47. Sayers E, Barrett T, Benson D, Bolton E, Bryant S, et al. (2009) Database CO2 fixation in Methanococcus maripaludis. PNAS 87: 5598–5602. resources of the National Center for Biotechnology Information. Nucleic Acids 72. Hansen K, Ahring B, Raskin L (1999) Quantification of syntrophic fatty acid- Research 37: D5. beta-oxidizing bacteria in a mesophilic biogas reactor by oligonucleotide probe 48. Pearson WR (2000) Flexible sequence similarity searching with the FASTA3 hybridization. Applied and environmental microbiology 65: 4767. program package. Methods Mol Biol 132: 185–219. 73. Schnu¨rer A, Zellner G, Svensson B (1999) Mesophilic syntrophic acetate 49. Edgar R (2004) MUSCLE: multiple sequence alignment with high accuracy and oxidation during methane formation in biogas reactors. FEMS microbiology high throughput. Nucl Acids Res 32: 1792–1797. ecology 29: 249–261. 50. Felsenstein J (1989) PHYLIP - Phylogeny Inference Package (Version 3.2). 74. Shannon CE (1948) A mathematical theory of communication. Bell System Cladistics 5: 164–166. Technical Journal 27: 379–423 and 623–656.

PLoS ONE | www.plosone.org 15 January 2011 | Volume 6 | Issue 1 | e14519