Classification of pmoA Amplicon Pyrosequences

METHODS ARTICLE
published: 18 February 2014
doi: 10.3389/fmicb.2014.00034
www.frontiersin.org, February 2014, Volume 5, Article 34

Classification of pmoA amplicon pyrosequences using BLAST and the lowest common ancestor method in MEGAN

Marc G. Dumont 1*, Claudia Lüke 1,2, Yongcui Deng 1,3 and Peter Frenzel 1

1 Department of Biogeochemistry, Max Planck Institute for Terrestrial Microbiology, Marburg, Germany
2 Department of Microbiology, Radboud University, Nijmegen, Netherlands
3 Key Laboratory of Virtual Geographic Environment, Ministry of Education, Nanjing Normal University, Nanjing, China

Edited by: Kevin John Purdy, University of Warwick, UK
Reviewed by: Anthony Yannarell, University of Illinois at Urbana-Champaign, USA; Marco J. L. Coolen, Woods Hole Oceanographic Institution, USA
*Correspondence: Marc G. Dumont, Department of Biogeochemistry, Max Planck Institute for Terrestrial Microbiology, Karl-von-Frisch-Str., 10, 35043 Marburg, Germany. e-mail: dumont@mpi-marburg.mpg.de

The classification of high-throughput sequencing data of protein-encoding genes is not as well established as for 16S rRNA. The objective of this work was to develop a simple and accurate method of classifying large datasets of pmoA sequences, a common marker for methanotrophic bacteria. A taxonomic system for pmoA was developed based on a phylogenetic analysis of available sequences. The taxonomy incorporates the known diversity of pmoA present in public databases, including both sequences from cultivated and uncultivated organisms. Representative sequences from closely related genes, such as those encoding the bacterial ammonia monooxygenase, were also included in the pmoA taxonomy. In total, 53 low-level taxa (genus-level) are included. Using previously published datasets of high-throughput pmoA amplicon sequence data, we tested two approaches for classifying pmoA: a naïve Bayesian classifier and BLAST. Classification of pmoA sequences based on BLAST analyses was performed using the lowest common ancestor (LCA) algorithm in MEGAN, a software program commonly used for the analysis of metagenomic data. Both the naïve Bayesian and BLAST methods were able to classify pmoA sequences and provided similar classifications; however, the naïve Bayesian classifier was prone to misclassifying contaminant sequences present in the datasets. Another advantage of the BLAST/LCA method was that it provided a user-interpretable output and enabled novelty detection at various levels, from highly divergent pmoA sequences to genus-level novelty.

Keywords: pmoA, methanotroph, pyrosequencing, diversity, NGS data analysis

1. INTRODUCTION
High-throughput sequencing (HTS) technologies have aided our ability to investigate the diversity of microorganisms in environmental samples either by shotgun metagenomic or amplicon sequencing approaches. Many bioinformatic tools necessary to process and interpret the large volume of data obtained by HTS have been developed. For example, there are several choices of pipelines available to analyze 16S rRNA amplicon sequencing data such as RDP (Cole et al., 2005), mothur (Schloss et al., 2009) and QIIME (Caporaso et al., 2010). Similar strategies targeting genes encoding enzymes responsible for important biogeochemical or bioremediation processes are becoming more common, but the methods for analyzing the data are not as well established as for 16S rRNA.

The analysis of HTS amplicon data can be performed using taxonomy-dependent or independent approaches. The taxonomy-independent approach includes methods to compare sequence alignments and analyze operational taxonomic units (OTUs) based on sequence dissimilarity (Schloss and Handelsman, 2004; Cai and Sun, 2011). This approach is valuable for estimating ecological parameters, such as richness and diversity; however, the information on its own does not indicate how the sequences are related to those of cultivated organisms or those from other studies. Taxonomy-based methods classify sequences according to their relatedness to those of pure cultures and uncultivated organisms. This approach is necessary to incorporate knowledge of the physiological characteristics of different taxa, to identify novel sequence types and to compare results between published studies. Common methods for classification include naïve Bayesian classifiers (Wang et al., 2007), k-nearest neighbor (Cole et al., 2005) and BLAST (Altschul et al., 1990).

In general, the analysis of OTUs and phylogenetic trees calculated from individual datasets of protein-encoding genes can be performed with the same tools designed for the analysis of 16S rRNA sequences. In contrast, classifiers must be tailor made for each gene by establishing a taxonomy with representative sequences and choosing an appropriate classification algorithm. The objective of this study was to establish a robust and easily applied approach to classifying HTS amplicon sequences of pmoA, a key gene of methane-oxidizing bacteria. The method should also allow for novelty detection and be easily performed by a microbial ecologist with only fundamental knowledge of bioinformatics. We test a naïve Bayesian classifier and BLAST combined with the lowest common ancestor approach of MEGAN using previously published pmoA pyrosequencing data (Lüke and Frenzel, 2011; Deng et al., 2013). Previous studies have also compared both approaches for the classification of SSU rRNA (Lanzén et al., 2012) and fungal LSU rRNA sequences (Porter and Golding, 2012).

2. pmoA TAXONOMY
An accurate taxonomic system for the gene sequences is a necessary prerequisite for classification. Since the classification of unknown sequences obtained by HTS can only be as accurate as the taxonomy, the analysis of database sequences and assignment of taxa is the critical step in the development of a classifier. In general, pmoA has been shown to be a good phylogenetic marker for methanotrophs (Degelmann et al., 2010), with some exceptions of divergent additional copies of the gene in some organisms (Dunfield et al., 2002; Stoecker et al., 2006; Baani and Liesack, 2008). Here we describe the taxonomy of pmoA genes (Table 1); earlier versions were described previously (Lüke and Frenzel, 2011; Deng et al., 2013).

2.1. OVERALL TAXONOMIC SYSTEM
The pmoA gene encodes the β-subunit of the particulate methane monooxygenase (pMMO), which belongs to the class of copper-containing membrane-bound monooxygenase (CuMMO) enzymes. In addition to the pMMO, this group includes the bacterial ammonia monooxygenase (Holmes et al., 1995), the thaumarchaeal ammonia monooxygenase (Pester et al., 2011), alkane monooxygenases and various uncharacterized enzymes encoded by genes detected in environmental surveys (Coleman et al., 2012). For our classifier we compiled a database of pmoA and related gene sequences obtained primarily from public databases. We focused on building a taxonomic structure for pmoA, but also included sequences of related genes that are often co-amplified with common pmoA primers, such as the bacterial amoA. Related sequences that are not co-amplified, such as the thaumarchaeal amoA, were not included.

Currently, our curated database includes 6628 reference sequences corresponding to 53 low-level taxa (Table 1). The assignment to taxa was determined by the phylogenetic analysis of the pmoA and related gene fragments using both the nucleotide and inferred protein sequences.

[...] found (e.g., Aquifer_cluster or upland soil cluster, USC) (Lüke and Frenzel, 2011).

2.2. TYPE I AND II pmoA SEQUENCES
The MOB_like sequences were assigned to either Type I, Type II or pXMO_like. The Type I sequences were further divided into Type Ia, b, or c. Type Ia are pmoA sequences affiliated to the classic Type I methanotrophs (i.e., not Type X). Type Ib (also referred to elsewhere as Type X) are those of Methylococcus and closely related genera. Type Ic are all other Type I-related sequences with a more ambiguous affiliation. Type II sequences were divided into Type IIa and b. Type IIa was used for the primary pmoA sequences of the Methylocystaceae. Type IIb was used to group all other Type II-related (i.e., Alphaproteobacteria) sequences, including those from the Beijerinckiaceae (Theisen et al., 2005; Dunfield et al., 2010; Vorobev et al., 2011) and the alternate pMMO2 identified in some Methylocystis species (Dunfield et al., 2002; Baani and Liesack, 2008).

2.3. pXMO: DIVERGENT pmoA SEQUENCES
We use pXMO as the third category of pmoA-related sequences. The original description of pXMO was for the unusual pMMO-like enzyme identified in some Type I methanotrophs (Tavormina et al., 2011). Here we use pXMO_like to encompass all the divergent sequence types for which the substrate or biological function has not been clearly identified by biochemical or genetic tests. For example, we include the three verrucomicrobial pmoA-like sequences in this category until it is determined which, if not all, catalyze the oxidation of methane. The original pxmA genes identified in Methylomonas spp. (Tavormina et al., 2011) are classified in the M84_P105 low-level taxon. We have also included the pmoA sequences from the nitrite-dependent anaerobic methane oxidizers belonging to the NC10 phylum (Ettwig et al., 2009, 2010) into the pXMO_like category; it should be noted that these NC10 pmoA sequences are typically retrieved only after using specific primers and a special PCR program designed for their amplification (Luesken et al., 2011) and therefore are unlikely to be obtained in HTS pmoA surveys using the traditional pmoA primer sets.

2.4. BACTERIAL AMMONIA MONOOXYGENASE
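
As an aside on the hierarchical taxonomy of Sections 2.1 to 2.3, the lowest common ancestor idea can be illustrated with a short Python sketch. It is a simplification under stated assumptions and not MEGAN's implementation: each BLAST hit is assumed to have already been mapped to a root-to-leaf lineage, and the function name, the lineage spellings (e.g., Type_Ia) and the placement of the example genera are illustrative only. The read is assigned to the deepest taxon shared by all of its hits.

```python
from typing import List, Sequence


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest root-to-node path shared by every lineage.

    Each lineage lists taxa from the top-level category down to the
    low-level taxon of one reference hit (hypothetical input format).
    """
    if not lineages:
        return []
    shared: List[str] = []
    # zip stops at the shortest lineage, so the walk never overruns one.
    for taxa_at_rank in zip(*lineages):
        if all(taxon == taxa_at_rank[0] for taxon in taxa_at_rank):
            shared.append(taxa_at_rank[0])
        else:
            break  # the hits diverge below this rank
    return shared


# Illustrative hits for one read: two Type I genera from different subtypes.
hits = [
    ["MOB_like", "Type_I", "Type_Ia", "Methylomonas"],
    ["MOB_like", "Type_I", "Type_Ib", "Methylococcus"],
]
print(lowest_common_ancestor(hits))  # ['MOB_like', 'Type_I']
```

When the hits disagree only at the lowest rank, the read is still placed at the higher-level taxon that they share, so a partial assignment is reported rather than a forced genus-level call.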
