ORIGINAL RESEARCH ARTICLE published: 17 April 2012 doi: 10.3389/fmicb.2012.00132 Evidence for horizontal gene transfer of anaerobic carbon monoxide dehydrogenases

Stephen M.Techtmann1,2, Alexander V. Lebedinsky 3, Albert S. Colman4,Tatyana G. Sokolova3,TanjaWoyke 5, Lynne Goodwin5 and FrankT. Robb1,2*

1 Institute of Marine and Environmental Technology, University of Maryland, Baltimore, MD, USA 2 Department of Microbiology and Immunology, University of Maryland, Baltimore, MD, USA 3 Winogradsky Institute of Microbiology, Russian Academy of Sciences, Moscow, Russia 4 Department of the Geophysical Sciences, University of Chicago, Chicago, IL, USA 5 DOE Joint Genome Institute, Walnut Creek, CA, USA

Edited by: Carbon monoxide (CO) is commonly known as a toxic gas, yet both cultivation studies and Martin G. Klotz, University of North emerging genome sequences of and archaea establish that CO is a widely utilized Carolina at Charlotte, USA microbial growth substrate. In this study, we determined the prevalence of anaerobic car- Reviewed by: Annette Summers Engel, University bon monoxide dehydrogenases ([Ni,Fe]-CODHs) in currently available genomic sequence of Tennessee, USA databases. Currently, 185 out of 2887,or 6% of sequenced bacterial and archaeal genomes Viktoria Shcherbakova, Institute of possess at least one gene encoding [Ni,Fe]-CODH, the key enzyme for anaerobic CO utiliza- Biochemistry and Physiology of tion. Many genomes encode multiple copies of [Ni,Fe]-CODH genes whose functions and Microorganisms, Russian Academy of Sciences, Russia regulation are correlated with their associated gene clusters. The phylogenetic analysis of *Correspondence: this extended protein family revealed six distinct clades; many clades consisted of [Ni,Fe]- Frank T. Robb, Institute of Marine and CODHs that were encoded by microbes from disparate phylogenetic lineages, based on Environmental Technology, 701 East 16S rRNA sequences, and widely ranging physiology.To more clearly define if the branching Pratt Street, Baltimore, MD 21202, patterns observed in the [Ni,Fe]-CODH trees are due to functional conservation vs. evolu- USA. e-mail: [email protected] tionary lineage, the genomic context of the [Ni,Fe]-CODH gene clusters was examined, and superimposed on the phylogenetic trees. On the whole, there was a correlation between genomic contexts and the tree topology, but several functionally similar [Ni,Fe]-CODHs were found in different clades. In addition, some distantly related organisms have sim- ilar [Ni,Fe]-CODH genes. Thermosinus carboxydivorans was used to observe horizontal gene transfer (HGT) of [Ni,Fe]-CODH gene clusters by applying Kullback–Leibler diver- gence analysis methods. Divergent tetranucleotide frequency and codon usage showed that the gene cluster of T. carboxydivorans that encodes a [Ni,Fe]-CODH and an energy- converting hydrogenase is dissimilar to its whole genome but is similar to the genome of the phylogenetically distant Firmicute, Carboxydothermus hydrogenoformans. These results imply that T carboxydivorans acquired this gene cluster via HGT from a relative of C. hydrogenoformans.

Keywords: carbon monoxide, thermophiles, hydrogenogens, carboxydotrophs, Thermosinus carboxydivorans, Carboxydothermus hydrogenoformans, carbon monoxide dehydrogenase

INTRODUCTION Schubel et al., 1995). The Cox-type CODH contains a highly con- Carbon monoxide (CO) is most commonly known as a potent served Mo based active site (Meyer and Schlegel, 1983; Meyer human toxin. Alternatively, CO can provide an energy and car- and Rajagopalan, 1984; Meyer et al., 1993; Dobbek et al., 1999; bon source in both anaerobic and aerobic microbes (Uffen, 1976; Gnida et al., 2003). In the case of the aerobic Cox-type CODH, King and Weber, 2007; Sokolova et al., 2009; Techtmann et al., the electrons generated from oxidation of CO are transferred to 2009). This ability to utilize CO relies upon enzymes known as oxygen or, in some cases, nitrate as the final electron acceptor carbon monoxide dehydrogenases (CODHs, EC 1.2.99.2; Ferry, (King, 2003, 2006; King and Weber, 2007). Anaerobic CODHs are 1995; Seravalli et al., 1995, 1997; Ragsdale, 2008). CO oxidation is distinct from Cox CODHs and have a Ni–Fe active site (Dobbek performed by aerobic and anaerobic using CODHs in dif- et al., 2001, 2004; Svetlitchnyi et al., 2001, 2004). The anaerobic ferent classes (Ragsdale, 2004; King and Weber, 2007). While the CODH generates electrons from the oxidation of CO and trans- overall reaction catalyzed by both aerobic and anaerobic species fers them to a variety of acceptors, allowing CO to be the source of + − is the same (CO + H2O → CO2 + 2H + 2e ), major differences reducing equivalents for various pathways including sulfate reduc- result from the different terminal electron acceptors of the path- tion, acetogenesis, methanogenesis, hydrogenogenesis, and metal way. For aerobic species this reaction is catalyzed by the aero- reduction, and it can also reduce CO2 to CO used further for bic CODH (CoxSML) complex (Hugendieck and Meyer, 1992; acetate synthesis by the Wood–Ljungdahl pathway (Gonzalez and

www.frontiersin.org April 2012 | Volume 3 | Article 132 | 1 Techtmann et al. Anaerobic carbon monoxide dehydrogenase HGT

Robb, 2000; Svetlitchnyi et al., 2001, 2004; Ragsdale, 2004, 2008; final [Ni,Fe]-CODH database (Table S3 in Supplemental Mater- Wu et al., 2005; Sokolova et al., 2009). ial). Many of the low scoring hits were often genes annotated as The extreme thermophile Carboxydothermus hydrogenofor- hydroxylamine reductases. This refined database was then used for mans is a prototype anaerobic carboxydotroph with five distinct further analysis. anaerobic CODHs. C. hydrogenoformans can grow efficiently on CO as sole carbon and energy source, confirming that CO can PHYLOGENETIC TREE CONSTRUCTION fuel divergent pathways (Wu et al., 2005). The presence of multi- The [Ni,Fe]-CODH sequences were aligned using the MUSCLE ple anaerobic CODHs encoded by a single organism requires that alignment program (Edgar, 2004a,b). This multiple sequence these homologs arose either by duplication of an ancestral CODH alignment was used to construct phylogenetic trees using Max- gene in the same lineage or else have been acquired by horizontal imum Likelihood (ML) and Bayesian inference (BI). ML trees gene transfer (HGT) between separate lineages. were constructed using the RAxML-HPC2 program (version In this study we identified [Ni,Fe]-CODH genes in the rapidly 7.2.8; Stamatakis, 2006; Stamatakis et al., 2008) on the CIPRES expanding database of microbial genomes in order to under- servers (Miller et al., 2010). ML trees were constructed using the stand the mechanisms of evolution and dispersion of [Ni,Fe]- WAG + Γ + I model which was selected as the best-fit model for CODHs. A comprehensive phylogenetic analysis was designed to our database by the model selection tool implemented in Topali2 observe if CODHs are found in distinct clades. Examination of (Milne et al., 2009). Node support was assessed by 1,000 bootstrap the genomic context of each of the [Ni,Fe]-CODHs provided replicates with the same model. BI trees were constructed using insights into the functions of [Ni,Fe]-CODH gene clusters. We the BEAST program (Drummond and Rambaut, 2007) with the then used these functional assignments to determine whether WAG + Γ + I model as was selected as the best-fit model by the differences in inferred function of [Ni,Fe]-CODHs explain the Topali2 package. Searches were run with four chains of 7,500,000 pattern of phylogenetic divergence. In addition, in-depth statis- generations, for which the first 750,000 were discarded as “burn- tical analysis was performed on two gene clusters that catalyze in.” Trees were sampled every 1,000 generations. Stabilization of similar functions. By examining operons with the same function chain parameters was determined using the program TRACER we control for differences that have arisen based on functional (Rambaut and Aj, 2007). A maximum credibility clade tree was diversification. The two gene clusters examined were from the pro- annotated using the TreeAnnotator program (part of the BEAST totype thermophilic hydrogenogen – C. hydrogenoformans – and package). Trees were drawn using the FigTree program1. Subtrees a thermophilic metal reducing hydrogenogen – Thermosinus car- were drawn of each of the clades and are shown in (Figure A2 in boxydivorans. These organisms were chosen based on the following Appendix). criteria. (1) They are both thermophilic, thus controlling for selec- tion due to adaptations to differing growth temperatures. (2) C. CONSTRUCTION OF FUNCTIONAL TREES hydrogenoformans and T. carboxydivorans exhibit different types of The analysis of the genomic contexts was done using the infor- cell wall structure and are members of widely divergent classes of mation available at Gene Detail pages of the IMG database and, the Phylum : the and , respec- where necessary, applying tblastn with appropriate queries at the tively. (3) Despite their divergent lineages, their [Ni,Fe]-CODHs IMG website2. This information was used to identify whether the and the linked energy-converting hydrogenases (ECH) are very [Ni,Fe]-CODH gene was located (1) within an ACS gene cluster, similar. The genomes of these bacteria were analyzed to deter- (2) adjacent to an ECH gene cluster, (3) located at a distance less mine whether the [Ni,Fe]-CODH–ECH gene cluster may have than 3 kb away from a gene encoding CooF, a ferredoxin-like FeS been subject to recent HGT. protein (Kerby et al., 1992) which carries out electron transfer from CooS to a variety of electron acceptors, or (4) not clustered MATERIALS AND METHODS with a cooF gene. DATABASE SEARCHING FOR [Ni,Fe]-CODH SEQUENCES The US Department of Energy’s Integrated Microbial Genome GENOMIC SEQUENCING OF THERMOSINUS CARBOXYDIVORANS (IMG) database was searched using BLASTp to query the pro- Thermosinus carboxydivorans was grown as previously described tein database (Altschul et al., 1990) for sequences correspond- (Sokolova et al., 2004). DNA was extracted using previously ing to [Ni,Fe]-CODHs. [Ni,Fe]-CODHs are subdivided into two described protocols (Wu et al., 2005). The genome of T. carboxy- types: the type called CooS, whichismorefrequentinbacte- divorans Nor1 was sequenced at the Joint Genome Institute (JGI) ria, and the type called Cdh, occurring almost exclusively in using a combination of 3, 8, and 40 kb (fosmid) DNA libraries. archaea. The ligands of the Ni,Fe-containing active site (Dobbek’s In addition to Sanger sequencing, 454 pyrosequencing was done C cluster; Dobbek et al., 2001) are conserved, with few excep- to a depth of 20× coverage. All general aspects of library con- tions, in CooS- and Cdh-type CODHs (Lindahl, 2002). However, struction and sequencing performed at the JGI can be found at Cdh-type CODHs harbor two additional [Fe S ] clusters (Gen- 4 4 http://www.jgi.doe.gov/. cic et al., 2010), and, on the whole, there is rather low homology Draft assemblies were based on 28,812 total reads. All between CooS- and Cdh-type CODHs. Therefore, the IMG data- three libraries provided 9.3× coverage of the genome. The base was searched using a CooS-type sequence (CooS-I from C. hydrogenoformans) and a Cdh-type sequence (Cdh-1 from the archaeon Archeoglobus fulgidus). Overlapping results and low scor- 1http://tree.bio.edu.ac.uk/software/figrtee/ ing hits (Quality scores less than 200) were removed from the 2http://img.jgi.doe.gov/

Frontiers in Microbiology | Evolutionary and Genomic Microbiology April 2012 | Volume 3 | Article 132 | 2 Techtmann et al. Anaerobic carbon monoxide dehydrogenase HGT

Phred/Phrap/Consed software package3 was used for sequence RESULTS assembly and quality assessment (Ewing and Green, 1998; Ewing MANY ORGANISMS ENCODE MORE THAN ONE [Ni,Fe]-CODH GENE et al.,1998; Gordon et al.,1998). After the shotgun stage,reads were BLAST searches of the IMG database revealed that a surpris- assembled with parallel Phrap (High Performance Software, LLC). ing number of organisms encode [Ni,Fe]-CODH genes. Of the Possible mis-assemblies were corrected with Dupfinisher (Han and 2887 extant bacterial and archaeal genomes, 185 genomes (>6%) Chain, 2006) or transposon bombing of bridging clones (Epicen- encoded at least one [Ni,Fe]-CODH gene. Of these 185 genomes, tre Biotechnologies, Madison, WI, USA). Gaps between contigs 43% encoded more than one [Ni,Fe]-CODH (Listed in Table S2 in were closed by editing in Consed, custom primer walk or PCR Supplemental Material). The highest number of [Ni,Fe]-CODHs amplification (Roche Applied Science, Indianapolis, IN, USA). detected in a single genome is five, found in the genome sequences A total of 5919 additional reactions were necessary to close of C. hydrogenoformans, the Mono Lake isolate delta Proteobac- gaps and to elevate the quality of the finished sequence. The com- terium MLMS-1, and the methanogenic archaeon Methanosarcina pleted genome sequences of T. carboxydivorans Nor1 contains acetivorans. Many of the species that encode multiple [Ni,Fe]- 36,788 reads, achieving an average of 10-fold sequence cover- CODHs have been described as having the ability to utilize CO for age per base with an error rate less than 1 in 100,000. The both energy conservation and carbon acquisition, or energy con- draft genome sequence was deposited into GenBank (Accession servation alone. Several of the gene clusters have genomic contexts Numbers AAWL01000001–AAWL01000049). that indicate probable primary functions such as metal reduction, oxidative stress responses, and cofactor reduction. In addition, the HORIZONTAL GENE TRANSFER ANALYSES OF THE [Ni,Fe]-CODH–ECH important human pathogens Clostridium difficile and Clostridium GENE CLUSTER botulinum both possess two copies of [Ni,Fe]-CODHs, suggest- The nucleotide sequences for the [Ni,Fe]-CODH–ECH gene ing a role for [Ni,Fe]-CODHs in their physiology. The majority cluster were extracted (including genes of all hydrogenase sub- of [Ni,Fe]-CODHs found in our analysis occur in strains whose units) from the C. hydrogenoformans and the T. carboxydivo- modes of CO utilization are not yet established, and therefore rans genomes. These sequences and the whole genome sequences more CO-related physiological traits are likely to be discovered in for both organisms were used for further analysis. Additionally, the future. the Escherichia coli K12 genome (GenBank Accession Number U00096.2) was used as a control. PHYLOGENETIC ANALYSIS OF [Ni,Fe]-CODH SEQUENCES REVEALS G + C content was determined using GC-Profile program (Gao SEVERAL DISTINCT CLADES and Zhang, 2006). Tetranucleotide frequency was determined A dataset of all of the [Ni,Fe]-CODHs from the IMG database using the TETRA program (Teeling et al., 2004). Codon usage was used to reconstruct the phylogeny of [Ni,Fe]-CODHs. Both was determined from tables in the codon usage database4.Over- ML and BI phylogenetic analyses were used. The phylogenetic all codon usage for the 10 genes of the [Ni,Fe]-CODH-ECH gene trees (Figures 1A,B) constructed from both methods were con- cluster was determined by averaging the codon frequency of each gruent, therefore only the ML trees are shown in subsequent gene in the gene cluster. figures. These trees both detail six distinct, strongly supported The Kullback–Leibler (K–L) divergence metric (Kullback and clades (greater than 50% boostraps for ML trees, with most nodes Leibler, 1951) was used to determine whether the differences in supported at >86%, and Bayesian posterior support of greater G + C content, tetranucleotide frequency, and codon usage were than or equal to 0.99). While the branching pattern within clades significant or not. These three metrics and the K–L divergence were differs slightly between ML and BI, the composition of all six chosen based on a recent paper that suggests that codon usage and clades in both ML and BI trees are identical (Table S3 in Sup- tetranucleotide frequency combined with K–L divergence are the plemental Material). The distribution of sequences within these most accurate metrics for determining HGT (Becq et al., 2010). clades greatly expands our knowledge of the organisms that encode K–L divergence was determined using the following formula: [Ni,Fe]-CODHs as well as the potential mechanisms for their evolution.    g (i) D g||G = g (i) ln KL G (i) FUNCTIONAL ANALYSIS OF [Ni,Fe]-CODH CLADES i Analysis of the genomic context was performed so that the [Ni,Fe]- CODH sequences were binned according to four categories: (1) where g is the a parameter (e.g., frequency of a particular tetranu- within an acetyl-CoA synthase gene cluster, (2) adjacent to an cleotide or frequency of usage of a particular codon) for the gene or ECH gene cluster, (3) adjacent to a cooF electron transfer protein, operon and G is that same parameter for the whole genome. This and (4) not adjacent to a cooF gene. These categories were plotted calculation is repeated for all of the tetranucleotide frequencies on the ML phylogenetic tree displayed in a circular format which and the resulting values are added together to determine the K–L was color coded according to the genomic context of the CODHs divergence for tetranucleotide frequency. The same basic calcula- (Figure 2). tion is done for the frequency of usage for each codon to determine the K–L divergence for codon usage between a gene or operon and SUMMARY OF [Ni,Fe]-CODH CLADES a genome. Clade A is composed of sequences corresponding to the Cdh-type CODHs, which occur almost exclusively in the Archaea and for the 3www.phrap.com most part cluster in the genomes as part of the five-subunit acetyl- 4www.kazusa.or.jp/codon/ CoA decarbonylase/synthase (ACDS) complex (Terlesky et al.,

www.frontiersin.org April 2012 | Volume 3 | Article 132 | 3 Techtmann et al. Anaerobic carbon monoxide dehydrogenase HGT

FIGURE 1 | Unrooted phylogenetic trees of the [Ni,Fe]-CODH amino probabilities are shown for the deep branches defining the six acid sequences encoded by IMG genomes were constructed from (A) [Ni,Fe]-CODH clades. The compositions of all six clades in both ML and BI Maximum Likelihood and (B) Bayesian Inference algorithms. Bootstrap trees are identical (see Figure A1 in Appendix and Table S3 in values as a percentage of 1000 replicates or Bayesian posterior Supplemental Material).

1986; Grahame, 1991; Kocsis et al., 1999; Gencic et al., 2010). Only Chlorobium phaeobacteroides and Chlorobaculum parvum. These one bacterium, the deep subsurface pioneer species Candidatus phototrophs and the second of two CooS sequences from the Desulforudis audaxviator, has a typical archaeal Cdh-type ACDS homoacetogen Moorella thermoacetica form the most deeply complex (Chivian et al., 2008). branching members of this clade. None of the sequences in Clade Clade B contains [Ni,Fe]-CODHs from an eclectic group of B is linked to any other known CO-related genes. Bacteria,the majority comprising either gut microbiota or putative Clade C is composed primarily of clostridial species. Most of cellulose degraders. This clade includes species found in diverse the strains within this clade are found as members of the human enteric environments ranging from cow rumen isolates such as microbiome. A large proportion of the [Ni,Fe]-CODH sequences Ruminococcus spp. to constituents of the human gut microbiome within this clade are one of two CooS homologs found in many such as Campylobacter spp. However, this clade also includes strains of the human pathogens C. botulinum and C. difficile.A metal reducing Geobacter species, as well as two phototrophs, large number of these sequences are found adjacent to cooF genes,

Frontiers in Microbiology | Evolutionary and Genomic Microbiology April 2012 | Volume 3 | Article 132 | 4 Techtmann et al. Anaerobic carbon monoxide dehydrogenase HGT

FIGURE 2 | Maximum likelihood phylogeny from Figure 1A cluster. Purple indicates CODH adjacent to an energy-converting drawn as an unrooted circular tree. IMG Gene ID numbers are hydrogenase gene cluster. Green indicates that the CODH is shown in parentheses after sequence names. The genomic context adjacent to a cooF gene. Black indicates that the CODH is not for each of the [Ni,Fe]-CODH sequences is shown by coloring of the linked to a cooF gene. Bootstrap values are shown as a percentage tip labels. Blue indicates CODH occurrence within an ACS gene derived from 1000 replicates. suggesting that these [Ni,Fe]-CODHs are able to use CooF to of the subgroups of Clade D contains sequences from the sec- transfer electrons to other functional proteins. Thus it is possi- ond class of C. botulinum [Ni,Fe]-CODH sequences among other ble that these [Ni,Fe]-CODHs are coupled to a broad range of Clostridium spp. A third group of sequences within Clade D is functional processes including disease-related physiology. Further composed of sequences similar to CooS-V from C. hydrogeno- biochemical investigation is required to determine the function of formans.TheC. hydrogenoformans CooS-V was described in the the [Ni,Fe]-CODHs within this clade. The most deeply branch- genome analysis of C hydrogenoformans as being unique, since it ing members of this clade are unlinked [Ni,Fe]-CODHs from is the only one of the five [Ni,Fe]-CODH genes in this genome hyperthermophilic methanogens including several Methanocaldo- without a putative function, and is not directly linked to other coccus species and Methanopyrus kandleri. The organisms within genes that could provide a contextual clue as to its function this clade thus span an extraordinary phylogenetic range, from (Wu et al., 2005). The majority of Clade D [Ni,Fe]-CODH genes Bacteria through Euryarchaeota and likely participate in diverse are encoded in genomes that also encode other [Ni,Fe]-CODH pathways for CO utilization. genes. All of the [Ni,Fe]-CODH sequences within Clade D were found Clade E is the largest and most eclectic clade, composed to be “lone” cooS genes, that is, not found in genomic context with of several subclades. Representatives of each of the four func- known CO-metabolism related genes. The most deeply branch- tional categories used in this study are found in Clade E. One ing group of Clade D is a subclade composed of a few alkaliphilic of the Clade E subclades is dominated by clostridial and delta bacteria. Another deeply branching group in Clade D is from Eur- proteobacterial [Ni,Fe]-CODHs found within ACS gene clusters. yarchaeal species, either methanogens or Archaeoglobaceae. One This group of sequences includes the majority of the known

www.frontiersin.org April 2012 | Volume 3 | Article 132 | 5 Techtmann et al. Anaerobic carbon monoxide dehydrogenase HGT

bacterial CODH/ACS gene clusters. Another large subclade in Clade F also contains many characterized [Ni,Fe]-CODHs with Clade E contains sequences from sulfate reducing bacteria as well representatives from each of the four functional classes used in as methanogenic archaea. A noteable feature of one more subclade this study. Many of these have been characterized structurally and within Clade E is a group of [Ni,Fe]-CODHs from Thermococcus biochemically (Svetlitchnyi et al., 2001, 2004; Soboh et al., 2002; spp. that are linked to ECH in these Euryarchaeota. These thermo- Volbeda and Fontecilla-Camps, 2005). These biochemical studies cocci have the ability to couple the anaerobic oxidation of CO to establish Clade F as encompassing the greatest known functional hydrogen production, via a conserved [Ni,Fe]-CODH-ECH gene diversity of [Ni,Fe]-CODH gene clusters. cluster (Lim et al., 2010). Clade F is numerically small,however members in all of the four THE GENOME OF T. CARBOXYDIVORANS ENCODES A MINIMAL CooS functional categories are represented in this group of CODHs. HOMOLOG The four C. hydrogenoformans [Ni,Fe]-CODH gene clusters in The genome of T. carboxydivorans encodes three [Ni,Fe]-CODH Clade F have been assigned different metabolic functions based homologs. One of these homologs is highly similar to the on biochemical characterization as well as their genetic linkages hydrogenase-linked [Ni,Fe]-CODH from C. hydrogenoformans.A to other functional genes (Wu et al., 2005). For example, the second [Ni,Fe]-CODH is highly similar to the CODH-V from C. well-characterized [Ni,Fe]-CODH–ECH gene clusters originally hydrogenoformans. The third [Ni,Fe]-CODH homolog however described in R. rubrum and C. hydrogenoformans are assigned is distinct from all known [Ni,Fe]-CODH sequences (Figure 3). to this clade. In addition to these well-characterized [Ni,Fe]- Other known [Ni,Fe]-CODH sequences are 600–800 amino acid CODH–ECH operons there are several other ECH-clustered residues long. This divergent mini-CooS has only 482 amino acid CODHs within Clade F.Another subgroup within Clade F contains residues and its short length does not result from frameshift trun- bacterial CODH/ACS gene clusters. cation. Upon alignment with other [Ni,Fe]-CODH sequences, it is

FIGURE3|Phylogenetic position of the mini-CooS fromThermosinus clades defined by Figure 1 as well as the mini-CooS from T. carboxydivorans. carboxydivorans. A maximum likelihood tree was built from a MUSCLE The tree was bootstrapped with 1000 replicates,which are shown as alignment of representative sequences selected from all of the [Ni,Fe]-CODH percentage values at the nodes in the tree.

Frontiers in Microbiology | Evolutionary and Genomic Microbiology April 2012 | Volume 3 | Article 132 | 6 Techtmann et al. Anaerobic carbon monoxide dehydrogenase HGT

apparent that its small size can be accounted for by many deletions shown to comprise two operons: a CODH operon, which con- of varying size relative to standard size CooS genes (Figure 4),how- tains a cooS gene and the gene for a nickel–iron, electron transfer ever the mini-CooS seems to retain all of the important domains protein (cooF), and the adjacent membrane-bound hydrogenase of typical CODHs, suggesting that it may be functional (Figure 4). operon. The [Ni,Fe]-CODH-ECH gene clusters are composed of Dobbek et al. (2004) solved the prototype crystal structure for the similar sets of genes in R. rubrum, C. hydrogenoformans, and T. car- CooS-II from C. hydrogenoformans which revealed three metal boxydivorans.InC. hydrogenoformans, the [Ni,Fe]-CODH–ECH binding clusters (Clusters B, C, and D), each with separate metal gene cluster probably comprises two operons (there is an inter- coordinating domains with conserved Cys and His residues essen- genic space approximately 100 nt long between the hypA gene and tial for CODH activity. Interestingly, the 11 Cys residues and the the cooF gene). In T. carboxydivorans, there is no intergenic space His residue coordinating these clusters are all conserved in the or putative promoter sequences between the ECH and CODH minimal CooS from T. carboxydivorans. Hydroxylamine reduc- sections of the gene cluster,suggesting that it forms a single operon. tases share some sequence similarity with CooS, but they all lack Nevertheless, the CODH and the clustered genes encoding the some of the conserved Cys residues that locate the metal clusters seven subunits of the membrane-bound hydrogenase from T. car- in CooS. The proximity in the genome of the minimal cooSgene boxydivorans are highly similar to the corresponding genes from C. to a cooF gene provides further evidence for its involvement in hydrogenoformans, in both gene order and in nucleotide sequence. CO-metabolism. The 11.3 kb operon that comprises the [Ni,Fe]-CODH–ECH gene cluster in T. carboxydivorans is 84% identical on the nucleotide EVIDENCE FOR HGT OF THE [NI,FE]-CODH-ECH GENE CLUSTER OF level to the homologous gene cluster in C. hydrogenoformans,an THERMOSINUS CARBOXYDIVORANS observation that prompted us to examine these gene clusters for ThegenomesequenceofT. carboxydivorans provides insights into quantifiable evidence of HGT. the ability of microbes to utilize CO as a versatile electron donor Several in silico methods have been proposed as indicators as well as potential mechanisms for acquisition of [Ni,Fe]-CODH of HGT. G + C content has been a well-established and simple operons and the evolution of multi-[Ni,Fe]-CODH genotypes. method for predicting HGT (Lawrence and Ochman, 1998). The As mentioned previously, the draft T. carboxydivorans genome G + C content of the T. carboxydivorans genome is 51.6% whereas sequence revealed three CODH gene clusters. The [Ni,Fe]-CODH- the G + C content of its 11.3-kb [Ni,Fe]-CODH–ECH operon ECH gene cluster is highly similar in sequence and gene content is 43.4%, which is similar to the overall G + C content of C. to the homologs found in C. hydrogenoformans and R. rubrum hydrogenoformans genome (42.0%). The G + C% of the [Ni,Fe]- as shown in Figure 5.InR. rubrum, this gene cluster has been CODH-ECH gene cluster from C. hydrogenoformans (44%) is very

FIGURE 4 |The mini-CooS fromThermosinus carboxydivorans three sites as being essential for CooS activity. Cluster C (purple), B (red), contains conserved residues essential for CODH activity. An alignment and D (green). All of these residues are present in the C. of Carboxydothermus hydrogenoformans CooS-II, the Cdh-type CODH of hydrogenoformans CooS-II, M. thermoautotrophicus Cdh, and T. Methanothermobacter thermautotrophicus (MTH1708), Thermosinus carboxydivorans minimal CooS. Many of these residues are absent in the carboxydivorans mini-CooS, and Thermincola potens hydroxylamine hydroxylamine reductase. The sequences were aligned with MAFFT reductase TherJR_2940 was constructed. Dobbek et al. (2001) defined v6.861b (http://mafft.cbrc.jp/alignment/server/index.html).

www.frontiersin.org April 2012 | Volume 3 | Article 132 | 7 Techtmann et al. Anaerobic carbon monoxide dehydrogenase HGT

FIGURE 5 | Gene topology for the hydrogenase – CODH gene clusters space between hypA and the cooF in C. hydrogenoformans. Additionally the T. from C. hydrogenoformans,T. carboxydivorans, and R. rubrum. carboxydivorans cooA gene is located 254 kb away from the rest of the gene Homologous genes are colored similarly between operons. The C. cluster. While the R. rubrum gene cluster has many of the same genes, there hydrogenoformans and T. carboxydivorans gene clusters are identical except are notable differences in the genes encoding the hydrogenase accessory for displacement of the associated cooC gene and presence of intergenic proteins (i.e., presence of cooT and cooJ in R. rubrum instead of hypA). similar to the overall G + C% of the C. hydrogenoformans genome respectively. These results confirmed that the threshold values are (Wu et al., 2005). appropriate for specifying similarity. The K–L divergence of the T. Because G + C content alone is considered to be an inadequate carboxydivorans [Ni,Fe]-CODH–ECH gene cluster vs. the T. car- criterion to detect HGT (Garcia-Vallve et al., 2000) we sought boxydivorans genome is thus greater than the K–L distance of the other criteria to distinguish the [Ni,Fe]-CODH–ECH gene clus- T. carboxydivorans [Ni,Fe]-CODH–ECH gene cluster vs. the E. coli ter in T. carboxydivorans from the genome at large. In a recent K12 genome. Based on these metrics it appears that the [Ni,Fe]- paper, additional methods for determining HGT were evaluated CODH–ECH gene cluster from T. carboxydivorans is significantly (Becq et al., 2010). Tetranucleotide frequency and codon usage divergent from the T. carboxydivorans genome at large. similarity quantified with K–L divergence were proposed as the The same analysis was performed to examine whether the C. most accurate metrics for predicting HGT. We compared the hydrogenoformans [Ni,Fe]-CODH–ECH gene cluster is similar to [Ni,Fe]-CODH–ECH gene clusters of T. carboxydivorans and C. rest of the C. hydrogenoformans genome. The K–L divergences hydrogenoformans with the full genome sequences of T. carboxydi- comparing tetranucleotide frequency and codon usage for the C. vorans, C. hydrogenoformans, with K–L analysis using the genome hydrogenoformans [Ni,Fe]-CODH–ECH gene cluster against the of E. coli K12 as a control to determine if these two CODH-I gene C. hydrogenoformans genome are 0.023 and 0.075 respectively. clusters are divergent from the genomes at large (Table 1). Again E. coli K12 was used as a control and the K–L divergences The T. carboxydivorans [Ni,Fe]-CODH–ECH gene cluster was for tetranucleotide frequency and codon usage are 0.155 and 0.236 compared to the T. carboxydivorans genome using tetranucleotide respectively. The K–L divergences comparing the C. hydrogeno- frequency, resulting in a K–L divergence of 0.217. When codon formans [Ni,Fe]-CODH–ECH gene cluster with the rest of the usage was used as the metric, the K–L divergence was 0.133. The genome are well below the cutoff values and therefore suggest that threshold values of 0.05 and 0.1 for K–L divergence of codon usage the C. hydrogenoformans CODH-I gene cluster is equilibrated with and tetranucleotide frequency respectively were used to determine the C. hydrogenoformans genome. similarity. By both metrics the T. carboxydivorans [Ni,Fe]-CODH– The K–L divergences for the comparison of tetranucleotide fre- ECH gene cluster is significantly different from the T. carboxy- quency and codon usage of theT.carboxydivorans [Ni,Fe]-CODH– divorans genome. As a control, the tetranucleotide frequency ECH gene cluster against the C. hydrogenoformans genome are and codon usage of the genes encoding the T. carboxydivorans 0.048 and 0.088. These K–L divergences are below the cutoff [Ni,Fe]-CODH–ECH gene cluster were compared to those of the values, indicating that the T. carboxydivorans [Ni,Fe]-CODH– E. coli K12 genome. K–L divergences for this comparison are ECH gene cluster is remarkably similar in composition to the C. 0.116 and 0.212 for tetranucleotide frequency and codon usage hydrogenoformans genome.

Frontiers in Microbiology | Evolutionary and Genomic Microbiology April 2012 | Volume 3 | Article 132 | 8 Techtmann et al. Anaerobic carbon monoxide dehydrogenase HGT

Table1|Kullback–Leibler divergence comparing codon usage (bold) and tetranucleotide frequency (Italics) for various combinations of the C. hydrogenoformans,T. carboxydivorans, and E. coli K12 genomes and the C. hydrogenoformans andT. carboxydivorans CODH-ECH gene clusters.

C. hydrogenoformans T.carboxydivorans C. hydrogenoformans T. carboxydivorans E. coli K12 genome genome CODH–ECH cluster CODH–ECH cluster genome

C. hydrogenoformans genome 0 0.302 0.023 0.048 0.155 T. carboxydivorans genome 0.143 0 0.202 0.217 0.098 C. hydrogenoformans CODH–ECH cluster 0.075 0.154 0 0.016 0.115 T. carboxydivorans CODH–ECH cluster 0.088 0.133 0.036 0 0.116 E. coli K12 genome 0.170 0.153 0.236 0.212 0

DISCUSSION member, resulting in [Ni,Fe]-CODHs with similar functions to The burgeoning release of microbial genomes has revealed that have convergent sequence motifs,even if originating from different the ability to utilize CO is a surprisingly common trait shared by evolutionary starting points. a diverse collection of bacteria and archaea. Phylogenetic analysis It is difficult at this stage to assess with accuracy whether the of [Ni,Fe]-CODHs revealed extensive diversity of [Ni,Fe]-CODH [Ni,Fe]-CODH phylogeny is due to divergence along functional sequences in 6% of the released microbial genomes. Most of these lines as described above, due to the fact that only [Ni,Fe]-CODHs species have primary physiologies not related to CO metabolism from clades A and F have been definitively characterized bio- and have not been scored for CO utilization. Among [Ni,Fe]- chemically. In an effort to clearly elucidate the branching pattern CODH encoding species, more than 40% encode more than one seen in the global [Ni,Fe]-CODH tree shown in Figure 1,we CODHs. This previously unexplored diversity of sequences has the have attempted to analyze the genomic context of each of the potential to provide insights into both the function and evolution [Ni,Fe]-CODHs in the tree. The function of a [Ni,Fe]-CODH is of this important protein family. For the most part the multiple often determined by the accessory proteins with which it asso- [Ni,Fe]-CODHs encoded by the same organism are not close rel- ciates. In many genomes, these accessory proteins are encoded in atives. This suggests that these multiple [Ni,Fe]-CODHs in one the same operon or adjacent to the [Ni,Fe]-CODH. The analy- genome are not the result of recent gene duplications. sis of the genomic context and in turn the implied function of Our phylogenetic analysis confirms prior work which has the [Ni,Fe]-CODH may help to distinguish vertical descent along shown that there is a clear distinction between the bacterial-type function-determined lineages from HGT. CODH/ACS and the archaeal-type CODH/ACDS. The clade com- The functional tree (Figure 2) shows that on the whole there posed of the archaeal-type CODH/ACDS is populated by archaea is co-occurrence of clades and genomic context. However there with one noteable exception. There is a sequence for an archaeal- are exceptions where the genomic context is not congruent with like CODH/ACDS found in the genome of the deep subsurface clades. The best example of genomic context being congruent with bacterium Candidatus D. audaxviator, which was the sole detected phylogenetic lineage is in the case of the archaeal CODH/ACDS inhabitant of subterranean effluent waters in a South African gold (Figure 2, Clade A). Aside from a few isolated cases, namely A. mine 2.8 km below the surface (Chivian et al., 2008). So far the fulgidus and M. acetivorans, the remaining members are found CODH/ACDS from D. audaxviator is the only sequence of a within the context of the CODH/ACDS operon. It is most pos- CODH/ACDS found in a bacterial genome, supporting the notion sible that in the cases of the exceptions, genomic context does that D. audaxviator acquired this CODH/ACDS via a HGT event not reflect functional cooperation. For the most part, however, from an archaeon. Clade A is predominated by species from the domain Archaea and Aside from D. audaxviator’s CODH/ACDS it is difficult to dis- with [Ni,Fe]-CODHs with a conserved genomic context (ACDS tinguish putative HGT events of [Ni,Fe]-CODHs from vertical clusters). descent solely from phylogenetic analysis. There are many exam- The bacterial side of the tree is not as straightforward. There are ples of [Ni,Fe]-CODH clades that are composed of organisms that many cases in which genomic context does appear to be correlated are not related to one another phylogenetically. For example, in with appearance in a common clade. Also, there are a few exam- clades C, D, and E, archaeal [Ni,Fe]-CODHs form deep branches ples where similar functions are found in two different clades. The of a clade composed primarily of [Ni,Fe]-CODHs from bacterial well-characterized physiology of hydrogenogenic carboxydotro- species, tempting us to infer that these sequences may be the result phy is encoded by a [Ni,Fe]-CODH linked to an ECH. In this of ancient inter-domain HGTs.However,the disparity between the phylogenetic tree there are two places in which this gene clus- 16S rRNA phylogeny and the [Ni,Fe]-CODH phylogeny recon- ter is found (Clade E and Clade F). While it may be difficult to structed here could result from functional distinctions between explain this phenomenon in terms of vertical descent, it is clear CODH lineages. We assume that the function of [Ni,Fe]-CODHs that unlike the CODHs in Clade A the [Ni,Fe]-CODH–ECH gene is determined by their associated proteins encoded within gene cluster is not the result of simple vertical descent from an ancestral clusters, since this has been observed in many cases. Therefore, [Ni,Fe]-CODH–ECH. the CODH proteins from one clade may have evolved to associate Another example of correlation of the genomic context with with accessory proteins more often clustered with another clade the tree topology is provided by the absence of CooS/ACS gene

www.frontiersin.org April 2012 | Volume 3 | Article 132 | 9 Techtmann et al. Anaerobic carbon monoxide dehydrogenase HGT

clusters beyond clades E and F. This could indicate that CooS/ACS genes (cooA-1)inC. hydrogenoformans is directly upstream of its sequences have evolved by vertical inheritance and the presence [Ni,Fe]-CODH–ECH gene cluster. The lone cooA homolog in the of phylogenetically diverse hosts within these clades is not due to genome of T. carboxydivorans is a homolog of C hydrogenoformans HGT but is a result of conservation of function. However, like the cooA-1 and is separated by 252 kb from the hydrogenase – CODH CODH–ECH gene cluster, the CODH/ACS gene cluster is present gene cluster. Interestingly, the T. carboxydivorans cooA is flanked in three clades. Again this points to a more complex evolutionary by transposes and phage-related genes, which are known to be history, which may indicate that this gene cluster has arisen more mediators of genomic rearrangement and HGT (Ochman et al., than once during the course of evolution or potentially a more 2000). complex mechanism involving HGT and genome rearrangements. The T. carboxydivorans genome also contains the smallest gene By overlaying known functional categories onto the phyloge- predicted to contain all of the conserved CooS features. This netic tree we were able to clarify some of the phylogenetic history of putative CODH is 428 amino acid residues long compared with [Ni,Fe]-CODHs. However much work needs to be done to further typical [Ni,Fe]-CODHs that range from approximately 600 to 800 clarify the function of the many of the remaining [Ni,Fe]-CODHs. residues long. This “mini-CooS” has been placed in a separate Since it appears that in some cases the clades fall along func- multiple alignment that gave rise to the ML analysis shown in tional boundaries, the question of the role of HGT as a means of Figure 3. Taking into account (i) the acquisition of the hydroge- acquisition of [Ni,Fe]-CODHs could be more clearly addressed by nase – [Ni,Fe]-CODH gene cluster via HGT, which we substan- examining unrelated species that possesses [Ni,Fe]-CODHs that tiate in this paper, (ii) the separate location of the cooA gene, are highly similar in sequence and function. One such example and (iii) the fact that the G + C content of the gene encod- is found in the [Ni,Fe]-CODH–ECH of Clade F. C. hydrogeno- ing the unusual small cooS is about 10 mol% higher than that formans and T. carboxydivorans are two thermophiles from the of the genome. Therefore, we speculate that the assemblage of Phylum Firmicutes but are from two divergent classes, Clostridia the carboxydotrophic hydrogenogenic phenotype in T. carboxy- and Negativicutes, respectively. While these organisms share a divorans is an evolutionarily recent event, suggesting that rapid common physiology they represent widely separated phylogenetic genome evolution is under way to accommodate selection for CO lineages. They both possess [Ni,Fe]-CODH–ECH homologous utilization. gene clusters with high similarity to each other (84% nucleotide Due to the large number of species that have been shown to identity). For this reason we set out to investigate whether the encode multiple CODHs, it seems probable that multiple [Ni,Fe]- [Ni,Fe]-CODH–ECH gene cluster could have been acquired by CODH operons increase the fitness of an organism in CO-rich T. carboxydivorans via an HGT event. Analysis of G + C content, milieus by providing divergent metabolic pathways through which tetranucleotide frequency, and codon usage support the hypoth- CO can be metabolized. The consumption of CO by indepen- esis that the CODH – ECH gene cluster was assimilated by T. dent pathways providing energy conservation, carbon fixation, carboxydivorans via HGT. and nicotinamide coenzyme reduction are now well-established. Based on the high similarity between the hydrogenase – CODH On a broader scale, in microbial consortia, CO utilization has gene clusters of T. carboxydivorans and C. hydrogenoformans and the additional communal benefit of dissipating a potentially toxic on the fitness of the T. carboxydivorans [Ni,Fe]-CODH–ECH to compound (Parshina et al., 2005; Techtmann et al., 2009). Our the C. hydrogenoformans genome in terms of tetranucleotide fre- recent work revealed that adaptive regulation of expression of quency and codon usage patterns, it is tempting to propose that T. multiple [Ni,Fe]-CODH operons by dual cooA genes may par- carboxydivorans obtained this operon by HGT from C. hydrogeno- tition CO between multiple competing pathways in response to formans or a closely related species. However, closer examination varying needs for energy conservation and carbon acquisition of the data indicate that there may have been a recent transfer from across a wide range of CO concentrations (Techtmann et al., arelativeofC. hydrogenoformans or else a more ancient transfer 2011). from an ancestor of C. hydrogenoformans into the T. carboxydi- In conclusion, we have developed a comprehensive phylogeny vorans lineage. First, the nucleotide identity of the gene clusters, for [Ni,Fe]-CODHs which revealed the presence of six distinct although high (84%), is lower than the ANI value of 95–96% clades. Comparison of the genomic contexts with CooS phylogeny shown to delimit microbial species (Goris et al., 2007; Richter suggests that several clades diverged based on function. These and Rossello-Mora, 2009; Tindall et al., 2010). Second, whereas clades appear to be evolving via vertical descent. However there are in C. hydrogenoformans the cluster seems to comprise two oper- a few CO-related metabolic functions that are spread throughout ons, T. carboxydivorans appears likely to encode a single operon various clades on the tree either indicating different evolutionary (Figure 5). Third, the order of genes in the gene clusters from T. events or HGT. It seems likely that both HGT and vertical trans- carboxydivorans and C. hydrogenoformans has been switched by the mission have driven the remarkable divergence of [Ni,Fe]-CODHs relocation of cooC (Figure 5). The rearrangements suggest further currently seen in both archaea and bacteria evolution in T. carboxydivorans or C. hydrogenoformans following a horizontal transfer. The location of the regulator gene cooA apart ACKNOWLEDGMENTS from the [Ni,Fe]-CODH–ECH gene cluster in T. carboxydivorans This work was funded by the US National Science Foundation also suggests that there is ongoing genome reordering of the through award numbers EAR 0747394 (Frank T. Robb), EAR CO-related genes in this isolate, or may be interpreted as the acqui- 0747412 (Albert S. Colman), and MCB 0605301 (Frank T. Robb sition of cooA that occurred independently from the acquisition and Albert S. Colman). The work of Alexander V. Lebedinsky of the [Ni,Fe]-CODH–ECH gene cluster. One of the two cooA and Tatyana G. Sokolova has been supported by the Russian

Frontiers in Microbiology | Evolutionary and Genomic Microbiology April 2012 | Volume 3 | Article 132 | 10 Techtmann et al. Anaerobic carbon monoxide dehydrogenase HGT

Foundation for Basic Research (project number 11-04-01723-a), SUPPLEMENTAL MATERIAL federal targeted program for 2009–2013 “Scientific and Pedagog- The Supplementary Material for this article can be found online at ical Personnel of Innovative Russia” (GK number P646), and the http://www.frontiersin.org/Evolutionary_and_Genomic_Micro program of the Presidium of the Russian Academy of Sciences biology/10.3389/fmicb.2012.00132/abstract “Molecular and Cell Biology.” The sequencing work conducted by the U.S. Department of Energy Joint Genome Institute is sup- Table S1 | [Ni,Fe]-CODH sequences found in the IMG database as of October 1, 2011. ported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. We are indebted to Table S2 | Genomes encoding multiple [Ni,Fe]-CODH homologs. our reviewers for helpful comments on phylogenetic methods. This is Contribution 11-230 from the Institute of Marine and Table S3 | Composition of the six [Ni,Fe]-CODH clades of both the ML and Environmental Technology. BI phylogenetic trees.

REFERENCES accuracy and high throughput. similarities. Int. J. Syst. Evol. Micro- resolution. J. Struct. Biol. 128, Altschul, S. F., Gish, W., Miller, W., Nucleic Acids Res. 32, 1792–1797. biol. 57, 81–91. 165–174. Myers, E. W., and Lipman, D. J. Ewing, B., and Green, P. (1998). Grahame, D. A. (1991). Catalysis of Kullback, S., and Leibler, R. A. (1951). (1990). Basic local alignment search Base-calling of automated sequencer acetyl-CoA cleavage and tetrahy- On information and sufficiency. tool. J. Mol. Biol. 215, 403–410. traces using phred. II. Error proba- drosarcinapterin methylation by a Ann. Math. Stat. 22, 79–86. Becq, J., Churlaud, C., and Descha- bilities. Genome Res. 8, 186–194. carbon monoxide dehydrogenase- Lawrence, J. G., and Ochman, H. vanne, P. (2010). A benchmark Ewing, B., Hillier, L., Wendl, M. corrinoid enzyme complex. J. Biol. (1998). Molecular archaeology of parametric methods for hor- C., and Green, P. (1998). Base- Chem. 266, 22227–22233. of the Escherichia coli genome. izontal transfers detection. PLoS calling of automated sequencer Han, C. S., and Chain, P. (2006). “Fin- Proc. Natl. Acad. Sci. U.S.A. 95, ONE 5, e9989. doi:10.1371/jour- traces using phred. I. Accuracy ishing repeat regions automatically 9413–9417. nal.pone.0009989 assessment. Genome Res. 8, 175–185. with Dupfinisher,” in Proceedings of Lim, J. K., Kang, S. G., Lebedin- Chivian, D., Brodie, E. L., Alm, E. J., Ferry, J. G. (1995). CO dehydrogenase. the 2006 International Conference on sky, A. V., Lee, J. H., and Lee, Culley, D. E., Dehal, P. S., Desan- Annu.Rev.Microbiol.49, 305–333. Bioinformatics and Computational H. S. (2010). Identification of a tis, T. Z., Gihring, T. M., Lapidus, Gao, F., and Zhang, C. T. (2006). GC- Biology, eds. H. R. Arabnia and H. novel class of membrane-bound A., Lin, L. H., Lowry, S. R., Moser, Profile: a web-based tool for visual- Valafar (Las Vegas: CSREA Press), [NiFe]-hydrogenases in Thermococ- D. P., Richardson, P. M., Southam, izing and analyzing the variation of 141–146. cus onnurineus NA1 by in silico G., Wanger, G., Pratt, L. M., Ander- GC content in genomic sequences. Hugendieck, I., and Meyer, O. (1992). analysis. Appl. Environ. Microbiol. sen, G. L., Hazen, T. C., Brockman, Nucleic Acids Res. 34, W686–W691. The structural genes encoding CO 76, 6286–6289. F. J., Arkin, A. P., and Onstott, T. Garcia-Vallve, S., Romeu, A., and Palau, dehydrogenase subunits (cox L, M Lindahl,P.A. (2002). The Ni-containing C. (2008). Environmental genomics J. (2000). Horizontal gene trans- and S) in Pseudomonas carboxy- carbon monoxide dehydrogenase reveals a single-species ecosystem fer in bacterial and archaeal com- dovorans OM5 reside on plas- family: light at the end of the tunnel? deep within Earth. Science 322, plete genomes. Genome Res. 10, mid pHCG3 and are, with the Biochemistry 41, 2097–2105. 275–278. 1719–1725. exception of Streptomyces thermoau- Meyer, O., Frunzke, K., and Mörsdorf, Dobbek, H., Gremer, L., Meyer, O., Gencic, S., Duin, E. C., and Grahame, D. totrophicus, conserved in carboxy- G. (eds). (1993). Biochemistry of the and Huber, R. (1999). Crystal A. (2010). Tight coupling of partial dotrophic bacteria. Arch. Microbiol. Aerobic Utilization of Carbon Monox- structure and mechanism of CO reactions in the acetyl-CoA decar- 157, 301–304. ide. Andover, MA: Intercept, Ltd. dehydrogenase, a molybdo iron- bonylase/synthase (ACDS) multien- Kerby, R. L., Hong, S. S., Ensign, Meyer, O., and Rajagopalan, K. V. sulfur flavoprotein containing S- zyme complex from Methanosarcina S. A., Coppoc, L. J., Ludden, P. (1984). Molybdopterin in carbon selanylcysteine. Proc. Natl. Acad. Sci. thermophila: acetyl C-C bond frag- W., and Roberts, G. P. (1992). monoxide oxidase from carboxy- U.S.A. 96, 8884–8889. mentation at the a cluster promoted Genetic and physiological charac- dotrophic bacteria. J. Bacteriol. 157, Dobbek, H., Svetlitchnyi, V., Gre- by protein conformational changes. terization of the Rhodospirillum 643–648. mer, L., Huber, R., and Meyer, O. J. Biol. Chem. 285, 15450–15463. rubrum carbon monoxide dehydro- Meyer, O., and Schlegel, H. G. (2001). Crystal structure of a carbon Gnida, M., Ferner, R., Gremer, L., Meyer, genase system. J. Bacteriol. 174, (1983). Biology of aerobic carbon monoxide dehydrogenase reveals a O., and Meyer-Klaucke, W. (2003). 5284–5294. monoxide-oxidizing bacteria. Annu. [Ni-4Fe-5S] cluster. Science 293, A novel binuclear [CuSMo] cluster King, G. M. (2003). Uptake of carbon Rev. Microbiol. 37, 277–310. 1281–1285. at the active site of carbon monox- monoxide and hydrogen at environ- Miller,M.A.,Holder,M. T.,Vos,R.,Mid- Dobbek, H., Svetlitchnyi, V., Liss, J., and ide dehydrogenase: characterization mentally relevant concentrations by ford, P. E., Liebowitz, T., Chan, L., Meyer, O. (2004). Carbon monoxide by X-ray absorption spectroscopy. mycobacteria. Appl. Environ. Micro- Hoover, P., and Warnow, T. (2010). induced decomposition of the active Biochemistry 42, 222–230. biol. 69, 7266–7272. The CIPRES Portals. CIPRES. site [Ni-4Fe-5S] cluster of CO dehy- Gonzalez, J. M., and Robb, F. T. King, G. M. (2006). Nitrate-dependent Available at: http://www.phylo.org/ drogenase. J. Am. Chem. Soc. 126, (2000). Genetic analysis of Carboxy- anaerobic carbon monoxide oxida- sub_sections/portal 5382–5387. dothermus hydrogenoformans car- tion by aerobic CO-oxidizing bacte- Milne, I., Lindner, D., Bayer, M., Hus- Drummond, A. J., and Rambaut, bon monoxide dehydrogenase genes ria. FEMS Microbiol. Ecol. 56, 1–7. meier, D., Mcguire, G., Marshall, D. A. (2007). BEAST: Bayesian evo- cooF and cooS. FEMS Microbiol. King, G. M., and Weber, C. F. F., and Wright, F. (2009). TOPALi lutionary analysis by sampling Lett. 191, 243–247. (2007). Distribution, diversity and v2: a rich graphical interface for trees. BMC Evol. Biol. 7, 214. Gordon, D., Abajian, C., and Green, P. ecology of aerobic CO-oxidizing evolutionary analyses of multiple doi:10.1186/1471-2148-7-214 (1998). Consed: a graphical tool for bacteria. Nat. Rev. Microbiol. 5, alignments on HPC clusters and Edgar, R. C. (2004a). MUSCLE: a mul- sequence finishing. Genome Res. 8, 107–118. multi-core desktops. Bioinformatics tiple sequence alignment method 195–202. Kocsis, E., Kessel, M., Demoll, E., and 25, 126–127. with reduced time and space com- Goris, J., Konstantinidis, K. T., Klappen- Grahame, D. A. (1999). Structure Ochman, H., Lawrence, J. G., and plexity. BMC Bioinformatics 5, 113. bach,J. A.,Coenye,T.,Vandamme,P., of the Ni/Fe-S protein subcom- Groisman, E. A. (2000). Lateral doi:10.1186/1471-2105-5-113 and Tiedje, J. M. (2007). DNA-DNA ponent of the acetyl-CoA decar- gene transfer and the nature of Edgar, R. C. (2004b). MUSCLE: mul- hybridization values and their rela- bonylase/synthase complex from bacterial innovation. Nature 405, tiple sequence alignment with high tionship to whole-genome sequence Methanosarcina thermophila at 26-A 299–304.

www.frontiersin.org April 2012 | Volume 3 | Article 132 | 11 Techtmann et al. Anaerobic carbon monoxide dehydrogenase HGT

Parshina, S. N., Kijlstra, S., Hen- Soboh, B., Linder, D., and Hedderich, from the anaerobic carbon- Volbeda,A., and Fontecilla-Camps, J. C. stra, A. M., Sipma, J., Plugge, C. R. (2002). Purification and catalytic monoxide-utilizing Eubacterium (2005). Structural bases for the cat- M., and Stams, A. J. (2005). Car- properties of a CO-oxidizing:H2- Carboxydothermus hydrogeno- alytic mechanism of Ni-containing bon monoxide conversion by ther- evolving enzyme complex from formans. J. Bacteriol. 183, carbon monoxide dehydrogenases. mophilic sulfate-reducing bacteria Carboxydothermus hydrogeno- 5134–5144. Dalton Trans. 21, 3443–3450. in pure culture and in co-culture formans. Eur. J. Biochem. 269, Techtmann, S. M., Colman, A. S., Wu, M., Ren, Q., Durkin, A. S., Daugh- with Carboxydothermus hydrogeno- 5712–5721. Murphy, M. B., Schackwitz, W. erty, S. C., Brinkac, L. M., Dodson, formans. Appl. Microbiol. Biotechnol. Sokolova,T.G.,Gonzalez,J. M.,Kostrik- S., Goodwin, L. A., and Robb, F. R. J., Madupu, R., Sullivan, S. A., 68, 390–396. ina, N. A., Chernyh, N. A., Slepova, T. (2011). Regulation of multiple Kolonay, J. F., Haft, D. H., Nelson, Ragsdale, S. W. (2004). Life with carbon T. V., Bonch-Osmolovskaya, E. A., carbon monoxide consumption W. C., Tallon, L. J., Jones, K. M., monoxide. Crit. Rev. Biochem. Mol. and Robb, F. T. (2004). Thermosinus pathways in anaerobic bac- Ulrich, L. E., Gonzalez, J. M., Zhulin, Biol. 39, 165–195. carboxydivorans gen. nov., sp. nov., teria. Front. Microbiol. 2:147. I. B., Robb, F. T., and Eisen, J. A. Ragsdale, S. W. (2008). Enzymology of a new anaerobic, thermophilic, doi:10.3389/fmicb.2011.00147 (2005). Life in hot carbon monox- the wood-Ljungdahl pathway of ace- carbon-monoxide-oxidizing, Techtmann, S. M., Colman, A. S., and ide: the complete genome sequence togenesis. Ann. N. Y. Acad. Sci. 1125, hydrogenogenic bacterium from a Robb, F. T. (2009). ‘That which does of Carboxydothermus hydrogenofor- 129–136. hot pool of Yellowstone National not kill us only makes us stronger’: mans Z-2901. PLoS Genet. 1, e65. Rambaut, A., and Aj, D. (2007). Park. Int. J. Syst. Evol. Microbiol. 54, the role of carbon monoxide in ther- doi:10.1371/journal.pgen.0010065 Tracer v1.4. Available at: 2353–2359. mophilic microbial consortia. Envi- http://beast.bio.ed.ac.uk/Tracer Sokolova, T. G., Henstra, A. M., Sipma, ron. Microbiol. 11, 1027–1037. Conflict of Interest Statement: The Richter, M., and Rossello-Mora, R. J., Parshina, S. N., Stams, A. J., Teeling, H., Waldmann, J., Lombar- authors declare that the research was (2009). Shifting the genomic gold and Lebedinsky, A. V. (2009). Diver- dot, T., Bauer, M., and Glockner, conducted in the absence of any com- standard for the prokaryotic species sity and ecophysiological features F. O. (2004). TETRA: a web-service mercial or financial relationships that definition. Proc. Natl. Acad. Sci. of thermophilic carboxydotrophic and a stand-alone program for the could be construed as a potential con- U.S.A. 106, 19126–19131. anaerobes. FEMS Microbiol. Ecol. 68, analysis and comparison of tetranu- flict of interest. Schubel, U., Kraut, M., Morsdorf, 131–141. cleotide usage patterns in DNA G., and Meyer, O. (1995). Mol- Stamatakis, A. (2006). RAxML-VI- sequences. BMC Bioinformatics 5, Received: 07 December 2011; accepted: 20 ecular characterization of the HPC: maximum likelihood-based 163. doi:10.1186/1471-2105-5-163 March 2012; published online: 17 April gene cluster coxMSL encoding the phylogenetic analyses with thou- Terlesky, K. C., Nelson, M. J., and Ferry, 2012. molybdenum-containing carbon sands of taxa and mixed models. J. G. (1986). Isolation of an enzyme Citation: Techtmann SM, Lebedinsky monoxide dehydrogenase of Olig- Bioinformatics 22, 2688–2690. complex with carbon monoxide AV, Colman AS, Sokolova TG, Woyke otropha carboxidovorans. J. Bacteriol. Stamatakis, A., Hoover, P., and Rouge- dehydrogenase activity containing T, Goodwin L and Robb FT (2012) 177, 2197–2203. mont, J. (2008). A rapid boot- corrinoid and nickel from acetate- Evidence for horizontal gene transfer Seravalli, J., Kumar, M., Lu, W. P., and strap algorithm for the RAxML Web grown Methanosarcina thermophila. of anaerobic carbon monoxide dehy- Ragsdale, S. W. (1995). Mechanism servers. Syst. Biol. 57, 758–771. J. Bacteriol. 168, 1053–1058. drogenases. Front. Microbio. 3:132. doi: of CO oxidation by carbon monox- Svetlitchnyi, V., Dobbek, H., Meyer- Tindall, B. J., Rossello-Mora, R., 10.3389/fmicb.2012.00132 ide dehydrogenase from Clostridium Klaucke, W., Meins, T., Thiele, B., Busse, H. J., Ludwig, W., and This article was submitted to Frontiers in thermoaceticum and its inhibition by Romer, P., Huber, R., and Meyer, Kampfer, P. (2010). Notes on Evolutionary and Genomic Microbiology, anions. Biochemistry 34, 7879–7888. O. (2004). A functional Ni-Ni- the characterization of prokary- a specialty of Frontiers in Microbiology. Seravalli, J., Kumar, M., Lu, W. [4Fe-4S] cluster in the monomeric ote strains for taxonomic purposes. Copyright © 2012 Techtmann, Lebedin- P., and Ragsdale, S. W. (1997). acetyl-CoA synthase from Car- Int. J. Syst. Evol. Microbiol. 60, sky, Colman, Sokolova, Woyke, Good- Mechanism of carbon monoxide boxydothermus hydrogenoformans. 249–266. win and Robb. This is an open-access oxidation by the carbon monoxide Proc. Natl. Acad. Sci. U.S.A. 101, Uffen, R. L. (1976). Anaerobic growth article distributed under the terms of dehydrogenase/acetyl-CoA synthase 446–451. of a Rhodopseudomonas species in the Creative Commons Attribution Non from Clostridium thermoaceticum: Svetlitchnyi, V., Peschel, C., Acker, the dark with carbon monoxide as Commercial License, which permits non- kinetic characterization of the G., and Meyer, O. (2001). Two sole carbon and energy substrate. commercial use, distribution, and repro- intermediates. Biochemistry 36, membrane-associated NiFeS- Proc. Natl. Acad. Sci. U.S.A. 73, duction in other forums, provided the 11241–11251. carbon monoxide dehydrogenases 3298–3302. original authors and source are credited.

Frontiers in Microbiology | Evolutionary and Genomic Microbiology April 2012 | Volume 3 | Article 132 | 12 Techtmann et al. Anaerobic carbon monoxide dehydrogenase HGT

APPENDIX

Clade C

ClostridiClostridium.botulinum.BoNT/A1.ATCC.19397.CooS-

CC

ll

ClostriClostridium.cellulovorans.743B.ATCC.35296.CooS-1 oo

ss

tt 2 rr -

ii

dd Desulfurobacterium.thermolithotrophum.BSA.DSM.11699.CooS-2Desulf 1 S

u ii

uu dium.cellul ClosClost m.botuli 1 S- Clostridium.beijerinckii.NCIMB.8052.CooS-1Cl mm

ClostridiuClo 1

.. - ostridi 1 Blautia.hydrogenotrophica.DSM.10507.CooS-Blautia.hydrog s bb 1 uroba BlautBlautia.hansenii.VPI.C7-24.DSM.20583.CooS- - 1 - oS-

C S oo Coo - Lachnospiraceae.bacterium.sp..8_1_57FALa t 2 1 ridium S 1

str S o

tt 43.Coo l S

o uu

o

o Clostridium.hiranonis.TO-931.DSM.13275Clostrid o Co S- chno Clostridium.botuliClostridium.botulinum.BoNT/A1.Hall . 28. s ll Me o idiu num ia.hydrog Clostridium.asparagiforme.DSM.1598Clostri ii o 52 c i nn C a.h um t a 5 . TThermosediminibacter.oceani.JW/IW-1228P.DSM.16646.CooS-3 r teri e C d oo 1 .hydrog uu C ovorans.743B i . . - he tthanocaldo d d n .Coo

. mm C hano 13 CooS- stri sporogenes m. F 6 ..CooS- 1 ans i n . S 1 spiraceae .be M.1 - u - r Ruminococcus .Bo Okr lu be 6 4 1 1 mo .. o m a

o l Ferroglobus.placidus.AEDII12DO.DSM.10642.CooS-2Fe m BB S S- 1 2 i dium.d botulinum.type.A-b t 9 o o um.hi Archaeoglobus.fulgidus.VC-16.DSM.4304.CooS-1Archae e o oo 1 ijer 0 D 8 pe.X 2 thermolitho enii.VPI.C7- o o 1 sedimi MeMethanocaldococcusc . NT/A . ..824 rrog . g Eubacterium.hallii.DSM.3353.CooS-2Eubact b

TThermovibrio.ammonificans.HB-1.DSM.15698 NN aldo y 2 t 5 enot DSMD 7 1 ooS- 7 e EubactEubacterium.hallii.DSM.3353.CooS-Clos ulinu C ClostrClostridiales.sp..1_7_47FA n MethanMethanocaldococcus.vulcanius.M7.CooS- o .ty hermo C S-S 1 1 CooS

ini TT K 1b/006. t MethanMethanopyrus.kandleri.AV19.CooS-1 . c C . a 3 C molitho hanoca t C.-.Ek o lobus T C o 55 S- 1.Co u aspara ckiickii. // NT/B1. 7 2 L o coccuc s rano A C s.P T 1 1 0 l . .CooS- ni l C e. oS- - o 32. 1 DesDe og occ ttridium rophic i 0 5 1213 .ba n A ridium ClostrClostridium.nexile.DSM.1787 33 / bacter.oceani. F m.tym p . s . .A s N 18 4 46 erium . an ho u . ATC lobu vibr . .. . . Coo AP1.AP1 a 1 . i BR .16795.C 2.C oS ClostridiuClostridium.ljungdahlii.PETC.DSM.13528.CooS- t 6 N demi s AATCC L Co p oca erium dialedia .ATC PETC.PETC ..CooS-i - r m u n ctercte m .ty mm.ATCC 4.NAP 6 2 ul m N 95 SM. ty i.i l nu i o a trop CI T m N DSM. SS-1 d 8. l s. opyop ag y 2 u li u 3 d . q4 sTis C f u 1079 Co S-

c oot c io .t . u ep 5 5. o 24 g C p c . . i .D sf B ri CooS-CooS-1 o jannaschiija a t n ri. SM . 3 oS- SynSy num.F.230613.CooS- ld u n h 3 t .bolteaeid oorques. i li . M.1 d i 5 S- n i num.Bo - 6 o l idivor um T o c b i l e e ah P 63 0 l NAP1 m a y mmaculu . um.D D 99. 0291 2g 2 o Co 1B u w 1 n M I 97b 9 C u N DSM. t u 4 DS ttrophobot . t t 15579. rophobom a . B

o .

b a o . c o Desulfitobacterium.hafniense.DCB-2.CooS-2D 6 yi. .ljungd.ljung.lju Hall.Coo .R20291 b

ulu o r b obutylic b

. JW 5 e . t esulfitAceAAcetivibrio.cellulolyticus.CD2.DSM.1870 . Thermincola.potens.JR.CooS-2 648851500 .sp..FS406-22.CooS-1 t 2 7

A e CClostridium.carboxidivorans.P7.DSM.15243.CooS-CloClostridiu ce . u - bo m.nig m ggd AT m De .D m 3 ce S lostr . o /IW TCC.27756 l . fficile icile C i u o C Clade D t u u n i iviiv i str t ahliah i ClostriClostridium.beijerinckii.NCIMB.8052.CooS-2ssulf obac uluulus SM. CooS- di

o d CC.BAA-61 o i u d d id ul brio.cb i i id i

r

o ri o r r iu m f s i. t CClostridium.cellulovorans.743B.ATCC.35296.CooS-3iu it f t t m. . PE i S S

obobacterium.hafniense.Y s glycglycolicus.FlGlyR.DSM.c s s lostridium.cellul m.m.acetobutylit 2661 S- m.diff .QCD-23m63.Coo eri . diu an ridium.nov othermalis.108 B

- - o c o o idium.a 1313.CooS-2 ell l ac l l arbo a 1 1 um 1 2 dr c 1 m.beijerinc ul o s. f 1 h C t C C e A Clostridium.botulinum.F.230613.CooS-2 str - er Clostridium.botulinum.BoNT/B1.Okra.CooS-1 l Clostridium.kluyveri.NBRC.1201 tridium. 1 1 Clostridium.kluyveri.DSM.55 fficile t olyticoly D . Clostridium.ljungdahlii.PETC.DSM.13528.CooS-1 hy Cdh-1 . 64801546 obu ha CooS- A Clostridium.carboxidivorans.P7.DSM.15243.CooS- xidiv i Clostridium.botulinum.type.C.-.Eklund um yt SM. Clost 2 ClosClostridium.sporogenes.A 3 Clostridium.botulinum.D.1873 .di Cdh 1 Clo . os f Clostridium.bartlettii.DSM.16795.CooS- 1 ty ni 3 Clostridium.difficile.CIP.107932.CooS-1 Cdh-2 8 s o .ha Clos . A 060642.Cd tr ra li ens 574 Clostridiu uptor. LQ8.DSM. idiumovo ckii ccu a 1 Clostridium.difficile.630.-.epidemic.type.X.CooS-1 r .AV19. 6A8.Cdh-1 n u f Clostridium.difficile.QCD-63q42.CooS- -1 CloClost m.m.ATCC.824.nien 1 Clostridium.difficile.QCD-97b34.NAP1b/006.CooS- si M.1M 04.Cdh- Clos s Clostridium.difficile.QCD-32g58.CooS-1 S rans. .P .NCIMB .Coo 1 Clostridium.difficile.QCD-76w55.NAP1.CooS- AV19 st s .sporogsporo AT Clostridium.difficile.QCD-66c26.CooS-1 ridiuridium.botridridium.botulinump 7 T Clostridium.difficile.CD196.CooS-2 cellum. 743 .DS CCC Clostridium.difficile.QCD-37x79.NAP1a/001.CooS-kandleri SM.43 Clostridium.botulinum.A2.Kyoto-F.CooS-2Clos ium S-1 Clostridium ellulo . 4304.Cdh ClostrClostridium.botulinum.BoNT/A1.ATCC.19397.CooS-2m g M.1 51 Clostridium.difficile.NAP07.CooS-c .carbinolicus.DSM.2380.CooS-1rmo . .bo . eenesn B. .8 827 Clostridium.difficile.NAP08.CooS-1 egula.boonei. ClostriClostridium.botulinum.BoNT/B1.Okra.CooS-tridium.ClostClo o b es ATC . Caldicellulosiruptor.lactoaceticus.6A.DSM.9545ter .kandleri. r T SM 0 idium. dium. tulinumot 524525 Coo 3 Caldicellulosiruptor.kristjanssonii.177R1B.DSM.1213 .P .D 737303.DSM.3721.Cdh-7303.DS2 Clostridium.botulinum.type.A.-.HalC s ulinum.A Coo 1 Caldi m.the 4 9 CClostridium. di lostridium.b triridium.bo TCCC 645664645611042 Caldicellulosiruptor.saccharolyticus.DSM.8903 m.Z-m.ZmZ- los botulinum.Boum.botul otulidiu C .35296. S-2 Thermosediminibacter.oceani.JW/IW-1228P.DSM.16646.CooS-1idiu M.624.5..521 stridium m .F.F. ..Ba4.657...155796488648854829 S-2 Desulfotomaculum.acetoxidans.5575.DSM.771.CooS-2thanopyrus M -1 1 ClostrClostridium.botulinum.NCTC.2916.CooS-2num.A2. .LangelaLLaLangeland.CooS-2Ba415579155579296 Pelobac DS DSD 2 bo angelngela4 579 Me us.Methano P.DP. .Cdh - Clade A CClostridium.saccharolyticum.WM1.DSM.2544.bobotulinum.idium. inum.BoNT/B1.Okrab ttulinum.F.230613u .657 CoC 4 Clostr 2AA.Cdh-1Cdh 4.Cdh-2 CarboxCarboxydothermus.hydrogenoformans.Z-2901.CooS-Vlost otulinum NT/A1. o lin a .CooCCooS-2 Methanopyrus SLP.DSMC2 A. h- ridium. botul tul .KyotoKyotou ndn .CooSCCooS- Methanospirillum.hungatei.JF-1.Cdh- s. 2 TThermhe inum m . Methanocella.sp..RC-I orans.C2an s.C2 DSM.80647.Cd -2 MetMethanohalobium.evestigatum.Z-7303.DSM.3721.CooS-1ydothermus.hrm .BoN inum.NCA .F. CoooS 0.9951 Candidat ran --33 h han oos saccharolyticum.WM1.DSM.TCC/B1.OkraB1.Okra.type.A-F.CooF 23 S Ferroglobus.placidus.AEDII12DO.DSM.10642.Cdhvorans.C2A.Cdh-usaro. h 4.Cd ohalob sinuinu T/A3A3. Coo0 S 2 Archaeoglobus.fulgidus.VC-16.DSM.4304.Cdh-2acetiv ri.F .DSM.3A.Cd .80 1 s.cc . CTC.2916..19397.C 61 Methanosaeta.thermophila.aceti o1 C2C2A.Cdh2 M ium.evestigaydrog arboxyrbo LoLoch. .Co.-.H.- SS- 3 Archaeoglobus.fulgidus.VC-16.DSM.4304.Cdh-1arcina. ei.G ns. DS m.evestiga oxyx ch. .H 2 Methanohalobium.evestigatum.Z-7303.DSM.3721.Cdh-oraan saro. 647.Cdh- MMethanosarcina.mazei.Go1.DSM.3647.CooS-2etha enof ydivordi MareeMaree. ooS-oS-S ala l Methanococcoides.burtonii.DSM.624sarcina tiivov u nosarciga ormanvor CooSS- 2 Methanohalophilus.mahii.SLP.DSM.5219ce tum. MeMethanococcus.voltae.Aanans.ns.s .CooS-2Co 2 Methanos ina.a rkeri.F 6 MethanMethanosarcina.barkeri.Fusaro.DSM.804.CooS-2MethanMethanosarcina.acetivorans.C2A.CooS-na.mazZ-7303tha s.Z- No oSS-2 Methano ei.Go1.DSM.35 noco 2901.r1.1 254- az ipaludis.CC osar osa ei.Go1.D.D ccu .CooCooS-4 2 Methanosarcina.barkeri.Fusaro.DSM.804.Cdh-17 --22 cina.bar rcina. SM.37 s C S 4 Methanosarcina.mazei.Go1.DSM.3647.Cdh-2thanosarc maripaludis.Cripaludis.Cs.C taa..H .Cdh acetivorans.C2ASM.3 .vo ooS-S- -2 Me ma ipaludip s.S2 us.Del 2133 MethMethanosaeta.thermophila.PT.Cdh-keri.Fusaro.DSM.80 21.CooSltae A V Methanosarcina.barkeri.Fusaro.DSM.804.Cdh-mari ccu SM. anosaetsaet rans.C2A.Cooans.C2A.Coo647.CooS-.AA3 Methanosarcina.mazei.Go1.DSM.3647.Cdh-1paludiB totrrophi burg.D a.thermophila.PT.Cdh-DSM.80 -1 Methanococcus.maripaludis.C6.mari ii.S oau Mar PersephonellPersephonella.marina.EX-HM.80 .Coo - Methanococcus.maripaludis.C5vannielann herm sis.Marburg.DSM.2133.s.M 8 NatNatranaerobius.thermophilus.JW/NM-WN-Lran DesulDesulfotomaculum.reducens.MI-ul 4.CooSS 2 s.vannielacctteer.thermoaur.t rburgen aeroberob fotomac a.marina.E -2 Methanococcus.maripaludis.C7moba er.ma 4S.D DSSM.208-1 ClClostridium.tetani.Massachusetts.E8ius.thermophulum.reduc -2 Methanococcus.maripaludis.S2hermobaobaobacter.marburgensactt V24S ostridium. X-H 1 Methanococcus.vannielii.SBthanot m s vidus. M7.Cdh DethioDethiobacter.alkaliphilus.AHT1.CooS-2tetani ilus.JWs.JWJW ens.MI-s.MIsMI1 Me mus.fermus .vulcanius. 86 1 AlAlkaliphilus.metalliredigens.QYMF.CooS-3kaliali bacter.alkal .Massa /NM-WN/NM-WN-L L 1 Methanothermococcus vens.AG8 php iilus.mm iphilu chuset F Methanothermus.fervidus.V24S.DSM.2088cocccococ us.fer schii.DSM.2661.Cdh-1 CClostridium.los etetalliredigens.QYMF.Coiredigeedig us.AHT1 CooSts.E8 Methanocaldococcus.vulcanius.M7.Cdh-1o 2Cdh stridium. gens.QYMFns.Q Co.CooS-8 Methanocaldococcus.fervens.AG8ococcus.jannasps F S406-22.Cdh-406-2 1 Clostridium.Clostridium.papyrosolvens.DSM.2782.CooS-2cellulolyellulolyulolyytiicumcumYMF.Co 2 dococcus.sp..FScoccuscus awensis.IH papyrosolven um..H10H10..CooS-2CooSoS-oS 3 Methanocaldococcus.jannaschii.DSM.2661.Cdh-1coccus .Cdh--2 AAzoAzototoobacobobactcterer. s.DSM 2782 C -2 Methanocaldococcus.sp..FS406-22.Cdh-rmmocmococcus.okinococcus.okinicus.Nankai-3 1 vinelavinelandiindindii.DJ.AT ooS Methanothermococcus.okinawensis.IH1ccuscuc s.a.aeol viator.MP104C.Cdh- Thermincola.potens.JR.CooS-4CC.BAA-1303 Methanococcus.aeolicus.Nankai-3.Cdh-2Desulforudis.audax CarboxCarboxydothermus.hydrogenoformans.Z-2901.CooS-IVydothermus Candidatus.Desulforudis.audaxviator.MP104C.Cdh-ccus.infeernurnus.Ms.ME Methanocaldococcus.dodoco ai-3.Cdh-1 Desulfitobacterium.hafnien Methanococcus.aeolicus.Nankai-3.Cdh-1ccuc s.aeolicus.Nank 1 se.Y51.CooS-3 Methanothermobacter.marbuermobaactter.marburgensirgensis.Ms.Marburarburg.Dg.DSSM.M.2133.Cdh-12133.Cdh- DesulfitoDesulfitobacterium.hafniense.DCB-2.CooS-4 Moorella.thermoacetica.ATCC.39073.CooS-1ermoaceticae .ATCC .390339073 .CooS CooS-1 Desulfurispirillum.indicum.S5 Geobact SM.574.CooS-2 er.metallireduc.mmetallireducens.GS-15ens.GSs.GS-15 DeDesulfosulfottoommaculum.nigrificans.D I Geobacter.bemidjiensis.Bem.DSM.16622.CooS-2bemimiidjiensis.Bem.Dm 1 0.9941 Geobacter.sp..FRC-32.CooS-2spp FRC..FRC-32.CooS-.FRC 32.Coo 32.Coo SM.16622.CooS--2 CarboxCarboxydothermus.hydrogenoformans.Z-2901.CooS-Iydothermu 3 Campylobacter.concisus.1382actac -2 Geobacillus.sp..Y4.1MC Campylobactc er.concisus.e 13823 Clostridium.papyrosolvens.DSM.2782.CooS-acctterer.curvus.currv 6 GeobaGeobacillus.thermoglucosidasius.C56-YS9.11170 Clostridium.cellulolypapyrosoppyrosus..525.9525.92 CC Clostridiu cell lvens.Dven ThermThermoanaerobacter.tengcongensis.MB4Tillum.rubrum.S1.AT 3 Clostr m.t ulolyytticum.H10.CooS-icum SM.SM 2782.C782.782 RhRhodospir idium.thermocelluthehhermrmooc .H10.Co ooS-1 .CooS-1 Clostridium.thermocellum.ATCC.27405therm celeellulum.m.LQ8.DSM.1313L oS-1 Rhodopseudomonas.palustris.BisB18 Clostridium.cellulovorans.743B.ATCC.35296.CooS-4hermocellum.hherm ocellu Q8.DSM. Thermincola.potens.JR.CooS-vorans.Nor1 648852350 e m.m.DSM.2360DSM. 1313.CooCooS-1 S-4 Veillonella ellulovoran ATCC.27402360 S-1 r sinus.carboxydi Coo Veillonella.sp..6_1_27 s TThermohe se.Y51. Veillonella.dispar.ATCC.17748.parvulav .743B 5 nien 1 Veillonella.sp..oral.taxon.158.F041ula.ATCC.17745.A .ATCC. ydother .haf 1 Veillonella .6_1_2_1_2TCC 35296. u cterium .CooS- Veillonella.sp..3_1_4ar .1774 CooS CarboxCarboxydothermus.hydrogenoformans.Z-2901.CooS-IDesuDesulfitobacterium.hafniense.DCB-2.CooS-3fitoba .Y51 3 Veillonella.atypica.ACS-134-V-Col7a.ATCC.1774.ATCC.ATCC.7 5 -4 DeDesul 4 Veillonella.atypica.ACS-049-V-Sch6.parvula.Doral.tax Selenomonas.sputigena.ATCC.35185la on.158774748 t 0.9917 Selenomonas.artemidis.F039.DSSM GeGeobacter.bemidjiensis.Bem.DSM.16622.CooS-1obac Selenom _1_4 M.2200 .F041 Clade F AAlkaliphilus.metalliredigens.QYMF.CooS-1 Mitsuokella.multacida.DSM.20544ACS-1ACSAC 4 008 Desulfitobacterium.hafniense Bryan 2 s Eubacterium.cellulosolvens.ona ACS- 3 DDesulfitobacterium.hafniense.DCB-2.CooS- Ruminococcus.flavefaciens.FD-1gen 049-V-Sch049-V-4-V- Ruminococcus.albus.tella s.sp..or a.ATC49-V- Col7 MMoorella.thermoacetica.ATCC.39073.CooS-2 S-2 Ruminococcus.albus.8.formatexigens.I-52.idis Sch a Thermincola.potens.JR.CooS-1 Pseudoramibacter.alactolyticus.ATCC.23263alal.ta.F0 C.3518C 6 .Coo 5 Oribacterium.sinus.F026 .ta 39 . otom .CooS-2 Eubacterium.saburreum.DSM.3986DS xon 9 8 -01 Chlorobaculum.parvum.NCIB.8327 .137137.F 5 DesulfDesulfotomaculum.acetoxidans.5575.DSM.771.CooS- Chlorobium.phaeobacteroides.DSM.266s M.20 ulfoto Dethiobacter.alkaliphilus.AHT1.CooS-1s.AK Desulfomicrobium.baculatum.X.DSM.402 .I-52 .F DesDesulfotomaculum.acetoxidans.5575.DSM.771.CooS- Desulfovibrio.magneticus.RS- 544 0430430 NaphS2 Desulfohalobium.retbaense.HR100.DSM.5692ns. .DSMDSM.4 m. del 7 6 0 delta sF eriu Desulfonatronospira.thiodismutans.ASO3-1.CooS-ta D- .14469 Candidatus.Kuenenia.stuttgartiensis.CooS-2Candidatus.Kuenenia.stuttgartiensialkenivoran 1 Desulfovibrio.desulfuricans.desulfuricans.ATCC.2777 1 .AK-01.CooS-1 Desulfovibrio.vulgaris.vulgaris.Hildenborough.pro Clade B AcetAcetohalobium.arabaticum.Z-7288.DSM.5501.CooS-2eobact Desulfovibrio.vulgaris.vulgaris.DP4.pro ..CooS-2CooS Desul teobacterium 2 Desulfurivibrio.alkaliphilus.AHT2.CooS-1m. 8 .prot vorans 4 DesDesulfovibrio.e s ov natronoteobacterium.eobac acterm.N D - Desulfonatronospira.thiodismutans.ASO3-1.CooS-2faatibacillum.ta T 3 DesulfDesulfovibrio.aespoeensis.Aspo-2.DSM.10631e su ul lf v b um.ragn bac cter S .AA 2 iu fo enien M ililuu en C 54 0 DDesulfovibrio.salexigens.DSM.263es sulfusulfu lfo ov ib ba aca TCC r sululf del ul arc k R ML LM . S S-S 1 Ferroglobus.placidus.AEDII12DO.DSM.10642.CooS-1F l ac c C M cte al oxida ..MLMS-1.CooS-5.. g 6C6 o 2 Syntrophobacter.fumaroxidans.MPOB.CooS-1S esulfoviu lf fovibrio.vulgarifo vibrii rio.rio c te e eti u o IB . DesDe . .H lpliph M LM Coo11 95 2 dedelta.proerrog ffu brio.v tb l i 3 Desulfococcus.oleovorans.Hxd3.CooS-2Des sulf sp .. P104C . o 2 MMethanospirillum.hungatei.JF-1.CooS-1ynyntropho lf u v te riu cus at d .8 9 ..22 lfococcum a . MLMSMLM .de.d m.KI 2 MethaMethanosphaerula.palustris.E1-9c.DSM.1995nnt ov r ibr de s ri ae es 3 8 3 Desulfarculus.baarsii.2st14.DSM.207De lu um k sp .. x S-S - CCandida l ovi ivibivibr m u 6 . M e .C.Co ooS-ooS 2 MeMethanoplanus.pee t t o p um m .D 2 2 cil .al um if 528 3 o 2 DDesulfotomaculum.acetoxidans.5575.DSM.771.CooS-1a i io s ir ..sp..n .RS-.RS 7 6 .spssp..MLMS-1.CooS-4 S - 2 CarboxydibrachiumCarbox a tha op lo b .v ulgar s .X.D 3 Desulfococcus.oleovorans.Hxd3.CooS-1Desu o 648064801154 .C 2 Desulfobacterium.autotrophicum.HRM2.DSM.3382.CooS-1Desulf e . b riori u a se.HR SM.S DesulfobaDesulfobacterium.autotrophicum.HRM2.DSM.3382.CooS-3ter.fumarrio.alrio eri um M.1664 1 MMethanosarcina.acetivorans.C2A.CooS-ese n p pph r .vu u .t . p iba t m 24 .CooS- 2 Methanosarcina.mazei.Go1.DSM.3647.CooS-1Me bus.b r o iio.alk lflfuric sps . tibacillum.alkt iator. S mosu.13 Co - MMethanosarcina.barkeri.Fusaro.DSM.804.CooS-1t d r ho ooal l h . a c imosu 64801264801295 2 MethanosaeM s ha o io.aes g MLMLMS-11 f erierium.sp..MLMS-1.CooS-3iu 5 CooS-CooS 2 T n ri . S l otrophic ac t r 2 Thermococcus.sp..AM4.CooS-2The etha ulfotu i n o alalaskensisal lg iodismutan....MLMS-1.CooS- 26 D M 243 . Coo ooS 0 DDesulfurobacterium.thermolithotrophum.BSA.DSM.11699.CooS- dad tteobact o a uulfa rivib c 9 X. S GGeobacter.sp..FRC-32.CooS- etha ospi i MLMS- M.M 6 eerium. 2 GeobaG hermohehhermococcu eobabac utotrophic oS S-SS-2 Geobacter.uraniireducens.Rf4Ge e s. t 9 PelobaPelobac osp as arari ris.vu s Acetohalobium.arabaticum.Z-7288.DSM.5501.CooS-1Acetohalobium.araAcetohAcetoha th nop . k 10 6 urivib ac Clostridium.cellulovorans.743B.ATCC.35296.CooS-2Clostr a ans. 6488528264885 M P. ooS-ooS C p a c II9.CooS-211. oS-oS e oba sal .a Ammonifex.degensii.KC4Ammonif 15243 .C o e t x ali v 4 eob ium.limosuDS 2.CooS CooS-CooS 93 eoba hano ttus.Methanoregula.boonei.l i tteobacterium.spa B 0 su anosarcina.mobobacto u a S- 0.DSM. 02 DeDes a ss.Miyazaki.Ful o y k rophobaco 167 . .C C rm mo no omacom t poe er 000 o r ci p 2 r eobac . oS- 1/ bactb dibra s haerula a e e en . ium e t 228 t M. / P1 CooC oSoS- n ba nos l i M garisde 1 pr type b an llum.h h 8 CooC bacte . c . o 1228 .DSM.15 ile 6. lf rr. d xi l ..CooS-2 eoba a/a . d o Me garg Desulfurivibrio.alkaliphilus.AHT2.CooS-2Desulf ro - a o s 1 Co tteobac a ilus.A a ac c.type.X.c 22.Coo2.CooS- u t fum us.AED s iyaya s Syntrophobacter.fumaroxidans.MPOB.CooS-2Synt t p 2917 c eeriu u g icile i c a . .prot. o 63q4 idiumidium.cellulovoran c ensi is SM d ror f 3 d ctcter. sa u e u CooS- PETC. .DSM. rob o a r m en s s oS- a fficile.ff 282 c ter.stet a r e . P1 alobium.aalobium occ i 569 t ppro t e s iu ..G za .H lfu deldelta.prol i mi .A2-16 a c c ulum.ulu t s .A . ere G EubEubacterium.limosum.KIST612 D-63q4 1b/006. t c rcin riu h 69 di llb . aroa .DP e DSM.51 erer. ini a 793 t e i hi lumlu mm. s tta . 642840642840930 sp SR c pe i ri S 9 u a l m ungatei. s.A 220 HTHT2.CooS-ki ldenborou 2 ulfobacter deltade 0291 P08.CooS-P08.CooS r ttii.D . s a .palustr .DS ca 2 0793 a bbi u tta.thermophila.P e o e .u ssul us. a.a m u n 0 O .NAP1a/00 s . a . rans.P7 m D-66c26 P07.Co p . 1 c AP carbinoliccarbinolicus. Naphx F 1 ddel 10793 dii. . ul s. m ttrolearius.SEBR.4847. ni.JW/IW 9 . YMF.Co .t a.barka.ba roleariuso II 4 ns 3- ATCC r ..F. te s DesDesulfobacterium.autotrophicum.HRM2.DSM.3382.CooS-2gdahli K . iidan a ile.CD19 R20291R20291. .ATCC.2917.ATCC. l . ap da NAP an ffurreduc gammggamma her aut ace rereg 12 eQC . F s c . M po 1. e di P1079P D-32g58.CooS- Q rb urr a .A -.epid x7 ile.QCD-63q42.CooSile.QCD-63q42.CooS-QC an cc o r utu pacifpa b p a e jun idivo ridiu 4.NAP m 3RC ium.th g n .2 Co bartle .obeum a D t ileile. sQ r T x . ile.NA i t -2.DS atus.Desulforudis.audaxv 0. D-76w55.NAe.QC CI kl um . ze i ot .oc a ivor u s.Ms edue . reduce .AM m c t S r o ile.QCD-66c26 c M.20583 CooS coccus.sp.SR1/ oxid is JF O.DSM.10642.C6 CC. iic eum la o idivorans.P7 f r r i 2 M - eri.F f 3 f c i.Go1 g S- rb x 7b3 M op o iicum.J .E1 ium.l 3 c . .CooS-1. 8 A h stic h - 9 m.diffic fficile.NAgens.Q baticum.Z a a p b Coo PO 1 ClostridiumClos u d S um 2777 S di u 2.2 1.Co 1 acte i ile.QC diffic .obeum c . ca le.63 l 4 SE a M o M . ii us t ns. O diffic c hil hicum a ridium ens e CandidCandidatus.Desulforudis.audaxviator.MP104C.CooS-ib ile.QC di . oolerans.E - DSM D C e o 77 c h CooS usaro.D n 9c.D n arbo m.m.diff coccus . . r l B .1063 m 4 ostr ic difficile iffic 24.DSM.20583.CooS- eraner a .J n 7 u . mo

c p . o s.557 S- m. ns o C2A.C BR um. a DSM.DSM D .Coo i um m. . . e 4 RuminoRuminococcus.sp.SR1/ s.743B.ATCC. c C oS i m imi u o o P M.M ClClostridium.ljungdahlii.PETC.DSM.13528.CooS- c .P d coccus i d i r S i 1 ClostClostridium.bartlettii.DSM.16795.CooS-2 d -52 o SS- . icile.QCD-37 idiu .Rf f i o lith T. . 66A h .

tr a e idi r a . . SM.199 C7- .I C M. DDSM ium. c s. HRM 484 - 1 s ostridium.diffic r idiu o A p m - 1 p ile.QCD- b 1 S-S r f CooS- st . 2 os idium. r S 5. S A ClostridiuClostridium.difficile.CD196.CooS-1 4 id y E 8 o . o Clostridium.difficile.QCD-63q42.CooS-Clostridium.diffic ns 3 -7288.DSM PI. s metalliredi e 2380.CooS-22380 SM r o 1 p M ClClostridium.carboxidivorans.P7.DSM.15243.CooS-3 Ruminococcus.obeum.A2-16Rumino - DS rmo str idium.diffici lost t tro J 6 7 iffic u m.dif 1 uminococcus h o e Clostridium.Clostridium.sticklandii.DSM.51 t 3 oo tr CloClostr o 4 . o lo c . CClostridium.difficile.QCD-66c26.CooS-2ClostrClostridium.difficile.NAP08.CooS- ge DSM.11571DS s T S 2 1265 RRuminococcus 7.C c ph n CClostridium.carboxidivorans.P7.DSM.15243.CooS-stridium.diff Clostridium.difficile.NAP07.CooS-2Clostr f .80 M S ThThermosediminibacter.oceani.JW/IW-1228P.DSM.16646.CooS-2 xige .DSM. i - 5 l lo e o Clostridium.difficile.CIP.107932.CooS-Clostr RuminococcusRuminococcus.obeum.ATCC.2917 u 1 1 . - enii.V eexigens.I-52. .7 M 8 c CooS- 1 t u g CClostridium.difficile.630.-.epidemic.type.X.CooS- Clostridium.dClostridium.difficile.QCD-32g58.CooS- m 5 CloClostridium.difficile.QCD-76w55.NAP1.CooS-2ium.d philus.metalliredi 4.C ooS li a o m.BSA.D 7 o 3 .

r 1157 a . 1. O

m

d ClostridiuClostridium.difficile.QCD-37x79.NAP1a/001.CooS- 35296. r o 3 Co y y P -

e oS 1 382 orma Alkaliphilus.metalliredigens.QYMF.CooS-Alk .5 - a.hans f h F h 2 i .f.format . 1 ClostridClostridium.difficile.QCD-97b34.NAP1b/006.CooS-2 oS a.hydrogenotrophica.DSM.10507.CooS- 501 a 8 T - i 1 t . CooS - ella u 1 t CooS SM.1 Blautia.hansenii.VPI.C7-24.DSM.20583.CooS-Blaut a .C Clade E l Blautia.hydrogenotrophica.DSM.10507.CooS-Blauti an B oo y - 1

S- 16 - BrBryant

2

1 9

9.C

o

o

S-

1

0.3

FIGURE A1 | Circular Bayesian Inference tree with sequence names shown (Same tree as Figure 1B). Posterior probabilities for deeply branching nodes are displayed.

www.frontiersin.org April 2012 | Volume 3 | Article 132 | 13 Techtmann et al. Anaerobic carbon monoxide dehydrogenase HGT

FIGURE A2 | Continued

Frontiers in Microbiology | Evolutionary and Genomic Microbiology April 2012 | Volume 3 | Article 132 | 14 Techtmann et al. Anaerobic carbon monoxide dehydrogenase HGT

FIGURE A2 | Continued

www.frontiersin.org April 2012 | Volume 3 | Article 132 | 15 Techtmann et al. Anaerobic carbon monoxide dehydrogenase HGT

FIGURE A2 | Subtrees of maximum likelihood tree of [Ni,Fe]-CODHs (Figure 2) divided based on clades.

Frontiers in Microbiology | Evolutionary and Genomic Microbiology April 2012 | Volume 3 | Article 132 | 16