Lawrence Berkeley National Laboratory Recent Work

Title RubisCO of a nucleoside pathway known from is found in diverse uncultivated phyla in bacteria.

Permalink https://escholarship.org/uc/item/4dk6g95j

Journal The ISME journal, 10(11)

ISSN 1751-7362

Authors Wrighton, Kelly C Castelle, Cindy J Varaljay, Vanessa A et al.

Publication Date 2016-11-01

DOI 10.1038/ismej.2016.53

Peer reviewed

eScholarship.org Powered by the California Digital Library University of California The ISME Journal (2016) 10, 2702–2714 © 2016 International Society for Microbial Ecology All rights reserved 1751-7362/16 OPEN www.nature.com/ismej ORIGINAL ARTICLE RubisCO of a nucleoside pathway known from Archaea is found in diverse uncultivated phyla in bacteria

Kelly C Wrighton1,7, Cindy J Castelle2,7, Vanessa A Varaljay1, Sriram Satagopan1, Christopher T Brown3, Michael J Wilkins1,4, Brian C Thomas2, Itai Sharon2, Kenneth H Williams5, F Robert Tabita1 and Jillian F Banfield2,6 1Department of Microbiology, The Ohio State University, Columbus, OH, USA; 2Department of Earth and Planetary Science, UC Berkeley, Berkeley, CA, USA; 3Department of Plant and Microbial Biology, UC Berkeley, Berkeley, CA, USA; 4School of Earth Sciences, The Ohio State University, Columbus, OH, USA; 5Earth Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA and 6Department of Environmental Science, Policy, and Management, University of California, Berkeley, CA, USA

Metagenomic studies recently uncovered form II/III RubisCO genes, originally thought to only occur in archaea, from uncultivated bacteria of the candidate phyla radiation (CPR). There are no isolated CPR bacteria and these organisms are predicted to have limited metabolic capacities. Here we expand the known diversity of RubisCO from CPR lineages. We report a form of RubisCO, distantly similar to the archaeal form III RubisCO, in some CPR bacteria from the Parcubacteria (OD1), WS6 and Microgenomates (OP11) phyla. In addition, we significantly expand the Peregrinibacteria (PER) II/III RubisCO diversity and report the first II/III RubisCO sequences from the Microgenomates and WS6 phyla. To provide a metabolic context for these RubisCOs, we reconstructed near-complete (493%) PER genomes and the first closed genome for a WS6 bacterium, for which we propose the phylum name Dojkabacteria. Genomic and bioinformatic analyses suggest that the CPR RubisCOs function in a nucleoside pathway similar to that proposed in Archaea. Detection of form II/III RubisCO and nucleoside metabolism gene transcripts from a PER supports the operation of this pathway in situ.

We demonstrate that the PER form II/III RubisCO is catalytically active, fixing CO2 to physiologically complement phototrophic growth in a bacterial photoautotrophic RubisCO deletion strain. We propose that the identification of these RubisCOs across a radiation of obligately fermentative, small- celled organisms hints at a widespread, simple metabolic platform in which ribose may be a prominent currency. The ISME Journal (2016) 10, 2702–2714; doi:10.1038/ismej.2016.53; published online 3 May 2016

Introduction (CPR), a monophyletic group of at least 35 phyla that accounts for 415% of all bacterial diversity (Brown The vast majority of the organisms in the environment et al., 2015). Metagenomic sampling of almost 800 have not been cultivated, obscuring our knowledge of CPR bacteria suggested that members of this radia- their physiology. Recent metagenomic investigations tion consistently have small genomes with many haverevealedthepresenceofalargediversityof metabolic limitations, including lack of an electron uncultivated organisms in marine and terrestrial transport chain, no more than a partial tricarboxylic subsurface environments (Castelle et al., 2013, 2015; acid cycle and mostly incomplete nucleotide and Lloyd et al.,2013;Bakeret al., 2015). For example, in a amino-acid biosynthesis pathways. Using metabolic metagenomic study of a shallow alluvial aquifer, we predictions from complete and near-complete gen- determined that many of the uncultivated bacteria omes (Wrighton et al., 2012), we inferred that were associated with the candidate phyla radiation members of the CPR are fermenters whose primary biogeochemical impact is primarily on subsurface Correspondence: JF Banfield, Department of Earth and Planetary organic carbon and hydrogen cycling (Wrighton Sciences, and Department of Environmental Science, Policy, and et al., 2014). However, many of the genes in these Man, University of California at Berkeley, 336 Hilgard Hall, genomes remain poorly annotated and the metabolic Berkeley, CA 94720, USA. platform of CPR bacteria remains uncertain. Further, E-mail: [email protected] 7These authors contributed equally to this work. no transcription and activities have been Received 25 August 2015; revised 29 February 2016; accepted validated, hindering knowledge of the actual reac- 4 March 2016; published online 3 May 2016 tions catalyzed by these organisms. RubicCO in uncultivated bacteria of the CPR KC Wrighton et al 2703 The incorporation of CO2 into organic carbon is a experiments were performed by adding approxi- critical step in the carbon cycle. Ribulose-1,5- mately 15 mM acetate to the groundwater over the bisphophate carboxylase-oxygenase (RubisCO) is course of 72 days to maintain an average in situ the most abundant enzyme on earth and is integral concentration of ~ 3 mM acetate during peak stimula- to the fixation of . Currently four tion. Six microbial community samples (A–F) were forms of RubisCO can fix atmospheric carbon collected over 93 days, including 3 days prior to dioxide (I, II, II/III and III) (Tabita et al., 2007; acetate stimulation and after acetate addition ceased 2008). Forms I and II are used by plants, algae and (amendment ceased on day 72). Ferrous iron and some chemoautotrophic and phototrophic bacteria sulfide concentrations were analyzed immediately for CO2 fixation during primary production via the after sampling using the HACH phenanthroline assay Calvin Benson Bassham (CBB) cycle. Forms III and and methylene blue sulfide reagent kit, respectively II/III, found primarily in archaea, enable light- (HACH, Loveland, CO, USA). Acetate and sulfate independent CO2 incorporation into sugars derived concentrations were determined using a Dionex from nucleotides like ICS-2100 chromatograph (Sunnyvale, CA, USA) (AMP) (Sato et al., 2007; Tabita et al., 2007). Form equipped with an AS-18 guard and analytical column II/III RubisCOs, which were primarily known in (Williams et al., 2011). Microbial cells from pumped methanogens, have greater overall similarity to groundwater that passed through a 1.2-μm prefilter bacterial Form II RubisCO but have specific residues, (Pall, Port Washington, NY, USA) were retained on 0.2- structural and catalytic features that more closely and 0.1-μm filters (Pall). Filters were flash-frozen in resemble archaeal form III (Alonso et al., 2009). liquid immediately upon collection. Previously, we reported the first evidence of form II/III RubisCO genes in partial genomes from the Peregrinibacteria (PER) phylum in the CPR radiation Nucleic acid extraction and sequencing (Wrighton et al., 2012). Additionally, three bacterial Approximately a 1.5-g sample from each frozen filter form II/III RubisCOs were recovered from genomes of was used for genomic DNA extraction. DNA was members of the SR1 phylum that also are inferred to extracted using the PowerSoil DNA Isolation Kit be fermentative (Wrighton et al., 2012; Campbell (MoBio Laboratories, Inc., Carlsbad, CA, USA). et al., 2013; Kantor et al., 2013). The presence of Illumina HiSeq 2000 2 × 150 paired-end sequencing these annotated RubisCO genes in small genomes was conducted by the Joint Genome Institute on that house few ancillary genes suggests that RubisCO extracted DNA. Sequencing amounts for samples A– may have an important role in the metabolism of F from 0.1- and 0.2-μm filters ranged from 9.5 to some of these bacteria. 40.7 Gbp, as described in Castelle et al. (2015). RNA To more comprehensively explore the diversity was extracted using Invitrogen TRIzol Reagent and potential role of CPR RubisCOs, we reproduced (Carlsbad, CA, USA) followed by genomic DNA the prior experiment from which we first identified removal and cleaning using Qiagen RNase-Free bacterial form II/III RubisCO sequences (Wrighton DNase Set Kit (Valencia, CA, USA) and Qiagen Mini et al., 2012). Size filtration was used to increase the RNeasy Kit (Valencia, CA, USA). An Agilent 2100 relative abundance of the small-celled CPR organ- Bioanalyzer (Santa Clara, CA, USA) was used to isms in samples used for sequencing (Luef et al., assess the integrity of the RNA samples. Only RNA 2015). Here we investigate CPR genomes that encode samples having RNA Integrity Number between 8 RubisCO genes, report a new RubisCO sequence type and 10 were used. The Applied Biosystems SOLiD diverged from Archaeal form III and propose a Total RNA-Seq Kit (Invitrogen/LifeTechnologies, broader metabolic context in which these genes Carlsbad, CA, USA) generated the cDNA template occur. By detecting in situ transcripts and through library. The SOLiD EZ Bead system was used to directly cloning one CPR form II/III RubisCO perform emulsion clonal bead amplification to sequence into an expression host, we suggest the generate bead templates for SOLiD platform sequen- potential functionality of this bacterial enzyme. This cing. Samples were sequenced on the 5500XL SOLiD study represents the first enzymatic activity demon- platform (LifeTechnologies). The 75-bp sequences strated from these broadly distributed but metaboli- produced were mapped in color space using the cally enigmatic candidate phyla bacteria. SOLiD LifeScope software version 2.5 (Life Techno logies/Invitrogen) using the default parameters against the reference genome set of 2 302 715 Materials and methods separate gene FASTA entries. Corresponding GTF files were built to record the position of each gene on Groundwater field experiment and sample collection the set of artificial chromosomes. The field experiment was carried out between 25 August and 12 December 2011 at the US Department of Energy Rifle Integrated Field Research Challenge Genome assembly, binning and functional annotations site adjacent to the Colorado River, CO, USA as Methods for assembly, binning and annotation recently described (Brown et al., 2015; Castelle et al., are described from methods described in detail 2015; Luef et al., 2015). In brief, biostimulation in prior publications (Wrighton et al., 2012;

The ISME Journal RubicCO in uncultivated bacteria of the CPR KC Wrighton et al 2704 Castelle et al., 2015). Reads were quality trimmed We then used these RubisCO sequences to recruit using Sickle (https://github.com/najoshi/sickle) with additional CPR sequences from currently unreported default settings. Trimmed paired-end reads were groundwater and sediment genomic data sets from assembled using IDBA_UD (Peng et al., 2012) with the same field site. default parameters. Genes on scaffolds 45kb in length were predicted using Prodigal (Hyatt et al., 2012) with the metagenome procedure (-p meta), and 16S rRNA gene recovery and phylogenetic analyses then USEARCH (–ublast) (Edgar, 2010) was used to 16S rRNA gene sequences were identified in search sequences against UniRef90 (Suzek assembled scaffolds based on Hidden Markov Model et al., 2007), KEGG (Kanehisa and Goto, 2000; searches using the cmsearch program from the Kanehisa et al., 2012) and an in-house database Infernal package (cmsearch –hmmonly –acc –noali composed of open reading frames from candidate –T –1) (Nawrocki et al., 2009). This protocol is phyla genomes. The in-house database includes described in detail in Brown et al. (2015). Searches previously published genomes (Wrighton et al., were conducted using the manually curated struc- 2012; Castelle et al., 2013; Hug et al., 2013; Kantor tural alignment of the 16S rRNA provided with SSU- et al., 2013; Wrighton et al., 2014) and genomes from Align (Nawrocki, 2009). Sequences corresponding ongoing work and primarily aiding in identifying with CPR genomes were aligned using SSU-Align putative CPR scaffolds. For each scaffold, we along with the best hits of these sequences in the determined the GC content, coverage and profile of Silva database (v1.2.11) (Quast et al., 2013), a curated phylogenetic affiliation based on the best hit for each set of reference sequences associated with candidate gene against the Uniref90 (Suzek et al., 2007) phyla of interest, and a set of archaea sequences to database. Scaffolds were binned to specific organ- serve as a phylogenetic root. Sequences with isms by using coverage across the samples, phyloge- ⩾ 800 bp aligned in the 1582-bp alignment were used netic identity and GC content, both automatically to infer a maximum-likelihood phylogeny using with the ABAWACA algorithm and manually using RAxML (Stamatakis, 2014) with the GTRCAT model ggKbase (http://ggkbase.berkeley.edu/). ABAWACA of evolution selection via ProTest and 100 bootstrap is an algorithm that generates preliminary genome re-samplings. bins based on different characteristics of assembled scaffolds (https://github.com/CK7/abawaca). Six genome bins generated by ABAWACA were manu- Protein phylogenetic analyses and modeling ally inspected within ggKbase. Binning purity was For single protein trees (for example, RubisCO large confirmed using an Emergent Self-Organizing Map subunit), the phylogenetic pipeline was performed (Dick et al., 2009). Our primary method for assessing as previously described (Wrighton et al., 2012). genome completeness was based on the presence or Briefly, protein sequences were aligned using MUS- absence of orthologous genes representing a core CLE version 3.8.31 (Edgar, 2004) with default gene set that are widely conserved as single-copy settings. Problematic regions of the alignment were genes among bacterial candidate phyla, but we also removed using very liberal curation standards with assessed global markers. All single copy and meta- the program GBlocks (Talavera and Castresana, bolic analyses reported here and summarized in the 2007). Alignments with and without GBlocks were Supplementary Figures are reported in ggKbase confirmed manually. Best models of amino-acid (FASTA files can be accessed via links provided in substitution for each protein alignment were esti- the Supplementary Figures). Manual assembly cura- mated using ProtTest3 (Abascal et al., 2005). Phylo- tion improved some genome bins by extending and genetic trees were generated using RAxML with the joining scaffolds, removing local scaffolding errors PROTCAT setting for the rate model and the best and, in some cases, bringing in small previously model of amino-acid substitution specified by unbinned fragments. One genome was curated to ProtTest. Nodal support was estimated based on closure and completion. 100 bootstrap replications using the rapid boot- strapping option implemented in RAxML. Three- dimensional structure predictions were generated by Inventory of the RubisCO genes SWISS-MODEL (an automated protein homology- All putative RubisCO genes on genome fragments modeling server) and i-tasser based on protein assembled from the 12 separate groundwater filter alignment and secondary structure prediction. The samples were recovered first by BLAST search alignment mode was based on a user-defined target- (Altschul et al., 1990) against an in-house database template alignment. Conservation of key catalytic that included candidate phyla sequences from prior residues and the secondary structure for each model studies (Wrighton et al., 2012; Campbell et al., 2013; was confirmed by manual inspection. Kantor et al., 2013; Castelle et al., 2015). The taxonomic affiliation of these putative RubisCO CPR scaffolds was confirmed based on phylogenetic Plasmids and cloning analysis of genes included on the same genome Rifle metagenomic DNA was used as the template to fragment as the RubisCO or in the same genome bin. amplify the exact coding regions for the RubisCO

The ISME Journal RubicCO in uncultivated bacteria of the CPR KC Wrighton et al 2705 genes from PER genome PER_GWF2_38_29 (PER-2), biparental matings (Smith and Tabita, 2003; the OD-1 genome GWE1_OD1_1_1_39 and WS6 Satagopan et al., 2009, 2014). Plasmid-complemented genomes GWC1_1_418 and GWC1_33_22_1_93. Primer strain SB I/II− colonies were first selected on peptone sequences used for RubisCO amplification are included yeast extract plates followed by streaking onto Ormer- (Supplementary Table S1). PCR was performed using od’s minimal medium (Ormerod et al., 1961) agar the PrimeStar GXL (Clontech, Mountain view, CA, plates with no added organic carbon. To test for USA) high-fidelity . PCR products for all photoautotrophic growth complementation, plates genes were cloned between NdeI and BamHI sites in were incubated in illuminated anaerobic jars, which plasmid pet11a (Novagen/EMD Millipore, Billerica, were periodically flushedwithagasmixtureof5% MA, USA), which does not incorporate an N-term- CO2/H2 as previously described (Satagopan et al., 2009, inal hexa-histadine tag. For the PER, the PCR 2014). Anaerobic CO2-dependent photoautotrophic was also cloned between KpnI and BamHI growth was obtained using Ormerod’sminimalmed- sites in plasmid p80, a high copy plasmid 3716- ium agar plates in a sealed jar flushed with a 5% CO2/ based vector (a derivative of pLO11 obtained from 95% H2 gas mixture. Colonies on minimal plates under Oliver Lenz and Bärbel Friedrich, Berlin, Germany), these conditions were used to inoculate liquid photo- which contained the Rhodospirillum rubrum CbbR/ autotrophic cultures bubbled continuously with 5% cbbM promoter region, for expression in vivo. This CO2/H2 for growth analysis; cells were harvested promoter has been used previously in the broad-host anaerobically for RubisCO activity assays, as range vector pRPS-MCS3 to drive the expression of described above. RubisCO genes to complement RubisCO deletion strain Rhodobacter capsulatus strain SB I/II (Smith and Tabita, 2003). Constructs for the coding regions In vitro RubisCO assays of archaeal form III Archaeoglobus fulgidus RbcL2 RubisCO assays with activated enzyme preparations (Kreel and Tabita, 2007), archaeal form II/III Metha- were performed according to Satagopan et al. (2014) nococcoides burtonii rbcL and bacterial form II R. with modifications to ensure anaerobiosis. All crude rubrum cbbM (Falcone and Tabita, 1993) were used protein lysates, reagents and supplies were prepared for comparison and served as positive controls. The in an anaerobic chamber and crimped in sealed glass sequences of all constructs were verified at the Plant- vials prior to conducting the assays in a hood using Microbe Genomics Facility at The Ohio State gas-tight Hamilton syringes (Hamilton Robotics, University. Reno, NV, USA). Assay reaction mixtures contained μ 14 50 mM NaHCO3 (~2 Ci of NaH CO3), 10 mM MgCl2 and 0.8 mM ribulose 1,5 bisphosphate (RuBP) in Growth and culture conditions 50 mM bicine-NaOH buffer, pH 8.0. RubisCO activity To test for in vitro activity, RubisCO from is initiated by addition of its physiological , pet11a constructs were expressed in Escherichia RuBP. Reactions were initiated with the addition of coli strain Rosetta 2 (DE3) pLysS (Novagen), which crude protein lysate to its physiological substrate, uses the pRARE plasmid to accommodate expression RuBP, at 30 °C and incubated for 5–30 min. Negative of rare codons such as those from candidate phyla control reactions consisted of no RuBP or no crude and archaeal RubisCO sequences. Cells were grown protein lysate added. Carboxylase activities were in 50 ml lysogeny broth supplemented with − − determined following the equimolar conversion of μ 1 μ 1 14 150 gml ampicillin and 12.5 gml chloram- CfromNaHCO3 into acid-stable 3-phosphoglycerate phenicol in 125-ml flasks. After reaching an optical (3-PGA), as measured via radioactive scintillation – density of 0.6 0.8 (at 600 nm), RubisCO gene counting. Protein was determined via the Bradford expression was induced by adding IPTG to a final method and specific activities (nmol CO2 fixed per concentration of 1 mM. After 5–16 h of shaking at min per mg protein) were calculated consistent with room temperature, cells were harvested, flash frozen prior publications (Satagopan et al., 2009). and stored at − 80 °C prior to analysis. To ensure that candidate phyla RubisCOs were exposed to minimal amounts of oxygen, cell lysis and assay conditions Results and discussion were performed under strict anaerobic conditions. Using an anaerobic chamber, cell pellets were Recovery of CPR genomes that contain RubisCO resuspended in 50 mM bicine-NaOH, 10 mM MgCl2, We reconstructed CPR genomes from sequence data at pH 8.0, supplemented with 2 mM NaHCO3 and sets for samples collected over 93 days (Brown et al., 1mM ; crude protein lysates were 2015; Castelle et al., 2015). As previously reported, obtained via bead beating with 0.1-mm beads for bacteria from CPR phyla WS6, OD1 and OP11 were 3–4-min intervals in tightly capped tubes. enriched (475% 16S rRNA relative abundance) on To test for in vitro activity from complemented the 0.1-μm filter, and some members were shown to autotrophic cultures, p80::PER, p80::R. rubrum and have ultra-small cells (median cell volume of pRPS-MCS3::A. fulgidus RubisCO constructs were 0.009 ± 0.002 μm3) (Luef et al., 2015). Multiple transformed into E. coli strain S17-1 (ATCC47055) genomes belonging to the CPR phylum PER were and mobilized into R. capsulatus SB I/II − via recovered from all six 0.2-μm filter data sets,

The ISME Journal RubicCO in uncultivated bacteria of the CPR KC Wrighton et al 2706 suggesting that these cells may be larger in size than the three partial PER genomes analyzed from other CPR bacteria (for example, OD1, WS6) samples collected from the same aquifer 5–7 days enriched on the 0.1-μm filters. after acetate stimulation, two of which (ACD51, Taxonomic identifications of the CPR organisms ACD65) encoded RubisCO genes (Wrighton et al., from which we recovered RubisCO genes are 2012). Phylogenetic analyses that include these and provided in Supplementary Figures S1 and S2. The other recently reported sequences (Rinke et al., 2014; emergent self-organizing map clustered genome Brown et al., 2015) support the identification of this fragments based on their tetranucleotide sequence lineage as a separate phylum (Candidatus Peregrini- composition (Supplementary Figure S3) and vali- bacteria). Based on sequence divergence, the five dated 16 unique CPR genomes that ranged in Peregrinibacterial genomes reported here represent estimated completion from near-complete (494%) multiple taxonomic classes in this newly designated to complete (Supplementary Figure S4). Here we phylum. report RubisCO sequences from five genomic bins assigned to the PER phylum, two to the Parcubac- teria (OD1), five to the Microgenomates (OP11) and CPR genomes contain novel and diverse RubisCO six to the WS6 phylum (Table 1). We also report sequences RubisCO from additional scaffolds assigned to these Fifty-one new RubisCO genes were identified from CPR phyla. these CPR genomes. Some sequences, for example, The five WS6 genomes reported here span multi- from the WS6-1 and OD1-1 genomes, were indepen- ple taxonomic classes (Supplementary Figure S1 and dently reconstructed from multiple data sets. Ulti- Supplementary Table S2) (Brown et al., 2015). The mately, 18 representative unique sequences were manually curated and closed WS6 genome is manually confirmed by read mapping to verify their 0.896 Mbp in length and is the first complete genome accuracy (Figure 1). These sequences were most reconstructed for a member of this lineage. The similar to previously reported CPR or archaeal mapped read file confirming the assembly and the RubisCO sequences (and not bacterial RubiscCO annotations can be accessed via ggKbase (see sequences) (Supplementary Table S3) and the major- Methods section). Given the level of genome com- ity harbor the key active-site residues (Figure 2). pletion and the breadth of phylogenetic diversity Phylogenetic analyses revealed that the seven sampled here (Konstantinidis and Rosselló-Móra, bacterial CPR RubisCO sequences reported here are 2015), we propose the name Candidatus Dojkabac- members of the previously defined archaeal inter- teria for the WS6 phylum to honor the memory of mediate form II/III RubisCOs (Figure 1) (Tabita et al., Michael A Dojka, who first reported this lineage 2007, 2008). Recent metagenomic studies reported based on 16S rRNA gene sequence data (Dojka et al., the form II/III RubisCOs from PER and SR1 unculti- 2000) vated bacterial lineages (Wrighton et al., 2012; We recovered five distinct PER genomes that we Campbell et al., 2013; Kantor et al., 2013). Although estimated to be 493% complete (Table 1; the overall sequence similarity is most closely Supplementary Figure S4). This data set augments related to type II, protein sequence alignment

Table 1 Summary of CPR genomes reported in this study

Genome ID ID ggKbase Taxonomic Completeness (%) Number Relative GC Bin No. of RubisCo affiliation of content (%) length protein- form scaffolds (Mbp) coding genes

PER-1 PER_GWA2_38_35 Peregrinibacteria 95 5 38.2 1.11 1020 II/III PER-2* PER_GWF2_38_29 Peregrinibacteria 93 23 38 1.23 1164 II/III PER-3 PER_GWF2_39_17 Peregrinibacteria 100 15 38.8 1.31 1129 II/III PER-4 PER_GWC2_39_14 Peregrinibacteria 98 21 38.5 1.32 1247 II/III PER-5 PER_GWF2_43_17 Peregrinibacteria 100 16 43.1 1.20 1136 II/III WS6-1 WS6_GWF2_39_15 Dojkabacteria 100 (closed) 1 38.7 0.896 891 II/III WS6-2 WS6_GWC1_33_20 Dojkabacteria 98 23 32.7 0.65 666 III-like WS6-3 GWF1_WS6_37_7 Dojkabacteria 100 18 36.7 1.22 1153 III-like WS6-4 WS6_GWE2_33_157 Dojkabacteria 100 29 32.8 0.61 628 III-like WS6-5 WS6_GWE1_33_547 Dojkabacteria 95 34 32.5 0.61 649 III-like WS6-6 WS6_GWF1_33_233 Dojkabacteria 98 23 32.7 0.61 640 III-like OD1-1 OD1_GWE2_42_8 Parcubacteria 93 34 42.5 0.743 781 III-like OD1-2 OD1_RIFOXA1_OD1_43_6 Parcubacteria 84 183 31.7 0.67 850 III-like OP11-1 OP11_GWA2_42_18 Microgenomates 95 53 42.3 1.24 1324 III-like OP11-2 GWA2_OP11_43_14 Microgenomates 100 24 43.2 1.65 1699 III-like OP11-3 RBG_16_OP11_37_8 Microgenomates 95 84 37.5 1.31 1506 III-like

Abbreviation: CPR, candidate phyla radiation. Bolded text includes manually curated and confirmed near-complete or complete genomes, while asterisk (*) denotes RubisCO that was cloned and functionally confirmed.

The ISME Journal RubicCO in uncultivated bacteria of the CPR KC Wrighton et al 2707 Intermediate type II/III

Insertion of 48 amino acids PER2 WS6-1*

WS6-1* Bacterial type II

OP11 SR1

PER2/4 SR1

PER1* PER5 PER3 Type IA/IB

0.1

Bacterial Candidate Phyla SR1 OP11 WS6 100 OD1 Type IC/ID 90 PER 100

100

To type IV 94 RuBisCO-like 64 91 100

100 99 60 100 100 Firmicutes

100 OD1 SCG Archaeal Type III Halobacteriales

Methanosarcinales

MethanococcalesAciduliprofundum

WS6-3 Archaeal Type III

OD1-1 OP11-1/2/3 WS6-2 Insertion of 24 amino acids OD1-1 WS6-2/4/5/6

Bacterial Type III-like Figure 1 Maximum likelihood phylogenetic tree constructed for the RubisCO large subunit. Key bootstrap values 450 are shown based on 100 resamplings. Protein models of the PER and Dojkabacteria (WS6) type II/III RubisCOs and the Dojkabacteria and Parcubacterium (OD1) type III-like RubisCOs are included. The asterisk (*) refers to genomes that are complete or nearly complete and hand-curated. The additional inserts of the form II/III RubisCO from the closed Dojkabacteria genome (48 residues) or of the form III-like from the Parcubacterium genome (24 residues) are highlighted in red on the protein models (Expanded view shown in Supplementary Figure S6). confirmed the presence of form III-specific residues genomes to date lack the insertion. The Microgeno- (Supplementary Figure S5), validating the form II/III mates and two of the Dojkabacteria RubisCOs phylogenetic placement of these bacterial sequences. reported here have the 29 amino-acid insertion Our results expand the bacterial members of this found in the SR1 and archaeal II/III form, while the form by four additional PER sequences, add the first closed WS6 genome (WS6-1) encodes a RubisCO three Dojkabacteria sequences and add a single-cell with a novel 48 amino-acid insertion (Figure 1, unpublished Microgenomates (OP11) sequence. Ori- Supplementary Figures S5 and S6). Both the 29 and ginally, two form II/III RubisCO subgroups were 48 amino-acid insertion sequences are located in the described based on the absence (PER) or the same position as previously identified insertions. presence (SR1 and archaea) of a 29 amino-acid Similar to the methanogen sequences, the insertion N-terminal insert sequence (Wrighton et al., 2012; is sandwiched between catalytic sites of the enzyme Campbell et al., 2013; Supplementary Figures S5 and (Supplementary Figure S5), yet impact of these S6). Here our additional sampling of the PER insertions on enzyme biochemical properties is RubisCO confirms this earlier discovery, as all PER currently unknown.

The ISME Journal RubicCO in uncultivated bacteria of the CPR KC Wrighton et al 2708

Figure 2 Conservation of RubisCO active-site residues. Alignment of active-site residues from representative RubisCO and RLP sequences as noted previously by Tabita et al., 2007, along with equivalent residues from the new sequences described in this study. Residues are noted in single-letter IUPAC code. Coloring scheme is based on the identities of residues relative to the Synechococcus elongatus PCC 6301 reference sequence. Amino-acid numbering according to Synechococcus elongatus PCC 6301 reference sequence have been indicated. Green shading refers to conserved residues, yellow indicates semi-conserved residues and red represents non- conserved residues. Catalytic (C) and RuBP-binding (R) residues are indicated on the top; X denotes missing residue. The RubisCO motif refers to a stretch of conserved residues that are proximal in the primary structure and is identified by consensus residue identities shaded gray. Residues K, D and E from this motif participate in .

In addition to the form II/III sequences, nine of our insertions between alpha-helix 6 and beta-sheet 7 newly recovered bacterial sequences form a distinct (Figure 1, Supplementary Figures S5 and S6). The and strongly supported monophyletic group that WS6 genome containing divergent form III also branches deeply from, yet is most similar to, the form contains the eight amino-acid insertion. To further III RubisCO sequences found in archaea. This confirm analyses indicating that these insertions current CPR clade includes RubisCO sequences from were not due to sequencing or assembly error, we Dojkabacteria (3), Parcubacterium (1) and Microge- designed specific PCR primers for the gene encoding nomates (5). Although the extent of sequence for the large RubisCO subunit and confirmed the divergence between these bacterial sequences and insertions via cloning and sequencing. Prior the archaeal form III sequences is considerable, sequence analyses demonstrated that this region additional taxon sampling is needed to resolve this was involved in interactions with the neighboring clade as completely distinct RubisCO form (Figure 1, subunit (Satagopan et al., 2014); however, these Supplementary Figure S7). In our analyses, we have insertions have not been previously identified in identified other bacterial genomes that contain other RubisCO sequences. homologs to archaeal form III sequences in public databases (for example, from a partial single-cell Parcubacteria genome and in two Firmicutes draft CPR RubisCO is inferred to support a heterotrophic genomes; Supplementary Figure S7). Notably, these nucleoside pathway bacterial form III are not monophyletic with the CPR Previously studied archaeal form III (Finn and RubisCOs described here but clade with the canoni- Tabita, 2004; Sato et al., 2007; Estelmann et al., cal archaeal form III. 2011) and form II/III (Tabita et al., 2008; Alonso The parcubacterial sequence from the OD1-1 et al., 2009) RubisCOs are not associated with the genome contains two, an 8 and 24 , CBB pathway but instead function in a nucleoside-

The ISME Journal RubicCO in uncultivated bacteria of the CPR KC Wrighton et al 2709 salvaging pathway. Consistent with the archaeal RuBP by an enzyme annotated to function in thiamine genomes that contain RubisCO, all bacterial CPR monophosphate formation (Figure 3, unique pathway genomes containing RubisCOs sampled to date lack denoted in yellow; Finn and Tabita, 2004) (that is, the gene encoding phosphoribulokinase, a key gene PRPP → ribose 1,5 bisphosphate → RuBP). Given that of the CBB pathway (Wrighton et al., 2012; Campbell only one Peregrinibacterium genome contains a distant et al., 2013; Kantor et al., 2013). However, we note homolog of this RuBP-forming enzyme, and all of these these genomes contain many genes that are too novel CPR were recovered from an ~ 14 °C aquifer, this to be identified based on sequence similarity and pathway seems less likely in most CPR bacteria whose thus we cannot eliminate the possibility that they genomes encode RubisCO. However, we note many contain new or divergent phosphoribuloki- PER genomes encode an enzyme that can convert nase homologs (Ashida et al., 2008). Consistent with PRPP to AMP. This enzyme, adenine phosphoribosyl- our hypothesis that these RubisCO function in , may represent an alternative mechanism archaeal nucleoside pathway, outside other CPR by which PER genomes can use PRPP via the AMP sequences, these RubisCO proteins have the closest pathway and not the pathway previously reported in homologs to similar archaeal and not bacterial thermophilic archaea (that is, PRPP → AMP → ribose RubisCOs (Supplementary Table S3). Thus we 1,5 bisphosphate → RuBP). Thus we cannot rule out suggest RubisCOs studied here also function in a that PRPP may also be used as a source for RuBP in nucleoside/nucleotide metabolism pathway analo- some members of the CPR via a T. kodakarensis-like gous to archaea. pathway (Figure 3). Rather than using phosphoribulokinase, there are Very recently, Aono et al. (2015) identified two proposed pathways to generate RuBP, the alternative enzymes involved in the nucleoside substrate for RubisCO in Archaea. In the first path- metabolism pathway in T. kodakarensis, linking way, identified in Thermococcus kodakarensis, nucleoside moieties to central carbon metabolism. nucleosides are converted into 3-PGA. The first step Instead of AMP phosphorylase (step 1 T. kodakar- in this short pathway dephosphorylates the nucleo- ensis pathway), the ribose moieties of adenosine, tide (AMP, CMP, UMP) to release the (for guanosine and uridine are converted to ribose-1- example, adenine), yielding ribose-1,5-bisphosphate, bisphosphate (R1P) by three nucleoside phosphor- which an converts to ribulose-1,5-bispho- ylases. R1P is converted to ribose-1,5-bisphosphate sphate (RuBP) (that is, AMP → ribose 1,5 bispho- (R15P) via a nucleoside (ADP-dependent sphate → RuBP). From RuBP and CO2, the type III or ribose-1-phosphate kinase). R15P is likely directed II/III archaeal RubisCO generate two of 3- to glycolysis via the R15P isomerase and RubisCO. In PGA, a central intermediate of core carbon metabo- this metabolic context, using phosphate and ADP, lism (Aono et al., 2012). the ribose moieties of adenosine, guanosine and All CPR genomes with RubisCO reported here and uridine are converted to R15P via R1P by nucleoside studied previously possesses the two archaeal AMP phosphorylases and ADP-R1P kinase (Aono et al., pathway homologs (Figure 4), with the best hits to 2015) (Figure 3, blue dotted line). Finally, R15P can the AMP phosphorylase and the isomerase outside be directed to glycolysis via the functions of R15P the CPR are largely from archaeal genomes isomerase and RubisCO (Aono et al., 2015). We (Supplementary Table S3). For both proteins in this examined the CPR genomes for this alternative pathway, the most similar sequences outside the CPR T. kodakarensis pathway and found some (albeit are to homologs in archaeal genomes, often times to weak) support for this pathway in Microgenomates methanogens (PER) or to uncultivated archaeal and most PER genomes. Thus it cannot be ruled out lineages such as the DPANN (Castelle et al., 2015). that some CPR bacteria could use an alternative route We used an amino-acid alignment and protein through which R15P is generated from the phos- modeling to confirm that the CPR isomerase has all phorolysis of nucleosides (Figure 3). residues for catalytic activity and is specific for CPR organisms that contain RubisCO are inferred to ribose 1,5 bisphophate and not another substrate, for be obligate fermenters (Supplementary Figure S10), example, 5-methylthioribose-1-phosphate isomerase consistent with prior metabolic analyses (Wrighton (Supplementary Figures S8 and S9). Based on the et al., 2012; Campbell et al., 2013; Kantor et al., genomic evidence to date, we propose that the CPR 2013; Wrighton et al., 2014). All of the genomes bacteria likely use a RubisCO pathway analogous to lack a complete tricarboxylic acid cycle, NADH that of T. kodakarensis (Figure 3, denoted in green). dehydrogenase (complex I) and most other com- In contrast to Thermococcus pathway, other thermo- plexes from the oxidative electron transport phos- philic archaeal (Methanocaldococcus jannaschii, phorylation chain (for example, complex II, Methanosarcina acetivorans and Archaeoglobus litho- complex III, complex IV and quinones). All CPR trophicus) use a second pathway. In this pathway, genomes included in this study, besides the ribose-phosphate pyrophosphokinase (PRPP) is sug- Dojkabacteria (WS6), have complete or near- gested to be desphophorylated to ribose 1,5 bispho- complete glycolysis and/or pentose phosphate sphate via an abiotic reaction that is facilitated by high pathway(s) and the capacity to produce acetate, temperatures. In archaea, the non-enzymatically pro- lactate and/or hydrogen as byproducts of fermenta- duced ribose 1,5-bisphosphate is then converted to tion. Notably, the PER, in contrast to bacteria of the

The ISME Journal RubicCO in uncultivated bacteria of the CPR KC Wrighton et al 2710

Glucose-6P Glucono-1,5-lactone-6P Gluconate-6P G6PD PGLS 6 1.1.1.49 6 3.1.1.31 6

CO hxlB arabino-3-Hexulose-6P HCHO 2 GPI 5.3.1.9 hxlA 5.3.1.27 6 4.1.2.43

Xylulose-5P Extracellular RPE Fructose-6P tkt RNA 6 2.2.1.1 5 5.1.3.1 5 Ribulose-5P ? Erythrose Sedoheptulose rpiA 5.3.1.6 -4P -1,7P2 AMP PRPP APRT FBP 3.1.3.11 AMP 4 7 5 2.7.6.1 5 2.4.2.7 Ribose-5P PRPP Fructose- tal 2.2.1.2 + ATP synthetase Adenine 1,6P2 6 2.2.1.1 tkt 7 Sedoheptulose-7P 2.4.2.57 AMPase ALDO 4.1.2.13 DAK DAS Glycerone-P 2.7.1.29 2.2.1.3 3 3 Pi ? TIM Glycerone HCHO 5.3.1.1 2.7.1.19 3 Adenine Phosphoribulokinase Glyceraldehyde-3P GAPDH 1.2.1.12

3 Glycerate-1,3P2

PGK 2.7.2.3 CO2 RuBisCO Isomerase 3 4.1.1.39 5 5.3.1.29 5 Glycerate-3P Ribulose-1,5P2 Ribose-1,5P2 gpmI 5.4.2.12

ThiC 3 Glycerate-2P NAD+ (?)

ENO 4.2.1.11

3 Phosphoenolpyruvate

PK 2.7.1.40 PFOR Acetyl-CoA PTA Acetyl-P ackA Acetate 3 1.7.1.3 3 2.3.1.8 3 2.7.2.1 3 Pyruvate ATP 6.2.1.13 ACS ATP

Figure 3 Proposed metabolic role of RubisCO, with genes identified in the PER-1 (blue line) and WS6-1 genomes (purple line). Black lines indicate genes not identified in either organism. Thick blue arrows represent genes identified by metatranscriptomics (Supplementary Table S4). Yellow lines denoted the proposed PRPP pathway indicated in the text. Abbreviations not indicated in the text: G6PD, glucose-6-phosphate dehydrogenase; PGLS, 6-phosphogluconolactonase; hxlB, 6-phospho-3-hexuloisomerase; hxlA, 3- hexulose-6-phosphate synthase Pgm, phosphoglucomutase; GPI, glucose-6-phosphate isomerase; FBP, fructose-1,6-bisphosphatase; Aldo, fructose-bisphosphate aldolase; GAPDH, glyceraldehyde 3-phosphate dehydrogenase, Pgk, ; TIM, triosepho- sphate isomerase; GpmI, 1,3-bisphosphoglycerate-independent phosphoglycerate mutase; Eno, enolase; PK, ; PFOR, pyruvate/2-oxoacid-ferredoxin ; ACS, Acetyl-CoA synthetase; PTA, phosphotransacetylase; ackA, acetate kinase; RpiA, ribose 5-phosphate isomerase A; RPE, ribulose-phosphate 3-epimerase; Tkt, transkelotase; Tal, transaldolase; APRT, adenine phosphoribosyltransferase; PRPP synthetase, ribose-phosphate pyrophosphokinase; DAK, glycerone kinase; DAS, dihydroxyacetone synthase; AMPase, AMP phosphorylase; isomerase, Ribose 1,5-bisphosphate isomerase; NMP, nucleoside 5′-monophosphate.

other CPR genomes with RubisCO described here, for the other CPR genomes. Enzymes that were seem to have the capacity to synthesize nucleotides identified include those required for the last steps and some amino acids (Supplementary Figure S11). of glycolysis from PGA to pyruvate and ultimately The metabolic information gleaned from the first fermentation to acetate and ATP. Notably, PGA is the complete WS6 genome is sparse compared with that product of RuBP conversion by RubisCO (Figure 3).

The ISME Journal RubicCO in uncultivated bacteria of the CPR KC Wrighton et al 2711 We failed to identify genes for upper glycolysis, would minimize the energetic cost of upper glyco- gluconeogenesis and the pentose phosphate path- lysis, maximizing energy yields for these organisms way in the WS6 genomes (Supplementary such as WS6 with tiny genomes. Figure S10). Gene and protein sequences discussed in this manuscript can be accessed through links in Supplementary Figures S10 and S11. This finding CPR RubisCO transcripts detected in the environment is similar to reports of bacteria of the SR1 phylum We examined the metatranscriptomic sequence (placement within the CPR is uncertain) that also data set collected from the 0.2-μm filter samples encode form II/III RubisCO and lack these path- for reads that mapped uniquely to PER-1 genome ways (Campbell et al., 2013; Kantor et al., 2013) (Supplementary Table S4), given that these organ- (Supplementary Figure S10). Complete and closed isms were enriched on this filter (there was insuffi- SR1 and WS6 genomes also lack pathways for cient biomass after metagenomics to investigate the biosynthesis of nucleotides (purines and pyrimi- 0.1-μm filters). In the sample collected prior to dines) and amino acids (Kantor et al., 2013). acetate amendment (sample A), we confirmed the Consistent with reports for SR1 and some small expression of RubisCO (Scaffold 3_gene102) and the CPR genomes (Kantor et al., 2013; Gong et al., 2014; surrounding gene neighborhood (3_105, 3_107, Brown et al., 2015; He et al., 2015; Luef et al., 3_109) (Figure 4a). Notably, besides RubisCO, all 2015), we suggest that WS6 bacteria reported here of these transcribed genes were annotated as have an obligatory symbiotic or parasitic lifestyle hypothetical. Also in sample A, we detected tran- and are not free-living. scripts for step 1 of the AMP pathway (AMP It is possible that WS6 and other CPR that encode phosphorylase (Scaffold 5_gene7)) and energy genera- RubisCO may be reliant on other members of the tion via acetate production (phosphotransacetylase community for external ribose, which is utilized as (Scaffold 5_gene5)), as well as a poorly annotated carbon and energy sources. A bacterial or archaeal phosphoesterase (Scaffold 5_gene4) (Figure 4b). The cell is on average 20% by weight RNA, and RNA is expression of RubisCO, a key gene in the AMP 40% by weight ribose, meaning that a cell is roughly pathway, and genes for acetate production from the 8% pure ribose. A recent paper used this information same genome prior to stimulation suggests a role for to suggest that ribose was likely an abundant sugar the RubisCO and likely the AMP pathway in bacteria available on early earth for fermentation and that the under natural aquifer conditions. archaeal type III RubisCO pathway of nucleoside monophosphate conversion to 3-PGA might be a relic of ancient heterotrophy (Schönheit et al., 2015). The PER RubisCO can support autotrophic

In a similar manner, we suggest that in WS6 the CO2-dependent growth ribose moiety can be funneled into RuBP via the two We synthesized a recombinant protein in E. coli gene AMP pathway and converted to PGA via of the PER-1 form II/III RubisCO that was transcribed RubisCO to ultimately yield ATP and acetate via in situ. We demonstrated CO2 fixation activity substrate-level . This use of ribose from the crude extract using RuBP as a substrate

Pyrimidine metabolism Purine metabolism Hypothetical protein Acetate production Cell wall-related protein Peptidase Figure 4 Gene neighborhoods near the form II/III RubisCO (a) as well as AMP phosphorylase and R15P isomerase (b) from the PER-1 genome. (a) RubisCO (red) clusters with genes for pyridmidine and purine metabolism. (b) Genes for producing acetate (blue) are near two homologs from the AMP pathway (dark green). Gene clusters were organized and visualized using Geneious v7.0.6.

The ISME Journal RubicCO in uncultivated bacteria of the CPR KC Wrighton et al 2712 (Kreel and Tabita, 2007; Supplementary Table S5). In contrast to the PER results, the OD1 and WS6 recombinant proteins failed to support comparable (OD1) or detectable (WS6) RuBP-dependent CO2 fixation under these conditions. This latter WS6 finding is consistent with the missing catalytic residues and thus their annotation as RubisCOs remains uncertain (Figure 2). More importantly, we demonstrated that the PER form II/III enzyme could complement a Rhodobacter capsulatus RubisCO-deletion host strain to support autotrophic CO2-dependent growth, much like archaeal form III Rubis- CO (Finn and Tabita, 2003) (Figure 5). However, while under the conditions tested RubisCO fixed 10.00 extracellular CO2, we cannot rule out the possibi- lity that in situ this RubisCO may scavenge intracellular CO2 pools to support heterotrophic metabolism as previously reported (Joshi and 1.00 Tabita, 1996; Tachi and Tabita, 2001; Kresge et al., 2005; McKinlay and Harwood, 2010; Aono Rr et al., 2015). PER 0.10 empty OD @ 660 nm vector Conclusion 0.01 Given that RubisCO is exceptionally well studied 0 50100150200250300350400 450 500 owing to its central role in CO2 fixation in Hours the Earth’s biosphere, it is significant that this analysis of subsurface sediments has yielded Figure 5 Photoautotrophic growth complemented by the PER form II/III RubisCO gene in Rhodobacter capsulatus strain SB I/II−. many new sequence types. We expand the phylo- (a) p80::Rhodospirillum rubrum cbbM;(b) pRPS-MCS3::Archae- genetic membership of CPR genomes known to oglobus fulgidus rbcL;(c) p80::Perigrinibacteria rbcL;(d) empty contain RubisCO sequences from the PER vector. Anaerobic CO2-dependent photoautotrophic growth was to include members of the Dojkabacteria (WS6), obtained using Ormerod’s minimal medium agar plates in a sealed Parcubacteria (OD1) and Microgenomates (OP11). jar flushed with a 5% CO2/95% H2 gas mixture. (e) Anaerobic CO2- dependent growth was performed in sealed tubes continuously We provide the first demonstration that genes bubbled with a 5% CO2/95% H2 gas mixture. Each data point of this pathway are transcribed in situ in bacteria. represent the average ± s.d. of triplicate cultures. We also verified the first physiological comple- mentation of a novel RubisCO that can support CO2 fixation from a metagenomically derived sequence from the bacterial candidate phylum Conflict of Interest radiation. Future research is needed to identify the currently unknown but co-expressed flanking The authors declare no conflict of interest. genes to better understand the regulatory as well as complete metabolic role for RubisCO in Acknowledgements these small, apparently obligatory fermentative This work is partially based upon work supported bacteria. through the Lawrence Berkeley National Laboratory’s The presence of form II/III and a currently novel, (LBNL) Sustainable Systems Scientific Focus Area. The undefined form of RubisCO in bacteria with a US Department of Energy (DOE), Office of Science, Office similar metabolic context (for example, anaerobic, of Biological and Environmental Research funded the non-CBB functionality) to RubisCO in archaea work under contract DE-AC02-05CH11231 (LBNL; oper- suggests a deep ancestry of this non-CBB cycle ated by the University of California) and DE-SC0004918 enzyme. Of particular note is that these bacterial to JB and NIH grant R01GM095742 to FRT. DNA RubisCOs occur in organisms with strikingly sequencing was conducted by the DOE Joint Genome limited metabolic potential and no detected Institute through the Community Science Program. RNA sequencing was performed at the DOE-supported respiratory capacity. This metabolism, where Environmental Molecular Sciences Laboratory at PNNL ribose is suggested to have a key role in carbon as part of an award to JFB, KCW, MJW and KHW. We compound transformations and energy generation, thank Lye Meng Markillie and Ronald C Taylor for may have been important in organisms that existed transcriptomic sequencing. We thank Professor B Alber before the emergence of respiration (Schönheit at the Ohio State University for discussions on microbial et al., 2015). metabolism. Genomes and raw reads in this manuscript

The ISME Journal RubicCO in uncultivated bacteria of the CPR KC Wrighton et al 2713 are deposited in NCBI under the accession numbers Dick GJ, Andersson AF, Baker BJ, Simmons SL, Thomas (pending) OP11_GWA2_42_18: LCDD00000000; OP11 BC, Yelton AP et al. (2009). Community-wide analysis _GWA2_43_14: LCFP00000000; PER_GWA2_38_35: of microbial genome sequence signatures. Genome Biol LBUV00000000; PER_GWF2_38_29: LBUS00000000; PE 10: R85. R_GWF2_39_17: LBWM00000000; WS6_GWF2_39_15: Dojka MA, Harris JK, Pace NR. (2000). Expanding the LBWK00000000; WS6_GWC1_33_20: LBOV00000000 known diversity and environmental distribution of an (BioProject: PRJNA273161 and BioSample: SAMN uncultured phylogenetic division of bacteria. Appl 03319638). RNA sequences have been deposited in Environ Microbiol 66: 1617–1621. the NCBI Sequence Read Archive under accession Edgar RC. (2004). MUSCLE: multiple sequence alignment number SRP050083. The genomes and relevant fasta with high accuracy and high throughput. Nucleic files for all mentioned proteins are available via the Acids Res 32: 1792–1797. website: http://ggkbase.berkeley.edu/, under the project Edgar RC. (2010). Search and clustering orders of magni- name ‘RubiscO’ under the link entitled ‘Rifle ground- tude faster than BLAST. Bioinformatics 26: 2460–2461. water metagenome’. Estelmann S, Ramos-Vera WH, Gad’on N, Huber H, Berg IA, Fuchs G. (2011). Carbon dioxide fixation in ‘Archae- oglobus lithotrophicus’: are there multiple autotrophic pathways? FEMS Microbiol Lett 319:65–72. Falcone DL, Tabita FR. (1993). Complementation analysis

and regulation of CO2 fixation gene expression in References a ribulose 1,5-bisphosphate carboxylase-oxygenase deletion strain of Rhodospirillum rubrum. J Bacteriol Abascal F, Zardoya R, Posada D. (2005). ProtTest: selection – of best-fit models of protein evolution. Bioinformatics 175: 5066 5077. 21: 2104–2105. Finn MW, Tabita FR. (2003). Synthesis of catalytically active form III ribulose 1,5-bisphosphate carboxylase/ Alonso H, Blayney MJ, Beck JL, Whitney S. (2009). – Substrate-induced assembly of Methanococcoides bur- oxygenase in archaea. J Bacteriol 185: 3049 3059. tonii D-ribulose-1,5-bisphosphate carboxylase/oxyge- Finn MW, Tabita FR. (2004). Modified pathway to nase dimers into decamers. J Biol Chem 244: synthesize ribulose 1,5-bisphosphate in methanogenic archaea. J Bacteriol 186: 6360–6366. 33876–33882. Gong J, Qing Y, Guo X, Warren A. (2014). ‘Candidatus Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Sonnebornia yantaiensis’, a member of candidate (1990). Basic local alignment search tool. J Mol Biol division OD1, as intracellular bacteria of the ciliated 215: 403–410. protist Paramecium bursaria (Ciliophora, Oligohyme- Aono R, Sato T, Imanaka T, Atomi H. (2015). A pentose nophorea). Syst Appl Microbiol 37:35–41. bisphosphate pathway for nucleoside degradation in He X, McLean JS, Edlund A, Yooseph S, Hall AP, Liu S-Y Archaea. Nat Chem Biol 11: 355–360. et al. (2015). Cultivation of a human-associated Aono R, Sato T, Yano A, Yoshida S, Nishitani Y, Miki K TM7 phylotype reveals a reduced genome and et al. (2012). Enzymatic characterization of AMP epibiotic parasitic lifestyle. Proc Natl Acad Sci USA phosphorylase and ribose-1,5-bisphosphate isomerase 112: 244–249. functioning in an archaeal AMP metabolic pathway. – Hug LA, Castelle CJ, Wrighton KC, Thomas BC, Sharon I, J Bacteriol 194: 6847 6855. Frischkorn KR et al. (2013). Community genomic Ashida H, Saito Y, Nakano T, Tandeau de Marsac N, analyses constrain the distribution of metabolic traits Sekowska A, Yokota A. (2008). RubisCO-like proteins across the Chloroflexi phylum and indicate roles in as the enolase enzyme in the methionine salvage sediment carbon cycling. Microbiome 1: 22. pathway: functional and evolutionary relationships Hyatt D, LoCascio PF, Hauser LJ, Uberbacher EC. (2012). between RubisCO-like proteins and photosynthetic Gene and translation initiation site prediction in meta- – RubisCO. J Exp Bot 59: 1543 1554. genomic sequences. Bioinformatics 28: 2223–2230. Baker BJ, Lazar CS, Teske AP, Dick GJ. (2015). Genomic Joshi HM, Tabita FR. (1996). A global two component resolution of linkages in carbon, nitrogen, and sulfur signal transduction system that integrates the control cycling among widespread estuary sediment bacteria. of , carbon dioxide assimilation, and Microbiome 3: 14. nitrogen fixation. PNAS 93: 14515–14520. Brown CT, Hug LA, Thomas BC, Sharon I, Castelle CJ, Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M. (2012). Singh A et al. (2015). Unusual biology across a group KEGG for integration and interpretation of large-scale comprising more than 15% of domain Bacteria. Nature molecular data sets. Nucleic Acids Res 40: D109–D114. 523: 208–211. Kanehisa M, Goto S. (2000). KEGG: Kyoto Encyclopedia of Campbell JH, O’Donoghue P, Campbell AG, Schwientek P, Genes and Genomes. Nucleic Acids Res 28:27–30. Sczyrba A, Woyke T et al. (2013). UGA is an additional Kantor RS, Wrighton KC, Handley KM, Sharon I, Hug LA, glycine codon in uncultured SR1 bacteria from Castelle CJ et al. (2013). Small genomes and sparse the human microbiota. Proc Natl Acad Sci USA 110: metabolisms of sediment-associated bacteria from four 5540–5545. candidate phyla. MBio 4: e00708–e00713. Castelle CJ, Hug LA, Wrighton KC, Thomas BC, Williams KH, Konstantinidis KT, Rosselló-Móra R. (2015). Classifying Wu D et al. (2013). Extraordinary phylogenetic diversity the uncultivated microbial majority: a place for and metabolic versatility in aquifer sediment. Nat metagenomic data in the Candidatus proposal. Syst Commun 4: 2120. Appl Microbiol 38: 223–230. Castelle CJ, Wrighton KC, Thomas BC, Hug LA, Brown CT, Kreel NE, Tabita FR. (2007). Substitutions at methionine Wilkins MJ et al. (2015). Genomic expansion of domain 295 of Archaeoglobus fulgidus ribulose-1,5-6bispho- archaea highlights roles for organisms from new phyla sphate carboxylase/oxygenase affect oxygen binding in anaerobic carbon cycling. Curr Biol 25: 690–701. and CO2/O2 specificity. J Biol Chem 282: 1341–1351.

The ISME Journal RubicCO in uncultivated bacteria of the CPR KC Wrighton et al 2714 Kresge N, Simoni RD, Hill RL. (2005). The discovery of 1,5-bisphosphate carboxylase/oxygenase. J Mol Biol 331: heterotrophic carbon dioxide fixation by Harland 557–569. G. Wood. J Biol Chem 280: e15. Stamatakis A. (2014). RAxML version 8: a tool for Lloyd KG, Schreiber L, Petersen DG, Kjeldsen KU, Lever MA, phylogenetic analysis and post-analysis of large phy- Steen AD et al. (2013). Predominant archaea in marine logenies. Bioinformatics 30: 1312–1313. sediments degrade detrital proteins. Nature 496:215–218. Suzek BE, Huang H, McGarvey P, Mazumder R, Wu CH. Luef B, Frischkorn KR, Wrighton KC, Holman H-YN, (2007). UniRef: comprehensive and non-redundant Uni- Birarda G, Thomas BC et al. (2015). Diverse unculti- Prot reference clusters. Bioinformatics 23: 1282–1288. vated ultra-small bacterial cells in groundwater. Nat Tabita FR, Hanson TE, Li H, Satagopan S, Singh J, Chan S. Commun 6: 6372. (2007). Function, structure, and evolution of the McKinlay JB, Harwood CS. (2010). Carbon dioxide fixation RubisCO-like proteins and their RubisCO homologs. as a central recycling mechanism in Microbiol Mol Biol Rev 71: 576–599. bacteria. Proc Natl Acad Sci 107: 11669–11675. Tabita FR, Satagopan S, Hanson TE, Kreel NE, Scott SS. Nawrocki E. (2009). Structural RNA homology search and (2008). Distinct form I, II, III, and IV RubisCO proteins alignment using covariance models. All Theses and from the three kingdoms of life provide clues about Dissertations (ETDs). Paper 256. http://openscholar RubisCO evolution and structure/function relation- ship.wustl.edu/etd/256. ships. J Exp Bot 59: 1515–1524. Nawrocki EP, Kolbe DL, Eddy SR. (2009). Infernal Talavera G, Castresana J. (2007). Improvement of phylo- 1.0: inference of RNA alignments. Bioinformatics 25: genies after removing divergent and ambiguously 1335–1337. aligned blocks from protein sequence alignments. Syst Ormerod JG, Ormerod KS, Gest H. (1961). Light-dependent Biol 56: 564–577. utilization of organic compounds and photoproduction Tichi MA, Tabita FR. (2001). Interactive control of of molecular hydrogen by photosynthetic bacteria; Rhodobacter capsulatus redox balancing systems relationships with nitrogen metabolism. Arch Biochem during phototrophic metabolism. J Bacteriol 183: Biophys 94: 449–463. 6344–6354. Peng Y, Leung HCM, Yiu SM, Chin FYL. (2012). IDBA-UD: Williams KH, Long PE, Davis JA, Wilkins MJ, N’Guessan AL, a de novo assembler for single-cell and metagenomic Steefel CI et al. (2011). Acetate availability and sequencing data with highly uneven depth. Bioinfor- its influence on sustainable bioremediation of matics 28: 1420–1428. uranium-contaminated groundwater. Geomicrobiol J Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P 28: 519–539. et al. (2013). The SILVA ribosomal RNA gene database Wrighton KC, Castelle CJ, Wilkins MJ, Hug LA, Sharon I, project: improved data processing and web-based tools. Thomas BC et al. (2014). Metabolic interdependencies Nucleic Acids Res 41: D590–D596. between phylogenetically novel fermenters and Rinke C, Lee J, Nath N, Goudeau D, Thompson B, Poulton N respiratory organisms in an unconfined aquifer. ISME et al. (2014). Obtaining genomes from uncultivated J 8: 1452–1463. environmental microorganisms using FACS-based single- Wrighton KC, Thomas BC, Sharon I, Miller CS, Castelle CJ, cell genomics. Nat Protoc 9:1038–1048. VerBerkmoes NC et al. (2012). Fermentation, hydro- Satagopan S, Chan S, Perry LJ, Tabita FR. (2014). Structure- gen, and sulfur metabolism in multiple uncultivated function studies with the unique hexameric bacterial phyla. Science 337: 1661–1665. form II ribulose-1,5-bisphosphate carboxylase/oxyge- nase (RubisCO) from Rhodopseudomonas palustris. J Biol Chem 289: 21433–21450. This work is licensed under a Creative Satagopan S, Scott SS, Smith TG, Tabita FR. (2009). Commons Attribution-NonCommercial- A RubisCO mutant that confers growth under a ShareAlike 4.0 International License. The images or normally ‘inhibitory’ oxygen concentration. Biochem- other third party material in this article are included – istry 48: 9076 9083. in the article’s Creative Commons license, unless Sato T, Atomi H, Imanaka T. (2007). Archaeal type III indicated otherwise in the credit line; if the material RubisCOs function in a pathway for AMP metabolism. Science 315: 1003–1006. is not included under the Creative Commons license, Schönheit P, Wolfgang B, Martin WF. (2015). On the origin users will need to obtain permission from the license of heterotrophy. Trends Microbiol 24:12–25. holder to reproduce the material. To view a copy of Smith SA, Tabita FR. (2003). Positive and negative selection this license, visit http://creativecommons.org/licenses/ of mutant forms of prokaryotic (cyanobacterial) ribulose- by-nc-sa/4.0/

Supplementary Information accompanies this paper on The ISME Journal website (http://www.nature.com/ismej)

The ISME Journal