
UGA is an additional glycine codon in uncultured SR1 from the human microbiota

James H. Campbell^a,1, Patrick O’Donoghue^b,1, Alisha G. Campbell^a,c, Patrick Schwientek^d, Alexander Sczyrba^e, Tanja Woyke^d, Dieter Söll^b,f,2, and Mircea Podar^a,c,g,2

^aBiosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831; ^bDepartment of Molecular Biophysics and Biochemistry and ^fDepartment of Chemistry, Yale University, New Haven, CT 06520; ^cGenome Science and Technology Program and ^gDepartment of Microbiology, University of Tennessee, Knoxville, TN 37998; ^dDepartment of Energy Joint Genome Institute, Walnut Creek, CA 94598; and ^eCenter for Biotechnology, Bielefeld University, 33501 Bielefeld, Germany

Contributed by Dieter Söll, February 21, 2013 (sent for review February 5, 2013)

The composition of the human microbiota is recognized as an important factor in human health and disease. Many of our cohabitating microbes belong to phylum-level divisions for which there are no cultivated representatives and are only represented by small subunit rRNA sequences. For one such taxon (SR1), which includes bacteria with elevated abundance in periodontitis, we provide a single-cell genome sequence from a healthy oral sample. SR1 bacteria use a unique genetic code. In-frame TGA (opal) codons are found in most genes (85%), often at loci normally encoding conserved glycine residues. UGA appears not to function as a stop codon and is in equilibrium with the canonical GGN glycine codons, displaying strain-specific variation across the human population. SR1 encodes a divergent tRNAGlyUCA with an opal-decoding anticodon. SR1 glycyl-tRNA synthetase acylates tRNAGlyUCA with glycine in vitro with similar activity compared with normal tRNAGlyUCC. Coexpression of SR1 glycyl-tRNA synthetase and tRNAGlyUCA in Escherichia coli yields significant β-galactosidase activity in vivo from a lacZ gene containing an in-frame TGA codon. Comparative genomic analysis with Human Microbiome Project data revealed that the human body harbors a striking diversity of SR1 bacteria. This is a surprising finding because SR1 is most closely related to bacteria that live in anoxic and thermal environments. Some of these bacteria share common genetic and metabolic features with SR1, including UGA to glycine reassignment and an archaeal-type ribulose-1,5-bisphosphate carboxylase (RubisCO) involved in AMP recycling. UGA codon reassignment renders SR1 genes untranslatable by other bacteria, which impacts horizontal gene transfer within the human microbiota.

aminoacyl-tRNA synthetase | oral microbiome | single-cell sequencing

Author contributions: P.O., D.S., and M.P. designed research; J.H.C., P.O., and A.G.C. performed research; J.H.C., P.O., P.S., A.S., T.W., D.S., and M.P. analyzed data; and J.H.C., P.O., D.S., and M.P. wrote the paper.

The authors declare no conflict of interest.

Data deposition: The SR1-OR1 final assembly sequence data have been submitted to GenBank under BioProject (accession no. PRJNA189303). The sequence data are readily available with full annotations under the Integrated Microbial Genomes portal at http://img.jgi.doe.gov/cgi-bin/w/main.cgi (taxon ID: 2517572135).

^1J.H.C. and P.O. contributed equally to this work.

^2To whom correspondence may be addressed. E-mail: [email protected] or podarm@ornl.gov.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1303090110/-/DCSupplemental.

5540–5545 | PNAS | April 2, 2013 | vol. 110 | no. 14 | www.pnas.org/cgi/doi/10.1073/pnas.1303090110

Candidate phylum SR1 includes cosmopolitan bacteria that are found in marine and terrestrial high-temperature environments, fresh-water lakes, and subsurface aquifers (1). There are no cultivated representatives of SR1. Environmental sequencing of small subunit (SSU) rRNA first identified these bacteria in contaminated aquifers (2). These bacteria are usually found in sulfur-rich and oxygen-limited environments, suggesting a potential microaerophilic, sulfur-based metabolism (3). SR1 bacteria also associate with animals and exist in termite and mammalian digestive tracts (1, 4, 5), as well as in the human oral cavity (6). SR1 is in low abundance in healthy oral microbiota (∼0.1% on average), but their abundance increases several-fold in patients with H2S-related malodor (7) and in periodontal disease (8).

Uncultivated microbial taxa, especially those in low abundance in the environment, are refractory to typical genomic and microbiological techniques. Single-cell genomics and genomic reconstruction using metagenomic data significantly advanced understanding of uncultivated microbes (9). To gain insight into the biology of human-associated SR1 bacteria, we used single-cell genomic sequencing of an SR1 cell isolated from a human oral sample. Human Microbiome Project (HMP) data expanded genomic coverage, enabled evolutionary analyses of SR1 bacteria across the human population, and revealed evidence of a unique genetic code.

At the time of its elucidation, the genetic code was thought to be universal and invariant (10). However, we now know of natural code expansion with selenocysteine and pyrrolysine (11). One organism, Acetohalobium arabaticum, dynamically varies the number of amino acids it encodes between 20 and 21, depending on its carbon source (12). The first variant genetic code, observed in human mitochondria (13), showed that the code is able to change over evolutionary time. There are now 18 nonstandard codes in various organisms or organelles, and half of these decode UGA as tryptophan (Trp). We provide genomic and biochemical evidence showing that SR1 and related bacteria require reassignment of UGA to glycine (Gly) to produce an active proteome.

Results

Human-Associated SR1 Bacteria Are Diverse and Specific to Body Niches. To determine the level of phylogenetic diversity of SR1 bacteria associated with the healthy human body, we used the SSU rRNA gene pyrosequencing data (V3–5 region) from the HMP consortium (14). Of ∼26 million sequences, 1,657 were assigned to the candidate phylum SR1. SR1 bacteria were identified in oral cavity samples and on the skin (15). Six major operational taxonomic units (OTUs) were identified based on sequences present in more than two copies, accounting for 99% of the data, with 17 additional OTUs being represented by one or two sequences. The SR1-OR1 genome data are from a single cell belonging to OTU1 (Fig. 1).

The distributions of SR1 OTUs in saliva and supra- and subgingival plaques are similar, but partitioned differently from those in the rest of the mouth (Fig. 1). SR1 from the throat, tonsils, tongue, and hard palate were distinct from those in saliva and other oral sites. Because saliva is present in all areas of the oral cavity, this heterogeneity suggests a degree of niche specialization for the different SR1 phylotypes. The skin harbors an approximately equal distribution of the major OTUs found in the mouth. Phylogenetic analysis of host-associated and free-living SR1 bacteria indicates that all human phylotypes and other animal-associated taxa classify within the subgroup III (1) (Fig. S1). This phylogeny separates the free environmental members of the SR1 phylum from the host-associated ones and partitions the human lineages in two distinct groups, each including oral and skin types.

[Figure 1 image. Body sites: supragingival plaque, subgingival plaque, saliva, throat, tongue dorsum, hard palate, palatine tonsils, retroauricular crease (s), buccal mucosa, keratinized gingiva, left antecubital fossa (s), right antecubital fossa (s). Legend: OTU# 1–6. Axes: % Similarity; % of SR1 OTU/site.]

Fig. 1. Hierarchical clustering of bacterial SR1 OTU abundance (Bray-Curtis similarity matrices) by body sites and corresponding frequency distribution of the six major SR1 OTUs at those sites. The skin SR1 sequences (indicated by “s”) were combined for OTU abundance calculation due to their low numbers. SR1-OR1 belongs to OTU#1.

Oral SR1 Evolved from a Deep-Branching Bacterial Phylum with a Unique Genetic Code. Based on the single-cell genomic sequence, the draft SR1-OR1 genome was assembled from 56 contigs totaling 0.46 Mbp. Initial gene prediction revealed abundant, short, contiguous ORFs with identical annotations but separated by canonical terminator TGA codons (Fig. S2). Sequence alignments of conserved proteins revealed a high frequency of interruptions of SR1-OR1 genes by TGA codons at highly conserved Gly positions across Bacteria. For example, the translation initiation factor IF2 is interrupted at three locations that are 95–100% Gly in over 600 homologs from all . Both RNA polymerase β and β′ have TGA at invariant Gly positions (motifs GDK and GRFR). Most of those interruptions were also found in over 70 highly similar scaffolds identified in 13 HMP oral metagenomes (16). These data expanded the SR1-OR1 assembly to over 1.1 Mbp in 49 contigs (SI Results and Table S1).

In addition to a canonical tRNAGlyUCC, the SR1 genome encodes an unusual tRNAGly-like sequence with an opal decoding anticodon (5′-UCA-3′). Although only half of the molecule (T-arm, acceptor stem) is similar to normal tRNAGly sequences, tRNAGlyUCA includes most of the glycyl-tRNA synthetase (GlyRS) identity elements (17): that is, particular nucleotides (U73, G1:C72, C2:G71, G3:C70, C35, C36) required for GlyRS recognition and glycylation (Fig. S3). These data indicate that oral SR1 bacteria use the opal stop codon (UGA) for Gly. Identical tRNA genes were found in two SR1 oral HMP sequences, and similar tRNAGlyUCA sequences are evident in the genomes of related bacteria ACD78 and ACD80, which were reported to encode Trp with UGA (18). All three tRNAGlyUCA lack the identity elements (G73, A1:U72, G2:C71, G3:C70, C34, C35, A36) required for tryptophanyl-tRNA synthetase (TrpRS) activity (Fig. S3). U73 and G1:C72 in tRNAGlyUCA are incompatible with TrpRS function (19).

Gene mapping based on reassignment of TGA to Gly resulted in the prediction of 994 protein genes, with an ORF size distribution typical of bacteria (Figs. S2 and S4). Based on Bacteria-specific conserved single-copy genes (Table S2), we estimated the SR1-OR1 genome is ∼56% complete. As is common with uncultivated phyla, ∼35% of SR1’s predicted proteins have no homologs. A significant fraction of the proteins (40%) are most similar to their counterparts in ACD80 (18) (Fig. S4). RNA polymerase phylogeny confirmed that SR1 is closely related to ACD80, and both appear as deep branches in the bacterial (Fig. 2 and Fig. S5). Fellow uncultivated phyla BD1-5 (including ACD78), PER, TM7, OD1, and OP11 are the next most closely related taxa, and Chloroflexi are the nearest cultured relatives.

SR1 Is a Fermentative Anaerobe with an Archaeal Metabolic Trait. The incomplete genomic assembly combined with the absence of human-associated or environmental isolates limits our ability to experimentally characterize SR1 bacteria, yet genomic reconstruction provides initial inferences of the metabolism and lifestyle of these organisms. Genes encoding enzymes for several glycolysis steps were identified (phosphoglycerate mutase, phosphopyruvate hydratase, and pyruvate kinase) as well as for pyruvate formate lyase, which converts pyruvate to acetyl-CoA. Acetate kinase, potentially involved in substrate-level phosphorylation, is encoded in addition to subunits of an F1F0-type ATP synthase. As in its free-living relatives, SR1 show no evidence of a tricarboxylic acid cycle or electron transport chain components, suggesting SR1 are nonrespiring. The genome encodes over a dozen distinct peptidase families, a pectinase, and a glycosyl hydrolase, which may produce fermentable substrates.

SR1 bacteria encode a ribulose-1,5-bisphosphate carboxylase (RubisCO) gene classified as a distinct subfamily containing homologs from several methanogenic archaea (20) (Fig. S6 and

[Figure 2 image. Taxa shown: Synechococcus elongatus, Actinomyces odontolyticus, Rothia dentocariosa, Gardnerella vaginalis, Clostridium difficile, Faecalibacterium prausnitzii, Staphylococcus aureus, Gemella haemolysans, Catonella morbi, Enterococcus faecalis, Streptococcus pneumoniae, Herpetosiphon aurantiacus, Oscillochloris trichoides, Chloroflexus aurantiacus, Thermomicrobium roseum, Sphaerobacter thermophilus, Thermobaculum terrenum, Anaerolinea thermophila, Dehalogenimonas lykanthroporepellens, Dehalococcoides ethenogenes. Candidate phyla cluster: OP11, OD1 (OD1_ACD63), TM7 (TM7 oral taxon 349), PER (PER_ACD28), BD1-5 (BD1_ACD49), SR1 (SR1-OR1), ACD80. Scale bar: 0.3.]

Fig. 2. Maximum-likelihood phylogenetic tree of SR1 and related bacteria based on RNA polymerase protein sequences (β-β′ subunits). A phylogeny representing all Bacteria is shown in Fig. S5. Node labels denote branch support. The cluster of candidate phyla including SR1 is highlighted.

Dataset S1). Even though this type of RubisCO was shown to fix CO2 in an in vivo complementation assay (21), its physiological role is in the AMP-recycling pathway involving AMP phosphorylase (DeoA) and ribose 1,5-bisphosphate isomerase (E2b2). The resulting 3-phosphoglycerate is available to glycolysis or gluconeogenesis (22).

This AMP-recycling pathway was previously unknown in human-associated bacteria. SR1-related ACD80 and PER subsurface bacteria also encode the archaeal RubisCO (18). An inserted loop is found in the SR1 and ACD80 genes (Fig. S7). The SR1-like RubisCO sequences are highly similar to the archaeal homologs (73–76% identity). The bacterial DeoA and E2b2 sequences are also highly similar to their archaeal homologs, indicating horizontal gene transfer (HGT) of the entire pathway. Although it is not possible to define the trajectory of HGT with certainty, the bacterial sequences are apparently derived from archaeal ancestors as they branch within a larger group of exclusively archaeal relatives (Figs. S6 and S8). Archaeal RubisCO is oxygen-sensitive and is found in strict anaerobes, suggesting that in order for the RubisCO pathway to function, oral SR1 may require anaerobic or microaerophilic conditions. SR1-OR1 encodes an alkyl hydroperoxide reductase and a superoxide dismutase that may protect the cell from oxidative damage. Genes for general stress response proteins, protein processing (DnaJ, GrpE, Clp protease), and assembly of iron-sulfur (Fe-S) clusters were also identified.

The SR1-OR1 genome includes genes for murein biosynthesis and a tripeptide synthase that was suggested to confer Gram-positive characteristics to candidate phylum TM7 (9). Genes that provide resistance (hemolysin, capsule production protein) and possibly confer natural DNA competence (ComEC) are evident. There are no flagellum genes, but several genes for type II and type IV secretion systems were identified. Detection of pilus (PilB, C, T, and TraX) assembly genes suggests that limited mobility (twitching) and cell-to-cell interactions are possible. An FtsZ homolog indicates that SR1 cells likely divide using a “z-ring” mechanism.

TGA Exchanges with Canonical Gly Codons Across the Human Population. Inspection of the SR1-OR1 and SR1-type metagenomic scaffolds (i.e., SR1 “pangenome”) revealed that most TGAs are conserved across the HMP metagenome. In SR1-OR1, 85% of the predicted ORFs have at least one in-frame TGA and 24 genes (2.4%) encode over 20 TGAs (Fig. 3A). In some alleles, homologous sites contain canonical Gly codons (Table S3), and synonymous codon substitution of TGA for GGN is evident in comparing SR1 isolated from different individuals (Fig. 4). Analysis of ORFs with internal TGAs assigned to Gly did not reveal predicted peptides from two separate genes that were incorrectly joined and there was no unusual gene overlap. None of the ORFs appear to require UGA to stop translation, so it is not surprising that release factor 2, which directs termination at UGA, is absent.

The frequency of TGA codons is ∼24%. Canonical Gly codon use correlates with the low G+C content of the genome (GGA 42%, GGT 16%, GGG 13%, and GGC 3%). The same distribution was observed across the pangenome. By comparison, reprogramming in Mycoplasma capricolum is much more extensive, with 65% of the proteins containing exclusively UGA-encoded Trp (Fig. S2C). We observed evolution of TGA use across the SR1 pangenome. Except for elevated GGC use in SR1-OR1, Gly codon use for 2,083 alleles across 13 human donors is similar to that in the single cell (Fig. 3B and Table S4). At individual sites, TGA alternates primarily with GGA, but replacements with all GGNs occur. Among 71 oral SR1 RubisCO alleles, TGAs were found at four Gly loci, with a distribution that correlates with the phylogenetic grouping of oral SR1 OTUs (Fig. 4 and Dataset S1).

The nonsynonymous to synonymous substitution rates ratio (dN/dS) across the SR1 pangenome is low (0.023 for RubisCO), indicating strong natural selection and suggesting that low G+C content and possibly TGA reassignment do not result from neutral mutational drift (SI Results), as in clonal bacterial pathogens (23). The pangenome is dominated by a fivefold overabundance of synonymous transitions (A-G and C-T). The transversion that most likely drives the Gly reprogramming (GGA>TGA) is less frequent and competes with purine excess at the first codon position (62%) compared with 50% at second and third positions. In Mycoplasma, a drastic loss of G+C content at the third codon position (9% vs. ∼30% at first and second position) contributed to near complete replacement of the TGG Trp codon with TGA.

[Figure 3 image. (A) x axis: Number of TGA codons / ORF (0, 5, 10, 15, 20, 21+); y axis: Number of ORFs. (B) x axis: Glycine Codon (TGA, GGA, GGT, GGC, GGG); y axis: Usage in SR1-OR1/pangenome (%).]

Fig. 3. Codon use in SR1. (A) Frequency of internal TGAs in protein coding genes from SR1-OR1 and SR1 metagenomic scaffolds. (B) Box-and-whisker plot comparison of Gly codon use in the SR1-OR1 (♦) with that of predicted genes in all SR1 HMP metagenomic scaffolds (SR1 “pangenome”). Outliers are indicated with an asterisk.

SR1 GlyRS and tRNAGlyUCA Convert UGA to an Additional Gly Codon in Escherichia coli. SR1 encodes a canonical glycyl-tRNA synthetase (α-dimeric type), which is similar to the well-characterized Thermus thermophilus enzyme (24). Because SR1 bacteria are uncultivated and no genetic system exists, we tested the in vivo activity of SR1 GlyRS and tRNAGlyUCA variants in E. coli. A lacZ gene with a TGA codon at position 3 serves to report the level of translational read-through of UGA by β-galactosidase activity. Compared with the wild type β-galactosidase (Met3), endogenous Trp-tRNATrp significantly suppresses the stop codon function of UGA, leading to 15 ± 2% translational read-through of UGA (Fig. 5A) (25). Expression of SR1 tRNAGlyUCA leads to enhanced read-through of UGA (22 ± 2%), and expression of the SR1 GlyRS yields additional UGA translation (25 ± 3%). SR1 GlyRS and tRNAGlyUCA are active molecules when expressed in E. coli. When only tRNAGlyUCA is present, increased UGA translation is likely because of aminoacylation of the tRNA by E. coli’s native GlyRS. ACD78 (23 ± 1%) and ACD80 tRNAGlyUCA (52 ± 4%) both supported enhanced read-through of UGA compared with background (Fig. 5A). Although coexpression of SR1 GlyRS did not stimulate UGA suppression further for the ACD78 tRNA, glycylation of ACD80 tRNAGlyUCA by SR1 GlyRS leads to a high level (68 ± 2%) of UGA translation as Gly.

SR1 GlyRS Glycylates UGA-Decoding and Canonical tRNAGly Species. To verify glycine-accepting activity of the atypical tRNAGlyUCA variants (Fig. S3), we investigated aminoacylation activity of recombinant SR1 GlyRS with several tRNAGly substrates in vitro (Fig. 5B). The enzyme reaches a plateau for glycylation (17.5 ±

0.2% of total tRNA) with its canonical tRNAGly that is similar to the amount of Gly-tRNAGlyUCA formed (15.1 ± 1.2%). SR1 GlyRS displays reduced but significant activity with E. coli tRNAGly (11.4 ± 0.4%). ACD78 (24.0 ± 2.6%) and ACD80 (22.1 ± 0.2%) are the most active substrates, but only the ACD80 variant promotes efficient UGA read-through in vivo (Fig. 5A). The ACD78 tRNA may be less compatible with EF-Tu or translocation on the E. coli ribosome. The ACD78 tRNA sequence is distinct from the SR1 and ACD80 tRNAs, yet only mutations G15A and U47C separate the SR1 and ACD80 variants (Fig. S3).

[Figure 4 image. Columns: TGA codon # 186, 224, 256, 425; rows: human subject (HMP) identifiers and SR1-OR1. Legend: TGA, GGA. Clade dN/dS values: 0.021 (+/- 0.008) and 0.024 (+/- 0.002). Scale bar: 0.03.]

Fig. 4. Distribution of reprogrammed TGA codons in SR1 RubisCO genes. The type of Gly codon, TGA (black) or canonical GGA (white), at the four reprogrammed positions in the RubisCO SR1 pangenome is overlaid on the gene phylogeny, based on full-length sequences from HMP metagenomes and SR1-OR1. Each HMP number defines a different human donor. For four human subjects, two SR1 phylotypes were identified or genes were present in samples collected at different times (-2). Multiple sequences from donors 159814214 (*), 809635352 (grey rectangle), 370425937 (#), and 764447348 (oval) are indicated by superscript symbols. The average dN/dS values for the two main clades are indicated.

Discussion

Single-cell genomics and assemblies of metagenomes provide new opportunities to characterize uncultured constituents of complex microbial communities. Even though less diverse than many free-living communities, human and animal-associated microbiota harbor uncultured bacteria at all taxonomic levels (5, 14). We found that the human body harbors a surprising diversity of SR1 bacteria. ACD80 (18) are the closest relatives of SR1 (Fig. 2 and Figs. S4–S8) and part of a cluster of phyla without cultured representatives that are relatively deep branching in the bacterial domain (Fig. S5). The existence of human-associated species in phyla dominated by anaerobic thermophiles is intriguing. Conceivably, as with other bacterial phyla that predominantly consist of free-living species (e.g., candidate divisions TM7 and OP11, phyla Chloroflexi, Nitrospira) (6, 8, 26), SR1 may have come in contact with the human host, survived, and evolved as part of the resident microbiota.

SR1 are less phylogenetically diverse and abundant compared with more successful colonizers of the human microbiome (e.g., Firmicutes and ). SR1 either colonized human and animal hosts relatively recently or they are less adaptable or competitive than other taxa. Nevertheless, human oral and skin-associated SR1 present signs of diversification and potential niche specialization, with strains or species preferring microenvironments that offer different levels of oxygen, nutrient, or biofilm opportunities. Periodontal disease may be linked to increased SR1 abundance (8) but their role, if any, in the etiology of disease is unknown. Cultivation and complete sequencing of multiple lineages from this group, including free-living representatives, will broaden our view of the biology, evolution, and role of SR1 bacteria in human health.

Molecular Mechanism of UGA Reassignment. The most unexpected finding was that human oral SR1 bacteria use a novel variation of the genetic code, in which the UGA terminator has been reprogrammed to a Gly codon. SR1, ACD78, and ACD80 tRNAGlyUCA variants contain most of the identity elements required for GlyRS recognition (Fig. S3), but the opal decoding anticodon should inhibit GlyRS activity. T. thermophilus GlyRS is 5,000-fold less catalytically efficient with a tRNAGly mutant (C36A) with the UCA anticodon (27), yet SR1 GlyRS aminoacylates tRNAGlyUCC and opal decoding tRNAGly variants with similar activity. Some nucleotides in the divergent D-arm and acceptor stem of tRNAGlyUCA may compensate for the C36A mutation to enhance glycylation and optimize reassignment of UGA to Gly. ACD80 tRNAGlyUCA differs only at positions 15 and 47 from the SR1 tRNA, but shows increased Gly-tRNAGlyUCA production and threefold enhanced ribosomal decoding of UGA, yielding 70% translational read-through of UGA in vivo (Fig. 5).

Gly codon reassignment is rare and has only been detected in Pyura stolonifera mitochondria (28) and related genomes where Gly incorporation is directed by two arginine codons. Although SR1, ACD78, and ACD80 are the only organisms with UGA assigned to Gly, early work in E. coli produced tRNAGly variants that could translate UGA, other stop codons, and even sense codons (29). Replacing a conserved Gly in an essential gene with either a non-Gly sense codon or a stop codon provided a conditional lethal mutant. Selection experiments led to the isolation of tRNA variants that could suppress the lethal mutation. The cells were rescued by tRNAs that mistranslated sense codons as Gly (missense suppressor) or read-through stop codons with Gly (nonsense suppressor). The tRNAGly UGA suppressors isolated in these experiments differ only by one or two mutations from canonical tRNAGly and are less efficient in translating UGA [2–45% (30)] compared with the ACD80 tRNA. In their native cellular context, nucleotide modifications (SI Discussion) may improve UGA decoding for the SR1 and ACD78 tRNAs.

In the absence of an opal-decoding tRNA, background translation of UGA in E. coli and possibly in SR1 results from near-cognate translation of Trp-tRNATrp. This result leads to insertion of Trp in response to UGA in reporter proteins (Fig. 5) (25), and likely to varying levels throughout the proteome. When SR1 GlyRS and tRNAGlyUCA are coexpressed, the proteome may contain some level of Trp and Gly at UGA-encoded loci. As in the case of the ACD80 tRNA, however, increasing the cellular concentration of Gly-tRNAGlyUCA leads to enhanced UGA read-through that may reach a sufficient level to completely outcompete Trp. Given these data, it is fascinating that a recent report identified peptides containing Trp in response to UGA in ACD80 and ACD78 samples (18). Although some level of Trp incorporation at

[Figure 5 image. (A) y axis: % of wild type β-galactosidase activity. Conditions, left to right (LacZ reporter / tRNAGlyUCA / GlyRS): Wild type (M3) / empty / empty; M3UGA / empty / empty; M3UGA / SR1 / empty; M3UGA / SR1 / SR1; M3UGA / ACD78 / empty; M3UGA / ACD78 / SR1; M3UGA / ACD80 / empty; M3UGA / ACD80 / SR1. (B) x axis: time (minutes); y axis: % of glycyl-tRNA formed. Curves: ACD78 tRNAGlyUCA, ACD80 tRNAGlyUCA, SR1 tRNAGlyUCC, SR1 tRNAGlyUCA, E. coli tRNAGlyUCC.]

Fig. 5. Biochemical characterization of SR1 GlyRS and tRNAGlyUCA. (A) To assay UGA translation in vivo in E. coli, β-galactosidase was expressed from a lacZ reporter gene with either Methionine 3 (wild-type) or M3 mutated to TGA. β-Galactosidase activities, shown relative to the wild-type activity level, were measured in the presence or absence of tRNAGlyUCA variants (SR1, ACD78, and ACD80) and SR1 GlyRS. (B) In vitro glycylation activity of SR1 GlyRS with tRNAGlyUCA (SR1: □; ACD78: +; ACD80: ♢) and canonical tRNAGly (SR1 ○, E. coli △) variants. The data are based on triplicate experiments with 10 μM GlyRS and 0.2 μM tRNA.
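The reporter logic of Fig. 5A can be illustrated with a minimal sketch that translates the same reading frame under the standard code and under a code in which TGA is read as Gly. The names `STANDARD`, `SR1_CODE`, `translate`, and `REPORTER` are hypothetical, and the reporter is a toy fragment with TGA at codon 3, not the actual lacZ construct. The sketch is also idealized: it treats decoding as all-or-nothing, whereas the assay measures partial read-through.

```python
# Build the standard genetic code from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"   # first base T
         "LLLLPPPPHHQQRRRR"   # first base C
         "IIIMTTTTNNKKSSRR"   # first base A
         "VVVVAAAADDEEGGGG")  # first base G

STANDARD = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Assumption for illustration: the only change from the standard code
# is that TGA (opal) is decoded as glycine instead of stop.
SR1_CODE = dict(STANDARD, TGA="G")


def translate(dna, table):
    """Translate an in-frame DNA string until the first stop codon."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        residue = table[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Hypothetical toy reporter: ATG ACC TGA ATT ACG GAT TAA
REPORTER = "ATGACCTGAATTACGGATTAA"

print(translate(REPORTER, STANDARD))  # terminates at the in-frame TGA
print(translate(REPORTER, SR1_CODE))  # reads TGA as Gly and continues
```

Under the standard table the toy frame stops after two residues, which corresponds to the low background activity of the M3UGA reporter; under the reassigned table the full peptide is produced with Gly at position 3.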

UGA might be tolerated by the cell, many of the UGAs in SR1 and related bacteria encode essential or invariant Gly residues. Global replacement of these residues with Trp would likely lead to an inactive proteome. The GlyRS/tRNAGlyUCA pairs found in these organisms should produce sufficient Gly-tRNA to dominate UGA translation with Gly. Codon reassignment is a shared character of SR1, ACD78, and ACD80 that likely evolved before these deeply branching lineages diverged.

Evolutionary Consequences of Codon Reassignment. Although a specialized tRNA is required for codon reassignment, the forces that induce genetic code evolution and selectively maintain UGA as a sense codon are less obvious. Reduction in genome size (31) and genomic AT or GC bias (32, 33) are two known evolutionary phenomena linked to codon reassignment. Based on our estimated coverage, the SR1 genome is likely not larger than 2 Mb, and SR1 and related bacteria that reassign UGA (18) all have small AT-rich genomes. Theories of genetic code evolution differ mainly on whether a codon reassignment event will lead to ambiguous codon reading and mistranslation. Ambiguous codons are tolerated in Candida (34), but other organisms, such as M. capricolum (35), display more complete codon reassignment. Although we found no evidence in homologous ORFs of UGA-intended stop codons in SR1, it is not known if in vivo some of the UGAs in SR1 are read as stop or if they lead to different levels of read-through with Gly. In the human oral SR1 pangenome, we observed codon use variation across different strains and human hosts, showing how the code evolves in real time. The exchange of canonical Gly codons with TGA demonstrates that different SR1 strains interchangeably use these codons for Gly. In RubisCO and other genes, TGA for GGN exchange tracks with the phylogenetic separation of SR1 strains in distinct oral habitats (Figs. 1 and 4), resembling ecological differentiation in free-living bacteria (36).

Why do variant genetic codes evolve and why do lineages maintain code variations over evolutionary time? Differential fidelity or efficiency of Gly incorporation at UGA compared with canonical Gly codons could provide a selective “handle” for maintaining a variant code. Because it is unknown if any intrinsic phenotypic cost or benefit of this type exists, perhaps the advantage that SR1 derives from its variant code can only be understood in terms of its ecological context. This genetic code variation will bias the susceptibility of oral SR1 to phage predation and HGT. A highly diverse repertoire of phages is present in saliva, and by mediating HGT, phages significantly impact evolution of the microbial community and the distribution of pathogenicity islands (37). HGT is rampant within the human microbiota, outpacing that in free-living communities and enabling efficient adaptation to host niches, antibiotic resistance, and the emergence of

pathogenicity (38). We hypothesize that SR1’s altered genetic code allows the organism to successfully acquire genes from its environment, but severely limits the ability of foreign hosts to translate genetic material from SR1. In SR1, any transcribed foreign genetic material lacking UGA codons would be translated normally, but expression of some genes could result in read-through of UGA stop codons, potentially impacting protein folding and enzymatic activity. Translation of most SR1 genes, on the other hand, will lead to prematurely terminated peptides and inactive enzymes in essentially any other member of the oral microbiota. This genetic incompatibility partially isolates SR1 from the human microbiome gene pool, which may limit its adaptability, but also prevents SR1’s competitors from sharing its genomic innovations. SR1 bacteria, therefore, evolve as a quasi-independent taxonomic island within the complex community of microbes that colonize the human body.

Experimental Procedures

SR1 Single-Cell Genomics. A bacterium representing the candidate phylum SR1 was identified among cells randomly isolated by flow cytometry sorting from a healthy oral-subgingival sample. The genomes of individual bacterial cells were amplified using multiple-displacement amplification (39) followed by taxonomic characterization using rRNA gene amplification and sequencing. The amplified SR1 genomic DNA was sequenced using 454 Titanium and Illumina High Seq platforms. Following quality control and abundance normalization, sequences were assembled into a draft SR1-OR1 genome. See SI Experimental Procedures for experimental and computational details.

Expanding the SR1 Single-Cell Genome Using HMP Metagenomic Information. To identify scaffolds likely representing SR1 bacteria in HMP metagenomic data, sequence similarity searches were performed at protein and DNA levels in IMG_HMP (16). Scaffolds that had syntenic gene content with the SR1-OR1 contigs and that displayed high-sequence similarity (generally >95% at DNA level in coding regions) were used to expand the genomic coverage and to enable pangenomic analyses of SR1 genomic variability across the human microbiome. For taxonomic diversity analysis, the HMP SR1 SSU rRNA pyrosequence data were analyzed with respect to distributions at defined human body sites and abundance of individual species-level taxonomic units (SI Experimental Procedures).

Biochemical Assays. In vitro aminoacylation and in vivo UGA translation assays were performed as previously described (25). Experimental details are in SI Experimental Procedures.

ACKNOWLEDGMENTS. We thank S. Allman and Z. Yang (Oak Ridge National Laboratory) for technical assistance; T. Vishnivetskaya for providing Human Microbiome Project SR1 pyrosequences; Ilka Heinemann, Jiqiang Ling, and Laure Prat for inspired discussions; the Human Microbiome Project research community for providing sequence data; and the developers of Integrated Microbial Genomes for analysis platforms. This work was supported by National Institutes of Health Grants R01 HG004857 (to M.P.) and GM22854 (to D.S.); Defense Advanced Research Projects Agency Contract N660-12-C-4020 (to D.S.); the Oak Ridge National Laboratory (managed by the University of Tennessee-Battelle) via the US Department of Energy Contract DE-AC05-00OR22725; and US Department of Energy Joint Genome Institute and Department of Energy Contract DE-AC02-05CH11231 (to P.S., T.W., and A.S.).
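The translational incompatibility argued in the Discussion (SR1 genes yielding truncated peptides in hosts that read UGA as stop) can be illustrated with a minimal sketch. It assumes, for illustration only, that an SR1-like code equals the standard code with TGA (UGA) reassigned to glycine; the names `SR1_LIKE`, `translate`, and `toy_orf` are hypothetical, the toy ORF is not taken from the SR1 genome, and the sketch does not model in vivo effects such as competition with release factors.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Assumed SR1-like code: identical to the standard code except that
# TGA (UGA in mRNA) is read as glycine instead of stop.
SR1_LIKE = {**STANDARD, "TGA": "G"}


def translate(dna, table):
    """Translate an in-frame DNA string, stopping at the first stop codon."""
    protein = []
    usable = len(dna) - len(dna) % 3  # ignore trailing partial codon
    for i in range(0, usable, 3):
        aa = table[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical toy ORF: ATG AAA TGA GCT CTG TAA
toy_orf = "ATGAAATGAGCTCTGTAA"
print(translate(toy_orf, SR1_LIKE))  # MKGAL (in-frame TGA read as Gly)
print(translate(toy_orf, STANDARD))  # MK (terminated at the in-frame TGA)
```

Under these assumptions, the same ORF gives a full-length product with the SR1-like table and a truncated one with the standard table, which is the asymmetry the hypothesis relies on.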

1. Davis JP, Youssef NH, Elshahed MS (2009) Assessment of the diversity, abundance, and ecological distribution of members of candidate division SR1 reveals a high level of phylogenetic diversity but limited morphotypic diversity. Appl Environ Microbiol 75(12):4139–4148.
2. Dojka MA, Hugenholtz P, Haack SK, Pace NR (1998) Microbial diversity in a hydrocarbon- and chlorinated-solvent-contaminated aquifer undergoing intrinsic bioremediation. Appl Environ Microbiol 64(10):3869–3877.
3. Perner M, et al. (2007) Microbial CO(2) fixation and sulfur cycling associated with low-temperature emissions at the Lilliput hydrothermal field, southern Mid-Atlantic Ridge (9° S). Environ Microbiol 9(5):1186–1201.
4. Hongoh Y, Ohkuma M, Kudo T (2003) Molecular analysis of bacterial microbiota in the gut of the termite Reticulitermes speratus (Isoptera; Rhinotermitidae). FEMS Microbiol Ecol 44(2):231–242.
5. Ley RE, et al. (2008) Evolution of mammals and their gut microbes. Science 320(5883):1647–1651.
6. Dewhirst FE, et al. (2010) The human oral microbiome. J Bacteriol 192(19):5002–5017.
7. Takeshita T, et al. (2012) Discrimination of the oral microbiota associated with high hydrogen sulfide and methyl mercaptan production. Sci Rep 2:215.
8. Griffen AL, et al. (2012) Distinct and complex bacterial profiles in human periodontitis and health revealed by 16S pyrosequencing. ISME J 6(6):1176–1185.
9. Marcy Y, et al. (2007) Dissecting biological “dark matter” with single-cell genetic analysis of rare and uncultivated TM7 microbes from the human mouth. Proc Natl Acad Sci USA 104(29):11889–11894.
10. Crick FH (1968) The origin of the genetic code. J Mol Biol 38(3):367–379.
11. Yuan J, et al. (2010) Distinct genetic code expansion strategies for selenocysteine and pyrrolysine are reflected in different aminoacyl-tRNA formation systems. FEBS Lett 584(2):342–349.
12. Prat L, et al. (2012) Carbon source-dependent expansion of the genetic code in bacteria. Proc Natl Acad Sci USA 109(51):21070–21075.
13. Barrell BG, Bankier AT, Drouin J (1979) A different genetic code in human mitochondria. Nature 282(5735):189–194.
14. Huttenhower C, et al.; Human Microbiome Project Consortium (2012) Structure, function and diversity of the healthy human microbiome. Nature 486(7402):207–214.
15. Zhou Y, et al. (2013) Biogeography of the ecosystems of the healthy human body. Genome Biol 14(1):R1.
16. Markowitz VM, et al. (2012) IMG/M-HMP: A metagenome comparative analysis system for the Human Microbiome Project. PLoS ONE 7(7):e40151.
17. Giegé R, Sissler M, Florentz C (1998) Universal rules and idiosyncratic features in tRNA identity. Nucleic Acids Res 26(22):5017–5035.
18. Wrighton KC, et al. (2012) Fermentation, hydrogen, and sulfur metabolism in multiple uncultivated bacterial phyla. Science 337(6102):1661–1665.
19. Himeno H, Hasegawa T, Asahara H, Tamura K, Shimizu M (1991) Identity determinants of E. coli tryptophan tRNA. Nucleic Acids Res 19(23):6379–6382.
20. Tabita FR, Hanson TE, Satagopan S, Witte BH, Kreel NE (2008) Phylogenetic and evolutionary relationships of RubisCO and the RubisCO-like proteins and the functional lessons provided by diverse molecular forms. Philos Trans R Soc Lond B Biol Sci 363(1504):2629–2640.
21. McCabe K (2009) Investigation of the structure and enzymatic activity of the novel Rubisco from Methanococcoides burtonii. BS thesis (Ohio State University, Columbus, OH).
22. Sato T, Atomi H, Imanaka T (2007) Archaeal type III RuBisCOs function in a pathway for AMP metabolism. Science 315(5814):1003–1006.
23. Hershberg R, Petrov DA (2010) Evidence that mutation is universally biased towards AT in bacteria. PLoS Genet 6(9):e1001115.
24. Mazauric MH, et al. (1996) An example of non-conservation of oligomeric structure in prokaryotic aminoacyl-tRNA synthetases. Biochemical and structural properties of glycyl-tRNA synthetase from Thermus thermophilus. Eur J Biochem 241(3):814–826.
25. O’Donoghue P, et al. (2012) Near-cognate suppression of amber, opal and quadruplet codons competes with aminoacyl-tRNA(Pyl) for genetic code expansion. FEBS Lett 586(21):3931–3937.
26. Ley RE, Lozupone CA, Hamady M, Knight R, Gordon JI (2008) Worlds within worlds: Evolution of the vertebrate gut microbiota. Nat Rev Microbiol 6(10):776–788.
27. Nameki N, Tamura K, Asahara H, Hasegawa T (1997) Recognition of tRNA(Gly) by three widely diverged glycyl-tRNA synthetases: Evolution of tRNA recognition. Nucleic Acids Symp Ser (37):123–124.
28. Durrheim GA, Corfield VA, Harley EH, Ricketts MH (1993) Nucleotide sequence of cytochrome oxidase (subunit III) from the mitochondrion of the tunicate Pyura stolonifera: Evidence that AGR encodes glycine. Nucleic Acids Res 21(15):3587–3588.
29. Murgola EJ (1994) Translational suppression: When two wrongs DO make a right. tRNA: Structure, Biosynthesis and Function, eds Söll D, RajBhandary UL (ASM, Washington, DC), pp 491–510.
30. Kirsebom LA, Isaksson LA (1986) Functional interactions in vivo between suppressor tRNA and mutationally altered ribosomal protein S4. Mol Gen Genet 205(2):240–247.
31. Andersson GE, Kurland CG (1991) An extreme codon preference strategy: Codon reassignment. Mol Biol Evol 8(4):530–544.
32. Osawa S, Jukes TH (1988) Evolution of the genetic code as affected by anticodon content. Trends Genet 4(7):191–198.
33. Schultz DW, Yarus M (1994) Transfer RNA mutation and the malleability of the genetic code. J Mol Biol 235(5):1377–1380.
34. Santos MAS, Cheesman C, Costa V, Moradas-Ferreira P, Tuite MF (1999) Selective advantages created by codon ambiguity allowed for the evolution of an alternative genetic code in Candida spp. Mol Microbiol 31(3):937–947.
35. Inagaki Y, Bessho Y, Osawa S (1993) Lack of peptide-release activity responding to codon UGA in Mycoplasma capricolum. Nucleic Acids Res 21(6):1335–1338.
36. Shapiro BJ, et al. (2012) Population genomics of early events in the ecological differentiation of bacteria. Science 336(6077):48–51.
37. Pride DT, et al. (2012) Evidence of a robust resident bacteriophage population revealed through analysis of the human salivary virome. ISME J 6(5):915–926.
38. Smillie CS, et al. (2011) Ecology drives a global network of gene exchange connecting the human microbiome. Nature 480(7376):241–244.
39. Rodrigue S, et al. (2009) Whole genome amplification and de novo assembly of single bacterial cells. PLoS ONE 4(9):e6864.

Campbell et al. PNAS | April 2, 2013 | vol. 110 | no. 14 | 5545