Supporting Information
Campbell et al. 10.1073/pnas.1303090110 (www.pnas.org/cgi/content/short/1303090110)

SI Results

Human Oral SR1 Bacterial Genome and the Oral SR1 Pangenome. A healthy human oral subgingival plaque sample was used for bacterial single-cell sorting by flow cytometry. Following genomic multiple displacement amplification (MDA) using phi29 DNA polymerase and characterization of single amplified genomes (SAGs) by small subunit (SSU) rRNA gene amplification and sequencing, we identified one SAG corresponding to an SR1 bacterium (SR1-OR1). The bacterium's SSU rRNA sequence is 99–100% identical to that of several other human oral and skin SR1 clones from GenBank (Fig. 1). The SAG DNA was sequenced using both 454 Titanium and Illumina HiSeq platforms, generating 124.8 Mbp and 21 Gbp of sequence data, respectively.

Computational normalization and hybrid assembly resulted in 56 contigs, totaling 0.46 Mbp of genomic sequence. Initial gene calls and annotation were performed using the Integrated Microbial Genomes (IMG)/M system. The SR1-OR1 predicted protein sequences were used to search for close homologs in the 415 oral Human Microbiome Project (HMP) metagenomic assemblies. Metagenomic scaffolds containing regions syntenic with the SAG and with high amino acid and nucleotide similarity levels (>95%) were used for a hybrid, pangenomic superassembly. Over 70 scaffolds from 13 metagenomes (supragingival plaque and tongue) were used in the superassembly (Table S1). Even though the metagenomic scaffolds originate from samples collected from different individuals and therefore do not represent a clonal population of SR1 bacteria or even a single phylotype, we applied the supra-assembly to not only improve our SR1-OR1 single-cell assembly but also to assess host-dependent microheterogeneity within SR1. First, the metagenomic scaffolds that had the highest similarity level (identical protein sequence, synonymous DNA substitutions allowed on overlapping regions) to the SAG dataset were used to expand the SAG assembly by bridging the single-cell data into larger contigs. PCR walking from SAG DNA and remapping of Illumina reads were also applied to minimize the heterogeneity introduced by the different metagenomic scaffolds. This process resulted in an expanded SR1-OR1 assembly of 1.1 Mbp in 49 scaffolds, with an average G+C content of 36%. Second, overlapping independent metagenomic regions, which in some cases included >10 scaffolds, allowed analyses of variability between microbiota samples and human hosts, including distinguishing oral SR1 strains or species present in the human population.

TGA Use in RubisCO Alleles Across the Pangenome. For the SR1 ribulose-1,5-bisphosphate carboxylase (RubisCO) gene, we identified 65 alleles in the HMP oral metagenomes, and mapped four sites with alternative use of TGA vs. GGA codons (Fig. 4 and Dataset S1). PCR amplification and sequencing of a fragment of the RubisCO gene from six individuals (healthy and with periodontitis) confirmed these polymorphisms in the oral microbiota. Phylogenetic analysis of 30 full-length alleles from HMP metagenomes and the SR1-OR1 copy grouped the SR1 RubisCO genes in two main clades that are congruent with the groupings in the 16S rRNA phylogeny (Fig. S6). For several metagenomes, alleles representing both clades were present, indicating that an individual can harbor several SR1 phylotypes. The average nucleotide similarity level ranges from 96% within clades to 84% between the two clades. Only one of the four reprogrammed TGA locations appears correlated with the phylogenetic grouping of the RubisCO genes (Fig. 4 and Fig. S6). The gene is under strong natural selection, as the ratio of nonsynonymous to synonymous substitution rates (dN/dS) across the RubisCO tree is low (0.023, SD = 0.005). Separate analysis of another gene (thymidine phosphorylase, deoA) across the SR1 pangenome space (10 alleles) yielded a similar result (dN/dS = 0.05, SD = 0.017). This finding suggests the low G+C content of the SR1 genome and reprogramming are not results of neutral mutational drift under relaxed selection, which would be signaled by high dN/dS values, as in the genomes of clonal bacterial pathogens (1).

SI Discussion

UGA-decoding tRNAGly suppressors isolated from selection experiments (reviewed in ref. 2) were found to contain full or partial modification of A37, which is normally unmodified in canonical tRNAGly, to 2-methylthio-N6-isopentenyladenosine (ms2i6A). The modification enhances UGA translation (3) and increases mistranslation (4). Clear homologs of the gene (miaA) encoding the ms2i6A modification enzyme are present in SR1, ACD78, and ACD80 genomes. Although glycyl-tRNA synthetase (GlyRS) activity is typically not sensitive to modified bases (5), it is possible that modification at position 37 or at other locations may improve UGA decoding for the SR1 and ACD78 tRNAs in their native cellular context.

SI Experimental Procedures

Sample Collection. Subgingival samples (crevicular fluid and biofilm) were collected from a healthy volunteer using medium paperpoints. Signed informed consent was obtained and the Oak Ridge National Laboratory Institutional Review Board approved the human subject protocol. The paperpoints were pooled and homogenized by vortexing in sterile PBS to resuspend the cells, followed by filtration through a CellTrics 30-μm disposable filter (Partec). The bacterial suspension was fixed with an equal volume of absolute ethanol at −20 °C overnight. Fixed cells were centrifuged at 3,000 × g for 5 min at 4 °C and washed once with 1× PBS before sorting.

Single-Cell Sorting, MDA, and Taxonomic Characterization. A bacterial cell sample was stained with the nucleic acid dyes SYTO 9 (green) and SYTO 62 (red) (Life Technologies), each at 5 μM for 15 min. Flow cytometry cell sorting was performed using a Cytopeia Influx cell sorter (BD), which was cleaned as previously described (6, 7). Several thousand cells were sorted individually, based on green and red fluorescence, into 96-well plates containing 3 μL TE per well.

MDA was performed in 20-μL reactions essentially as described in ref. 6, using a dedicated DNA-free hood, reagents and plasticware (8). Briefly, cells were lysed by addition of 3 μL of 0.13 M KOH, 3.3 mM EDTA pH 8.0, and 27.7 mM DTT, heated to 95 °C for 30 s, and placed on ice for 10 min. Neutralization buffer was added (3 μL of 0.13 M HCl, 0.42 M Tris pH 7.0, 0.18 M Tris pH 8.0) followed by 11 μL of amplification mix that contained 90.9 μM of 3′-end phosphorothioate-protected random hexamers (Integrated DNA Technologies), 1.09 mM dNTPs (Roche), 1.8× phi29 DNA polymerase buffer (New England BioLabs), 4 mM DTT (Roche), and 100 U phi29 DNA polymerase (9). Amplification was for 10 h at 30 °C followed by inactivation at 80 °C for 20 min. To increase the amount of available DNA, secondary MDAs were performed on 0.5 μL of primary MDA product using the same protocols used for primary MDAs, with the exception of amplification duration (6 h). MDA products were purified using standard phenol:chloroform:isoamyl alcohol extraction and alcohol precipitation.

The amplified products (SAGs) were screened for the presence of bacterial SSU rRNA genes using PCR amplification with the primers 27Fm (5′-AGAGTTTGATYMTGGCTCAG-3′) and 1492R (5′-TACCTTGTTACGACTT-3′) followed by direct Sanger sequencing of the products. The sequences were classified using the Ribosomal Database Project (10) and compared with previously known oral bacteria using CORE (11). Over 200 SAGs representing a large diversity of oral bacteria were obtained from this healthy oral sample, including one representing candidate phylum SR1, previously described in oral microbial rRNA diversity analyses as “candidate division SR1 bacterium taxon 345” (12).

Analysis of SR1 Body Site Distribution and Phylogenetic Reconstructions. SSU rRNA sequences (454 pyrosequences of the V3–5 hypervariable region) corresponding to SR1 bacteria that colonize different human body niches (13) were obtained from the HMP Data Analysis and Coordination Center (www.hmpdacc.org). The sequences were aligned, preclustered (14), and clustered into operational taxonomic units (OTUs) at 97% sequence identity using mothur (15). The resulting OTU table was structured with body sites as samples. Raw OTU counts were converted to percentage within sample, square-root transformed, used to calculate a Bray–Curtis resemblance matrix and visualized with nonmetric multidimen-

21) and trimmed reads that contain unique k-mers. These filtering steps reduced the dataset to 0.6 million reads.

Assembly was performed in several steps: (i) Filtered Illumina reads were assembled using Velvet v1.1.04 (21). The VelvetOptimizer script (v2.1.7) was used with default optimization functions (n50 for k-mer choice, total number of base pairs in large contigs for cov_cutoff optimization). (ii) The Velvet contigs were used to simulate reads from long-insert libraries, which were used together with the filtered reads as input for Allpaths-LG (22) assembly. (iii) Next, Allpaths contigs larger than 1 kb were shredded into 1-kb pieces with 200-bp overlaps. (iv) Finally, the Allpaths shreds and raw 454 pyrosequence reads were assembled using the 454 Newbler assembler v2.4 (Roche/454 Life Sciences). This process resulted in a total assembly size of 460,033 bp (56 contigs, N50: 14,267 bp). Further inspection of the draft 454-Illumina hybrid assembly and contig refinement was performed using Geneious v.5.6 (19).

Metagenome-Assisted SR1 Pangenome Assembly. Predicted protein sequences from various regions of the SR1 contigs were used to search for close homologs in the 415 HMP metagenomic datasets deposited in IMG_HMP (23) using blastp. Top hits that were part of relatively large contigs (>1 kb), with significant e values (<10e-20) and with DNA composition similar to that of the SR1 genomic assembly (35–38% G+C) were retrieved for further analysis.
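
The three retrieval criteria for top hits (contig length, e value, and G+C content) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Hit` record, the function names, and the way a hit is paired with its contig sequence are all hypothetical, introduced only for illustration. The e-value cutoff is copied as written in the text (10e-20, which Python reads as 1e-19); whether 1e-20 was intended cannot be settled from the text, so it is left as a parameter.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record pairing a blastp top hit with its source contig."""
    contig_id: str
    contig_seq: str   # nucleotide sequence of the metagenomic contig
    evalue: float     # e value of the blastp hit


def gc_percent(seq: str) -> float:
    """Return the G+C content of a nucleotide sequence as a percentage."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def keep_hit(hit: Hit,
             min_len: int = 1000,          # ">1 kb" in the text
             max_evalue: float = 10e-20,   # as written in the text (assumption)
             gc_range: tuple = (35.0, 38.0)) -> bool:
    """Apply the three criteria: contig length, e value, and G+C window."""
    long_enough = len(hit.contig_seq) > min_len
    significant = hit.evalue < max_evalue
    gc_ok = gc_range[0] <= gc_percent(hit.contig_seq) <= gc_range[1]
    return long_enough and significant and gc_ok


def filter_hits(hits):
    """Return only the hits that satisfy all three criteria."""
    return [h for h in hits if keep_hit(h)]
```

A contig is kept only when all three conditions hold, so a long, significant hit on a contig with, say, 50% G+C would be discarded as unlikely to derive from SR1.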