Supplementary Online Information

1. Photographs of Octopus and Mushroom Springs. See Supplementary Figure 1.
2. Reference genomes used in this study. See Supplementary Table 1.
3. Detailed Materials and Methods.

DNA extraction. The uppermost 1 mm-thick green layer of each microbial mat core was removed with a razor blade, and DNA was extracted using either an enzymatic or a mechanical bead-beating lysis protocol. The two methods yielded different relative abundances of community members (see below) (Bhaya et al., 2007; Klatt et al., 2007). For enzymatic lysis and DNA extraction, frozen mat samples were thawed, resuspended in 100 μl Medium DH (Castenholz's Medium D with 5 mM HEPES, pH 8.2; Castenholz, 1988), and homogenized with a sterile mini-pestle in 2 ml screw-cap tubes. Medium DH (900 μl) was added to the homogenized sample, lysozyme (ICN Biomedicals, Irvine, CA) was added to ~200 μg ml⁻¹, and the mixture was incubated for 45 min at 37 °C. Sodium dodecyl sulfate (110 μl of a 10% (w/v) solution) and proteinase K (Qiagen, Valencia, CA; to 200 μg ml⁻¹) were then added, and the mixture was incubated on a shaker for 50 min at 50 °C. Microscopic analysis suggested efficient lysis of Synechococcus spp. cells but a possible bias against some filamentous community members (Supplementary Figure 2). Phase contrast micrographs were obtained with a Zeiss Axioskop 2 Plus (Carl Zeiss Inc., Thornwood, NY, USA) using a Plan NeoFluar magnification objective, and autofluorescence was detected using an HBO 100 mercury arc lamp as the excitation source and a standard epifluorescence filter set (Leistungselektronik Jena GmbH, Jena, Germany). DNA was purified by a series of organic extractions, the first with Tris-HCl-equilibrated phenol (pH 8.0) and three subsequent extractions with phenol:chloroform:isoamyl alcohol (25:24:1). Nucleic acids were precipitated at -20 °C by adding 2.5 volumes of ethanol and 0.1 volume of 3.0 M sodium acetate (pH 5.2).
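Several additions in the enzymatic lysis recipe are specified as final concentrations rather than as volumes. The sketch below works through the C1V1 = C2V2 arithmetic for the ~1 ml homogenate. The stock concentrations for lysozyme (10 mg/ml) and proteinase K (20 mg/ml) are hypothetical placeholders, since the protocol does not state them; only the 100 μl, 900 μl, 110 μl and 10% (w/v) SDS figures come from the text. Under these assumptions, 110 μl of 10% SDS brings the lysate to just under 1% (w/v).

```python
def volume_to_add(stock: float, target: float, current_volume: float) -> float:
    """Volume of stock that brings current_volume to the target concentration.

    Solves stock * v = target * (current_volume + v) for v; stock and target
    must share units, and v is returned in the units of current_volume.
    """
    if stock <= target:
        raise ValueError("stock must be more concentrated than the target")
    return target * current_volume / (stock - target)


def final_concentration(stock: float, added: float, final_volume: float) -> float:
    """C1 * V1 = C2 * V2, rearranged for C2."""
    return stock * added / final_volume


# Volumes (µl) and targets from the enzymatic protocol.
homogenate_ul = 100 + 900          # resuspension volume + additional Medium DH

# HYPOTHETICAL stock concentrations (not given in the protocol), in µg/ml.
LYSOZYME_STOCK = 10_000            # assumed 10 mg/ml
PROTEINASE_K_STOCK = 20_000        # assumed 20 mg/ml

# Lysozyme is dosed to ~200 µg/ml in the 1 ml homogenate.
lysozyme_ul = volume_to_add(LYSOZYME_STOCK, 200, homogenate_ul)
after_lysozyme_ul = homogenate_ul + lysozyme_ul

sds_ul = 110                       # 110 µl of 10% (w/v) SDS, as stated
# Proteinase K is dosed to 200 µg/ml in the mix that also contains the SDS.
proteinase_k_ul = volume_to_add(PROTEINASE_K_STOCK, 200, after_lysozyme_ul + sds_ul)
final_ul = after_lysozyme_ul + sds_ul + proteinase_k_ul
sds_percent = final_concentration(10.0, sds_ul, final_ul)

print(f"lysozyme stock: {lysozyme_ul:.1f} µl")
print(f"proteinase K stock: {proteinase_k_ul:.1f} µl")
print(f"final volume: {final_ul:.0f} µl, SDS: {sds_percent:.2f}% (w/v)")
```

With different stock concentrations only the two `*_STOCK` constants need to change; the SDS concentration shifts slightly because the enzyme additions dilute the mix.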
The mechanical bead-beating extraction was performed on frozen mat samples with a MoBio UltraClean Soil DNA extraction kit (catalog #12800, MO BIO Laboratories, Inc., Carlsbad, CA) according to the manufacturer's instructions.

16S rRNA analysis of samples used in construction of metagenomic libraries. PCR-amplified 16S rRNA genes from DNA extracted using the enzymatic protocol were analyzed by denaturing gradient gel electrophoresis according to previously described methods (Ferris and Ward, 1997). The analysis confirmed the familiar distribution pattern (Ferris and Ward, 1997; Ward et al., 2006) of Synechococcus spp. A/B genotypes along the effluent channels of Mushroom Spring and Octopus Spring (Supplementary Figure 3).

Pyrosequencing of 16S rDNA. A pyrosequencing test plate (Roche 454 FLX) was run at JCVI using DNA extracted from a #15 core sampled at Mushroom Spring 60 °C on 17 December 2007. Four different DNA extraction protocols were used: (i) the enzymatic protocol detailed above, (ii) an enzymatic and mechanical method used to construct metagenome libraries at the US DOE Joint Genome Institute (see Inskeep et al., 2010 for details), (iii) the MoBio UltraClean Soil DNA extraction kit as above, and (iv) a pressure-based lysis procedure. For the pressure-based procedure, mat samples were resuspended in Epicentre Gram-positive lysis buffer supplemented with Epicentre Ready-Lyse (1 μg/ml) and proteinase K (1 μg/ml) (Epicentre Biotechnologies, Madison, WI) and processed in a PCT Barocycler NEP2320 (Pressure BioSciences, South Easton, MA). Briefly, resuspended samples were added to PCT tubes fitted with a shredder disk and homogenized in the shredder tube for 20 seconds. Homogenized samples were then processed in the Barocycler for 45 cycles at 65 °C, each cycle consisting of 5 seconds at 35K p.s.i. followed by 5 seconds at 0 p.s.i.
After the 45 Barocycler cycles, nucleic acids were extracted according to the Epicentre protocol. Pyrosequencing was conducted using the sequencing primers V3-V5F: 5'-CCTACGGGAGGCAGCAG-3' and V3-V5R: 5'-CCGTCAATTCMTTTRAGT-3'. Taxonomic assignments were made with the Ribosomal Database Project Bayesian Classifier (Wang et al., 2007). The taxonomic distribution of these sequences is shown in Supplementary Figure 4.

Metagenome construction and sequencing. DNA from both extraction procedures was size-fractionated by agarose gel electrophoresis, and fragments of ~2-3 kb and ~10-12 kb (Supplementary Table 2) were ligated into HT plasmid vectors. Paired-end sequencing of inserts was done at the J. Craig Venter Institute (JCVI) using BigDye Terminator chemistry and an ABI 3100 Genetic Analyzer (Applied Biosystems, Foster City, CA). Metagenomic assemblies were deposited in GenBank (Project number 20953).

BLASTN recruitment by reference genomes. The 202 331 paired-end sequences derived from the plasmid insert libraries contain approximately 167 Mbp of sequence, with an average sequence length of 817 nucleotides. Because of concerns about lysis bias and lower cyanobacterial representation with mechanical lysis protocols, we used only the 161 976 sequences produced from the enzymatic lysis protocol for further analysis (see Supplementary Figures 5 and 6). These sequences were used as queries in a preliminary WU-BLASTX (Altschul et al., 1990) search (default parameters) against NCBI's protein database of bacterial and archaeal genomes (obtained on 26 February 2008) to identify publicly available genomes that recruited numerous metagenomic sequences at amino acid identities above ~70%. In addition, the metagenomic sequences were subjected to BLASTN recruitment by all 1 414 genomes available at NCBI (2 May 2009). These results guided the selection of twenty isolate genomes (Supplementary Table 1) as a reference set.
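The reverse pyrosequencing primer V3-V5R listed above contains two IUPAC ambiguity codes (M = A or C; R = A or G), so it is a mixture of several distinct oligonucleotides rather than a single sequence. The sketch below expands a degenerate primer into its concrete variants using the standard IUPAC nucleotide codes; the primer strings are copied from the text, while the helper names `degeneracy` and `expand_primer` are illustrative, not part of any pipeline used in the study.

```python
import math
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Primer sequences as given in the text (5'->3').
V3_V5F = "CCTACGGGAGGCAGCAG"
V3_V5R = "CCGTCAATTCMTTTRAGT"


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    return math.prod(len(IUPAC[base]) for base in primer.upper())


def expand_primer(primer: str) -> list[str]:
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


for name, primer in (("V3-V5F", V3_V5F), ("V3-V5R", V3_V5R)):
    print(name, degeneracy(primer), expand_primer(primer))
```

Enumerating the variants this way makes it straightforward to check each one against reference 16S rRNA sequences for mismatches, which is one common way to reason about primer coverage.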
These genomes were selected on the basis of whether the isolates containing them were (i) known to be genetically representative of populations inhabiting these mat communities based on prior molecular analysis (e.g., 16S rRNA or 16S-23S internal transcribed spacer region analyses), (ii) cultivated from these or similar Yellowstone alkaline siliceous hot spring cyanobacterial mats, (iii) cultivated from another kind of Yellowstone geothermal feature, (iv) cultivated from geothermal features outside Yellowstone, (v) representative of physiological groups whose activities are known to occur in the mat (e.g., oxygenic photosynthesis, anoxygenic photosynthesis, aerobic respiration, fermentation, sulfate reduction and methanogenesis), or (vi) representative of relevant phylogenetic groups not otherwise included in the set of reference genomes. WU-BLASTN was used to align the metagenomic sequences to the concatenated twenty-genome database with the parameters M=3, N=-2, E=1e-10 and wordmask=dust. These parameters were designed using Karlin-Altschul statistics (Karlin and Altschul, 1990) to obtain significant alignments at identities as low as 50% over a target length of approximately 100 bp. Sequences that did not meet these criteria were labeled “null”, indicating a lack of sufficient sequence similarity from which to assign phylogeny. Recruitment plots to these and a large number of other genomes can be produced using tools found at http://gos.jcvi.org/users/FIBR/advancedReferenceViewer.html. Supplementary Figures 5 and 6 show recruitment results for metagenomes obtained using different lysis protocols and samples.

Taxonomic resolution of recruited sequences. To estimate the taxonomic resolution offered by recruitment of metagenomic sequences to reference genomes, cyanobacterial and FAP genomes of differing relatedness were aligned to a reference genome (Supplementary Figure 7).
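The link between the WU-BLASTN scores (M=3, N=-2) and the ~50% identity limit can be illustrated with the standard Karlin-Altschul quantities. The sketch below solves for λ, the positive root of Σ pᵢpⱼ e^(λ·sᵢⱼ) = 1, and derives the "target" identity these scores are implicitly tuned for, together with the break-even identity at which the expected per-base score is zero. It assumes uniform base composition and ungapped alignments, and it does not reproduce WU-BLAST's own E-value computation (which also depends on the parameter K, sequence lengths and gap handling).

```python
import math

# WU-BLASTN scores from the text: match reward M and mismatch penalty N.
M, N = 3, -2
# Assumption: uniform base composition, so a random aligned pair matches
# with probability 1/4 (four identical pairs, each with probability 1/16).
P_MATCH = 0.25


def karlin_lambda(match: int, mismatch: int, p_match: float = P_MATCH,
                  tol: float = 1e-12) -> float:
    """Positive root of p*e^(l*match) + (1 - p)*e^(l*mismatch) = 1 (bisection).

    Requires a negative expected score per aligned pair, which holds for M=3, N=-2.
    """
    def f(lam: float) -> float:
        return (p_match * math.exp(lam * match)
                + (1 - p_match) * math.exp(lam * mismatch) - 1.0)

    if p_match * match + (1 - p_match) * mismatch >= 0:
        raise ValueError("expected score must be negative")
    lo, hi = 1e-9, 1.0           # f(lo) < 0 just above the trivial root at 0
    while f(hi) < 0:             # widen the bracket until it straddles the root
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def ungapped_score(length: int, identity: float) -> int:
    """Raw score of an ungapped alignment with the given fractional identity."""
    matches = round(length * identity)
    return matches * M + (length - matches) * N


lam = karlin_lambda(M, N)
# Identity at which these scores are optimally tuned (target match frequency).
target_identity = P_MATCH * math.exp(lam * M)
# Identity at which the expected per-base score is zero: M*p + N*(1 - p) = 0.
break_even_identity = -N / (M - N)

print(f"lambda = {lam:.4f}")
print(f"target identity = {target_identity:.1%}")
print(f"break-even identity = {break_even_identity:.0%}")
print(f"score of 100 bp at 50% identity = {ungapped_score(100, 0.50)}")
```

Under these assumptions the target identity falls in the mid-50% range and the break-even identity is 40%, so a ~100 bp alignment at 50% identity still accumulates a clearly positive score, which is roughly consistent with the design goal stated above.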
The distributions of % NT ID for each genome compared with the reference genome identified the % NT ID levels that corresponded to strains within the same named genus, to different genera within the same kingdom (i.e., sub-Domain lineage), or to different kingdoms. We used these % NT ID ranges to decide which % NT ID distributions could be confidently associated with the respective reference genome, as indicated in Table 2. Specifically, we examined the relationships between homologs in genomes from cyanobacteria and Chloroflexi (and relatives) with different levels of relatedness (Supplementary Figure 7). Homologs in Synechococcus spp. strains A and B' share ~75 to 100% NT ID (mean ± standard deviation = 85.0 ± 6.5%). To ensure that the metagenomic sequences recruited by the Synechococcus spp. A and B' genomes were more closely related to the genome that recruited them than to the other genome, these sequences were queried separately against the Synechococcus spp. A and B' genomes in two independent BLASTN experiments. Results indicating efficient separation are shown in Supplementary Figure 8. Genes of more distantly related cyanobacteria (Thermosynechococcus elongatus, Nostoc sp. strain PCC 7120 and Gloeobacter violaceus) share 50-75% NT ID (means of 61 to 64%) with homologs in Synechococcus sp. strain A. Similarly, homologs in Roseiflexus sp. strain RS1 and R. castenholzii share ~70 to ~90% NT ID (mean 78.3 ± 7.1%), but genes in more distantly related members of the kingdom (Chloroflexus and Herpetosiphon) share 50-75% NT ID (means of 58.3 to 64.1%) with Roseiflexus sp. strain RS1 homologs. According to a one-way analysis of variance, there is a statistically significant difference between the distributions of % NT ID in these pairwise genome