<<

Metagenome analysis of an extreme microbial reveals eurythermal adaptation and metabolic flexibility

Joseph J. Grzymskia,1, Alison E. Murraya,1, Barbara J. Campbellb, Mihailo Kaplarevicc, Guang R. Gaoc,d, Charles Leee, Roy Daniele, Amir Ghadirif, Robert A. Feldmanf, and Stephen C. Caryb,d,2

aDivision of Earth and Sciences, Desert Research Institute, 2215 Raggio Parkway, Reno, NV 89512; bCollege of Marine and Earth Studies, University of Delaware, Lewes, DE 19958; cDelaware Biotechnology Institute, 15 Innovation Way, Newark, DE 19702; dElectrical and Computer Engineering, University of Delaware, 140 Evans Hall, Newark, DE 19716; eDepartment of Biological Sciences, University of Waikato, Hamilton, New Zealand; fSymBio Corporation, 1455 Adams Drive, Menlo Park, CA 94025

Edited by George N. Somero, Stanford University, Pacific Grove, CA, and approved September 17, 2008 (received for review March 20, 2008) support diverse life forms, many of the thermal tolerance of a structural biomarker (5) which rely on symbiotic associations to perform functions integral supports the assertion that A. pompejana is likely among the to survival in these extreme physicochemical environments. Epsi- most thermotolerant and eurythermal metazoans on Earth lonproteobacteria, found free-living and in intimate associations (6, 7). with vent invertebrates, are the predominant vent-associated A. pompejana is characterized by a filamentous microflora that . The vent-associated worm, Alvinella forms cohesive hair-like projections from mucous glands lining pompejana, is host to a visibly dense fleece of episymbionts on its the polychaete’s dorsal intersegmentary spaces (8). The episym- dorsal surface. The episymbionts are a multispecies consortium of biont community is constrained to the bacterial subdivision, present as a biofilm. We unraveled details of Epsilonproteobacteria, (9)—a taxonomic class that is widely these enigmatic, uncultivated episymbionts using environmental represented in hydrothermal vent ecosystems on surfaces, free- genome sequencing. They harbor wide-ranging adaptive traits living, in symbioses with invertebrates, in sediments and the that include high levels of strain variability analogous to Epsilon- shallow subsurface (10, 11). Distribution of the 2 dominant such as pylori, metabolic members (5A and 13B) of this Epsilonproteobacteria consortium diversity of free-living , and numerous orthologs of pro- (9) appears highly structured along each hair-like projection. teins that we hypothesize are each optimally adapted to specific The community was reported to comprise between 10 to 15 temperature ranges within the 10–65 °C fluctuations characteristic phylotypes (related at Ն99% SSU rRNA identity), of which of the A. pompejana habitat. This strategic combination enables Ͼ98% were Epsilonproteobacteria (9). Most described host- the consortium to thrive under diverse thermal and chemical symbiont associations consist of either monospecific relation- regimes. The episymbionts are metabolically tuned for growth in ships [e.g., Euprymna squid–Vibrio fischeri (12)] or more often, hydrothermal vent ecosystems with encoding the complete divergent multispecies associations [e.g., termites (13), marine rTCA cycle, oxidation, and denitrification; in addition, the oligochaete, Olavius algarvensis (14)]. However, population-level episymbiont metagenome also encodes capacity for heterotrophic variation has been observed in some symbioses, such as the and aerobic . Analysis of the environmental genome marine sponge Axinella mexicana, which is associated with 2 suggests that A. pompejana may benefit from the episymbionts dominant crenarchaeotal populations (15) and the gammapro- serving as a stable source of food and vitamins. The success of teobacterial of the vent tubeworm Riftia pachy- Epsilonproteobacteria as episymbionts in hydrothermal vent eco- ptila (16). The diverse yet phylogenetically constrained episym- systems is a product of adaptive capabilities, broad metabolic bionts of A. pompejana are in stark contrast to a vent-associated capacity, strain variance, and virulent traits in common with shrimp from the Mid-Atlantic Ridge, Rimicaris exoculata, which pathogens. has established a more conventional single- association with an epsilonproteobacterium (11). Although the A. pompe- Epsilonproteobacteria ͉ hydrothermal vent jana episymbionts have eluded isolation, molecular approaches have revealed that two of the dominant epsilonproteobacterial phylotypes have the capability for chemolithoautotrophic growth he polychaete Alvinella pompejana is an endemic inhabitant Tof deep-sea hydrothermal vents located from 21°N to 32°S latitude on the East Pacific Rise (1). This tube-dwelling Author contributions: A.E.M., B.J.C., G.R.G., R.D., R.A.F., and S.C.C. designed research; B.J.C. polychaete forms dense colonies exclusively on the walls of and A.G. performed research; J.J.G., A.E.M., B.J.C., M.K., C.L., and S.C.C. analyzed data; and high-temperature black smoker chimneys (2, 3), which are J.J.G., A.E.M., B.J.C., and S.C.C. wrote the paper. characterized by extreme physicochemical gradients and dy- The authors declare no conflict of interest. namic in thermal emission rates and intensive mineral precipi- This article is a PNAS Direct Submission. tation. The high-temperature diffuse flow surrounding the Freely available online through the PNAS open access option. worms’ tubes is acidic (pH 4.2–6.1) carrying high levels of total Data deposition: The Whole Genome Shotgun project has been deposited in the DDBJ/ (free ϩ complexed) sulfide (Ͼ1 mM), ammonia EMBL/GenBank database (accession no. AAUQ00000000); the version described in this (3.8–10 ␮m), and reactive heavy metals (0.3–200 ␮M) including paper is the first version (accession no. AAUQ01000000). The project is registered with ␮ project ID 17241 and consists of sequences AAUQ01000001–AAUQ01128934. Traces were ferrous iron (290–840 m) (2). Temperature fluctuations in the submitted to the National Center for Biotechnology Information Trace Archive (accession actual tubes of A. pompejana range from 29 °C to 84 °C while the nos. 1878413299–1878706363); PCR-amplified SSU rRNA gene sequences were also sub- chemical conditions are anoxic, slightly acidic to near neutral pH mitted (accession nos. EF462586–EF462837). (5.33–6.9), and rich in electron acceptors (sulfate, nitrate, Fe III, 1J.J.G. and A.E.M. contributed equally to this work. and Mg) as well as potentially lethal levels of heavy metals (2, 3). 2To whom correspondence should be addressed. E-mail: [email protected]. The tube fluids contain surprisingly low levels of free H2S(Ͻ0.2 This article contains supporting information online at www.pnas.org/cgi/content/full/ ␮M to 46.53 ␮M) and are a mix of ambient seawater (72–91%) 0802782105/DCSupplemental. supplemented with vent-derived emissions (3, 4). An analysis of © 2008 by The National Academy of Sciences of the USA

17516–17521 ͉ PNAS ͉ November 11, 2008 ͉ vol. 105 ͉ no. 45 www.pnas.org͞cgi͞doi͞10.1073͞pnas.0802782105 Downloaded by guest on September 30, 2021 0.09

0.08 m

r 0.07 J R E M 0.06 j

e 0.05 S C l

c 0.04 ho L P HO N s 0.03 t I GF p 0.02 f g U K V ki or Free-living Cog usage Cog or Free-living Pathogen v T 0.01 DQu Fig. 1. Rarefaction curves for SSU rRNA genes and selected in the dnq EM. Rarefaction curves were determined for the SSU rRNA gene (216 bases, 79 0 Bb 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 sequences at 99% similarity) and the following proteins: recombinase subunit EM-C Cog usage A (recA, 280 bases, 43 sequences at 99% similarity), DNA gyrase subunit B (gyrB, 239 bases, 34 sequences at 99% similarity), heat shock protein 60 (groEL, Fig. 2. COG distributions in the EM-C dataset compared with 3 pathogenic 167 bases, 34 sequences at 98% similarity), ribosomal protein S12 (rpS12, 233 epsilonproteobacteria (bold, capitalized letters) and 3 free-living bacteria bases, 28 sequences at 99% similarity), and malate dehydrogenase (mdh, 263 (lowercase letters). The letters indicate the specific COG role category. bases, 20 sequences at 99% similarity).

coding genes (Fig. 1) identified between 6 and 31 distinct via the reductive TCA cycle (17). Until recently, it was believed sequences, respectively. carbon was solely fixed chemoautotrophically via the Calvin The EM dataset consisted of 300,607 high-quality reads that Benson cycle in vent microbial communities (17). assembled into short contigs at 95% nucleotide identity (Table We performed an environmental genomic analysis of the A. S2). Attempts to bin either the raw sequence data or the pompejana episymbiont community in an effort to define the assembled data based on GϩC content were unsuccessful be- basic metabolic strategy of the association and elucidate why cause the data were almost uniformly distributed around the Epsilonproteobacteria are so successful in this environment. We 35% average (Fig. S4). The assembled contigs averaged 1,462 bp hypothesized that under extreme constraints imposed by the in length and comprised 38.5 MB. The total dataset (68 MB), local geochemical environment, the Epsilonproteobacteria epi- including singletons, contained 128,412 sequences. A total of symbiotic consortium employs a core metabolic strategy shared 103,371 proteins was predicted from these data. Forty-one by most members of the community and that geochemical and percent of these predicted proteins could be assigned to a cluster thermal fluctuations impose significant selection pressures on of orthologous group (COG). Given the complexity of the the community. Identification of the metabolic and cellular dataset, the amino acid sequences were clustered at 40% identity features of this microbial community have provided details to simplify data analysis and help establish a nonredundant core regarding the relationships between the symbionts and their host episymbiont metagenome (hereafter referred to as the EM-C) as well as the nature of epibiont adaptation in a dynamic (Fig. S5). The EM-C dataset was designated as the 2812 clusters physiochemical environment (of 36,221) with greater than 5 amino acid sequences in them. These clusters comprised Ϸ60% of the total predicted proteins. Results and Discussion The EM-C dataset was compared against Epsilonproteobacte- Metagenome Characteristics and Comparisons. The primary met- ria genomes from pathogens ( J99, Campy- agenomic library (97% of the sequence generated) was prepared lobacter jejuni 11168, and , ATCC 51449) from a single worm collected from one of the 9oN hydrothermal and free-living (Sulfurovum sp. NBC37–1, Nitratirup- vent sites. Sequences from 2 other genomic libraries generated tor sp. SB155–2, and denitrificans, ATCC33889). from pools of 4 and 3 worms respectively (9,061 sequences) were The EM-C contained most of the COGs found in both pathogen included in the analysis as their community composition was and free-living Epsilonproteobacteria. Pathogen Epsilonpro- similar to the primary dataset; this added approximately 4.5 MB teobacteria genomes contained, on average, 1014 different of DNA to aid in the sequence assembly. The highly variable COGs, of which Ϸ80% are in common between these genomes physical and chemical features of the worm habitat are charac- and 72% are found in EM-C. Free-living Epsilonproteobacteria teristic of this ecosystem (e.g., temperature Ϸ40–70 °C) (sup- genomes contained, on average, 1174 different COGs, of which porting information (SI) Table S1) and are similar to those Ϸ82% are in common and 72% are found in EM-C. There are measured previously (2, 3). 598 predicted proteins associated with COGs in common among At least 27 highly related strains of bacteria (and no ) the 3 datasets. The distributions of COGs between the EM-C, were detected from V1-V2 SSU rRNA gene sequences from the free-living, and pathogen Epsilonproteobacteria genomes were A. pompejana episymbiont metagenome (referred to hereafter as compared (Fig. 2).

EM). Two clusters of sequences were related to the previously Distribution of predicted proteins in COG functional classes MICROBIOLOGY described ribotypes 5A and 13B (9). These clusters dominate the were similar to other Epsilonproteobacteria, including an over- consortium (35% and 30%, respectively, based on the V1-V2 representation of translation COGs compared with transcription aligned sequences) (Fig. S1). The shotgun metagenome SSU COGs (5.6% and 1.5%, respectively, for EM-C vs. 5.2% and rRNA gene sequences were very similar (82% of the sequences 2.6%, respectively, for Epsilonproteobacteria vs. 3.7% and 4.9%, Ͼ98% identity) to a PCR SSU RNA gene clone library derived respectively, for all bacterial genomes). The relatively high from the same DNA (Fig. S2). The PCR library has a higher frequency of translation COGs is a function of the small genome power of sampling depth and revealed greater population di- sizes of Epsilonproteobacteria and basic translation needs of a versity than the EM data revealed alone (Fig. S3). Rarefaction . However, there are fewer translation COGs in analysis of conserved (rpS12) to the variable (recA) protein- epsilon pathogen genomes than free-living epsilons and the

Grzymski et al. PNAS ͉ November 11, 2008 ͉ vol. 105 ͉ no. 45 ͉ 17517 Downloaded by guest on September 30, 2021 CO Org C O22 22 H22 HS- Sox Ni-Fe FSD system polysulfide Ni-Fe red. hydrogenase PTS - HS SO 2- CM OM rTCA dissimilatory 4 bisulfite reductase G-6 P SO 2- 3 fumarate adenylylsulfate red. glycolysis reductase APS + APS tRNAs: 49 Pentose-PO NH4 sulphurylase Amino Acids: 21 A 4 2- TC SO4 Vitamins: 8 (oxic/anoxic)

- - adhesions -metalloendopeptidases NO3 NO2 NO N2O N2 -ß-glucosidase -xylanase Capsule -chitin deacetylase -- NONO33

Fig. 3. Model of predicted metabolic processes in the episymbiont cell based on annotation of the EM-C. The KEGG database was used as a basis for metabolic reconstruction.

EM-C. This difference is a result of multiple copies of both the episymbionts associated with A. pompejana. The ribosomal putative methionyl-tRNA formyltransferase and putative rRNA protein phylogeny also suggests that the EM-C is more similar to methylases in free-living genomes. The low number of genes in other free-living Epsilonproteobacteria than to the Epsilonpro- transcription COGs could be the result of a combination of teobacteria pathogens, and it is most closely related to Sulfu- alternate gene expression regulators such as guanosine-3Ј- rovum sp. NBC37–1. This relationship is also apparent upon diphosphate-5Ј diphosphate (ppGpp), high levels of environ- inspection of ribosomal RNA phylogeny based on maximum mental adaptation, and lack of competition (18). Normalized to likelihood (Fig. S3). genome size, the Epsilonproteobacteria have approximately 4-fold fewer transcriptional regulators than E. coli. Eurythermalism in the Episymbiont Proteome. Ecoparalogs are func- Signal transductions COGs (COG ‘T’) were found in greater tionally equivalent genes in a single genome adapted to different abundance in free-living Epsilonproteobacteria (4.5%) and the ecological niches (20). Although the complexity of our dataset EM-C (2.8%) compared with pathogens (1.3%)—possibly a precluded colocating homologs in a single genome, numerous result of the stochastic environment characteristic of both the divergent gene variants were found. We hypothesize that selec- free-living organisms and episymbionts. The complexity of the tion pressures in the hydrothermal vent environment have cell wall and membrane biosynthesis pathways (COG ‘M’) in the disproportionate effects on amino acid usage and protein struc- episymbionts compared with other Epsilonproteobacteria are ture resulting in sequence variability like that found in EM reflected in the abundance of genes found in this category. It is groEL and recA sequences (Fig. S9). the most represented COG category, accounting for 8% of the Predicted structural changes between EM-C gene variants (2 EM-C. The most abundant class of proteins in the entire dataset, to 7 per protein) and other Epsilonproteobacteria orthologs were however, were putative replication initiator proteins associated compared for the following proteins: GroES, GroEL, and glu- with plasmids (Table S3). Efforts to detect plasmid DNA in tamate dehydrogenase (GDH) and 3 proteins that comprise the episymbiont genomic DNA preparations from several A. pom- 50S subunit of the ribosome (L1, which is important in the pejana samples were unsuccessful; however, contigs comprised binding of rRNA; L11, which is involved in the elongation of unidentifiable DNA and long stretches of hypothetical pro- process of protein translation; and L19, which binds to L14 and teins, some of which may be plasmid in origin, were identified by is involved in stabilizing the central region of the large subunit). using tetranucleotide analysis (Fig. S6). Not surprisingly, the There were 7 gene variants of the groES protein between 95% EM-C has fewer proteins associated with the cell motility COG and 60% amino acid identity, and there were 5 variants of the (Ͻ1%) than either the free-living epsilons (2%) or pathogens L19 protein between 91% and 60% amino acid identity. The (3.5%) (Fig. 2 and Fig. S6). In the EM-C, these proteins are gene variants for all proteins considered in this analysis have associated with a type II secretory pathway (Table S4), methyl- amino acid differences that could be inferred to provide differ- accepting chemotaxis, and cytolysin. Pathogens are unable to ent thermal optima; they have differences in Arg and Lys usages, synthesize all amino acids compared with the EM-C (Fig. 3) and as well as varying number of salt bridges (from 0 to 5) (21, 22). free-living epsilons. COGs that are uniquely found in pathogens Recent experimental work (23) determined in vitro kinetic include some outer membrane proteins, urease accessory pro- parameters for two EM , GDH and isopropylmalate teins, and proteins involved in the type IV secretory pathway dehydrogenase. These results provided the first experimental (Fig. S7). evidence demonstrating the wide range of thermodynamic pa- The average GϩC content of pathogen and free-living ge- rameters found in the EM. nomes is 35.4% and 39.4%, respectively. The EM GϩC content Predicted structural changes between proteins encoded by is 35%. There were no interpretable differences in average pI of EM-C gene variants with orthologs from Sulfurovum sp., H. the predicted proteomes; there were also no correlations with pylori J99, and an unrelated , thermophilus, GϩC content of the genomes as has been shown in free-living resulted in several different structures (Fig. 4). Structural dif- and endosymbiotic (19). ferences aligned 1 variant to a thermophile structure and another Protein branch lengths of concatenated ribosomal proteins to a mesophile structure. In addition, EM L19 ribosomal pro- (5235 aa, gaps removed) (Fig. S8) are similar between most teins had a number of amino acid substitutions (especially Lys Epsilonproteobacteria and EM-C, with the exception of Nitratir- and Arg) (Fig. S10). Differences in amino acid usage and uptor sp. Branch-length similarity suggests that evolutionary predicted structures may confer a broader thermal tolerance on forces acting on pathogens (lack of repair genes, high population the stability and function of proteins in the episymbionts. heterogeneity) are similarly affecting the free-living organisms Perhaps this tolerance provides a competitive advantage in vent living in the unique hydrothermal-vent environment, including ecosystems. The SSU RNA data showing a diverse, closely

17518 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0802782105 Grzymski et al. Downloaded by guest on September 30, 2021 tion as reported for H. pylori (29) was not found. We identified putative island-associated regions that contained long stretches of hypothetical proteins (Fig. S5) using tetranucleotide cluster- ing of EM contigs Ͼ5 kb, which further substantiates the claim that genomic islands, plasmid, and phage-like elements are significant in this consortium. Furthermore, nucleotide cluster- ing [using D2 cluster (30)] revealed significant amounts of repetitive DNA, which are likely associated with genomic islands and serine/aspartic acid-rich regions of adhesion factors as they are in other epsilonproteobacterial genomes (31). Bacterial defenses and virulence traits were also well repre- sented in the EM-C. Several genes were found to encode antimicrobial peptide transport, multidrug transport, and re- striction endonuclease type I and II systems with homology to other Epsilonproteobacteria in addition to other bacteria. Nu- merous virulence genes identified in other Epsilonproteobacteria Fig. 4. Homology based 3D models of the L19 ribsomal protein. Proteins are genomes were detected in the EM-C; however, the prevalence of from T. thermophilus, H. pylori J99, and Sulfurovum sp. (A) and from 2 virulence genes and unique types in the episymbiont core are ecoparalogs in the EM (B). Each amino acid is mapped onto the structure based noteworthy (Table S4). on its BLOSUM60 substitution probability in the alignment where blue is an identical or conserved substitution and white and increasing values of red are nonconserved amino acid changes. The alignment is Fig. S10. Energy and Carbon Utilization. A versatile suite of carbon processing and energy generation capabilities is present in the Epsilonproteobacteria episymbiont consortium. The EM-C related population structure and gene variant data suggest that encodes complete pathways for reductive TCA, episymbionts have populations with different thermal and chem- (via Nap, NirS, Nor, and Nos), and sulfur oxidation (via sox ical optima. Epsilonproteobacteria have relatively large shared system), similar to other vent-associated epsilonproteobacterial core genes despite low degrees of synteny and high levels of genomes, including strains of S. denitrificans, Sulfurovum sp. genome rearrangements (24). This feature likely contributes to NBC37–1, and sp. SB155–2 (Fig. 3). The EM-C also their success and diversity. As well, there is supporting evidence contains homologs of flavocytochrome c sulfide dehydrogenase, that the host, A. pompejana, is also well adapted to these suggesting elemental sulfur and/or polysulfide production, and temperature fluctuations (3, 6, 7). the enzymes required for direct sulfite oxidation via APS Amino acid usage in 104 EM-C proteins within 5% of the reductase and APS sulfurylase. In addition, ␣- and ␤-subunits of length of H. pylori orthologs (average length is 308 aa) was also polysulfide reductase were detected. Within Epsilonproteobac- examined. On average (inclusive of gene variants that result in teria genomes, these subunits are only present in Sulfurimonas higher statistical variation), proteins from the episymbionts had denitrificans (32) and succinogenes (33). Genes encod- a more acidic pI (7.7 Ϯ 2.4 vs. 8.2 Ϯ 2.1, t test, P Ͻ 0.001); were ing dsr and qmo redox complexes [for dissimilatory sulfate slightly more hydrophilic (grand average of hydropathicity, reduction (34)] were not detected. The diverse suite of enzymes Ϫ0.3 Ϯ 0.3 vs. 0.24 Ϯ 0.35, t test, P Ͻ 0.01; as in ref. 25); and used involved in sulfur cycling in the EM-C also allows for the significantly more Asp, Gly, Ile, Arg, and Thr in comparison with metabolism of various sulfur intermediates (including thiosul- the pathogens. Pathogen Epsilonproteobacteria used more Ala, fate), enabling resistance to pH fluctuations in this environment. Cys, His, Leu, and Gln (t test, P Ͻ 0.01). These results suggest Glycolysis, PTS, and pentosphosphate cycles are more com- plete than in other Epsilonproteobacteria genomes (i.e., phos- that on average, the EM-C has an amino acid usage profile more phorylase and phosphoglucomutase are present, suggesting the consistent with thermophily than its mesophilic Epsilonpro- potential for glycogen utilization and formation, although ri- teobacteria neighbors. For example, it has been repeatedly bose-5-phosphate isomerase appears to be missing). The epi- demonstrated that use more Arg and less Gln and symbionts are found intimately associated with mucopolysac- His (21, 26). charide. However, genes encoding metalloendopeptidase, betaglucosidase, xylanase, and a chitin deacetylase were found in Foreign DNA, Defenses, and Virulence in the EM. Evidence of foreign low abundance. Two essential enzymes in the Entner Doudoroff DNA further supports our hypothesis that these episymbionts pathway (6-phosphogluconate dehydratase and 2-keto-3-deoxy- are uniquely adapted to thrive in this environment by maintain- 6-phosphogluconate aldolase) are absent in both the episymbi- ing the ability to constantly update and modify their genomic ont dataset and other free-living Epsilonproteobacteria genomes. content as has been reported for numerous populations of H. They are present, however, in the Epsilonproteobacteria patho- pylori that harbor flexible (plastic) genomes (i.e., refs. 24 and 27). gen genomes—indicating free glucose utilization as an energy Abundant plasmid replication proteins (Table S3), a plasmid- source. The EM-C encodes numerous enzymes for pyruvate encoded colicin V-related protein, multiple integrases, and metabolism, although the most prevalent is the biosynthetic transposases associated with IS elements are present in the pyruvate synthetase, consistent with rTCA-carbon fix- episymbiont core, including several with unique sequences in ation. A portion of the episymbiont consortia likely generates comparison with the other 6 epsilon genomes used in our

energy via pyruvate synthesis from glucose, as all of the enzymes MICROBIOLOGY analyses (Table S4). Integrases and associated gene cassettes in the pathway are present (Table S3). Although we detected all investigated in other vent-associated systems (28) were found to enzymes to run TCA in the forward direction, sequences en- be diverse and, in combination with our observations here, coding ATP citrate lyase subunits were much more abundant (at suggest that mobile DNA may play a major role in microbial least 8:1) than those with homology to citrate synthase. Use of evolution and adaptation in vent environments. Several gene the TCA cycle in Epsilonproteobacteria varies between different homologs encoding pili-related proteins, which have orthologs species. C. jejuni genomes encode a complete oxidative TCA used in type II secretion and exogenous DNA uptake, were cycle, whereas both H. pylori genomes and H. hepaticus appear identified. The competence pathway in the episymbionts is to use a branched pathway in which part of the cycle functions unknown; however, a complete pathway using a type IV secre- in the reductive direction. The EM-C also encodes 2 fumarate

Grzymski et al. PNAS ͉ November 11, 2008 ͉ vol. 105 ͉ no. 45 ͉ 17519 Downloaded by guest on September 30, 2021 reductases that may act as an alternate electron acceptor as gence research vehicle Alvin in 1994 (13oN), 1999 (9oN), 2001 (9oN), and 2002 suggested in other Epsilonproteobacteria genomes (32). This (9oN). Before collection, the temperature of the worms’ immediate environ- enzyme is also required in the rTCA cycle. The EM-C encodes ment was measured and discrete water samples were taken from within the tubes for geochemical analysis (3). Individual alvinellids were collected, trans- both a group I membrane-bound Ni-Fe hydrogenase for H2 uptake (HypF, A, E, D, C, and B) and a group II H -sensing ported to the surface, and frozen (17, 38). The epibionts were removed with 2 sterile forceps and placed in a Tris-SDS-proteinase K lysis buffer for 1 h. The Ni-Fe hydrogenase (HupV and HupS), which is similar to S. nucleic acids then were recovered by using a cetyltrimethylammonium bro- denitrificans, although lacking the H2-evolving function found in mide extraction protocol (17). RNA was removed with RNace-It (Stratagene). Sulfurovum sp. and Nitratiruptor sp. (35). Predicted proteins for Semiquantitative PCR was used to estimate the extent of eukaryotic con- several NADH dehydrogenases and 2 cytochrome oxidases also tamination before library construction using universal primers against the are found in the EM dataset. A flavoprotein, with 30% homol- actin gene (Applied Biosystems) (17). The percentage of eukaryotic DNA was ogy to putative fixD encoding genes in several Sulfolobus species, estimated by calculating the approximate number of genome equivalents of is unique in comparison with other epsilonproteobacterial ge- A. pompejana DNA—haploid genome content is 0.8 pg or 782 MB per total nomes. Despite being a phylogenetically constrained epsilon- amount of DNA (39). The samples were analyzed by denaturing gradient gel proteobacterial consortium, the EM is metabolically versatile, electrophoresis (DGGE) of the V3 region of the 16S rRNA gene (40). Samples were chosen for library construction if they represented the majority of the consistent with other Epsilonproteobacteria (32, 35) and with the episymbiont population, based on DGGE analysis. oligochaete endosymbiotic system (14). SSU rRNA Gene PCR Clone Library. The 16S rRNA gene was amplified by PCR Relationship Between the Host and Episymbiont. Although these using universal primers from DNA extracted from a single episymbiont com- analyses have focused on the episymbionts, we suspect that the host munity (AP201, Extreme 2002, Dive 3836, from Bio9 chimney at 9°N, EPR). PCR benefits nutritionally from this unique symbiosis. Indeed, we ob- conditions were essentially as described previously (40); however, low cycle served worms eating the fleece off the back of their neighbors amplifications were done in triplicate to minimize PCR bias, and products were (Movie S1). Because the episymbionts synthesize all of their amino pooled before cloning into the PCR TA Topo cloning vector (Invitrogen). The acids, the nutritional benefits may go beyond a rich source of region containing the 16S rRNA gene was amplified with primers M13F and reduced carbon. The episymbionts could provide a dietary source M13R. The PCR product was then sequenced on an ABI3130 Genetic Analyzer of vitamin B6 and pyridoxal 5Ј-phosphate (PLP), which is the active (Applied Biosystems) with T3 and T7 primers. form of vitamin B6 and an important cofactor in amino acid Metagenomic Library Construction and Sequencing. Initial libraries were pre- metabolism (36). Genes for both pathways in the de novo synthesis pared by using a modified BAC shotgun library construction protocol (Invitro- of PLP are present, suggesting that the EM has the capability to gen). DNA was nebulized to 1- to 4-kb-sized fragments (selected via agarose synthesize PLP from D-erythrose-4-phosphate (a pentose phos- gel electrophoresis, end-repaired, and phosphorylated using a combination of phate pathway derivative) and glyceraldehyde-3-phosphate (a gly- T4 and Klenow polymerases) and cloned into a blunt-ended TA Topo vector. colysis derivative). This capability distinguishes the EM from all of For large library preparation, DNA was ligated into the gap-free cloning the other Epsilonproteobacteria. Retention of both of these routes vector pSMART-HCKan (Lucigen). Specific details on DNA sequencing and again suggests that the organisms rely on different modes of quality control are in the SI Methods. metabolism for sustained periods, depending on availability and fluctuating redox potential of the environment. Nonetheless, Assembly, Clustering, and Annotation. Upon quality screening of the data, given the location and structure of the episymbiont community on 270,000 reads were assembled using the program PCAP (41), a parallelized the worm and its presence on all worms collected throughout its version of CAP3 (42) with the following modifications: the overlap score cutoff was increased to 6000 (a more stringent value to prevent false joins), a score known geographic range (6,000 km), it is clear that the partnership of 50 was used in the masking and identification of repetitive regions, and is favored. 95% was used as the overlap percent identity. The assembly resulted in generation of approximately 26,000 contigs and approximately 67,000 sin- Conclusion. Analyses of the metagenome and SSU rRNA se- gletons. There were 374 contigs greater than 5 kb (Table S2). quence data from an A. pompejana episymbiont consortium Amino acid sequences from the assembled dataset plus unassembled se- revealed a multispecies community comprised exclusively of quences were clustered using the program CD-HIT (43). A nonredundant closely related Epsilonproteobacteria related to a sequenced, dataset consisted of the longest sequences in clusters Ն40% amino acid vent-associated, free-living bacterium, Sulfurovum sp. NBC37–1. identity using a word length of 3 and a tolerance of 5 (44). Predicted proteins clustered at 40% amino acid identity com- A customized entity relational database was designed and implemented to prised a core dataset representing Ϸ60% of the sequences. Many store the sequence data and associated automated annotation information (45). We integrated several annotation tools, including BLAST (46), FgenesB clusters consisted of multiple sequence variants suspected to (Softberry) for gene calling and COG assignment, PRIAM (47), and a custom range in thermal optima based on amino acid usage and struc- Pfam database generated with -only sequences (M. Gollery, per- tural predictions. Metabolic reconstruction revealed numerous sonal communication). The epibiont metagenome database is accessible pathways (e.g., rTCA, denitrification, and sulfur oxidation), and through a graphical user interface and is available for browsing (http:// analysis of multiple homologous protein structures confirm that ocean.dbi.udel.edu; login as guest). the episymbiont consortium has capabilities to thrive in and mediate the intense physiochemical hydrothermal vent environ- Tools Used in Data Analysis. Concatenated ribosomal protein alignments and ment. Also, multiple vitamin and amino acid biosynthetic path- phylogeny were created using ClustalW and the program VMD (http:// ways and the persistence and prevalence of the symbiosis across www.ks.uiuc.edu/Research/vmd/). The protein maximum likelihood phylogeny the geographic range of A. pompejana suggest that the host may was created using PHYLIP (v3.67). Maximum likelihood analysis of nearly full- length SSU rRNA genes was conducted with fastDNAml (48). Comparative anal- benefit from autotrophic biosynthetic capabilities of the episym- ysis of gene content between the EM-C and published Epsilonproteobacteria bionts. The episymbionts have characteristics of pathogens and genomes was performed using published datasets (31, 35) and tools provided in that facilitate their success. They are armed with many the IMG database (49). Methods for rarefaction analysis are in SI Methods. characteristics of their pathogenic Epsilonproteobacteria rela- tives, harboring the genetic armory required to evade a host ACKNOWLEDGMENTS. We thank Roger Bhan (Sym-Bio Inc.) for bioinformat- response—through cell-surface modifications, mobile DNA, and ics support, Martin Gollery (University of Nevada, Reno, NV), Scott Hazelhurst other flexible gene pool features (31, 37). (University of the Witwatersrand, Johannesburg, South Africa), and Win Hide (South African National Bioinformatics Institute, Belville, South Africa). This research was supported by National Science Foundation Grants OCE-0120648 Methods (to S.S.C., A.E.M., R.F., G.G.), EPS-0447416 (to D.R.I.), EPS-0447610 (to S.S.C.), Collection, DNA Extraction, and Screening. A. pompejana specimens in their and OPP-0421514 (to A.E.M.) and the Desert Research Institute’s postdoctoral associated tubes were collected at the East Pacific Rise by the deep submer- research support program.

17520 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0802782105 Grzymski et al. Downloaded by guest on September 30, 2021 1. Hurtado LA, Lutz RA, Vrijenhoek RC (2004) Distinct patterns of genetic differentiation 26. La D, Silver M, Edgar RC, Livesay DR (2003) Using motif-based methods in multiple among of eastern Pacific hydrothermal vents. Mol Ecol 13:2603–2615. genome analyses: A case study comparing orthologous mesophilic and thermophilic 2. Le Bris N, Zbinden M, Gaill F (2005) Processes controlling the physico-chemical micro- proteins. 42:8988–8998. environments associated with Pompeii worms. Deep-Sea Res I 52:1071–1083. 27. Levine SM, et al. (2007) Plastic cells and populations: DNA substrate characteristics in 3. Di Meo-Savoie CA, Luther GW, Cary SC (2004) Physicochemical characterization of the Helicobacter pylori transformation define a flexible but conservative system for microhabitat of the epibionts associated with Alvinella pompejana, a hydrothermal genomic variation. FASEB J 21:3458–3467. vent . Geochim Cosmochim Acta 68:2055–2066. 28. Elsaied H, et al. (2007) Novel and diverse integron integrase genes and integron-like 4. Luther GW, et al. (2001) Chemical drives hydrothermal vent ecology. Nature gene cassettes are prevalent in deep-sea hydrothermal vents. Environ Microbiol 410:813–816. 9:2298–2312. 5. Gaill F, Mann K, Wiedemann H, Engel J, Timpl R (1995) Structural comparison of cuticle 29. Smeets LC, Kusters JG (2002) Natural transformation in Helicobacter pylori: DNA and interstitial collagens from annelids living in shallow sea-water and at deep-sea transport in an unexpected way. Trends Microbiol 10:159–162. hydrothermal vents. J Mol Biol 246:284–294. 30. Burke J, Davison D, Hide W (1999) D2_cluster: A validated method for clustering EST 6. Chevaldonne P, Desbruyeres D, Childress JJ (1992) Some like hot - and some even and full-length cDNA sequences. Genome Res 9:1135–1142. hotter. Nature 359:593–594. 31. Eppinger M, Baar C, Raddatz G, Huson DH, Schuster SC (2004) Comparative analysis of 7. Cary SC, Shank T, Stein J (1998) Worms bask in extreme temperatures. Nature 391:545– four . Nat Rev Microbiol 2:872–885. 546. 32. Sievert SM, et al. (2008) Genome of the epsilonproteobacterial chemolithoautotroph 8. Desbruye`res D, Gaill F, Laubier L, Fouquet Y (1985) Polychaetous annelids from hydro- Sulfurimonas denitrificans. Appl Environ Microbiol 74:1145–1156. thermal vent ecosystems: An ecological overview. Bull Biol Soc Wash 6:103–116. 33. Jankielewicz A, Schmitz RA, Klimmek O, Kroger A (1994) Polysulfide reductase and 9. Haddad MA, Camacho F, Durand P, Cary SC (1995) Phylogenetic characterization of the formate dehydrogenase from Wolinella succinogenes contain molybdopterin guanine epibiotic bacteria associated with the hydrothermal vent polychaete Alvinella pom- dinucleotide. Arch Microbiol 162:238–242. pejana. Appl Environ Microbiol 61:1679–1687. 34. Cardoso-Pereira IA (2007) Respiratory membrane complexes of Desulfovibrio. In Mi- 10. Campbell BJ, Engel AS, Porter ML, Takai K (2006) The versatile epsilon-proteobacteria: crobial Sulfur metabolism, eds Dahl C, Friedrich CG (Springer, Berlin), pp 24–35. key players in sulphidic habitats. Nat Rev Microbiol 4:458–468. 35. Nakagawa S, et al. (2007) Deep-sea vent epsilon-proteobacterial genomes provide 11. Polz MF, Cavanaugh CM (1995) Dominance of one bacterial phylotype at a mid-Atlantic insights into emergence of pathogens. Proc Natl Acad Sci USA 104:12146–12150. ridge hydrothermal vent site. Proc Natl Acad Sci USA 92:7232–7236. 36. Tanaka T, Tateno Y, Gojobori T (2005) Evolution of vitamin B6 (pyridoxine) metabolism 12. Ruby EG, Lee KH (1998) The Vibrio fischeri Euprymna scolopes light organ association: by gain and loss of genes. Mol Biol Evol 22:243–250. Current ecological paradigms. Appl Environ Microbiol 64:805–812. 37. Kraft C, Suerbaum S (2005) Mutation and recombination in Helicobacter pylori: 13. Warnecke F, et al. (2007) Metagenomic and functional analysis of hindgut microbiota Mechanisms and role in generating strain diversity. Int J Med Microbiol 295:299–305. of a wood-feeding higher termite. Nature 450:560–565. 38. Cary SC, Cottrell MT, Stein JL, Camacho F, Desbruyeres D (1997) Molecular identifica- 14. Woyke T, et al. (2006) Symbiosis insights through metagenomic analysis of a microbial tion and localization of filamentous symbiotic bacteria associated with the hydrother- consortium. Nature 443:950–955. mal vent annelid Alvinella pompejana. Appl Environ Microbiol 63:1124–1130. 15. Schleper C, et al. (1998) Genomic analysis reveals chromosomal variation in natural 39. Gregory TR, et al. (2007) Eukaryotic genome size databases. Nucl Acids Res 35:D332– populations of the uncultured psychrophilic archaeon Cenarchaeum symbiosum. J 338. Bacteriol 180:5003–5009. 40. Campbell BJ, Cary SC (2001) Characterization of a novel spirochete associated with the 16. Robidart JC, et al. (2008) Metabolic versatility of the revealed through metagenomics. Environ Microbiol 10:727–737. hydrothermal vent polychaete annelid, Alvinella pompejana. Appl Environ Microbiol 17. Campbell BJ, Stein JL, Cary SC (2003) Evidence of chemolithoautotrophy in the bacterial 67:110–117. community associated with Alvinella pompejana, a hydrothermal vent polychaete. 41. Huang X, Wang J, Aluru S, Yang S-P, Hillier L (2003) PCAP: A whole-genome assembly Appl Environ Microbiol 69:5070–5078. program. Genome Res 13:2164–2170. 18. Marais A, Mendz GL, Hazell SL, Megraud F (1999) Metabolism and genetics of Helico- 42. Huang X, Madan A (1999) CAP3: A DNA sequence assembly program. Genome Res bacter pylori: The genome era. Microbiol Mol Biol Rev 63:642–674. 9:868–877. 19. Wu D, et al. (2006) Metabolic complementarity and genomics of the dual bacterial 43. Li W, Jaroszewski L, Godzik A (2001) A clustering of highly homologous sequences to symbiosis of sharpshooters. Plos Biol 4:1079–1092. reduce the size of large protein databases. Bioinformatics 17:282–283. 20. Sanchez-Perez G, Mira A, Nyrio G, Pasic L, Rodriguez Valera F (2008) Adapting to 44. Li W, Jaroszewski L, Godzik A (2002) Tolerating some redundancy speeds up clustering environmental changes by specialized paralogs. Trends Gen 24:154–158. of large protein databases. Bioinformatics 18:77–82. 21. Kreil DP, Ouzounis CA (2001) Identification of thermophilic species by the amino acid 45. Kaplarevic M, Murray AE, Cary SC, Gao GR (2008) EnGENIUS - Environmental genome compositions deduced from their genomes. Nucleic Acids Res 29:1608–1615. informational utility system. J Bioinfor Comput Biol 6:6. 22. Chakravarty S, Varadarajan R (2002) Elucidation of factors responsible for enhanced 46. Altshul SF, et al. (1997) Gapped BLAST and PSI-blast: A new generation of protein thermal stability of proteins: A structural genomics based study. Biochemistry 41:8152– database search programs. Nucleic Acids Res 25:3389–3402. 8161. 47. Claudel-Renard C, Chevalet C, Faraut T, Kahn D (2003) Enzyme-specific profiles for 23. Lee CK, Cary SC, Murray AE, Daniel RM (2008) Enzymatic approach to eurythermalism genome annotation: PRIAM. Nucleic Acids Res 31:6633–6639. of Alvinella pompejana and its episymbionts. Appl Environ Microbiol 74:774–782. 48. Olsen GJ, Matsuda H, Hagstrom R, Overbeek R (1994) Fastdnaml - A tool for construc- 24. Linz B, Schuster SC (2007) Genomic diversity in Helicobacter and related organisms. Res tion of phylogenetic trees of DNA-sequences using maximum-likelihood. Comput Microbiol 158:737–744. Applications Biosci 10:41–48. 25. Grzymski JJ, et al. (2006) Comparative genomics of DNA fragments from six antarctic 49. Markowitz V, et al. (2008) The integrated microbial genomes (IMG) system in 2007: marine planktonic bacteria. Appl Environ Microbiol 72:1532–1541. Data content and analysis tool extensions. Nucleic Acids Res 36:D528–533. MICROBIOLOGY

Grzymski et al. PNAS ͉ November 11, 2008 ͉ vol. 105 ͉ no. 45 ͉ 17521 Downloaded by guest on September 30, 2021