Mobile Elements in a Single-Filament Orange Guaymas Basin (“Candidatus Maribeggiatoa”) sp. Draft Genome: Evidence for Genetic Exchange with

Barbara J. MacGregor,a Jennifer F. Biddle,b Andreas Teskea Department of Marine Sciences, University of North Carolina, Chapel Hill, Chapel Hill, North Carolina, USAa; College of Earth, Ocean, and the Environment, University of Delaware, Lewes, Delaware, USAb

The draft genome sequence of a single orange Beggiatoa (“Candidatus Maribeggiatoa”) filament collected from a microbial mat at a hydrothermal site in Guaymas Basin (Gulf of California, Mexico) shows evidence of extensive genetic exchange with cyano- , in particular for sensory and signal transduction genes. A putative homing endonuclease gene and group I intron within the 23S rRNA gene; several group II catalytic introns; GyrB and DnaE inteins, also encoding homing endonucleases; mul- tiple copies of sequences similar to the fdxN excision elements XisH and XisI (required for heterocyst differentiation in some cyanobacteria); and multiple sequences related to an open reading frame (ORF) (00024_0693) of unknown function all have close non-Beggiatoaceae matches with cyanobacterial sequences. Sequences similar to the uncharacterized ORF and Xis ele- ments are found in other Beggiatoaceae genomes, a variety of cyanobacteria, and a few phylogenetically dispersed pleiomorphic or filamentous bacteria. We speculate that elements shared among filamentous bacterial species may have been exchanged in microbial mats and that some of them may be involved in cell differentiation.

arge vacuolate Beggiatoaceae are conspicuous members of mi- 27°0.450300=N; longitude, 111°24.532320=W; depth, 2,001 m) at Guay- Lcrobial mats at sites where hydrothermal fluids reach the sur- mas Basin, Mexico, contained orange mat material, which was removed to face in Guaymas Basin, a sedimented midocean spreading center a 50-ml Falcon tube with seawater and refrigerated overnight (also de- in the Gulf of California (1). They may belong to two or more scribed in reference 5). The orange tuft was nicely reestablished the next candidate -level groups in a recently proposed reorganiza- day. A large portion was placed in sterile seawater-0.1% agar and incu- tion of the group (2, 3). Beggiatoaceae are also found in a variety of bated at room temperature with the motion of the ship, and then a portion freshwater and hypersaline environments; of particular relevance to of this was transferred to sterile seawater in a petri plate. Single filaments were selected using a 10-␮l pipette tip. Three microliters of liquid was this paper, they form multispecies hypersaline microbial mats in drawn up as the pipette tip was centered on a filament head under ϫ40 Guerrero Negro (Baja California, Mexico) together with Cyanobac- magnification in a dissecting microscope. Filaments were immediately teria, Chloroflexi, , and a diversity of other bacteria and expelled into sterile 500-␮l tubes and frozen at Ϫ20°C. The sequenced archaea (2, 4). The near-complete genome sequence of a single or- filament had a diameter of 35 to 40 ␮m. ange Guaymas Basin vacuolate Beggiatoa (“Candidatus Maribeggia- Amplification of single-filament DNA. Filament DNA was initially toa”) filament, referred to here as the BOGUAY sequence, was re- amplified with the RepliG minikit (Qiagen, Germantown, MD) with the cently obtained (B. J. MacGregor, J. F. Biddle, C. Harbort, A. G. following modifications. Solution DLB was prepared by adding 500 ␮lof Matthysse, A. Teske, submitted for publication) (5). A complete ge- DNase/RNase-free water (Qiagen) to the DLB stock in the kit. Solution nome for the freshwater Beggiatoa alba B18LD (NCBI project ID D2 was prepared by adding 5 ␮l dithiothreitol (DTT) from the kit to 55 ␮l 62137) and partial genomes for two filaments (BgP and BgS) from resuspended DLB solution. PicoGreen dye stock (Invitrogen, Carlsbad, Baltic Sea harbor sediment (6) are available for comparison. CA) was diluted 1:10 in PCR-grade water. Frozen filaments were thawed The BOGUAY genome annotation, like most others, identified a and centrifuged briefly. All liquid was removed from the tube (ϳ3 ␮l), ␮ ϫ variety of possible mobile or formerly mobile elements (introns, in- and the filament was visible at the bottom. A total of 1 lof1 phosphate- ␮ teins, homing endonucleases) and genes likely delivered by them (re- buffered saline (PBS) and 1.5 l solution D2 were added. Tubes were vortexed, centrifuged briefly, and incubated 10 min on ice, and then 1.5 ␮l striction-methylation and toxin-antitoxin systems). Their phyloge- of stop solution was added. The entire volume was transferred to a QPCR netic affiliations may provide clues to the evolution of the plate (Stratagene, La Jolla, CA), and 15 ␮l reaction buffer, 1 ␮l RepliG Beggiatoaceae and the mat communities they are found in. We pres- phi29 polymerase, and 0.25 ␮l diluted PicoGreen dye were added. The ent evidence here that the marine BOGUAY lineage, cyanobacteria, reaction was placed in an MX3500P QPCR machine (Stratagene) and and diverse filamentous gliding bacteria have a history of genetic ex- change, which appears to have been less extensive and/or less well preserved in the freshwater Beggiatoa alba. A first genome-wide com- Received 11 December 2012 Accepted 15 April 2013 parison of BOGUAY-predicted proteins with the public databases Published ahead of print 19 April 2013 suggests that signal transduction genes and some cell wall biogenesis Address correspondence to Barbara J. MacGregor, [email protected]. genes have been especially successful in transfers. Supplemental material for this article may be found at http://dx.doi.org/10.1128 /AEM.03821-12. MATERIALS AND METHODS Copyright © 2013, American Society for Microbiology. All Rights Reserved. Retrieval of orange filaments for sequencing. Core 4489-10 from RV doi:10.1128/AEM.03821-12 Atlantis/HOV Alvin cruise AT15-40 (13 December 2008; latitude,

3974 aem.asm.org Applied and Environmental Microbiology p. 3974–3985 July 2013 Volume 79 Number 13 Mobile Elements in Orange Guaymas Beggiatoa observed every 10 min for 1.5 h at 30°C. Negative controls using water or here; their identification and classification varied widely among the seawater sorting solution in place of the filament mixture showed no automated annotations (JCVI, IMG/ER, UniProt), and many of increase in PicoGreen fluorescence during this time. Reactions were them appear to be split between ORFs and thus may be relict stopped by incubation at 65°C for 3 min. elements. To dilute the phi29 polymerase founder effect, this initial amplifica- A discussion of the evidence for the purity and completeness of tion was split in 6 aliquots and amplified further using the RepliG Midi kit the draft genome is presented first, as an essential basis for the rest (Qiagen) according to the manufacturer’s instructions with minor mod- ifications, including the addition of 1 ␮l of diluted PicoGreen dye into the of the discussion. reaction mixture and incubation of the reaction at 30°C for 4 h with Genome identity: 16S rRNA gene sequence. The orange observation of PicoGreen fluorescence every 10 min by the QPCR ma- Guaymas Beggiatoa genome sequenced (referred to here as the chine. Negative controls of water or the seawater sorting solution again BOGUAY genome) includes a single set of rRNA genes, with the showed no increase in PicoGreen fluorescence during this time. 16S rRNA gene sequence falling with the “Candidatus Maribeg- Confirmation of single-filament amplification. Products from the giatoa” (3) group (see Fig. S1 in the supplemental material). Note RepliG reactions were diluted 1:10, and 2 ␮l of this was used as the tem- that the orange Guaymas Beggiatoaceae are not monophyletic: the plate in a PCR using primers B4 and B7 (7), which amplify the intergenic 16S rRNA gene from the genome-sequenced filament discussed spacer between the 16S and 23S rRNA genes. A single PCR product was here is highly similar to 16S rRNA gene sequences from filaments seen for the BOGUAY filament sample in all amplifications, suggesting collected in 1996 and 2009 but not to a second orange filament that a single species was present. No amplifications were seen for the negative controls. To confirm this, PCR was repeated using primers B8F collected in 2008. The four filaments came from different sites; and B1492R to amplify the full 16S rRNA gene of the amplified filament. whether mixed populations exist is not known. Near-complete (B. PCR products were cloned in TOPO TA cloning vectors (Invitrogen) and alba), partial (Beggiatoa PS, BgP) or very partial (Beggiatoa SS, transformed into Escherichia coli. Cloned products were sequenced via BgS) genome sequences are available for three of the other species colony amplification by Genewiz (Plainsfield, NJ) using M13 priming and strains shown. The BOGUAY 23S rRNA gene is interrupted sites. Over 30 clones were examined, all of which were identical products, by a homing endonuclease gene and an intron sequence, discussed confirming that most likely one species was present. below. Preparation of DNA for genome sequencing. To remove potential Genome completeness. Because the BOGUAY genome is not chimeric structures from the amplified DNA, individual RepliG Midi re- completely assembled, the annotation of ribosomal protein, RNA actions were pooled and treated with S1 nuclease at 37°C for 1 h. DNAs polymerase, tRNA, and tRNA synthetase genes was examined as a were then precipitated and concentrated. Aliquots were run on a 0.8% Tris-acetate-EDTA (TAE) agarose gel and showed that most of the prod- test of genome coverage and purity. uct was high-molecular-weight (HMW) DNA, and a total of 48 ␮gof Ribosomal proteins. A complete set of ribosomal protein DNA had been amplified from a single orange filament. Samples were genes was identified (see Table S1 in the supplemental material), then sent to the JCVI for further sequencing. and six contigs have been provisionally connected (see Fig. S2 in Sequencing at JCVI and open reading frame (ORF)-naming conven- the supplemental material) into an extended fragment with po- tion. The sample was subjected to quality control by 16S rRNA gene PCR tential S10, spc, and alpha operons (16, 17). The remaining ribo- amplification and sequencing of 384 clones, all of which had identical somal protein genes are also generally found in conserved neigh- sequences matching the one previously obtained at UNC. A 454 paired- borhoods, except that L28 is downstream of the 5S rRNA gene end library was prepared and sequenced on a GSFLX titanium sequencer rather than adjacent to L33, and S21 is separated from S9 and L13. (454 Life Sciences). Assembly was performed using the Celera assembler, RNA polymerase. A putative RNA polymerase ␣-subunit gene and annotation was done via the Newbler pipeline. A total of 99.3% of the (rpoA) was found, as expected, in the alpha operon (16, 17). Sin- sequence was assembled into 822 contigs, suggesting that good coverage ␤ ␤ was achieved. A total of 4.7 Mb of sequence was recovered, with 80% of the gle-copy RNA polymerase and = subunit genes (rpoB and rpoC) sequence forming large (Ն15-kb) contigs. Annotated sequences are re- are often found between an S12 gene downstream and additional ferred to by 5-digit contig and 4-digit ORF numbers, e.g., 00024_0691. ribosomal protein genes upstream, but in the BOGUAY sequence, Annotation at UNC. Additional sequence analysis was carried out rpoB and associated ribosomal protein genes are in the center of using a combination of the JCVI-supplied annotation, the IMG/ER (8) one contig, and copies of rpoC are annotated on two others (see and RAST (9) platforms, and BLASTN, BLASTX, BLASTP, PSIBLAST, and Fig. S2 in the supplemental material). Two rpoC copies have also DELTABLAST searches of the GenBank nr databases. Nucleic acid and been noted in some cyanobacteria (18). The putative rpoC copies amino acid sequence alignments were generally performed in MEGA5 (10) both have high-scoring BLAST hits to other gammaproteobacte- using MUSCLE (11); for several amino acid alignments, the NCBI CO- rial ␤= subunit genes, with the top hit in both cases to the same BgP BALT aligner (12) gave a subjectively better result. Maximum-likelihood phylogenies were inferred in ARB (13) with RAxML rapid bootstrapping predicted protein (ZP_01998875.1; this is annotated as rpoA but is (14) and Bayesian phylogenies in MrBayes 3.2 (15). clearly mislabeled). The BOGUAY 00100_0018 predicted amino acid sequence has a gap of ϳ75 amino acid (aa) residues beginning RESULTS AND DISCUSSION at approximately position 982 and another of ϳ18 aa near the C The various potential mobile elements identified to date in the terminus, relative to other ␤= subunit sequences, so it may have an BOGUAY genome exhibit two distinct types of phylogenetic his- altered (or no) function. The other copy (BOGUAY 01343_3638) tory, as reconstructed from present-day sequences. Homing en- is near the end of a short contig and might be connected to the donucleases, group I and group II introns, GyrB and DnaE inteins, assemblage containing the ␣ subunit gene, but there is no evidence fdxN excision-like elements, and multiple copies of an ORF of for this in the existing sequence data. In light of this possible gene unknown function all have close cyanobacterial relatives and a rearrangement, it is interesting that a transposase gene (BOGUAY limited range of other bacterial affiliations, while restriction-mod- 00680_3522) is annotated at the downstream end of the proposed ification and toxin-antitoxin systems generally have (as expected) alpha operon (see Fig. S2 in the supplemental material). a much wider distribution. These patterns are discussed in the Candidate genes for several RNA polymerase sigma factors following sections. Putative transposons will not be discussed (␴70 [00397_1682], RpoN [00721_4589], RpoH [00906_2621,

July 2013 Volume 79 Number 13 aem.asm.org 3975 MacGregor et al.

00938_0742], RpoS [01092_1316], and RpoD [01192_0215]) E. coli position 1917, in the loop of predicted helix 69, and a pu- have also been annotated but will not be further discussed here. tative LAGLIDADG homing endonuclease at E. coli position 1931, tRNAs. Forty-six tRNA genes, covering all 20 standard amino in a predicted unstructured region between helices 69 and 71 (Fig. acids, were identified by JCVI, RAST, and JGI using tRNAScan- 1B). These positions and several nearby ones also harbor interven- SE-1.23 (19, 20), with three tRNAs (tRNA-Leu-GAG, tRNA-Tyr- ing sequences in a small but phylogenetically diverse group of GTA, tRNA-Val-TAC) represented twice (see Table S2 in the sup- other bacteria and archaea, hinting at a possibly complex history plemental material). Initiator methionine (iMet, fMet), extension of exchange, decay, and preservation for these elements. methionine (eMet), and lysylated isoleucine (kIle) tRNAs, which The phylogeny of the BOGUAY 23S homing endonuclease was share the anticodon CAT, were identified via the TFAM Web inferred by both maximum-likelihood (Fig. 2) and Bayesian (see server (21) and tRNA database (tRNdb) (22). Consistent with the Fig. S4 in the supplemental material) methods, which yielded sim- identification of tRNA-kIle-CAT, the draft genome includes a ilar results. The BOGUAY 23S homing endonuclease gene is affil- predicted gene for tRNA(Ile)-lysidine synthetase (BOGUAY iated with those from Synechococcus sp. C9 and Synechococcus livi- 00794_2073), which changes the specificity of one class of tRNA- dans C1 (34), with additional similar sequences in a few widely CATs from methionine to isoleucine by modifying their antico- divergent bacteria (e.g., several Chlamydiales, some Thermotoga don C to lysidine (23). Methionyl-tRNA formyltransferase, species, a few Coxiella burnetii strains) and eukaryotic organellar for formylation of initiator methionine, was also identified genomes (particularly in green algae). Close relatives of the intron (BOGUAY 00472_0536). sequence are also sporadically distributed (Fig. 3) but limited to After application of the standard wobble-base rules (24), bacteria, with the nearest relative so far in the giant sulfur bacte- tRNA-Arg-TCT and tRNA-Leu-TAA remained missing. Directed rium Thiomargarita sp. NAM092. Given that sequences similar to BLASTN searches with the uninterrupted cognate genes from Beg- both the intron and homing endonuclease genes are found to- giatoa alba B18LD revealed that these contain inserts, with the gether in BOGUAY, Synechococcus sp. C9, and several Coxiella sequence coding for the first 38 tRNA bases (up to the end of burnetii strains, they may have been transmitted together, with the the predicted anticodon loop) separated from the remainder of intron perhaps lost more easily over time. Synechococcus and Beg- the genes by spacers of 272 and 316 nucleotides (nt), respectively giatoaceae species can be found together in microbial mats (e.g., (see footnote to Table S2 in the supplemental material). This is a see reference 35), which may be favorable environments for gene common position for self-splicing group I introns in bacterial and exchange. Coxiella burnetii is a gammaproteobacterium only chloroplast tRNAs (25). The BOGUAY inserts appear to have at known to grow as an intracellular pathogen (36), but its DNA has least some of the conserved group I intron secondary structures been detected by PCR in a variety of terrestrial (37) and marine (not shown), but proof that they can excise would require exper- (38) environments, and one strain has been found infecting a imentation. The total of 48 tRNA genes is relatively few compared Steller sea lion (39), making genetic exchange between Coxiella to other bacteria (26), a trait which has been correlated with slow and Beggiatoaceae a possibility. As more giant sulfur bacterial growth and slow response to changes in nutrient concentrations chromosomal sequences become available, however, there may be (27, 28). evidence for vertical transmission of 23S rRNA introns within this tRNA synthetase genes. A complete set of tRNA synthetase group. genes was found (see Table S2 in the supplemental material). Like Whether there is a selective advantage to producing rRNA with many species in all three domains of life (29), the BOGUAY ge- intervening elements is (so far as we are aware) an open question; nome apparently does not possess a dedicated asparaginyl-tRNA perhaps it provides an additional control on the rate of ribosome synthetase. Instead, tRNA-Asn may first be ligated with aspartate, production or protection against other nucleases recognizing the which is then amidated by aspartyl-glutamyl-tRNA amidotrans- conserved target sites. It will be interesting to see if these elements ferase (GatCAB). A putative gatCA was located in the center of one are a general feature of deep-sea Beggiatoaceae; they are not found relatively long contig (BOGUAY 00150; 40,422 bp) and gatB on a in B. alba L18D, and a BgP fragment (BGP_R0100; 329 bp) has separate, smaller one (BOGUAY 00338; 3,292 bp). An apparent similarity to BOGUAY 23S sequences to either side of the intron amplification or sequencing error splits gatA into two ORFS and homing endonuclease but not to the elements themselves. (00150_0829, 00150_0828), with both fragments having high- The BgS genome does not appear to include any contigs spanning scoring matches to other putative gatA sequences (not shown). the insert region. In summary, although the single-filament BOGUAY genome is Group I introns in tRNA genes. The BOGUAY tRNA-Leu- not closed, genes encoding complete sets of ribosomal protein, TAA and tRNA-Arg-TCT genes contain inserts at position 38, the tRNA, and tRNA synthetase genes could be identified. While in- end of the anticodon loop, which may be group I self-splicing dividual genes may certainly have been missed, it seems unlikely introns (see Table S2 in the supplemental material). The first self- that entire multienzyme pathways were, but (as with any se- splicing bacterial intron discovered was in the anticodon loop of a quence-based predictions) experimental proof will be needed. cyanobacterial tRNA-Leu-TAA (40, 41), and introns in this tRNA Homing endonuclease gene and a group I intron in the 23S are widespread among cyanobacteria and chloroplasts (e.g., see rRNA gene. The inferred phylogeny of the BOGUAY 23S rRNA references 42 and 43). At least one has been found in a gamma- gene (see Fig. S3 in the supplemental material) is congruent with proteobacterium (44). Group I introns have also been identified in that of the 16S rRNA gene (see Fig. S1 in the supplemental mate- the tRNA-Arg-CCT of several alphaproteobacteria (45, 46), rial), at least so far as 23S rRNA sequences are available to date, but tRNA-Ile-CAT of a betaproteobacterium (45), and tRNA-fMet of it contains two intervening elements (Fig. 1A) similar to ones some cyanobacteria (47). They are not easily found in databases found in the rRNA genes of diverse species (e.g., see references because their primary sequences are variable, and automated an- 30–32), including other giant sulfur bacteria (33). In orange notation programs generally do not recognize the short-flanking Guaymas Beggiatoa, a putative group I catalytic intron is found at host gene fragments. The closest relative of the BOGUAY tRNA-

3976 aem.asm.org Applied and Environmental Microbiology Mobile Elements in Orange Guaymas Beggiatoa

FIG 1 Position of intervening sequences in the 23S rRNA gene. (A) The orange Guaymas Beggiatoa genome sequence includes a single set of rRNA genes. The figure combines JGI and RAST annotations; the 23S rRNA gene was not annotated by JGI and therefore has no number. It contains putative intron and homing endonuclease insertions, separated by 14 bp of apparent rRNA sequence. The intron sequence overlaps the upstream portion of the 23S gene by 4 bp. (B) The orange Guaymas Beggiatoa intervening sequences are in the predicted helix 69 loop and between helices 69 and 71 of the predicted rRNA structure, a region in which a variety of other bacteria and archaea also harbor putative introns and homing endonucleases, as outlined in the figure. These were identified primarily by searching the Silva (75) LSURef 111 database for 23S rRNA gene sequences longer than 3,000 bp and refining the alignment to place the intervening sequences precisely.

Leu-TAA was within the cognate gene of the cyanobacterium Lep- transfer from a cyanobacterium into the lineage encompassing tolyngbya sp. PCC 6703 (AY768525). Directed searches with the these three marine Beggiatoaceae (BOGUAY, BgP, BgS) after its BOGUAY tRNAs and their introns identified similar sequences in divergence from the B. alba lineage, but additional genome se- the BgP but not B. alba genomes. quences are needed to confirm this. It should also be pointed out Group II catalytic introns. Five possible self-catalytic group II that the several functions encoded by a single mobile element may introns were found in the BOGUAY genome (Fig. 4; see also Fig. be subject to different evolutionary pressures, so the true phylog- S5 in the supplemental material). Group II introns were originally eny may not be captured by simple sequence comparisons (49). discovered in eukaryotic organelles but have since been found in Four of the five putative BOGUAY group II introns are adja- many bacteria and some archaea (reviewed in reference 48). They cent to possible reverse transcriptase and/or transposase genes transpose as RNA: the intron self-catalytically splices from its host (see Table S3 in the supplemental material), but because three of RNA, splices into a DNA target site, and is then reverse transcribed the sequences are close to the ends of their contigs, it is not clear to become part of the genome at the new location. The enzymes what the full complement of associated genes might be. The fourth required for reintegration are often encoded within the introns (00500_2973) is located in the immediate neighborhood of three but may sometimes act in trans. The short (79- or 80-bp) lariat- possible transposases, a possible DNA-binding protein, a possible encoding regions of the five BOGUAY group II intron candidates endonuclease, and a possible retron-type reverse transcriptase. are highly similar to one another (two are identical) and have The fifth (00593; positions 280 to 358) is located on a short contig some close matches in the BgP and BgS, but not B. alba, genomes. (1,600 bp) with a possible toxin-antitoxin gene pair as the only The closest relatives of these sequences (only a few of which have other identifiable ORFs. Whether these elements are still mobile is been annotated as introns) are all from cyanobacteria, specifically not known. Trichodesmium erythraeum IMS101 and several Arthrospira spe- GyrB and DnaE inteins. Inteins are amino acid sequences that cies. These do not seem to be embedded in similar genes (not excise self-catalytically from their host proteins posttranslation- shown), although identifying split genes is not always straightfor- ally, leaving a functional host protein (“extein”; reviewed in refer- ward. The most parsimonious explanation would be a single ence 50). They may encode homing endonucleases, which make

July 2013 Volume 79 Number 13 aem.asm.org 3977 MacGregor et al.

100% ACU25394.1 Trebouxia suecica (chloroplast) 44% ACU25387.1 Trebouxia angustilobata (chloroplast) 25% AAL34389.1 Chlorosarcina brevispinosa (chloroplast) 20% AAL34311.1 Pedinomonas tuberculata (chloroplast) 100% AFA35829.1 Choricystis chodatii (chloroplast) 32% AFA35833.1 Coccomyxa rayssiae (chloroplast) Gammaproteobacteria 2% AAL34315.1 Pterosperma cristatum (chloroplast) Cyanobacteria 100% AAL77608.1 Monomastix sp. M722 (chloroplast) Thermotogales 33% YP_002601068.1 Monomastix sp. OKE-1 (chloroplast) 20% AAA20591.1 Acanthamoeba castellanii (mitochondrion) 8% YP_665642.1 Nephroselmis olivacea (mitochondrion) Aquificales 3% AAG61147.1 Chlorella vulgaris (mitochondrion) Chlamydiales 7% YP_635905.1 Oltmannsiellopsis viridis (chloroplast) 100% AAL34374.1 Lobochlamys segnis (chloroplast) Chlorophyta AAL34366.1 Chlamydomonas frankii (chloroplast) 82% Viridiplantae 88% AAL34360.1 Chlamydomonas mexicana (chloroplast) 100% AAL34368.1 Chlamydomonas geitleri (chloroplast) Amoebozoa 10% YP_002000425.1 Oedogonium cardiacum (chloroplast) 100% ABD91529.1 Synechococcus sp. C9 40% ABD91526.1 Synechococcus lividus C1 39% BOGUAY 00660_3608 41% BAL53026.1 uncultured Chloroflexi bacterium (gold mine metagenome) 84% AAX98676.1 Candidatus Fritschea bemisiae (whitefly endosymbiont) 18% ABM89081.1 Chlamydiales symbiont of Xenoturbella westbladi (marine worm) 58% YP_665682 Mesostigma viride (mitochondrion) 96% AAD38228.1 Simkania negevensis (human intracellular bacterium) 98% Thermotoga spp. [8 sequences] 99% 69% BAL58010.1 uncultured Chloroflexi bacterium (gold mine metagenome) YP_003473665.1 Thermocrinis albus DSM 14484 96% ABD91528.1 Synechococcus sp. C9 YP_002601069.1 Monomastix sp. OKE-1 (chloroplast)

0.10 FIG 2 Putative homing endonucleases encoded within 23S rRNA genes. Relatives of BOGUAY 0660_3608 were identified by BLASTX searches of the GenBank nr and JCVI IMG/ER databases (March 2013). Inferred amino acid sequences were aligned in MEGA5 (10) with MUSCLE (11). The first 100 hits were used for initial neighbor-joining trees, which were pruned to focus on the most closely BOGUAY-related sequences before further analysis. The maximum-likelihood phylogeny was inferred in ARB (13). All four combinations of the PROTGAMMA and PROTMIX rate substitution models and WAG and cpREV amino acid substitution models were tested with RAxML (14) new rapid hill climbing. The PROTGAMMA/cpREV combination gave the best likelihood score and was used for RAxML rapid bootstrap analysis with 1,000 bootstraps. Numbers represent bootstrap values for each node, and circles indicate uncertain branch points. Bayesian analysis also identified cpREV as the best-fitting amino acid substitution model for these data (see Fig. S4 in the supplemental material). Numbers at nodes represent bootstrap values for each node, and circles indicate uncertain branch points. The scale bar represents amino acid changes per position. double-stranded cuts in target genes into which the intein genes supplemental material), and one in the putative DNA polymerase are spliced from an existing integrated copy. The BOGUAY ge- III alpha subunit gene dnaE (Fig. 5B; see also Fig. S6B in the sup- nome encodes at least two possible inteins, one in the putative plemental material), which belong to an intein family also found DNA gyrase B subunit gene gyrB (Fig. 5A; see also Fig. S6A in the in proteins, including replicative DNA helicase, GyrA, and ribo-

100% AJ556793.1 Thermotoga subterranea 33% CP000812.1 Thermotoga lettingae TMO 96% HQ213630.1 Uncult. Solirubrobacterales (Arctic soil) 67% AY705476.1 Uncultured bacterium ALT16 (Altamira cave) 97% DQ421380.1 Synechococcus sp. C9 100% BOGUAY 00660_3609 74% FR774200.1 Thiomargarita sp. NAM092 AE016828.2 Coxiella burnetii RSA 493 28% CP000890.1 Coxiella_burnetii RSA 331 Gammaproteobacteria Cyanobacteria 73% CP001020.1 Coxiella burnetii CbuK Q154 Thermotogales CP000733.1 Coxiella burnetii Dugway 5J108−111 CP001019.1 Coxiella burnetii CbuG Q212 0.10

Species also has homing endonuclease within 23S rRNA gene Close relatives have homing endonuclease within 23S rRNA gene FIG 3 Putative group I introns within 23S rRNA genes. Relatives of BOGUAY 0660_3609 were identified by BLASTN searches of the GenBank nr and JCVI IMG/ER databases; all are located within 23S rRNA genes. As shown, there were very few full-length or nearly full-length matches. Species which also have putative 23S homing endonucleases are indicated by black stars, and species whose close relatives do are indicated by open circles (compare to Fig. 2). Nucleic acid sequences were aligned in MEGA5 (10) using MUSCLE (11). Maximum-likelihood phylogenies were inferred in ARB (13) with RAxML rapid bootstrapping (14), using the GTRGAMMA rate distribution model and 1,000 bootstraps. Numbers at nodes represent bootstrap values for each node, and circles indicate uncertain branch points. The scale bar represents base changes per position.

3978 aem.asm.org Applied and Environmental Microbiology Mobile Elements in Orange Guaymas Beggiatoa

Beggiatoa sp. PS, contig 00231 (756-826)

BOGUAY, contig 00632 (52225-52147) 0.99 BOGUAY, contig 00762 (2923-2845)

1 Beggiatoa sp. PS, contig 24845 (233-311)

BOGUAY, contig 00593 (280-358) Beggiatoaceae BOGUAY, contig 00647 (325-399) Cyanobacteria

Beggiatoa sp. SS, NZ_ABBY01000346 (350-428)

0.57 0.98 Beggiatoa sp. SS, contig 144 (16-94)

BOGUAY, contig 0500 (10070-10133) 0.82 Beggiatoa sp. PS, NZ_ABBZ01000876 (130-208)

Trichodesmium erythraeum IMS101, CP000393.1 (6441831-6441909)

Trichodesmium erythraeum IMS101, CP000393.1 (678613-678523)

0.63 Trichodesmium erythraeum IMS101, CP000393.1 (675675-675753) 0.93 0.62 Trichodesmium erythraeum IMS101, CP000393.1 (2498900-2498991)

Arthrospira platensis C1, AFXD00000000 (230031-230109)

Arthrospira platensis NIES-39, AP011615 (2828610-2828688) 0.97 0.93 Arthrospira sp. PCC 8005, NZ_ADDH01000086 (27273-27352)

0.07 Arthrospira maxima CS-328, NZ_ABYK01000020 (20742-20820)

Trichodesmium erythraeum IMS101, CP000393.1 [10 sequences]

FIG 4 Inferred phylogeny of candidate BOGUAY group II catalytic introns and related sequences. Four sequences annotated as BOGUAY group II introns were used for BLASTN searches of the IMG/ER database (cutoff E value of 9eϪ06, cutoff score of 58 bits). For sequences not previously annotated (the majority), genome positions are indicated. Three Crocosphaera watsonii sequences were used to root the tree. Nucleic acid sequences were aligned in MEGA5 (10) using MUSCLE (11). Phylogenetic analysis was carried out in MrBayes 3.2 (15) (2 runs of 4 chains each, 1.5 million generations) using the GTR base substitution and invgamma rate variation models. With the first 25% of generations discarded, the final statistics indicated that convergence was reached between the two runs (data not shown). The scale bar represents base changes per position. See Fig. S5 in the supplemental material for the phylogeny inferred by neighbor joining. nucleoside diphosphate reductases in a wide variety of bacterial sequences from other Beggiatoaceae (see Fig. S7 in the supplemen- and archaeal genomes. Both sequences encode possible homing tal material), and their positions are similar to that predicted from endonucleases of the LAGLIDADG family (reviewed in reference the 16S rRNA phylogeny (MacGregor et al., submitted) (53). 51), and both cluster with cyanobacterial sequences, although the Additional N- and C-terminal intein fragments have been an- identity of the closest relative varies with the phylogenetic infer- notated in the BOGUAY genome (00251_2940 and 00127_3146, ence method used (maximum likelihood or Bayesian analysis). respectively), which are positioned at the end of their contigs in The noncyanobacterial matches are more phylogenetically diverse such an orientation that they could be joined end to end. The than for some of the other BOGUAY elements discussed here. For concatenated ORFs would encode the uncharacterized conserved the GyrB intein, these include halophilic archaea (notably Halo- protein RtcB (COG1690) containing an endonuclease-bearing in- quadratum walsbyi, “Walsby’s square bacterium” [52]) and two tein, with the intein most similar to possible cyanobacterial inteins species of sulfur-oxidizing gammaproteobacteria (Fig. 5A), while and the extein most similar to predicted proteins from BgP, B. the BOGUAY DnaE intein is related to a planctomycete and a alba, and many other gammaproteobacteria (not shown). Bacteroidetes sequence (Fig. 5B). GyrB inteins have not yet been Sequences related to fdxN excision elements. The BOGUAY found in other Beggiatoaceae species, but Beggiatoa alba B18LD genome includes multiple copies of ORFs similar to cyanobacte- encodes a possible DnaE intein (more distantly related than the rial XisH and XisI genes, most found in pairs. They have also been sequences shown here), BgP encodes a DnaB one, and the partial noted in the BgP and BgS genomes (6), and a single xisI-like ORF BgS genome includes a short contig encoding a fragment of a is found in B. alba L18D (BegalDRAFT_0886). These elements replicative DNA helicase (BGS_0184) and a possible N-terminal have been best studied in the diazotrophic cyanobacterium intein fragment (BGS_0185). The inteins in the different Beggiato- Anabaena (Nostoc) sp. strain PCC 7120, in which terminal differ- aceae may have been introduced independently, but inferring the entiation of cells within filaments to nitrogen-fixing heterocysts evolutionary history of inteins is complicated by the fact that the requires excision of DNA fragments interrupting three nitrogen endonuclease function can be supplied in trans and could therefore fixation genes, each of which is associated with a site-specific re- decay rapidly in species that happen to bear or obtain a second, com- combinase (nifD/XisA, hupL/XisC, fdxN/XisF) (54, 55). Excision patible intein. of the fdxN element also requires XisH and XisI, which are en- As expected, the exteins of the BOGUAY GyrB and DnaE in- coded within the excised fragment along with XisF. Their mode of teins appear to be gammaproteobacterial and closely related to action is not known.

July 2013 Volume 79 Number 13 aem.asm.org 3979 MacGregor et al.

A) BOGUAY GyrB intein and related sequences B) BOGUAY DnaE intein and related sequences

100% Thermus spp. (6 sequences) 100% CRC_01164 Cylindrospermopsis raciborskii CS−505 95% 100% Alvin_0467 Allochromatium vinosum DSM 180 54% CRD_00599 Raphidiopsis brookii D9 99% ThimaDRAFT_4162 Thiocapsa marina 5811 DSM 5653 45% MC7420_309 Coleofasciculus chthonoplastes PCC 7420 98% Trad_2353 Truepera radiovictrix DSM 17093 Tery_3961 Trichodesmium erythraeum IMS101 31% 35% GobsU_010100012782 Gemmata obscuriglobus UQM 2246 100% BOGUAY 01113_1161 (intein without endonuclease) 87% BOGUAY 00337_2021 (intein only) 100% GyrB inteins Tery_1889 Trichodesmium erythraeum IMS101 Synechocystis spp. (4 sequences) 100% 98% 39 % MC7420_2413 Coleofasciculus chthonoplastes PCC 7420 DnaE inteins 97% FR746099 Haloquadratum walsbyi C23 36% PERMA_0903 Persephonella marina EX−H1 76% Halar_2635 halophilic archaeon sp. DL31 67% Rmar_1917 Rhodothermus marinus DSM 4252 A390DRAFT_022215 Lamprocystis purpurea DSM 4197 81% VAB18032_13715 Verrucosispora maris AB−18−032 86% 87% Thi970DRAFT_3239 Thiorhodovibrio sp. 970 100% Alvin_1987 Allochromatium vinosum DSM 180 40% Hhal_0653 Halorhodospira halophila SL1 100% OSCT_2440 Oscillochloris trichoides DG6 32% BGS_0185 Beggiatoa sp. SS Cyan8802_1675 Cyanothece sp. PCC 8802 83% Inteins in 75% CwatDRAFT_6199 Crocosphaera watsonii WH 8501 65% PCC8801_1648 Cyanothece sp. PCC8801 57% ECTPHS_00480 Ectothiorhodospira sp. PHS−1 hypothetical 100% CY0110_12402 Cyanothece sp. CCY0110 71% BCB4264_A3587 Bacillus cereus B4264 (XRE family transcriptional regulator intein) 50% L8106_15929 Lyngbya sp. PCC8106 proteins 100% HMPREF1014_01385 Bacillus sp. 7_6_55CFAA_CT2 (hypothetical protein intein) Npun_R6421 Nostoc punctiforme PCC 73102 47% BcerKBAB4_5745 Bacillus weihenstephanensis KBAB4 plasmid (DnaB intein) OSCT_0894 Oscillochloris trichichoides DG6 dCTP 35% Krac_10863 Ktedonobacter racemifer DSM 44963 (ATPase intein) ECTPHS_09328 Ectothiorhodospira PHS−1 deaminase 93% Fleli_2398 Flexibacter litoralis Fx l1 DSM 6794 99% Halhy_3634 Haliscomenobacter hydrossis O DSM 1100 0.10 inteins 99% Dester_1454 Desulfurobacterium thermolithotrophum DSM 11699 100% Streptomyces cattleya strs. (2 sequences) 82% 73% Desku_3545 Desulfotomaculum kuznetsovii 17 DSM 6115 100% B128DRAFT_00859 Thermus igniterrae ATCC 700962 Gammaproteobacteria 98% RLTM_08244 Thermus sp. RLM 59% MeichDRAFT_1677 Meiothermus chliarophilus ALT−8 DSM 9957 Cyanobacteria 100% Mrub_2587 Meiothermus ruber DSM 1279 Actinobacteria 40% NC_014974 Thermus scotoductus SA−01 DnaB inteins 100% Rhodothermus marinus strs. (3 sequences) Chloroflexi 33% Bacteroidetes 31% NZ_ABBZ01000574 Beggiatoa sp. PS 100% Nostoc punctiforme strs. (2 sequences) 38% 59% Cyan7882_2654 Cyanothece sp. 7822 Deinococci 100% OSCI_2300004 Oscillatoria sp. PCC 6506 21% Cyan7425_0288 Cyanothece sp. PCC 7425 AP012205 Synechocystis sp. PCC 6803 str. GT−S Aquificales 13% NC_00911 Synechocystis sp. PCC 6803 SYNPCCP_2552 Synechocystis sp. PCC 6803 str. PCC−N SYNPCCN_2552 Synechocystis sp. PCC 6803 str. GT−I Archaea 0.05

FIG 5 Inferred maximum-likelihood phylogeny of the putative BOGUAY GyrB and DnaE inteins. Relatives of the BOGUAY sequences (highlighted by boxes) were identified by BLASTX searches of the GenBank nr and JCVI IMG/ER databases. The first 100 hits were used for initial neighbor-joining trees, which were pruned to focus on the most closely Beggiatoa-related sequences before further analysis. Inferred amino acid sequences were aligned in MEGA5 (10) using MUSCLE (11). Maximum-likelihood phylogenies were inferred in ARB (13) with RAxML rapid bootstrapping (14), using the PROTMIX rate distribution model and WAG amino acid substitution model. Five hundred bootstraps were used for panel A, and 1,000 were used for panel B. Numbers at nodes represent bootstrap values for each node, and circles indicate uncertain branch points. The scale bar represents amino acid changes per position. Bayesian trees are presented for comparison in Fig. S6 in the supplemental material.

There are at least 12 sets of genes encoding putative XisHI and as far as we are aware no mechanism governing their forma- homologs in the BOGUAY genome, plus a lone XisI gene at the tion has been identified. The “White Point filament” strain can beginning of a contig (see Table S4 in the supplemental material). attach by holdfasts to solid substrates or form rosettes with other The phylogenetic trees (Fig. 6) suggest that xisI and xisH were cells (59), suggesting specialization by terminal cells. Variable cell acquired as a pair on one or more occasions and have diverged as morphology has been reported for some close relatives, including pairs, with xisH being lost or decaying more rapidly (for example, several Thiothrix species (gliding gonidia, holdfasts, rosettes, spi- compare XisH 1 and 6 with XisI 1, 6, and 11). No associated ral filaments) (60, 61). Any gene rearrangements induced by an recombinase could be identified, although one XisHI copy (on Xis-like system might also be more easily tolerated in filamentous contig 00833) is flanked by putative genes for the RecB family and species, to the extent that they are able to share cellular contents CRISPR-associated endonucleases. Sequences similar to xisHI between individuals, possibly making these elements more likely have so far been found primarily in cyanobacteria and in only a to persist and evolve new functions. Large bacteria such as the few phylogenetically dispersed noncyanobacterial genomes, rep- vacuolate Beggiatoaceae can also possess multiple nucleoids per resenting either filamentous or pleiomorphic species (see Table S5 cell, possibly allowing for coexistence of wild-type and mutant in the supplemental material). Of genomes studied to date, the alleles. cyanobacterium Coleofasciculus (Microcoleus) chthonoplastes PCC Sequences similar to ORF 00024_0693. ORF 00024_0693, 7420 (filamentous, nondiazotrophic) (56) and the sphingobacte- found on the contig encoding an abundant orange cytochrome rium Haliscomenobacter hydrossis (filamentous) (57) have been that helps give orange Guaymas Beggiatoaceae their color (5), may annotated with a number of these elements comparable to that in be derived from a mobile element. There are at least 28 more the BOGUAY genome. similar ORFs annotated in the BOGUAY genome, and they are Whether any xisHI-like proteins are or were involved in gene found in the other sequenced Beggiatoaceae genomes as well (27 in inversions outside the cyanobacteria is not known, but gene rear- BgP, 1 in BgS, and 1 in B. alba; Fig. 7). The BgP and especially BgS rangements promoting cell differentiation (e.g., heterocyst for- genomes are incomplete (6), so their lower copy numbers may be mation) might be of particular benefit to filamentous species. Sac- an artifact, but the B. alba genome is considered complete. A single rificial dead cells (necrosomes) which serve as filament breakage copy has been annotated in one related sulfur oxidizer (Thiothrix points have been observed in several Beggiatoaceae strains (58), nivea DSM 5205), but as previously noted in relation to some of

3980 aem.asm.org Applied and Environmental Microbiology Mobile Elements in Orange Guaymas Beggiatoa

9 BOGUAY 00553_4206 B153DRAFT 04180 B153DRAFT 3 XisI BOGUAY 00362_1737 L8106 27249 XisH MC7420 6682

MC7420 5202

AmaxDRAFT 0543

AplaP 010100024390PCC7424 5305 Halhy 5214 Halhy Halhy 4001 NIES39 O02950 Npun R1050

Halhy 4773 9 Tery 0527

Halhy 1416 Slin 0238 B153DRAFT 04181 Halhy 2867 Cy51472 3422 5

Runsl 2265 SYNPCC7002 F0026 Cyan7822 5112 MC7420 2946

cce 2414 9 4982 Cyan7822 AplaP 010100022628 NIES39 K02470 BegalDRAFT 0886 N9414 11219 3080 Cyan7425 Halhy 5481Halhy 4002 MC7420 7593 Halhy 4195 Aazo 4436 Runsl 2523

AmaxDRAFT 4369 SPLC1 S131550 N9414 23713 APCC8 010100008737 8 N9414 10508 Halhy 2865Runsl 2264

CwatDRAFTMC7420 4877 4662 CwatDRAFT 6584 CwatDRAFT

N9414 01230 N9414 Ava 3825 Ava Cy51472 3707 Slin 0239 MAE 16020 L8106 00810 APCC8 010100008732CwatDRAFT 4878 MAE 61420 Halhy 4296

MAE 61430 SPLC1 S131540 MC7420 569 L8106 00805 L8106 14320 Cy51472 1070 OSCI 3280031 650112958.OSCT 0916 3 Halhy 1415 AmaxDRAFT 4368 B153DRAFT 00519 PCC7424 4368cce 2705 BOGUAYMC7420 00553_4208 7863 BOGUAY 00946_4151 Slin 1546 2 PCC7424 5304 BOGUAY 00342_3380cce 0011 AmaxDRAFT 0542 BOGUAY 00553_4207 Npun R1104 NIES39 O02940 BOGUAY 01341_2380 MAEMC7420 56210 4891 AplaP 010100022633MAE 44780 AplaP 010100024385 CwatDRAFT 0054 MC7420 3009 MC7420 6481 Halhy 5213 Halhy 6510 Aazo 4435 Halhy 4772 GobsU 010100021840 NIES39 K02480 BOGUAYBGP 6030 00362_17384 A3IKDRAFT 03301 CCHmeta 00669 SYNPCC7002OSCI 3280030 F0025 cce 0472 4 BOGUAY 00833_4598 CCHmeta 04390PCC7424 4222 Cy51472 1531 AplaP 010100014778 Cyan8802 2205 Npun R1105 Cyan8802 2206 CwatDRAFT 4842 N9414 13817 MAE 56080 NIES39 O05790 .N9414 13812 AM1 4785 CCHmeta 00003 MAE 56070 OSCI 1460020 Cyan7425 3079 CCHmeta 04070 BOGUAYcce 2415 00833_4597 Haur 0276 MC7420 5272 Cy51472 3423 L8106 20815 NIES39 O02970 Haur 00239 Haur 05505 AplaP 010100027240 Ava 3313 MC7420 1270 APCC8 010100012068 Npun F2942 PCC7424 4367 Haur 5180 Ava 3826 BGP 5820 MC7420L8106 8075 22411 MAE 21850 MC7420 2075 MC7420 500 CY0110 08366 MAE 44790 BGP 5662 MC7420 7674 Cyan7822 5514 CwatDRAFTHalhy 6583 2856 20 PCC7424 2630 BGP Ava4792 2270 Npun AR114 A3IKDRAFT 03300 CY0110 04708 Ava 3316 alr1462 Thini 0796 BGP 6031 Cyan7822 5515 alr1461 Npun R5538 Cyan7822 0756 BOGUAY 01159_4377 OsccyDRAFT 2868 MAE 61050 Npun AR115 CY0110 23476 13 Cyan7425 2015 MC7420 2842 BOGUAY 01341_2379 Npun R5662 NIES39 C03560 Cy51472cce 4263 3285 AplaP 010100020594 CY0110 04713 Npun F5392 8 BOGUAY 00494_3983 BOGUAY 00946_4150 OSCI 1460021 CwatDRAFT 0034 10 Cyan8802 3055 OSCI 250001 Slin 1545 5 A3IKDRAFT 02925

MAE 20240 0150 PCC8801 AmaxDRAFT 0544

OSCI 2300006 Cyan8802 0146 Halhy 2866 Aazo 2556 Cyan7822 1476 APCC8 010100012063 Halhy 4295 Npun R4330 BOGUAY 00948_1636 Cyan8802 3443 SPLC1 S490110 7 Ava 3312 AplaP 010100027235 CCHmeta 06243 PCC8801 2661 MAE 21860 MC7420 492 L8106 12260 BGP 5821 NIES39 O02960 NIES39 O05780Npun F2941 BGP 5663N9414 18830 MC7420 8160 AplaP 010100014783 OSCI 250002 Npun R5537 CwatDRAFT 6412

PCC8801 2660 L8106 25265 L8106 22416

PCC8801 0149 Tery 2358

BOGUAY 00997_3078 BGP 4791 BOGUAY 01026_3196 Cyan8802 0145 Ava 2269 Cyan7822 1475 OsccyDRAFT 2869 0.2 Cyan7425Npun 2014 R5661MC7420 8077 1 6 Ava 1458 BOGUAY 01026_3195 Halhy 5212 6 N9414 23103 BOGUAY 00997_3079CCHmeta 06242 cce 3286/1-113 AplaP 010100020589 BGP 2002 Cy51472 4264 1 BOGUAY 00754_1883 Thini 4073 APCC8 010100004480 MAE 02070 AM1 4783

cce 2706 cce MC7420 2743 CwatDRAFT 6659 Cyan8802 1916 PCC8801 1892PCC8801 1884 MC7420 1216 NIES39 C03570 11 MAE 16030 Ava 3320 Cyan7822 0757 N9414 23708 CwatDRAFT 6654 BOGUAY 00948_1637 Cy1472 3708 cce 0010 MC7420 7506 Halhy 2857 7 BOGUAY 00494_3984 CY0110 10732 A3IKDRAFTCy51472 01128 1069

10 N9414 11224 Cyan7822 5113 5179 Haur Halhy 6434

BOGUAY 00342_3379 05504 Haur

Halhy 2152 2 sp. 90 (FJ461733.cds11.9809..9474) IMG Copies in tree abbreviation Species XisH XisI Anabaena 0.1 Cyanobacteria AM1 Acaryochloris marina MBIC11017 1 1 - Anabaena sp. 90 - 1 Ava Anabaena variabilis ATCC 29413 4 5 A3IKDRAFT 02924

Haur 0282

APCC8 Arthrospira sp. PCC 8005 2 3 .Haur 00300

Haur 00292 AmaxDRAFT Arthrospira maxima CS−328 2 3 IMG Copies in tree SPLC1 Arthrospira platensis C1 1 2 abbreviation Species XisH XisI NIES39 Arthrospira platensis NIES-39 5 5 Gammaproteobacteria AplaP Arthrospira platensis str. Paraca 5 5 BOGUAY Orange Guaymas Beggiatoa 11 12 CwatDRAFT Crocosphaera watsonii WH 8501 3 7 BGP Beggiatoa sp. PS* 4 5 Cy51472 Cyanothece sp. ATCC 51472 (BH63E) 4 5 BegalDRAFT Beggiatoa alba B18LD - 1 cce Cyanothece sp. ATCC 51142 (BH68) 4 5 A3IKDRAFT Thiothrix flexilis DSM 14609 2 3 CY0110 Cyanothece sp. CCY0110 2 3 Thini Thiothrix nivea DSM 5205 - 2 PCC7424 Cyanothece sp. PCC 7424 2 4 Cyan7425 Cyanothece sp. PCC 7425 2 2 Chloroflexi Cyan7822 Cyanothece sp. PCC 7822 4 5 CCHmeta Candidatus Chlorothrix halophila 3 3 PCC8801 Cyanothece sp. PCC 8801 2 4 Haur Herpetosiphon aurantiacus ATCC 23779 4 5 Cyan8802 Cyanothece sp. PCC 8802 3 4 OSCT Oscillochloris trichoides DG 1 - L8106 Lyngbya sp. PCC 8106 3 6 MC7420 Microcoleus chthonoplastes PCC 7420 6 17 Bacteroidetes MAE Microcystis aeruginosa NIES−843 6 8 Halhy Haliscomenobacter hydrossis DSM 1100 8 13 N9414 Nodularia spumigena CCY9414 3 7 Runsl Runella slithyformis DSM 19594 1 2 alr Nostoc sp. PCC 7120 1 1 Slin Spirosoma linguale DSM 74 2 2 Aazo 'Nostoc azollae' 0708 1 2 B153DRAFT Spirosoma panaciterrae DSM 21099 1 2 Npun Nostoc punctiforme PCC 73102 5 8 OsscyDRAFT Oscillatoriales sp. JSC-12 1 1 Planctomycetes OSCI Oscillatoria sp. PCC 6506 4 3 GobsU Gemmata obscuriglobus UQM 2246 - 1 SYNPCC7002 Synechococcus sp. PCC 7002 1 1 Ter y Trichodesmium erythraeum IMS101 - 2 - Not found * Incomplete genome sequence

FIG 6 Sequences related to XisH and XisI. The trees include all sequences annotated as XisH, XisI, or fdxN excision-control proteins in IMG/ER (as of August 2012). A neighbor-joining tree was first constructed with all the inferred amino acid sequences, and the database was then divided into XisH- and XisI-like sequences. BOGUAY ORFs found adjacent to each other are indicated by boldface numbers; see Table S4 in the supplemental material for details. There is one additional putative partial XisI gene and 4 partial or split XisH genes in the BOGUAY genome. Phylogenetic analyses were conducted in MEGA5 (10), with sequences aligned by MUSCLE (11). Evolutionary histories were inferred by neighbor joining (76) with 500 bootstrap replicates (77). Evolutionary distances were computed by the Poisson correction method (78) and are in units of amino acid substitutions per site. All ambiguous positions were removed for each sequence pair. For XisH, the analysis involved 114 amino acid sequences, and there were 330 positions in the final data set. For XisI, 171 amino acid sequences and 260 positions were used. The scale bar represents amino acid changes per position. RAxML and Bayesian analyses gave similar groupings (not shown).

the BgP copies (6), nearly all other annotated copies are found in gliding bacteria (see Table S5 in the supplemental material). Only cyanobacteria, often in multiple copies per genome. The cyano- a few of these ORFs have been assigned a possible function. The bacterial species represented include both unicellular (e.g., Cyan- BOGUAY ORF 01192_0187 has been annotated as a type I restric- othece spp.) and filamentous (e.g., Nostoc punctiforme) types, but tion enzyme R protein N terminus by IMG/ER, which would be the noncyanobacterial species represented are all filamentous consistent with a mobile-element origin(s) for these sequences.

July 2013 Volume 79 Number 13 aem.asm.org 3981 MacGregor et al.

Anabaena sp. 90 FJ461733.cds10.9431..882 90 sp. Anabaena IMG “0693” sequences

abbreviation Species in tree plasmid Cy782202 Cyan7822 6509 Cyan7822 Cy782202 plasmid

BOGUAY 01185_3406

BOGUAY 00556_2292 Gammaproteobacteria BOGUAY 00945_3813

BOGUAY 00226_3298 BOGUAY Orange Guaymas Beggiatoa 29 Cyan8802 0062 Cyan8802

PCC8801 0064 BGP Beggiatoa sp. PS* 27

BGP 2425 BGS Beggiatoa sp. SS* 1

BOGUAY 01192_0187 BegalDRAFT Beggiatoa alba B18LD 1

Cyan7425 5292 4574 Cyan7425 MC7420 4955 4955 MC7420

MC7420 5442 MC7420 Thini Thiothrix nivea DSM 5205 1 MC7420 8151 3860016 OSCI Fleli 3549 Ava C0220

Cyan8802 0666 CRC 00653 CRC

PCC8801 0646 RoseRS 2929 BGP 2313 Rcas 1959 Chloroflexi BGP 5316 Tery 1983 CCHmeta 02360 BOGUAY 00826_3885 MC7420 1683 CCHmeta 03439 CCHmeta Candidatus Chlorothrix halophila 5 Tery 1592 BOGUAY 00244_2468 Tery 2847 RoseRS Roseiflexus sp. RS-1 1 BGP 5838 MC7420 2989 Cyan7425 1342 L8106 18252 Cyan7425 4422 Rcas Roseiflexus castenholzii DSM 13941 1 CCHmeta Tery 05042 2757 BOGUAY 00316_2957MC7420 3859 OsccyDRAFT 1816 BOGUAY 00871_2432 CCHmeta 03562 CCHmeta 04611 BOGUAY 00898_4415 Bacteroidetes BOGUAY 01051_2217 BGP 4579 Flexi Flexibacter litoralis Fx l1 DSM 6794 1 BOGUAY 00241_1042

BGP 2499 BGS 1266 Cyanobacteria Tery 3292 BGSBOGUAY 0222 00628_3551 - Anabaena sp. 90 1 Thini 1398 BGP 1525 Ava Anabaena variabilis ATCC 29413 1 BGP 0118BGP 1466 BGP 2683 BOGUAY 00069_4434 PCC7424 Cyanothece sp. PCC 7424 1 BGP 0375 BGP 1694 BOGUAY 00997_3074 Cyan7425 Cyanothece sp. PCC 7425 4 BOGUAY 01185_3412 Cyan7822 0295 BOGUAY 00780_1610 Cyan7822 Cyanothece sp. PCC 7822 2 MC7420 2884 BGP 4037 BOGUAY 00117_1562 BGP 5765 BGP 1008 PCC8801 Cyanothece sp. PCC 8801 2 BOGUAY 01054_0106Npun F4826 BGP 2439 BGP 4068 Cyan8802 Cyanothece sp. PCC 8802 2 OSCI 1460018 BGP 4069 Npun R1333 BOGUAYBOGUAYBOGUAY 00331_4748 00059_1205 00469_2757 CRC Cylindrospermopsis raciborskii CS-505 2 CRD 00237 Npun F0326 L8106 Lyngbya sp. PCC 8106 1 CRD 00668 BGPBOGUAY 2933 00647_3822 BOGUAYBGP 2024 00472_0544BGP 4407 MC7420 3609 BOGUAY 01070_3584 MC7420 Microcoleus chthonoplastes PCC 7420 12 MC7420 6108 MC7420 5261 BOGUAY 00826_3884 N9414 Nodularia spumigena CCY9414 1 CYB 0591 OsccyDRAFT 3613 Aazo Nostoc azollae' 0708 1 BOGUAY 00024_0693 Npun Nostoc punctiforme PCC 73102 3 Syn63A14 00963 BOGUAY 01192_0199 Syn65A14 00737 BGP 1560

Syn60A14 00774

CRC 02234 N9414 22778 22778 N9414 CRC 01820 BOGUAY 00463_3787 OsscyDRAFT Oscillatoriales sp. JSC-12 2 Tery 2919 BGP 3867

BGP 6185 BGS 1151

MC7420 3359 MC7420

OsccyDRAFT 3612 OsccyDRAFT BGP 1752

MC7420 4465 4465 MC7420 BGP 5652 OSCI Oscillatoria sp. PCC 6506 2 MC7420 3381 MC7420 BegalDRAFT 2266 PCC7424 1907 PCC7424 CRD Raphidiopsis brookii D9 2 0.1 1826 Tery CYB Synechococcus sp. JA-2-3Ba(2-13) 1 Syn60A14 Synechococcus sp. PE A1-1 60AY4M2 1

Syn63A14 Synechococcus sp. PE A1-1 63AY4M1 1 Aazo 2391 Aazo

FIG 7 Sequences related to hypothetical protein BOGUAY 00024_0693 (arrow). The IMG/ER database was searched with the nucleic acid and inferred amino acid sequences of ORF 00024_0693 (BLASTX, cutoff score of 55 bits, E value of 9eϪ06; BLASTP, 56 bits, E value of 3eϪ06). Only BOGUAY 01192_0187 (star) has been annotated in IMG/ER with a possible function, containing a possible type I restriction enzyme R protein N-terminal domain (pfam04313). An ORF from Cyanothece sp. PCC 7822 (circle) has been assigned to a plasmid rather than a chromosome. Phylogenetic analyses were conducted as described for Fig. 6. The analysis involved 121 amino acid sequences. All ambiguous positions were removed for each sequence pair. There were a total of 328 positions in the final data set. The scale bar represents amino acid changes per position. RAxML and Bayesian analyses gave similar groupings (not shown).

One of the two Cyanothece sp. PCC 7822 copies has been assigned The large surface area of filaments, particularly vacuolated ones, to a plasmid (Fig. 7), and the “Nostoc azollae” ORF Aazo_2772 has should increase the likelihood of encounters with mobile ele- an N-terminal 141-aa segment similar to the 0693 elements and a ments, which could mean higher uptake rates. There may also be longer C-terminal segment which is a predicted IS4 family trans- elements (e.g., transducing phage) specializing in attachment to posase. Database searches with this sequence yielded many related cell surface components shared among filamentous species, such putative transposase-associated elements, largely in cyanobacteria as an element of the gliding motility apparatus. (not shown). However, there is no obvious relationship between Restriction-modification systems. Restriction-modification the 0693-like sequences and other mobile elements identified to systems generally combine methylation of specific DNA se- date in the BOGUAY genome (transposases, XisHI genes, toxin- quences with double-stranded DNA cutting by nucleases recog- antitoxin or restriction-modification system genes). Where two nizing unmethylated copies of the same sequence, either within 0693 elements are found on the same contig (01192_0187, the recognition sequence or at a specific or undefined distance 01192_0199; 00826_3884, 826_3885; 01185_3406, 01185_3412), from it. The most common role of restriction-modification sys- their sequences are sufficiently different that they do not appear to tems is thought to be defense against foreign DNA, such as phage: be recent duplications. restriction sites on host DNA are protected by methylation, but The phylogenetic distribution of the xisHI and 00024_0693- incoming DNA is not, unless it derives from a source with a cog- like elements is therefore similar, including a wide range of cya- nate system. There is extensive evidence for horizontal transfer nobacteria (motile and nonmotile, filamentous and unicellular, of restriction-modification systems (e.g., see references 62 and diazotrophic and nondiazotrophic, differentiating and nondiffer- 63), and there are often systems of multiple origins within a entiating) but only a few other bacteria, whose shared character- single genome (e.g., see reference 64). These may still be asso- istic appears to be filamentous or pleiomorphic cell morphology. ciated with evidence of mobile elements, including long in- Given the known habitats of the strains concerned (see Table S5 in verted repeats and transposase genes (e.g., see reference 65). the supplemental material), microbial mats or (very recently) ac- The pace of genetic exchange is such that even strains of the tivated sludge (57) seem a likely environment for gene exchange. same species may have quite different restriction enzyme com-

3982 aem.asm.org Applied and Environmental Microbiology Mobile Elements in Orange Guaymas Beggiatoa plements. There are also examples of restriction-modification BOGUAY genome, which reported matches above a cutoff value system components that have apparently been adopted for (E value Յ 1.00eϪ05), in order, to a maximum of five. Given the other functions, such as methylation control of gene expression large number of sequenced gammaproteobacterial genomes, (e.g., see references 66 and 67). matches outside this group are suggestive (although not proof) of For the BOGUAY genome, closest relatives of possible restric- gene exchange. Once sequences putatively encoding elements tion-modification genes found by JCVI, IMG/ER, RAST, or such as restriction enzymes, transposases, XisH elements, and tox- BLASTX searches for one enzyme type in the neighborhood of the in-antitoxin systems were removed, some 228 ORFs with at least other were identified by BLASTX searches of REBASE, a curated one high-scoring cyanobacterial match remained. Of these, 121 database of DNA and RNA modification enzyme sequences (68), have so far been annotated only as a hypothetical protein or sim- summarized in Table S6 (see Tables S7 to S9 for details) in the ilar in BOGUAY (see Table S10 in the supplemental material). Of supplemental material. Unlike the mobile elements discussed the remaining 107, some 33 can be classified as sensory and signal above, these have a wide range of phylogenetic affiliations, both transduction proteins (see Table S11 in the supplemental mate- bacterial and archaeal, with no special concentration of cyanobac- rial). Without knowing their specificity, it would be risky to spec- terium-like sequences. The likely role of restriction enzymes in ulate on possible evolutionary advantages, but species living in the defense against foreign DNA should favor maintenance of a varied same microbial mat might benefit from similar sensory systems. arsenal. Toxin-antitoxin systems. Toxin-antitoxin (TA) systems typi- ORFs putatively encoding enzymes involved in cell wall and mem- cally consist of a cotranscribed long-lived bacteriostatic or bacte- brane biogenesis and possibly chromosome partitioning are also represented, as are possible proteins for Mn/Zn and ferrous iron ricidal toxin and short-lived antitoxin (recently reviewed in refer- ϩ ϩ ences 69 and 70). Modes of toxin activity include translation uptake, Na /Ca exchange, and multidrug efflux. The other ma- inhibition, mRNA cleavage, and DNA gyrase inhibition. TA sys- jor category of similar genes appears to be proteins possibly in- tems were originally discovered on plasmids, where they can serve volved in secondary metabolite synthesis. as maintenance mechanisms; cells that lose the plasmid, or do not Perspectives. One question arising from this consideration of acquire it upon cell division, will be killed or inhibited by the toxin possible mobile elements in the BOGUAY genome is their role as the antitoxin decays. They have since been found on both mo- over shorter and longer time scales. In the short term, perhaps bile elements and chromosomes of many bacteria and archaea. A there is cell differentiation along filaments by an Xis-like gene single chromosome may bear multiple copies of multiple classes of inversion or rearrangement mechanism. This might be investi- TA elements (see reference 71 for a compilation), which appears gated by single-cell sequencing or targeted gene amplification of to be the case for the BOGUAY genome as well. dissected filaments. In the longer term, what mobile elements and Putative toxin and antitoxin genes were identified from the gene rearrangements are seen in filaments of different colors or in combined JGI and RAST annotations plus directed searches Beggiatoaceae collected from different locations (for example, at (BLASTX with flanking DNA, or BLASTP with unannotated hydrothermal sites compared to cold seeps)? If found, these might nearby ORFs), mostly for antitoxins in the neighborhood of or- be clues to key physiological differences. In the very long term, phan toxins (summarized in Table S10 in the supplemental mate- how much of the evolution of microbial mat communities is re- rial). One antitoxin may neutralize more than one type of toxin, corded in the gene mosaics of their current inhabitants? Orange and toxins with different modes of action may have similar struc- Guaymas Beggiatoa would seem to have spent some time in the tures (reviewed in reference 72), so the groupings here should be company of cyanobacteria before colonizing the deep-ocean floor, considered provisional. There may be decayed, inactive loci (73) not unexpected given the current geographical distribution of included. Toxins and their corresponding antitoxins are typically Beggiatoaceae and/or to have an active genetic connection with cotranscribed, at least in the well-studied cases. Many of the extant surface-dwelling species or strains. The inferred phylogeny BOGUAY elements are in pairs or found at the ends of contigs, of some carbon metabolism genes, on the other hand, has sug- such that the paired element may simply not have been sequenced or connected (see Table S11 in the supplemental material). The gested interspecies gene transfers at hydrothermal sites (B. J. pairs are not always the expected ones, however; the classification MacGregor, unpublished data). It may also be instructive to system and/or assignment of sequences to groups may need re- search for genes with phylogenies similar to that of the XisHI finement. elements, which seem to be shared among filamentous species. Examples of the inferred phylogeny of a toxin and an antitoxin Continued study of the Beggiatoaceae-hosting microbial mats of are shown in Fig. S8 in the supplemental material. They are similar Baja California, Guaymas Basin, and elsewhere could help to an- to those of the restriction-modification systems (not shown) and swer some of these questions. in clear contrast to those of the other putative mobile elements discussed here. BOGUAY sequences may group with other Beg- ACKNOWLEDGMENTS giatoaceae sequences, suggesting divergence within this lineage, Thanks to the captain and crews of the RV Atlantis and HOV Alvin and to but the next-closest neighbors for the examples shown include the shipboard parties of legs AT15-40 and AT15-56. Gammaproteobacteria, Cyanobacteria, Deltaproteobacteria, and Genome sequencing was performed by the J. Craig Venter Institute, Alphaproteobacteria. with funding from The Gordon and Betty Moore Foundation Marine Genes possibly exchanged between cyanobacteria and the Microbial Genome Sequencing Project. The use of RAST was supported BOGUAY strain. For a first overview of the genes that may have in part by National Institute of Allergy and Infectious Diseases, National been transferred (in one direction or the other) with the various Institutes of Health, Department of Health and Human Services (NIAD), cyanobacterium-affiliated mobile elements, we have analyzed the under contract HHSN266200400042C. The Guaymas Basin project was results of a BLASTP search of the UniProt (74) database with the funded by NSF OCE 0647633.

July 2013 Volume 79 Number 13 aem.asm.org 3983 MacGregor et al.

REFERENCES 20. Lowe TM, Eddy SR. 1997. tRNAscan-SE: a program for improved detec- 1. Jannasch HW, Nelson DC, Wirsen CO. 1989. Massive natural occur- tion of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25: rence of unusually large bacteria (Beggiatoa sp.) at a hydrothermal deep- 955–964. sea vent site. Nature 342:834–836. 21. Tåquist H, Cui Y, Ardell DH. 2007. TFAM 1.0: an online tRNA function 2. Hinck S, Mussmann M, Salman V, Neu TR, Lenk S, de Beer D, Jonkers classifier. Nucleic Acids Res. 35:W350–W353. HM. 2011. Vacuolated Beggiatoa-like filaments from different hypersaline 22. Jühling F, Mörl M, Hartmann RK, Sprinzl M, Stadler PF, Pütz J. 2009. environments form a novel genus. Environ. Microbiol. 13:3194–3205. tRNAdb 2009: compilation of tRNA sequences and tRNA genes. Nucleic Acids Res. 37:D159–D162. 3. Salman V, Amann R, Girnth AC, Polerecky L, Bailey JV, Hogslund S, 23. Silva FJ, Belda E, Talens SE. 2006. Differential annotation of tRNA genes Jessen G, Pantoja S, Schulz-Vogt HN. 2011. A single-cell sequencing with anticodon CAT in bacterial genomes. Nucleic Acids Res. 34:6015– approach to the classification of large, vacuolated sulfur bacteria. Syst. 6022. Appl. Microbiol. 34:243–259. 24. Grosjean H, de Crécy-Lagard V, Marck C. 2010. Deciphering synony- 4. Dillon JG, Miller S, Bebout B, Hullar M, Pinel N, Stahl DA. 2009. mous codons in the three domains of life: co-evolution with specific tRNA Spatial and temporal variability in a stratified hypersaline microbial mat modification enzymes. FEBS Lett. 584:252–264. community. FEMS Microbiol. Ecol. 68:46–58. 25. Randau L, Söll D. 2008. Transfer RNA genes in pieces. EMBO Rep. 5. MacGregor BJ, Biddle JF, Siebert JR, Staunton E, Hegg EL, Matthysse 9:623–628. AG, Teske A. 2013. Why orange Guaymas Basin Beggiatoa (Maribeggia- 26. Lee ZMP, Bussema C, Schmidt TM. 2009. rrnDB: documenting the toa) spp. are orange: single-filament genome-enabled identification of an number of rRNA and tRNA genes in bacteria and archaea. Nucleic Acids abundant octaheme cytochrome with hydroxylamine oxidase, hydrazine Res. 37:D489–D493. oxidase, and nitrite reductase activities. Appl. Environ. Microbiol. 79: 27. Dethlefsen L, Schmidt TM. 2007. Performance of the translational appa- 1183–1190. ratus varies with the ecological strategies of bacteria. J. Bacteriol. 189: 6. Mussmann M, Hu FZ, Richter M, de Beer D, Preisler A, Jørgensen 3237–3245. Huntemann BBM, Glöckner FO, Amann R, Koopman WJH, Lasken 28. Rocha EPC. 2004. Codon usage bias from tRNA’s point of view: redun- RS, Janto B, Hogg J, Stoodley P, Boissy R, Ehrlich GD. 2007. Insights dancy, specialization, and efficient decoding for translation optimization. into the genome of large sulfur bacteria revealed by analysis of single Genome Res. 14:2279–2286. filaments. PLoS Biol. 5:1923–1937. doi:10.1371/journal.pbio.0050230. 29. Ibba M, Becker HD, Stathopoulos C, Tumbula DL, Söll D. 2000. The 7. Biddle JF, House CH, Brenchley JE. 2005. Microbial stratification in adaptor hypothesis revisited. Trends Biochem. Sci. 25:311–316. deeply buried marine sediment reflects changes in sulfate/methane pro- 30. Marrs B, Kaplan S. 1970. 23S precursor ribosomal RNA of Rhodopseu- files. Geobiology 3:287–295. domonas spheroides. J. Mol. Biol. 49:297–317. 8. Markowitz VM, Mavromatis K, Ivanova NN, Chen I-MA, Chu K, 31. Haraszthy VI, Sunday GJ, Bobek LA, Motley TS, Preus H, Zambon JJ. Kyrpides NC. 2009. IMG ER: a system for microbial genome annotation 1992. Identification and analysis of the gap region in the 23S ribosomal expert review and curation. Bioinformatics 25:2271–2278. RNA from Actinobacillus actinomycetemcomitans. J. Dent. Res. 71:1561– 9. Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, 1568. Formsma K, Gerdes S, Glass EM, Kubal M, Meyer F, Olsen GJ, 32. Doolittle WF. 1973. Postmaturational cleavage of 23S ribosomal ribonu- Olson R, Osterman AL, Overbeek RA, McNeil LK, Paarmann D, cleic acid and its metabolic control in the blue-green alga Anacystis nidu- Paczian T, Parrello B, Pusch GD, Reich C, Stevens R, Vassieva O, lans. J. Bacteriol. 113:1256–1263. Vonstein V, Wilke A, Zagnitko O. 2008. The RAST server: rapid 33. Salman V, Amann R, Shub DA, Schulz-Vogt HN. 2012. Multiple self- annotations using subsystems technology. BMC Genomics 9:75. doi: splicing introns in the 16S rRNA genes of giant sulfur bacteria. Proc. Natl. 10.1186/1471-2164-9-75. Acad. Sci. U. S. A. 109:4203–4208. 10. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. 2011. 34. Haugen P, Bhattacharya D, Palmer JD, Turner S, Lewis LA, Pryer KM. MEGA5: molecular evolutionary genetics analysis using maximum likeli- 2007. Cyanobacterial ribosomal RNA genes with multiple, endonuclease- hood, evolutionary distance, and maximum parsimony methods. Mol. encoding group I introns. BMC Evol. Biol. 7:159. doi:10.1186/1471-2148 Biol. Evol. 28:2731–2739. -7-159. 11. Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accu- 35. Wood AM, Miller SR, Li WKW, Castenholz RW. 2002. Preliminary racy and high throughput. Nucleic Acids Res. 32:1792–1797. studies of cyanobacteria, picoplankton, and virioplankton in the Salton 12. Papadopoulos JS, Agarwala R. 2007. COBALT: constraint-based align- Sea with special attention to phylogenetic diversity among eight strains of ment tool for multiple protein sequences. Bioinformatics 23:1073–1079. filamentous cyanobacteria. Hydrobiologia 473:77–92. 13. Ludwig W, Strunk O, Westram R, Richter L, Meier H, Yadhukumar, 36. Ghigo E, Pretat L, Desnues B, Capo C, Raoult D, Mege JL. 2009. Buchner A, Lai T, Steppi S, Jobb G, Förster W, Brettske I, Gerber S, Intracellular life of Coxiella burnetii in macrophages: an update. Ann. N. Y. Ginhart AW, Gross O, Grumann S, Hermann S, Jost R, König A, Liss Acad. Sci. 1166:55–66. T, Lüssmann R, May M, Nonhoff B, Reichel B, Strehlow R, Stamatakis 37. Kersh GJ, Wolfe TM, Fitzpatrick KA, Candee AJ, Oliver LD, Patterson A, Stuckmann N, Vilbig A, Lenke M, Ludwig T, Bode A, Schleifer K-H. NE, Self JS, Priestley RA, Loftis AD, Massung RF. 2010. Presence of 2004. ARB: a software environment for sequence data. Nucleic Acids Res. Coxiella burnetii DNA in the environment of the United States, 2006 to 32:1363–1371. 2008. Appl. Environ. Microbiol. 76:4469–4475. 14. Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phy- 38. Pommier T, Canbäck B, Riemann L, Boström KH, Simu K, Lundberg logenetic analyses with thousands of taxa and mixed models. Bioinformat- P, Tunlid A, Hagström Å. 2007. Global patterns of diversity and com- ics 22:2688–2690. munity structure in marine bacterioplankton. Mol. Ecol. 16:867–880. 15. Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Hohna 39. Kersh GJ, Lambourn DM, Self JS, Akmajian AM, Stanton JB, Baszler S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: TV, Raverty SA, Massung RF. 2010. Coxiella burnetii infection of a Steller efficient Bayesian phylogenetic inference and model choice across a large sea lion (Eumetopias jubatus) found in Washington State. J. Clin. Micro- model space. Syst. Biol. 61:539–542. biol. 48:3428–3431. 16. Barloy-Hubler F, Lelaure V, Galibert F. 2001. Ribosomal protein gene 40. Kuhsel MG, Strickland R, Palmer JD. 1990. An ancient group I intron cluster analysis in eubacterium genomics: homology between Sinorhizo- shared by eubacteria and chloroplasts. Science 250:1570–1573. bium meliloti strain 1021 and Bacillus subtilis. Nucleic Acids Res. 29:2747– 41. Xu M-Q, Kathe SD, Goodrich-Blair H, Nierzwicki-Bauer SA, Shub DA. 2756. 1990. Bacterial origin of a chloroplast intron: conserved self-splicing 17. Coenye T, Vandamme P. 2005. Organisation of the S10, spc and alpha group I introns in cyanobacteria. Science 250:1566–1570. ribosomal protein gene clusters in prokaryotic genomes. FEMS Microbiol. 42. Olsson S, Kaasalainen U, Rikkinen J. 2012. Reconstruction of structural Lett. 242:117–126. evolution in the trnL intron P6b loop of symbiotic Nostoc (Cyanobacte- 18. Lane WJ, Darst SA. 2010. Molecular evolution of multisubunit RNA ria). Curr. Genet. 58:49–58. polymerases: sequence analysis. J. Mol. Biol. 395:671–685. 43. Paquin B, Kathe SD, Nierzwicki-Bauer SA, Shub DA. 1997. Origin and 19. Mavromatis K, Ivanova NN, Chen I-MA, Szeto E, Markowitz VM, evolution of group I introns in cyanobacterial tRNA genes. J. Bacteriol. Kyrpides NC. 2009. The DOE-JGI standard operating procedure for the 179:6798–6806. annotations of microbial genomes. Stand. Genome Sci. 1:63–67. 44. Vepritskiy AA, Vitol IA, Nierzwicki-Bauer SA. 2002. Novel group I

3984 aem.asm.org Applied and Environmental Microbiology Mobile Elements in Orange Guaymas Beggiatoa

intron in the tRNALeu(UAA) gene of a ␥-proteobacterium isolated from a 65. Furuta Y, Abe K, Kobayashi I. 2010. Genome comparison and context deep subsurface environment. J. Bacteriol. 184:1481–1487. analysis reveals putative mobile forms of restriction-modification systems 45. Reinhold-Hurek B, Shub DA. 1992. Self-splicing introns in tRNA genes and related rearrangements. Nucleic Acids Res. 38:2428–2443. of widely divergent bacteria. Nature 357:173–176. 66. Fox KL, Dowideit SJ, Erwin AL, Srikhanta YN, Smith AL, Jennings MP. 46. Paquin B, Heinfling A, Shub DA. 1999. Sporadic distribution of 2007. Haemophilus influenzae phasevarions have evolved from type III Arg ␣ tRNACCA introns among -purple bacteria: evidence for horizontal trans- DNA restriction systems into epigenetic regulators of gene expression. mission and transposition of a group I intron. J. Bacteriol. 181:1049–1053. Nucleic Acids Res. 35:5242–5252. 47. Bonocora RP, Shub DA. 2001. A novel group I intron-encoded endo- 67. Srikhanta YN, Maguire TL, Stacey KJ, Grimmond SM, Jennings MP. nuclease specific for the anticodon region of tRNAfmet genes. Mol. 2005. The phasevarion: a genetic system controlling coordinated, random Microbiol. 39:1299–1306. switching of expression of multiple genes. Proc. Natl. Acad. Sci. U. S. A. 48. Lambowitz AM, Zimmerly S. 2011. Group II introns: mobile ribozymes 102:5547–5551. that invade DNA. Cold Spring Harb. Perspect. Biol. 3:a003616. doi:10 68. Roberts RJ, Vincze T, Posfai J, Macelis D. 2010. REBASE—a database for .1101/cshperspect.a003616. DNA restriction and modification: enzymes, genes, and genomes. Nucleic 49. Simon DM, Kelchner SA, Zimmerly S. 2009. A broadscale phylogenetic Acids Res. 38:D234–D236. analysis of group II intron RNAs and intron-encoded reverse transcrip- 69. Blower TR, Salmond GPC, Luisi B. 2011. Balancing at survival’s edge: the tases. Mol. Biol. Evol. 26:2795–2808. structure and adaptive benefits of prokaryotic toxin-antitoxin partners. 50. Elleuche S, Poggeler S. 2010. Inteins, valuable genetic elements in molec- Curr. Opin. Struct. Biol. 21:109–118. ular biology and biotechnology. Appl. Microbiol. Biotechnol. 87:479– 70. Van Melderen L. 2010. Toxin-antitoxin systems: why so many, what for? 489. Curr. Opin. Microbiol. 13:781–785. 51. Stoddard BL. 2005. Homing endonuclease structure and function. Q. 71. Shao YC, Harrison EM, Bi DX, Tai C, He XY, Ou HY, Rajakumar K, Rev. Biophys. 38:49–95. Deng ZX. 2011. TADB: a Web-based resource for type 2 toxin-antitoxin 52. Bolhuis H, Palm P, Wende A, Falb M, Rampp M, Rodriguez-Valera F, loci in bacteria and archaea. Nucleic Acids Res. 39:D606–D611. Pfeiffer F, Oesterhelt D. 2006. The genome of the square archaeon 72. Arbing MA, Handelman SK, Kuzin AP, Verdon G, Wang C, Su M, Haloquadratum walsbyi: life at the limits of water activity. BMC Genomics Rothenbacher FP, Abashidze M, Liu MH, Hurley JM, Xiao R, Acton T, 7:12. doi:10.1186/1471-2164-7-169. Inouye M, Montelione GT, Woychik NA, Hunt JF. 2010. Crystal struc- 53. McKay LJ, MacGregor BJ, Biddle JF, Albert DB, Mendlovitz HP, Hoer tures of Phd-Doc, HigA, and YeeU establish multiple evolutionary links DR, Lipp JS, Lloyd KG, Teske AP. 2012. Spatial heterogeneity and between microbial growth-regulating toxin-antitoxin systems. Structure underlying geochemistry of phylogenetically diverse orange and white 18:996–1010. Beggiatoa mats in Guaymas Basin hydrothermal sediments. Deep Sea Res. 73. Mine N, Guglielmini J, Wilbaux M, Van Melderen L. 2009. The decay of 67:21–31. the chromosomally encoded ccdO157 toxin-antitoxin system in the Esche- 54. Carrasco CD, Holliday SD, Hansel A, Lindblad P, Golden JW. 2005. richia coli species. Genetics 181:1557–1566. Heterocyst-specific excision of the Anabaena sp. strain PCC 7120 hupL 74. Apweiler R, Martin MJ, O’Donovan C, Magrane M, Alam-Faruque Y, element requires xisC. J. Bacteriol. 187:6031–6038. Antunes R, Barrell D, Bely B, Bingley M, Binns D, Bower L, Browne P, 55. Henson BJ, Pennington LE, Watson LE, Barnum SR. 2008. Excision of Chan WM, Dimmer E, Eberhardt R, Fazzini F, Fedotov A, Foulger R, the nifD element in the heterocystous cyanobacteria. Arch. Microbiol. Garavelli J, Castro LG, Huntley R, Jacobsen J, Kleen M, Laiho K, Legge 189:357–366. D, Lin QA, Liu WD, Luo J, Orchard S, Patient S, Pichler K, Poggioli D, 56. Rippka R, Deruelles J, Waterbury JB, Herdman M, Stanier RY. 1979. Pontikos N, Pruess M, Rosanoff S, Sawford T, Sehra H, Turner E, Generic assignments, strain histories and properties of pure cultures of Corbett M, Donnelly M, van Rensburg P, Xenarios I, Bougueleret L, cyanobacteria. J. Gen. Microbiol. 111:1–61. Auchincloss A, Argoud-Puy G, Axelsen K, Bairoch A, Baratin D, Blatter 57. Nielsen PH, Kragelund C, Seviour RJ, Nielsen JL. 2009. Identity and MC, Boeckmann B, Bolleman J, Bollondi L, Boutet E, Quintaje SB, ecophysiology of filamentous bacteria in activated sludge. FEMS Micro- Breuza L, Bridge A, deCastro E, Coudert E, Cusin I, Doche M, Dornevil biol. Rev. 33:969–998. D, Duvaud S, Estreicher A, Famiglietti L, Feuermann M, Gehant S, 58. Kamp A, Røy H, Schulz-Vogt HN. 2008. Video-supported analysis of Ferro S, Gasteiger E, Gateau A, Gerritsen V, Gos A, Gruaz-Gumowski Beggiatoa filament growth, breakage, and movement. Microb. Ecol. 56: N, Hinz U, Hulo C, Hulo N, James J, Jimenez S, Jungo F, Kappler T, 484–491. Keller G, Lara V, Lemereier P, Lieberherr D, Martin X, Masson P, 59. Kalanetra KM, Huston SL, Nelson DC. 2004. Novel, attached, sulfur- Moinat M, Morgat A, Paesano S, Pedruzzi I, Pilbout S, Poux S, Pozzato oxidizing bacteria at shallow hydrothermal vents possess vacuoles not in- M, Redaschi N, Rivoire C, Roechert B, Schneider M, Sigrist C, Sonesson volved in respiratory nitrate accumulation. Appl. Environ. Microbiol. 70: K, Staehli S, Stanley E, Stutz A, Sundaram S, Tognolli M, Verbregue L, 7487–7496. Veuthey AL, Wu CH, Arighi CN, Arminski L, Barker WC, Chen CM, 60. Aruga S, Kamagata Y, Kohno T, Hanada S, Nakamura K, Kanagawa T. Chen YX, Dubey P, Huang HZ, Mazumder R, McGarvey P, Natale DA, 2002. Characterization of filamentous Eikelboom type 021N bacteria and Natarajan TG, Nchoutmboube J, Roberts NV, Suzek BE, Ugochukwu description of Thiothrix disciformis sp. nov. and Thiothrix flexilis sp. nov. U, Vinayaka CR, Wang QH, Wang YQ, Yeh LS, Zhang JA. 2011. Int. J. Syst. Evol. Microbiol. 52:1309–1316. Ongoing and future developments at the Universal Protein Resource. Nu- 61. Chernousova E, Gridneva E, Grabovich M, Dubinina G, Akimov V, cleic Acids Res. 39:D214–D219. Rossetti S, Kuever J. 2009. Thiothrix caldifontis sp. nov. and Thiothrix 75. Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, lacustris sp. nov., gammaproteobacteria isolated from sulfide springs. Int. Glöckner FO. 2013. The SILVA ribosomal RNA gene database project: J. Syst. Evol. Microbiol. 59:3128–3135. improved data processing and Web-based tools. Nucleic Acids Res. 41: 62. Jeltsch A, Pingoud A. 1996. Horizontal gene transfer contributes to the D590–D596. wide distribution and evolution of type II restriction-modification sys- 76. Saitou N, Nei M. 1987. The neighbor-joining method: a new method for tems. J. Mol. Evol. 42:91–96. reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406–425. 63. Naderer M, Brust JR, Knowle D, Blumenthal RM. 2002. Mobility of a 77. Felsenstein J. 1985. Confidence limits on phylogenies: an approach using restriction-modification system revealed by its genetic contexts in three the bootstrap. Evolution 39:783–791. hosts. J. Bacteriol. 184:2411–2419. 78. Zuckerkandl E, Pauling L. 1965. Evolutionary convergence and diver- 64. Nobusato A, Uchiyama I, Kobayashi I. 2000. Diversity of restriction- gence in proteins, p 97–166. In Bryson V, Vogel HJ (ed), Evolving genes modification gene homologues in Helicobacter pylori. Gene 259:89–98. and proteins. Academic Press, New York, NY.

July 2013 Volume 79 Number 13 aem.asm.org 3985