The genome of coahuilensis reveals adaptations essential for survival in the relic of an ancient marine environment

Luis David Alcaraz*, Gabriela Olmedo*, Germa´ n Bonilla†, Rene´ Cerritos†, Gustavo Herna´ ndez‡, Alfredo Cruz‡, Enrique Ramı´rez§, Catherine Putonti¶, Beatriz Jime´nez*‡, Eva Martı´nez*,Varinia Lo´pez*, Jacqueline L. Arvizu*, Francisco Ayala*, Francisco Razo*, Juan Caballero*, Janet Siefertʈ, Luis Eguiarte†, Jean-Philippe Vielle*‡, Octavio Martı´nez*‡, Valeria Souza†, Alfredo Herrera-Estrella*‡, and Luis Herrera-Estrella*‡**

‡Laboratorio Nacional de Geno´mica para la Biodiversidad (Langebio), *Departamento de Ingenierı´aGene´tica and §Departamento de Biotecnologı´a, Cinvestav, Campus Guanajuato, AP 629 Irapuato, Guanajuato CP36500, Me´xico; †Departamento de Ecologı´aEvolutiva, Instituto de Ecologı´a,Universidad Nacional Auto´noma de Me´xico, CU, AP 70-275 Coyoaca´n 04510 Me´xico D.F., Me´xico; ʈDepartment of Statistics, Rice University, P.O. Box 1892, MS-138, Houston, TX 77251-1892; and ¶Departments of Computer Science and Biology and Biochemistry, 4800 Cullen Boulevard, University of Houston, Houston, TX 77204-5001

Contributed by Luis Herrera-Estrella, January 31, 2008 (sent for review December 10, 2007) The Cuatro Cie´negas Basin (CCB) in the central part of the Chihua- phylogenetic analysis of 16S rRNA sequences indicates that B. han desert (Coahuila, Mexico) hosts a wide diversity of microor- coahuilensis is closely related to other marine Bacillus spp. (4), in ganisms contained within springs thought to be geomorphological agreement with the theory of an ancient marine origin of these relics of an ancient sea. A major question remaining to be answered ponds. We sequenced the genome of B. coahuilensis to gain insight is whether from CCB are ancient marine bacteria that on the origin and genomic adaptation of this bacterium to the adapted to an oligotrophic system poor in NaCl, rich in sulfates, and extremely P-limited oligotrophic environment of the Churince pond with extremely low phosphorus levels (<0.3 ␮M). Here, we report in CCB. the complete genome sequence of Bacillus coahuilensis, a sporu- lating bacterium isolated from the water column of a desiccation Results lagoon in CCB. At 3.35 Megabases this is the smallest genome General Genome Features. Sequencing of the B. coahuilensis genome sequenced to date of a Bacillus species and provides insights into was accomplished by using a hybrid strategy of high-coverage the origin, evolution, and adaptation of B. coahuilensis to the CCB pyrosequencing (29ϫ) and a low-coverage Sanger sequencing (6ϫ) environment. We propose that the size and complexity of the B. (supporting information (SI) Table S1). The assembled genome coahuilensis genome reflects the adaptation of an ancient marine size is 3.35 Megabases (Mb), making it the smallest genome bacterium to a novel environment, providing support to a ‘‘marine reported to date for Bacillus spp. (Fig. 1C). Its genome is composed isolation origin hypothesis’’ that is consistent with the geology of of a predicted 3,640 coding genes (5) and has a GC content of 37.5% CCB. This genomic adaptation includes the acquisition through (Table S2), with an average 1,100 orthologues shared with all 14 EVOLUTION horizontal gene transfer of genes involved in phosphorous utili- Bacillus genomes currently available (including Bacillus sp. NRRL zation efficiency and adaptation to high-light environments. The B. B-14911, a recently sequenced marine Bacillus from the Gulf of coahuilensis genome sequence also revealed important ecological Mexico) and 905 unique B. coahuilensis genes. features of the bacterial community in CCB and offers opportuni- By using a phylogenomic approach (6), we reconstructed a ties for a unique glimpse of a microbe-dominated world last seen phylogeny through a concatenated alignment of 20 universally in the Precambrian. distributed genes from the Clusters of Orthologous Groups (COGs) that are considered largely unsusceptible to horizontal gene transfer evolution ͉ genomic adaptation ͉ horizontal gene transfer ͉ (6), and that are present in the 14 Bacillus spp. genomes (Table S3). phosphorus stress ͉ sulfolipids This analysis revealed that B. coahuilensis is basal to the other 14 Bacillus spp. genomes and more related to the marine strain NRRL he Cuatro Cie´negasBasin (CCB) is located in a valley Ϸ740 m B-14911 than any other Bacillus spp. Additionally, the short phy- Tabove sea level in the state of Coahuila, Mexico, that measures logenetic branch length suggests that the B. coahuilensis genome is Ϸ30 km by 40 km and is surrounded by high mountains (Ͼ3,000 m) associated with a low substitution rate compared with the other (Fig. 1). CCB is an enclosed evaporitic basin that receives Ϸ150 mm genomes, despite its small size (Fig. 1C). An extensive 16S analysis of annual precipitation. Despite the dry climate of the valley, the shows that B. coahuilensis clearly forms a unique group, but also CCB harbors an extensive system of springs, streams, and pools (1). supports a phylogenetic relation to marine isolates (Fig. S1). The CCB ecosystem is not only characterized by a high endemism However, at the genome level we found no signals that would of macrooganisms and biodiversity of microorganisms (1, 2), but suggest the presence of bacteria closely related to B. coahuilensis in also by extremely oligotrophic waters that are unable to sustain algal the Global Ocean Sampling database (7) (data not shown). Taken growth, making microbial mats the base of the food web (3). In particular, phosphorus (P) levels in CCB appear to be rather low, because they were below the level of detection of several methods Author contributions: G.O., C.P., A.H.-E., and L.H.-E. designed research; L.D.A., G.H., A.C., ␮ E.R., B.J., E.M., V.L., J.L.A., F.A., and F.R. performed research; L.D.A., G.O., G.B., R.C., A.C., used (0.3 M) and the extremely high biomass C:P and N:P ratios C.P., J.C., J.S., L.E., J.-P.V., V.S., O.M., and L.H.-E. analyzed data; and L.D.A., G.O., and L.H.-E. (Ͼ100 by moles) previously reported for CCB stromatolites (3, 4). wrote the paper. Unlike the present sea, the Churince spring water is poor in NaCl The authors declare no conflict of interest. and carbonates, but it is rich in sulfates, magnesium, and calcium Data deposition: The sequence from this Whole Genome Shotgun project has been depos- (4). Characterization of the microbiological diversity by sequencing ited in the DDBJ/EMBL/GenBank database (accession no. ABFU00000000). The version 16S rRNA genes revealed that nearly half of the phylotypes from described in this article is the first version (accession no. ABFU01000000). the CCB were closely related to bacteria from marine environments **To whom correspondence should be addressed. E-mail: [email protected]. (2). Bacillus coahuilensis is a free-living, spore-forming bacteria This article contains supporting information online at www.pnas.org/cgi/content/full/ isolated from the water column of a shallow desiccation lagoon in 0800981105/DCSupplemental. the Churince system at CCB (4) (Fig. 1 A and B). A molecular © 2008 by The National Academy of Sciences of the USA

www.pnas.org͞cgi͞doi͞10.1073͞pnas.0800981105 PNAS ͉ April 15, 2008 ͉ vol. 105 ͉ no. 15 ͉ 5803–5808 Downloaded by guest on September 28, 2021 A B these genes participate in the synthesis of sulfolipids (9) (Fig. 2B). N The structural prediction of the B. coahuilensis Sqd1 protein is W E N 26°58' S remarkably similar to the crystallized structure of Arabidopsis thaliana SQD (1.2 Å; Fig. 2C); in particular, residues important for the interactions with NADϩ and UDP-ribose are conserved in B. N 26°54' CCB coahuilensis. Reverse transcription PCR (RT-PCR) experiments

S S revealed that, in contrast to plants and cyanobacteria, sqd1 is i a re constitutively expressed and not induced by P limitation (Fig. 2D). N 26°50' ar

d Thin layer chromatography revealed a spot in B. coahuilensis lipid e craM n extracts with migration similar to that of pure sulfolipids, compa- o

s N 26°46' rable to that present in A. thaliana and in a cyanobacteria spp. (isolated from CCB) but absent in lipid preparations from Bacillus W 102°20' W 102°12' W 102°04' W 102°56' sp. NRRL B-91411 (Fig. 2E). Mass spectrometry analysis con- firmed the presence of sulfolipids in B. coahuilensis (Fig. 2F and Fig. GC % S3). The remarkable acquisition of constitutively expressed genes C B. anthracis Sterne 35 100 B. anthracis AMES 35 allowing B. coahuilensis to replace membrane phospholipids with 100 B. thuringiensis konkukian 35 sulfolipids is in agreement with genomic adaptations to extreme 100 37 B. cereus E33L 35 phosphate limitation. 100 B. cereus ATCC 14579 35 B. halodurans 43 Light Sensing in a High-Radiation Environment. The presence of 43 96 B. clausii 46 44 genes encoding Bacteriorhodopsin (BR) in B. coahuilensis is rem- 92 71 O. iheyensis 35 iniscent of the abundance of BR genes in marine environmental B. licheniformis 14580 46 samples (7), suggesting an additional adaptation of marine bacteria. 48 97 B. licheniformis 46 The phylogeny of B. coahuilensis sensory BR (BSR) showed that its 81 54 45 B. subtilis 43 closest orthologue is the Anabaena sp. PCC7120 rhodopsin (ASR) 50 60 G. kaustophilus 52 (Fig. 3A). ASR was the first eubacterial SR identified (10) and B. sp. NRRL B-14911 45 suggested to function as a photosensory receptor. The structural 56 B. coahuilensis 37 prediction of BSR and its comparison with crystallized ASR and SR Listeria monocytogenes 0 1 2 3 4 5 6 of Natronomonas pharaonis (11, 12) indicates that all three proteins 96 Listeria inocua Megabases contain seven conserved transmembrane helices and a Lys residue that binds retinal (Fig. 3B). Alignment of ASR and BSR shows that 0.05 both have a Pro residue instead of the Asp (BSR203 and ASR206) Fig. 1. Marine origin of B. coahuilensis, isolated from a pond in the Chihua- residue present in all other microbial rhodopsins (10). Evidence for huan desert. (A) Sierra de San Marcos is a prominent mountain system in the HGT of rhodopsins has recently been obtained from whole-genome middle of CCB where Ͼ400 ponds sustain most of the biodiversity. The sequencing and metagenomic projects, and is now thought to be a geomorphological origin of CCB has been recently reviewed (2). (B) Churince frequent event in marine bacteria from the photic zone and extreme system (shown with a red triangle in A) consists of a springhead that feeds a saline environments (13–15). The retinal chromophore of rhodop- 2-km-long stream with an intermediate lagoon terminating at a large shallow sin is synthesized as a cleavage product of carotenoids; thus, the desiccation lagoon. (C) Phylogenomic reconstruction. Maximum likelihood combination of carotenoid synthesis and rhodopsin genes has been phylogenomic reconstruction by using Tree-Puzzle (29) was carried out with 20 universally conserved COGs from the sequenced Bacillus spp. and closely suggested to be sufficient for rhodopsin function (14). The genome related species (Table S3). Maximum likelihood bootstrap percentage support of B. coahuilensis also contains genes encoding crtB (phytoene values are only indicated for major nodes; numbers in red represent tentative synthase) and crtICA2 (phytoene dehydrogenases) that could be timescale, in million years, calculated with the method proposed by Battistuzzi involved in retinal biosynthesis (Fig. S4). RT-PCR experiments et al. (30). Genome size (represented as bars) and GC content of each Bacillus showed that the expression of bsr is constitutive and not light- spp. genome is shown to the right. dependent (Fig. 3C). The high-radiation exposure prevalent in shallow waters of CCB could explain the selection pressure respon- sible for the maintenance and constitutive expression of the bsr together, these results suggest that B. coahuilensis is ancestral to the gene. other sequenced Bacillus spp. and that the presence of this species Nucleotide composition analysis (16) identified numerous in the CCB did not originate from a recent migration through spore genomic islands containing genes likely to be acquired by HGT; disbursal from present marine habitats. several have been annotated as HGT genes in B. subtilis and B. halodurans (Fig. S5). Interestingly, our nucleotide composition Lipid Profile Adaptation to a Low-Phosphorous Environment. Mem- analysis did not identify bsr and sqd1 as HGT genes. We also brane phospholipids constitute Ϸ30% of the total phosphate in determined the Codon adaptation index (CAI) (Fig. S6)forbsr and most organisms. In plants and cyanobacteria subjected to phospho- sqd1 which is of 0.719 and 0.735, respectively, indicating that these rous deprivation, phospholipids can be replaced by non-P lipids genes have been present in B. coahuilensis long enough to undergo (such as sulfo- and galactolipids) to maintain membrane function- amelioration to the average CAI (0.714). ality and integrity, and release P to sustain other P-requiring cellular processes (8). Interestingly, genes encoding sulfoquinovose syn- Nitrogen Cycle Strategies and Feeding Capabilities. To gain insight thase (sqd1) and glycosyltransferase (sqdX), the two key enzymes in into the nutritional requirements and metabolic capabilities of B. the synthesis of sulfolipids, are present in B. coahuilensis. Because coahuilensis within the CCB microbial community, we analyzed the sulfoquinovose synthesis operon is absent in all other known genes involved in ABC transport systems, one of the largest Bacillus spp. genomes, this finding suggests that the adaptation of paralogous families present in the genome of this bacterium. B. coahuilensis to the extremely low P concentration of CCB Hidden Markov model (HMM) profiles were built for each ABC includes the acquisition of these genes through horizontal gene importer family (17) and searched against 11 Bacillus spp. genomes. transfer (HGT). The B. coahuilensis genes are closely related to The analysis retrieved a total of 1,038 import systems. The two most cyanobacterial sqd1 and sqdX (Fig. 2A and Fig. S2), and the operon abundant families among all Bacillus spp. were metal (MET) and arrangement is identical to that in Synechococcus sp. PC7942, where osmoprotectant (OTCN) importers (Fig. S6), which seems to be a

5804 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0800981105 Alcaraz et al. Downloaded by guest on September 28, 2021 100 Nostoc sp. PCC 7120 A 98 Anabaena variabilis ATCC 29413 C 43 Nostoc punctiforme PCC 73102 Thermosynechococcus elongatus BP-1 32 Synechocystis sp. PCC 6803 Crocosphaera watsonii WH 8501 97 Trichodesmium erythraeum IMS101 Synechococcus sp. JA-3-3Ba 56 100 Synechococcus sp. JA-3-3Ab Chlamydomonas reinhardtii Medicago trunculata 99 60 Oriza sativa 98 100 Spinacia oleracea Tyr 200/182 79 ChainA crystal structure Sqd1 A. thaliana 99 Arabidopsis thaliana 100 ChainA crystal structure Sqd1 A. thaliana

59 Bacillus coahuilensis Rubrobacter xylanophilus Syntrophobacter fumaroxidans MPOB 45 Thermoplasma volcanium GSS1 His 201/183 100 T. acidophilum 45 Synechococcus sp. WH 8102 Lys 204/186 52 Prochlorococcus marinus NA TL2 Thr 163/145 P. marinus pastoris 100 95 P. marinus MIT 9312 78 P. marinus MIT 9211 Mycobacterium sp. JLS gi|43443278 gi|43177758 99 gi|44133788 Cyanobacteria gi|43837657 Actinobacteria B. coahuilensis SQD1 model gi|43443861 100 Archaea Arabidopsis thaliana SQD1 crystal structure gi|43553861 gi|43089039 Non cultivable UDP-glucose gi|43851695 Plants and Algae NAD gi|43200131 Haloarcula marismortui ATCC 43049 SO4

0.1

B D 1 2 3 4 5 6 Bacillus coahuilensis ppiB sqd1 sqdX egsA

Synechoccocus sp. sqd1 sqdX

F SO2Na2+ -H 1 2 3 4 C1 9-Me-14:0 E 15-Me-16:0

100 B 2 C2 C 569.19 MGDG -H 839.46

C1 597.23 DGDG % 840.45 SQDG B EVOLUTION A 329.01 270.98

598.23 838.5 841.44

0 250 500 750 900 m/z

Fig. 2. Acquisition of sulfoquinovose synthesis capabilities through HGT, an adaptation of B. coahuilensis to a phosphorus-limiting environment. (A) Neighbor-joining phylogenetic reconstruction of SQD1 (31). (B) The operon structure of the sqd1 and sqdX resembles that of the Synecchococcus genes (9). Flanking genes are ppi (peptidyl-prolyl cis-trans isomerase B) and egsA (glycerol-1-phosphate dehydrogenase) (not to scale). (C) Modeling (32) to the A. thaliana SQD (PDB ID code 1I24) protein shows conservation of the NAD and UDP-glucose-binding residues. Diagrammic representation was done by using PyMol (http://www.pymol.org). (D) Expression of the sqd1 gene. RT-PCR was carried out from RNA of B. coahuilensis grown under different phosphate concentrations (lanes 1 to 5, RT-PCR products obtained from cells cultured with 0.001, 0.005, 0.05, 0.5, and 5 mM phosphate, respectively; lane 6, control without reverse transcriptase). (E) TLC analysis reveals the presence of a probable SQDG band in B. coahuilensis, A. thaliana, Cyanobacteria sp. (lanes 1 to 3) but absent in Bacillus sp. NRRL B-14911 (lane 4). (F) Mass spectroscopic analysis confirmed the identity of the sulfoquinovoside (Fig. S3).

characteristic for the group. B. coahuilensis returned 63 import acquisition of single amino acids over oligopeptides. Experimental systems, thus being one of the genomes with a less absolute number results show that B. coahuilensis has an absolute requirement for 8 of import systems (Table S4), below the expected values for the aa (four polar and three hydrophobic) and a partial requirement for group. Nonetheless, the best represented families in B. coahuilensis another 5 aa (three hydrophobic and two polar), confirming that are also MET and OTCN. B. coahuilensis possesses a very high this bacterium depends on amino acid import (Table S4 and Fig. proportion of iron-siderophore (ISVH) importers relative to ge- S6). This overrepresentation of single-amino acid importers is nome size, a feature shared with the water-column marine shared with the recently sequenced water-living beta-proteobacte- Oceanobacillus iheyensis, and Bacillus sp. NRRL B-14911. In addi- ria Minibacterium massiliensis (18), suggesting that this feature tion, an operon coding for the ferric-enterobactin synthesis and might be common in the reduced genomes of aquatic free-living transport system fepBCD is shared with O. iheyensis. It also encodes bacteria. an Iron(III)-dicitrate and ferric enterobactin transporters, suggest- HMM profiles were built for 86 genes involved in the N2 cycle ing that marine bacilli and B. coahuilensis are actively scavenging for (data not shown) and were searched against all sequenced Bacillus iron. spp. genomes. This revealed that B. coahuilensis has the lowest The proportion of polar amino acid and opine (PAO), as well as number of N2 cycle enzymes of all sequenced Bacillus spp., lacking D-L-Methionine (DLM) importers relative to the total number of an inorganic N pathway and most of the urea cycle and urea transporters and genome size is greater in B. coahuilensis than in degradation pathways (Fig. 4). It only possesses the canonical any other Bacillus. It also has one of the lowest ratios of oligopeptide transaminases, an ammonia transporter and a D-Ala transporter, (OPN) importers, suggesting a specialization for the preferential thus not being capable of performing the entire N cycle on its own.

Alcaraz et al. PNAS ͉ April 15, 2008 ͉ vol. 105 ͉ no. 15 ͉ 5805 Downloaded by guest on September 28, 2021 100 38 Uncultured rhodopsins A 72 Candidatus Pelagibacter ubique Bop B 89 B. anthracis str. Ames Uncultured Bop 47 77 Uncultured Bop Helix B 99 Helix A 12 Exiguobacterium sibiricum Bop B. cereus E33L Pyrocistis lunula rhodopsin Salinibacter ruber Xop 36 Helix C B. thuringiensis konkukian 52 Kineococcus radiotolerans Bop 47 Gleobacter violaceus Sop1 Lys 207/210 100 Chlamydomonas reinhardtii Sop2 B. licheniformis ATCC14580 46 C. reinhardtii Sop1 Pro 203/206 16 Neurospora crassa Sop Nostoc sp. PCC 7120 Sop Helix D B. subtilis str. 168 84 Bacillus coahuilensis Sop Salinibacter ruber Sop1 Helix G Phe 210/213 85 B. clausii 94 Salinibacter ruber Sop2 KSM-16 37 20 Haloarcula marismortui Sop1 Leu 202/205 48 Natronomonas pharaonis Sop1 Haloarcula marismortui Sop1 B. halodurans C-125 Helix E Haloquadratum walsbyi Bop1 Proton-Pump rhodopsins Haloarcula marismortui Bop precursor 27 98 O. iheyensis HTE831 Haloquadratum walsbyi Bop2 Halorhodopsins Helix F Haloarcula marismortui Bop 24 37 Halorubrum xinjiangense Bop Sensory B. coahuilensis SOP model rhodopsins G. kaustophilus HTA426 Rubrobacter xylanophilus Bop 32 Nostoc sp. PCC 7120 SOP crystal structure Salinobacter ruber Hop 58 Retinal Natronomonas pharaoinis Hop B. sp. NRRL B14911 100 Haloquadratum walsbyi Hop 46 Helix C 57 Haloarcula marismortui Hop ***** * B. coahuilensis Natronomonas pharaonis Sop1 (non-functional) BSR 64 TTIYYARYIDWVISTPLLLAALALTAMFGGKKN i I l ra Z F A A ASR 66 QIAHYARYIDWMVTTPLLLLSLSWTAMQFIKKD A B B Haloarcula marismortui Bop related B H C C C oce.Ahd aH itih h ra ih 67 gR Asa ri ih l t ro ro t r rB s cra r rt ra c cy cr ct c l l ga gr vl er pa p n e p ga Ar cr nl er e pb g SRII 66 RTVFVPRYIDWILTTPLIVYFLGLLAGLDSREF gr g l m rF yC C y r cab. aC u aN r r r IE o r s 100 d

Salinibacter ruber bop related i Hcp AC NI NA G Lt N T T NJ AA NA CA Hp A

Helix F GA Tp . CD A N GB UA C L h UB N HH AH AD CA UC AG A NG RF NZ NA DA PC Cn Nt A ** ** Cl h BSR 156 TRLARHYTRVAIYLSVLWVCYPTAWLLGPSGLG Ah d 0.2 ASR 159 SELANLYDKLVTYFTVLWIGYPIVWIIGPSGFG Gl d SRII 154 SGIKSLYVRLRNLTVVLWAIYPFIWLLGPPGVA Gh Helix G Gu 12 34 * * C BSR 189 LAQELTEVLVFIILPIFSKVGFSIVDLHGLRKL Fig. 4. B. coahuilensis lacks many of the genes coding for enzymes involved ASR 192 WINQTIDTFLFCLLPFFSKVGFSFLDLHGLRNL SRII 187 LLTPTVDVALIVYLDLVTKVGFGFIALDAAATR in nitrogen metabolism. Hidden Markov models for the bacterial enzymes involved in N2 metabolism were constructed to detect these genes in all of the Fig. 3. B. coahuilensis contains a sensory bacteriorhodopsin possibly ac- sequenced Bacillus spp. Bars in different colors denote the presence of a gene quired from cyanobacteria through an ancient HGT event. (A) Neighbor- predicted to code for a given enzyme. joining tree showing the phylogenetic diversity of rhodopsins (31). (B) Bsr possesses all of the residues involved in retinal binding and was modeled (32) to the predicted structures of Anabaena and Natronomonas pharaonis. Align- and appears well suited to perform most of its N2 cycle on its own ment of segments of SR from B. coahuilensis (BSR, BM4401574), Anabaena sp. (Fig. 4 and Table S5). PCC7120 (ASR, PDB ID code 1XIO), and Natronomonas pharaonis DSM2160 (SRII, PDB ID code 1JGJ) show conservation of residues in the retinal-binding Genome Size Evolution and Essential Genes. The basal position of B. pocket (marked with asterisks) except for a Pro residue (positions BSR203 and coahuilensis (Fig. 1C) suggests that it is ancestral to the other ASR206), which is an Asp residue in all other microbial rhodopsins (blue sequenced Bacillus spp. and related species. Although most of the rectangle; alignment adapted from ref. 10). Diagrammatic representation essential genes (20) of B. subtilis are conserved in B. coahuilensis, was done by using PyMol (http://www.pymol.org). (C) bsr is expressed in B. coahuilensis grown either under white light or in the dark. RT-PCR was carried many of those with redundant functions are absent (Table S6, Table out by using RNA obtained from bacteria grown under dark or white-light S7, and Table S8). Genes for endospore formation are conserved, conditions (lanes 1 and 2, respectively). Lanes 3 and 4 are controls without but the number of genes encoding components of the spore coat are reverse transcriptase. significantly underrepresented (Table S6), although this does not compromise formation of heat-resistant spores. We also carried out an analyses of the largest paralogous gene families within B. B. coahuilensis codes for rocF, an arginase mainly found in Firmi- coahuilensis compared with those in B. subtilis (21), as a means of identifying important functions that are maintained in a small cutes, but not for any other urea-cycle enzyme. All sequenced Ͼ Bacillus spp. code for a complete urea cycle (argFGH and rocF), genome. Although B. subtilis has 13 paralogous gene families of 5 with the notable exception of O. iheyensis. Moreover, growth of B. members involved in secondary metabolites biosynthesis, transport, and catabolism, these are almost completely absent in B. coahuilen- coahuilensis is impaired, but not completely arrested, when grown sis (COG Q category, 1 gene family of Ͼ5 members). Signal without arginine, and this amino acid is one of those transported by transduction, in contrast, seems to be overrepresented in B. the overrepresented import family PAO. B. coahuilensis also lacks coahuilensis (COG T category, 57 gene families of Ͼ5 members) the other gene that could provide a source of arginine from compared with the same category in B. subtilis (1 gene family of Ͼ5 citrulline, arcA. This suggests that arginine is an important amino members) suggesting that environmental monitoring is key to acid in this bacterium, and that it is most likely synthesizing arginine survival in the CCB environment (Fig. S7). Analysis of all known by employing an unknown pathway. It was reported for example, B. subtilis genes involved in the synthesis of cell wall and membrane that Pseudomonas spp. synthesizes arginine by an arginine racemase components reveals that B. coahuilensis also lacks genes necessary and a D-arginine dehydrogenase (19). for the synthesis of polyphosphate-rich teichoic and polyanion In summary, B. coahuilensis exhibits numerous auxotrophies and teichuronic acids, useful cell wall phosphorus storage compounds. high dependence on the N2 cycle, which is mainly carried out in the Because teichoic acid synthesis genes are not only present in several CCB ponds by cyanobacteria, and is therefore also likely to be Bacillus species, but also in Listeria spp. which are used as an highly dependent on the microbial community within this environ- outgroup for comparing the Bacillus genus (Table S7), it is likely ment. This is in contrast with the genome from the marine strain that B. coahuilensis lost the capacity to produce teichoic acid NRRL B-14911, which exhibits partial requirement for only 2 aa because of the P-limited environment of CCB. Of the 45 genes that

5806 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0800981105 Alcaraz et al. Downloaded by guest on September 28, 2021 constitute the PhoP/PhoR regulon in B. subtilis, only 24 are present Our findings show that HGT played a key role in the adaptations in B. coahuilensis (Table S8). B. coahuilensis lacks the pit gene of B. coahuilensis. We are currently exploring whether transposons encoding a low-affinity P transporter that is conserved in Listeria (of which there are Ͼ20 in B. coahuilensis), phages [known to be spp. and many Bacillus spp. (except for B. halodurans, B. clausi, and abundant in ponds of the CCB (26)], and natural competence could O. iheyensis) suggesting that pit was also lost in B. coahuilensis provide the mechanisms driving changes in the B. coahuilensis because it provided no advantage in the CCB low-P environment. genome as well as in the other bacteria in the community. We have Interestingly, the pit gene is also absent in the marine cyanobacterial isolated numerous cyanobacteria in the Churince pond (27). genomes (22). Overall, several genomic characteristics seem to be The strategy of sequencing a single bacterial isolate to obtain the result of evolutionary mechanisms that caused a reduction in information on the adaptations of bacteria living in a highly genome size and selected against redundant genes and genes that oligotrophic environment was highly fruitful because it led to a clear provided little or no advantage for survival in the oligotrophic identification of the phylogenetic affiliation of some ecological environment of the CCB pools. However, given the basal position functions. A metagenome approach might have obscured the of B. coahuilensis in the phylogenomic analysis, it is also possible phylogenetic association of genes of cyanobacterial origin to firmi- that some of the genes it is lacking, but which are present in other cutes and would not have allowed us to observe the important Bacillus spp., may represent gains of bacilli species that evolved nitrogen cycle and amino acid synthesis limitations that make this later. bacterium dependent on the microbial community. The small genome size, constraints in secondary metabolism, and overrepre- Discussion sentation of signal transduction genes are also features that could P limitation seems to be an important driving force for the not have been deduced from a metagenomic approach. Efforts to adaptations observed in the B. coahuilensis genome. This bacterium study the metagenomics of the CCB ponds should be made to help has the capability of synthesizing membrane sulfolipids, apparently us understand how these ancient water ecosystems in the middle of acquired by HGT from a cyanobacteria. Sulfolipid SQDG is present the desert are self-sufficient in their biogeochemical cycles. in all higher plants, mosses, ferns, and algae, but it has also been Our results provide evidence that, because of the specific genome reported in nonphotosynthetic organisms, predominantly in cya- dynamics, its ancestry, and the local adaptive response, B. nobacteria (Fig. 3A). Discovery of this capability in the bacilli is coahuilensis is most likely the result of adaptation of an ancient especially interesting given the context of an environment with marine bacterium to a novel environment. Results that are in extremely limiting P. HGT of sqd1 and sqdX seems to be frequent agreement with the geology/ontology model that predicts a marine in cyanobacteria and, because it involves transfer of only a couple environment for this region in the early Jurassic followed by the of genes, it seems relatively simple for a bacteria to appropriate this rising of the continent, the formation of the CCB valley, and its Ϸ mechanism to cope with P scarceness. Finally, genome-size reduc- isolation by the surrounding Sierras in the Cretaceous period 70 tion in an ecosystem with very low P availability, such as the CCB million years ago (2). B. coahuilensis is likely a primitive bacterial pools, could increase the fitness of a bacterium with a reduced P component of a complex community that included Archaea and demand for nucleotide synthesis. Cyanobacteria that provided genomic fodder for gene transfer and The predicted rhodopsin sequence of B. coahuilensis is phylo- the implementation of innovative and necessary strategies for genically closer to sequences from cyanobacteria than to those survival in an evolving ecology. EVOLUTION reported in other bacteria and archeobacteria. Because retinal, Materials and Methods which is derived from carotenoids, is probably readily available in many of the pigmented CCB bacteria, the acquisition of rhodopsin Strains. B. coahuilensis was provided by V.S (4) and NRRL B-14911 by J.S. The media used for growth are described in SI Text. genes from cyanobacteria could easily lead to a new functional adaptation. B. coahuilensis genome encodes a couple of photolyases Genome Sequencing and Annotation. B. coahuilensis was sequenced by a hybrid that repair UV-induced DNA lesions and could explain, along with Sanger/454 approach. The entire genome sequence was obtained from a com- the bsr gene, the adaptation of B. coahuilensis to the high-light bination of 16,698 end sequences (providing 6-fold coverage) from a pUC18 fluency in the shallow and clear waters of the CCB ponds. genomic shotgun library (2–5 kb), by using dye terminator chemistry on auto- The incidence of both rhodopsin and sulfolipid biosynthesis mated DNA sequencers (ABI3700; Applied Biosystems) and 454 technology with genes in environmental samples is spatially restricted. We searched seven runs at a 29-fold coverage. Predicted protein-encoding genes were man- the CAMERA database (23) for homologue genes to the B. ually refined (see SI Text) and automatically annotated by using the BASys coahuilensis’ bsr and sqd1 and found only four sample points within system (5). the Global Ocean Sampling (GOS) having both genes represented: Prediction of Horizontal Transfer. The Similarity Plot (S-plot) application was used GS027, GS031, GS031, and GS034, which are located in shallow (16) to identify windows that contained regions of unusual compositional prop- waters of the Galapagos Islands. Two important common features erties (RUCPs) within B. coahuilensis genome (see SI Text). between CCB ponds and the Galapagos Islands are a high-radiation environment in shallow waters and the lack of P. The fact that sqd1 RT-PCR. Semiquantitative RT-PCRs were carried out by using SuperScript One Step and bsr coexist in two distinct geographically distant locations with RT-PCR with Platinum Taq (Invitrogen Life Technologies) (see SI Text) from RNA common environmental characteristics is suggestive of common isolated from strains grown in modified marine medium supplemented with adaptation strategies. P starvation seems to be a general constraint phosphate. For light/dark experiments, strain was grown on Petri dishes with in marine environments, as shown in ref. 7. marine medium grown at 37°C either under white or blue light or in the dark. Analysis of paralogous gene families revealed a dramatic limi- tation in genes for secondary metabolism in B. coahuilensis possibly Lipid Extraction and Analysis. Lipids from Arabidopsis, Cyanobacteria spp., and B. coahuilensis were extracted (details are available on request) and analyzed by as a result of both genome-size reduction and adaptation to a using the TLC technique as described in ref. 28. For lipid footprint analysis, unique niche (24, 25). The large number of auxotrophies, the individual lipids were isolated from TLC plates, and duplicates of each lipid spot limitation in N2 cycle genes (Fig. 4), and overrepresented signal were analyzed by MALDI-TOF MS technology (see SI Text). transduction genes suggest that this bacterium is a specialist within the CCB water systems, dependent and tightly bound to the primary ACKNOWLEDGMENTS. We thank Laura Espinosa Azuar and Antonio Cruz in producers and prepared for sensing and responding to the extreme technical assistance at Instituto de Ecologı´a,Universidad Nacional Auto´noma seasonal environmental conditions (2). This is in contrast to pan- de Me´xico; Dr. Michael Travisano (University of Minnesota) and June Simpson (Cinvestav Campus Guanajuato) for their comments on the manuscript. This demic and generalist bacteria like B. subtilis with its arsenal of work was supported in part by Secretarı´ade Agricultura, Ganaderı´a,Desar- secondary metabolism genes. rollo Rural, Pesca y Alimentacio´n(SAGARPA) Zea-2006 and Consejo Nacional

Alcaraz et al. PNAS ͉ April 15, 2008 ͉ vol. 105 ͉ no. 15 ͉ 5807 Downloaded by guest on September 28, 2021 de Ciencia y Tecnologı´a (CONACyT)-Secretaria de Educatio´n Pu´ blica 43979 de Medio Ambiente y Recursos Naturales Grant 2002-C01-0237 (to V.S.); grants and Howard Hughes Medical Institute Grant 55005946 (to L.H.-E.); Exobiology Grant NG04GJ12G (to J.S.); and graduate scholarships from CONACyT IN Grants 223105 and 44673Q (to V.S. and L.E.); CONACyT-Secretaria CONACyT (to L.D.A., R.C., and J.C.).

1. Minckley WL (1969) Environments of the Bolson of Cuatro Cie´negas, Coahuila, Me´xico, 17. Bouige P, Laurent D, Piloyan L, Dassa E (2002) Phylogenetic and functional classification with Special Reference to the Aquatic Biota.University of Texas: El Paso Science Series of ATP-binding cassette (ABC) systems. Curr Protein Pept Sci 3:541–559. (Texas Western Press, El Paso, TX). 18. Audic S, et al. (2007) Genome analysis of Minibacterium massiliensis highlights the 2. Souza V, et al. (2006) An endangered oasis of aquatic microbial biodiversity in the convergent evolution of water-living bacteria. PLoS Genet 3:e138. Chihuahuan desert. Proc Natl Acad Sci USA 103:6565–6570. 19. Jann A, Matsumoto H, Haas D (1988) The fourth arginine catabolic pathway of 3. Elser JJ, et al. (2005) Effects of phosphorus enrichment and grazing snails on modern Pseudomonas aeruginosa. J Gen Microbiol 134:1043–1053. stromatolitic microbial communities. Freshwater Biol 50:1808–1825. 20. Kobayashi K, et al. (2003) Essential Bacillus subtilis genes. Proc Natl Acad Sci USA 4. Cerritos R, et al. (2008) Bacillus coahuilensis sp. nov. a new moderately halophilic 100:4678–4683. species from different pozas in the Cuatro Cie´negas Valley in Coahuila, Me´xico. Int J 21. Pushker R, Mira A, Rodriguez-Valera F (2004) Comparative genomics of gene-family Syst Evol Microbiol 58:919–923. size in closely related bacteria. Genome Biol 5:R27. 5. Van Domselaar GH, et al. (2005) BASys: A web server for automated bacterial genome annotation. Nucleic Acids Res 33:W455–W459. 22. Su Z, et al. (2003) Computational inference of regulatory pathways in microbes: An 6. Ciccarelli FD, et al. (2006) Toward automatic reconstruction of a highly resolved tree of application to phosphorus assimilation pathways in Synechococcus sp. WH8102. Ge- life. Science 311:1283–1287. nome Inform 14:3–13. 7. Rusch DB, et al. (2007) The Sorcerer II Global Ocean Sampling Expedition: Northwest 23. Seshadri R, Kravitz SA, Smarr L, Gilna P, Frazier M (2007) CAMERA: A community Atlantic through Eastern Tropical Pacific. PLoS Biol 5:e77. resource for metagenomics. PLoS Biol 5:e75. 8. Dormann P, Benning C (2002) Galactolipids rule in seed plants. Trends Plants Sci 24. Mira A, Pushker, R (2005) The silencing of pseudogenes. Mol Biol Evol 22:2135–2138. 7:112–118. 25. Mira A, Pushker R, Rodriguez-Valera F (2006) The Neolithic revolution of bacterial 9. Benning C (1998) Biosynthesis and function of the sulfolipid sulfoquinovosyl diacyl- genomes. Trends Microbiol 14:200–206. glycerol. Annu Rev Plant Physiol Plant Mol Biol 49:53–75. 26. Desnues C, et al. (2008) Biodiversity and biogeography of phages in modern stroma- 10. Jung KH, Trivedi VD, Spudich JL (2003) Demonstration of a sensory rhodopsin in tolites and thrombolites. Nature 452:340–343. eubacteria. Mol Microbiol 47:1513–1522. 27. Falcon LI, Cerritos R, Eguiarte LE, Souza, V (2007) Nitrogen fixation in microbial mat and 11. Vogeley L, et al. (2004) Anabaena sensory rhodopsin: A photochromic color sensor at stromatolite communities from Cuatro Cienegas. Mexico Microb Ecol 54(2):363–373. 20 A. Science 306:1390–1393. 28. Hartel H, Dormann P, Benning C (2000) DGD1-independent biosynthesis of extraplas- 12. Luecke H, Schobert B, Lanyi JK, Spudich EN, Spudich JL (2001) Crystal structure of tidic galactolipids after phosphate deprivation in Arabidopsis. Proc Natl Acad Sci USA sensory rhodopsin II at 2.4 angstroms: Insights into color tuning and transducer 97:10649–10654. interaction. Science 293:1499–1503. 29. Schmidt HA, Strimmer K, Vingron M, von Haeseler A (2002) TREE-PUZZLE: Maximum 13. Mongodin EF, et al. (2005) The genome of Salinibacter ruber: Convergence and gene likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics exchange among hyperhalophilic bacteria and archaea. Proc Natl Acad Sci USA 18:502–504. 102:18147–18152. 14. Frigaard NU, Martinez A, Mincer TJ, DeLong EF (2006) Proteorhodopsin lateral gene 30. Battistuzzi FU, Feijao A, Hedges SB (2004) A genomic timescale of prokaryote evolu- transfer between marine planktonic Bacteria and Archaea. Nature 439:847–850. tion: Insights into the origin of methanogenesis, phototrophy, and the colonization of 15. Venter JC, et al. (2004) Environmental genome shotgun sequencing of the Sargasso land BMC. Evol Biol 4:44. Sea. Science 304:66–74. 31. Kumar S, Tamura K (2004) Nei M MEGA3: Integrated software for molecular evolu- 16. Putonti C, et al. (2006) A computational tool for the genomic identification of regions tionary genetics analysis and sequence alignment. Brief Bioinform 5:150–163. of unusual compositional properties and its utilization in the detection of horizontally 32. Schwede T, Kopp J, Guex N, Peitsch MC (2003) SWISS-MODEL: An automated protein transferred sequences. Mol Biol Evol 23:1863–1868. homology-modeling server. Nucleic Acids Res 31:3381–3385.

5808 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0800981105 Alcaraz et al. Downloaded by guest on September 28, 2021