
Distinguishing features of δ-proteobacterial genomes

Samuel Karlin*†, Luciano Brocchieri*, Jan Mrázek‡, and Dale Kaiser§

Departments of *Mathematics and §Biochemistry, Stanford University, Stanford, CA 94305; and ‡Department of Microbiology, University of Georgia, Athens, GA 30602

Contributed by Samuel Karlin, June 5, 2006

We analyzed several features of five currently available δ-proteobacterial genomes, including two aerobic bacteria exhibiting predatory behavior and three anaerobic sulfate-reducing bacteria. The δ genomes are distinguished from other bacteria by several properties: (i) The δ genomes contain two "giant" S1 ribosomal protein genes in contrast to all other bacterial types, which encode a single or no S1; (ii) in most δ-proteobacterial genomes the major ribosomal protein (RP) gene cluster is near the replication terminus, whereas most bacterial genomes place the major RP cluster near the origin of replication; (iii) the δ genomes possess the rare combination of discriminating asparaginyl and glutaminyl tRNA synthetase (AARS) together with the amidotransferase complex (GatCAB) genes that modify Asp-tRNA(Asn) into Asn-tRNA(Asn) and Glu-tRNA(Gln) into Gln-tRNA(Gln); (iv) the TonB receptors and ferric siderophore receptors that facilitate uptake and removal of complex metals are common among δ genomes; (v) the anaerobic δ genomes encode multiple copies of the anaerobic detoxification protein rubrerythrin that can neutralize hydrogen peroxide; and (vi) σ54 activators play a more important role in the δ genomes than in other bacteria. δ genomes have a plethora of enhancer binding proteins that respond to environmental and intracellular cues, often as part of two-component systems; (vii) δ genomes encode multiple copies of metallo-β-lactamase; (viii) secretion proteins, notably SecA, SecB, and SecY, may be especially useful in the predatory activities of Myxococcus xanthus; (ix) δ […] drive many multiprotein machines in their periplasms and outer membrane, including chaperone-feeding machines, jets for slime secretion, and type IV pili. Bdellovibrio replicates in the periplasm of prey cells. The sulfate-reducing δ proteobacteria metabolize hydrogen and generate a proton gradient by electron transport. The predicted highly expressed genes from δ genomes reflect their different ecologies, metabolic strategies, and adaptations.

δ proteobacteria | sulfate-reducing bacteria | predatory bacteria | σ54 activators

Conflict of interest statement: No conflicts declared.

Abbreviations: PHX, predicted highly expressed; RP, ribosomal protein; AARS, asparaginyl and glutaminyl tRNA synthetase; RR, response regulator; HK, histidine kinase; TCA, tricarboxylic acid.

Data deposition: The two new complete genomes referred to in Note have been deposited in the GenBank database [accession nos. NC_007519 (Desulfovibrio desulfuricans) and NC_007517 (Geobacter metallireducens)].

†To whom correspondence should be addressed. E-mail: [email protected].

© 2006 by The National Academy of Sciences of the USA

The δ proteobacteria (δ genomes) are defined by their 16S rRNA sequence (1). Completely sequenced δ genomes include the multicellular predator Myxococcus xanthus (MYXXA) (D.K., W. C. Nierman, B. S. Goldman, S. Slater, A. S. Durkini, J. Eisen, C. M. Ronning, W. B. Barbazuk, M. Blanchard, C. Field, et al., unpublished results), the unicellular predator Bdellovibrio bacteriovorus (BDEBA) (2), and the three anaerobic sulfate-reducing bacteria Desulfovibrio vulgaris (DESVU) (3), Geobacter sulfurreducens (GEOSU) (4), and Desulfotalea psychrophila (DESPS) (5). MYXXA, at 9.14 Mb, is among the largest bacterial genomes sequenced, whereas the other four δ genomes are all 3.5 to 4.0 Mb in size.

MYXXA lives in cultivated topsoil, where it is often exposed to solar radiation and is well aerated. It has two life stages, growth and development, both of which involve remarkable cellular cooperation and much gliding movement (6–8). These bacteria are proficient predators of whole colonies of other soil microbes. MYXXA encodes many duplicated proteins expressed during the different life stages, e.g., Lon (9) and serine-threonine protein kinases (10).

BDEBA is ubiquitous in terrestrial and aquatic habitats. It preys on individual Gram-negative bacterial cells by invading their periplasm and transforming them into nearly spherical structures called bdelloplasts (2). A detailed scenario for the adhesion and colonization of prey bacteria by BDEBA is reviewed in refs. 2 and 11.

DESVU is a strictly anaerobic sulfate-reducing bacterium (3) with a substantial capacity to oxidize metal ions, many of which are toxic. It has an extensive network of periplasmic hydrogenases and cytochromes for electron transport (3). GEOSU implements bioremediation by precipitating various soluble heavy metals (4). GEOSU encodes many predicted highly expressed (PHX) cytochrome c and ferredoxin proteins that are used for periplasmic and outer membrane electron transport (4). DESPS is a psychrophilic sulfate-reducing bacterium that uses sulfate as the main electron acceptor and lactate and alcohols as major carbon and electron sources (5). Its optimal doubling time is 27 h during growth on lactate at 10°C, but it can also grow at temperatures below 0°C (5).

This article has two objectives. First, PHX genes and related properties of the five δ genomes are analyzed (Table 1). Qualitatively, a gene is defined as PHX if its codon frequencies are similar to those of highly expressed genes, such as the genes for ribosomal proteins (RP), major transcription/translation factors (TF), or the principal chaperone/degradation (CH) proteins, but deviate strongly from the codon frequencies of the average gene of the genome (see Methods for precise criteria). Second, various properties that distinguish the δ genomes from other bacteria are described.

Table 1. δ-Proteobacterial complete genomes

Genome   G + C genome frequency, %   PHX genes, %   Max E(g)
MYXXA    68.9                        19.2           2.02
BDEBA    50.7                        7.3            2.87
DESVU    63.2                        15.1           1.90
GEOSU    60.9                        5.2            1.33
DESPS    46.8                        8.6            1.63

Results and Discussion

Distinctive PHX Genes of δ Genomes. Of all bacterial genomes sequenced to date, MYXXA shows the highest percentage of PHX genes, 19.2% (12–16). The top PHX gene in MYXXA encodes SecA, the ATPase subunit of the preprotein translocase [E(g) = 2.02] (see Table 6, which is published as supporting information on the PNAS web site). The high E(g) value suggests that secretion plays a major role in the MYXXA lifestyle. Of almost equal predicted expression level are the RNA polymerase subunits RpoC [E(g) = 2.02] and RpoB [E(g) = 1.84] and the ATP-dependent Lon protease [E(g) = 1.95]. A highly expressed protein of unknown function [E(g) = 2.01], encoded at genome positions 4160152–4162320, could be an attractive candidate for experimental analysis.
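The PHX criterion compares a gene's codon usage with that of the RP, CH, and TF reference sets and with the genome average (C). The sketch below shows one way such a comparison can be coded. It assumes the following definitions:

- A codon-usage difference B(g|S): the amino acid frequencies of gene g, each multiplied by the summed absolute differences in synonymous-codon frequencies between g and the reference set S.
- An expression measure E(g) = B(g|C) / [1/2 B(g|RP) + 1/4 B(g|CH) + 1/4 B(g|TF)].

Both the formula and its weights are assumptions modeled on the qualitative description above. They are not a transcription of the paper's Methods. The three-amino-acid codon table and the example sequences are hypothetical toy data.

```python
from collections import Counter, defaultdict

# Toy codon table (assumption): only Ala, Lys and Glu codons, for illustration.
CODE = {
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "AAA": "K", "AAG": "K",
    "GAA": "E", "GAG": "E",
}


def codon_counts(seqs):
    """Count recognized codons, read in frame, across a list of coding sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(0, len(s) - len(s) % 3, 3):
            codon = s[i:i + 3]
            if codon in CODE:
                counts[codon] += 1
    return counts


def synonymous_freqs(counts):
    """Codon frequencies normalized to sum to 1 within each amino acid."""
    aa_totals = defaultdict(int)
    for codon, n in counts.items():
        aa_totals[CODE[codon]] += n
    freqs = {codon: n / aa_totals[CODE[codon]] for codon, n in counts.items()}
    return freqs, aa_totals


def codon_bias(gene, reference_genes):
    """B(g|S): amino-acid-weighted sum of |f_g - f_S| over synonymous codons."""
    g_freqs, g_aa = synonymous_freqs(codon_counts([gene]))
    s_freqs, _ = synonymous_freqs(codon_counts(reference_genes))
    n_codons = sum(g_aa.values())
    total = 0.0
    for aa, n in g_aa.items():
        synonyms = [c for c, a in CODE.items() if a == aa]
        diff = sum(abs(g_freqs.get(c, 0.0) - s_freqs.get(c, 0.0)) for c in synonyms)
        total += (n / n_codons) * diff
    return total


def expression_measure(gene, genome, rp, ch, tf):
    """E(g) = B(g|C) / (B(g|RP)/2 + B(g|CH)/4 + B(g|TF)/4); the weights are an assumption."""
    denom = (0.5 * codon_bias(gene, rp)
             + 0.25 * codon_bias(gene, ch)
             + 0.25 * codon_bias(gene, tf))
    if denom == 0:
        return float("inf")  # codon usage identical to every reference set
    return codon_bias(gene, genome) / denom


if __name__ == "__main__":
    rp = ["GCTAAAGAA" * 3 + "GCCAAGGAG"]        # hypothetical highly expressed set
    genome = ["GCCAAGGAG" * 4, "GCTAAAGAA" * 4]  # hypothetical genome average
    gene = "GCTAAAGAA" * 2
    print(round(expression_measure(gene, genome, rp, rp, rp), 3))  # prints 2.0
```

In this toy case the gene lies twice as far from the genome average as from the highly expressed references, so E(g) is 2.0. Any threshold for calling a gene PHX would be a further assumption, so none is coded here.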

11352–11357 | PNAS | July 25, 2006 | vol. 103 | no. 30 | www.pnas.org/cgi/doi/10.1073/pnas.0604311103 | Downloaded by guest on October 2, 2021

MICROBIOLOGY

The PHX genes of BDEBA reach the high E(g) level of 2.87, suggesting that BDEBA should be considered a fast-growing organism (16, 17). Indeed, once inside the bdelloplast, BDEBA multiplies rapidly, producing several descendants from a single host cell (11). BDEBA also encodes a variety of periplasmic PHX electron transporters that adapt it to the microaerophilic conditions likely found within the host's periplasm (11). Of the 46 RP genes of ≥80 aa in the BDEBA genome, 45 are PHX, a high proportion consistent with the proposition that BDEBA is fast growing (17).

DESVU contains several PHX anaerobic detoxification genes, including two rubrerythrin genes and two rubredoxin oxidoreductase genes that can protect the organism against oxidative stress or other reactive toxins (18, 19). Its gene of highest PHX level [E(g) = 1.90] is the large ribosomal protein gene S1. The PHX genes of GEOSU encompass only 5% of the proteome and have a maximum E(g) of 1.33, for the RNA processing/degradation gene pnp (polynucleotide phosphorylase). The low expression levels of its PHX genes suggest that GEOSU is prone to grow slowly (16, 17). This genome also has few PHX chaperone genes.

Energy Metabolism in δ Genomes. MYXXA and BDEBA feature many PHX genes for aerobic metabolism. These genes include the NADH dehydrogenase (Nuo) complex and associated enzymes of respiration, most tricarboxylic acid (TCA) cycle enzymes (see below), and the cytochrome c oxidase operon comprising four subunits arranged in the order II-I-III-IV, all PHX (see Table 6). GEOSU and DESVU contain the same operon, but none of its subunits are PHX, and DESPS lacks the whole operon. These qualities correlate with an anaerobic lifestyle. MYXXA appears to have evolved to produce ATP efficiently over a wide range of oxygen concentrations, in accord with its habit of sporulating within a fruiting body covered with slime. The two basic metabolic pathways, glycolysis and the TCA cycle, involve 10 and 15 genes, respectively (see Table 7, which is published as supporting information on the PNAS web site). Counts of PHX genes in these pathways of the δ genomes are reported in Table 2. Microbes with a preference for aerobic growth usually feature many PHX genes in the TCA cycle and few PHX genes in glycolysis (16, 17). By contrast, microbes adapted for fermentation of carbohydrates generally have more PHX glycolysis genes. Microbes with many PHX genes in both respiratory and fermentative pathways are predicted to be facultative aerobes. In symbiotic or parasitic organisms, few of the glycolytic enzymes and few of the TCA enzymes tend to be PHX (17). An aerobic environment is also predicted when the genome encodes several PHX oxygen-detoxifying genes (17).

TCA Cycle Gene Cluster. In MYXXA, the PHX genes for the 2-oxoglutarate dehydrogenase E1 (sucA) and E2 (sucB) components, which overlap by 10 bp, are encoded in the same operon. Moreover, the six PHX genes of the TCA cycle for succinyl-CoA synthase α subunit (sucD), β subunit (sucC), succinate dehydrogenase flavoprotein subunit (sdhA), succinate dehydrogenase iron-sulfur protein (sdhB), malate dehydrogenase (mdh), and isocitrate dehydrogenase (icd) form a gene cluster, possibly a single operon (Scheme 1). Overall, MYXXA has 11 PHX TCA cycle genes, or 13 including two duplications (Table 2 and Scheme 1).

Scheme 1. Cluster of six TCA cycle genes. The successive gaps between genes are 32, 219, 44, 286, and 29 bp. The successive genes are of sizes 431, 313, 625, 268, 385, and 298 aa. Gene orientation is indicated by arrows. The coordinates below the grid indicate the starting position of each gene.

The TCA cycle genes of the δ genomes BDEBA, GEOSU, and DESPS are organized similarly: sucA is adjacent to sucB, sucC is adjacent to sucD, and the succinate dehydrogenase flavoprotein subunit (sdhA) and iron-sulfur subunit (sdhB) are encoded as part of a single operon. BDEBA and DESPS encode complete sets of TCA cycle enzymes (8 and 1 of them PHX, respectively). DESVU features a fusion of sucC and sucD (sucCD) but, apart from icd (PHX), mdh, and fumC, lacks the other genes of the TCA cycle. Five TCA cycle enzymes are PHX in GEOSU.

Glycolysis. Genes encoding glycolytic enzymes are broadly distributed among the δ genomes, with a single cluster in each genome. Explicitly, MYXXA, BDEBA, and GEOSU each cluster the genes gap, pgk, and tpi, probably in a single operon; DESVU clusters fba and gap; and DESPS clusters tpi and pgk. MYXXA contains two copies each of pyk and pfk, and DESVU contains two copies of gap. DESVU expresses few PHX TCA cycle genes but many PHX glycolytic genes, as expected for an organism that grows anaerobically on sugars.

PHX Genes Contributing to MYXXA Social Behavior. MYXXA is remarkable for the ability of its cells to cooperate in coordinated cell movements and in building fruiting bodies when starved. Its repertoire of PHX genes reflects its capacity for sensing its environment and for cell–cell interactions. For example, there is a large family of signal-responsive σ54 activators, also called enhancer-binding proteins. This contrasts with most other Gram-negative bacteria, where σ70 factors predominate and the σ54-dependent RNA polymerase complex is accessory (20). In particular, σ54-dependent transcription plays an important role in fruiting body development (17).

Chaperone/degradation PHX genes in MYXXA include 49 genes. Among them are 8 PHX DnaK genes, the most of any bacterial genome sequenced to date, and MYXXA has an additional 7 DnaK genes that are not PHX (see Table 8, which is published as supporting information on the PNAS web site). Among the other ≈200 prokaryotic genomes currently available, there is usually a single DnaK per genome, and at most five (data not shown). In sharp contrast to MYXXA, each of the other four δ genomes has a single DnaK gene (Table 8). MYXXA also has multiple PHX peptidyl-prolyl cis-trans isomerase (PPI) genes embracing all three known types (FKBP, cyclophilin, and parvulin). Many PHX degradation genes (clpX, clpA/B, clpP, hslU, hslV) and two htpX (HSP90) are conspicuous (see the list of PHX genes in Tables 9–15, which are published as supporting information on the PNAS web site). The chaperone/degradation genes tig, clpP, clpX, pep, and lon occur as a gene cluster, as depicted in Scheme 2. Two PHX duplicates of the chaperonin groEL share ≥75% identity.

We conjecture that many of the chaperone/degradation genes were acquired concomitantly with predation in MYXXA. Several strongly PHX groups of genes may facilitate predation. These groups include genes involved in secretion of digestive enzymes, nine omp (porin) genes, and a melange of >70 protease/peptidase genes including pitrilysin, fungalysin, etc.

Table 2. PHX genes of important pathways

Genome   Glycolysis*   TCA cycle*   Detoxification*
MYXXA    8 (8)         11 (13)      10 (14)
BDEBA    3 (3)         7 (8)        5 (5)
DESVU    4 (4)         1 (1)        1 (1)
GEOSU    1 (1)         5 (5)        3 (3)
DESPS    1 (1)         1 (1)        0 (0)

*Number of distinct PHX genes (number of PHX genes with repeats).
DESVU has three anaerobic detoxification genes (rubrerythrin, rubredoxin-oxygen oxidoreductase, and nigerythrin).
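The Energy Metabolism section states a qualitative rule for reading the Table 2 counts. That rule can be written as a small classifier:

- Many TCA-cycle PHX genes with few glycolytic ones suggests aerobic growth.
- The reverse suggests fermentation.
- Many in both suggests a facultative aerobe.
- Few in both is the pattern the text associates with symbiotic or parasitic organisms.

The paper gives no number for "many". The cutoff below (at least 2/5 of the 10 glycolytic or 15 TCA-cycle genes being PHX) is therefore a hypothetical choice, and the labels are only as good as that choice.

```python
from fractions import Fraction

# Pathway sizes stated in the text: 10 glycolysis genes, 15 TCA-cycle genes.
GLYCOLYSIS_GENES = 10
TCA_GENES = 15

# Distinct PHX gene counts from Table 2: (glycolysis, TCA cycle).
TABLE2 = {
    "MYXXA": (8, 11),
    "BDEBA": (3, 7),
    "DESVU": (4, 1),
    "GEOSU": (1, 5),
    "DESPS": (1, 1),
}

# Hypothetical cutoff: a pathway has "many" PHX genes if >= 2/5 of its genes are PHX.
MANY = Fraction(2, 5)


def lifestyle(glycolysis_phx: int, tca_phx: int, cutoff: Fraction = MANY) -> str:
    """Apply the qualitative PHX rule of thumb described in the text."""
    many_glycolysis = Fraction(glycolysis_phx, GLYCOLYSIS_GENES) >= cutoff
    many_tca = Fraction(tca_phx, TCA_GENES) >= cutoff
    if many_glycolysis and many_tca:
        return "facultative aerobe"
    if many_tca:
        return "aerobic"
    if many_glycolysis:
        return "fermentative"
    return "few PHX in both pathways"


if __name__ == "__main__":
    for genome, (glycolysis, tca) in TABLE2.items():
        print(genome, lifestyle(glycolysis, tca))
```

With the 2/5 cutoff, this sketch gives the following calls:

- MYXXA: facultative aerobe.
- BDEBA: aerobic.
- DESVU: fermentative.
- GEOSU and DESPS: few PHX genes in both pathways.

Exact fractions avoid floating-point edge cases at the boundary (DESVU sits exactly at 4/10). A different cutoff can change these calls.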

Scheme 2.

Numerous ABC transporter genes stand out as PHX (see supporting information in Table 11). At least 10 PHX TonB-dependent receptors are encoded, compared with at least 25 PHX TonB receptors in Caulobacter crescentus. These receptors are involved in absorption and convey iron or other insoluble substances into and out of the bacterial cell (15). Serine/threonine (S/T) kinases, phosphatases, histidine kinases, and cell cycle stress/heat shock proteins contribute to regulating the developmental-sporulation phase of MYXXA (6, 7, 21).

MYXXA possesses two genetically distinct motility systems, designated adventurous (A) and social (S) (22–24). MYXXA features more than 12 PHX A-motility gliding proteins (see Table 9), associated with its abundant slime secretion (24). The S motility system involves type IV pili and fibrils in cell movement. The layers of secreted polysaccharide slime provide a solid surface for cell movements (24). In this context, MYXXA also possesses >35 PHX lipoproteins, some of which may be involved in slime secretion.

Distinguishing Features of δ-Proteobacterial Genomes. Two giant ribosomal protein S1 genes are present in all δ genomes. Other bacterial genomes sequenced to date have at most a single S1 gene, and archaea and eukaryotes have no S1 gene. Strikingly, there are two copies of the S1 ribosomal protein gene in each of the five δ genomes analyzed. One copy is highly conserved (500–600 aa, with 50–60% sequence identity among different species) and is usually PHX (Table 3).

The RP gene S1, commonly exceeding 500 aa in length, is essential in Gram-negative bacteria for initiating translation and is encoded separately from the main RP cluster (25). S1 is overall acidic, binds weakly and reversibly to the small ribosomal subunit, and interacts with mRNA chains, whereas most other RPs bind strongly to the complex (25). S1 can facilitate binding of mRNAs that lack a strong Shine-Dalgarno sequence, allowing their translation by the δ proteobacteria. S1 is also encoded in the deeply branching Gram-negative hyperthermophiles Aquifex aeolicus and Thermotoga maritima. The 820-aa S1 protein of T. maritima can be recognized as a fusion of cytidylate kinase, involved in nucleotide biosynthesis, with a standard S1 sequence. The S1 proteins of […] genomes and of low G + C Gram-positive bacteria ([…]) are of reduced size, in the range of 380–410 aa. Acidic RPs are rarely present in bacterial genomes, except for S1 and L7/L12. L7/L12, like the eukaryotic ribosomal proteins P0, P1, and P2, features a hyperacidic carboxyl residue run that is thought to act in adapting mRNA chains to the ribosome.

Location and organization of the major RP gene cluster. Most bacterial genomes carry a cluster accounting for 15–40% of all RP genes, positioned proximal to the origin of replication, which permits early expression of these RPs in the cell cycle. Several genes fundamental in protein synthesis, including tuf, fus, rpoA, rpoB, and rpoC, and several chaperones (e.g., groEL, groES, and tig) are encoded within or proximal to the major RP cluster in many bacterial genomes. Archaeal genomes, which often lack a unique origin of replication, delimit a less extended RP cluster than bacterial genomes. By contrast, the RP genes of yeast and of higher eukaryotes are randomly dispersed over their genomes. Three δ genomes (MYXXA, DESVU, and DESPS) depart from the usual bacterial arrangement: their primary RP gene cluster is located at or near the Ter region of the genome. BDEBA and GEOSU, by contrast, locate their RP cluster significantly closer (less than 1 Mb) to the origin of replication (oriC), similar to the organization of most bacterial genomes. As with S1, the RPs L25 and S2 are often (but not always) isolated in the genome. Intermeshed with the major RP cluster of GEOSU are the genes secY, adk, tuf, fus, rpoB, rpoC, nus, and secE, ≈50 kbp from oriC. The major RP cluster of MYXXA incorporates the proteins Fus, Tuf, SecY, and RpoA. The principal RP cluster of BDEBA, located ≈92 kbp from oriC, includes the proteins RpoA, SecY, RpoC, RpoB, NusG, SecE, and Tuf. The prime RP cluster of DESVU includes the proteins EF-G, SecY, RpoA, Tig, and ClpX; here, RpoB and RpoC are encoded in a separate cluster with four RP genes. The principal RP cluster of DESPS includes the protein translation processing genes nusG, rpoB, rpoC, map, fus, tuf, secY, rpoA, EF-Ts, and rrf (ribosome recycling factor). Note that secY is encoded as part of the major RP cluster in every δ genome, as in many other bacterial genomes, emphasizing a role in translation.

The complement of asparaginyl and glutaminyl tRNA synthetase (AARS) genes in δ genomes. The accuracy of the AARS enzymes is essential for correct translation of the genetic code. Most bacteria differ from the E. coli model of tRNA aminoacylation for asparagine, glutamine, cysteine, proline, and lysine (26, 27). The sequencing of numerous genomes verified the absence of the regular glutaminyl and asparaginyl AARSs (Table 4 and supporting information in Table 10). Other anomalies for lysyl, cysteinyl, and prolyl AARS were also revealed for Gram-positive bacteria, for α-, β-, and ε-proteobacterial genomes, for most obligate intracellular pathogens, and for archaeal genomes (27). AARS representations in γ-proteobacterial genomes are variable: most possess the cognate AARS for every amino acid, but Pseudomonas genomes do not (ref. 26 and Table 4). The five δ genomes possess the regular AARS for each amino acid (Table 10), with the following duplications:

- two gene copies for GluRS in MYXXA and DESVU;
- two gene copies for LysRS in MYXXA, BDEBA, GEOSU, and DESPS;
- two gene copies for ThrRS in MYXXA and BDEBA;
- two gene copies for LeuRS in MYXXA.

Generally, when the regular glutamine and/or asparagine AARS genes (glnS and asnS) are lacking, an amidotransferase pretranslation modification mechanism (GatCAB, of three subunits) is available that converts Glu-tRNA(Gln) to Gln-tRNA(Gln) and/or Asp-tRNA(Asn) to Asn-tRNA(Asn). These transamidation reactions thus yield correctly charged Gln-tRNA(Gln) and Asn-tRNA(Asn), compensating for the lack of GlnRS and AsnRS function, respectively (26, 27). Strikingly, the five δ-proteobacterial genomes include asnS and glnS as well as the genes for the GatCAB amidotransferase complex (Table 4). The δ-proteobacterial pattern is rare among the 180 bacterial genomes analyzed. It occurs only in Legionella pneumophila (γ proteobacteria) and in the non-proteobacteria Deinococcus radiodurans and Thermus thermophilus (Deinococcus-Thermus group) and Pirellula sp. (Planctomycetales). Among these species, it has been shown that the genome of D. radiodurans does not encode the regular enzymes for asparagine biosynthesis (asnA or asnB) and that Asp-tRNA(Asn) transamidation is the only pathway by which D. radiodurans can synthesize asparagine (28).

Table 3. Ribosomal protein S1

Genome   E(g)*   Location    Size, aa
MYXXA    1.19    4143609     720
MYXXA    1.64    4561591C    569
BDEBA    2.47    1011942     594
BDEBA    0.82    1138486     397
DESVU    1.27    1551153C    486
DESVU    1.90    3303332C    576
GEOSU    0.92    1303580     401
GEOSU    0.92    2872001     573
DESPS    1.36    1012514     569
DESPS    0.84    3213437     395

*Predicted highly expressed genes are indicated in bold. C, complementary strand.
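The RP-cluster comparison is phrased in terms of position on a circular chromosome: ≈50 kbp or ≈92 kbp from oriC, or near Ter. Table 3 lists gene start coordinates, with a trailing C marking the complementary strand. The sketch below parses coordinates in that style and computes the shortest distance to an origin around a circular chromosome. It rests on two assumptions:

- The caller must supply the origin coordinate and genome length; neither is given in Table 3.
- The point diametrically opposite oriC is treated as a stand-in for Ter, which is a simplification.

```python
def parse_location(loc: str) -> tuple:
    """Parse a Table 3-style start coordinate; a trailing 'C' marks the complementary strand."""
    loc = loc.strip()
    if loc.endswith("C"):
        return int(loc[:-1]), "-"
    return int(loc), "+"


def distance_from_origin(pos: int, origin: int, genome_length: int) -> int:
    """Shortest distance in bp between pos and origin on a circular chromosome."""
    d = abs(pos - origin) % genome_length
    return min(d, genome_length - d)


def relative_position(pos: int, origin: int, genome_length: int) -> float:
    """0.0 at the origin, 1.0 at the diametrically opposite point (assumed proxy for Ter)."""
    return distance_from_origin(pos, origin, genome_length) / (genome_length / 2)


if __name__ == "__main__":
    # Hypothetical 4-Mb chromosome with oriC placed at coordinate 1 (assumption).
    pos, strand = parse_location("3950001C")
    print(pos, strand, distance_from_origin(pos, 1, 4_000_000))  # 3950001 - 50000
```

Taking the minimum of the two arc lengths matters on a circle. A gene near the end of the coordinate range, like the one in the usage example, can in fact lie close to an origin placed at coordinate 1.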

The coexistence of GatCAB with AsnRS and GlnRS in the nine organisms that encode all three might then be explained by a role of GatCAB in the biosynthesis of asparagine or glutamine. We found that one or more genes for glutamine biosynthesis (glnA) are present in these organisms. Five of these genomes are missing the enzymes for asparagine biosynthesis (asnA and asnB):

- D. radiodurans;
- T. thermophilus;
- L. pneumophila;
- the two predator δ genomes MYXXA and BDEBA.

However, a gene for asparagine biosynthesis (asnB) is present in the genomes of the sulfate-reducing δ proteobacteria GEOSU, DESVU, and DESPS and in Pirellula sp. In the latter organisms, the coexistence of GatCAB with AsnRS and GlnRS might be explained by a role of GatCAB in asparagine or glutamine biosynthesis. MYXXA and BDEBA may have been assured of receiving asparagine from their prey, secondarily acquiring GatCAB.

Anaerobic detoxification genes. Rubrerythrin (Rbr), a di-iron protein, is often PHX in anaerobic and microaerophilic bacteria and in archaea. The presence of Rbr is interpreted as an oxidative stress protection system in air-sensitive bacteria and archaea (18, 19). DESVU and DESPS are obligate anaerobes but appear to possess oxygen-reducing systems. They have been shown not to grow aerobically and appear to be inhibited by molecular oxygen. Superoxide dismutase (Sod) and catalase (Kat) are moderately expressed in both genomes, apparently allowing some direct oxygen detoxification (refs. 3 and 5; see also supporting information in Tables 12 and 15 with δ-genome lists of PHX genes). Rubrerythrin (Rbr), nigerythrin, and rubredoxin oxidoreductase were recently described as alternative oxidative stress protection proteins functioning in an anaerobic environment (18, 19). Rbr is widespread in archaea and some bacteria, particularly in organisms that die in the presence of oxygen. The function of Rbr has been debated, but recent evidence both in vivo and in vitro strongly supports a role for Rbr in a novel oxidative stress protection system (18, 19). A strictly anaerobic organism needs to eliminate oxygen radicals that arise during sulfate reduction. Genomic analyses have shown that Rbr-like proteins are ubiquitous among archaea and bacteria, and the genomes of many anaerobes encode multiple Rbr homologues. The main function of Rbr seems to be reduction of hydrogen peroxide (19).

TonB-dependent receptors. All of the δ genomes encode multiple TonB-dependent receptors that channel various complex metals into and out of bacterial cells. In particular, MYXXA features at least 10 PHX TonB-dependent receptors that it may use for predation. The other δ genomes also encode multiple TonB-dependent receptors, but none are PHX (Table 5). The TonB-dependent receptor proteins interact with outer membrane proteins and energize uptake of specific substrates (e.g., iron). These substrates either are poorly permeable through the porin channels or are encountered at very low concentrations. DESVU and BDEBA encode 6 and 10 copies, respectively, of the TonB receptor. GEOSU shows 7 copies (1 PHX) and DESPS has 2 copies.

σ54-Activator proteins, histidine kinase (HK), and response regulator (RR) types. There are >16 PHX genes in MYXXA that encode σ54 activators and at least an additional 28 genes encoding σ54-activator proteins that are not PHX. These proteins play important roles in fruiting body development (6, 7, 21). MYXXA has many σ70 factors, most of which are ECF (extracytoplasmic function) sigmas (Table 9; see also ref. 29). Many of the σ54-activator proteins are PHX, but only three of the σ70 factors are PHX (Table 5). Sensory and signal histidine kinase genes, in excess of 133 copies, are widely distributed in the MYXXA genome, and 139 genes are separately characterized as RRs. In addition, at least 36 genes constitute hybrid two-component systems consisting of an HK coupled to an RR domain. Serine/threonine kinases and phosphatases (STPK and STPh, respectively) may function with the Forkhead σ54-enhancer binding proteins. An unprecedented number of at least 102 STPKs, balanced by at least 18 STPh, are distributed around the genome (Table 5).

The HK genes represent the most abundant collection of regulatory genes in GEOSU. Additionally, 95 genes are characterized as RRs (Table 5). There are 22 genes containing both HK and RR domains and 8 gene pairs in which an HK gene is adjacent to an RR gene. The GEOSU genome is impressive, with a total of 28 σ54-activator proteins, of which 22 are described as σ54-dependent DNA-binding genes and 4 as σ54-dependent transcriptional regulators, but only one representative of σ70 occurs. In 14 cases, the σ54 activators are encoded contiguously with an HK domain. Three serine/threonine phosphatase genes occur, but no STPK is found in the GEOSU genome.

DESVU features 56 HK genes, 79 RRs, and 18 genes combining the HK and RR domains in a common transcript. Many contiguous gene pairs encode the HK and RR domains. Three serine/threonine (S/T) phosphatase genes and apparently a single STPK are detected in DESVU. Remarkably, 32 σ54-dependent transcriptional regulators occur in DESVU, and 4 σ54-activator genes joined with HK genes are recognized. There are only two σ70 factor sequences.

DESPS encodes 23 σ54 activators, one σ54 factor, and two σ70 factors. The most frequent gene types are HK (17 copies) and RR (32 copies), as well as many genes whose domains are united in the same operon. There are three STPK and two STPh genes. The most abundant gene types in BDEBA are the two-component system HKs (54 copies) and RRs (41 copies).

Table 4. tRNA synthetases (asnS and glnS) and glutamyl/aspartyl amidotransferases (gat) in Bacteria

Group*                      asnS   glnS   gat
δ Proteobacteria (5)        +      +      +
  MYXXA                     +      +      +
  BDEBA                     +      +      +
  GEOSU                     +      +      +
  DESVU                     +      +      +
  DESPS                     +      +      +
α Proteobacteria (18)       −      −      +
β Proteobacteria (11)       −      +      +
γ Proteobacteria (27)†      +      +      −
γ Proteobacteria (6)‡       −      +      +
γ Proteobacteria (2)§       −      −      +
γ Proteobacteria (1)¶       +      +      +
ε Proteobacteria (4)        −      −      +
Firmicutes (35)             +      −      +
Actinobacteriales (13)      −      −      +
Spirochaetales (5)          +      −      +
Chlamydiales (5)            −      −      +
[…] (8)                     +      −      +
Chloroflexi (1)             −      −      +
Bacteroidales (3)           +      +      −
Deinococcus-Thermus (2)     +      +      +
Chlorobiales (1)            −      −      +
Aquificales (1)             −      −      +
Thermotogales (1)           −      −      +
[…] (1)                     +      −      +
Planctomycetales (1)        +      +      +

*Numbers in parentheses indicate the number of species.
†Enterobacteriales (12), Pasteurellales (4), Alteromonadales (2), Vibrionales (5), and Xanthomonadales (4).
‡[…] (5) and […] (Thiotricales).
§[…] ([…]) and Methylococcus capsulatus (Methylococcales).
¶L. pneumophila (Legionellales).
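For each group, Table 4 records whether asnS, glnS, and gat are present. The sketch below reads such presence patterns, written with ASCII "+" and "-". For each pattern it reports which routes to Asn-tRNA(Asn) and Gln-tRNA(Gln) the gene content would permit, as described in the AARS section:

- direct charging by AsnRS/GlnRS;
- GatCAB transamidation of Asp-tRNA(Asn)/Glu-tRNA(Gln);
- both.

It is a bookkeeping illustration over a hand-picked subset of Table 4 rows. Gene presence alone does not establish which route a cell actually uses.

```python
# asnS / glnS / gat presence patterns for a subset of Table 4 rows.
TABLE4 = {
    "δ Proteobacteria (5)": "+++",
    "α Proteobacteria (18)": "--+",
    "β Proteobacteria (11)": "-++",
    "γ Proteobacteria (27)": "++-",
    "Firmicutes (35)": "+-+",
    "Bacteroidales (3)": "++-",
    "Deinococcus-Thermus (2)": "+++",
    "Planctomycetales (1)": "+++",
}


def charging_route(has_synthetase: bool, has_gat: bool) -> str:
    """Routes to correctly charged tRNA that the gene content would permit."""
    if has_synthetase and has_gat:
        return "direct and transamidation"
    if has_synthetase:
        return "direct only"
    if has_gat:
        return "transamidation only"
    return "none detected"


def routes(pattern: str) -> dict:
    """Map an 'asnS glnS gat' pattern such as '-++' to routes for Asn and Gln."""
    has_asn, has_gln, has_gat = (symbol == "+" for symbol in pattern)
    return {
        "Asn": charging_route(has_asn, has_gat),
        "Gln": charging_route(has_gln, has_gat),
    }


def delta_like(table: dict) -> list:
    """Groups sharing the δ-proteobacterial pattern: asnS, glnS and gat all present."""
    return [group for group, pattern in table.items() if pattern == "+++"]


if __name__ == "__main__":
    for group, pattern in TABLE4.items():
        print(group, routes(pattern))
    print(delta_like(TABLE4))
```

On this subset, delta_like returns the δ Proteobacteria, Deinococcus-Thermus, and Planctomycetales rows. These are the groups the text names as sharing the rare combination of asnS, glnS, and GatCAB.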

Table 5. Representation of selected gene families in δ proteobacteria

Gene                                   MYXXA*      BDEBA*       GEOSU*      DESVU*          DESPS*
Histidine kinase                       133 (10)    54 (−)       87 (1)      53 + 3† (−)     17 (−)
Response regulators                    139 (31)    41 (−)       95 (3)      75 + 4† (−)     32 (1)
His-kinase and response regulator      36 (4)      3 (−)        22 (−)      18 (−)          −
Ser/Thr protein kinase                 102 (8)     6 (−)        1‡ (1‡)     1 (1)           3 (−)
Ser/Thr phosphatase                    18 (3)      5 (−)        3‡ (1‡)     3 (−)           2 (−)
σ54-Dependent DNA binding              20 (7)      −            22 (1)      4 (−)           −
σ54-Dependent transcriptional reg.     22 (8)      6 (1)        4 (−)       24 + 2† (−)     3 (−)
σ54 Others                             2 (1)       2 (1)        2 (−)       1 + 1† (−)      20§ (−)
All σ54-dependent                      44 (16)     8 + 9¶ (2)   28 (1)      29 + 3† (−)     23 (−)
RNA pol σ54 factor                     1 (−)       2 (−)        1 (−)       1 (−)           1 (−)
RNA pol σ70 factor                     35 (3)      2‖ (−)       1 (−)       2 (−)           2 (1)
RNA pol σ-32                           1 (−)       2‖ (−)       1 (1)       −               1** (−)
σ Factor for flagellar operon (FliA)   −           1 (−)        1 (−)       1 (−)           1 + 1†† (−)
(Metallo)-β-lactamase                  30 (3)      11 (−)       13 (−)      8 (−)           −‡‡
GGDEF domain protein                   16 (4)      4 (−)        23 (1)      25 (−)          −
TonB protein                           16 (3)      6 (−)        −           4 (−)           1 (−)
TonB-dependent receptor                12§§ (4)    4 (−)        7 (1)       2 (−)           1 (−)
OmpA                                   10 (4)      2 (−)        5 (1)       2 (−)           −
Peptidyl-prolyl cis-trans isomerase    13 (5)      10 (4)       7 (−)       6 (−)           4 (1)

*Numbers of highly expressed genes are in parentheses.
†In megaplasmid.
‡One identified as "HPr(Ser) kinase/phosphatase."
§Similar to two-component system response regulators (Ntr family). This family includes regulatory proteins that activate the expression of genes from promoters recognized by core RNA polymerase associated with the alternative σ54 factor. They have a conserved domain of ≈230 residues involved in the ATP-dependent interaction with σ54.
¶Nine other genes with 20–30% similarity to σ54-dependent transcription regulators, named transcriptional regulator NifA (2), response regulator containing CheY-like receiver, AAA-type ATPase, and DNA-binding domains (5), transcriptional regulatory protein zraR (1), and flagellar transcriptional activator protein flbD (1).
‖Two genes identified as σ70/σ32.
**Identified as "RNA polymerase σ-B factor."
††Very low similarity (0–8%) to other flagellar σ factors or to σ70 (5–12%).
‡‡β-Lactamase sequences from DESVU have similarity (23–25%) to three sequences from DESPS identified as flavoproteins.
§§Two of these are identified as "tonB system transport proteins" of the ExbD/TolR and ExbB/TolQ families, respectively.
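In Table 5 the three σ54 subcategory rows should add up to the "All σ54-dependent" row. The subcategories are DNA binding, transcriptional regulator, and others. The totals should match both for all copies and for the PHX copies in parentheses. The short consistency check below rests on three assumptions:

- "−" is read as 0.
- Megaplasmid (†) copies are added to chromosomal ones.
- BDEBA's nine weakly similar genes (¶) are left out.

Under these assumptions, every row balances.

```python
# Counts transcribed from Table 5 as (DNA binding, transcriptional reg., others, all).
# Assumptions: "−" is read as 0; megaplasmid (†) copies are added to chromosomal ones;
# BDEBA's nine additional weakly similar genes (¶) are excluded from its total.
TOTALS = {
    "MYXXA": (20, 22, 2, 44),
    "BDEBA": (0, 6, 2, 8),
    "GEOSU": (22, 4, 2, 28),
    "DESVU": (4, 24 + 2, 1 + 1, 29 + 3),
    "DESPS": (0, 3, 20, 23),
}

# PHX counts (the values in parentheses in Table 5), same column order.
PHX = {
    "MYXXA": (7, 8, 1, 16),
    "BDEBA": (0, 1, 1, 2),
    "GEOSU": (1, 0, 0, 1),
    "DESVU": (0, 0, 0, 0),
    "DESPS": (0, 0, 0, 0),
}


def inconsistent_rows(table):
    """Genomes whose three subcategory counts do not sum to the stated total."""
    return [genome for genome, (dna, reg, other, total) in table.items()
            if dna + reg + other != total]


if __name__ == "__main__":
    print(inconsistent_rows(TOTALS), inconsistent_rows(PHX))  # prints [] []
```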

Secretion proteins. SecA stands out, with the highest E(g) score among all PHX genes of MYXXA. The gene also qualifies as PHX in most of the other δ genomes. Bacteria have developed complex mechanisms to handle membrane translocation, secretion of polypeptides, and correct folding. A dimeric SecA, essential and unique to bacteria (not found in archaea), is fundamental for protein translocation to the periplasm (30, 31). Apart from SecA, secretion-specific chaperones include SecB (32) and the signal recognition particle. The major chaperones GroEL, DnaK, and trigger factor also participate in these activities (12). In addition to structural and ancillary subunits, such as SecY, SecE, and SecG, the translocase complex has a mechanical motor, the SecA ATPase, which binds to SecYEG to establish the functional translocase core (30). Mycobacterium tuberculosis possesses two SecA paralogs with distinct substrate specificities. SecY is prominent in the major RP cluster of all of the δ genomes, possibly indicating its relevance to protein synthesis. The SecA gene is also PHX in […], E. coli, Synechocystis, Mycoplasma pneumoniae, Treponema pallidum, Borrelia burgdorferi, Aquifex aeolicus, and other bacteria. The secretion pathway is used by many protein substrates. The cellular destination of all secretory polypeptides is governed by a 20- to 30-residue amino-terminal sequence, the leader peptide, which also helps guide SecA binding to the substrate. SecA, SecB, and SecG are all involved in protein export and chaperone activity. Gram-negative bacteria also secrete a variety of proteins into the extracellular and periplasmic milieu through the type I to IV secretion apparatus. These proteins can also influence bacterium–host interactions.

Other abundant gene classes. Multiple copies of the GGDEF domain proteins, the metallo-β-lactamase enzymes, and adventurous motility proteins are conspicuous in δ genomes (Table 5).

GGDEF proteins. GGDEF proteins are involved in cyclic diguanylate metabolism and in regulating the transition from sessile to motile forms (33). MYXXA contains 16 copies (4 PHX), BDEBA contains 4 copies, GEOSU shows 23 copies, and DESVU may encode 25 copies.

β-Lactamase enzymes. β-Lactamase catalyzes the opening and hydrolysis of the β-lactam ring of β-lactam antibiotics such as penicillins and cephalosporins (34). Metallo-β-lactamase (30 copies in MYXXA) is abundant in all five δ genomes. These genes may protect the cell from antibiotics produced by other microbes or provide resistance against the organism's own antibiotics. (Myxobacteria have substantial capacity for polyketide biosynthesis and production of antibiotics.) The δ-proteobacteria are mainly soil inhabitants. Because antibiotics are prodigiously manufactured by the Streptomyces soil bacteria and many fungal microbes, the metallo-β-lactamase enzymes presumably provide a defense against antibiotic molecules. The multiplicity of these β-lactamase genes may reflect a gene dosage effect. The β-lactam ring is a crucial component of a large group of therapeutically useful antibiotics, comprising the penicillin, cephalosporin, and carbapenem families, and some bacteria express β-lactamase enzymes to escape the action of these damaging antibiotics. Among soil bacteria, Gram-negative species are generally more sensitive to β-lactam antibiotics than Gram-positive species. This sensitivity may relate to the hard cell wall structure of Gram-positive bacteria. The sporulation capabilities of many soil Gram-positive bacteria may also protect them from adverse effects, including antibiotics. Metallo-β-lactamase genes are also frequent in GEOSU (13 copies), DESVU (8 copies), and BDEBA (11 copies), but no version occurs in DESPS.

Motility genes. Gliding motility genes (29 genes, mostly PHX in MYXXA; see also Table 9) contribute importantly to swarming movements. Also relevant are genes of twitching motility and the frizzy genes FrzA and FrzB. Together with the (MglA, MglB) operon, these regulate reversal of the swarming motions in the fruiting body ensemble (24). The main genes governing movements of MYXXA are the adventurous gliding motility genes and many pilus genes associated with social motility (22, 23). In BDEBA, other gene groups include gliding and twitching motility genes of the pilR, pilS, pilT, pilU, and pilV types, each with multiple occurrences. These genes presumably contribute to BDEBA's ability to attach to its Gram-negative bacterial prey before entering the periplasm.

11356 | www.pnas.org/cgi/doi/10.1073/pnas.0604311103 Karlin et al.

Methods

Let G be a group of genes with average codon frequency g(x, y, z) for the codon (x, y, z), normalized such that $\sum_{(x,y,z)=a} g(x,y,z) = 1$ for each amino acid family a. Similarly, let {f(x, y, z)} denote the average codon frequencies for the gene group F (F can be a single gene). The codon usage difference of F with respect to G is calculated by the formula

$$B(F \mid G) = \sum_{a} p_a(F) \left[ \sum_{(x,y,z)=a} \left| f(x,y,z) - g(x,y,z) \right| \right], \qquad [1]$$

where {p_a(F)} are the average amino acid frequencies of the genes of F, and the inner sum runs over the codons (x, y, z) encoding amino acid a (12, 13). Predicted expression levels with respect to individual standards can be based on the ratios $E_{RP}(g) = B(g \mid C)/B(g \mid RP)$, $E_{CH}(g) = B(g \mid C)/B(g \mid CH)$, and $E_{TF}(g) = B(g \mid C)/B(g \mid TF)$. Here C is the totality of all of the genes of the genome (RP, ribosomal protein genes; TF, major protein synthesis factors; CH, major chaperone/degradation proteins). We introduce the expression measure

$$E(g) = \frac{B(g \mid C)}{\tfrac{1}{3}\left[ B(g \mid RP) + B(g \mid CH) + B(g \mid TF) \right]}.$$

The gene classes RP, CH, and TF serve as representatives of highly expressed genes. Our method designates as PHX those genes whose codon usage is similar to that of at least one of these classes. These assignments are reasonable under fast-growing conditions, where there is a need for many ribosomes, for proficient translation, and for many chaperone proteins to ensure properly folded and translocated protein products. E(g) is an estimate of the expression level of the gene g. The criterion $E(g) > 1$, in conjunction with at least two of the values $E_{RP}(g)$, $E_{TF}(g)$, and $E_{CH}(g)$ exceeding 1.05, generally reflects high protein molar abundance (12, 13).

Examples of PHX gene classes in most bacteria include: (i) most RPs but generally not all; (ii) global protein synthesis genes (like rpoB, rpoC, tuf, and fus); (iii) major chaperone/degradation proteins [like GroEL, DnaK, Tig, FtsH, Clp(A/B), and PPI (peptidyl-prolyl cis-trans isomerase)]; (iv) Pnp (mRNA processing and degradation); (v) essential energy metabolism genes, including glycolysis genes mainly under anaerobic conditions and TCA cycle genes generally under aerobic conditions, photosynthesis genes prominent in cyanobacteria, and methanogenesis genes in methanogens.

Examples of protein classes that are PHX in particular genomes include: (i) fatty acid metabolism in M. tuberculosis (12); (ii) […] primarily in […] and Ureaplasma urealyticum (12); (iii) flagellar proteins in some α-proteobacteria (Mesorhizobium loti, C. crescentus) and in the spirochetes T. pallidum and B. burgdorferi (12, 14); and (iv) a proliferation of PHX detoxification genes in D. radiodurans (15). Our results on PHX genes are consistent with assessments of protein levels by two-dimensional gel electrophoresis (12, 13). Multifunctional proteins may be expected to attain high E(g) values. For example, Pnp is fundamental to both mRNA processing and degradation and achieves the highest E(g) value (2.66) among E. coli genes (12, 13). Aconitase not only interconverts citrate and isocitrate in the TCA cycle but also serves as a sensor of changes in the redox state and of iron content within the cell. It attains the highest E(g) value (2.56) in D. radiodurans. Another multifunctional PHX protein of many genomes is GAPDH (gap). It catalyzes the oxidation of glyceraldehyde-3-P in glycolysis, and it also possesses uracil DNA glycosylase activity, senses oxidative stress, binds to RNA and DNA, and serves as a source of reducing equivalents. In contrast, proteins that are required in few molecules per cell cycle are not expected to be highly expressed. Thus, the following gene groups are seldom highly expressed: (i) specialized transcription factors, (ii) strict replication proteins, (iii) most repair proteins, and (iv) vitamin biosynthesis enzymes (13). Overall, there is support for the proposition that each bacterial genome has evolved a codon usage pattern reflecting "optimal" gene expression levels for its typical lifestyle, habitat, and metabolic propensities (12, 13). We provide in supporting information (Tables 11–15) complete lists of PHX genes for each δ genome. Many of these genes offer attractive candidates for experimental study.

Note. After completing the foregoing analysis, two new δ-proteobacterial genomes were released: Desulfovibrio desulfuricans DESDE (3.73 Mb; G+C content 57.8%; GenBank accession no. NC_007519) and Geobacter metallireducens GEOME (3.97 Mb; G+C content 59.5%; GenBank accession no. NC_007517). The distinctive properties of δ-proteobacterial genomes set forth in the abstract also apply to these genomes. These include (i) presence of two genes for the giant ribosomal protein S1; (ii) the major RP gene cluster situated either in the ter region (DESDE) or close to oriC (GEOME), in both cases including the gene for SecY; and (iii) presence of AARS genes for all amino acids (including asnS and glnS) together with the GatCAB amidotransferase complex. They further include (iv) multiple copies of anaerobic detoxification proteins (rubrerythrin and variants); and (v) a proliferation of σ54-activator proteins and of histidine kinases and response regulators, either encoded separately or fused, but few (at most three) PHX σ70 factors. Other frequent protein classes include metallo-β-lactamase, GGDEF-domain proteins, TonB receptors, and secretion proteins.
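The Methods quantities can be made concrete with a minimal Python sketch. It covers the codon usage difference B(F|G) of Eq. 1, the ratios E_RP, E_CH, and E_TF, the expression measure E(g), and the PHX criterion (E(g) > 1 with at least two of the three ratios above 1.05). Several choices here are assumptions and are not taken from the paper. Group "average" frequencies are computed from pooled codon counts. Stop and ambiguous codons are skipped. Codons are assigned with the standard genetic code, whose sense-codon assignments match those of the bacterial translation table. The function names (`B`, `expression_levels`) and the toy sequences are hypothetical. This illustrates the formulas only and is not the authors' implementation.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AAS)}


def codon_counts(seqs):
    """Pool sense-codon counts over a group of in-frame coding sequences
    (assumption: group averages come from pooled counts)."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if CODE.get(codon, "*") != "*":  # skip stops and ambiguous codons
                counts[codon] += 1
    return counts


def family_freqs(counts):
    """Return codon frequencies normalized to sum to 1 within each amino acid
    family, and the amino acid frequencies p_a."""
    aa_totals = Counter()
    for codon, n in counts.items():
        aa_totals[CODE[codon]] += n
    total = sum(aa_totals.values())
    freqs = {c: n / aa_totals[CODE[c]] for c, n in counts.items()}
    p = {aa: n / total for aa, n in aa_totals.items()}
    return freqs, p


def B(F, G):
    """Codon usage difference B(F|G) of Eq. 1 (F, G: lists of sequences).
    Codons absent from a group are treated as frequency 0."""
    f, p_f = family_freqs(codon_counts(F))
    g, _ = family_freqs(codon_counts(G))
    total = 0.0
    for aa, p_a in p_f.items():
        family = [c for c, a in CODE.items() if a == aa]
        total += p_a * sum(abs(f.get(c, 0.0) - g.get(c, 0.0)) for c in family)
    return total


def _ratio(num, den):
    # Guard against a zero denominator (gene usage identical to the class).
    return float("inf") if den == 0 else num / den


def expression_levels(gene, C, RP, CH, TF):
    """E_RP, E_CH, E_TF, E(g) and the PHX call for one coding sequence.
    C should hold all genes of the genome, including RP, CH and TF."""
    b_c = B([gene], C)
    b_rp, b_ch, b_tf = B([gene], RP), B([gene], CH), B([gene], TF)
    levels = {
        "E_RP": _ratio(b_c, b_rp),
        "E_CH": _ratio(b_c, b_ch),
        "E_TF": _ratio(b_c, b_tf),
        "E": _ratio(b_c, (b_rp + b_ch + b_tf) / 3),
    }
    # PHX rule as stated in Methods: E(g) > 1 and at least two of the
    # three class-specific values exceed 1.05.
    above = sum(levels[k] > 1.05 for k in ("E_RP", "E_CH", "E_TF"))
    levels["PHX"] = levels["E"] > 1 and above >= 2
    return levels


if __name__ == "__main__":
    # Toy data (hypothetical): the reference classes prefer CTG/AAA,
    # while the rest of the "genome" prefers TTA/AAG.
    RP = CH = TF = ["CTGAAA" * 9 + "TTAAAG"]
    C = RP + CH + TF + ["TTAAAG" * 10] * 5
    print(expression_levels("CTGAAA" * 5, C, RP, CH, TF)["PHX"])  # True
    print(expression_levels("TTAAAG" * 5, C, RP, CH, TF)["PHX"])  # False
```

Because E(g) divides the distance from the genome average by the mean distance from the three reference classes, a gene scores high when its codon usage is far from the genome as a whole but close to the highly expressed standards.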

1. Woese, C. R., Kandler, O. & Wheeler, M. L. (1990) Proc. Natl. Acad. Sci. USA 87, 4576–4579.
2. Rendulic, S., Jagtap, P., Rosinus, A., Eppinger, M., Baar, C., Lanz, C., Keller, H., Lambert, C., Evans, K. J., Goesmann, A., et al. (2004) Science 303, 689–692.
3. Heidelberg, J. F., Seshadri, R., Haveman, S. A., Hemme, C. L., Paulsen, I. T., Kolonay, J. F., Eisen, J. A., Ward, N., Methe, B., Brinkac, L. M., et al. (2004) Nat. Biotechnol. 22, 554–559.
4. Methe, B. A., Nelson, K. E., Eisen, J. A., Paulsen, I. T., Nelson, W., Heidelberg, J. F., Wu, D., Wu, M., Ward, N., Beanan, M. J., et al. (2003) Science 302, 1967–1969.
5. Rabus, R., Ruepp, A., Frickey, T., Rattei, T., Fartmann, B., Stark, M., Bauer, M., Zibat, A., Lombardot, T., Becker, I., et al. (2004) Environ. Microbiol. 6, 887–902.
6. Jakobsen, J. S., Jelsbak, L., Welch, R. D., Cummings, C., Goldman, B., Stark, E., Slater, S. & Kaiser, D. (2004) J. Bacteriol. 186, 4361–4368.
7. Jelsbak, L., Givskov, M. & Kaiser, D. (2005) Proc. Natl. Acad. Sci. USA 102, 3010–3015.
8. Shimkets, L. J. (1999) Annu. Rev. Microbiol. 53, 525–549.
9. Tojo, N., Inouye, S. & Komano, T. (1993) J. Bacteriol. 175, 4545–4549.
10. Munoz-Dorado, J., Inouye, S. & Inouye, M. (1991) Cell 67, 995–1006.
11. Sockett, R. E. & Lambert, C. (2004) Nat. Rev. Microbiol. 2, 669–675.
12. Karlin, S. & Mrázek, J. (2000) J. Bacteriol. 182, 5238–5250.
13. Karlin, S., Mrázek, J., Campbell, A. & Kaiser, D. (2001) J. Bacteriol. 183, 5025–5040.
14. Karlin, S. & Mrázek, J. (2001) Proc. Natl. Acad. Sci. USA 98, 5240–5245.
15. Karlin, S., Barnett, M. J., Campbell, A. M., Fisher, R. F. & Mrázek, J. (2003) Proc. Natl. Acad. Sci. USA 100, 7313–7318.
16. Mrázek, J., Spormann, A. M. & Karlin, S. (2006) Environ. Microbiol. 8, 273–288.
17. Karlin, S., Brocchieri, L., Campbell, A., Cyert, M. & Mrázek, J. (2005) Proc. Natl. Acad. Sci. USA 102, 7309–7314.
18. Lumppio, H. L., Shenvi, N. V., Summers, A. O., Voordouw, G. & Kurtz, D. M., Jr. (2001) J. Bacteriol. 183, 101–108.
19. Weinberg, M. V., Jenney, F. E., Jr., Cui, X. & Adams, M. W. (2004) J. Bacteriol. 186, 7888–7895.
20. Buck, M., Gallegos, M. T., Studholme, D. J., Guo, Y. & Gralla, J. D. (2000) J. Bacteriol. 182, 4129–4136.
21. Kroos, L. (2005) Proc. Natl. Acad. Sci. USA 102, 2681–2682.
22. Spormann, A. M. (1999) Microbiol. Mol. Biol. Rev. 63, 621–641.
23. Ward, M. J., Lew, H. & Zusman, D. R. (2000) Mol. Microbiol. 37, 1357–1371.
24. Kaiser, D. & Yu, R. (2005) Curr. Opin. Microbiol. 8, 216–221.
25. Sengupta, J., Agrawal, R. K. & Frank, J. (2001) Proc. Natl. Acad. Sci. USA 98, 11991–11996.
26. Stathopoulos, C., Ahel, I., Ali, K., Ambrogelly, A., Becker, H., Bunjun, S., Feng, L., Herring, S., Jacquin-Becker, C., Kobayashi, H., et al. (2001) in Cold Spring Harbor Symposia on Quantitative Biology (Cold Spring Harbor Lab. Press, Woodbury, NY), Vol. LXVI, pp. 175–183.
27. Ibba, M. & Soll, D. (2000) Annu. Rev. Biochem. 69, 617–650.
28. Min, B., Pelaschier, J. T., Graham, D. E., Tumbula-Hansen, D. & Soll, D. (2002) Proc. Natl. Acad. Sci. USA 99, 2678–2683.
29. Helmann, J. D. (2002) Adv. Microb. Physiol. 46, 47–110.
30. Economou, A. (1999) Trends Microbiol. 7, 315–320.
31. Jilaveanu, L. B., Zito, C. R. & Oliver, D. (2005) Proc. Natl. Acad. Sci. USA 102, 7511–7516.
32. Ullers, R. S., Luirink, J., Harms, N., Schwager, F., Georgopoulos, C. & Genevaux, P. (2004) Proc. Natl. Acad. Sci. USA 101, 7583–7588.
33. Simm, R., Morr, M., Kader, A., Nimtz, M. & Romling, U. (2004) Mol. Microbiol. 53, 1123–1134.
34. Frere, J. M. (1995) Mol. Microbiol. 16, 385–395.

MICROBIOLOGY

Karlin et al. PNAS | July 25, 2006 | vol. 103 | no. 30 | 11357