
Genomic and proteomic comparisons between bacterial and archaeal genomes and related comparisons with the yeast and fly genomes

Samuel Karlin*†, Luciano Brocchieri*, Allan Campbell‡, Martha Cyert‡, and Jan Mrázek*

Departments of *Mathematics and ‡Biological Sciences, Stanford University, Stanford, CA 94305-2125

Contributed by Samuel Karlin, March 22, 2005

Bacterial, archaeal, yeast, and fly genomes are compared with respect to predicted highly expressed (PHX) genes and several genomic properties. There is a striking difference in the status of PHX ribosomal protein (RP) genes where the archaeal genome generally encodes more RP genes and fewer PHX RPs compared with bacterial genomes. The increase in RPs in archaea and eukaryotes compared with that in bacteria may reflect a more complex set of interactions in archaea and eukaryotes in regulating translation, e.g., differences in structure requiring scaffolding of longer rRNA molecules, expanded interactions with the chaperone machinery, and, in eukaryotes, interactions with endoplasmic reticulum components. The yeast genome is similar to fast-growing bacteria in PHX genes but also features several cytoskeletal genes, including actin and tropomyosin, and several signal transduction regulatory proteins from the 14-3-3 family. The most PHX genes of Drosophila encode cytoskeletal and exoskeletal proteins. We found that the preference of a microorganism for an anaerobic metabolism correlates with the number of PHX enzymes of the glycolysis pathway that well exceeds the number of PHX enzymes acting in the tricarboxylic acid cycle. Conversely, if the number of PHX enzymes of the tricarboxylic acid cycle well exceeds the PHX enzymes of glycolysis, an aerobic metabolism is preferred. Where the numbers are approximately commensurate, a facultative growth behavior prevails.

Archaea | Bacteria | predicted highly expressed | genomic comparisons | Drosophila

The preceding paper (1) focused on the identity and analysis of predicted highly expressed (PHX) genes in archaeal genomes. In this context, a variety of chaperone proteins stand out, especially thermosome and prefoldin. This paper proffers a series of genomic and proteomic comparisons between Archaea and Bacteria plus corresponding results on PHX genes of the eukaryotes Saccharomyces cerevisiae and Drosophila melanogaster.

Some Genomic Comparisons Between Archaea and Bacteria

Ribosomal Protein (RP) Gene Organization. Many RP genes differ between most archaeal and bacterial genomes [see the clusters of orthologous groups of proteins (COG) database of the National Center for Biotechnology Information; see also ref. 2]. Most bacterial genomes possess a unique origin of replication and feature a large cluster (putative operon) nearby encompassing 15–40% of all RP genes. Generally, archaeal RP genes are confined to small clusters. Many genes involved in protein synthesis, including tuf, fus, rpoA, rpoB, rpoC, and some chaperones are encoded within or proximal to the large RP cluster in bacteria but not in archaea. In contrast, the RP genes of yeast (and of higher eukaryotes) are generally randomly dispersed throughout the genome. There are many introns in crenarchaeal protein-coding genes and tRNA genes but few or none among euryarchaeal genes.

Extended 30-bp Repeats. All archaeal genomes, except Halobacterium sp., contain one or more clusters of 24–30 bp repeat elements, usually in excess of 50 copies, which are individually separated by 40–60 bp of unrelated sequences (3). Two types of 30-bp repeats occur for each of the crenarchaeal genomes and only one set for each euryarchaeal genome. A similar repeat arrangement is present in many thermophilic bacteria (table 4 of ref. 1). It is also observed in the genome of Bacillus halodurans, which is characterized as an extreme alkaliphilic bacterium living optimally in an environment of pH ≥9.5. The function of these repeats is unknown.

Representations of Short Palindromes. Archaea and bacteria tend to show underrepresentations of 4- and 6-bp palindromes (4), but eukaryotes do not, consistent with avoidance of restriction systems in prokaryotic genomes.

Unique Versus Multiple Origins of Replication. The GC skew [strand biases in (G − C)/(G + C) counts] (5–7) shows a clear difference between archaea and bacteria, apparently related to the existence of unique vs. multiple origins of replication. Archaeal genomes often have multiple origins of replication. For example, three active oriC have been identified experimentally in Sulfolobus solfataricus (cf. refs. 8–10). The Halobacterium genome carries at least two origins (11). It is argued that the Pyrococcus abyssi genome possesses a single origin (12, 13). The methanogens generally do not show any GC skew, and, on this basis, it is surmised that they possess multiple origins of replication.

Some Proteomic Comparisons Between Archaea and Bacteria

Protein and Amino Acid Synthesis and Replication Factors. The energy system of most archaea is autotrophic (14). Probably on this basis, several archaea synthesize the full collection of amino acids, including selenocysteine, and synthesize as well a wide assortment of cofactors, which is consistent with the tendency of Archaea to inhabit homogeneous extreme environments and concomitantly engage few PHX transport proteins. Translation elongation factors (e.g., EF-1α and EF-2) occur as single genes in archaeal genomes (Table 1) but generally appear in multiple PHX copies in α-, β-, and γ-proteobacterial genomes. The ribosome release factor Rrf is found PHX in most bacteria and in yeast but is missing from archaea. The helicase protein RecG, which helps facilitate branch migration of the Holliday junction, is widespread in bacteria but not found in archaea (15). Presumably, archaea have other proteins to do these essential functions.

Membrane Lipids. Membranes from all three domains of life contain polyisoprenes, but eukaryotes use significant amounts of sterols not abundant in either bacteria or archaea. Membranes of Gram-negative bacteria and eukaryotes are replete with phospholipids and lipid-modified proteins, whereas archaea generally emphasize prenylated ether lipids but make little or no fatty acids (16).

Abbreviations: TCA, tricarboxylic acid; RP, ribosomal protein; PHX, predicted highly expressed.
†To whom correspondence should be addressed. E-mail: [email protected].
© 2005 by The National Academy of Sciences of the USA
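The GC skew statistic (G − C)/(G + C) named under "Unique Versus Multiple Origins of Replication" can be illustrated with a short Python sketch that evaluates the skew in consecutive windows along one strand. The function name `gc_skew_windows`, the window size, and the toy sequence are assumptions chosen for illustration; they are not the procedure of refs. 5–7.

```python
def gc_skew_windows(seq, window=1000, step=None):
    """Return (G - C) / (G + C) for consecutive windows along one strand."""
    step = step or window
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # The skew is undefined when a window has no G or C;
        # this sketch reports 0.0 in that case (an arbitrary convention).
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews


# Toy sequence (hypothetical): G-rich first half, C-rich second half.
toy = "GGGCA" * 4 + "CCCGA" * 4
print(gc_skew_windows(toy, window=20))
```

In genomes with a single origin, a consistent sign change in the windowed skew is commonly read as marking the origin or terminus; a profile without a consistent sign change is what the text describes for the methanogens.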

www.pnas.org/cgi/doi/10.1073/pnas.0502314102 PNAS | May 17, 2005 | vol. 102 | no. 20 | 7309–7314

Table 1. Various comparisons of genomic and proteomic properties

Property | Bacteria | Archaea
Shine–Dalgarno motifs in translation initiation | +, Generally | Variable (see table 4 of ref. 1); many leaderless transcripts in Crenarchaeal genomes
GC skew | +, Generally; −, Deinococcus radiodurans, Synechocystis, T. maritima, and A. aeolicus | −
Periodic 24- to 32-bp repeat array | + B. halodurans; +, most thermophilic bacteria, see table 4 of ref. 1 | +, Except Halobacterium sp. NRC-1, Methanopyrus kandleri, and M. maripaludis; two sets of repeats in Crenarchaeal genomes
Multiple chromosomes | Mostly − (except Vibrio cholerae, D. radiodurans, Sinorhizobium meliloti, A. tumefaciens, and Rhodobacter sphaeroides) | −
Linear vs. circular chromosomes | Mostly circular; linear (B. burgdorferi; A. tumefaciens; R. sphaeroides; Streptomyces) | Circular
HSP60 (double ring structure) | GroEL, 7 units in each ring | Thermosome, 8 or 9 units in each ring
HSP70 | + | Mostly missing, generally present in those with opt. growth temp. ≤65°C (except M. maripaludis)
Prefoldin complex | − | +, β-subunit; α-subunit absent from Crenarchaea
Trigger factor | + | −
PPI peptidyl-prolyl cis–trans isomerase | +, Usually many | +
RP S1 (>500 aa in Gram-negative bacteria) | +, ≈400 aa in low G + C Gram-positive; S1 ≈350 aa in cyanobacterial genomes | −
Cluster of RP genes | +, Usually one large cluster including 20–40% of all RPs | Mixed, mostly short
RPs P0, P1, P2 (acidic, regulatory) in eukaryotes | − | Only P0
Existence of introns in genes and tRNA | − | Generally, + Crenarchaea, − Euryarchaea
No. of origins of replication | Generally one (possible exceptions: D. radiodurans, Synechocystis, T. maritima, and A. aeolicus) | Mostly unknown; at least three in S. solfataricus, two in Halobacterium sp., and one in Pyrococcus
PCNA replication factor; sliding clamp unit | − | +
Cdc48 cell division protein | − | +
FtsZ | + | Variable; not found in Crenarchaea
Lipopolysaccharide surface antigen (Lps family) | Variable, not present in Gram-positive | −
Elongation factors G, Tu | Multiple copies in proteobacteria | One copy of each
Protein median lengths among proteins ≥80 aa | Range 260–295 aa; except Neisseria meningitidis, 239 aa | 230–250 aa; in Euryarchaea

The notes on Bacteria are based on more than 50 diverse genomes. The notes on Archaea are based on 19 genomes. opt., Optimal; temp., temperature.

Lipopolysaccharide biosynthesis genes of anomalous codon usage encode a hierarchy of surface antigens (the Lps family) that often occur in clusters. Lps biosynthesis genes are present in many bacterial and in most archaeal genomes but are not found in Gram-positive bacteria and apparently are not present in eukaryotes. The Gram-positive bacteria have lipoteichoic acid attached to their peptidoglycan, and these are weakly comparable to Lps of Gram-negative bacteria. The Lps complex also plays a role in cell adhesions. The lipid-A anchor (connecting the sugar and lipid moieties) prominent in Escherichia coli and Salmonella typhimurium appears to be missing from Gram-positive and archaeal genomes. The enzymatic apparatus for lipid synthesis is much reduced in most archaeal genomes. For example, FabB, FabD, and AcpP are not found in Archaea. According to the COG database, of 78 gene families involved in lipid metabolism extant in Bacteria, only 41 are also in Archaea, and none is unique to Archaea.

Nitrogen fixation (nif) genes are present in several bacterial and archaeal genomes but not in eukaryotes. nif genes in archaea are evolutionarily related to nif genes in bacteria and operate by the same fundamental mechanism (17). It is proposed that some genes of this kind wander about by means of lateral gene transfer (e.g., as occurs in Klebsiella). The predominant nitrogenases in methanogens appear to be molybdenum nitrogenases as is the case in bacteria. The methanogens vary with respect to nitrogen fixation. For example, neither Methanococcus jannaschii nor Methanococcus volcanii fix nitrogen, whereas Methanosarcina barkeri and Methanococcus thermolithotropicus do (17).

Glycolysis. Hexokinase and glucokinase are prominent glycolysis enzymes in eukaryotes, but the former is not found in most bacterial or in any archaeal genomes to date. Hexokinase converts glucose to glucose-6-phosphate. However, glucose-6-phosphate arises from other hexoses and from glucose transported into the cell by means of the phosphotransferase system (PTS). Bacteria that rely on carbohydrates as a primary energy source use the PTS system to transport glucose into the cytoplasm and concomitantly phosphorylate glucose, making hexokinase/glucokinase expendable. PTS genes are apparently absent from all archaea.

Generally, glycolysis genes in archaea are either not PHX or almost entirely missing. For example, glucose-phosphate isomerase is missing from the archaeon Archaeoglobus fulgidus and from Pyrococcus species, as well as from bacteria in the Mycoplasma group. Phosphofructokinase is missing from Archaea and several proteobacteria. There are no archaeal genomes with more than three (mostly one) glycolytic genes PHX.

Tricarboxylic Acid (TCA) Cycle Genes. In aerobic environments, the TCA cycle, apart from production of energy, can contribute in myriad ways to cellular needs, especially in making precursors and intermediates to macromolecules, e.g., to amino acids, vitamins, and . All TCA cycle genes are present in archaeal genomes except in the methanogens and Pyrococcus species.

RP Gene Numbers and PHX Accounts Comparing Archaeal and Bacterial Genomes

The RP gene class, the TF (protein synthesis) gene ensemble and the major CH (chaperone/degradation) gene group are used as standards of codon usage biases for ascertaining PHX genes of bacterial genomes (1). In bacteria, usually (but not always) the RP

Table 2. Counts of PHX RP genes in archaeal, yeast, and bacterial genomes

Genome | All RP | RP ≥80 aa | PHX RP | Difference
Archaea
SULSO | 65 | 54 | 35 | 19
SULTO | 67 | 55 | 34 | 21
AERPE | 58 | 49 | 34 | 15
PYRAE | 54 | 52 | 14 | 38
PYRAB | 62 | 51 | 34 | 17
PYRFU | 63 | 51 | 37 | 14
PYRHO | 61 | 50 | 28 | 22
THEAC | 54 | 46 | 17 | 29
THEVO | 57 | 47 | 13 | 34
PICTO | 55 | 47 | 31 | 16
ARCFU | 59 | 48 | 28 | 20
METKA | 62 | 52 | 37 | 15
METTH | 61 | 49 | 30 | 19
METJA | 62 | 50 | 46 | 4
METMP | 60 | 49 | 49 | 0
METAC | 59 | 48 | 36 | 12
METMA | 58 | 48 | 42 | 6
HBNRC | 56 | 45 | 19 | 26
NANEQ | 59 | 52 | 13 | 39
Average | 59.6 | 49.6 | 30.4 | 19.3
SD | 3.55 | 2.65 | 10.8 | 10.6
90% range | 54–65 | 46–54 | 13–46 | 4–38
Yeast
S. cerevisiae* | 138 | 127 | 126 | 1
Bacteria
ec157 | 55 | 44 | 44 | 0
yerpe | 55 | 43 | 42 | 1
pseae | 54 | 43 | 42 | 1
haein | 54 | 43 | 43 | 0
pasmu | 54 | 43 | 41 | 2
xanca | 54 | 43 | 40 | 3
sheon | 54 | 43 | 43 | 0
vibch | 54 | 44 | 44 | 0
borbr | 55 | 46 | 40 | 6
borpa | 54 | 45 | 40 | 5
borpe | 55 | 45 | 41 | 4
ralso | 53 | 44 | 41 | 3
neime | 55 | 44 | 42 | 2
caucr | 53 | 44 | 40 | 4
sinme | 55 | 45 | 42 | 3
meslo | 51 | 44 | 41 | 3
agrtu | 43 | 39 | 36 | 3
brume | 55 | 44 | 40 | 4
helpy | 52 | 40 | 12 | 28
camje | 52 | 40 | 28 | 12
bacsu | 52 | 41 | 40 | 1
staau | 56 | 44 | 41 | 3
lismo | 53 | 41 | 39 | 2
lacla | 56 | 43 | 41 | 2
strpn | 56 | 42 | 39 | 3
cloac | 54 | 45 | 38 | 7
strco | 62 | 45 | 42 | 3
myctu | 55 | 46 | 29 | 17
mycle | 52 | 45 | 31 | 14
synsq | 53 | 44 | 36 | 8
nossp | 56 | 44 | 29 | 15
deira | 52 | 45 | 42 | 3
aquae | 53 | 44 | 24 | 20
thema | 50 | 40 | 21 | 19
Average | 53.7 | 43.4 | 37.5 | 5.9
SD | 2.79 | 1.78 | 7.3 | 6.8
90% range | 51–56 | 40–45 | 24–43 | 0–19

The species abbreviations for Archaea are defined in the legend for Table 3. ec157, E. coli O157:H7 EDL933; yerpe, Yersinia pestis KIM; pseae, Pseudomonas aeruginosa; haein, Haemophilus influenzae; pasmu, Pasteurella multocida; xanca, Xanthomonas campestris; sheon, Shewanella oneidensis; vibch, V. cholerae; borbr, Bordetella bronchiseptica; borpa, Bordetella parapertussis; borpe, Bordetella pertussis; ralso, Ralstonia solanacearum; neime, N. meningitidis MC58; caucr, Caulobacter crescentus; sinme, Sinorhizobium meliloti; meslo, Mesorhizobium loti; agrtu, Agrobacterium tumefaciens; brume, Brucella melitensis; helpy, Helicobacter pylori 26695; camje, Campylobacter jejuni; bacsu, Bacillus subtilis; staau, Staphylococcus aureus Mu50; lismo, Listeria monocytogenes EGD-e; lacla, Lactococcus lactis; strpn, Streptococcus pneumoniae TIGR4; cloac, Clostridium acetobutylicum; strco, Streptomyces coelicolor; myctu, M. tuberculosis H37Rv; mycle, Mycobacterium leprae; synsq, Synechocystis sp. PCC6803; nossp, Nostoc sp. PCC 7120; deira, D. radiodurans; aquae, A. aeolicus; thema, Thermotoga maritima.
*Mitochondrial RPs are excluded.
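The summary rows of Table 2 can be checked directly from the per-genome counts. The Python sketch below recomputes the archaeal averages, the standard deviation of the RP ≥80 aa column, and the overall PHX fraction. The dictionary layout and the function name `summarize` are illustrative assumptions, and the use of the sample (n − 1) standard deviation is an assumption adopted because it reproduces the tabulated value.

```python
from statistics import mean, stdev

# (RP genes >= 80 aa, PHX RP genes) for the 19 archaeal genomes of Table 2.
ARCHAEA = {
    "SULSO": (54, 35), "SULTO": (55, 34), "AERPE": (49, 34), "PYRAE": (52, 14),
    "PYRAB": (51, 34), "PYRFU": (51, 37), "PYRHO": (50, 28), "THEAC": (46, 17),
    "THEVO": (47, 13), "PICTO": (47, 31), "ARCFU": (48, 28), "METKA": (52, 37),
    "METTH": (49, 30), "METJA": (50, 46), "METMP": (49, 49), "METAC": (48, 36),
    "METMA": (48, 42), "HBNRC": (45, 19), "NANEQ": (52, 13),
}


def summarize(rows):
    """Recompute Table 2 style summary statistics from per-genome counts."""
    rp80 = [r for r, _ in rows.values()]
    phx = [p for _, p in rows.values()]
    diff = [r - p for r, p in rows.values()]
    return {
        "mean_rp80": round(mean(rp80), 1),
        "sd_rp80": round(stdev(rp80), 2),   # sample SD (n - 1), assumed
        "mean_phx": round(mean(phx), 1),
        "mean_diff": round(mean(diff), 1),
        "phx_fraction": round(sum(phx) / sum(rp80), 2),
    }


summary = summarize(ARCHAEA)
print(summary)
```

The pooled PHX fraction of about 0.61 is in line with the statement in the text that roughly 60% of archaeal RP genes of at least 80 aa are PHX.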

genes comprise the most conspicuous PHX class of genes. However, RP genes of archaea do not comprise the most conspicuous PHX class of genes. Table 2 reports the count of RP genes of at least an 80-codon length across 34 diverse bacterial genomes, and Table 2 reports the count of RP genes of at least 80 codons in 19 archaeal genomes. (The requirement that a RP gene have a length of ≥80 codons allows the determination of the statistical validity or invalidity of the PHX property.) The number of these RP genes in archaeal genomes (Table 2) ranges from 45 to 55 (average, 49.6; standard deviation, 2.65), whereas the range in bacterial genomes (Table 2) is 39–46 (average, 43.4; standard deviation, 1.78). Actually, there are more RP genes in almost all archaeal genomes compared with almost all bacterial genomes. The contrast in numbers of RPs between archaeal and bacterial genomes is striking. The eukaryotic (animals, fungi, and plants) ribosomal structure is composed of 79 or 80 RPs of all sizes. It is remarkable that 126 of 127 RP genes (these include many duplicates) in the yeast genome are PHX (Table 2). All but a few of the RPs in each proteobacterial genome and in low G + C Gram-positive bacteria are PHX [manifest exceptions are the proteobacteria Helicobacter pylori and Campylobacter jejuni (Table 2)]. Apart from the methanogens (Methanococcus maripaludis stands out, with all RPs of at least 80 aa PHX) in Archaea, generally only ≈60% of all RP genes ≥80 aa are PHX. Explicitly, among the bacterial genomes, an average of 37.5 RPs are PHX compared with an average of 30.4 PHX in archaeal genomes, indicating that many RP genes of archaea have reduced predicted expression levels akin to an average gene (see Table 2). The archaeal ribosome structure appears to be a small-scale model of the eukaryotic ribosome (2).

Lecompte et al. (2) proffer comparative analysis on the nature of RP summarizing distributions of 45 bacterial, 14 archaeal, and 7 eukaryotic genomes. Lecompte et al. (2) report a total of 78 RPs in Eukarya, 68 in Archaea, and 57 in Bacteria, and they postulate reductive evolution, i.e., the loss of RPs in the archaeal and bacterial lineages. Lecompte et al. (2) further observe that there are 34 RP genes shared by bacteria, archaea, and eukaryotes. There are also 33 RP genes shared by archaeal and eukaryotic genomes not found in any bacteria, zero RP genes common only to bacteria and archaea, and zero RP genes common to bacteria and eukaryotes, consistent with the coevolution of archaea and eukaryotes. Eukaryotes are considered to have coevolved with an archaeal lineage, which might anticipate that the larger ribosome of eukaryotes relative to bacterial ribosomes implies an expanded ribosome structure in archaea versus bacteria. In summary, there are two primary anomalies in the numbers of RPs comprising the ribosome and in the PHX status of RPs in archaeal versus bacterial genomes. First, virtually all ribosomes of an archaeal genome are composed of more RPs than virtually all bacterial ribosomes (with very few exceptions; see Table 2). Second, the percent PHX of RPs among archaeal genomes is significantly less than that of bacterial genomes.

What can account for these differences? (i) The RNA of the ribosome in eukaryotes is larger and needs an expanded context to organize, stabilize, and facilitate the folding of the underlying rRNA. Also, an expanded ribosome cover may better protect the underlying RNA from ribonuclease cleavage. (ii) Many eukaryote ribosome units interact with the ER membrane (18). This interaction may require an increased ribosomal complement and correspondingly more RP genes. (iii) It is recognized in yeast that various chaperones, possibly including nascent polypeptide-associated complex, interact with ribosomes in processing and in protecting nascent polypeptides exiting the ribosome (19). (iv) Archaea have representatives of bacterial and eukaryotic RPs, which may account for more RPs in archaeal as against bacterial genomes. (v) Eukaryotic proteins are longer, on average (generally by at least 100 residues), than bacterial proteins (20–22), which may require the size of the ribosomes to be larger. (vi) Are the RP lengths generally longer in eukaryotes compared with bacteria? Actually, homologous RPs of eukaryotes tend to be longer than in bacteria (2). However, the longest RP overall is S1 in Gram-negative bacteria. (vii) An alternative explanation might interpret the diminished numbers and sizes of RPs of the ribosome structure in bacteria as a consequence of streamlining the bacterial genome, which putatively increases efficiency.

Are these comparisons of ribosome structural sizes and configurations correlated with slow growth for archaea? Examination of doubling times in Table 3 reveals that Pyrococcus, M. jannaschii, and M. maripaludis organisms are fast-growing, whereas the other archaea appear constrained to slow growth. In these terms, there appears to be no clear correlation between ribosome size and genome doubling time.

Table 3. Archaeal genomes

Name | Genome size, kb | No. of 16S rRNA | Minimum doubling time*
Crenarchaea
SULSO | 2,992 | 1 | 8
SULTO | 2,695 | 1 | 8
AERPE | 1,670 | 1 | 2.3
PYRAE | 2,222 | 1 | 6
Euryarchaea
PYRAB | 1,765 | 1 | 40 min
PYRFU | 1,908 | 1 | 59 min
PYRHO | 1,739 | 1 | 55 min
THEAC | 1,565 | 1 | 15
THEVO | 1,585 | 1 | 14
PICTO | 1,546 | 1 | 6
ARCFU | 2,178 | 1 | 4
METKA | 1,695 | 1 | 5
METTH | 1,751 | 2 | 8
METJA | 1,665 | 2 | 56 min
METMP | 1,661 | 3 | 2
METAC | 5,751 | 3 | 5.2
METMA | 4,096 | 3 | 7
HALSP | 2,014 | 1 | 12
Nanoarchaea
NANEQ | 491 | 1 | —

Notice that most archaea subtend genomes of moderate size, ranging from ≈1.5 to 3.00 megabases. The methanogens are of variable size with the two mesophilic methanosarcina species especially relatively large at >4- and 5.7-megabase genome lengths. Crenarchaea tend to live optimally in a hyperthermophilic environment. SULSO, S. solfataricus; SULTO, Sulfolobus tokodaii; AERPE, A. pernix; PYRAE, Pyrobaculum aerophilum; PYRAB, Pyrococcus abyssi; PYRFU, Pyrococcus furiosus; PYRHO, Pyrococcus horikoshii; THEAC, Thermoplasma acidophilum; THEVO, Thermoplasma volcanium; PICTO, Picrophilus torridus; ARCFU, Archaeoglobus fulgidus; METKA, M. kandleri; METTH, Methanobacter thermoautotrophicus; METJA, M. jannaschii; METMP, M. maripaludis; METAC, Methanosarcina acetivorans; METMA, Methanosarcina mazei; HALSP, Halobacterium NRC-1; NANEQ, Nanoarchaeum equitans. —, unknown.
*Values indicate the number of hours unless otherwise noted.

A "giant" RP gene (designated S1) commonly exceeding 500 aa in length in Gram-negative bacteria is essential (with the exception of Mycoplasma). S1 is overall acidic and binds weakly and reversibly to the small subunit of the ribosome, whereas most other RPs bind strongly (23). S1 has a high affinity for mRNA chains, is necessary in many cases for translation initiation, is directly involved in mRNA recognition, and can facilitate binding of mRNA that lacks a strong Shine–Dalgarno motif. S1 is not encoded near any RP operon. The S1 proteins of low G + C Gram-positive bacteria (Firmicutes) are generally of reduced size [in the range of 360–410 aa (24)]. RPs are cationic (generally >20% cationic residues). The three acidic RPs found in eukaryotes, P0, P1, and P2, are known to play an important regulatory role in the initiation step of eukaryotic mRNA translation. Of these, only P0 is present in archaea. The S2 RP gene in bacterial genomes is separated from other RPs, whereas S2 in many archaeal genomes is often incorporated in short RP clusters.

In many bacterial genomes (e.g., Synechocystis and Mycobacterium tuberculosis), several major chaperone genes are proximal to the principal RP operon. For example, the major RP cluster in Synechocystis has GroEL-1 nearby. It is tempting to speculate that these chaperones contribute to ribosome formation. The deeply branching Gram-negative Aquifex aeolicus encodes a giant S1. Thermotoga maritima, allowing for a frameshift, also encodes an S1 homolog. All of the thermophilic bacteria displayed in table 4 of ref. 1 contain an S1. Unlike the giant bacterial S1, Saccharomyces cerevisiae RP genes are all <350 aa in length (generally between 50 and 350 aa) and are randomly dispersed over the 16 yeast chromosomes.

Table 1 highlights various comparisons of genomic and proteomic characteristics within and between bacterial and archaeal genomes. For example, (i) prokaryotic genomes often (but not always) employ Shine–Dalgarno motifs for translation initiation of mRNAs, whereas eukaryotes almost never use such controls. (ii) Several bacterial genomes consist of multiple chromosomes, and several bacterial genomes feature linear chromosomes, but, to date, every archaeal genome has a single circular chromosome. Of course, because few archaeal genomes have been sequenced, the lack of multiple chromosomes may be a sampling artifact. (iii) Chaperone proteins in archaea are like those in eukaryotes and feature prefoldin and thermosomes (TRiC/CCT), with far fewer HSP70 gene representatives in archaea (1), whereas all bacterial genomes encode one or more HSP70 genes, usually PHX. To date, no human (or vertebrate) diseases have been associated with archaeal species (25–27). Is this lack of pathogenic archaea due to the environments of humans and archaea being fundamentally different? Martin (26) discusses the fundamental differences between vertebrate and archaeal biochemical workings and emphasizes the novel "cofactors" produced, especially in methanogens.

The PHX Genes of Yeast

The yeast genome somewhat parallels fast-growing bacteria in PHX genes augmented by actin, cofilin, profilin, tropomyosin and related genes. Most eukaryotic cells are spatially organized by the cytoskeleton containing three principal types of filaments: actin, microtubules, and intermediate filaments. Unlike mammalian cells, yeast cells do not contain intermediate filaments. Actin filaments occur in the cytoplasm and participate in various cellular activities, including cell budding, mating, defining cell polarity, transport, and maintenance of organelles (29). Actin is PHX at a very high expression level [E(g) = 2.53]. Other PHX genes associated with actin [with E(g) values in parentheses] are cofilin (1.66), tropomyosin-1 (1.19), profilin (1.57), NADPH dehydrogenase (1.66), and myosin light chain (1.03). Cofilin reversibly controls actin polymerization and depolymerization in a pH-sensitive manner. Tropomyosin binds to and stabilizes actin filaments in the cell. Profilin plays a role in actin organization and maintenance of polarity. In contrast

Table 4. Counts of PHX genes of glycolysis and TCA cycle pathways in different genomes

Genome | Glycolysis | TCA cycle | Predicted preferred lifestyle
Archaea
SULSO | 2 | 5 | Aerobic
SULTO | 1 | 3 | Possibly aerobic
AERPE | 1 | 5 | Aerobic
PYRAE | 0 | 0 | Unknown*
PYRAB | 1 | 0 | Unknown*
PYRFU | 1 | 0 | Unknown*
PYRHO | 0 | 0 | Unknown*
THEAC | 1 | 3 | Possibly aerobic
THEVO | 1 | 4 | Possibly aerobic
PICTO | 0 | 6 | Aerobic
ARCFU | 0 | 0 | Unknown*
METKA | 3 | 0 | Unknown*
METTH | 0 | 1 | Unknown*
METJA | 0 | 0 | Unknown*
METMP | 1 | 0 | Unknown*
METAC | 3 | 2 | Unknown*
METMA | 1 | 2 | Unknown*
HBNRC | 2 | 2 | Unknown*
NANEQ | 0 | 0 | Unknown*
Eukaryotes
S. cerevisiae | 15 | 3 | Anaerobic
D. melanogaster | 10 | 12 | Facultative
Low G+C Gram+
B. subtilis | 7 | 3 | Facultative/anaerobic
B. halodurans | 4 | 8 | Facultative/aerobic
L. monocytogenes | 9 | 1 | Anaerobic
L. lactis | 10 | 1 | Anaerobic
S. pneumoniae | 11 | 0 | Anaerobic
S. aureus | 8 | 1 | Anaerobic
C. perfringens | 10 | 0 | Anaerobic
γ-Proteobacteria
S. oneidensis | 4 | 8 | Facultative/aerobic
V. cholerae | 8 | 3 | Facultative/anaerobic
E. coli | 10 | 10 | Facultative
Y. pestis | 8 | 3 | Facultative/anaerobic
H. influenzae | 8 | 1 | Anaerobic
P. multocida | 7 | 0 | Anaerobic
P. aeruginosa | 1 | 9 | Aerobic
α-Proteobacteria
C. crescentus | 3 | 11 | Aerobic
S. meliloti | 1 | 6 | Aerobic
M. loti | 6 | 12 | Facultative/aerobic
A. tumefaciens | 3 | 10 | Aerobic
B. melitensis | 5 | 12 | Facultative/aerobic
Rickettsia prowazekii | 0 | 0 | Parasitic
Other bacteria
aquae | 1 | 5 | Aerobic
thema | 2 | 1 | Unknown†
deira | 3 | 12 | Aerobic
myctu | 4 | 9 | Facultative/aerobic
mycle | 2 | 2 | Facultative/Parasitic
Synechocystis | 6 | 1 | Anaerobic†
neime | 0 | 7 | Aerobic
camje | 0 | 0 | Parasitic
Chlamydia pneumoniae | 1 | 0 | Parasitic
Treponema pallidum | 1 | 0 | Parasitic
Borrelia burgdorferi | 1 | 0 | Parasitic
 | 0 | 0 | Parasitic

Species abbreviations are given in the legends of Tables 3 (Archaea) and 2 (Bacteria). Enzymes acting in glycolysis and TCA cycle include the following. Glycolysis (gluconeogenesis): hexokinase or glucokinase; glucose-6-phosphate isomerase; 6-phosphofructokinase (replaced by fructose bisphosphatase in gluconeogenesis); fructose-1,6-bisphosphate aldolase; triosephosphate isomerase; glyceraldehyde-3-phosphate dehydrogenase; 3-phosphoglycerate kinase; phosphoglycerate mutase; enolase; pyruvate kinase. TCA cycle: citrate synthase; aconitase; isocitrate dehydrogenase; 2-oxoglutarate dehydrogenase E1 component; 2-oxoglutarate dehydrogenase complex, dihydrolipoamide succinyltransferase (E2) component; 2-oxoglutarate dehydrogenase complex, dihydrolipoamide dehydrogenase (E3) component; succinyl-CoA synthetase α-subunit; succinyl-CoA synthetase β-subunit; succinate dehydrogenase subunit A; succinate dehydrogenase subunit B; succinate dehydrogenase subunit C; succinate dehydrogenase hydrophobic anchor subunit; fumarase; malate dehydrogenase.
*Very low numbers of glycolytic and TCA cycle PHX genes often occur in parasitic or symbiotic bacteria. However, in archaea and T. maritima the lack of TCA cycle and glycolytic PHX genes may indicate significance of other energy metabolism pathways.
†Synechocystis satisfies its energy needs mainly by photosynthesis, and predicting aerobicity from numbers of PHX glycolysis and TCA cycle genes may be misleading. Moreover, the glycolytic enzymes of Synechocystis also act in carbon fixation, and its PHX levels may not necessarily indicate the importance of the glycolysis pathway.
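The lifestyle column of Table 4 rests on the ratio criteria stated in the section "Aerobic Versus Anaerobic Preferences": a glycolysis-to-TCA PHX ratio of at least 2.5 suggests anaerobic metabolism, a TCA-to-glycolysis ratio of at least 3 suggests aerobic metabolism, commensurate counts suggest facultative growth, and low counts in both sets are read as parasitic, symbiotic, or unknown. A minimal Python sketch of that rule follows. The function name `predict_lifestyle` and the `low_total` cutoff are hypothetical, because the source does not say how low "both low" is, and the sketch does not attempt the finer labels of the table such as "Facultative/aerobic" or "Possibly aerobic".

```python
def predict_lifestyle(glycolysis_phx, tca_phx, low_total=3):
    """Apply the ratio criteria from the text to counts of PHX genes.

    low_total is a hypothetical cutoff (not given in the source): when the
    two counts sum to this value or less, no metabolic call is made.
    """
    if glycolysis_phx + tca_phx <= low_total:
        return "parasitic, symbiotic, or unknown"
    # Compare by multiplication so that zero counts never cause a division.
    if glycolysis_phx >= 2.5 * tca_phx:
        return "anaerobic"
    if tca_phx >= 3 * glycolysis_phx:
        return "aerobic"
    return "facultative"


# Counts taken from Table 4: (glycolysis PHX, TCA cycle PHX).
for name, counts in {
    "S. cerevisiae": (15, 3),
    "P. aeruginosa": (1, 9),
    "E. coli": (10, 10),
    "PYRHO": (0, 0),
}.items():
    print(name, predict_lifestyle(*counts))
```

With the default cutoff these four examples agree with the table, but borderline genomes and most archaeal rows depend on the assumed cutoff and on judgment that the sketch does not encode.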

to actin filaments, microtubule and intermediate filament proteins are not PHX. NADPH is multifunctional, with its primary activity in energy metabolism. Microtubules are found in the cytoplasm and the nucleus, and they are cell cycle-regulated and play a major role in chromosome segregation and migration of the nucleus into the bud during mitosis. Microtubules are organized around a spindle pole body embedded in the nuclear envelope. Like microtubules, the spindle pole body components are not PHX. In contrast to mammalian cells, yeast contain very few microtubules, and they do not contribute significantly to cell shape or to transport of secretory vesicles as they do in mammalian cells. In yeast, microtubules are limited to a role in chromosomal segregation and nuclear migration. The difference in expression levels between actin filaments and microtubules is not surprising in view of the more specialized function of microtubules compared to actin filaments.

Most prokaryotic genomes and especially fast-growing bacteria possess PHX DNA-directed RNA polymerase subunits, especially RpoB and RpoC. In contrast, none of the RNA polymerase II subunits in yeast is PHX. The same applies to RNA polymerases I and III. Dividing the task among the three polymerases putatively reduces the need for high expression levels of each individual RNA polymerase. There may also be factors of growth rate and protein turnover. PHX genes in yeast appear randomly distributed among chromosomes in contrast to bacterial genomes, where PHX genes often form clusters of functionally related genes. Codon usage differences among different yeast chromosomes suggest that the smallest chromosome I is an outlier or a supernumerary chromosome. The other 15 chromosomes are substantially homogeneous in codon usages (30).

In terms of codon usage differences and predicted expression levels, the genome of the unicellular yeast shows results similar to fast-growing bacteria. Among the top PHX genes are those encoding RPs, translation initiation, and elongation factors; a wide spectrum of chaperone proteins; enzymes functioning in glycolysis; and eukaryote-specific genes for actin, histones, and related genes. Histone genes are encoded in pairs as divergent neighbors {H3, H4} {H2A, H2B} on chromosome 2, another pair {H2B, H2A} on chromosome 4, and another pair {H3, H4} on chromosome 14. All histone genes except H1 are strongly PHX. Many metabolic genes are PHX, including almost all genes of glycolysis and gluconeogenesis and many fermentation genes; a series of genes of purine/ synthesis, biogenesis, synthesis, and sterol biosynthesis; a wide assortment of transporters (glucose transporters indicated earlier); phosphate transporters, including Pho84 and Pho88; uracil permease; and others. Altogether, 486

Karlin et al. PNAS | May 17, 2005 | vol. 102 | no. 20 | 7313

(8%) of 6,137 yeast genes with 80 or more codons score as PHX. Approximately 36% of all yeast genes are duplicated (i.e., have a paralog of significant sequence similarity). By contrast, 67% of PHX genes are duplicated, which may enhance the expression of these proteins.

The yeast ribosome contains 78 RPs. Unlike bacterial genomes, where RP genes are mostly organized in clusters and are not duplicated, the yeast RP gene ensemble consists mostly of duplicates distributed randomly throughout the genome. With a single exception, all cytoplasmic RP genes of at least 80 codons in length are PHX, generally with a strikingly high predicted expression level, E(g) > 2.00. In contrast, the RPs functioning in the mitochondria, generally encoded in the nucleus, are never PHX and mostly score E(g) < 0.70, akin to an average gene. In yeast, there are at least 40 chaperone and degradation genes, and the majority are PHX. These genes include a family of HSP70 paralogs of distinct functions, as follows: SSB (two copies) are associated with ribosomes; SSA (two copies) act in trafficking polypeptides across mitochondrial membranes and into the endoplasmic reticulum; SSC assists in protein assembly and refolding in mitochondria; YDJ1 is involved in mitochondrial protein import; and PDR13 is associated with drug resistance. Other distinctive PHX chaperone proteins of yeast feature protein disulfide isomerase and HSP82. [Ubiquitin] contributes to ATP-dependent selective degradation of cellular proteins, the maintenance of chromatin structure, the stress response, ribosomal biogenesis, and DNA repair. There are many copies of E2 ubiquitin-conjugating enzymes that are PHX. Many seripauperin (stress induced) homologs are also PHX (e.g., PAU1–23). Yeast grows best on glucose, and it is not surprising that it contains an extended family of hexose transporters (HXT) with several members PHX at high levels. These PHX hexose transporters [with E(g) values in parentheses] are HXT1 (1.58), HXT2 (1.26), HXT3 (2.07), HXT4 (1.99), HXT6 (2.14), and HXT7 (2.16). Some other PHX transporters of yeast include phosphate transporters PHO84 (1.92), PHO88 (2.00), and MIR1 (mitochondrial; 1.53). BMH1 (1.14) and BMH2 (1.30) are PHX, which is surprising because they are regulatory proteins that participate in signal transduction pathways. However, they act stoichiometrically rather than catalytically, which may explain their PHX status. BMH1 and BMH2 are 14-3-3 proteins that are known to bind to phosphorylated proteins and regulate them by blocking critical protein–protein interactions (31).

What gene classes are not PHX? As with bacteria, specialized regulatory proteins or proteins responding to special demands and required in few copies per cell cycle are not expected to be (and are not) PHX. For example, specialized transcription factors and cell cycle regulatory proteins are not PHX, similar to protein kinases and phosphatases. Genes acting in vitamin and cofactor biosynthesis pathways, of which only small amounts are needed to provide adequate function, have low predicted expression levels. Replication and repair enzymes are generally not PHX in bacteria or in yeast.

PHX Genes of Drosophila melanogaster

Among the transcripts that are ≥80 codons (including isoforms of the same gene), 2,338 (17%) genes are characterized as PHX. This percentage is higher than in bacteria and yeast, probably because of the more complex patterns, mixture of different specialized types of cells, and cellular requirements of higher eukaryotes. As in bacteria, the RPs, chaperones, translation/transcription processing factors, and major energy metabolism enzymes tend to be PHX. Conspicuously, structural skeletal proteins, such as spectrin, α-actinin, myosin, actin, and tubulin, are found near the top of the list of PHX genes. Other eukaryote-specific genes among top PHX genes of Drosophila include larval serum proteins 1 and 2, alternative transcripts of a calcium transporter, and a chitinase. The following list highlights the top PHX genes.

1. Cytoskeletal and exoskeletal proteins [E(g) values mostly at ≥2.00], including spectrin, α-actinin, F-actin crosslinking protein, muscle myosin, actin, chitinase, tubulin, lumen, paramyosin, tropomyosin, troponin T, C, and chitin.
2. Biosynthesis proteins [1.50 ≤ E(g) ≤ 2.00], including protein disulfide isomerase, , Hsp70, Hsp60 (mitochondrial) (Tcp), proteasome subunits, thioredoxin-1, chaperone Cdc37, serine protease, and endopeptidase.
3. Most RPs [1.50 ≤ E(g) ≤ 2.00], including P0, P1, and P2.

Aerobic Versus Anaerobic Preferences

Table 4 reports the number of PHX enzymes in the two basic metabolic pathways, glycolysis and the TCA cycle, in >40 sequenced bacterial genomes, in 19 archaeal genomes, and in the S. cerevisiae and D. melanogaster eukaryotic genomes. The complete pathways carry 10 primary glycolytic genes and 15 primary TCA cycle genes. When the ratio of counts of PHX glycolysis genes to PHX TCA cycle genes is at least 2.5, we propose that the [organism] prefers an anaerobic metabolism, and where the ratio of PHX among TCA genes to glycolysis genes is at least 3, we declare a preference for an aerobic metabolism. We interpret the tendencies to be in favor of facultative growth when the ratios are rather commensurate. There are other situations for which the glycolytic and TCA cycle enzyme PHX sets are both low and for which we interpret the growth condition of the organism as primarily parasitic, symbiotic, or unknown.

The results of Table 4 are largely concordant with known lifestyle preferences of the microbes at hand. The foregoing criteria can be applied to hundreds of microorganisms as the genomes are sequenced without the necessity of accompanying experiments. The two eukaryotes (yeast and fly) show counts that imply that yeast thrives in an anaerobic environment, whereas the fly cells engage in aerobic and anaerobic metabolism.

We are grateful for helpful comments on the manuscript by J. Ma, Prof. D. Kaiser, and Dr. J. Trent (Ames Research Center, National Aeronautics and Space Administration, Moffett Field, CA). S.K. was supported in part by National Institutes of Health Grant 5R01GM10452-40.
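The aerobic versus anaerobic criteria stated in the text (a glycolysis-to-TCA PHX count ratio of at least 2.5 for an anaerobic preference, a TCA-to-glycolysis ratio of at least 3 for an aerobic preference, facultative otherwise, and a separate category when both counts are low) can be illustrated with a minimal sketch. The function name, the `low_cutoff` parameter and its default value, and the order in which the rules are checked are assumptions made for illustration only; the text does not give a numerical definition of "low" or say which rule takes precedence, and the example counts below are hypothetical rather than taken from Table 4.

```python
def metabolic_preference(phx_glycolysis, phx_tca, low_cutoff=2):
    """Classify a genome from its counts of PHX glycolysis and PHX TCA cycle genes.

    low_cutoff is a hypothetical threshold (not given in the text) at or below
    which a PHX count is treated as "low".
    """
    # Both PHX sets low: growth condition read as parasitic, symbiotic, or unknown.
    # Checking this rule first is an assumption of this sketch.
    if phx_glycolysis <= low_cutoff and phx_tca <= low_cutoff:
        return "parasitic, symbiotic, or unknown"
    # glycolysis / TCA >= 2.5, written as a product to avoid dividing by zero.
    if phx_glycolysis >= 2.5 * phx_tca:
        return "anaerobic"
    # TCA / glycolysis >= 3, again written as a product.
    if phx_tca >= 3 * phx_glycolysis:
        return "aerobic"
    # Ratios "rather commensurate": facultative growth.
    return "facultative"


# Hypothetical counts, for illustration only.
for counts in [(8, 2), (2, 9), (6, 5), (1, 1)]:
    print(counts, metabolic_preference(*counts))
```

Writing each ratio test as a product keeps the rule well defined when one of the two counts is zero, a case the ratio form in the text leaves open.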

1. Karlin, S., Mrázek, J., Ma, J. & Brocchieri, L. (2005) Proc. Natl. Acad. Sci. USA 102, 7303–7308.
2. Lecompte, O., Ripp, R., Thierry, J.-C., Moras, D. & Poch, O. (2002) Nucleic Acids Res. 30, 5382–5390.
3. Karlin, S., Brocchieri, L., Trent, J., Blaisdell, B. E. & Mrázek, J. (2002) Theor. Popul. Biol. 61, 367–390.
4. Rocha, E. P. C., Danchin, A. & Viari, A. (2001) Genome Res. 11, 946–958.
5. Lobry, J. R. (1996) Mol. Biol. Evol. 13, 660–665.
6. Frank, A. C. & Lobry, J. R. (1999) Gene 238, 65–77.
7. Mrázek, J. & Karlin, S. (1998) Proc. Natl. Acad. Sci. USA 95, 3720–3725.
8. Kelman, L. M. & Kelman, Z. (2004) Trends Microbiol. 9, 299–401.
9. Robinson, N. P., Dionne, I., Lundgren, M., Marsh, V. L., Bernander, R. & Bell, S. D. (2004) Cell 116, 25–38.
10. Lundgren, M., Andersson, A., Chen, L., Nilsson, P. & Bernander, R. (2004) Proc. Natl. Acad. Sci. USA 101, 7046–7051.
11. Grabowski, B. & Kelman, Z. (2003) Annu. Rev. Microbiol. 57, 487–516.
12. Myllykallio, J., Lopez, P., Lopez-Garcia, P., Heilig, R., Saurin, W., Zivanovic, Y., Philippe, H. & Forterre, P. (2000) Science 288, 2212–2215.
13. Myllykallio, H. & Forterre, P. (2000) Trends Microbiol. 8, 537–539.
14. Makarova, K. S. J. & Koonin, E. V. (2003) Genome Biol. 4, 115–175.
15. Suyama, M. & Bork, P. (2001) Trends Genet. 17, 10–13.
16. Hayes, J. M. (2000) Proc. Natl. Acad. Sci. USA 97, 14033–14034.
17. Leigh, J. A. (2000) in Prokaryotic Nitrogen Fixation: A Model System for Analysis of a Biological Process, ed. Triplett, E. W. (Horizon Scientific, Wymondham, U.K.).
18. Leroux, M. R. (2002) Adv. Appl. Microbiol. 50, 219–277.
19. Hartl, F. U. & Hayer-Hartl, M. (2002) Science 295, 1852–1858.
20. Galperin, M. Y., Tatusov, R. L. & Koonin, E. V. (1999) in Organization of the Prokaryotic Genome, ed. Charlebois, R. L. (Am. Soc. Microbiol., Washington, DC)
21. Zhang, J. (2000) Trends Genet. 16, 107–109.
22. Brocchieri, L. & Karlin, S. (2005) Nucleic Acids Res., in press.
23. Sengupta, J., Agrawal, R. K. & Frank, J. (2001) Proc. Natl. Acad. Sci. USA 98, 11991–11996.
24. Karlin, S., Theriot, J. & Mrázek, J. (2004) Proc. Natl. Acad. Sci. USA 101, 6182–6187.
25. Cavicchioli, R., Curmi, P. M. G., Saunders, N. & Thomas, T. (2003) BioEssays 25, 1119–1128.
26. Martin, W. (2004) BioEssays 26, 592–593.
27. Cavicchioli, R. & Curmi, P. (2004) BioEssays 26, 593.
28. Lepp, P. W., Brinig, M. M., Ouverney, C. C., Palm, K., Armitage, G. C. & Relman, D. A. (2004) Proc. Natl. Acad. Sci. USA 101, 6176–6181.
29. Winsor, B. & Schiebel, E. (1997) Yeast 13, 399–434.
30. Karlin, S., Campbell, A. & Mrázek, J. (1998) Ann. Rev. Genet. 32, 185–225.
31. Bridges, D. & Moorhead, G. B. G. (2004) Sci. STKE re10, 1–8.

7314 | www.pnas.org/cgi/doi/10.1073/pnas.0502314102 Karlin et al.