
Predicted highly expressed genes in archaeal genomes

Samuel Karlin*†, Jan Mrázek*, Jiong Ma‡, and Luciano Brocchieri*

Departments of *Mathematics and ‡Biological Sciences, Stanford University, Stanford, CA 94305-2125

Contributed by Samuel Karlin, March 25, 2005

Based primarily on 16S rRNA sequence comparisons, life has been broadly divided into the three domains of Bacteria, Archaea, and Eukarya. Archaea is further classified into Crenarchaea and Euryarchaea. Archaea generally thrive in extreme environments as assessed by temperature, pH, and salinity. For many prokaryotic organisms, ribosomal protein (RP), transcription/translation factors, and chaperone genes tend to be highly expressed. A gene is predicted highly expressed (PHX) if its codon usage is rather similar to the average codon usage of at least one of the RP, transcription/translation factors, and chaperone gene classes and deviates strongly from the average gene of the genome. The thermosome (Ths) chaperonin family represents the most salient PHX genes among Archaea. The chaperones Trigger factor and HSP70 have overlapping functions in the folding process, but both of these proteins are lacking in most archaea where they may be substituted by the chaperone prefoldin. Other distinctive PHX proteins of Archaea, absent from Bacteria, include the proliferating cell nuclear antigen PCNA, a replication auxiliary factor responsible for tethering the catalytic unit of DNA polymerase to DNA during high-speed replication, and the acidic RP P0, which helps to initiate mRNA translation at the ribosome. Other PHX genes feature Cell division control protein 48 (Cdc48), whereas the bacterial septation proteins FtsZ and minD are lacking in Crenarchaea. RadA is a major DNA repair and recombination protein of Archaea. Archaeal genomes feature a strong Shine–Dalgarno ribosome-binding motif more pronounced in Euryarchaea compared with Crenarchaea.

Table 1. Archaeal genomes

Name     Genome size, kb   Optimal growth temp, °C   G+C content, %   No. of genes ≥80 aa
Crenarchaea
SULSO    2,992             80                        36               2,869
SULTO    2,695             80                        33               2,657
AERPE    1,670             90                        56               1,783
PYRAE    2,222             100                       51               2,290
Euryarchaea
PYRAB    1,765             96                        45               1,786
PYRFU    1,908             96                        41               1,941
PYRHO    1,739             96                        42               1,828
THEAC    1,565             59                        46               1,415
THEVO    1,585             60                        40               1,415
PICTO    1,546             60                        36               1,473
ARCFU    2,178             83                        49               2,214
METKA    1,695             110                       61               1,606
METTH    1,751             65                        50               1,735
METJA    1,665             85                        31               1,635
METMP    1,661             37                        33               1,630
METAC    5,751             37                        43               4,252
METMA    4,096             37                        41               3,147
HALSP    2,014             37                        68               1,880
Nanoarchaea
NANEQ    491               90                        32               515

Notice that most archaea subtend genomes of moderate size, ranging from ≈1.5 to 3.00 Mb. The methanogens are of variable size, with the two mesophilic Methanosarcina especially large, exceeding 4- and 5.7-Mb genome lengths. SULSO, Sulfolobus solfataricus; SULTO, Sulfolobus tokodaii; AERPE, Aeropyrum pernix; PYRAE, Pyrobaculum aerophilum; PYRAB, Pyrococcus abyssi; PYRFU, Pyrococcus furiosus; PYRHO, Pyrococcus horikoshii; THEAC, Thermoplasma acidophilum; THEVO, Thermoplasma volcanium; PICTO, Picrophilus torridus; ARCFU, Archaeoglobus fulgidus; METKA, Methanopyrus kandleri; METTH, Methanobacter thermoautotrophicus; METJA, Methanococcus jannaschii; METMP, Methanococcus maripaludis; METAC, Methanosarcina acetivorans; METMA, Methanosarcina mazei; HALSP, Halobacterium sp. NRC-1; NANEQ, Nanoarchaeum equitans; temp, temperature.

acidic ribosomal proteins | Archaea | highly expressed proteins | thermosome

The identity of the three domains of life (1) and their relationships are controversial (2–11). Archaea form a heterogeneous clade composed of a mosaic of bacterial, eukaryotic, and unique features. Archaea and Eukarya share many homologous genes involved in information processing (replication, transcription, and translation), whereas Archaea and Bacteria share many morphological structures and metabolic proteins (10, 12). Of 19 archaeal genomes completely sequenced (Table 1, mid-2004), 4 are from Crenarchaea and 14 are from Euryarchaea. Nanoarchaeum equitans, a parasitic archaeon that lives in coculture with the archaeon Ignicoccus, has been tentatively assigned to the separate group of Nanoarchaea. Most sequenced archaea, to date, are thermophilic, generally prefer extreme environments, and are found in most habitats. The four sequenced crenarchaea are all hyperthermophiles (optimal growth temperature, ≥75°C), although mesophilic crenarchaea have been putatively found in pelagic waters (3, 13). Among the Euryarchaea, six are methanogens, including three mesophiles, Methanosarcina acetivorans, Methanosarcina mazei, and Methanococcus maripaludis. Halobacterium NRC-1 is also mesophilic, thriving in high salt concentrations. Most sequenced archaea, excluding methanogens (lifestyle strictly anaerobic, metabolism methanogenesis), grow both aerobically and anaerobically. Archaeal mRNAs are principally polycistronic as in bacterial genomes. Archaeal proteins involved in translation have both eukaryotic and bacterial features (14).

The objectives of this work are to identify and analyze the major predicted highly expressed (PHX) genes with respect to codon usage biases among the archaeal genomes. Our analyses of bacterial genomes support the hypothesis that each species has evolved codon usage patterns promoting "optimal" gene expression levels for most circumstances of its habitat, energy sources, and lifestyle (15, 16). Codon bias is often different at the start of a gene compared with the central or terminal part of the gene (17, 18). Different selection pressures are imposed by the constraints of ribosomal binding and translation fidelity.

Abbreviations: CH, chaperone/degradation genes; PHX, predicted highly expressed; RP, ribosomal proteins; SD, Shine–Dalgarno; TF, transcription/translation synthesis factors.
†To whom correspondence should be addressed. E-mail: [email protected].
© 2005 by The National Academy of Sciences of the USA
MICROBIOLOGY

www.pnas.org/cgi/doi/10.1073/pnas.0502313102 PNAS | May 17, 2005 | vol. 102 | no. 20 | 7303–7308

Protein folding is possibly correlated with codon usage (19, 20). According to the rare codon hypothesis for domains and secondary structures, repetition of rare codons reduces translation rate and introduces translation pauses, allowing time for protein domains and secondary structures to fold into native conformations. However, there appear to be subtle differences in bacterial and eukaryotic translation mechanisms, e.g., the role of chaperonins in bacteria vs. eukaryotes and the important activity of cotranslational folding in eukaryotes but not in prokaryotes. Generally, PHX genes in bacterial genomes rely on favorable codon usages, tend to possess a strong Shine–Dalgarno (SD) sequence (21), and putatively possess a strong promoter sequence. The substantial variability of G+C composition within mammalian genomes (isochores) may complicate predicting gene expression levels from codon usages. In contrast, the compositions of bacterial genomes are largely homogeneous.

Gene expression in prokaryotes is regulated at initiation and termination of transcription and translation and by different rates of transcription and translation, differential mRNA stabilities, segmental stability differences in polycistronic messages, codon preferences, and interactions with chaperones and other proteins.

The thermosome chaperones are outstandingly PHX genes (Table 2), consistent with the extreme environmental conditions to which these species have adapted. General comparisons of bacterial vs. archaeal genomes and corresponding discussion of the genomic and proteomic content of the Saccharomyces cerevisiae and Drosophila melanogaster genomes are set forth in our companion paper (2).

Methods

Highly expressed genes are characterized on the basis of biased codon usages compared with the average gene (15). For most bacteria during exponential growth, many genes encoding ribosomal proteins (RP), the principal transcription/translation synthesis factors (TF), and the major chaperone/degradation genes (CH) functioning in protein folding/unfolding and trafficking tend to be highly expressed. We consider these gene classes (restricted to genes of ≥80 codons) as representative of highly expressed genes. In this purview, a gene is PHX if its codon frequencies are similar to the average of any of these gene classes but deviate strongly from those of the average gene of the genome. The codon usage bias of a gene group F with respect to a gene group G is calculated by the formula

\[
B(F \mid G) = \sum_{a} p_a(F) \left[ \sum_{(x,y,z)=a} \bigl| f(x,y,z) - g(x,y,z) \bigr| \right],
\]

where {p_a(F)} correspond to the average amino acid frequencies of the genes of F, and f(x, y, z) and g(x, y, z) are codon frequencies of F and G genes, respectively, normalized to one for each amino acid family. Predicted expression levels with respect to the RP, TF, or CH standards can be based on the ratios $E_{RP}(g) = B(g \mid C)/B(g \mid RP)$, $E_{CH}(g) = B(g \mid C)/B(g \mid CH)$, and $E_{TF}(g) = B(g \mid C)/B(g \mid TF)$, where C is the totality of all genes of the genome. An overall estimate E(g) of the expression level of a gene g is defined by the equation

\[
\frac{B(g \mid C)}{E(g)} = \frac{1}{3}\bigl( B(g \mid RP) + B(g \mid TF) + B(g \mid CH) \bigr).
\]

For archaeal genes, the criterion E(g) > 1 with at least one (two for bacteria) of the values $E_{RP}(g)$, $E_{TF}(g)$, or $E_{CH}(g)$ exceeding 1.05 provides generally a consistent benchmark reflecting high protein abundance. The concept of PHX in most bacterial genomes was justified by independent measurements of gene expression (16). For most bacterial genomes, the codon usage differences among the functional gene classes RP, TF, and CH tend to be low and concordant (see ref. 15). However, not all genes of the classes RP, TF, and CH are automatically PHX in Bacteria or Archaea. In this respect, the RP gene class of Archaea is the most variable, whereas the CH and TF gene classes are more coherent and consistent. There is good agreement of our determinations of PHX protein abundances with assessments by 2D gel electrophoresis displacements (e.g., ref. 16).

Results and Discussion

Distinctive Proteins of Archaeal Genomes. The thermosome subunits (Ths) are among the most PHX throughout the archaeal genomes and are almost always essential (22). Archaea generally live in extreme environments that are likely to affect the integrity of their proteins, nucleic acid, and membranes. Mutational and other disturbances conceivably may be alleviated by an abundance of chaperone and degradation proteins, including thermosome, prefoldin, and the proteasome complex assisted by repair, recombination, and replication (22, 23). Ths is pervasively PHX in archaeal genomes at a very high predicted expression level (Table 2). Ths also has been investigated experimentally and confirmed especially abundant in Sulfolobus species, encompassing up to 20% of the cellular protein content (24–26). DnaK (HSP70) is found, so far, only in archaeal mesophiles or in moderate thermophiles (27), where it is PHX. The heat-inducible Lon protease is absent from the Crenarchaea but usually PHX among the Euryarchaea (see Table 5, which is published as supporting information on the PNAS web site). Archaeal genomes also are distinguished with proteasome subunits. The chaperone prefoldin (Pfd) β-subunit is present in all Archaea, whereas the α-subunit is lacking in Crenarchaea and Thermoplasma genomes (Tables 2 and 5).

The replication protein PCNA (proliferating cell nuclear antigen) is present, mostly PHX, in all archaea and eukaryotes but absent from bacteria. There are multiple copies (two or three) of PCNA in the crenarchaeal genomes (see Table 2 and ref. 28). Moreover, PCNA is widely distributed in eukaryotes where it functions in association with DNA polymerase δ, enhancing processivity in elongation of the leading strand during DNA replication.

The regulatory RP P0, P1, and P2, prominent in eukaryotes, feature a hyperacidic run proximal to their carboxyl termini. These proteins are missing from bacterial genomes, and only P0 is present in archaeal genomes. The RPs L7/L12 and S1 of bacterial genomes, neither of which is similar to P0, also emphasize acidic residues.

There is evidence that the cell division control protein 48 (Cdc48), ubiquitous in archaeal genomes, functions in cell division and growth processes. In Arabidopsis, Cdc48 is localized primarily to the nucleus. In other eukaryotes, this protein is mainly plasma membrane-associated. In Archaea, the genes of ORC (origin recognition complex) and Cdc6, analogous to DnaA of Bacteria, are deemed responsible for replication initiation (28).

RadA of archaea is similar to RecA of bacteria and Rad51 of eukaryotes. They all bind single-stranded DNA with the same stoichiometry and exhibit well-conserved Walker-A and -B ATP binding motifs (29). These proteins are important in recombination, mutagenesis, transposition, and repair, and are made in response to general DNA damage. DNA repair pathways involve activities that chemically reverse DNA damage, base excision repair, and nucleotide excision repair (29).

Small Hsps (average 150 aa), mostly Hsp20, are abundant among archaeal genomes, often in multiple copies (see Table 5), but are variably represented among bacterial genomes. The small Hsps are not present in approximately half of the current collection of bacterial genomes. For example, they are absent from Lactococcus lactis, Streptococcus pneumoniae, Listeria innocua, Haemophilus influenzae, Pasteurella multocida, Neisseria meningitidis, Helicobacter pylori, Campylobacter jejuni, Chlamydia trachomatis, Treponema palli-

Table 2. Predicted expression values E(g) for distinctive genes of archaeal genomes

Value             SULSO SULTO AERPE PYRAE PYRAB PYRFU PYRHO THEAC THEVO PICTO ARCFU METKA METTH METJA METMP METAC METMA HALSP NANEQ
No. of PHX genes  601   529   299   84    163   160   139   172   122   214   301   255   168   128   162   505   321   335   36
PHX genes, %      21    20    17    3.7   9.1   8.2   7.6   12    8.6   15    14    16    10    7.8   10    12    10    18    7.0
Max E(g)          1.40  1.51  1.50  1.21  1.73  1.67  1.41  1.30  1.18  1.57  1.47  1.62  1.39  1.69  2.09  2.00  1.95  1.49  1.21

Ths               1.32  1.38  1.60  1.21  1.52  1.42  1.32  1.26  1.13  1.57  1.47  1.47  1.39  1.57  1.90  1.71  1.82  1.36  1.20
  additional copies: 1.32 1.29 1.28 1.15 1.13 1.08 1.57 1.34 1.34 1.60 1.73 1.35 1.02 1.20 etc N* 1.26
DnaK              —     —     —     —     —     —     —     1.19  1.00  1.39  —     —     1.25  —     —     1.70  1.85  1.09  —
Psm               1.07  1.17  1.04  1.03  1.17  1.12  1.00  0.94  1.02  1.16  1.16  1.10  0.93  0.95  0.95  1.38  1.37  1.20  1.09
  additional copies: 1.04 1.12 1.01 1.02 0.96 0.88 0.88 0.94 0.88 0.93 1.00 0.80 0.93 0.93 0.89 0.97 1.28 1.16 1.04
  additional copies: 1.04 0.98 0.97 0.63 0.76 0.79 0.91
Hsp20             1.06  1.00  0.91  1.03  1.18  1.19  1.20  1.08  1.07  1.38  1.24  —     1.13  1.31  0.80  1.20  0.95  1.10  —
  additional copies: 0.90 0.98 0.94 0.96 1.03 0.93 1.10 0.96 0.95 etc N etc N
Pfd β             0.94  1.23  1.15  0.91  1.32  0.79  1.20  0.82  0.88  0.96  1.25  0.92  1.02  0.96  1.13  0.83  0.83  0.99  0.97
Pfd α             —     —     —     —     0.81  1.09  1.07  —     —     —     —     0.77  0.96  0.98  1.18  1.01  0.99  —     —

PCNA              0.94  1.08  1.07  1.00  1.16  0.80  0.90  0.84  0.98  0.88  1.12  1.08  1.16  0.90  0.94  0.86  0.93  1.24  1.00
  additional copies: 0.87 0.94 0.95 0.91
rp P0             1.11  1.26  1.01  1.00  0.98  1.25  1.23  0.91  0.98  1.37  0.91  0.93  0.95  1.28  1.69  1.56  1.57  1.09  0.94
Cdc48             1.18  0.98  1.01  1.01  1.16  1.19  1.17  1.11  1.09  1.39  1.37  0.82  1.08  1.54  0.68  1.43  1.02  1.08  1.21
  additional copies: 1.15 0.88 0.95 0.97 0.66 0.89 1.13 1.26 1.16 1.07 0.80 etc N 0.60 0.71
Cdc6              1.20  0.83  1.12  —     0.70  0.70  0.81  0.89  0.87  0.83  0.72  —     0.82  —     —     0.91  0.86  1.06  0.87
  additional copies: 0.89 1.05 etc P
AhpC/Bcp          1.08  1.12  1.16  0.95  1.32  1.20  1.11  1.14  1.08  1.26  1.14  —     0.92  0.83  0.76  0.87  0.87  0.97  1.09
  additional copies: 0.95 1.11 1.14 0.94 0.97 1.06 1.06 1.18 0.78 0.81 0.80 0.77 etc P etc N 0.93 0.80 etc N etc N etc P
SecY              1.14  1.18  1.10  1.08  0.93  0.83  0.86  1.19  1.04  1.30  0.81  0.83  0.95  0.72  0.75  0.95  0.80  0.75  —
RadA              1.18  0.94  1.28  0.96  0.71  0.96  0.80  1.11  0.94  1.09  0.85  0.80  0.76  0.69  1.28  1.22  1.04  0.88  0.98

EF-1α (Tuf)       1.19  1.28  1.36  1.09  1.41  1.47  1.31  1.20  1.18  1.44  1.38  1.45  1.25  1.42  1.88  1.71  1.59  1.49  1.14
EF-2 (Fus)        1.17  1.35  1.29  1.06  1.70  1.55  1.31  1.13  1.07  1.45  1.21  1.62  1.23  1.56  2.04  1.85  1.95  1.32  1.09
rpoA              1.30  1.21  1.12  0.99  1.35  1.03  1.18  1.17  0.99  1.20  1.12  1.47  1.13, 0.83  1.14  1.50  2.00  1.90  0.75  1.05
  additional copies: 1.08 1.02 0.91 0.95 0.83 0.80 0.94 1.16 1.04 0.97 0.97 1.12 0.82 0.80 0.89 1.27 1.29 0.87 1.02
rpoB              1.13  1.51  1.15  0.89  1.23  1.04  1.21  1.20  1.06  1.52  1.06  1.33  1.28  0.94  1.58  1.35  1.17  0.73  0.89
  additional copies: 1.13 1.07 1.22 1.16 0.86 1.00 1.34 1.41 0.84 1.00

Bold indicates PHX; regular type indicates not PHX. Ths, any thermosome subunit; Psm, proteasome subunit; Hsp20, small heat shock protein, eye crystalline structure in higher eukaryotes; Pfd β, prefoldin β subunit; PCNA, proliferating cell nuclear antigen; rp P0, ribosomal protein P0; Cdc48, cell division control protein 48; Cdc6, replication initiation; AhpC/Bcp, alkyl hydroperoxide reductase, bacterioferritin comigratory protein, peroxiredoxin or thioredoxin peroxidase; SecY, protein translocase; RadA, DNA repair and recombination protein; EF-1α, elongation factor 1α; EF-2, elongation factor; rpo, DNA-directed RNA polymerase subunit; etc N, additional copies not PHX; etc P, additional copies PHX; —, gene not present in genome. HSP60 (GroEL) 1.35 and 1.38 PHX in two Archaeal genomes, METAC and METMA, respectively.

*Three copies not PHX of Ths in METAC.
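The E(g) values of Table 2 follow from the codon usage bias B(F|G) and the overall expression measure defined in Methods. The Python sketch below illustrates that calculation and the archaeal PHX criterion (E(g) > 1 with at least one class ratio above 1.05). It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`codon_counts`, `bias`, `expression_level`, `is_phx_archaeal`) are hypothetical, the standard genetic code is assumed, genes are assumed to be in-frame DNA strings, the ≥80-codon restriction on the reference classes is not enforced, and a zero denominator is handled by an arbitrary convention.

```python
from collections import Counter

# Standard genetic code in TCAG order (an assumption; "*" marks stop codons).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def codon_counts(genes):
    """Count sense codons over a group of in-frame coding sequences."""
    counts = Counter()
    for seq in genes:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if CODON_TABLE.get(codon, "*") != "*":  # skip stops and ambiguous codons
                counts[codon] += 1
    return counts


def family_totals(counts):
    """Total codon count per amino acid family."""
    totals = Counter()
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n
    return totals


def bias(F, G):
    """B(F|G): amino-acid-weighted sum of |f - g| over codons of each family."""
    cf, cg = codon_counts(F), codon_counts(G)
    tf, tg = family_totals(cf), family_totals(cg)
    n_f = sum(cf.values())
    total = 0.0
    for codon, aa in CODON_TABLE.items():
        if aa == "*" or tf[aa] == 0:
            continue  # amino acid absent from F contributes p_a(F) = 0
        p_a = tf[aa] / n_f                          # amino acid frequency in F
        f = cf[codon] / tf[aa]                      # codon frequency within family, F
        g = cg[codon] / tg[aa] if tg[aa] else 0.0   # same for G
        total += p_a * abs(f - g)
    return total


def _ratio(num, den):
    # Convention chosen for this sketch only: x/0 is infinite, 0/0 is zero.
    if den > 0:
        return num / den
    return float("inf") if num > 0 else 0.0


def expression_level(gene, genome, rp, tf, ch):
    """E(g) from B(g|C)/E(g) = (1/3)(B(g|RP) + B(g|TF) + B(g|CH))."""
    mean_class_bias = (bias([gene], rp) + bias([gene], tf) + bias([gene], ch)) / 3
    return _ratio(bias([gene], genome), mean_class_bias)


def is_phx_archaeal(gene, genome, rp, tf, ch):
    """Archaeal criterion: E(g) > 1 and at least one class ratio above 1.05."""
    b_c = bias([gene], genome)
    class_ratios = [_ratio(b_c, bias([gene], cls)) for cls in (rp, tf, ch)]
    return expression_level(gene, genome, rp, tf, ch) > 1 and max(class_ratios) > 1.05
```

In practice `genome` would hold all coding sequences of one genome and `rp`, `tf`, and `ch` the three reference classes drawn from it. For bacterial genomes the source requires two class ratios above 1.05 rather than one, which would need a small change to the last function.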

Table 3. Shine–Dalgarno (SD) sequences in archaeal genomes

Name     Anti-SD* sequence   SD%†
SULSO    AUAUCACCUCAU        22.9
SULTO    AUCACCUC            20.2
AERPE    AUCACCUCC           38.8
PYRAE    AUCACCUCC           23.8
PYRAB    AUCACCUCCUAU        71.9
PYRFU    AUCACCUCCUAU        69.8
PYRHO    AUCACCUCCUAU        54.9
THEAC    AUCACCUCC           24.6
THEVO    AUCACCUCCAA         35.7
PICTO    AUCACCUCCU          30.5
ARCFU    AUCACCUCCUAA        47.0
METKA    AUCACCUCC           70.9
METTH    AUCACCUCCU          60.5
METJA    AUCACCUCCU          54.4
METMP    AUCACCUCCU          71.8
METAC    AUCACCUCCUAA        48.6
METMA    AUCACCUCCUAA        52.1
HALSP    AUCACCUCCUAA        26.3
NANEQ    AUCACCUCCU          7.5

*Bold indicates the core anti-SD sequence. See Table 1 for complete names of genomes.
†SD% is defined as the fraction of genes (≥80 aa) in a given genome that possesses a SD motif (for details, see ref. 21). The anti-SD sequence at the 3′ end of the 16S rRNA binds to the SD motif of a gene when available to initiate translation. In bacterial genomes, the consensus anti-SD sequence is AUCACCUCCUUU, although archaeal genomes show some variation in their anti-SD sequence around the conserved core CCUCC.

dum, Borrelia burgdorferi, Mycoplasma genitalium, and others. However, the Hsp20 is ubiquitous in most eukaryotes, and in higher eukaryotes it is involved in the structural determination of the eye crystalline.

The translation elongation proteins EF-1α (Tuf) and EF-2 (Fus) and the RNA polymerase subunits RpoA, RpoB, and RpoC (Table 2), as with bacterial genomes, are predominantly PHX in Archaea. These genes are generally present in multiple copies among bacteria but are represented by a single copy in most archaea (Table 2). Archaeal RPs contain a mixture of bacterial, eukaryotic, and some unique RPs (28, 30). Many archaea employ both eukaryotic and bacterial mechanisms in translation initiation (14). However, archaeal mRNAs have no 5′ CAP structure or 3′ poly(A) appendages, but to some extent they engage a bacterial SD translation initiation motif (Table 3).

There is debate on the time of the origin of aerobic respiration (e.g., refs. 11 and 31–34). A substantial increase of oxygen in the atmosphere accompanied the evolution of cyanobacterial lineages and associated oxygenic photosynthesis in the time epoch at ≈3,200–2,500 million years ago (31, 34). However, there is evidence showing some oxygen was present earlier than 3,500 million years ago (35–37) and influential in the evolution of life (31, 32). Methanogenesis is considered a later development in archaeal evolution (11). Onset of cycling putatively started ≈2,700 million years ago, whereas oxygenic photosynthesis evolved earlier (32). All archaeal genomes except for methanogens carry many PHX detoxification proteins that protect against oxygen stress, generally including alkyl hydroperoxide reductase, bacterioferritin comigratory proteins, thioredoxin, and superoxide dismutase (Table 2), suggesting that these prokaryotes were selected to survive under moderate aerobic conditions. These observations may indicate that archaea, especially methanogens in their present form, are probably not so ancient or were converted to an oxygen-tolerant variant. The detoxification gene thioredoxin also catalyzes and removes disulphide bonds during protein folding.

Chaperone Proteins in Archaea. Because chaperones, especially thermosomes, are potently PHX in Archaea, we elaborate more on these classes of proteins. Chaperones play pivotal roles in protein folding, degradation of misfolded proteins, proteolysis, secretion, trafficking across membranes, facilitating the assembly of macromolecular structural complexes (22), and archaeal membrane stabilization (26). Molecular chaperone systems that promote the correct folding of nascent or misfolded proteins have evolved in all domains of life.

The ATP-regulated HSP70 (DnaK) together with its cofactors DnaJ and GrpE and the ATP-independent Trigger factor (Tig) act posttranslationally in folding. Tig is only present in bacteria and generally is PHX. Tig is presumably substituted by nascent polypeptide-associated complex (NAC) in eukaryotes (38) and possibly in Archaea (22, 38). DnaK (HSP70) is ubiquitous in eukaryotes and bacteria, often with multiple copies, but is missing from most archaea (see Tables 2 and 5). Tig and DnaK are demonstrated to cooperate in the folding of newly synthesized proteins (39). Simultaneous deletion of both Tig and DnaK in bacteria is lethal under normal growth conditions (40). The archaeal HSP70s, as with those of Gram-positive genomes, are missing a 23- to 25-aa segment present in Gram-negative genomes (4, 5). All archaea and eukaryotes contain the molecular chaperone Pfd (subunit β) (Table 2), which has not been recognized in Bacteria. The crenarchaea do not contain the α subunit (Table 2). Pfd is considered to perform HSP70-like functions (41), albeit the sequences and structures of these proteins are substantially different.

The chaperonin complex (HSP60) assists protein folding in a cavity, where nonnative polypeptides, usually of the size of 30–70 kDa, are enclosed and protected against intermolecular aggregation (for reviews, see refs. 38 and 42). Two groups of chaperonins are distinguished. With occasional exceptions, Group I embodies GroEL of bacteria, mitochondria, and chloroplasts, whereas Group II features thermosomes in Archaea and TRiC, also labeled CCT, in eukaryotes. The thermosome genes are potently PHX in Archaea (Table 2), whereas the chaperonin (Cpn) and its cochaperonin (GroEL/GroES) are highly expressed in virtually all bacterial genomes (43). They are lacking in five of the nine mycoplasma genomes sequenced so far (data not shown). The GroES lid of the chaperonin complex is missing from archaea and eukaryotes, wherein helical protrusions supplant GroES (22). The HSP60s in all three domains are purified from cells as double-ring complexes. In Bacteria, each ring of GroEL is composed of seven HSP60 subunits, whereas in Archaea each ring embodies eight or nine HSP60 subunits. Some archaeal rings are formed from identical subunits, whereas in others there are two subunit types; the Sulfolobus sp. contains three subunit types. Yeast contains at least 11 distinct CCT genes. It is observed that the eukaryotic ring structure contains six to nine different subunits in a variety of arrangements (22).

Functional regions inferred from mutational studies and the E. coli GroEL 3D crystal structure (44, 45) have been evaluated in a multiple alignment across 43 HSP60 sequences selected from diverse genomes, centering on ATP/ADP and Mg2+ binding sites, residues interacting with substrate, GroES contact positions, interface regions between monomers and domains, and residues important in allosteric conformational changes (46). The most evolutionarily conserved residues relate to the ATP/ADP and Mg2+ binding sites. Hydrophobic residues that contribute to substrate binding also are significantly conserved. A large number of charged residues line the central cavity of the GroEL/GroES complex in the substrate-releasing conformation. Charged residues also span intramonomer and intermonomer 3D charge clusters (47) that are highly conserved among sequences and can play an important role interacting with the substrate.

Duplicated HSP60 sequences stand out among the classical

Table 4. Thermophilic bacteria

Name    Group                 DnaK copies   OGT/°C   G+C    E(DnaK)                                24- to 30-bp repeats*
SYMTH   Firmicutes            1             51       68.7   1.27                                   + 
THETH   Deinococcus-Thermus   1             70–75    69.4   1.27                                   + (4 copies)
THEEL   Cyanobacteria         3             55       53.9   dnak1 0.76; dnak2 1.38; dnak3 0.86     −
THETE   Firmicutes            1             80       37.6   1.28                                   +
STRTH   Firmicutes            1             42       39.1   2.10                                   +
THEMA   Thermotogales         1             80       46.3   1.38                                   +
AQUAE   Aquificales           1             90       43.3   1.36                                   +
CHLTE   Chlorobiales          1             48       56.5   1.05                                   +

SYMTH, Symbiobacterium thermophilum; THETH, Thermus thermophilus; THEEL, Thermosynechococcus elongatus; THETE, Thermoanaerobacter tengcongensis; STRTH, Streptococcus thermophilus; THEMA, Thermotoga maritima; AQUAE, Aquifex aeolicus; CHLTE, Chlorobium tepidum. *The nature of the 24- to 30-bp repeats is discussed in ref. 2. These are lacking in T. elongatus and involve only four copies in the T. thermophilus genome.

α-proteobacteria, contrasted to no duplications of HSP60 in other proteobacterial clades (48). Multiple HSP60 paralogs also exist in cyanobacteria, in Chlamydia, and in high G+C Gram-positive bacteria. Many a-mitochondrial eukaryotes, including Trichomonas vaginalis, Giardia lamblia, and Entamoeba histolytica, contain two or more HSP60. Plastids carry multiple copies of HSP60 that bind Rubisco. Specialized complex structures in cells often need their own "dedicated" chaperones (e.g., ref. 49).

Peptidyl-prolyl cis–trans isomerase (PPIase) in prokaryotes and protein disulfide isomerase (PDI) in prokaryotes and eukaryotes are generally present in multiple copies. E. coli has at least nine PPIases defined by sequence similarity. One of these, the survival protein SurA, promotes the folding of periplasmic and outer membrane proteins. As expected, SurA does not exist in Gram-positive bacteria. DegP is a chaperone/degradation factor that is significantly PHX and acts primarily in degrading misfolded proteins in the periplasm. PapD/FimC are other chaperones/periplasmic, and disulfide oxidase proteins cytoplasmic chaperones. Tig exhibits PPI activity in vitro and contains a PPI motif of the cyclophilin family in most bacteria, binds at the ribosome polypeptide tunnel exit, and cooperates with DnaK during de novo protein folding (39–42). In this respect, NAC substitutes for Tig in eukaryotes and possibly in archaea (38). Tig interacts with the ribosomal protein L23 at the ribosomal tunnel exit (50), where it helps to stabilize and organize nascent translated polypeptides in bacteria. Tig recognizes short hydrophobic stretches, whereas DnaK binds to longer peptides. Nevertheless, Tig and DnaK have overlapping functions in the folding process, where Tig implements the initial chaperone interactions with ribosomes in bacteria. In yeast, Ssb (HSP70) is associated with ribosomes and generally contributes in posttranslational protein assembly and protein translocation across membranes (42). Prefoldin generally has six distinct subunit types in eukaryotes, at most two subunit types (α and β) in archaea (Table 2), and is absent from bacteria (22). Prefoldin may assist de novo protein folding when interacting with CCT (41). HSP90 (HtpG) is widely distributed in Bacteria but is absent from Archaea and also from the genomes of the deeply branching bacterial species Aquifex aeolicus and Thermotoga maritima (22). HtpG plays a role in stress tolerance.

Among the 19 archaeal genomes of Table 1, only 7 possess a DnaK gene, all of which are PHX. These seven archaea are either mesophiles or moderate thermophiles with optimal growth temperature of ≤65°C (27). However, the mesophile M. maripaludis (see Tables 2 and 5) is lacking a HSP70 gene. It has been suggested that the presence of HSP70 genes in the seven archaeal genomes is the consequence of lateral gene transfer (27, 51). The gene may have been subsequently lost in M. maripaludis. Nine genomes of the bacterial mycoplasma species have been entirely sequenced, and each has a PHX DnaK gene; in contrast, GroEL is missing from five of these genomes. By comparison, the eight thermophilic bacteria completely sequenced (Table 4) (five moderate thermophiles and three hyperthermophiles) all encode at least one DnaK gene persistently PHX.

The crenarchaea have no HSP70 representations (Table 2). It seems that the chaperone Pfd in archaea can substitute for HSP70, whereas in bacteria, the Tig gene (missing from archaea) functions cooperatively with or substitutes for DnaK.

"Hybrid" Bacterial and Archaeal Genomes. The genomes of the mesophilic methanogens Methanosarcina acetivorans and M. mazei (Table 1) might be regarded as hybrids of bacterial and archaeal species. They both contain multiple copies of Ths and one GroEL gene, all PHX. We speculate that the acquisition of GroEL in these genomes is due to lateral gene transfer. The same applies to the cyanobacterium Gloeobacter violaceus, which contains simultaneously Ths and GroEL and the recombination repair proteins RecA and RadA. G. violaceus is also similar to Archaea in expressing multiple detoxification PHX bacterioferritin comigratory proteins and several Hsp20s.

SD Sequences in Archaea. In Bacteria, a strong correlation between high predicted gene expression levels and the presence of a SD sequence motif, which plays an important role in translation initiation, is observed (21). SD motifs do not exist in eukaryotes. Initiation is generally considered the rate-limiting step of translation, which in many bacteria involves interactions between a SD sequence immediately upstream of the start codon in the mRNA and an anti-SD sequence at the 3′ end of the 16S rRNA (reviewed in refs. 52 and 53). The consensus SD sequence features at its core the purine run AGGAGG, generally traversing positions −10 to −5 relative to the start codon, and the 16S rRNA gene usually carries the anti-SD sequence CACCTCCTTT at its 3′ end. The PHX genes, as compared with genes with an average expression level, are significantly more likely to possess a strong SD motif (21). This positive correlation between strong SD signal sequences and PHX genes can be found in almost all bacterial and archaeal genomes (ref. 21 and Table 3). The Crenarchaea and Thermoplasma have many leaderless transcripts, and Crenarchaea and Thermoplasma are low in SD motifs.

Conclusions and Prospects

Several authors underscore processes of lateral gene transfer and the archaeal origin of eukaryotic genes (8, 10, 11, 51). Many

conspicuous genes of Archaea, e.g., Ths, PCNA, P0, Cdc48, and Pfd (Table 2), are missing from bacterial genomes but distributed profusely in eukaryotes. However, several genomic features are common to all prokaryotes. These include operon gene organization, the SD motif that provides control on mRNA translation initiation, and the presence of several clusters of RP. The most pronounced PHX genes are the thermosome chaperones (distant homologs of GroEL) and the chaperone prefoldin, which is hypothesized to substitute for the activities of Trigger factor and HSP70 (22, 38, 41). Based on the Clusters of Orthologous Groups database (www.ncbi.nlm.nih.gov/COG, 2004), all bacterial and archaeal genomes share only 67 genes, of which 30 are ribosomal protein genes, 14 are aminoacyl-tRNA synthetases, and several are major protein synthesis and processing factors.

Cavicchioli et al. (54) address the question of whether there are pathogenic Archaea. They point out interactions of archaea with eukaryotes, methanogenic inhabitants of the human oral cavity and intestinal tract, and call attention to many human diseases whose causative agents have not been identified. For these reasons and others, they effectively suggest that pathogenic archaeal species will be identified in time as more studies on archaeal species are conducted. Martin (55; see also ref. 56), in his reactions, seems doubtful on biochemical grounds whether Archaea are natural agents capable of parasitizing mammalian environments.

Archaeal–bacterial endosymbiosis and other relations have been proposed to explain the genesis of eukaryotes and their organelles (57–63). Along these lines, archaeal–bacterial partnerships have been conceived preceding the origins of eukaryotes. It is increasingly appreciated that the genomes of many prokaryotes and primitive eukaryotes are "heterogeneous unions" in which lateral gene transfer and/or close associations have been at work (64–67).

We thank Profs. A. Campbell and D. Kaiser and Dr. J. Trent of NASA Ames Research Center for helpful comments on the manuscript. S.K. was supported in part by National Institutes of Health Grant 5R01GM10452-40.

1. Woese, C. R., Kandler, O. & Wheelis, M. L. (1990) Proc. Natl. Acad. Sci. USA 87, 4576–4579.
2. Karlin, S., Brocchieri, L., Campbell, A., Cyert, M. & Mrázek, J. (2005) Proc. Natl. Acad. Sci. USA 102, 7309–7314.
3. Barns, S. M., Delwiche, C. F., Palmer, J. D. & Pace, N. R. (1996) Proc. Natl. Acad. Sci. USA 93, 9188–9193.
4. Gupta, R. S. (1998a) Theor. Popul. Biol. 54, 91–104.
5. Gupta, R. S. (1998) Microbiol. Mol. Biol. Rev. 62, 1435–1491.
6. Poole, A., Jeffares, D. & Penny, D. (1999) BioEssays 21, 880–889.
7. Brocchieri, L. (2001) Theor. Popul. Biol. 59, 27–40.
8. Forterre, P., Brochier, C. & Philippe, H. (2002) Theor. Popul. Biol. 61, 409–422.
9. Gribaldo, S. & Philippe, H. (2002) Theor. Popul. Biol. 61, 391–408.
10. Makarova, K. S. & Koonin, E. V. (2003) Genome Biol. 4, 115–115.17.
11. Castresana, J. (2004) in Respiration in Archaea and Bacteria: Diversity of Prokaryotic Electron Transport Carrier, Advances in Photosynthesis and Respiration, ed. Zannoni, D. (Kluwer Academic, Dordrecht, The Netherlands), Vol. 1, pp. 1–14.
12. Rivera, M. C., Jain, R., Moore, J. E. & Lake, J. A. (1998) Proc. Natl. Acad. Sci. USA 95, 6239–6244.
13. Pace, N. R. (1997) Science 276, 734–740.
14. Saito, R. & Tomita, M. (1999) Gene 238, 79–83.
15. Karlin, S. & Mrázek, J. (2000) J. Bacteriol. 182, 5238–5250.
16. Karlin, S., Mrázek, J., Campbell, A. & Kaiser, D. (2001) J. Bacteriol. 183, 5025–5040.
17. Chen, G. T. & Inouye, M. (1994) Genes Dev. 8, 2641–2652.
18. Karlin, S., Campbell, A. M. & Mrázek, J. (1998) Mol. Microbiol. 29, 1341–1355.
19. Thanaraj, T. A. & Argos, P. (1996) Protein Sci. 5, 1594–1612.
20. Netzer, W. J. & Hartl, F. U. (1998) Trends Biochem. Sci. 23, 68–73.
21. Ma, J., Campbell, A. & Karlin, S. (2002) J. Bacteriol. 184, 5733–5745.
22. Leroux, M. R. (2002) Adv. Appl. Microbiol. 50, 219–277.
23. Lund, P. A., Large, A. T. & Kapatai, G. (2003) Bio. Chem. Trans. 31, 681–685.
24. Trent, J. D., Osipiuk, K. & Pinkau, T. (1990) J. Bacteriol. 172, 1478–1485.
25. Trent, J. D., Nimmesgern, E., Wall, J. S., Hartl, F. U. & Horwich, A. L. (1991) Nature 354, 490–493.
26. Trent, J. D., Kagawa, H. K., Paavola, C. D., McMillan, R. A., Howard, J., Jahnke, L., Lavin, C., Tsegereda, E. & Henze, C. E. (2003) Proc. Natl. Acad. Sci. USA 100, 15589–15594.
27. Macario, A. J. L., Malz, M. & Conway de Macario, E. (2004) Front. Biosci. 9, 1318–1332.
28. Grabowski, B. & Kelman, Z. (2003) Annu. Rev. Microbiol. 57, 487–516.
29. Seitz, E. M., Haseltine, C. A. & Kowalczykowski, S. C. (2001) Adv. Appl. Microbiol. 50, 101–169.
30. Lecompte, O., Ripp, R., Thierry, J.-C., Moras, D. & Poch, O. (2002) Nucleic Acids Res. 30, 5382–5390.
31. Cady, S. L. (2001) Adv. Appl. Microbiol. 50, 3–35.
32. Nisbet, E. G. & Sleep, N. H. (2001) Nature 409, 1083–1091.
33. Xiong, J. & Bauer, C. E. (2002) Annu. Rev. Plant Physiol. Plant Mol. Biol. 53, 503–521.
34. Brochier, C. & Philippe, H. (2002) Nature 417, 244.
35. Towe, K. M. (1990) Nature 348, 54–56.
36. Towe, K. M. (1996) Adv. Space Res. 18, 7–15.
37. Ohmoto, H. (1997) Geochem. News 93, 12–29.
38. Hartl, F. U. & Hayer-Hartl, M. (2002) Science 295, 1852–1858.
39. Teter, S. A., Houry, W. A., Ang, D., Tradler, T., Rockabrand, D., Fischer, G., Blum, P., Georgopoulos, C. & Hartl, F. U. (1999) Cell 97, 755–765.
40. Deuerling, E., Schulze-Specking, A., Tomoyas, T., Mogk, A. & Bukau, B. (1999) Nature 400, 693–696.
41. Siegert, R., Leroux, M. R., Scheufler, C., Hartl, F. W. & Moarefi, I. (2000) Cell 103, 621–632.
42. Spiess, C., Meyer, A. S., Reissmann, S. & Frydman, J. (2004) Trends Cell Biol. 14, 598–604.
43. Houry, W. A., Frishman, D., Eckerskorn, C., Lottspeich, F. & Hartl, F. U. (1999) Nature 402, 147–154.
44. Harms, J., Schlünzen, F., Zarivach, R., Bashan, A., Gat, S., Agmon, I., Bartels, H., Franceschi, F. & Yonath, A. (2001) Cell 107, 679–688.
45. Klein, D. J., Moore, P. B. & Steitz, T. A. (2004) J. Mol. Biol. 340, 141–177.
46. Brocchieri, L. & Karlin, S. (2000) Protein Sci. 9, 476–486.
47. Zhu, Z.-Y. & Karlin, S. (1996) Proc. Natl. Acad. Sci. USA 93, 8350–8355.
48. Karlin, S., Barnett, M. J., Campbell, A. M., Fisher, R. F. & Mrázek, J. (2003) Proc. Natl. Acad. Sci. USA 100, 7313–7318.
49. Kuehn, M. J., Ogg, D. J., Kihlberg, J., Slonim, L. N., Flemmer, K., Bergfors, T. & Hultgren, S. J. (1993) Science 262, 1234–1241.
50. Kramer, G., Rauch, T., Rist, W., Vorderwülbecke, S., Patzelt, H., Schulze-Specking, A., Ban, N., Deuerling, E. & Bukau, B. (2002) Nature 419, 171–174.
51. Gribaldo, S., Lumia, V., Creti, R., de Macario, E. C., Sanangelantoni, A. & Cammarano, P. (1999) J. Bacteriol. 181, 434–443.
52. Gold, I. (1988) Annu. Rev. Biochem. 57, 199–233.
53. Draper, D. E. (1996) in Escherichia coli and Salmonella: Cellular and Molecular Biology, eds. Neidhardt, F. C., Curtiss, R., III, Ingraham, J. L., Lin, E. C. C., Low, K. B., Magasanik, B., Reznikoff, W. S., Riley, M., Schaechter, M. & Umbarger, H. E. (ASM Press, Washington, DC), 2nd Ed., pp. 902–908.
54. Cavicchioli, R., Curmi, P. M. G., Saunders, N. & Thomas, T. (2003) BioEssays 25, 1119–1128.
55. Martin, W. (2004) BioEssays 26, 592–593 (lett.).
56. Cavicchioli, R. & Curmi, P. (2004) BioEssays 26, 593 (lett.).
57. Zillig, W., Klenk, H. P., Palm, P., Puhler, G., Gropp, F., Garrett, R. A. & Leffers, H. (1989) Can. J. Microbiol. 35, 73–80.
58. Gupta, R. S. & Golding, R. B. (1993) J. Mol. Evol. 37, 573–582.
59. Gupta, R. S. & Golding, G. B. (1996) Trends Biochem. Sci. 21, 166–171.
60. Martin, W. & Müller, M. (1998) Nature 392, 37–41.
61. Lopez-Garcia, P. & Moreira, D. (1999) Trends Biochem. Sci. 24, 88–93.
62. Karlin, S., Brocchieri, L., Mrázek, J., Campbell, A. M. & Spormann, A. M. (1999) Proc. Natl. Acad. Sci. USA 96, 9190–9195.
63. Hartman, H. & Fedorov, A. (2002) Proc. Natl. Acad. Sci. USA 99, 1420–1425.
64. Ochman, H., Lawrence, J. G. & Groisman, E. A. (2000) Nature 405, 299–304.
65. Campbell, A. M. (2000) Theor. Popul. Biol. 57, 71–77.
66. Kunin, V. & Ouzounis, C. A. (2003) Genome Res. 13, 1589–1594.
67. Gogarten, J. P., Doolittle, W. F. & Lawrence, J. G. (2002) Mol. Biol. Evol. 19, 2226–2238.

7308 | www.pnas.org/cgi/doi/10.1073/pnas.0502313102 | Karlin et al.