
letters to nature

Acknowledgements. We thank D. McHugh, K. Stephens and J. D. Heath for initial subcloning, purification and crystallization studies; R. Strong, K. Zhang and B. Scott for advice during the crystallographic analysis; and the beamline staff at the Advanced Light Source (NLBL laboratories), beamline 5.0.2, particularly T. Earnest, for assistance. B.L.S. and R.J.M. are funded for this project by the NIH. K.E.F. was supported by an NIH training grant and the American Heart Association. M.S.J. was supported by an NSF fellowship and an NIH training grant.

Correspondence and requests for materials and coordinates should be addressed to B.L.S. (e-mail: [email protected]). Coordinates have been deposited in the Brookhaven Protein Data Bank (accession nos 1ipp, 1a73, 1a74).

corrections

The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus

Hans-Peter Klenk, Rebecca A. Clayton, Jean-Francois Tomb, Owen White, Karen E. Nelson, Karen A. Ketchum, Robert J. Dodson, Michelle Gwinn, Erin K. Hickey, Jeremy D. Peterson, Delwood L. Richardson, Anthony R. Kerlavage, David E. Graham, Nikos C. Kyrpides, Robert D. Fleischmann, John Quackenbush, Norman H. Lee, Granger G. Sutton, Steven Gill, Ewen F. Kirkness, Brian A. Dougherty, Keith McKenney, Mark D. Adams, Brendan Loftus, Scott Peterson, Claudia I. Reich, Leslie K. McNeil, Jonathan H. Badger, Anna Glodek, Lixin Zhou, Ross Overbeek, Jeannine D. Gocayne, Janice F. Weidman, Lisa McDonald, Teresa Utterback, Matthew D. Cotton, Tracy Spriggs, Patricia Artiach, Brian P. Kaine, Sean M. Sykes, Paul W. Sadow, Kurt P. D’Andrea, Cheryl Bowman, Claire Fujii, Stacey A. Garland, Tanya M. Mason, Gary J. Olsen, Claire M. Fraser, Hamilton O. Smith, Carl R. Woese & J. Craig Venter

Nature 390, 364–370 (1997)

The pathway for sulphate reduction is incorrect as published: in Fig. 3 on page 367, adenylyl sulphate 3-phosphotransferase (cysC) is not needed in the pathway as outlined, as adenylyl sulphate reductase (aprAB) catalyses the first step in the reduction of adenylyl sulphate. The correct sequence of reactions is: sulphate is first activated to adenylyl sulphate, then reduced to sulphite and subsequently to sulphide. The enzymes catalysing these reactions are: sulphate adenylyltransferase (sat), adenylylsulphate reductase (aprAB), and sulphite reductase (dsrABD). We thank Jens-Dirk Schwenn for bringing this error to our attention.

Emergence of symbiosis in peptide self-replication through a hypercyclic network

David H. Lee, Kay Severin, Yohei Yokobayashi & M. Reza Ghadiri

Nature 390, 591–594 (1997)

Hypercycles are based on second-order (or higher) autocatalysis and defined by two or more replicators that are connected by another superimposed autocatalytic cycle. Our study describes a mutualistic relationship between two replicators, each catalysing the formation of the other, that are linked by a superimposed catalytic cycle. Although the kinetic data suggest the intermediary of higher-order species in the autocatalytic processes, the present system should not be referred to as an example of a minimal hypercycle in the absence of direct experimental evidence for the autocatalytic cross-coupling between replicators.

Nature © Macmillan Publishers Ltd 1998 NATURE | VOL 394 | 2 JULY 1998 101 letters to nature

(that is, on its left side in Fig. 2a and b), a 231-nm-thick Al0.165Ga0.835As spacer layer was grown with two Si δ-doping layers (1 × 10^12 cm^-2), one inserted 22 nm and the other 187 nm from the left edge of the deep well. A 10-nm-thick undoped GaAs region capped the structure. The spacer layer thickness was adjusted to preserve the same distance between the δ-doping layers and the double-quantum-well system, and therefore the electrostatic potentials are identical in both structures. The δ-doping provides a two-dimensional electron gas in the deep well with a calculated sheet electron density of ns = 4 × 10^11 cm^-2.

For the absorption measurements, we processed our samples in a multipass (six) 45° wedge waveguide. This geometry allowed us to couple in linearly polarized radiation with a component of the polarization normal to the layer (50%) as required by the intersub-band absorption selection rule9. The absorption was measured with a Fourier-transform infrared spectrometer (FTIR) using a step-scan modulation technique10 in which the electron gas in the double well is periodically depopulated by a Ti/Au Schottky barrier contact evaporated on the surface of the sample and the two-dimensional electron gas is contacted by indium balls alloyed into the layer.

The absorption measurements at T = 10 K for both structures are compared in Fig. 3 with the results of numerical calculations using the coupled Schrödinger and Poisson equations. As predicted, the absorption strength at photon energies between the two resonances is strongly suppressed or enhanced by the interference effect depending on the location of the thin barrier, proving that tunnelling through the latter controls the interference effect when the broadening of the states is dominated by tunnelling. However, the finite broadening introduced by interface disorder prevents full quantum interference; this is the main reason for the departure from the calculated profiles and specifically the reason why the absorption does not vanish in the sample with destructive interference. Indeed, linewidth measurements on samples with the same coupled-well structure but with negligible tunnelling to the continuum showed a full-width at half-maximum of the absorption peaks of Γ = 5 meV. This structure consists of an identical double quantum well between two 60-nm-thick Al0.33Ga0.67As barriers. This value is a measure of the non-tunnelling contribution to the broadening of the optical transitions; it is smaller but not negligible compared with the calculated broadening by tunnelling through the 1.5 nm barrier, Γ1 ≈ Γ2 ≈ 16 meV.

Destructive interference in intersub-band absorption in a double-well structure coupled by tunnelling to a continuum has recently been inferred from a fit of the absorption lineshape to a model that included the collision broadening in a phenomenological manner11. The present experiment gives more direct evidence of tunnelling-induced quantum interference by showing that tunnelling can be used to control the sign of the interference.

It is important to stress the difference between the phenomena described here and the Fano interference in intersub-band absorption recently reported by us12. In that work a minimum in the absorption arises because of interference between matrix elements for the ground state to the continuum and to a single resonance coupled by tunnelling to the same continuum. This leads to a strongly asymmetric absorption lineshape. In contrast, in the phenomena studied here interference arises between absorption paths through two resonances coupled to a continuum, and the direct matrix element from the ground state to the continuum is negligible.

These findings are relevant for the design of semiconductor lasers without population inversion (LWI). Such lasing action has so far been observed only in gases4,5. Essential for LWI is nonreciprocity between emission and absorption. A possible semiconductor LWI scheme would use the quantum-well structure of Fig. 2a for the active regions. The latter would be alternated with electron injectors as in quantum cascade lasers13. Electrons would be injected from the thick barrier side at an energy between the two resonances where the absorption cross-section is a minimum, to ensure strong non-reciprocity between intersub-band absorption and emission7,8. Although the realization of such a laser would be scientifically important, its implementation would be difficult and its technological impact limited by the very short lifetime (a few tenths of picoseconds) of the excited state which is required to achieve strong interference14.

Received 23 June; accepted 6 October 1997.

1. Fano, U. Effect of configuration interaction on intensity and phase shifts. Phys. Rev. 124, 1866–1878 (1961).
2. Harris, S. E. Lasers without inversion: interference of lifetime-broadened resonance. Phys. Rev. Lett. 62, 1033–1035 (1989).
3. Arkhipkin, V. G. & Heller, Y. I. Radiation amplification without population at transitions to autoionizing states. Phys. Lett. A 98, 12–14 (1983).
4. Zibrov, S. et al. Experimental demonstration of laser without population inversion via quantum interference in Rb. Phys. Rev. Lett. 75, 1499–1452 (1995).
5. Padmadandu, G. G. et al. Laser oscillation without population inversion in a Sodium atomic beam. Phys. Rev. Lett. 76, 2053–2056 (1996).
6. Boller, K.-J., Imamoglu, A. & Harris, S. E. Observation of electromagnetically induced transparency. Phys. Rev. Lett. 66, 2593–2596 (1991).
7. Imamoglu, A. & Harris, S. E. Lasers without inversion: interference of dressed lifetime-broadened states. Opt. Lett. 13, 1344–1346 (1989).
8. Imamoglu, A. & Ram, R. J. Semiconductor lasers without population inversion. Opt. Lett. 19, 1744–1746 (1994).
9. Levine, B. F., Choi, K. K., Bethea, C. G., Walker, J. & Malik, R. J. New 10 µm infrared detector using intersubband absorption in resonant tunneling GaAlAs superlattices. Appl. Phys. Lett. 50, 1092–1094 (1987).
10. Faist, J. et al. Narrowing of the intersubband electroluminescence spectrum in coupled-quantum-well heterostructures. Appl. Phys. Lett. 65, 94–96 (1994).
11. Smidt, H., Campman, K. L., Gossard, A. C. & Imamoglu, A. Tunnel induced transparency: Fano interference in intersubband transitions. Appl. Phys. Lett. 70, 3455–3457 (1997).
12. Faist, J. et al. Tunable Fano interference in intersubband absorption. Opt. Lett. 21, 985–987 (1996).
13. Capasso, F., Faist, J., Sirtori, C. & Cho, A. Y. Infrared (4–11 µm) quantum cascade lasers. Solid State Commun. 102, 231–236 (1997).
14. Kurghin, J. B. & Rosencher, E. Practical aspects of lasing without inversion in various media. IEEE J. Quant. Electron. QE-32, 1882–1890 (1996).

Acknowledgements. We thank S. Harris, A. Imamoglu, R. J. Ram, A. Tredicucci and C. Gmachl for enlightening discussions, and the authors of ref. 11 for making their manuscript available before publication. The work was supported in part by DARPA/ARO.

Correspondence should be addressed to F.C. (e-mail: [email protected] or [email protected]).

Emergence of symbiosis in peptide self-replication through a hypercyclic network

David H. Lee, Kay Severin, Yohei Yokobayashi & M. Reza Ghadiri

Departments of Chemistry and Molecular Biology, and the Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, California 92037, USA

Symbiosis is an association between different organisms that leads to a reciprocal enhancement of their ability to survive. Similar mutually beneficial relationships can operate at the molecular level in the form of a hypercycle, a collective of two or more self-replicating species interlinked through a cyclic catalytic network1–5. The superposition of cross-catalysis onto autocatalytic replication integrates the members of the hypercycle into a single system that reproduces through a second-order (or higher) form of nonlinear autocatalysis. The hypercycle population as a whole is therefore able to compete more efficiently for existing resources than any one member on its own. In addition, the effects of beneficial mutations of any one member are spread over the entire population. The formation of hypercycles has been suggested as an important step in the transition from inanimate to living chemistry6, and a large number of hypercycles are expected to be embedded within the complex networks of living systems7. But only one naturally occurring hypercycle has been well documented8, while two autocatalytic chemical systems may contain vestiges of hypercyclic organization9,10. Here we report a

Nature © Macmillan Publishers Ltd 1997 NATURE | VOL 390 | 11 DECEMBER 1997 591 letters to nature

chemical system that constitutes a clear example of a minimal hypercyclic network, in which two otherwise competitive self-replicating peptides symbiotically catalyse each others’ production.

The present design of a minimal hypercycle is based on two self-replicating coiled coil peptides R1 and R2 (Fig. 1). The replicator R1 was recently reported11,12 and is produced by ligation of the electrophilic peptide fragment E and the nucleophilic fragment N1. The replicator R2 is made from the same electrophilic fragment but a different nucleophilic peptide fragment N2. The nucleophilic fragments N1 and N2 differ in their sequence at the hydrophobic recognition surface: N1 is composed of valine and leucine whereas N2 is made up of isoleucine and leucine residues. This difference in sequence at the hydrophobic core is known to affect profoundly the aggregation state of coiled coils13,14. Furthermore it is known that conservative mutations in this region of the structure can drastically alter the kinetic behaviour of the replicator11,12,15.

The ability of R2 to self-replicate was determined by observation of characteristics previously established as signatures of self-replication (Fig. 2)11,12. Similar to that of R1, the new replicator R2 also displays a parabolic growth profile. Numerical fitting of the kinetic data obtained for R2 to the empirical rate equations of von Kiedrowski16 gave a background rate constant kb = 0.072 ± 0.005 M^-1 s^-1 and an apparent autocatalytic rate constant ka = 52 ± 1 M^-3/2 s^-1, making R2 more efficient than its relative R1 (kb = 0.063 M^-1 s^-1 and ka = 29.4 M^-3/2 s^-1).

A solution containing all three fragments E, N1 and N2 gave a combinatorial synthesis of both replicators. A priori, one would expect a survival-of-the-fittest situation where the more efficient replicator R2 would overwhelm R1 by consuming the common fragment E more quickly. At first glance, this expectation seemed to be borne out as R2 was produced in greater abundance than R1 (as expected, when molecular interactions are disrupted in the presence of guanidinium hydrochloride, no kinetic preference for R2 over R1 was observed). However, the situation is more interesting and complex. When we sought to give R1 an advantage in this competition by adding 40% R1 (with respect to the concentration) at the start of the reaction, to our surprise the rate of R1 self-production increased by only 1.7 times over the unseeded reaction but the rate of R2 formation was enhanced to a greater extent, by 5.4 times (Table 1, Fig. 3). Thus the two replicators are not mutually exclusive in their growth; R1 catalyses the formation of R2 as well as itself. Likewise, perturbation of the reaction by seeding it with 45% R2 increased not only the rate of R2 production, by 2.9 times, but also that of R1, by 3.5 times. Thus a cross-catalytic cycle is cooperatively coupled with two self-replicating reactions, making this system one which is hypercyclic in nature. There are four characteristic outcomes expected for such a hypercyclic network, depending on the relative efficiencies of the coupled catalytic and autocatalytic reactions2. The observed greater efficiencies of the catalytic reactions over the autocatalytic components of the system are the most desirable outcomes which assure the stability of the hypercycle: production of one species promotes the production of the other to an even greater degree. This particular mode of catalytic coupling prevents one replicator from overwhelming the other and enables the two to reproduce as a single coherent unit.

To verify that R1 and R2 catalyse each other’s production, the

Table 1 Initial rates of product formation

Product    No replicators added    +40% R1    +45% R2
R1         4.8                     8.2        17.0
R2         5.8                     31.1       16.9

The data in this table (in units of 10^-8 M min^-1) are for reactions containing the three peptide fragments in the absence and presence of added replicators.
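The rate enhancements quoted in the text (1.7, 5.4, 2.9 and 3.5 times) can be recomputed from Table 1 as ratios of seeded to unseeded initial rates. The short sketch below only illustrates that arithmetic; the dictionary layout and the function name `fold_enhancement` are illustrative choices, not part of the original study.

```python
# Initial rates from Table 1, in units of 10^-8 M min^-1.
# The layout and names below are illustrative, not from the paper.
RATES = {
    "R1": {"unseeded": 4.8, "+40% R1": 8.2, "+45% R2": 17.0},
    "R2": {"unseeded": 5.8, "+40% R1": 31.1, "+45% R2": 16.9},
}


def fold_enhancement(product, condition):
    """Ratio of the seeded initial rate to the unseeded initial rate."""
    row = RATES[product]
    return row[condition] / row["unseeded"]


if __name__ == "__main__":
    for product in RATES:
        for condition in ("+40% R1", "+45% R2"):
            ratio = fold_enhancement(product, condition)
            print(f"{product}, {condition}: {ratio:.1f} times")
```

In both seeded reactions the larger ratio belongs to the replicator that was not added, which is the cross-catalytic signature the text describes.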

Figure 1 Schematic diagram of a minimal hypercycle based on two self-replicating peptides. Cycles I and III show the self-producing cycles of replicators R1 (dark grey/light grey) and R2 (dark grey/striped) respectively, which pre-organize their constituent fragments thereby promoting peptide ligation. Cycle II, where R1 promotes R2 formation, and cycle IV, where R2 promotes R1 formation, comprise the catalytic components of the hypercycle and allow the replicators to positively regulate each others’ production. The mechanistic details of the present hypercyclic network may be more complex than the minimal system depicted here. Detailed kinetic analyses of the replicator sequences have shown that the autocatalytically productive intermediates involve, at least in part, quaternary complexes in which two template strands pre-organize the reactive peptide fragments (ref. 12 and K. Kumar, D.H.L., M.R.G., unpublished results). The following peptide sequences were employed in this study: replicator 1 (R1), ArCONH-RMKQLEEKVYELLSKVA-CLEXEVARLKKLVGE-CONH2; replicator 2 (R2), ArCONH-RMKQLEEKVYELLSKVA-CLEXEIARLKKLIGE-CONH2; electrophilic fragment (E), ArCONH-RMKQLEEKVYELLSKVA-COSBn; nucleophilic fragment 1 (N1), H2N-CLEXEVARLKKLVGE-CONH2; nucleophilic fragment 2 (N2), H2N-CLEXEIARLKKLIGE-CONH2. Bn, benzyl; Ar, 4-acetamidophenyl; and X, -e-NHCO-Ar.

Figure 2 Production of R2 as a function of time in the presence of various initial concentrations of R2. Open circles, in the absence of any added R2; filled diamonds, in the presence of 4.0 µM; open triangles, in the presence of 21.4 µM; and filled circles, 42.6 µM of initially added R2. Curves were generated by nonlinear least-squares fit of the data to the empirical rate equation of von Kiedrowski using the program SimFit16. Data are an average of two experiments.
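To make the shape of the seeded growth curves in Fig. 2 concrete, the sketch below integrates a rate law of the form d[T]/dt = (ka·[T]^1/2 + kb)·[E][N]. This square-root form is an assumption: the text only names "the empirical rate equation of von Kiedrowski", and the form used here is chosen because it matches the reported units of ka (M^-3/2 s^-1) and kb (M^-1 s^-1). The rate constants and starting concentrations are those reported for R2; the function names, the fixed-step Runge-Kutta integrator and the 1 s step are illustrative choices and do not reproduce the SimFit fitting procedure.

```python
import math

# Values reported in the text for replicator R2.
K_A = 52.0    # apparent autocatalytic rate constant, M^-3/2 s^-1
K_B = 0.072   # background rate constant, M^-1 s^-1


def rate(e, n, t, ka=K_A, kb=K_B):
    """Assumed square-root law: d[T]/dt = (ka*sqrt([T]) + kb) * [E] * [N]."""
    return (ka * math.sqrt(max(t, 0.0)) + kb) * e * n


def simulate(e0, n0, seed, minutes, dt=1.0):
    """Integrate the assumed rate law with fixed-step RK4 (dt in seconds).

    Returns the newly formed product (in M) at each whole minute,
    starting at t = 0 and excluding the seeded template itself.
    """
    def f(p):
        # p is the product formed so far; E and N are consumed 1:1,
        # and total template is the seed plus the new product.
        return rate(e0 - p, n0 - p, seed + p)

    p = 0.0
    steps_per_minute = int(round(60.0 / dt))
    curve = [p]
    for _ in range(minutes):
        for _ in range(steps_per_minute):
            k1 = f(p)
            k2 = f(p + 0.5 * dt * k1)
            k3 = f(p + 0.5 * dt * k2)
            k4 = f(p + dt * k3)
            p += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        curve.append(p)
    return curve


if __name__ == "__main__":
    e0, n0 = 94.2e-6, 104.5e-6  # [E] and [N2] from the Methods, in M
    for seed in (0.0, 4.0e-6, 21.4e-6, 42.6e-6):
        curve = simulate(e0, n0, seed, minutes=120)
        print(f"seed {seed * 1e6:4.1f} uM: {curve[-1] * 1e6:.2f} uM product after 120 min")
```

Under this assumed law the unseeded reaction starts at the background rate kb·[E][N] and accelerates as product accumulates, while seeded reactions start faster in proportion to the square root of the added template.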

reaction mixtures were simplified to include E and only one nucleophile, and then seeded with the template that was not produced in situ (Fig. 4). Comparisons with unseeded reactions revealed that even in these simplified systems one template can promote the formation of the other, giving rate enhancements much larger than what would be expected if the reaction mixture were seeded with the autocatalytic template. Reaction mixtures containing E and N1 that were seeded with 25% R2 enhanced the initial rate of production of R1 from 3.9 × 10^-8 M min^-1 to 1.5 × 10^-7 M min^-1, a 3.8 times increase over the unseeded reaction. Seeding of the same reaction mixture with 25% R1 would improve the rate by only 2.8 times. Similarly, seeding reaction mixtures containing E and N2 with 35% R1 gave a 5.4 times rate enhancement over the 5.0 × 10^-8 M min^-1 rate observed for the reaction without added catalyst. The increase is greater than the 3.6 times enhancement expected for the autocatalytic reaction containing 35% R2.

We now consider the sequence selectivity issues in the formation of the hypercyclic peptide network. The operation of the hypercycle is based on complementary, as well as self-complementary, forms of catalysis. As noted below, there is mounting evidence that both processes are strongly sequence selective. Previously we had shown that in the case of replicator R1, even conservative mutations (Val9Ala, where a valine has been substituted by an alanine at position 9, and Leu26Ala) in the hydrophobic core residues completely abolish the autocatalytic process11,12. In this study we have determined that similar replicator R2 mutations are also autocatalytically infertile. There is also good evidence for high sequence selectivity in the cross-catalytic component of the system. Control studies have indicated that the Leu26Ala R2 mutant cannot cross-catalyse the formation of either replicator R1 or R2. Although in a recent study15 we have shown that the Val9Ala R1 mutant can efficiently cross-catalyse the formation of R1, we have found it to be ineffective in catalysing R2 production. Moreover, in a related study we have shown that diminution in the initial rate of peptide fragment condensation of more than 3 orders of magnitude can be caused even by electrostatic substitutions at the solvent-exposed e and g positions of the heptad repeat sequence17. Although the above studies strongly support high sequence selectivity in the catalytic and autocatalytic components of the hypercyclic network, a significantly large sequence-space must undoubtedly exist that would enable the spontaneous self-organization of even more complex networks. Studies along those lines are under investigation.

The work reported here may have particular relevance to various origin-of-life theories1–4,18. It has been suggested that at the dawn of life the onset of darwinian evolution must have been marked by

Figure 3 Replicators R1 and R2 self-organize into a two-membered hypercyclic network. a, Production of R1 (empty circles) and R2 (empty diamonds) as a function of time for reaction mixtures containing E, N1 and N2. b, Formation of R1 (filled circles) and R2 (filled diamonds) as a function of time for reaction mixtures containing the three fragments and 40% R1. c, Formation of R1 (filled circles) and R2 (filled diamonds) as a function of time for reaction mixtures containing the three fragments and 45% R2. In b and c, product formation in the absence of added templates is shown for comparison. Data are an average of two experiments. Curves are shown to guide the eye.

Figure 4 Replicators R1 and R2 are cross-catalytic. a, Formation of R1 as a function of time for the reaction mixture containing only E and N1 in the absence (empty circles) and in the presence (filled circles) of 35% R2. b, Formation of R2 as a function of time for the reaction mixture containing E and N2 in the absence (empty diamonds) and in the presence (filled diamonds) of 25% R1. Data are an average of two experiments. Curves are shown to guide the eye.

selection based on feedback processes of genotype replication19. It is also likely that molecular genotypes and phenotypes may have been the very same molecules20. Our example of a hypercyclic peptide network supports the idea that peptides could play a role in both hypotheses.

Methods

Self-replication of R2. All reactions were done in 0.6 ml Eppendorf tubes at 23 °C. A stock solution containing E, N2 and the internal standard 4-acetamidobenzoic acid (ABA) was seeded with various amounts of R2. Benzylmercaptan (1 µl) was then added. Reactions were initiated by adding 3-(N-morpholino)propanesulphonic acid (MOPS) buffer (pH = 7.50, 200 mM, 236 µl), giving a total volume of 300 µl and concentrations of [N2] = 104.5 µM, [E] = 94.2 µM, [R2] = 0, 4.0, 21.4 or 42.6 µM, [MOPS] = 157 mM, [ABA] = 40.4 µM. Samples (30 µl) were taken at various time points and quenched with 2% trifluoroacetic acid (TFA) in (70 µl) then stored at −70 °C. Samples were analysed by high pressure liquid chromatography on a Zorbax C8 column using an acetonitrile/water/0.1% TFA gradient while monitoring at 270 nm. The identity of all peptides was determined by mass spectrometry and verified by coinjection with authentic samples. Experiments were done in duplicate.

Determination of hypercyclic organization in the E/N1/N2 mixture. Reactions were done as described above except that the stock solution contained, besides E and ABA, both N1 and N2, and was subsequently seeded with either R1, R2, or water. Reactions were initiated by adding MOPS buffer (pH = 7.50, 200 mM, 236.6 µl), giving a total volume of 300 µl and concentrations of [N1] = 112.5 µM, [N2] = 112.7 µM, [E] = 91.1 µM, [MOPS] = 157.7 mM, [ABA] = 97.1 µM, [R1] = 45.1 µM, [R2] = 50.4 µM.

Verification of the catalytic components of the hypercycle. Reactions were performed as described above except only one nucleophile was present in the reaction mixture and the reaction was seeded with the replicator that was not produced in situ. Initial concentrations are (1) [E] = 88.9 µM, [N1] = 98.2 µM, [R2] = 25.2 µM, [ABA] = 50.5 µM; (2) [E] = 80.4 µM, [N2] = 96.9 µM, [R1] = 35.3 µM, [ABA] = 36.9 µM.

Received 20 June; accepted 20 October 1997.

1. Eigen, M. & Schuster, P. The hypercycle. A principle of natural self-organization. Part A: emergence of the hypercycle. Naturwissenschaften 64, 541–565 (1977).
2. Eigen, M. & Schuster, P. The hypercycle. A principle of natural self-organization. Part B: the abstract hypercycle. Naturwissenschaften 65, 7–41 (1978).
3. Eigen, M. & Schuster, P. The hypercycle. A principle of natural self-organization. Part C: The realistic hypercycle. Naturwissenschaften 65, 341–369 (1978).
4. Eigen, M. Self-organization of matter and the evolution of biological macromolecules. Naturwissenschaften 58, 465–523 (1971).
5. Müller-Herold, U. What is a hypercycle? J. Theor. Biol. 102, 569–584 (1983).
6. Lee, D. H., Severin, K. & Ghadiri, M. R. Autocatalytic networks: the transition from molecular self-replication to molecular ecosystems. Curr. Opin. Chem. Biol. (in the press).
7. Ricard, J. & Noat, G. Electrostatic effects and the dynamics of reactions at the surface of cells. 1. A theory of the ionic control of a complex multi-enzyme system. Eur. J. Biochem. 155, 183–190 (1986).
8. Eigen, M., Biebricher, C. K., Gebinoga, M. & Gardiner, W. C. The hypercycle. Coupling of RNA and protein in the cycle of an RNA bacteriophage. Biochemistry 30, 11005–11018 (1991).
9. Hong, J.-I., Feng, Q., Rotello, V. & Rebek, J. Jr Competition, cooperation and mutation: improving a synthetic replicator by light irradiation. Science 255, 848–850 (1992).
10. Achilles, T. & von Kiedrowski, G. A self-replicating system from three starting materials. Angew. Chem. Int. Edn Engl. 32, 1198–1201 (1993).
11. Lee, D. H., Granja, J. R., Martinez, J. A., Severin, K. S. & Ghadiri, M. R. A self-replicating peptide. Nature 382, 525–528 (1996).
12. Severin, K. S., Lee, D. H., Martinez, J. A. & Ghadiri, M. R. Peptide self-replication via template-directed ligation. Chem. Eur. J. 3, 1017–1024 (1997).
13. Harbury, P. B., Zhang, T., Kim, P. S. & Alber, T. A switch between two-, three- and four-stranded coiled coils in GCN4 leucine zipper mutants. Science 262, 1401–1407 (1993).
14. Hu, J. C., O’Shea, E. K., Kim, P. S. & Sauer, R. T. Sequence requirements for coiled coils: analysis with λ repressor-GCN4 leucine zipper fusions. Science 250, 1400–1403 (1990).
15. Severin, K., Lee, D. H., Martinez, J. A. & Ghadiri, M. R. Dynamic error-correction in an autocatalytic peptide network. Angew. Chem. Int. Edn Engl. (in the press).
16. von Kiedrowski, G. Minimal replicator theory I: parabolic versus exponential growth. Bioorg. Chem. Front. 3, 113–146 (1993).
17. Severin, K., Lee, D. H., Kennan, A. J. & Ghadiri, M. R. A synthetic peptide ligase. Nature 389, 706–709 (1997).
18. Kauffman, S. A. The Origins of Order (Oxford Univ. Press, New York, 1993).
19. Küppers, B.-O. The Origin of Biological Information (MIT Press, Cambridge, MA, 1990).
20. Joyce, G. F. RNA evolution and the origins of life. Nature 338, 217–224 (1989).

Acknowledgements. We thank K. Kumar for discussions. We also thank the Medical Research Council of Canada for a predoctoral fellowship (D.H.L.), and the Deutsche Forschungsgemeinschaft for a postdoctoral fellowship (K.S.).

Correspondence and requests for materials should be addressed to M.R.G. (e-mail: [email protected]).

Kinetic limitations on droplet formation in clouds

P. Y. Chuang*, R. J. Charlson† & J. H. Seinfeld‡

* Department of Environmental Engineering Science, MC 138-78, California Institute of Technology, Pasadena, California 91125, USA
† Departments of Atmospheric Sciences and Chemistry, University of Washington, Seattle, Washington 98195, USA
‡ Division of Engineering and Applied Science and Department of Chemical Engineering, MC 210-41, California Institute of Technology, Pasadena, California 91125, USA

The ‘indirect’ radiative cooling of climate due to the role of anthropogenic aerosols in cloud droplet formation processes (which affect cloud albedo) is potentially large, up to −1.5 W m^-2 (ref. 1). It is important to be able to determine the number concentration of cloud droplets to within a few per cent, as radiative forcing as a result of clouds is very sensitive to changes in this quantity2, but empirical approaches are problematic3–5. The initial growth of a subset of particles known as cloud condensation nuclei and their subsequent ‘activation’ to form droplets are generally calculated with the assumption that cloud droplet activation occurs as an equilibrium process described by classical Köhler theory6,7. Here we show that this assumption can be invalid under certain realistic conditions. We conclude that the poor empirical correlation between cloud droplet and cloud condensation nuclei concentrations is partly a result of kinetically limited growth before droplet activation occurs. Ignoring these considerations in calculations of total cloud radiative forcing based on cloud condensation nuclei concentrations could lead to errors that are of the same order of magnitude as the total anthropogenic greenhouse-gas radiative forcing1.

Cloud droplet activation and subsequent treatments of cloud droplet growth in atmospheric models generally rely on the assumption that pre-activation growth is accurately described by an equilibrium model in which the particle diameter is always at equilibrium with the local supersaturation6,7. The equilibrium relationship between supersaturation and particle size for a particle composed of highly soluble inorganic species can be described by the well-known Köhler equation (curve A, Fig. 1)8. Cloud droplet nuclei (CDN) activate when they grow larger than their critical diameter, Dpc, after which they can grow spontaneously, limited only by growth kinetics. The concept of CDN is distinct from that of CCN in that, whereas CCN are defined as those particles that activate to become cloud droplets within a cloud chamber of fixed or prescribed supersaturation, CDN are those particles that actually activate in the atmosphere under conditions of time-varying supersaturation.

To evaluate the conditions under which the equilibrium activation model is valid, two timescales will be defined. One is the timescale for particle growth that would be required for that particle to remain at equilibrium as the ambient supersaturation ratio increases in a rising air parcel, τe. The other is the timescale for actual change in the droplet size resulting from condensational growth, τg. Hence, if τe ≫ τg then the equilibrium model is reasonable; otherwise, CDN activation, and hence the cloud droplet size distribution, can be accurately predicted only if the kinetics of droplet growth are considered. To calculate τe, the rate of change of the droplet diameter that would be required for that droplet to remain at its equilibrium size, dDpe/dt, is determined from the combination of two effects. First, the time rate of change of supersaturation, dS/dt, can be determined using a simple one-dimensional adiabatic parcel model9. Next, the rate of change of Dpe with respect to supersaturation, dDpe/dS, is determined by differentiating

Nature © Macmillan Publishers Ltd 1997 594 NATURE | VOL 390 | 11 DECEMBER 1997 articles

The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus

Hans-Peter Klenk*, Rebecca A. Clayton*, Jean-Francois Tomb*, Owen White*, Karen E. Nelson*, Karen A. Ketchum*, Robert J. Dodson*, Michelle Gwinn*, Erin K. Hickey*, Jeremy D. Peterson*, Delwood L. Richardson*, Anthony R. Kerlavage*, David E. Graham†, Nikos C. Kyrpides†, Robert D. Fleischmann*, John Quackenbush*, Norman H. Lee*, Granger G. Sutton*, Steven Gill*, Ewen F. Kirkness*, Brian A. Dougherty*, Keith McKenney*, Mark D. Adams*, Brendan Loftus*, Scott Peterson*, Claudia I. Reich†, Leslie K. McNeil†, Jonathan H. Badger†, Anna Glodek*, Lixin Zhou*, Ross Overbeek‡, Jeannine D. Gocayne*, Janice F. Weidman*, Lisa McDonald*, Teresa Utterback*, Matthew D. Cotton*, Tracy Spriggs*, Patricia Artiach*, Brian P. Kaine†, Sean M. Sykes*, Paul W. Sadow*, Kurt P. D’Andrea*, Cheryl Bowman*, Claire Fujii*, Stacey A. Garland*, Tanya M. Mason*, Gary J. Olsen†, Claire M. Fraser*, Hamilton O. Smith*, Carl R. Woese† & J. Craig Venter*

* The Institute for Genomic Research (TIGR), Rockville, Maryland 20850, USA
† Department of Microbiology, University of Illinois, Champaign-Urbana, Illinois 61801, USA
‡ Mathematics and Computer Science Division, Argonne National Laboratory, Illinois 60439, USA

Archaeoglobus fulgidus is the first sulphur-metabolizing organism to have its genome sequence determined. Its genome of 2,178,400 base pairs contains 2,436 open reading frames (ORFs). The information processing systems and the biosynthetic pathways for essential components (nucleotides, amino acids and cofactors) have extensive correlation with their counterparts in the archaeon Methanococcus jannaschii. The genomes of these two Archaea indicate dramatic differences in the way these organisms sense their environment, perform regulatory and transport functions, and gain energy. In contrast to M. jannaschii, A. fulgidus has fewer restriction–modification systems, and none of its genes appears to contain inteins. A quarter (651 ORFs) of the A. fulgidus genome encodes functionally uncharacterized yet conserved proteins, two-thirds of which are shared with M. jannaschii (428 ORFs). Another quarter of the genome encodes new proteins indicating substantial archaeal diversity.

Biological sulphate reduction is part of the global sulphur cycle, ubiquitous in the earth’s anaerobic environments, and is essential to the basal workings of the biosphere. Growth by sulphate reduction is restricted to relatively few groups of prokaryotes; all but one of these are Eubacteria, the exception being the archaeal sulphate reducers in the Archaeoglobales1,2. These organisms are unique in that they are unrelated to other sulphate reducers, and because they grow at extremely high temperatures3. The known Archaeoglobales are strict anaerobes, most of which are hyperthermophilic marine sulphate reducers found in hydrothermal environments2,4 and in subsurface oil fields5. High-temperature sulphate reduction by Archaeoglobus species contributes to deep subsurface oil-well ‘souring’ by producing iron sulphide, which causes corrosion of iron and steel in oil- and gas-processing systems5.

Archaeoglobus fulgidus VC-16 (refs 2, 4) is the type strain of the Archaeoglobales. Cells are irregular spheres with a glycoprotein envelope and monopolar flagella. Growth occurs between 60 and 95 °C, with optimum growth at 83 °C and a minimum division time of 4 h. The organism grows organoheterotrophically using a variety of carbon and energy sources, but can grow lithoautotrophically on hydrogen, thiosulphate and carbon dioxide6. We sequenced the genome of A. fulgidus strain VC-16 as an example of a sulphur-metabolizing organism and to gain further insight into the Archaea7,8 through genomic comparison with Methanococcus jannaschii9.

General features of the genome

The genome of A. fulgidus consists of a single, circular chromosome of 2,178,400 base pairs (bp) with an average of 48.5% G+C content (Fig. 1). There are three regions with low G+C content (<39%), two rich in genes encoding enzymes for lipopolysaccharide (LPS) biosynthesis, and two regions of high G+C content (>53%), containing genes for large ribosomal RNAs, proteins involved in haem biosynthesis (hemAB), and several transporters (Table 1). Because the origins of replication in Archaea are not characterized, we arbitrarily designated base pair one within a presumed non-coding region upstream of one of three areas containing multiple short repeat elements.

Open reading frames. Two independent coding analysis programs and BLASTX10 searches (see Methods) predicted 2,436 ORFs (Figs 1, 2, Tables 1, 2) covering 92.2% of the genome. The average size of the A. fulgidus ORFs is 822 bp, similar to that of M. jannaschii (856 bp), but smaller than that in the completely sequenced eubacterial genomes (949 bp). All ORFs were searched against a non-redundant protein database, resulting in 1,797 putative identifications that were assigned biological roles within a classification system adapted from ref. 11. Predicted start codons are 76% ATG, 22% GTG and 2% TTG. Unlike M. jannaschii, where 18 inteins were found in coding regions, no inteins were identified in A. fulgidus. Compared with M. jannaschii, A. fulgidus contains a large number of gene duplications, contributing to its larger genome size. The average protein relative molecular mass (Mr) in A. fulgidus is 29,753, ranging from 1,939 to 266,571, similar to that observed in other prokaryotes. The isoelectric point (pI) of predicted proteins among sequenced prokaryotes exhibits a bimodal distribution with peaks at pIs of approximately 5.5 and 10.5. The exceptions to this are Mycoplasma genitalium in which the distribution is skewed towards high pI

Nature © Macmillan Publishers Ltd 1997 364 NATURE | VOL 390 | 27 NOVEMBER 1997 articles

Figure 1 Circular representation of the A. fulgidus genome. The outer circle shows predicted protein-coding regions on the plus strand classified by function according to the colour code in Fig. 2 (except for unknowns and hypotheticals, which are in black). Second circle shows predicted protein-coding regions on the minus strand. Third and fourth circles show IS elements (red) and other repeats (green) on the plus and minus strand. Fifth and sixth circles show tRNAs (blue), rRNAs (red) and sRNAs (green) on the plus and minus strand, respectively.

Table 1 Genome features

General
  Chromosome size: 2,178,400 bp
  Protein coding regions: 92.2%
  Stable RNAs: 0.4%

Predicted protein coding sequences: 2,436 (1.1 per kb)
  Identified by database match: 1,797
    putative function assigned: 1,096
    homologues of M. jannaschii ORFs: 916
    conserved hypothetical proteins: 651
  No database match: 639
  Members of 242 paralogous families: 719
  Members of 158 families with known functions: 475

Stable RNAs (coordinates)
  16S rRNA: 1,790,478–1,788,987
  23S rRNA: 1,788,751–1,785,820
  5S rRNA: 81,144–81,021
  7S RNA: 798,067–798,376
  RNase P: 86,281–86,032
  46 species of tRNA: no significant clusters
  tRNAs with 15–62 bp introns: AspGUC, GluUUC, LeuCAA, TrpCCA, TyrGUA

Distinct G+C content regions (coordinates)
  HGC-1, >53% G+C: 1,786,000–1,797,000
  HGC-2, >53% G+C: 2,158,000–2,159,000
  LGC-1, <39% G+C: 281,000–284,000
  LGC-2, <39% G+C: 544,000–550,000
  LGC-3, <39% G+C: 1,175,000–1,177,000

Short, non-coding repeats (sequence; coordinates)
  SR-1A: CTTTCAATCCCATTTTGGTCTGATTTCAAC; 147–4,213
  SR-1B: CTTTCAATCCCATTTTGGTCTGATTTCAAC; 398,368–401,590
  SR-2: CTTTCAATCTCCATTTTCAGGGCCTCCCTTTCTTA; 1,690,930–1,694,104

Long, coding repeats (length; copy number)
  LR-01 NADH-flavin: 1,886 bp; 2 copies
  LR-02 NifS, NifU + ORF: 1,549 bp; 2 copies
  LR-03 ISA1214 putative transposase + ISORF2: 1,214 bp; 6 copies
  LR-04 ISA1083 putative transposase + ISORF2: 1,083 bp; 3 copies
  LR-05 type II secretion system protein: 1,014 bp; 4 copies
  LR-06 ISA0963 putative transposase: 963 bp; 7 copies
  LR-07 homologue of MJ0794: 836 bp; 3 copies
  LR-08 conserved hypothetical protein: 696 bp; 2 copies
  LR-09 conserved hypothetical protein: 628 bp; 2 copies
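Several of the figures in Table 1 can be cross-checked against one another and against the ORF statistics quoted in the text. The following is a minimal sketch of that arithmetic, not part of the authors' analysis: the constant and function names are illustrative, and the coding-fraction estimate assumes that ORFs do not overlap and that the rounded mean ORF size of 822 bp is adequate, so it only approximates the reported 92.2%.

```python
# Values transcribed from Table 1 and the "Open reading frames" text.
# All names below are illustrative assumptions, not identifiers from the paper.
GENOME_BP = 2_178_400
TOTAL_ORFS = 2_436
WITH_DB_MATCH = 1_797
NO_DB_MATCH = 639
CONSERVED_HYPOTHETICAL = 651
PARALOGOUS_MEMBERS = 719
MEAN_ORF_BP = 822  # average ORF size quoted in the text


def table1_checks():
    """Return simple consistency figures derived from the Table 1 entries."""
    return {
        # ORF density, reported as 1.1 per kb
        "orfs_per_kb": TOTAL_ORFS / (GENOME_BP / 1000),
        # ORFs with and without a database match should sum to the total
        "match_plus_no_match": WITH_DB_MATCH + NO_DB_MATCH,
        # ORFs with no match plus conserved hypotheticals
        "no_assigned_role": NO_DB_MATCH + CONSERVED_HYPOTHETICAL,
        # share of genes in paralogous families, reported as 30%
        "paralogous_fraction": PARALOGOUS_MEMBERS / TOTAL_ORFS,
        # rough coding fraction, assuming non-overlapping ORFs
        "approx_coding_fraction": TOTAL_ORFS * MEAN_ORF_BP / GENOME_BP,
    }


if __name__ == "__main__":
    for name, value in table1_checks().items():
        print(f"{name}: {value:.4g}")
```

Under these assumptions the database-match counts add up to the 2,436 predicted ORFs, and the paralogous-family share comes out just under 30%, in line with the figure given in the text.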


(median, 9.8) and A. fulgidus where the skew is toward low pI (median, 6.3).

Multigene families. In A. fulgidus 719 genes (30% of the total) belong to 242 families with two or more members (Table 1). Of these families, 157 contained genes with biological roles. Most of these families contain genes assigned to the ‘energy metabolism’, ‘transport and binding proteins’, and ‘fatty acid and phospholipid metabolism’ categories (Table 2). The superfamily of ATP-binding subunits of ABC transporters is the largest, containing 40 members. The importance of catabolic degradation and signal recognition systems is reflected by the presence of two large superfamilies: acyl-CoA ligases and signal-transducing histidine kinases. A. fulgidus does not contain a homologue of the large 16-member family found in M. jannaschii9.

Repetitive elements. Three regions of the A. fulgidus genome contain short (<40 bp) direct repeats (Table 1). Two regions (SR-1A and SR-1B) contain 48 and 60 copies, respectively, of an identical 30-bp repeat interspersed with unique sequences averaging 40 bp. The third region (SR-2) contains 42 copies of a 37-bp repeat similar in sequence to the SR-1 repeat and interspersed with unique sequence averaging 41 bp. These repeated sequences are similar to the short repeated sequences found in M. jannaschii.

Nine classes of long (>500 bp) repeated sequences with ≥95% sequence identity were found (LR1–LR9; Table 1). LR-3 is a novel element with 14-bp inverted repeats and two genes, one of which has weak similarity to a transposase from Halobacterium salinarium. One copy of LR-3 interrupts AF2090, a homologue of a large M. jannaschii gene encoding a protein of unknown function. LR-4 and LR-6 encode putative transposases not identified in M. jannaschii that may represent IS elements. The remaining LR elements are not similar to known IS elements.

Figure 2 Linear representation of the A. fulgidus genome illustrating the location of each predicted protein-coding region, RNA gene, and repeat element in the genome. Symbols for the transporters are as follows: AsO, arsenite; COH, sugar; Pi, phosphate; aa2, dipeptide; NH₄⁺, ammonium; a/o, arginine/lysine/ornithine; s/p, spermidine/putrescine; glyc, glycerol; Cl⁻, chloride; Fe²⁺, iron(II); Fe³⁺, iron(III); I, L, V, branched-chain amino acids; P, proline; pan, pantothenate; rib, ribose; lac, lactate; Mg²⁺/Co²⁺, magnesium and cobalt; gln, glutamine; NO₃⁻, nitrate; ox/for, oxalate/formate; maln, malonic acid; Hg²⁺, mercury; , polysaccharide; SO₄²⁻, sulphate; OCN⁻, cyanate; hex, hexuronate; phs, polysialic acid; K⁺, potassium channel; H⁺/Na⁺, sodium/proton antiporter; Na⁺/Cl⁻, sodium- and chloride-dependent transporter; P/G, osmoprotection protein; Cu²⁺, copper-transporting ATPase; +?, cation-transporting ATPase; ?, ABC-transporter without known function. Triplets associated with tRNAs represent the anticodon sequence. Numbers associated with GES represent the number of membrane-spanning domains (MSDs) according to the Goldman, Engelman and Steitz scale as determined by TopPred39. Genes whose identification is based on genes in M. jannaschii are indicated by circles. Of the 236 proteins containing at least one MSD, 124 had two or more MSDs.

Central intermediary and energy metabolism

Sulphur oxide reduction may be the dominant respiratory process in anaerobic marine and freshwater environments, and is an important aspect of the sulphur cycle in anaerobic ecosystems12. In this pathway, sulphate (SO₄²⁻) is first activated to adenylylsulphate (adenosine-5′-phosphosulphate; APS), then reduced to sulphite and subsequently to sulphide1,13 (Fig. 3). The most important enzyme in dissimilatory sulphate reduction, adenylylsulphate reductase, reduces the activated sulphate to sulphite, releasing AMP. In A. fulgidus, the APS reductase has a high degree of similarity and identical physiological properties to APS reductases in sulphate-reducing delta proteobacteria14. A desulphoviridin-type sulphite reductase then adds six electrons to sulphite to produce sulphide. As in the Eubacteria, three sulphite-reductase genes, dsrABD, constitute an operon. The genes for adenylylsulphate reductase and sulphate adenylyltransferase reside in a separate operon. In A. fulgidus, sulphate can be replaced as an electron acceptor by both thiosulphate (S₂O₃²⁻) and sulphite (SO₃²⁻), but not by elemental sulphur.

A. fulgidus VC-16 has been shown to use lactate, pyruvate, methanol, ethanol, 1-propanol and formate as carbon and energy sources2. Glucose has been described as a carbon source1, but neither an uptake-transporter nor a catabolic pathway could be identified. Although it has been reported that A. fulgidus is incapable of growth on acetate6, multiple genes for acetyl-CoA synthetase (which converts acetate to acetyl-CoA) were found. The organism may degrade a variety of hydrocarbons and organic acids because of the presence of 57 β-oxidation enzymes, at least one lipase, and a minimum of five types of ferredoxin-dependent oxidoreductases (Fig. 3). The predicted β-oxidation system is similar to those in Eubacteria and mitochondria, and has not previously been described in the Archaea. Escherichia coli requires both the fadD and fadL gene products to import long-chain fatty acids across the cell envelope into the cytosol15. A. fulgidus has 14 acyl-CoA ligases related to FadD, but as expected given that it has no outer membrane, no FadL. In E. coli, FadB has several metabolic functions, but in A. fulgidus these functions seem to be distributed among separate enzymes. For example, AF0435 encodes an orthologue of enoyl-CoA hydratase and resembles the amino-terminal domain of FadB. This gene is immediately upstream of a gene encoding an orthologue of 3-hydroxyacyl-CoA dehydrogenase that resembles the carboxy-terminal domain of FadB.

Acetyl-CoA is degraded by A. fulgidus through a C1-pathway, not by the citric acid cycle or glyoxylate bypass6,16,17. This degradation is catalysed through the carbon monoxide dehydrogenase (CODH) pathway that consists of a five-subunit acetyl-CoA decarbonylase/synthase complex (ACDS) and five enzymes that are typically involved in methanogenesis18. In A. fulgidus, however, reverse methanogenesis occurs, resulting in CO2 production. All of the enzymes and cofactors of methanogenesis from formylmethanofuran to N5-methyltetrahydromethanopterin are used, but the absence of methyl-CoM reductase eliminates the possibility of methane production by conventional pathways. Production of trace amounts of methane (<0.1 μmol ml⁻¹)19 is probably a result of the reduction of N5-methyltetrahydromethanopterin to methane and tetrahydromethanopterin by carbon monoxide (CO) dehydrogenase.

A. fulgidus also contains genes suggesting it has a second CO dehydrogenase system, homologous to that which enables Rhodospirillum rubrum to grow without light using CO as its sole energy source. Genes were detected for the nickel-containing CO dehydrogenase (CooS), an iron–sulphur protein, and a protein associated with the incorporation of nickel in CooS. These represent elements of a system that could catalyse the conversion of CO and H2O to CO2 and H2.

In contrast to M. jannaschii, A. fulgidus contains genes representing multiple catabolic pathways. Systems include CoA-SH-dependent ferredoxin oxidoreductases specific for pyruvate, 2-ketoisovalerate, 2-ketoglutarate and indolepyruvate, as well as a 2-oxoacid ferredoxin oxidoreductase with little specificity20,21. Four genes with similarity to the tungsten-containing aldehyde ferredoxin oxidoreductase were also found22.

Biochemical pathways characteristic of eubacterial metabolism, including the pentose-phosphate pathway, the Entner–Doudoroff pathway, glycolysis and gluconeogenesis, are either completely absent or only partly represented (Fig. 3). A. fulgidus does not have typical eubacterial polysaccharide biosynthesis machinery, yet it has been shown to produce a protein and carbohydrate-containing biofilm23. Nitrogen is obtained by importing inorganic molecules or degrading amino acids (Fig. 3); neither a glutamate dehydrogenase nor a relevant fix or nif gene is present.

The F420H2:quinone oxidoreductase complex24 is recognized as

the main generator of proton-motive force. However, our analysis indicates the presence of heterodisulphide reductase and several molybdopterin-binding oxidoreductases, with polysulphide, nitrate, dimethyl sulphoxide, and thiosulphate as potential substrates, which might contribute to energizing the cell membrane. A. fulgidus contains a large number of flavoproteins, iron–sulphur proteins and iron-binding proteins that contribute to the general intracellular flow of electrons (Fig. 3). Detoxification enzymes include a peroxidase/catalase, an alkyl-hydroperoxide reductase, arsenate reductase, and eight NADH oxidases, presumably catalysing the

[Figure 3 diagram; panels: Energy production, Biosynthetic pathways, Degradative pathways]

Figure 3 An integrated view of metabolism and solute transport in A. fulgidus. Biochemical pathways for energy production, biosynthesis of organic compounds, and degradation of amino acids, aldehydes and acids are shown with the central components of A. fulgidus metabolism, sulphate, lactate and acetyl-CoA highlighted. Pathways or steps for which no enzymes were identified are represented by a red arrow. A question mark is attached to pathways that could not be completely elucidated. Macromolecular biosynthesis of RNA, DNA and ether lipids have been omitted. Membrane-associated reactions that establish the proton-motive force (PMF) and generate ATP (electron transport and V1V0-ATPase) are linked to cytosolic pathways for energy production. The oxalate-formate antiporters (oxlT) may also contribute to the PMF by mediating electrogenic anion exchange. Each gene product with a predicted function in ion or solute transport is illustrated. Proteins are grouped by substrate specificity with transporters for cations (green), anions (red), carbohydrates/organic alcohols/acids (yellow), and amino acids/peptides/amines (blue) depicted. Ion-coupled permeases are represented by ovals (mae1, exuT, panF, lctP, arsB, cynX, napA/nhe2, amt, feoB, trkAH, cat and putP encode transporters for malate, hexuronate, pantothenate, lactate, arsenite, cyanate, sodium, ammonium, iron (II), potassium, arginine/lysine and proline, respectively). ATP-binding cassette (ABC) transport systems are shown as composite figures of ovals, diamonds and circles (proVWX, glnHPQ, dppABCDF, potABCD, braCDEFG, hemUV, nrtBC, cysAT, pstABC, rbsAC, rfbAB correspond to gene products for proline, glutamine, dipeptide, spermidine/putrescine, branch-chain amino acids, iron (III), nitrate, sulphate, phosphate, ribose and polysialic acid transport, respectively). All other porters are drawn as rectangles (glpF, glycerol uptake facilitator; copB, copper transporting ATPase; corA, magnesium and cobalt transporter). Export and import of solutes is designated by arrows. The number of paralogous genes encoding each protein is indicated in brackets for cytoplasmic enzymes, or within the figure for transporters.

Abbreviations: acs, acetyl-CoA synthetase; aor, aldehyde ferredoxin oxidoreductase; aprAB, adenylylsulphate reductase; aspBC, aspartate aminotransferase; cdh, acetyl-CoA decarbonylase/synthase complex; cysC, adenylylsulphate 3-phosphotransferase; dld, D-lactate dehydrogenase; dsrABD, sulphite reductase; eno, enolase; fadA/acaB, 3-ketoacyl-CoA thiolase; fadD, long-chain-fatty-acid-CoA ligase; fad, enoyl-CoA hydratase; fadE (acd), acyl-CoA dehydrogenase; glpA, glycerol-3-phosphate dehydrogenase; glpK, glycerol kinase; gltB, glutamate synthase; hbd, 3-hydroxyacyl-CoA dehydrogenase; ilvE, branched-chain amino-acid aminotransferase; iorAB, indolepyruvate ferredoxin oxidoreductase; korABDG, 2-ketoglutarate ferredoxin oxidoreductase; lldD, L-lactate dehydrogenase; mcmA, methylmalonyl-CoA mutase; mdhA, L-malate dehydrogenase; oadAB, oxaloacetate decarboxylase; orAB, 2-oxoacid ferredoxin oxidoreductase; pflD, pyruvate formate lyase 2; porABDG, pyruvate ferredoxin oxidoreductase; ppsA, phosphoenolpyruvate synthase; prsA, ribose-phosphate pyrophosphokinase; sucAB, 2-ketoglutarate dehydrogenase; sat, sulphate adenylyltransferase; TCA, tricarboxylic acid cycle; vorABDG, 2-ketoisovalerate ferredoxin oxidoreductase.

four-electron reduction of molecular oxygen to water, with the concurrent regeneration of NAD.

Transporters

A. fulgidus may synthesize several transporters for the import of carbon-containing compounds, probably contributing to its ability to switch from autotrophic to heterotrophic growth5. Both M. jannaschii and A. fulgidus have branched-chain amino-acid ABC transport systems and a transporter for the uptake of arginine and lysine. A. fulgidus encodes proteins for dipeptide, spermidine/putrescine, proline/glycine-betaine and glutamine uptake, as well as transporters for sugars and acids, rather like the membrane systems described in eubacterial heterotrophs. These compounds provide the necessary substrates for numerous biosynthetic and degradative pathways (Fig. 3).

Many A. fulgidus redox proteins are predicted to require iron. Correspondingly, iron transporters have been identified for the import of both oxidized (Fe³⁺) and reduced (Fe²⁺) forms of iron. There are duplications in functional and regulatory genes in both systems. The uptake of Fe³⁺ may depend on haemin or a haemin-like compound because A. fulgidus has orthologues to the eubacterial hem transport system proteins, HemU and HemV. A. fulgidus may also use the regulatory protein Fur to modulate Fe³⁺ transport; this protein is not present in M. jannaschii. Fe²⁺ uptake occurs through a modified Feo system containing FeoB. This is the third example of an isolated feoB gene: M. jannaschii and Helicobacter pylori also appear to lack feoA, implying that FeoA is not essential for iron transport in these organisms.

A complex suite of proteins regulates ionic homeostasis. Ten distinct transporters facilitate the flux of the physiological ions K⁺, Na⁺, NH₄⁺, Mg²⁺, Fe²⁺, Fe³⁺, NO₃⁻, SO₄²⁻ and inorganic phosphate (Pi). Most of these transporters have homologues in M. jannaschii and are therefore likely to be critical for nutrient acquisition during autotrophic growth. A. fulgidus has additional ion transporters for the elimination of toxic compounds including copper, cyanate and arsenite. As in M. jannaschii, the A. fulgidus genome contains two paralogous operons of cobalamin biosynthesis-cobalt transporters, cbiMQO.

Sensory functions and regulation of gene expression

Consistent with its extensive energy-producing metabolism and versatile system for carbon utilization, A. fulgidus has complex sensory and regulatory networks. These networks contain over 55 proteins with presumed regulatory functions, including members of the ArsR, AsnC and Sir2 families, as well as several iron-dependent repressor proteins. There are at least 15 signal-transducing histidine kinases, but only nine response regulators; this difference suggests there is a high degree of cross-talk between kinases and regulators. Only four response regulators appear to be in operons with histidine kinases, including those in the methyl-directed chemotaxis system (Che), which lies adjacent to the flagellar biosynthesis operon. Although rich in regulatory proteins, A. fulgidus apparently lacks regulators for response to amino-acid and carbon starvation as well as to DNA damage. Finally, A. fulgidus contains a homologue of the mammalian mitochondrial benzodiazepine receptor, which functions as a sensor in signal-transduction pathways25. These receptors have been previously identified only in Proteobacteria and Cyanobacteria25.

Replication, repair and cell division

A. fulgidus possesses two family B DNA polymerases, both related to the catalytic subunit of the eukaryal delta polymerase, as previously observed in the Sulfolobales26. It also has a homologue of the proofreading ε subunit of E. coli Pol III, not previously observed in the Archaea. The DNA repair system is more extensive than that found in M. jannaschii, including a homologue of the eukaryal Rad25, a 3-methyladenine DNA glycosylase, and exodeoxynuclease III. As well as reverse gyrase, topoisomerase I (ref. 9), and topoisomerase VI (ref. 27), the genes for the first archaeal DNA gyrase were identified.

A. fulgidus lacks a recognizable type II restriction-modification system, but contains one type I system. In contrast, two type II and three type I systems were identified in M. jannaschii. No homologue of the M. jannaschii thermonuclease was identified.

The cell-division machinery is similar to that of M. jannaschii, with orthologues of eubacterial fts and eukaryal cdc genes. However, several cdc genes found in M. jannaschii, including homologues of cdc23, cdc27, cdc47 and cdc54, appear to be absent in A. fulgidus.

Transcription and translation

A. fulgidus and M. jannaschii have transcriptional and translational systems distinct from their eubacterial and eukaryal counterparts. In both, the RNA polymerase contains the large universal subunits and five smaller subunits found in both Archaea and eukaryotes. Transcription initiation is a simplified version of the eukaryotic mechanism28,29. However, A. fulgidus alone has a homologue of eukaryotic TBP-interacting protein 49 not seen in M. jannaschii, but apparently present in Sulfolobus solfataricus.

Translation in A. fulgidus parallels M. jannaschii with a few exceptions. The organism has only one rRNA operon with an Ala-tRNA gene in the spacer and lacks a contiguous 5S rRNA gene. Genes for 46 tRNAs were identified, five of which contain introns in the anticodon region that are presumably removed by the intron excision enzyme EndA. The gene for selenocysteine tRNA (SelC) was not found, nor were the genes for SelA, SelB and SelD. With the exception of Asp-tRNAGTC and Val-tRNACAC, tRNA genes are not linked in the A. fulgidus genome. The RNA component of the tRNA maturation enzyme RNase P is present. Both A. fulgidus and M. jannaschii appear to possess an enzyme that inserts the tRNA-modified nucleoside archaeosine, but only A. fulgidus has the related enzyme that inserts the modified base queuine.

Both A. fulgidus and M. jannaschii lack glutaminyl-tRNA synthetase and asparaginyl-tRNA synthetase; the relevant tRNAs are presumably aminoacylated with glutamic and aspartic acids, respectively. An enzymatic in situ transamidation then converts the amino acid to its amide form, as seen in other Archaea and in Gram-positive Eubacteria30. Indeed, genes for the three subunits of the Glu-tRNA amidotransferase (gatABC) have been identified in A. fulgidus. The Lys aminoacyl-tRNA synthetase in both organisms is a class I-type, not a class II-type31. A. fulgidus possesses a normal tRNA synthetase for both Cys and Ser, unlike M. jannaschii in which the former was not identifiable and the latter was unusual9.

M. jannaschii has a single gene belonging to the TCP-1 chaperonin family, whereas A. fulgidus has two that encode subunits α and β of the thermosome. Phylogenetic analysis of the archaeal TCP-1 family indicates that these A. fulgidus genes arose by a recent species-specific gene duplication, as is the case for the two subunits of the Thermoplasma acidophilum thermosome32 and the Sulfolobus shibatae rosettasome33. As in M. jannaschii, no dnaK gene was identified.

Biosynthesis of essential components

Like most autotrophic microorganisms, A. fulgidus is able to synthesize many essential compounds, including amino acids, cofactors, carriers, purines and pyrimidines. Many of these biosynthetic pathways show a high degree of conservation between A. fulgidus and M. jannaschii. These two Archaea are similar in their biosynthetic pathways for siroheme, cobalamin, molybdopterin, riboflavin, thiamin and nicotinate, the role category with greatest conservation between these two organisms being amino-acid biosynthesis. Of 78 A. fulgidus genes assigned to amino-acid biosynthetic pathways, at least 73 (94%) have homologues in M. jannaschii. For both archaeal species, amino-acid biosynthetic pathways resemble those of Bacillus subtilis more closely than

those of E. coli. For example, in A. fulgidus and M. jannaschii, tryptophan biosynthesis is accomplished by seven enzymes, TrpA, B, C, D, E, F, G as in B. subtilis, rather than by five enzymes, TrpA, B, C, D, E (including the bifunctional TrpC and TrpD) as found in E. coli.

No biotin biosynthetic genes were identified, yet biotin can be detected in A. fulgidus cell extracts34, and several genes encode a biotin-binding consensus sequence. Similarly, A. fulgidus lacks the genes for pyridoxine biosynthesis although pyridoxine can be found in cell extracts (albeit at lower levels than seen in E. coli and several Archaea34). No gene encoding ferrochelatase, the terminal enzyme in haem biosynthesis, has been identified, although A. fulgidus is known to use cytochromes34. These cofactors may be obtained by mechanisms that we have not recognized. Although all of the enzymes required for pyrimidine biosynthesis appear to be present, three enzymes in the purine pathway (GAR transformylase, AICAR formyltransferase and the ATPase subunit of AIR carboxylase) have not been identified, presumably because they exist as new isoforms.

The Archaea share a unique cell membrane composed of ether lipids containing a glycerophosphate backbone with a 2,3-sn stereochemistry35 for which there are multiple biosynthetic pathways36. In the case of Halobacterium cutirubrum, the backbone is apparently obtained by enantiomeric inversion of sn-glycerol-3-phosphate; in Sulfolobus acidocaldarius and Methanobacterium thermoautotrophicum, sn-glycerol-1-phosphate dehydrogenase builds the backbone from dihydroxyacetonephosphate. An orthologue of sn-glycerol-1-phosphate dehydrogenase has been identified in A. fulgidus, suggesting that the latter pathway is present.

Conclusions

Although A. fulgidus has been studied since its discovery ten years ago1, the completed genome sequence provides a wealth of new information about how this unusual organism exploits its environment. For example, its ability to reduce sulphur oxides has been well characterized, but genome sequence data demonstrate that A. fulgidus has a great diversity of electron transport systems, some of unknown specificity. Similarly, A. fulgidus has been characterized as a scavenger with numerous potential carbon sources, and its gene complement reveals the extent of this capability. A. fulgidus appears to obtain carbon from fatty acids through β-oxidation, from degradation of amino acids, aldehydes and organic acids, and perhaps from CO.

A. fulgidus has extensive gene duplication in comparison with other fully sequenced prokaryotes. For example, in the fatty acid and phospholipid metabolism category, there are 10 copies of 3-

highly conserved, with approximately 80% of the A. fulgidus biosynthetic genes having homologues in M. jannaschii. In contrast, only 35% of the A. fulgidus central intermediary metabolism genes have homologues, reflecting their minimal metabolic overlap.

Over half of the A. fulgidus ORFs (1,290) have no assigned biological role. Of these, 639 have no database match. The remaining 651, designated ‘conserved hypothetical proteins’, have sequence similarity to hypothetical proteins in other organisms, two-thirds with apparent homologues in M. jannaschii. These shared hypothetical proteins will probably add to our understanding of the genetic repertoire of the Archaea. Analysis of the A. fulgidus and other archaeal and eubacterial genomes will provide the information necessary to begin to define a core set of archaeal genes, as well as to better understand prokaryotic diversity.

Methods

Whole-genome random sequencing procedure. The type strain, A. fulgidus VC-16, was grown from a culture derived from a single cell isolated by optical tweezers37 and provided by K. O. Stetter (University of Regensburg). Cloning, sequencing and assembly were essentially as described previously for genomes sequenced by TIGR9,38–40. One small-insert and one medium-insert plasmid library were generated by random mechanical shearing of genomic DNA. One large-insert lambda (λ) library was generated by partial Tsp509I digestion and ligation to λ-DASHII/EcoRI vector (Stratagene). In the initial random sequencing phase, 6.7-fold sequence coverage was achieved with 27,150 sequences from plasmid clones (average read length 500 bases) and 1,850 sequences from λ-clones. Both plasmid and λ-sequences were jointly assembled using TIGR assembler41, resulting in 152 contigs separated by sequence gaps and five groups of contigs separated by physical gaps. Sequences from both ends of 560 λ-clones served as a genome scaffold, verifying the orientation, order and integrity of the contigs. Sequence gaps were closed by editing the ends of sequence traces and/or primer walking on plasmid or λ-clones spanning the respective gap. Physical gaps were closed by combinatorial polymerase chain reaction (PCR) followed by sequencing of the PCR product. At the end of gap closure, 90 regions representing 0.33% of the genome had only single-sequence coverage. These regions were confirmed with terminator reactions to ensure a minimum of 2-fold sequence coverage for the whole genome. The final genome sequence is based on 29,642 sequences, with a 6.8-fold sequence coverage. The linkage between the terminal sequences of 2,101 clones from the small-insert plasmid library (average size 1,419 bp) and 8,726 clones from the medium-insert plasmid library (average size 2,954 bp) supported the genome scaffold formed by the λ-clones (average size 16,381 bp), with 96.9% of the genome covered by λ-clones. The reported sequence differs in 20 positions from the 14,389 bp of DNA in a total of 11 previously published A. fulgidus genes.

ORF prediction and identification.
Coding regions (ORFs) were hydroxyacyl-CoA dehydrogenase, 12 copies of 3-ketoacyl-CoA identified using a combination strategy based on two programs. Initial sets of thiolase, and 12 of acyl-CoA dehydrogenase. The duplicated pro- ORFs were derived with GeneSmith (H.O.S., unpublished), a program that teins are not identical, and their presence suggests considerable evaluates ORF length, separation and overlap between ORFs, and with metabolic differentiation, particularly with respect to the pathways CRITICA (J.H.B. & G.J.O., unpublished), a coding region identification tool for decomposing and recycling carbon by scavenging fatty acids. using comparative analysis. The two largely overlapping sets of ORFs were Other categories show similar, albeit less dramatic, gene redun- merged into one joint set containing all members of both initial sets. ORFs were dancy. For example, there are six copies of acetyl-CoA synthetase searched against a non-redundant protein database using BLASTX10 and those and four aldehyde ferredoxin oxidoreductases for , as shorter than 30 codons ‘coding’ for proteins without a database match were well as four copies of aspartate aminotransferase for amino-acid eliminated. Frameshifts were detected and corrected where appropriate as biosynthesis. These observations, together with the large number of described previously40. Remaining frameshifts are considered authentic and paralogous gene families, suggest that gene duplication has been an corresponding regions were annotated as ‘authentic frameshift’. In total, 527 important evolutionary mechanism for increasing physiological hidden Markov models, based upon conserved protein families ( version diversity in the Archaeoglobales. 2.0), were searched with HMMER to determine ORF membership in families A comparison of two archaeal genomes is inadequate to assess the and superfamilies42. Families of paralogous genes were constructed as described diversity of the entire domain. 
Given this caveat, it is nevertheless previously40. TopPred43 was used to identify membrane-spanning domains in possible to draw some preliminary conclusions from the compar- proteins. ison of M. jannaschii and A. fulgidus. A comparison of the gene Received 9 September; accepted 4 November 1997. content of these Archaea reveals that gene conservation varies 1. Stetter, K. O., Lauerer, G., Thomm, M. & Neuner, A. Isolation of extremely thermophilic sulfate significantly between role categories, with genes involved in tran- reducers: Evidence for a novel branch of archaebacteria. Science 236, 822–824 (1987). scription, translation and replication highly conserved; approxi- 2. Stetter, K. O., in The Prokaryotes (eds Balows, A., Tru¨per, H. G., Dworkin, M., Harder, W. & Schleifer, K. H.) 707–711 (Springer, Berlin, 1992). mately 80% of the A. fulgidus genes in these categories have 3. Stetter, K. O. Microbial life in hyperthermal environments: Microorganisms from exotic environ- homologues in M. jannaschii. Biosynthetic pathways are also ments continue to provide surprises about life’s extremities. ASM News 61, 285–290 (1995).


4. Stetter, K. O. Archaeoglobus fulgidus gen. nov., sp. nov.: a new taxon of extremely thermophilic archaebacteria. Syst. Appl. Microbiol. 10, 172–173 (1988).
5. Stetter, K. O. et al. Hyperthermophilic archaea are thriving in deep North Sea and Alaskan oil reservoirs. Nature 365, 743–745 (1993).
6. Vorholt, J., Kunow, J., Stetter, K. O. & Thauer, R. K. Enzymes and coenzymes of the carbon monoxide dehydrogenase pathway for autotrophic CO2 fixation in Archaeoglobus lithotrophicus and the lack of carbon monoxide dehydrogenase in the heterotrophic A. profundus. Arch. Microbiol. 163, 112–118 (1995).
7. Woese, C. R. & Fox, G. E. Phylogenetic structure of the prokaryotic domain: The primary kingdoms. Proc. Natl Acad. Sci. USA 74, 5088–5090 (1977).
8. Woese, C. R., Kandler, O. & Wheelis, M. L. Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc. Natl Acad. Sci. USA 87, 4576–4579 (1990).
9. Bult, C. J. et al. Complete genome sequence of the methanogenic archaeon Methanococcus jannaschii. Science 273, 1058–1073 (1996).
10. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
11. Riley, M. Functions of gene products of Escherichia coli. Microbiol. Rev. 57, 862–952 (1993).
12. Cooling, F. B. III, Maloney, C. L., Nagel, E., Tabinowski, J. & Odom, J. M. Inhibition of sulfate respiration by 1,8-dihydroxyanthraquinone and other anthraquinone derivatives. Appl. Environ. Microbiol. 62, 2999–3004 (1996).
13. Thauer, R. K. & Kunow, J. in Sulfate Reducing Bacteria (ed. Barton, L. L.) 33–48 (Plenum, New York, 1995).
14. Speich, D. et al. Adenylylsulfate reductase from the sulfate-reducing archaeon Archaeoglobus fulgidus: cloning and characterization of the genes and comparison of the enzyme with other iron-sulfur flavoproteins. Microbiology 140, 1273–1284 (1994).
15. Clark, D. P. & Cronan, J. E. Jr in Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology (ed. Neidhardt, F. C.) 343–357 (ASM Press, Washington DC, 1996).
16. Möller-Zinkhan, D. & Thauer, R. K. Anaerobic lactate oxidation to 3 CO2 by Archaeoglobus fulgidus via the carbon monoxide dehydrogenase pathway: demonstration of the acetyl-CoA carbon-carbon cleavage reaction in cell extracts. Arch. Microbiol. 153, 215–218 (1990).
17. Schauder, R., Eikmanns, B., Thauer, R. K., Widdel, F. & Fuchs, G. Acetate oxidation to CO2 in anaerobic-bacteria via a novel pathway not involving reactions of the citric-acid cycle. Arch. Microbiol. 145, 162–172 (1986).
18. Dai, Y.-R. et al. Acetyl-CoA decarbonylase/synthase complex from Archaeoglobus fulgidus: purification, characterization, and properties. Arch. Microbiol. (submitted).
19. Gorris, L. G. M., Voet, A. C. W. A. & van der Drift, C. Structural characteristics of methanogenic cofactors in the non-methanogenic archaebacterium Archaeoglobus fulgidus. BioFactors 3, 29–35 (1991).
20. Zhang, Q., Iwasaki, T., Wakagi, T. & Oshima, T. 2-oxoacid:ferredoxin oxidoreductase from the thermoacidophilic archaeon, Sulfolobus sp. strain 7. J. Biochem. 120, 587–599 (1996).
21. Tersteegen, A., Linder, D., Thauer, R. K. & Hedderich, R. Structures and functions of four anabolic 2-oxoacid oxidoreductases in Methanobacterium thermoautotrophicum. Eur. J. Biochem. 244, 862–868 (1997).
22. Kletzin, A. & Adams, M. W. W. Molecular and phylogenetic characterization of pyruvate and 2-ketoisovalerate ferredoxin oxidoreductases from Pyrococcus furiosus and pyruvate ferredoxin oxidoreductase from Thermotoga maritima. J. Bacteriol. 178, 248–257 (1996).
23. LaPaglia, C. & Hartzell, P. L. Stress-induced production of biofilm in the hyperthermophile Archaeoglobus fulgidus. Appl. Environ. Microbiol. 63, 3158–3163 (1997).
24. Kunow, J., Linder, D., Stetter, K. O. & Thauer, R. K. F420H2: quinone oxidoreductase from Archaeoglobus fulgidus: characterization of a membrane-bound multisubunit complex containing FAD and iron–sulfur clusters. Eur. J. Biochem. 223, 503–511 (1994).
25. Yeliseev, A. A., Krueger, K. E. & Kaplan, S. A mammalian mitochondrial drug receptor functions as a bacterial ‘‘oxygen’’ sensor. Proc. Natl Acad. Sci. USA 94, 5101–5106 (1997).
26. Edgell, D. R., Klenk, H.-P. & Doolittle, W. F. Gene duplications in evolution of archaeal family B DNA polymerases. J. Bacteriol. 179, 2632–2640 (1997).
27. Bergerat, A. et al. An atypical topoisomerase II from archaea with implications for meiotic recombination. Nature 386, 414–417 (1997).
28. Marsh, T. L., Reich, C. I., Whitelock, R. B. & Olsen, G. J. Transcription factor IID in the Archaea: sequences in the Thermococcus celer genome would encode a product closely related to the TATA-binding protein of eukaryotes. Proc. Natl Acad. Sci. USA 91, 4180–4184 (1994).
29. Kosa, P. F., Ghosh, G., DeDecker, B. S. & Sigler, P. B. The 2.1-Å crystal structure of an archaeal preinitiation complex: TATA-box-binding protein/transcription factor (II)B core/TATA-box. Proc. Natl Acad. Sci. USA 94, 6042–6047 (1997).
30. Curnow, A. W. et al. Glu-tRNAGln amidotransferase: a novel heterotrimeric enzyme required for correct decoding of glutamine codons during translation. Proc. Natl Acad. Sci. USA 94, 11819–11826 (1997).
31. Ibba, M., Bobo, J. L., Rosa, P. A. & Soll, D. Archaeal-type lysyl-tRNA synthetase in the Lyme disease spirochete Borrelia burgdorferi. Proc. Natl Acad. Sci. USA (submitted).
32. Waldmann, T., Lupas, A., Kellermann, J., Peters, J. & Baumeister, W. Primary structure of the thermosome from Thermoplasma acidophilum. Hoppe-Seyler’s Biol. Chem. 376, 119–126 (1995).
33. Kagawa, H. K. et al. The 60 kDa heat shock proteins in the hyperthermophilic archaeon Sulfolobus shibatae. J. Mol. Biol. 253, 712–725 (1995).
34. Noll, K. M. & Barber, T. S. Vitamin contents of archaebacteria. J. Bacteriol. 170, 4315–4321 (1988).
35. Thornebene, T. G. & Langworthy, T. A. Diphytanyl and dibiphytanyl glycerol ether lipids of methanogenic archaebacteria. Science 203, 51–53 (1979).
36. Nishihara, M. & Koga, Y. sn-glycerol-1-phosphate dehydrogenase in Methanobacterium thermoautotrophicum: key enzyme in biosynthesis of the enantiomeric glycerophosphate backbone of ether phospholipids of archaebacteria. J. Biochem. 117, 933–935 (1995).
37. Huber, R. et al. Isolation of a hyperthermophilic archaeum predicted by in situ RNA analysis. Nature 376, 57–58 (1995).
38. Fleischmann, R. D. et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269, 496–511 (1995).
39. Fraser, C. M. et al. The minimal gene complement of Mycoplasma genitalium. Science 270, 397–403 (1995).
40. Tomb, J.-F. et al. The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature 388, 539–547 (1997).
41. Sutton, G. G., White, O., Adams, M. D. & Kerlavage, A. R. TIGR Assembler: A new tool for assembling large shotgun sequencing projects. Genome Sequence Technol. 1, 9–19 (1995).
42. Sonnhammer, E. L., Eddy, S. R. & Durbin, R. Pfam: A comprehensive database of protein families based on seed alignments. Proteins 28, 405–420 (1997).
43. Claros, M. G. & von Heijne, G. TopPred II: an improved software for membrane protein structure predictions. Comput. Appl. Biosci. 10, 685–686 (1994).

Acknowledgements. We thank M. Heaney, J. Scott and R. Shirley for software and database support; V. Sapiro, B. Vincent, J. Meehan and D. Maas for computer system support; B. Cameron and D. J. Doyle for editorial assistance; and K. O. Stetter for providing A. fulgidus VC-16. This work was supported by the US Department of Energy.

Correspondence and requests for materials should be addressed to J.C.V. (e-mail: [email protected]). The annotated genome sequence and the gene family alignments are available on the World-Wide Web at http://www.tigr.org/tdb/mdb/afdb/afdb.html. The sequence has been deposited in GenBank with accession number AE000782.
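The coverage figures quoted in the Methods can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes that fold coverage is (number of reads × mean read length) / genome length, and that the 500-base average stated for plasmid reads also applies to the λ-clone reads, which the text does not say. The genome length is not given in this section, so it is back-calculated rather than taken from the paper; the function names are hypothetical.

```python
def fold_coverage(n_reads: int, mean_read_len: float, genome_len: float) -> float:
    """Average number of times each base is sequenced."""
    return n_reads * mean_read_len / genome_len


def implied_genome_length(n_reads: int, mean_read_len: float, coverage: float) -> float:
    """Genome length implied by a read count, read length and fold coverage."""
    return n_reads * mean_read_len / coverage


# Figures quoted for the initial random sequencing phase.
plasmid_reads = 27_150
lambda_reads = 1_850
mean_read_len = 500          # stated for plasmid reads; ASSUMED for lambda reads
reported_coverage = 6.7

total_reads = plasmid_reads + lambda_reads
genome_estimate = implied_genome_length(total_reads, mean_read_len, reported_coverage)

# Rough size of the single-coverage regions left after gap closure
# (90 regions covering 0.33% of the genome), using the estimate above.
single_cov_bases = 0.0033 * genome_estimate
mean_region_len = single_cov_bases / 90

print(f"total reads: {total_reads}")
print(f"implied genome length: {genome_estimate / 1e6:.2f} Mb")
print(f"mean single-coverage region: {mean_region_len:.0f} bp")
```

Under these assumptions the implied genome length comes out at a little over 2 Mb, which is only as reliable as the rounded 6.7-fold and 500-base inputs.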

[Figure: linear map of the A. fulgidus genome showing numbered ORFs along a nucleotide scale (ticks every 95,000 nt; scale bar 1 kb). Legend, role categories: Amino acid biosynthesis; Biosynthesis of cofactors, prosthetic groups, carriers; Cell envelope; Cellular processes; Central intermediary metabolism; Energy metabolism; Fatty acid/Phospholipid metabolism; Purines, pyrimidines, nucleosides and nucleotides; Regulatory functions; Replication, repair, restriction/modification; Transcription; Translation; Transport; Other categories; Conserved hypothetical; Unknown. Legend, other features: 23S rRNA; 5S/16S rRNAs; tRNA; 7S RNA; RNase P; Insertion element; DNA repeat; Transporter; GES region; Paralogous gene family; Methanococcus homologue; Authentic frameshift.]

Table 2. List of A. fulgidus genes with putative identification. Gene numbers correspond to those in Fig. 2. Percentages represent per cent identities.

AMINO ACID BIOSYNTHESIS AF0722 cobalamin biosynthesis precorrin-6Y methylase (cbiE) 32.4% CELLULAR PROCESSES AF0732 cobalamin biosynthesis precorrin-8W General General decarboxylase (cbiT) 30.8% AF0906 hydantoin utilization protein A (hyuA) 27.4% AF1040 chemotaxis (cheA) 41.9% AF1336 cobalamin biosynthesis protein (cbiB) 38.4% AF1035 chemotaxis histidine kinase, putative 25.3% family AF0723 cobalamin biosynthesis protein (cbiD) 36.3% AF1036 chemotaxis histidine kinase, putative 30.4% AF0228 3-dehydroquinate (aroD) 36.8% AF0728 cobalamin biosynthesis protein (cbiM-1) 51.4% AF1037 chemotaxis protein (cheR) 33.2% AF1497 5-enolpyruvylshikimate 3-phosphate synthase (aroA) 41.5% AF1843 cobalamin biosynthesis protein (cbiM-2) 41.2% AF1042 chemotaxis response regulator (cheY) 62.9% AF1603 component I (trpE) 43.7% AF0731 cobalt transport ATP-binding protein (cbiO-1) 47.2% AF1034 methyl-accepting chemotaxis protein (tlpC-1) 27.5% AF1604 anthranilate synthase component II (trpD) 43.8% AF1841 cobalt transport ATP-binding protein (cbiO-2) 41.1% AF1045 methyl-accepting chemotaxis protein (tlpC-2) 29.6% AF1602 anthranilate synthase component II (trpG) 50.0% AF0729 cobalt transport protein (cbiN) 56.0% AF1041 protein-glutamate methylesterase (cheB) 43.3% AF0227 / (pheA) 32.2% AF0730 cobalt transport protein (cbiQ-1) 32.6% AF1032 purine NTPase, putative 32.2% AF0670 (aroC) 55.3% AF1842 cobalt transport protein (cbiQ-2) 30.3% AF1044 purine-binding chemotaxis protein (cheW) 40.4% AF1601 phosphoribosyl anthranilate isomerase (trpF) 37.1% AF1338 cobyric acid synthase (cbiP) 44.5% AF2327 shikimate 5-dehydrogenase (aroE) 43.1% AF2229 cobyrinic acid a,c-diamide synthase (cbiA) 42.3% Cell division AF0343 tryptophan repressor binding protein (wrbA) 46.6% AF1241 glutamate-1-semialdehyde aminotransferase (hemL) 54.3% AF0517 cell division control protein 21 (cdc21) 32.8% AF1599 , subunit alpha (trpA) 39.5% AF1975 glutamyl-tRNA reductase (hemA) 42.7% AF1297 cell division control protein 48, AAA 
family (cdc48-1) 69.1% AF1240 tryptophan synthase, subunit beta (trpB-1) 39.4% AF1594 heme biosynthesis protein (nirH) 25.2% AF2098 cell division control protein 48, AAA family (cdc48-2) 62.0% AF1600 tryptophan synthase, subunit beta (trpB-2) 64.1% AF1125 heme biosynthesis protein (nirJ-1) 38.7% AF0244 cell division control protein 6, putative 27.5% AF2009 heme biosynthesis protein (nirJ-2) 31.8% AF1285 cell division control protein, AAA family, putative 49.3% Aspartate family AF1593 heme d1 biosynthesis protein (nirD) 29.4% AF0696 cell division inhibitor (minD-1) 55.0% AF2112 5-methyltetrahydropteroyltriglutamate- AF1311 oxygen-independent coproporphyrinogen III AF1937 cell division inhibitor (minD-2) 32.8% methyltransferase (metE) 28.1% , putative 27.1% AF2051 cell division protein (ftsJ) 40.8% AF0882 (asnA) 45.9% AF1242 porphobilinogen deaminase (hemC) 46.3% AF0535 cell division protein (ftsZ-1) 60.4% AF1439 asparagine synthetase (asnB) 36.9% AF1974 porphobilinogen synthase (hemB) 60.4% AF0570 cell division protein (ftsZ-2) 61.4% AF2366 aspartate aminotransferase (aspB-1) 42.3% AF1784 protoporphyrinogen oxidase (hemK) 33.5% AF0837 cell division protein pelota (pelA) 41.7% AF2129 aspartate aminotransferase (aspB-2) 45.4% AF0422 uroporphyrin-III C-methyltransferase (cysG-1) 41.7% AF1215 cell division protein, putative 32.8% AF1623 aspartate aminotransferase (aspB-3) 39.4% AF1243 uroporphyrin-III C-methyltransferase (cysG-2) 52.5% AF0238 centromere/microtubule-binding protein (cbf5) 58.8% AF0409 aspartate aminotransferase (aspB-4) 45.2% AF0116 uroporphyrinogen III synthase (hemD) 27.4% AF1558 chromosome segregation protein (smc1) 32.8% AF1417 aspartate aminotransferase (aspC) 46.2% AF1822 serine/threonine (ppa) 31.9% AF0700 (lysC) 49.1% Menaquinone and ubiquinone AF1422 48.0% AF2176 4-hydroxybenzoate octaprenyltransferase (ubiA) 41.6% Chaperones AF1506 aspartate-semialdehyde dehydrogenase (asd) 60.9% AF0404 4-hydroxybenzoate octaprenyltransferase, putative 30.6% 
AF1296 small heat shock protein (-1) 52.3% AF0800 diaminopimelate decarboxylase (lysA) 45.6% AF2413 coenzyme PQQ synthesis protein (pqqE) 30.5% AF1971 small heat shock protein (hsp20-2) 38.1% AF0747 diaminopimelate epimerase (dapF) 45.8% AF1191 dihydroxynaphthoic acid synthase (menB) 54.6% AF2238 thermosome, subunit alpha (thsA) 70.6% AF0909 dihydrodipicolinate reductase (dapB) 48.6% AF1551 octaprenyl-diphosphate synthase (ispB) 33.2% AF1451 thermosome, subunit beta (thsB) 68.2% AF0910 dihydrodipicolinate synthase (dapA) 51.0% AF0140 ubiquinone/menaquinone biosynthesis Chromosome-associated protein AF0935 (hom) 47.9% methyltransferase (ubiE) 31.0% AF0337 archaeal A1 (hpyA1-1) 64.6% AF0886 S- (ahcY-1) 31.7% Molybdopterin AF1493 archaeal histone A1 (hpyA1-2) 69.7% AF2000 S-adenosylhomocysteinase hydrolase (ahcY-2) 67.3% AF2006 molybdenum biosynthesis protein (moaA) 47.8% AF0051 succinyl-diaminopimelate desuccinylase (dapE-1) 30.5% Detoxification AF0265 molybdenum cofactor biosynthesis protein (moaB) 44.4% AF0904 succinyl-diaminopimelate desuccinylase (dapE-2) 43.8% AF2173 2-nitropropane (ncd2) 39.7% AF2150 molybdenum cofactor biosynthesis protein (moaC) 62.0% AF0551 (thrC-1) 40.5% AF0270 alkyl hydroperoxide reductase 73.5% AF0931 molybdenum cofactor biosynthesis protein (moeA-1) 50.8% AF1316 threonine synthase (thrC-2) 61.0% AF1361 arsenate reductase (arsC) 30.5% AF0930 molybdenum cofactor biosynthesis protein (moeA-2) 44.8% AF0550 N-ethylammeline chlorohydrolase (trzA-1) 45.9% Glutamate family AF0161 molybdenum cofactor biosynthesis protein (moeA-3) 30.5% AF0997 N-ethylammeline chlorohydrolase (trzA-2) 44.5% AF1280 (argB) 56.1% AF0531 molybdenum cofactor biosynthesis protein (moeB) 44.0% AF0254 NADH oxidase (noxA-1) 35.1% AF2288 acetylglutamate kinase, putative 29.0% AF1022 molybdenum--binding protein (mopB) 39.3% AF0395 NADH oxidase (noxA-2) 35.5% AF0080 acetylornithine aminotransferase (argD-1) 48.3% AF1624 molybdopterin converting factor, subunit 1 (moaD) 36.6% 
AF0400 NADH oxidase (noxA-3) 40.8% AF1815 acetylornithine aminotransferase (argD-2) 36.2% AF2179 molybdopterin converting factor, subunit 2 (moaE) 33.3% AF0951 NADH oxidase (noxA-4) 36.7% AF0522 acetylornithine deacetylase (argE) 29.4% AF2005 molybdopterin- dinucleotide biosynthesis AF1858 NADH oxidase (noxA-5) 34.0% AF0883 argininosuccinate (argH) 42.2% protein A (mobA) 33.2% AF0455 NADH oxidase (noxB-1) 43.3% AF2252 argininosuccinate synthetase (argG) 62.0% AF2253 molybdopterin-guanine dinucleotide biosynthesis AF1262 NADH oxidase (noxB-2) 42.9% AF1147 glutamate N-acetyltransferase (argJ) 47.8% protein B (mobB) 40.0% AF0226 NADH oxidase (noxC) 38.4% AF0953 (gltB) 57.9% Pantothenate AF0515 NADH oxidase, putative 25.5% AF0949 glutamine synthetase (glnA) 43.3% AF1645 pantothenate metabolism (dfp) 42.4% AF2233 peroxidase / catalase (perA) 62.9% AF2071 N-acetyl-gamma-glutamyl-phosphate reductase (argC) 53.3% Riboflavin Protein and peptide secretion AF1255 ornithine carbamoyltransferase (argF) 51.7% AF0484 GTP cyclohydrolase II (ribA-1) 44.5% AF1902 protein , subunit SEC61 alpha (secY) 50.0% AF2107 GTP cyclohydrolase II (ribA-2) 47.1% AF0536 protein translocase, subunit SEC61 gamma (secE) 25.0% Pyruvate family AF1416 (ribC) 53.3% AF2062 signal recognition particle receptor (dpa) 54.8% AF0957 2-isopropylmalate synthase (leuA-1) 53.5% AF2128 riboflavin synthase, subunit beta (ribE) 75.9% AF1258 signal recognition particle, subunit SRP19 (srp19) 36.6% AF0219 2-isopropylmalate synthase (leuA-2) 53.9% AF2007 riboflavin-specific deaminase (ribG) 43.7% AF0622 signal recognition particle, subunit SRP54 (srp54) 51.2% AF2199 3-isopropylmalate dehydratase, large subunit (leuC) 49.3% AF1791 signal sequence peptidase (sec11) 36.3% AF0629 3-isopropylmalate dehydratase, small subunit (leuD-1) 56.4% Thiamine AF1657 signal sequence peptidase (spc21) 47.0% AF1761 3-isopropylmalate dehydratase, small subunit (leuD-2) 57.1% AF2075 hydroxyethylthiazole kinase (thiM) 33.6% AF1655 signal 
sequence peptidase, putative 34.5% AF0628 3-isopropylmalate dehydrogenase (leuB) 59.2% AF2208 hydroxymethylpyrimidine phosphate kinase (thiD) 35.5% AF0338 type II secretion system protein (gspE-1) 38.5% AF1720 , large subunit (ilvB-1) 57.5% AF1695 thiamine biosynthesis protein (apbA) 36.9% AF0659 type II secretion system protein (gspE-2) 38.2% AF1780 acetolactate synthase, large subunit (ilvB-2) 32.1% AF2412 thiamine biosynthesis protein (thiC) 60.2% AF0996 type II secretion system protein (gspE-3) 41.7% AF2015 acetolactate synthase, large subunit (ilvB-3) 34.1% AF0553 thiamine biosynthesis protein (thiF) 38.1% AF1049 type II secretion system protein (gspE-4) 46.5% AF2100 acetolactate synthase, large subunit (ilvB-4) 38.4% AF0088 thiamine biosynthesis protein, putative 28.2% AF1719 acetolactate synthase, small subunit (ilvN) 60.4% AF0702 thiamine biosynthetic enzyme (thi1) 50.0% CENTRAL INTERMEDIARY METABOLISM AF1672 acetolactate synthase, small subunit, putative 29.7% AF0733 thiamine monophosphate kinase (thiL) 30.4% Degradation of AF0933 branched-chain amino acid aminotransferase (ilvE) 59.0% AF2074 thiamine phosphate pyrophosphorylase (thiE) 45.5% AF1207 2-deoxy-D-gluconate 3-dehydrogenase (kduD) 45.3% AF1014 dihydroxy-acid dehydratase (ilvD) 54.5% Pyridine nucleotides AF1795 endoglucanase (celM) 55.4% AF1985 ketol-acid reductoisomerase (ilvC) 61.8% AF1000 NH(3)-dependent NAD+ synthetase (nadE) 52.0% Phosphorus compounds Serine family AF1839 nicotinate- pyrophosphorylase (nadC) 43.2% AF0756 (ppx1) 55.1% AF0813 phosphoglycerate dehydrogenase (serA) 48.8% AF1837 quinolinate synthetase (nadA), authentic frameshift 53.9% AF2138 phosphoserine phosphatase (serB) 50.7% Polyamine biosynthesis CELL ENVELOPE AF0273 oxidase, subunit alpha (soxA) 31.1% AF0646 (speB) 33.3% AF0274 , subunit beta (soxB) 26.5% Membranes, lipoproteins, and porins AF2334 (speE) 37.1% AF0852 serine hydroxymethyltransferase (glyA) 56.1% AF1420 membrane protein 51.8% Polysaccharides - (cytoplasmic) 
AF1354 membrane protein, putative 32.8% Histidine family AF0599 phosphate mannose synthase, putative 32.1% AF0590 ATP phosphoribosyltransferase (hisG) 31.6% Surface polysaccharides, lipopolysaccharides and antigens Sulfur metabolism AF0212 (hisD) 51.6% AF0324 dTDP-glucose 4,6-dehydratase (rfbB) 50.0% AF0288 adenylylsulfate 3-phosphotransferase (cysC) 52.0% AF2002 histidinol-phosphate aminotransferase (hisC-1) 39.8% AF0043 first mannosyl transferase (wbaZ-1) 30.0% AF1670 adenylylsulfate reductase, subunit A (aprA) 96.0% AF2024 histidinol-phosphate aminotransferase (hisC-2) 36.8% AF0606 first mannosyl transferase (wbaZ-2) 29.0% AF1669 adenylylsulfate reductase, subunit B (aprB) 97.3% AF0985 imidazoleglycerol-phosphate AF1728 26.9% AF1667 sulfate adenylyltransferase (sat) 28.4% dehydrogenase/histidinol-phosphatase (hisB) 42.2% AF0044 GDP-D-mannose dehydratase (gmd-1), AF2228 , desulfoviridin-type subunit AF0819 imidazoleglycerol-phosphate synthase, authentic frameshift 40.7% gamma (dsvC) 41.3% cyclase subunit (hisF) 67.0% AF1142 glucose-1-phosphate cytidylyltransferase (rfbF) 38.6% AF0423 sulfite reductase, subunit alpha (dsrA) 100.0% AF2265 imidazoleglycerol-phosphate synthase, AF0242 glucose-1-phosphate thymidylyltransferase (graD-1) 27.7% AF0424 sulfite reductase, subunit beta (dsrB) 100.0% subunit H (hisH) 44.4% AF0325 glucose-1-phosphate thymidylyltransferase (graD-2) 45.2% AF0425 sulfite reductase, subunit gamma (dsrD) 97.4% AF0509 imidazoleglycerol-phosphate synthase, AF0321 glycosyl transferase 30.7% subunit H, putative 43.2% AF0387 , putative 33.8% Other AF1950 phosphoribosyl-AMP cyclohydrolase/ AF0467 immunogenic protein (bcsp31-1) 34.7% AF1706 2-hydroxy-6-oxo-6-phenylhexa-2,4-dienoic acid phosphoribosyl-ATP pyrophosphohydrolase (hisIE) 59.6% AF0635 immunogenic protein (bcsp31-2) 44.3% hydrolase (pcbD) 29.4% AF0713 phosphoribosylformimino-5-aminoimidazole AF0988 immunogenic protein (bcsp31-3) 28.3% AF0675 2-hydroxy-6-oxohepta-2,4-dienoate hydrolase (todF) 
26.3% carboxamide ribotide isomerase (hisA-1) 37.5% AF0602 LPS biosynthesis protein, putative 29.6% AF0091 2-hydroxyhepta-2,4-diene-1,7-dioate isomerase AF0986 phosphoribosylformimino-5-aminoimidazole AF0617 LPS biosynthesis protein, putative 29.0% (hpcE-1) 44.5% carboxamide ribotide isomerase (hisA-2) 42.2% AF0607 LPS glycosyltransferase, putative 29.7% AF2225 2-hydroxyhepta-2,4-diene-1,7-dioate isomerase AF0326 mannose-1-phosphate (hpcE-2) 66.0% BIOSYNTHESIS OF COFACTORS, PROSTHETIC GROUPS, AND CARRIERS (rfbM), authentic frameshift 42.4% AF0333 4-hydroxyphenylacetate-3-hydroxylase (hpaA-1) 22.4% AF1097 mannose-6-phosphate isomerase/mannose-1- AF0885 4-hydroxyphenylacetate-3-hydroxylase (hpaA-2) 26.0% General phosphate guanylyl transferase (manC) 43.1% AF1027 4-hydroxyphenylacetate-3-hydroxylase (hpaA-3) 21.0% AF1855 2,3-dihydroxybenzoate-AMP ligase (entE) 27.2% AF0035 mannosephosphate isomerase, putative 31.3% AF0669 4-oxalocrotonate tautomerase, putative 31.9% AF1070 coenzyme F390 synthetase (ftsA-1) 30.3% AF0045 A (mtfA) 38.7% AF0808 glycolate oxidase subunit (glcD) 32.0% AF1671 coenzyme F390 synthetase (ftsA-2) 31.9% AF0311 O-antigen biosynthesis protein (rfbC), authentic AF2216 methylmalonyl-CoA decarboxylase, biotin carboxyl AF2013 coenzyme F390 synthetase (ftsA-3) 30.4% frameshift 30.6% carrier subunit (mmdC) 36.2% AF2151 (entB) 31.2% AF0458 (pmm) 39.5% AF2217 methylmalonyl-CoA decarboxylase, subunit alpha Folic acid AF0595 polysaccharide biosynthesis protein, putative 24.1% (mmdA) 62.5% AF1414 40.8% AF0322 rhamnosyl transferase (rfbQ) 27.5% AF1288 methylmalonyl-CoA mutase, subunit alpha (mutB), AF0323 spore coat polysaccharide biosynthesis protein authentic frameshift 46.1% Heme and porphyrin (spsK-2), authentic frameshift 36.3% AF2219 methylmalonyl-CoA mutase, subunit alpha, AF1648 bacteriochlorophyll synthase, 33 kDa subunit 27.9% AF0620 succinoglycan biosynthesis protein (exoM) 24.8% C-terminus (mcmA2) 48.7% AF0464 bacteriochlorophyll synthase, 43 kDa 
subunit (chlP-1) 29.7% AF0361 UDP-glucose 4-epimerase (galE-1) 38.6% AF2215 methylmalonyl-CoA mutase, subunit alpha, AF1023 bacteriochlorophyll synthase, 43 kDa subunit (chlP-2) 31.2% AF2016 UDP-glucose 4-epimerase (galE-2) 30.0% N-terminus (mcmA1) 51.2% AF1637 bacteriochlorophyll synthase, 43 kDa subunit (chlP-3) 27.0% AF0302 UDP-glucose dehydrogenase (ugd-1) 43.8% AF2099 muconate cycloisomerase II (clcB) 24.9% AF0037 cobalamin (5’-phosphate) synthase (cobS-1) 33.9% AF0596 UDP-glucose dehydrogenase (ugd-2) 44.1% AF1425 phosphonopyruvate decarboxylase (bcpC-1) 35.0% AF2323 cobalamin (5’-phosphate) synthase (cobS-2) 34.4% AF1751 phosphonopyruvate decarboxylase (bcpC-2) 48.6% AF0725 cobalamin biosynthesis precorrin methylase (cbiG) 30.7% Surface structures AF0727 cobalamin biosynthesis precorrin-2 methyltransferase ENERGY METABOLISM AF1054 flagellin (flaB1-1) 30.0% (cbiL) 31.5% AF1055 flagellin (flaB1-2) 31.1% Amino acids and amines AF0726 cobalamin biosynthesis precorrin-3 methylase (cbiF) 49.2% AF0275 surface layer protein B (slgB-1) 30.8% AF1958 2-hydroxyglutaryl-CoA dehydratase, subunit alpha AF0724 cobalamin biosynthesis precorrin-3 methylase (cbiH) 49.0% AF1413 surface layer protein B (slgB-2) 29.9% (hgdA) 30.5%

Nature © Macmillan Publishers Ltd 1997

AF1957 2-hydroxyglutaryl-CoA dehydratase, AF0499 molybdopterin oxidoreductase, iron-sulfur binding TCA cycle subunit beta (hgdB) 24.4% subunit 41.5% AF1963 (acn) 57.1% AF0130 acetylpolyamine aminohydrolase (aphA) 38.7% AF0500 molybdopterin oxidoreductase, membrane subunit 27.9% AF1340 (citZ) 50.3% AF2290 acetylpolyamine aminohydrolase, putative 33.3% AF1202 molybdopterin oxidoreductase, iron-sulfur AF1098 fumarase (fum-1) 49.1% AF0991 glutaryl-CoA dehydrogenase (gcdH) 48.7% binding subunit 35.5% AF1099 fumarase (fum-2) 53.4% AF1323 group II decarboxylase 28.0% AF1203 molybdopterin oxidoreductase, molybdopterin binding AF0647 , NADP (icd) 57.2% AF2004 group II decarboxylase 46.1% subunit 30.1% AF1727 malate oxidoreductase (mae) 52.3% AF2295 group II decarboxylase 30.5% AF2384 molybdopterin oxidoreductase, molybdopterin binding AF0681 succinate dehydrogenase, flavoprotein subunit A AF1665 (arcB) 35.3% subunit 34.6% (sdhA) 48.2% AF2385 molybdopterin oxidoreductase, iron-sulfur binding AF0682 succinate dehydrogenase, iron-sulfur subunit B (sdhB) 51.3% Anaerobic subunit 46.9% AF0683 succinate dehydrogenase, subunit C (sdhC) 36.6% AF1145 4-hydroxybutyrate CoA transferase (cat2-1) 46.5% AF2386 molybdopterin oxidoreductase, membrane subunit 30.3% AF0684 succinate dehydrogenase, subunit D (sdhD) 25.9% AF1854 4-hydroxybutyrate CoA transferase (cat2-2) 47.5% AF0159 molybdopterin oxidoreductase, molybdopterin AF1539 succinyl-CoA synthetase, alpha subunit (sucD-1) 56.9% AF0866 (glpK) 33.8% binding subunit, putative 30.9% AF2185 succinyl-CoA synthetase, alpha subunit (sucD-2) 63.5% AF1328 glycerol-3-phosphate dehydrogenase (glpA) 27.8% AF2267 NAD(P)H-flavin oxidoreductase 31.4% AF1540 succinyl-CoA synthetase, beta subunit (sucC-1) 51.3% AF0871 glycerol-3-phosphate dehydrogenase (NAD(P)+) AF0131 NAD(P)H-flavin oxidoreductase, putative 28.2% AF2186 succinyl-CoA synthetase, beta subunit (sucC-2) 49.6% (gpsA) 36.3% AF2352 NADH dehydrogenase, subunit 1, putative 28.9% AF0020 L-carnitine dehydratase (caiB-1) 33.3% 
FATTY ACID AND PHOSPHOLIPID METABOLISM AF1828 NADH dehydrogenase, subunit 3 24.3% AF0990 L-carnitine dehydratase (caiB-2) 31.2% AF0248 NADH-dependent flavin oxidoreductase 36.7% General ATP-proton motive force interconversion AF0342 nigerythrin, putative 33.3% AF1736 3-hydroxy-3-methylglutaryl- reductase AF1158 ATP synthase, subunit E, putative 47.1% AF0546 , gamma subunit (narI) 30.1% (mvaA) 57.1% AF1166 H+-transporting ATP synthase, subunit A (atpA) 67.0% AF0501 nitrate reductase, gamma subunit, putative 29.3% AF0017 3-hydroxyacyl-CoA dehydrogenase (hbd-1) 41.1% AF1167 H+-transporting ATP synthase, subunit B (atpB) 72.6% AF1126 P450 , putative 30.5% AF0285 3-hydroxyacyl-CoA dehydrogenase (hbd-2) 55.8% AF1164 H+-transporting ATP synthase, subunit C (atpC) 37.5% AF0463 polyferredoxin (mvhB), authentic frameshift 32.2% AF0434 3-hydroxyacyl-CoA dehydrogenase (hbd-3) 40.7% AF1168 H+-transporting ATP synthase, subunit D (atpD) 47.1% AF1379 quinone-reactive Ni/Fe- B-type AF1025 3-hydroxyacyl-CoA dehydrogenase (hbd-4) 45.6% AF1163 H+-transporting ATP synthase, subunit E (atpE) 36.3% cytochrome subunit (hydC) 29.0% AF1122 3-hydroxyacyl-CoA dehydrogenase (hbd-5) 45.2% AF1165 H+-transporting ATP synthase, subunit F (atpF) 45.0% AF0173 reductase, assembly protein 30.0% AF1177 3-hydroxyacyl-CoA dehydrogenase (hbd-6) 35.8% AF1159 H+-transporting ATP synthase, subunit I (atpI) 30.1% AF0547 reductase, iron-sulfur binding subunit 28.3% AF1190 3-hydroxyacyl-CoA dehydrogenase (hbd-7) 46.5% AF1160 H+-transporting ATP synthase, subunit K (atpK-1) 46.3% AF0867 reductase, putative 33.3% AF1206 3-hydroxyacyl-CoA dehydrogenase (hbd-8) 36.3% AF1162 H+-transporting ATP synthase, subunit K (atpK-2) 46.3% AF0880 rubredoxin (rd-1) 69.2% AF2017 3-hydroxyacyl-CoA dehydrogenase (hbd-9) 35.4% AF1349 rubredoxin (rd-2) 67.9% AF2273 3-hydroxyacyl-CoA dehydrogenase (hbd-10) 39.4% Electron transport AF0832 rubrerythrin (rr1) 45.7% AF0018 3-ketoacyl-CoA thiolase (acaB-1) 41.0% AF2036 folding protein 
(coxD) 33.3% AF0831 rubrerythrin (rr2) 63.7% AF0034 3-ketoacyl-CoA thiolase (acaB-2) 38.3% AF0144 cytochrome C oxidase, subunit II (cbaB) 34.2% AF1640 rubrerythrin (rr3) 37.8% AF0133 3-ketoacyl-CoA thiolase (acaB-3) 32.3% AF0142 cytochrome C oxidase, subunit II, putative 38.0% AF2312 rubrerythrin (rr4) 41.4% AF0134 3-ketoacyl-CoA thiolase (acaB-4) 32.5% AF0190 cytochrome C oxidase, subunit II, putative 31.7% AF0711 thioredoxin (trx-1) 28.4% AF0201 3-ketoacyl-CoA thiolase (acaB-5) 26.9% AF1057 cytochrome C-type biogenesis protein (ccdA) 30.7% AF0769 thioredoxin (trx-2) 38.5% AF0202 3-ketoacyl-CoA thiolase (acaB-6) 33.5% AF2192 cytochrome C-type biogenesis protein (nrfE) 36.1% AF1284 thioredoxin (trx-3) 52.9% AF0283 3-ketoacyl-CoA thiolase (acaB-7) 42.0% AF2296 cytochrome oxidase, subunit I (cydA-1) 22.9% AF2144 thioredoxin (trx-4) 48.9% AF0438 3-ketoacyl-CoA thiolase (acaB-8) 42.4% AF2297 cytochrome oxidase, subunit I (cydA-2) 31.5% AF1339 ubiquinol-cytochrome C reductase complex, AF0967 3-ketoacyl-CoA thiolase (acaB-9) 33.7% AF2046 cytochrome oxidase, subunit I, putative 25.1% subunit VI requiring protein 60.9% AF0968 3-ketoacyl-CoA thiolase (acaB-10) 28.0% AF0528 cytochrome-c3 hydrogenase, subunit gamma 39.3% AF1291 3-ketoacyl-CoA thiolase (acaB-11) 40.1% AF0833 desulfoferrodoxin (dfx) 63.0% Fermentation AF2416 3-ketoacyl-CoA thiolase (acaB-12) 49.9% AF0344 desulfoferrodoxin, putative 47.3% AF1779 2-hydroxyacid dehydrogenase, putative 37.6% AF1028 3-ketoacyl-CoA thiolase (fadA-1) 38.8% AF0287 electron transfer flavoprotein, subunit alpha (etfA) 39.7% AF0469 2-ketoglutarate ferredoxin oxidoreductase, AF1197 3-ketoacyl-CoA thiolase (fadA-2) 47.2% AF0286 electron transfer flavoprotein, subunit beta (etfB) 38.8% subunit alpha (korA) 52.3% AF2243 3-ketoacyl-CoA thiolase (fadA-3) 40.3% AF1380 F420-nonreducing hydrogenase (vhtA) 34.8% AF0468 2-ketoglutarate ferredoxin oxidoreductase, AF0033 acyl carrier protein synthase (acaA-1) 28.6% AF1371 F420-nonreducing hydrogenase 
(vhtD-1) 30.9% subunit beta (korB) 51.2% AF2415 acyl carrier protein synthase (acaA-2) 58.7% AF1378 F420-nonreducing hydrogenase (vhtD-2) 33.1% AF0470 2-ketoglutarate ferredoxin oxidoreductase, AF0199 acyl-CoA dehydrogenase (acd-1) 35.9% AF1381 F420-nonreducing hydrogenase (vhtG) 46.1% subunit delta (korD) 47.2% AF0436 acyl-CoA dehydrogenase (acd-2) 44.1% AF1824 F420H2:quinone oxidoreductase, 11.2 kDa subunit, AF0471 2-ketoglutarate ferredoxin oxidoreductase, AF0498 acyl-coA dehydrogenase (acd-3) 22.9% putative 24.1% subunit gamma (korG) 40.0% AF0671 acyl-CoA dehydrogenase (acd-4) 37.9% AF1823 F420H2:quinone oxidoreductase, 16.5 kDa subunit, AF2053 2-ketoisovalerate ferredoxin oxidoreductase, AF0845 acyl-CoA dehydrogenase (acd-5) 44.6% putative 25.7% subunit alpha (vorA) 41.2% AF0964 acyl-CoA dehydrogenase (acd-6) 35.8% AF1832 F420H2:quinone oxidoreductase, 32 kDa subunit AF2052 2-ketoisovalerate ferredoxin oxidoreductase, AF1026 acyl-CoA dehydrogenase (acd-7) 42.6% (nuoI) 95.5% subunit beta (vorB) 42.7% AF1141 acyl-CoA dehydrogenase (acd-8) 43.2% AF1833 F420H2:quinone oxidoreductase, 39 kDa AF2054 2-ketoisovalerate ferredoxin oxidoreductase, AF1293 acyl-CoA dehydrogenase (acd-9) 45.8% subunit, putative 33.6% subunit delta (vorD) 51.5% AF2057 acyl-CoA dehydrogenase (acd-10) 44.6% AF1829 F420H2:quinone oxidoreductase, 39.7 kDa AF2055 2-ketoisovalerate ferredoxin oxidoreductase, AF2244 acyl-CoA dehydrogenase (acd-11) 42.6% subunit, putative 43.8% subunit gamma (vorG) 45.2% AF2275 acyl-CoA dehydrogenase (acd-12) 38.9% AF1831 F420H2:quinone oxidoreductase, 41.2 kDa subunit, AF0749 2-oxoacid ferredoxin oxidoreductase, AF1175 acyl-CoA dehydrogenase, short chain-specific (acdS) 30.1% putative 34.8% subunit alpha (orA) 33.7% AF0818 (acyP) 36.8% AF1827 F420H2:quinone oxidoreductase, 43.2 kDa subunit, AF0750 2-oxoacid ferredoxin oxidoreductase, AF0868 alkyldihydroxyacetonephosphate synthase 33.6% putative 26.9% subunit beta (orB) 49.2% AF2286 bifunctional short chain 
isoprenyl diphosphate AF1830 F420H2:quinone oxidoreductase, 45 kDa subunit AF1286 acetoin utilization protein, putative 35.1% synthase (idsA) 42.7% (nuoD) 80.0% AF0197 acetyl-CoA synthetase (acs-1) 27.1% AF0220 (acc) 59.1% AF1825 F420H2:quinone oxidoreductase, 53.9 kDa subunit AF0366 acetyl-CoA synthetase (acs-2) 47.3% AF0865 (est-1) 27.1% (nuoM) 32.1% AF0677 acetyl-CoA synthetase (acs-3) 40.9% AF1537 carboxylesterase (est-2) 29.0% AF1826 F420H2:quinone oxidoreductase, 72.4 kDa AF0975 acetyl-CoA synthetase (acs-4) 42.3% AF2336 carboxylesterase (est-3) 30.4% subunit (nuoL) 33.2% AF0976 acetyl-CoA synthetase (acs-5) 36.2% AF1716 carboxylesterase (estA) 40.4% AF0156 ferredoxin (fdx-1) 45.3% AF1287 acetyl-CoA synthetase (acs-6) 34.3% AF1744 CDP-diacylglycerol—glycerol-3-phosphate 3- AF0166 ferredoxin (fdx-2) 49.2% AF0024 dehydrogenase, iron-containing 36.2% phosphatidyltransferase (pgsA-2) 26.7% AF0355 ferredoxin (fdx-3) 53.2% AF0339 , iron-containing 37.4% AF1143 CDP-diacylglycerol—glycerol-3-phosphate-3- AF0427 ferredoxin (fdx-4) 56.1% AF2019 alcohol dehydrogenase, iron-containing 35.7% phosphatidyltransferase (pgsA-1) 27.0% AF0923 ferredoxin (fdx-5) 56.9% AF2389-C acetyl-CoA synthetase, putative 64.8% AF2044 CDP-diacylglycerol—serine O-phosphatidyltransferase AF1010 ferredoxin (fdx-6) 44.4% AF2389-N acetyl-CoA synthetase, putative 59.3% (pssA) 36.6% AF1239 ferredoxin (fdx-7) 29.0% AF2101 alcohol dehydrogenase, zinc-dependent 34.8% AF0435 enoyl-CoA hydratase (fad-1) 47.6% AF2142 ferredoxin (fdx-8) 38.0% AF0023 aldehyde ferredoxin oxidoreductase (aor-1) 41.1% AF0685 enoyl-CoA hydratase (fad-2) 39.9% AF0164 ferredoxin- (nirA) 29.7% AF0077 aldehyde ferredoxin oxidoreductase (aor-2) 32.6% AF0963 enoyl-CoA hydratase (fad-3) 48.6% AF2332 , putative 30.3% AF0340 aldehyde ferredoxin oxidoreductase (aor-3) 38.4% AF1641 enoyl-CoA hydratase (fad-4) 41.7% AF0167 flavoprotein (fprA-1) 33.2% AF2281 aldehyde ferredoxin oxidoreductase (aor-4) 53.0% AF2429 enoyl-CoA hydratase (fad-5) 
34.7% AF1520 flavoprotein (fprA-2) 47.2% AF0006 corrinoid methyltransferase protein (mtaC-1) 30.7% AF1763 lipase, putative 33.5% AF0557 flavoprotein reductase 26.2% AF0011 corrinoid methyltransferase protein (mtaC-2) 29.5% AF0089 long-chain-fatty-acid—CoA ligase (fadD-1) 31.9% AF1463 fumarate reductase, flavoprotein subunit (fdrA) 27.0% AF0394 D-lactate dehydrogenase, cytochrome-type (dld) 31.9% AF0200 long-chain-fatty-acid—CoA ligase (fadD-2) 34.8% AF1536 (grx-1) 34.3% AF0560 (fdhD1), authentic frameshift 32.9% AF0687 long-chain-fatty-acid—CoA ligase (fadD-3) 31.1% AF2145 glutaredoxin (grx-2) 38.8% AF1199 glutaconate CoA-transferase, subunit A (gctA) 31.9% AF0840 long-chain-fatty-acid—CoA ligase (fadD-4) 38.1% AF0663 heterodisulfide reductase, subunit A (hdrA-1) 42.2% AF1198 glutaconate CoA-transferase, subunit B (gctB), AF1029 long-chain-fatty-acid—CoA ligase (fadD-5) 37.8% AF1377 heterodisulfide reductase, subunit A (hdrA-2) 46.8% authentic frameshift 37.0% AF1510 long-chain-fatty-acid—CoA ligase (fadD-6) 36.0% AF0662 heterodisulfide reductase, subunit A/ AF1489 indolepyruvate ferredoxin oxidoreductase, AF1772 long-chain-fatty-acid—CoA ligase (fadD-7) 38.7% methylviologen reducing hydrogenase, subunit delta 34.2% subunit alpha (iorA) 48.1% AF1932 long-chain-fatty-acid—CoA ligase (fadD-8) 31.0% AF1238 heterodisulfide reductase, subunit A/methylviologen AF2030 indolepyruvate ferredoxin oxidoreductase, AF2368 long-chain-fatty-acid—CoA ligase (fadD-9) 38.7% reducing hydrogenase, subunit delta 53.7% subunit beta (iorB) 41.1% AF1753 33.5% AF1375 heterodisulfide reductase, subunit B (hdrB) 36.0% AF0807 L-lactate dehydrogenase, cytochrome-type (lldD) 39.4% AF0196 medium-chain acyl-CoA ligase (alkK-1) 34.6% AF0271 heterodisulfide reductase, subunit B, putative 35.3% AF0855 L-malate dehydrogenase, NAD+-dependent (mdhA) 40.1% AF0262 medium-chain acyl-CoA ligase (alkK-2) 38.6% AF1376 heterodisulfide reductase, subunit C (hdrC) 33.3% AF2085 oxaloacetate decarboxylase, biotin 
carboxyl carrier AF0672 medium-chain acyl-CoA ligase (alkK-3) 31.0% AF0502 heterodisulfide reductase, subunit D, putative 33.8% subunit, putative 38.7% AF1261 medium-chain acyl-CoA ligase (alkK-4) 42.7% AF0809 heterodisulfide reductase, subunit D, putative 100.0% AF2084 oxaloacetate decarboxylase, sodium ion pump subunit AF2033 medium-chain acyl-CoA ligase (alkK-5) 33.5% AF0661 heterodisulfide reductase, subunit E, putative 23.8% (oadB) 59.8% AF2289 (mvk) 40.6% AF0755 heterodisulfide reductase, subunits E and D, putative 31.8% AF1252 oxaloacetate decarboxylase, subunit alpha (oadA) 63.3% AF1794 myo-inositol-1-phosphate synthase (ino1) 32.2% AF0506 iron-sulfur binding reductase 38.5% AF1701 pyruvate ferredoxin oxidoreductase, AF2045 phosphatidylserine decarboxylase (psd2) 42.5% AF1773 iron-sulfur binding reductase 33.3% subunit alpha (porA) 50.3% AF1674 sn-glycerol-1-phosphate dehydrogenase (gldA) 44.0% AF1998 iron-sulfur binding reductase 29.6% AF1702 pyruvate ferredoxin oxidoreductase, AF0627 iron-sulfur cluster binding protein 45.5% subunit beta (porB) 50.7% AUTOTROPHIC METABOLISM AF0688 iron-sulfur cluster binding protein 44.8% AF1700 pyruvate ferredoxin oxidoreductase, subunit delta General AF1153 iron-sulfur cluster binding protein 27.9% (porD) 53.1% AF1100 acetyl-CoA decarbonylase/synthase, subunit alpha AF1185 iron-sulfur cluster binding protein 36.7% AF1699 pyruvate ferredoxin oxidoreductase, subunit gamma (cdhA-1) 50.4% AF1263 iron-sulfur cluster binding protein 42.1% (porG) 50.8% AF2397 acetyl-CoA decarbonylase/synthase, subunit alpha AF2380 iron-sulfur cluster binding protein 35.3% Gluconeogenesis (cdhA-2) 54.0% AF2381 iron-sulfur cluster binding protein 34.4% AF0710 phosphoenolpyruvate synthase (ppsA) 61.4% AF0379 acetyl-CoA decarbonylase/synthase, subunit beta AF2409 iron-sulfur cluster binding protein 28.2% (cdhC) 62.7% AF0076 iron-sulfur cluster binding protein 32.7% Glycolysis AF0377 acetyl-CoA decarbonylase/synthase, subunit delta AF1461 
iron-sulfur cluster binding protein, putative 51.0% AF1146 3- (pgk) 48.8% (cdhD) 57.4% AF1436 iron-sulfur flavoprotein (isf-1) 35.7% AF1132 enolase (eno) 53.9% AF1101 acetyl-CoA decarbonylase/synthase, subunit epsilon AF1519 iron-sulfur flavoprotein (isf-2) 56.6% AF1732 glyceraldehyde 3-phosphate dehydrogenase (gap) 56.6% (cdhB-1) 40.0% AF1896 iron-sulfur flavoprotein (isf-3) 37.1% AF1304 triosephosphate isomerase (tpiA) 56.4% AF2398 acetyl-CoA decarbonylase/synthase, subunit epsilon AF1372 methylviologen-reducing hydrogenase, Pentose phosphate pathway (cdhB-2) 38.9% subunit alpha (vhuA) 39.4% AF0943 ribose 5-phosphate isomerase (rpi) 48.9% AF0376 acetyl-CoA decarbonylase/synthase, AF1374 methylviologen-reducing hydrogenase, subunit gamma (cdhE) 55.4% subunit delta (vhuD) 41.7% Sugars AF1849 carbon monoxide dehydrogenase, catalytic subunit AF1373 methylviologen-reducing hydrogenase, AF0356 carbohydrate kinase, pfkB family 31.3% (cooS) 39.9% subunit gamma (vhuG) 38.6% AF0401 carbohydrate kinase, pfkB family 34.1% AF0950 carbon monoxide dehydrogenase, iron sulfur subunit AF0157 molybdopterin oxidoreductase, iron-sulfur binding AF1324 carbohydrate kinase, FGGY family 27.1% (cooF) 38.9% subunit 38.6% AF1752 carbohydrate kinase, FGGY family 29.3% AF1535 ferredoxin-, catalytic subunit AF0174 molybdopterin oxidoreductase, membrane subunit 26.0% AF0861 D-arabino 3-hexulose 6-phosphate formaldehyde (ftrB) 38.6% AF0175 molybdopterin oxidoreductase, iron-sulfur binding lyase (hps-1) 30.6% AF2073 formylmethanofuran:tetrahydromethanopterin subunit 42.0% AF1305 D-arabino 3-hexulose 6-phosphate formyltransferase (ftr-1) 46.0% AF0176 molybdopterin oxidoreductase, molybdopterin formaldehyde lyase (hps-2) 44.2% AF2207 formylmethanofuran:tetrahydromethanopterin binding subunit 32.6% AF0480 fuculose-1-phosphate aldolase (fucA) 31.8% formyltransferase (ftr-2) 68.4% AF1935 N5,N10-methenyltetrahydromethanopterin AF0004 RNase L inhibitor 54.5% AF0633 
isoleucyl-tRNA synthetase (ileS) 48.9% cyclohydrolase (mch) 97.3% AF0021 signal-transducing histidine kinase 26.1% AF2421 leucyl-tRNA synthetase (leuS) 49.7% AF0714 N5,N10-methylenetetrahydromethanopterin AF0208 signal-transducing histidine kinase 27.9% AF1216 lysyl-tRNA synthetase (lysS) 43.6% dehydrogenase (mtd) 61.8% AF0450 signal-transducing histidine kinase 32.4% AF1453 methionyl-tRNA synthetase (metS) 45.2% AF1066 N5,N10-methylenetetrahydromethanopterin reductase AF0770 signal-transducing histidine kinase 26.9% AF1955 phenylalanyl-tRNA synthetase, subunit alpha (pheS) 44.4% (mer-1) 59.1% AF0893 signal-transducing histidine kinase 28.7% AF1424 phenylalanyl-tRNA synthetase, subunit beta (pheT) 42.6% AF1196 N5,N10-methylenetetrahydromethanopterin reductase AF1184 signal-transducing histidine kinase 29.8% AF1609 prolyl-tRNA synthetase (proS) 56.8% (mer-2) 37.4% AF1452 signal-transducing histidine kinase 28.5% AF2035 seryl-tRNA synthetase (serS) 45.4% AF0009 N5-methyltetrahydromethanopterin: AF1467 signal-transducing histidine kinase 37.4% AF0548 threonyl-tRNA synthetase (thrS) 46.9% methyltransferase (mtr) 42.1% AF1472 signal-transducing histidine kinase 30.4% AF1694 tryptophanyl-tRNA synthetase (trpS) 52.4% AF1587 ribulose bisphosphate carboxylase, large subunit AF1483 signal-transducing histidine kinase 27.7% AF0776 tyrosyl-tRNA synthetase (tyrS) 57.6% (rbcL-1) 40.6% AF1515 signal-transducing histidine kinase 32.0% AF2224 valyl-tRNA synthetase (valS) 54.5% AF1638 ribulose bisphosphate carboxylase, large subunit AF1639 signal-transducing histidine kinase 29.9% Degradation of proteins, peptides, and glycopeptides (rbcL-2) 44.9% AF1721 signal-transducing histidine kinase 34.5% AF1976 26S regulatory subunit 4 66.0% AF1930 tungsten formylmethanofuran dehydrogenase, AF2109 signal-transducing histidine kinase 31.6% AF1653 alkaline (aprM) 44.5% subunit A (fwdA) 48.9% AF0881 signal-transducing histidine kinase, AF0578 , putative 27.8% AF1650 tungsten formylmethanofuran 
dehydrogenase, authentic frameshift 26.5% AF0364 ATP-dependent protease La (lon) 36.6% subunit B (fwdB-1) 37.0% AF0277 signal-transducing histidine kinase, putative 29.8% AF1946 cysteine proteinase, putative 36.2% AF1929 tungsten formylmethanofuran dehydrogenase, AF0410 signal-transducing histidine kinase, putative 27.1% AF1281 intracellular protease (pfpI) 56.0% subunit B (fwdB-2) 49.4% AF0448 signal-transducing histidine kinase, putative 26.1% AF1112 O-sialoglycoprotein (gcp) 57.6% AF1931 tungsten formylmethanofuran dehydrogenase, AF1620 signal-transducing histidine kinase, putative 26.2% AF0665 O-sialoglycoprotein endopeptidase, putative 35.6% subunit C (fwdC) 44.1% AF2032 signal-transducing histidine kinase, putative 22.5% AF2086 protease inhibitor, putative 37.0% AF1651 tungsten formylmethanofuran dehydrogenase, AF2420 signal-transducing histidine kinase, putative 28.4% AF0490 proteasome, subunit alpha (psmA) 60.8% subunit D (fwdD-1) 32.6% AF0442 succinoglycan biosynthesis regulator (exsB) 37.2% AF0481 proteasome, subunit beta (psmB) 58.3% AF1928 tungsten formylmethanofuran dehydrogenase, AF1516 sugar fermentation stimulation protein (sfsA) 31.0% AF2034 X-pro aminopeptidase (pepQ) 34.6% subunit D (fwdD-2) 52.6% AF1270 transcriptional regulatory protein, ArsR family 35.4% AF0177 tungsten formylmethanofuran dehydrogenase, AF1544 transcriptional regulatory protein, ArsR family 32.3% Protein modification subunit E (fwdE) 29.7% AF1853 transcriptional regulatory protein, ArsR family 34.9% AF0656 maturation protein (pmbA) 32.7% AF1644 tungsten formylmethanofuran dehydrogenase, AF2136 transcriptional regulatory protein, ArsR family 39.8% AF0378 CODH nickel-insertion accessory protein (cooC-1) 35.7% subunit F (fwdF) 38.2% AF0439 transcriptional regulatory protein, AsnC family 29.8% AF1685 CODH nickel-insertion accessory protein (cooC-2) 47.4% AF1649 tungsten formylmethanofuran dehydrogenase, AF0474 transcriptional regulatory protein, AsnC family 51.0% AF1615 cofactor 
modifying protein (cmo) 27.2% subunit G (fwdG) 45.6% AF0584 transcriptional regulatory protein, AsnC family 35.3% AF2195 (dys1-1) 32.6% AF1121 transcriptional regulatory protein, AsnC family 35.8% AF2300 deoxyhypusine synthase (dys1-2) 34.9% PURINES, PYRIMIDINES, NUCLEOSIDES, AND NUCLEOTIDES AF1148 transcriptional regulatory protein, AsnC family 32.6% AF0381 (dph5) 40.8% 2’- metabolism AF1404 transcriptional regulatory protein, AsnC family 45.1% AF2324 fmu and fmv protein 40.0% AF1108 triphosphate deaminase, putative 38.1% AF1448 transcriptional regulatory protein, AsnC family 30.6% AF1367 hydrogenase expression/formation protein (hypA) 40.4% AF1664 (nrd) 59.7% AF1723 transcriptional regulatory protein, AsnC family 46.4% AF1368 hydrogenase expression/formation protein (hypB) 54.4% AF1554 thioredoxin reductase (trxB) 45.2% AF1743 transcriptional regulatory protein, AsnC family 34.9% AF1369 hydrogenase expression/formation protein (hypC) 40.5% AF2047 , putative 33.1% AF2127 transcriptional regulatory protein, LysR family 30.8% AF1370 hydrogenase expression/formation protein (hypD) 46.0% AF0114 transcriptional regulatory protein, putative 35.6% AF1365 hydrogenase expression/formation protein (hypE) 51.5% Nucleotide and nucleoside interconversions AF1968 transcriptional regulatory protein, Rok family 32.9% AF1366 hydrogenase expression/formation regulatory AF0876 5’- (nt5) 30.9% AF0112 transcriptional regulatory protein, Sir2 family 38.9% protein (hypF) 45.1% AF0676 () 56.1% AF1676 transcriptional regulatory protein, Sir2 family 40.6% AF0036 L-isoaspartyl protein carboxyl methyltransferase AF1900 (cmk) 48.6% AF1817 transcriptional regulatory protein, TetR family 24.5% (pcm-1) 60.7% AF0767 nucleoside diphosphate kinase (ndk) 56.4% AF0363 transcriptional repressor (cinR) 27.5% AF2322 L-isoaspartyl protein carboxyl methyltransferase AF0061 (tmk) 34.9% (pcm-2) 59.3% AF1308 thymidylate kinase, putative 26.3% REPLICATION AF1840 (map) 48.6% AF2042 uridylate kinase (pyrH) 
53.6% DNA replication, restriction, modification, recombination, and repair AF1989 peptidyl-prolyl cis-trans isomerase (slyD) 34.4% Purine ribonucleotide biosynthesis AF2117 3-methyladenine DNA glycosylase (alkA) 30.0% AF0853 proliferating-cell nucleolar antigen P120, putative 35.7% AF2242 (purB) 52.3% AF2060 activator 1, replication factor C, 35 KDa subunit 66.3% AF2039 proliferating-cell nucleolar antigen P120, putative 44.2% AF0841 adenylosuccinate synthetase (purA) 70.8% AF1195 activator 1, replication factor C, 53 KDa subunit 43.7% AF1449 pyruvate formate-lyase 2 (pflD) 37.8% AF0873 amidophosphoribosyltransferase (purF) 55.8% AF0465 DNA gyrase, subunit A (gyrA) 48.4% AF1450 pyruvate formate-lyase 2 activating enzyme (pflC) 38.8% AF0253 GMP synthase (guaA-1) 59.8% AF0530 DNA gyrase, subunit B (gyrB) 58.4% AF0117 pyruvate formate-lyase activating enzyme (act-1) 25.5% AF1320 GMP synthase (guaA-2) 49.4% AF1388 DNA , putative 46.8% AF0918 pyruvate formate-lyase activating enzyme (act-2) 42.3% AF1811 inosine monophosphate cyclohydrolase 38.3% AF1960 DNA helicase, putative 32.7% AF1330 pyruvate formate-lyase activating enzyme (act-3) 45.8% AF0847 inosine monophosphate dehydrogenase (guaB-1) 41.6% AF0623 DNA ligase (lig) 44.4% AF2278 pyruvate formate-lyase activating enzyme (act-4) 42.5% AF2118 inosine monophosphate dehydrogenase (guaB-2) 31.9% AF1725 DNA ligase, putative 32.7% AF1961 pyruvate formate-lyase activating enzyme (pflX) 50.2% AF1259 inosine monophosphate dehydrogenase, putative 51.6% AF0497 DNA polymerase B1 (polB) 45.1% AF0380 transmembrane oligosaccharyl transferase, putative 27.8% AF1157 phosphoribosylamine—glycine ligase (purD) 40.9% AF0693 DNA polymerase B2 (boxA), authentic frameshift 32.3% AF0329 transmembrane oligosaccharyl transferase, putative 29.3% AF1271 phosphoribosylaminoimidazole carboxylase (purE) 42.8% AF0972 DNA polymerase III, subunit epsilon (dnaQ) 31.9% Ribosomal proteins: synthesis and modification AF1272 
phosphoribosylaminoimidazolesuccinocarboxamide AF2277 DNA polymerase, bacteriophage-type 36.9% AF1490 LSU ribosomal protein L1P (rpl1P) 48.6% synthase (purC) 34.7% AF0742 DNA , putative 26.8% AF1922 LSU ribosomal protein L2P (rpl2P) 60.4% AF1693 phosphoribosylformylglycinamidine cyclo-ligase AF0264 DNA repair protein RAD2 (rad2) 44.4% AF1925 LSU ribosomal protein L3P (rpl3P) 56.5% (purM) 53.8% AF0358 DNA repair protein RAD25 32.5% AF1924 LSU ribosomal protein L4P (rpl4P) 56.4% AF1260 phosphoribosylformylglycinamidine synthase I (purQ) 40.9% AF1031 DNA repair protein RAD32 (rad32) 37.6% AF1912 LSU ribosomal protein L5P (rpl5P) 51.7% AF1940 phosphoribosylformylglycinamidine synthase II (purL) 41.5% AF0993 DNA repair protein RAD51 (radA) 59.3% AF1909 LSU ribosomal protein L6P (rpl6P) 53.7% AF0589 ribose-phosphate pyrophosphokinase (prsA-1) 35.0% AF2096 DNA repair protein REC 40.0% AF0764 LSU ribosomal protein L7AE (rpl7AE) 60.7% AF1419 ribose-phosphate pyrophosphokinase (prsA-2) 41.1% AF2418 DNA repair protein, putative 28.9% AF1491 LSU ribosomal protein L10E (rpl10E) 45.6% AF1806 DNA topoisomerase I (topA) 36.2% Pyrimidine ribonucleotide biosynthesis AF0538 LSU ribosomal protein L11P (rpl11P) 67.8% AF0940 DNA topoisomerase VI, subunit A (top6A) 39.8% AF0106 aspartate carbamoyltransferase, catalytic AF1492 LSU ribosomal protein L12A (rpl12A) 76.0% AF0652 DNA topoisomerase VI, subunit B (top6B) 43.9% subunit (pyrB) 60.7% AF1128 LSU ribosomal protein L13P (rpl13P) 47.4% AF1692 endonuclease III (nth) 44.3% AF0107 aspartate carbamoyltransferase, regulatory AF1915 LSU ribosomal protein L14P (rpl14P) 66.7% AF0580 III (xthA) 41.3% subunit (pyrI) 48.2% AF2319 LSU ribosomal protein L15E (rpl15E) 70.3% AF2314 methylated-DNA-protein-cysteine AF1274 carbamoyl-phosphate synthase, large subunit (carB) 65.1% AF1903 LSU ribosomal protein L15P (rpl15P) 53.8% methyltransferase (ogt) 55.3% AF1273 carbamoyl-phosphate synthase, small subunit (carA) 55.2% AF1127 LSU ribosomal protein L18E 
(rpl18E) 53.8% AF1409 modification methylase, type III R/M system 31.4% AF0252 CTP synthase (pyrG) 58.3% AF1906 LSU ribosomal protein L18P (rpl18P) 57.8% AF1234 mutator protein MutT (mutT) 63.6% AF2250 (pyrC) 37.2% AF1907 LSU ribosomal protein L19E (rpl19E) 55.5% AF2200 mutator protein MutT, putative 42.0% AF0745 dihydroorotase dehydrogenase (pyrD) 44.8% AF1529 LSU ribosomal protein L21E (rpl21E) 53.2% AF0335 proliferating-cell nuclear antigen (pol30) 33.7% AF1741 orotate phosphoribosyl transferase (pyrE) 49.0% AF1920 LSU ribosomal protein L22P (rpl22P) 55.2% AF0694 replication control protein A, putative 30.2% AF0386 orotate phosphoribosyl transferase, putative 39.0% AF1923 LSU ribosomal protein L23P (rpl23P) 55.6% AF1024 reverse gyrase (top-RG) 40.7% AF0537 LSU ribosomal protein L24A (rpl24A) 51.4% Salvage of nucleosides and nucleotides AF0621 HII (rnhB) 39.3% AF0766 LSU ribosomal protein L24E (rpl24E) 66.1% AF0240 deaminase (adeC) 39.5% AF1715 type I restriction-modification enzyme, M subunit, AF1914 LSU ribosomal protein L24P (rpl24P) 57.8% AF1764 dCMP deaminase, putative 39.0% authentic frameshift 63.0% AF1918 LSU ribosomal protein L29P (rpl29P) 44.6% AF1788 methylthioadenosine (mtaP) 40.0% AF1708 type I restriction-modification enzyme, R subunit 38.2% AF1890 LSU ribosomal protein L30E (rpl30E) 41.7% AF1341 phosphorylase (deoA-1) 46.7% AF1710 type I restriction-modification enzyme, S subunit 33.0% AF1904 LSU ribosomal protein L30P (rpl30P) 55.9% AF1342 (deoA-2) 40.7% TRANSCRIPTION AF2066 LSU ribosomal protein L31E (rpl31E) 50.6% AF0239 xanthine-guanine phosphoribosyltransferase (gptA-1) 25.7% AF1908 LSU ribosomal protein L32E (rpl32E) 51.2% AF1789 xanthine-guanine phosphoribosyltransferase (gptA-2) 28.2% DNA-dependent RNA polymerase AF0057 LSU ribosomal protein L37AE (rpl37AE) 47.6% AF1888 DNA-directed RNA polymerase, subunit A’ (rpoA1) 63.6% REGULATORY FUNCTIONS AF0874 LSU ribosomal protein L37E (rpl37E) 57.9% AF1889 DNA-directed RNA polymerase, subunit A’’ 
(rpoA2) 55.7% AF2067 LSU ribosomal protein L39E (rpl39E) 56.9% AF1959 (R)-hydroxyglutaryl-CoA dehydratase activator (hgdC) 51.2% AF1887 DNA-directed RNA polymerase, subunit B’ (rpoB1) 65.3% AF1430 LSU ribosomal protein L40E (rpl40E) 73.3% AF0168 arsenical resistance operon repressor, putative 36.7% AF1886 DNA-directed RNA polymerase, subunit B’’ (rpoB2) 57.1% AF1333 LSU ribosomal protein L44E (rpl44E) 46.8% AF2204 regulatory protein, putative 29.9% AF2282 DNA-directed RNA polymerase, subunit D (rpoD) 34.6% AF2064 LSU ribosomal protein LXA (rplXA) 53.8% AF0074 biotin operon repressor/biotin—[acetyl CoA AF1117 DNA-directed RNA polymerase, subunit E’ (rpoE1) 48.4% AF0739 ribosomal protein S18 alanine acetyltransferase 38.5% carboxylase] ligase (birA) 36.6% AF1116 DNA-directed RNA polymerase, subunit E’’ (rpoE2) 40.0% AF2303 ribosomal protein S6 modification protein (rimK) 32.2% AF1724 dinitrogenase reductase activating glycohydrolase AF1885 DNA-directed RNA polymerase, subunit H (rpoH) 59.5% AF1133 SSU ribosomal protein S2P (rps2P) 58.3% (draG) 37.9% AF1131 DNA-directed RNA polymerase, subunit K (rpoK) 61.5% AF1919 SSU ribosomal protein S3P (rps3P) 50.0% AF2232 ferric uptake regulation protein (fur) 25.8% AF0207 DNA-directed RNA polymerase, subunit L (rpoL) 42.0% AF1913 SSU ribosomal protein S4E (rps4E) 48.9% AF1785 iron-dependent repressor 42.0% AF1130 DNA-directed RNA polymerase, subunit N (rpoN) 58.8% AF2284 SSU ribosomal protein S4P (rps4P) 59.1% AF2395 iron-dependent repressor 40.0% Transcription factors AF1905 SSU ribosomal protein S5P (rps5P) 60.0% AF0245 iron-dependent repressor (desR) 28.2% AF1813 TBP-interacting protein TIP49 45.7% AF0511 SSU ribosomal protein S6E (rps6E) 50.8% AF1984 iron-dependent repressor (troR) 28.3% AF1299 transcription initiation factor IIB 60.4% AF1893 SSU ribosomal protein S7P (rps7P) 59.6% AF2430 lacZ expression regulatory protein (icc) 29.6% AF0373 transcription initiation factor IID 59.4% AF2152 SSU ribosomal protein S8E (rps8E) 
61.6% AF1622 leucine responsive regulatory protein (lrp) 29.1% AF0757 transcription initiation factor IIE, subunit alpha, putative 23.5% AF1910 SSU ribosomal protein S8P (rps8P) 64.6% AF0673 mercuric resistance operon regulatory protein (merR) 37.6% AF1891 transcription termination-antitermination factor NusA, AF1129 SSU ribosomal protein S9P (rps9P) 59.5% AF2425 regulatory protein (moxR) 48.3% putative 48.9% AF0938 SSU ribosomal protein S10P (rps10P) 71.0% AF1475 mitochondrial benzodiazepine receptor/sensory AF1235 transcription-associated protein TFIIS 59.0% AF2283 SSU ribosomal protein S11P (rps11P) 71.1% transduction protein 38.4% AF1892 SSU ribosomal protein S12P (rps12P) 74.1% AF0198 regulatory protein, putative 41.7% RNA processing AF2285 SSU ribosomal protein S13P (rps13P) 52.1% AF1933 monoamine oxidase regulatory protein, putative 38.9% AF1783 dimethyladenosine transferase (ksgA) 44.7% AF1911 SSU ribosomal protein S14P (rps14P) 61.5% AF0978 nitrogen regulatory protein P-II (glnB-1) 61.7% AF2087 fibrillarin (fib) 49.3% AF0801 SSU ribosomal protein S15P (rps15P) 62.0% AF1747 nitrogen regulatory protein P-II (glnB-2) 58.0% AF0482 mRNA 3’-end processing factor, putative 55.5% AF0911 SSU ribosomal protein S17E (rps17E) 52.6% AF1750 nitrogen regulatory protein P-II (glnB-3) 60.7% AF0532 mRNA 3’-end processing factor, putative 39.1% AF1916 SSU ribosomal protein S17P (rps17P) 59.0% AF0331 pheromone shutdown protein (traB) 40.5% AF2361 mRNA 3’-end processing factor, putative 30.5% AF2069 SSU ribosomal protein S19E (rps19E) 64.2% AF1797 phosphate regulatory protein, putative 30.7% AF2399 rRNA methylase, putative 36.4% AF1921 SSU ribosomal protein S19P (rps19P) 60.9% AF0521 protease synthase and sporulation regulator Pai1, AF0362 snRNP, putative 32.0% AF1114 SSU ribosomal protein S24E (rps24E) 40.2% putative 52.4% AF0875 snRNP, putative 35.7% AF1113 SSU ribosomal protein S27AE (rps27AE) 60.0% AF1627 repressor protein 59.1% TRANSLATION AF1334 SSU ribosomal protein S27E
(rps27E) 49.0% AF1793 repressor protein 54.5% Amino acyl tRNA synthetases AF0765 SSU ribosomal protein S28E (rps28E) 55.6% AF0449 response regulator 38.1% AF2255 alanyl-tRNA synthetase (alaS) 47.1% AF2320 SSU ribosomal protein S3AE (rps3AE) 38.9% AF1063 response regulator 36.3% AF0894 arginyl-tRNA synthetase (argS) 48.8% AF1256 response regulator 42.5% tRNA modification AF0920 aspartyl-tRNA synthetase (aspS) 62.5% AF1384 response regulator 44.7% AF0588 archaeosine tRNA-ribosyltransferase (tgtA) 52.0% AF0411 cysteinyl-tRNA synthetase (cysS) 46.1% AF1473 response regulator 32.5% AF1954 Glu-tRNA amidotransferase, subunit A (gatA-1) 38.6% AF0260 glutamyl-tRNA synthetase (gltX) 44.9% AF1898 response regulator 48.7% AF2329 Glu-tRNA amidotransferase, subunit A (gatA-2) 53.5% AF0916 glycyl-tRNA synthetase (glyS) 51.2% AF2249 response regulator 44.8% AF1440 Glu-tRNA amidotransferase, subunit B (gatB-1) 54.7% AF1642 histidyl-tRNA synthetase (hisS) 46.0% AF2419 response regulator 37.9% AF2116 Glu-tRNA amidotransferase, subunit B (gatB-2) 46.4%

Nature © Macmillan Publishers Ltd 1997 AF2328 Glu-tRNA amidotransferase, subunit C (gatC) 35.1% protein (dppA) 33.1% AF2258 multidrug resistance protein 31.3% AF0815 N2,N2-dimethylguanosine tRNA methyltransferase AF1768 dipeptide ABC transporter, permease protein (dppB) 39.3% OTHER CATEGORIES (trm1) 38.2% AF1769 dipeptide ABC transporter, permease protein (dppC) 40.8% AF1730 I (truA) 37.4% AF0680 glutamine ABC transporter, ATP-binding protein (glnQ) 63.8% Adaptations and atypical conditions AF1485 queuine tRNA-ribosyltransferase (tgtB) 44.1% AF0231 glutamine ABC transporter, periplasmic glutamine- AF0508 ethylene-inducible protein 74.5% AF0493 ribonuclease PH (rph) 30.8% binding protein (glnH) 38.0% AF0235 heat shock protein (htpX) 32.9% AF0900 tRNA intron endonuclease (endA) 41.8% AF0232 glutamine ABC transporter, permease protein (glnP) 39.3% AF0942 surE stationary-phase survival protein (surE) 50.2% AF2156 tRNA (cca) 43.9% AF0981 osmoprotection protein (proV) 39.0% AF1996 virulence associated (vapC-1) 50.0% AF0979 osmoprotection protein (proW-1) 32.8% Translation factors AF1690 virulence associated protein C (vapC-2) 30.0% AF0980 osmoprotection protein (proW-2) 36.8% AF2350 ATP-dependent RNA helicase HepA, putative 31.5% AF0982 osmoprotection protein (proX) 28.7% AF2254 ATP-dependent RNA helicase, DEAD-family (deaD) 52.2% Drug and analog sensitivity AF0015 proline permease (putP-1) 26.2% AF0071 ATP-dependent RNA helicase, putative 29.6% AF1884 daunorubicin resistance ATP-binding protein (drrA) 47.1% AF0969 proline permease (putP-2) 27.4% AF1458 ATP-dependent RNA helicase, putative 48.1% AF1883 daunorubicin resistance membrane protein (drrB) 27.0% AF1222 proline permease (putP-3) 27.0% AF2406 ATP-dependent RNA helicase, putative 35.2% AF0487 penicillin G acylase 31.7% AF1608 spermidine/putrescine ABC transporter, ATP- AF1149 large helicase-related protein (lhr-1) 34.5% AF1214 phenylacrylic acid decarboxylase (pad1) 43.2% binding protein (potA) 50.2% AF2177 large 
helicase-related protein (lhr-2), authentic AF2194 rRNA (adenine-N6)-methyltransferase, putative 29.2% AF1605 spermidine/putrescine ABC transporter, periplasmic frameshift 56.0% AF1696 small multidrug export protein (qacE) 39.0% spermidine/putrescine-binding protein (potD), AF1220 peptide chain release factor eRF, subunit 1 51.2% authentic frameshift 31.0% AF2245 SKI2-family helicase, authentic frameshift 45.7% Transposon-related functions AF1607 spermidine/putrescine ABC transporter, permease AF0937 translation elongation factor EF-1, subunit alpha (tuf) 74.4% AF0120 insertion sequence ISH S1, authentic frameshift 34.5% protein (potB) 38.0% AF0574 translation elongation factor EF-1, subunit beta 31.3% AF0193 ISA0963-1, putative transposase, authentic frameshift 34.3% AF1606 spermidine/putrescine ABC transporter, permease AF1894 translation elongation factor EF-2 (fus) 62.5% AF0309 ISA0963-2, putative transposase 33.5% protein (potC) 38.7% AF0777 translation initiation factor eIF-1A (eif1A) 57.5% AF1310 ISA0963-3, putative transposase 33.5% AF0527 translation initiation factor eIF-2, subunit alpha (eif2A) 51.1% Anions AF1383 ISA0963-4, putative transposase 33.5% AF2326 translation initiation factor eIF-2, subunit beta, putative 45.5% AF2308 arsenite transport protein (arsB) 27.3% AF1410 ISA0963-5, putative transposase 33.5% AF0592 translation initiation factor eIF-2, AF1415 chloride channel, putative 27.3% AF1705 ISA0963-6, putative transposase 33.5% subunit gamma (eif2G) 64.4% AF0025 cyanate transport protein (cynX) 24.5% AF1836 ISA0963-7, putative transposase, authentic frameshift 20.0% AF0370 translation initiation factor eIF-2B, subunit AF0087 nitrate ABC transporter, ATP-binding protein (nrtC-1) 47.4% AF0678 ISA1083-1, ISORF2 33.6% delta (eif2BD) 53.3% AF0638 nitrate ABC transporter, ATP-binding protein (nrtC-2) 55.5% AF0679 ISA1083-1, putative transposase 37.2% AF2037 translation initiation factor eIF-2B, subunit AF0640 nitrate ABC transporter, ATP-binding protein, putative
32.5% AF1351 ISA1083-2, ISORF2 30.8% delta (eif2BD) 57.9% AF0086 nitrate ABC transporter, permease protein (nrtB-1) 35.4% AF1352 ISA1083-2, putative transposase 31.5% AF0645 translation initiation factor eIF-5A (eif5A) 50.4% AF0639 nitrate ABC transporter, permease protein (nrtB-2) 37.4% AF2140 ISA1083-3, ISORF2 30.8% AF0768 translation initiation factor IF-2 (infB) 52.2% AF1359 phosphate ABC transporter, ATP-binding AF2139 ISA1083-3, putative transposase 31.5% protein (pstB) 66.0% AF0278 ISA1214-1, ISORF2 27.7% TRANSPORT AND BINDING PROTEINS AF1356 phosphate ABC transporter, periplasmic phosphate- AF0279 ISA1214-1, putative transposase 33.3% General binding protein (phoX) 25.1% AF0305 ISA1214-2, ISORF2 27.7% AF0393 ABC transporter, ATP-binding protein 34.5% AF1358 phosphate ABC transporter, permease protein (pstA) 34.1% AF0306 ISA1214-2, putative transposase 33.3% AF0984 ABC transporter, ATP-binding protein 35.2% AF1357 phosphate ABC transporter, permease protein (pstC) 33.7% AF0641 ISA1214-3, ISORF2 26.5% AF1006 ABC transporter, ATP-binding protein 35.1% AF1360 phosphate ABC transporter, regulatory protein (phoU) 26.9% AF0642 ISA1214-3, putative transposase 33.3% AF1018 ABC transporter, ATP-binding protein 57.7% AF0791 phosphate permease, putative 31.1% AF0857 ISA1214-4, ISORF2 27.7% AF1021 ABC transporter, ATP-binding protein 37.8% AF1798 phosphate permease, putative 52.9% AF0858 ISA1214-4, putative transposase 33.3% AF1136 ABC transporter, ATP-binding protein 39.3% AF0092 sulfate ABC transporter, ATP-binding protein (cysA) 54.2% AF2091 ISA1214-5, ISORF2 26.5% AF1139 ABC transporter, ATP-binding protein 38.2% AF0093 sulfate ABC transporter, permease protein (cysT) 44.1% AF2092 ISA1214-5, putative transposase 33.3% AF1300 ABC transporter, ATP-binding protein 34.1% AF2223 ISA1214-6, ISORF2 26.5% AF1469 ABC transporter, ATP-binding protein 43.5% Carbohydrates, organic alcohols, and acids AF2222 ISA1214-6, putative transposase 25.6% AF1819 ABC transporter, 
ATP-binding protein 51.1% AF0347 C4-dicarboxylate transporter (mae1) 24.5% AF0138 transposase IS240-A 43.3% AF1982 ABC transporter, ATP-binding protein 41.3% AF1426 glycerol uptake facilitator, MIP channel (glpF) 36.2% AF0895 transposase IS240-A 46.2% AF2364 ABC transporter, ATP-binding protein 53.5% AF0013 hexuronate transporter (exuT) 25.1% AF2390 transposase, authentic frameshift 24.0% AF1005 ABC transporter, ATP-binding protein, putative 28.7% AF0806 L-lactate permease (lctP) 31.7% AF0137 transposase, putative 29.6% AF1064 ABC transporter, ATP-binding protein, putative 36.0% AF0008 oxalate/formate antiporter (oxlT-1) 25.7% AF1628 transposase, putative 32.8% AF1983 ABC transporter, periplasmic binding protein 25.4% AF0367 oxalate/formate antiporter (oxlT-2) 33.2% UNKNOWN AF1981 ABC transporter, permease protein 29.9% AF1069 pantothenate permease (panF-1) 28.9% AF0477 AAA superfamily ATPase 35.0% AF1995 sodium- and chloride-dependent transporter 52.5% AF1205 pantothenate permease (panF-2) 24.8% AF0513 allene oxide synthase, putative 39.5% AF0237 pantothenate permease (panF-3) 25.1% AF0478 ATP-binding protein PhnP (phnP) 30.9% Amino acids, peptides and amines AF0041 polysaccharide ABC transporter, ATP-binding AF1775 chlorohydrolase, putative 34.4% AF1766 amino-acid ABC transporter, periplasmic protein (rfbB-1) 42.5% AF0973 bile acid-inducible operon protein F (baiF-1) 30.8% binding protein/ 27.4% AF0290 polysaccharide ABC transporter, ATP-binding protein AF0974 bile acid-inducible operon protein F (baiF-2) 29.9% AF0222 branched-chain amino acid ABC transporter, (rfbB-2) 43.9% AF1315 bile acid-inducible operon protein F (baiF-3) 31.3% ATP-binding protein (braF-1) 42.7% AF0042 polysaccharide ABC transporter, permease protein AF2063 c- binding protein, putative 21.7% AF0822 branched-chain amino acid ABC transporter, (rfbA-1) 27.5% AF1992 calcium-binding protein, putative 31.2% ATP-binding protein (braF-2) 44.7% AF0289 polysaccharide ABC transporter, permease protein 
AF2287 carotenoid biosynthetic gene ERWCRTS, putative 49.4% AF0959 branched-chain amino acid ABC transporter, ATP- (rfbA-2) 28.5% AF0512 inner envelope membrane protein 42.5% binding protein (braF-3) 37.6% AF0887 ribose ABC transporter, ATP-binding protein (rbsA-1) 33.3% AF2251 competence-damage protein, putative 28.0% AF1390 branched-chain amino acid ABC transporter, AF1170 ribose ABC transporter, ATP-binding protein (rbsA-2) 27.9% AF0090 dehydrase 34.1% ATP-binding protein (braF-4) 59.7% AF0888 ribose ABC transporter, permease protein (rbsC-1) 24.1% AF1498 dehydrase, putative 29.4% AF0221 branched-chain amino acid ABC transporter, AF0889 ribose ABC transporter, permease protein (rbsC-2) 31.2% AF1518 DNA/pantothenate metabolism flavoprotein, putative 51.4% ATP-binding protein (braG-1) 48.2% AF2014 sugar transporter, putative 26.0% AF0039 dolichol-P-glucose synthetase, putative 33.7% AF0823 branched-chain amino acid ABC transporter, AF0328 dolichol-P-glucose synthetase, putative 39.0% ATP-binding protein (braG-2) 42.9% Cations AF0581 dolichol-P-glucose synthetase, putative 27.5% AF0958 branched-chain amino acid ABC transporter, AF0977 ammonium transporter (amt-1) 44.3% AF0569 DR-beta chain MHC class II 37.7% ATP-binding protein (braG-3) 34.1% AF1746 ammonium transporter (amt-2) 49.0% AF0383 endonuclease III, putative 47.1% AF1389 branched-chain amino acid ABC transporter, ATP- AF1749 ammonium transporter (amt-3) 41.5% AF1150 erpK protein, putative 54.9% binding protein (braG-4) 64.6% AF0473 cation-transporting ATPase, P-type (pacS) 44.0% AF2372 extragenic suppressor (suhB) 37.0% AF0223 branched-chain amino acid ABC transporter, AF0152 copper-transporting ATPase, P-type (copB) 44.5% AF1418 glycerol-3-phosphate cytidyltransferase (taqD) 56.6% periplasmic binding protein (braC-1) 34.3% AF0246 iron (II) transporter (feoB-1) 33.3% AF0744 GTP-binding protein 33.4% AF0827 branched-chain amino acid ABC transporter, AF2394 iron (II) transporter (feoB-2) 48.0% AF1181 
GTP-binding protein 36.3% periplasmic binding protein (braC-2) 26.8% AF0561 iron (II) transporter (feoB-3), authentic frameshift 29.4% AF1364 GTP-binding protein 57.5% AF0962 branched-chain amino acid ABC transporter, AF0430 iron (III) ABC transporter, ATP-binding protein (hemV-1) 50.4% AF2146 GTP-binding protein 65.9% periplasmic binding protein (braC-3) 25.6% AF0432 iron (III) ABC transporter, ATP-binding protein (hemV-2) 58.7% AF0428 GTP-binding protein, GTP1/OBG-family 43.9% AF1391 branched-chain amino acid ABC transporter, AF1401 iron (III) ABC transporter, ATP-binding protein (hemV-3) 35.2% AF2237 HAM1 protein 31.4% periplasmic binding protein (braC-4) 50.1% AF1397 iron (III) ABC transporter, periplasmic hemin-binding protein AF2211 HIT family protein (hit) 29.6% AF0224 branched-chain amino acid ABC transporter, (hemT), authentic frameshift 28.2% AF0216 L-isoaspartyl protein carboxyl methyltransferase permease protein (braD-1) 25.4% AF0431 iron (III) ABC transporter, permease protein (hemU-1) 36.2% PimT, putative 35.5% AF0825 branched-chain amino acid ABC transporter, AF1402 iron (III) ABC transporter, permease protein (hemU-2) 35.2% AF2313 maoC protein (maoC) 43.0% permease protein (braD-2) 30.8% AF0786 magnesium and cobalt transporter (corA) 40.1% AF0429 methyltransferase 43.8% AF0961 branched-chain amino acid ABC transporter, AF0346 mercuric transport protein periplasmic AF0186 nifS protein, class-V aminotransferase (nifS-1) 46.1% permease protein (braD-3) 23.9% component (merP) 35.2% AF0564 nifS protein, class-V aminotransferase (nifS-2) 45.1% AF1392 branched-chain amino acid ABC transporter, AF0217 Na+/H+ antiporter (napA-1) 28.2% AF0185 nifU protein (nifU-1) 55.6% permease protein (braD-4) 65.4% AF1245 Na+/H+ antiporter (napA-2) 28.4% AF0565 nifU protein (nifU-2) 55.6% AF0225 branched-chain amino acid ABC transporter, AF0846 Na+/H+ antiporter (nhe2) 33.1% AF0632 nifU protein (nifU-3) 47.4% permease protein (braE-1) 28.7% AF0715 potassium channel, 
putative 39.5% AF1781 nodulation protein NfeD (nfeD) 33.4% AF0824 branched-chain amino acid ABC transporter, AF1673 potassium channel, putative 36.3% AF2269 nucleotide-binding protein 48.7% permease protein (braE-2) 31.3% AF2197 potassium channel, putative 24.6% AF2382 nucleotide-binding protein 49.1% AF0960 branched-chain amino acid ABC transporter, AF0218 TRK potassium uptake system protein (trkA-1) 30.2% AF0374 p-nitrophenyl phosphatase (pho2) 31.7% permease protein (braE-3) 30.1% AF0838 TRK potassium uptake system protein (trkA-2) 42.9% AF1978 periplasmic divalent cation tolerance protein (cutA) 31.3% AF1393 branched-chain amino acid ABC transporter, AF0839 TRK potassium uptake system protein (trkH) 39.8% AF1652 prepro- sendai, putative 35.6% permease protein (braE-4) 60.5% AF2021 rod shape-determining protein (mreB) 26.6% AF1612 cationic amino acid transporter (cat-1) 29.5% Other AF1778 stage V sporulation protein (spoVG) 43.9% AF1774 cationic amino acid transporter (cat-2) 38.0% AF0834 ferritin, putative 39.8% AF1970 TPR domain-containing protein 29.0% AF1770 dipeptide ABC transporter, ATP-binding protein (dppD) 47.8% AF1980 heme exporter protein C (helC) 29.0% AF2202 tryptophan-specific permease, putative 25.2% AF1771 dipeptide ABC transporter, ATP-binding protein (dppF) 43.1% AF1144 multidrug resistance protein 29.2% AF0816 vtpJ-therm, putative 42.1% AF1767 dipeptide ABC transporter, dipeptide-binding AF1325 multidrug resistance protein 29.9% AF1679 vtpJ-therm, putative 45.1%
