<<

letters to nature

Received 7 July; accepted 21 September 1998. 26. Tronrud, D. E. Conjugate-direction minimization: an improved method for the re®nement of macromolecules. Acta Crystallogr. A 48, 912±916 (1992). 1. Dalbey, R. E., Lively, M. O., Bron, S. & van Dijl, J. M. The chemistry and enzymology of the type 1 27. Wolfe, P. B., Wickner, W. & Goodman, J. M. Sequence of the leader peptidase of signal peptidases. Sci. 6, 1129±1138 (1997). and the orientation of leader peptidase in the bacterial envelope. J. Biol. Chem. 258, 12073±12080 2. Kuo, D. W. et al. Escherichia coli leader peptidase: production of an active form lacking a requirement (1983). for detergent and development of substrates. Arch. Biochem. Biophys. 303, 274±280 (1993). 28. Kraulis, P.G. Molscript: a program to produce both detailed and schematic plots of protein structures. 3. Tschantz, W. R. et al. Characterization of a soluble, catalytically active form of Escherichia coli leader J. Appl. Crystallogr. 24, 946±950 (1991). peptidase: requirement of detergent or phospholipid for optimal activity. Biochemistry 34, 3935±3941 29. Nicholls, A., Sharp, K. A. & Honig, B. Protein folding and association: insights from the interfacial and (1995). the thermodynamic properties of hydrocarbons. Struct. Funct. Genet. 11, 281±296 (1991). 4. Allsop, A. E. et al.inAnti-Infectives, Recent Advances in Chemistry and Structure-Activity Relationships 30. Meritt, E. A. & Bacon, D. J. Raster3D: photorealistic molecular graphics. Methods Enzymol. 277, 505± (eds Bently, P. H. & O'Hanlon, P. J.) 61±72 (R. Soc. Chem., Cambridge, 1997). 524 (1997). 5. Black, M. T. & Bruton, G. Inhibitors of bacterial signal peptidases. Curr. Pharm. Des. 4, 133±154 (1998). 6. Date, T. Demonstration by a novel genetic technique that leader peptidase is an essential in Acknowledgements. We thank SmithKlineBeecham Pharmaceuticals for penem inhibitor; R. M. Sweet 8 Escherichia coli. J. Bacteriol. 154, 76±83 (1983). for use of beamline X12C (NSLS, Brookhaven National Laboratory); G. Petsko for the ethylmercury ; M. N. G. James for access to equipment for characterization of earlier crystal forms of SPase; 7. Whitely, P. & von Heijne, G. The DsbA-DsbB system affects the formation of disul®de bonds in and S. Mosimann and S. Ness for discussions. This work was supported by the Medical Research Council periplasmic but not in intramembraneous protein domains. FEBS Lett. 332, 49±51 (1993). of Canada, the Canadian Bacterial Diseases Network of Excellence, and British Columbia Medical 8. Peat, T. S. et al. Structure of the UmuD9 protein and its regulation in response to DNA damage. Nature Research Foundation grants to N.C.J.S. M.P. is funded by an MRC of Canada post-doctoral fellowship, 380, 727±730 (1996). N.C.J.S. by an MRC of Canada scholarship, and R.E.D. by the NIH and the American Heart Association. 9. Paetzel, M. et al. Crystallization of a soluble, catalytically active form of Escherichia coli leader peptidase. Proteins Struct. Funct. Genet. 23, 122±125 (1995). Correspondence and requests for materials should be addressed to N.C.J.S. (e-mail: natalie@byron. 10. van Klompenburg, W. et al. Phosphatidylethanolamine mediated insertion of the catalytic domain of biochem.ubc.ca). leader peptidase in membranes. FEBS Lett. 431, 75±79 (1998). 11. Kim, Y. T., Muramatsu, T. & Takahashi, K. Identi®cation of Trp 300 as an important residue for Escherichia coli leader peptidase activity. Eur. J. Biochem. 234, 358±362 (1995). 12. Landolt-Marticorena, C., Williams, K. A., Deber, C. M. & Reithmeirer, R. A. Non-random distribu- tion of amino in the transmembrane segments of human type I single span membrane proteins. J. Mol. Biol. 229, 602±608 (1993). 13. James, M. N. G. in Proteolysis and Protein Turnover (eds Bond, J. S. & Barrett, A. J.) 1±8 (Portland, errata Brook®eld, VT, 1994). 14. Strynadka, N. C. J. et al. Molecular structure of the acyl-enzyme intermediate in b-lactamase at 1.7 AÊ resolution. Nature 359, 393±400 (1992). 15. Manard, R. & Storer, A. C. interactions in and . Biol. Chem. Hoppe-Seyler 373, 393±400 (1992). Reconciling the spectrum 16. Nicolas, A. et al. Contribution of Ser 42 side chain to the stabilization of the oxyanion transition state. Biochemistry 35, 398±410 (1996). of Sagittarius A* with a 17. Paetzel, M. et al. Use of site-directed chemical modi®cation to study an essential in Escherichia coli leader peptidase. J. Biol. Chem. 272, 9994±10003 (1997). 18. Paetzel, M. & Dalbey, R. E. Catalytic hydroxyl/ dyads with serine proteases. Trends Biochem. Sci. two-temperature 22, 28±31 (1997). 19. von Heijne, G. Signal sequences. The limits of variation. J. Mol. Biol. 184, 99±105 (1985). 20. Izard, J. W. & Kendall, D. A. Signal : exquisitely designed transport promoters. Mol. Microbiol. plasma model 13, 765±773 (1994). 21. Matthews, B. W. content of protein crystals. J. Mol. Biol. 33, 491±497 (1968). Rohan Mahadevan 22. Otwinowski, Z. in DENZO (eds Sawyer, L., Isaacs, N. & Baily, S.) 56±62 (SERC Daresbury Laboratory, Warrington, UK, 1993). 23. Collaborative Computational Project No. 4 The CCP4 suite: programs for protein crystallography. Nature 394, 651±653 (1998) Acta Crystallogr. D 50, 760±763 (1994)...... 24. Jones, T. A., Zou, J. Y., Cowan, S. W. & Kieldgaard, M. Improved methods for building protein models A misleading typographical error was introduced into the second in electron density maps and the location of errors in these models. Acta Crystallogr. A 47, 110±119 (1991). sentence of the bold introductory paragraph of this Letter: the word 25. Brunger, A. T. X-PLOR: A System for X-ray Crystallography and NMR (Version 3.1) (Yale Univ. Press, New Haven, 1987). ``infrared'' should be ``inferred''. M

Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence

S. T. Cole, R. Brosch, J. Parkhill, T. Garnier, C. Churcher, D. Harris, S. V. Gordon, K. Eiglmeier, S. Gas, C. E. Barry III, F. Tekaia, K. Badcock, D. Basham, D. Brown, T. Chillingworth, R. Connor, R. Davies, K. Devlin, T. Feltwell, S. Gentles, N. Hamlin, S. Holroyd, T. Hornsby, K. Jagels, A. Krogh, J. McLean, S. Moule, L. Murphy, K. Oliver, J. Osborne, M. A. Quail, M.-A. Rajandream, J. Rogers, S. Rutter, K. Seeger, J. Skelton, R. Squares, S. Squares, J. E. Sulston, K. Taylor, S. Whitehead & B. G. Barrell

Nature 393, 537±544 (1998) ...... As a result of an error during ®lm output, Table 1 was published with some symbols missing. The correct version can be found at http://www.sanger.ac.uk and is reproduced again here (following pages). Also, in Fig. 2, we incorrectly labelled Rv0649 as fadD37 instead of fabD2. Two of the for mycolyl were inverted: Rv0129c encodes antigen 85C and not 85C9 as stated, whereas Rv3803c codes for the secreted protein MPT51 and not antigen 85C (Infect. Immun. 59, 372±382; 1991); Rv3803c is now designated fbpD. We thank Morten Harboe and Harald Wiker for drawing this to our attention. The sequence of Rv0746 from M. bovis BCG-Pasteur presented in Fig. 5b was incorrect and should have shown a 16-codon deletion instead of 29, as indicated here: H37Rv.....GSGAPGGAGGAAGLWGTGGAGGAGGSSAGGGGAGGAGGAGGWLLGDGGAGGIGGAST...... :::::::::::::::::::: ::::::::::::::::::::: BCG...... GSGAPGGAGGAAGLWGTGGA------GGAGGWLLGDGGAGGIGGAST... M

190 Nature © Macmillan Publishers Ltd 1998 NATURE | VOL 396 | 12 NOVEMBER 1998 | www.nature.com letters to nature

8

NATURE | VOL 396 | 12 NOVEMBER 1998 | www.nature.com Nature © Macmillan Publishers Ltd 1998 191 letters to nature

8

192 Nature © Macmillan Publishers Ltd 1998 NATURE | VOL 396 | 12 NOVEMBER 1998 | www.nature.com letters to nature

8

NATURE | VOL 396 | 12 NOVEMBER 1998 | www.nature.com Nature © Macmillan Publishers Ltd 1998 193 letters to nature

8

194 Nature © Macmillan Publishers Ltd 1998 NATURE | VOL 396 | 12 NOVEMBER 1998 | www.nature.com letters to nature

8

NATURE | VOL 396 | 12 NOVEMBER 1998 | www.nature.com Nature © Macmillan Publishers Ltd 1998 195 letters to nature

8

196 Nature © Macmillan Publishers Ltd 1998 NATURE | VOL 396 | 12 NOVEMBER 1998 | www.nature.com letters to nature

8

NATURE | VOL 396 | 12 NOVEMBER 1998 | www.nature.com Nature © Macmillan Publishers Ltd 1998 197 letters to nature

8

198 Nature © Macmillan Publishers Ltd 1998 NATURE | VOL 396 | 12 NOVEMBER 1998 | www.nature.com article

Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence 8

S. T. Cole*, R. Brosch*, J. Parkhill, T. Garnier*, C. Churcher, D. Harris, S. V. Gordon*, K. Eiglmeier*,S.Gas*, C. E. Barry III†, F. Tekaia‡, K. Badcock, D. Basham, D. Brown, T. Chillingworth, R. Connor, R. Davies, K. Devlin, T. Feltwell, S. Gentles, N. Hamlin, S. Holroyd, T. Hornsby, K. Jagels, A. Krogh§, J. McLean, S. Moule, L. Murphy, K. Oliver, J. Osborne, M. A. Quail, M.-A. Rajandream, J. Rogers, S. Rutter, K. Seeger, J. Skelton, R. Squares, S. Squares, J. E. Sulston, K. Taylor, S. Whitehead & B. G. Barrell Sanger Centre, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK * Unite´ de Ge´ne´tique Mole´culaire Bacte´rienne, and ‡ Unite´ de Ge´ne´tique Mole´culaire des Levures, Institut Pasteur, 28 rue du Docteur Roux, 75724 Paris Cedex 15, France † Tuberculosis Research Unit, Laboratory of Intracellular Parasites, Rocky Mountain Laboratories, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Hamilton, Montana 59840, USA § Center for Biological , Technical University of Denmark, Lyngby, Denmark

......

Countless millions of people have died from tuberculosis, a chronic infectious disease caused by the tubercle bacillus. The complete genome sequence of the best-characterized strain of Mycobacterium tuberculosis, H37Rv, has been determined and analysed in order to improve our understanding of the biology of this slow-growing and to help the conception of new prophylactic and therapeutic interventions. The genome comprises 4,411,529 base pairs, contains around 4,000 genes, and has a very high guanine + cytosine content that is reflected in the biased amino- content of the proteins. M. tuberculosis differs radically from other in that a very large portion of its coding capacity is devoted to the production of involved in and , and to two new families of -rich proteins with a repetitive structure that may represent a source of antigenic variation.

Despite the availability of effective short-course chemotherapy pids and polysaccharides4,5. Novel biosynthetic pathways generate (DOTS) and the Bacille Calmette-Gue´rin (BCG) , the cell-wall components such as mycolic acids, mycocerosic acid, tubercle bacillus continues to claim more lives than any other phenolthiocerol, lipoarabinomannan and arabinogalactan, and single infectious agent1. Recent years have seen increased incidence several of these may contribute to mycobacterial longevity, trigger of tuberculosis in both developing and industrialized countries, the inflammatory host reactions and act in pathogenesis. Little is widespread emergence of drug-resistant strains and a deadly known about the mechanisms involved in life within the macro- synergy with the human immunodeficiency (HIV). In 1993, phage, or the extent and nature of the factors produced by the gravity of the situation led the World Health Organisation (WHO) the bacillus and their contribution to disease. to declare tuberculosis a global emergency in an attempt to heighten It is thought that the progenitor of the M. tuberculosis complex, public and political awareness. measures are needed now to comprising M. tuberculosis, M. bovis, M. bovis BCG, M. africanum prevent the grim predictions of the WHO becoming reality. The and M. microti, arose from a soil bacterium and that the human combination of genomics and bioinformatics has the potential to bacillus may have been derived from the bovine form following the generate the information and knowledge that will enable the domestication of cattle. The complex lacks interstrain genetic conception and development of new therapies and interventions diversity, and changes are very rare6. This is important needed to treat this airborne disease and to elucidate the unusual in terms of immunity and vaccine development as most of the biology of its aetiological agent, Mycobacterium tuberculosis. proteins will be identical in all strains and therefore The characteristic features of the tubercle bacillus include its slow will be restricted. On the basis of the systematic sequence analysis of growth, dormancy, complex cell envelope, intracellular pathogen- 26 loci in a large number of independent isolates6, it was concluded esis and genetic homogeneity2. The generation time of M. tubercu- that the genome of M. tuberculosis is either unusually inert or that losis, in synthetic medium or infected , is typically ϳ24 the organism is relatively young in evolutionary terms. hours. This contributes to the chronic nature of the disease, imposes Since its in 1905, the H37Rv strain of M. tuberculosis has lengthy treatment regimens and represents a formidable obstacle for found extensive, worldwide application in biomedical research researchers. The state of dormancy in which the bacillus remains because it has retained full virulence in models of tubercu- quiescent within infected tissue may reflect metabolic shutdown losis, unlike some clinical isolates; it is also susceptible to drugs and resulting from the action of a cell-mediated immune response that amenable to genetic manipulation. An integrated map of the 4.4 can contain but not eradicate the . As immunity wanes, megabase (Mb) circular of this slow-growing patho- through ageing or immune suppression, the dormant bacteria gen had been established previously and ordered libraries of reactivate, causing an outbreak of disease often many decades cosmids and bacterial artificial (BACs) were after the initial infection3. The molecular basis of dormancy and available7,8. reactivation remains obscure but is expected to be genetically programmed and to involve intracellular signalling pathways. Organization and sequence of the genome The cell envelope of M. tuberculosis, a Gram-positive bacterium Sequence analysis. To obtain the contiguous genome sequence, a with a G + C-rich genome, contains an additional layer beyond the combined approach was used that involved the systematic sequence peptidoglycan that is exceptionally rich in unusual , glycoli- analysis of selected large-insert clones (cosmids and BACs) as well as

Nature © Macmillan Publishers Ltd 1998 NATURE | VOL 393 | 11 JUNE 1998 537 article random small-insert clones from a whole-genome shotgun library. exception, none of these uses A in the first position of the anticodon, This culminated in a composite sequence of 4,411,529 base pairs indicating that extensive wobble occurs during . This is (bp) (Figs 1, 2), with a G + C content of 65.6%. This represents the consistent with the high G + C content of the genome and the second-largest bacterial genome sequence currently available (after consequent bias in codon usage. Three genes encoding tRNAs for that of Escherichia coli)9. The initiation codon for the dnaA gene, a were found; one of these genes (metV) is situated in a hallmark for the origin of replication, oriC, was chosen as the start region that may correspond to the terminus of replication (Figs 1, point for numbering. The genome is rich in repetitive DNA, 2). As metV is linked to defective genes for and excisionase, particularly insertion sequences, and in new multigene families perhaps it was once part of a phage or similar mobile genetic and duplicated housekeeping genes. The G + C content is relatively element. constant throughout the genome (Fig. 1) indicating that horizon- Insertion sequences and prophages. Sixteen copies of the promis- 8 tally transferred pathogenicity islands of atypical base composition cuous insertion sequence IS6110 and six copies of the more stable are probably absent. Several regions showing higher than average G element IS1081 reside within the genome of H37Rv8. One copy of + C content (Fig. 1) were detected; these correspond to sequences IS1081 is truncated. Scrutiny of the genomic sequence led to the belonging to a large gene family that includes the polymorphic G + identification of a further 32 different insertion sequence elements, C-rich sequences (PGRSs). most of which have not been described previously, and of the 13E12 Genes for stable RNA. Fifty genes coding for functional RNA family of repetitive sequences which exhibit some of the character- molecules were found. These molecules were the three species istics of mobile genetic elements (Fig. 1). The newly discovered produced by the unique ribosomal RNA , the 10Sa RNA insertion sequences belong mainly to the IS3 and IS256 families, involved in degradation of proteins encoded by abnormal messen- although six of them define a new group. There is extensive ger RNA, the RNA component of RNase P, and 45 transfer . similarity between IS1561 and IS1552 with insertion sequence No 4.5S RNA could be detected. The rrn operon is situated elements found in and Rhodococcus spp., suggesting that unusually as it occurs about 1,500 kilobases (kb) from the putative they may be widely disseminated among the actinomycetes. oriC; most eubacteria have one or more rrn near to oriC to Most of the insertion sequences in M. tuberculosis H37Rv appear exploit the gene-dosage effect obtained during replication10. This to have inserted in intergenic or non-coding regions, often near arrangement may be related to the slow growth of M. tuberculosis. tRNA genes (Fig. 1). Many are clustered, suggesting the existence of The genes encoding tRNAs that recognize 43 of the 61 possible sense insertional hot-spots that prevent genes from being inactivated, as codons were distributed throughout the genome and, with one has been described for Rhizobium11. The chromosomal distribution of the insertion sequences is informative as there appears to have been a selection against insertions in the quadrant encompassing oriC and an overrepresentation in the direct repeat region that 0 contains the prototype IS6110. This bias was also observed experi- mentally in a transposon mutagenesis study12. 4 At least two prophages have been detected in the genome sequence and their presence may explain why M. tuberculosis shows persistent low-level lysis in culture. Prophages phiRv1 and phiRv2 are both ϳ10 kb in length and are similarly organized, and some of their gene products show marked similarity to those encoded by certain bacteriophages from Streptomyces and sapro- phytic mycobacteria. The site of insertion of phiRv1 is intriguing as it corresponds to part of a repetitive sequence of the 13E12 family 1 that itself appears to have integrated into the operon. Some M. tuberculosis strains of M. tuberculosis have been described as requiring biotin as a H37Rv growth supplement, indicating either that phiRv1 has a polar effect 4,411,529 bp on expression of the distal bio genes or that aberrant excision, leading to , may occur. During the serial attenuation of M. bovis that led to the vaccine strain M. bovis BCG, the phiRv1 prophage was lost13. In a systematic study of the genomic diversity 3 of prophages and insertion sequences (S.V.G. et al., manuscript in preparation), only IS1532 exhibited significant variability, indicat- ing that most of the prophages and insertion sequences are currently stable. However, from these combined observations, one can con- clude that horizontal transfer of genetic material into the free-living ancestor of the M. tuberculosis complex probably occurred in nature 2 before the tubercle bacillus adopted its specialized intracellular niche. Figure 1 Circular map of the chromosome of M. tuberculosis H37Rv. The outer circle shows the scale in Mb, with 0 representing the origin of replication. The first ring from the exterior denotes the positions of stable RNA genes (tRNAs are blue, Figure 2 Linear map of the chromosome of M. tuberculosis H37Rv showing the Q others are pink) and the direct repeat region (pink cube); the second ring inwards position and orientation of known genes and coding sequences (CDS). We used shows the coding sequence by strand (clockwise, dark green; anticlockwise, light the following functional categories (adapted from ref. 20): green); the third ring depicts repetitive DNA (insertion sequences, orange; 13E12 (black); intermediary metabolism and respiration (yellow); information pathways REP family, dark pink; prophage, blue); the fourth ring shows the positions of the (pink); regulatory proteins (sky blue); conserved hypothetical proteins (orange); PPE family members (green); the fifth ring shows the PE family members (purple, proteins of unknown function (light green); insertion sequences and phage- excluding PGRS); and the sixth ring shows the positions of the PGRS sequences related functions (blue); stable RNAs (purple); and cell processes (dark (dark red). The histogram (centre) represents G + C content, with Ͻ65% G + C in green); PE and PPE protein families (magenta); virulence, detoxification and yellow, and Ͼ65% G + C in red. The figure was generated with software from adaptation (white). For additional information about gene functions, refer to DNASTAR. http://www.sanger.ac.uk.

Nature © Macmillan Publishers Ltd 1998 538 NATURE | VOL 393 | 11 JUNE 1998 article

Genes encoding proteins. 3,924 open reading frames were identi- proteins, which may protect against oxidative stress or be involved fied in the genome (see Methods), accounting for ϳ91% of the in capture, were found. The ability of the bacillus to adapt its potential coding capacity (Figs 1, 2). A few of these genes appear to metabolism to environmental change is significant as it not only has have in-frame stop codons or frameshift (irrespective of to compete with the for oxygen but must also adapt to the the source of the DNA sequenced) and may either use frameshifting microaerophilic/anaerobic environment at the heart of the during translation or correspond to pseudogenes. Consistent with burgeoning granuloma. the high G + C content of the genome, GTG initiation codons (35%) Regulation and signal transduction. Given the complexity of the are used more frequently than in Bacillus subtilis (9%) and E. coli environmental and metabolic choices facing M. tuberculosis, an (14%), although ATG (61%) is the most common translational extensive regulatory repertoire was expected. Thirteen putative start. There are a few examples of atypical initiation codons, the sigma factors govern at the level of transcription8 most notable being the ATC used by infC, which begins with ATT in initiation, and more than 100 regulatory proteins are predicted both B. subtilis and E. coli9,14. There is a slight bias in the orientation (Table 1). Unlike B. subtilis and E. coli, in which there are Ͼ30 copies of the genes (Fig. 1) with respect to the direction of replication as of different two-component regulatory systems14, M. tuberculosis ϳ59% are transcribed with the same polarity as replication, has only 11 complete pairs of sensor and response compared with 75% in B. subtilis. In other bacteria, genes tran- regulators, and a few isolated and regulatory genes. This scribed in the same direction as the replication forks are believed to relative paucity in environmental signal transduction pathways is be expressed more efficiently9,14. Again, the more even distribution probably offset by the presence of a family of eukaryotic-like serine/ in gene polarity seen in M. tuberculosis may reflect the slow growth protein kinases (STPKs), which function as part of a and infrequent replication cycles. Three genes (dnaB, recA and phosphorelay system18. The STPKs probably have two domains: the Rv1461) have been invaded by sequences encoding inteins (protein well-conserved kinase domain at the amino terminus is predicted to introns) and in all three cases their counterparts in M. leprae also be connected by a transmembrane segment to the carboxy-terminal contain inteins, but at different sites15 (S.T.C. et al., unpublished region that may respond to specific stimuli. Several of the predicted observations). envelope lipoproteins, such as that encoded by lppR (Rv2403), show Protein function, composition and duplication. By using various extensive similarity to this putative receptor domain of STPKs, database comparisons, we attributed precise functions to ϳ40% of suggesting possible interplay. The STPKs probably function in the predicted proteins and found some information or similarity for signal transduction pathways and may govern important cellular another 44%. The remaining 16% resembled no known proteins decisions such as dormancy and cell division, and although their and may account for specific mycobacterial functions. Examination partners are unknown, candidate genes for phosphoprotein phos- of the amino-acid composition of the M. tuberculosis proteome by phatases have been identified. correspondence analysis16, and comparison with that of other Drug resistance. M. tuberculosis is naturally resistant to many microorganisms whose genome sequences are available, revealed a , making treatment difficult19. This resistance is due statistically significant preference for the amino acids Ala, Gly, Pro, mainly to the highly hydrophobic cell envelope acting as a perme- Arg and Trp, which are all encoded by G + C-rich codons, and a ability barrier4, but many potential resistance determinants are also comparative reduction in the use of amino acids encoded by A + T- encoded in the genome. These include hydrolytic or drug-modify- rich codons such as Asn, Ile, Lys, Phe and Tyr (Fig. 3). This approach ing enzymes such as ␤-lactamases and aminoglycoside acetyl also identified two groups of proteins rich in Asn or Gly that belong transferases, and many potential drug–efflux systems, such as 14 to new families, PE and PPE (see below). The fraction of the members of the major facilitator family and numerous ABC proteome that has arisen through gene duplication is similar to transporters. Knowledge of these putative resistance mechanisms that seen in E. coli or B. subtilis (ϳ51%; refs 9, 14), except that the will promote better use of existing drugs and facilitate the concep- level of sequence conservation is considerably higher, indicating tion of new therapies. that there may be extensive redundancy or differential production of the corresponding polypeptides. The apparent lack of divergence following gene duplication is consistent with the hypothesis that F2 -22.6% Glu 6 Mj M. tuberculosis is of recent descent . Ae Af Mth 0.1 General metabolism, regulation and drug resistance Tyr Val Gly Lys Ile Bs Arg Metabolic pathways. From the genome sequence, it is clear that the Met Mt 0 Bb Cys tubercle bacillus has the potential to synthesize all the essential Asp Pro Ala Hp Leu amino acids, and enzyme co-factors, although some of the Hi His Ec Phe Sc Ss pathways involved may differ from those found in other bacteria. M. -0.1 Ce Trp Ser Thr tuberculosis can metabolize a variety of , hydrocar- Mg 2,17 Asn Mp bons, , ketones and carboxylic acids . It is apparent from -0.2 genome inspection that, in addition to many functions involved in , the enzymes necessary for , the pentose -0.3 phosphate pathway, and the tricarboxylic acid and glyoxylate cycles Gln ϳ are all present. A large number ( 200) of , oxyge- -0.30 -0.15 0 0.15 0.30 nases and is predicted, as well as many oxygenases F1 - 55.2% containing , that are similar to fungal proteins Figure 3 Correspondence analysis of the proteomes from extensively involved in sterol degradation. Under aerobic growth conditions, sequenced organisms as a function of amino-acid composition. Note the ATP will be generated by oxidative from electron extreme position of M. tuberculosis and the shift in amino-acid preference transport chains involving a ubiquinone cytochrome b reductase reflecting increasing G + C content from left to right. Abbreviations used: Ae, complex and . Components of several anae- ; Af, Archaeoglobus fulgidis; Bb, Borrelia burgdorfei; Bs, B. robic phosphorylative electron transport chains are also present, subtilis; Ce, Caenorhabditis elegans; Ec, E. coli; Hi, Haemophilus influenzae; Hp, including genes for (narGHJI), fumarate reductase pylori; Mg, genitalium; Mj, Methanococcus jannaschi; (frdABCD) and possibly nitrite reductase (nirBD), as well as a new Mp, Mycoplasma pneumoniae; Mt, M. tuberculosis; Mth, Methanobacterium reductase (narX) that results from a rearrangement of a homologue thermoautotrophicum; Sc, Saccharomyces cerevisiae; Ss, Synechocystis sp. of the narGHJI operon. Two genes encoding haemoglobin-like strain PCC6803. F1 and F2, first and second factorial axes16.

Nature © Macmillan Publishers Ltd 1998 NATURE | VOL 393 | 11 JUNE 1998 539 article a Figure 4 Lipid metabolism. a, Degradation of host-cell lipids is vital in the intracellular life of OH O M. tuberculosis. Host-cell membranes provide R SCoA ATP echA1-21 fadB2-5 precursors for many metabolic processes, Citric NADH, FADH 2 O acid as well as potential precursors of mycobac- O O O Purines, β cycle terial cell-wall constituents, through the R -Oxidation R SCoA Pyrimidines SCoA (fadA/fadB) SCoA actions of a broad family of ␤-oxidative Porphyrins O accA1/accD1 Glyoxylate enzymes encoded by multiple copies in the fadE1-36 fadA2-6 accA2/accD2 shunt Amino acids genome. These enzymes produce acetyl R SCoA 8 Glucose CoA, which can be converted into many fadD1-36 O different metabolites and fuel for the bacteria S Co A O through the actions of the enzymes of the CO2 R OH cycle and the glyoxylate shunt of this cycle. b, The genes that synthesize Cell wall and mycobacterial lipids mycolic acids, the dominant lipid component of the mycobacterial cell wall, include the Host membranes type I synthase (fas) and a unique type II system which relies on extension of a precursor bound to an to b form full-length (ϳ80-) mycolic acids. H The cma genes are responsible for OH cyclopropanation. c, The genes that produce O phthiocerol dimycocerosate form a large H OH operon and represent type I (mas) and type fas II (the pps operon) synthase systems. Functions are colour coordinated.

0 1 2 3 4 5 6 7 8 (kb)

fabD acpM kasA kasB accD

mabA inhA

cmaA1 Gene Function

fas , produces C16 -C 26 acyl-CoA fabD Malonyl-CoA:AcpM , AcpM loading cmaA2 acpM Acyl carrier protein, meromycolate precursor transport kasA Ketoacyl acyl carrier protein synthase, chain elongation kasB Ketoacyl acyl carrier protein synthase, chain elongation accD Acetyl-CoA carboxylase, malonyl-CoA synthesis mabA 3-oxo-acyl-acyl carrier protein reductase, reduces KasA/B inhA Enoyl-acyl carrier protein reductase cmaA1 cyclopropane synthase 1, distal-position specific cmaA2 cyclopropane mycolic acid synthase 2, proximal-position specific

c CH 3 CH 3 O O O OCH CH 3 ppsA ppsB ppsC ppsD ppsE mas 3 O CH 3 CH 3 CH 3 CH 3 CH 3 CH 3 CH 3

0 10 20 30 40 (kb)

Gene Function mas Four rounds extension of C 16,18 using Me-malonyl CoA ppsA Extension of C 18 with malony CoA, partial reduction ppsB Extension with malony CoA, partial reduction ppsC Extension with malony CoA, complete reduction ppsD Extension with Me-malony CoA, partial reduction ppsE Extension with malony CoA, partial reduction,

Nature © Macmillan Publishers Ltd 1998 540 NATURE | VOL 393 | 11 JUNE 1998 article

Lipid metabolism because they are difficult to distinguish from other members of the Very few organisms produce such a diverse array of lipophilic short-chain family on the basis of primary molecules as M. tuberculosis. These molecules range from simple sequence. The five enzymes that complete the cycle by thiolysis of fatty acids such as palmitate and tuberculostearate, through iso- the ␤-ketoester, the acetyl-CoA C-, do indeed prenoids, to very-long-chain, highly complex molecules such as appear to be a more limited family. In addition to this extensive mycolic acids and the phenolphthiocerol alcohols that esterify with set of dissociated degradative enzymes, the genome also encodes the mycocerosic acid to form the scaffold for attachment of the myco- canonical FadA/FadB ␤-oxidation complex (Rv0859 and Rv0860). sides. Mycobacteria contain examples of every known lipid and Accessory activities are present for the metabolism of odd-chain and polyketide biosynthetic system, including enzymes usually found in multiply unsaturated fatty acids. mammals and as well as the common bacterial systems. The Fatty acid . At least two discrete types of enzyme8 biosynthetic capacity is overshadowed by the even more remarkable system, fatty acid synthase (FAS) I and FAS II, are involved in radiation of degradative, fatty acid oxidation systems and, in total, fatty acid biosynthesis in mycobacteria (Fig. 4b). FAS I (Rv2524, fas) there are ϳ250 distinct enzymes involved in is a single polypeptide with multiple catalytic activities that gen- in M. tuberculosis compared with only 50 in E. coli20. erates several shorter CoA esters from acetyl-CoA primers5 and . In vivo-grown mycobacteria have been probably creates precursors for elongation by all of the other fatty suggested to be largely lipolytic, rather than lipogenic, because of acid and polyketide systems. FAS II consists of dissociable enzyme the variety and quantity of lipids available within mammalian cells components which act on a bound to an acyl-carrier and the tubercle2 (Fig. 4a). The abundance of genes encoding protein (ACP). FAS II is incapable of de novo but components of fatty acid oxidation systems found by our genomic instead elongates palmitoyl-ACP to fatty acids ranging from 24 to approach supports this proposition, as there are 36 acyl-CoA 56 in length17,21. Several different components of FAS II may synthases and a family of 36 related enzymes that could catalyse be targets for the important tuberculosis drug , including the first step in fatty acid degradation. There are 21 homologous the enoyl-ACP reductase InhA22, the ketoacyl-ACP synthase KasA enzymes belonging to the enoyl-CoA hydratase/ super- and the ACP AcpM21. Analysis of the genome shows that there are family of enzymes, which rehydrate the nascent product of the acyl- only three potential ketoacyl synthases: KasA and KasB are highly CoA dehydrogenase. The four enzymes that convert the 3-hydroxy related, and their genes cluster with acpM, whereas KasC is a more fatty acid into a 3-keto fatty acid appear less numerous, mainly distant homologue of a ketoacyl synthase III system. The number of ketoacyl synthase and ACP genes indicates that there is a single FAS II system. Its genetic organization, with two clustered ketoacyl synthases, resembles that of type II aromatic polyketide biosynthetic a gene clusters, such as those for actinorhodin, and tetracenomycin in Streptomyces species23. InhA seems to be the sole Representatives of the PE family enoyl-ACP reductase and its gene is co-transcribed with a fabG

PE PGRS (GGAGGA)n homologue, which encodes 3-oxoacyl-ACP reductase. Both of these ~110 aa 0 .. >1,400 aa proteins are probably important in the biosynthesis of mycolic acids. PE unique sequence Fatty acids are synthesized from malonyl-CoA and precursors are ~110 aa 0 .. ~500 aa generated by the enzymatic of acetyl (or propionyl)- Representatives of the PPE family CoA by a biotin-dependent carboxylase (Fig. 4b). From study of the genome we predict that there are three complete carboxylase PPE MPTR (NxGxGNxG)n ~180 aa ~200 .. >3,500 aa systems, each consisting of an ␣- and a ␤-subunit, as well as three PPE GxxSVPxxW ␤-subunits without an ␣-counterpart. As a group, all of the ~180 aa ~200 .. ~400 aa carboxylases seem to be more related to the mammalian homo- PPE unique sequence ~180 aa 0 .. ~400 aa logues than to the corresponding bacterial enzymes. Two of these carboxylase systems (accA1, accD1 and accA2, accD2) are probably involved in degradation of odd-numbered fatty acids, as they are b PE-PGRS sequence variation in ORF Rv0746 adjacent to genes for other known degradative enzymes. They may convert propionyl-CoA to succinyl-CoA, which can then be incor- H37Rv 29 aa PE PGRS porated into the tricarboxylic acid cycle. The synthetic carboxylases (accA3, accD3, accD4, accD5 and accD6) are more difficult to BCG ␤ PE PGRS understand. The three extra -subunits might direct carboxylation 46 aa to the appropriate precursor or may simply increase the total amount of carboxylated precursor available if this step were rate- H37Rv .. ...GSGAPGGAGGAAGLWGTGGAGGAGGSSAGGGGAGGAGGAGGWLLGDGGAGGIGGAST ...... ::::::::::::::::: ...... ::::::::::: limiting. BCG ...... GSGAPGGAGGAAGLWGT------GGAGGIGGAST H37Rv .. VLGGTGGGGGVGGLWGAGGAGGAGGTGLVGGDGGAGGAGGTGGLLAGLIGAGGGHGGTGG Synthesis of the paraffinic backbone of fatty and mycolic acids in ...... :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: BCG .... VLGGTGGGGGVGGLWGAGGAGGAGGTGLVGGDGGAGGAGGTGGLLAGLIGAGGGHGGTGG the cell is followed by extensive postsynthetic modifications and H37Rv .. LSTNGDGGVGGAGGNAGMLAGPGGAGGAGGDGENLDTGGDGGAGGSAGLLFGSGGAGGAG 24,25 ...... :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: unsaturations, particularly in the case of the mycolic acids . BCG .... LSTNGDGGVGGAGGNAGMLAGPGGAGGAGGDGENLDTGGDGGAGGSAGLLFGSGGAGGAG Unsaturation is catalysed either by a FabA-like ␤-hydroxyacyl- H37Rv .. GFGFLGGDGGAGGNAGLLLSSGGAGGFGGFGTAGGVGGAGGNAGWLGF------...... :::::::::::::::::::::::::::::::::::::::::::::::: BCG .... GFGFLGGDGGAGGNAGLLLSSGGAGGFGGFGTAGGVGGAGGNAGWLGFGGAGGIGGIGGN ACP dehydrase, acting with a specific ketoacyl synthase, or by an H37Rv .. ------GGAGGVGGSAGLIGTGGNGGNGGTGA aerobic terminal mixed function desaturase that uses both mole- ...... :::::::::::::::::::::::::::::::::: :::::::::::::::::::::::::: BCG .... ANGGAGGNGGTGGQLWGSGGAGGEGGAALSVGDTGGAGGVGGSAGLIGTGGNGGNGGTGA cular oxygen and NADPH. Inspection of the genome revealed no H37Rv .. NAGSPGTGGAGGLLLGQNGLNGLP* ...... :::::::::::::::::::::::: obvious candidates for the FabA-like activity. However, three BCG .... NAGSPGTGGAGGLLLGQNGLNGLP* potential aerobic desaturases (encoded by desA1, desA2 and desA3) were evident that show little similarity to related vertebrate or enzymes (which act on CoA esters) but instead resemble Figure 5 The PE and PPE protein families. a, Classification of the PE and PPE desaturases (which use ACP esters). Consequently, the geno- protein families. b, Sequence variation between M. tuberculosis H37Rv and M. mic data indicate that unsaturation of the meromycolate chain may bovis BCG-Pasteur in the PE-PGRS encoded by open (ORF) occur while the acyl group is bound to AcpM. Rv0746. Much of the subsequent structural diversity in mycolic acids is

Nature © Macmillan Publishers Ltd 1998 NATURE | VOL 393 | 11 JUNE 1998 541 article generated by a family of S-adenosyl-L-methionine-dependent made by a process that is mechanistically analogous to polyketide enzymes, which use the unsaturated meromycolic acid as a substrate synthesis23,27. These peptides include the structurally related iron- to generate cis and trans cyclopropanes and other mycolates. Six scavenging siderophores, the mycobactins and the exochelins2,28, members of this family have been identified and characterized25 and which are derived from salicylate by the addition of serine (or two clustered, convergently transcribed new genes are evident in the threonine), two and various fatty acids and possible poly- genome (umaA1 and umaA2). From the functions of the known ketide segments. The mbt operon, encoding one apparent salicylate- family members and the structures of mycolic acids in M. tubercu- activating protein, three amino-acid , and a single module of losis, it is tempting to speculate that these new enzymes may a type I polyketide synthase, may be responsible for the biosynthesis introduce the trans cyclopropanes into the meromycolate precursor. of the mycobacterial siderophores. The presence of only one non- In addition to these two , there are two other ribosomal peptide-synthesis system indicates that this pathway may 8 unrelated lipid methyltransferases (Ufa1 and Ufa2) that share generate both siderophores and that subsequent modification of a homology with cyclopropane fatty acid synthase of E. coli25. single ⑀-amino group of one lysine residue may account for the Although cyclopropanation seems to be a relatively common different physical properties and function of the siderophores28. modification of mycolic acids, cyclopropanation of plasma-mem- brane constituents has not been described in mycobacteria. Tuber- Immunological aspects and pathogenicity culostearic acid is produced by methylation of oleic acid, and may Given the scale of the global tuberculosis burden, vaccination is not be synthesized by one of these two enzymes. only a priority but remains the only realistic public health inter- Condensation of the fully functionalized and preformed mero- vention that is likely to affect both the incidence and the prevalence mycolate chain with a 26-carbon ␣-branch generates full-length of the disease29. Several areas of vaccine development are promising, mycolic acids that must be transported to their final location for including DNA vaccination, use of secreted or surface-exposed attachment to the cell-wall arabinogalactan. The transfer and proteins as immunogens, recombinant forms of BCG and rational subsequent transesterification is mediated by three well-known attenuation of M. tuberculosis29. All of these avenues of research will immunogenic proteins of the antigen 85 complex26. The genome benefit from the genome sequence as its availability will stimulate encodes a fourth member of this complex, antigen 85CЈ ( fbpC2, more focused approaches. Genes encoding ϳ90 lipoproteins were Rv0129), which is highly related to antigen 85C. Further studies are identified, some of which are enzymes or components of transport needed to show whether the protein possesses mycolytransferase systems, and a similar number of genes encoding preproteins (with activity and to clarify the reason behind the apparent redundancy. type I signal peptides) that are probably exported by the Sec- Polyketide synthesis. Mycobacteria synthesize by sev- dependent pathway. M. tuberculosis seems to have two copies of eral different mechanisms. A modular type I system, similar to that secA. The potent T-cell antigen Esat-6 (ref. 30), which is probably involved in biosynthesis23, is encoded by a very large secreted in a Sec-independent manner, is encoded by a member of a operon, ppsABCDE, and functions in the production of multigene family. Examination of the genetic context reveals several phenolphthiocerol5. The absence of a second type I polyketide similarly organized operons that include genes encoding large ATP- synthase suggests that the related lipids phthiocerol A and B, hydrolysing membrane proteins that might act as transporters. One phthiodiolone A and phthiotriol may all be synthesized by the of the surprises of the genome project was the discovery of two same system, either from alternative primers or by differential extensive families of novel glycine-rich proteins, which may be of postsynthetic modification. It is physiologically significant that immunological significance as they are predicted to be abundant the pps gene cluster occurs immediately upstream of mas, which and potentially polymorphic antigens. encodes the multifunctional enzyme mycocerosic acid synthase The PE and PPE multigene families. About 10% of the coding (MAS), as their products phthiocerol and mycocerosic acid esterify capacity of the genome is devoted to two large unrelated families of to form the very abundant cell-wall-associated molecule phthio- acidic, glycine-rich proteins, the PE and PPE families, whose genes cerol dimycocerosate (Fig. 4c). are clustered (Figs 1, 2) and are often based on multiple copies of the Members of another large group of polyketide synthase enzymes polymorphic repetitive sequences referred to as PGRSs, and major are similar to MAS, which also generates the multiply methyl- polymorphic tandem repeats (MPTRs), respectively31,32. The names branched fatty acid components of mycosides and phthiocerol PE and PPE derive from the motifs Pro–Glu (PE) and Pro–Pro– dimycocerosate, abundant cell-wall-associated molecules5. Although Glu (PPE) found near the N terminus in most cases33. The 99 some of these polyketide synthases may extend type I FAS CoA members of the PE all have a highly conserved N- primers to produce other long-chain methyl-branched fatty acids terminal domain of ϳ110 amino-acid residues that is predicted to such as mycolipenic, mycolipodienic and mycolipanolic acids or the have a globular structure, followed by a C-terminal segment that phthioceranic and hydroxyphthioceranic acids, or may even show varies in size, sequence and repeat copy number (Fig. 5). Phyloge- functional overlap5, there are many more of these enzymes than netic analysis separated the PE family into several subfamilies. The there are known metabolites. Thus there may be new lipid and largest of these is the highly repetitive PGRS class, which contains 61 polyketide metabolites that are expressed only under certain con- members; members of the other subfamilies, share very limited ditions, such as during infection and disease. sequence similarity in their C-terminal domains (Fig. 5). The A fourth class of polyketide synthases is related to the plant predicted molecular weights of the PE proteins vary considerably enzyme superfamily that includes chalcone and stilbene synthase23. as a few members contain only the N-terminal domain, whereas These polyketide synthases are phylogenetically divergent from all most have C-terminal extensions ranging in size from 100 to 1,400 other polyketide and fatty acid synthases and generate unreduced residues. The PGRS proteins have a high glycine content (up to polyketides that are typically associated with anthocyanin pigments 50%), which is the result of multiple tandem repetitions of Gly– and flavonoids. The function of these systems, which are often Gly–Ala or Gly–Gly–Asn motifs, or variations thereof. linked to apparent type I modules, is unknown. An example is the The 68 members of the PPE protein family (Fig. 5) also have a gene cluster spanning pks10, pks7, pks8 and pks9, which includes two conserved N-terminal domain that comprises ϳ180 amino-acid of the chalcone-synthase-like enzymes and two modules of an residues, followed by C-terminal segments that vary markedly in apparent type I system. The unknown metabolites produced by sequence and length. These proteins fall into at least three groups, these enzymes are interesting because of the potent biological one of which constitutes the MPTR class characterized by the activities of some polyketides such as the immunosuppressor presence of multiple, tandem copies of the motif Asn–X–Gly–X– rapamycin. Gly–Asn–X–Gly. The second subgroup contains a characteristic, Siderophores. Peptides that are not ribosomally synthesized are well-conserved motif around position 350, whereas the third contains

Nature © Macmillan Publishers Ltd 1998 542 NATURE | VOL 393 | 11 JUNE 1998 article proteins that are unrelated except for the presence of the common but the complex nature of its biosynthesis makes it difficult to 180-residue PPE domain. identify critical genes whose inactivation would lead to attenuation. The subcellular location of the PE and PPE proteins is unknown On inspection of the genome sequence, it was apparent that four and in only one case, that of a (Rv3097), has a function been copies of mce were present and that these were all situated in demonstrated. On examination of the protein database from the operons, comprising eight genes, organized in exactly the same extensively sequenced M. leprae15, no PGRS- or MPTR-related manner. In each case, the genes preceding mce code for integral polypeptides were detected but a few proteins belonging to the membrane proteins, whereas mce and the following five genes are all non-MPTR subgroup of the PPE family were found. These proteins predicted to encode proteins with signal sequences or hydrophobic include one of the major antigens recognized by leprosy patients, stretches at the N terminus. These sets of proteins, about which little the serine-rich antigen34. Although it is too early to attribute is known, may well be secreted or surface-exposed; this is consistent8 biological functions to the PE and PPE families, it is tempting to with the proposed role of Mce in invasion of host cells42. Further- speculate that they could be of immunological importance. Two more, a homologue of smpB, which has been implicated in intra- interesting possibilities spring to mind. First, they could represent cellular survival of Salmonella typhimurium, has also been the principal source of antigenic variation in what is otherwise a identified43. Among the other secreted proteins identified from genetically and antigenically homogeneous bacterium. Second, the genome sequence that could act as virulence factors are a these glycine-rich proteins might interfere with immune responses series of C, and , which might by inhibiting antigen processing. attack cellular or vacuolar membranes, as well as several proteases. Several observations and results support the possibility of anti- One of these phospholipases acts as a contact-dependent haemoly- genic variation associated with both the PE and the PPE family sin (N. Stoker, personal communication). The presence of storage proteins. The PGRS member Rv1759 is a fibronectin-binding proteins in the bacillus, such as the haemoglobin-like oxygen protein of relative molecular mass 55,000 (ref. 35) that elicits a captors described above, points to its ability to stockpile essential variable antibody response, indicating either that individuals growth factors, allowing it to persist in the nutrient-limited envir- mount different immune responses or that this PGRS protein onment of the phagosome. In this regard, the ferritin-like proteins, may vary between strains of M. tuberculosis. The latter possibility encoded by bfrA and bfrB, may be important in intracellular survival is supported by restriction fragment length polymorphisms for as the capacity to acquire enough iron in the vacuole is very various PGRS and MPTR sequences in clinical isolates33. Direct limited. Ⅺ support for genetic variation within both the PE and the PPE ...... families was obtained by comparative DNA sequence analysis (Fig. Methods 5). The gene for the PE–PGRS protein Rv0746 of BCG differs from Sequence analysis. Initially, ϳ3.2 Mb of sequence was generated from that in H37Rv by the deletion of 29 codons and the insertion of 46 cosmids8 and the remainder was obtained from selected BAC clones7 and codons. Similar variation was seen in the gene for the PPE protein 45,000 whole-genome shotgun clones. Sheared fragments (1.4–2.0 kb) from Rv0442 (data not shown). As these differences were all associated cosmids and BACs were cloned into M13 vectors, whereas genomic DNA was with repetitive sequences they could have resulted from intergenic cloned in pUC18 to obtain both forward and reverse reads. The PGRS genes or intragenic recombinational events or, more probably, from were grossly underrepresented in pUC18 but better covered in the BAC and strand slippage during replication32. These mechanisms are cosmid M13 libraries. We used small-insert libraries44 to sequence regions known to generate antigenic variability in other bacterial prone to compression or deletion and, in some cases, obtained sequences from pathogens36. products of the chain reaction or directly from BACs7. All shotgun There are several parallels between the PGRS proteins and the sequencing was performed with standard dye terminators to minimize com- Epstein–Barr virus nuclear antigens (EBNAs). Members of both pression problems, whereas finishing reactions used dRhodamine or BigDye polypeptide families are glycine-rich, contain extensive Gly–Ala terminators (http://www.sanger.ac.uk). Problem areas were verified by using repeats, and exhibit variation in the length of the repeat region dye primers. Thirty differences were found between the genomic shotgun between different isolates. The Gly–Ala repeat region of EBNA1 sequences and the cosmids; twenty of which were due to sequencing errors and functions as a cis-acting inhibitor of the /proteasome ten to mutations in cosmids (1 error per 320 kb). Less than 0.1% of the antigen-processing pathway that generates peptides presented in sequence was from areas of single-clone coverage, and Ͻ0.2% was from one the context of major histocompatibility complex (MHC) class I strand with only one sequencing chemistry. molecules37,38. MHC class I knockout mice are very susceptible to M. Informatics. Sequence assembly involved PHRAP, GAP4 (ref. 45) and a tuberculosis, underlining the importance of a cytotoxic T-cell customized perl script that merges sequences from different libraries and response in protection against disease3,39. Given the many potential generates segments that can be processed by several finishers simultaneously. effects of the PPE and PE proteins, it is important that further Sequence analysis and annotation was managed by DIANA (B.G.B. et al., studies are performed to understand their activity. If extensive unpublished). Genes encoding proteins were identified by TB-parse46 using a antigenic variability or reduced antigen presentation were indeed hidden Markov model trained on known M. tuberculosis coding and non- found, this would be significant for vaccine design and for under- coding regions and translation-initiation signals, with corroboration by posi- standing protective immunity in tuberculosis, and might even tional base preference. Interrogation of the EMBL, TREMBL, SwissProt, explain the varied responses seen in different BCG vaccination PROSITE47 and in-house databases involved BLASTN, BLASTX48, DOTTER programmes40. (http://www.sanger.ac.uk) and FASTA49. tRNA genes were located and identi- Pathogenicity. Despite intensive research efforts, there is little fied using tRNAscan and tRNAscan-SE50. The complete sequence, a list of information about the molecular basis of mycobacterial virulence41. annotated cosmids and linking regions can be found on our website (http:// However, this situation should now change as the genome sequence www. sanger.ac.uk) and in MycDB (http://www.pasteur.fr/mycdb/). will accelerate the study of pathogenesis as never before, because other bacterial factors that may contribute to virulence are becom- Received 15 April; accepted 8 May 1998. ing apparent. Before the completion of the genome sequence, only 1. Snider, D. E. Jr, Raviglione, M. & Kochi, A. in Tuberculosis: Pathogenesis, Protection, and Control (ed. three virulence factors had been described41: -peroxidase, Bloom, B. R.) 2–11 (Am. Soc. Microbiol., Washington DC, 1994). 2. Wheeler, P. R. & Ratledge, C. in Tuberculosis: Pathogenesis, Protection, and Control (ed. Bloom, B. R.) which protects against reactive oxygen species produced by the 353–385 (Am. Soc. Microbiol., Washington DC, 1994). phagocyte; mce, which encodes macrophage-colonizing factor42; 3. Chan, J. & Kaufmann, S. H. E. in Tuberculosis: Pathogenesis, Protection, and Control (ed. Bloom, B. R.) 271–284 (Am. Soc. Microbiol., Washington DC, 1994). and a sigma factor gene, sigA (aka rpoV), mutations in which can 4. Brennan, P. J. & Draper, P. in Tuberculosis: Pathogenesis, Protection, and Control (ed. Bloom, B. R.) 41 lead to attenuation . In addition to these single-gene virulence 271–284 (Am. Soc. Microbiol., Washington DC, 1994). factors, the mycobacterial cell wall4 is also important in pathology, 5. Kolattukudy, P. E., Fernandes, N. D., Azad, A. K., Fitzmaurice, A. M. & Sirakova, T. D. Biochemistry

Nature © Macmillan Publishers Ltd 1998 NATURE | VOL 393 | 11 JUNE 1998 543 article

and molecular genetics of cell-wall lipid biosynthesis in mycobacteria. Mol. Microbiol. 24, 263–270 1710–1717 (1995). (1997). 31. Hermans, P. W. M., van Soolingen, D. & van Embden, J. D. A. Characterization of a major 6. Sreevatsan, S. et al. Restricted structural gene polymorphism in the Mycobacterium tuberculosis polymorphic tandem repeat in Mycobacterium tuberculosis and its potential use in the epidemiology complex indicates evolutionarily recent global dissemination. Proc. Natl Acad. Sci. USA 94, 9869– of Mycobacterium kansasii and Mycobacterium gordonae. J. Bacteriol. 174, 4157–4165 (1992). 9874 (1997). 32. Poulet, S. & Cole, S. T. Characterisation of the polymorphic GC-rich repetitive sequence (PGRS) 7. Brosch, R. et al. Use of a Mycobacterium tuberculosis H37Rv bacterial artificial chromosome library for present in Mycobacterium tuberculosis. Arch. Microbiol. 163, 87–95 (1995). genome mapping, sequencing and comparative genomics. Infect. Immun. 66, 2221–2229 (1998). 33. Cole, S. T. & Barrell, B. G. in Genetics and Tuberculosis (eds Chadwick, D. J. & Cardew, G., Novartis 8. Philipp, W. J. et al. An integrated map of the genome of the tubercle bacillus, Mycobacterium Foundation Symp. 217) 160–172 (Wiley, Chichester, 1998). tuberculosis H37Rv, and comparison with Mycobacterium leprae. Proc. Natl Acad. Sci. USA 93, 3132– 34. Vega-Lopez, F. et al. Sequence and immunological characterization of a serine-rich antigen from 3137 (1996). Mycobacterium leprae. Infect. Immun. 61, 2145–2153 (1993). 9. Blattner, F. R. et al. The complete genome sequence of Escherichia coli K-12. Science 277, 1453–1462 35. Abou-Zeid, C. et al. Genetic and immunological analysis of Mycobacterium tuberculosis fibronectin- (1997). binding proteins. Infect. Immun. 59, 2712–2718 (1991). 10. Cole, S. T. & Saint-Girons, I. Bacterial genomics. FEMS Microbiol. Rev. 14, 139–160 (1994). 36. Robertson, B. D. & Meyer, T. F. Genetic variation in pathogenic bacteria. Trends Genet. 8, 422–427 8 11. Freiberg, C. et al. Molecular basis of symbiosis between Rhizobium and legumes. Nature 387, 394–401 (1992). (1997). 37. Levitskaya, J. et al. Inhibition of antigen processing by the internal repeat region of the Epstein-Barr 12. Bardarov, S. et al. Conditionally replicating mycobacteriophages: a system for transposon delivery to virus nuclear antigen-1. Nature 375, 685–688 (1995). Mycobacterium tuberculosis. Proc. Natl Acad. Sci. USA 94, 10961–10966 (1997). 38. Levitskaya, J., Sharipo, A., Leonchiks, A., Ciechanover, A. & Masucci, M. G. Inhibition of ubiquitin/ 13. Mahairas, G. G., Sabo, P. J., Hickey, M. J., Singh, D. C. & Stover, C. K. Molecular analysis of genetic proteasome-dependent protein degradation by the Gly-Ala repeat domain of the Epstein-Barr virus differences between Mycobacterium bovis BCG and virulent M. bovis. J. Bacteriol. 178, 1274–1282 nuclear antigen 1. Proc. Natl Acad. Sci. USA 94, 12616–12621 (1997). (1996). 39. Flynn, J. L., Goldstein, M. A., Treibold, K. J., Koller, B. & Bloom, B. R. Major histocompatability 14. Kunst, F. et al. The complete genome sequence of the gram-positive bacterium Bacillus subtilis. Nature complex class-I restricted T cells are required for resistance to Mycobacterium tuberculosis infection. 390, 249–256 (1997). Proc. Natl Acad. Sci. USA 89, 12013–12017 (1992). 15. Smith, D. R. et al. Multiplex sequencing of 1.5 Mb of the Mycobacterium leprae genome. Genome Res. 40. Bloom, B. R. & Fine, P. E. M. in Tuberculosis: Pathogenesis, Protection, and Control (ed. Bloom, B. R.) 7, 802–819 (1997). 531–557 (Am. Soc. Microbiol., Washington DC, 1994). 16. Greenacre, M. Theory and Application of Correspondence Analysis (Academic, London, 1984). 41. Collins, D. M. In search of tuberculosis virulence genes. Trends Microbiol. 4, 426–430 (1996). 17. Ratledge, C. R. in The Biology of the Mycobacteria (eds Ratledge, C. & Stanford, J.) 53–94 (Academic, 42. Arruda, S., Bomfim, G., Knights, R., Huima-Byron, T. & Riley, L. W. Cloning of an M. tuberculosis San Diego, 1982). DNA fragment associated with entry and survival inside cells. Science 261, 1454–1457 (1993). 18. Av-Gay, Y. & Davies, J. Components of eukaryotic-like protein signaling pathways in Mycobacterium 43. Baumler, A. J., Kusters, J. G., Stojikovic, I. & Heffron, F. Salmonella typhimurium loci involved in tuberculosis. Microb. Comp. Genomics 2, 63–73 (1997). survival within macrophages. Infect. Immun. 62, 1623–1630 (1994). 19. Cole, S. T. & Telenti, A. Drug resistance in Mycobacterium tuberculosis. Eur. Resp. Rev. 8, 701S–713S 44. McMurray, A. A., Sulston, J. E. & Quail, M. A. Short-insert libraries as a method of problem solving in (1995). genome sequencing. Genome Res. 8, 562–566 (1998). 20. Riley, M. & Labedan, B. in Escherichia coli and Salmonella (ed. Neidhardt, F. C.) 2118–2202 (ASM, 45. Bonfield, J. K., Smith, K. F. & Staden, R. A new DNA sequence assembly program. Nucleic Acids Res. Washington, 1996). 24, 4992–4999 (1995). 21. Mdluli, K. et al. Inhibition of a Mycobacterium tuberculosis ␤-ketoacyl ACP synthase by isoniazid. 46. Krogh, A., Mian, I. S. & Haussler, D. A hidden Markov model that finds genes in E. coli DNA. Nucleic Science 280, 1607–1610 (1998). Acids Res. 22, 4768–4778 (1994). 22. Banerjee, A. et al. inhA, a gene encoding a target for isoniazid and ethionamide in Mycobacterium 47. Bairoch, A., Bucher, P. & Hofmann, K. The PROSITE database, its status in 1997. Nucleic Acids Res. 25, tuberculosis. Science 263, 227–230 (1994). 217–221 (1997). 23. Hopwood, D. A. Genetic contributions to understanding polyketide synthases. Chem. Rev. 97, 2465– 48. Altschul, S., Gish, W., Miller, W., Myers, E. & Lipman, D. A basic local alignment search tool. J. Mol. 2497 (1997). Biol. 215, 403–410 (1990). 24. Minnikin, D. E. in The Biology of the Mycobacteria (eds Ratledge, C. & Stanford, J.) 95–184 (Academic, 49. Pearson, W. & Lipman, D. Improved tools for biological sequence comparisons. Proc. Natl Acad. USA London, 1982). 85, 2444–2448 (1988). 25. Barry, C. E. III et al. Mycolic acids: structure, biosynthesis, and phsyiological functions. Prog. Lipid 50. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in Res. (in the press). genomic DNA. Nucleic Acids Res. 25, 955–964 (1997). 26. Belisle, J. T. et al. Role of the major antigen of Mycobacterium tuberculosis in cell wall biogenesis. Science 276, 1420–1422 (1997). Acknowledgements. We thank Y. Av-Gay, F.-C. Bange, A. Danchin, B. Dujon, W. R. Jacobs Jr, L. Jones, 27. Marahiel, M. A., Stachelhaus, T. & Mootz, H. D. Modular peptide synthetases involved in M. McNeil, I. Moszer, P. Rice and J. Stephenson for advice, reagents and support. This work was supported by the Wellcome Trust. Additional funding was provided by the Association Franc¸aise Raoul Follereau, synthesis. Chem. Rev. 97, 2651–2673 (1997). the World Health Organisation and the Institut Pasteur. S.V.G. received a Wellcome Trust travelling 28. Gobin, J. et al. Iron acquisition by Mycobacterium tuberculosis: isolation and characterization of a research fellowship. family of iron-binding exochelins. Proc. Natl Acad. Sci. USA 92, 5189–5193 (1995). 29. Young, D. B. & Fruth, U. in New Generation (eds Levine, M., Woodrow, G., Kaper, J. & Cobon, Correspondence and requests for materials should be addressed to B.G.B. ([email protected]) or S.T.C. G. S.) 631–645 (Marcel Dekker, New York, 1997). ([email protected]). The complete sequence has been deposited in EMBL/GenBank/DDBJ as MTBH37RV, 30. Sorensen, A. L., Nagai, S., Houen, G., Andersen, P. & Anderson, A. B. Purification and characterization accession number AL123456. of a low-molecular-mass T-cell antigen secreted by Mycobacterium tuberculosis. Infect. Immun. 63,

Nature © Macmillan Publishers Ltd 1998 544 NATURE | VOL 393 | 11 JUNE 1998 Table 1. Functional classification of Mycobacterium tuberculosis protein-coding genes

I. Small-molecule metabolism superfamily Rv3543c fadE29 acyl-CoA dehydrogenase A. Degradation Rv2831 echA16 enoyl-CoA hydratase/isomerase Rv3560c fadE30 acyl-CoA dehydrogenase 1. Carbon compounds superfamily Rv3562 fadE31 acyl-CoA dehydrogenase Rv0186 bglS β-glucosidase Rv3039c echA17 enoyl-CoA hydratase/isomerase Rv3563 fadE32 acyl-CoA dehydrogenase Rv2202c cbhK kinase superfamily Rv3564 fadE33 acyl-CoA dehydrogenase Rv0727c fucA L-fuculose phosphate aldolase Rv3373 echA18 enoyl-CoA hydratase/isomerase Rv3573c fadE34 acyl-CoA dehydrogenase Rv1731 gabD1 succinate-semialdehyde dehydro- superfamily, N-term Rv3797 fadE35 acyl-CoA dehydrogenase genase Rv3374 echA18' enoyl-CoA hydratase/isomerase Rv3761c fadE36 acyl-CoA dehydrogenase Rv0234c gabD2 succinate-semialdehyde dehydro- superfamily, C-term Rv1175c fadH 2,4-Dienoyl-CoA Reductase genase Rv3516 echA19 enoyl-CoA hydratase/isomerase Rv0855 far fatty acyl-CoA racemase Rv0501 galE1 UDP-glucose 4-epimerase superfamily Rv1143 mcr α-methyl acyl-CoA racemase β 8 Rv0536 galE2 UDP-glucose 4-epimerase Rv3550 echA20 enoyl-CoA hydratase/isomerase Rv1492 mutA methylmalonyl-CoA , Rv0620 galK superfamily subunit Rv0619 galT galactose-1-phosphate uridylyl- Rv3774 echA21 enoyl-CoA hydratase/isomerase Rv1493 mutB methylmalonyl-CoA mutase, α C-term superfamily subunit Rv0618 galT' galactose-1-phosphate uridylyl- Rv0859 fadA β oxidation complex, β subunit Rv2504c scoA 3-oxo acid:CoA transferase, α sub- transferase N-term (acetyl-CoA C-) unit Rv0993 galU UTP-glucose-1-phosphate uridylyl- Rv0243 fadA2 acetyl-CoA C-acetyltransferase Rv2503c scoB 3-oxo acid:CoA transferase, β sub- transferase Rv1074c fadA3 acetyl-CoA C-acetyltransferase unit Rv3696c glpK ATP:glycerol 3-phosphotrans- Rv1323 fadA4 acetyl-CoA C-acetyltransferase Rv1136 - probable carnitine racemase ferase (aka thiL) Rv1683 - possible acyl-CoA synthase Rv3255c manA mannose-6-phosphate isomerase Rv3546 fadA5 acetyl-CoA C-acetyltransferase Rv3441c mrsA or phospho- Rv3556c fadA6 acetyl-CoA C-acetyltransferase 4. Phosphorous compounds mannomutase Rv0860 fadB β oxidation complex, α subunit Rv2368c phoH ATP-binding pho regulon Rv0118c oxcA oxalyl-CoA decarboxylase (multiple activities) component Rv3068c pgmA phosphoglucomutase Rv0468 fadB2 3-hydroxyacyl-CoA dehydroge- Rv1095 phoH2 PhoH-like protein Rv3257c pmmA nase Rv3628 ppa probable inorganic pyrophos- Rv3308 pmmB phosphomannomutase Rv1715 fadB3 3-hydroxyacyl-CoA dehydroge- phatase Rv2702 ppgK polyphosphate nase Rv2984 ppk Rv0408 pta phosphate acetyltransferase Rv3141 fadB4 3-hydroxyacyl-CoA dehydroge- Rv0729 xylB xylulose kinase nase B. Energy metabolism Rv1096 - carbohydrate degrading enzyme Rv1912c fadB5 3-hydroxyacyl-CoA dehydroge- 1. Glycolysis nase Rv1023 eno enolase 2. Amino acids and Rv1750c fadD1 acyl-CoA synthase Rv0363c fba bisphosphate aldolase Rv1905c aao D- oxidase Rv0270 fadD2 acyl-CoA synthase Rv1436 gap glyceraldehyde 3-phosphate dehy- Rv2531c adi ornithine/ decarboxylase Rv3561 fadD3 acyl-CoA synthase drogenase Rv2780 ald L- dehydrogenase Rv0214 fadD4 acyl-CoA synthase Rv0489 gpm I Rv1538c ansA L- Rv0166 fadD5 acyl-CoA synthase Rv3010c pfkA I Rv1001 arcA Rv1206 fadD6 acyl-CoA synthase Rv2029c pfkB phosphofructokinase II Rv0753c mmsA methylmalmonate semialdehyde Rv0119 fadD7 acyl-CoA synthase Rv0946c pgi glucose-6-phosphate isomerase dehydrogenase Rv0551c fadD8 acyl-CoA synthase Rv1437 pgk Rv0751c mmsB methylmalmonate semialdehyde Rv2590 fadD9 acyl-CoA synthase Rv1617 pykA Rv0099 fadD10 acyl-CoA synthase Rv1438 tpi triosephosphate isomerase Rv1187 rocA pyrroline-5-carboxylate dehydro- Rv1550 fadD11 acyl-CoA synthase, N-term Rv2419c - putative phosphoglycerate mutase genase Rv1549 fadD11' acyl-CoA synthase, C-term Rv3837c - putative phosphoglycerate mutase Rv2322c rocD1 ornithine aminotransferase Rv1427c fadD12 acyl-CoA synthase Rv2321c rocD2 ornithine aminotransferase Rv3089 fadD13 acyl-CoA synthase 2. Rv1848 γ subunit Rv1058 fadD14 acyl-CoA synthase Rv2241 aceE pyruvate dehydrogenase E1 com- Rv1849 ureB urease β subunit Rv2187 fadD15 acyl-CoA synthase ponent Rv1850 ureC urease α subunit Rv0852 fadD16 acyl-CoA synthase Rv3303c lpdA dihydrolipoamide dehydrogenase Rv1853 ureD urease accessory protein Rv3506 fadD17 acyl-CoA synthase Rv2497c pdhA pyruvate dehydrogenase E1 com- Rv1851 ureF urease accessory protein Rv3513c fadD18 acyl-CoA synthase ponent α subunit Rv1852 ureG urease accessory protein Rv3515c fadD19 acyl-CoA synthase Rv2496c pdhB pyruvate dehydrogenase E1 com- Rv2913c - probable D-amino acid Rv1185c fadD21 acyl-CoA synthase ponent β subunit Rv2948c fadD22 acyl-CoA synthase Rv2495c pdhC dihydrolipoamide acetyltransferase Rv3551 - possible glutaconate CoA- Rv3826 fadD23 acyl-CoA synthase Rv0462 - probable dihydrolipoamide dehy- transferase Rv1529 fadD24 acyl-CoA synthase drogenase Rv1521 fadD25 acyl-CoA synthase 3. Fatty acids Rv2930 fadD26 acyl-CoA synthase 3. TCA cycle Rv2501c accA1 acetyl/propionyl-CoA carboxylase, Rv0275c fadD27 acyl-CoA synthase Rv1475c acn aconitate hydratase α subunit Rv2941 fadD28 acyl-CoA synthase Rv0889c citA 2 Rv0973c accA2 acetyl/propionyl-CoA carboxylase, Rv2950c fadD29 acyl-CoA synthase Rv2498c citE citrate β chain α subunit Rv0404 fadD30 acyl-CoA synthase Rv1098c fum Rv2502c accD1 acetyl/propionyl-CoA carboxylase, Rv1925 fadD31 acyl-CoA synthase Rv1131 gltA1 citrate synthase 3 β subunit Rv3801c fadD32 acyl-CoA synthase Rv0896 gltA2 citrate synthase 1 Rv0974c accD2 acetyl/propionyl-CoA carboxylase, Rv1345 fadD33 acyl-CoA synthase Rv3339c icd1 β subunit Rv0035 fadD34 acyl-CoA synthase Rv0066c icd2 isocitrate dehydrogenase Rv3667 acs acetyl-CoA synthase Rv2505c fadD35 acyl-CoA synthase Rv0794c lpdB dihydrolipoamide dehydrogenase Rv3409c choD cholesterol oxidase Rv1193 fadD36 acyl-CoA synthase Rv1240 mdh Rv0222 echA1 enoyl-CoA hydratase/isomerase Rv0131c fadE1 acyl-CoA dehydrogenase Rv2967c pca superfamily Rv0154c fadE2 acyl-CoA dehydrogenase Rv3318 sdhA A Rv0456c echA2 enoyl-CoA hydratase/isomerase Rv0215c fadE3 acyl-CoA dehydrogenase Rv3319 sdhB succinate dehydrogenase B superfamily Rv0231 fadE4 acyl-CoA dehydrogenase Rv3316 sdhC succinate dehydrogenase C sub- Rv0632c echA3 enoyl-CoA hydratase/isomerase Rv0244c fadE5 acyl-CoA dehydrogenase unit superfamily Rv0271c fadE6 acyl-CoA dehydrogenase Rv3317 sdhD succinate dehydrogenase D sub- Rv0673 echA4 enoyl-CoA hydratase/isomerase Rv0400c fadE7 acyl-CoA dehydrogenase unit superfamily Rv0672 fadE8 acyl-CoA dehydrogenase Rv1248c sucA 2-oxoglutarate dehydrogenase Rv0675 echA5 enoyl-CoA hydratase/isomerase (aka aidB) Rv2215 sucB dihydrolipoamide succinyltrans- superfamily Rv0752c fadE9 acyl-CoA dehydrogenase ferase Rv0905 echA6 enoyl-CoA hydratase/isomerase Rv0873 fadE10 acyl-CoA dehydrogenase Rv0951 sucC succinyl-CoA synthase β chain superfamily (aka eccH) Rv0972c fadE12 acyl-CoA dehydrogenase Rv0952 sucD succinyl-CoA synthase α chain Rv0971c echA7 enoyl-CoA hydratase/isomerase Rv0975c fadE13 acyl-CoA dehydrogenase superfamily Rv1346 fadE14 acyl-CoA dehydrogenase 4. Glyoxylate bypass Rv1070c echA8 enoyl-CoA hydratase/isomerase Rv1467c fadE15 acyl-CoA dehydrogenase Rv0467 aceA superfamily Rv1679 fadE16 acyl-CoA dehydrogenase Rv1915 aceAa isocitrate lyase, α module Rv1071c echA9 enoyl-CoA hydratase/isomerase Rv1934c fadE17 acyl-CoA dehydrogenase Rv1916 aceAb isocitrate lyase, β module superfamily Rv1933c fadE18 acyl-CoA dehydrogenase Rv1837c glcB malate synthase Rv1142c echA10 enoyl-CoA hydratase/isomerase Rv2500c fadE19 acyl-CoA dehydrogenase Rv3323c gphA phosphoglycolate superfamily (aka mmgC) Rv1141c echA11 enoyl-CoA hydratase/isomerase Rv2724c fadE20 acyl-CoA dehydrogenase 5. Pentose phosphate pathway superfamily Rv2789c fadE21 acyl-CoA dehydrogenase Rv1445c devB glucose-6-phosphate 1-dehydro- Rv1472 echA12 enoyl-CoA hydratase/isomerase Rv3061c fadE22 acyl-CoA dehydrogenase genase superfamily Rv3140 fadE23 acyl-CoA dehydrogenase Rv1844c gnd 6-phosphogluconate dehydroge- Rv1935c echA13 enoyl-CoA hydratase/isomerase Rv3139 fadE24 acyl-CoA dehydrogenase nase (Gram –) superfamily Rv3274c fadE25 acyl-CoA dehydrogenase Rv1122 gnd2 6-phosphogluconate dehydroge- Rv2486 echA14 enoyl-CoA hydratase/isomerase Rv3504 fadE26 acyl-CoA dehydrogenase nase (Gram +) superfamily Rv3505 fadE27 acyl-CoA dehydrogenase Rv1446c opcA unknown function, may aid Rv2679 echA15 enoyl-CoA hydratase/isomerase Rv3544c fadE28 acyl-CoA dehydrogenase G6PDH

Nature © Macmillan Publishers Ltd 1998 Rv2436 rbsK Rv3250c rubB rubredoxin B Rv1878 glnA3 probable synthase Rv1408 rpe ribulose-phosphate 3-epimerase Rv2860c glnA4 proable glutamine synthase Rv2465c rpi phosphopentose isomerase 7. Miscellaneous oxidoreductases and oxygenases 171 Rv2918c glnD uridylyltransferase Rv1448c tal Rv2221c glnE glutamate-- Rv1449c tkt transketolase 8. ATP-proton motive force adenyltransferase Rv1121 zwf glucose-6-phosphate 1-dehydro- Rv1308 atpA ATP synthase α chain Rv3859c gltB ferredoxin-dependent glutamate genase Rv1304 atpB ATP synthase α chain synthase Rv1447c zwf2 glucose-6-phosphate 1-dehydro- Rv1311 atpC ATP synthase ȏ chain Rv3858c gltD small subunit of NADH-dependent genase Rv1310 atpD ATP synthase β chain glutamate synthase Rv1305 atpE ATP synthase c chain Rv3704c gshA possible γ-glutamylcysteine syn- 6. Respiration Rv1306 atpF ATP synthase b chain thase a. aerobic Rv1309 atpG ATP synthase γ chain Rv2427c proA γ-glutamyl phosphate reductase Rv0527 ccsA cytochrome c-type biogenesis Rv1307 atpH ATP synthase δ chain Rv2439c proB glutamate 5-kinase protein Rv0500 proC pyrroline-5-carboxylate reductase 8 Rv0529 ccsB cytochrome c-type biogenesis C. Central intermediary metabolism protein 1. General 2. Aspartate family Rv1451 ctaB cytochrome c oxidase assembly Rv2589 gabT 4-aminobutyrate aminotransferase Rv3708c asd aspartate semialdehyde dehydro- factor Rv3432c gadB genase Rv2200c ctaC cytochrome c oxidase chain II Rv1832 gcvB glycine decarboxylase Rv3709c ask aspartokinase Rv3043c ctaD cytochrome c oxidase poly- Rv1826 gcvH H protein Rv2201 asnB synthase B peptide I Rv2211c gcvT T protein of glycine cleavage Rv3565 aspB aspartate aminotransferase Rv2193 ctaE cytochrome c oxidase poly- system Rv0337c aspC aspartate aminotransferase peptide III Rv1213 glgC glucose-1-phosphate adenylyl- Rv2753c dapA dihydrodipicolinate synthase Rv1542c glbN hemoglobin-like, oxygen carrier transferase Rv2773c dapB dihydrodipicolinate reductase Rv2470 glbO hemoglobin-like, oxygen carrier Rv3842c glpQ1 glycerophosphoryl diester phos- Rv1202 dapE succinyl-diaminopimelate desuc- Rv2249c glpD1 glycerol-3-phosphate dehydroge- phodiesterase cinylase nase Rv0317c glpQ2 glycerophosphoryl diester phos- Rv2141c dapE2 ArgE/DapE/Acy1/Cpg2/yscS Rv3302c glpD2 glycerol-3-phosphate dehydroge- phodiesterase family nase Rv3566c nhoA N-hydroxyarylamine o-acetyltrans- Rv2726c dapF diaminopimelate epimerase Rv0694 lldD1 L- ferase Rv1293 lysA diaminopimelate decarboxylase (cytochrome) Rv0155 pntAA pyridine transhydrogenase sub- Rv3341 metA homoserine o-acetyltransferase Rv1872c lldD2 L-lactate dehydrogenase unit α1 Rv1079 metB γ-synthase Rv1854c ndh probable NADH dehydrogenase Rv0156 pntAB pyridine transhydrogenase sub- Rv3340 metC cystathionine β-lyase Rv3145 nuoA NADH dehydrogenase chain A unit α2 Rv1133c metE 5-methyltetrahydropteroyltrigluta- Rv3146 nuoB NADH dehydrogenase chain B Rv0157 pntB pyridine transhydrogenase mate- methyltrans- Rv3147 nuoC NADH dehydrogenase chain C subunit β ferase Rv3148 nuoD NADH dehydrogenase chain D Rv1127c ppdK similar to pyruvate, phosphate Rv2124c metH 5-methyltetrahydrofolate-homo- Rv3149 nuoE NADH dehydrogenase chain E dikinase cysteine Rv3150 nuoF NADH dehydrogenase chain F Rv1392 metK S-adenosylmethionine synthase Rv3151 nuoG NADH dehydrogenase chain G 2. Rv0391 metZ o-succinylhomoserine sulfhy- Rv3152 nuoH NADH dehydrogenase chain H Rv0211 pckA phosphoenolpyruvate carboxy- drylase Rv3153 nuoI NADH dehydrogenase chain I kinase Rv1294 thrA Rv3154 nuoJ NADH dehydrogenase chain J Rv0069c sdaA L-serine 1 Rv1296 thrB Rv3155 nuoK NADH dehydrogenase chain K Rv1295 thrC homoserine synthase Rv3156 nuoL NADH dehydrogenase chain L 3. Sugar Rv3157 nuoM NADH dehydrogenase chain M Rv1512 epiA nucleotide sugar epimerase 3. Serine family Rv3158 nuoN NADH dehydrogenase chain N Rv3784 epiB probable UDP-galactose 4- Rv0815c cysA2 thiosulfate Rv2195 qcrA Rieske iron-sulphur component of epimerase Rv3117 cysA3 thiosulfate sulfurtransferase ubiQ-cytB reductase Rv1511 gmdA GDP-mannose 4,6 dehydratase Rv2335 cysE serine acetyltransferase Rv2196 qcrB cytochrome β component of ubiQ- Rv0334 rmlA glucose-1-phosphate thymidyl- Rv0511 cysG uroporphyrin-III c-methyltrans- cytB reductase transferase ferase Rv2194 qcrC cytochrome b/c component of Rv3264c rmlA2 glucose-1-phosphate thymidyl- Rv2847c cysG2 multifunctional enzyme, siroheme ubiQ-cytB reductase transferase synthase Rv3464 rmlB dTDP-glucose 4,6-dehydratase Rv2334 cysK A b. anaerobic Rv3634c rmlB2 dTDP-glucose 4,6-dehydratase Rv1336 cysM cysteine synthase B Rv2392 cysH 3'-phosphoadenylylsulfate (PAPS) Rv3468c rmlB3 dTDP-glucose 4,6-dehydratase Rv1077 cysM2 cystathionine β-synthase reductase Rv3465 rmlC dTDP-4-dehydrorhamnose Rv0848 cysM3 putative cysteine synthase Rv2899c fdhD affects formate dehydrogenase-N 3,5-epimerase Rv1093 glyA serine hydroxymethyltransferase Rv2900c fdhF -containing oxidore- Rv3266c rmlD dTDP-4-dehydrorhamnose Rv0070c glyA2 serine hydroxymethyltransferase ductase reductase Rv2996c serA D-3-phosphoglycerate dehydro- Rv1552 frdA fumarate reductase flavoprotein Rv0322 udgA UDP-glucose genase subunit dehydrogenase/GDP-mannose 6- Rv0505c serB probable phosphoserine phos- Rv1553 frdB fumarate reductase iron sulphur dehydrogenase phatase protein Rv3265c wbbL dTDP-rhamnosyl transferase Rv3042c serB2 C-term similar to phosphoserine Rv1554 frdC fumarate reductase 15kD anchor Rv1525 wbbl2 dTDP-rhamnosyl transferase phosphatase protein Rv3400 - probable β-phosphoglucomutase Rv0884c serC phosphoserine aminotransferase Rv1555 frdD fumarate reductase 13kD anchor protein 4. Amino sugars 4. Aromatic amino acid family Rv1161 narG nitrate reductase α subunit Rv3436c glmS glucosamine-fructose-6- Rv3227 aroA 3-phosphoshikimate Rv1162 narH nitrate reductase β chain phosphate aminotransferase 1-carboxyvinyl transferase Rv1164 narI nitrate reductase γ chain Rv2538c aroB 3-dehydroquinate synthase Rv1163 narJ nitrate reductase δ chain 5. Sulphur metabolism Rv2537c aroD 3-dehydroquinate dehydratase Rv1736c narX fused nitrate reductase Rv0711 atsA Rv2552c aroE shikimate 5-dehydrogenase Rv2391 nirA probable nitrite reductase/sulphite Rv3299c atsB proable arylsulfatase Rv2540c aroF reductase Rv0663 atsD proable arylsulfatase Rv2178c aroG DAHP synthase Rv0252 nirB nitrite reductase flavoprotein Rv3077 atsF proable arylsulfatase Rv2539c aroK I Rv0253 nirD probable nitrite reductase small Rv0296c atsG proable arylsulfatase Rv3838c pheA prephenate dehydratase subunit Rv3796 atsH proable arylsulfatase Rv1613 trpA synthase α chain Rv1285 cysD ATP:sulphurylase subunit 2 Rv1612 trpB β chain c. Electron transport Rv1286 cysN ATP:sulphurylase subunit 1 Rv1611 trpC indole-3-glycerol phosphate Rv0409 ackA kinase Rv2131c cysQ homologue of M.leprae cysQ synthase Rv1623c appC cytochrome bd-II oxidase Rv3248c sahH Rv2192c trpD anthranilate phosphoribosyltrans- subunit I Rv3283 sseA thiosulfate sulfurtransferase ferase Rv1622c cydB cytochrome d ubiquinol oxidase Rv2291 sseB thiosulfate sulfurtransferase Rv1609 trpE anthranilate synthase subunit II Rv3118 sseC thiosulfate sulfurtransferase component I Rv1620c cydC ABC transporter Rv0814c sseC2 thiosulfate sulfurtransferase Rv2386c trpE2 anthranilate synthase Rv1621c cydD ABC transporter Rv3762c - probable alkyl component I Rv2007c fdxA ferredoxin Rv3754 tyrA prephenate dehydrogenase Rv3554 fdxB ferredoxin D. Amino acid biosynthesis Rv1177 fdxC ferredoxin 4Fe-4S 1. Glutamate family 5. Histidine Rv3503c fdxD probable ferredoxin Rv1654 argB Rv1603 hisA phosphoribosylformimino-5- Rv3029c fixA electron transfer flavoprotein Rv1652 argC N-acetyl-γ-glutamyl-phosphate aminoimidazole carboxamide β subunit reductase ribonucleotide isomerase Rv3028c fixB electron transfer flavoprotein α Rv1655 argD acetylornithine aminotransferase Rv1601 hisB glycerol-phosphate subunit Rv1656 argF ornithine carbamoyltransferase dehydratase Rv3106 fprA adrenodoxin and NADPH ferre- Rv1658 argG arginosuccinate synthase Rv1600 hisC histidinol-phosphate aminotrans- doxin reductase Rv1659 argH arginosuccinate lyase ferase Rv0886 fprB ferredoxin, ferredoxin-NADP Rv1653 argJ glutamate N-acetyltransferase Rv3772 hisC2 histidinol-phosphate aminotrans- reductase Rv2220 glnA1 glutamine synthase class I ferase Rv3251c rubA rubredoxin A Rv2222c glnA2 glutamine synthase class II Rv1599 hisD

Nature © Macmillan Publishers Ltd 1998 Rv1605 hisF imidazole glycerol-phosphate subunit subunit 1 synthase Rv3048c nrdG ribonucleoside-diphosphate small Rv3119 moaE molybdopterin-converting factor Rv2121c hisG ATP phosphoribosyltransferase subunit subunit 2 Rv1602 hisH amidotransferase Rv3053c nrdH glutaredoxin electron transport Rv0866 moaE2 molybdopterin-converting factor Rv2122c hisI phosphoribosyl-AMP cyclohydro- component of NrdEF system subunit 2 lase Rv3052c nrdI NrdI/YgaO/YmaA family Rv3322c moaE3 molybdopterin-converting factor Rv1606 hisI2 probable phosphoribosyl-AMP 1,6 Rv3247c tmk subunit 2 cyclohydrolase Rv2764c thyA Rv0994 moeA molybdopterin biosynthesis Rv0114 - similar to HisB Rv0570 nrdZ , class II Rv3116 moeB molybdopterin biosynthesis Rv3752c - probable cytidine/deoxycytidylate Rv2338c moeW molybdopterin biosynthesis 6. Pyruvate family deaminase Rv1681 moeX weak similarity to E. coli MoaA Rv3423c alr Rv1355c moeY weak similarity to E. coli MoeB 4. Salvage of nucleosides and nucleotides Rv3206c moeZ probably involved in 7. Branched amino acid family Rv3313c add probable molybdopterin biosynthesis 8 Rv1559 ilvA threonine deaminase Rv2584c apt phosphoribosyltrans- Rv0865 mog molybdopterin biosynthesis Rv3003c ilvB acetolactate synthase I large sub- ferases unit Rv3315c cdd probable 5. Pantothenate Rv3470c ilvB2 acetolactate synthase large sub- Rv3314c deoA Rv1092c coaA unit Rv0478 deoC deoxyribose-phosphate aldolase Rv2225 panB 3-methyl-2-oxobutanoate Rv3001c ilvC ketol-acid reductoisomerase Rv3307 deoD probable purine nucleoside phos- hydroxymethyltransferase Rv0189c ilvD dihydroxy-acid dehydratase phorylase Rv3602c panC pantoate-β-alanine ligase Rv2210c ilvE branched-chain-amino-acid Rv3624c hpt probable hypoxanthine-guanine Rv3601c panD aspartate 1-decarboxylase transaminase phosphoribosyltransferase Rv1820 ilvG acetolactate synthase II Rv3393 iunH probable inosine-uridine 6. Pyridoxine Rv3002c ilvN acetolactate synthase I small sub- preferring nucleoside Rv2607 pdxH pyridoxamine 5'-phosphate unit Rv0535 pnp phosphorylase from Pnp/MtaP oxidase Rv3509c ilvX probable acetohydroxyacid syn- family 2 thase I large subunit Rv3309c upp phophoribosyltransferase 7. Pyridine nucleotide Rv3710 leuA α-isopropyl malate synthase Rv1594 nadA Rv2995c leuB 3-isopropylmalate dehydrogenase 5. Miscellaneous nucleoside/nucleotide reactions Rv1595 nadB L-aspartate oxidase Rv2988c leuC 3-isopropylmalate dehydratase Rv0733 probable Rv1596 nadC nicotinate-nucleotide pyrophos- large subunit Rv2364c bex GTP-binding protein of Era/ThdF phatase Rv2987c leuD 3-isopropylmalate dehydratase family Rv0423c thiC synthesis, pyrimidine small subunit Rv1712 cmk moiety Rv2344c dgt probable deoxyguanosine E. Polyamine synthesis triphosphate hydrolase 8. Thiamine Rv2601 speE Rv2404c lepA GTP-binding protein LepA Rv0422c thiD phosphomethylpyrimidine kinase Rv2727c miaA tRNA δ(2)-isopentenylpyrophos- Rv0414c thiE thiamine synthesis, thiazole F. Purines, pyrimidines, nucleosides and nucleotides phate transferase moiety 1. Purine ribonucleotide biosynthesis Rv2445c ndkA nucleoside diphosphate kinase Rv0417 thiG thiamine synthesis, thiazole Rv1389 gmk putative Rv2440c obg Obg GTP-binding protein moiety Rv3396c guaA GMP synthase Rv2583c relA (p)ppGpp synthase I Rv2977c thiL probable thiamine-monophos- Rv1843c guaB1 inosine-5'-monophosphate dehy- phate kinase drogenase G. Biosynthesis of cofactors, prosthetic groups and Rv3411c guaB2 inosine-5'-monophosphate dehy- carriers 9. drogenase 1. Biotin Rv1940 ribA GTP cyclohydrolase II Rv3410c guaB3 inosine-5'-monophosphate dehy- Rv1568 bioA adenosylmethionine-8-amino-7- Rv1415 ribA2 probable GTP cyclohydrolase II drogenase oxononanoate aminotransferase Rv1412 ribC α chain Rv1017c prsA -phosphate pyrophosphoki- Rv1589 bioB biotin synthase Rv2671 ribD probable riboflavin deaminase nase Rv1570 bioD dethiobiotin synthase Rv2786c ribF Rv0357c purA adenylosuccinate synthase Rv1569 bioF 8-amino-7-oxononanoate Rv1409 ribG riboflavin biosynthesis Rv0777 purB synthase Rv1416 ribH riboflavin synthase β chain Rv0780 purC phosphoribosylaminoimidazole- Rv0032 bioF2 C-terminal similar to B. subtilis Rv3300c - probable deaminase, riboflavin succinocarboxamide synthase BioF synthesis Rv0772 purD phosphoribosylamine-glycine lig- Rv3279c birA biotin apo-protein ligase ase Rv1442 bisC biotin sulfoxide reductase 10. Thioredoxin, glutaredoxin and mycothiol Rv3275c purE phosphoribosylaminoimidazole Rv0089 - possible bioC biotin synthesis Rv0773c ggtA putative γ-glutamyl transpeptidase carboxylase gene Rv2394 ggtB γ -glutamyltranspeptidase Rv0808 purF amidophosphoribosyltransferase- precursor Rv0957 purH phosphoribosylaminoimidazole- 2. Folic acid Rv2855 gorA reductase homologue carboxamide formyltransferase Rv2763c dfrA Rv0816c thiX equivalent to M. leprae ThiX Rv3276c purK phosphoribosylaminoimidazole Rv2447c folC folylpolyglutamate synthase Rv1470 trxA thioredoxin carboxylase ATPase subunit Rv3356c folD methylenetetrahydrofolate dehy- Rv1471 trxB Rv0803 purL phosphoribosylformylglycin- drogenase Rv3913 trxB2 thioredoxin reductase amidine synthase II Rv3609c folE GTP cyclohydrolase I Rv3914 trxC thioredoxin Rv0809 purM 5'-phosphoribosyl-5-aminoimida- Rv3606c folK 7,8-dihydro-6-hydroxymethylpterin zole synthase pyrophosphokinase 11. Menaquinone, PQQ, ubiquinone and other Rv0956 purN phosphoribosylglycinamide Rv3608c folP terpenoids formyltransferase I Rv1207 folP2 dihydropteroate synthase Rv2682c dxs 1-deoxy-D-xylulose 5-phosphate Rv0788 purQ phosphoribosylformylglycin- Rv3607c folX may be involved in biosyn- synthase amidine synthase I thesis Rv0562 grcC1 heptaprenyl diphosphate Rv0389 purT phosphoribosylglycinamide Rv0013 pabA p-aminobenzoate synthase gluta- synthase II formyltransferase II mine amidotransferase Rv0989c grcC2 heptaprenyl diphosphate Rv2964 purU formyltetrahydrofolate deformy- Rv1005c pabB p-aminobenzoate synthase synthase II lase Rv0812 pabC aminodeoxychorismate lyase Rv3398c idsA geranylgeranyl pyrophosphate synthase 2. Pyrimidine ribonucleotide biosynthesis 3. Lipoate Rv2173 idsA2 geranylgeranyl pyrophosphate Rv1383 carA carbamoyl-phosphate synthase Rv2218 lipA lipoate biosynthesis protein A synthase subunit Rv2217 lipB lipoate biosynthesis protein B Rv3383c idsB transfergeranyl, similar geranyl Rv1384 carB carbamoyl-phosphate synthase pyrophosphate synthase subunit 4. Molybdopterin Rv0534c menA 4-dihydroxy-2-naphthoate Rv1380 pyrB aspartate carbamoyltransferase Rv3109 moaA molybdenum biosynthe- octaprenyltransferase Rv1381 pyrC sis, protein A Rv0548c menB naphthoate synthase Rv2139 pyrD dihydroorotate dehydrogenase Rv0869c moaA2 molybdenum cofactor biosynthe- Rv0553 menC o-succinylbenzoate-CoA synthase Rv1385 pyrF orotidine 5'-phosphate decarboxy- sis, protein A Rv0555 menD 2-succinyl-6-hydroxy-2,4-cyclo- lase Rv0438c moaA3 molybdenum cofactor biosynthe- hexadiene-1-carboxylate synthase Rv1699 pyrG CTP synthase sis, protein A Rv0542c menE o-succinylbenzoic acid-CoA ligase Rv2883c pyrH uridylate kinase Rv3110 moaB molybdenum cofactor biosynthe- Rv3853 menG S-adenosylmethionine: Rv0382c umpA probable uridine 5'-monophos- sis, protein B 2-demethylmenaquinone phate synthase Rv0984 moaB2 molybdenum cofactor biosynthe- Rv3397c phyA sis, protein B Rv0693 pqqE coenzyme PQQ synthesis 3. 2'- metabolism Rv3111 moaC molybdenum cofactor biosynthe- protein E Rv0321 dcd triphosphate sis, protein C Rv0558 ubiE ubiquinone/menaquinone biosyn- deaminase Rv0864 moaC2 molybdenum cofactor biosynthe- thesis methyltransferase Rv2697c dut triphosphatase sis, protein C Rv0233 nrdB ribonucleoside-diphosphate Rv3324c moaC3 molybdenum cofactor biosynthe- 12. Heme and porphyrin reductase B2 (eukaryotic-like) sis, protein C Rv0509 hemA glutamyl-tRNA reductase Rv3051c nrdE ribonucleoside diphosphate Rv3112 moaD molybdopterin converting factor Rv0512 hemB δ-aminolevulinic acid dehydratase reductase α chain subunit 1 Rv0510 hemC porphobilinogen deaminase Rv1981c nrdF ribonucleotide reductase small Rv0868c moaD2 molybdopterin converting factor Rv2678c hemE uroporphyrinogen decarboxylase

Nature © Macmillan Publishers Ltd 1998 Rv1300 hemK protoporphyrinogen oxidase Rv0470c umaA2 unknown mycolic acid methyl- Rv2931 ppsA phenolpthiocerol synthesis (pksB) Rv0524 hemL glutamate-1-semialdehyde amino- transferase Rv2932 ppsB phenolpthiocerol synthesis (pksC) transferase Rv2933 ppsC phenolpthiocerol synthesis (pksD) Rv2388c hemN oxygen-independent copropor- 3. , mycoloyltransferases and Rv2934 ppsD phenolpthiocerol synthesis (pksE) phyrinogen III oxidase phospholipid synthesis Rv2935 ppsE phenolpthiocerol synthesis (pksF) Rv2677c hemY' protoporphyrinogen oxidase Rv2289 cdh CDP-diacylglycerol phosphatidyl- Rv2928 tesA Rv1485 hemZ hydrolase Rv1544 - probable ketoacyl reductase Rv2881c cdsA phosphatidate cytidylyltransferase 13. Cobalamin Rv3804c fbpA antigen 85A, mycolyltransferase J. Broad regulatory functions Rv2849c cobA cob(I)alamin adenosyltransferase Rv1886c fbpB antigen 85B, mycolyltransferase 1. Repressors/activators Rv2848c cobB cobyrinic acid a,c-diamide Rv3803c fbpC1 antigen 85C, mycolyltransferase Rv1657 argR arginine repressor synthase Rv0129c fbpC2 antigen 85C', mycolytransferase Rv1267c embR regulator of embAB genes Rv2231c cobC aminotransferase Rv0564c gpdA1 glycerol-3-phosphate dehydroge- (AfsR/DndI/RedD family) Rv2236c cobD cobinamide synthase nase Rv1909c furA ferric uptake regulatory protein 8 Rv2064 cobG percorrin reductase Rv2982c gpdA2 glycerol-3-phosphate dehydroge- Rv2359 furB ferric uptake regulatory protein Rv2065 cobH precorrin isomerase nase Rv2919c glnB regulatory protein Rv2066 cobI CobI-CobJ fusion protein Rv2612c pgsA CDP-diacylglycerol-glycerol-3- Rv2711 ideR iron dependent repressor, IdeR Rv2070c cobK precorrin reductase phosphate phosphatidyltrans- Rv2720 lexA LexA, SOS repressor protein Rv2072c cobL probable methyltransferase ferase Rv1479 moxR transcriptional regulator, MoxR Rv2071c cobM precorrin-3 methylase Rv1822 pgsA2 CDP-diacylglycerol-glycerol-3- homologue Rv2062c cobN insertion phosphate phosphatidyltrans- Rv3692 moxR2 transcriptional regulator, MoxR Rv2208 cobS cobalamin (5'-phosphate) ferase homologue synthase Rv2746c pgsA3 CDP-diacylglycerol-glycerol-3- Rv3164c moxR3 transcriptional regulator, MoxR Rv2207 cobT nicotinate-nucleotide-dimethyl- phosphate phosphatidyltrans- homologue benzimidazole transferase ferase Rv0212c nadR similar to E.coli NadR Rv0254c cobU cobinamide kinase Rv1551 plsB1 glycerol-3-phosphate acyltrans- Rv0117 oxyS transcriptional regulator (LysR Rv0255c cobQ cobyric acid synthase ferase family) Rv3713 cobQ2 possible cobyric acid synthase Rv2482c plsB2 glycerol-3-phosphate acyltrans- Rv1379 pyrR regulatory protein pyrimidine Rv0306 - similar to BluB cobalamin synthe- ferase biosynthesis sis protein R. capsulatus Rv0437c psd putative phosphatidylserine Rv2788 sirR iron-dependent transcriptional decarboxylase repressor 14. Iron utilization Rv0436c pssA CDP-diacylglycerol-serine Rv3082c virS putative virulence regulating Rv1876 bfrA bacterioferritin o-phosphatidyltransferase protein (AraC/XylS family) Rv3841 bfrB bacterioferritin Rv0045c - possible dihydrolipoamide acetyl- Rv3219 whiB1 WhiB transcriptional activator Rv3215 entC probable transferase homologue Rv3214 entD weak similarity to many phospho- Rv0914c - lipid transfer protein Rv3260c whiB2 WhiB transcriptional activator glycerate Rv1543 - probable fatty-acyl CoA reductase homologue Rv2895c viuB similar to proteins involved in Rv1627c - lipid carrier protein Rv3416 whiB3 WhiB transcriptional activator vibriobactin uptake Rv1814 - possible C-5 sterol desaturase homologue Rv3525c - similar to ferripyochelin binding Rv1867 - similar to acetyl CoA Rv3681c whiB4 WhiB transcriptional activator protein synthase/lipid carriers homologue Rv2261c - apolipoprotein N-acyltrans- Rv0023 - putative transcriptional regulator H. Lipid biosynthesis ferase-a Rv0043c - transcriptional regulator (GntR 1. Synthesis of fatty and mycolic acids Rv2262c - apolipoprotein N-acyltrans- family) Rv3285 accA3 acetyl/propionyl CoA carboxylase ferase-b Rv0067c - transcriptional regulator α subunit Rv3523 - lipid carrier protein (TetR/AcrR family) Rv0904c accD3 acetyl/propionyl CoA carboxylase Rv3720 - C-term similar to cyclopropane Rv0078 - transcriptional regulator β subunit fatty acid synthases (TetR/AcrR family) Rv3799c accD4 acetyl/propionyl CoA carboxylase Rv0081 - transcriptional regulator (ArsR β subunit I. Polyketide and non-ribosomal peptide synthesis family) Rv3280 accD5 acetyl/propionyl CoA carboxylase Rv2940c mas mycocerosic acid synthase Rv0135c - putative transcriptional regulator β subunit Rv2384 mbtA mycobactin/exochelin synthesis Rv0144 - putative transcriptional regulator Rv2247 accD6 acetyl/propionyl CoA carboxylase (salicylate-AMP ligase) Rv0158 - transcriptional regulator β subunit Rv2383c mbtB mycobactin/exochelin synthesis (TetR/AcrR family) Rv2244 acpM acyl carrier protein (meromycolate (serine/threonine ligation) Rv0165c - transcriptional regulator (GntR extension) Rv2382c mbtC mycobactin/exochelin synthesis family) Rv2523c acpS CoA:apo-[ACP] pantethienephos- Rv2381c mbtD mycobactin/exochelin synthesis Rv0195 - transcriptional regulator photransferase (polyketide synthase) (LuxR/UhpA family) Rv2243 fabD malonyl CoA-[ACP] transacylase Rv2380c mbtE mycobactin/exochelin synthesis Rv0196 - transcriptional regulator Rv0649 fabD2 malonyl CoA-[ACP] transacylase (lysine ligation) (TetR/AcrR family) Rv1483 fabG1 3-oxoacyl-[ACP] reductase (aka Rv2379c mbtF mycobactin/exochelin synthesis Rv0232 - transcriptional regulator MabA) (lysine ligation) (TetR/AcrR family) Rv1350 fabG2 3-oxoacyl-[ACP] Reductase Rv2378c mbtG mycobactin/exochelin synthesis Rv0238 - transcriptional regulator Rv2002 fabG3 3-oxoacyl-[ACP] reductase (lysine hydroxylase) (TetR/AcrR family) Rv0242c fabG4 3-oxoacyl-[ACP] reductase Rv2377c mbtH mycobactin/exochelin synthesis Rv0273c - putative transcriptional regulator Rv2766c fabG5 3-oxoacyl-[ACP] reductase Rv0101 nrp unknown non-ribosomal peptide Rv0302 - transcriptional regulator Rv0533c fabH β-ketoacyl-ACP synthase III synthase (TetR/AcrR family) Rv2524c fas fatty acid synthase Rv1153c omt PKS o-methyltransferase Rv0324 - putative transcriptional regulator Rv1484 inhA enoyl-[ACP] reductase Rv3824c papA1 PKS-associated protein, unknown Rv0328 - transcriptional regulator Rv2245 kasA β-ketoacyl-ACP synthase function (TetR/AcrR family) (meromycolate extension) Rv3820c papA2 PKS-associated protein, unknown Rv0348 - putative transcriptional regulator Rv2246 kasB β-ketoacyl-ACP synthase function Rv0377 - transcriptional regulator (LysR (meromycolate extension) Rv1182 papA3 PKS-associated protein, unknown family) Rv1618 tesB1 thioesterase II function Rv0386 - transcriptional regulator Rv2605c tesB2 thioesterase II Rv1528c papA4 PKS-associated protein, unknown (LuxR/UhpA family) Rv0033 - possible acyl carrier protein function Rv0452 - putative transcriptional regulator Rv1344 - possible acyl carrier protein Rv2939 papA5 PKS-associated protein, unknown Rv0465c - transcriptional regulator Rv1722 - possible biotin carboxylase function (PbsX/Xre family) Rv3221c - resembles biotin carboxyl carrier Rv2946c pks1 polyketide synthase Rv0472c - transcriptional regulator Rv3472 - possible acyl carrier protein Rv1660 pks10 polyketide synthase (chalcone (TetR/AcrR family) synthase-like) Rv0474 - transcriptional regulator 2. Modification of fatty and mycolic acids Rv1665 pks11 polyketide synthase (chalcone (PbsX/Xre family) Rv3391 acrA1 fatty acyl-CoA reductase synthase-like) Rv0485 - transcriptional regulator (ROK Rv3392c cmaA1 cyclopropane mycolic acid Rv2048c pks12 polyketide synthase (erythronolide family) synthase 1 synthase-like) Rv0494 - transcriptional regulator (GntR Rv0503c cmaA2 cyclopropane mycolic acid syn- Rv3800c pks13 polyketide synthase family) thase 2 Rv1342c pks14 polyketide synthase (chalcone Rv0552 - putative transcriptional regulator Rv0824c desA1 acyl-[ACP] desaturase synthase-like) Rv0576 - putative transcriptional regulator Rv1094 desA2 acyl-[ACP] desaturase Rv2947c pks15 polyketide synthase Rv0586 - transcriptional regulator (GntR Rv3229c desA3 acyl-[ACP] desaturase Rv1013 pks16 polyketide synthase family) Rv0645c mmaA1 methoxymycolic acid synthase 1 Rv1663 pks17 polyketide synthase Rv0650 - transcriptional regulator (ROK Rv0644c mmaA2 methoxymycolic acid synthase 2 Rv1372 pks18 polyketide synthase family) Rv0643c mmaA3 methoxymycolic acid synthase 3 Rv3825c pks2 polyketide synthase Rv0653c - putative transcriptional regulator Rv0642c mmaA4 methoxymycolic acid synthase 4 Rv1180 pks3 polyketide synthase Rv0681 - transcriptional regulator Rv0447c ufaA1 unknown fatty acid methyltrans- Rv1181 pks4 polyketide synthase (TetR/AcrR family) ferase Rv1527c pks5 polyketide synthase Rv0691c - transcriptional regulator Rv3538 ufaA2 unknown fatty acid methyltrans- Rv0405 pks6 polyketide synthase (TetR/AcrR family) ferase Rv1661 pks7 polyketide synthase Rv0737 - putative transcriptional regulator Rv0469 umaA1 unknown mycolic acid methyl- Rv1662 pks8 polyketide synthase Rv0744c - putative transcriptional regulator transferase Rv1664 pks9 polyketide synthase Rv0792c - transcriptional regulator (GntR

Nature © Macmillan Publishers Ltd 1998

150,000 300,000 450,000 600,000 750,000 900,000 1,050,000 1,200,000 1,350,000 1,500,000 1,650,000 1,800,000 1,950,000 2,100,000 2,250,000

1723

1075

1330

0653

0805

0123 0248

0939

1598

rplL 1205

ureC

0372

2003

0122 1597

mmpL2 1722

rplJ

fadA3 0804

0247 1461

0371

fabG3

ureB dinG

0121

nadC

1073

1204

0246

0650 ureA

0370

1721

0938

2001

1847

mmpS2

purL

1720

fadD37

proT

1072

nadB 0369 1460

1203

fusA2 0245

1846

1719 2000

glgP

dapE serB

0368

1845

0937

nadA

1718

echA9

0802 1459

0504 fadD7 0367 0801 pstA2

1717 0648

1201

fadE5

gnd

0366 1999 echA8

1593

1716 pepC

cmaA2 pstC

1458 1200

0365

0502

1327

1069 1998 fadB3

1592

fadA2

0364 1457 oxcA

phoS1

0799 guaB1

1714 galE1

IS1081

1591

1199 1456

fba pstB oxyS 0647 0798

ctpF 1590

1198

1842

1455

glgB 1713 fabG4 0797

bioB 1197

PE_PGRS

mgtE

lipG proC

IS1547 0116

pstS

qor

0241

1841 cmk

PPE

0796 0240

0499

1996

1588

0361

0239

1711 mmaA1 IS6110 0795

PE REP

0115

1453

0498

0238

1587 pknD

PE_PGRS

1995 1710

0360

1066

PE_PGRS 0114 mmaA2

PE_PGRS

0359

0497

lpqI

1709

1065

1194 lpdB

1994 1324 gmhA

1586 1839

pstA1

0358

mmaA3 1993 1838

lpqV

0496 0793 1708

gca

1585

pstC2

fadD36 fadA4

PE_PGRS

1584 mmaA4

1063 0792

ctpG rplA

purA

1583

0495 glcB

1707

ctaB

phoS2

1192 1062

rplK

1322 0791

0494

0111 0356 1582

nusG 1321

0236

1061

1191

1991 0790

0927 0493 1581 secE

1060

trpT

0789 1580

1190 1836

0110

1990

1320

1579 PPE purQ 0637

0926

1059

1989

1578 0636

sigI

0492

0635 PE_PGRS

0925

1577

1988

0787

PPE

PE_PGRS

metT 0235

1188

fadD14 1319

1835 thrT

1987

regX3

1576 0786

nramp 0108

1575

0634

cycA

1986

1834

rocA

0785

1574

1057 senX3 gabD2

0923

1318 8

PPE 0633 1573

1985

1703 tkt

1833

nrdB

0784 1572

1571 gpm

0922 echA3

1186

0232

bioD

alkA 1056

1984

ctpI

0488 REP

IS1535

1702

leuX tal

0783

0921

ogt bioF gcvB 1055

PE_PGRS 0487

rrf

fadE4 1054

1701

recC

IS1554 fadD21

0486 bioA

zwf2

0920 1053 1052 1700

1831

ptrBa

1982

0230

rrl

1830

argT

0919 0106

1184

1567

pyrG

0485 opcA

ptrBb

nrdF 0918

0229

PPE

1829

1051 1566

hspR rpmB

devB purC

0228

1828

1050

mpt64

1444 1698 0484

recB

0104 dnaJ

mmpL10 betP

1827

0779

1049 rrs

1565

0483

1443 1979

gcvH 1697

grpE

0778 0227

PE 1825

1048 1978

murB

murA

bisC

1824 ctpB

dnaK papA3 purB

recD

recN

PPE 0226

glgX

1977

1823

1047 0481

1314

IS1081

pgsA2

0349

0480 0225

1695

0776 0628 0914

IS1557

1046

1976

0102 0775

0627 1313 0348

1975

tlyA

0479

0626

PE_PGRS 1045

glgY 0224

1312

secG 1974

1693

0347

0913

0774 secA2

deoC

pks4

atpC

1973

1044

0625

1692

0477

0624 1439

1972 0223

0476

atpD

0623

0912

ggtA 1691

0475

1043 glgZ

tpi

1971 echA1

aroP2

0622 0911

lprJ

ilvG

1561

atpG

0345

0474 purD 0910

1042

IS-LIKE

pgk

1560

0221 0909

0621 lprM

1041

lpqJ pks3

0771

0473 tyrS

ilvA atpA

gap nrp

0770

1819

0343 1969

lipC

1688 galK

1558

PE

ctpE 0472

0769

0219

atpH

galT

1435 mmpL6

1434

1687

0471 1968

galT'

1433

0342

PPE 1179

aldA atpF

0218 1686

1556

0617

atpE 1967 PE_PGRS

0907

1038 1685

frdD

umaA2

1684

atpB

1432

0615 0616

1037

frdC 1817 umaA1

lipW 0767

0341

1303

mce3

1036

1178 0100

0614 frdB

IS1560

0906

0216 1035

fadB2

0766

fdxC

1683

fadD10 1431

1965 rfe 0340

1034

1816 echA6 frdA

1176

0765

aceA

1033 1964

0613

fadE3

1301 0098 1815

PE 0097

0764

hemK

accD3

0339

fadH 1814 1032

fadD4 0466 1682

0612

plsB1 0763

kdpC

prfA PPE

0903 1963

0611 0762

moeX

1813

1174

1429

0465

rpmE

0610

0213

0902

adhB

1962

1680 0095 kdpB

fadD11 1812

1961 REP

0464 0901

0338

0760

1173

0463 1428 rho

0094 1960

0609

0900

nadR fadE16

mgtC fadD11' 0759

1959

0608

0462 ompA

phoR

1810

0093 1958

0607

kdpA 1957 1678

pckA

fadD12

thrB

0461 0606 1956

0898

aspC PE

PPE

dsbF PPE phoP

0460

1955

ctpA 0605

IS1536

1171

lipO

0459

1676 0897

1954 thrC 1953

0336

0756

0210

1170 1952 lpqO

PPE

0458

1675 thrV

1425

1951

gltA2 kdpD

0091

thrA 0603 PE

PE 1950

0209

dnaE1

1674

1949 tcrA

PPE

rmlA 1424

PPE

PPE 0090

lysA

1673 1948

0601 0895

kdpE

0333

1423 0457

0208

1947 PE

1167

0089 0600

0332

1805

0207

1546 1026 lppG 0599

1422

1672

0088

argS

0894

lpqW

PE_PGRS

1545 0598 1804

1025 0331 1671

1945

1544 REP

1421

1670

1024 hycE

0597

argV

echA2 1669

0893 1543

0330

1944

1291 0455 mmpL3

eno

0596

1165

PE_PGRS

0454 uvrC 1943

0329

mmsA

hycQ

0595

0892 1668

1942 0328 glbN

lpqU PPE

1941 1290 0594

lprI

PPE 0205

narI

1419

1667 hycP

fadE9

1540

1021 0327

ribA

narJ 1289

lprH

0891 hycD

1666

0326 PPE lprL

lspA

0452

1417 0204 mmsB 1939

0325

0750

1288

ribH narH

pks11

0203

0324

ephB

0749

ansA

0083

0592

mmpS4

0748 0890 1287 ribA2

mfd

0323

dinX

PPE

0082

1414

udgA

cysN

0591

0081 1937

PE_PGRS 1413

pks9 mmpL4

lppT 0080

narG mmpL11 citA

ribC dcd

0079 1019

0590 cysD

0320

ileS

0888

lprG 1936 0201 glnT

0449

1798

1284

0200 pcp

mce2

pks17

glmU

PE_PGRS

0199

0078 mutT2 1410

0448

0588

03183 oppB

echA13 1535 1797

glyU

0077

prsA

ribG

1159

0887 0587

ufaA1

0745

oppC

34

076 0198

T 0 lpQ2 dE17

5

0

00

a ad 6

g

qT gl

16 16

e e 15 1 f

86 pq B

lp

03 rp

058

fprB

0446

1796 0075

0744

1158

1533

pks8

rplY

0315

oppD

0743

fadE18 sigK

fmu

tpx

pth

1157 0197

0074

0885

PE_PGRS 0444

1532

0314

1795

0585

1531

0313 0741

0443 1931 fmt 1156

pks16

oppA

0073

IS1557'

0740

1930 0196 adh

serC

1155

1794

0312

1929

1405

PPE

1012 0739

0195 0072

1793 1154

1279 1404

0883

1011

1928

0584

1792

REP' fadD24

0882

0738 0441 omt

1927 PE

0071 1152 1403 0311 0881

ksgA

0737 groEL2

0880

1151

1926 PPE

papA4

1278

0310

0194

1009

priA

1150

0736 pks7 lpqN

0879

0582 0309

1149

fadD31

glyA2

0581 sigL IS-LIKE

0439

1008 0308

PPE PPE

1401

0580

map' REP

1148 1277

sdaA PE 0579

0307

1924

0877

adk

lipI 0306 moaA3

metS

PPE lipD

0068

psd

1276 secY 0193

1147 lipH

lprC pks5

1786 0876

0067 pks10

pssA

1922

1006

lprB 1398

0192 PPE

1146

1397

0731

0875 1785

argH

0730 0435 icd2

PE_PGRS

1273 lppF

0874 0191 1145

pabB

xylB

PE_PGRS

1920 1784 0065

0190

argG

0434

1395 0577

1144 fadE10

1004

1526

1272 argR 1919

0433

0728 ilvD

mcr

0576

1003

wbbl2 argF

0064

1394

sodC fucA 1271 1783

0188

0431

1524

argD PPE

lprA

0187

echA10 1002 PPE

0575 0430 0726

1393

1269 PE_PGRS 1782

argB

1523

cspB echA11 def

1268 arcA 0725

1140

0063 0574

metK

0870 bglS

argJ

0428

1000

sppA

1139

embR

0573

moaA2

xthA

1781

0999

celA argC

dfp

0185

1138

0303

moaD2

rplO

0426

0184

1390 0998

mmpL12

0061

rpmD

1137

0572 0302 1780

PPE

1136

0867 gmk

rpsE

pknH

0301 0997 0183 0060

moaE2

rplR

0300

mIHF 0571

alaV

rplF

0299 mog

1265 0059

1779

sigG

0298 rpsH fadD25 moaC2

PPE PE_PGRS

0996

ctpH 0863 rpsN

PPE

1264

0181

rplE nrdZ

PE_PGRS

aceAb

1778

1134 rimJ

rplX

dnaB

PE

1520 rplN

0180

1777

0862

amiB2

0569

aceAa pheT

moeA 1519 pyrF

0713

0424

0057

atsG

0568

1518

1776

lprO

metE 1262 1914

galU rplI

1775 1913

1261

0178 rpsR

thiC

1517 pheS

0712

0295 0861

ssb

0567 1260

carB

0177

0992

0294

1774

rpsF

fadB5

1132

tyrT

1648 0991

thiD

0176

1516

1259

0052

0566

0990 atsA

lppC

fadB

1647 0293 0421

1773 0175

1515

1910

1772 gltA1

0420 0051 0565 carA

grcC2 1258

0292

PE

furA

rpsQ

0174 1514

lpqM

1771

1382 0988 rpmC fadA

1513

1130

rplP

0291 1257

gpdA1

pyrC

katG

1645

rpsC ponA lprK

epiA 1770

htpX

lpqL

0858

tsnR rplV

0987 0290 1256

pyrB

1129 gmdA rpsS

0857 rplT

1907

0172 grcC1 1769

0049 0856 rpmI

rplB pyrR

1255

thiG

0289

infC

1906

REP rplW 0416

1254

0986 far

0288 1128

0048

1510

0561

aao

rplD

1378

0171

0415 0287

PE_PGRS

0854

0047

1904 rplC

0560 mscL

deaD

moaB2

PPE

1377 1509 rpsJ

1903 thiE

0170 1767

0046 ppdK 0559

ubiE

pdc

lysX mutT3

1766

1376

PE 0983 0699

lprE

1126 nanT

0045

mce1

0698 0557

fadD16 ISB9'

1508

0412 1125

1375

0044

cinA

0982

0556

0168

0851

1765

0697

0284

ephC

glnH

0043 1639 0850

1251

0167

0981

menD

1374

1764 IS1606'

1507 lipJ

0849 0042

1373

IS6110 1763

bpoB 0696

1506

pknG bpoC fadD5

gnd2

lppD cysM3

pks18 1505

1762

uvrA

0695 1898

1250

leuS

menC

PE_PGRS

0283

1504 1761 0979

lpqS

1897 ackA zwf

0165

1371

1503

lldD1

0164 1760

1896

0552

1502 1249

0163

0846

1120

1895 1637 0282 PE_PGRS

1370

pta

pqqE

0040

1636 1119 IS6110

1501

adhE 1369

1118

0692 0039

0845

1117 PE_PGRS

0038

fadD8

0281

lprF

1500

0407 0161

1635 0691

1116 1894

sucA

(wag22) (wag22) 1893

1499 narL 0550 1115

1892 PE_PGRS 0690 0037

PPE

0406

0549

1114

1891 0843 1634 1367

PE

1113 1498

1758

0036

1890 menB

0976

0689

1247

1366 1112

0842

1757 1889

lipL

0688

1246

IS6110

0547

PE uvrB

fadD34

1756

1245

pks6

1365

1888

0546

0687

0841 1111 1496 fadE13

lpqZ

0034

0158

plcD PE_PGRS

1495 1887

0033

lytB' 0840 pitA

0686

rsbU 1494 1632

accD2

0839

0544

pntB

1631 1754

1109

bioF2

tuf 0543

fbpB mutB lpqR

PE_PGRS

1242 pntAB fadD30

1363

menE

xseA

1241

1885 accA2

rpsA

pntAA

1362 PE_PGRS

0031

1884 xseB

mdh

fusA

0541

0030

0837

1883

1106

mutA PPE

mmpS1 PPE

0277

0540

fadE12

0029 1882 1105

0836

corA fadE2 0276 rpsG

0539

polA

lppE

1360

0028

echA7 rpsL

1104

0970 1491 0153 0027 1752

lpqQ sugC mmpL1

0681

1880 1103

0538

fadD27

0274

1359

1102

0026

1751 sugB

1628 PE

0680

0401 1490 1879

ctpV

0273 1101

0679

0025 sugA

1627 0678

1100

0272

0537 fadE7 0024

glnA3

0968

1489

1626

PE

fadD1

1358 PE_PGRS

lpqY

0967 mmpS5

0023 1488

1099

leuV

galE2 lpqK

1749 0966

1234

1487

1625 0022

fadE6

1748 1877

pnp

0398 0965 fum

0150

PE_PGRS

mmpL5

1233

1486

0964

0021 0149

0397 1624

leuT

menA

0396

bfrA

1357

hemZ 0963 1097

fadD2

1232

0148

PE_PGRS

echA5 0395

1747

appC 1096

fabH

pheT inhA 1875

0962

0020 0674

1231

0394

0961 1356 aspT

1874 0147

gluT fabG1

0960

0269

PE_PGRS

lysT

phoH2 echA4 cydB

0393

1873 1230

0019

0831

pknF

0268

1482

0146

moeY

0531

0830

desA2

fadE8

0959 ppp lldD2

mrp cydD

narU

1481

0829

0392 0145

0530 1745 1871

IS1605'

glyA

lpqX

0828

lpqP

1744 1870 1480

1354

0144

rodA

metZ

0827 0958 cydC

ccsB 1227

end 0826

1869 pknE moxR 0390

coaA

1353

1226

1619

0143

pbpA

0266

0528

0825 purH 1352 purT

1478

1742

1351 0669 1868

0142

1741

1225 tesB1

purN ccsA

fabG2

desA1

pknA

PE_PGRS

1740 1477

1224

PPE

0141

0526 0140

pykA

0823 0387

0955

fecB2

htrA

1739

0525 1867 1349 1476

0139

pknB

1616

0264

1090 1738

0138 0954 1222

hemL

1615

rpoC

pabA

PE

0263 0822 0386

sigE

0137 PE

acn

0012

narK2

0262 1348 0953 1866 0523

0136

lgt

gabP

0011 1220 sucD

phoY2

0010 phoT

narK3

narX

leuT

1219

trpA ppiA 0135 0385 1474 PE_PGRS

sucC 0521

0520

1865

ephF

0819

1347 1218

0008 0260

trpB

rpoB

1864

1473

fadE14

0519

alaT 0818 1735

0133

1086 ileT 0950 0259

1217 1863

1734

trpC

0518 clpB

0007

0258

0817

echA12

fadD33 adhA

0132

1733

1085

1610

0257

thiX trxB

1216

0666

uvrD 0517

1861 1732

trxA 1344

0665

0383

trpE

0664 1084 fadE1

gyrA

PPE 1343

cysA2 gabD1

1215

modD

0130 0516

ctpD

sseC2 1341

pks14

umpA

0948

1083

0813 PE

atsD 0515

bcpB

0381

rphA modC

0947

1082

pabC

cobQ

chaA fbpC2

glgC

0514

1339

0380

1730 sec

gyrB

modB

0128

1081

0513

hisI2

pgi cobU 0662

0378

nirD

0811

murI

modA

PE_PGRS 0661

greA hisF

hemB

0127 0377 1729 0660 1212

0810

0004 1337

metB 0945

impA 0659 1211

1856

purM

nirB

fadE15

1728

0944

cysM recF tagA 0658

pra cysG hisA

0376

1727 1466

1855

1209 1335 0657

0126

1465 0943 hisH

purF

1334 0656

dnaN

1208

1726

0375 0942

cysM2

hisB

hemC hsp

ndh

1464

1333 0655

0807

0374

0941 folP2 pepA 0250

hisC

ureD

1332

lipU

1725

hemA 1463

0249

1331 0940

dnaA 0654

0373 ureG

1724

fadD6

PE_PGRS

hisD

cpsY

1462

0508 ureF 1 150,001 300,001 450,001 600,001 750,001 900,001 1,050,001 1,200,001 1,350,001 1,500,001 1,650,001 1,800,001 1,950,001 2,100,001

Nature © Macmillan Publishers Ltd 1998 0 0 0 0 0 0 0 0 0 0 0 0 0 0

2,400,00 2,550,00 2,700,00 2,850,00 3,000,00 3,150,00 3,300,00 3,450,00 3,600,00 3,750,00 3,900,00 4,050,00 4,200,00 4,350,00

3753

2842

folP

3871

2277

2402 3752 pyrD 3083 3480

sigH 3751

dxs

serX folX

nusA

fadD22 3222

2276

folK

3750

2401

2840

lppL

virS 3605

3870 3749

3748

3221

2681

2275 3479

subI

3747 2137

3604

pks15

PPE 3081

3220

2273 2274

2680

infB

PE

2136

cysT

2272 whiB1

3869

3745

3744

echA15

2271 3603 2135

cysW

PPE lppN 3218

2134 rbfA fas

panC

PE

2269

hemE 3743

3868

cysA

2133 pknK

2837

3217

panD

3216 pks1

2132

PE_PGRS 2268

3600

3346

3867 hemY'

3742 kgtP

entC

dinF

3599

cysQ

3866

3741

2676

2267 lysS 3079 entD

3865

3475

3078

ugpA cysS2

2675 3740

2395

lsr2 IS6110

3474

2674

3213 3864

2266 lppX

ugpE

2129

atsF

PPE

2128

2944

bpoA

3212 2673

PE_PGRS 3472 PPE

ugpB

acpS

3863

ansP

2265 3076

clpC

2943 IS1533

ggtB

3471

2522

rhlE

3737 2672

ugpC 3075

3862 3861

bcp

ilvB2

echA16

2393

3074

3860

2520

2264 ribD

3736

3210 2830

PE_PGRS

PE_PGRS cysH 2125

PE_PGRS 3209 PE

mmpL7 2829

3735 3073

3594

mhpE

2263 2670

2828

3072

2669 lysU nirA

3208

rmlB3

lpqF 3071 2668 3734

2827 2262

lppS

3467

3070

clpX'

3592 fadD28

3733 REP 2261 3207 metH

2826

3466 3069

2390

2260 gltB

2666 3732 3591 2517

2389

rmlC

2825 PPE

IS1081'

2516

2665 adhE2

moeZ

pgmA

2664

rmlB

ligC hemN

2824 2663

3205

PPE 2515

3067

3204

alaU

2662

PE_PGRS

2387 3463

2661

2258

3066

lipV

3730 mutY

2514

2660 hisI

gltD

emrE infA

2513 2823

2257

2659 hisG 3588 rpmJ

3064 8

rpsM

3729

2256

mas

2658

3857

2120

IS1081

3587 rpsK

trpE2

3342 2512

2822

2657 2255

2119

rpsD 3856

3586 2656 2821 3202

cstA

2254

2253 lipK

metA

hisT

3855

2655 rpoA 2118

2820

2511

2252 2117

radA

rplQ

2654

lppK

metC mbtA

2819

3854

3728 2653

ligB

2251 truA

lpqE 2510

2652 menG

papA5

3454

hns

2818

2651

3201

2115

3583 2509 icd1

2250 3851

3453

drrC 3582

3338

3850

2650

2817

2114

3727 3581 fadE22

3452

3337

2508

3849

drrB glpD1 2816

mbtB

2113

3451

trpS 2649 3200

2507

cysS

3848 2248 drrA

2648

IS6110

3060

2506 3726

3335

3199

2647 3847

3579

3450

2112

3334 2815 accD6

IS6110

2646 sodA

2814

arsB2

3725 mbtC

2645

2111 3449

fadD35

3059

3333 valU

kasB

ppsE

prcB

3724

uvrD2 cysT

3577 3845

glyT

valT scoA

nagA

prcA

3448

3844

3723

3058

kasA

pknM 2813

scoB 2644 mbtD

serV

PPE

3057

acpM

sugI

3197 arsC

2812 PE

3575

accD1 fabD dinP 3722

IS1604

3574

2642

2811

3055 3330

2106

3196

3447 2641

IS1555'

2242 3843

2810

IS6110

2105

2809

3054

accA1 2640 dnaZX

2808

3195 3329

ppsD

2639

glpQ1

fadE34 2638 nrdH

bfrB

2104

3720

mbtE

2807 nrdI

2103 dedA

aceE

fadE19

3572 3840 2102

3446

sigJ

3194 2806

2499

3839

2636

2805

3719

3445 3571

nrdE

citE

3327 2635 2804

3444

rplM

pheA 2240 2803 IS1547

3718 helZ

rpsI 3193

3570

3717 3050

pdhA

3326

2239

3837 2802

ahpE

IS6110 3836 3325

3716

PE_PGRS

3569 mrsA

valV

pdhB 2801 3049

recR 2237

3192

3835

mbtF

3440

3568

moaC3

2633

2800

2100

3714

metU gphA pdhC

ppsC

2632

3567

3439

2494

nrdG cobD cobQ2

moaE3

2493

2235 PE

2799 serS

3321

2631 3047

IS1603 3438

2492 3191 3712

ptpA 3320

nhoA

3833 2798

3046

sdhB 2630 3437

2233 2491

adhC aspB mbtG

PE_PGRS

2232

2629

3832 2797

mbtH

3831

dnaQ

sdhA

3190

2097

fadE33

fecB glmS

2376

2375

2628

3830

lppV 3189

cobC

fadE32

leuA

sdhD 2096

3188

hrcA

sdhC

2795 3435

3829

2627

ctaD 2230

fadE31

2095

3187 cdd

ppsB 2794 3434

dnaJ2

2094

IS6110 3186 PE_PGRS 2229 3828

ask

truB

2626

deoA

IS1537

fadD3

2093

3433

2372

serB2

3185

2228

3827

PE

2625 2792

asd

add 3184 IS6110

2227 3041

IS1602

3183

2489

gadB

fadD23 2370 2624 2791

fadE30 helY 3182

3040

rnpB 3707

2623

2369

3559

IS1552 3181

3312

2226

3706

3431 ltp1

echA17

3180 phoH

2622

2488

PPE

3311 2091 3038

3705 2367

ppsA 3179 IS1540

panB

2621 3430

2090

fadE21

3310

3037

2366

2620

PPE

sirR

gshA

3557

2619

2618

2365

upp 3178

3036

pepE

pks2

bex

3703

2224 fadA6

2787

3035 3177 2617

pmmB

2616

PE_PGRS

pknJ

3428

amiA2 3555 3702

IS1532

echA14 lipS

fadD26

deoD 2223

3034 ribF argW 3701

2087

3427

fdxB

2086 3033 2362 3175

PE_PGRS

PPE

rpsO

IS1556 2929

lipQ 2085 3700 amiB

2361 lppU

glnA2

tesA

PPE 3174

3699 2084

3032 3553 2360

thrS

papA1 furB amiA

3424

2484 3173

2358

3304 2927 2083

gpsI

3698

3552

3172

alr

3031 2613 glnE 2926

3551

glyS hpx rnc

pgsA lpdA 2483

3697

2082 3422

echA20 3030

pepR

mmpL8

2611

3170 3421

fpg

glpK rimI

3549

PPE

glnA1 fixA

2781

2610

glpD2

2923 2081

gcp

3169 plsB2

3695

3548 lppJ

fixB

ald 3822

2609 3547

phoY1

groES

3168

2219

2481 2355

3694

2079

3821

2779 3027 3300

fadA5

smc

PPE IS6110

2354

2778 lipA

groEL1

2480 3167

3026

3693

9

078

IS6110 hiB3 2

I

2 2

A

A2 w

wh

47

47 77 pB

77 pB PE PE 6

2

li P

27

pap 3166

3545 pdxH

atsB 3025

3819 2478

2216 moxR2

3165 3415

2776 2077

2606

PPE ftsY

3818

2775 sigD fadE28 3024

3691 2477

2076 moxR3

sucB

tesB2 3413 2774 lpqC

3690 fadE29 3163

3817 IS1081

2075 3412

2604 3023

nei amt dapB

3542

3162 2074

2772

plcA

3689

2603 3816

glnB

3541

2771

2602

ephD

PPE 2073 3161

guaB2

3815

ltp2 3688

PPE PPE

2476

3160

3814

plcB

3687 cobL

glnD PE

speE pepB

lhr

guaB3

PE

3686

3813

PE

3019

2600 cobM

PPE

3685 2599

2212 plcC

cobK

choD ufaA2

PPE

PE_PGRS PPE

2917

2598 sigC

2475 3408

2597

proV 2348 nuoN

3017

3295 3407

2474 gcvT

2767

3537

blaC

2596

2347

lpqA

2473 3684

3406

2595 csp

2346

fabG5 ffh

2472 3294

ilvE

2067

2765

nuoM

3015

3683

ruvC

2345 3405

3536

2471

aldB

ruvA

2209 pirG

3404

cobI 2915

thyA 3535

ruvB

ligA glbO

nuoL dfrA

ponA'

dgt

3292

glf cobH cobS 3534 3403

2762

2469

pknI

PE_PGRS

3013

nuoK

cobG

cobT

2761 3291

2468

nuoJ

PPE

gatC

2760

dnaG

lat 2206

3808 3402

whiB4

2063 nuoI 2759

2913

gatA

2342

3680 2758

pepD

3289

PPE

2205 2757 nuoH 3807

lppQ

fadD9

3288

2912

3679

3401

3806

asnT pfkA

hsdM rsbW 2204

dacB

cobN

2203

sigF 2466

PE

3531

3678

hsdS'

nuoG rpi

2910

gatB

3805 3400

3677

2754

accA3 cbhK 3530 rpsP

2464 gabT

3008

3676 2908

2060 2061 3399

dapA

rimM

3529 lipP

3675

mmpL9

nuoF

3284

asnB

2059

3007 fbpA

2588

trmD

glyV

nth

sseA idsA

lppZ

2752 proU

lppW nuoE

rpmB2 fbpC1 3528 tig

3673 secD 3282

rpmG

phyA

2751

nuoD ctaC

3281 rpsN2 rplS 3005

3527

3672

3802

3004 rpsR2 2054

clpP 2199

2750 lepB

moeW

secF

3526 guaA nuoC

accD5

3671 2749

2053

clpP2

nuoB

rnhB

2337

mmpS3

ilvB

nuoA

3525

ephE 4,411,529

2901

2585

2459 2197 fadD32

2052

birA

2336

3524 3395

3669

ftsK

mH

ilvN 3278 rp

PPE apt qcrB

fdhF

3277 rnpA

2458

3523

cysE

3394

ilvC 3668

3922 3143

2747

3921

cysK

2051

3142

qcrA

purK

3522 3000 iunH

relA

acs

fdhD

clpX

fadB4

pgsA3 3920

pks13 2745

purE

lppY

2050

qcrC 3521

2898

gid

cmaA1

fadE23

2456

2333 ppiB

ctaE 2049

2998

fadE25 2897 35kd_ag

arB

p

3520 2743 acrA1

2581

fadE24

trpD

dppA

3519

arA

2997

3273

mez

p 2742

2896

2455

hisS lpqD

2191

accD4 pflA

3518 PE_PGRS dppB

viuB 3916

2331

3389

linB

serA 3272 2454

3517

cwlM

3137 dppC lppP

3798 xerC

2740

IS1557

2453

trxC echA19

2893

2578 3271 leuB PE_PGRS

2452 dppD

2190 2451

2739

PPE

narK1

trxB2

fadE35 2994

2450 2577 2738

ctpC

PPE

3662 2189

PE

3912 PPE

fadD19 3387

2449

2891

sigM

atsH 3386

2993 2188

3269 pks12

IS1560

2576 2327

3134

3661

recA

2575 3268

3385

gltS 3133

rpsB

3384 fadD15 2574

tsf valS 2326

3267 recX

glnU

2573 3660 3910

3132

2991 embB

gluU PE_PGRS

2735

idsB

2186

amiC trbB

2734 2325

rmlD

2990 2185 3131

lytB

folC

2887 aspS

2324 2989

3658

2184

wbbL

3381

2886 2446 IS6110

3657

2323

2733

3130

2183 3909

3380 IS1539

2571 3656

leuC

ndkA rmlA2

fadD18

embA

3129

3655 2182

2570 rocD1 2885

2732

3654

PE_PGRS

leuD

rocD2

3263

2181

3379

3908

2884 2569 PE_PGRS 2731

rne 3128

PE_PGRS

rocE hupB

2047

2730

2568

3127

mutT1

3378

2180 pyrH

3651

3262 cnA

2319 p

frr

2179

2729 3126

lppI

embC

dctA

3377 PE

uspC

3906 2567

3261

ppk

2728 PPE

cdsA

aroG

PE_PGRS 3905

3376

lipT

3904

uspE

2880 rplU miaA

3124 3649

3259

whiB2

IS1558

2879

2983 rpmA 2177

amiD

uspA 2044

dapF

3903

3792

3123 3258 obg

mpt53

3510 pncA

echA18' pknL

3122

merT

cspA

2566

2042 gpdA2 echA18 hflX

2876

2315

3647

pmmA

3791 ilvX

2175 proB 3902

mpt70

ddlA

3121

2041 otsB2

2174

3256

3790

2314 3901

2980 fadE20

3120

2874 2040

2438

3900

topA 3371

3789

2723

2565

moaE

idsA2

manA 2313 2979

3788 2039 sseC

IS1538 2722

2312

2437

3899

cysA3

3254

2978 mpt83

2311

2038

2172 3787

PE_PGRS glnQ

2872

rbsK

3645 3898 2310

moeB 2721

lppM

2871

thiL

3897 2037

dnaE2

2563 3786

3253

2170

2036

ung

2870

3896

lexA

2562 3115

2035 3644 2309 3785

IS1081

2561 2169

2034 2435

2975

metV 3643 thrU

3252 3369

3114 2719

2168 3895 2869

IS6110 2308 epiB

2560

3113

rubA 3642 2033 2974 3368 2718 2167

fic

rubB

moaD 2717

3783

gcpE

2434 2032 3249

2716

PE_PGRS 2166 moaC

IS1553

2559

moaB

3640

2433 rfbE

2867

PE_PGRS sahH 2165 2715 recG

hspX

2432

2558 moaA

3894 2866

2307

PE

spoU 3639 3781

tmk

2865

2714

2164

2557

PPE

3638

2306 3108 3780 2030

2972

ahpD

mtrA 3637

2556

2971 ahpC 2305

IS1534

2713 2864

3779 3365 3107

3636 PE

pbpB

pfkB

fadD17

mtrB

2863

lipN

alaS 2304 PPE

2028

fprA

2712 proA

3635 3364

2303

fadE27

2862

3778 3891

3363

2302 ideR 2969

lpqB

3362 2426

3890

2027 map 2301

2554

PE_PGRS

3777 2968 prfB

fadE26

3889 3361

sigB

3360 rmlB2

2161

3243

2553

2425

2300 serU

3633

2026 3104

2709

3888 glnA4 3359

fdxD 3776

3242 2160

2708 3632

3103

aroE

3502

3358 IS1558 pca

2424 2707

2025

2859

2159

3631 ftsE

3357 htpG

2551 3241 3887

lipE 3501

2423

ftsX 2706 folD 2550

3630 aldC

2704

2298

murE

2705

2549 3355

3500

3354

2548 2024 echA21

2422

smpB 3886

2966

2547 secA

2857

3353

2297 3099 sigA

2546

2421 3773 kdtB mce4

3629

3352

ssr

2296 murF 2545

2420 hisC2

2023 ppa

3351

3885 lppB

3498 ppgK 2419 nicT 3098

2022

2295

purU

serT

lppA

2021

murX

2418

argU

PE 3627

2020 3497

2019

3771

suhB

2294

gorA

2963

3239

2542

3884 2417

2018

2700

murD

3770

3626

3096

3496

3769

2854

2541 2699

2416 2017

2293 2698

3768 2962

3095

mesJ

ftsW

3883 2292

dut

3238 lprN

2016

aroF sseB

hpt 2961

3767

PE_PGRS

3237

2415 3094

2696

lpqG 2960 lppO

aroK

3766

3882

murG

kefB 3494

2695 2015 2959

cdh 3093 PE

2414 aroB

3235

2288

2694 PPE

2014

3765

2958

aroD 2852 3493 PPE

murC 3092 3881

2536 2013 2693

IS1607 2413

3234

yjcE 3492

trkB

3620 rpsT

3491

2851 3764 3880

ftsQ

3091

2957 3233

3619

trkA

2012 pepQ

lpqH 3618

pvdS

2956

2286 otsA

efp

2411 2850

2011 ftsZ

3090 2010

3879 3231

3489

ephA

nusB

2009 2285

2690

2955

yfiH 3488

3762 2532

2410

cobA 3230

3878

2008 2148

2954 lipM

2409

cobB fadD13

lipF

PE

adi

2147 2689

2283

fdxA 3486

desA3

2953 fadE36

3877

3760

2146 3616

3228 2282

2407

3088

2688

3485 cysG2 2952

3615

IS1561' proX

wag31

3349

2530

3614

2687

2144

aroA 2406 pitB 3348 3876 otsB

cpsA

3613

3087

proV efpA

2529 2405

2143 2951 IS1608 2686

3611 3612

esat6

3226

proW arsB

3874

adhD

3483 2280

mrr

2142

leuU lepA proZ

proS

2527

PPE ftsH

3225 3482

3755

3085

arsA 2005 2526

fadD29

2844

PE tyrA 2279

lppR dapE2 3481

lipR 2843

2683 3224

IS6110 2278

folE

2949 2004 2525 2140 2,250,001 2,400,001 2,550,001 2,700,001 2,850,001 3,000,001 3,150,001 3,300,001 3,450,001 3,600,001 3,750,001 3,900,001 4,050,001 4,200,001 4,350,001

Nature © Macmillan Publishers Ltd 1998 family) Rv3160c - putative transcriptional regulator truncated Rv0823c - transcriptional regulator Rv3167c - putative transcriptional regulator Rv0018c ppp putative phosphoprotein phos- (NifR3/Smm1 family) Rv3173c - transcriptional regulator phatase Rv0827c - transcriptional regulator (ArsR (TetR/AcrR family) Rv2234 ptpA low molecular weight protein-tyro- family) Rv3183 - putative transcriptional regulator sine-phosphatase Rv0890c - transcriptional regulator Rv3208 - transcriptional regulator Rv0153c - putative protein--phos- (LuxR/UhpA family) (TetR/AcrR family) phatase Rv0891c - putative transcriptional regulator Rv3249c - transcriptional regulator Rv0894 - putative transcriptional regulator (TetR/AcrR family) II. Macromolecule metabolism Rv1019 - transcriptional regulator Rv3291c - transcriptional regulator A. Synthesis and modification of macromolecules (TetR/AcrR family) (Lrp/AsnC family) 1. synthesis and modification Rv1049 - transcriptional regulator (MarR Rv3295 - transcriptional regulator Rv3420c rimI ribosomal protein S18 acetyl family) (TetR/AcrR family) transferase Rv1129c - transcriptional regulator Rv3334 - transcriptional regulator (MerR Rv0995 rimJ acetylation of 30S S5 subunit 8 (PbsX/Xre family) family) Rv0641 rplA 50S ribosomal protein L1 Rv1151c - putative transcriptional regulator Rv3405c - putative transcriptional regulator Rv0704 rplB 50S ribosomal protein L2 Rv1152 - transcriptional regulator (GntR Rv3522 - putative transcriptional regulator Rv0701 rplC 50S ribosomal protein L3 family) Rv3557c - transcriptional regulator Rv0702 rplD 50S ribosomal protein L4 Rv1167c - putative transcriptional regulator (TetR/AcrR family) Rv0716 rplE 50S ribosomal protein L5 Rv1219c - putative transcriptional regulator Rv3574 - transcriptional regulator Rv0719 rplF 50S ribosomal protein L6 Rv1255c - transcriptional regulator (TetR/AcrR family) Rv0056 rplI 50S ribosomal protein L9 (TetR/AcrR family) Rv3575c - transcriptional regulator (LacI Rv0651 rplJ 50S ribosomal protein L10 Rv1332 - putative transcriptional regulator family) Rv0640 rplK 50S ribosomal protein L11 Rv1353c - transcriptional regulator Rv3583c - putative transcriptional regulator Rv0652 rplL 50S ribosomal protein L7/L12 (TetR/AcrR family) Rv3676 - transcriptional regulator (Crp/Fnr Rv3443c rplM 50S ribosomal protein L13 Rv1358 - transcriptional regulator family) Rv0714 rplN 50S ribosomal protein L14 (LuxR/UhpA family) Rv3678c - transcriptional regulator (LysR Rv0723 rplO 50S ribosomal protein L15 Rv1359 - putative transcriptional regulator family) Rv0708 rplP 50S ribosomal protein L16 Rv1395 - transcriptional regulator Rv3736 - transcriptional regulator Rv3456c rplQ 50S ribosomal protein L17 (AraC/XylS family) (AraC/XylS family) Rv0720 rplR 50S ribosomal protein L18 Rv1404 - transcriptional regulator (MarR Rv3744 - transcriptional regulator (ArsR Rv2904c rplS 50S ribosomal protein L19 family) family) Rv1643 rplT 50S ribosomal protein L20 Rv1423 - putative transcriptional regulator Rv3830c - transcriptional regulator Rv2442c rplU 50S ribosomal protein L21 Rv1460 - putative transcriptional regulator (TetR/AcrR family) Rv0706 rplV 50S ribosomal protein L22 Rv1474c - transcriptional regulator Rv3833 - transcriptional regulator Rv0703 rplW 50S ribosomal protein L23 (TetR/AcrR family) (AraC/XylS family) Rv0715 rplX 50S ribosomal protein L24 Rv1534 - transcriptional regulator Rv3840 - putative transcriptional regulator Rv1015c rplY 50S ribosomal protein L25 (TetR/AcrR family) Rv3855 - putative transcriptional regulator Rv2441c rpmA 50S ribosomal protein L27 Rv1556 - putative transcriptional regulator Rv0105c rpmB 50S ribosomal protein L28 Rv1674c - putative transcriptional regulator 2. Two component systems Rv2058c rpmB2 50S ribosomal protein L28 Rv1675c - putative transcriptional regulator Rv1028c kdpD sensor Rv0709 rpmC 50S ribosomal protein L29 Rv1719 - transcriptional regulator (IclR Rv1027c kdpE two-component response Rv0722 rpmD 50S ribosomal protein L30 family) regulator Rv1298 rpmE 50S ribosomal protein L31 Rv1773c - transcriptional regulator (IclR Rv3246c mtrA two-component response Rv2057c rpmG 50S ribosomal protein L33 family) regulator Rv3924c rpmH 50S ribosomal protein L34 Rv1776c - putative transcriptional regulator Rv3245c mtrB sensor histidine kinase Rv1642 rpmI 50S ribosomal protein L35 Rv1816 - putative transcriptional regulator Rv0844c narL two-component response Rv3461c rpmJ 50S ribosomal protein L36 Rv1846c - putative transcriptional regulator regulator Rv1630 rpsA 30S ribosomal protein S1 Rv1931c - transcriptional regulator Rv0757 phoP two-component response Rv2890c rpsB 30S ribosomal protein S2 (AraC/XylS family) regulator Rv0707 rpsC 30S ribosomal protein S3 Rv1956 - putative transcriptional regulator Rv0758 phoR sensor histidine kinase Rv3458c rpsD 30S ribosomal protein S4 Rv1963c - putative transcriptional regulator Rv0491 regX3 two-component response Rv0721 rpsE 30S ribosomal protein S5 Rv1985c - transcriptional regulator (LysR regulator Rv0053 rpsF 30S family) Rv0490 senX3 sensor histidine kinase Rv0683 rpsG 30S ribosomal protein S7 Rv1990c - putative transcriptional regulator Rv0602c tcrA two-component response Rv0718 rpsH 30S ribosomal protein S8 Rv1994c - transcriptional regulator (MerR regulator Rv3442c rpsI 30S ribosomal protein S9 family) Rv0260c - two-component response Rv0700 rpsJ 30S ribosomal protein S10 Rv2017 - putative transcriptional regulator regulator Rv3459c rpsK 30S ribosomal protein S11 (PbsX/Xre family) Rv0600c - sensor histidine kinase Rv0682 rpsL 30S ribosomal protein S12 Rv2021c - putative transcriptional regulator Rv0601c - sensor histidine kinase Rv3460c rpsM 30S ribosomal protein S13 Rv2034 - transcriptional regulator (ArsR Rv0818 - two-component response Rv0717 rpsN 30S ribosomal protein S14 family) regulator Rv2056c rpsN2 30S ribosomal protein S14 Rv2175c - putative transcriptional regulator Rv0845 - sensor histidine kinase Rv2785c rpsO 30S ribosomal protein S15 Rv2250c - putative transcriptional regulator Rv0902c - sensor histidine kinase Rv2909c rpsP 30S ribosomal protein S16 Rv2258c - putative transcriptional regulator Rv0903c - two-component response Rv0710 rpsQ 30S ribosomal protein S17 Rv2282c - transcriptional regulator (LysR regulator Rv0055 rpsR 30S ribosomal protein S18 family) Rv0981 - two-component response Rv2055c rpsR2 30S ribosomal protein S18 Rv2308 - putative transcriptional regulator regulator Rv0705 rpsS 30S ribosomal protein S19 Rv2324 - transcriptional regulator Rv0982 - sensor histidine kinase Rv2412 rpsT 30S ribosomal protein S20 (Lrp/AsnC family) Rv1032c - sensor histidine kinase Rv3241c - member of S30AE ribosomal Rv2358 - transcriptional regulator (ArsR Rv1033c - two-component response protein family family) regulator Rv2488c - transcriptional regulator Rv1626 - two-component response 2. modification and maturation (LuxR/UhpA family) regulator Rv1010 ksgA 16S rRNA dimethyltransferase Rv2506 - transcriptional regulator Rv2027c - sensor histidine kinase Rv2838c rbfA ribosome-binding factor A (TetR/AcrR family) Rv2884 - two-component response Rv2907c rimM 16S rRNA processing protein Rv2621c - putative transcriptional regulator regulator Rv2640c - transcriptional regulator (ArsR Rv3132c - sensor histidine kinase 3. Aminoacyl tRNA synthases and their modification family) Rv3133c - two-component response Rv2555c alaS alanyl-tRNA synthase Rv2642 - transcriptional regulator (ArsR regulator Rv1292 argS arginyl-tRNA synthase family) Rv3143 - putative sensory transduction Rv2572c aspS aspartyl-tRNA synthase Rv2669 - putative transcriptional regulator protein Rv3580c cysS cysteinyl-tRNA synthase Rv2745c - putative transcriptional regulator Rv3220c - sensor histidine kinase Rv2130c cysS2 cysteinyl-tRNA synthase Rv2779c - transcriptional regulator Rv3764c - sensor histidine kinase Rv1406 fmt methionyl-tRNA formyltransferase (Lrp/AsnC family) Rv3765c - two-component response Rv3011c gatA glu-tRNA-gln amidotransferase, Rv2887 - transcriptional regulator (MarR regulator subunit B family) Rv3009c gatB glu-tRNA-gln amidotransferase, Rv2912c - transcriptional regulator 3. Serine-threonine protein kinases and phosphoprotein subunit A (TetR/AcrR family) Rv3012c gatC glu-tRNA-gln amidotransferase, Rv2989 - transcriptional regulator (IclR Rv0015c pknA serine-threonine subunit C family) Rv0014c pknB serine-threonine protein kinase Rv2992c gltS glutamyl-tRNA synthase Rv3050c - putative transcriptional regulator Rv0931c pknD serine-threonine protein kinase Rv2357c glyS glycyl-tRNA synthase Rv3055 - putative transcriptional regulator Rv1743 pknE serine-threonine protein kinase Rv2580c hisS histidyl-tRNA synthase Rv3058c - putative transcriptional regulator Rv1746 pknF serine-threonine protein kinase Rv1536 ileS isoleucyl-tRNA synthase Rv3060c - transcriptional regulator (GntR Rv0410c pknG serine-threonine protein kinase Rv0041 leuS leucyl-tRNA synthase family) Rv1266c pknH serine-threonine protein kinase Rv3598c lysS lysyl-tRNA synthase Rv3066 - putative transcriptional regulator Rv2914c pknI serine-threonine protein kinase Rv1640c lysX C-term lysyl-tRNA synthase Rv3095 - putative transcriptional regulator Rv2088 pknJ serine-threonine protein kinase Rv1007c metS methionyl-tRNA synthase Rv3124 - transcriptional regulator Rv3080c pknK serine-threonine protein kinase Rv1649 pheS phenylalanyl-tRNA synthase α (AfsR/DndI/RedD family) Rv2176 pknL serine-threonine protein kinase, subunit

Nature © Macmillan Publishers Ltd 1998 Rv1650 pheT phenylalanyl-tRNA synthase β Rv2090 - partially similar to DNA poly- 2. DNA subunit merase I Rv0670 end IV (apurinase) Rv2845c proS prolyl-tRNA synthase Rv2191 - similar to both PolC and UvrC Rv1108c xseA VII large subunit Rv3834c serS seryl-tRNA synthase proteins Rv1107c xseB exonuclease VII small subunit Rv2614c thrS threonyl-tRNA synthase Rv2464c - probable DNA , Rv2906c trmD tRNA (guanine-N1)-methyltrans- endonuclease VIII 3. Proteins, peptides and glycopeptides ferase Rv3201c - probable ATP-dependent DNA Rv3305c amiA probable aminohydrolase Rv3336c trpS tryptophanyl tRNA synthase Rv3306c amiB probable aminohydrolase Rv1689 tyrS tyrosyl-tRNA synthase Rv3202c - similar to UvrD proteins Rv3596c clpC ATP-dependent Clp Rv2448c valS valyl-tRNA synthase Rv3263 - probable DNA methylase Rv2461c clpP ATP-dependent Clp protease pro- Rv3644c - similar in N-term to DNA poly- teolytic subunit 4. Nucleoproteins merase III Rv2460c clpP2 ATP-dependent Clp protease pro- Rv1407 fmu similar to Fmu protein teolytic subunit Rv3852 hns HU-histone protein 6. Protein translation and modification Rv2457c clpX ATP-dependent Clp protease 8 Rv2986c hupB DNA-binding protein II Rv0429c def polypeptide deformylase ATP-binding subunit ClpX Rv1388 mIHF integration host factor Rv2534c efp P Rv2667 clpX' similar to ClpC from M. leprae but Rv2882c frr ribosome recycling factor shorter 5. DNA replication, repair, recombination and restric- Rv0684 fusA elongation factor G Rv3419c gcp glycoprotease tion/modification Rv0120c fusA2 elongation factor G Rv2725c hflX GTP-binding protein Rv1317c alkA DNA-3-methyladenine glycosi- Rv1080c greA elongation factor G Rv1223 htrA dase II Rv3462c infA IF-1 Rv2861c map methionine Rv2836c dinF DNA-damage-inducible protein F Rv2839c infB initiation factor IF-2 Rv0734 map' probable methionine aminopepti- Rv1329c dinG probable ATP-dependent helicase Rv1641 infC initiation factor IF-3 dase Rv3056 dinP DNA-damage-inducible protein Rv0009 ppiA peptidyl-prolyl cis-trans isomerase Rv0319 pcp pyrrolidone-carboxylate peptidase Rv1537 dinX probable DNA-damage-inducible Rv2582 ppiB peptidyl-prolyl cis-trans isomerase Rv0125 pepA probable serine protease protein Rv1299 prfA peptide chain 1 Rv2213 pepB aminopeptidase A/I Rv0001 dnaA chromosomal replication initiator Rv3105c prfB peptide chain release factor 2 Rv0800 pepC aminopeptidase I protein Rv2889c tsf elongation factor EF-Ts Rv2467 pepD probable aminopeptidase Rv0058 dnaB DNA helicase (contains intein) Rv0685 tuf elongation factor EF-Tu Rv2089c pepE cytoplasmic peptidase Rv1547 dnaE1 DNA polymerase III, α subunit Rv2535c pepQ cytoplasmic peptidase Rv3370c dnaE2 DNA polymerase III α chain 7. RNA synthesis, RNA modification and DNA Rv2782c pepR protease/peptidase, M16 family Rv2343c dnaG DNA transcription (insulinase) Rv0002 dnaN DNA polymerase III, β subunit Rv1253 deaD ATP-dependent DNA/RNA Rv2109c prcA proteasome α-type subunit 1 Rv3711c dnaQ DNA polymerase III ȏ chain helicase Rv2110c prcB proteasome β-type subunit 2 Rv3721c dnaZX DNA polymerase III, γ (dnaZ) and Rv2783c gpsI pppGpp synthase and polyribo- Rv0782 ptrBa protease II, α subunit τ (dnaX) nucleotide phosphorylase Rv0781 ptrBb protease II, β subunit Rv2924c fpg formamidopyrimidine-DNA glyco- Rv2841c nusA transcription termination factor Rv0724 sppA protease IV, signal peptide pepti- sylase Rv2533c nusB N-utilization substance protein B dase Rv0006 gyrA DNA gyrase subunit A Rv0639 nusG transcription antitermination Rv0198c - probable zinc metalloprotease Rv0005 gyrB DNA gyrase subunit B protein Rv0457c - probable peptidase Rv2092c helY probable helicase, Ski2 subfamily Rv3907c pcnA polynucleotide polymerase Rv0840c - probable iminopeptidase Rv2101 helZ probable helicase, Snf2/Rad54 Rv3232c pvdS alternative sigma factor for Rv0983 - probable serine protease family siderophore production Rv1977 - probable zinc metallopeptidase Rv2756c hsdM type I restriction/modification sys- Rv3211 rhlE probable ATP-dependent Rv3668c - probable alkaline serine protease tem DNA methylase RNA helicase Rv3671c - probable serine protease Rv2755c hsdS' type I restriction/modification sys- Rv1297 rho transcription termination Rv3883c - probable secreted protease tem specificity determinant factor rho Rv3886c - protease Rv3296 lhr ATP-dependent helicase Rv3457c rpoA α subunit of RNA polymerase Rv3014c ligA DNA ligase Rv0667 rpoB β subunit of RNA polymerase 4. Polysaccharides, lipopolysaccharides and phospho- Rv3062 ligB DNA ligase Rv0668 rpoC β' subunit of RNA polymerase lipids Rv3731 ligC probable DNA ligase Rv1364c rsbU SigB regulation protein Rv0062 celA /endoglucanase Rv1020 mfd transcription-repair coupling factor Rv3287c rsbW anti-sigma B factor Rv3915 cwlM hydrolase Rv2528c mrr restriction system protein Rv2703 sigA RNA polymerase sigma factor Rv0315 - probable β-1,3-glucanase Rv2985 mutT1 MutT homologue (aka MysA, RpoV) Rv1090 - probable inactivated Rv1160 mutT2 MutT homologue Rv2710 sigB RNA polymerase sigma factor cellulase/endoglucanase Rv0413 mutT3 MutT homologue (aka MysB) Rv1327c - probable glycosyl hydrolase, α- Rv3589 mutY probable DNA glycosylase Rv2069 sigC ECF subfamily sigma subunit family Rv3297 nei probable endonuclease VIII Rv3414c sigD ECF subfamily sigma subunit Rv1333 - probable hydrolase Rv3674c nth probable endonuclease III Rv1221 sigE ECF subfamily sigma subunit Rv3463 - probable Rv1316c ogt methylated-DNA-protein-cysteine Rv3286c sigF ECF subfamily sigma subunit Rv3717 - possible N-acetylmuramoyl-L-ala- methyltransferase Rv0182c sigG sigma-70 factors ECF subfamily nine Rv1629 polA DNA polymerase I Rv3223c sigH ECF subfamily sigma subunit Rv1402 priA putative primosomal protein n' Rv1189 sigI ECF family sigma factor 5. Esterases and lipases (replication factor Y) Rv3328c sigJ similar to SigI, ECF family Rv0220 lipC probable Rv3585 radA probable DNA repair RadA homo- Rv0445c sigK ECF-type sigma factor Rv1923 lipD probable esterase logue Rv0735 sigL sigma-70 factors ECF subfamily Rv3775 lipE probable hydrolase Rv2737c recA recombinase (contains intein) Rv3911 sigM probable sigma factor, similar to Rv3487c lipF probable esterase Rv0630c recB V SigE Rv0646c lipG probable hydrolase Rv0631c recC exodeoxyribonuclease V Rv3366 spoU probable rRNA methylase Rv1399c lipH probable lipase Rv0629c recD exodeoxyribonuclease V Rv3455c truA probable pseudouridylate syn- Rv1400c lipI probable lipase Rv0003 recF DNA replication and SOS induc- thase Rv1900c lipJ probable esterase tion Rv2793c truB tRNA pseudouridine 55 synthase Rv2385 lipK probable acetyl-hydrolase Rv2973c recG ATP-dependent DNA helicase Rv1644 tsnR putative 23S rRNA methyltrans- Rv1497 lipL esterase Rv1696 recN recombination and DNA repair ferase Rv2284 lipM probable esterase Rv3715c recR RecBC-Independent process of Rv3649 - ATP-dependent DNA/RNA heli- Rv2970c lipN probable lipase/esterase DNA repair case Rv1426c lipO probable esterase Rv2736c recX regulatory protein for RecA Rv2463 lipP probable esterase Rv2593c ruvA Holliday junction binding protein, 8. Polysaccharides (cytoplasmic) Rv2485c lipQ probable carboxlyesterase DNA helicase Rv1326c glgB 1,4-α-glucan branching enzyme Rv3084 lipR probable acetyl-hydrolase Rv2592c ruvB Holliday junction binding protein Rv1328 glgP probable glycogen phosphory- Rv3176c lipS probable esterase/lipase Rv2594c ruvC Holliday junction resolvase, endo- lase Rv2045c lipT probable Rv1564c glgX probable glycogen debranching Rv1076 lipU probable esterase Rv0054 ssb single strand binding protein enzyme Rv3203 lipV probable lipase Rv1210 tagA DNA-3-methyladenine glycosi- Rv1563c glgY putative α-amylase Rv0217c lipW probable esterase dase I Rv1562c glgZ maltooligosyltrehalose trehalohy- Rv2351c plcA C precursor Rv3646c topA DNA topoisomerase drolase Rv2350c plcB precursor Rv2976c ung uracil-DNA glycosylase Rv0126 - probable glycosyl hydrolase Rv2349c plcC phospholipase C precursor Rv1638 uvrA excinuclease ABC subunit A Rv1781c - probable 4-α-glucanotransferase Rv1755c plcD partial CDS for phospholipase C Rv1633 uvrB excinuclease ABC subunit B Rv2471 - probable α-glucosidase Rv1104 - probable esterase pseudogene Rv1420 uvrC excinuclease ABC subunit C Rv1105 - probable esterase pseudogene Rv0949 uvrD DNA-dependent ATPase I and B. Degradation of macromolecules helicase II 1. RNA 6. Aromatic hydrocarbons Rv3198c uvrD2 putative UvrD Rv1014c pth peptidyl-tRNA hydrolase Rv3469c mhpE probable 4-hydroxy-2-oxovalerate Rv0427c xthA exodeoxyribonuclease III Rv2925c rnc RNAse III aldolase Rv0071 - group II intron maturase Rv2444c rne similar at C-term to ribo- Rv0316 - probable muconolactone iso- Rv0861c - probable DNA helicase E merase Rv0944 - possible formamidopyrimidine- Rv2902c rnhB HII Rv0771 - probable 4-carboxymuconolac- DNA glycosylase Rv3923c rnpA protein compo- tone decarboxylase Rv1688 - probable 3-methylpurine DNA nent Rv0939 - probable dehydrase glycosylase Rv1340 rphA ribonuclease PH Rv1723 - 6-aminohexanoate-dimer hydro-

Nature © Macmillan Publishers Ltd 1998 lase Rv1367c - probable penicillin binding protein Rv1030 kdpB potassium-transporting ATPase B Rv2715 - 2-hydroxymuconic semialdehyde Rv1730c - probable penicillin binding protein chain hydrolase Rv1922 - probable penicillin binding protein Rv1031 kdpC potassium-transporting ATPase C Rv3530c - probable cis-diol dehydrogenase Rv2864c - probable penicillin binding protein chain Rv3534c - 4-hydroxy-2-oxovalerate aldolase Rv3330 - probable penicillin binding protein Rv3236c kefB probable glutathione-regulated Rv3536c - aromatic hydrocarbon degrada- Rv3627c - probable penicillin binding protein potassium-efflux protein tion Rv2877c merT possible mercury resistance 4. Conserved membrane proteins transport system C. Cell envelope Rv0402c mmpL1 conserved large membrane Rv1811 mgtC probable transport 1. Lipoproteins (lppA-lpr0) 65 protein ATPase protein C Rv0507 mmpL2 conserved large membrane Rv0362 mgtE putative magnesium ion 2. Surface polysaccharides, lipopolysaccharides, pro- protein transporter teins and antigens Rv0206c mmpL3 conserved large membrane Rv2856 nicT probable transport protein Rv0806c cpsY probable UDP-glucose-4- protein Rv0924c nramp transmembrane protein belonging 8 epimerase Rv0450c mmpL4 conserved large membrane to Nramp family Rv3811 csp secreted protein protein Rv2691 trkA probable potassium uptake pro- Rv1677 dsbF highly similar to C-term Mpt53 Rv0676c mmpL5 conserved large membrane tein Rv3794 embA involved in arabinogalactan syn- protein Rv2692 trkB probable potassium uptake pro- thesis Rv1557 mmpL6 conserved large membrane tein Rv3795 embB involved in arabinogalactan syn- protein Rv2287 yjcE probable Na+/H+ exchanger thesis Rv2942 mmpL7 conserved large membrane Rv2723 - probable membrane protein, Rv3793 embC involved in arabinogalactan syn- protein tellurium resistance thesis Rv3823c mmpL8 conserved large membrane Rv3162c - probable membrane protein Rv3875 esat6 early secretory antigen target protein Rv3237c - possible potassium channel Rv0112 gca probable GDP-mannose dehy- Rv2339 mmpL9 conserved large membrane protein dratase protein Rv3743c - probable cation-transporting Rv0113 gmhA phosphoheptose isomerase Rv1183 mmpL10 conserved large membrane ATPase Rv2965c kdtB lipopolysaccharide core biosyn- protein thesis protein Rv0202c mmpL11 conserved large membrane 3. Carbohydrates, organic acids and alcohols Rv2878c mpt53 secreted protein Mpt53 protein Rv2443 dctA C4-dicarboxylate transport protein Rv1980c mpt64 secreted immunogenic protein Rv1522c mmpL12 conserved large membrane Rv3476c kgtP sugar transport protein Mpb64/Mpt64 protein Rv1902c nanT probable transporter Rv2875 mpt70 major secreted immunogenic pro- Rv0403c mmpS1 conserved small membrane Rv1236 sugA membrane protein probably tein Mpt70 precursor protein involved in sugar transport Rv2873 mpt83 surface lipoprotein Mpt83 Rv0506 mmpS2 conserved small membrane Rv1237 sugB sugar transport protein Rv0899 ompA member of OmpA family protein Rv1238 sugC ABC transporter component of Rv3810 pirG cell surface protein precursor (Erp Rv2198c mmpS3 conserved small membrane sugar uptake system protein) protein Rv3331 sugI probable sugar transport protein Rv3782 rfbE similar to rhamnosyl transferase Rv0451c mmpS4 conserved small membrane Rv2835c ugpA sn-glycerol-3-phosphate Rv1302 rfe undecaprenyl-phosphate α-N- protein permease acetylglucosaminyltransferase Rv0677c mmpS5 conserved small membrane Rv2833c ugpB sn-glycerol-3-phosphate-binding Rv2145c wag31 antigen 84 (aka wag31) protein periplasmic lipoprotein Rv0431 - tuberculin related peptide (AT103) Rv2832c ugpC sn-glycerol-3-phosphate transport Rv0954 - cell envelope antigen 5. Other membrane proteins 211 ATP-binding protein Rv1514c - involved in polysaccharide syn- Rv2834c ugpE sn-glycerol-3-phosphate transport thesis III. Cell processes system protein Rv1518 - involved in exopolysaccharide A. Transport/binding proteins Rv2316 uspA sugar transport protein synthesis 1. Amino acids Rv2318 uspC sugar transport protein Rv1758 - partial cutinase Rv2127 ansP L-asparagine permease Rv2317 uspE sugar transport protein Rv1910c - probable secreted protein Rv0346c aroP2 probable aromatic amino acid Rv1200 - probable sugar transporter Rv1919c - weak similarity to pollen antigens permease Rv2038c - probable ABC sugar transporter Rv1984c - probable secreted protein Rv0917 betP glycine betaine transport Rv2039c - probable sugar transporter Rv1987 - probable secreted protein Rv1704c cycA transport of D-alanine, D-serine Rv2040c - probable sugar transporter Rv2223c - probable exported protease and glycine Rv2041c - probable sugar transporter Rv2224c - probable exported protease Rv3666c dppA probable peptide transport system Rv2301 - probable cutinase permease 4. Anions Rv2345 - precursor of probable membrane Rv3665c dppB probable peptide transport system Rv2684 arsA probable arsenical pump protein permease Rv2685 arsB probable arsenical pump Rv2672 - putative exported protease Rv3664c dppC probable peptide transport system Rv3578 arsB2 probable arsenical pump Rv3019c - similar to Esat6 permease Rv2643 arsC probable arsenical pump Rv3036c - probable secreted protein Rv3663c dppD probable ABC-transporter Rv2397c cysA sulphate transport ATP-binding Rv3449 - probable precursor of serine pro- Rv0522 gabP probable 4-amino butyrate trans- protein tease porter Rv2399c cysT sulphate transport system perme- Rv3451 - probable cutinase Rv0411c glnH putative glutamine binding protein ase protein Rv3452 - probable cutinase precursor Rv2564 glnQ probable ATP-binding transport Rv2398c cysW sulphate transport system perme- Rv3724 - probable cutinase precursor protein ase protein Rv1280c oppA probable oligopeptide transport Rv1857 modA molybdate binding protein 3. Murein sacculus and peptidoglycan protein Rv1858 modB transport system permease, Rv2911 dacB penicillin binding protein Rv1283c oppB oligopeptide transport protein molybdate uptake Rv2981c ddlA D-alanine-D-alanine ligase A Rv1282c oppC oligopeptide transport system per- Rv1859 modC molybdate uptake ABC- Rv3809c glf UDP-galactopyranose mutase mease transporter Rv1018c glmU UDP-N-acetylglucosamine Rv1281c oppD probable peptide transport protein Rv1860 modD precursor of Apa (45/47 pyrophosphorylase Rv2320c rocE arginine/ornithine transporter kD secreted protein) Rv3382c lytB LytB protein homologue Rv3253c - probable cationic amino acid Rv2329c narK1 probable nitrite extrusion protein Rv1110 lytB' very similar to LytB transport Rv1737c narK2 nitrite extrusion protein Rv1315 murA UDP-N-acetylglucosamine-1-car- Rv3454 - possible proline permease Rv0261c narK3 nitrite extrusion protein1 boxyvinyltransferase Rv0267 narU similar to nitrite extrusion Rv0482 murB UDP-N-acetylenolpyruvoylglu- 2. Cations protein 2 cosamine reductase Rv2920c amt putative ammonium transporter Rv0934 phoS1 PstS component of phosphate Rv2152c murC UDP-N-acetyl-muramate-alanine Rv1607 chaA putative calcium/proton antiporter uptake ligase Rv1239c corA probable magnesium and cobalt Rv0928 phoS2 PstS component of phosphate Rv2155c murD UDP-N-acetylmuramoylalanine-D- transport protein uptake glutamate ligase Rv0092 ctpA cation-transporting ATPase Rv0820 phoT phosphate transport system ABC Rv2158c murE meso-diaminopimelate-adding Rv0103c ctpB cation transport ATPase transporter enzyme Rv3270 ctpC cation transport ATPase Rv3301c phoY1 phosphate transport system Rv2157c murF D-alanine:D-alanine-adding Rv1469 ctpD probable cadmium-transporting regulator enzyme ATPase Rv0821c phoY2 phosphate transport system Rv2153c murG transferase in peptidoglycan syn- Rv0908 ctpE probable cation transport ATPase regulator thesis Rv1997 ctpF probable cation transport ATPase Rv0545c pitA low-affinity inorganic phosphate Rv1338 murI glutamate racemase Rv1992c ctpG probable cation transport ATPase transporter Rv2156c murX phospho-N-acetylmuramoyl- Rv0425c ctpH C-terminal region putative cation- Rv2281 pitB phosphate permease petapeptide transferase transporting ATPase Rv0930 pstA1 PstA component of phosphate Rv3332 nagA N-acetylglucosamine-6-P- Rv0107c ctpI probable magnesium transport uptake deacetylase ATPase Rv0936 pstA2 PstA component of phosphate Rv0016c pbpA penicillin-binding protein Rv0969 ctpV cation transport ATPase uptake Rv2163c pbpB penicillin-binding protein 2 Rv3044 fecB putative FeIII-dicitrate transporter Rv0933 pstB ABC transport component of Rv0050 ponA penicillin-bonding protein Rv0265c fecB2 iron transport protein FeIII dici- phosphate uptake Rv3682 ponA' class A penicillin binding protein trate transporter Rv0935 pstC PstC component of phosphate Rv0017c rodA FtsW/RodA/SpovE family Rv1029 kdpA potassium-transporting ATPase A uptake Rv0907 - probable penicillin binding protein chain Rv0929 pstC2 membrane-bound component of

Nature © Macmillan Publishers Ltd 1998 phosphate transport system unit Rv3500c - part of mce4 operon Rv0932c pstS PstS component of phosphate Rv1821 secA2 SecA, preprotein sub- Rv3501c - part of mce4 operon uptake unit Rv3896c - putative p60 homologue Rv2400c subI sulphate binding precursor Rv2587c secD protein-export membrane protein Rv3922c - possible hemolysin Rv0143c - probable chloride channel Rv0638 secE SecE preprotein translocase Rv1707 - probable sulphate permease Rv2586c secF protein-export membrane protein B. IS elements, Repeated sequences, and Phage Rv1739c - possible sulphate transporter Rv1440 secG protein-export membrane protein 1. IS elements Rv3679 - possible anion transporter SecG IS6110 16 copies Rv3680 - probable anion transporter Rv0732 secY SecY subunit of preprotein translo- IS1081 6 copies case Others 37 copies 5. Fatty acid transport Rv2462c tig chaperone protein, similar to Rv2790c ltp1 non-specific lipid transport protein trigger factor 2. REP13E12 family 7 copies Rv3540c ltp2 non-specific lipid transport protein Rv2813 - probable general secretion path- way protein 3. Phage-related functions 8 6. Efflux proteins Rv2894c xerC integrase/recombinase Rv2936 drrA similar daunorubicin resistance E. Adaptations and atypical conditions Rv1701 xerD integrase/recombinase ABC-transporter Rv1901 cinA competence damage protein Rv1054 - integrase-a Rv2937 drrB similar daunorubicin resistance Rv3648c cspA cold shock protein, transcriptional Rv1055 - integrase-b transmembrane protein regulator Rv1573 - phiRV1 phage related protein Rv2938 drrC similar daunorubicin resistance Rv0871 cspB probable cold shock protein Rv1574 - phiRV1 phage related protein transmembrane protein Rv3063 cstA starvation-induced stress Rv1575 - phiRV1 phage related protein Rv2846c efpA putative efflux protein response protein Rv1576c - phiRV1 phage related protein Rv3065 emrE resistance to ethidium bromide Rv3490 otsA probable α,α-trehalose-phosphate Rv1577c - phiRV1 possible prohead protease Rv0783c - multidrug resistance protein synthase Rv1578c - phiRV1 phage related protein Rv0849 - possible quinolone efflux pump Rv2006 otsB trehalose-6-phosphate phos- Rv1579c - phiRV1 phage related protein Rv1145 - probable drug transporter phatase Rv1580c - phiRV1 phage related protein Rv1146 - probable drug transporter Rv3372 otsB2 trehalose-6-phosphate phos- Rv1581c - phiRV1 phage related protein Rv1250 - probable drug efflux protein phatase Rv1582c - phiRV1 phage related protein Rv1258c - probable multidrug resistance Rv3758c proV osmoprotection ABC transporter Rv1583c - phiRV1 phage related protein pump Rv3757c proW transport system permease Rv1584c - phiRV1 phage related protein Rv1410c - probable drug efflux protein Rv3759c proX similar to osmoprotection proteins Rv1585c - phiRV1 phage related protein Rv1634 - probable drug efflux protein Rv3756c proZ transport system permease Rv1586c - phiRV1 integrase Rv1819c - probable multidrug resistance Rv1026 - probable pppGpp-5'phosphohydro- Rv2309c - integrase pump lase Rv2310 - excisionase Rv2136c - putative bacitracin resistance pro- Rv2646 - phiRV2 integrase tein F. Detoxification Rv2647 - phiRV2 phage related protein Rv2209 - probable drug efflux protein Rv2428 ahpC alkyl hydroperoxide reductase Rv2650c - phiRV2 phage related protein Rv2333c - probable tetracenomycin C resis- Rv2429 ahpD member of AhpC/TSA family Rv2651c - phiRV2 prohead protease tance protein Rv2238c ahpE member of AhpC/TSA family Rv2652c - phiRV2 phage related protein Rv2994 - probable fluoroquinolone efflux Rv2521 bcp bacterioferritin comigratory protein Rv2653c - phiRV2 phage related protein protein Rv1608c bcpB probable bacterioferritin comigra- Rv2654c - phiRV2 phage related protein Rv1877 - probable drug efflux protein tory protein Rv2655c - phiRV2 phage related protein Rv2459 - probable drug efflux protein Rv3473c bpoA probable non-heme bromoperoxi- Rv2656c - phiRV2 phage related protein dase Rv2657c - similar to gp36 of mycobacterio- B. Chaperones/Heat shock Rv1123c bpoB probable non-heme bromoperoxi- phage L5 Rv0384c clpB dase Rv2658c - phiRV2 phage related protein Rv0352 dnaJ acts with GrpE to stimulate DnaK Rv0554 bpoC probable non-heme bromoperoxi- Rv2659c - phiRV2 integrase ATPase dase Rv2830c - similar to phage P1 phd gene Rv2373c dnaJ2 DnaJ homologue Rv3617 ephA probable epoxide hydrolase Rv3750c - excisionase Rv0350 dnaK 70 kD heat shock protein, chromo- Rv1938 ephB probable epoxide hydrolase Rv3751 - putative integrase some replication Rv1124 ephC probable epoxide hydrolase Rv3417c groEL1 60 kD chaperonin 1 Rv2214c ephD probable epoxide hydrolase C. PE and PPE families Rv0440 groEL2 60 kD chaperonin 2 Rv3670 ephE probable epoxide hydrolase 1. PE family Rv3418c groES 10 kD chaperone Rv0134 ephF probable epoxide hydrolase PE subfamily 38 members Rv0351 grpE stimulates DnaK ATPase activity Rv3171c hpx probable non-heme haloperoxi- PE_PGRS subfamily 61 members Rv2374c hrcA heat-inducible transcription dase repressor Rv1908c katG catalase-peroxidase 2. PPE family 68 members Rv0251c hsp possible heat shock protein Rv3846 sodA Rv0353 hspR heat shock regulator Rv0432 sodC superoxide dismutase precursor - D. production and resistance Rv2031c hspX 14kD antigen, heat shock protein (Cu-Zn) Rv2068c blaC class A β-lactamase Hsp20 family Rv1932 tpx peroxidase Rv3290c lat lysine-ȏ aminotransferase Rv2299c htpG heat shock protein Hsp90 family Rv0634c - putative glyoxylase II Rv2043c pncA pyrazinamide resistance/sensitivity Rv0563 htpX probable (transmembrane) heat Rv2581c - putative glyoxylase II Rv0133 - possible puromycin N-acetyltrans- shock protein Rv3177 - probable non-heme haloperoxi- ferase Rv2701c suhB putative extragenic suppressor dase Rv0262c - aminoglycoside 2'-N-acetyltrans- protein ferase Rv3269 - probable heat shock protein IV. Other Rv0802c - acetyltransferase A. Virulence Rv1082 - similar to S. lincolnensis lmbE C. Cell division Rv0169 mce1 cell invasion protein Rv1170 - similar to S. lincolnensis lmbE Rv3641c fic possible cell division protein Rv0589 mce2 cell invasion protein Rv1347c - possible aminoglycoside 6'-N- Rv3102c ftsE membrane protein Rv1966 mce3 cell invasion protein acetyltransferase Rv3610c ftsH inner membrane protein, Rv3499c mce4 cell invasion protein Rv2036 - similar to lincomycin production chaperone Rv3100c smpB probable small protein b genes Rv2748c ftsK chromosome partitioning Rv1694 tlyA cytotoxin/hemolysin homologue Rv2303c - similar to S. griseus macrotetrolide Rv2151c ftsQ ingrowth of wall at septum Rv0024 - putative p60 homologue resistance protein Rv2154c ftsW membrane protein (shape determi- Rv0167 - part of mce1 operon Rv3225c - probable aminoglycoside 3'-phos- nation) Rv0168 - part of mce1 operon photransferases Rv3101c ftsX membrane protein Rv0170 - part of mce1 operon Rv3700c - probable acetyltransferase Rv2921c ftsY cell division protein FtsY Rv0171 - part of mce1 operon Rv3817 - probable aminoglycoside 3'-phos- Rv2150c ftsZ circumferential ring, GTPase Rv0172 - part of mce1 operon photransferase Rv3919c gid glucose inhibited division protein B Rv0174 - part of mce1 operon Rv3625c mesJ probable cell cycle protein Rv0587 - part of mce2 operon E. Bacteriocin-like proteins 3 Rv3917c parA chromosome partitioning; DNA - Rv0588 - part of mce2 operon binding Rv0590 - part of mce2 operon F. Cytochrome P450 enzymes 22 Rv3918c parB possibly involved in chromosome Rv0591 - part of mce2 operon partitioning Rv0592 - part of mce2 operon G. Coenzyme F420-dependent Rv2922c smc member of Smc1/Cut3/Cut14 Rv0594 - part of mce2 operon enzymes 3 family Rv1085c - possible hemolysin Rv0012 - possible cell division protein Rv1477 - putative exported p60 protein H. Miscellaneous transferases 61 Rv0435c - ATPase of AAA-family homologue Rv2115c - ATPase of AAA-family Rv1478 - putative exported p60 protein I. Miscellaneous phosphatases, , Rv3213c - possible role in chromosome seg- homologue and 18 regation Rv1566c - putative exported p60 protein Rv1708 - possible role in chromosome parti- homologue J. Cyclases 6 tioning Rv1964 - part of mce3 operon Rv1965 - part of mce3 operon K. Chelatases 2 D. Protein and peptide secretion Rv1967 - part of mce3 operon Rv2916c ffh signal recognition particle protein Rv1968 - part of mce3 operon V. Conserved hypotheticals 912 Rv2903c lepB signal peptidase I Rv1969 - part of mce3 operon Rv1614 lgt prolipoprotein diacylglyceryl trans- Rv1971 - part of mce3 operon VI. Unknowns 606 ferase Rv2190c - putative p60 homologue Rv1539 lspA lipoprotein signal peptidase Rv3494c - part of mce4 operon TOTAL 3924 Rv0379 sec probable transport protein Rv3496c - part of mce4 operon SecE/Sec61- γ family Rv3497c - part of mce4 operon Rv3240c secA SecA, preprotein translocase sub- Rv3498c - Naturepar ©t ofMacmillan mce4 operon Publishers Ltd 1998