US006593114B1
(12) United States Patent
Kunsch et al.
(10) Patent No.: US 6,593,114 B1
(45) Date of Patent: Jul. 15, 2003

(54) STAPHYLOCOCCUS AUREUS POLYNUCLEOTIDES AND SEQUENCES

(75) Inventors: Charles A. Kunsch, Norcross, GA (US); Gil H. Choi, Rockville, MD (US); Steven Barash, Rockville, MD (US); Patrick J. Dillon, Carlsbad, CA (US); Michael R. Fannon, Silver Spring, MD (US); Craig A. Rosen, Laytonsville, MD (US)

(73) Assignee: Human Genome Sciences, Inc., Rockville, MD (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 08/956,171

(22) Filed: Oct. 20, 1997

Related U.S. Application Data

(63) Continuation-in-part of application No. 08/781,986, filed on Jan. 3, 1997.

(60) Provisional application No. 60/009,861, filed on Jan. 5, 1996.

(51) Int. Cl.7 ...... C12N 15/64; C07H 21/04

(52) U.S. Cl. ...... 435/91.41; 435/91.4; 435/252.3; 435/254.11; 435/257.2; 435/320.1; 435/325; 536/23.7

(58) Field of Search ...... 536/23.7; 435/69.1, 69.7, 70.1, 71.1, 71.2, 320.1, 325, 252.3, 254.11, 257.2, 91.4, 91.41

(56) References Cited

U.S. PATENT DOCUMENTS

5,175,101 A * 12/1992 Gotz et al.
6,019,984 A * 2/2000 MacInnes et al.

OTHER PUBLICATIONS

Lewin, in Genes IV, Oxford University Press, p. 816, 1990.*
Sharrocks, in "PCR Technology: Current Innovations", Griffin et al., eds., CRC Press Inc., pp. 5-11, 1994.*
American Type Culture Collection (ATCC), Catalogue of Bacteria & Bacteriophages, 17th Edition, pp. 202-204, 1989.*
Walkenhorst et al., Microbiol. Res., 150:347-361, 1995.*
Dorrell et al., Photochem. Photobiol., 58:831-835, 1993.*
Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor, pp. 17.1-17.44, 1989.*
Kennell et al., Prog. Nucl. Acid Res. Mol. Biol. 11:259-301, 1971.*
Herzog et al., DNA and Cell Biology 12(6):465-471, 1993.*
Jazin et al., Regulatory Peptides 47:247-258, 1993.*
Rudinger et al., in "Peptide Hormones", ed. Parsons, J.A., University Park Press, pp. 1-6, 1976.*
Burgess et al., The Journal of Cell Biology 111:2129-2138, 1990.*
Lazar et al., Molecular and Cellular Biology 8(3):1247-1252, 1988.*
Jobling et al., Mol. Microbiol., 5(7):1755-1767, 1991.*

* cited by examiner

Primary Examiner: Patricia A. Duffy
(74) Attorney, Agent, or Firm: Human Genome Sciences, Inc.

(57) ABSTRACT

The present invention provides polynucleotide sequences of the genome of Staphylococcus aureus, polypeptide sequences encoded by the polynucleotide sequences, corresponding polynucleotides and polypeptides, vectors and hosts comprising the polynucleotides, and assays and other uses thereof. The present invention further provides polynucleotide and polypeptide sequence information stored on computer readable media, and computer-based systems and methods which facilitate its use.

15 Claims, 2 Drawing Sheets

STAPHYLOCOCCUS AUREUS POLYNUCLEOTIDES AND SEQUENCES

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation-in-part of and claims priority under 35 U.S.C. § 120 to U.S. patent application Ser. No. 08/781,986, filed Jan. 3, 1997 (pending), which is a non-provisional of and claims benefit under 35 U.S.C. § 119(e) of U.S. Provisional Application Ser. No. 60/009,861, filed Jan. 5, 1996.

Reference to a Sequence Listing Provided on Compact Disc

This application refers to a "Sequence Listing", which is provided as an electronic document on two identical compact discs (CD-R), labeled "Copy 1" and "Copy 2." These compact discs each contain the electronic document, file name "PB248P1 sequence listing.txt" (6,143,313 bytes in size, created on Jan. 24, 2002), which is hereby incorporated in its entirety herein.

FIELD OF THE INVENTION

The present invention relates to the field of molecular biology. In particular, it relates to, among other things, nucleotide sequences of Staphylococcus aureus, contigs, ORFs, fragments, probes, primers and related polynucleotides thereof, peptides and polypeptides encoded by the sequences, and uses of the polynucleotides and sequences thereof, such as in fermentation, polypeptide production, assays and pharmaceutical development, among others.

BACKGROUND OF THE INVENTION

The genus Staphylococcus includes at least 20 distinct species. (For a review see Novick, R. P., The Staphylococcus as a Molecular Genetic System, Chapter 1, pgs. 1-37 in MOLECULAR BIOLOGY OF THE STAPHYLOCOCCI, R. Novick, Ed., VCH Publishers, New York (1990)). Species differ from one another by 80% or more, by hybridization kinetics, whereas strains within a species are at least 90% identical by the same measure.

The species Staphylococcus aureus, a gram-positive, facultatively aerobic, clump-forming coccus, is among the most important etiological agents of bacterial infection in humans, as discussed briefly below.

Human Health and S. aureus

Staphylococcus aureus is a ubiquitous pathogen. (See, for instance, Mims et al., MEDICAL MICROBIOLOGY, Mosby-Year Book Europe Limited, London, UK (1993)). It is an etiological agent of a variety of conditions, ranging in severity from mild to fatal. A few of the more common conditions caused by S. aureus infection are burns, cellulitis, eyelid infections, food poisoning, joint infections, neonatal conjunctivitis, osteomyelitis, skin infections, surgical wound infection, scalded skin syndrome and toxic shock syndrome, some of which are described further below.

Burns

Burn wounds generally are sterile initially. However, they generally compromise physical and immune barriers to infection, cause loss of fluid and electrolytes and result in local or general physiological dysfunction. After cooling, contact with viable bacteria results in mixed colonization at the injury site. Infection may be restricted to the non-viable debris on the burn surface ("eschar"); it may progress into full skin infection and invade viable tissue below the eschar; or it may reach below the skin, enter the lymphatic and blood circulation, and develop into septicaemia. S. aureus is among the most important pathogens typically found in burn wound infections. It can destroy granulation tissue and produce severe septicaemia.

Cellulitis

Cellulitis, an acute infection of the skin that expands from a typically superficial origin to spread below the cutaneous layer, most commonly is caused by S. aureus in conjunction with S. pyogenes. Cellulitis can lead to systemic infection. In fact, cellulitis can be one aspect of synergistic bacterial gangrene. This condition typically is caused by a mixture of S. aureus and microaerophilic streptococci. It causes necrosis, and treatment is limited to excision of the necrotic tissue. The condition often is fatal.

Eyelid infections

S. aureus is the cause of styes and of "sticky eye" in neonates, among other eye infections. Typically such infections are limited to the surface of the eye, and may occasionally penetrate the surface with more severe consequences.

Food poisoning

Some strains of S. aureus produce one or more of five serologically distinct, heat- and acid-stable enterotoxins that are not destroyed by the digestive processes of the stomach and small intestine (enterotoxins A-E). Ingestion of the toxin, in sufficient quantities, typically results in severe vomiting, but not diarrhoea. The effect does not require viable bacteria. Although the toxins are known, their mechanism of action is not understood.

Joint infections

S. aureus infects bone joints, causing diseases such as osteomyelitis.

Osteomyelitis

S. aureus is the most common causative agent of haematogenous osteomyelitis. The disease tends to occur in children and adolescents more than adults, and it is associated with non-penetrating injuries to bones. Infection typically occurs in the long end of growing bone, hence its occurrence in physically immature populations. Most often, infection is localized in the vicinity of sprouting capillary loops adjacent to epiphysial growth plates in the end of long, growing bones.

Skin infections

S. aureus is the most common pathogen of such minor skin infections as abscesses and boils. Such infections often are resolved by normal host response mechanisms, but they also can develop into severe internal infections. Recurrent infections of the nasal passages plague nasal carriers of S. aureus.

Surgical Wound Infections

Surgical wounds often penetrate far into the body. Infection of such wounds thus poses a grave risk to the patient. S. aureus is the most important causative agent of infections in surgical wounds. S. aureus is unusually adept at invading surgical wounds; sutured wounds can be infected by far fewer S. aureus cells than are necessary to cause infection in normal skin. Invasion of surgical wounds can lead to severe S. aureus septicaemia. Invasion of the blood stream by S. aureus can lead to seeding and infection of internal organs, particularly heart valves and bone, causing systemic diseases, such as endocarditis and osteomyelitis.

Scalded Skin Syndrome

S. aureus is responsible for "scalded skin syndrome" (also called toxic epidermal necrolysis, Ritter's disease and Lyell's disease). This disease occurs in older children, typically in outbreaks caused by the flowering of S. aureus strains that produce exfoliatin (also called scalded skin syndrome toxin). Although the bacteria initially may infect only a minor lesion, the toxin destroys intercellular connections, spreads epidermal layers and allows the infection to penetrate the outer layer of the skin, producing the desquamation that typifies the disease. Shedding of the outer layer of skin generally reveals normal skin below, but fluid lost in the process can produce severe injury in young children if it is not treated properly.

Toxic Shock Syndrome

Toxic shock syndrome is caused by strains of S. aureus that produce the so-called toxic shock syndrome toxin. The disease can be caused by S. aureus infection at any site, but it is too often erroneously viewed solely as a disease of women who use tampons. The disease involves toxaemia and septicaemia, and can be fatal.

Nosocomial Infections

In the 1984 National Nosocomial Infection Surveillance Study ("NNIS"), S. aureus was the most prevalent agent of surgical wound infections in many hospital services, including medicine, surgery, obstetrics, pediatrics and newborns.

Drug Resistance of S. aureus Strains

Prior to the introduction of penicillin, the prognosis for patients seriously infected with S. aureus was unfavorable. Following the introduction of penicillin in the early 1940s, even the worst S. aureus infections generally could be treated successfully. The emergence of penicillin-resistant strains of S. aureus did not take long, however. Most strains of S. aureus encountered in hospital infections today do not respond to penicillin; fortunately, this is not the case for S. aureus encountered in community infections.

It is now well known that penicillin-resistant strains of S. aureus produce a β-lactamase which converts penicillin to penicilloic acid, and thereby destroys antibiotic activity. Furthermore, the β-lactamase often is propagated episomally, typically on a plasmid, and often is only one of several genes on an episomal element that, together, confer multidrug resistance.

Methicillins, introduced in the 1960s, largely overcame the problem of penicillin resistance in S. aureus. These compounds conserve the portions of penicillin responsible for antibiotic activity and modify or alter other portions that make penicillin a good substrate for inactivating β-lactamases. However, methicillin resistance has emerged in S. aureus, along with resistance to many other antibiotics effective against this organism, including aminoglycosides, tetracycline, chloramphenicol, macrolides and lincosamides. In fact, methicillin-resistant strains of S. aureus generally are multiply drug resistant.

The molecular genetics of most types of drug resistance in S. aureus has been elucidated (See Lyon et al., Microbiology Reviews 51:88-134 (1987)). Generally, resistance is mediated by plasmids, as noted above regarding penicillin resistance; however, several stable forms of drug resistance have been observed that apparently involve integration of a resistance element into the S. aureus genome itself.

Thus far, each new antibiotic gives rise to resistant strains, strains emerge that are resistant to multiple drugs, and increasingly persistent forms of resistance begin to emerge. Drug resistance of S. aureus infections already poses significant treatment difficulties, which are likely to get much worse unless new therapeutic agents are developed.

Molecular Genetics of Staphylococcus aureus

Despite its importance in, among other things, human disease, relatively little is known about the genome of this organism.

Most genetic studies of S. aureus have been carried out using the strain NCTC 8325, which contains prophages φ11, φ12 and φ13, and the UV-cured derivative of this strain, 8325-4 (also referred to as RN450), which is free of the prophages.

These studies revealed that the S. aureus genome, like that of other staphylococci, consists of one circular, covalently closed, double-stranded DNA and a collection of so-called variable accessory genetic elements, such as prophages, plasmids, transposons and the like.

Physical characterization of the genome has not been carried out in any detail. Pattee et al. published a low-resolution and incomplete genetic and physical map of the chromosome of S. aureus strain NCTC 8325. (Pattee et al., Genetic and Physical Mapping of Chromosome of Staphylococcus aureus NCTC 8325, Chapter 11, pgs. 163-169 in MOLECULAR BIOLOGY OF THE STAPHYLOCOCCI, R. P. Novick, Ed., VCH Publishers, N.Y. (1990)). The genetic map largely was produced by mapping insertions of Tn551 and Tn4001, which, respectively, confer erythromycin and gentamicin resistance, and by analysis of SmaI-digested DNA by Pulsed Field Gel Electrophoresis ("PFGE").

The map was of low resolution; even estimating the physical size of the genome was difficult, according to the investigators. The size of the largest SmaI chromosome fragment, for instance, was too large for accurate sizing by PFGE. To estimate its size, additional restriction sites had to be introduced into the chromosome using a transposon containing a SmaI recognition sequence.

In sum, most physical characteristics and almost all of the genes of Staphylococcus aureus are unknown. Among the few genes that have been identified, most have not been physically mapped or characterized in detail. Only a very few genes of this organism have been sequenced. (See, for instance, Thornsberry, J., Antimicrobial Chemotherapy 21 Suppl C: 9-16 (1988), current versions of GENBANK and other nucleic acid databases, and references that relate to the genome of S. aureus such as those set out elsewhere herein.)

It is clear that the etiology of diseases mediated or exacerbated by S. aureus infection involves the programmed expression of S. aureus genes, and that characterizing the genes and their patterns of expression would add dramatically to our understanding of the organism and its host interactions. Knowledge of S. aureus genes and genomic organization would dramatically improve understanding of disease etiology and lead to improved and new ways of preventing, ameliorating, arresting and reversing diseases. Moreover, characterized genes and genomic fragments of S. aureus would provide reagents for, among other things, detecting, characterizing and controlling S. aureus infections. There is a need, therefore, to characterize the genome of S. aureus and for polynucleotides and sequences of this organism.

SUMMARY OF THE INVENTION

The present invention is based on the sequencing of fragments of the Staphylococcus aureus genome. The primary nucleotide sequences which were generated are provided in SEQ ID NOS:1-5,191.

The present invention provides the nucleotide sequence of
several thousand contigs of the Staphylococcus aureus genome, which are listed in tables below and set out in the Sequence Listing submitted herewith, and representative fragments thereof, in a form which can be readily used, analyzed, and interpreted by a skilled artisan. In one embodiment, the present invention is provided as contiguous strings of primary sequence information corresponding to the nucleotide sequences depicted in SEQ ID NOS:1-5,191.

The present invention further provides nucleotide sequences which are at least 95% identical to the nucleotide sequences of SEQ ID NOS:1-5,191.

The nucleotide sequence of SEQ ID NOS:1-5,191, a representative fragment thereof, or a nucleotide sequence which is at least 95% identical to the nucleotide sequence of SEQ ID NOS:1-5,191 may be provided in a variety of media to facilitate its use. In one application of this embodiment, the sequences of the present invention are recorded on computer readable media. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media.

The present invention further provides systems, particularly computer-based systems, which contain the sequence information herein described stored in a data storage means. Such systems are designed to identify commercially important fragments of the Staphylococcus aureus genome.

Another embodiment of the present invention is directed to fragments of the Staphylococcus aureus genome having particular structural or functional attributes. Such fragments of the Staphylococcus aureus genome of the present invention include, but are not limited to, fragments which encode peptides, hereinafter referred to as open reading frames or "ORFs"; fragments which modulate the expression of an operably linked ORF, hereinafter referred to as expression modulating fragments or "EMFs"; and fragments which can be used to diagnose the presence of Staphylococcus aureus in a sample, hereinafter referred to as diagnostic fragments or "DFs".

Each of the ORFs in fragments of the Staphylococcus aureus genome disclosed in Tables 1-3, and the EMFs found 5' to the ORFs, can be used in numerous ways as polynucleotide reagents. For instance, the sequences can be used as diagnostic probes or amplification primers for detecting or determining the presence of a specific microbe in a sample, to selectively control gene expression in a host, and in the production of polypeptides, such as polypeptides encoded by ORFs of the present invention, particularly those polypeptides that have a pharmacological activity.

The present invention further includes recombinant constructs comprising one or more fragments of the Staphylococcus aureus genome of the present invention. The recombinant constructs of the present invention comprise vectors, such as a plasmid or viral vector, into which a fragment of the Staphylococcus aureus genome has been inserted.

The present invention further provides host cells containing any of the isolated fragments of the Staphylococcus aureus genome of the present invention. The host cells can be a higher eukaryotic host cell, such as a mammalian cell, a lower eukaryotic cell, such as a yeast cell, or a prokaryotic cell such as a bacterial cell.

The present invention is further directed to isolated polypeptides and proteins encoded by ORFs of the present invention. A variety of methods, well known to those of skill in the art, routinely may be utilized to obtain any of the polypeptides and proteins of the present invention. For instance, polypeptides and proteins of the present invention having relatively short, simple sequences readily can be synthesized using commercially available automated peptide synthesizers. Polypeptides and proteins of the present invention also may be purified from bacterial cells which naturally produce them. Yet another alternative is to purify polypeptides and proteins of the present invention from cells which have been altered to express them.

The invention further provides polypeptides comprising Staphylococcus aureus epitopes and vaccine compositions comprising such polypeptides. Also provided are methods for vaccinating an individual against Staphylococcus aureus infection.

The invention further provides methods of obtaining homologs of the fragments of the Staphylococcus aureus genome of the present invention and homologs of the proteins encoded by the ORFs of the present invention. Specifically, by using the nucleotide and amino acid sequences disclosed herein as a probe or as primers, and techniques such as PCR cloning and colony/plaque hybridization, one skilled in the art can obtain homologs.

The invention further provides antibodies which selectively bind polypeptides and proteins of the present invention. Such antibodies include both monoclonal and polyclonal antibodies.

The invention further provides hybridomas which produce the above-described antibodies. A hybridoma is an immortalized cell line which is capable of secreting a specific monoclonal antibody.

The present invention further provides methods of identifying test samples derived from cells which express one of the ORFs of the present invention, or a homolog thereof. Such methods comprise incubating a test sample with one or more of the antibodies of the present invention, or one or more of the DFs or antigens of the present invention, under conditions which allow a skilled artisan to determine if the sample contains the ORF or a product produced therefrom.

In another embodiment of the present invention, kits are provided which contain the necessary reagents to carry out the above-described assays.

Specifically, the invention provides a compartmentalized kit to receive, in close confinement, one or more containers which comprises: (a) a first container comprising one of the antibodies, antigens, or one of the DFs of the present invention; and (b) one or more other containers comprising one or more of the following: wash reagents, reagents capable of detecting presence of bound antibodies, antigens or hybridized DFs.

Using the isolated proteins of the present invention, the present invention further provides methods of obtaining and identifying agents capable of binding to a polypeptide or protein encoded by one of the ORFs of the present invention. Specifically, such agents include, as further described below, antibodies, peptides, carbohydrates, pharmaceutical agents and the like. Such methods comprise the steps of: (a) contacting an agent with an isolated protein encoded by one of the ORFs of the present invention; and (b) determining whether the agent binds to said protein.

The present genomic sequences of Staphylococcus aureus will be of great value to all laboratories working with this organism and for a variety of commercial purposes. Many fragments of the Staphylococcus aureus genome will be immediately identified by similarity searches against GenBank or protein databases and will be of immediate value to Staphylococcus aureus researchers and for immediate commercial value for the production of proteins or to control gene expression.
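An "open reading frame" is defined above only functionally, as a fragment that encodes a peptide. As an illustration of what scanning a contig for ORFs involves, here is a minimal six-frame scan in Python. It is a sketch under stated assumptions, not the ORF-calling software used to annotate these sequences: it treats only ATG as a start codon (bacteria such as S. aureus also use GTG and TTG), uses the standard bacterial stop codons TAA, TAG and TGA, keeps the first in-frame ATG before each stop, and applies a hypothetical minimum length (`min_codons`) chosen purely for illustration.

```python
"""Minimal six-frame ORF scan (illustrative sketch, not the patent's pipeline)."""

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"  # assumption: ATG-only starts; GTG/TTG starts are ignored

# IUPAC nucleotide codes and their complements (S, W and N map to themselves).
COMPLEMENT = str.maketrans("ACGTRYKMBVDHSWN", "TGCAYRMKVBHDSWN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase IUPAC DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100):
    """Yield (strand, start, end) for ATG...stop ORFs scanned in all six frames.

    Coordinates are 0-based, half-open and always given on the forward
    strand; `end` includes the stop codon. Length is counted in codons,
    stop included. `min_codons` is a hypothetical cutoff.
    """
    seq = seq.upper()
    n = len(seq)
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            # Step codon by codon; the last codon must fit entirely in s.
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i  # first ATG since the previous stop opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        if strand == "+":
                            yield strand, start, i + 3
                        else:
                            # Map reverse-strand positions back to forward coordinates.
                            yield strand, n - (i + 3), n - start
                    start = None  # a stop closes the frame; look for the next ATG


if __name__ == "__main__":
    demo = "CCATGAAATTTGGGTAACC"  # made-up toy sequence, not from SEQ ID NOS
    print(list(find_orfs(demo, min_codons=2)))
```

On the toy sequence this reports one forward-strand ORF, ATG AAA TTT GGG TAA, spanning positions 2 to 17. Real annotation would add alternative start codons, overlap handling and a realistic length threshold; the fixed cutoff here only keeps the example small.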
The methodology and technology for elucidating extensive genomic sequences of bacterial and other genomes have greatly enhanced, and will continue to enhance, the ability to analyze and understand chromosomal organization. In particular, sequenced contigs and genomes will provide the models for developing tools for the analysis of chromosome structure and function, including the ability to identify genes within large segments of genomic DNA, the structure, position, and spacing of regulatory elements, the identification of genes with potential industrial applications, and the ability to do comparative genomic and molecular phylogeny.

DESCRIPTION OF THE FIGURES

FIG. 1 is a block diagram of a computer system (102) that can be used to implement computer-based systems of the present invention.

FIG. 2 is a schematic diagram depicting the data flow and computer programs used to collect, assemble, edit and annotate the contigs of the Staphylococcus aureus genome of the present invention. Both Macintosh and Unix platforms are used to handle the AB 373 and 377 sequence data files, largely as described in Kerlavage et al., Proceedings of the Twenty-Sixth Annual Hawaii International Conference on System Sciences, 585, IEEE Computer Society Press, Washington D.C. (1993). Factura (AB) is a Macintosh program designed for automatic vector sequence removal and end-trimming of sequence files. The program Loadis runs on a Macintosh platform and parses the feature data extracted from the sequence files by Factura to the Unix-based Staphylococcus aureus relational database. Assembly of contigs (and whole genome sequences) is accomplished by retrieving a specific set of sequence files and their associated features using extrseq, a Unix utility for retrieving sequences from an SQL database. The resulting sequence file is processed by seqfilter to trim portions of the sequences with more than 2% ambiguous nucleotides. The sequence files were assembled using TIGR Assembler, an assembly engine designed at The Institute for Genomic Research ("TIGR") for rapid and accurate assembly of thousands of sequence fragments. The collection of contigs generated by the assembly step is loaded into the database with the lassie program. Identification of open reading frames (ORFs) is accomplished by processing contigs with Zorf. The ORFs are searched against S. aureus sequences from GenBank and against all protein sequences using the BLASTN and BLASTP programs, described in Altschul et al., J. Mol. Biol. 215:403-410 (1990). Results of the ORF determination and similarity searching steps were loaded into the database. As described below, some results of the determination and the searches are set out in Tables 1-3.

DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

The present invention is based on the sequencing of fragments of the Staphylococcus aureus genome and analysis of the sequences. The primary nucleotide sequences generated by sequencing the fragments are provided in SEQ ID NOS:1-5,191. (As used herein, the "primary sequence" refers to the nucleotide sequence represented by the IUPAC nomenclature system.)

In addition to the aforementioned Staphylococcus aureus polynucleotides and polynucleotide sequences, the present invention provides the nucleotide sequences of SEQ ID NOS:1-5,191, or representative fragments thereof, in a form which can be readily used, analyzed, and interpreted by a skilled artisan.

As used herein, a "representative fragment" of the nucleotide sequence depicted in SEQ ID NOS:1-5,191 refers to any portion of SEQ ID NOS:1-5,191 which is not presently represented within a publicly available database. Preferred representative fragments of the present invention are Staphylococcus aureus open reading frames ("ORFs"), expression modulating fragments ("EMFs") and fragments which can be used to diagnose the presence of Staphylococcus aureus in a sample ("DFs"). A non-limiting identification of preferred representative fragments is provided in Tables 1-3.

As discussed in detail below, the information provided in SEQ ID NOS:1-5,191 and in Tables 1-3, together with routine cloning, synthesis, sequencing and assay methods, will enable those skilled in the art to clone and sequence all "representative fragments" of interest, including open reading frames encoding a large variety of Staphylococcus aureus proteins.

While the presently disclosed sequences of SEQ ID NOS:1-5,191 are highly accurate, sequencing techniques are not perfect and, in relatively rare instances, further investigation of a fragment or sequence of the invention may reveal a nucleotide sequence error present in a nucleotide sequence disclosed in SEQ ID NOS:1-5,191. However, once the present invention is made available (i.e., once the information in SEQ ID NOS:1-5,191 and Tables 1-3 has been made available), resolving a rare sequencing error in SEQ ID NOS:1-5,191 will be well within the skill of the art. The present disclosure makes available sufficient sequence information to allow any of the described contigs or portions thereof to be obtained readily by straightforward application of routine techniques. Further sequencing of such polynucleotides may proceed in like manner using manual and automated sequencing methods which are employed ubiquitously in the art. Nucleotide sequence editing software is publicly available. For example, Applied Biosystems' (AB) AutoAssembler can be used as an aid during visual inspection of nucleotide sequences. By employing such routine techniques, potential errors readily may be identified and the correct sequence then may be ascertained by targeting further sequencing effort, also of a routine nature, to the region containing the potential error.

Even if all of the very rare sequencing errors in SEQ ID NOS:1-5,191 were corrected, the resulting nucleotide sequences would still be at least 95% identical, nearly all would be at least 99% identical, and the great majority would be at least 99.9% identical to the nucleotide sequences of SEQ ID NOS:1-5,191.

As discussed elsewhere herein, polynucleotides of the present invention readily may be obtained by routine application of well known and standard procedures for cloning and sequencing DNA. Detailed methods for obtaining libraries and for sequencing are provided below, for instance. A wide variety of Staphylococcus aureus strains that can be used to prepare S. aureus genomic DNA for cloning and for obtaining polynucleotides of the present invention are available to the public from recognized depository institutions, such as the American Type Culture Collection ("ATCC").

The nucleotide sequences of the genomes from different strains of Staphylococcus aureus differ somewhat. However, the nucleotide sequences of the genomes of all Staphylococcus aureus strains will be at least 95% identical, in corresponding part, to the nucleotide sequences provided in SEQ ID NOS:1-5,191. Nearly all will be at least 99% identical and the great majority will be 99.9% identical. Thus, the present invention further provides nucleotide sequences which are at least 95%, preferably 99% and most preferably 99.9% identical to the nucleotide sequences of SEQ ID NOS:1-5,191, in a form which can be readily used, analyzed and interpreted by the skilled artisan.

Methods for determining whether a nucleotide sequence is at least 95%, at least 99% or at least 99.9% identical to the nucleotide sequences of SEQ ID NOS:1-5,191 are routine and readily available to the skilled artisan. For example, the well known fast algorithm described in Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85:2444 (1988) can be used to generate the percent identity of nucleotide sequences. The BLASTN program also can be used to generate an identity score of polynucleotides compared to one another.

COMPUTER RELATED EMBODIMENTS

The nucleotide sequences provided in SEQ ID NOS:1-5,191, a representative fragment thereof, or a nucleotide sequence at least 95%, preferably at least 96%, 97%, 98% or 99% and most preferably at least 99.9% identical to a polynucleotide sequence of SEQ ID NOS:1-5,191 may be "provided" in a variety of media to facilitate use thereof. As used herein, "provided" refers to a manufacture, other than an isolated nucleic acid molecule, which contains a nucleotide sequence of the present invention; i.e., a nucleotide sequence provided in SEQ ID NOS:1-5,191, a representative fragment thereof, or a nucleotide sequence at least 95%, preferably at least 96%, 97%, 98% or 99% and most preferably at least 99.9% identical to a polynucleotide of SEQ ID NOS:1-5,191. Such a manufacture provides a large portion of the Staphylococcus aureus genome and parts thereof (e.g., a Staphylococcus aureus open reading frame (ORF)) in a form which allows a skilled artisan to examine the manufacture using means not directly applicable to examining the Staphylococcus aureus genome or a subset thereof as it exists in nature or in purified form.

In one application of this embodiment, a nucleotide sequence of the present invention can be recorded on computer readable media. As used herein, "computer readable media" refers to any medium which can be read and accessed directly by a computer. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories, such as magnetic/optical storage media. A skilled artisan can readily appreciate how any of the presently known computer readable media can be used to create a manufacture comprising computer readable medium having recorded thereon a nucleotide sequence of the present invention. Likewise, it will be clear to those of skill how additional computer readable media that may be developed also can be used to create analogous manufactures having recorded thereon a nucleotide sequence of the present invention.

As used herein, "recorded" refers to a process for storing information on computer readable medium. A skilled artisan can readily adopt any of the presently known methods for recording information on computer readable medium to generate manufactures comprising the nucleotide sequence information of the present invention.

... commercially-available software such as WordPerfect and Microsoft Word, or represented in the form of an ASCII file, stored in a database application, such as DB2, Sybase, Oracle, or the like. A skilled artisan can readily adapt any number of data-processor structuring formats (e.g., text file or database) in order to obtain computer readable medium having recorded thereon the nucleotide sequence information of the present invention.

Computer software is publicly available which allows a skilled artisan to access sequence information provided in a computer readable medium. Thus, by providing in computer readable form the nucleotide sequences of SEQ ID NOS:1-5,191, a representative fragment thereof, or a nucleotide sequence at least 95%, preferably at least 96%, 97%, 98% or 99% and most preferably at least 99.9% identical to a sequence of SEQ ID NOS:1-5,191, the present invention enables the skilled artisan routinely to access the provided sequence information for a wide variety of purposes.

The examples which follow demonstrate how software which implements the BLAST (Altschul et al., J. Mol. Biol. 215:403-410 (1990)) and BLAZE (Brutlag et al., Comp. Chem. 17:203-207 (1993)) search algorithms on a Sybase system was used to identify open reading frames (ORFs) within the Staphylococcus aureus genome which contain homology to ORFs or proteins from both Staphylococcus aureus and other organisms. Among the ORFs discussed herein are protein-encoding fragments of the Staphylococcus aureus genome useful in producing commercially important proteins, such as enzymes used in fermentation reactions and in the production of commercially useful metabolites.

The present invention further provides systems, particularly computer-based systems, which contain the sequence information described herein. Such systems are designed to identify, among other things, commercially important fragments of the Staphylococcus aureus genome.

As used herein, "a computer-based system" refers to the hardware means, software means, and data storage means used to analyze the nucleotide sequence information of the present invention. The minimum hardware means of the computer-based systems of the present invention comprises a central processing unit (CPU), input means, output means, and data storage means. A skilled artisan can readily appreciate that any one of the currently available computer-based systems is suitable for use in the present invention.

As stated above, the computer-based systems of the present invention comprise a data storage means having stored therein a nucleotide sequence of the present invention and the necessary hardware means and software means for supporting and implementing a search means.

As used herein, "data storage means" refers to memory which can store nucleotide sequence information of the present invention, or a memory access means which can access manufactures having recorded thereon the nucleotide sequence information of the present invention.

As used herein, "search means" refers to one or more
programs which are implemented on the computer-based A variety of data Storage Structures are available to a System to compare a target Sequence or target Structural skilled artisan for creating a computer readable medium motif with the Sequence information Stored within the data having recorded thereon a nucleotide Sequence of the 60 Storage means. Search means are used to identify fragments present invention. The choice of the data Storage Structure or regions of the present genomic Sequences which match a will generally be based on the means chosen to access the particular target Sequence or target motif. A variety of known Stored information. In addition, a variety of data processor algorithms are disclosed publicly and a variety of commer programs and formats can be used to Store the nucleotide cially available Software for conducting Search means are Sequence information of the present invention on computer 65 and can be used in the computer-based Systems of the readable medium. The Sequence information can be repre present invention. Examples of Such Software includes, but Sented in a word processing text file, formatted in is not limited to, MacPattern (EMBL), BLASTN and US 6,593,114 B1 11 12 BLASTX (NCBIA). A skilled artisan can readily recognize of the Secondary Storage devices 110, and/or a removable that any one of the available algorithms or implementing Storage medium 116. During execution, Software for acceSS Software packages for conducting homology Searches can be ing and processing the genomic Sequence (Such as Search adapted for use in the present computer-based Systems. tools, comparing tools, etc.) reside in main memory 108, in AS used herein, a “target Sequence' can be any DNA or accordance with the requirements and operating parameters amino acid Sequence of Six or more nucleotides or two or of the operating System, the hardware System and the more amino acids. A skilled artisan can readily recognize Software program or programs. 
that the longer a target Sequence is, the less likely a target BIOCHEMICAL EMBODIMENTS Sequence will be present as a random occurrence in the Other embodiments of the present invention are directed database. The most preferred Sequence length of a target to isolated fragments of the StaphylococcuS aureus genome. sequence is from about 10 to 100 amino acids or from about The fragments of the StaphylococcuS aureus genome of the 30 to 300 nucleotide residues. However, it is well recognized present invention include, but are not limited to fragments that Searches for commercially important fragments, Such as which encode peptides, hereinafter open reading frames Sequence fragments involved in gene expression and protein (ORFs), fragments which modulate the expression of an processing, may be of shorter length. 15 operably linked ORF, hereinafter expression modulating AS used herein, “a target Structural motif, or “target fragments (EMFs) and fragments which can be used to motif,” refers to any rationally Selected Sequence or com diagnose the presence of StaphylococcuS aureuS in a Sample, bination of Sequences in which the sequence(s) are chosen hereinafter diagnostic fragments (DFS). based on a three-dimensional configuration which is formed AS used herein, an "isolated nucleic acid molecule' or an upon the folding of the target motif. There are a variety of "isolated fragment of the StaphylococcuS aureus genome’ target motifs known in the art. Protein target motifs include, refers to a nucleic acid molecule possessing a specific but are not limited to, enzymic active sites and Signal nucleotide Sequence which has been Subjected to purifica Sequences. Nucleic acid target motifs include, but are not tion means to reduce, from the composition, the number of limited to, promoter Sequences, hairpin Structures and induc compounds which are normally associated with the compo ible expression elements (protein binding sequences). 25 Sition. 
Particularly, the term refers to the nucleic acid mol A variety of Structural formats for the input and output ecules having the sequences set out in SEQ ID NOS: means can be used to input and output the information in the 1–5,191, to representative fragments thereof as described computer-based Systems of the present invention. A pre above, to polynucleotides at least 95%, preferably at least ferred format for an output means ranks fragments of the 96%, 97%, 98% or 99% and especially preferably at least StaphylococcuS aureuS genomic Sequences possessing vary 99.9% identical in sequence thereto, also as set out above. ing degrees of homology to the target Sequence or target A variety of purification means can be used to generated motif. Such presentation provides a skilled artisan with a the isolated fragments of the present invention. These ranking of Sequences which contain various amounts of the include, but are not limited to methods which separate target Sequence or target motif and identifies the degree of constituents of a Solution based on charge, Solubility, or size. homology contained in the identified fragment. 35 In one embodiment, Staphylococcus aureus DNA can be A variety of comparing means can be used to compare a mechanically sheared to produce fragments of 15-20 kb in target Sequence or target motif with the data Storage means length. These fragments can then be used to generate an to identify Sequence fragments of the StaphylococcuS aureuS StaphylococcuS aureuS library by inserting them into lambda genome. In the present examples, implementing Software clones as described in the Examples below. Primers which implement the BLAST and BLAZE algorithms, 40 flanking, for example, an ORF, Such as those enumerated in described in Altschul et al., J. Mol. Biol. 215: 403-410 Tables 1-3 can then be generated using nucleotide Sequence (1990), was used to identify open reading frames within the information provided in SEQID NOS:1–5,191. 
Well known StaphylococcuS aureuS genome. A skilled artisan can readily and routine techniques of PCR cloning then can be used to recognize that any one of the publicly available homology isolate the ORF from the lambda DNA library of Staphylo Search programs can be used as the Search means for the 45 COccuS aureuS genomic DNA. Thus, given the availability of computer-based Systems of the present invention. Of course, SEQ ID NOS: 1-5,191, the information in Tables 1, 2 and Suitable proprietary Systems that may be known to those of 3, and the information that may be obtained readily by skill also may be employed in this regard. analysis of the sequences of SEQ ID NOS:1-5,191 using FIG. 1 provides a block diagram of a computer System methods set out above, those of skill will be enabled by the illustrative of embodiments of this aspect of present inven 50 present disclosure to isolate any ORF-containing or other tion. The computer system 102 includes a processor 106 nucleic acid fragment of the present invention. connected to a bus 104. Also connected to the bus 104 are The isolated nucleic acid molecules of the present inven a main memory 108 (preferably implemented as random tion include, but are not limited to Single Stranded and access memory, RAM) and a variety of Secondary Storage double stranded DNA, and single stranded RNA. devices 110, Such as a hard drive 112 and a removable 55 AS used herein, an “open reading frame,” ORF, means a medium Storage device 114. The removable medium Storage Series of triplets coding for amino acids without any termi device 114 may represent, for example, a floppy disk drive, nation codons and is a Sequence translatable into protein. a CD-ROM drive, a magnetic tape drive, etc. A removable Tables 1, 2 and 3 list ORFs in the Staphylococcus aureus Storage medium 116 (Such as a floppy disk, a compact disk, genomic contigs of the present invention that were identified a magnetic tape, etc.) 
containing control logic and/or data 60 as putative coding regions by the GeneMark Software using recorded therein may be inserted into the removable medium organism-specific Second-order Markov probability transi storage device 114. The computer system 102 includes tion matrices. It will be appreciated that other criteria can be appropriate Software for reading the control logic and/or the used, in accordance with well known analytical methods, data from the removable medium Storage device 114, once Such as those discussed herein, to generate more inclusive, it is inserted into the removable medium Storage device 114. 65 more restrictive or more Selective lists. A nucleotide Sequence of the present invention may be Table 1 sets out ORFs in the Staphylococcus aureus stored in a well known manner in the main memory 108, any contigs of the present invention that are at least 80 amino US 6,593,114 B1 13 14 acids long and over a continuous region of at least 50 bases AS used herein, a Sequence is Said to "modulate the which are 95% or more identical (by BLAST analysis) to an expression of an operably linked Sequence” when the S. aureus nucleotide Sequence available through Genbank in expression of the Sequence is altered by the presence of the November 1996. EMF. EMFs include, but are not limited to, promoters, and Table 2 sets out ORFs in the Staphylococcus aureus promoter modulating sequences (inducible elements). One contigs of the present invention that are not in Table 1 and class of EMFS are fragments which induce the expression or match, with a BLASTP probability score of 0.01 or less, a an operably linked ORF in response to a specific regulatory polypeptide Sequence available through Genbank by Sep. factor or physiological event. 1996. 
EMF sequences can be identified within the contigs of the Table 3 sets out ORFs in the Staphylococcus aureus StaphylococcuS aureus genome by their proximity to the contigs of the present invention that do not match ORFs provided in Tables 1-3. An intergenic segment, or a Significantly, by BLASTP analysis, a polypeptide Sequence fragment of the intergenic Segment, from about 10 to 200 available through Genbank by Sep. 1996. nucleotides in length, taken from any one of the ORFs of In each table, the first and Second columns identify the Tables 1-3 will modulate the expression of an operably ORF by, respectively, contig number (SEQID NO) and ORF 15 linked ORF in a fashion similar to that found with the number within the contig; the third column indicates the first naturally linked ORF Sequence. AS used herein, an “inter nucleotide of the ORF, counting from the 5' end of the contig genic Segment” refers to fragments of the StaphylococcuS Strand shown in the Sequence listing, and the fifth column aureus genome which are between two ORF(s) herein indicates the length of each ORF in nucleotides. It will be described. EMFs also can be identified using known EMFs appreciated that Some ORFs are located on the reverse as a target Sequence or target motif in the computer-based strand. The numbering identifying such ORFs also repre systems of the present invention. Further, the two methods Sents nucleotide positions counting from the 5' end of the can be combined and used together. Strand shown in the Sequence listing. The presence and activity of an EMF can be confirmed In Tables 1 and 2, column five, lists the “match accession” using an EMF trap vector. An EMF trap vector contains a for the closest matching Sequence available through Gen 25 cloning site linked to a marker Sequence. A marker Sequence bank. 
These reference numbers are the databases entry encodes an identifiable phenotype, Such as antibiotic resis numbers commonly used by those of skill in the art, who will tance or a complementing nutrition auxotrophic factor, be familiar with their denominators. Descriptions of the which can be identified or assayed when the EMF trap vector nomenclature are available from the National Center for is placed within an appropriate host under appropriate Biotechnology Information. Column six in Tables 1 and 2 conditions. As described above, a EMF will modulate the provides the "gene name of the matching Sequence; column expression of an operably linked marker Sequence. A more seven provides the BLAST “similarity”; column eight pro detailed discussion of various marker Sequences is provided vides the BLAST “identity” score from the comparison of below. the ORF and the homologous gene, and column nine indi A Sequence which is Suspected as being an EMF is cloned cates the length in nucleotides of the highest Scoring Seg 35 in all three reading frames in one or more restriction Sites ment pair identified by the BLAST identity analysis. upstream from the marker Sequence in the EMF trap vector. The concepts of percent identity and percent Similarity of The vector is then transformed into an appropriate host using two polypeptide Sequences is well understood in the art. For known procedures and the phenotype of the transformed example, two polypeptides 10 amino acids in length which host in examined under appropriate conditions. AS described differ at three amino acid positions (e.g., at positions 1,3 and 40 above, an EMF will modulate the expression of an operably 5) are said to have a percent identity of 70%. However, the linked marker Sequence. 
Same two polypeptides would be deemed to have a percent AS used herein, a "diagnostic fragment, DF, means a similarity of 80% if, for example at position 5, the amino series of nucleotide molecules which selectively hybridize to acids moieties, although not identical, were "similar” (i.e., StaphylococcuS aureuS Sequences. DFS can be readily iden possessed similar biochemical characteristics). Many pro 45 tified by identifying unique Sequences within contigs of the grams for analysis of nucleotide or amino acid Sequence StaphylococcuS aureuS genome, Such as by using well Similarity, Such as fast and BLAST Specifically list percent known computer analysis Software, and by generating and identity of a matching region as an output parameter. Thus, testing probes or amplification primers consisting of the DF for instance, Tables 1 and 2 herein enumerate the percent Sequence in an appropriate diagnostic format which deter identity of the highest Scoring Segment pair' in each ORF 50 mines amplification or hybridization Selectivity. and its listed relative. Further details concerning the algo The Sequences falling within the Scope of the present rithms and criteria used for homology Searches are provided invention are not limited to the Specific Sequences herein below and are described in the pertinent literature high described, but also include allelic and Species variations lighted by the citations provided below. thereof. Allelic and Species variations can be routinely It will be appreciated that other criteria can be used to 55 determined by comparing the Sequences provided in SEQID generate more inclusive and more exclusive listings of the NOS:1-5,191, a representative fragment thereof, or a nucle types Set out in the tables. AS those of Skill will appreciate, otide sequence at least 99% and preferably 99.9% identical narrow and broad searches both ate useful. 
Thus, a skilled to SEQ ID NOS:1-5,191, with a sequence from another artisan can readily identify ORFs in contigs of the Staphy isolate of the Same Species. lococcuS aureus genome other than those listed in Tables 60 Furthermore, to accommodate codon variability, the 1-3, such as ORFs which are overlapping or encoded by the invention includes nucleic acid molecules coding for the opposite strand of an identified ORF in addition to those Same amino acid Sequences as do the Specific ORFs dis ascertainable using the computer-based Systems of the closed herein. In other words, in the coding region of an present invention. ORF, Substitution of one codon for another which encodes AS used herein, an “expression modulating fragment,” 65 the same amino acid is expressly contemplated. EMF, means a series of nucleotide molecules which modul Any specific Sequence disclosed herein can be readily lates the expression of an operably linked ORF or EMF. Screened for errors by resequencing a particular fragment, US 6,593,114 B1 15 16 Such as an ORF, in both directions (i.e., Sequence both tors include pWLineo, pSV2cat, pCG44, pXT 1, pSG Strands). Alternatively, error Screening can be performed by (available from Stratagene) pSVK3, pBPV, pMSG, pSVL Sequencing corresponding polynucleotides of Staphylococ (available from Pharmacia). cus aureus origin isolated by using part or all of the Promoter regions can be Selected from any desired gene fragments in question as a probe or primer. using CAT (chloramphenicol ) vectors or other Each of the ORFs of the Staphylococcus aureus genome vectors with Selectable markers. Two appropriate vectors are disclosed in Tables is 1, 2 and 3, and the EMFs found 5' to pKK232-8 and pCM7. Particular named bacterial promoters the ORFs, can be used as polynucleotide reagents in numer include lac, lacZ, T3, T7, gpt, lambda PR, and trc. Eukary ous ways. 
For example, the Sequences can be used as otic promoters include CMV immediate early, HSV thymi diagnostic probes or diagnostic amplification primers to dine kinase, early and late SV40, LTRs from retrovirus, and detect the presence of a specific microbe in a Sample, mouse metallothionein- I. Selection of the appropriate vec particular StaphylococcuS aureus. Especially preferred in tor and promoter is well within the level of ordinary skill in this regard are ORF such as those of Table 3, which do not the art. match previously characterized Sequences from other organ The present invention further provides host cells contain isms and thus are most likely to be highly Selective for 15 ing any one of the isolated fragments of the StaphylococcuS StaphylococcuS aureus. Also particularly preferred are aureuS genomic fragments and contigs of the present ORFs that can be used to distinguish between strains of invention, wherein the fragment has been introduced into the StaphylococcuS aureus, particularly those that distinguish host cell using known methods. The host cell can be a higher medically important Strain, Such as drug-resistant Strains. eukaryotic host cell, Such as a mammalian cell, a lower In addition, the fragments of the present invention, as eukaryotic host cell, Such as a yeast cell, or a procaryotic broadly described, can be used to control gene expression cell, Such as a bacterial cell. through triple helix formation or antisense DNA or RNA, A polynucleotide of the present invention, Such as a both of which methods are based on the binding of a recombinant construct comprising an ORF of the present polynucleotide sequence to DNA or RNA. 
Triple helix invention, may be introduced into the host by a variety of formation optimally results in a shut-off of RNA transcrip 25 well established techniques that are Standard in the art, Such tion from DNA, while antisense RNA hybridization blocks as calcium phosphate transfection, DEAE, dextran mediated translation of an mRNA molecule into polypeptide. Infor transfection and electroporation, which are described in, for mation from the Sequences of the present invention can be instance, Davis, L. et al., BASIC METHODS IN MOLECU used to design antisense and triple helix-forming oligonucle LAR BIOLOGY (1986). otides. Polynucleotides suitable for use in these methods are A host cell containing one of the fragments of the Sta usually 20 to 40 bases in length and are designed to be phylococcus aureus genomic fragments and contigs of the complementary to a region of the gene involved in present invention, can be used in conventional manners to transcription, for triple-helix formation, or to the mRNA produce the gene product encoded by the isolated fragment itself, for antisense inhibition. Both techniques have been (in the case of an ORF) or can be used to produce a demonstrated to be effective in model Systems, and the 35 heterologous protein under the control of the EMF. requisite techniques are well known and involve routine The present invention further provides isolated polypep procedures. Triple helix techniques are discussed in, for tides encoded by the nucleic acid fragments of the present example, Lee et al., Nucl. Acids Res. 6: 3073 (1979); Cooney invention or by degenerate variants of the nucleic acid et al., Science 241:456 (1988); and Dervan et al., Science fragments of the present invention. By “degenerate variant” 251: 1360 (1991). Antisense techniques in general are dis 40 is intended nucleotide fragments which differ from a nucleic cussed in, for instance, Okano, J. Neurochem. 
56: 560 acid fragment of the present invention (e.g., an ORF) by (1991) and OLIGODEOXYNUCLEOTIDES AS ANTI nucleotide Sequence but, due to the degeneracy of the SENSE INHIBITORS OF GENE EXPRESSION, CRC Genetic Code, encode an identical polypeptide Sequence. Press, Boca Raton, Fla. (1988)). Preferred nucleic acid fragments of the present invention The present invention further provides recombinant con 45 are the ORFs depicted in Tables 2 and 3 which encode Structs comprising one or more fragments of the Staphylo proteins. COccus aureuS genomic fragments and contigs of the present A variety of methodologies known in the art can be invention. Certain preferred recombinant constructs of the utilized to obtain any one of the isolated polypeptides or present invention comprise a vector, Such as a plasmid or proteins of the present invention. At the Simplest level, the Viral vector, into which a fragment of the StaphylococcuS 50 amino acid Sequence can be Synthesized using commercially aureuS genome has been inserted, in a forward or reverse available peptide Synthesizers. This is particularly useful in orientation. In the case of a vector comprising one of the producing Small peptides and fragments of larger polypep ORFs of the present invention, the vector may further tides. Such short fragments as may be obtained most readily comprise regulatory Sequences, including for example, a by Synthesis are useful, for example, in generating antibod promoter, operably linked to the ORF. For vectors compris 55 ies against the native polypeptide, as discussed further ing the EMFs of the present invention, the vector may below. further comprise a marker Sequence or heterologous ORF In an alternative method, the polypeptide or protein is operably linked to the EMF. purified from bacterial cells which naturally produce the Large numbers of Suitable vectors and promoters are polypeptide or protein. 
One skilled in the art can readily known to those of skill in the art and are commercially 60 employ well-known methods for isolating polpeptides and available for generating the recombinant constructs of the proteins to isolate and purify polypeptides or proteins of the present invention. The following vectors are provided by present invention produced naturally by a bacterial Strain, or way of example. Useful bacterial vectors include by other methods. Methods for isolation and purification that phagescript, PsiXI74, pBluescript SK and KS (+ and -), can be employed in this regard include, but are not limited pNH8a, pNH16a, pNH18a, pNH46a (available from 65 to, immunochromatography, HPLC, Size-exclusion Stratagene); pTrc99A, pKK223–3, pKK233–3, plDR540, chromatography, ion-exchange chromatography, and pRIT5 (available from Pharmacia). Useful eukaryotic vec immuno-affinity chromatography. US 6,593,114 B1 17 18 The polypeptides and proteins of the present invention herein will express heterologous polypeptides or proteins also can be purified from cells which have been altered to upon induction of the regulatory elements linked to the DNA express the desired polypeptide or protein. AS used herein, Segment or Synthetic gene to be expressed. a cell is said to be altered to express a desired polypeptide Mature proteins can be expressed in mammalian cells, or protein when the cell, through genetic manipulation, is yeast, bacteria, or other cells under the control of appropriate made to produce a polypeptide or protein which it normally promoters. Cell-free translation Systems can also be does not produce or which the cell normally produces at a employed to produce Such proteins using RNAS derived lower level. Those skilled in the art can readily adapt from the DNA constructs of the present invention. 
Appro procedures for introducing and expressing either recombi priate cloning and expression vectors for use with prokary nant or Synthetic Sequences into eukaryotic or prokaryotic otic and eukaryotic hosts are described in Sambrook et al., MOLECULAR CLONING:A LABORATORY MANUAL, cells in order to generate a cell which produces one of the 2nd Edition, Cold Spring Harbor Laboratory Press, Cold polypeptides or proteins of the present invention. Spring Harbor, N.Y. (1989), the disclosure of which is Any host/vector System can be used to express one or hereby incorporated by reference in its entirety. more of the ORFs of the present invention. These include, Generally, recombinant expression vectors will include but are not limited to, eukaryotic hosts Such as HeLa cells, 15 origins of replication and Selectable markers permitting CV-1 cell, COS cells, and Sf9 cells, as well as prokaryotic transformation of the host cell, e.g., the amplicillin resistance host such as E. coli and B. Subtilis. The most preferred cells gene of E. coli and S. cerevisiae TRP1 gene, and a promoter are those which do not normally express the particular derived from a highly expressed gene to direct transcription polypeptide or protein or which expresses the polypeptide or of a downstream Structural Sequence. Such promoters can be protein at low natural level. derived from operons encoding glycolytic enzymes Such as "Recombinant,” as used herein, means that a polypeptide 3- phosphoglycerate kinase (PGK), alpha-factor, acid or protein is derived from recombinant (e.g., microbial or phosphatase, or heat Shock proteins, among others. The mammalian) expression systems. “Microbial” refers to heterologous Structural Sequence is assembled in appropriate recombinant polypeptides or proteins made in bacterial or phase with translation initiation and termination Sequences, fungal (e.g., yeast) expression Systems. 
As a product, “recombinant microbial” defines a polypeptide or protein essentially free of native endogenous substances and unaccompanied by associated native glycosylation. Polypeptides or proteins expressed in most bacterial cultures, e.g., E. coli, will be free of glycosylation modifications; polypeptides or proteins expressed in yeast will have a glycosylation pattern different from that expressed in mammalian cells.

“Nucleotide sequence” refers to a heteropolymer of deoxyribonucleotides. Generally, DNA segments encoding the polypeptides and proteins provided by this invention are assembled from fragments of the Staphylococcus aureus genome and short oligonucleotide linkers, or from a series of oligonucleotides, to provide a synthetic gene which is capable of being expressed in a recombinant transcriptional unit comprising regulatory elements derived from a microbial or viral operon.

“Recombinant expression vehicle or vector” refers to a plasmid or phage or virus or vector for expressing a polypeptide from a DNA (RNA) sequence. The expression vehicle can comprise a transcriptional unit comprising an assembly of (1) genetic regulatory elements necessary for gene expression in the host, including elements required to initiate and maintain transcription at a level sufficient for suitable expression of the desired polypeptide, including, for example, promoters and, where necessary, enhancers and a polyadenylation signal; (2) a structural or coding sequence which is transcribed into mRNA and translated into protein; and (3) appropriate signals to initiate translation at the beginning of the desired coding region and terminate translation at its end. Structural units intended for use in yeast or eukaryotic expression systems preferably include a leader sequence enabling extracellular secretion of translated protein by a host cell. Alternatively, where recombinant protein is expressed without a leader or transport sequence, it may include an N-terminal methionine residue. This residue may or may not be subsequently cleaved from the expressed recombinant protein to provide a final product.

“Recombinant expression system” means host cells which have stably integrated a recombinant transcriptional unit into chromosomal DNA or carry the recombinant transcriptional unit extrachromosomally. The cells can be prokaryotic or eukaryotic. Recombinant expression systems as defined

and preferably, a leader sequence capable of directing secretion of translated protein into the periplasmic space or extracellular medium. Optionally, the heterologous sequence can encode a fusion protein including an N-terminal identification peptide imparting desired characteristics, e.g., stabilization or simplified purification of expressed recombinant product.

Useful expression vectors for bacterial use are constructed by inserting a structural DNA sequence encoding a desired protein together with suitable translation initiation and termination signals in operable reading phase with a functional promoter. The vector will comprise one or more phenotypic selectable markers and an origin of replication to ensure maintenance of the vector and, when desirable, provide amplification within the host.

Suitable prokaryotic hosts for transformation include strains of Staphylococcus aureus, E. coli, B. subtilis, Salmonella typhimurium and various species within the genera Pseudomonas, Streptomyces, and Staphylococcus. Others may also be employed as a matter of choice.

As a representative but non-limiting example, useful expression vectors for bacterial use can comprise a selectable marker and bacterial origin of replication derived from commercially available plasmids comprising genetic elements of the well-known cloning vector pBR322 (ATCC 37017). Such commercial vectors include, for example, pKK223-3 (available from Pharmacia Fine Chemicals, Uppsala, Sweden) and GEM 1 (available from Promega Biotec, Madison, Wis., USA). These pBR322 “backbone” sections are combined with an appropriate promoter and the structural sequence to be expressed.

Following transformation of a suitable host strain and growth of the host strain to an appropriate cell density, the selected promoter, where it is inducible, is derepressed or induced by appropriate means (e.g., temperature shift or chemical induction), and cells are cultured for an additional period to provide for expression of the induced gene product. Thereafter cells are typically harvested, generally by centrifugation, and disrupted to release expressed protein, generally by physical or chemical means. The resulting crude extract is retained for further purification.

Various mammalian cell culture systems can also be employed to express recombinant protein. Examples of mammalian expression systems include the COS-7 lines of monkey kidney fibroblasts, described in Gluzman, Cell 23:175 (1981), and other cell lines capable of expressing a compatible vector, for example, the C127, 3T3, CHO, HeLa and BHK cell lines.

Mammalian expression vectors will comprise an origin of replication, a suitable promoter and enhancer, and also any necessary ribosome binding sites, polyadenylation site, splice donor and acceptor sites, transcriptional termination sequences, and 5' flanking nontranscribed sequences. DNA sequences derived from the SV40 viral genome, for example, SV40 origin, early promoter, enhancer, splice, and polyadenylation sites, may be used to provide the required nontranscribed genetic elements.

Recombinant polypeptides and proteins produced in bacterial culture are usually isolated by initial extraction from cell pellets, followed by one or more salting-out, aqueous ion exchange or size exclusion chromatography steps. Microbial cells employed in expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents. Protein refolding steps can be used, as necessary, in completing configuration of the mature protein. Finally, high performance liquid chromatography (HPLC) can be employed for final purification steps.

An additional aspect of the invention includes Staphylococcus aureus polypeptides which are useful as immunodiagnostic antigens and/or immunoprotective vaccines, collectively “immunologically useful polypeptides”. Such immunologically useful polypeptides may be selected from the ORFs disclosed herein based on techniques well known in the art and described elsewhere herein. The inventors have used the following criteria to select several immunologically useful polypeptides:

As is known in the art, an amino terminal type I signal sequence directs a nascent protein across the plasma and outer membranes to the exterior of the bacterial cell. Such outer membrane polypeptides are expected to be immunologically useful. According to Izard, J. W. et al., Mol. Microbiol. 13, 765–773 (1994), polypeptides containing type I signal sequences have the following physical attributes: the type I signal sequence is approximately 15 to 25 primarily hydrophobic amino acid residues long, with a net positive charge at the extreme amino terminus; the central region of the signal sequence must adopt an alpha-helical conformation in a hydrophobic environment; and the region surrounding the actual site of cleavage is ideally six residues long, with small side-chain amino acids in the -1 and -3 positions.

Also known in the art is the type IV signal sequence, which is an example of the several types of functional signal sequences which exist in addition to the type I signal sequence detailed above. Although functionally related, the type IV signal sequence possesses a unique set of biochemical and physical attributes (Strom, M. S. and Lory, S., J. Bacteriol. 174, 7345–7351; 1992). These are typically six to eight amino acids with a net basic charge followed by an additional sixteen to thirty primarily hydrophobic residues. The cleavage site of a type IV signal sequence is typically after the initial six to eight amino acids at the extreme amino terminus. In addition, all type IV signal sequences contain a phenylalanine residue at the +1 site relative to the cleavage site.

Studies of the cleavage sites of twenty-six bacterial lipoprotein precursors have allowed the definition of a consensus amino acid sequence for lipoprotein cleavage. Nearly three-fourths of the bacterial lipoprotein precursors examined contained the sequence L-(A,S)-(G,A)-C at positions -3 to +1, relative to the point of cleavage (Hayashi, S. and Wu, H. C. Lipoproteins in bacteria. J. Bioenerg. Biomembr. 22, 451–471; 1990).

It is well known that most anchored proteins found on the surface of gram-positive bacteria possess a highly conserved carboxy terminal sequence. More than fifty such proteins from organisms such as S. pyogenes, S. mutans, E. faecalis, S. pneumoniae, and others have been identified based on their extracellular location and carboxy terminal amino acid sequence (Fischetti, V. A. Gram-positive commensal bacteria deliver antigens to elicit mucosal and systemic immunity. ASM News 62, 405–410; 1996). The conserved region consists of six charged amino acids at the extreme carboxy terminus coupled to 15–20 hydrophobic amino acids presumed to function as a transmembrane domain. Immediately adjacent to the transmembrane domain is a six amino acid sequence conserved in nearly all proteins examined. The amino acid sequence of this region is L-P-X-T-G-X (SEQ ID NO:5256), where X is any amino acid.

Amino acid sequence similarities to proteins of known function, found by BLAST, enable the assignment of putative functions to novel amino acid sequences and allow for the selection of proteins thought to function outside the cell wall. Such proteins are well known in the art and include “lipoprotein”, “periplasmic”, or “antigen”.

An algorithm for selecting antigenic and immunogenic Staphylococcus aureus polypeptides including the foregoing criteria was developed by the present inventors. Use of the algorithm by the inventors to select immunologically useful Staphylococcus aureus polypeptides resulted in the selection of several ORFs which are predicted to be outer membrane associated proteins. These proteins are identified below, and shown in the Sequence Listing as SEQ ID NOS:5,192 to 5,255. Thus the amino acid sequence of each of several antigenic Staphylococcus aureus polypeptides can be determined, for example, by locating the amino acid sequence of the ORF in the Sequence Listing. Likewise, the polynucleotide sequence encoding each ORF can be found by locating the corresponding polynucleotide SEQ ID in Tables 1, 2, or 3, and finding the corresponding nucleotide sequence in the Sequence Listing.

As will be appreciated by those of ordinary skill in the art, although a polypeptide representing an entire ORF may be the closest approximation to a protein found in vivo, it is not always technically practical to express a complete ORF in vitro. It may be very challenging to express and purify a highly hydrophobic protein by common laboratory methods. As a result, the immunologically useful polypeptides described herein as SEQ ID NOS:5,192–5,255 may have been modified slightly to simplify the production of recombinant protein, and are the preferred embodiments. In general, nucleotide sequences which encode highly hydrophobic domains, such as those found at the amino terminal signal sequence, are excluded for enhanced in vitro expression of the polypeptides. Furthermore, any highly hydrophobic amino acid sequences occurring at the carboxy terminus are also excluded. Such truncated polypeptides include, for example, the mature forms of the polypeptides expected to exist in nature.

Those of ordinary skill in the art can identify soluble portions of the polypeptide and, in the case of the truncated polypeptide sequences shown as SEQ ID NOS:5,192–5,255, may obtain the complete predicted amino acid sequence of each polypeptide by translating the corresponding polynucleotide sequence of the corresponding ORF listed in Tables 1, 2 and 3 and found in the Sequence Listing.

Accordingly, polypeptides comprising the complete amino acid sequence of an immunologically useful polypeptide selected from the group of polypeptides encoded by the ORFs shown as SEQ ID NOS:5,192–5,255, or an amino acid sequence at least 95% identical thereto, preferably at least 97% identical thereto, and most preferably at least 99% identical thereto, form an embodiment of the invention. In addition, polypeptides comprising an amino acid sequence selected from the group of amino acid sequences shown in the Sequence Listing as SEQ ID NOS:5,191–5,255, or an amino acid sequence at least 95% identical thereto, preferably at least 97% identical thereto and most preferably 99% identical thereto, form an embodiment of the invention. Polynucleotides encoding the foregoing polypeptides also form part of the invention.

In another aspect, the invention provides a peptide or polypeptide comprising an epitope-bearing portion of a polypeptide of the invention, particularly those epitope-bearing portions (antigenic regions) identified in the Sequence Listing as SEQ ID NOS:5,191–5,255. The epitope-bearing portion is an immunogenic or antigenic epitope of a polypeptide of the invention. An “immunogenic epitope” is defined as a part of a protein that elicits an antibody response when the whole protein is the immunogen. On the other hand, a region of a protein molecule to which an antibody can bind is defined as an “antigenic epitope.” The number of immunogenic epitopes of a protein generally is less than the number of antigenic epitopes. See, for instance, Geysen et al., Proc. Natl. Acad. Sci. USA 81:3998–4002 (1983).

As to the selection of peptides or polypeptides bearing an antigenic epitope (i.e., that contain a region of a protein molecule to which an antibody can bind), it is well known in the art that relatively short synthetic peptides that mimic part of a protein sequence are routinely capable of eliciting an antiserum that reacts with the partially mimicked protein. See, for instance, Sutcliffe, J. G., Shinnick, T. M., Green, N. and Lerner, R. A. (1983) “Antibodies that react with predetermined sites on proteins”, Science 219:660–666. Peptides capable of eliciting protein-reactive sera are frequently represented in the primary sequence of a protein, can be characterized by a set of simple chemical rules, and are confined neither to immunodominant regions of intact proteins (i.e., immunogenic epitopes) nor to the amino or carboxyl termini.

Antigenic epitope-bearing peptides and polypeptides of the invention are therefore useful to raise antibodies, including monoclonal antibodies, that bind specifically to a polypeptide of the invention. See, for instance, Wilson et al., Cell 37:767–778 (1984) at 777.

Antigenic epitope-bearing peptides and polypeptides of the invention preferably contain a sequence of at least seven, more preferably at least nine and most preferably between about 15 to about 30 amino acids contained within the amino acid sequence of a polypeptide of the invention. Non-limiting examples of antigenic polypeptides or peptides that can be used to generate S. aureus-specific antibodies include a polypeptide comprising the peptides shown below. These polypeptide fragments have been determined to bear antigenic epitopes of the indicated S. aureus proteins by analysis of the Jameson-Wolf antigenic index.

The epitope-bearing peptides and polypeptides of the invention may be produced by any conventional means. See, e.g., Houghten, R. A. (1985) General method for the rapid solid-phase synthesis of large numbers of peptides: specificity of antigen-antibody interaction at the level of individual amino acids. Proc. Natl. Acad. Sci. USA 82:5131–5135; this “Simultaneous Multiple Peptide Synthesis (SMPS)” process is further described in U.S. Pat. No. 4,631,211 to Houghten et al. (1986). Epitope-bearing peptides and polypeptides of the invention are used to induce antibodies according to methods well known in the art. See, for instance, Sutcliffe et al., supra; Wilson et al., supra; Chow, M. et al., Proc. Natl. Acad. Sci. USA 82:910–914; and Bittle, F. J. et al., J. Gen. Virol. 66:2347–2354 (1985).

Immunogenic epitope-bearing peptides of the invention, i.e., those parts of a protein that elicit an antibody response when the whole protein is the immunogen, are identified according to methods known in the art. See, for instance, Geysen et al., supra. Further still, U.S. Pat. No. 5,194,392 to Geysen (1990) describes a general method of detecting or determining the sequence of monomers (amino acids or other compounds) which is a topological equivalent of the epitope (i.e., a “mimotope”) which is complementary to a particular paratope (antigen binding site) of an antibody of interest. More generally, U.S. Pat. No. 4,433,092 to Geysen (1989) describes a method of detecting or determining a sequence of monomers which is a topographical equivalent of a ligand which is complementary to the ligand binding site of a particular receptor of interest. Similarly, U.S. Pat. No. 5,480,971 to Houghten, R. A. et al. (1996) on Peralkylated Oligopeptide Mixtures discloses linear C1–C7-alkyl peralkylated oligopeptides and sets and libraries of such peptides, as well as methods for using such oligopeptide sets and libraries for determining the sequence of a peralkylated oligopeptide that preferentially binds to an acceptor molecule of interest. Thus, non-peptide analogs of the epitope-bearing peptides of the invention also can be made routinely by these methods.

Immunologically useful polypeptides may be identified by an algorithm which locates novel Staphylococcus aureus outer membrane proteins, as is described above. Also listed are epitopes or “antigenic regions” of each of the identified polypeptides. The antigenic regions, or epitopes, are delineated by two numbers x-y, where x is the number of the first amino acid in the open reading frame included within the epitope and y is the number of the last amino acid in the open reading frame included within the epitope. For example, the first epitope in ORF 168-6 consists of amino acids 36 to 45 of SEQ ID NO:5,192. The inventors have identified several epitopes for each of the antigenic polypeptides identified. Accordingly, forming part of the present invention are polypeptides comprising an amino acid sequence of one or more of the antigenic regions identified. The invention further provides polynucleotides encoding such polypeptides.

The present invention further includes isolated polypeptides, proteins and nucleic acid molecules which are substantially equivalent to those herein described. As used herein, substantially equivalent can refer both to nucleic acid and amino acid sequences, for example a mutant sequence, that varies from a reference sequence by one or more substitutions, deletions, or additions, the net effect of which does not result in an adverse functional dissimilarity between reference and subject sequences. For purposes of the present invention, sequences having equivalent biological activity and equivalent expression characteristics are considered substantially equivalent. For purposes of determining equivalence, truncation of the mature sequence should be disregarded.

The invention further provides methods of obtaining homologs from other strains of Staphylococcus aureus, of the fragments of the Staphylococcus aureus genome of the present invention and homologs of the proteins encoded by the ORFs of the present invention. As used herein, a sequence or protein of Staphylococcus aureus is defined as a homolog of a fragment of the Staphylococcus aureus fragments or contigs, or of a protein encoded by one of the ORFs of the present invention, if it shares significant homology to one of the fragments of the Staphylococcus aureus genome of the present invention or to a protein encoded by one of the ORFs of the present invention. Specifically, by using the sequence disclosed herein as a probe or as primers, and techniques such as PCR cloning and colony/plaque hybridization, one skilled in the art can obtain homologs.

As used herein, two nucleic acid molecules or proteins are said to “share significant homology” if the two contain regions which possess greater than 85% sequence (amino acid or nucleic acid) homology. Preferred homologs in this regard are those with more than 90% homology. Especially preferred are those with 93% or more homology. Among especially preferred homologs, those with 95% or more homology are particularly preferred. Very particularly preferred among these are those with 97%, and even more particularly preferred among those are homologs with 99% or more homology. The most preferred homologs among these are those with 99.9% homology or more. It will be understood that, among measures of homology, identity is particularly preferred in this regard.

Region-specific primers or probes derived from the nucleotide sequence provided in SEQ ID NOS:1–5,191, or from a nucleotide sequence at least 95%, particularly at least 99%, especially at least 99.5% identical to a sequence of SEQ ID NOS:1–5,191, can be used to prime DNA synthesis and PCR amplification, as well as to identify colonies containing cloned DNA encoding a homolog. Methods suitable to this aspect of the present invention are well known and have been described in great detail in many publications such as, for example, Innis et al., PCR PROTOCOLS, Academic Press, San Diego, Calif. (1990).

When using primers derived from SEQ ID NOS:1–5,191 or from a nucleotide sequence having an aforementioned identity to a sequence of SEQ ID NOS:1–5,191, one skilled in the art will recognize that by employing high stringency conditions (e.g., annealing at 50–60° C. in 6X SSPC and 50% formamide, and washing at 50–65° C. in 0.5X SSPC) only sequences which are greater than 75% homologous to the primer will be amplified. By employing lower stringency conditions (e.g., hybridizing at 35–37° C. in 5X SSPC and 40–45% formamide, and washing at 42° C. in 0.5X SSPC), sequences which are greater than 40–50% homologous to the primer will also be amplified.

When using DNA probes derived from SEQ ID NOS:1–5,191, or from a nucleotide sequence having an aforementioned identity to a sequence of SEQ ID NOS:1–5,191, for colony/plaque hybridization, one skilled in the art will recognize that by employing high stringency conditions (e.g., hybridizing at 50–65° C. in 5X SSPC and 50% formamide, and washing at 50–65° C. in 0.5X SSPC), sequences having regions which are greater than 90% homologous to the probe can be obtained, and that by employing lower stringency conditions (e.g., hybridizing at 35–37° C. in 5X SSPC and 40–45% formamide, and washing at 42° C. in 0.5X SSPC), sequences having regions which are greater than 35–45% homologous to the probe will be obtained.

Any organism can be used as the source for homologs of the present invention so long as the organism naturally expresses such a protein or contains genes encoding the same. The most preferred organisms for isolating homologs are bacteria which are closely related to Staphylococcus aureus.

ILLUSTRATIVE USES OF COMPOSITIONS OF THE INVENTION

Each ORF provided in Tables 1 and 2 is identified with a function by homology to a known gene or polypeptide. As a result, one skilled in the art can use the polypeptides of the present invention for commercial, therapeutic and industrial purposes consistent with the type of putative identification of the polypeptide. Such identifications permit one skilled in the art to use the Staphylococcus aureus ORFs in a manner similar to the known type of sequences for which the identification is made; for example, to ferment a particular sugar source or to produce a particular metabolite. A variety of reviews illustrative of this aspect of the invention are available, including the following reviews on the industrial use of enzymes: BIOCHEMICAL ENGINEERING AND BIOTECHNOLOGY HANDBOOK, 2nd Ed., Macmillan Publications, Ltd. N.Y. (1991) and BIOCATALYSTS IN ORGANIC SYNTHESES, Tramper et al., Eds., Elsevier Science Publishers, Amsterdam, The Netherlands (1985). A variety of exemplary uses that illustrate this and similar aspects of the present invention are discussed below.

1. Biosynthetic Enzymes

Open reading frames encoding proteins involved in mediating the catalytic reactions involved in intermediary and macromolecular metabolism, the biosynthesis of small molecules, cellular processes and other functions, including enzymes involved in the degradation of the intermediary products of metabolism, enzymes involved in central intermediary metabolism, enzymes involved in respiration (both aerobic and anaerobic), enzymes involved in fermentation, enzymes involved in ATP proton motor force conversion, enzymes involved in broad regulatory function, enzymes involved in amino acid synthesis, enzymes involved in nucleotide synthesis, and enzymes involved in and vitamin synthesis, can be used for industrial biosynthesis.

The various metabolic pathways present in Staphylococcus aureus can be identified based on absolute nutritional requirements as well as by examining the various enzymes identified in Tables 1–3 and SEQ ID NOS:1–5,191.

Of particular interest are polypeptides involved in the degradation of intermediary metabolites as well as non-macromolecular metabolism. Such enzymes include amylases, glucose oxidases, and catalase.

Proteolytic enzymes are another class of commercially important enzymes. Proteolytic enzymes find use in a number of industrial processes, including the processing of flax and other vegetable fibers, the extraction, clarification and depectinization of fruit juices, the extraction of vegetable oil and the maceration of fruits and vegetables to give unicellular fruits. A detailed review of the proteolytic enzymes used in the food industry is provided in Rombouts et al., Symbiosis 21:79 (1986) and Voragen et al. in BIOCATALYSTS IN AGRICULTURAL BIOTECHNOLOGY, Whitaker et al., Eds., American Chemical Society Symposium Series 389:93 (1989).

The metabolism of sugars is an important aspect of the primary metabolism of Staphylococcus aureus. Enzymes involved in the degradation of sugars, such as, particularly, glucose, galactose, fructose and xylose, can be used in industrial fermentation. Some of the important sugar transforming enzymes, from a commercial viewpoint, include sugar isomerases such as glucose isomerase. Other metabolic enzymes have found commercial use, such as glucose oxidases, which produce ketogulonic acid (KGA). KGA is an intermediate in the commercial production of ascorbic acid using Reichstein's procedure, as described in Krueger et al., Biotechnology 6(A), Rhine et al., Eds., Verlag Press, Weinheim, Germany (1984).

Glucose oxidase (GOD) is commercially available and has been used in purified form as well as in an immobilized form for the deoxygenation of beer. See, for instance, Hartmeir et al., Biotechnology Letters 1:21 (1979). The most important application of GOD is the industrial scale fermentation of gluconic acid. Gluconic acids are marketed for use in the detergent, textile, leather, photographic, pharmaceutical, food, feed and concrete industries, as described, for example, in Bigelis et al., beginning on page 357 in GENE MANIPULATIONS AND FUNGI, Benett et al., Eds., Academic Press, New York (1985). In addition to industrial applications, GOD has found applications in medicine for quantitative determination of glucose in body fluids and, more recently, in biotechnology for analyzing syrups from starch and cellulose hydrolysates. This application is described in Owusu et al., Biochem. et Biophysica Acta 872:83 (1986), for instance.

The main sweetener used in the world today is sugar, which comes from sugar beets and sugar cane. In the field of industrial enzymes, the glucose isomerase process shows the largest expansion in the market today. Initially, soluble enzymes were used, and later immobilized enzymes were developed (Krueger et al., Biotechnology, The Textbook of Industrial Microbiology, Sinauer Associated Incorporated, Sunderland, Mass. (1990)). Today, the use of glucose isomerase to produce high fructose syrups is by far the largest industrial business using immobilized enzymes. A review of the industrial use of these enzymes is provided by Jorgensen, Starch 40:307 (1988).

Proteinases, such as alkaline serine proteinases, are used as detergent additives and thus represent one of the largest volumes of microbial enzymes used in the industrial sector. Because of their industrial importance, there is a large body of published and unpublished information regarding the use of these enzymes in industrial processes. (See Faultman et al., Acid Proteases Structure Function and Biology, Tang, J., ed., Plenum Press, New York (1977); Godfrey et al., Industrial Enzymes, MacMillan Publishers, Surrey, UK (1983); and Hepner et al., Report Industrial Enzymes by 1990, Hel Hepner & Associates, London (1986).)

Another class of commercially usable proteins of the present invention are the microbial lipases, described by, for instance, Macrae et al., Philosophical Transactions of the Royal Society of London 310:227 (1985) and Poserke, Journal of the American Oil Chemist Society 61:1758 (1984). A major use of lipases is in the fat and oil industry for the production of neutral glycerides using lipase-catalyzed inter-esterification of readily available triglycerides. Applications of lipases include use as a detergent additive to facilitate the removal of fats from fabrics in the course of washing procedures.

The use of enzymes, and in particular microbial enzymes, as catalysts for key steps in the synthesis of complex organic molecules is gaining popularity at a great rate. One area of great interest is the preparation of chiral intermediates. Preparation of chiral intermediates is of interest to a wide range of synthetic chemists, particularly those scientists involved with the preparation of new pharmaceuticals, agrochemicals, fragrances and flavors. (See Davies et al., Recent Advances in the Generation of Chiral Intermediates Using Enzymes, CRC Press, Boca Raton, Fla. (1990).) The following reactions catalyzed by enzymes are of interest to organic chemists: hydrolysis of carboxylic acid esters, phosphate esters, amides and nitriles; esterification reactions; trans-esterification reactions; synthesis of amides; reduction of alkanones and oxoalkanoates; oxidation of alcohols to carbonyl compounds; oxidation of sulfides to sulfoxides; and carbon bond forming reactions such as the aldol reaction.

When considering the use of an enzyme encoded by one of the ORFs of the present invention for biotransformation and organic synthesis, it is sometimes necessary to consider the respective advantages and disadvantages of using a microorganism as opposed to an isolated enzyme. The pros and cons of using a whole cell system on the one hand, or an isolated partially purified enzyme on the other hand, have been described in detail by Bud et al., Chemistry in Britain (1987), p. 127.

Aminotransferases, enzymes involved in the biosynthesis and metabolism of amino acids, are useful in the catalytic production of amino acids. The advantage of using microbial-based enzyme systems is that the aminotransferase enzymes catalyze the stereoselective synthesis of only L-amino acids and generally possess uniformly high catalytic rates. A description of the use of aminotransferases for amino acid production is provided by Roselle-David, Methods of Enzymology 136:479 (1987).

Another category of useful proteins encoded by the ORFs of the present invention includes enzymes involved in nucleic acid synthesis, repair, and recombination. A variety of commercially important enzymes have previously been isolated from Staphylococcus aureus. These include Sau3A and Sau96I.

2. Generation of Antibodies

As described here, the proteins of the present invention, as well as homologs thereof, can be used in a variety of procedures and methods known in the art which are currently applied to other proteins. The proteins of the present invention can further be used to generate an antibody which selectively binds the protein. Such antibodies can be either monoclonal or polyclonal antibodies, as well as fragments of these antibodies, and humanized forms.

The invention further provides antibodies which selectively bind to one of the proteins of the present invention, and hybridomas which produce these antibodies. A hybridoma is an immortalized cell line which is capable of secreting a specific monoclonal antibody.

In general, techniques for preparing polyclonal and monoclonal antibodies, as well as hybridomas capable of producing the desired antibody, are well known in the art (Campbell, A. M., MONOCLONAL ANTIBODY TECHNOLOGY: LABORATORY TECHNIQUES IN BIOCHEMISTRY AND MOLECULAR BIOLOGY, Elsevier Science Publishers, Amsterdam, The Netherlands (1984); St. Groth et al., J. Immunol. Methods 35:1–21 (1980); Kohler and Milstein, Nature 256:495–497 (1975)), as are the trioma technique and the human B-cell hybridoma technique (Kozbor et al., Immunology Today 4:72 (1983); pgs. 77–96 of Cole et al., in MONOCLONAL ANTIBODIES AND CANCER THERAPY, Alan R. Liss, Inc. (1985)).

Any animal (mouse, rabbit, etc.) which is known to produce antibodies can be immunized with the pseudogene polypeptide. Methods for immunization are well known in the art. Such methods include subcutaneous or intraperitoneal injection of the polypeptide. One skilled in the art will recognize that the amount of the protein encoded by the ORF of the present invention used for immunization will vary based on the animal which is immunized, the antigenicity of the peptide and the site of injection.

The protein which is used as an immunogen may be modified or administered in an adjuvant in order to increase the protein's antigenicity. Methods of increasing the antigenicity of a protein are well known in the art and include, but are not limited to, coupling the antigen with a heterologous protein (such as globulin or galactosidase) or the inclusion of an adjuvant during immunization.

For monoclonal antibodies, spleen cells from the immunized animals are removed, fused with myeloma cells, such as SP2/0-Ag14 myeloma cells, and allowed to become monoclonal antibody producing hybridoma cells.

Any one of a number of methods well known in the art can be used to identify the hybridoma cell which produces an antibody with the desired characteristics. These include screening the hybridomas with an ELISA assay, western blot analysis, or radioimmunoassay (Lutz et al., Exp. Cell Res. 175:109–124 (1988)).

Hybridomas secreting the desired antibodies are cloned, and the class and subclass are determined using procedures known in the art (Campbell, A. M., Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1984)).

Techniques described for the production of single chain antibodies (U.S. Pat. No. 4,946,778) can be adapted to produce single chain antibodies to proteins of the present invention.

For polyclonal antibodies, antibody-containing antiserum is isolated from the immunized animal and is screened for the presence of antibodies with the desired specificity using one of the above-described procedures.

The present invention further provides the above-described antibodies in detectably labelled form. Antibodies can be detectably labelled through the use of radioisotopes, affinity labels (such as biotin, avidin, etc.), enzymatic labels (such as horseradish peroxidase, alkaline phosphatase, etc.), fluorescent labels (such as FITC or rhodamine, etc.), paramagnetic atoms, etc. Procedures for accomplishing such labelling are well known in the art; see, for example, Sternberger et al., J. Histochem. Cytochem. 18:315 (1970); Bayer, E. A. et al., Meth. Enzym. 62:308 (1979); Engval, E. et al., Immunol. 109:129 (1972); Goding, J. W., J. Immunol. Meth. 13:215 (1976).

The labeled antibodies of the present invention can be used for in vitro, in vivo, and in situ assays to identify cells or tissues in which a fragment of the Staphylococcus aureus genome is expressed.

The present invention further provides the above-described antibodies immobilized on a solid support. Examples of such solid supports include plastics such as polycarbonate, complex carbohydrates such as agarose and Sepharose, acrylic resins such as polyacrylamide, and latex beads. Techniques for coupling antibodies to such solid supports are well known in the art (Weir, D. M. et al., “Handbook of Experimental Immunology” 4th Ed., Blackwell Scientific Publications, Oxford, England, Chapter 10 (1986); Jacoby, W. D. et al., Meth. Enzym. 34, Academic Press, N.Y. (1974)). The immobilized antibodies of the present invention can be used for in vitro, in vivo, and in situ assays as well as for immunoaffinity purification of the proteins of the present invention.

3.

Conditions for incubating a DF, antigen or antibody with a test sample vary. Incubation conditions depend on the format employed in the assay, the detection methods employed, and the type and nature of the DF or antibody used in the assay. One skilled in the art will recognize that any one of the commonly available hybridization, amplification or immunological assay formats can readily be adapted to employ the DFs, antigens or antibodies of the present invention. Examples of such assays can be found in Chard, T., An Introduction to Radioimmunoassay and Related Techniques, Elsevier Science Publishers, Amsterdam, The Netherlands (1986); Bullock, G. R. et al., Techniques in Immunocytochemistry, Academic Press, Orlando, Fla., Vol. 1 (1982), Vol. 2 (1983), Vol. 3 (1985); Tijssen, P., Practice and Theory of Enzyme Immunoassays: Laboratory Techniques in Biochemistry and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1985); and PCT publication WO95/32291, all of which are hereby incorporated herein by reference.

The test samples of the present invention include cells, protein or membrane extracts of cells, or biological fluids such as sputum, blood, serum, plasma, or urine. The test sample used in the above-described method will vary based on the assay format, the nature of the detection method and the tissues, cells or extracts used as the sample to be assayed. Methods for preparing protein extracts or membrane extracts of cells are well known in the art and can be readily adapted in order to obtain a sample which is compatible with the system utilized.

In another embodiment of the present invention, kits are provided which contain the necessary reagents to carry out the assays of the present invention. Specifically, the invention provides a compartmentalized kit to receive, in close confinement, one or more containers, which comprises: (a) a first container comprising one of the DFs, antigens or antibodies of the present invention; and (b) one or more other containers comprising one or more of the following: wash reagents, and reagents capable of detecting the presence of a bound DF, antigen or antibody.

In detail, a compartmentalized kit includes any kit in which reagents are contained in separate containers. Such containers include small glass containers, plastic containers or strips of plastic or paper. Such containers allow one to efficiently transfer reagents from one compartment to another compartment such that the samples and reagents are not cross-contaminated, and the agents or solutions of each container can be added in a quantitative fashion from one compartment to another. Such containers will include a container which will accept the test sample, a container which contains the antibodies used in the assay, containers which contain wash reagents (such as phosphate buffered saline, Tris-buffers, etc.), and containers which contain the reagents used to detect the bound antibody, antigen or DF. Types of detection reagents include labelled nucleic acid probes, labelled secondary antibodies, or, in the alternative, if the primary antibody is labelled, the enzymatic or antibody binding reagents which are capable of reacting with the labelled antibody. One skilled in the art will readily recog
Diagnostic ASSays and Kits nize that the disclosed Dfs, antigens and antibodies of the The present invention further provides methods to iden present invention can be readily incorporated into one of the tify the expression of one of the ORFs of the present 60 established kit formats which are well known in the art. invention, or homolog thereof, in a test Sample, using one of 4. Screening ASSay for Binding Agents the DFS, antigens or antibodies of the present invention. Using the isolated proteins of the present invention, the In detail, Such methods comprise incubating a test Sample present invention further provides methods of obtaining and with one or more of the antibodies, or one or more of the identifying agents which bind to a protein encoded by one of DFS, or one or more antigens of the present invention and 65 the ORFs of the present invention or to one of the fragments assaying for binding of the DFS, antigens or antibodies to and the StaphylococcuS aureus fragment and contigs herein components within the test Sample. described. US 6,593,114 B1 29 30 In general, Such methods comprise Steps of: pharmaceutical compositions. AS used herein, the “pharma (a) contacting an agent with an isolated protein encoded ceutical agents of the present invention” refers the pharma by one of the ORFs of the present invention, or an ceutical agents which are derived from the proteins encoded isolated fragment of the StaphylococcuS aureuS by the ORFs of the present invention or are agents which are genome; and identified using the herein described assayS. (b) determining whether the agent binds to said protein or AS used herein, a pharmaceutical agent is Said to "modul Said fragment. 
late the growth or pathogenicity of StaphylococcuS aureuS or The agents Screened in the above assay can be, but are not a related organism, in Vivo or in vitro, when the agent limited to, peptides, carbohydrates, Vitamin derivatives, or reduces the rate of growth, rate of division, or viability of the other pharmaceutical agents. The agents can be Selected and organism in question. The pharmaceutical agents of the Screened at random or rationally Selected or designed using present invention can modulate the growth or pathogenicity protein modeling techniques. of an organism in many fashions, although an understanding For random Screening, agents Such as peptides, of the underlying mechanism of action is not needed to carbohydrates, pharmaceutical agents and the like are practice the use of the pharmaceutical agents of the present Selected at random and are assayed for their ability to bind 15 invention. Some agents will modulate the growth or patho to the protein encoded by the ORF of the present invention. genicity by binding to an important protein thus blocking the Alternatively, agents may be rationally Selected or biological activity of the protein, while other agents may designed. AS used herein, an agent is Said to be “rationally bind to a component of the Outer Surface of the organism Selected or designed when the agent is chosen based on the blocking attachment or rendering the organism more prone configuration of the particular protein. For example, one to act the bodies nature immune System. Alternatively, the skilled in the art can readily adapt currently available agent may comprise a protein encoded by one of the ORFs procedures to generate peptides, pharmaceutical agents and of the present invention and Serve as a vaccine. 
The devel the like capable of binding to a specific peptide Sequence in opment and use of vaccines derived from membrane asso order to generate rationally designed antipeptide peptides, ciated polypeptides are well known in the art. The inventors for example See Hurlby et al., Application of Synthetic 25 have identified particularly preferred immunogenic Staphy Peptides: Antisense Peptides,” In Synthetic Peptides, A lococcuS aureuS polypeptides for use as Vaccines. Such User's Guide, W. H. Freeman, N.Y. (1992), pp. 289-307, immunogenic polypeptides are described above and Sum and Kaspczak et al., Biochemistry 28:9230-8 (1989), or marized below. pharmaceutical agents, or the like. AS used herein, a "related organism' is a broad term In addition to the foregoing, one class of agents of the which refers to any organism whose growth or pathogenicity present invention, as broadly described, can be used to can be modulated by one of the pharmaceutical agents of the control gene expression through binding to one of the ORFs present invention. In general, Such an organism will contain or EMFs of the present invention. AS described above, Such a homolog of the protein which is the target of the pharma agents can be randomly Screened or rationally designed/ ceutical agent or the protein used as a vaccine. AS Such, selected. Targeting the ORF or EMF allows a skilled artisan 35 related organisms do not need to be bacterial but may be to design Sequence Specific or element Specific agents, fungal or viral pathogens. modulating the expression of either a Single ORF or multiple The pharmaceutical agents and compositions of the ORFs which rely on the same EMF for expression control. present invention may be administered in a convenient One class of DNA binding agents are agents which manner, Such as by the oral, topical, intravenous, contain base residues which hybridize or form a triple helix 40 intraperitoneal, intramuscular, Subcutaneous, intranasal or by binding to DNA or RNA. 
Such agents can be based on the intradermal routes. The pharmaceutical compositions are classic phosphodiester, ribonucleic acid backbone, or can be administered in an amount which is effective for treating a variety of sulfhydryl or polymeric derivatives which have and/or prophylaxis of the Specific indication. In general, they base attachment capacity. are administered in an amount of at least about 1 mg/kg body Agents Suitable for use in these methods usually contain 45 weight and in most cases they will be administered in an 20 to 40 bases and are designed to be complementary to a amount not in excess of about 1 g/kg body weight per day. region of the gene involved in transcription (triple helix In most cases, the dosage is from about 0.1 mg/kg to about see Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney et al., 10 g/kg body weight daily, taking into account the routes of Science 241:456 (1988); and Dervan et al., Science 251: administration, Symptoms, etc. 1360 (1991)) or to the mRNA itself (antisense-Okano, J. 50 The agents of the present invention can be used in native Neurochem. 56.560 (1991); Oligodeoxynucleotides as Anti form or can be modified to form a chemical derivative. As sense Inhibitors of Gene Expression, CRC Press, Boca used herein, a molecule is Said to be a “chemical derivative” Raton, Fla. (1988)). Triple helix- formation optimally results of another molecule when it contains additional chemical in a shut-off of RNA transcription from DNA, while anti moieties not normally a part of the molecule. Such moieties sense RNA hybridization blocks translation of an mRNA 55 may improve the molecule's Solubility, absorption, biologi molecule into polypeptide. Both techniques have been dem cal half life, etc. The moieties may alternatively decrease the onstrated to be effective in model systems. 
Information toxicity of the molecule, eliminate or attenuate any unde contained in the Sequences of the present invention can be sirable side effect of the molecule, etc. Moieties capable of used to design antisense and triple helix-forming mediating Such effects are disclosed in, among other oligonucleotides, and other DNA binding agents. 60 sources, REMINGTON'S PHARMACEUTICAL SCI 5. Pharmaceutical Compositions and Vaccines ENCES (1980) cited elsewhere herein. The present invention further provides pharmaceutical For example, Such moieties may change an immunologi agents which can be used to modulate the growth or patho cal character of the functional derivative, Such as affinity for genicity of StaphylococcuS aureus, or another related a given antibody. Such changes in immunomodulation activ organism, in Vivo or in vitro. AS used herein, a “pharma 65 ity are measured by the appropriate assay, Such as a com ceutical agent' is defined as a composition of matter which petitive type immunoassay. Modifications of Such protein can be formulated using known techniques to provide a properties as redox or thermal Stability, biological half-life, US 6,593,114 B1 31 32 hydrophobicity, Susceptibility to proteolytic degradation or Suitable for effective administration, Such compositions will the tendency to aggregate with carriers or into multimers contain an effective amount of one or more of the agents of also may be effected in this way and can be assayed by the present invention, together with a Suitable amount of methods well known to the skilled artisan. carrier vehicle. The therapeutic effects of the agents of the present inven Additional pharmaceutical methods may be employed to tion may be obtained by providing the agent to a patient by control the duration of action. Control release preparations any Suitable means (e.g., inhalation, intravenously, may be achieved through the use of polymers to complex or intramuscularly, Subcutaneously, enterally, or parenterally). 
absorb one or more of the agents of the present invention. It is preferred to administer the agent of the present inven The controlled delivery may be effectuated by a variety of tion So as to achieve an effective concentration within the well known techniques, including formulation with macro blood or tissue in which the growth of the organism is to be molecules Such as, for example, polyesters, polyamino controlled. To achieve an effective blood concentration, the acids, polyvinyl, pyrrollidone, ethyleneVinylacetate, preferred method is to administer the agent by injection. The methylcellulose, carboxymethylcellulose, or protamine, administration may be by continuous infusion, or by Single Sulfate, adjusting the concentration of the macromolecules or multiple injections. 15 and the agent in the formulation, and by appropriate use of In providing a patient with one of the agents of the present methods of incorporation, which can be manipulated to effectuate a desired time course of release. Another possible invention, the dosage of the administered agent will vary method to control the duration of action by controlled depending upon Such factors as the patient's age, weight, release preparations is to incorporate agents of the present height, Sex, general medical condition, previous medical invention into particles of a polymeric material Such as history, etc. In general, it is desirable to provide the recipient polyesters, polyamino acids, hydrogels, poly(lactic acid) or with a dosage of agent which is in the range of from about ethylene vinylacetate copolymers. Alternatively, instead of 1 pg/kg to 10 mg/kg (body weight of patient), although a incorporating these agents into polymeric particles, it is lower or higher dosage may be administered. 
The therapeu possible to entrap these materials in microcapsules prepared, tically effective dose can be lowered by using combinations for example, by coacervation techniqueS or by interfacial of the agents of the present invention or another agent. 25 polymerization with, for example, hydroxymethylcellulose AS used herein, two or more compounds or agents are said or gelatine-microcapsules and poly(methylmethacylate) to be administered "in combination' with each other when microcapsules, respectively, or in colloidal drug delivery either (1) the physiological effects of each compound, or (2) Systems, for example, liposomes, albumin microSpheres, the Serum concentrations of each compound can be mea microemulsions, nanoparticles, and nanocapsules or in mac Sured at the Same time. The composition of the present roemulsions. Such techniques are disclosed in REMING invention can be administered concurrently with, prior to, or TON'S PHARMACEUTICAL SCIENCES (1980). following the administration of the other agent. The invention further provides a pharmaceutical pack or The agents of the present invention are intended to be kit comprising one or more containers filled with one or provided to recipient Subjects in an amount Sufficient to more of the ingredients of the pharmaceutical compositions decrease the rate of growth (as defined above) of the target 35 organism. of the invention. ASSociated with Such container(s) can be a The administration of the agent(s) of the invention may be notice in the form prescribed by a governmental agency for either a “prophylactic' or “therapeutic' purpose. When regulating the manufacture, use or Sale of pharmaceuticals or provided prophylactically, the agent(s) are provided in biological products, which notice reflects approval by the agency of manufacture, use or Sale for human administra advance of any Symptoms indicative of the organisms 40 tion. growth. 
The prophylactic administration of the agent(s) In addition, the agents of the present invention may be Serves to prevent, attenuate, or decrease the rate of onset of employed in conjunction with other therapeutic compounds. any Subsequent infection. When provided therapeutically, 6. Shot-Gun Approach to Megabase DNA Sequencing the agent(s) are provided at (or shortly after) the onset of an The present invention further demonstrates that a large indication of infection. The therapeutic administration of the 45 Sequence can be sequenced using a random shotgun compound(s) serves to attenuate the pathological Symptoms approach. This procedure, described in detail in the of the infection and to increase the rate of recovery. examples that follow, has eliminated the up front cost of The agents of the present invention are administered to a isolating and ordering overlapping or contiguous Subclones Subject, Such as a mammal, or a patient, in a pharmaceuti prior to the Start of the Sequencing protocols. cally acceptable form and in a therapeutically effective 50 concentration. A composition is said to be “pharmacologi Certain aspects of the present invention are described in cally acceptable' if its administration can be tolerated by a greater detail in the examples that follow. The examples are recipient patient. Such an agent is said to be administered in provided by way of illustration. Other aspects and embodi a “therapeutically effective amount' if the amount admin ments of the present invention are contemplated by the istered is physiologically significant. An agent is physiologi 55 inventors, as will be clear to those of skill in the art from cally significant if its presence results in a detectable change reading the present disclosure. in the physiology of a recipient patient. ILLUSTRATIVE EXAMPLES The agents of the present invention can be formulated LIBRARIES AND SEQUENCING according to known methods to prepare pharmaceutically 1. 
Shotgun Sequencing Probability Analysis useful compositions, whereby these materials, or their func 60 The Overall Strategy for a shotgun approach to whole tional derivatives, are combined in admixture with a phar genome Sequencing follows from the Lander and Waterman maceutically acceptable carrier vehicle. Suitable vehicles (Landerman and Waterman, Genomics 2: 231 (1988)) appli and their formulation, inclusive of other human proteins, cation of the equation for the Poisson distribution. Accord e.g., human Serum albumin, are described, for example, in ing to this treatment, the probability, Po, that any given base REMINGTON'S PHARMACEUTICAL SCIENCES, 16" 65 in a Sequence of Size L, in nucleotides, is not Sequenced after Ed., Osol, A., Ed., Mack Publishing, Easton Pa. (1980). In a certain amount, n, in nucleotides, of random Sequence has order to form a pharmaceutically acceptable composition been determined can be calculated by the equation P=e",i. US 6,593,114 B1 33 34 where m is L/n, the fold coverage.” For instance, for a The final ligation to produce circles was carried out in a 50 genome of 2.8 Mb, m=1 when 2.8 Mb of sequence has been ul reaction containing 5 ul of V--i linears and 5 units of T4 randomly generated (1X coverage). At that point, Po-e'- at 14 C. overnight. After 10 min. at 70° C. the 0.37. The probability that any given base has not been following day, the reaction mixture was stored at -20° C. Sequenced is the same as the probability that any region of This two-stage procedure resulted in a molecularly ran the whole Sequence L has not been determined and, dom collection of Single-insert plasmid recombinants with therefore, is equivilent to the fraction of the whole Sequence minimal contamination from double-insert chimeras (<1%) that has yet to be determined. Thus, at one-fold coverage, or free vector (<3%). 
approximately 37% of a polynucleotide of size L, in nucle Since deviation from randomneSS can arise from propa otides has not been sequenced. When 14 Mb of sequence has gation the DNA in the host, E.coli host cells deficient in all been generated, coverage is 5X for a 0.2.8 Mb and the recombination and restriction functions (A. Greener, Strat unsequenced fraction drops to 0.0067 or 0.67%. 5X cover egies 3 (1):5 (1990)) were used to prevent rearrangements, age of a 2.8 Mb. Sequence can be attained by Sequencing deletions, and loSS of clones by restriction. Furthermore, approximately 17,000 random clones from both insert ends transformed cells were plated directly on antibiotic diffusion with an average Sequence read length of 410 bp. 15 plates to avoid the usual broth recovery phase which allows Similarly, the total gap length, G, is determined by the multiplication and Selection of the most rapidly growing equation G=Le", and the average gap size, g, follows the cells. equation, g=L/n. Thus, 5X coverage leaves about 240 gaps Plating was carried out as follows. A 100 ul aliquot of averaging about 82 bp in size in a Sequence of a. polynucle Epicurian Coli SURE II Supercompetent Cells (Stratagene otide 2.8 Mb long. 200152) was thawed on ice and transferred to a chilled The treatment above is essentially that of Lander and Falcon 2059 tube on ice. A 1.7 ul aliquot of 1.42M beta Waterman, Genomics 2: 231 (1988). mercaptoethanol was added to the aliquot of cells to a final 2. Random Library Construction concentration of 25 mM. Cells were incubated on ice for 10 In order to approximate the random model described min. A 1 ul aliquot of the final ligation was added to the cells above during actual Sequencing, a nearly ideal library of 25 and incubated on ice for 30 min. The cells were heat pulsed cloned genomic fragments is required. The following library for 30 sec. at 42 C. and placed back on ice for 2 min. The construction procedure was developed to achieve this end. 
outgrowth period in liquid culture was eliminated from this Staphylococcus aureus DNA was prepared by phenol protocol in order to minimize the preferential growth of any extraction. A mixture containing 600 ug DNA in 3.3 ml of given transformed cell. Instead the transformation mixture 300 mM sodium acetate, 10 mM Tris-HCl, 1 mM Na-EDTA, was plated directly on a nutrient rich SOB plate containing 30% glycerol was Sonicated for 1 min. at 0°C. in a Branson a 5 ml bottom layer of SOB agar (5% SOB agar: 20 g Model 450 Sonicator at the lowest energy setting using a 3 tryptone, 5 g yeast extract, 0.5g NaCl, 1.5% Difco Agar per mm probe. The Sonicated DNA was ethanol precipitated and liter of media). The 5 ml bottom layer is Supplemented with redissolved in 500 ul TE buffer. 0.4 ml of 50 mg/ml ampicillin per 100 ml SOBagar. The 15 To create blunt-ends, a 100 ul aliquot of the resuspended 35 ml top layer of SOB agar is supplemented with 1 ml X-Gal DNA was digested with 5 units of BAL31 nuclease (New (2%), 1 ml MgCl (1M), and 1 ml MgSO/100 ml SOBagar. England BioLabs) for 10 min at 30° C. in 200 ul BAL3 1 The 15 ml top layer was poured just prior to plating. Our titer buffer. The digested DNA was phenol-extracted, ethanol was approximately 100 colonies/10 ul aliquot of transfor precipitated, redissolved in 100 ul TE buffer, and then mation. size-fractionated by electrophoresis through a 1.0% low 40 All colonies were picked for template preparation regard melting temperature agarose gel. The Section containing less of size. Thus, only clones lost due to “poison' DNA or DNA fragments 1.6-2.0 kb in size was excised from the gel, deleterious gene products would be deleted from the library, and the LGT agarose was melted and the resulting Solution resulting in a slight increase in gap number over that was extracted with phenol to Separate the agarose from the expected. DNA. DNA was ethanol precipitated and redissolved in 20 45 3. 
Random DNA Sequencing ul of TE buffer for ligation to vector. High quality double stranded DNA plasmid templates A two-step ligation procedure was used to produce a were prepared using an alkaline lysis method developed in plasmid library with 97% inserts, of which >99% were collaboration with 5Prime->3Prime Inc. (Boulder, Colo.). Single inserts. The first ligation mixture (50 ul) contained 2 Plasmid preparation was performed in a 96-well format for ug of DNA fragments, 2 ug pUC 18 DNA (Pharmacia) cut 50 all Stages of DNA preparation from bacterial growth through with Smal and dephosphorylated with bacterial alkaline final DNA purification. Average template concentration was phosphatase, and 10 units of T4 ligase (GIBCO/BRL) and determined by running 25% of the Samples on an agarose was incubated at 14 C. for 4 hr. The ligation mixture then gel. DNA concentrations were not adjusted. was phenol eXtracted and ethanol precipitated, and the Templates were also prepared from a StaphylococcuS precipitated DNA was dissolved in 20 ul TE buffer and 55 aureuS lambda genomic library. An unamplified library was electrophoresed on a 1.0% low melting agarose gel. Discrete constructed in Lambda DASH II vector (Stratagene). Sta bands in a ladder were visualized by ethidium bromide phylococcus aureus DNA (>100 kb) was partially digested Staining and UV illumination and identified by Size as insert in a reaction mixture (200 ul) containing 50 ug DNA, 1X (i), vector (V), V+i, V+2i, V+3i, etc. The portion of the gel Sau3AI buffer, 20 units Sau3AI for 6 min. at 23 C. The containing v+i DNA was excised and the v+i DNA was 60 digested DNA was phenol-extracted and centrifuges over a recovered and resuspended into 20 ulTE. The v+i DNA then 10-40%. Sucroce gradient. Fractions containing genomic was blunt-ended by T4 polymerase treatment for 5 min. at DNA of 15-25 kb were recovered by precipitation. One ul 37 C. 
in a reaction mixture (50 ul) containing the v+i of fragments was used with 1 ul of DASHII vector linears, 500 uM each of the 4 dNTPs, and 9 units of T4 (Stratagene) in the recommended ligation reaction. One ul of polymerase (New England BioLabs), under recommended 65 the ligation mixture was used per packaging reaction fol buffer conditions. After phenol eXtraction and ethanol pre lowing the recommended protocol with the Gigapack II XL cipitation the repaired v+ilinears were dissolved in 20 ulTE. Packaging Extract Phage were plated directly without ampli US 6,593,114 B1 35 36 fication from the packaging mixture (after dilution with 500 hours (ABI 377) following the manufacturer's protocols. ul of recommended SM buffer and chloroform treatment). Following electrophoresis and fluorescence detection, the Yield was about 2.5x10 pfu?ul. ABI 373 or ABI 377 performs automatic lane tracking and An amplified library was prepared from the primary base-calling. The lane-tracking was confirmed Visually. packaging mixture according to the manufactureer's proto Each sequence electropherogram (or fluorescence lane col. The amplified library is stored frozen in 7% dimethyl trace) was inspected Visually and assessed for quality. Trail sulfoxide. The phage titer is approximately 1x10 pfu/ml. ing Sequences of low quality were removed and the Mini-liquid lysates (0.1 ul) are prepared from randomly Sequence itself was loaded via Software to a Sybase database Selected plaques and template is prepared by long range (archived daily to 8mm tape). Leading vector polylinker PCR. Samples are PCR amplified using modified T3 and Sequence was removed automatically by a Software pro T7 primers, and Elongase Supermix (LTI). gram. 
Average edited lengths of Sequences from the Standard Sequencing reactions are carried out on plasmid templates ABI 373 or ABI 377 were around 400 bp and depend mostly using a combination of two workstations (BIOMEK 1000 on the quality of the template used for the Sequencing and Hamilton Microlab 2200) and the Perkin-Elmer 9600 reaction. thermocycler with Applied Biosystems PRISM Ready Reac 15 INFORMATICS tion Dye Primer Cycle Sequencing Kits for the M1 3 1. Data Management forward (M13-21) and the M13 reverse (M13RP1) primers. A number of information management Systems for a Dye terminator Sequencing reactions are carried out on the large-scale sequencing lab have been developed. (For lambda templates on a Perkin-Elmer 9600 Thermocycler review See, for instance, Kerlavage et al., Proceedings of the using the Applied Biosystems Ready Reaction Dye Termi Twenty-Sixth Annual Hawaii International Conference On nator Cycle Sequencing kits. Modified T7 and T3 primers System Sciences, IEEE Computer Society Press, Washington are used to Sequence the ends of the inserts from the Lambda D.C., 585 (1993)) The system used to collect and assemble DASH II library. Sequencing reactions are on a combination the Sequence data was developed using the Sybase relational of AB 373 DNA Sequencers and ABI 377 DNA sequencers. database management System and was designed to automate All of the dye terminator Sequencing reactions are analyzed 25 data flow whereever possible and to reduce user error. The using the 2X 9 hour module on the AB 377. Dye primer database Stores and correlates all information collected reactions are analyzed on a combination of ABI 373 and ABI during the entire operation from template preparation to final 377 DNA sequencers. The Overall Sequencing Success rate analysis of the genome. 
very approximately is about 85% for M13-21 and M13RP1 sequences and 65% for dye-terminator reactions. The average usable read length is 485 bp for M13-21 sequences, 445 bp for M13RP1 sequences, and 375 bp for dye-terminator reactions.

4. Protocol for Automated Cycle Sequencing

The sequencing was carried out using a Hamilton MicroStation 2200, Perkin-Elmer 9600 thermocyclers, and ABI 373 and ABI 377 Automated DNA Sequencers. The Hamilton combines pre-aliquoted templates and reaction mixes consisting of deoxy- and dideoxynucleotides, the thermostable Taq DNA polymerase, fluorescently labelled sequencing primers, and reaction buffer. Reaction mixes and templates were combined in the wells of a 96-well thermocycling plate and transferred to the Perkin-Elmer 9600 thermocycler. Thirty consecutive cycles of linear amplification (i.e., one-primer synthesis) were performed, each comprising denaturation, annealing of primer and template, and extension (i.e., DNA synthesis). A heated lid with rubber gaskets on the thermocycling plate prevents evaporation without the need for an oil overlay.

Two sequencing protocols were used: one for dye-labelled primers and a second for dye-labelled dideoxy chain terminators. The shotgun sequencing involves the use of four dye-labelled sequencing primers, one for each of the four terminator nucleotides. Each dye-primer was labelled with a different fluorescent dye, permitting the four individual reactions to be combined into one lane of the ABI 373 or 377 DNA Sequencer for electrophoresis, detection, and base calling. ABI currently supplies pre-mixed reaction mixes in bulk packages containing all the necessary non-template reagents for sequencing. Sequencing can be done with both plasmid and PCR-generated templates, using either dye-primers or dye-terminators, with approximately equal fidelity, although plasmid templates generally give longer usable sequences.

Thirty-two reactions were loaded per ABI 373 Sequencer each day, and 96 samples can be loaded on an ABI 377 per day. Electrophoresis was run overnight (ABI 373) or for 2½

Because the raw output of the ABI 373 Sequencers was based on a Macintosh platform and the data management system chosen was based on a Unix platform, it was necessary to design and implement a variety of multi-user, client-server applications that allow the raw data as well as analysis results to flow seamlessly into the database with a minimum of user effort.

2. Assembly

An assembly engine (TIGR Assembler) developed for the rapid and accurate assembly of thousands of sequence fragments was employed to generate contigs. TIGR Assembler simultaneously clusters and assembles fragments of the genome. In order to obtain the speed necessary to assemble more than 10⁴ fragments, the algorithm builds a hash table of 12 bp oligonucleotide subsequences to generate a list of potential sequence fragment overlaps. The number of potential overlaps for each fragment determines which fragments are likely to fall into repetitive elements. Beginning with a single seed sequence fragment, TIGR Assembler extends the current contig by attempting to add the best matching fragment based on oligonucleotide content. The contig and candidate fragment are aligned using a modified version of the Smith-Waterman algorithm, which provides for optimal gapped alignments (Waterman, M. S., Methods in Enzymology 164:765 (1988)). The contig is extended by the fragment only if strict criteria for the quality of the match are met. The match criteria include the minimum length of overlap, the maximum length of an unmatched end, and the minimum percentage match. These criteria are automatically lowered by the algorithm in regions of minimal coverage and raised in regions with a possible repetitive element. Fragments representing the boundaries of repetitive elements and potentially chimeric fragments are often rejected based on partial mismatches at the ends of alignments and excluded from the current contig. TIGR Assembler is designed to take advantage of clone size information coupled with sequencing from both ends of each template. It enforces the constraint that sequence fragments from two ends of the same template point toward one another in the contig and are located within a certain range of base pairs (definable for each clone based on the known clone size range for a given library).

3. Identifying Genes

Tables 1, 2, and 3 list ORFs in the Staphylococcus aureus genomic contigs of the present invention that were identified as putative coding regions by the GeneMark software using organism-specific second-order Markov probability transition matrices. It will be appreciated that other criteria can be used, in accordance with well known analytical methods, such as those discussed herein, to generate more inclusive, more restrictive, or more selective lists.

Table 1 sets out ORFs in the Staphylococcus aureus contigs of the present invention that, over a continuous region of at least 50 bases, are 95% or more identical (by BLASTN analysis) to a nucleotide sequence available through GenBank in November 1996.

Table 2 sets out ORFs in the Staphylococcus aureus contigs of the present invention that are not in Table 1 and that match, with a BLASTP probability score of 0.01 or less, a polypeptide sequence available through a non-redundant database of known proteins generated by combining the Swiss-Prot, PIR, and GenPept databases.

Table 3 sets out the remaining ORFs in the Staphylococcus aureus contigs of the present invention, which did not have significant matches to the public databases by the criteria described above.

ILLUSTRATIVE APPLICATIONS

1. Production of an Antibody to a Staphylococcus aureus Protein

Substantially pure protein or polypeptide is isolated from the transfected or transformed cells using any one of the methods known in the art. The protein can also be produced in a recombinant prokaryotic expression system, such as E. coli, or can be chemically synthesized. Concentration of protein in the final preparation is adjusted, for example, by concentration on an Amicon filter device, to the level of a few micrograms/ml. Monoclonal or polyclonal antibody to the protein can then be prepared as follows.

2. Monoclonal Antibody Production by Hybridoma Fusion

Monoclonal antibody to epitopes of any of the peptides identified and isolated as described can be prepared from murine hybridomas according to the classical method of Kohler, G. and Milstein, C., Nature 256:495 (1975) or modifications of the methods thereof. Briefly, a mouse is repetitively inoculated with a few micrograms of the selected protein over a period of a few weeks. The mouse is then sacrificed, and the antibody-producing cells of the spleen are isolated. The spleen cells are fused by means of polyethylene glycol with mouse myeloma cells, and the excess unfused cells are destroyed by growth of the system on selective media comprising aminopterin (HAT media). The successfully fused cells are diluted, and aliquots of the dilution are placed in wells of a microtiter plate where growth of the culture is continued. Antibody-producing clones are identified by detection of antibody in the supernatant fluid of the wells by immunoassay procedures, such as ELISA, as originally described by Engvall, E., Meth. Enzymol. 70:419 (1980), and modified methods thereof. Selected positive clones can be expanded and their monoclonal antibody product harvested for use. Detailed procedures for monoclonal antibody production are described in Davis, L. et al., Basic Methods in Molecular Biology, Elsevier, New York, Section 21-2 (1989).

3. Polyclonal Antibody Production by Immunization

Polyclonal antiserum containing antibodies to heterogeneous epitopes of a single protein can be prepared by immunizing suitable animals with the expressed protein described above, which can be unmodified or modified to enhance immunogenicity. Effective polyclonal antibody production is affected by many factors related both to the antigen and the host species. For example, small molecules tend to be less immunogenic than others and may require the use of carriers and adjuvant. Also, host animals vary in response to site of inoculation and dose, with either inadequate or excessive doses of antigen resulting in low-titer antisera. Small doses (ng level) of antigen administered at multiple intradermal sites appear to be most reliable. An effective immunization protocol for rabbits can be found in Vaitukaitis, J. et al., J. Clin. Endocrinol. Metab. 33:988-991 (1971).

Booster injections can be given at regular intervals, and antiserum harvested when antibody titer thereof, as determined semi-quantitatively, for example, by double immunodiffusion in agar against known concentrations of the antigen, begins to fall. See, for example, Ouchterlony, O. et al., Chap. 19 in: Handbook of Experimental Immunology, Weir, D., ed., Blackwell (1973). Plateau concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum (about 12 μM). Affinity of the antisera for the antigen is determined by preparing competitive binding curves, as described, for example, by Fisher, D., Chap. 42 in: Manual of Clinical Immunology, Second edition, Rose and Friedman, eds., Amer. Soc. for Microbiology, Washington, D.C. (1980).

Antibody preparations prepared according to either protocol are useful in quantitative immunoassays which determine concentrations of antigen-bearing substances in biological samples; they are also used semi-quantitatively or qualitatively to identify the presence of antigen in a biological sample. In addition, they are useful in various animal models of staphylococcal disease known to those of skill in the art as a means of evaluating the protein used to make the antibody as a potential vaccine target or as a means of evaluating the antibody as a potential immunotherapeutic reagent.

3. Preparation of PCR Primers and Amplification of DNA

Various fragments of the Staphylococcus aureus genome, such as those of Tables 1-3 and SEQ ID NOS: 1-5,191, can be used, in accordance with the present invention, to prepare PCR primers for a variety of uses. The PCR primers are preferably at least 15 bases, and more preferably at least 18 bases, in length. When selecting a primer sequence, it is preferred that the primer pairs have approximately the same G/C ratio, so that melting temperatures are approximately the same. The PCR primers and amplified DNA of this Example find use in the Examples that follow.

4. Gene Expression from DNA Sequences Corresponding to ORFs

A fragment of the Staphylococcus aureus genome provided in Tables 1-3 is introduced into an expression vector using conventional technology. Techniques to transfer cloned sequences into expression vectors that direct protein translation in mammalian, yeast, insect or bacterial expression systems are well known in the art. Commercially available vectors and expression systems are available from a variety of suppliers including Stratagene (La Jolla, Calif.), Promega (Madison, Wis.), and Invitrogen (San Diego, Calif.). If desired, to enhance expression and facilitate proper protein folding, the codon context and codon pairing of the sequence may be optimized for the particular expression organism, as explained by Hatfield et al., U.S. Pat. No. 5,082,767, incorporated herein by this reference.

The following is provided as one exemplary method to generate polypeptide(s) from cloned ORFs of the Staphylococcus aureus genome fragment. Bacterial ORFs generally lack a poly A addition signal. The addition signal sequence can be added to the construct by, for example, splicing out the poly A addition sequence from pSG5 (Stratagene) using BglI and SalI restriction endonuclease enzymes and incorporating it into the mammalian expression vector pXT1 (Stratagene) for use in eukaryotic expression systems. pXT1 contains the LTRs and a portion of the gag gene of Moloney Murine Leukemia Virus. The positions of the LTRs in the construct allow efficient stable transfection. The vector includes the Herpes simplex thymidine kinase promoter and the selectable neomycin gene. The Staphylococcus aureus DNA is obtained by PCR from the bacterial vector using oligonucleotide primers complementary to the Staphylococcus aureus DNA and containing restriction endonuclease sequences for PstI incorporated into the 5' primer and BglII at the 5' end of the corresponding Staphylococcus aureus DNA 3' primer, taking care to ensure that the Staphylococcus aureus DNA is positioned such that it is followed by the poly A addition sequence. The purified fragment obtained from the resulting PCR reaction is digested with PstI, blunt-ended with an exonuclease, digested with BglII, purified, and ligated to pXT1, now containing a poly A addition sequence and digested with BglII.

The ligated product is transfected into mouse NIH 3T3 cells using Lipofectin (Life Technologies, Inc., Grand Island, N.Y.) under conditions outlined in the product specification. Positive transfectants are selected after growing the transfected cells in 600 μg/ml G418 (Sigma, St. Louis, Mo.). The protein is preferably released into the supernatant. However, if the protein has membrane binding domains, the protein may additionally be retained within the cell or expression may be restricted to the cell surface. Since it may be necessary to purify and locate the transfected product, synthetic 15-mer peptides synthesized from the predicted Staphylococcus aureus DNA sequence are injected into mice to generate antibody to the polypeptide encoded by the Staphylococcus aureus DNA.

Alternatively, and if antibody production is not possible, the Staphylococcus aureus DNA sequence is additionally incorporated into eukaryotic expression vectors and expressed as, for example, a globin fusion. Antibody to the globin moiety then is used to purify the chimeric protein. Corresponding protease cleavage sites are engineered between the globin moiety and the polypeptide encoded by the Staphylococcus aureus DNA so that the latter may be freed from the former by simple protease digestion. One useful expression vector for generating globin chimeras is pSG5 (Stratagene). This vector encodes a rabbit globin. Intron II of the rabbit globin gene facilitates splicing of the expressed transcript, and the polyadenylation signal incorporated into the construct increases the level of expression. These techniques are well known to those skilled in the art of molecular biology. Standard methods are published in methods texts such as Davis et al., cited elsewhere herein, and many of the methods are available from the technical assistance representatives from Stratagene, Life Technologies, Inc., or Promega. Polypeptides of the invention also may be produced using in vitro translation systems such as the In vitro Express™ Translation Kit (Stratagene).

While the present invention has been described in some detail for purposes of clarity and understanding, one skilled in the art will appreciate that various changes in form and detail can be made without departing from the true scope of the invention.

All patents, patent applications and publications referred to above are hereby incorporated by reference.
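The overlap search described under 2. Assembly (a hash table of 12 bp oligonucleotide subsequences used to list potential fragment overlaps, with the number of candidate overlaps per fragment flagging likely repeats) can be illustrated with a minimal Python sketch. This is not TIGR Assembler code: the function names are hypothetical, and for brevity only the forward strand is indexed, whereas a working assembler would also index reverse complements.

```python
from collections import defaultdict
from itertools import combinations

K = 12  # word length given in the text (12 bp oligonucleotide subsequences)


def build_kmer_index(fragments: dict[str, str], k: int = K) -> dict[str, set[str]]:
    """Map every k-mer (forward strand only) to the IDs of the fragments containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for frag_id, seq in fragments.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(frag_id)
    return index


def candidate_overlaps(fragments: dict[str, str], k: int = K) -> dict[tuple[str, str], int]:
    """Return {(id_a, id_b): number of distinct shared k-mers} for every candidate pair."""
    shared: dict[tuple[str, str], int] = defaultdict(int)
    for ids in build_kmer_index(fragments, k).values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1
    return dict(shared)


def partners_per_fragment(pairs: dict[tuple[str, str], int]) -> dict[str, int]:
    """Count candidate partners per fragment; unusually high counts suggest repeats."""
    counts: dict[str, int] = defaultdict(int)
    for a, b in pairs:
        counts[a] += 1
        counts[b] += 1
    return dict(counts)


if __name__ == "__main__":
    frags = {
        "f1": "CCCCCC" + "GATTACAGATTACA",  # 14 bp overlap with f2
        "f2": "GATTACAGATTACA" + "TTTTTT",
        "f3": "G" * 16,                      # shares no 12-mer with the others
    }
    print(candidate_overlaps(frags))  # {('f1', 'f2'): 3}
```

Only pairs that share at least one 12-mer are passed on to the slower alignment step, which is how a hash-table pre-filter keeps the pairwise comparison tractable for large fragment sets.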
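The acceptance test for extending a contig (an optimal gapped Smith-Waterman alignment followed by checks on minimum overlap length, maximum unmatched end, and minimum percentage match) can be sketched as follows. The scoring scheme (match +2, mismatch -1, linear gap -2), the default thresholds, and the assumption that the fragment extends the contig to the right are illustrative choices, not TIGR Assembler's parameters; the automatic lowering and raising of the criteria with coverage is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    score: int
    a_start: int  # aligned region of sequence a is a[a_start:a_end]
    a_end: int
    b_start: int  # aligned region of sequence b is b[b_start:b_end]
    b_end: int
    columns: int  # alignment length, gap columns included
    matches: int


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> Alignment:
    """Best local alignment of a and b with a linear gap penalty (Smith-Waterman)."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Trace back from the best-scoring cell until a zero cell is reached.
    i, j, columns, matches = bi, bj, 0, 0
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            matches += int(a[i - 1] == b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return Alignment(best, i, bi, j, bj, columns, matches)


def accept_extension(contig: str, frag: str, min_overlap: int = 40,
                     max_unmatched_end: int = 5, min_identity: float = 0.95) -> bool:
    """Apply the three match criteria, assuming frag should extend contig to the right."""
    aln = smith_waterman(contig, frag)
    identity = aln.matches / aln.columns if aln.columns else 0.0
    # Contig bases left unaligned after the overlap, or fragment bases before it.
    unmatched_end = max(len(contig) - aln.a_end, aln.b_start)
    return (aln.columns >= min_overlap
            and unmatched_end <= max_unmatched_end
            and identity >= min_identity)


if __name__ == "__main__":
    shared = "CGTTCGGCTGTCCGTG"
    print(accept_extension("A" * 10 + shared, shared + "A" * 10,
                           min_overlap=10, max_unmatched_end=0))  # True
```

Swapping the two arguments in the usage example leaves 10 unaligned bases at the joining ends, so the same call returns False; this is the kind of partial end mismatch that the text says leads to rejection.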
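The clone-size constraint that TIGR Assembler enforces (the two end sequences of one template must point toward one another and lie within the library's known size range) reduces to a simple predicate. In this sketch the Placement record and its coordinate convention are hypothetical, and the insert-size bounds are supplied by the caller.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    start: int   # leftmost contig coordinate of the read (0-based, inclusive)
    end: int     # rightmost contig coordinate (exclusive)
    strand: str  # "+" if the read runs left-to-right on the contig, "-" otherwise


def mates_consistent(r1: Placement, r2: Placement, min_insert: int, max_insert: int) -> bool:
    """True if the two end reads of one clone face each other and span an allowed size."""
    if {r1.strand, r2.strand} != {"+", "-"}:
        return False  # same orientation: the reads cannot point toward one another
    fwd, rev = (r1, r2) if r1.strand == "+" else (r2, r1)
    if fwd.start > rev.start:
        return False  # the reads point away from each other
    span = rev.end - fwd.start  # clone length implied by the two placements
    return min_insert <= span <= max_insert


if __name__ == "__main__":
    # A clone from a hypothetical 1.5-2.5 kb library.
    print(mates_consistent(Placement(100, 600, "+"), Placement(1500, 2000, "-"), 1500, 2500))
```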
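GeneMark's use of second-order Markov transition matrices can be illustrated, in much simplified form, by scoring a sequence under a model in which each base depends on the two bases before it. GeneMark itself uses codon-position-specific (three-periodic) coding models alongside competing non-coding models, so the single homogeneous model, the uniform background, and the toy training data here are assumptions for illustration only. Sequences must contain only A, C, G and T.

```python
import math
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"


def train_second_order(seqs: list[str], pseudocount: float = 1.0) -> dict[tuple[str, str], float]:
    """Estimate P(next base | two preceding bases) with additive (Laplace) smoothing."""
    counts: dict[tuple[str, str], float] = defaultdict(float)
    for s in seqs:
        for i in range(2, len(s)):
            counts[(s[i - 2:i], s[i])] += 1
    model = {}
    for ctx in ("".join(p) for p in product(ALPHABET, repeat=2)):
        total = sum(counts[(ctx, b)] for b in ALPHABET) + len(ALPHABET) * pseudocount
        for b in ALPHABET:
            model[(ctx, b)] = (counts[(ctx, b)] + pseudocount) / total
    return model


def log_odds(seq: str, coding: dict, background: dict) -> float:
    """Log-likelihood ratio of seq under the coding model versus the background model."""
    return sum(math.log(coding[(seq[i - 2:i], seq[i])] / background[(seq[i - 2:i], seq[i])])
               for i in range(2, len(seq)))


if __name__ == "__main__":
    coding = train_second_order(["ATGATGATGATG"])  # toy "coding" training set
    background = train_second_order([])            # no data: every transition is 0.25
    print(log_odds("ATGATG", coding, background) > 0)  # True
```

A positive log-odds value means the sequence is more probable under the trained model than under the background; GeneMark performs a related but more elaborate comparison for each reading frame.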
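The criteria that sort ORFs into Tables 1, 2 and 3 amount to a three-way decision procedure. The sketch below encodes them as stated in 3. Identifying Genes; the OrfHits record is hypothetical, and it assumes the best BLASTN and BLASTP results have already been computed for each ORF.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrfHits:
    """Best database hits for one ORF (hypothetical record layout)."""
    blastn_identity: Optional[float] = None  # % identity over the best continuous region
    blastn_region_len: int = 0               # length of that region, in bases
    blastp_p: Optional[float] = None         # best BLASTP probability score


def assign_table(h: OrfHits) -> int:
    """Return 1, 2 or 3 following the Table 1-3 criteria stated in the text."""
    if (h.blastn_identity is not None and h.blastn_identity >= 95.0
            and h.blastn_region_len >= 50):
        return 1  # >= 95% identical over >= 50 contiguous bases (BLASTN vs GenBank)
    if h.blastp_p is not None and h.blastp_p <= 0.01:
        return 2  # not in Table 1; BLASTP probability <= 0.01 against known proteins
    return 3      # no significant match by either criterion


if __name__ == "__main__":
    print(assign_table(OrfHits(blastn_identity=97.0, blastn_region_len=40, blastp_p=1e-5)))  # 2
```

Checking the Table 1 criterion first reproduces the text's rule that Table 2 contains only ORFs "not in Table 1".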
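The primer guidance in 3. Preparation of PCR Primers (at least 15, preferably at least 18, bases, and pairs with approximately the same G/C ratio so that melting temperatures are similar) can be checked programmatically. The Wallace rule used here, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C), is a rough approximation for short oligonucleotides that is not part of the source, and the tolerance defaults are illustrative.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in a primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in deg C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def primer_pair_ok(fwd: str, rev: str, min_len: int = 18,
                   max_gc_diff: float = 0.10, max_tm_diff: int = 4) -> bool:
    """Length and G/C-matching checks for a primer pair (tolerances are illustrative)."""
    return (min(len(fwd), len(rev)) >= min_len
            and abs(gc_fraction(fwd) - gc_fraction(rev)) <= max_gc_diff
            and abs(wallace_tm(fwd) - wallace_tm(rev)) <= max_tm_diff)


if __name__ == "__main__":
    print(wallace_tm("AGCTAGCTAGCTAGCTAG"))                              # 54
    print(primer_pair_ok("AGCTAGCTAGCTAGCTAG", "TCGATCGATCGATCGATC"))  # True
```

For real primer design, nearest-neighbour thermodynamic models give more accurate melting temperatures than the Wallace rule.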


TABLE 2
S. aureus: putative coding regions of novel proteins similar to known proteins
Contig ID | ORF ID | Start (nt) | Stop (nt) | match accession | match gene name | % sim | % ident | length (nt)
2O 6 46.79 4269 gi511839 ORF1 Staphylococcus bacteriophage phi 11 11 149 3 1577 1122 pir|B49703B497 int gene activator RinA-bacteriophage phi 11 56 149 5 1912 1715 gi166161 Bacteriophage phi-11 int gene activator Staphylococcus acteriophage 98 phi 11 349 2 409 260 gi1661.59 integrase (int) Staphylococcus bacteriophage phi 11 1OO 50 398 707 42 gi1661.59 integrase (int) Staphylococcus bacteriophage phi 11 99 666 398 783 1OO1 gi4551284 excisionase (xi) Staphylococcus bacteriophage phi 11 1OO 239 502 1744 1574. gi1204912 M. influenzae predicted coding region HIO660 Haemophilus 71 71 influenzae 849 2 262 gi1373.002 polyprotein (Bean common mosaic virus) 46 261 1349 140 3 gi143359 protein synthesis initiation factor 2 (infB) Bacillus subtilis gi49319 82 38 IF2 gene product Bacillus subtilis 288O 21 gi862.933 protein kinase C inhibitor-I Homo sapiens 1OO 98 288 3O85 216 gi1354213 PET112-like protein Bacillus Subtilis 1OO 1OO 213 41.68 398 225 gi1354211 PET112-like protein Bacillus Subtilis 1OO 1OO 74 311 2 247 gi426473 nusG gene product Staphylococcus carnosus 98 95 246 2O7 1272 1463 gi4602594 Bacillus subtilis 97 90 92 331 : 395 850 gi581638 L11 protein Staphylococcus carnosus 97 93 366 39 215 gi166161 Bacteriophage phi-11 int gene activator Staphylococcus acteriophage 97 95 77 phi 11 68O 718 936 gi426473 nusG gene product Staphylococcus carnosus 97 97 219 3578 144 4 gi1339950 large subunit of NADH-dependent glutamate synthase 97 79 41 Plectonema boryanum 157 321 518 gi1022726 unknown Staphlococcus haemolyticus 96 88 98 205 33 16147 15824 gi11653.02 S10 Bacilius Subtilis 96 91 324 3919 48 4O1 gi871784 Clp-like ATP-dependent protease binding subunit Bos taurus 96 81 354 41.33 417 4 gi1022726 unknown Staphylococcus haemolyticus 96 84 41.68 355 2 gi1354211 PET112-like protein Bacillus Subtilis 96
95 354 42O7 157 2 gi602031 similar to trimethylamine DH Mycoplasma capricolum 96 86 56 pirS4995OS49950 probable trimethylamine (EC.5.99.7)-Mycoplasma capricolum (SGC3) (fragment) 4227 152 333 gi1871784 Clp-like ATP-dependent protease binding subunit Bos taurus 96 81 8O 4418 286 2 gi11022726 unknown Staphylococcus haemoyticus 96 84 285 22 430 2 gi1511070 UreC Staphylococcus xylosus 95 88 22 4036 3710 gi1581787 urease gamma subunit Staphylococcus xylosus 95 79 327 82 8794 9114 pirJGO008JGOO ribosomal protein S7-Bacillus Stearothermophilus 95 83 321 154 7838 6396 gi1354211 PET112-like protein Bacillus Subtilis 95 92 186 2055 1312 gi1514656 serine O-acetyltransrerase Staphylococcus xylosus 95 87 744 205 4014 3622 gi142462 ribosomal protein S11 Bacillus subtilis 95 85 393 205 4793 4569 gi142459 initiation factor 1 Bacilius Subtilis 95 84 225 205 2 10991 106.17 gi1044974 ribosomal protein L14 Bacillus Subtilis 95 93 375 259 6644 6OOO sp|P47995YSEA HYPOTHETICAL PROTEIN IN SECA 5'REGION (ORF1) (FRAG 95 85 645 MENT). 3 795 1097 gi40186 homologous to E. coli ribosomal protein L27 Bacillus subtilis 95 89 303 i143592 L27 ribosomal protein Bacillus subtilis irC21895C21895 ribosomal protein L27-Bacillus subtilis pPO5657RL27 BACSU 50S RIBOSOMAL PROTEIN L27 (BL30) (BL24). i40175 L24 gene prod 310 579 1523 gi1177684 chorismate mutase Staphylococcus xylosus 95 92 945 414 163 (pirC48396 C483 ribosomal protein L34-Bacillus stearothermophilus 95 90 162 4.185 125 277 gi1276841 glutamate synthase (GOGAT) Porphyra purpurea 95 86 153 22 723 418 gi511069 UreF Staphylococcus xylosus 94 91 3O8 22 3310 1574. 
gi4105164 urease alpha subunit Staphylococcus xylosus 94 85 1737 60 815 1372 gi666116 glucose kinase Staphylococcus xylosus 94 87 558 205 9536 906O gi1044978 ribosomal protein 58 Bacillus subtilis 94 78 477 326 2542 1706 gi557492 dihydroxynaptholic acid (DNNA) synthetase Bacillus subtilis 94 85 837 gi143186 dihydroxynaptholic acid (DNNA) synthetase Bacilius Subtilis 414 737 955 gi4673864 thiophen and furen oxidation Bacillus subtilis 94 77 219 426 1823 1386 gi1263908 putative Staphylococcus epidermidis 94 87 438 534 2 355 gi633650 enzyme II(mannitol) Staphylococcus carnosus 94 84 354 1017 2 229 gi149435 putative Lactococcus lactis 94 73 228 3098 184 38 gi413952 ipa-28d gene product Bacillus subtilis 94 50 147 3232 316 gi1022725 unknown Staphylococcus haemolyticus 94 84 315 42 2089 2259 pirS48396B483 ribosomal protein L33-Bacillus Stearothermophilus 93 81 171 101 1383 1021 gi155345 arsenic efflux pump protein Plasmid pSX267 93 82 363 205 11865 11503 sp|P14577RL16 SOS RIBOSOMAL PROTEIN L16. 93 83 363 259 5673 3055 gi499335 secA protein Staphylococcus carnosus 93 85 2619 275 21. 1114 2 gi633650 enzyme II(mannitol) Staphylococcus carnosus 93 86 1113 444 5773 5339 gi1022726 unknown Staphylococcus haemolyticus 93 81 435 491 152 622 gi46912 ribosomal protein L13 Staphylococcus carnosus 93 88 471 6O7 1674 2O33 gi1022726 unknown Staphylococcus haemolyticus 93 83 360 653 488 gi580890 translation initiation factor 1F3 (AA 1-172) 93 77 486 Bacilius tearothermophilus

TABLE 2-continued 1864 194 gi306553 ribosomal protein small subunit Homo Sapiens 93 93 192 2.997 28 3OO gi143390 carbamyl phosphate synthetase Bacillus subtilis 93 82 273 3232 596 285 gi1022725 unknown Staphylococcus haemolyticus 93 84 312 376 621 448 gi1022725 unknown Staphylococcus haemolyticus 93 88 174 16 374 gi142781 putative cytoplasmic protein: putative Bacilius Subtilis 92 83 372 sp|P37954|UVRB BACSU EXCINUCLEASE ABC SUBUNIT B (DINA PROTEIN) FRAGMENT). 5915 6124 gi1136430 KIAAO185 protein Homo Sapiens 92 46 210 56 19 26483 2739 gi467401 unknown Bacilius Subtilis 92 8O 909 69 5882 6130 gi530200 trophoblastin Ovis aries 92 53 249 145 2O38 1508 gi1022725 unknown Staphylococcus haemolyticus 92 8O 531 17 2362 1964 gi517475 D-amino acid transaminase Staphylococcus haemolyticus 92 86 399 205 12 6962 6429 gi49189 secY gene product Staphylococcus carnosus 92 85 534 205 19 10255 96.98 gi1044976 ribosomal protein L5 Bacilius Subtilis 92 82 558 219 357 gi1303812 YgeV Bacillus subtilis 92 88 354 344 1575 1805 gi1405474 CspC protein Bacillus cereual 92 85 231 699 2O 36 gi 4 13999 ipa-75d gene product Bacillus subtilis 92 81 342 1343 160 pirA45434A454 ribosomal protein L19-Bacilius Stearothermophilus 92 84 159 1958 264 Sl 4O79084 EIIscr Staphylococcus xylosus 92 8O 261 3578 386 54 gi1339950 large subunit of NADH-dependent glutamate synthase 92 78 333 Plectonema boryanum 3585 324 gi1339950 large subunit of NADH-dependent glutamate synthase 92 81 321 Plectonema boryanum 3640 402 gi1022726 unknown Staphylococcus haemolyticus 92 81 399 4362 178 gi450688 hsdM gene of EcoprrI gene product Escherichia coli 92 78 165 pirS38437S38437 hsdM protein-Escherichia coli pirSO9629ISO9629 hypothetical protein A-Escherichia coli (SUB 40-520) 4446 182 gi1022725 unknown Staphylococcus haemolyticus 82 177 4549 232 gi1022726 unknown Staphylococcus haemolyticus 8O 231 4626 224 gi1022725 unknown Staphylococcus haemolyticus 84 222 398O 4531 gi535349 CodW Bacillus Subtilis 74 552. 
28 1126 gi1001376 hypothetical protein Synechocystis sp. 78 1125 60 1354 1701 gi1226043 orf2 downstream of glucose kinase Staphylococcus xylosus 8O 348 101 O36 83 gi150728 arsenic efflux pump protein Plasmid p1258 8O 954 187 2 412 1194 gi142559 ATP synthase alpha subunit Bacillus megaterium 79 783 205 22 11298 11017 gi40149 S17 protein AA 1-87) Bacillus subtilis 83 282 2O6 81.84 1O262 gi1072418 gleA gene product Staphylococcus carnosus 83 2O79 306 2326 767 gi143012 GNP synthetase Bacillus Subtilis 78 1560 306 3826 2333 gi467398 IMP dehydrogenase Bacillus subtilis 79 1494 310 2194 32O7 gi1177685 ccpA gene product Staphylococcus xylosus 81 1014 343 2974 3150 gi949974 sucrose repressor Staphylococcus xylosus 82 177 48O 1606 3O42 gi433991 ATP synthase subunit beta Bacilius Subtilis 85 1437 536 128O 534 gi143366 adenylosuccinate (PUR-s) Bacillus subtilis 79 747 pirC29326WZBSDS adenylosuccinate lyase (EC 4.3.2.2)- Bacilius Subtilis 552. 615 166 gi297874 fructose-bisphosphate aldolase Staphylococcus carnosus 79 450 pirA49943A49943 fructose-bisphosphate aldolase (EC 4.1.2.13)- taphylococcus carnosus (strain TN300) 637 1536 gi143597 CTP synthetase Bacillus subtilis 79 1536 859 21 359 gi385178 unknown Bacilius Subtilis 66 339 1327 339 530 gi 4 96.558 orfX Bacilius Subtilis 71 192 2515 275 84 gi511070 UreC Staphylococcus xylosus 85 192 2594 2O2 gi146824 beta-cystathionase Escherichia coli 75 2O1 3764 425 gi1022725 unknown Staphylococcus haemolyticus 78 423 4031 127 495 gi1022726 unknown Staphylococcus haemolyticus 79 369 4227 177 gi1296464 ATPase Lactococcus lactis 66 177 42 815 1033 gi5204 catalase Haemophilus influenzae 86 219 51 3717 46O7 OppF gene product Bacillus subtilis 74 891 129 4001 2685 glutamate dehydrogenase Bacilius Subtilis 76 1317 164 1. 16628 16933 30S RIBOSOMAL PROTEIN S15 (BS18). 
74 306 171 2819 2655 gi5174 75 D-amino acid transaminase Staphylococcus haemolyticus 78 165 205 3550 2603 gi1424 63 RNA polyserase alpha-core-subunit Bacillus subtilis 76 948 205 4.410 4O72 gi1044 989 ribosomal protein S13 Bacillus Subtilis 73 339 205 6404 5643 gi49189 secY gene product Staphylococcus carnosus 81 762 205 6472 6299 gi49189 secY gene product Staphylococcus carnosus 78 174 205 13345 2998 gi786 57 Ribosomal Protein S19 Bacilius Subtilis 79 348 205 15496 15134 gi1165303 L3 Bacilius Subtilis 79 363 260 5773 4523 gi116 38O IcaA Staphylococcus epidermidis 78 1251 299 3378 3947 gi4674 40 phosphoribosylpyrophosphate synthetase Bacillus Subtilis gi40218 78 570 PRPP synthetase (AA 1-317) Bacillus subtilis 32O 1025 1717 gi3124 43 carbamoyl-phosphate synthase (glutamine-hydrolysing) 90 75 693 Bacilius aldolyticus 330 1581 1769 gi986963 beta-tubulin Sporidiobolus pararoseus 90 8O 189 369 523 92 pirS34 L-serine beta chain-Clostridium sp. 90 77 432 557 188 gi151 589 M. jannaschi predicted coding region NJ1624 90 54 186 Methanococcus jannaschii 663 667 1200 gi143786 ryptophanyl-tRNA synthetase (EC 6.1.1.2) Bacillus subtilis 90 73 534 pirJTO481YWBS tryptophan-tRHA ligase (EC 6.1.1.2)- Bacilius ubtilis

TABLE 2-continued 717 261 gi143065 hubst Bacillus Stearothermophilus 90 79 261 745 865 671 gi1205433 H. influenzae predicted coding region HI1190 90 81 195 Haemophilus influenzae 1007 386 565 gi143366 adenylosuccinate lyase (PUR-B) Bacillus Subtilis 90 77 18O pirC29326|WZBSDS adenylosuccinate lyase (EC 4.3.2.2)-Bacillus ubtilis 1054 331 83 gi1033122 ORF f729 Escherichia coli 90 50 249 1156 117 707 gi1477776 Clipp Bacilius Subtilis 90 8O 591 118O 205 gi1377831 unknown Bacilius Subtilis 90 74 204 1253 462 gi40046 phosphoglucose isomerase A (AA 1-449) 90 75 462 Bacillus Stearothermophilus irS15936BUBSSA glucose-6-phosphate isomerase (EC 5.3.1.9)A-cillus Stearothermophilus 2.951 269 gi144816 formyltetrahydrofolate synthetase (FTHFS) (ttg start codon) 90 76 267 (EC 3.4.3) Moorella theraoacetical 3140 166 gi1070014 protein-dependent Bacillus subtilis 90 52 162 4594 3 233 gi871784 Clp-like ATP-dependent protease binding subunit Bos taurus 90 76 23 87 1028 1750 gi467327 unknown Bacilius Subtilis 89 75 723 112 2 505 gi153741 ATP-binding protein Streptococcus mutans 89 77 504 118 12O 398 gi1303804 YgeO Bacillus subtilis 89 75 279 128 3545 3757 gi460257 triose phosphate isomerase Bacillus subtilis 89 84 213 164 12 11667 12755 gi39954 IF2 (aa 1-741) Bacillus Stearothermophilus 89 8O 1089 205 13 7405 6935 gi216338 ORF for L15 ribosomal protein Bacillus subtilis 89 76 47 205 32 15823 15494 gi1165303 L3 Bacilius Subtilis 89 8O 330 270 22O7 2007 pirC41902C419 arsenate reductase (EC 1.-.--)-Staphylococcus xylosus plasmid 89 81 2O pSX267 395 157 672 gi520574 glutamate racemate Staphylococcus haemolyticus 89 8O 516 494 839 gi396259 protease Staphylococcus epiderinidis 89 77 837 510 444 gi40046 phosphoglucose isomerase A (AA 1-449) Bacilius 89 74 444 Stearothermophilus irS15936NUBSSA glucose-6-phosphate isomerase (EC 5.3.1.9) A-cillus Stearothermophilus 615 1210 296 gi1303812 YgeV Bacillus subtilis 89 74 915 841 18 341 gi1165303 L3 Bacilius Subtilis 89 8O 324 1111 352 813 gi47146 
thermonuclease Staphylococcus intermedius 89 70 462 1875 256 gi1205108 ATP-dependent protease binding subunit Haemophilus influenzae 89 82 255 2963 11 367 gi467458 cell division protein Bacillus subtilis 89 83 357 3O2O 90 362 gi1239988 hypothetical protein Bacillus subtilis 89 66 273 3565 400 gi1256635 dihydroxy-acid dehydratase Bacillus subtilis 89 75 399 3586 105 314 gi1580832 ATP synthase subunit gamma Bacilius Subtilis 89 82 210 36.29 399 gi100.9366 Respiratory nitrate reductase Bacillus subtilis 89 78 396 3688 400 gi1146286 glutamate dehydrogenase Bacilius Subtilis 89 75 399 3699 399 gi1339950 large subunit of NADN-dependent glutamate synthase 89 75 396 Plectonema boryanum 4O16 216 gi100.9366 Respiratory nitrate reductase Bacillus subtilis 89 71 213 4177 301 131 gi149426 putative Lactococcus lactis 89 76 171 4436 3O2 gi1022725 unknown Staphylococcus haemolyticus 89 8O 3OO 4635 162 gi1022725 unknown Staphylococcus haemolyticus 89 73 159 1330 2676 gi520754 putative Bacillus subtilis 88 76 1347 42 468 848 sp|P42321|CATA CATALASE (EC 1.11.1.6). 
88 76 381 53 4722 3055 gi474177 alpha-D-1,4-glucosidase Staphylococcus xylosus 88 8O 1668 56 18018 18617 gi467411 recombination protein Bacillus subtilis 88 77 6OO 53 376 843 gi666116 glucose kinase Staphylococcus xylosus 88 77 468 70 1245 907 gi44095 replication initiator protein Listeria aonocytogenes 88 74 339 82 11514 12719 pirA60663 A606 translation elongation factor Tu-Bacillus Subtilis 88 79 12O6 103 4179 4391 gi167181 serine/threonine kinase receptor Brassica napus 88 77 213 114 7732 8232 gi1022726 unknown Staphylococcus haemolyticus 88 72 5O1 118 3O8 2011 gi1303804 YgeOBacillus subtilis 88 77 1704 141 657 1136 gi1405446 transketolase Bacilius Subtilis 88 72 48O 148 5871 6116 gi1118002 dihydropteroate synthase Staphylococcus haemolyticus 88 78 246 165 1428 2231 gi40053 phenylalanyl-tRNA synthetase alpha subunit Bacillus Subtilis 88 8O 804 irS11730YFBSA phenylalanine-tRNA ligase (EC 6.1.1.20) alpha aim-Bacilius Subtilis 205 28 1418.5 13343 gi11653.06 L2 Bacilius Subtilis 88 82 843 225 898 227 gi1303840 YgfSBacillus Subtilis 88 78 672 235 1975 gi452309 valyl-tRNA synthetase Bacillus Subtilis 88 76 1974 339 1566 1072 gi1118002 dihydropteroate synthase Staphylococcus haemolyticus 88 73 495 443 2928 1531 gi558559 pyrimidine nucleoside phosphorylase Bacilius Subtilis 88 73 1398 532 419 gi143797 valyl-tRNA synthetase Bacillus Stearothermophilus 88 78 417 sp|P11931|SYV BACST VALYL-TRNASYNTHETASE (EC 6.1.1.9) VALINE-TRNA LIGASE) (VALRS). 
534 2SO4 2968 gi153049 mannitol-specific enzyme-III Staphylococcus carnosus 88 82 465 pirJQ0088JQ0088 phosphotransferase system enzyme II (EC .7.1.69), mannitol-specific, factor III-Staphylococcus carnosus sp|P17876|PTMA STACA PTS SYSTEM, MANNITOL-SPECIFIC IIA CONPONENT EIA-MTL) ( 705 399 214 gi710018 nitrite reductase (nirB) Bacillus subtilis 88 70 186 1OOO 1309 794 gi1022726 unknown Staphylococcus haemolyticus 88 78 516 1299 324 63 gi401786 phosphomannomutase Mycoplasma pirum 88 55 264 1341 170 400 gi39963 ribosomal protein L20 (AA 1-119) Bacillus Stearothermophilus 88 82 231 irSO5348R55520 ribosomal protein L20-Bacillus earothermophilus

TABLE 2-continued 1386 41 214 signal recognition particle 54K chain homolog Ffh-Bacilius Subtilis 88 71 174 386 183 533 signal recognition particle 54K chain homolog Ffh-Bacilius Subtilis 88 73 351 2949 399 94 gi 535350 CodXBacilius Subtilis 88 73 306 2984 169 gi 218277 O-acetylserine(thiol) lyase Spinacia oleracea 88 70 165 3035 138 gi 493O83 dihydroxyacetone kinase Citrobacter freundii 88 67 138 3O89 152 gi 606055 ORF f746 Escherichia coli 88 88 150 3917 410 gi 143378 pyruvate decarboxylase (E-1) beta subunit Bacillus Subtilis 88 77 408 gi1377836 pyruvate decarboxylase E-1 beta subunit Bacillus ubtilis 4199 342 gi 14054544 Bacilius Subtilis 88 82 339 42O1 369 gi 515938 glutamate synthase (ferredoxin) Synechocystis sp. 88 84 366 pirS46957S46957 glutamate synthase (ferredoxin) (EC 1.4.7.1)-ynechocystis sp. 4274 336 515938 glutamate synthase (ferredoxin) Synechocystis sp. 88 84 336 Sl pirS46957S46957 glutamate synthase (ferredoxin) (EC 1.4.7.1)-ynechocystis sp. 4308 399 gi 1462O6 glutamate dehydrogenase Bacilius Subtilis 88 71 396 4570 6OOO Sl 1535350 CodXBacilius Subtilis 87 70 1431 52 6482 61.83 Sl 1064791 function unknown Bacilius Subtilis 87 66 3OO 73 1584 248O Sl 142992 glycerol kinase (glpK) (EC 2.7.1.30) Bacillus subtilis 87 72 897 pirB458.68|B45868 glycerol kinase (EC 2.7.1.30-Bacillus subtilis sp|P18157GLPK BACSU GLYCEROL KINASE (EC 2.7.1.30) (ATP:GLYCEROL-PHOSPHOTRANSFERASE) (GLYCEROKINASE) (GK). 
98 8813 Sl 467433 unknown Bacilius Subtilis 87 62 288 124 2988 S56886 serine hydroxymethyltransferase Bacillus Subtilis 87 77 1278 Sl pirS49363S49363 serine hydroxymethyltransferase Bacilius ubtilis 124 4032 Sl S56883 Unknown Bacilius Subtilis 87 66 426 148 3741 gi 467460 unknown Bacilius Subtilis 87 70 819 164 12710 Sl 399.54 IF2 (aa 1-741) Bacillus Stearothermophilus 87 72 1101 177 1104 gi 4 67385 unknown Bacilius Subtilis 87 78 1023 199 1158 gi 43527 iron-sulfur protein Bacillus Subtilis 87 77 825 199 2933 pirA27763A277 succinate dehydrogenase (EC 1.3.99.1) flavoprotein-Bacillus Subtilis 87 8O 1785 205 11543 Sl O44972 ribosomal protein L29 Bacillus Subtilis 87 78 240 205 126O7 gi 65309 S3 Bacillus Subtilis 87 75 669 222 1107 gi 772.49 rec233 gene product Bacillus subtilis 87 70 927 236 1333 gi 46198 ferredoxin Bacilius Subtilis 87 8O 303 246 2292 Sl 467373 ribosomal protein S18 Bacilius Subtilis 87 77 294 260 3422 gi 61.382 IcaC Staphylococcus epidermidis 87 72 768 32O 1696 Sl1312443 carbamoyl-phosphate synthase (glutamine-hydrolysing) 87 8O 696 Bacilius aldolyticus 38O 1165 gi 42570 ATP synthase c subunit Bacillus firmus 87 8O 219 414 900 Sl 467386 thiophen and furan oxidation Bacillus subtilis 87 77 174 425 794 gi O46166 pilin repressor Mycoplasma genitalium 87 69 210 448 722 Sl 405134 acetate kinase Bacillus Subtilis 87 75 534 48O 1. gi 42.559 ATP synthase alpha subunit Bacillus megaterium 87 79 71 481 2 sp Q06797RL1 B 50S RIBOSOMAL PROTEIN L1 (BL1). 
87 72 351 677 359 Sli460911 fructose-bisphosphate aldolase Bacillus subtilis 87 78 597 677 934 Sli460911 fructose-bisphosphate aldolase Bacillus subtilis 87 78 351 876 gi 4.6247 asparaginyl-tRNA synthetase Bacillus subtilis 87 79 1376 214 Sl 3065555 F46H6.4 gene product Caenorhabditis elegans 87 75 213 22O6 Sl 215098 exciaionase Bacteriophage 154a 87 72 372 2938 Sl 508979 GTP-binding protein Bacillus subtilis 87 69 288 3O81 126 Sl 46.7399 IMP dehydrogenase Bacillus subtilis 87 72 83 3535 gi 4 aconitase Bacilius Subtilis 87 8O 399 4238 275 Sl 603769 HutU protein, Bacillus Subtilis 87 73 273 8736 7045 Sl 603769 HutU protein, urocanase Bacillus Subtilis 86 72 1692 22 3738 3286 Sl 410515 urease beta subunit Staphylococcus xylosus 86 73 4 53 54 1572 664 Sl 289287 UDP-glucose pyrophosphorylase Bacillus Subtilis 86 70 909 124 1713 1090 556887 uracil phosphoribosyltransferase Bacillus subtilis 86 74 624 Sl pirS49364S49364 uracil phosphoribosyltransferase-Bacillus ubtilis 148 1349 3448 Sl 467458 cell division protein Bacillus subtilis 86 75 148 3638 3859 gi 467460 unknown Bacilius Subtilis 86 73 222 152 1340 2O86 gi 377835 pyruvate decarboxylase P-1 alpha subunit Bacillus subtilis 86 75 747 164 17347 1194.67 gi 18468O polynucleotide phosphorylase Bacillus subtilis 86 72 8O 554. 1159 gi 43467 ribosomal protein S4 Bacillus subtilis 86 8O 606 205 2592 2218 gi 42464 ribosomal protein L17 Bacillus Subtilis 86 77 375 205 1. 21. 
12990 112616 gi 4O107 ribosomal protein L22-Bacillus stearothermophilus 86 75 375 irS10612S10612 ribosomal protein L22-Bacillus earothermophilus 246 3140 2817 Sl 467375 ribosomal protein S6 Bacillus subtilis 86 70 324 299 1196 1540 Sl 39656 spoVG gene product Bacillus negaterium 86 70 345 299 3884 4345 Sl1467440 phosphoribosylpyrophosphate synthetase Bacillus subtilis gi40218 86 78 462 PRPP synthetase (AA 1-317) Bacillus subtilis 304 217O 2523 Sl 666983 putative ATP binding subunit Bacillus subtilis 86 65 354 310 1487 1678 gi 177684 chorismate mutase Staphylococcus xylosus 86 71 192 331 2O86 3405 gi 487434 isocitrate dehydrogenase Bacillus Subtilis 86 78 132O 339 1109 729 gi 118OO3 dihydroneopterin aldolase Staphylococcus haemolyticus 86 77 381 358 2124 3440 gi 1462.19 28.2% of identity to the Escherichia coli GTP-binding protein Era: 86 73 1317 putative Bacillus Subtilis

TABLE 2-continued
404 1015 2058 gi1303817 YafA Bacillus subtilis 86 78 1044
581 452 243 gi40056 phoP gene product Bacillus subtilis 86 71 210
642 338 1075 gi1176399 EpiF Staphylococcus epidermidis 86 72 738
770 347 72 gi143328 phoP protein (put.); putative Bacillus subtilis 86 69 276
865 890 gi1146247 asparaginyl-tRNA synthetase Bacillus subtilis 86 74 888
868 963 1133 gi1002911 transmembrane protein Saccharomyces cerevisiae 86 69 171
904 162 gi1303912 YahW Bacillus subtilis 86 72 162
989 35 433 gi1303993 YckL Bacillus subtilis 86 76 399
1212 150 gi414014 ipa-90d gene product Bacillus subtilis 86 70 147
1323 148 gi40041 pyruvate dehydrogenase (lipoamide) Bacillus stearothermophilus pir|S10798|DEBSPF pyruvate dehydrogenase (lipoamide) (EC 1.2.4.1) alpha chain - Bacillus stearothermophilus 86 75 147
3085 310 80 gi1354211 PET112-like protein Bacillus subtilis 86 86 231
3847 228 gi296464 ATPase Lactococcus lactis 86 63 228
4487 240 gi1022726 unknown Staphylococcus haemolyticus 86 73 237
4583 187 gi1022725 unknown Staphylococcus haemolyticus 86 79 186
25 4287 5039 gi1502421 3-ketoacyl-acyl carrier protein reductase Bacillus subtilis 85 64 753
56 2 29395 28163 gi1408507 pyrimidine nucleoside transport protein Bacillus subtilis 85 69 1233
68 332 1192 gi4673764 unknown Bacillus subtilis 85 74 861
73 880 1707 gi142992 glycerol kinase (glpK) (EC 2.7.1.30) Bacillus subtilis pir|B45868|B45868 glycerol kinase (EC 2.7.1.30) - Bacillus subtilis sp|P18157|GLPK_BACSU GLYCEROL KINASE (EC 2.7.1.30) (ATP:GLYCEROL-PHOSPHOTRANSFERASE) (GLYCEROKINASE) (GK). 85 72 828
106 4 1505 3490 gi143766 (thrSv) (EC 6.1.1.3) Bacillus subtilis 85 74 1986
128 1153 2202 gi311924 glyceraldehyde-3-phosphate dehydrogenase Clostridium pasteurianum pir|S34254|S34254 glyceraldehyde-3-phosphate dehydrogenase (EC 1.2.1.12) - Clostridium pasteurianum 85 75 1050
129 4 5252 4038 gi1064807 ORNITHINE AMINOTRANSFERASE Bacillus subtilis 85 73 1215
138 6 3475 5673 gi1072419 gloB gene product Staphylococcus carnosus 85 74 2199
189 2 69 gi4673854 unknown Bacillus subtilis 85 65 168
205 115 806 7588 gi1044981 ribosomal protein S5 Bacillus subtilis 85 75 519
205 1 10596 10264 pir|A02819|R5BS ribosomal protein L24 - Bacillus stearothermophilus 85 72 333
220 6101 5712 gi48980 secA gene product Bacillus subtilis 85 66 390
231 4 3159 1441 gi1002520 MutS Bacillus subtilis 85 70 1719
243 9 8013 8783 gi4140114 ipa-87r gene product Bacillus subtilis 85 72 771
249 2 3186 gi1405454 aconitase Bacillus subtilis 85 73 2709
302 140 gi40173 homolog of E. coli ribosomal protein L21 Bacillus subtilis pir|S18439|S18439 ribosomal protein L21 - Bacillus subtilis sp|P26908|RL21_BACSU 50S RIBOSOMAL PROTEIN L21 (BL20). 85 72 336
333 2968 491 gi442360 ClpC adenosine triphosphatase Bacillus subtilis 85 69 2478
364 6 6082 gi871784 Clp-like ATP-dependent protease binding subunit Bos taurus 85 68 2115
448 1339 686 gi4051344 acetate kinase Bacillus subtilis 85 68 654
747 853 gi1373157 orf-X; hypothetical protein; Method: conceptual translation supplied by author Bacillus subtilis 85 73 399
886 159 4 67 hemin permease Yersinia enterocolitica 85 55 309
1089 606 signal recognition particle 54K chain homolog Ffh - Bacillus subtilis 85 71 603
1163 409 2 diaminopimelate decarboxylase Bacillus methanolicus sp|P41023|DCDA_BACMT DIAMINOPIMELATE DECARBOXYLASE (EC 4.1.1.20) (DAP DECARBOXYLASE). 85 62 408
1924 251 15 gi215098 excisionase Bacteriophage 154a 85 73 237
2932 390 gi1041099 pyruvate kinase Bacillus licheniformis 85 71 387
3030 275 gi42370 pyruvate formate-lyase (AA 1-760) Escherichia coli pir|S01788|S01788 formate C-acetyltransferase (EC 2.3.1.54) - Escherichia coli 85 74 273
3111 299 3 gi63568 limb deformity protein Gallus gallus 85 85 297
3778 316 gi391840 beta-subunit of HDT Pseudomonas fragi 85 67 315
3835 387 gi1204472 type I restriction enzyme ECOR124/3 IM protein Haemophilus influenzae 85 56 387
4042 386 gi18178 formate acetyltransferase Chlamydomonas reinhardtii pir|S24997|S24997 formate C-acetyltransferase (EC 2.3.1.54) - Chlamydomonas reinhardtii 85 70 384
4053 35 340 gi1204472 type I restriction enzyme ECOR124/3 IM protein Haemophilus influenzae 85 56 306
4108 181 gi1072418 gleA gene product Staphylococcus carnosus 85 61 180
4300 330 85 gi151932 fructose enzyme II Rhodobacter capsulatus 85 59 246
4392 355 83 gi1022725 unknown Staphylococcus haemolyticus 85 74 273
4408 235 gi871784 Clp-like ATP-dependent protease binding subunit Bos taurus 85 62 234
4430 291 gi1009366 respiratory nitrate reductase Bacillus subtilis 85 68 288
4555 253 gi4506884 hsdM gene of EcoprrI gene product Escherichia coli pir|S38437|S38437 hsdM protein - Escherichia coli pir|S09629|S09629 hypothetical protein A - Escherichia coli (SUB 40-520) 85 52 252
4611 242 gi1256635 dihydroxy-acid dehydratase Bacillus subtilis 85 65 240
10061 10591 gi46982 osB gene product Staphylococcus epidermidis 84 68 531
13 1172 996 gi142450 ahrC protein Bacillus subtilis 84 56 177
16 1803 4652 gi1277198 DNA repair protein Deinococcus radiodurans 84 67 2850
22 1128 721 gi511069 UreF Staphylococcus xylosus 84 73 408
23 5055 5306 gi603320 Yer082p Saccharomyces cerevisiae 84 61 252
53 11145 10693 gi1303948 YgiW Bacillus subtilis 84 68 453

TABLE 2-continued
53 12 12770 11481 gi1426134 branched chain alpha-keto acid dehydrogenase E2 Bacillus subtilis gi1303944 BfmBB Bacillus subtilis 84 71 1290
70 1 982 632 gi46647 ORF (repE) Staphylococcus aureus 84 68 351
73 2512 4311 gi142993 glycerol-3-phosphate dehydrogenase (glpD) (EC 1.1.99.5) Bacillus subtilis 84 74 1800
98 4324 6096 gi467427 methionyl-tRNA synthetase Bacillus subtilis 84 66 1773
100 8680 7859 gi340128 ORF1 Staphylococcus aureus 84 78 822
117 1934 3208 gi1237019 Srb Bacillus subtilis 84 68 1275
348 4720 5670 gi467462 cysteine synthetase A Bacillus subtilis 84 69 951
152 2064 2456 gi143377 pyruvate decarboxylase (E-1) alpha subunit Bacillus subtilis pir|B36718|DEBSPA pyruvate dehydrogenase (lipoamide) (EC 1.2.4.1) alpha chain - Bacillus subtilis 84 70 393
169 3634 3863 gi1001342 hypothetical protein Synechocystis sp. 84 66 228
171 2657 2322 gi517475 D-amino acid transaminase Staphylococcus haemolyticus 84 71 336
186 6216 5491 gi467475 unknown Bacillus subtilis 84 70 726
205 5692 5123 gi216340 ORF for adenylate kinase Bacillus subtilis 84 71 570
224 915 1391 gi288269 beta-fructofuranoside Staphylococcus xylosus 84 70 477
251 92 388 gi1303790 YgeI Bacillus subtilis 84 65 297
282 1526 2836 gi143040 glutamate-1-semialdehyde 2,1-aminotransferase Bacillus subtilis pir|D42728|D42728 glutamate-1-semialdehyde 2,1-aminomutase (EC 5.4.3.8) - Bacillus subtilis 84 75 1311
307 2959 2780 gi1070014 protein-dependent Bacillus subtilis 84 62 180
320 2343 4229 gi143390 carbamyl phosphate synthetase Bacillus subtilis 84 70 1887
372 296 gi1022725 unknown Staphylococcus haemolyticus 84 70 294
413 1341 481 gi1256146 YbbQ Bacillus subtilis 84 65 861
439 392 gi1046173 osmotically inducible protein Mycoplasma genitalium 84 53 390
461 1362 2270 gi402114 (thrC) (AA 1-352) Bacillus subtilis pir|A25364|A25364 threonine synthase (EC 4.2.99.2) - Bacillus subtilis 84 69 909
487 299 gi1144531 integrin-like protein alpha Intlp Candida albicans 84 46 297
491 624 905 pir|S08564|R3BS ribosomal protein S9 - Bacillus stearothermophilus 84 69 282
491 3 836 1033 pir|S08564|R3BS ribosomal protein S9 - Bacillus stearothermophilus 84 77 198
548 341 gi431231 uracil permease Bacillus caldolyticus 84 74 339
728 1748 795 gi912445 DNA polymerase Bacillus caldotenax 84 68 954
769 257 gi1510953 cobalamin biosynthesis protein N Methanococcus jannaschii 84 38 255
954 156 gi1405454 aconitase Bacillus subtilis 84 57 153
957 395 gi143402 recombination protein (ttg start codon) Bacillus subtilis gi1303923 RecN Bacillus subtilis 84 68 393
975 452 gi885934 ClpB Synechococcus sp. 84 70 450
1585 257 gi510140 oligoendopeptidase F Lactococcus lactis 84 56 255
2954 323 gi603769 HutU protein, urocanase Bacillus subtilis 84 73 321
2996 46 gi18178 formate acetyltransferase Chlamydomonas reinhardtii pir|S24997|S24997 formate C-acetyltransferase (EC 2.3.1.54) - Chlamydomonas reinhardtii 84 65 303
3766 375 13 gi517205 67 kDa Myosin-crossreactive Streptococcal antigen Streptococcus pyogenes 84 72 363
4022 169 gi1148206 glutamate dehydrogenase Bacillus subtilis 84 54 168
4058 312 gi151932 fructose enzyme II Rhodobacter capsulatus 84 71 309
4108 106 351 gi1072418 gleA gene product Staphylococcus carnosus 84 77 246
4183 308 gi603769 HutU protein, urocanase Bacillus subtilis 84 72 306
4726 55 234 gi146208 glutamate synthase large subunit (EC 2.6.1.53) Escherichia coli pir|A29617|A29617 glutamate synthase (NADPH) (EC 1.4.1.13) large chain - Escherichia coli 84 73 180
22 4 1576 1109 gi393297 urease accessory protein Bacillus sp. 83 64 468
53 13 13745 12768 gi142612 branched chain alpha-keto acid dehydrogenase E1-beta Bacillus subtilis 83 68 978
57 16 12872 12387 gi143132 lactate dehydrogenase (EC 1.1.1.27) Bacillus caldolyticus pir|B29704|B29704 L-lactate dehydrogenase (EC 1.1.1.27) - Bacillus caldolyticus 83 66 486
66 2274 1429 gi1303894 YchN Bacillus subtilis 83 63 846
66 4643 3168 gi2212730 YghK Bacillus subtilis 83 68 1476
70 1523 1182 gi44095 replication initiator protein Listeria monocytogenes 83 73 342
90 377 1429 gi155571 alcohol dehydrogenase I (adhA) (EC 1.1.1.1) Zymomonas mobilis pir|A35260|A35260 alcohol dehydrogenase (EC 1.1.1.1) I - Zymomonas mobilis 83 70 1053
95 708 2162 gi506381 phospho-beta-glucosidase Bacillus subtilis 83 70 1455
137 168 694 gi467391 initiation protein of replication Bacillus subtilis 83 77 627
140 2742 2275 gi634107 kdpB Escherichia coli 83 65 468
142 2989 2510 gi2212776 lumazine synthase (b-subunit) Bacillus amyloliquefaciens 83 69 480
161 5749 6696 gi903307 ORF75 Bacillus subtilis 83 64 948
164 9880 11070 gi149316 ORF2 gene product Bacillus subtilis 83 66 1191
164 14148 14546 gi580902 ORF6 gene product Bacillus subtilis 83 60 399
170 2467 1790 gi520844 orf4 Bacillus subtilis 83 64 678
186 1370 711 gi289284 cysteinyl-tRNA synthetase Bacillus subtilis 83 72 660
205 7607 7392 gi216337 ORF for L30 ribosomal protein Bacillus subtilis 83 74 216
237 3683 4540 gi1510488 imidazoleglycerol-phosphate synthase (cyclase) Methanococcus jannaschii 83 60 858
302 638 291 gi467419 unknown Bacillus subtilis 83 65 348
302 1421 2743 gi508979 GTP-binding protein Bacillus subtilis 83 68 1323
321 3571 3209 gi139844 (citG) (aa 1-462) Bacillus subtilis 83 68 363

TABLE 2-continued
367 2 352 gi1039479 ORFU Lactococcus lactis 83 54 351
387 3 662 gi806281 DNA polymerase I Bacillus stearothermophilus 83 70 660
527 916 1566 gi396259 protease Staphylococcus epidermidis 83 67 651
533 179 gi142455 alanine dehydrogenase (EC 1.4.1.1) Bacillus stearothermophilus pir|B34261|B34261 alanine dehydrogenase (EC 1.4.1.1) - Bacillus stearothermophilus 83 66 177
536 1438 3259 gi143366 adenylosuccinate lyase (PUR-B) Bacillus subtilis pir|C29326|WZBSDS adenylosuccinate lyase (EC 4.3.2.2) - Bacillus subtilis 83 67 180
652 3 859 gi520753 DNA topoisomerase I Bacillus subtilis 83 72 858
774 361 gi1522665 M. jannaschii predicted coding region MJECL28 Methanococcus jannaschii 83 58 162
897 120 296 gi1064807 ORNITHINE AMINOTRANSFERASE Bacillus subtilis 83 76 177
1213 491 gi289288 lexA Bacillus subtilis 83 67 489
2529 150 gi143786 tryptophanyl-tRNA synthetase (EC 6.1.1.2) Bacillus subtilis pir|JT0481|YWBS tryptophan-tRNA ligase (EC 6.1.1.2) - Bacillus subtilis 83 69 147
2973 326 gi1109687 ProZ Bacillus subtilis 83 58 324
3009 366 gi882532 ORF o294 Escherichia coli 83 65 363
3035 45 305 gi950062 hypothetical yeast protein 1 Mycoplasma capricolum pir|S48578|S48578 hypothetical protein - Mycoplasma capricolum (SGC3) (fragment) 83 59 261
3906 67 309 gi1353197 thioredoxin reductase Eubacterium acidaminophilum 83 61 243
4458 271 gi397526 clumping factor Staphylococcus aureus 83 78 270
4570 223 gi1022726 unknown Staphylococcus haemolyticus 83 74 222
4654 97 261 gi1072419 gloB gene product Staphylococcus carnosus 83 79 165
16 295 1191 gi153854 uVS402 protein Streptococcus pneumoniae 82 67 897
16 1193 1798 gi153854 uVS402 protein Streptococcus pneumoniae 82 70 606
38 8748 7804 gi1204400 N-acetylneuraminate lyase Haemophilus influenzae 82 58 921
42 988 2019 gi841192 catalase Bacteroides fragilis 82 70 1032
51 2590 3489 gi143607 sporulation protein Bacillus subtilis 82 69 900
56 12270 13925 gi39431 oligo-1,6-glucosidase Bacillus cereus 82 60 1656
56 17673 18014 gi467410 unknown Bacillus subtilis 82 66 342
61 881 3313 gi143148 transfer RNA-Leu synthetase Bacillus subtilis 82 70 2433
82 9162 11318 gi482404 elongation factor G (AA 1-691) Thermus aquaticus thermophilus pir|S15928|EFTWG translation elongation factor G - Thermus aquaticus sp|P13551|EFG_THETH ELONGATION FACTOR G (EF-G). 82 64 2157
85 2 3260 1050 gi143369 phosphoribosylformylglycinamidine synthetase II (PUR-Q) Bacillus subtilis 82 66 2211
102 3662 5380 gi1256635 dihydroxy-acid dehydratase Bacillus subtilis 82 65 1719
117 3242 3493 pir|A47154|A471 orf1 5' of Ffh - Bacillus subtilis 82 53 252
128 4377 5933 gi460258 phosphoglycerate mutase Bacillus subtilis 82 66 1557
129 1229 2182 gi403373 glycerophosphoryl diester phosphodiesterase Bacillus subtilis pir|S37251|S37251 glycerophosphoryl diester phosphodiesterase - Bacillus subtilis 82 62 954
170 1441 gi1377831 unknown Bacillus subtilis 82 67 1440
177 1094 gi467386 thiophene and furan oxidation Bacillus subtilis 82 65 1092
184 3572 4039 gi153566 ORF (19K protein) Enterococcus faecalis 82 59 468
189 4225 3995 gi1001878 CspL protein Listeria monocytogenes 82 73 231
206 20707 20048 gi473916 lipopeptide antibiotics iturin A Bacillus subtilis sp|P39144|LP14_BACSU LIPOPEPTIDE ANTIBIOTICS ITURIN A AND SURFACTIN BIOSYNTHESIS PROTEIN. 82 50 660
221 805 1722 gi517205 67 kDa Myosin-crossreactive Streptococcal antigen Streptococcus pyogenes 82 63 918
223 3651 3436 gi439619 Salmonella typhimurium IS200 insertion sequence from SARA17, partial. gene product Salmonella typhimurium 82 69 216
260 4296 3385 gi1161381 IcaB Staphylococcus epidermidis 82 61 912
315 2855 846 gi143397 quinol oxidase Bacillus subtilis 82 67 2010
321 7945 7370 gi142981 ORF5; This ORF includes a region (aa23-103) containing a potential iron-sulphur center homologous to a region of Rhodospirillum rubrum and Chromatium vinosum; putative Bacillus stearothermophilus pir|PQ0299|PQ0299 hypothetical protein 5 (gldA 3' region) 82 62 576
331 1055 1342 gi436574 ribosomal protein L1 Bacillus subtilis 82 71 288
370 262 618 gi1303793 YgeL Bacillus subtilis 82 59 357
404 3053 4024 gi1303821 YafE Bacillus subtilis 82 68 972
405 3073 1706 gi1303913 YchX Bacillus subtilis 82 67 1368
436 2864 1632 gi149521 beta subunit Lactococcus lactis pir|S35129|S35129 tryptophan synthase (EC 4.2.1.20) beta chain - Lactococcus lactis subsp. lactis 82 67 1233
441 2573 1752 gi142952 glyceraldehyde-3-phosphate dehydrogenase Bacillus stearothermophilus 82 67 822
444 12 10415 11227 gi1204354 spore germination and vegetative growth protein Haemophilus influenzae 82 67 813
446 3 191 gi143387 aspartate transcarbamylase Bacillus subtilis 82 66 189
462 3 1007 1210 gi142521 deoxyribodipyrimidine photolyase Bacillus subtilis pir|A37192|A37192 uvrB protein - Bacillus subtilis sp|P14951|UVRC_BACSU EXCINUCLEASE ABC SUBUNIT C. 82 64 204

TABLE 2-continued
537 784 gi853767 UDP-N-acetylglucosamine 1-carboxyvinyltransferase Bacillus subtilis 82 61 777
680 407 gi426472 secE gene product Staphylococcus carnosus 82 69 294
724 386 gi143373 phosphoribosyl aminoimidazole carboxy formyl formyltransferase/inosine monophosphate cyclohydrolase (PUR-H(J)) Bacillus subtilis 82 68 180
763 213 gi467458 cell division protein Bacillus subtilis 82 35 210
818 283 gi1064787 function unknown Bacillus subtilis 82 69 282
858 175 1176 gi143043 uroporphyrinogen decarboxylase Bacillus subtilis pir|B47045|B47045 uroporphyrinogen decarboxylase (EC 4.1.1.37) - Bacillus subtilis 82 71 1002
895 599 gi1027507 ATP binding protein Borrelia burgdorferi 82 72 597
939 399 gi143795 transfer RNA-Tyr synthetase Bacillus subtilis 82 60 390
961 306 gi577647 gamma-hemolysin Staphylococcus aureus 82 69 366
1192 155 gi146974 NH3-dependent NAD synthetase Escherichia coli 82 71 153
1317 49 375 gi407908 EIIscr Staphylococcus xylosus 82 72 327
1341 150 gi39962 ribosomal protein L35 (AA 1-66) Bacillus stearothermophilus pir|S05347|R5BS35 ribosomal protein L35 - Bacillus stearothermophilus 82 68 150
2990 349 131 gi534855 ATPase subunit epsilon Bacillus stearothermophilus sp|P42009|ATPE_BACST ATP SYNTHASE EPSILON CHAIN (EC 3.6.1.34). 82 47 219
3024 45 224 gi467402 unknown Bacillus subtilis 82 64 180
3045 139 gi467335 ribosomal protein L9 Bacillus subtilis 82 60 138
3045 400 242 gi467335 ribosomal protein L9 Bacillus subtilis 82 82 159
3091 238 gi499335 secA protein Staphylococcus carnosus 82 78 237
3107 210 gi546918 orfY 3' of comK Bacillus subtilis, E26, Peptide Partial, 140 aa pir|S43612|S43612 hypothetical protein Y - Bacillus subtilis sp|P40398|YNXD_BACSU HYPOTHETICAL PROTEIN IN COMK 3'REGION (ORFY) (FRAGMENT). 82 64 207
4332 319 gi420864 nitrate reductase alpha subunit Escherichia coli sp|P09152|NARG_ECOLI RESPIRATORY NITRATE REDUCTASE 1 ALPHA CHAIN (EC 1.7.99.4). (SUB 2-1247) 82 75 318
23 2574 1873 gi1199573 spsB Sphingomonas sp. 64 702
42 321 gi4667784 lysine specific permease Escherichia coli 59 318
48 4051 4350 gi1045937 M. genitalium predicted coding region MG246 Mycoplasma genitalium 62 300
51 1578 2579 pir|S16649|S166 dciAC protein - Bacillus subtilis 55 1002
53 364 1494 gi1303961 YajJ Bacillus subtilis 67 1131
53 7971 6523 gi146930 6-phosphogluconate dehydrogenase Escherichia coli 66 1449
54 10119 9481 gi143016 permease Bacillus subtilis 65 639
54 11786 10212 gi143015 gluconate kinase Bacillus subtilis 64 1575
57 13366 12749 pir|A25805|A258 L-lactate dehydrogenase (EC 1.1.1.27) - Bacillus subtilis 74 618
81 2237 1726 gi1222302 NifJ-related protein Haemophilus influenzae 54 492
86 374 gi414017 ipa-93d gene product Bacillus subtilis 70 372
103 4863 3284 gi971342 nitrate reductase beta subunit Bacillus subtilis sp|P42176|NARH_BACSU NITRATE REDUCTASE BETA CHAIN (EC 1.7.99.4). 64 1578
120 10845 12338 gi1524392 GbsA Bacillus subtilis 67 1494
128 3676 4413 gi143319 triose phosphate isomerase Bacillus megaterium 64 738
131 9280 8252 gi299163 alanine dehydrogenase Bacillus subtilis 68 1029
143 5471 4654 gi439619 Salmonella typhimurium IS200 insertion sequence from SARA17, partial. gene product Salmonella typhimurium 61 618
169 43 825 gi897795 30S ribosomal protein Pediococcus acidilactici sp|P49668|RS2_PEDAC 30S RIBOSOMAL PROTEIN S2. 8 65 783
230 226 gi1125826 short region of weak similarity to tyrosine-protein kinase receptors in a fibronectin type III-like domain Caenorhabditis elegans 54 225
233 51 2000 2677 gi467404 unknown Bacillus subtilis 8 63 678
241 2149 1217 gi16510 succinate-CoA ligase (GDP-forming) Arabidopsis thaliana pir|S30579|S30579 succinate-CoA ligase (GDP-forming) (EC 6.2.1.4) alpha chain - Arabidopsis thaliana (fragment) 69 933
256 1 981 spoIIIE protein - Bacillus subtilis 8 65 981
259 2691 1630 PROBABLE PEPTIDE CHAIN RELEASE FACTOR 2 (RF-2) (FRAGMENT). 65 1062
275 1728 3581 gi726480 L-glutamine-D-fructose-6-phosphate aminotransferase Bacillus subtilis 68 1854
285 735 gi1204844 H. influenzae predicted coding region HI0594 Haemophilus influenzae 8 63 732
296 99 1406 gi467328 adenylosuccinate synthetase Bacillus subtilis 67 1308
302 5590 5889 gi147485 queA Escherichia coli 64 300
317 1137 1376 gi154961 resolvase Transposon Tn917 51 240
343 1034 1342 gi405955 yeeD Escherichia coli 60 309
360 1404 2471 gi1204570 aspartyl-tRNA synthetase Haemophilus influenzae 67 1068
364 5706 5161 gi1204652 methylated-DNA-protein-cysteine methyltransferase Haemophilus influenzae 63 546
372 1135 563 gi467416 unknown Bacillus subtilis 65 573
392 43 603 pir|S09411|S094 spoIIIE protein - Bacillus subtilis 65 561
404 5252 6154 gi606745 Bex Bacillus subtilis 65 903
426 1119 511 gi39453 manganese superoxide dismutase Bacillus caldotenax pir|S22053|S22053 superoxide dismutase (EC 1.15.1.1) (Mn) - Bacillus caldotenax 66 609

TABLE 2-continued
5653 5889 hypothetical protein II (ospH 3' region) - Salmonella typhimurium (fragment) 8 57 237
625 3 1105 2070 gi1262360 protein kinase PknB Mycobacterium leprae 56 966
754 504 1064 gi1303902 YchU Bacillus subtilis 71 561
842 86 430 gi1405446 transketolase Bacillus subtilis 68 345
953 400 gi1205429 dipeptide transport ATP-binding protein Haemophilus influenzae 57 399
961 252 gi4876864 synergohymenotropic toxin Staphylococcus intermedius pir|S44944|S44944 synergohymenotropic toxin - Staphylococcus intermedius 72 150
1035 189 gi1046138 M. genitalium predicted coding region MG423 Mycoplasma genitalium 43 189
1280 449 228 gi559164 helicase Autographa californica nuclear polyhedrosis virus sp|P24307|V143_NPVAC HELICASE. 43 222
3371 68 241 gi1322245 mevalonate pyrophosphate decarboxylase Rattus norvegicus 62 174
3715 239 gi537137 ORF f388 Escherichia coli 58 237
3908 325 gi439619 Salmonella typhimurium IS200 insertion sequence from SARA17, partial. gene product Salmonella typhimurium 68 324
3940 gi296464 ATPase Lactococcus lactis 69 399
3954 gi1224069 amidase Moraxella catarrhalis 68 318
4049 gi603768 HutI protein, imidazolone-5-propionate hydrolase Bacillus subtilis gi603768 HutI protein, imidazolone-5-propionate hydrolase Bacillus subtilis 68 168
4209 324 gi403373 glycerophosphoryl diester phosphodiesterase Bacillus subtilis pir|S37251|S37251 glycerophosphoryl diester phosphodiesterase - Bacillus subtilis 58 324
4371 322 17 gi216677 indolepyruvate decarboxylase Enterobacter cloacae pir|S16013|S16013 indolepyruvate decarboxylase (EC 4.1.1.-) - Enterobacter cloacae 72 306
4387 19 228 gi460689 TVG Thermoactinomyces vulgaris 59 210
4391 306 31 gi1524193 unknown Mycobacterium tuberculosis 67 276
4425 341 gi143015 gluconate kinase Bacillus subtilis 66 339
847 101 gi1064786 function unknown Bacillus subtilis 62 747
17 311 78 gi559164 helicase Autographa californica nuclear polyhedrosis virus sp|P24307|V143_NPVAC HELICASE. 40 234
45 1159 2448 gi1109684 ProV Bacillus subtilis 63 1290
45 4032 4733 gi1109687 ProZ Bacillus subtilis 55 702
54 9502 8738 gi563952 gluconate permease Bacillus licheniformis 62 765
62 7545 6238 gi854655 Na/H antiporter system Bacillus alcalophilus 62 1308
62 8087 8683 gi559713 ORF Homo sapiens 68 597
67 13781 14122 gi305002 ORF f356 Escherichia coli 65 342
70 10296 9097 gi1303995 YokN Bacillus subtilis 64 120
98 6336 7130 gi467428 unknown Bacillus subtilis 68
98 7294 7833 gi467430 unknown Bacillus subtilis 64
98 7820 8737 gi467431 high level kasugamycin resistance Bacillus subtilis 61
109 14154 14813 gi580875 ipa-57d gene product Bacillus subtilis 63
112 14294 16636 gi1072361 pyruvate-formate-lyase Clostridium pasteurianum 65 234
139 726 gi506699 CapC Staphylococcus aureus 58
139 1448 717 gi506698 CapB Staphylococcus aureus 59
174 2870 2469 gi1146242 aspartate 1-decarboxylase Bacillus subtilis 61
177 2102 2842 gi4673854 unknown Bacillus subtilis 70
184 5912 5700 gi161953 185-kDa surface antigen Trypanosoma cruzi 46
186 3875 2382 gi289282 glutamyl-tRNA synthetase Bacillus subtilis 65 149
205 15140 14484 gi40103 ribosomal protein L4 Bacillus stearothermophilus 66 65
207 140 1315 gi460259 enolase Bacillus subtilis 67 117
211 1078 1590 gi410131 ORFX7 Bacillus subtilis 61
235 1962 2255 gi143797 valyl-tRNA synthetase Bacillus stearothermophilus sp|P11931|SYV_BACST VALYL-TRNA SYNTHETASE (EC 6.1.1.9) (VALINE-TRNA LIGASE) (VALRS). 55 29
239 1263 gi143000 proton glutamate symport protein Bacillus stearothermophilus pir|S26247|S26247 glutamate/aspartate transport protein - Bacillus stearothermophilus 59 1263
272 2461 2198 gi709993 hypothetical protein Bacillus subtilis 54 264
301 1111 776 gi467418 unknown Bacillus subtilis 58 336
310 4501 3305 gi1177686 acuC gene product Staphylococcus xylosus 67 1197
310 5258 7006 gi348053 acetyl-CoA synthetase Bacillus subtilis 67 1749
310 7410 9113 gi1103865 formyl-tetrahydrofolate synthetase Streptococcus mutans 67 1704
325 1114 1389 gi310325 outer capsid protein Rotavirus sp. 40 276
337 636 gi537049 ORF o470 Escherichia coli 55 633
374 929 1228 gi1405448 YneF Bacillus subtilis 70 300
375 3062 3331 gi467448 unknown Bacillus subtilis 68 270
388 267 587 gi1064791 function unknown Bacillus subtilis 65 321
394 659 gi304976 matches PS00017: ATP_GTP_A and PS00301: EFACTOR_GTP; similar to elongation factor G, TetM/TetO tetracycline-resistance proteins Escherichia coli 65 653
456 625 1263 gi1141183 putative Bacillus subtilis 65 639
475 654 gi288269 beta-fructofuranoside Staphylococcus xylosus 66 654
544 1449 2240 gi529754 speC Streptococcus pyogenes 50 792
622 1623 1871 gi1483545 unknown Mycobacterium tuberculosis 65 249
719 1 1 1257 gi1064791 function unknown Bacillus subtilis 68 1257

TABLE 2-continued
739 107 838 gi666983 putative ATP binding subunit Bacillus subtilis 61 732
745 414 247 gi1511600 coenzyme PQQ synthesis protein III Methanococcus jannaschii 61 168
822 17 679 gi410141 ORFX17 Bacillus subtilis 68 663
827 836 681 gi1205301 leukotoxin secretion ATP-binding protein Haemophilus influenzae 54 156
1044 149 gi60632 vp2 Marburg virus
1220 413 255 pir|A61072|EPSG gallidermin precursor - Staphylococcus gallinarum 74 159
2519 75 275 gi147556 dpi Escherichia coli 45 201
2947 279 55 gi1184680 polynucleotide phosphorylase Bacillus subtilis 62 225
3120 226 gi517205 67 kDa Myosin-crossreactive Streptococcal antigen Streptococcus pyogenes 65 225
3191 148 gi151259 HMG-CoA reductase (EC 1.1.1.88) Pseudomonas mevalonii pir|A44756|A44756 hydroxymethylglutaryl-CoA reductase (EC 1.1.1.88) - Pseudomonas sp. 59 147
3560 285 434 gi217130 photosystem I core protein B Synechococcus vulcanus 70
3655 47 346 gi415855 deoxyribose aldolase Mycoplasma hominis 56
3658 324 584 gi551531 2-nitropropane dioxygenase Williopsis saturnus 54
3769 400 gi1339950 large subunit of NADH-dependent glutamate synthase Plectonema boryanum 68
3781 348 gi166412 NADH-glutamate synthase Medicago sativa 62
3988 48 gi1204696 fructose-permease IIBC component Haemophilus influenzae 69
4030 287 gi1009366 respiratory nitrate reductase Bacillus subtilis 60
4092 275 gi1370207 orf6 Lactobacillus sake 69
4103 342 gi39956 IIGlc Bacillus subtilis 65
4231 348 gi289287 UDP-glucose pyrophosphorylase Bacillus subtilis 65
4265 299 gi603768 HutI protein, imidazolone-5-propionate hydrolase Bacillus subtilis gi603768 HutI protein, imidazolone-5-propionate hydrolase Bacillus subtilis 63
4504 250 gi1339950 large subunit of NADH-dependent glutamate synthase Plectonema boryanum 68
6 5998 6798 gi535351 CodY Bacillus subtilis 79 63 801
7051 5807 gi603768 HutI protein, imidazolone-5-propionate hydrolase Bacillus subtilis gi603768 HutI protein, imidazolone-5-propionate hydrolase Bacillus subtilis 79 64 1245
25 5273 5515 pir|A36728|A367 acyl carrier protein - Rhizobium meliloti 79 65 243
59 1173 1424 gi1147923 threonine dehydratase 2 (EC 4.2.1.16) Escherichia coli 79 75 252
60 204 gi1666115 orf1 upstream of glucose kinase Staphylococcus xylosus pir|S52351|S52351 hypothetical protein 1 - Staphylococcus xylosus 79 60 204
81 1 1590 178 gi466882 pps1: B1496 C2 189 Mycobacterium leprae 79 64
85 6505 5987 gi143364 phosphoribosyl aminoimidazole carboxylase I (PUR-E) Bacillus subtilis 79 60
89 4554 3448 gi1449064 product homologous to E. coli thioredoxin reductase: J. Biol. Chem. (1988) 263: 9015-9019, and to F52a protein of alkyl hydroperoxide reductase from S. typhimurium: J. Biol. Chem. (1990) 265: 10535-10540; open reading frame A Clostridium pasteurianum 79 35
11 7489 8571 gi1430934 ketol-acid reductoisomerase Bacillus subtilis sp|P37253|ILVC_BACSU KETOL-ACID REDUCTOISOMERASE (EC 1.1.1.86) (ACETOHYDROXY-ACID ISOMEROREDUCTASE) (ALPHA-KETO-BETA-HYDROXYLACYL REDUCTOISOMERASE). 79 64 1083
02 11190 12563 gi149428 putative Lactococcus lactis 79 65 1374
27 7792 9372 gi458688 PrfC/RF3 Dichelobacter nodosus 79 68 1581
39 1983 1426 gi506697 CapA Staphylococcus aureus 79 55 558
44 1156 668 gi1498296 peptide methionine sulfoxide reductase Streptococcus pneumoniae 79 47 489
48 529 1098 gi467457 hypoxanthine-guanine phosphoribosyltransferase Bacillus subtilis gi467457 hypoxanthine-guanine phosphoribosyltransferase Bacillus subtilis 79 59 570
50 591 217 gi755602 unknown Bacillus subtilis 79 61 375
76 587 135 gi297874 fructose-bisphosphate aldolase Staphylococcus carnosus pir|A49943|A49943 fructose-bisphosphate aldolase (EC 4.1.2.13) - Staphylococcus carnosus (strain TM300) 79 65 453
86 6874 6164 gi1314298 ORF5; putative Sms protein; similar to Sms proteins from Haemophilus influenzae and Escherichia coli Listeria monocytogenes 79 64 711
205 8498 8109 gi1044980 ribosomal protein L18 Bacillus subtilis 79 70 390
211 519 gi1303994 YckM Bacillus subtilis 79 62 519
223 2801 1419 gi488430 alcohol dehydrogenase 2 Entamoeba histolytica 79 60 1383
243 7896 6877 gi580883 ipa-88d gene product Bacillus subtilis 79 60 1020
279 3721 4329 gi413930 ipa-6d gene product Bacillus subtilis 79 59 609
300 11 1393 gi403372 glycerol 3-phosphate permease Bacillus subtilis 79 62 1383
307 1935 940 gi950062 hypothetical yeast protein 1 Mycoplasma capricolum pir|S48578|S48578 hypothetical protein - Mycoplasma capricolum (SGC3) (fragment) 79 60 996
352 8886 7666 gi216854 P47K Pseudomonas chlororaphis 79 59 1221
412 578 gi143177 putative Bacillus subtilis 79 51 576
481 621 1124 gi786163 ribosomal protein L10 Bacillus subtilis 79 66 504
516 352 2 gi805090 NisF Lactococcus lactis 79 48 351
525 1426 395 gi143371 phosphoribosyl aminoimidazole synthetase (PUR-M) Bacillus subtilis pir|H29326|AJBSCL phosphoribosylformylglycinamidine cyclo-ligase (EC 6.3.3.1) - Bacillus subtilis 79 61 1032

TABLE 2-continued
538 2825 2202 gi1370207 orf6 Lactobacillus sake 79 67 624
570 2 421 gi476160 arginine permease substrate-binding subunit Listeria monocytogenes 79 420
645 2663 3241 gi53898 transport protein Salmonella typhimurium 79 62 579
683 75 374 gi1064795 function unknown Bacillus subtilis 79 62 300
816 3987 3274 gi1407784 orf-1; novel antigen Staphylococcus aureus 79 62 714
2929 3 401 gi1524397 glycine betaine transporter OpuD Bacillus subtilis 79 81 399
2937 202 47 pir|A52915|S529 nitrate reductase alpha chain - Bacillus subtilis (fragment) 79 58 156
2940 385 gi149429 putative Lactococcus lactis 79 72 384
2946 286 gi143267 2-oxoglutamate dehydrogenase (Odha; EC 1.2.4.2) Bacillus subtilis 79 61 285
2999 212 gi710020 nitrite reductase (nirB) Bacillus subtilis 79 59 210
3022 322 150 gi4506864 3-phosphoglycerate kinase Thermotoga maritima 79 61 183
3064 314 gi1204436 pyruvate formate-lyase Haemophilus influenzae 79 60 312
3083 220 gi1149662 hypD gene product Clostridium perfringens 79 56 219
3126 411 321 gi1339950 large subunit of NADH-dependent glutamate synthase Plectonema boryanum 79 55 291
3181 326 45 gi1339950 large subunit of NADH-dependent glutamate synthase Plectonema boryanum 79 59 282
3345 476 Clp-like ATP-dependent protease binding subunit Bos taurus 79 63 474
3718 270 leuB protein, inactive - Lactococcus lactis subsp. lactis (strain IL1403) 79 71 267
3724 159 401 gi1009366 respiratory nitrate reductase Bacillus subtilis 79 64 243
3836 312 16 gi1524193 unknown Mycobacterium tuberculosis 79 65 297
3941 334 gi415855 deoxyribose aldolase Mycoplasma hominis 79 54 333
4113 341 gi143015 gluconate kinase Bacillus subtilis 79 63 339
4501 209 12 gi1022726 unknown Staphylococcus haemolyticus 79 66 198
4612 238 gi460689 TVG Thermoactinomyces vulgaris 79 58 237
2 1213 gi520753 DNA topoisomerase I Bacillus subtilis 78 64 1212
1220 174 gi216151 DNA polymerase (gene L; ttg start codon) Bacteriophage SPO2 gi579197 SPO2 DNA polymerase (aa 1-648) Bacteriophage SPO2 pir|A21498|DJBPS2 DNA-directed DNA polymerase (EC 2.7.7.7) - phage SPO2 78 72 1047
1089 838 gi1064787 function unknown Bacillus subtilis 78 57 252
32 6803 7702 gi146974 NH3-dependent NAD synthetase Escherichia coli 78 63 900
36 2941 3138 gi290503 glutamate permease Escherichia coli 78 53 198
53 16221 14758 gi1303941 YgiV Bacillus subtilis 78 58 1464
57 10520 12067 gi1072418 gleA gene product Staphylococcus carnosus 78 65 1548
66 5812 4826 gi1212729 YghJ Bacillus subtilis 78 67 987
67 4029 4376 gi466612 nikA Escherichia coli 78 71 348
91 10058 10942 gi467380 stage 0 sporulation Bacillus subtilis 78 51 885
102 1 8574 10130 gi149426 putative Lactococcus lactis 78 61 1557
112 3540 4463 gi854234 cymC gene product Klebsiella oxytoca 78 56 924
124 1061 234 gi4056224 unknown Bacillus subtilis 78 60 828
130 1805 2260 gi1256636 putative Bacillus subtilis 78 71 456
133 377 gi168060 lamB Emericella nidulans 78 59 375
166 6163 gi451216 mannosephosphate isomerase Streptococcus mutans 78 63 963
186 795 gi289284 cysteinyl-tRNA synthetase Bacillus subtilis 78 63 792
195 2315 1881 gi1353874 unknown Rhodobacter capsulatus 78 58 435
199 3623 2967 gi143525 succinate dehydrogenase cytochrome b-558 subunit Bacillus subtilis pir|A29843|DEBSSC succinate dehydrogenase (EC 1.3.99.1) cytochrome 558 - Bacillus subtilis 78 57 657
199 5557 3905 gi142521 deoxyribodipyrimidine photolyase Bacillus subtilis pir|A37192|A37192 uvrB protein - Bacillus subtilis sp|P14951|UVRC_BACSU EXCINUCLEASE ABC SUBUNIT C. 78 62 1653
223 3523 3215 gi439596 Escherichia coli IS200 insertion sequence from ECOR63, partial. gene product Escherichia coli 78 47 309
299 1865 2149 gi467439 temperature sensitive cell division Bacillus subtilis 78 62 285
321 7315 6896 gi142979 ORF3 is homologous to an ORF downstream of the spoT gene of E. coli; ORF3 Bacillus stearothermophilus 78 55 420
352 3714 3944 gi349050 actin 1 Pneumocystis carinii 78 42 231
352 6093 4594 gi903587 NADH dehydrogenase subunit 5 Bacillus subtilis sp|P39755|NDHF_BACSU NADH DEHYDROGENASE SUBUNIT 5 (EC 1.6.5.3) (NADH-UBIQUINONE CHAIN 5). 78 58 1500
376 583 gi551693 dethiobiotin synthase Bacillus sphaericus 78 34 582
424 1595 1768 gi1524117 alpha-acetolactate decarboxylase Lactococcus lactis 78 68 174
450 988 62 gi1030068 NAD(P)H oxidoreductase isoflavone reductase homologue Solanum tuberosum 78 63 927
558 562 362 gi1511588 bifunctional protein Methanococcus jannaschii 78 60 201
670 1152 1589 gi1122759 unknown Bacillus subtilis 78 64 438
714 1 64 732 gi143460 37 kd minor sigma factor (rpoF, sigB; ttg start codon) Bacillus subtilis 78 57 669
814 368 gi1377833 unknown Bacillus subtilis 78 59 366
981 692 gi143802 GerC2 Bacillus subtilis 78 64 690
995 727 gi296947 uridine kinase Escherichia coli 78 64 252
1045 gi1407784 orf-1; novel antigen Staphylococcus aureus 78 61 399
1163 186 gi410117 diaminopimelate decarboxylase Bacillus subtilis 78 54 183
2191 399 gi215098 excisionase Bacteriophage 154a 78 65 396
2933 181 gi1204436 pyruvate formate-lyase Haemophilus influenzae 78 73 180
3041 129 317 gi624632 GltL Escherichia coli 78 53 189

TABLE 2-continued 3581 105 401 gi763186 3-ketoacyl-coA thiolase Saccharomyces cerevisiae 78 55 297 3709 230 gi460689 TVG Thermoactinomyces vulgaris 78 58 228 3974 265 gi558839 unknown Bacillus subtilis 78 65 264 3980 gi39956 IIGlc Bacillus subtilis 78 62 399 4056 354 gi1256635 dihydroxy-acid dehydratase Bacillus subtilis 78 55 294 4114 316 pirS09372S093 hypothetical protein-Trypanosoma brucei 78 62 315 4185 179 gi1339950 large subunit of NADH-dependent glutamate synthase 78 58 177 Plectonema boryanum 4235 329 gi558839 unknown Bacillus subtilis 78 60 327 4352 302 gi603768 HutI protein, imidazolone-5-propionate hydrolase Bacillus subtilis 78 63 240 gi603768 HutI protein, imidazolone-5-propionate hydrolase Bacillus subtilis 4368 307 gi1353678 heavy-metal transporting P-type ATPase Proteus mirabilis 78 59 306 4461 216 gi1276841 glutamate synthase (GOGAT) Porphyra purpurea 78 36 213 4530 238 gi39956 IIGlc Bacillus subtilis 78 65 237 2073 1177 gi1109684 ProV Bacillus subtilis 77 56 897 12 1965 1504 gi467335 ribosomal protein L9 Bacillus subtilis 77 59 462 27 388 gi1212728 YchI Bacillus subtilis 77 63 387 39 590 1252 gi40054 phenylalanyl-tRNA synthetase beta subunit (AA 1-804) 77 60 663 Bacillus subtilis 42 2704 2931 gi606241 30S ribosomal subunit protein S14 Escherichia coli 77 65 228 sp|P02370|RS14 ECOLI 30S RIBOSOMAL PROTEIN S14. (SUB 2-101) 46 18 15459 16622 gi297798 mitochondrial formate dehydrogenase precursor Solanum tuberosum 77 55 1164 pirJQ2272JQ2272 formate dehydrogenase (EC 1.2.1.2) precursor, mitochondrial-potato 100 4002 3442 gi1340128 ORF1 Staphylococcus aureus 77 54 561 102 5378 5713 gi1311482 acetolactate synthase Thermus aquaticus 77 57 336 109 4742 5383 gi710637 unknown Bacillus subtilis 77 56 642 117 1228 gi1237015 ORF4 Bacillus subtilis 77 53 1227 124 7688 7053 gi405819 thymidine kinase Bacillus subtilis 77 63 636 147 985 824 gi849027 hypothetical 15.9-kDa protein Bacillus subtilis 77 37 162 152 1 7354 7953 gi1205583 spermidine/putrescine transport ATP-binding protein Haemophilus 77 55 600 influenzae 169 1004 1282 gi473825 elongation factor EF-Ts Escherichia coli 77 58 279 184 380 1147 gi216314 esterase Bacillus stearothermophilus 77 60 768 189 3296 3868 gi853809 ORF3 Clostridium perfringens 77 48 573 193 132 290 gi1303788 YgeH Bacillus subtilis 77 54 159 195 8414 8088 gi1499620 M. jannaschii predicted coding region MJ0798 77 44 327 Methanococcus jannaschii 205 5204 4980 gi216340 ORF for adenylate kinase Bacillus subtilis 77 61 225 205 14502 14209 gi786155 Ribosomal Protein L23 Bacillus subtilis 77 62 294 211 1908 2084 gi410332 ORFX8 Bacillus subtilis 77 47 177 217 3478 4416 gi496254 fibronectin/fibrinogen-binding protein Streptococcus pyogenes 77 54 939 232 267 998 gi1407784 orf-1; novel antigen Staphylococcus aureus 77 57 732 233 1346 873 gi467408 unknown Bacillus subtilis 77 61 474 243 2299 1937 gi516155 unconventional myosin Sus scrofa 77 32 363 299 68 769 gi467436 unknown Bacillus subtilis 77 54 702 301 1283 1098 gi950073 ATP-bind. pyrimidine kinase Mycoplasma capricolum 77 48 386 pirS48605S48605 hypothetical protein Mycoplasma capricolum (SGC3) (fragment) 302 2741 3211 gi508980 pheB Bacillus subtilis 77 57 471 302 3835 4863 gi147783 ruvB protein Escherichia coli 77 60 1029 307 4797 4192 gi1070015 protein-dependent Bacillus subtilis 77 60 606 312 99 1391 gi143165 malic enzyme (EC 1.1.1.38) Bacillus stearothermophilus 77 62 1293 pirA33307|DEBSXS malate dehydrogenase (oxaloacetate decarboxylating) (EC 1.1.1.38)-Bacillus stearothermophilus 312 1541 2443 gi1399855 carboxyltransferase beta subunit Synechococcus PCC7942 77 58 903 321 4596 3526 gi39844 fumarase (citG) (aa 1-482) Bacillus subtilis 77 65 1071 354 47 568 gi1154634 YmaB Bacillus subtilis 77 57 522 365 1021 gi143374 phosphoribosylglycinamide synthetase (PUR-D; gtg start codon) 77 62 1020 Bacillus subtilis 374 708 gi1405446 transketolase Bacillus subtilis 77 61 708 385 565 gi533099 endonuclease III Bacillus subtilis 77 63 564 392 594 1940 gi556014 UDP-N-acetyl muramate-alanine ligase Bacillus subtilis 77 65 1347 sp|P40778|MURC BACSU UDP-N-ACETYLMURAMATE ALANINE LIGASE (EC 6.3.2.8) (UDP-N-ACETYLMURAMOYL L-ALANINE SYNTHETASE) (FRAGMENT) 405 3570 3061 gi1303912 YahW Bacillus subtilis 77 64 510 487 1302 1472 gi432427 ORF1 gene product Acinetobacter calcoaceticus 77 48 171 522 562 pirA01379|SYBS tyrosine-tRNA ligase (EC 6.1.1.1)-Bacillus stearothermophilus 77 63 561 523 1351 1115 gi1387979 44% identity over 302 residues with hypothetical protein from 77 48 237 Synechocystis sp. accession D64006 CD; expression induced by environmental stress; some similarity to glycosyl transferases; two potential membrane-spanning helices Bacillus subtil 536 612 241 gi143366 adenylosuccinate lyase (PUR-B) Bacillus subtilis 77 61 372 pirC29326MZBSDS adenylosuccinate lyase (EC 4.3.2.2)-Bacillus subtilis 548 339 872 gi143387 aspartate transcarbamylase Bacillus subtilis 77 56 534

TABLE 2-continued 597 2 481 gi904198 hypothetical protein Bacillus subtilis 77 33 480 633 1313 879 gi387577 ORF1A Bacillus subtilis 77 64 435 642 85 360 gi46971 epiP gene product Staphylococcus epidermidis 77 61 276 659 125 1219 gi1072381 glutamyl-aminopeptidase Lactococcus lactis 77 62 1095 670 1587 1820 gi1122760 unknown Bacillus subtilis 77 58 234 789 391 gi1377823 aminopeptidase Bacillus subtilis 77 65 390 815 10 573 gi1303861 YdgN Bacillus subtilis 77 49 564 899 225 gi1204844 H. influenzae predicted coding region HI0594 Haemophilus 77 55 225 influenzae 1083 188 gi460828 B969 Saccharomyces cerevisiae 77 66 186 1942 209 gi160047 p101/acidic basic repeat antigen Plasmodium falciparum 77 38 207 pirA29232A29232 101K malaria antigen precursor-Plasmodium falciparum (strain Camp) 2559 171 gi1499034 M. jannaschii predicted coding region MJ0255 77 61 171 Methanococcus jannaschii 2933 243 gi423704 pyruvate formate-lyase (AA 1-760) Escherichia coli 77 72 159 pirS01788S01788 formate C-acetyltransferase (EC 2.3.1.54)-Escherichia coli 2966 56 292 gi1524397 glycine betaine transporter OpuD Bacillus subtilis 77 45 237 2976 309 gi40003 Oxoglutamate dehydrogenase (NADP) Bacillus subtilis 77 60 306 sp|P23129|ODO1 BACSU 2-OXOGLUTAMATE DEHYDROGENASE E1 COMPONENT (EC 1.2.4.2) (ALPHA-KETOGLUTARATE DEHYDROGENASE). 2979 400 122 gi1204354 spore germination and vegetative growth protein 77 61 279 Haemophilus influenzae 2988 377 153 gi438465 Probable operon with orfF. Possible alternative initiation codon, bases 77 55 225 2151-2153. Homology with acetyltransferases.; putative Bacillus subtilis 2990 167 gi142562 ATP synthase epsilon subunit Bacillus megaterium 77 63 165 pirB28599PWBSEN H-transporting ATP synthase (EC 3.6.1.34) epsilon chain-Bacillus megaterium 389 gi488130 alcohol dehydrogenase 2 Entamoeba histolytica 77 56 387 195 gi468764 mocR gene product Rhizobium meliloti 77 50 195 400 74 gi603768 HutI protein, imidazolone-5-propionate hydrolase Bacillus subtilis 77 52 327 gi603768 HutI protein, imidazolone-5-propionate hydrolase Bacillus subtilis 4048 386 69 gi216278 gramicidin S synthetase Bacillus brevis 77 55 318 4110 368 pirS52915|S529 nitrate reductase alpha chain-Bacillus subtilis (fragment) 77 61 366 4111 348 gi517205 67 kDa Myosin-crossreactive Streptococcal antigen 77 65 348 Streptococcus pyogenes 4225 297 gi1322245 mevalonate pyrophosphate decarboxylase Rattus norvegicus 77 60 294 4611 327 160 gi508979 GTP-binding protein Bacillus subtilis 77 57 168 4668 182 pirS52915|S529 nitrate reductase alpha chain-Bacillus subtilis (fragment) 77 61 180 25 2 1627 gi1150620 MmsA Streptococcus pneumoniae 76 58 1626 38 1488 2537 pirA43577A435 regulatory protein pfoR-Clostridium perfringens 76 57 1050 52 2962 4041 gi1161061 dioxygenase Methylobacterium extorquens 76 62 1080 56 27389 27955 gi467402 unknown Bacillus subtilis 76 56 567 57 12046 12219 gi1206040 weak similarity to keratin Caenorhabditis elegans 76 40 174 91 1062 2261 gi4757154 acetyl coenzyme A acetyltransferase (thiolase) 76 57 1200 Clostridium acetobutylicum 98 818 1624 gi467422 unknown Bacillus subtilis 76 62 807 98 2965 3228 gi897793 y98 gene product Pediococcus acidilactici 76 52 264 98 5922 6326 gi467427 methionyl-tRNA synthetase Bacillus subtilis 76 53 405 104 1322 1885 gi216151 DNA polymerase (gene L.; ttg start codon) Bacteriophage 76 63 564 SPO2 gi579197 SPO2 DNA polymerase (aa 1-648) Bacteriophage SPO2 pirA21498|DJBPS2 DNA-directed DNA polymerase (EC 2.7.7.7)-phage SPO2 124 9 7055 5976 gi853776 peptide chain release factor 1 Bacillus subtilis pirS55437S55437 76 58 1080 peptide chain release factor 1-Bacillus subtilis 164 2832 3311 gi1204976 prolyl-tRNA synthetase Haemophilus influenzae 76 53 480 168 1841 1065 gi1177253 putative ATP-binding protein of ABC-type Bacillus subtilis 76 58 777 189 163 888 gi467384 unknown Bacillus subtilis 76 63 726 235 2253 3518 gi142936 folyl-polyglutamate synthetase Bacillus subtilis 76 53 1266 pirB40646B40646 folC-Bacillus subtilis 236 335 925 gi1146197 putative Bacillus subtilis 76 54 591 237 5323 5541 gi1279261 F13G3.6 Caenorhabditis elegans 76 47 219 263 4585 3680 gi1510348 dihydrodipicolinate synthase 76 49 906 Methanococcus jannaschii 304 1051 1794 gi666982 putative membrane spanning subunit Bacillus subtilis 76 60 744 pirS52382S52382 probable membrane spanning protein Bacillus subtilis 312 3611 4624 gi143312 6-phospho-1-fructokinase (gtg start codon; EC 2.7.1.11) 76 56 1014 Bacillus stearothermophilus 343 1036 gi405956 yeeE Escherichia coli 76 59 1035 347 409 1701 gi396304 acetylornithine deacetylase Escherichia coli 76 72 1293 358 672 1907 gi1146215 39.0% identity to the Escherichia coli S1 ribosomal protein; putative 76 58 1236 Bacillus subtilis 371 222 gi537084 alternate gene name mgt; CG Site No. 497 Escherichia coli 76 61 222 pirS56468S56468 mgtA protein-Escherichia coli

TABLE 2-continued 379 4331 4858 gi 43268 dihydrolipoamide transsuccinylase (odhB; EC 2.3.1.61) 76 61 528 Bacillus subtilis 404 4022 4492 gi 303823 YafG Bacillus subtilis 76 60 471 411 307 gi 4 86025 ORF YKL027w Saccharomyces cerevisiae 76 55 336 472 2854 1352 gi 405464 AlsT Bacillus subtilis 76 57 1503 546 273 995 gi 53821 Streptococcal pyrogenic exotoxin type C (speC) precursor 76 36 723 Streptococcus pyogenes 588 557 60 gi MutS Bacillus subtilis 76 61 498 591 16 735 gi 885934 Clpb Synechococcus sp. 76 44 720 602 175 798 gi 486422 OppD homologue Rhizobium sp. 76 52 624 619 290 33 gi 330613 major capsid protein Human cytomegalovirus 76 47 258 660 2568 3302 gi 904199 hypothetical protein Bacillus subtilis 76 55 735 677 228 gi 40377 spo0F gene product Bacillus subtilis 76 58 225 962 24 gi 42443 adenylosuccinate synthetase Bacillus subtilis 76 67 183 sp|P29726|PURA BACSU ADENYLOSUCCINATE SYNTHETASE (EC 6.3.4.4) (IMP-ASPARTATE LIGASE). 978 2 gi 511333 M. jannaschii predicted coding region MJ1322 76 56 579 Methanococcus jannaschii 997 244 gi 4 67154 No definition line found Mycobacterium leprae 76 38 243 1563 266 gi 303984 YckC Bacillus subtilis 76 52 264 2184 182 gi Cap Staphylococcus aureus 76 38 180 2572 gi 53898 transport protein Salmonella typhimurium 76 65 387 2942 29 gi nitrite reductase (nirB) Bacillus subtilis 76 59 372 2957 216 gi 511251 hypothetical protein (SP:P42404) Methanococcus jannaschii 76 47 162 2980 279 gi 405464 AlsT Bacillus subtilis 76 53 276 3015 326 gi 4 08115 ornithine acetyltransferase Bacillus subtilis 76 61 324 3124 13 gi 1882705 ORF o401 Escherichia coli 76 65 162 3179 gi 68477 ferredoxin-dependent glutamate synthase Zea mays 76 53 159 pirA38596A38596 glutamate synthase (ferredoxin) (EC 1.4.7.1)-maize 3789 379 gi 39956 IIGlc Bacillus subtilis 76 55 378 3892 314 gi ferripyochelin binding protein Methanococcus jannaschii 76 52 312 3928 400 gi permease Bacillus subtilis 76 59 399 4159 386 15 sp METHICILLIN-RESISTANT SURFACE PROTEIN 76 66 372 (FRAGMENTS). 4204 17 331 gi1296464 ATPase Lactococcus lactis 76 56 315 4398 249 gi1987255 Menkes disease gene Homo sapiens 76 48 246 4506 313 gi 216746 D-lactate dehydrogenase Lactobacillus plantarum 76 47 312 4546 247 17 1339950 large subunit of NADH-dependent glutamate synthase 76 61 231 Plectonema boryanum 4596 191 gi 560027 cellulose synthase Acetobacter xylinum 76 70 189 5 4337 3417 gi1882532 ORF o294 Escherichia coli 75 59 921 164 952 gi 40960 Escherichia coli 75 56 789 12 3944 1953 gi 467336 unknown Bacillus subtilis 75 57 1992 23 18 17310 16348 gi 296433 O-acetylserine sulfhydrylase B Alcaligenes eutrophus 75 55 963 25 3 2356 3393 gi 502419 PlsX Bacillus subtilis 75 56 1038 36 8 5765 6037 gi 256517 unknown Schizosaccharomyces pombe 75 45 273 46 13 11186 32058 gi 48972 nitrate transporter Synechococcus sp. 75 46 673 51 7 3474 3677 gi sporulation protein Bacillus subtilis 75 61 204 53 16 16590 16330 gi recombination protein (ttg start codon) Bacillus subtilis 75 51 261 gi1303923 RecN Bacillus subtilis 74 2568 1564 gi 204847 ornithine carbamoyltransferase Haemophilus influenzae 75 61 1005 85 3930 3232 gi 43368 phosphoribosylformylglycinamidine synthetase I (PUR-L; 75 63 699 gtg start codon) Bacillus subtilis 85 5 4878 4168 gi 43367 phosphoribosylaminoimidazole succinocarboxamide synthetase 75 55 711 (PUR-C; tg start codon) Bacillus subtilis 85 6625 7530 gi 303916 YgiA Bacillus subtilis 75 53 906 87 2340 3590 gi 064813 homologous to sp: PNDR BACSU Bacillus subtilis 75 56 1251 87 6084 6896 gi 064810 function unknown Bacillus subtilis 75 61 813 108 1503 1162 gi 001824 hypothetical protein Synechocystis sp. 75 51 342 110 1748 3727 gi 147593 putative ppGpp synthetase Streptomyces coelicolor 75 55 1980 110 4353 5252 gi 177251 clwD gene product Bacillus subtilis 75 75 900 120 1 10649 10032 gi 524394 ORF-2 upstream of gbsAB operon Bacillus subtilis 75 55 618 121 2050 4221 gi 3154632 NrdE Bacillus subtilis 75 54 2172 124 143 gi 405622 unknown Bacillus subtilis 75 56 141 128 81 1139 gi 43316 gap gene products Bacillus megaterium 75 48 1059 130 5760 5903 gi 256654 54.8% identity with Neisseria gonorrhoeae regulatory protein PilB; 75 62 144 putative Bacillus subtilis 136 3185 1890 gi 467403 seryl-tRNA synthetase Bacillus subtilis 75 54 1296 161 5439 5798 gi 001195 hypothetical protein Synechocystis sp. 75 55 360 172 2995 2171 gi 755153 ATP-binding protein Bacillus subtilis 75 52 825 179 1107 190 gi porphobilinogen deaminase Bacillus subtilis 75 58 918 195 9374 9219 sp HYPOTHETICAL PROTEIN IN PURB 5'REGION 75 60 156 (ORF-15) (FRAGMENT). 200 2605 4596 gi 42440 ATP-dependent nuclease Bacillus subtilis 75 56 1992 206 5620 4340 gi 256135 YbbF Bacillus subtilis 75 53 1281 216 159 389 gi 052800 unknown Schizosaccharomyces pombe 75 58 231 229 29 847 gi 205958 branched chain aa transport system II carrier protein Haemophilus 75 49 819 influenzae

TABLE 2-continued 230 2 518 1714 gi1971337 nitrite extrusion protein Bacillus subtilis 75 53 1197 231 1122 gi1002521 MutL Bacillus subtilis 75 54 1119 233 1314 1859 gi4674054 unknown Bacillus subtilis 75 59 546 269 164 gi1511246 methyl coenzyme M reductase system, component A2 75 50 162 Methanococcus jannaschii 292 772 155 gi1511604 M. jannaschii predicted coding region MJ1651 75 46 618 Methanococcus jannaschii 304 1773 2261 gi1205328 surfactin Haemophilus influenzae 75 55 489 312 2437 3387 gi285621 undefined open reading frame Bacillus stearothermophilus 75 62 951 312 5 4622 6403 gi1041097 pyruvate kinase Bacillus psychrophilus 75 57 1782 319 353 877 gi1212728 YchI Bacillus subtilis 75 54 525 320 4321 5031 gi2070361 OMP decarboxylase Lactococcus lactis 75 56 711 320 5010 5642 gi143394 OMP-PRPP transferase Bacillus subtilis 75 60 633 337 2519 2088 gi487433 citrate synthase II Bacillus subtilis 75 58 570 394 669 1271 gi304976 matches PS00017: ATP GTP A and PS00301: EFACTOR GTP: 75 51 603 similar to elongation factor G, TetM/TetO tetracycline-resistance proteins Escherichia coli 423 127 570 gi1183839 unknown Pseudomonas aeruginosa 75 59 444 433 1603 2929 gi149211 acetolactate synthase Klebsiella pneumoniae 75 63 327 446 176 1540 gi312441 dihydroorotase Bacillus caldolyticus 75 62 1365 486 249 gi1149682 potF gene product Clostridium perfringens 75 55 246 496 794 gi143582 spoIIIEA protein Bacillus subtilis 75 59 792 498 824 1504 gi143328 protein (put.); putative Bacillus subtilis 75 47 681 499 1061 1624 gi1387979 44% identity over 302 residues with hypothetical protein from 75 51 564 Synechocystis sp., accession D64006 CD; expression induced by environmental stress; some similarity to glycosyl transferases; two potential membrane-spanning helices Bacillus subtil 568 453 265 triacylglycerol lipase (EC 3.1.1.3) 2-Mycoplasma mycoides subsp. 75 50 189 mycoides (SGC3) 613 233 36 gi330993 tegument protein Saimirine herpesvirus 2 75 75 198 621 525 gi529754 speC Streptococcus pyogenes 75 43 525 642 5 1809 2474 gi1176401 EpiG Staphylococcus epidermidis 75 51 666 646 454 657 gi172442 ribonuclease P Saccharomyces cerevisiae 75 37 204 657 347 gi882541 ORF o236 Escherichia coli 75 47 345 750 832 gi46971 epiP gene product Staphylococcus epidermidis 75 57 831 754 481 gi1303901 YghT Bacillus subtilis 75 57 480 763 393 223 gi1205145 multidrug resistance protein Haemophilus influenzae 75 51 71 775 482 pirB36889|B368 leuA protein, inactive-Lactococcus lactis subsp. lactis 75 63 480 (strain IL1403) 793 180 gi143316 gap gene products Bacillus megaterium 75 57 80 800 160 gi509411 NFRA protein Azorhizobium caulinodans 75 34 59 811 560 gi143434 Rho Factor Bacillus subtilis 75 60 558 940 329 165 gi1276985 arginase Bacillus caldovelox 75 50 65 971 37 252 gi1001373 hypothetical protein Synechocystis sp. 75 58 216 1059 232 80 gi726480 L-glutamine-D-fructose-6-phosphate aminotransferase 75 67 53 Bacillus subtilis 1109 219 374 gi143331 alkaline phosphatase regulatory protein Bacillus subtilis 75 53 56 pirA27650A27650 regulatory protein phoR-Bacillus subtilis sp|P23545|PHOR BACSU ALKALINE PHOSPHATASE SYNTHESIS SENSOR PROTEIN PHOR (EC 2.7.3.-) 1268 137 gi304135 ornithine acetyltransferase Bacillus stearothermophilus 75 63 35 sp|Q07908|ARGJ BACST GLUTAMATE N-ACETYLTRANSFERASE (EC 2.3.1.35) (ORNITHINE ACETYLTRANSFERASE) (ORNITHINE TRANSACETYLASE) (OATASE)/AMINO-ACID ACETYLTRANSFERASE (EC 2.3.1.1) (N-ACETYLGLUTAMATE SYNTHA 1500 163 gi1205488 excinuclease ABC subunit B Haemophilus influenzae 75 57 162 1529 400 gi1002521 MutL Bacillus subtilis 75 54 399 3010 387 gi1204435 pyruvate formate-lyase activating enzyme Haemophilus influenzae 75 54 384 3105 180 gi1041097 pyruvate kinase Bacillus psychrophilus 75 57 180 3117 45 212 gi899317 peptide synthetase module Microcystis aeruginosa 75 42 168 pirS49111S49111 probable amino acid activating domain-Microcystis aeruginosa (fragment) (SUB 144-528) 3139 139 345 gi145294 adenine phosphoribosyl-transferase Escherichia coli 75 66 207 3880 310 gi1009366 Respiratory nitrate reductase Bacillus subtilis 75 58 309 3911 48 gi433991 ATP synthase subunit beta Bacillus subtilis 75 68 354 3957 pirD36889D368 3-isopropylmalate dehydratase (EC 4.2.1.33) chain leuC 75 65 378 Lactococcus lactis subsp. lactis (strain IL1403) 4005 259 gi216746 D-lactate dehydrogenase Lactobacillus plantarum 75 48 255 4080 73 333 gi415855 deoxyribose aldolase Mycoplasma hominis 75 59 261 4311 339 gi149435 putative Lactococcus lactis 75 57 339 4136 gi450688 hsdM gene of EcoprrI gene product Escherichia coli 75 56 300 pirS38437S38437 hsdM protein-Escherichia coli pirS09629|S09629 hypothetical protein A-Escherichia coli (SUB 40-520) 4144 336 gi48972 nitrate transporter Synechococcus sp. 75 49 333 4237 374 84 gi1339950 large subunit of NADH-dependent glutamate synthase 75 55 291 Plectonema boryanum

TABLE 2-continued 4306 73 318 gi294260 major surface glycoprotein Pneumocystis carinii 75 68 246 4343 1 359 gi1204652 methylated-DNA-protein-cysteine methyltransferase 75 52 357 Haemophilus influenzae 4552 312 gi296464 ATPase Lactococcus lactis 75 55 309 38 5776 6126 gi443793 NupC Escherichia coli 74 50 351 50 6221 5532 gi1239988 hypothetical protein Bacillus subtilis 74 55 690 56 10770 12221 gi1000451 TreP Bacillus subtilis 74 57 1452 64 1622 978 gi41015 aspartate-tRNA ligase Escherichia coli 74 57 645 66 4848 4633 gi1212729 YghJ Bacillus subtilis 74 47 216 67 18 14334 14897 gi1510631 endoglucanase Methanococcus jannaschii 74 52 564 02 12563 13136 gi149429 putative Lactococcus lactis 74 67 576 02 16 13121 14419 gi149435 putative Lactococcus lactis 74 57 1299 08 4 3902 2931 gi39478 ATP binding protein of transport ATPases Bacillus firmus 74 59 972 pirS15486|S15486 ATP-binding protein-Bacillus firmus sp|P26946|YATR BACFI HYPOTHETICAL ATP-BINDING TRANSPORT PROTEIN. 16 7093 5612 gi1205430 dipeptide transport system permease protein Haemophilus influenzae 74 49 1482 20 4342 4803 gi146970 ribonucleoside triphosphate reductase Escherichia coli 74 58 462 pirA47331A47331 anaerobic ribonucleotide reductase Escherichia coli 21 6581 gi1107528 ttg start Campylobacter coli 74 51 621 28 3531 gi143318 phosphoglycerate kinase Bacillus megaterium 74 57 1212 30 5791 gi1256653 DNA-binding protein Bacillus subtilis? 74 60 555 36 3555 gi143076 histidase Bacillus subtilis 74 58 1596 45 1368 gi4077734 devA gene product Anabaena sp. 74 45 705 52 gi1377833 unknown Bacillus subtilis 74 54 276 64 1 11375 gi580900 ORF3 gene product Bacillus subtilis 74 52 312 75 2139 gi642656 unknown Rhizobium meliloti 74 34 486 75 5160 gi854656 Na/H antiporter system ORF2 Bacillus alcalophilus 74 46 453 95 111 9332 gi1204430 hypothetical protein (SP:P25745) Haemophilus influenzae 74 55 1008 205 1 1 8499 gi1044979 ribosomal protein L6 Bacillus subtilis 74 64 561 236 2 6710 gi1146207 putative Bacillus subtilis 74 63 1137 241 2147 gi694121 malate thiokinase Methylobacterium extorquens 74 52 1188 246 2293 gi467374 single strand DNA binding protein Bacillus subtilis 74 64 507 249 4075 gi1524397 glycine betaine transporter OpuD Bacillus subtilis 74 55 1239 261 3773 gi809542 CbrB protein Erwinia chrysanthemi 74 42 309 278 3616 gi1204872 ATP-binding protein Haemophilus influenzae 74 54 1050 309 112 gi1205579 hypothetical protein (GB: U14003 302) Haemophilus influenzae 74 53 555 315 251 gi143398 quinol oxidase Bacillus subtilis 74 57 612 320 1065 gi143389 glutaminase of carbamyl phosphate synthetase Bacillus subtilis 74 60 1065 pirR39045E39845 carbamoyl-phosphate synthase (glutamine hydrolyzing) (EC 6.3.5.5), pyrimidine-repressible, small chain Bacillus subtilis 380 382 1128 gi534857 ATPase subunit a Bacillus stearothermophilus 74 56 747 405 1311 880 gi11303915 YchZ Bacillus subtilis 74 65 432 433 2503 3270 gi473902 alpha-acetolactate synthase Lactococcus lactis 74 56 768 452 942 gi413982 ipa-58r gene product Bacillus subtilis 74 52 942 461 1193 gi558494 homoserine dehydrogenase Bacillus subtilis 74 51 1191 461 1174 1407 gi40211 threonine synthase (thrC) (AA 1-352) Bacillus subtilis 74 56 234 pirA25364A25364 threonine synthase (EC 4.2.99.2)-Bacillus subtilis 462 402 734 gi142520 thioredoxin Bacillus subtilis 74 62 333 478 320 66 gi1499005 glycyl-tRNA synthetase Methanococcus jannaschii 74 52 255 501 739 1740 gi217040 acid glycoprotein Streptococcus pyogenes 74 58 1002 551 2791 1499 gi143040 glutamate-1-semialdehyde 2,1-aminotransferase Bacillus subtilis 74 51 1293 pirD42728D42728 glutamate-1-semialdehyde 2,1-aminomutase (EC 5.4.3.8)-Bacillus subtilis 573 477 gi1006605 hypothetical protein Synechocystis sp. 74 45 477 596 1298 816 gi13031853 YdgF Bacillus subtilis 74 55 483 618 1758 592 gi11146237 21.4% of identity to trans-acting transcription factor of Saccharomyces 74 55 1167 cerevisiae; 25% of identity to sucrose synthase of Zea mays; putative Bacillus subtilis 659 1269 1595 gi1072380 ORF3 Lactococcus lactis 74 62 327 724 188 gi1143374 phosphoribosylglycinamide synthetase (PUR-D; gtg start codon) 74 58 186 Bacillus subtilis 743 604 1209 gi153833 ORF1; putative Streptococcus parasanguis 74 50 606 836 259 gi143458 ORF V Bacillus subtilis 74 47 258 989 443 724 gi1303994 YgkM Bacillus subtilis 74 46 282 1106 492 gi46970 epiD gene product Staphylococcus epidermidis 74 54 492 1135 373 528 gi413948 ipa-24d gene product Bacillus subtilis 74 48 156 1234 452 87 gi495245 recy gene product Erwinia chrysanthemi 74 36 366 2586 2 238 gi1149701 sbcC gene product Clostridium perfringens 74 62 237 2919 400 gi1405454 aconitase Bacillus subtilis 74 60 399 2962 363 76 gi450686 3-phosphoglycerate kinase Thermotoga maritima 74 58 288 2983 191 gi1303893 YghL Bacillus subtilis 74 56 189 3018 223 gi143040 glutamate-1-semialdehyde 2,1-aminotransferase Bacillus subtilis 74 56 222 pirD42728D42728 glutamate-1-semialdehyde 2,1-aminomutase (EC 5.4.3.8)-Bacillus subtilis 3038 256 pirS52915|S529 nitrate reductase alpha chain-Bacillus subtilis (fragment) 74 57 255 3062 189 gi2107528 ttg start Campylobacter coli 74 51 186

TABLE 2-continued 4035 184 360 gi 022725 unknown Staphylococcus haemolyticus 74 64 177 4045 305 gi 510977 M. jannaschii predicted coding region MJ0938 74 41 303 Methanococcus jannaschii 4283 304 137 gi 520844 orf4 Bacillus subtilis 74 58 168 4449 221 gi 580910 peptide-synthetase ORF1 Bacillus subtilis 74 54 219 4587 231 gi orf6 Lactobacillus sake 74 59 228 4603 29 214 gi glutamate synthase large subunit (EC 2.6.1.5) Escherichia coli 74 60 186 pirA29617A29617 glutamate synthase (NADPH) (EC 1.4.1.13) large chain-Escherichia coli 4670 184 gi 256135 YbbF Bacillus subtilis 74 61 183 7162 6371 gi 43727 putative Bacillus subtilis 73 42 792 11 1372 290 gi 66338 dihydroorotate dehydrogenase Agrocybe aegerita 73 55 1083 14 1020 16 gi 43373 phosphoribosyl aminoimidazole carboxy formyl formyltransferase/ 73 54 1005 inosine monophosphate cyclohydrolase (PUR-H(J)) Bacillus subtilis 23 5 4635 3844 gi 468939 meso-2,3-butanediol dehydrogenase (D-acetoin forming) Klebsiella 73 58 792 pneumoniae 23 16360 15341 gi 297060 ornithine cyclodeaminase Rhizobium meliloti 73 37 1020 29 692 1273 gi1467442 stage V sporulation Bacillus subtilis 73 54 582 31 4914 3361 gi1414000 ipa-76d gene product Bacillus subtilis 73 55 1554 37 7402 6146 gi 429259 pepT gene product Bacillus subtilis 73 59 1257 37 7562 7386 gi 68367 alpha-isopropylmalate isomerase (put.); putative 73 52 177 Rhizomucor circinelloides 38 3931 4896 gi 405885 yeiN Escherichia coli 73 58 966 44 4238 3435 gi 580895 unknown Bacillus subtilis 73 53 804 44 7767 8306 gi 42009 moaB gene product Escherichia coli 73 50 540 45 2439 3080 gi 109685 ProW Bacillus subtilis 73 47 642 54 13794 13552 gi 413931 ipa-7d gene product Bacillus subtilis 73 61 243 59 1430 2248 gi 47923 threonine dehydratase 2 (EC 4.2.1.16) Escherichia coli 73 53 819 65 730 gi 677944 AppF Bacillus subtilis 73 56 729 80 860 345 gi S190932 murD gene product Bacillus subtilis 73 53 516 02 10124 11179 580891 3-isopropylmalate dehydrogenase (AA 1-365) Bacillus subtilis 73 55 1056 pirA26522A26522 3-isopropylmalate dehydrogenase (EC 1.1.1.85)-Bacillus subtilis 2600 1707 gi 510849 M. jannaschii predicted coding region MJ0775 73 40 894 Methanococcus jannaschii 4782 5756 gi 46970 ribonucleoside triphosphate reductase Escherichia coli 73 56 975 pirA47331A47331 anaerobic ribonucleotide reductase Escherichia coli 5726 6223 gi 204333 anaerobic ribonucleoside-triphosphate reductase 73 62 498 Haemophilus influenzae 32 4 151 43.63 gi 871048 HPSR2-heavy chain potential motor protein Giardia intestinalis 73 43 213 40 4 324 2696 gi 634107 kdpB Escherichia coli 73 59 1629 5939 4818 gi 4 10125 ribC gene product Bacillus subtilis 73 57 1122 49 717 1568 gi 4 60892 heparin binding protein-44, HBP-44 mice, Peptide, 360 aa 73 53 150 pirJX0281JX0281 heparin-binding protein-44 precursor-mouse gi220434 ORF (Mus musculus) (SUB 2-360) 58 1431 gi 882504 ORF f560 Escherichia coli 73 57 1431 74 3698 gi 146240 ketopantoate hydroxymethyltransferase Bacillus subtilis 73 55 828 75 4819 gi 1854657 Na/H antiporter system ORF3 Bacillus alcalophilus 73 56 360 386 4393 gi 467477 unknown Bacillus subtilis 73 48 1101 249 5175 gi 524397 glycine betaine transporter OpuD Bacillus subtilis 73 56 555 265 2280 gi 39848 U3 Bacillus subtilis 73 41 408 270 582 gi 780461 220 kDa polyprotein African swine fever virus 73 53 255 278 2953 gi 208965 hypothetical 23.3 kd protein Escherichia coli 73 49 666 279 2202 gi 185288 isochorismate synthase Bacillus subtilis 73 58 1392 291 1575 gi 511440 glutamine-fructose-6-phosphate transaminase 73 63 369 Methanococcus jannaschii 299 1166 gi 467437 unknown Bacillus subtilis 73 58 432 299 5 3234 gi 467439 temperature sensitive cell division Bacillus subtilis 73 53 1185 334 219 gi 536655 ORF YBR244w Saccharomyces cerevisiae 73 43 510 336 245 gi 790943 urea amidolyase Bacillus subtilis 73 51 792 374 1874 gi 405451 YneJ Bacillus subtilis 73 55 486 433 2554 gi 73902 alpha-acetolactate synthase Lactococcus lactis 73 54 639 509 261 gi 67483 unknown Bacillus subtilis 73 56 768 513 127 gi 146220 NAD+-dependent glycerol-3-phosphate dehydrogenase 73 56 792 Bacillus subtilis 533 733 gi 510605 hypothetical protein (SP:P42297) Methanococcus jannaschii 73 44 495 546 2815 gi 4 1748 hsdM protein (AA 1-520) Escherichia coli 73 52 1668 549 gi 324847 CinA Bacillus subtilis 73 57 381 567 gi 4 10137 ORFX13 Bacillus subtilis 73 58 672 716 1112 gi 256623 exodeoxyribonuclease Bacillus subtilis 73 56 459 772 677 gi 142010 shows 70.2% similarity and 48.6% identity to the EnvH protein 73 57 675 of Salmonella typhimurium Anabaena sp. 774 209 gi 4 09286 bmrU Bacillus subtilis 73 52 207 782 402 gi 43320 gap gene products Bacillus megaterium 73 56 402 789 451 762 gi 063246 low homology to P14 protein of Haemophilus influenzae and 14.2 73 56 312 kDa protein of Escherichia coli Bacillus subtilis

TABLE 2-continued 796 3 911 gi|853754 ABC transporter Bacillus subtilis 73 58 909 806 3 949 689 gi|143786 tryptophanyl-tRNA synthetase (EC 6.1.1.2) Bacillus subtilis 73 51 261 pir|JT0481|YMBS tryptophan-tRNA ligase (EC 6.1.1.2) - Bacillus subtilis 816 2 3097 1355 gi|41748 hsdM protein (AA 1-520) Escherichia coli 73 52 1743 839 400 gi|886906 argininosuccinate synthetase Streptomyces clavuligerus 73 59 399 pir|S57659|S57659 argininosuccinate synthase (EC 6.3.4.5) - Streptomyces clavuligerus 857 290 gi|348052 acetoin utilization protein Bacillus subtilis 73 50 288 O08 396 gi|40100 rodC (tag3) polypeptide (AA 1-746) Bacillus subtilis 73 41 393 pir|S06049|S06049 rodC protein - Bacillus subtilis sp|P13485|TAGF_BACSU TEICHOIC ACID BIOSYNTHESIS PROTEIN F. O18 213 gi|529357 No definition line found Caenorhabditis elegans 73 53 213 sp|P46975|STT3_CAEEL OLIGOSACCHARYL TRANSFERASE STT3 SUBUNIT HOMOLOG. O33 491 gi|142706 comG1 gene product Bacillus subtilis 73 51 489 174 204 13 gi|1149513 alpha3a subunit of laminin 5 Homo sapiens 73 60 92 175 329 gi|473817 ORF Escherichia coli 73 57 327 187 209 gi|580870 ipa-37d qoxA gene product Bacillus subtilis 73 52 207 206 72 245 gi|144816 formyltetrahydrofolate synthetase (FTHFS) (ttg start codon) (EC 6.3.4.3) Moorella thermoacetica 73 43 74 454 241 59 gi|1213253 unknown Schizosaccharomyces pombe 73 53 469 260 gi|1303787 YeG Bacillus subtilis 73 55 761 189 gi|9135 Mst26Aa gene product Drosophila simulans 73 34 849 243 19 gi|162307 DNA topoisomerase II Trypanosoma cruzi 73 60 2055 400 gi|559381 P47K protein Rhodococcus erythropolis 73 34 2556 244 gi|145925 fecB Escherichia coli 73 62 2947 400 251 gi|1184680 polynucleotide phosphorylase Bacillus subtilis 73 51 2956 375 gi|143397 quinol oxidase Bacillus subtilis 73 58 3037 329 gi|143091 acetolactate synthase Bacillus subtilis 73 55 3125 194 gi|323866 Overlapping out-of-phase protein Eggplant mosaic virus 73 53 sp|P20129|V70K_EPMV 70 KD PROTEIN.
3603 527 gi|1439521 glutaryl-CoA dehydrogenase precursor Mus musculus 73 48 74 3743 400 gi|4506884 hsdM gene of EcoprrI gene product Escherichia coli 73 54 399 pir|S38437|S38437 hsdM protein - Escherichia coli pir|S09629|S09629 hypothetical protein A - Escherichia coli (SUB 40-520) 3752 359 78 gi|1524193 unknown Mycobacterium tuberculosis 73 59 282 3852 181 gi|216746 D-lactate dehydrogenase Lactobacillus plantarum 73 68 180 3914 239 pir|S13490|S134 hydroxymethylglutaryl-CoA synthase (EC 4.1.3.5) - chicken (fragment) 73 53 237 3914 343 116 gi|528991 unknown Bacillus subtilis 73 38 228 4069 316 gi|40003 oxoglutarate dehydrogenase (NADP+) Bacillus subtilis 73 55 315 sp|P23129|ODO1_BACSU 2-OXOGLUTARATE DEHYDROGENASE E1 COMPONENT (EC 1.2.4.2) (ALPHA-KETOGLUTARATE DEHYDROGENASE) 4165 365 15 gi|1439521 glutaryl-CoA dehydrogenase precursor Mus musculus 73 48 351 4196 177 gi|809660 deoxyribose-phosphate aldolase Bacillus subtilis 73 60 177 pir|S49455|S49455 deoxyribose-phosphate aldolase (EC 4.1.2.4) - Bacillus subtilis 4202 378 184 gi|528991 unknown Bacillus subtilis 73 38 195 4314 193 gi|436797 N-acyl-L-amino acid amidohydrolase Bacillus stearothermophilus 73 47 192 sp|P37112|AMA_BACST N-ACYL-L-AMINO ACID AMIDOHYDROLASE (EC 3.5.1.14) (AMINOACYLASE).
4393 3 263 gi|216267 ORF2 Bacillus megaterium 73 47 261 35 2 903 1973 gi|1146196 phosphoglycerate dehydrogenase Bacillus subtilis 72 53 1071 38 22 17877 16660 gi|602031 similar to trimethylamine DH Mycoplasma capricolum 72 54 1218 pir|S49950|S49950 probable trimethylamine dehydrogenase (EC 1.5.99.7) - Mycoplasma capricolum (SGC3) (fragment) 38 18134 19162 gi|413968 ipa-44d gene product Bacillus subtilis 72 54 1029 44 11895 12953 gi|516272 unknown Bacillus subtilis 72 49 1059 48 6248 7117 gi|43499 pyruvate synthase Halobacterium halobium 72 49 870 50 5691 4819 gi|1205399 proton glutamate symport protein Haemophilus influenzae 72 53 873 53 9259 7997 gi|1303956 YCE Bacillus subtilis 72 52 1263 56 29549 29995 gi|4674714 unknown Bacillus subtilis 72 47 447 69 4123 2948 gi|1354775 pfoS/R Treponema pallidum 72 46 1176 69 4377 4982 gi|904198 hypothetical protein Bacillus subtilis 72 43 606 73 2 856 gi|142997 glycerol uptake facilitator Bacillus subtilis 72 59 855 98 9371 10258 gi|467435 unknown Bacillus subtilis 72 50 888 127 1 1593 gi|217144 alanine carrier protein thermophilic bacterium PS3 72 56 1593 pir|A45111|A45111 alanine transport protein - thermophilic bacterium PS-3 131 2600 gi|153952 polymerase III polymerase subunit (dnaE) Salmonella typhimurium 72 53 2598 pir|A45915|A45915 DNA-directed DNA polymerase (EC 2.7.7.7) III alpha chain - Salmonella typhimurium 141 1040 1978 gi|1405446 transketolase Bacillus subtilis 72 54 939 149 2535 2251 gi|606234 secY Escherichia coli 72 44 285 149 5245 5018 gi|1304472 DNA polymerase Unidentified phycodnavirus clone OTU4 72 55 228 154 1 210 gi|11205620 ferritin like protein Haemophilus influenzae 72 40 210

155 1320 433 gi|391610 farnesyl diphosphate synthase Bacillus stearothermophilus 72 57 888 pir|JX0257|JX0257 geranyltranstransferase (EC 2.5.1.10) - Bacillus stearothermophilus 180 2 328 gi|433630 A180 Saccharomyces cerevisiae 72 62 327 184 1145 3553 gi|1205110 virulence associated protein homolog Haemophilus influenzae 72 49 2409 195 1279 635 gi|1001730 hypothetical protein Synechocystis sp. 72 45 645 206 1. 14646 15869 gi|1064807 ORNITHINE AMINOTRANSFERASE Bacillus subtilis 72 50 1224 209 462 932 gi|1204666 hypothetical protein (GB: X73124 53) Haemophilus influenzae 72 60 471 215 522 280 gi|881513 insulin receptor homolog Drosophila melanogaster 72 63 243 pir|S57245|S57245 insulin receptor homolog - fruit fly (Drosophila melanogaster) (SUB 46-2146) 224 2 790 gi|949974 sucrose repressor Staphylococcus xylosus 72 54 789 233 765 gi|1408493 homologous to SwissProt: YIDA_ECOLI hypothetical protein 72 52 762 Bacillus subtilis 240 220 1485 gi|537049 ORF o470 Escherichia coli 72 52 1266 245 1340 gi|1204578 hypothetical protein (GB: U06949 1) Haemophilus influenzae 72 46 1338 259 1245 382 gi|1340128 ORF1 Staphylococcus aureus 72 59 864 304 285 1094 gi|1205330 glutamine-binding periplasmic protein Haemophilus influenzae 72 52 810 307 5039 4752 gi|1070015 protein-dependent Bacillus subtilis 72 53 288 315 260 gi|143399 quinol oxidase Bacillus subtilis 72 55 258 316 1.1.
9308 8994 gi|1204445 hypothetical protein (SP:P27857) Haemophilus influenzae 72 52 315 337 926 1609 gi|4874334 citrate synthase II Bacillus subtilis 72 55 684 364 10493 8448 gi|1510643 ferrous iron transport protein B Methanococcus jannaschii 72 53 2046 409 340 1263 gi|1402944 orfRM1 gene product Bacillus subtilis 72 49 924 441 1590 1003 gi|312379 highly conserved among eubacteria Clostridium acetobutylicum 72 48 588 pir|S34312|S34312 hypothetical protein V - Clostridium acetobutylicum 453 6 2505 2356 pir|S00601|BXSA antibacterial protein 3 - Staphylococcus haemolyticus 72 70 150 460 625 gi|1016162 ABC transporter subunit Cyanophora paradoxa 72 51 624 463 1628 gi|666014 The polymorphism (RFLP) of this gene is associated with susceptibility to essential hypertension. The SA gene product has slight homology to acetyl-CoA synthetase Homo sapiens 72 60 1626 480 3047 3466 gi|433992 ATP synthase subunit epsilon Bacillus subtilis 72 53 420 502 586 86 gi|310859 ORF2 Synechococcus sp. 72 50 501 519 81 1184 gi|1303704 YrkE Bacillus subtilis 72 54 1104 559 746 gi|1107530 ceuD gene product Campylobacter coli 72 56 744 575 573 gi|1303866 YdgS Bacillus subtilis 72 56 570 671 592 gi|1204497 protein-export membrane protein Haemophilus influenzae 72 44 591 679 295 1251 gi|563258 virulence-associated protein E Dichelobacter nodosus 72 52 957 687 295 957 gi|1146214 44% identical amino acids with the Escherichia coli Smba supress; 72 49 663 putative Bacillus subtilis 837 435 gi|1146183 putative Bacillus subtilis 72 54 435 868 150 788 gi|1377842 unknown Bacillus subtilis 72 55 639 922 130 432 gi|1088269 unknown protein Azotobacter vinelandii 72 58 303 941 238 gi|153929 NADPH-sulfite reductase flavoprotein component 72 49 237 Salmonella typhimurium 980 421 gi|853767 UDP-N-acetylglucosamine 1-carboxyvinyltransferase 72 59 Bacillus subtilis 1209 213 43 gi|144735 neurotoxin type B Clostridium botulinum 72 44 171 469 474 277 gi|1205458 hypothetical protein (GB: D26562190 47) Haemophilus influenzae 72 63 198 1956 365
gi|154409 hexosephosphate transport protein Salmonella typhimurium 72 44 363 pir|P41853|S41853 hexose phosphate transport system regulatory protein uhpB - Salmonella typhimurium 2101 gi|1303950 YgiY Bacillus subtilis 72 50 399 2503 399 gi|149713 formate dehydrogenase Methanobacterium formicicum 72 56 171 pir|A42712|A42712 formate dehydrogenase (EC 1.2.1.2) - Methanobacterium formicicum 2967 155 gi|1212729 YghJ Bacillus subtilis 72 46 153 3004 185 gi|665999 hypothetical protein Bacillus subtilis 72 55 183 3109 141 gi|413968 ipa-44d gene product Bacillus subtilis 72 45 138 317 287 gi|515938 glutamate synthase (ferredoxin) Synechocystis sp. 72 52 285 pir|S46957|S46957 glutamate synthase (ferredoxin) (EC 1.4.7.1) - Synechocystis sp. 377 26 367 gi|1408501 homologous to N-acyl-L-amino acid amidohydrolase of Bacillus stearothermophilus Bacillus subtilis 72 63 342 395 222 gi|1500409 M. jannaschii predicted coding region MJ1519 72 38 222 Methanococcus jannaschii 362 gi|39956 IIGlc Bacillus subtilis 72 57 360 347 gi|1009366 Respiratory nitrate reductase Bacillus subtilis 72 55 345 931 1200 gi|537095 ornithine carbamoyltransferase Escherichia coli 71 55 270 15 10859 10368 gi|532309 25 kDa protein Escherichia coli 71 47 492 2 1248 2435 gi|1244574 D-alanine-D-alanine ligase Enterococcus hirae 71 52 1188 898 1488 gi|149629 anthranilate synthase component 2 Leptospira biflexa 71 45 591 pir|C32840|C32840 anthranilate synthase (EC 4.1.3.27) component II - Leptospira biflexa 34 567 gi|1303983 YckF Bacillus subtilis 71 59 567 37 2806 2420 gi|1209681 glutamate-rich protein Bacillus firmus 71 50 387 38 18 12250 12462 gi|927645 arginyl endopeptidase Porphyromonas gingivalis 71 50 213

39 3 1246 4431 pir|S0941|S094 spoIIIE protein - Bacillus subtilis 49 53186 53 14 14760 13750 gi|142611 branched chain alpha-keto acid dehydrogenase E1-alpha 2 58 1011 Bacillus subtilis 54 12625 11789 gi|143014 gnt repressor Bacillus subtilis 46 837 57 5860 4568 gi|508175 EIIC domain of PTS-dependent Gat transport and phosphorylation Escherichia coli 2 48 1293 57 18 13897 14334 giO63247 high homology to flavohemoprotein (Haemoglobin-like protein) of Alcaligenes eutrophus and Saccharomyces cerevisiae Bacillus subtilis 56 438 62 9831 10955 gi|1303926 YgiG Bacillus subtilis 1125 70 8505 8966 gi|147198 phnE protein Escherichia coli 462 86 2089 1784 gi|904205 hypothetical protein Bacillus subtilis 306 96 7601 8269 gi|709991 hypothetical protein Bacillus subtilis 669 100 4822 5931 gi|1060848 opine dehydrogenase Arthrobacter sp. 1110 103 532 gi|143089 iep protein Bacillus subtilis 531 O9 15312 15695 gi|4139854 ipa-61d gene product Bacillus subtilis 384 113 316 gi|663254 probable protein kinase Saccharomyces cerevisiae 315 114 5603 4608 gi|143156 membrane bound protein Bacillus subtilis 996 133 1723 359 gi|1303913 YchX Bacillus subtilis 1365 149 5895 5455 gi|529650 G40P Bacteriophage SPP1 441 154 3087 2539 gi|4254884 repressor protein Streptococcus sobrinus 549 164 1.1.
11354 11689 gi|49318 ORF4 gene product Bacillus subtilis 336 169 1936 2745 gi|1403403 unknown Mycobacterium tuberculosis 810 193 272 1234 gi|1303788 YgeH Bacillus subtilis 963 205 895 47 gi|1215694 GlnQ Mycoplasma pneumoniae 849 233 1849 2022 gi|133732 ORF1 Campylobacter jejuni 174 237 4501 5189 gi|149384 HisE Lactococcus lactis 669 272 2273 1698 gi|709993 hypothetical protein Bacillus subtilis 576 274 618 496 gi|143035 NAD(P)H: glutamyl-transfer RNA reductase Bacillus subtilis 5 879 pir|A35252|A35252 5-aminolevulinate synthase (EC 2.3.1.37) - Bacillus subtilis 276 5 2720 2091 gi|303562 ORF210 Escherichia coli 630 287 136 660 gi|310634 20 kDa protein Streptococcus gordonii 525 288 2771 2220 gi|1256625 putative Bacillus subtilis 552 301 2461 1430 gi|4674174 similar to lysine decarboxylase Bacillus subtilis 1032 306 5222 3837 gi|1256618 transport protein Bacillus subtilis 1386 307 925 314 gi|602683 orfQ Mycoplasma capricolum 612 310 5146 4499 gi|348052 acetoin utilization protein Bacillus subtilis 648 322 1303 gi|1001819 hypothetical protein Synechocystis sp. 1302 333 3995 3819 gi|467473 unknown Bacillus subtilis 177 350 548 922 gi|551879 ORF 1 Lactococcus lactis 375 375 1860 3071 gi|467447 unknown Bacillus subtilis 1232 380 1560 2102 gi|142557 ATP synthase b subunit Bacillus megaterium 543 414 251 637 gi|580904 homologous to E. coli mpA Bacillus subtilis 387 424 335 1354 gi|581305 L-lactate dehydrogenase Lactobacillus plantarum 1020 436 3270 2839 pir|PN0501|PN05 phosphoribosylanthranilate isomerase (EC 5.3.1.24) - Bacillus subtilis 432 (fragment) 482 1280 gi|410142 ORFX18 Bacillus subtilis 1278 525 1844 1416 gi|143370 phosphoribosylpyrophosphate amidotransferase (PUR-F; EC 2.4.2.14) 2 429 Bacillus subtilis 529 2047 1355 gi|606150 ORF f309 Escherichia coli 693 563 22 969 gi|1237015 ORF4 Bacillus subtilis 948 581 255 gi|1301730 T25G3.2 Caenorhabditis elegans 252 612 913 758 gi|353968 fimbriae Z.
Salmonella typhimurium 156 613 654 gi|466778 lysine specific permease Escherichia coli 654 618 623 gi|1146238 poly(A) polymerase Bacillus subtilis 621 630 586 gi|1486243 unknown Bacillus subtilis 585 691 641 156 gi|289260 comE ORF1 Bacillus subtilis 486 694 149 427 gi|12971 NADH dehydrogenase subunit V (AA 1-605) Gallus gallus 279 pir|S10197|S10397 NADH dehydrogenase (ubiquinone) (EC 1.6.5.3) chain - chicken mitochondrion (SGC1) 715 169 777 gi|1303830 YafL Bacillus subtilis 53 609 746 970 467 gi|1377843 unknown Bacillus subtilis 52 504 748 802 167 gi|1405459 YneS Bacillus subtilis 49 636 753 524 gi|1510389 M. jannaschii predicted coding region MJ0296 53 495 Methanococcus jannaschii 761 215 gi|4759724 pentafunctional enzyme Pneumocystis carinii 47 213 783 703 203 gi|536655 ORF YBR244w Saccharomyces cerevisiae 52 501 800 987 682 gi|1204326 tRNA delta(2)-isopentenylpyrophosphate transferase 48 306 Haemophilus influenzae 806 116 286 gi|3419075 cbiM gene product Methanobacterium thermoautotrophicum 50 171 931 488 gi|893358 PgsA Bacillus subtilis 56 486 1041 262 gi|1408507 pyrimidine nucleoside transport protein Bacillus subtilis 45 261 1070 172 gi|709993 hypothetical protein Bacillus subtilis 46 171 1176 57 365 gi|151259 HMG-CoA reductase (EC 1.1.1.88) Pseudomonas mevalonii 49 309 pir|A44756|A44756 hydroxymethylglutaryl-CoA reductase (EC 1.1.1.88) Pseudomonas sp. 1181 184 gi|46971 epiP gene product Staphylococcus epidermidis 50 183 1281 290 gi|153016 ORF 419 protein Staphylococcus aureus 2 50 288

1348 229 orfQ Mycoplasma capricolum 48 228 2002 379 gi|1008177 ORF YJL046w Saccharomyces cerevisiae 48 378 2119 217 gi|1046088 arginyl-tRNA synthetase Mycoplasma genitalium 50 216 2418 320 gi|1499771 M. jannaschii predicted coding region MJ0936 57 318 Methanococcus jannaschii 2961 187 gi|312443 carbamoyl-phosphate synthase (glutamine-hydrolysing) 7 57 186 Bacillus caldolyticus 2999 67 306 gi|710020 nitrite reductase (nirB) Bacillus subtilis 43 240 3033 184 gi|1262335 YmaA Bacillus subtilis 57 183 3584 338 gi|401716 beta-isopropylmalate dehydrogenase Neurospora crassa 55 336 3715 399 55 gi|563952 gluconate permease Bacillus licheniformis 59 345 3785 387 gi|47382 acyl-CoA-dehydrogenase Streptomyces purpurascens 57 384 3875 272 gi|1001541 hypothetical protein Synechocystis sp. 38 270 4135 320 gi|142695 S-adenosyl-L-methionine: uroporphyrinogen III methyltransferase 52 318 Bacillus megaterium 4249 63 239 gi|1205363 deoxyribose aldolase Haemophilus influenzae 7 63 177 4508 267 gi|1197667 vitellogenin Anolis pulchellus 7 46 264 1237 2721 gi|1321788 arginine ornithine antiporter Clostridium perfringens 70 54 1485 11 6572 7486 gi|216854 P47K Pseudomonas chlororaphis 70 41 915 12 1481 72 gi|467330 replicative DNA helicase Bacillus subtilis 70 49 1410 15 893 30 gi|451216 Mannosephosphate Isomerase Streptococcus mutans 70 46 864 15 1050 823 gi|476092 unknown Bacillus subtilis 70 50 228 17 1350 568 gi|145402 choline dehydrogenase Escherichia coli 70 52 783 21 925 gi|149516 anthranilate synthase alpha subunit Lactococcus lactis 70 50 924 pir|S35124|S35124 anthranilate synthase (EC 4.1.3.27) alpha chain - Lactococcus lactis subsp. lactis 25 6251 gi|1389549 ORF3 Bacillus subtilis 70 52 672 33 7423 gi|1303875 YchB Bacillus subtilis 70 51 353 36 959 1594 gi|500755 methyl purine glycosylase Mus musculus 70 47 636 38 4901 5860 gi|1408507 pyrimidine nucleoside transport protein Bacillus subtilis 70 44 960 44 5312 5989 gi|1006620 hypothetical protein Synechocystis sp.
70 49 678 46 8950 10020 gi|1403126 CzcD gene product Alcaligenes eutrophus 70 45 1071 52 1900 1073 gi|1486247 unknown Bacillus subtilis 70 53 828 52 4048 4656 gi|244501 esterase II = carboxylesterase (EC 3.1.1.1) (Pseudomonas fluorescens, Peptide, 218 aa) 70 50 609 56 8 9962 gi|1339951 small subunit of NADH-dependent glutamate synthase 70 51 503 Plectonema boryanum 62 48 290 gi|142702 A competence protein 2 Bacillus subtilis 70 64 541 gi|1204377 molybdopterin biosynthesis protein Haemophilus influenzae 70 70 3595 2051 gi|1204834 2',3'-cyclic-nucleotide 2'-phosphodiesterase Haemophilus influenzae 70 91 5466 3139 gi|886471 methionine synthase Catharanthus roseus 70 96 7255 5756 pir|B39096|B390 alkaline phosphatase (EC 3.1.3.1) III precursor - Bacillus subtilis 70 110 767 1300 gi|345294 adenine phosphoribosyl-transferase Escherichia coli 70 116 7026 7976 gi|143607 sporulation protein Bacillus subtilis 70 121 6401 6988 gi|1107528 ttg start Campylobacter coli 70 131 6842 7936 gi|1150454 prolidase PepQ Lactobacillus delbrueckii 70 135 489 gi|311309 putative membrane-bound protein with four times repetition of Pro-Ser-Ala at the N-terminus; function unknown Alcaligenes eutrophus 70 138 714 gi|904181 hypothetical protein Bacillus subtilis 70 297 164 9874 gi|49315 ORF1 gene product Bacillus subtilis 70 531 164 16 16618 gi|1205212 hypothetical protein (GB:D10483 18) Haemophilus influenzae 70 993 205 2 871 gi|1215695 peptide transport system protein SapF homolog; SapF homolog 70 933 Mycoplasma pneumoniae 209 1386 gi|1204665 hypothetical protein (GB:X73124 26) Haemophilus influenzae 70 477 246 756 gi|215098 excisionase Bacteriophage 154a 70 417 263 5622 gi|142540 aspartokinase II Bacillus sp.
70 1128 268 4117 gi|1340128 ORF1 Staphylococcus aureus 70 906 302 3827 gi|147782 ruvA protein (gtg start) Escherichia coli 70 4 627 302 7051 pir|C38530|C385 queuine tRNA-ribosyltransferase (EC 2.4.2.29) - Escherichia coli 70 1173 313 308 gi|1205934 aminopeptidase afi Haemophilus influenzae 70 1107 355 669 gi|1070013 protein-dependent Bacillus subtilis 70 292 403 gi|733147 GumP Xanthomonas campestris 70 627 444 9273 gi|1204752 high affinity ribose transport protein Haemophilus influenzae 70 504 449 1243 gi|619724 MgtE Bacillus firmus 70 1242 472 gi|727145 open reading frame; putative Bacillus amyloliquefaciens 70 318 pir|B29091|B29091 hypothetical protein (bglA region) - Bacillus amyloliquefaciens (fragment) 480 727 1608 gi|142560 ATP synthase gamma subunit Bacillus megaterium 70 882 524 307 gi|602292 RCH2 protein Brassica napus 70 306 525 413 gi|143372 phosphoribosylglycinamide formyltransferase (PUR-N) 70 411 Bacillus subtilis 565 2552 1479 gi|881434 ORFP Bacillus subtilis 70 51 1074 607 829 1284 gi|1511524 hypothetical protein (SP: P37002) Methanococcus jannaschii 70 50 456 633 703 23 gi|431231 uracil permease Bacillus caldolyticus 70 53 681 646 1309 935 gi|467340 unknown Bacillus subtilis 70 49 375 663 417 gi|1303873 YdgZ Bacillus subtilis 70 40 414

681 781 74 hypothetical protein Synechocystis sp. 70 708 708 448 HYPOTHETICAL 54.3 KD PROTEIN IN ECO-ALKB INTERGENIC REGION. 70 447 725 722 gi|1001644 hypothetical protein Synechocystis sp. 70 672 776 203 gi|145365 putative Escherichia coli 70 585 834 783 gi|552971 NADH dehydrogenase (ndhF) Vicia faba 70 534 865 1173 gi|1204636 ATP-dependent helicase Haemophilus influenzae 70 207 894 gi|4673644 DNA binding protein (probable) Bacillus subtilis 70 267 919 317 gi|1314847 CinA Bacillus subtilis 70 315 944 572 gi|709991 hypothetical protein Bacillus subtilis 70 570 988 438 gi|142441 ORF3; putative Bacillus subtilis 70 5 168 1055 335 gi|529755 speC Streptococcus pyogenes 70 3 333 1093 904 gi|853754 ABC transporter Bacillus subtilis 70 903 1109 310 gi|1001827 hypothetical protein Synechocystis sp. 70 309 1220 pir|S23416|S234 epiB protein - Staphylococcus epidermidis 70 234 1279 348 gi|153015 FemA protein Staphylococcus aureus 70 276 1336 195 542 sp|P31776|PBPA PENICILLIN-BINDING PROTEIN 1A (PBP-1A) (PENICILLIN-BINDING PROTEIN A). 70 5 348 1537 232 402 gi|1146181 putative Bacillus subtilis 70 5 171 3574 272 93 gi|219630 endothelin-A receptor Homo sapiens 70 180 1640 346 gi|1146243 22.4% identity with Escherichia coli DNA-damage inducible protein; putative Bacillus subtilis 70 6 345 2504 286 gi|4951794 transmembrane protein Lactococcus lactis 70 285 3061 301 38 gi|508175 EIIC domain of PTS-dependent Gat transport and phosphorylation 70 264 Escherichia coli 3128 199 gi|1340096 unknown Mycobacterium tuberculosis 70 51 198 3218 3 488 gi|515938 glutamate synthase (ferredoxin) Synechocystis sp. 70 50 486 pir|S46957|S46957 glutamate synthase (ferredoxin) (EC 1.4.7.1) - Synechocystis sp.
3323 399 gi|1154891 ATP binding protein Phormidium laminosum 70 52 396 3679 399 199 gi|529385 chromosome condensation protein Caenorhabditis elegans 70 30 201 3841 398 90 gi|1208965 hypothetical 23.3 kd protein Escherichia coli 70 47 309 3929 401 gi|149435 putative Lactococcus lactis 70 49 399 4044 374 153 gi|602031 similar to trimethylamine DH Mycoplasma capricolum 70 40 222 pir|S49950|S49950 probable trimethylamine dehydrogenase (EC 1.5.99.7) - Mycoplasma capricolum (SGC3) (fragment) 4329 280 gi|1339951 small subunit of NADH-dependent glutamate synthase 70 49 279 Plectonema boryanum 4422 289 gi|296464 ATPase Lactococcus lactis 70 57 288 4647 200 39 gi|166412 NADH-glutamate synthase Medicago sativa 70 59 162 16 7571 9031 gi|1499620 M. jannaschii predicted coding region MJ0798 69 44 1461 Methanococcus jannaschii 16 gi|1353197 thioredoxin reductase Eubacterium acidaminophilum 69 54 954 30 727 gi|1204910 hypothetical protein (GB: U14003 302) Haemophilus influenzae 69 52 726 38 1023 1298 gi|4077734 devA gene product Anabaena sp.
69 41 276 44 5987 6595 gi|1205920 molybdate uptake system hydrophilic membrane-bound protein 69 45 609 Haemophilus influenzae 62 15 9104 9475 gi|385178 unknown Bacillus subtilis 69 44 372 66 2402 2803 gi|1303893 YchL Bacillus subtilis 69 51 402 67 15 13627 13130 gi|149647 ORFZ Listeria monocytogenes 69 37 498 67 17 14053 14382 gi|305002 ORF f356 Escherichia coli 69 49 330 67 15130 15807 gi|1109684 ProV Bacillus subtilis 69 45 678 78 1447 2124 gi|1256633 putative Bacillus subtilis 69 53 678 78 3725 2937 gi|1303958 YGG Bacillus subtilis 69 32 789 85 4213 3905 pir|E29326|E293 hypothetical protein (pur operon) - Bacillus subtilis 69 32 309 86 2654 2055 gi|973332 OrfQ Bacillus subtilis 69 50 600 95 96 710 gi|786468 4A11 antigen, sperm tail membrane antigen = putative sucrose specific phosphotransferase enzyme II homolog (mice, testis, Peptide Partial, 72 aa) 69 43 615 OO 6023 7426 gi|1205355 Na+/H+ antiporter Haemophilus influenzae 69 39 1404 O2 1650 622 gi|561690 sialoglycoprotease Pasteurella haemolytica 69 47 1029 O3 8537 4833 gi|1009366 Respiratory nitrate reductase Bacillus subtilis 69 54 3705 O3 12552 10317 gi|710020 nitrite reductase (nirB) Bacillus subtilis 69 2436 12 8708 10168 gi|154411 hexosephosphate transport protein Salmonella typhimurium 69 51 1461 pir|D41853|D41853 hexose phosphate transport system protein uhpT - Salmonella typhimurium 12 16644 17414 gi|1204435 pyruvate formate-lyase activating enzyme Haemophilus influenzae 69 50 771 13 33 953 gi|290509 o307 Escherichia coli 69 43 921 14 1058 579 pir|A42771|A427 reticulocyte-binding protein 1 - Plasmodium vivax 69 39 480 21 4309 5310 gi|1154633 NrdF Bacillus subtilis 69 53 1002 25 267 854 gi|413931 ipa-7d gene product Bacillus subtilis 69 43 588 49 27 10400 10134 pir|S28089|S280 hypothetical protein A - yeast (Zygosaccharomyces bisporus) plasmid pS53 69 39 267 61 1.
813 28 gi|1205538 hypothetical protein (GB: U14003 302) Haemophilus influenzae 69 47 786 65 2222 4633 gi|40054 phenylalanyl-tRNA synthetase beta subunit (AA 1-804) 69 52 2412 Bacillus subtilis 69 1210 1761 gi|296031 elongation factor Ts Spirulina platensis 69 45 552 75 12 8339 7992 gi|732682 FimE protein Escherichia coli 69 69 348

190 2 484 1671 HISTIDINOL-PHOSPHATE AMINOTRANSFERASE (EC 2.6.1.9) (IMIDAZOLE ACETOL-PHOSPHATE TRANSAMINASE). 69 48 1188 206 2777 gi|41750 hsdR protein (AA 1-1033) Escherichia coli 69 49 2775 206 5796 5554 gi|256135 YbbF Bacillus subtilis 69 48 243 249 319 gi|405456 YneP Bacillus subtilis 69 50 318 302 4820 5776 gi OO1768 hypothetical protein Synechocystis sp. 69 48 957 324 3893 402 gi|256798 pyruvate carboxylase Rhizobium etli 69 53 3492 351 1808 1518 gi|491664 T04H1.4 Caenorhabditis elegans 69 30 291 369 2075 2305 gi|336458 ORF Balaenoptera acutorostrata 69 61 231 392 999 2424 gi|556015 ORF1 Bacillus subtilis 69 45 426 410 87 779 gi|55611 phosphoglyceromutase Zymomonas mobilis 69 58 693 421 1129 173 gi|276985 arginase Bacillus caldovelox 69 54 957 444 6713 7741 gi|221782 purine synthesis repressor Haemophilus influenzae 69 40 1029 453 415 gi|122758 unknown Bacillus subtilis 69 57 414 469 2246 1206 gi|458228 mutY homolog Homo sapiens 69 44 1041 509 1371 1012 gi|49224 URF 4 Synechococcus sp. 69 39 360 520 2823 2623 gi|726427 similar to D.
melanogaster MST101-2 protein (PIR: S34154) 69 39 201 Caenorhabditis elegans 531 26 760 gi|509672 repressor protein Bacteriophage Tuc2009 69 33 735 589 107 253 gi|69101 17.9 kDa heat shock protein (hsp17.9) Pisum sativum 69 52 147 594 597 1391 gi|42783 DNA photolyase Bacillus firmus 69 48 795 604 2114 1752 gi|413930 ipa-6d gene product Bacillus subtilis 69 45 363 607 313 gi|236103 W08D2.3 Caenorhabditis elegans 69 47 312 607 312 34 gi|536715 ORF YBR275c Saccharomyces cerevisiae 69 39 279 734 433 gi|467327 unknown Bacillus subtilis 69 44 432 759 338 gi OO9367 Respiratory nitrate reductase Bacillus subtilis 69 50 336 761 392 586 gi|3508 Leucyl-tRNA synthetase (cytoplasmic) Saccharomyces cerevisiae 69 46 195 1370340 ORF YPL160w Saccharomyces cerevisiae 802 72 1013 gi|43044 ferrochelatase Bacillus subtilis 69 55 942 816 1368 163 gi|510268 restriction modification system S subunit Methanococcus jannaschii 69 45 1206 838 133 387 gi|255371 coded for by C. elegans cDNA yk34a9.5; coded for by C. elegans cDNA yk34a9.3; Similar to guanylate kinase Caenorhabditis elegans 69 46 255 851 745 1005 gi|288998 aecA gene product Antithamnion sp. 69 39 261 867 269 gi protein-dependent Bacillus subtilis 69 47 267 995 478 gi transcription elongation factor Haemophilus influenzae 69 53 477 999 506 gi|899254 predicted trithorax protein Drosophila virilis 69 21 504 1127 659 gi H. influenzae predicted coding region HI1191 69 56 657 Haemophilus influenzae 1138 1. 248 460 gi|510646 M.
jannaschii predicted coding region MJ0568 69 48 213 Methanococcus jannaschii 2928 gi|290503 glutamate permease Escherichia coli 69 41 399 3090 223 gi|204987 DNA polymerase III, alpha chain Haemophilus influenzae 69 36 222 3817 400 gi|483199 peptide-synthetase Amycolatopsis mediterranei 69 45 399 3833 335 gi|524193 unknown Mycobacterium tuberculosis 69 46 333 4079 400 53 gi|546918 orfY 3' of comK (Bacillus subtilis, E26, Peptide Partial, 140 aa) 69 64 348 pir|S43612|S43612 hypothetical protein Y - Bacillus subtilis sp|P40398|YHXD_BACSU HYPOTHETICAL PROTEIN IN COMK 3' REGION (ORFY) (FRAGMENT). 4115 215 400 gi|517205 67 kDa Myosin-crossreactive Streptococcal antigen 69 186 Streptococcus pyogenes 4139 333 gi|1208451 hypothetical protein Synechocystis sp. 69 333 4258 230 gi|496158 restriction-modification enzyme subunit M1 Mycoplasma pulmonis 69 228 pir|S49395|S49395 HsdM1 protein - Mycoplasma pulmonis (SGC3) 4317 374 gi|413967 ipa-43d gene product Bacillus subtilis 69 285 4465 293 gi|396296 similar to phosphotransferase systems enzyme II Escherichia coli 69 291 sp|P32672|PTNC_ECOLI PTS SYSTEM, FRUCTOSE-LIKE-2 IIC COMPONENT (PHOSPHOTRANSFERASE ENZYME II, C COMPONENT). 1193 84 gi|1109685 ProW Bacillus subtilis 68 110 15 2074 1556 gi|807973 unknown Saccharomyces cerevisiae 68 519 31 6328 8772 gi|290642 ATPase Enterococcus hirae 68 2445 40 750 385 gi|606342 ORF o622; reading frame open far upstream of start; possible frameshift, linking to previous ORF Escherichia coli 68 366 46 6886 8415 gi|155276 aldehyde dehydrogenase Vibrio cholerae 68 1530 48 3404 3165 gi|285608 241k polyprotein Apple stem grooving virus 68 240 48 3536 4132 gi|1045937 M.
genitalium predicted coding region MG246 68 597 Mycoplasma genitalium 53 10685 9699 gi|1303952 YajA Bacillus subtilis 68 987 70 7346 8155 gi|147198 phnE protein Escherichia coli 68 810 89 1899 2966 gi|145173 35 kDa protein Escherichia coli 68 1068 108 1150 113 gi|38722 precursor (aa -20 to 381) Acinetobacter calcoaceticus 68 1038 pir|A29277|A29277 aldose 1-epimerase (EC 5.1.3.3) - Acinetobacter calcoaceticus 112 2666 3622 gi|153724 MalC Streptococcus pneumoniae 68 957 116 7865 8638 gi|143608 sporulation protein Bacillus subtilis 68 774 118 2484 3698 gi|1303805 YgeR Bacillus subtilis 68 3215 120 1424 1594 SULFITE REDUCTASE (NADPH) FLAVOPROTEIN ALPHA COMPONENT (EC 1.8.1.2) (SIR-FP). 68 45 171

129 1011 gi|396307 argininosuccinate lyase Escherichia coli 68 5 1011 132 1867 2739 gi|216267 ORF2 Bacillus megaterium 68 4 873 134 848 1012 gi|47545 DNA recombinase Escherichia coli 68 5 165 141 372 614 gi|872116 sti (stress inducible protein) (Glycine max) 68 243 149 2260 2066 gi|45774 hsp70 protein (dnaK gene) Escherichia coli 68 195 155 1534 1292 gi|216583 ORF1 Escherichia coli 68 243 158 1826 3289 HYPOTHETICAL 54.3 KD PROTEIN IN ECO-ALKB INTERGENIC REGION. 68 1464 169 2749 3318 gi|403402 unknown Mycobacterium tuberculosis 68 4 570 175 7365 5572 gi O72395 phaA gene product Rhizobium meliloti 68 5 794 188 4184 5434 gi|173843 3-ketoacyl-ACP synthase III Vibrio harveyi 68 4 1251 189 907 1665 gi|467383 DNA binding protein (probable) Bacillus subtilis 68 759 206 6709 5735 gi Ybb Bacillus subtilis 68 975 206 10425 12176 gi|452687 pyruvate decarboxylase Saccharomyces cerevisiae 68 1752 212 3421 3648 gi c1 gene product Bacteriophage B1 68 3 228 214 5457 6482 gi ORF YOR196c Saccharomyces cerevisiae 68 1026 237 2507 3088 gi HisH Lactococcus lactis 68 582 243 4542 3544 gi mevalonate pyrophosphate decarboxylase Saccharomyces cerevisiae 68 999 262 164 gi|50974 4-Oxalocrotonate tautomerase Pseudomonas putida 68 162 262 1118 252 gi|147744 PSR Enterococcus hirae 68 867 276 3139 2576 sp P30750 ABC E (ATP-BINDING PROTEIN ABC (FRAGMENT). 68 564 306 5725 5105 gi|256617 adenine phosphoribosyltransferase Bacillus subtilis 68 621 333 3850 3101 gi|467473 unknown Bacillus subtilis 68 750 365 4838 4659 gi|13064 T22B3.3 Caenorhabditis elegans 68 180 376 549 1646 gi|27702 DAPA aminotransferase Bacillus subtilis 68 5 1098 405 872 gi|3039 YGiB Bacillus subtilis 68 870 406 539 225 gi|5115 ABC transporter, probable ATP-binding subunit 68 315 Methanococcus jannaschii 426 3391 3224 gi|1624632 GltL Escherichia coli 68 168 438 108 329 gi|46923 nitrogenase reductase Escherichia coli 68 222 443 240 gi|535810 hippuricase Campylobacter jejuni 68 237 443 518 1015 gi|204742 H.
influenzae predicted coding region HI0491 68 498 Haemophilus influenzae 443 1. 3779 311 809660 deoxyribose-phosphate aldolase Bacillus subtilis 68 5 5 669 pir|S49455|S49455 deoxyribose-phosphate aldolase (EC 4.1.2.4) - Bacillus subtilis 476 240 1184 gi|1971345 unknown, similar to E. coli cardiolipin synthase Bacillus subtilis 68 945 sp|P45860|YHIE_BACSU HYPOTHETICAL 58.2 PROTEIN IN NAR-ACDA INTERGENIC REGION. 486 1046 216 gi|47328 transport protein Escherichia coli 68 4 1. 831 517 1764 2084 gi|523809 orf2 Bacteriophage A2 68 64 321 572 57 sp|P39237|Y05L HYPOTHETICAL 6.8 KD PROTEIN IN NRDC-TK INTERGENIC REGION. 68 4 7 570 646 459 gi|413982 ipa-58r gene product Bacillus subtilis 68 456 659 3 1668 190 gi O7541 C33D9.8 Caenorhabditis elegans 68 234 864 1510 gi|45774 hsp70 protein (dnaK gene) Escherichia coli 68 207 920 432 gi|510416 hypothetical protein (SP: P31466) Methanococcus jannaschii 68 429 952 611 gi|603456 reductase Leishmania major 68 486 970 91 gi|354775 pfoS/R Treponema pallidum 68 312 1028 534 gi diaminopimelate decarboxylase Bacillus subtilis 68 531 1029 216 gi|335714 Plasmodium falciparum mRNA for asparagine-rich antigen (clone 17Cl) Plasmodium falciparum 68 3 213 1058 348 gi|581649 epiC gene product Staphylococcus epidermidis 68 345 1096 465 265 gi|43434 Rho Factor Bacillus subtilis 68 201 1308 694 gi|469939 group B oligopeptidase PepB Streptococcus agalactiae 68 693 1679 238 gi|517205 67 kDa Myosin-crossreactive Streptococcal antigen 68 237 Streptococcus pyogenes 2039 383 gi|53898 transport protein Salmonella typhimurium 68 5 381 207 7 326 pir|C33496|C334 hisC homolog - Bacillus subtilis 68 4 324 2112 374 135 gi|64884 lamin LII Xenopus laevis 68 5 240 2273 398 gi|581648 epiB gene product Staphylococcus epidermidis 68 396 2948 385 gi|216869 branched-chain amino acid transport carrier 68 384 Pseudomonas aeruginosa pir|A38534|A38534 branched-chain amino acid transport protein braZ - Pseudomonas aeruginosa 2955 400 32 gi|904179 hypothetical protein Bacillus subtilis 68 369
2981 288 Sl 508979 GTP-binding protein Bacillus subtilis 68 285 3014 294 Sl 1524394 ORF-2 upstream of gbgAB operon Bacillus subtilis 68 291 3082 169 Sli1204696 fructose-permease IIBC component Haemophilus influenzae 68 5 168 3108 303 258 Sl 217855 heat-shock protein Arabidopsis thaliana 68 156 3639 461 Sl 1510490 nitrate transport permease protein Methanococcus jannaschii 68 459 3657 330 Sl 155369 PTS enzyme-II fructose Xanthomonas campestris 68 330 3823 391 Sl 1603768 HutI protein, imidazolone-5-propionate hydrolase Bacillus subtilis 68 5 390 gi603768 HutI protein, imidazolone-5-propionate hydrolase Bacillus subtilis 3982 277 Sl 149435 putative Lactococcus lactis 68 276 4051 1. 342 Sl 450688 hsdM gene of EcoprrI gene product Escherichia coli 68 342 pirS38437S38437 hsdM protein - Escherichia coli pirS09629S09629 hypothetical protein A - Escherichia coli (SUB 40-520) 4089 12 209 Sl 1353678 heavy-metal transporting P-type ATPase Proteus mirabilis 68 198

TABLE 2-continued 4143 187 gi603769 HutU protein, urocanase Bacillus subtilis 68 55 143 4148 352 gi4506884 hsdM gene of EcoprrI gene product Escherichia coli 68 51 351 pirS38437S38437 hsdM protein - Escherichia coli pirS09629S09629 hypothetical protein A - Escherichia coli (SUB 40-520) 4173 382 gi1041097 Pyruvate Kinase Bacillus psychrophilus 68 48 381 4182 1. 250 gi413968 ipa-44d gene product Bacillus subtilis 68 50 249 4362 348 318 gi450688 hsdM gene of EcoprrI gene product Escherichia coli 68 44 171 pirS38437S38437 hsdM protein - Escherichia coli pirS09629S09629 hypothetical protein A - Escherichia coli (SUB 40-520) 5 11 8300 gi143727 putative Bacillus subtilis 67 46 1194 31 11 9833 9348 gi216746 D-lactate dehydrogenase Lactobacillus plantarum 67 41 486 32 3 1560 gi1098557 renal sodium/dicarboxylate cotransporter Homo sapiens 67 46 1596 32 5 4145 3345 gi1510720 prephenate dehydratase Methanococcus jannaschii 67 51 801 36 4268 gi1146216 45% identity with the product of the ORF6 gene from the 67 58 1083 Erwinia herbicola carotenoid biosynthesis cluster; putative Bacillus subtilis 44 4492 5304 gi1006621 hypothetical protein Synechocystis sp. 67 813 56 3943 8481 gi304131 glutamate synthase large subunit precursor Azospirillum brasilense 67 4539 pirB46602B46602 glutamate synthase (NADPH) (EC 1.4.1.13) alpha chain - Azospirillum brasilense 56 13923 14678 gi1000453 TreR Bacillus subtilis 67 756 62 4757 4422 gi1113949 orf3 Bacillus, C-125, alkali-sensitive mutant 18224, Peptide Mutant, 67 336 112 aa 62 1.
6338 5106 gi854655 Na/H antiporter system Bacillus alcalophilus 67 1233 99 2119 3321 gi1204349 hypothetical protein (GB:GB:D90212 3) Haemophilus influenzae 67 1203 102 5695 7176 gi149432 putative Lactococcus lactis 67 1482 103 14049 13549 gi1408497 LP9D gene product Bacillus subtilis 67 501 109 13982 13143 gi413976 ipa-52r gene product Bacillus subtilis 67 840 109 14811 15194 gi413983 ipa-59d gene product Bacillus subtilis 67 384 121 1713 2153 gi1262335 YmaA Bacillus subtilis 67 441 122 1. 1149 gi143047 ORFB Bacillus subtilis 67 3 149 124 3518 2976 gi556885 Unknown Bacillus subtilis 67 4 543 131 3589 2594 gi1046081 hypothetical protein (GB:D26185 10) Mycoplasma genitalium 67 3 996 140 2297 1695 gi146549 kdpC Escherichia coli 67 4 603 142 4198 2987 gi1212775 GTP cyclohydrolase II Bacillus amyloliquefaciens 67 212 147 2374 1835 gi1303709 YrkJ Bacillus subtilis 67 540 152 6341 6673 gi1377841 unknown Bacillus subtilis 67 333 161 2720 3763 gi496319 SphX Synechococcus sp. 67 O44 163 1989 3428 gi595681 2-oxoglutarate/malate translocator Spinacia oleracea 67 440 193 1351 1626 gi1511101 shikimate 5-dehydrogenase Methanococcus jannaschii 67 276 200 917 2179 gi142439 ATP-dependent nuclease Bacillus subtilis 67 263 206 12445 12801 sp|P37347|YECD HYPOTHETICAL 21.8 KD PROTEIN IN ASPS 5' REGION.
67 357 206 13047 14432 gi732813 branched-chain amino acid carrier Lactobacillus delbrueckii 67 386 208 809 297 gi1033037 100 kDa heat shock protein (Hsp100) Leishmania major 67 3 513 238 1039 2052 gi809542 CbrB protein Erwinia chrysanthemi 67 4 O14 246 176 367 gi215098 excisionase Bacteriophage 154a 67 192 276 1412 564 gi303560 ORF271 Escherichia coli 67 849 297 2223 3056 gi142784 CtaA protein Bacillus firmus 67 834 307 4186 3152 gi1070013 protein-dependent Bacillus subtilis 67 O35 316 36 1028 gi1161061 dioxygenase Methylobacterium extorquens 67 993 324 5030 4410 gi1469784 putative cell division protein ftsW Enterococcus hirae 67 621 336 264 4 gi173122 urea amidolyase Saccharomyces cerevisiae 67 261 360 108 1394 sp|P30053|SYH S HISTIDYL-TRNA SYNTHETASE (EC 6.1.1.21) (HISTIDINE 67 287 TRNA LIGASE) (HISRS). 364 3592 2294 gi151259 HMG-CoA reductase (EC 1.1.1.88) Pseudomonas mevalonii 67 299 pirA44756A44756 hydroxymethylglutaryl-CoA reductase (EC 1.1.1.88) Pseudomonas sp. 365 2113 1286 gi1296823 orf2 gene product Lactobacillus helveticus 67 828 367 325 918 gi1039479 ORFU Lactococcus lactis 67 594 395 666 1271 gi1204516 hypothetical protein (GB:U00014 4) Haemophilus influenzae 67 606 415 901 gi882579 CG Site No. 29739 Escherichia coli 67 900 419 903 gi520752 putative Bacillus subtilis 67 897 474 796 gi886906 argininosuccinate synthetase Streptomyces clavuligerus 67 795 pirS57659S57659 argininosuccinate synthase (EC 6.3.4.5) - Streptomyces clavuligerus 485 1921 2226 gi143434 Rho Factor Bacillus subtilis 67 306 596 865 gi1303853 YdgF Bacillus subtilis 67 864 700 218 gi1204628 hypothetical protein (SP:P21498) Haemophilus influenzae 67 216 806 249 647 gi677947 AppC Bacillus subtilis 67 399 828 340 900 gi777761 rrA Synechococcus sp.
67 561 833 916 425 gi142996 regulatory protein Bacillus subtilis 67 4 492 856 779 gi780224 ZK970.2 Caenorhabditis elegans 67 777 888 850 86 gi437315 TTG start codon Bacillus licheniformis 67 765 1034 597 gi1205113 hypothetical protein (GB:L19201 15) Haemophilus influenzae 67 594 1062 319 gi1303850 YdgC Bacillus subtilis 67 318 1067 460 pirA32950A329 probable reductase protein - Leishmania major 67 459 453 1358 293 gi1001369 hypothetical protein Synechocystis sp. 67 291

TABLE 2-continued 2181 302 gi 510416 hypothetical protein (SP:P31466) Methanococcus jannaschii 67 48 300 3000 507 Sl 517205 67 kDa Myosin-crossreactive streptococcal antigen 67 56 507 Streptococcus pyogenes 3066 234 Sl GTG start codon Lactococcus lactis 67 46 231 3087 251 48 gi 205366 oligopeptide transport ATP-binding protein Haemophilus 67 44 204 influenzae 3101 256 gi 531541 uroporphyrinogen III methyltransferase Zea mays 67 255 3598 1. 393 58 gi 51259 HMG-CoA reductase (EC 1.1.1.88) Pseudomonas mevalonii 67 336 pirA44756A44756 hydroxymethylglutaryl-CoA reductase (EC 1.1.1.88) Pseudomonas sp. 3765 366 148 Sl 55.7489 menD Bacillus subtilis 67 219 3788 398 138 pirS52915|S529 nitrate reductase alpha chain - Bacillus subtilis (fragment) 67 261 3883 265 Sl 704397 cystathionine beta-lyase Arabidopsis thaliana 67 264 3926 340 gi 483199 peptide-synthetase Amycolatopsis mediterranei 67 339 4417 82 396 gi 205337 ribonucleotide transport ATP-binding protein Haemophilus 67 315 influenzae 3075 3989 Sl 535348 CodV Bacillus subtilis 66 4 915 15 2273 2542 Sl146491 SmtB Synechococcus PCC7942 66 3 270 31 7826 7593 Sl 292046 mucin Homo sapiens 66 234 31 1.
9034 9258 gi 2O45.45 mercury scavenger protein Haemophilus influenzae 66 225 32 5253 4159 Sl1998342 inducible nitric oxide synthase Gallus gallus 66 1095 44 8856 10124 gi 510751 molybdenum cofactor biosynthesis moeA protein Methanococcus 66 1269 jannaschii 48 1276 2868 gi 50209 ORF 1 Mycoplasma mycoides 66 1593 58 7178 8428 Sl 665999 hypothetical protein Bacillus subtilis 66 1251 62 4370 3597 gi O72398 phaD gene product Rhizobium meliloti 66 774 70 10998 10303 Sl 809660 deoxyribose-phosphate aldolase Bacillus subtilis 66 696 pirS49455S49455 deoxyribose-phosphate aldolase (EC 4.1.2.4) - Bacillus subtilis 76 1305 gi 42440 ATP-dependent nuclease Bacillus subtilis 66 1305 91 8205 7174 Sl 704397 cystathionine beta-lyase Arabidopsis thaliana 66 1032 O2 3265 2720 gi 204323 hypothetical protein (SP:P31805) Haemophilus influenzae 66 546 O3 2732 2046 Sl1971344 nitrate reductase gamma subunit Bacillus subtilis 66 687 sp|P42177|NARI BACSU NITRATE REDUCTASE GAMMA CHAIN (EC 1.7.99.4). gi1009369 Respiratory nitrate reductase Bacillus subtilis (SUB -160) 4243 4674 170886 glucosamine-6-phosphate deaminase Candida albicans 66 432 Sl pirA46652A46652 glucosamine-6-phosphate isomerase (EC 5.3.1.10) - yeast (Candida albicans) 17491 17712 1323179 ORF YGR111w Saccharomyces cerevisiae 66 3 222 16 2637 607 1491813 gamma-glutamyltranspeptidase Bacillus subtilis 66 2031 50 2989 2789 11146224 putative Bacillus subtilis 66 s 201 3264 3662 755152 highly hydrophobic integral membrane protein Bacillus subtilis 66 4 399 sp|P42953|TAGG BACSU TEICHOIC ACID TRANSLOCATION PERMEASE PROTEIN TAGG. 74 3723 2854 11146241 pantothenate synthetase Bacillus subtilis 66 4 870 75 2880 2551 642655 unknown Rhizobium meliloti 66 330 75 1. 7994 7245 854655 Na/H antiporter system Bacillus alcalophilus 66 750 90 5727 4375 451072 di-tripeptide transporter Lactococcus lactis 66 1353 95 1.
13713 13507 11322411 unknown Mycobacterium tuberculosis 66 207 217 2595 2368 1143542 alternative stop codon Rattus norvegicus 66 228 233 6135 5137 1458327 F08F3.4 gene product Caenorhabditis elegans 66 999 238 43 1041 809541 CbrA protein Erwinia chrysanthemi 66 999 241 1053 153067 peptidoglycan hydrolase Staphylococcus aureus 66 1050 261 648 118 1510859 M. jannaschii predicted coding region MJ0790 Methanococcus 66 4 531 jannaschii 263 3 2973 2215 1205865 tetrahydrodipicolinate N-succinyltransferase Haemophilus 66 4 7 759 influenzae 272 8 5484 4420 Sl1882101 high affinity nickel transporter Alcaligenes eutrophus 66 4 1065 sp|P23516|HOXN ALCEU HIGH-AFFINITY NICKEL TRANSPORT PROTEIN. 276 2104 1403 Sl 1208965 hypothetical 23.3 kd protein Escherichia coli 66 702 278 1784 738 Sl11488662 phosphatase-associated protein Bacillus subtilis 66 1047 278 2952 2074 Sl 303560 ORF271 Escherichia coli 66 879 279 2218 542 1185289 2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-carboxylate synthase 66 1677 Bacillus subtilis 288 2275 2015 Sl 1256625 putative Bacillus subtilis 66 261 292 942 751 Sli1511604 M. jannaschii predicted coding region MJ1651 Methanococcus 66 44 192 jannaschii 294 1. 559 Sl1216314 esterase Bacillus stearothermophilus 66 4 5 558 297 1978 1043 Sl 99.4794 cytochrome a assembly factor Bacillus subtilis 66 936 sp|P24009|COXX BACSU PROBABLE CYTOCHROME C OXIDASE ASSEMBLY FACTOR. 316 2053 2682 1107839 alginate lyase Pseudomonas aeruginosa 66 630 338 2302 2144 Sl 520750 biotin synthetase Bacillus sphaericus 66 159 339 735 256 Sl1467468 7,8-dihydro-6-hydroxymethylpterin-pyrophosphokinase Bacillus 66 480 subtilis 363 1. 3 863 Sl 581649 epiC gene product Staphylococcus epidermidis 66 861 366 232 483 Sl 1103505 unknown Schizosaccharomyces pombe 66 252

TABLE 2-continued 367 1845 1222 sp|P20692|TYRA PREPHENATE DEHYDROGENASE (EC 1.3.1.12) (PDH). 66 50 624 372 1599 1048 gi467416 unknown Bacillus subtilis 66 38 552 378 212 1009 gi147309 purine nucleoside phosphorylase Escherichia coli 66 50 798 401 462 gi388263 p-aminobenzoic acid synthase Streptomyces griseus 66 46 462 pirJN0531JN0531 p-aminobenzoic acid synthase - Streptomyces griseus 404 4826 5254 gi606744 cytidine deaminase Bacillus subtilis 66 51 429 411 1103 468 gi1460081 unknown Mycobacterium tuberculosis 66 44 636 420 54 gi1046024 Na+ ATPase subunit J Mycoplasma genitalium 66 49 540 431 858 gi1500008 M. jannaschii predicted coding region MJ1154 Methanococcus 66 50 858 jannaschii 443 5299 4919 gi852076 MrgA Bacillus subtilis 66 46 381 444 2413 142 gi153047 lysostaphin (ttg start codon) Staphylococcus simulans 66 51 993 pirA25881A25881 lysostaphin precursor - Staphylococcus simulans sp|P10547|LSTP STASI LYSOSTAPHIN PRECURSOR (EC 3.5.1.-). 561 1. 480 gi1204905 DNA-3-methyladenine glycosidase I Haemophilus influenzae 66 4 5 477 562 1066 gi1046082 M. genitalium predicted coding region MG372 Mycoplasma 66 318 genitalium 576 11 gi305014 ORF o234 Escherichia coli 66 714 577 903 gi1001353 hypothetical protein Synechocystis sp. 66 288 584 sp|P24204|YEBA HYPOTHETICAL 46.7 KD PROTEIN IN MSBB-RUVB 66 i 330 INTERGENIC REGION (ORFU). 592 706 gi928839 ORF266; putative Lactococcus lactis phage BK5-T 66 705 601 1. 720 gi1488695 novel antigen; orf-2 Staphylococcus aureus 66 714 619 468 gi746573 similar to M. musculus transport system membrane protein, Nramp 66 378 (PIR:A40739) and S.
cerevisiae SMF1 protein (PIR:A45154) Caenorhabditis elegans 355 149 gi804808 unknown protein Rattus norvegicus 66 4 6 207 512 351 gi1519085 phosphatidylcholine binding immunoglobulin heavy chain IgM 66 162 variable region Mus musculus 740 317 gi1209272 argininosuccinate lyase Campylobacter jejuni 66 315 764 310 747 gi435296 alkaline phosphatase like protein Lactococcus lactis 66 438 pirS39339|S39339 alkaline phosphatase-like protein - Lactococcus lactis 852 171 gi536955 CG Site No. 361 Escherichia coli 66 168 886 158 gi289272 ferrichrome-binding protein Bacillus subtilis 66 156 889 232 gi833.061 HCMVUL77 (AA 1-642) Human cytomegalovirus 66 6 231 893 247 gi149008 putative Helicobacter pylori 66 246 900 733 41 gi580842 F3 Bacillus subtilis 66 693 906 1473 646 gi790945 aryl-alcohol dehydrogenase Bacillus subtilis 66 828 947 79 549 gi410117 diaminopimelate decarboxylase Bacillus subtilis 66 471 950 552 gi48713 orf145 Staphylococcus aureus 66 549 955 89 475 gi1204390 uridine kinase (uridine monophosphokinase) Haemophilus 66 387 influenzae 981 997 686 gi4571464 rhoptry protein Plasmodium yoelii 66 312 986 25 315 gi305002 ORF f356 Escherichia coli 66 291 1057 203 gi1303853 YdgF Bacillus subtilis 66 201 1087 294 gi575913 unknown Saccharomyces cerevisiae 66 294 1105 231 gi1045799 methylgalactoside permease ATP-binding protein Mycoplasma 66 231 genitalium 1128 gi1001493 hypothetical protein Synechocystis sp. 66 573 1150 250 gi1499034 M. jannaschii predicted coding region MJ0255 Methanococcus 66 249 jannaschii 1180 453 gi215908 DNA polymerase (g43) Bacteriophage T4 66 255 1208 587 gi1256653 DNA-binding protein Bacillus subtilis 66 537 1342 gi1208474 hypothetical protein Synechocystis sp.
66 402 1761 398 gi215811 tail fiber protein Bacteriophage T3 66 192 1983 25 gi1045935 DNA helicase II Mycoplasma genitalium 66 249 2103 176 gi929798 precursor for the major merozoite surface antigens Plasmodium 66 225 falciparum 2341 188 gi1256623 exodeoxyribonuclease Bacillus subtilis 66 3 186 2458 164 gi1019410 unknown Schizosaccharomyces pombe 66 4 162 2505 235 gi1510394 putative transcriptional regulator Methanococcus jannaschii 66 234 2525 280 gi1000695 cytotoxin L Clostridium sordellii 66 279 2935 gi765073 autolysin Staphylococcus aureus 66 273 3005 114 gi1205784 heterocyst maturation protein Haemophilus influenzae 66 192 3048 gi1303813 YgeW Bacillus subtilis 66 198 3071 gi1070014 protein-dependent Bacillus subtilis 66 189 3081 225 gi984212 unknown Schizosaccharomyces pombe 66 180 3090 386 gi1204987 DNA polymerase III, alpha chain Haemophilus influenzae 66 195 3318 gi1009366 Respiratory nitrate reductase Bacillus subtilis 66 387 3739 400 gi1109684 ProV Bacillus subtilis 66 399 3796 202 gi853760 acyl-CoA dehydrogenase Bacillus subtilis 66 6 201 3924 347 gi563952 gluconate permease Bacillus licheniformis 66 4 249 4240 350 gi151259 HMG-CoA reductase (EC 1.1.1.88) Pseudomonas mevalonii 66 5 348 pirA44756A44756 hydroxymethylglutaryl-CoA reductase (EC 1.1.1.88) Pseudomonas sp. 4604 234 pirA26713BHHC hemocyanin subunit II - Atlantic horseshoe crab 66 228

TABLE 2-continued 8845 9750 gi145646 cynR Escherichia coli 65 35 906 2708 3565 gi887824 ORF o310 Escherichia coli 65 47 858 998 gi143402 recombination protein (ttg start codon) Bacillus subtilis gi1303923 65 44 996 RecN Bacillus subtilis 15 2493 3524 gi1403126 CzcD gene product Alcaligenes eutrophus 65 38 1032 18 1372 836 gi349187 acyltransferase Saccharomyces cerevisiae 65 50 537 21 1467 2492 gi149518 phosphoribosyl anthranilate transferase Lactococcus lactis 65 52 1026 pirS35126S35126 anthranilate phosphoribosyltransferase (EC 2.4.2.18) - Lactococcus lactis subsp. lactis 25 3374 4312 gi1502420 malonyl-CoA:Acyl carrier protein transacylase Bacillus subtilis 65 939 27 390 626 gi1212729 YghJ Bacillus subtilis 65 237 31 10387 9734 gi509245 D-hydroxyisocaproate dehydrogenase Lactobacillus delbrueckii 65 654 38 19172 19528 gi547519 H-protein Flaveria cronquistii 65 357 44 790 1746 gi4058824 yeiK Escherichia coli 65 957 44 8832 8308 gi1205905 molybdenum cofactor biosynthesis protein Haemophilus influenzae 65 5 525 45 6635 7588 gi493074 ApbA protein Salmonella typhimurium 65 954 51 580 1503 gi580897 OppB gene product Bacillus subtilis 65 924 52 225 953 gi1205518 NAD(P)H-flavin oxidoreductase Haemophilus influenzae 65 729 55 1058 777 pirA44459A444 troponin T beta TnT-5 - rabbit 65 282 67 7421 8272 gi143607 sporulation protein Bacillus subtilis 65 852 73 4446 5375 gi1204896 lysophospholipase L2 Haemophilus influenzae 65 930 74 478 2 gi1204844 H. influenzae predicted coding region HI0594 Haemophilus 65 477 influenzae 77 1. 2 757 gi1046082 M. genitalium predicted coding region MG372 Mycoplasma 65 4 6 756 genitalium 77 795 1433 gi1222116 permease Haemophilus influenzae 65 639 81 3454 2180 gi1001708 hypothetical protein Synechocystis sp.
65 1275 91 8357 8166 gi1399263 cystathionine beta-lyase Emericella nidulans 65 192 98 1608 1988 gi467423 unknown Bacillus subtilis 65 3 381 98 2250 2987 gi467424 unknown Bacillus subtilis 65 4 738 O2 2119 1640 gi1511532 N-terminal acetyltransferase complex, subunit ARD1 Methanococcus 65 480 jannaschii 2862 gi1204637 H. influenzae predicted coding region HI0388 Haemophilus 65 32 786 influenzae 9841 8831 gi142695 S-adenosyl-L-methionine:uroporphyrinogen III methyltransferase 65 47 1011 Bacillus megaterium 10119 9799 gi710021 nitrite reductase (nirD) Bacillus subtilis 65 51 321 262 1140 gi39881 ORF 311 (AA 1-311) Bacillus subtilis 65 44 879 3909 4268 gi1204399 glucosamine-6-phosphate deaminase protein Haemophilus 65 44 360 influenzae 7165 8595 gi536955 CG Site No. 361 Escherichia coli 65 41 1431 3688 3915 gi407881 stringent response-like protein Streptococcus equisimilis 65 45 228 pirS39975S39975 stringent response-like protein - Streptococcus equisimilis 10 3882 4295 gi4078804 ORF1 Streptococcus equisimilis 65 50 414 10 4231 4380 gi1139574 Orf2 Streptomyces griseus 65 56 150 12 884.O 8062 gi1204571 H.
influenzae predicted coding region HI0318 Haemophilus 65 52 579 influenzae 12 12 11288 10527 gi710496 transcriptional activator protein Bacillus brevis 65 32 762 25 202 gi1151158 repeat organellar protein Plasmodium chabaudi 65 39 201 26 422 gi37589 precursor Homo sapiens 65 46 420 27 11 10733 12658 gi1064809 homologous to sp:HTRA ECOLI Bacillus subtilis 65 41 1926 43 7004 6465 gi216513 mutator mutT (AT-GC transversion) Escherichia coli 65 56 540 45 3587 3838 gi1209768 D02 orf569 Mycoplasma pneumoniae 65 27 252 50 2841 2200 gi1146225 putative Bacillus subtilis 65 37 642 66 1948 38 gi148304 beta-1,4-N-acetylmuramoylhydrolase Enterococcus hirae 65 50 1911 pirA42296A42296 lysozyme 2 (EC 3.2.1.-) precursor - Enterococcus hirae (ATCC 9790) 88 3195 4178 gi151943 ORF3; putative Rhodobacter capsulatus 65 46 984 89 4785 4588 gi58812 ORF IV (AA 1-489) Figwort mosaic virus 65 40 198 95 5272 2636 gi145220 alanyl-tRNA synthetase Escherichia coli 65 49 2637 95 8104 5609 gi882711 exonuclease V alpha-subunit Escherichia coli 65 38 2496 206 16896 18191 gi4081154 ornithine acetyltransferase Bacillus subtilis 65 53 1296 217 3215 2586 gi1205974 5' guanylate kinase Haemophilus influenzae 65 41 630 220 3751 2237 gi580920 rodD (gtaA) polypeptide (AA 1-673) Bacillus subtilis 65 40 1515 pirS06048S06048 probable rodD protein - Bacillus subtilis sp|P13484|TAGE BACSU PROBABLE POLY(GLYCEROL PHOSPHATE) ALPHA-GLUCOSYLTRANSFERASE (EC 2.4.1.52) (TEICHOIC ACID BIOSYNTHESIS PROTEIN E). 236 2327 3709 gi1146200 DNA or RNA helicase, DNA-dependent ATPase Bacillus subtilis 65 46 1383 237 1902 2513 gi149379 HisBd Lactococcus lactis 65 46 612 241 4195 3422 gi1205308 ribonuclease HII (EC 3.1.26.4) (RNASE HII) Haemophilus influenzae 65 50 774 252 940 602 gi1204989 hypothetical protein (GB:U00022 9) Haemophilus influenzae 65 40 339 261 3794 2808 gi145927 fecD Escherichia coli 65 43 987 274 278 gi496558 orfX Bacillus subtilis 65 42 276 301 815 648 gi467418 unknown Bacillus subtilis 65 45 168

TABLE 2-continued 307 2864 2142 gi1070014 protein-dependent Bacillus subtilis 65 40 723 335 1399 512 gi146913 N-acetylglucosamine transport protein Escherichia coli 65 50 888 pirB29895 WQEC2N phosphotransferase system enzyme II (EC 2.7.1.69), N-acetylglucosamine-specific - Escherichia coli sp|P09323|PTAA ECOLI PTS SYSTEM, N-ACETYLGLUCOSAMINE-SPECIFIC IIABC COMPONENT (EIIA) 338 3170 2220 gi1277029 biotin synthase Bacillus subtilis 65 49 951 343 1490 2800 gi143264 membrane-associated protein Bacillus subtilis 65 48 1311 344 2531 2301 gi1050540 tRNA-glutamine synthetase Lupinus luteus 65 34 231 358 3421 3621 gi1146220 NAD+ dependent glycerol-3-phosphate dehydrogenase Bacillus 65 47 201 subtilis 364 238 699 gi1340128 ORF1 Staphylococcus aureus 65 51 462 379 576 gi143331 alkaline phosphatase regulatory protein Bacillus subtilis 65 40 576 pirA27650A27650 regulatory protein phoR - Bacillus subtilis sp|P23545|PHOR BACSU ALKALINE PHOSPHATASE SYNTHESIS SENSOR PROTEIN PHOR (EC 2.7.3.-). 379 3 3666 4346 gi1432684 dihydrolipoamide transsuccinylase (odhB; EC 2.3.1.61) Bacillus 65 681 subtilis 428 187 483 gi14204654 ORF YOR195w Saccharomyces cerevisiae 65 4 297 438 272 838 gi143498 degS protein Bacillus subtilis 65 567 444 9280 10215 gi1204756 ribokinase Haemophilus influenzae 65 936 449 1241 1531 gi599848 Na/H antiporter homolog Lactococcus lactis 65 291 478 865 278 gi1045942 glycyl-tRNA synthetase Mycoplasma genitalium 65 588 479 517 gi1498.1924 putative Pseudomonas aeruginosa 65 516 480 4312 5637 gi415662 UDP-N-acetylglucosamine 1-carboxyvinyl transferase Acinetobacter 65 1326 calcoaceticus 484 430 gi146551 transmembrane protein (kdpD) Escherichia coli 65 429 499 54 932 gi603456 reductase Leishmania major 65 879 505 459 gi1518853 OafA Salmonella typhimurium 65 456 571 883 257 gi49399 open reading frame upstream glnE Escherichia coli 65 627 pirS37754|S37754 hypothetical protein XE (glnE 5' region) - Escherichia coli 611 270 34 gi10961 RAP-2 Plasmodium falciparum 65 4 237 705 283
gi710020 nitrite reductase (nirB) Bacillus subtilis 65 5 282 712 177 gi289272 ferrichrome-binding protein Bacillus subtilis 65 177 712 196 354 gi289272 ferrichrome-binding protein Bacillus subtilis 65 159 743 631 gi310631 ATP binding protein Streptococcus gordonii 65 630 749 393 779 gi467374 single strand DNA binding protein Bacillus subtilis 65 387 762 850 gi160399 multidrug resistance protein Plasmodium falciparum 65 849 788 85 315 gi1129096 unknown protein Bacillus sp. 65 23 850 408 gi1006604 hypothetical protein Synechocystis sp. 65 408 908 444 gi1199546 2362 Saccharomyces cerevisiae 65 444 925 174 gi1256653 DNA-binding protein Bacillus subtilis 65 174 1031 26 232 gi238657 65 524 AppC=cytochrome d oxidase, subunit I homolog Escherichia coli, K12, Peptide, 514 aa 1037 262 110 gi1491813 gamma-glutamyltranspeptidase Bacillus subtilis 65 4 153 1053 175 gi642655 unknown Rhizobium meliloti 65 3 174 1149 752 105 gi1162980 ribulose-5-phosphate 3-epimerase Spinacia oleracea 65 648 1214 495 109 gi1205959 lactam utilization protein Haemophilus influenzae 65 387 1276 276 76 pirS35493S354 site-specific DNA-methyltransferase Sts.I (EC 2.1.1.-) - 65 2O Streptococcus sanguis 1276 577 254 gi473794 ORF Escherichia coli 65 324 2057 138 gi633699 TrsH Yersinia enterocolitica 65 135 2521 169 gi1045789 hypothetical protein (GB:U14003 76) Mycoplasma genitalium 65 168 2974 297 gi152052 enantiomerase-selective amidase Rhodococcus sp. 65 294 3031 154 pirJQ1024JQ10 hypothetical 30K protein (DmRP140 5' region) - fruit fly Drosophila 65 153 melanogaster 3069 278 gi144906 product homologous to E. coli thioredoxin reductase: J. Biol. Chem. 65 4 6 276 (1988) 263:9015-9019, and to F52a protein of alkyl hydroperoxide reductase from S. typhimurium: J. Biol. Chem.
(1990) 265:10535-10540; open reading frame A Clostridium pasteurianum 3146 142 gi49315 ORF1 gene product Bacillus subtilis 65 141 3170 341 gi1507711 indolepyruvate decarboxylase Erwinia herbicola 65 339 3546 303 gi450688 hsdM gene of EcoprrI gene product Escherichia coli 65 303 pirS38437S38437 hsdM protein - Escherichia coli pirS09629S09629 hypothetical protein A - Escherichia coli (SUB 40-520) 3782 328 gi166412 NADH-glutamate synthase Medicago sativa 65 327 3990 189 gi1009366 Respiratory nitrate reductase Bacillus subtilis 65 186 4032 308 gi1323127 ORF YGR087c Saccharomyces cerevisiae 65 306 4278 364 gi1197667 vitellogenin Anolis pulchellus 65 363 19 4259 5518 gi145727 dealD Escherichia coli 64 1260 19 6926 6213 gi1016232 ycf27 gene product Cyanophora paradoxa 64 3 714 20 6454 5855 gi765073 autolysin Staphylococcus aureus 64 600 31 11537 10368 gi414009 ipa-85d gene product Bacillus subtilis 64 1170 33 2388 4364 gi1204696 fructose-permease IIBC component Haemophilus influenzae 64 1977 36 1871 3013 gi290503 glutamate permease Escherichia coli 64 1143 37 4065 4409 gi39815 orf 2 gene product Bacillus subtilis 64 345 45 7852 8760 gi1230585 nucleotide sugar epimerase Vibrio cholerae O139 64 5 909

TABLE 2-continued 53 3 1540 1899 gi 303961 YajJ Bacillus subtilis 64 360 56 6 3855 2917 gi 4 57514 gltC Bacillus subtilis 64 939 56 24 30002 30247 gi 470331 similar to zinc fingers Caenorhabditis elegans 64 246 62 2421 2083 Sl 642655 unknown Rhizobium meliloti 64 2 339 85 6027 4876 gi 4 57702 5-aminoimidazole ribonucleotide-carboxylase Pichia methanolica 64 1152 pirS39112S39112 phosphoribosylaminoimidazole carboxylase (EC 4.1.1.21) - yeast Pichia methanolica 96 9 9251 gi 51 513 ABC transporter, probable ATP-binding subunit Methanococcus 64 jannaschii OO 1. 600 Sl 765073 autolysin Staphylococcus aureus 64 600 O6 3868 4854 gi 466778 lysine specific permease Escherichia coli 64 987 23 554 270 gi 4674 84 unknown Bacillus subtilis 64 285 27 7514 7810 Sl1210061 serotype-specific antigen African horse sickness virus 64 297 pirS27891S27891 capsid protein VP2 - African horse sickness virus 31 7 6721 gi 51 160 M. jannaschii predicted coding region MJ1163 Methanococcus 64 414 jannaschii 62 42 4817 4179 173517 64 4444 639 gi riboflavin synthase alpha subunit Actinobacillus pleuropneumoniae 43 356 pir A32950A329 probable reductase protein - Leishmania major 64 5 354 49 3295 3035 Sl 398 51 major surface antigen MSG2 Pneumocystis carinii 64 261 54 2307 1480 Sl1984587 DinP Escherichia coli 64 828 61 3855 4880 Sl1903304 ORF72 Bacillus subtilis 64 1026 65 33 791 gi 4674 83 unknown Bacillus subtilis 64 759 75 4844 3333 gi O72398 phaD gene product Rhizobium meliloti 64 1512 88 2042 2500 gi 961 MHC class II analog Staphylococcus aureus 64 459 95 13446 13225 Sl 396380 No definition line found Escherichia coli 64 222 206 16429 16938 Sl1304 34 argC Bacillus stearothermophilus 64 510 215 282 gi 42359 ORF 6 Azotobacter vinelandii 64 279 243 6928 6038 Sl1414014 ipa-90d gene product Bacillus subtilis 64 891 258 845 360 Sl 664754 P17 Listeria monocytogenes 64 486 259 232 gi 4996.63 M.
jannaschii predicted coding region MJ0837 Methanococcus 64 231 jannaschii 263 6 5567 4569 gi 42828 aspartate semialdehyde dehydrogenase Bacillus subtilis 64 4 8 999 sp|Q04797|DHAS BACSU ASPARTATE-SEMIALDEHYDE DEHYDROGENASE (EC 1.2.1.11) (ASA DEHYDROGENASE). 271 1163 gi 4 67091 hfix; B2235 C2 202 Mycobacterium leprae 64 1161 280 173 1450 gi 303839 YafR Bacillus subtilis 64 1278 293 1267 gi 47345 primosomal protein n' Escherichia coli 64 1266 295 742 1488 gi 4 59266 Potential membrane spanning protein Staphylococcus hominis 64 747 pirS42932S42932 potential membrane spanning protein - Staphylococcus hominis 301 1446 1267 Sl 580835 lysine decarboxylase Bacillus subtilis 64 315 3949 2834 gi 4.3396 quinol oxidase Bacillus subtilis 64 45 321 635 Sl 710496 transcriptional activator protein Bacillus brevis 64 41 333 4239 3958 gi 314295 ORF2; putative 19 kDa protein Listeria monocytogenes 64 43 342 549 gi 42940 ftsA Bacillus subtilis 64 38 353 2324 1770 Sl 537049 ORF o470 Escherichia coli 64 44 379 827 3658 pir S25295|A328 oxoglutarate dehydrogenase (lipoamide) (EC 1.2.4.2) - Bacillus 64 47 subtilis 404 4429 4839 pir A36933A369 diacylglycerol kinase homolog - Streptococcus mutans 64 35 407 1133 246 Sl 969026 OrfX Bacillus subtilis 64 41 425 591 73 gi 146177 phosphotransferase system glucose-specific enzyme II Bacillus 64 44 subtilis 443 4082 4798 gi 4.7309 purine nucleoside phosphorylase Escherichia coli 64 51 450 1035 1604 Sl 606376 ORF o162 Escherichia coli 64 38 470 1680 6107 gi 369948 host interacting protein Bacteriophage B1 64 45 486 1471 1031 gi 205582 spermidine/putrescine transport system permease protein 64 35 Haemophilus influenzae 497 1159 101 sp|P36929|FMU E FMU PROTEIN. 64 38 501 410 gi 42450 ahrC protein Bacillus subtilis 64 38 408 514 290 gi 204496 H.
influenzae predicted coding region HI0238 Haemophilus 64 288 influenzae 551 3162 3323 gi 204511 bacterioferritin comigratory protein Haemophilus influenzae 64 162 603 4 759 956 Sl 755823 NADH dehydrogenase F Streptogyna americana 64 3 198 653 746 552 gi 213234 dicarboxylic amino acids Dip5p permease Saccharomyces 64 4 195 cerevisiae 660 2257 713 sp|P46133|YDAH HYPOTHETICAL PROTEIN IN OGT 5' REGION (FRAGMENT). 64 3 1545 695 11 502 gi OO1383 hypothetical protein Synechocystis sp. 64 492 702 752 gi 42865 DNA primase Bacillus subtilis 64 750 826 339 Sl1971336 arginyl tRNA synthetase Bacillus subtilis 64 5 339 838 917 gi 35.4775 pfoS/R Treponema pallidum 64 915 864 675 944 Sl 398.33 cyclomaltodextrin glucanotransferase Bacillus stearothermophilus 64 270 i39835 cyclomaltodextrin glucanotransferase Bacillus stearothermophilus 887 677 gi enterotoxin type E precursor Staphylococcus aureus 64 675 pirA28179A28179 enterotoxin E precursor - Staphylococcus aureus sp|P12993|ETXE STAAU ENTEROTOXIN TYPE E PRECURSOR (SEE). 928 963 754 311976 fibrinogen-binding protein Staphylococcus aureus 64 210 pirS34270S34270 fibrinogen-binding protein - Staphylococcus aureus

TABLE 2-continued 1049 606 412 gi1049115 Rap60 Bacillus subtilis 64 195 1067 748 497 gi1151072 HhdA precursor Haemophilus ducreyi 64 252 1120 50 202 gi142439 ATP-dependent nuclease Bacillus subtilis 64 153 1125 377 gi581648 epiB gene product Staphylococcus epidermidis 64 375 1688 214 26 pirA01365 TVMS transforming protein K-ras - mouse 64 189 2472 358 gi487282 Na+-ATPase subunit J Enterococcus hirae 64 357 2989 356 192 gi304134 argC Bacillus stearothermophilus 64 165 3013 352 74 gi551699 cytochrome oxidase subunit I Bacillus firmus 64 279 3034 274 gi1204349 hypothetical protein (GB:GB:D90212 3) Haemophilus influenzae 64 273 3197 308 gi1009366 Respiratory nitrate reductase Bacillus subtilis 64 306 3303 90 362 gi1107839 alginate lyase Pseudomonas aeruginosa 64 273 3852 82 288 gi216746 D-lactate dehydrogenase Lactobacillus plantarum 64 207 3868 312 gi149435 putative Lactococcus lactis 64 312 3918 331 gi5532 acetyl-CoA acyltransferase Yarrowia lipolytica 64 330 4000 112 378 gi984688 unknown Saccharomyces cerevisiae 64 267 4009 81 368 gi39372 grsB gene product Bacillus brevis 64 288 4166 349 gi149435 putative Lactococcus lactis 64 348 4366 307 gi216267 ORF2 Bacillus megaterium 64 306 4457 400 gi1197667 vitellogenin Anolis pulchellus 64 399 11 3 1539 2438 gi4382284 ORF C Staphylococcus aureus 63 900 24 5423 5235 gi1369943 a1 gene product Bacteriophage B1 63 189 29 1.
390 gi4674414 expressed at the end of exponential growth under conditions in which 63 390 the enzymes of the TCA cycle are repressed Bacillus subtilis gi467441 expressed at the end of exponential growth under conditions in which the enzymes of the TCA cycle are repressed Bacil 31 5712 5095 gi4969434 ORF Saccharomyces cerevisiae 63 4 618 44 2 14669 15019 pirA04446QQEC hypothetical protein F-92 - Escherichia coli 63 351 48 4403 6250 gi43498 pyruvate synthase Halobacterium halobium 63 1848 50 3869 4738 gi413967 ipa-43d gene product Bacillus subtilis 63 870 53 5742 4720 gi474176 regulator protein Staphylococcus xylosus 63 1023 56 15880 17607 gi467409 DNA polymerase III subunit Bacillus subtilis 63 1728 57 7376 6807 gi537036 ORF o158 Escherichia coli 63 570 62 2114 1749 gi642656 unknown Rhizobium meliloti 63 366 70 6562 7353 gi1399821 PhoC Rhizobium meliloti 63 792 75 223 927 gi149376 HisG Lactococcus lactis 63 705 78 4403 3894 gi413950 ipa-26d gene product Bacillus subtilis 63 510 91 7220 5364 gi466997 metH2; B2126 C1 157 Mycobacterium leprae 63 1857 91 9448 8330 gi1204344 cystathionine gamma-synthase Haemophilus influenzae 63 1119 120 21 1508 gi882657 sulfite reductase (NADPH) flavoprotein beta subunit Escherichia coli 63 1488 120 2722 4125 gi665994 hypothetical protein Bacillus subtilis 63 3 1404 127 6064 7566 gi40162 murE gene product Bacillus subtilis 63 1503 149 2106 1891 gi148503 dnaK Erysipelothrix rhusiopathiae 63 216 149 2 10170 9895 gi4870 ORF 2, has similarity to DNA polymerase Saccharomyces kluyveri 63 276 pirS15961S15961 hypothetical protein 2 - yeast Saccharomyces kluyveri plasmid pSKL 164 507 1298 gi145476 CDP-diglyceride synthetase Escherichia coli 63 792 166 8164.
6419 gi151932 fructose enzyme II Rhodobacter capsulatus 63 1746 169 1704 1886 gi152886 elongation factor Ts (tsf) Spiroplasma citri 63 183 188 2951 2757 gi1334547 GIY COI i14 grp IB protein Podospora anserina 63 195 195 13 11767 12804 gi606100 ORF o335 Escherichia coli 63 1038 201 607 2283 gi433534 arginyl-tRNA synthetase Corynebacterium glutamicum 63 1677 pirA49936A49936 arginine--tRNA ligase (EC 6.1.1.19) - Corynebacterium glutamicum 14 15893 16489 gi580828 N-acetyl-glutamate-gamma-semialdehyde dehydrogenase Bacillus subtilis 63 597 220 5766 3763 gi216334 secA protein Bacillus subtilis 63 2004 221 74 907 gi677945 AppA Bacillus subtilis 63 834 227 944 1708 gi1510558 cobyric acid synthase Methanococcus jannaschii 63 765 261 804 1070 gi486511 ORF YKR054c Saccharomyces cerevisiae 63 267 269 1960 314 gi148221 DNA-dependent ATPase, DNA helicase Escherichia coli 63 1647 pirJS0137BVECRQ (recC) protein - Escherichia coli 278 6176 4935 gi699273 cystathionine gamma-synthase Mycobacterium leprae 63 1242 sp|P46807|METB MYCLE CYSTATHIONINE GAMMA SYNTHASE (EC 4.2.99.9) O-SUCCINYLHOMOSERINE (THIOL LYASE). 287 738 1733 gi405133 putative Bacillus subtilis 63 38 996 295 1 748 gi1239983 hypothetical protein Bacillus subtilis 63 41 747 328 2148 3134 gi45302 carrier protein (AA 1-437) Pseudomonas aeruginosa 63 36 987 pirS11497S11497 branched-chain amino acid transport protein braB - Pseudomonas aeruginosa 362 1216 806 D-3-PHOSPHOGLYCERATE DEHYDROGENASE (EC 1.1.1.95) (PGDH). 63 38 411 404 326 1051 gi1303816 YgeZ Bacillus subtilis 63 35 726 405 1715 1329 gi1303914 YahY Bacillus subtilis 63 42 387 406 227 gi142152 sulfate permease (gtg start codon) Synechococcus PCC6301 63 43 225 pirA30301 GRYCS7 sulfate transport protein - Synechococcus sp. (PCC 7942) US 6,593,114 B1 133 134

TABLE 2-continued 415 1048 2718 gi1205402 transport ATP-binding protein Haemophilus influenzae 63 41 1671 426 2679 1783 gi393268 29-kiloDalton protein Streptococcus pneumoniae 63 39 897 sp|P42362|P29K STRPN 29 KD MEMBRANE PROTEIN IN PSAA 5' REGION (ORF1). 505 1347 2195 gi1418999 orf4 Lactobacillus sake 63 40 849 507 574 gi546917 comK Bacillus subtilis, E26, Peptide, 192 aa 63 35 573 562 146 1084 gi43985 nifS-like gene Lactobacillus delbrueckii 63 45 939 675 215 gi1510994 serine aminotransferase Methanococcus jannaschii 63 29 213 686 230 gi517356 nitrate reductase (NADH) Lotus japonicus 63 52 228 701 392 gi881940 NorQ protein Paracoccus denitrificans 63 41 390 720 400 gi47168 open reading frame Streptomyces lividans 63 35 399 779 287 gi1261932 unknown Mycobacterium tuberculosis 63 41 285 907 22 321 gi149445 ORF1 Lactococcus lactis 63 27 300 972 399 gi1511235 M. jannaschii predicted coding region MJ1232 Methanococcus jannaschii 63 27 396 1085 82 gi1204277 hypothetical protein (GB:U00019 14) Haemophilus influenzae 63 38 537 1094 542 gi790943 urea amidolyase Bacillus subtilis 63 39 540 1108 482 pirS49892S498 regulation protein - Bacillus subtilis 63 44 480 1113 gi493017 endocarditis specific antigen Enterococcus faecalis 63 45 615 1300 695 sp|P33940|YOJH HYPOTHETICAL 54.3 KD PROTEIN IN ECO-ALKB INTERGENIC REGION. 63 46 693
1325 gi928989 p100 protein Borrelia burgdorferi 63 30 204 1814 gi1303914 YahY Bacillus subtilis 63 34 243 2021 pirC33496IC334 hisC homolog - Bacillus subtilis 63 46 249 2325 193 gi4361324 product is similar to TnpA of transposon Tn554 from Staphylococcus aureus Clostridium butyricum 63 40 192 2335 195 gi1184298 flagellar MS-ring protein Borrelia burgdorferi 63 47 195 2406 227 gi1041785 rhoptry protein Plasmodium yoelii 63 33 225 2961 136 360 gi312443 carbamoyl-phosphate synthase (glutamine-hydrolysing) Bacillus caldolyticus 63 52 225 2965 402 gi1407784 orf-1; novel antigen Staphylococcus aureus 63 50 402 2987 293 gi1224069 amidase Moraxella catarrhalis 63 35 291 2994 135 gi836646 phosphoribosylformimino-praic ketoisomerase Rhodobacter sphaeroides 63 51 132 3043 252 64 gi1480237 phenylacetaldehyde dehydrogenase Escherichia coli 63 40 189 3078 400 191 gi1487982 intrinsic membrane protein Mycoplasma hominis 63 36 210 3139 217 gi4391264 glutamate synthase (NADPH) Azospirillum brasilense 63 47 216 pirA49916A49916 glutamate synthase (NADPH) (EC 1.4.1.13) - Azospirillum brasilense 3625 398 gi623073 ORF360; putative Bacteriophage LL-H 63 48 396 3658 399 gi1303697 YrkA Bacillus subtilis 63 37 399 3659 395 gi1256135 YbbF Bacillus subtilis 63 48 393 3783 361 gi1256902 Pyruvate decarboxylase isozyme 2 (Swiss Prot. accession number P16467) Saccharomyces cerevisiae 63 34 360 3900 171 sp|P10537|AMYB BETA-AMYLASE (EC 3.2.1.2) (1,4-ALPHA-D-GLUCAN MALTOHYDROLASE). 63 168 4309 176 pirA37967A379 neural cell adhesion molecule Ng-CAM precursor - chicken 63 57 174 4367 1 195 gi1321932 Per6p gene product Pichia pastoris 63 30 195 4432 312 gi151259 HMG-CoA reductase (EC 1.1.1.88) Pseudomonas mevalonii 63 312 pirA44756A44756 hydroxymethylglutaryl-CoA reductase (EC 1.1.1.88) Pseudomonas sp.
4468 308 gi296464 ATPase Lactococcus lactis 63 36 303 33 1411 2400 gi153675 tagatose 6-P kinase Streptococcus mutans 62 44 990 36 5985 6218 gi1490521 hMSH3 Homo sapiens 62 234 37 721 gi1107531 ceuE gene product Campylobacter coli 62 33 720 38 15 10912 11589 gi1222058 H. influenzae predicted coding region HIN1279 Haemophilus influenzae 62 38 678 38 25 19526 20329 gi695280 ORF2 Alcaligenes eutrophus 62 804 57 1780 1037 gi471234 orf1 Haemophilus influenzae 62 55 744 57 6350 6054 gi508174 EIIB domain of PTS-dependent Gat transport and phosphorylation Escherichia coli 62 35 297 58 559 gi755152 highly hydrophobic integral membrane protein Bacillus subtilis 62 34 558 sp|P42953|TAGG BACSU TEICHOIC ACID TRANSLOCATION PERMEASE PROTEIN TAGG. 67 10 8250 9014 gi470683 Shows similarity with ATP-binding proteins from other ABC transport operons, Swiss Prot Accession Numbers P24137, P08007, P04285, P24136 Escherichia coli 62 34 765 7494 6673 gi46816 actVA 4 gene product Streptomyces coelicolor 62 44 822 1320 847 gi39993 UDP-N-acetylmuramoylalanine--D-glutamate ligase Bacillus subtilis 62 43 474 87 7034 9205 gi217191 5'-nucleotidase precursor Vibrio parahaemolyticus 62 48 2172 100 3089 2127 gi1511047 phosphoglycerate dehydrogenase Methanococcus jannaschii 62 42 963 102 520 gi153655 mismatch repair protein Streptococcus pneumoniae 62 34 519 pirC28667C28667 DNA mismatch repair protein hexA - Streptococcus pneumoniae 112 2 466 1068 gi153741 ATP-binding protein Streptococcus mutans 62 37 603 114 6855 7562 gi1204866 L-fucose operon activator Haemophilus influenzae 62 38 708

TABLE 2-continued 116 4 5633 4443 gi677947 AppC Bacillus subtilis 62 37 1191 124 6004 5153 gi853777 product similar to E. coli PRFA2 protein Bacillus subtilis 62 44 852 pirS55438|S55438 ywkE protein - Bacillus subtilis sp|P45873|HEMK BACSU POSSIBLE PROTOPORPHYRINOGEN OXIDASE (EC 3.3-). 148 24 554 gi467456 unknown Bacillus subtilis 62 531 149 6725 5859 gi1205807 replicative DNA helicase Haemophilus influenzae 62 867 163 1153 803 gi40067 X gene product Bacillus sphaericus 62 351 164 14673 15632 gi42219 P35 gene product (AA 1-314) Escherichia coli 62 960 165 1166 1447 gi403936 phenylalanyl-tRNA synthetase alpha subunit (Gly294 variant) unidentified cloning vector 62 282 166 2084 5089 gi308861 GTG start codon Lactococcus lactis 62 3006 171 614 gi1046053 hypothetical protein (SP:P32049) Mycoplasma genitalium 62 612 183 1310 99 gi143045 hemY Bacillus subtilis 62 1212 200 3 956 gi142439 ATP-dependent nuclease Bacillus subtilis 62 954 237 935 1966 gi41695 hisC protein Escherichia coli 62 1032 261 2605 1202 gi143121 ORF A; putative Bacillus firmus 62 1404 299 4477 4719 gi467441 expressed at the end of exponential growth under conditions in which the enzymes of the TCA cycle are repressed Bacillus subtilis 62 243 304 3819 2620 gi153015 Fema protein Staphylococcus aureus 62 1200 324 262 gi142717 cytochrome aa3 controlling protein Bacillus subtilis 62 261 pirA33960A33960 cta protein - Bacillus subtilis sp|P12946|CTAA BACSU CYTOCHROME AA3 CONTROLLING PROTEIN.
325 269 1207 gi581088 methionyl-tRNA formyltransferase Escherichia coli 62 39 939 332 463 4368 gi1499960 uridine 5'-monophosphate synthase Methanococcus jannaschii 62 36 264 355 370 gi145925 fecB Escherichia coli 62 32 369 365 6628 6804 gi413943 ipa-19d gene product Bacillus subtilis 62 54 77 369 1626 508 pirA43577A435 regulatory protein pfoR - Clostridium perfringens 62 42 19 370 34 264 gi40665 beta-glucosidase Clostridium thermocellum 62 37 231 415 2709 3176 gi1205401 transport ATP-binding protein Haemophilus influenzae 62 35 68 429 790 gi1046024 Na+ ATPase subunit J Mycoplasma genitalium 62 40 789 444 704 1369 gi581510 nodulation gene; integral membrane protein; homology to Rhizobium leguminosarum nodI Rhizobium loti 62 37 666 477 2 75 1869 pirA48440A484 ring-infected erythrocyte surface antigen 2, RESA-2 - Plasmodium falciparum 62 44 19 485 24 1707 gi17934 betaine aldehyde dehydrogenase Beta vulgaris 62 43 14 67 487 3 114 1311 gi149445 ORF1 Lactococcus lactis 62 31 71 494 1134 1313 gi166835 ribulose bisphosphate carboxylase/oxygenase activase Arabidopsis thaliana 62 37 80 518 193 882 gi153491 O-methyltransferase Streptomyces glaucescens 62 39 690 534 369 2522 gi1480429 putative transcriptional regulator Bacillus stearothermophilus 62 35 54 551 6 437 4820 gi511113 ferric uptake regulation protein Campylobacter jejuni 62 37 50 574 570 gi153000 enterotoxin B Staphylococcus aureus 62 43 570 590 344 1171 gi40367 ORFC Clostridium acetobutylicum 62 37 828 655 396 830 gi147195 phnB protein Escherichia coli 62 44 35 656 2 478 gi1205451 cell division inhibitor Haemophilus influenzae 62 36 77 676 348 gi1511613 methyl coenzyme M reductase system, component A2 Methanococcus jannaschii 62 36 345 687 248 gi49272 Asparaginase Bacillus licheniformis 62 48 246 700 267 944 gi1205822 hypothetical protein (GB:X75627 4) Haemophilus influenzae 62 40 678 840 1041 367 gi1045865 M.
genitalium predicted coding region MG181 Mycoplasma genitalium 62 36 675 864 898 1491 gi1144332 deoxyuridine nucleotidohydrolase Homo sapiens 62 38 594 916 35 400 gi4139314 ipa-7d gene product Bacillus subtilis 62 45 366 1071 771 gi1510649 aspartokinase I Methanococcus jannaschii 62 40 771 1084 19 609 gi688011 AgX-1 antigen human, infertile patient, testis, Peptide, 505 aa 62 39 591 1103 203 gi581261 ORF homologous to E. coli metB Herpetosiphon aurantiacus 62 51 201 pirS14030S14030 Hypothetical protein - Herpetosiphon aurantiacus (fragment) 1217 233 gi4600254 ORF2, putative Streptococcus pneumoniae 62 231 1533 414 184 gi413968 ipa-44d gene product Bacillus subtilis 62 231 1537 257 gi1510641 alanyl-tRNA synthetase Methanococcus jannaschii 62 255 2287 161 gi4859564 mrpC gene product Proteus mirabilis 62 159 2386 245 gi285708 nontoxic component Clostridium botulinum 62 243 2484 167 gi142092 DNA-repair protein (recA) Anabaena variabilis 62 165 2490 400 gi581648 epiB gene product Staphylococcus epidermidis 62 399 3016 300 gi710022 uroporphyrinogen III Bacillus subtilis 62 5 297 3116 213 gi466883 nifS; B1496 C2 193 Mycobacterium leprae 62 213 3297 413 gi475715 acetyl coenzyme A acetyltransferase (thiolase) Clostridium acetobutylicum 62 411 3609 31 276 gi1408501 homologous to N-acyl-L-amino acid amidohydrolase of Bacillus stearothermophilus Bacillus subtilis 62 48 246 3665 402 220 gi151259 HMG-CoA reductase (EC 1.1.1.88) Pseudomonas mevalonii 62 40 183 pirA44756A44756 hydroxymethylglutaryl-CoA reductase (EC 1.1.1.88) Pseudomonas sp. 3733 374 gi1353197 thioredoxin reductase Eubacterium acidaminophilum 62 372

TABLE 2-continued 3898 1 237 gi153675 tagatose 6-P kinase Streptococcus mutans 62 237 4027 143 gi330705 homologue to gene 30 (aa 1-59); putative Bovine herpesvirus 4 62 141 4109 365 gi41748 hsdM protein (AA 1-520) Escherichia coli 62 363 4303 1 303 gi1303813 YgeW Bacillus subtilis 62 303 4380 267 gi1235684 mevalonate pyrophosphate decarboxylase Saccharomyces cerevisiae 62 264 4494 2 256 gi510692 enterotoxin H Staphylococcus aureus 62 255 4598 223 35 gi763513 ORF4; putative Streptomyces violaceoruber 62 189 4624 1 222 gi41748 hsdM protein (AA 1-520) Escherichia coli 62 222 3932 3576 gi928831 ORF95; putative Lactococcus lactis phage BK5-T 357 11 162 pirC33356C333 prothymosin alpha homolog (clone 32) - human (fragment) 159 16 10991 938 gi1205391 hypothetical protein (SP:P33995) Haemophilus influenzae 948 32 283 801 gi1066504 exo-beta 1,3 glucanase Cochliobolus carbonum 5 519 38 616 107 gi1510864 glutamine transport ATP-binding protein Q Methanococcus jannaschii 492 45 3082 4038 gi1109686 ProX Bacillus subtilis 44 957 48 7118 7504 gi498839 ORF2 Clostridium perfringens 3 387 51 4605 5570 gi388269 traC Plasmid paid1 966 60 1689 2243 gi1205893 hypothetical protein (GB:U00011 3) Haemophilus influenzae 555 62 5122 4685 gi854656 Na/H antiporter system ORF2 Bacillus alcalophilus 438 67 4330 5646 gi466612 nikA Escherichia coli 1317 74 1504 608 gi1204846 carbamate kinase Haemophilus influenzae 897 85 1101 gi1498756 amidophosphoribosyltransferase PurF Rhizobium etli 1098 86 1582 169 gi1499931 M. jannaschii predicted coding region MJ1083 Methanococcus jannaschii 414 97 74 649 gi1518679 orf Bacillus subtilis 4 576 99 1990 526 gi413958 ipa-34d gene product Bacillus subtilis 465 124 5123 4023 gi556881 Similar to Saccharomyces cerevisiae SUA5 protein Bacillus subtilis 6 4 6 1101 pirS49358S49358 ipc-29d protein - Bacillus subtilis sp|P39153|YWLC BACSU HYPOTHETICAL 37.0 KD PROTEIN IN SPOR-GLYC INTERGENIC REGION. 125 1668 2531 ORFA gene product Chloroflexus aurantiacus 4 3 864 132 1
627 hypothetical protein I - Enterococcus faecalis plasmid pAM-beta-1 (fragment) 624 149 3075 2533 gi1144332 deoxyuridine nucleotidohydrolase Homo sapiens 543 149 22 7869 7048 gi160047 p101/acidic basic repeat antigen Plasmodium falciparum 6 822 pirA29232A29232 101K malaria antigen precursor - Plasmodium falciparum (strain Camp) 168 1915 2361 gi1499694 HIT protein, member of the HIT-family Methanococcus jannaschii 4 447 171 7948 6221 gi467446 similar to SpoVB Bacillus subtilis 1728 174 1042 2340 gi216374 glutaryl 7-ACA acylase precursor Bacillus laterosporus 1299 190 4111 3188 gi409286 bmrU Bacillus subtilis 924 216 190 gi415861 eukaryotic initiation factor 2 beta (eIF-2 beta) Oryctolagus cuniculus 189 227 4161 gi216341 ORF for methionine amino peptidase Bacillus subtilis 4 888 238 1959 gi809543 CbrC protein Erwinia chrysanthemi 1089 247 694 gi537231 ORF f579 Escherichia coli 693 247 678 1034 gi142226 chvD protein Agrobacterium tumefaciens 357 257 2627 1731 gi699379 glvr-1 protein Mycobacterium leprae 897 268 3051 2683 gi40364 ORFA1 Clostridium acetobutylicum 369 275 4621 4827 gi1204848 hypothetical protein (GP:M87049 57) Haemophilus influenzae 3 207 277 1845 gi784897 beta-N-acetylhexosaminidase Streptococcus pneumoniae 1845 pirA56390A56390 mannosyl-glycoprotein endo-beta-N-acetyl glucosaminidase (EC 3.2.1.96) precursor - Streptococcus pneumoniae 278 7032 6061 gi467462 cysteine synthetase A Bacillus subtilis 972 278 8535 7192 gi1205919 Na+ and Cl- dependent gamma-aminobutyric acid transporter Haemophilus influenzae 6 8 1344 283 366 gi755607 polyA polymerase Bacillus subtilis 366 288 1496 1074 gi388108 cell wall enzyme Enterococcus faecalis 3344 423 291 86 334 gi454265 FBP3 Petunia hybrida 3 249 318 694 284 gi290531 similar to beta-glucoside transport protein Escherichia coli 4 411 sp|P31451|PTIB ECOLI PTS SYSTEM, ARBUTIN-LIKE IIB COMPONENT (PHOSPHOTRANSFERASE ENZYME II, B COMPONENT) (EC 2.7.1.69). 330 1190 468 gi1001805 hypothetical protein Synechocystis sp.
723 385 1025 537 gi533098 DnaD protein Bacillus subtilis 4 489 426 399 gi1303853 YdgF Bacillus subtilis 396 438 810 1421 gi1293660 AbsA2 Streptomyces coelicolor 36 612 454 792 gi733522 phosphatidylinositol-4,5-diphosphate 3-kinase Dictyostelium discoideum 30 789 464 560 336 gi1123120 C53B7.5 gene product Caenorhabditis elegans 38 225 470 6077 7357 gi623073 ORF360; putative Bacteriophage LL-H 47 1281 509 279 gi467484 unknown Bacillus subtilis 45 276 555 1296 676 gi141800 anthranilate synthase glutamine amidotransferase Acinetobacter calcoaceticus 42 621 569 857 gi467090 B2235 C2 195 Mycobacterium leprae 47 855 585 2 803 645 sp|P36686|SURE SURVIVAL PROTEIN SURE HOMOLOG (FRAGMENT). 33 159 592 1422 1150 gi1221602 immunity repressor protein Haemophilus influenzae 32 273

TABLE 2-continued

603 43 357 Sl 507738 Hmp Vibrio parahaemolyticus 33 315 669 1235 Sl11146243 22.4% identity with Escherichia coli DNA-damage inducible protein ...; putative Bacillus subtilis 37 1233 675 805 1101 403373 glycerophosphoryl diester phosphodiesterase Bacillus subtilis 36 297 Sl pirS37251S37251 glycerophosphoryl diester phosphodiesterase - Bacillus subtilis 703 829 Sl S37181 ORF f470 Escherichia coli 32 828 728 816 Sl 806281 DNA polymerase I Bacillus stearothermophilus 39 813 821 61 318 Sl 709992 hypothetical protein Bacillus subtilis 38 258 856 1567 821 Sl 609310 portal protein gp3 Bacteriophage HK97 40 747 923 542 Sl1143213 putative Bacillus subtilis 38 540 1124 59 370 Sl 1107541 C33D9.8 Caenorhabditis elegans 26 312 1492 276 Sl 406397 unknown Mycoplasma genitalium 32 273 1602 46 318 Sl 733522 phosphatidylinositol-4,5-diphosphate 3-kinase Dictyostelium discoideum 34 273 2500 290 Sl 1045964 hypothetical protein (GB:U14003 297) Mycoplasma genitalium 31 288 2968 808 Sl 397526 clumping factor Staphylococcus aureus 55 807 3076 248 Sl 149373 ORF 1 Lactococcus lactis 41 246 3609 401 Sl 1408501 homologous to N-acyl-L-amino acid amidohydrolase of Bacillus stearothermophilus Bacillus subtilis 39 195 3662 740 Sl 1303813 YgeW Bacillus subtilis 6 42 738 3672 442 784897 beta-N-acetylhexosaminidase Streptococcus pneumoniae 6 50 441 Sl pirA56390A56390 mannosyl-glycoprotein endo-beta-N-acetyl glucosaminidase (EC 3.2.1.96) precursor - Streptococcus pneumoniae 3724 Sl 1009366 Respiratory nitrate reductase Bacillus subtilis 41 219 3728 Sl 677943 AppD Bacillus subtilis 6 46 396 3884 784897 beta-N-acetylhexosaminidase Streptococcus pneumoniae 47 399 Sl pirA56390A56390 mannosyl-glycoprotein endo-beta-N-acetyl glucosaminidase (EC 3.2.1.96) precursor - Streptococcus pneumoniae 3971 383 784897 beta-N-acetylhexosaminidase Streptococcus pneumoniae 45 381 Sl pirA56390A56390 mannosyl-glycoprotein endo-beta-N-acetyl glucosaminidase (EC 3.2.1.96) precursor - Streptococcus pneumoniae 4038 359 57 Sl
1339950 large subunit of NADH-dependent glutamate synthase Plectonema boryanum 24 303 O41 274 Sl 413953 ipa-29d gene product Bacillus subtilis 48 273 O47 402 Sl 528991 unknown Bacillus subtilis 42 402 102 345 Sl1976025 HrsA Escherichia coli 46 345 155 336 784897 beta-N-acetylhexosaminidase Streptococcus pneumoniae 50 336 Sl pirA56390A56390 mannosyl-glycoprotein endo-beta-N-acetyl glucosaminidase (EC 3.2.1.96) precursor - Streptococcus pneumoniae 4 268 233 Sl 450688 hsdM gene of EcoprrI gene product Escherichia coli 38 231 pirS38437S38437 hsdM protein - Escherichia coli pirS09629 S09629 hypothetical protein A - Escherichia coli (SUB 40-520) 4 374 273 784897 beta-N-acetylhexosaminidase Streptococcus pneumoniae 50 270 Sl pirA56390A56390 mannosyl-glycoprotein endo-beta-N-acetyl glucosaminidase (EC 3.2.1.96) precursor - Streptococcus pneumoniae 4 389 172 Sl 147516 ribokinase Escherichia coli 35 171 4 621 268 784897 beta-N-acetylhexosaminidase Streptococcus pneumoniae 47 267 Sl pirA56390A56390 mannosyl-glycoprotein endo-beta-N-acetyl glucosaminidase (EC 3.2.1.96) precursor - Streptococcus pneumoniae 4 663 27 227 Sl1976025 HrsA Escherichia coli 50 201 5536 4409 Sl 1408501 homologous to N-acyl-L-amino acid amidohydrolase of Bacillus stearothermophilus Bacillus subtilis 43 1128 11 3426 3725 410748 ring-infected erythrocyte surface antigen Plasmodium falciparum 60 24 300 Sl pirA25526A25526 ring-infected erythrocyte surface antigen precursor - Plasmodium falciparum (strain FC27/Papua New Guinea) sp|P13830|RESA PLAFF RING-INFECTED ERYTHROCYTE SURFACE ANTIGEN RE 11 10313 9591 gi 217651 carbonyl reductase (NADPH) Rattus norvegicus 60 28 723 16 11917 12930 gi 001453 hypothetical protein Synechocystis sp.
60 37 1014 33 26 469 Sl 388109 regulatory protein Enterococcus faecalis 60 41 444 37 9834 8854 gi 336656 Orf1 Bacillus subtilis 60 40 981 39 4364 4522 gi 872 ORF 4 Saccharomyces kluyveri 60 47 159 41 1025 gi 42822 D-alanine racemase cds Bacillus subtilis 60 39 1023 43 2474 3607 gi 468046 para-nitrobenzyl esterase Bacillus subtilis 60 40 1134 44 6756 7769 Sl 414234 thiF Escherichia coli 60 52 1014 45 8874 9074 Sl 343949 varl (40.0) Saccharomyces cerevisiae 60 44 201 56 26430 25018 gi 468764 mocR gene product Rhizobium meliloti 60 35 1413 60 173 388 gi 303864 YdgO Bacillus subtilis 60 33 216 63 357 1619 gi 67124 ureD; B229 C3 234 Mycobacterium leprae 60 43 1263 69 395 gi 518853 OafA Salmonella typhimurium 60 36 393 88 1188 gi 480429 putative transcriptional regulator Bacillus stearothermophilus 60 30 1188 92 3881 3027 Sl 349227 transmembrane protein Escherichia coli 60 37 855 92 4923 3850 gi 466613 nikB Escherichia coli 60 38 1074 93 476 gi 510925 coenzyme F420-reducing hydrogenase, beta subunit Methanococcus jannaschii 60 27 474 96 7366 Sl1972715 accessory protein Carnobacterium piscicola 60 30 213 98 3212 gi 467425 unknown Bacillus subtilis 60 42 858

TABLE 2-continued 02 10 7158 7430 gi143092 acetolactate synthase small subunit Bacillus subtilis 60 37 273 sp|P37252|ILVN BACSU ACETOLACTATE SYNTHASE SMALL SUBUNIT (EC 4.1.3.18) (AHAS) (ACETOHYDROXY-ACID SYNTHASE SMALL SUBUNIT) (ALS). 09 11 9127 10515 gi1255259 o-succinylbenzoic acid (OSB) CoA ligase Staphylococcus aureus 60 28 1389 09 12 10499 11656 gi141954 beta-ketothiolase Alcaligenes eutrophus 60 41 1158 19 3134 1638 gi1524280 unknown Mycobacterium tuberculosis 60 45 1497 21 6957 7646 gi1107529 ceuC gene product Campylobacter coli 60 35 690 40 6013 4322 gi146547 kdpA Escherichia coli 60 45 1692 45 2 703 gi1460077 unknown Mycobacterium tuberculosis 60 23 702 50 2216 1623 gi1146230 putative Bacillus subtilis 60 40 594 57 961 533 gi1303975 YajX Bacillus subtilis 60 30 429 58 4769 4413 gi1449288 unknown Mycobacterium tuberculosis 60 36 357 59 257 gi580932 murD gene product Bacillus subtilis 60 43 255 60 159 1187 gi1204532 hypothetical protein (GB:L19201 29) Haemophilus influenzae 60 34 1029 61 1 7866 7483 gi1496003 ORF3; Pepy; putative oligoendopeptidase based on homology with Lactococcus lactis PepF (GenBank Accession Number Z32522) Caldicellulosiruptor saccharolyticus 60 34 384 72 1331 2110 gi4852804 28.2 kDa protein Streptococcus pneumoniae 60 33 780 73 2460 838 gi1524397 glycine betaine transporter OpuD Bacillus subtilis 60 41 1623 73 4953 3943 gi1100737 NADP dependent leukotriene b4 12-hydroxydehydrogenase Sus scrofa 60 44 1011 1 3 995 ipa-19d gene product Bacillus subtilis 60 42 993 3641 4573 HYPOTHETICAL 29.4 KD PROTEIN IN HEML-PFS INTERGENIC REGION PRECURSOR. 60 37 933 203 2415 561 gi927798 D9719.34p; CAI: 0.14 Saccharomyces cerevisiae 60 43 855 206 12234 12515 sp|P37347|YECD HYPOTHETICAL 21.8 KD PROTEIN INASPSS REGION.
60 47 282 212 1213 gi332711 hemagglutinin-neuraminidase fusion protein Human parainfluenza virus 3 60 34 198 214 65 gi1204366 hypothetical protein (GB:U14003 130) Haemophilus influenzae 60 36 1089 237 gi149377 HisD Lactococcus lactis 60 40 936 241 4998 gi1046160 hypothetical protein (GB:U00021 5) Mycoplasma genitalium 60 37 699 260 5919 gi4319504 similar to a B. subtilis gene (GB: BACHEMEHY 5) Clostridium pasteurianum 60 35 567 264 1218 gi397526 clumping factor Staphylococcus aureus 60 53 1215 267 gi148316 NaH-antiporter protein Enterococcus hirae 60 27 1407 275 3804 pirF36889|F368 leuD 3'-region hypothetical protein - Lactococcus lactis subsp. lactis (strain IL1403) 60 35 792 291 860 98 gi1208889 coded for by C. elegans cDNA yk130e 12.5; contains C2H2-type zinc fingers Caenorhabditis elegans 60 33 339 307 3176 2931 gi1070014 protein-dependent Bacillus subtilis 60 36 246 316 4957 5823 gi4139524 ipa-28d gene product Bacillus subtilis 60 41 867 328 2996 3484 gi1204484 membrane-associated component, branched amino acid transport system Haemophilus influenzae 60 39 489 332 4363 3839 gi1205449 colicin V production protein (pur regulon) Haemophilus influenzae 60 37 525 357 532 gi887842 single-stranded DNA-specific exonuclease Escherichia coli 60 41 531 375 96 362 gi4857 adenylyl cyclase gene product Saccharomyces kluyveri 60 47 267 pirJQ1145|OYBYK adenylate cyclase (EC 4.6.1.1) - yeast (Saccharomyces kluyveri) 397 66 416 gi709999 Glucarate dehydratase Bacillus subtilis 60 37 351 409 2 163 gi499700 glycogen phosphorylase Saccharomyces cerevisiae 60 35 162 453 914 1237 gi1196899 unknown protein Staphylococcus aureus 60 36 324 453 3620 3402 sp|P12222|YCF1 HYPOTHETICAL 226 KD PROTEIN (ORF 1901).
60 31 219 470 622 945 pirS30782S307 integrin homolog - yeast Saccharomyces cerevisiae 60 31 324 500 118 606 gi4674074 unknown Bacillus subtilis 60 36 489 503 752 982 gi167835 myosin heavy chain Dictyostelium discoideum 60 34 231 505 2238 3563 gi1510732 NADH oxidase Methanococcus jannaschii 60 26 1326 523 3 1043 gi143331 alkaline phosphatase regulatory protein Bacillus subtilis 65 1041 pirA27650A27650 regulatory protein phoR - Bacillus subtilis sp|P23545|PHOR BACSU ALKALINE PHOSPHATASE SYNTHESIS SENSOR PROTEIN PHOR (EC 2.7.3.-). 543 465 gi1511103 cobalt transport ATP-binding protein O Methanococcus jannaschii 60 465 545 726 gi1498192 putative Pseudomonas aeruginosa 60 726 556 1054 gi1477402 ex gene product Bordetella pertussis 60 1053 578 489 gi1205129 H. influenzae predicted coding region HI0882 Haemophilus influenzae 60 486 594 624 gi1212755 adenylyl cyclase Aeromonas hydrophila 60 624 604 530 gi145925 fecB Escherichia coli 60 528 620 465 gi1205483 bicyclomycin resistance protein Haemophilus influenzae 60 462 630 2 871 1122 gi1486242 unknown Bacillus subtilis 60 252 645 425 276 gi1205136 serine hydroxymethyltransferase (serine methylase) Haemophilus influenzae 60 150 684 843 604 gi1205538 hypothetical protein (GB:U14003 302) Haemophilus influenzae 60 3 240 786 485 gi1402944 orfRM1 gene product Bacillus subtilis 60 483 844 346 104 gi790943 urea amidolyase Bacillus subtilis 60 243 851 726 gi159661 GMP reductase Ascaris lumbricoides 60 726

TABLE 2-continued 871 874 gi1001493 hypothetical protein Synechocystis sp. 60 39 873 896 839 120 gi604926 NADH dehydrogenase, subunit 5 Schizophyllum commune 60 39 720 sp|P50368|NU5M SCHCO NADH-UBIQUINONE OXIDOREDUCTASE CHAIN 5 (EC 1.6.5.3). 908 448 753 gi662880 novel hemolytic factor Bacillus cereus 60 31 306 979 595 gi1429255 putative; orf1 Bacillus subtilis 60 30 594 1078 502 335 gi581055 inner membrane copper tolerance protein Escherichia coli 60 40 168 gi871029 disulphide isomerase like protein Escherichia coli pirS47295S47295 inner membrane copper tolerance protein - Escherichia coli 1112 620 90 gi407885 ORF3 Streptomyces griseus 60 34 531 1135 275 66 gi1171407 Vps8p Saccharomyces cerevisiae 60 36 210 1146 17 562 gi1239981 hypothetical protein Bacillus subtilis 60 36 546 1291 360 pirS57530|S575 carboxyl esterase - Acinetobacter calcoaceticus 60 30 357 1332 169 gi1222056 aminotransferase Haemophilus influenzae 60 44 168 1429 146 gi1205619 ferritin like protein Haemophilus influenzae 60 39 144 1722 286 gi240052 dihydroflavonol-4-reductase, DFR Hordeum vulgare=barley, cv. Gula, Peptide, 354 aa 60 36 285 2350 200 15 gi497626 ORF 1 Plasmid paQ1 60 20 186 2936 310 101 gi508981 prephenate dehydratase Bacillus subtilis 60 48 210 3027 302 36 gi1146199 putative Bacillus subtilis 60 37 267 3084 20 208 gi1407784 orf-1; novel antigen Staphylococcus aureus 60 51 189 3155 226 gi1046097 cytadherence-accessory protein Mycoplasma genitalium 60 34 225 3603 186 gi510108 mitochondrial long-chain enoyl-CoA hydratase/3-hydroxyacyl-CoA dehydrogenase alpha-subunit Rattus norvegicus 60 42 183 3665 244 gi151259 HMG-CoA reductase (EC 1.1.1.88) Pseudomonas mevalonii 60 42 243 pirA44756A44756 hydroxymethylglutaryl-CoA reductase (EC 1.1.1.88) Pseudomonas sp.
3747 146 gi4741924 iucC gene product Escherichia coli 60 36 144 3912 335 gi1488695 novel antigen; orf-2 Staphylococcus aureus 60 44 333 4072 272 gi4058794 yeiH Escherichia coli 60 33 270 4134 194 gi780656 chemoreceptor protein Rhizobium leguminosarum bv. viciae 60 28 159 402 127 gi602031 E. to trimethylamine DH Mycoplasma capricolum 60 41 276 pirS49950S49950 probable trimethylamine dehydrogenase (EC 1.5.99.7) - Mycoplasma capricolum (SGC3) (fragment) 4243 127 324 gi899317 peptide synthetase module Microcystis aeruginosa 60 42 198 pirS49111S49111 probable amino acid activating domain - Microcystis aeruginosa (fragment) (SUB 144-528) 4310 313 gi508980 pheB Bacillus subtilis 60 28 312 4345 173 gi510108 mitochondrial long-chain enoyl-CoA hydratase/3-hydroxyacyl-CoA dehydrogenase alpha-subunit Rattus norvegicus 60 42 171 4382 280 62 gi47382 acyl-CoA-dehydrogenase Streptomyces purpurascens 60 48 219 4474 53 223 gi510108 mitochondrial long-chain enoyl-CoA hydratase/3-hydroxyacyl-CoA dehydrogenase alpha-subunit Rattus norvegicus 60 42 171 23 3523 2528 gi4264464 VipB protein Salmonella typhi 59 39 996 33 707 1483 pirS48604S486 hypothetical protein - Mycoplasma capricolum (SGC3) (fragment) 59 33 777 33 4651 5853 gi6721 F59B2.3 Caenorhabditis elegans 59 33 1203 37 2299 1370 gi142833 ORF2 Bacillus subtilis 59 37 930 38 16593 16402 gi912576 BiP Phaeodactylum tricornutum 59 40 192 52 2349 2050 gi536972 ORF o90a Escherichia coli 59 44 300 54 13402 12623 gi483940 transcription regulator Bacillus subtilis 59 37 780 57 3339 2281 gi508176 Gat-1-P-DH, NAD dependent Escherichia coli 59 40 1059 66 495 gi1303901 YghT Bacillus subtilis 59 34 492 67 6552 7460 gi912461 nikC Escherichia coli 59 37 909 70 5383 6366 gi1399822 PhoD precursor Rhizobium meliloti 59 46 984 78 1449 gi971345 unknown, similar to E.
coli cardiolipin synthase Bacillus subtilis 59 39 1449 sp|P45860|YWIE BACSU HYPOTHETICAL 58.2 KD PROTEIN IN NAR-ACDA INTERGENIC REGION. 82 14329 15534 gi490328 LORFF (unidentified) 59 44 1206 89 958 314 gi642801 unknown Saccharomyces cerevisiae 59 32 645 96 4940 5473 gi1333802 protein of unknown function Rhodobacter capsulatus 59 33 534 98 820 gi467421 similar to B. subtilis DnaB Bacillus subtilis 59 34 819 119 166 1557 gi143122 ORF B; putative Bacillus firmus 59 36 1392 120 6214 6756 gi15354 ORF 55.9 Bacteriophage T4 59 39 543 120 12476 13510 gi1086575 BetA Rhizobium meliloti 59 44 1035 123 195 gi984737 catalase Campylobacter jejuni 59 38 192 130 1 370 645 gi1256634 25.8% identity over 120 aa with the Synechococcus sp. MpeV protein; putative Bacillus subtilis 59 276 131 5278 5712 gi1510655 hypothetical protein (SP:P42297) Methanococcus jannaschii 59 39 435 164 509 gi1001342 hypothetical protein Synechocystis sp. 59 507 164 1529 2821 gi1205165 hypothetical protein (SP:P37764) Haemophilus influenzae 59 35 1293 164 19 19643 21376 gi1001381 hypothetical protein Synechocystis sp. 59 34 1734 173 3 3717 2707 gi1184121 auxin-induced protein Vigna radiata 59 50 1011 179 1688 1158 gi143036 unidentified gene product Bacillus subtilis 59 33 531 195 12 11503 10337 gi762778 NifS gene product Anabaena azollae 59 1167

TABLE 2-continued 4702 5670 gi1510240 hemin permease Methanococcus jannaschii 59 32 969 5719 6315 gi1511456 M. jannaschii predicted coding region MJ1437 Methanococcus jannaschii 59 34 597 209 102 461 gi1204666 hypothetical protein (GB:X73124 53) Haemophilus influenzae 59 42 360 214 1050 2234 gi551531 2-nitropropane dioxygenase Williopsis saturnus 59 36 1185 214 3293 4135 gi1303709 YrkJ Bacillus subtilis 59 32 843 217 2167 953 gi290489 dfp (CG Site No. 18430) Escherichia coli 59 44 1215 237 3078 3785 gi149382 HisA Lactococcus lactis 59 38 708 251 376 960 gi1303791 YgeJ Bacillus subtilis 59 34 585 286 812 gi146551 transmembrane protein (kdpD) Escherichia coli 59 31 810 316 3860 2742 gi4058794 yeiH Escherichia coli 59 32 1119 370 600 761 gi1303794 YgeM Bacillus subtilis 59 35 162 382 506 gi547513 orf3 Haemophilus influenzae 59 34 504 391 1273 926 gi152901 ORF3 Spirochaeta aurantia 59 37 348 406 1705 605 gi709992 hypothetical protein Bacillus subtilis 59 34 1101 426 3245 2688 gi1204610 iron(III) dicitrate transport ATP-binding protein FECE Haemophilus influenzae 59 36 558 429 1148 783 gi1064809 homologous to sp:HTRA ECOLI Bacillus subtilis 59 42 366 460 708 1301 gi4668824 pps1: B1496 C2 189 Mycobacterium leprae 59 37 594 461 2212 3135 gi1498295 homoserine kinase homolog Streptococcus pneumoniae 59 37 924 473 1607 285 gi147989 trigger factor Escherichia coli 59 40 1323 480 5862 6110 gi1205311 (3R)-hydroxymyristol acyl carrier protein dehydrase Haemophilus influenzae 59 40 249 521 14 1354 pirA25620A256 staphylocoagulase - Staphylococcus aureus (fragment) 59 32 1341 534 2994 4073 gi153746 mannitol-phosphate dehydrogenase Streptococcus mutans 59 36 1080 pirC44798|C44798 mannitol-phosphate dehydrogenase MtlD - Streptococcus mutans 535 954 gi1469939 group B oligopeptidase PepB Streptococcus agalactiae 59 33 954 551 3 2836 3186 gi1204511 bacterioferritin comigratory protein Haemophilus influenzae 59 45 351 573 449 940 gi386681 ORF YAL022 Saccharomyces cerevisiae 59
36 492 650 748 gi396400 similar to eukaryotic Na+/H+ exchangers Escherichia coli 59 3O 744 sp|P32703YJCE ECOLI HYPOTHETICAL 60.5 KD PROTEIN IN SOXR-ACS NTERGENIC REGION (O549). 664 285 gi1262748 LukF-PV like component Staphylococcus aureus 59 33 282 670 455 gi1122758 unknown Bacilius Subtilis 59 42 453 674 543 929 gi293033 integrase Bacteriophage phi-LC3) 59 46 387 758 176 gi1500472 M. jannaschii predicted coding region MJ1577 Methanococcus 59 37 174 iannaschi 771 2 1461 652 gi522150 bromoperoxidcase BPO-A1 Streptomyces aureofaciens 59 44 810 sp|P33912|BPA1 STRAU NON-HAEM BROMOPEROXIDASE BPO-A1 (EC 1.11.1.-) (BROMIDE PEROXIDASE) (BPO1). (SUB 2-275) 825 1097 gi397526 clumping factor Staphylococcus aureus 59 47 1095 1052 723 gi289262 comE ORF3 Bacilius Subtilis 59 36 372 1152 188 gi1276668 ORF238 gene product Porphyra purpurea 59 37 186 1198 247 gi142439 ATP-dependent nuclease Bacillus Subtilis 59 26 246 1441 235 gi1045942 glycyl-tRNA synthetase Mycoplasma genitalium 59 37 234 2103 gi4592504 triacylglycerol lipase Galactomyces geotrichum 59 33 186 2205 398 gi1303794 YGeM Bacillus Subtilis 59 38 396 2578 284 gi258003 insulin-like growth factor binding protein complex acid-labile ubunit 59 48 2O1 rats, liver, Peptide, 603 aa 2967 145 gi1212730 YghKBacilius Subtilis 59 44 204 3O12 gi773571 neurofilament protein NF70 Helix aspersal 59 31 246 3544 gi1055218 crotonase Clostridium acetobutylicum 59 42 399 3548 gi1055218 crotonase Clostridium acetobutylicum 59 42 399 358O 351 gi1055218 crotonase Clostridium acetobutylicum 59 42 348 3.720 363 gi1408494 homologous to penicillin acylase Bacilius Subtilis 59 36 360 4171 gi1055218 crotonase Clostridium acetobutylicum 59 42 294 4305 310 gi1524193 unknown Mycobacterium tuberculosis 59 39 309 18 622 gi146913 N-acetylglucosamine transport protein Escherichia coli 58 43 621 pirB29895 WQEC2N phosphotransferase system enzyme II (EC .7.1.69), N-acetylglucosamine-specific - Escherichia coli sp|PO9323|PTAA ECOLI PTS SYSTEM, N-ACETYL 
GLUCOSAMINE-SPECIFIC IIABC OMPONENT (EIIA) 5845 gi50502 collagen alpha chain precursor (AA-27 to 1127) 58 50 1176 Mus musculus 21 3234 3626 gi1054860 phosphoribosyl anthranilate isomerase Thermotoga maritina 58 32 393 23 1669 497 gi1276880 EpsC Streptococcus thermophilus 58 29 1173 23 8090 6879 pirA31133A311 diaminopimelate decarboxylase (EC 4.1.1.20) - Pseudomonas 58 37 1212 aeruginosa 38 22555 22884 gi973249 vestitone reductase Medicago Satival 58 37 330 44 2 4O6 gi289272 ferrichrome-binding protein Bacillus subtilis 58 33 405 45 1. 552. gi29464 embryonic myosin heavy chain (1085 AA) Homo Sapiens 58 33 552. irS1246OS12460 myosin beta heavy chain - human 55 538 317 gi158852 glucose regulated protein Echinococcus multilocularis 58 32 222 62 8068 7643 gi975353 kinase-associated protein B Bacillus subtilis 58 35 426 63 1553 1717 gi166926 Arabidopsis thaliana unidentified mRNA sequence, complete cds., 58 35 165 ene product Arabidopsis thaliana 67 11229 10441 gi1228083 NADH dehydrogenase subunit 2 Chorthippus parallelus 58 41 789 US 6,593,114 B1 147 148

TABLE 2-continued

96 8208 9167 Sl 709992 hypothetical protein Bacillus subtilis 58 42 960
107 364 663 Sl 806327 Escherichia coli hrpA gene for A protein similar to yeast PRP16 and PRP22 Escherichia coli 58 37 702
112 4519 5613 gi 55588 glucose-fructose oxidoreductase Zymomonas mobilis 58 38 1095 pirA42289A42289 glucose-fructose oxidoreductase (EC 1.1.--) precursor - Zymomonas mobilis
114 6503 5688 gi 377.843 unknown Bacillus subtilis 58 38 816
143 395 529 pirA45605A456 mature-parasite-infected erythrocyte surface antigen MESA Plasmodium falciparum 58 867
151 717 950 gi 370261 unknown Mycobacterium tuberculosis 58 234
154 4627 3239 gi 209277 pCTHom1 gene product Chlamydia trachomatis 58 1389
154 13541 12801 gi 46613 DNA ligase (EC 6.5.1.2) Escherichia coli 58 741
155 892 1515 gi YGiB Bacillus subtilis 58 378
174 529 Sl 904.198 hypothetical protein Bacillus subtilis 58 528
189 533 1769 Sl 46.7383 DNA binding protein (probable) Bacillus subtilis 58 237
201 2669 3307 gi 511453 endonuclease III Methanococcus jannaschii 58 639
208 2 238 gi 276729 phycobilisome linker polypeptide Porphyra purpurea 58 237
220 13058 11541 Sl 397526 clumping factor Staphylococcus aureus 58 1518
231 474 1319 gi MutS Bacillus subtilis 58 156
233 3497 2793 gi No definition line found Caenorhabditis elegans 58 705
243 9303 10082 Sl ORF f277 Escherichia coli 58 780
257 331 1143 gi ORF1 Staphylococcus aureus 58 813
302 460 801 Sl ORF X Bacillus subtilis 58 342
307 6127 5270 gi 303842 YofJ Bacillus subtilis 58 858
321 1914 2747 gi 23.9996 hypothetical protein Bacillus subtilis 58 834
342 2724 3497 gi 54838 ORF 6; putative Pseudomonas aeruginosa 58 774
348 663 gi 67.478 unknown Bacillus subtilis 58 663
401 384 605 gi 43407 para-aminobenzoic acid synthase, component I (pab) Bacillus subtilis 58 222
437 325 1554 gi 303866 YdgS Bacillus subtilis 58 35 1230
445 105 1442 Sl 581.583 protein A Staphylococcus aureus 58 32 1338
453 789 965 gi 00945.5 unknown Schizosaccharomyces pombe 58 34 177
453 2047 1346 Sl 537214 yiG gene product Escherichia coli 58 40 702
479 731 1444 gi 256621 26.7% of identity in 165 aa to a Thermophilic bacterium hypothetical protein 6: putative Bacillus subtilis 56 36 714
490 547 185 Sl rodD (gtaA) polypeptide (AA 1-673) Bacillus subtilis 58 36 363 pirS06048S06048 probable rodD protein - Bacillus subtilis sp|P13484|TAGE BACSU PROBABLE POLY(GLYCEROL PHOSPHATE) ALPHA-GLUCOSYLTRANSFERASE (EC 2.4.1.52) (TEICHOIC ACID BIOSYNTHESIS PROTEIN E).
517 1164 HYPOTHETICAL HELICASE MG018. 58 30 1164
517 4182 4544 gi 453422 orf268 gene product Mycoplasma hominis 58 29 363
546 2802 4019 gi 886052 restriction modification system S subunit Spiroplasma citri 58 37 1218 gi886052 restriction modification system S subunit Spiroplasma citri
562 179 gi 43831 nifS protein (AA 1-400) Klebsiella pneumoniae 58 34 177
600 1156 965 gi 1838.39 unknown Pseudomonas aeruginosa 58 48 192
604 1001 771 gi 001353 hypothetical protein Synechocystis sp. 58 41 231
619 504 gi 903748 integral membrane protein Homo sapiens 58 43 504
625 364 gi 208474 hypothetical protein Synechocystis sp. 58 43 363
635 755 18 gi 510995 transaldolase Methanococcus jannaschii 58 41 738
645 846 gi 677882 ileal sodium-dependent bile acid transporter Rattus norvegicus 58 33 846 gi677882 ileal sodium-dependent bile acid transporter Rattus norvegicus
645 906 1556 gi 23.9999 hypothetical protein Bacillus subtilis 58 41 651
665 532 293 gi 204262 hypothetical protein (GB:L10328 61) Haemophilus influenzae 58 39 240
674 327 19 gi 4 98817 ORF8; homologous to small subunit of phage terminases Bacillus subtilis 58 39 309
675 806 gi 4 2181 OsmO gene product Escherichia coli 58 28 507
745 310 gi 205432 coenzyme PQQ synthesis protein III (pqq III) Haemophilus influenzae 58 32 309
799 242 1174 gi 204669 collagenase Haemophilus influenzae 58 36 933
800 614 132 gi 71963 tRNA isopentenyl transferase Saccharomyces cerevisiae 58 37 483 sp|P07884|MOD5 YEAST TRNA ISOPENTENYLTRANSFERASE (EC 2.5.1.8) (ISOPENTENYLDIPHOSPHATE:TRNA ISOPENTENYLTRANSFERASE) (IPP TRANSFERASE) (IPPT).
854 605 102 466778 lysine specific permease Escherichia coli 58 44 504
885 242 Sl1861199 protoporphyrin IX Mg-chelatase subunit precursor Hordeum vulgare 58 33 240
891 527 gi 293660 AbsA2 Streptomyces coelicolor 58 31 525
942 467 gi 4 05567 traH Plasmid pSK41 58 30 465
1002 521 90 Sl 577649 preLUKM Staphylococcus aureus 58 34 432
1438 261 Sl 581558 isoleucyl tRNA synthetase Staphylococcus aureus 58 30 261 sp|P41368|SYIP STAAU ISOLEUCYL-TRNA SYNTHETASE, MUPIROCIN RESISTANT (EC 6.1.1.5) (ISOLEUCINE-TRNA LIGASE) (ILERS) (MUPIROCIN RESISTANCE PROTEIN).
1442 463 Sl1971394 similar to Acc. No. D26185 Escherichia coli 58 34 462
1873 241 Sl 13399.51 small subunit of NADH-dependent glutamate synthase Plectonema boryanum 58 38 240
1876 158 Sl 529216 No definition line found Caenorhabditis elegans 58 33 156 sp|P46503|YLX7 CAEEL HYPOTHETICAL 7.3 KD PROTEIN F23F12.7 IN CHROMOSOME III.

TABLE 2-continued
1989 gi1405458 YneR Bacillus subtilis 58 29 294
2109 gi1001801 hypothetical protein Synechocystis sp. 58 31 399
2473 gi510140 oligoendopeptidase F Lactococcus lactis 58 38 144
2523 gi644873 catabolic dehydroquinate dehydratase Acinetobacter calcoaceticus 58 37 225
3041 211 gi1205367 oligopeptide transport ATP-binding protein Haemophilus influenzae 58 39 210
3094 263 gi1185288 isochorismate synthase Bacillus subtilis 58 38 261
3706 383 gi4566144 mevalonate kinase Arabidopsis thaliana 58 48 381
3854 402 gi808869 human gep372 Homo sapiens 58 32 402
4082 224 gi508551 ribulose-1,5 bisphosphate carboxylase large subunit - methyl transferase Pisum sativum 58 37 174
4278 gi180189 cerebellar-degeneration-related antigen (CDR34) Homo sapiens 58 37 204 gi182737 cerebellar degeneration-associated protein Homo sapiens pirA29770A29770 cerebellar degeneration-related protein - human
19 7363 6908 gi1001516 hypothetical protein Synechocystis sp. 57 31 456
23 8872 8081 gi606066 ORF f256 Escherichia coli 57 29 792
31 2402 gi153146 ORF3 Streptomyces coelicolor 57 32 2400
38 14 10796 9981 gi144859 ORF B Clostridium perfringens 57 31 816
46 14 12063 13046 gi1001319 hypothetical protein Synechocystis sp. 57 25 984
51 1187 963 pir|B33856|B338 hypothetical 80K protein - Bacillus sphaericus 57 38 225
54 1 453 gi684950 staphylococcal accessory regulator A Staphylococcus aureus 57 31 453
75 3 239 gi1000470 C27B7.7 Caenorhabditis elegans 57 42 237
92 3061 2267 gi143607 sporulation protein Bacillus subtilis 57 35 795
96 4006 4773 gi144297 acetyl esterase (XynC) Caldocellum saccharolyticum 57 34 768 pirB37202B37202 acetylesterase (EC 3.1.1.6) (XynC) - Caldocellum saccharolyticum
107 1480 2076 gi4609554 TagE Vibrio cholerae 57 42
109 5340 5933 gi1438846 Unknown Bacillus subtilis 57 41
112 6679 7701 gi1486250 unknown Bacillus subtilis 57 33
114 4108 1832 gi871456 putative alpha subunit of formate dehydrogenase Methanobacterium thermoautotrophicum 57 37
126 430 1053 gi288301 ORF2 gene product Bacillus megaterium 57 37
131 6277 6017 gi1511160 M. jannaschii predicted coding region MJ1163 Methanococcus jannaschii 57 38
133 2201 gi1303912 YahW Bacillus subtilis 57 40 468
133 2784 gi1221884 (urea?) amidolyase Haemophilus influenzae 57 37 600
147 1694 gi467469 unknown Bacillus subtilis 57 33 47
160 1060 gi558604 chitin synthase 2 Neurospora crassa 57 28 234
163 4764 gi145580 rarD gene product Escherichia coli 57 38 924
168 4336 gi39782 33 kDa lipoprotein Bacillus subtilis 57 32 990
170 3297 gi603404 Yer164p Saccharomyces cerevisiae 57 37 159
221 6809 gi1136221 carboxypeptidase Sulfolobus solfataricus 57 32 1218
228 1348 gi288969 fibronectin binding protein Streptococcus dysgalactiae 57 32 444 pirS33850S33850 fibronectin-binding protein - Streptococcus dysgalactiae
263 3686 gi1185.002 dihydrodipicolinate reductase Pseudomonas syringae pv. tabaci 57 42 726
276 255 gi396380 No definition line found Escherichia coli 57 40 240
283 335 gi773349 BirA protein Bacillus subtilis 57 32 990
297 236 gi1334820 reading frame V Cauliflower mosaic virus 57 46 234
342 1993 gi1204431 hypothetical protein (SP:P33644) Haemophilus influenzae 57 35 813
375 3340 gi385.177 cell division protein Bacillus subtilis 57 26 402
433 3286 gi1524117 alpha-acetolactate decarboxylase Lactococcus lactis 57 40 726
470 903 gi804819 protein serine/threonine kinase Toxoplasma gondii 57 30 243
487 1391 gi507323 ORF1 Bacillus stearothermophilus 57 28 333
498 274 gi1334549 NADH-ubiquinone oxidoreductase subunit 4L Podospora anserina 57 34 579
503 173 gi1502283 organic cation transporter OCT2 Rattus norvegicus 57 30 171
505 1284 gi4668844 B1496 C2 194 Mycobacterium leprae 57 40 336
519 1182 gi1303707 YrkH Bacillus subtilis 57 34 1368
522 1945 gi1064809 homologous to sp:HTRA ECOLI Bacillus subtilis 57 36 1290
538 909 gi153179 phosphinothricin N-acetyltransferase Streptomyces coelicolor 57 40 507 pirJH0246JH0246 phosphinothricin N-acetyltransferase (EC 2.3.1.-) Streptomyces coelicolor
547 486 gi467340 unknown Bacillus subtilis 57 50 483
599 532 sp|P20692|TYRA PREPHENATE DEHYDROGENASE (EC 1.3.1.12) (PDH). 57 41 531
620 572 387 gi1107894 unknown Schizosaccharomyces pombe 57 38 186
622 1130 660 gi173028 thioredoxin II Saccharomyces cerevisiae 57 39 471
625 362 1114 gi1262366 hypothetical protein Mycobacterium leprae 57 34 753
680 1 204 gi143544 RNA polymerase sigma-30 factor Bacillus subtilis 57 30 204 pirA28625A28625 transcription initiation factor sigma H - Bacillus subtilis
690 629 gi4665204 pocR Salmonella typhimurium 57 29 627
696 433 gi4139724 ipa-48r gene product Bacillus subtilis 57 33 432
704 638 gi1499931 M. jannaschii predicted coding region MJ1083 Methanococcus jannaschii 57 36 603
732 1621 926 gi1418999 orf4 Lactobacillus sake 57 37 696
746 227 gi392973 Rab3 Aplysia californica 57 42 225
757 20 466 gi43979 L. curvatus small cryptic plasmid gene for rep protein Lactobacillus curvatus 57 45 447
862 295 gi1303827 YafI Bacillus subtilis 57 21 294
1049 455 gi15101.08 ORF-1 Agrobacterium tumefaciens 57 35 453

TABLE 2-continued
1117 695 gi896286 NH2 terminus uncertain Leishmania tarentolae 57 28 693
1136 322 gi1303853 YdgF Bacillus subtilis 57 38 321
1144 611 189 gi310083 voltage-activated calcium channel alpha-1 subunit Rattus norvegicus 57 46 423
1172 738 gi1511146 M. jannaschii predicted coding region MJ1143 Methanococcus jannaschii 57 28 735
1500 558 370 gi142780 putative membrane protein; putative Bacillus subtilis 57 35 189
1676 399 139 gi313777 uracil permease Escherichia coli 57 31 261
2481 400 gi1237015 ORF4 Bacillus subtilis 57 23 399
3099 230 gi1204540 isochorismate synthase Haemophilus influenzae 57 39 228
3122 181 gi882472 ORF o464 Escherichia coli 57 40 180
3560 361 gi153490 tetracenomycin C resistance and export protein Streptomyces glaucescens 57 37 360
3850 434 12 gi155588 glucose-fructose oxidoreductase Zymomonas mobilis 57 40 423 pirA42289A42289 glucose-fructose oxidoreductase (EC 1.1.--) precursor - Zymomonas mobilis
3931 354 gi4139534 ipa-29d gene product Bacillus subtilis 57 36 351
3993 384 gi151259 HMG-CoA reductase (EC 1.1.1.88) Pseudomonas mevalonii 57 39 384 pirA44756A44756 hydroxymethylglutaryl-CoA reductase (EC 1.1.1.88) Pseudomonas sp.
4065 398 pirJV0037|RDEC nitrate reductase (EC 1.7.99.4) alpha chain - Escherichia coli 57 31 396
4100 300 gi1086633 T06C10.5 gene product Caenorhabditis elegans 57 47 297
4163 287 gi21512 patatin Solanum tuberosum 57 50 285
4267 335 39 gi1000365 SpoIIIAG Bacillus subtilis 57 38 297
4358 302 gi298.032 EF Streptococcus suis 57 32 300
4389 108 290 gi4058944 1-phosphofructokinase Escherichia coli 57 37 183
4399 232 gi1483.603 Pristinamycin I synthase I Streptomyces pristinaespiralis 57 35 231
4481 288 gi4058794 yeiH Escherichia coli 57 44 285
4486 258 gi515938 glutamate synthase (ferredoxin) Synechocystis sp. 57 42 255 pirS46957S46957 glutamate synthase (ferredoxin) (EC 1.4.7.1) - Synechocystis sp.
4510 242 gi1205301 leukotoxin secretion ATP-binding protein Haemophilus influenzae 57 38 240
4617 256 44 gi1511222 restriction modification enzyme, subunit M1 Methanococcus jannaschii 57 35 213
11524 10847 gi149204 histidine utilization repressor G. Klebsiella aerogenes 56 31 678 pirA36730A36730 hutG protein - Klebsiella pneumoniae (fragment) sp|P19452|HUTG KLEAE FORMIMINOGLUTAMASE (EC 3.5.3.8) (FORMIMINOGLUTAMATE HYDROLASE) (HISTIDINE UTILIZATION PROTEIN G) (FRAGMENT).
22 4248 5177 gi1322222 RACH1 Homo sapiens 56 33 930
38 21179 22264 gi1480705 lipoate-protein ligase Mycoplasma capricolum 56 34 1086
44 1861 2421 gi490320 Y gene product unidentified 56 31 561
44 10103 10606 gi1205099 hypothetical protein (GB:L19201 1) Haemophilus influenzae 56 39 504
50 4820 5161 gi209931 fiber protein Human adenovirus type 5 56 48 342
53 2076 2972 gi623476 transcriptional activator Providencia stuartii 56 30 897 sp|P43463|AARP PROST TRANSCRIPTIONAL ACTIVATOR AARP.
67 5656 6594 gi 4 666.13 nikB Escherichia coli 56 32 939
89 1810 1256 gi 4 82922 protein with homology to pail repressor of B. subtilis Lactobacillus delbrueckii 56 39 555
96 203 913 gi145594 cAMP receptor protein (crp) Escherichia coli 56 35 711
09 17846 17442 gi1204367 hypothetical protein (GB:U14003 278) Haemophilus influenzae 56 27 405
12 5611 6678 gi155588 glucose-fructose oxidoreductase Zymomonas mobilis 56 40 1068 pirA42289A42289 glucose-fructose oxidoreductase (EC 1.1.--) precursor - Zymomonas mobilis
31 5100 3796 gi619724 MgtE Bacillus firmus 56 30 1305
38 65 232 gi413948 ipa-24d gene product Bacillus subtilis 56 31 168
38 823 1521 gi580868 ipa-22r gene product Bacillus subtilis 56 31 699
46 154 gi1046009 M. genitalium predicted coding region MG309 Mycoplasma genitalium 56 37 294
49 495 gi9453804 terminase small subunit Bacteriophage LL-H 56 35 573
63 223 gi143947 glutamine synthetase Bacteroides fragilis 56 30 222
66 6153 gi405792 ORF154 Pseudomonas putida 56 26 297
87 3 393 gi311237 H(+)-transporting ATP synthase Zea mays 56 30 363
90 373 gi1109686 ProX Bacillus subtilis 56 35 372
91 994 8348 gi581070 acyl coenzyme A synthetase Escherichia coli 56 35 1596
95 64 gi1510242 collagenase Methanococcus jannaschii 56 34 645
230 207 1821 gi40363 heat shock protein Clostridium acetobutylicum 56 39 252
238 338 3775 gi1477533 sara Staphylococcus aureus 56 31 393
270 1712 gi765073 autolysin Staphylococcus aureus 56 41 900
290 163 43 gi547513 orf3 Haemophilus influenzae 56 34 1590
297 114 1373 gi1511556 M. jannaschii predicted coding region MJ1561 Methanococcus jannaschii 56 40 234
321 1799 651 gi1001801 hypothetical protein Synechocystis sp. 56 31 1149
359 641 gi46336 nolI gene product Rhizobium meliloti 56 26 639
371 360 1823 gi145304 L-ribulokinase Escherichia coli 56 39 1464
391 1762 2409 gi1001634 hypothetical protein Synechocystis sp. 56 34 648
402 192 gi1438904 5-HT4L receptor Homo sapiens 56 48 189
416 2109 1738 gi1408486 HS74A gene product Bacillus subtilis 56 31 372
424 1756 2334 gi142471 acetolactate decarboxylase Bacillus subtilis 56 32 579

TABLE 2-continued
457 1017 127 gi 205.194 formamidopyrimidine-DNA glycosylase Haemophilus influenzae 56 36 891
458 1812 1201 gi 54.66 terminase Bacteriophage SPP1 56 37 612
504 1283 414 gi 142681 Lpp38 Pasteurella haemolytica 56 38 870
511 1284 Sl 217049 brnQ protein Salmonella typhimurium 56 37 1284
604 1099 1701 Sl 467109 rim; 30S Ribosomal protein S18 alanine acetyltransferase; 229 C1 170 Mycobacterium leprae 56 43 603
660 3547 3774 gi 229106 ZK930.1 Caenorhabditis elegans 56 30 228
707 35 gi 53929 NADPH-sulfite reductase flavoprotein component Salmonella typhimurium 56 38 366
709 1095 805 gi 510801 hydrogenase accessory protein Methanococcus jannaschii 56 38 291
718 495 gi 413948 ipa-24d gene product Bacillus subtilis 56 35 495
744 87 Sl 92.8836 repressor protein Lactococcus lactis phage BK5-T 56 35 591
790 399 22 gi 511513 ABC transporter, probable ATP-binding subunit Methanococcus jannaschii 56 33 378
795 gi 205382 cell division protein Haemophilus influenzae 56 34 405
813 930 gi 222161 permease Haemophilus influenzae 56 28 912
855 515 gi 256621 26.7% of identity in 165 aa to a Thermophilic bacterium hypothetical protein 6: putative Bacillus subtilis 56 33 513
968 466 Sl 547513 orf3 Haemophilus influenzae 56 37 465
973 415 Sl 886022 MexR Pseudomonas aeruginosa 56 31 318
1203 gi 84251 HMG-1 Homo sapiens 56 34 219
1976 22 Sl19806 lysine-rich aspartic acid-rich protein Plasmodium chabaudi 56 33 216 pirS22183S22183 lysine/aspartic acid-rich protein - Plasmodium chabaudi
2161 gi 237015 ORF4 Bacillus subtilis 56 27 399
2958 gi 4 66685 No definition line found Escherichia coli 56 26 180
2979 gi 204354 spore germination and vegetative growth protein Haemophilus influenzae 56 40 210
2994 326 Sl1836646 phosphoribosylformimino-praic ketoisomerase Rhodobacter sphaeroides 56 29
3026 179 gi 43.306 penicillin V amidase Bacillus sphaericus 56 30 150
3189 146 gi 166604 Similar to aldehyde dehydrogenase Caenorhabditis elegans 56 37 144
3770 63 401 gi 129145 acetyl-CoA C-acyltransferase Mangifera indica 56 43 339
4054 361 gi 205355 Na+/H+ antiporter Haemophilus influenzae 56 31 360
4145 Sl 726095 long-chain acyl-CoA dehydrogenase Mus musculus 56 36 324
4200 254 gi 55588 glucose-fructose oxidoreductase Zymomonas mobilis 56 40 252 pirA42289A42289 glucose-fructose oxidoreductase (EC 1.1.--) precursor - Zymomonas mobilis
4273 355 35 Sl 308861 GTG start codon Lactococcus lactis 56 33 321
3 3436 2777 5341 Putative orf YCLX8c, len: 192 Saccharomyces cerevisiae 55 25 660 pirS53591 S53591 hypothetical protein - yeast Saccharomyces cerevisiae
11 1. 2 8505 7633 Sl 216773 haloacetate dehalogenase H-1 Moraxella sp. 55 32 873
12 4534 3935 gi 467337 unknown Bacillus subtilis 55 26 600
19 5404 5844 gi 001719 hypothetical protein Synechocystis sp. 55 25 441
23 12339 10591 Sl 474190 iucA gene product Escherichia coli 55 30 1749
32 5368 6888 gi unknown Mycobacterium tuberculosis 55 37 1521
34 1808 1047 gi YgiQ Bacillus subtilis 55 39 762
34 3412 2864 gi YGjK Bacillus subtilis 55 33 549
36 647 3 Sl 606045 ORF o118 Escherichia coli 55 27 645
36 5243 4266 gi 001341 hypothetical protein Synechocystis sp. 55 31 978
47 3054 3821 gi 001819 hypothetical protein Synechocystis sp. 55 21 768
49 1127 189 403373 glycerophosphoryl diester phosphodiesterase Bacillus subtilis 55 36 939 Sl pirS37251S37251 glycerophosphoryl diester phosphodiesterase - Bacillus subtilis
67 11 8966 9565 gi 53053 norA1199 protein Staphylococcus aureus 55 23 600
75 881 273 Sl141698 L-histidinol:NAD+ oxidoreductase (EC 1.1.1.23) (aa 1-434) Escherichia coli 55 33 393
82 14194 13001 gi 136221 carboxypeptidase Sulfolobus solfataricus 55 35 1194
87 3517 4917 gi 064812 function unknown Bacillus subtilis 55 26 1401
88 2 1172 636 Sl1882.463 protein-N(pi)-phosphohistidine-sugar phosphotransferase Escherichia coli 55 35 465
92 127 516 gi 377832 unknown Bacillus subtilis 55 36 390
100 836 2035 gi 370274 zeaxanthin epoxidase Nicotiana plumbaginifolia 55 36 1200
100 4658 4179 Sl 396660 unknown open reading frame Buchnera aphidicola 55 29 480
108 2986 706 gi 499866 M. jannaschii predicted coding region MJ1024 Methanococcus jannaschii 55 31 1281
114 1834 052 gi 511367 formate dehydrogenase, alpha subunit Methanococcus jannaschii 55 29 783
144 1476 147 gi 100787 unknown Saccharomyces cerevisiae 55 35 330
165 5508 4804 gi 045884 M. genitalium predicted coding region MG199 Mycoplasma genitalium 55 27 705
189 2205 2576 gi 42.569 ATP synthase a subunit Bacillus firmus 55 35 372
191 6857 4578 Sl 559.411 B0272.3 Caenorhabditis elegans 55 39 2280
194 364 636 gi 145768 K7 kinesin-like protein Dictyostelium discoideum 55 34 273
209 1335 676 Sl 473357 thi4 gene product Schizosaccharomyces pombe 55 35 342
211 1145 597 gi ORFX6 Bacillus subtilis 55 37 549
213 644 372 Sl 633692 TrsA Yersinia enterocolitica 55 28 729
214 4144 5481 gi 001793 hypothetical protein Synechocystis sp. 55 30 1338
221 9197 6921 Sl 466520 pocR Salmonella typhimurium 55 32 2277
233 4817 3726 gi 237063 unknown Mycobacterium tuberculosis 55 38 1092
236 1375 2340 gi 146199 putative Bacillus subtilis 55 32 966

TABLE 2-continued
243 380 1885 gi 4 59907 mercuric reductase Plasmid pI258 55 29 1506
258 394 gi 4 55006 orf6 Rhodococcus fascians 55 36 393
281 126 938 gi 408493 homologous to SwissProt:YIDA ECOLI hypothetical protein Bacillus subtilis 55 35 813
316 1323 2102 gi 486.447 LuxA homologue Rhizobium sp. 55 30 780
326 2744 2520 gi 296824 proline iminopeptidase Lactobacillus helveticus 55 36 225
351 1429 536 gi 204820 hydrogen peroxide-inducible activator Haemophilus influenzae 55 28 894
353 2197 2412 gi 272475 chitin synthase Emericella nidulans 55 50 216
380 14 379 gi 42554 ATP synthase i subunit Bacillus megaterium 55 37 366
383 232 Sl 289272 ferrichrome-binding protein Bacillus subtilis 55 36 231
386 938 gi 510251 DNA helicase, putative Methanococcus jannaschii 55 30 936
410 1208 1891 gi 205144 multidrug resistance protein Haemophilus influenzae 55 27 684
483 411 833 gi 413934 ipa-10r gene product Bacillus subtilis 55 26 423
529 1433 1089 Sl 606150 ORF f309 Escherichia coli 55 33 345
555 585 82 gi 43407 para-aminobenzoic acid synthase, component I (pab) Bacillus subtilis 55 28 504
565 202 gi 223961 CDP-tyvelose epimerase Yersinia pseudotuberculosis 55 41 201
582 452 153 gi 256643 20.2% identity with NADH dehydrogenase of the Leishmania major mitochondrion; putative Bacillus subtilis 55 36 300
645 5 2057 1854 1210824 fusion protein F Bovine respiratory syncytial virus 55 25 204 Sl pirJQ1481VGNZBA fusion glycoprotein precursor - bovine respiratory syncytial virus (strain A51908)
672 957 2216 Sl 1511333 M. jannaschii predicted coding region MJ1322 Methanococcus jannaschii 55 36 1260
730 479 Sl 537007 ORF f379 Escherichia coli 55 30 477
737 945 31 Sl 536963 CG Site No. 18166 Escherichia coli 55 30 915
742 228 572 Sl1304160 product unknown Bacillus subtilis 55 38 345
817 903 595 gi 136289 histidine kinase A Dictyostelium discoideum 55 29 309
819 355 128 Sl 558073 polymorphic antigen Plasmodium falciparum 55 22 228
832 724 296 gi 40367 ORFC Clostridium acetobutylicum 55 32 429
840 386 gi 205875 pseudouridylate synthase I Haemophilus influenzae 55 39 384
1021 23 529 gi 8563 beta-lactamase Yersinia enterocolitica 55 38 507
1026 60 335 gi 7804 OppC (AA 1-301) Salmonella typhimurium 55 26 276
1525 282 gi 477533 sara Staphylococcus aureus 55 29 282
1814 224 985 gi 046078 M. genitalium predicted coding region MG369 Mycoplasma genitalium 55 38 762
3254 254 81 Sl 413968 ipa-44d gene product Bacillus subtilis 55 30 174
3695 345 Sl 216773 haloacetate dehalogenase H-1 Moraxella sp. 55 32 342
3721 312 gi 42029 ORF1 gene product Escherichia coli 55 31 312
3799 272 gi 42029 ORF1 gene product Escherichia coli 55 38 270
3889 423 gi 129145 acetyl-CoA C-acyltransferase Mangifera indica 55 45 402
3916 385 Sl 52975.4 speC Streptococcus pyogenes 55 38 384
3945 198 Sl 476252 phase 1 flagellin Salmonella enterica 55 36 195
4074 gi 42029 ORF1 gene product Escherichia coli 55 38 243
4184 343 gi 524267 unknown Mycobacterium tuberculosis 55 28 342
4284 208 gi 100774 ferredoxin-dependent glutamate synthase Synechocystis sp. 55 36 195
4457 378 112 gi 80189 cerebellar-degeneration-related antigen (CDR34) Homo sapiens 55 38 267 gi182737 cerebellar degeneration-associated protein Homo sapiens pirA29770A29770 cerebellar degeneration-related protein - human
4514 244 216773 haloacetate dehalogenase H-1 Moraxella sp. 55 32 243
4599 217 gi 129145 acetyl-CoA C-acyltransferase Mangifera indica 55 42 216
4606 210 4 Sl 386120 myosin alpha heavy chain (S2 subfragment) rabbits, masseter, Peptide Partial, 234 aa 55 27 207
4932 4516 Sl S36069 ORF YBL047c Saccharomyces cerevisiae 54 27 417
12 6165 5164 gi homoserine acetyltransferase Haemophilus influenzae 54 30 1002
23 15326 13566 Sl 474.192 iucC gene product Escherichia coli 54 31 176
35 2 979 gi 4 8054 small subunit of soluble hydrogenase (AA 1-384) Synechococcus sp. 54 36 pirS06919HQYCSS soluble hydrogenase (EC 1.12.--) small chain - Synechococcus sp. (PCC 6716)
37 11 8667 7897 Sl ORF f277 Escherichia coli 54 38
37 12 8165 8332 gi 160967 palmitoyl-protein thioesterase Homo sapiens 54 37
46 15 13025 13804 gi 4 38.473 protein is hydrophobic, with homology to E. coli ProW; putative Bacillus subtilis 54 28
56 203 736 gi 256139 Ybby Bacillus subtilis 54 34 534
57 10179 9241 gi 151248 inosine-uridine preferring nucleoside hydrolase Crithidia fasciculata 54 32 939
66 516 1133 gi 335781 Cap Drosophila melanogaster 54 29 618
70 8116 8646 gi 3998.23 PhoE Rhizobium meliloti 54 31 53
70 11801 11046 sp|P02983|TCR S TETRACYCLINE RESISTANCE PROTEIN. 54 29 756
87 4915 5706 gi 064811 function unknown Bacillus subtilis 54 33 792
92 2289 1573 gi 205366 oligopeptide transport ATP-binding protein Haemophilus influenzae 54 33 717
103 1556 516 Sl 710495 protein kinase Bacillus brevis 54 33 104
105 2095 605 gi 43727 putative Bacillus subtilis 54 30 149
112 2337 2732 gi 53724 MalC Streptococcus pneumoniae 54 41 396
127 1720 2493 gi 44.297 acetyl esterase (XynC) Caldocellum saccharolyticum 54 34 774 pirB37202B37202 acetylesterase (EC 3.1.1.6) (XynC) - Caldocellum saccharolyticum
138 1600 3306 Sl 42473 pyruvate oxidase Escherichia coli 54 36 1707
152 525 1172 gi 377834 unknown Bacillus subtilis 54 23 648
161 4831 5469 Sl1903305 ORF73 Bacillus subtilis 54 28 639

TABLE 2-continued
161 13 6694 7251 gi1511039 phosphate transport system regulatory protein Methanococcus jannaschii 32 558
164 3263 4543 gi1204976 prolyl-tRNA synthetase Haemophilus influenzae 54 34 1281
164 21602 22243 gi143582 spoIIIEA protein Bacillus subtilis 54 32 642
171 4250 2817 gi436965 malA gene products Bacillus stearothermophilus 54 37 1434 pirS43914S43914 hypothetical protein 1 - Bacillus stearothermophilus
206 19208 19720 gi1240016 R09E10.3 Caenorhabditis elegans 54 38 513
218 1090 1905 gi467378 unknown Bacillus subtilis 54 26 816
220 663 gi1353761 myosin II heavy chain Naegleria fowleri 54 22 660
220 12655 13059 pirS00485S004 gene 11-1 protein precursor - Plasmodium falciparum (fragments) 54 35 405
221 2030 3709 gi1303813 YgeW Bacillus subtilis 54 34 1680
272 4219 3383 gi62964 arylamine N-acetyltransferase (AA 1-290) Gallus gallus 54 33 837 pirS06652XYCHY3 arylamine N-acetyltransferase (EC 2.3.1.5) (clone NAT-3) - chicken
316 4141 4701 gi682769 mccE gene product Escherichia coli 54 31 561
316 6994 8742 gi413951 ipa-27d gene product Bacillus subtilis 54 28 1749
338 2214 1051 gi490328 LORF F unidentified 54 28 1164
341 3201 3614 gi171959 myosin-like protein Saccharomyces cerevisiae 54 25 414
346 912 gi396400 similar to eukaryotic Na+/H+ exchangers Escherichia coli 54 34 909 sp|P32703|YJCE ECOLI HYPOTHETICAL 60.5 KD PROTEIN IN SOXR-ACS INTERGENIC REGION (O549).
348 623 1351 gi537109 ORF f343a Escherichia coli 54 34 729
378 1007 1942 sp|P02983|TCR S TETRACYCLINE RESISTANCE PROTEIN. 54 31 936
408 4351 5301 gi474190 iucA gene product Escherichia coli 54 29 951
444 7934 8854 gi216267 ORF2 Bacillus megaterium 54 32 921
463 2229 1741 gi304160 product unknown Bacillus subtilis 54 50 489
502 1133 570 gi1205015 hypothetical protein (SP:P10120) Haemophilus influenzae 54 38 564
505 5357 4452 gi1500558 2-hydroxyhepta-2,4-diene-1,7-dioate isomerase Methanococcus jannaschii 54 41 906
550 1 1522 gi401.00 rodC (tag3) polypeptide (AA 1-746) Bacillus subtilis 54 35 1215 pirS06049S06049 rodC protein - Bacillus subtilis sp|P134585TAGF BACSU TEICHOIC ACID BIOSYNTHESIS PROTEIN E.
551 5 3305 4279 gi950197 unknown Corynebacterium glutamicum 54 34 975
558 958 560 gi485090 No definition line found Caenorhabditis elegans 54 32 399
580 91 936 gi331906 fused envelope glycoprotein precursor Friend spleen focus-forming virus 54 45 846
603 554 757 gi1323423 ORF YGR234w Saccharomyces cerevisiae 54 36 204
617 25 249 gi219959 ornithine transcarbamylase Homo sapiens 54 40 225
622 1097 1480 gi1303873 YdgZ Bacillus subtilis 54 25 384
623 404 gi1063250 low homology to P20 protein of Bacillus licheniformis and bleomycin acetyltransferase of Streptomyces verticillus Bacillus subtilis 54 45 402
689 1011 475 gi5524.46 NADH dehydrogenase subunit 4 Apis mellifera ligustica 54 537 pirS52968 S52968 NADH dehydrogenase chain 4 - honeybee mitochondrion (SGC4)
725 686 1441 gi987096 sensory protein kinase Streptomyces hygroscopicus 26 756
956 249 pirS30782S307 integrin homolog - yeast Saccharomyces cerevisiae 24 249
978 859 581 gi1301994 ORF YNL091w Saccharomyces cerevisiae 33 279
1314 281 gi1001108 hypothetical protein Synechocystis sp. 33 279
2450 1 228 gi1045057 ch-TOG Homo sapiens 32 228
2934 387 gi580870 ipa-37d qoxA gene product Bacillus subtilis 36 387
2970 251 sp|P37348|YECE HYPOTHETICAL PROTEIN IN ASPS 5' REGION (FRAGMENT). 42 249
3002 1 309 gi44027 Tma protein Lactococcus lactis 33 309
3561 464 gi151259 HMG-CoA reductase (EC 1.1.1.88) Pseudomonas mevalonii 35 456 pirA44756A44756 hydroxymethylglutaryl-CoA reductase (EC 1.1.1.88) Pseudomonas sp.
3572 72 gi4506884 hsdM gene of EcoprrI gene product Escherichia coli 54 36 330 pirS38437S38437 hsdM protein - Escherichia coli pirS09629 S09629 hypothetical protein A - Escherichia coli (SUB 40-520)
3829 400 gi132224.5 mevalonate pyrophosphate decarboxylase Rattus norvegicus 29 399
3909 273 gi29865 CENP-E Homo sapiens 30 273
3921 209 pirS24325|S243 glucan 1,4-beta-glucosidase (EC 3.2.1.74) - Pseudomonas fluorescens subsp. cellulosa 34 207
4438 285 gi1196657 unknown protein Mycoplasma pneumoniae 30 282
4459 272 gi1046081 hypothetical protein (GB:D26185 10) Mycoplasma genitalium 38 270
4564 221 gi216267 ORF2 Bacillus megaterium 38 219
23 12 10685 8832 gi474.192 iucC gene product Escherichia coli 35 1854
23 14 13579 12317 gi42029 ORF1 gene product Escherichia coli 32 1263
24 3940 3440 gi369947 c2 gene product Bacteriophage B1 36 501
26 3818 4618 gi1486247 unknown Bacillus subtilis 37 801
38 2856 3998 gi4058804 yeiI Escherichia coli 40 1143
38 7806 6232 gi1399954 thyroid sodium/iodide symporter NIS Rattus norvegicus 29 1575
56 12100 11876 pirA54592A545 110k actin filament-associated protein - chicken 32 225
57 4583 4119 pirA00341|DEZP alcohol dehydrogenase (EC 1.1.1.1) - fission yeast (Schizosaccharomyces pombe) 39 465

TABLE 2-continued 57 12 8932 7349 gi1480429 putative transcriptional regulator Bacillus stearothermophilus 53 30 1584 67 12 9496 10218 gi1511555 quinolone resistance norA protein protein Methanococcus 53 31 723 jannaschii 69 2382 1639 gi1087017 arabinogalactan-protein, AGP (Nicotiana alata, cell-suspension 53 744 culture filtrate, Peptide, 461 aa) 79 1031 gi1523802 glucanase Anabaena variabilis 53 32 1029 80 338 gi4524284 ATPase 3 Plasmodium falciparum 53 36 336 88 1910 2524 gi537034 ORF o488 Escherichia coli 53 25 615 88 2467 3282 gi537034 ORF o488 Escherichia coli 53 29 816 92 5505 5140 gi399598 amphotropic murine retrovirus receptor Rattus norvegicus 53 33 366 94 3239 2061 gi173038 tropomyosin (TPM1) Saccharomyces cerevisiae 53 25 1179 99 4207 5433 sp|P28246 BCR E BICYCLOMYCIN RESISTANCE PROTEIN (SULFONAMIDE 53 30 1227 RESISTANCE PROTEIN). 120 1639 2262 gi576655 ORF1 Vibrio anguillarum 53 35 624 120 7257 8897 gi1524397 glycine betaine transporter OpuD Bacillus subtilis 53 33 1641 127 5685 4477 gi1256630 putative Bacillus subtilis 53 32 1209 147 255 557 gi581648 epiB gene product Staphylococcus epidermidis 53 34 303 158 4256 3807 gi151004 mucoidy regulatory protein AlgR Pseudomonas aeruginosa 53 32 450 pir|A32802|A32802 regulatory protein algR - Pseudomonas aeruginosa sp|P26275|ALGR_PSEAE POSITIVE ALGINATE BIOSYNTHESIS REGULATORY PROTEIN. 171 5421 5125 gi1510669 hypothetical protein (GP:D64044.
18) Methanococcus jannaschii 53 34 297 191 11483 9879 gi2.98085 acetoacetate decarboxylase Clostridium acetobutylicum 53 31 1605 pir|B49346|B49346 butyrate--acetoacetate CoA-transferase (EC 2.8.3.9) small chain - Clostridium acetobutylicum sp|P33752|CTFA_CLOAB BUTYRATE-ACETOACETATE COA-TRANSFERASE SUBUNIT (EC 2.8.3.9) (COATA) 3763 4326 gi143456 rpoE protein (ttg start codon) Bacillus subtilis 53 29 564 17 18204 18971 gi304136 acetylglutamate kinase Bacillus stearothermophilus 53 36 768 sp|Q07905|ARGB_BACST ACETYLGLUTAMATE KINASE (EC 2.7.2.8) (NAG KINASE) (AGK) (N-ACETYL-L-GLUTAMATE 5-PHOSPHOTRANSFERASE). 212 4021 4221 gi9878 protein kinase Plasmodium falciparum 53 28 201 231 1350 1120 gi537506 paramyosin Dirofilaria immitis 53 34 231 272 2719 3249 pir|A33141|A331 hypothetical protein (gtfD 3' region) - Streptococcus mutans 53 34 531 308 927 2576 gi606292 ORF o696 Escherichia coli 53 33 1650 320 5645 5884 gi160596 RNA polymerase III largest subunit Plasmodium falciparum 53 33 240 sp|P27625|RPC1_PLAFA DNA-DIRECTED RNA POLYMERASE III LARGEST SUBUNIT (EC 2.7.7.6). 327 218 901 gi854.601 unknown Schizosaccharomyces pombe 53 31 684 341 212 2500 gi633732 ORF1 Campylobacter jejuni 53 31 2289 351 383 sp|P31675|YABM HYPOTHETICAL 42.7 KD PROTEIN IN TBPA-LEUD 53 32 381 INTERGENIC REGION (ORF104). 433 4731 4375 gi1001961 MHC class II analog Staphylococcus aureus 53 30 357 454 980 720 pir|A60328|A603 40K cell wall protein precursor (sr 5' region) - Streptococcus mutans 53 27 261 (strain OMZ175, serotype f) 470 1123 1761 gi516826 rat GCP360 Rattus rattus 53 30 639 483 1
217 gi1480429 putative transcriptional regulator Bacillus stearothermophilus 53 33 216 544 516 1259 gi46587 ORF 1 (AA 1-121) (1 is 2nd base in codon) Staphylococcus 53 38 744 aureus pir|S15765|S15765 hypothetical protein 1 (hlb 5' region) - Staphylococcus aureus (fragment) 558 3754 3551 gi15140 res gene Bacteriophage P1 53 32 204 603 339 gi507738 Hmp Vibrio parahaemolyticus 53 26 282 693 941 213 gi153123 toxic shock syndrome toxin-1 precursor Staphylococcus aureus 53 38 729 pir|A24606|XCSAS1 toxic shock syndrome toxin-1 precursor - Staphylococcus aureus 766 673 gi687600 orfA2; orfA2 forms an operon with orfA1 Listeria monocytogenes 53 43 672 781 335 gi1204551 pilin biogenesis protein Haemophilus influenzae 53 26 333 801 545 gi12794.00 SapA protein Escherichia coli 53 25 543 803 910 gi695278 lipase-like enzyme Alcaligenes eutrophus 53 30 909 872 590 gi298.032 EF Streptococcus suis 53 30 588 910 184 gi1044936 unknown Schizosaccharomyces pombe 53 29 183 943 399 gi290508 similar to unidentified ORF near 47 minutes Escherichia coli 53 30 396 sp|P31436|YICK_ECOLI HYPOTHETICAL 43.5 KD PROTEIN IN SELC-NLPA INTERGENIC REGION. 988 504 gi142441 ORF3; putative Bacillus subtilis 53 28 501 1064 434 gi305080 myosin heavy chain Entamoeba histolytica 53 26 432 1366 452 gi308852 transmembrane protein Lactococcus lactis 53 33 450 1758 397 gi1001774 hypothetical protein Synechocystis sp. 53 30 396 1897 447 gi1303949 YgiX Bacillus subtilis 53 27 447 2381 400 gi1146243 22.4% identity with Escherichia coli DNA-damage inducible 53 37 399 protein . . .
; putative Bacillus subtilis 3537 327 gi4506884 hsdM gene of EcoprrI gene product Escherichia coli 53 35 327 pir|S38437|S38437 hsdM protein - Escherichia coli pir|S09629|S09629 hypothetical protein A - Escherichia coli (SUB 40-520) 3747 137 397 gi1477486 transposase Burkholderia cepacia 53 53 261 11 5 3049 3441 gi868224 No definition line found Caenorhabditis elegans 52 33 393 15 2205 2369 gi215966 G41 protein (gtg start codon) Bacteriophage T4 52 34 165

TABLE 2-continued 19 2429 3808 gi1205379 UDP-murnac-pentapeptide synthetase Haemophilus influenzae 52 31 1380 24 3462 gi579124 predicted 86.4 kd protein; 52 Kd observed Mycobacteriophage L5 52 32 3459 pir|S30971|S30971 gene 26 protein - Mycobacterium phage L5 sp|Q05233|VG26_BPML5 MINOR TAIL PROTEIN GP26. (SUB 2-837) 37 3015 3935 gi15.00543 P115 protein Methanococcus jannaschii 52 25 921 38 13 8795 9703 gi46851 glucose kinase Streptomyces coelicolor 52 29 909 44 16 10617 11066 gi42012 moaE gene product Escherichia coli 52 36 450 46 3 521 gi1040957 NADH dehydrogenase subunit 6 Anopheles trinkae 52 25 519 51 10 5531 6280 gi388269 traC Plasmid paid1 52 32 750 56 2826 1684 gi181949 endothelial differentiation protein (edg-1) Homo sapiens 52 23 1143 pir|A35300|S35300 G protein-coupled receptor edg-1 - human sp|P21453|EDG1_HUMAN PROBABLE G PROTEIN-COUPLED RECEPTOR EDG-1. 57 4173 3496 gi304153 sorbitol dehydrogenase Bacillus subtilis 52 27 678 62 2870 2376 gi1072399 phaE gene product Rhizobium meliloti 52 25 495 62 3651 2857 gi46485 NADH dehydrogenase Synechococcus PCC7942 52 27 795 67 11355 12962 gi1511365 glutamate synthase (NADPH), subunit alpha Methanococcus 52 30 1608 jannaschii 67 16935 18158 gi1204393 hypothetical protein (SP:P31122) Haemophilus influenzae 52 25 1224 70 1997 1809 gi7227 cytoplasmic dynein heavy chain Dictyostelium discoideum 52 36 189 pir|A44357|A44357 dynein heavy chain, cytosolic - slime mold (Dictyostelium discoideum) 96 10005 10664 gi1408485 B65G gene product Bacillus subtilis 52 26 660 03 3351 2716 gi10093.68 Respiratory nitrate reductase Bacillus subtilis 52 42 636 09 3350 2598 gi699274 lmbE gene product Mycobacterium leprae 52 39 753 09 15732 17300 gi1526981 amino acid permease Yee F like protein Salmonella typhimurium 52 30 1569 21 981 550 gi732931 unknown Saccharomyces cerevisiae 52 32 432 25 865 1680 gi1296975 puT gene product Porphyromonas gingivalis 52 38 816 30 659 1807 gi1256634 25.8% identity over 120 aa with the Synechococcus sp.
MpeV protein; 52 36 1149 putative Bacillus subtilis 49 583 gi1225943 PBSX terminase Bacillus subtilis 52 33 582 49 4415 4143 gi1510368 M. jannaschii predicted coding region MJ0272 Methanococcus 52 35 273 jannaschii 67 216 1001 gi146025 cell division protein Escherichia coli 52 43 786 88 120 1256 gi4749154 orf 337; translated orf similarity to SW:BCR_ECOLI 52 26 1137 bicyclomycin resistance protein of Escherichia coli Coxiella burnetii pir|S44207|S44207 hypothetical protein 337 - Coxiella burnetii (SUB-338) 95 8359 gi3028 mitochondrial outer membrane 72K protein Neurospora crassa 52 25 402 pir|A36682|A36682 72K mitochondrial outer membrane protein - Neurospora crassa 200 2065 2607 gi142439 ATP-dependent nuclease Bacillus subtilis 52 35 543 203 2776 3684 gi1303698 BltD Bacillus subtilis 52 25 909 227 5250 5651 gi305080 myosin heavy chain Entamoeba histolytica 52 24 402 242 21 1424 gi1060877 EmrY Escherichia coli 52 32 1404 249 4526 4753 pir|C37222|C372 cytochrome P450 1A1, hepatic - dog (fragment) 52 23 228 255 1055 gi143290 penicillin-binding protein Bacillus subtilis 52 28 1053 276 3664 3365 gi1001610 hypothetical protein Synechocystis sp.
52 30 300 276 8 4055 3654 gi4162354 orf L3 Mycoplasma capricolum 52 26 402 289 1449 1042 gi150900 GTP phosphohydrolase Proteus vulgaris 52 34 408 325 279 gi1204874 polypeptide deformylase (formylmethionine deformylase) 52 33 279 Haemophilus influenzae 340 1010 gi1215695 peptide transport system protein SapF homolog: SapF homolog 52 33 1008 Mycoplasma pneumoniae 375 340 1878 gi4674464 similar to SpoVB Bacillus subtilis 52 28 1539 424 3262 2420 gi1478239 unknown Mycobacterium tuberculosis 52 34 843 430 575 pir|A42606|A426 orfA 5' to orf2.05 - Saccharopolyspora erythraea (fragment) 52 28 573 444 3712 2696 gi1408494 homologous to penicillin acylase Bacillus subtilis 52 31 1017 465 903 gi143331 alkaline phosphatase regulatory protein Bacillus subtilis 52 36 900 pir|A27650|A27650 regulatory protein phoR - Bacillus subtilis sp|P23545|PHOR_BACSU ALKALINE PHOSPHATASE SYNTHESIS SENSOR PROTEIN PHOR (EC 2.7.3.-). 469 4169 3633 gi755152 highly hydrophobic integral membrane protein Bacillus subtilis 52 32 537 sp|P42953|TAGG_BACSU TEICHOIC ACID TRANSLOCATION PERMEASE PROTEIN TAGG. 495 633 gi1204607 transcription activator Haemophilus influenzae 52 25 630 505 5762 5520 gi142440 ATP-dependent nuclease Bacillus subtilis 52 28 243 517 1162 1614 gi166162 Bacteriophage phi-11 int gene activator Staphylococcus bacteriophage 52 35 453 phi 11 543 444 1295 gi1215693 putative orf; GT9_orf434 Mycoplasma pneumoniae 52 25 852 586 1 336 gi581648 epiB gene product Staphylococcus epidermidis 52 36 336 773 426 gi1279769 FdhC Methanobacterium thermoformicicum 52 30 423 1120 100 330 gi142439 ATP-dependent nuclease Bacillus subtilis 52 35 231 1614 347 gi289262 comE ORF3 Bacillus subtilis 52 28 345 2495 324 gi216151 DNA polymerase (gene L.; ttg start codon) Bacteriophage SPO2 52 34 324 gi579197 SPO2 DNA polymerase (aa 1-648) Bacteriophage SPO2 pir|A21498|DJBPS2 DNA-directed DNA polymerase (EC 2.7.7.7) - phage SPO2

TABLE 2-continued 2931 285 gi1256136 YbbG Bacillus subtilis 52 30 282 2943 320 gi41713 his A ORF (AA 1-245) Escherichia coli 52 35 258 2993 295 gi298.032 EF Streptococcus suis 52 34 294 3667 307 gi849025 hypothetical 64.7-kDa protein Bacillus subtilis 52 36 306 3944 260 gi1218040 BAA Bacillus licheniformis 52 36 219 3954 347 gi854.064 U87 Human herpesvirus 6 52 50 267 3986 90 gi1205919 Na+ and Cl- dependent gamma-aminobutyric acid transporter 52 33 312 Haemophilus influenzae 389 gi400034 Oxoglutarate dehydrogenase (NADP+) Bacillus subtilis 52 42 387 sp|P23129|ODO1_BACSU 2-OXOGLUTARATE DEHYDROGENASE E1 COMPONENT (EC 1.2.4.2) (ALPHA-KETOGLUTARATE DEHYDROGENASE). 4020 249 gi159388 ornithine decarboxylase Leishmania donovani 52 47 249 4098 gi409795 No definition line found Escherichia coli 52 32 219 4248 212 gi965077 Adróp Saccharomyces cerevisiae 40 210 3 575 gi895747 putative cel operon regulator Bacillus subtilis 28 573 21 2479 3276 gi1510962 indole-3-glycerol phosphate synthase Methanococcus jannaschii 32 798 22 5301 5966 gi1303933 YgiN Bacillus subtilis 25 666 43 1283 1050 gi1519460 Srp1 Schizosaccharomyces pombe 31 234 44 11042 11305 gi42011 moaD gene product Escherichia coli 35 264 51 6453 6731 gi495471 vacuolating toxin Helicobacter pylori 37 279 52 2537 2995 gi1256652 25% identity to the E. coli regulatory protein MprA; putative 32 459 Bacillus subtilis 57 10 6843 6355 gi508173 EIIA domain of PTS-dependent Gat transport and phosphorylation 5 32 489 Escherichia coli 59 29 1111 gi299163 alanine dehydrogenase Bacillus subtilis 33 1083 67 15791 16576 gi1510977 M.
jannaschii predicted coding region MJ0938 Methanococcus 24 786 jannaschii 69 1218 877 gi467359 unknown Bacillus subtilis 34 342 71 3 1196 gi298.032 EF Streptococcus suis 32 1194 78 176 gi1161242 proliferating cell nuclear antigen Styela clava 28 174 99 3357 4040 gi642795 TFIID subunit TAFII55 Homo sapiens 25 684 109 1428 gi580920 rodD (gtaA) polypeptide (AA 1-673) Bacillus subtilis 27 1425 pir|S06048|S06048 probable rodD protein - Bacillus subtilis sp|P13484|TAGE_BACSU PROBABLE POLY(GLYCEROL PHOSPHATE) ALPHA-GLUCOSYLTRANSFERASE (EC 2.4.1.52) (TEICHOIC ACID BIOSYNTHESIS PROTEIN E). 109 6007 6693 gi1204815 hypothetical protein (SP:P32662) Haemophilus influenzae 23 687 112 1066 2352 pir|S05330|S053 maltose-binding protein precursor - Enterobacter aerogenes 42 1287 112 12855 11278 gi405857 yehU Escherichia coli 29 1578 114 8967 8209 gi435098 orf1 Mycoplasma capricolum 30 759 115 1 912 gi1431110 ORF YDL085w Saccharomyces cerevisiae 25 912 127 9647 10477 gi1204314 H. influenzae predicted coding region HI0056 Haemophilus 37 831 influenzae 152 6814 7356 gi4319294 MunI regulatory protein Mycoplasma sp. 38 543 154 575 1153 gi1237044 unknown Mycobacterium tuberculosis 36 579 154 5634 4681 gi4092864 bmrU Bacillus subtilis 27 954 171 6236 5529 gi1205484 hypothetical protein (SP:P33918) Haemophilus influenzae 32 708 184 291 gi4668864 B1496 C3 206 Mycobacterium leprae 33 291 212 2139 pir|A45605|A456 mature-parasite-infected erythrocyte surface antigen MESA 23 639 Plasmodium falciparum 228 707 1378 gi8204 nuclear protein Drosophila melanogaster 27 672 236 7481 6825 gi49272 Asparaginase Bacillus licheniformis 31 657 243 3546 2455 gi1511102 mevalonate kinase Methanococcus jannaschii 29 1092 257 3373 3206 gi1204579 H.
influenzae predicted coding region HI0326 Haemophilus 22 168 influenzae 258 1609 821 gi160299 glutamic acid-rich protein Plasmodium falciparum 5 34 789 pir|A54514|A54514 glutamic acid-rich protein precursor - Plasmodium falciparum 265 2419 3591 gi580841 F1 Bacillus subtilis 32 298 518 748 gi1336162 SCPB Streptococcus agalactiae 34 231 316 5817 7049 gi413953 ipa-29d gene product Bacillus subtilis 39 1233 332 2057 339 gi1209012 mutS Thermus aquaticus thermophilus 26 1719 364 3816 4991 gi52.8991 unknown Bacillus subtilis 32 440 448 684 gi2819 transferase (GAL10) (AA 1-687) Kluyveromyces lactis 32 pir|S01407|XUVKG UDPglucose 4-epimerase (EC 5.1.3.2) - yeast (Kluyveromyces marxianus var. lactis) 495 1177 1001 gi297861 protease G. Erwinia chrysanthemi 41 77 495 1718 1149 gi1513317 serine rich protein Entamoeba histolytica 25 506 421 gi455320 cII protein Bacteriophage P4 33 420 600 983 492 gi587532 orf, len: 201, CAI: 0.16 Saccharomyces cerevisiae 30 492 pir|S48818|S48818 hypothetical protein - yeast (Saccharomyces cerevisiae) 607 479 934 gi1511524 hypothetical protein (SP:P37002) Methanococcus jannaschii 40 456 686 127 600 gi493017 endocarditis specific antigen Enterococcus faecalis 30 474 726 33 230 gi1353851 unknown Prochlorococcus marinus 45 98 861 176 652 gi4101.45 dehydroquinate dehydratase Bacillus subtilis 34 77

TABLE 2-continued 869 393 gi401.004 rodC (tag3) polypeptide (AA 1-746) Bacillus subtilis 23 390 pir|S06049|S06049 rodC protein - Bacillus subtilis sp|P13485|TAGF_BACSU TEICHOIC ACID BIOSYNTHESIS PROTEIN E. 1003 322 gi1279707 hypothetical phosphoglycerate mutase Saccharomyces cerevisiae 39 321 1046 624 382 gi510257 glycosyltransferase Escherichia coli 29 243 1467 352 gi1511175 M. jannaschii predicted coding region MJ1177 Methanococcus 32 351 jannaschii 2558 230 sp|P10582|DPOM DNA POLYMERASE (EC 2.7.7.7) (S-1 DNA ORF3). 26 228 3003 399 19 gi809543 CbrC protein Erwinia chrysanthemi 27 381 3604 399 pir|JC4210|JC42 3-hydroxyacyl-CoA dehydrogenase (EC 1.1.1.35) - mouse 37 399 3732 316 gi145906 acyl-CoA synthetase Escherichia coli 33 315 3791 274 gi1061351 semaphorin III family homolog Homo sapiens 37 273 3995 46 336 gi216346 surfactin synthetase Bacillus subtilis 38 291 4193 307 2 gi42749 ribosomal protein L12 (AA 1-179) Escherichia coli 25 306 pir|S04776|XXECPL peptide N-acetyltransferase rimL (EC 2.3.1.-) - Escherichia coli 4539 185 gi1408494 homologous to penicillin acylase Bacillus subtilis 40 183 4562 239 gi 4 58280 coded for by C. elegans cDNA cm01e7; Similar to hydroxymethyl 35 204 glutaryl-CoA synthase Caenorhabditis elegans 3576 4859 gi559160 GRAIL score: null; cap site and late promoter motifs present upstream; 50 44 1284 putative Autographa californica nuclear polyhedrosis virus 11 4044 5165 gi1146207 putative Bacillus subtilis 50 35 1122 11 9496 8483 gi1208451 hypothetical protein Synechocystis sp.
50 39 1014 19 101.8 gi413966 ipa-42d gene product Bacillus subtilis 50 29 1017 20 8407 8228 gi1323159 ORF YGR103w Saccharomyces cerevisiae 50 28 180 24 4824 4240 gi4962804 structural protein Bacteriophage Tuc2009 50 29 585 34 1926 2759 gi1303966 YGO Bacillus subtilis 50 36 834 38 22865 23440 gi1072179 Similar to dihydroflavonol-4-reductase (maize, petunia, tomato) 50 32 576 Caenorhabditis elegans 47 1705 2976 gi153015 FemA protein Staphylococcus aureus 50 29 1272 56 15290 15841 gi606096 ORF f167; end overlaps end of ol. O0 by 14 bases; start overlaps 50 30 552 f174, other starts possible Escherichia coli 57 1077 19 gi640922 xylitol dehydrogenase unidentified hemiascomycete 50 29 1059 58 628 1761 gi143725 putative Bacillus subtilis 50 29 1134 88 3884 3375 gi1072179 Similar to dihydroflavonol-4-reductase (maize, petunia, tomato) 50 32 510 Caenorhabditis elegans 89 3356 3012 gi1276658 ORF174 gene product Porphyra purpurea 50 25 345 141 3 239 gi4760244 carbamoyl phosphate synthetase II Plasmodium falciparum 50 33 237 151 186 626 gi1403441 unknown Mycobacterium tuberculosis 50 35 441 166 9623 8181 gi895747 putative cel operon regulator Bacillus subtilis 50 32 1443 201 5096 4908 gi160229 circumsporozoite protein Plasmodium reichenowi 50 42 189 206 29555 28326 gi1052754 LmrP integral membrane protein Lactococcus lactis 50 24 1230 211 1523 1927 gi4101.31 ORFX7 Bacillus subtilis 50 29 405 214 2411 3295 sp|P37348|YECE HYPOTHETICAL PROTEIN IN ASPS 5' REGION (FRAGMENT).
50 37 885 228 4406 3744 gi313580 envelope protein Human immunodeficiency virus type 1 50 35 663 pir|S35835|S35835 envelope protein - human immunodeficiency virus type 1 (fragment) (SUB 1-77) 272 1723 398 gi1408485 B65G gene product Bacillus subtilis 50 22 1326 273 984 352 gi984186 phosphoglycerate mutase Saccharomyces cerevisiae 50 28 633 328 1605 703 gi148896 lipoprotein Haemophilus influenzae 50 26 903 332 3802 2135 gi1526547 DNA polymerase family X Thermus aquaticus 50 27 1668 342 3473 3931 gi456562 G-box binding factor Dictyostelium discoideum 50 35 459 352 741 gi288301 ORF2 gene product Bacillus megaterium 50 29 738 408 5299 5523 gi11665 ORF2136 Marchantia polymorpha 50 27 225 420 650 1825 gi757842 UDP-sugar hydrolase Escherichia coli 50 30 1176 464 591 gi487282 Na+-ATPase subunit J Enterococcus hirae 50 29 591 472 864 310 gi551875 BglR Lactococcus lactis 50 23 555 520 23 541 gi567036 CapE Staphylococcus aureus 50 27 519 529 410 gi1256652 25% identity to the E. coli regulatory protein MprA; putative 50 34 405 Bacillus subtilis 534 5 6059 4392 gi295671 selected as a weak suppressor of a mutant of the subunit AC40 of 50 18 1668 DNA dependant RNA polymerase I and III Saccharomyces cerevisiae 647 1497 gi405568 TraI protein shares sequence similarity with a family of 50 31 1494 topoisomerases Plasmid pSK41 664 711 289 gi410007 leukocidin F component Staphylococcus aureus, MRSA No. 4, 50 32 423 Peptide, 23 aa 678 627 gi298.032 EF Streptococcus suis 50 29 627 755 947 1171 gi150572 cytochrome c1 precursor (EC 1.10.2.2) Paracoccus denitrificans 50 37 225 gi45465 cytochrome c1 (AA 1-450) Paracoccus denitrificans pir|C29413|C29413 ubiquinol-cytochrome-c reductase (EC 1.10.2.2) cytochrome c1 precursor - Paracoccus denitrificans sp|P13627|CY1 827 683 gi142020 heterocyst differentiation protein Anabaena sp.
50 21 681 892 752 gi1408485 B65G gene product Bacillus subtilis 50 27 750 910 438 887 gi1204727 tyrosine-specific transport protein Haemophilus influenzae 50 25 450 933 524 760 gi1205451 cell division inhibitor Haemophilus influenzae 50 32 237 973 236 48 gi886947 orf3 gene product Saccharomyces cerevisiae 50 40 189 1009 429 205 gi153727 M protein group G streptococcus 50 28 225

TABLE 2-continued 1027 257 gi413934 ipa-10r gene product Bacillus subtilis 50 25 255 1153 326 gi773676 incCA Alcaligenes xylosoxydans 50 36 231 1222 400 gi1408485 B65G gene product Bacillus subtilis 50 21 399 1350 399 gi289272 ferrichrome-binding protein Bacillus subtilis 50 32 294 2945 184 gi171704 hexaprenyl pyrophosphate synthetase (COQ1) Saccharomyces 50 34 183 cerevisiae 2968 804 gi397526 clumping factor Staphylococcus aureus 33 801 2998 394 131 gi495696 F54E7.3 gene product Caenorhabditis elegans 40 264 3046 106 pir|S13819|S138 acyl carrier protein - Anabaena variabilis (fragment) 32 201 3063 gi474190 iucA gene product Escherichia coli 29 273 3174 146 gi151900 alcohol dehydrogenase Rhodobacter sphaeroides 31 144 3792 gi1001423 hypothetical protein Synechocystis sp. 35 312 3800 262 gi144733 NAD-dependent beta-hydroxybutyryl coenzyme A dehydrogenase 28 261 Clostridium acetobutylicum 3946 gi576765 cytochrome b Myrmecia pilosula 38 186 3984 sp|P37348|YECE HYPOTHETICAL PROTEIN IN ASPS 5' REGION (FRAGMENT). 37 288 37 7520 gi1204367 hypothetical protein (GB:U14003 278) Haemophilus influenzae 30 366 46 14848 gi466860 acd; B1308 F1 34 Mycobacterium leprae 24 1047 59 3601 gi606304 ORF o462 Escherichia coli 27 1335 112 18615 gi559502 ND4 protein (AA 1-409) Caenorhabditis elegans 25 732 138 7902 gi303953 esterase Acinetobacter calcoaceticus 29 930 217 5138 gi496254 fibronectin/fibrinogen-binding protein Streptococcus pyogenes 31 738 220 12657 gi397526 clumping factor Staphylococcus aureus 31 855 228 2492 pir|S23692|S236 hypothetical protein 9 - Plasmodium falciparum 24 651 268 212 gi143047 ORFB Bacillus subtilis 26 2403 271 1373 gi100.1257 hypothetical protein Synechocystis sp. 38 210 300 2020 gi1510796 hypothetical protein (GP:X91006.
2) Methanococcus jannaschii 26 1161 381 gi396301 matches PS00041: Bacterial regulatory proteins, araC family signature 29 1140 Escherichia coli 466 947 gi1303863 YdgP Bacillus subtilis 26 945 666 gi633112 ORF1 Streptococcus sobrinus 29 189 670 1014 gi1122758 unknown Bacillus subtilis 32 612 709 157 gi14383.0 xpaC Bacillus subtilis 29 639 831 gi401786 phosphomannomutase Mycoplasma pirum 29 471 1052 gi1303799 YgeN Bacillus subtilis 21 210 1800 gi216300 peptidoglycan synthesis enzyme Bacillus subtilis 28 171 sp|P37585|MURG_BACSU MURG PROTEIN (UDP-N-ACETYLGLUCOSAMINE--N-ACETYLMURAMYL-(PENTAPEPTIDE) PYROPHOSPHORYL-UNDECAPRENOL N-ACETYLGLUCOSAMINE TRANSFERASE). 2430 376 HYPOTHETICAL 36.2 KD PROTEIN IN NDK-GCPE 26 375 INTERGENIC REGION. 3096 273 gi516360 surfactin synthetase Bacillus subtilis 25 270 32 3100 2429 gi1217963 hepatocyte nuclear factor 4 gamma (HNF4gamma) Homo sapiens 36 672 38 1 609 gi1205790 H. influenzae predicted coding region HI1555 Haemophilus 28 609 influenzae 45 5021 6427 gi1524267 unknown Mycobacterium tuberculosis 1407 59 14 16346 31096 gi1197336 Lmp3 protein Mycoplasma hominis 14751 61 3 608 gi1511555 quinolone resistance norA protein protein Methanococcus 606 jannaschii 61 3311 3646 gi1303893 YchL Bacillus subtilis 48 29 336 114 98 415 gi671708 su(s) homolog; similar to Drosophila melanogaster suppressor of sable 25 318 (su(s)) protein, Swiss-Prot Accession Number P22293 Drosophila virilis 121 610 89 gi1314584 unknown Sphingomonas S88 48 29 522 136 1280 546 gi1205968 H. influenzae predicted coding region HI1738 Haemophilus 23 735 influenzae 171 10 8220 9557 gi1208454 hypothetical protein Synechocystis sp. 48 34 1338 175 1814 gi396400 similar to eukaryotic Na+/H+ exchangers Escherichia coli 29 1812 sp|P32703|YJCE_ECOLI HYPOTHETICAL 60.5 KD PROTEIN IN SOXR-ACS INTERGENIC REGION (O549). 194 385 gi1510493 M.
jannaschii predicted coding region MJ0419 Methanococcus 25 384 jannaschii 197 452 gi1045714 spermidine/putrescine transport ATP-binding protein Mycoplasma 25 450 genitalium 396 gi940288 protein localized in the nucleoli of pea nuclei; ORF, putative 29 396 Pisum sativum 204 698 33 gi529202 No definition line found Caenorhabditis elegans 25 666 206 27760 20705 gi511490 gramicidin S synthetase 2 Bacillus brevis 27 7056 212 2 166 gi295899 nucleolin Xenopus laevis 34 165 220 11426 10200 gi44073 SecY protein Lactococcus lactis 23 1227 243 5491 4532 gi1184118 mevalonate kinase Methanobacterium thermoautotrophicum 30 960 264 3308 1182 gi1015903 ORF YJR151c Saccharomyces cerevisiae 26 2127 441 768 gi142863 replication initiation protein Bacillus subtilis 23 765 pir|B26580|B26580 replication initiation protein - Bacillus subtilis 444 3898 5298 gi145836 putative Escherichia coli 24 1401 484 388 1110 gi146551 transmembrane protein (kdpD) Escherichia coli 48 18 723 542 1425 2000 pir|S28969|S289 N-carbamoylsarcosine amidohydrolase (EC 3.5.1.59) - Arthrobacter 27 576 sp.

TABLE 2-continued 566 3 1019 gi153490 tetracenomycin C resistance and export protein Streptomyces 8 24 1017 glaucescens 611 2 730 gi1103507 unknown Schizosaccharomyces pombe 38 729 624 665 75 gi144859 ORF B Clostridium perfringens 26 591 846 508 gi537506 paramyosin Dirofilaria immitis 27 507 1020 66 950 gi1499876 magnesium and cobalt transport protein Methanococcus 30 885 jannaschii 1227 174 gi493730 lipoxygenase Pisum sativum 35 174 1266 405 gi882452 ORF f211; alternate name yggA; orf5 of X14436 Escherichia coli 24 405 gi41425 ORF5 (AA 1-197) Escherichia coli (SUB 15-211) 2071 381 55 gi1408486 HS74A gene product Bacillus subtilis 327 2398 233 gi1500401 reverse gyrase Methanococcus jannaschii 231 2425 246 16 pir|H48563|H485 G1 protein - fowlpox virus (strain HP444) (fragment) 231 2432 225 gi1353703 Trio Homo sapiens 222 2453 399 gi142850 division initiation protein Bacillus subtilis 396 2998 236 gi577.569 PepV Lactobacillus delbrueckii 234 3042 14 280 gi945219 mucin Homo sapiens 267 3686 405 gi145836 putative Escherichia coli 405 4027 301 110 pir|S51177|S511 trans-activator protein - Equine infectious anemia virus 192 2232 823 gi1303989 Yck Bacillus subtilis 1410 24 599 1084 gi540083 PC4-1 gene product Bradysia hygida 486 36 6925 6326 gi1209223 esterase Acinetobacter lwoffii 600 43 196 1884 gi1403455 unknown Mycobacterium tuberculosis 1689 44 15108 14098 gi1511555 quinolone resistance norA protein protein Methanococcus 1011 jannaschii 69 6710 6279 gi4384664 Possible operon with Orf6. Hydrophilic, no homologue in the database; 7 29 432 putative Bacillus subtilis 81 4279 3536 gi466882 pps1: B1496 C2 189 Mycobacterium leprae 24 744 120 12 8863 8591 gi927340 D9509.27p; CAI: 0.12 Saccharomyces cerevisiae 38 273 142 1174 326 gi486143 ORF YKL094w Saccharomyces cerevisiae 32 849 168 1093 gi1177254 hypothetical EcsB protein Bacillus subtilis 29 1086 263 943 gi142822 D-alanine racemase cds Bacillus subtilis 34 942 279 561 13 gi516608 2 predicted membrane helices, homology with B.
subtilis men Orf3 549 Rowland et al. unpublished Accession number (M74183), approximately 1 minutes on updated Rudd map; putative Escherichia coli sp|P37355|YFBB_ECOLI HYPOTHETICAL 26.7 KD PROTEIN IN MEND-MENB 345 1676 732 gi1204835 hippuricase Haemophilus influenzae 28 945 389 152 400 gi4565624 G-box binding factor Dictyostelium discoideum 32 249 391 831 gi1420856 myo-inositol transporter Schizosaccharomyces pombe 19 831 404 2072 2773 gi1255.425 C33G8.2 gene product Caenorhabditis elegans 17 702 529 2145 3107 gi1303973 YV Bacillus subtilis 29 963 565 1257 193 gi142824 processing protease Bacillus subtilis 28 1065 654 483 gi243353 ORF 5' of ECRF3 herpesvirus saimiri HVS, host-squirrel monkey, 23 480 Peptide, 407 aa 692 115 633 gi150756 40 kDa protein Plasmid plM1 25 519 765 819 gi1256621 26.7% of identity in 165 aa to a Thermophilic bacterium hypothetical 2 28 816 protein 6: putative Bacillus subtilis 825 211 1023 gi397526 clumping factor Staphylococcus aureus 32 813 914 615 gi558073 polymorphic antigen Plasmodium falciparum 29 615 1076 753 gi1147557 Aspartate aminotransferase Bacillus circulans 33 753 1351 398 gi755153 ATP-binding protein Bacillus subtilis 20 396 4192 293 gi145836 putative Escherichia coli 24 291 4361 4014 gi305080 myosin heavy chain Entamoeba histolytica 30 348 11 2777 3058 gi603639 Ye1040p Saccharomyces cerevisiae 28 282 46 10300 10082 gi1246901 ATP-dependent DNA ligase Candida albicans 28 219 61 3941 7930 gi298.032 EF Streptococcus suis 35 3990 132 4093 3158 gi1511057 hypothetical protein SP:P45869 Methanococcus jannaschii 25 936 170 3652 2585 pir|S51910|S519 G4 protein - Sauroleishmania tarentolae 26 1068 191 8284 7025 gi1041334 F54D5.7 Caenorhabditis elegans 25 1260 253 396 gi1204449 dihydrolipoamide acetyltransferase Haemophilus influenzae 35 396 264 437 973 gi180189 cerebellar-degeneration-related antigen (CDR34) Homo sapiens 29 537 gi182737 cerebellar degeneration-associated protein Homo sapiens pir|A29770|A29770 cerebellar
degeneration-related protein - human 273 285 85 gi607573 envelope glycoprotein C2V3 region Human immunodeficiency virus 35 type 350 3 563 gi537052 ORF f286 Escherichia coli 35 561 384 2 862 gi1221884 (urea?) amidolyase Haemophilus influenzae 31 861 410 1876 2490 gi1110518 proton antiporter efflux pump Mycobacterium smegmatis 24 615 432 1455 247 gi1197634 orf4; putative transporter; Method: conceptual translation supplied by 27 1209 author Mycobacterium smegmatis 458 1211 gi15470 portal protein Bacteriophage SPP1 30 1209 517 2477 4192 gi1523812 orf5 Bacteriophage A2 23 1716 540 3 1285 1058 gi215635 pacA Bacteriophage P1 30 228 587 649 1242 gi537148 ORF f181 Escherichia coli 29 594 1218 391 35 gi1205456 single-stranded-DNA-specific exonuclease Haemophilus influenzae 30 357

TABLE 2-continued 3685 402 gi450688 hsdM gene of EcoprrI gene product Escherichia coli 33 402 pir|S38437|S38437 hsdM protein - Escherichia coli pir|S09629|S09629 hypothetical protein A - Escherichia coli (SUB 40-520) 4176 1 338 gi951460 FIM-C.1 gene product Xenopus laevis 31 336 37 7 4813 5922 gi606064 ORF f408 Escherichia coli 24 1110 38 16 11699 12004 gi452.192 protein tyrosine phosphatase (PTP-BAS, type 2) Homo sapiens 24 306 87 2 1748 2407 gi1064813 homologous to sp:PHOR_BACSU Bacillus subtilis 23 660 103 12 13385 12588 gi1001307 hypothetical protein Synechocystis sp. 22 798 112 14 13811 12831 gi1204389 H. influenzae predicted coding region HI0131 Haemophilus 23 981 influenzae 145 3461 2439 gi220578 open reading frame Mus musculus 45 20 1023 170 4965 3601 gi238657 AppC=cytochrome d oxidase, subunit I homolog Escherichia coli, 27 1365 K12, Peptide, 514 aa 206 4346 3462 gi1222056 aminotransferase Haemophilus influenzae 27 885 228 60 716 gi160299 glutamic acid-rich protein Plasmodium falciparum 23 657 pir|A54514|A54514 glutamic acid-rich protein precursor - Plasmodium falciparum 288 1015 gi1255.425 C33G8.2 gene product Caenorhabditis elegans 23 1014 313 3128 1917 gi581140 NADH dehydrogenase Escherichia coli 30 1212 332 459 gi870966 F47A4.2 Caenorhabditis elegans 20 456 344 221 gi171225 kinesin-related protein Saccharomyces cerevisiae 26 219 441 1073 645 gi142863 replication initiation protein Bacillus subtilis 27 429 pir|B26580|B26580 replication initiation protein - Bacillus subtilis 672 982 gi1511334 M. jannaschii predicted coding region MJ1323 Methanococcus 22 981 jannaschii 763 3 851 357 gi606180 ORF f310 Escherichia coli 24 495 886 379 846 gi726426 similar to protein kinases and C. elegans proteins F37C12.8 and 30 468 F37C12.5 Caenorhabditis elegans 948 473 gi156400 myosin heavy chain (isozyme unc-54) Caenorhabditis elegans 25 471 pir|A93958|MWKW myosin heavy chain B - Caenorhabditis elegans sp|P02566|MYSB_CAEEL MYOSIN HEAVY CHAIN B (MHC B).
1158 2 376 gi4411554 transmission-blocking target antigen Plasmodium falciparum 35 375 2551 4 285 gi1276705 ORF287 gene product Porphyra purpurea 28 282 3967 42 374 gi976025 HrsA Escherichia coli 28 333 52 5846 4761 gi467378 unknown Bacillus subtilis 22 1086 138 6475 6849 gi173028 thioredoxin II Saccharomyces cerevisiae 28 375 221 5617 4202 gi153490 tetracenomycin C resistance and export protein Streptomyces 21 1416 glaucescens 252 1122 913 gi1204989 hypothetical protein (GB:U00022 9) Haemophilus influenzae 30 210 263 2093 921 gi1136221 carboxypeptidase Sulfolobus solfataricus 26 1173 365 3524 2085 gi1296822 orf1 gene product Lactobacillus helveticus 31 1440 543 1315 1833 gi1063250 low homology to P20 protein of Bacillus licheniformis and bleomycin 24 519 acetyltransferase of Streptomyces verticillus Bacillus subtilis 544 4 3942 4892 gi951460 FIM-C.1 gene product Xenopus laevis 44 32 951 792 613 2 gi205680 high molecular weight neurofilament Rattus norvegicus 28 612 44 18 11303 11911 gi1511614 molybdopterin-guanine dinucleotide biosynthesis protein A 27 609 Methanococcus jannaschii 59 3665 5128 gi153490 tetracenomycin C resistance and export protein Streptomyces 21 1464 glaucescens 59 10 5536 7527 gi153022 lipase Staphylococcus epidermidis 22 1992 99 681 16 gi1419051 unknown Mycobacterium tuberculosis 21 666 310 9402 12134 gi397526 clumping factor Staphylococcus aureus 21 2733 432 2303 1824 pir|A60540|A605 sporozoite surface protein 2 - Plasmodium yoelii (fragment) 29 480 519 2547 3122 sp|Q06530|DHSU SULFIDE DEHYDROGENASE (FLAVOCYTOCHROME C) 23 576 FLAVOPROTEIN CHAIN PRECURSOR (EC 1.8.2.-) (FC) (FCSD).
13 12053 13321 gi295671 selected as a weak suppressor of a mutant of the subunit AC40 of DNA dependent RNA polymerase I and III Saccharomyces cerevisiae 18 1269 94 1091 414 gi501027 ORF2 Trypanosoma brucei 31 678 127 4550 3309 gi42029 ORF1 gene product Escherichia coli 21 1242 297 1036 557 gi142790 ORF1; putative Bacillus firmus 25 480 344 3525 2953 gi40320 ORF 2 (AA 1-203) Bacillus thuringiensis 30 573 512 1115 63 gi405957 yeeF Escherichia coli 23 1053 631 1223 12 gi580920 rodD (gtaA) polypeptide (AA 1-673) Bacillus subtilis 24 1212 pir|S06048|S06048 probable rodD protein - Bacillus subtilis sp|P13484|TAGE_BACSU PROBABLE POLY(GLYCEROL-PHOSPHATE) ALPHA-GLUCOSYLTRANSFERASE (EC 2.4.1.52) (TEICHOIC ACID BIOSYNTHESIS PROTEIN E). 685 1739 1119 gi1303784 YgeD Bacillus subtilis 42 19 621 4132 395 gi1022910 protein tyrosine phosphatase Dictyostelium discoideum 42 25 393 86 884 393 gi309506 spermidine/spermine N1-acetyltransferase Mus saxicola 41 30 492 pir|S43430|S43430 spermidine/spermine N1-acetyltransferase - spiny mouse Mus saxicola 191 12 14075 13353 gi1124957 orf4 gene product Methanosarcina barkeri 41 22 723 212 6 2150 3127 gi15873 observed 35.2 Kd protein Mycobacteriophage 15 41 26 978

TABLE 2-continued 213 1263 2000 gi633692 TrSA Yersinia enterocolitica 41 18 738 408 2625 3386 gi1197634 orf4; putative transporter; Method: conceptual translation supplied by author Mycobacterium smegmatis 41 24 762 542 1103 gi457146 rhoptry protein Plasmodium yoelii 41 1101 924 475 nucleolin - rat 41 474 1562 402 gi552184 asparagine-rich antigen Pfa35-2 Plasmodium falciparum 40 402 pir|S27826|S27826 asparagine-rich antigen Pfa35-2 - Plasmodium falciparum (fragment) 2395 261 pir|S42251|S422 hypothetical protein 5 - fowlpox virus 40 18 258 4077 305 gi1055055 coded for by C. elegans cDNA yk37g1.5; coded for by C. elegans cDNA yk5c9.5; coded for by C. elegans cDNA yk1a9.5: alternatively spliced form of F52C9.8b Caenorhabditis elegans 39 21 303 958 503 gi1255425 C33G8.2 gene product Caenorhabditis elegans 37 25 501 59 8294 10636 gi535260 STARP antigen Plasmodium reichenowi 36 24 2343 63 3550 8079 gi298032 EF Streptococcus suis 36 19 4530 544 2507 3601 gi1015903 ORF YJR151c Saccharomyces cerevisiae 35 22 1095 63 4 1949 3574 gi552195 circumsporozoite protein Plasmodium falciparum 32 27 1626 sp|P05691|CSP_PLAFL CIRCUMSPOROZOITE PROTEIN (CS) (FRAGMENT).
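In the Table 2 rows, the last figure in each row appears to equal the inclusive span between the Start and Stop coordinates: contig 37, ORF 7 (4813 to 5922) is listed as 1110 nt, and contig 103, ORF 12 (13385 to 12588) as 798 nt. Rows in which Start exceeds Stop are presumably open reading frames read on the complementary strand, although the table does not state this explicitly. The Python sketch below assumes both readings; the names `OrfCoords`, `orf_length` and `orf_strand` are hypothetical and do not come from the patent.

```python
from typing import NamedTuple


class OrfCoords(NamedTuple):
    """Start/Stop coordinates (nt) of a putative ORF within a contig."""
    start: int
    stop: int


def orf_length(orf: OrfCoords) -> int:
    """Inclusive length in nucleotides, regardless of orientation."""
    return abs(orf.stop - orf.start) + 1


def orf_strand(orf: OrfCoords) -> str:
    """'+' when Start <= Stop; '-' when Start > Stop (assumed complementary strand)."""
    return "+" if orf.start <= orf.stop else "-"


if __name__ == "__main__":
    # Coordinates copied from Table 2 (contig 37/ORF 7 and contig 103/ORF 12).
    for orf in (OrfCoords(4813, 5922), OrfCoords(13385, 12588)):
        print(orf, orf_length(orf), orf_strand(orf))
```

Computing the length from the absolute difference keeps the calculation independent of orientation, so the same function covers both forward and reverse entries.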

TABLE 3 TABLE 3-continued S. aureus - Putative coding regions of novel proteins not similar to known proteins S. aureus - Putative coding regions of novel proteins not similar to known proteins Contig ID ORF ID Start (nt) Stop (nt) Contig ID ORF ID Start (nt) Stop (nt) 692 150 38 20 16371 15760 1712 2278 38 26 20253 20804 3032 2361 38 27 20722 21264 1 12585 12097 39 1 1 627 1601 663 40 1 404 1532 1771 43 1 428 4550 4359 44 4 2324 6422 4905 44 5 2484 1 8547 8383 44 14 10129 1982 1605 44 20 13536 176 3 44 21 13596 5144 5983 45 7 6297 5968 6498 46 8 6365 6284 6096 46 12 10449 10954 11271 46 17 15032 4942 4532 47 288 4596 4862 48 7620 1650 1405 50 962 1 10835 10407 50 1316 917 741 51 370 7764 6403 51 2245 8230 7889 53 287 8803 8405 53 6319 10470 8782 54 2 8709 339 4 55 326 5485 4832 55 3 786 5942 5508 56 1 6881 6111 56 1228 12618 12830 56 1560 4185 3814 56 18712 5241 4840 57 3521 1824 2402 57 5436 505 849 58 8553 1177 1524 59 1366 2454 3005 59 2802 765 1388 59 3570 7952 8575 59 4563 8591 8728 59 7518 9379 9020 59 10401 10087 9377 62 1521 1049 783 62 1 5440 5226 5801 63 1 7261 6947 67 900 7424 7621 67 1774 2964 2770 67 2591 980 375 67 6955 1 6425 6868 68 78

TABLE 3-continued TABLE 3-continued S. aureus - Putative coding regions of novel proteins not similar to known proteins S. aureus - Putative coding regions of novel proteins not similar to known proteins Contig ID ORF ID Start (nt) Stop (nt) Contig ID ORF ID Start (nt) Stop (nt) 70 5199 3637 49 15 4610 4413 70 8645 8355 49 16 5049 4603 77 192 794 49 18 5491 5243 79 228 947 49 21 7054 6692 79 411 1791 49 23 8521 7826 83 2 403 49 24 9106 8531 85 8300 8653 49 25 9897 9115 85 8781 8593 50 2 1587 871 86 232 1038 54 3 1508 1221 87 9187 9366 54 8 6398 6210 88 620 1922 54 14 12147 11590 89 3 16 54 15 12803 12075 89 4 878 4714 56 1 315 593 91 550 2 57 1183 2232 91 3141 2344 58 1064 657 92 449 928 59 452 808 92 467 976 61 876 1808 92 5638 6024 61 4279 3905 94 332 61 4540 4277 94 813 61 47 4538 94 2197 61 563 5459 96 10601 63 84 76 99 4523 63 234 1892 99 4784 63 264 2342 00 7287 63 490 5132 02 4368 64 114 956 03 2035 66 485 4495 04 2 68 250 2868 04 699 68 359 4158 05 693 70 25 2777 05 2655 71 145 623 06 3 71 1112 9674 06 1209 72 278 07 542 72 358 09 3651 73 127 09 11625 73 6 1 5227 09 11981 74 5 9 1105 09 17401 75 255 2890 10 2 75 333 2850 14 1 8764 75 434 4506 16 1 82 498 4495 16 4462 84 570 5361 16 9976 88 12 1755 16 10158 88 264 2994 20 3320 89 26 4 3039 20 3869 90 1998 2564 20 9290 91 153 21 417 91 669 388 26 818 91 1 5 11786 13039 27 2648 91 12363 11824 27 4084 92 91 426 31 6438 95 1932 1558 32 715 95 2606 2313 34 2 98 1016 1591 35 258 201 170 625 35 729 203 783 1466 38 3 206 7815 6700 38 6008 206 1 13636 13325 40 1032 206 27960 27712 40 1513 212 170 817 40 2387 212 796 1167 42 1360 212 3128 3436 42 7586 212 3749 4075 43 6502 213 705 44 640 214 570 64 46 2 214 3738 3412 46 502 214 6600 6995 46 2540 214 7469 7074 46 2874 217 965 3 47 1 218 178 657 49 3615 218 1776 2156 49 3785 220 1369 887 49 4145 220 2262 1273

TABLE 3-continued TABLE 3-continued S. aureus - Putative coding regions of novel proteins not similar to known proteins S. aureus - Putative coding regions of novel proteins not similar to known proteins Contig ID ORF ID Start (nt) Stop (nt) Contig ID ORF ID Start (nt) Stop (nt) 220 7208 6141 316 982 341 220 8661 7078 316 2758 3165 220 10216 8636 317 2 114 221 2613 2131 317 34 58 2346 221 10757 10086 321 5217 4789 226 3 659 321 6 40 5961 226 1459 722 321 6794 6138 226 1476 1961 322 543 259 227 2 487 326 65 112 227 460 975 326 1 17 467 227 1855 2121 328 4 69 227 2052 2345 328 3276 227 3768 2776 329 3 227 5591 6367 329 781 228 2503 2877 329 14 71 228 2846 3526 330 289 233 3762 3580 330 14 47 236 579 349 332 2204 238 1391 807 332 4971 239 905 393 333 3 28 241 4334 4173 335 4 33 242 1363 1049 337 95 243 127 576 340 1356 244 647 3 341 3 244 1962 889 341 24 76 245 1258 902 341 3618 246 69 215 341 3929 246 738 1733 344 2889 249 3712 3518 345 768 250 249 4 346 221 254 1 156 350 1410 256 956 1144 352 1765 257 3227 2754 352 4596 260 4580 4254 352 7967 261 2196 2606 352 8906 261 3214 3681 352 9854 264 155 439 359 264 533 3814 362 6 5 264 739 5107 364 1808 145 267 931 539 364 10714 1045 268 4 700 4260 365 1313 10 272 446 30 365 4090 350 272 200 1439 365 4980 623 272 4691 4909 366 520 17 272 6035 5601 367 906 108 276 746 1901 368 494 278 224 553 375 278 3299 3448 380 1097 278 4849 5127 389 285 551 736 390 288 756 950 390 1373 154 288 2055 2276 391 560 3 6 289 055 3 395 197 290 932 630 396 1068 291 332 622 398 1141 938 291 545 2051 399 178 669 295 349 092 01 566 847 295 2141 554 02 100 465 295 2220 2762 04 5370 5179 297 465 142 08 2269 1031 298 2 205 08 2672 2469 300 1928 476 08 3524 4423 301 2 2624 2454 10 1890 1669 304 3 194 13 488 96 306 109 654 16 320 33 60 306 4036 4 257 16 578 847 307 339 4 16 1590 985 307 3645 3995 17 179 308 1 654 17 161 616 308 599 78 20 513 238 308 2332 2021 22 357 677 313 1919 1524 31 856 1407 314 10 702 32 446 1084

TABLE 3-continued TABLE 3-continued S. aureus - Putative coding regions of novel proteins not similar to known proteins S. aureus - Putative coding regions of novel proteins not similar to known proteins Contig ID ORF ID Start (nt) Stop (nt) Contig ID ORF ID Start (nt) Stop (nt) 33 417 558 7 2322 2008 33 2033 1755 558 8 2802 2551 34 535 128 558 9 3453 2920 34 1235 381 560 475 921 40 450 565 1485 1264 42 1269 3320 571 156 4 43 1520 1167 571 994 1206 44 696 577 2 199 44 6366 5971 577 163 453 51 288 579 1 477 53 376 579 200 616 53 4786 583 996 4 53 4306 585 539 132 53 4525 587 22 573 55 4 588 372 848 55 930 588 554 1366 59 687 590 47 334 62 247 592 141 827 66 320 593 2 775 67 44 593 817 1122 68 250 595 87 890 69 362 596 435 1277 69 3372 602 8 169 69 3706 603 071 1469 70 538 606 322 768 70 3290 607 226 1008 70 5042 610 541 53 70 8181 612 3 500 70 9773 616 650 309 71 60 617 491 246 71 472 622 36 347 76 267 625 2046 2549 77 760 627 67 210 77 2081 628 452 3 77 2332 631 4004 3219 80 4261 634 759 70 81 4 636 189 368 86 774 636 1063 197 87 2112 637 1994 1665 88 3 638 227 1081 92 675 639 261 4 93 520 639 811 245 493 1242 641 118 444 502 3 1571 642 1331 1047 504 2 642 1847 1434 505 3734 643 3 608 511 723 645 1534 1758 512 747 645 6 2025 2321 515 812 645 2488 2036 517 2511 648 2 1045 520 2360 660 77 601 520 3430 660 576 872 527 498 661 961 197 528 33 664 89 304 529 529 667 3 413 530 2 5534 668 1 330 536 4 671 516 220 538 110 673 3 338 538 2880 674 584 303 538 2711 679 1 237 538 3114 679 1589 1906 540 332 688 835 434 540 567 688 1077 802 541 433 694 3 143 541 145 696 432 46 542 1272 706 224 81 545 456 709 1183 1449 551 113 711 3 908 555 516 715 3 167 558 951 716 2 637 558 1156 721 133 570 558 1537 722 383 3 558 1874 723 829 2

TABLE 3-continued TABLE 3-continued S. aureus - Putative coding regions of novel proteins not similar to known proteins S. aureus - Putative coding regions of novel proteins not similar to known proteins Contig ID ORF ID Start (nt) Stop (nt) Contig ID ORF ID Start (nt) Stop (nt) 723 2 1112 726 945 2 649 1242 727 2 472 946 950 198 729 268 441 949 1 270 731 130 828 951 3 362 735 214 955 3 143 736 782 960 400 77 738 298 963 1 162 742 230 965 346 2 745 3 412 966 606 133 748 464 969 302 749 3 971 12 170 751 3 974 161 3 755 522 976 348 4 755 918 977 211 758 137 982 982 38 764 459 984 296 3 767 405 987 467 768 373 993 525 771 10 994 549 178 778 69 004 318 785 256 014 313 787 2 015 791 224 016 145 799 260 019 660 804 711 022 474 805 680 024 299 808 842 024 276 810 3 030 338 810 1110 032 179 812 979 040 399 817 2 043 818 1104 044 115 819 535 047 819 1090 051 354 820 1064 051 733 828 4 063 829 800 069 830 4 069 533 832 2 075 399 835 796 077 97 840 709 081 58 845 086 850 449 087 246 853 088 860 256 096 238 864 410 098 509 864 715 00 511 864 6 1828 00 1158 870 588 01 353 873 02 194 875 07 877 379 14 878 107 15 879 19 22 881 243 29 40 882 604 32 181 890 508 33 376 905 44 225 906 236 47 280 912 53 913 290 54 913 59 915 161 61 186 915 402 64 254 921 386 71 19 240 927 38 71 108 299 928 385 83 379 929 400 95 179 932 400 96 189 934 384 200 33 197 936 203 129 464 937 616 222 105 401 945 645 232 387

TABLE 3-continued TABLE 3-continued S. aureus - Putative coding regions of novel proteins not similar to known proteins S. aureus - Putative coding regions of novel proteins not similar to known proteins Contig ID ORF ID Start (nt) Stop (nt) Contig ID ORF ID Start (nt) Stop (nt) 2977 2 373 3 2981 2 334 167 2986 7 279 271 221 30 2991 363 118 286 2 595 2995 1 32 295 1 165 3007 191 39 306 185 3 3017 308 48 314 2 158 631 3018 2 136 35 316 58 570 3025 197 359 193 2 3040 180 4 370 1 402 3046 185 13 371 1 345 3049 278 3 374 357 4 3050 3 314 378 2 400 3052 253 2 392 413 411 202 432 3075 222 4 433 167 3 3080 1 285 450 2 256 3092 62 4 453 149 3 3093 250 89 471 398 75 3100 52 237 477 639 409 3103 47 298 502 399 4 3118 74 4 518 126 449 3123 2 45 534 143 3 3127 1 47 546 3 401 3138 69 2 547 255 4 3142 203 18 583 3 350 3144 386 08 587 3 563 3151 70 3 602 2 170 679 3155 2 202 384 629 1 402 3168 12 76 665 235 2 760 314 3 3303 2 239 400 762 3 200 3371 2 211 399 876 2 119 286 3558 2 48 895 2 379 3558 2 36 401 931 400 2 3568 377 3 976 2 383 51 3595 380 3

3618361s 2 1302 238402 2150 263 3 3622 86 358 2157 399 4 3622 2 398 32 2164 283 2 3642 439 2 2175 218 400 3649 398 15 2212 331 170 3651 314 3 2338 367 2 3664 467 637 2342 3 167 3674 55 402 2352 166 2 3677 311 3 2352 2 398 174 3704 1 402 2355 47 352 3726 269 3 2356 341 3 3765 256 2 2359 152 3 3779 357 160 2421 150 4 3794 135 4 2422 306 43 3794 2 377 87 2443 263 99 3796 2 375 112 2454 3 158 3801 262 50 2463 253 2 3806 298 143 2485 3 374 3807 42 389 2557 246 4 3815 400 2 2575 2 355 3827 3 320 2582 3 284 3842 392 3 2607 1 294 3853 399 127 2930 17 400 3855 1 324 2939 242 18 3857 2 235 2944 3 359 3861 297 4

2952 2 190 3897 3 173 2953 399 61 3897 2 143 400 2964 166 2 3898 2 225 401 2969 144 4 3921 2 103 342

TABLE 3-continued TABLE 3-continued S. aureus - Putative coding regions of novel proteins not similar to known proteins S. aureus - Putative coding regions of novel proteins not similar to known proteins Contig ID ORF ID Start (nt) Stop (nt) Contig ID ORF ID Start (nt) Stop (nt) 3927 70 375 3930 76 234 3946 2 382 113 3951 2 105 377 4346 277 83 3965 344 42 4367 2 117 311 3973 400 5 4373 2 268 4381 326 78 4001 296 111 4384 309 4 4003 90 335 4397 9 311 4018 2 259 4402 1 249 4018 2 186 401 4021 1 345 4403 328 50 4043 3 344 4406 3 317 4411 2 280 4070 1 324 4411 2 398 99 4072 2 187 390 4412 2 364

4 2 1 4418 3 230 4083 3 359 4424 398 195 4090 27 368 4443 1 215 4101 103 297 4471 323 4105 1 306 4107 286 2 4478 271 4119 339 49 4482 50 289 4121 372 4 4489 302 3 449, 12 206 4128 2 331 4495 3 179 4130 415 62 4496 252 4 4500 130 306 4186 254 3 4511 248 3 4224 256 2 4518 246 4239 1 348 4526 24 2 4242 356 3 4252 296 3 4527 2 163 4253 1 174 4532 3 239 4256 323 78 4258 2 334 170 4542 1 175 4267 144 4 4567 36 200 4271 2 304 4573 231 4287 163 23 45 4289 319 167 78 322 2 4302 153 305 4619 180 4304 1 186 45 4620 176 3 4304 2 96 314 4306 2 151 4662 246 4318 289 2 4669 2 157 1 4680 28 183 4331 2 364 200 4690 174 4 4338 399 70

SEQUENCE LISTING
The patent contains a lengthy “Sequence Listing” section. A copy of the “Sequence Listing” is available in electronic form from the USPTO web site (http://seqdata.uspto.gov/sequence.html?DocID=6593114B1). An electronic copy of the “Sequence Listing” will also be available from the USPTO upon request and payment of the fee set forth in 37 CFR 1.19(b)(3).
What is claimed is:
1. An isolated polynucleotide fragment comprising at least 50 contiguous nucleotides, or the complement thereof, of a nucleotide sequence corresponding to nucleotides 3-746 of SEQ ID NO:559.
2. The isolated polynucleotide fragment of claim 1, wherein said polynucleotide fragment further comprises a heterologous polynucleotide sequence.
3. A method for making a recombinant vector comprising inserting the isolated polynucleotide fragment of claim 1 into a vector.
4. A recombinant vector comprising the isolated polynucleotide fragment of claim 1.
5. A recombinant host cell comprising the isolated polynucleotide fragment of claim 1.
6. The isolated polynucleotide fragment of claim 1, wherein said polynucleotide fragment comprises at least 100 contiguous nucleotides, or the complement thereof, of a nucleotide sequence corresponding to nucleotides 3-746 of SEQ ID NO:559.
7. The isolated polynucleotide fragment of claim 6, wherein said polynucleotide fragment further comprises a heterologous polynucleotide sequence.
8. A method for making a recombinant vector comprising inserting the isolated polynucleotide fragment of claim 6 into a vector.
9. A recombinant vector comprising the isolated polynucleotide fragment of claim 6.
10. An isolated recombinant host cell comprising the isolated polynucleotide fragment of claim 6.
11. An isolated polynucleotide fragment comprising the nucleic acid sequence corresponding to nucleotides 3-746 of SEQ ID NO:559.
12. The isolated polynucleotide fragment of claim 11, wherein said polynucleotide fragment further comprises a heterologous polynucleotide sequence.
13. A method for making a recombinant vector comprising inserting the isolated polynucleotide fragment of claim 11 into a vector.
14. A recombinant vector comprising the isolated polynucleotide fragment of claim 11.
15. An isolated recombinant host cell comprising the isolated polynucleotide fragment of claim 11.