USOO6822071B1 (12) United States Patent (10) Patent No.: US 6,822,071 B1 Stephens et al. (45) Date of Patent: Nov. 23, 2004

(54) POLYPEPTIDES FROM CHLAMYDIA (56) References Cited PNEUMONIAE AND THEIR USE IN THE DIAGNOSIS, PREVENTION AND FOREIGN PATENT DOCUMENTS TREATMENT OF DISEASE WO 99/27105 * 6/1999 (75) Inventors: Richard S. Stephens, Orinda, CA (US); OTHER PUBLICATIONS Wayne Mitchell, San Francsico, CA Gerhold et al-BioEssays 18(12):973-981, 1996.* (US); Sue S. Kalman, Saratoga, CA Wells et al Journal of Leukocyte Biology 61(5):545-550, (US); Ronald Davis, Palo Alto, CA 1997.* (US) Russell et al Journal of Molecular Biology 244:33-350, 1994.* (73) Assignee: The Regents of the University of Rudinger et al., in “Peptide Hormones' Parsons, TA ets, California, Oakland, CA (US) University Park Press pp. 1–6, 1976.* Burgess et al, The Journal of Cell Biology, 111:2129-2138, Notice: Subject to any disclaimer, the term of this 1990.* patent is extended or adjusted under 35 Lazar et al, Molecular and Cellular Biology U.S.C. 154(b) by 0 days. 8(3):1247–1252, 1988.* Jobling et al, Mol-Microbiol. 5(7): 1755–67, 1991.* (21) Appl. No.: 09/438,185 Pir-68 Database Accession No. E72002 Kalmar et al. Apr. (22) Filed: Nov. 11, 1999 23, 1999.* (Under 37 CFR 1.47) * cited by examiner Primary Examiner Patricia A. Duffy Related U.S. Application Data (74) Attorney, Agent, or Firm Townsend and Townsend (60) Provisional application No. 60/128,606, filed on Apr. 8, and Crew 1999, and provisional application No. 60/108.279, filed on Nov. 12, 1998. (57) ABSTRACT Int. Cl...... A61K 3800; C07K 14/00 (51) Chlamydia pneumoniae polypeptides are provided. The C. (52) U.S. C...... 530/300; 530/350, 530/402; pneumoniae polypeptides can be used to prepare pharma 530/810; 530/811; 530/812; 530/813; 530/814; ceutical compositions for the treatment or prevention of 530/820; 530/825 disease. In addition, the proteins can be used in methods for (58) Field of Search ...... 435/183, 184; the diagnosis of C. pneumoniae infection. 514/2; 424/185.1, 190.1, 192.1, 193.1, 234.1, 263.1 6 Claims, No Drawings US 6,822,071 B1 1 2 POLYPEPTIDES FROM CHLAMYDIA C. pneumoniae is related to other Chlamydia Species, but PNEUMONIAE AND THEIR USE IN THE the level of sequence similarity is relatively low. Very little DIAGNOSIS, PREVENTION AND is known about the biology of this organism, although it TREATMENT OF DISEASE appears to be an important human pathogen. Allelic diver sity and Structural relationships between Specific genes of CROSS-REFERENCES TO RELATED Chlamydia species is described in Kaltenboeck et al. (1993) APPLICATIONS J Bacteriol 175(2):487–502; Gaydos et al. (1992) Infect The present application is related to No. 60/128,606, filed Immun 60(12):5319-5323; Everett et al. (1997) Int J Syst Apr. 8, 1999 and No. 60/108,279, filed Nov. 12, 1998, which Bacteriol 47(2):461–473; and Pudjiatmoko et al. (1997) Int are incorporated herein by reference. J Syst Bacteriol 47(2):425-431. A number of studies have been published describing STATEMENT AS TO RIGHTS TO INVENTIONS methods for detection of C. pneumoniae, and for distin MADE UNDER FEDERALLY SPONSORED guishing between Chlamydia Species. Such methods include RESEARCH AND DEVELOPMENT PCR detection (Rasmussen et al. (1992) Mol Cell Probes 15 6(5):389–394; Holland et al. (1990) J Infect Dis 162(4): 1. Field of the Invention 984-987); a simplified polymerase chain reaction-enzyme This invention relates to nucleic acids and polypeptides immunoassay (Wilson et al. (1996) J Appl Bacteriol 80(4): from Chlamydia pneumoniae and to their use in the 431–438); Sequence determination and restriction endonu diagnosis, prevention and treatment of diseases associated clease cleavage (Herrmann et al. (1996) J Clin Microbiol with C. pneumoniae. 34(8): 1897–1902). 2. Background of the Invention Antigenic and molecular analyses of different C. pneu Chlamydiaceae is a family of obligate intracellular para moniae strains is described in Jantos et al. (1997) Clin Site with a tropism for epithelial cells lining the mucus Microbiol 35(3):620–623. Some genes of C. pneumoniae membranes. The bacteria have two morphologically distinct have been isolated and Sequenced. These include the Gro E forms, “elementary body” and “reticulate body”. The 25 operon (Kikuta et al. (1991) Infect Immun 59(12): elementary body is the infectious form, and has a rigid cell 4665-4669); the major outer membrane protein Perez et al. wall, primarily of croSS-linked outer membrane proteins. (1991) Infect Immun 59(6):2195-2199; the DnaK protein The reticulate body is the intracellular, metabolically active homolog (Komak et al. (1991) Infect Immun 59(2): form. A unique developmental cycle between these two 721-725); as well as a number of ribosomal and other genes. forms characterizes Chlamydia growth. C. pneumoniae is a human respiratory pathogen that SUMMARY OF THE INVENTION causes acute respiratory disease, and approximately 10% of This invention provides the genomic Sequence of Chlamy community-acquired pneumonia Antibody prevalence Stud dia pneumoniae. The Sequence information is useful for a ies have shown that virtually everyone is infected with C. 35 variety of diagnostic and analytical methods. The genomic pneumoniae at Some time, and that reinfection is common. Sequence may be embodied in a variety of media, including In addition to respiratory disease, Studies have shown an computer readable forms, or as a nucleic acid comprising a asSociation of this organism with coronary artery disease. It Selected fragment of the Sequence. Such fragments generally has been demonstrated in atherOSclerotic lesions of the aorta consist of an open reading frame, transcriptional or transla and coronary arteries by immunocytochemistry and by poly 40 tional control elements, or fragments derived therefrom. merase chain reaction (Kuo et al. (1993) J Infect Dis Proteins encoded by the open reading frames are useful for 167(4):841–849). diagnostic purposes, as well as for their enzymatic or Recent reports have further demonstrated the presence of Structural activity. C. pneumoniae in the walls of abdominal aortic aneurysms (Juvonen et al. (1997).J. Vasc Surg. 25(3):499-505). Abdomi 45 DEFINITIONS nal aortic aneurysms are frequently associated with atherosclerosis, and inflammation may be an important The term "amino acid” refers to naturally occurring and factor in aneurysmal dilatation. C. pneumoniae may play a Synthetic amino acids, as well as amino acid analogs and role in maintaining an inflammation and triggering the amino acid mimetics that function in a manner Similar to the development of aortic aneurysms. 50 naturally occurring amino acids. Naturally occurring amino Muhlestein et al. (1996) JACC 27:1555–61, reported a acids are those encoded by the genetic code, as well as those differential incidence of Chlamydia species within the coro amino acids that are later modified, e.g., hydroxyproline, nary artery wall of patients with atherosclerosis verSuS those Y-carboxyglutamate, and O-phosphoSerine. Amino acid ana with other forms of cardiovascular disease. The extremely logs refers to compounds that have the Same basic chemical high rate of possible infection in patients with Symptomatic 55 Structure as a naturally occurring amino acid, i.e., an O. atherosclerotic disease compared to the very low rate in carbon that is bound to a hydrogen, a carboxyl group, an patients with normal coronary arteries or coronary artery amino group, and an R group., e.g., homoserine, norleucine, disease from chronic transplant rejection provides evidence methionine Sulfoxide, methionine methyl sulfonium Such for a direct link between the atherosclerotic process and analogs have modified R groups (e.g., norleucine) or modi Chlamydia infection. Because a history of chlamydial infec 60 fied peptide backbones, but retain the same basic chemical tion is So prevalent in the population, the issue of causality Structure as a naturally occurring amino acid. Amino acid remains. On a physiologic and pathologic level, abnormal mimetics refers to chemical compounds that have a structure interactions among endothelial cells, platelets, macrophages that is different from the general chemical Structure of an and lymphocytes may lead to a cascade of events resulting amino acid, but that functions in a manner Similar to a in acute endothelial damage, thrombosis and repair, chroni 65 naturally occurring amino acid. cally leading to the development of atheroma in blood Amino acids may be referred to herein by either their vessels. commonly known three letter symbols or by the one-letter US 6,822,071 B1 3 4 symbols recommended by the IUPAC-IUB Biochemical can be modified to yield a functionally identical molecule. Nomenclature Commission. Nucleotides, likewise, may be Accordingly, each Silen: Variation of a nucleic acid which referred to by their commonly accepted Single-letter codes. encodes a polypeptide is implicit in each described "Amplification' primers are oligonucleotides comprising Sequence. either natural or analogue nucleotides that can Serve as the AS to amino acid Sequences, one of skill will recognize basis for the amplification of a Select nucleic acid Sequence. that individual Substitutions, deletions or additions to a They include, e.g., polymerase chain reaction primerS and nucleic acid, peptide, polypeptide, or protein Sequence ligase chain reaction oligonucleotides. which alters, adds or deletes a Single amino acid or a Small “Antibody” refers to an immunoglobulin molecule able to percentage of amino acids in the encoded Sequence is a bind to a specific epitope on an antigen. Antibodies can be “conservatively modified variant” where the alteration a polyclonal mixture or monoclonal. Antibodies can be results in the Substitution of an amino acid with a chemically intact immunoglobulins derived from natural Sources or Similar amino acid. Conservative Substitution tables provid from recombinant Sources and can be immunoreactive por ing functionally similar amino acids are well known in the tions of intact immunoglobulins. Antibodies may exist in a 15 art. Such conservatively modified variants are in addition to variety of forms including, for example, Fv, F, and F(ab), and do not exclude polymorphic variants, interspecies as well as in Single chains. Single-chain antibodies, in which homologs, and alleles of the invention. genes for a heavy chain and a light chain are combined into a single coding Sequence, may also be used. The following groups each contain amino acids that are An “antigen' is a molecule that is recognized and bound conservative Substitutions for one another: by an antibody, e.g., peptides, carbohydrates, organic 1) Alanine (A), Glycine (G); molecules, or more complex molecules Such as glycolipids and glycoproteins. The part of the antigen that is the target 2) Serine (S), Threonine (T); of antibody binding is an antigenic determinant and a Small 3) Aspartic acid (D), Glutamic acid (E), functional group that corresponds to a single antigenic 4) Asparagine (N), Glutamine (Q); determinant is called a hapten. 25 5) Cysteine (C), Methionine (M); “Biological Sample” refers to any Sample obtained from a 6) Arginine (R), Lysine (K), Histidine (H); living or dead organism. Examples of biological Samples 7) Isoleucine (I), Leucine (L), Valine (V); and include biological fluids and tissue specimens. Such bio 8) (F), (Y), (W). See, logical Samples can be prepared for analysis of the presence e.g., Creighton, Proteins (1984)). of C. pneumoniae nucleic acids, proteins, or antibodies The terms “identical” or percent “identity,” in the context Specifically reactive with the proteins. of two or more nucleic acids or polypeptide sequences, refer The term “C. pneumoniae gene' shall be intended to mean to two or more Sequences or Subsequences that are the same the open reading frame encoding Specific C. pneumoniae or have a specified percentage of amino acid residues or polypeptides, as well as adjacent 5' and 3' non-coding 35 nucleotides that are the Same, when compared and aligned nucleotide Sequences involved in the regulation of for maximum correspondence over a comparison window, expression, up to about 2 kb beyond the coding region, but as measured using one of the following Sequence compari possibly further in either direction. The gene may be intro Son algorithms or by manual alignment and Visual inspec duced into an appropriate vector for extrachromosomal tion. This definition also refers to the complement of a test maintenance or for integration into a host genome. 40 Sequence, which has a designated percent Sequence or “Conservatively modified variants' applies to both amino Subsequence complementarity when the test Sequence has a acid and nucleic acid Sequences. With respect to particular designated or Substantial identity to a reference Sequence. nucleic acid Sequences, conservatively modified variants For example, a designated amino acid percent identity of refers to those nucleic acids which encode identical or 95% refers to Sequences or Subsequences that have at least essentially identical amino acid Sequences, or where the 45 about 95% amino acid identity when aligned for maximum nucleic acid does not encode an amino acid Sequence, to correspondence over a comparison window as measured essentially identical Sequences. Specifically, degenerate using one of the following Sequence comparison algorithms codon Substitutions may be achieved by generating or by manual alignment and Visual inspection. Such Sequences in which the third position of one or more Selected Sequences would then be said to have Substantial identity, or (or all) codons is Substituted with mixed-base and/or deoxyi 50 to be substantially identical to each other. Preferably, nosine residues (Batzer et al., Nucleic Acids Res. 19:5081 sequences have at least about 70% identity, more preferably (1991); Ohtsuka et al., J. Biol. Chem. 206:2605-2608 80% identity, more preferably 90-95% identity and above. (1985); Rossolini et al., Mol. Cell. Probes 8:91-98 (1994)). Preferably, the percent identity exists over a region of the Because of the degeneracy of the genetic code, a large Sequence that is at least about 25 amino acids in length, more number of functionally identical nucleic acids encode any 55 preferably over a region that is 50-100 amino acids in given protein. For instance, the codons GCA, GCC, GCG length. and GCU all encode the amino acid alanine. Thus, at every When percentage of Sequence identity is used in reference position where an alanine is Specified by a codon, the codon to proteins or peptides, it is recognized that residue positions can be altered to any of the corresponding codons described that are not identical often differ by conservative amino acid without altering the encoded polypeptide. Such nucleic acid 60 Substitutions, where amino acids residues are Substituted for variations are “Silent variations,” which are one species of other amino acid residues with Similar chemical properties conservatively modified variations. Every nucleic acid (e.g., charge or hydrophobicity) and therefore do not change Sequence herein which encodes a polypeptide also describes the functional properties of the molecule. Where Sequences every possible silent variation of the nucleic acid. One of differ in conservative Substitutions, the percent Sequence skill will recognize that each codon in a nucleic acid (except 65 identity may be adjusted upwards to correct for the conser AUG, which is ordinarily the only codon for methionine, vative nature of the substitution. Means for making this and TGG, which is ordinarily the only codon for tryptophan) adjustment are well known to those of skill in the art. US 6,822,071 B1 S 6 Typically this involves Scoring a conservative Substitution as (3.00), default gap length weight (0.10), and weighted end a partial rather than a full mismatch, thereby increasing the gaps. PILEUP can be obtained from the GCG sequence percentage Sequence identity. Thus, for example, where an analysis Software package, e.g., version 7.0 (Devereaux et identical amino acid is given a Score of 1 and a non al., Nuc. Acids Res. 12:387-395 (1984). conservative Substitution is given a Score of Zero, a conser Another example of algorithm that is Suitable for deter Vative Substitution is given a Score between Zero and 1. The mining percent Sequence identity (i.e., Substantial Similarity Scoring of conservative Substitutions is calculated according or identity) is the BLAST algorithm, which is described in to, e.g., the algorithm of Meyers & Miller, Computer Applic. Altschulet al., J. Mol. Biol. 215:403-410 (1990). Software Biol. Sci. 4:11-17 (1988) e.g., as implemented in the pro for performing BLAST analyses is publicly available gram PC/GENE (Intelligenetics, Mountain View, Calif., through the National Center for Biotechnology Information USA). (http://www.ncbi.nlm.nih.gov/). This algorithm involves For Sequence comparison, typically one Sequence acts as first identifying high Scoring sequence pairs (HSPs) by a reference Sequence, to which test Sequences are compared. identifying short words of length W in the query Sequence, When using a Sequence comparison algorithm, test and which either match or satisfy some positive-valued threshold reference Sequences are entered into a computer, Subse 15 Score T when aligned with a word of the Same length in a quence coordinates are designated, if necessary, and database Sequence. Tis referred to as the neighborhood word Sequence algorithm program parameters are designated. Score threshold (Altschul et al., Supra). These initial neigh Default program parameters can be used, or alternative borhood word hits act as Seeds for initiating Searches to find parameters can be designated. The Sequence comparison longer HSPs containing them. The word hits are then algorithm then calculates the percent Sequence identity for extended in both directions along each Sequence for as far as the test Sequence(s) relative to the reference Sequence, based the cumulative alignment Score can be increased. Cumula on the designated or default program parameters. tive Scores are calculated using, for nucleotide Sequences, A comparison window includes reference to a Segment of the parameters M (reward Score for a pair of matching any one of the number of contiguous positions Selected from residues; alwayS>0) and N (penalty Score for mismatching the group consisting of from 25 to 600, usually about 50 to 25 residues, always.<0). For amino acid sequences, a Scoring about 200, more usually about 100 to about 150 in which a matrix is used to calculate the cumulative Score. Extension Sequence may be compared to a reference Sequence of the of the word hits in each direction are halted when: the Same number of contiguous positions after the two cumulative alignment Score falls off by the quantity X from Sequences are optimally aligned. Methods of alignment of its maximum achieved value; the cumulative Score goes to Sequences for comparison are well-known in the art. Opti Zero or below, due to the accumulation of one or more mal alignment of Sequences for comparison can be negative-Scoring residue alignments, or the end of either conducted, e.g., by the local homology algorithm of Smith sequence is reached. The BLAST algorithm parameters W. & Waterman, Adv. Appl. Math. 2:482 (1981), by the homol T, and X determine the Sensitivity and Speed of the align ogy alignment algorithm of Needleman & Wunsch, J. Mol. ment. The BLASTN program (for nucleotide sequences) Biol. 48:443 (1970), by the search for similarity method of 35 uses as defaults a wordlength (W) of 11, an expectation (E) Pearson & Lipman, Proc. Natl. Acad. Sci. USA 85:2444 of 10, M=5, N=4, and a comparison of both strands. For (1988), by computerized implementations of these algo amino acid Sequences, the BLASTP program uses as default rithms (GAP, BESTFIT, FASTA, and TFASTA in the Wis parameters a wordlength (W) of 3, an expectation (E) of 10, consin Genetics Software Package, Genetics Computer and the BLOSUM62 scoring matrix (see Henikoff & Group, 575 Science Dr., Madison, Wis.), or by manual 40 Henikoff, Proc. Natl. Acad. Sci. USA 89:10915 (1989)). alignment and visual inspection (See, e.g., Ausubel et al., The BLAST algorithm also performs a statistical analysis Supra). of the Similarity between two sequences (see, e.g., Karlin & One example of a useful algorithm is PILEUP, PILEUP Altschul, Proc. Natl. Acad. Sci. USA 90:5873-5787 creates a multiple Sequence alignment from a group of (1993)). One measure of similarity provided by the BLAST related Sequences using progressive, pairwise alignments to 45 algorithm is the smallest sum probability (P(N)), which show relationship and percent Sequence identity. It also plots provides an indication of the probability by which a match a tree or dendogram showing the clustering relationships between two nucleotide or amino acid Sequences would used to create the alignment. PILEUP uses a simplification occur by chance. For example, a nucleic acid is considered of the progressive alignment method of Feng & Doolittle, J. Similar to a reference Sequence if the Smallest Sum prob Mol. Evol. 35:351–360 (1987). The method used is similar 50 ability in a comparison of the test nucleic acid to the to the method described by Higgins & Sharp, CABIOS reference nucleic acid is less than about 0.1, more preferably 5:151-153 (1989). The program can align up to 300 less than about 0.01, and most preferably less than about Sequences, each of a maximum length of 5,000 nucleotides O.OO1. or amino acids. The multiple alignment procedure begins An indication that two nucleic acid Sequences or polypep with the pairwise alignment of the two most similar 55 tides are Substantially identical is that the polypeptide Sequences, producing a cluster of two aligned Sequences. encoded by the first nucleic acid is immunologically croSS This cluster is then aligned to the next most related Sequence reactive with the antibodies raised against the polypeptide or cluster of aligned Sequences. Two clusters of Sequences encoded by the Second nucleic acid, as described below. are aligned by a simple extension of the pairwise alignment Thus, a polypeptide is typically Substantially identical to a of two individual Sequences. The final alignment is achieved 60 Second polypeptide, for example, where the two peptides by a Series of progressive, pairwise alignments. The program differ only by conservative substitutions. Another indication is run by designating Specific Sequences and their amino acid that two nucleic acid Sequences are Substantially identical is or nucleotide coordinates for regions of Sequence compari that the two molecules or their complements hybridize to Son and by designating the program parameters. Using each other under Stringent conditions, as described below. PILEUP, a reference sequence is compared to other test 65 Another indication that polynucleotide Sequences are Sub Sequences to determine the percent Sequence identity rela stantially identical is if two molecules hybridize to each tionship using the following parameters: default gap weight other under Stringent conditions. Stringent conditions are US 6,822,071 B1 7 8 sequence dependent and will be different in different cir interacting in a deleterious manner with any of the other cumstances. Generally, Stringent conditions are Selected to components of the pharmaceutical composition. be about 5 C. lower than the thermal melting point (Tm) for The terms “polypeptide,” “peptide' and “protein' are the Specific Sequence at a defined ionic Strength and pH. The used interchangeably herein to refer to a polymer of amino Tm is the temperature (under defined ionic strength and pH) acid residues. The terms apply to amino acid polymers in at which 50% of the target sequence hybridizes to a perfectly which one or more amino acid residue is an analog or matched probe. Typically Stringent conditions for a Southern mimetic of a corresponding naturally occurring amino acid, blot protocol involve hybridizing in a buffer comprising as well as to naturally occurring amino acid polymers. 5xSSC, 1% SDS at 65° C. or hybridizing in a buffer The phrase “specifically or selectively hybridizing to.” containing 5xSSC and 1% SDS at 42 C. and washing at 65 refers to hybridization between a probe and a target C. with a 0.2xSSC, 0.1% SDS wash. Sequence in which the probe binds Substantially only to the A "label' is a composition detectable by Spectroscopic, target Sequence, forming a hybridization complex, when the photochemical, biochemical, immunochemical, or chemical target is in a heterogeneous mixture of polynucleotides and means. For example, useful labels include 'P, fluorescent other compounds. Such hybridization is determinative of the dyes, electron-dense reagents, enzymes (e.g., as commonly 15 presence of the target Sequence. Although the probe may used in an ELISA), biotin, dioxigenin, or haptens and bind other unrelated sequences, at least 90%, preferably proteins for which antisera or monoclonal antibodies are 95% or more of the hybridization complexes formed are available. with the target Sequence. The term “nucleic acid” refers to deoxyribonucleotides or The term “recombinant' when used with reference to a ribonucleotides and polymers thereof in either Single- or cell, or nucleic acid, or vector, indicates that the cell, or double-Stranded form. The term encompasses nucleic acids nucleic acid, or vector, has been modified by the introduc containing known nucleotide analogs or modified backbone tion of a heterologous nucleic acid or the alteration of a residues or linkages, which are Synthetic, naturally native nucleic acid, or that the cell is derived from a cell So occurring, and non-naturally occurring, which have similar modified. Thus, for example, recombinant cells express binding properties as the reference nucleic acid, and which 25 genes that are not found within the native (non-recombinant) are metabolized in a manner Similar to the reference nucle form of the cell or express native genes that are otherwise otides. Examples of Such analogs include, without abnormally expressed, under expressed or not expressed at limitation, phosphorothioates, phosphoramidates, methyl all. phosphonates, chiral-methyl phosphonates, 2-O-methyl The phrase “specifically immunoreactive with', when ribonucleotides, peptide-nucleic acids (PNAS). referring to a protein or peptide, refers to a binding reaction Unless otherwise indicated, a particular nucleic acid between the protein and an antibody which is determinative sequence also implicitly encompasses conservatively modi of the presence of the protein in the presence of a hetero fied variants thereof (e.g., degenerate codon Substitutions) geneous population of proteins and other compounds. Thus, and complementary Sequences, as well as the Sequence under designated immunoassay conditions, the Specified explicitly indicated. The term nucleic acid is used inter 35 antibodies bind to a particular protein and do not bind in a changeably with gene, cDNA, mRNA oligonucleotide, and Significant amount to other proteins present in the Sample. polynucleotide. Specific binding to an antibody under Such conditions may AS used herein a “nucleic acid probe or oligonucleotide' require an antibody that is Selected for its Specificity for a is defined as a nucleic acid capable of binding to a target particular protein. A variety of immunoassay formats may be nucleic acid of complementary Sequence through one or 40 used to Select antibodies Specifically immunoreactive with a more types of chemical bonds, usually through complemen particular protein and are described in detail below. tary base pairing, usually through hydrogen bond formation. The phrase “substantially pure” or “isolated” when refer AS used herein, a probe may include natural (i.e., A, G, C, ring to a Chlamydia peptide or protein, means a chemical or T) or modified bases (7-deazaguanosine, inosine, etc.). In composition which is free of other Subcellular components addition, the bases in a probe may be joined by a linkage 45 of the Chlamydia organism. Typically, a monomeric protein other than a phosphodiester bond, So long as it does not is substantially pure when at least about 85% or more of a interfere with hybridization. Thus, for example, probes may Sample exhibits a Single polypeptide backbone. Minor vari be peptide nucleic acids in which the constituent bases are ants or chemical modifications may typically share the same joined by peptide bonds rather than phosphodiester linkages. polypeptide Sequence. Depending on the purification It will be understood by one of skill in the art that probes 50 procedure, purities of 85%, and preferably over 95% pure may bind target Sequences lacking complete complementa are possible. Protein purity or homogeneity may be indi rity with the probe Sequence depending upon the Stringency cated by a number of means well known in the art, Such as of the hybridization conditions. The probes are preferably polyacrylamide gel electrophoresis of a protein Sample, directly labeled as with isotopes, chromophores, lumiphores, followed by Visualizing a single polypeptide band on a chromogens, or indirectly labeled Such as with biotin to 55 polyacrylamide gel upon Silver Staining. For certain pur which a Streptavidin complex may later bind. By assaying poses high resolution will be needed and HPLC or a similar for the presence or absence of the probe, one can detect the means for purification utilized. presence or absence of the Select Sequence or Subsequence. A labeled nucleic acid probe or oligonucleotide is one that DETAILED DESCRIPTION is bound, either covalently, through a linker, or through 60 The present invention provides the nucleotide Sequence of ionic, van der Waals or hydrogen bonds to a label Such that the C. pneumoniae genome SEQ ID NO: 1 or a represen the presence of the probe may be detected by detecting the tative fragment thereof, in a form which can be readily used, presence of the label bound to the probe. analyzed, and interpreted by a skilled artisan. AS used "Pharmaceutically acceptable” means a material that is herein, a “representative fragment of the nucleotide not biologically or otherwise undesirable, i.e., the material 65 sequence depicted in SEQ ID NO: 1 refers to any portion can be administered to an individual along with a Chlamydia which is not presently represented within a publicly avail antigen without causing any undesirable biological effects or able database. Preferred representative fragments of the US 6,822,071 B1 9 10 present invention are open reading frames, expression of performing hybridization reactions in array based formats modulating fragments, uptake modulating fragments, and are well known to those of skill in the art (see, e.g., Pastinen fragments which can be used to diagnose the presence of C. (1997) Genome Res. 7: 606–614; Jackson (1996) Nature pneumoniae in Sample. Using the information provided in Biotechnology 14:1685; Chee (1995) Science 274: 610; WO the present application, together with routine cloning and 5 96/17958. Sequencing methods, one of ordinary skill in the art will be Arrays, particularly nucleic acid arrays can be produced able to clone and Sequence all “representative fragments of according to a wide variety of methods well known to those interest including open reading frames (ORFs) encoding a of skill in the art. For example, in a simple embodiment, large variety of C. pneumoniae proteins. A non-limiting “low density' arrays can simply be produced by Spotting identification of Such preferred representative fragments is (e.g. by hand using a pipette) different nucleic acids at provided in Table 2 (see Sequence Listing, SEQ ID different locations on a Solid Support (e.g. a glass Surface, a NOS:2–1074). membrane, etc.). Diagnostic use of C. pneumoniae Nucleic Acids This simple spotting, approach has been automated to Hybridization-based Assays produce high density Spotted arrays (see, e.g., U.S. Pat. No. Using the nucleic acids disclosed here, one of skill can 15 5,807,522). This patent describes the use of an automated design nucleic acid hybridization-based assays for the detec Systems that taps a microcapillary against a Surface to tion of C. pneumoniae. Any of a number of well known deposit a Small volume of a biological Sample. The process techniques for the Specific detection of target nucleic acids is repeated to generate high density arrayS. ArrayS can also can be used. Exemplary hybridization-based assays include, be produced using oligonucleotide Synthesis technology. but are not limited to, traditional “direct probe' methods Thus, for example, U.S. Pat. No. 5,143,854 and PCT patent Such as Southern Blots, dot blots, in situ hybridization (e.g., publication Nos. WO 90/15070 and 92/10092 teach the use FISH), PCR, and the like. The methods can be used in a wide of light-directed combinatorial Synthesis of high density variety of formats including, but not limited to Substrate oligonucleotide arrayS. (e.g. membrane or glass) bound methods or array-based Many methods for immobilizing nucleic acids on a vari approaches as described below. AS noted above, this inven 25 ety of Solid Surfaces are known in the art. A wide variety of tion also embraces methods for detecting the presence of organic and inorganic polymers, as well as other materials, Chlamydia DNA or RNA in biological samples. These both natural and Synthetic, can be employed as the material Sequences can be used to detect Chlamydia in biological for the Solid Surface. Illustrative Solid Surfaces include, e.g., Samples from patients Suspected of being infected. A variety nitrocellulose, nylon, glass, quartz, diazotized membranes of methods of specific DNA and RNA measurement using (paper or nylon), Silicones, polyformaldehyde, cellulose, nucleic acid hybridization techniques are known to those of and cellulose acetate. In addition, plastics Such as skill in the art (see Sambrook et al., Supra). polyethylene, polypropylene, polystyrene, and the like can In Situ hybridization assays are well known (e.g., Angerer be used. Other materials which may be employed include (1987) Meth. Enzymol 152: 649). Generally, in situ hybrid paper, ceramics, metals, metalloids, Semiconductive ization comprises the following major steps: (1) fixation of 35 materials, cermets or the like. In addition, Substances that tissue or biological structure to analyzed; (2) prehybridiza form gels can be used. Such materials include, e.g., proteins tion treatment of the biological Structure to increase acces (e.g., gelatins), lipopolysaccharides, Silicates, agarose and Sibility of target DNA, and to reduce nonspecific binding; polyacrylamides. Where the Solid Surface is porous, various (3) hybridization of the mixture of nucleic acids to the pore sizes may be employed depending upon the nature of nucleic acid in the biological structure or tissue; (4) post 40 the System. hybridization washes to remove nucleic acid fragments not In preparing the Surface, a plurality of different materials bound in the hybridization and (5) detection of the hybrid may be employed, particularly as laminates, to obtain vari ized nucleic acid fragments. The reagent used in each of ous properties. For example, proteins (e.g., bovine Serum these StepS and the conditions for use vary depending on the albumin) or mixtures of macromolecules (e.g., Denhardt's particular application. 45 Solution) can be employed to avoid non-specific binding, In a typical in Situ hybridization assay, cells are fixed to Simplify covalent conjugation, enhance Signal detection or a Solid Support, typically a glass slide. If a nucleic acid is to the like. If covalent bonding between a compound and the be probed, the cells are typically denatured with heat or Surface is desired, the Surface will usually be polyfunctional alkali. The cells are then contacted with a hybridization or be capable of being polyfunctionalized. Functional Solution at a moderate temperature to permit annealing of 50 groups which may be present on the Surface and used for labeled probes Specific to the nucleic acid Sequence encod linking can include carboxylic acids, aldehydes, amino ing the protein. The targets (e.g., cells) are then typically groups, cyano groups, ethylenic groups, hydroxyl groups, washed at a predetermined Stringency or at an increasing mercapto groups and the like. The manner of linking a wide Stringency until an appropriate Signal to noise ratio is variety of compounds to various Surfaces is well known and obtained. 55 is amply illustrated in the literature. The nucleic acids of this invention are particularly well For example, methods for immobilizing nucleic acids by Suited to array-based hybridization formats. Arrays are a introduction of various functional groups to the molecules is multiplicity of different “probe" or “target” nucleic acids (or known (see, e.g., Bischoff (1987) Anal. Biochem., 164: other compounds) attached to one or more Surfaces (e.g., 336-344; Kremsky (1987) Nucl. Acids Res. 15: 2891-2910). Solid, membrane, or gel). In a preferred embodiment, the 60 Modified nucleotides can be placed on the target using PCR multiplicity of nucleic acids (or other moieties) is attached primers containing the modified nucleotide, or by enzymatic to a Single contiguous Surface or to a multiplicity of Surfaces end labeling with modified nucleotides. Use of glass or juxtaposed to each other. membrane Supports (e.g., nitro cellulose, nylon, In an array format a large number of different hybridiza polypropylene) for the nucleic acid arrays of the invention is tion reactions can be run essentially “in parallel.” This 65 advantageous because of well developed technology provides rapid, essentially simultaneous, evaluation of a employing manual and robotic methods of arraying targets number of hybridizations in a single “experiment'. Methods at relatively high element densities. Such membranes are US 6,822,071 B1 11 12 generally available and protocols and equipment for hybrid Detection of a hybridization complex may require the ization to membranes is well known. binding of a signal generating complex to a duplex of target Target elements of various sizes, ranging from 1 mm and probe polynucleotides or nucleic acids. Typically, Such diameter down to 1 um can be used. Smaller target elements binding occurs through ligand and anti-ligand interactions as containing low amounts of concentrated, fixed probe DNA 5 between a ligand-conjugated probe and an anti-ligand con are used for high complexity comparative hybridizations jugated with a signal. Since the total amount of Sample available for binding to The sensitivity of the hybridization assays may be each target element will be limited. Thus it is advantageous enhanced through use of a nucleic acid amplification System to have Small array target elements that contain a Small that multiplies the target nucleic acid being detected. amount of concentrated probe DNA So that the Signal that is Examples of Such Systems include the polymerase chain obtained is highly localized and bright. Such Small array reaction (PCR) system and the ligase chain reaction (LCR) target elements are typically used in arrays with densities system. Other methods recently described in the art are the greater than 10"/cm. Relatively simple approaches capable nucleic acid Sequence based amplification (NASBAO, of quantitative fluorescent imaging of 1 cm areas have been Cangene, Mississauga, Ontario) and Q Beta Replicase Sys described that permit acquisition of data from a large num 15 temS. ber of target elements in a single image (see, e.g., Wittrup Nucleic acid hybridization Simply involves providing a (1994) Cytometry 16:206-213). denatured probe and target nucleic acid under conditions If fluorescently labeled nucleic acid Samples are used, where the probe and its complementary target can form arrays on Solid Surface Substrates with much lower fluores Stable hybrid duplexes through complementary base pairing. cence than membranes, Such as glass, quartz, or Small beads, The nucleic acids that do not form hybrid duplexes are then can achieve much better Sensitivity. Substrates Such as glass washed away leaving the hybridized nucleic acids to be or fused Silica are advantageous in that they provide a very detected, typically through detection of an attached detect low fluorescence Substrate, and a highly efficient hybridiza able label. It is generally recognized that nucleic acids are tion environment. Covalent attachment of the target nucleic denatured by increasing the temperature or decreasing the acids to glass or Synthetic fused Silica can be accomplished 25 Salt concentration of the buffer containing the nucleic acids, according to a number of known techniques (described or in the addition of chemical agents, or the raising of the above). Nucleic acids can be conveniently coupled to glass pH. Under low Stringency conditions (e.g., low temperature using commercially available reagents. For instance, mate and/or high Salt and/or high target concentration) hybrid rials for preparation of Silanized glass with a number of duplexes (e.g., DNA:DNA, RNA:RNA, or RNA:DNA) will functional groups are commercially available or can be form even where the annealed Sequences are not perfectly prepared using Standard techniques (see, e.g., Gait (1984) complementary. Thus Specificity of hybridization is reduced Oligonucleotide Synthesis: A Practical Approach, IRL Press, at lower stringency. Conversely, at higher stringency (e.g., Wash., D.C.). Quartz cover slips, which have at least 10-fold higher temperature or lower salt) Successful hybridization lower autofluorescence than glass, can also be Silanized. requires fewer mismatches. Alternatively, probes can also be immobilized on com 35 One of skill in the art will appreciate that hybridization mercially available coated beads or other Surfaces. For conditions may be selected to provide any degree of Strin instance, biotin end-labeled nucleic acids can be bound to gency. In a preferred embodiment, hybridization is per commercially available avidin-coated beads. Streptavidin or formed at low Stringency to ensure hybridization and then anti-digoxigenin antibody can also be attached to Silanized Subsequent washes are performed at higher Stringency to glass slides by protein-mediated coupling using e.g., protein 40 eliminate mismatched hybrid duplexes. Successive washes A following Standard protocols (see, e.g., Smith (1992) may be performed at increasingly higher Stringency (e.g., Science 258: 1122-1126). Biotin or digoxigenin end-labeled down to as low as 0.25xSSPE-T at 37° C. to 70° C) until a nucleic acids can be prepared according to Standard tech desired level of hybridization specificity is obtained. Strin niques. Hybridization to nucleic acids attached to beads is gency can also be increased by addition of agents Such as accomplished by Suspending them in the hybridization mix, 45 formamide. Hybridization specificity may be evaluated by and then depositing them on the glass Substrate for analysis comparison of hybridization to the test probes with hybrid after Washing. Alternatively, paramagnetic particles, Such as ization to the various controls that can be present. ferric oxide particles, with or without avidin coating, can be In general, there is a tradeoff between hybridization used. Specificity (Stringency) and Signal intensity. Thus, in a A variety of other nucleic acid hybridization formats are 50 preferred embodiment, the wash is performed at the highest known to those skilled in the art. For example, common Stringency that produces consistent results and that provides formats include Sandwich assays and competition or dis a signal intensity greater than approximately 10% of the placement assayS. Hybridization techniques are generally background intensity. Thus, in a preferred embodiment, the described in Hames and Higgins (1985) Nucleic Acid hybridized array may be washed at Successively higher Hybridization, A Practical Approach, IRL Press; Gall and 55 Stringency Solutions and read between each wash. Analysis Pardue (1969) Proc. Natl. Acad. Sci. USA 63: 378-383; and of the data Sets thus produced will reveal a wash Stringency John et al. (1969) Nature 223: 582–587. above which the hybridization pattern is not appreciably Sandwich assays are commercially useful hybridization altered and which provides adequate Signal for the particular assays for detecting or isolating nucleic acid Sequences. probes of interest. Such assays utilize a “capture' nucleic acid covalently 60 Methods of optimizing hybridization conditions are well immobilized to a Solid Support and a labeled “signal nucleic known to those of skill in the art (see, eg., Tijssen (1993) acid in Solution. The Sample will provide the target nucleic Laboratory Techniques in Biochemistry and Molecular acid. The “capture' nucleic acid and “signal” nucleic acid Biology, Vol. 24: Hybridization With Nucleic Acid Probes, probe hybridize with the target nucleic acid to form a Elsevier, N.Y.). “sandwich’ hybridization complex. To be most effective, the 65 Labeling and Detection of Nucleic Acids Signal nucleic acid should not hybridize with the capture In a preferred embodiment, the hybridized nucleic acids nucleic acid. are detected by detecting one or more labels attached to the US 6,822,071 B1 13 14 Sample or probe nucleic acids. The labels may be incorpo acceptor. Alternatively, luciferins can be used in conjunction rated by any of a number of means well known to those of with luciferase or lucigenins to provide bioluminescence. skill in the art. Means of attaching labels to nucleic acids Spin labels are provided by reporter molecules with an include, for example nick translation to end-labeling (e.g. unpaired electron Spin which can be detected by electron with a labeled RNA) by kinasing of the nucleic acid and Spin resonance (ESR) spectroscopy. Exemplary spin labels Subsequent attachment (ligation) of a nucleic acid linker include organic free radicals, transitional metal complexes, joining the sample nucleic acid to a label (e.g., a particularly Vanadium, copper, iron, and manganese, and the like. Exemplary Spin labels include nitroxide free radicals. fluorophore). A wide variety of linkers for the attachment of The label may be added to the target (sample) nucleic labels to nucleic acids are also known. In addition, interca acid(s) prior to, or after the hybridization. So called “direct lating dyes and fluorescent nucleotides can also be used. labels” are detectable labels that are directly attached to or Detectable labels suitable for use in the present invention incorporated into the target (sample) nucleic acid prior to include any composition detectable by Spectroscopic, hybridization. In contrast, so called “indirect labels” are photochemical, biochemical, immunochemical, electrical, joined to the hybrid duplex after hybridization. Often, the optical or chemical means. Useful labels in the present indirect label is attached to a binding moiety that has been invention include biotin for staining with labeled streptavi 15 attached to the target nucleic acid prior to the hybridization. din conjugate, magnetic beads (e.g., Dynabeads"), fluores Thus, for example, the target nucleic acid may be biotiny cent dyes (e.g., fluorescein, texas red, rhodamine, green lated before the hybridization. After hybridization, an fluorescent protein, and the like, See, e.g., Molecular Probes, avidin-conjugated fluorophore will bind the biotin bearing Eugene, Oreg., USA), radiolabels (e.g., H, I, S, 'C, or hybrid duplexes providing a label that is easily detected. For fP), enzymes (e.g., horse radish peroxidase, alkaline phos a detailed review of methods of labeling nucleic acids and phatase and others commonly used in an ELISA), and detecting labeled hybridized nucleic acids See Laboratory colorimetric labels Such as colloidal gold (e.g., gold particles Techniques in Biochemistry and Molecular Biology, Vol. 24: in the 40–80 nm diameter size range Scatter green light with Hybridization With Nucleic Acid Probes, P. Tijssen, ed. high efficiency) or colored glass or plastic (e.g., polystyrene, Elsevier, N.Y., (1993)). polypropylene, latex, etc.) beads. Patents teaching the use of 25 Fluorescent labels are easily added during an in vitro Such labels include U.S. Pat. Nos. 3,817,837; 3,850,752; transcription reaction. Thus, for example, fluorescein 3,939,350; 3,996.345; 4,277,437; 4,275,149; and 4,366,241. labeled UTP and CTP can be incorporated into the RNA A fluorescent label is preferred because it provides a very produced in an in vitro transcription. Strong Signal with low background. It is also optically The labels can be attached directly or through a linker detectable at high resolution and Sensitivity through a quick moiety. In general, the site of label or linker-label attachment Scanning procedure. The nucleic acid Samples can all be is not limited to any Specific position. For example, a label labeled with a single label, e.g., a single fluorescent label. may be attached to a nucleoside, nucleotide, or analogue Alternatively, in another embodiment, different nucleic acid thereof at any position that does not interfere with detection Samples can be simultaneously hybridized where each or hybridization as desired. For example, certain Label-ON nucleic acid Sample has a different label. For instance, one 35 Reagents from Clontech (Palo Alto, Calif.) provide for target could have a green fluorescent label and a Second labeling interspersed throughout the phosphate backbone of target could have a red fluorescent label. The Scanning Step an oligonucleotide and for terminal labeling at the 3' and 5' will distinguish cites of binding of the red label from those ends. AS Shown for example herein, labels can be attached binding the green fluorescent label. Each nucleic acid at positions on the ribose ring or the ribose can be modified Sample (target nucleic acid) can be analyzed independently 40 and even eliminated as desired. The base moieties of useful from one another. labeling reagents can include those that are naturally occur Suitable chromogens which can be employed include ring or modified in a manner that does not interfere with the those molecules and compounds which absorb light in a purpose to which they are put. Modified bases include but distinctive range of wavelengths. So that a color can be are not limited to 7-deaZaA and G, 7-deaza-8-aza A and G, observed or, alternatively, which emit light when irradiated 45 and other heterocyclic moieties. with radiation of a particular wave length or wave length It will be recognized that fluorescent labels are not to be range, e.g., fluorescers. limited to Single Species organic molecules, but include Desirably, fluorescers should absorb light above about inorganic molecules, multi-molecular mixtures of organic 300 nm, preferably about 350 nm, and more preferably and/or inorganic molecules, crystals, heteropolymers, and above about 400 nm, usually emitting at wavelengths greater 50 the like. Thus, for example, CdSe-CdS core-shell nanocrys than about 10 nm higher than the wavelength of the light tals enclosed in a Silica Shell can be easily derivatized for absorbed. It should be noted that the absorption and emis coupling to a biological molecule (Bruchez et al (1998) sion characteristics of the bound dye can differ from the Science, 281: 2013-2016). Similarly, highly fluorescent unbound dye. Therefore, when referring to the various quantum dots (Zinc Sulfide-capped cadmium Selenide) have wavelength ranges and characteristics of the dyes, it is 55 been covalently coupled to biomolecules for use in ultra intended to indicate the dyes as employed and not the dye sensitive biological detection (Warren and Nie (1998) which is unconjugated and characterized in an arbitrary Science, 281: 2016-2018). Solvent. Amplification-based ASSayS Fluorescers are generally preferred because by irradiating In another embodiment, amplification-based assays can a fluorescer with light, one can obtain a plurality of emis 60 be used to detect nucleic acids. In Such amplification-based Sions. Thus, a Single label can provide for a plurality of assays, the nucleic acid Sequences act as a template in an measurable events. amplification reaction (e.g. Polymerase Chain Reaction Detectable Signal can also be provided by chemilumines (PCR). Detailed protocols for quantitative PCR are provided cent and bioluminescent Sources. Chemiluminescent Sources in Innis et al. (1990) PCR Protocols, A Guide to Methods include a compound which becomes electronically excited 65 and Applications, Academic Press, Inc. N.Y.). by a chemical reaction and can then emit light which Serves Other suitable amplification methods include, but are not as the detectable Signal or donates energy to a fluorescent limited to ligase chain reaction (LCR) (see Wu and Wallace US 6,822,071 B1 15 16 (1989) Genomics 4: 560, Landegren et al. (1988) Science tides can be tested for their ability to induce 241: 1077, and Barringer et al. (1990) Gene 89: 117, lymphoproliferation, T cell cytotoxicity, or cytokine produc transcription amplification (Kwoh et al (1989) Proc. Natl. tion using Standard techniques. Acad. Sci. USA 86: 1173), and self-sustained sequence The particular procedure used to introduce the genetic replication (Guatelli et al. (1990) Proc. Nat. Acad. Sci. USA material into the host cell for expression of the polypeptide 87: 1874). is not particularly critical. Any of the well known procedures Detection of C. pneumoniae Gene Expression for introducing foreign nucleotide Sequences into host cells may be used. These include the use of calcium phosphate The nucleic acids of the invention can also be used to C. transfection, Spheroplasts, electroporation, liposomes, pneumoniae detect gene transcripts. Methods of detecting microinjection, plasmid vectors, viral vectors and any of the and/or quantifying gene transcripts using nucleic acid other well known methods for introducing cloned genomic hybridization techniques are known to those of skill in the DNA, cDNA, synthetic DNA or other foreign genetic mate art (see Sambrook et al. Supra). For example, a Northern rial into a host cell (see Sambrook et al., Supra). It is only transfer may be used for the detection of the desired mRNA necessary that the particular procedure utilized be capable of directly. In brief, the mRNA is isolated from a given cell Successfully introducing at least one gene into the host cell Sample using, for example, an acid guanidinium-phenol 15 which is capable of expressing the gene. chloroform extraction method. The mRNA is then electro Any of a number of well known cells and cell lines can be phoresed to separate the mRNA species and the mRNA is used to express the polypeptides of the invention. For transferred from the gel to a nitrocellulose membrane. AS instance, prokaryotic cells Such as E. coli can be used. with the Southern blots, labeled probes are used to identify Eukaryotic cells include, yeast, Chinese hamster ovary and/or quantify the target mRNA. (CHO) cells, COS cells, and insect cells. In another preferred embodiment, the gene transcript can The particular vector used to transport the genetic infor be measured using amplification (e.g. PCR) based methods mation into the cell is also not particularly critical. Any of as described above for directly assessing copy number of the the conventional vectors used for expression of recombinant target Sequences. proteins in prokaryotic and eukaryotic cells may be used. Expression of C. pneumoniae Proteins 25 Expression vectors for mammalian cells typically contain The nucleic acids disclosed here can be used for recom regulatory elements from eukaryotic viruses. binant expression of the proteins. In these methods, the The expression vector typically contains a transcription nucleic acids encoding the proteins of interest are introduced unit or expression cassette that contains all the elements into suitable host cells, followed by induction of the cells to required for the expression of the polypeptide DNA in the produce large amounts of the protein. The invention relies on host cells. A typical expression cassette contains a promoter routine techniques in the field of disclosing the general operably linked to the DNA sequence encoding a polypep methods of use in this invention is Sambrook et al., Molecu tide and signals required for efficient polyadenylation of the lar Cloning, A Laboratory Manual (2nd ed. 1989). transcript. The term “operably linked” as used herein refers Standard transfection methods are used to produce to linkage of a promoter upstream from a DNA sequence prokaryotic, mammalian, yeast or insect cell lines which 35 such that the promoter mediates transcription of the DNA express large quantities of the desired polypeptide, which is Sequence. The promoter is preferably positioned about the then purified using standard techniques (see. e.g., Colley et Same distance from the heterologous transcription start Site al., J. Biol. Chem. 264:17619-17622, 1989; Guide to Protein as it is from the transcription Start site in its natural Setting. Purification, Supra). AS is known in the art, however, Some variation in this The nucleotide Sequences used to transfect the host cells 40 distance can be accommodated without loSS of promoter can be modified to yield Chlamydia polypeptides with a function. variety of desired properties. For example, the polypeptides Following the growth of the recombinant cells and can vary from the naturally-occurring Sequence at the pri expression of the polypeptide, the culture medium is har mary Structure level by amino acid, insertions, Substitutions, Vested for purification of the Secreted protein. The media are deletions, and the like. These modifications can be used in 45 typically clarified by centrifugation or filtration to remove a number of combinations to produce the final modified cells and cell debris and the proteins are concentrated by protein chain. adsorption to any Suitable resin or by use of ammonium The amino acid Sequence variants can be prepared with Sulfate fractionation, polyethylene glycol precipitation, or various objectives in mind, including facilitating purification by ultrafiltration. Other routine means known in the art may and preparation of the recombinant polypeptide. The modi 50 be equally Suitable. Further purification of the polypeptide fied polypeptides are also useful for modifying plasma half can be accomplished by Standard techniques, for example, life, improving therapeutic efficacy, and lessening the Sever affinity chromatography, ion exchange chromatography, Siz ity or occurrence of Side effects during therapeutic use. The ing chromatography, His tagging and Ni-agarose chroma amino acid Sequence variants are usually predetermined tography (as described in Dobeli et al., Mol and Biochem. variants not found in nature but exhibit the same immuno 55 Parasit. 41:259-268 (1990)), or other protein purification genic activity as naturally occurring protein. In general, techniques to obtain homogeneity. The purified proteins are modifications of the Sequences encoding the polypeptides then used to produce pharmaceutical compositions, as may be readily accomplished by a variety of well-known described below. techniques, Such as Site-directed mutagenesis (see Gillman An alternative method of preparing recombinant polypep & Smith, Gene 8:81–97 (1979); Roberts et al., Nature 60 tides useful as vaccines involves the use of recombinant 328:731-734 (1987)). One of ordinary skill will appreciate viruses (e.g., vaccinia). Vaccinia virus is grown in Suitable that the effect of many mutations is difficult to predict. Thus, cultured mammalian cells Such as the HeLa S3 Spinner cells, most modifications are evaluated by routine Screening in a as described by Mackett et al., in DNA cloning Vol. II: A Suitable assay for the desired characteristic. For instance, the practical approach, pp. 191–211 (Glover, ed.). effect of various modifications on the ability of the polypep 65 Antibody Production tide to elicit a protective immune response can be easily The proteins of the present invention can be used to determined using in vitro assayS. For instance, the polypep produce antibodies Specifically reactive with C pneumoniae US 6,822,071 B1 17 18 antigens. If isolated proteins are used, they may be recom is inversely proportional to the amount of free analyte binantly produced or isolated from Chlamydia cultures. present in the Sample. Synthetic peptides made using the protein Sequences may Noncompetitive assays are typically Sandwich assays, in also be used. which the Sample analyte is bound between two analyte Methods of production of polyclonal antibodies are Specific binding reagents. One of the binding agents is used known to those of Skill in the art. In brief, an immunogen, as a capture agent and is bound to a Solid Surface. The preferably a purified protein, is mixed with an adjuvant and Second binding agent is labelled and is used to measure or detect the resultant complex by visual or instrument means. animals are immunized. When appropriately high titers of A number of combinations of capture agent and labelled antibody to the immunogen are obtained, blood is collected binding agent can be used. For instance, an isolated Chlamy from the animal and antisera is prepared. Further fraction dia protein or culture can be used as the capture agent and ation of the antisera to enrich for antibodies reactive to labelled anti-human antibodies Specific for the constant Chlamydia proteins can be done if desired (see Harlow & region of human antibodies can be used as the labelled Lane, Antibodies: A Laboratory Manual (1988)). binding agent. Goat, Sheep and other non-human antibodies Polyclonal antisera are used to identify and characterize Specific for human immunoglobulin constant regions (e.g., Y Chlamydia in the tissueS of patients using, for instance, in 15 or u) are well known in the art. Alternatively, the anti-human Situ techniques and immunoperoxidase test procedures antibodies can be the capture agent and the antigen can be described in Anderson et al. JAVMA 198:241 (1991) and labelled. Barr et al. Vet. Pathol. 28:110–116 (1991). Various components of the assay, including the antigen, Monoclonal antibodies may be obtained by various tech anti-Chlamydia antibody, or anti-human antibody, may be niques familiar to those skilled in the art. Briefly, Spleen cells bound to a solid surface. Many methods for immobilizing from an animal immunized with a desired antigen are biomolecules to a variety of Solid Surfaces are known in the immortalized, commonly by fusion with a myeloma cell (see art. For instance, the Solid Surface may be a membrane (e.g., Kohler & Milstein, Eur, J. Immunol. 6:511–519(1976)). nitrocellulose), a microtiter dish (e.g., PVC or polystyrene) Alternative methods of immortalization include transforma or a bead. The desired component may be covalently bound tion with Epstein Barr Virus, oncogenes, or retroviruses, or 25 or noncovalently attached through nonspecific bonding. Alternatively, the immunoassay may be carried out in other methods well known in the art. Colonies arising from liquid phase and a variety of Separation methods may be Single immortalized cells are Screened for production of employed to Separate the bound labeled component from the antibodies of the desired specificity and affinity for the unbound labelled components. These methods are known to antigen, and yield of the monoclonal antibodies produced by those of skill in the art and include immunoprecipitation, Such cells may be enhanced by various techniques, including column chromatography, adsorption, addition of magnetiz injection into the peritoneal cavity of a vertebrate host. able particles coated with a binding agent and other similar Monoclonal antibodies produced in such a manner are procedures. used, for instance, in ELISA diagnostic tests, immunoper An immunoassay may also be carried out in liquid phase oxidase tests, immunohistochemical tests, for the in Vitro without a separation procedure. Various homogeneous evaluation of Spirochete invasion, to Select candidate anti 35 immunoassay methods are now being applied to immunoas gens for vaccine development, protein isolation, and for Says for protein analytes. In these methods, the binding of Screening genomic and cDNA libraries to Select appropriate the binding agent to the analyte causes a change in the Signal gene Sequences. emitted by the label, So that binding may be measured Immunodiagonostic Detection of C. pneumoniae Infections without separating the bound from the unbound labelled The present invention also provides methods for detecting 40 component. the presence or absence of C. pneumoniae, or antibodies Western blot (immunoblot) analysis can also be used to reactive with it, in a biological Sample. For instance, anti detect the presence of antibodies to Chlamydia in the bodies specifically reactive with Chlamydia can be detected Sample. This technique is a reliable method for confirming using either Chlamydia proteins or the isolates described the presence of antibodies against a particular protein in the here. The proteins and isolates can also be used to raise 45 Sample. The technique generally comprises Separating pro Specific antibodies (either monoclonal or polyclonal) to teins by gel electrophoresis on the basis of molecular weight, detect the antigen in a Sample. In addition, the nucleic acids transferring the Separated proteins to a Suitable Solid disclosed and claimed here can be used to detect Chlamydia Support, (Such as a nitrocellulose filter, a nylon filter, or Specific Sequences using Standard hybridization techniques. derivatized nylon filter), and incubating the sample with the For a review of immunological and immunoassay proce 50 Separated proteins. This causes Specific target antibodies dures in general, see Basic and Clinical Immunology (Stites present in the Sample to bind their respective proteins. Target & Terred., 7th ed. 1991)). The immunoassays of the present antibodies are then detected using labeled anti-human anti invention can be performed in any of Several configurations, bodies. which are reviewed extensively in Enzyme Immunoassay The immunoassay formats described above employ (Maggio, ed., 1980); Tijssen, Laboratory Techniques in 55 labelled assay components. The label may be coupled Biochemistry and Molecular Biology (1985)). For instance, directly or indirectly to the desired component of the assay the proteins and antibodies disclosed here are conveniently according to methods well known in the art. A wide variety used in ELISA, immunoblot analysis and agglutination of labels may be used. The component may be labelled by asSayS. any one of several methods. Traditionally a radioactive label In brief, immunoassays to measure anti-Chlamydia anti 60 incorporating H, I, S, 'C, or 'P was used. Non bodies or antigens can be either competitive or noncompeti radioactive labels include ligands which bind to labelled tive binding assayS. In competitive binding assays, the antibodies, fluorophores, chemiluminescent agents, Sample analyte (e.g., anti-Chlamydia antibodies) competes enzymes, and antibodies which can Serve as Specific binding with a labeled analyte (e.g., anti-Chlamydia monoclonal pair members for a labelled ligand. The choice of label antibody) for specific binding sites on a capture agent (e.g., 65 depends on Sensitivity required, ease of conjugation with the isolated Chlamydia protein) bound to a solid surface. The compound, Stability requirements, and available instrumen concentration of labeled analyte bound to the capture agent tation. US 6,822,071 B1 19 20 Enzymes of interest as labels will primarily be hydrolases, Symptoms and/or complications. Vaccine compositions con particularly phosphatases, esterases and glycosidases, or taining the peptides are administered prophylactically to a oxidoreductases, particularly peroxidases. Fluorescent com patient Susceptible to or otherwise at risk of the infection. pounds include fluorescein and its derivatives, rhodamine The pharmaceutical compositions (containing either pep and its derivatives, dansyl, umbelliferone, etc. Chemilumi tides or antibodies) are intended for parenteral or oral neScent compounds include luciferin, and 2,3- administration. Preferably, the pharmaceutical compositions dihydrophthalazinediones, e.g., luminol. For a review of are administered parenterally, e.g., Subcutaneously, various labelling or Signal producing Systems which may be intradermally, or intramuscularly. Thus, the invention pro used, see U.S. Pat. No. 4,391,904, which is incorporated vides compositions for parenteral administration which herein by reference. comprise a Solution of the immunogenic polypeptides dis Non-radioactive labels are often attached by indirect Solved or Suspended in an acceptable carrier, preferably an means. Generally, a ligand molecule (e.g., biotin) is aqueous carrier. A variety of aqueous carriers may be used, covalently bound to the molecule. The ligand then binds to e.g., water, buffered water, 0.4% Saline, 0.3% glycine, an anti-ligand (e.g., Streptavidin) molecule which is either hyaluronic acid and the like. These compositions may be inherently detectable or covalently bound to a signal System, 15 Sterilized by conventional, well known Sterilization Such as a detectable enzyme, a fluorescent compound, or a techniques, or may be Sterile filtered. The resulting aqueous chemiluminescent compound. A number of ligands and Solutions may be packaged for use as is, or lyophilized, the anti-ligands can be used. Where a ligand has a natural lyophilized preparation being combined with a Sterile Solu anti-ligand, for example, biotin, thyroXine, and cortisol, it tion prior to administration. The compositions may contain can be used in conjunction with the labelled, naturally pharmaceutically acceptable auxiliary Substances as occurring anti-ligands. Alternatively, any haptenic or anti required to approximate physiological conditions, Such as genic compound can be used in combination with an anti buffering agents, tonicity adjusting agents, Wetting agents body. and the like, for example, Sodium acetate, Sodium lactate, Some assay formats do not require the use of labelled Sodium chloride, potassium chloride, calcium chloride, Sor components. For instance, agglutination assays can be used 25 bitan monolaurate, triethanolamine oleate, etc. to detect the presence of the target antibodies. In this case, The compositions may also comprise carriers to enhance antigen-coated particles are agglutinated by Samples com the immune response. Useful carriers are well known in the prising the target antibodies. In this format, none of the art, and include, e.g., KLH, thyroglobulin, albumins Such as components need be labelled and the presence of the target human Serum albumin, tetanus toxoid, polyamino acids Such antibody is detected by Simple visual inspection. as poly(lysine:glutamic acid), influenza, hepatitis B virus Pharmaceutical Compositions core protein, hepatitis B virus recombinant vaccine and the The peptides or antibodies (typically monoclonal like. antibodies) of the present invention and pharmaceutical For Solid compositions, conventional nontoxic Solid car compositions thereof are useful for administration to riers may be used which include, for example, pharmaceu mammals, particularly humans, to treat and/or prevent 35 tical grades of mannitol, lactose, Starch, magnesium Stearate, Chlamydia infections. Suitable formulations are found in Sodium Saccharin, talcum, cellulose, glucose, Sucrose, mag Remington's Pharmaceutical Sciences, Mack Publishing nesium carbonate, and the like. For oral administration, a Company, Philadelphia, Pa., 17th ed. (1985). pharmaceutically acceptable nontoxic composition is The immunogenic peptides or antibodies of the invention formed by incorporating any of the normally employed are administered prophylactically or to an individual already 40 excipients, Such as those carriers previously listed, and Suffering from the disease. The peptide compositions are generally 10-95% of active ingredient, that is, one or more administered to a patient in an amount Sufficient to elicit an peptides of the invention, and more preferably at a concen effective immune response to Chlamydia. An effective tration of 25%-7.5%. immune response is one that inhibits infection. An amount AS noted above, the peptide compositions are intended to adequate to accomplish this is defined as “therapeutically 45 induce an immune response to Chlamydia. Thus, composi effective dose” or “immunogenically effective dose.” tions and methods of administration Suitable for maximizing Amounts effective for this use will depend on, e.g., the the immune response are preferred. For instance, peptides peptide composition, the manner of administration, the Stage may be introduced into a host, including humans, linked to and Severity of the disease being treated, the weight and a carrier or as a homopolymer or heteropolymer of active general State of health of the patient, and the judgment of the 50 peptide units from various Chlamydia proteins disclosed prescribing physician, but generally range for the initial here. Alternatively, a “cocktail” of polypeptides can be used. immunization (that is for therapeutic or prophylactic A mixture of more than one polypeptide has the advantage administration) from about 0.1 mg to about 1.0 mg per 70 of increased immunological reaction and, where different kilogram patient, more commonly from about 0.5 mg to peptides are used to make up the polymer, the additional about 0.75 mg per 70 kg of body weight. Boosting dosages 55 ability to induce antibodies to a number of epitopes. are typically from about 0.1 mg to about 0.5 mg of peptide The compositions also include an adjuvant. AS used here, using a boosting regimen over weeks to months depending number of adjuvants are well known to one skilled in the art. upon the patient's response and condition. A Suitable pro Suitable adjuvants include incomplete Freund's adjuvant, tocol would include injection at time 0, 4, 2, 6, 10 and 14 alum, aluminum phosphate, aluminum hydroxide, N-acetyl weeks, followed by further booster injections at 24 and 28 60 muramyl-L-threonyl-D-isoglutamine (thr-MDP), N-acetyl weeks. nor-muramyl-L-alanyl-D-isoglutamine (CGP 11637, For therapeutic use, administration should begin at the referred to as nor-MDP), N-acetylmuramyl-Lalanyl-D- first sign of infection. This is followed by boosting doses isoglutaminyl-L-alanine-2-(1'-2'-dipalmitoyl-sn-glycero-3- until at least Symptoms are Substantially abated and for a hydroxyphosphoryloxy)-ethylamine (CGP 19835A, referred period thereafter. In Some circumstances, loading doses 65 to as MTP-PE), and RIBI, which contains three components followed by boosting doses may be required. The resulting extracted from bacteria, monophosphoryl lipid A, trehalose immune response helps to cure or at least partially arrest dimycolate and cell wall skeleton (MPL+TDM+CWS) in a US 6,822,071 B1 21 22 2% squalene/Tween 80 emulsion. The effectiveness of an assignment of Sequences 636 (60%) genes disclosedlhere adjuvant may be determined by measuring the amount of and 251 (23%) are similar to hypothetical genes for other antibodies directed against the immunogenic peptide. bacterial organisms including those for C. trachomatis. The The concentration of immunogenic peptides of the inven remaining 186 (17%) genes are not homologous to tion in the pharmaceutical formulations can vary widely, i.e. Sequences deposited in GenBank., Seventy C. trachomatis from less than about 0.1%, usually at or at least about 2% to genes are not represented in the C. pneumoniae genome. as much as 20% to 50% or more by weight, and will be These are contained within blocks consisting of 2-17 genes Selected primarily by fluid Volumes, Viscosities, etc., in and 19 single genes. Of the 70 C. trachomatis genes without accordance with the particular mode of administration homologs in C. pneumoniae, 60 are classified as encoding Selected. 1O hypothetical proteins. The remaining genes not represented The peptides of the invention can also be expressed by in C. pneumoniae consist of the tryptophan operon (trip.A, attenuated viral hosts, Such as vaccinia or fowlpox. This B.R), trpC, two predicted thiol protease genes, and 4 genes approach involves the use of vaccinia virus as a vector to assigned to the phospholipase-D Superfamily. express nucleotide Sequences that encode the peptides of the It is evident that there is a high level of functional invention. Upon introduction into a host, the recombinant 15 conservation between C. pneumoniae and C. trachomatis as vaccinia virus expresses the immunogenic peptide, and orthologs to C. trachomatis genes were identified for 859 thereby elicits an immune response. Vaccinia Vectors and (80%) of the predicted coding sequences for C. pneumoniae. methods useful in immunization protocols are described in, The level of similarity for individual encoded proteins spans e.g., U.S. Pat. No. 4,722,848. Another vector is BCG a wide spectrum (22-95% amino acid identity) with an (Bacille Calmette Guerin). BCG vectors are described in average of 62% amino acid identity between orthologs from Stover et al. (Nature 351:456-460 (1991)). A wide variety of the two species. The percent amino acid identity between other vectors useful for therapeutic administration or immu orthologous chlamydial proteins is similar among functional nization of the peptides of the invention, e.g., Salmonella groups with the highest for proteins associated with trans typhi Vectors and the like, will be apparent to those skilled lation and the lowest for proteins whose function in chlamy in the art from the description herein. 25 diae is uncharacterized and not related to proteins encoded The DNA encoding one or more of the peptides of the by other organisms. The gene order of the homologous Set invention can also be administered to the patient. This of genes in C. pneumoniae shows reorganization relative to approach is described, for instance, in Wolff et, al., Science the genome of C. trachomatis, however, there is a high level 247: 1465–1468 (1990) as well as U.S. Pat. Nos. 5,580,859 of Synteny for the gene organization of the two genomes. We and 5,589,466. identified thirty-nine blocks of 2 or more genes whose gene In order to enhance Serum half-life, the peptides may also organization is colinear with homologs to C. trachomatis, be encapsulated, introduced into the lumen of liposomes, although Some of these are inverted. The distribution of prepared as a colloid, or other conventional techniques may genome reorganization is not evenly distributed on the be employed which provide an extended serum half-life of chromosome as the region between C. pneumoniae coding the peptides. A variety of methods are available for prepar 35 sequences 0130-0300 contains substantially more reorgani ing liposomes, as described in, e.g., Szoka et al., Ann. Rev. Zation than other areas of the genome. This region coincides Biophys. Bioeng. 9:467 (1980), U.S. Pat. Nos. 4, 235,871, with the predicted chromosome replication terminus. 4,501,728 and 4,837,028. We identified orthologs of enzymes characterized in other bacteria that account for the essential requirements for DNA EXAMPLES 40 replication, repair, transcription and translation including two predicted DNA helicases of the Swi2/Snf2 family found The following examples are offered to illustrate, but no to in C. trachomatis. Similar to C. trachomatis, alternative limit the claimed invention. sigma subunits for RNA polymerase, o' and o, were Example 1 identified in addition to anti-O regulatory System factors 45 RsbV, a RsbW-like single-domain histidine kinase, and a This example describes comparison of the C. pneumoniae RsbU-like protein phosphatase. These findings Suggest that genome disclosed here and the, previously Sequenced, C. the fundamental mechanisms of transcriptional regulation trachomatis genome (Stephens, et al. Science 282:754-759 are conserved among Chlamydia. The C. trachomatis pro (1998)). teins containing SET and SWIB domains, and a SWIB The apparent low level of DNA homology between C. 50 domain fused to the C-terminus of the chlamydial topoi trachomatis and C. pneumoniae (Campbell, et al., J. Clin. Somerase I, not identified outside eukaryotes, are found in C. Microbiol. 25:1911-1916 (1987)) yet analogous cell struc pneumoniae Supporting their possible role in the chromatin tures and developmental cycles, predicts that comparative condensation-decondensation characteristic of the biologi analysis of the two genomes will Significantly enhance the cally unique chlamydial developmental cycle. understanding of both pathogens. Identification of genes that 55 The central metabolic pathways inferred from the C. are present in one Species but not the other are of particular pneumoniae genome Sequence are the same as those iden importance for the mutually exclusive biological, Virulence tified for C. trachomatis C. pneumoniae has a glycolytic and pathogenesis capabilities of each. Identification of genes pathway and a linked tricarboxylic acid cycle, although shared between the two species Strongly Supports the likely functional, is incomplete as genes for citrate Synthase, requirement for these capabilities in a biological System that 60 aconitase, and isocitrate dehydrogenase were not identified. has, over its long-term association with mammalian host C. pneumoniae has a complete glycogen Synthesis and cells, evolved to reduce the metabolic capacities while degradation System Supporting a role for glycogen Synthesis optimizing Survival, growth and transmission of these and utilization of glucose-derivatives in chlamydial metabo unique pathogens. lism. Genes encoding essential functions in aerobic respi The previously Sequenced C. trachomatis genome con 65 ration are present and electron flux may be Supported by tains 1,042,519 nucleotides and 875 likely protein-coding pyruvate, Succinate, glycerol-3-phosphate, and NADH genes. Similarity Searching permitted the inferred functional dehydrogenases, NADH-ubiquinone oxidoreductase and US 6,822,071 B1 23 24 cytochrome oxidase. C. pneumoniae also contains the V range of host cell types than C. trachomatis is consistent (vacuolar)-type ATPase operon and the two ATP translo with this hypothesis. Not all of the differences in biological cases found in C. trachomatis. capacity may be associated with mutually exclusive genes. The type-III Secretion virulence System required for inva One explanation for the significantly lower level of homol Sion by Several pathogenic bacteria and found in the C. ogy between protein Sequences assigned as having C. pneu trachomatis genome in three chromosomal locationsis also moniae and C. trachomatis orthologs but no identifiable present in the C. pneumoniae genome. Each of the compo orthologs in other organisms is that this set of proteins is not nents is conserved and their relative genomic contexts are only associated with biological requirements Specific for conserved. Genes Such as a predicted Serine/threonine pro Chlamydia but this polymorphism may account for differ tein kinase and other genes physically linked to genes ential biology between the two species. The determination of encoding Structural components of the type-III Secretion the genome Sequence from a representative of the C pSittaci apparatus, but without identified homologs, are also highly group will precisely delineate those genes that are mutually Similar between the two species Suggesting the functional exclusive and Specific for each species. roles in modifying cellular biology are fundamentally con The major functionally identifiable addition to the C. Served. 15 pneumoniae genome is a large expansion of genes encoding Chlamydia-encoded proteins that are not found in a new family of chlamydial polymorphic membrane proteins chlamydial organisms but localized to the intracellular (Pmp), alone representing 22% of the increased coding chlamydial inclusion membrane are likely essential for the capacity. While the C. trachomatis genome has 9 pmp genes, unique intracellular biology and perhaps differences in remarkably the C. pneumoniae genome contains 21 pmp inclusion morphology observed between Species of Chlamy genes. Most of these genes appear to be amplified in two dia. Several Such proteins, termed IncA,B&C, have been regions of the genome with three Stand-alone genes. Inter characterized for a C. pSittaci Strain (Rockey, et al. Mol. estingly one of the Stand-alone genes is most closely related Microbiol. 15:617-626 (1995); Rockey et al. Infect. Immun. to the C. trachomatis pmpD which is the only stand-alone 62:106-112 (1994)). C. pneumoniae and C. trachomatis pmp gene in the C. trachomatis genome and it is located encode orthologs to C. pSittaci IncB and IncC and C. 25 with the Same relative genomic context, Suggesting an trachomatis also contains an ortholog to IncA. C. pneumo essential and conserved function for this paralog. Six Pmp niae contains two genes that encode proteins with Similarity coding genes are presumably not functional as five contain to IncA (CPn0186 and CPn0585), although the level of predicted coding frame-shifts and one is truncated. The homology is low Suggesting analogous but possibily altered amplification of this gene family and the confidently pre functions. dicted frame-shifts Suggest a specific molecular mechanism The tryptophan biosynthesis operon (trp A, trp B, trpR) and to promote functional or antigenic diversity. The biological trpC identified in C. trachomatis is conspicuously missing in role of this protein family remains enigmatic, although at the C. pneumoniae genome. This represents the entire rep least one of the proteins in C. pSittaci related to this family ertoire of genes associated with tryptophan biosynthesis is exposed on the chlamydial Surface. identified in C. trachomatis. Seventeen genes adjacent to the 35 While a function could not be assigned for most of the C. trachomatis tryptophan operon also were not found in the unique C. pneumoniae genes, Several have Significant Simi C. pneumoniae genome. This region is the Single largest loSS larity to genes from other organisms. Functional assign of a contiguous genomic Segment and includes 4 HKD ments could be made for genes encoding GMP Synthetase, Superfamily encoding genes that encompass a family of IMP dehydrogenase, UMP synthase, uridine kinase, biotin proteins related to endonuclease and phospholipase D. 40 Synthase pathway proteins, methylthioadenosine These findings may be important for the ability of Chlamy nucleosidase, a DNA glycosylase and aromatic amino acid dia to persist in their hosts and cause disease by eliciting hydroxylase. Thus a complete pathway was identified for potent, focal and persistent inflammatory responses thought biotin biosynthesis. The additional purine and pyrimidine to be essential for pathogenesis. Salvage pathway genes presumably reflect metabolic limi The C. pneumoniae genome contains 187,711 additional 45 tations in one of the cell types that C. pneumoniae infects or nucleotides compared to the C. trachomatis genome, and the differences in the ability of C. pneumoniae to transport 214 coding Sequences not found in C. trachomatis account precursor nucleosides or nucleotides. for most of the increased genome size. Eighty-eight of these The addition of aromatic amino acid hydroxylase in C. genes are found in blocks of >10 genes (11-30 genes/block), pneumoniae is intriguing especially in light of the loSS of 41 are Single genes, and the remainder are partnered with at 50 tryptophan biosynthetic genes and the inability to Synthesize least one other gene. Based upon the observation that ~70% other amino acids including phenylalanine. Aromatic amino of all the C. pneumoniae genes have an identifiable homolog acid hyroxlyases include three distinct enzymes that func in GenBank, exclusive of C. trachomatis, it would be tion to receptively oxidize phenylalanine to tyrosine, expected that over 150 of the 214 genes should have a tyrosine to Dopa, and tryptophan to 5-hydroxytryptophan homolog in GenBank, many associated with a function. 55 and Serotonin. Although the chlamydial protein is similar to However, only 28 coding Sequences have similarity to genes proteins of this family and incrementally more closely from other organisms. Thus the majority of the genes that are related to tryptophan hydroxylase, its Specific function could mutually exclusive of C. trachomatis (186 of 214), and the not be confidently predicted. We hypothesize that it may be 60 of 70 C. trachomatis genes that lacked an identifiable involved in C. pneumoniae virulence. Tryptophan hydroxy homolog in C. pneumoniae, do not have detectable 60 lase has not been previously identified in bacteria and the homologs to genes from other organisms. We predict that origin of the chlamydial gene appears to be from eukaryotes. most of the unique genes are essential for Specific attributes The functional role of an aromatic amino acid hydroxylase that define the differential biology, tropism and pathogenesis for C. pneumoniae is linked to the unique intracellular of C. trachomatis and C. pneumoniae. Moreover, this Sug biology of this organism and may represent a key contribu gests that C. pneumoniae has more unique biological (i.e., 65 tion to C. pneumoniae persistence and pathogenesis. virulence) capacity than C. trachomatis. The ability of C. It is understood that the examples and embodiments pneumoniae to be more invasive and Survive in a broader described herein are for illustrative purposes only and that US 6,822,071 B1 25 26 various modifications or changes in light thereof will be Suggested to perSons Skilled in the art and are to be included TABLE 1-continued within the Spirit and purview of this application and Scope of the appended claims. All publications, patents, and patent applications cited herein are hereby incorporated by refer- 5 SEO ID NO:1 SEO ID NO:1 ence in their entirety for all purposes. ype start position end position Gene Table 1 provides functional assignments of C. pneumo- RNA 78.4994 784922 (R) Lys tRNA niae nonprotein-encoding genomic Sequences. Table 2 pro- RNA 843926 843999 Pro tRNA 2 Vides functional assignments of protein coding Sequences. 10 tRNA 4O9922 409848 (R) Pro tRNA 1 Elethe proteins Sequence corresponding Listing AOR to the the coding . Sequencesacid SES RNA 631373 631445 Phe tRNA NOS:2–1074). RNA 677337 677264 (R)R) Arg tRNA 2 RNA 807413 807341 (R) Arg tRNA 3 TABLE 1. 15 RNA 877473 8774OO (R) Arg tRNA 4 RNA 462141 462214 Arg tRNA 1 SEQ ID NO:1 SEQ ID NO:1 RNA 10856.05 1085676 Gin tRNA ype start position end position Gene RNA 78678O 786708 (R) Thr tRNA 3 Ori 841664 841396 (R) Putative Origin of Replica mRNA 138493 138074 (R) timRNA 2O RNA 89728 89657 (R) Thr tRNA 1 RNA 6O7342 6O7649 Ribonuclease PRNA RNA 293477 2934.05 (R) Thr tRNA 2 rRNA 1OOO564 10O2115 16S rRNA rRNA 10O2415 1OO5278 23S rRNA RNA 87522 87450 (R) Met tRNA 1 rRNA 1OO5393 1005509 5S rRNA RNA 1993O1 199229 (R) Met tRNA 2 RNA 269070 269142 Ala tRNA 1 RNA 164318 164389 Asn tRNA RNA 1993.90 199317 (R) Met tRNA 3 RNA 296,224 296.151 (R) Asp tRNA 25 tRNA 626904 626987 Ser tRNA 1 RNA 836.191 836119 (R) Ala tRNA 2 RNA 1030533 1030603 Cys tRNA RNA 708359 708440 Ser tRNA 2 RNA 784-896 784-822 (R) Glu tRNA RNA 1142O34 1142117 Ser tRNA 3 RNA 781.68O 781610 (R) Gly tRNA 1 RNA 961536 96.16O7 Gly tRNA 2 RNA 1230O28 12299.45 (R) Ser tRNA 4 RNA 999949 1OOOO23 His tRNA 30 tRNA 91070 90999 (R) Trp tRNA RNA 268992 269065 Ile tRNA RNA 672236 672318 LeutRNA 1 RNA 293399 293317 (R) Tyr tRNA RNA 68O178 68O257 Leu tRNA 2 RNA 296147 296O75 (R) Val tRNA 1 RNA 715889 715971 Leu tRNA 3 RNA 739403 739486 Leu tRNA 4 RNA 1137389 1137462 Val tRNA 2 RNA 1175863 1175944 Leu tRNA 5

TABLE 2 Gene # From To Strand Gene Function (C. Trachomatis Ortholog in parenthesis) CPOOO1 282 4 R CT001 hypothetical protein CPOOO2 573 875 F gatC-Glu-tRNA Glin Amidotransferase (C subunit)-(CTO02) CPOOO3 895 2370 F gatA-Glu tRNA Glin Amidotransferase-(CTO03) CPOOO4 2370 3833 F gat-(Pet 112) Glu tRNA Glin Amidotransferase (B Subunit) CPOOOS 4127 6892 F pmp 1-Polymorphic Outer Membrane Protein G Family CPOOO6 7293 7141 R CPnOOO7 7605 10496 F CPOOO8 10975 11685 F CPOOO9 11815 13119 F CPOO1O 13435 14325 F CPOO1O 14379 15746 F frame-shift with 0010 CPOO11 15892 16614 F CPOO12 16644 18212 F CPOO13 18584 21106 F pmp 2-Polymorphic Outer Membrane Protein G Family CPOO14 21392 21922 F pmp 3-Polymorphic Outer Membrane Protein G Family CPOO15 21835 24174 F pmp 3-PMP 3 (frame-shift with 0.014) CPOO16 24416 26.188 F pmp 4-Polymorphic Outer Membrane Protein G Family CPOO17 26094 27170 F pmp 4-PMP 4 (frame-shift with 0.016) CPOO18 27522 29OO3 F pmp 5-Polymorphic Outer Membrane Protein G Family CPOO19 29OO7 30356 F pmp 5-PMP 5 (frame-shift with 0018) CPOO2O 32687 30603 R Predicted OMP leader (14) peptide: Outer membrane-(CT351) CPOO21 34.410 32707 R Predicted OMP leader (19) peptide-(CT350) CPOO22 34982 34.395 R maf-(CT349) CPOO23 36603 35O14 R yijKfalr-ABC Transporter Protein ATPase-(CT348) CPOO24 37596 36661 R xerC-Integrase/recombinase-(CT347) CPOO25 386O4 37684 R elaC/atsA-Sulphohydrolase/Glycosulfatase-(CT346) CPOO26 396.25 38762 R CT345 hypothetical protein-(CT345) CPnOO27 42234 39778 R lon-Lon ATP-dependent Protease-(CT344) CPOO28 43325 42543 R CPOO29 43755 43390 R CPOO3O 43891 44529 F gep. 1-O-Sialoglycoprotein Endopeptidase 1-(CT343)

US 6,822,071 B1 31 32

TABLE 2-continued

Gene # From To St al d Gene Function (C. Trachomatis Ortholog in parenthesis) CPO184 221775 221221 efp 1-Elongation Factor P 1-(CT122) CPO185 22.2451 221765 rpefaraD-Ribulose-P Epimerase-(CT121) CPO186 222899 224O68 *similarity to Cps IncA 1-(CT119) CPO187 224248 225,045 predicted methylase-(CT133) CPO188 225111 226400 CT132 hypothetical protein CPO189 2264OO 229825 CT131 homolog-(Possible Transmembrane Protein) CPO190 229919 231274 CPO191 231991 231314 nO-ABC Amino Acid Transporter ATPase-(CT130) CPO192 232634 231984 nP-ABC Amino Acid Transporter Permease-(CT129) CPO193 233,126 232686 argR-Arginine Repressor CPO194 233,210 234241 cp 2-O-Sialoglycoprotein Endopeptidase 2-(CT197) CPO195 234190 235785 ppA 1-Oligopeptide Binding Protein 1 CPO196 235939 237519 ppA 2-Oligopeptide Binding Protein 2-(CT198) CPnO197 237578 238882 ppA 3-Oligopeptide Binding Protein 3 CPO198 2391.69 24O746 ppA 4-Oligopeptide Binding Protein 4 CPO199 241042 241983 ppB 1-Oligopeptide Permease 1-(CT199) CPO2OO 242O17 242868 ppC 1-Oligopeptide Permease 1-(CT200) CPO2O1 242864 243715 ppD-Oligopeptide Transport ATPase-(CT201) CPO2O2 243715 2445OO ppF-Oligopeptide Transport ATPase-(CT202) CPO2O3 245OO8 2458O2 CPO2O4 245817 246OO2 CPO2O5 246133 246,327 CPO2O6 2464O9 247161 CT203 hypothetical protein CPnO2O7 2472O8 2486.17 ybhI/sodiT1-Oxoglutarate/Malate Translocator-(CT204) CPO2O8 248953 2506O2 pfkA 2-Fructose-6-P Phosphotransferase 2-(CT205) CPO209 251036 251272 CPO210 252384 251440 CPO211 252756 252463 CPO212 2S4O66 252888 CPO213 25.4342 254190 CPO214 255657 2544.46 CPO215 257015 255759 CPO216 257608 2571.74 CPnO217 257896 258579 ypdP-(CT140) CPO218 259058 258582 CPO219 259357 260472 tgt-Queuine tRNA Ribosyl Transferase-(CT193) CPO220 260696 261238 CPO221 261657 262O64 CPO222 262SO4 26.2842 *weak similarity to Bacteriophage CHP1 (Orf4) CPO223 262956 263333 CPO224 263435 263674 CPnO225 263873 264541 CPO226 264566 264967 CPnO227 265416 26SOO9 dsbB-Disulfide bond Oxidoreductase-(CT176) CPO228 266110 265412 dsbG-Disulfide Bond Chaperone-(CT177) CPO229 266,328 267560 CT178 hypothetical protein CPO230 268253 267576 CT179 hypothetical protein CPO231 268957 268253 tauB-ABC Transport ATPase (Nitrate/Fe)-(CT180) CPO232 27O122 269232 *similarity to 5'-Methylthioadenosine/S-Adenosylhomocysteine Nucleosidase CPO233 270424 270248 CPO234 271240 270548 CT181 hypothetical protein CPO235 271416 272177 kdsB-deoxyoctulonosic Acid Synthetase-(CT182) CPO236 272156 2.73766 pyrg-CTP Synthetase-(CT183) CPnO237 2.73762 27.4214 yggF Family-(CT184) CPO238 274303 275838 rwf-Glucose-6-P Dehydrogenase-(CT185) CPO239 275899 276672 devB-Glucose-6-P Dehydrogenase (Dev B family)-(CT186) CPO240 277861 276698 CPO241 279354 2782O3 CPO242 279918 279487 CPO243 280555 28O133 CPO244 280918 281.556 adk-Adenylate Kinase-(CT128) CPO245 281645 282499 ydhC-Polysaccharide Hydrolase-Invasin Repeat Family-(CT127) CPO246 282952 282551 rs9-S9 Ribosomal Protein-(CT126) CPO247 2.83415 2.82969 rl13-L13 Ribosomal Protein-(CT125) CPO248 284327 283650 ycfV/ybbA-ABC Transporter ATPase-(CT152) CPO249 285841 284.333 CT151 hypothetical protein CPO2SO 286.057 285902 rl33-L33 Ribosomal Protein-(CT150) CPO251 286060 287559 *conserved hypothetical protein CPnO252 288.112 287576 CT144 hypothetical protein (frame-shift with 0253?) CPO253 288456 287950 CT144 hypothetical protein 1 CPO254 289262 288459 CT143 hypothetical protein 1 CPnO255 29O165 289.329 CT142 hypothetical protein 1 CPO256 291264 29O398 CT144 hypothetical protein 2 CPnO257 292127 291.267 CT143 hypothetical protein 2 CPO258 292534 2921.33 CT142 hypothetical protein (frame-shift with 0259?) CPnO259 292986 292441 CT142 hypothetical protein 2

US 6,822,071 B1 37 38

TABLE 2-continued

Gene # From To St al d Gene Function (C. Trachomatis Ortholog in parenthesis) CPO4 14 6O143 59172 accA-AcCoA Carboxylase/Transferase Alpha-(CT265) CPO4 15 61498 6O22 CT266 hypothetical protein CPO4 16 61856 61557 himD/ihfa-Integration Host Factor Alpha-(CT267) CPO4 17 63O35 62244 amiA-N-Acetylmuramoyl Alanine Amidase-(CT268) CPO4 18 644O1 6295.3 murE-N-Acetylmuramoylalanylglutamyl DAP Ligase-(CT269) CPO4 19 66834 64876 pbp3-transglycolase/transpeptidase-(CT270) CPO4 2O 67108 66824 CT271 hypothetical protein CPO4 21 67998 67108 yabC-PBP2B Family methyltransferase-(CT272) CPO4 22 68242 687.84 CT273 hypothetical protein CPO4 23 687.91 69216 CT274 hypothetical protein CPO4 24 696.12 7096 dnaA 2-Replication Initiation Factor 2-(CT275) CPO4 25 7098O 71564 CT276 hypothetical proteins CPO4 26 72111 71536 CT277 similarity CPO4 27 722O7 73715 nqr2-NADH (Ubiquinone) Dehydrogenase-(CT278) CPO4 28 73722 7468 nqr3-NADH (Ubiquinone) Oxidoreductase, Gamma-(CT279) CPO4 29 74681 75319 nqra-NADH (Ubiquinone) Reductase 4-(CT280) CPO4 3O 75326 76093 nqr5-NADH (Ubiquinone) Reductase 5-(CT281) CPO4 31 76483 76.15 CPO4 32 76816 76514 CPO4 33 772.73 76929 gcsH-Glycine Cleavage System H. Protein-(CT282) CPO4 34 794-62 77276 CT283 hypothetical protein CPO4 35 80902 794.75 Phospholipase D superfamily uncleavable leader peptide-(CT284) CPO4 36 81618 80902 lplA-Lipoate Protein Ligase-Like Protein-(CT285) CPO4 37 81816 843SO clpC-ClpC Protease-(CT286) CPO4 38 85416 84334 ycbF-PP-loop superfamily ATPase-(CT287) CPO4 39 855.53 86O77 CPO4 40 86105 86,740 CPO4 41 86891 87838 CT007 hypothetical protein CPO4 42 88O13 88528 CT006 hypothetical protein CPO4 43 88729 89979 CT005 hypothetical protein CPO4 44 90287 945O7 pmp 6-Polymorphic Outer Membrane Protein G/I Family CPO4 45 94.772 4 97.579 pmp 7-Polymorphic Outer Membrane Protein G Family CPO4 46 4 97626 SOO415 pmp 8-Polymorphic Outer Membrane Protein G Family CPO4 47 SOO568 SO3351 pmp 9-Polymorphic Outer Membrane Protein G/I Family CPO4 48 SO4810 SO3698 *yxiG Bs 2 Hypothetical Protein CPO4 49 507231 505330 pmp 10-PMP 10 (Frame-shift with 0451) CPO4 50 508112 50718O pmp 10-Polymorphic Outer Membrane Protein G Family CPO4 51 508275 511058 pmp 11-Polymorphic Outer Membrane Protein G Family CPO4 52 511319 512860 pmp 12-Polymorphic Outer Membrane Protein A/I Family (truncated) CPO4 53 513234 516152. pmp 13-Polymorphic Outer Membrane Protein G Family CPO4 54 516182 519115 pmp 14-Polymorphic Outer Membrane Protein H Family CPO4 55 52O348 519458 CPO4 56 521532 52O327 CPO4 57 523865 522120 CPO4 58 52632O 524236 CPO4 59 527005 526619 CPO4 60 527840 526992 CPO4 61 528638 527844 CPO4 62 531052 529037 CPO4 63 532.357 531191. CPO4 64 532842 532,366 CPO4 65 S33212 532871 CPO4 66 533724 536537 pmp 15-Polymorphic Outer Membrane Protein E Family CPO4 67 536633 539.434 pmp 16-Polymorphic Outer Membrane Protein E Family CPO4 68 539632 54.0432 pmp 17-Polymorphic Outer Membrane Protein E Family CPO4 69 540399 541460 pmp 17-Polymorphic Outer Membrane Protein (Frame-shift with 0469) CPO4 70 541357 542532 pmp 17-Polymorphic Outer Membrane Protein (Frame-shift with 0470) CPO4 71 542564 5454O1 pmp 18-Polymorphic Outer Membrane Protein E/F Family CPO4 72 547905 545581 CPO4 73 549593 548070 CPO4 74 551573 5498O7 CT365 hypothetical protein CPO4 75 553844 551685 glgB-Glucan Branching Enzyme-(CT866) CPO4 76 554844 553858 CT865 hypothetical protein CPO4 77 556106 554844 *yqeV Bs Hypothetical Protein CPO4 78 557625 556210 hfiX-GTP Binding Protein-(CT379) CPO4 79 558425 557616 phnP-Metal Dependent Hydrolase-(CT380) CPO4 8O 559303 55.8650 CT383 hypothetical protein CPO4 81 560946 559339 CPO4 82 561737 560961 artJ-Arginine Periplasmic Binding Protein-(CT381) CPO4 83 561836 564964 CPO4 84 564970 565824 aroG-Deoxyheptonate Aldolase-(CT382) CPO4 85 56.6038 566229 CT382.1 hypothetical protein CPO4 86 567784 566405 *hypothetical proline permease CPO4 87 56974O 568112 CT384 hypothetical protein CPO4 88 57O096 569767 hitA-HIT Family Hydrolase-(CT385) CPO4 89 570965 57O096 CT386 hypothetical protein CPO4 90 571279 573.333 CT387 hypothetical protein

US 6,822,071 B1 S3 S4

TABLE 2-continued Gene # From To Strand Gene Function (C. Trachomatis Ortholog in parenthesis) CP1025 77302 78879 F pgi-Glucose-6-P Isomerase-(CT378) CP1026 78997 791.37 F ltuA-(CT377) CPin1027 791.75 80755 F CP1028 81O16 81999 F mdhC-Malate Dehydrogenase-(CT376) CP1029 82O08 82844 F CP1030 83886 82843 R predicted D-amino acid dehydrogenase-(CT375) CP1031 85.552 84O98 R arcD-Arginine/Ornithine Antiporter-(CT374) CP1032 86150 85566 R CT373 hypothetical protein CP1033 875OO 861.87 R CT372 hypothetical protein CP1034 88517 87732 R Predicted OMP 1 (CT371) leader (18) peptide CP1035 9OOOO 88570 R AroE-Shikimate 5-Dehydrogenase-(CT370) CP1036 91135 89984 R AroB-Dehyroquinate Synthase-(CT369) CP1037 92199 91123 R AroC-Chorismate Synthase-(CT368) CP1038 92726 92199 R aroL-Shikimate Kinase II-(CT367) CP1039 93.999 92665 R aroA-Phosphoshikimate Vinyltransferase-(CT366) CP1040 94741 94O73 R CP1041 95.994 94726 R *bioA-Adenosylmethionine-8-Amino-7-Oxononanoate Aminotransferase CP1042 96590 95934 R *bioD-dethiobiotin synthetase CP1043 97.717 96.572 R bioF 2-Oxononanoate Synthase 2 CP1044 98691 97699 R *bioB-Biotin Synthase CP1045 99590 98.901 R *conserved hypothetical bacterial membrane protein CP1046 2OO625 99590 R *Tryptophan Hyroxylase CP1047 200552 2O1343 F dapB-Dihydrodipicolinate Reductase-(CT364) CP1048 2O1606 2O26O4 F asd-Aspartate Dehydrogenase-(CT363) CP1049 2O2595 2O3914 F lysO-Aspartokinase III-(CT362) CP1050 2O3926 204798 F dapA-Dihydrodipicolinate Synthase-(CT361) CP1051 2O4962 205270 F CP1052 2O5417 2O6169 F CP1053 2O6153 2O67O1 F CP1054 2O7034 2094.66 F CPin1055 209694 210521 F CPn1056 210527 211228 F CPin1.057 211497 213596 F CT356 hypothetical protein CP1058 213748 214836 F CT355 hypothetical protein CP1059 214848 215678 F kgSA-Dimethyladenosine Transferase-(CT354) CP1060 217658 21572.7 R dxs/tkt-Transketolase-(CT331) CP1061 21792O 217666 R CT330 hypothetical protein CP1062 21982O 218159 R xseA-Exodoxyribonuclease VII-(CT329) CP1063 219951 22O712 F tpiS-Triosephosphate Isomerase-(CT328) CP1064 22O719 220895 F CP1065 221095 22O928 R CP1066 221135 221488 F CP1067 221735 222292 F def-Polypeptide Deformylase-(CT353) CP1068 223258 222365 R rnhB 2-Ribonuclease HII 2-(CT008) CP1069 223.513 223941 F yfgA-HTH Transcriptional Regulator-(CT009) CP1070 225511 224144 R CP1071 227324 225885 R CPin1.072 227969 22.8835 F CP1073 229O11 229832 F Predicted OMP 2-(CT371)

TABLE 2 (Supplemental Data) Functional Assignments of C. pneumoniae Coding Sequences. C. trachomatis genes are shown in parentheses. Amino Acid Biosynthesis Aromatic Family 1039 (CT366) aroA Phosphoshikimate Vinyltransferase 1036 (CT369) aroB Dehyroquinate Synthase 1037 (CT368) aroC Chorismate Synthase 1035 (CT370) aroE Shikimate 5-Dehyrogenase O484 (CT382) aroG Deoxyheptonate Aldolase 1038 (CT367) aroL Shikimate Kinase II O740 (CT637) tyrB Aromatic AA Aminotransferase Aspartate Family (lysine) 1048 (CT363) asd Aspartate Dehydrogenase 1OSO (CT361) dapA Dihydrodipicolinate Synthase 1047 (CT364) dapB Dihydrodipicolinate Reductase US 6,822,071 B1 SS

TABLE 2-continued (Supplemental Data) Functional Assignments of C. pneumoniae Coding Sequences. C. trachomatis genes are shown in parentheses. 0519 (CT430) dapF Diaminopimelate Epimerase 1049 (CT362) lysC Aspartokinase III Serine Family O433 (CT282) gesh Glycine Cleavage System H. Protein O521 (CT432) glyA Serine Hydroxymethyltransferase Base & Nucleotide Metabolism O171 guaA GMP Synthase O172 guaB Inosine 5'-Monophosphase Dehydrogenase O608 Uridine 5'-Monophosphate Synthase O735 Uridine Kinase O244 (CT128) adk Adenylate Kinase O894 (CT751) al AMP Nucleosidase O568 (CT452) cmk CMP Kinase O392 (CTO39) dcd dCTP Deaminase OO59 (CT292) dut dUTP Nucleotidohydrolase O12O (CT030) gmk GMP Kinase O619 (CT500) indk Nucleoside-2-P Kinase O984 (CTS27) nrdA Ribonucleoside Reductase, Large Chain O985 (CT328) nirdB Ribonucleoside Reductase, Small Chain O236 (CT183) pyrCi CTP Synthetase O698 (CT678) pyrH UMP Kinase O273 (CT188) todk Thymidylate Kinase O659 (CT539) trXA Thioredoxin O314 (CTO99) trixB Thioredoxin Reductase 1001 (CTS44) yfbC Cytosine Deaminase Biosynthesis of Cofactors Biotin, Lipoate & Ubiquinone 1041 bioA Adenosylmethionine-8-Amino-7-Oxononanoate Aminotransferase 1044 bio Biotin Synthase 1042 bioD Dethiobiotin Synthetase O923 (CT777) bioF 1 Oxononanoate Synthase 1 1043 (CT777) bioF 2 Oxononanoate Synthase 2 O866 (CT725) birA Biotin Synthetase O748 (CT628) ispA Geranyl Transtransferase O832 (CT558) lipA Lipoate Synthetase O265 (CT219) ubiA Benzoate Octaphenyltransferase O264 (CT220) ubiD Phenylacrylate Decarboxylase O515 (CT428) ubi Ubiquinone Methyltransferase Folic Acid O759 (CT612) olA Dihydrofolate Reductase O335 (CTO78) olD Methylene Tetrahydrofolate Dehydrogenase O758 (CT613) oIP Dihydropteroate Synthase O757 (CT614) oX Dihydroneopterin Aldolase O763 (CT649) ygfA Formyltetrahydrofolate Cycloligase Porphyrin O714 (CT662) hemA Glutamyl tRNA Reductase O744 (CT633) hem3 Porphobilinogen Synthase OO52 (CT299) hemC Porphobilinogen Deaminase O890 (CT747) hem Uroporphyrinogen Decarboxylase O888 (CT745) hemCi protoporphyrinogen Oxidase O138 (CT210) hemL Glutamate-1-Semialdehyde-2,1-Aminomutase O38O (CTO52) hemN 1 Coproporphyrinogen III Oxidase 1 O889 (CT746) hemN 2 Coproporphyrinogen III Oxidase 2 O603 (CT485) hem2 Ferroehetalase Riboflavin O872 (CT731) ribA&ribB GTP Cyclohydratase & DHBP Synthase O532 (CT405) ribC Riboflavin Synthase O871 (CT730) ribD Riboflavin Deaminase O873 (CT732) ribe Ribityllumazine Synthase O32O (CTO93) rib FAD Synthase Cell Envelope Fatty Acid & Phospholipid Metabolism O161 (CT206) (predicted acyltransferase family) O922 (CT776) aaS Acylglycerophosphoethanolamine Acyltransferase O414 (CT265) accA AcCoA Carboxylase/Transferase Alpha O183 (CT123) accB Biotin Carboxyl Carrier Protein O182 (CT124) accC Biotin Carboxylase US 6,822,071 B1 57 58

TABLE 2-continued (Supplemental Data) Functional Assignments of C. pneumoniae Coding Sequences. C. trachomatis genes are shown in parentheses. OO58 (CT293) accD AcCoA Carboxylase/Transferase Beta O295 (CT236) acpp Acyl Carrier Protein O313 (CT100) acpS Acyl-carrier Protein Synthase O567 (CT451) cds.A Phosphatidate Cytidylyltransferase O297 (CT238) fab) Malonyl Acyl Carrier Transcyclase O916 (CT770) fab Acyl Carrier Protein Synthase O296 (CT237) fabG Oxoacyl (Carrier Protein) Reductase O298 (CT239) fab Oxoacyl Carrier Protein Synthase III O4O6 (CT104) fab Enoyl-Acyl-Carrier Protein Reductase O651 (CT532) fabz, Myrisaoyl-Acyl Carrier Dehydratase OO98 (CT010) htrB Acyltransferase O271 (CT136) Lysophospholipase Esterase O615 (CT496) pgSA 1 Glycerol-3-P Phosphatidyltransferase 1 O947 (CT797) pgSA 2 Glycerol-3-P Phosphatydyltransferase 2 O958 (CTS07) plsB Glycerol-3-P Acyltransferase O569 (CT453) plsC Glycerol-3-P Acyltransferase O962 (CTS11) plsX FA/Phospholipid Synthesis Protein O839 (CT699) psdD Phosphatidylserine Decarboxylase O983 (CT326) pSSA Glycerol-Serine Phosphatidyltransferase O921 (CT775) singlycerol-J-P Acyltransferase O654 (CT535) yciA Acyl-CoA Thioesterase O877 (CT736) ybcL CT736 Hypothetical Protein LPS O154 (CT208) gScA KDO Transferase O721 (CT655) kdSA KDO Synthetase 0235 (CT182) kdsB Deoxyoctulonosic Acid Synthetase O650 (CT531) lpXA Acyl-Carrier UDP-GlcNAc O-Acyltransferase O965 (CT411) lpxB Lipid A Disaccharide Synthase O652 (CT533) lpxC Myristoyl GlcNac Deacetylase O3O2 (CT243) lpxD UDP Glucosamine N-Acyltransferase Membrane Proteins, Lipoproteins & Porins O310 (CT251) 6OM 60 kDa Inner Membrane Protein O556 (CT442) crpA 15 kDa Cysteine-Rich Protein O653 (CT534) cutE Apollipoprotein N-Acetyltransferase O311 (CT252) lgt Prolipoprotein Diacylglycerol Transferase O558 (CT444) omcA 9 kDa-Cysteine-Rich Lipoprotein O557 (CT443) omcB 60 kDa Cysteine-Rich OMP O695 (CT681) ompa Major Outer Membrane Protein O854 (CT713) ompFB Outer Membrane Protein B O781 (CT600) pal Peptidoglycan-Associated Lipoprotein O3OO (CT241) yaeT Omp85 Homolog Peptidoglycan

O417 (CT268) amiA N-Acetylmuramoyl Alanine Amidase O78O (CT601) ami B N-Acetylmuramoyl-L-Ala Amidase O672 (CT551) dacF D-Ala-D-Ala Caroxypeptidase O968 (CTS16) glimS Glucosamine-Fructose-6-P Aminotransferase O749 (CT629) glimU UDP-GlcNAc Pyrophosphorylase O900 (CT757) mrSY Muramoyl-Pentapeptide Transferase O571 (CT455) murA UDP-N-Acetylglucosamine Transferase O988 (CT331) murE UDP-N-Acetylenolpyruvoylglucosamine Reductase O905 (CT762) murc&ddlA Muramate-Ala Ligase & D-Ala-D-Alam Ligase O901 (CT758) mur) Muramoylalanine-Glutamate Ligase O418 (CT269) murE N-Acetylmuramoylalanylglutamyl DAP Ligase O899 (CT756) murF Muramoyl-DAP Ligase O904 (CT761) murg Peptidoglycan Transferase O902 (CT759) nlpD Muramidase (invasin repeat family) O694 (CT682) pbp2 PBP2-Transglycolase/Transpeptidase O419 (CT270) pbp3 Transglycolase/Transpeptidase O421 (CT272) yabC PBP2B Family Methyltransferase Cellular Processes

Cell Division (CT308) cafe Axial Filament Protein (CT739) ftsK Cell Division Protein Ftsk (CT760) ftsW Cell Division Protein FtsW (CTS2O) ftsy Cell Division Protein Ftsy (CT498) gidA FAD-dependent Oxidoreductase (CT582) minD Chromosome Partitioning ATPase (CT709) mreB Rod Shape Protein-Sugar Kinase US 6,822,071 B1 59 60

TABLE 2-continued (Supplemental Data) Functional Assignments of C. pneumoniae Coding Sequences. C. trachomatis genes are shown in parentheses. O867 (CT726) rodA Rod Shape Protein O684 (CT688) parB Chromosome Partitioning Protein Detoxtification OO57 (CT294) sodM Superoxide Dismutase (Mn) O778 (CT603) ahpC Thio-specific Antioxidant (TSA) Peroxidase Signal Transduction O148 (CT145) S/T Protein Kinase O584 (CT467) asoS Two-Component Sensor O294 (CT235) cAMP-Dependent Protein Kinase Regulatory Subunit O712 (CT664) (FHA domain) O478 (CT379) hfix GTP Binding Protein O703 (CT673) S/T Protein Kinase O095 (CT301) S/T Protein Kinase O397 (CT259) PP2C Phosphatase Family OO37 (CT337) ptsH PTS Phosphocarrier Protein Hpr OO38 (CT336) pts PTS PEP Phosphotransferase O060 (CT291) ptsN 1 PTS IIA Protein 1 OO61 (CT290) ptsN 2 PTS IIA Protein + HTH DNA-Binding Domain O262 (CT218) SurE SurE-like Acid Phosphatase O838 (CT698) thdF Thiophene/Furan Oxidation Protein O693 (CT683) TFR Repeats-CT683 Hypothetical Protein O321 (CTO92) ychF GTP Binding Protein O544 (CT418) yhbZ GTP binding protein O844 (CT703) yphC GTPase/GTP-binding protein Standard Protein Secretion O115 (CTO25) fh Signal Recognition Particle GTPase O363 (CTO60) flha Flagellar Secretion Protein O858 (CT717) f Flagellum-specific ATP Synthase O704 (CT672) fiN Flagellar Motor Switch Domain/YscQ family O815 (CT572) gspD Gen. Secretion Protein D O816 (CT571) gspE Gen. Secretion Protein E O817 (CT570) gspF Gen. Secretion Protein F O359 (CTO64) lepA GTPase O110 (CTO2O) lepB Signal Peptidase I O535 (CT408) lspA Lipoprotein Signal Peptidase O260 (CT141) secA 1 Protein Translocase Subunit 1 O841 (CT701) secA 2 Translocase SecA 2 O564 (CT448) sec)&secE. Protein Export Proteins SecD/SecF (fusion) OO75 (CT321) secE Preprotein Translocase O629 (CT510) SecY Translocase O848 (CT707) tig Trigger Factor-Peptidyl-prolyl Isomerase Transport-Related Proteins O486 Hypothetical Proline Permease O289 (CT230) aaaT Neutral Amino Acid (Glutamate) Transporter O691 (CT685) abcx ABC Transporter ATPase 1031 (CT374) arcD Arginine/Ornithine Antiporter O482 (CT381) art Arginine Periplasmic Binding Protein O836 (CT554) brnO Amino Acid (Branched) Transport O536 (CT409) dagA 1 D-Ala/Gly Permease 1 O876 (CT735) dagA 2 D-Alanine/Glycine Permease 2 O682 (CT690) dppD ABC ATPase Dipeptide Transport O683 (CT689) dppF ABC ATPase Dipeptide Transport O28O (CT689) dppF Dipeptide Transporter ATPase O785 (CT596) exbB Macromolecule Transporter O784 (CT597) exbD Biopolymer Transport Protein O604 (CT486) fy Glutamine Binding Protein O192 (CT129) glnP ABC Amino Acid Transporter Permease O191 (CT130) glnO ABC Amino Acid Transporter ATPase O528 (CT401) glitT Glutamate Symport O286 (CT194) mgtE Mg' Transporter (CBS Domain) O413 (CT264) msbA Transport ATP Binding Protein O290 (CT231) Na'-dependent Transporter O195 (CT198) oppA 1 Oligopeptide Binding Protein 1 O196 (CT198) oppA 2 Oligopeptide Binding Protein 2 O197 (CT139) oppA 3 Oligopeptide Binding Protein 3 O198 (CT175) oppA 4 Oligopeptide Binding Protein 4 O599 (CT480) oppA 5 Oligopeptide Binding Lipoprotein 5 O199 (CT199) oppB 1 Oligopeptide Permease 1 0598 (CT479) oppB 2 Oligopeptide Permease 2 O2OO (CT200) oppC 1 Oligopeptide Permease 1 O597 (CT478) oppC 2 Oligopeptide Permease 2 O2O1 (CT201) oppD Oligopeptide Transport ATPase US 6,822,071 B1 61

TABLE 2-continued (Supplemental Data) Functional Assignments of C. pneumoniae Coding Sequences. C. trachomatis genes are shown in parentheses. O2O2 (CT202) oppF Oligopeptide Transport ATPase O231 (CT180) tauB ABC Transport ATPase (Nitrate/Fe) O782 (CT599) tol Macromolecule Transporter O969 (CT317) tyrP 1 Tyrosine Transport 1 O970 (CTS18) tyrP 2 Tyrosine Transport 2 O665 (CT544) uhpC Hexosphosphate Transport O282 (CT216) aaSA Amino Acid Transporter O2O7 (CT204) ybhI dicarboxylate Translocator O971 (CTS19) yccA Transport Permease O248 (CT152) ycfV ABC Transporter ATPase 1014 (CT356) ychM Sulfate Transporter O736 (CT641) ygeD Efflux Protein O68O (CT692) ygo4 Phosphate Permease O723 (CT653) yhbC ABC Transporter ATPase OO23 (CT348) yiK ABC Transporter Protein ATPase O127 (CTO34) ytlF Cationic Amino Acid Transporter O349 (CTO67) ytgA Solute Protein Binding Family O348 (CTO68) ytgB ABC Transporter ATPase O347 (CTO69) ytgC Integral Membrane Protein O346 (CTO70) ytg|D Integral Membrane Protein 1012 (CT354) yzcB ABC Transporter Permease O868 (CT727) mtA Metal Transport P-type ATPase O279 Possible ABC Transporter Permease Protein O543 (CT417) (Metal Transport Protein) O692 (CT684) ABC Transporter O542 (CT416) ABC Transporter ATPase O690 (CT686) ABC Transporter Membrane Protein O541 (CT415) solute binding protein Type-III Secretion O323 (CT090) lcrD Low Calcium Response D O324 (CTO89) lcrE Low Calcium Response E O811 (CT576) lcrE 1 Low Ca Response Protein H 1 1021 (CTS62) lcrEH 2 Low Calcium Response 2 O325 (CTO88) sycE Secretion Chaperone O702 (CT674) yscC Yop C/Gen Secretion Protein D O828 (CT559) ysc) Yop Translocation J O826 (CT561) yscL Yop Translocation L O707 (CT669) yscN Yop N (Flagellar-Type ATPase) O825 (CT562) yscR Yop Translocation R O824 (CT563) yscS YopS Translocation Protein O823 (CT564) yscT YopT Translocation T O322 (CTO91) ytcU Yop Translocation Protein U Central Intermediary Metabolism Glycogen Metabolism O856 (CT715) UDP-Glucose Pyrophosphorylase O948 (CT798) glgA Glycogen Synthase O475 (CT366) glgB Glucan Branching Enzyme O607 (CT489) glgC Glucose-1-P Adenyltransferase O3O7 (CT248) glgP Glycogen Phosphorylase O388 (CTO42) glgX Glycogen Hydrolase (debranching) O326 (CTO87) malO Glucanotransferase O851 (CT710) pckA Phosphoenolpyruvate Carboxykinase Phosphorous & Sulfur O548 (CT435) cysI Sulfite Reductase O920 (CT774) cysQ Sulfite Synthesis/Biphosphate Phosphatase OO25 (CT346) atSA Sulphohydrolase O918 (CT772) ppa Inorganic Pyrophosphatase DNA Replication, Modification, Repair & Recombination DNA Mismatch Repair O505 3-Methyladenine DNA Glycosylase O812 (CT575) mutL DNA Mismatch Repair O941 (CT792) mutS DNA Mismatch Repair O4O2 (CT107) mutY Adenine Glycosylase O732 (CT625) info Endonuclease IV O837 (CT697) nth Endonuclease III DNA Modification O596 (CT477) ada Methyltransferase O114 (CTO24) hemk A/G-specific Methylase O891 (CT748) mfcd Transcription-Repair Coupling US 6,822,071 B1 63 64

TABLE 2-continued (Supplemental Data) Functional Assignments of C. pneumoniae Coding Sequences. C. trachomatis genes are shown in parentheses. O62O (CT501) ruVA Holliday Junction Helicase O390 (CTO40) ruvB Holliday Junction Helicase O621 (CT502) ruvC Crossover Junction Endonuclease OO53 (CT298) SS Sms Protein O773 (CT607) lung Uracil DNA Glycosylase 1062 (CT329) XscA Exodosyribonuclease VII DNA Recombination

O762 (CT650) recA RecA Recombination Protein O738 (CT639) rec Exodeoxyribonuclease V. Beta O737 (CT640) recC Exodeoxyribonuclease V. Gamma O123 (CTO33) rec) 1. Exodeoxyribonuclease V (Alpha Subunit) 1 O752 (CT652) rec) 2 Exodeoxyribonuclease V. Alpha 2 O339 (CTO74) rec ABC Superfamily ATPase O340 (CTO74) (frame-shift with 0339) O563 (CT447) rec ssDNA Exonuclease O299 (CT240) recR Recombination Protein DNA Replication O309 (CT250) dnaA 1 Replication Initiation Protein 1 O424 (CT275) dnaA 2 Replication Initiation Factor 2 O616 (CT497) dnaB Replicative DNA Helicase O666 (CT545) dinaE DNA Pol III Alpha O942 (CT794) dnaG DNA Primase 0338 (CTO75) dnaN DNA Pol III (Beta) O410 (CT261) dnaO 1 DNA Pol III Epsilon Chain 1 O655 (CT536) dnaO 2 DNA Pol III Epsilon Chain 2 OO40 (CT334) dnaX 1 DNA Pol III Gamma and Tau 1 O272 (CT187) dnaX 2 DNA Pol III Gamma and Tau 2 O149 (CT146) din DNA Ligase O274 (CT189) gyra 1 DNA Gyrase Subunit A 1 O716 (CT660) gyra 2 DNA Gyrase Subunit A 2 O275 (CT190) gyrB 1 DNA Gyrase Subunit B 1 O715 (CT661) gyrB 2 DNA Gyrase Subunit B 2. O416 (CT267) himD Integration Host Factor Alpha O612 (CT493) polA DNA Polymerase I O924 (CT778) priA Primosomal Protein N O386 (CTO44) ssb SS DNA Binding Protein O835 (CT555) SWI/SNF family helicase 1 O849 (CT708) SWI/SNF family helicase 2 O769 (CT643) topA DNA Topoisomerase I-Fused to SWI Domain OO24 (CT347) xerC Integrase/recombinase 1024 (CTS64) xerD Integrase/recombinase Eukaryotic-Type Chromatin Factors O886 (CT743) hctA Histone-Like Developmental Protein O384 (CTO46) hctB Histone-like Protein 2 O878 (CT737) SET Domain protein O577 (CT460) SWIB (YM74) Complex Protein UVR Exinuclease Repair System OO96 (CT333) uvrA Excinuclease ABC Subunit A O8O1 (CT556) uvrB Excinuclease ABC Subunit B O940 (CT791) uvrC Excinuclease ABC Subunit C O772 (CT608) uwrD DNA Helicase Energy Metabolism Aerobic O855 (CT714) gpdA Glycerol-3-P Dehydrogenase O743 (CT634) narA Ubiquinone Oxidoreductase, Alpha 0427 (CT278) nar2 NADH (Ubiquinone) Dehydrogenase O428 (CT279) nar3 NADH (Ubiquinone) Oxidoreductase, Gamma O429 (CT280) nar4 NADH (Ubiquinone) Reductase 4 O430 (CT281) narS NADH (Ubiquinone) Reductase 5 O883 (CT740) nar6 Phenolhydrolase/NADH (Ubiquinone) Oxidoreductase 6 ATP Biogenesis and metabolism O351 (CTO65) adt 1 ADP/ATP Translocase 1 O614 (CT495) adt 2 ADP/ATP Translocase 2 O088 (CT308) atp A ATP Synthase Subunit A OO89 (CT307) atp3 ATP Synthase Subunit B OO90 (CT306) atpD ATP Synthase Subunit D O086 (CT310) atp, ATP Synthase Subunit E OO91 (CT305) atp ATP Synthase Subunit I US 6,822,071 B1 65

TABLE 2-continued (Supplemental Data) Functional Assignments of C. pneumoniae Coding Sequences. C. trachomatis genes are shown in parentheses. O092 (CT304) atpK ATP Synthase Subunit K O860 (CT719) OF Flagellar M-Ring Protein Electron Transport Chain O1O2 (CT013) cydA Cytochrome Oxidase Subunit I O103 (CTO14) cydB Cytochrome Oxidase Subunit II O364 (CTO59) Ferredoxin OO84 (CT312) Predicted Ferredoxin Glycolysis & Gluconeogenesis O281 (CT215) dhnA Predicted 1,6-Fructose Biphosphate Aldolase O8OO (CT587) CO Enolase O624 (CT505) gap A Glyceraldehyde-3-P Dehydrogenase OO56 (CT295) mrSA Phosphomannomutase O967 (CTS15) pgm Phosphoglucomutase O160 (CT207) pfkA 1 Fructose-6-P Phosphotransferase 1 O2O8 (CT205) pfkA 2 Fructose-6-P Phosphotransferase 2 1025 (CT378) pgi Glucose-6-P Isomerase O679 (CT693) pgk Phosphoglycerate Kinase O863 (CT722) pgmA Phosphoglycerate Mutase O097 (CT332) pyk Pyruvate Kinase 1063 (CT328) tpiS Triosephosphate Isomerase Pentose Phosphate Pathway O239 (CT186) dew Glucose-6-P Dehydrogenase (Dev B family) 1060 (CT331) CXS Transketolase O360 (CTO63) gnd 6-Phosphogluconate Dehydrogenase O185 (CT121) e Ribulose-P Epimerase O141 (CT213) rpiA Ribose-5-P Isomerase A O083 (CT313) al Transaldolase O893 (CT750) kt3 Transketolase O238 (CT185) Zwf Glucose-6-P Dehydrogenase Pyruvate Dehydrogenase O833 (CT557) pdA Lipoamide Dehydrogenase O436 (CT285) pla 1 Lipoate Protein Ligase-Like Protein O618 (CT499) pla 2 Lipoate-Protein Ligase A OO33 (CT340) bdha&B Oxoisovalerate Dehydrogenase C/B Fusion O3O4 (CT245) bdha Pyruvate Dehydrogenase Alpha O305 (CT246) bdhB Pyruvate Dehydrogenase Beta O3O6 (CT247) bdhC Dihydrolipoamide Acetyltransferase TCA Cycle O495 (CT390) aspC Aspartate Aminotransferase 1013 (CTS55) umC Fumarate Hydratase 1028 (CT376) mdhC Malate Dehydrogenase O789 (CT592) sdhA Succinate Dehydrogenase O790 (CT591) sdhB Succinate Dehydrogenase O788 (CT593) sdhC Succinate Dehydrogenase O378 (CTO54) SucA Oxoglutarate Dehydrogenase O377 (CTO55) SucB 1 Dihydrolipoamide Succinyltransferase 1 O527 (CT400) sucB 2 Dihydrolipoamide Succinyltransferase 2 O973 (CTS21) sucC Succinyl-CoA Synthetase, Beta O974 (CT322) sucD Succinyl-CoA Synthetase, Alpha Protein Folding, Assembly & Modification Chaperones

O949 (CT799) ctic General Stress Protein O534 (CT407) dkSA DnaK Suppressor OO32 (CT341) dna Heat Shock Protein J 0503 (CT396) dnaK Hsp-70 O134 (CT110) groEL 1 Hsp-60 1 O777 (CT604) groEL 2 Hsp-60 2 O898 (CT755) groEL 3 Hsp-60 3 O135 (CT111) groES 10 KDa Chaperonia O5O2 (CT395) grpE HSP-70 Cofactor O661 (CT541) mip FKBP-type Peptidyl-prolyl Cis-Trans Isomerase Proteases O144 (CT113) clpB Clp Protease ATPase O437 (CT286) clpC ClpC Protease O52O (CT431) clpP 1 CLP Protease O847 (CT706) clpP 2 CLP Protease Subunit O846 (CT705) clpX CLP Protease ATPase O269 (CT138) Dipeptidase US 6,822,071 B1 67

TABLE 2-continued (Supplemental Data) Functional Assignments of C. pneumoniae Coding Sequences. C. trachomatis genes are shown in parentheses. O998 (CT341) ftsH ATP-dependent Zinc Protease OO3O (CT343) gep. 1 O-Sialoglycoprotein Endopeptidase 1 O194 (CT197) gep 2 O-Sialoglycoprotein Endopeptidase 2 O979 (CT323) htra DO Serine Protease O957 (CTS06) ide Insulinase family/Protease III OO27 (CT344) lon Lon ATP-dependent Protease 1017 (CTS59) lytB Metalloprotease 1009 (CT351) map Methionine Aminopeptidase O385 (CTO45) pepA Leucyl Aminopeptidase A O136 (CT112) pepF Oligopeptidase O813 (CT574) pepP Aminopeptidase P O613 (CT494) sohB Protease O555 (CT441) tsp Tail-Specific Protease O344 (CTO72) yaeL Metalloprotease O981 (CT324) Zinc Metalloprotease (insulinase family) Protein Isomerases

O227 (CT176) disbB Disulfide bond Oxidoreductase O786 (CT595) disbD Thio:disulfide Interchange Protein O228 (CT177) disbG Disulfide Bond Chaperone O933 (CT783) Predicted Disulfide Bond Isomerase O926 (CT780) Thioredoxin Disulfide Isomerase Transcription RNA Degradation O999 (CT342) pnp Polyribonucleotide Nucleotidyltransferase OO54 (CT297) C Ribonuclease III O119 (CTO29) mhB 1 Ribonuclease HII 1 1068 (CTO08) mhB 2 Ribonuclease HII 2 O934 (CT784) mpA Ribonuclease P Protein Component OSO4 (CT397) vacB Ribonuclease Family RNA Elongation & Termination Factors O741 (CT636) grea Transcription Elongation Factor O316 (CTO97) nuSA N Utilization Protein. A OO76 (CT320) nusG Transcriptional Antitermination O845 (CT704) pcnB 1 Poly A Polymerase 1 O966 (CT410) pcnB 2 Poly A Polymerase 2 O610 (CT491) rho Transcription Termination Factor RNA Methylases O674 (CT553) fmu RNA Methyltransferase 1059 (CT354) kgSA Dimethyladenosine Transferase O187 (CT133) Predicted Methylase O530 (CT403) spoU 1 rRNA Methylase 1 O660 (CT540) spoU 2 rRNA Methylase 2 O117 (CTO27) trim) tRNA ( N-1)-Methyltransferase O885 (CT742) ygcA rRNA Methyltransferase O986 (CT329) ygg Predicted rRNA Methylase O987 (CTS30) ytgB Predicted rRNA Methylase RNA Modification O649 (CT530) fmt Methionyl tRNA Formyltransferase O910 (CT766) miZA tRNA Pyrophosphate Transferase O719 (CT658) sfhB Predicted Pseudouridine Synthase O219 (CT193) tgt Queuine tRNA Ribosyl Transferase O58O (CT463) truA Pseudouridylate Synthase I O319 (CTO94) truE tRNA Pseudouridine Synthase O403 (CT106) yceC Predicted Pseudouridine Synthetase Family O864 (CT723) yibC Predicted Pseudouridine Synthase RNA Polymerase & Transcription Regulators O586 (CT468) atpC Two-Component Regulator O362 (CTO61) rpsD Sigma-28/WhiG Family OSO1 (CT394) hircA HTH Transcriptional Repressor O793 (CT588) rbsU Sigma Regulatory Family Protein-PP2C Phosphatase (RsbW Antagonist) O626 (CT507) rpoA RNA Polymerase Alpha OO81 (CT315) rpoB RNA Polymerase Beta OO82 (CT314) rpoC RNA Polymerase Beta' O756 (CT615) rpoD RNA Polymerase Sigma-66 O771 (CT609) rpoN RNA Polymerase Sigma-54 O511 (CT424) rsbV 1 Sigma Regulatory Factor 1 O909 (CT765) rsbV 2 Sigma Factor Regulator 2 0670 (CT549) rsbW Sigma Regulatory Factor Histidine Kinase US 6,822,071 B1 69 70

TABLE 2-continued (Supplemental Data) Functional Assignments of C. pneumoniae Coding Sequences. C. trachomatis genes are shown in parentheses. O750 (CT630) tot) HTH Transcriptional Regulatory Protein + Receiver Doman 1069 (CTO09) yfgA HTH Transcriptional Regulator Translation Amino Acyl tRNA Synthesis O892 (CT749) alaS Alanyl tRNA Synthetase O570 (CT454) argS Arginyl tRNA Transferase O662 (CT542) aspS Aspartyl tRNA Synthetase O932 (CT782) cysS Cysteinyl tRNA Synthetase OOO3 (CTOO3) gatA Glu tRNA Glin Amidotransferase (A subunit) OOO4 (CTOO4) gat3 Glu tRNA Glin Amidotransferase (B Subunit) OOO2 (CTOO2) gatC Glu tRNA Glin Amidotransferase (C subunit) O560 (CT445) gltX Glutamyl-tRNA Synthetase O946 (CT796) glyO Glycyl tRNA Synthetase O663 (CT543) hisS Histidyl tRNA Synthetase O109 (CTO19) illeS Isoleucyl-tRNA Synthetase O153 (CT209) leuS Leucyl tRNA Synthetase O931 (CT781) lysS Lysyl tRNA Synthetase O122 (CTO32) metG Methionyl-tRNA Synthetase O993 (CTS36) pheS Phenylalanyl tRNA Synthetase, Alpha O594 (CT475) pheT Phenylalanyl tRNA Synthetase Beta O500 (CT393) proS Prolyl tRNA Synthetase O870 (CT729) serS Seryl tRNA Synthetase 2 O806 (CT581) thrS Threonyl tRNA Synthetase O8O2 (CT585) trpS Tryptophanyl tRNA Synthetase O361 (CTO62) tyrS Tyrosyl tRNA Synthetase OO94 (CT302) walS Valyl tRNA Synthetase Peptide Chain Initiation, Elongation & Termination 1067 (CT353) def Polypeptide Deformylase O184 (CT122) efp 1 Elongation Factor P 1 O895 (CT752) efp 2 Elongation Factor P 2 O550 (CT437) fuSA Elongation Factor G OO73 (CT323) infA Initiation Factor IF-1 O317 (CTO96) inf Initiation Factor-2 O990 (CTS33) info Initiation Factor 3 O113 (CTO23) pfrA Peptide Chain Releasing Factor 1 O576 (CT459) prfB Peptide Chain Release Factor 2 O950 (CTSOO) pth Peptidyl tRNA Hydrolase O318 (CTO95) rbfA Ribosome Binding Factor A O699 (CT677) Ribosome Releasing Factor O697 (CT679) tS Elongation Factor TS OOf4 (CT322) tufA Elongation Factor Tu Ribosomal Proteins

OO78 (CT318) L1 Ribosomal Protein O644 (CT525) 2 L2 Ribosomal Protein O647 (CT528) 3 L3 Ribosomal Protein O646 (CT527) 4 L4 Ribosomal Protein O635 (CT516) 5 L5 Ribosomal Protein O633 (CT514) 6 L6 Ribosomal Protein 0080 (CT316) 7 L7/L12 Ribosomal Protein O953 (CT803) 9 L9 Ribosomal Protein OO79 (CT317) r10 L10 Ribosomal Protein OO77 (CT319) r11 L11 Ribosomal Protein O247 (CT125) r13 L13 Ribosomal Protein O637 (CT518) r14 L14 Ribosomal Protein O630 (CT511) r15 L15 Ribosomal Protein O640 (CT521) r16 L16 Ribosomal Protein O625 (CT506) r117 L17 Ribosomal Protein O632 (CT513) r18 L18 Ribosomal Protein O118 (CTO28) r119 L19 Ribosomal Protein O992 (CTS35) r2O L20 Ribosomal Protein O546 (CT420) r21 L21 Ribosomal Protein O642 (CT523) r22 L22 Ribosomal Protein O645 (CT526) r123 L23 Ribosomal Protein O636 (CT517) r24 L24 Ribosomal Protein O545 (CT419) r127 L27 ribosomal protein O327 (CTO86) r28 L28 Ribosomal Protein O639 (CT520) r29 L29 Ribosomal Protein O112 (CTO22) r131 L31 Ribosomal Protein O961 (CTS10) r132 L32 Ribosomal Protein O250 (CT150) r133 L33 Ribosomal Protein O935 (CT785) r134 L34 Ribosomal Protein O991 (CTS34) r135 L35 Ribosomal Protein US 6,822,071 B1 71

TABLE 2-continued (Supplemental Data) Functional Assignments of C. pneumoniae Coding Sequences. C. trachomatis genes are shown in parentheses. (CT786) r136 L36 Ribosomal Protein (CTO98) rs1 S1 Ribosomal Protein (CT680) rs2 S2 Ribosomal Protein (CT522) rs3 S3 Ribosomal Protein (CT626) rs4 S4 Ribosomal Protein (CT512) rS5 S5 Ribosomal Protein (CT301) rs6 S6 Ribosomal Protein (CT438) rs, S7Ribosomal Protein (CT515) rs8 S8 Ribosomal Protein (CT126) rs9 S9 Ribosomal Protein (CT436) rs10 S10 Ribosomal Protein (CT508) rs11 S11 Ribosomal Protein (CT439) rs12 S12 Ribosomal Protein (CT509) rs13 S13 Ribosomal Protein (CT787) rs14 S14 Ribosomal Protein (CT343) rS15 S15 Ribosomal Protein (CTO26) rs16 S16 Ribosomal Protein (CT519) rS17 S17 Ribosomal Protein (CTSO2) rs18 S18 Ribosomal Protein (CT524) rS19 S19 Ribosomal Protein (CT617) rs20 S2O Ribosomal Protein (CT342) rs21 S21 Ribosomal Protein Other Categories Chlamydia-Specific Proteins O561 (CT446) EuO CHLPS EuO Protein O804 (CT583) Gp6D CHLTR Plasmid Paralog O186 (CT119) Similarity to IncA 1 O291 (CT232) incB Inclusion Membrane Pro ein B O292 (CT233) incC Inclusion Membrane Pro ein C 1026 (CT377) LtuA Protein O333 (CTO80) LtuB Protein OOO5 (CTS7 pmp 1 Polymorphic Outer Mem brane Pro ein G Family OO13 (CTS7 pmp 2 Polymorphic Outer Mem brane Pro ein G Family OO14 (CTS7 pmp 3 Polymorphic Outer Mem brane Pro ein G Family OO15 (CTS7 pmp 3 PMP 3 ( rame-shift with 0.014) OO16 (CTS74 pmp 4 Polymorphic Outer Mem brane Pro ein G Family OO17 (CTS7 pmp 4 PMP 4 ( rame-shift with 0.016) OO18 (CTS74 pmp 5 Polymorphic Outer Mem brane Pro ein G Family OO19 (CTS7 pmp 5 PMP 5 ( rame-shift with 0.018) O444 (CTS7 pmp 6 Polymorp hic Outer Mem brane Pro ein G/I Family O445 (CTS7 pmp 7 Polymorp hic Outer Mem brane Pro ein G Family O446 (CTS7 pmp 8 Polymorp hic Outer Mem brane Pro ein G Family O447 (CTS7 pmp 9 Polymorp hic Outer Mem brane Pro ein G/I Family O450 (CTS7 pmp 10 Polymorp hic Outer Mem brane Pro ein G Family O449 (CTS7 pmp 10 PMP 10 (Frame-shift with 0450) O451 (CTS7 pmp 11 Polymorphic Outer Mem brane Pro ein G Family O452 (CTS74 pmp 12 Polymorphic Outer Mem brane Pro ein (truncated) A/I Family O453 (CTS7 pmp 13 Polymorphic Outer Mem brane Pro ein G Fami O454 (CTS72) pmp 14 Polymorphic Outer Mem brane Pro ein H. Fami O466 (CT369) pmp 15 Polymorphic Outer Mem brane Pro ein E Fami O467 (CT369) pmp 16 Polymorphic Outer Mem brane Pro ein E Fami O468 (CT369) pmp 17 Polymorphic Outer Mem brane Pro ein E Fami O469 (CT369) pmp 17 PMP 17 (Frame-shift with 0468) O470 (CT369) pmp 17 PMP 17 (Frame-shift with 0469) O471 (CTS70) pmp 18 Polymorphic Outer Mem brane Pro ein E/F Family O539 (CT412) Oil 19 Polymorphic Membrane Protein A Family O540 (CT413) Il 20 Polymorphic Membrane Protein B Family O963 (CT312) Il 21 Polymorphic Membrane Protein D Family O562 CHLPS 43 kDa Protein Homolog 1 O927 CHLPS 43 kDa Protein Homolog 2 O928 CHLPS 43 kDa Protein Homolog 3 O929 CHLPS 43 kDa Protein Homolog 4 O728 (CT622) CHLPN 76 kDa Homolog 1 (CT622) O729 (CT623) CHLPN 76 kDa Homolog 2 (CT623) O133 (CT109) CHLPS Hypothetical Protein O332 (CTO81) CHLPRT2 Protein Miscellaneous Enzymes/Conserved Proteins O193 argR Possible Arginine Repressor 1046 Aromatic Amino Acid Hydroxylase O232 Similarity to 5'-Methylthioadenosine Nucleosidase O128 (CTO35) Biotin Protein Ligase O513 (CT426) Fe-S Oxidoreductase 1 O911 (CT767) Fe-S Oxidoreductase 2 US 6,822,071 B1 73 74

TABLE 2-continued (Supplemental Data) Functional Assignments of C. pneumoniae Coding Sequences. C. trachomatis genes are shown in parentheses. O373 (CTO57) gepE GepE Protein O4O7 (CT103) HAD Superfamily Hydrolase/Phosphatase 0917 (CT771) Hydrolase/Phosphatase Homolog O488 (CT385) ycfF HIT Family Hydrolase O701 (CT675) karG Arginine Kinase O526 (CT399) kpsF GurO/KpsF Family Sugar-P Isomerase O919 (CT773) ldh Leucine Dehydrogenase OO22 (CT349) maf Maf protein O997 (CT340) mes PP-loop superfamily ATPase O15 (CT148) mhp A Monooxygenase O730 (CT624) mviN Integral Membrane Protein O86 (CT720) Nif-Related Protein O479 (CT380) phnP Metal Dependent Hydrolase O106 (CTO15) phoH ATPase O329 (CTO84) Phospholipase D Superfamily O435 (CT284) Phospholipase D Superfamily O58 (CT464) Phosphoglycolate Phosphatase O897 (CT754) Predicted Phosphohydrolase O509 (CT422) Predicted Metalloenzyme 1030 (CT375) Predicted D-Amino Acid Dehydrogenase O53 (CT404) SAM Dependent Methyltransferase O337 (CTO76) SmpB Small Protein B O394 (CT256) tlyC 1 CBS Domain Protein (Hemolysin Homolog) 1 O510 (CT423) tlyC 2 CBS Domains (Hemolysin Homolog) 2. O382 (CT048) yabC SAM-Dependent Methyltransferase O787 (CT594) yabD PHF Superfamily (Urease/Pyrimidinase) Hydrolase O61 (CT492) yacE Predicted Phosphatase/Kinase O579 (CT462) yacM Sugar Nucleotide Phosphorylase O578 (CT461) yae Phosphohydrolase O345 (CTO71) yaeM CTO71 Hypothetical Protein O566 (CT450) yaeS YaeS family Hypothetical Protein O59 (CT472) yagE YagE family OO39 (CT335) ybaB YbaB family Hypothetical Protein O1O (CTO12) ybbP YbbP family Hypothetical Protein O915 (CT769) ybeB iojap Superfamily Onholog O137 (CT108) ybgI ACR family O529 (CT402) ycaH ATPase O438 (CT287) ycbF PP-loop Superfamily ATPase O734 (CT627) yccA YccA Hypothetical Protein O954 (CTS04) ych B Predicted Kinase O261 (CT217) ydaO PP-Loop Superfamily ATPase O245 (CT127) ydho Polysaccharide Hydrolase-Invasin Repeat Family O573 (CT457) yebC YebC Family Hypothetical Protein O689 (CT687) yfhO 1 NifS-related Aminotransferase 1 O862 (CT721) yfhO 2 NifS-related Aminotransferase 2 O547 (CT434) ygbB YgbB Family Hypothetical Protein O237 (CT184) yggF YggF Family Hypothetical Protein O775 (CT606) yggY YggY Family Hypothetical Protein O396 (CT258) yhfo 3 NifS-related Aminotransferase 3 O605 (CT487) yhhF Predicted Methylase O575 (CT458) yhhY Amino Group Acetyl Transferase O592 (CT473) yidD YidD Family O982 (CT325) yigN YigN Family Hypothetical Protein O657 (CT537) yieE YeE Hypothetical Protein O768 (CT644) yohI Yoh Predicted Oxidoreductase O336 (CTO77) yoL YojL Hypothetical Protein O217 (CT140) ypdP YpdP Hypothetical Protein O140 (CT212) ygdE YgdE Hypothetical Protein O263 (CT221) ygflJ YgfU Hypothetical Protein O139 (CT211) yggE YdgE Hypothetical Protein O270 (CT137) ywlC SuAS Superfamily-related Protein O879 (CT738) yyc) Metal Dependent Hydrolase Homologs to CHLTR Hypothetical Coding Genes OOO1 (CTOO1) CT001 Hypothetical Protein OO2O (CT351) CT351 Hypothetical Protein OO21 (CT350) CT350 Hypothetical Protein OO26 (CT345) CT345 Hypothetical Protein OO35 (CT339) CT339 Hypothetical Protein OO36 (CT338) CT338 Hypothetical Protein OO55 (CT296) CT296 Hypothetical Protein OO62 (CT289) CT289 Hypothetical Protein OO65 (CT288) CT288 Hypothetical Protein OO68 (CT360) CT360 Hypothetical Protein OO71 (CT325) CT325 Hypothetical Protein OO72 (CT324) CT324 Hypothetical Protein US 6,822,071 B1 75 76

TABLE 2-continued (Supplemental Data) Functional Assignments of C. pneumoniae Coding Sequences. C. trachomatis genes are shown in parentheses.

OO85 (CT311) CT311 Hypotheti Ca Pro ein O087 (CT309) CT309 HypothetiCa Pro ein OO93 (CT303) CT303 HypothetiCa Pro ein O1OO (CTO11) CT011 Hypotheti Ca Pro ein O104 (CTO17) CTO17 HypothetiCa Pro ein O105 (CT016) CT016 HypothetiCa Pro ein O107 (CTO58) CTO58 HypothetiCa Pro ein 1 O108 (CTO18) CTO18 Similarit y O111 (CT021) CT021 HypothetiCa Pro ein O121 (CT031) CT031 HypothetiCa Pro ein O129 (CTO36) CTO36 Similarit y O145 (CT114) CT114 Hypotheti Ca Pro ein O150 (CT147) CT147 HypothetiCa Pro ein O152 (CT149) CT149 HypothetiCa Pro ein O176 (CT153) CT153 HypothetiCa Pro ein O188 (CT132) CT132 HypothetiCa Pro ein O189 (CT131) CT131 HypothetiCa Pro ein O2O6 (CT203) CT203 HypothetiCa Pro ein O229 (CT178) CT178 HypothetiCa Pro ein O230 (CT179) CT179 HypothetiCa Pro ein O234 (CT181) CT181 HypothetiCa Pro ein O249 (CT151) CT151 HypothetiCa Pro ein O253 (CT144) CT144 HypothetiCa Pro ein 1 O254 (CT143) CT143 HypothetiCa Pro ein 1 O255 (CT142) CT142 HypothetiCa Pro ein 1 O256 (CT144) CT144 HypothetiCa Pro ein 2 O257 (CT143) CT143 HypothetiCa Pro ein 2 O259 (CT142) CT142 HypothetiCa Pro ein 2 O276 (CT191) CT191 HypothetiCa Pro ein O288 (CT195) CT195 HypothetiCa Pro ein O293 (CT234) CT234 HypothetiCa Pro ein O3O1 (CT242) CT368 HypothetiCa Pro ein O303 (CT244) CT244 HypothetiCa Pro ein O3O8 (CT249) CT249 Similarit y O312 (CT101) CT101 Hypothe Ca Pro ein O328 (CTO85) CTO85 Hypothe ica Pro ein O330 (CTO83) CTO83 HypothetiCa Pro ein O331 (CTO82) CTO82 HypothetiCa Pro ein O334 (CTO79) CTO79 Similarit y O342 (CTO73) CTO73 Hypothe ica Pro ein O343 (CTO73) (frame-shift with 0342?) O350 (CTO66) CTO66 Hypothe Ca Pro ein O369 (CTO58) CTO58 HypothetiCa Pro ein 2 O370 (CTO58) CTO58 HypothetiCa Pro ein 3 O374 (CTO56) CTO56 HypothetiCa Pro ein O379 (CTO53) CTO53 HypothetiCa Pro ein O381 (CT326) CT326 Similarit y O383 (CTO47) CTO47 Hypothe Ca Pro ein O387 (CTO43) CTO43 Hypothe ica Pro ein O389 (CTO41) CTO41 HypothetiCa Pro ein O393 (CTO38) CTO38 HypothetiCa Pro ein O395 (CT257) CT257 HypothetiCa Pro ein O399 (CT253) CT253 HypothetiCa Pro ein OO (CT254) CT254 HypothetiCa Pro ein O1 (CT255) CT255 HypothetiCa Pro ein 05 (CT105) CT105 HypothetiCa Pro ein O8 (CT102) CT102 HypothetiCa Pro ein O9 (CT260) CT260 HypothetiCa Pro ein 11 (CT262) CT262 HypothetiCa Pro ein 12 (CT263) CT263 HypothetiCa Pro ein 15 (CT266) CT266 HypothetiCa Pro ein 2O (CT271) CT271 HypothetiCa Pro ein 22 (CT273) CT273 HypothetiCa Pro ein 23 (CT274) CT274 HypothetiCa Pro ein 25 (CT276) CT276 HypothetiCa Pro eins 26 (CT277) CT277 Smarty 34 (CT283) CT283 Hypothe Ca Pro ein 41 (CTO07) CT007 Hypothe ica Pro ein 42 (CTO06) CT006 HypothetiCa Pro ein 43 (CTO05) CT005 HypothetiCa Pro ein 74 (CT365) CT365 HypothetiCa Pro ein 76 (CTS65) CT865 HypothetiCa Pro ein 8O (CT383) CT383 HypothetiCa Pro ein 85 (CT382) CT382.1 Hypothetical Protein 87 (CT384) CT384 Hypothe Ca Pro e 89 (CT386) CT386 HypothetiCa Pro ein US 6,822,071 B1 77

TABLE 2-continued (Supplemental Data) Functional Assignments of C. pneumoniae Coding Sequences. C. trachomatis genes are shown in parentheses. O490 (CT387) CT387 Hypothetical Protein O491 (CT389) CT389 Hypothetical Protein O496 (CT391) CT391 Hypothetical Protein O497 (CT388) CT388 Hypothetical Protein OSO6 (CT421) CT421 Hypothetical Protein O507 (CT421) CT421.1 Hypothetical Protein O508 (CT421) CT421.2 Hypothetical Protein O512 (CT425) CT425 Hypothetical Protein O514 (CT427) CT427 Hypothetical Protein O518 (CT429) CT429 Hypothetical Protein O522 (CT433) CT433 Hypothetical Protein O525 (CT398) CT398 Hypothetical Protein O533 (CT406) CT406 Hypothetical Protein O537 (CTS14) CT814.1 Hypothetical Protein O538 (CTS14) CT814 Hypothetical Protein 0554 (CT440) CT440 Hypothetical Protein O559 (CT441) CT441.1 Hypothetical Protein O565 (CT449) CT449 Hypothetical Protein O572 (CT456) CT456 Hypothetical Protein O582 (CT465) CT465 Hypothetical Protein O583 (CT466) CT466 Hypothetical Protein O588 (CT469) CT469 Hypothetical Protein O589 (CT470) CT470 Hypothetical Protein O590 (CT471) CT471 Hypothetical Protein O593 (CT474) CT474 Hypothetical Protein O595 (CT476) CT476 Hypothetical Protein O6O1 (CT483) CT483 Hypothetical Protein O6O2 (CT484) CT484 Hypothetical Protein O606 (CT488) CT488 Hypothetical Protein O609 (CT490) CT490 Hypothetical Protein O622 (CT503) CT503 Hypothetical Protein O623 (CT504) CT504 Hypothetical Protein O648 (CT529) CT529 Hypothetical Protein O658 (CT538) CT538 Hypothetical Protein O667 (CT546) CT546 Hypothetical Protein O668 (CT547) CT547 Hypothetical Protein O669 (CT548) CT548 Hypothetical Protein O671 (CT550) CT550 Hypothetical Protein O673 (CT552) CT552 Hypothetical Protein O675 (CT696) CT696 Hypothetical Protein 0676 (CT695) CT695 Similarity O681 (CT691) CT691 Hypothetical Protein O687 (CT482) CT482 Hypothetical Protein O688 (CT481) CT481 Hypothetical Protein O7OO (CT676) CT676 Hypothetical Protein O705 (CT671) CT671 Hypothetical Protein O7O6 (CT670) CT670 Hypothetical Protein O708 (CT668) CT668 Hypothetical Protein O709 (CT667) CT667 Hypothetical Protein O710 (CT666) CT666 Hypothetical Protein O711 (CT665) CT665 Hypothetical Protein O713 (CT663) CT663 Hypothetical Protein O717 (CT656) CT656 Hypothetical Protein O718 (CT657) CT657 Hypothetical Protein O72O (CT659) CT659 Hypothetical Protein O722 (CT654) CT654 Hypothetical Protein O725 (CT652) CT652.1 Hypothetical Protein O726 (CT620) CT620 Hypothetical Protein O727 (CT619) CT619 Hypothetical Protein O739 (CT638) CT368 Hypothetical Protein O742 (CT635) CT635 Hypothetical Protein O746 (CT632) CT632 Hypothetical Protein O747 (CT631) CT631 Hypothetical Protein O751 (CT651) CT651 Hypothetical Protein O755 (CT616) CT616 Hypothetical Protein O760 (CT611) CT611 Hypothetical Protein O761 (CT610) CT610 Hypothetical Protein O764 (CT648) CT648 Hypothetical Protein O765 (CT647) CT647 Hypothetical Protein O766 (CT646) CT646 Hypothetical Protein O767 (CT645) CT645 Hypothetical Protein O770 (CT642) CT642 Hypothetical Protein O774 (CT606) CT606.1 Hypothetical Protein O776 (CT605) CT605 Hypothetical Protein O779 (CT602) CT602 Hypothetical Protein O783 (CT598) CT598 Hypothetical Protein US 6,822,071 B1 79 80

TABLE 2-continued (Supplemental Data) Functional Assignments of C. pneumoniae Coding Sequences. C. trachomatis genes are shown in parentheses. O791 (CT590) CT590 Hypothetical Protein O792 (CT589) CT589 Hypothetical Protein O8O3 (CT584) CT584 Hypothetical Protein O807 (CT580) CT580 Hypothetical Protein O808 (CT579) CT579 Hypothetical Protein O809 (CT578) CT578 Hypothetical Protein O810 (CT577) CT577 Hypothetical Protein O814 (CT573) CT573 Hypothetical Protein O818 (CT569) CT569 Hypothetical Protein O819 (CT568) CT568 Hypothetical Protein O82O (CT567) CT567 Hypothetical Protein O821 (CT566) CT566 Hypothetical Protein O822 (CT565) CT565 Hypothetical Protein O827 (CT560) CT560 Hypothetical Protein O834 (CT556) CT556 Hypothetical Protein O840 (CT700) CT700 Hypothetical Protein O842 (CT702) CT702 Hypothetical Protein O843 (CT702) CT702 Hypothetical Protein O852 (CT711) CT711 Hypothetical Protein O853 (CT712) CT712 Hypothetical Protein O857 (CT716) CT716 Hypothetical Protein O859 (CT718) CT718 Hypothetical Protein O865 (CT724) CT724 Hypothetical Protein O869 (CT728) CT728 Hypothetical Protein O874 (CT733) CT733 Hypothetical Protein O875 (CT734) CT734 Hypothetical Protein O884 (CT741) CT741 Hypothetical Protein O887 (CT744) CHLTR Possible Phosphoprotein O896 (CT753) CT753 Hypothetical Protein O906 (CT763) CT763 Hypothetical Protein O908 (CT764) CT764 Hypothetical Protein O912 (CT768) CT768 Hypothetical Protein O925 (CT779) CT779 Hypothetical Protein O938 (CT788) CT788 Hypothetical Protein O939 (CT790) CT790 Hypothetical Protein O943 (CT794) CT794.1 Hypothetical Protein O945 (CT795) CT795 Hypothetical Protein O956 (CT305) CT805 Hypothetical Protein O960 (CTS09) CT809 Hypothetical Protein O989 (CTS32) CT832 Hypothetical Protein O994 (CTS37) CT837 Hypothetical Protein O995 (CT338) CT838 Hypothetical Protein O996 (CTS39) CT839 Hypothetical Protein OO2 (CT345) CT845 Hypothetical Protein OO3 (CT346) CT846 Hypothetical Protein OO4 (CT347) CT847 Hypothetical Protein O05 (CT348) CT848 Hypothetical Protein OO6 (CT349) CT849 Hypothetical Protein OO7 (CT349) CT849.1 Hypothetical Protein O08 (CT350) CT850 Hypothetical Protein O10 (CT352) CT852 Hypothetical Protein O11 (CT353) CT853 Hypothetical Protein O15 (CTS57) CT857 Hypothetical Protein O16 (CT358) CT858 Hypothetical Protein O19 (CTS60) CT860 Hypothetical Protein O2O (CT361) CT861 Hypothetical Protein O22 (CTS63) CT863 Hypothetical Protein O32 (CT373) CT373 Hypothetical Protein O33 (CT372) CT372 Hypothetical Protein O34 (CT371) CT371 Hypothetical Protein O57 (CT356) CT356 Hypothetical Protein O58 (CT355) CT355 Hypothetical Protein O61 (CT330) CT330 Hypothetical Protein O73 (CT371) CT371 Hypothetical Protein Coding Genes Not in C. trachamatis O486 Hypothetical Proline Permease O279 Possible ABC Transporter Permease Protein O505 3-Methyladenine DNA Glycosylase O193 argR Similarity to Arginine Repressor 1041 bioA Adenosylmethionine-8-Amino-7-Oxononanoate Aminotransferase 1044 bio Biotin Synthase 1042 bioD Dethiobiotin synthetase O585 Similarity to Cps IncA 2 O562 CHLPS 43 kDa Protein Homolog 1 O927 CHLPS 43 kDa Protein Homolog 2 US 6,822,071 B1 81

TABLE 2-continued (Supplemental Data) Functional Assignments of C. pneumoniae Coding Sequences. C. trachomatis genes are shown in parentheses. O928 CHLPS 43 kDa Protein Homolog 3 O929 CHLPS 43 kDa Protein Homolog 4 1045 Conserved Hypothetical Membrane Protein O251 Conserved Hypothetical Protein O278 Conserved Outer Membrane Lipoprotein Protein O907 CutA-like Periplasmic Divalent Cation Tolerance Protein O171 guSA GMP Synthase O172 gusB Inosine 5'-Monophosphase Dehydrogenase O608 Uridine 5'-Monophosphate Synthase O735 Uridine Kinase O98O Similar to Saccharomyces cerevisiae 52.9 KDa Protein O232 Similarity to 5'-Methylthioadenosine Nucleosidase 104 6 Tryptophan Hydroxylase O47 ygeV Bs Conserved Hypothetical Protein OO4 ygfE-Bs Conserved Hypothetical IM Protein O58 yvyD Bs Conserved Hypothetical Protein O14 yxiG Bs 1 Conserved Hypothetical Protein O44 yxiG Bs 2 Conserved Hypothetical Protein OOO OOO OOO OOO OO OO OO OO2 OO2 OO3 OO4 OO4 OO4 OO4 OO4 OO4 OO4 OO4 9 OO50 OO51 OO63 OO64 OO66 OO67 OO69 OO70 OO99 O 24 25 26 3O 31 32 42 46 47 55 56 57 58 59 62 63 64 65 66 67 68 69 70 73 74 75 77 78 79 8O 81

US 6,822,071 B1 85 86

TABLE 2-continued (Supplemental Data) Functional Assignments of C. pneumoniae Coding Sequences. C. trachomatis genes are shown in parentheses.

50

-continued

tRNAs tRNAs tRNA # Begin End Type Codon 55 tRNA # Begin End Type Codon 89657 897.28 Thr GGT 13 784822 784896 Glu TTC 90998 91070 Trp CCA 14 784922 784994 Lys TTT 199301 199229 Met CAT 15 836119 836.191 Ala GGC 1993.90 199317 Met CAT 16 843926 843999 Pro GGG 296O75 296147 Va TAC 60 17 8774OO 877473 Arg ACG 296.151 296224 Asp GTC 18 1085 605 1085676 Glin TTG 4O9848 4O9922 Pro TGG 19 1142O34 1142118 Ser TGA 462141 462214 Arg CCT 2O 1175863 1175944 Leu TAG 672236 672318 Leu CAA 21 1230O28 1229942 Ser CGA 677264 677337 Arg TCG 22 1137462 1137389 Val GAC 739403 739486 Leu CAG 65 23 1030603 1030533 Cys GCA 78.1610 781.68O Gly TCC 24 1OOOO22 999949 His GTG US 6,822,071 B1

-continued -continued

tRNAs tRNAs 5 tRNA # Begin End Type Codon tRNA # Begin End Type Codon

25 96.16O7 961536 Gly GCC 33 293477 2934.05 Thr TGT 26 807413 807341 Arg TCT 34 293399 293317 T GTA 27 78678O 786708 Thr CGT yr 28 715971 715889 Leu TAA 1O 35 269142 26907O Ala TGC 29 708441 708354 Ser GCT 36 269065 268992 Ile GAT 3O 68O259 68O178 Leu GAG 37 1643089 164318 Asn GTT 31 631445 631373 Phe GAA 38 87522 87450 Met CAT 32 626987 626901 Ser GGA

SEQUENCE LISTING The patent contains a lengthy “Sequence Listing Section. A copy of the “Sequence Listing” is available in electronic form from the USPTO web site (http://seqdata.uspto.gov/sequence.html?DocID=5983318B9). An electronic copy of the “Sequence Listing” will also be available from the USPTO upon request and payment of the fee set forth in 37 CFR 119(b)(3).

What is claimed is: 4. The composition of claim 2, further comprising an 1. An isolated Chlamydia pneumoniae protein comprising adjuvant. the amino acid sequence set forth in SEQ ID NO: 1047. 5. The isolated Chlamydia pneumoniae protein of claim 1, 2. A composition comprising the isolated protein of claim which is bound to a Solid Surface. 1, and a carrier. 6. The isolated Chlamydia pneumoniae protein of claim 5, 3. The composition of claim 2, wherein the carrier is an wherein the Solid Surface is nitrocellulose. aqueous carrier. k . . . .