Comparison of the genome of the oral pathogen Treponema denticola with other spirochete genomes

Rekha Seshadri*, Garry S. A. Myers*, Hervé Tettelin*, Jonathan A. Eisen*†, John F. Heidelberg*‡, Robert J. Dodson*, Tanja M. Davidsen*, Robert T. DeBoy*, Derrick E. Fouts*, Dan H. Haft*, Jeremy Selengut*, Qinghu Ren*, Lauren M. Brinkac*, Ramana Madupu*, Jamie Kolonay*, Scott A. Durkin*, Sean C. Daugherty*, Jyoti Shetty*, Alla Shvartsbeyn*, Elizabeth Gebregeorgis*, Keita Geer*, Getahun Tsegaye*, Joel Malek*, Bola Ayodeji*, Sofiya Shatsman*, Michael P. McLeod§, David Šmajs§, Jerrilyn K. Howell¶, Sangita Pal§, Anita Amin§, Pankaj Vashisth¶, Thomas Z. McNeill§, Qin Xiang§, Erica Sodergren§, Ernesto Baca§, George M. Weinstock§, Steven J. Norris¶, Claire M. Fraser*‖, and Ian T. Paulsen*†**

*The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850; †Johns Hopkins University, Charles and 34th Streets, Baltimore, MD 21218; ‖Departments of Pharmacology and Microbiology and Tropical Medicine, The George Washington University School of Medicine, 2300 Eye Street Northwest, Washington, DC 20037; ‡Center of Marine Biotechnology, University of Maryland Biotechnology Institute, Baltimore, MD 21202; ¶Department of Pathology and Laboratory Medicine and Graduate School of Biomedical Sciences, University of Texas Health Science Center, 6431 Fannin Street, Houston, TX 77230; and §Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030

Edited by Robert Haselkorn, University of Chicago, Chicago, IL, and approved February 9, 2004 (received for review November 20, 2003)

We present the complete 2,843,201-bp genome sequence of Treponema denticola (ATCC 35405) an oral spirochete associated with periodontal disease. Analysis of the T. denticola genome reveals factors mediating coaggregation, cell signaling, stress protection, and other competitive and cooperative measures, consistent with its pathogenic nature and lifestyle within the mixed-species environment of subgingival dental plaque. Comparisons with previously sequenced spirochete genomes revealed specific factors contributing to differences and similarities in spirochete physiology as well as pathogenic potential. The T. denticola genome is considerably larger in size than the genome of the related syphilis-causing spirochete Treponema pallidum. The differences in gene content appear to be attributable to a combination of three phenomena: genome reduction, lineage-specific expansions, and horizontal gene transfer. Genes lost due to reductive evolution appear to be largely involved in metabolism and transport, whereas some of the genes that have arisen due to lineage-specific expansions are implicated in various pathogenic interactions, and genes acquired via horizontal gene transfer are largely phage-related or of unknown function.

This paper was submitted directly (Track II) to the PNAS office.
Abbreviation: CDS, protein-coding sequences.
Data deposition: The sequence reported in this paper has been deposited in the GenBank database (accession no. AE017226).
**To whom correspondence should be addressed. E-mail: [email protected].
© 2004 by The National Academy of Sciences of the USA
MICROBIOLOGY
5646–5651 | PNAS | April 13, 2004 | vol. 101 | no. 15
www.pnas.org/cgi/doi/10.1073/pnas.0307639101

The Gram-negative oral spirochete Treponema denticola is predominantly associated with the incidence and severity of human periodontal disease (1). This polymicrobial infection and inflammation of the gingiva occurs in 80% of the adult population at some time in their lives and can evolve to severe forms including refractory periodontitis and acute necrotizing gingivitis, resulting in bone resorption and tooth loss. Treatment regimens to combat periodontitis are difficult and costly involving extensive antibiotic treatment and intricate surgery. T. denticola dwells in a complex and diverse microbial community within the oral cavity, and as such is highly specialized to survive within this milieu. This aerotolerant anaerobe (2) is related to the syphilis-causing obligate human pathogen, Treponema pallidum subsp. pallidum. T. denticola is one of ≈60 treponemal species or uncharacterized phylotypes found in dental plaque (3).

Spirochetes comprise a monophyletic phylum that exhibits overall structural similarity and rRNA relatedness but great variability in habitat, physiologic properties, and genome size and organization (Table 1). Comparative analysis with previously sequenced spirochetes (T. pallidum, Borrelia burgdorferi, and Leptospira interrogans) (4–6) yielded insights into the basis for differences in their lifestyle and disease manifestations. The genome of T. denticola clearly reflects its adaptations for colonization and survival within the biofilm environment of subgingival dental plaque. Compared to other spirochetes, T. denticola is relatively easy to cultivate and manipulate genetically, making this an excellent model for spirochete research.

Methods

Bacteria. T. denticola strain 35405 was initially isolated and designated as the type strain by Chan et al. (7). Bacteria used in this study were obtained from the American Type Culture Collection.

Sequencing and Gene Identification. The complete genome of T. denticola strain 35405 was sequenced by using the random shotgun method described for genomes sequenced by The Institute for Genomic Research (5). ORFs likely to encode proteins (CDSs) were predicted by GLIMMER (8). All predicted proteins >30 aa were analyzed for sequence similarity against a nonredundant protein database. Two sets of hidden Markov models were used to determine CDS membership in families and superfamilies: PFAM V5.5 (9) and TIGRFAMS (10). Domain-based paralogous families were built by performing all-versus-all searches on the remaining protein sequences. The 5′ regions of each CDS were inspected to define initiation codons by using homologies, position of ribosomal binding sites, and transcriptional terminators. Sequences containing frameshifts and point mutations were reexamined and corrected where appropriate. Protein membrane-spanning domains were identified by TOPPRED (11). Putative signal peptides were identified with SIGNALP (12).

Trinucleotide Composition. Distribution of all 64 trinucleotides (3-mers) for each chromosome was determined, and the 3-mer distribution in 2,000-bp windows that overlapped by half their length (1,000 bp) across the genome was computed. For each window, we computed the χ² statistic on the difference between its 3-mer content and that of the whole chromosome. A large value for χ² indicates the 3-mer composition in this window is different from the rest of the chromosome. Probability values for this analysis are based on assumptions that the DNA composition is relatively uniform throughout the genome, and that 3-mer composition is independent. Because these assumptions may be incorrect, we prefer to interpret high χ² values as indicators of regions on the chromosome that appear unusual and demand further scrutiny.

Comparative Genomics. The T. pallidum and T. denticola genomes were compared at the nucleotide level by suffix tree analysis using MUMMER, and their ORF sets were compared by using BLAST. Additionally, all T. denticola CDSs were compared by BLASTP against the complete set of CDSs from T. pallidum, L. interrogans, and B. burgdorferi using an E value cutoff of 10⁻⁵.

Table 1. General genome features

| | T. denticola | T. pallidum* | B. burgdorferi* | L. interrogans† |
|---|---|---|---|---|
| Size, bp | 2,843,201 | 1,138,012 | 910,725 | 4,691,184 |
| G+C content, % | 37.9 | 52.8 | 28.6 | 36.0 |
| Protein-coding genes | | | | |
| No. with assigned function | 1,223 | 542 | 487 | 2,060 |
| No. of unknown function‡ | 352 | 35 | 22 | 146 |
| No. of conserved hypotheticals§ | 477 | 175 | 102 | 569 |
| No. with no database match¶ | 734 | 288 | 242 | 1,952 |
| Total | 2,786 | 1,040 | 853 | 4,727 |
| Average CDS size, bp | 939 | 1,017 | 992 | 778 |
| Coding, % | 92.1 | 93.0 | 93.5 | 78.4 |
| rRNA | 6 | 6 | 5 | 4 |
| tRNA | 44 | 45 | 34 | 37 |

*The distribution of CDSs in the T. pallidum and B. burgdorferi chromosomes are derived from the original annotation. These numbers, particularly hypothetical and conserved hypothetical proteins, may be significantly different with updated blast searches and annotation.
†The genome information for L. interrogans represents combined data from both chromosomes.
‡Unknown function, significant sequence similarity to a named protein for which no function is currently attributed.
§Conserved hypothetical protein, sequence similarity to a translation of another ORF, however no experimental evidence for protein expression exists.
¶Hypothetical protein, no significant similarity to any other sequenced protein.
Twenty-five of the total number of CDSs in T. denticola possess one or more authentic frameshifts, point mutations, or are truncated.

Results and Discussion

About one-fourth of T. denticola genes (728 CDSs) have their best matches to CDSs in the T. pallidum genome (representing 68% of its genome), and on average, these share only 53% amino acid identity (70.6% similarity). Essentially no synteny (conservation of gene order) exists between the T. denticola and T. pallidum genomes [with the exception of highly conserved operons encoding ribosomal (TDE0766-TDE0792) and flagellar proteins (TDE1198-TDE1219)]. This result, as well as differences in G + C content (Table 1) and rRNA sequences, indicates that divergence of T. denticola and T. pallidum from a common ancestor was an ancient event relative to the recent divergence of many bacterial groups (for which