The Genome Sequence of Clostridium Tetani, the Causative Agent of Tetanus Disease
Total Page:16
File Type:pdf, Size:1020Kb
View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by University of Regensburg Publication Server The genome sequence of Clostridium tetani, the causative agent of tetanus disease Holger Bru¨ ggemann*, Sebastian Ba¨ umer*, Wolfgang Florian Fricke*, Arnim Wiezer*, Heiko Liesegang*, Iwona Decker*, Christina Herzberg†, Rosa Martı´nez-Arias*, Rainer Merkl*, Anke Henne*, and Gerhard Gottschalk*†‡ *Go¨ ttingen Genomics Laboratory and †Department of General Microbiology, Institute of Microbiology and Genetics, Georg-August-University, Grisebachstrasse 8, D-37077 Go¨ ttingen, Germany Edited by Dieter So¨ ll, Yale University, New Haven, CT, and approved December 11, 2002 (received for review September 27, 2002) Tetanus disease is one of the most dramatic and globally prevalent a mammalian host. The presented information may also help to diseases of humans and vertebrate animals, and has been reported unravel the regulatory network governing tetanus toxin forma- for over 24 centuries. The manifestation of the disease, spastic tion, which is still poorly understood (11). paralysis, is caused by the second most poisonous substance known, the tetanus toxin, with a human lethal dose of Ϸ1ng͞kg. Methods Fortunately, this disease is successfully controlled through immu- Sequencing Strategy. Genomic DNA of C. tetani E88, a nonsporu- nization with tetanus toxoid; nevertheless, according to the World lating variant of strain Massachusetts used in vaccine production, Health Organization, an estimated 400,000 cases still occur each was extracted and sheared randomly. Several shotgun libraries year, mainly of neonatal tetanus. The causative agent of tetanus were constructed by using size fractions ranging from 1 to 3 kb. disease is Clostridium tetani, an anaerobic spore-forming bacte- A cosmid library was constructed from TasI partially digested rium, whose natural habitat is soil, dust, and intestinal tracts of genomic DNA cloned in the cosmid vector SuperCos1 (Strat- various animals. Here we report the complete genome sequence of agene). Insert ends of the recombinant plasmids and cosmids toxigenic C. tetani E88, a variant of strain Massachusetts. The were sequenced by using dye-terminator chemistry with Mega- genome consists of a 2,799,250-bp chromosome encoding 2,372 BACE 1000 and ABI Prism 377 DNA automated sequencers ORFs. The tetanus toxin and a collagenase are encoded on a (Amersham Biosciences and Applied Biosystems). Sequences 74,082-bp plasmid, containing 61 ORFs. Additional virulence- were processed with PHRED and assembled into contigs by using related factors could be identified, such as an array of surface- the PHRAP assembling tool (12). Sequence editing was done by layer and adhesion proteins (35 ORFs), some of them unique to C. using GAP4, which is part of the STADEN package software (13). tetani. Comparative genomics with the genomes of Clostridium A coverage of 9.2-fold was obtained after the assembly of 44,500 perfringens, the causative agent of gas gangrene, and Clostridium sequences. To solve problems with misassembled regions caused acetobutylicum, a nonpathogenic solvent producer, revealed a by repetitive sequences and to close remaining sequence gaps, remarkable capacity of C. tetani: The organism can rely on an PCR-based techniques and primer walking on recombinant extensive sodium ion bioenergetics. Additional candidate genes plasmids and cosmids were applied. involved in the establishment and maintenance of a pathogenic lifestyle of C. tetani are presented. ORF Prediction and Annotation. Initial ORF prediction was accom- plished by using the GLIMMER program (www.tigr.org͞software͞ eurotoxigenic clostridia have attracted considerable atten- glimmer͞). The result was verified and improved manually by Ntion during the past decades. The mode of action of the using criteria such as the presence of a ribosome-binding site and strongest toxins known to mankind, the tetanus toxin produced codon usage analysis. Annotation was done first automatically by by Clostridium tetani and the homologous botulinum toxins A–G the ERGO annotation tool (Integrated Genomics, www.inte- produced by various Clostridium botulinum strains, is now well gratedgenomics.com), which was verified and refined by search- understood (1–4). The tetanus toxin (TeTx) blocks the release ing derived protein sequences against the public GenBank͞ of neurotransmitters from presynaptic membranes of inhibitory European Molecular Biology Laboratory databases by using the neurons of the spinal cord and the brainstem of mammals; it BLAST program (14). catalyzes the proteolytic cleavage of the synaptic vesicle protein synaptobrevin (5). This leads to continuous muscle contractions, Comparative Genomics. For comparative analysis, each ORF of which are primarily observed in jaw and neck muscles (lockjaw). C. tetani was searched against all ORFs of Clostridium perfringens Since the introduction of a potent vaccine during World War II, and Clostridium acetobutylicum by using the BLAST program (14). cases of tetanus disease have been only sporadic in industrial Significant homology was defined as Ͼ30% amino acid identity countries. However, the disease, and in particular neonatal over Ͼ60% of both the C. tetani protein sequence and the tetanus (NT), is still an important cause of death due to homologous sequence of C. perfringens or C. acetobutylicum, insufficient immunization (4, 6). NT is the second leading cause respectively. of death from vaccine-preventable diseases among children worldwide. It is considered endemic to 90 developing countries, Results and Discussion with an estimated 248,000 deaths occurring in 1997 (World General Features of the C. tetani Genome. The chromosome of Health Organization; www.who.int͞vaccines-diseases͞diseases͞ C. tetani contains 2,799,250 bp with a GϩC content of 28.6%. NeonatalTetanus.shtml). The 74,082-bp plasmid pE88 exhibits a remarkably low GϩC So far, genetic information on C. tetani is mainly restricted to content of 24.5%. The replication origin of the chromosome was the nucleotide sequences of the tetanus toxin TeTx and of its direct transcriptional activator TetR, both of which are encoded on a plasmid, designated pCL1 in the strain Massachusetts This paper was submitted directly (Track II) to the PNAS office. (7–10). The identification and analysis of all genes in the genome Abbreviation: SLP, surface-layer protein. of C. tetani will contribute to our understanding of the lifestyle Data deposition: The sequence reported in this paper has been deposited in the GenBank switch from a harmless soil bacterium to a potentially devastat- database [accession nos. AE015927 (chromosome) and AF528097 (pE88)]. ing neurosystem-damaging organism after entering and infecting ‡To whom correspondence should be addressed. E-mail: [email protected]. 1316–1321 ͉ PNAS ͉ February 4, 2003 ͉ vol. 100 ͉ no. 3 www.pnas.org͞cgi͞doi͞10.1073͞pnas.0335853100 MICROBIOLOGY Fig. 1. Circular maps of the chromosome and the plasmid pE88 of C. tetani. The coding sequence of the chromosome is shown in blue or green, depending on strand orientation. ORFs of C. tetani that have homologous proteins in C. perfringens (15) but not in C. acetobutylicum (16) are shown in orange. ORFs of C. tetani, which are not present in the two other mentioned clostridial genomes, are shown in pink. The plasmid map shows ORFs color-coded according to their assigned functions. The gene of the tetanus toxin, tetX (CTP60), and the gene encoding a collagenase, colT (CTP33), are highlighted in red. Numbers at the inner ring refer to genes mentioned in the text. Most inner rings of both maps show the GϩC content variation (higher values outward). identified on the basis of GC-skew analysis, the distinctive nonfunctional because of insertions, deletions, and point inflection point in the coding strand and the vicinal presence mutations resulting in frame-shifted or degenerated ORFs. of characteristic replication proteins such as DnaA (Fig. 1; also GϩC variation within the genome is very low; the only regions published as Figs. 5 and 6 in supporting information on the with a significantly higher GϩC content (Ϸ50%) are those PNAS web site, www.pnas.org). Eighty-two percent of the harboring the six rRNA gene clusters and genes encoding 2,368 predicted ORFs on the chromosome are transcribed in ribosomal proteins. This lack of GϩC fluctuation indicates the same direction as DNA replication. A similar figure (83%) that recent events of gene acquisition by lateral gene transfer was calculated from the sequence data of C. perfringens (15). have not taken place, and that the C. tetani genome is more This distinctive codirectionality of replication and transcrip- stable than the genomes of enteropathogens. tion is not seen in the genomes of pathogens such as Vibrio cholerae or Yersinia pestis or in archaeal genomes such as The Tetanus Toxin-Encoding Plasmid pE88. It was known that the Methanosarcina mazei (17–19). Only a few mobile elements 74-kb C. tetani plasmid pE88 harbors the genes for the tetanus could be identified in the genome of C. tetani: 16 transposase toxin (tetX) and its direct transcriptional regulator TetR (7–10). genes are present, which can be classified in four different We have now identified 61 ORFs, 28 of which show similarities insertion sequence families. Most of these genes seem to be to known proteins with assigned functions (Fig. 1). Interestingly, Bru¨ ggemann et al. PNAS ͉ February 4, 2003 ͉ vol. 100 ͉ no. 3 ͉ 1317 The origin of pE88 remains unclear. Over 50% of all ORFs on pE88 are unique to C. tetani. A mobile element, present in two copies, shows strongest similarity to a transposase of B. cereus (60% identity over 355 amino acids). The putative replication protein of pE88 is duplicated (CTP1 and CTP45). It shows homology only to a protein (40% identity over 438 amino acids) encoded on the plasmid of the alkaliphilic Bacillus strain KSM- KP43 and to the replication protein (39% identity over 418 amino acids) on pIP404, the bacteriocin- and UviA-encoding 10-kb plasmid of C. perfringens CPN50 (25). Virulence Factors on the Chromosome. In addition to pE88, the C.