The Organization of Leptospira at a Genomic Level
Dieter M. Bulach, Torsten Seemann, Richard L. Zuerner, and Ben Adler

Summary

The complete nucleotide sequences of three strains of the genus Leptospira are available, and sequences for six strains will be available in the not-too-distant future. Managing and maintaining these data will be a perpetual problem unless a system is devised to address this issue. We propose a central role for the International Union of Microbiological Societies Subcommittee on the Taxonomy of Leptospira in maintaining and updating the annotated leptospiral genome sequences. The first step in this process is provided as part of this publication, namely a revision of the annotation of the three published genomes and an internet location for the current versions of these genomes.

Key Words: Leptospira; comparative genomics; spirochetes; genome annotation.

1. Introduction

Leptospirosis is a zoonosis of worldwide distribution. Significant efforts aimed at raising the standard of diagnostics (1) and at the collection and collation of disease data (via LeptoNet; www.leptonet.net) should correct what is likely to be a significant underestimation of the contribution of this disease to human morbidity and mortality. Leptospirosis is caused by pathogenic serovars of the genus Leptospira; in the current species classification schema, these pathogenic serovars are usually classified as Leptospira interrogans, Leptospira borgpetersenii, Leptospira kirschneri, Leptospira noguchii, Leptospira meyeri, Leptospira weilii, or Leptospira santarosai. Analysis of 16S ribosomal RNA (rRNA) gene sequences shows that these serovars are found in one of the three phylogenetic clades into which the genus can be divided (2). The severity of leptospirosis ranges from a mild, febrile, flu-like illness that is rarely fatal to a disease causing multi-organ failure, with mortality rates of up to 20% where patient support is inadequate.
There is an underlying association between disease severity and serovar and, to some extent, species: severe disease is often associated with serovars of L. interrogans and L. kirschneri (3).

The relatively large number of sequenced leptospiral genomes, and for that matter spirochetal genomes, has been driven by the difficulties these bacteria have posed for genetic manipulation. Only in the last few years has it become possible to transform Leptospira, and then only in one strain of the saprophyte L. biflexa. Studies comparing the genetic layout of L. interrogans strains have demonstrated that gene layout varies between strains; speculatively, this genetic shuffling may be related to the multitude of insertion sequence (IS) elements present. This genetic fluidity is paradoxical given the apparent stability of leptospiral strains during long-term serial passage, although this stability has been judged by the absence of change in the lipopolysaccharide phenotype. Clearly, pathogenic Leptospira undergoes considerable phenotypic change in the transition from in vivo growth to in vitro culture, as indicated by the low infectivity and virulence of strains that are returned to in vivo conditions after long-term in vitro culture.

From: Bacterial Genomes and Infectious Diseases Edited by: V. L. Chan, P. M. Sherman, and B. Bourke © Humana Press Inc., Totowa, NJ

In the absence of reliable systems for the genetic manipulation of Leptospira, comparative genomics provides a means to begin to understand the genetic basis of differences in virulence and the effects of long-term passage. The background and history of the sequenced strains are therefore critical. Although two L.
interrogans strains have been sequenced, the serovar Lai strain has undergone long-term in vitro passage (4), whereas the serovar Copenhageni strain was isolated from a patient with severe leptospirosis and has undergone minimal in vitro passage (5). Likewise, the sequenced L. borgpetersenii serovar Hardjobovis strain is a low-passage human isolate. Pulsed-field gel electrophoresis analysis of its genomic DNA showed it to be a type A strain (unpublished data). Type A strains have a wide geographical distribution and are the most common type of Hardjobovis isolate.

As previously mentioned, the genus Leptospira is divided into three broad phylogenetic clades. L. interrogans and L. borgpetersenii represent the diversity of the clade containing the pathogenic serovars: L. interrogans serovars Lai and Copenhageni are associated with severe leptospirosis with high mortality, whereas L. borgpetersenii serovar Hardjobovis is associated with a much milder disease that is almost never fatal (2). Moreover, the maintenance hosts for Copenhageni and Lai are rodent species, whereas bovine species are the usual maintenance hosts for Hardjobovis. The sequenced strains therefore provide excellent insight into the genetic diversity of a key subgroup of the genus Leptospira, and perhaps even a means to investigate the genetic basis of disease severity.

Even in the short time since the release of the serovar Lai sequence, it has been necessary to review and update its annotation. In light of the imminent release of several additional leptospiral genome sequences, it is essential to develop a strategy for continually revising the annotation of existing genomes as further relevant data are published, thereby ensuring that maximum benefit is derived from the annotated sequences. Ultimately, the system should ensure that the annotations are up to date, consistent, and error-free.
This chapter provides an overview of the genomic similarities and differences between the sequenced strains and proposes a system that makes orthologous genes recognizable by assigning a unique identifier to each set of orthologous leptospiral genes. It also examines the relationship between Leptospira and the other sequenced spirochetes (Treponema pallidum [6], Treponema denticola [7], Borrelia burgdorferi [8], and Borrelia garinii [7]).

2. Similarities

2.1. Comparing the Leptospiral Genomes

The primary difficulty in comparing the leptospiral genomes is the identification of orthologous genes. In particular, the different naming schemes used by the individual annotation groups make orthologous genes difficult to recognize. Another, perhaps less obvious, problem can arise from differences in the process used to decide whether a reading frame is a protein coding sequence (CDS). Ussery and Hallin have noted that the number of CDS features has been overestimated in the Lai and Copenhageni genomes (9), which interferes with comparisons to genomes in which a more conservative approach was taken to the annotation of CDS features. Because the three leptospiral genomes were annotated by unrelated groups, it is no real surprise that the annotations of the Lai, Copenhageni, and Hardjobovis genomes differ significantly. Adopting the method used to annotate the Hardjobovis genome, we have revised the Lai and Copenhageni annotations.

Table 1
Start Codon Frequencies for All Protein Coding Sequence Features in Leptospira

                Serovar
Start codon     Copenhageni    Lai     Hardjobovis
ATG             2733           2807    2572
TTG             479            468     402
GTG             221            220     175
CTG             17             19      19
ATT             0              0       1
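As a rough check on the trend in Table 1, the counts can be converted into relative frequencies. The Python sketch below is illustrative only and is not the annotation pipeline used for this chapter; the helper names (`start_codon_counts`, `percentages`, `ranked`) are hypothetical. It assumes that CDS sequences are supplied 5' to 3' on the coding strand, so minus-strand genes would have to be reverse-complemented before tallying.

```python
from collections import Counter

# Start codon counts per serovar, transcribed from Table 1.
TABLE_1 = {
    "Copenhageni": {"ATG": 2733, "TTG": 479, "GTG": 221, "CTG": 17, "ATT": 0},
    "Lai": {"ATG": 2807, "TTG": 468, "GTG": 220, "CTG": 19, "ATT": 0},
    "Hardjobovis": {"ATG": 2572, "TTG": 402, "GTG": 175, "CTG": 19, "ATT": 1},
}


def start_codon_counts(cds_sequences):
    """Tally the first codon of each CDS (hypothetical helper).

    Assumes every sequence is already on the coding strand (5' to 3');
    sequences shorter than one codon are skipped.
    """
    return Counter(seq[:3].upper() for seq in cds_sequences if len(seq) >= 3)


def percentages(counts):
    """Express each codon count as a percentage of all CDS features (1 d.p.)."""
    total = sum(counts.values())
    return {codon: round(100 * n / total, 1) for codon, n in counts.items()}


def ranked(counts):
    """Return codons ordered from most to least frequent."""
    return sorted(counts, key=counts.get, reverse=True)


if __name__ == "__main__":
    for serovar, counts in TABLE_1.items():
        print(serovar, ranked(counts), percentages(counts))
```

For Copenhageni this gives about 79.2% ATG, 13.9% TTG, 6.4% GTG, and 0.5% CTG, and all three serovars rank the codons ATG > TTG > GTG > CTG, in line with the decreasing frequencies noted for Table 1.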
A significant benefit of the reannotation process has been that each set of orthologous genes from the sequenced genomes has been assigned a unique identifier, and the start of the coding region has been revised and made consistent across orthologous genes. Accurate prediction of the start of the coding region is critical for postgenomic studies that require prediction of the subcellular location of the encoded protein, because targeting signals such as signal peptides lie at the N terminus. This type of revision has also led to the identification of potential pseudogenes, in which there is a frameshift early in the coding region, and of genes that have been interrupted by the insertion of IS elements into the genome.

The frequencies of the different start codons predicted to be used by Leptospira are worth noting (Table 1). Although ATG is the most common start codon, TTG, GTG, and CTG are also used, in decreasing order of frequency. Similarity analysis makes it clear that at least one Hardjobovis CDS has ATT as its start codon.

2.2. Overview of the Genome Sequences

The genome of L. borgpetersenii serovar Hardjobovis strain L550 comprises two circular chromosomes of 3,614,446 and 317,336 bases, with an overall guanine plus cytosine (G+C) content of 41.3%. Coding sequences occupy 80.3% of the genome, and the average gene size is 931 bases. In total, 3111 and 292 CDS features were annotated on chromosome 1 and chromosome 2, respectively. The L. borgpetersenii genome is smaller than the L. interrogans genomes (Table 2) and codes for proportionally fewer genes. Its G+C content is higher than that of L. interrogans, consistent with previous estimates of G+C content (3). The lower density of coding regions in Leptospira compared with Escherichia coli may be related to the fluidity of gene arrangement in the genome.
Consistent with this viewpoint, the gene layout found in the sequenced Borrelia genomes is conserved and the density of coding regions is around 95% (10).

Table 2
Genome Features in Leptospira

                                Copenhageni           Lai                   Hardjobovis
Feature                         CI          CII       CI          CII       CI          CII
Size (bp)                       4,277,185   350,181   4,332,241   358,943   3,614,446   317,336
G+C content (%)                 35.1        35.0      36.0        36.1      41.0        41.2
Protein-coding percentage       73          73        73          73        80          80
Protein-coding (CDS)
  With assigned function        1811        161       1901        159       2000        177
  Conserved and hypothetical    1643        113       2459        208       1121        115
  Total                         3454        274       4360        367       3121        292
  Revised annotation            3375        279       3327        300
  Difference (%)                2           0         31          22
Transfer RNA genes              37          0         37          0         37          0
Ribosomal RNA genes
  23S (rrl)                     2           0         1           0         2           0
  16S (rrs)                     2           0         2           0         2           0
  5S (rrf)                      1           0         1           0         1           0

CI, chromosome 1; CII, chromosome 2.

Also likely to contribute to genome fluidity is the