Intragenomic Variation and Evolution of the Internal Transcribed Spacer of the Rrna Operon in Bacteria
Total Page:16
File Type:pdf, Size:1020Kb
J Mol Evol (2007) 65:44À67 DOI: 10.1007/s00239-006-0235-3 Intragenomic Variation and Evolution of the Internal Transcribed Spacer of the rRNA Operon in Bacteria Frank J. Stewart, Colleen M. Cavanaugh Department of Organismic and Evolutionary Biology, Harvard University, The Biological Laboratories, 16 Divinity Avenue, Cambridge, MA 02138, USA Received: 20 October 2006 / Accepted: 13 March 2007 [Reviewing Editor: Dr. Margaret Riley] Abstract. Variation in the internal transcribed Key words: Bacterial diversity — Intergenic spacer spacer (ITS) of the rRNA (rrn) operon is increasingly — Concerted evolution — Gene conversion — used to infer population-level diversity in bacterial Ribosomal RNA (rrn) operon — Copy number — communities. However, intragenomic ITS variation Multigene family may skew diversity estimates that do not correct for multiple rrn operons within a genome. This study characterizes variation in ITS length, tRNA compo- sition, and intragenomic nucleotide divergence across Introduction 155 Bacteria genomes. On average, these genomes en- code 4.8 rrn operons (range: 2À15) and contain 2.4 The internal transcribed spacer (ITS) region sepa- unique ITS length variants (range: 1À12) and 2.8 rating the bacterial 16S and 23S rRNA genes unique sequence variants (range: 1À12). ITS variation is increasingly used to assess microheterogeneity in stems primarily from differences in tRNA gene com- the growing field of bacterial population genetics position, with ITS regions containing tRNA-Ala + (e.g., Di Meo et al. 2000; Boyer et al. 2002; Hurtado tRNA-Ile (48% of sequences), tRNA-Ala or tRNA-Ile et al. 2003; Vogel et al. 2003; Brown and Fuhrman (10%), tRNA-Glu (11%), other tRNAs (3%), or no 2005; DeChaine et al. 2006). Relative to molecular tRNA genes (27%). Intragenomic divergence among markers used in bacterial phylogenetics (e.g., 16S paralogous ITS sequences grouped by tRNA compo- rRNA, rpoB), the ITS region experiences low sition ranges from 0% to 12.11% (mean: 0.94%). Low selective constraint, evolves rapidly, and provides a divergence values indicate extensive homogenization high-resolution estimate of gene flow and genetic among ITS copies. In 78% of alignments, divergence is structuring at the population scale (Gurtler and <1%,with54% showing zero variation and 81% ¨ Stanisich 1996; Anto´n et al. 1998; Schloter et al. 2000; containing at least two identical sequences. ITS Rocap et al. 2002, 2003; Brown and Fuhrman 2005). homogenization occurs over relatively long sequence However, use of the ITS marker for studies of envi- tracts, frequently spanning the entire ITS, and is lar- ronmental samples is problematic due to the common gely independent of the distance (basepairs) between occurrence of multiple ribosomal RNA (rrn) operons operons. This study underscores the potential contri- within a genome and to the possibility of intrage- bution of interoperon ITS variation to bacterial mic- nomic variation in ITS sequence and length. Given rodiversity studies, as well as unequivocally that strain-level genetic diversity, as measured by ITS demonstrates the pervasiveness of concerted evolution sequence and length variation, has been conclusively in the rrn gene family. linked to diversity in bacterial ecotypes (e.g., Rocap Correspondence to: Colleen M. Cavanaugh; email: cavanaug@ et al. 2002, 2003; Jaspers and Overmann 2004; Hahn fas.harvard.edu and Po¨ ckl 2005), intragenomic ITS variation may 45 lead to an overestimation of the number of func- The recent wealth of whole-genome sequence data tionally distinct bacteria in environmental samples. In allows comprehensive assessment of ITS structure, particular, intragenomic heterogeneity in ITS length variation, and evolution across distinct bacterial lin- has the potential to skew estimates of diversity that eages. Using genomic data from 155 taxa represent- are based solely on DNA fingerprinting techniques ing all major bacterial groups, this study quantifies (e.g., ARISA; Fisher and Triplett 1999; Ranjard et al. genetic variation in the ITS region among multiple 2000; Crosby and Criddle 2003). Accurately assessing rrn operons within a genome (paralogues). Our bacterial microdiversity using ITS-based techniques analyses identify clear differences in intragenomic ITS therefore requires an understanding of how this re- length, composition, and divergence among bacterial gion evolves in distinct bacterial taxa. groups and underscore the surprising extent to which Several studies directed at specific bacterial taxa genetic content is homogenized across distinct rrn have provided important insight into ITS variation operons within a genome. These data greatly increase and evolution. These studies primarily describe ITS our understanding of the evolution of the ITS region variation at the interstrain and interspecies levels, in Bacteria and provide a framework for integrating particularly with respect to medically relevant this knowledge into studies that use the ITS for organisms (Chun et al. 1999; Osorio et al. 2005; bacterial strain typing, microdiversity estimation, and Gonza´lez-Escalona et al. 2006), for which knowledge population genetics. of ITS composition is valuable for differentiating and tracking pathogenic genetic variants. Several of these studies also directly or indirectly characterize varia- Materials and Methods tion among ITS regions within the same genome (Graham et al. 1997; Anto´n et al. 1998; Luz et al. Sequence Data 1998; Boyer et al. 2001; Giannino` et al. 2003; Milyutina et al. 2004). These show the potential both ITS and 16S rRNA gene sequences from 155 complete Bacteria genomes with multiple rrn operons were obtained from the for substantial intragenomic heterogeneity in ITS National Center for Biotechnology Information (NCBI) Micro- sequence and length and for high levels of interop- bial Genome project in June 2006. Accession numbers for these eron sequence conservation within certain regions of genomes are listed as Supplemental Material. (Complete genomes the ITS (Gu¨ rtler and Stanisich 1996; Anto´n et al. containing only a single rrn operon [representing 60 distinct 1998; Nagpal et al. 1998; Rocap et al. 2003). This species at the time of the analysis] were excluded from this study.) Genomes were chosen in order to span the phylogenetic pattern has been attributed to homologous recombi- diversity of Bacteria and include representatives from 12 of the nation that rearranges tRNA genes and other se- major phylogenetic groups represented in the NCBI database quence blocks within the ITS region, often generating (Table 1). However, given the current taxonomic coverage of the ITS regions characterized by a modular composition database, Bacteria from the c-Proteobacteria and Firmicutes of alternating variable and conserved sequence blocks divisions are most abundant in our data set, constituting 35% and 20% of the genomes, respectively (Tables 1 and 2). In several (Gu¨ rtler and Stanisich 1996; Anto´n et al. 1998; Lan instances multiple strains from the same species were included in and Reeves 1998; Liao 2000; Wenner et al. 2002; the analysis to evaluate variation in ITS length and composition Osorio et al. 2005). Indeed, the nonreciprocal transfer among closely related organisms. For each rrn operon within a of such sequence blocks between paralogous rrn genome, the base-pair coordinates corresponding to the 3’ ter- operons (i.e., gene conversion) has been proposed as minus of the 16S rRNA gene and the 5’ start of the 23S rRNA gene were entered into the sequence retrieval function on the a mechanism that may ultimately homogenize parts NCBI database and used to extract the intervening ITS sequence. or all of the rrn operon across all copies within a Also, the 16S rRNA gene sequence corresponding to each ITS genome (Liao 2000). However, prior studies have was downloaded from each genome’s structural RNA table (in focused on one or a few bacterial taxa or on variation NCBI). Archaea, which commonly contain only one rrn operon in ITS length only. For many taxa, the extent to (Acinas et al. 2004b) and which have been the focus of relatively few ITS-based diversity studies, were excluded from this analysis. which recombination either homogenizes ITS regions Nonetheless, increasing use of the ITS for Archaea strain typing or generates new combinations of sequence blocks may warrant a systematic examination of ITS evolution in this within the ITS is unclear. In addition, the relative domain. contribution of point mutations to intragenomic ITS divergence, though potentially large (Anto´n et al. Intragenomic Sequence Analysis 1998; Liao 2000) and of direct relevance to accurately interpreting population-level genetic processes, has Each ITS sequence was categorized based on the presence of been characterized for only a few bacteria (e.g., distinct tRNA genes as belonging to one of five classes, con- Anto´n et al. 1998; Boyer et al. 2001). A comprehen- taining tRNA-alanine + tRNA-isoleucine (tRNA-Ala+Ile), sive analysis of ITS composition in all rrn operons in tRNA-Ala or tRNA-Ile singly (tRNA-Ala/Ile), tRNA-glutamate (tRNA-Glu), other tRNAs (tRNA-other), or no tRNA genes a genome and across diverse bacterial groups is (tRNA-none). The presence or absence of distinct tRNA genes needed to characterize the relative roles of recombi- within an ITS was determined based on each genome’s annota- nation and point mutation in ITS evolution. tion. For each genome, all ITS sequences belonging to the same Table 1. Intragenomic sequence divergence among internal transcribed