Diversity of Retrotransposable Elements

Cytogenet Res 110:589–597 (2005) DOI: 10.1159/000084992

Group II retroelements: function and diversity

A.R. Robart, S. Zimmerly Department of Biological Sciences, University of Calgary, Calgary, Alberta (Canada)

Manuscript received 16 October 2003; accepted in revised form for publication by J.-N. Volff 8 December 2003.

Abstract. Group II are a class of retroelements capa- and bacterial introns with regard to structures, insertion pat- ble of carrying out both self-splicing and retromobility reac- terns and inferred behaviors. We also discuss the evolution of tions. In recent years, the number of known group II introns has group II introns, as they are the putative ancestors of spliceoso- increased dramatically, particularly in , and the new mal introns and possibly non-LTR retroelements, and may information is altering our understanding of these intriguing have played an important role in the development of elements. Here we review the basic properties of group II . introns, and summarize the differences between the organellar Copyright © 2005 S. Karger AG, Basel

Group II introns are a unique type of genetic element with catalytic core of the (Fig. 1A) (Michel and Ferat, two remarkable properties. They are catalytic , or ribo- 1995; Qin and Pyle, 1998). I is the largest domain and zymes, that can excise themselves from pre-mRNA without the is also important for catalysis, while domain VI contains the aid of . They are also retroelements that encode reverse bulged A that forms the branch site in the spliced product. transcriptases (RTs) and can insert themselves into new loca- RNA structures are divided into two major classes, IIA and IIB, tions. Group II introns are present in mitochondria and chloro- with class IIC being a minor class of bacterial introns. plasts of and fungi, where they are relatively abundant The ORF is encoded within the loop of domain IV (Fig. 1A) (Michel et al., 1989), and they are also found in about 25% of and has four functionally defined domains: RT (reverse trans- eubacterial genomes, in either full-length or truncated forms criptase activity, with sub-domains 0–7), X (maturase activity (Ferat and Michel, 1993; Dai and Zimmerly, 2002). Recently, associated with intron splicing), D (non-conserved DNA-bind- they were found even in some archaebacterial species (Dai and ing domain) and En (endonuclease activity) (Fig. 1B) (Zimmer- Zimmerly, 2003; Rest and Mindell, 2003; Toro, 2003). Due to ly et al., 2001; Belfort et al., 2002; San Filippo and Lambowitz, the burgeoning information from the genome projects, our 2002). The serves two functions for the intron. It assists knowledge about the introns has increased significantly in the intron splicing in vivo through an activity associated with the X past few years, and this review emphasizes this new genomic domain (maturase activity), and it also allows the intron to act and comparative perspective. as a mobile element and invade intronless sites, a process which The typical group II intron consists of a conserved intron involves all four of the ORF domains, as well as the catalytic RNA structure and an RT ORF. The RNA structure is com- RNA activity. prised of six domains, of which domain V is believed to be the

Self-splicing reaction

Ribozyme activity was the first property assigned to group Supported by the Canadian Institutes of Health Research (CIHR) and by Alberta II introns (van der Veen et al., 1986; Peebles et al., 1987). Self- Heritage Foundation for Medical Research (AHFMR). splicing is accomplished by two transesterifications (Fig. 2A). Request reprints from Dr. Steven Zimmerly In the first step, the 5) is defined through two base-pairing Department of Biological Sciences, University of Calgary 2500 University Dr. NW, Calgary, AB T2N 1N4 (Canada) interactions, in which the intron binding sequences (IBS) 1 and telephone: (403) 220-7933; fax: (403) 289-9311; e-mail: [email protected] 2 pair to their corresponding exon binding sequences (EBS)

Fax + 41 61 306 12 34 © 2005 S. Karger AG, Basel Accessible online at: ABC E-mail [email protected] 1424–8581/05/1104–0589$22.00/0 www.karger.com/cgr www.karger.com Fig. 1. Basic group II intron structure. (A) Simplified secondary structure of IIB introns, composed of six structural domains, with the ORF located in domain IV. The intron is shown by black lines, by thick gray lines, and the ORF by a dotted black line. Black dots indicate individual involved in base pairing tertiary interactions. The most important tertiary interactions are IBS1,2,3/EBS1,2,3, Á–Á) and ‰–‰), which are indicated by gray lines and arrows. For IIA introns, ‰) is located at the first of Fig. 2. Reactions performed by group II introns. (A) Self-splicing reac- the 3) exon rather than in domain I, and there is no IBS3/EBS3 pairing. See tion. Self-splicing is catalyzed by the intron RNA structure through two Toor et al. (2001) for detailed information about RNA structural subclasses. transesterification reactions, yielding spliced exons and intron lariat. The (B) Intron-encoded ORF domains. The ORF consists of an RT domain (out- bulged A branch site that attacks the 5)-splice site is shown in domain VI of lined gray boxes; with sub-domains 0–7), domain X Fig. 1A. (B) Basic mechanism of group II intron mobility. The intron lariat in (solid box; maturase domain), domain D (hatched box; DNA binding an RNP particle (lariat+RT) reverse splices into the top strand of the DNA domain, not conserved in sequence among different ORFs), and the En target site. The bottom strand is cleaved by the En domain, and the intron is domain (double-hatched box; endonuclease domain). Exons are dark gray, reverse transcribed. Recombination and repair activities complete the inser- and intron RNA structure light gray. tion.

1 and 2 (Fig. 1A). The two interactions position the 5) splice site bond of the 3) junction, resulting in exon ligation and release of in proximity to the bulged adenosine in DVI (Jacquier and the intron lariat (Fig. 2A). Because the number of phosphate Michel, 1987). The 2)-OH of the adenosine acts as a nucleo- bonds broken and created during the reaction is equal, splicing phile, attacking the phosphodiester bond of the 5) junction to is energetically neutral and reversible, and does not require break the bond and form a lariat intermediate with a 2)–5) link- energy sources such as ATP. age (Fig. 2A). The self-splicing reaction requires relatively ex- In the second splicing step, the 3) exon is defined by two, treme reaction conditions of salt, magnesium and sometimes single-base-pair interactions, which vary for the RNA structur- temperature (e.g., 500 mM KCl, 100 mM MgCl2, 45°C) (Jarrell al classes. For IIA introns, the base pairs are Á–Á) and ‰–‰), et al., 1988), and it is known that protein factors are required while for IIB and IIC introns, they are Á–Á) and IBS3-EBS3 for splicing of introns in vivo. For many introns, the most (Fig. 1A) (Jacquier and Michel, 1990; Costa et al., 2000). The important splicing factor is the intron-encoded protein. Upon second transesterification is similar to the first, but with the , the protein is translated from unspliced mRNA, now free 3)-OH of the 5) exon attacking the phosphodiester and binds to a high affinity binding site in domain IVA in un-

590 Cytogenet Genome Res 110:589–597 (2005) spliced intron (Fig. 1A) (Wank et al., 1999). Combined with markers (Eskes et al., 2000). The pathways differ with regard to contacts with other regions of the intron RNA, the protein the role of double-strand break repair (DSBR) activities that induces structural changes that stimulate self-splicing (Matsuu- complete the insertion event. In bacteria, group II introns have ra et al., 2001; Noah and Lambowitz, 2003), to yield spliced not been observed to coconvert exon markers, and mobility exons and a ribonucleoprotein (RNP) particle consisting of a occurs in RecA– strains, indicating independence from a gener- lariat with the protein still attached. This RNP is the entry mol- al DSBR system during retrohoming (Mills et al., 1997; Cousi- ecule to the mobility pathway. neau et al., 1998; Martı´nez-Abarca and Toro, 2000). Interestingly, many bacterial introns do not encode En domains (Fig. 3B), yet appear to be mobile. The best studied of The basic intron mobility reaction these is the Sinorhizobium intron RmInt1 (Martı´nez-Abarca et al., 2000; Muñoz-Adelantado et al., 2003), which is efficiently Mobility of group II introns occurs through a simple yet spe- mobile in vivo even though it does not have an En domain. A cialized mechanism. In the predominant event, retrohoming, similar process was characterized for an Ll.ltrB intron with a the introns invade double-stranded DNA targets with high site mutated En domain (Zhong and Lambowitz, 2003). The data specificity (Moran et al., 1995). Non-site-specific mobility, or suggested a mechanism in which the intron reverse splices into retrotransposition, also occurs (see below) (Mueller et al., 1993; the double-stranded DNA target, and cDNA synthesis is Sellem et al., 1993; Cousineau et al., 2000; Martı´nez-Abarca primed by the 3) end of a DNA at the replication fork, rather and Toro, 2000), but the majority of characterized introns than priming by cleaved target DNA. home to consistent and predictable locations. A final variation of mobility is insertion into noncognate Homing is initiated by recognition of the DNA target site by sites, known as retrotransposition. In general, retrotransposi- the RNP, with the homing site typically extending from tion is thought to occur by reverse splicing directly into DNA F25 bp upstream to F10 bp downstream of the insertion site. sites, similar to retrohoming (Yang et al., 1998; Dickson et al., In recognizing the correct site, the protein binds to the up- 2001), although most retrotransposition sites lack 3) exon stream target site (–25 to –13 for the well-studied Lactococcus sequences required for second strand cleavage (Mueller et al., lactis intron Ll.ltrB), which stimulates local unwinding of the 1993; Sellem et al., 1993; Ichiyanagi et al., 2002). In bacteria, DNA. The remaining upstream target site (–12 to –1 for retrotransposition occurs independently of the En domain Ll.ltrB) is recognized by base pairing to the intron RNA (Cousineau et al., 2000), suggesting that the mechanism may be through the same IBS1/EBS1 and IBS2/EBS2 interactions that similar to homing without an En domain, where a nascent occur during splicing (Guo et al., 1997, 2000; Mohr et al., 2000; DNA at a replication fork is proposed to prime reverse tran- Singh and Lambowitz, 2001). Concomitantly, the 3) exon is scription of the inserted intron RNA (Ichiyanagi et al., 2002; defined by the IBS3-EBS3 (or ‰–‰)) base pair(s) (Guo et al., Zhong and Lambowitz, 2003). 1997; Jiménez-Zurdo et al., 2003), which also are analogous to the pairing interactions during splicing. The intron then re- verse-splices into the top strand (Zimmerly et al., 1995b) in a Group II introns as tools in reaction that is mechanistically the reverse of splicing, and therefore intrinsically RNA-catalyzed. The protein recognizes Because of their novel properties and wide species distribu- sequences in the 3) exon and uses the En domain to cleave the tion, group II introns may provide a powerful new strategy for bottom strand of the DNA target (Fig. 2B) (at position +9 for site-specific DNA manipulation. The basic idea is based on: Ll.ltrB), creating a free 3)-OH from which the RT can prime 1) their long target site recognition (F30 bp); 2) the ability to reverse transcription (Zimmerly et al., 1995a; Matsuura et al., insert foreign sequence into domain IV for coinsertion with the 1997). Once primed, the RT converts the inserted intron into intron; and 3) the ability to alter the DNA target site by mani- cDNA to produce the bottom strand of the inserted intron. For- pulating the EBS sequences that pair with the DNA exons dur- mation of the top (second) strand of intron DNA has not been ing intron insertion. The region of protein recognition (–25 to experimentally demonstrated, but is believed to be through –13, and +4 to +10 for Ll.ltrB), only requires a loose consensus recombination/repair pathways (below) (Fig. 2B). Overall, this to the cognate site, while the RNA-recognized region (–12 to +3 process is called target-primed reverse transcription (TPRT), for Ll.ltrB) can in principle be mutated to any new pairing and is similar to reactions performed by non-LTR retroele- interaction (Mohr et al., 2000). Together, these requirements ments. For more detailed mechanistic discussion see other allow predictable manipulation of new target site specificities, reviews (Lambowitz et al., 1999; Belfort et al., 2002; Lambo- and such modified group II introns are known as targetrons. witz and Zimmerly, 2004). There are a number of variations for the uses of targetrons. Importantly, a method has been developed to generate an intron to insert site-specifically into any target, through ran- Variations in the mobility reaction domization of the EBS sequences and selection (Guo et al., 2000; Zhong et al., 2003). The RT ORF can be expressed in Because the final steps of intron insertion are carried out by trans (Guo et al., 2000; Frazier et al., 2003), and intron inser- host repair activities, which can vary among , multi- tions can “knock out” selected (Karberg et al., 2001). Site- ple pathways occur. In mitochondria, three pathways specific can be targeted into bacterial have been identified based on coconversion data for flanking using a specialized intron that stimulates homologous recombi-

Cytogenet Genome Res 110:589–597 (2005) 591 Fig. 3. Variety of group II introns. Intron variations found in nature are introns) (Dai and Zimmerly, 2002). (H) Insertion in the reverse orientation. diagrammed. Exons are shown as dark gray boxes, intron RNAs as black (Examples: X.f.I1 in Xylella fastidiosa DNA methyltransferase ; Th.e.I12 lines and intron-encoded ORFs as light gray boxes. RNA secondary structure in Thermosynechococcus elongatus glycosyltransferase gene) (Dai and Zim- domains I–VI are denoted by stem loops. Panels A–J represent double merly, 2002; L. Dai, S. Zimmerly, unpublished). (I) Twintron in Euglena stranded DNA structures, with boxes and stems positioned up or down to . Twintrons can consist of many combinations of group II and III indicate the strand orientation. Panels K–M represent RNA secondary struc- introns. (Example: rps3 gene twintron) (Copertino et al., 1992). (J) Twintron tures. (A) Typical group II intron with all motifs. (Examples: Lactococcus in the Archaebacteria. The example shown consists of four nested introns, lactis Ll.ltrB, yeast mitochondrial introns aI1, aI2) (Belfort et al., 2002). each inserted into a conserved sequence in another intron’s ORF. Two of the (B) Intron lacking the En domain, common in bacteria. (Example: Sinorhizo- intron copies are incomplete, so that sequential splicing cannot remove all of bium meliloti RmInt1) (Martı´nez-Abarca et al., 2000). (C) ORF-less intron. the introns, and there is no ORF flanking the introns in any case. (Example: (Examples: yeast mitochondrial introns aI5Á, cob1I1) (Michel et al., 1989). M.a.I2-F1/M.a.I1-F1/M.a.I1-3/M.a.I5-1 in Methanosarcina acetivorans) (D) Bacterial intron fragment. This example intron is truncated by an IS5 (Dai and Zimmerly, 2003). (K) Two-piece trans-splicing intron, which is insertion in the opposite orientation. (Example: E. coli I4) (Dai and Zimmer- common in mitochondria. The intron is encoded in two different ly, 2002). (E) ORF-encoding intron with degenerate ORF, likely immobile. genomic locations, and splicing in trans produces a functional mRNA from (Example shown is trnKI1 (matK) in plant chloroplasts; different degenera- two separately encoded exons. (Examples: nad1I1, nad5I2 in higher plants) tions are seen for nad1I4 (matR) in higher plant mitochondria and for (Bonen, 1993). (L) Three-piece trans-splicing intron. This is similar to two- psbCI4 in Euglena chloroplasts (Mohr et al., 1993; Zimmerly et al., 2001). piece trans-splicing, but the intron is encoded in three pieces. (Example: (F) Free-standing ORF without an intron RNA structure. (Examples: nMat- Chlamydomonas psaAI1) (Choquet et al., 1988; Goldschmidt-Clermont et 1a, nMat-1b, nMat-2a, nMat-2b in plant nuclear genomes; matK in mt al., 1991). (M) Group III intron. Only found in Euglena chloroplasts, these genome of Epiphagus virginianum) (Mohr and Lambowitz, 2003). (G) Inser- introns are short, degenerate forms consisting of portions of domain I and VI. tion after rho-independent terminators. (Examples: all bacterial class C (Examples: psbK) (Doetsch et al., 2001).

592 Cytogenet Genome Res 110:589–597 (2005) nation with introduced mutated DNA sequence (Karberg et al., in intron splicing (Lambowitz et al., 1999). These factors would 2001). Finally, targetrons have been shown to insert into plas- be predicted to be especially critical in splicing of degenerate mid targets in human cells, pointing to potential applications in introns. functional genomics and gene therapy in higher organisms The simplest type of degeneration is deviation of the RNA (Guo et al., 2000). Clearly, the development of group II introns structure, most obviously through mispairings in domains 5 or as vectors has very broad potential, and further work will devel- 6, major deviations within conserved motifs, or large insertions op their uses and applications. or localized deletions. Such deviations are common for higher plant organellar introns and occur to different extents (Michel et al., 1989; Toor et al., 2001; N. Toor, S. Zimmerly, unpub- Diversity of group II introns in nature: bacterial versus lished). While it is not possible to know for certain that individ- organellar introns and unusual variations ual deviations compromise catalytic activity, this is suggested cumulatively by the fact that no higher plant intron has yet self- Organellar introns spliced in vitro (Michel and Ferat, 1995). In contrast, group II Since the discovery of group II introns roughly twenty years introns in lower tend to have more “standard” sec- ago, the models for study have been organellar introns. In the ondary structures that are easier to model, and many of them past ten years, however, our knowledge of bacterial introns has self-splice in vitro (Michel and Ferat, 1995; N. Toor, L. Dai, S. grown immensely, and combined with the probability that the Zimmerly, unpublished). bacterial introns are the most ancient, bacterial group II introns ORF degeneration is also a common theme. Many organel- will soon be the new prototype if they are not already. In this lar ORFs have modest defects in mobility-related domains, section, we compare group II introns in and in bacte- such as mutations in the YADD motif or truncations of the ria, and describe the impressive range of variations in the endonuclease, and these losses appear to have occurred inde- introns known to date (Fig. 3). pendently many times because they are distributed spottily In mitochondria and chloroplasts, group II introns are pri- among distantly related introns (Zimmerly et al., 2001). More marily splicing entities, as most of them do not encode mobility dramatic degenerations of ORFs are also found. For example, ORFs (Fig. 3C) (Michel et al., 1989; Michel and Ferat, 1995; matK, the ORF encoded by tRNALys introns in plants, contains Zimmerly et al., 2001). Organellar introns are typically located no sequence identity in the region corresponding to RT do- in conserved housekeeping genes such as cytochrome oxidase mains 0–4, but has detectable remnants of domains 5–7 and and rubisco subunits, with sometimes up to ten introns (both very strong domain X conservation (Fig. 3E) (Mohr et al., group I and II) in a given gene. Therefore, splicing must be effi- 1993). From this it is inferred that the protein retains its matu- cient enough to maintain function of the host gene. The introns rase activity associated with domain X, but has lost mobility are considered silent phenotypically because there is no appar- functions. In higher plant chloroplasts, matR is truncated for ent difference in fitness between strains that contain or lack the RT domains 0–1, and has two large insertions (1150 aa each) introns. A minority of organellar introns have mobility proper- within the RT domain, raising doubts as to its mobility func- ties, but there is a bias in their distribution. In lower euka- tions (Zimmerly et al., 2001). Also in plants, several nuclear- ryotes, about half of group II introns encode RT ORFs and encoded proteins, nMat-1a, nMat-1b, nMat-2a and nMat-2b, would be predicted to be mobile, while in higher plants, nearly are closely related to mitochondrial maturase proteins but are all introns are ORF-less, and there is only one ORF-containing not found within an intron (Mohr and Lambowitz, 2003). intron out of about twenty introns in each (matK in These proteins are presumably derived from mitochondrial chloroplasts, matR in mitochondria). This difference between maturases, and are imported into mitochondria to function in lower eukaryotes and plants suggests a progressive ORF degen- intron splicing and/or mobility (Fig. 3F). Finally, Euglena eration and loss during plant evolution (below). chloroplasts have two intron-encoded ORFs, mat1 and mat2, Degeneration is common for organellar group II introns, which are severely degenerate throughout the RT and X motifs, both in RNA structures and ORF motifs. In many cases, degen- but which likely have a function in splicing since they are con- erate introns have probably lost mobility competence, but they served among Euglena species and their host intron is located must retain splicing competence because they are located in in a photosystem gene (Zhang et al., 1995; Doetsch et al., conserved genes. Similarly, ORFs with degenerate RT motifs 1998). probably no longer promote mobility but may retain splicing Trans-splicing is a type of “genomic degeneration” in which (maturase) function. Many host-encoded genes are known to an intron RNA is encoded in two or more pieces, which are facilitate splicing of group II introns in organelles (Lehmann encoded in different locations in the genome (Bonen, 1993). and Schmidt, 2003). These include the helicase-related gene The intron pieces along with their flanking exons are tran- MSS116 (Séraphin et al., 1989) and the magnesium ion trans- scribed separately, and the intron segments associate to form an port gene MRS2 (Gregan et al., 2001) in yeast mitochondria. RNA structure that approximates a standard cis-splicing in- Maize splicing proteins include the novel gene crs1 tron. The resulting splicing event ligates two exons encoded in and the tRNA peptidyl hydrolase-related gene crs2, with its distant parts of the genome (Fig. 3K). There are many examples associated factors caf1 and caf2 (Jenkins and Barkan, 2001; Till of trans-splicing introns in higher plants (e.g., nad1I1, nad1I3 et al., 2001; Ostheimer et al., 2003). Because some of the pro- nad2I2, nad5I2, nad5I3) (Pereira de Souza et al., 1991; Wissin- teins are related to proteins with known activities, it is thought ger et al., 1991). Trans-splicing introns are believed to be that pre-existing organellar proteins were adapted to function formed by genomic rearrangements that split an intron, a scen-

Cytogenet Genome Res 110:589–597 (2005) 593 ario that is supported by the presence of cis-splicing introns in The main form of degeneration for bacterial group II introns early branching plants that are homologous to the trans-splicing is fragmentation, and there are roughly equal numbers of full- homologs in higher plants (Malek and Knoop, 1998). Trans- length and truncated introns in sequenced bacterial genomes splicing in three parts has also been found, for example, in the (Dai and Zimmerly, 2002). Truncations can be caused in a green alga Chlamydomonas reinhardtii, in which the psaA I1 number of ways, such as incomplete reverse transcription dur- intron is split into three pieces, which together fold into a struc- ing TPRT, an attractive idea because it would cause 5) trunca- ture resembling a degenerate group II intron structure. For this tions analogous to those found in non-LTR elements. However, intron and its downstream trans-splicing intron in the psaA 5) truncations account for a minority of fragmented introns. gene, a dozen proteins have been implicated genetically as Figure 3D shows an example of 3) truncation that was caused affecting the trans-splicing process (Fig. 3L) (Choquet et al., by the insertion of an IS element in the opposite orientation 1988; Goldschmidt-Clermont et al., 1991). (Dai and Zimmerly, 2002). Truncations in bacteria are funda- Another type of dramatic degeneration is found in Euglena mentally different from the degenerations in organellar introns chloroplasts (Copertino et al., 1992). The Euglena gracilis because the truncations completely inactivate the intron, chloroplast genome contains about 150 introns in total (F39% whereas organellar introns must remain splicing-competent of its sequence), with two general types, group II and group III. since they are in conserved genes. It should be noted that there The group II introns resemble other group II introns but have are some examples of RNA structural degeneration in bacterial structural irregularities, while the group III introns are drasti- introns such as mispairs and indels, but since the introns are cally degenerate, and are minimal introns of roughly 100 bp, not located in conserved genes and do not need to retain splic- which retain only the 5) and 3) motifs of the introns (Fig. 3M). ing function, it is likely that many of these are “dying” copies of Because of the range of intron degeneration in Euglena and the retroelements (Dai and Zimmerly, 2002). small size of many introns, it is clear that the introns need addi- The most unusual types of bacterial introns are those of the tional help from proteins or other RNAs acting in trans in order ORF phylogenetic class “bacterial class C.” The RNA is classi- to splice, but these factors have not yet been identified. fied IIC because it does not conform to either the IIA or IIB Euglena chloroplasts also contain twintrons, an unusual consensus, as it lacks EBS2, and has a shortened domain 5 organization in which one intron is nested inside another (eith- helix, and other deviations (Toor et al., 2001; Toro, 2003). er copy can be group II or III) (Fig. 3I). Sequential splicing of Interestingly, the introns are not located in ORFs, but instead the introns produces a functional mRNA (Copertino et al., are located directly after rho-independent terminator motifs 1992). An interesting question, given the prevalence of the (Fig. 3G) (Dai and Zimmerly, 2002). This insertion strategy degenerate introns in the genome, is whether the introns are ensures that the intron does not impair expression of a host mobile in the degenerate form in Euglena, or whether only gene, and that the intron is transcribed at a low level, by termi- “standard” introns are mobile, with the current introns in the nation readthrough. Often identical bacterial class C introns genome being immobile remnants. insert into multiple targets having very little sequence identity, In summary, there are a multitude of examples of degener- but still having a terminator motif (inverted repeat followed by ate introns in organelles. In nearly all cases, splicing of degener- T’s). This pattern appears to defy the principle of homing into ate introns must be retained, because the introns are in con- defined sequences, and it is not yet clear how the introns recog- served genes, and this may be promoted by splicing factors con- nize the terminator motifs in the DNA. tributed by the host . In cases of ORF degeneration, it is Although there is a bias for bacterial introns being located likely that splicing function is usually retained while mobility outside of conserved genes, some notable exceptions have been function is lost. found recently. The genome of the cyanobacterium Trichodes- mium sp. contains 21 group II introns, twelve of which are Bacteria located in conserved genes, such as allophycocyanin-B, IMP In bacteria, in contrast to organellar introns, group II dehydrogenase, a ribonucleotide reductase, a DNA polymerase introns appear to behave as retroelements more than introns. and RNase H (L. Dai, S. Zimmerly, unpublished). The ribonu- Bacterial group II introns are associated with mobile and cleotide reductase in fact contains many genic interruptions, are rarely found in housekeeping genes (Dai and Zimmerly, including three group II introns and, curiously, four inteins 2002). The introns are mainly found in IS elements, , (Liu et al., 2003). The sheer number of examples in Trichodes- or pathogenicity islands, and are sometimes inserted between mium suggests that the introns do not adversely affect host gene genes. Together, this suggests that the introns are excluded function in this cyanobacterium. Other examples of introns in from conserved genes, either because they interfere with gene conserved genes tend to be exceptions that prove the rule. In function, or because there is a replicative advantage by residing Xylella fastidiosa and Thermosynechococcus elongatus, two on other mobile DNAs. Unlike organellar introns, they almost introns have inserted into conserved genes in the reverse orien- always encode RT ORFs and are either retroelements or trun- tation (DNA methyltransferase and glycosyltransferase genes, cated retroelements (Fig. 3A). In general, they are not degener- respectively) (Fig. 3H) (Dai and Zimmerly, 2002; L. Dai, S. ate in RNA structure, but instead conform to the IIA or IIB Zimmerly, unpublished). In both cases, the intron cannot be consensus, and have conserved sequences and structural varia- spliced out from the mRNA transcript, and intron insertion has tions specific to each phylogenetic subclass (Toor et al., 2001). in effect knocked out the gene, an event more characteristic of One subclass (bacterial class C) cannot be classified as IIA or retroelements than introns. In Azotobacter vinelandii, an intron IIB, and is designated IIC (Ferat et al., 2003; Toro, 2003). has inserted into the stop codon of the GroEL gene, but because

594 Cytogenet Genome Res 110:589–597 (2005) the stop codon is changed to a different stop codon, it is not introns is based on similarities in the splicing reactions and clear that intron insertion affects the host gene expression RNA structures (Padgett et al., 1994; Villa et al., 2002), and (Adamidi et al., 2003; Ferat et al., 2003). recently, a functional similarity was observed in that a group II Another interesting subset of introns in bacteria is ORF-less intron domain 5 stem can be swapped for the analogous U2/ introns, which are now known in archaebacteria and in cyano- U6atac pairing the U12-dependent (Shukla and bacteria (four examples out of 130 known full-length introns) Padgett, 2002). (Dai et al., 2003; L. Dai, S. Zimmerly, unpublished). Interest- The expanding microcosm of group II introns in bacteria ingly, the ORF-less introns appear to be mobile despite their has brought new insights to the history of the introns, but many lack of an ORF, which is inferred from their presence in multi- questions remain unanswered. The diversity of introns in bac- ple polymorphic targets within a genome. Their targets are gen- teria strengthens the idea that the introns evolved in bacteria erally conserved for IBS1 and 2 pairings, but not other protein and subsequently spread to the organelles (Ferat and Michel, recognition regions, suggesting that the target site is determined 1993; Dai and Zimmerly, 2002). However, an interesting twist by the reverse splicing step rather than protein recognition. comes from the emerging picture that bacterial introns are Intriguingly, for each ORF-less intron, there is a closely related retroelement-like, because it suggests that spliceosomal introns intron within the genome that encodes an ORF, suggesting that are ultimately derived from retroelements, a scenario which the ORF-less introns require an RT protein acting in trans, might help account for their successful spread in eukaryotic making them “satellites” of the ORF-containing introns. genomes. Twintrons have also been found in bacterial genomes, but Phylogenetic analysis of the group II intron ORFs divides they are different from twintrons described in Euglena chloro- them into eight groups: the mitochondrial class, the chloro- plasts. In the cyanobacterium Trichodesmium there are six plast-like classes 1 and 2, and bacterial classes A–E (Zimmerly twintrons, one of which is in a conserved gene (RNase H) (L. et al., 2001; Toro et al., 2002). Each class is represented by Dai, S. Zimmerly, unpublished). However, the majority of mixed host organisms (for example, the mitochondrial and twintrons are not located in ORFs, and sequential splicing chloroplast-like classes each contain bacterial introns), and a would not produce a functional mRNA. The most bizarre orga- single species can contain introns in multiple classes (for exam- nization is found in the archaebacterium Methanosarcina aceti- ple, E. coli introns are in four different phylogenetic classes, vorans, which contains one twintron with four levels of nesting and one E. coli intron is closely related to an archaebacterial (Fig. 3J) (Dai and Zimmerly, 2003). Some of the intron copies intron). Therefore, as perhaps expected for a mobile element, are incomplete, such that all introns cannot be spliced out, and there appears to have been much horizontal transfer of the no possible ORF surrounds the intron cluster that might introns, particularly among bacteria. encode a functional product. Instead, the introns appear be A detailed analysis of RNA secondary structures and com- homing into conserved sequence motifs of the RTs of other parison with the phylogenetic classes indicated coevolution introns, presumably a selfish strategy that finds an innocuous between intron RNA and ORFs, with each ORF subclass being target, but at the expense of other group II intron retroele- associated with a different intron RNA structural class, which ments. cumulatively represent all previously characterized RNA struc- Together, these properties suggest that group II introns tural classes plus new varieties (Fontaine et al., 1997; Toor et al., behave as retroelements in bacteria. In most cases, it is of little 2001). Based on this general trend, as well as other evidence consequence to the cell whether the introns splice, although (Toor et al., 2001), we proposed the “retroelement ancestor splicing does serve a purpose to the intron in providing RNPs hypothesis” to describe an overall evolutionary history for group for further mobility reactions. Bacterial introns appear to have II introns (Fig. 4). This hypothesis predicts that the ancestor of adapted a survival strategy typical of selfish DNAs, based on all known group II introns was a retroelement in bacteria having constant movement to new genomic locations, balanced with an RNA structure that was not of the “standard” IIA or IIB type deletions and loss. In contrast, organellar group II introns are as originally defined for organellar introns, and that the varieties adapted primarily to splicing, with many host enzymes having of ribozyme structures differentiated as components of these been recruited to promote and control the splicing process. In retroelements. Some of the intron classes (mitochondrial and many cases in organelles, particularly for ORF-less introns, the chloroplast-like classes) migrated to organelles, where a large mobility properties have been discarded in exchange for being percentage of introns lost their ORFs and in plants, nearly all an indispensable step in gene expression. introns lost ORFs and degenerated in RNA structure, becoming dependent on protein splicing factors supplied by the host organ- ism. The concept of ORF loss is validated by examples of “ORF- Group II intron evolution less” introns in lower plant mitochondria that contain remnants of RT ORFs (Toor et al., 2001). Evolution of group II introns is an important topic because Group II introns are also evolutionarily related to non-LTR the introns are believed to be the ancestors of spliceosomal elements. RTs of group II introns and non-LTR elements are introns. According to this idea, the introns began as self-splic- related phylogenetically, and structurally, they share a con- ing entities, and after invasion of the nucleus the intron RNAs served N-terminal extension (domain 0) and domain 2a that is fragmented into multiple pieces (the snRNAs), which together absent from retroviral and other RTs (Malik et al., 1999). carry out the splicing reaction within the spliceosome (Sharp, Mechanistically, the two types of elements are both mobile 1991). The parallel between group II introns and spliceosomal through TPRT mechanisms, in which cleaved target DNA is

Cytogenet Genome Res 110:589–597 (2005) 595 used to prime cDNA synthesis (Luan et al., 1993; Zimmerly et al., 1995b). If group II introns were present in primitive euka- ryotes, as expected if they were precursors to spliceosomal introns, then it is also possible that they could have been pre- cursors to non-LTR elements. Accordingly, one simple scenario is that mobile group II introns invaded the nucleus of a primi- tive eukaryote (either from an organellar or bacterial source) and that some introns lost RT ORFs and became spliceosomal introns, while others lost the intron RNA structure and became non-LTR elements. At this time one cannot say with certainty that group II introns were the ancestors, either direct or indi- rect, of retroelements or introns in higher eukaryotic genomes. Still, it is tantalizing to consider that group II intron invasion of a primitive eukaryotic nucleus was a pivotal occurrence in the development of eukaryotic genomes.

Fig. 4. The retroelement ancestor hypothesis for the evolution of group II introns. The model predicts that the ancestor of all known group II introns was a mobile bacterial intron with RNA structural features not conforming to canonical IIA or IIB structures. The different subclasses of ribozyme struc- tures are predicted to have differentiated as components of retroelements in bacteria, and two of these lineages (canonical IIA and IIB) migrated to organ- elles where many lost their ORFs and mobility properties. Spliceosomal introns could be derived from group II introns at any point in this history.

References

Adamidi C, Fedorova O, Pyle AM: A group II intron Dai L, Zimmerly S: ORF-less and reverse-transcrip- Frazier CL, San Filippo J, Lambowitz AM, Mills DA: inserted into a bacterial heat-shock operon shows tase-encoding group II introns in archaebacteria, Genetic manipulation of Lactococcus lactis by us- autocatalytic activity and unusual thermostability. with a pattern of homing into related group II ing targeted group II introns: generation of stable Biochemistry 42:3409–3418 (2003). intron ORFs. RNA 9:14–19 (2003). insertions without selection. Appl Environ Micro- Belfort M, Derbyshire V, Parker MM, Cousineau B, Dai L, Toor N, Olson R, Keeping A, Zimmerly S: Data- biol 69:1121–1128 (2003). Lambowitz AM: Mobile introns: pathways and base for mobile group II introns. Nucleic Acids Res Goldschmidt-Clermont M, Choquet Y, Girard-Bascou proteins, in Craig NL, Craigie R, Gellert M, Lam- 31:424–426 (2003). J, Michel F, Schirmer-Rahire M, Rochaix JD: A bowitz AM (eds): Mobile DNA II, pp 761–777 Dickson L, Huang HR, Liu L, Matsuura M, Lambowitz small chloroplast RNA may be required for trans- (ASM Press, Washington DC 2002). AM, Perlman PS: Retrotransposition of a yeast splicing in Chlamydomonas reinhardtii. Cell 65: Bonen L: Trans-splicing of pre-mRNA in plants, ani- group II intron occurs by reverse splicing directly 135–143 (1991). mals, and . FASEB J 7:40–46 (1993). into ectopic DNA sites. Proc Natl Acad Sci Gregan J, Kolisek M, Schweyen RJ: Mitochondrial Choquet Y, Goldschmidt-Clermont M, Girard-Bascou 98:13207–13212 (2001). Mg2+ homeostasis is critical for group II intron J, Kück U, Bennoun P, Rochaix JD: Mutant phe- Doetsch NA, Thompson MD, Hallick RB: A maturase- splicing in vivo. Genes Dev 15:2229–2237 (2001). notypes support a trans-splicing mechanism for the encoding group III twintron is conserved in deeply Guo H, Zimmerly S, Perlman PS, Lambowitz AM: expression of the tripartite psaA gene in the C. rein- rooted euglenoid species: are group III introns the Group II intron endonucleases use both RNA and hardtii chloroplast. Cell 52:903–913 (1988). chicken or the egg? Mol Biol Evol 15:76–86 protein subunits for recognition of specific se- Copertino DW, Shigeoka S, Hallick RB: Chloroplast (1998). quences in double-stranded DNA. EMBO J 16: group III twintron excision utilizing multiple 5)- Doetsch NA, Favreau MR, Kuscuoglu N, Thompson 6835–6848 (1997). and 3)-splice sites. EMBO J 11:5041–5050 (1992). MD, Hallick RB: Chloroplast transformation in Guo H, Karberg M, Long M, Jones JP 3rd, Sullenger B, Costa M, Michel F, Westhof E: A three-dimensional Euglena gracilis: splicing of a group III twintron Lambowitz AM: Group II introns designed to perspective on exon binding by a group II self- transcribed from a transgenic psbK operon. Curr insert into therapeutically relevant DNA target splicing intron. EMBO J 19:5007–5018 (2000). Genet 39:49–60 (2001). sites in human cells. Science 289:452–457 (2000). Cousineau B, Smith D, Lawrence-Cavanagh S, Mueller Eskes R, Liu L, Ma H, Chao MY, Dickson L, Lambo- Ichiyanagi K, Beauregard A, Lawrence S, Smith D, JE, Yang J, Mills D, Manias D, Dunny G, Lambo- witz AM, Perlman PS: Multiple homing pathways Cousineau B, Belfort M: Retrotransposition of the witz AM, Belfort M: Retrohoming of a bacterial used by yeast mitochondrial group II introns. Mol Ll.LtrB group II intron proceeds predominantly group II intron: mobility via complete reverse Cell Biol 20:8432–8446 (2000). via reverse splicing into DNA targets. Mol Micro- splicing, independent of homologous DNA recom- Ferat JL, Michel F: Group II self-splicing introns in biol 46:1259–1272 (2002). bination. Cell 94:451–462 (1998). bacteria. Nature 364:358–361 (1993). Jacquier A, Michel F: Multiple exon-binding sites in Cousineau B, Lawrence S, Smith D, Belfort M: Retro- Ferat JL, Le Gouar M, Michel F: A group II intron has class II self-splicing introns. Cell 50:17–29 (1987). transposition of a bacterial group II intron. Nature invaded the genus Azotobacter and is inserted with- Jacquier A, Michel F: Base-pairing interactions involv- 404:1018–1021 (2000). in the termination codon of the essential groEL ing the 5) and 3)-terminal nucleotides of group II Dai L, Zimmerly S: Compilation and analysis of group gene. Mol Microbiol 49:1407–1423 (2003). self-splicing introns. J Mol Biol 213:437–447 II intron insertions in bacterial genomes: evidence Fontaine JM, Goux D, Kloareg B, Loiseaux-de Göer: (1990). for retroelement behavior. Nucleic Acids Res The reverse transcriptase-like proteins encoded by Jarrell KA, Peebles CL, Dietrich RC, Romiti SL, Perl- 30:1091–1102 (2002). group II introns in the mitochromdrial genome of man PS: Group II intron self-splicing. Alternative the brown alga Pylaiella littoralis belong to two dif- reaction conditions yield novel products. J Biol ferent lineages which apparently coevolved with Chem 263:3432–3439 (1988). the group II ribosyme lineages. J Mol Evol 44:33– 42 (1997).

596 Cytogenet Genome Res 110:589–597 (2005) Jenkins BD, Barkan A: Recruitment of a peptidyl- Mohr G, Lambowitz AM: Putative proteins related to Shukla GC, Padgett RA: A catalytically active group II tRNA hydrolase as a facilitator of group II intron group II intron reverse transcriptase/maturases are intron domain 5 can function in the U12-depen- splicing in chloroplasts. EMBO J 20:872–879 encoded by nuclear genes in higher plants. Nucleic dent spliceosome. Mol Cell 9:1145–1150 (2002). (2001). Acids Res 31:647–652 (2003). Singh NN, Lambowitz AM: Interaction of a group II Jiménez-Zurdo JI, Garcı´a-Rodrı´guez FM, Barrientos- Mohr G, Perlman PS, Lambowitz AM: Evolutionary intron ribonucleoprotein endonuclease with its Dura´n A, Toro N: DNA target site requirements relationships among group II intron-encoded pro- DNA target site investigated by DNA footprinting for homing in vivo of a bacterial group II intron teins and identification of a conserved domain that and modification interference. J Mol Biol 309: encoding a protein lacking the DNA endonuclease may be related to maturase function. Nucleic Acids 361–386 (2001). domain. J Mol Biol 326:413–423 (2003). Res 21:4991–4997 (1993). Till B, Schmitz-Linneweber C, Williams-Carrier R, Karberg M, Guo H, Zhong J, Coon R, Perutka J, Lam- Mohr G, Smith D, Belfort M, Lambowitz AM: Rules Barkan A: CRS1 is a novel group II intron splicing bowitz AM: Group II introns as controllable gene for DNA target-site recognition by a lactococcal factor that was derived from a domain of ancient targeting vectors for genetic manipulation of bacte- group II intron enable retargeting of the intron to origin. RNA 7:1227–1238 (2001). ria. Nature Biotechnol 19:1162–1167 (2001). specific DNA sequences. Genes Dev 14:559–573 Toor N, Hausner G, Zimmerly S: Coevolution of group Lambowitz AM, Zimmerly S: Mobile group II introns. (2000). II intron RNA structures with their intron-encoded Annu Rev Genet, in press (2004). Moran JV, Zimmerly S, Eskes R, Kennell JC, Lambo- reverse transcriptases. RNA 7:1142–1152 (2001). Lambowitz AM, Caprara MG, Zimmerly S, Perlman witz AM, Butow RA, Perlman PS: Mobile group II Toro N: Bacteria and Group II introns: addi- PS: Group I and group II as RNPs: clues introns of yeast mitochondrial DNA are novel site- tional in the environment. to the past and guides to the future, in Gesteland specific retroelements. Mol Cell Biol 15:2828– Environ Microbiol 5:143–151 (2003). RF, Cech TR, Atkins JF (eds): The RNA World, pp 2838 (1995). Toro N, Molina-Sa´nchez MD, Ferna´ ndez-La´pez M: 451–484 (Cold Spring Harbor Laboratory Press, Mueller MW, Allmaier M, Eskes R, Schweyen RJ: Identification and characterization of bacterial Cold Spring Harbor 1999). Transposition of group II intron aI1 in yeast and class E group II introns. Gene 299:245–250 Lehmann K, Schmidt U: Group II introns: structural invasion of mitochondrial genes at new locations. (2002). and catalytic versatility of large natural ribozymes. Nature 366:174–176 (1993). van der Veen R, Arnberg AC, van der Horst G, Bonen Crit Rev Biochem Mol Biol 38:249–303 (2003). Muñoz-Adelantado E, San Filippo J, Martı´nez-Abarca L, Tabak HF, Grivell LA: Excised group II introns Liu XQ, Yang J, Meng Q: Four inteins and three group F, Garcı´a-Rodrı´guez FM, Lambowitz AM, Toro N: in yeast mitochondria are lariats and can be II introns encoded in a bacterial ribonucleotide Mobility of the Sinorhizobium meliloti group II formed by self-splicing in vitro. Cell 44:225–234 reductase gene. J Biol Chem 287:46826–46831 intron RmInt1 occurs by reverse splicing into (1986). (2003). DNA, but requires an unknown reverse transcrip- Villa T, Pleiss JA, Guthrie C: Spliceosomal snRNAs: Luan DD, Korman MH, Jakubczak JL, Eickbush TH: tase priming mechanism. J Mol Biol 327:931–943 Mg2+-dependent chemistry at the catalytic core? Reverse transcription of R2Bm RNA is primed by (2003). Cell 109:149–152 (2002). a nick at the chromosomal target site: a mechanism Noah JW, Lambowitz AM: Effects of maturase binding Wank H, San Filippo J, Singh RN, Matsuura M, Lam- for non-LTR retrotransposition. Cell 72:595–605 and Mg2+ concentration of group II intron RNA bowitz AM: A reverse transcriptase/maturase pro- (1993). folding investigated by UV cross-linking. Biochem- motes splicing by binding at its own coding seg- Malek O, Knoop V: Trans-splicing group II introns in istry 42:12466–12480 (2003). ment in a group II intron RNA. Mol Cell 4:239– plant mitochondria: the complete set of cis-ar- Ostheimer GJ, Williams-Carrier R, Belcher S, Osborne 250 (1999). ranged homologs in ferns, fern allies, and a horn- E, Gierke J, Barkan A: Group II intron splicing fac- Wissinger B, Schuster W, Brennicke A: Trans splicing wort. RNA 4:1599–1609 (1998). tors derived by diversification of an ancient RNA- in Oenothera mitochondria: nad1 mRNAs are Malik HS, Burke WD, Eickbush TH: The age and evo- binding domain. EMBO J 22:3919–3929 (2003). edited in exon and trans-splicing group II intron lution of non-LTR retrotransposable elements. Padgett RA, Podar M, Boulanger SC, Perlman PS: The sequences. Cell 65:473–482 (1991). Mol Biol Evol 16:793–805 (1999). stereochemical course of group II intron self-splic- Yang J, Mohr G, Perlman PS, Lambowitz AM: Group Martı´nez-Abarca F, Toro N: RecA-independent ectop- ing. Science 266:1685–1688 (1994). II intron mobility in yeast mitochondria: target ic transposition in vivo of a bacterial group II Peebles CL, Benatan EJ, Jarrell KA, Perlman PS: DNA-primed reverse transcription activity of aI1 intron. Nucleic Acids Res 28:4397–4402 (2000). Group II intron self-splicing: development of alter- and reverse splicing into DNA transposition sites Martı´nez-Abarca F, Garcı´a-Rodrı´guez FM, Toro N: native reaction conditions and identification of a in vitro. J Mol Biol 282:505–523 (1998). Homing of a bacterial group II intron with an predicted intermediate. Cold Spring Harb Symp Zhang L, Jenkins KP, Stutz E, Hallick RB: The Euglena intron-encoded protein lacking a recognizable en- Quant Biol 52:223–232 (1987). gracilis intron-encoded mat2 is interrupted donuclease domain. Mol Microbiol 35:1405–1412 Pereira de Souza A, Jubier MF, Delcher E, Lancelin D, by three additional group II introns. RNA 1:1079– (2000). Lejeune B: A trans-splicing model for the expres- 1088 (1995). Matsuura M, Saldanha R, Ma H, Wank H, Yang J, sion of the tripartite nad5 gene in wheat and maize Zhong J, Lambowitz A: Group II intron mobility using Mohr G, Cavanagh S, Dunny GM, Belfort M, mitochondria. Plant Cell 3:1363–1378 (1991). nascent strands at DNA replication forks to prime Lambowitz AM: A bacterial group II intron encod- Qin PZ, Pyle AM: The architectural organization and reverse transcription. EMBO J 22:4555–4565 ing reverse transcriptase, maturase, and DNA en- mechanistic function of group II intron structural (2003). donuclease activities: biochemical demonstration elements. Curr Opin Struct Biol 8:301–308 Zhong J, Karberg M, Lambowitz AM: Targeted and of maturase activity and insertion of new genetic (1998). random bacterial gene disruption using a group II information within the intron. Genes Dev 11: Rest J, Mindell D: Retroids in archaea: phylogeny and intron (targetron) vector containing a retrotranspo- 2910–2924 (1997). lateral origins. Mol Biol Evol 20:1134–1142 sition-activated selectable marker. Nucleic Acids Matsuura M, Noah JW, Lambowitz AM: Mechanism (2003). Res 31:1656–1664 (2003). of maturase-promoted group II intron splicing. San Filippo J, Lambowitz AM: Characterization of the Zimmerly S, Guo H, Eskes R, Yang J, Perlman PS, EMBO J 20:7259–7270 (2001). C-terminal DNA-binding/DNA endonuclease re- Lambowitz AM: A group II intron RNA is a cata- Michel F, Ferat JL: Structure and activities of group II gion of a group II intron-encoded protein. J Mol lytic component of a DNA endonuclease involved introns. Annu Rev Biochem 64:435–461 (1995). Biol 324:933–951 (2002). in intron mobility. Cell 83:529–538 (1995a). Michel F, Umesono K, Ozeki H: Comparative and Sellem CH, Lecellier G, Belcour L: Transposition of a Zimmerly S, Guo H, Perlman PS, Lambowitz AM: functional anatomy of group II catalytic introns – a group II intron. Nature 366:176–178 (1993). Group II intron mobility occurs by target DNA- review. Gene 82:5–30 (1989). Séraphin B, Simon M, Boulet A, Faye G: Mitochon- primed reverse transcription. Cell 82:545–554 Mills DA, Manias DA, McKay LL, Dunny GM: Hom- drial splicing requires a protein from a novel heli- (1995b). ing of a group II intron from Lactococcus lactis case family. Nature 337:84–87 (1989). Zimmerly S, Hausner G, Wu X: Phylogenetic relation- subsp. lactis ML3. J Bacteriol 179:6107–6111 Sharp PA: “Five easy pieces”. Science 254:663 (1991). ships among group II intron ORFs. Nucleic Acids (1997). Res 29:1238–1250 (2001).

Cytogenet Genome Res 110:589–597 (2005) 597