I.=) 1994 Oxford University Press Nucleic Acids Research, 1994, Vol. 22, No. 6 1029-1036

A group Ill twintron encoding a maturase-like excises through lariat intermediates

Donald W.Copertino1 2+, Edina T.Hall2,§, Fredrick W.Van Hook2, Kristin P.Jenkins2 and Richard B.Hallick1,2* 1Departments of Biochemistry and 2Molecular and Cellular Biology, University of Arizona, Tucson, AZ 85721, USA

Received December 9, 1993; Revised and Accepted February 10, 1994

ABSTRACT The 1605 bp 4 of the Euglena gracilis the closely related nonphotosynthetic euglenoid Astasia longa psbC gene was characterized as a group Ill twintron (8, 9). composed of an internal 1503 nt group Ill intron with Euglena chloroplast also contain 15 twintrons, - an of 1374 nt (ycfl3, 458 amino within-introns, that are sequentially spliced (3). There are acids), and an external group Ill intron of 102 nt. twintrons with one intron internal to another intron (2, 6, 10), Twintron excision proceeds by a sequential splicing and complex twintrons with multiple internal introns (11; for pathway. The splicing of the internal and external group review, see 3). Group II and group III introns can occur as Ill introns occurs via lariat intermediates. Branch sites internal and/or external introns of twintrons. Excision of internal were mapped by primer extension RNA sequencing. introns occurs prior to excision of external introns. The excision The unpaired adenosines in domains VI of the internal of internal group IH introns of twintrons can proceed from and external introns are covalently linked to the 5' multiple 5' and 3' splice sites (6, 1 1). Internal introns are believed of the intron via 2'-5' phosphodiester bonds. to interrupt functional domains of external introns, neccesitating This bond is susceptible to hydrolysis by the de- sequential splicing (3, 4). branching activity of the HeLa nuclear S100 fraction. The psbC gene of the Euglena gracilis chloroplast DNA, which The internal intron and presumptive ycfl3 mRNA encodes a 44 kDa component of the photosystem II accumulates primarily as a linear RNA, although a lariat reaction center (12, 13), contains an intron with an open reading precursor can also be detected. The ycfl3 gene frame (ORF) of 1374 nt (ycfl3, 458 codons ). It had previously encodes a maturase-like protein that may be involved been proposed that this intron is an atypical group II intron (14). in group Ill intron metabolism. We examined the splicing of the 1.6 kb intron by direct RNA sequencing, cDNA cloning and sequencing, and northern INTRODUCTION hybridization. This intron is a group III twintron, and ycfl3 is encoded within the internal group IH intron. Splicing of the The complete DNA sequence of the Euglena gracilis chloroplast internal and external group III introns occurs via lariat is known (1). The chloroplast genes ofEuglena contain intermediates, establishing a functional role for group HI 155 introns, including 74 group II introns. Many of the Euglena VI. group II introns are smaller than those found in other because domains I -IV are abbreviated (2-4). All possess a MATERIALS AND METHODS distinct catalytic domain V, and domain VI, which includes the unpaired adenosine residue that initiates nucleophilic attack at cDNA cloning and sequencing the 5' splicejunction. Euglena chloroplast genes also contain 64 Primers were synthesized by either The University of Arizona introns of a unique class designated group I (S). Group III Center or The Midland Certified Reagent introns are abbreviated group II introns that have retained group Company. Chloroplast RNA (ctRNA) was isolated and II-like 5' and 3' boundary sequences and a domain VI-like isopropanol-fractionated from purified as described structure, but lack domains HI-IV (2, 6, 7). It is not known if (10). DNA sequence coordinates (see below) have been described the putative group HI domain VI has a functional role during (1; EMBL accession number X70810, release 37, version 18). splicing in lariat formation. In many group HI introns, the region Two synthetic deoxynucleotide primers complementary to the between the 5' boundary and domain VI resembles group II intron RNA-like strand (cDNA primers) at the psbC intron 4- 5 domain ID. Group III introns also occur in the genes of boundary (5'-GGGGATCCGCTACATGAGCTTAATTTAG-3',

*To whom correspondence should be addressed Present addresses: +Department of Neurobiology, The Scripps Research Institute, 10666 North Torrey Pines Road, La Jolla, CA 92037 and 1Department of Chemistry and Biochemistry, University of Colorado, Boulder, CO 80309, USA 1030 Nucleic Acids Research, 1994, Vol. 22, No. 6 positions 20234-20253), and in psbC exon 5 (5'-GGAA- hybridization, and post-hybridization procedures were performed CAAAATGAGCTACTTC-3', positions 20300-20319) were as described (10). used to prime cDNA synthesis with total ctRNA as template (2). Another deoxyoligonucleotide 5'-GGTAATTCTAGACTTAT- RNase H digestion and debranching of RNA lariats AAATGTATCAGG-3' of the RNA-like strand (PCR primer) in 12 Atg of HMW ctRNA was dried with or without 500 ng of a psbC exon 4 (positions 18596-18624) was used to amplify the purified deoxyoligonucleotide primer 5'-CCTCGCGTATAAA- resulting cDNAs by the polymerase chain reaction (PCR) (Perkin- GTAAAGAGAG-3' complementary to the RNA-like strand in Elmer, Cetus) as described (10). The PCR product derived from psbC intron 4 at positions 19555- 19577. Pellets were the partially spliced psbC RNA was digested with Xba I and resuspended in 10 tl 25 mM Tris-HCl, pH 7.5; 1 mM EDTA; BamHI, gel-purified by crush and soak (15), and cloned into the 50 mM NaCl; heated at 68°C for 10 minutes, and slow cooled Xba I and BamHI sites of pBluescript KS - vector (Stratagene to 30°C. An equal volume of 2 x RNase H buffer (40 mM Cloning Systems). This DNA was designated pEZC 1041 Tris-HCl, pH 7.5; 20 mM MgCl2; 100 mM NaCl; 2 mM (Figure 1). The PCR product derived from the fully splicedpsbC DTT; 60 ,ug/ml BSA) with or without 0.5 units RNase H RNA was directly cloned into ddT-tailed pBluescript KS- (16). (Promega) was then added and the reactions were incubated at This plasmid DNA was designated pEZC 1091 (Figure 1). The 30°C for 1 hour. Reactions were terminated by adding 100 yl plasmid DNA pEZC1041.1 was generated by digesting stop mix (38.5 Atg/ml tRNA, 20 mM EDTA, 300 mM NaOAc). pEZC1041 with HincH and SmaI, releasing a 40 bp fragment The mixture was phenol extracted, and products were ethanol of the multiple cloning site that contains the EcoRI site, and precipitated. Control RNase H experiments were performed using religating the remaining plasmid DNA. To synthesize an external 2.2 x 105 dpm of transcript synthesized using T7 RNA group III intron-specific riboprobe, pEZC 1041.1 was linearized polymerase from pEZC2000 linearized with BamHI. This at the EcoRI site within the 5' region of the external intron transcript is 1688 nt in length (41 nt vector, 31 nt exon 3, 21 (position 18648). This plasmid DNA also contains 10 nt of the nt 5' portion of external intron, 1503 nt internal intron, 81 nt 3'-exon as part of the cDNA primer. The plasmid cDNA inserts 3' portion of external intron, and 11 nt exon 4). , were completely sequenced on both strands by dideoxy DNA purification, and self-splicing of the 887 nt yeast mitochondrial sequencing using the Sequenase kit (U. S. Biochemical). intron aI5,y in 0.5 M (NH4)2SO4 from pJD20 was done as described (19). 6 mg aliquots of HMW ctRNA and purified aI5,y Genonic cloning and sequencing lariat were treated with HeLa S100 nuclear extract for 1 hour Euglena gracilis chloroplast DNA (ctDNA) was isolated as as described (20). Reactions were subjected to electrophoresis described (17). The cDNA primer at the psbC intron 4-exon 5 through 4% polyacrylamide gels containing 8 M urea in 44.5 boundary and the PCR primer in exon 4 described above were mM Tris-borate, 44.5 mM boric acid, 1 mM EDTA (0.5x used to amplify the corresponding region of the chloroplast TBE), electroblotted to a Genescreen membrane, and either genomic DNA. 200 ng of ctDNA in 2 ,u of ddH20 was boiled exposed to film (in vitro controls) or hybridized with a riboprobe for 10 minutes, placed on ice, and amplified by PCR. 430 ng specific for the internal ycfl3-containing group HI intron as of cDNA primer and 400 ng of PCR primer were included in described above (see Figure 2). the amplification. Prior to amplification, a 2 minute 'hot start' at 80°C was done in the absence of Taq polymerase. The Taq Primer extension cDNA sequence analysis polymerase was then added and amplification preceeded for 30 Purified deoxyoligonucleotide primers 5'-CACTAATTTAA- cycles. The DNA was denatured at 940C for 1 minute, annealed TTAAACTACTAAC-3 ', 5'-GGAACAAAATGAGCTACT- at 42°C for 1 minute, and extended at 72°C for 2 minutes. The TC-3', and 5'-GAAACTTGAAAAATTTCATAAAAAATC-3' PCR product was then digested with Xba I and BamHI, gel- complementary to the RNA-like strand in psbC at positions purified by GENECLEAN (BIO 101), and cloned into the Xba 20202 -20225 (intron 4), 20300-20319 (exon 5), and I and BamHI sites of pBluescript KS-. The plasmid DNA was 18726 - 18745 (ycfl3), respectively, were 5' end-labelled using designated pEZC2000. The internal group III intron-specific polynucleotide kinase (BRL). 32P-labelled primer (1 x 107 dpm) sequences corresponding to positions 19154- 19587 were cloned was dried with 12 ,ug of total ctRNA or HMW ctRNA, or 2 tg by gel-purifying the 437 bp HindIllEcoRI fragment from sRNA and resuspended in 12 ,a1 of 200 mM KCl; 10 mM pEZC2000 using GENECLEAN, and ligating the fragment into Tris-HCl, pH 8.3 at 42°C. Two 6 /tg aliquots ofEuglena HMW the HindIm and EcoRI sites of the pBluescript KS-. The ctRNA treated with HeLa S100 nuclear extract were also used resulting plasmid DNA was designated pEZC2000. 1. Identity as template. The hybridization and primer extension reactions of plasmid DNA inserts were confirmed by dideoxy DNA were done as described (5). The reactions contained 770 gM sequencing from the ends of the inserts, and by restriction dATP and dTTP, and 290 zM dCTP and dGTP, plus one of the endonuclease mapping. following: 500 ItM ddATP or ddTTP, or 167 ,tM ddCTP or ddGTP. The primer extension reactions were subjected to Northern blot analysis electrophoresis on 6% polyacrylamide gels containing 8 M urea 2 Ag each of total ctRNA, high molecular weight-enriched ctRNA in 0.5 x TBE. (HMW ctRNA), and the isopropanol-soluble (sRNA) fraction were subjected to electrophoresis through a 1.2% agarose gel containing 1 xMOPS and 2.2 M formaldehyde using the RESULTS optimized surface tension procedure (18). Fractionated cDNA cloning and sequencing were transferred to Genescreen (NEN, DuPont Co.) membranes The chloroplast DNA (ctDNA) of Euglena gracilis contains four and hybridized with riboprobes corresponding to intron-specific intron-encoded open reading frames (1, 3). One of these, plasmid inserts generated in vitro by T3 or T7 RNA polymerase designated ycfl3 with 458 codons, is located within the 1605 nt (see cloning above and Figure 2). The prehybridization, intron 4 of photosystem H gene psbC (Figure IA). This intron Nucleic Acids Research, 1994, Vol. 22, No. 6 1031

A psbC psbC r-. , ycf1- 13 psbC 1 ycfl3 /' - _ ----__ t 3' 5 psbC _3, PCR primer cDNA Drimers B- A - J partially spliced C -- _ *fully spliced B z ¢ <: C

a- - Partially Fully < E g Unspliced Spliced Spliced G AT C A G A T C 9.49-:9.49- 7.46-:~7.46- A~~~~ 4.40-' 2.37-'2.37- A 1666 1.35-1.35-3

.s...... T

A 0.24- 163 .4: pEZCl1041 pEZCl1091

;> by Northern Figure 2. Analysis of psbC twintron-containing transcripts Figure 1. The Euglena gracilis chloroplast ycfl3 is encoded in a group HI twintron hybridization. (Top) Structure of the psbC twintron and the location of the probes within the photosystem II psbC gene. (A) (Top panel) Genetic structure of the used in the northern hybridizations. Symbols for structures of the gene are as 1.6 kb psbC twintron and the experimental design for the analysis of the psbC indicated in Figure 1. 32P-radiolabelled riboprobes complementary to the exon group Ill twintron. The locations of the cDNA and PCR primers used for the and intron sequences indicated at the bottom (A, B, C) were prepared by T3 analysis of the psbC twintron are indicated by arrows. The filled boxes correspond and T7 transcription from containing the corresponding insert. (Bottom) to psbC and open boxes correspond to the external group III intron. The Northern analysis of psbC twintron-containing transcripts. Fractionated RNAs stippled box denotes the internal group III intron. The gray box depicts the open (see Materials and methods) were electrophoresed through a denaturing 1.2% reading frame ycfl3. The polarity of the gene is indicated. (Lower panel) Structure agarose gel, transferred to a nylon membrane and hybridized with the riboprobes of the psbC partially spliced cDNA, pEZC1041. The cDNA insert of pEZC1041 corresponding to the plasmid inserts A, B, and C. The sizes of BRL RNA markers corresponding to the partially spliced twintron is shown. The excision of the are indicated at the left. ycfl3-containing internal group Ill intron is indicated. (B) The ycfl3-containing internal group III intron is excised from the twintron of the psbC pre-mRNA. (Left panel) PCR amplification ofpsbC partially spliced pre-mRNAs and psbC genomic DNA. The sizes at the right are based on the characterized PCR products. the 102 nt intron, cDNA of fully spliced mRNA, containing (Middle, left panel) DNA sequence analysis of the unspliced, genomic psbC clone, portions of exons 4 and 5, was also cloned and sequenced pEZC2000. The arrow refers to the position of the external group IIH intron- internal group IIl intron 5' boundary. (Middle, right panel) DNA sequence analysis (pEZC1091, Figure iB). The exon 4-exon 5 splice boundaries of the partially spliced psbC cDNA clone, pEZC1041. The arrow refers to the are in agreement with an earlier report (14). As a control, a PCR site where the ycfl3-containing internal group III intron was excised. (Right panel) product representing unprocessed pre-mRNA was cloned DNA sequence analysis of the fully spliced psbC cDNA clone, pEZC 1091. The (pEZC2000), mapped by restriction endonuclease digestion (data arrow refers to the position of the excised group III twintron. not shown), and the boundaries of the insert were sequenced (Figure 1B). The 1503 nt internal intron is also interpreted as a group III follows the 5'-GU--AG-3' rule typical for nuclear pre-mRNA intron with an internal open reading frame of 1374 nt, the ycfl3 introns (21), and was reported to contain structural features gene, and 126 nt of non-coding sequence. Assuming ycfl3 has similar to group II introns (14). However, this hypothesis was 5' and/or 3' untranslated regions, the remaining non-coding made prior to the discovery of group III introns (22) and twintrons region, of length approximately 100 nt, is the size expected for (10). In our view, the 3' structure is reminiscent of the domain a group IH intron (1, 5, 6). The internal intron also has a potential VI-like region of group III introns (2, 6, 7). To test the hypothesis group IE-like secondary structure (see below). Group In introns that the psbC intron 4 might be a new type of twintron, the with internal ORFs, and ORFs within the internal intron of a polymerase chain reaction (PCR) was used with primers designed twintron have not previously been described. to amplify partially-spliced psbC pre-mRNAs. In control The identification of the partially spliced pre-mRNA is direct reactions, fully-spliced psbC mRNA, and genomic DNA evidence that the 1605 nt psbC intron is a twintron, formed from representative of unprocessed pre-mRNA were also amplified. the insertion of a group IH intron into a group HI intron. PCR products for partially-spliced and unprocessed pre-mRNAs Identification of the partially spliced pre-mRNA is also consistent are shown in Figure LA. with a sequential in vivo splicing pathway (2). Since the RNA The PCR product from partially spliced pre-mRNA was was amplified from the upstream exon, the isolated cDNA amplified, cloned (pEZC1041), and sequenced (Figure IB). The representing a partially spliced mRNA must be derived from cDNA insert of pEZC 1041 is 143 bp in length and corresponds splicing of a precursor mRNA, and not an excised twintron. No to a partially spliced RNA in which a 1503 nt internal intron has variation in splice sites was detected following splicing of either been excised, and a 102 nt group HI intron remains between psbC the internal or external group III introns (3/3 and 12/12 exons 4 and 5. The 102 nt group II intron results from ligation independent cDNAs, respectively). A consequence of the of the 5' (21 nt ) and 3' (81 nt ) portions ofpsbC intron 4 flanking sequential RNA processing mechanism would be the the 1503 nt internal intron. To confirm the splice boundaries of accumulation of excised internal and external introns, rather than 1032 Nucleic Acids Research, 1994, Vol. 22, No. 6

UU Uu A B C uu in vitro in vivo in vivo

uuUkt?u.u u. uu UU-A _ Z Z Z Z Z Z Z Z u u r cRF F [ Er r CC + 'C+ uus0UUuU ycdl3 (1377 nt) extemal intron Uuut:uuu.U~ (102 nt) LAA - A intenal inton U-AA (125 nt) U-A lariat -. - lariat Cx-v (887 nt) (1 503 nt) U.U-A U-A domain VI uU-Au UG-UU uuU U AUU AUA linear A6~~~~- linear -__ f1 503 nt) 6-U (887 ni) 5'-UU G =C UUU UJUA-U A-UUUA-3' A oanV _ U-i' c "~~-A uA UUUGU uG A-~ U A U-A G.c A:U A..A A U U-A C.- Figure 4. The internal ycfl3-containing group III intron accumulates as a linear exon G. U-A 1 exon RNA. A 1688 nt 5'-GGGUG GUGUGUUU UUA-:UAAGCUCAU-3' or broken lariat uniformly 32P-labelled transcript representing the unspliced psbC twintron synthesized in vitro from the linearized genomic plasmid DNA pEZC2000 (A) and Euglena in vivo HMW ctRNA (B) were hybridized with an oligonucleotide specific to ycfl3 and incubated with RNase Figure 3. Secondary structure model for the psbC twintron. The internal group H as indicated. A 32P-labelled purified lariat (and broken lariat) from the excised I1 intron is on the left and the external group Im intron is on the right. The proposed 887 nt yeast mitochondrial intron aISy was included as a control for a linear and twintron junction is within a stem-loop structure in the 5' portion of the external lariat RNA of known size. The in vitro RNase H controls and the purified aI5y intron, and is marked with an arrow. The position of the open reading frame lariat were direcdy subjected to autoradiography. The in vivo HMW ctRNA (ycf13) within the internal group mI intron is also within a 5' stem-loop structure. samples were hybridized with the ycf13-specific riboprobe (see Figure 2). (C) The asterisks denote the adenine residues that are the proposed nucleophiles during A long exposure of the in vivo northern blot shown in (B). lariat formation and intron excision. The vertical arrows mark the boundary of the psbC twintron and the 5' and 3' exons. The exon sequences are italicized. The secondary structure model was derived using the FOLD program (23). Domain VI and the 3' end of the introns were excluded from the analysis. total ctRNA HMW ctRNA sRNA in vitro RNA G A T C N G A T C N G A T C N G A T C N twintron-sized products. To test this prediction, excised psbC 5'intron V;9A1IR 4q, IA introns were characterized by northern RNA hybridization. Northern blot analysis I~A The Euglena chloroplast psbC gene is 1 1,403 bp in length, with 11 exons and 10 introns (1). Pre-mRNA transcripts of 2.1 and stop~ **t A 2.5 kb, as well as a 1.5 kb species interpreted to be the fully sp ice_ A spliced psbC mRNA were previously detected with a psbC intron 4/exon 5 specific probe (14). A DNA probe corresponding to ycfl3 hybridized to a transcript of 1.4 kb, interpreted to be the 1605 nt excised intron. In order to extend these results, and to identify the products of sequential splicing of the psbC twintron, total chloroplast RNA (ctRNA), and chloroplast RNA fractionated Figure 5. Mapping of the putative site of branch formation to the unpaired into an isopropanol-soluble fraction (sRNA), and a high molecular adenosine in domain VI of the internal group Ill intron. Total ctRNA, HMW ctRNA, sRNA, and RNA synthesized in vitro were used as templates for dideoxy weight fraction (HWM) were subjected to northern hybridization cDNA sequencing using AMV reverse transcriptase and a 32P end-labelled analysis. Riboprobes specific for the internal ycfl3-containing oligonucleotide primer specific for the external intron. The sequence group III intron (pEZC2000. 1; probe A, Figure 2), the partially complementary to the RNA is shown. The termination which maps to the nucleotide spliced cDNA (pEZC 1041; probe B, Figure 2), and the external prior to the branch site is indicated. The 5' region of the external intron is also indicated. The boxed C residue refers to the nucleotide at the position ofthe branch group III intron plus 10 nt from psbC exon 5 (PEZC 1041. 1, site in the unspliced sequence (see Figure 6). probe C, Figure 2) were used. The internal intron-specific probe hybridized predominantly to a 1.5 kb RNA (Figure 2A), the size expected for the excised internal ycfl3-containing group HI intron. This probe also kb RNA (Figure 2B) interpreted to be the fully spliced psbC hybridized to additional RNAs of 3.1, 2.7, and 2.2 kb, interpreted mRNA. This 1.4 kb RNA is distinct from the ycfl3-specific 1.5 to be intron-containing pre-mRNAs. The low molecular weight kb RNA. Intron-containing pre-mRNAs of 2.7 and 2.2 kb are RNA species appearing in the sRNA lane is interpreted as cross- also detected with the probe B. Pre-mRNAs of 5.8, 4.7, and 3.8 hybridization to tRNA. The riboprobe representing the partially kb are also evident upon longer exposure of this blot (data not spliced cDNA (probe B) hybridized with a strong signal to a 1.4 shown). The excised external group IIm intron was not detected Nucleic Acids Research, 1994, Vol. 22, No. 6 1033 in the three chloroplast RNA fractions with probe B. The specific oligonucleotide primer complementary to the RNA-like riboprobe corresponding to the external group III intron which strand within ycfl3 was hybridized to Euglena HMW ctRNA and lacks the 5' exon 4 sequences but contains 10 nt from the 3' exon treated with RNase H. A circular or lariat intron would be cleaved (probe C) hybridized with equal intensity to three RNA species via RNase H digestion of the primer/intron hybrid resulting in (Figure 2C). The 1.4 kb RNA is distinct from the ycfl3-specific a linear molecule, whereas a linear species would be cleaved into RNA and is most likely the fully spliced psbC mRNA detected two intron fragments. The reaction products were then analyzed with probe B. The external group IH intron probe also hybridized by northern hybridization on acrylamide gels (Figure 4). Lariats to intron-containing pre-mRNAs of 2.6 and 2.1 kb. The excised migrate anomalously slowly compared to linear RNAs of equal external group III intron was not detected in the three chloroplast size upon gel electrophoresis through high concentrations of RNA fractions with probe C. acrylamide (28-30). A 1688 nt transcript of the psbC twintron (and portions of the surrounding exons, see Figure 1A) was Secondary structure model for the psbC twintron synthesized in vitro and used as a linear RNA control. Purified Secondary structure models for the internal and external group lariat and linear (broken lariat) RNAs from the excised 887 nt HI introns are shown in Figure 3. The most characteristic feature yeast mitochondrial group II intron aISy were also used as RNA of group III introns is the group II intron-like domain VI at the mobility strandards. The results of these experiments are shown 3 '-end, with a bulged A -7 or -8 nt from the 3'-splice boundary in Figure 4. that is the putative nucleophile for lariat formation. The insertion The ycfl3-specific probe (see Figure 2) hybridized to a site of the internal intron is in the 5' region of the external intron, transcript in the in vivo untreated and RNAse H control samples and may define another functional domain for group III introns. which migrates as a linear (or debranched) 1.5 kb molecule, A secondary structure model for both introns of the psbC twintron relative to the control samples of the 1688 nt psbC in vitro was determined using the RNA FOLD program (23). Domain transcript and the linear and lariat 887 nt excised intron aI5,y. VI and the 3' end of the introns were excluded from the analysis RNase H digestion of the in vitro psbC transcript in the presence (Figure 3). The internal group Ill intron is located within a of the ycfl3-specific oligo would result in two fragments of sizes predicted stem-loop structure in a U-rich bulge in the 5' region 988 and 677 nt. These products are apparent in Figure 4A in of the external group III intron. The structure is similar to that the (+) oligo and (+) RNase H lane. In the (+) oligo and (+) proposed for the external group Ill introns of the rps3 and the RNase H in vivo samples, the ycfl3-specific probe hybridizes rp116 twintrons (2, 3). A stem-loop structure in the 5' region to an RNase H cleavage product of 0.9 kb. The predicted size of the intron is also predicted for the internal intron of the psbC is 895 nt. This product is the 5' fragment of the internal intron, twintron. The 1374 nt ORF is internal to this structure. An as expected for a linear internal intron substrate. The explanation for the ability of the unusually long 1503 nt internal ycfl3-specific probe does not contain sequences complementary intron to splice normally is apparent from the secondary structure to the 3' fragment of the internal intron. model. The proposed intron core structure and helical domains A high molecular weight RNA that is slightly larger than the might not be disrupted by the distal location of ycfl3. aI5,y lariat is also detected with the ycfl3-specific probe upon The 5' and 3' untranslated regions of ycfl3 have not been a longer exposure (Figure 4C). This RNA is a lariat or circular defined. Although only the coding sequences of ycfl3 were form of the internal yvcl3-containing group III intron. Digesting excluded from the computer analysis, the untranslated regions ctRNA in the presence of the ycfl3-specific oligonucleotide with presumably would not contribute to the intron structure. Intron- RNase H results in linearization of this species. From these results encoded ORFs of yeast group HA introns are located in domain we conclude that the internal ycfl3-containing group III intron IV, which is not essential for intron excision in vitro (19, 24). accumulates predominately as a linear molecule in steady-state Similarly, ycfl3 must be positioned in a nonessential region of ctRNA. However, a lariat or circular RNA, potentially the the internal group III intron. Both of the introns appear to lack precursor of the linear species, is also present. exon-binding sites. The predicted folding of the internal intron of the psbC twintron also has a short helical domain between Primer extension RNA sequencing of the psbC twintron the 5' stem-loop and domain VI-like structures. This domain is Group II introns are excised as lariats, which form when the 2' absent from the external intron. hydroxyl of the unpaired adenosine within the domain VI helix is linked to the 5' nucleotide of the intron via a 2'-5' RNase H digestion of the psbC internal group m intron phosphodiester bond (28, 30). Group III introns are abbreviated Many intron-encoded open reading frames known versions of group II introns that contain a domain VI-like structure as maturases that function in splicing of their host intron. Some at the 3' end of the intron (2, 6, 7). The branch point for group intron-encoded maturases are expressed as fusion proteins with III lariat formation has not been defined. To map the branch point, the upstream exon, others are freestanding and are presumably direct primer extension RNA sequencing was performed using translated independently of the upstream exon (for reviews, see a synthetic primer complementary to the RNA-like strand of the 25-27). The group III twintron-encoded ycfl3 is a freestanding external intron. Total ctRNA, HMW, and the sRNA fractions ORF, and the excised intron containing the ORF accumlates to were used as templates. The results are shown in Figure 5. high levels (Figure 2). Group III introns of the rpll6, rps3, and A ladder corresponding to a cDNA sequence of the unspliced rpsl8 twintrons accumulate as lariat or circular RNAs (6, 11, twintron is obtained when total and HMW ctRNAs are used as D.W.C. and R.B.H. unpublished). The internal ycfl3-containing template. A prominent termination in all lanes is also seen with group Ill intron of the psbC twintron might also accumulate as these samples. However, this termination site is not present when a branched RNA or alternatively, as a linear, debranched the sRNA fraction or an in vitro RNA control is used as template molecule from which the ORF may be translated. (Figure 5). The termination site maps to the the nucleotide prior To test whether the ycfl3-containing internal group III intron to the unpaired adenosine in domain VI of the internal group III accumulates as a linear or lariat molecule, an internal intron- intron, which is the predicted position for branch formation if 1034 Nucleic Acids Research, 1994, Vol. 22, No. 6

5' psbC ycfl3 psbC 3 _- 1% A untreated mock HeLL1 G A T C N G AT C N GA TC N / tiL\7 .....k Splicing of _s* ~~~~~~~~~~~~~.M Internai Intron UAG twF r - OOs~~~~~~~~~( stopo a J/Lmr ''_

splice - Splicing of Debranching External v Int-or. FEcisd nron

AUG UAG

B osbC nmRNA yctl3 mRNA

Figure 7. A model for the RNA processing pathway ofthepsbCtwintron. Splicing ofpsbC exons 4 and 5 occurs by the sequential splicing of internal and external group III introns. The internal intron is excised as a lariat RNA, but accumulates in vivo as a linear species. sto p 4-0

splice-* debranching activity than that of the internal group III intron. The difference in susceptibility of the HeLa debranching activity between the internal and external introns may be due to the size and sequence context of the lariat RNAs. To test whether processing other than debranching modifies ycfl3 mRNA, the 5' end of the mRNA was Figure 6. The internal (A) and external (B) group Im introns of the psbC twintron ycfl3 analyzed by primer extension are excised via lariat intermediates. Untreated, mock-treated, and HMW ctRNA RNA sequencing. The 5' end is the same as the 5'-end of the treated with a HeLa nuclear S100 fraction were subjected to primer extension internal group III intron (data not shown). From these results RNA sequencing as described in Figure 5. The arrows refer to the position of we conclude that group HI introns are excised via a lariat the external group Im intron-internal group III intron, and exon 4-external group intermediate, and that group HI intron domain VI is both the III intron boundaries. The branch sites hydrolyzed by the HeLa debranching activity are indicated by an arrow at the right. structural and functional equivalent to group II domain VI. DISCUSSION it functions like domain VI of group II introns. The template for this cDNA is most likely the lariat intermediate for the internal The excision of the 1605 nt intron 4 within the Euglena gracilis intron. The 2'-5' phosphodiester bond between the unpaired psbC gene was characterized by cDNA cloning and sequencing, adenosine within domain VI and the 5' splice site prevents primer northern hybridization, primer extension RNA sequencing, and extension by reverse transcriptase. debranching experiments. This intron is composed of an internal In order to confirm that the termination is due to branch 1503 nt group IH intron that contains an ORF of 458 codons formation, HMW ctRNA was treated with the debranching (ycfl3), and an external group HI intron of 102 nt. Group HI activity of a HeLa nuclear SO00 fraction (20), followed by primer twintron excision was found to proceed by a two-step, sequential extension analysis. As shown in Figure 6A, the termination site in vivo splicing pathway. A model for the RNA processing is susceptible to 2'-5' debranching activity. However, the 2'-5' pathway is shown in Figure 7. Excision of both the internal and phosphodiester bond is not completely hydrolyzed, persisting external group IH introns occurs through a lariat intermediate. even after a second HeLa S100 treatment (data not shown). This However, the ycfl3-containing internal group III intron result is probably due to partial inhibition of the enzyme by the accumulates as a linear molecule. There is probably a precursor- RNA sample. Indeed, debranching of the yeast group II intron product relationship between the excised internal intron-lariat and aI57 as a positive control in the presence of Euglena HMW the ycfl3 mRNA, proceeding via some uncharacterized ctRNA is also partially inhibited (data not shown). The 2'A-5'U debranching activity. Mohr et al (1993) have recently reported phosphodiester bond is not as susceptible to cleavage by the HeLa that domains of ycfl3 aresimilar to the reverse transcriptase debranching activity as the usual 2'A-5'G typically found in group domains of group II intron ORFs, and a domain 'X' that may II and nuclear pre-mRNA introns (31). The excision of the have an essential role in splicing. Thus, the identification of group external group III intron was also examined by primer extension Im intron-encoded ycfl3 ofthepsbC twintron extends the presence RNA sequencing using a primer complementary to the RNA- of maturase-like open reading frames to all classes of like strand of exon 5 (Figure 6B). One termination site is detected, introns. which is absent following treatment with HeLa S100. This site maps to the expected position for lariat formation within domain Group III intron structure VI during excision of the external group III intron. Interestingly, The size range of a group III intron is 91-119 nt (5, 6). Since this 2'A-5'G bond is hydrolyzed more efficiently by the no group Ill introns outside of this size range were previously Nucleic Acids Research, 1994, Vol. 22, No. 6 1035 known, the possiblility existed that splicing was size-dependent. vitro group II intron excision, however, this interaction is very However, the 1503 nt internal group III intron of the psbC inefficient (34). In addition, a deletion analysis of substructures twintron is more than an order of magnitude larger than all other of the yeast group II intron aI5'y has demonstrated that domain group Ill introns. Thus length alone is not the determinant of V binds to domain I in trans and together with the 5' exon splicing. The most critical feature of group Im introns may be constitute the catalytic core of this group II intron (35). Domain the distance between the splice sites and the nucleophile in the V may interact with the 5' domain ID-like element of group III core RNA structure. A similar relationship between size and core introns thereby activating the first reaction at the 5' splice site. structure has been proposed for the two classes of yeast nuclear The nucleophile within domain VI may arrive at the active site pre-mRNA introns. Most yeast nuclear pre-mRNA introns are due to the relatively reduced size of the group HI intron core between 250-1000 nt, whereas others are 100 nt or less. structure. Nuclear pre-mRNA splicing requires trans-acting Intramolecular base-pairing between near the 5' splice snRNPs which assemble on the intron to form the site and the branch site within the larger introns may result in (for reviews, see 36, 37). Important regions of some of the structures comparable to the smaller introns (32). These snRNAs of these ribonucleoproteins appear to be structural and/or interactions were shown to positively affect both splicing functional cognates to the cis-encoded domains of group II introns efficiency and splice site selection (33). (38-41). Nuclear and group 11 introns may be directly related Group III introns also possess internal secondary structure descendants of group II introns or may represent parallel evolution between the 5' splice site and the branch site. The 5' regions from group II introns (3). Other similarities between group III of the internal intron of the rpll6 twintron and the external intron and nuclear introns include conserved 5' boundary sequences, of the rps3 twintron are almost identical to domain ID3 of group lariat formation, lack of internal structure, and ability to use II introns (2, 3). The terminal loop of the 5'-domain of some alternate splice boundaries. group m introns may interact with the upstream exon in a tertiary interaction similar to the EBS1-IBSI of group II introns (4). Other Does ycfl3 encode a maturase? group III introns have potential stem-loops in the 5' region of Several group I and group II introns contain open reading frames the intron that lack EBS-like sequences. These 5 '-domains may that encode maturases (25-27,42). These maturases are required have evolved from domain ID and now function in a core for efficient splicing in vivo and are thought to bind and stabilize structure required for splicing without a tertiary interaction with catalytically active RNA structures. The fungal mitochondrial the exon. Several intron insertion sites among group HI and group II intron-encoded maturases are highly conserved and complex twintrons are within these predicted 5 '-helical domains contain domains with a high degree of similarity to reverse (3). The internal intron of the psbC twintron also disrupts a transcriptases (42-45). Reverse transcriptase activity associated potential stem-loop structure of the external intron (Figure 3). with several of the yeast group II intron-encoded maturases that The internal introns may have been maintained because the may be involved in intron mobility has also been demonstrated structures they interrupt are essential for splicing of the external (46). Many of the ORFs contain a conserved domain X and a intron (3). Zn2+-finger-like region (42). The putative group Il-intron encoded maturases within the plastid trnK genes from tobacco, Group HI introns excise as lariats pea, mustard, and rice contain only remnants of the blocks of Based on secondary structure analysis, group III introns have amino acids that are diagnostic for reverse transcriptase-like been predicted to contain domain VI at their 3' ends. Six of ten sequences. These maturases lack the Zn2+-finger-like region, twintrons involving external group HI introns contain insertion yet retain domain X. Consequently, domain X may be essential sites within this domain, consistent with a functional role for this for the maturase activity of intron-encoded open reading domain (3). Several group Im introns have been shown to frames (42). accumulate as lariat or circular RNAs (6, 11, D.W.C. and Group III intron excision is also likely to require proteins in R.B.H. unpublished). Termination sites in the primer extension vivo. One potential group III maturase is ycfl3, which is encoded RNA sequencing of splicing intermediates of the psbC group HI within the internal group II intron of the psbC twintron. twintron map to the unpaired adenosine in the domain VI-like Interestingly, the plastid DNA of the colorless, heterotrophic structure of the internal and external group HI introns. euglenoid Astasia longa contains an open reading frame (orf456) Termination is sensitive to the debranching activity of a HeLa that has 55.5% identity and 84.8% overall similarity to the nuclear extract that hydrolyzes 2'-5' phosphodiester bonds (Figure Euglena gracilis ycfl3 (8). Both of these ORFs encode domains 6). Therefore, group III introns are excised as lariats in which with homology to the reverse transcriptase domain and domain the unpaired adenosine in domain VI is covalently attached to X of other intron-encoded ORFs (42). However, the Astasia the 5' nucleotide of the intron in a 2'-5' phosphodiester bond. orf456 gene is not intron-encoded. The plastid DNA of Astasia Furthermore, domain VI of group III introns is both structurally is approximately 73 kb in size, and lacks most or all and functionally equivalent to domain VI of group II introns. An photosynthetic-related genes other than rbcL. The Astasia orf456 evolutionary and mechanistic relationship between groupII and gene is on the same strand and in relatively the same location groupIII intron excision seems apparent. of the genome as the Euglena ycfl3 gene, given deletions of the The evolution of the groupII and group HI splicing systems psbA and psbC genes. Apparently, the Astasia plastid DNA has in euglenoid may parallel the evolution of nuclear pre- undergone deletions of dispensable genes, particularly those mRNA splicing (3). Groupm introns represent abbreviated group involved in the photosynthetic light reactions. Other highly II introns. Part or all of the groupII intron catalytic core may conserved genes have been maintained, and therefore may be be reconstituted for group HI intron excision (3, 6). Many of essential for plastid functions. The ycfl3 gene is also present the cis-elements required for groupII intron excision, such as within the psbC intron in four other species of Euglena (D.W.C. domainV, may be supplied as trans-acting RNAs forgroupTmT etal., unpublished). The plastid genes ofAstasia longa and other introns. Domain V interacts with domain VI in trans during in euglenoid species also contain groupIm introns (8, 9, D.W.C. 1036 Nucleic Acids Research, 1994, Vol. 22, No. 6

et al. unpublished). ycfl3, which is group HI twintron-encoded 30. van der Veen,R., Arnberg,A., Van der Horst,G., Bonen,L., Tabak,H. & in Euglena, may be required for group I intron excision and/or Grivell,L. (1986) , 44, 225-234. 31. Peebles,C.L., Belcher,S.M., Zhang,M., Dietrich,R.C. and Perlman,P.S. mobility in Euglena and Astasia. A similar situation is found (1993) J. Biol. Chem., 268, 11929-11938. within the plastid genome of the nonphotosynthetic parasitic 32. Parker,R. and Patterson,B. (1987) of RNA: New Epifagus virginiana. The matK ORF is closely related to the Perspectives., Academic Press, New York, pp. 133-149. maturases encoded within the group II intron of plant trnK genes, 33. Goguel,V. and Rosbash,M. (1993) Cell, 72, 893-901. but is not intron-encoded (47). Presumably, matK has been 34. Dib-Haji,S.D., Boulanger,S.C., Hebbar,S.K., Peebles,C.L., Franzen,J.S. and Perlman,P.S. (1993) Nucleic Acids Res., 21, 1797- 1804. retained since it may be required for the excision of other group 35. Koch,J.L., Boulanger,S.C., Dib-Haji,S.D., Hebbar,S.K. and Perlman,P.S. H introns (47). We are currently testing whether ycfl3 is (1992) Mol. Cell. Biol. 12, 1950-1958. necessary for group HI intron excision and/or mobility. 36. Guthrie,C. (1991) Science, 253, 157-163. 37. Steitz,J.A., Black,D.A., Gerke,V., Parker,K.A., Kramer,A., Frendewey,D. and Keller,W. (1988) In Birnsteil,M.L. (ed.), Stn*cure and Function ofMajor ACKNOWLEDGEMENTS and Minor Small Nuclear Ribonucleoprotein Particles., Springer-Verlag, Berlin, pp. 115-154. We wish to thank Jenny Stevenson and Mike Thompson for 38. Jacquier,A. (1990) Trends Biochem. Sci., 15, 351-354. critical reading of the manuscript. We also wish to thank Dr. 39. Madhani,H.D. and Guthrie,C. (1992) Cell, 71, 803-817. Philip Perlman for the plasmid DNA pJD20 of the yeast 40. Newman,A.J. and Norman,C. (1992) Cell, 68, 743-754. 41. Steitz,J.A. (1992) Science, 257, 888-889. mitochondrial intron aI5'y, Drs. Steven Mayer and Bemardo 42. Mohr,G., Perlman,P.S. and Lambowitz,A.M. (1993) Nucleic Acids Res., Nadal-Ginard for the HeLa S100 nuclear extract, and Giordano 21, 4991- 4997. Caponigro and Denise Muhlrad for the RNase H protocol. This 43. Michel,F. and Lang,B.F. (1985) Nature, 316, 641-643. work was supported by the US National Institutes of Health. 44. Xiong,Y. and Eickbush,T.H. (1988) Mol. Biol. Evol., 5, 675-690. 45. Xiong,Y. and Eickbush,T.H. (1990) EMBO J., 9, 3353-3362. 46. Kennell,J.C., Moran,J.V., Perlman,P.S., Butow,R.A. and Lambowitz,A.M. REFERENCES (1993) Cell, 73, 133-146. 47. Wolfe,K.H., Morden,C.W. and Palmer,J.D. (1992) Proc. Natl. Acad. Sci. 1. Hallick,R.B., Hong,L., Drager,R.G., Favreau,M.R., Monfort,A., Orsat,B., USA, 89, 10648-10652. Spielmann,A. and Stutz,E. (1993) Nucleic Acids Res., 21, 3537-3544. 2. Copertino,D.W., Christopher,D.A. and Hallick,R.B. (1991) Nucleic Acids Res., 19, 6491-6497. 3. Copertino,D.W. and Hallick,R.B. (1993) Trends Biochem. Sci., 18, 467-471. 4. Michel,F., Umesono,K. and Ozeki,H. (1989) Gene, 82, 5-30. 5. Christopher,D.A. and Hallick,R.B. (1989) Nucleic Acids Res., 17, 7591 -7608. 6. Copertino,D.W., Shigeoka,S. and Hallick,R.B. (1992) EMBO J., 11, 5041-5050. 7. Drager,R.G. and Hallick,R.B. (1993b) Curr. Genet., 23, 271-280. 8. Siemeister,G., Buchholz,C. and Hachtel,W. (1990) Moi. Gen. Genet., 220, 425-432. 9. Hachtel,W. (unpublished). X16004. 10. Copertino,D.W. and Hallick,R.B. (1991) EMBO J., 10, 433-442. 11. Drager,R.G. and Hallick,R.B. (1993) NucleicAcids Res., 21, 2389-2394. 12. Alt,J., Westhoff,P. and Hermann,R.G. (1984) Curr. Genet., 8, 597-606. 13. Holschuh,K., Bottomley,W. and Whitfeld,P.R. (1984) Nucleic Acids Res., 12, 8819- 8834. 14. Montandon,P.E., Vasserot,A. and Stutz,E. (1986) Curr. Genet., 11, 35-39. 15. Sambrook,J., Fritsch,E.F. and Maniatis,T. (1989) Molecular Cloning. A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY. 16. Holton,T.A. and Graham,M.W. (1991) Nucleic Acids Res., 19, 1156. 17. Hallick,R.B., Richards,O.C. and Gray,P.W. (1982) In Edelmann,M., Hallick,R.B., and Chua,N.-H. (eds), Methods in Chloroplast Molecular Biology, Elsevier Biomedical Press, New York, pp. 281-294. 18. Rosen,K.M., Lamperti,E.D. and Villa-Komaroff,L. (1990) BioTechniques, 8, 398-403. 19. Jarrell,K.A., Dietrich,R.C. and Perlman,P.S. (1988) Mo. Cell. Biol., 8, 2361 -2366. 20. Ruskin,B. and Green,M.R. (1985) Science, 229, 135-140. 21. Sharp,P.A. (1981) Cell, 23, 643-646. 22. Montandon,P.E., Knuchel,A.C. and Stutz,E. (1987) NucleicAcids Res., 15, 7809-7822. 23. Zuker,M. and Stiegler,P. (1981) Nucleic Acids Res., 9, 133-148. 24. Hebbar,S.K., Belcher,S.M. and Perlman,P.S. (1992) Nucleic Acids Res., 20, 1747- 1754. 25. Lambowitz,A.M. and Belfort,M. (1993) Annu. Rev. Biochem., 62, 587-622. 26. Lambowitz,A.M. and Perlman,P.S. (1990) Trends Biochem. Sci., 15, 440-444. 27. Perlman,P.S. and Butow,R.A. (1989) Science, 246, 1106-1109. 28. Peebles,C., Perlman,P., Mecklenburg,K., Petrillo,M., Tabor,J., Jarrell,K. and Cheng,H.-L. (1986) Cell, 44, 213-223. 29. Ruskin,B., Krainer,A.R., Maniatis,T. and Green,M.R. (1984) Cell, 38, 317-331.