Study of group II introns in Euglena genomes: Structure, processing and evolution

Item Type text; Dissertation-Reproduction (electronic)

Authors Zhang, Liqun, 1969-

Publisher The University of Arizona.

Rights Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.

Link to Item http://hdl.handle.net/10150/282618

STUDY OF GROUP II INTRONS IN EUGLENA CHLOROPLAST GENOMES

- STRUCTURE, PROCESSING AND EVOLUTION

by

Liqun Zhang

A Dissertation Submitted to the Faculty of the

DEPARTMENT OF BIOCHEMISTRY

In Partial Fulfillment of the Requirements

For the Degree of

DOCTOR OF PHILOSOPHY

In the Graduate College

THE UNIVERSITY OF ARIZONA

1998

UMI Number: 9829355

UMI Microform 9829355 Copyright 1998, by UMI Company. All rights reserved.

This microform edition is protected against unauthorized copying under Title 17, United States Code.

UMI 300 North Zeeb Road Ann Arbor, MI 48103

THE UNIVERSITY OF ARIZONA GRADUATE COLLEGE

As members of the Final Examination Committee, we certify that we have

read the dissertation prepared by LIQUN ZHANG

entitled Study of Group II Introns in Euglena Chloroplast Genomes: Structure, Processing and Evolution

and recommend that it be accepted as fulfilling the dissertation

requirement for the Degree of Doctor of Philosophy

Date

Date

Date

Final approval and acceptance of this dissertation is contingent upon the candidate's submission of the final copy of the dissertation to the Graduate College.

I hereby certify that I have read this dissertation prepared under my direction and recommend that it be accepted as fulfilling the dissertation requirement.

Dissertation Director Date

STATEMENT BY AUTHOR

This dissertation has been submitted in partial fulfillment of requirements for an advanced degree at The University of Arizona and is deposited in The University Library to be made available to borrowers under rules of the Library.

Brief quotations from this dissertation are allowable without special permission, provided that accurate acknowledgment of source is made. Requests for permission for extended quotation from or reproduction of this manuscript in whole or in part may be granted by the head of the major department or the Dean of the Graduate College when in his or her judgment the proposed use of the material is in the interests of scholarship. In all other instances, however, permission must be obtained from the author.

SIGNED:

ACKNOWLEDGMENTS

I would like to thank my advisor, Richard Hallick, for providing me the opportunity and the rigorous training to become a scientist. His strong encouragement and wise guidance have helped me tremendously during my research and study. I would also like to thank my committee members, Hans Bohnert, Don Bourque, Mark Dodson, William Montfort and Roy Parker, for their support and advice, which helped me through so many research low tides. Thanks to my past and present lab members Donald Copertino, Robert Drager, Jennifer Stevenson, Angie Boyer, Cathy Hibbert, Ling Hong, Kristin Jenkins, Mike Thompson, Mitch Favreau, Jane Huff and Natalie Doetsch for their assistance and companionship. I would especially like to thank Kristin and Natalie for their great effort in helping me improve my technical writing skills. Many friends outside my lab have also given me strong support and advice. I would like to thank Wei Chen, Hua Sue, Li Zhang, Chunghong Mao, Nicoleta Constantin, David Whitacre and David Ascue. Thanks for helping me solve so many technical problems, providing me with valuable information, editing my writing and, simply, for being great friends. They have made my six years of graduate school much more enjoyable. I would like to thank my family: my father, Dongpei Chen, my mother, Shouyu Zhang, and my lovely sister, Cindy Yao Chen, for being a strong support throughout my academic years, in China and in the States. Thanks for keeping home the warmest place on earth, thanks for telling me thousands of times that I can do this, and thanks for inspiring me with your own achievements and persistent spirit. Especially, I would like to thank my dearly beloved husband, Charles Xiaobing Fang: thanks for being there for me all the time, thanks for hearing those endless complaints and whining, and for returning them with encouragement and smart ideas. Without him, I would never have pulled it off.
DEDICATION

I would like to dedicate this thesis to my grandfather, Shouyi Zhang, the person I respect most in the world, who has taught me, with his own words and deeds, how to be strong, confident, always decent and never, ever, give up.

TABLE OF CONTENTS

LIST OF FIGURES

LIST OF TABLES

ABSTRACT

CHAPTER 1

INTRODUCTION

I. The RNA world

Ribozymes

Retroelements

Introns

(1) Intron types, structures and splicing mechanisms

(2) Trans protein factors for intron splicing

II. Group II introns

The splicing process: the five steps

Structural elements and functions

(1) Major ribozyme motifs

(2) Other tertiary interaction elements

(3) Internally encoded proteins

Hydrolysis and trans-splicing of group II introns

Evolutionary relationship of group II introns and the nuclear spliceosome

III. Genus Euglena

The classification and taxonomy of Euglena

Phylogeny of Euglena

IV. Euglena gracilis chloroplast genome and the intron content

V. Thesis objectives

CHAPTER 2

THE EUGLENA GRACILIS INTRON-ENCODED MAT 2 LOCUS IS INTERRUPTED BY THREE ADDITIONAL GROUP II INTRONS

Introduction

Results

Identification of partially spliced psbC intron 2 pre-mRNAs by Reverse Transcriptase-Polymerase Chain Reaction (RT-PCR) analysis

Three group II introns are located in domain IV of psbC intron 2

Characterization of internal 5' and 3' splice sites within psbC intron 2

Intron secondary structure models

psbC intron 2 splicing pathway is partially ordered

Internal introns of psbC interrupt an open reading frame encoding a putative maturase

Discussion

A new category of twintron

A partially ordered psbC intron 2 splicing pathway

mat 2 may be required for splicing

Possible functions of mat 2

Evolution of mat 2

Materials and methods

RNA isolation

cDNA synthesis

PCR analysis, cloning and sequencing

Computer analysis of orf 758

CHAPTER 3

PSBC INTRON 2 AND MAT 2 ARE DEEPLY ROOTED IN GENUS EUGLENA, WITH VARIOUS INTERNAL INTRON CONTENT

Introduction

Results

PCR amplification of ten Euglena species

Determination of processing sites of psbC intron 2 homologues in E. viridis, E. granulata, E. anabaena, and E. spirogyra

Complete sequencing and identification of the external intron and mat 2 gene of the psbC intron 2 homologues of E. viridis, E. granulata, E. anabaena, and E. spirogyra

RT-PCR analysis and identification of internal introns in mat 2 homologues of E. viridis, E. granulata, E. anabaena and E. spirogyra

Processing of the internal introns in E. granulata is partially ordered

Comparative analysis of mat 2 sequences, and identification of conserved domains

Discussion

psbC intron 2 and mat 2 are deeply rooted in Euglena species

High local conservation of mat 2 suggests a conserved function as an intron maturase

psbC intron 2 as the founder of Euglena group II introns

Internal introns and their evolutionary implications

Materials and methods

Euglena cultures

Nucleic acid isolation

PCR and RT-PCR primers

PCR amplification, cloning and sequencing

RT-PCR amplification, cloning and analysis

CHAPTER 4

COMPARATIVE ANALYSIS AND SECONDARY STRUCTURE MODEL STUDY OF PSBC INTRON 2 EXTERNAL AND INTERNAL INTRONS

Introduction

Results

Primary sequence comparison of psbC intron 2a

Comparative analysis of external introns 2a, establishing the secondary structure models

Analysis of domains I and II of psbC intron 2a

Sequence analysis and secondary structure model establishment of the internal introns, characterization of a mini group II intron and identification of homologous introns 2d

Discussion

The identification of psbC intron 2 as a group II intron has been confirmed

The Euglena group II intron "microcosm"

The internal introns seem to have a much higher mutation rate

CHAPTER 5

EXPRESSION OF PARTIAL MAT 2 GENE OF EUGLENA GRACILIS IN E. COLI AND YEAST

Introduction

Results

Cloning of mat 2 and rpoB fragments into expression vectors

Expression of mat 2 and rpoB gene fragments in E. coli

Expression of mat 2 C-terminal fragment in yeast

Discussion

Successful expression of Euglena chloroplast genes with the pET expression system

Possible reasons for the low expression rate of mat 2 fragments

The promising yeast system and future goals

Materials and methods

Vectors and cell strains

cDNA and PCR primers

cDNA synthesis, PCR analysis and cloning

Expression and protein isolation in E. coli

Protocol for expression and protein isolation in yeast

Gel electrophoresis

APPENDIX I

IDENTIFICATION OF A GROUP II TWINTRON IN THE ATPE GENE OF THE EUGLENA GRACILIS CHLOROPLAST GENOME

Introduction

Results

Identification of partially spliced atpE intron 1 pre-mRNA by RT-PCR

Characterization of the internal splicing sites and secondary structure analysis of the internal and external introns

Discussion

APPENDIX II

DETECTION OF GROUP II INTRONS IN EUGLENA MYXOCYLINDRACEA

Introduction

Results

Selection of the most conserved genes for amplification

No group II introns were identified in E. myxocylindracea

Discussion

CHAPTER 6

GENERAL DISCUSSION

The evolutionary relationships of the known intron types

Intron evolution in the Euglena chloroplast

Intron processing in the Euglena chloroplast

Future prospects

REFERENCES

LIST OF FIGURES

Figure 1-1 General ribozyme model

Figure 1-2 A possible phylogenetic tree of retroelements rooted with RNA-dependent RNA polymerases

Figure 1-3 Splicing mechanisms of the five types of pre-RNAs

Figure 1-4 Secondary structure model of a group I intron

Figure 1-5 Structure and consensus nucleotides of the ribozyme component of group II introns

Figure 1-6 Comparison of the catalytic core models of nuclear pre-mRNA introns (upper) and group II introns from Saccharomyces cerevisiae mitochondria

Figure 1-7 Typical structure of a Euglena

Figure 1-8 Euglenoid phylogeny

Figure 1-9 Circular map of the Euglena gracilis chloroplast genome

Figure 2-1 Schematic description of the overall structure of psbC intron 2

Figure 2-2. cDNA-PCR analysis of psbC pre-mRNAs

Figure 2-3. Characterization of three internal introns by cDNA cloning and sequencing

Figure 2-4. Secondary structure model for psbC intron 2 external intron 2a and internal introns 2b, 2c and 2d

Figure 2-5. Dominant processing pathway of the three internal introns

Figure 2-6. Predominant pathway of psbC intron 2 mRNA processing

Figure 2-7. Comparison of the reverse transcriptase domains V-VII from orf 758 with the chloroplast consensus and general consensus

Figure 3-1. PCR analysis of total nucleic acid of ten Euglena species

Figure 3-2 Processing sites of psbC intron 2 in E. anabaena, E. spirogyra, E. viridis, E. granulata and E. gracilis

Figure 3-3 RT-PCR analysis of psbC intron 2 in E. viridis, E. anabaena and E. granulata

Figure 3-4 Internal intron content of psbC intron 2 in E. gracilis, E. myxocylindracea, E. viridis, E. anabaena, E. granulata and E. spirogyra

Figure 3-5. Predominant pathway of the mat 2 internal introns in E. granulata

Figure 3-6. Sequence alignment of mat 2 X domains (A) and RT domains (B) in E. gracilis, E. viridis, E. anabaena, E. granulata and E. spirogyra

Figure 4-1 Nucleotide sequence alignment of psbC intron 2a homologues from E. gracilis, E. anabaena, E. viridis, E. granulata and E. spirogyra

Figure 4-2 Secondary structure models for psbC introns 2a with domains I and II abbreviated

Figure 4-3. Secondary sequence alignment of psbC intron 2a domain I stem and domain II

Figure 4-4 Sequence alignment of the N' fragment of mat 2 protein in E. gracilis, E. viridis and E. anabaena

Figure 4-5 Comparative analysis data of intron 2d in E. gracilis, E. viridis, and E. anabaena

Figure 4-6. Secondary structure models of introns 2f, 2g and the mini group II intron 2h in E. granulata and intron 2e in E. viridis

Figure 5-1 Maps of recombinant pET plasmids that contain mat 2 and rpoB fragments

Figure 5-2. Map of the recombinant pG-1 plasmid containing mat 2 joined to a His-tag

Figure 5-3 E. coli expression of the Euglena rpoB gene fragment in the pET15b expression vector and the thioredoxin protein in the pET32a vector

Figure 5-4 E. coli expression of β-galactosidase and the Euglena rpoB gene fragment in the pET15b expression vector, the transferrin and the trx-psbC fusion protein in the pET32a vector

Figure 5-5 His-tag column purification of E. coli expression of plasmid mat2C-pET32a

Figure 5-6 Affinity column purified rpoB, thioredoxin and truncated trx-mat2c protein fragment, trx-mat2cA3-1 protein samples

Figure 5-7. His-tag affinity nickel-column purification of the his-trx-mat 2C fusion protein expressed in yeast

Figure 5-8 Purification of E. coli expressed pET plasmid proteins with his-tag affinity column

Figure 5-9 Purification of yeast expressed protein with the his-tag affinity column

Figure A1-1. RT-PCR analysis of atpE pre-mRNA. A simplified diagram of the atpE intron 1 region is shown below the gel

Figure A1-2 Sequences of the cDNAs corresponding to partially and fully processed atpE intron 1

Figure A1-3 Secondary structure models for A) the external intron and B) the internal intron of the atpE twintron

Figure A2-1 Group II intron screening using PCR analysis of the E. myxocylindracea chloroplast genome

Figure A2-2 Comparison of the intron contents of eight genes in E. gracilis and in E. myxocylindracea

Figure 6-1 The phylogeny of organisms and the five major steps in intron evolution

LIST OF TABLES

Table 3-1 Sequences and locations of PCR and RT-PCR primers. The locations are given as EMBL coordinates

Table 3-2 Complete annotations of introns and protein coding regions in psbC intron 2 of E. gracilis, E. viridis, E. anabaena, E. granulata and E. spirogyra

Table 5-1 The sequences and locations of the oligos used for expression plasmid construction

Table A1-1 Summary of Euglena gracilis chloroplast twintrons. N/A = not available (adapted from Copertino, 1993)

Table A1-2 Sequence and location of primers used for cDNA and PCR analysis. The locations are EMBL coordinates of the Euglena gracilis chloroplast genome, accession number X70810

Table A2-1 Oligo primers used to amplify the E. myxocylindracea chloroplast genome

ABSTRACT

The chloroplast genome of Euglena gracilis contains at least 155 introns, accounting for 39.2% of the genome. Among them are 88 group II introns ranging in size from 277 to 671 nucleotides (nt). The processing of these introns and their origin and spread in the genus Euglena are the subject of this thesis. My working hypothesis is that the Euglena chloroplast genome evolved from an intronless ancestral genome through invasion by mobile genetic elements that relied in part on internally encoded enzyme activities for mobility. These internally encoded enzymes may also act as group II intron maturases.

My research target was the largest intron in the Euglena gracilis chloroplast genome, psbC intron 2 (4144 nt). I characterized psbC intron 2 as a cis-spliced group II intron encoding a 758-codon maturase-like protein (mat 2) that is interrupted by three additional group II introns. Within mat 2 I also identified a putative RNA-binding domain (X), related to fungal mitochondrial group II intron maturases, and a reverse transcriptase (RT) domain, which has been found in other group II intron maturases and has been shown to have intron-translocation activity (Lazowska, 1994, Moran, 1995).

To examine the distribution of mat 2, I characterized psbC intron 2 homologues from several species. I found that the mat 2 locus is well conserved in most of the Euglena species tested, indicating that mat 2 is deeply rooted in the genus and has an important function. Interestingly, the entire intron is absent in E. myxocylindracea. To investigate the effect of mat 2 loss, I analyzed the group II intron content of the E. myxocylindracea chloroplast genome. No group II introns have been identified in this genome. These data also provide evidence that mat 2 might be responsible for the acquisition and processing of most of the group II introns in the Euglena chloroplast.

The activity of the mat 2 protein has been difficult to examine because the protein is hard to obtain. Protocols developed to overexpress mat 2 in E. coli and yeast are reported in this thesis.

CHAPTER 1

INTRODUCTION

I. The RNA World

The well-accepted dogma of molecular biology is that, in all living organisms, genetic information flows along a "DNA-RNA-protein" pathway. DNA has many features that make it suitable for its role as the basic genetic molecule: it is stable, replicable and can form highly compact structures for efficient storage. However, DNA lacks other characteristics that the original genetic molecule would have required, which raises a "chicken or egg" question about DNA, RNA and protein. First of all, DNA stores the genetic message but cannot actively express or process it; enzymes and RNA molecules are needed for transcription, translation and even DNA replication. RNA molecules, on the other hand, are more self-reliant. It is well known that retroviruses, a primitive form of life, contain RNA as their genomic material, and the discovery of "catalytic RNA" in the early 1980s revealed that RNA can also act as an enzyme. It was then hypothesized that the world was initially RNA-based (the "RNA world") and that from it evolved an "RNP world" composed of RNA and protein molecules. With the subsequent rise of DNA, a better molecule for gene storage, the "RNA world" was replaced by the "DNA world".

In the contemporary DNA world, RNA molecules still play crucial roles in many biological processes. For example, many enzyme complexes contain RNA subunits; RNA primes DNA synthesis; RNA delivers and decodes genetic messages; and even the building blocks of DNA, the dNTPs, are synthesized from ribonucleotides, the building blocks of RNA. Learning more about the structure, function and evolution of RNA molecules should therefore help us understand the fundamentals of the biological world.

Ribozymes

Ribozymes are RNA molecules with enzymatic properties that catalyze specific RNA cleavage and splicing reactions. Ribozymes are the only RNA molecules found today that retain catalytic activity, and they can be either free RNA molecules or subunits of RNA-protein complexes. Ribozymes form compact and stable secondary and tertiary structures. As in protein enzymes, the three-dimensional conformation of a ribozyme positions its RNA substrate within a closed catalytic pocket and brings the reactive chemical groups into proximity. The reactive groups of ribozymes are mostly 3' or 2' hydroxyl groups, which cleave RNA substrates through transesterification reactions. These hydroxyl groups carry out nucleophilic attacks, possibly assisted by general acid-base catalysis, on the target phosphodiester bonds, passing through pentacoordinate (trigonal bipyramidal) transition states or intermediates at the phosphorus atoms. The specific environment of each catalytic pocket, for instance the arrangement of bound metal ions, then determines the stereochemical configuration around the phosphorus, resulting in cleavage and/or ligation of the RNA substrate. Figure 1-1 depicts this catalytic mechanism.

The first ribozyme was discovered in Tetrahymena, where an intervening sequence in a pre-rRNA molecule was found to catalyze its own splicing (Cech, 1981). A similar self-cleavage reaction occurs during the maturation of hepatitis delta virus genomic RNA (Wu, 1989). A more versatile ribozyme is RNase P, an RNA-protein complex involved in the maturation of tRNA (Guerrier-Takada, 1983). The RNA moiety of the complex contains the reaction center and can, by itself, catalyze the site-specific cleavage that produces the mature tRNA molecule.

Figure 1-1 General ribozyme model (adapted from Cech, 1990). General acid-base catalysis (B or HB) is hypothetical. The reaction proceeds with inversion of the stereochemical configuration around the phosphorus atom. The metal ion differs among ribozymes.

Other ribozymes are mostly pure RNA molecules with significant secondary and tertiary structures. The major ribozyme types are hammerheads (Haseloff, 1988), hairpins (Gibson, 1997) and self-splicing intervening sequences, i.e. introns (discussed later). In addition to naturally occurring ribozymes, custom-designed RNA molecules with ribozyme structures can exhibit the same activities, demonstrating that the enzymatic properties of ribozymes are encoded within the RNA primary sequence. Because ribozymes are highly specific and efficient, they are considered "irreplaceable" by protein enzymes (Cech, 1986), even though they have a much smaller diversity of catalytic groups and a much more limited substrate range than proteins.
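The statement that ribozyme properties are encoded in the primary sequence can be made concrete: the sequence alone determines which helices are possible at all. The following is a minimal sketch of the Nussinov base-pair maximization algorithm, a simple dynamic program swapped in here for illustration. It is not the comparative analysis used for the structure models in this thesis, and it ignores stacking energies, pseudoknots and tertiary interactions. The function name, the minimum hairpin loop of three nucleotides and the inclusion of G-U wobble pairs are assumptions.

```python
# Allowed base pairs: Watson-Crick pairs plus the G-U wobble pair (an assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> tuple[int, str]:
    """Maximize nested base pairs in an RNA sequence.

    Returns (number of pairs, dot-bracket string). min_loop is the assumed
    minimum number of unpaired nucleotides enclosed by a hairpin.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0, ""
    # best[i][j] holds the maximum number of pairs within seq[i..j] (inclusive).
    best = [[0] * n for _ in range(n)]

    def pair_score(i: int, j: int, k: int) -> int:
        """Pairs obtained in [i, j] if position i pairs with position k."""
        right = best[k + 1][j] if k < j else 0
        return best[i + 1][k - 1] + 1 + right

    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # option 1: leave i unpaired
            for k in range(i + min_loop + 1, j + 1):  # option 2: pair i with k
                if (seq[i], seq[k]) in PAIRS:
                    score = max(score, pair_score(i, j, k))
            best[i][j] = score

    # Trace back one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS and best[i][j] == pair_score(i, j, k):
                structure[i], structure[k] = "(", ")"
                stack.append((i + 1, k - 1))
                stack.append((k + 1, j))
                break
    return best[0][n - 1], "".join(structure)


print(nussinov("GGGAAAUCC"))  # (3, '(((...)))')
```

In the toy example the three 5' guanosines pair with the 3' C-C-U stretch (the last as a G-U wobble) around a three-nucleotide loop. Real structure models weigh such candidate helices against phylogenetic covariation, which this sketch does not attempt.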

Retroelements

Like ribozymes, retroelements are an ancient feature of the "RNA world" that persists today. Retroelements are DNA or RNA sequences that have been shown, or are speculated, to arise partially or entirely from reverse transcription, the synthesis of DNA from an RNA template. By generating stable DNA copies from mobile RNA molecules, retroelements are able to transpose or integrate into other genomes. Retroelements take various forms: viruses such as retroviruses, ribonucleoprotein enzymes such as telomerase (Nakamura, 1997), mobile DNA elements such as retrotransposons, and mobile RNA elements such as self-splicing introns. Retroelements are distinguished by the presence or absence of long terminal repeats (LTRs), the presence or absence of virulence, and their translocation or integration mechanisms (for review see Hull, 1989).
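As a minimal illustration of reverse transcription at the sequence level, the sketch below builds the first-strand cDNA as the reverse complement of an RNA template written 5' to 3', then a second strand that restores the template sequence in DNA form. The function names are hypothetical, and priming, RNase H activity and integration are deliberately ignored.

```python
# Template base (RNA) -> base incorporated into the new DNA strand.
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
# Watson-Crick complements within DNA.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def _check(seq: str, alphabet: dict[str, str]) -> str:
    """Upper-case a sequence and reject bases outside the expected alphabet."""
    seq = seq.upper()
    unexpected = set(seq) - set(alphabet)
    if unexpected:
        raise ValueError(f"unexpected bases: {sorted(unexpected)}")
    return seq


def reverse_transcribe(rna: str) -> str:
    """First-strand cDNA (5'->3') synthesized on an RNA template given 5'->3'."""
    rna = _check(rna, RNA_TO_CDNA)
    # The new strand is antiparallel, so the template is read from its 3' end.
    return "".join(RNA_TO_CDNA[base] for base in reversed(rna))


def second_strand(cdna: str) -> str:
    """Complementary DNA strand (5'->3'), completing a double-stranded copy."""
    cdna = _check(cdna, DNA_COMPLEMENT)
    return "".join(DNA_COMPLEMENT[base] for base in reversed(cdna))


template = "AUGGCU"
first = reverse_transcribe(template)
print(first, second_strand(first))  # AGCCAT ATGGCT
```

The second strand equals the original RNA with U replaced by T, which is the double-stranded DNA form that a retroelement can insert into a genome.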

Most retroelements contain a partial or complete coding region for reverse transcriptase (RT). Comparison of RT domain sequences from known retroelements has revealed their evolutionary relationships, summarized as a phylogenetic tree (Figure 1-2) (Nakamura, 1997). The common ancestor of all retroelements was most likely a retrotransposable element containing a gag gene and a pol gene (Xiong, 1990). The gag gene contains a Cys-His motif and is essential for sequestering the retroelement's RNA. The pol gene contains the reverse transcriptase domain and an integrase domain, which are responsible for new-strand synthesis and integration, respectively.
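Distance-based trees such as Figure 1-2 start from pairwise distances between aligned sequences. A minimal sketch, assuming the RT domains have already been aligned: the p-distance is the fraction of aligned, ungapped sites that differ, and the Poisson correction d = -ln(1 - p) is one common adjustment for multiple substitutions in protein sequences. The text does not state which distance measure Nakamura (1997) used; the function names and the short sequences below are made up for illustration.

```python
import math


def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Fraction of aligned, ungapped sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def poisson_distance(a: str, b: str) -> float:
    """Poisson-corrected amino-acid distance, d = -ln(1 - p)."""
    p = p_distance(a, b)
    if p >= 1.0:
        return math.inf  # saturated: no shared residues at comparable sites
    return -math.log(1.0 - p)


print(p_distance("MKVL", "MKIL"))   # 0.25
print(p_distance("MK-L", "MKIV"))   # gap column skipped: 1 difference in 3 sites
```

Gap columns are skipped pairwise here; other conventions (global gap removal, gap penalties) would give somewhat different matrices.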

[Figure 1-2 tree: branches labeled RNA-dependent RNA polymerases; telomerases (Sc-Est2p, Ea-p123, Sp-Trt1p, hTRT); mitochondrial plasmid RTs; group II introns; non-LTR retrotransposons; hepadnaviruses; LTR retrotransposons (Copia-Ty1 subgroup); LTR retrotransposons (Gypsy-Ty3 subgroup); caulimoviruses; retroviruses]

Figure 1-2 A possible phylogenetic tree of retroelements rooted with RNA-dependent RNA polymerases (adapted from Nakamura, 1997 and Xiong, 1990). The tree was constructed using the neighbor-joining method. The length of each box corresponds to the most divergent elements within that class.
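The caption notes that the tree was built by neighbor joining. The following is a minimal, unoptimized sketch of the standard Saitou-Nei neighbor-joining algorithm, assuming a symmetric distance matrix (for example, pairwise RT-domain distances) as input. It returns an unrooted tree in Newick format; rooting with an outgroup such as the RNA-dependent RNA polymerases is a separate step not shown. The function name is hypothetical, and the example matrix is the textbook five-taxon example, not data from this thesis.

```python
from itertools import combinations


def neighbor_joining(names: list[str], dist: list[list[float]]) -> str:
    """Unrooted neighbor-joining tree in Newick format.

    dist must be symmetric with zeros on the diagonal, indexed like names.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    labels = list(names)             # Newick fragment for each current cluster
    d = [list(row) for row in dist]  # working copy of the distance matrix
    while len(labels) > 3:
        n = len(labels)
        r = [sum(row) for row in d]  # net divergence of each cluster
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(combinations(range(n), 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from the new internal node u to clusters i and j.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({labels[i]}:{li:g},{labels[j]}:{lj:g})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from u to each remaining cluster k.
        du = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [du[x]] for x, a in enumerate(keep)]
        d.append(du + [0.0])
        labels = [labels[k] for k in keep] + [joined]
    # Resolve the last three clusters as a star around one internal node.
    a, b, c = labels
    la = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    lb = 0.5 * (d[0][1] + d[1][2] - d[0][2])
    lc = 0.5 * (d[0][2] + d[1][2] - d[0][1])
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"


names = ["a", "b", "c", "d", "e"]
D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
print(neighbor_joining(names, D))  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

Ties in Q are broken by taking the first pair in index order; published tree programs may break ties differently, which can change the printed topology when distances are not perfectly additive.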

Introns

Introns are also called intervening sequences. As the name suggests, introns interrupt the coding sequences of proteins or functional RNAs. After transcription, introns are excised from the RNA to produce the functional form of the RNA molecule or to allow correct translation of the protein.

Introns were discovered in the late 1970s (Chow, 1977, Klessig, 1977). To date, introns have been found in almost every major taxonomic group and are therefore considered a universal phenomenon. The origin and biological meaning of introns remain a hot topic in molecular biology. The predominant theories of intron origin are the "introns late" and "introns early" models. The "introns early" model argues that introns were present in ancient genes and were selectively lost from many genes and organisms (Darnell and Doolittle, 1986). The "introns late" model suggests that introns were mobile elements that inserted into genomes later in evolution (Cavalier-Smith, 1991). In recent years, the characterization of the transposing ability of many introns, the discovery of introns-within-introns (Copertino, 1991a), and phylogenetic studies of homologous introns have provided strong support for the "introns late" model. The role of introns in gene expression is also debated. The ability of introns to transpose suggested that introns integrated into genomes selfishly to express themselves (Temin, 1985). Alternatively, it was suggested that introns were mobile carriers that flanked a useful protein gene or protein domain and transported it to new locations ("exon shuffling", Gilbert, 1985). To date, the exon shuffling theory has been supported only by studies of animal evolution and fails to explain the occurrence of introns in lower eukaryotes and prokaryotes (Nilsen, 1994).

(1) Intron types, structures and splicing mechanisms

There are five main types of introns (Figure 1-3).
Group I introns were among the earliest identified introns. They are found in bacteria, bacteriophages, nuclear RNAs, and eukaryotic mitochondrial and chloroplast RNAs. Group I introns have been found in all types of RNA molecules but are most abundant in rRNAs. Group I introns can self-splice in vitro, indicating ribozyme activity. They autonomously fold into stable, compact secondary structures consisting of nine paired regions (P1-P9) and four conserved sequence elements, P, Q, R and S (Figure 1-4). A G-C base pair in P7 lies at the catalytic center. Group I introns are spliced through two transesterification reactions (Figure 1-3), initiated by the binding of an exogenous guanosine to the reaction center through base-triple formation with the G-C pair. The 3'-OH of the guanosine attacks the phosphate at the 5' splice site, cleaving it and joining the guanosine to the first intron nucleotide. The free 3'-OH of the 5' exon then attacks the phosphate at the 3' splice site, releasing the intron and ligating the two exons (for review see Cech, 1990). The site specificity and efficiency of the splicing reaction are ensured by the intricate tertiary conformation of the intron, in which adjoining helical segments stack coaxially into extended helices that form a cleft embedding the catalytic center and bring the 5' and 3' exons into close contact (Saldanha, 1993). Some group I introns contain internal open reading frames that might encode retroelement-like proteins contributing to group I intron mobility (Jacquier, 1985, Colleaux, 1986).

Group II introns, the second type, are also ribozymes: they undergo self-splicing in vitro, albeit at a low rate. Like group I introns, group II introns adopt well-established secondary structures. A typical group II intron consists of six helical domains radiating from a central core (Figure 1-5). Splicing of group II introns involves two transesterification reactions and the formation of a lariat intermediate (Figure 1-3); details of the mechanism will be discussed later in this chapter. Group II introns are found mostly in organelles, i.e. the mitochondria and chloroplasts of fungi, algae and land plants. Recently, group II introns have also been reported in cyanobacteria and purple bacteria (Ferat, 1993), E. coli (Knoop, 1994) and Lactococcus lactis (Shearman, 1996), indicating a wide distribution of group II introns. The largest group II intron population is found in the chloroplast genome of Euglena gracilis, a unicellular protist.

The Euglena gracilis chloroplast also contains another type of intron, called group III introns. They are usually very short (97-119 nt) and so far have been found only in the genus Euglena and related euglenoids. Group III introns also splice via lariat formation (Copertino, 1994). Phylogenetic analysis has identified a domain at the 3' end of group III introns that is similar to group II intron domain VI (Doetsch, 1998), and group II and group III introns also have similar 5' boundary consensus sequences.
Therefore group III introns are considered to be closely related to group II introns (Copertino, 1993).
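The group I and group II splicing pathways described above can be summarized as sequence bookkeeping. The sketch below tracks only which sequences are joined or released at each of the two transesterification steps, not the chemistry or structure. It assumes the exon-intron boundaries and, for group II, the position of the branch-point A (located in domain VI) are already known, for instance from a secondary structure model. The names PreRNA, splice_group_i and splice_group_ii, and the toy sequences, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreRNA:
    """A precursor RNA with one intron; the boundaries are assumed known."""
    exon1: str
    intron: str
    exon2: str


def splice_group_i(pre: PreRNA) -> tuple[str, str]:
    """Group I pathway. Returns (ligated exons, excised linear intron)."""
    # Step 1: the 3'-OH of an exogenous guanosine attacks the 5' splice site.
    # Exon 1 is released with a free 3'-OH; the G is joined to the intron's 5' end.
    released_exon1 = pre.exon1
    intermediate = "G" + pre.intron + pre.exon2
    # Step 2: the 3'-OH of exon 1 attacks the 3' splice site, ligating the exons
    # and releasing the linear intron, which still carries the added G.
    cut = 1 + len(pre.intron)
    return released_exon1 + intermediate[cut:], intermediate[:cut]


def splice_group_ii(pre: PreRNA, branch_index: int) -> tuple[str, dict[str, str]]:
    """Group II pathway. Returns (ligated exons, lariat as loop and tail)."""
    if pre.intron[branch_index] != "A":
        raise ValueError("the branch-point nucleotide must be an A")
    # Step 1: the 2'-OH of the branch-point A attacks the 5' splice site; the
    # first intron nucleotide becomes joined to that A by a 2'-5' bond.
    loop = pre.intron[: branch_index + 1]
    tail = pre.intron[branch_index + 1 :]
    # Step 2: the 3'-OH of exon 1 attacks the 3' splice site; the exons are
    # ligated and the intron is released as a lariat.
    return pre.exon1 + pre.exon2, {"loop": loop, "tail": tail}


print(splice_group_i(PreRNA("CUCUCU", "AAAUGCCG", "UAAG")))
# ('CUCUCUUAAG', 'GAAAUGCCG')
print(splice_group_ii(PreRNA("CCCC", "GUGCGCCUAACAU", "UUUU"), branch_index=9))
# ('CCCCUUUU', {'loop': 'GUGCGCCUAA', 'tail': 'CAU'})
```

Both pathways yield the same ligated exons; they differ only in the form of the released intron (linear with an extra 5' G, or a lariat), which is the distinction drawn in Figure 1-3.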

Another large group of introns, the nuclear pre-mRNA introns, is also considered to be related to group II introns. Nuclear introns are very diverse in size and do not have a well-established structure. In addition, they are not self-splicing. Instead, they rely for catalysis on the spliceosome, a large complex of small nuclear ribonucleoprotein particles (snRNPs), each containing a small nuclear RNA (snRNA) and associated proteins. The snRNAs recognize and bind to specific sequences in the intron and the flanking exons to form a compact catalytic complex that, as in group I and group II introns, brings the reaction sites together in the proper conformation. Five snRNAs are involved: U1, U2, U4, U5 and U6. Their interactions with nuclear introns are shown in simplified form in Figure 1-3 (for details see Nilsen, 1994b). Significantly, as illustrated in Figure 1-3, all nuclear introns have a branch-point "A" near the 3' end and splice via a lariat intermediate, just like group II and group III introns (Konarska, 1985). These similarities in splicing mechanism therefore indicate that group II, group III and nuclear introns are evolutionarily related.
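The sequence features just described can be turned into a crude candidate scan. Most nuclear pre-mRNA introns begin with GU and end with AG (the GU-AG rule, a well-established feature not discussed in the text above), and the branch-point A lies a short distance upstream of the 3' splice site. The sketch below is a minimal illustration only: the function name is hypothetical, and the default minimum intron length and the 18-40 nt branch window are illustrative assumptions rather than fitted values. Real splice-site recognition depends on much richer sequence context, so the hits are candidates, not introns.

```python
import re


def candidate_introns(seq: str, min_len: int = 60,
                      branch_window: tuple[int, int] = (18, 40)) -> list[tuple[int, int]]:
    """Return (start, end) spans (end exclusive) that begin with GU, end with AG
    and contain an A lying branch_window[0]..branch_window[1] nt upstream of end.

    min_len and branch_window are illustrative assumptions, not fitted values.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA or RNA input
    # Lookaheads find overlapping occurrences of the boundary dinucleotides.
    starts = [m.start() for m in re.finditer(r"(?=GU)", seq)]
    ends = [m.start() + 2 for m in re.finditer(r"(?=AG)", seq)]
    near, far = branch_window
    hits = []
    for s in starts:
        for e in ends:
            if e - s < min_len:
                continue
            # A branch A at index p must satisfy near <= e - p <= far,
            # excluding the 5' GU and the terminal AG themselves.
            lo = max(s + 2, e - far)
            hi = min(e - 2, e - near + 1)
            if lo < hi and "A" in seq[lo:hi]:
                hits.append((s, e))
    return hits


toy = "GU" + "C" * 10 + "A" + "C" * 5 + "AG"  # hypothetical 20-nt segment
print(candidate_introns(toy, min_len=10, branch_window=(5, 10)))  # [(0, 20)]
```

Scanning every GU/AG pair is quadratic in sequence length, which is acceptable for short test sequences but not for whole genes.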

The last type of intron is found in archaebacteria and in nuclear tRNA genes. Its splicing depends entirely on protein enzymes (Deutscher, 1984).

[Figure 1-3 panels: Group I (self-splicing); Group II/III (self-splicing); Nuclear mRNA (spliceosomal); Nuclear tRNA (enzymatic)]

Figure 1-3 Splicing mechanisms of the five major types of pre-RNAs (adapted from Cech, 1990). Wavy lines are introns. Smooth lines are exons. Open circles represent the ligated splice junction. Circles with letters u1-u6 represent the snRNPs. The spliceosomal interaction of the nuclear intron is simplified.

[Figure 1-4 diagram: group I intron secondary structure with paired regions P2, P3, P5, P7, P8 and P9, and the 5' exon binding site]

Figure 1-4 Secondary structure model of a group I intron (adapted from Cech, 1990) as determined by comparative analysis. P, Q, R and S are conserved sequence elements represented by their most common nucleotide sequences. The dashed line between P4 and P6 is added to make the diagram less crowded and does not indicate any omission of nucleotides. Filled arrows indicate the 5' and 3' splice sites. Open arrows indicate the insertion sites of extra stem-loops in some introns. The region in the square is the catalytic center, and a black dot in P7 indicates the location of the exogenous G. Some tertiary interactions are shown by solid lines.

(2) Trans protein factors involved in intron splicing

All intron processing requires protein mediators and regulators in vivo, even for group I and group II introns that can self-splice in vitro. Over 35 proteins involved in nuclear intron processing have been identified in yeast (for review see Woolford and Peebles, 1992). Protein mediators of nuclear introns are mostly proteins involved in spliceosome assembly. A very large protein, prp8, found in both yeast and mammalian cells, has been shown to be part of the U5 snRNP and binds to the U4-U5-U6 complex formed at the initiation of splicing (Whittaker, 1991). The prp24 gene product has been shown to stabilize U4-U6 base pairing before the first transesterification (Shannon, 1991). Some ATPases or ATP-dependent RNA helicases were also found to stimulate or facilitate snRNA association and dissociation. The prp28, prp2, prp16 and prp22 gene products are examples of such mediators (Woolford and Peebles, 1992). In addition to spliceosomal proteins, some other proteins have also been proposed to regulate splicing. For example, the DBR1 protein was thought to be involved in debranching the lariat (Chapman, 1991), and the ribosomal protein L32 was shown to inhibit the splicing of its own RNA through a feedback mechanism (Eng, 1991).

Splicing of group I and group II introns is thought to be mediated by proteins encoded within some introns. These intron-encoded proteins are called maturases. Four to five maturases have been identified in group I introns, most of them in yeast mitochondria (for review see Saldanha, 1993). Group I intron maturases contain a dodecapeptide-like motif that is characteristic of site-specific endonucleases (Perlman, 1989, Wenzlau, 1989). A large percentage of group II introns also contain internal maturase coding regions (Mohr, 1993). These include two yeast mitochondrial introns, ai1 and ai2 of the cox1 gene (Moran, 1995), intron 2 of the Euglena chloroplast psbC gene (Zhang, 1995), a group II intron in the trnK gene of many higher plant chloroplasts (Liere and Link, 1995), and the group II introns found in bacteria (Shearman, 1996). A typical group II intron maturase contains reverse transcriptase domains I-VII, an RNA binding domain (X domain), a Zn2+ finger domain, a proline-rich domain called the Z domain, and sometimes a residual protease domain (Saldanha, 1993).

The mechanism by which maturases mediate intron splicing has yet to be established. One proposal is that they act like molecular chaperones: maturases may bind to the target intron and help it assume the correct tertiary conformation. Site-directed mutagenesis has shown that mutations in the X domain of the yeast ai2 maturase block splicing (Eskes, personal communication). Gel-shift assays of the matK gene product, a maturase encoded in the mustard chloroplast trnK intron, clearly demonstrate RNA-binding activity (Liere and Link, 1995). There is no direct biochemical evidence for group I intron maturase activity, yet site-directed mutagenesis has demonstrated the importance of these proteins in intron splicing (Lambowitz, 1989). In addition to their maturase activity, group I and group II intron-encoded proteins are also thought to contribute to intron mobility, since endonuclease and/or reverse transcriptase domains have been identified in them. The endonuclease activity and transposition ability of group I intron-encoded maturases have been demonstrated (Schafer, 1994, Ho, 1997). The mobility of maturase-containing group II introns is discussed below.

Some nuclear-encoded proteins also regulate the splicing of group I and group II introns. Some aminoacyl-tRNA synthetases are involved in controlling group I intron splicing, such as cyt-18 (Neurospora), a tyrosyl-tRNA synthetase, and NAM2 (yeast), a leucyl-tRNA synthetase. These synthetases are thought to recognize the pre-RNAs because the intron structures closely resemble their normal cellular targets (Lambowitz, 1990). Other nuclear proteins, such as CBP2 and MSS18 in yeast, bind to the stem-loops of group I introns (Gampel, 1991). The nuclear proteins found to be involved in group II intron splicing, such as MRS3 and MRS4, are ion-carrier proteins (Wiesenberger, 1991), although the MSS116 gene product has been found to be an RNA helicase (Wiesenberger, 1992).

[Figure 1-5 diagram: group II intron secondary structure with domains I-VI; subdomain ID3 and EBS1 labeled]

Figure 1-5 Structure and consensus nucleotides of the ribozyme component of group II introns (adapted from Michel, 1993).

Residues shown are the consensus for both subgroups of group II introns (R = purine, Y = pyrimidine, M = A/C, K = G/U). Tertiary interactions are indicated by dashed lines, curved arrows and/or Greek letters. Plus signs indicate the nucleotides that participate in the δ-δ' interaction. Roman numerals label the six major helical domains. Arrowheads indicate splice sites. The asterisk marks the lariat branchpoint.

II. Group II introns

Although group II introns are not as thoroughly and widely studied as group I introns, their catalysis, mobility and evolutionary origins have all been investigated. Group II introns have been shown not only to self-splice in vitro but also to undergo reverse splicing, reinserting into the ligated exons (Augustin, 1990), or sometimes transposing into new locations (Lazowska, 1994, Moran, 1995). Compared with group I introns, group II intron splicing in vitro is slow and very sensitive to reaction conditions such as temperature (Michel, 1995) and salt concentration (Winkler, 1991). The in vivo processing of group II introns is thought to rely strongly on trans-acting factors. Some of these factors are group II intron-encoded maturases that contain reverse transcriptase and/or nuclease motifs and thus resemble ancient retroelements (Mohr, 1993). In addition to these interesting molecular functions, the close relationship of group II introns to nuclear introns has attracted many researchers to this field.

The five steps of the splicing process

In vivo, group II introns are believed to be excised through two transesterification reactions (Winkler, 1991; reviewed in Michel, 1995). The entire reaction can be described in five steps. (1) The process is triggered by bringing the 3' end of the intron into the proximity of the 5' splice junction. (2) The 2'-OH of the bulged adenosine in domain VI initiates the first transesterification by attacking the phosphodiester bond at the 5' splice site, i.e. at the first nucleotide of the intron, forming a lariat intermediate. (3) The lariat is released from the 5' exon but remains covalently connected to the 3' exon; the free 5' exon stays in non-covalent contact with the intron and the 3' exon. (4) The 3'-OH group of the 5' exon then attacks the phosphodiester bond at the 3' splice site, before the first nucleotide of the 3' exon, carrying out the second transesterification, which releases the intron and ligates the exons. (5) The ligated exons form new secondary and tertiary conformations that outcompete the intron-exon interactions, resulting in complete dissociation from the intron. Lariat formation has been shown to be the rate-limiting step (Michel, 1989) and is reversible, i.e. the lariat can debranch. The second transesterification and exon ligation are also reversible (Jarrell, 1988). The equilibrium of these reactions is affected in several ways. Critical points of regulation include the selection of the 5' and 3' splice sites, the formation of correct and stable first and second reaction complexes, and the maintenance of close contact between the 5' and 3' exons after the first phosphodiester bond is cleaved. This regulation depends largely on cis-acting information inherent in the intron itself, i.e. the secondary structure and tertiary interactions formed by the intron.
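The two transesterification steps can be followed as bookkeeping on a sequence string. The sketch below is a toy model, not a simulation of the chemistry: it assumes the intron boundaries and the branch-point position are already known (the function and field names, and the example sequence, are hypothetical), and it only records which pieces are covalently joined after each step. In the lariat, the intron segment from its 5' end through the branch-point A forms the loop closed by the 2'-5' bond, and the rest of the intron forms the tail.

```python
from dataclasses import dataclass


@dataclass
class SplicingProducts:
    free_5_exon: str      # released by step 1, carries a 3'-OH
    lariat_3_exon: str    # step-1 intermediate: intron still joined to the 3' exon
    ligated_exons: str    # product of step 2
    lariat_loop: str      # intron 5' end through the branch A (closed by a 2'-5' bond)
    lariat_tail: str      # intron nucleotides 3' of the branch A


def splice_group_ii(pre_rna: str, intron_start: int, intron_end: int,
                    branch_offset: int) -> SplicingProducts:
    """Toy bookkeeping of group II splicing on a 5'->3' RNA string.

    intron_start/intron_end are 0-based slice bounds of the intron in pre_rna;
    branch_offset is the 0-based position of the branch-point A within the intron.
    """
    exon5 = pre_rna[:intron_start]
    intron = pre_rna[intron_start:intron_end]
    exon3 = pre_rna[intron_end:]
    if intron[branch_offset] != "A":
        raise ValueError("the branch-point nucleotide must be an adenosine")

    # Step 1: the branch A 2'-OH attacks the 5' splice site. The 5' exon is
    # released and the intron stays attached to the 3' exon as a lariat.
    lariat_3_exon = intron + exon3

    # Step 2: the 5' exon 3'-OH attacks the 3' splice site, joining the exons
    # and releasing the intron lariat.
    ligated = exon5 + exon3

    return SplicingProducts(
        free_5_exon=exon5,
        lariat_3_exon=lariat_3_exon,
        ligated_exons=ligated,
        lariat_loop=intron[: branch_offset + 1],
        lariat_tail=intron[branch_offset + 1:],
    )


# Hypothetical pre-RNA: 5' exon CUGAAG, intron GUGCGCCUAACGUAC, 3' exon UUCGA.
products = splice_group_ii("CUGAAG" + "GUGCGCCUAACGUAC" + "UUCGA", 6, 21, 9)
print(products.ligated_exons)                        # CUGAAGUUCGA
print(products.lariat_loop, products.lariat_tail)    # GUGCGCCUAA CGUAC
```

Because every step is reversible in vitro, the same bookkeeping read backwards corresponds to reverse splicing of the lariat into the ligated exons.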

Structural elements and functions

Since in vitro splicing has been observed for only a few of the known group II introns, group II introns are identified mostly by recognition of characteristic sequences and structural elements (Michel, 1995). Through comparative analysis of 70 group II introns, Michel et al. deduced a general secondary structure model for group II introns. This model separated group II introns into two subgroups, IIA and IIB, based on several sequence differences (Michel, 1989). For nearly ten years, biochemical and genetic experiments have been carried out to test and refine the model. Figure 1-5 displays the commonly accepted structure and conserved nucleotides of group II introns.

(1) Major ribozyme motifs

In vitro mutagenesis experiments have shown that deletion of domains II, III and IV has no major effect on the splicing process. Thus, domains I, V and VI are thought to be the major catalytic core components. The significance of domain VI is easy to understand, since it contains the branchpoint adenosine. Although the domain VI stem-loop is conserved, no primary sequence conservation has been observed in it. Domain V, on the other hand, has a highly consistent primary consensus (Figure 1-5). Domain V has been characterized as the catalytic domain, since its binding to the reaction pocket triggers catalysis (Michel, 1995). Three of the first four nucleotides of the domain, A2, G3 and C4, are highly conserved, and mutations of these nucleotides block both cleavage steps. The rest of domain V is thought to provide a scaffold for the correct conformation of the catalytic center (Chanfreau, 1994). For example, the "GU" bulge on the 3' stem and the distance between domain V and domain VI have both been shown to affect the splicing rate (Schmidt, 1996, Boulanger, 1996).

Domain I is the most complicated domain: it is not only the largest domain but also the one that contains the most regulatory elements. Kinetic studies demonstrate close contact between domain V and domain I throughout the splicing reaction (Pyle, 1994), indicating a crucial role for domain I. The most important elements of domain I are the Exon Binding Sequences (EBS). EBS1 is a six-nucleotide sequence in the loop of ID3. This hexamer is proposed to base-pair with the last six nucleotides of the 5' exon, named Intron Binding Segment 1 (IBS1). This base pairing has been shown to be critical for 5' splice site recognition, for the formation of the first reaction complex and for maintaining 5' exon-intron contact after the first cleavage. Significantly, the EBS1-IBS1 interaction was shown to be both necessary and sufficient for reverse splicing of the intron into the ligated exons (Mori, 1990a, Mori, 1992). The EBS2-IBS2 interaction is less common and has been shown to be critical only during trans-splicing.
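Because EBS1 and IBS1 pair antiparallel, the 5' splice site implied by a given EBS1 can be located by scanning an RNA for a hexamer complementary to it. The sketch below is a hypothetical illustration, not a published tool: the sequences are invented, and whether G-U wobble pairs are accepted is an assumption exposed as a parameter.

```python
from typing import List

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def can_pair(a: str, b: str, allow_wobble: bool = True) -> bool:
    """True if nucleotides a and b can form a base pair."""
    return (a, b) in WATSON_CRICK or (allow_wobble and (a, b) in WOBBLE)


def forms_duplex(strand1: str, strand2: str, allow_wobble: bool = True) -> bool:
    """Both strands are written 5'->3'; check antiparallel pairing."""
    if len(strand1) != len(strand2):
        return False
    return all(can_pair(x, y, allow_wobble)
               for x, y in zip(strand1, reversed(strand2)))


def predict_5_splice_sites(rna: str, ebs1: str,
                           allow_wobble: bool = True) -> List[int]:
    """Return positions i where rna[i-k:i] (k = len(ebs1)) can pair with EBS1,
    i.e. candidate 5' splice sites lying just after a putative IBS1."""
    k = len(ebs1)
    return [i for i in range(k, len(rna))
            if forms_duplex(rna[i - k:i], ebs1, allow_wobble)]


# Hypothetical EBS1 UUGAGU pairs with the exon hexamer ACUCAA (IBS1).
print(predict_5_splice_sites("GGACUCAAGUGCG", "UUGAGU"))   # [8]
```

Here the predicted site falls between the IBS1 hexamer and the intron's first nucleotide (position 8). Setting `allow_wobble=False` restricts the scan to Watson-Crick pairs only.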

(2) Additional tertiary interaction elements

In addition to the three domains that are necessary for catalysis of group II intron processing, there are also sequence motifs or base pairs that have been shown to affect the catalytic rate of splicing. Most of the tertiary interactions were identified and demonstrated by in vitro splicing assays of two yeast mitochondrial introns, ai1 and ai2, located in the cox1 gene. Therefore, some of the tertiary elements found in these introns may be specific to these two introns.

One of the earliest identified interactions is γ-γ'. The γ motif is a five-nucleotide sequence, "RRGAY" (R = purine, Y = pyrimidine), in the core between domain II and domain III (Michel, 1989). The second "R" is proposed to base-pair with the last nucleotide of the intron (γ'). The γ-γ' pair has been shown to be involved in 3' splice site recognition and cleavage rate control (Jacquier, 1991). Mutagenesis that disrupts the γ-γ' interaction causes accumulation of the lariat-3' exon intermediate (Michel, 1989).

The α-α' interaction consists of base pairing between a sequence motif in the 3' bulge of domain ID3 and the terminal loop of domain IB. Compensatory mutagenesis demonstrates that this base pairing assists in bringing the IBS and EBS elements into proximity (Harris-Kerr, 1993). A similar kind of base pairing, β-β', is much less common. The ε-ε' base pairing involves the third and fourth nucleotides at the 5' boundary and two nucleotides in the 3' bulge of domain IC. Like the EBS-IBS pairing, ε-ε' forms at the first step and persists until exon ligation. The δ-δ' interaction is a single base pair between the nucleotide immediately 5' of the EBS1 motif and the first nucleotide of the 3' exon (Michel, 1989). Inspired by the "guiding theory" of group I introns proposed by Davies et al. (1982), it has been proposed that group II introns also have an internal "code" that guides recognition of the splice boundaries. While the 5' site is specified by the boundary sequence and the EBS-IBS pairs, 3' site recognition is thought to be directed by the δ-δ' pair. Therefore δ-δ' is also called the "guided pair".
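Consensus motifs such as the γ sequence "RRGAY", like the residues in Figure 1-5, are written with IUPAC ambiguity codes. A minimal sketch of matching such a consensus against an RNA sequence follows. It covers only the codes used in this section (plus N for any nucleotide), the function names are hypothetical, and the example sequence is invented. A consensus match alone is not evidence of a γ-γ' interaction, which also requires pairing with the last intron nucleotide.

```python
from typing import List

# IUPAC nucleotide ambiguity codes used in this section (RNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG",   # purine
    "Y": "CU",   # pyrimidine
    "M": "AC",
    "K": "GU",
    "N": "ACGU",
}


def matches_consensus(seq: str, consensus: str) -> bool:
    """True if every nucleotide of seq is allowed by the consensus code at that position."""
    return len(seq) == len(consensus) and all(
        base in IUPAC[code] for base, code in zip(seq, consensus))


def find_motif(rna: str, consensus: str) -> List[int]:
    """0-based start positions of all windows in rna that match the consensus."""
    rna = rna.upper().replace("T", "U")   # accept DNA input as well
    k = len(consensus)
    return [i for i in range(len(rna) - k + 1)
            if matches_consensus(rna[i:i + k], consensus)]


print(find_motif("CUAGGACUGAGAUC", "RRGAY"))   # [2, 8]
```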

Significantly, guided pairings are not restricted to Watson-Crick pairs (Michel, 1995); G-U and A-C pairs have also been observed (Hong, Ph.D. Thesis 1993, Zhang, 1995). Recently, some tertiary elements have been identified within domains II and IV in ai1 and ai2. The θ-θ' interaction between the domain ID loop and the basal stem of domain II has been reported to stabilize the core structure in vitro (not shown in the figure). The η-η' interaction between domain II and domain VI has different effects in the two subgroups of group II introns (not shown in the figure). In IIA introns the η-η' interaction was reported to mediate a conformational rearrangement before the second transesterification reaction, whereas in IIB introns it was shown to prevent the first transesterification (Costa, 1997). This functional difference of the same motif may reflect the different evolutionary histories of the two subgroups. The most recently characterized tertiary interaction is a base pairing between the terminal loop of domain V and a motif in the basal stem of domain I (Costa, 1995). This interaction uses the "GAAA" tetraloop of domain V to stabilize the first and second reaction complexes. However, the "GAAA" sequence is absent in many group II introns, so the generality of this interaction is still in question.

Internally encoded proteins: maturases with retroelement features

Group II intron-encoded open reading frames have been found in yeast mitochondria, Euglena chloroplasts, and almost every group II intron found in bacteria. All of these open reading frames are located in the loop of domain IV, where no secondary or tertiary interactions have been identified. Although their predicted protein sequences are diverse, these ORFs do share consensus motifs. Through comparison of the protein sequences of 34 group II intron-encoded ORFs, two major functional motifs were proposed (Mohr, 1993): a reverse transcriptase motif and a potential RNA-binding motif, the X domain. In some introns, an endonuclease motif (Gorbalenya, 1994, Zimmerly, 1995a), a protease motif and a metal-binding motif such as a zinc finger have also been identified. These functional motifs of group II intron-encoded proteins share many features with those of retroelements. In a recent phylogenetic analysis, group II intron-encoded proteins are positioned, together with msDNAs, as the sister group of non-LTR retrotransposons, in a clade more primitive than LTR retroviruses (Nakamura, 1997). Retro-activity enables maturase-containing group II introns to act as transposable elements, which could account for the large numbers of group II introns in contemporary organisms. This "intron invasion" hypothesis is strongly supported by studies of the two group II introns in yeast mitochondria, ai1 and ai2 of the cox1 gene, both of which contain maturase-like genes. Researchers have demonstrated the ability of these two introns to insert into intronless alleles in vivo. Complete reverse splicing of ai1 into ligated RNAs, and even into double-stranded DNA with accessible sites, was also reported in vitro (Zimmerly, 1995b). Both endonuclease and reverse transcription mechanisms are involved in the homing and reverse-splicing events (Yang, 1996).
Although intron insertion in these in vitro assays is highly specific, PCR analysis did detect the possible transposition of one of these group II introns into novel sites (Sellem, 1993). Reverse splicing followed by integrative recombination at the RNA level, and endonuclease cleavage followed by ligation at the DNA level, were both proposed as mechanisms of group II intron invasion into organelle genomes (Grivell, 1994). Interestingly, group II introns found in bacteria encode maturases with RT and/or X domains homologous to those found in eukaryotes (Matsuura, 1997, Shearman, 1996, Knoop, 1994), indicating the early existence of these group II intron-encoded proteins.

Hydrolysis and trans-splicing of group II introns

Transesterification is not the only splicing mechanism available to group II introns. Under in vitro conditions that are unfavorable for the formation of the catalytic center, group II introns also splice through hydrolysis (reviewed in Michel, 1995), in which both splice sites are hydrolyzed and the exons are then joined through dehydration. The IBS1-EBS1 interaction was shown to be necessary for the correct ligation of the exons. Conditions that induce hydrolysis include loss of the branch site "A", alteration of the distance between IBS1 and the intron, and high salt concentrations. However, hydrolytic splicing of a group II intron has never been observed in vivo. Group II introns have also been found to undergo trans-splicing, in which segments of one group II intron are located at separate positions, usually quite far from each other in the genome. This indicates that some domains of group II introns can be provided in trans. In fact, domain V and domain I have been shown to accomplish splicing in trans in vitro (Hetzer, 1997, Suchy, 1991). Most trans-splicing group II introns are split in domain IV. However, domain VI cannot be provided in trans in vitro.

The evolutionary relationship of group II introns and nuclear spliceosomes

Group II introns and nuclear introns share a similar splicing mechanism, including the formation of a lariat intermediate, and the snRNAs show structural similarities to the group II intron domains. In 1993, Wise presented a structure-to-structure comparison between the reaction complex of a typical nuclear intron and that of a group II intron (Figure 1-6). In this alignment, counterparts of domain ID, domain IC, domain V and domain VI were all found within the U1-U6 snRNAs. As suggested in that article (Wise, 1993), considering the complexity of both splicing complexes, it seems improbable that group II and nuclear introns would have independently arrived at such a similar assemblage of catalytic strategies by chance. Therefore, group II introns are considered to be the direct ancestors of nuclear introns.

[Figure 1-6 diagram. Upper panel, precursor mRNA splicing: U6-5' splice site helix, U2-U6 helix I, U2-branch point helix. Lower panel, group II intron self-splicing: domain IC, domain ID, EBS/IBS pairing, guide pair, domain V, domain VI. Key: crosslink, Watson-Crick base pair, 2'-5' phosphodiester, non-Watson-Crick interaction.]

Figure 1-6 Comparison of the catalytic core models of nuclear pre-mRNA introns (upper) and group II introns from Saccharomyces cerevisiae mitochondria (lower). Only a small portion of the group II intron is shown. The figure is adapted from Wise, J. A., 1993.

III. Genus Euglena

The Chinese name for Euglena translates as "the green-bug alga". This name neatly summarizes the biological intricacy of Euglena, yet it carries a conventional error in classification. Euglena is like a bug in that it has flagella, which make it motile, but it contains chloroplasts, which make it green and plant-like. However, the classification of Euglena as an alga has come to be considered improper: although Euglena has a complex cellular structure, it is still a unicellular organism and is far simpler than most green, red and brown algae. Figure 1-7 shows a typical Euglena structure (Buetow, 1968). In general, Euglena is described as a unicellular, motile, photosynthetic . It is mostly an elongated oval shape and has a which enables it to perform a peculiar crawling movement named "metaboly" (Walne and Kivic, 1985). Euglena inhabits fresh water or sea water and is normally phototrophic, yet when grown in the dark it assumes a heterotrophic metabolism and feeds on dead organic material in its environment (Purves, 1992).

Euglena shares morphological, physiological and biochemical features with organisms in all the biological superkingdoms. Like animals, it has a stigma at its flagellar end and a reservoir connected to a gullet. Like plants, it has chloroplasts, but these differ from plant chloroplasts in having three membranes instead of two. Like fungi and bacteria, it can be colonial, and it reproduces asexually via mitosis and cytokinesis. Euglena also has many unique features, such as the pellicle or periplasmic membrane, , and paramylon as sugar storage.


Figure 1-7 Typical structure of a Euglena (adapted from Buetow, 1968, p. 3).

A. locomotor flagellum; B. gullet; C. stigma; D. photoreceptor; E. internal flagellum (usually adheres to photoreceptor); F. contractile vacuole; G. reservoir; H. blepharoplast (kinetosome); I. rhizoplast (rare); J. paramylon (free); K. ; L. chloroplast; M. endosome; N. nucleus; O. hematochrome granules (red); P. mitochondria; Q. stria; R. pellicle; S. muciferous body (rare).

The various classifications and taxonomy of Euglena

Because of its combination of plant-like and animal-like features, the classification of Euglena has been controversial among biologists. In the earliest classifications, botanists suggested that Euglena belonged to the kingdom Eukaryota and the green algae phylum (Chlorophyta) because of its chloroplasts (Christensen, 1962). They established a class Euglenophycea for all the unicellular, motile, green organisms, and an order Euglenale for the non-phagotrophic organisms, which included Euglena. The names Euglenophycea and Euglenale are still in use by some phylogenists. However, the botanical classification was not widely accepted, since there are many photosynthetically deficient organisms that are clearly related to Euglena. Currently, the preferred way to classify Euglena is by its motile apparatus, the flagella, with subgroups arranged by phototrophic, phagotrophic or osmotrophic metabolism. It is well accepted that the genus Euglena belongs to the family Euglenidae, which includes all the unicellular, non-phagotrophic, motile flagellates with one emergent flagellum and an oval-shaped body (Buetow, 1968). Euglenidae belongs to an order (the Euglenoids) of unicellular flagellates with one or more flagella. The name and classification of this order, however, still vary among biologists. After the kingdom Protista was defined and the five-kingdom system (kingdom Monera (prokaryotes), kingdom Protista, kingdom Fungi, kingdom Plantae and kingdom Animalia) was established, the Euglenoids were placed in the protist kingdom, in which a subkingdom was defined for the protists with combinations of fungal and algal features. A consensus classification of the Euglenoids is as follows (adapted from Buetow, 1968 and Purves, 1992):

Kingdom Protista (protozoa, fungus-like or algal protists)
  Subkingdom Protozoa (flagellates, amoebas or ciliates)
    Phylum Mastigophora (flagellates, uni- or multicellular)
      Class Phytomastigophorea (with chromatophores)
        Order Euglenida (unicellular, motile, green chromatophores)

The order Euglenida is sometimes called Euglenoida. Because of the vast diversity within the Euglenida, it has even been suggested that they form their own phylum, "Phylum Euglenida" (Walne and Kivic, 1985) or "Euglenophyta". In 1993, a rearrangement of all the microorganisms in the biological world was proposed by the Canadian Evolutionary Biology Program (Cavalier-Smith, 1993). In the new classification, the five-kingdom system was replaced by a two-empire system with eight kingdoms. Kingdom Protista was split into kingdom Protozoa, kingdom Chromista and kingdom Archezoa. New boundaries were also drawn among all the kingdoms. For example, green and red algae were classified into kingdom Plantae, brown algae and fungus-like protists into kingdom Chromista, and multicellular organisms were all excluded from Protozoa. In this newly established, narrowed kingdom, the Euglenoids became a new subphylum within a new phylum, Euglenozoa (for details see Cavalier-Smith, 1993). The classification of Euglena therefore became the following:

Kingdom Protozoa
  Infrakingdom Euglenozoa
    Phylum Euglenozoa (Euglenoids and kinetoplastids)
      Subphylum Euglenoida (flagella, the old order "Euglenida")
        Class Aphagea (non-phagotrophic)
          Subclass Euglenia (with chloroplast)
            Order Astasida

Although various classification schemes place the Euglenoids (Euglenida) in different contexts, the species contained in the group remain the same. Today, most people still prefer the term "order Euglenida", with their closest relatives found in a parallel order, Kinetoplastida (trypanosomes and bodonids). Within the Euglenida, the family Euglenidae is the focus of this dissertation. It contains Euglena, the "colorless Euglena" Astasia, and other genera such as Phacus, Eutreptia, Trachelomonas, Lepocinclis, Cryptoglena and Eutreptiella (Pringsheim, 1956). The distinctions between the genera include the number of flagella, body coat, metaboly, and the presence or absence of chromatophores. Eutreptia and Eutreptiella, which have two emergent flagella, are sometimes considered to belong to another family or order (Leedale, 1967). Within the genus Euglena, species are classified mainly according to body rigidity. Euglena species include the rigid E. spirogyra; the mid-mobile E. pisciformis, E. granulata, E. sanguinea and E. gracilis; the active E. viridis and E. anabaena; and the serpent-like E. mutabilis and E. deses.

Phylogeny of Genus Euglena

To date, no one has a definitive answer to where Euglena is placed on the tree of life. Attempts to relate Euglena to Chlorophyta (Klein, 1967), the chloromonads (Senn, 1900), the dinoflagellates (Cavalier-Smith, 1975) or (Taylor, 1976) have all failed because they considered some features of Euglena and ignored others. Later, it was suggested that Euglena belonged to a very ancestral group at a critical branch point in protist evolution that gave rise to chlorophytes and dinoflagellates (McQuade, 1983). A recent phylogenetic study using 16S and 5S rRNA proposed that the closest group to Euglena was the kinetoplastids, Trypanosoma and Bodo, that more distant relatives were Dictyostelium and the ciliates, and that the algae were very distant from Euglena (Krishnan, 1990). The close relationship between Euglena and the kinetoplastids has been shown by more than one study (Sogin, 1989, Yasuhira, 1997). It is thought that Euglena and the Kinetoplastida shared a common ancestor and diverged before Euglena acquired its chloroplast by endosymbiosis (Kivic, 1984).

Very few phylogenetic trees are available for the Euglenoids. Leedale published a tree in 1987 based on the morphological features of about 18 Euglenoids. This tree proposed that endosymbiosis of chloroplasts into a phagotrophic Euglenoid gave rise to all the green and some etiolated Euglenoids, including Euglena. By comparative analysis of chloroplast rbcL gene sequences of Euglenoids, Thompson et al. established a phylogenetic tree of 8 Euglena species and 3 other Euglenoids, rooted with A. nidulans (Thompson, 1995). In this tree, E. gracilis and E. mutabilis are among the latest-branching species, and E. viridis and E. anabaena are on a relatively basal branch (Figure 1-8). The tree also suggests that Eutreptia and Cryptoglena branched earlier than Euglena. Interestingly, the non-photosynthetic Astasia longa, which was proposed to have branched long before the endosymbiosis event in Leedale's tree, was shown to branch very close to E. gracilis.
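Figure 1-8 summarizes three equally parsimonious trees as a majority-rule consensus, labeling each node with the percentage of those trees that contain the branch; with three trees, a branch present in two of them is reported as 67. A minimal sketch of that bookkeeping follows, representing each tree as a set of clades (the set of taxa below each internal node). The four-taxon trees and the helper names are hypothetical and are not Thompson's data, and the parsimony search that produces the trees (done with PAUP in the original study) is not shown.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Clade = FrozenSet[str]


def majority_rule(trees: List[Set[Clade]]) -> Dict[Clade, int]:
    """Return the clades found in more than half of the trees, each with the
    rounded percentage of trees that contain it."""
    counts = Counter(node for tree in trees for node in tree)
    n = len(trees)
    return {node: round(100 * c / n)
            for node, c in counts.items() if c / n > 0.5}


def make_clade(*taxa: str) -> Clade:
    return frozenset(taxa)


# Three hypothetical rooted trees on taxa A-D, each given by its internal clades.
trees = [
    {make_clade("A", "B"), make_clade("A", "B", "C")},
    {make_clade("A", "B"), make_clade("A", "B", "D")},
    {make_clade("A", "B"), make_clade("A", "B", "C")},
]
consensus = majority_rule(trees)
for node, pct in sorted(consensus.items(), key=lambda item: len(item[0])):
    print(sorted(node), pct)
# ['A', 'B'] 100
# ['A', 'B', 'C'] 67
```

Clades that each occur in more than half of the trees are always mutually compatible, which is why they can be drawn together as a single consensus tree.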

[Figure 1-8 tree: terminal taxa E. gracilis, E. mutabilis, E. geniculata, A. longa, E. myxocylindracea, E. pisciformis, E. viridis, E. anabaena, E. stellata, Eutreptia, Cryptoglena and the outgroup A. nidulans; branch frequencies of 100 or 67 at nodes]

Figure 1-8 Euglenoid phylogeny (adapted from Thompson, 1995). The tree is a majority-rule consensus of 3 equally most parsimonious trees resulting from a branch-and-bound search with the Phylogenetic Analysis Using Parsimony program (PAUP version 3.1, Swofford, 1993). Branch frequencies among the three most parsimonious trees are shown at each node.

IV. Euglena gracilis chloroplast genome and intron content

Euglena chloroplasts are believed to have originated from the chlorophyll b-containing prochlorophytes, a type of cyanobacteria that is also the ancestor of green algae (Drager, 1993). Euglena chloroplasts are enclosed by three membranes. One hypothesis is that the extra membrane of Euglena chloroplasts arose from two endosymbiotic events, in which a first, non-photosynthetic cell engulfed a second, photosynthetic cell (reviewed in Gray, 1993).

Euglena chloroplast genomes, like other organelle genomes of endosymbiotic origin, are double-stranded circular DNAs. In 1993, the complete chloroplast genome sequence of a mid-mobile Euglena species, E. gracilis, was reported (Hallick, 1993). The genome is a 143 kb circular DNA that encodes nearly 100 genes: 27 tRNA genes (trn), 21 ribosomal protein genes (rpl/rps), 3 rRNA genes, 28 photosynthesis-related protein genes including photosystem genes (psa/psb), cytochrome genes (pet), a chlorophyll gene (chl), ATPase complex genes (atp) and rbcL, an EF-Tu protein gene (tufA), 3 RNA polymerase genes (rpo), 13 unidentified open reading frames, and a gene for a unique protein, roaA, characterized as a potential intron maturase (Jenkins, 1995). The genome map is shown in figure 1-9. Like most organellar genomes, the E. gracilis chloroplast genome encodes all of its tRNAs and rRNAs. However, some photosynthetic proteins are encoded in the nucleus and are imported into the chloroplast after translation. The chloroplast genome might also encode some mitochondrial proteins. Gene exchange among the nucleus, mitochondria and chloroplasts has been observed in other eukaryotes (Purves, Orians and Heller, 1992, Sinauer Associates; Marechal-Drouard, 1993). This gene exchange is thought to have taken place shortly after endosymbiosis and to have increased nuclear-organellar coordination.

As in other organelle genomes, the genes of the Euglena gracilis chloroplast are prokaryote-like. For example, they are expressed polycistronically; that is, many genes are clustered together and transcribed from a single promoter. Characterized transcription units (operons) include the psbC-psbD operon (Copertino, 1994), the trnK-psaA-psaB-psbE-psbF-psbL-psbJ operon (Stevenson, 1994), the rps2-atpI-atpH-atpF-atpA-rps18 operon (Drager, 1993), the petB-atpB-atpE operon (Hong, thesis 1996) and the psbB-psbT operon (Hong, 1994). One tandem repeat was found: a triple repeat of the 5S-23S-16S rRNA coding cluster. Most Euglena chloroplast genes are preceded by a Shine-Dalgarno-like motif. Transcription and translation are coupled, transcription of the genome appears to be nearly symmetrical, and it proceeds in the same direction as DNA replication (Hallick, 1993).

All these genomic features seem to ensure efficient gene expression in the chloroplast. The promoter strengths and promoter regulation of the Euglena gracilis chloroplast genome are still under investigation. However, studies of many other chloroplasts suggest that transcriptional control is not the predominant mechanism of gene regulation in chloroplast genomes. First, many chloroplast genes, including genes for light-inducible proteins, are constitutively expressed (Tobin, 1985). Second, transcription inhibitors were shown not to block the expression of many proteins (Malnoe, 1988; Mullet, 1988). In contrast, many points of control have been found at the post-transcriptional level. Post-transcriptional control of organelle gene expression can operate through RNA stability. Because chloroplast RNAs are not polyadenylated, their stability is controlled by cis-acting elements such as short tandem repeats or large inverted repeats (IR), ranging from 6 to 76 kb (Palmer, 1985), located in the 3'-untranslated region. In higher plants, RNA stability has also been shown to depend on nuclear-encoded RNA-binding proteins (Sugita, 1996). The Euglena gracilis chloroplast psbA gene has a 3' stem-loop that was shown to be important for maintaining the steady-state mRNA level (Stevenson, 1994).

A more universal form of post-transcriptional control in organelles is the processing of large, polycistronic pre-RNAs. Processing events include endonucleolytic and exonucleolytic cleavages that produce mature rRNAs and tRNAs, RNA editing and, most significantly, intron splicing.

The Euglena gracilis chloroplast genome has the largest intron content found to date in any organelle genome (Thompson, 1995). The introns are very AT-rich, which accounts for the high AT content of the genome as a whole. To date, 155 introns have been identified or proposed, making up 39.2% of the genomic DNA. All of the introns are located in mRNAs or intercistronic spacers; none are in tRNA or rRNA genes. A diagram of the Euglena gracilis chloroplast genome showing gene content and intron distribution is shown in figure 1-9. There are about 48 group III introns and 70 group II introns in E. gracilis. The remainder are various twintrons and complex twintrons, in which one or more introns (group II or group III) are located inside another intron (group II or group III). Twintrons are unique structures that have so far been found only in euglenoids. In twintrons, the internal introns interrupt critical structural elements of the external introns, thereby blocking processing of the external introns. This interruption, and hence the regulation of external intron splicing, is thought to be the reason that internal introns have been maintained through evolution. Twintrons probably arose when mobile introns inserted into existing introns (Copertino, 1993). Open reading frames have also been found in Euglena gracilis chloroplast introns such as psbC introns 2 and 4 and psbD intron 8. Analysis of the predicted protein sequences of these ORFs has identified functional motifs found in group II intron-encoded maturases, such as the X domain and a residual reverse transcriptase (RT) domain (Mohr, 1993).

This rich intron content, however, appears to be limited to Euglena gracilis: research has shown that other euglenoid species have much smaller intron contents. For example, data collected so far show that Euglena viridis contains about 30 introns (Thompson, 1995).
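As a rough consistency check on these figures (simple arithmetic, not a value reported in this study), taking the 143,172 bp genome length from the map in figure 1-9 and assuming the 39.2% intron fraction refers to that length gives the total intron length and the mean length per intron:

```latex
\begin{aligned}
L_{\text{introns}} &\approx 0.392 \times 143{,}172\ \text{bp} \approx 56{,}100\ \text{bp},\\
\bar{L} &\approx \frac{56{,}100\ \text{bp}}{155} \approx 362\ \text{bp per intron.}
\end{aligned}
```

This mean is taken over all intron types together, including the components of twintrons.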
Unlike the other genomic features of the chloroplast described above, introns would be expected to slow down gene expression. Even allowing for a possible role in controlling gene expression, the presence of so many introns in the organelle of such a simple protist is intriguing from the standpoint of genome organization. The E. gracilis chloroplast genome and its many introns are therefore a very interesting field of research.

Euglena gracilis Chloroplast DNA 143,172 bp

Figure 1-9 Circular map of the Euglena gracilis chloroplast genome (adapted from Thompson, 1995). The locations of introns are shown as blue (group III) and red (group III) lollipops. One lollipop inserted into another represents a twintron. The replication and transcription orientation is clockwise for the outer ring and counterclockwise for the inner ring.

V. Thesis objectives

The rich and diverse intron population of the Euglena gracilis chloroplast provides a very good system for intron studies. Questions about intron structure, splicing mechanism and evolution have been addressed through the study of many Euglena gracilis chloroplast introns. In my thesis research, I focused on one particular intron, psbC intron 2, the largest intron in the genome. First, as discussed in Chapter 2, I characterized the detailed structure of this intron, showing that it is a group II intron containing a 758 aa ORF, later named mat 2, which is interrupted by three additional group II introns. I also showed that mat 2 encodes a maturase-like protein that could be a key factor in the processing of one or more group II introns in the chloroplast, and that could also have contributed to intron evolution in the genome. Next, I extended my research to the evolution of the genus Euglena. In Chapter 3, I report the detection and confirmation of psbC intron 2 and mat 2 in other Euglena species. I also showed that the DNA sequence of external intron 2 and the mat 2 protein sequence are significantly conserved, indicating a deep rooting of this intron in Euglena and supporting the proposed critical functions of mat 2 in the chloroplast genome. Euglena gracilis chloroplast introns are smaller than other organellar introns, and many structural elements are missing from many E. gracilis group II introns. For this reason, the membership of E. gracilis introns in the group II intron family has been questioned.
In Chapter 4, I compared the homologous intron database I established with the psbC intron 2 external intron database, and showed that the critical structural elements proposed in the canonical group II intron model (Michel, 1989) are all well conserved in this Euglena intron, providing evidence for the authenticity of Euglena gracilis group II introns. Biochemical tests of the activities of group II intron maturases have not been possible, because obtaining large amounts of maturase protein has proven very difficult. The only chloroplast maturase that has been tested biochemically is the matK of mustard (Liere, 1995), for which RNA binding was detected. In Chapter 5, I report the initial steps toward overexpression of mat 2 for further biochemical tests of its activities. I tried mat 2 expression in E. coli and in yeast. I was able to express a short C-terminal segment of mat 2 in the E. coli system. I also established a protocol for mat 2 expression in the yeast system and obtained preliminary expression of a large C-terminal fragment that includes the functional domains. Expression of mat 2 will help initiate biochemical investigation of maturases in Euglena. Among the Euglena species tested, Euglena myxocylindracea was found to lack psbC intron 2 and mat 2. In appendix I, I describe the detection of group II introns in the chloroplast genome of E. myxocylindracea. I found that a large number of group II introns are absent from this genome, including some deeply rooted group II introns. The absence of both mat 2 and possibly all of the group II introns in E. myxocylindracea suggests that mat 2 might be responsible for the existence of group II introns in the Euglena chloroplast.

Finally, as a side project, I report in appendix II my work on characterizing a twintron, atpE intron 1. I found that atpE intron 1 is a group II intron that contains another, internal group II intron in its domain VI. The secondary structures of both introns have been analyzed.

CHAPTER 2

THE EUGLENA GRACILIS INTRON-ENCODED MAT 2 LOCUS IS INTERRUPTED BY THREE ADDITIONAL GROUP II INTRONS

Introduction

The Euglena gracilis chloroplast genome contains many novel introns unlike those found in other genomes. Introns account for up to 39% of the Euglena gracilis chloroplast genome. Among the 155 introns identified to date, 74 are group II introns ranging in size from 277 nucleotides (nt) to 671 nt (Hallick et al., 1993). Euglena chloroplast group II introns conform to the core secondary structure model proposed by Michel (Michel et al., 1989). However, only the catalytic domains V and VI are well conserved in Euglena chloroplast introns, while domains I-IV are often abbreviated. The Euglena chloroplast genome also contains fifteen twintrons (Copertino and Hallick, 1993), or introns-within-introns. These include simple twintrons, in which one intron is inserted into another intron (Copertino et al., 1991; Copertino and Hallick, 1991; Copertino et al., 1992), and complex twintrons, in which the external intron is interrupted by more than one intron (Hong and Hallick, 1994a; Hong and Hallick, 1994b) or even by another complex twintron (Drager and Hallick, 1993). The internal intron is usually inserted into a functional domain of the external intron, so that the internal intron must be removed to restore the splicing ability of the external intron (Copertino and Hallick, 1993).

Splicing of group II introns in vivo is thought to be mediated by protein factors, including intron-encoded maturases (Lambowitz and Perlman, 1990; Michel et al., 1989; Saldanha et al., 1993). Group II intron-encoded maturases often contain a reverse transcriptase domain, and reverse transcriptase activity has been shown to contribute to intron mobility (Moran et al., 1995). A potential maturase has been identified in intron 4 of psbC, a gene coding for a chlorophyll-binding protein of photosystem II. The putative maturase gene is located in the internal intron of this group III twintron (Copertino et al., 1994). A gene encoded in psbC intron 2 may be a second intron maturase (Mohr et al., 1993). The 4144 nt psbC intron 2 is the largest intron in the Euglena gracilis chloroplast genome (Hallick et al., 1993). Here we report that it is a group II intron containing a 758-codon open reading frame, which has been designated mat 2. mat 2 encodes a putative maturase-like protein, and the mat 2 locus is interrupted by three internal group II introns (figure 2-1). This is the first example of an intron-encoded open reading frame that itself contains introns.

Figure 2-1. Schematic description of the overall structure of psbC intron 2. psbC exons 1, 2 and 3 (x1, x2, x3) are shown as solid black boxes. Orf 758 (mat 2) coding regions are shown as shaded magenta boxes, and the three internal introns are shown as shaded blue boxes. Sizes of introns 2b, 2c and 2d are shown above. An arrow indicates the potential ribosome binding site of mat 2.

Results

Identification of partially spliced psbC intron 2 pre-mRNAs by Reverse Transcriptase-Polymerase Chain Reaction (RT-PCR) analysis

The Euglena gracilis chloroplast psbC intron 2 is 4144 nt long. The size range for group II introns in Euglena gracilis chloroplasts is 277-671 nt. Present within the intron are several putative group II intron 5' splice sites and 3' domains V and VI, along with potential exons of an intron-encoded protein. To test the hypothesis that psbC intron 2 is a twintron with an internally encoded protein, a series of oligonucleotide primers was designed to amplify psbC intron 2 RNA processing intermediates. Locations of the cDNA and PCR primers used in the analysis are shown in figure 2-2A. Two control reactions were used to define the sizes of the fully spliced and fully unspliced RNAs. RT-PCR with primers specific to psbC exons 1-2 and exon 3 (figure 2-2B, lane 2; oligonucleotide p1 spans spliced exons 1 and 2) yielded a single product of 65 bp, corresponding to spliced exons 1-2-3. Chloroplast total nucleic acid was PCR-amplified with primers specific to the 5' and 3' exon-intron boundary sequences (figure 2-2B, lane 3). The resulting 4176 bp product is the size of the unprocessed intron 2 pre-mRNA. To test for partially spliced intron 2 pre-mRNAs, chloroplast RNA was RT-PCR-amplified with primers spanning the exon 2-intron 2 and intron 2-exon 3 boundary sequences (figure 2-2B, lane 4). A PCR product of about 4200 bp that co-migrates with the control product amplified from chloroplast total nucleic acid was obtained, together with cDNAs amplified from four partially spliced pre-mRNAs of about 3900, 3500, 3300 and 3000 bp. The signal from the 3300 bp RT-PCR product is faint. The 3000-3900 bp PCR products are evidence of at least three introns, with a total length of about 1200 bp, internal to psbC intron 2.

Figure 2-2. cDNA-PCR analysis of psbC pre-mRNAs. A. Structure of psbC intron 2. Black boxes represent exons 1, 2 and 3, white boxes represent external intron 2a, light shaded boxes represent internal introns (2b, 2c, 2d) and dark shaded boxes represent orf 758. Positions of the primers used for cDNA synthesis and PCR amplification are shown as arrows on the map of psbC intron 2; p1-p7 are PCR primers and c6-c12 are cDNA primers. B. PCR amplification of psbC partially spliced mRNAs. Amplification products were fractionated on an agarose gel. The PCR and cDNA primers used for each amplification are shown above lanes 2-6 and 8-12. Control PCR with Euglena gracilis chloroplast total nucleic acid is labeled "DNA" above lanes 2, 5, 9 and 11. λ BstEII molecular weight markers are shown in lanes 1 and 7, and their sizes (in bp) are shown on the left of the gel. Sizes of some PCR products are shown on the right.

[Figure 2-2A: primer map of psbC intron 2 (exons 1+2, intron 2a, orf 758, exon 3); Figure 2-2B: agarose gel, lanes 1-12, λ BstEII markers 117-4822 bp]

Three group II introns are located in domain IV of psbC intron 2

A preliminary secondary structure model for an external group II intron interrupted in domain IV by internal introns and an open reading frame was developed (figures 2-2A and 2-4A). To test this model, primers for RT-PCR reactions were designed to be complementary to either the putative external intron domain IV (p4, figure 2-2A) or the coding region of the putative intron-encoded polypeptide (p6, p7, c6 and c7, figure 2-2A).

RT-PCR with primers p4 and c7 yielded a product from the fully unprocessed precursor (1861 bp) that co-migrates with the product from a total nucleic acid control template, and products from four splicing intermediates (1509, 1140, 956 and 587 bp) (figure 2-2B, lanes 5 and 6). These products correspond to the four intermediates obtained by RT-PCR with primers p3 and c9. Therefore, all of the internal splicing events occur between p4 and c7 (coordinates 13979-15839). RT-PCR with primers p4 and c6 yielded four products of 1192, 823, 639 and 270 bp (figure 2-2B, lane 8). The 1192 bp species was identified by sequencing as the fully unprocessed precursor (data not shown). Therefore, one of the three internal splicing events must occur 3' of primer c6, and at least two splicing events must occur within the region flanked by p4 and c6 (coordinates 13979-15170). RT-PCR with primers p6 and c8 yielded a 2677 bp product corresponding to the unprocessed precursor and a product of approximately 2300 bp from one partially spliced intermediate (figure 2-2B, lane 10). The only RT-PCR product with primers p7 and c8 is the unprocessed precursor (figure 2-2B, lane 12). Since p6 and p7 are the complements of c6 and c7, respectively, there is only one internal processing event between c6 and c7 (coordinates 15148-15839). We therefore conclude that psbC intron 2 contains at least three internal introns and that all of the splicing events occur within the region flanked by p4 and c7 (figure 2-2A).
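The size bookkeeping behind these assignments can be checked with a few lines of code. The Python sketch below is illustrative only: it combines the primer-bounded coordinates quoted above with the internal intron coordinates determined by cDNA sequencing in the next section (2b, 14161-14713; 2c, 14775-15143; 2d, 15432-15783). It assumes all intervals are inclusive and that an internal intron changes a product's length only when it lies entirely between the two primers. The function names are hypothetical.

```python
"""Sketch: expected RT-PCR product sizes for every combination of spliced introns.

Assumes inclusive coordinates and that an intron alters a product's length only
if it lies wholly between the two primers.
"""
from itertools import combinations

# Internal introns of psbC intron 2 (start, end), coordinates from the text.
INTRONS = {
    "2b": (14161, 14713),
    "2c": (14775, 15143),
    "2d": (15432, 15783),
}


def length(start: int, end: int) -> int:
    """Length of an inclusive coordinate interval."""
    return end - start + 1


def product_sizes(amp_start: int, amp_end: int) -> dict[tuple[str, ...], int]:
    """Map each combination of spliced internal introns to a product size (bp)."""
    # Only introns fully inside the amplicon can shorten it.
    inside = [name for name, (s, e) in INTRONS.items()
              if amp_start <= s and e <= amp_end]
    full = length(amp_start, amp_end)
    sizes = {}
    for k in range(len(inside) + 1):
        for spliced in combinations(inside, k):
            sizes[spliced] = full - sum(length(*INTRONS[n]) for n in spliced)
    return sizes


if __name__ == "__main__":
    amplicons = {"p4-c7": (13979, 15839), "p4-c6": (13979, 15170)}
    for label, (start, end) in amplicons.items():
        print(label)
        for spliced, bp in sorted(product_sizes(start, end).items(),
                                  key=lambda item: -item[1]):
            print(f"  spliced {', '.join(spliced) or 'none'}: {bp} bp")
```

For p4-c7 the sketch gives 1861 bp with nothing spliced, and 1509, 1140, 956 and 587 bp when 2d, 2c+2d, 2b+2d or all three introns are removed. These are the sizes of the products in figure 2-2B. For p4-c6, which does not span 2d, it gives 1192, 823, 639 and 270 bp.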

Characterization of internal 5' and 3' splice sites within psbC intron 2

To locate the insertion sites of the internal introns, products from PCR reactions with primers p2-c6, p4-c6, p6-c7 (PCR data not shown) and p1-c10 were cloned and sequenced. Schematic diagrams of the cDNAs are shown in figure 2-3A. Representative sequencing data showing the splice sites of the internal introns (designated 2b, 2c and 2d) and of the external intron 2a are displayed in figure 2-3B. Intron 2b is 553 nt (coordinates 14161-14713), intron 2c is 369 nt (coordinates 14775-15143) and intron 2d is 352 nt (coordinates 15432-15783). The splice sites of introns 2a and 2d are unambiguous. The splice sites of introns 2b and 2c could be shifted by -1 nt and yield the same cDNA sequence.

Figure 2-3. Characterization of three internal introns by cDNA cloning and sequencing. A. Schematic diagram of the four PCR fragments cloned and sequenced. The names of the plasmids containing the PCR fragments are listed at right, with a description of the cDNA. pECZ2020 was obtained from PCR p4-c6, pECZ2021 from PCR p6-c7, pECZ2017 from PCR p2-c7, and pECZ2023 from PCR p1-c10. B. Sequencing data showing the splice junctions of introns (from left to right) 2c, 2d, 2b and 2c, and 2a. Splice sites are indicated by arrows. External intron 2a or psbC exon sequence flanking each intron is shown to the right of each panel.

[Figure 2-3A: cDNA maps (exons 1+2, intron 2a, orf 758; primers p1, p2, p4, c10) for pECZ2023, fully spliced; pECZ2021, 2d spliced; pECZ2017, 2b and 2c spliced; pECZ2020, 2c spliced]

[Figure 2-3B: sequencing ladders (lanes G, A, T, C) for pECZ2020 (2c spliced), pECZ2021 (2d spliced), pECZ2017 (2b and 2c spliced) and pECZ2023 (fully spliced)]

Intron secondary structure models

The sizes of the three introns internal to psbC intron 2 are typical of Euglena chloroplast group II introns (Copertino and Hallick, 1993; Hallick et al., 1993). Models for the secondary structures of introns 2a, 2b, 2c and 2d are shown in figure 2-4. These intron structures conform to the model proposed by Michel for group II intron secondary structure (Michel et al., 1989) and are similar to other Euglena chloroplast group II introns (Hong and Hallick, 1994b). Each model includes six helical domains (I-VI) radiating from a central core. Domains V and VI are the most diagnostic of Euglena group II introns. Domain V is a potential catalytic domain, with the conserved pairing 5'-(A/U)AGCU...(A/G)GUUU-3' at the base of the lower stem and a 5'-(A/U)G-3' bulge. Each domain VI contains the adenosine residue involved in lariat formation, at position -7 or -8 from the 3' splice site (Lambowitz and Belfort, 1993; Michel et al., 1989). In addition to the helical domains, several potential tertiary interactions believed to be involved in splicing are suggested, including exon binding site I (EBS I) in the subdomain Diii loop of domain I, for pairing with IBS I of the 5' exon; EBS II in the subdomain Di bulge of domain I, for potential pairing with IBS II of the 5' exon (Michel and Jacquier, 1987); the γ-γ' interaction involved in recognizing the 3' splice site (Jacquier and Michel, 1990; Michel and Jacquier, 1987); the ε-ε' interaction involved in locating the 5' splice site (Jacquier and Michel, 1990); and the α-α' interaction between subdomains IB and ID (Harris-Kerr et al., 1993). Note that the γ-γ', ε-ε' and α-α' interactions are not well conserved in these introns. As is typical of Euglena gracilis chloroplast group II introns, secondary structures and tertiary interactions may be abbreviated when compared with self-splicing group II introns from other genomes (Copertino and Hallick, 1993). The helical stem of domain IV of external intron 2a, the largest of the four introns, has more extensive pairing than in most other Euglena group II introns. All three internal introns, as well as the open reading frame, are within domain IV of intron 2a.

Figure 2-4. Secondary structure model for psbC intron 2: external intron 2a and internal introns 2b, 2c and 2d. The secondary structures of psbC introns 2a (A), 2b (B), 2c (C) and 2d (D) are based on the model proposed by Michel et al. (1989) and on comparative analysis of other Euglena group II introns (Hallick, unpublished data). Orf 758 is located in the loop of domain IV of intron 2a. The insertion sites of the three internal introns are all within the loop of domain IV of intron 2a and are indicated by arrows. The 5' and 3' splice sites of each intron are indicated by thin arrows. IBS (I/II)-EBS (I/II) pairing regions are indicated by thin arrows. Possible nucleotides in the guided pair and in the γ-γ' and ε-ε' interacting regions are enclosed in boxes, and hypothetical tertiary interactions are shown as dotted or solid lines. The α-α' interactions are indicated by thin arrows.

[Figure 2-4 panels A-D: secondary structure diagrams of introns 2a, 2b, 2c and 2d, with domains I-VI labeled]

psbC intron 2 splicing pathway is partially ordered

If splicing of the three internal introns, 2b, 2c and 2d, were unordered, PCR amplification of cDNA with primer pairs p2-c9 or p4-c7 should yield eight products. However, only five products are detected (figure 2-2, lanes 4 and 6). Splicing intermediates lacking only intron 2c or only intron 2d are similar in size and would not be resolved. Similarly, intermediates lacking both introns 2c and 2b, or both 2d and 2b, would not be resolved either. These could account for two of the missing products. Also missing was an intermediate corresponding in size to a pre-mRNA with intron 2b excised and introns 2c and 2d retained. During DNA sequence analysis of partially excised intermediates, pre-mRNAs with 2d excised, 2d and 2c excised, 2d and 2b excised, and all three introns excised were identified. No products with 2d retained and 2b excised were found. To test directly whether removal of intron 2d precedes excision of intron 2b, cDNA was synthesized using primer c12, which crosses the 3' splice junction of intron 2d (figure 2-5A), and was PCR-amplified with primers c12 and p4 (figure 2-5). All products amplified in this reaction must contain 2d. Products corresponding to the unexcised mRNA (1822 nt), mRNA with intron 2c excised (1453 nt), and mRNA with introns 2b and 2c excised (900 nt) were obtained. No intermediate with only 2b excised was detected. In addition, relative to the splicing intermediates, much more unspliced mRNA was observed than in previous reactions.

Figure 2-5. Dominant processing pathway of the three internal introns. Primer c12 crosses the 3' splice junction of intron 2d. Primer c11 crosses the 5' splice junction of intron 2d. Control PCR with Euglena gracilis chloroplast total nucleic acid is labeled "DNA". λ BstEII molecular weight markers are shown in the left lane, and their sizes (in base pairs) are shown on the left of the gel. Sizes of some PCR products are shown on the right. Locations of the primers used for cDNA synthesis and PCR amplification are shown on the psbC map in panel A.

Figure 2-6. Predominant pathway of psbC intron 2 mRNA processing. A model of the predominant processing pathway of the three internal psbC introns is shown. Black boxes represent exons 2 and 3; solid lines represent intron 2a. The white open box represents open reading frame 758. Internal introns 2b, 2c and 2d are shown as lettered lollipops. Thick arrows indicate the preferential pathway.

A partially ordered RNA splicing pathway for the three internal introns, consistent with the intermediates detected by RT-PCR and DNA sequence analysis, is shown in figure 2-6. Introns 2c and 2d are preferentially spliced before 2b. Splicing of 2b before both 2c and 2d was not observed.
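The band-counting argument above can be made explicit. The Python sketch below is a model under stated assumptions, not the analysis actually performed. It enumerates every combination of excised internal introns and computes the corresponding p4-c7 product size from the intron lengths (553, 369 and 352 nt) and the 1861 bp unspliced product. It then merges sizes that a gel is assumed not to resolve. Both the 25 bp resolution limit and the encoding of the ordering rule ("2b is excised only after 2c or 2d") are hypothetical choices made for illustration.

```python
"""Sketch: unordered versus partially ordered splicing of introns 2b, 2c and 2d.

Assumptions made for illustration (not from the thesis): products closer
than RESOLUTION_BP are not resolved on the gel, and the ordering rule is
encoded as "2b is excised only after 2c or 2d has been removed".
"""
from itertools import combinations

INTRON_LEN = {"2b": 553, "2c": 369, "2d": 352}  # nt, from the text
UNSPLICED_P4_C7 = 1861                            # bp, unspliced p4-c7 product
RESOLUTION_BP = 25                                # hypothetical gel resolution


def size(state: frozenset) -> int:
    """p4-c7 product size once the introns in `state` are excised."""
    return UNSPLICED_P4_C7 - sum(INTRON_LEN[n] for n in state)


def all_states() -> set:
    """Every subset of excised introns, i.e. what unordered splicing allows."""
    names = list(INTRON_LEN)
    return {frozenset(c)
            for k in range(len(names) + 1)
            for c in combinations(names, k)}


def can_excise(state: frozenset, intron: str) -> bool:
    """Partially ordered rule: 2b waits until 2c or 2d is gone."""
    if intron == "2b":
        return bool(state & {"2c", "2d"})
    return True


def reachable_states() -> set:
    """States reachable from the precursor by single, rule-abiding excisions."""
    start = frozenset()
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        for intron in INTRON_LEN:
            if intron not in state and can_excise(state, intron):
                nxt = state | {intron}
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return seen


def band_count(states: set, resolution: int = RESOLUTION_BP) -> int:
    """Bands seen if adjacent product sizes closer than `resolution` co-migrate."""
    sizes = sorted(size(s) for s in states)
    bands = 1
    for previous, current in zip(sizes, sizes[1:]):
        if current - previous >= resolution:
            bands += 1
    return bands


if __name__ == "__main__":
    unordered = all_states()
    ordered = reachable_states()
    print("unordered:", len(unordered), "species,", band_count(unordered), "bands")
    print("ordered:  ", len(ordered), "species,", band_count(ordered), "bands")
    print("never formed:", [sorted(s) for s in unordered - ordered])
```

Under these assumptions, unordered splicing produces eight species that collapse into six bands. The partially ordered rule produces seven species in five bands and never forms the 2b-only intermediate (1308 bp), in line with the five products observed.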

Internal introns of psbC interrupt an open reading frame encoding a putative maturase

Our initial description of the DNA sequence of the psbC operon included a 4143 nt psbC intron 2 with open reading frames of 177 and 241 codons (Hallick et al., 1993). This sequence has since been corrected to 4144 nt (EMBL Accession X70810 Release 37, correction at position 17874). In the revised sequence, orfs 177 and 241 are part of a continuous open reading frame of 635 codons. After splicing of introns 2b, 2c and 2d, the orf is extended to 758 codons. Intron 2b is located between a putative ribosome binding site and a methionine codon at the beginning of orf 758 (figure 2-1). Intron 2c lies 57 nt into orf 758, between the codons for amino acids 19 and 20. Intron 2d is located 287 nt into orf 758 and splits the codon for amino acid 95. psbC intron 2 is the first example of an intron-encoded gene that is itself interrupted by introns. Orf 758 would encode a protein with a predicted size of 93 kDa, 25% of whose amino acids are aromatic. The predicted overall charge of orf 758 is +81 (pH 5.5), with a pI of 10.69. High positive charge and high aromatic amino acid content may be indicative of RNA binding activity. In fact, orf 758 has previously been identified as maturase-like on the basis of an X domain (Mohr et al., 1993).

[Figure 2-7 panels A and B: alignments of reverse transcriptase domains V, VI and VII]

Figure 2-7. Comparison of the reverse transcriptase domains V-VII of orf 758 (top line) with the chloroplast consensus (bottom line, panel A) and the general consensus (bottom line, panel B). Domain VI is underlined. A variable-sized gap occurs in the general consensus between domains VI and VII (&). A gap of two amino acids occurs between domains VI and VII in the orf 758 alignment in panel A. Similar amino acids are indicated by a dot (•) and identical amino acids by an asterisk (*). Positions with no consensus amino acid are indicated by a dash (-). Classes of amino acids are as follows: h = hydrophobic, p = polar, b = basic, a = acidic, o = aromatic.

Many maturase-like proteins also include reverse transcriptase domains I-VII, of which domains V-VII are usually the best conserved. To search for domains V-VII in orf 758, the amino acid sequence was aligned visually with a chloroplast group II intron maturase (matK) consensus sequence (figure 2-7A) and with a general reverse transcriptase consensus (figure 2-7B) (Xiong and Eickbush, 1990). In both comparisons, the characteristic motifs YVRY in domain V and FLG in domain VII could be aligned. In addition, the alignments could be extended to many positions where the consensus specifies a class of amino acid (such as hydrophobic) rather than a particular residue.
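The distinction drawn above between exact and class-level agreement can be illustrated with a short scoring routine. The Python sketch below annotates a gap-free, pre-aligned pair in the notation of figure 2-7: asterisk for identity, dot for class match, dash where the consensus is undefined. The residue membership of each class code is an assumption made for illustration, not taken from this study. The example sequences are toy strings rather than mat 2 or matK sequences.

```python
"""Sketch: annotate a query against a consensus that mixes residues and classes."""

# Hypothetical residue sets for the figure 2-7 class codes (assumed, not sourced).
CLASSES = {
    "h": set("AVLIMFWC"),  # hydrophobic
    "p": set("STNQYGH"),   # polar
    "b": set("KRH"),       # basic
    "a": set("DE"),        # acidic
    "o": set("FWY"),       # aromatic
}


def annotate(query: str, consensus: str) -> str:
    """Return a match line for two pre-aligned, equal-length strings.

    Consensus symbols: uppercase letter = specific residue, lowercase letter =
    residue class (a key of CLASSES), '-' = no consensus at that position.
    Output symbols: '*' identical, '.' class match, '-' no consensus,
    ' ' mismatch.
    """
    if len(query) != len(consensus):
        raise ValueError("query and consensus must be aligned to equal length")
    marks = []
    for residue, symbol in zip(query.upper(), consensus):
        if symbol == "-":
            marks.append("-")
        elif symbol.isupper():
            marks.append("*" if residue == symbol else " ")
        else:
            marks.append("." if residue in CLASSES[symbol] else " ")
    return "".join(marks)


def summarize(match_line: str) -> dict[str, int]:
    """Count identities, class matches, mismatches and undefined positions."""
    return {
        "identical": match_line.count("*"),
        "class_match": match_line.count("."),
        "mismatch": match_line.count(" "),
        "no_consensus": match_line.count("-"),
    }


if __name__ == "__main__":
    query = "YVKYLNFLG"      # toy sequence
    consensus = "YVRYh-FLG"  # toy consensus mixing residues and a class code
    line = annotate(query, consensus)
    print(query)
    print(line)
    print(summarize(line))
```

In this sketch a K facing a consensus R counts as a mismatch even though both are basic. Whether such positions should score as similar depends on whether the consensus was written with a specific residue or with a class code at that position.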
Since orf 758 is located within domain IV of a group II intron, the canonical position of intron-encoded maturases, and contains a maturase-like domain X and reverse transcriptase-like domains V, VI and VII, it has been designated mat 2. Reverse transcriptase domains I-IV, domain Z and zinc-finger domains could not be identified in mat 2.

DISCUSSION

A new category of twintron

psbC intron 2 is a group II intron (2a) encoding the 758 aa mat 2 locus, which is interrupted by three additional group II introns. Two of the internal introns (2c and 2d) are within the coding region of mat 2. The third internal intron (2b) is located between a potential ribosome binding site and the start codon of mat 2. This potential ribosome binding site, 5'-GUAGU-3', centered at position -18 relative to the mat 2 ATG start codon, is complementary at 4 of 5 positions to 3'-CCUCA-5' at the 3' end of Euglena chloroplast 16S rRNA. Intercistronic introns have been reported previously (Barkan, 1988; Stevenson et al., 1991); however, this is the first example of introns within an intron-encoded maturase, and the first example of a group II intron in a 5' untranslated leader sequence (figure 2-1). As a complex twintron with an internal gene, psbC intron 2 represents a new type of twintron.
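The 4-of-5 complementarity figure comes from laying the mRNA motif antiparallel to the rRNA 3' end. In the Python sketch below, the rRNA is written 3' to 5', so position i of each string faces position i of the other. Counting only Watson-Crick pairs by default, with G·U wobble pairs as an optional extra, is an assumption of this sketch; the function name is hypothetical.

```python
"""Sketch: count base pairs between a Shine-Dalgarno-like motif and the rRNA 3' end."""

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def paired_positions(mrna_5to3: str, rrna_3to5: str,
                     allow_wobble: bool = False) -> int:
    """Number of positions that can pair when the two strands are antiparallel.

    `mrna_5to3` is written 5'->3' and `rrna_3to5` is written 3'->5', so the
    i-th characters of the two strings face each other.
    """
    if len(mrna_5to3) != len(rrna_3to5):
        raise ValueError("motifs must have the same length")
    allowed = (WATSON_CRICK | WOBBLE) if allow_wobble else WATSON_CRICK
    return sum((m, r) in allowed
               for m, r in zip(mrna_5to3.upper(), rrna_3to5.upper()))


if __name__ == "__main__":
    # mat 2 leader motif against the 3' end of E. gracilis chloroplast 16S rRNA,
    # as quoted in the text.
    print(paired_positions("GUAGU", "CCUCA"))                     # 4
    print(paired_positions("GUAGU", "CCUCA", allow_wobble=True))  # 4
```

The single unpaired position is the U of the motif facing the second C of the rRNA sequence; allowing wobble pairs does not change the count here.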

A partially ordered psbC intron 2 splicing pathway

The partially ordered splicing pathway of the psbC intron 2 internal introns, with preferential initial splicing of intron 2d, was an unanticipated result. Internal group II introns in other Euglena chloroplast genes normally splice by independent, unordered events (Koller et al., 1985). Intron 2d may interfere with the normal folding of introns 2b and 2c, or keep splicing factors from contacting them. It is possible that only 2d is correctly folded and exposed to splicing factors in the unspliced pre-mRNA. Splicing of 2d may allow a shift in the tertiary structure that lets 2b and 2c splice in an unordered sequence. Excised, intact 4144 nt psbC intron 2 RNA was not detectable by northern hybridization with intron-specific probes (data not shown). This is a preliminary indication that the external intron might not be excised before the internal introns are spliced. Unspliced domain IV, at 3707 nt, is possibly too large to fold properly and may interfere with the folding of the entire intron.

mat 2 may be required for splicing

Why are internal introns 2b, 2c and 2d maintained? It has been proposed that the internal introns of twintrons keep their ability to splice as a species evolves because they interrupt functional domains of the external intron. If the internal intron were to lose the ability to splice, so would the external intron, and as a consequence the host gene would not be expressed (Copertino and Hallick, 1993). It is unlikely that internal introns 2b, 2c and 2d interrupt any essential RNA structure. In the secondary structure model of psbC intron 2a, the three internal introns are located in the loop of domain IV. No RNA structural elements required for intron excision have been identified in domain IV (Copertino and Hallick, 1993; Lambowitz and Belfort, 1993; Michel et al., 1989). Experiments have also shown that deletion of domain IV does not block group II self-splicing in vitro (Hebbar et al., 1992; Koch et al., 1992), and trans-splicing of introns split within domain IV occurs in chloroplasts in vivo (Bonen, 1993). Therefore, the presence of the three internal introns in domain IV may not be sufficient to block excision of intron 2a at the RNA level. Alternatively, besides their additive effect on intron size, internal introns 2b, 2c and 2d could be viewed as interrupting the coding region of an essential protein. Splicing of introns 2b, 2c and 2d may be a prerequisite for expression of mat 2, a possible intron splicing factor.

Possible functions of mat 2

psbC intron 2 is one of three Euglena gracilis chloroplast introns encoding internal maturase-like polypeptides. Intron-encoded genes for maturase-like proteins are known in group II introns of both mitochondria and chloroplasts. Some of these proteins have been shown to mediate splicing or potentially contribute to intron mobility (Carignani et al., 1983; Moran et al., 1995). The group II intron-encoded maturase-like proteins comprise a subgroup of retroelements (Doolittle et al., 1989; Xiong and Eickbush, 1988) and are characterized by the presence of one or more domains found in retroelements, including the Z, reverse transcriptase, X and zinc finger domains (Mohr et al., 1993). The X domain is a potential RNA binding domain and has been found in all group II intron-encoded maturases and maturase-like proteins. The zinc finger domain may be a remnant of a retroviral endonuclease and has recently been shown not to be necessary for maturase activity (Zimmerly et al., 1995). Only the X and reverse transcriptase domains have been shown to be required for maturase activity (Moran et al., 1994; Moran et al., 1995). Significantly, an X domain and reverse transcriptase domains V-VII have been identified in mat 2. However, the putative Euglena gracilis mat 2 protein diverges significantly from other group II intron-encoded maturase-like proteins: only reverse transcriptase domains V-VII have been identified, and the X domain also differs from the consensus. There are also significant differences between the mitochondrial and chloroplast group II intron-encoded maturase-like proteins. For example, the mitochondrial reverse transcriptase domain V consensus is YVRYADD, and the YADD motif is required for activity in retroviral reverse transcriptases, whereas in chloroplasts the consensus is YVRY. Loss or modification of domains may reflect differences in substrates or changes in function.
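The domain V comparison above is, in practice, a motif search. The sketch below is a hypothetical helper, not part of the original analysis. It scans the RT-domain rows of the Figure 3-6 alignment (positions 351-400, alignment gaps removed) for the exact YVRY and YADD motifs. It also uses a relaxed pattern, Y.R[YF], which is an illustrative assumption chosen only to show how looser matches can be collected.

```python
import re

# RT-domain rows from Figure 3-6 (positions 351-400), with alignment gaps removed.
RT_ROWS = {
    "E. gracilis": "FNNKVFTVYEKNIYYVRYLNFLIFGFLSSKNFIFFFKLKYLFFLRNKL",
    "E. viridis": "MTNVSTFLFEKNISYLRYFSHIILGVIGSKNFFNFFFKKILTFVRSSI",
    "E. anabaena": "KNFYKKKFKVFLKKILYVRFLHFFLFGTIASKSFSFQILEKVLCYTKTNL",
    "E. granulata": "NCLIFTNDLFFLKTFFYARYLDYLLLGIKGSNKFALNVSKKLSNFVRSRI",
    "E. spirogyra": "KYYKYKNFFYTINNSSVYSSLVPLKITSGSFYYPFLTRNTL",
}

# Exact motifs named in the text, plus a relaxed pattern (illustrative assumption).
PATTERNS = {"YVRY": r"YVRY", "YADD": r"YADD", "relaxed": r"Y.R[YF]"}


def scan(seq: str) -> dict[str, list[int]]:
    """Return the 0-based start positions of each pattern in seq."""
    return {name: [m.start() for m in re.finditer(pat, seq)]
            for name, pat in PATTERNS.items()}


for species, row in RT_ROWS.items():
    print(species, scan(row))
```

In these rows only E. gracilis carries an exact YVRY, and no species carries YADD. The relaxed pattern picks up YLRY, YVRF and YARY in three of the other species.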
The term "maturase activity" has not been rigorously defined but is generally used to describe any enhancement or enablement of splicing in maturase-dependent introns. It is possible that maturase-like proteins serve different roles in different organisms. As a potential maturase, mat 2 is a prime candidate for a role in splicing one or more Euglena chloroplast group II introns. Known group II intron-encoded maturases act specifically on the intron that encodes them or on closely related introns (Carignani et al., 1983; Carignani et al., 1986). However, of the more than 155 Euglena chloroplast introns, only 3 encode maturase-like proteins. One result of divergence from the consensus domains may therefore be a more general maturase activity. Functional assays should provide additional information regarding the activity and specificity of mat 2.

Evolution of mat 2

Many intriguing questions about the origin and spread of group II and group III introns in genus Euglena remain. Our working hypothesis is that, at the DNA level, the chloroplast genome of genus Euglena arose from an intron-less ancestor shared with the chloroplast genomes of other photosynthetic eukaryotes, including land plants (Thompson et al., in press). In this model, also suggested by others (Lambowitz and Belfort, 1993), ancestral introns were mobile, retrotransposable genetic elements that invaded the genome from another organism, relying in part on internally encoded enzyme activities for mobility. Possibly a single "founder" intron arrived via intergenomic (and interspecies) transfer and was subsequently propagated intragenomically, into other genes, into introns, or into intergenic spacers. Introns of contemporary Euglena species have since lost mobility but retained the ability to be spliced. To explore some of these questions, an analysis of mat 2 in evolutionarily diverse species of genus Euglena has been initiated. The first significant result is that an intron homologue encoding mat 2 is present in a similar psbC group II intron of E. viridis (Zhang and Hallick, unpublished observation). E. viridis lacks the majority of the group II and group III introns of E. gracilis, but does contain a homologue of E. gracilis psbC intron 2.

MATERIALS AND METHODS

RNA isolation

Euglena gracilis chloroplasts were isolated and RNA was purified from isolated chloroplasts as described previously (Hallick et al., 1982).

cDNA synthesis

Primers were synthesized by the Midland Certified Reagent Company or the University of Arizona Biotechnology Center. Seven synthetic deoxynucleotide primers complementary to the RNA-like strand were used for cDNA synthesis. cDNA primer c10 (5'-GCAAATCCAGTTGATTCCTG-3', coordinates 17895-17914, EMBL accession #70810) is complementary to the RNA-like sequence of psbC exon 3; primer c9 (5'-CCACCTACAGTTAGTTAAATCG-3', coordinates 17866-17887) is complementary to the RNA-like sequence at the psbC intron 2a-exon 3 junction; primer c8 (5'-GCACATACAGCTTTCTGACTGAC-3', coordinates 17802-17824) is complementary to the RNA-like strand at the 3' end of psbC intron 2a and includes the hypothetical 3' stem of domain IV of the external intron 2a; primer c6 (5'-CCAACCACATTTTAAACTTAACC-3', coordinates 15148-15170) and primer c7 (5'-GGACACTGCTTACAAAAAAATGG-3', coordinates 15817-15839) are complementary to the RNA-like sequence at the 5' end of orf 758. To determine the splicing pathway, primer c11 (5'-ATAATCACACACATTCAATTAAATTCATGC-3', coordinates 15412-15441) and primer c12 (5'-GATGTAACATTAAATATTAAAAACTAAG-3', coordinates 15773-15800) were designed to cross the 5' and 3' splice-site junctions of intron 2d, respectively. cDNA was synthesized using 200 ng of cDNA primer and 10 µg of purified total chloroplast RNA as template. Reactions were carried out as described previously (Copertino and Hallick, 1991).
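Because each primer is listed with inclusive genome coordinates, one quick consistency check is that the sequence length equals end - start + 1. The sketch below runs that check on the seven cDNA primers. The helper names are hypothetical and the check is not part of the original protocol.

```python
# cDNA primers from the text: name -> (5'->3' sequence, first coordinate, last coordinate).
CDNA_PRIMERS = {
    "c10": ("GCAAATCCAGTTGATTCCTG", 17895, 17914),
    "c9": ("CCACCTACAGTTAGTTAAATCG", 17866, 17887),
    "c8": ("GCACATACAGCTTTCTGACTGAC", 17802, 17824),
    "c6": ("CCAACCACATTTTAAACTTAACC", 15148, 15170),
    "c7": ("GGACACTGCTTACAAAAAAATGG", 15817, 15839),
    "c11": ("ATAATCACACACATTCAATTAAATTCATGC", 15412, 15441),
    "c12": ("GATGTAACATTAAATATTAAAAACTAAG", 15773, 15800),
}


def span_length(start: int, end: int) -> int:
    """Length of an inclusive coordinate range."""
    return end - start + 1


def length_mismatches(primers: dict[str, tuple[str, int, int]]) -> list[str]:
    """Names of primers whose sequence length disagrees with their coordinates."""
    return [name for name, (seq, start, end) in primers.items()
            if len(seq) != span_length(start, end)]


print(length_mismatches(CDNA_PRIMERS))  # -> []
```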

PCR analysis, cloning and sequencing:

Oligonucleotides for PCR amplification were as follows: primer p1 (5'-GTGGAAACGCTCTTTAATAAAAAAT-3', coordinates 13163-13176 + 13720-13730) is RNA-like and covers psbC exons 1 and 2; primer p2 (5'-CGCTCTTTAATAAAAAATGTGTGGC-3', coordinates 13169-13176 + 13720-13737) contains psbC exons 1 and 2 and crosses the exon 2-intron 2a junction; primer p3 (5'-TAATAAAAAATGTGTGGCATGG-3', coordinates 13720-13741) is located at the psbC exon 2-intron 2 junction; primer p4 (5'-GTTTAGATTTATCCGCTTTGGG-3', coordinates 13979-14000) and primer p5 (5'-GGTAAAAATTAATTGTCCTGACTGGC-3', coordinates 14087-14112) are located at the 5' end of intron 2. P5 corresponds to the hypothetical 5' stem of domain IV of external intron 2a. Primer p6 (5'-GGTTAAGTTTAAAATGTGGTTGG-3', coordinates 15148-15170) is the reverse complement of c6. Primer p7 (5'-CCATTTTTTTGTAAGCAGTGTCC-3', coordinates 15817-15839) is the reverse complement of c7. Various combinations of PCR and cDNA primers were used to amplify synthetic cDNA and, as a control, Euglena gracilis chloroplast total nucleic acid. All reactions were incubated at 80°C for 3 minutes before 3 units of Taq polymerase (Perkin-Elmer) were added to each reaction. Amplification consisted of 25 cycles of 94°C for 1 min, 50°C for 2 min and 72°C for 3 min. PCR products were gel purified and cloned into the ddT-tailed EcoRV site of the Bluescript KS(-) vector (Stratagene) (Kovalic et al., 1991). Plasmids containing inserts were sequenced using the Sequenase kit (USB).
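The statement that p6 and p7 are the reverse complements of c6 and c7 can be checked mechanically. The sketch below is a minimal helper that assumes uppercase DNA written 5' to 3'. The function and table names are illustrative.

```python
# Complement table for DNA (uppercase A, C, G, T only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


# PCR primer and the cDNA primer it should be the reverse complement of (from the text).
PAIRS = {
    "p6/c6": ("GGTTAAGTTTAAAATGTGGTTGG", "CCAACCACATTTTAAACTTAACC"),
    "p7/c7": ("CCATTTTTTTGTAAGCAGTGTCC", "GGACACTGCTTACAAAAAAATGG"),
}

for name, (pcr_primer, cdna_primer) in PAIRS.items():
    print(name, reverse_complement(cdna_primer) == pcr_primer)  # both print True
```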

Computer analysis of orf 758:

Multiple alignments were performed with PILEUP and PRETTY from the GCG Wisconsin Package (Genetics Computer Group, September 1994). Chloroplast matK protein sequences from Pinus contorta (P24685), Pinus thunbergii (Q00866), Oryza sativa (P12175), Secale cereale (JN0302), Nicotiana tabacum (P12176), Solanum tuberosum (P32088), Saxifraga integrifolia (P36436), Pisum sativum (S08056), Sinapis alba (P09364) and Hordeum vulgare (S28765) were aligned. In addition, the Swissprot, GenBank and EMBL databases were searched for homologous proteins using the Smith and Waterman algorithm on Genquest (Devereux et al., 1984).

CHAPTER 3

PSBC INTRON 2 AND MAT 2 ARE DEEPLY ROOTED IN GENUS EUGLENA, WITH VARIOUS INTERNAL INTRON CONTENT

Introduction

Compared to the group II introns found in mitochondria and other chloroplasts, Euglena chloroplast group II introns are shorter and have abbreviated secondary structures. Many tertiary interaction elements suggested in the canonical model (Michel, 1989) are missing. Sequence analysis indicates that Euglena chloroplast group II introns may fold slightly differently from other group II introns. Furthermore, in vitro self-splicing of a Euglena chloroplast group II intron has never been demonstrated. A likely explanation is that, lacking efficient tertiary structure, Euglena group II introns rely much more on trans-acting factors, such as intron maturases, for processing.

A typical group II intron maturase is encoded by a group II intron. Maturases contain several functional motifs, including a reverse transcriptase (RT) domain (I-VII), an X domain, a Z domain and a C-terminal zinc finger domain (Mohr, 1993; Lambowitz, 1995). The X domain may be involved in RNA binding. The zinc finger is believed to be an endonuclease domain (Zimmerly, 1995), and the RT domain can synthesize DNA from an RNA template. This combination of features enables group II intron maturases not only to mediate intron excision but also to promote intron mobility. The maturases encoded by two yeast mitochondrial group II introns, aI1 and aI2 of the cox I gene, have been shown to insert their host introns into intron-less alleles in vitro and in vivo (Moran, 1995). The mechanism of intron homing involves reverse transcription of the intron, double-strand cleavage, and ligation of the DNA molecules (Zimmerly, 1995; Zimmerly, 1995b).

Genetic data have also shown that amino acid changes within the X domain reduce the mobility of the intron (Eskes, 1997). In the Euglena gracilis chloroplast genome, open reading frames are found in three introns. A group III twintron (intron-within-intron), psbC intron 4, has been shown to encode a 53 kD maturase-like protein, mat 1 (Copertino, 1994; Doetsch, in press). Two group II introns, psbC intron 2 and psbD intron 8, also contain potential open reading frames. While psbD intron 8 has not been well studied, psbC intron 2 has been thoroughly characterized. psbC intron 2 of Euglena gracilis is a 596 nt group II intron that contains a 758 aa open reading frame, which is itself interrupted by 3 additional group II introns. Two of the characteristic maturase functional domains, the X domain and a portion of the RT domain, have been identified in the predicted protein sequence. The 3 group II introns internal to the open reading frame may be involved in regulating expression of the open reading frame (Zhang, 1995). This open reading frame has been named mat 2 and is believed to encode a maturase-like protein. Considering that there are nearly 90 group II introns in the Euglena gracilis chloroplast and only 2 potential group II intron-encoded maturases, we believe that mat 2 may be a general splicing factor involved in the excision of all Euglena chloroplast group II introns. Although Euglena gracilis is a single-celled protist, a primitive form of eukaryote, its chloroplast intron content is higher than that of land plants. Studies of Euglenoid intron evolution have shown that other Euglena species, as well as some basally branching Euglenoids, have far fewer introns than E. gracilis (Thompson, 1995). The source of these introns and their evolution in the Euglenoid chloroplast genome is a very interesting question.
Phylogenetic analysis, together with the discovery of twintrons in Euglena, has provided strong evidence supporting the "introns late" hypothesis, which proposes that introns are mobile elements that have inserted into intron-less genomes (Palmer, 1991). The invasion of these mobile elements is thought to have been mediated by internally encoded protein factors, or maturases, through reverse transcriptase or endonuclease activities. Since mat 2 has remnants of the RT domain and a potential iron-binding domain at the C-terminus, we propose that psbC intron 2 could be one of the earliest introns to invade the Euglena chloroplast genome, and that mat 2 could have played a crucial role in intron evolution in the Euglenoid chloroplast. If mat 2 was a key element in recruiting group II introns into the Euglena chloroplast and is involved in their processing, we would expect mat 2 to be widespread throughout genus Euglena, and therefore expect to find psbC intron 2 and mat 2 homologues in many other Euglena chloroplast genomes. On this rationale, we screened 10 Euglena species for psbC intron 2 homologues, and the results from 5 of these species were subjected to thorough analysis of the structural features of psbC intron 2 and the mat 2 gene, and of their evolutionary conservation.

Results

PCR amplification of ten Euglena species

To detect homologues of E. gracilis psbC intron 2 in other species of Euglena by the polymerase chain reaction (PCR), primers specific to the consensus sequences of psbC exon 1 and exon 3 were used to amplify DNA from total nucleic acid (TNA) extracts of 10 Euglena species. PCR results are shown in figure 3-1. As a positive control, PCR of E. gracilis TNA was also carried out, yielding a fragment of about 4.8 kb (4796 bp), the expected size of a segment spanning part of exon 1, all of intron 1, exon 2 and intron 2, and part of exon 3. The products obtained from E. geniculata and E. mutabilis appear to be the same size, indicating that their intron contents are likely the same as that of E. gracilis. PCR products from E. viridis, E. anabaena, E. granulata and E. sanguinea were similar to each other in that they were all about 1200 bp smaller than the product from E. gracilis (the faint fragments of about 100 bp in the E. viridis and E. anabaena reactions were shown by sequencing to be primer dimers; data not shown). Since intron 1 of E. gracilis is about 1000 bp, it was predicted that E. viridis, E. anabaena, E. granulata and E. sanguinea lack psbC intron 1 yet contain a large psbC intron 2 (shown in section 4). The product from E. stellata was difficult to characterize by size comparison alone. At approximately 4300 bp, it is too large to contain only intron 2, yet too small to contain both introns 1 and 2. We hypothesized that E. stellata could contain both introns in modified sizes, or perhaps a completely different intron content. The E. spirogyra PCR product was a relatively small fragment of 2592 bp. We predicted that E. spirogyra contains a shortened psbC intron 2 and lacks intron 1.

Figure 3-1. PCR analysis of total nucleic acid of ten Euglena species. The bar diagram above the gel is a simplified diagram of the Euglena gracilis psbC DNA from exon 1 to exon 3. Black boxes are exons and white boxes are introns. Positions used for primer design are shown as arrows. The agarose gel shows the PCR amplification results. Names of the Euglena species are shown above the lanes. The sizes (in bp) of some PCR products that were confirmed by sequencing are shown on the left of the gel; lanes 1 and 12 are λ BstEII molecular weight markers, with sizes (in bp) shown on the right of the gel.

[Figure 3-1 labels: primer 1, primer 2; intron 1, intron 2. PCR product sizes (bp): E. gracilis 4796 (+ intron 1); E. granulata 3573; E. viridis 3415; E. anabaena 3346; E. spirogyra 2592; E. myxocylindracea 83 (exon only).]

Interestingly, amplification of E. myxocylindracea TNA yielded nothing but a very small fragment, either an intron-less fragment or a primer dimer (figure 3-1, lane 11), indicating that introns were completely missing in this species.

Determination of processing sites of psbC intron 2 homologues in E. viridis, E. granulata, E. anabaena and E. spirogyra

To confirm the existence of the predicted intron 2 in E. anabaena, E. viridis, E. granulata and E. spirogyra, the PCR fragments were cloned into pKS(-) Bluescript. The 5' and 3' ends of the inserts were directly sequenced using the M13 and T3 primers. Each species contained an intron 2 homologue, while intron 1 was absent in all four species. Figure 3-2 shows the splicing boundaries of all the intron 2s, as determined by comparative analysis with the E. gracilis psbC sequence. In all four species, intron 2 is inserted at exactly the same location in the psbC gene as in E. gracilis. The psbC exon sequences are also highly conserved at both the nucleic acid and protein levels in these five Euglenas. The 5' splicing boundary of the introns is conserved, especially the first two nucleotides. While four of the five external introns (including that of E. gracilis) have 5' boundaries matching the consensus "GUGYG", E. spirogyra intron 2 varies slightly, with the sequence "GUUUU". The small fragment observed in the E. myxocylindracea PCR reaction was also cloned and sequenced. It was revealed to be psbC exons 1, 2 and 3, i.e. the mature psbC mRNA sequence (Figure 3-2, last row). It was thus confirmed that the intron contents from psbC exon 1 to exon 3 are missing from E. myxocylindracea.


Figure 3-2. Processing sites of psbC intron 2 in E. anabaena, E. spirogyra, E. viridis, E. granulata and E. gracilis. The exon sequences and their translations are shown in red (upper case). The 3' boundaries of the introns are labeled blue (italic). Intron sequences are in lower case; mat 2 coding regions are represented by triple dashes.

Complete sequencing and identification of the external intron and mat 2 gene of the psbC intron 2 homologues of E. viridis, E. granulata, E. anabaena and E. spirogyra

The cloned PCR products of E. viridis, E. granulata, E. anabaena and E. spirogyra were completely sequenced via restriction fragment subcloning and deletion subcloning as described in the "Materials and Methods" section. The 5' and 3' ends of the mat 2 reading frames and the boundaries of the external introns (intron 2as) were determined through comparative analysis. The sizes of the external introns are 458 bp in E. viridis, 596 bp in E. anabaena, 474 bp in E. granulata and 373 bp in E. spirogyra. The 5' and 3' segments of the external introns are shown in figure 3-4. In all four species, the start codon and the 3' end of the mat 2 reading frames were clearly identified. No potential intron sequence elements were identified in the 5' untranslated region of any of the mat 2 genes. In E. viridis, the coding region appeared uninterrupted until about 1750 bp from the 3' end, where a potential domain V of a group II intron was found, indicating the presence of an internal intron. In E. anabaena, the reading frame continued up to nearly 2100 bp from the 3' end. In E. granulata, the interruption appeared at about 1100 bp from the 3' end. In E. spirogyra, by contrast, a 2133 bp uninterrupted reading frame was clearly identified, indicating that E. spirogyra mat 2 contains no internal introns.

RT-PCR analysis and identification of internal introns in mat 2 homologues of E. viridis, E. granulata, E. anabaena and E. spirogyra
To characterize the internal introns, oligonucleotide primers for cDNA synthesis and PCR analysis were designed specifically for each species. The locations of the primers relative to the introns are shown in figure 3-3. In general, the PCR primers were designed to anneal to the predicted sequence upstream of the start codon of the ORFs, and the corresponding cDNA primers were designed around the beginning of the C-terminal, uninterrupted section of the ORF. cDNA synthesis and PCR analysis were carried out as described in the Materials and Methods section. The RT-PCR results are shown in figure 3-3. Control PCRs on DNA that had not been through reverse transcription show the sizes of unprocessed precursors. RT-PCR of E. viridis RNA resulted in four products (figure 3-3, lane 3), indicating the presence of two internal introns. The E. anabaena RT-PCR shows only one product in addition to the precursor (figure 3-3, lane 5), indicative of only one internal intron. The reaction with E. granulata resulted in at least 5 visible fragments (figure 3-3, lane 7), indicative of at least 3 internal introns. RT-PCR did not detect any internal introns in E. spirogyra, consistent with the sequence analysis results (data not shown). Therefore, internal introns were proposed to be absent from E. spirogyra mat 2.
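The reasoning from band counts to intron counts rests on a simple combinatorial point. If each of n internal introns can be spliced independently, every subset of them may be removed, giving 2^n distinct RNAs (4 for two introns, 8 for three). The sketch below enumerates those products and their predicted sizes. The E. viridis precursor size (1340 bp) is taken from the Figure 3-3 labels, and the intron lengths are those annotated for E. viridis in this section. Because these values come from different measurements, predicted sizes can differ by a base pair or so from the gel labels: for example, it predicts 1057 bp where the -2d' band is labeled 1056. The function name is hypothetical.

```python
from itertools import combinations


def splicing_products(precursor_bp: int, introns: dict[str, int]) -> dict[frozenset, int]:
    """Predicted size of every partially or fully spliced RNA, assuming each
    internal intron can be removed independently of the others."""
    names = sorted(introns)
    products = {}
    for k in range(len(names) + 1):
        for spliced in combinations(names, k):
            products[frozenset(spliced)] = precursor_bp - sum(introns[n] for n in spliced)
    return products


# E. viridis: two internal introns, so 2**2 = 4 possible products.
viridis = splicing_products(1340, {"2d'": 283, "2e": 314})
for spliced, size in sorted(viridis.items(), key=lambda kv: -kv[1]):
    print(sorted(spliced), size)
```

With the three E. granulata introns the same function returns eight products, which is the expectation used in the partially ordered pathway analysis.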

The RT-PCR products were cloned and sequenced to obtain detailed data about the content and insertion sites of all the internal introns. A complete diagram depicting the psbC intron 2s of the four Euglenas is shown in figure 3-4. The E. viridis psbC intron 2 external intron is a 458 bp group II intron containing a 758 aa ORF that is interrupted by two group II introns: 2d' lies between amino acids 117 and 118, and 2e lies within codon 211, between its first and second nucleotides. Introns 2d' and 2e are 283 bp and 314 bp, respectively. The E. anabaena psbC intron 2 external intron is 596 bp and contains a 760 aa ORF interrupted by a 384 bp group II intron, 2d", between amino acids 112 and 113. E. granulata psbC intron 2a is 474 bp and contains a 702 aa ORF with three group II introns: 2f (304 bp) lies within codon 51, between its first and second nucleotides; 2h (236 bp) lies between amino acids 111 and 112; and 2g (367 bp) lies within codon 355, between its first and second nucleotides. E. spirogyra psbC intron 2a is 373 bp, with an ORF of 711 aa, and contains no internal introns. E. myxocylindracea psbC completely lacks intron 2. The complete annotation of intron contents of the psbC intron 2s in all analyzed species is presented in table 2.
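The intron sizes reported here can be cross-checked against the genomic PCR products of Figure 3-1 (E. viridis 3415, E. anabaena 3346, E. granulata 3573 and E. spirogyra 2592 bp). The sketch below makes three assumptions. Each product is taken to be 83 bp of psbC exon sequence (the size of the intron-less E. myxocylindracea product), plus the external intron, the mat 2 ORF with its stop codon, and any internal introns. The listed ORF lengths are assumed to exclude the stop codon. The E. spirogyra external intron is taken as 373 bp, the value in the external-intron size list above. The function and constant names are hypothetical.

```python
# Assumption: exon flanks = size of the exon-only product from E. myxocylindracea (Figure 3-1).
EXON_FLANK_BP = 83


def predicted_product_bp(external_bp: int, orf_aa: int, internal_bp: list[int]) -> int:
    """Exon flanks + external intron + mat 2 ORF (with stop codon) + internal introns."""
    orf_bp = 3 * orf_aa + 3  # assumption: the listed ORF length excludes the stop codon
    return EXON_FLANK_BP + external_bp + orf_bp + sum(internal_bp)


SPECIES = {
    # name: (external intron bp, mat 2 length in aa, internal intron sizes in bp)
    "E. viridis": (458, 758, [283, 314]),
    "E. anabaena": (596, 760, [384]),
    "E. granulata": (474, 702, [304, 236, 367]),
    "E. spirogyra": (373, 711, []),
}

for name, (ext, aa, internal) in SPECIES.items():
    print(name, predicted_product_bp(ext, aa, internal))
```

Under these assumptions all four predictions reproduce the Figure 3-1 product sizes exactly, which supports the annotated intron lengths.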

Sequence analysis has shown that all of the internal introns are group II introns (results shown in chapter 4). Intron 2d of E. gracilis, intron 2d' of E. viridis and intron 2d" of E. anabaena are considered to be homologous (analysis shown in chapter 4). The 236 nt intron 2h of E. granulata is a very short group II intron lacking domain IV. Since it is below the previously known size range of Euglena group II introns, intron 2h has been termed a mini-group II intron.

Figure 3-3. RT-PCR analysis of psbC intron 2 in E. viridis, E. anabaena and E. granulata. The bar diagrams below the gel show the positions of the primers used. Black boxes are exons and white boxes are intron 2as (external introns). The stippled boxes are a shortened representation of mat 2. Internal introns are shown as lollipops. Control PCRs with total nucleic acid extracts are labeled "DNA". Sizes (in bp) of some fragments that were confirmed by cloning and sequencing are indicated on the right of the gel.

[Figure 3-3 gel labels (bp): E. granulata 2230 (pre-mRNA), 2004 (-2h), 1926 (-2f), 1700 (-2h -2f), 1637 (-2h -2g), 1333 (fully processed); 1340 (E. viridis) or 1324 (E. anabaena); 1056 (E. viridis -2d'); 1026 (E. viridis -2e); 939 (E. anabaena -2d"); 742 (E. viridis).]

Figure 3-4. Internal intron content of psbC intron 2 in E. gracilis, E. myxocylindracea, E. viridis, E. anabaena, E. granulata and E. spirogyra. Black boxes are exons. The white open boxes are external intron 2as, with segment sizes (in bp) listed under the boxes. The mat 2 coding regions are shown as shaded boxes with sizes (in aa) shown underneath. Internal introns are shown as lollipops with names and sizes (in bp) shown above each lollipop. Suggested homologous internal intron 2ds are connected by a dashed line. On the right is part of the phylogenetic tree of Euglena established by M.D. Thompson (1995); only the species whose psbC intron 2 was studied are listed. For the complete phylogenetic tree, see Chapter 1, figure 1-9.

[Figure 3-4 diagram: E. gracilis, intron 2a segments 431 and 160 bp, mat 2 (758 aa), internal introns 2b, 2c and 2d (sizes shown: 552, 352 and 369 bp); E. myxocylindracea, exons only; E. viridis, intron 2a segments 351 and 107 bp, mat 2 (758 aa), internal introns 2d' (283 bp) and 2e (314 bp); E. anabaena, intron 2a segments 450 and 146 bp, mat 2 (760 aa), internal intron 2d" (384 bp); E. granulata, intron 2a segments 364 and 110 bp, mat 2 (702 aa), internal introns 2f (304 bp), 2h (236 bp) and 2g (367 bp); E. spirogyra, intron 2a segments 239 and 134 bp, mat 2 (711 aa), no internal introns.]

Processing of the internal introns in E. granulata is partially ordered.

RT-PCR of E. granulata using primers Pg and Cg resulted in 6 detectable bands (Figure 3-3), the largest being the unprocessed precursor (2230 bp) and the smallest the mature mRNA (1333 bp). However, if the three internal introns were spliced independently and in random order, there should be a total of 8 products with sizes distinguishable on the agarose gel. Since the three internal introns in E. gracilis are processed in a partially ordered pattern (Chapter 2), it was suspected that processing of the 3 introns in E. granulata is also partially ordered. Cloning of the RT-PCR products followed by sequence analysis confirmed the sizes and identities of three intermediates: a 1926 bp product in which only intron 2f has been spliced; a 1637 bp fragment corresponding to an RNA with both 2h and 2g spliced; and a 1700 bp product resulting from splicing of both 2h and 2f. These intermediates most likely correspond to the third, fourth and fifth bands on the gel (Figure 3-3). There is also a very faint band above the 1926 bp fragment, which is probably the 2004 bp intermediate in which intron 2h has been spliced. No visible bands on the gel, and no cloned products, correspond to intermediates with only 2g spliced or with both 2g and 2f spliced. Preliminary data therefore imply that introns 2h and 2f are preferentially spliced before 2g.
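The inference above compares the intermediates that were detected with the eight that random-order splicing would allow. The sketch below encodes that comparison as set arithmetic. The observed set is transcribed from the paragraph above, with the faint 2004 bp band counted as probable. The variable names are illustrative.

```python
from itertools import combinations

INTRONS = ("2f", "2h", "2g")

# Spliced-intron combinations seen as bands or cloned products (from the text above).
OBSERVED = {
    frozenset(),               # unprocessed precursor, 2230 bp
    frozenset({"2h"}),         # faint 2004 bp band (probable)
    frozenset({"2f"}),         # 1926 bp
    frozenset({"2h", "2f"}),   # 1700 bp
    frozenset({"2h", "2g"}),   # 1637 bp
    frozenset(INTRONS),        # mature mRNA, 1333 bp
}

# Every combination possible if splicing order were random (2**3 = 8).
EXPECTED = {frozenset(c) for k in range(len(INTRONS) + 1)
            for c in combinations(INTRONS, k)}

missing = EXPECTED - OBSERVED
print(sorted(sorted(m) for m in missing))  # [['2f', '2g'], ['2g']]

# In every detected RNA where 2g has been removed, 2h has been removed as well.
print(all("2h" in s for s in OBSERVED if "2g" in s))  # True
```

Both missing intermediates have 2g spliced while 2h is still present. This is the pattern summarized above as preferential splicing of 2h and 2f before 2g.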

The proposed predominant processing pathway is outlined in figure 3-5.

Figure 3-5. Predominant splicing pathway of the mat 2 internal introns in E. granulata. Black boxes are psbC exon segments. White boxes are the external intron 2a segments. Mat 2 is shown as a shaded box, and the internal introns are shown as lollipops. Thick arrows indicate the preferential pathways.

Comparative analysis of mat 2 sequences, and identification of conserved domains

The predicted mat 2 amino acid sequences of the five Euglenas were analyzed using the GAP and PILEUP programs of the GCG Wisconsin Package. The mat 2 coding regions show internal length variations of no more than 15 amino acids between species. Pairwise comparisons between mat 2 sequences were done with GAP, using the default gap creation penalty of 12 and gap extension penalty of 4. Percent similarities between the total mat 2 sequences of any two species ranged from 34% to 39%. A PILEUP alignment of all five mat 2 sequences was done with the same gap parameters. In spite of the overall low similarity between the sequences, PILEUP analysis revealed two clusters that are highly conserved in all five mat 2s.
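GAP produces a global alignment in which opening a gap and extending it are penalized separately. The sketch below is a minimal affine-gap global alignment score, using Gotoh's formulation of Needleman-Wunsch. It assumes that a gap of length n costs gap_open + gap_extend × n, with the 12/4 values quoted above. The match and mismatch scores are illustrative assumptions standing in for a real amino acid substitution table, so absolute scores will not match GAP output. The function name is hypothetical.

```python
NEG_INF = float("-inf")


def global_affine_score(a: str, b: str, match: int = 10, mismatch: int = -3,
                        gap_open: int = 12, gap_extend: int = 4) -> float:
    """Best global alignment score; a gap of length n costs gap_open + gap_extend * n."""
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] aligned to a gap; Y: b[j-1] aligned to a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + gap_extend * i)  # leading gap in b
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open + gap_extend * j)  # leading gap in a
    open_cost = gap_open + gap_extend           # cost of the first residue of a new gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - open_cost, X[i - 1][j] - gap_extend,
                          Y[i - 1][j] - open_cost)
            Y[i][j] = max(M[i][j - 1] - open_cost, Y[i][j - 1] - gap_extend,
                          X[i][j - 1] - open_cost)
    return max(M[n][m], X[n][m], Y[n][m])


print(global_affine_score("ABCD", "ABD"))  # 14: three matches, one gap of length 1
print(global_affine_score("AAAA", "AA"))   # 0: one gap of length 2 beats two separate gaps
```

Keeping three matrices is what makes one long gap cheaper than several short ones, which is the behaviour the separate creation and extension penalties are meant to produce.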

One region comprises about 100 amino acids at the C-terminal end, and the other is a 60-70 aa block about 300 aa upstream of the C terminus. The C-terminal block corresponds to the previously identified X domain (Mohr, 1993; Zhang, 1995), with a regional average pairwise similarity of 54%. The 60-70 aa block overlaps with residual reverse transcriptase domains V-VII, also previously identified (Mohr, 1993; Zhang, 1995). Regional similarity of the RT domain was not significantly higher than the overall similarity, yet similarities to the general consensus model were clear. The alignment of the X domains and RT domains of the five mat 2s is shown in figure 3-6, together with a Euglena consensus and a general consensus for all group II intron maturases (Mohr, 1993). It is interesting to note that the most conserved cluster in the X-domain region within Euglena, "R(Q/E)SCFLTL(S/C)RKHNK", is not conserved in non-Euglena group II introns, indicating that the active sites of the Euglena maturase might be slightly different from those of other organisms.
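The 14-residue cluster can be read directly off the X-domain rows of Figure 3-6: take the last three residues of the 641-689 block, followed by the first eleven residues of the 690-740 block. The sketch below marks the columns that are identical in all five species and computes ungapped percent identity between pairs. Percent identity is stricter than the GCG similarity values quoted above, since similarity scores also credit conservative substitutions. The helper names are hypothetical.

```python
# The 14-residue X-domain cluster of each mat 2, read from the Figure 3-6 alignment.
MOTIF = {
    "E. gracilis": "RESCFLTLCRKHNK",
    "E. viridis": "RESCFLTLCRKHNK",
    "E. anabaena": "RQSCLLTLCRKHNK",
    "E. granulata": "RQSCLLTLSRKHNK",
    "E. spirogyra": "KNSCCLTLSRKHNK",
}


def conserved_columns(seqs: list[str]) -> str:
    """Show residues identical in every sequence; '-' marks variable columns."""
    return "".join(col[0] if len(set(col)) == 1 else "-" for col in zip(*seqs))


def percent_identity(a: str, b: str) -> float:
    """Percent identical positions between two gap-free sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


print(conserved_columns(list(MOTIF.values())))  # --SC-LTL-RKHNK
print(round(percent_identity(MOTIF["E. gracilis"], MOTIF["E. spirogyra"]), 1))  # 71.4
```

The C-terminal RKHNK run is invariant across all five species, consistent with the basic, nucleic-acid-binding character attributed to this motif in the Discussion.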

Figure 3-6. Sequence alignment of mat 2 X domains (A) and RT domains (B) in E. gracilis, E. viridis, E. anabaena, E. granulata and E. spirogyra. A consensus of all five mat 2s is listed in blue under the alignment (consensus-Euglena). The consensus suggested by Mohr (Mohr, 1993) is listed in pink (consensus-model). The letter code used in the consensus lines is: o = aromatic; b = basic; h = hydrophobic; p = polar; a = acidic. Amino acids that are identical to, or have tolerable changes from, consensus-model are pink; amino acids that are identical to, or have tolerable changes from, consensus-Euglena are labeled blue.

[Figure 3-6 alignment, species rows only; consensus rows not reproduced]

A. X domain

Positions 641-689
Mat2-C E. gracilis    PIGNVRLLLF EDKFILRNFG FFVYSVLNWF SICENFS.HL RFFVELIRES
Mat2-C E. viridis     AIGKSAFLKL DDMSIIKTFG SISYLFLNWY RCCFNFSF.V KKFINILRES
Mat2-C E. anabaena    PISNSKYLFF DDAIILDYCS YHAILLLSLF RCSENFS.KV KIMVEYIRQS
Mat2-C E. granulata   PISNCRLLLL QDKVLTQYYG NFAFNTINWY RCVFNIS.KL KLVISTLRQS
Mat2-C E. spirogyra   SIGNIYFMFL DDNQIILNFK ILTYLLYIWY KKASN.SYKL RFFSSLLKNS

Positions 690-740
Mat2-C E. gracilis    CFLTLCRKHNK MKLWSYSVYT FDLVFSKSVY RTISFFPTRK FIFNLKRKSF
Mat2-C E. viridis     CFLTLCRKHNK NKTWVYEVYT YDLNIFDNLF SNKSFFPSRS VLSQMKRKFF
Mat2-C E. anabaena    CLLTLCRKHNK SKDWVYSVYT SDLLSYQNLF FYGNTFPTFK KIHVMKKKFL
Mat2-C E. granulata   CLLTLSRKHNK RKSWAYFIFT PDLLILRGLF MDTSFFPSRV MLSKLSRKFF
Mat2-C E. spirogyra   CCLTLSRKHNK SKSWSLRVY. KWLS.DC.F NNV.FF

Positions 741-758
Mat2-C E. gracilis    LV.DVRFNLD ETIFLE*.
Mat2-C E. viridis     MIR.FIVFFD ERFFLNS*
Mat2-C E. anabaena    KTNDFEFLLN EK.FLLF*.
Mat2-C E. granulata   LSEKAELLFE EIMFLR*
Mat2-C E. spirogyra   L.NANSFSFD HDFFLKL

B. RT domains V-VII

Positions 351-400
Mat2-E. gracilis      ..FNNKVFTV YEKNIYYVRY LNFLIFGFLS SKNFIFFFKL KYLFFLRNKL
Mat2-E. viridis       ..MTNVSTFL FEKNISYLRY FSHIILGVIG SKNFFNFFFK KILTFVRSSI
Mat2-E. anabaena      KNFYKKKFKV FLKKILYVRF LHFFLFGTIA SKSFSFQILE KVLCYTKTNL
Mat2-E. granulata     NCLIFTNDLF FLKTFFYARY LDYLLLGIKG SNKFALNVSK KLSNFVRSRI
Mat2-E. spirogyra     KYYKYKNFFY TINNSSVYSS L.VPLKITSG S..FYYPFL. TRNTL

Positions 401-450
Mat2-E. gracilis      YFNFREVQIF SSSNDKVIFL GVYIAY NKIYNFFEKL RVNKKYF...
Mat2-E. viridis       RFDLEKTSFF SNLDDSIIFL GFNIKLVSLS QKNTNSIFDF RTAKSYFSRIL
Mat2-E. anabaena      HVYYNQKDLI LASNAPITFL GFYIKRLDLL TKKV SVFNF SNKIKHKIFLK
Mat2-E. granulata     YFDIQKLESI LCQRNFHIFC QVLI.YVYVS LRIYIRIFLL KLHYKEIFKKN
Mat2-E. spirogyra     FLFNQ GIY ITRFSNF FL GILGSTLFKT KLKNKIIGFL RGLCLILESFY

Discussion

psbC intron 2 and mat 2 are deeply rooted in Euglena species

We screened ten Euglena species for psbC intron 2 homologues. Preliminary PCR data strongly indicated that intron 2 is present in 9 of the screened species. In this study, intron 2 has been confirmed in 5 of these 9 species. The widespread occurrence of psbC intron 2 is best explained by the presence of this intron in a common ancestor.

Of particular interest, one species, E. myxocylindracea, appears to lack the entire intron 2 region. According to a phylogenetic study, E. myxocylindracea belongs to a relatively late-branching lineage (Thompson, 1995). A consensus tree indicates that E. myxocylindracea branches after species such as E. viridis and E. anabaena, which contain psbC intron 2. It is therefore unlikely that the absence of intron 2 from this species reflects an early divergence of E. myxocylindracea. Instead, E. myxocylindracea may have undergone a secondary loss of psbC intron 2. This latter explanation implies that the absence of intron 2 in E. myxocylindracea does not exclude the possibility that intron 2 was present in its ancestor. The results of our screen indicate that psbC intron 2 was most likely present in the ancestor of all 10 tested Euglena species, and thus probably of the entire genus. The identical insertion sites in the psbC gene and the high conservation of secondary structure indicate that psbC intron 2, as well as mat 2, has persisted through a long period of Euglena evolution. The most likely explanation for this conservation is that psbC intron 2 and its internal elements play an essential role in the chloroplast. Chloroplast mRNAs undergo little post-transcriptional modification, so cDNA libraries of chloroplast mRNAs are hard to generate. As a result, mat 2 mRNA, if expressed, would be hard to detect. Evidence that mat 2 is a functional protein therefore has to come from other approaches, such as comparative sequence analysis.

High local conservation of mat 2 suggests a conserved function as an intron maturase. The overall similarity among the mat 2 proteins is below 30%. This is not surprising, given that mat 2 is intron-encoded and that its putative targets are introns, which are highly variable elements. In fact, comparative analyses of maturases in other organisms have all found similarities below 50%. Even yeast mitochondrial group II maturases exceed 70% similarity only when the RT or X domains alone are considered (Mohr, 1993).

Similarly, my comparative analyses have revealed two highly conserved regions in the five mat 2 amino acid sequences. One lies within the putative X domain (similarity 65%), and the other lies within the RT domain (similarity nearly 50%). This high local conservation is particularly significant against the background of low overall similarity of the mat 2 sequences. A 14-amino-acid motif in the X domain is almost unchanged from species to species. It consists mostly of basic or polar amino acids, a composition well suited to nucleic acid binding. Although it is slightly shifted from the proposed activity center of other maturases (Mohr, 1993), this motif is strong evidence for a conserved X domain functioning as an RNA-binding motif for Euglena introns.

The reverse transcriptase (RT) domain, on the other hand, is only partially conserved, suggesting a loss of function during evolution. Yet the partially conserved domain provides evidence that mat 2 could once have acted as a reverse transcriptase and may have been involved in mediating intron acquisition. We also propose that, during evolution, the N-terminal regions of the mat 2 sequences evolved substantially to adapt to the fast mutation rate of the introns they process. Further evidence that mat 2 is actively expressed comes from the secondary structure analysis (discussed later).
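The claim that the X-domain motif is "mostly basic or polar" can be made concrete with a simple residue-class tally. This is a sketch under stated assumptions. The class boundaries (basic = K, R, H; polar uncharged = S, T, N, Q) are one common convention, not necessarily the classification used in this study. The example input is the 11-residue E. gracilis Mat2-C stretch CFLTLCRKHNK from the alignment at the start of this section, used purely as illustrative input. It is not the 14-amino-acid X-domain motif itself, which is not spelled out in the text.

```python
# Residue classes: one common convention, assumed here for illustration only.
BASIC = set("KRH")
POLAR_UNCHARGED = set("STNQ")


def composition(motif: str) -> dict:
    """Count basic and polar (uncharged) residues in a protein motif."""
    motif = motif.upper()
    basic = sum(1 for aa in motif if aa in BASIC)
    polar = sum(1 for aa in motif if aa in POLAR_UNCHARGED)
    return {
        "length": len(motif),
        "basic": basic,
        "polar": polar,
        "basic_or_polar_fraction": (basic + polar) / len(motif),
    }


if __name__ == "__main__":
    # Illustrative input: E. gracilis Mat2-C stretch from the alignment above.
    print(composition("CFLTLCRKHNK"))
```

Changing the class sets (for example, counting C and Y as polar) changes the fraction, so any conclusion drawn from such a tally depends on the convention chosen.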
It was shown that all the internal introns of the mat 2 genes are located in the loop of domain IV of the external intron 2a's and therefore cannot interfere with processing of the external intron. I have proposed, instead, that the internal introns were maintained to regulate the expression of the mat 2 genes.

Which intron(s) does mat 2 assist in splicing? Because no biochemical tests have been performed so far, this cannot be answered with certainty. However, there is evidence suggesting that mat 2 is specific for group II introns. The evidence comes from E. myxocylindracea, the species in which psbC intron 2 is proposed to have been lost in a secondary loss event. To see what the effects of losing mat 2 might be, the group II intron content of E. myxocylindracea chloroplast genes has been analyzed. So far, 6 genes have been examined, which together contain 43 group II introns in the E. gracilis genome. No group II introns were detected in these genes in E. myxocylindracea (see appendix II for data). To date, no group II introns have been identified in E. myxocylindracea at all, a situation not observed in any other Euglena species. On the other hand, E. myxocylindracea contains several group III introns, including psbC intron 4, a group III twintron with an internal mat 1 gene (Doetsch, 1998). Some of the 43 group II introns missing in E. myxocylindracea are present in earlier-branching species such as E. viridis and E. anabaena (Thompson, 1995). The presence of mat 2 therefore appears to be directly correlated with the presence of group II introns in Euglena chloroplasts. This is promising evidence that mat 2 could be responsible for the general processing of all the Euglena group II introns.

psbC intron 2 as the founder of Euglena group II introns

Was mat 2 involved in the colonization of group II introns only in the chloroplast genome of Euglena, or in other Euglenoids as well? Moreover, was mat 2 responsible only for the presence of group II introns, or also for the group II-related group III introns? To date, PCR analysis has not detected psbC intron 2 homologues in any Euglenoid genus other than Euglena. It is possible that the psbC exon sequences in those species are too divergent to be recognized by our PCR primers. An alternative explanation, however, is that psbC intron 2 and mat 2 are absent from the chloroplasts of other Euglenoids.

In addition, homologues of psbC intron 4 have been identified by PCR in two non-Euglena Euglenoid species from two orders of the Euglenophyceae (Doetsch, in press). psbC intron 4 is a group III twintron encoding the maturase-like gene mat 1. The same study also showed that group III introns tend to accumulate faster than group II introns in more basally branching species. Surprisingly, these results imply that group III introns might be more deeply rooted in the Euglenophyceae than group II introns. It is therefore unlikely that mat 2 was necessary for group III intron evolution in Euglenoid chloroplasts.

An exciting evolutionary hypothesis, therefore, is that group III introns are not derivatives of group II introns but are actually their ancestors. However, mat 1 and mat 2 are very different in primary structure. Moreover, mat 1 is present in E. myxocylindracea, which has undergone a massive loss of group II introns. For both reasons, mat 1 is probably not responsible for the splicing of group II introns.

One possibility is that mat 1 was originally involved in recruiting and processing a small number of group II or group II-like introns. On this view, the major invasion of group II introns did not begin until psbC intron 2 entered the Euglena chloroplast genome, with or without the assistance of mat 1. Alternatively, mat 1 and mat 2 may simply process two different types of introns, group III and group II respectively, which converged later during evolution. To test the relationships between mat 1/group III introns and mat 2/group II introns, RNA-binding assays could be performed with mat 1 and mat 2 to determine their specificity for group II and group III introns.

Internal introns and their evolutionary implications

Using introns as milestones to mark evolutionary steps and deduce taxonomic relationships is a well-known strategy in phylogenetic studies (Thompson, 1995; Doyle, 1996; Liss, 1997). Using the internal introns of the psbC intron 2 homologues as markers, we could gather clues to the evolutionary relationships among the Euglena species. Most notable is the homologous intron 2d found in E. gracilis, E. viridis and E. anabaena. The most parsimonious explanation for this homology is that the ancestral intron 2d was acquired before these lineages diverged, and that the remaining internal introns were inserted afterwards. These data also suggest that the common ancestor of E. gracilis, E. viridis and E. anabaena emerged after E. granulata and E. spirogyra, two species whose positions in the Euglena phylogenetic tree are not yet established.

Under the intron-late model of intron evolution, later-branching species should contain more introns. E. spirogyra, which lacks internal introns, might therefore belong to a relatively early-branching lineage. A less likely alternative is that, as in E. myxocylindracea, the E. spirogyra lineage experienced a partial secondary loss. If this were the case, E. spirogyra would be expected to branch close to E. myxocylindracea and E. gracilis, later than E. viridis and E. anabaena. However, neither sequence analysis of E. spirogyra psbC intron 2a (chapter 4) nor preliminary PAUP analysis of the mat 2 sequences (data not shown) favors this phylogenetic model. Instead, E. spirogyra appears to belong to a branch far from the clade containing E. gracilis, E. myxocylindracea, E. viridis and E. anabaena. Internal intron content has thus provided valuable information for the phylogenetic study of Euglena, which should be extended by expanding the species and gene databases.

The presence of the "mini-group II intron", intron 2h, in E. granulata sheds light on the relationship between group II and group III introns. This mini-group II intron could very well be an intermediate in the transition between group II and group III introns.
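The parsimony argument for the shared intron 2d, made earlier in this section, can be made explicit with Fitch's small-parsimony algorithm applied to intron presence (1) or absence (0). This sketch is illustrative only, and the tree is a hypothetical topology rather than a result of this study. It places E. gracilis with E. viridis, as the intron 2a comparisons suggest, and pairs E. granulata with E. spirogyra arbitrarily. Any tree in which the three 2d-bearing species form a clade gives the same minimum of one change. Topologies that interleave 2d-bearing and 2d-lacking species can require more.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf name or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, int]) -> Tuple[Set[int], int]:
    """Fitch small parsimony: return (candidate states at this node, minimum changes below it)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Presence (1) / absence (0) of the homologous intron 2d, as described in the text.
intron_2d = {"gracilis": 1, "viridis": 1, "anabaena": 1, "granulata": 0, "spirogyra": 0}

# Hypothetical topology, for illustration only.
tree = ((("gracilis", "viridis"), "anabaena"), ("granulata", "spirogyra"))

if __name__ == "__main__":
    _, changes = fitch(tree, intron_2d)
    print(changes)  # 1: a single gain (or loss) on one branch explains the pattern
```

A single acquisition of 2d in the common ancestor of the three species corresponds to that one change. Three independent insertions at the same site would cost three.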

The same kind of mini-group II intron was also found in Lepocinclis buetschlii, a member of another Euglenoid genus closely related to Euglena. Interestingly, the two mini-group II introns of L. buetschlii, 289 nt and 237 nt long, were found within the coding region of mat 1 (Doetsch, 1997). The mat 1 gene lies in the internal intron of psbC intron 4 (a group III twintron) and had never been found interrupted by introns in other Euglenoids. The discovery of small group II introns inside a group III maturase gene is thought to illustrate a promising link between group III and group II introns. This hypothesis in turn favors the idea that mat 1 and mat 2 are also evolutionarily related.

Materials and Methods

Euglena cultures

Cell cultures of nine Euglena species were purchased from the Culture Collection of Algae at the University of Texas at Austin (UTEX). They include Euglena geniculata var. terricola (UTEX 366), Euglena mutabilis (UTEX 364), Euglena viridis (UTEX 85), Euglena anabaena (UTEX 373), Euglena granulata (UTEX 453), Euglena sanguinea (UTEX 2345), Euglena stellata (UTEX 372), Euglena myxocylindracea (UTEX 1989) and Euglena spirogyra (UTEX LB1307). Euglena gracilis Z was used as a control.

Nucleic acid isolation

Total nucleic acid (TNA) was isolated from each species as described previously (Thompson, 1995).

PCR and RT-PCR primers

Table 3-1 shows the sequences of all oligos as well as their locations (EMBL coordinates). Two sets of primers were used to amplify the total nucleic acid extracts. All oligos in these two sets were made degenerate to accommodate the slight sequence differences between Euglena species. The first set consists of oligos pE1 and cE34. pE1 is RNA-like and is located in psbC exon 1. cE34 is cDNA-like and spans psbC exons 3 and 4. The second set is a modified version of set 1. SacIIpE1 is pE1 modified to contain a SacII site. BamHIcE3 is analogous to cE34 with an added BamHI site, but primes only to psbC exon 3. pE1-cE34 was used in the original PCR, and SacIIpE1 and BamHIcE3 were used later to facilitate cloning.
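Each degenerate position multiplies the number of distinct oligonucleotides in a primer pool. The sketch below expands the slash notation used in Table 3-1, such as (A/G), and the IUPAC code N into every concrete sequence. The function name is illustrative. It assumes that N is the only single-letter ambiguity code present, as in the cE34 and BamHIcE3 entries.

```python
import itertools
import re

# A token is either a bracketed choice such as "(A/G)" or a single base letter (N = any base).
TOKEN = re.compile(r"\(([ACGT](?:/[ACGT])+)\)|([ACGTN])")


def expand_degenerate(primer: str) -> list[str]:
    """Expand a degenerate primer written with (X/Y) choices and N into all variants."""
    choices = []
    pos = 0
    for m in TOKEN.finditer(primer):
        if m.start() != pos:
            raise ValueError(f"unrecognized character at position {pos}")
        pos = m.end()
        if m.group(1):
            choices.append(m.group(1).split("/"))
        elif m.group(2) == "N":
            choices.append(list("ACGT"))
        else:
            choices.append([m.group(2)])
    if pos != len(primer):
        raise ValueError(f"unrecognized character at position {pos}")
    return ["".join(p) for p in itertools.product(*choices)]


if __name__ == "__main__":
    variants = expand_degenerate("GCATTNCCNGCCCACCA(A/G)GC")  # cE34
    print(len(variants))  # 4 * 4 * 2 = 32
```

For BamHIcE3, the same function gives 2 × 3 = 6 variants of 24 nt. This count is one rough way to compare how degenerate the two primer sets are.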

Three sets of oligos were used to amplify splicing intermediates of psbC intron 2 of Euglena viridis, Euglena granulata, and Euglena anabaena. Pv and Cv are mRNA and cDNA primers with the exact sequence of the 5' and 3' regions in E. viridis psbC intron 2, respectively. Likewise, Pa-Ca is for E. anabaena and Pg-Cg is for E. granulata.

PCR amplification, cloning and sequencing

Primers pE1 and cE34 were used to amplify total nucleic acids (TNAs) from ten Euglena species. Each reaction was carried out for 30 cycles of 94°C for 1 min, 50°C for 2 min and 72°C for 3 min. Reactions used 2.5 units of Taq polymerase (GIBCO BRL), 5 µl of template TNA and 200 ng of each primer. TNA extraction is described in Thompson, 1995. To clone the PCR products, primers SacIIpE1 and BamHIcE3 were used to repeat the original amplification with TNAs from Euglena viridis, Euglena anabaena, Euglena granulata, Euglena spirogyra and Euglena myxocylindracea. Amplified products were extracted with phenol-chloroform-isoamyl alcohol (24:24:1) and precipitated. The products were then digested with SacII and BamHI and cloned into the SacII-BamHI sites of pKS(-) Bluescript.

Insert-containing plasmids were then sequenced. The terminal sequences (5' and 3') of the amplified fragments were determined by double-strand sequencing of the ligated plasmids, as described previously (see chapter 1). The entire sequences of the five cloned PCR products were obtained by exonuclease III (GIBCO BRL) deletion cloning on both strands of the original plasmids. Sequences were also confirmed by sequencing restriction subclones of the original plasmids.
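As a simple check on the cycling program above (30 cycles of 94°C for 1 min, 50°C for 2 min and 72°C for 3 min), the profile can be written down as data and its total hold time computed. This is a minimal sketch. The `Step` type and function name are illustrative. Ramp times, any initial denaturation and any final extension are not modeled, because the text does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float
    minutes: float


# Cycling program as described in the text: 30 x (94 C 1 min, 50 C 2 min, 72 C 3 min).
CYCLE = [Step(94, 1), Step(50, 2), Step(72, 3)]
N_CYCLES = 30


def total_hold_minutes(cycle: list[Step], n_cycles: int) -> float:
    """Sum of programmed hold times only; ramping between steps is ignored."""
    return n_cycles * sum(step.minutes for step in cycle)


if __name__ == "__main__":
    print(total_hold_minutes(CYCLE, N_CYCLES))  # 180
```

Actual run times will be longer once instrument ramping is included.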

RT-PCR amplification, cloning and analysis

Oligos Cv, Ca and Cg were used with Superscript II (GIBCO BRL) to synthesize cDNA from RNAase-treated TNAs of E. viridis, E. anabaena and E. granulata, respectively. The reactions were carried out at 37°C for 1 hour, as described previously (Copertino, 1991a). The cDNAs were then amplified with the corresponding mRNA and cDNA primers listed in Table 3-1. PCR of the cDNAs was carried out for 30 cycles of 94°C for 1 min, 50°C for 2 min and 72°C for 3 min. PCR products were polished with PfuII polymerase (GIBCO BRL) at 72°C for 3 min and shotgun-cloned into the EcoRV site of pKS(-) Bluescript. Insert-containing plasmids were then sequenced.

Table 3-1. Sequences and locations of PCR and RT-PCR primers. Locations are given as EMBL coordinates.

Name       Template  Sequence (5'-3')                    Location
pE1        mRNA      GGTTCT(A/D)CCACGTGGAAACGCTC         13152-13174 (gracilis)
cE34       cDNA      GCATTNCCNGCCCACCA(A/G)GC            18602...17912 (gracilis)
SacIIpE1   mRNA      GGTTCT(A/D)CCCCGCGGAAACGCTC         13152-13174 (gracilis)
BamHIcE3   cDNA      CCACCA(A/G)GCGGATCC(A/T/G)GTTGATTC  17920-17897 (gracilis)
Pv         mRNA      AATTAATTGTCCTG(A/G)CTG              326-343 (viridis)
Cv         cDNA      GCATCTGTAGTAAGACTATTAAACC           1644-1665 (viridis)
Pa         mRNA      GGATAAAGATGTAGAATGAATACAC           271-295 (anabaena)
Ca         cDNA      CAAAGGACTAAAACCTGC                  1579-1595 (anabaena)
Pg         mRNA      GTTCTTTTGGTTAAAATC                  232-249 (granulata)
Cg         cDNA      GAAACCCTTGAATATTTGACTTC             1439-2462 (granulata)

Table 3-2. Complete annotations of introns and protein-coding regions in psbC intron 2 of E. gracilis, E. viridis, E. anabaena, E. granulata and E. spirogyra.

Species / element   Start    End     Length

E. gracilis (4144 bp; mat 2 orf = 759 aa; intron 2a = 593 bp)
  intron 2a-1       13731    14160    430
  intron 2b         14161    14713    553
  intron 2a-2       14714    14717      4
  mat 2 exon 1      14718    14774     57
  intron 2c         14775    15143    369
  mat 2 exon 2      15144    15431    288
  intron 2d         15432    15783    352
  mat 2 exon 3      15784    17715   1932
  intron 2a-3       17716    17874    159

E. viridis (3345 bp; mat 2 orf = 759 aa; intron 2a = 470 bp)
  intron 2a-1          27      389    363
  mat 2 exon 1        390      740    351
  intron 2d'          741     1024    284
  mat 2 exon 2       1025     1307    283
  intron 2e          1308     1621    314
  mat 2 exon 3       1622     3264   1643
  intron 2a-2        3265     3371    107

E. anabaena (3257 bp; mat 2 orf = 761 aa; intron 2a = 591 bp)
  intron 2a-1          37      487    451
  mat 2 exon 1        488      823    336
  intron 2d"          824     1208    385
  mat 2 exon 2       1209     3153   1945
  intron 2a-2        3154     3293    140

E. granulata (3497 bp; mat 2 orf = 710 aa; intron 2a = 476 bp)
  intron 2a-1          24      388    365
  mat 2 exon 1        389      539    151
  intron 2f           540      843    304
  mat 2 exon 2        844     1025    182
  intron 2h          1026     1251    226
  mat 2 exon 3       1252     1981    730
  intron 2g          1982     2348    367
  mat 2 exon 4       2349     3409   1061
  intron 2a-2        3410     3520    111

E. spirogyra (2506 nt; mat 2 orf = 711 aa; intron 2a = 373 bp)
  intron 2a-1          24      262    239
  mat 2               263     2395   2133
  intron 2a-2        2396     2529    134

CHAPTER 4

COMPARATIVE ANALYSIS AND SECONDARY STRUCTURE MODEL STUDY OF PSBC INTRON 2 EXTERNAL AND INTERNAL INTRONS

Introduction

As self-splicing ribozymes, group II introns have a well-established secondary structure and tertiary conformation. Extensive research, including computer analysis and molecular genetic approaches, has been done to refine the canonical secondary structure model deduced in 1989 (Michel, 1989; see figure 1-5). Domains V and VI of the six-helical-wheel model are the most crucial, and hence most characteristic, structural features of group II introns. In fact, the self-splicing rate of most group II introns is too low to detect in vitro. As a result, group II introns are usually recognized by sequence analysis and identification of domain V.

Among all the group II introns studied, yeast mitochondrial introns best fit the canonical group II intron secondary structure model. Most of the proposed secondary and tertiary elements are present in yeast group II introns. In recent years, many new conserved elements have been identified through comparative analysis of yeast introns. Compared to yeast group II introns, Euglena chloroplast group II introns are much less conserved in secondary structure and tertiary interactions. All Euglena group II introns have clearly distinguishable domains V and VI. Yet, because of their unusually short length, the other domains are often abbreviated. Also, many G-U and sometimes U-U pairs are present in the putative stems of Euglena group II introns. By relaxing the pairing rules, six helical domains could be postulated for all Euglena group II introns, yet many tertiary interaction elements remain unidentifiable. Because of the unicellular, asexual nature of the organism, none of the stem-loop structures of Euglena chloroplast group II introns can be tested by mutagenesis. The status of Euglena introns as traditional group II introns has therefore been questioned.

Euglena belongs to the kingdom Protozoa (or the subkingdom Protozoa of the old kingdom Protista) and is one of the earliest-branching organelle-containing eukaryotes. The divergence of species within the genus Euglena is thought to have begun about a billion years ago, much earlier than the branching of yeast (kingdom Fungi) species and land plants. Moreover, intron sequences mutate faster than exon sequences. It is therefore not unexpected that Euglena chloroplast intron sequences are much more diverse than those of yeast. In fact, the extremely high similarity among yeast group II introns (even between non-homologous introns) might mask which conserved features are genuinely important for function.
The conservation in Euglena group II introns, on the other hand, should reflect those structural elements that are necessary for function. In lieu of genetics, comparative analysis of homologous introns in Euglena could therefore be a very effective approach for obtaining first-hand data about the structure and processing characteristics of group II introns. The psbC intron 2 external intron 2a data set, comprising five homologous introns, is currently the largest homologous intron data set for Euglena. By comparative analysis of these five intron 2a's, I show that the core of the Euglena group II intron secondary structure conforms to the canonical model, yet contains some unique structural elements. Combining the canonical model with the results of the comparative analysis, I have also folded and analyzed the internal introns of all the mat 2 genes.
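The effect of relaxing the pairing rules can be illustrated with a small stem checker. This is a sketch, not the folding procedure used in this work. It tests whether a 5' strand and a 3' strand, both written 5' to 3', can pair position by position as an uninterrupted helix. Watson-Crick pairs are always allowed, while G-U and U-U pairs are allowed only when requested. The function name and flags are illustrative. The domain IV consensus stem CUGACUG...CAGUCAG reported in the Results is used as a test case.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def stem_pairs(five_prime: str, three_prime: str,
               allow_gu: bool = True, allow_uu: bool = False) -> bool:
    """Return True if the two strands (each written 5'->3') can form a bulge-free helix.

    The i-th base of the 5' strand pairs with the i-th base counted from the
    3' end of the 3' strand.
    """
    if len(five_prime) != len(three_prime):
        return False
    allowed = set(WATSON_CRICK)
    if allow_gu:
        allowed |= WOBBLE
    if allow_uu:
        allowed.add(("U", "U"))
    pairs = zip(five_prime.upper(), reversed(three_prime.upper()))
    return all(pair in allowed for pair in pairs)


if __name__ == "__main__":
    # Domain IV consensus stem: pairs under Watson-Crick rules alone.
    print(stem_pairs("CUGACUG", "CAGUCAG", allow_gu=False))  # True
    # A stem containing one G-U pair needs the relaxed rule.
    print(stem_pairs("GGU", "ACU", allow_gu=False), stem_pairs("GGU", "ACU"))  # False True
```

Each relaxation enlarges the set of stems that can be proposed. This is why helices postulated only under relaxed rules carry less weight than those supported by compensatory changes.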

Results

Primary sequence comparison of psbC intron 2a

The entire sequences of the five psbC intron 2a's were first compared pairwise using the GAP program of the GCG (Wisconsin) package. GAP uses the algorithm of Needleman and Wunsch to find the alignment of two complete sequences that maximizes the number of matches while minimizing the number of gaps, and reports the total percent similarity. For two sequences of 500-600 nucleotides, the default gap creation penalty was 50 and the gap extension penalty was 3. With the default gap parameters, the pairwise identity of any two of the five intron 2a's ranged from 49% to 59%. Because GAP handles length differences between introns poorly, the default gap penalties failed to align the beginnings of the introns. For example, the beginning of E. viridis intron 2a was paired with the 152nd nucleotide of E. gracilis intron 2a. For the 5' ends to align, the gap parameters had to be relaxed to an extent that depended on the species compared. After the gap parameters were adjusted to 40 (creation) and 1 (extension), most pairings had their two 5' boundaries properly aligned, and the total identity increased to between 54% and 63%. Because of its smaller size, E. spirogyra intron 2a failed to align with any species except E. granulata.

Pairwise comparison of nucleic acid sequences using loose gap parameters is not directly informative about tertiary structure conservation, yet it does reflect the relative conservation between some species. For example, E. spirogyra intron 2a is clearly much different from the others. Also, the E. gracilis and E. viridis intron 2a's had the highest identity under any gap parameters (59% and 63% with the default and relaxed parameters, respectively). This suggests a closer relationship, which is consistent with the established phylogenetic tree of Euglenoids (Thompson, 1995). The five intron 2a's were also subjected to pileup alignment with a gap creation penalty of 12 and a gap extension penalty of 4. The pileup result for the entire intron 2 region is shown in figure 4-1.
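To show how gap creation and extension penalties interact, the sketch below implements a textbook global (Needleman-Wunsch) alignment with affine gaps in Gotoh's three-matrix form and reports percent identity. It is an illustration, not GCG's GAP program. Several choices are assumptions made here:

- match and mismatch scores of 10 and 0;
- a gap of length k costs creation + k × extension;
- identity is the number of identical columns divided by the alignment length.

Its scores and percentages will therefore not reproduce the GAP values quoted above.

```python
NEG = float("-inf")


def global_align(a: str, b: str, match: int = 10, mismatch: int = 0,
                 gap_open: int = 50, gap_extend: int = 3) -> tuple[str, str]:
    """Global alignment with affine gaps (Gotoh's three-matrix form of Needleman-Wunsch).

    A gap of length k is penalized gap_open + k * gap_extend (an assumed convention).
    """
    n, m = len(a), len(b)
    # M: a[i-1] paired with b[j-1]; X: a[i-1] opposite a gap; Y: b[j-1] opposite a gap.
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + gap_extend * i)
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open + gap_extend * j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - gap_open - gap_extend,
                          X[i - 1][j] - gap_extend,
                          Y[i - 1][j] - gap_open - gap_extend)
            Y[i][j] = max(M[i][j - 1] - gap_open - gap_extend,
                          Y[i][j - 1] - gap_extend,
                          X[i][j - 1] - gap_open - gap_extend)

    # Trace back from whichever of the three end states scores best.
    tables = {"M": M, "X": X, "Y": Y}
    state = max("MXY", key=lambda k: tables[k][n][m])
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        cur = tables[state][i][j]
        if state == "M":
            s = match if a[i - 1] == b[j - 1] else mismatch
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
            state = next(k for k in "MXY" if tables[k][i][j] + s == cur)
        elif state == "X":
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
            if X[i][j] - gap_extend == cur:
                state = "X"  # the gap is being extended
            else:
                state = next(k for k in "MY" if tables[k][i][j] - gap_open - gap_extend == cur)
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
            if Y[i][j] - gap_extend == cur:
                state = "Y"
            else:
                state = next(k for k in "MX" if tables[k][i][j] - gap_open - gap_extend == cur)
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical non-gap columns as a percentage of all alignment columns (one convention)."""
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / len(aln_a)


if __name__ == "__main__":
    top, bottom = global_align("GUGUG", "GUGCG")
    print(top, bottom, percent_identity(top, bottom))  # GUGUG GUGCG 80.0
```

Lowering `gap_open` from 50 to 40 and `gap_extend` from 3 to 1 makes long end gaps cheaper. This is the kind of change that lets the 5' boundaries of introns of different lengths line up.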
A consensus 5' boundary, "GUGYG", was identified, with E. spirogyra as the exception. The sequence downstream of the 5' boundary is variable, yet a conserved "CC" dinucleotide is notable. The most impressive primary sequence conservation is at the 3' end. The 3'-terminal nucleotides of all the introns are highly similar, and pairwise analysis of this 3' end block gave identities of 74% to 80%.

Figure 4-1 Nucleotide sequence alignment of psbC intron 2a homologues from E. gracilis, E. anabaena, E. viridis, E. granulata and E. spirogyra. Segments of the flanking psbC exon sequences, with amino acid translations, are shown in bold. The 5' boundaries are underlined. The mat 2 sequences are represented by a bold "ATG-TAA" string, indicating the start and stop codons. The branch-point adenosines of the lariats are also shown in bold.

psbC exon 2 |-- psbC intron 2a 25
E. anabaena   T AAC GGA. AAA T GTGTG AATCAAAA C CAATTATCGA   N P E L
E. spirogyra  T AAT GGA AAG T GTTTT ATTCTTTT C CTTATTAACT    N P K L
E. viridis    T AAT AAA AAA T GTGCG AAATTGAT C CAAAAATCGG    N E E L
E. granulata  T AAT GGA AAA T GTGCG AGATTAGTTC CGAAAATCGT     N P E L
E. gracilis   T AAT AAA AAA T GTGTG GCATGGATTC CAATAATCGC     N E E L

26 75
E. anabaena   TATGTTTATT TTATGTTAAG ACCATTTAAA GATATTAAAT ACCTTTCTTG
E. spirogyra  TATTTTGGTT TT TATTTTATG ATAATAGTGT .TAATTATAG
E. viridis    TTA TTTTAATT AGAGTTTTTA AATTTAATCA .TTTTGATTA
E. granulata  AT TTTATTTA ATTAATAAAG GTGATAAAAT .CCTTTATTA
E. gracilis   TAGGT..TTA GGATTAAAAA ATAATTTTTA ACTTAAGTCA .TTTAAGTTA

76 125
E. anabaena   AAAACTGCCT TAATTATAAG TAAGAATTTA TTTTTTATTA GTTAAATAAG
E. spirogyra  ATATATATTT ATTTAACCTT TAAAAATATA TTTATTCTT
E. viridis    ATTTTAGCTT TTTTGACTTA TAATACTTTT ATCCTTATTT TTTTATTATC
E. granulata  ATTTCAGCTT TATCATATAA TCTAGTTTAT ATCAATGATA TCTTTGCAAA
E. gracilis   ATTTATATTT TTTATTGGTA TAAAAACATT GATAAAAATT TATAAACTTG

126 175
E. anabaena   AAAATTTTAG CTAATATTTT TCATAATAAG CTTGAATTTC AGCTTAATAT
E. spirogyra  ...TATAATG ACATATTTTT TTTAT
E. viridis    ATTTGTTGT GTTTAATCTA AACTACTTCT ATTGTTAAA.
E. granulata  TATTTTAAA. TTTAAATTTA AAGTTAAATC ACTCTTTAAT TGAAGAATTT
E. gracilis   TTTTTTTGAA CAGTAATTTT TACTATTAAT TTTTTGGTGA AAAAGGTTTT

176 225
E. anabaena   AGTATTTTAT ATTCAATACA TTTAAATAAA TTCTTTCGAG ..CTCGATAA
E. spirogyra  CTTTAT
E. viridis    AATTAATTTA ATTCAAACTG TA.GATTCAA CTTTTAA AAATTT .
E. granulata  AAAGCTGTAA TTCCTTTCCG GTTTAAAGAA TAAATTTATT AAAAGTTCTT
E. gracilis   AAGTTTAAAT AATTAAATTT TTTGTTTAAT TAATTT

226 275
E. anabaena   ATGTTTTTTA TTGGATAAAG AT GTAGAAT GAA.TACACT TATTATTTTT
E. spirogyra  AAGGCTAAAG TTTTTTAAGT TA
E. viridis    TTGGCTAAA TTATTTTTTG CATATAAACG
E. granulata  TTGGTTAAAA TC
E. gracilis   TTGGCATATT TTATTCTTTT GTTTTTTATT GTGATTTGAA GTTTAGATTT

276 325
E. anabaena   TGTTGATTAA TTCTATTAAC TAACTAAATA AGTGTATCGT CG
E. spirogyra
E. viridis    AACTTGTTTT TTTGTTTAAG ATTTTTGTAA AAAGAATTAA GAAATTATTT
E. granulata  TA..TATTTG AAGCTTTTAT ATTATTTTTC AGTATATAAA AAATTTGCTT
E. gracilis   ATCCGCTTTG GGTAACATAA TTATGTTAAA TTAGATTTAA AAAGATGATA

326 375
E. anabaena   AG GGGAAATTAA AATTATAATT GATTTCGTTT
E. spirogyra  TTTT TGGAAATTCA ATTTAATTAT TTTAT
E. viridis    TTTTAA GGGAAATTAT GAAGT TTTAATTA. .
E. granulata  TATTAA GGGAAGTTAT TTT TTTTGAATAT
E. gracilis   AAAAAATTGT TTTTAATTAA GGGAAATTAA CTAAAATCTA AGGTACTAGG

376 425
E. anabaena   .TTTATTACT TTTATTTTTT AATTTGGCCT GACTGGTATA AATTATCATA
E. spirogyra  TTGG ATTTG.CCTA ACTGATATTT TTTGAATTG.
E. viridis    A TT AATTGTCCTG GCTGATTTTA TTTCATGAAT
E. granulata  TTAAA TA AATTGGCCTG ACTGAGTTTT AGAAGA....
E. gracilis   TAAAAA TT AATTGTCCTG ACTGGCATTT TTCATG....

426 (mat 2) 477
E. anabaena   AATTTTGTTT TATTAAAAGT TTAG ATG-TAG
E. spirogyra  TTCTTTTA TTTTATG-TAA
E. viridis    AGTATTTTAT TTAA..TTTT TTAGAGCTTT ..TATTCTTT TTTTGTG-TAA
E. granulata  ATTTTATTTT AAAACCTTTT ATTTATTAAT ATG-TAA
E. gracilis   AATTTTATTT TTTAAATTTA TTATGTAGTA AATATTTTTT TATAATG-TAA

478 527
E. anabaena   TTTTAAATGC TTTTTAAATA TTTGGTTTTT T ATTAAAATTAT
E. spirogyra  ATATCTAGTT ATATTTTAAC ATGCATTTTT T..ATTTGGT TTTTTATTTAT
E. viridis    TATA...GTT GTTAATTTTA GTTTGTAAGA G..T.TCAGT CGGAAAGCTGT
E. granulata  TTTATTCGTG CGGAGTGTCG GTAGTCTACT T...TTGGGC C..AAAC
E. gracilis   TTCTTAATAT ATCGTTGTTG ATTGTTAAAC AAATATAGGA ATCTTGAATAA

528 577
E. anabaena   GTTAAATTCT TATTTTTTGT TCTTATT... TATTAACCAG TCAGAAAGCT
E. spirogyra  ATT.AATTTT TTATA AGTTTTTCGG TTAGTAAGCT
E. viridis    ATGC.ATTTT TTGCATGTAC .TTTTTTA.. TTCTAATCAG TCGGAAAGCT
E. granulata  CCTATCAG TCAGAAAGCT
E. gracilis   TTAGAATTTT AATTTTGTAG TGTTAATTGT TTTTTGTCAG TCAGAAAGCT

578 627
E. anabaena   GTATGCTT.. CCAA..A... AAGCATGTAC AGATTCGAGA ACGGTTT.TT
E. spirogyra  GTATGTAAA TT...T TTGCTTGTAC AGCTTCGTAA ACGGTT..TT
E. viridis    GTATGCA. .. .TTTTT TGCATGTAC AGTTTCGAAA ACGGATTAGT
E. granulata  GTATGT.. .. TCATTTA ACATATAC AGCTTCGTAA ACGGTTTTTA
E. gracilis   GTATGTGCAA TGAATTATTT GTGCAAGTAT AGTTTGGAAA ACGGATT.TT

637 psbC exon 3
E. anabaena   CCGATTT.GTC TA ACT ATT GGT GGT   T I G G
E. spirogyra  CCGATTTAGTC TA GCT GTA GGT GGT   A V G G
E. viridis    CCGATTT.GTC TA ACT ATT GGT GGT   T I G G
E. granulata  CCGAT.TTGTC TA ACT ATT GGT GGT   T I G G
E. gracilis   .CGATTT.AAC TA ACT GTA GGT GGT   T V G G

Comparative analysis of external intron 2a's: establishing the secondary structure models

Based on the secondary structure model established for Euglena gracilis psbC intron 2 (Chapter 2) and on the primary sequence comparison of intron 2a homologues, helical-wheel structures of intron 2a of E. viridis, E. anabaena, E. granulata and E. spirogyra can be proposed and compared to that of E. gracilis (Figure 4-2).

Domains V and VI are well conserved. A consensus in the domain V stem, 5'..AAGCUGUA...UAYAGYUU..3', was clearly identified. Separated from the base of the stem by a dinucleotide bulge, a 5'..UGY...RCA..3' motif is nearly unaltered. In domain VI, the whole stem, 5'..AAA-CGG...CCG A* UUU..3', is conserved between species. Surprisingly, a consensus could also be identified in the domain IV stem, 5'..CUGACUG...CAGUCAG..3'. In E. viridis, G replaces the A at the fourth position of the 5' strand of this stem. The domain III stem is less conserved than domains IV-VI, yet it also contains a 5'..AUUA...UAAU..3' motif. Other conserved elements include the 5' boundary (5'-GUGYG..3') and a tertiary interaction motif in the core 5' to domain III, the 5'..YYGGGAA..3' y motif. This motif differs slightly only in E. spirogyra.
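Consensus strings such as GUGYG use IUPAC ambiguity codes (Y = C or U, R = A or G, and so on). A column-wise consensus of this kind can be sketched as below. The sketch assumes the sequences are already aligned to equal length. It summarizes every column by the IUPAC code covering all observed bases, which is a strict rule; published consensus sequences often tolerate minority variants. The example input is the 5' boundary pentanucleotides of the E. anabaena, E. viridis, E. granulata and E. gracilis intron 2a's in Figure 4-1. E. spirogyra is excluded, as in the text.

```python
# IUPAC nucleotide codes keyed by the set of bases they cover (RNA alphabet).
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("U"): "U",
    frozenset("AG"): "R", frozenset("CU"): "Y", frozenset("CG"): "S", frozenset("AU"): "W",
    frozenset("GU"): "K", frozenset("AC"): "M",
    frozenset("CGU"): "B", frozenset("AGU"): "D", frozenset("ACU"): "H", frozenset("ACG"): "V",
    frozenset("ACGU"): "N",
}


def iupac_consensus(aligned: list[str]) -> str:
    """Strict column-wise consensus: each column becomes the IUPAC code for all bases seen.

    DNA input is converted to RNA (T -> U). Gap characters ('-' or '.') are ignored,
    and an all-gap column becomes '-'.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    out = []
    for column in zip(*(s.upper().replace("T", "U") for s in aligned)):
        bases = frozenset(c for c in column if c not in "-.")
        out.append(IUPAC[bases] if bases else "-")
    return "".join(out)


if __name__ == "__main__":
    boundaries = ["GTGTG", "GTGCG", "GTGCG", "GTGTG"]  # anabaena, viridis, granulata, gracilis
    print(iupac_consensus(boundaries))  # GUGYG
```

Adding the E. spirogyra boundary (GTTTT) under this strict rule yields GUKYK. This shows why that intron is treated as the exception rather than folded into the consensus.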

The sizes, and many of the sequences, of the central core of the helical wheel are also conserved among the five intron 2a's. The core is organized as follows in all five species:

- 5 nucleotides (GUGYG) before the domain I (DI) stem;
- 6-8 nucleotides (including the y motif) before DIII;
- 3 or 4 nucleotides between DIII and DIV;
- a single adenosine between DIV and DV;
- a "GGA" or "CGA" between DV and DVI;
- 3 or 4 nucleotides at the 3' end of the intron, the last of which is a "C".

Figure 4-2 Secondary structure models for psbC intron 2a's, with domains I and II abbreviated. Conserved nucleotides are labeled blue. Positions that have undergone compensatory nucleotide substitutions are labeled yellow. The y motifs are in red. The branch-site A is labeled with an asterisk.

[Figure 4-2 panels: Euglena gracilis, Euglena viridis, Euglena anabaena, Euglena granulata and Euglena spirogyra intron 2a models, each with mat 2 in the domain IV loop]

Analysis of domains I and II of psbC intron 2a

More detailed analysis was done to investigate possible structures of domains I and II. A comparative sequence alignment is presented in figure 4-3 (only the stem regions of domain I are shown). The stem of domain I contains several conserved features. A consensus sequence after the 5' splicing boundary, "AAAUNRR-(U)n-CCRA", could be identified, with slight variations in each sequence. Also, two C-G pairs and two A-U pairs just above the first central loop in domain I are conserved (with the A-U pairs inverted in E. spirogyra). These elements strongly indicate the presence of domain I. However, the regions enclosed by the domain I stem varied drastically, both in size and in sequence, among the five Euglenas. No consistent secondary structure elements could be deduced, and therefore no tertiary interactions could be recognized. Potential EBS1 motifs were found in all five sequences, yet no well-conserved stem-loops could be established around them. Similarly, stem-loops could be established for domain II in all five introns, yet domain II varies greatly in size among the five introns (from 9 nt to 98 nt). The domain II stems of all five introns contain a block of A-U pairs that could possibly be related, but no primary sequence conservation could be confirmed in domain II.
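A consensus such as AAAUNRR-(U)n-CCRA can be searched for mechanically by translating its IUPAC letters into regular-expression character classes. In this sketch, (U)n is assumed to mean one or more U's, and only the codes appearing in this consensus are handled. The test sequence is synthetic, constructed to match, and is not taken from any of the five introns.

```python
import re

# IUPAC codes used in the domain I consensus, as regex character classes (RNA alphabet).
CLASSES = {"A": "A", "C": "C", "G": "G", "U": "U", "R": "[AG]", "Y": "[CU]", "N": "[ACGU]"}


def motif_regex(consensus: str) -> re.Pattern:
    """Compile a consensus like 'AAAUNRR-(U)n-CCRA' into a regex.

    '(U)n' is treated as one or more U's (an assumption); hyphens are separators.
    """
    parts = []
    for chunk in consensus.split("-"):
        if chunk == "(U)n":
            parts.append("U+")
        else:
            parts.append("".join(CLASSES[c] for c in chunk))
    return re.compile("".join(parts))


if __name__ == "__main__":
    pattern = motif_regex("AAAUNRR-(U)n-CCRA")
    synthetic = "GUGUGAAAUCAGUUUCCAAGG"  # made-up sequence for illustration
    m = pattern.search(synthetic)
    print(m.group(0) if m else None)  # AAAUCAGUUUCCAA
```

Since the text notes "slight variations in each sequence", a strict pattern like this may miss real instances. A practical search would allow one or two mismatches.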


Figure 4-3 Secondary sequence alignment of the psbC intron 2a domain I stem and domain II. The notations are the same as in figure 4-2.

Sequence analysis and secondary structure models of the internal introns: characterization of a mini-group II intron and identification of homologous intron 2ds

Although the external intron 2a and mat 2 are conserved throughout all species examined, the internal introns within mat 2 vary from species to species in number, size and location. Secondary structure models were proposed for each internal intron (Figure 4-6). Domains II through VI, as well as the core region, were predicted for all introns, yet the internal structures of domain I were again hard to determine. Likewise, most of the proposed tertiary interaction elements could not be identified in these internal introns. However, as in the external intron 2a, a y-y' motif was present in all of the internal introns.

E. gracilis intron 2d, E. viridis intron 2d' and E. anabaena intron 2d" have similar insertion sites. These lie between amino acids 115 and 116 in E. gracilis, 117 and 118 in E. viridis, and 112 and 113 in E. anabaena. In the mat 2 pileup alignment, the insertion sites of the three intron 2ds align to the same position (figure 4-4). The pileup analysis introduced a few gaps into the sequences, indicating possible insertions and deletions during evolution. The pileup alignment is therefore probably a more accurate guide to the relative locations of introns 2d, 2d' and 2d" than a comparison of the absolute insertion sites. The three intron 2ds are most likely inserted at the same position in mat 2 and may share a common ancestor.

1                                                 50
E. gracilis   MYQSNLNLQK KYTSDLLYYS WLSLKCGWKY FFE.IHNYCL FNSISRSWFK
E. viridis    VKNFNINLIN YITPEFLFFS WKQIKYQSKI KGM.FIRKKF IEPPSNFLFL
E. anabaena   MVKKQMIFRD QFTSDFLLNS WDILKNNSSL IEL.KHVRDF I.PISRFWFE

51                                                100
E. gracilis   RTSSLIKKG. ..FFIYPTVP LKIKNFFLSC TKKNFNLLKF KIVENAFLII
E. viridis    KTSYLIRSGQ FYYFCFPTLI SPLPSILDNF LLGSLLNLKL IIIQNAFLML
E. anabaena   KNIPFVKKGN YDY..YSLIF VKNYKLRLNQ KILPLRTIKR LILEISFINA

101                                               140
E. gracilis   IKNFFIYKIY VQSMNLIEC-2d-I FNVTSFSFMK PFFCKQCPN.
E. viridis    IKPFF.YSSN TIFYNSSEC-2d'-R FNLSKLSF
E. anabaena   LSPLFKKSS. ..FLNLTEY-2d"-N FYLNSIFKKE FFNLRTLKN.

Figure 4-4 Sequence alignment of the N-terminal fragment of the mat 2 protein in E. gracilis, E. viridis and E. anabaena. The insertion sites of the intron 2ds are indicated (2d, 2d', 2d"). Gaps were added by the pileup program.
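The argument that the three intron 2ds share one insertion site rests on mapping each absolute amino acid position onto a column of the gapped pileup alignment. A minimal sketch of that mapping follows, assuming the Figure 4-4 rows up to each insertion point are transcribed as shown, with "." as a gap and the OCR letter "l" read as "I". The function name `residue_to_column` and the data layout are illustrative, not part of the original analysis.

```python
# Figure 4-4 rows, columns 1-119 (up to each intron 2d insertion).
# Assumptions: "." marks a pileup gap; spaces are display spacing only.
GAP = "."

ALIGNED = {
    "E. gracilis": (
        "MYQSNLNLQK KYTSDLLYYS WLSLKCGWKY FFE.IHNYCL FNSISRSWFK "
        "RTSSLIKKG. ..FFIYPTVP LKIKNFFLSC TKKNFNLLKF KIVENAFLII "
        "IKNFFIYKIY VQSMNLIEC"
    ),
    "E. viridis": (
        "VKNFNINLIN YITPEFLFFS WKQIKYQSKI KGM.FIRKKF IEPPSNFLFL "
        "KTSYLIRSGQ FYYFCFPTLI SPLPSILDNF LLGSLLNLKL IIIQNAFLML "
        "IKPFF.YSSN TIFYNSSEC"
    ),
    "E. anabaena": (
        "MVKKQMIFRD QFTSDFLLNS WDILKNNSSL IEL.KHVRDF I.PISRFWFE "
        "KNIPFVKKGN YDY..YSLIF VKNYKLRLNQ KILPLRTIKR LILEISFINA "
        "LSPLFKKSS. ..FLNLTEY"
    ),
}

# Absolute amino acid position after which each intron 2d is inserted.
INSERTED_AFTER = {"E. gracilis": 115, "E. viridis": 117, "E. anabaena": 112}


def residue_to_column(aligned: str, residue_no: int) -> int:
    """Return the 1-based alignment column that holds residue `residue_no`."""
    columns = aligned.replace(" ", "")  # drop the 10-column display spacing
    count = 0
    for col, ch in enumerate(columns, start=1):
        if ch != GAP:
            count += 1
            if count == residue_no:
                return col
    raise ValueError(f"sequence has only {count} residues")


if __name__ == "__main__":
    for species, seq in ALIGNED.items():
        pos = INSERTED_AFTER[species]
        print(f"{species}: residue {pos} -> column {residue_to_column(seq, pos)}")
```

Run as a script, it reports column 119 for all three species: residues 115, 117 and 112 all occupy the same alignment column, so each intron follows the same column despite the different absolute positions.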

In order to investigate whether the E. gracilis 2d, E. viridis 2d' and E. anabaena 2d" introns are indeed homologous, a comparative secondary structure analysis was done (Figure 4-5). The overall similarity between the structures is low. However, domain V, the hallmark of group II introns, is well conserved. In addition to the nearly invariable triple base pair of the stem (5'-..AGC...GUU..-3'), a "GU" bulge on the 3' strand and two G-C pairs in the upper stem are conserved features specific to these intron 2ds. Extensive conservation was also found in domain VI. In each intron, domain VI consists of a three-base-pair lower stem, a bulged "A", a 6-7 base pair upper stem and a (U)n terminal loop. More significantly, a primary consensus sequence could be identified in the 3' strand of the domain VI stem, 5'-..CUUA*GUU...UA-3', including nucleotides at the 3' end of the core. Although domain VI has a well-established secondary structure, it does not contain conserved primary sequences among non-homologous introns. Therefore the consensus sequence found in domain VI is additional evidence that these three introns may be homologous.
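The domain V triple base pair can be checked mechanically by pairing the 5' and 3' strands of the stem antiparallel. The helper below is a hypothetical sketch, not a tool used in this work; it assumes that only Watson-Crick pairs and the G-U wobble count as paired, which is stricter than the relaxed pairing discussed later for Euglena.

```python
# Pairs accepted in a helix: Watson-Crick plus the G-U wobble.
# Assumption for illustration; stricter than the relaxed pairing
# proposed for Euglena stems.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_pairs(five_prime: str, three_prime: str) -> list[tuple[str, str, bool]]:
    """Pair two stem strands (both written 5'->3') antiparallel.

    Base i of the 5' strand faces base i counted from the 3' end of the
    3' strand. Each tuple is (5'-strand base, 3'-strand base, allowed?).
    """
    if len(five_prime) != len(three_prime):
        raise ValueError("strands must be the same length")
    return [(a, b, (a, b) in PAIRS)
            for a, b in zip(five_prime.upper(), reversed(three_prime.upper()))]


def is_helix(five_prime: str, three_prime: str) -> bool:
    """True if every position forms an allowed pair."""
    return all(ok for _, _, ok in stem_pairs(five_prime, three_prime))


if __name__ == "__main__":
    # Domain V triple base pair quoted in the text: 5'-..AGC...GUU..-3'
    for a, b, ok in stem_pairs("AGC", "GUU"):
        print(f"{a}-{b}", "paired" if ok else "mismatch")
    print(is_helix("AGC", "GUU"))  # True
```

For the domain V triad the pairs come out as A-U, G-U and C-G, so the stem closes with one wobble pair. Enlarging `PAIRS` is one way to explore more permissive pairing rules.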

Figure 4-5 Comparative analysis of intron 2d in E. gracilis, E. viridis and E. anabaena. Notations are the same as in Figure 4-2.

[Figure 4-5 panels: Euglena gracilis, Euglena viridis and Euglena anabaena intron 2d, domains I-VI labeled]

Figure 4-6. Secondary structure models of introns 2f, 2g and the mini group II intron 2h in E. granulata and of intron 2e in E. viridis. Domains are indicated in bold letters within each model. The 5' and 3' splice sites of each intron are indicated by arrows. Possible γ-γ' interactions are indicated by solid lines.

[Figure 4-6 panels: E. granulata 2f, E. granulata 2g, E. viridis 2e, E. granulata 2h (mini group II intron)]

Another interesting intron is intron 2h in E. granulata. At only 238 nt, intron 2h falls below the previously known size range of Euglena group II introns. The proposed secondary structure model shows a typical group II intron 5' boundary, "GUGCG", standard domains V and VI, and a well-defined domain III (Figure 4-6) with a typical γ motif. A small domain II and a clear domain I stem can also be identified. Thus this 238 nt intron appears to be a bona fide group II intron that apparently lacks domain IV. I have therefore labeled psbC intron 2h of E. granulata a "mini group II intron".

Discussion

psbC intron 2 is a bona fide group II intron.

Based on the comparative analysis of psbC intron 2 homologues from five species, the secondary structure of this intron is consistent with the canonical group II intron helical-wheel model. The core structure of the intron is well conserved both in size and in shape (the number of nucleotides between domains). The base stem of each domain is also well preserved. Surprisingly, consensus sequences were even found in the stems of domains III and IV. The best conserved domains are domains V and VI, not only at the level of RNA folding but also at the level of primary sequence. This degree of conservation in domains V and VI is not unusual, since domain V has been shown to be the catalytic domain (Michel, 1995). The triple base pair in the stem of domain V, 5'-..AGC...GUU..-3', was shown to be necessary for ribozyme activity (Boulanger, 1995), and a dinucleotide bulge on the 3' strand was shown to be critical for the basal processing rate (Schmidt, 1996). These fundamental elements, as well as domain VI, which contains the branch-site adenosine, are very well retained in Euglena group II introns. Unique to psbC intron 2, the stem of domain IV, the domain that contains the mat 2 coding region, is well conserved even at the primary sequence level, additional evidence of the importance and activity of the mat 2 protein. The same type of study has also been done with another group II intron, petB intron 2. In the three species examined, extensive conservation was also observed, which supported the presence of the core structure and of the bases of the domain stems (Thompson and Zhang, 1997). These comparative analyses have confirmed the membership of Euglena chloroplast introns in the group II intron family.

The Euglena group II intron "microcosm".

Although psbC intron 2 homologues display a bona fide core and the six domains defined in the canonical model, they appear to be unique in the folding of some of these domains. For example, except for the base of the domain II stem, comparative analysis has not identified any conserved primary or secondary motifs within the domain II loop region. In addition, although a convincing domain I base stem could be identified, in contrast to the canonical model no consistent secondary hairpins could be postulated inside domain I. Therefore the proposed tertiary interaction elements within the domain, including the EBS1 motif, are either missing or differ substantially from the canonical model. In this respect, the Euglena group II introns are unique in their folding and perhaps also in their processing mechanism. Although conserved sequences inside domains I and II have not been identified, it cannot be concluded that domains I and II lack secondary structure. In fact, the observation that in many Euglena twintrons the internal introns interrupt domain I strongly indicates that domain I contains structural elements critical for splicing. It is possible that stem-loop structures were maintained in the domain even though the primary sequences changed drastically. Since nucleotide pairing rules in Euglena are more relaxed than in other organisms, potential stem-loop structures are difficult to identify, especially since it has not yet been possible to test proposed structures by mutagenesis. In addition, it is likely that Euglena introns rely more heavily on trans-acting factors, which could result in unconventional structures being used in vivo. Based on which canonical group II intron elements are absent in Euglena, several features specific to Euglena intron processing can be proposed. For example, many of the tertiary interactions absent in Euglena chloroplast group II introns affect splicing efficiency but not accuracy. Therefore Euglena group II introns might ensure correct splicing via cis-stored information while relying on trans-acting protein factors for the processing rate. Also, to compensate for the lack of tertiary structure motifs, the conserved structural elements in Euglena could carry different, or simply more, functions than their counterparts in other group II introns. For example, the lack of EBS1 could be compensated for by recognition of the well-conserved 5' splice site, 5'-RRGUG..-3'. Moreover, the surprisingly well-conserved γ motif, almost the only tertiary element confirmed in Euglena group II introns, could play an unusually important role in 3' splice site recognition and may direct the formation of the second reaction complex.
To further investigate the structural and processing significance of the Euglena group II intron system, mutagenesis will be a necessary next step of research.

Internal introns seem to have a much higher mutation rate. The comparative analysis of the intron 2d homologues showed that their structures are much less conserved than those of the external intron 2a homologues. In fact, sequence analysis of all Euglena group II introns has indicated that the internal introns are usually smaller and structurally more abbreviated. Thus, it seems that the internal introns, although inserted later, were exposed to a higher mutation rate than the external introns. Or, more likely, the internal introns are less constrained with respect to mutation. It could be that the processing accuracy of the internal introns is less critical than that of the external introns, since the internal introns interrupt secondary structure elements, which are more flexible than amino acid codons. However, accurate processing of the mat 2 internal introns directly affects the correct translation of the intron maturase. There is an alternative explanation for the high percentage of mutations. Since mat 2 may be involved in either global chloroplast intron processing or processing of a specific set of group II introns, the splicing of its internal introns could be a highly controlled step that would in turn affect the rate of chloroplast gene expression. Trans-acting factors such as nuclear-encoded regulators could compensate for mutated secondary and tertiary elements to affect splicing rate and efficiency, as discussed above. The involvement of nuclear regulators in the control of chloroplast splicing would then also help to ensure nuclear-organelle coordination.

CHAPTER 5

EXPRESSION OF PARTIAL MAT 2 GENE OF EUGLENA GRACILIS IN E. COLI AND YEAST

Introduction

The hypotheses that psbC intron 2 is one of the earliest introns to have invaded the Euglena chloroplast and that mat 2 is a group II intron maturase are supported by two observations: psbC intron 2 and mat 2 are deeply rooted in the genus Euglena, and mat 2 proteins have very well conserved X domains and RT domains, indicative of activity. The research focus now is to initiate biochemical analysis of the activity of mat 2 and of its expression in Euglena. Because of the asexual nature of Euglena, mutagenesis is not practical for testing the activity of the gene product. Efforts have been made to raise a mat 2-specific antibody using in vitro synthesized epitopes. Due to the lack of knowledge about maturase structure, sensitive and specific epitopes have not been found. The present approach is to express the mat 2 gene in other systems in order to obtain pure protein, both to raise antibody against the whole protein and to perform biochemical assays. Expression of maturases has always been a difficult task. Attempts to express an intact maturase-containing plasmid often lead to cell death (Eskes, personal communication). To date, the expression of matK from mustard chloroplast is the only published report (Liere, 1995). One reason for the difficulty could be the toxicity of maturases to E. coli cells because of their potentially strong nucleic acid binding activity. Moreover, most maturases are intron-encoded, and their codon usage might differ from that of the exons. Exon-intron codon differences are apparent in many Euglena chloroplast genes, especially in the codons for basic and aromatic amino acids. To reduce the potential toxicity, I began the expression experiments with partial-length rather than full-length mat 2 gene fragments. Based on a convenient restriction site, NdeI, two fragments of mat 2, an 800 bp N terminal fragment and a 1600 bp C terminal fragment, were designed as expression targets. As a positive control, a fragment of the Euglena chloroplast gene rpoB, which was previously expressed in E. coli, was also selected for expression.
The pET expression system (Novagen) was chosen for mat 2 overexpression in E. coli. The pET vectors have a double-repression system suitable for expression of potentially toxic proteins (Figure 5-1). The target protein is expressed from a T7 promoter regulated by the E. coli lac operator. Repression by the lac regulatory region is relieved by IPTG induction, and the T7 RNA polymerase must be supplied by transforming the plasmid into a specific host strain. A histidine hexamer tag is added at the N-terminal end by the vector for affinity chromatography purification. A yeast expression vector, pG-1, was also chosen for expressing mat 2 in yeast. pG-1 has a strong promoter and a TRP1 gene for selection. It contains the pUC18 origin of replication and a β-lactamase gene for ampicillin resistance, so that it can be maintained and manipulated in E. coli (Figure 5-2).

Results

Cloning of mat 2 and rpoB fragments into expression vectors: Two pET vectors were chosen for expression. pET15b is the standard, doubly repressed vector, and pET32a contains a thioredoxin gene as a fusion partner (Figure 5-1). Primers BamHIcM and BamHIcRS (see "Material and Methods" for primer sequences) were used to synthesize cDNA fragments of mat 2 and rpoB, respectively. PCR reactions using the primer pairs NdeIpM/BamHIcM and KpnIpMC/BamHIcM were done to obtain a full-length and a C terminal fragment of mat 2 coding DNA, respectively. The PCR fragments and pET vectors were treated with appropriate restriction enzymes. The C terminal mat 2 fragment was cloned into the KpnI-BamHI site of pET32a. A 660 nt N terminal fragment of mat 2 was generated by cutting the full-length mat 2 with NdeI and was ligated into the NdeI site of pET15b. An 888 nt rpoB fragment was obtained by cutting the rpoB PCR product with BamHI and was ligated into the BamHI site of the pET15b vector. pET32a-mat2C was linearized just before the T7 terminator region and was treated with exonuclease III (GibcoBRL). Ligated plasmids were screened by restriction digestion. Clone pET32a-mat2CΔ3, in which mat2C was truncated to a 285 bp fragment, was selected for expression.

Figure 5-1 Maps of recombinant pET plasmids that contain mat 2 and rpoB fragments. The T7 promoter is indicated by a vertical black bar with an arrow on the top. Genes and coding regions are shown as rectangular boxes with arrows indicating orientation. mat 2 and rpoB inserts are drawn above each plasmid with insertion sites indicated. Abbreviations used in the maps: o: lac operator; Trx: thioredoxin gene; H: a histidine hexamer; S: a 15 amino acid fragment of ribonuclease S, used here as an affinity tag for purification; pcs: the polycloning site; AmpR: ampicillin resistance gene; lacI: lac repressor gene; Ori: a bacterial origin of replication; mat2N: mat 2 N terminal fragment; mat2C: mat 2 C terminal fragment; mat2CΔ3-1: a deleted mat2C, with the deleted region indicated by an open box; rpoB e7+e8: rpoB with exon 7 and partial exon 8.

[Figure 5-1 panels: pET-32a-mat2C (~7500 bp; Trx 105 aa, mat2C ~530 aa); pET-32a carrying mat2CΔ3-1 (~6200 bp; ~95 aa); pET-15b-mat2N (~6378 bp; mat2N ~220 aa); pET-15b-rpoB (~6500 bp; rpoB e7+e8, 290 aa)]

A yeast expression plasmid was generated by transferring the insert of pET32a-mat2C into pG-1 (Figure 5-2). PCR was carried out with linearized pET32a-mat2C as template using primers BamHIpP and SalIcP. The resulting fragment contained the mat 2 C terminal fragment (~530 aa) together with the Trx protein and the histidine tag. The fragment was treated with appropriate enzymes and ligated into the BamHI-SalI site of pG-1.

[Figure 5-2 panel: pG-1 (~7400 bp) carrying Trx fused to mat2C (~530 aa)]

Figure 5-2. Map of the recombinant pG-1 plasmid containing mat 2 joined to a His tag. Abbreviations are the same as in Figure 5-1. TRP1 is the tryptophan biosynthesis gene used for selection in trp-deficient cells. The stippled box on the plasmid indicates the 2-micron sequences. GPD: glyceraldehyde-3-phosphate dehydrogenase promoter; PGK: phosphoglycerate kinase terminator.

Expression of mat 2 and rpoB gene fragments in E. coli: The plasmids pET15b-rpoB, pET32a, pET15b-β-gal, pET32a-transferrin and pET32a-mat2C were transformed into the expression strain BL21(DE3) (an E. coli strain lysogenized with phage DE3, which carries the T7 RNA polymerase gene) and into the uninducible strain BL21. 25 µl of crude sample from each induced or uninduced cell culture was analyzed by SDS-PAGE. Induction was successful for rpoB, β-gal, thioredoxin and transferrin (Figures 5-3 and 5-4). Uninduced and non-inducible cells showed traces of background expression, while control expression was substantial after 1/2 hour of IPTG induction. The amount of protein in the 1 hour sample was similar to that of the 1/2 hour sample, indicating that expression peaked within an hour. No induction could be detected in the 25 µl crude sample of the pET32a-mat2C induction. To test whether any low-level expression occurred, the sample volume was increased. 300 ml of induced pET32a-mat2C cells (OD600 = 1.0-1.2) were collected, and the protein extract was subjected to column purification and concentration. Six 250 µl elution fractions were collected with a gradient of 100 mM to 1 M imidazole. When 50 µl of each elution was loaded on the gel, a very faint band was obtained at the predicted position for pET32a-mat2C (~90 kd) (Figure 5-5).

To test whether this band is the expected pET32a-mat2C peptide, the pET32a-mat2C plasmid was subjected to deletion subcloning. The derived plasmid pET32a-mat2CΔ3-1, which contained only a 95 aa N terminal portion of the mat2C fragment, was transformed and induced in the same way. Although no induction was visible in the crude sample (data not shown), a band in the correct size range (~30 kd) was obtained in most of the column elution fractions, most strongly in the 200 mM imidazole elution (Figure 5-6; only the 200 mM fraction is shown). The ~90 kd band was not obtained in any of the elutions with this truncated pET32a-mat2C sample. These results support the authenticity of the pET32a-mat2C band.

Figure 5-3 E. coli expression of the Euglena rpoB gene fragment in the pET15b expression vector and of the thioredoxin protein in the pET32a vector. 25 µl of crude cell lysate was loaded. The sizes of the protein molecular weight marker (Jenkins, 1996) are shown on the right, and the sizes of the rpoB (40 kd) and thioredoxin (20.4 kd) protein fragments are shown on the left. Lane 1: rpoB fragment in pET15b, induced for 1 hour (37°C); Lane 2: rpoB fragment in pET15b, induced for 1/2 hour (37°C); Lane 3: rpoB fragment in pET15b, uninduced; Lane 4: pET32a plasmid (with thioredoxin), induced for 1/2 hour (37°C); Lane 5: pET32a plasmid (with thioredoxin), induced for 1 hour (37°C).

[Gel image, lanes 1-5; marker bands 200, 116, 97.4, 66, 45, 28, 20.4 and 14.4 kd; rpoB at 40 kd]

Figure 5-4 E. coli expression of β-galactosidase and the Euglena rpoB gene fragment in the pET15b expression vector, and of transferrin and the trx-psbC fusion protein in the pET32a vector. 25 µl of crude cell lysate was loaded. Lane 1: transferrin in pET32a, induced for 1 hour; Lane 2: transferrin in pET32a, uninduced; Lane 3: mat 2 C terminal fragment (mat 2C) in pET15b, induced for 1 hour; Lane 4: mat 2C in pET15b, uninduced; Lane 5: mat 2C in pET32a, induced for 1 hour; Lane 6: mat 2C in pET32a, uninduced; Lane 7: rpoB fragment in pET15b, induced for 1 hour; Lane 8: rpoB fragment in pET15b, uninduced; Lane 9: β-gal in pET15b, induced for 1 hour.

[Gel image, lanes 1-9; marker bands 200, 116, 66, 45 and 23 kd; ~80 kd and 40 kd positions marked]

Figure 5-5 His-tag column purification of the E. coli expression product of plasmid pET32a-mat2C. The arrow on the right indicates the predicted size range of the target protein. Lane 1: 5% of the crude cell extract; Lane 2: 5% of the column flow-through; Lane 3: 5% of the elution fraction with 100 mM imidazole; Lane 4: 200 mM imidazole elution; Lane 5: 400 mM elution; Lane 6: 600 mM elution; Lane 7: first 1 M elution; Lane 8: second 1 M elution.

[Gel image, lanes 1-8; marker bands 200, 116, 97.4, 66 and 45 kd; target protein at ~90 kd]

[Gel image, lanes 1-3; marker bands 200, 97, 43, 29 and 18.4 kd; 40 kd, 30 kd and 20.4 kd positions marked]

Figure 5-6 Affinity-column-purified rpoB (40 kd), thioredoxin (20.4 kd) and the truncated trx-mat2C protein fragment trx-mat2CΔ3-1 (30 kd). Lane 1: rpoB fragment in pET15b; Lane 2: pET32a plasmid (with thioredoxin); Lane 3: 95 aa mat 2 fragment in pET32a.

Expression of mat 2 C terminal fragment in yeast: The plasmid pG-1-trx-his-mat2C was transformed into a tryptophan-synthesis-deficient (Trp-) yeast strain and grown under tryptophan selection. 300 ml of induced cell culture grown for approximately 60 hours at 28°C was collected. The protein was extracted, concentrated and purified through a 400 µl nickel column. A clean and distinct band was seen in the 200 mM elution lane (Figure 5-7). This polypeptide is approximately 68 kd, the predicted size of the trx-his-mat2C fragment. Since there are no antibodies to mat 2, the identity of the putative mat 2 band could not be confirmed by immunodetection. However, only one extra band, which was very faint, was seen below the 68 kd fragment. This result is much cleaner than the expression from E. coli extracts. Traces of the same bands could be seen in all other elutions, yet all appeared to be less than 10% of the intensity of the 68 kd band in the 200 mM imidazole fraction, indicative of strong His-tag specificity. The same specificity was observed with rpoB expression and purification, in which most of the protein eluted in the 200 mM fraction. This is another indication of the purity of the product. The yeast expression system appears to be more suitable for Euglena maturase expression than the E. coli system.

Figure 5-7. His-tag affinity nickel-column purification of the His-Trx-mat2C fusion protein expressed in yeast. The sizes of the protein molecular weight standard (BRL) are on the left. The position of the target protein, trx-mat2C, is indicated by an arrow on the right.
Lane 1: 0.5% of the crude extract of yeast containing the pG-1-trx-mat2C plasmid; Lane 2: crude extract after a 0.45 µm filter; Lane 3: 0.05% of the flow-through; Lane 4: column wash (60 mM imidazole); Lane 5: 8% of the 200 mM imidazole elution; Lane 6: 8% of the 400 mM imidazole elution; Lane 7: 8% of the 600 mM imidazole elution; Lane 8: first elution at 1 M imidazole; Lane 9: second elution at 1 M imidazole.

[Gel image, lanes 1-9; marker bands 200, 116, 97.4, 66, 45 and 28 kd]

Discussion

Successful expression of Euglena chloroplast genes with the pET expression system: Few Euglena chloroplast genes have been expressed in other systems. Overexpression has been unsuccessful with roaA, a non-intron-encoded open reading frame. In fact, rpoB is the only Euglena chloroplast gene reported to be moderately expressed in another system. Using the doubly repressed pET system, I expressed and purified large quantities of a Euglena chloroplast gene product, that of rpoB. The pET expression system proved to be very sensitive and efficient. In addition, the affinity tag, a histidine hexamer, appeared to be highly specific during purification. Successful expression with the pET system thus provides convenient access to Euglena chloroplast proteins for further biochemical and molecular genetic investigation.

Possible reasons for the low expression rate of mat 2 fragments:

One major problem of mat 2 expression is possible toxicity caused by its RNA-binding ability. In my expression system, toxicity was mitigated in several ways. First, the expression vectors have a doubly repressed T7 promoter. Second, mat 2 was expressed as truncated N terminal and C terminal fragments. Third, the antibiotic selection (ampicillin) was frequently replenished during expression. A plasmid stability test (outlined in the Novagen pET manual) showed that little plasmid loss occurred during induction. Therefore, the toxicity of mat 2 appears to have been reduced to an acceptable level. However, leaky expression from the T7 promoter could still have produced enough mat 2 to cause cell death prior to induction. Therefore, although toxicity did not cause plasmid loss during induction, it could have resulted in a low starting concentration of inducible cells. Besides toxicity, system incompatibility was the next concern. Various types of proteins were overexpressed from the pET plasmids, showing the robustness of the system. The fact that the rpoB fragment, which also comes from a Euglena chloroplast gene, was strongly induced indicates that the expression system is not intrinsically refractory to Euglena genes. However, rpoB and mat 2 do have different codon usage (analysis data not shown). While no particular tRNA was absent, the mat 2 gene did appear to use some tRNAs much more frequently than rpoB. Moreover, rpoB codon usage appeared more similar to that of E. coli genes than did mat 2 codon usage. Therefore, another explanation for the difficult expression could be that limited amounts of certain tRNAs in E. coli slowed the translation of mat 2. This could explain why induction of a deleted clone of mat 2C was more visible. The possibility that mat 2 is susceptible to some very active proteases could not be excluded, either.
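The codon-usage comparison between rpoB and mat 2 is mentioned but not shown. One simple way to make such a comparison, sketched here under the assumption that in-frame coding sequences are available, is to tabulate relative codon frequencies and take half the summed absolute difference between two profiles. The metric and the function names are illustrative choices, not the analysis actually performed, and the two sequences in the usage lines are invented.

```python
from collections import Counter


def codon_usage(cds: str) -> dict[str, float]:
    """Relative frequency of each codon in an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")  # accept RNA or DNA alphabet
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    counts = Counter(codons)
    return {codon: n / len(codons) for codon, n in counts.items()}


def usage_distance(a: dict[str, float], b: dict[str, float]) -> float:
    """Half the L1 distance between two profiles: 0 = identical, 1 = no shared codons."""
    return 0.5 * sum(abs(a.get(c, 0.0) - b.get(c, 0.0)) for c in set(a) | set(b))


if __name__ == "__main__":
    gene_x = codon_usage("ATGAAAAAATTTTAA")  # hypothetical toy sequences
    gene_y = codon_usage("ATGAAGAAATTCTAA")
    print(round(usage_distance(gene_x, gene_y), 3))  # 0.4
```

A natural refinement would restrict the profiles to the codons for basic and aromatic amino acids, where the Introduction notes the largest exon-intron differences.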

A third factor in expression control is the rate of mRNA degradation, i.e. mRNA stability. Euglena chloroplast mRNAs are prokaryote-like. Therefore, instead of polyadenylation, inverted repeats and stem-loops in the 3' regions of the mRNAs, together with many nuclear-encoded RNA-binding proteins, are used to stabilize the transcripts. It is possible that the intron-encoded maturase mRNAs are stabilized by specific protein factors in Euglena. When expressed in a bacterial system, the maturase mRNA may lack these factors and become susceptible to endonucleases. RNA stability might be improved by switching to a eukaryotic expression system.

The promising yeast system and future goals:

To employ a eukaryotic expression system, I have developed a practical protocol for the expression and purification of Euglena chloroplast genes in yeast. I have also been able to express and purify segments of a group II intron maturase, one of the most difficult proteins to express. However, the expression of mat 2 was very inefficient, and the procedure was not always reproducible. Expression in the yeast system turned out to be much more difficult to control than in the E. coli system. First, the expression conditions for yeast seemed to be restricted. The truncated mat 2 fragment in pET32a-mat2CΔ3-1 could be expressed in E. coli over a wide range of induction temperatures, from 30 to 37°C. For the yeast system, however, temperatures above 29°C blocked cell growth. Also, due to my limited experience with yeast cell cultures, the expression rate could not be determined, although log-phase cells did not seem to give the highest expression. Purification of the expressed protein from yeast was also very difficult, since yeast cells are not as easily lysed as bacteria. Sonication, which efficiently breaks bacterial cells, failed to lyse even 5% of the yeast cells.

Zymolyase treatment, glass bead disruption and sonication were combined to give an acceptable amount of cell breakage. Even so, 300 ml of yeast culture yielded much less crude protein than an equivalent E. coli culture. Protein could also be lost during the dialysis procedure that was necessary to prepare the lysed sample for column purification. Residual EDTA from the lysis buffer could chelate the nickel on the column and lower its binding capacity. All of the above could combine to give a low yield of purified protein and contribute to the difficulty in reproducing the expression and purification experiments. Therefore the expression procedure could still be significantly improved. However, the results obtained with this system represent the first successful expression of a Euglena chloroplast maturase and are very promising for future work.

Material and Methods

Vectors and cell strains: The E. coli overexpression vectors pET15b and pET32a were purchased from Novagen. The E. coli plasmid-maintenance strain BL21, the expression strain BL21(DE3) and the positive control strain were from Novagen. The yeast expression vector pG-1 was provided by Dr. Roy Parker, who also provided the yeast expression strain YRP840.

cDNA and PCR primers: Three primers were used to amplify mat 2 cDNA fragments. BamHIcM is a cDNA primer that primes in the 3' region of psbC intron 2, about 10 nt downstream of the mat 2 stop codon. NdeIpM is a PCR primer at the beginning of mat 2. Both primers were modified to introduce the necessary restriction sites. KpnIpMC is mRNA-like and primes in the middle of mat 2, just downstream of the third internal intron (2d). KpnIpMC introduces a KpnI site and deletes an existing NdeI site.

Two primers were designed for rpoB cDNA cloning. BamHIcRS (cDNA primer) anneals to the middle of rpoB exon 8. BamHIpR is a PCR primer for the beginning of exon I. Both primers were constructed with BamHI sites at their ends. Two primers were designed for construction of the yeast expression plasmid. BamHIpP is mRNA-like and primes to the lac operator region of the pET32a/pET15b vectors. SalIcP is cDNA-like and anneals to the region just upstream of the T7 terminator in the pET32a/pET15b vectors. The restriction sites for cloning were built into the middle of these two primers.

The sequences of all the primers and the available coordinates are listed in Table 5-1.

Table 5-1 The sequences and locations of the oligonucleotides used for expression plasmid construction. Locations are EMBL coordinates in the E. gracilis chloroplast genome; "?" marks a base that is illegible in the source copy.

Name       Template   Sequence (5'-3')                Location (E. gracilis)
KpnIpMC    mRNA       (not given)                     16089-16114
BamHIcM    cDNA       CTAATTATTCAAGGATCCTATATTTG      17745-17770
NdeIpM     mRNA       TAT?T?TCATATGTATCAAAGTAAT       14151...17732
BamHIpR    mRNA       GGGATCCGGTAATGCTTCTAGAGTACGC    114331-114352
BamHIcRS   cDNA       GGGATC????TATTAGCATC??TTATG     112055-112073
BamHIpP    mRNA       GTGAGCGGATCCCAATTCCC?           N/A
SalIcP     cDNA       G??T?GTC?ACAG?CGCAT?C           N/A
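Each primer in Table 5-1 was designed to carry a cloning site, and KpnIpMC was also meant to remove an NdeI site. A short scan like the sketch below can confirm that a primer or PCR product carries the intended sites. The recognition sequences are the standard ones for BamHI, KpnI, NdeI and SalI; the function name `find_sites` is hypothetical, and the example inputs are only the legible stretches of BamHIcM, BamHIpR and NdeIpM from the table.

```python
# Standard recognition sequences (5'->3'). All four are palindromic, so a
# top-strand search also covers the bottom strand.
SITES = {
    "BamHI": "GGATCC",
    "KpnI": "GGTACC",
    "NdeI": "CATATG",
    "SalI": "GTCGAC",
}


def find_sites(seq: str) -> dict[str, list[int]]:
    """Map each enzyme to the 1-based start positions of its site in `seq`."""
    seq = seq.upper()
    hits: dict[str, list[int]] = {}
    for enzyme, site in SITES.items():
        starts = [i + 1 for i in range(len(seq) - len(site) + 1)
                  if seq[i:i + len(site)] == site]
        if starts:
            hits[enzyme] = starts
    return hits


if __name__ == "__main__":
    primers = {
        "BamHIcM (legible part)": "AATTATTCAAGGATCCTATATTTG",
        "BamHIpR (legible part)": "GGGATCCGGTAATGCT",
        "NdeIpM (legible part)": "TCATATGTATCAAAGTAAT",
    }
    for name, seq in primers.items():
        print(name, find_sites(seq))
```

The same scan run over a full-length mat 2 sequence would locate the internal NdeI site used to split the N terminal fragment.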

cDNA synthesis, PCR analysis and cloning: cDNA was synthesized using SuperScript II reverse transcriptase (BRL) with 10 mM DTT and 50 ng/µl actinomycin D. 200 units of enzyme, 10 µg of purified total chloroplast RNA template and 200 ng of cDNA primer were used for each reaction. Reaction buffer was from BRL. Reactions were carried out for 1 hour at 37°C. PCR amplifications with plasmid as template were carried out for 30 cycles of 94°C for 1 min, 55°C for 2 min and 72°C for 3 min. PCR amplifications using cDNA template were carried out similarly but were annealed at 50°C for 2 min. Plasmids were constructed using T4 DNA ligase as described previously.

Expression and protein isolation in E. coli:

The expression protocol from the Novagen pET System manual was followed. Two E. coli strains were used: BL21, a non-inducible strain, is the standard strain for plasmid maintenance; BL21-DE3, which carries the DE3 prophage encoding T7 RNA polymerase, is the induction strain. 250 µl of transformed cells were plated onto LB+Amp plates. Colonies were inoculated into 2 ml LB liquid medium with 200 µg/ml carbenicillin and incubated overnight. The overnight cultures were diluted 1:1000 into fresh LB medium with 200 µg/ml carbenicillin and grown at 37°C for 3 hours. The cells were pelleted, resuspended in fresh LB medium containing 1 mM IPTG and 500 carbenicillin, and incubated for 1 hour at 30°C.
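The 1:1000 inoculation and the IPTG and carbenicillin additions are simple dilution arithmetic (C1V1 = C2V2). The Python sketch below takes the 300 ml culture volume shown in Figure 5-8 as an example; the stock concentrations (1 M IPTG, 100 mg/ml carbenicillin) are assumptions for illustration and are not stated in the text.

```python
def inoculum_volume_ml(final_volume_ml: float, dilution_factor: float) -> float:
    """Volume of overnight culture needed for a 1:dilution_factor inoculation."""
    return final_volume_ml / dilution_factor


def stock_volume(final_conc: float, final_volume: float, stock_conc: float) -> float:
    """C1*V1 = C2*V2 solved for V1.

    Concentrations must share units; the result is in the units of final_volume.
    """
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the target")
    return final_conc * final_volume / stock_conc


culture_ml = 300.0
print(inoculum_volume_ml(culture_ml, 1000))  # 0.3 (ml of overnight culture)

# Assumed stocks: 1 M IPTG and 100 mg/ml carbenicillin.
iptg_ml = stock_volume(1e-3, culture_ml, 1.0)      # 1 mM final, in M
carb_ml = stock_volume(200e-3, culture_ml, 100.0)  # 200 ug/ml = 0.2 mg/ml final
```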

Induced cell cultures were collected by centrifugation, resuspended in cold nickel column binding buffer (5 mM imidazole, 0.5 M NaCl and 20 mM Tris-HCl, pH 7.9) with 20 mM PMSF, and sonicated for 2 min at a high setting. The sonicated suspension was centrifuged at 12,000 rpm for 15 min. Pellets were collected, resuspended in binding buffer with 6 M urea and incubated overnight. The suspensions were then sonicated for another 2 min and centrifuged. Clear supernatants were collected, passed through a 0.45 µm filter and applied to resin columns charged with 50 mM NiSO4. The columns were washed with 5 mM imidazole low-salt buffer, and the proteins were eluted with 100 mM to 1 M imidazole as described in the Novagen manual. The collected elution fractions were directly subjected to SDS gel electrophoresis analysis.
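The E. coli protocol gives its spin in rpm (12,000 rpm), whereas the yeast protocol below gives 12,000 × g; the two are only comparable once the rotor radius is known. The sketch uses the standard relation RCF = 1.118 × 10^-5 × r × rpm^2 (r in cm). The 8 cm radius is an assumption for illustration, not the rotor used in this work.

```python
import math

# Standard relation between rotor speed and relative centrifugal force:
#   RCF (x g) = 1.118e-5 * r_cm * rpm**2
K = 1.118e-5


def rpm_to_rcf(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at a given speed and rotor radius."""
    return K * radius_cm * rpm ** 2


def rcf_to_rpm(rcf: float, radius_cm: float) -> float:
    """Speed (rpm) needed to reach a given RCF at a given rotor radius."""
    return math.sqrt(rcf / (K * radius_cm))


radius = 8.0  # hypothetical rotor radius in cm
print(round(rpm_to_rcf(12000, radius)))  # 12879
print(round(rcf_to_rpm(12000, radius)))  # rpm giving 12,000 x g at this radius
```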

Protocol for expression and protein isolation in yeast: The yeast protein isolation protocol was adapted from Current Protocols in Molecular Biology, Ch. 13, based on suggestions by Dr. Dodson and other colleagues. The yeast expression plasmids were transformed into a trp- yeast strain using the LiOAc protocol (Gietz, 1991). A single colony was inoculated into 2 ml of trp- minimal medium and incubated for about 30 hours. The culture was then diluted 1:1000 into fresh trp- minimal medium and grown at 28°C for approximately 50 hours. The cells were collected by centrifugation, washed with water and resuspended in glass bead disruption buffer (20 mM Tris-HCl, pH 7.9, 10 mM MgCl2, 1 mM EDTA, 5% glycerol, 1 mM DTT and 0.3 M (NH4)2SO4). One cell volume (pellet volume) of acid-washed glass beads was added, and the suspensions were vortexed for 10 minutes with 20 mM PMSF. Vortexed samples were sonicated for 5 minutes. Cell debris was removed by centrifugation at 12,000 × g for 15 minutes, and the supernatant was dialyzed against a large volume of His-column binding buffer with 6 M urea. Dialyzed samples were then filtered and run through the nickel column as described above. Eluted samples were loaded directly onto an SDS gel.

Gel electrophoresis: Gel electrophoresis was carried out at 50 mA. Crude cell samples were obtained by pelleting induced cultures, resuspending them in 5X protein loading dye (62.5 mM Tris, pH 6.8, 2% SDS, 10% glycerol, 0.1% bromophenol blue and 5% β-mercaptoethanol) and boiling.

Column elution fractions were prepared as described above.
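The solid components of buffers such as the glass bead disruption buffer (20 mM Tris-HCl, 0.3 M (NH4)2SO4, and so on) are weighed out as molarity × volume × molecular weight. A minimal sketch for a hypothetical 100 ml batch follows, using standard molecular weights for anhydrous Tris base and ammonium sulfate; the Tris is assumed to be weighed as the free base and then titrated to pH 7.9 with HCl.

```python
# Standard molecular weights (g/mol) of the anhydrous compounds.
MW = {
    "Tris base": 121.14,
    "(NH4)2SO4": 132.14,
}


def grams_needed(molarity: float, volume_l: float, mw: float) -> float:
    """Mass of solute (g) = molarity (mol/L) * volume (L) * MW (g/mol)."""
    return molarity * volume_l * mw


# Final concentrations from the glass bead disruption buffer recipe (mol/L).
recipe = {"Tris base": 0.020, "(NH4)2SO4": 0.3}
volume_l = 0.1  # hypothetical 100 ml batch
for name, molarity in recipe.items():
    print(f"{name}: {grams_needed(molarity, volume_l, MW[name]):.3f} g")
```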

(Flow chart: 300 ml cell culture; spin down; rinse with water; resuspend in 100 ml binding buffer (BB); sonicate 2-3 min; spin down; resuspend pellet in 50 ml BB with 6 M urea on ice overnight; spin; filter clear solution through 0.45 µm; load onto charged column (~750 µl); collect flow-through; wash with 5 mM imidazole; elute with 200 mM, 400 mM and 1 M imidazole.)

Figure 5-8 Purification of E. coli-expressed pET plasmid proteins with a His-tag affinity column

(Flow chart: 300 ml cell culture; spin down; wash with water; spin down and measure pellet volume V; add 1V of acid-washed glass beads and 30-40 ml of glass bead buffer; vortex 10 min with PMSF; sonicate 5 min; collect supernatant; dialyze against binding buffer with urea; filter (0.45 µm); denatured His-tag affinity column as described in Figure 5-8.)

Figure 5-9 Purification of yeast-expressed protein with the His-tag affinity column

APPENDIX I

IDENTIFICATION OF A GROUP II TWINTRON IN THE ATPE GENE OF THE EUGLENA GRACILIS CHLOROPLAST GENOME

Introduction

Among the 155 introns in the Euglena gracilis chloroplast genome, there are 15 identified or proposed twintrons, unique intron-within-intron structures that have so far been found only in Euglenoid chloroplasts. In twintrons, internal introns interrupt important secondary structure elements of the external introns, thereby inhibiting processing of the external introns. Splicing of the internal introns restores the correct folding of the external introns, allowing them to be processed. Euglena twintrons vary in size and conformation. Both group II and group III introns can be internal and/or external introns. In addition to the simple "one intron within another intron" conformation, there are also complex twintrons. For example, petB intron 1 is a group II intron that is interrupted by one group II and one group III internal intron (Hong, 1994), and rps18 intron 2 is a group III intron that contains a group III internal intron which is itself interrupted by two further group III internal introns (Drager, 1993). psbC intron 4 is a group III intron that contains another group III intron, within which a protein coding region (mat 1) is found. Also, as reported in Chapter 2, psbC intron 2 is an "intron within a gene within an intron" complex.

Table A1-1 Summary of Euglena gracilis chloroplast twintrons. N/A = not available (adapted from Copertino, 1993)

Gene-intron  Size (nt)  External size (nt)  External type  Internal size (nt)  Internal type
atpE-1       757        N/A                 N/A            N/A                 N/A
psbF-1       1042       424                 group II       618                 group II
psbD-1       1098       463                 group II       635                 group II
psbT-1       1352       358                 group II       601; 393            group II; group II
petB-1       909        399                 group II       404; 106            group II; group III
psbD-S       3658       N/A                 group II       N/A                 (with ORF)
psbC-2       4144       592                 group II       3 introns           group II (see Ch. 2)
psbK-2       204        93                  group III      111                 group III
rpl16-3      208        96                  group III      112                 group III
rpoC1-1      210        114                 group III      96                  group III
rpoC1-3      213        111                 group III      102                 group III
rpoC1-11     198        102                 group III      96                  group III
psbC-4       1605       101                 group III      1504                group III with ORF
rps3-1       409        99                  group III      310                 group II
rps18-2      434        107                 group III      110; 106; 112       group III

Intron 1 of the atpE gene, which encodes the ATP synthase ε subunit, is 757 bp in length. Primary sequence analysis has identified two potential domain Vs, but only one domain VI. Here we report that, through cDNA analysis, we have characterized atpE intron 1 as a group II twintron, with the internal intron inserted into domain VI of the external intron.

Results

Identification of partially spliced atpE intron 1 pre-mRNA by RT-PCR: Primers were designed to detect partially processed atpE mRNA (Figure A1-1). cDNA primer c1 is specific to atpE exon 2, and cDNA primer c2 spans the intron 1-exon 2 junction. The mRNA-sense primer p1 anneals to atpE exon 1. The sequences and locations of all the primers are listed in the following table.

Table A1-2 Sequences and locations of primers used for cDNA and PCR analysis. The locations are EMBL coordinates in the Euglena gracilis chloroplast genome, accession number X70810.

Oligo  Sequence                        Location
p1     5'-ATGAGATTAGATGTATC-3'         89410-89428
c1     5'-GTTGAAGTCGrn"nGGG-3'         88605-88623
c2     5'-GTTTCAGGAATAAGTAAAGTAG-3'    88636-88657

Two control reactions with E. gracilis nucleic acid extracts were performed to define the sizes of the fully processed and fully unprocessed RNAs. RT-PCR with primers p1 and c1 yielded a small fragment of 68 bp, which corresponds to fully spliced atpE mRNA. The p1-c1 PCR reaction carried out with total chloroplast nucleic acid extracts yielded a larger fragment of 824 bp, which represents the unprocessed precursor. An RT-PCR reaction using primers p1 and c2 yielded two major products (Figure A1-1), one being the fully unprocessed precursor and the other an approximately 500 bp processing intermediate. Because c2 spans the intron 1-exon 2 junction, it can only prime on RNAs in which the external intron has not yet been removed. From the primer coordinates in Table A1-2, the unprocessed p1-c2 product is 793 bp, so an intermediate of about 500 bp indicates that atpE intron 1 contains an internal intron of about 300 bp.
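The size arithmetic behind this inference can be sketched in Python. The sketch assumes the primer coordinates in Table A1-2 are inclusive EMBL positions and takes the 473 bp intermediate size from the gel in Figure A1-1; the function name is illustrative.

```python
def amplicon_length(fwd: tuple[int, int], rev: tuple[int, int]) -> int:
    """Inclusive length between the outer ends of two primers.

    Coordinates are (start, end) genome positions as in Table A1-2. Taking
    the outermost coordinates makes this independent of which strand the
    gene lies on (atpE runs toward lower coordinates here).
    """
    lo = min(fwd + rev)
    hi = max(fwd + rev)
    return hi - lo + 1


P1 = (89410, 89428)
C1 = (88605, 88623)
C2 = (88636, 88657)

unspliced_p1_c1 = amplicon_length(P1, C1)  # 824 bp, matches the DNA control
unspliced_p1_c2 = amplicon_length(P1, C2)  # 793 bp
spliced_p1_c1 = 68    # RT-PCR product of fully spliced mRNA
partial_p1_c2 = 473   # gel estimate of the partially spliced intermediate

twintron_estimate = unspliced_p1_c1 - spliced_p1_c1  # 756, close to the stated 757
internal_estimate = unspliced_p1_c2 - partial_p1_c2  # 320
```

The 320 nt estimate agrees with the internal intron length determined by sequencing below.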

Characterization of the splice sites and secondary structure analysis of the internal and external introns:

The RT-PCR p1-c1 product and the RT-PCR p1-c2 product were cloned into the Bluescript KS(-) plasmid, generating plasmids pEZC 2016 and pEZC 2015, respectively. The two plasmids were then sequenced (Figure A1-2), and the 5' and 3' splice sites of both the internal and the external introns were determined. We conclude that atpE intron 1 is a 463 nucleotide group II intron that is interrupted by a 320 nucleotide group II intron. The splice sites of both the internal and the external introns are unambiguous. Both introns were folded according to the canonical model of group II intron secondary structure (Figure A1-3). The 5' splice boundaries of the two introns, "GUACG" for the external intron and "GUUCG" for the internal intron, each differ from the Euglena consensus "GUGYG" at only a single nucleotide. Both introns have clearly identifiable domains III through VI. A γ-motif that fits the consensus "RRGAY" was also found in both introns. Domains I and II were both easily identified for the external intron, and the EBS-IBS interaction motifs were present in the expected positions. Moreover, the tertiary interaction elements ε-ε' and α-α' were also identified. The external intron of atpE intron 1 appears to be one of the Euglena gracilis introns that conforms best to the group II intron secondary structure model. The internal intron structure, on the other hand, is much more abbreviated; even the closures of domains I and II were difficult to determine. The best way to further define the secondary structure of the internal intron would be through comparative analysis of homologous introns in other Euglena species.

Figure A1-1. RT-PCR analysis of atpE pre-mRNA. A simplified diagram of the atpE intron 1 region is shown below the gel. Black boxes represent exons, white boxes are external intron segments and the shaded box is the internal intron. Positions of the primers used are shown as arrows.
The gel shows the PCR amplification products of atpE mRNAs. The total nucleic acid PCR reaction is labeled "DNA". The sizes of the molecular weight markers are shown on the left and the sizes of the PCR and RT-PCR products are shown on the right.

(Gel image: markers 2323, 1929, 1371, 1264, 702, 224 and 112 bp; products 824 bp (unprocessed), 473 bp (partially spliced) and 68 bp (fully spliced).)

Figure A1-2 Sequences of the cDNAs corresponding to partially and fully processed atpE intron 1. Nucleotide sequences of cDNAs corresponding to partially (pEZC 2015) and fully (pEZC 2016) spliced mRNAs are shown. The expanded sequences represent the internal and external intron boundaries. The locations of the internal and external intron insertion sites are indicated by arrows. Schematic diagrams of the unspliced, partially spliced and fully spliced RNAs are shown below the gel.

(Sequencing gels: pEZC 2015 and pEZC 2016, lanes G, A, T, C.)

Figure A1-3 Secondary structure models for A) the external intron and B) the internal intron of the atpE twintron. Dashed lines and boxed nucleotides indicate the ε-ε' tertiary interactions. The γ-γ' interactions are indicated by solid lines. The branch-site A is marked with an asterisk (*). Nucleotides involved in the EBS1/2-IBS1/2 interactions and the α-α' interaction are marked with curved arrows. The 5' and 3' splice sites are indicated by straight arrows. The internal intron insertion site in the external intron is also indicated by an arrow.

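The comparison of the two 5' splice boundaries reported above with the Euglena consensus "GUGYG" (Y = C or U) can be made explicit by treating the consensus as a set of allowed bases per position. A minimal Python sketch; the IUPAC table is limited to the codes used in this appendix, and the function name is illustrative.

```python
# IUPAC nucleotide codes (RNA alphabet) needed for the motifs in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG",   # purine
    "Y": "CU",   # pyrimidine
    "N": "ACGU",
}


def mismatches(seq: str, consensus: str) -> list[int]:
    """1-based positions where seq is not allowed by the IUPAC consensus."""
    if len(seq) != len(consensus):
        raise ValueError("sequence and consensus must have equal length")
    return [i + 1 for i, (base, code) in enumerate(zip(seq.upper(), consensus.upper()))
            if base not in IUPAC[code]]


five_prime_sites = {"external": "GUACG", "internal": "GUUCG"}
for name, site in five_prime_sites.items():
    print(name, mismatches(site, "GUGYG"))
# external [3]
# internal [3]
```

Both boundaries differ from the consensus only at position 3; the same function can check a candidate γ-motif against "RRGAY".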

Discussion

The existence of twintrons is strong evidence for the "introns late" model of intron evolution (Copertino, 1992). It is believed that internal introns insert randomly into external introns, but only those that affect the processing of the external introns are maintained through evolution. Studies show that all the internal introns found in the Euglena gracilis chloroplast genome interrupt crucial secondary or tertiary interaction elements of the external introns. The only three internal introns found in a relatively unimportant domain are the psbC intron 2 internal introns (Chapter 2). However, these three internal introns interrupt the coding region of mat 2 and are therefore thought to regulate the expression of mat 2 rather than the splicing of intron 2a. According to the proposed secondary structure model, the atpE intron 1 internal intron is inserted into the terminal loop of domain VI of the external intron and therefore does not disrupt any base pairs. However, the domain VI stem of the atpE intron 1 external intron is very short (only 8 bp plus a bulged "A"), so the internal intron lies only 7 nucleotides upstream of the external branch-site "A". This is the closest to the 3' splice site of a group II external intron that any internal intron has been found to date. The internal intron could therefore block interactions between the crucial "A" and other structural elements, or prevent the "A" from being properly positioned in the catalytic center. The known domain
APPENDIX II

DETECTION OF GROUP II INTRONS IN EUGLENA MYXOCYLINDRACEA

Introduction

All the most parsimonious trees from a branch-and-bound search (see Figure 1-8) using the Phylogenetic Analysis Using Parsimony (PAUP) program place E. myxocylindracea as the sister group to the most derived clade, which contains the three taxa E. gracilis, E. mutabilis and E. geniculata, and a non-photosynthetic derivative, Astasia longa. The phylogenetic data suggest that the E. myxocylindracea lineage branched later than E. viridis and E. anabaena and, according to preliminary data (not shown), very possibly later than E. granulata and E. spirogyra. However, while most of the Euglenoid species tested show evolutionary conservation of the mat 2 gene and psbC intron 2, this intron and its internal ORF are missing in E. myxocylindracea. Since mat 2 has been found in E. viridis and E. anabaena, which branch more basally than E. myxocylindracea, the absence of mat 2 from E. myxocylindracea is believed to result from a secondary loss in the E. myxocylindracea lineage. This observation weakens the hypothesis that psbC intron 2 plays a fundamental role in intron processing in general, since several group III introns have been identified in the E. myxocylindracea chloroplast genome (Thompson, 1995), including psbC intron 4, the group III twintron that contains mat 1 (Doetsch, 1997). However, since mat 2 is encoded in a group II intron, it could also be proposed that mat 2 acts specifically on group II introns. To test whether group II intron processing in E. myxocylindracea is affected by the absence of mat 2, we attempted to identify other group II introns in E. myxocylindracea for study of their structural features. However, as we report in this chapter, no group II introns have yet been found in the E. myxocylindracea chloroplast genome.

This implies a strong link between mat 2 and the presence of group II introns in the genome.

Results

Selection of the most conserved genes for amplification:

The exon sequences of several E. gracilis genes that contain many group II introns were compared with their homologues from the chloroplast genomes of Euglena, Arabidopsis, Chlamydomonas, maize, spinach and rice. The consensus sequences produced from this comparison were used to design primers for screening E. myxocylindracea for introns by PCR. Some genes, such as atpH, atpF, atpA, atpE, atpB, petB and psbT, have very high mutation rates, and their consensus sequences were difficult to deduce. In contrast, many other genes are very well conserved. We therefore chose the more highly conserved genes to screen the E. myxocylindracea genome for group II introns. These genes include the deeply rooted psbD gene, which contains 8 group II introns and 1 group II twintron in E. gracilis; the psbC gene, which contains 8 group II introns, 1 group III twintron, 1 group II intron with an open reading frame and three internal group II introns; the psbA gene, containing 4 group II introns; the psaA gene, with three group II introns; the psaB gene, with 6 group II introns; the psbB gene, containing three group II introns and 1 group III intron; and lastly, the rbcL gene, which contains 9 group II introns. Degenerate oligonucleotides were synthesized to anneal to the exons of these genes in order to amplify the E. myxocylindracea homologues. The locations of the primers are shown in the bar diagrams in Figure A2-1, and their sequences in Table A2-1.

Table A2-1 Oligonucleotide primers used to amplify the E. myxocylindracea chloroplast genome. Locations are EMBL coordinates in the E. gracilis chloroplast genome. Degenerate nucleotides are indicated in parentheses. The "N" in the cbB5 sequence represents all four nucleotides.

Name   Strand  Gene  Feature   Sequence                                     Location
pbD8   mRNA    psbD  exon 8    CG'l'n'l'TGGTC(A/T)CAAAT(T/C)'rr(T/C)riG      8863-88'} 1
cbC4   cDNA    psbC  exon 4    CCrcATACATITATAAG'l'CrAGCA'lTACC             18595-18624
pbC5   mRNA    psbC  exon 5    GAAGT(A/T)GC(C/T)CA'lTlTGT(A/'r)CC            20299-20318
cbC11  cDNA    psbC  exon 11   C(G/T)ATC(G/T)ATCCriT'ri'CAAA(G/T)CC           24504-24526
pbA1   mRNA    psbA  exon 1    GG(A/r)GGrrCGGTG'|-(A/T)'rr(C/G)ATG            24885-24905
cbA5   cDNA    psbA  exon 5    GGGAAGriGTG(A/nGC(A/G)'iT'ACG'rrCGTG           27719-27744
paB1   mRNA    psaB  exon 1    GTA G A A AC( A/T)(C/G)( A/ryrrrG A A A A ATG G  37429-37451
paB2   mRNA    psaB  exon 2    GCri'rmCITCATGGCA/IOGC                       42687-42703
caB7   cDNA    psaB  exon 7    CATGAGCCCA(A/T)GCrAA(A/T)G'n'rC               46327-46348
pbB1   mRNA    psbB  exon 1    A'I"GGG(A/ITrrcCCnXjGrATCGTG                  99845-100367
cbB5   cDNA    psbB  exon 5    GGACTGCr(A/T)C(A/T)AAAAACACCATCNG             97284-97308
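The primer design described above (an exon consensus across species, with variable positions written as alternatives in parentheses and N for all four bases) can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the input sequences are hypothetical, already aligned and gap-free, the function names are made up, and the primers in Table A2-1 were not necessarily derived this way. Expanding a primer also shows how quickly degeneracy grows; a pattern with two two-fold positions and one N, as in cbB5, stands for 2 × 2 × 4 = 16 sequences.

```python
from itertools import product

BASES = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}


def consensus(aligned: list[str]) -> str:
    """Column-wise consensus of equal-length aligned exon sequences.

    Invariant columns give one base; variable columns are written as (X/Y),
    like the degenerate positions in Table A2-1; columns with all four
    bases become N. Gaps are not handled.
    """
    out = []
    for column in zip(*aligned):
        bases = sorted({b.upper() for b in column})
        if len(bases) == 1:
            out.append(bases[0])
        elif len(bases) == 4:
            out.append("N")
        else:
            out.append("(" + "/".join(bases) + ")")
    return "".join(out)


def parse(primer: str) -> list[str]:
    """Split a degenerate primer into per-position strings of allowed bases."""
    positions, i = [], 0
    while i < len(primer):
        if primer[i] == "(":
            j = primer.index(")", i)
            positions.append(primer[i + 1:j].replace("/", ""))
            i = j + 1
        else:
            positions.append(BASES[primer[i]])
            i += 1
    return positions


def expand(primer: str) -> list[str]:
    """All concrete sequences represented by a degenerate primer."""
    return ["".join(p) for p in product(*parse(primer))]


# Hypothetical aligned exon fragments from three species.
seqs = ["GAAGTAGCC", "GAAGTTGCT", "GAAGTAGCT"]
primer = consensus(seqs)
print(primer)               # GAAGT(A/T)GC(C/T)
print(len(expand(primer)))  # 4
```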

No group II introns were identified in E. myxocylindracea: Thompson et al. screened the entire rbcL gene and psbD exon 1 to exon 8 of E. myxocylindracea and found no introns (Thompson, 1995). The search for mat 2 homologues has shown that psbC introns 1 and 2 are both missing in E. myxocylindracea (Chapter 3). Therefore, 22 group II introns in three genes of E. gracilis have been found to be absent in E. myxocylindracea. The E. myxocylindracea PCR amplification products of the remaining segments of psbD and psbC, and of the psbA, psaA, psaB and psbB genes, are shown in Figure A2-1. PCR using primers pbD8-cbC4 yielded a 396 bp fragment. Sequence analysis confirms that this fragment corresponds to the intronless E. myxocylindracea homologue of the region from psbD exon 8 to psbC exon 4. Therefore introns 8 and 9 of the psbD gene and introns 1-3 of the psbC gene are not present in E. myxocylindracea. PCR of E. myxocylindracea DNA using primers pbC5-cbC10 produced a fragment of approximately 1200 bp, which is the size of the intronless E. gracilis cDNA from psbC exon 5 to exon 10. Since all the introns between psbC exon 5 and exon 10 are at least 300 bp, the presence in E. myxocylindracea of any of these introns would increase the size of the PCR fragment to at least 1500 bp, which is clearly distinguishable from the intronless 1200 bp fragment. Therefore, based on this PCR analysis, we conclude that psbC introns 5 to 9 are absent in E. myxocylindracea. PCR with primers pbA1-cbA5 yielded two substantial products. The larger fragment, about 2800 bp, is similar in size to the E. gracilis psbA segment, which contains 4 group II introns. The other fragment, about 930 bp, is the same size as the intronless E. gracilis segment. The larger fragment was cloned and its ends were sequenced; the end sequences are identical to the E. gracilis psbA gene sequences. Since all other sequencing data indicated that the E. myxocylindracea chloroplast genome differs significantly from that of E. gracilis in primary sequence, we believe that this large fragment results from E. gracilis contamination. The remaining fragment, which is probably the genuine E. myxocylindracea product, therefore strongly indicates that the E. myxocylindracea psbA gene lacks all introns. PCR with primers paB1-caB7 yielded no products, possibly because the primers lack specificity for the E. myxocylindracea genome. However, PCR using primers paB2-caB7 amplified an 870 bp product, corresponding to the intronless cDNA of the psaB gene from exon 2 to the 3' end, indicating the absence of introns 2 to 6 in E. myxocylindracea. In the case of the psbB gene, PCR with primers pbB1-cbB5 produced a 1300 bp fragment, which is similar in size to the intronless allele of the psbB gene. The E. gracilis psbB gene contains a 104 bp group III intron, which could be present in E. myxocylindracea. However, this group III intron would increase the fragment length from about 1340 bp to approximately 1444 bp, causing it to run more slowly than the 1371 bp marker on the gel. It is therefore more likely that this group III intron is also absent in E. myxocylindracea.

Figure A2-1 Group II intron screening by PCR analysis of the E. myxocylindracea chloroplast genome. Total nucleic acid extract was used as template. A) PCR amplifications of selected genes. The sizes (in bp) of the molecular weight markers are shown on the left. The sizes of representative PCR products are shown in the middle. All the fragments are E. myxocylindracea amplification products except one contaminating band from E. gracilis (marked as E. gracilis psbA). Primers used in each reaction are: lane 1, pbB1-cbB5; lane 2, pbC5-cbC11; lane 3, pbA1-cbA5; lane 4, pbD8-cbC4; lane 5, same as lane 1; lane 6, paB2-caB7. B) Schematic diagram of the E. gracilis psbDC operon and the psbA, psaB and psbB genes. Black boxes are exons.
Shaded lollipops are group II introns and open lollipops are group III introns. Black arches on the lollipops are open reading frames. Arrows indicate the locations of all the oligonucleotide primers used.

(Panel A gel: markers 3675, 2323, 1929, 1371, 1264, 702, 224 and 112 bp; products 2859 bp (E. gracilis psbA), 1340 bp (psbB e1-e5), 1170 bp (psbC e5-e11), 930 bp (psbA e1-e5), 877 bp (psaB e2-e7) and 396 bp (psbD e8-psbC e4).)

(Panel B: psbD-C operon with primers pbD8, cbC4, pbC5 and cbC10; psbA with pbA1 and cbA5; psaB with paB1, paB2 and caB7; psbB with pbB1.)

Figure A2-2 Comparison of the intron contents of eight genes in E. gracilis and in E. myxocylindracea. Notations are similar to Figure A2-1B except that group II and group III introns are shown as blue and red lollipops, respectively.

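The size arguments used above for psbC and psbB reduce to comparing an observed band with the sizes expected with and without each candidate intron. A minimal Python sketch, assuming that at most one intron is retained at a time and that gel size estimates are accurate to within a few tens of bp; the function names are hypothetical, and the 300 bp used for psbC is the lower bound stated in the text, not a measured intron size.

```python
def predicted_sizes(intronless_bp: int, intron_sizes: list[int]) -> dict[str, int]:
    """Expected product sizes with no introns, and with each single intron retained."""
    sizes = {"intronless": intronless_bp}
    for k, size in enumerate(intron_sizes, start=1):
        sizes[f"+intron{k}"] = intronless_bp + size
    return sizes


def best_match(observed_bp: int, sizes: dict[str, int]) -> str:
    """Label of the predicted product closest in size to the observed band."""
    return min(sizes, key=lambda label: abs(sizes[label] - observed_bp))


# psbB: ~1340 bp intronless product; one 104 bp group III intron in E. gracilis.
psbB = predicted_sizes(1340, [104])
print(psbB)                    # {'intronless': 1340, '+intron1': 1444}
print(best_match(1300, psbB))  # intronless

# psbC exon 5-10: ~1200 bp intronless; any retained intron adds at least 300 bp.
psbC = predicted_sizes(1200, [300])
print(best_match(1200, psbC))  # intronless
```

Comparing the band with a marker that lies between the two predictions, as done in the text with the 1371 bp marker, makes the same decision without relying on a precise size estimate.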

Discussion

The PCR analysis described in this chapter has shown that all the group II introns are absent in five E. myxocylindracea chloroplast genes. In addition, Thompson et al. have shown that the rbcL gene, which has 9 group II introns in E. gracilis, contains no introns in E. myxocylindracea (unpublished). Preliminary data also indicated that the three group II introns in the psaA gene are absent in E. myxocylindracea (data not shown). In summary, 7 genes have been studied, and none of the 46 group II introns present in E. gracilis have been identified in E. myxocylindracea. Therefore, to date, no group II introns have been identified in E. myxocylindracea, a phenomenon never before observed among late-branching Euglenoid species. The known intron content of the E. myxocylindracea chloroplast genome is compared with that of E. gracilis in Figure A2-2. Interestingly, 6 out of 7 group III introns are conserved in E. myxocylindracea, very possibly owing to the simultaneous conservation of psbC intron 4, which contains mat 1 (Doetsch, in press). These observations strongly suggest that the presence of mat 2 directly affects the presence of group II introns in the Euglena chloroplast. The best explanation for the coexistence of mat 2 and group II introns is that mat 2 was one of the key elements in group II intron acquisition during evolution and is also necessary for processing the group II introns in the genome.

A hypothesis for the loss of group II introns in E. myxocylindracea is as follows:

• Early in the development of the E. myxocylindracea lineage, psbC intron 2 was lost, causing the recruitment of group II introns to slow or even stop.
• The loss of mat 2 largely inhibited the processing of the existing group II introns in the genome and down-regulated general chloroplast gene expression. Therefore only the lineages that did not contain many group II introns survived.
• The surviving lineage continued to evolve into the contemporary species, E. myxocylindracea, which possibly contains no group II introns.

This hypothesis emphasizes the critical role that mat 2 plays in the expression of the entire genome, and it would be better supported if more group II introns were found to be missing from some very deeply rooted genes such as psbT, petB, atpB and atpE. However, the higher mutation rate in the exon sequences of these genes makes it difficult to design good primers for PCR analysis. Given the efficiency of automated sequencing, it may be more practical to subclone and sequence the whole genome.

CHAPTER 6

GENERAL DISCUSSION

The evolutionary relationships of the known intron types: There are five known types of introns: group I, group II, group III, nuclear (spliceosomal) introns and protein-spliced introns. Structural and mechanistic data strongly suggest a close relationship among group III, group II and nuclear introns, while the other two categories seem to have originated from independent sources. Consistent with this hypothesis, both group I and group II introns are found in different locations in eubacterial genomes. Members of the Eubacteria are among the oldest organisms on earth, dating back approximately 3500 million years, before the divergence of eubacteria and archaebacteria. In addition, almost all known introns found in prokaryotes are located in tRNA genes, indicating that tRNA genes were the original sites of intron insertion. The 3'-phosphate/5'-OH protein-spliced intron, which has only been found in archaebacteria and metakaryotes (organelle-containing eukaryotes), seems to have originated later than the other intron types. Also, the same Leu tRNA that contains a group I intron in cyanobacteria appears to have a protein-spliced intron in some archaebacteria (Wich, 1987). This could indicate a relationship between group I and protein-spliced introns. No introns have been identified in Archezoa, which are believed to be the oldest eukaryotes. These species contain no endosymbionts. This has prompted an interesting hypothesis: that eukaryotic introns were probably all obtained through endosymbiotic events no earlier than 1000 million years ago. By analyzing the structures, mechanisms and physical distribution of all known introns, Cavalier-Smith presented a global phylogenetic intron tree based on the "introns late" theory (Figure 6-1) (Cavalier-Smith, 1991). In this model, he suggested that self-splicing group I and group II introns originated first from transposons.
Protein-spliced introns were generated from group I introns by a cleavage-site conversion and hence lost their self-splicing ability. The occurrence of group I and group II introns in eukaryotes is due to the endosymbiosis of mitochondria (with a purple bacterial ancestor) and chloroplasts (with a cyanobacterial ancestor). Nuclear introns were generated by genetic transfer from the organelles to the nucleus. To explain the transition from group II introns to nuclear introns, an "introns-in-pieces" model was proposed, which is supported by phylogenetic data (Roger and Doolittle, 1993). This model proposes that group II introns were able to translocate from organelles to the nucleus because they contained regions encoding a reverse transcriptase activity. The separation of transcription and translation, resulting in slower gene expression, relaxed the need for efficient processing of nuclear group II introns and therefore allowed fragmentation and degeneration of the intron-encoding DNA to occur. Domains of group II introns then evolved into trans-acting elements such as snRNAs. Some of these transformed "group II introns" retained the ability to translocate, and their movements led to today's nuclear intron distribution. No clear evolutionary conclusions have been reached about group III introns except that they are related to group II introns. Based on their limited distribution (so far they have been found only in the Euglenoids), group III introns seem to be an offshoot of group II intron evolution, in the same way that protein-spliced introns diverged from group I introns.

(Figure 6-1 tree: Bacteria, Eukaryota and Metakaryota, with a time axis in millions of years.)

Figure 6-1 The phylogeny of organisms and the five major steps in intron evolution (adapted from Cavalier-Smith, 1991). 1. Selfish insertion of self-splicing introns into tRNA genes. 2. The rise of protein-spliced tRNA introns from self-splicing introns. 3. Mitochondrial endosymbiosis brought self-splicing introns into eukaryotes, followed by lateral transfer of introns from tRNA to rRNA and mRNA genes. 4. Transfer of group II introns to the nucleus and evolution of spliceosomal introns. 5. Endosymbiotic origin of the chloroplast and additional transfer of group II introns into eukaryotes. The phylogeny and dates are based on molecular, ultrastructural, cladistic and fossil record studies.

Intron evolution in the Euglena chloroplast: The principal theory of intron occurrence in the Euglenoid chloroplast is the "founder intron theory", which proposes that introns were inserted into intron-less ancestors via internally encoded protein mediators. The main points of the theory are:

1. The chloroplast genome of the Euglenoids arose from an intron-less common ancestor.
2. An intron containing an internally encoded protein, or maturase, which could also promote intron mobility, invaded the Euglenoid chloroplast genome, resulting in the founder event.
3. The intron was then propagated intragenomically due to the activity of the intron-encoded maturase.
4. Some introns lost the maturase and may have been spread by trans-acting maturase activity.
5. Continued intron propagation led to twintron formation.
6. The maturase lost its mobility-promoting activity and became involved solely in the splicing process.

According to this theory, one or a few "founder introns" initiated intron occurrence in the Euglenoids. These introns encoded "founder proteins" that were fully equipped with both maturase activity and intron translocation activity, the latter most likely comprising reverse transcriptase, endonuclease and ligase activities. These founder introns must have been present in the common ancestor of all Euglenoid chloroplasts and were probably retained because they were also involved in mediating intron processing. My studies of psbC intron 2 and mat 2 have shown that mat 2 is widespread and deeply rooted, at least within the genus Euglena. The functional domains of mat 2 are well conserved, indicative of an essential activity. Also, the loss of mat 2 in E. myxocylindracea correlates with the loss of possibly all group II introns from its chloroplast genome. These data therefore suggest that mat 2 is a strong candidate for one of the "founder proteins". In addition to mat 2, mat 1, a 458 aa maturase-like protein encoded by a group III intron, has also been characterized (Doetsch, in press). The relationship between mat 1 and mat 2 is difficult to determine because their protein sequences are very different. However, since mat 1 was identified in E. myxocylindracea together with several other group III introns, mat 1 may be associated with group III rather than group II introns. So far, mat 1 has been shown to have a much wider distribution than mat 2: it has been identified not only in many Euglena species but also in Eutreptia, Lepocinclis and the non-photosynthetic Astasia. The search for mat 2 in other Euglenoids, however, has not yet been successful. One possible technical reason is insufficient sensitivity of the PCR reactions, since the primers were designed from the consensus psbC exon sequences of the genus Euglena only. Another likely explanation is that mat 2 is restricted to Euglena.
Therefore, mat 2 might have been a feature of the chloroplast genome in the common ancestor of Euglena only, whereas mat 1 might be more deeply rooted in the Euglenophyceae. Was the invasion of the Euglena chloroplast genome by mat 2 completely independent of mat 1, or did the presence of mat 1 somehow facilitate it? Preliminary data suggest that, although mat 1 did not appear to be related to group II introns in E. myxocylindracea, it might be responsible for the occurrence of introns other than group III introns. A mini-group II intron, a 230 nt intron lacking domain IV, was identified in Lepocinclis beutchlii, a Euglenoid that might not have mat 2. Interestingly, a mini-group II intron was also observed inside mat 2 in E. granulata. Mini-group II introns could be subject to the activities of both mat 1 and mat 2. Moreover, through homologous recombination, mini-group II introns could be transformed into either group II or group III introns. These mini-group II introns may therefore be evidence of a so far unexplained link between group II and group III introns, as well as between mat 1 and mat 2. It is thus possible that mat 2 was recruited into a group II-like intron with the help of mat 1, and that this group II-like intron later evolved into a group II intron, i.e. the Euglena psbC intron 2a.

According to the PCR detection results and the phylogenetic tree of Euglena and other Euglenoids (Thompson, 1995), mat 2 is present in some basally branching Euglena species such as E. viridis and E. anabaena. Interestingly, however, preliminary analysis of the E. viridis chloroplast genome indicates that its intron content is far smaller than that of E. gracilis. All nine group II introns of the E. gracilis rbcL gene are missing in E. viridis, and many E. gracilis twintrons are found as simple group II introns in E. viridis. Therefore, mat 2 and the extraordinarily large group II intron population do not co-occur in all Euglena species. Because mat 2 is found in E. viridis, mat 2 might not have been the causal factor in the massive recruitment of group II introns into the E. gracilis genome. If mat 2 was not responsible for this massive recruitment, then what was? Besides mat 1 and mat 2, there is a third ORF left to study. The 8th intron of the psbD gene, a gene in the same operon as psbC, is a 3600 bp fragment with a bona fide domain V and domain VI at the 3' end and two ORFs in the middle. The two ORFs are separated by a short sequence that is possibly a group III intron. Also, an X-domain (Mohr, 1993) was identified in one of the ORFs. There is therefore evidence that psbD intron 8 might be a group II intron containing a single maturase-like protein gene of 700-800 aa. We named this unexplored gene mat 3, predicting that it could also have contributed to intron evolution and splicing. According to PCR analysis, psbD intron 8 is present in E. gracilis, E. mutabilis and E. geniculata, yet absent in E. viridis, E. anabaena, E. spirogyra, E. granulata and E. myxocylindracea. Based on these data, mat 3 coexists with the large group II intron population and could therefore be the main cause of group II intron acquisition. It is also possible that mat 3 worked in concert with mat 2 to recruit group II introns. The E. gracilis mat 2 and mat 3 are quite different in amino acid sequence. However, since E. gracilis is one of the very late branching species, it cannot be concluded that mat 2 and mat 3 are unrelated. On the contrary, given their similar size and closely linked physical positions, mat 3 could very well have diverged from mat 2 via DNA recombination. Combining the observations and reasoning above, an intron evolution path in Euglena and related Euglenoids can be proposed. The mat 1-containing group III intron, being the most widely distributed, could have been the first to invade the Euglenoid genome, leading to the insertion of other group III introns and mini-group II introns. Group II introns appeared after the invasion of the mat 2-containing mobile element, with or without the help of mat 1. The invasion of mat 2 enabled the spread of typical group II introns, including the mat 3-containing psbD intron 8. The presence of mat 3 then induced the massive insertion of group II introns throughout the genome and led to the present intron population in chloroplast genomes such as that of Euglena gracilis.

Of course, the detailed path of intron evolution in the Euglenoid chloroplast is far more complex than can be accounted for by a simple "introns late" theory. For example, intron loss events have been proposed to explain the absence of group II introns in the E. myxocylindracea lineage. A complete tree depicting intron evolution among the Euglenoid species, if ever produced, will be full of reticulations and polytomies caused by a combination of events such as intron gains, intron losses, horizontal transfers and endosymbiotic events.

Intron processing in the Euglena chloroplast. Compared with many primitive chloroplast genomes, the Euglena gracilis chloroplast genome, with its smaller number of genes, its various introns and its considerably altered gene cluster arrangements, stands out as a highly "derived" genome (Hong, thesis 1996). Frequent environmental changes during Euglenoid evolution, such as the constant switching of growth conditions between light and dark, and the ever-growing need for nuclear-organelle coordination, might have caused these extensive modifications of the chloroplast genome. One result of these evolutionary pressures could be the acquisition of the large number of introns in the E. gracilis chloroplast and their apparent lack of internally encoded control over their own processing. The comparative analysis of psbC intron 2 indicates that many tertiary elements responsible for efficient processing are missing from the intron. Also, so far none of the Euglena chloroplast introns has been reported to self-splice. This implies that the rate of intron processing in the chloroplast of E. gracilis, and possibly of all present-day Euglena species, is controlled by other factors such as maturases and nuclear-encoded proteins and RNA elements. Unfortunately, this strong dependence on trans-acting factors makes Euglena chloroplast introns very difficult to study in vitro.

Future prospects: The presence of psbC intron 2 and the mat 2 gene has been confirmed in 5 Euglena species, and the presence of mat 2 in another 4 Euglena species was strongly indicated. However, preliminary PCR amplification attempts with Cryptoglena, Eutreptia, Lepocinclis and Astasia produced too many background artifacts to provide accurate information about the presence of mat 2 in these species. There is a strong possibility that the psbC intron 2 region is absent, or greatly altered, in these species. To provide convincing data about either the presence or absence of this region of the genome, the specificity and sensitivity of the PCR reaction need to be improved. One possible approach is to synthesize oligonucleotides from sequences already known in these species, such as mat 1 and rbcL, and to sequence the genomes from those sites to obtain detailed sequence information around the psbC region.
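As an illustration of why primers built from a Euglena-only consensus may amplify poorly from other genera, the minimal Python sketch below compares a majority-rule consensus primer with a degenerate primer written in IUPAC ambiguity codes, and counts mismatches against a more divergent template. It is a hypothetical example, not the primer design used in this study: the sequences are invented toy fragments, the function names are illustrative, and the alignment is assumed to be gap-free, equal-length and upper-case.

```python
from collections import Counter

# IUPAC nucleotide ambiguity codes, keyed by the set of bases each code allows.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}
# Reverse lookup: code letter -> set of bases it matches.
ALLOWED = {code: bases for bases, code in IUPAC.items()}


def majority_consensus(aligned):
    """Most frequent base in each column of equal-length, gap-free sequences."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


def degenerate_primer(aligned):
    """IUPAC code covering every base observed in each alignment column."""
    return "".join(IUPAC[frozenset(column)] for column in zip(*aligned))


def mismatches(primer, template):
    """Count template positions whose base is not allowed by the primer code."""
    return sum(base not in ALLOWED[code] for code, base in zip(primer, template))


if __name__ == "__main__":
    # Hypothetical toy exon fragments from three "ingroup" species.
    ingroup = ["ATGGCT", "ATGGCC", "ATGACT"]
    outgroup_template = "ATCACA"  # hypothetical divergent relative
    for primer in (majority_consensus(ingroup), degenerate_primer(ingroup)):
        print(primer, mismatches(primer, outgroup_template))
```

Run as a script, it prints each primer followed by its mismatch count against the divergent template (3 for the consensus primer, 2 for the degenerate one). Greater degeneracy tolerates more of the variation present in the input alignment, but each ambiguous position also permits more off-target priming, so it trades specificity for sensitivity.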

The intron content of the E. myxocylindracea chloroplast appears to be unique and very informative for studies of intron evolution. Efforts should be made to complete the intron search of the entire E. myxocylindracea chloroplast genome. An approach that employs the BAC cloning system to build an E. myxocylindracea chloroplast genomic library has been considered, and the cloning procedure has been started.

The identification and characterization of mat 2 homologous genes have provided valuable information concerning intron evolution and processing in the Euglena chloroplast. However, beyond enlarging the mat 2 database by surveying more Euglenoids, little more can be done at the DNA level to investigate the relationship between mat 2 and group II introns; biochemical or mutagenic approaches are needed instead. The success of the first attempts at overexpressing a fragment of mat 2 has provided an opportunity for such biochemical analysis of intron maturases. Efforts should be made to improve the expression protocol to increase the yield. Once a sufficient amount of protein is obtained, in vitro RNA-binding assays could be performed to demonstrate the activity and function of mat 2 directly. In vitro RNA binding could also test the specificity of mat 2 and mat 1 for group II and/or group III introns.

Due to the asexual reproduction of Euglena, the traditional induction and isolation of mutants has not been a feasible approach. However, the recent successful biolistic transformation of E. gracilis chloroplasts with an RNA/protein expression vector has provided an alternative approach to mutagenesis (Favreau, 1997). The vector that was successfully incorporated into Euglena gracilis chloroplasts carries the strong promoter and the 3' stabilization elements of the Euglena psbA gene, which flank a small polycloning site. An aadA gene cassette was also built into the vector to confer spectinomycin and streptomycin resistance for selection. Using this plasmid, antisense RNA to mat 2 transcripts could be expressed in the Euglena chloroplast to test the function of the mat 2 protein. We could also try to express isolatable mat 2 protein in its native environment by fusing the mat 2 sequence to an affinity tag, such as the histidine tag or the S-tag from the pET vectors. A third use of this transformation system would be to test the secondary structures of group II and group III introns predicted by comparative analysis. For example, the psbC intron 2 external intron could be expressed with mutated nucleotides in the proposed stem-loop and its splicing efficiency tested in vivo.

However, from the perspective of intron evolution, this study of mat 2 alone is not enough to define a model for the origin and evolution of introns in the genus Euglena. In addition to further analysis of mat 1, a detailed study of mat 3 and its host intron, psbD intron 8, by PCR and RT-PCR analysis of E. gracilis and other Euglena species would be extremely informative and could help define the role each intron-encoded protein has played during intron acquisition and radiation in Euglenoid chloroplasts.

REFERENCES

Augustin, S., Muller, M. W. and Schweyen, R. J. (1990) Reverse self-splicing of group II intron RNAs in vitro. Nature 343: 383-386

Barkan, A. (1988) Proteins encoded by a complex chloroplast transcription unit are each translated from both monocistronic and polycistronic mRNAs. EMBO J 7: 2637-2644

Bennett, D. C., Rogers, S. A., Chen, J. L. and Orozco, E. M. (1990) A primary transcript in spinach chloroplasts that completely lacks a 5' untranslated leader region. Plant Mol Biol 15: 111-119

Bonen, L. (1993) Trans-splicing of pre-mRNA in plants, animals, and protists. FASEB J 7: 40-46

Boulanger, S. C., Faix, P. H., Yang, H., Zhuo, J., Franzen, J. S., Peebles, C. L. and Perlman, P. S. (1996) Length changes in the joining segment between domains 5 and 6 of a group II intron inhibit self-splicing and alter 3' splice site selection. Mol Cell Biol 16: 5896-5904

Buetow, D. E. (1968) The Biology of Euglena. Academic Press, New York and London

Carignani, G., Groudinsky, O., Frezza, D., Schiavon, E., Bergantino, E. and Slonimski, P. P. (1983) An mRNA maturase is encoded by the first intron of the mitochondrial gene for the subunit I of cytochrome oxidase in S. cerevisiae. Cell 35: 733-742

Carignani, G., Netter, P., Bergantino, E. and Robineau, S. (1986) Expression of the mitochondrial split gene coding for cytochrome oxidase subunit I in S. cerevisiae: RNA splicing pathway. Curr Genet 11: 55-63

Cavalier-Smith, T. (1975) Electron and light microscopy of gametogenesis and gamete fusion in Chlamydomonas reinhardii. Protoplasma 86: 1-18

Cavalier-Smith, T. (1991) Intron phylogeny: a new hypothesis. Trends Genet 7: 145-148

Cech, T. R. (1990) Self-splicing of group I introns. Annu Rev Biochem 59: 543-568

Cech, T. R., Zaug, A. J. and Grabowski, P. J. (1981) In vitro splicing of the ribosomal RNA precursor of Tetrahymena: involvement of a guanosine nucleotide in the excision of the intervening sequence. Cell 27: 487-496

Chanfreau, G. and Jacquier, A. (1994) Catalytic site components common to both splicing steps of a group II intron. Science 266: 1383-1387

Chapman, K. B. and Boeke, J. D. (1991) Isolation and characterization of the gene encoding yeast debranching enzyme. Cell 65: 483-492

Chow, L. T., Gelinas, R. E., Broker, T. R. and Roberts, R. J. (1977) An amazing sequence arrangement at the 5' ends of adenovirus 2 messenger RNA. Cell 12: 1-8

Christensen, T. (1962) In "Botanik," Vol. II, 2: 1-178. Masson, Paris

Colleaux, L., d'Auriol, L., Betermier, M., Cottarel, G., Jacquier, A., Galibert, F. and Dujon, B. (1986) Universal code equivalent of a yeast mitochondrial intron reading frame is expressed into E. coli as a specific double strand endonuclease. Cell 44: 521-533

Copertino, D. W. and Hallick, R. B. (1991a) Group II twintron: an intron within an intron in a chloroplast cytochrome b-559 gene. EMBO J 10: 433-442

Copertino, D. W. and Hallick, R. B. (1993) Group II and group III introns of twintrons: potential relationships with nuclear pre-mRNA introns. Trends Biochem Sci 18: 467-471

Copertino, D. W., Christopher, D. A. and Hallick, R. B. (1991b) A mixed group II/group III twintron in the Euglena gracilis chloroplast ribosomal protein S3 gene: evidence for intron insertion during gene Nucleic Acids Res 19: 6491-6497

Copertino, D. W., Hall, E. T., Van Hook, F. W., Jenkins, K. P. and Hallick, R. B. (1994) A group III twintron encoding a maturase-like gene excises through lariat intermediates. Nucleic Acids Res 22: 1029-1036

Copertino, D. W., Shigeoka, S. and Hallick, R. B. (1992) Chloroplast group III twintron excision utilizing multiple 5'- and 3'-splice sites. EMBO J 11: 5041-5050

Costa, M. and Michel, F. (1995) Frequent use of the same tertiary motif by self-folding RNAs. EMBO J 14: 1276-1285

Costa, M., Deme, E., Jacquier, A. and Michel, F. (1997) Multiple tertiary interactions involving domain II of group II self-splicing introns. J Mol Biol 267: 520-536

Darnell, J. E. and Doolittle, W. F. (1986) Speculations on the early course of evolution. Proc Natl Acad Sci USA 83: 1271-1275

Davies, R. W., Waring, R. B., Ray, J. A., Brown, T. A. and Scazzocchio, C. (1982) Making ends meet: a model for RNA splicing in fungal mitochondria. Nature 300: 719-724

Deutscher, M. P. (1984) Processing of tRNA in prokaryotes and eukaryotes. Crit Rev Biochem 17: 45-72

Devereux, J., Haeberli, P. and Smithies, O. (1984) A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res 12: 387-395

Doetsch, A. N., Thompson, M. D. and Hallick, R. B. (1998) A maturase-encoding group III twintron is conserved in deeply rooted euglenoid species: are group III introns the chicken or the egg? Mol Biol Evol 15: 76-86

Drager, R. G. and Hallick, R. B. (1993b) A novel Euglena gracilis chloroplast operon encoding four ATP synthase subunits and two ribosomal proteins contains 17 introns. Curr Genet 23: 271-280

Drager, R. G. and Hallick, R. B. (1993a) A complex twintron is excised as four individual introns. Nucleic Acids Res 21: 2389-2394

Eisermann, A., Tiller, K. and Link, G. (1990) In vitro transcription and DNA binding characteristics of chloroplast and etioplast extracts from mustard (Sinapis alba) indicate differential usage of the psbA promoter. EMBO J 9: 3981-3987

Eng, F. J. and Warner, J. R. (1991) Structural basis for the regulation of splicing of a yeast messenger RNA. Cell 65: 797-804

Favreau, M. R., Thompson, M. D. and Hallick, R. B. (1997) Chloroplast transformation in Euglena gracilis: Splicing of group III introns and a group III twintron transcribed from a transgenic psbK operon. Submitted for publication.

Ferat, J. L. and Michel, F. (1993) Group II self-splicing introns in bacteria. Nature 364: 358-361

Gampel, A. and Cech, T. R. (1991) Binding of the CBP2 protein to a yeast mitochondrial group I intron requires the catalytic core of the RNA. Genes Dev 5: 1870-1880

Genetics Computer Group (1994) Program Manual for the Wisconsin Package, Version 8, September 1994. 575 Science Drive, Madison, Wisconsin, USA 53711

Gibson, S. A. and Shillitoe, E. J. (1997) Ribozymes. Their functions and strategies for their use. Mol Biotechnol 7: 125-137

Gietz, R. D. and Schiestl, R. H. (1991) Applications of high efficiency lithium acetate transformation of intact yeast cells using single-stranded nucleic acids as carrier. Yeast 7: 253-263

Gilbert, W. (1985) Genes-in-pieces revisited. Science 228: 823-824

Gorbalenya, A. E. (1994) Self-splicing group I and group II introns encode homologous (putative) DNA endonucleases of a new family. Protein Sci 3: 1117-1120

Gray, M. W. (1993) Origin and evolution of organelle genomes. Curr Opin Genet Dev 3: 884-890

Gray, M. W., Sankoff, D. and Cedergren, R. J. (1984) On the evolutionary descent of organisms and organelles: a global phylogeny based on a highly conserved structural core in small subunit ribosomal RNA. Nucleic Acids Res 12: 5837-5852

Grivell, L. A. (1994) Intron mobility. Invasive introns. Curr Biol 4: 161-164

Guerrier-Takada, C., Gardiner, K., Marsh, T., Pace, N. and Altman, S. (1983) The RNA moiety of ribonuclease P is the catalytic subunit of the enzyme. Cell 35: 849-857

Haff, L. A. and Bogorad, L. (1976) Hybridization of maize chloroplast DNA with transfer ribonucleic acids. Biochemistry 15: 4105-4109

Hallick, R. B. (1992) Chloroplasts. Curr Opin Genet Dev 2: 926-930

Hallick, R. B., Hong, L., Drager, R. G., Favreau, M. R., Monfort, A., Orsat, B., Spielmann, A. and Stutz, E. (1993) Complete sequence of Euglena gracilis chloroplast DNA. Nucleic Acids Res 21: 3537-3544

Harris-Kerr, C. L., Zhang, M. and Peebles, C. L. (1993) The phylogenetically predicted base-pairing interaction between α and α' is required for group II splicing in vitro. Proc Natl Acad Sci USA 90: 10658-10662

Haseloff, J. and Gerlach, W. L. (1988) Simple RNA enzymes with new and highly specific endoribonuclease activities. Nature 334: 585-591

Hebbar, S. K., Belcher, S. M. and Perlman, P. S. (1992) A maturase-encoded group II intron of yeast mitochondria self-splices in vitro. Nucleic Acids Res 20: 1747-1754

Hetzer, M., Wurzer, G., Schweyen, R. J. and Mueller, M. W. (1997) Trans-activation of group II intron splicing by nuclear U5 snRNA. Nature 386: 417-420

Hiratsuka, J., Shimada, H., Whittier, R., Ishibashi, T., Sakamoto, M., Mori, M., Kondo, C., Honji, Y., Sun, C. R., Meng, B. Y. (1989) The complete sequence of the rice (Oryza sativa) chloroplast genome: intermolecular recombination between distinct tRNA genes accounts for a major plastid DNA inversion during the evolution of the cereals. Mol Gen Genet 217: 185-194

Ho, Y., Kim, S. J. and Waring, R. B. (1997) A protein encoded by a group I intron in Aspergillus nidulans directly assists RNA splicing and is a DNA endonuclease. Proc Natl Acad Sci USA 94: 8994-8999

Hong, L. and Hallick, R. B. (1994a) A group III intron is formed from domains of two individual group II introns. Genes Dev 8: 1589-1599

Hong, L. and Hallick, R. B. (1994b) Gene structure and expression of a novel Euglena gracilis chloroplast operon encoding cytochrome b6 and the beta and epsilon subunits of the H(+)-ATP synthase complex. Curr Genet 25: 270-281

Hong, L., Stevenson, J. K., Roth, W. B. and Hallick, R. B. (1995) Euglena gracilis chloroplast psbB, psbT, psbH and psbN gene cluster: regulation of psbB-psbT pre-mRNA processing. Mol Gen Genet 247: 180-188

Hull, R. and Will, H. (1989) Molecular biology of viral and nonviral retroelements. Trends Genet 5: 357-359

Jacquier, A. and Dujon, B. (1985) An intron-encoded protein is active in a gene conversion process that spreads an intron into a mitochondrial gene. Cell 41: 383-394

Jacquier, A. and Jacquesson-Breuleux, N. (1991) Splice site selection and role of the lariat in a group II intron. J Mol Biol 219: 415-428

Jarrell, K. A., Peebles, C. L., Dietrich, R. C., Romiti, S. L. and Perlman, P. S. (1988) Group II intron self-splicing. Alternative reaction conditions yield novel products. J Biol Chem 263: 3432-3439

Jenkins, K. P., Hong, L. and Hallick, R. B. (1995) Alternative splicing of the Euglena gracilis chloroplast roaA transcript. RNA 1: 624-633

Kivic, P. A. and Walne, P. L. (1983) Algal photosensory apparatus probably represent multiple parallel evolutions. Biosystems 16: 31-38

Kjems, J., Leffers, H., Garrett, R. A., Wich, G., Leinfelder, W. and Bock, W. (1987) Gene organization, transcription signals and processing of the single ribosomal RNA operon of the archaebacterium Thermoproteus tenax. Nucleic Acids Res 15: 4821-4835

Klessig, D. F. (1977) Two adenovirus mRNAs have a common 5' terminal leader sequence encoded at least 10 kb upstream from their main coding regions. Cell 12: 9-21

Knoop, V. and Brennicke, A. (1994) Evidence for a group II intron in Escherichia coli inserted into a highly conserved reading frame associated with mobile DNA sequences. Nucleic Acids Res 22: 1167-1171

Koch, J. L., Boulanger, S. C., Dib-Hajj, S. D., Hebbar, S. K. and Perlman, P. S. (1992) Group II introns deleted for multiple substructures retain self-splicing activity. Mol Cell Biol 12: 1950-1958

Koller, B., Clarke, J. and Delius, H. (1985) The structure of precursor mRNAs and of excised intron RNAs in chloroplasts of Euglena gracilis. EMBO J 4: 2445-2450

Konarska, M. M., Grabowski, P. J., Padgett, R. A. and Sharp, P. A. (1985) Characterization of the branch site in lariat RNAs produced by splicing of mRNA precursors. Nature 313: 552-557

Kovalic, D., Kwak, J. and Weisblum, B. (1991) General method for direct cloning of DNA fragments generated by the polymerase chain reaction. Nucleic Acids Res 19: 4560

Krishnan, S., Barnabas, S. and Barnabas, J. (1990) Interrelationships among major protistan groups based on a parsimony network of 5S rRNA sequences. Biosystems 24: 135-144

Lambowitz, A. M. (1989) Infectious introns. Cell 56: 323-326

Lambowitz, A. M. and Perlman, P. S. (1990) Involvement of aminoacyl-tRNA synthetases and other proteins in group I and group II intron splicing. Trends Biochem Sci 15: 440-444

Lazowska, J., Meunier, B. and Macadre, C. (1994) Homing of a group II intron in yeast mitochondrial DNA is accompanied by unidirectional co-conversion of upstream-located markers. EMBO J 13: 4963-4972

Leedale, G. F. (1967) Euglenida-euglenophyta. Annu Rev Microbiol 21: 31-48

Liere, K. and Link, G. (1995) RNA-binding activity of the matK protein encoded by the chloroplast trnK intron from mustard (Sinapis alba L.). Nucleic Acids Res 23: 917-921

Marechal-Drouard, L., Ramamonjisoa, D., Cosset, A., Weil, J. H. and Dietrich, A. (1993) Editing corrects mispairing in the acceptor stem of bean and potato mitochondrial phenylalanine transfer RNAs. Nucleic Acids Res 21: 4909-4914

Marechal-Drouard, L., Weil, J. H. and Dietrich, A. (1993) Transfer RNAs and transfer RNA genes in plants. Annu Rev Plant Physiol Plant Mol Biol 44: 13-32

Markowicz, Y., Loiseaux-de Goer, S. and Mache, R. (1988) Presence of a 16S rRNA pseudogene in the bi-molecular plastid genome of the primitive brown alga Pylaiella littoralis. Evolutionary implications. Curr Genet 14: 599-608

Matsuura, M., Saldanha, R., Ma, H., Wank, H., Yang, J., Mohr, G., Cavanagh, S., Dunny, G. M., Belfort, M. and Lambowitz, A. M. (1997) A bacterial group II intron encoding reverse transcriptase, maturase, and DNA endonuclease activities: biochemical demonstration of maturase activity and insertion of new genetic information within the intron. Genes Dev 11: 2910-2924

McQuade, A. B. (1983) Origins of the nucleate organisms II. Biosystems 16: 39-55

Michel, F. and Ferat, J. L. (1995) Structure and activities of group II introns. Annu Rev Biochem 64: 435-461

Michel, F., Umesono, K. and Ozeki, H. (1989) Comparative and functional anatomy of group II catalytic introns: a review. Gene 82: 5-30

Cavalier-Smith, T. (1993) Kingdom protozoa and its 18 phyla. Microbiol Rev 57: 953-994

Mohr, G., Perlman, P. S. and Lambowitz, A. M. (1993) Evolutionary relationships among group II intron-encoded proteins and identification of a conserved domain that may be related to maturase function. Nucleic Acids Res 21: 4991-4997

Boulanger, S. C., Belcher, S. M., Schmidt, U., Dib-Hajj, S. D., Schmidt, S. D. and Perlman, P. S. (1995) Studies of point mutants define three essential paired nucleotides in the domain 5 substructure of a group II intron. Mol Cell Biol 15: 4479-4488

Moran, J. V., Zimmerly, S., Eskes, R., Kennell, J. C., Lambowitz, A. M., Butow, R. A. and Perlman, P. S. (1995) Mobile group II introns of yeast mitochondrial DNA are novel site-specific retroelements. Mol Cell Biol 15: 2828-2838

Mori, M. and Schmelzer, C. (1990) Group II intron RNA-catalyzed recombination of RNA in vitro. Nucleic Acids Res 18: 6545-6551

Mori, M., Niemer, I. and Schmelzer, C. (1992) New reactions catalyzed by a group II intron ribozyme with RNA and DNA substrates. Cell 70: 803-810

Mullet, J. E. (1988) Chloroplast development and gene expression. Annu Rev Plant Physiol Plant Mol Biol 39: 475-502

Nakamura, T. M., Morin, G. B., Chapman, K. B., Weinrich, S. L., Andrews, W. H., Lingner, J., Harley, C. B. and Cech, T. R. (1997) Telomerase catalytic subunit homologs from fission yeast and human. Science 277: 955-959

Nilsen, T. W. (1994a) RNA-RNA interactions in the spliceosome: unraveling the ties that bind. Cell 78: 1-4

Nilsen, T. W. (1994b) Unusual strategies of gene expression and control in parasites. Science 264: 1868-1869

Ohyama, K. (1996) Chloroplast and mitochondrial genomes from a liverwort, Marchantia polymorpha: gene organization and molecular evolution. Biosci Biotechnol Biochem 60: 16-24

Palmer, J. D. (1985) Comparative organization of chloroplast genomes. Annu Rev Genet 19: 325-354

Perlman, P. S. and Butow, R. A. (1989) Mobile introns and intron- encoded proteins. Science 246: 1106-1109

Pfitzinger, H., Marechal-Drouard, L., Pillay, D. T., Weil, J. H. and Guillemaut, P. (1990) Variations during leaf development of the relative amounts of two bean (Phaseolus vulgaris) chloroplast tRNAs(Phe) which differ in their minor nucleotide content. Plant Mol Biol 14: 969-975

Pringsheim, A. (1953) Planta 42: 478

Purves, W. K., Orians, G. H. and Heller, C. H. (1992) LIFE: The Science of Biology. Sinauer Associates, Inc.

Pyle, A. M. and Green, J. B. (1994) Building a kinetic framework for group II intron ribozyme activity: quantitation of interdomain binding and reaction rate. Biochemistry 33: 2716-2725

Roger, A. J. and Doolittle, W. F. (1993) Molecular evolution. Why introns-in-pieces? Nature 364: 289-290

Roger, A. J., Keeling, P. J. and Doolittle, W. F. (1994) Introns, the broken transposons. Soc Gen Physiol Ser 49: 27-37

Rordorf, B. F. and Kearns, D. R. (1976) Effects of europium (III) on the thermal denaturation and cleavage of transfer ribonucleic acids. Biopolymers 15: 1491-1504

Saldanha, R., Mohr, G., Belfort, M. and Lambowitz, A. M. (1993) Group I and group II introns. FASEB J 7: 15-24

Schafer, B., Wilde, B., Massardo, D. R., Manna, F., Giudice, L. and Wolf, K. (1994) A mitochondrial group-I intron in fission yeast encodes a maturase and is mobile in crosses. Curr Genet 25: 336-341

Schmidt, U., Podar, M., Stahl, U. and Perlman, P. S. (1996) Mutations of the two-nucleotide bulge of D5 of a group II intron block splicing in vitro and in vivo: phenotypes and suppressor mutations. RNA 2: 1161-1172

Sellem, C. H., Lecellier, G. and Belcour, L. (1993) Transposition of a group II intron. Nature 366: 176-178

Shannon, K. W. and Guthrie, C. (1991) Suppressors of a U4 snRNA mutation define a novel U6 snRNP protein with RNA-binding motifs. Genes Dev 5: 773-785

Shearman, C., Godon, J. J. and Gasson, M. (1996) Splicing of a group II intron in a functional transfer gene of Lactococcus lactis. Mol Microbiol 21: 45-53

Sogin, M.L., Gunderson, J. H., Elwood, H. J., Alonso, R. A. and Peattie, D. A. (1989) Phylogenetic meaning of the kingdom concept: an unusual ribosomal RNA from Giardia lamblia. Science 243: 75-77

Stevenson, J. K. and Hallick, R. B. (1994) The psaA operon pre-mRNA of the Euglena gracilis chloroplast is processed into photosystem I and II mRNAs that accumulate differentially depending on the conditions of cell growth. Plant J 5: 247-260

Stevenson, J. K., Drager, R. G., Copertino, D. W., Christopher, D. A., Jenkins, K. P., Yepiz-Plascencia, G. and Hallick, R. B. (1991) Intercistronic group III introns in polycistronic ribosomal protein operons of chloroplasts. Mol Gen Genet 228: 183-192

Suchy, M. and Schmelzer, C. (1991) Restoration of the self-splicing activity of a defective group II intron by a small trans-acting RNA. J Mol Biol 222: 179-187

Sugita, M. and Sugiura, M. (1996) Regulation of gene expression in chloroplasts of higher plants. Plant Mol Biol 32: 315-326

Sugiura, M. (1992) The chloroplast genome. Plant Mol Biol 19: 149-168

Tanaka, T., Wakasugi, K., Kuwano, Y., Ishikawa, K. and Ogata, K. (1986) Nucleotide sequence of cloned cDNA specific for rat ribosomal protein L35a. Eur J Biochem 154: 523-527

Temin, H. M. (1985) Reverse transcription in the eukaryotic genome: retroviruses, pararetroviruses, retrotransposons, and retrotranscripts. Mol Biol Evol 2: 455-468

Thompson, M. D., Copertino, D. W., Thompson, E., Favreau, M. R. and Hallick, R. B. (1995) Evidence for the late origin of introns in chloroplast genes from an evolutionary analysis of the genus Euglena. Nucleic Acids Res 23: 4745-4752

Tobin, E. M. and Silverthorne, J. (1985) Light regulation of gene expression in higher plants. Annu Rev Plant Physiol 36: 569-593

Wenzlau, J. M., Saldanha, R. J., Butow, R. A. and Perlman, P. S. (1989) A latent intron-encoded maturase is also an endonuclease needed for intron mobility. Cell 56: 421-430

Whittaker, E. and Beggs, J. D. (1991) The yeast PRP8 protein interacts directly with pre-mRNA. Nucleic Acids Res 19: 5483-5489

Wiesenberger, G., Link, T. A., Ahsen, U., Waldherr, M. and Schweyen, R. J. (1991) MRS3 and MRS4, two suppressors of mtRNA splicing defects in yeast, are new members of the mitochondrial carrier family. J Mol Biol 217: 23-37

Wiesenberger, G., Waldherr, M. and Schweyen, R. J. (1992) The nuclear gene MRS2 is essential for the excision of group II introns from yeast mitochondrial transcripts in vivo. J Biol Chem 267: 6963-6969

Winkler, M. and Kuck, U. (1991) The group IIB intron from the green alga Scenedesmus obliquus mitochondrion: molecular characterization of the in vitro splicing products. Curr Genet 20: 495-502

Wise, J. A. (1993) Guides to the heart of the spliceosome. Science 262: 1978-1979

Woolford, J. L. and Peebles, C. L. (1992) RNA splicing in lower eukaryotes. Curr Opin Genet Dev 2: 712-719

Wu, H. N. and Lai, M. M. (1989) Reversible cleavage and ligation of hepatitis delta virus RNA. Science 243: 652-654

Xiong, Y. and Eickbush, T. H. (1990) Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J 9: 3353-3362

Yang, Y., Macdonald, G. J. and Duggan, K. A. (1996) Changes in angiotensin II metabolism contribute to the increased pressor response to angiotensin after chronic treatment with L-NAME in the spontaneously hypertensive rat. Clin Exp Pharmacol Physiol 23: 611-613

Yasuhira, S. and Simpson, L. (1997) Phylogenetic affinity of mitochondria of Euglena gracilis and kinetoplastids using cytochrome oxidase I and hsp60. J Mol Evol 44: 341-347

Zimmerly, S., Guo, H., Eskes, R., Yang, J., Perlman, P. S. and Lambowitz, A. M. (1995a) A group II intron RNA is a catalytic component of a DNA endonuclease involved in intron mobility. Cell 83: 529-538

Zimmerly, S., Guo, H., Perlman, P. S. and Lambowitz, A. M. (1995b) Group II intron mobility occurs by target DNA-primed reverse transcription. Cell 82: 545-554