Experimental analysis of trans-splicing of an ascidian troponin I

Sandra Mortimer Biology Department McGill University, Montreal Submitted April 2007

"A thesis submitted to McGill University in partial fulfillment of the requirements of the degree of Master of Science."

© Sandra Mortimer, 2007

ISBN: 978-0-494-32759-3

ABSTRACT

I investigated SL trans-splicing in the troponin I gene of Ciona intestinalis. Experimental mutation of the AG dinucleotide adjacent to the natural trans-splice acceptor site (-64) in CiTnI/nuclacZ constructs eliminated trans-splicing to that site in Ciona embryos but activated trans-splicing at cryptic acceptor sites at -76 and -39, adjacent to the nearest AG dinucleotides. However, not all AG dinucleotides specify cryptic acceptor sites, because outron internal-deletion and 3' truncation mutants were trans-spliced at a far-upstream AG-adjacent cryptic site (-346), leaving many AGs in the retained outron segments. Thus, additional sequence elements that are present only in the -346 and -76/-64/-39 regions are required for cryptic acceptor activity. All mutant constructs generated detectable β-gal enzyme expression, although the mutant with the longest retained-outron segment appeared less active. Therefore, mRNA accumulation and translation do not require trans-splicing to the natural acceptor site, although they may be facilitated by the normal removal of the outron during trans-splicing.

ABSTRAIT

I investigated SL trans-splicing in the CiTnI gene of the organism Ciona intestinalis. Experiments in which I mutated the AG that precedes the normal trans-splice site eliminated trans-splicing at that position, but in these Ciona embryos trans-splicing took place at a cryptic site, either at -76 or at -39, the two AGs closest to the normal site. However, not every AG can function as an acceptor for SL trans-splicing. Experiments using DNA molecules lacking sections of the outron showed that there is only one other site for trans-splicing. Apart from the region around -39/-64/-76, only the region around -346 is recognized as a trans-splice site. No mutation completely eliminated production of an active protein, but the one that retained the most outron also appeared to have lost some activity. Thus, trans-splicing does not necessarily have to occur at the normal site to obtain a CiTnI protein.

ACKNOWLEDGEMENTS

I would like to thank my supervisor, Dr. Ken Hastings, for the opportunity to conduct this research and the support provided throughout. To the members of the Hastings lab, thank you for your help and advice. Thank you to the members of my supervisory committee, Drs. Ehab Abouheif and Richard Roy, for the critical questions and creative suggestions. Finally, I must thank Dr. Tom Meedel for allowing me to perform work on Ciona in his laboratory at Rhode Island College and for all of the insight he provided in working with the animals.

TABLE OF CONTENTS

Abstract

Abstrait

1. Introduction
   Overview
   Phylogenetic Distribution of SL Trans-splicing
   Characteristics of the SL RNA
   Functions of SL Trans-splicing
   Mechanism of SL trans-splicing and cis-splicing
   Mechanism of SL trans-splicing
   Determinants of trans-splice acceptor sites
   Spliceosome Assembly at the Cis-splice Acceptor Site
   Cryptic Splicing Sites: Cis-splicing
   Cryptic Splicing Sites: Trans-splicing
   Ciona intestinalis
   Ascidian Muscle
   C. intestinalis troponin I
   C. intestinalis genomic data
   The Project and Rationale

2. Methods
   General Cloning Procedure
   Standard PCR and Agarose Gels
   Mutation of Natural Trans-splice Acceptor Site
   Deletion between Natural Trans-splice Acceptor Site and Upstream Cryptic Site
   Outron 3' end Truncation Constructs
   Collection of Animals
   Electroporation of DNA Constructs into Ciona Embryos
   β-galactosidase Staining of Tailbud Embryos
   Scoring β-galactosidase Activity
   RNA Extraction from Electroporated Ciona Embryos
   Poly-C Tailing 5'RACE
   SL PCR
   SMART 5'RACE

3. Results: Cryptic Trans-splice Acceptor Sites
   Natural Trans-splice Acceptor Site Mutant
   Outron Deletion (-319 to -77) Construct
   Outron 3' Deletion Constructs
   Comparison of branchpoint-like sequences

4. Results: An Alternate Promoter?
   Scan of EST database

5. Discussion
   Distribution of Trans-splice Acceptor Sites
   Is Trans-splicing Needed for Protein Expression?
   Sanitation Hypothesis
   How to Build an Outron-Containing Molecule
   Problems Still Remaining and Future Studies

6. References

Appendix I

Appendix II

FIGURES

Figure 1: An SL RNA molecule

Figure 2: Mechanism of SL trans-splicing

Figure 3: Schematic drawing of constructs

Figure 4: CiTnI DNA in constructs

Figure 5: SL PCR of β-gal mRNAs from 12h embryos electrotransformed with wildtype trans-splice acceptor, mutant trans-splice acceptor and -316, -77 outron deletion constructs

Figure 6: SL PCR-derived 5' end sequences of transcripts isolated from 12h C. intestinalis embryos electrotransformed with CiTnI/nuclear lacZ constructs

Figure 7: Poly(dC)-tailing 5'RACE of nlacZ mRNA from 12h embryos electrotransformed with the wildtype trans-splice acceptor, the trans-splice acceptor mutants or the -319, -77 deletion construct

Figure 8: Examples of tailbud embryos for each of the categories used in semi-quantitative β-galactosidase expression assay

Figure 9: SL PCR of CiTnI/nuclacZ RNA from 12h embryos electrotransformed with the outron truncation constructs

Figure 10: 5' end sequences of the -35 RNA transcripts from C. intestinalis embryos electrotransformed with CiTnI/nuclear lacZ constructs

Figure 11: Poly(dC)-tailing 5'RACE of endogenous CiTnI mRNA from adult bodywall muscle

Figure 12: Distribution of the 5' ends of CiTnI ESTs

Figure 13: Sequence map of the CiTnI outron

1. INTRODUCTION

Overview

The research focus of our lab is the regulation and molecular evolution of the troponin I (TnI) gene family in the chordates/vertebrates. This work follows two streams, one based on studies of the three differentially expressed TnI genes of the higher vertebrates and the other based on studies of the single TnI gene present in a simple chordate organism, the ascidian Ciona intestinalis. Studies of the Ciona intestinalis troponin I (CiTnI) gene revealed, surprisingly, that SL trans-splicing is involved in the maturation of the CiTnI pre-mRNA (Vandenberghe et al., 2001). SL trans-splicing is a spliceosomal process whereby the 5' end of a pre-mRNA molecule is replaced through a splicing reaction by a short RNA, the spliced leader (SL), derived from a special RNA molecule, the SL RNA (Fig. 1) (Davis, 1996). The 5' end of the pre-mRNA molecule that is removed during this process (Fig. 2) is called the outron (Conrad et al., 1991).

The functions of SL trans-splicing are poorly understood, particularly in the case of genes, like CiTnI, that are not part of polycistronic transcription units (or operons) but are, apart from trans-splicing, typical monocistronic transcription units (Davis, 1996).

We would ultimately like to determine the functional importance of SL trans-splicing for expression of CiTnI at the mRNA and protein levels. Determining a function for SL trans-splicing of monocistronic genes such as CiTnI will shed light on the evolutionary origins of SL trans-splicing and the likely genomic impacts of the evolutionary gain or loss of trans-splicing. The initial goal of this project was to experimentally eliminate trans-splicing of CiTnI gene transcripts in order to assess whether trans-splicing is essential for gene expression. In the course of the work, I uncovered aspects of trans-splicing that made the original experimental goal more difficult than anticipated. My early findings showed that it was necessary to first characterize the distribution of cryptic trans-splice acceptor sites in the CiTnI outron before it would be possible to experimentally eliminate trans-splicing. This thesis describes my initial findings and my subsequent characterization of the distribution of trans-splice acceptor sites in the CiTnI outron. It also indicates a plausible experimental approach, based on my results, that could eliminate CiTnI trans-splicing in future work.

Phylogenetic Distribution of SL Trans-splicing

SL trans-splicing has a patchy phylogenetic distribution throughout the eukaryotes that makes its evolutionary origin somewhat of a mystery (refer to Hastings, 2005 for a phylogenetic tree showing the distribution of SL trans-splicing). SL trans-splicing was originally identified in a unicellular protist, Trypanosoma brucei (Campbell et al., 1984). Since this initial discovery, SL trans-splicing has been found in nematodes (Krause and Hirsh, 1987), Platyhelminths (flatworms) (Brehm et al., 2002; Rajkovic et al., 1990; Zayas et al., 2005), Euglena (Tessier et al., 1991), chordates (Ganot et al., 2004; Vandenberghe et al., 2001), cnidarians (Stover and Steele, 2001) and rotifers (Pouchkina-Stantcheva and Tunnacliffe, 2005). There is no evidence of SL trans-splicing in the highly characterized and thoroughly sequenced genomes of mouse, Drosophila or man (Nilsen, 2001). The genomic data available for these animals are extensive, making it reasonable to believe that trans-splicing does not occur in them, or at most occurs at a very low level (Davis, 1996; Hastings, 2005; Nilsen, 2001).

There are two equally parsimonious hypotheses for the origin of trans-splicing that can explain such a wide but sporadic distribution. The first hypothesis is one of ancestral origin, where trans-splicing is a very ancient eukaryotic characteristic that has subsequently been lost in numerous lineages (Hastings, 2005; Nilsen, 2001; Stover and Steele, 2001). The alternative hypothesis is that trans-splicing was not an ancestral eukaryotic process, but one that arose de novo, independently, in the ancestors of each of the organismal lineages in which it occurs today (Hastings, 2005; Nilsen, 2001; Stover and Steele, 2001). It is also theoretically possible that SL trans-splicing arose once in one specific lineage, and has since spread to others by horizontal transfer of the SL RNA gene(s). However, if this was the case, it has been obscured by subsequent sequence evolution, because there is no clear overall sequence similarity among the SL RNAs of different phyla. The chordates provide an interesting phylum for study of the evolution of trans-splicing because SL trans-splicing is present in one major group, the tunicates (Ganot et al., 2004; Vandenberghe et al., 2001), but is absent from another, the vertebrates.

The fraction of mRNA species that receive the spliced leader varies among the different groups in which trans-splicing occurs. In trypanosomes, all pre-mRNA molecules are trans-spliced (Campbell et al., 1984; Parsons et al., 1984). Most other organisms trans-splice 50-90% of their genes (Davis et al., 1995; Ganot et al., 2004; Pouchkina-Stantcheva and Tunnacliffe, 2005; Satou et al., 2006; Satou et al., 2001; Stover and Steele, 2001; Tessier et al., 1991), although the larvacean tunicate Oikopleura dioica and the cnidarian Hydra have lower rates of trans-splicing, 13% and 30% respectively (Ganot et al., 2004; Stover and Steele, 2001).

Characteristics of the SL RNA

The SL RNA is a short RNA composed of a single 5'-terminal exon, the spliced leader (SL), followed by an intron-like sequence (Fig. 1). The shortest SL exon discovered to date is the 16 nt SL of Ciona intestinalis (Vandenberghe et al., 2001), while the longest SL was found in another tunicate, O. dioica, at 40 nt (Ganot et al., 2004). Lengths of the 3'-terminal intron-like moiety also vary, ranging from 30 nt in C. intestinalis (Vandenberghe et al., 2001) to 106 nt in bdelloid rotifers (Pouchkina-Stantcheva and Tunnacliffe, 2005).

A) 5' TMG-[spliced leader (SL)]-[GU ... RA(U)nGR ... (intron-like moiety)]-3'

B) 5'-AUUCUAUUUGAAUAAGGUAAGAAUGUGAGCUGAGCUUUGGCUCAACUAUGU-3'

Figure 1: An SL RNA molecule. All SL RNA molecules can be divided into the spliced leader (SL) and the intron-like moiety. The SL is the 5'-terminal portion of the molecule that is transferred to the pre-mRNA molecule in the trans-splicing reaction. In metazoa, the SL RNA is capped at the 5' end with a trimethyl guanosine (TMG) cap. The intron-like moiety, like conventional cis-spliced introns, starts with a GU dinucleotide that helps define the immediately upstream nucleotide as the splice donor site. Within the intron-like moiety is an Sm-like binding site (RA(U)nGR). A) shows a general SL RNA molecule, while B) gives the sequence for the C. intestinalis SL RNA as identified by Vandenberghe et al., 2001. In B) the spliced leader is in bold type, and the Sm-like binding site is underlined.
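The two-part organization described in Figure 1 can be illustrated with a short sketch. The Python below is not part of the original work. It assumes the 16 nt SL length reported for C. intestinalis, takes the sequence of panel B as it is transcribed here, and reads the Sm-like consensus RA(U)nGR literally as the pattern [AG]AU+G[AG] (n of at least 1), which is only one possible interpretation. The function names and the toy sequence are hypothetical.

```python
import re

# Sequence as printed in Figure 1B (transcribed from the figure, not independently verified).
CIONA_SL_RNA = "AUUCUAUUUGAAUAAGGUAAGAAUGUGAGCUGAGCUUUGGCUCAACUAUGU"
CIONA_SL_LENGTH = 16  # SL exon length reported in the text

# One literal reading of the consensus RA(U)nGR, with R = A or G and n >= 1 (assumption).
SM_LIKE = re.compile(r"[AG]AU+G[AG]")


def split_sl_rna(sl_rna: str, sl_length: int) -> tuple[str, str]:
    """Split an SL RNA into (spliced leader, intron-like moiety)."""
    leader, intron_like = sl_rna[:sl_length], sl_rna[sl_length:]
    if not intron_like.startswith("GU"):
        raise ValueError("intron-like moiety does not start with GU")
    return leader, intron_like


def find_sm_like_sites(rna: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each strict consensus match."""
    return [(m.start(), m.group()) for m in SM_LIKE.finditer(rna)]


leader, intron_like = split_sl_rna(CIONA_SL_RNA, CIONA_SL_LENGTH)
print(leader)           # AUUCUAUUUGAAUAAG
print(intron_like[:2])  # GU

# Hypothetical toy sequence, used only to exercise the motif search.
print(find_sm_like_sites("CCGAUUUGACC"))  # [(2, 'GAUUUGA')]
```

The GU check mirrors the caption's statement that the dinucleotide defines the splice donor site. A strict pattern match is a deliberately simple choice, since "Sm-like" sites need not fit the consensus exactly.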

The SL RNA is synthesized by RNA polymerase II (Bruzik et al., 1988) from a gene repeat whose length varies among species, from 264 bp in C. intestinalis (Hastings, unpublished) to 1.35 kb in Trypanosoma brucei (Campbell et al., 1984). In C. elegans, there are 110 copies of the repeat (Krause and Hirsh, 1987). The number of gene repeats has not been determined for other species. In some organisms, including Caenorhabditis and Hydra, the SL repeat also includes the 5S rRNA gene repeats (Krause and Hirsh, 1987; Stover and Steele, 2001). This, however, likely represents a chance occurrence rather than an evolutionarily significant trait, since the orientation with respect to the 5S rRNA repeat differs between the two cases (Krause and Hirsh, 1987; Stover and Steele, 2001) and because association of the SL RNA gene with the 5S rRNA gene is not seen in all organisms in which trans-splicing occurs (Brehm et al., 2002; Rajkovic et al., 1990; Zayas et al., 2005). Unlike other RNA polymerase II transcripts, the SL RNA is not polyadenylated (Krause and Hirsh, 1987). Like other RNA polymerase II transcripts, the SL RNA is initially synthesized with a 5' m7G cap. This is later modified in metazoa by additional methylation reactions to form a trimethyl guanosine (TMG) cap (Ganot et al., 2004; Rajkovic et al., 1990; Stover and Steele, 2001; Thomas et al., 1988; Van Doren and Hirsh, 1988). TMG caps are also found on spliceosomal snRNAs (Thomas et al., 1988). In trypanosomes the SL RNA does not contain TMG but has a cap4 structure, a highly modified cap that contains monomethylated m7G at its 5' end (Lücke et al., 1996).

While the sequence of the SL RNA is not conserved across different phyla, there are certain general features common to all (Davis, 1996). These are the dinucleotide GU following the leader sequence and representing the first two nucleotides of the intron, and a binding site, RA(U)nGR, for the spliceosomal snRNP-associated protein Sm, located in the intron (Bruzik et al., 1988; Vandenberghe et al., 2001). Several of the spliceosomal snRNAs also have Sm binding sites, and Sm binding plays a key role in the maturation and function of snRNAs, including the formation of the TMG cap structure (Will and Luhrmann, 2001).

There is detectable, though not necessarily strict, sequence conservation of the SL among species within a given phylum (Davis, 1996). For example, all flatworm SL RNAs contain AUG at the 3' end of the spliced leader (Davis, 1996). SL1, once thought to be strictly conserved in all nematodes, has been shown to differ in certain species (Blaxter and Liu, 1996; Goyal et al., 2005). In some organisms, several distinct SL RNAs exist, for example SL1 and SL2 in nematodes (Blumenthal, 1995). In addition, sequence microheterogeneity of a single SL RNA type is emerging as a common feature in many organisms, reflecting the presence of non-identical copies of the SL RNA gene (Pouchkina-Stantcheva and Tunnacliffe, 2005; Tessier et al., 1991).

Despite the lack of global sequence conservation, some aspects of SL RNA secondary structure may be conserved. Usually, there are two or three predicted stem-loop structures of variable length (Bruzik et al., 1988; Rajkovic et al., 1990), the first containing the SL and the exon/intron boundary (Bruzik et al., 1988). The Sm-like binding sequence is located either between the first and second loops, if there are only two loops, or between the second and third loops, if there are three (Bruzik et al., 1988). Exceptions do exist. For example, the Hydra SLB RNA has three predicted stem-loops, but the third is composed of the 5' and 3' ends of the molecule (Stover and Steele, 2001). Oikopleura SL RNA has two predicted loops, but the arrangement is different because it is the second loop, and not the first, that contains the SL sequence and is involved in the splicing reaction (Ganot et al., 2004). Only one stem-loop is predicted for both the SL RNA of Ciona and that of Hydra (SLA RNA). In these two cases, the SL sequence is not part of the loop structure (Stover and Steele, 2001; Vandenberghe et al., 2001). The functional implications, if any, of these variations in predicted secondary structure are unclear.

Functions of SL trans-splicing

Five distinct functions of SL trans-splicing have been described or hypothesized to date. One of these is to provide a cap structure to RNA polymerase I (RNApolI) protein-coding transcripts (Lee and Van der Ploeg, 1997). This function applies only to trypanosomes, as they are the only organisms to date for which protein-coding messages transcribed by RNApolI have been identified. RNApolI transcripts (ribosomal RNA molecules in most organisms) are not capped at the 5' end, but a cap is generally needed for translation of an RNA molecule in eukaryotes; adding a cap via SL trans-splicing thus permits these transcripts to be translated.

A second function for SL trans-splicing is specific to flatworms, where the last three bases of the SL are a conserved AUG (Cheng et al., 2006). In Schistosoma mansoni and Schistosoma japonicum, a small minority of trans-spliced mRNAs use the AUG from the SL as the translation initiation codon (Cheng et al., 2006).

The third function that has been identified for SL trans-splicing is to resolve polycistronic transcription units, or operons (Johnson et al., 1987; Blumenthal, 1995). In most or all trans-splicing organisms, a fraction of genes are organized in operons. Trans-splicing of the SL to the downstream genes of a polycistronic unit cuts out individual protein-coding open reading frames into separate RNA molecules and at the same time provides these internal cistron mRNAs with a 5' cap. In nematodes, resolution of polycistrons uses a special SL, SL2, but in other organisms the same SL molecule is used for both operon resolution and monocistronic pre-mRNA trans-splicing (Blumenthal, 1995). In nematodes, trans-splicing of downstream cistrons is dependent on 3' end formation of the upstream gene; however, the converse does not hold true (Kuersten et al., 1997). Polycistrons have been identified in the following trans-splicing organisms: trypanosomes, flatworms, Caenorhabditis, Ascaris, Ciona and Oikopleura (Blumenthal, 1995; Ganot et al., 2004; Muhich and Boothroyd, 1988; Satou et al., 2006). In fact, polycistrons have been identified whenever an in-depth search for them has been carried out in a trans-splicing organism (Ganot et al., 2004).

A fourth possible function for SL trans-splicing is enhanced translation and/or mRNA stability, i.e., addition of the SL and its TMG cap increases the abundance of the mRNA or its translational efficiency. This could be done by increasing mRNA stability, increasing affinity between the mRNA and translation initiation factors, or optimizing the length of the 5' untranslated region (UTR) (Conrad et al., 1991; Davis, 1996; Lall et al., 2004). Some studies support the enhanced-translation hypothesis and others do not. A study by Maroney et al. (1995) indicated that optimal conditions for translation of an Ascaris mRNA included both the SL sequence per se and its normal TMG cap. However, a more recent study in Ascaris found that, taken individually, the SL per se and the TMG cap each reduce translation compared with an m7G cap and absence of the SL, but that each neutralized the other's negative effect, so that when both the SL and the TMG cap were present translation occurred at levels comparable to those of m7G-capped non-SL-containing molecules (Lall et al., 2004). The optimal 5'UTR length for a non-trans-spliced Ascaris mRNA was 25 nt, and for a trans-spliced mRNA, 31 nt (Lall et al., 2004), suggesting that generating an optimal 5'UTR length is not one of the functions of SL trans-splicing.

The final function hypothesized for SL trans-splicing is one of "sanitizing" the 5'UTR. Removal of the outron by trans-splicing (Fig. 2) is a potential mechanism for removing any sequence elements that may be detrimental to the mRNA molecule's processing, nuclear export, cytoplasmic stability, or translation (Blumenthal, 1995; Hastings, 2005; Nilsen, 2001). Presumably the outron elements targeted for removal by trans-splicing, though detrimental to the mRNA, must have a functionally positive role in gene expression at some stage preceding trans-splicing (e.g. transcription). What these hypothetical elements might be is unclear at the moment, since the most obvious potentially problematic sequence element, an out-of-frame AUG translational start codon, does not in fact seem to affect translatability of a nematode trans-spliced molecule (Conrad et al., 1993b). However, this does not preclude the possible existence of other, yet to be characterized, negative 5'UTR elements whose removal may be beneficial to mRNA function.

Of the five known or proposed functions of trans-splicing, only the last two, enhanced translation and sanitization of the 5'UTR, are applicable to monocistronic RNA polymerase II-transcribed genes such as CiTnI.

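[Figure 2 schematic. Top: the TMG-capped SL RNA (SL exon followed by GU) and the m7G-capped pre-mRNA molecule (outron containing the branchpoint A and the AG acceptor, followed by the poly(A)-tailed body). Middle: after the first step, the intron-like moiety of the SL RNA is joined to the branchpoint A and the SL exon is released. Bottom: the products, a Y-branched outron structure and the SL-trans-spliced mRNA.]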

Figure 2: Mechanism of SL trans-splicing. SL trans-splicing is a spliceosome-dependent process similar to the conventional cis-splicing used to remove pre-mRNA introns. The first step of the reaction transfers the 5' end of the intron-like moiety of the SL RNA to the branchpoint A residue of the pre-mRNA molecule, simultaneously releasing the SL with a free 3' hydroxyl. In the second step of the trans-splicing reaction the SL sequence is spliced to the splice acceptor site, releasing the SL-trans-spliced mRNA and a Y-branched outron-containing RNA molecule.
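The net outcome of the two-step reaction in Figure 2 can be sketched at the sequence level. The Python below is an illustration only, not a method from this work. It ignores the branchpoint chemistry and the Y-branched intermediate and simply joins the SL exon to the pre-mRNA immediately downstream of a chosen AG. The function name, the index convention and the toy sequences are all hypothetical.

```python
def trans_splice(sl_rna: str, sl_length: int, pre_mrna: str,
                 acceptor_ag_index: int) -> tuple[str, str]:
    """Return (trans-spliced mRNA, removed outron) for one AG acceptor.

    acceptor_ag_index is the 0-based index of the A in the acceptor AG.
    The outron is taken to run from the 5' end up to and including that AG.
    """
    if pre_mrna[acceptor_ag_index:acceptor_ag_index + 2] != "AG":
        raise ValueError("no AG dinucleotide at the given acceptor position")
    sl_exon = sl_rna[:sl_length]                # only the SL exon is transferred
    outron = pre_mrna[:acceptor_ag_index + 2]   # removed 5' segment
    body = pre_mrna[acceptor_ag_index + 2:]     # retained downstream RNA
    return sl_exon + body, outron


# Hypothetical toy molecules: a 16 nt leader plus a short intron-like tail,
# and a pre-mRNA whose outron ends in the acceptor AG.
toy_sl_rna = "AUUCUAUUUGAAUAAG" + "GUAAGA"
toy_pre_mrna = "CCUUACUUUCAG" + "AUGGCC"

mrna, outron = trans_splice(toy_sl_rna, 16, toy_pre_mrna, acceptor_ag_index=10)
print(mrna)    # AUUCUAUUUGAAUAAGAUGGCC
print(outron)  # CCUUACUUUCAG
```

Passing a different AG index models use of an alternative (cryptic) acceptor: the further upstream the chosen AG, the more outron sequence is retained in the product.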

Mechanism of SL trans-splicing and cis-splicing

The mechanism of SL trans-splicing has not been fully elucidated. This section will introduce what is known and the following section will outline the equivalent steps of cis-splicing, which are better understood. Cis-splicing and trans-splicing occur in parallel in most trans-splicing organisms.

Mechanism of SL trans-splicing

Trypanosomes are unique among trans-splicing organisms because all of their genes are present as SL-trans-splicing-resolved polycistronic units, and only a handful of these genes have introns (Mair et al., 2000). This means that almost no cis-splicing occurs in trypanosomes, whereas in other eukaryotic organisms cis-splicing is the dominant type of splicing (Davis, 1996).

The SL RNA is transcribed in the nucleus by RNA polymerase II (Bruzik et al., 1988). Elements of the SL sequence itself are required for transcription initiation and for 3' end formation (Hannon et al., 1990). Following its transcription, the SL RNA is exported to the cytoplasm, where it associates with the Sm proteins (Bruzik et al., 1988). Once the SL RNA is bound to the Sm proteins, its 5' cap is modified to a TMG cap (Maroney et al., 1990) or, in trypanosomes, to a cap4 structure (Lücke et al., 1996). It is believed that the association between the Sm proteins, the TMG cap and the SL RNA is the signal that allows the SL RNA, in the form of an snRNP, to re-enter the nucleus, where SL trans-splicing reactions take place (Bruzik et al., 1988; Thomas et al., 1988).

Like cis-splicing, trans-splicing is a spliceosome-dependent, branchpoint-initiated process whereby the SL exon is transferred to a splice acceptor near the 5' end of a pre-mRNA molecule (Davis, 1996). Addition of the SL is coupled with the removal of the 5'-most portion of the pre-mRNA molecule (the outron) (Fig. 2). A Y-branched RNA is formed by the covalent joining of the pre-mRNA outron and the intron-like segment of the SL RNA during the splicing reaction (Fig. 2), analogous to the formation of the intron lariat structure generated in cis-splicing (Davis, 1996).

Many of the components involved in trans-splicing are the same factors that make up the cis-spliceosome. One major exception is that the U1 snRNP is not required for SL trans-splicing, despite being essential for cis-splicing (Hannon et al., 1991; Van Doren and Hirsh, 1988). It has been proposed that the SL RNA, when present as an snRNP, assumes the role otherwise played by U1 (Thomas et al., 1988). The other splicing factors, including the snRNPs U2, U4, U5 and U6 and the protein U2AF, are all required for trans-splicing and cis-splicing, filling similar roles in both processes (Thomas et al., 1988; Zorio and Blumenthal, 1999). U2 pairs with sequences near the branchpoint, U2AF recognizes the trans-splice acceptor site, U5 pairs with the acceptor region of the pre-mRNA molecule, and U4 and U6 associate with each other and with other snRNPs (Romfo et al., 2001; Thomas et al., 1988; Zorio and Blumenthal, 1999).

In nematodes, two proteins specific for the trans-splicing reaction have been identified, marking the first discovery of any factors (other than the SL RNA) unique to trans-splicing (Denker et al., 2002). Orthologues of the two proteins were not found in the genomes of plants, fly, or humans, organisms which do not trans-splice (Denker et al., 2002); however, orthologues have not yet been identified in other trans-splicing organisms, e.g. Ciona (Hastings, unpublished observation). These proteins appear to bind as a heterodimer to the Sm-like binding site of the SL RNA (Denker et al., 2002). They must bind to the SL RNA before any association between the Sm proteins and the SL RNA can occur. One of these proteins, SL 30k, associates with SF1/BBP, and as a complex binds to the branchpoint early in spliceosome assembly (Denker et al., 2002). In cis-splicing it is believed that U1, perhaps in conjunction with another as yet unidentified protein, interacts with SF1/BBP, bringing the 5' splice site into close physical proximity to the 3' splice site. Denker and co-workers suggest that SL 30k may in fact be the functional replacement for U1 in the trans-splicing reaction, forming the link between the SL RNA and the pre-mRNA molecule that is to be trans-spliced.

Determinants of trans-splice acceptor sites

The most important factor in determining whether SL trans-splicing will occur on a given pre-mRNA molecule is the presence of an unpaired splice acceptor site (Conrad et al., 1991). Two experiments in the nematode C. elegans clearly demonstrated that the acceptor sites used in cis- and trans-splicing are functionally equivalent. In the first experiment, an intron segment lacking a splice donor site was inserted into the 5' region of a non-trans-spliced mRNA; with this change, the molecule became trans-spliced (Conrad et al., 1991). In the second experiment, an exon including its splice donor site was inserted into the outron of a pre-mRNA molecule that was normally trans-spliced. With the addition of the upstream exon, the pre-mRNA molecule was cis-spliced instead (Conrad et al., 1993a). Together these experiments illustrate the importance of the context of splice donors and acceptors in trans-splicing. Trans-splicing does not occur at an acceptor that is paired with an upstream donor site, presumably because it cannot compete with the cis-splicing reaction that occurs at such paired sites.

For trans-splicing to occur, a proper splice acceptor site is needed on the pre-mRNA molecule (Conrad et al., 1993b). In the case of nematodes, this means an AG dinucleotide with pyrimidines located at positions -3 to -5, usually CUUAG (Conrad et al., 1993b). If any of these features are mutated, then trans-splicing can occur at a cryptic acceptor site (Conrad et al., 1993b). No consensus branchpoint sequence has been identified in Caenorhabditis (Blumenthal, 1995). In vertebrates, and likely in Ciona (Vandenberghe et al., 2001), the features believed to define a splice acceptor are an AG dinucleotide located 15 to 50 nt downstream of a branchpoint consensus YRCTRAY, with the intervening sequence rich in pyrimidines (Reed and Maniatis, 1985). In a study testing random sequences placed downstream of the trans-splice acceptor site for their ability to increase the efficiency of trans-splicing, two sequences were identified that were not also enhancers of upstream cis-splicing (Boukis and Bruzik, 2001). One of these was (G/C)GAC(G/C), a sequence that binds one or more SR proteins, perhaps SF2/ASF (Boukis and Bruzik, 2001). The other sequence identified was the 5' splice-site-like (donor site) sequence AGGUAGGUU (Boukis and Bruzik, 2001). In cis-splicing, this sequence is recognized by U1, but when U1 is absent, this site still sequesters components needed for trans-splicing (Boukis and Bruzik, 2001). The authors suggest that downstream splice donors might be enhancing trans-splicing by exon definition.
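The vertebrate-style acceptor definition above (a branchpoint consensus, a pyrimidine-rich stretch, then an AG 15 to 50 nt downstream) can be turned into a simple scan. The sketch below is an illustration and not the analysis performed in this work. It uses DNA letters, measures the 15 to 50 nt distance from the end of the branchpoint match to the A of the AG (one possible reading of the rule), and applies a hypothetical pyrimidine-fraction cutoff of 0.6. The function name and test sequence are invented.

```python
import re

# YRCTRAY with Y = C/T and R = A/G; the lookahead lets overlapping matches be found.
BRANCHPOINT = re.compile(r"(?=([CT][AG]CT[AG]A[CT]))")


def candidate_acceptors(dna: str, min_gap: int = 15, max_gap: int = 50,
                        min_pyrimidine_fraction: float = 0.6):
    """List (branchpoint_start, ag_index, pyrimidine_fraction) candidates.

    The gap is counted from the first base after the 7-nt branchpoint match
    to the A of the AG; the 0.6 cutoff is an arbitrary illustrative value.
    """
    hits = []
    for match in BRANCHPOINT.finditer(dna):
        bp_start = match.start(1)
        bp_end = bp_start + 7
        last_ag = min(bp_end + max_gap, len(dna) - 2)
        for ag in range(bp_end + min_gap, last_ag + 1):
            if dna[ag:ag + 2] != "AG":
                continue
            between = dna[bp_end:ag]
            if not between:
                continue
            fraction = sum(base in "CT" for base in between) / len(between)
            if fraction >= min_pyrimidine_fraction:
                hits.append((bp_start, ag, round(fraction, 2)))
    return hits


# Hypothetical test sequence: branchpoint TACTAAC, 16 T residues, then AG.
toy = "GG" + "TACTAAC" + "T" * 16 + "AG" + "ATG"
print(candidate_acceptors(toy))  # [(2, 25, 1.0)]
```

A scan of this kind only nominates AG dinucleotides that fit the stated sequence rules; whether a nominated site is actually used as an acceptor can only be established experimentally.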

In summary, trans-splicing requires the SL RNP, the spliceosome components U2, U2AF, U4, U5 and U6 and, in nematodes, two trans-splicing-specific proteins. With the exception of the two trans-splicing-specific proteins, the roles these components play are the same as they would be in cis-splicing, or, if not required for cis-splicing, they take on the roles of U1. In addition, trans-splicing requires that there be an unpaired splice acceptor site on the target pre-mRNA molecule, and the presence of a branchpoint adenosine residue an appropriate distance upstream from the unpaired acceptor.

Spliceosome Assembly at the Cis-splice Acceptor Site

As previously discussed, the spliceosome for trans-splicing is believed to be very similar to the cis-spliceosome, with the exception of U1 being absent from the trans-spliceosome, while the SL snRNP is present only in the trans-spliceosome. In nematodes, there are also two proteins that are unique to the trans-spliceosome (see previous section). This section describes the assembly of the spliceosome in cis-splicing and the stages at which different elements of the pre-mRNA sequence and the different snRNPs are required. A very similar process is expected for the trans-splicing spliceosome.

Spliceosome assembly can be broken down into a series of intermediate complexes. These are, in order, E→E*→A→B→C (Champion-Arnaud et al., 1995). Each complex is dependent on proper formation of the previous complex (Jamison et al., 1992). Formation of each of the complexes, with the exception of the E complex, is ATP-dependent (Champion-Arnaud et al., 1995).

The E complex, sometimes referred to as the commitment complex, consists of the snRNPs U1 and U2, together with U2AF65, U2AF35 and BBP, the branchpoint binding protein (in mammals, mBBP, and in yeast, SF1) (Champion-Arnaud et al., 1995; Das et al., 2000). U1 base-pairs with the 5' splice site (Jamison et al., 1992), U2 binds the branchpoint and will eventually displace BBP (Champion-Arnaud et al., 1995; Das et al., 2000), U2AF65 binds the polypyrimidine tract (Berglund et al., 1998a), U2AF35 is recruited to U2AF65 (Peled-Zehavi et al., 2001) and BBP is the first protein to recognize and bind to the branchpoint sequence (Berglund et al., 1998a). BBP first recognizes the consensus branchpoint sequence YNCURAY and binds to the RNA through its KH domain (Berglund et al., 1998b; Peled-Zehavi et al., 2001). U2AF65 sits adjacent to BBP on the RNA, binding specifically to the polypyrimidine tract (Berglund et al., 1998a). This interaction stabilizes BBP's binding to the branchpoint (Peled-Zehavi et al., 2001). There is evidence that BBP also interacts with U1, although the mechanism for this interaction has yet to be determined (Berglund et al., 1997). U2AF35 interacts with U2AF65/BBP through protein-protein interactions, further stabilizing the U2AF65-polypyrimidine tract interaction (Peled-Zehavi et al., 2001). Both the 5' splice site and the 3' splice site are required for E complex formation (Jamison et al., 1992). While mutations to the branchpoint sequence, particularly to the branchpoint A residue, eliminate splicing, they do not prevent E complex assembly; however, transition to the A complex is affected (Champion-Arnaud et al., 1995; Reed and Maniatis, 1988). The E* complex contains all of the same components as the E complex, but here U2AF65 is phosphorylated (Champion-Arnaud et al., 1995). The A complex is formed when U2 binds to the branchpoint sequence and displaces BBP (Champion-Arnaud et al., 1995). By the time the B complex has formed, U1 and U2AF have fallen out of the spliceosome (Champion-Arnaud et al., 1995). In place of U1, U6 is recruited to the 5' splice site, and with it U5 and U4, which are associated through various protein-protein and RNA:RNA interactions. The U6/U4.U5 snRNP is involved in lariat formation, the covalent modification of the substrate pre-mRNA, within the final active spliceosome, termed the C complex (Alberts, 2002).

Cryptic Splicing Sites: Cis-splicing

A cryptic splice site is one that is revealed only when the natural site has been altered in such a way that it can no longer be used. Mutating either the 3' splice acceptor site AG or the branchpoint can lead to the activation of a cryptic splice acceptor site. Both of these elements, along with a polypyrimidine tract, are features of a splice acceptor site (Alberts, 2002). Altered splicing patterns result from mutating the splice acceptor site, while in the case of branchpoint mutations a new branchpoint residue will be selected, but the splicing pattern may or may not be affected. Mutating the 5' splice donor site more often than not leads to intron retention rather than activation of a cryptic donor site (Ley et al., 1982; Maniatis and Reed, 2002; Reed and Maniatis, 1988). The following section illustrates examples of the different types of mutations that can lead to activation of a cryptic site.

Mutations at the 5' splice site (donor)

Mutations of the GT donor site, whether naturally occurring or engineered for in vitro studies, prevent the use of the natural 5' splice site and lead to intron retention (Aebi et al., 1986; Ley et al., 1982; Maniatis and Reed, 2002). For example, a GT→AT mutation in β-thalassemia patients prevents the intron in question from being spliced out (Ley et al., 1982). A similar result was found when either a GT→AT or GT→GA mutation was made in vitro (Aebi et al., 1986). However, in vitro, a GT→GC mutation still produced the correct splicing product, albeit in reduced amounts (Aebi et al., 1986).

Mutations at the 3' splice site (acceptor)

Mutating either base of the AG dinucleotide adjacent to the splice acceptor site to any other base blocks splicing at that site but often activates a cryptic acceptor site (Aebi et al., 1986; Kuiper et al., 1988; Macklin et al., 1987; Reed and Maniatis, 1985; Reed and Maniatis, 1988). Often this cryptic site is simply the next AG downstream of the natural site, and it has been noted that cryptic cis-splice acceptor sites tend to cluster ~20 nt downstream of the natural acceptor site (Aebi et al., 1986; Kralovicova et al., 2005; Kuiper et al., 1988), but there are exceptions, and cryptic acceptors can also be located upstream of the natural acceptor site. Usually, the AG nearest the branchpoint is selected (Reed and Maniatis, 1985). These results indicate that AG dinucleotides are critical for defining cis-splicing acceptor sites but that almost any AG near a branchpoint can serve the purpose.
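The "next AG downstream" pattern described above can be sketched as a simple string search. This is a minimal illustration, not a splicing predictor: the function name `next_ag_downstream` and the example sequence are hypothetical, and real site choice also depends on the branchpoint and polypyrimidine tract.

```python
from typing import Optional

def next_ag_downstream(seq: str, natural_a_index: int) -> Optional[int]:
    """Return the 0-based index of the A of the first AG found downstream
    of the natural acceptor AG (whose A is at natural_a_index), or None
    if the sequence contains no further AG."""
    seq = seq.upper()
    # Start searching just past the natural (now mutated) dinucleotide.
    hit = seq.find("AG", natural_a_index + 2)
    return None if hit == -1 else hit

if __name__ == "__main__":
    # Hypothetical sequence in which the natural AG at index 2 has been
    # mutated to AA; the next AG lies a few nucleotides downstream.
    mutated = "CCAATTTTAGCC"
    print(next_ag_downstream(mutated, 2))
```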

Mutations at the branchpoint

The consensus branchpoint sequence in mammals is YNYTRAY (Reed and Maniatis, 1985), which is very close to the yeast branchpoint TACTAAC (Mount, 1983). The actual branchpoint residue is the A at the sixth position of these sequences. In yeast and vertebrates, if the branchpoint sequence is deleted, leaving in place the polypyrimidine tract and 3' splice site, a cryptic branchpoint is generally activated and the splice product remains unchanged (Padgett et al., 1985; Ruskin et al., 1985). The same is true of a targeted mutation of the branchpoint A residue (Reed and Maniatis, 1988; Ruskin et al., 1985). The cryptic branchpoints that are used do not usually match the consensus sequence, but all do have a purine just before the new branchpoint residue (Padgett et al., 1985). Cryptic branchpoints are less efficient than the natural branchpoints (Reed and Maniatis, 1985). These results suggest that, at least in cis-splicing, the process is not "driven" by the branchpoint consensus sequence, though this may facilitate it.
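As an illustration of how a degenerate consensus such as YNYTRAY can be matched against a sequence, the sketch below expands IUPAC ambiguity codes (Y = C or T, R = A or G, N = any base) into a regular expression. The function names and the choice to report the A at the sixth position of each match are assumptions made for this example only.

```python
import re

# Subset of the IUPAC nucleotide codes needed for the consensus used here.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "Y": "[CT]", "R": "[AG]", "N": "[ACGT]"}

def consensus_to_regex(consensus: str) -> str:
    """Expand an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in consensus.upper())

def find_branchpoints(seq: str, consensus: str = "YNYTRAY",
                      a_offset: int = 5) -> list[int]:
    """Return 0-based indices of the putative branchpoint A (the residue
    at a_offset within each consensus match). Overlapping matches are
    reported because a lookahead is used."""
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() + a_offset for m in pattern.finditer(seq.upper())]

if __name__ == "__main__":
    # The yeast branchpoint TACTAAC is itself an instance of YNYTRAY.
    print(find_branchpoints("ggTACTAACgg"))
```

Because cryptic branchpoints often do not match the consensus, an empty result from a scan like this does not imply that no branchpoint can be used.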

Cryptic Splice Sites: Trans-splicing

The few prior studies that have examined the effects of mutations at the AG directly adjacent to a trans-splice acceptor site have found results that mirror those observed in cis-splicing. Conrad and colleagues made an AG→AA mutation in a synthetic gene construct in C. elegans and found that SL trans-splicing to the natural site was abrogated. Instead, the SL sequence was added to the pre-mRNA molecule at a cryptic acceptor site 20 nt upstream (Conrad et al., 1993b). This alteration in acceptor site did not affect the phenotype, suggesting that trans-splicing to the new cryptic site was not detrimental to mRNA function (Conrad et al., 1993b). A similar experiment was performed in trypanosomes, although in this case an AG→TG mutation was used (Hummel et al., 2000). The cryptic trans-splice acceptor sites were located at either the next AG downstream or, less frequently, at the next AG upstream (Hummel et al., 2000).

The initial experimental steps in my project involved mutation of the trans-splice acceptor site in an ascidian gene, CiTnI; it was possible that such mutations might entirely eliminate trans-splicing. However, the aforementioned studies raise the possibility that cryptic acceptor sites might be activated upon mutation of the CiTnI trans-splice acceptor site, depending on the degree to which ascidian trans-splicing mechanistically resembles nematode or trypanosome trans-splicing, or cis-splicing in yeast/vertebrates.

Ciona intestinalis

Ciona intestinalis is an ascidian. Ascidians are one of three classes that make up the tunicate subphylum. Tunicates, along with cephalochordates and vertebrates, make up the chordate phylum. Ascidians are classified as chordates on the basis of the presence of a notochord and dorsal nerve cord in the larva (Satoh, 2003). Traditionally it was believed that the tunicates were the first group to branch off the chordate lineage, but recent data suggest that this is not the case and that tunicates are the sister group to vertebrates, with the cephalochordates being more distantly related (Delsuc et al., 2006).

Ascidians are marine organisms found in oceans around the world (Satoh, 2003). As adults, they are hermaphroditic, sessile, filter-feeding organisms; however, as larvae they are motile, using muscle cells in the tail to power their swimming (Satoh, 2003). The body plan of the larva is relatively simple, consisting of a trunk and a tail (Satoh and Jeffery, 1995). It is composed of on the order of 2600 cells and only a handful of tissue types, namely: epidermis, endoderm, nervous system, muscle, notochord and mesenchyme (Satoh and Jeffery, 1995). The chordate features of a notochord and dorsal nerve cord are present only in the larval stages of development. When an animal undergoes metamorphosis, the tail is resorbed and all of the adult tissues are formed from precursor cells in the trunk (Satoh and Jeffery, 1995). Ascidian development is rapid. For example, in C. intestinalis it takes only 16-18 h to go from fertilization to the tadpole larva (Satoh, 2003).

Ascidian Muscle

My experimental studies are based on a Ciona intestinalis gene encoding a muscle-specific protein, troponin I. There are three types of muscle in C. intestinalis: larval tail muscle, and adult cardiac and body wall muscles (Meedel, 1997). There are between 36 and 42 muscle cells in ascidian larval tails, depending on the species. Larval tail muscle cells are derived from two different lineages. The primary muscle lineage, believed to be the ancestral muscle lineage, comes from the B4.1 blastomeres. All ascidians have 28 primary muscle cells. Secondary muscle cells, variable in number, are derived from the A4.1 and b4.2 blastomeres, and require inductive signals from the notochord and/or endoderm (Meedel, 1997). Muscle cells are found along the notochord; primary lineage rostrally and secondary lineage caudally. Larval tail muscle is mononucleated striated muscle, of the troponin/tropomyosin-regulated variety (Meedel, 1997). Larval tail muscle is homologous to vertebrate skeletal muscle (Cleto et al., 2003).

Adult body wall muscle cells are large and multinucleated. There are no sarcomeres and the muscle is not striated despite the presence of a troponin/tropomyosin contractile regulatory system (Nevitt and Gilly, 1986). During metamorphosis, body-wall muscle cells are derived from the mesenchyme cells of the larva (Meedel, 1997). In molecular terms, adult body wall muscle is related to vertebrate skeletal muscle, but, in terms of morphology and function, it is not clearly homologous to any vertebrate muscle (Hastings, 1997; Nevitt and Gilly, 1986).

In ascidians, the heart is a very delicate structure composed of a single layer of spindle-shaped sarcomeric myoepithelial cells derived from the mesenchyme cells of the larva (Meedel, 1997).

C. intestinalis troponin I

Troponin is a protein of the sarcomeric muscle contractile apparatus. There are three troponin subunits: troponin I, troponin C and troponin T (Farah and Reinach, 1995). Troponin I is the inhibitory subunit responsible for the ability of tropomyosin to inhibit the actin-myosin interaction in the absence of Ca2+, troponin C is the Ca2+-binding subunit and troponin T binds to tropomyosin (Farah and Reinach, 1995). Unlike vertebrates, where there are three troponin I genes (fast skeletal, slow skeletal and cardiac), C. intestinalis has only one TnI gene, CiTnI (Cleto et al., 2003; Dehal et al., 2002; Hastings, 1997; MacLean et al., 1997). Troponin I diversity in C. intestinalis is generated through alternative splicing of two exons encoding near-N-terminal protein sequence to give two protein isoforms; these exons are excluded from the TnI mRNA produced in the adult body wall and embryo tail muscle cells, and are included in the mRNA expressed in the heart (MacLean et al., 1997). The single CiTnI gene is equally distantly related to all three vertebrate TnI genes (Chiba et al., 2003; Cleto et al., 2003; Hastings, 1997). Therefore, the presence of a single TnI gene in Ciona is believed to be a reflection of the ancestral condition in the chordates, with the expansion of the TnI family having occurred in the vertebrate lineage after the ascidian-vertebrate divergence through a series of two gene duplication events (Chiba et al., 2003; Hastings, 1997).

Polycistronic transcription units, or operons, have been identified in C. intestinalis (Satou et al., 2006); however, CiTnI is not among them. The Joint Genome Institute (JGI) Ciona intestinalis genomic database (http://genome.jgi-psf.org/Cioin2/Cioin2.home.html) indicates that there are no open reading frames within at least 5 kb upstream of CiTnI and no open reading frames within 1 kb downstream of the termination signal. This genomic DNA sequence evidence and the presence of a functional promoter immediately upstream of the CiTnI gene (Khare, 2005) indicate that CiTnI is a monocistronic gene. Version 1.0 of the JGI database contained the CiTnI gene, but for reasons that are not clear, the gene is absent from the most recent version, v2.0.

C. intestinalis genomic data

In 2002, a draft genome sequence was published for C. intestinalis (Dehal et al., 2002) and is available at the JGI website. C. intestinalis has approximately 16 000 protein-coding genes. This is about half the number of genes in vertebrates, but is similar to other invertebrates such as Drosophila melanogaster and C. elegans (C. elegans Genome Consortium, 1998; Celniker and Rubin, 2003). The Ciona/vertebrate gene number difference is consistent with the hypothesis that there have been large- and small-scale duplications in the evolution of the vertebrate genome (Gu et al., 2002; McLysaght et al., 2002). Of the genes in C. intestinalis, 60% have a protostome homologue and therefore represent ancient genes of the bilateria (Dehal et al., 2002). One fifth of the genes are believed to be specific to tunicates, ascidians, or C. intestinalis, having no homologues in fly, worm or humans (Dehal et al., 2002).

The availability of the genome sequence is an enormous advance for molecular biology and evolutionary studies in C. intestinalis (Kusakabe, 2005).

The Project and Rationale

The first case of SL trans-splicing in the chordates was discovered in the CiTnI gene (Vandenberghe et al., 2001). Since this initial discovery, it has been determined that SL trans-splicing is not a rare event in C. intestinalis; in fact, about half of all genes are trans-spliced (Satou et al., 2006). The role trans-splicing plays in many of these cases, including that of CiTnI, remains unclear. Ultimately, we would like to determine a function for SL trans-splicing of monocistronic genes such as CiTnI. This project marks the first step towards this goal, as it elucidates the behaviour of cryptic acceptor sites within the outron, leading to a better understanding of trans-splicing and what will be required to create a non-trans-spliced CiTnI mRNA molecule in order to experimentally assess the role of trans-splicing in CiTnI gene expression.

While no experimental studies on trans-splicing have been performed in ascidians, a similar study has been conducted in nematodes (Conrad et al., 1993b). In this study, mutating the natural trans-splice acceptor site activated cryptic sites within the outron. I was aware of two trans-splice acceptor sites in CiTnI: the natural acceptor site at -64 (Vandenberghe et al., 2001) and a cryptic acceptor site at -346 (Khare, 2005). The cryptic acceptor site was accidentally discovered in a promoter analysis study using a construct containing an upstream segment of the outron (this study established that the transcription start site of CiTnI is at -523; thus the outron extends from -523 to -64 (Khare, 2005)).

In the current project I initially planned to make mutant constructs that would express non-trans-spliced mRNAs containing the full outron. However, because my initial mutation studies revealed activation of cryptic trans-splice acceptor sites, it became important to characterize the distribution of cryptic trans-splice acceptor sites within the CiTnI outron, and this became the main focus of the project. Although it has not yet been possible to produce a non-trans-spliced CiTnI mRNA retaining the full outron, my studies provide information on how to approach this goal, and also provide novel insight into the mechanism and functional relevance of SL trans-splicing in Ciona.

2. METHODS

All nucleotide positions refer to the CiTnI sequence as found in NCBI GenBank (Accession AF237978) and are numbered with the A of the ATG translation start codon being position +1, and with the nucleotide immediately preceding being position -1.
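Because this numbering scheme has no position 0, converting between sequence-string indices and the positions used in this thesis needs a small adjustment. The following is a minimal sketch of that conversion; the function names and the `atg_index` parameter (the 0-based index of the A of the ATG in whatever sequence string is being used) are hypothetical helpers introduced for illustration.

```python
def to_position(index: int, atg_index: int) -> int:
    """Convert a 0-based string index into a position numbered so that the
    A of the ATG is +1 and the preceding nucleotide is -1 (no position 0)."""
    offset = index - atg_index
    return offset + 1 if offset >= 0 else offset

def to_index(position: int, atg_index: int) -> int:
    """Inverse of to_position; position 0 does not exist in this scheme."""
    if position == 0:
        raise ValueError("there is no position 0 in this numbering scheme")
    return atg_index + position - 1 if position > 0 else atg_index + position

if __name__ == "__main__":
    # Hypothetical example: the A of the ATG sits at string index 10.
    for idx in (8, 9, 10, 11):
        print(idx, to_position(idx, 10))
```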

General Cloning Procedure

A similar cloning procedure was used for all constructs. Deviations from this procedure are described in the section pertaining to that particular construct. Insert and vector, obtained as described below for each individual case, were ligated overnight at 6-10°C. Each 10 µL ligation reaction consisted of 0.1 U of T4 DNA ligase (MBI Fermentas) in 1X T4 ligase buffer [MBI Fermentas, 10X buffer composition: 400 mM Tris-HCl, 100 mM MgCl2, 100 mM DTT and 5 mM ATP, pH 7.8], the insert and vector DNA. Unless stated otherwise, the insert:vector molar ratio was 3:1 and the total amount of DNA was 50 ng.
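The split of the 50 ng total between insert and vector follows from the molar ratio and the fragment lengths, since the mass of a double-stranded fragment is proportional to its molar amount times its length. The sketch below works through that arithmetic; the function name is hypothetical, an equal average mass per base pair is assumed for both fragments, and the 900 bp and 6774 bp lengths are used purely as example inputs.

```python
def ligation_masses(insert_bp: int, vector_bp: int,
                    molar_ratio: float = 3.0, total_ng: float = 50.0):
    """Return (insert_ng, vector_ng) giving the requested insert:vector
    molar ratio at the requested total DNA mass.

    Assumes mass is proportional to (moles x length in bp) for both
    fragments, i.e. the same average mass per base pair."""
    insert_weight = molar_ratio * insert_bp   # relative mass of insert
    vector_weight = 1.0 * vector_bp           # relative mass of vector
    scale = total_ng / (insert_weight + vector_weight)
    return insert_weight * scale, vector_weight * scale

if __name__ == "__main__":
    insert_ng, vector_ng = ligation_masses(900, 6774)
    print(f"insert: {insert_ng:.1f} ng, vector: {vector_ng:.1f} ng")
```

In practice the amounts would also be limited by the concentrations of the recovered fragments, which this sketch does not model.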

Ligated DNA was transformed into bacteria by one of two methods. Heat-shock into competent DH5α cells (Invitrogen) was always tried first, but if that did not yield enough colonies, then electrotransformation of XL1-blue Tet+ cells was used. Electrotransformation-competent XL1-blue Tet+ cells were made as follows. A colony from a streaked glycerol stock, grown on an LB [1% Biotryptone (Bioshop), 0.5% yeast extract (Difco), 0.5% NaCl (EMD)], 1.5% agar (Bioshop) plate containing tetracycline (10 µg/mL), was used to inoculate a 5 mL overnight culture in LB containing 10 µg/mL tetracycline. Two millilitres of the overnight culture were used to inoculate a 500 mL culture, which was grown to an OD600 of 0.5-0.8. The cells were centrifuged (5000 rpm, 15 min, 4°C in a GS3 rotor) and the cell pellet was resuspended in 2 L of ice-cold water. The centrifugation step was repeated three more times, with resuspension in 1 L of water, 40 mL ice-cold 10% glycerol and 6 mL 10% glycerol, respectively. The cells were divided into aliquots and stored at -70°C until used.

For heat-shock transformation, 5 µL of ligation reaction was mixed with 50 µL of sub-cloning efficiency DH5α cells (Invitrogen). The cells and DNA were incubated in an ice/water bath for 30 min, then in a 37°C water bath for 20 s, followed by a two-minute incubation in the ice/water bath. Pre-warmed LB, 1 mL, was added and, following incubation for 1 h in a 37°C shaking incubator, the cells were plated on LB (1.5% agar)+Amp+X-gal+IPTG plates [25 µg/mL ampicillin (ICN), 25 µg/mL X-gal (Bioshop) in dimethylformamide (from a 1000X stock) (Sigma), 23 µg/mL IPTG (Bioshop)] and grown overnight at 37°C. The next day, blue or white colonies (as appropriate) were selected and re-streaked onto fresh LB+Amp+X-gal+IPTG plates. Individual colonies from these plates were used to inoculate 5 mL liquid LB+Amp (25 µg/mL ampicillin) overnight cultures, which were then used to make glycerol stocks (20% glycerol) and for plasmid DNA minipreps. QIAgen Plasmid miniprep or Sigma GenElute miniprep kits were used according to the manufacturer's instructions.

For electrotransformation, ligation reaction products were ethanol precipitated and resuspended in 10 µL of water. Half of the resuspended DNA was mixed in an electroporation cuvette (on ice) with 40 µL of XL1-blue Tet+ cells. After a 1 min incubation on ice, the cuvettes were placed in the GenePulser (BioRad) and shocked using the following settings: controller at 200 Ω, pulser at 25 µF and 1.7 kV. Cells were diluted in 1 mL SOC medium [2% Biotryptone, 0.5% yeast extract, 10 mM NaCl, 2.5 mM KCl (Fisher), 10 mM MgCl2 (Fisher), 10 mM MgSO4 (Aldrich), 20 mM glucose (Sigma)] and placed in a 37°C shaking incubator for 1 h before plating on LB+Amp+Tet+IPTG+X-gal [25 µg/mL ampicillin, 10 µg/mL tetracycline, 25 µg/mL X-gal in dimethylformamide and 23 µg/mL IPTG] agar plates and incubated in a 37°C incubator overnight. The next day, blue or white colonies (as appropriate) were re-streaked onto fresh LB+Amp+Tet+IPTG+X-gal agar plates. Single colonies from these new plates were used to inoculate 5 mL liquid LB+Amp+Tet (25 µg/mL ampicillin and 10 µg/mL tetracycline) cultures that were used to make glycerol stocks and for plasmid minipreps as described above for the heat-shock procedure.

Standard PCR and Agarose Gel Electrophoresis

The standard 50 µL PCR reaction used in all experiments consisted of 1X Pfu buffer [MBI Fermentas; 10X buffer: 200 mM Tris-HCl pH 8.8, 100 mM (NH4)2SO4, 100 mM KCl, 1% Triton X-100, 1 mg/mL bovine serum albumin (BSA), 20 mM MgSO4], 1 µL 0.2 mM dNTPs (Gibco), 200 ng of each primer (BioCorp) and 1.25 U Pfu DNA polymerase (MBI Fermentas). Amounts of template varied as described below. Appendix 1 contains the thermal cycling protocols used for each PCR reaction.

PCR products were separated by electrophoresis in agarose gels in 1X TAE [Tris acetate EDTA buffer: 40 mM Tris base (Bioshop), 20 mM sodium acetate (EM Science), 2 mM EDTA (Bioshop)]. The percentage of agarose (Bioshop) used in the gel depended on the size of the PCR products to be separated but was usually in the range of 1%-1.5%. Ethidium bromide (Sigma) was added to the molten agarose to a final concentration of 0.5 µg/mL before pouring the gel. A one-tenth volume of loading buffer [50% glycerol, 0.1 M EDTA, bromophenol blue] was added to the samples prior to loading them onto the gel. Electrophoresis was at 10 V/cm, and products were visualized by ultraviolet transillumination fluorescence.

Mutation of Natural Trans-splice Acceptor Site

Two CiTnI constructs were made in which the natural trans-splice acceptor site was mutated (Fig. 3). In the first, CiTnInuclacZ(-1.5){A(-66)C}, the A at position -66 was changed to a C, and in the second, CiTnInuclacZ(-1.5){A(-66)T}, it was changed to a T. The starting material for the production of both mutant constructs was CiTnInuclacZ(-1.5) (Cleto, 2002), which contains 1.5 kb of upstream CiTnI DNA, from position -24, driving an otherwise promoterless nuclear lacZ gene, all in Bluescript II SK+ (Cleto et al., 2003; Vandenberghe et al., 2001).

I used PCR SOEing (Splicing by Overlap Extension) to generate the point mutations. PCR SOEing is a method for introducing mutations based on the production of two overlapping PCR products bearing the mutation, introduced by the primers, in the overlapping region. These products have complementary ends and so can anneal in the overlapping region and be used as template for extension (and then PCR) by the polymerase (Horton, 1995). Standard PCR reactions were performed in a Perkin Elmer Thermal Cycler (see Appendix 1). For the left half of CiTnInuclacZ(-1.5){A(-66)C}, 800 ng of the template, CiTnInuclacZ(-1.5), was used along with primers 9134 and 9135 (Table 1). For the reaction to produce the right half, 0.8 ng of the template was used with primers 9132 and 9133. The PCR products from these reactions were recovered following electrophoresis in a 1% agarose gel and extracted using the Sephaglas BandPrep Kit (Amersham). For the SOEing reaction, 62 ng of each product was placed in a PCR tube along with 46 µL 0.2 mM dNTP, 1X Pfu buffer containing 1.25 U Pfu DNA polymerase. The tube was heated at 94°C for 4 min, placed on ice for 5 min and then incubated at 72°C for 20 min. Primers 9133 and 9134 (200 ng each) were added, bringing the final reaction volume to 50 µL, and the reaction was subjected to thermocycling (see Appendix 1). The PCR SOEing product obtained was subjected to electrophoresis in a 1% agarose gel in 1X TAE, and the appropriate band, 891 bp, was extracted using the Sephaglas BandPrep Kit (Amersham). This PCR product, called A(-66)C, was cut with a ten-fold excess of Eco81I and Bpu1102I (MBI Fermentas) in Y+ buffer [MBI Fermentas, 10X buffer composition: 33 mM Tris-acetate pH 7.9, 10 mM Mg acetate, 66 mM potassium acetate, 0.1 mg/mL BSA] for 2 h at 37°C to generate sticky ends. The digest was subjected to electrophoresis in a 1% agarose gel in 1X TAE and the appropriate fragment, 714 bp, termed "A(-66)C insert", was extracted and cloned as described next.
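The joining logic behind PCR SOEing, two products sharing an overlapping region that carries the mutation, can be illustrated at the level of sequence strings. This is a conceptual sketch only: it treats each product as a single top-strand string, the function name `soe_join` and the example sequences are hypothetical, and it does not model annealing, extension or primer design.

```python
def soe_join(left: str, right: str, min_overlap: int = 10) -> str:
    """Join two products whose ends overlap: the 3' end of `left` must be
    identical to the 5' end of `right`. The longest overlap of at least
    min_overlap nucleotides is used."""
    left, right = left.upper(), right.upper()
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            # Keep the overlap once, then append the rest of the right half.
            return left + right[k:]
    raise ValueError("no overlap of the required length was found")

if __name__ == "__main__":
    # Hypothetical halves sharing a 12-nt overlap that would carry the
    # primer-introduced mutation in a real experiment.
    left_half = "AAAACCCCGGGGTTTTACGT"
    right_half = "GGGGTTTTACGTCCAA"
    print(soe_join(left_half, right_half))
```

The joined string has the length of the two halves minus the overlap, which is the same bookkeeping used to predict the size of the full-length SOEing product on a gel.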

The "A(-66)C insert" was cloned in a three-way ligation with the 4427 bp and 2789 bp fragments that were generated when CiTnInuclacZ(-1.5) was digested under conditions described above with Eco81I and Bpu1102I and recovered using the QIAquick Gel Extraction Kit (Qiagen) following 1% agarose gel electrophoresis. The 2789 bp fragment was treated with Shrimp Alkaline Phosphatase (SAP) prior to ligation to minimize vector recircularization. For SAP treatment, 200 ng of the 2789 bp fragment were incubated with 0.5 U SAP (MBI Fermentas) in 1X SAP buffer (MBI Fermentas) for 30 min at 37°C. The SAP was heat inactivated with a 15 min incubation at 65°C. The three-way ligation reaction contained 40 ng of SAP-treated 2789 bp fragment, 45 ng of the 4427 bp fragment, and 40 ng of "A(-66)C insert", and following electrotransformation blue colonies were selected. The QIAPrep Spin Miniprep Kit (Qiagen) was used to extract the plasmid DNA, which was verified by restriction digest analysis and sequencing. All sequencing was done by the Sheldon Biotechnology Centre at McGill University. Confirmed plasmids were produced in large quantity using the Qiagen Plasmid Maxi Kit. Two individual Maxipreps were done for each construct and the identity of the resulting DNA was reconfirmed by restriction digest analysis.

CiTnInuclacZ(-1.5){A(-66)T} was generated using this same procedure, except the primers used in the initial PCR SOEing reactions were 9137 and 9134 (left half) and 9136 and 9133 (right half).

Deletion between Natural Trans-splice Acceptor Site and Upstream Cryptic Site

This construct, CiTnInuclacZ(-1.5){A(-66)C}Δ-320,-75, has a deletion from position -324 to -74 of CiTnI in the context of CiTnInuclacZ(-1.5){A(-66)C} (Fig. 3). The deletion was generated by creating two PCR products, one covering a 578 bp region upstream of the deletion and one covering a 530 bp region downstream of the deletion. The two halves were joined together at a HindIII site that had been integrated into the original PCR primers. For PCR of the upstream piece, four identical standard PCR reactions were made with 0.2 ng of CiTnInuclacZ(-1.5){A(-66)C} as template and primers 9158 and 9159 (200 ng of each). For cycling conditions, please refer to Appendix 1. The downstream half was made from four standard PCR reactions with 0.2 ng CiTnInuclacZ(-1.5){A(-66)C} as template and the oligonucleotides 9160 and 9161 (200 ng each) as primers. Appropriate products (upstream half: 578 bp, downstream half: 530 bp) were recovered from separately pooled reactions following electrophoresis in a 1% agarose gel in 1X TAE using the QIAquick gel extraction kit (Qiagen) and were cut with a 10-fold excess of HindIII (MBI Fermentas) in 1X R+ restriction buffer (MBI Fermentas, 10X buffer composition: 10 mM Tris-HCl pH 8.5, 10 mM MgCl2, 100 mM KCl, 0.1 mg/mL BSA). The digest (37°C for one hour) was inactivated by heating at 65°C for 20 min and products were recovered following electrophoresis in a 1% agarose gel with the QIAquick extraction kit.

The upstream and downstream halves were ligated across the HindIII joint by incubating the extracted products overnight at 6-10°C with 0.1 U of T4 DNA ligase in 1X T4 ligase buffer. The 1096 bp ligation product, termed "Δinsert", was recovered following electrophoresis in a 1% agarose gel using the QIAquick extraction kit.

To prepare for cloning, Δinsert was cut in sequential digests (1.5 h, 37°C) with a 10-fold excess of PstI (Pharmacia Biotechnology) in R+ buffer and Eco81I in Y+ buffer (MBI Fermentas). The vector, CiTnInuclacZ(-1.5){A(-66)C}, was also digested with these two enzymes. Both the cut vector (6774 bp) and cut Δinsert (900 bp) were recovered following electrophoresis in a 1% agarose gel in 1X TAE using the QIAquick gel extraction kit. Standard ligation and electrotransformation protocols were used, blue colonies were selected and plasmid DNA was purified with the QIAPrep Spin Miniprep Kit. Verification of clones was done by restriction digestion and sequencing as described above. Duplicate QIAPrep MaxiPreps were made in order to have two distinct DNA preparations.

Outron 3' end truncation constructs

There are three constructs in this series: CiTnI(-838/-79)nlacZ, CiTnI(-838/-177)nlacZ and CiTnI(-838/-262)nlacZ, each bearing a larger 3' truncation of the outron (Fig. 3). These DNA molecules contain a fragment of the CiTnI gene upstream DNA, whose end-coordinates are given in the construct name, driving the promoterless nuclear lacZ, in Bluescript II SK+ vector.

This series of constructs was created using a PCR-based method to generate intermediate constructs, followed by insertion of a nuclear lacZ gene cassette. To create intermediate constructs, 0.8 ng of CiTnInuclacZ(-1.5) was used as template in a standard PCR reaction with either primers 9066 and 9187 (to amplify the CiTnI DNA region from -844 to -73), primers 9066 and 9188 (to amplify the CiTnI DNA region from -844 to -171), or primers 9066 and 9189 (to amplify the region from -844 to -254). See Appendix 1 for PCR cycling conditions. The PCR products were recovered following electrophoresis on a 1% agarose gel using the Sephaglas BandPrep Kit (Amersham).

PCR products were prepared for cloning by sequential digestion with SmaI (1X OPA buffer (Pharmacia), 1.5 h at room temperature) and KpnI (1X OPA buffer, 2 h, 37°C). Digested insert and vector (SmaI/KpnI fragment of pBluescript II SK+) were recovered following electrophoresis in a 1% agarose gel using the Sephaglas BandPrep Kit. For each deletion construct, a standard ligation was performed using a 3:1 insert:vector molar ratio. Ligation reactions were electrotransformed into XL1-Blue Tet+ cells. White colonies were selected and the plasmids isolated with the Sigma GenElute Miniprep Kit. These intermediate constructs, CiTnI(-838/-79), CiTnI(-838/-177) and CiTnI(-838/-262), were screened by KpnI/SmaI digests, and clones with the appropriate band pattern were sequenced (Sheldon Biotechnology Centre) to confirm the CiTnI DNA that was generated by PCR and any ligation junctions.
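The 3:1 insert:vector molar ratio translates into DNA masses through the fragment lengths. A minimal sketch of that arithmetic follows. The 50 ng vector amount is a hypothetical value, the 772 bp insert length is simply the span of the -844 to -73 amplicon before digestion, and 2961 bp is the size of uncut pBluescript II SK(+), so the digested fragments will be slightly shorter than assumed here.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, molar_ratio=3.0):
    """Mass of insert (ng) that gives `molar_ratio` insert:vector molecules.

    For double-stranded DNA, mass scales with length, so the number of
    molecules is proportional to mass / length.
    """
    return vector_ng * molar_ratio * insert_bp / vector_bp

vector_ng = 50.0                 # hypothetical amount of cut vector
vector_bp = 2961                 # uncut pBluescript II SK(+); an approximation
insert_bp = -73 - (-844) + 1     # span of the -844 to -73 amplicon

required_ng = insert_mass_ng(vector_ng, vector_bp, insert_bp)
print(insert_bp, round(required_ng, 1))
```

Because the insert is roughly a quarter of the vector length, a 3:1 molar ratio corresponds to less insert than vector by mass.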

The nuclear lacZ fragment to be introduced into the intermediate constructs, nuclacZ, was recovered as a 3.5 kb SmaI/BglII fragment of pSp72-1.27 following agarose gel electrophoresis, using the Sephaglas BandPrep Kit. The recovered lacZ fragment was ligated into the SmaI/BamHI linear fragment from the double-cut intermediate construct.

The ligation reactions were transformed by heat-shock into DH5α cells, blue colonies were selected, and the plasmids recovered by Sigma GenElute Miniprep. These constructs, CiTnI(-838/-79)nlacZ, CiTnI(-838/-177)nlacZ and CiTnI(-838/-262)nlacZ, were screened by KpnI digests. Constructs exhibiting the correct digest pattern were sequenced across the ligation junctions by Sheldon Biotechnology Centre and two QIAGEN Maxi Plasmid Preps were prepared for each.

[Figure 3 schematic. Constructs shown: CiTnInuclacZ(-1.5); CiTnInuclacZ(-1.5){A(-66)C}; CiTnInuclacZ(-1.5){A(-66)T}; CiTnInuclacZ(-1.5){A(-66)C}Δ-319,-77; CiTnI(-838/-79)nlacZ; CiTnI(-838/-177)nlacZ; CiTnI(-838/-262)nlacZ]

Figure 3: Schematic drawing of constructs.

CiTnInuclacZ(-1.5)

-1432 cagcgaagaacctttaactcataacctctgggttagaggcat -1390 gcgcgctaacaattgtgtcacggcgcgggatcaaatcatgctattaatactgaagcgctt -1330 aaagcgaggatatgtaactcgccatggtccagattatccattcactttgcgtatcatctg -1270 aaaccaaattttttggcaagcctgtaatttggtgtaatttatttaacattattaacaccg -1210 cacaatataacacattcatacaaattttggtttatagtgcccttatccttgaacttcgta -1150 acaagtaattctgtaaaccaagttcacatatatttttgcgggtcagtgctgctacagcaa -1090 atcatacgaaataaaaataaataaatcaccgcactaaatattcagcataactttactttt -1030 cgcggggagcagcagcaggccgaagtttttgataatatttcgatttgttttacattcttt -970 agaaaacgtctctataagttgataaaagtgcatatattgttataataattatgaacgtgg -910 aaaactctttagagcaactggagtggcatggcatcgtaataacagcgtctgctgcttgtc -850 aactcgacagcggggtaggtgcttgtgacgtcatactgcagctgttatcgcctgagcagc -790 agtgttccccgtgattttgtgctttgaatcaaatttaaatattacaaatataaaaaatat -730 attctgttatacttaatcaaagtaaagtattggtataagcatattttcgttaatttagtc -670 caaatcaactatagtttaagcagtttctagacgtttacatcagtacggcggcaacact cg -610 ttctatattaaatccagacaaccagaacttcttcttcgtttagtccctcgatgttctatt -550 taagggagcttacagcatccatccaacAtcggtaatcgagaatctgttaaccttgtttag -490 tcacagcgaaaggttcatgcatttaagaaccccttggtgaatttaaagaattttcttgta -430 gaaagggacgcgtagtggcgatagttactatggttaactattagcttagctgagtttatc -370 atgattactatttttattatttagatttctgaagggtttgccaataaaagcatgttaaat -310 gttaagcatatctatttaaatataagttattcttctgagggttgtgatagtgataactca -250 tcaggatttataactatagcgcaatactgaacttatcaagctgataagactaccagtagc -190 tgcgtaaaacctttaggacaaatctgaagaggaactaaatacaaaccgtataaactgaat -130 attgtaaagtgataattttttatcgtttttttttcttgtggtttactaatgcagtttcat -70 cta~aaccggtatcaattggttaagtaggtcttgtagtttggct -

CiTnInuclacZ(-1.5){A(-66)C}

-1432 cagcgaagaacctttaactcataacctctgggttagaggcat -1390 gcgcgctaacaattgtgt cacggcgcgggatcaaatcatgctattaatactgaagcgctt -1330 aaagcgaggatatgtaactcgccatggtccagattatccattcactttgcgtatcatctg -1270 aaaccaaattttttggcaagcctgtaatttggtgtaatttatttaacattattaacaccg -1210 cacaatataacacattcatacaaattttggtttatagtgcccttatccttgaacttcgta -1150 acaagtaattctgtaaaccaagttcacatatatttttgcgggtcagtgctgctacagcaa -1090 atcatacgaaataaaaataaataaatcaccgcactaaatattcagcataactttactttt -1030 cgcggggagcagcagcaggccgaagtttttgataatatttcgatttgttttacattcttt -970 agaaaacgtctctataagttgataaaagtgcatatattgttataataattatgaacgtgg -910 aaaactctttagagcaactggagtggcatggcatcgtaataacagcgtctgctgcttgtc -850 aactcgacagcggggtaggtgcttgtgacgtcatactgcagctgttatcgcctgagcagc -790 agtgttccccgtgattttgtgctttgaatcaaatttaaatattacaaatataaaaaatat -730 attctgttatacttaatcaaagtaaagtattggtataagcatattttcgttaatttagtc -670 caaatcaactatagtttaagcagtttctagacgtttacatcagtacggcggcaacact cg -610 ttctatattaaatccagacaaccagaacttcttcttcgtttagtccctcgatgttctatt -550 taagggagcttacagcatccatccaacAtcggtaatcgagaatctgttaaccttgtttag -490 tcacagcgaaaggttcatgcatttaagaaccccttggtgaatttaaagaattttcttgta -430 gaaagggacgcgtagtggcgatagttactatggttaactattagcttagctgagtttatc -370 atgattactatttttattatttagatttctgaagggtttgccaataaaagcatgttaaat -310 gttaagcatatctatttaaatataagttattcttctgagggttgtgatagtgataactca -250 tcaggatttataactatagcgcaatactgaacttatcaagctgataagactaccagtagc -190 tgcgtaaaacctttaggacaaatctgaagaggaactaaatacaaaccgtataaactgaat -130 attgtaaagtgataattttttatcgtttttttttcttgtggtttactaatgcagtttcat -70 cta~aaccggtatcaattggttaagtaggtcttgtagttt~

CiTnInuclacZ(-1.5){A(-66)T}

-1432 cagcgaagaacctttaactcataacctctgggttagaggcat -1390 gcgcgctaacaattgtgtcacggcgcgggatcaaatcatgctattaatactgaagcgctt -1330 aaagcgaggatatgtaactcgccatggtccagattatccattcactttgcgtatcatctg -1270 aaaccaaattttttggcaagcctgtaatttggtgtaatttatttaacattattaacaccg -1210 cacaatataacacattcatacaaattttggtttatagtgcccttatccttgaacttcgta -1150 acaagtaattctgtaaaccaagttcacatatatttttgcgggtcagtgctgctacagcaa -1090 atcatacgaaataaaaataaataaatcaccgcactaaatattcagcataactttactttt -1030 cgcggggagcagcagcaggccgaagtttttgataatatttcgatttgttttacattcttt -970 agaaaacgtctctataagttgataaaagtgcatatattgttataataattatgaacgtgg -910 aaaactctttagagcaactggagtggcatggcatcgtaataacagcgtctgctgcttgtc -850 aactcgacagcggggtaggtgcttgtgacgtcatactgcagctgttatcgcctgagcagc -790 agtgttccccgtgattttgtgctttgaatcaaatttaaatattacaaatataaaaaatat -730 attctgttatacttaatcaaagtaaagtattggtataagcatattttcgttaatttagtc -670 caaatcaactatagtttaagcagtttctagacgtttacatcagtacggcggcaacactcg -610 ttctatattaaatccagacaaccagaacttcttcttcgtttagtccctcgatgttctatt -550 taagggagcttacagcatccatccaacAtcggtaatcgagaatctgttaaccttgtttag -490 tcacagcgaaaggttcatgcatttaagaaccccttggtgaatttaaagaattttcttgta -430 gaaagggacgcgtagtggcgatagttactatggttaactattagcttagctgagtttatc -370 atgattactatttttattatttagatttctgaagggtttgccaataaaagcatgttaaat -310 gttaagcatatctatttaaatataagttattcttctgagggttgtgatagtgataactca -250 tcaggatttataactatagcgcaatactgaacttatcaagctgataagactaccagtagc -190 tgcgtaaaacctttaggacaaatctgaagaggaactaaatacaaaccgtataaactgaat -130 attg~aagtgataattttttatcgtttttttttcttgtggtttactaatgcagtttcat -70 cta~aaccggtatcaattggttaagtaggtcttgtagtttggct

CiTnInuclacZ(-1.5){A(-66)C}Δ-319,-77

-1432 cagcgaagaacctttaactcataacctctgggttagaggcat -1390 gcgcgctaacaattgtgtcacggcgcgggatcaaatcatgctattaatactgaagcgctt -1330 aaagcgaggatatgtaactcgccatggtccagattatccattcactttgcgtatcatctg -1270 aaaccaaattttttggcaagcctgtaatttggtgtaatttatttaacattattaacaccg -1210 cacaatataacacattcatacaaattttggtttatagtgcccttatccttgaacttcgta -1150 acaagtaattctgtaaaccaagttcacatatatttttgcgggtcagtgctgctacagcaa -1090 atcatacgaaataaaaataaataaatcaccgcactaaatattcagcataactttactttt -1030 cgcggggagcagcagcaggccgaagtttttgataatatttcgatttgttttacattcttt -970 agaaaacgtctctataagttgataaaagtgcatatattgttataataattatgaacgtgg -910 aaaactctttagagcaactggagtggcatggcatcgtaataacagcgtctgctgcttgtc -850 aactcgacagcggggtaggtgcttgtgacgtcatactgcagctgttatcgcctgagcagc -790 agtgttccccgtgattttgtgctttgaatcaaatttaaatattacaaatataaaaaatat -730 attctgttatacttaatcaaagtaaagtattggtataagcatattttcgttaatttagtc -670 caaatcaactatagtttaagcagtttctagacgtttacatcagtacggcggcaacactcg -610 ttctatattaaatccagacaaccagaacttcttcttcgtttagtccctcgatgttctatt -550 taagggagcttacagcatccatccaacAtcggtaatcgagaatctgttaaccttgtttag -490 tcacagcgaaaggttcatgcatttaagaaccccttggtgaatttaaagaattttcttgta -430 gaaagggacgcgtagtggcgatagttactatggttaactattagcttagctgagtttatc -370 atgattactatttttattatttagatttctgaagggtttgccaataaaagc -310 -250 -190 -130 .tttcat -70 cta~aaccggtatcaattggttaagtaggtcttgtagtttggct

CiTnI(-838/-79)nlacZ

-838 gtaggtgcttgtgacgtcatactgcagctgttatcgcctgagcagc -790 agtgttccccgtgattttgtgctttgaatcaaatttaaatattacaaatataaaaaatat -730 attctgttatacttaatcaaagtaaagtattggtataagcatattttcgttaatttagtc -670 caaatcaactatagtttaagcagtttctagacgtttacatcagtacggcggcaacactcg -610 ttctatattaaatccagacaaccagaacttcttcttcgtttagtccctcgatgttctatt -550 taagggagcttacagcatccatccaacAtcggtaatcgagaatctgttaaccttgtttag -490 tcacagcgaaaggttcatgcatttaagaaccccttggtgaatttaaagaattttcttgta -430 gaaagggacgcgtagtggcgatagttactatggttaactattagcttagctgagtttatc -370 atgattactatttttattatttagatttctgaagggtttgccaataaaagcatgttaaat -310 gttaagcatatctatttaaatataagttattcttctgagggttgtgatagtgataactca -250 tcaggatttataactatagcgcaatactgaacttatcaagctgataagactaccagtagc -190 tgcgtaaaacctttaggacaaatctgaagaggaactaaatacaaaccgtataaactgaat -130 attgtaaagtgataattttttatcgtttttttttcttgtggtttactaaJégc

CiTnI(-838/-177)nlacZ

-838 gtaggtgcttgtgacgtcatactgcagctgttatcgcctgagcagc -790 agtgttccccgtgattttgtgctttgaatcaaatttaaatattacaaatataaaaaatat -730 attctgttatacttaatcaaagtaaagtattggtataagcatattttcgttaatttagtc -670 caaatcaactatagtttaagcagtttctagacgtttacatcagtacggcggcaacactcg -610 ttctatattaaatccagacaaccagaacttcttcttcgtttagtccctcgatgttctatt -550 taagggagcttacagcatccatccaacAtcggtaatcgagaatctgttaaccttgtttag -490 tcacagcgaaaggttcatgcatttaagaaccccttggtgaatttaaagaattttcttgta -430 gaaagggacgcgtagtggcgatagttactatggttaactattagcttagctgagtttatc -370 atgattactatttttattatttagatttctgaagggtttgccaataaaagcatgttaaat -310 gttaagcatatctatttaaatataagttattcttctgagggttgtgatagtgataactca -250 tcaggatttataactatagcgcaatactgaacttatcaagctgataagactaccagtagc -190 tgcgtaaaacctt

CiTnI(-838/-262)nlacZ

-838 gtaggtgcttgtgacgtcatactgcagctgttatcgcctgagcagc -790 agtgttccccgtgattttgtgctttgaatcaaatttaaatattacaaatataaaaaatat -730 attctgttatacttaatcaaagtaaagtattggtataagcatattttcgttaatttagtc -670 caaatcaactatagtttaagcagtttctagacgtttacatcagtacggcggcaacactcg -610 ttctatattaaatccagacaaccagaacttcttcttcgtttagtccctcgatgttctatt -550 taagggagcttacagcatccatccaacAtcggtaatcgagaatctgttaaccttgtttag -490 tcacagcgaaaggttcatgcatttaagaaccccttggtgaatttaaagaattttcttgta -430 gaaagggacgcgtagtggcgatagttactatggttaactattagcttagctgagtttatc -370 atgattactatttttattatttagatttctgaagggtttgccaataaaagcatgttaaat -310 gttaagcatatctatttaaatataagttattcttctgagggttgtgata

Figure 4: CiTnI DNA in constructs. The exact CiTnI DNA sequence (based on GenBank sequence AF237978) present in each construct is shown above. The transcription start site is shown as an upper case, bold letter. The AG dinucleotide adjacent to the natural trans-splice acceptor is circled. In cases where a mutation to this dinucleotide has been made, the mutated base is underlined. The putative natural branchpoint sequence is underlined. Light grey bases in italics represent bases deleted in internal deletions.

Table 1: PCR primers.

Primer number    Sequence (5' to 3')
9015             GGATCCGATTCTATTTGAATAAG
9055             AGTAACAACCCGTCGGATTCTCC
9066             ATTGGTACCGTAGGTGCTTGTGAC
9103             TGATGACAAGTGACGCTCCACrGrGrG
9110             CATCGAGCACTGTCATCTCCGAATTCCATTTTTTTTTTTTTTTT
9111             GCTCAAGTCGTTGGCTCAGCTCAG
9112             TGATGACAAGTGACGCTCCGACG
9132             TGCAGTTTCATCTACCGCAACCG
9133             TGAGGGGACGACGACAGTATC
9134             TCCAACATCGGTAATCGGAGAATCTG
9135             CGGTTGCGGTAGATGAAACTGCA
9136             TGCAGTTTCATCTACTGCAACCG
9137             CGGTTGCAGTAGATGAAACTGCA
9158             TAACAGCGTCTGCTGCTTG
9159             CATTTAACAAGCTTTTATTGGCA
9160             CTAATGAAGCTTCATCTACCGCA
9161             GATTCTCCGTGGGAACAAAC
9172             CGCTGATTTGTGTAGTC
9173             TCACTCCAACGCAGCACCATCA
9174             ATCGCACTCCAGCCCAGCTTTCC
9177             AGGCGATTAAGTTGGGTAAC
9185             (PO4)TGATGACAAGTGACGCTCCGACG
9186             (PO4)GTTTTCCCAGTCACGACGTT
9187             GATCCCGGGGCATTAGTAAACCAC
9188             TTTCCCGGGAAGGTTTTACGCAG
9189             AGTCCCGGGTATCACAACCT
9190             GTTTTCCCAGCTGCGACGTT

Collection of Animals

The Ciona intestinalis animals used in the experiments were collected from Point Judith Harbor, Snug Harbor, Rhode Island, USA, or from Sandwich Bay Marina, Cape Cod Canal, Massachusetts, USA. They were maintained in Dr. Tom Meedel's laboratory at Rhode Island College, Providence, RI, in a seawater aquarium at 14°C under constant light conditions to induce gamete accumulation.

Electroporation of DNA constructs into Ciona embryos

I performed this work in the laboratory of our colleague Dr. Tom Meedel, a faculty member in the Biology Department at Rhode Island College. The plasmid DNA was transported to Rhode Island as pelleted ethanol precipitates and resuspended upon arrival to 4.25 µg/µL in water. RNA samples were brought back to Montreal as damp ethanol precipitates, and were resuspended in 20 µL of water upon return. Stained Ciona embryos were brought to Montreal suspended in PBT [0.1% Tween-80 (Sigma) in 1X PBS (Sigma)] containing 10 mM sodium azide (Sigma).

For each set of zygote electroporations, gametes were collected from 3 or more adult Ciona intestinalis individuals, the eggs were rinsed with Millipore-filtered sea water, and a small amount of sperm from two or more animals was mixed with the eggs in Millipore-filtered sea water. This provided enough fertilized eggs for 2-4 electroporations. Fertilization was allowed to proceed for 5 min, at which point the fertilized eggs were washed twice with a thioglycolate solution (0.01 g/mL, pH 10; Sigma), collected by filtration, and placed on a 1% agarose (in sea water) plate covered in 1 mL thioglycolate solution for removal of the chorion, which is an impediment to transformation. For dechorionation, 25 µL of a 0.05% Protease E (Sigma) solution was added to the fertilized eggs. Dechorionation was allowed to proceed at room temperature until 10-30% of the chorions had been removed, as monitored by observation in a dissecting microscope, at which point cold Millipore-filtered (MPF) seawater (approximately 2 mL) was added to the dechorionation plates to slow the protease. The dechorionated zygotes were transferred to a 1% agarose (in sea water) plate overlaid with MPF seawater for a wash. The time delay between observing 10-30% of dechorionated embryos and transferring the embryos through the washes to stop dechorionation was sufficient to allow most of the remaining eggs to dechorionate (T. Meedel, personal communication). The sea water wash procedure was repeated 4 more times, each time transferring to a new plate. After the fourth wash, the zygotes were ready to be electroporated.

To prepare the DNA for electroporation, 4.25 µL of plasmid DNA (25 µg) was mixed with 600 µL of 0.77 M mannitol (Sigma) in an electroporation cuvette (2 mm gap, VWR). A fraction of the dechorionated zygotes (1/4-1/2 depending on the number of electroporations being done), taken in a volume of 200 µL, was added to the cuvette already containing the DNA. The electroporation was done using a BTX Unit Model EC600 electroporator set with a voltage of 50 V, a capacitance of 850 µF and a time constant of 20 ms. The electroporated zygotes were transferred to a new agarose plate covered with seawater and were left to develop for 12 h at 18°C.
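The volumes given for the cuvette determine the mannitol and DNA concentrations that the zygotes actually experience during the pulse. The following is a minimal sketch of that dilution arithmetic. It assumes that the volumes are additive and that the 200 µL zygote suspension contributes no mannitol or DNA of its own; neither assumption is stated in the protocol.

```python
def diluted(stock_conc: float, stock_vol: float, total_vol: float) -> float:
    """Concentration after diluting `stock_vol` of stock into `total_vol`."""
    return stock_conc * stock_vol / total_vol

# Volumes in microlitres, taken from the protocol.
mannitol_ul = 600.0
dna_ul = 4.25
zygote_ul = 200.0
total_ul = mannitol_ul + dna_ul + zygote_ul      # assumes additive volumes

# Final mannitol molarity (stock is 0.77 M).
mannitol_molar = diluted(0.77, mannitol_ul, total_ul)

# Final DNA concentration in ng/µL (25 µg of plasmid in the cuvette).
dna_ng_per_ul = 25.0 * 1000.0 / total_ul

print(f"total volume: {total_ul} µL")
print(f"mannitol: {mannitol_molar:.3f} M")
print(f"DNA: {dna_ng_per_ul:.1f} ng/µL")
```

Under these assumptions the mannitol ends up at roughly three quarters of its stock concentration.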

After 12 h of development, the embryos were chilled to 4°C and then sorted under a dissecting microscope. Normal-appearing tailbud embryos were kept, while those that failed to develop normally were discarded. The normal embryos from each electroporation were divided into two groups: 20-80 embryos were used for RNA extraction, while the remainder (20-200) were used for β-galactosidase staining to monitor CiTnI/lacZ reporter gene expression.

β-galactosidase Staining of Tailbud Embryos

Normally developing embryos were fixed for 30 min in a solution of 2% paraformaldehyde (Sigma) and 0.1% Tween-80 (Sigma) in PBS. The fixative was removed and the embryos were washed twice in PBT [phosphate buffered saline (Sigma) containing 0.1% Tween-80] and once with the staining solution. The staining solution consisted of 0.04% X-gal, 2 mM MgCl2, 0.06 M Na2HPO4, 0.04 M NaH2PO4, 4 mM potassium ferrocyanide, and 4 mM potassium ferricyanide (all reagents from Sigma). The embryos were stained for 2.5 h in the dark at room temperature, the staining solution was removed and the embryos were washed once in PBT. The embryos were stored in 10 mM sodium azide (Sigma) in PBT at room temperature.

Scoring β-galactosidase Activity

To assess the gene expression activity of introduced CiTnI/lacZ reporter constructs, X-gal stained embryos were photographed using a Canon Powershot S40 digital camera connected to a Leica MZ 95 dissecting microscope using Canon's Remote Capture software. Reporter gene β-galactosidase activity was scored based on the number of positively staining (i.e. blue) tail muscle cells. Embryos with 6 or more well-stained muscle cells were scored as +++, those with 1-5 well-stained muscle cells were scored as ++, embryos with 1-2 faintly stained muscle cells were scored as +, while those embryos that showed no staining were classified as - (Fig. 7). All active constructs showed expression in tail muscle cells, with little or no expression in other cells/tissues.
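The scoring scheme can be written as a small decision rule, which makes the category boundaries explicit. This is only a sketch: the function name and the example counts are hypothetical, and it assumes that an embryo with no well-stained cells but any number of faintly stained cells is scored +, whereas the text specifies only the 1-2 faint-cell case.

```python
from collections import Counter

def score_embryo(well_stained: int, faintly_stained: int = 0) -> str:
    """Map stained tail muscle cell counts to the +++ / ++ / + / - scale."""
    if well_stained < 0 or faintly_stained < 0:
        raise ValueError("cell counts cannot be negative")
    if well_stained >= 6:
        return "+++"
    if well_stained >= 1:          # 1-5 well-stained cells
        return "++"
    if faintly_stained >= 1:       # assumption: any faint staining scores +
        return "+"
    return "-"

# Hypothetical counts (well-stained, faintly stained) for five embryos.
embryos = [(8, 0), (3, 1), (0, 2), (0, 0), (6, 0)]
tally = Counter(score_embryo(well, faint) for well, faint in embryos)
print(dict(tally))
```

Tallying the categories per electroporation in this way gives the semi-quantitative expression profile for each construct.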

RNA Extraction from Electroporated Ciona embryos

Embryos (20-80) were placed in 10 µL of a 0.77 M mannitol solution and put on ice. Water (20 µL) and 20 µg of yeast tRNA carrier (Sigma) were added to the embryos, followed by 10 µL of 5X extraction buffer [0.5 M sodium acetate buffer, pH 5, 0.5 M NaCl, 25 mM EDTA, all from Sigma] and 5 µL of 10% SDS (Sigma). The tube was brought to room temperature and vortex-agitated. Water-saturated phenol (25 µL, Sigma) was added and the tube was vortex-agitated for 5 min, following which 25 µL of Sevag reagent [24:1 (v/v) chloroform:isoamyl alcohol] was added and the tube was again vortex-agitated for 5 min. The tubes were centrifuged in a microfuge at room temperature for 5 min at 13 000 rpm. The aqueous phase was moved to a new tube, an equal volume of Sevag was added and the vortex mixing and centrifugation steps were repeated. Finally, the aqueous phase was removed and RNA was precipitated by adding 250 µL of ethanol, incubating at -20°C overnight, and centrifuging in a microfuge at 13 000 rpm for 10 min at 4°C to pellet the RNA. The RNA pellet was resuspended in 15 µL of water.

Poly-C Tailing 5'RACE

This 5'RACE technique was performed on RNA samples extracted from 12 h Ciona embryos developed from zygotes electroporated with CiTnInuclacZ(-1.5), CiTnInuclacZ(-1.5){A(-66)C} and CiTnInuclacZ(-1.5){A(-66)T}, using Invitrogen's 5'RACE System for Rapid Amplification of cDNA Ends, Version 2.0. The protocol outlined by the manufacturer was followed, including pre-treatment of the RNA with DNaseI (Invitrogen). The method, which is designed to amplify 5'-segments of mRNAs, requires three nested gene-specific primers. Because my goal was to analyze the 5' sequences of the β-galactosidase mRNAs transcribed from DNA constructs introduced by electrotransformation, I used three nuclear lacZ gene-specific primers: 9172, 9173 and 9174 (Table 1).

5'RACE products were recovered following agarose gel electrophoresis using the Sephaglas BandPrep Kit (Amersham). The products from CiTnInuclacZ(-1.5){A(-66)C} were sequenced by Sheldon Biotechnology Centre by PCR cycle sequencing using oligo 9177 as a sequencing primer. In addition, all products were cloned into the SpeI/SmaI sites of Bluescript II SK+ and then sequenced using T7 and T3 sequencing primers (Appendix II). For this cloning, the PCR products were blunted by incubating the recovered DNA with 1 µL 10X Pfu buffer, 1 µL 2.5 mM dNTPs and 2.5 U Pfu DNA polymerase in a final volume of 10 µL at 72°C for 30 min. The enzyme was removed by phenol-chloroform extraction followed by ethanol precipitation, and the DNA was cut with SpeI (New England Biolabs) in 10-fold excess using Y+ buffer (MBI Fermentas) at 37°C for 1.5 h. To prepare the vector, Bluescript II SK+ was digested for 1.5 h at room temperature with SmaI (MBI Fermentas) and then with SpeI for 1.5 h at 37°C. The vector and insert DNA were recovered following electrophoresis in a 1% agarose gel using the Sephaglas BandPrep Kit from Amersham.

Insert and vector were ligated according to the standard ligation procedure and, following heat-shock transformation, white colonies were selected. Plasmid DNA purified by QIAPrep Spin Miniprep was screened by restriction digest analysis and confirmed clones were sequenced at Sheldon Biotechnology Centre with both the T7 and T3 primers.

SL PCR

SL PCR was used to amplify transcripts that had been trans-spliced by the Ciona intestinalis spliced leader (SL). The template used was the poly(dC)-tailed cDNA produced as part of the poly(dC)-tailing 5'RACE kit (see previous section). The PCR reactions varied from the standard mix in that 1X Taq buffer without Mg (MBI Fermentas) was used, MgCl2 (MBI Fermentas) was added to the reaction to give a final Mg concentration of 1.6 mM, and 2.5 U of Taq DNA polymerase (MBI Fermentas) were used per reaction. Oligos 9055 and 9015 were used as primers in the reactions. The gel-isolated SL PCR products were sequenced by direct cycle sequencing at Sheldon Biotechnology Centre of McGill University using 9055 as the sequencing primer (Appendix II).

SMART 5'RACE

The SMART 5'RACE procedure (Zhu et al., 2001) was performed on mRNA from CiTnInuclacZ(-1.5){A(-66)C}. The 20 µL RT reaction contained 5 µg RNA, 1 µL each of the 9110 and 9103 oligonucleotides at 0.1 µg/µL, 4 µL 5X reaction buffer (Clontech), 2 µL 100 mM dithiothreitol (Clontech), 2 µL 10 mM dNTP (Gibco), and 1 U Powerscript reverse transcriptase (Clontech). The RNA, oligonucleotides and buffer were mixed together and heated at 72°C for 6 min, at which point the tubes were placed on ice and the remaining components added. The complete reactions were incubated at 42°C for 70 min and the reaction was inactivated by incubating at 72°C for 15 min.

For further analysis, 2.5 µL of the reverse transcription reaction was used as template in a standard PCR reaction. Amplification of the endogenous CiTnI RNA used oligos 9111 and 9112, while amplification of CiTnInuclacZ(-1.5){A(-66)C} used primers 9185 and 9186. The PCR cycling conditions are listed in Appendix I. The PCR products were analyzed on a 1.5% agarose gel.

The QIAquick PCR Purification Kit (Qiagen) was used to prepare the 5'RACE products from CiTnInuclacZ(-1.5){A(-66)C} for cloning. The PCR products were then ligated into SmaI-cut Bluescript II SK+ treated with shrimp alkaline phosphatase (MBI Fermentas), and standard ligation and electrotransformation protocols were used. Plasmid DNA minipreps were prepared using the Sigma GenElute Plasmid Miniprep Kit and were screened by digesting with AgeI, a unique site in the CiTnI/lacZ constructs located in the polylinker section that precedes the nuclacZ gene. Clones that were cut by AgeI were sequenced at Sheldon Biotechnology Centre with the M13REV primer (Appendix II).
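The AgeI screen was done at the bench by restriction digestion, but the same criterion can be applied in silico once clone sequences are available. The sketch below only illustrates the idea: the clone names and sequences are invented placeholders, not data from this work. AgeI recognizes the palindromic sequence ACCGGT, so searching one strand is sufficient.

```python
AGEI_SITE = "ACCGGT"   # AgeI recognition sequence (palindromic)

def contains_site(sequence: str, site: str = AGEI_SITE) -> bool:
    """Return True if `site` occurs anywhere in `sequence` (case-insensitive)."""
    return site in sequence.upper()

# Hypothetical clone inserts, invented for illustration only.
clones = {
    "clone_1": "ggatccaccggtcgccacc",     # carries an AgeI site
    "clone_2": "ttgacctgatccgagggcaaa",   # no AgeI site
}

# Keep only the clones that would be cut by AgeI.
keep = [name for name, seq in clones.items() if contains_site(seq)]
print(keep)
```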

Subsequent additional experiments indicated that the SMART 5'RACE technique tends to amplify 250-350 bp products corresponding to Ciona intestinalis 28S rRNA, apparently due to mis-priming by the SMART anchor primer on a region of the 28S rRNA sequence that is a 9/23 base sequence match (data not shown). The rRNA products did not interfere with the above analysis of CiTnInuclacZ(-1.5){A(-66)C}-derived mRNAs because cloned products were screened for the presence of the AgeI site, which is not present in the rRNA products. However, the rRNA product can complicate the interpretation of gel electrophoresis results of SMART 5'RACE products derived from Ciona intestinalis samples, and this should be kept in mind in future studies using this technique.

3. RESULTS: CRYPTIC TRANS-SPLICE ACCEPTOR SITES

Natural trans-splice acceptor site mutants

The AG dinucleotide adjacent to the natural trans-splice acceptor site at position -66 of the CiTnI gene was targeted for mutations to determine what effect such mutations would have on SL trans-splicing. It was possible that such a mutation might eliminate trans-splicing and, if so, would directly permit assessment of the functional importance of trans-splicing for the CiTnI gene. On the other hand, it was also possible that such a mutation might lead to the use of an alternative cryptic trans-splice acceptor site. Mutant CiTnI nuclear β-gal reporter constructs were electroporated into Ciona zygotes and, after allowing development to proceed to the tailbud stage, two types of data were gathered: 1) sequence data for the 5' end of the RNA molecules produced from the CiTnI constructs, and 2) expression data measured in a semi-quantitative manner by X-gal staining to detect the β-galactosidase reporter gene function in tail muscle cells, to assess whether the RNA transcribed from the plasmids could direct the synthesis of normal levels of functional protein.

I made two point mutations, A→C and A→T, both targeting the A of the AG dinucleotide at position -66. Both of these mutations have been shown to eliminate cis-splicing to the natural cis-splicing acceptor site in a synthetic rabbit β-globin intron in vertebrate cells (Aebi et al., 1986). I generated two different mutants in case one of the two substitutions was innately harmful due to the new sequence created, rather than through loss of trans-splicing per se.

Three different methods were used in mRNA 5'-sequence analysis: SL PCR, poly(dC)-tailing 5'RACE and SMART PCR.

Analysis of mRNA 5'-end sequence: SL PCR

SL PCR is a reverse-transcriptase PCR technique that amplifies SL trans-spliced RNA molecules based on the presence of the SL sequence at the 5'-end (Vandenberghe et al., 2001). I used this technique with a β-gal-specific reverse transcription primer to assess the presence of SL trans-spliced mRNAs produced from electrotransformed CiTnI/β-gal constructs. β-gal SL PCR from wildtype construct CiTnInuclacZ(-1.5)-transformed 12 h Ciona embryos yielded PCR products 550 bp in length (Fig. 5), as expected based on trans-splicing to the natural acceptor site at -64 (Vandenberghe et al., 2001). When RNA extracted from embryos electroporated with either splice acceptor mutant, CiTnInuclacZ(-1.5){A(-66)C} or CiTnInuclacZ(-1.5){A(-66)T}, was used, products of approximately 550 bp were also observed (Fig. 5). This suggested that, if the mutations had eliminated trans-splicing at the natural site, then a cryptic acceptor site had been activated nearby. The PCR products were extracted from the gel and the 5' ends were sequenced (Fig. 6). The wildtype construct, as expected, showed trans-splicing at position -64, the natural trans-splice acceptor site (Fig. 6). PCR products from both trans-splice acceptor mutants showed trans-splicing not to the natural trans-splice acceptor site at position -64, but instead to a cryptic trans-splice acceptor site 12 bases upstream at position -76 (Fig. 6). This cryptic acceptor site corresponds to the next AG dinucleotide upstream from the one mutated. The size difference between the wildtype and mutant RT-PCR products (12 base pairs over ~550 bp) is too small to notice on an agarose gel, so all three SL PCR products appeared to be the same size in Fig. 5. The results of the SL PCR experiments indicated that mutation of the A in the AG dinucleotide that defines a trans-splice acceptor site prevented trans-splicing to the natural acceptor site, and revealed an upstream cryptic acceptor site.
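The relationship between the mutated AG and the cryptic site 12 bases upstream can be reproduced directly from the construct sequence. The sketch below is illustrative and rests on two assumptions of mine: the 31-base window (-85 to -55) was transcribed from the wildtype sequence in Figure 4, and coordinates are assigned by counting along that figure, so that the A of the natural AG falls at -66. An acceptor position is taken to be the base immediately 3' of an AG dinucleotide.

```python
def acceptor_positions(seq: str, first_coord: int) -> list:
    """Coordinates of the base immediately 3' of each AG dinucleotide."""
    seq = seq.upper()
    return [first_coord + i + 2
            for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]

def mutate(seq: str, first_coord: int, coord: int, new_base: str) -> str:
    """Return `seq` with the base at genomic coordinate `coord` replaced."""
    i = coord - first_coord
    return seq[:i] + new_base + seq[i + 1:]

FIRST = -85                                       # coordinate of the first base
wildtype = "CTAATGCAGTTTCATCTACAGCAACCGGTAT"      # -85 to -55, from Figure 4

a66c = mutate(wildtype, FIRST, -66, "C")          # the A(-66)C mutant
a66t = mutate(wildtype, FIRST, -66, "T")          # the A(-66)T mutant

for name, seq in [("wildtype", wildtype), ("A(-66)C", a66c), ("A(-66)T", a66t)]:
    print(name, acceptor_positions(seq, FIRST))
```

Within this window the wildtype sequence offers candidate acceptors at -76 and -64, while either mutation leaves only -76, which is the site observed in the mutant transcripts. The mutated windows also contain the sequences of mutagenic primers 9132 and 9136 from Table 1, which supports the coordinate assignment.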

Figure 5: SL PCR of β-gal mRNAs from 12 h embryos electrotransformed with wildtype trans-splice acceptor, mutant trans-splice acceptor and -319,-77 outron deletion constructs. SL PCR products were analyzed by electrophoresis in a 1% agarose gel. Lanes 1 & 9: 500 ng MBI Fermentas GeneRuler DNA Ladder Mix (SM0331); lane 2: SL PCR of CiTnInuclacZ(-1.5){A(-66)C}; lane 3: SL PCR of CiTnInuclacZ(-1.5){A(-66)C}, no reverse transcriptase; lane 4: SL PCR of CiTnInuclacZ(-1.5){A(-66)T}, no reverse transcriptase; lane 5: SL PCR of CiTnInuclacZ(-1.5){A(-66)T}; lane 6: SL PCR of CiTnInuclacZ(-1.5){A(-66)C}Δ-319,-77, no reverse transcriptase; lane 7: SL PCR of CiTnInuclacZ(-1.5){A(-66)C}Δ-319,-77; lane 8: SL PCR of CiTnInuclacZ(-1.5).

[Figure 6 panels, each aligning the 5' end structure of the transcript with the CiTnI/nβgal construct sequence: a) CiTnInuclacZ(-1.5), natural acceptor site; b) CiTnInuclacZ(-1.5){A(-66)C}, cryptic acceptor site; c) CiTnInuclacZ(-1.5){A(-66)T}, cryptic acceptor site; d) CiTnInuclacZ(-1.5){A(-66)C}Δ-319,-77, cryptic acceptor site; e) CiTnI(-838/-79)nlacZ, cryptic acceptor site; f) CiTnI(-838/-177)nlacZ, cryptic acceptor site; g) CiTnI(-838/-262)nlacZ, cryptic acceptor site]

Figure 6: SL PCR-derived 5' end sequences of transcripts isolated from 12 h C. intestinalis embryos electrotransformed with CiTnI/nuclear lacZ constructs. In each case the sequence of the transcript is shown above the sequence of the DNA construct that was used for electroporation. CiTnI sequence is shown in black, the SL is orange and nuclear lacZ is violet. a) CiTnInuclacZ(-1.5) was trans-spliced to the natural acceptor site at -64. This sequence was obtained by SL PCR and polyC-tailing 5'RACE analysis; b) CiTnInuclacZ(-1.5){A(-66)C} was trans-spliced at two different cryptic acceptor sites, one at -76 and one at -39. Molecules trans-spliced at -76 were detected using SL PCR, polyC-tailing 5'RACE and SMART PCR. Molecules trans-spliced at -39 were detected only with SMART PCR; c) CiTnInuclacZ(-1.5){A(-66)T} was trans-spliced at a cryptic acceptor site at -76. These results were obtained by SL PCR and polyC-tailing 5'RACE; d, e, f, g) CiTnInuclacZ(-1.5){A(-66)C}Δ-319,-77, CiTnI(-838/-79)nlacZ, CiTnI(-838/-177)nlacZ and CiTnI(-838/-262)nlacZ were all trans-spliced at a cryptic acceptor site at -345. These results were obtained by SL PCR.

Analysis of mRNA 5' end sequences: Poly(dC)-tailing 5'RACE

SL PCR will only amplify mRNA molecules that have been trans-spliced. For a more thorough analysis of the 5' ends of RNA from the wildtype and trans-splice acceptor mutant constructs, including assessment of the possible production of non-trans-spliced mRNAs, a 5'RACE kit from Invitrogen was used that would amplify the 5' ends of mRNA molecules whether trans-spliced or not. This method uses a gene-specific primer for the reverse transcription reaction, then terminal transferase is used to add a poly(dC) tail to the 3' end of all first-strand cDNA molecules. A 5' anchor oligonucleotide containing a poly(dI,dG) tail is then hybridized with the cDNA 3' poly(dC) tail and is used as a primer for second strand cDNA synthesis, thus linking the 5' anchor sequence to the cDNA and thereby introducing a 5' PCR amplification sequence.

Figure 7: Poly(dC)-tailing 5'RACE of nlacZ mRNA from 12 h embryos electrotransformed with the wildtype trans-splice acceptor, the trans-splice acceptor mutants or the -319,-77 outron deletion construct. PCR products were analyzed by electrophoresis on a 1.5% agarose gel. Lanes 1 & 14: 500 ng MBI Fermentas GeneRuler (SM0331); lane 2: poly(dC)-tailing 5'RACE on CiTnInuclacZ(-1.5){A(-66)C}; lane 3: poly(dC)-tailing 5'RACE on CiTnInuclacZ(-1.5){A(-66)C}, no terminal transferase; lane 4: poly(dC)-tailing 5'RACE on CiTnInuclacZ(-1.5){A(-66)C}, no reverse transcriptase; lane 5: poly(dC)-tailing 5'RACE on CiTnInuclacZ(-1.5){A(-66)T}; lane 6: poly(dC)-tailing 5'RACE on CiTnInuclacZ(-1.5){A(-66)T}, no terminal transferase; lane 7: poly(dC)-tailing 5'RACE on CiTnInuclacZ(-1.5){A(-66)T}, no reverse transcriptase; lane 8: poly(dC)-tailing 5'RACE on CiTnInuclacZ(-1.5){A(-66)C}Δ-319,-77; lane 9: poly(dC)-tailing 5'RACE on CiTnInuclacZ(-1.5){A(-66)C}Δ-319,-77, no terminal transferase; lane 10: poly(dC)-tailing 5'RACE on CiTnInuclacZ(-1.5){A(-66)C}Δ-319,-77, no reverse transcriptase; lane 11: poly(dC)-tailing 5'RACE on CiTnInuclacZ(-1.5); lane 12: poly(dC)-tailing 5'RACE on CiTnInuclacZ(-1.5), no terminal transferase; lane 13: poly(dC)-tailing 5'RACE on CiTnInuclacZ(-1.5), no reverse transcriptase.

The wildtype construct, CiTnInuclacZ(-1.5), generated a single reverse-transcriptase-dependent poly(dC)-tailing 5'RACE product of 425 bp, as expected (Fig. 7 lane 11). The RNA from CiTnInuclacZ(-1.5){A(-66)C} generated a major product of 450 bp, and very minor products of 650 bp and 800 bp (Fig. 7 lane 2). The minor 800 bp product did not re-amplify in a subsequent PCR reaction with the same primers and was not analyzed further. The RNA from CiTnInuclacZ(-1.5){A(-66)T} also gave a major 5'RACE product of 450 bp and a minor product of 650 bp (Fig. 7 lane 5). All of these products were extracted from the gel and subjected to direct sequencing, which generated good sequence for the 450 bp and 650 bp 5'RACE products from CiTnInuclacZ(-1.5){A(-66)C}. These two products were identical at the 5'end (Fig. 6). The extra length of the 650 bp product (exact length = 629 bp) is at the 3'end of the molecule and is the result of mis-priming by the gene-specific primer 9174 at a site that is a 9/22 match to the correct priming site. In both products, trans-splicing occurred at the cryptic site at position -76. These results independently confirm the 5'structure of CiTnI/β-gal mRNAs from CiTnInuclacZ(-1.5){A(-66)C} previously obtained by SL PCR (see Appendix III for examples of raw sequence data).

Direct cycle sequencing was not successful for either the wildtype CiTnInuclacZ(-1.5) or the CiTnInuclacZ(-1.5){A(-66)T} 5'RACE products. To get around this problem, 5'RACE products extracted from the gel were cloned into SmaI/SpeI-cut BluescriptII SK+ and individual clones were sequenced. Samples from CiTnInuclacZ(-1.5){A(-66)C}, CiTnInuclacZ(-1.5){A(-66)T} and CiTnInuclacZ(-1.5) were analyzed in parallel. One of three clones derived from wildtype CiTnInuclacZ(-1.5) was found to be trans-spliced to the natural acceptor site (-64), as expected (Fig. 6). Surprisingly, however, the other two clones corresponded to a non-trans-spliced molecule whose 5'end mapped to the CiTnI genomic position -35 (Table 2). This shorter, non-trans-spliced mRNA molecule may be evidence of a novel transcription start site, something that is addressed in the next chapter. Similar results were obtained for CiTnInuclacZ(-1.5){A(-66)C} (Table 2). In this case the trans-spliced molecules (2/4) did not use the natural acceptor site, using instead the cryptic acceptor site at -76, consistent with the results of direct sequencing of uncloned 5'RACE and SL PCR products (Fig. 6). The other two clones were non-trans-spliced molecules whose 5'end aligned with genome position -35 (see next chapter). Two CiTnInuclacZ(-1.5){A(-66)T} 5'RACE clones were analyzed and both were found to be trans-spliced to the cryptic acceptor site at position -76 (Fig. 6). One of these clones was obtained from the 650 bp poly(dC)-tailing 5'RACE product, while the other was obtained from the 450 bp poly(dC)-tailing 5'RACE product. Both clones had the same 5'end. Again, as for the A→C mutant construct discussed above, the longer product resulted from mispriming at the 3'end.

The foregoing 5'RACE sequence data revealed a number of things. Firstly, mutating the AG adjacent to the natural trans-splice acceptor site to CG or TG prevented trans-splicing to the natural acceptor. Instead, a cryptic site at -76 was used. Secondly, not all of the 5'RACE products were trans-spliced. A novel non-trans-spliced mRNA initiating at -35 was also detected. However, production of this novel mRNA did not depend on mutation of the trans-splice acceptor site, as it was also observed in RNA extracted from embryos previously electrotransformed with the wildtype construct CiTnInuclacZ(-1.5). This surprising result, though somewhat tangential to the original project goals, could nonetheless have significant implications for experimental interpretation of CiTnI/nuclacZ expression. Further investigation of the apparent non-trans-spliced -35 RNA is reported below and in the next chapter.

Analysis of mRNA 5'end sequence: SMART PCR

We were concerned that the unexpected non-trans-spliced "-35" RNA could perhaps be a 5'RACE artefact resulting from reverse transcriptase fall-off at the -35 position of longer, presumably trans-spliced mRNAs. SMART (switching mechanism at the 5'end of the RNA transcript) is a 5'RACE technique that is thought to be less susceptible to reverse transcriptase fall-off than other 5'RACE methods because it depends on the fact that when reverse transcriptase reaches the end of the template transcript it adds 3-4 dCs to the 3'end of the cDNA molecule as part of its normal termination mechanism (Zhu et al., 2001). In contrast, in the poly(dC)-tailing Invitrogen kit, an oligo(dC) tail is added by terminal transferase to all cDNA molecules present, whether due to full-length reverse transcription or to fall-off.

The SMART PCR clones were generated by blunt-end cloning of the unfractionated PCR products into SmaI-cut and phosphatase-treated pBluescript II SK+, and clones containing inserts with an AgeI site (which confirms the presence of lacZ) were sequenced. When SMART PCR was performed on the CiTnInuclacZ(-1.5){A(-66)C} RNA, one out of seven clones sequenced did correspond to the non-trans-spliced -35 mRNA (Table 2 or Fig. 6). This suggests that the -35 mRNA is a real transcript, and not the result of a reverse transcriptase fall-off artefact. Although the proportion of molecules representing the -35 RNA may differ between the poly(dC)-tailing analysis of this RNA (2/4 molecules) and the SMART analysis (1/7 molecules) (Table 2), the sample size was too small to establish a statistically significant difference.
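To illustrate why counts this small cannot distinguish the two methods, the sketch below runs a two-sided Fisher's exact test on the 2/4 versus 1/7 clone counts from Table 2. The test itself is not part of the original analysis; it is an assumption chosen here because it is a standard option for small 2x2 count tables, and the function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, row1)

    def prob(k):
        # Hypergeometric probability of seeing k column-1 items in row 1
        return comb(col1, k) * comb(total - col1, row1 - k) / denom

    p_observed = prob(a)
    k_min = max(0, row1 - (total - col1))
    k_max = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(
        prob(k)
        for k in range(k_min, k_max + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )

# Rows: poly(dC)-tailing, SMART; columns: -35 RNA clones, trans-spliced clones
p_value = fisher_exact_two_sided(2, 2, 1, 6)
print(f"p = {p_value:.3f}")
```

Under these assumptions the p-value is far above any conventional significance threshold, which is consistent with the statement that the sample is too small to support a difference between the methods.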

In addition to the non-trans-spliced -35 RNA, SMART analysis of CiTnInuclacZ(-1.5){A(-66)C} mutant RNA revealed trans-spliced mRNA molecules. Four of the seven clones correspond to the mRNA trans-spliced at the previously noted cryptic acceptor site at -76, while two molecules were trans-spliced at another novel cryptic acceptor at -39, the next AG downstream of the natural trans-splice acceptor site.

Table 2: mRNA ends detected by poly(dC)-tailing and SMART 5'RACE analysis. Values in parentheses give the split between the two methods (poly(dC)-tailing/SMART).

Source construct               Total clones   Trans-spliced   Trans-spliced   Trans-spliced   Not trans-spliced
                               sequenced      at -64          at -76          at -39          (-35 RNA)
CiTnInuclacZ(-1.5)             3 (3/0)        1 (1/0)         0               0               2 (2/0)
CiTnInuclacZ(-1.5){A(-66)C}    11 (4/7)       0               6 (2/4)         2 (0/2)         3 (2/1)
CiTnInuclacZ(-1.5){A(-66)T}    2 (2/0)        0               2 (2/0)         0               0

Reporter gene expression assayed by embryo X-gal staining analysis

5'RACE sequencing analyses reveal the structure of a transcript, but they do not reveal whether it can be translated into a functional protein. To investigate whether shifting trans-splicing from the natural trans-splice acceptor site to a previously cryptic site 12 bp upstream affected protein expression, 12h tailbud embryos that had been electroporated with either the wildtype construct (CiTnInuclacZ(-1.5)) or one of the two splice site mutant constructs were stained with X-gal to detect the presence of the nuclacZ reporter gene product, β-galactosidase (Table 3). Cells that stain blue express a functional β-galactosidase protein, and indicate that the mRNA derived from the DNA construct is translation-competent. A semi-quantitative analysis based on the percentage of embryos that showed staining and on the number of tail muscle cells stained indicated that both the wildtype and mutant constructs were similarly active in terms of functional β-galactosidase expression in the tail muscle cells of the 12h embryos (Table 3). There may be some small differences in expression levels among the various constructs, but this was not investigated further. Thus mutation of position -66, the A of the AG adjacent to the natural trans-splice acceptor site, does not greatly diminish expression of the construct at the protein level (Table 3). I conclude that trans-splicing to cryptic sites that were activated when position -66 was mutated was not grossly deleterious to protein expression. A corollary conclusion is that trans-splicing at the natural trans-splice acceptor site is not necessary for protein expression.

Table 3: Semi-quantitative assessment of β-galactosidase expression in 12h Ciona tailbud embryos

DNA construct                            Number of          Total number    %+++      %++       %+        %-
                                         electroporations   of embryos¹     embryos   embryos   embryos   embryos
CiTnInuclacZ(-1.5)                       1                  61              93        3         2         2
CiTnInuclacZ(-1.5){A(-66)C}              2                  152             74        19        4         3
CiTnInuclacZ(-1.5){A(-66)T}              2                  125             62        17        9         13
CiTnInuclacZ(-1.5){A(-66)C}Δ-319,-77     2                  156             57        30        4         9
CiTnI(-838/-79)nlacZ                     3                  97              10        24        7         59
CiTnI(-838/-177)nlacZ                    2                  261             66        25        4         5
CiTnI(-838/-262)nlacZ                    2                  266             35        32        11        22

¹ Where multiple electroporations were performed, each replicate was done on a different day. For each construct similar results were obtained in each replicate, and so the results were pooled. The independent DNA maxi preps for each construct were not used; they were prepared in the event that no β-gal staining was observed.
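As a reading aid for Table 3, the following sketch derives two summary quantities from the category percentages: the total percentage of embryos showing any staining, and whether the +++ category outnumbers the ++ category. The dictionary layout and function names are illustrative assumptions; the numbers are transcribed from the table, and rows may not sum to exactly 100 because the published percentages are rounded.

```python
# Percentages per staining category (+++, ++, +, -), transcribed from Table 3
TABLE3 = {
    "CiTnInuclacZ(-1.5)": (93, 3, 2, 2),
    "CiTnInuclacZ(-1.5){A(-66)C}": (74, 19, 4, 3),
    "CiTnInuclacZ(-1.5){A(-66)T}": (62, 17, 9, 13),
    "CiTnInuclacZ(-1.5){A(-66)C}D-319,-77": (57, 30, 4, 9),  # D stands for the deletion symbol
    "CiTnI(-838/-79)nlacZ": (10, 24, 7, 59),
    "CiTnI(-838/-177)nlacZ": (66, 25, 4, 5),
    "CiTnI(-838/-262)nlacZ": (35, 32, 11, 22),
}

def percent_stained(row):
    """Percentage of embryos in any of the three staining categories."""
    return sum(row[:3])

def weak_constructs(table):
    """Constructs where fewer embryos fall in +++ than in ++."""
    return [name for name, row in table.items() if row[0] < row[1]]

for name, row in TABLE3.items():
    print(f"{name}: {percent_stained(row)}% stained")
print("+++ below ++:", weak_constructs(TABLE3))
```

This is only a descriptive summary of pooled percentages; it does not replace a quantitative assay or account for differences between electroporation batches.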

[Figure 8 panels: a) +++; b) ++; c) +; d) -]

Figure 8: Examples of tailbud embryos for each of the categories used in the semi-quantitative β-galactosidase expression assay. a) Embryos in the +++ category have 6 or more tail muscle cells strongly expressing β-galactosidase (stain dark blue); b) embryos in the ++ category have 1-5 tail muscle cells strongly expressing β-galactosidase; c) embryos in the + category have 1-2 tail muscle cells weakly expressing β-galactosidase (faint blue stain); d) embryos in the - category do not express β-galactosidase (do not stain).

Outron Deletion (-319 to -77) Construct

The previous section showed that mutation of the natural trans-splice acceptor site at -64 led to the activation of cryptic acceptor sites adjacent to the nearest AG dinucleotides upstream (at position -76) and, to a lesser extent, downstream (at position -39). This suggested that cryptic trans-splicing could perhaps occur at any AG dinucleotide near the mutated natural trans-splice acceptor site. In order to explore this issue further, I created a deletion variant of the CiTnInuclacZ(-1.5){A(-66)C} mutant in which a 250 bp segment of the outron sequence immediately upstream of the -76 cryptic acceptor site was deleted. This construct was called CiTnInuclacZ(-1.5){A(-66)C}Δ-319,-77. This deletion juxtaposed upstream outron DNA, including several AG dinucleotides, with the -76 cryptic site (while removing the original AG just upstream of -76) (Fig. 3). The nearest relocated AG (original CiTnI positions -322, -321) was separated from the cryptic acceptor site at -76 by only one nucleotide, and the next upstream AG (original coordinates -337, -338) was separated from it by 17 bp (Fig. 4).
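The spacing figures quoted for the relocated AG dinucleotides can be checked with simple coordinate arithmetic. The sketch below counts the nucleotides that remain between an upstream position and the -76 site once the -319 to -77 segment is removed. It assumes that the G of each AG sits at the higher coordinate (-321 and -337) and that the deletion is inclusive at both ends; the function name is hypothetical.

```python
def retained_between(upstream, downstream, deletion):
    """Count nucleotides strictly between two gene positions that survive a deletion.

    Positions are negative gene coordinates; `deletion` is an inclusive
    (start, end) pair of deleted positions.
    """
    del_start, del_end = deletion
    return sum(
        1
        for pos in range(upstream + 1, downstream)
        if not (del_start <= pos <= del_end)
    )

DELETION = (-319, -77)   # segment removed in the deletion construct
CRYPTIC_SITE = -76       # cryptic acceptor used by the point mutants

# Assumed G positions of the two nearest relocated AG dinucleotides
nearest_gap = retained_between(-321, CRYPTIC_SITE, DELETION)
next_gap = retained_between(-337, CRYPTIC_SITE, DELETION)
print(nearest_gap, next_gap)
```

With these assumptions the two gaps come out as one nucleotide and 17 nucleotides, matching the separations stated in the text.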

Analysis of 5'end sequence: SL PCR and poly(dC)-tailing 5'RACE

I found that CiTnInuclacZ(-1.5){A(-66)C}Δ-319,-77 produced trans-spliced mRNAs. SL PCR analysis revealed a single reverse-transcriptase-dependent product of ~600 bp (Fig. 5), and sequencing revealed this to correspond to an mRNA molecule trans-spliced to an AG-adjacent nucleotide whose position in the intact gene is -346. This trans-spliced mRNA would be predicted to generate a poly(dC)-tailing 5'RACE product of 470 bp, and indeed a single 470 bp product was observed (Fig. 5), suggesting that no major non-trans-spliced mRNAs were generated in addition to the molecules trans-spliced at -346.

My result established that cryptic trans-splicing is not forced on the AG-adjacent nucleotide closest to the mutated natural trans-splice acceptor site. In CiTnInuclacZ(-1.5){A(-66)C}Δ-319,-77 neither the nearest upstream AG (12 bp away) nor the next-nearest (28 bp away) was targeted for trans-splicing. The cryptic site actually used (-346) was adjacent to the third-closest AG, 38 bp upstream of the natural acceptor site. Moreover, although the sequence environment of the -76 cryptic trans-splice acceptor site in the deletion mutant included an AG placed nearby, this site was not trans-spliced. This clearly shows that an AG placed near a natural trans-splice acceptor site is not sufficient to target cryptic trans-splicing. Other features are required, perhaps proximity to a functional branchpoint. The fact that not all AGs readily function as cryptic acceptor sites indicated that it might be experimentally possible to generate mRNAs that preserve significant stretches of the outron, including AG dinucleotides.

Trans-splicing to -346 had previously been noted by Parul Khare in our lab in the course of promoter mapping experiments using a CiTnI DNA fragment extending from -838 to -337 (Khare, 2005). Her construct did not contain the natural trans-splice acceptor site nor any part of the first exon. My results with CiTnInuclacZ(-1.5){A(-66)C}Δ-319,-77, which contains the mutated natural trans-splice acceptor site and 40 bp of the first exon, both confirmed -346 as a cryptic acceptor site and permitted inferences regarding the role of AG in defining cryptic splice sites.

Reporter gene expression assayed by embryo X-gal staining analysis

Twelve-hour embryos developed from zygotes electroporated with CiTnInuclacZ(-1.5){A(-66)C}Δ-319,-77 showed X-gal staining at levels similar to wildtype CiTnInuclacZ(-1.5) (Table 3). This shows, again, that trans-splicing at the natural acceptor site is not essential for effective protein synthesis. It also extends the observation that at least short stretches of the outron sequence can be retained without compromising expression.

Outron 3' Truncation Constructs

The discovery that mutation of the AG adjacent to the natural trans-splice acceptor site at -64 activated cryptic trans-splicing at sites (-76 and -39) adjacent to the nearest AG dinucleotides at first suggested the possibility that there may be numerous cryptic acceptor sites at short intervals throughout the outron, i.e. adjacent to each AG dinucleotide. However, results with CiTnInuclacZ(-1.5){A(-66)C}Δ-319,-77 clearly indicated that only a subset of AG dinucleotides could serve to specify cryptic trans-splice acceptor sites. To generate an overview of the distribution of functional cryptic acceptor sites, I decided to make three evenly spaced 3' truncations of the outron. These were designed in such a way that the portion of the outron between the natural splice acceptor site and the upstream cryptic splice acceptor site at -346 was divided essentially evenly in terms of length and in terms of the number of AG dinucleotides present. In the longest construct, the Ciona/nuclacZ junction occurs at -79, within the putative branchpoint sequence for the natural trans-splice acceptor site. Therefore, neither of the previously identified cryptic acceptor sites at -76 and -39 was present in any of these constructs, although the one at -346 remains present in each (Fig. 4). The DNA constructs are, from smallest to largest outron 3' truncation, CiTnI(-838,-79)nlacZ, CiTnI(-838,-177)nlacZ and CiTnI(-838,-262)nlacZ (Fig. 3).

SL PCR of 3' outron truncation transcripts

SL PCR was performed on RNA from 12h C. intestinalis embryos developed from zygotes electroporated with each of the three outron 3' truncation constructs. Different sized SL PCR products were produced from each RNA sample (Fig. 9). CiTnI(-838,-79)nlacZ yielded a band of 750-800 bp, CiTnI(-838,-177)nlacZ produced a product of 670-700 bp and CiTnI(-838,-262)nlacZ gave a product of ~600 bp. These PCR products were analyzed by direct cycle sequencing, which revealed that all three were trans-spliced to the same cryptic acceptor site at -346 (Fig. 6). Given that the three molecules were trans-spliced at the same 5' location but had different outron 3' truncations, the length of outron present in each case was different. The greatest length of outron (268 nt), bases -346 to -79, was present in mRNA derived from CiTnI(-838,-79)nlacZ. Similarly, the outron on molecules from the construct CiTnI(-838,-177)nlacZ spanned -346 to -177 (length 170 nt). The same pattern held true for molecules from CiTnI(-838,-262)nlacZ, which had the shortest outron segment (85 nt), bases -346 to -262.
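The retained outron lengths follow directly from the inclusive coordinates of the trans-splice site and the truncation endpoint. A minimal sketch of that arithmetic is shown below; the function and dictionary names are illustrative and not part of the original work.

```python
ACCEPTOR_SITE = -346  # cryptic trans-splice acceptor used by all three constructs

# 3' endpoint of the CiTnI sequence in each truncation construct
TRUNCATION_ENDPOINTS = {
    "CiTnI(-838,-79)nlacZ": -79,
    "CiTnI(-838,-177)nlacZ": -177,
    "CiTnI(-838,-262)nlacZ": -262,
}

def retained_outron_length(acceptor, endpoint):
    """Length in nt of the segment from acceptor to endpoint, both inclusive."""
    return endpoint - acceptor + 1

lengths = {
    name: retained_outron_length(ACCEPTOR_SITE, end)
    for name, end in TRUNCATION_ENDPOINTS.items()
}
for name, length in lengths.items():
    print(f"{name}: {length} nt of outron retained")
```

The "+ 1" reflects that both end positions are counted, which is why -346 to -79 gives 268 nt rather than 267.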

Figure 9: SL PCR of CiTnInuclacZ RNA from 12h embryos electrotransformed with the outron 3' truncation constructs. PCR products were analyzed by electrophoresis on a 1% agarose gel. Lane 1: 10 µL (1030 ng DNA) MBI Fermentas Mass Ruler (SM0403); lane 2: SL PCR of CiTnI(-838/-79)nlacZ RNA; lane 3: SL PCR of CiTnI(-838/-79)nlacZ RNA, no reverse transcriptase; lane 4: SL PCR of CiTnI(-838/-177)nlacZ RNA; lane 5: SL PCR of CiTnI(-838/-177)nlacZ RNA, no reverse transcriptase; lane 6: SL PCR of CiTnI(-838/-262)nlacZ RNA; lane 7: SL PCR of CiTnI(-838/-262)nlacZ RNA, no reverse transcriptase.

My conclusion from this study is that only three AG dinucleotides within the CiTnI outron are capable of specifying a cryptic trans-splice acceptor site, i.e. the AGs adjacent to -346, -76 and -39.

Reporter gene expression assayed by embryo staining analysis

Tailbud embryos previously electroporated with CiTnI(-838,-79)nlacZ, CiTnI(-838,-177)nlacZ or CiTnI(-838,-262)nlacZ were stained with X-gal to determine if any of these 3' outron deletions affect the synthesis of a functional reporter protein (Table 3). None of the deletions eliminated the production of β-galactosidase; however, CiTnI(-838,-79)nlacZ appeared to give substantially weaker expression than the others (Table 3) in three replicate trials. Only 41% of embryos showed X-gal staining, and this was the only construct tested where the number of embryos showing +++ staining was less than the number of embryos showing ++ staining. This apparently less efficient expression may be related to having a longer outron present on the mature mRNA molecule, although this will have to be confirmed in future studies by more extensive analysis, including co-transfection controls.

Comparison of branchpoint-like sequences

My data indicate that cryptic trans-splicing can occur at two widely separated sites in the CiTnI outron: those around the natural trans-splice acceptor site (i.e. sites -76 and -39), and position -346. It is possible that this could reflect the existence of only two potential branchpoints, the natural one and a cryptic one associated with trans-splicing at -346. When SL trans-splicing was originally discovered in the CiTnI gene, a putative branchpoint at -82 was identified based on matching the vertebrate consensus YRCTRAY (the branchpoint A residue is the A at position 6) (Vandenberghe et al., 2001). I examined the entire outron for matches to this consensus sequence. The only perfect match is the one identified by Vandenberghe et al.: TACTAAT (Table 4). There is a potential branchpoint (TACTATT) 16 nt upstream of the cryptic acceptor site at -346, but it is no better than 15 other non-perfect branchpoint sequences within the outron (Table 4) and may be "worse" than many in that the key A residue itself is not at its usual location in the consensus.

Table 4: Catalogue of branchpoint-related sequences in the CiTnI outron

Sequence    Position in CiTnI
YRCTRAY     Consensus
TACTAAT     -82
TGATAAT     -116
AACTGAA     -133
AACTAAA     -153
ATCTGAA     -164
GACTACC     -200
AGCTGAT     -208
TACTGAA     -222
AACTATA     -236
TGATAAC     -256
TAGTGAT     -259
TTCTGAG     -274
TACTATT     -362
TCATGAT     -368
AGCTGAG     -380
AACTATT     -393
TACTATG     -403
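The comparison of each candidate against the YRCTRAY consensus can be made explicit with IUPAC ambiguity codes (Y = C or T, R = A or G). The sketch below reports the 1-based positions at which a 7-mer departs from the consensus; the function name and the small code table are assumptions introduced for illustration, not the procedure actually used to build Table 4.

```python
# Subset of IUPAC nucleotide codes needed for the branchpoint consensus
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "N": "ACGT",  # any base
}

def consensus_mismatches(candidate, consensus="YRCTRAY"):
    """Return 1-based positions where `candidate` violates `consensus`."""
    if len(candidate) != len(consensus):
        raise ValueError("candidate and consensus must be the same length")
    return [
        i
        for i, (base, code) in enumerate(zip(candidate.upper(), consensus), start=1)
        if base not in IUPAC[code]
    ]

for seq in ("TACTAAT", "TACTATT", "ATCTGAA"):
    print(seq, consensus_mismatches(seq))
```

Under this scheme TACTAAT (-82) has no mismatches, while TACTATT (-362) differs only at position 6, the position of the branchpoint A, which is the point made in the text about the -346 candidate.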

4. RESULTS: AN ALTERNATE PROMOTER?

In the course of these studies, I obtained evidence suggesting that the CiTnI gene may contain an alternate promoter initiating transcription at position -35 and giving rise to a non-trans-spliced mature mRNA. In addition to SL trans-spliced molecules, poly(dC)-tailing 5'RACE and SMART 5'RACE detected a non-trans-spliced mRNA whose 5'end aligned with -35 in RNA transcribed from wildtype CiTnInuclacZ(-1.5) and CiTnInuclacZ(-1.5){A(-66)C} (Table 2). This non-trans-spliced RNA product is hereafter called the -35 RNA (sequence shown in Fig. 10). In this chapter I describe several analyses meant to determine whether the -35 RNA is also produced from the endogenous CiTnI gene. My results indicate that the -35 RNA is produced from lacZ reporter constructs electrotransformed into Ciona zygotes but is not produced from the endogenous chromosomal CiTnI gene.

[Figure 10 panels: a) CiTnInuclacZ(-1.5), 5'end structure of transcript (-35 RNA) aligned above the CiTnI/nβgal construct sequence, positions -90 to -30; b) CiTnInuclacZ(-1.5){A(-66)C}, 5'end structure of transcript (-35 RNA) aligned above the CiTnI/nβgal construct sequence, positions -90 to -30.]

Figure 10: 5'end sequences of the -35 RNA transcripts isolated from 12h C. intestinalis embryos electrotransformed with CiTnI/nuclear lacZ constructs. In each case the sequence of the transcript is shown above the sequence of the DNA construct that was used for electroporation. CiTnI sequence is shown in black, the SL is orange and nuclear lacZ is violet. a) The -35 RNA obtained from CiTnInuclacZ(-1.5) is not trans-spliced and begins at position -35. This sequence was obtained by poly(dC)-tailing 5'RACE analysis; b) the -35 RNA species was also found in RNA from CiTnInuclacZ(-1.5){A(-66)C}. It was detected using poly(dC)-tailing 5'RACE and SMART 5'RACE.

Poly(dC)-tailing 5'RACE analysis

Because 5'RACE products consistent with transcription initiation at -35 had been detected using poly(dC)-tailing 5'RACE analysis of CiTnI/nuclacZ mRNAs expressed in electrotransformed embryos (Table 2, Fig. 11), I used the Invitrogen poly(dC)-tailing 5'RACE kit to determine the 5'end of endogenous CiTnI mRNA from adult body wall muscle RNA. This generated a single PCR product of 280 bp, the expected size of a trans-spliced 5'RACE product (Fig. 11); a non-trans-spliced 5'RACE product would have been 230 bp.

Figure 11: Poly(dC)-tailing 5'RACE of endogenous CiTnI mRNA from adult bodywall muscle. PCR products were analyzed by electrophoresis on a 1% agarose gel. Lane 1: 500 ng MBI Fermentas GeneRuler DNA Ladder Mix (SM0331); lane 2: poly(dC)-tailing 5'RACE product of CiTnI using adult body wall RNA (Atlantic) as the template; lane 3: poly(dC)-tailing 5'RACE product of CiTnI using adult body wall RNA (Atlantic) as the template, no terminal transferase; lane 4: poly(dC)-tailing 5'RACE product of CiTnI using adult body wall RNA (Atlantic) as the template, no reverse transcriptase. Ladder bands labelled on the gel: 3000, 2000, 1500, 1031 and 500 bp.

This product was gel-recovered, cloned and sequenced, and 4/4 molecules were found to be trans-spliced at the natural trans-splice acceptor site at -64. Therefore, there is no evidence from 5'RACE experiments supporting the transcription of the -35 RNA from the endogenous CiTnI gene, at least in adult bodywall muscle.

Scan of EST database

In addition to the foregoing 5'RACE-based approach for determining whether or not the -35 RNA is produced from the endogenous CiTnI gene, I analyzed the CiTnI ESTs in the Ghost database (http://ghost.zool.kyoto-u.ac.jp/indexrl.html), which includes >600,000 Ciona intestinalis ESTs from various developmental stages and adult tissues (though not specifically bodywall muscle), for any evidence of a true non-trans-spliced -35 CiTnI transcript. An EST (expressed sequence tag) is a single-read sequencing run from a cDNA clone of an mRNA. The Ghost database contains 339,054 5'ESTs, 157 of which are from CiTnI from all life stages during which the CiTnI gene is expressed, including 16 from the "embryo mix" stage and 13 from the "larval" stage. The cDNA clones used for the EST sequencing effort were not 5'RACE clones, but conventional cDNA clones that are expected to lack at least 12-15 bp from the 5'ends of the corresponding RNAs (Satou et al., 2006). The most abundant CiTnI EST 5'ends are at positions -51 and -53, about 27 to 29 nucleotides shorter than the only known endogenous TnI mRNA, the mRNA trans-spliced at the natural trans-splice acceptor site at -64. This suggests that an mRNA of a given length would show a peak of abundance in EST 5'end sites at the genomic location 27-29 nucleotides downstream of the true mRNA 5'end. If a -35 non-trans-spliced mRNA were expressed in adult or larval muscle cells, we would expect to see a corresponding peak of EST 5'end sites at positions -6 to -8. No such peak was observed (Fig. 12). In fact, the shortest EST 5'end that started before the ATG start site began at -17 (Fig. 12). Thus in a population of 157 CiTnI mRNA molecules there was no evidence for -35 RNAs.
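The expected position of an EST peak for a hypothetical -35 transcript is obtained by applying the same 27-29 nt shortfall seen for the known trans-spliced mRNA. The sketch below restates that calculation; the constant and function names are assumptions for illustration, and the offset is taken from the text rather than derived independently.

```python
# Shortfall (nt) of the most abundant EST 5'ends relative to the true mRNA 5'end,
# as estimated in the text from the mRNA trans-spliced at -64
EST_SHORTFALL_RANGE = (27, 29)

def expected_est_peak(mrna_start, shortfall_range=EST_SHORTFALL_RANGE):
    """Return the (upstream, downstream) gene positions of the expected EST peak."""
    low, high = shortfall_range
    return (mrna_start + low, mrna_start + high)

peak = expected_est_peak(-35)
print(f"Expected EST 5'end peak for a -35 mRNA: {peak[0]} to {peak[1]}")
```

Applying one empirical offset to a transcript of a different structure is itself an assumption of the argument, so the result is best read as an approximate window rather than an exact prediction.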

[Figure 12: bar chart. Horizontal axis: Position in CiTnI gene, from -82 to -1; vertical scale up to 18.]

Figure 12: Distribution of the 5'ends of CiTnI 5'ESTs. The number of CiTnI 5'ESTs that end at a given position is shown. Due to the limitations of conventional cDNA preparation methods, the actual RNA sequence will be at least 12-15 bases longer at the 5'end than what is indicated by an EST.

The apparent absence of non-trans-spliced CiTnI mRNAs initiating at -35 both in the 5'RACE analysis and in the EST analysis suggests that these mRNAs may be generated by experimental gene constructs introduced into zygotes and presumably transcribed as circular plasmid molecules, but not by transcription of the endogenous chromosomal CiTnI gene.

5. DISCUSSION

Initially, the goal of this project had been to create and study the translatability of a non-trans-spliced outron-containing CiTnI mRNA. However, results indicating the presence of cryptic trans-splice acceptor sites within the outron made a study of the distribution of such cryptic acceptor sites imperative. And so, the focus of the present study shifted to the distribution of trans-splice acceptor sites within the CiTnI outron. The idea of creating a non-trans-spliced outron-containing CiTnI mRNA molecule remains a long-term goal for our group (see below).

[Figure 13: sequence map beginning at position -550 (Atcggtaa tcgagaatctgttaaccttgttt...). CiTnI sequence from GenBank AF237978.]

Figure 13: Sequence map of the CiTnI outron. The CiTnI outron is shown above. The transcription start site is illustrated by an upper case, bolded A (first base of sequence shown). All AG dinucleotides found within the outron are in red. Those with experimental evidence of being adjacent to trans-splice acceptor sites are in upper case letters. The AG adjacent to the natural trans-splice acceptor site is circled. The close matches to the branchpoint consensus sequence are underlined. The one true match is also in upper case lettering.

Distribution of Cryptic Trans-splice Acceptor Sites

An adjacent upstream AG dinucleotide contributes to the functional definition of a trans-splice acceptor site, but it is not the only signal required for SL trans-splicing. My results showed that when the AG at -66 was mutated, trans-splicing occurred adjacent to the next AG dinucleotide upstream (CiTnInuclacZ(-1.5){A(-66)C} and CiTnInuclacZ(-1.5){A(-66)T}) or downstream (CiTnInuclacZ(-1.5){A(-66)C}) (Figs. 6 & 13). These results are consistent with those obtained when similar experiments were performed in nematodes (Conrad et al., 1993b) and trypanosomes (Hummel et al., 2000). In both of these reports, trans-splicing was eliminated at the mutated natural trans-splice acceptor site and occurred instead at a cryptic site adjacent to the next AG upstream in the case of the nematode experiment, or downstream in the trypanosomes (Conrad et al., 1993b; Hummel et al., 2000). My findings of similar results in Ciona indicate mechanistic similarities in trans-splicing across wide phylogenetic distances. The appearance of cryptic sites so close to the mutated CiTnI trans-splice acceptor was of concern because it raised the possibility that any AG within the outron might serve to direct trans-splicing, a scenario that would make the experimental approach used here, i.e. the generation of outron-containing mature mRNAs by mutational elimination of trans-splicing, impractical for future studies. There are 33 AGs in the CiTnI outron (Fig. 13). Mutating all of them is not a reasonable experiment for two reasons: (1) mutating 33 AGs is technically impractical; (2) if all 33 were mutated, this would significantly alter the sequence of the outron and the experiment would no longer be testing the effect of retaining the natural outron on eventual protein translatability. The outron 3' truncation constructs were designed to provide insight into the distribution of cryptic trans-splice acceptors within the CiTnI outron.
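Cataloguing every AG dinucleotide in an outron, as was done to arrive at the count of 33, amounts to a simple scan of the sequence with gene coordinates attached. The sketch below shows one way to do this; the function name is hypothetical and the 12-nt fragment is an invented example, not the actual CiTnI sequence from GenBank AF237978.

```python
def find_ag_positions(sequence, first_position):
    """Return the gene coordinate of the A in every AG dinucleotide.

    `first_position` is the (negative) coordinate of the first base of
    `sequence`; the fragment is assumed not to extend past position -1.
    """
    seq = sequence.upper()
    return [
        first_position + i
        for i in range(len(seq) - 1)
        if seq[i:i + 2] == "AG"
    ]

# Invented fragment for illustration only; its first base is assigned position -70
example_fragment = "ctacagcaaccg"
ag_sites = find_ag_positions(example_fragment, first_position=-70)
print(f"{len(ag_sites)} AG dinucleotide(s) found at: {ag_sites}")
```

Run on the full outron sequence with the correct starting coordinate, a scan like this would give the list of candidate positions against which the experimentally observed acceptor sites can be compared.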

The mRNA SL PCR sequencing results obtained from the 3' outron truncation constructs indicated that cryptic acceptors are not widely distributed throughout the outron (Fig. 13). In fact, all three of these constructs and the internal outron partial deletion construct CiTnInuclacZ(-1.5){A(-66)C}Δ-319,-77 used the same cryptic trans-splice acceptor site, at -346 (Fig. 6). This result, coupled with the results from the natural splice site mutants, suggested that trans-splice acceptor sites are clustered. In the case of CiTnI there are two clusters: one, perhaps a single site, at -346 and one in the -77 to -39 region near the natural trans-splice acceptor site at -64. The functional cryptic trans-splice acceptor sites involve only 3 of the 33 AG dinucleotides in the outron.

While an AG is needed for trans-splicing, it is certainly not the only sequence requirement that defines a trans-splice acceptor site. Trans-splice acceptor sites appear to have the same sequence requirements as cis-splice acceptor sites (Davis, 1996). They require a branchpoint A residue, in many organisms found within the consensus YNCTRAY, an AG exon/intron (outron) boundary dinucleotide located downstream of the branchpoint, and an intervening polypyrimidine tract (Reed, 1989). (Nematodes do not have the same requirement for a consensus branchpoint sequence (Blumenthal, 1995).) For the trans-splicing reaction to occur, the acceptor site must not be paired with an upstream splice donor site (Conrad et al., 1993b). The CiTnI outron has the necessary features in the region surrounding the natural acceptor site (Vandenberghe et al., 2001).

My results show that most AGs in the CiTnI outron do not serve as cryptic trans-splice acceptors. This may be because they lack nearby branchpoints and/or polypyrimidine tracts. When mutations at the natural trans-splice acceptor site are made, the natural branchpoint is left intact. The spliceosomal complex assembles on the branchpoint and then, when the natural acceptor site is not present, uses the nearest AG as a cryptic trans-splice acceptor site (Kralovicova et al., 2005). Clustering of trans-splice acceptor sites is likely related to the recognition signals of the various spliceosomal factors, in particular U2AF (Blumenthal, personal communication), which binds the polypyrimidine tract (Berglund et al., 1998a).

The acceptor at -346 is probably the only other location in the outron where all of the sequence elements needed for trans-splicing are present. The putative branchpoint for the -346 acceptor, TACTATT (-362), is not a perfect match to the consensus YNCTRAY (Table 4), but it is located in the vicinity of a pyrimidine-rich sequence tract. This suggests that in Ciona branchpoints do not need to be perfect matches to the consensus to be used. A study done with a mutant variant of the rabbit β-globin second intron demonstrated that a non-consensus branchpoint can be used to give the correct mRNA, but that splicing occurs at a lower efficiency (Padgett et al., 1985).

Because no studies of splicing kinetics were performed in the present study, it is not known whether trans-splicing to the cryptic site at -346 is slower or less efficient than trans-splicing to the natural acceptor site. However, if this is the case, it does not appear to have a major impact on gene expression as assessed by my semi-quantitative histochemical analysis of reporter β-galactosidase enzyme production.

Is Trans-splicing Needed for Protein Expression?

The results obtained in the current study indicate that trans-splicing at the natural acceptor site is not required for protein expression. The natural acceptor mutant constructs were strongly active, with more than half of the embryos falling into the highest β-gal staining category, like the wildtype (Table 3). This was so despite the fact that they were not trans-spliced at the natural acceptor site, but instead used cryptic acceptors.

CiTnInuclacZ(-1.5){A(-66)C}Δ-319,-77 is also strongly active and also uses a cryptic trans-splice acceptor site. There do appear to be differences in expression levels among constructs (see below), although this was not verified by rigorous quantitative methods.

None of the mutations examined eliminated trans-splicing entirely, so we do not know whether trans-splicing, at some location, is required for protein expression. Trans-splicing of polycistronic genes in trypanosomes has been eliminated by mutating the trans-splice acceptor site of the downstream gene, in the process preventing operon resolution and maturation of that transcript (Hummel et al., 2000). No study has eliminated trans-splicing of a monocistronic gene. Conrad et al. attempted such an experiment in nematodes by mutating the AG adjacent to the natural acceptor site of the rol-6 gene, but found that rather than eliminating trans-splicing of the RNA molecule, this merely changed the location of trans-splicing to a cryptic acceptor site 20 nt upstream (Conrad et al., 1993b). In that study, protein function was assessed by the presence of a particular phenotype; they observed no phenotypic differences between the acceptor site mutant and the wildtype (Conrad et al., 1993b). So, while there is definitive evidence in the literature supporting the conclusion that trans-splicing specifically at the natural acceptor site is not needed for protein expression, there is no evidence as to whether the process of trans-splicing is required in general for those monocistronic genes that undergo trans-splicing. Complete elimination of trans-splicing, with resulting production of a non-trans-spliced mRNA containing the entire outron, remains an experimental goal.

Sanitization Hypothesis

Some of my data provide preliminary support for the 5'-untranslated region sanitization hypothesis of SL trans-splicing function. By this hypothesis, the retention of an outron on the mRNA would be deleterious to protein expression. I note that the outron 3' truncation construct with the longest outron segment, CiTnI(-838/-79)nlacZ, had lower β-galactosidase expression than wildtype (Table 3). This is consistent with the sanitization hypothesis (Blumenthal, 1995; Hastings, 2005; Nilsen, 2001), although further studies will be required to confirm this observation and to map the hypothetical deleterious element within the outron.

Since the outron is normally removed, it is free to evolve; it does not have the same constraints as exonic regions of the RNA molecule. Therefore, potentially harmful mutations could accumulate in the outron over evolutionary time, only to be revealed when trans-splicing is artificially inhibited. Aspects of mRNA function that could be interfered with by elements within an experimentally retained outron include mRNA stability, RNA transport out of the nucleus and translation of the final protein.

How to build an outron-containing CiTnI mRNA molecule

The original goal of the project was to create a non-trans-spliced, outron-containing CiTnI RNA that could be used to test the functional importance of trans-splicing. However, the initial results indicated that simple mutation of the natural trans-splice acceptor site did not eliminate trans-splicing because cryptic trans-splice acceptor sites were then used. It therefore became important to determine the distribution of cryptic trans-splice acceptor sites within the outron. Now that we know that not all AGs function as cryptic trans-splice acceptor sites, and that SL trans-splicing appears to be driven by additional elements, perhaps including branchpoints and/or polypyrimidine tracts, which are only present in two places in the CiTnI outron, there is sufficient evidence to attempt a new strategy to tackle the goal of creating a non-trans-spliced CiTnI RNA molecule.
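The idea that an AG is only a plausible acceptor when a branchpoint-like sequence and a pyrimidine-rich tract lie just upstream can be sketched as a simple sequence screen. Everything numeric here is an assumption made for illustration: the 50-nt upstream window, the allowance of one mismatch to the YNCTRAY consensus, and the definition of a polypyrimidine tract as at least 70% C/T in a 10-nt window. The function names and the toy sequence are hypothetical, and positions are 0-based indices into the supplied string rather than outron coordinates.

```python
import re

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "N": "ACGT"}
CONSENSUS = "YNCTRAY"  # branchpoint consensus quoted in the text


def n_mismatches(site, consensus=CONSENSUS):
    """Count positions where a 7-mer violates the IUPAC consensus."""
    return sum(base not in IUPAC[code] for base, code in zip(site, consensus))


def has_branchpoint(upstream, max_mismatch=1):
    """True if any 7-mer in `upstream` is within `max_mismatch` of the consensus."""
    k = len(CONSENSUS)
    return any(
        n_mismatches(upstream[i:i + k]) <= max_mismatch
        for i in range(len(upstream) - k + 1)
    )


def has_py_tract(upstream, width=10, min_fraction=0.7):
    """True if any window of `width` nt is at least `min_fraction` C/T (assumed cutoff)."""
    return any(
        sum(b in "CT" for b in upstream[i:i + width]) >= min_fraction * width
        for i in range(len(upstream) - width + 1)
    )


def candidate_acceptors(seq, window=50):
    """Return 0-based indices of AGs with both elements in the upstream window."""
    seq = seq.upper()
    hits = []
    for m in re.finditer("AG", seq):
        upstream = seq[max(0, m.start() - window):m.start()]
        if has_branchpoint(upstream) and has_py_tract(upstream):
            hits.append(m.start())
    return hits


# Toy sequence (not from CiTnI): several AGs, but only the last one follows
# a consensus branchpoint and a pyrimidine-rich tract.
toy = "GAGAGAGAGA" + "TACTAAC" + "TTTCTTTCTT" + "AG" + "GGGG"
print(candidate_acceptors(toy))  # [27]
```

Because a short degenerate consensus occurs frequently by chance, a screen of this kind can only narrow down candidates; it cannot establish that a site is used.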

Clearly, mutating the natural trans-splice acceptor site is not sufficient to eliminate trans-splicing. In my experiments three cryptic trans-splice acceptor sites were revealed. This is a common result for splicing experiments, both cis- and trans-, where the acceptor site has been mutated (Aebi et al., 1986; Conrad et al., 1993b; Hummel et al., 2000; Kuiper et al., 1988; Macklin et al., 1987; Reed and Maniatis, 1985). Cryptic acceptors are often located near the real acceptor, likely because weak acceptors are allowed to evolve in the shadow of strong acceptors (T. Blumenthal, personal communication), or because of the requirement for a nearby branchpoint and/or polypyrimidine tract.

An alternative strategy to mutating the acceptor-defining AG dinucleotides would be to mutate the branchpoints. Branchpoints have more complex sequence requirements, so they are less likely to arise strictly by chance (although a computational study did find that the consensus branchpoint is distributed with near random frequency throughout the gene sequence, with only a very slight bias to be located near splice acceptors (Harris and Senapathy, 1990)). Branchpoint sequence requirements are not straightforward. In the CiTnI gene, the only complete match to the consensus branchpoint is the putative natural branchpoint at -82 (Vandenberghe et al., 2001). This site is probably also used for trans-splicing to the nearby cryptic sites at -76 and -35. However, if it is used for trans-splicing at -76, then this is a shorter branchpoint-acceptor distance (6 nt) than has been reported in the literature to date; branchpoints are commonly found 10 to 50 nt away from the acceptor site (Harris and Senapathy, 1990).
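The branchpoint-acceptor spacings implied by the coordinates quoted in this Discussion can be tabulated in a few lines. This is a minimal sketch that assumes the spacing is simply the difference between the two quoted coordinates, which reproduces the 6 nt figure given above; the dictionary layout and function names are hypothetical.

```python
# Coordinates quoted in the text (negative = position in the CiTnI outron).
# Each putative branchpoint is listed with the acceptors it may serve.
BRANCHPOINT_TO_ACCEPTORS = {
    -82: [-76, -66, -35],   # natural branchpoint; -66 is the natural acceptor
    -362: [-346],           # putative branchpoint for the -346 cryptic acceptor
}

TYPICAL_RANGE = (10, 50)  # nt, range cited from Harris and Senapathy (1990)


def spacing(branchpoint: int, acceptor: int) -> int:
    """Distance in nt from branchpoint to acceptor (simple coordinate difference)."""
    return acceptor - branchpoint


def in_typical_range(distance: int, bounds=TYPICAL_RANGE) -> bool:
    low, high = bounds
    return low <= distance <= high


for bp, acceptors in BRANCHPOINT_TO_ACCEPTORS.items():
    for acc in acceptors:
        d = spacing(bp, acc)
        note = "within" if in_typical_range(d) else "outside"
        print(f"branchpoint {bp} -> acceptor {acc}: {d} nt ({note} 10-50 nt)")
```

On these coordinates only the -82/-76 pairing (6 nt) falls outside the commonly reported range; the other pairings are 16 nt, 47 nt and 16 nt.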

Activation of cryptic branchpoints is also a concern if this strategy were to be followed. Several cis-splicing experiments conducted in the human β-globin gene first intron found that mutations to the branchpoint, whether naturally occurring or synthetic, led to the recruitment of a cryptic branchpoint (Padgett et al., 1985; Reed and Maniatis, 1985; Ruskin et al., 1985). The cryptic branchpoints were not necessarily a match to the consensus, but were usually very close (Padgett et al., 1985; Reed and Maniatis, 1985). In all cases, the splicing reaction accurately joined the two exons, so the final mRNA product was not affected (Padgett et al., 1985; Reed and Maniatis, 1985; Ruskin et al., 1985).

I believe that the best way to create an outron-containing CiTnI molecule is to make branchpoint-acceptor pair mutations. For each acceptor cluster, the branchpoint and the natural or main cryptic acceptor site would be mutated. A potential strategy for branchpoint mutations would be site-directed mutagenesis of all A residues within the branchpoint consensus (or near-consensus) sequence. It is unlikely that both a cryptic branchpoint and a cryptic acceptor will exist in the same region. In the case of CiTnI, this would mean mutating the branchpoint at -82 together with the acceptor at -66, and the branchpoint at -362 together with the acceptor at -346.
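The proposed paired mutations can be illustrated at the sequence level. This is only a sketch of the idea: the replacement bases are arbitrary assumptions (A to G in the branchpoint; A to C in the acceptor, mirroring the A-to-C change in the {A(-66)C} construct name), the function names are hypothetical, and only the -362 branchpoint sequence TACTATT quoted earlier is used because the -82 sequence is not reproduced in this section.

```python
def mutate_branchpoint(site: str, replacement: str = "G") -> str:
    """Replace every A residue in a branchpoint-like site.

    The replacement base is an arbitrary choice made for illustration.
    """
    return site.upper().replace("A", replacement)


def mutate_acceptor(dinucleotide: str = "AG", replacement: str = "C") -> str:
    """Mutate the A of an AG acceptor dinucleotide (default A to C)."""
    if dinucleotide.upper() != "AG":
        raise ValueError("expected an AG acceptor dinucleotide")
    return replacement + "G"


# Branchpoint-acceptor pairs named in the text (outron coordinates).
PAIRS = [(-82, -66), (-362, -346)]

print(mutate_branchpoint("TACTATT"))  # TGCTGTT (putative -362 branchpoint)
print(mutate_acceptor("AG"))          # CG
```

Any real design would also need to confirm that the substitutions do not themselves create a new AG or a new branchpoint-like sequence nearby.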

Problems Still Remaining and Future Studies

Two problems remain to be solved. The first is to determine conclusively the origin and significance (biological or technical) of the -35 RNA. The second is to develop an internal electroporation control for rigorous quantitative studies of construct reporter gene expression levels.

-35 RNA

The results of my experiments suggest that the -35 RNA is not a product of an alternate promoter that is functional in the endogenous CiTnI gene; rather, they suggest that the -35 RNA is a product of the CiTnI nuclear lacZ fusion constructs. There is probably an artificial enhancer or promoter created when CiTnI is placed in the context of a circular plasmid. Unfortunately, if this is indeed the case, then it makes interpreting the β-galactosidase staining difficult because there are now two RNA species present and it is not known which of these is leading to production of the functional protein. Further studies need to be done to determine whether the -35 RNA is capable of yielding functional protein. These experiments could be done by in vitro transcription/translation assays.

Internal Electroporation Control

The second issue that needs to be addressed is that of an internal electroporation control. In all of the Ciona electroporation experiments performed to date by members of our laboratory, no internal control has been used. Interpretation of results would be greatly aided if such a control were in place, as it would allow for more accurate quantitative comparisons of gene expression levels between constructs. A good internal control could be generated by using the CiTnI DNA from CiTnInuclacZ(-1.5), but replacing lacZ with some other reporter gene (placental alkaline phosphatase, luciferase, green fluorescent protein) that could be detected by staining/fluorescence (semi-quantitative) or a homogenate enzyme assay (quantitative).

Future studies using an internal control and constructs carrying branchpoint-acceptor pair mutations should provide the necessary means of generating data to answer whether SL trans-splicing is required for protein expression.

6. REFERENCES

Aebi, M., Hornig, H., Padgett, R. A., Reiser, J., and Weissmann, C. (1986). Sequence requirements for splicing of higher eukaryotic nuclear pre-mRNA. Cell 47, 555-565.

Alberts, B. (2002). Molecular biology of the cell, 4th edn (New York, Garland Science).

Berglund, J. A., Abovich, N., and Rosbash, M. (1998a). A cooperative interaction between U2AF65 and mBBP/SF1 facilitates branchpoint region recognition. Genes Dev 12, 858-867.

Berglund, J. A., Chua, K., Abovich, N., Reed, R., and Rosbash, M. (1997). The splicing factor BBP interacts specifically with the pre-mRNA branchpoint sequence UACUAAC. Cell 89, 781-787.

Berglund, J. A., Fleming, M. L., and Rosbash, M. (1998b). The KH domain of the branchpoint sequence binding protein determines specificity for the pre-mRNA branchpoint sequence. Rna 4, 998-1006.

Blaxter, M., and Liu, L. (1996). Nematode spliced leaders--ubiquity, evolution and utility. Int J Parasitol 26, 1025-1033.

Blumenthal, T. (1995). Trans-splicing and polycistronic transcription in Caenorhabditis elegans. Trends Genet 11, 132-136.

Blumenthal, T. (2006). personal communication.

Boukis, L. A., and Bruzik, J. P. (2001). Functional selection of splicing enhancers that stimulate trans-splicing in vitro. Rna 7, 793-805.

Brehm, K., Hubert, K., Sciutto, E., Garate, T., and Frosch, M. (2002). Characterization of a spliced leader gene and of trans-spliced mRNAs from Taenia solium. Mol Biochem Parasitol 122, 105-110.

Bruzik, J. P., Van Doren, K., Hirsh, D., and Steitz, J. A. (1988). Trans splicing involves a novel form of small nuclear ribonucleoprotein particles. Nature 335, 559-562.

Campbell, D. A., Thornton, D. A., and Boothroyd, J. C. (1984). Apparent discontinuous transcription of Trypanosoma brucei variant surface antigen genes. Nature 311, 350-355.

Celniker, S. E., and Rubin, G. M. (2003). The Drosophila melanogaster genome. Annu Rev Genomics Hum Genet 4, 89-117.

Champion-Arnaud, P., Gozani, O., Palandjian, L., and Reed, R. (1995). Accumulation of a novel spliceosomal complex on pre-mRNAs containing branch site mutations. Mol Cell Biol 15, 5750-5756.

Cheng, G., Cohen, L., Ndegwa, D., and Davis, R. E. (2006). The flatworm spliced leader 3'-terminal AUG as a translation initiator methionine. J Biol Chem 281, 733-743.

Chiba, S., Awazu, S., Itoh, M., Chin-Bow, S. T., Satoh, N., Satou, Y., and Hastings, K. E. (2003). A genomewide survey of developmentally relevant genes in Ciona intestinalis. IX. Genes for muscle structural proteins. Dev Genes Evol 213, 291-302.

Cleto, C. L. (2002) Analysis of transcriptional elements of an ascidian troponin I gene, MSc., McGill University, Montreal.

Cleto, C. L., Vandenberghe, A. E., MacLean, D. W., Pannunzio, P., Tortorelli, C., Meedel, T. H., Satou, Y., Satoh, N., and Hastings, K. E. (2003). Ascidian larva reveals ancient origin of vertebrate-skeletal-muscle troponin I characteristics in chordate locomotory muscle. Mol Biol Evol 20, 2113-2122.

Conrad, R., Liou, R. F., and Blumenthal, T. (1993a). Conversion of a trans-spliced C. elegans gene into a conventional gene by introduction of a splice donor site. Embo J 12, 1249-1255.

Conrad, R., Liou, R. F., and Blumenthal, T. (1993b). Functional analysis of a C. elegans trans-splice acceptor. Nucleic Acids Res 21, 913-919.

Conrad, R., Thomas, J., Spieth, J., and Blumenthal, T. (1991). Insertion of part of an intron into the 5' untranslated region of a Caenorhabditis elegans gene converts it into a trans-spliced gene. Mol Cell Biol 11, 1921-1926.

Das, R., Zhou, Z., and Reed, R. (2000). Functional association of U2 snRNP with the ATP-independent spliceosomal complex E. Mol Cell 5, 779-787.

Davis, R. E. (1996). Spliced leader RNA trans-splicing in metazoa. Parasitol Today 12, 33-40.

Davis, R. E., Hardwick, C., Tavernier, P., Hodgson, S., and Singh, H. (1995). RNA trans-splicing in flatworms. Analysis of trans-spliced mRNAs and genes in the human parasite, Schistosoma mansoni. J Biol Chem 270, 21813-21819.

Dehal, P., Satou, Y., Campbell, R. K., Chapman, J., Degnan, B., De Tomaso, A., Davidson, B., Di Gregorio, A., Gelpke, M., Goodstein, D. M., et al. (2002). The draft genome of Ciona intestinalis: insights into chordate and vertebrate origins. Science 298, 2157-2167.

Delsuc, F., Brinkmann, H., Chourrout, D., and Philippe, H. (2006). Tunicates and not cephalochordates are the closest living relatives of vertebrates. Nature 439, 965-968.

Denker, J. A., Zuckerman, D. M., Maroney, P. A., and Nilsen, T. W. (2002). New components of the spliced leader RNP required for nematode trans-splicing. Nature 417, 667-670.

Farah, C. S., and Reinach, F. C. (1995). The troponin complex and regulation of muscle contraction. Faseb J 9, 755-767.

Ganot, P., Kallesoe, T., Reinhardt, R., Chourrout, D., and Thompson, E. M. (2004). Spliced-leader RNA trans splicing in a chordate, Oikopleura dioica, with a compact genome. Mol Cell Biol 24, 7795-7805.

Goyal, K., Browne, J. A., Burnell, A. M., and Tunnacliffe, A. (2005). Dehydration-induced tps gene transcripts from an anhydrobiotic nematode contain novel spliced leaders and encode atypical GT-20 family proteins. Biochimie 87, 565-574.

Gu, X., Wang, Y., and Gu, J. (2002). Age distribution of human gene families shows significant roles of both large- and small-scale duplications in vertebrate evolution. Nat Genet 31, 205-209.

Hannon, G. J., Maroney, P. A., Ayers, D. G., Shambaugh, J. D., and Nilsen, T. W. (1990). Transcription of a nematode trans-spliced leader RNA requires internal elements for both initiation and 3' end-formation. Embo J 9, 1915-1921.

Hannon, G. J., Maroney, P. A., and Nilsen, T. W. (1991). U small nuclear ribonucleoprotein requirements for nematode cis- and trans-splicing in vitro. J Biol Chem 266, 22792-22795.

Harris, N. L., and Senapathy, P. (1990). Distribution and consensus of branch point signals in eukaryotic genes: a computerized statistical analysis. Nucleic Acids Res 18, 3015-3019.

Hastings, K. E. (1997). Molecular evolution of the vertebrate troponin I gene family. Cell Struct Funct 22, 205-211.

Hastings, K. E. (2005). SL trans-splicing: easy come or easy go? Trends Genet 21, 240-247.

Horton, R. M. (1995). PCR-mediated recombination and mutagenesis. SOEing together tailor-made genes. Mol Biotechnol 3, 93-99.

Hummel, H. S., Gillespie, R. D., and Swindle, J. (2000). Mutational analysis of 3' splice site selection during trans-splicing. J Biol Chem 275, 35522-35531.

Jamison, S. F., Crow, A., and Garcia-Blanco, M. A. (1992). The spliceosome assembly pathway in mammalian extracts. Mol Cell Biol 12, 4279-4287.

Johnson, P. J., Kooter, J. M., and Borst, P. (1987). Inactivation of transcription by UV irradiation of T. brucei provides evidence for a multicistronic transcription unit including a VSG gene. Cell 51, 273-281.

Khare, P. (2005) Cis-regulatory elements driving muscle-specific expression of an ascidian troponin I gene, MSc., McGill University, Montreal.

Kralovicova, J., Christensen, M. B., and Vorechovsky, I. (2005). Biased exon/intron distribution of cryptic and de novo 3' splice sites. Nucleic Acids Res 33, 4882-4898.

Krause, M., and Hirsh, D. (1987). A trans-spliced leader sequence on actin mRNA in C. elegans. Cell 49, 753-761.

Kuersten, S., Lea, K., MacMorris, M., Spieth, J., and Blumenthal, T. (1997). Relationship between 3' end formation and SL2-specific trans-splicing in polycistronic Caenorhabditis elegans pre-mRNA processing. Rna 3, 269-278.

Kuiper, M. T., Holtrop, M., Vennema, H., Lambowitz, A. M., and de Vries, H. (1988). A 3' splice site mutation in a nuclear gene encoding a mitochondrial ribosomal protein in Neurospora crassa. J Biol Chem 263, 2848-2852.

Kusakabe, T. (2005). Decoding cis-regulatory systems in ascidians. Zoolog Sci 22, 129-146.

Lall, S., Friedman, C. C., Jankowska-Anyszka, M., Stepinski, J., Darzynkiewicz, E., and Davis, R. E. (2004). Contribution of trans-splicing, 5'-leader length, cap-poly(A) synergism, and initiation factors to nematode translation in an Ascaris suum embryo cell-free system. J Biol Chem 279, 45573-45585.

Lee, M. G., and Van der Ploeg, L. H. (1997). Transcription of protein-coding genes in trypanosomes by RNA polymerase I. Annu Rev Microbiol 51, 463-489.

Ley, T. J., Anagnou, N. P., Pepe, G., and Nienhuis, A. W. (1982). RNA processing errors in patients with beta-thalassemia. Proc Natl Acad Sci USA 79, 4775-4779.

Lücke, S., Xu, G. L., Palfi, Z., Cross, M., Bellofatto, V., and Bindereif, A. (1996). Spliced leader RNA of trypanosomes: in vivo mutational analysis reveals extensive and distinct requirements for trans splicing and cap4 formation. Embo J 15, 4380-4391.

Macklin, W. B., Gardinier, M. V., King, K. D., and Kampf, K. (1987). An AG----GG transition at a splice site in the myelin proteolipid protein gene in jimpy mice results in the removal of an exon. FEBS Lett 223, 417-421.

MacLean, D. W., Meedel, T. H., and Hastings, K. E. (1997). Tissue-specific alternative splicing of ascidian troponin I isoforms. Redesign of a protein isoform-generating mechanism during chordate evolution. J Biol Chem 272, 32115-32120.

Mair, G., Shi, H., Li, H., Djikeng, A., Aviles, H. O., Bishop, J. R., Falcone, F. H., Gavrilescu, C., Montgomery, J. L., Santori, M. I., et al. (2000). A new twist in trypanosome RNA metabolism: cis-splicing of pre-mRNA. Rna 6, 163-169.

Maniatis, T., and Reed, R. (2002). An extensive network of coupling among gene expression machines. Nature 416, 499-506.

Maroney, P. A., Denker, J. A., Darzynkiewicz, E., Laneve, R., and Nilsen, T. W. (1995). Most mRNAs in the nematode Ascaris lumbricoides are trans-spliced: a role for spliced leader addition in translational efficiency. Rna 1, 714-723.

Maroney, P. A., Hannon, G. J., Denker, J. A., and Nilsen, T. W. (1990). The nematode spliced leader RNA participates in trans-splicing as an Sm snRNP. Embo J 9, 3667-3673.

McLysaght, A., Hokamp, K., and Wolfe, K. H. (2002). Extensive genomic duplication during early chordate evolution. Nat Genet 31, 200-204.

Meedel, T. H. (1997). Development of ascidian muscles and their evolutionary relationship to other chordate muscle types. In Reproductive Biology of Invertebrates, K. G. Adiyodi and R. G. Adiyodi, eds.

Mount, S. M. (1983). RNA processing. Sequences that signal where to splice. Nature 304, 309-310.

Muhich, M. L., and Boothroyd, J. C. (1988). Polycistronic transcripts in trypanosomes and their accumulation during heat shock: evidence for a precursor role in mRNA synthesis. Mol Cell Biol 8, 3837-3846.

Nevitt, G., and Gilly, W. F. (1986). Morphological and physiological properties of non-striated muscle from the tunicate, Ciona intestinalis: parallels with vertebrate skeletal muscle. Tissue Cell 18, 341-360.

Nilsen, T. W. (2001). Evolutionary origin of SL-addition trans-splicing: still an enigma. Trends Genet 17, 678-680.

Padgett, R. A., Konarska, M. M., Aebi, M., Hornig, H., Weissmann, C., and Sharp, P. A. (1985). Nonconsensus branch-site sequences in the in vitro splicing of transcripts of mutant rabbit beta-globin genes. Proc Natl Acad Sci USA 82, 8349-8353.

Parsons, M., Nelson, R. G., Watkins, K. P., and Agabian, N. (1984). Trypanosome mRNAs share a common 5' spliced leader sequence. Cell 38, 309-316.

Peled-Zehavi, H., Berglund, J. A., Rosbash, M., and Frankel, A. D. (2001). Recognition of RNA branch point sequences by the KH domain of splicing factor 1 (mammalian branch point binding protein) in a splicing factor complex. Mol Cell Biol 21, 5232-5241.

Pouchkina-Stantcheva, N. N., and Tunnacliffe, A. (2005). Spliced leader RNA-mediated trans-splicing in phylum Rotifera. Mol Biol Evol 22, 1482-1489.

Rajkovic, A., Davis, R. E., Simonsen, J. N., and Rottman, F. M. (1990). A spliced leader is present on a subset of mRNAs from the human parasite Schistosoma mansoni. Proc Natl Acad Sci USA 87, 8879-8883.

Reed, R. (1989). The organization of 3' splice-site sequences in mammalian introns. Genes Dev 3, 2113-2123.

Reed, R., and Maniatis, T. (1985). Intron sequences involved in lariat formation during pre-mRNA splicing. Cell 41, 95-105.

Reed, R., and Maniatis, T. (1988). The role of the mammalian branchpoint sequence in pre-mRNA splicing. Genes Dev 2, 1268-1276.

Romfo, C. M., Maroney, P. A., Wu, S., and Nilsen, T. W. (2001). 3' splice site recognition in nematode trans-splicing involves enhancer-dependent recruitment of U2 snRNP. Rna 7, 785-792.

Ruskin, B., Greene, J. M., and Green, M. R. (1985). Cryptic branch point activation allows accurate in vitro splicing of human beta-globin intron mutants. Cell 41, 833-844.

Satoh, N. (2003). The ascidian tadpole larva: comparative molecular development and genomics. Nat Rev Genet 4, 285-295.

Satoh, N., and Jeffery, W. R. (1995). Chasing tails in ascidians: developmental insights into the origin and evolution of chordates. Trends Genet 11, 354-359.

Satou, Y., Hamaguchi, M., Takeuchi, K., Hastings, K. E., and Satoh, N. (2006). Genomic overview of mRNA 5'-leader trans-splicing in the ascidian Ciona intestinalis. Nucleic Acids Res 34, 3378-3388.

Satou, Y., Takatori, N., Yamada, L., Mochizuki, Y., Hamaguchi, M., Ishikawa, H., Chiba, S., Imai, K., Kano, S., Murakami, S. D., et al. (2001). Gene expression profiles in Ciona intestinalis tailbud embryos. Development 128, 2893-2904.

Stover, N. A., and Steele, R. E. (2001). Trans-spliced leader addition to mRNAs in a cnidarian. Proc Natl Acad Sci USA 98, 5693-5698.

Tessier, L. H., Keller, M., Chan, R. L., Fournier, R., Weil, J. H., and Imbault, P. (1991). Short leader sequences may be transferred from small RNAs to pre-mature mRNAs by trans-splicing in Euglena. Embo J 10, 2621-2625.

Thomas, J. D., Conrad, R. C., and Blumenthal, T. (1988). The C. elegans trans-spliced leader RNA is bound to Sm and has a trimethylguanosine cap. Cell 54, 533-539.

Van Doren, K., and Hirsh, D. (1988). Trans-spliced leader RNA exists as small nuclear ribonucleoprotein particles in Caenorhabditis elegans. Nature 335, 556-559.

Vandenberghe, A. E., Meedel, T. H., and Hastings, K. E. (2001). mRNA 5'-leader trans-splicing in the chordates. Genes Dev 15, 294-303.

Will, C. L., and Luhrmann, R. (2001). Spliceosomal UsnRNP biogenesis, structure and function. Curr Opin Cell Biol 13, 290-301.

Zayas, R. M., Bold, T. D., and Newmark, P. A. (2005). Spliced-leader trans-splicing in freshwater planarians. Mol Biol Evol 22, 2048-2054.

Zhu, Y. Y., Machleder, E. M., Chenchik, A., Li, R., and Siebert, P. D. (2001). Reverse transcriptase template switching: a SMART approach for full-length cDNA library construction. Biotechniques 30, 892-897.

Zorio, D. A., and Blumenthal, T. (1999). Both subunits of U2AF recognize the 3' splice site in Caenorhabditis elegans. Nature 402, 835-838.

APPENDIX I:

PCR THERMAL CYCLING CONDITIONS

In every case, the final 72°C step was extended by 12 additional minutes, following which reactions were cooled to 4°C until they could be analyzed or stored at -20°C.

CiTnInuclacZ(-1.5){A(-66)C} and CiTnInuclacZ(-1.5){A(-66)T}

Left half: 95°C 1min, 45°C 1min, 72°C 2min (30 cycles)

Right half: 95°C 1min, 50°C 1min, 72°C 2min (30 cycles)

SOEing: 95°C 1min, 50°C 1min, 72°C 2min (30 cycles)

CiTnInuclacZ(-1.5){A(-66)C}Δ-319,-76

Upstream half: 95°C 1min, 42°C 1min, 72°C 2min (30 cycles)

Downstream half: 95°C 1min, 45°C 1min, 72°C 2min (20 cycles)

CiTnI(-838/-79)nlacZ, CiTnI(-838/-177)nlacZ and CiTnI(-838/-262)nlacZ

94°C 1min, 55°C 1min, 72°C 2min (20 cycles)

SL PCR

94°C 30s, 57°C 30s, 72°C 2min (30 or 40 cycles)

SMART 5'RACE PCR CiTnI/βgal

95°C 30s, 55°C 30s, 72°C 4min (22 (9185 & 9193) or 30 (9112 & 9178) cycles)

Poly(dC)-tailed 5'RACE PCR 1 and Nested PCR

94°C 2min; then 94°C 1min, 55°C 1min, 72°C 2min (35 cycles)
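The cycling programs listed in this appendix can be written down as small data records, which also makes it easy to estimate how long each program holds at temperature. The sketch below is illustrative: the class and field names are hypothetical, only three of the programs are encoded, the 12-minute final extension is taken from the opening sentence of this appendix, and ramp times and the 4°C hold are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int
    seconds: int


@dataclass(frozen=True)
class Program:
    name: str
    cycle: tuple                    # Steps repeated every cycle
    cycles: int
    initial: tuple = ()             # Steps run once before cycling
    final_extension_s: int = 12 * 60  # extra time at 72°C after the last cycle

    def hold_minutes(self) -> float:
        """Total programmed hold time in minutes (ramping not included)."""
        per_cycle = sum(step.seconds for step in self.cycle)
        pre = sum(step.seconds for step in self.initial)
        return (pre + per_cycle * self.cycles + self.final_extension_s) / 60


left_half = Program(
    name="Left half, {A(-66)C} and {A(-66)T}",
    cycle=(Step(95, 60), Step(45, 60), Step(72, 120)),
    cycles=30,
)
sl_pcr_30 = Program(
    name="SL PCR (30 cycles)",
    cycle=(Step(94, 30), Step(57, 30), Step(72, 120)),
    cycles=30,
)
race_dc = Program(
    name="Poly(dC)-tailed 5'RACE PCR",
    cycle=(Step(94, 60), Step(55, 60), Step(72, 120)),
    cycles=35,
    initial=(Step(94, 120),),
)

for prog in (left_half, sl_pcr_30, race_dc):
    print(f"{prog.name}: {prog.hold_minutes():.0f} min")
```

With these inputs the three programs come to 132, 102 and 154 minutes of programmed hold time, respectively.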

APPENDIX II:

EXAMPLES OF RAW SEQUENCE DATA

This section includes examples of raw sequence data files generated at the Sheldon Biotechnology Centre of McGill University. One example from each of the following methods is included: (i) SL PCR, (ii) poly(dC)-tailing 5'RACE, and (iii) SMART. When traces and raw sequence files were received from Sheldon, they were aligned with the sequence of the plasmid that had been used to electrotransform the Ciona zygotes. This was done to localize the Ciona and lacZ portions of the sequence data. Once these were found, I looked for the SL sequence at the 5' end of the Ciona sequence (as determined on the + strand). After determining where and if trans-splicing had taken place, the PCR primer and/or cloning vector sequences were located. All of these elements are colour-coded in the raw sequence data according to the scheme shown below.

Poly(dC) tail (poly(dC)-tailing method only)

Mutated natural acceptor site (where applicable)

Bluescript II SK+ DNA

Ciona TnI DNA

nuclacZ DNA

(i) SL PCR

This is sequence data obtained for the uncloned 550 bp SL PCR product obtained from RNA extracted from zygotes that had been electrotransformed with CiTnInuclacZ(-1.5). Sequencing was done with oligo 9055 (Table 1), which was also the leftward primer used to generate the SL PCR product.

93 >C:\Program Files\Molecular Dynamics\MegaBACE\AnalyzedData/T050907RunOl Cp312_MD1/643(550bp)- 9055.esd 514 MegaBACE CC:CC;jU\Cl\l\c:r;C;( :C;C7\TC!\CC;T f\TC;C;e ;!\'T' IV; c; 'S'CI\CC; T 'l'C;( Tl;'l')\r : i,' ['i;r (;r :C!\C 'l']' ']'(;/\(;( :C:C;,f\C1:l\CCN:;\I:;Tl\' ['( :C~C;CCTC1\GC;M\C;PT( :C( 'j\( "[ '( '1 'i'C:'I'(;CTC;c:r: r;C:!\!\l\CCJ\C;C;C,Ell\ACC:C ;O:J\T"I'CCCCATTCf\C CC'l'CC'C; G\N "~l' ( 'l' 'l'CI,;e; /\I\i ;C;, ;1 ': ;l'l' [II ,i : c;( :C;CC;C;CTC :']"]'cc;r 'Tl', T TJ\C;(;r:Cl\(~(:TC;(jCl ;JV~/\C(;( ;(;C1\'j (;'1' r;Ci :;\/\: ;r ;r:; :/\';";'/\/\: ; '1 '1'1 :1 il: '1' li:: ( 'r C:J\(;c;e ;TTTTCCCJ\C;TCl\CC{.\CC;T 'l'r:Tjl,Af\1\CCACCC;(;l\ 'l'CC, ATC' 'ri :( !' I\C :, \((C:c;r:r ;T!' ( '1'·:"1'1 ;1 C;C!\I\T/'\ TCC;CCC):Tr'J\CT'j'r :C[IC;C;TC;C'L'e;T T 'l'CT(;(;TC'I' TC 1',(:(:c :N(:rX;'] 'J\I '("[''1' !\('(;(: TTr ''1'' [" 1'1 " ' I\C;Cl\C;'j'C!\'["['T'['TTe, ,N;C:'i'C:I;C;'i'/\C:CC/\GCCAAACTACAAGACCTACTTAACCAATTGATACCG

(ii) poly(dC)-tailing 5'RACE

The next two sequences are from the opposite strands of a poly(dC)-tailing 5'RACE clone obtained from zygotes electrotransformed with CiTnInuclacZ(-1.5){A(-66)T}. Sequencing was primed from flanking vector regions using T7 (first sequence) or T3 (second sequence) primers. The second sequence corresponds to the 'forward strand'.
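That the T7 and T3 reads come from opposite strands of the same clone can be checked by reverse-complementing a stretch of one read and comparing it with the other. The sketch below uses two short fragments copied from legible portions of the reads in this section, with extraction spaces removed; the variable and function names are hypothetical, and the fragments cover only a small part of each read.

```python
# Translation table for complementing DNA bases (N is left as N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Fragment from the T3 read (legible stretch near its 5' end).
t3_fragment = "TTTCATCTACTGCAACCGGTATCAATTGG"
# Fragment from the T7 read (legible stretch preceding the poly(dC) region).
t7_fragment = "CCAATTGATACCGGTTGCAGTAGATGAAA"

print(reverse_complement(t3_fragment) == t7_fragment)  # True
```

The same function can be applied to a full read before aligning it with the plasmid sequence, so that both reads are compared on the same strand.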

>C:\Program Files\Molecular Dynamics\MegaBACE\AnalyzedData/T051005RunOl_Cp312_MD1/643(400)7-T7.esd 722 MegaBACE AGGGACGGGCCCCCTCAGGCTCACGGTATCGATAAGCTTGATATCGAATTCTGCAGCCCii "\11:1 :i[ [ T(;r;1'( ;('(~'I'C!\/\fI(-'c!\(:e ;C:!\1\!\C;I~'c;c:r i\T'I'CGCCJ\TT (:/\( ;(;("i'CC ';iJ\/V "['( :T TI :i ;(;l'J\C :C :1 ;1 '1 ' C;C;CCC'J'CTTCI ; Cl' /'\T'I!.\CC:CTJ\GC'J'CC;CC;AAI\GCc,:C;C;;\'[C;'[' ~;C:T':C'.L\l\(:C;(( ;r,TTl\!\C:' l" ['( ;( cc: [jl/\ii :11 '\ CC;C;'I"T'TTCCC:l~(;'jTl\(:Cj\C(;'f"I'C;T,T\/\I\AC:C;l\CCCC;Al'(:(;/\'j'("rn:;( T/\('/\C ;('(:'i"['(;/I/\/\I" IC" ((,:i ((' /\f\T'l\T(~(;e :cc;e 'TC1\C,],' ['CCJ\CCTCC'['r;'l' T'TC l'CerC' T'['efICe :C/ICC'(;i:'[ !V 'i ''[''l' N:C:C'!" l' ("[' '['," l'l' 1'1 (1 :i, ; C/\C:TC/\'I'TTTi'i'CTCt\c;ccr:c;e;'I'f\(;C:C:PJ;CCAAACTACAAGACCTACTTAACCAATTGATACCGGTTGCAG TAGATGAAA cc :CC:CC:CC:CCC:C(" 'TCTAGAGCGGCCGCACCGCGGTGG AGCTCCAGCTTTTGTTCCCTTTAGTGAGGGTTAATTGCGCGCTTGGCGTAATCATGGTCATAGCTGTTTCC TGTGTGAAATTGTTATCCGCTCACAATTCCACACAACATACGAGCCGGAAGCATAAAGTGTAAAGCCTGGG GTGCCTAATGAGTGAGCTAACTCACATTAATTGCGTTGCGCTCACTGCCCGCTTTCCAGTCGGGAAACCTG TCGTGCCAGCTG

>C:\Program Files\Molecular Dynamics\MegaBACE\AnalyzedData/T051 005RunO 1_ Cp312 _MD 1/643 (400)7- T3 .esd 705 MegaBACE AGAGTCACGGGTTGGCGGCGCTCTATA . ';' ··C;C;C;CC;O:C;c;:;1 TTTCAT CTACTGCAACCGGTATCAATTGGTTAAGTAGGTCTTGTAGTTTGGI' Il ;(;(;':/\1 '( '1 ;I;r;( 'i(' CT cc 'l' CCI\J\J\ cr\AC/VICCr; '1'/I,A(;( ~ T !IC C:C ;C~ T Ge;CT (;/\2\C;1\ c: ':/\(: /\1\1\1 :/\r 1\'["['1;(;(:(:/\( ;r:( ;'["["l'C1\/\(:1 ;i:1 ((,,[,C;'I'/\'['(;( ;CI:/\C:/\'l'CC:j\'I'CCr:C ': ('1"1"1''1'/\1 'J\N le ;'l'c:( :'l'(:/\I''I'I:I:i CCTTCCCGTTA':C:Cf\P,CTT1V\,[,( :CCCfTCCA(;CACATC:CCD':TTTC:(·,Cr'/V;(",'C'.c;i :r;T /~j\T /\1 :r '1 :l,t.' ' cc:c:r;clV :CC;7\'1'C'C;CCTT'['CC:C';\/\C;"II:TTC:CC('/\Gc:r:Tc;r\7\'['I;CC'1 ;r,/\';C;::CI:r q:'T';'r ;::c "J'I ;(:'l'!'!'I' • :r,\ C:Cl\C;/\!\CCc;(~T(;CCl;GGGCTGCAGGAATTCGATATCAAGCTTATCGATACCGTCGACCTCGAGGGGGGGCC CGGTACCCAATTCGCCCTATAGTGAGTCGTATTACGCGCGCTCACTGGCCGTCGTTTTACAACGTCGTGAC TGGGAAAACCCTGGCGTTACCCAACTTAATCGCCTTGCAGCACATCCCCCTTTCGCCAGCTGGCGTAATAG CGAAGAGGCCCGACCGATCGCCCTTCCCAACAGTTGCGCAGCCTGAATGGCGAATGGGACGCGCCC

(iii) SMART 5'RACE

The following is a sequence obtained from a cloned SMART 5'RACE product derived from RNA from zygotes electrotransformed with CiTnInuclacZ(-1.5){A(-66)C}. The sequencing primer, M13REV, primes from flanking vector sequence. The sequence corresponds to the 'reverse' strand of the SMART 5'RACE product.

>C:\Program Files\Molecular Dynamics\MegaBACE\AnalyzedData/U060501RunOl_Cp312_MD1ISMART21.23-Ml3REV.esd738 MegaBACE TCAGCGCGCATACCCTCACTAAGGGlIACAAAGC'I'GGAGC'I'CC1\CCCCCTCCCCCCGCTCTACAACT1\C'l'( :c: A'I'CCCCCGT'I'TTCCC1\C:TCP,CC:/-\CC ;TT(:TAJV,,~\C(;ACC;c;r;F,TC'( ( '!.' 'l' C C CAC T Cl\C( :7\ (;C; TTC: TA/V\N: Gr\ c Cc:C~Gj\T C GA' [' C ']' C C;C: C/\TN' ; ( ;~ '/\. \ r', /\

C:C;CC;C: C' 1'C:7\(;TT(:(;f\(;(;T(~CT C~TT T CT (~(;TC'l' TCACC:C/\.CC:CCT IIC :C!\'l' '1'1\1:( ;C"i' TC 'l' '['( ;'["(' '[" ;( 'i\l' . i li (:11'["["[' T 'j" ['(" ['( ;/\(;CT ( '( :'['(;'1'1\1:(:( :1\( ;CCAAACTACAAGACCTACTTAACCAATTGATACCGGTTGCGGTAG ATGAAA ' " " i i GGGCTGCAGGAATTCTGATATC AACGCTTATCGATACCGTCGANCTCGAGGGGGGGCCCGGTACCCAATTCGCCCTATAGTGAGTCGTATTAC GCGCGCTCACTGGCCGTCGTTTTACAACGTCGTGACTGGGAAAACCCTGGCGTTACCCAACTTAATCGCCT TGCAGCACATCCCCCTTTCGCCAGCTGGCGTAATAGCGAAGAGGCCCGCACCGATCGCCCTTCCCAACAGT TGCGCAGCCTGAATGGCGAATGGGACGCGCCCTGTAGCGGCGCATTAAGCGCGGCGGGTGTGGTGGTTACG CGCAGCGTGACCGCTACACTTGCCAGCG
