Spliced leader (SL) trans-splicing in the ascidian Ciona intestinalis: molecular characterization of the SL RNA

Brendan Yeats Biology Department McGill University, Montreal August 2009

A thesis submitted to McGill University in partial fulfilment of the requirements of the degree of MSc, biology. © Brendan Yeats 2009

ISBN: 978-0-494-61656-7

ABSTRACT

I initially set out to identify the cap structure on the spliced leader RNA of Ciona intestinalis. During this investigation, I discovered a previously unobserved 53 nt transcript containing the spliced leader sequence. This transcript contained the usual metazoan spliced leader RNA cap, trimethylguanosine, while the canonical spliced leader RNA lacked this moiety but likely contained an m7G cap. The cap structure of the trans-spliced troponin I mRNA matched that of the canonical spliced leader RNA, indicating that the canonical spliced leader RNA, not the 53 nt transcript, is the donor for troponin I. Further immunoprecipitation studies showed that both the canonical spliced leader RNA and the novel 53 nt transcript exist in association with Sm proteins.

I cloned a genomic DNA segment containing four tandem repeats of the spliced leader RNA gene, and used it as a probe in an in situ hybridization study, which showed that the vast majority of the spliced leader RNA genes reside on chromosome 8.

Finally, I performed some preliminary work showing that outrons, the 5′ segments of pre-mRNAs removed by trans-splicing, may exist in sufficient quantities to be detected by PCR amplification.

TABLE OF CONTENTS

1. Introduction
  Overview of the thesis
  Background
    Spliced leader trans-splicing
    Phylogenetic distribution of SL trans-splicing
    Characteristics of SL RNAs
    Functions of SL trans-splicing
    Synthesis of the TMG cap precursor m7G
    Synthesis and function of TMG caps
    Sm proteins, snRNPs, and splicing
    Ciona intestinalis
  Project rationale
    Molecular characterization of the Ciona intestinalis SL RNA
    FISH to the Ciona intestinalis SL RNA gene cluster
    Outron amplification

2. Methods
  RNA extraction
  Northern blotting
  PCR amplification and cloning of the 53 nt SL RNA exon-containing transcript
  Reverse transcription
  Surveying the expression of various SL RNAs
  Immunoprecipitation using the K121 mouse monoclonal antibody
  Immunoprecipitation using the R1131 rabbit polyclonal antiserum
  Shrimp alkaline phosphatase (SAP) treatment
  Tobacco acid pyrophosphatase (TAP) treatment
  Northern blotting to determine the cap status of the SL RNA
  Immunoprecipitation using the H20 mouse monoclonal antibody
  Immunoprecipitation using the anti-Sm antiserum
  Identification of Sm-associated transcripts
  RNase H digestion of Ciona intestinalis TnI mRNA
  Measuring hybridization on Northern blots
  Topo TA® cloning
  Production of a fluorescent in situ hybridization probe
  Fluorescent in situ hybridization
  Amplification of the TnI outron

3. Results: characterization of the Ciona intestinalis SL RNA in terms of size heterogeneity, cap structure, association with Sm proteins, and expression profile
  Size heterogeneity amongst SL-related transcripts
  The 53 nt SL RNA exon-containing transcript is generated by trans-splicing
  Survey of SL RNA size and sequence variation
  K121 immunoprecipitation indicates the Ciona intestinalis SL RNA is most likely not TMG-capped, while the 53 nt SL RNA exon-containing transcript is
  R1131 immunoprecipitation confirms the Ciona intestinalis SL RNA is not TMG-capped, while the 53 nt SL RNA exon-containing transcript is
  The 5′-end of the SL RNA is phosphatase protected and therefore capped
  The Ciona intestinalis SL RNA cap is most likely m7G
  The Ciona intestinalis SL RNA, as well as the 53 nt SL RNA exon-containing transcript, are snRNPs associated with Sm proteins
  The canonical ~46 nt SL RNA, not the 53 nt SL RNA exon-containing transcript, is the SL donor for TnI
  Expression profile of the SL RNA and the 53 nt SL RNA exon-containing transcript

4. Results: determination of where in the genome the Ciona intestinalis SL RNA genes reside
  Production of a probe for FISH to the Ciona intestinalis SL RNA gene cluster
  FISH to the Ciona intestinalis SL RNA gene cluster

5. Results: outron amplification
  Amplification of the TnI outron

6. Discussion
  Summary of results
  Functional implications of the lack of a TMG cap on the Ciona intestinalis SL RNA
  Evolutionary implications of the lack of a TMG cap on the Ciona intestinalis SL RNA
  Possible roles of the 53 nt SL RNA exon-containing transcript
  Why does the Ciona intestinalis SL RNA lack a TMG cap?
  Significance that the Ciona intestinalis SL RNA is capped
  Association with Sm proteins
  Expression profile of the SL RNA
  Outron amplification

7. References

CHAPTER 1 – INTRODUCTION

Overview of the thesis

The principal focus of our laboratory has historically been the regulation and molecular evolution of the troponin I (TnI) gene family. Through studying these facets of TnI in the model organism Ciona intestinalis, our lab discovered[1] that this organism utilizes a pre-mRNA processing phenomenon termed spliced leader (SL) trans-splicing, thus extending the phylogenetic range of organisms known to use this process to the chordates. Briefly, SL trans-splicing is a process whereby the 5′-ends of some pre-mRNAs are replaced by the 5′-end of another RNA, the spliced leader (SL) RNA, in a spliceosomal reaction (figure 1)[2]. The molecule which donates its 5′-end to pre-mRNAs is the SL RNA, while the specific portion donated, the SL RNA exon, is simply referred to as the SL. Because SL RNAs donate their 5′-ends to pre-mRNAs, they also donate their capping moieties[3]. The 5′-portion of the pre-mRNA removed by trans-splicing is called the outron[4].

While many functions for SL trans-splicing have been proposed, one of which is resolution of polycistronic transcripts, or operons, into individual translatable mRNAs[5], its purpose for typical genes which are not part of polycistronic transcription units remains unclear. As trans-splicing in nematodes imparts a specialized cap structure termed the trimethylguanosine (TMG) cap (figure 2, lower) to trans-spliced mRNAs[3], it has been hypothesized that provision of this cap endows trans-spliced mRNAs with unique stability, translatability, or localization[3].

As determining a role for trans-splicing of typical monocistronic genes remains an elusive goal, the first aim of my project was to characterize the cap structure of the Ciona intestinalis SL RNA to see if it shared the feature of a TMG cap with nematode and other metazoan SL RNAs. As the evolutionary origins of trans-splicing are unknown, studying the precise molecular mechanisms of this process should also help to resolve whether it is an independently evolved or an ancient process[6].

During my investigation of the Ciona intestinalis SL RNA, I discovered a previously unobserved SL RNA-like molecule, which contained the expected characteristics of an SL donor: the SL sequence, a TMG cap, a GU splice donor dinucleotide adjacent to the SL, and association with Sm proteins. Because this molecule looked remarkably like an SL donor, I also wished to investigate whether it, or the previously defined SL RNA[1], was the true donor for the trans-spliced mRNA TnI.

I also set out to answer some other outstanding questions related to SL trans-splicing, such as where in the Ciona intestinalis genome the SL RNA genes reside. In addition, I sought to determine whether the portions of pre-mRNA molecules lost to trans-splicing, the outrons, exist in sufficient quantities to be detectable by PCR amplification.

Background

Spliced leader trans-splicing

Mechanistically, SL trans-splicing proceeds through a reaction similar to cis-splicing, or intron removal. In both cases, the splicing reaction is a two-step process.

In the first step, the 2′-hydroxyl of the branchpoint A nucleotide performs a nucleophilic attack on the 5′-phosphate of the G residue within the GU dinucleotide found almost universally immediately downstream of the splice donor site. This step generates a free hydroxyl moiety at the 3′-end of the upstream exon, together with a branched intermediate in which the G nucleotide is joined to the branchpoint A nucleotide in a 2′-5′ phosphodiester linkage. This intermediate has the shape of a lariat in cis-splicing and of a Y-branch in trans-splicing. In the second step, the 3′-hydroxyl of the recently freed upstream exon undertakes a nucleophilic attack on the 3′-phosphate at the G of the splice acceptor AG dinucleotide, ligating the upstream and downstream exons[7, 8] and releasing the intron/outron. Thus the consequences of trans-splicing are two-fold: all trans-spliced mRNAs will share at their 5′-ends the sequence donated by the SL RNA, and all the outrons will contain the SL RNA intron as a branch in a 5′-2′ linkage with the branchpoint A nucleotide.

Figure 1. The SL trans-splicing reaction. Pre-splicing: The 2′-OH of the branchpoint A nucleotide prepares to attack (dashed arrow) the splice donor dinucleotide GU, which will occur in step 1. Step 1: The nucleophilic attack of the 2′-OH of the branchpoint A nucleotide attaches the SL RNA intron to the branchpoint in a 2′-5′ linkage, leaving a free 3′-OH on the SL RNA exon. The SL RNA exon is now free to attack (dashed arrow) the splice acceptor dinucleotide AG. Step 2: The free 3′-OH of the SL RNA exon completes its attack on the splice acceptor site, generating two products, the mature trans-spliced mRNA, and a Y-branched intermediate consisting of the outron linked to the SL RNA intron.
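
The bookkeeping of the two-step reaction described above can be sketched as a toy string model. This is only an illustration under stated assumptions: the sequences are invented (they are not the Ciona intestinalis SL RNA or TnI pre-mRNA), the function name and parameters are hypothetical, and the chemistry (branchpoint attack, 2′-5′ linkage) is reduced to recording which pieces end up joined.

```python
def trans_splice(sl_rna: str, sl_len: int, pre_mrna: str, acceptor_end: int):
    """Toy model of SL trans-splicing on plain strings.

    sl_rna       -- SL RNA: the SL exon followed by the SL RNA intron
    sl_len       -- length of the SL exon
    pre_mrna     -- pre-mRNA: the outron followed by the mRNA body
    acceptor_end -- index of the first base after the AG acceptor dinucleotide
    """
    sl_exon, sl_intron = sl_rna[:sl_len], sl_rna[sl_len:]
    if not sl_intron.startswith("GU"):
        raise ValueError("SL RNA intron should begin with the GU donor dinucleotide")

    outron, body = pre_mrna[:acceptor_end], pre_mrna[acceptor_end:]
    if not outron.endswith("AG"):
        raise ValueError("outron should end with the AG acceptor dinucleotide")

    # Step 1 frees the SL exon and joins the SL RNA intron to the outron.
    # Step 2 ligates the SL exon to the mRNA body and releases the Y-branch.
    mature_mrna = sl_exon + body
    y_branch = (sl_intron, outron)  # the two arms of the Y-branched by-product
    return mature_mrna, y_branch


# Hypothetical sequences, chosen only so that the GU and AG checks pass.
mature, y_branch = trans_splice(
    sl_rna="AUCGAUAG" + "GUAAGC", sl_len=8,
    pre_mrna="GGCACUAACUUUCAG" + "AUGGCU", acceptor_end=15,
)
print(mature)    # AUCGAUAGAUGGCU
print(y_branch)  # ('GUAAGC', 'GGCACUAACUUUCAG')
```

Every mature product begins with the same SL exon regardless of the pre-mRNA supplied, and every by-product pairs the SL RNA intron with an outron, which mirrors the two consequences of trans-splicing noted in the text.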

Phylogenetic distribution of SL trans-splicing

SL trans-splicing is patchily distributed throughout the eukaryotes: two unicellular groups, Euglenozoa[9] and Alveolata[10], utilize this process, as do the metazoan phyla Nematoda[11], Platyhelminthes[12], Cnidaria[13], Rotifera[14], Chaetognatha[15], and finally Chordata[1], of which the ascidian tunicate Ciona intestinalis is a member. To date, there is no evidence of SL trans-splicing in the well-characterized genomes of mouse, fruit fly, or human[16], and given the extensive genomic data available for these animals it is reasonable to believe that SL trans-splicing either does not occur in them or occurs at extremely low levels[6, 16].

Considering its evolutionary origins, trans-splicing could be an ancient eukaryotic characteristic which has been lost extensively. Alternatively, trans-splicing, along with the features found in tandem with it, such as trans-spliceosome-specific proteins[6, 17], SL-specific translation enhancer proteins capable of interacting with the modified cap structure transferred to trans-spliced mRNAs[6, 8, 16, 18], and the presence of operons[6, 13, 16, 19], may have arisen independently in the ancestors of each of the lineages which utilize this process today. The former hypothesis is supported by the observation that SL RNAs share some important characteristics, namely that they are similar to the small nuclear RNAs involved in splicing in that they contain an Sm binding site. Additionally, all metazoan SL RNAs examined thus far contain a conserved cap structure termed the TMG cap[16]. For this fact to support the ancient-origin hypothesis, however, the TMG cap on SL RNAs would have to have been gained in the metazoans or lost in the protists, since the cap structure in the unicellular groups which utilize this process is not TMG but cap4[20].

The independent origin hypothesis is supported by the observation that the few components known to be unique to trans-splicing, namely the proteins that recognize the modified cap structure transferred to trans-spliced mRNAs and allow for their translation[6, 13, 18, 19], and the SL small nuclear ribonucleoprotein (snRNP)-specific proteins[17], are not obviously conserved between the phyla which utilize this process. It is also supported by the fact that orthologous genes which are trans-spliced in one phylum are not necessarily trans-spliced in another[2, 6]. Thus, the debate over whether this is an ancient or independently derived process has not been settled, although it has been noted that studying the precise molecular mechanisms of this process should help to settle it[6].

It is also possible that trans-splicing was spread through horizontal transfer of the SL RNA genes. If this were the case, however, there has been pervasive sequence evolution, as there is very little similarity in sequence or size (ranging from ~46 to 109 nt)[1, 21] between the SL RNAs of different phyla, and the additional SL trans-splicing-specific components found in some organisms would then be secondarily derived.

Different organisms trans-splice their mRNA populations to varying extents, ranging from a portion of the mRNA population to the entirety of it. In trypanosomes, parasitic euglenozoans which utilize trans-splicing, 100% of the mRNAs receive the SL[22], while between 50% and 90% of nematode mRNAs, depending on the species[23, 24], receive SL1, the major spliced leader of nematodes. Approximately 50% of Ciona mRNAs receive the 16 nt spliced leader[25], in contrast to the 25% of genes which are trans-spliced in the larvacean tunicate Oikopleura dioica[19] and the estimated minimum of 30% of Hydra genes which are trans-spliced[13].

As the chordates include lineages that clearly do (e.g. the tunicates Ciona intestinalis[1] and Oikopleura dioica[19]) and do not (e.g. the vertebrates[6]) utilize trans-splicing, they represent an interesting clade in which to study whether SL trans-splicing was an ancestral or a derived characteristic. Detailed analysis and comparison of the specific characteristics and mechanisms in the trans-splicing organisms within the chordates may provide unique insight into that question[6].

Characteristics of SL RNAs

SL RNAs are short (<150 nt) RNAs composed of a single exon, the SL, followed by an intron. The shortest SL discovered to date is that of Ciona intestinalis at 16 nt[1], and the longest SL yet identified is that of the primitive flatworm Stylochus zebra at 51 nt[21]. SL RNA introns vary similarly, with the shortest known one being found once again in Ciona intestinalis at 30 nt[1], and the longest in bdelloid rotifers at 81 nt[14].

Metazoan SL RNAs are RNA polymerase II (RNA pol II) transcripts transcribed from multiple direct tandem gene repeats[23, 26], the unit length of which is 264 bp in Ciona intestinalis[K. Hastings, unpublished data]. As estimated from raw genome sequencing data, there are approximately 900 copies of this gene in Ciona intestinalis (average ~11X coverage of the genome with ~10000 reads of this gene)[K. Hastings, unpublished observation]. A recurring evolutionary tendency of SL RNA genes, as evidenced by species in Euglenozoa, Nematoda, and Cnidaria, is their tandem arrangement with the 5S rRNA gene[27]. Although this arrangement is widespread, it is most likely not an ancient ancestral trait but rather independently derived by chance, as the orientation of the SL gene repeat with respect to the 5S rRNA gene repeat differs in many cases[27], and because these genes are not always clustered together, for example in C. intestinalis[K. Hastings, unpublished observation].
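
The copy-number figure quoted above follows from simple arithmetic on the two numbers given in the text. The sketch below assumes, as a simplification not stated in the source, that each genomic copy of the gene is sampled by about as many reads as the genome-wide average coverage; the function name is hypothetical.

```python
def estimate_copy_number(reads_matching_gene: int, mean_coverage: float) -> float:
    """Estimate gene copy number as (reads matching the gene) / (mean coverage).

    Assumes every copy is sequenced at roughly the genome-wide mean depth,
    so a single-copy gene would be matched by about `mean_coverage` reads.
    """
    if mean_coverage <= 0:
        raise ValueError("mean coverage must be positive")
    return reads_matching_gene / mean_coverage


# Values quoted in the text: ~10000 reads of the SL RNA gene, ~11X coverage.
copies = estimate_copy_number(10_000, 11.0)
print(round(copies))  # 909
```

The result, about 909, is consistent with the "approximately 900 copies" stated in the text; both inputs are themselves approximate, so the estimate should be read as an order-of-magnitude figure.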

Like other RNA pol II transcripts, SL RNAs are initially capped co-transcriptionally with a 7-methyl guanosine (m7G) cap. All the metazoan SL RNA caps thus far examined exist as a hypermethylated form of the initial m7G cap termed the TMG cap (figure 2, lower)[12, 13, 19, 28-31]. It was initially reported that the cap structures of mature trans-spliced nematode mRNAs were not TMG[28], as trans-spliced mRNAs were not precipitable with anti-TMG cap antiserum, suggesting these caps were altered or removed during processing. However, it was later reported by the same group that RNase H-digested trans-spliced mRNAs were precipitable with anti-TMG cap antiserum[3], and by a different group that intact trans-spliced mRNAs were precipitable by a monoclonal anti-TMG antibody[32]. These findings indicate that trans-spliced mRNAs are indeed TMG-capped, and that the antiserum which recognizes the TMG moiety fails to bind full-length mRNAs, perhaps because their secondary structure interferes with the epitope with which this antiserum interacts. The varied cap structures of trans-spliced and non-trans-spliced mRNAs suggested the TMG-capped mRNAs may represent a functionally distinct subset of mRNAs with altered subcellular location, unusual stability, or a unique method of translation[3]. TMG caps are also found on the small nuclear RNAs involved in splicing[33], telomerase RNAs[34], and some small nucleolar RNAs (snoRNAs)[35].

While the cap structures of metazoan SL RNAs (e.g. those of the nematodes C. elegans[28] and Ascaris suum[29], the chordate Oikopleura dioica[19], the cnidarian Hydra vulgaris[13], and the parasitic platyhelminths Echinococcus multilocularis[30], Fasciola hepatica[31], and Schistosoma mansoni[12]) have been reported to be TMG, the cap structure of trypanosome (parasitic unicellular euglenozoan) SL RNAs is cap4[20]. Cap4 is an m7G cap followed by extensive[36], cotranscriptional[37] modification of the first four nucleotides downstream of the cap. The functional significance of the difference in cap structures between protists and metazoans, if any, is currently unknown.

Although there is little sequence conservation of the SL RNA across phyla, certain characteristics are conserved. SL RNAs tend to have a predicted secondary structure consisting of a variable number of stem loops, usually three, with the splice donor dinucleotide GU always being adjacent to an AG dinucleotide which closes the first stem loop, and a single-stranded binding site, RA(U)nGR, for the spliceosomal snRNP-associated protein Sm[38]. The Sm binding site generally resides towards the 3′-end of SL RNAs in the SL RNA intron, and is flanked by the other two stem loops in a three stem loop SL RNA, or simply lies between the only two stem loops in two stem loop SL RNAs[38]. Although these features were initially thought to be common to all SL RNAs, as they were found in all early discovered SL RNAs, the predicted secondary structure of the SL RNA of Ciona intestinalis[1] challenges the concept that the two or three stem-loop structure is absolutely conserved, as this unusually small SL RNA has only one predicted stem loop. The presence of a consensus Sm binding motif within the exon of the SL RNAs of dinoflagellates challenges the belief that this feature is always found in the SL RNA intron[10]. Furthermore, the presence of the splice donor dinucleotide GU following the second stem loop in the predicted secondary structure of the SL RNA of Oikopleura dioica is an exception to the rule that the first two nucleotides of the SL RNA intron immediately follow the closure of the first stem loop[19]. The functional implications, if any, of these variations in predicted secondary structure, location of the splice donor dinucleotide, and location of the Sm binding site have not been investigated and remain unclear.
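
The Sm binding site consensus RA(U)nGR given above lends itself to a simple pattern search. The sketch below is a minimal illustration, assuming that R denotes a purine (A or G) under the standard IUPAC code and that n is at least 3; the text does not state a value for n, so `min_u` is a hypothetical parameter, and the test sequence is invented. It does not check that the site is single-stranded.

```python
import re


def find_sm_sites(rna: str, min_u: int = 3):
    """Return (position, match) pairs for the motif R-A-(U)n-G-R with n >= min_u.

    R is taken to mean a purine (A or G). A lookahead is used so that
    overlapping candidate sites are all reported. DNA input (T) is
    converted to RNA (U) before searching.
    """
    seq = rna.upper().replace("T", "U")
    pattern = re.compile(rf"(?=([AG]AU{{{min_u},}}G[AG]))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]


# Invented sequence containing one candidate site.
print(find_sm_sites("GGCAAUUUUGGACC"))  # [(3, 'AAUUUUGG')]
```

A hit from a search like this is only a candidate: whether the site lies in the exon or the intron, and whether it is flanked by stem loops, would still need to be judged against the predicted secondary structure.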

Functions of SL trans-splicing

Six distinct functions of SL trans-splicing have been observed or conjectured.

The first of these is to provide a cap structure to translated RNA polymerase I (RNA pol I) transcripts[39]. This function is unique to trypanosomes, as these are the only organisms known to translate RNA pol I transcripts; in most organisms RNA pol I is restricted to the transcription of rRNA genes in the nucleolus. RNA pol I transcripts are not capped at their 5′-ends, but a cap is necessary for translation of most eukaryotic mRNAs. As the SL RNA is an RNA pol II transcript and therefore itself capped, trans-splicing of this molecule to the 5′-ends of pre-mRNAs removes the absolute requirement that translatable mRNAs be RNA pol II transcripts, and instead allows RNA pol I transcripts to be capped via the spliceosomal linkage of the SL RNA.

A second function for SL trans-splicing has thus far been observed only in platyhelminths, where the last three bases of the SL are an absolutely conserved AUG[40]. A small minority of flatworm mRNAs use the SL AUG as the initiator methionine codon[40]; however, as the majority of trans-spliced flatworm mRNAs do not, it seems unlikely that providing an in-frame AUG is the primary function of SL trans-splicing, even in flatworms.

A third function of SL trans-splicing is to resolve polycistronic transcripts into individual translatable messages, a function first observed in trypanosomes[5], and later in nematodes[41]. More recently, it has been noted that most if not all organisms which utilize SL trans-splicing also have polycistrons[19]. In addition to freeing the downstream genes of a larger polycistronic transcript into individual open reading frames, SL trans-splicing also provides these internally encoded mRNAs with a 5′-cap, a moiety nearly essential for translation. In nematodes related to Caenorhabditis, distinct SL RNAs, SL1 and SL2, are used for trans-splicing near pre-mRNA 5′-ends and for internal trans-splicing of polycistronic transcripts, respectively[41]. In other organisms, the same SL is used in the trans-splicing of both polycistrons and monocistrons[41]. In nematode operons, the end of the upstream gene and the beginning of the downstream gene are generally ~110 bp apart, and 3′-end formation of the upstream gene is required for trans-splicing of the downstream gene[42]. In Ciona intestinalis, the 3′-end of the upstream operon gene appears to directly abut the 5′-end of the downstream gene[25]. A small subset of Caenorhabditis operons also has this feature. The downstream genes of these operons, termed SL1-type operons, are trans-spliced in nematodes by SL1 rather than SL2. It has been proposed that direct abutment of the upstream and downstream genes makes their expression mutually exclusive, as the poly(A) addition site of the upstream gene and the trans-splice acceptor site of the downstream gene are independent and in competition: from a given polycistronic pre-mRNA, either the upstream gene is polyadenylated and expressed, or the downstream gene is trans-spliced and expressed, but not both[43].

As only a subset of mRNAs receive the SL in most organisms which trans-splice, a fourth possible function for SL addition could be a signal for altered subcellular localization, unusual stability, or a unique method of translation of the recipient mRNA, through either the specialized cap structure which the SL provides or through the SL sequence itself[3]. Enhanced translation could occur through increased RNA stability, increased affinity between the mRNA and translation initiation factors, or by optimizing the sequence, length, or secondary structure of the 5′-untranslated region (UTR)[2]. Studies of the effects of the SL RNA on translation have reached conflicting conclusions. One study reported that optimal translation in Ascaris lumbricoides was achieved through the combination of the SL RNA of that species and its associated TMG cap, presumably at the level of translation initiation[44]. A more recent study in Ascaris suum, however, found that while TMG-capped mRNAs were very inefficiently translated, and m7G-capped mRNAs containing the SL sequence were slightly less efficiently translated than m7G-capped counterparts lacking this sequence, the presence of the two on an mRNA together yielded a translation efficiency around 2/3 that of m7G transcripts lacking the SL sequence[29]. This suggests that, in Ascaris suum at least, enhanced translation is not the predominant function of trans-splicing. The same study reported that the optimal 5′-UTR length for non-trans-spliced mRNAs in this species was 25 nt, and that of trans-spliced mRNAs was 31 nt[29], so perhaps trans-splicing is required to reduce the size of the 5′-UTRs of trans-spliced mRNAs down to a size resembling that of non-trans-spliced mRNAs.

A fifth hypothesized function for SL trans-splicing is sanitization of the 5′-UTR. Removing the outron, the region of mRNA between the trans-splice acceptor site and the transcription start site, allows for the evolution of sequence elements near the 5′-end of the gene which may be beneficial to early processes in gene expression, such as transcription, while being detrimental to later functional aspects, such as mRNA processing, nuclear export, stability, or translation[41]. It remains unclear, however, precisely what these elements might consist of. In one experiment, an out-of-frame AUG codon that is normally removed by trans-splicing, and would seem an obvious candidate for such an element, was experimentally retained, yet it did not appear to affect mRNA levels of the gene in question; the absence of nonsense-mediated decay suggests that out-of-frame translation was not occurring[45]. Although in this instance an out-of-frame AUG did not appear to be detrimental to translation, that does not preclude the possibility of other 5′-UTR elements from which an organism benefits by transient association with the pre-mRNA sequence.

The sixth function of trans-splicing is the generation of transcriptional variants through the use of alternative trans-splice acceptor sites. Very few promoters, as few as two[39], exist genome-wide in trypanosomes, and there is also a paucity of intron-containing genes requiring cis-splicing, though all trypanosome genes are trans-spliced; the trypanosome poly(A) polymerase gene is the only one known to require both cis-splicing and trans-splicing for proper expression[46]. Clearly, with such a lack of promoters and intron-containing genes, trypanosomes cannot generate the transcriptional repertoire that higher eukaryotes achieve through alternative promoter usage or alternative cis-splicing, which requires genes to be divided into introns and exons. Trypanosomes can, however, generate transcriptional variants through the usage of alternative trans-splice acceptor sites. This has been observed for the LYT1 gene of Trypanosoma cruzi[47], a parasitic trypanosome. The life cycle of these parasites is complex; depending on the life stage, they thrive under a variety of conditions, whether in the replicative stage in the midgut of the insect vector (at which point these protists are called epimastigotes), or within the eukaryotic cells which they parasitize (at which point they are referred to as amastigotes)[47]. Because of these diverse stage-specific living circumstances, control of gene expression must be fine-tuned to suit the particular environment in which the trypanosome dwells. For the LYT1 gene, a gene suspected of being involved in host-cell invasion, this is accomplished through alternative trans-splicing of the primary transcript[47]. Alternative trans-splicing of this gene is developmentally regulated; a longer mRNA is expressed in amastigotes, while a shorter version is expressed in epimastigotes[48]. Through alternative trans-splicing, translation initiation is forced to begin at a downstream AUG in the shorter transcript, culminating in expression of a protein which is truncated by 28 amino acids N-terminally. This difference causes a change in subcellular localization of the LYT1 protein, which in turn affects the function of the protein[47].
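
How the choice of trans-splice acceptor site can shift the translation start, as described above, can be shown with a small sketch. It rests on several assumptions: the sequence is invented and is not the LYT1 transcript, the added SL is assumed to contribute no start codon, translation is assumed to begin at the first AUG downstream of the acceptor site, and the function names are hypothetical.

```python
def start_codon_position(pre_mrna: str, acceptor_end: int) -> int:
    """Index of the first AUG at or after the base following the acceptor AG."""
    pos = pre_mrna.find("AUG", acceptor_end)
    if pos == -1:
        raise ValueError("no AUG downstream of this acceptor site")
    return pos


def codons_lost(pre_mrna: str, upstream_acceptor_end: int,
                downstream_acceptor_end: int) -> int:
    """Number of N-terminal codons absent when the downstream acceptor is used."""
    long_start = start_codon_position(pre_mrna, upstream_acceptor_end)
    short_start = start_codon_position(pre_mrna, downstream_acceptor_end)
    offset = short_start - long_start
    if offset % 3 != 0:
        raise ValueError("downstream AUG is not in frame with the upstream AUG")
    return offset // 3


# Invented pre-mRNA: an outron ending in AG, a first AUG, 26 filler codons,
# a CAG codon supplying a second acceptor AG, then a second in-frame AUG.
pre_mrna = "GGUUUCAG" + "CC" + "AUG" + "GCU" * 26 + "CAG" + "AUG" + "GCUUAA"
print(codons_lost(pre_mrna, upstream_acceptor_end=8, downstream_acceptor_end=94))  # 28
```

The spacing in this toy sequence was chosen so that the shorter product lacks 28 N-terminal codons, matching the size of the truncation reported in the text; nothing else about it reflects the real gene.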

Protein isoforms with varied functions have also been reported in higher eukaryotes, especially plants[49]. In plants, the production of these alternative protein isoforms can be either transcriptional, by utilization of an alternative transcription start site which can force translation to begin at downstream in-frame AUGs, or translational, through the use of cap-independent internal ribosome entry sites[49]. In trypanosomes, however, translation is absolutely dependent on the presence of a capped spliced leader at the 5′-end of the mRNA[22]. Therefore alternative trans-splicing of the polycistronic precursor mRNA to generate capped transcripts of varied sizes is the only mechanism available for production of transcripts which utilize varied AUG translation initiation codons.

The gamut of genes which utilize alternative trans-splicing for the production of protein isoforms which vary in localization or function is unknown, but at least two other genes are reported to have alternative trans-splice acceptor sites which result in different localization of the encoded protein. A proline racemase from T. cruzi is secreted when an upstream trans-splice acceptor site is used and cytosolic when a downstream trans-splice acceptor site is chosen[50]. Additionally, in Crithidia fasciculata, another trypanosomatid, the protein product of the RNH gene localizes to mitochondria when the upstream trans-splice acceptor site is selected, and to nuclei when a downstream trans-splice acceptor site is used[51]. While the utilization of alternative trans-splice acceptor sites appears to be a way to generate switches in functionality for at least a handful of trypanosome genes, the mechanism by which a given acceptor site is selected remains to be elucidated[47].

Synthesis of the TMG cap precursor m7G

Although m7G caps have many important roles, including providing RNA stability, localization, and a means through which to be translated[52], this section will be brief, focusing on the fact that acquisition of an m7G cap is a prerequisite for the formation of a TMG cap.

In eukaryotes, all known cellular mRNAs and most small RNAs have their 5′-ends protected by the presence of a cap. These cap structures contain an unusual 5′-5′ linkage between the capping G nucleoside and the initiating nucleotide[53]. Addition of this cap is a multi-step process. In the first step, RNA triphosphatase enzymes cleave the γ-phosphate of nascent pre-mRNA chains to form a diphosphate end[54]. In the second step, RNA guanylyltransferases cleave GTP to GMP and transfer it to the now diphosphate end of the pre-mRNA[54]. Finally, guanine-N7 methyltransferase catalyzes the transfer of a methyl group from S-adenosylmethionine to Gppp-capped pre-mRNAs, forming m7G-capped pre-mRNAs.

Capping is co-transcriptional, and occurs when the nascent RNA transcript is 25-30 nt long[55, 56]. This size threshold for capping is an important one, as it predicts that the Ciona intestinalis SL RNA, at ~46 nt, should be capped, although few, if any, capped eukaryotic RNAs this small have been reported.

Synthesis and function of TMG caps

TMG caps, usually present on metazoan SL RNAs, are also known to exist on the small nuclear RNAs (snRNAs) involved in splicing in a diverse array of organisms, including human, trypanosomes, amoeba, yeast, and plants[23, 53].

As previously mentioned, mRNAs and other RNA pol II transcripts are co-transcriptionally capped with m7G in the nucleus, and this m7G precursor is hypermethylated in the cytoplasm to form TMG caps on snRNAs as well as the snoRNA U3. Trimethylation of Xenopus U2 RNA requires the presence of a binding site for the protein Sm[57]. However, only a subset of TMG-capped snRNAs is found associated with Sm proteins; human U3, U8, and U13 snoRNAs are not associated with Sm proteins, but are TMG-capped[53], implying an Sm-independent method of acquisition of TMG caps. Fibrillarin, a protein which binds to all three of these snoRNAs, has been suggested to be this Sm-independent signalling molecule for the conversion of m7G to TMG[53].

Figure 2. Structures of the m7G (upper) and TMG (lower) caps. Relative to m7G, TMG contains two additional methyl groups at the N2 position of the guanine base. Figures reproduced with permission from [58] and [53], respectively.

In addition to being found on snRNAs and SL RNAs, TMG caps are found on some mRNAs, namely those of C. elegans which have been trans-spliced. The TMG cap in this instance is acquired through trans-splicing of the C. elegans SL RNA to the mRNA molecules, which retain the TMG cap structure of the SL RNA[3, 32]. These mRNAs are found in the cytoplasm and associated with polysomes, suggesting TMG-capped mRNAs can be functionally translated[3].

The precise role of the TMG cap is still not entirely defined, although it appears to have roles in cytoplasmic-nuclear shuttling, stability, and translation initiation.

In frog oocytes, the TMG cap of U1 and U2 appears to be required for transport of these snRNPs back to the nucleus from the cytoplasm[59, 60]. Nuclear shuttling of U1 requires both a TMG cap and Sm association. In contrast to U1 and U2, however, U4 and U5 rely much less heavily on their TMG caps for nuclear import; when ApppG-capped in vitro-synthesized transcripts of U1, U2, U4, and U5 were injected into Xenopus oocytes, U1 and U2 were excluded from the nucleus entirely, U5 showed appreciable nuclear accumulation, and U4 exhibited an intermediate nuclear accumulation[60]. Although the TMG cap is not a stringent requirement for the nuclear import of U4 and U5 snRNAs, it does increase the rate at which the molecules enter the nucleus[60]. The fact that TMG caps increase the rate at which all snRNAs enter the nucleus, even the ones for which the cap is not a strict requirement for nuclear entry, suggests a model of snRNP import in which the TMG cap acts as a signal to increase the efficiency of nuclear import[60].

One possible reason for the observed difference in the rate of nuclear accumulation may be the size of the RNA in question, as there is an inverse correlation between RNA length and rate of nuclear accumulation; U1, the largest snRNA, lacking a TMG cap shows the least nuclear localization while U5, the smallest snRNA, has the highest rate of nuclear localization when lacking a TMG cap.

It seems reasonable to conclude from this that larger RNAs are more difficult to transport, and thus larger RNAs are more dependent on the localization signal provided by the TMG cap. It should be noted, however, that it is not naked RNA which is imported into the nucleus, but rather at least partially assembled protein-RNA complexes called snRNPs[61], and thus RNA size is not the sole determinant of snRNP size.

The TMG cap plays a dual role in transport. First, it interacts directly with snurportin 1 (SPN1)[62], a protein which binds TMG caps and acts as an adaptor to the nuclear import receptor importin β. Injection of a TMG cap analogue into Xenopus oocytes prevented transport of TMG-capped molecules, presumably due to competitive binding to SPN1[60]. Second, the TMG cap prevents interaction with some m7G cap-binding element which normally tethers m7G-capped RNAs to the cytoplasm. While in vitro transcribed TMG-capped U6 snRNA accumulates in the nucleus, m7G-capped in vitro transcribed U6 snRNA is sequestered in the cytoplasm. Transport of m7G-capped in vitro transcribed U6 snRNA was activated upon co-injection into a Xenopus oocyte with an m7G cap analogue[60]. Once again, this implies competitive inhibition of some cytoplasm-sequestering m7G-capped RNA binding element. This or these element(s), possibly the cap-binding proteins involved in the initiation of translation, may also function to keep mRNAs in the cytoplasm, as cytoplasmic-nuclear shuttling of mRNAs has not been reported. The snRNAs can escape this cytoplasmic sequestration because their m7G precursor caps are rapidly converted to TMG in the cytoplasm[61]. If a cytoplasmic-tethering m7G-binding element is absolutely required for retention of mRNAs in the cytoplasm, how this is accomplished in organisms whose mRNAs may be either m7G-capped or TMG-capped, such as C. elegans, remains unclear.

Transport of U6, however, is not affected by injection of a TMG analogue, suggesting this gamma-monomethyl phosphate-capped transcript is transported by a different method[60]. The snRNAs U4, U5, and U6 are known to associate tightly together to form a tri-snRNP; the association of U4 and U5 with U6 and their transport as an snRNP through the U6 nuclear import pathway may be another explanation for the nuclear accumulation observed for experimentally generated non-TMG-capped U4 and U5 molecules, discussed above.

TMG-capped U1 and U2 snRNAs are more stable than uncapped RNAs, indicating the cap confers stability. In the case of U2, TMG caps do not appear to be required for 3′-end processing[63] or snRNP assembly[64].

The snoRNA U3 contains a TMG cap in animal cells but a gamma-monomethyl phosphate cap in plant cells. U3 localizes to the nucleus as a 15S snRNP in transfected plant cells regardless of whether it is TMG-capped or gamma-monomethyl phosphate-capped. Only the gamma-monomethyl phosphate-capped U3, however, could form larger snRNPs of 60-80S, indicating the gamma-monomethyl phosphate cap may be required for proper assembly and function in plant cells, or that the TMG cap prevents this formation[65].

Finally, it has been shown by exposure of permeable trypanosome cells to S-adenosyl-L-homocysteine, a methyltransferase inhibitor, that methylation of the 5′-end of trypanosome SL RNA is absolutely required for trans-splicing[66]. Unlike metazoan SL RNAs, however, the cap structure of trypanosome SL RNAs is cap4, which is an m7G cap followed by downstream nucleotides modified with, among other modifications, 2′-O-methylation[20]. It was left unresolved whether the methylation of the cap structure itself to m7G or 2′-O-methylation of the nucleotides downstream of the cap was essential for trans-splicing[66]. This is in contrast to the results observed using a synthetic SL RNA derived from in vitro transcription and used in an Ascaris lumbricoides cell-free extract system[67]. Mutant SL RNAs lacking an Sm-binding site failed to bind Sm and acquire a TMG cap, and were unable to participate in the trans-splicing reaction. The failure to splice, however, was not caused by the lack of the TMG cap; when the Sm-binding site was left intact but methylation was prevented with S-adenosyl-L-homocysteine, TMG caps once again failed to form, but the SL RNA was nonetheless able to be utilized in the trans-splicing reaction. Thus in a nematode cell-free extract system, trans-splicing is dependent upon association of the SL RNA with Sm proteins to form an snRNP, but not dependent upon the presence of a TMG cap[67].

Sm proteins, snRNPs, and splicing

The heart of the spliceosome consists of the U1, U2, U5, and U4/U6 snRNPs[68]. Each of the individual snRNPs, with the exception of the U6 snRNP, is composed of its constituent snRNA, seven core Sm proteins, and several snRNP-specific proteins[68]. Although the assembly of the heteroheptameric Sm complex occurs spontaneously in vitro, it is a highly regulated process in vivo; it requires both ATP and the assembly factors found in the survival of motor neuron (SMN) complex[69, 70]. Biogenesis of snRNPs begins in the nucleus with RNA pol II mediated transcription of the snRNAs, which are then exported to the cytoplasm[71]. The primary 3′-end of these transcripts extends beyond that of the mature 3′-end[72]. Once in the cytoplasm, the SMN complex recruits the core Sm proteins to bind in the ordered, stepwise manner B/B′, D3, D2, D1, E, F, and finally G[68].

SMN, the protein complex involved in recruitment of Sm proteins to snRNAs in the cytoplasm, consists of the proteins SIP1/Gemin2, Gemin3, and Gemin4[73-76].

These proteins interact only with the budding snRNPs, as they are dissociated just before nuclear snRNP import[62, 68]. As SMN interacts with both Sm proteins and importin β, the nuclear import receptor through which snRNPs are imported, it has been suggested that this protein complex provides snRNPs with an alternate nuclear localization signal to the TMG cap.

Once recruited to the snRNAs by the SMN complex, Sm proteins bind the highly conserved Sm site PuAU4-6GPu. This single-stranded binding site is flanked by two stem-loop structures in the U1, U2, U4, and U5 snRNAs, a feature conserved in most but not all SL RNAs[68]. Sm binding is a prerequisite for both TMG-capping via the methylating enzyme trimethylguanosine synthase (Tgs1) and exonucleolytic trimming of the 3′-end of the primary transcripts[72]. Sm binding in conjunction with a TMG cap provides a bipartite nuclear localization signal for the cytoplasmic snRNP.
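The Sm site consensus PuAU4-6GPu translates directly into a pattern search, which is one way a candidate Sm-binding site could be located in an RNA sequence. The following is a minimal sketch only; the function name and the test sequence are hypothetical, and a sequence match says nothing about whether the site is single-stranded or actually bound by Sm proteins.

```python
import re

# Pu = purine (A or G); U4-6 = four to six consecutive uridines.
SM_SITE = re.compile(r"[AG]AU{4,6}G[AG]")

def find_sm_sites(rna: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched sequence) for each non-overlapping
    match to the PuAU4-6GPu consensus. DNA input is converted to RNA."""
    rna = rna.upper().replace("T", "U")
    return [(m.start(), m.group()) for m in SM_SITE.finditer(rna)]

# Hypothetical sequence carrying one consensus match.
print(find_sm_sites("GGCAAUUUUUGACC"))
# A run of only three uridines does not satisfy the consensus.
print(find_sm_sites("GGCAAUUUGACC"))
```

Because `finditer` reports non-overlapping matches only, overlapping candidate sites would need a lookahead-based pattern instead.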

Nuclear re-import is achieved through interaction with three import factors: SPN1, an import adaptor which recognizes the TMG cap of the snRNA; importin β, a nuclear import receptor which interacts with SPN1; and an as-yet unidentified component, which may be SMN, which interacts with Sm proteins and allows for import[62, 77].

Upon re-entering the nucleus, the snRNPs briefly reside in Cajal bodies[78] where their constituent snRNAs undergo site-specific pseudouridylation and 2′-O-methylation[79], after which they travel to interchromatin regions and accumulate in splicing speckles. These splicing speckles are generally near regions of active transcription, and are believed to be sites of snRNP-mediated splicing or of recycling of snRNP components[80].

Ciona intestinalis

Ciona intestinalis is an ascidian. Ascidians, along with larvaceans and thaliaceans, make up the tunicate, or urochordate, subphylum[81]. In addition to tunicates, the chordate phylum comprises cephalochordates and vertebrates.

Ascidians possess a notochord, dorsal neural tube, and gill slits, hallmarks of all chordates, and as such they are a sister group of vertebrates[81]. Historically, it was believed that cephalochordates were the sister taxon to vertebrates, but recent phylogenetic analyses using genomic data indicate that tunicates are actually the closest extant relative of vertebrates[82].

Ascidians are marine invertebrate chordates which are found throughout the world[81]. As larvae, ascidians use tail muscle to power their swimming. As adults, these creatures become sessile, filter-feeding organisms, a fact which led them to be classified as molluscs until the tadpole-like larva was discovered[81]. In addition to their evolutionary position between non-chordate deuterostomes and vertebrate chordates, which facilitates the study of the evolutionary origins of the chordates from an ancestral deuterostome, and the origins of the vertebrates from a simple chordate, the ascidians are relatively simple animals, making them ideal candidates for evolutionary-developmental studies, which are the primary focus of ascidian molecular biology research.

The ascidian larval body plan consists of only a tail and a trunk, and is composed in its entirety of around 2600 cells[83]. The chordate features of a notochord and dorsal neural tube are present in only the larval stage; the tail is resorbed during metamorphosis, and the emerging adult is formed entirely out of precursor cells in the trunk[83]. In addition to their simple body plans making them ideal candidates for evolutionary-developmental studies, ascidian development is also rapid; Ciona intestinalis takes only around eighteen hours to develop from a zygote to swimming larva[81].

Since Ciona intestinalis is often used for evolutionary-developmental studies, many resources for Ciona are available, including a sequenced genome[84]. These resources are also invaluable for studies of gene expression mechanisms, such as SL trans-splicing.

Project rationale

I primarily studied three facets related to SL trans-splicing throughout the course of my thesis work: molecular characterization of the SL RNA itself, determination of where in the genome the SL RNA genes reside, and amplification of outrons, which are removed from pre-mRNAs by trans-splicing. The rationale for studying each of these aspects is summarized below.

Molecular characterization of the Ciona intestinalis SL RNA

The first evidence of SL trans-splicing in the chordates was serendipitously discovered in our lab while studying the TnI gene in Ciona intestinalis[85]. The complete cDNA sequence of the mature mRNA was determined by a 5′-RACE procedure, and through database searching, it was surprisingly discovered that the 5′-end of the TnI mRNA shared its first 16 nt with at least 3 other mRNAs. The presence of a shared nucleotide sequence on unrelated mRNAs is characteristic of SL trans-splicing[2, 41, 86]. Further genes were investigated in a follow-up study, and it was confirmed that other genes also shared the 16 nt common sequence, and that addition of this sequence followed a consensus splice acceptor site AG. The 16 nt sequence was not encoded as an exon in the near-upstream regions of the translation start sites of the genes[1]. Finally, the mRNA from a plasmid construct consisting of the TnI outron and a β-gal reporter introduced into Ciona embryos was itself trans-spliced, and as this construct did not itself contain the 16 nt SL sequence, this result was the definitive evidence that Ciona utilizes trans-splicing[1]. A candidate SL RNA of ~50 nt was reported and contained the predicted features of a splice donor RNA: the consensus splice donor site GU abutting the SL, and the presence of an Sm binding site located within the intron portion of the molecule[1]. Although there was a prediction of Sm binding, the Ciona intestinalis SL RNA had not been previously shown to be actually associated with these proteins, warranting direct investigation of whether it was. Since this was the smallest SL RNA yet discovered, and had a predicted secondary structure differing from other SL RNAs[1], due in part to its relatively small size, investigating whether it shared other features, such as whether it is TMG-capped, as other metazoan SL RNAs are, was also warranted.
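The database observation that first suggested trans-splicing, a common 5′ sequence on otherwise unrelated mRNAs, amounts to finding the longest shared prefix among a set of 5′-complete cDNA sequences. A minimal sketch of that comparison follows. The leader and mRNA bodies used here are made-up placeholders, not the actual Ciona sequences, and the function name is an assumption for illustration.

```python
from os.path import commonprefix  # works character-wise on any list of strings

def shared_five_prime_leader(cdna_sequences: list[str]) -> str:
    """Return the longest sequence shared by the 5' ends of all inputs."""
    return commonprefix([seq.upper() for seq in cdna_sequences])

# Hypothetical 16 nt leader joined to three unrelated (made-up) mRNA bodies.
placeholder_leader = "GGAUCCGAUUACAGCA"
toy_cdnas = [
    placeholder_leader + "AUGGCCAAGUCU",
    placeholder_leader + "CUAUGAAAGGCU",
    placeholder_leader + "UUGAUGCCGGAA",
]

leader = shared_five_prime_leader(toy_cdnas)
print(leader, len(leader))
```

A shared prefix alone does not prove trans-splicing; as described above, the absence of the leader from the genomic sequence upstream of each gene and the reporter construct experiment provided the decisive evidence.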

FISH to the Ciona intestinalis SL RNA gene cluster

Although the Ciona intestinalis genome has been sequenced[84], the majority of the SL RNA genes were omitted from the genomic assembly, due to the inability of the assembly software to incorporate directly repeated sequences such as the SL RNA gene. In order to determine whether the majority of the SL RNA genes reside at a single site, or if there are many clusters spread throughout the genome, and where these cluster(s) reside, I isolated and characterized a segment of genomic DNA containing four SL RNA gene repeat units, and this was used by a colleague in Japan for FISH localization of the SL RNA gene repeats in the genome.

Outron amplification

Our lab recently completed a study aimed at identifying the full contingent of trans-spliced genes in Ciona intestinalis [Jun Matsumoto et al., manuscript in preparation]. This was accomplished through RT-PCR using a random hexamer primer linked to an arbitrary anchor sequence for reverse transcription and SL-specific rightward and anchor-specific leftward primers for subsequent PCR. The resulting heterogeneous population of cDNAs was sequenced in a high-throughput manner, delineating the vast majority of mRNAs which are trans-spliced in Ciona intestinalis at the tailbud embryo stage examined. In addition to defining the subset of mRNAs which are trans-spliced, these data also provided a wealth of information regarding which splice acceptor sites are used, which may prove useful for studying what sequence information besides AG is required to mark a trans-splice acceptor site.

However, as the 5′-ends of trans-spliced pre-mRNAs are lost via the splicing reaction, virtually nothing is known about the transcription start sites of genes whose mRNAs are trans-spliced. How far away the transcription start site is from the trans-splicing acceptor site is unknown for all but a handful of trans-spliced genes. Defining the transcription start sites of the trans-spliced genes in Ciona intestinalis, which make up around half of the total number of genes[25], would provide a wealth of information regarding which genomic sequences act as promoters in this organism. Outron amplification is an attempt to take advantage of the fact that the regions of trans-spliced pre-mRNAs which are replaced by the SL, the outrons, should all contain a common sequence, the SL RNA intron, as a 2′-5′ phosphodiester-linked branch at the branchpoint A residue. Through RT-PCR using a primer for reverse transcription complementary to the SL intron and subsequent PCR amplification using a rightward primer complementary to an anchor sequence which could be ligated to the 5′-end of the outrons, similar to what is done in 5′-RACE procedures, the transcription start sites of the population of trans-spliced mRNAs could also theoretically be identified by high-throughput sequencing. Further, this approach would provide insight into the stability of outrons; very little is known about the stability of spliced introns in eukaryotic cells[87], and virtually nothing is known about the stability of the Y-branched outrons. There is an important distinction between two types of structures with respect to their stability: the stability of lariats, which initially contain a single free 3′-end consisting of the lariat handle, is dictated by the rate of debranching, or linearization, to provide an additional unbranched free 3′-end, the substrate upon which exonucleases act to degrade RNAs[87], whereas Y-branched intermediates contain two free 3′-ends to begin with, and as such may be inherently less stable.

Successful outron amplification would be the first proof that these Y-branched intermediates exist in vivo at detectable levels.

CHAPTER 2 - METHODS

RNA extraction

RNA was isolated from Ciona intestinalis animals by K. Hastings in Tom Meedel's lab at Rhode Island College, Providence, Rhode Island. Tissues of interest were dissected and rinsed in 150 mM NaCl, 1 mM MgCl2, 10 mM Tris-HCl pH 8.5. Tissues were homogenized in 10 mL TRIzol® (Invitrogen). Chloroform (2 mL) was added to the homogenates. Samples were then centrifuged at 10000 g for 15 min at 4°C. The top phases were removed, and 5 mL of isopropanol was added. Mixtures were centrifuged at 10000 g for 20 min, and the supernatants were removed. Pellets were washed in 70% ethanol, vortexed, and spun at 5000 g for 10 min. Supernatants were removed, pellets were allowed to air dry briefly, and RNA samples were stored as damp ethanol precipitates during transportation, and resuspended in water for long-term storage at -80°C.
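Centrifugation steps in this chapter are given as relative centrifugal force (g), which must be converted to rotor speed for a particular centrifuge. The sketch below uses the standard relation RCF = 1.118 × 10^-5 × r × rpm^2, with r the rotor radius in cm. The 10 cm radius is a purely hypothetical value chosen for illustration; the rotors actually used are not specified here.

```python
import math

RCF_FACTOR = 1.118e-5  # standard constant for radius in cm and speed in rpm

def rcf_to_rpm(rcf_g: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a given relative centrifugal force."""
    return math.sqrt(rcf_g / (RCF_FACTOR * radius_cm))

def rpm_to_rcf(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) produced at a given rotor speed."""
    return RCF_FACTOR * radius_cm * rpm ** 2

HYPOTHETICAL_RADIUS_CM = 10.0  # assumption; substitute the real rotor radius

for step, rcf in [("phase separation", 10000), ("isopropanol pellet", 10000),
                  ("ethanol wash", 5000)]:
    rpm = rcf_to_rpm(rcf, HYPOTHETICAL_RADIUS_CM)
    print(f"{step}: {rcf} g is roughly {rpm:.0f} rpm at r = {HYPOTHETICAL_RADIUS_CM} cm")
```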

Northern blotting

Unless specified otherwise, all northern blots were performed on 10% (19:1 acrylamide:bisacrylamide) 7.5 M urea gels, 1 mm thick, which were polymerized with 40 uL TEMED/100 mL and 0.8 mL/100 mL of freshly prepared 10% APS (small gels were 8.3 cm x 7.3 cm, run in the Bio-Rad Protean® 3 apparatus, and required ~7 mL of solution to cast). Gels were pre-run in 1X TBE (0.089 M Tris base, 0.089 M boric acid, 0.002 M EDTA) at 175 V for ~15 min and then run at 200 V until sufficient resolution had been achieved. RNA samples were mixed with an equal volume of 1X TBE-urea pH 8.0 sample loading buffer from Bio-Rad (catalog no. 161 0737) and heated at 65°C for 5 min then snap-chilled on ice for 1 min prior to loading. Also prior to loading samples, the wells were rinsed with 1X TBE. Electrophoresis was temporarily halted and the wells were rinsed once again shortly after the RNA had entered the gel, approximately when the xylene cyanol and bromophenol blue size markers of the loading buffer first appeared to be visually separated.
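Because the polymerization reagents are specified per 100 mL, the amounts needed for a single small gel follow by simple scaling. The sketch below works this through for the ~7 mL casting volume given above, and also computes the mass of urea required for 7.5 M using the molecular weight of urea (60.06 g/mol). The function name is illustrative, and the acrylamide component is omitted because the concentration of the acrylamide stock is not stated.

```python
UREA_MW_G_PER_MOL = 60.06

def gel_casting_amounts(volume_ml: float,
                        urea_molar: float = 7.5,
                        temed_ul_per_100_ml: float = 40.0,
                        aps_ml_per_100_ml: float = 0.8) -> dict[str, float]:
    """Scale the per-100 mL recipe to an arbitrary casting volume."""
    return {
        "urea_g": urea_molar * UREA_MW_G_PER_MOL * volume_ml / 1000.0,
        "temed_ul": temed_ul_per_100_ml * volume_ml / 100.0,
        "aps_10pct_ul": aps_ml_per_100_ml * 1000.0 * volume_ml / 100.0,
    }

small_gel = gel_casting_amounts(7.0)
for reagent, amount in small_gel.items():
    print(f"{reagent}: {amount:.2f}")
```

For a 7 mL gel this gives about 3.15 g urea, 2.8 uL TEMED, and 56 uL of 10% APS.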

Prior to transfer, gels were stained with 10 uL of SYBR® Safe (Invitrogen, 10000X) in 100 mL of 1X TBE for 30 min. This served two purposes: i) superimposition of the stained gel onto the membrane to permit alignment of the stained RNA size markers (NEB, catalog no. N0364S) onto the blots, and ii) verification that the RNA left the gel during the transfer.

After staining, the contents of the gel were semi-dry transferred onto a Zeta-Probe membrane (Bio-Rad, catalog no. 162-0165); transfer was carried out in the Owl semi-dry electroblotter (Thermo Scientific) at 50 mA for 75 min in 1X TBE. Membranes were allowed to air-dry for 30 min and UV cross-linked twice (120000 uJ each) in the Stratalinker 1800 UV crosslinker (Stratagene).

Blots were probed with 32P end-labelled oligonucleotides ~20 nt in length; each probe was given a reference number, and their full sequences are included as appendix 1. The labelling reaction consisted of 10 pmol DNA termini, 5 uL 10X Fermentas buffer A (500 mM Tris-HCl (pH 7.6 at 25°C), 100 mM MgCl2, 50 mM DTT, 1 mM spermidine and 1 mM EDTA), 15 uCi γ-32P-ATP (using 6000 Ci/mmol, 10 mCi/mL γ-32P-ATP, Perkin-Elmer catalog no. BLU502Z), and 2 uL (10 U/uL) of T4 polynucleotide kinase (Fermentas, catalog no. EK0031) in a 50 uL reaction, which was carried out at 37°C for 30 min and subsequently stopped at 65°C for 15 min. Probes were purified using Illustra G-25 Microspin columns; columns were prepared by centrifugation for 1 min at 735 g, at which point the probe was loaded into the column and eluted by centrifugation for 2 min at 735 g.
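The quantities in the labelling reaction can be cross-checked with a short calculation: the volume of γ-32P-ATP stock that delivers 15 uCi, the corresponding molar amount of ATP relative to the 10 pmol of DNA termini, and the total kinase units. This is a sketch under the assumption that the stock is at its nominal activity; radioactive decay since the reference date is ignored, and the variable names are illustrative.

```python
# Nominal stock parameters taken from the reaction description.
ACTIVITY_UCI = 15.0                 # uCi added per reaction
STOCK_MCI_PER_ML = 10.0             # 10 mCi/mL is numerically 10 uCi/uL
SPECIFIC_ACTIVITY_CI_PER_MMOL = 6000.0
DNA_TERMINI_PMOL = 10.0
KINASE_UL, KINASE_U_PER_UL = 2.0, 10.0

# 1 mCi/mL = 1 uCi/uL, so volume (uL) = activity (uCi) / concentration (uCi/uL).
atp_volume_ul = ACTIVITY_UCI / STOCK_MCI_PER_ML

# 1 Ci/mmol = 1e6 uCi / 1e9 pmol = 1e-3 uCi/pmol.
uci_per_pmol = SPECIFIC_ACTIVITY_CI_PER_MMOL * 1e-3
atp_pmol = ACTIVITY_UCI / uci_per_pmol

kinase_units = KINASE_UL * KINASE_U_PER_UL

print(f"ATP stock volume: {atp_volume_ul} uL")
print(f"Labelled ATP: {atp_pmol} pmol for {DNA_TERMINI_PMOL} pmol of termini")
print(f"T4 PNK: {kinase_units} U")
```

On these nominal figures the reaction contains 1.5 uL of stock, that is 2.5 pmol of labelled ATP, so the DNA termini are in roughly four-fold molar excess over the label.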

Prior to probing, blots were pre-hybridized in 10 mL ExpressHyb™ Hybridization Solution from Clontech in a screw-top glass hybridization tube for 30 min at 37°C in a rotating hybridization oven. Probe (50 uL) was then added and hybridizations were carried out for 1 h at 37°C. The first wash was in 2X SSC (0.3 M NaCl, 0.03 M sodium citrate) 0.05% SDS for 20 min. Three subsequent washes were then performed using 2X SSC 0.05% SDS for 5 min at room temperature. Blots were wrapped in Saran™ wrap and exposed to GE Healthcare Life Sciences Storage Phosphor Screens overnight. Blots which were reprobed were first stripped in 0.1X SSC 0.1% SDS for 1 h at 65°C and stored wrapped at 4°C.

PCR amplification and cloning of the 53 nt SL RNA exon-containing transcript

As the 53 nt SL RNA exon-containing transcript appeared to be enriched in the antibody pellet lane from the R1131 experiment (figure 11A), 5 ug of Ciona intestinalis body-wall muscle RNA was immunoprecipitated with the R1131 antibody, and the resulting supernatant and pellet fractions were poly(A) tailed with poly(A) polymerase (PAP, Ambion, catalog no. AM2030) in a reaction consisting of 20 uL 5X PAP buffer (Ambion), 10 uL 25 mM MnCl2, 1 uL 100 mM ATP, and 1.5 uL PAP (2 U/uL) in a 100 uL reaction. The poly(A)-tailed RNA was phenol/chloroform extracted, ethanol precipitated, and resuspended at 0.5 ug/uL. Resuspended RNA (2 uL) was reverse transcribed using primer 9038RT, a poly(T) primer linked to an arbitrary anchor sequence (see methods subsection entitled “reverse transcription”).

PCR using primers specific to the SL RNA (9015) and the arbitrary anchor sequence (9038) was used to amplify those RNAs containing the SL RNA sequence. Each 50 uL reaction contained 2 uL reverse transcribed DNA, 5 uL 10X Taq buffer with KCl (100 mM Tris-HCl (pH 8.8 at 25°C), 500 mM KCl, 0.8% (v/v) Nonidet P40), 1 uL dNTPs (10 mM each), 0.5 uL Taq (Fermentas, 5 U/uL, catalog no. EF0511), 1 uL each of primers 9015 and 9038 (0.1 ug/uL each; see appendix 1 for sequence), and 3.0 uL of MgCl2 (25 mM), and was run using PCR program POLYAPCR (appendix 2). Results of the PCR were visualized by agarose gel electrophoresis, and as the amplified pellet lane RNA appeared to contain a PCR product slightly larger than that of the amplified supernatant lane RNA (not shown), the cDNA was cloned into the pCR®4-TOPO® vector from Invitrogen.
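The final concentrations in the 50 uL PCR follow from the stock concentrations and volumes listed above by the usual dilution relation, C_final = C_stock × V_stock / V_total. The sketch below applies it to the components whose stock concentrations are given; the helper function and dictionary keys are illustrative names, not part of any protocol software.

```python
TOTAL_VOLUME_UL = 50.0

def final_concentration(stock: float, volume_ul: float,
                        total_ul: float = TOTAL_VOLUME_UL) -> float:
    """Dilution relation: final = stock * (volume added / total volume)."""
    return stock * volume_ul / total_ul

pcr_final = {
    "MgCl2_mM": final_concentration(25.0, 3.0),
    "each_dNTP_mM": final_concentration(10.0, 1.0),
    "KCl_mM": final_concentration(500.0, 5.0),       # from the 10X Taq buffer
    "Tris_HCl_mM": final_concentration(100.0, 5.0),  # from the 10X Taq buffer
    "Taq_units": 0.5 * 5.0,                          # 0.5 uL at 5 U/uL
    "each_primer_ug": 1.0 * 0.1,                     # 1 uL at 0.1 ug/uL
}

for component, value in pcr_final.items():
    print(f"{component}: {value:g}")
```

This gives 1.5 mM MgCl2, 0.2 mM of each dNTP, 50 mM KCl, 10 mM Tris-HCl, 2.5 U of Taq, and 0.1 ug of each primer per reaction.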

The cloning reaction consisted of 2 uL of PCR reaction from the antibody pellet cDNA, 1 uL TOPO® vector (10 ng plasmid), and 1 uL of salt solution (1.2 M NaCl, 0.06 M MgCl2) in a 6 uL reaction. Four clones were sequenced (figures 6A&B) at the Institut de recherche en immunologie et en cancérologie (IRIC) sequencing facility with the M13+ primer.

Reverse transcription

The same reverse transcription protocol was used several times throughout my work. The following framework was applied for all of the procedures, with only the primers and RNA differing. The RNA, reverse transcription primer (1 ug), and 1 uL of dNTP mix (10 mM each) were combined in a 12 uL reaction. This reaction was heated to 65°C for 5 min and snap-chilled on ice. To that reaction, 4 uL of 5X first-strand buffer (Invitrogen, 250 mM Tris-HCl, pH 8.3 at room temperature; 375 mM KCl; 15 mM MgCl2) and 2 uL of 0.1 M DTT were added; the reaction was brought up to 42°C for 2 min, at which point 1 uL (200 U) of SuperScript® II Reverse Transcriptase (Invitrogen) was added, and the reaction was brought to a final volume of 20 uL. Reverse transcription proceeded at 42°C for 50 min, and was stopped at 70°C for 15 min.

Surveying the expression of various SL RNAs

The Ciona intestinalis SL RNA had previously been shown to be ~46 nt[1]; however, it was apparent that there may be some size heterogeneity amongst various SL RNA transcripts. In order to survey this variation, the plasmid DNA containing the same poly(A)-tailed and SL-PCR amplified reaction which initially produced the ~46 nt sequence was re-transformed into competent DH5alpha cells by heat-shock for 30 s at 42°C. The transformed cells (10 uL) were plated, and 96 colonies were randomly selected for sequencing by Génome Québec. The sequences (table 1) of the 73/96 clones resembling SL RNAs are presented in the results subsection “Survey of SL RNA size and sequence variation”.

Immunoprecipitation using the K121 mouse monoclonal antibody

The immunoprecipitation experiment involving the K121 antibody was carried out by K. Hastings. I used these immunoprecipitated samples for Northern blots.

Freshly isolated body-wall muscle RNA (30 ug) was incubated for 1.5 h on a rocking platform at 4°C in 240 uL of NET-2 (150 mM NaCl, 50 mM Tris-HCl pH 7.5, 0.05% NP-40) and 50 uL of K121 antibody/agarose conjugate (Oncogene Research Products catalog no. NA02A). After 1.5 h, samples were briefly centrifuged and the supernatants were removed and stored. Pellets were rinsed three times in 1 mL NET-2. After three rinses, 150 uL NET-2 and 7.5 ug of yeast tRNA, which served as a carrier, were added to the pellet. RNA was extracted with an equal volume of phenol/chloroform, and ethanol precipitated.

Immunoprecipitation using the R1131 rabbit polyclonal antiserum

Immunoprecipitations to investigate the 5′-ends of the SL RNA and trans-spliced mRNAs were performed using a polyclonal rabbit antiserum, R1131, from Synaptic Systems (catalog no. 201 002) which was raised against synthetic TMG caps and which specifically recognizes TMG[88]. Antiserum (20 uL) was coupled to 25 uL of Protein A-Sepharose® beads from Sigma-Aldrich® overnight in a head-over-tail incubator at 4°C in 1X PBS (137 mM NaCl, 2.7 mM KCl, 10 mM sodium phosphate dibasic, 2 mM potassium phosphate monobasic, pH 7.4). After overnight coupling, the beads were rinsed 3 times in 4°C 1X PBS to remove unbound antibody. RNA was incubated with the bead-coupled antiserum in 250 uL IPP150 (10 mM Tris-HCl, pH 7.4, 150 mM NaCl, 0.1% NP-40, 0.1 U/mL RNasin (Promega, catalog no. N2111)) at 4°C. An antibody-free control containing only beads was done in parallel to investigate whether there was non-specific binding of the RNA to the beads. After 1 h of binding, the samples were briefly centrifuged, the supernatants were removed, and the pellets were washed 5 times in 4°C IPP150. After washing, 250 uL of IPP150 was added back to the pellet fractions, in addition to quail poly(A)-minus RNA to serve as a carrier. RNA was purified from both the final pellet and original supernatant fractions by adding an equal volume of phenol and then chloroform to each fraction, followed by vortexing and centrifugation to separate the aqueous and organic phases. RNA was precipitated out of the recovered aqueous phase using 2.5 volumes of 95% ethanol and 0.1 volumes of 3 M NaCl, and finally resuspended in distilled water.

Shrimp alkaline phosphatase (SAP) treatment

RNA was mixed with 1 uL of SAP (Fermentas, 1 U/uL, catalog no. EF0511) and 5 uL of 10X reaction buffer (0.1 M Tris-HCl (pH 7.5 at 37°C), 0.1 M MgCl2 and 1 mg/mL bovine serum albumin) in a 50 uL total reaction. Reactions proceeded for 1 h at 37°C and were stopped at 65°C for 15 min. RNA was purified by adding 1 volume of phenol followed by 1 volume of chloroform, then vortexing and centrifuging to separate the aqueous and organic phases; it was then ethanol precipitated out of the aqueous phase and resuspended in distilled water.

Tobacco acid pyrophosphatase (TAP) treatment

RNA was mixed with 1 uL of TAP (Epicentre, 10 U/uL, catalog no. T19050) and 5 uL of 10X reaction buffer (0.5 M sodium acetate (pH 6.0), 10 mM EDTA, 1% β-mercaptoethanol, and 0.1% Triton® X-100) in a 50 uL total reaction. Reactions proceeded for 2 h at 37°C. RNA was purified by adding 1 volume of phenol followed by 1 volume of chloroform, with vortexing and centrifugation to separate the aqueous and organic phases; it was then ethanol precipitated from the aqueous phase and finally resuspended in distilled water.

Northern blotting to determine the cap status of the SL RNA

To determine whether the 5′-end of the Ciona intestinalis SL RNA is protected by a cap, Northern blots of SAP- and TAP-treated Ciona intestinalis RNA samples, as well as enzyme-treated 5′-phosphorylated synthetic SL exon oligonucleotides, were performed. To achieve the single-phosphate or single-nucleotide resolution required by these experiments, 1 mm thick gels of 20% acrylamide (19:1 acrylamide:bisacrylamide) and 7.5 M urea were used (large gels were 16 cm x 16 cm, run in a Protean® II apparatus from Bio-Rad, and required ~40 mL of solution to cast). These large gels were run overnight at a constant 5 mA until sufficient resolution had been achieved (the xylene cyanol was approximately 3 cm from the bottom of the gel). To facilitate transfer out of a higher-percentage gel, semi-dry transfer was carried out at 50 mA for 90 min in 1X TBE. Apart from these modifications, procedures were carried out as described in the methods subsection “Northern blotting”.

For all gels used for investigating the 5′-ends of RNAs (i.e. gels containing samples treated with SAP, TAP, or both), the total amount of RNA per lane was kept equal within an experiment, because the difference in migration between transcripts that differ by a single phosphate is small and slight variations in the amount of RNA could affect how far the RNAs in a lane migrate. Accordingly, lanes containing synthetic SL exon oligonucleotides without Ciona intestinalis RNA received equivalent amounts of carrier RNA.

Immunoprecipitation using the H20 mouse monoclonal antibody

An immunoprecipitation to investigate the methylation status of the SL RNA cap was performed using a monoclonal mouse antibody called H20 from Synaptic Systems (catalog no. 201 001), which was raised against synthetic TMG caps and which recognizes both TMG and m7G, but not unmethylated G[89]. Antiserum (80 uL) was coupled to 25 uL of Protein A-Sepharose® beads from Sigma-Aldrich® overnight in a head-over-tail incubator at 4°C in 1X PBS. Washes, binding conditions, and elution were the same as those for the R1131 antibody above.

Immunoprecipitation using the anti-Sm antiserum

As the purpose of this experiment was to determine if the Ciona intestinalis SL RNA is complexed with Sm proteins, this immunoprecipitation experiment against the Sm antigen required the use of whole cell homogenate rather than purified RNA, as purification of the RNA prior to immunoprecipitation would disrupt protein-RNA complexes. A 10% w/v Ciona intestinalis intestine extract, prepared by Ken Hastings, was homogenized in homogenization buffer (150 mM NaCl, 10 mM Tris-HCl pH 8.5, 1 mM MgCl2, 1 mM PMSF in DMSO, 0.5% NP40). Extract was centrifuged at 5000 g for 5 min at 4°C and supernatant was removed and stored on ice.

Human anti-Sm antiserum (15 uL) from Immunovision (catalog no. HSM-0100) was coupled to 30 uL Protein A-Sepharose® beads from Sigma-Aldrich® in 300 uL of NET-2 for 80 min at room temperature. Antibody-coupled beads were rinsed twice in 1 mL NET-2, and finally resuspended in 30 uL NET-2. The tissue extract (300 uL) was added to the antibody-coupled beads, as well as to beads alone, to test for non-specific binding to the beads. Immunoprecipitation was for 2 h at 4°C with gentle agitation. An aliquot of extract alone, referred to as mock-immunoprecipitated extract, was treated in the same manner as the immunoprecipitated extract except that no beads or antiserum were added. RNA was purified from this aliquot at the end of the procedure to control for any effect that incubation as a whole cell lysate had on the RNA contained within. Supernatants were removed and stored, and pellets were washed three times in 1 mL ice-cold NET-2, and finally resuspended in 150 uL of NET-2. Yeast tRNA (1.5 ug) was added to each pellet to serve as a carrier. RNA from both the pellets and supernatants was phenol/chloroform extracted and ethanol precipitated.

Identification of Sm-associated transcripts

Sm-immunoprecipitated RNAs were cloned and sequenced. To do so, 2 uL of the immunoprecipitated RNA was poly(A)-tailed, as described in the methods subsection entitled “Identification of the 53 nt SL RNA exon-containing transcript”.

The same primer and conditions were also used for reverse transcription, as well as the subsequent PCR (once again, methods subsection entitled “Identification of the 53 nt SL RNA exon-containing transcript”). The PCR reaction was used in a TOPO® cloning reaction (see methods subsection “Topo TA® Cloning”), and plating 10 uL of this cloning reaction yielded numerous colonies, 8 of which were sent for sequencing at Génome Québec with the M13+ primer. Results of the sequencing (table 2) are presented in the results subsection entitled “The Ciona intestinalis SL RNA, as well as the 53 nt SL RNA exon-containing transcript, are snRNPs associated with Sm proteins”.

RNase H digestion of Ciona intestinalis TnI mRNA

Because others had reported problems immunoprecipitating full-length, but not RNase H-digested, RNAs with an anti-TMG antibody[3, 28], RNase H digestion, which cleaves the RNA strand of DNA/RNA hybrids, was performed on the trans-spliced Ciona intestinalis TnI mRNA prior to immunoprecipitation[3, 28]. Ciona intestinalis body-wall muscle RNA (30 ug) was combined with 1 ug of an oligonucleotide (9294) complementary to bases 101-122 of the Atlantic Ciona intestinalis TnI gene (numbered with the first base after the trans-splice acceptor site as +1), 5 uL of 10X RNase H reaction buffer (50 mM Tris-HCl (pH 8.3 at 25°C), 75 mM KCl, 3 mM MgCl2, 10 mM dithiothreitol), and 2 uL RNase H (NEB, 5 U/uL, catalog no. N2111) in a 50 uL reaction. The reaction proceeded for 1 h at 37°C and was stopped by incubation for 20 min at 65°C. This reaction should release the first 100 nt of the TnI mRNA after the trans-splice acceptor site from the rest of the mRNA, yielding a 116 nt product (100 nt of TnI mRNA + 16 nt SL).

ATTCTATTTGAATAAGATGCAGTTTCATCTACAGCAACCGGTATCAATTGGTTAAGTAGGTCTTGTAGT TTGGCTTAGCAACGCAACAAAACAGCACCATGTCGGAAGAGAGCGGAGTTCGCAAAATGACCCACCAGC GCAAGATGATGCTCAAGTCTCTCATGCTTAACAAAGCACGAGAAGATTTGAAGCGAGAAATGGAACAAA AGGCGGAGGCGAAAAAAGCTGAGCTAAGCCAACGACTTGAGCCACTCAGCGGCTTAAATTCCATGTCGT CACAAGAACTCATGGATCTCTGCCGAGAACTCCATGGTAAGATCGATAAAGTTGACGAGCAACGATTTG ACATTGAGGCAAGAGTAAAGAAGAATGATACTGAGATTGAAGAGTTGAACCAGAAAATCTTTGATCTTC GTGGCAAATTCAAGCGCCCACCACTGCGACGTGTAAGGATGTCAGCTGACCAGATGTTGCGAGCTCTGC TTGGTTCGAAACACAAAGTATCCATGGATTTACGATCCAACCTCAAATCCGTCAAGAAGGGAGGAGAGA AGAAGGAAGACGCCGAGGTTAAAGATTGGAGAGACAACATTGAAGCCAAGCAAGGAATGGGCGGAAAGA AGGCAGTTTTTGAGGGAGCTCAATAAACCTGTTAATAATTTAACCAAAAAAGAATTTCACAGAGTATTG CTATAATGTTATACGAATAATTTTTGCAACATTCGAAATAAAATAACGAACAATAACCACATTATGTGA TCTATGTTTGTTGGCTGGTTTTACGTATTTTTCTTTGGCATGTTGCTTAGAATACACACAAATAAATCA ATTGT

Figure 3. Full-length Atlantic Ciona intestinalis trans-spliced TnI mRNA (including the SL). The underlined characters are reverse complementary to the oligonucleotide (9294) used for RNase H digestion while the bold characters correspond to the nucleotides which are antisense with respect to the detection probe (9281).
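The expected size of the RNase H product can be checked with a short calculation. The sketch below assumes, for illustration, that cleavage occurs at the 5′ edge of the RNA:DNA hybrid (i.e. immediately before base 101); in practice RNase H may cut anywhere within the hybridized region, so the real product could be slightly longer. The function name is hypothetical.

```python
# SL sequence as it appears at the 5' end of the trans-spliced TnI mRNA (Figure 3).
SL_EXON = "ATTCTATTTGAATAAG"


def rnase_h_5prime_fragment_length(sl_length, first_hybrid_base):
    """Length of the 5' fragment released by RNase H.

    Assumes cleavage at the 5' edge of the hybrid. Bases are numbered with +1
    as the first base after the trans-splice acceptor site, so the fragment
    keeps (first_hybrid_base - 1) nt of mRNA plus the SL itself.
    """
    downstream_nt = first_hybrid_base - 1
    return sl_length + downstream_nt


fragment_nt = rnase_h_5prime_fragment_length(len(SL_EXON), first_hybrid_base=101)
print(fragment_nt)
```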

Measuring hybridization on Northern blots

The freely available software ImageJ[90] was used to measure the level of hybridization to a given transcript. Representative background values were measured for each lane, as background tended to vary widely from lane to lane, especially when hybridizing the blots to an antisense oligonucleotide (9060) corresponding to the SL RNA exon. This was not unexpected; the immunoprecipitation fraction in which trans-spliced mRNAs were present was expected to exhibit a higher background due to the presence of many mRNAs and mRNA fragments containing the target SL sequence. The image area corresponding to the transcript of interest was then defined, its intensity on the blot measured, and the lane-specific background value subtracted to give a corrected value. All measurements have been included as appendix 3. In some cases background for the lane exceeded the signal from the region of the blot where the transcript should be present, and in these cases the amount of transcript was set to zero. The hybridization signals measured in the supernatant and pellet fractions were summed to give a value corresponding to the total amount of RNA recovered from an immunoprecipitation, and the proportion of that total residing in either the supernatant or the pellet lane was calculated.
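The quantification steps described above can be sketched as follows. This is a minimal illustration of the arithmetic only; the intensity values are made-up placeholders, not measurements from appendix 3, and the function names are hypothetical.

```python
def corrected_signal(band_intensity, lane_background):
    """Subtract lane-specific background; negative results are set to zero."""
    return max(band_intensity - lane_background, 0.0)


def fraction_split(supernatant_signal, pellet_signal):
    """Return (supernatant, pellet) proportions of the total recovered signal.

    Returns None when no signal was recovered in either fraction.
    """
    total = supernatant_signal + pellet_signal
    if total == 0:
        return None
    return supernatant_signal / total, pellet_signal / total


# Placeholder intensities (arbitrary units), for illustration only.
supernatant = corrected_signal(band_intensity=150.0, lane_background=50.0)
pellet = corrected_signal(band_intensity=350.0, lane_background=50.0)
below_background = corrected_signal(band_intensity=40.0, lane_background=60.0)

proportions = fraction_split(supernatant, pellet)
print(proportions)
```

Clamping at zero before summing means that a lane whose background exceeds its band signal contributes nothing to the recovered total, which matches the treatment described in the text.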

Topo TA® cloning

The same framework cloning protocol is referenced several times throughout this section; the only difference between clonings is the manner in which the DNA for the Topo TA® reaction was produced, the specifics of which are given in the methods subsection describing each experiment. Topo TA® reactions (6 uL) containing the DNA of interest were assembled and allowed to proceed for 5 min at room temperature. The Topo TA® reaction (2 uL) was added to one vial (50 uL) of chemically competent One Shot® TOP10 cells that had been previously thawed on ice, and the mixture was incubated on ice for 5 min. The mixture was then heat-shocked for 30 s in a 42°C water bath. After 30 s, the tubes were immediately transferred to ice, and 250 uL of room temperature S.O.C. medium (2% tryptone, 0.5% yeast extract, 10 mM NaCl, 2.5 mM KCl, 10 mM MgCl2, 10 mM MgSO4, 20 mM glucose) was added. Cells were then incubated in a 37°C shaking (~200 rpm) incubator for 1 h. Cells were plated on LB (1% tryptone, 0.5% yeast extract, 0.5% NaCl) 1.5% agar plates containing ampicillin (25 ug/mL) and allowed to grow overnight at 37°C. Selected colonies were used to inoculate 5 mL cultures of LB containing ampicillin (25 ug/mL), which were cultured overnight at 37°C. Cells were harvested by centrifugation at 5000 rpm, and plasmids were extracted with a QIAprep Spin Miniprep Kit (Qiagen), following the manufacturer's instructions.

Production of a fluorescent in situ hybridization probe

The probe was amplified from an existing genomic phage library, kindly provided by Robert Zeller of San Diego State University, which had been created by cloning partially Sau3A digested genomic DNA into the pBK-CMV vector which had been linearized with BamHI. Sau3A cuts only once in the SL RNA gene repeat. Because this was only a partial digest, inserts should range in size in multiples of 264 bp, the SL RNA gene repeat unit length. Primers (9240 and 9241) which spanned both the phage vector and the SL RNA repeat were used, rather than primers entirely within the repeat unit itself, to prevent the successive shortening of amplicons normally seen when amplifying tandemly repeated sequences.

ctacatttgcacgaatgggtctaggtgtttttctttatacgaagaaaactttttgaattaaagcaccca
agtcggacattttacggtATTCTATTTGAATAAGGTAAGAATCTGAGCTTTGGCTCAACTATGTatata
aacaaaagagttctggacctaaaatacttgtttgatcaattatcgcgctacaccgcacaacgtaagaga
tatcggactttaccattattacataccgttatcacatatgtcagattaacacatcgg

Figure 4A. The SL RNA gene unit repeat (~264 bp), found through a BLAST search of the Ciona intestinalis WGS trace archives. Sau3A site in bold; SL RNA in its entirety in capitals.
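The properties of the repeat unit stated above (264 bp length, a 46 nt SL RNA in capitals, and a single Sau3A site) can be checked directly against the sequence in Figure 4A. The sketch below is illustrative; it assumes the standard Sau3A recognition sequence GATC and treats the unit as part of a tandem array so that a site spanning the junction between adjacent units would also be counted. Function and variable names are hypothetical.

```python
# SL RNA gene repeat unit as printed in Figure 4A (SL RNA in capitals).
REPEAT_UNIT = (
    "ctacatttgcacgaatgggtctaggtgtttttctttatacgaagaaaactttttgaattaaagcaccca"
    "agtcggacattttacggtATTCTATTTGAATAAGGTAAGAATCTGAGCTTTGGCTCAACTATGTatata"
    "aacaaaagagttctggacctaaaatacttgtttgatcaattatcgcgctacaccgcacaacgtaagaga"
    "tatcggactttaccattattacataccgttatcacatatgtcagattaacacatcgg"
)
SAU3A_SITE = "GATC"  # Sau3A recognition sequence


def count_sites_in_tandem_repeat(unit, site):
    """Count recognition sites per unit of a tandem array.

    The first len(site) - 1 bases are appended to the end of the unit so that
    a site spanning the junction between two adjacent units is also found.
    """
    unit = unit.upper()
    wrapped = unit + unit[: len(site) - 1]
    return sum(1 for i in range(len(unit)) if wrapped.startswith(site, i))


unit_length = len(REPEAT_UNIT)
sl_rna_length = sum(1 for base in REPEAT_UNIT if base.isupper())
sau3a_sites = count_sites_in_tandem_repeat(REPEAT_UNIT, SAU3A_SITE)

print(unit_length, sl_rna_length, sau3a_sites)
```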


GGGTACACTTACCTGGTACCCCACCCGGGTGGAAAATCGATGGGCCCGCGGCCGCTCTAGAAGTACTCT CGAGAAGCTTTTTGAATTCTTTGGATCaattatcgcgctacaccgcacaacgtaagagatatcggactt taccattattacataccgttatcacatatgtcagattaacacatcggctacatttgcacgaatgggtct aggtgtttttctttatacgaagaaaactttttgaattaaagcacccaagtcggacattttacggtattc tatttgaataaggtaagaatctgagctttggctcaactatgtatataaacaaaagagttctggacctaa aatacttgtttgatc…CACTAGTGTCGACCTGCAGGCGCGCGAGCTCCAGCTTTTGTTCCCTTTAGTGA

Figure 4B: The SL RNA gene unit repeat surrounded by an arbitrary amount of the pBK-CMV vector. Vector in all capitals; SL RNA gene in all lower case. Bold represents the Sau3A cut site; ellipsis represents the potential presence of another SL RNA gene repeat. Underlined nucleotides are complementary to primers 9240 (right) and 9241 (left) used for PCR.

The initial PCR reaction (1 uL phage library, 5 uL 10X Taq buffer with KCl (100 mM Tris-HCl (pH 8.8 at 25°C), 500 mM KCl, 0.8% (v/v) Nonidet P40), 1 uL dNTPs (10 mM each), 0.5 uL Taq (5 U/uL), 1 uL each of primers 9240 and 9241 (0.1 ug/uL; see appendix 1 for sequence), and 3.0 uL of MgCl2 (25 mM), in a 50 uL reaction, using PCR program PCR54 (appendix 2)) produced faint bands that appeared to be in multiples of ~300 bp, ranging from ~0.3 kb to ~1.1 kb (not shown). The uppermost (~1.1 kb) band was extracted using the QIAquick Gel Extraction Kit from Qiagen according to the manufacturer's instructions. A PCR reaction similar to the previous one was set up, this time using 1 uL of the gel-extracted uppermost band as the template, and using the program PCRGRADIENT instead of PCR54. This re-amplification produced a clearly visible ladder of bands with the rungs in multiples of ~300 nt for the PCR reactions with the annealing phase carried out at 61.0°C (figure 20, lane 2) and 66.6°C (figure 20, lane 3), and as such the now abundant uppermost band was once again extracted from the gel from the lane in which it was enriched. A TOPO® TA Cloning (Invitrogen) reaction was set up (4 uL gel-excised DNA, 1 uL salt solution (1.2 M NaCl, 0.06 M MgCl2), 1 uL of pCR®4-TOPO® vector in a 6 uL reaction; see section entitled “Topo TA® Cloning”), and plating 20 uL of the reaction produced colonies. Colonies were selected and sent for sequencing to the IRIC sequencing facility. Two plasmids, SLx4.6 (figure 21A) and SLx4.7 (figure 21B), each corresponding to four SL RNA genes, were used for fluorescent in situ hybridization.
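The ladder observed in these PCR reactions is consistent with inserts containing whole numbers of the 264 bp repeat unit. As a rough sketch, the expected sizes for one to four repeats can be listed as below. The `flank_bp` parameter is a hypothetical placeholder for the vector sequence between the primers and the insert, whose length is not stated here, so it defaults to zero.

```python
REPEAT_UNIT_BP = 264  # SL RNA gene repeat unit length


def expected_insert_sizes(max_repeats, flank_bp=0):
    """Sizes (bp) of amplicons carrying 1..max_repeats tandem repeat units.

    flank_bp is an assumed constant contribution from vector sequence
    included in the amplicon; it is not specified in the text.
    """
    return [n * REPEAT_UNIT_BP + flank_bp for n in range(1, max_repeats + 1)]


ladder = expected_insert_sizes(max_repeats=4)
print(ladder)
```

With no flanking sequence, four repeats give 1056 bp, in line with the ~1.1 kb uppermost band and with the four SL RNA genes found in plasmids SLx4.6 and SLx4.7.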

Fluorescent in situ hybridization

SL RNA gene multimers SLx4.6 and SLx4.7 were sent to a collaborator, Eiichi Shoguchi, at the Okinawa Institute of Science and Technology, to perform the fluorescent in situ hybridization (FISH) experiment identifying the SL RNA gene cluster(s), following an established FISH protocol[91].

Amplification of the TnI outron

Attempts to amplify the TnI outron were initially carried out using RT-PCR, with a reverse transcription primer complementary to the SL RNA intron, the sequence of which was previously identified by a student in our lab, Parul Khare, through successive deletions of trans-splice acceptor sites[92]. Initial attempts to amplify the outron used primer 9232 (complementary to the SL RNA intron) for reverse transcription; 1 ug each of Ciona intestinalis body-wall muscle RNA and embryo RNA were reverse transcribed with this primer. Two subsequent PCR reactions (1 uL cDNA, 5 uL 10X Taq buffer with KCl (100 mM Tris-HCl (pH 8.8 at 25°C), 500 mM KCl, 0.8% (v/v) Nonidet P40), 1 uL dNTPs (10 mM each), 0.5 uL Taq (5 U/uL), 1 uL of each primer (0.1 ug/uL; see appendix 1 for sequence), and 3.0 uL of MgCl2 (25 mM), in a 50 uL reaction, using PCR program BPPCR (appendix 2)) were carried out with the cDNA produced from this reverse transcription, one using primer pair 9232/9233 and the other using primer pair 9232/9234. Two primer pairs were used because successive deletions of the trans-splice acceptor sites of this gene had shown that an upstream cryptic trans-splice acceptor site was used when the natural acceptor site was deleted[92, 93]. As it is unknown whether this cryptic site is also used prior to trans-splicing to the natural acceptor site, one primer was placed on either side of this site to allow for amplification in either case.

As no amplification was observed in either of these reactions, the magnesium concentration and annealing temperature were varied over a wide range, without success. An aliquot (1 uL) of each PCR reaction was also used as template for a subsequent PCR reaction in case the product was present but not detectable; although numerous bands of incorrect sizes were observed, no amplification corresponding to the predicted size of the outron occurred.

One potential problem with amplification was that the reverse transcriptase may have had trouble reading through the branchpoint. Although there are previous reports of this polymerase reading through branchpoints[94], those experiments placed the primer for reverse transcription several hundred bases away from the branchpoint, whereas I was limited to placing the primer nearly adjacent to the branchpoint due to the short size (30 nt) of the SL RNA intron. To circumvent this problem, the reverse transcription was attempted with a primer that spans the branchpoint and extends 5 nt into the TnI outron.

Figure 5. TnI outron and the branchpoint-spanning reverse transcription primer 9267. Horizontal text corresponds to the TnI outron; diagonal text corresponds to the SL RNA intron. Capitals designate the nucleotides complementary to primer 9267.

The branchpoint-spanning primer, 9267, was used for reverse transcription of Ciona intestinalis body-wall muscle RNA in parallel with the primer previously used for reverse transcription, 9232. Subsequent PCR reactions (1 uL cDNA, 5 uL 10X Taq buffer with KCl (100 mM Tris-HCl (pH 8.8 at 25°C), 500 mM KCl, 0.8% (v/v) Nonidet P40), 1 uL dNTPs (10 mM each), 0.5 uL Taq (5 U/uL), 1 uL of each primer (0.1 ug/uL; see appendix 1 for sequence), and 3.0 uL of MgCl2 (25 mM), in a 50 uL reaction, using PCR program PCR54 (appendix 2)) used primer pairs 9232/9233 and 9266/9267 for the reverse transcription reactions done using primers 9232 and 9267, respectively.

Although the PCR with primers 9232/9233 failed as previously, the PCR with primer pair 9266/9267 produced a faint band of the expected size, ~400 nt (figure 23). This band was extracted using the QIAquick Gel Extraction Kit from Qiagen, and 1 uL of the extracted DNA was used as template for a second PCR using the same conditions. The now abundant ~400 nt band was extracted, and 4 uL of the extracted DNA was used in a Topo TA® cloning reaction. Plating of 10 uL of the cloning reaction generated many colonies, three of which were sent to IRIC for sequencing with the M13+ primer. Two of these sequences (figure 24) corresponded to the amplified TnI outron.


CHAPTER 3 - RESULTS: CHARACTERIZATION OF THE CIONA INTESTINALIS SL RNA IN TERMS OF SIZE HETEROGENEITY, CAP STRUCTURE, ASSOCIATION WITH SM PROTEINS, AND EXPRESSION PROFILE

Size heterogeneity amongst SL-related transcripts

The principal technique I utilized throughout my thesis was Northern blotting following polyacrylamide gel electrophoresis of Ciona intestinalis RNA. Previous Northern blot analysis of Ciona intestinalis SL RNA, which indicated the presence of a single transcript of ~50 nt[1], had used lower resolution agarose gel electrophoresis.

The first novel result of my work was that probing an acrylamide gel Northern blot containing Ciona intestinalis body-wall muscle RNA with an antisense oligonucleotide (9060) corresponding to the SL RNA exon revealed 3 (or more) distinct bands. In body-wall muscle RNA (figure 6A, lane 1), a doublet migrated just below the 50 nt marker, and a larger transcript migrated just above this marker. Previous sequence data had shown the <50 nt transcript was ~46 nt[1], and subsequent sequence data (see chapter 3 results subsection entitled “Survey of SL RNA size and sequence variation”) revealed that the doublet could be explained by transcripts in the range of ~42-46 nt. Other subsequent sequence data (see methods subsection entitled “Identification of the 53 nt SL RNA exon-containing transcript”) revealed that the >50 nt transcript had a ~53 nt sequence (figure 6A). As such, these transcripts have been labelled as ~46 and ~53 nt on this (figure 6A) and subsequent blots. Sizes are not precisely known because the sequences of both of these transcripts were generated in part through poly(A)-tailing, and as such 3′-A residues were trimmed from the sequence analysis, even though they may have been genomically encoded.

This variation in size, observed because of the greater resolving power of acrylamide gel electrophoresis, may reflect in part heterogeneity among the estimated ~900 copies of the SL RNA gene present in the Ciona intestinalis genome [K. Hastings, unpublished data]. The lower ~42-46 nt bands correspond to a heterogeneous population of SL RNAs identical or similar in sequence to the canonical 46 nt SL RNA sequence previously established by poly(A)-tailing, SL-PCR amplification, cloning, and sequencing[1]. For simplicity, this apparent doublet will simply be referred to as the Ciona intestinalis SL RNA throughout this thesis. The upper band will henceforth be referred to as the 53 nt SL RNA exon-containing transcript, and as indicated below, this RNA does not derive from the SL RNA gene loci and does not act as an SL donor.

Figure 6A. Northern blot identifies previously unobserved size heterogeneity of SL-related transcripts. A 10% acrylamide gel Northern blot of Ciona intestinalis body-wall muscle RNA was hybridized with an antisense oligonucleotide (9060) corresponding to the SL RNA exon. Lane 1: 5 ug Ciona intestinalis body-wall muscle RNA.

Since resolution of the Ciona intestinalis SL RNA and the 53 nt SL RNA exon-containing transcript had not previously been observed, the size heterogeneity was a surprising result. The 53 nt SL RNA exon-containing transcript differed substantially in size from the canonical 46 nt size of the SL RNA previously identified by DNA sequencing[1]. In order to see if this transcript was simply a larger version of the canonical SL RNA, the membrane shown in figure 6A was stripped and reprobed with an antisense oligonucleotide (9102) corresponding to the SL RNA intron. Lane 1 of figure 6B shows that the 53 nt SL RNA exon-containing transcript does not contain the canonical SL intron sequence, as only the ~46 nt Ciona intestinalis SL RNA hybridized to the intron probe, which indicates the 53 nt SL RNA exon-containing transcript is not simply a larger version of the canonical SL RNA. The 53 nt SL RNA exon-containing transcript could represent a second SL RNA in Ciona intestinalis, or it could possibly be a short abundantly expressed RNA which is trans-spliced, and as such reacts with the probe for the SL RNA exon.

Figure 6B. Northern blot reveals the 53 nt SL RNA exon-containing transcript does not contain the SL RNA intron sequence, and therefore is not simply a longer version of the canonical Ciona intestinalis SL RNA. The blot featured in figure 6A was stripped and hybridized with an antisense oligonucleotide (9102) corresponding to the SL RNA intron. Lane 1: 5 ug Ciona intestinalis body-wall muscle RNA.

The 3′-(non-SL)-sequence of the 53 nt SL RNA exon-containing transcript was ultimately identified by SL-PCR of the antibody R1131 immunoprecipitation pellet fraction (figure 11A, lane 4), which was enriched for this transcript and devoid of the Ciona intestinalis SL RNA. To verify the sequence identified corresponded to a bona fide transcript and was not some artefact of cloning or PCR, a Northern blot of Ciona intestinalis body-wall muscle RNA was hybridized with an antisense oligonucleotide (9284) corresponding to the 3′-end of the sequence (figure 6A) identified by the SL-PCR experiment of the R1131 immunoprecipitation pellet fraction. As expected, a band migrating just above the 50 nt marker was observed in this blot (figure 6C, lane 1). This indicates that the SL-PCR identified the previously observed (figure 6A, lane 1) 53 nt SL RNA exon-containing transcript, and the sequence generated was not some artefact of cloning or PCR.

Figure 6C. Northern blot reveals the sequence generated by SL-PCR following R1131 immunoprecipitation identified the 3′-sequence of the 53 nt SL RNA exon-containing transcript. A 10% acrylamide gel Northern blot of Ciona intestinalis body-wall muscle RNA was hybridized with an antisense oligonucleotide (9284) corresponding to the 3′-end of the molecule generated by SL-PCR of the R1131 immunoprecipitation pellet fraction. Lane 1: 5 ug Ciona intestinalis body-wall muscle RNA.

The 53 nt SL RNA exon-containing transcript is generated by trans-splicing

In order to identify the 53 nt SL RNA exon-containing transcript, I cloned and sequenced it from antibody R1131-immunoprecipitated body-wall muscle RNA (see methods subsection entitled “Immunoprecipitation using the R1131 rabbit polyclonal antiserum”). Both the supernatant and pellet fractions from this immunoprecipitation were poly(A)-tailed, reverse transcribed with a poly(T) primer (9038RT) linked to an anchor sequence, and PCR amplified using anchor-specific (9038) and SL-specific (9015) primers. Electrophoresis on a 4% agarose gel revealed that the lane containing cDNA from immunoprecipitated RNA contained a prominent band ~10 nt larger than the band in the lane containing the cDNA from the supernatant fraction (not shown). As such, the PCR product from the immunoprecipitated pellet fraction was cloned into a TOPO® vector and sequenced. Four clones were randomly selected and sequenced, and three of these shared an identical 53 nt sequence containing the SL RNA exon sequence (figure 7A). The origin of the fourth clone (figure 7B) was unknown, as its sequence was too small to map uniquely to the Ciona intestinalis genome.

ATTCTATTTGAATAAGgtaatttttcattgtataaactattttgtttatacag

Figure 7A. Sequence of the 53 nt SL RNA exon-containing transcript generated by poly(A)-tailing and SL-PCR of R1131 immunoprecipitated RNA. Capitals represent the SL RNA exon; lower case represents the 5′-end of the Ciona intestinalis WAS protein family homolog gene, to which the SL is trans-spliced. Underlined portion represents the segment complementary to an antisense oligonucleotide (9284), used for hybridization in figure 6C.

ATTCTATTTGAATAAGattattgtagagaaact

Figure 7B. Sequence of the fourth clone generated by poly(A)-tailing and SL-PCR of R1131 immunoprecipitated RNA. Capitals represent the SL RNA exon; precise origin of the lowercase 3′-portion is unknown, as this sequence is too short to map to a unique portion of the Ciona intestinalis genome.

BLAT alignment of the 53 nt sequence (figure 7A) using the UCSC Genome Browser[95] to the Ciona intestinalis genome, December 2002 assembly, revealed that the 3′-most 37 nt portion aligned with 36/37 identities (figure 8) to the 5′-end of the trans-spliced gene annotated as the Ciona intestinalis WAS protein family homolog gene, with an expect value of 4 x 10^-12. This 37 nt sequence corresponded to the sequence between two alternative trans-splice acceptor sites of this gene, which was discovered using the KH gene models from the Ciona intestinalis Ghost database[96], which contains the most up to date gene models for Ciona intestinalis. These gene models incorporated data regarding trans-splice acceptor sites produced by our lab via a high-throughput sequence analysis of trans-spliced mRNAs in Ciona intestinalis [J. Matsumoto et al., manuscript in preparation].

This is a unique RNA in that both its 5′- and 3′-ends may be generated by trans-splicing. It is clear that the 5′-end is trans-spliced, as the SL exon primer (9015) primes on this molecule, and the SL sequence is not upstream, while there is a splice acceptor dinucleotide AG at the expected location. The 3′-most dinucleotide, also AG, marks a trans-splice acceptor site known to be utilized in the Ciona intestinalis WAS protein family homolog gene according to the KH gene models[96].

Figure 8. The 53 nt SL RNA exon-containing transcript appears to be generated by two successive trans-splicing reactions of the Ciona intestinalis WAS protein family homolog gene. Upper black bar: alignment of the latter 37 nt of the 53 nt SL RNA exon-containing transcript to the Ciona intestinalis genome. Lower black bar: gene models showing alternative trans-splice acceptor sites (arrows) of the Ciona intestinalis WAS protein family homolog gene. The trans-splice acceptor AG dinucleotides have been underlined in the genomic sequence.
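The composition of the 53 nt transcript described above can be verified from the sequence in figure 7A. The following sketch splits the sequence at the upper/lower case boundary and checks the segment lengths and the 3′-terminal AG dinucleotide; the variable names are illustrative only.

```python
# Sequence of the 53 nt transcript as printed in figure 7A
# (capitals: SL RNA exon; lower case: WAS protein family homolog gene sequence).
TRANSCRIPT_53 = "ATTCTATTTGAATAAGgtaatttttcattgtataaactattttgtttatacag"

# Index of the first lower-case base marks the SL exon / gene boundary.
boundary = next(i for i, base in enumerate(TRANSCRIPT_53) if base.islower())
sl_exon = TRANSCRIPT_53[:boundary]
gene_portion = TRANSCRIPT_53[boundary:]

# Does the 3' end carry the AG expected of a trans-splice acceptor site?
ends_with_ag = gene_portion.upper().endswith("AG")

print(len(TRANSCRIPT_53), len(sl_exon), len(gene_portion), ends_with_ag)
```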

Survey of SL RNA size and sequence variation

In order to determine the size and sequence distribution of the transcripts constituting the apparent doublet observed in lane 1 of figures 6A and 6B, 96 more clones were produced and sequenced from the poly(A)-tailing and SL-PCR that initially produced the sequence identity of the SL RNA[1]. Of the sequences produced, 73/96 (~76%) resembled the SL RNA; other sequences appeared to have been fragments of trans-spliced mRNAs which may have been amplified due to the presence of an internal poly(A) stretch. Since the primer (9038RT) used for reverse transcription contained a poly(T), and the rightward primer (9015) for PCR corresponded to the SL, it is understandable that molecules containing both poly(A) stretches and the SL sequence would be amplified. The results of this experiment indicate that the most abundantly expressed SL RNAs range from 42-46 nt, with apparent over-representation of transcripts of 42, 44, 45, and 46 nt (figure 9). This heterogeneity presumably contributes to the electrophoretic heterogeneity observed in Northern blots (e.g. figure 6A, lane 1). Sequence length may not correspond precisely to migration distance, however: because these molecules were generated through poly(A)-tailing, all 3′-A residues have been trimmed from the sequences presented in table 1, yet it is possible that some or all of the SL RNA sequences end in one or more genomically encoded 3′-A residues. The SL RNA sequences and the frequency with which they occur, as well as the frequency with which each size of SL RNA occurs, are listed below in table 1.

Table 1. Size and sequence distributions of the SL RNAs illustrate size heterogeneity of the Ciona intestinalis SL RNA.

Size (nt)  Frequency of size  Number of reads  Sequence
37         1/73               1                attctatttgaataaggtacattaaaagcattcagct
42         17/73              8                attctatttgaataaggtaagaatctgagctttggctcaaat
                              6                attctatttgaataaggtaagaatctgagctttggctcaact
                              2                attctatttgaataaggtaagaatctgagctatggctcaact
                              1                attctatttgaataaggtatgaatgtgagctttggctcatct
43         3/73               1                attctatttgaataaggtaagaatctgagctttggctcaactg
                              1                attctatttgaataaggtaagaatctgagctttggctcaccat
                              1                attctatttgaataagtgtactaaatgcttctaaggtttttgt
44         22/73              19               attctatttgaataaggtaagaatctgagctttggctcaactat
                              2                attctatttgaataaggtaagaatctgagctttggctcaaatat
                              1                attctatttgaataaggtaagattctgagctttggctcaactat
45         11/73              10               attctatttgaataaggtaagaatctgagctttggctcaactatg
                              1                attctatttgaataaggtaagaatctgagctttggctcaaatatg
46         16/73              10               attctatttgaataaggtaagaatctgagctttggctcaactatgt
                              5                attctatttgaataaggtaagaatctgagctttggctcaaatatgt
                              1                attctatttgaataaggtaagaatctgagctttggctcaactatat
47         1/73               1                attctatttgaataaggtaagttttttgggctttggctcaattttat
48         2/73               2                attctatttgaataaggtaagaatctgagctttggctcaactatgtat

The sequences of the 73/96 sequenced clones from a poly(A)-tailing and SL-PCR experiment that resemble SL RNAs. All sequences shown were followed by runs of 11-18 A residues, which have been trimmed off here, reflecting at least in part the experimental addition of 3′-poly(A) using poly(A) polymerase.
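The per-size frequencies in table 1, and the proportion of clones resembling the SL RNA, follow directly from the read counts. The sketch below recomputes them from the counts listed in the table; the variable names are illustrative only.

```python
# Read counts per distinct sequence, grouped by transcript size (nt), from table 1.
reads_by_size = {
    37: [1],
    42: [8, 6, 2, 1],
    43: [1, 1, 1],
    44: [19, 2, 1],
    45: [10, 1],
    46: [10, 5, 1],
    47: [1],
    48: [2],
}
TOTAL_CLONES = 96  # all clones sequenced, including non-SL-RNA sequences

# Total reads for each size class and for all SL RNA-like clones.
size_counts = {size: sum(reads) for size, reads in reads_by_size.items()}
sl_like_clones = sum(size_counts.values())

# Frequency of each size among SL RNA-like clones (the values plotted in figure 9).
frequencies = {size: count / sl_like_clones for size, count in size_counts.items()}

percent_sl_like = 100 * sl_like_clones / TOTAL_CLONES
most_common_size = max(size_counts, key=size_counts.get)

for size, freq in sorted(frequencies.items()):
    print(f"{size} nt: {size_counts[size]}/{sl_like_clones} = {freq:.2f}")
```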

[Figure 9 histogram. x-axis: Length (nt), categories 37, 42, 43, 44, 45, 46, 47, 48; y-axis: Frequency, scale 0 to 0.35.]

Figure 9. Histogram illustrating the size heterogeneity of the Ciona intestinalis SL RNA. A diverse set of SL RNAs ranging from 37-48 nt represents “the Ciona intestinalis SL RNA”, with over-representation of transcripts of 42, 44, 45, and 46 nt. Body-wall muscle RNA was poly(A) tailed, reverse transcribed with a poly(T) primer (9038RT) linked to an anchor, and PCR amplified with primers (9015 and 9038) specific to the SL and anchor sequences[1].

K121 immunoprecipitation indicates the Ciona intestinalis SL RNA is most likely not TMG-capped, while the 53 nt SL RNA exon-containing transcript is

The 5′-end of the SL RNA of every metazoan thus far examined contains a hypermethylated cap structure termed the TMG cap[12, 13, 19, 28-31]. To determine whether the 5′-end of the Ciona intestinalis SL RNA shares this feature with other metazoans, immunoprecipitation experiments were initially performed with the K121 anti-TMG antibody against Ciona intestinalis body-wall muscle RNA. This antibody was raised against TMG cap structures and binds TMG caps and, to a lesser extent, m7G caps[97]. As expected, the Ciona intestinalis SL RNA and the 53 nt SL RNA exon-containing transcript were observed in body-wall muscle RNA (figure 10A, lane 1) when the blot was hybridized to an antisense oligonucleotide (9060) corresponding to the SL RNA exon. Addition of carrier yeast tRNA, in an amount equal to that present in the immunoprecipitated pellet fractions, reduced but did not prevent this detection (lane 2). As determined with ImageJ, 61% of the Ciona intestinalis SL RNA recovered from the immunoprecipitation was present in the pellet fraction (lane 5), while 100% of the recovered 53 nt SL RNA exon-containing transcript was found in the pellet fraction (lane 5). Judging from these differing affinities for the K121 antibody, the Ciona intestinalis SL RNA has a different cap structure than the 53 nt SL RNA exon-containing transcript. Since the K121 antibody is known to have a high affinity for TMG caps while weakly recognizing m7G caps[97], it is plausible that the 53 nt SL RNA exon-containing transcript is TMG-capped while the Ciona intestinalis SL RNA is capped with m7G.
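The percentages quoted for each band can be read as the pellet signal divided by the total signal recovered in the pellet and supernatant lanes. The following is a minimal sketch of that arithmetic, assuming background-subtracted band intensities exported from ImageJ; the function name and the intensity values are hypothetical and are not measurements from the thesis.

```python
def pellet_percent(pellet_signal, supernatant_signal):
    """Percentage of the recovered RNA (pellet + supernatant) found in the pellet."""
    recovered = pellet_signal + supernatant_signal
    if recovered <= 0:
        raise ValueError("no signal recovered in either fraction")
    return 100.0 * pellet_signal / recovered


# Hypothetical integrated band intensities (arbitrary units), for illustration only.
sl_rna_percent = pellet_percent(pellet_signal=6100.0, supernatant_signal=3900.0)
print(f"SL RNA in pellet: {sl_rna_percent:.0f}%")
```

Expressing each band as a fraction of what was recovered, rather than of the input lane, makes the values independent of losses during the immunoprecipitation and of lane-to-lane differences in transfer efficiency.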

Figure 10A. Northern blot demonstrating the cap structures of the Ciona intestinalis SL RNA and the 53 nt SL RNA exon-containing transcript are different, most probably m7G and TMG, respectively, as judged by their varying affinity for the K121 antibody. A 10% acrylamide gel blot of K121 immunoprecipitated Ciona intestinalis body-wall muscle RNA was hybridized with an antisense oligonucleotide (9060) corresponding to the SL RNA exon. Lane 1: 5 µg body-wall muscle RNA. Lane 2: 5 µg body-wall muscle RNA + 0.75 µg yeast tRNA. Lane 3: Input RNA (Ciona intestinalis body-wall muscle RNA) for the immunoprecipitation. Lane 4: Body-wall muscle RNA supernatant fraction following K121 immunoprecipitation. Lane 5: Body-wall muscle RNA pellet fraction following K121 immunoprecipitation, also containing 0.75 µg yeast tRNA.

In order to ascertain how a bona fide TMG-capped molecule behaved in this experiment, and thereby resolve any ambiguity about the efficiency with which the K121 antibody precipitates TMG-capped molecules, the blot shown in figure 10A was stripped and hybridized with an antisense oligonucleotide (9108) corresponding to the U1 snRNA of Ciona intestinalis, which is presumed to be TMG-capped, as it is in a diverse array of organisms, including human, trypanosomes, amoeba, yeast, and plant cells[53]. As 97% of the ~150 nt U1 snRNA recovered from the immunoprecipitation is present in the pellet fraction (figure 10B, lane 5), this blot reveals that the K121 antibody is very efficient at precipitating TMG-capped RNAs. This band actually appears enriched (2.9-fold) in the immunoprecipitated pellet fraction (lane 5) as compared to the starting RNA for the experiment (lane 3), despite equal amounts of RNA being loaded into the input RNA lane and the immunoprecipitation. This discrepancy can be explained by the fact that the pellet fraction lacks the principal cellular RNAs, rRNA and tRNA, which are uncapped; as there is less RNA in this lane, transfer of the RNA from the gel to the membrane, or hybridization of the antisense U1 oligonucleotide to its target, may be more efficient, since less surrounding RNA is available to block base-pairing of the probe to its target. The similarly high efficiency of immunoprecipitation of the 53 nt SL RNA exon-containing transcript and U1 snRNA (100% versus 97%, respectively) suggests that both contain TMG caps. In contrast, the Ciona intestinalis SL RNA, which binds this antibody less efficiently (61%), may not contain a TMG cap, but possibly does contain an m7G cap.

Figure 10B. Northern blot showing that the K121 antibody bound TMG-capped molecules very efficiently in this experiment. The blot pictured in figure 10A was stripped and hybridized with an antisense oligonucleotide (9108) corresponding to Ciona intestinalis U1 snRNA. Lane 1: 5 µg body-wall muscle RNA. Lane 2: 5 µg body-wall muscle RNA + 0.75 µg yeast tRNA. Lane 3: Input RNA (Ciona intestinalis body-wall muscle RNA) for the immunoprecipitation. Lane 4: K121 immunoprecipitated body-wall muscle RNA supernatant fraction. Lane 5: K121 immunoprecipitated body-wall muscle RNA pellet fraction.

Although the K121 antibody has been shown to bind TMG-capped RNAs efficiently and m7G caps less so, its possible binding to uncapped RNAs has not been investigated. In order to ensure the K121 antibody was not non-specifically binding RNA, the gel used to create the Northern blots pictured in figures 10A and 10B was stained prior to transfer with SYBR® Safe, a stain which binds DNA and RNA non-specifically and allows for their visualization, and analyzed (figure 10C). Two prominent bands, migrating above and below the 150 nt marker with calculated sizes of ~188 and ~138 nt, respectively, are present in the lanes containing body-wall muscle RNA, body-wall muscle RNA with yeast tRNA as a carrier, the input RNA for the immunoprecipitation, and the immunoprecipitation supernatant fraction (lanes 2-5), but not in the immunoprecipitation pellet fraction (lane 6). As the expected sizes of metazoan 5.8S rRNA and 5S rRNA are ~160 nt[98] and ~120 nt[99], respectively, these abundantly expressed transcripts most likely represent those rRNAs. The 5′-ends of the 5.8S and 5S rRNAs in eukaryotes are 5′-monophosphate and 5′-triphosphate, respectively[100]. Of the 5.8S and 5S rRNAs recovered from the immunoprecipitation, 2% and 0%, respectively, were found in the pellet fraction. Since the K121 antibody does not bind these uncapped molecules, its binding is specific to capped molecules. The ~80 nt RNA present in the immunoprecipitation pellet fraction (lane 6) corresponds to yeast tRNA added to this sample after the immunoprecipitation to serve as carrier RNA; tRNA migrates at ~80 nt[101].
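The text does not state how the band sizes of ~188 and ~138 nt were calculated. One common approach, sketched here purely as an assumption, is to interpolate log10(size) linearly against migration distance between the two marker bands that bracket the band of interest. The marker sizes and distances below are hypothetical placeholders, not values measured from figure 10C.

```python
import math


def estimate_size_nt(distance_mm, markers):
    """Estimate RNA size by log-linear interpolation between bracketing markers.

    markers: iterable of (size_nt, migration_distance_mm) pairs.
    """
    ladder = sorted(markers, key=lambda m: m[1])  # order by migration distance
    for (size_a, dist_a), (size_b, dist_b) in zip(ladder, ladder[1:]):
        if dist_a <= distance_mm <= dist_b:
            t = (distance_mm - dist_a) / (dist_b - dist_a)
            log_size = math.log10(size_a) + t * (math.log10(size_b) - math.log10(size_a))
            return 10 ** log_size
    raise ValueError("band lies outside the marker range")


# Hypothetical ladder: (size in nt, distance migrated in mm).
HYPOTHETICAL_MARKERS = [(300, 10.0), (150, 20.0), (80, 30.0)]
print(round(estimate_size_nt(25.0, HYPOTHETICAL_MARKERS)))
```

Sizes estimated this way are approximate, which is consistent with the calculated values differing somewhat from the expected sizes of the 5.8S and 5S rRNAs.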


Figure 10C. SYBR® Safe stained scan of the pre-transfer gel used to produce the blots in figures 10A and 10B shows that the K121 antibody did not bind uncapped RNAs. Lane 1: 2.5 µL low molecular weight RNA marker (NEB N0364S). Lane 2: 5 µg body-wall muscle RNA. Lane 3: 5 µg body-wall muscle RNA + 0.75 µg yeast tRNA. Lane 4: Input RNA (Ciona intestinalis body-wall muscle RNA) for the immunoprecipitation. Lane 5: K121 immunoprecipitated body-wall muscle RNA supernatant. Lane 6: K121 immunoprecipitated body-wall muscle RNA pellet.

Although the K121 antibody is known to bind m7G with less affinity than TMG[97], its affinity for the other known SL RNA cap structure, cap4 of trypanosomes, which contains an m7G cap, is unknown. In order to test the specificity of the K121 antibody with respect to this cap structure, an immunoprecipitation experiment of RNA from Leishmania donovani was performed. The SL RNA of Leishmania donovani is presumed to contain the cap4 structure, as this is the cap structure found on the SL RNA of other species of Leishmania as well as other trypanosomes[102]. Figure 11 shows that the ~86 nt Leishmania donovani SL RNA was present in untreated Leishmania donovani RNA (lane 1). The ~86 nt Leishmania donovani SL RNA is also present in the input RNA for the immunoprecipitation (lane 3). The presence of 83% of the RNA recovered from the immunoprecipitation in the pellet fraction (lane 5) confirms that the K121 antibody lacks specificity for the TMG cap, as it reacts with RNAs containing an m7G cap. The Leishmania donovani and Ciona intestinalis SL RNAs appear to have similar affinities for the K121 antibody; both are efficiently but not entirely (83% versus 61%, respectively) bound by it. This result suggests that the Ciona intestinalis SL RNA has a cap structure more similar to cap4 (as on the Leishmania donovani SL RNA) than to TMG (as on U1 snRNA, 97% precipitated).

Figure 11. Northern blot displays a lack of specificity of the K121 antibody for m7G-capped molecules in the context of cap4. A 10% acrylamide gel Northern blot of total Leishmania donovani RNA immunoprecipitated with the K121 antibody was hybridized with an antisense oligonucleotide (9254) corresponding to the Leishmania donovani SL RNA. Lane 1: 3 µg Leishmania donovani RNA. Lane 2: 3 µg Leishmania donovani RNA + 1.9 µg yeast tRNA. Lane 3: Input RNA (total Leishmania donovani RNA) for the immunoprecipitation. Lane 4: K121 immunoprecipitated supernatant fraction. Lane 5: K121 immunoprecipitated pellet fraction containing 1.9 µg yeast tRNA.

R1131 immunoprecipitation confirms the Ciona intestinalis SL RNA is not TMG-capped, while the 53 nt SL RNA exon-containing transcript is

Although the greater affinity of the K121 antibody for the 53 nt SL RNA exon-containing transcript than for the Ciona intestinalis SL RNA (figure 10A) indicated that the Ciona intestinalis SL RNA most probably does not have a TMG cap while the 53 nt SL RNA exon-containing transcript does possess this modification, I attempted to verify this with an antibody, R1131, having greater specificity for TMG than the K121 antibody. As the goals of this experiment were to investigate the reactivity of R1131 with respect to the Ciona intestinalis SL RNA, and to show that the new antibody was capable of binding TMG caps but not the m7G cap of cap4, RNA samples of Leishmania donovani, Ciona intestinalis, and Caenorhabditis elegans were mixed and immunoprecipitated together with R1131, a polyclonal rabbit antiserum specific to TMG[88]. Northern blot analysis of the immunoprecipitated fractions (figure 12A) showed that, as expected, the Ciona intestinalis SL RNA and the 53 nt SL RNA exon-containing transcript are present in untreated body-wall muscle RNA (lane 1) when hybridized with an antisense oligonucleotide (9060) corresponding to the Ciona SL. These bands are also present in the lane containing the mixed R1131 immunoprecipitated RNA supernatant fraction (lane 3), although only 40% of the 53 nt SL RNA exon-containing transcript recovered from the immunoprecipitation resides in the supernatant fraction, while the remaining 60% is in the pellet fraction (lane 4). The binding of the 53 nt SL RNA exon-containing transcript was dependent on the R1131 antibody, as only 3% of the RNA recovered from the beads lacking antibody control was found in this pellet fraction (lane 6). The presence in the pellet fraction of the 53 nt SL RNA exon-containing transcript in greater amounts than the SL RNA was also confirmed by poly(A)-tailing, SL-PCR amplification, cloning, and sequencing; both supernatant and pellet fractions were poly(A)-tailed and amplified under the same conditions, and the cDNA produced from the pellet fraction migrated as a product ~10 nt larger than that of the supernatant fraction (not shown), suggesting the pellet fraction was lacking the SL RNA and enriched for the 53 nt SL RNA exon-containing transcript.

In contrast, the entirety of the Ciona intestinalis SL RNA is in the supernatant fraction (lane 3). These results confirm that while the Ciona intestinalis SL RNA lacks the TMG modification, the 53 nt SL RNA exon-containing transcript is capped with TMG.


Figure 12A. Northern blot showing that the Ciona intestinalis SL RNA lacks the TMG modification, while the 53 nt SL RNA exon-containing transcript is capped with TMG. A 10% acrylamide Northern blot of R1131 immunoprecipitated mixed C. elegans total, Ciona intestinalis body-wall muscle, and Leishmania donovani total RNA was hybridized with an antisense oligonucleotide (9060) corresponding to the Ciona intestinalis SL RNA exon. Lane 1: 5 µg Ciona intestinalis body-wall muscle RNA. Lane 2: 5 µg quail poly(A)- carrier RNA. Lane 3: R1131 immunoprecipitated mixed RNA supernatant fraction. Lane 4: R1131 immunoprecipitated mixed RNA pellet fraction. Lane 5: beads without antibody mixed RNA supernatant fraction. Lane 6: beads without antibody mixed RNA pellet fraction.

The blot shown in figure 12A was stripped and reprobed with an antisense oligonucleotide (9254) corresponding to the Leishmania donovani SL RNA to confirm specificity of the R1131 antibody for TMG. As expected, the ~86 nt SL RNA of L. donovani was detected in untreated Leishmania donovani total RNA (figure 12B, lane 1). Unlike the K121 antibody, which precipitated the SL RNA of Leishmania with intermediate efficiency (83%, figure 11, lane 5), the R1131 antibody did not bind the SL RNA of Leishmania, as evidenced by its near absence (5%) in the immunoprecipitation pellet fraction (lane 5), while 95% of the RNA recovered from the immunoprecipitation experiment resides in the supernatant fraction (lane 4). The results of this experiment indicate the R1131 antibody does not, to an appreciable extent, cross-react with m7G, at least in the context of cap4.

Figure 12B. Northern blot indicates the R1131 antibody does not react with m7G in the context of cap4. The blot pictured in figure 12A was stripped and hybridized with an antisense oligonucleotide (9254) corresponding to the Leishmania donovani SL RNA. Lane 1: 5 µg Leishmania donovani total RNA. Lane 2: 5 µg Ciona intestinalis body-wall muscle RNA. Lane 3: 5 µg quail poly(A)- carrier RNA. Lane 4: R1131 immunoprecipitated mixed RNA supernatant fraction. Lane 5: R1131 immunoprecipitated mixed RNA pellet fraction. Lane 6: Beads without antibody mixed RNA supernatant fraction. Lane 7: Beads without antibody mixed RNA pellet fraction.

In order to confirm that the antibody did in fact bind TMG-capped molecules in this experiment, the blot pictured in figures 12A and 12B was once again stripped and reprobed with an antisense oligonucleotide (9272) corresponding to the C. elegans SL1 RNA, which is TMG-capped[28]. As anticipated, the ~94 nt SL1 RNA of C. elegans is present in untreated C. elegans RNA (figure 12C, lane 1). The presence of 37% of the RNA recovered from the immunoprecipitation in the pellet fraction (lane 6) indicates that, as expected, the TMG-capped C. elegans SL1 RNA was bound to a significant extent by the R1131 antibody. Once again, there was little non-specific binding to the beads, as 97% of the RNA recovered from this control is found in the supernatant fraction (lane 7).

The extent of binding of C. elegans SL1 RNA in figure 12C (37%) and that of the 53 nt SL RNA exon-containing transcript in figure 12A (60%) were similar, suggesting that the 53 nt SL RNA exon-containing transcript also contains a TMG cap. The absence of binding of the Ciona intestinalis SL RNA in figure 12A (0%) confirms that the Ciona intestinalis SL RNA does not contain a TMG cap.

Figure 12C. Northern blot illustrating the R1131 antibody was capable of binding TMG-capped RNAs under the conditions utilized in the immunoprecipitation preceding the blots pictured in figures 12A, B, and C. The blot pictured in figures 12A and 12B was stripped and hybridized with an antisense oligonucleotide (9272) corresponding to the C. elegans SL1 RNA. Lane 1: 5 µg untreated C. elegans total RNA. Lane 2: 5 µg Leishmania donovani total RNA. Lane 3: 5 µg Ciona intestinalis body-wall muscle RNA. Lane 4: 5 µg quail poly(A)- carrier RNA. Lane 5: R1131 immunoprecipitated mixed RNA supernatant fraction. Lane 6: R1131 immunoprecipitated mixed RNA pellet fraction. Lane 7: beads without antibody mixed RNA supernatant fraction. Lane 8: beads without antibody mixed RNA pellet fraction.

The 5’-end of the SL RNA is phosphatase protected and therefore capped

One possible explanation for the lack of a TMG cap on the Ciona intestinalis SL RNA is that it is not capped with the TMG precursor m7G. Since the SL RNA, at 42-46 nt, is just above the established threshold of ~30 nt for the minimum size required to be capped[55, 56], and since eukaryotic transcripts this small are seldom, if ever, observed to be capped in vivo, I next investigated whether it is capped at all. Ciona RNA was treated with either SAP, which removes only accessible (i.e., not protected by a cap) phosphates, or TAP, which cleaves pyrophosphate linkages regardless of whether they are protected by the presence of a cap. The treated samples were then analyzed on a 20% acrylamide 7.5 M urea Northern blot, which is able to provide single phosphate resolution[103], by hybridization with an antisense oligonucleotide (9060) corresponding to the SL RNA exon. This blot (figure 13A) reveals that the Ciona intestinalis SL RNA in the TAP-treated sample (lane 1) shows a clear downward mobility shift as compared to the untreated Ciona intestinalis SL RNA (lane 2), while the SAP-treated Ciona intestinalis SL RNA (lane 3) ran at the same position as the untreated sample (lane 2). These results indicate that while the loss of terminal phosphates (or terminal phosphates protected by a cap) can be detected under these conditions, SAP cannot produce this change in migration. This implies that the 5′-end of the Ciona intestinalis SL RNA is capped. The failure of the SAP treatment to affect migration of the Ciona intestinalis SL RNA was not due to a failure of the SAP reaction; a 22 nt 5′-phosphorylated synthetic shortened SL RNA oligonucleotide (9311) added to the sample (lane 3) prior to SAP treatment as a control for phosphatase activity showed an upward mobility shift as compared to the untreated synthetic phosphorylated SL oligonucleotide (lane 5). A similar upward shift in mobility was apparent in the SAP-treated 22 nt 5′-phosphorylated synthetic shortened SL RNA oligonucleotide combined with carrier RNA (lane 4).

This blot also revealed that the ~42-46 nt doublet previously observed on 10% acrylamide gels actually represents four distinct bands, the lowest two of which are faint, as greater resolution is achieved on a 20% acrylamide gel than on a 10% acrylamide gel. From sampling a large number of SL RNA transcripts (see results subsection entitled “Survey of SL RNA size and sequence variation”), it was previously known that four length variants of 42, 44, 45, and 46 nt are abundantly expressed. Although there is some uncertainty due to the possible presence of one or more A residues at the 3′-end of the SL RNA molecules not revealed in the sequence-based length determination, it is plausible that the four bands seen following electrophoresis in 20% acrylamide correspond to the four major length variants observed by poly(A)-tailing, SL-PCR amplification, and DNA sequencing.

Figure 13A. Northern blot demonstrating the 5′-end of the Ciona intestinalis SL RNA is protected by a cap structure. A 20% acrylamide Northern blot containing Ciona intestinalis RNA and a 5′-phosphorylated synthetic shortened SL RNA oligonucleotide treated with TAP or SAP was hybridized to an antisense oligonucleotide (9060) corresponding to the Ciona SL RNA exon. Lane 1: 5 µg TAP treated Ciona intestinalis body-wall muscle RNA. Lane 2: 5 µg untreated Ciona intestinalis body-wall muscle RNA. Lane 3: 5 µg Ciona intestinalis body-wall muscle RNA + 5 ng 22 nt synthetic 5′-phosphorylated SL oligonucleotide (9311) SAP treated together. Lane 4: 5 µg quail poly(A)- carrier RNA + 5 ng 9311 SAP treated together. Lane 5: 5 µg quail poly(A)- carrier RNA + 5 ng 9311 untreated.

Unexpectedly, the 53 nt SL RNA exon-containing transcript was not detected in figure 13A. In order to confirm the interpretation of the detected bands in figure 13A as SL RNA, the blot was stripped and hybridized to an antisense oligonucleotide (9102) corresponding to the SL RNA intron. The uppermost band detected in lanes 1, 2, and 3 of both figures 13A and 13B migrated the same distance. This confirmed that the uppermost band observed in lanes 1, 2, and 3 of figure 13A is the largest Ciona intestinalis SL RNA rather than the 53 nt SL RNA exon-containing transcript, since the 53 nt SL RNA exon-containing transcript was previously shown (figure 6B) not to hybridize to the SL RNA intron probe. The absence of the 53 nt SL RNA exon-containing transcript may be explained by failure of this larger molecule to transfer out of the 20% acrylamide gel. Figure 13B also once again showed that the SL RNA is capped (its mobility is affected by TAP but not SAP treatment). Finally, figure 13B also verifies the microheterogeneity discovered from the SL RNA sequencing in the results subsection entitled “Survey of SL RNA size and sequence variation”, as (at least) four bands appear to have been detected by the probe.

As expected, oligonucleotide 9311 was not detected in this blot, as its sequence consisted mostly of that of the SL RNA exon, and the antisense oligonucleotide (9102) used for detection corresponded to the SL RNA intron.


Figure 13B. Northern blot demonstrating the uppermost transcript in lanes 1, 2, and 3 of figure 13A is the Ciona intestinalis SL RNA, not the 53 nt SL RNA exon-containing transcript. The blot pictured in figure 13A was stripped and hybridized with an antisense oligonucleotide (9102) complementary to the SL RNA intron. The uppermost transcript in these lanes migrated the same distance in figures 13A and 13B. Lane 1: 5 µg TAP treated Ciona intestinalis body-wall muscle RNA. Lane 2: 5 µg untreated Ciona intestinalis body-wall muscle RNA. Lane 3: 5 µg Ciona intestinalis body-wall muscle RNA + 5 ng 22 nt synthetic 5′-phosphorylated SL oligonucleotide (9311) SAP treated together.

Because the upward shift in mobility of the 22 nt 5′-phosphorylated synthetic shortened SL RNA oligonucleotide (9311) caused by SAP treatment (figure 13A, lane 3) was small, I confirmed in a separate analysis the relative migration of 5′-phosphorylated and unphosphorylated 22 nt oligonucleotide 9311. Several SAP-treated samples were run alongside the untreated 5′-phosphorylated 9311 on a 20% acrylamide 7.5 M urea gel, as well as alongside a lane consisting of mixed SAP-treated and untreated 9311. This blot was once again hybridized to an antisense oligonucleotide (9060) complementary to the SL RNA exon. This result (figure 14) confirms that the small upward band shift due to SAP removal of the free 5′-phosphate from the SAP-treated oligonucleotides (lanes 3 & 5), as compared to untreated, 5′-phosphorylated oligonucleotides (lanes 2 & 4), is expected[103]. The mixed SAP-treated and untreated sample (lane 1) was not completely resolved into its two constituent bands, but this band does appear to be much thicker than the single species in lanes 2-5, as expected for a mixture of two closely but differently migrating species.

Apparently, the loss of the two negative charges associated with the terminal phosphate causes a greater reduction in mobility than the expected increase in mobility afforded by the concomitant reduction in size. This confirms that the upward shift of 9311 observed in figure 13A lane 3 means the SAP reaction worked, and thus that the 5‟-end of the Ciona intestinalis SL RNA is protected by a cap.
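The direction of this shift can be illustrated with rough arithmetic. The sketch below compares the charge-to-mass ratio of a 22 nt oligonucleotide with and without its 5′-phosphate, assuming one negative charge per internal phosphodiester, two negative charges on the terminal phosphate (as stated above), and an average residue mass of about 330 Da. These numbers are approximations chosen for illustration, and mobility in a denaturing gel depends on more than charge-to-mass ratio, so this is a plausibility check rather than a model of the gel.

```python
AVG_RESIDUE_MASS_DA = 330.0   # assumed average mass of one nucleotide residue
PHOSPHATE_MASS_DA = 79.98     # mass of HPO3 lost on dephosphorylation


def charge_to_mass(n_nt, five_prime_phosphate):
    """Approximate net negative charge per dalton for an n_nt oligonucleotide."""
    charge = n_nt - 1                      # one charge per internal phosphodiester
    mass = n_nt * AVG_RESIDUE_MASS_DA      # mass of the 5'-phosphorylated form
    if five_prime_phosphate:
        charge += 2                        # terminal phosphate carries two charges
    else:
        mass -= PHOSPHATE_MASS_DA          # SAP removes the terminal phosphate
    return charge / mass


phosphorylated = charge_to_mass(22, five_prime_phosphate=True)
dephosphorylated = charge_to_mass(22, five_prime_phosphate=False)
print(f"relative change: {100 * (dephosphorylated / phosphorylated - 1):.1f}%")
```

Under these assumptions the dephosphorylated oligonucleotide loses proportionally more charge than mass, which agrees with its slower migration after SAP treatment.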

Figure 14. Northern blot illustrating that a 20% acrylamide 7.5 M urea gel can achieve single phosphate resolution. A 20% acrylamide Northern blot of untreated and SAP-treated 22 nt 5′-phosphorylated synthetic shortened SL RNA oligonucleotide (9311) was hybridized with an antisense oligonucleotide (9060) corresponding to the SL RNA exon. Lane 1: 1 ng each of untreated and SAP-treated 9311 + 1 µg SAP-treated quail poly(A)- RNA. Lane 2: 1 ng 9311 + 1 µg quail poly(A)- RNA both untreated. Lane 3: 1 ng 9311 + 1 µg quail poly(A)- RNA both SAP treated. Lane 4: 1 ng 9311 + 1 µg quail poly(A)- RNA both untreated. Lane 5: 1 ng 9311 + 1 µg quail poly(A)- RNA both SAP treated.

The Ciona intestinalis SL RNA cap is most likely m7G

In eukaryotic cells, methylation of the capping G nucleoside by methyltransferase is a process distinct from addition of the G nucleoside by guanylyltransferase[104]. In order to investigate whether the cap structure found on the SL RNA of Ciona intestinalis was m7G, the common cap structure associated with RNA pol II transcripts, as opposed to unmethylated G or some other cap, such as the gamma-monomethyl phosphate cap found on U6 snRNA[60], an immunoprecipitation was performed with a well-characterized mouse monoclonal antibody called H20, which was raised against TMG and is known to cross-react with m7G but not unmethylated G[89]. The immunoprecipitated fractions were analyzed on a Northern blot hybridized to an antisense oligonucleotide (9060) corresponding to the SL RNA exon. The results of this experiment (figure 15A) show that, as expected, the Ciona intestinalis SL RNA and 53 nt SL RNA exon-containing transcript are present in body-wall muscle RNA (lane 1). This figure also shows that both the Ciona intestinalis SL RNA and 53 nt SL RNA exon-containing transcript are efficiently precipitated by the H20 antibody, as 53% of the recovered RNA of both of these transcripts appears in the immunoprecipitation pellet fraction (lane 3). As anticipated, there was little non-specific binding of the RNA to the beads, as 97% of the SL RNA and 99% of the 53 nt SL RNA exon-containing transcript recovered from the beads lacking antibody control is found in the supernatant fraction (lane 4). Since figure 12A demonstrated that the Ciona intestinalis SL RNA is not TMG-capped, and since the H20 antibody used in figure 15A does not recognize unmethylated G caps but does recognize m7G caps, the results of this experiment indicate that the Ciona intestinalis SL RNA is most likely m7G-capped.
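The chain of inference across the TAP/SAP, R1131, and H20 experiments can be written out as a small decision function. This is only a toy encoding of the reasoning in the text, under the stated assumptions about each reagent; the function name, its boolean inputs, and the returned labels are invented for illustration and do not correspond to any analysis performed in the thesis.

```python
def infer_cap(tap_shift, sap_shift, r1131_bound, h20_bound):
    """Infer the most likely 5' cap from four qualitative observations.

    tap_shift:   mobility changes after TAP treatment
    sap_shift:   mobility changes after SAP treatment (phosphate accessible)
    r1131_bound: precipitated by R1131 (treated as TMG-specific)
    h20_bound:   precipitated by H20 (binds TMG and m7G, not unmethylated G)
    """
    if sap_shift or not tap_shift:
        return "uncapped or undetermined"
    if r1131_bound:
        return "TMG"
    if h20_bound:
        return "m7G (most likely)"
    return "capped, but neither TMG nor m7G"


# Observations reported for the Ciona intestinalis SL RNA.
print(infer_cap(tap_shift=True, sap_shift=False, r1131_bound=False, h20_bound=True))
```

Treating each result as a strict yes or no hides the quantitative differences in binding (for example 61% versus 100% with K121), so the percentages in the figures remain the primary evidence.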

Figure 15A. Northern blot demonstrating the cap structure of the Ciona intestinalis SL RNA is most likely m7G. A 10% acrylamide gel Northern blot containing H20 immunoprecipitated Ciona intestinalis body-wall muscle RNA fractions was hybridized to an antisense oligonucleotide (9060) corresponding to the SL RNA exon. Lane 1: 10 µg unimmunoprecipitated body-wall muscle RNA. Lane 2: H20 immunoprecipitated body-wall muscle RNA supernatant fraction. Lane 3: H20 immunoprecipitated body-wall muscle RNA pellet fraction. Lane 4: Beads only control body-wall muscle RNA supernatant fraction. Lane 5: Beads only control body-wall muscle RNA pellet fraction.

To ensure the H20 antibody was not non-specifically binding uncapped RNA, the SYBR® Safe stained pre-transfer scan (figure 15B) of the gel was analyzed. Of the RNA recovered from the immunoprecipitation, 96% of an abundant transcript with a calculated size of ~141 nt was found in the supernatant fraction of the immunoprecipitation (lane 3), with only 4% in the pellet fraction (lane 4). This band most likely represents the non-capped, 5′-triphosphate 5S rRNA of Ciona intestinalis. The similarly sized band in the pellet fraction (lane 4) can be explained by the fact that quail poly(A)- RNA was added to the pellet lane as a carrier; as this band migrates at ~126 nt, it is most likely quail 5S rRNA, rather than immunoprecipitated Ciona intestinalis 5S rRNA. This indicates that the antibody was not non-specifically binding uncapped RNAs.

Figure 15B. SYBR® Safe stained pre-transfer scan of the gel used to produce the blot featured in figure 15A shows that the H20 antibody is not non-specifically binding uncapped RNAs. Lane 1: 2.5 µL low molecular weight RNA marker (NEB N0364S). Lane 2: 10 µg body-wall muscle RNA. Lane 3: H20 immunoprecipitated body-wall muscle RNA supernatant fraction. Lane 4: H20 immunoprecipitated body-wall muscle RNA pellet fraction. Lane 5: Beads only control body-wall muscle RNA supernatant fraction. Lane 6: Beads only control body-wall muscle RNA pellet fraction.

The Ciona intestinalis SL RNA, as well as the 53 nt SL RNA exon-containing transcript, are snRNPs associated with Sm proteins

A further question I wished to investigate was whether the Ciona intestinalis SL RNA was in the form of a snRNP in association with Sm proteins in vivo. In order to investigate this, an immunoprecipitation of Ciona intestinalis tissue (adult intestine) homogenate was performed with human anti-Sm antiserum, and RNA from the pellet and supernatant fractions was analyzed by Northern blot hybridization with an antisense oligonucleotide (9060) corresponding to the SL RNA exon. As expected, the Ciona intestinalis SL RNA and 53 nt SL RNA exon-containing transcript were detectable in untreated body-wall muscle RNA (figure 16A, lane 1), which was included as a positive control for hybridization in this analysis. Two bands are clearly visible in the immunoprecipitation pellet fraction (lane 5). One band in the pellet fraction (lane 5) corresponds to the size of the 53 nt SL RNA exon-containing transcript; 42% of this RNA recovered from the immunoprecipitation was found in the pellet lane. The other band in the pellet lane is visibly shorter (later determined by sequencing to be ~42 nt, and labelled as such) than the Ciona intestinalis SL RNA in body-wall muscle RNA (lane 1); 100% of this shorter transcript recovered from the immunoprecipitation is found in the pellet lane. This binding was dependent on the Sm antibody, as 0% of either of these transcripts was found in the pellet fraction of the beads lacking antibody control. This ~42 nt RNA could be a degradation product of either the Ciona intestinalis SL RNA or the 53 nt SL RNA exon-containing transcript, perhaps generated during the 2 h incubation of the whole cell homogenate with the Sm antibody. Alternatively, this could reflect a genuine tissue-specific difference in the Ciona intestinalis SL RNA between intestine and body-wall muscle tissues; unfortunately this experiment cannot distinguish between these possibilities, as the SL RNA was not clearly detected in RNA derived from mock-immunoprecipitated extract (lane 3). Enrichment of the SL RNA in the pellet fraction (lane 5) relative to the RNA derived from mock-immunoprecipitated extract (lane 3) can account for this difference in detection. As the pellet fraction lacks the principal cellular RNAs, transfer of the RNA from the gel to the membrane, or hybridization of the antisense SL oligonucleotide (9060) to its target, may have been more efficient for RNAs contained in this lane. A third possibility is that Sm-associated SL RNA molecules are shorter than their unbound counterparts, as Sm binding is known to be a prerequisite for 3′-end trimming of snRNAs[72].


Figure 16A. Northern blot illustrating the 53 nt SL RNA exon-containing transcript and a previously unobserved ~42 nt RNA exist as snRNPs in association with Sm proteins. A 10% acrylamide Northern blot of RNA fractions extracted from an Sm antibody immunoprecipitation of Ciona intestinalis tissue homogenate was hybridized with an antisense oligonucleotide (9060) corresponding to the Ciona intestinalis SL RNA exon. Lane 1: 5 µg body-wall muscle RNA. Lane 2: 5 µg body-wall muscle RNA + 8 µg carrier yeast tRNA. Lane 3: RNA derived from mock-immunoprecipitated extract. Lane 4: RNA extracted from immunoprecipitation supernatant fraction. Lane 5: RNA extracted from immunoprecipitation pellet fraction. Lane 6: RNA extracted from beads alone supernatant fraction. Lane 7: RNA extracted from beads alone pellet fraction.

The presence of the Ciona intestinalis SL RNA in the Sm immunoprecipitation pellet (figure 16A, lane 5) was an expected result, as the SL RNAs of the nematodes C. elegans[28], Ascaris lumbricoides[67] and the trypanosome Leptomonas collosoma[105] had previously also been shown to associate with Sm proteins.

However, association of the novel 53 nt SL RNA exon-containing transcript was an unexpected result. To verify the 53 nt SL RNA exon-containing transcript was in fact associated with Sm proteins, the blot shown in figure 16A was stripped and reprobed with an antisense oligonucleotide (9284) corresponding to the 3‟-end of this transcript.

As expected, the 53 nt SL RNA exon-containing transcript was found in body-wall muscle RNA, body-wall muscle RNA with yeast carrier tRNA added, and RNA derived from mock-immunoprecipitated extract (figure 16B, lanes 1-3, respectively).

The immunoprecipitation pellet fraction (lane 5) contained 86% of the RNA recovered from the immunoprecipitation, suggesting that the 53 nt SL RNA exon-containing transcript is in fact associated with Sm proteins.

Figure 16B. Northern blot confirming that the 53 nt SL RNA exon-containing transcript is associated with Sm proteins. The blot pictured in figure 16A was stripped and hybridized with an antisense oligonucleotide (9284) corresponding to the 3′-end of the sequence generated by SL-PCR of the R1131 immunoprecipitation pellet fraction. Lane 1: 5 ug body-wall muscle RNA. Lane 2: 5 ug body-wall muscle RNA + 8 ug carrier yeast tRNA. Lane 3: RNA derived from mock-immunoprecipitated extract. Lane 4: RNA extracted from immunoprecipitation supernatant fraction. Lane 5: RNA extracted from immunoprecipitation pellet fraction.

In order to investigate whether the antibody was binding off-target RNAs, the immunoprecipitation supernatant (figure 16C, lane 5) and pellet (lane 6) fractions were analyzed in the pre-transfer scan of the gel. Once again, two abundant transcripts migrated just above and below the 150 nt marker (lane 1). The apparent molecular weights of the bands above and below the 150 nt marker were ~174 nt and ~132 nt, respectively. As before, these transcripts most likely represent the 5.8S and 5S rRNAs, respectively. The proportion of RNA recovered from the immunoprecipitation for both of these transcripts in the pellet (lane 6) was 3%. As the antibody precipitated the 53 nt SL RNA exon-containing transcript (42% or 86%, depending on the probe) and the SL RNA (100%) far more efficiently, these two transcripts are genuinely associated with Sm proteins and are not present in the pellet as a result of non-specific binding. The ~80 nt RNA present in the immunoprecipitation pellet fraction (lane 8) corresponds to yeast tRNA added to this sample after the immunoprecipitation to serve as carrier RNA; tRNA migrates at a molecular weight of ~80 nt[101].
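The percentages quoted for the pellet fractions can be read as the pellet signal divided by the summed pellet and supernatant signal for a given band. The following is a minimal sketch of that arithmetic, assuming this reading of the quantification; the function name and the band intensities are hypothetical and are not measurements from the blots.

```python
def pellet_fraction(pellet_signal: float, supernatant_signal: float) -> float:
    """Return the percentage of the recovered RNA that is found in the pellet.

    Assumption: "RNA recovered from the immunoprecipitation" is taken to mean
    the sum of the pellet and supernatant band intensities.
    """
    total = pellet_signal + supernatant_signal
    if total <= 0:
        raise ValueError("no signal recovered in either fraction")
    return 100.0 * pellet_signal / total


# Hypothetical band intensities in arbitrary units, for illustration only.
specific = pellet_fraction(86.0, 14.0)    # a band mostly found in the pellet
background = pellet_fraction(3.0, 97.0)   # a band mostly left in the supernatant
print(f"specific: {specific:.0f}%  background: {background:.0f}%")
```

Comparing the value for a band of interest against the value for abundant off-target RNAs in the same lanes is what distinguishes genuine association from background carry-over.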

Figure 16C. SYBR® Safe stained scan of the pre-transfer gel used to produce the blot in figures 16A&B shows that the Sm antibody does not bind off-target RNAs. Lane 1: 2.5 uL low molecular weight RNA marker (NEB N0364S). Lane 2: 5 ug body-wall muscle RNA. Lane 3: 5 ug body-wall muscle RNA + 8 ug carrier yeast tRNA. Lane 4: RNA derived from mock-immunoprecipitated extract. Lane 5: RNA extracted from immunoprecipitation supernatant fraction. Lane 6: RNA extracted from immunoprecipitation pellet fraction. Lane 7: RNA extracted from beads alone supernatant fraction. Lane 8: RNA extracted from beads alone pellet fraction.

In order to distinguish between these possible explanations of why a shorter transcript was detected in the Sm pellet fraction (figure 16A, lane 5), a Northern blot of body-wall muscle and intestine RNA isolated directly from fresh tissue was hybridized with an antisense oligonucleotide (9060) corresponding to the Ciona intestinalis SL RNA exon. The results of this analysis (figure 17) indicate that the SL RNA migrates at the same size (~46 nt) in unadulterated intestine RNA (lane 2) as it does in body-wall muscle RNA (lane 1). This suggests that the ~42 nt band previously observed in the Sm pellet (figure 16A, lane 5) is not smaller due to intrinsic tissue-specific size differences of the SL RNA between body-wall muscle and intestine. Either immunoprecipitation with an anti-Sm antibody enriches for truncated versions of the SL RNA, or there is natural degradation during the 2 h incubation as whole cell homogenate.

Figure 17. Northern blot identifies the SL RNA in intestine and body-wall muscle is of similar size. A 10% acrylamide Northern blot of body-wall muscle and intestine RNA was hybridized to an antisense oligonucleotide (9060) corresponding to the Ciona intestinalis SL RNA exon. Lane 1: 5 ug body-wall muscle RNA. Lane 2: 5 ug intestine RNA.

The ~42 nt RNA in the Sm immunoprecipitated intestine lane (figure 16A, lane 5) was shorter than the SL RNA of intestine tissue. This was not due to intrinsic tissue-specific size differences of the SL RNA between intestine and body-wall muscle RNA (above). In order to confirm that the ~42 nt transcript was related to SL RNA, the SL exon-containing RNA from this lane was sequenced by poly(A)-tailing, reverse transcribing with a poly(T) primer (9038RT) linked to an anchor, and PCR amplifying using anchor and SL RNA exon specific primers (9015 and 9038). The resulting cDNA was cloned and eight clones were sequenced. Two of the clones represented a 42 nt version of the SL RNA, and a third clone represented an intact version of the 53 nt SL RNA exon-containing transcript. Four of the remaining five clones contained a short stretch of nucleotides following the SL primer before the poly(A) tail; their precise origin cannot be determined due to their small size. The eighth clone appears to be a novel trans-spliced RNA; while the SL primer sequence does not map immediately upstream of this RNA, an AG splice acceptor site does.

This unknown transcript does have a site resembling the consensus binding site for Sm proteins, as do the SL RNA and the 53 nt SL RNA exon-containing transcript, as shown in table 2. The sequences of all the clones, with their poly(A) tails trimmed off, are reported below in table 2. The sequencing results, along with the Northern blot result, indicate that both the SL RNA and the 53 nt SL RNA exon-containing transcript exist as snRNPs in association with Sm proteins. Either Sm immunoprecipitation enriches for SL RNAs which are 3′-truncated, as only 18/73 (25%) of previously sequenced SL RNA clones were less than 42 nt, whereas 100% of Sm-associated transcripts were, or these transcripts simply degraded by ~4 nt during the 2 h incubation as whole cell homogenate. A comparison of the lengths of SL RNAs generated from poly(A)-tailing and SL-PCR of the Sm supernatant fraction could resolve the latter issue; however, those data are not available.

Table 2. Sequences of SL-related transcripts associated with Sm proteins.

Length (nt)   Number of reads   Sequence
21            1                 ATTCTATTTGAATAAGgcctt
21            1                 ATTCTATTTGAATAAGgtaag
24            2                 ATTCTATTTGAATAAGtttacgcc
42            2                 ATTCTATTTGAATAAGgtaagaatctgagctttggctcaact
53            1                 ATTCTATTTGAATAAGgtaatttttcattgtataaactattttgtttatacag
77            1                 ATTCTATTTGAATAAGattgagcgttttgcggctgatttatgatgatattggttagctagtgttgcctacataccat

Underlined portions resemble Sm consensus binding sites. Capitalized sequence corresponds to the SL sequence, which was included in the rightward primer (9015). Poly(A) tails have been trimmed from the above sequences.
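As a consistency check on table 2, the following sketch recomputes the length of each clone, the total number of reads, and whether a GT (GU in RNA) dinucleotide immediately follows the SL sequence. The sequences are transcribed from table 2; the variable and function names are illustrative assumptions, and the check makes no attempt to locate the Sm-like sites.

```python
# SL sequence included in primer 9015 (capitalized portion of table 2).
SL = "ATTCTATTTGAATAAG"

# (stated length in nt, number of reads, sequence), transcribed from table 2.
TABLE_2 = [
    (21, 1, "ATTCTATTTGAATAAGgcctt"),
    (21, 1, "ATTCTATTTGAATAAGgtaag"),
    (24, 2, "ATTCTATTTGAATAAGtttacgcc"),
    (42, 2, "ATTCTATTTGAATAAGgtaagaatctgagctttggctcaact"),
    (53, 1, "ATTCTATTTGAATAAGgtaatttttcattgtataaactattttgtttatacag"),
    (77, 1, "ATTCTATTTGAATAAGattgagcgttttgcggctgatttatgatgatattggttag"
            "ctagtgttgcctacataccat"),
]


def has_gt_donor(sequence: str) -> bool:
    """True if the clone starts with the SL and GT follows it directly."""
    downstream = sequence[len(SL):]
    return sequence.startswith(SL) and downstream[:2].lower() == "gt"


lengths_match = all(len(seq) == stated for stated, _, seq in TABLE_2)
total_reads = sum(reads for _, reads, _ in TABLE_2)
donor_lengths = [stated for stated, _, seq in TABLE_2 if has_gt_donor(seq)]

print(lengths_match, total_reads, donor_lengths)
```

Under this check the read counts sum to the eight sequenced clones, and the 42 nt and 53 nt clones, along with one of the 21 nt clones, carry the GT dinucleotide directly after the SL sequence.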

The canonical ~46 nt SL RNA, not the 53 nt SL RNA exon-containing transcript, is the SL donor for TnI

Since the 53 nt SL RNA exon-containing transcript is TMG-capped, Sm-associated, and contains the splice donor dinucleotide GU immediately following the SL sequence, and is therefore remarkably donor-like, it seemed plausible that this molecule, rather than or in addition to the ~46 nt Ciona intestinalis SL RNA, functions as the SL donor for trans-splicing in adult body-wall muscle tissue. Because my previous results showed that the 53 nt SL RNA exon-containing transcript possessed a TMG cap, while the ~46 nt SL RNA lacked this modification, I could investigate whether the 53 nt SL RNA exon-containing transcript is used as a donor by determining whether trans-spliced mRNAs contained TMG caps; that is, by asking whether the cap structure of trans-spliced mRNAs matches that of the ~46 nt Ciona intestinalis SL RNA or that of the 53 nt SL RNA exon-containing transcript. In order to investigate this, TnI mRNA was cut by oligonucleotide-targeted RNase H cleavage at a site 116 nt downstream of the 5′-end (since problems with immunoprecipitating full-length mRNAs with anti-cap antibodies had previously been reported[28]) and mixed together with C. elegans RNA (to serve as a positive control for the immunoprecipitation). The combined RNA was used in an immunoprecipitation experiment using the R1131 rabbit polyclonal antiserum specific for TMG. The blot (figure 18A) was probed with an antisense oligonucleotide (9281) complementary to a gene-specific region near the 5′-end of trans-spliced TnI mRNA. The expected 116 nt product was found in the RNase H digested unimmunoprecipitated RNA (figure 18A, lane 1). The digested TnI mRNA did not react with the antibody, as 100% of the TnI mRNA recovered from the immunoprecipitation was detected in the supernatant fraction (lane 2). As anticipated, there was no non-specific binding to the beads, as 100% of the digested TnI mRNA recovered from the beads without antibody control was found in the supernatant fraction (lane 4). According to this result, the 5′-ends of trans-spliced mRNAs are not TMG-capped. This implies that the SL-like, Sm-associated and TMG-capped 53 nt SL RNA exon-containing transcript does not act as the major SL donor in adult body-wall muscle RNA, at least for TnI mRNA.

Figure 18A. Northern blot illustrating that the 5′-end of the trans-spliced mRNA TnI is not TMG-capped, and therefore that the ~46 nt SL RNA is the SL donor, not the 53 nt SL RNA exon-containing transcript. A 10% acrylamide gel Northern blot against R1131 immunoprecipitated mixed RNase H digested Ciona intestinalis and C. elegans RNA was hybridized with an antisense oligonucleotide (9281) corresponding to the 5′-end of the trans-spliced gene TnI. Lane 1: 10 ug RNase H digested Ciona intestinalis body-wall muscle RNA. Lane 2: RNase H digested mixed RNA immunoprecipitation supernatant fraction. Lane 3: RNase H digested mixed RNA immunoprecipitation pellet fraction. Lane 4: RNase H digested mixed RNA beads only supernatant fraction. Lane 5: RNase H digested mixed RNA beads only pellet fraction.

In order to ensure that the R1131 antibody did in fact bind TMG-capped molecules in the immunoprecipitation preceding the blot in figure 18A, and thus that the 5′-ends of trans-spliced mRNAs are not capped with TMG, the blot pictured in figure 18A was stripped and reprobed with an antisense oligonucleotide (9272) corresponding to C. elegans SL1 RNA, which had been mixed with the Ciona intestinalis body-wall muscle RNA prior to immunoprecipitation. As figure 18B shows, 34% of the ~94 nt C. elegans SL1 RNA recovered from the immunoprecipitation is found in the immunoprecipitation pellet fraction (lane 3). As expected, there was no non-specific binding of this transcript to the beads alone, as 100% of the RNA recovered from the beads lacking antibody control is found in the supernatant fraction (lane 4). This result shows that the immunoprecipitation was successful, indicating that for TnI mRNA at least, the majority of molecules are not TMG-capped. Presumably, the ~46 nt non-TMG-capped Ciona intestinalis SL RNA is the SL donor.

Figure 18B. Northern blot illustrating that the R1131 antibody was capable of binding TMG-capped RNAs in the immunoprecipitation preceding the blot shown in figure 18A. The blot pictured in figure 18A was stripped and hybridized with an antisense oligonucleotide (9272) corresponding to the C. elegans SL1 RNA. Lane 1: 10 ug RNase H digested Ciona intestinalis body-wall muscle RNA. Lane 2: RNase H digested mixed RNA immunoprecipitation supernatant fraction. Lane 3: RNase H digested mixed RNA immunoprecipitation pellet fraction. Lane 4: RNase H digested mixed RNA beads only supernatant fraction. Lane 5: RNase H digested mixed RNA beads only pellet fraction.

Expression profile of the SL RNA and the 53 nt SL RNA exon-containing transcript

In order to determine the expression profile of the SL RNA and the 53 nt SL RNA exon-containing transcript, I performed Northern blot analysis of various Ciona intestinalis RNA samples, including three Ciona intestinalis body-wall muscle RNA samples collected from separate batches of animals from Cape Cod over the course of 10 years (collections in 1995, 2002, and 2005), and one Ciona intestinalis 12 h embryo sample developed from Southern California animals. The blot (figure 19) was once again hybridized with an antisense oligonucleotide (9060) corresponding to the SL RNA exon. The 53 nt SL RNA exon-containing transcript is present in all three Ciona body-wall muscle RNA samples surveyed (figure 19, lanes 1, 3, and 4), but absent in the one 12 h embryo sample examined (lane 2). The absence of the 53 nt SL RNA exon-containing transcript in the Ciona intestinalis 12 h embryo sample from California is apparently due to tissue-specific, rather than geographical differences, as this transcript was later shown to also be absent in the 12 h embryo sample (figure 20A, lane 2) from Cape Cod. Thus the 53 nt SL RNA exon-containing transcript is a consistent feature of Atlantic Ciona intestinalis body-wall muscle RNA samples.

Figure 19. Northern blot illustrating the presence of the 53 nt SL RNA exon-containing transcript in three Ciona intestinalis body-wall muscle RNA samples collected from three separate batches of animals and its absence in one 12 h embryo sample. A 10% acrylamide gel Northern blot was hybridized to an antisense oligonucleotide (9060) corresponding to the SL RNA exon. Lane 1: Ciona intestinalis body-wall muscle RNA from Cape Cod (collected in 1995). Lane 2: Ciona intestinalis 12 h embryo RNA from Southern California. Lane 3: Ciona intestinalis body-wall muscle RNA from Cape Cod (collected in 2002). Lane 4: Ciona intestinalis body-wall muscle RNA from Cape Cod (collected in 2005).

I also examined the expression profile of the SL RNA and the 53 nt SL RNA exon-containing transcript in varying life stages of Ciona intestinalis, once again by Northern blot hybridization with an antisense oligonucleotide (9060) corresponding to the Ciona intestinalis SL RNA exon. As shown in figure 20A, the 53 nt SL RNA exon-containing transcript is detectably expressed in the Ciona intestinalis adult body-wall muscle sample (lane 3), but not in Ciona eggs (lane 1) or Atlantic Ciona 12 h embryos (lane 2). All of these samples were derived from animals from Cape Cod. This analysis also revealed a large (2.7-fold) apparent upregulation of the SL RNA sometime between the egg and early embryo stages, as the SL RNA is less abundantly expressed in the egg (lane 1) than it is in the 12 h embryo stage (lane 2). The paucity of SL RNA in the egg sample (lane 1) is not due to an absence of nuclear RNAs in general, as stripping the blot and reprobing it with an antisense oligonucleotide (9108) corresponding to Ciona intestinalis U1 revealed the egg sample contained 78% as much U1 as the 12 h embryo sample (figure 20B).
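The U1 reprobe can be used as a rough loading reference for the SL RNA comparison. The sketch below works through that arithmetic under two assumptions that are not stated in the text: that the 2.7-fold figure is the raw embryo-to-egg SL RNA signal ratio, and that U1 abundance per cell equivalent is comparable between the two stages. The function name is hypothetical.

```python
def u1_normalized_fold(sl_embryo_over_egg: float, u1_egg_over_embryo: float) -> float:
    """Embryo/egg SL RNA ratio after dividing each lane by its U1 signal.

    (SL_embryo / U1_embryo) / (SL_egg / U1_egg)
        = (SL_embryo / SL_egg) * (U1_egg / U1_embryo)
    """
    return sl_embryo_over_egg * u1_egg_over_embryo


# 2.7-fold SL RNA difference and 78% relative U1 signal, as quoted above.
fold = u1_normalized_fold(2.7, 0.78)
print(f"U1-normalized fold change: {fold:.1f}")
```

Under these assumptions the difference remains roughly two-fold after normalization, so the lower U1 signal in the egg lane would account for only part of the apparent upregulation.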

Figure 20A. Northern blot illustrating the presence of the 53 nt SL RNA exon-containing transcript in only body-wall muscle RNA, as well as an apparent upregulation of the SL RNA some time between egg and 12 h embryo stages. A 10% acrylamide gel Northern blot containing RNA from varying life stages of Ciona intestinalis was hybridized to an antisense oligonucleotide (9060) corresponding to the SL RNA exon. Lane 1: Ciona intestinalis egg RNA. Lane 2: Ciona intestinalis 12 h embryo RNA. Lane 3: Ciona intestinalis body-wall muscle RNA. All RNA samples were from animals from Cape Cod.

Figure 20B. Northern blot illustrating the paucity of the Ciona intestinalis SL RNA in egg samples is not due to a lack of snRNA in general. The blot pictured in figure 20A was stripped and reprobed with an antisense oligonucleotide (9108) corresponding to Ciona intestinalis U1 snRNA. Lane 1: Ciona intestinalis egg RNA. Lane 2: Ciona intestinalis 12 h embryo RNA. Lane 3: Ciona intestinalis body-wall muscle RNA. All RNA samples were derived from animals from Cape Cod.

CHAPTER 4 - RESULTS: DETERMINATION OF WHERE IN THE GENOME THE CIONA INTESTINALIS SL RNA GENES RESIDE

Production of a probe for FISH to the Ciona intestinalis SL RNA gene cluster

In order to make a probe for FISH to the SL RNA gene cluster, a probe consisting of four repeats of the SL RNA gene was PCR amplified from a Ciona intestinalis phage library supplied by R. Zeller. An initial PCR amplification was carried out using primers (9240 and 9241) which partially spanned the phage vector (pBK-CMV) and which were also specific to the SL RNA gene unit repeat (see methods). This PCR produced a faint band corresponding to four SL RNA genes (not shown). This band was therefore gel-extracted and used as a template for a subsequent temperature gradient PCR used to produce the probes. Four distinct bands were observed on an agarose gel containing the re-amplified samples (figure 21). The ~300 nt band (lane 3) corresponded to the size of the SL RNA gene plus the flanking vector sequence supplied by the primers, and the ~550, 800, and 1100 nt bands corresponded to multiples of this singleton. The largest band, which corresponded to the length of four unit repeats, was extracted from the gel, cloned, and sequenced. Two clones (SLx4.6 & SLx4.7, figures 22A&B respectively) represented two different repeats of four SL RNA gene units each, and were subsequently used in FISH experiments (figure 23) to map the SL RNA gene cluster.
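The ladder of bands is what one would expect if each product contains a whole number of repeat units plus a fixed length of flanking vector sequence. The sketch below illustrates this, assuming a repeat unit of 264 nt and 39 nt of total flanking sequence; both values were counted from the first repeat unit and the vector ends of clone SLx4.6 (figure 22A) and are not stated in the text, and individual repeat units may differ slightly in length.

```python
def expected_band_sizes(unit_len: int, flank_len: int, max_units: int) -> list[int]:
    """Sizes (nt) of products containing 1..max_units tandem repeat units."""
    return [n * unit_len + flank_len for n in range(1, max_units + 1)]


UNIT_LEN = 264   # assumption: length of one repeat unit, counted from figure 22A
FLANK_LEN = 39   # assumption: 26 nt + 13 nt of flanking vector sequence

sizes = expected_band_sizes(UNIT_LEN, FLANK_LEN, 4)
print(sizes)  # [303, 567, 831, 1095]
```

These predicted sizes are close to the approximate band sizes of ~300, ~550, ~800, and ~1100 nt read from the gel.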

Figure 21. PCR of partially digested Ciona intestinalis genomic DNA inserted into a vector using the vector/insert spanning primers (9240 and 9241) successfully produced products in multiples of the size of the SL RNA gene repeat unit. Right side numbers represent approximate size of bands in nt. Lane 1: 10 uL Fermentas GeneRuler™ 100bp Plus DNA Ladder. Lane 2: PCR carried out at 61.0°C. Lane 3: PCR carried out at 66.6°C.

GAGAAGCTTTTTGAATTCTTTGGATC_AATTATCGCGCTACACCGCACAACGTAAGAGCTATCGGACTT TACCATTATTACATACCGTTATCACATATGTCAGATTAACACATCGGCTACATTTGCACCGAATTGGGT TCTAGGTGTTTTCTTTATACGAAGAAAACTTTTTGAATTAAAGCACCCAAGTCATACATTTGAAGGTAT TCTATTTGAATAAGGTAAGAATCTGAGCTTTGGCTCAACTATATATAAACAAAAGAGTACTGGACCTAA AATACTTGTTTGACC_ATTTGTCGCGCTACACCGCACAACGTAAGAGCTATCGGACTTTACCATTATTA CATACCGTTATCACATATGTCAGATTAACACATCGGCTACATTTGCACCGAATTGGGTTCTAGGTGTTT TCTTTATACGAAGATTACTTTTTGAATTAAACCACCTAAGTCAGACATTTTACGGTATTCTATTTGAAT AAGGTAAGAATCTGAGCTTTGGCTCAACTATATTTAAACAAAAGAGTTCTGGACCTAAAATACTTGTTT GACC_AATTATCGCGCTACACCGCACAACGTAAGAGATATCGGACTTTACCATTATTACATACCGTTAT CACATATGTCAGATTAACACATCGGCTACATTTGCACCGAATTGGGTCCTAGGTGTTTTCTTTATACGA AGAAAACTTTTTGAATTAAAGCACCAAAGTCAGACATTTGAAGGTACTCTATTTGAATAAGGTAAGAAT CTGAGCTTTGGCTCAACTATATATAAACAAAAGAGTTCTGGACCTAAAATACTTGTTTGATC_AATTAT CGCGCTACACCGCACAACGTAAGAGATATCGGACTTTACCATTATTACATACCGTTATCACATATGTCA GATTAACACATCGGCTACATTTGCACCGAATTGGGTTCTAGGTGTTTTCTTTATACGAAGAAAACTTTT TGAATCAAAGCACCCAAGTCAGACATTTGAAGGTATTCTATTTGAATAAGGTAAGAATCTGAGATTTGG CTCAACTATTTTTAAACAAAGGAGTTCTGGGCCTAAAATACTTGTTTGATC_CACTAGTGTCGAA

Figure 22A. Sequence of clone SLx4.6, which contains four copies of the Ciona intestinalis SL RNA gene repeat unit. Black represents flanking vector sequence. Each other colour represents one SL RNA gene repeat unit, which are separated from each other and the vector sequence by an underscore.

GAGAAGCTTTTTGAATTCTTTGGATC_AATTATCGCGCTACACCACACAACGTAAGAGATATCGGACTT TACCGTTATTACATACCGTTATCAGATATGTCAGATTAACACATCGGCTACATTTGCACCGAATTGGGT TCTAGGTGTTTTCTTTATACGGAGAAAACTTTTTGAATCAAAGCACCCAAGTCAGACAGTTGAAGGTAT TCTATTTGAATAAGGTAAGAATCTGAGCTTTCGCTCAACTATATATAAACAAAAGAGTACTGGACCTAA AATACTTGTTTGGCC_ATTTGTCGCGCTACACCGCACAACGTAAGAGCTATCGGACTTTACCATTATTA CATACCGTTATTACATATGTCAGATTAACACATCGGCTACATTTGCACCGAATTGGGTTCTAGGTGTTT TCTTTATACGAAGAAAACTTTTTGAATTAAAGCACCCAAGTCAGACATTTGAAGGTATTCTATTTGAAT AAGGTAAGAATGTGAGCTTTGGCTCAACTATTTTTAAACAAAGGAGTTCTGGACCTAAAATACTTGTTT GACC_ATTTATCGCGCCACACCGCACAACGTAAGAGCTGTCCGACTTTACCATTATTACATACCGTTAT CACATTTGCCAGACTAACACATCGGCTACATTTGCACCGAATTGGGTCCTAGTTGTTTTCTTTATACGA AGAAAACTTTTTGAATTAAAGCACCAAAGTCAGACATTTGAAGGTATTCTATTTGAATAAGGTAAGAAT CTGAGCTTTGGCTCAACTATATATAAACAAAAGAGTACTGGACTTAAAATACTTGTTTGATC_AATTAT CGCGCTACACCGCACAACGTAAGAGATATCGGACTTTACCATTATTACATACCGTTATCAGATATGTCA GATTAACACATCGGCCACATTTGCACCGAATTGGGTTCTAGGTGTTTTCTTTATACGAAGAAAACTTTT TGAATTAAAGCACCCAAGTCAGACATTTGAAGGTATTCTATTTGAATAAGGTAAGAATGTGAGCTTTGG CTCAACTATATACAAACAAAAGAGTTCTGGACCTAAAATACTTGTTTGATC_CACTAGTGTCGAA

Figure 22B. Sequence of clone SLx4.7, which contains four copies of the Ciona intestinalis SL RNA gene repeat unit. Black represents flanking vector sequence. Each other colour represents one SL RNA gene repeat unit, which are separated from each other and the vector sequence by an underscore.

FISH to the Ciona intestinalis SL RNA gene cluster

Due to the inability of the assembly software to incorporate directly-repeated sequences such as the SL RNA gene, this gene cluster was lost in the genome assembly of Ciona intestinalis (data not shown). In order to investigate whether the estimated 900 copies of this gene reside at a single major site or at more than one site, and on which chromosome(s) this cluster of genes resides, FISH for these clusters was performed with the probes described (figures 22A&B). Hybridization with two probes each consisting of four different monomers, performed by Eiichi Shoguchi at the Okinawa Institute of Science and Technology, indicated that the vast majority of the SL RNA genes map to the telomeric tip of the short arm of chromosome 8 (figure 23).

Figure 23. FISH of the Ciona intestinalis SL RNA gene cluster illustrating the majority of the SL RNA genes in Ciona intestinalis are located at the distal tip of the short arm of chromosome 8. Labels on the left refer to BAC probes referenced [91, 106]. Numbers on the right refer to genome version 1 sequence scaffolds.

CHAPTER 5 - RESULTS: OUTRON AMPLIFICATION

Amplification of the TnI outron

In order to investigate whether the branched discarded outron portion of trans-spliced pre-mRNAs exists in detectable quantities, I attempted to amplify these molecules with RT-PCR.

Initial attempts to RT-PCR amplify the TnI outron using an antisense primer (9232) corresponding to the SL RNA intron for reverse transcription and TnI-specific primers (9233 and 9234) for PCR failed to produce a product visible on an agarose gel (not shown). Varying the reaction conditions over a wide range also failed to produce the predicted amplicon, as did using once-amplified PCR product as template for a second round of PCR (not shown).

However, reverse transcribing with an antisense primer (9267) corresponding to the SL RNA intron and spanning 5 nt into the TnI outron upstream of the branchpoint nucleotide and subsequently PCR amplifying with this primer and another (9266) specific to the TnI outron resulted in the production of an amplicon with the expected ~400 nt size (figure 24, lane 3).

Figure 24. Agarose gel demonstrating that reverse transcription with a branchpoint-spanning primer may be essential for the amplification of outrons. Reverse transcription with an antisense oligonucleotide (9267) corresponding to the SL RNA intron and spanning 5 nt into the TnI outron upstream of the branchpoint was used in conjunction with a PCR primer (9266) specific to the TnI outron. Lane 1: Fermentas GeneRuler™ 100bp Plus DNA Ladder. Lane 2: RT-PCR (primers 9232 and 9233) with non-branchpoint-spanning primer (9232) for reverse transcription. Lane 3: RT-PCR (primers 9267 and 9266) with a branchpoint-spanning primer (9267) for reverse transcription.

The ~400 nt band from figure 24, lane 3 was extracted, and 1 uL of the recovered DNA was used as a template for a subsequent PCR using the same primers (9266 and 9267) to produce DNA for cloning and sequencing. The successfully re-amplified DNA was cloned into a TOPO® vector and sequenced (figure 25).

ATTGGAACAAATCAGTGTCGCTTTCCAAGTACGTAAATTCTTGGGGAACCACTTAAATTTCTTAAAAGA ACATCTTTCCCTGCGTATCACCGCTATCAATGATACCAATTGATAATCGAATCGACTCAAATAGTACTA ATGATAAAAATAATAAATCTAAAGACTTCCCAAACGGTTATTTTCGTACAATTTACAATTCGTATAGAT AAATTTATATTCAATGAGAAGAATCCCAACACTATCACTATTGAGTAGTCCTAAATATTGATATCGCGT TATGACTTGAATAGTTCGACTATTCTGATGGTCATCGACGCATTTTGGAAATCCTGTTCAGACTTCTCC TTGATTTATGTTTGGCATATTTGACTTATAACATTTCAATATGAAAAAATAGCAAGAAAAAGAACACCA AATGATT(*)cattcttagactcgaaacc

Figure 25. Sequence of the ~400 nt band generated by reverse transcription with primer 9267 and PCR with primers 9266 and 9267. The TnI outron was amplified from body-wall muscle RNA reverse transcribed with primer 9267 and PCR amplified with primers 9266 and 9267. Underlined sequence corresponds to primers 9266 (left) and 9267 (right). Lowercase corresponds to the SL RNA intron, while uppercase corresponds to the TnI outron. The * is immediately to the right of the 9267 nucleotide complementary to the TnI branchpoint.

CHAPTER 6 - DISCUSSION

Summary of results

The evidence presented in this thesis suggests that what has previously been referred to as the Ciona intestinalis SL RNA is really a heterogeneous population of RNAs (table 1), with the majority of these in the 42-46 nt range and the most abundantly represented sizes being 42, 44, 45, and 46 nt (figure 9). These Ciona intestinalis SL RNAs lack the TMG cap structure (figures 10A and 12A) found on the SL RNA of every other metazoan whose SL RNA has thus far been examined [12, 13, 19, 28-31]. Although it lacked a TMG cap, the Ciona intestinalis SL RNA was found as a snRNP associated with Sm proteins (figure 16A), and was shown to be phosphatase protected and therefore capped (figure 13A), most likely with m7G (figure 15A). Interestingly and surprisingly, a 53 nt transcript with the expected features of an SL donor was identified: it contained the SL sequence (figure 6A), was capped with TMG (figures 10A and 12A), existed as a snRNP in association with Sm proteins (figure 16A), and had the splice donor dinucleotide GU directly adjacent to the SL sequence (figure 8). Examination of the 5′-end of the trans-spliced mRNA TnI (figure 18A), however, revealed that the cap structure of TnI was not TMG, but was similar to that of the SL RNA in failing to react with TMG-specific antiserum, indicating that at least for TnI, this donor-like 53 nt SL RNA exon-containing transcript is not the actual donor. In fact, the 53 nt SL RNA exon-containing transcript is itself generated through trans-splicing, certainly at its 5′-end, and potentially to form its 3′-end as well (figure 8). This 53 nt SL RNA exon-containing transcript was present in all adult body-wall muscle samples examined (figure 19), as well as the RNA derived from the Sm-immunoprecipitated intestine homogenate, but absent in egg and 12 h embryo samples (figure 20A). Examination of the expression profile of the SL RNA (figure 20A) revealed an apparent large upregulation of the SL RNA some time between egg and 12 h embryo stages.

This work also showed that the vast majority of SL RNA genes cluster together in a single region as a series of head-to-tail tandem repeats near the telomeric tip of the short arm of chromosome 8 (figure 23).

Finally, some success was obtained in the amplification of the TnI outron (figures 24 and 25) using an antisense primer (9267) for reverse transcription corresponding to the SL RNA intron and spanning the TnI branchpoint into the outron, indicating that these molecules may exist in sufficient quantities to be detected by PCR amplification, and that doing so with a branchpoint-spanning primer is advisable.

Functional implications of the lack of a TMG cap on the Ciona intestinalis SL RNA

As the cap structure on the SL RNA of Ciona intestinalis is unique amongst metazoan SL RNAs, it can shed some light on aspects of trans-splicing, such as the functions and evolutionary origins of this process. While the experiments with the H20 antibody indicated that the cap structure of the Ciona SL RNA is most likely m7G, direct chemical analysis of the cap may be a worthwhile future endeavor, as it is still possible that this cap structure is one which has never been observed before.

Historically, one of the leading hypotheses for the function of SL trans-splicing has been to provide a signal for altered subcellular localization, unusual stability, or a unique method of translation of the trans-spliced mRNA, through either the specialized cap structure which the SL provides or through the SL sequence itself[28]. The fact that the trans-spliced mRNAs of Ciona intestinalis are most likely capped by the same moiety, m7G, as non-trans-spliced mRNAs, indicates that if the primary function of trans-splicing in Ciona intestinalis is to biochemically define trans-spliced mRNAs, this definition is embodied in the SL sequence per se, and not in the nature of the cap structure.

While the absence of a TMG cap on the SL RNA is one way in which Ciona intestinalis differs from other metazoans which utilize trans-splicing, the presence of operons is a feature shared in common, indicating that resolution of operons may be a principal function of SL trans-splicing. Notably, in C. elegans, a specialized donor, SL2, is used exclusively for trans-splicing to the downstream genes of operons[107]. While the provision of a specialized cap structure to trans-spliced mRNAs does not appear to be the primary function of SL trans-splicing in Ciona intestinalis, this could still be the primary function in other organisms, or even a secondary function in Ciona intestinalis; the 53 nt SL RNA exon-containing transcript has not been entirely ruled out as a donor, except as the principal SL donor for TnI mRNA, and it is theoretically possible that this molecule could serve as the SL donor for mRNAs other than TnI. Perhaps the 53 nt SL RNA exon-containing transcript is a specialized SL RNA utilized exclusively for trans-splicing of downstream operon genes, or another specialized purpose, in Ciona intestinalis, although its strong preferential expression in adult body-wall muscle and intestine as opposed to embryos and eggs appears to argue against that function in the latter developmental stages. In any case, the 53 nt SL RNA exon-containing transcript cannot exist as the sole SL donor, as it is itself generated through SL trans-splicing. It has not escaped my notice that the mechanism postulated to generate the 53 nt SL RNA exon-containing transcript immediately suggests that the TMG cap structure found upon it is formed after trans-splicing to the non-TMG-capped SL RNA. Examining the cap structure of trans-spliced downstream operon genes known to be expressed in adult body-wall muscle or intestine would determine if the 53 nt SL RNA exon-containing transcript is utilized for the specialized purpose of resolving operons in those tissues.

TMG caps bind SPN1, a protein which associates with the nuclear import receptor importin β, allowing for the nuclear import of snRNPs. Another implication of the absence of a TMG cap on the SL RNA of Ciona intestinalis is that there is an independent nuclear re-import pathway for non-TMG-capped small RNAs, or that in the context of RNAs which are small enough, Sm binding is sufficient to supply the signal for nuclear re-import. The latter hypothesis is supported by the fact that the U5 snRNA, the smallest of the snRNAs at 116 nt in Xenopus laevis[108], is also the least dependent upon a TMG cap for nuclear localization in Xenopus oocytes[60]. The Ciona intestinalis SL RNA, at ~42-46 nt, is much smaller than even the shortest snRNA, and as the results for Xenopus U5 snRNA indicate, smaller RNAs may not be dependent on the TMG cap for nuclear re-import.

Finally, while the small size of the SL RNA may account for its ability to re-enter the nucleus after it has been exported, how it avoids cytoplasmic sequestration by m7G cap-binding proteins remains unclear; in addition to providing a signal for nuclear re-import, TMG caps are rapidly formed on snRNAs upon their transport to the cytoplasm to prevent tethering of snRNAs in the cytoplasm. This tethering is presumably caused by cytoplasmic proteins known to interact with the TMG precursor m7G, such as those involved in the initiation of translation[60]. Perhaps the intact SL RNA adopts a conformation that prevents access of these proteins to its m7G cap, and then adopts a new conformation allowing for the translation of trans-spliced mRNAs once trans-splicing has taken place and the SL RNA has lost its intron.


Evolutionary implications of the absence of a TMG cap on the Ciona intestinalis SL RNA

It has been noted that establishing whether SL trans-splicing is an ancient or an independently derived process requires more than surveying additional phyla to determine the full phylogenetic range of the process; the detailed molecular mechanisms must also be compared between phyla already known to utilize it, in order to definitively establish whether trans-splicing is truly an orthologous means of pre-mRNA maturation amongst the organisms which use it. As the Ciona intestinalis SL RNA does not precisely fit the pattern defined by other metazoan SL RNAs in terms of its size, predicted secondary structure, and cap structure, this supports an independent origin of trans-splicing for Ciona intestinalis, since the lack of a TMG cap implies a novel nuclear re-import pathway, or potentially a complete lack of nuclear-cytoplasmic shuttling. Alternatively, by the hypothesis that smaller RNAs are less dependent on their TMG cap for nuclear re-import, the SL RNA of Ciona intestinalis may use the same nuclear re-import machinery as other metazoans, and simply not require the TMG cap due to its unusually small size. In situ hybridization to confirm that this molecule does enter the cytoplasm, together with determination of its precise route of nuclear import, would reveal whether the SL RNA trafficking pathways used in other organisms which utilize this process are also used in Ciona intestinalis, despite the absence of a TMG cap on the Ciona intestinalis SL RNA.

The absence of a TMG cap on the SL RNA does not reflect a general inability of Ciona intestinalis to make TMG caps; Ciona U1 snRNA was almost entirely (97%) bound by the K121 antibody (figure 10B), suggesting that at least the ability to make TMG caps is conserved. Additionally, a BLAST search of the Ciona intestinalis genome, using as a query the sequence of the S. cerevisiae trimethylguanosine synthase protein (RefSeq NP_015168.1), the enzyme responsible for conversion of m7G caps to TMG[109], reveals that Ciona intestinalis has a homolog of this enzyme (RefSeq accession XM_002131220.1, expect value 5 × 10⁻²⁷) with a predicted methyltransferase activity. Further, the 53 nt SL RNA exon-containing transcript was shown to be TMG-capped (figure 12A).

As studying the precise molecular mechanisms involved in all facets of trans-splicing is required to determine if this is an orthologous process amongst the organisms which utilize it, further investigation of why Ciona intestinalis does not TMG-cap its SL RNA despite being capable of making TMG caps may provide insights as to whether trans-splicing is an ancient or an independently derived process.

Possible roles of the 53 nt SL RNA exon-containing transcript

As previously mentioned, while the 53 nt SL RNA exon-containing transcript has been ruled out as the major SL donor for TnI or the sole SL donor, it has not been ruled out entirely as a putative SL donor, potentially with a specialized function.

A class of small, often TMG-capped and Sm-associated RNAs to which the 53 nt SL RNA exon-containing transcript could perhaps belong is the snoRNAs[35]. The majority of snoRNAs are encoded in introns, and are processed by debranching and by both 5'-3' and 3'-5' exonucleolytic trimming to generate mature snoRNAs[35]. Just as splicing of the host pre-mRNA is necessary to free precursor snoRNAs, trans-splicing is necessary to generate the freed 53 nt SL RNA exon-containing transcript.

There are two types of snoRNAs: box C/D snoRNAs, which guide site-specific 2'-O-methylation of targeted nucleotides, and H/ACA snoRNAs, which guide the pseudouridylation of targeted U nucleotides, with both modifications present in rRNAs, tRNAs, snRNAs, and even mRNAs[35]. Box C/D snoRNAs associate with fibrillarin, a nucleolar protein which has been suggested to be involved in the Sm-independent acquisition of TMG caps.

As the 53 nt SL RNA exon-containing transcript lacks the C (RUGAUGA) and D (CUGA) motifs, and since the association of the 53 nt SL RNA exon-containing transcript with Sm proteins can account for how it acquires a TMG cap, it is unlikely that this molecule is a box C/D snoRNA.
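The absence of box C and box D motifs can be checked with a simple pattern scan. The following is a minimal Python sketch and not the procedure used in this thesis: the function names and the test sequences are hypothetical, and the scan ignores the positional constraints of genuine box C/D snoRNAs (box C near the 5'-end, box D near the 3'-end), so a hit would be necessary rather than sufficient evidence.

```python
import re

# IUPAC ambiguity codes needed for the motifs quoted in the text (R = purine).
IUPAC = {"R": "[AG]", "Y": "[CU]", "N": "[ACGU]"}


def motif_to_regex(motif):
    """Translate an IUPAC RNA motif such as RUGAUGA into a compiled regex."""
    return re.compile("".join(IUPAC.get(base, base) for base in motif))


def to_rna(seq):
    """Upper-case a DNA or RNA sequence and write it in the RNA alphabet."""
    return seq.upper().replace("T", "U")


BOX_C = motif_to_regex("RUGAUGA")
BOX_D = motif_to_regex("CUGA")


def box_cd_hits(seq):
    """Return 0-based start positions of box C and box D matches in seq."""
    rna = to_rna(seq)
    return {
        "C": [m.start() for m in BOX_C.finditer(rna)],
        "D": [m.start() for m in BOX_D.finditer(rna)],
    }
```

Called on a sequence of interest, an empty list under both keys corresponds to the situation described above, in which neither motif is present.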

However, the 53 nt SL RNA exon-containing transcript does have an H motif (AnAnnA), exhibited by the sequence AAACTA, and possibly has the ACA motif, which is invariantly found 3 nt away from the 3'-end of these snoRNAs. While it is certain that the 53 nt SL RNA exon-containing transcript contains the sequence ACA near the 3'-end, it is uncertain exactly how far away from this end it is. Because the sequence was generated by poly(A)-tailing, I assumed any As at the 3'-end were from this addition, not genomically encoded. I made this assumption because, with it, the transcript, at 53 nt, maps precisely between two known alternative trans-splice acceptor sites of a gene (figure 8), thus potentially explaining how both the 5'- and 3'-ends are generated. However, it is also possible that this molecule is 54 or 55 nt; two As are present in the genomic DNA sequence after the AG dinucleotide that defines the 3'-end of the 53 nt SL RNA exon-containing transcript. If both of these As are actually present on the '53 nt SL RNA exon-containing transcript', the ACA is precisely 3 nt away from the 3'-end, meaning it fits the criterion for being an ACA motif, and that this transcript is actually 55 nt and is potentially an H/ACA snoRNA. Precise determination of the 3'-end, and determining whether it is associated with the common H/ACA nucleolar protein dyskerin[35], would provide more evidence regarding whether this molecule is an H/ACA snoRNA. If so, it would be the first snoRNA known to be generated by trans-splicing.
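The dependence of the ACA criterion on the exact 3'-end can be made concrete with a short sketch. The Python below is illustrative only: the toy sequence is hypothetical (it is not the sequence of the 53 nt transcript, and its length differs), and the function names are my own. It assumes, as described above, a transcript ending in ...ACAG followed in the genome by two As, and asks for each candidate 3'-end whether exactly 3 nt follow the last ACA.

```python
import re

# H box consensus AnAnnA (n = any nucleotide); the lookahead reports
# overlapping matches as well.
H_BOX = re.compile(r"(?=A.A..A)")


def h_box_positions(seq):
    """Return 0-based start positions of every AnAnnA match."""
    return [m.start() for m in H_BOX.finditer(seq.upper())]


def nt_after_last_aca(seq):
    """Count the nucleotides between the last ACA and the 3' end (None if no ACA)."""
    idx = seq.upper().rfind("ACA")
    return None if idx == -1 else len(seq) - (idx + 3)


def aca_box_by_candidate(core, genomic_extension):
    """Map each candidate transcript length to whether ACA lies exactly 3 nt
    from its 3' end, extending the core one genomic nucleotide at a time."""
    results = {}
    for n in range(len(genomic_extension) + 1):
        candidate = core + genomic_extension[:n]
        results[len(candidate)] = nt_after_last_aca(candidate) == 3
    return results


# Hypothetical toy sequence, NOT the real transcript: it ends in ...ACAG and
# the genome is assumed to continue with two As.
TOY_CORE = "TTGAAACTAGGCTTCACAG"
TOY_EXTENSION = "AA"
```

With the toy input, only the longest candidate (core plus both As) satisfies the ACA criterion, which mirrors the argument that the real transcript would need to be 55 nt rather than 53 nt to qualify.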

Another class of small RNAs which are both TMG-capped and Sm-associated, and to which the 53 nt SL RNA exon-containing transcript could belong, is telomerase RNAs[34], which vary widely in length and sequence between species. A recent report[110] revealed a unique and interesting role of the spliceosome in generating the telomerase RNA in Schizosaccharomyces pombe. Normally in cis-splicing, an intron is removed by the spliceosome in two consecutive steps. In the first step, the 5'-end of the intron is cleaved by joining the 5'-phosphate of the intron to the 2'-OH of the branchpoint nucleotide, which is near the 3'-end of the intron. This leaves a free 3'-OH group at the end of the upstream exon. In the second step, this free 3'-OH is joined to the 5'-end of the downstream exon, freeing the intron and generating the mature transcript[111]. In S. pombe, the telomerase RNA exists in multiple forms: short (complexed with the catalytic subunit of telomerase and active) and long (inactive) forms[112]. The telomerase RNA gene in this organism harbours an intron. The long version of telomerase RNA contains a 56 nt deletion with respect to the genomic sequence after the 3'-end of the short isoform; this distance corresponds to the telomerase RNA gene intron. The 3'-end of the short, active isoform is generated by uncoupling the first step (joining the 5'-end of the intron to the branchpoint nucleotide, freeing the 3'-end of the upstream exon) of the removal of this intron from the second step (ligation of the freed upstream exon to the downstream exon); by failing to join the upstream and downstream exons, the spliceosome releases the telomerase RNA as the mature short form. The long, inactive isoform is generated by completing both steps of intron removal. Thus, the spliceosome plays a unique role in the generation of the active form of the telomerase RNA in S. pombe: cleavage to generate a 3'-end without subsequent ligation. It is unknown exactly how this intron uncouples the first and second steps of splicing; however, the larger than average distance between the branchpoint nucleotide and 3' splice acceptor site apparently facilitates the uncoupling of the first and second steps of intron removal for this gene to some extent[110], allowing for the accumulation of the short form. In addition, specific regulatory factors may be necessary for uncoupling these first and second reactions[110].

In addition to sharing with the S. pombe telomerase RNA the presence of a TMG cap and association with Sm proteins, the 3'-end of the 53 nt SL RNA exon-containing transcript is also potentially generated by splicing, albeit trans- rather than cis-splicing. Trans-splicing actually has an advantage over cis-splicing for the generation of 3'-ends, as it allows for proper 3'-end formation by completion of the splicing reaction. Because uncoupling of the first and second steps of intron removal would be exceptionally detrimental in the context of protein-encoding genes, it has been hypothesized that specific sequence and/or other regulatory elements are required for the regulation of this process[110]. Trans-splicing circumvents the problem of uncoupling by making 3'-end formation of target molecules dependent on the second step of splicing rather than the first. This is because the splice donor is on a separate molecule; the first step of SL trans-splicing frees only the SL. Thus I propose a new function for SL trans-splicing, revealed by the sequence of the 53 nt SL RNA exon-containing transcript: correct 3'-end formation for RNAs which would otherwise be unable to form them in the proper location. Perhaps, in such cases, the significance of the SL RNA lies in its ability to splice: the presence of the SL on the molecule to which it is trans-spliced may be less important than the fact that the 3'-end of another molecule was released by this splicing.
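The contrast between the two routes to a 3'-end can be summarized in a toy bookkeeping model. The Python sketch below is a deliberate simplification written for illustration, not a description of spliceosome chemistry: RNAs are plain strings, branched intermediates are not represented, and the class and function names are hypothetical. It records only which RNA segments acquire a new free 3'-end at each of the two steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Products:
    after_step1: tuple  # segments whose 3' end is freed by step 1
    after_step2: tuple  # segments whose 3' end is defined once step 2 completes


def cis_splice(exon1, intron, exon2):
    """Two-step removal of an intron from a single pre-RNA."""
    # Step 1 cuts at the 5' splice site, so the upstream exon gains a free 3'-OH.
    # Step 2 ligates the exons and releases the intron; releasing the upstream
    # exon on its own therefore requires step 2 NOT to happen (uncoupling).
    return Products(after_step1=(exon1,), after_step2=(exon1 + exon2, intron))


def sl_trans_splice(sl_exon, outron, exon):
    """SL trans-splicing onto a pre-RNA laid out as outron followed by exon."""
    # The splice donor is on the SL RNA, so step 1 frees only the SL exon.
    # Step 2 joins the SL exon to the acceptor exon; only then is the outron
    # cut at the acceptor site, which defines its 3' end.
    return Products(after_step1=(sl_exon,), after_step2=(sl_exon + exon, outron))
```

In this model the upstream segment of the target appears under after_step1 for cis-splicing but under after_step2 for trans-splicing, which is the sense in which trans-splicing forms the 3'-end by completing, rather than interrupting, the reaction.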

Why does the Ciona intestinalis SL RNA lack a TMG cap?

Given that the Ciona intestinalis SL RNA is the shortest yet discovered and also the only metazoan SL RNA known to lack a TMG cap, it is tempting to speculate that the two facts are related. The TMG cap may not be a requisite moiety for the nuclear re-import of RNAs as small as the SL RNA of Ciona intestinalis, a hypothesis supported by the earlier observed trend of shorter RNAs being less reliant upon the TMG cap for nuclear re-import[60]. Despite the TMG cap being potentially unnecessary for nuclear re-import, the question remains: how does the Ciona intestinalis SL RNA escape being TMG-capped? The answer may lie in SMN proteins, the proteins involved in recruiting Sm proteins to snRNAs. In addition to interacting with Sm proteins, SMN proteins often directly bind snRNAs themselves[113]. The minimum requirement for SMN binding to snRNAs is the Sm binding site, followed by a stem-loop structure immediately 3' of the Sm binding site[114, 115]. The predicted secondary structure of the Ciona intestinalis SL RNA is unique in that it lacks the stem-loop structure normally found on SL RNAs downstream of the Sm binding site, due in part to its relatively short length[1]; perhaps SMN cannot bind this molecule, despite Sm being present.

Notably, SMN also binds directly to fibrillarin, the protein common to all box C/D snoRNAs[116]. As these box C/D snoRNAs are also often TMG-capped, fibrillarin has been suggested as an Sm-independent route to TMG-capping. As SMN interacts with both Sm proteins and fibrillarin, two proteins known or suspected to be involved in TMG capping, perhaps SMN itself plays a previously underappreciated role in TMG cap acquisition. If this were the case, the absence of a TMG cap on the SL RNA of Ciona intestinalis could be explained by a failure of SMN to bind, due to the lack of a stem-loop immediately 3' of the Sm binding site of this molecule, caused in part by its small size.

SMN, however, is normally a prerequisite for Sm binding. Whether the SL RNA of Ciona intestinalis achieves binding of Sm proteins through SMN recruitment of them, despite the predicted absence of a 3' stem-loop structure adjacent to the Sm binding site, may also be an interesting area for future research.

Significance that the Ciona intestinalis SL RNA is capped

At 42-46 nt, the Ciona intestinalis SL RNA is, to my knowledge, the shortest eukaryotic transcript presently known to be capped in vivo.

Association with Sm proteins

My results show that, as expected, the Ciona intestinalis SL RNA exists in vivo as a snRNP in association with Sm proteins. My results also suggested that Sm-associated SL RNA transcripts may be 3'-trimmed compared to SL RNA transcripts not bound to Sm, although this may have been a result of degradation during the immunoprecipitation rather than association with Sm proteins; performing the poly(A)-tailing SL-PCR experiment on the supernatant fraction of the Sm immunoprecipitation experiment would allow for the determination of whether SL RNA transcripts which were incubated as a whole cell homogenate for a similar amount of time but not associated with Sm were also 3'-trimmed. Enrichment for 3'-trimmed transcripts would not be a surprising result, as Sm association has been shown to be a prerequisite for 3'-end trimming of snRNAs[72]. If the Sm pellet fraction were found to enrich for shorter transcripts, this would suggest the active form of the SL RNA is ~42 nt. Association of the SL RNA with Sm proteins was not an unexpected result, as previously characterized SL RNAs had been shown to be in association with these proteins[28]. It has been hypothesized that an as yet unidentified factor, possibly SMN, interacts with Sm and the nuclear import receptor importin β to provide one half (with the other half being the TMG cap) of the bipartite signal required for nuclear re-import of Sm-associated TMG-capped snRNAs[72]. Given that the Ciona intestinalis SL RNA lacks the TMG cap, perhaps it is entirely reliant on Sm proteins for nuclear re-import.

Unexpectedly, the 53 nt SL RNA exon-containing transcript was also shown to exist in vivo as a snRNP in association with Sm proteins. Implications regarding the function of the association between Sm proteins and the 53 nt SL RNA exon-containing transcript have been discussed in the subsection “Possible roles of the 53 nt SL RNA exon-containing transcript”.

Expression profile of the SL RNA

My data showed at most low levels of the SL RNA in the egg stage of Ciona intestinalis, with an apparent large (2.7-fold) upregulation some time between the egg and 12 h embryo stages. This result indicates that in order to study the role of trans-splicing during this crucial window of development, studies using siRNA against the SL RNA should target the SL RNA at the egg stage. Knockdown of the SL RNA may be much more feasible during this stage than 12 h later.

Outron amplification

Although scaling up outron amplification to a high-throughput method of determining the transcription start sites of trans-spliced genes is still several steps away, these experiments provided the proof of principle for doing so, indicating that branched outrons may be sufficiently abundant to be amplified by PCR.


7. REFERENCES

1. Vandenberghe, A.E., T.H. Meedel, and K.E. Hastings, mRNA 5'-leader trans-splicing in the chordates. Genes Dev, 2001. 15(3): p. 294-303.
2. Davis, R.E., Spliced leader RNA trans-splicing in metazoa. Parasitol Today, 1996. 12(1): p. 33-40.
3. Liou, R.F. and T. Blumenthal, trans-spliced Caenorhabditis elegans mRNAs retain trimethylguanosine caps. Mol Cell Biol, 1990. 10(4): p. 1764-8.
4. Conrad, R., et al., Insertion of part of an intron into the 5' untranslated region of a Caenorhabditis elegans gene converts it into a trans-spliced gene. Mol Cell Biol, 1991. 11(4): p. 1921-6.
5. Johnson, P.J., J.M. Kooter, and P. Borst, Inactivation of transcription by UV irradiation of T. brucei provides evidence for a multicistronic transcription unit including a VSG gene. Cell, 1987. 51(2): p. 273-81.
6. Hastings, K.E., SL trans-splicing: easy come or easy go? Trends Genet, 2005. 21(4): p. 240-7.
7. Liang, X.H., et al., trans and cis splicing in trypanosomatids: mechanism, factors, and regulation. Eukaryot Cell, 2003. 2(5): p. 830-40.
8. Stover, N.A., M.S. Kaye, and A.R. Cavalcanti, Spliced leader trans-splicing. Curr Biol, 2006. 16(1): p. R8-9.
9. Campbell, D.A., D.A. Thornton, and J.C. Boothroyd, Apparent discontinuous transcription of Trypanosoma brucei variant surface antigen genes. Nature, 1984. 311(5984): p. 350-5.
10. Zhang, H., et al., Spliced leader RNA trans-splicing in dinoflagellates. Proc Natl Acad Sci U S A, 2007. 104(11): p. 4618-23.
11. Krause, M. and D. Hirsh, A trans-spliced leader sequence on actin mRNA in C. elegans. Cell, 1987. 49(6): p. 753-61.
12. Rajkovic, A., et al., A spliced leader is present on a subset of mRNAs from the human parasite Schistosoma mansoni. Proc Natl Acad Sci U S A, 1990. 87(22): p. 8879-83.
13. Stover, N.A. and R.E. Steele, Trans-spliced leader addition to mRNAs in a cnidarian. Proc Natl Acad Sci U S A, 2001. 98(10): p. 5693-8.
14. Pouchkina-Stantcheva, N.N. and A. Tunnacliffe, Spliced leader RNA-mediated trans-splicing in phylum Rotifera. Mol Biol Evol, 2005. 22(6): p. 1482-9.
15. Marletaz, F., et al., Chaetognath transcriptome reveals ancestral and unique features among bilaterians. Genome Biol, 2008. 9(6): p. R94.
16. Nilsen, T.W., Evolutionary origin of SL-addition trans-splicing: still an enigma. Trends Genet, 2001. 17(12): p. 678-80.
17. Denker, J.A., et al., New components of the spliced leader RNP required for nematode trans-splicing. Nature, 2002. 417(6889): p. 667-70.
18. Keiper, B.D., et al., Functional characterization of five eIF4E isoforms in Caenorhabditis elegans. J Biol Chem, 2000. 275(14): p. 10590-6.
19. Ganot, P., et al., Spliced-leader RNA trans splicing in a chordate, Oikopleura dioica, with a compact genome. Mol Cell Biol, 2004. 24(17): p. 7795-805.
20. Perry, K.L., K.P. Watkins, and N. Agabian, Trypanosome mRNAs have unusual "cap 4" structures acquired by addition of a spliced leader. Proc Natl Acad Sci U S A, 1987. 84(23): p. 8190-4.
21. Davis, R.E., Surprising diversity and distribution of spliced leader RNAs in flatworms. Mol Biochem Parasitol, 1997. 87(1): p. 29-48.

22. Walder, J.A., et al., The 35-nucleotide spliced leader sequence is common to all trypanosome messenger RNA's. Science, 1986. 233(4763): p. 569-71.
23. Nilsen, T.W., Trans-splicing of nematode premessenger RNA. Annu Rev Microbiol, 1993. 47: p. 413-40.
24. Zorio, D.A., et al., Operons as a common form of chromosomal organization in C. elegans. Nature, 1994. 372(6503): p. 270-2.
25. Satou, Y., et al., Genomic overview of mRNA 5'-leader trans-splicing in the ascidian Ciona intestinalis. Nucleic Acids Res, 2006. 34(11): p. 3378-88.
26. Nilsen, T.W., et al., Characterization and expression of a spliced leader RNA in the parasitic nematode Ascaris lumbricoides var. suum. Mol Cell Biol, 1989. 9(8): p. 3543-7.
27. Drouin, G. and M.M. de Sa, The concerted evolution of 5S ribosomal genes linked to the repeat units of other multigene families. Mol Biol Evol, 1995. 12(3): p. 481-93.
28. Thomas, J.D., R.C. Conrad, and T. Blumenthal, The C. elegans trans-spliced leader RNA is bound to Sm and has a trimethylguanosine cap. Cell, 1988. 54(4): p. 533-9.
29. Lall, S., et al., Contribution of trans-splicing, 5'-leader length, cap-poly(A) synergism, and initiation factors to nematode translation in an Ascaris suum embryo cell-free system. J Biol Chem, 2004. 279(44): p. 45573-85.
30. Brehm, K., K. Jensen, and M. Frosch, mRNA trans-splicing in the human parasitic cestode Echinococcus multilocularis. J Biol Chem, 2000. 275(49): p. 38311-8.
31. Davis, R.E., et al., RNA trans-splicing in Fasciola hepatica. Identification of a spliced leader (SL) RNA and SL sequences on mRNAs. J Biol Chem, 1994. 269(31): p. 20026-30.
32. Van Doren, K. and D. Hirsh, mRNAs that mature through trans-splicing in Caenorhabditis elegans have a trimethylguanosine cap at their 5' termini. Mol Cell Biol, 1990. 10(4): p. 1769-72.
33. Saponara, A.G. and M.D. Enger, Occurrence of N2,N2,7-trimethylguanosine in minor RNA species of a mammalian cell line. Nature, 1969. 223(5213): p. 1365-6.
34. Seto, A.G., et al., Saccharomyces cerevisiae telomerase is an Sm small nuclear ribonucleoprotein particle. Nature, 1999. 401(6749): p. 177-80.
35. Kiss, T., et al., Biogenesis and intranuclear trafficking of human box C/D and H/ACA RNPs. Cold Spring Harb Symp Quant Biol, 2006. 71: p. 407-17.
36. Bangs, J.D., et al., Mass spectrometry of mRNA cap 4 from trypanosomatids reveals two novel nucleosides. J Biol Chem, 1992. 267(14): p. 9805-15.
37. Mair, G., E. Ullu, and C. Tschudi, Cotranscriptional cap 4 formation on the Trypanosoma brucei spliced leader RNA. J Biol Chem, 2000. 275(37): p. 28994-9.
38. Bruzik, J.P., et al., Trans splicing involves a novel form of small nuclear ribonucleoprotein particles. Nature, 1988. 335(6190): p. 559-62.
39. Lee, M.G. and L.H. Van der Ploeg, Transcription of protein-coding genes in trypanosomes by RNA polymerase I. Annu Rev Microbiol, 1997. 51: p. 463-89.
40. Cheng, G., et al., The flatworm spliced leader 3'-terminal AUG as a translation initiator methionine. J Biol Chem, 2006. 281(2): p. 733-43.
41. Blumenthal, T., Trans-splicing and polycistronic transcription in Caenorhabditis elegans. Trends Genet, 1995. 11(4): p. 132-6.

42. Kuersten, S., et al., Relationship between 3' end formation and SL2-specific trans-splicing in polycistronic Caenorhabditis elegans pre-mRNA processing. RNA, 1997. 3(3): p. 269-78.
43. Williams, C., L. Xu, and T. Blumenthal, SL1 trans splicing and 3'-end formation in a novel class of Caenorhabditis elegans operon. Mol Cell Biol, 1999. 19(1): p. 376-83.
44. Maroney, P.A., et al., Most mRNAs in the nematode Ascaris lumbricoides are trans-spliced: a role for spliced leader addition in translational efficiency. RNA, 1995. 1(7): p. 714-23.
45. Conrad, R., R.F. Liou, and T. Blumenthal, Conversion of a trans-spliced C. elegans gene into a conventional gene by introduction of a splice donor site. EMBO J, 1993. 12(3): p. 1249-55.
46. Mair, G., et al., A new twist in trypanosome RNA metabolism: cis-splicing of pre-mRNA. RNA, 2000. 6(2): p. 163-9.
47. Benabdellah, K., E. Gonzalez-Rey, and A. Gonzalez, Alternative trans-splicing of the Trypanosoma cruzi LYT1 gene transcript results in compartmental and functional switch for the encoded protein. Mol Microbiol, 2007. 65(6): p. 1559-67.
48. Manning-Cela, R., A. Gonzalez, and J. Swindle, Alternative splicing of LYT1 transcripts in Trypanosoma cruzi. Infect Immun, 2002. 70(8): p. 4726-8.
49. Silva-Filho, M.C., One ticket for multiple destinations: dual targeting of proteins to distinct subcellular locations. Curr Opin Plant Biol, 2003. 6(6): p. 589-95.
50. Chamond, N., et al., Biochemical characterization of proline racemases from the human protozoan parasite Trypanosoma cruzi and definition of putative protein signatures. J Biol Chem, 2003. 278(18): p. 15484-94.
51. Engel, M.L., J.C. Hines, and D.S. Ray, The Crithidia fasciculata RNH1 gene encodes both nuclear and mitochondrial isoforms of RNase H. Nucleic Acids Res, 2001. 29(3): p. 725-31.
52. Sonenberg, N., Cap-binding proteins of eukaryotic messenger RNA: functions in initiation and control of translation. Prog Nucleic Acid Res Mol Biol, 1988. 35: p. 173-207.
53. Reddy, R., R. Singh, and S. Shimba, Methylated cap structures in eukaryotic RNAs: structure, synthesis and functions. Pharmacol Ther, 1992. 54(3): p. 249-67.
54. Gu, M. and C.D. Lima, Processing the message: structural insights into capping and decapping mRNA. Curr Opin Struct Biol, 2005. 15(1): p. 99-106.
55. Cho, E.J., et al., mRNA capping enzyme is recruited to the transcription complex by phosphorylation of the RNA polymerase II carboxy-terminal domain. Genes Dev, 1997. 11(24): p. 3319-26.
56. McCracken, S., et al., 5'-Capping enzymes are targeted to pre-mRNA by binding to the phosphorylated carboxy-terminal domain of RNA polymerase II. Genes Dev, 1997. 11(24): p. 3306-18.
57. Mattaj, I.W., Cap trimethylation of U snRNA is cytoplasmic and dependent on U snRNP protein binding. Cell, 1986. 46(6): p. 905-11.
58. Fechter, P. and G.G. Brownlee, Recognition of mRNA cap structures by viral and cellular proteins. J Gen Virol, 2005. 86(Pt 5): p. 1239-49.
59. Fischer, U. and R. Luhrmann, An essential signaling role for the m3G cap in the transport of U1 snRNP to the nucleus. Science, 1990. 249(4970): p. 786-90.

60. Fischer, U., et al., Diversity in the signals required for nuclear accumulation of U snRNPs and variety in the pathways of nuclear transport. J Cell Biol, 1991. 113(4): p. 705-14.
61. Zieve, G.W. and R.A. Sauterer, Cell biology of the snRNP particles. Crit Rev Biochem Mol Biol, 1990. 25(1): p. 1-46.
62. Narayanan, U., et al., SMN, the spinal muscular atrophy protein, forms a pre-import snRNP complex with snurportin1 and importin beta. Hum Mol Genet, 2002. 11(15): p. 1785-95.
63. Hernandez, N. and A.M. Weiner, Formation of the 3' end of U1 snRNA requires compatible snRNA promoter elements. Cell, 1986. 47(2): p. 249-58.
64. Kleinschmidt, A.M. and T. Pederson, RNA processing and ribonucleoprotein assembly studied in vivo by RNA transfection. Proc Natl Acad Sci U S A, 1990. 87(4): p. 1283-7.
65. Shimba, S., et al., Cap structure of U3 small nucleolar RNA in animal and plant cells is different. gamma-Monomethyl phosphate cap structure in plant RNA. J Biol Chem, 1992. 267(19): p. 13772-7.
66. Ullu, E. and C. Tschudi, Trans splicing in trypanosomes requires methylation of the 5' end of the spliced leader RNA. Proc Natl Acad Sci U S A, 1991. 88(22): p. 10074-8.
67. Maroney, P.A., et al., The nematode spliced leader RNA participates in trans-splicing as an Sm snRNP. EMBO J, 1990. 9(11): p. 3667-73.
68. Will, C.L. and R. Luhrmann, Spliceosomal UsnRNP biogenesis, structure and function. Curr Opin Cell Biol, 2001. 13(3): p. 290-301.
69. Pellizzoni, L., J. Yong, and G. Dreyfuss, Essential role for the SMN complex in the specificity of snRNP assembly. Science, 2002. 298(5599): p. 1775-9.
70. Meister, G., et al., A multiprotein complex mediates the ATP-dependent assembly of spliceosomal U snRNPs. Nat Cell Biol, 2001. 3(11): p. 945-9.
71. Kolb, S.J., D.J. Battle, and G. Dreyfuss, Molecular functions of the SMN complex. J Child Neurol, 2007. 22(8): p. 990-4.
72. Kiss, T., Biogenesis of small nuclear RNPs. J Cell Sci, 2004. 117(Pt 25): p. 5949-51.
73. Liu, Q., et al., The spinal muscular atrophy disease gene product, SMN, and its associated protein SIP1 are in a complex with spliceosomal snRNP proteins. Cell, 1997. 90(6): p. 1013-21.
74. Fischer, U., Q. Liu, and G. Dreyfuss, The SMN-SIP1 complex has an essential role in spliceosomal snRNP biogenesis. Cell, 1997. 90(6): p. 1023-9.
75. Pellizzoni, L., B. Charroux, and G. Dreyfuss, SMN mutants of spinal muscular atrophy patients are defective in binding to snRNP proteins. Proc Natl Acad Sci U S A, 1999. 96(20): p. 11167-72.
76. Charroux, B., et al., Gemin4. A novel component of the SMN complex that is found in both gems and nucleoli. J Cell Biol, 2000. 148(6): p. 1177-86.
77. Palacios, I., et al., Nuclear import of U snRNPs requires importin beta. EMBO J, 1997. 16(22): p. 6783-92.
78. Sleeman, J.E. and A.I. Lamond, Newly assembled snRNPs associate with coiled bodies before speckles, suggesting a nuclear snRNP maturation pathway. Curr Biol, 1999. 9(19): p. 1065-74.
79. Jady, B.E., et al., Modification of Sm small nuclear RNAs occurs in the nucleoplasmic Cajal body following import from the cytoplasm. EMBO J, 2003. 22(8): p. 1878-88.

80. Lamond, A.I. and D.L. Spector, Nuclear speckles: a model for nuclear organelles. Nat Rev Mol Cell Biol, 2003. 4(8): p. 605-12.
81. Satoh, N., The ascidian tadpole larva: comparative molecular development and genomics. Nat Rev Genet, 2003. 4(4): p. 285-95.
82. Delsuc, F., et al., Tunicates and not cephalochordates are the closest living relatives of vertebrates. Nature, 2006. 439(7079): p. 965-8.
83. Satoh, N. and W.R. Jeffery, Chasing tails in ascidians: developmental insights into the origin and evolution of chordates. Trends Genet, 1995. 11(9): p. 354-9.
84. Dehal, P., et al., The draft genome of Ciona intestinalis: insights into chordate and vertebrate origins. Science, 2002. 298(5601): p. 2157-67.
85. MacLean, D.W., T.H. Meedel, and K.E. Hastings, Tissue-specific alternative splicing of ascidian troponin I isoforms. Redesign of a protein isoform-generating mechanism during chordate evolution. J Biol Chem, 1997. 272(51): p. 32115-20.
86. Bonen, L., Trans-splicing of pre-mRNA in plants, animals, and protists. FASEB J, 1993. 7(1): p. 40-6.
87. Clement, J.Q., et al., The stability and fate of a spliced intron from vertebrate cells. RNA, 1999. 5(2): p. 206-20.
88. Luhrmann, R., et al., Isolation and characterization of rabbit anti-m3 2,2,7G antibodies. Nucleic Acids Res, 1982. 10(22): p. 7103-13.
89. Bochnig, P., et al., A monoclonal antibody against 2,2,7-trimethylguanosine that reacts with intact, class U, small nuclear ribonucleoproteins as well as with 7-methylguanosine-capped RNAs. Eur J Biochem, 1987. 168(2): p. 461-7.
90. Abramoff, M.D., P.J. Magalhaes, and S.J. Ram, Image Processing with ImageJ. Biophotonics International, 2004. 11(7): p. 36-42.
91. Shoguchi, E., et al., Chromosomal mapping of 170 BAC clones in the ascidian Ciona intestinalis. Genome Res, 2006. 16(2): p. 297-303.
92. Khare, P., Cis-regulatory elements driving muscle-specific expression of an ascidian troponin I gene, in Biology. 2005, McGill University: Montreal.
93. Mortimer, S., Experimental analysis of trans-splicing of an ascidian troponin I gene, in Biology. 2007, McGill University: Montreal.
94. Sinnreich, M., C. Therrien, and G. Karpati, Lariat branch point mutation in the dysferlin gene with mild limb-girdle muscular dystrophy. Neurology, 2006. 66(7): p. 1114-6.
95. Karolchik, D., et al., The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res, 2008. 36(Database issue): p. D773-9.
96. Satou, Y., et al., Improved genome assembly and evidence-based global gene model set for the chordate Ciona intestinalis: new insight into intron and operon populations. Genome Biol, 2008. 9(10): p. R152.
97. Lee, C.Y., A. Lee, and G. Chanfreau, The roles of endonucleolytic cleavage and exonucleolytic digestion in the 5'-end processing of S. cerevisiae box C/D snoRNAs. RNA, 2003. 9(11): p. 1362-70.
98. Abou Elela, S., L. Good, and R.N. Nazar, An efficiently expressed 5.8S rRNA 'tag' for in vivo studies of yeast rRNA biosynthesis and function. Biochim Biophys Acta, 1995. 1262(2-3): p. 164-7.
99. Szymanski, M., et al., 5S Ribosomal RNA Database. Nucleic Acids Res, 2002. 30(1): p. 176-8.
100. Kwan, S., V.L. Gerlach, and D.A. Brow, Disruption of the 5' stem-loop of yeast U6 RNA induces trimethylguanosine capping of this RNA polymerase III transcript in vivo. RNA, 2000. 6(12): p. 1859-69.

101. Ali, M. and W.V. Vedeckis, Interaction of RNA with transformed glucocorticoid receptor. II. Identification of the RNA as transfer RNA. J Biol Chem, 1987. 262(14): p. 6778-84.
102. Lewdorowicz, M., et al., Chemical synthesis and binding activity of the trypanosomatid cap-4 structure. RNA, 2004. 10(9): p. 1469-78.
103. Haussecker, D., et al., Capped small RNAs and MOV10 in human hepatitis delta virus replication. Nat Struct Mol Biol, 2008. 15(7): p. 714-21.
104. Ensinger, M.J. and B. Moss, Modification of the 5' terminus of mRNA by an RNA (guanine-7-)-methyltransferase from HeLa cells. J Biol Chem, 1976. 251(17): p. 5283-91.
105. Goncharov, I., et al., Purification of the spliced leader ribonucleoprotein particle from Leptomonas collosoma revealed the existence of an Sm protein in trypanosomes. Cloning the SmE homologue. J Biol Chem, 1999. 274(18): p. 12217-21.
106. Shoguchi, E., M. Hamaguchi, and N. Satoh, Genome-wide network of regulatory genes for construction of a chordate embryo. Dev Biol, 2008. 316(2): p. 498-509.
107. Spieth, J., et al., Operons in C. elegans: polycistronic mRNA precursors are processed by trans-splicing of SL2 to downstream coding regions. Cell, 1993. 73(3): p. 521-32.
108. Kazmaier, M., G. Tebb, and I.W. Mattaj, Functional characterization of X. laevis U5 snRNA genes. EMBO J, 1987. 6(10): p. 3071-8.
109. Mouaikel, J., et al., Hypermethylation of the cap structure of both yeast snRNAs and snoRNAs requires a conserved methyltransferase that is localized to the nucleolus. Mol Cell, 2002. 9(4): p. 891-901.
110. Box, J.A., et al., Spliceosomal cleavage generates the 3' end of telomerase RNA. Nature, 2008. 456(7224): p. 910-4.
111. Bonnal, S. and J. Valcarcel, Molecular biology: spliceosome meets telomerase. Nature, 2008. 456(7224): p. 879-80.
112. Leonardi, J., et al., TER1, the RNA subunit of fission yeast telomerase. Nat Struct Mol Biol, 2008. 15(1): p. 26-33.
113. Battle, D.J., et al., The Gemin5 protein of the SMN complex identifies snRNAs. Mol Cell, 2006. 23(2): p. 273-9.
114. Yong, J., et al., snRNAs contain specific SMN-binding domains that are essential for snRNP assembly. Mol Cell Biol, 2004. 24(7): p. 2747-56.
115. Yong, J., L. Pellizzoni, and G. Dreyfuss, Sequence-specific interaction of U1 snRNA with the SMN complex. EMBO J, 2002. 21(5): p. 1188-96.
116. Jones, K.W., et al., Direct interaction of the spinal muscular atrophy disease protein SMN with the small nucleolar RNA-associated protein fibrillarin. J Biol Chem, 2001. 276(42): p. 38645-51.

ACKNOWLEDGEMENTS

I would like to take this opportunity to acknowledge those who helped to make the foregoing work better, and in some cases, possible. I would like to thank my supervisor, Dr. Ken Hastings, for giving me the opportunity to work on this project and for ceaseless guidance throughout. I would like to thank our research associate, Dr. Pat Hallauer, for her extensive advice regarding lab protocols.

I would like to thank the members of my supervisory committee, Dr. Ehab Abouheif, Dr. Thomas Bureau, and Dr. Ken Dewar, for their critical questions and invaluable input on my project. I would like to thank Dr. Robert Zeller of San Diego State University for providing Pacific Ciona intestinalis samples, Dr. Tom Meedel of Rhode Island College for providing Atlantic Ciona intestinalis samples, Michaël Hebeisen at McGill University for providing C. elegans samples, and Dr. Greg Matlashewski of McGill University for providing Leishmania donovani samples. I would like to thank Dr. Eiichi Shoguchi at the Okinawa Institute of Science and Technology for collaborating with me on the FISH project. Finally, I would like to thank the members, too numerous to name individually, of the labs on the 6th floor of the Montreal Neurological Institute, for allowing me unabated access to their equipment, and inveterate instruction regarding same.

APPENDIX 1 - OLIGONUCLEOTIDES

Oligonucleotides used for hybridization
Name      Sequence (5'-3')
9060      ACCTTATTCAAATAGAAT
9102      GTTGAGCCAAAGCTCAGATTCTTAC
9108      TCCCCACTAGCAGAAATTAT
9254      CAATAAAGTACAGAAACTGATACTTATATAGCGTTA
9272      TGGGTCAGTTTCAATGTTTAC
9281      CTACAAGACCTACTTAACCAAT
9284      AGTTTATACAATGAAAAATTAC

Oligonucleotide used for RNase H digestion
Name      Sequence (5'-3')
9294      GCTGGTGGGTCATTTTGCGAAC

RNA oligonucleotide
Name      Sequence (5'-3')
9311      AUUCUAUUUGAAUAAGGUAAGA

PCR primers
Name      Sequence (5'-3')
9015      GGATCCGATTCTATTTGAATAAG
9038      GAATTCTACCTCAGAGGAGTCATAT
9038RT    GAATTCTACCTCAGAGGAGTCATATTTTTTTTTTTTT
9232      TTGAGCCAAAGCTCAGATTC
9233      TGCATTTAAGAACCCCTTGG
9234      GAAGGGTTTGCCAATAAAAGC
9240      TCGACACTAGTGGATCAAACAAGTATTTT
9241      GAGAAGCTTTTTGAATTCTTTGGATCAAT
9266      TAACCTTGTTTAGTCACAGCGAAA
9267      CCAAAGCTCAGATTCTTACTTAGTA
9330      GTCAATTATCCAAGTCCCATTAGTA
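
The complementarity between a hybridization oligonucleotide and its RNA target can be checked directly from the sequences listed in this appendix. The following is a minimal sketch in Python, not part of the original methods; the function names `reverse_complement` and `rna_to_dna` are illustrative. It tests whether oligonucleotide 9060 is the reverse complement of the 5' end of RNA oligonucleotide 9311, using only the sequences given above.

```python
# Illustrative helpers (hypothetical names, not from the thesis).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'-3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def rna_to_dna(rna: str) -> str:
    """Rewrite an RNA sequence in the DNA alphabet (U -> T) for comparison."""
    return rna.upper().replace("U", "T")


# Sequences copied from the tables in this appendix.
OLIGO_9060 = "ACCTTATTCAAATAGAAT"          # hybridization oligonucleotide
RNA_9311 = "AUUCUAUUUGAAUAAGGUAAGA"        # RNA oligonucleotide

target = reverse_complement(OLIGO_9060)     # sequence that 9060 would anneal to
anneals_to_5prime_end = rna_to_dna(RNA_9311).startswith(target)
print(target, anneals_to_5prime_end)
```

The same two helpers can be applied to any other probe and candidate target in the tables; the check only compares listed sequences and says nothing about hybridization conditions.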

APPENDIX 2 - PCR PROGRAMS

Program Name: POLYAPCR
Denature:   94°C   30s
Anneal:     57°C   30s
Elongate:   72°C   2 min
Cycles:     35

Program Name: PCR54
Denature:   94°C   45s
Anneal:     54°C   45s
Elongate:   72°C   2.5 min
Hold:       72°C   10 min
Cycles:     35

Program Name: PCRGRADIENT
Hot Start:  80°C   Add primers when at 80°C
Denature:   94°C   45s
Anneal:     61-75°C   45s
Elongate:   72°C   1 min
Hold:       72°C   10 min
Cycles:     35

Program Name: BPPCR
Hot Start:  80°C   Add primers when at 80°C
Denature:   94°C   45s
Anneal:     48°C   45s
Elongate:   72°C   1 min
Hold:       72°C   10 min
Cycles:     35

APPENDIX 3 - MEASURED AMOUNT OF HYBRIDIZATION ON NORTHERN BLOTS

K121 antibody with oligonucleotide 9060 (figure 9A) - 46 nt SL RNA
                    Sup          Pellet
Signal              416.024      631.893
Background          296.661      443.497
Corrected           119.363      188.396
Fraction            0.387846     0.612154
Total Recovered     307.759

K121 antibody with oligonucleotide 9060 (figure 9A) - 53 nt SL RNA exon-containing transcript
                    Sup          Pellet
Signal              293.653      471.773
Background          296.661      443.497
Corrected           0            28.276
Fraction            0            1
Total Recovered     28.276

K121 antibody with oligonucleotide 9108 (figure 9B)
                    Sup          Pellet
Signal              364.464      2773.034
Background          299.788      320.508
Corrected           64.676       2452.526
Fraction            0.025694     0.974306
Total Recovered     2517.202

K121 antibody with oligonucleotide 9108 (figure 9B)
                           Starting RNA band   Pellet
Signal                     1123.126            2773.034
Background                 271.57              320.508
Corrected                  851.556             2452.526
Pellet Enrichment Factor   2.880053

K121 gel (figure 9C) for ~188 nt transcript (Ciona intestinalis 5.8S rRNA)
                    Sup          Pellet
Signal              6581.472     4558.957
Background          4430.273     4517.015
Corrected           2151.199     41.942
Fraction            0.980876     0.019124
Total Recovered     2193.141

K121 gel (figure 9C) for ~138 nt transcript (Ciona intestinalis 5S rRNA)
                    Sup          Pellet
Signal              7712.345     4493.787
Background          4430.273     4517.015
Corrected           3282.072     0
Fraction            1            0
Total Recovered     3282.072

K121 antibody with oligonucleotide 9254 (figure 10)
                    Sup          Pellet
Signal              865.342      2854.315
Background          658.685      1818.612
Corrected           206.657      1035.703
Fraction            0.166342     0.833658
Total Recovered     1242.36

R1131 antibody with oligonucleotide 9060 (figure 11A) - 46 nt SL RNA
                    Sup          Pellet
Signal              451.659      163.356
Background          310.925      180.379
Corrected           140.734      0
Fraction            1            0
Total Recovered     140.734

R1131 antibody with oligonucleotide 9060 (figure 11A) - 53 nt SL RNA exon-containing transcript
                    Sup          Pellet
Signal              324.569      200.507
Background          310.925      180.379
Corrected           13.644       20.128
Fraction            0.404003     0.595997
Total Recovered     33.772

Beads without R1131 antibody with oligonucleotide 9060 (figure 11A) - 53 nt SL RNA exon-containing transcript
                    Sup          Pellet
Signal              421.593      180.249
Background          306.113      177.154
Corrected           115.48       3.095
Fraction            0.973898     0.026102
Total Recovered     118.575

R1131 antibody with oligonucleotide 9254 (figure 11B)
                    Sup          Pellet
Signal              5575         548.44
Background          1401.643     326.442
Corrected           4173.357     221.998
Fraction            0.949493     0.050507
Total Recovered     4395.355

R1131 antibody with oligonucleotide 9272 (figure 11C)
                    Sup          Pellet
Signal              3817.11      2247.193
Background          548.683      357.635
Corrected           3268.427     1889.558
Fraction            0.633664     0.366336
Total Recovered     5157.985

Beads without R1131 antibody with oligonucleotide 9272 (figure 11C)
                    Sup          Pellet
Signal              5463.833     511.109
Background          410.171      352.824
Corrected           5053.662     158.285
Fraction            0.96963      0.03037
Total Recovered     5211.947

H20 antibody with oligonucleotide 9060 (figure 14A) - 46 nt SL RNA
                    Sup          Pellet
Signal              867.612      872.618
Background          680.752      657.907
Corrected           186.86       214.711
Fraction            0.465322     0.534678
Total Recovered     401.571

H20 antibody with oligonucleotide 9060 (figure 14A) - 53 nt SL RNA exon-containing transcript
                    Sup          Pellet
Signal              797.01       787.775
Background          680.752      657.907
Corrected           116.258      129.868
Fraction            0.472352     0.527648
Total Recovered     246.126

Beads without H20 antibody with oligonucleotide 9060 (figure 14A) - 46 nt SL RNA
                    Sup          Pellet
Signal              1368.974     722.682
Background          808.649      702.784
Corrected           560.325      19.898
Fraction            0.965706     0.034294
Total Recovered     580.223

Beads without H20 antibody with oligonucleotide 9060 (figure 14A) - 53 nt SL RNA exon-containing transcript
                    Sup          Pellet
Signal              1217.7       706.86
Background          808.649      702.784
Corrected           409.051      4.076
Fraction            0.990134     0.009866
Total Recovered     413.127

H20 gel for ~141 nt transcript (figure 14B) (Ciona intestinalis 5S rRNA)
                    Sup          Pellet
Signal              8174.378     6093.244
Background          6317.351     6007.363
Corrected           1857.027     85.881
Fraction            0.955798     0.044202
Total Recovered     1942.908

Sm antibody with oligonucleotide 9060 (figure 15A) - 53 nt SL RNA exon-containing transcript
                    Sup          Pellet
Signal              1043.086     631.575
Background          915.925      538.993
Corrected           127.161      92.582
Fraction            0.578681     0.421319
Total Recovered     219.743

Sm antibody with oligonucleotide 9060 (figure 15A) - 46 nt SL RNA
                    Sup          Pellet
Signal              657.328      666.489
Background          915.925      538.993
Corrected           0            127.496
Fraction            0            1
Total Recovered     127.496

Beads without Sm antibody with oligonucleotide 9060 (figure 15A) - 53 nt SL RNA exon-containing transcript
                    Sup          Pellet
Signal              569.28       256.218
Background          397.167      267.065
Corrected           172.113      0
Fraction            1            0
Total Recovered     172.113

Beads without Sm antibody with oligonucleotide 9060 (figure 15A) - 46 nt SL RNA
                    Sup          Pellet
Signal              412.647      225.664
Background          397.167      267.065
Corrected           15.48        0
Fraction            1            0
Total Recovered     15.48

Sm antibody with oligonucleotide 9284 (figure 15B)
                    Sup          Pellet
Signal              517.224      589.704
Background          503.415      503.447
Corrected           13.809       86.257
Fraction            0.137999     0.862001
Total Recovered     100.066

Sm antibody gel (figure 15C) for ~174 nt transcript (Ciona intestinalis 5.8S rRNA)
                    Sup          Pellet
Signal              10260.63     5343.182
Background          8341.235     5274.342
Corrected           1919.39      68.84
Fraction            0.965376     0.034624
Total Recovered     1988.23

Sm antibody gel (figure 15C) for ~132 nt transcript (Ciona intestinalis 5S rRNA)
                    Sup          Pellet
Signal              11804.29     5375.882
Background          8341.235     5274.342
Corrected           3463.058     101.54
Fraction            0.971514     0.028486
Total Recovered     3564.598

RNase H digested RNA R1131 antibody with oligonucleotide 9281 (figure 17A)
                    Sup          Pellet
Signal              923.949      534.157
Background          681.732      590.536
Corrected           242.217      0
Fraction            1            0
Total Recovered     242.217

RNase H digested RNA R1131 antibody with oligonucleotide 9272 (figure 17B)
                    Sup          Pellet
Signal              1120.088     767.889
Background          627.797      513.941
Corrected           492.291      253.948
Fraction            0.659696     0.340304
Total Recovered     746.239

RNase H digested RNA beads without R1131 antibody with oligonucleotide 9272 (figure 17B)
                    Sup          Pellet
Signal              1527.614     419.368
Background          437.823      454.042
Corrected           1089.791     0
Fraction            1            0
Total Recovered     1089.791

Ciona egg, 12 h embryo, and body-wall muscle for 9060 (figure 19A) - 46 nt SL RNA
                         12 h embryo   Egg
Signal                   2400          1313.231
Background               817.175       720.462
Corrected                1582.825      592.769
Egg / 12 h embryo        0.374501

Ciona egg, 12 h embryo, and body-wall muscle for 9108 (figure 19B)
                         12 h embryo   Egg
Signal                   3265.5        2667.579
Background               381.586       423.125
Corrected                2883.914      2244.454
Egg / 12 h embryo        0.778267
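
The arithmetic behind the tables in this appendix can be reproduced from the raw band and background values. The following is a minimal sketch in Python, inferred from the tabulated numbers rather than taken from the original analysis; the function names are illustrative. It assumes that each band is corrected by subtracting its own background, that a negative corrected value is reported as 0, that the fractions are each corrected value divided by the total recovered, and that the pellet enrichment factor is the corrected pellet signal divided by the corrected starting RNA signal.

```python
# Illustrative reconstruction of the appendix arithmetic (hypothetical names).

def background_correct(signal: float, background: float) -> float:
    """Subtract background; negative results are reported as 0, as in the tables."""
    return max(signal - background, 0.0)


def partition(sup: float, sup_bg: float, pellet: float, pellet_bg: float) -> dict:
    """Background-correct supernatant and pellet bands and compute their fractions."""
    corrected_sup = background_correct(sup, sup_bg)
    corrected_pellet = background_correct(pellet, pellet_bg)
    total = corrected_sup + corrected_pellet
    if total == 0:
        raise ValueError("no signal above background in either fraction")
    return {
        "corrected_sup": corrected_sup,
        "corrected_pellet": corrected_pellet,
        "total_recovered": total,
        "sup_fraction": corrected_sup / total,
        "pellet_fraction": corrected_pellet / total,
    }


def enrichment_factor(start: float, start_bg: float,
                      pellet: float, pellet_bg: float) -> float:
    """Corrected pellet signal relative to corrected starting RNA signal."""
    return background_correct(pellet, pellet_bg) / background_correct(start, start_bg)


# Raw values from figure 9A (46 nt SL RNA) and figure 9B, as tabulated above.
fig9a = partition(416.024, 296.661, 631.893, 443.497)
fig9b_enrichment = enrichment_factor(1123.126, 271.57, 2773.034, 320.508)
print(fig9a, fig9b_enrichment)
```

Under these assumptions the sketch agrees with the tabulated values for figures 9A and 9B to the precision shown in the tables.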