JENS VERBEEREN

RegulaƟ on of the Minor through AlternaƟ ve Splicing and Nuclear RetenƟ on of the U11/U12ͳ65K mRNA

Institute of Biotechnology and

Division of

Department of Biosciences

Faculty of Biological and Environmental Sciences and

Doctoral Programme in Integrative Life Science

University of Helsinki

ACADEMIC DISSERTATION

To be presented for public examination with the permission of the Faculty of Biological and Environmental Sciences in Auditorium 1041 in Biocenter 2 (Viikinkaari 5, Helsinki) on January 29th 2016, at 12 noon.

Helsinki 2016 Custos Professor Juha Partanen Department of Biosciences University of Helsinki Helsinki, Finland

Supervisor Docent Mikko Frilander Institute of Biotechnology University of Helsinki Helsinki, Finland

Th esis advisory committee Professor Yrjö Helariutta Docent Petri Auvinen Sainsbury Laboratory Institute of Biotechnology University of Cambridge University of Helsinki Cambridge, United Kingdom Helsinki, Finland

Reviewers Docent Tapio Heino Docent Noora Kotaja Department of Biosciences Department of Physiology University of Helsinki Institute of Biomedicine Helsinki, Finland University of Turku Turku, Finland

Opponent Associate Professor Stephen M. Mount Department of Cell Biology and Molecular Genetics University of Maryland College Park, Maryland, United States

Dissertationes Scholae Doctoralis Ad Sanitatem Investigandam Universitatis Helsinkiensis 8/2016

ISBN 978-951-51-1856-1 (paperback) ISBN 978-951-51-1857-8 (PDF) ISSN 2342-3161 (Print) ISSN 2342-317X (Online)

Layout: Tinde Päivärinta/PSWFolders Oy Hansaprint Helsinki 2016 http://ethesis.helsinki.fi “Die sijn tijtjen weet te gissen, En sijn toutjen weet te splissen, En sijn glas te roer te staen, Mag wel voor een bootsman gaen” Jacob Cats, 1632 TABLE OF CONTENTS

List of original Publications ...... vi Abbreviations ...... vii Abstract ...... viii 1. Review of the Literature ...... 1 1.1 ...... 1 1.1.1 Defi nition and Classifi cation ...... 1 1.1.1.1 Spliceosomal Introns ...... 1 1.1.1.2 tRNA Introns ...... 4 1.1.1.3 Group I Introns ...... 4 1.1.1.4 Group II Introns ...... 5 1.1.2 On the Origin of Introns ...... 6 1.1.2.1 Gain and Intron Loss ...... 7 1.1.2.2 Roots of the U12-type Intron ...... 8 1.1.3 What good is an Intron? ...... 8 1.2 RNA Splicing and the Spliceosome ...... 9 1.2.1 Spliceosome Composition ...... 10 1.2.2 and Intron Defi nition ...... 12 1.2.2.1 Splicing Enhancers and Splicing Silencers ...... 13 1.2.2.2 Splicing Activators and Repressors ...... 13 1.2.3 Spliceosome Assembly and Catalysis ...... 15 1.2.3.1 Major Spliceosome Assembly ...... 15 1.2.3.2 Minor Spliceosome Assembly ...... 17 1.2.4 ...... 18 1.2.4.1 Regulatory Role of Alternative Splicing ...... 19 1.2.4.2 Alternative Splicing of U12-type Introns ...... 19 1.3. Splicing and other pre-mRNA Processes ...... 20 1.3.1 Th e Timing of Splicing ...... 20 1.3.2 Splicing and Nuclear Export ...... 21 1.3.3 Splicing and Quality Control: Nonsense Mediated Decay ...... 22 1.3.4 Splicing and 3´ End Processing ...... 22 1.4 Th e Minor Spliceosome: Signifi cance ...... 23 1.4.1 Function and Regulation of the Minor Spliceosome ...... 23 1.4.2 Minor Spliceosome and Disease ...... 25 2. Aims of the Study ...... 26 3. Materials and Methods ...... 27 4. Results and Discussion ...... 28 4.1 Ultraconserved non-coding regions in minor spliceosome-associate genes are linked to alternative splicing ...... 28 4.2 U11 binds the conserved enhancer and activates alternative splicing through exon defi nition interactions mediated by the U11-35K protein...... 30 4.3 Alternative splicing leads to a multilayered inhibitory mechanism for 48K and 65K mRNA expression ...... 31 4.4 USSE provides negative auto-regulation for the 48K gene and minor spliceosome mediated cross-regulation for the 65K gene ...... 33 4.5 Evolutionary role of the USSE ...... 34 5. Concluding Remarks ...... 37 6. Acknowledgements ...... 38 7. References ...... 40

LIST OF ORIGINAL PUBLICATIONS

Th e thesis is based on the following articles, referred to in the text by their Roman numerals:

I. Verbeeren J*, Niemelä EH*, Turunen JJ*, Will CL, Ravantti JJ, Lührmann R, Frilander MJ. An ancient mechanism for splicing control: U11 snRNP as an activator of alternative splicing. Mol Cell 2010; 37(6):821-33. (Reproduced here with permission from Elsevier) II. Niemelä EH, Verbeeren J, Singha P, Nurmi V, Frilander MJ. Evolutionarily conserved exon defi nition interactions with U11 snRNP mediate alternative splicing regulation on U11- 48K and U11/U12-65K genes. RNA Biol 2015; 12(11):1256-64. (Reproduced here with permission from Taylor & Francis) III. Verbeeren J, Niemelä EH, Frilander MJ. Regulation of the minor spliceosome U11/U12- 65K mRNA by nuclear retention and transcriptional read-through. Unpublished manu- script.

* Equal contribution

Th e author´s contribution to each article:

I. JV planned experiments and analyzed alternative splicing of the U11/U12-65K gene in the contexts of of the regulatory element, knock-down of minor spliceosomal pro- teins, inhibition of nonsense mediated decay, and morpholino transfections. JV planned and performed the critical genetic rescue experiments, performed the luciferase assays, and wrote the article together with the other authors.

II. JV designed experiments related to the investigation of USSE distance requirements and supervised one undergraduate student who performed them. JV supervised another un- dergraduate student who performed the initial U11-35K tethering experiments. JV con- ducted experiments to evaluate the role of the 35K and other proteins in splicing activation (RT-PCR splicing analysis and qPCR), and wrote the article together with EN and MJF.

III. JV planned and performed virtually all experiments (single molecule FISH, qPCR, cellular fractionation, RNase protection assay, and RT-PCR splicing analysis), and wrote the article together with MJF.

vi ABBREVIATIONS

A adenine BP branch point BPS branch point sequence C cytosine CP cleavage/ EJC exon junction complex ESE exonic splicing enhancer ESS exonic splicing silencer EST expressed sequence tag FISH fl uorescence in situ hybridization G guanine hnRNP heterogeneous nuclear ribonucleic protein IEP intron-encoded protein ISE intronic splicing enhancer ISS intronic splicing silencer LECA last eukaryotic common ancestor lncRNA long non-coding RNA miRNA micro RNA mRNA messenger RNA NMD nonsense mediated decay nt nucleotide(s) ORF open reading frame PAP polyadenylate polymerase PCR polymerase chain reaction poly(A) polyadenylation PPT polypyrimidine tract pre-mRNA precursor mRNA PTC premature termination codon qPCR quantitative PCR R purine (A or G) RNA ribonucleic acid RNAP RNA polymerase RNP ribonucleic protein RRM RNA recognition motif RS arginine-serine-rich (domain) RUST regulated unproductive splicing and translation snRNA small nuclear ribonucleic acid snRNP small nuclear ribonucleic protein SR serine-arginine ss splice site T thymine U uracil USSE U11 snRNP-binding splicing enhancer UTR untranslated region Y pyrimidine (C, T or U)

vii ABSTRACT

Th e protein coding information in our genome is located on genes which are very oft en interrupted by non-coding regions called introns. For proper gene expression, introns must be removed accurately and the remaining protein coding parts, the , must be rejoined. Th is reaction, termed splicing, is carried out by an enormous macromolecular machine called the spliceosome, and is one of the most crucial steps in gene expression. Two diff erent intron types have been identifi ed in eukaryotes, each removed by their own dedicated spliceosome: the U2-type (or major) introns, which constitute the majority of introns, and the U12-type (or minor) introns, of which ca. 700-800 have been identifi ed in the human genome. Th e presence of a second type of intron and spliceosome has always been enigmatic. However, studies investigating U12-type intron removal have provided us with an important clue; it appears that U12-type introns are spliced less effi ciently than U2-type introns. Th is suggests that their removal could be rate-limiting for the expression of the genes that harbor these introns, and it also off ers the intriguing possibility that the activity of the minor spliceosome could be altered in response to changing cellular conditions. Th ese implications could off er a valuable explanation for the extraordinary conservation of the U12-type introns and the components that catalyze their excision. Th ere is currently not much known about the regulation of the minor spliceosome and this study aimed to address this issue. I have investigated the characteristics of a negative feedback loop that regulates the expression level of two essential and unique protein components of the minor spliceosome, the U11-48K and the U11/U12-65K proteins. In the genes that encode these proteins, an ultraconserved sequence element can be found which consists of a tandem repeat of U12-type 5´ splice sites. We uncovered that binding of U11/U12 di- on these elements leads to alternative splicing where an mRNA isoform is produced that is targeted for degradation or nuclear retention. Th e presence of such enhancer elements is conserved from plants to animals, highlighting an extreme selection pressure for this regulatory mechanism. I further investigated the role of the U11-35K protein, another protein uniquely associated with the minor spliceosome, in alternative splicing, and the functional requirements for enhancer binding. Furthermore, I uncovered the molecular mechanism by which the level of translational- competent U11/U12-65K mRNA is downregulated through U11/U12 di-snRNP enhancer binding.

viii Review of the Literature 1. REVIEW OF THE LITERATURE 1.1 Introns Unknown to science before 1977, the discovery of eukaryotic intragenic regions that were not part of the fi nal mRNA was truly dogma-shattering (Gilbert, 1978). Rather than consisting of one continuous sequence, the genetic information of many eukaryotic genes was found to be spread out onto pieces that must be accurately joined together in a process called splicing (Berget et al., 1977, Chow et al., 1977). Since then, the presence of the interrupting regions, the so-called introns, has been shown to provide an additional layer of biological complexity as alternative splicing generates multiple proteins from a single gene. Crucial to our understanding of gene evolution, disease, and mRNA processing, the discovery of “split genes” led to the 1993 Nobel Prize in Physiology or Medicine for Phillip Sharp and Richard Roberts.

1.1.1 Defi niƟ on and Classifi caƟ on Introns are nucleotide sequences that interrupt the coding regions of protein coding genes and the functional regions of RNA genes. As such, they can be found in various types of RNA (tRNA, rRNA, mRNA and lncRNA) in eukaryotic, viral, organelle, archaeal and bacterial genomes but their abundance, size and their mechanism of removal diff ers substantially across the lineages. In order to generate a functional protein or RNA product, introns need to be accurately removed and the remaining pieces, called exons, joined to one another. Th is process is called RNA splicing. On the basis of intron sequence, as well as the biochemical properties of the splicing reaction, four main intron classes can be distinguished.

1.1.1.1 Spliceosomal Introns Spliceosomal introns constitute the predominant class of eukaryotic introns and their removal is catalyzed by dedicated ribonucleoprotein complexes called . Two diff erent spliceosomes can be found in eukaryotic organisms: the major (or so-called U2-dependent) spliceosome and the minor (or so-called U12-dependent) spliceosome. Whereas the major spliceosome catalyzes the removal of the majority of introns, the minor spliceosome targets a specifi c and rare class of spliceosomal introns: the minor (or U12-type) type intron. Apart from their diff erential prevalence, the two intron types have distinct recognition sequences (“splice sites”), and both the composition of the spliceosome machinery and the mechanism of intron removal diff er (see 1.2). Th e most crucial pieces of information that introns possess are located on recognition sequences at the very 5´ and 3´ ends of the intron, which can extend a few nucleotides (nt) into the fl anking exons. Th ese so-called 5´ and 3´ splice sites (ss) are recognized by components of the spliceosomal machinery. Th e spliceosome recognizes another sequence motif, the so-called branch point sequence (BPS), typically located 20-40 nt upstream of the 3´ ss and containing the crucial adenosine that is required for the fi rst nucleophilic attack of the splicing reaction (Ruskin et al., 1984). In most metazoa, the BPS is followed by a polypyrimidine tract (PPT), a pyrimidine-rich motif crucial for effi cient BPS utilization and selection. Both the length and the sequence of the PPT are fl exible (Coolidge et al., 1997).

1 Review of the Literature

A

B

Figure 1. Splice site consensus sequences for U12-type (A) and U2-type (B) introns. Adapted from Turunen et al. (2013a).

In the case of the U2-type intron, the consensus 5´ ss is AG/GTAAGT, the slash denoting exon- intron boundary and the consensus 3´ ss is YAG/G (Fig. 1b). In higher eukaryotes, these splice sites are highly degenerate with the exception of the terminal dinucleotides, which are almost always GT-AG (Aebi et al., 1986). Th e most frequent exception, comprising 0.9 % of all U2-type introns, are those introns that have GC-AG dinucleotides at their exon-intron boundaries (Sheth et al., 2006). Other subtypes are extremely rare: in the human genome, 15 cases of AT-AC introns can be found (Sheth et al., 2006). In more basal and more intron-poor eukaryotic groups, the 5´ ss is much more conserved: perhaps a compensation mechanism for a lack of exonic splicing signals (Irimia et al., 2007, Irimia et al., 2009). Th e human BPS is a highly degenerative yUnAy and the sequence context, in addition to the presence of the PPT, is important for its utilization (Gao et al., 2008). In contrast, yeast introns have a strictly conserved consensus BPS, UACUAAC, and generally lack a PPT (Coolidge et al., 1997). On the other hand, the U12-type consensus 5´ ss is a highly conserved /RTATCCTTT, with a purine residue (R) at the +1 position relative to the exon-intron boundary (Fig. 1a). Th e terminal nucleotides at the 5´ and 3´ end of the U12-type intron show high interdependence (Dietrich et al., 2005) such that two diff erent varieties can be distinguished: the GT-AG and the AT-AC classes, of which the GT-AG class is the most common in human (ca. 76 %: Sheth et al. (2006)). Th e U12-type BPS consists of an 8 nt pyrimidine-rich sequence, followed by the branch point adenosine either in position 9 or 10 (McConnell et al., 2002) and then a further 2 nt enriched in pyrimidines. Th e distance of the BPS to the 3´ ss is generally short (10-20 nt) (Sharp and Burge, 1997) but can be, at least for the GT-AG subtype, extended to more than 35 nt (Dietrich et al., 2005). U12-type introns lack a PPT, further hinting, together with the obvious dissimilarities in splice sequence motifs, to a fundamental diff erence in intron recognition between the two spliceosomes. Genes of multicellular organisms are generally intron-rich. In human, the average gene contains about 7.8 introns (Sakharkar et al., 2004) and the expression of the monstrous titin gene requires the correct removal of no less than 362 introns (Bang et al., 2001). In contrast, about 3 % of all human genes contain no introns at all, primarily those encoding for G-protein coupled receptors and histones (Louhichi et al., 2011). Within the eukaryotic lineage there is signifi cant variation in intron density: Caenorhabditis elegans has 4.7 introns per gene (Schwarz et al., 2006), Arabidopsis thaliana has 4.4 (Haas et al., 2005) and has 3.4 (Drysdale et al., 2005). At the opposite end of the spectrum, the fungus Encephalitozoon cuniculio, the nucleomorph of the alga Guillardia theta, and the protozoan parasite Giardia

2 Review of the Literature lamblia all have very few introns: respectively 15, 17 and 4 (Jeff ares et al., 2006, Morrison et al., 2007). Spliceosomal introns seem to be completely lost from the parasitic microsporidian Enterocytozoon bieneusi and in the nucleomorph genome of Hemiselmis andersenii (Lane et al., 2007, Keeling et al., 2010). It is tempting to associate increasing intron density with increasing “biological complexity”, but the variation observed in fungi troubles that assumption. Whereas the yeasts Schizosaccharomyces pombe (0.9 introns per gene) and, especially, Saccharomyces cerevisiae (0.05 per gene) show low intron densities (Wood et al., 2002, Hirschman et al., 2006), the basidiomycete Cryptococcus neoformans displays intron densities similar to many multicellular eukaryotes (5.3 per gene) (Loft us et al., 2005). Furthermore, two early branching protists, the jakobids and the malawimonads, possess a large number of introns (Archibald et al., 2002). Clearly, widespread intron gain or intron loss, or a combination thereof, must have taken place at multiple times during evolution. U12-type introns are very rare: about 0.4 % of human introns are of the U12-type (Sheth et al., 2006), and Drosophila melanogaster, for example, harbors only 19 U12-type introns (Lin et al., 2010). Th ey are present in many eukaryotic lineages: vertebrates, cnidarians, insects, plants and even protists (Burge et al., 1998, Russell et al., 2006), suggesting an early evolutionary origin. Th ey are however absent and thought to be lost in Caenorhabditis elegans, Saccharomyces cerevisiae and Schizosaccharomyces pombe (Burge et al., 1998). Typically, within those genes that harbor U12-type introns, a single U12-type intron is found, surrounded by U2-type introns. Interestingly, several genes exist that have two, and in a single case even have three U12-type introns (Burge et al., 1998, Levine and Durbin, 2001). Between diff erent lineages, the average size of introns varies greatly. In protozoa, intron length rarely exceeds 100 nt (Wu et al., 2013). Th e introns of Encephalitozoon cuniculi are extremely short (23-52 nt) and the pygmy introns of the nucleomorph genome of Bigelowiella natans are the smallest known (18-21 nt) (Gilson et al., 2006, Lee et al., 2010). Saccharomyces introns are on average 266 nt long (Bon et al., 2003), and the length of Arabidopsis thaliana and Drosophila melanogaster introns is on average 158 nt and 818 nt, respectively (Hong et al., 2006). Vertebrate introns seem to have been expanding: the average human intron is 3749 nt long, comparable to that of other vertebrates (Hong et al., 2006, Wu et al., 2013). One notable exception are the puff er fi sh, amongst which Tetraodon rubripes has one of the most compact vertebrate genomes, partly due to a reduction in intron size (435 nt on average) (Guo et al., 2010). Th e nature of the gene itself can shape the length of the introns it harbors: in humans, introns of highly expressed genes and housekeeping genes are shorter than those of genes expressed at low levels. A possible explanation is that, since transcription is both time and energy consuming, selective forces might act to reduce intron length in highly expressed genes (Eisenberg and Levanon, 2003, Urrutia and Hurst, 2003). Even within a single gene there are diff erences: introns within the 5´ untranslated region (5´ UTR) and the fi rst intron within the coding region are typically longer than those located downstream (Bradnam and Korf, 2008). First introns show more sequence conservation relative to other introns (Keightley and Gaff ney, 2003) and have shown to be enriched in regulatory motifs (Majewski and Ott, 2002, Gaff ney and Keightley, 2004), partly explaining their increased lengths. Th e size of U12-type introns is comparable to that of U2-type introns (Levine and Durbin, 2001). However, in comparison to U2-type introns, in the length distribution of U12-type introns, there seems to be no enrichment for short introns (ca. 90 nt) (Levine and Durbin, 2001) and this might indicate a preference for exon defi nition mediated splice site recognition of U12- type introns (Patel and Steitz, 2003) (see 1.2.2 for a discussion of exon defi nition).

3 Review of the Literature 1.1.1.2 tRNA Introns Another type of introns, whose removal is completely based on protein components, can be found in tRNA precursors in two major lines of descent: the archaea and the eukaryotes (Abelson et al., 1998). Introns are also present in bacterial tRNA but they are self-splicing and belong to a diff erent class (group I introns, see 1.1.1.3). Introns in tRNA were fi rst described in Saccharomyces cerevisiae (Goodman et al., 1977, Valenzuela et al., 1978) where 59 of the 256 tRNA genes are interrupted by introns (Trotta et al., 1997). Th ey show no obvious sequence conservation at their splice sites, are 14-60 nt in length and are located immediately 3´ to the anticodon (Ogden et al., 1984). Th eir removal depends solely on protein components and occurs in a 3-step mechanism (Fig. 2a, and Abelson et al. (1998)). In the fi rst step, the tRNA endonuclease cleaves the intron at the splice sites. Since there is no sequence conservation at the splice sites, recognition occurs by measuring the distance from the mature domain of the tRNA to the splice site (Reyes and Abelson, 1988). In addition to this ruler-mechanism, some sequences within the intron itself can assist the recognition (Baldi et al., 1992, Di Nicola Negri et al., 1997). In a second step, tRNA ligase joins the ends in a reaction that is dependent on ATP hydrolysis (Abelson et al., 1998). Ultimately, the 5´ ss is dephosphorylated by a nicotinamide adenine dinucleotide- (NAD-) dependent phosphotransferase (McCraith and Phizicky, 1991). In archaea, tRNA introns are small and the splice sites are recognized because they are located in a conserved structural motif, the bulge-helix-bulge (BHB) motif (Th ompson and Daniels, 1988). Archaea operate by a similar pathway for tRNA intron removal: both tRNA endonucleases and tRNA ligases are present (Yoshihisa, 2014). Th e presence of such introns seems to be a restriction for the function of the tRNA and there is no conclusive explanation for their role. However, certain tRNA introns are known to contain motifs that promote tRNA modifi cation (Grosjean et al., 1997), perhaps driving selection pressure for intron maintenance.

1.1.1.3 Group I Introns In the genomes of bacteria, bacteriophages, mitochondria, chloroplasts, some eukaryotic viruses and lower eukaryotes, a third class of introns can be found that interrupts rRNA, mRNA and tRNA species (Cech, 1986, Haugen et al., 2005). Interestingly, these so-called group I introns are absent from nuclear genomes in multicellular organisms. Group I introns are self-splicing ribozymes but some do rely to varying degrees on protein factors (Nielsen and Johansen, 2009). Th ey show little conservation at the sequence level, however, the last nucleotide of the upstream exon is very oft en a U, and the last nucleotide of the intron sequence is a G (Nielsen and Johansen, 2009). Th ey are on average 250-500 nt long and consist of nine paired regions folding into helical domains (Woodson, 2005, Haugen et al., 2005). Th eir removal occurs through two coupled transesterifi cation reactions where a 3´ hydroxyl group of an exogenous guanosine attacks the phosphodiester bond at the 5´ ss. Th e resulting free hydroxyl group at the upstream exon then attacks the 3´ ss, leading to ligation of the exons and intron release (Fig. 2b, and Cech (1990)). So far, no clear biological role has been attributed to group I introns: they seem to be selfi sh genetic elements with their self-splicing ability rendering them relatively neutral to the host. On the other hand, it has been suggested that they could regulate expression of rRNA and protein coding genes (Nielsen and Johansen, 2009), and assist in the correct folding of their tRNA and rRNA exons (Cao and Woodson, 2000, Rangan et al., 2004).

4 Review of the Literature 1.1.1.4 Group II Introns In bacterial genomes and the organelle genomes of certain eukaryotes (fungi and plants in particular), a remarkable mobile genetic element can be found. Th ey are called group II introns and they combine the ability to self-splice with, through the reverse transcriptase activity of an intron-encoded protein (IEP), the ability to invade and to populate new DNA sites (Lambowitz and Zimmerly, 2011). Group II introns are characterized by a 400-800 nt long conserved sequence, consisting of 6 domains that fold such that distant sites are brought together to form an active site. Th is structure binds the splice sites and a branch-point nucleotide, and uses Mg2+ ions for catalysis (Lambowitz and Zimmerly, 2011). As is the case with group I introns, a two-step transesterifi cation drives their splicing reaction (Fig. 2c, and Peebles et al. (1986)): a nucleophilic attack from the 2´ hydroxyl of the bulged branch point adenine to the 5´ ss, leading to the formation of an intron lariat-3´ exon intermediate. Th is is followed by the nucleophilic attack of the 3´ hydroxyl of the cleaved 5´ exon to the 3´ ss, resulting in lariat release and exon ligation. Th e multifunctional IEP that is encoded within group II introns both promotes the splicing reaction (Carignani et al., 1983), and reverse transcribes the excised intron which, through its ribozyme activity, can reverse splice into DNA. As such, group II introns are highly mobile elements and are thought to have shaped the genomic evolution of eukaryotes (Sharp, 1985, Cech, 1986, Zimmerly et al., 1995). Th ey are absent from eukaryotic nuclear genomes, where mechanisms might have evolved that stopped their proliferation. Specifi cally, eukaryotic nuclei have low Mg2+ concentrations, incompatible with the high concentrations required for the catalysis of group II splicing (Lambowitz and Zimmerly, 2011, Truong et al., 2013). In addition, group II introns are believed to be the ancestors of both spliceosomal introns (Cech, 1986, Burge et al., 1998): their splicing pathways are identical (see 1.2.3, and Fig. 2), and there are structural similarities between spliceosomal RNAs and critical domains of group II introns (Madhani and Guthrie, 1992, Shukla and Padgett, 2002, Toor et al., 2008, Dayie and Padgett, 2008, Pyle, 2010). Furthermore, the 5´ and 3´ ends of group II introns have conserved sequences, GUGYG and AY, respectively, which are remarkably similar to those of spliceosomal introns.

5 Review of the Literature

Figure 2. Splicing mechanisms for the four diff erent intron classes. tRNA introns (A), group I introns (B), group II introns (C), and spliceosomal introns (D). Aft er Cech (1990).

1.1.2 On the Origin of Introns Our genomes are crowded with spliceosomal introns: in human, several hundred thousands have been identifi ed (Sakharkar et al., 2004). Where do they come from? Because of reasons discussed in 1.1.1.4, it seems reasonable to assume that spliceosomal introns were derived from self-proliferating and self-splicing selfi sh genetic elements, the group II introns, which then lost their ability to self-splice and subsequently increased their dependence on trans-acting factors. Th e presence of spliceosomal introns is a universal feature of even the most basal eukaryotic genomes but they are missing from archaeal and bacterial genomes. Based on their omnipresence in eukaryotic genomes (Jeff ares et al., 2006, Morrison et al., 2007) and the conservation of intron positions in orthologous genes of much diverged lineages (Marchionni and Gilbert, 1986), it is clear that spliceosomal introns are ancient but the timing of their appearance is the subject of intense debate in which two main theories have been proposed. Th e introns late theory (Cavalier-Smith, 1985, Palmer and Logsdon, 1991) postulates the appearance of introns within the eukaryotic lineage, where inserted transposable elements and reverse splicing formed sources of new introns. Th e introns early model (Darnell, 1978) counters that introns were present already in the last universal common ancestor (LUCA), and promoted, through recombination within intronic sequences, the formation of new gene products in a process called exon shuffl ing (Gilbert, 1978). Introns have since then been lost in the prokaryote and archaeal lineages due to a need for streamlining their genome for shorter replication times and/or due to their large population sizes (Lynch and Richardson, 2002). Although proponents of each theory persist, the current general consensus can be interpreted as a compromise: spliceosomal introns appeared at the time of the origin of eukaryotes (kind of

6 Review of the Literature late), and are descendants from pre-existing self-splicing introns. A more suggestive hypothesis puts forward the notion that, during the endosymbiosis event in which mitochondria were acquired, spliceosomal introns originating from α-proteobacterial group II introns transferred to the host genome (Cavalier-Smith, 1991). Recent comparisons amongst eukaryotes provide evidence that the ancestors of the last eukaryote then experienced a dramatic increase in introns (Koonin, 2009). Selection then drove these early organisms to compartmentalize their genome into a nucleus to stop proliferation of the self-replicating introns (perhaps due to the lower Mg2+ levels), and to fragment the group II introns into spliceosomal small nuclear RNAs (snRNAs) with increasing reliance on protein factors. Such fragmentation into trans-acting factors might have promoted the loss in group II introns, as any aff ecting their self-splicing abilities could be compensated if the proto-spliceosome were able to recognize the intron boundaries (Stoltzfus, 1999). In addition, a trans-acting spliceosomal system would relieve the mutational pressure on a system where the genome is crowded with group II introns for which the number of constrained nucleotides is very high (Lambowitz and Zimmerly, 2011, Irimia and Roy, 2014). Given that the intron densities diff er by orders of magnitude in diff erent lineages, from these ancient ancestors on, there have been diff erent rates of intron loss (and gain) in the diverse lineages (Rogozin et al., 2003).

1.1.2.1 Intron Gain and Intron Loss A great number of introns was already present in the ancestral genome (Carmel et al., 2007). Comparisons of exon-intron structures between highly conserved genes suggest that both signifi cant intron loss and gain have occurred during eukaryotic gene evolution (Rogozin et al., 2003). Remarkably, introns have been gained recently in the rab4 gene within a small Daphnia pulex population endemic to Oregon (Omilian et al., 2008), and an intron has been acquired in the primate-specifi c RNF113B retrogene in human only (Szczesniak et al., 2011), showing that intron gain can be a recent evolutionary phenomenon. Th e mechanism behind intron gain events could be through a process similar to group II intron retro-transposition (Sharp, 1985), in which a reverse splicing event is combined with reverse transcription. Additionally, transposable elements could, under certain conditions, give rise to spliceosomal introns (Purugganan and Wessler, 1992). Another mechanism is the tandem duplication of sequences containing an AGGT tetramer where de novo 5´ and 3´ splice sites are created with retention of the reading frame aft er intron removal (Rogers, 1989). isTh model predicts that introns produced by such duplication event will display sequence similarity at their boundaries. Th ere might be evidence for such proliferation at the intron-exon boundaries of the U12-type AT-AC class in the human genome: the two most common nucleotides at positions -2 and -1 of the 5´ exon are A and C, respectively, and the 4 most common nucleotides at positions +1, +2, +3 and +4 of the 3´ exon are A, T, A and T respectively (data obtained from Splicerack database: Sheth et al. (2006)). Th is raises the possibility that initial proliferation of these introns could be achieved by tandem replication of a sequence ACATATCCT. Th e weakness of this hypothesis is that it would require the fortuitous presence of a BPS within the duplicated sequence, which, in the case of the minor spliceosome, is a rather conserved sequence element. Massive intron losses, on the other hand, seem to be the dominant outcome for many lower eukaryotes. Here, mechanisms of intron loss include but are not limited to, reverse transcription of processed mRNA followed by gene conversion, and simple genomic (Palmer and Logsdon, 1991, Lynch and Richardson, 2002).

7 Review of the Literature 1.1.2.2 Roots of the U12-type Intron Ever since the discovery of the U12-type intron and its spliceosome (Jackson, 1991, Hall and Padgett, 1994, Tarn and Steitz, 1996b, Sharp and Burge, 1997), one of the most intriguing aspects was their origin. U12-type introns, albeit rare, can be found in diverse eukaryotic lineages and were, highly likely, already present in the last eukaryotic common ancestor (LECA) (Russell et al., 2006). One of the earliest hypotheses concerning their origin is the fi ssion-fusion model (Burge et al., 1998), where it was postulated that spliceosomal U12-type and U2-type introns were derived from a common ancestor. Speciation of two separate lineages, each with their own spliceosome and intron-type, occurred and was then perhaps followed by endosymbiosis, where their genetic material was fused. Over time, the vast majority of the acquired U12-type introns were then converted into U2-type introns because, due to the severe sequence constraints imposed on the U12-type splice site sequences, the reciprocal conversion is an unlikely event. Th is model is supported by the observation that there are more genes with multiple U12-type introns than expected by chance, and by the presence of paralogous genes with U12-type introns in non-homologous positions so that diff erent sets of U12-type introns were either lost or converted (Burge et al., 1998). However, the model has been criticized as to what extent an organism could survive and reproduce aft er undergoing such a radical fusion event (Lynch and Richardson, 2002). A study of the amino acid distribution at intron-containing sites that are subject to extreme evolutionary constraints revealed that primordial spliceosomal introns were probably of the U2-type (Basu et al., 2008b). Here, a scenario is suggested in which two separate group II intron invasions took place into the early eukaryotic ancestor, in which the U2-type introns would be the fi rst to populate the genome followed by a smaller-scale invasion of U12- introns. Whatever the scenario, it is evident that, since those early events, there has been massive loss of U12-type introns, at least in invertebrates (Lin et al., 2010), or even complete loss, as is the case in some yeasts and Caenorhabditis elegans (Burge et al., 1998, Bartschat and Samuelsson, 2010). Interestingly, U12-type introns are lost to a much greater degree than they are converted to U2-type introns (Lin et al., 2010). Why have they not yet been completely purged from our genomes? Th e answer might lie in the fact that U12-type intron loss and/or conversion can be a relatively slow process for certain organisms, and only obtainable by organisms where population size and generation time are high and short enough, respectively, so that genome streamlining can occur. On the other hand, both the conservation of U12-type introns in the gene encoding the sodium channel α-subunit between humans and jellyfi sh (somehow withstanding loss or conversion to U2-type) (Wu and Krainer, 1999, Patel and Steitz, 2003) and the fact that U12- type intron positions are more conserved than U2-type intron positions between Arabidopsis thaliana and human (Basu et al., 2008a), are remarkable observations and highly indicative of a functional role for the U12-type intron.

1.1.3 What good is an Intron? Seemingly, the presence of introns can be considered a burden for the cells. Indeed, for most genes, only a minority of transcribed nucleotides will eventually code for protein and, with introns ranging up to several hundreds of thousands of nucleotides long, the interruption of our genetic material seems wasteful and energy-consuming. For proponents of the introns early theory, introns have already served us well by allowing for a process called exon shuffl ing, in which exons from diff erent genes can combine and give birth to new genes. Whether they

8 Review of the Literature were crucial for the assembly of long modern genes or not, introns have become indispensable genetic elements ever since. Virtually every eukaryotic genome harbors spliceosomal introns and it is highly probable that, if introns suddenly somehow were to disappear, mass extinction at the eukaryotic domain would be the unfavorable outcome. In fact, RNA splicing is intimately linked with almost every known process in the mRNA maturation pathway (Maniatis and Reed, 2002). Splicing factors are connected with elongation factors to promote transcription activation (Fong and Zhou, 2001), and RNA splicing promotes effi cient mRNA export to the cytoplasm (Luo and Reed, 1999) as well as effi cient cleavage and polyadenylation of the transcript (Dye and Proudfoot, 1999, Vagner et al., 2000b). In addition, RNA splicing and associated components enable an extensive number of quality control mechanisms: the inhibition of premature cleavage of transcripts by the U1 snRNP (Kaida et al., 2010, Berg et al., 2012), ensuring nuclear retention of potentially toxic unspliced pre-mRNAs (Dreyfuss et al., 2002), and tagging transcripts that contain premature termination codons for destruction by the nonsense mediated decay (NMD) pathway (Lykke-Andersen et al., 2001). Introns also off er a practice ground for evolutionary experimentation: they provide space for the development of new promoter elements and they can harbor and develop elements that regulate gene expression (Chorev and Carmel, 2012). Furthermore, hundreds of intronic nested genes (i.e. ORFs located within an external host gene) can be found in the human genome (Yu et al., 2005), and introns are the source of many RNA genes such as microRNAs (miRNAs), small nucleolar RNAs (snoRNAs), piwi-interacting RNAs (piRNAs), and various long non- coding RNAs (lncRNAs) (Rearick et al., 2011). Th rough a process called back-splicing, in which a downstream 5´ss is joined with a 3´ss of an upstream exon, introns allow the formation of circular RNAs (circRNAs) (Nigro et al., 1991, Cocquerelle et al., 1992, Cocquerelle et al., 1993). Th e function of these circRNAs is still largely unknown but they have been shown to act as a “molecular sponge” to modulate miRNA activity (Hansen et al., 2013, Memczak et al., 2013), and are hypothesized to bind and sequester RNA-binding proteins or ribonucleoprotein complexes amongst several other suggested functions (reviewed in Hentze and Preiss (2013)). Most notably, introns allow for the process of alternative splicing in which novel transcripts are produced from a single gene through incorporation of alternative exons or parts of the exons, and hence increase proteome diversity (Nilsen and Graveley, 2010). Th ey can regulate alternative splicing, not just by their mere existence, but due to the presence of regulatory elements that either promote (intronic splicing silencer) or prevent (intronic splicing enhancer) splice site activation. In summary, despite being ultimately derived from selfi sh elements and perhaps at some point challenging our earliest ancestors to an enormous evolutionary task: introns allow for an intricate regulation (the what, when and where) of the genes that harbor them and, as such, provide a driving force for the complexity of biological life.

1.2 RNA Splicing and the Spliceosome Introns are removed and exons are rejoined through the cooperative action of proteins and RNAs in what probably is the largest molecular machine in the cell (Nilsen, 2003, Valadkhan and Jaladat, 2010): the spliceosome. It faces the enormous task of ensuring the effi ciency and accuracy of the splicing reaction. Furthermore, fl exibility of splice site choice enables the process of alternative splicing but appropriate regulation in a time- and space-dependent manner must be achieved. Here, we will investigate how the spliceosome tackles these challenges, and the

9 Review of the Literature composition and characteristics of its splicing reaction, requiring the coordinated input of as many as 150 proteins (Valadkhan and Jaladat, 2010), will be summarized.

1.2.1 Spliceosome ComposiƟ on Th e spliceosome is a massive ribonucleoprotein, consisting of a core of fi ve small nuclear ribonucleic proteins (snRNPs), of which the associated snRNAs are termed U1, U2, U4, U5 and U6 snRNA, respectively. With the exception of the U6 snRNA, all snRNAs harbor binding sites for Sm proteins that form a seven-member ring-like core structure around the RNA and these proteins are crucial for the correct processing and nuclear localization of the snRNA (Kambach et al., 1999). Th e U6 snRNA instead, is bound by a set of seven proteins homologous to the Sm proteins, the like Sm proteins (Lsm). Th ese form a heptameric Lsm2-8 ring with a similar role as the Sm proteins (Spiller et al., 2007). Apart from this set of shared proteins, each of the snRNPs also has a specifi c set of protein factors (Will et al., 1993). Important components of the U1 snRNP include the SR-like protein U1-70K (see 1.2.2), and the U1-A and U1-C proteins, which stabilize the interaction of the U1 snRNA with the 5´ ss (Will and Lührmann, 2011). Th e core of the U2 snRNP consists of the stably associated U2A´ and U2B´´ polypeptides, as well as the heteromeric protein complexes SF3a and SF3b (Will et al., 2002). Th e U4 snRNP and U6 snRNP are linked to each other through protein components and extensive base pairing interactions to form the U4/U6 di-snRNP (Nottrott et al., 2002). Th e U5 snRNP contains a distinct set of proteins, most notably the multifunctional Prp8 protein, and associates through protein factors with U4/U6 to form the U4/U6.U5 tri-snRNP, comprising more than 30 proteins (Nguyen et al., 2015). Apart from these snRNP associated proteins, there are a large number of protein factors that associate with the spliceosome from one stage of splicing to the next (Behzadnia et al., 2007, Fabrizio et al., 2009, Will and Lührmann, 2011). Proteins that assist in splice site recognition, such as SF1 (splicing factor 1) and U2AF (U2 auxiliary factor), which recognize the BPS, PPT and 3´ ss, form an integral part of the spliceosome. A large set of helicases carries out the extensive structural and compositional remodeling of the snRNPs that are required during stage transitions (Cordin et al., 2012). Several cyclophilins, a subfamily of peptidyl-prolyl cis-trans isomerases, facilitate conformational changes within the spliceosome (Th apar, 2015). Kinases can modulate protein-protein interactions within the spliceosome (Misteli, 1999). Furthermore, various splicing activators and repressors (such as SR proteins and hnRNPs, see 1.2.2.2) that play regulatory roles in splicing can associate with the spliceosome in a tissue-, time- or intron- dependent manner (Will and Lührmann, 2011). Finally, some of the spliceosome-associated proteins are crucial for coupling with the machineries of other pre-mRNA processes, such as transcription and polyadenylation (reviewed in Maniatis and Reed (2002)). In the minor spliceosome, the U1, U2, U4 and U6 snRNAs are replaced by the U11, U12, U4atac and U6atac snRNAs, respectively, whereas the U5 snRNA is a shared component of both spliceosomes (Montzka and Steitz, 1988, Tarn and Steitz, 1996a, Tarn and Steitz, 1996b, Incorvaia and Padgett, 1998, Patel and Steitz, 2003). Similarly to major-type snRNAs, Sm proteins form a ring around the minor-type snRNAs, with the exception of U6atac which instead binds to Lsm proteins, similarly to U6 snRNA (Tarn and Steitz, 1996a). Th e sequences of the U11 and U12 snRNAs are not homologous to those of U1 and U2, however, their secondary structures show a great deal of similarity (Montzka and Steitz, 1988, Patel and Steitz, 2003). Apart from structural homology, they also show functional homology: U11 recognizes the U12-type 5´ ss and U12

10 Review of the Literature recognizes the U12-type BPS (Hall and Padgett, 1996, Kolossova and Padgett, 1997). Unlike their major-type counterparts, U11, which also exists as a mono-particle, and U12 combine to a preformed U11/U12 di-snRNP (Wassarman and Steitz, 1992). U4atac and U6atac are ca. 40% homologous to their major-type cousins (Tarn and Steitz, 1996a), and they are known to interact with one another through base pairing. Th ey associate with U5 to form the U4atac/U6atac.U5 tri-snRNP (Tarn and Steitz, 1996a), of which the protein composition does not diff er from the major-type tri-snRNP (Schneider et al., 2002). Overall, the minor spliceosome shares many of the protein factors found in the major spliceosome but there are some crucial diff erences (Table 1 and Will et al. (2004)). U11 snRNP lacks the U1 snRNP-specifi c proteins U1-70K, U1-A and U1-C, whereas U12 snRNP (within the U11/U12 di-snRNP) lacks the SF3a complex. Instead, a unique set of proteins specifi c to the minor spliceosome are found from the U11/12 di-snRNP, named 20K, 25K, 31K, 35K, 48K, 59K and 65K. Th e 25K, 35K, 48K and 59K proteins are also found in U11 mono-snRNPs (Will et al., 2004). Of these proteins, few have been adequately characterized. Th e U11-48K protein has been shown to participate in the recognition of the U12-type 5´ ss (Turunen et al., 2008, Tidow et al., 2009) but it is also necessary for the stability of the U11/U12 di-snRNP (Turunen et al., 2008). Based on sequence homology, the U11-35K is thought to be a functional homologue of U1-70K (see 1.2.2.2) (Will et al., 1999, Turunen et al., 2008). Th e U11/U12-65K protein binds to the U11-59K protein and the U12 snRNA, and in this way, bridges the U11 and U12 snRNPs (Benecke et al., 2005). Th e U11-59K protein has been characterized as a DNA-binding protein induced during apoptosis (Park et al., 1999). Finally, the U11/U12 di-snRNP appears to contain a protein involved in 3´ ss recognition, ZRSR2 (also known Urp). Th is protein was originally described to function in 3´ ss recognition for both spliceosomes (Shen et al., 2010). However, a recent analysis of the transcriptome derived from cells of patients suff ering from myelodys- plastic syndrome (MDS) with defects in the ZRSR2 gene suggests that this protein is necessary for the 3´ ss recognition of U12-type introns only (Madan et al., 2015).

Table 1. Composition of human spliceosomal snRNPs. Aft er Will et al. (2004) and Will and Lührmann (2011).

snRNP Shared Proteins Spliceosome-specifi c Proteins U1 Sm proteins U1-70K, U1-A, U1-C

U2 Sm proteins, SF3b (7 subunits) SF3a (3 subunits), U2-A´, U2-B´´

U5 Sm proteins, hPrp8, hBrr2, Snu114, - hPrp6, hPrp6, hPrp28, 52K, 40K, hDib1

U4/U6 and U4atac/ Sm proteins, Lsm 2-8, hPrp3, hPrp31, - U6atac hPrp4, CypH, 15.5K

U4/U6.U5 and U4atac/ Sm proteins, Lsm 2-8, U5- and U4/U6- - U6atac.U5 specifi c proteins, hSnu66, hSad1, 27K

U11/U12 Sm proteins, SF3b (7 subunits) 20K, 25K, 31K, 35K, 48K, 59K, 65K

11 Review of the Literature 1.2.2 Exon and Intron Defi niƟ on In higher eukaryotes, the average exon size is small and introns can be tens of thousands of nucleotides long (Sakharkar et al., 2005) and harbor many cryptic splice sites. It is therefore challenging to understand how accurate recognition of 5´ and 3´ splice sites across introns can take place. A resolution to this is provided by the exon defi nition model that postulates that exons, rather than introns are the basic unit of recognition (Fig.3a, and Robberson et al. (1990)). Here, individual splice sites are not independently recognized sequences. Instead, spliceosomal components that bind the intronic 3´ and 5´ ends interact across the exon to promote spliceosome assembly and catalysis (Berget, 1995). Pairing of splice sites is enabled through the binding of the small U2AF35 subunit of the U2AF35/65 auxiliary splicing factor at the upstream 3 ´ ss, and the U1-70K protein of the U1 snRNP binding at the downstream 5´ ss (De Conti et al., 2013). Th ese so-called exon defi nition interactions are oft en facilitated by members of the serine-arginine (SR) protein family and mediated through arginine-serine-rich domains (see 1.2.2.2). Th e size of the exon can have a detrimental eff ect on the effi ciency of pairing for exon defi nition: in experiments were the exon size was expanded to more than 300 nt, spliceosome formation was inhibited. On the other hand, steric hindrance imposes a minimal distance (ca. 50 nt) for exon defi nition to occur (Dominski and Kole, 1991, Sterner et al., 1996). For short introns, and thus for many introns in lower eukaryotes such as yeast, splice site pairing takes place across introns (Talerico and Berget, 1994). As such, during intron defi nition, the recognition module operates through pairing adjacent 5´ and 3´ splice sites within one and the same intron (Fig. 3b). Experiments in which Drosophila and yeast introns were expanded lead to splicing defects such as intron retention (Talerico and Berget, 1994), indicating a preference for splice site recognition through intron defi nition in these organisms.

A SR U1 snRNP U1- 70K U1 U2AF65 35

Py-tract AG Exon

B SR U1- 70K U1 U1 snRNP U2AF65 35 Exon Py-tract AG Exon

Figure 3. SR protein-mediated exon (A) and intron (B) defi nition models for splicing.

12 Review of the Literature

A special recognition mechanism is required for the defi nition of the fi rst and the last exon of a transcript. Th e fi rst exon is defi ned through a mechanism that requires 5´ capping: cell extracts where the cap binding protein complex has been depleted show ineffi cient splicing (Izaurralde et al., 1994). For terminal exon defi nition to occur, spliceosomal components must team up with cleavage/polyadenylation factors (see 1.3.4, and Berget (1995)). Whatever the model of defi nition, there is no diff erence in subsequent spliceosome assembly and for diff erent introns on the same pre-mRNA both exon defi nition and intron defi nition can occur. Finally, it is important to realize that splice site decision is not solely dependent on the strength of the splice site sequences and their respective positioning: there exists a plethora of cis-acting regulatory elements that can promote or inhibit spliceosome assembly (Zhang et al., 2008a, De Conti et al., 2013).

1.2.2.1 Splicing Enhancers and Splicing Silencers Especially in the case of U2-type introns from higher eukaryotes, 5´ and 3´ ss sequences can be highly degenerate and, as a consequence, splice site-like sequences are fairly common motifs in the genome (Sheth et al., 2006). Many of these, however, are never activated for splicing. In fact, the mere presence of the splice sites is oft en insuffi cient to initiate splicing. Information over the suitability of splice sites is oft en conveyed by auxiliary cis-acting splicing regulatory elements (Zhang et al., 2008a). In the case of splicing enhancers, they promote splicing and they can be located both in exons and introns (ESEs and ISEs, respectively). Proteins called splicing activators bind these enhancers, and recruit and stabilize components of the spliceosome machinery to assist in exon-intron defi nition. On the other hand, splicing silencers, present both in exons and introns (ESSs and ISSs, respectively), can attract splicing repressor proteins. In this way, activation of cryptic splice sites can be avoided or splicing enhancement is counteracted. Taken together, in the presence of both splicing enhancers and silencers, it is oft en the relative contributions of splicing activators and repressors that determine whether a given splice site is activated (Wang and Burge, 2008). Additional factors however, such as RNA secondary structure, have the ability to modulate splice site choice by suppression of pseudo-splice sites, for instance, or through stabilization of ESE sequences (Buratti et al., 2004b, Buratti et al., 2007).

1.2.2.2 Splicing AcƟ vators and Repressors Splicing enhancers recruit trans-acting splicing activators of which the most notable are those of the SR protein family (reviewed in Long and Caceres (2009)). SR proteins are typically characterized by the presence of 2 motifs: one or two copies of an N-terminal RNA-recognition motif (RRM) (Dreyfuss et al., 1988) and a signature C-terminal arginine and serinedipeptide- rich domain (RS domain), which enables protein-protein interactions and from which they derive their name. SR proteins are recruited to nascent sites of RNA polymerase II (RNAP II) transcription and interactions with the C-terminal domain of the largest subunit of RNAP II have been documented (Yuryev et al., 1996, Misteli et al., 1997). Th ey typically recognize exonic splicing enhancer sequences, through their RRM, and assist, via their RS domain, in exon defi nition interactions through recruitment and stabilization of spliceosomal factors (Long and Caceres, 2009). More specifi cally, SR proteins have been shown to interact with the SR-like U2AF35 protein that binds the 3´ ss, through their respective RS domains (Wu and Maniatis, 1993). In addition, they can interact with the SR-like U1-70K protein which also contains an RS domain (Wu and Maniatis, 1993). Th e U1-70K protein enhances U1 snRNP binding to the 5´

13 Review of the Literature ss, and its RS domain enables a network of cross-exonic protein-protein interactions connecting to the upstream 3´ ss (Fig. 3a). Th is method of assisting in exon defi nition interactions is not limited to exon bridging in the context of two U2-type introns. Most U12-type introns are surrounded by neighboring U2-type introns. Here, SR proteins have also been shown to promote binding of the U11 snRNP and U12 snRNP to respectively the 5´ ss and BPS of U12-type introns (Hastings and Krainer, 2001). Th e minor spliceosome specifi c U11-35K protein is a likely candidate to participate in exon defi nition interactions as its domain structure is similar to that of the U1-70K (N-terminal RRM and C-terminal RS domain) with its RS domain thought to stimulate protein-protein interactions across exons (Will et al., 1999). Additionally, SR proteins can assist in spliceosome recruitment and stabilization, not through promotion of exon defi nition interactions, but by forming a network across the intron to bridge U1-70K and U2AF35 via their RS domains (Fig. 3b, and Long and Caceres (2009)). Generally, SR proteins are known to antagonize the negative eff ects of heterogeneous nuclear RNPs (hnRNPs: see hereinaft er) on splicing (Long and Caceres, 2009). It is too simplistic to designate SR proteins as splicing activators per se. Th eir function oft en depends on the context of the pre-mRNA sequence it binds. For instance, the prototypical SR protein SRSF1 (SF2/ASF) can bind an intronic splicing silencer to inhibit adenovirus IIIa pre- mRNA splicing (Kanopka et al., 1996). In addition, the phosphorylation state of serine residues within the RS domain can aff ect the splicing outcome: dephosphorylated SRSF10 has been shown to suppress splicing during heat-shock (Shin et al., 2004). Th e phosphorylation state of SR proteins has been linked with another important function of SR proteins: the ability to attract export factors. For this, certain SR proteins become hypophosporylated upon their contribution to splicing which, in turn, increases their affi nity for the general export receptor NFX1/TAP and ultimately leads to the nuclear export of the associated transcripts (Huang et al., 2004). In this way, the SR protein phosphorylation state helps the nuclear export machinery to distinguish between spliced and unspliced transcripts (Huang and Steitz, 2005). Th e role of SR proteins does not stop at the level of splicing activation/repression and mRNA export: the well-studied SRSF2 protein (SC35), for example, promotes RNAP II elongation in a subset of genes (Lin et al., 2008), and some SR proteins have been shown to facilitate recruitment of the U4/U6.U5 tri-snRNP (Roscigno and Garcia-Blanco, 1995). Furthermore, the RS domain in SR proteins can act as a chaperone of RNA-RNA interactions to enable transitions in the spliceosome (Shen and Green, 2007). Finally, SR proteins have also been shown to mediate mRNA stability and to regulate mRNA translation (reviewed in Huang and Steitz (2005)). Whereas SR proteins are generally considered to be activators of splicing, members of the hnRNP family are generally thought to act as repressors with a preference for intronic sequences (reviewed in Martinez-Contreras et al. (2007)). In this context, they oft en promote exon skipping by occluding the binding of spliceosomal components to an overlapping or adjacent site. For instance, binding of hnRNP H near a 3´ ss has been shown to inhibit recognition by U2AF35 (Jacquenet et al., 2001). When hnRNP H binding sites overlap or are near the 5´ ss, U1 snRNP binding can be inhibited and exon skipping promoted (Buratti et al., 2004a). HnRNP A1, a known antagonist of SR proteins, can prevent binding of SRSF2 at an ESE, through its competitive binding at an overlapping ESS (Zahler et al., 2004). Another mode of splicing inhibition is through a “looping-out” mechanism in which homo-dimers are formed that bind at opposite sides of an exon to promote skipping. Here, the splice sites of two more distal pairs of exons are juxtaposed and their splicing stimulated (Martinez-Contreras et al., 2007). However, like SR proteins, their function in splicing is not one-dimensional. Indeed, both hnRNP A1 and

14 Review of the Literature hnRNP H have been shown to contribute positively to splicing: when there are no exons located in between hnRNP binding sites, interactions between bound hnRNPs can loop out intronic sequences to assist in intron defi nition (Chabot et al., 2003). Furthermore, binding of hnRNP H at a G-tract (sequence with at least 3 consecutive G residues) near a U12-type 5 ´ss has been suggested to be required for U11 snRNP binding (McNally et al., 2006). Interestingly, 17 % of U12-type introns harbor at least two G-tracts within the fi rst 50 nt following their 5´ ss (McNally et al., 2006). Similarly to SR proteins, hnRNPs are involved not only in splicing. Th ey are oft en multitaskers participating in processes as various as mRNP export, DNA repair and chromatin remodeling (reviewed in Han et al. (2010)). Finally, it is noteworthy that splicing activators and repressors do not need to be limited to protein components. A snoRNA regulates alternative splicing of serotonin receptor 2C by binding to an ESS (Kishore and Stamm, 2006). In addition, there are examples in which metabolites directly interact with dynamic RNA structures to aff ect splicing. In an intron in the NMT1 gene of the eukaryote Neurospora crassa, binding of thiamine pyrophosphate to a riboswitch alters the RNA structure and thereby prevents usage of a splice site by the spliceosomal machinery (Cheah et al., 2007).

1.2.3 Spliceosome Assembly and Catalysis Interactions of the snRNPs and other spliceosome components with the pre-mRNA are established in a step-wise manner. Th e spliceosome is both highly dynamic and fl exible: not only must it tediously assemble and disassemble for each splicing event (however, see Nilsen (2002)), it undergoes both radical structural and compositional changes at every step of its assembly (Will and Lührmann, 2011) and on the basis of biochemical methods, six diff erent complexes can be distinguished: the E, A, B, Bact, B*, and C complex (Fig. 4, and Will and Lührmann (2011)). Th e catalytic steps of both the U2- and the U12-splicing reactions are identical to those of group II introns (Figures 2c and 2d: Cech (1990)). Two transesterifi cation reactions take place: in the fi rst reaction, a nucleophilic attack of the 2´ hydroxyl of the bulged branch point adenosine to the 5´ ss forms an intron lariat intermediate. Th is is followed by another nucleophilic attack. Th is time, the 3´ hydroxyl of the 5´ exon attacks the 3´ ss resulting in the release of the lariat intron and the joining of the exons. Unlike group II introns, which can function as stand-alone ribozymes, a great number of protein factors are involved in intron recognition and removal. At each step of the splicing reaction, proofreading takes place in which the same reactive sites in the pre-mRNA are recognized both by protein and RNA factors in order to establish accuracy and specifi city. Proteins also carry out important functions during spliceosome assembly and structural rearrangement steps, and they are crucial for recycling of the snRNPs. Furthermore, proteins improve the speed of splicing and guarantee unidirectionality of the splicing reaction. Finally, they provide fl exibility of splice site choice and are important mediators of alternative splicing.

1.2.3.1 Major Spliceosome Assembly Th e assembly of the major spliceosome starts with the formation of the commitment complex (E complex): the U1 snRNA base pairs to the 5´ ss (Mount et al., 1983, Zhuang and Weiner, 1986), and this interaction is stabilized by the U1-C protein (Heinrichs et al., 1990). In this complex, SF1 binds to the BPS, where it defi nes the branch site A (Liu et al., 2001), and the U2AF subunits 65 and 35 bind to the PPT and 3´ ss, respectively (Zamore and Green, 1989, Berglund et al., 1997,

15 Review of the Literature

Zorio and Blumenthal, 1999). Formation of the pre-spliceosome or the so-called A complex then follows: in a step that requires ATP hydrolysis, U2 snRNP is recruited by U2AF and it displaces SF1 at the BPS (Ruskin et al., 1988, Valcarcel et al., 1996). Base pairing interactions between the U2 snRNA and BPS cause the branch site adenosine to bulge out (Wu and Manley, 1989, Query et al., 1994).

Figure 4. Spliceosome assembly pathways of the major (left ) and the minor (right) spliceosome. Adapted from Turunen et al. (2013a). 16 Review of the Literature

During B complex formation, conformational and compositional changes in the spliceosome lead to the introduction of the U4/U6.U5 tri-snRNP (Konarska and Sharp, 1987). Th e pre- catalytic Bact complex is then formed: extensive remodeling of protein and RNA-RNA interactions takes place and the U4 snRNP is expelled, and U1 is replaced by U6, which base-pairs at the 5´ ss (Konarska and Sharp, 1987, Kandels-Lewis and Seraphin, 1993). Next, in preparation of the fi rst catalytic step of splicing, the 5´ ss and the branch point are brought into close proximity by interactions between U2 and U6, forming the catalytic core (Wu and Manley, 1991, Madhani and Guthrie, 1992). Th e U5 snRNP further assists in exon alignment through contacts at the exonic sides of the 5´ ss and the 3´ ss (Newman and Norman, 1992, Cortes et al., 1993, Sontheimer and Steitz, 1993). Subsequent catalytic activation by the DEAH-box RNA helicase Prp2 generates the activated complex B* and the fi rst step of splicing is catalyzed (Gencheva et al., 2010, Will and Lührmann, 2011). Aft er the fi rst step of splicing, SF3a and SF3b dissociate and other factors enter the spliceosome and facilitate the conformational changes required for complex C formation (Bessonov et al., 2008). Th e second catalytic step takes place leading to exon-exon ligation and release of the lariat. Finally, the spliceosome disassembles and the released snRNPs can take part in additional rounds of splicing (Will and Lührmann, 2011).

1.2.3.2 Minor Spliceosome Assembly Overall, the assembly of the minor spliceosome resembles that of the major spliceosome (Patel and Steitz, 2003). However, due to the nature of the U11/U12 di-snRNP and the distinct protein repertoire of the minor spliceosome, initial recognition diff ers between the two systems. For the minor spliceosome, the fi rst stage of assembly, the formation of the A complex, is characterized by the cooperative and simultaneous binding of the U12-type 5´ ss by the U11 snRNA, and the BPS by the U12 snRNA (Hall and Padgett, 1996, Kolossova and Padgett, 1997, Frilander and Steitz, 1999). Here, base-pairing of U11 with the U12-type 5´ ss is limited to 6 nucleotides (positions +4 to +9) but the U11-48K assists through its recognition of the fi rst three nucleotides of the U12- type intron (Turunen et al., 2008). Th e BPS of U12-type introns is very constrained and the 3´ end of the intron lacks a clear PPT, suggesting that recognition of the BPS by the U12 snRNP is more reliant on RNA-RNA interactions (Brock et al., 2008). Upon base pairing of U12 snRNA with the BPS, bulging of the branch point adenosine is achieved (Tarn and Steitz, 1996b). In addition, formation of the A complex requires the binding of Urp/ZRSR2, a U2AF35-like protein factor that recognizes the 3´ ss (Shen et al., 2010). Th e B complex of the minor spliceosome is characterized by the entry of the U4atac/U6atac.U5 tri-snRNP: U11 and U4atac dissociate, followed by base pairing of U6atac with U12 forming the catalytic core of the minor spliceosome (Tarn and Steitz, 1996a, Yu and Steitz, 1997, Incorvaia and Padgett, 1998, Frilander and Steitz, 2001). Similarly as in the minor spliceosome, this interaction, through additional base pairing of U6atac with the 5´ ss, brings the 5´ ss and BPS in close proximity and U5 snRNP aligns the exons in a similar way as during major spliceosome assembly. Th e two transesterifi cation reactions then take place and ultimately result in exon-exon ligation and lariat intron release. Disassembly and recycling of the snRNPs is thought to be similar to that of the major spliceosome (Damianov et al., 2004).

17 Review of the Literature 1.2.4 AlternaƟ ve Splicing During constitutive splicing, splicing events that take place in the majority of all cell types during various developmental stages generate the from a given gene. However, for almost all genes in higher eukaryotes (at least 95 %: Pan et al. (2008)), there is a fl exibility of splice site choice, and alternative splicing can generate multiple transcripts from one and the same gene. In this way, the number of diff erent genomic transcripts and the protein repertoire of the cell are greatly expanded (Nilsen and Graveley, 2010). In unicellular cells, however, alternative splicing is absent or very rare, and here, one gene provides one protein product (Ast, 2004). A number of diff erent splicing mechanisms can be employed by alternative splicing (Fig. 5). Th ese include, but are not limited to: exon skipping or inclusion, alternative 5´ ss activation with preservation of the original 3´ ss, alternative 3´ ss activation, intron retention where splicing has not taken place at all, and mutual exon exclusion where either one of two exons is included (Fig. 5, and Nilsen and Graveley (2010)). Which splicing event takes place is oft en dictated by the relative contributions and activity of the diff erent splicing activators and repressors in a given tissue or during a given developmental stage (see 1.2.2.2).

Exon skipping

Alternative 5ʹ ss

Alternative 3ʹ ss

Intron retention

Mutually exclusive exons

Figure 5. Mechanisms of alternative splicing. Adapted from (Ast, 2004).

Apart from the activity and concentration of splicing activators and repressors that bind enhancers or silencers, the elongation rate of the RNAP II can also have a profound eff ect on alternative splicing (Kornblihtt et al., 2004). A kinetic coupling model has been proposed in

18 Review of the Literature which transcriptional elongation can aff ect the timing at which splice sites are available to the spliceosome. Here, kinetic competition can have a signifi cant impact on alternative splicing decisions and slow elongation can favor the activation of an intrinsically weaker 3´ ss competing with a stronger but more downstream located 3´ ss (see 1.3.1, and Kornblihtt et al. (2004), Bentley (2014)). Finally, care must be taken that alternative splicing is well regulated: many genetic disorders result from abnormal splicing variants (Matlin et al., 2005, Tazi et al., 2009), and miss-regulated alternative splicing is also thought to contribute to the development of cancer (Skotheim and Nees, 2007, Fackenthal and Godley, 2008).

1.2.4.1 Regulatory Role of AlternaƟ ve Splicing Th e functional consequences of alternative splicing can be quite diverse. On one hand, it increases the proteome diversity and has the ability to change enzymatic properties, ligand specifi city or localization of the protein product (Kelemen et al., 2013). Alternative splicing can also have a profound eff ect on the localization, the stability and the abundance of the mRNA itself (reviewed in Kelemen et al. (2013)). For example, regulated unproductive splicing and translation (RUST) is a mechanism in which binding of cis-elements located on the mRNA dictate an alternative splicing event, so that the coding frame is disrupted and a premature termination codon (PTC) is introduced (Lewis et al., 2003). Th is will lead to degradation of the message by virtue of the NMD pathway (see 1.3.3). RUST is a regulated mechanism: it is triggered in certain cell types during specifi c conditions, and the cis-elements are oft en highly conserved revealing functional importance. Indeed, it has been shown that many splicing factors employ RUST to auto-regulate expression of their own gene or cross-regulate expression of other splicing factors (Lareau and Brenner, 2015). Diff erent regions in the 3´ UTR of the SRSF1 gene are responsible for its auto- regulation, which involves multiple layers of post-transcriptional and translational control (Sun et al., 2010). Increased levels of SRSF2 (SC35) promote alternative splicing in the 3´ UTR of its own gene, leading to transcripts that are degraded by NMD (Sureau et al., 2001). SRSF3 has been shown to be a master regulator of the SR protein family by auto-regulating its own gene, and through a cross-regulatory mechanism in which it directs alternative splicing of SRSF2, SRSF3, SRSF5 and SRSF7 to include PTC-containing exons (Änkö et al., 2012). A combination of auto- and cross-regulation also occurs for the splicing repressor PTB and its neuronal expressed paralogue nPTB (also known as PTBP1 and PTBP2, respectively). Here, PTB auto-regulates expression of its own gene and cross-regulates nPTB expression, both via non-productive alternative splicing (Spellman et al., 2007).

1.2.4.2 AlternaƟ ve Splicing of U12-type Introns Due to the constrained splice site sequences and the relative scarcity of these sequences in the genome, alternative splicing is rare for U12-type introns (Levine and Durbin, 2001, Chang et al., 2007). Th ere is evidence that minor splicing is responsive to exonic purine rich splicing enhancers and that exon skipping or inclusion, and alternative 3´ ss usage is possible in vivo for neighboring U12-type introns (Dietrich et al., 2001). For the human JNK2 gene, regulated alternative splicing exists in which mutually exclusive exon selection is driven by the activation of either one of two U12-type 5´ splice sites and a downstream U12-type 3´ ss (Chang et al., 2007). Such conformations are rare: U12-type introns are in reality exclusively surrounded by their major-type counterparts (with the exception of the AOX1 and XDH genes: Lin et al. (2010)), and

19 Review of the Literature adjacent alternative U12-type 5´ or 3´ splice sites are oft en absent. Other studied U12-alternative splicing events constitute a competition where either U2- type or U12-type splicing takes place. In the Prospero gene of Drosophila melanogaster, a U2-type intron is embedded within a U12- type intron, and splicing through either the minor or the major spliceosome is regulated and changes the homeo-domain of the protein (Borah et al., 2009). A reverse case can be found in the Drosophila Urp gene, where a U12-type intron is embedded within a U2-type intron and, interestingly, processing of the U12-type intron is predicted to lead to degradation by NMD (Lin et al., 2010). Notwithstanding the rarity of U12-type alternative splicing, the development of large-scale sequencing methods is expected to produce further examples and to enhance our understanding on the regulation of such splicing.

1.3 Splicing and other pre-mRNA Processes Virtually all machineries that carry out any of the steps in pre-mRNA processing are closely interconnected and oft en share components. Th is is no diff erent for the spliceosome and its components. Splicing factors couple extensively with many other pre-mRNA processing factors and here, the interdependence between splicing and transcription, as well as other processes will be explored.

1.3.1 The Timing of Splicing It is well established that most spliceosomes assemble and catalyze intron removal while the RNAP II still is transcribing the pre-mRNA in the nucleus (Brugiolo et al., 2013). Using electron micrographs, co-transcriptional splicing was visualized for transcription units of Drosophila melanogaster where transcripts were observed which displayed loop formation and removal before termination of transcription (Beyer and Osheim, 1988). Total RNA-seq data show underrepresentation of intronic counts near the 3´ ends of the introns, indicating that introns are removed and degraded soon aft er 3´ ss transcription (Ameur et al., 2011). Furthermore, many spliceosomal factors, including SR proteins and U2AF65, are known to associate with RNAP II through the carboxy-terminal domain (CTD) of its largest subunit (Misteli and Spector, 1999, David et al., 2011), a landing pad for many processing factors. In addition, U1 snRNP has been shown to be recruited to splicing competent as well as splicing defi cient transcription units (Spiluttini et al., 2010). Th e rate of RNAP II transcription elongation can aff ect alternative splicing in the so-called kinetic coupling model (Kornblihtt, 2005). Th e average RNAP II transcription elongation rate has been estimated to range from 1.9 kb min-1 to 4.3 kb min-1 (Boireau et al., 2007, Darzacq et al., 2007). Co-transcriptional splicing of U2-type introns has been measured to require 5-10 minutes but minor splicing is somewhat slower (ca. 10 minutes: Singh and Padgett (2009)). In another study, MS2 labeling of introns combined with live-cell confocal microscopy, allowing high- resolution detection of individual pre-mRNAs, revealed a much faster time for splicing. Here, the mean intron lifetime was measured to be 30-40 seconds (Martin et al., 2013). Kinetic coupling allows coordination of transcription and splicing, and thus, the transcription elongation rate can determine a window of opportunity for alternative processing. For example, fast elongation rates can promote exon skipping, whereas slow elongation rates oft en promote inclusion of exons with suboptimal splice sites (Kornblihtt et al., 2004, Bentley, 2014). Th is coordination can be regulated and mechanisms exist to alter RNAP II elongation rate: in yeast, the polymerase pauses

20 Review of the Literature downstream at introns to accommodate co-transcriptional splicing, and pausing occurs at the beginning of mammalian exons perhaps because of high nucleosome density (Schwartz et al., 2009, Alexander et al., 2010, Kwak et al., 2013). In some cases however, splicing and transcription are uncoupled. Using single molecular fl uorescence in situ hybridization (FISH), post-transcriptional splicing has been observed with introns associated with skipped exons in the SXl and PTB2 genes (Vargas et al., 2011). In addition, analysis of intron presence in nascent versus released Drosophila BR1 pre-mRNA demonstrated that, while most introns are removed in a co-transcriptional manner, introns close to the polyadenylation signal can be spliced post-transcriptionally (Bauren and Wieslander, 1994). Indeed, the degree of co-transcriptional splicing seems to be dependent on intron positioning: upstream introns are spliced more co-transcriptionally than downstream introns (Khodor et al., 2012). Furthermore, the rate at which introns are removed can have an eff ect. Th e slower the intron removal, the more likely splicing will occur post-transcriptionally (Bentley, 2014). Even though there are some reports that suggest that post-transcriptional splicing serves to quickly release transcripts from the transcription site to prevent binding of constitutive splicing factors (Vargas et al., 2011), the functional consequences as to whether an intron is co-transcriptionally or post-transcriptionally spliced are currently unclear (Bentley, 2014). Overall, the co-transcriptional nature of splicing has many benefi cial eff ects: co-transcriptional splicing is a fundamental requirement for many alternative splicing events (see above, and Schor et al. (2013)). Furthermore, it enhances the effi ciency and the accuracy of pre-mRNA maturation, allows the communication of splicing factors with chromatin and the interaction of the U1 snRNP with 3´ end processing factors (Bentley, 2014).

1.3.2 Splicing and Nuclear Export Correct processing of an mRNA is a requirement for nuclear export. Failure to do so will lead to retention and subsequent degradation of the mRNA by the nuclear surveillance machinery (Schmid and Jensen, 2010). Splicing promotes nuclear export by depositing protein factors at the site of exon fusion, the so-called exon-junction complex (EJC). Th e hetero-tetramer core of the EJC consists of eIF4AIII, Y14, Magoh, and MLN51 (Tange et al., 2005) which are deposited ca. 20–24 nucleotides upstream of exon–exon junctions, regardless whether U2- or U12-type splicing took place (Le Hir et al., 2000, Le Hir et al., 2001, Hirose et al., 2004). Formation of the EJC is suggested to be initiated through the EJC core protein Y14, which associates both with intron pre-mRNA and U snRNPs with the highest affi nity for the U2 snRNP (Shiimori et al., 2013). Th e EJC core proteins serve as a binding platform allowing the transient association of other EJC factors, such as the UAP56 and Aly proteins (Cullen, 2003, Tange et al., 2005). Th ese proteins are also part of the transcription-export (TREX) complex which is recruited during splicing (Strasser et al., 2002, Masuda et al., 2005). Both UAP56 and Aly are critical components for the binding of the NXF1/TAP heterodimer, a known export receptor (Reed and Hurt, 2002). Hypo-phosphorylated SR proteins are an additional mark for nuclear export. Th ey serve as a sign that splicing has taken place, and are known to interact with the NXF1/TAP exporter (Huang et al., 2004). More recently, an additional mechanism by which splicing aff ects mRNA export has been described for the human β-globin mRNA. Here, the splicing machinery, rather than recruiting export factors, operates through the removal of nuclear retention signals from the intron (Akef et al., 2015).

21 Review of the Literature 1.3.3 Splicing and Quality Control: Nonsense Mediated Decay Prevention of nuclear export in the case of failure to splice or incorrect splicing is an important quality control system because premature export can give rise to potentially toxic protein products. Th e EJC off ers an additional layer of quality control by degrading potentially lethal mRNAs in the cytoplasm through the NMD pathway (Lykke-Andersen et al., 2001). Typically, transcripts that contain a premature termination codon (PTC) are targeted by a translation- dependent mechanism. Such a PTC can be introduced through mutation, intron retention, or splicing errors. Following nuclear export, an initial round of translation takes place, and the ribosome dislodges EJCs from the mRNA. Recognition of the stop codon leads, through the action of eRF1 and eRF2, to the recruitment of Upf1 (Kashima et al., 2006). Upf1 can then sense downstream splicing events through its interaction with two components that associate with the EJC: Upf2 and Upf3 but only if the EJC is located suffi ciently far downstream (ca. 50-55 nt) (Nagy and Maquat, 1998). However, some mRNAs with extended 3´ UTRs are also NMD substrates, and it has been proposed that the most crucial determinant for NMD substrate recognition is the distance between the PTC and the 3´ end, more specifi cally the cytoplasmic poly(A)-binding protein 1 (PABPC1) (Muhlrad and Parker, 1999, Amrani et al., 2004, Behm-Ansmant et al., 2007, Silva et al., 2008, Singh et al., 2008, Eberle et al., 2008). In either way, NMD substrates are then decapped and degraded by both exonucleases and endonucleases (Rebbapragada and Lykke- Andersen, 2009). Although the main function of NMD seems to be the removal of deleterious mRNAs generated through unintentional processing errors, it can also perform a function in a regulatory mechanism called regulated unproductive splicing and translation (RUST). Here, oft en a so-called “poison exon” is included through alternative splicing and the destruction of the mRNA is part of a regulatory pathway (see 1.2.4.1, and Lewis et al. (2003)).

1.3.4 Splicing and 3´ End Processing Cleavage/polyadenylation (CP) constitutes a key event during pre-mRNA processing. It enables nuclear export and ensures stability and proper translation of the mRNA (Wickens et al., 1997). In essence, it is a two-step process: an endonucleolytic cleavage reaction followed by the addition of the poly(A) tail by the polyadenylate polymerase (PAP). Nearly all polyadenylated mRNAs from animal cells contain the consensus poly(A) signal “AAUAAA” sequence, or a close variant thereof (Wahle and Kuhn, 1997). However, due to redundancy and the low complexity of the signal, additional elements are required to prevent premature 3´ end processing. It is now realized that the unit of recognition for 3´ end processing oft en includes an upstream 3´ ss (Martinson, 2011). Functional coupling between CP and splicing has already been demonstrated for a long time (Niwa et al., 1990, Niwa and Berget, 1991). Since then, factors have been identifi ed that participate in the coupling (Gunderson et al., 1997, Vagner et al., 2000b, McCracken et al., 2002, Millevoi et al., 2006, Kyburz et al., 2006). Th e interconnectivity between splicing and CP ensures mutual stimulation, defi nes the terminal exon, and furthermore, promotes transcriptional termination (Dye and Proudfoot, 1999). While the presence of an upstream 3´ ss can activate a poly(A) site, a 5´ ss oft en has an inhibitory eff ect on CP, mediated by interactions of the U1 snRNP with components of the poly(A) machinery. Various mechanisms, either indirect or involving interactions with the CP machinery, have been described through which the U1 snRNP can execute a suppressive eff ect on CP. In the U1A gene, a conserved U1 site in the 3´ UTR inhibits polyadenylation and forms

22 Review of the Literature part of an auto-regulatory negative feedback loop (Boelens et al., 1993, Guan et al., 2007). In bovine papillomavirus (BPV) late transcripts, a direct interaction between the U1-70K protein and PAP specifi cally inhibits polyadenylation (Gunderson et al., 1998). For this, the U1 site is located upstream of the poly(A) site. In the 5´ long terminal repeat (LTR) of the HIV-1 provirus, however, a downstream U1 site leads to inhibition of the cleavage step (Ashe et al., 1997). Here, the interaction is thought to be with a cleavage factor, rather than with PAP (Vagner et al., 2000a), and again, the U1-70K protein is likely to be involved (Ashe et al., 2000). Th ese two types of inhibition serve diff erent purposes: for BPV, the production of cleaved but non-polyadenylated mRNAs that are presumably unstable, and in the case of HIV-1, the regulation of transcriptional read-through to generate full-length viral pre-mRNA to be packaged into viral particles. Interestingly, for both mechanisms of CP inhibition, the U1 site exhibits a high degree of affi nity for the U1 snRNA, which is a prerequisite at least when the U1 site is located upstream of the poly(A) site (Abad et al., 2008). In a method to inhibit gene expression, termed U1 interference (U1i), U1 snRNP has been shown to exhibit long-distance (> 1000 nt) inhibition of CP. Th is inhibition is thought to be due to a disruption of terminal exon defi nition, rather than to result from an interaction with the U1-70K protein (Fortes et al., 2003). In the IgM heavy chain gene, again, no direct interaction between U1 snRNP and the CP machinery takes place. Instead, a competition exists between an intronic poly(A) site and an upstream, suboptimal 5´ ss (Peterson, 2011). Here, a model has been suggested where a race to form either a cross-intron A complex with a downstream 3´ ss, or the 3´ terminal A-like complex will determine the outcome of the competition. Factors that delay (or hasten) this event for splicing promote (or suppress) CP, and vice versa (Martinson, 2011). On a more global scale, a role for the U1 snRNP has been shown in protecting the whole transcriptome from premature CP events (Kaida et al., 2010).

1.4 The Minor Spliceosome: Signifi cance Several U12-type introns are extraordinarily conserved, the most notable example being the second intron in the gene encoding the sodium channel α-subunit present both in humans and jellyfi sh (Wu and Krainer, 1999), organisms that diverged 600-800 million years ago (Spaff ord et al., 1998). Additionally, U2-type intron positions are more conserved than U2-type intron positions between Arabidopsis thaliana and human (Basu et al., 2008a). It appears that there is a selection pressure to maintain U12-type introns in certain genes, perhaps because their presence is important for the expression of the genes that harbor them. Th ere is no clear enrichment of U12-type introns in genes related to specifi c molecular functions or biologically processes. U12- type introns seem to be mainly present in genes involved in more broadly defi ned ´information processing´ functions, such as DNA replication and repair, transcription, RNA processing, and translation. In addition, they can also be found in genes related to cytoskeletal organization, vesicular transport, and voltage-gated ion channel activity (Burge et al., 1998, Wu and Krainer, 1999). In this chapter, the possible reasons for retaining two diff erent spliceosomes, which seemingly carry out the same task, and some recently described diseases associated with defects in minor splicing will be discussed.

1.4.1 FuncƟ on and RegulaƟ on of the Minor Spliceosome An important insight into the role of the U12-type intron came from studies in which the abundance of unspliced U12-introns was compared to that of U2-type introns: consistently

23 Review of the Literature higher levels of unspliced U12-type introns were detected compared to the U2-type introns (Patel et al., 2002). When measuring fl uorescence levels from reporter constructs, it was observed that reporters with a U2-type intron produced 6-8 fold more of the fl uorescent protein compared to when a U12-intron was present in the exact same position (Patel et al., 2002). It was concluded that the removal of U12-type introns could be a rate-limiting step for the expression levels of genes that harbor them and therefore, U12-type introns act as some kind of post- transcriptional bottleneck. Reduced levels for U12-type intron splicing have been confi rmed in other in vivo studies as well (Santoro et al., 1994, Pessa et al., 2006). Furthermore, a knock-down of components of the nuclear exosome leads to the upregulation of the nuclear retention of U12- type introns but not of U2-type introns (Niemelä et al., 2014). Th ese data support a model where the delay in processing for transcripts with U12-type introns make them more likely targets for the nuclear exosome, off ering a means to control gene expression. Th e retention of U12-type introns could be explained by their slower splicing. Earlier in vitro studies have shown that the removal of U12-type introns is dramatically slower (Tarn and Steitz, 1996b, Wu and Krainer, 1996) and in vivo measurement of the splicing kinetics estimate minor splicing to be on average two times slower (Singh and Padgett, 2009). An analysis of the decay rates of U12-type introns led to the observation that U12-introns are remarkably stable aft er nuclear exosome knock-down (Niemelä et al., 2014) and an alternative hypothesis was proposed to explain the perceived retention of U12-type introns. Whereas some U12-type intron containing transcripts are spliced rather effi ciently, a subpopulation fails to splice entirely (Niemelä and Frilander, 2014). Perhaps, one of the components of the minor spliceosome is limiting the formation of a catalytically active spliceosome combined with a higher dependency for the minor spliceosome on co-transcriptional splicing. Regardless of whether minor splicing is either slow or ineffi cient, what could be the underlying mechanistic reason for the higher levels of unspliced U12-type introns? Th e snRNPs of the minor spliceosome are ca. 100-fold less abundant than those of the major spliceosome in mammalian cell nuclei and nuclear extracts (Montzka and Steitz, 1988, Tarn and Steitz, 1996a), off ering a possible explanation. However, a 10-fold reduction of U4atac, the least abundant of the U12-type snRNAs, did not seem to have any signifi cant impact on the effi ciency of endogenous U12-type splicing (Pessa et al., 2006). On the other hand, U6atac is relatively unstable (half-life < 2hr), and has been shown to be stabilized by the cell-stress activated kinase p38MAPK, resulting in the enhanced splicing of U12-type introns and increased expression of their associated mRNAs (Younis et al., 2013). Additionally, a protein factor associated with the minor spliceosome could be in rate-limiting concentrations. Perhaps, its expression or activity could vary in diff erent cell types or during diff erent developmental stages, enabling regulation of the activity of the minor spliceosome. It has been argued that the recognition step of the U12-type introns is not crucial for the perceived higher U12-type intron retention: both a disease that hampers U12-type intron recognition (Argente et al., 2014) and knock-down of the U11-48K protein, which recognizes the U12-type 5´ ss (Turunen et al., 2008), lead to the activation of cryptic U2-type 5´ splice sites (Niemelä and Frilander, 2014). Th us, when U12-type intron recognition is impaired, U2-type splicing typically takes over: the crucial steps that lead to slow or ineffi cient splicing are more likely to take place during minor spliceosome assembly or during the catalytic steps.

24 Review of the Literature 1.4.2 Minor Spliceosome and Disease A number of pathological conditions have been found resulting either from mutations in the splicing signals of U12-type introns, mutations that aff ect minor snRNP processing or mutations directly aff ecting a protein component of the minor spliceosome. In Peutz-Jeghers syndrome and spondyloepiphyseal dysplasia tarda (SEDT), the U12-type 5´ ss in the LKB1 or SEDL genes, respectively, are mutated and the disease phenotype is most likely caused by the reduced availability of the corresponding proteins (Hastings et al., 2005). Homozygous inactivation of the survival motor neuron 1 (SMN1) gene leads to spinal muscular atrophy (SMA) (Lefebvre et al., 1995). Th e SMN protein forms a macromolecular complex involved in the biogenesis of the snRNPs of the Sm class (Pellizzoni, 2007, Neuenkirchen et al., 2008). Defects in minor splicing are thought to contribute to the disease phenotype because SMA reduces the levels of minor snRNPs to a greater extent than those of the major snRNPs (Gabanella et al., 2007, Zhang et al., 2008b). Indeed, in Drosophila melanogaster, SMN defi ciency decreases the expression of a U12-type intron containing gene required for proper motor circuit function (Lotti et al., 2012). Mutations in the ZRSR2 gene, encoding for a splicing factor recognizing the 3´ splice sites of both U2- and U12-type introns, can lead to myelodysplastic syndrome (MDS) (Yoshida et al., 2011). Here, aberrant retention of U12-type introns is prevalent, whereas U2-type introns remain largely unaff ected (Madan et al., 2015). In recent years, several pathological conditions have been described resulting from mutations in core components (snRNA and protein) of the minor spliceosome. Mutations in the U11/U12-65K gene are associated with a case of isolated familial growth hormone defi ciency (IGHD1) and pituitary hypoplasia (Argente et al., 2014). In microcephalic osteodysplastic primordial dwarfi sm type I (MOPD1, also known as Taybi-Linder Syndrome), the clinical phenotype is much more severe: patients suff er several developmental defects and typically die within three years (Nagy et al., 2012). Here, a mutation is located within a phylogenetically conserved stem-loop structure in the U4atac snRNA (He et al., 2011, Edery et al., 2011, Pessa and Frilander, 2011). Th is is believed to disrupt base pairing with U6atac and suppress binding of the spliceososomal protein 15.5 K. Th is protein is presumably required for the association of the U5 snRNP to form the U4atac/U6atac.U5 tri-snRNP (Nottrott et al., 2002, Makarova et al., 2002). A diff erent set of compound heterozygous mutations in the RNU4ATAC gene, encoding the U4atac snRNA, cause Roifman syndrome (Merico et al., 2015). Here, one mutation aff ects the conserved stem-loop structure which is also mutated in MOPD1, and another mutation aff ects another highly conserved element, stem II. In Roifman syndrome, the disease phenotype is much milder, leading to non-fatal retinal dystrophy, poor pre- and postnatal growth, cognitive delay and facial dysmorphism. It is currently unclear how the minor splicing defects lead to the specifi c pathologies seen in these conditions and why such tissue specifi city is observed. Finally, it is possible that there are other diseases linked to mutations in the other minor spliceosomal snRNAs: future genome sequencing would help reveal these cases.

25 Aims of the Study 2. AIMS OF THE STUDY

Although recent fi ndings have shed some lights on the minor spliceosome and its workings, its exact role and regulation are largely unknown. Th is thesis aims to characterize in more detail the regulation of the minor spliceosome. More specifi cally: (1) to identify and characterize the role of an ultraconserved sequence element, containing a duplication of a U12-type 5´ ss, found in two genes coding for proteins specifi c to the minor spliceosome. We hypothesized that a negative feedback loop regulatory mechanism would act through non-productive alternative splicing and subsequent downregulation of the expression of the genes which harbor these elements. Aft er characterization of this regulatory mechanism, a study was initiated (2) to gain detailed molecular insights into the mechanism by which this element downregulates the translational- competent mRNA levels of the U11/U12-65K gene and, in addition, regulates minor splicing in general. A further study was undertaken (3) to investigate the role of the U11-35K protein, a protein associated with the minor spliceosome, and its ability to activate the splice site located upstream from the conserved element, as well as genuine U12-type 5´ splice sites. Additionally, in order to get mechanistic insights into the binding properties of the element, the requirements for the distance between the two 5´ ss-like motifs within the element, and for the distance of the element to the activated splice site were investigated.

26 Materials and Methods 3. MATERIALS AND METHODS

Methods used in this study are listed in table 2. For a detailed description of methods, see the original publications.

Table 2. Overview of methods used in this study.

Method PublicaƟ on Cell fracƟ onaƟ on I, III Cell lines and culture I, II, III Fluorescence in situ hybridizaƟ on III Immunofl uorescence II ImmunoprecipitaƟ on I, II In vitro splicing and spliceosome assembly assays I In vivo splicing reporter assays I, II, III Luciferase expression assays I Northern bloƫ ng I Protein overexpression in mammals I, II, III QuanƟ taƟ ve PCR I, II, III Reporter plasmids I, II, III RNA half-life measurement III RNAi knock-down I RNA-RNA cross-linking I RNase protecƟ on assays III RT-PCR I, II, III Sequence alignments I, II, III Streptavidin pull-down assays I TransfecƟ on I, II, III Western bloƫ ng I, II, III

27 Results and Discussion 4. RESULTS AND DISCUSSION 4.1 Ultraconserved non-coding regions in minor spliceosome- associated genes are linked to alternaƟ ve splicing (I) Th ere are seven proteins uniquely associated with the minor spliceosome, and it has been hypothesized that their expression might be linked to the activity of the minor spliceosome (Will et al., 2004). Th e U11-48K, which assists in the recognition of the U12-type 5´ ss (Turunen et al., 2008), is a potential candidate regulator, and the serendipitous fi nding of an ultraconserved region in its gene provided an interesting starting point to investigate a potential regulatory mechanism. Th e ultraconserved region in the 48K gene is located in the intron between exons 4 and 5 (Fig. 6a) and encompasses ca. 110 nt in the 25 mammalian species examined. Here, ultraconserved regions are defi ned as sequences that are over 100 nt long and that have > 90 % sequence similarity in mammals. Th e presence of ultraconserved regions in this gene does not come as a surprise per se. In fact, the presence of ultraconserved sequences in non-coding regions is a widespread phenomenon in eukaryotes and, furthermore, has been shown to be particularly associated with RNA binding and processing factors (Bejerano et al., 2004). However, most interestingly, within the ultraconserved region in the U11-48K gene, a duplication of the consensus U12-type 5´ splice site separated by a 6 nt spacer, was found. From a further comparison of the 48K genes in fi sh and insects, we identifi ed a “core” conserved region with very divergent fl anking and interspersing DNA sequences. Th is core consists of a PPT and adjacent U2-type 3´ ss, combined with the ca. 40-60 nt downstream located duplicated U12-type 5´ ss. Th is prompted a bioinformatics search to look for similar sequence elements in the human genome. Several thousands were found but only one single element showed ultraconservation in mammals, located in the 3´ UTR of the 65K gene (also known as RNPC3)(Fig. 6b), coding for a protein with a role in U11/U12 di-snRNP formation and intron bridging (Benecke et al., 2005). Sequence similarity of the region in the 65K 3´ UTR was still very high when comparing mammals with reptiles and birds. However, the core structure was the only recognizable conservation we observed from a further comparison with fi shes. Ultraconservation of elements covering hundreds of nucleotides in mammals and birds and a reduction to a shorter core sequence in fi sh is a common theme (Bejerano et al., 2004, Sabarinadh et al., 2004). Many ultraconserved elements are thought to be chordate innovations rapidly evolving at fi rst and then frozen in birds and mammals (Bejerano et al., 2004). Our discovery of a core sequence (PPT + 3´ ss + duplicated U12-type 5´ ss) in the 3´ UTR of the 48K gene of plants (Fig. 6c), however, indicates that such a sequence was presumably present already in the common ancestor of plants and animals, and that the mechanism by which it operates is under enormous evolutionary selection pressure. Based on EST data, the conserved regions in both 48K and 65K genes were found to be associated with alternative splicing events. However, there was no evidence that the U12-type 5´ splice sites were used for splicing. In the human 48K, alternative splicing leads to either inclusion of an 8 nt mini-exon (termed exon 4i), or inactivation of the 3´ ss downstream of exon 4i to generate an 1852 nt long exon. Similarly, in 65K, alternative 3´ ss activation leads to the production of an isoform with an 1839 nt elongated 3´ UTR. We confi rmed the presence of the 48K mini-exon transcript and the long 65K isoform in vivo in HEK293 cells (I, Fig. 4C and D). Additionally, I observed two alternative splicing events in the 3´ UTR of the 48K gene in both

28 Results and Discussion

Figure 6. Conserved sequence elements and alternative splicing in animal (A) and plant (C) 48K genes, and in the animal 65K gene (B). Th e upper part depicts genomic organization and splice variants: blue boxes depict protein-coding exons, the 3´ UTR is in yellow, and the thin horizontal lines are introns. Th e middle part is a phylogenetic conservation plot and the blow-up below shows residue- level conservation of the sequence elements.

29 Results and Discussion

Arabidopsis thaliana and Populus trichocarpa: alternative 3´ ss usage and intron retention (I, Fig. S1D). Th e most obvious hypothesis is that the duplicated U12-type 5´ ss acts as a splicing enhancer to activate the upstream 3´ ss, either directly or indirectly. Excitingly, given its affi nity for the U12-type 5´ ss, this could signify a novel role for the U11 snRNP as an activator of alternative splicing without direct participation in actual splicing.

4.2 U11 binds the conserved enhancer and acƟ vates alternaƟ ve splicing through exon defi niƟ on interacƟ ons mediated by the U11-35K protein (I and II) To address the role of the duplicated 5´ ss sequences, I transfected antisense morpholino oligos targeting the duplicated U12-type 5´ ss in the human 65K gene, and observed complete loss of alternative splicing and a shift to the canonical isoform with a short 3´ UTR (I, Fig. 4D). For technical reasons, it was not possible to target the 48K element with morpholino oligos. Th erefore, we used 2´ O-Methyl oligos, which resulted in a loss of 48K exon 4i inclusion (I, Fig. 4C). To further pinpoint the role of the U11 snRNP, I mutated nucleotides (AG5/6TC) that are absolutely required for U11 snRNP binding (Kolossova and Padgett, 1997) in either one or both of the U12-type 5´ splice sites, and observed complete loss of alternative splicing in each case. Such a loss-of-function phenotype provides a strong support for the essential role of the 5´ ss elements, but does not prove causality. Consequently, I then attempted a genetic rescue by co-transfection of a U11 snRNA with compensatory mutations that are expected to restore Watson-Crick base pairing. Such an elegant strategy has been previously employed to reveal the role of U1 and U11 snRNAs in U2-type and U12-type 5´ ss recognition respectively (Zhuang and Weiner, 1986, Seraphin et al., 1988, Kolossova and Padgett, 1997), and has the capacity to reveal a direct causal relationship between U11 snRNP binding and alternative splicing. Gratifyingly, upon co-transfection, alternative splicing was restored (I, Fig. 4F) and the sequence element with U12 5´ ss repeat was henceforth termed as the U11 snRNP-binding splicing enhancer (USSE). Th is interpretation is also supported by biochemical evidence, in particularly psoralen crosslinking and pull-down experiments (I, Figures 2 and 3), which indicate a direct interaction between U11 snRNA/snRNP and the USSE element that is sensitive to mutations aff ecting U11 snRNA/5´ ss base-pairing. How does the U11 snRNP (or U11/U12 di-snRNP) bind the USSE? Are the U12-type 5´ splice sites recognized by a single U11 snRNP or are they bound by two separate snRNPs? A clue can be found from the rescue experiment (I, Fig. 4F). Mutation of either one of the 5´ splice sites is suffi cient to abolish alternative splicing. Alternative splicing is restored, however, when two diff erent U11 snRNAs are present (one wild-type and one that carries compensatory mutations). Here, we would not expect to see a rescue if the duplication serves to increase the affi nity of a single U11 snRNP. Furthermore, alternative splicing is further improved to wild-type levels when both U12 5´ ss are mutated, presumably because overexpression of the compensatory mutated U11 snRNA now provides optimal stoichiometric conditions for double occupancy. Further evidence for the binding of two separate U11 snRNPs at the USSE comes from experiments where the length of the spacer region between the two U12-type 5´ splice sites was varied. Here, it was found that alternative splicing was signifi cantly reduced when the spacer length was shorter than 3 nt and even completely abolished without spacer, arguing for steric hindrance

30 Results and Discussion between two snRNPs (II, Fig. 2B). In addition, phylogenetic analysis of spacer length in various organisms confi rmed evolutionary pressure to maintain spacer length ≥ 3 nt. How does the U11 snRNP activate alternative splicing? Splicing activators are known to operate through diff erent mechanisms and one such mechanism is through the inhibition of splicing repressors (see 1.2.2.2). Indeed, for both 65K and 48K genes, the USSE is surrounded with an extremely conserved sequence that putatively could harbor splicing silencers. In this model, U11 snRNP binding would counteract or prevent silencer binding. Another mechanism is more direct, acting through the establishment of exon defi nition interactions. However, such interactions are generally mediated through the RS domains present in SR and SR-like proteins. Th e U11 snRNP contains one candidate SR-like protein, the U1-70K homologue U11-35K. In a 3´ UTR reporter construct, the USSE was removed and replaced by BoxB RNA hairpin sequences. Th ese structures can tether λN-peptide fusion proteins (Gehring et al., 2008) and, in this way, we evaluated the role of the U11-35K in splicing activation. We demonstrated that the RS domain of U11-35K is both necessary and suffi cient for upstream 3´ ss activation (II, Fig. 3G), supporting a model in which U11-35K participates in cross-exon communication between the two diff erent spliceosomes. I further showed that the activity of the U11-35K is comparable to that of the SR protein SRSF1 but less eff ective than U1-70K, which contains two RS domains (II, Fig. 3H). Further evidence for an exon defi nition mediated activation of alternative splicing came from experiments with constructs where the distance between the 3´ ss and its downstream USSE was manipulated. Steric hindrance imposes a minimum limit on exon defi nition (Dominski and Kole, 1991), and similarly, gradual reduction from the 65K 3´ ss-USSE wild type distance of 43 nt showed a linear decline in alternative splicing (II, Fig. 1C), consistent with classical exon defi nition interactions. Furthermore, the 3´ ss-USSE distances in both 48K and 65K were highly constrained in terrestrial vertebrates and indicate an exon defi nition model in all studied organisms. Th e USSE thus acts as an ESE but, unlike a classical splicing enhancer, its relative positioning to the activated 3´ ss is restricted by distance requirements imposed by exon defi nition.

4.3 AlternaƟ ve splicing leads to a mulƟ layered inhibitory mechanism for 48K and 65K mRNA expression (I and III) Having gained insight in the mechanism of USSE coupled alternative splicing, we turned our attention to the essence of the alternative splicing event itself. Alternative splicing oft en leads to the production of alternate protein products but, coupled with NMD or other degradation pathways, it can be a mechanism to regulate mRNA levels (RUST: see 1.2.4.1). Here, a gene is transcribed and spliced, only to be degraded. Seemingly wasteful, it can however provide an extra layer of regulation that is benefi cial to the organism. If the splicing regulator is the protein product of the gene itself, an auto-regulatory feedback loop is established. We were left with an interesting hypothesis: could the presence of the USSE, given the affi nity of the U11-48K protein and the U11 snRNP for such sequences, provide the basis for a negative auto-regulatory feedback loop? In this case, the expression of two key proteins of the minor spliceosome would be under control of the USSE through RUST. Both 48K alternate isoforms constitute potential NMD targets: incorporation of the 8 nt “poison” exon, as well as the 1852 nt giant exon are predicted to generate PTCs. Indeed, when treating cells with the translation inhibitor cycloheximide (CHX), I observed stabilization for

31 Results and Discussion the mini-exon containing transcript, and knock-down of the NMD factor Upf1 showed a similar result (I, Figures 5B and F). However, the 48K long exon containing isoform was unresponsive to CHX treatment (Turunen et al., 2013b). Similarly, the 65K long isoform did not prove to be a target for NMD (I, Fig. 5F), even though the presence of a long 3´ UTR has the potential to mark the transcript for NMD (Muhlrad and Parker, 1999). Interestingly, a cellular fractionation experiment showed an overabundance of the long isoform in comparison to the short isoform in the nuclear fraction (I, Fig. 5H) and thus, the 65K long isoform is either subjected to fast cytoplasmic decay or retained in the nucleus. In fact, treatment with the transcription inhibitor 5,6-dichloro-1-beta-D-ribo-furanosylbenzimidazole (DRB) indicated that the 65K long isoform was as stable as the short isoform with an estimated half-life of ca. 10h (III, Fig. 1G), ruling out active degradation of the mRNA. Using single molecule FISH, I confi rmed nuclear retention of the 65K long isoform (III, Fig. 1C) and morpholino blocking of the 65K USSE led to enhanced nuclear export (III, Fig. 1D). I then performed a cellular fractionation and showed that a nuclear retention mechanism also exists for the 48K transcript with the long exon (III, Fig. 1H). What could be the mechanism for nuclear retention? In the case of the 65K mRNA, EST data, RNAseq data from ENCODE and a database for conjoined genes (Prakash et al., 2010) revealed transcription from the 65K locus into the downstream AMY2B locus connected through an intergenic exon (III, Fig. 2A). Th rough manipulation of the 65K splicing pattern by USSE blocking morpholinos and careful quantifi cation of the intergenic exon signal by qPCR, I deduced that read-through transcription past the 65K poly(A) site was predominantly associated with the long isoform (III, Figures 2B and C). Th is off ers a potential mechanism for nuclear retention as transcripts with CP defects are oft en retained at the transcription site, and furthermore, their degradation is slow (de Almeida et al., 2010). Interestingly, for the plant 48K gene, neither alternatively splicing of the 3´ UTR nor the complete retention of the intron in the 3´ UTR leads to transcriptional read-through (unpublished data). In plants, NMD has been shown to occur both for transcripts with long 3´ UTR and for transcripts where splicing takes place within the 3´ UTR (Kertesz et al., 2006), off ering a plausible degradation mechanism for 48K alternate isoforms in plants. Th ere is, however, a substantial amount of 65K long isoform for which CP does occur and the mRNA spot pattern observed in FISH is inconsistent with retention at the transcription site only (III, Figures 1C and 3G). Furthermore, 48K long exon transcripts demonstrate nuclear retention even though the 48K USSE is located in an intronic region and not near the constitutive poly(A) site. I hypothesized that binding of the USSE by the U11 snRNP (or U11/U12 di-snRNP) would constitute a common nuclear retention mechanism for the long isoforms of both 48K and 65K. To test this, I made two diff erent reporter constructs: one in which the long 65K 3´ UTR was directly cloned and a reporter that contained the full unprocessed 65K 3´ UTR, and therefore required U11 snRNP binding to generate the long isoform. Co-transfection of each of these constructs with a short 3´ UTR reporter followed by cellular fractionation showed that nuclear retention occurred only when U11 binding was required to produce the long isoform (III, Fig. 4). Th is argues against the presence of cis-acting nuclear retention signals and suggests that the retention mechanism is further dependent on splicing. A possible mechanistic explanation is that splicing is required to provide mutual exon defi nition interactions that anchor the two U11/ U12 di-snRNPs onto the USSE element. Stable nuclear retention has been shown to occur when early spliceosomal complexes have formed but are impeded to undergo the catalytic steps (Hett and West, 2014). Similarly, in addition to alternative splicing activation, the USSE could serve to recruit a minor spliceosome which, perhaps due to the absence of a suitable U12-type BPS, is

32 Results and Discussion prevented to assemble adequately. In the case of the 48K transcript with exon 4i inclusion only, such complexes would not be present in the fi nal mRNA due to removal of the USSE by splicing. In contrast, both the longer 48K transcript and the 65K long isoform retain the USSE and thus the ability to bind U11/U12-di-snRNPs. Previously, E complex factors such as U1-70K and U2AF65 have been demonstrated to cause nuclear retention for the major spliceosome (Takemura et al., 2011) and perhaps, analogously, the U11- (or U11/12 di-) snRNPs recruit or are associated with specifi c retention factors. Th e expression of 48K and 65K mRNA is regulated through RUST. While nuclear retention is employed as a shared mechanism, NMD coupled alternative splicing is specifi c for the 48K mini-exon transcript, and transcriptional read-through specifi c for the 65K long isoform. While at least three diff erent means are used, the regulatory end product is the same: downregulation of minor spliceosomal proteins. Th is multilayered inhibitory system is reminiscent of the situation in the SRSF1 gene where alternative splicing generates six isoforms that are either nuclear retained, degraded by NMD or controlled at the translational level (Sun et al., 2010). Th e redundancy of degradation/retention systems might serve as some sort of double locked fail- safe mechanism or even allow for extra levels of regulations. For instance, nuclear retention of isoforms could be regulated so that specifi c signals release the mRNA into the cytoplasm readily available for translation, or alternatively, the activity or effi ciency of the NMD pathway could be altered such that alternatively spliced isoforms are stabilized (Smith and Baker, 2015).

4.4 USSE provides negaƟ ve auto-regulaƟ on for the 48K gene and minor spliceosome mediated cross-regulaƟ on for the 65K gene (I and III) Overexpression of the U11-48K protein from a construct leads to enhanced inclusion of the poisonous 4i mini-exon and thus leads to reduction in the levels of endogenous 48K mRNA (I, Fig. 4C). Th is is the hallmark of an auto-regulatory negative feedback mechanism. No eff ect was observed for the 65K mRNA when 65K levels were increased through expression constructs. Th e 65K mRNA splicing pattern, however, was aff ected by knock-down of both U11-48K and U11- 35K proteins and demonstrated a shift towards the short isoform in combination with a 3- to 4-fold upregulation of total 65K mRNA levels (I, Fig. 5C). A similar switch in splicing pattern was observed when a mutated U11 snRNA was overexpressed (I, Fig. 4E), suggesting that the endogenous U11 snRNA was outcompeted for minor spliceosome protein factors. Although we could not establish evidence for an auto-regulatory feedback loop in the 65K gene, it is clear that extensive cross-regulation by minor spliceosome components does occur. Even more, when I overexpressed a P120 minigene, which contains a U12-type intron, I saw an isoform switch towards the short isoform in the 65K gene but no change in the 48K splicing pattern (III, Fig. 5B). Overall, the effi ciency of endogenous U12-splicing was unaff ected (III, Fig. 5A). I concluded that the USSE of 65K is a more sensitive sensor of minor splicing activity and maintains cellular homeostasis by increasing the levels of the export-competent short isoform. Intriguingly, the U1-C protein, with a similar role as the U11-48K protein but instead recognizing U2-type 5´ splice sites, directs NMD-coupled alternative splicing in the U1-70K gene (encoding a functional homologue of U11-35K) providing a cross-regulatory system for components of the major spliceosome similar to the one we described here for the minor spliceosome (Rosel-Hillgartner et al., 2013).

33 Results and Discussion

Auto-regulation and cross-regulation are common regulatory mechanisms for splicing factors, with ultraconserved sequences oft en as key elements (see 1.2.4.1, and Ni et al. (2007), Lareau et al. (2007), Lareau and Brenner (2015)), and negative feedback loops are thought to dampen noise in transcriptional output and to maintain cellular homeostasis (Becskei and Serrano, 2000). Similarly, in our study, the role of the feedback loops that regulate 48K and 65K expression seems to be the maintenance of the steady-state levels of the two minor spliceosome associated proteins. However, both U11-48K and U11/U12-65K are excellent candidates to alter the activity of the minor spliceosome through changes in their protein concentrations. How can these changes be achieved in a negative feedback loop? Th e extreme evolutionary conservation over several hundred nucleotides near the USSE in mammals and birds indicates that splicing decisions might not lie only in the hands of minor spliceosome components. Indeed, for the 48K USSE, a rather complex network involving hnRNP H1, U1 snRNP and U11 snRNP directs alternative splicing (Turunen et al., 2013b). It is entirely possible that, in diff erent tissues or physiological states, several splicing activators and repressors could promote or counteract U11 snRNP binding to the USSE, and thereby contribute to regulation of 48K and 65K expression. Even more, one could imagine a splicing activator that promotes 48K or 65K alternative splicing in a U11 snRNP-independent way, thereby bypassing the feedback loop and turning a switch to change the activity of the minor spliceosome. On the other hand, the negative feedback loop does not need to be broken to alter the concentrations of minor spliceosomes. Auto-regulation can be modulated by “master” splicing regulators that allow for diff erent steady-state levels of the auto- regulated proteins between diff erent cell states. In particular, the splicing factor Rbfox2 has been shown to tune negative auto-regulation of many RNA-binding proteins both by promoting or suppressing NMD, resulting in splicing network shift s (Jangi et al., 2014). Similarly, negative auto- regulation and cross-regulation of the U11-48K and U11/U12-65K proteins could be modulated by splicing regulators, altering their steady state levels at diff erent cell and developmental states.

4.5 EvoluƟ onary role of the USSE (I,II and III) Th e presence of the USSE in the plant 48K 3´ UTR indicates that the regulatory mechanism predates the divergence of plants and animals revealing an origin for the USSE of more than 1 billion years old. Th e minimal requirements of the regulatory mechanism are: (a) a duplicated U12-type 5´ ss, (b) communication between components of the minor and major spliceosome, with the implication that the regulatory system presumably developed when both spliceosome systems were present in the same genome, and (c) a quality control system coupled with alternative splicing to establish the negative feedback loop. A simple program of alternative splicing has been assumed to be present already in LECA (Irimia and Roy, 2014), and the presence of NMD in plants (Isshiki et al., 2001) suggests that a NMD system was already present in the last common ancestor of plant and animal. Th e regulatory mechanism that we have described here strongly suggests RUST already to be in eff ect in LECA. Negative auto-regulation benefi ts the organism by minimizing potentially toxic fl uctuations of gene expression and unnecessary translation (Becskei and Serrano, 2000). For RNA binding proteins, and splicing factors in particular, it seems that the evolutionary most facile and permissive way to establish auto-regulation is to recognize the RNA of their own genes and activate unproductive splicing. Indeed, many non-homologous genes coding for SR proteins, hnRNPs and core spliceosome proteins have evolved auto-regulation through unproductive

34 Results and Discussion alternative splicing (Saltzman et al., 2008, Lareau and Brenner, 2015). Th e fact that the USSE requires two U11 snRNPs for binding might turn it into a particular sensitive enhancer element for regulation, enabling detection at low levels of U11 snRNP. Indeed, knock-down of the U11- 48K protein has moderate eff ects on U12-type splicing but a dramatic eff ect on the alternative splicing of 65K. A look at the phylogenetic distribution of the USSE (I, Fig. 7) reveals that, even though the USSE can be found in the plant 48K gene, the USSE in the 65K gene seems to be a relatively recent innovation antedating the tetrapod-fi sh divergence and indicates that the regulatory system has changed across evolution to include a cross-regulatory mechanism. It is worth noting that among the closely related SR proteins, a family which expanded through gene duplication, poison cassettes for auto-regulation can be found in non-homologous positions (Lareau et al., 2007). Here, ancestral unproductive splicing was rapidly replaced by newly acquired unproductive splicing aft er each duplication (Lareau and Brenner, 2015). Th is shows that such regulatory elements can be readily acquired during evolution depending on the regulatory requirements of the organism and perhaps this has happened for the 65K gene as well. Th e presence of the USSE in the 3´ UTRs of vertebrate 65K and plant 48K genes, as compared to within an intron in the vertebrate 48K gene, reveals that the regulatory mechanism employs diff erent means of degradation to achieve the same regulatory end. Could the negative feedback loops serve to down-regulate minor spliceosome proteins to limit their availability in minor splicing and thereby contributing to the reduced rate the splicing reaction? Th e presence of a similar negative feedback loop for the major spliceosome rather suggests that this is a general mechanism to ensure homeostasis of core spliceosome components (Rosel-Hillgartner et al., 2013). On the other hand, many splicing factors are auto-regulated but their steady state levels can be altered by “master” splicing regulators, enabling their participation to diff erent splicing networks. A similar regulation could be the underlying mechanism by which the U11-48K and the U11/12-65K proteins aff ect minor splicing in diff erent cellular conditions. Interestingly, it seems that in the tetrapod lineage extra regulatory elements were installed. Indeed, the DNA sequences that surround the USSE are ultraconserved in mammals and sauropsida (birds and reptiles) and potentially constitute many (overlapping) binding sites for splicing repressors and activators in a much defi ned setting. Finally, there is the possibility that loss of the USSE is related to the loss of U12-type introns in many lineages. Loss of minor splicing homeostasis or the ability to regulate the activity of the minor spliceosome might relieve the selection pressure for maintaining U12-introns, setting the path for U12-type to U2-type conversion.

35 Results and Discussion

A STABLE EXPRESSION EXPORT COMPETENT 48K 65K

U11 snRNP level up

NUCLEAR RETENTION NUCLEAR RETENTION

48K 65K AAAAA USSE USSE poison cleavage exon 48K 65K NMD NUCLEAR RETENTION 48K level down: 65K level down: negative auto-regulation negative cross-regulation

B NUCLEAR RETENTION

65K AAAAA ”isoform switch” EXPORT COMPETENT USSE Increased minor 65K cleavage splicing activity

65K Minor splicing homeostasis NUCLEAR RETENTION

Figure 7. Model of USSE-directed regulation of alternative splicing in the U11-48K and U11/U12- 65K genes, and of the minor spliceosome. (A) An increased level of U11 snRNP leads to downregulation of 48K and 65K expression through USSE-directed alternative splicing, creating negative auto-regulatory and cross-regulatory feedback loops. Binding of two U11/U12 di-snRNPs at the two U12-type 5´ ss motifs promotes activation of an upstream 3´ ss via U11-35K mediated exon defi nition interactions. Exons are in blue/green and the 3´ UTR is in yellow. (B) Th e 65K USSE acts as a sensor for minor spliceosome activity. Increased minor splicing activity leads to a 65K isoform switch, generating an export competent short isoform, to maintain minor spliceosome homeostasis.

36 Concluding Remarks 5. CONCLUDING REMARKS

In I, I have identifi ed and characterized the ultraconserved element and have shown that it is in fact a splicing enhancer element (termed USSE) that requires the U11 snRNP to activate an upstream 3´ ss. Th e alternative splicing event that follows produces, in the case of the minor spliceosome associated 48K gene, an mRNA isoform that is targeted for destruction. Th e resulting “negative auto-regulatory feedback loop” was shown to be evolutionarily ultraconserved as the element was found in mammals, fi sh, insects and even plants, and is therefore proposed to carry an important function in the regulation of the minor spliceosome. In II, we have characterized the U11-35K protein and show that its RS domain is suffi cient and necessary for exon defi nition interactions, acting as a mediator of communication between components of the minor and major spliceosome. In addition, by manipulation of distances between the two 5´ ss-like motifs and the upstream 3´ ss, we gained insight into the molecular properties of U11 snRNP binding and these fi ndings were supported by evolutionary data. In III, I show that in the 65K gene, also encoding for a minor spliceosome associated protein and containing a USSE element in its 3´ UTR, alternative splicing produces a splice product which is, as shown by cell fractionation experiments and fl uorescence in situ hybridization (FISH), retained in the nucleus. Th e nuclear retention requires the binding of the U11 snRNP. Additionally, I demonstrated that cleavage/ polyadenylation of the alternative isoform is compromised and propose a mechanism that explains this fi nding. Figure 7 summarizes the main fi ndings of this thesis. Overall, the regulatory feedback loop that I have characterized in this study serves to maintain an appropriate balance for minor spliceosome proteins and could provide a framework for further studies into the regulation of the minor spliceosome. Additionally, this study opens an intriguing possibility that USSE-directed non-productive alternative splicing is further fi ne- tuned by other splicing factors, regulating the expression of both U11-48K and U11/U12-65K with an plausible impact on the activity of the minor spliceosome.

37 Acknowledgements 6. ACKNOWLEDGEMENTS

Th is study has been carried out at the University of Helsinki, Institute of Biotechnology. I would like to thank the Institute for providing excellent infrastructure and research facilities. Th ere are some very capable and wonderful people working here. I worked here long enough to witness three Directors: Mart Saarma, Tomi Mäkelä and Howard Jacobs; thank you all for providing a stimulating working environment. During my PhD studies, I benefi ted from a Marie Curie Fellowship and I am grateful for this wonderful European research programme. My work has also benefi ted from the Viikki Graduate School in Biosciences for which I am grateful.

I owe a lot my supervisor Mikko Frilander who invited me to Finland and gave me a chance to work in his fi ne lab. Mikko, I thank you for your support and your trust. Your mentoring has had a huge infl uence on my scientifi c thinking. Your door was always open: you listened and you advised. Warm thanks go to all member of the Frilander lab, in the present and in the past. Heli, Janne and Elina: thanks for the help, the coff ee, the friendship and the discussions. Jouni, Ali, Bhupi, Antto and our newest lab member Maureen: there is a good atmosphere in our lab, the science is good and I genuinely like you all. Special thanks for Ali and Bhupi for sometimes putting a nice cup of tea on my desk when I was writing this thesis and needed it the most. Lilli, when I am in the lab, we talk and we laugh and most importantly you speak Finnish to me. In this way, you helped me integrating into Finnish culture and I am grateful for that. I wish you all the best (and also your animals).

I thank Associate Professor Stephen Mount for accepting to be my opponent: I am looking forward to our scientifi c discussions. I am thankful to Professor Juha Partanen to accept the role as custos at my defense. I also thank Docent Noora Kotaja and Docent Tapio Heino for reviewing the thesis and giving insightful comments: the thesis benefi ted from your advice. Special thanks also for the member of my thesis advisory committee, Docent Petri Auvinen and Professor Ykä Helariutta for your guidance during my studies.

Honorable mentions go to my fellow Marie Curie students: Maarja, Vassilis, Roxana, Sarah and Sylvie. You are all long gone and have left the country but I enjoyed our fi rst year(s) in Finland together. Sylvie and Alain, thanks for the nice moments and for keeping in touch. I like to thank all (ex-) members of the Book Club I belong to: Elina, Tiina, Fergal, Natalia, Michele, Janneke, Jonathon, etc. Sorry for not always turning up but when I was at our meetings, I always enjoyed our discussions. My football friends at FC Ellas: you do not know so much about the Biosciences but you sure provide a welcome distraction.

38 Acknowledgements

My parents, An and Herman, and my sister, Ute: you have lost your son (or brother) to Finland but only physically. My thoughts are oft en with you and when I go back, I feel at home and welcome. Th ank you, mother, for coming over and helping us when I was writing the thesis and studying. I can never pay back all the other things you have given me. Heidi, thank you for your support and love. I apologize for being so busy and preoccupied with the thesis during a time when all the focus should have been on you. Th ank you for being understanding. Lastly, I thank my children for the added value in my life: Liina, the apple of my eye, and our latest addition, Niklas, the apple of my other eye.

Helsinki, December 2015

Jens

39 References 7. REFERENCES

ABAD, X., VERA, M., JUNG, S. P., OSWALD, E., ROMERO, I., AMIN, V., FORTES, P. & GUNDERSON, S. I. 2008. Requirements for gene silencing mediated by U1 snRNA binding to a target sequence. Nucleic Acids Res, 36, 2338-52. ABELSON, J., TROTTA, C. R. & LI, H. 1998. tRNA splicing. J Biol Chem, 273, 12685-8. AEBI, M., HORNIG, H., PADGETT, R. A., REISER, J. & WEISSMANN, C. 1986. Sequence requirements for splicing of higher eukaryotic nuclear pre-mRNA. Cell, 47, 555-65. AKEF, A., LEE, E. S. & PALAZZO, A. F. 2015. Splicing promotes the nuclear export of beta-globin mRNA by overcoming nuclear retention elements. RNA, 21, 1908-20. ALEXANDER, R. D., INNOCENTE, S. A., BARRASS, J. D. & BEGGS, J. D. 2010. Splicing-dependent RNA polymerase pausing in yeast. Mol Cell, 40, 582-93. AMEUR, A., ZAGHLOOL, A., HALVARDSON, J., WETTERBOM, A., GYLLENSTEN, U., CAVELIER, L. & FEUK, L. 2011. Total RNA sequencing reveals nascent transcription and widespread co-transcriptional splicing in the human brain. Nat Struct Mol Biol, 18, 1435-40. AMRANI, N., GANESAN, R., KERVESTIN, S., MANGUS, D. A., GHOSH, S. & JACOBSON, A. 2004. A faux 3´-UTR promotes aberrant termination and triggers nonsense-mediated mRNA decay. Nature, 432, 112-8. ARCHIBALD, J. M., O´KELLY, C. J. & DOOLITTLE, W. F. 2002. Th e chaperonin genes of jakobid and jakobid-like fl agellates: implications for eukaryotic evolution. Mol Biol Evol, 19, 422-31. ARGENTE, J., FLORES, R., GUTIERREZ-ARUMI, A., VERMA, B., MARTOS-MORENO, G. A., CUSCO, I., OGHABIAN, A., CHOWEN, J. A., FRILANDER, M. J. & PEREZ-JURADO, L. A. 2014. Defective minor spliceosome mRNA processing results in isolated familial growth hormone defi ciency. EMBO Mol Med, 6, 299-306. ASHE, M. P., FURGER, A. & PROUDFOOT, N. J. 2000. Stem-loop 1 of the U1 snRNP plays a critical role in the suppression of HIV-1 polyadenylation. RNA, 6, 170-7. ASHE, M. P., PEARSON, L. H. & PROUDFOOT, N. J. 1997. Th e HIV-1 5´ LTR poly(A) site is inactivated by U1 snRNP interaction with the downstream major splice donor site. EMBO J, 16, 5752-63. AST, G. 2004. How did alternative splicing evolve? Nat Rev Genet, 5, 773-82. BALDI, M. I., MATTOCCIA, E., BUFARDECI, E., FABBRI, S. & TOCCHINI-VALENTINI, G. P. 1992. Participation of the intron in the reaction catalyzed by the Xenopus tRNA splicing endonuclease. Science, 255, 1404-8. BANG, M. L., CENTNER, T., FORNOFF, F., GEACH, A. J., GOTTHARDT, M., MCNABB, M., WITT, C. C., LABEIT, D., GREGORIO, C. C., GRANZIER, H. & LABEIT, S. 2001. Th e complete gene sequence of titin, expression of an unusual approximately 700-kDa titin isoform, and its interaction with obscurin identify a novel Z-line to I-band linking system. Circ Res, 89, 1065-72. BARTSCHAT, S. & SAMUELSSON, T. 2010. U12 type introns were lost at multiple occasions during evolution. BMC Genomics, 11, 106. BASU, M. K., MAKALOWSKI, W., ROGOZIN, I. B. & KOONIN, E. V. 2008a. U12 intron positions are more strongly conserved between animals and plants than U2 intron positions. Biol Direct, 3, 19. BASU, M. K., ROGOZIN, I. B. & KOONIN, E. V. 2008b. Primordial spliceosomal introns were probably U2-type. Trends Genet, 24, 525-8.

40 References

BAUREN, G. & WIESLANDER, L. 1994. Splicing of Balbiani ring 1 gene pre-mRNA occurs simultaneously with transcription. Cell, 76, 183-92. BECSKEI, A. & SERRANO, L. 2000. Engineering stability in gene networks by autoregulation. Nature, 405, 590-3. BEHM-ANSMANT, I., GATFIELD, D., REHWINKEL, J., HILGERS, V. & IZAURRALDE, E. 2007. A conserved role for cytoplasmic poly(A)-binding protein 1 (PABPC1) in nonsense-mediated mRNA decay. EMBO J, 26, 1591-601. BEHZADNIA, N., GOLAS, M. M., HARTMUTH, K., SANDER, B., KASTNER, B., DECKERT, J., DUBE, P., WILL, C. L., URLAUB, H., STARK, H. & LUHRMANN, R. 2007. Composition and three-dimensional EM structure of double affi nity-purifi ed, human prespliceosomal A complexes. EMBO J, 26, 1737-48. BEJERANO, G., PHEASANT, M., MAKUNIN, I., STEPHEN, S., KENT, W. J., MATTICK, J. S. & HAUSSLER, D. 2004. Ultraconserved elements in the human genome. Science, 304, 1321-5. BENECKE, H., LUHRMANN, R. & WILL, C. L. 2005. Th e U11/U12 snRNP 65K protein acts as a molecular bridge, binding the U12 snRNA and U11-59K protein. EMBO J, 24, 3057-69. BENTLEY, D. L. 2014. Coupling mRNA processing with transcription in time and space. Nat Rev Genet, 15, 163-75. BERG, M. G., SINGH, L. N., YOUNIS, I., LIU, Q., PINTO, A. M., KAIDA, D., ZHANG, Z., CHO, S., SHERRILL-MIX, S., WAN, L. & DREYFUSS, G. 2012. U1 snRNP determines mRNA length and regulates isoform expression. Cell, 150, 53-64. BERGET, S. M. 1995. Exon recognition in vertebrate splicing. J Biol Chem, 270, 2411-4. BERGET, S. M., MOORE, C. & SHARP, P. A. 1977. Spliced segments at the 5´ terminus of adenovirus 2 late mRNA. Proc Natl Acad Sci U S A, 74, 3171-5. BERGLUND, J. A., CHUA, K., ABOVICH, N., REED, R. & ROSBASH, M. 1997. Th e splicing factor BBP interacts specifi cally with the pre-mRNA branchpoint sequence UACUAAC. Cell, 89, 781-7. BESSONOV, S., ANOKHINA, M., WILL, C. L., URLAUB, H. & LUHRMANN, R. 2008. Isolation of an active step I spliceosome and composition of its RNP core. Nature, 452, 846-50. BEYER, A. L. & OSHEIM, Y. N. 1988. Splice site selection, rate of splicing, and alternative splicing on nascent transcripts. Genes Dev, 2, 754-65. BOELENS, W. C., JANSEN, E. J., VAN VENROOIJ, W. J., STRIPECKE, R., MATTAJ, I. W. & GUNDERSON, S. I. 1993. Th e human U1 snRNP-specifi c U1A protein inhibits polyadenylation of its own pre-mRNA. Cell, 72, 881-92. BOIREAU, S., MAIURI, P., BASYUK, E., DE LA MATA, M., KNEZEVICH, A., PRADET-BALADE, B., BACKER, V., KORNBLIHTT, A., MARCELLO, A. & BERTRAND, E. 2007. Th e transcriptional cycle of HIV-1 in real-time and live cells. J Cell Biol, 179, 291-304. BON, E., CASAREGOLA, S., BLANDIN, G., LLORENTE, B., NEUVEGLISE, C., MUNSTERKOTTER, M., GULDENER, U., MEWES, H. W., VAN HELDEN, J., DUJON, B. & GAILLARDIN, C. 2003. Molecular evolution of eukaryotic genomes: hemiascomycetous yeast spliceosomal introns. Nucleic Acids Res, 31, 1121-35. BORAH, S., WONG, A. C. & STEITZ, J. A. 2009. Drosophila hnRNP A1 homologs Hrp36/Hrp38 enhance U2-type versus U12-type splicing to regulate alternative splicing of the prospero twintron. Proc Natl Acad Sci U S A, 106, 2577-82. BRADNAM, K. R. & KORF, I. 2008. Longer fi rst introns are a general property of eukaryotic gene structure. PLoS One, 3, e3093.

41 References

BROCK, J. E., DIETRICH, R. C. & PADGETT, R. A. 2008. Mutational analysis of the U12-dependent branch site consensus sequence. RNA, 14, 2430-9. BRUGIOLO, M., HERZEL, L. & NEUGEBAUER, K. M. 2013. Counting on co-transcriptional splicing. F1000Prime Rep, 5, 9. BURATTI, E., BARALLE, M., DE CONTI, L., BARALLE, D., ROMANO, M., AYALA, Y. M. & BARALLE, F. E. 2004a. hnRNP H binding at the 5´ splice site correlates with the pathological eff ect of two intronic mutations in the NF-1 and TSHbeta genes. Nucleic Acids Res, 32, 4224-36. BURATTI, E., DHIR, A., LEWANDOWSKA, M. A. & BARALLE, F. E. 2007. RNA structure is a key regulatory element in pathological ATM and CFTR pseudoexon inclusion events. Nucleic Acids Res, 35, 4369-83. BURATTI, E., MURO, A. F., GIOMBI, M., GHERBASSI, D., IACONCIG, A. & BARALLE, F. E. 2004b. RNA folding aff ects the recruitment of SR proteins by mouse and human polypurinic enhancer elements in the fi bronectin EDA exon. Mol Cell Biol, 24, 1387-400. BURGE, C. B., PADGETT, R. A. & SHARP, P. A. 1998. Evolutionary fates and origins of U12-type introns. Mol Cell, 2, 773-85. CAO, Y. & WOODSON, S. A. 2000. Refolding of rRNA exons enhances dissociation of the Tetrahymena intron. RNA, 6, 1248-56. CARIGNANI, G., GROUDINSKY, O., FREZZA, D., SCHIAVON, E., BERGANTINO, E. & SLONIMSKI, P. P. 1983. An mRNA maturase is encoded by the fi rst intron of the mitochondrial gene for the subunit I of cytochrome oxidase in S. cerevisiae. Cell, 35, 733-42. CARMEL, L., ROGOZIN, I. B., WOLF, Y. I. & KOONIN, E. V. 2007. Patterns of intron gain and conservation in eukaryotic genes. BMC Evol Biol, 7, 192. CAVALIER-SMITH, T. 1985. Selfi sh DNA and the origin of introns. Nature, 315, 283-4. CAVALIER-SMITH, T. 1991. Intron phylogeny: a new hypothesis. Trends Genet, 7, 145-8. CECH, T. R. 1986. Th e generality of self-splicing RNA: relationship to nuclear mRNA splicing. Cell, 44, 207-10. CECH, T. R. 1990. Self-splicing of group I introns. Annu Rev Biochem, 59, 543-68. CHABOT, B., LEBEL, C., HUTCHISON, S., NASIM, F. H. & SIMARD, M. J. 2003. Heterogeneous nuclear ribonucleoprotein particle A/B proteins and the control of alternative splicing of the mammalian heterogeneous nuclear ribonucleoprotein particle A1 pre-mRNA. Prog Mol Subcell Biol, 31, 59-88. CHANG, W. C., CHEN, Y. C., LEE, K. M. & TARN, W. Y. 2007. Alternative splicing and bioinformatic analysis of human U12-type introns. Nucleic Acids Res, 35, 1833-41. CHEAH, M. T., WACHTER, A., SUDARSAN, N. & BREAKER, R. R. 2007. Control of alternative RNA splicing and gene expression by eukaryotic riboswitches. Nature, 447, 497-500. CHOREV, M. & CARMEL, L. 2012. Th e function of introns. Front Genet, 3, 55. CHOW, L. T., GELINAS, R. E., BROKER, T. R. & ROBERTS, R. J. 1977. An amazing sequence arrangement at the 5´ ends of adenovirus 2 messenger RNA. Cell, 12, 1-8. COCQUERELLE, C., DAUBERSIES, P., MAJERUS, M. A., KERCKAERT, J. P. & BAILLEUL, B. 1992. Splicing with inverted order of exons occurs proximal to large introns. EMBO J, 11, 1095-8. COCQUERELLE, C., MASCREZ, B., HETUIN, D. & BAILLEUL, B. 1993. Mis-splicing yields circular RNA molecules. FASEB J, 7, 155-60. COOLIDGE, C. J., SEELY, R. J. & PATTON, J. G. 1997. Functional analysis of the polypyrimidine tract in pre-mRNA splicing. Nucleic Acids Res, 25, 888-96.

42 References

CORDIN, O., HAHN, D. & BEGGS, J. D. 2012. Structure, function and regulation of spliceosomal RNA helicases. Curr Opin Cell Biol, 24, 431-8. CORTES, J. J., SONTHEIMER, E. J., SEIWERT, S. D. & STEITZ, J. A. 1993. Mutations in the conserved loop of human U5 snRNA generate use of novel cryptic 5´ splice sites in vivo. EMBO J, 12, 5181-9. CULLEN, B. R. 2003. Nuclear RNA export. J Cell Sci, 116, 587-97. DAMIANOV, A., SCHREINER, S. & BINDEREIF, A. 2004. Recycling of the U12-type spliceosome requires p110, a component of the U6atac snRNP. Mol Cell Biol, 24, 1700-8. DARNELL, J. E., JR. 1978. Implications of RNA-RNA splicing in evolution of eukaryotic cells. Science, 202, 1257-60. DARZACQ, X., SHAV-TAL, Y., DE TURRIS, V., BRODY, Y., SHENOY, S. M., PHAIR, R. D. & SINGER, R. H. 2007. In vivo dynamics of RNA polymerase II transcription. Nat Struct Mol Biol, 14, 796-806. DAVID, C. J., BOYNE, A. R., MILLHOUSE, S. R. & MANLEY, J. L. 2011. Th e RNA polymerase II C-terminal domain promotes splicing activation through recruitment of a U2AF65-Prp19 complex. Genes Dev, 25, 972-83. DAYIE, K. T. & PADGETT, R. A. 2008. A glimpse into the active site of a group II intron and maybe the spliceosome, too. RNA, 14, 1697-703. DE ALMEIDA, S. F., GARCIA-SACRISTAN, A., CUSTODIO, N. & CARMO-FONSECA, M. 2010. A link between nuclear RNA surveillance, the human exosome and RNA polymerase II transcriptional termination. Nucleic Acids Res, 38, 8015-26. DE CONTI, L., BARALLE, M. & BURATTI, E. 2013. Exon and intron defi nition in pre-mRNA splicing. Wiley Interdiscip Rev RNA, 4, 49-60. DI NICOLA NEGRI, E., FABBRI, S., BUFARDECI, E., BALDI, M. I., GANDINI ATTARDI, D., MATTOCCIA, E. & TOCCHINI-VALENTINI, G. P. 1997. Th e eucaryal tRNA splicing endonuclease recognizes a tripartite set of RNA elements. Cell, 89, 859-66. DIETRICH, R. C., FULLER, J. D. & PADGETT, R. A. 2005. A mutational analysis of U12-dependent splice site dinucleotides. RNA, 11, 1430-40. DIETRICH, R. C., SHUKLA, G. C., FULLER, J. D. & PADGETT, R. A. 2001. Alternative splicing of U12-dependent introns in vivo responds to purine-rich enhancers. RNA, 7, 1378-88. DOMINSKI, Z. & KOLE, R. 1991. Selection of splice sites in pre-mRNAs with short internal exons. Mol Cell Biol, 11, 6075-83. DREYFUSS, G., KIM, V. N. & KATAOKA, N. 2002. Messenger-RNA-binding proteins and the messages they carry. Nat Rev Mol Cell Biol, 3, 195-205. DREYFUSS, G., SWANSON, M. S. & PINOL-ROMA, S. 1988. Heterogeneous nuclear ribonucleoprotein particles and the pathway of mRNA formation. Trends Biochem Sci, 13, 86-91. DRYSDALE, R. A., CROSBY, M. A. & FLYBASE, C. 2005. FlyBase: genes and gene models. Nucleic Acids Res, 33, D390-5. DYE, M. J. & PROUDFOOT, N. J. 1999. Terminal exon defi nition occurs cotranscriptionally and promotes termination of RNA polymerase II. Mol Cell, 3, 371-8. EBERLE, A. B., STALDER, L., MATHYS, H., OROZCO, R. Z. & MUHLEMANN, O. 2008. Posttranscriptional gene regulation by spatial rearrangement of the 3´ untranslated region. PLoS Biol, 6, e92.

43 References

EDERY, P., MARCAILLOU, C., SAHBATOU, M., LABALME, A., CHASTANG, J., TOURAINE, R., TUBACHER, E., SENNI, F., BOBER, M. B., NAMPOOTHIRI, S., JOUK, P. S., STEICHEN, E., BERLAND, S., TOUTAIN, A., WISE, C. A., SANLAVILLE, D., ROUSSEAU, F., CLERGET- DARPOUX, F. & LEUTENEGGER, A. L. 2011. Association of TALS developmental disorder with defect in minor splicing component U4atac snRNA. Science, 332, 240-3. EISENBERG, E. & LEVANON, E. Y. 2003. Human housekeeping genes are compact. Trends Genet, 19, 362-5. FABRIZIO, P., DANNENBERG, J., DUBE, P., KASTNER, B., STARK, H., URLAUB, H. & LUHRMANN, R. 2009. Th e evolutionarily conserved core design of the catalytic activation step of the yeast spliceosome. Mol Cell, 36, 593-608. FACKENTHAL, J. D. & GODLEY, L. A. 2008. Aberrant RNA splicing and its functional consequences in cancer cells. Dis Model Mech, 1, 37-42. FONG, Y. W. & ZHOU, Q. 2001. Stimulatory eff ect of splicing factors on transcriptional elongation. Nature, 414, 929-33. FORTES, P., CUEVAS, Y., GUAN, F., LIU, P., PENTLICKY, S., JUNG, S. P., MARTINEZ-CHANTAR, M. L., PRIETO, J., ROWE, D. & GUNDERSON, S. I. 2003. Inhibiting expression of specifi c genes in mammalian cells with 5´ end-mutated U1 small nuclear RNAs targeted to terminal exons of pre- mRNA. Proc Natl Acad Sci U S A, 100, 8264-9. FRILANDER, M. J. & STEITZ, J. A. 1999. Initial recognition of U12-dependent introns requires both U11/5´ splice-site and U12/branchpoint interactions. Genes Dev, 13, 851-63. FRILANDER, M. J. & STEITZ, J. A. 2001. Dynamic exchanges of RNA interactions leading to catalytic core formation in the U12-dependent spliceosome. Mol Cell, 7, 217-26. GABANELLA, F., BUTCHBACH, M. E., SAIEVA, L., CARISSIMI, C., BURGHES, A. H. & PELLIZZONI, L. 2007. Ribonucleoprotein assembly defects correlate with spinal muscular atrophy severity and preferentially aff ect a subset of spliceosomal snRNPs. PLoS One, 2, e921. GAFFNEY, D. J. & KEIGHTLEY, P. D. 2004. Unexpected conserved non-coding DNA blocks in mammals. Trends Genet, 20, 332-7. GAO, K., MASUDA, A., MATSUURA, T. & OHNO, K. 2008. Human branch point consensus sequence is yUnAy. Nucleic Acids Res, 36, 2257-67. GEHRING, N. H., HENTZE, M. W. & KULOZIK, A. E. 2008. Tethering assays to investigate nonsense-mediated mRNA decay activating proteins. Methods Enzymol, 448, 467-82. GENCHEVA, M., KATO, M., NEWO, A. N. & LIN, R. J. 2010. Contribution of DEAH-box protein DHX16 in human pre-mRNA splicing. Biochem J, 429, 25-32. GILBERT, W. 1978. Why genes in pieces? Nature, 271, 501. GILSON, P. R., SU, V., SLAMOVITS, C. H., REITH, M. E., KEELING, P. J. & MCFADDEN, G. I. 2006. Complete nucleotide sequence of the chlorarachniophyte nucleomorph: nature´s smallest nucleus. Proc Natl Acad Sci U S A, 103, 9566-71. GOODMAN, H. M., OLSON, M. V. & HALL, B. D. 1977. Nucleotide sequence of a mutant eukaryotic gene: the yeast tyrosine-inserting ochre suppressor SUP4-o. Proc Natl Acad Sci U S A, 74, 5453-7. GROSJEAN, H., SZWEYKOWSKA-KULINSKA, Z., MOTORIN, Y., FASIOLO, F. & SIMOS, G. 1997. Intron-dependent enzymatic formation of modifi ed nucleosides in eukaryotic tRNAs: a review. Biochimie, 79, 293-302. GUAN, F., CARATOZZOLO, R. M., GORACZNIAK, R., HO, E. S. & GUNDERSON, S. I. 2007. A bipartite U1 site represses U1A expression by synergizing with PIE to inhibit nuclear polyadenylation. RNA, 13, 2129-40.

44 References

GUNDERSON, S. I., POLYCARPOU-SCHWARZ, M. & MATTAJ, I. W. 1998. U1 snRNP inhibits pre-mRNA polyadenylation through a direct interaction between U1 70K and poly(A) polymerase. Mol Cell, 1, 255-64. GUNDERSON, S. I., VAGNER, S., POLYCARPOU-SCHWARZ, M. & MATTAJ, I. W. 1997. Involvement of the carboxyl terminus of vertebrate poly(A) polymerase in U1A autoregulation and in the coupling of splicing and polyadenylation. Genes Dev, 11, 761-73. GUO, B., ZOU, M., GAN, X. & HE, S. 2010. Genome size evolution in puff erfi sh: an insight from BAC clone-based Diodon holocanthus genome sequencing. BMC Genomics, 11, 396. HAAS, B. J., WORTMAN, J. R., RONNING, C. M., HANNICK, L. I., SMITH, R. K., JR., MAITI, R., CHAN, A. P., YU, C., FARZAD, M., WU, D., WHITE, O. & TOWN, C. D. 2005. Complete reannotation of the Arabidopsis genome: methods, tools, protocols and the fi nal release. BMC Biol, 3, 7. HALL, S. L. & PADGETT, R. A. 1994. Conserved sequences in a class of rare eukaryotic nuclear introns with non-consensus splice sites. J Mol Biol, 239, 357-65. HALL, S. L. & PADGETT, R. A. 1996. Requirement of U12 snRNA for in vivo splicing of a minor class of eukaryotic nuclear pre-mRNA introns. Science, 271, 1716-8. HAN, S. P., TANG, Y. H. & SMITH, R. 2010. Functional diversity of the hnRNPs: past, present and perspectives. Biochem J, 430, 379-92. HANSEN, T. B., JENSEN, T. I., CLAUSEN, B. H., BRAMSEN, J. B., FINSEN, B., DAMGAARD, C. K. & KJEMS, J. 2013. Natural RNA circles function as effi cient microRNA sponges. Nature, 495, 384-8. HASTINGS, M. L. & KRAINER, A. R. 2001. Functions of SR proteins in the U12-dependent AT-AC pre-mRNA splicing pathway. RNA, 7, 471-82. HASTINGS, M. L., RESTA, N., TRAUM, D., STELLA, A., GUANTI, G. & KRAINER, A. R. 2005. An LKB1 AT-AC intron mutation causes Peutz-Jeghers syndrome via splicing at noncanonical cryptic splice sites. Nat Struct Mol Biol, 12, 54-9. HAUGEN, P., SIMON, D. M. & BHATTACHARYA, D. 2005. Th e natural history of group I introns. Trends Genet, 21, 111-9. HE, H., LIYANARACHCHI, S., AKAGI, K., NAGY, R., LI, J., DIETRICH, R. C., LI, W., SEBASTIAN, N., WEN, B., XIN, B., SINGH, J., YAN, P., ALDER, H., HAAN, E., WIECZOREK, D., ALBRECHT, B., PUFFENBERGER, E., WANG, H., WESTMAN, J. A., PADGETT, R. A., SYMER, D. E. & DE LA CHAPELLE, A. 2011. Mutations in U4atac snRNA, a component of the minor spliceosome, in the developmental disorder MOPD I. Science, 332, 238-40. HEINRICHS, V., BACH, M., WINKELMANN, G. & LUHRMANN, R. 1990. U1-specifi c protein C needed for effi cient complex formation of U1 snRNP with a 5´ splice site. Science, 247, 69-72. HENTZE, M. W. & PREISS, T. 2013. Circular RNAs: splicing´s enigma variations. EMBO J, 32, 923-5. HETT, A. & WEST, S. 2014. Inhibition of U4 snRNA in human cells causes the stable retention of polyadenylated pre-mRNA in the nucleus. PLoS One, 9, e96174. HIROSE, T., SHU, M. D. & STEITZ, J. A. 2004. Splicing of U12-type introns deposits an exon junction complex competent to induce nonsense-mediated mRNA decay. Proc Natl Acad Sci U S A, 101, 17976-81. HIRSCHMAN, J. E., BALAKRISHNAN, R., CHRISTIE, K. R., COSTANZO, M. C., DWIGHT, S. S., ENGEL, S. R., FISK, D. G., HONG, E. L., LIVSTONE, M. S., NASH, R., PARK, J., OUGHTRED, R., SKRZYPEK, M., STARR, B., THEESFELD, C. L., WILLIAMS, J., ANDRADA, R., BINKLEY, G., DONG, Q., LANE, C., MIYASATO, S., SETHURAMAN, A., SCHROEDER, M., THANAWALA, M. K., WENG, S., DOLINSKI, K., BOTSTEIN, D. & CHERRY, J. M. 2006. Genome Snapshot: a new resource at the Saccharomyces Genome Database (SGD) presenting an overview of the Saccharomyces cerevisiae genome. Nucleic Acids Res, 34, D442-5. 45 References

HONG, X., SCOFIELD, D. G. & LYNCH, M. 2006. Intron size, abundance, and distribution within untranslated regions of genes. Mol Biol Evol, 23, 2392-404. HUANG, Y. & STEITZ, J. A. 2005. SRprises along a messenger´s journey. Mol Cell, 17, 613-5. HUANG, Y., YARIO, T. A. & STEITZ, J. A. 2004. A molecular link between SR protein dephosphorylation and mRNA export. Proc Natl Acad Sci U S A, 101, 9666-70. INCORVAIA, R. & PADGETT, R. A. 1998. Base pairing with U6atac snRNA is required for 5´ splice site activation of U12-dependent introns in vivo. RNA, 4, 709-18. IRIMIA, M., PENNY, D. & ROY, S. W. 2007. Coevolution of genomic intron number and splice sites. Trends Genet, 23, 321-5. IRIMIA, M. & ROY, S. W. 2014. Origin of spliceosomal introns and alternative splicing. Cold Spring Harb Perspect Biol, 6. IRIMIA, M., ROY, S. W., NEAFSEY, D. E., ABRIL, J. F., GARCIA-FERNANDEZ, J. & KOONIN, E. V. 2009. Complex selection on 5´ splice sites in intron-rich organisms. Genome Res, 19, 2021-7. ISSHIKI, M., YAMAMOTO, Y., SATOH, H. & SHIMAMOTO, K. 2001. Nonsense-mediated decay of mutant waxy mRNA in rice. Plant Physiol, 125, 1388-95. IZAURRALDE, E., LEWIS, J., MCGUIGAN, C., JANKOWSKA, M., DARZYNKIEWICZ, E. & MATTAJ, I. W. 1994. A nuclear cap binding protein complex involved in pre-mRNA splicing. Cell, 78, 657-68. JACKSON, I. J. 1991. A reappraisal of non-consensus mRNA splice sites. Nucleic Acids Res, 19, 3795- 8. JACQUENET, S., MEREAU, A., BILODEAU, P. S., DAMIER, L., STOLTZFUS, C. M. & BRANLANT, C. 2001. A second exon splicing silencer within human immunodefi ciency virus type 1 tat exon 2 represses splicing of Tat mRNA and binds protein hnRNP H. J Biol Chem, 276, 40464-75. JANGI, M., BOUTZ, P. L., PAUL, P. & SHARP, P. A. 2014. Rbfox2 controls autoregulation in RNA- binding protein networks. Genes Dev, 28, 637-51. JEFFARES, D. C., MOURIER, T. & PENNY, D. 2006. Th e biology of intron gain and loss. Trends Genet, 22, 16-22. KAIDA, D., BERG, M. G., YOUNIS, I., KASIM, M., SINGH, L. N., WAN, L. & DREYFUSS, G. 2010. U1 snRNP protects pre-mRNAs from premature cleavage and polyadenylation. Nature, 468, 664-8. KAMBACH, C., WALKE, S., YOUNG, R., AVIS, J. M., DE LA FORTELLE, E., RAKER, V. A., LUHRMANN, R., LI, J. & NAGAI, K. 1999. Crystal structures of two Sm protein complexes and their implications for the assembly of the spliceosomal snRNPs. Cell, 96, 375-87. KANDELS-LEWIS, S. & SERAPHIN, B. 1993. Involvement of U6 snRNA in 5´ splice site selection. Science, 262, 2035-9. KANOPKA, A., MUHLEMANN, O. & AKUSJARVI, G. 1996. Inhibition by SR proteins of splicing of a regulated adenovirus pre-mRNA. Nature, 381, 535-8. KASHIMA, I., YAMASHITA, A., IZUMI, N., KATAOKA, N., MORISHITA, R., HOSHINO, S., OHNO, M., DREYFUSS, G. & OHNO, S. 2006. Binding of a novel SMG-1-Upf1-eRF1-eRF3 complex (SURF) to the exon junction complex triggers Upf1 phosphorylation and nonsense- mediated mRNA decay. Genes Dev, 20, 355-67. KEELING, P. J., CORRADI, N., MORRISON, H. G., HAAG, K. L., EBERT, D., WEISS, L. M., AKIYOSHI, D. E. & TZIPORI, S. 2010. Th e reduced genome of the parasitic microsporidian Enterocytozoon bieneusi lacks genes for core carbon metabolism. Genome Biol Evol, 2, 304-9. KEIGHTLEY, P. D. & GAFFNEY, D. J. 2003. Functional constraints and frequency of deleterious mutations in noncoding DNA of rodents. Proc Natl Acad Sci U S A, 100, 13402-6.

46 References

KELEMEN, O., CONVERTINI, P., ZHANG, Z., WEN, Y., SHEN, M., FALALEEVA, M. & STAMM, S. 2013. Function of alternative splicing. Gene, 514, 1-30. KERTESZ, S., KERENYI, Z., MERAI, Z., BARTOS, I., PALFY, T., BARTA, E. & SILHAVY, D. 2006. Both introns and long 3´-UTRs operate as cis-acting elements to trigger nonsense-mediated decay in plants. Nucleic Acids Res, 34, 6147-57. KHODOR, Y. L., MENET, J. S., TOLAN, M. & ROSBASH, M. 2012. Cotranscriptional splicing effi ciency diff ers dramatically between Drosophila and mouse. RNA, 18, 2174-86. KISHORE, S. & STAMM, S. 2006. Th e snoRNA HBII-52 regulates alternative splicing of the serotonin receptor 2C. Science, 311, 230-2. KOLOSSOVA, I. & PADGETT, R. A. 1997. U11 snRNA interacts in vivo with the 5´ splice site of U12- dependent (AU-AC) pre-mRNA introns. RNA, 3, 227-33. KONARSKA, M. M. & SHARP, P. A. 1987. Interactions between small nuclear ribonucleoprotein particles in formation of spliceosomes. Cell, 49, 763-74. KOONIN, E. V. 2009. Intron-dominated genomes of early ancestors of eukaryotes. J Hered, 100, 618- 23. KORNBLIHTT, A. R. 2005. Promoter usage and alternative splicing. Curr Opin Cell Biol, 17, 262-8. KORNBLIHTT, A. R., DE LA MATA, M., FEDEDA, J. P., MUNOZ, M. J. & NOGUES, G. 2004. Multiple links between transcription and splicing. RNA, 10, 1489-98. KWAK, H., FUDA, N. J., CORE, L. J. & LIS, J. T. 2013. Precise maps of RNA polymerase reveal how promoters direct initiation and pausing. Science, 339, 950-3. KYBURZ, A., FRIEDLEIN, A., LANGEN, H. & KELLER, W. 2006. Direct interactions between subunits of CPSF and the U2 snRNP contribute to the coupling of pre-mRNA 3´ end processing and splicing. Mol Cell, 23, 195-205. LAMBOWITZ, A. M. & ZIMMERLY, S. 2011. Group II introns: mobile ribozymes that invade DNA. Cold Spring Harb Perspect Biol, 3, a003616. LANE, C. E., VAN DEN HEUVEL, K., KOZERA, C., CURTIS, B. A., PARSONS, B. J., BOWMAN, S. & ARCHIBALD, J. M. 2007. Nucleomorph genome of Hemiselmis andersenii reveals complete intron loss and compaction as a driver of protein structure and function. Proc Natl Acad Sci U S A, 104, 19908-13. LAREAU, L. F. & BRENNER, S. E. 2015. Regulation of splicing factors by alternative splicing and NMD is conserved between kingdoms yet evolutionarily fl exible. Mol Biol Evol, 32, 1072-9. LAREAU, L. F., INADA, M., GREEN, R. E., WENGROD, J. C. & BRENNER, S. E. 2007. Unproductive splicing of SR genes associated with highly conserved and ultraconserved DNA elements. Nature, 446, 926-9. LE HIR, H., GATFIELD, D., IZAURRALDE, E. & MOORE, M. J. 2001. Th e exon-exon junction complex provides a binding platform for factors involved in mRNA export and nonsense-mediated mRNA decay. EMBO J, 20, 4987-97. LE HIR, H., IZAURRALDE, E., MAQUAT, L. E. & MOORE, M. J. 2000. Th e spliceosome deposits multiple proteins 20-24 nucleotides upstream of mRNA exon-exon junctions. EMBO J, 19, 6860-9. LEE, R. C., GILL, E. E., ROY, S. W. & FAST, N. M. 2010. Constrained intron structures in a microsporidian. Mol Biol Evol, 27, 1979-82. LEFEBVRE, S., BURGLEN, L., REBOULLET, S., CLERMONT, O., BURLET, P., VIOLLET, L., BENICHOU, B., CRUAUD, C., MILLASSEAU, P., ZEVIANI, M. & ET AL. 1995. Identifi cation and characterization of a spinal muscular atrophy-determining gene. Cell, 80, 155-65.

47 References

LEVINE, A. & DURBIN, R. 2001. A computational scan for U12-dependent introns in the human genome sequence. Nucleic Acids Res, 29, 4006-13. LEWIS, B. P., GREEN, R. E. & BRENNER, S. E. 2003. Evidence for the widespread coupling of alternative splicing and nonsense-mediated mRNA decay in humans. Proc Natl Acad Sci U S A, 100, 189-92. LIN, C. F., MOUNT, S. M., JARMOLOWSKI, A. & MAKALOWSKI, W. 2010. Evolutionary dynamics of U12-type spliceosomal introns. BMC Evol Biol, 10, 47. LIN, S., COUTINHO-MANSFIELD, G., WANG, D., PANDIT, S. & FU, X. D. 2008. Th e splicing factor SC35 has an active role in transcriptional elongation. Nat Struct Mol Biol, 15, 819-26. LIU, Z., LUYTEN, I., BOTTOMLEY, M. J., MESSIAS, A. C., HOUNGNINOU-MOLANGO, S., SPRANGERS, R., ZANIER, K., KRAMER, A. & SATTLER, M. 2001. Structural basis for recognition of the intron branch site RNA by splicing factor 1. Science, 294, 1098-102. LOFTUS, B. J., FUNG, E., RONCAGLIA, P., ROWLEY, D., AMEDEO, P., BRUNO, D., VAMATHEVAN, J., MIRANDA, M., ANDERSON, I. J., FRASER, J. A., ALLEN, J. E., BOSDET, I. E., BRENT, M. R., CHIU, R., DOERING, T. L., DONLIN, M. J., D´SOUZA, C. A., FOX, D. S., GRINBERG, V., FU, J., FUKUSHIMA, M., HAAS, B. J., HUANG, J. C., JANBON, G., JONES, S. J., KOO, H. L., KRZYWINSKI, M. I., KWON-CHUNG, J. K., LENGELER, K. B., MAITI, R., MARRA, M. A., MARRA, R. E., MATHEWSON, C. A., MITCHELL, T. G., PERTEA, M., RIGGS, F. R., SALZBERG, S. L., SCHEIN, J. E., SHVARTSBEYN, A., SHIN, H., SHUMWAY, M., SPECHT, C. A., SUH, B. B., TENNEY, A., UTTERBACK, T. R., WICKES, B. L., WORTMAN, J. R., WYE, N. H., KRONSTAD, J. W., LODGE, J. K., HEITMAN, J., DAVIS, R. W., FRASER, C. M. & HYMAN, R. W. 2005. Th e genome of the basidiomycetous yeast and human pathogen Cryptococcus neoformans. Science, 307, 1321-4. LONG, J. C. & CACERES, J. F. 2009. Th e SR protein family of splicing factors: master regulators of gene expression. Biochem J, 417, 15-27. LOTTI, F., IMLACH, W. L., SAIEVA, L., BECK, E. S., HAO LE, T., LI, D. K., JIAO, W., MENTIS, G. Z., BEATTIE, C. E., MCCABE, B. D. & PELLIZZONI, L. 2012. An SMN-dependent U12 splicing event essential for motor circuit function. Cell, 151, 440-54. LOUHICHI, A., FOURATI, A. & REBAI, A. 2011. IGD: a resource for intronless genes in the human genome. Gene, 488, 35-40. LUO, M. J. & REED, R. 1999. Splicing is required for rapid and effi cient mRNA export in metazoans. Proc Natl Acad Sci U S A, 96, 14937-42. LYKKE-ANDERSEN, J., SHU, M. D. & STEITZ, J. A. 2001. Communication of the position of exon- exon junctions to the mRNA surveillance machinery by the protein RNPS1. Science, 293, 1836-9. LYNCH, M. & RICHARDSON, A. O. 2002. Th e evolution of spliceosomal introns. Curr Opin Genet Dev, 12, 701-10. MADAN, V., KANOJIA, D., LI, J., OKAMOTO, R., SATO-OTSUBO, A., KOHLMANN, A., SANADA, M., GROSSMANN, V., SUNDARESAN, J., SHIRAISHI, Y., MIYANO, S., THOL, F., GANSER, A., YANG, H., HAFERLACH, T., OGAWA, S. & KOEFFLER, H. P. 2015. Aberrant splicing of U12-type introns is the hallmark of ZRSR2 mutant myelodysplastic syndrome. Nat Commun, 6, 6042. MADHANI, H. D. & GUTHRIE, C. 1992. A novel base-pairing interaction between U2 and U6 snRNAs suggests a mechanism for the catalytic activation of the spliceosome. Cell, 71, 803-17. MAJEWSKI, J. & OTT, J. 2002. Distribution and characterization of regulatory elements in the human genome. Genome Res, 12, 1827-36.

48 References

MAKAROVA, O. V., MAKAROV, E. M., LIU, S., VORNLOCHER, H. P. & LUHRMANN, R. 2002. Protein 61K, encoded by a gene (PRPF31) linked to autosomal dominant retinitis pigmentosa, is required for U4/U6*U5 tri-snRNP formation and pre-mRNA splicing. EMBO J, 21, 1148-57. MANIATIS, T. & REED, R. 2002. An extensive network of coupling among gene expression machines. Nature, 416, 499-506. MARCHIONNI, M. & GILBERT, W. 1986. Th e triosephosphate isomerase gene from maize: introns antedate the plant-animal divergence. Cell, 46, 133-41. MARTIN, R. M., RINO, J., CARVALHO, C., KIRCHHAUSEN, T. & CARMO-FONSECA, M. 2013. Live-cell visualization of pre-mRNA splicing with single-molecule sensitivity. Cell Rep, 4, 1144-55. MARTINEZ-CONTRERAS, R., CLOUTIER, P., SHKRETA, L., FISETTE, J. F., REVIL, T. & CHABOT, B. 2007. hnRNP proteins and splicing control. Adv Exp Med Biol, 623, 123-47. MARTINSON, H. G. 2011. An active role for splicing in 3´-end formation. Wiley Interdiscip Rev RNA, 2, 459-70. MASUDA, S., DAS, R., CHENG, H., HURT, E., DORMAN, N. & REED, R. 2005. Recruitment of the human TREX complex to mRNA during splicing. Genes Dev, 19, 1512-7. MATLIN, A. J., CLARK, F. & SMITH, C. W. 2005. Understanding alternative splicing: towards a cellular code. Nat Rev Mol Cell Biol, 6, 386-98. MCCONNELL, T. S., CHO, S. J., FRILANDER, M. J. & STEITZ, J. A. 2002. Branchpoint selection in the splicing of U12-dependent introns in vitro. RNA, 8, 579-86. MCCRACKEN, S., LAMBERMON, M. & BLENCOWE, B. J. 2002. SRm160 splicing coactivator promotes transcript 3´-end cleavage. Mol Cell Biol, 22, 148-60. MCCRAITH, S. M. & PHIZICKY, E. M. 1991. An enzyme from Saccharomyces cerevisiae uses NAD+ to transfer the splice junction 2´-phosphate from ligated tRNA to an acceptor molecule. J Biol Chem, 266, 11986-92. MCNALLY, L. M., YEE, L. & MCNALLY, M. T. 2006. Heterogeneous nuclear ribonucleoprotein H is required for optimal U11 small nuclear ribonucleoprotein binding to a retroviral RNA-processing control element: implications for U12-dependent RNA splicing. J Biol Chem, 281, 2478-88. MEMCZAK, S., JENS, M., ELEFSINIOTI, A., TORTI, F., KRUEGER, J., RYBAK, A., MAIER, L., MACKOWIAK, S. D., GREGERSEN, L. H., MUNSCHAUER, M., LOEWER, A., ZIEBOLD, U., LANDTHALER, M., KOCKS, C., LE NOBLE, F. & RAJEWSKY, N. 2013. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature, 495, 333-8. MERICO, D., ROIFMAN, M., BRAUNSCHWEIG, U., YUEN, R. K., ALEXANDROVA, R., BATES, A., REID, B., NALPATHAMKALAM, T., WANG, Z., THIRUVAHINDRAPURAM, B., GRAY, P., KAKAKIOS, A., PEAKE, J., HOGARTH, S., MANSON, D., BUNCIC, R., PEREIRA, S. L., HERBRICK, J. A., BLENCOWE, B. J., ROIFMAN, C. M. & SCHERER, S. W. 2015. Compound heterozygous mutations in the noncoding RNU4ATAC cause Roifman Syndrome by disrupting minor intron splicing. Nat Commun, 6, 8718. MILLEVOI, S., LOULERGUE, C., DETTWILER, S., KARAA, S. Z., KELLER, W., ANTONIOU, M. & VAGNER, S. 2006. An interaction between U2AF 65 and CF I(m) links the splicing and 3´ end processing machineries. EMBO J, 25, 4854-64. MISTELI, T. 1999. RNA splicing: What has phosphorylation got to do with it? Curr Biol, 9, R198-200. MISTELI, T., CACERES, J. F. & SPECTOR, D. L. 1997. Th e dynamics of a pre-mRNA splicing factor in living cells. Nature, 387, 523-7. MISTELI, T. & SPECTOR, D. L. 1999. RNA polymerase II targets pre-mRNA splicing factors to transcription sites in vivo. Mol Cell, 3, 697-705.

49 References

MONTZKA, K. A. & STEITZ, J. A. 1988. Additional low-abundance human small nuclear ribonucleoproteins: U11, U12, etc. Proc Natl Acad Sci U S A, 85, 8885-9. MORRISON, H. G., MCARTHUR, A. G., GILLIN, F. D., ALEY, S. B., ADAM, R. D., OLSEN, G. J., BEST, A. A., CANDE, W. Z., CHEN, F., CIPRIANO, M. J., DAVIDS, B. J., DAWSON, S. C., ELMENDORF, H. G., HEHL, A. B., HOLDER, M. E., HUSE, S. M., KIM, U. U., LASEK- NESSELQUIST, E., MANNING, G., NIGAM, A., NIXON, J. E., PALM, D., PASSAMANECK, N. E., PRABHU, A., REICH, C. I., REINER, D. S., SAMUELSON, J., SVARD, S. G. & SOGIN, M. L. 2007. Genomic minimalism in the early diverging intestinal parasite Giardia lamblia. Science, 317, 1921-6. MOUNT, S. M., PETTERSSON, I., HINTERBERGER, M., KARMAS, A. & STEITZ, J. A. 1983. Th e U1 small nuclear RNA-protein complex selectively binds a 5´ splice site in vitro. Cell, 33, 509-18. MUHLRAD, D. & PARKER, R. 1999. Aberrant mRNAs with extended 3´ UTRs are substrates for rapid degradation by mRNA surveillance. RNA, 5, 1299-307. NAGY, E. & MAQUAT, L. E. 1998. A rule for termination-codon position within intron-containing genes: when nonsense aff ects RNA abundance. Trends Biochem Sci, 23, 198-9. NAGY, R., WANG, H., ALBRECHT, B., WIECZOREK, D., GILLESSEN-KAESBACH, G., HAAN, E., MEINECKE, P., DE LA CHAPELLE, A. & WESTMAN, J. A. 2012. Microcephalic osteodysplastic primordial dwarfi sm type I with biallelic mutations in the RNU4ATAC gene. Clin Genet, 82, 140-6. NEUENKIRCHEN, N., CHARI, A. & FISCHER, U. 2008. Deciphering the assembly pathway of Sm-class U snRNPs. FEBS Lett, 582, 1997-2003. NEWMAN, A. J. & NORMAN, C. 1992. U5 snRNA interacts with exon sequences at 5´ and 3´ splice sites. Cell, 68, 743-54. NGUYEN, T. H., GALEJ, W. P., BAI, X. C., SAVVA, C. G., NEWMAN, A. J., SCHERES, S. H. & NAGAI, K. 2015. Th e architecture of the spliceosomal U4/U6.U5 tri-snRNP. Nature, 523, 47-52. NI, J. Z., GRATE, L., DONOHUE, J. P., PRESTON, C., NOBIDA, N., O´BRIEN, G., SHIUE, L., CLARK, T. A., BLUME, J. E. & ARES, M., JR. 2007. Ultraconserved elements are associated with homeostatic control of splicing regulators by alternative splicing and nonsense-mediated decay. Genes Dev, 21, 708-18. NIELSEN, H. & JOHANSEN, S. D. 2009. Group I introns: Moving in new directions. RNA Biol, 6, 375-83. NIEMELÄ, E. H. & FRILANDER, M. J. 2014. Regulation of gene expression through ineffi cient splicing of U12-type introns. RNA Biol, 11, 1325-9. NIEMELÄ, E. H., OGHABIAN, A., STAALS, R. H., GRECO, D., PRUIJN, G. J. & FRILANDER, M. J. 2014. Global analysis of the nuclear processing of transcripts with unspliced U12-type introns by the exosome. Nucleic Acids Res, 42, 7358-69. NIGRO, J. M., CHO, K. R., FEARON, E. R., KERN, S. E., RUPPERT, J. M., OLINER, J. D., KINZLER, K. W. & VOGELSTEIN, B. 1991. Scrambled exons. Cell, 64, 607-13. NILSEN, T. W. 2002. Th e spliceosome: no assembly required? Mol Cell, 9, 8-9. NILSEN, T. W. 2003. Th e spliceosome: the most complex macromolecular machine in the cell? Bioessays, 25, 1147-9. NILSEN, T. W. & GRAVELEY, B. R. 2010. Expansion of the eukaryotic proteome by alternative splicing. Nature, 463, 457-63. NIWA, M. & BERGET, S. M. 1991. Mutation of the AAUAAA polyadenylation signal depresses in vitro splicing of proximal but not distal introns. Genes Dev, 5, 2086-95.

50 References

NIWA, M., ROSE, S. D. & BERGET, S. M. 1990. In vitro polyadenylation is stimulated by the presence of an upstream intron. Genes Dev, 4, 1552-9. NOTTROTT, S., URLAUB, H. & LUHRMANN, R. 2002. Hierarchical, clustered protein interactions with U4/U6 snRNA: a biochemical role for U4/U6 proteins. EMBO J, 21, 5527-38. OGDEN, R. C., LEE, M. C. & KNAPP, G. 1984. Transfer RNA splicing in Saccharomyces cerevisiae: defi ning the substrates. Nucleic Acids Res, 12, 9367-82. OMILIAN, A. R., SCOFIELD, D. G. & LYNCH, M. 2008. Intron presence-absence polymorphisms in Daphnia. Mol Biol Evol, 25, 2129-39. PALMER, J. D. & LOGSDON, J. M., JR. 1991. Th e recent origins of introns. Curr Opin Genet Dev, 1, 470-7. PAN, Q., SHAI, O., LEE, L. J., FREY, B. J. & BLENCOWE, B. J. 2008. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet, 40, 1413-5. PARK, E. J., KIM, J. H., SEONG, R. H., KIM, C. G., PARK, S. D. & HONG, S. H. 1999. Characterization of a novel mouse cDNA, ES18, involved in apoptotic cell death of T-cells. Nucleic Acids Res, 27, 1524-30. PATEL, A. A., MCCARTHY, M. & STEITZ, J. A. 2002. Th e splicing of U12-type introns can be a rate- limiting step in gene expression. EMBO J, 21, 3804-15. PATEL, A. A. & STEITZ, J. A. 2003. Splicing double: insights from the second spliceosome. Nat Rev Mol Cell Biol, 4, 960-70. PEEBLES, C. L., PERLMAN, P. S., MECKLENBURG, K. L., PETRILLO, M. L., TABOR, J. H., JARRELL, K. A. & CHENG, H. L. 1986. A self-splicing RNA excises an intron lariat. Cell, 44, 213- 23. PELLIZZONI, L. 2007. Chaperoning ribonucleoprotein biogenesis in health and disease. EMBO Rep, 8, 340-5. PESSA, H. K. & FRILANDER, M. J. 2011. Genetics. Minor splicing, disrupted. Science, 332, 184-5. PESSA, H. K., RUOKOLAINEN, A. & FRILANDER, M. J. 2006. Th e abundance of the spliceosomal snRNPs is not limiting the splicing of U12-type introns. RNA, 12, 1883-92. PETERSON, M. L. 2011. Immunoglobulin heavy chain gene regulation through polyadenylation and splicing competition. Wiley Interdiscip Rev RNA, 2, 92-105. PRAKASH, T., SHARMA, V. K., ADATI, N., OZAWA, R., KUMAR, N., NISHIDA, Y., FUJIKAKE, T., TAKEDA, T. & TAYLOR, T. D. 2010. Expression of conjoined genes: another mechanism for gene regulation in eukaryotes. PLoS One, 5, e13284. PURUGGANAN, M. & WESSLER, S. 1992. Th e splicing of transposable elements and its role in intron evolution. Genetica, 86, 295-303. PYLE, A. M. 2010. Th e tertiary structure of group II introns: implications for biological function and evolution. Crit Rev Biochem Mol Biol, 45, 215-32. QUERY, C. C., MOORE, M. J. & SHARP, P. A. 1994. Branch nucleophile selection in pre-mRNA splicing: evidence for the bulged duplex model. Genes Dev, 8, 587-97. RANGAN, P., MASQUIDA, B., WESTHOF, E. & WOODSON, S. A. 2004. Architecture and folding mechanism of the Azoarcus Group I Pre-tRNA. J Mol Biol, 339, 41-51. REARICK, D., PRAKASH, A., MCSWEENY, A., SHEPARD, S. S., FEDOROVA, L. & FEDOROV, A. 2011. Critical association of ncRNA with introns. Nucleic Acids Res, 39, 2357-66.

51 References

REBBAPRAGADA, I. & LYKKE-ANDERSEN, J. 2009. Execution of nonsense-mediated mRNA decay: what defi nes a substrate? Curr Opin Cell Biol, 21, 394-402. REED, R. & HURT, E. 2002. A conserved mRNA export machinery coupled to pre-mRNA splicing. Cell, 108, 523-31. REYES, V. M. & ABELSON, J. 1988. Substrate recognition and splice site determination in yeast tRNA splicing. Cell, 55, 719-30. ROBBERSON, B. L., COTE, G. J. & BERGET, S. M. 1990. Exon defi nition may facilitate splice site selection in RNAs with multiple exons. Mol Cell Biol, 10, 84-94. ROGERS, J. H. 1989. How were introns inserted into nuclear genes? Trends Genet, 5, 213-6. ROGOZIN, I. B., WOLF, Y. I., SOROKIN, A. V., MIRKIN, B. G. & KOONIN, E. V. 2003. Remarkable interkingdom conservation of intron positions and massive, lineage-specifi c intron loss and gain in eukaryotic evolution. Curr Biol, 13, 1512-7. ROSCIGNO, R. F. & GARCIA-BLANCO, M. A. 1995. SR proteins escort the U4/U6.U5 tri-snRNP to the spliceosome. RNA, 1, 692-706. ROSEL-HILLGARTNER, T. D., HUNG, L. H., KHRAMEEVA, E., LE QUERREC, P., GELFAND, M. S. & BINDEREIF, A. 2013. A novel intra-U1 snRNP cross-regulation mechanism: alternative splicing switch links U1C and U1-70K expression. PLoS Genet, 9, e1003856. RUSKIN, B., KRAINER, A. R., MANIATIS, T. & GREEN, M. R. 1984. Excision of an intact intron as a novel lariat structure during pre-mRNA splicing in vitro. Cell, 38, 317-31. RUSKIN, B., ZAMORE, P. D. & GREEN, M. R. 1988. A factor, U2AF, is required for U2 snRNP binding and splicing complex assembly. Cell, 52, 207-19. RUSSELL, A. G., CHARETTE, J. M., SPENCER, D. F. & GRAY, M. W. 2006. An early evolutionary origin for the minor spliceosome. Nature, 443, 863-6. SABARINADH, C., SUBRAMANIAN, S., TRIPATHI, A. & MISHRA, R. K. 2004. Extreme conservation of noncoding DNA near HoxD complex of vertebrates. BMC Genomics, 5, 75. SAKHARKAR, M. K., CHOW, V. T. & KANGUEANE, P. 2004. Distributions of exons and introns in the human genome. In Silico Biol, 4, 387-93. SAKHARKAR, M. K., PERUMAL, B. S., SAKHARKAR, K. R. & KANGUEANE, P. 2005. An analysis on gene architecture in human and mouse genomes. In Silico Biol, 5, 347-65. SALTZMAN, A. L., KIM, Y. K., PAN, Q., FAGNANI, M. M., MAQUAT, L. E. & BLENCOWE, B. J. 2008. Regulation of multiple core spliceosomal proteins by alternative splicing-coupled nonsense- mediated mRNA decay. Mol Cell Biol, 28, 4320-30. SANTORO, B., DE GREGORIO, E., CAFFARELLI, E. & BOZZONI, I. 1994. RNA-protein interactions in the nuclei of Xenopus oocytes: complex formation and processing activity on the regulatory intron of ribosomal protein gene L1. Mol Cell Biol, 14, 6975-82. SCHMID, M. & JENSEN, T. H. 2010. Nuclear quality control of RNA polymerase II transcripts. Wiley Interdiscip Rev RNA, 1, 474-85. SCHNEIDER, C., WILL, C. L., MAKAROVA, O. V., MAKAROV, E. M. & LUHRMANN, R. 2002. Human U4/U6.U5 and U4atac/U6atac.U5 tri-snRNPs exhibit similar protein compositions. Mol Cell Biol, 22, 3219-29. SCHOR, I. E., GOMEZ ACUNA, L. I. & KORNBLIHTT, A. R. 2013. Coupling between transcription and alternative splicing. Cancer Treat Res, 158, 1-24. SCHWARTZ, S., MESHORER, E. & AST, G. 2009. Chromatin organization marks exon-intron structure. Nat Struct Mol Biol, 16, 990-5.

52 References

SCHWARZ, E. M., ANTOSHECHKIN, I., BASTIANI, C., BIERI, T., BLASIAR, D., CANARAN, P., CHAN, J., CHEN, N., CHEN, W. J., DAVIS, P., FIEDLER, T. J., GIRARD, L., HARRIS, T. W., KENNY, E. E., KISHORE, R., LAWSON, D., LEE, R., MULLER, H. M., NAKAMURA, C., OZERSKY, P., PETCHERSKI, A., ROGERS, A., SPOONER, W., TULI, M. A., VAN AUKEN, K., WANG, D., DURBIN, R., SPIETH, J., STEIN, L. D. & STERNBERG, P. W. 2006. WormBase: better soft ware, richer content. Nucleic Acids Res, 34, D475-8. SERAPHIN, B., KRETZNER, L. & ROSBASH, M. 1988. A U1 snRNA:pre-mRNA base pairing interaction is required early in yeast spliceosome assembly but does not uniquely defi ne the 5´ cleavage site. EMBO J, 7, 2533-8. SHARP, P. A. 1985. On the origin of RNA splicing and introns. Cell, 42, 397-400. SHARP, P. A. & BURGE, C. B. 1997. Classifi cation of introns: U2-type or U12-type. Cell, 91, 875-9. SHEN, H. & GREEN, M. R. 2007. RS domain-splicing signal interactions in splicing of U12-type and U2-type introns. Nat Struct Mol Biol, 14, 597-603. SHEN, H., ZHENG, X., LUECKE, S. & GREEN, M. R. 2010. Th e U2AF35-related protein Urp contacts the 3´ splice site to promote U12-type intron splicing and the second step of U2-type intron splicing. Genes Dev, 24, 2389-94. SHETH, N., ROCA, X., HASTINGS, M. L., ROEDER, T., KRAINER, A. R. & SACHIDANANDAM, R. 2006. Comprehensive splice-site analysis using comparative genomics. Nucleic Acids Res, 34, 3955-67. SHIIMORI, M., INOUE, K. & SAKAMOTO, H. 2013. A specifi c set of exon junction complex subunits is required for the nuclear retention of unspliced RNAs in Caenorhabditis elegans. Mol Cell Biol, 33, 444-56. SHIN, C., FENG, Y. & MANLEY, J. L. 2004. Dephosphorylated SRp38 acts as a splicing repressor in response to heat shock. Nature, 427, 553-8. SHUKLA, G. C. & PADGETT, R. A. 2002. A catalytically active group II intron domain 5 can function in the U12-dependent spliceosome. Mol Cell, 9, 1145-50. SILVA, A. L., RIBEIRO, P., INACIO, A., LIEBHABER, S. A. & ROMAO, L. 2008. Proximity of the poly(A)-binding protein to a premature termination codon inhibits mammalian nonsense- mediated mRNA decay. RNA, 14, 563-76. SINGH, G., REBBAPRAGADA, I. & LYKKE-ANDERSEN, J. 2008. A competition between stimulators and antagonists of Upf complex recruitment governs human nonsense-mediated mRNA decay. PLoS Biol, 6, e111. SINGH, J. & PADGETT, R. A. 2009. Rates of in situ transcription and splicing in large human genes. Nat Struct Mol Biol, 16, 1128-33. SKOTHEIM, R. I. & NEES, M. 2007. Alternative splicing in cancer: noise, functional, or systematic? Int J Biochem Cell Biol, 39, 1432-49. SMITH, J. E. & BAKER, K. E. 2015. Nonsense-mediated RNA decay--a switch and dial for regulating gene expression. Bioessays, 37, 612-23. SONTHEIMER, E. J. & STEITZ, J. A. 1993. Th e U5 and U6 small nuclear RNAs as active site components of the spliceosome. Science, 262, 1989-96. SPAFFORD, J. D., SPENCER, A. N. & GALLIN, W. J. 1998. A putative voltage-gated sodium channel alpha subunit (PpSCN1) from the hydrozoan jellyfi sh, Polyorchis penicillatus: structural comparisons and evolutionary considerations. Biochem Biophys Res Commun, 244, 772-80. SPELLMAN, R., LLORIAN, M. & SMITH, C. W. 2007. Crossregulation and functional redundancy between the splicing regulator PTB and its paralogs nPTB and ROD1. Mol Cell, 27, 420-34.

53 References

SPILLER, M. P., BOON, K. L., REIJNS, M. A. & BEGGS, J. D. 2007. Th e Lsm2-8 complex determines nuclear localization of the spliceosomal U6 snRNA. Nucleic Acids Res, 35, 923-9. SPILUTTINI, B., GU, B., BELAGAL, P., SMIRNOVA, A. S., NGUYEN, V. T., HEBERT, C., SCHMIDT, U., BERTRAND, E., DARZACQ, X. & BENSAUDE, O. 2010. Splicing-independent recruitment of U1 snRNP to a transcription unit in living cells. J Cell Sci, 123, 2085-93. STERNER, D. A., CARLO, T. & BERGET, S. M. 1996. Architectural limits on split genes. Proc Natl Acad Sci U S A, 93, 15081-5. STOLTZFUS, A. 1999. On the possibility of constructive neutral evolution. J Mol Evol, 49, 169-81. STRASSER, K., MASUDA, S., MASON, P., PFANNSTIEL, J., OPPIZZI, M., RODRIGUEZ- NAVARRO, S., RONDON, A. G., AGUILERA, A., STRUHL, K., REED, R. & HURT, E. 2002. TREX is a conserved complex coupling transcription with messenger RNA export. Nature, 417, 304-8. SUN, S., ZHANG, Z., SINHA, R., KARNI, R. & KRAINER, A. R. 2010. SF2/ASF autoregulation involves multiple layers of post-transcriptional and translational control. Nat Struct Mol Biol, 17, 306-12. SUREAU, A., GATTONI, R., DOOGHE, Y., STEVENIN, J. & SORET, J. 2001. SC35 autoregulates its expression by promoting splicing events that destabilize its mRNAs. EMBO J, 20, 1785-96. SZCZESNIAK, M. W., CIOMBOROWSKA, J., NOWAK, W., ROGOZIN, I. B. & MAKALOWSKA, I. 2011. Primate and rodent specifi c intron gains and the origin of retrogenes with splice variants. Mol Biol Evol, 28, 33-7. TAKEMURA, R., TAKEIWA, T., TANIGUCHI, I., MCCLOSKEY, A. & OHNO, M. 2011. Multiple factors in the early splicing complex are involved in the nuclear retention of pre-mRNAs in mammalian cells. Genes Cells, 16, 1035-49. TALERICO, M. & BERGET, S. M. 1994. Intron defi nition in splicing of small Drosophila introns. Mol Cell Biol, 14, 3434-45. TANGE, T. O., SHIBUYA, T., JURICA, M. S. & MOORE, M. J. 2005. Biochemical analysis of the EJC reveals two new factors and a stable tetrameric protein core. RNA, 11, 1869-83. TARN, W. Y. & STEITZ, J. A. 1996a. Highly diverged U4 and U6 small nuclear RNAs required for splicing rare AT-AC introns. Science, 273, 1824-32. TARN, W. Y. & STEITZ, J. A. 1996b. A novel spliceosome containing U11, U12, and U5 snRNPs excises a minor class (AT-AC) intron in vitro. Cell, 84, 801-11. TAZI, J., BAKKOUR, N. & STAMM, S. 2009. Alternative splicing and disease. Biochim Biophys Acta, 1792, 14-26. THAPAR, R. 2015. Roles of Prolyl Isomerases in RNA-Mediated Gene Expression. Biomolecules, 5, 974-99. THOMPSON, L. D. & DANIELS, C. J. 1988. A tRNA(Trp) intron endonuclease from Halobacterium volcanii. Unique substrate recognition properties. J Biol Chem, 263, 17951-9. TIDOW, H., ANDREEVA, A., RUTHERFORD, T. J. & FERSHT, A. R. 2009. Solution structure of the U11-48K CHHC zinc-fi nger domain that specifi cally binds the 5´ splice site of U12-type introns. Structure, 17, 294-302. TOOR, N., KEATING, K. S., TAYLOR, S. D. & PYLE, A. M. 2008. Crystal structure of a self-spliced group II intron. Science, 320, 77-82. TROTTA, C. R., MIAO, F., ARN, E. A., STEVENS, S. W., HO, C. K., RAUHUT, R. & ABELSON, J. N. 1997. Th e yeast tRNA splicing endonuclease: a tetrameric enzyme with two active site subunits homologous to the archaeal tRNA endonucleases. Cell, 89, 849-58.

54 References

TRUONG, D. M., SIDOTE, D. J., RUSSELL, R. & LAMBOWITZ, A. M. 2013. Enhanced group II intron retrohoming in magnesium-defi cient Escherichia coli via selection of mutations in the ribozyme core. Proc Natl Acad Sci U S A, 110, E3800-9. TURUNEN, J. J., NIEMELA, E. H., VERMA, B. & FRILANDER, M. J. 2013a. Th e signifi cant other: splicing by the minor spliceosome. Wiley Interdiscip Rev RNA, 4, 61-76. TURUNEN, J. J., VERMA, B., NYMAN, T. A. & FRILANDER, M. J. 2013b. HnRNPH1/H2, U1 snRNP, and U11 snRNP cooperate to regulate the stability of the U11-48K pre-mRNA. RNA, 19, 380-9. TURUNEN, J. J., WILL, C. L., GROTE, M., LUHRMANN, R. & FRILANDER, M. J. 2008. Th e U11- 48K protein contacts the 5´ splice site of U12-type introns and the U11-59K protein. Mol Cell Biol, 28, 3548-60. URRUTIA, A. O. & HURST, L. D. 2003. Th e signature of selection mediated by expression on human genes. Genome Res, 13, 2260-4. VAGNER, S., RUEGSEGGER, U., GUNDERSON, S. I., KELLER, W. & MATTAJ, I. W. 2000a. Position-dependent inhibition of the cleavage step of pre-mRNA 3´-end processing by U1 snRNP. RNA, 6, 178-88. VAGNER, S., VAGNER, C. & MATTAJ, I. W. 2000b. Th e carboxyl terminus of vertebrate poly(A) polymerase interacts with U2AF 65 to couple 3´-end processing and splicing. Genes Dev, 14, 403- 13. WAHLE, E. & KUHN, U. 1997. Th e mechanism of 3´ cleavage and polyadenylation of eukaryotic pre- mRNA. Prog Nucleic Acid Res Mol Biol, 57, 41-71. VALADKHAN, S. & JALADAT, Y. 2010. Th e spliceosomal proteome: at the heart of the largest cellular ribonucleoprotein machine. Proteomics, 10, 4128-41. VALCARCEL, J., GAUR, R. K., SINGH, R. & GREEN, M. R. 1996. Interaction of U2AF65 RS region with pre-mRNA branch point and promotion of base pairing with U2 snRNA [corrected]. Science, 273, 1706-9. VALENZUELA, P., VENEGAS, A., WEINBERG, F., BISHOP, R. & RUTTER, W. J. 1978. Structure of yeast phenylalanine-tRNA genes: an intervening DNA segment within the region coding for the tRNA. Proc Natl Acad Sci U S A, 75, 190-4. WANG, Z. & BURGE, C. B. 2008. Splicing regulation: from a parts list of regulatory elements to an integrated splicing code. RNA, 14, 802-13. VARGAS, D. Y., SHAH, K., BATISH, M., LEVANDOSKI, M., SINHA, S., MARRAS, S. A., SCHEDL, P. & TYAGI, S. 2011. Single-molecule imaging of transcriptionally coupled and uncoupled splicing. Cell, 147, 1054-65. WASSARMAN, K. M. & STEITZ, J. A. 1992. Th e low-abundance U11 and U12 small nuclear ribonucleoproteins (snRNPs) interact to form a two-snRNP complex. Mol Cell Biol, 12, 1276-85. WICKENS, M., ANDERSON, P. & JACKSON, R. J. 1997. Life and death in the cytoplasm: messages from the 3´ end. Curr Opin Genet Dev, 7, 220-32. WILL, C. L., BEHRENS, S. E. & LUHRMANN, R. 1993. Protein composition of mammalian spliceosomal snRNPs. Mol Biol Rep, 18, 121-6. WILL, C. L. & LÜHRMANN, R. 2011. Spliceosome structure and function. Cold Spring Harb Perspect Biol, 3. WILL, C. L., SCHNEIDER, C., HOSSBACH, M., URLAUB, H., RAUHUT, R., ELBASHIR, S., TUSCHL, T. & LUHRMANN, R. 2004. Th e human 18S U11/U12 snRNP contains a set of novel proteins not found in the U2-dependent spliceosome. RNA, 10, 929-41.

55 References

WILL, C. L., SCHNEIDER, C., REED, R. & LUHRMANN, R. 1999. Identifi cation of both shared and distinct proteins in the major and minor spliceosomes. Science, 284, 2003-5. WILL, C. L., URLAUB, H., ACHSEL, T., GENTZEL, M., WILM, M. & LUHRMANN, R. 2002. Characterization of novel SF3b and 17S U2 snRNP proteins, including a human Prp5p homologue and an SF3b DEAD-box protein. EMBO J, 21, 4978-88. WOOD, V., GWILLIAM, R., RAJANDREAM, M. A., LYNE, M., LYNE, R., STEWART, A., SGOUROS, J., PEAT, N., HAYLES, J., BAKER, S., BASHAM, D., BOWMAN, S., BROOKS, K., BROWN, D., BROWN, S., CHILLINGWORTH, T., CHURCHER, C., COLLINS, M., CONNOR, R., CRONIN, A., DAVIS, P., FELTWELL, T., FRASER, A., GENTLES, S., GOBLE, A., HAMLIN, N., HARRIS, D., HIDALGO, J., HODGSON, G., HOLROYD, S., HORNSBY, T., HOWARTH, S., HUCKLE, E. J., HUNT, S., JAGELS, K., JAMES, K., JONES, L., JONES, M., LEATHER, S., MCDONALD, S., MCLEAN, J., MOONEY, P., MOULE, S., MUNGALL, K., MURPHY, L., NIBLETT, D., ODELL, C., OLIVER, K., O´NEIL, S., PEARSON, D., QUAIL, M. A., RABBINOWITSCH, E., RUTHERFORD, K., RUTTER, S., SAUNDERS, D., SEEGER, K., SHARP, S., SKELTON, J., SIMMONDS, M., SQUARES, R., SQUARES, S., STEVENS, K., TAYLOR, K., TAYLOR, R. G., TIVEY, A., WALSH, S., WARREN, T., WHITEHEAD, S., WOODWARD, J., VOLCKAERT, G., AERT, R., ROBBEN, J., GRYMONPREZ, B., WELTJENS, I., VANSTREELS, E., RIEGER, M., SCHAFER, M., MULLER- AUER, S., GABEL, C., FUCHS, M., DUSTERHOFT, A., FRITZC, C., HOLZER, E., MOESTL, D., HILBERT, H., BORZYM, K., LANGER, I., BECK, A., LEHRACH, H., REINHARDT, R., POHL, T. M., EGER, P., ZIMMERMANN, W., WEDLER, H., WAMBUTT, R., PURNELLE, B., GOFFEAU, A., CADIEU, E., DREANO, S., GLOUX, S., et al. 2002. Th e genome sequence of Schizosaccharomyces pombe. Nature, 415, 871-80. WOODSON, S. A. 2005. Structure and assembly of group I introns. Curr Opin Struct Biol, 15, 324-30. WU, J. & MANLEY, J. L. 1989. Mammalian pre-mRNA branch site selection by U2 snRNP involves base pairing. Genes Dev, 3, 1553-61. WU, J., XIAO, J., WANG, L., ZHONG, J., YIN, H., WU, S., ZHANG, Z. & YU, J. 2013. Systematic analysis of intron size and abundance parameters in diverse lineages. Sci China Life Sci, 56, 968-74. WU, J. A. & MANLEY, J. L. 1991. Base pairing between U2 and U6 snRNAs is necessary for splicing of a mammalian pre-mRNA. Nature, 352, 818-21. WU, J. Y. & MANIATIS, T. 1993. Specifi c interactions between proteins implicated in splice site selection and regulated alternative splicing. Cell, 75, 1061-70. WU, Q. & KRAINER, A. R. 1996. U1-mediated exon defi nition interactions between AT-AC and GT-AG introns. Science, 274, 1005-8. WU, Q. & KRAINER, A. R. 1999. AT-AC pre-mRNA splicing mechanisms and conservation of minor introns in voltage-gated ion channel genes. Mol Cell Biol, 19, 3225-36. YOSHIDA, K., SANADA, M., SHIRAISHI, Y., NOWAK, D., NAGATA, Y., YAMAMOTO, R., SATO, Y., SATO-OTSUBO, A., KON, A., NAGASAKI, M., CHALKIDIS, G., SUZUKI, Y., SHIOSAKA, M., KAWAHATA, R., YAMAGUCHI, T., OTSU, M., OBARA, N., SAKATA-YANAGIMOTO, M., ISHIYAMA, K., MORI, H., NOLTE, F., HOFMANN, W. K., MIYAWAKI, S., SUGANO, S., HAFERLACH, C., KOEFFLER, H. P., SHIH, L. Y., HAFERLACH, T., CHIBA, S., NAKAUCHI, H., MIYANO, S. & OGAWA, S. 2011. Frequent pathway mutations of splicing machinery in myelodysplasia. Nature, 478, 64-9. YOSHIHISA, T. 2014. Handling tRNA introns, archaeal way and eukaryotic way. Front Genet, 5, 213. YOUNIS, I., DITTMAR, K., WANG, W., FOLEY, S. W., BERG, M. G., HU, K. Y., WEI, Z., WAN, L. & DREYFUSS, G. 2013. Minor introns are embedded molecular switches regulated by highly unstable U6atac snRNA. Elife, 2, e00780. YU, P., MA, D. & XU, M. 2005. Nested genes in the human genome. Genomics, 86, 414-22.

56 References

YU, Y. T. & STEITZ, J. A. 1997. Site-specifi c crosslinking of mammalian U11 and u6atac to the 5´ splice site of an AT-AC intron. Proc Natl Acad Sci U S A, 94, 6030-5. YURYEV, A., PATTURAJAN, M., LITINGTUNG, Y., JOSHI, R. V., GENTILE, C., GEBARA, M. & CORDEN, J. L. 1996. Th e C-terminal domain of the largest subunit of RNA polymerase II interacts with a novel set of serine/arginine-rich proteins. Proc Natl Acad Sci U S A, 93, 6975-80. ZAHLER, A. M., DAMGAARD, C. K., KJEMS, J. & CAPUTI, M. 2004. SC35 and heterogeneous nuclear ribonucleoprotein A/B proteins bind to a juxtaposed exonic splicing enhancer/exonic splicing silencer element to regulate HIV-1 tat exon 2 splicing. J Biol Chem, 279, 10077-84. ZAMORE, P. D. & GREEN, M. R. 1989. Identifi cation, purifi cation, and biochemical characterization of U2 small nuclear ribonucleoprotein auxiliary factor. Proc Natl Acad Sci U S A, 86, 9243-7. ZHANG, C., LI, W. H., KRAINER, A. R. & ZHANG, M. Q. 2008a. RNA landscape of evolution for optimal exon and intron discrimination. Proc Natl Acad Sci U S A, 105, 5797-802. ZHANG, Z., LOTTI, F., DITTMAR, K., YOUNIS, I., WAN, L., KASIM, M. & DREYFUSS, G. 2008b. SMN defi ciency causes tissue-specifi c perturbations in the repertoire of snRNAs and widespread defects in splicing. Cell, 133, 585-600. ZHUANG, Y. & WEINER, A. M. 1986. A compensatory base change in U1 snRNA suppresses a 5´ splice site mutation. Cell, 46, 827-35. ZIMMERLY, S., GUO, H., PERLMAN, P. S. & LAMBOWITZ, A. M. 1995. Group II intron mobility occurs by target DNA-primed reverse transcription. Cell, 82, 545-54. ZORIO, D. A. & BLUMENTHAL, T. 1999. Both subunits of U2AF recognize the 3´ splice site in Caenorhabditis elegans. Nature, 402, 835-8. ÄNKÖ, M. L., MULLER-MCNICOLL, M., BRANDL, H., CURK, T., GORUP, C., HENRY, I., ULE, J. & NEUGEBAUER, K. M. 2012. Th e RNA-binding landscapes of two SR proteins reveal unique functions and binding to diverse RNA classes. Genome Biol, 13, R17.

57