Lariat sequencing in a unicellular yeast identifies regulated alternative splicing of exons that are evolutionarily conserved with humans

Ali R. Awan, Amanda Manfredo, and Jeffrey A. Pleiss1

Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853

Edited by Michael Rosbash, Howard Hughes Medical Institute, Brandeis University, Waltham, MA, and approved June 26, 2013 (received for review October 24, 2012)

Alternative splicing is a potent regulator of gene expression that vastly increases proteomic diversity in multicellular eukaryotes and is associated with organismal complexity. Although alternative splicing is widespread in vertebrates, little is known about the evolutionary origins of this process, in part because of the absence of phylogenetically conserved events that cross major eukaryotic clades. Here we describe a lariat-sequencing approach, which offers high sensitivity for detecting splicing events, and its application to the unicellular fungus, Schizosaccharomyces pombe, an organism that shares many of the hallmarks of alternative splicing in mammalian systems but for which no previous examples of exon-skipping had been demonstrated. Over 200 previously unannotated splicing events were identified, including examples of regulated alternative splicing. Remarkably, an evolutionary analysis of four of the exons identified here as subject to skipping in S. pombe reveals high sequence conservation and perfect length conservation with their homologs in scores of plants, animals, and fungi. Moreover, alternative splicing of two of these exons has been documented in multiple vertebrate organisms, making these the first demonstrations of identical alternative-splicing patterns in species that are separated by over 1 billion y of evolution.

pre-mRNA splicing | post-transcriptional gene regulation | phylogeny

Author contributions: A.R.A., A.M., and J.A.P. designed research; A.R.A. and A.M. performed research; A.R.A. contributed new reagents/analytic tools; A.R.A., A.M., and J.A.P. analyzed data; and A.R.A. and J.A.P. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Data deposition: The data reported in this paper have been deposited in the Gene Expression Omnibus (GEO) database, www.ncbi.nlm.nih.gov/geo (accession no. GSE48594).
1To whom correspondence should be addressed. E-mail: [email protected].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1218353110/-/DCSupplemental.

The coding regions of eukaryotic genes are typically interrupted by noncoding introns that must be removed to produce a translatable mRNA. The removal of introns, catalyzed by the spliceosome, offers a powerful opportunity for an organism to regulate gene expression. In mammals, where individual genes are often interrupted by multiple introns, it is now abundantly clear that the process of intron removal provides a critical regulatory control point for both qualitative and quantitative aspects of gene expression (1). By changing the identity of the exons that are included within the final mRNA, the process of alternative splicing plays a critical role in expanding the diversity of proteins that can be synthesized within a cell (2). Moreover, alternative splicing can direct the production of isoforms of genes that are directly targeted to cellular decay pathways, providing a mechanism to quantitatively regulate gene expression (3, 4).

In mammalian organisms the predominant form of alternative splicing is exon skipping, wherein different combinations of exons are included in the final transcript. In contrast, exon skipping is far less prevalent in simpler eukaryotes (5, 6); however, recent studies suggest that splicing in the last eukaryotic common ancestor was similar in many respects to splicing in vertebrates, in so much as it was intron-dense (7–9), had degenerate splice site sequences (10), and likely had many of the proteins involved in alternative splicing (11, 12). Intron density correlates positively with the prevalence of alternative splicing across the eukaryotic kingdoms (13), and thus it has been posited that the intron-rich eukaryotic ancestor had alternative splicing, and may have had exon skipping. However, it is less clear whether the inferred ancestral exon skipping was functional or represents “noise” that was later harnessed in the evolutionary lineages that led to multicellularity (13, 14). Two “smoking guns” that would corroborate early functional exon skipping have been previously suggested: environmentally or developmentally regulated exon skipping in a unicellular eukaryote, and conservation of a particular exon-skipping pattern across eukaryotic kingdoms (11, 14). Although examples of exon skipping in a few unicellular eukaryotes have been published (15–17), to our knowledge there are no published examples of either evolutionarily conserved or environmentally regulated exon-skipping events in any unicellular eukaryote.

The unicellular fission yeast, Schizosaccharomyces pombe, shares many of the hallmarks of alternative splicing in mammalian systems. Nearly 50% of S. pombe genes contain an intron, and almost half of those contain multiple introns (18, 19). Moreover, the splice site sequences found within S. pombe introns do not conform to the tight consensus sequences seen in some other unicellular fungi, but rather are marked by a degeneracy more similar to that seen in human introns (20, 21). Importantly, a single bona fide member of the serine/arginine-rich (SR) family of splicing regulators and several SR-like proteins implicated in the regulation of alternative splicing are encoded within the S. pombe genome (22, 23). Furthermore, cis-acting sequence elements important in splicing regulation in multicellular eukaryotes have been shown to modulate the splicing efficiency of individual S. pombe introns (24), and indeed nonendogenous plant and mammalian intron sequences have been shown to be properly excised in S. pombe (25, 26). Nevertheless, although instances of intron-retention have been documented (27, 28), no examples of exon skipping have been described in S. pombe. Surprisingly, two recent RNA-seq studies (29, 30) failed to detect any instances of exon skipping in S. pombe; however, it was unclear whether this reflected an absence of such splicing events in S. pombe or a limitation of RNA-seq for detecting them.

Results

To facilitate a deeper analysis of global splicing patterns in S. pombe that might uncover exon-skipping events, we developed an alternative approach designed to concentrate sequencing efforts on the products of a splicing reaction. This approach seeks to enrich, purify, and sequence the excised circular lariat RNAs that are a product of every splicing reaction. To stabilize these circular RNAs, the gene encoding the debranching enzyme (Dbr1) responsible for linearizing introns (31) was genetically deleted from an S. pombe strain. Because environmental stresses can stimulate splicing regulation even in budding yeast (32), the Δdbr1 S. pombe strain was exposed to a variety of environmental stresses, including various nutrient deprivations, temperature variations, and chemical exposures (SI Materials and Methods

www.pnas.org/cgi/doi/10.1073/pnas.1218353110 PNAS Early Edition

and Dataset S1, Table S1). Total RNA from each individual sample was pooled together for analysis.

Circular RNAs are retarded in their mobility relative to linear RNA and can be detected in 2D polyacrylamide gels (33) as an arc above a diagonal of linear RNAs (Fig. 1A). Importantly, these arcs were only detected in RNA from strains lacking functional Dbr1, suggesting they are predominantly populated by excised lariats (Fig. S1). To increase the chances of identifying exon-skipping events, gel conditions were optimized to recover circular RNAs in a size range likely to include lariats that contained skipped exons. According to the most recent S. pombe genome annotation, the shortest lariat that could be formed by skipping a single exon is 89 nucleotides; as such, gel conditions were determined to optimize recovery of lariats ∼85 nucleotides or greater (Materials and Methods).

The circular RNA arcs were excised from the 2D gels and converted into cDNA by random priming without prior debranching. The resulting cDNAs were sequenced using Illumina’s 3G technology, and the resulting reads were aligned to the S. pombe genome using Bowtie (34). Importantly, of a total of 12.4 million alignable reads, over 80% of the reads derived from this approach align to previously annotated introns (Dataset S1, Table S2), confirming that the circular arc in the 2D gel consists primarily of lariat RNA. “Peaks” of read enrichment were subsequently identified, allowing for the characterization of putative lariat RNAs. In considering only those peaks with a minimum of 25 overlapping reads, lariats corresponding to 82% of all annotated introns between 85 and 400 nucleotides were identified (Dataset S1, Table S3), confirming the efficiency of lariat purification. Moreover, the conditions used here successfully isolated lariats in a size range that would enable detection of 66% of all possible single exon-skipping events in the S. pombe genome (SI Materials and Methods and Dataset S1, Table S4).

Representative examples of lariat-sequencing read peaks aligning to known introns are shown in Fig. 1B for the sim4 and rpl4301 genes, representing the lower and upper ends of read counts, respectively. The appearance of these peaks strongly suggests that they are derived from fully excised lariats rather than lariat intermediates (LI) resulting from incomplete splicing reactions. Although LI RNAs could be isolated from this approach, such RNAs would be expected to produce sequencing reads that align not only to the intronic region, but also through the entire downstream region. As expected, flanking PCR using primers that flank the predicted splice sites shows that the spliced isoform is the predominant species for both of these transcripts.

Fig. 1. Purification and sequencing of excised lariat introns. (A) Stabilization of lariat RNAs is achieved by genetically deleting the gene encoding the debranching enzyme, Dbr1. Total cellular RNA isolated from a Δdbr1 strain grown under a variety of conditions was pooled and subjected to 2D gel electrophoresis, allowing for separation of linear RNAs from circular lariats. (B) Lariat RNAs recovered from the 2D gel were sequenced and aligned to the S. pombe genome. Read density plots are shown near the known introns in the sim4 and rpl4301 genes. Arrows indicate locations of primers used for confirmation. PCR products generated using either genomic DNA or cDNA as a template confirm the splicing of these introns. Locations of the unspliced and spliced products are noted.

Lariat Sequencing Identifies Over 200 Previously Uncharacterized Splicing Events. Although over 80% of the lariat-sequencing reads mapped to previously annotated introns, nearly 15% of the reads mapped to regions currently annotated as untranslated or protein-coding (Dataset S1, Table S2). To determine whether these peaks might represent novel, unannotated splicing events, probabilistic modeling was used to score putative splicing motifs surrounding each peak in these regions. A Markov model derived from known intron sequences (35) allowed the quality of putative introns to be scored (SI Materials and Methods), enabling identification of over 200 previously uncharacterized introns (Dataset S1, Table S5), the majority of which have splice site sequences that are indistinguishable from canonical S. pombe introns (Fig. S2).

Nearly half of the unique introns identified here are located within the UTRs of protein-coding genes, two of which are shown in Fig. 2A. For both the cnl2 and caf5 genes, the peaks lie completely within the 5′ UTR of the transcripts. For both of these putative introns, strong splice site sequences were identified at the boundaries of the lariat-sequencing peaks. Flanking PCR in a wild-type strain confirms the splicing of both of these introns. Moreover, in the background of a strain lacking the nonessential splicing factor smd3, the efficiency of each of these splicing events is reduced, consistent with their being canonical, spliceosomal introns. A subset of the predicted introns that vary both in terms of lariat sequencing read count and probabilistic modeling score was chosen for further validation, each of which confirms the splicing event, suggesting a high true-positive rate of discovery (Fig. S3). Our data suggest that as many as 10% of all S. pombe genes contain an intron within their 5′ or 3′ UTRs. Although such introns have often been ignored, recent work strongly argues in favor of a role for these introns in regulating gene expression (36).

A second category of peaks was identified in which the putative intron was located entirely within a coding region, suggesting the existence of alternatively retained introns. Twenty-five such peaks were identified, two of which are shown in Fig. 2B. For six of these peaks the putative intron has a length that is divisible by three, as seen for the peak in the cys11 gene, suggesting that its removal would not disrupt the translational frame of the downstream portion of the protein. Flanking PCR revealed an intermediate splicing phenotype of this intron, with both the spliced and the retained isoforms readily detectable. In contrast, 19 of the exonic peaks suggest the presence of an intron whose length is not divisible by three, as seen for rad8 in Fig. 2B. Removal of these introns is predicted to change the reading frame, in all cases resulting in production of an mRNA with a premature stop codon. Although such isoforms may generate truncated proteins, they are often targeted to the nonsense mediated decay (NMD) pathway (37). Flanking PCR examining the unique rad8 intron revealed only a small amount of spliced product in a wild-type strain, but an increase in a strain where the NMD factor upf1 (38) had been disrupted (Fig. 2B), suggesting that a sizable fraction of rad8 transcripts are normally spliced at this intron.

Identification and Characterization of Exon-Skipping Events. Given the ability of lariat sequencing to detect previously unidentified splicing events, we asked whether exon-skipping events could be detected in S. pombe. Twenty-three peaks were identified that had continuous signal across an exon and its flanking introns, the expected appearance of a lariat resulting from exon skipping (Fig. S4 and Dataset S1, Table S6). For 14 of these, including srrm1 in Fig. 3A, PCR using primers that target the exons flanking the skipped exon detects both the canonical and alternatively spliced isoforms of the transcript in wild-type (dbr1+) cells grown under standard conditions, confirming the alternative splicing event. In contrast, for the remaining 10 genes the exon-skipping event was not readily apparent. Because our lariat-sequencing dataset was generated from a pooled sample of RNAs derived from many environmental stresses, we individually tested each of the stress conditions for two of the putative alternative events. Indeed, the only samples in which the alternative isoform of alp41 could be detected were those in which the cells had been exposed to heat

shock, whereas the alternative isoform of qcr10 was specifically elicited in response to cold shock (Fig. 3A and Fig. S4). Flanking PCR using cDNA from time courses of each of these stressors confirms the induction of the alternative isoforms of both alp41 and qcr10 (Fig. 3B). Similarly, a quantitative PCR-based approach confirms the rapid and specific production of the alternative isoforms (SI Materials and Methods and Figs. S5 and S6). Importantly, sequencing of the bands corresponding to the alternative products for all three genes in Fig. 3 confirms the predicted alternative splicing.

Fig. 2. Lariat sequencing identifies over 200 unannotated introns. (A) Read density plots of previously uncharacterized introns in the cnl2 and caf5 genes. Cartoons of the currently annotated coding regions are indicated below. The sequences corresponding to the putative branch point, 5′ and 3′ splice sites are indicated, as are the consensus sequences in known S. pombe introns (60). Flanking PCR demonstrates removal of the introns in cDNA from wild-type cells, but a block to splicing in cDNA from cells lacking the splicing factor, smd3 (Δsmd3 cDNA). (B) Read density plots of unannotated introns in the coding regions of the cys11 and rad8 genes. Intron length is indicated by arrows. Flanking PCR using cDNA from wild-type cells reveals an intermediate amount of splicing of these introns. In the background of a strain lacking the NMD factor upf1 (Δupf1 cDNA), the spliced isoform of rad8 is stabilized.

Given a priori knowledge about the specific transcripts that are subject to alternative splicing, we returned to the published S. pombe datasets to determine whether any RNA-seq evidence existed for these exon skipping events (29, 30). Importantly, for 7 of the 23 events we describe here, small numbers of RNA-seq reads can be identified that independently confirm our observations (Fig. S7 and Dataset S1, Table S6), including two that could not be validated by flanking PCR. The low number of alternative reads and their propensity to be misaligned presumably precluded their identification in the original work. Although computational approaches for identifying splicing junctions within RNA-seq datasets continue to improve (39, 40), the statistical limitations inherent to the low read counts associated with these events will complicate their ability to be distinguished from noise.

Fig. 3. Examples of constitutive and inducible exon skipping. (A) Read density plots surrounding alternatively skipped exons (red) within the srrm1, alp41, and qcr10 transcripts. Peaks are colored red where the reads overlap with the skipped exon, and gray where they overlap with the flanking introns. PCR products generated using either genomic DNA or cDNA from wild-type cells grown under normal conditions are indicated; cDNA from wild-type cells exposed to either heat or cold shock are also indicated. Locations of the unspliced, spliced, and exon-skipped (*) products are noted (SI Materials and Methods and Fig. S11). (B) Flanking PCR using cDNA from a time-course of heat shock (alp41) or cold shock (qcr10) demonstrates the stress-induced alternative splicing of these transcripts.

Characterizing Circular RNAs Using Split Reads. Although the majority of the reads generated from our sequencing could be directly aligned to the S. pombe genome, we sought to further characterize the ∼2 million reads that were unalignable (SI Materials and Methods and Figs. S8 and S9). It was recently demonstrated that lariat RNAs can be identified from the unalignable reads in an RNA-seq experiment by virtue of the juxtapositioning of the 5′ splice site (5′SS) and branchpoint (BP) sequences produced during the splicing reaction (41). By “splitting” the sequencing reads, both the 5′SS and BP-aligning regions can be readily identified, enabling detection of these BP-spanning elements. By adopting a modified version of this computational approach, we identified over 37,000 such reads in our dataset (SI Materials and Methods and Dataset S1, Table S7), which allowed us to precisely map the BP for 916 S. pombe introns, 817 of which were previously annotated introns, and 99 of which were previously uncharacterized introns identified in this study.

To assess the accuracy of computational prediction of BPs in the S. pombe genome, empirically determined BPs from the

BP-traversing reads were compared with the predicted BPs for these introns (Dataset S1, Table S8). A total of 864 BPs were predicted within 693 of these introns (158 introns contain multiple predictions), whereas 124 introns lacked any predictions. Experimental support was found for 588 of the predicted BPs (located in 578 introns), but was absent for the remaining 276 predicted BPs (located within 233 introns); it is unclear whether these latter BP predictions are incorrect or whether insufficient experimental coverage precluded their detection. Overall, these data suggest that computational prediction of BPs in S. pombe is largely accurate; however, a total of 423 previously uncharacterized BPs were identified in 365 introns, including the 124 introns for which no predictions existed. Although many of the alternative BPs are rare, they provide novel insights into global alternative BP usage. Importantly, although the vast majority of BP-traversing reads corresponded to canonical splicing of single introns, 12 of these reads confirmed eight of the previously described exon-skipping events identified in this work (Dataset S1, Table S9), including three that were not previously validated by flanking PCR, providing additional evidence for their alternative splicing.

Whereas the vast majority (∼98%) of the unalignable reads remained unalignable even when split into two segments (Fig. S8), just under 400 reads were alignable as two segments on the same chromosome and strand, and separated by fewer than 5,000 bases. Of these reads, most (387) corresponded to canonical splice-junction reads, presumably reflecting low levels of LI or mature mRNA contamination. Interestingly, however, 11 reads corresponded to circularized protein-coding exons, analogous to the “circRNAs” that have been recently demonstrated in a variety of organisms (42–45). These reads corresponded to 10 unique circularization events, nine of which were characterized by a single read and one for which two reads were identified (Dataset S1, Table S10). Although infrequent, these reads suggest that the spliceosome in S. pombe is also capable of producing these “inverted” splicing events. Understanding the prevalence and biological significance of these species in S. pombe will require future examination.

Evolutionary Conservation of Alternative Splicing Events. To assess the potential evolutionary conservation of the identified exon-skipping patterns between S. pombe and multicellular eukaryotes, we adopted an approach from two previous studies that compared alternative splicing patterns between species from different eukaryotic clades (flowering plants and mosses), where it was argued that a necessary property of a conserved alternative splicing pattern between species is the conservation of the gene architecture that allows such alternative splicing (46, 47). Accordingly, we examined the global conservation of exon-structure between the S. pombe and human genomes. For all 2,641 annotated internal coding exons in the S. pombe genome, a search was conducted within the human genome for an orthologous exon, using three criteria to define conservation: the host genes had to be clear orthologs, the exons themselves had to be clear orthologs, and the orthologous exons had to have an identical nucleotide length (SI Materials and Methods and Dataset S1, Table S11). Given the evolutionary distance separating these organisms, it is unsurprising that few exons satisfy these metrics. Nevertheless, 4 of the 23 skipped exons identified by lariat sequencing were conserved, including those in the srrm1, alp41, pth1, and SPAC27D7.08c genes (Fig. 4 and Fig. S10), more than expected by chance (SI Materials and Methods).

Fig. 4. Evolutionary conservation of skipped exons in the srrm1 and alp41 genes of S. pombe. (Left) Pruned phylogenetic trees depicting model species that have an ortholog of the skipped S. pombe exons in srrm1 (Upper) or alp41 (Lower), drawn according to NCBI taxonomy relationships (61). Full trees are shown in Fig. S10. Vertebrate branches are colored red and plant branches are colored green. (Right) For the srrm1 and alp41 genes, the two alternate S. pombe isoforms corresponding to the skipping events are shown with the corresponding isoforms from Ensembl (human) or UniProt (mouse) shown above. Only the first six exons of the human ortholog of the srrm1 gene are shown. The evolutionarily conserved exons are shown in red; intron lengths are shown in Fig. S10. Below the gene diagrams are shown the peptide translations of the S. pombe skipped exons and below that a peptide motif constructed from a multiple sequence alignment of the translated orthologous exons from all species for which an orthologous exon was found (60).

Although conserved gene architecture is a necessary condition for a conserved alternative splicing pattern between two species, better support for this paradigm would be direct experimental evidence of alternative splicing in both species and species spanning the evolutionary distance between them. Remarkably, the skipped S. pombe exons of three of the four genes noted above are not only structurally conserved with their human orthologs, but alternative skipping of the orthologous exons has been observed within their mammalian counterparts (Fig. 4 and Fig. S10). For the alp41 and pth1 genes, the human orthologs have been demonstrated to skip their orthologous exons, in each case generating protein-coding alternative isoforms (48, 49). Moreover, a recent RNA-seq experiment in macaque provides further evidence for the skipping of these exons (50) (SI Materials and Methods). Similarly, for the SPAC27D7.08c gene, the orthologous mouse exon is subject to exon skipping (50, 51). Interestingly, an analysis of RNA-seq data from the Illumina BodyMap study reveals tissue-specific regulation of the alternative isoform of the human ortholog of alp41 (GEO accession GSE30611) (SI Materials and Methods), further augmenting the relationship with the regulated behavior of the S. pombe ortholog demonstrated here.

To better understand the evolutionary history of these conserved exon-skipping events, a systematic search of all sequenced eukaryotic genomes from the National Center for Biotechnology Information (NCBI) was carried out (SI Materials and Methods). This endeavor yielded over 100 species in which one or more of these exons were stringently conserved (Fig. 4, Fig. S10, and Dataset S1, Table S12). Notably, the skipped exon of alp41 is conserved by the previously defined metrics in vertebrates, plants, and fungi, and in all but two cases the length of the exon is identical. This evolutionary analysis suggests that the gene structure necessary to support alternative splicing of this exon is at least as old as the common ancestor of animals, plants, and fungi. Although no evidence currently exists for skipping of the conserved exon in these species, transcriptome data in these species are markedly lower than that available for humans. The identification of exon skipping across these lineages will be critical to support the notion of a single, ancestral origin for these alternative splicing events.
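As a hedged illustration of the three conservation criteria used in this section (orthologous host genes, orthologous exons, and identical nucleotide length), the following Python sketch filters candidate exon pairs. The `Exon` record, the `is_conserved` function, and all identifiers and lengths are hypothetical and are not taken from the study; how orthology is actually established is described in the SI, and here it is simply assumed to be available as precomputed sets of pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    gene: str     # host gene identifier (hypothetical)
    exon_id: str  # exon identifier within the gene (hypothetical)
    length: int   # exon length in nucleotides


def is_conserved(query, candidate, gene_orthologs, exon_orthologs):
    """Return True only if all three criteria hold: orthologous host genes,
    orthologous exons, and identical nucleotide length."""
    genes_match = (query.gene, candidate.gene) in gene_orthologs
    exons_match = (query.exon_id, candidate.exon_id) in exon_orthologs
    same_length = query.length == candidate.length
    return genes_match and exons_match and same_length


# Toy records: identifiers and lengths are invented for illustration only.
yeast_exon = Exon("yeastGeneA", "yeastGeneA.e2", 96)
human_same = Exon("humanGeneA", "humanGeneA.e5", 96)
human_longer = Exon("humanGeneA", "humanGeneA.e6", 99)

# Assumed precomputed orthology calls (gene pairs and exon pairs).
gene_orthologs = {("yeastGeneA", "humanGeneA")}
exon_orthologs = {
    ("yeastGeneA.e2", "humanGeneA.e5"),
    ("yeastGeneA.e2", "humanGeneA.e6"),
}

print(is_conserved(yeast_exon, human_same, gene_orthologs, exon_orthologs))
print(is_conserved(yeast_exon, human_longer, gene_orthologs, exon_orthologs))
```

In this toy setup only the first pair passes, because the second candidate differs in length. Requiring identical length is a strict choice: it also implies that skipping the exon shifts the reading frame by the same amount in both species.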

Unlike the previous examples, there is no evidence for exon skipping of the conserved exon in the human ortholog of srrm1. However, srrm1 gene structure is specifically conserved between vertebrates, insects, plants, and fungi exclusively at this exon and the two flanking exons (Fig. 4 and Fig. S10). Importantly, the srrm1 protein is an SR-like protein that functions to regulate alternative splicing in higher eukaryotes (52). The skipped exon of srrm1 encodes a stretch of basic amino acids just upstream of the proline-tryptophan-isoleucine containing (PWI) domain demonstrated to play an important role in RNA binding (53), suggesting that the alternative protein isoform has dramatically different RNA-binding properties. Notably, this separation of the basic motif (in exon 2) from the PWI motif (at the start of exon 3) is conserved in vertebrate srrm1 genes, although the rest of the gene is more variable. There are well-documented examples in higher eukaryotes where transcripts encoding SR-proteins are themselves subject to regulation via alternative splicing (54, 55), suggesting the intriguing possibility that alternative splicing of the S. pombe srrm1 mRNA may function to regulate the levels of functional Srrm1 protein.

Discussion

Here we describe the development and implementation of a technique that leverages sequencing of the lariat introns that are excised during splicing to enable high-sensitivity detection of genome-wide pre-mRNA splicing events. This lariat-sequencing approach was applied to the unicellular yeast, S. pombe, an organism that shares with vertebrates many of the features of alternative splicing, yet for which recent RNA-seq experiments had failed to detect instances of exon-skipping (29, 30). Lariat sequencing was able to identify hundreds of previously uncharacterized splicing events in S. pombe, including over 20 examples of exon skipping, several of which are shown in Fig. 3. Although exon-skipping has been previously demonstrated at similarly low frequency in other unicellular organisms (15–17), no such examples of this type of alternative splicing had been previously demonstrated in this organism. More importantly, however, is the demonstration here that several of the S. pombe exons that were identified as subject to skipping are highly conserved across eukaryotic species spanning several major eukaryotic lineages, and that experimental evidence exists for the alternative splicing of the orthologous exons in mammalian species. To our knowledge, these exon-skipping events represent the first known examples of specific alternative splicing patterns that are identical between species from different eukaryotic kingdoms for which both isoforms encode functional proteins, or for which conditional regulation has been demonstrated. These findings provide support for the hypothesis that functional alternative splicing predates multicellularity (13).

Importantly, exon-skipping in multicellular eukaryotes is thought to be facilitated via an “exon-definition” model of spliceosome assembly, yet mechanistic studies of splicing in S. pombe suggest that assembly occurs through the “intron-definition” model more commonly associated with unicellular eukaryotes (56). Nevertheless, it remains unclear whether spliceosome assembly on select introns in S. pombe can occur by exon-definition; indeed, it has been proposed that both intron- and exon-definition exist in Drosophila melanogaster (57), and assembly via intron- and exon-definition can occur on a single human transcript (58). An important future goal will be to determine whether the mechanistic bases by which these alternative splicing events are regulated are indeed conserved between S. pombe and humans. The highly tractable genetics of

represent only a portion of the unannotated splicing events that occur in S. pombe. Thousands of currently annotated introns in the S. pombe genome have lengths that are shorter than 100 nucleotides; a simple extrapolation based on the ratio of unannotated versus annotated introns detected in this experiment suggests that as many as 1,000 splicing events are likely to exist within the S. pombe genome that have yet to be identified.

The majority of the uncharacterized splicing events in coding exons of protein-coding genes identified in this work result in the production of a transcript that is predicted to be targeted for degradation. It is generally accepted that many cellular mRNAs are subject to rapid cellular destabilization as a mechanism for regulating their gene expression (4). The ability to detect these transcripts, or splicing events within them, in an RNA-seq experiment will be compromised relative to more stable mRNAs. Although little is known about the degradation rates of specific lariats (59), lariat intron decay is likely subject to less transcript-specific regulation than is mRNA decay, making lariat sequencing a particularly sensitive method for detecting splicing events for which the mature transcript is subject to rapid degradation.

The results presented here have important implications not only for our understanding of splicing in S. pombe, but more generally for the use of lariat sequencing as a complement to RNA-seq for characterizing genome-wide splicing events in any organism. The relatively modest-sized lariat-sequencing experiment presented here surveyed the combined RNA from dozens of different stressed samples, yet was able to detect hundreds of splicing events that went undetected in an RNA-seq experiment with nearly 20 times the sequencing depth. The reasons for this enrichment are almost certainly both technical and biological in nature. In fact, for some of the introns identified here, limited numbers of RNA-seq reads can be found that confirm these splicing events, highlighting the bioinformatic challenge associated with de novo identification of splicing events. To be sure, the technical shortcomings of RNA-seq will diminish in scope as newer technologies allow for increased read lengths, reductions in sequencing costs enable even greater sequencing depths, and the tools for splice-junction identification continue to evolve. However, even with these improvements, splicing events that produce unstable transcripts will be better sampled by approaches, such as lariat sequencing, which capture and sequence the intronic material excised by the splicing machinery. Further optimizations of the stabilization, purification, and sequencing of lariats should provide a powerful addition to the arsenal of tools for characterizing splicing within genomes and transcriptomes.

Materials and Methods

Additional details can be found in the SI Materials and Methods.

Isolation of lariat RNAs by 2D PAGE was accomplished by first isolating total cellular RNA from a Δdbr1 strain grown under a variety of conditions as described in SI Materials and Methods. Two different sets of gel running conditions were used to optimize the recovery of lariat RNAs of different lengths from these samples. To recover lariats that were as short as ∼100 nucleotides, the RNA was initially run in a single lane on a 7.5% (wt/vol) polyacrylamide gel containing 8.3 M urea in 1× TBE at 40 mA until the bromophenol blue had migrated 12 cm. This lane was cut out of the gel and recast across the top of a 15% (wt/vol) gel. This second gel was run at 30 mA until the bromophenol blue had migrated 28 cm. RNAs were stained with SYBR Gold and visualized using a DarkReader. The lariat RNAs recovered from this 2D gel sample are referred to as “short” lariats in Dataset S1, Table S6.
To ensure recovery of longer lariats, a separate purification was per- S. pombe provide a powerful system in which to undertake these formed where the first dimension used a 7.5% (wt/vol) acrylamide gel run at experiments. 40 mA until the xylene cyanol had migrated 31cm, while the second di- fi Beyond the identi cation of alternative splicing events, lariat mension used a 20% (wt/vol) gel run at 28 mA until the xylene cyanol had sequencing was also able to identify over 200 previously un- migrated 68 cm. The lariats recovered from this gel are referred to as “long” characterized splicing events within the S. pombe genome, many lariats in Dataset S1, Table S6. The downstream processing of the RNAs of which appear to be canonical, constitutively spliced introns. In isolated from both sets of gel conditions was identical, as described in SI fact, because the present experiments were designed primarily to Materials and Methods. Of the 23 exon-skipping events identified, five were recover the larger lariats that result from exon-skipping events, identified exclusively from the “short” gel running conditions, nine exclu- the unique splicing events found in this work almost certainly sively from the “long” gel running conditions, and nine from both.
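The two sets of gel running conditions and the resulting tally of exon-skipping events can be written out as a short bookkeeping sketch in Python. The class name `GelRun` and all variable names are illustrative assumptions and are not part of the published methods; the numeric values are transcribed from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GelRun:
    """Conditions for one 2D PAGE lariat purification (hypothetical container)."""
    first_dim_percent: float    # % (wt/vol) acrylamide, first dimension
    first_dim_mA: int           # current, first dimension
    first_dim_cm: int           # tracking-dye migration, first dimension
    second_dim_percent: float   # % (wt/vol) acrylamide, second dimension
    second_dim_mA: int          # current, second dimension
    second_dim_cm: int          # tracking-dye migration, second dimension
    tracking_dye: str


# Values transcribed from the Materials and Methods text.
SHORT = GelRun(7.5, 40, 12, 15.0, 30, 28, "bromophenol blue")
LONG = GelRun(7.5, 40, 31, 20.0, 28, 68, "xylene cyanol")

# Exon-skipping events grouped by the condition(s) in which they were recovered.
events = {"short_only": 5, "long_only": 9, "both": 9}

total = sum(events.values())
seen_in_short = events["short_only"] + events["both"]
seen_in_long = events["long_only"] + events["both"]

print(f"total exon-skipping events: {total}")
print(f"recovered under 'short' conditions: {seen_in_short}")
print(f"recovered under 'long' conditions: {seen_in_long}")
```

Under these counts, neither gel condition alone would have recovered all of the events, which is consistent with the stated rationale for running both purifications.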

ACKNOWLEDGMENTS. We thank J. Lis, A. Grimson, E. Alani, G. Hoffman, S. Prasad, D. Barbash, and members of the J.A.P. and A. Grimson laboratories for critical feedback on this work. This work was funded by National Institute of General Medical Sciences Grant GM098634 (to J.A.P.).

1. Black DL (2000) Protein diversity from alternative splicing: A challenge for bioinformatics and post-genome biology. Cell 103(3):367–370.
2. Nilsen TW, Graveley BR (2010) Expansion of the eukaryotic proteome by alternative splicing. Nature 463(7280):457–463.
3. Lareau LF, Brooks AN, Soergel DAW, Meng Q, Brenner SE (2007) The coupling of alternative splicing and nonsense-mediated mRNA decay. Adv Exp Med Biol 623:190–211.
4. McGlincy NJ, Smith CWJ (2008) Alternative splicing resulting in nonsense-mediated mRNA decay: What is the meaning of nonsense? Trends Biochem Sci 33(8):385–393.
5. Breitbart RE, Andreadis A, Nadal-Ginard B (1987) Alternative splicing: A ubiquitous mechanism for the generation of multiple protein isoforms from single genes. Annu Rev Biochem 56:467–495.
6. Kim E, Magen A, Ast G (2007) Different levels of alternative splicing among eukaryotes. Nucleic Acids Res 35(1):125–131.
7. Fedorov A, Merican AF, Gilbert W (2002) Large-scale comparison of intron positions among animal, plant, and fungal genes. Proc Natl Acad Sci USA 99(25):16128–16133.
8. Rogozin IB, Wolf YI, Sorokin AV, Mirkin BG, Koonin EV (2003) Remarkable interkingdom conservation of intron positions and massive, lineage-specific intron loss and gain in eukaryotic evolution. Curr Biol 13(17):1512–1517.
9. Csurös M, Rogozin IB, Koonin EV (2008) Extremely intron-rich genes in the alveolate ancestors inferred with a flexible maximum-likelihood approach. Mol Biol Evol 25(5):903–911.
10. Schwartz SH, et al. (2008) Large-scale comparative analysis of splicing signals and their corresponding splicing factors in eukaryotes. Genome Res 18(1):88–103.
11. Keren H, Lev-Maor G, Ast G (2010) Alternative splicing and evolution: Diversification, exon definition and function. Nat Rev Genet 11(5):345–355.
12. Csuros M, Rogozin IB, Koonin EV (2011) A detailed history of intron-rich eukaryotic ancestors inferred from a global survey of 100 complete genomes. PLOS Comput Biol 7(9):e1002150.
13. Irimia M, Rukov JL, Penny D, Roy SW (2007) Functional and evolutionary analysis of alternatively spliced genes is consistent with an early eukaryotic origin of alternative splicing. BMC Evol Biol 7:188.
14. Roy SW, Irimia M (2009) Splicing in the eukaryotic ancestor: Form, function and dysfunction. Trends Ecol Evol 24(8):447–455.
15. Sorber K, Dimon MT, DeRisi JL (2011) RNA-Seq analysis of splicing in Plasmodium falciparum uncovers new splice junctions, alternative splicing and splicing of antisense transcripts. Nucleic Acids Res 39(9):3820–3835.
16. Loftus BJ, et al. (2005) The genome of the basidiomycetous yeast and human pathogen Cryptococcus neoformans. Science 307(5713):1321–1324.
17. Yu B, et al. (2011) Spliceosomal genes in the D. discoideum genome: A comparison with those in H. sapiens, D. melanogaster, A. thaliana and S. cerevisiae. Protein Cell 2(5):395–409.
18. Wood V, et al. (2002) The genome sequence of Schizosaccharomyces pombe. Nature 415(6874):871–880.
19. Bitton DA, et al. (2011) Augmented annotation of the Schizosaccharomyces pombe genome reveals additional genes required for growth and viability. Genetics 187(4):1207–1217.
20. Prabhala G, Rosenberg GH, Käufer NF (1992) Architectural features of pre-mRNA introns in the fission yeast Schizosaccharomyces pombe. Yeast 8(3):171–182.
21. Kupfer DM, et al. (2004) Introns and splicing elements of five diverse fungi. Eukaryot Cell 3(5):1088–1100.
22. Lützelberger M, Gross T, Käufer NF (1999) Srp2, an SR protein family member of fission yeast: In vivo characterization of its modular domains. Nucleic Acids Res 27(13):2618–2626.
23. Tang Z, Käufer NF, Lin R-J (2002) Interactions between two fission yeast serine/arginine-rich proteins and their modulation by phosphorylation. Biochem J 368(Pt 2):527–534.
24. Webb CJ, Romfo CM, van Heeckeren WJ, Wise JA (2005) Exonic splicing enhancers in fission yeast: Functional conservation demonstrates an early evolutionary origin. Genes Dev 19(2):242–254.
25. Sarmah B, Chakraborty N, Chakraborty S, Datta A (2002) Plant pre-mRNA splicing in fission yeast, Schizosaccharomyces pombe. Biochem Biophys Res Commun 293(4):1209–1216.
26. Käufer NF, Simanis V, Nurse P (1985) Fission yeast Schizosaccharomyces pombe correctly excises a mammalian RNA transcript intervening sequence. Nature 318(6041):78–80.
27. Kishida M, Nagai T, Nakaseko Y, Shimoda C (1994) Meiosis-dependent mRNA splicing of the fission yeast Schizosaccharomyces pombe mes1+ gene. Curr Genet 25(6):497–503.
28. Moldón A, et al. (2008) Promoter-driven splicing regulation in fission yeast. Nature 455(7215):997–1000.
29. Wilhelm BT, et al. (2008) Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution. Nature 453(7199):1239–1243.
30. Rhind N, et al. (2011) Comparative functional genomics of the fission yeasts. Science 332(6032):930–936.
31. Ruskin B, Green MR (1985) An RNA processing activity that debranches RNA lariats. Science 229(4709):135–140.
32. Pleiss JA, Whitworth GB, Bergkessel M, Guthrie C (2007) Rapid, transcript-specific changes in splicing in response to environmental stress. Mol Cell 27(6):928–937.
33. Domdey H, et al. (1984) Lariat structures are in vivo intermediates in yeast pre-mRNA splicing. Cell 39(3 Pt 2):611–621.
34. Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10(3):R25.
35. Lim LP, Burge CB (2001) A computational analysis of sequence features involved in recognition of short introns. Proc Natl Acad Sci USA 98(20):11193–11198.
36. Bicknell AA, Cenik C, Chua HN, Roth FP, Moore MJ (2012) Introns in UTRs: Why we should stop ignoring them. Bioessays 34(12):1025–1034.
37. Isken O, Maquat LE (2008) The multiple lives of NMD factors: Balancing roles in gene and genome regulation. Nat Rev Genet 9(9):699–712.
38. Leeds P, Peltz SW, Jacobson A, Culbertson MR (1991) The product of the yeast UPF1 gene is required for rapid turnover of mRNAs containing a premature translational termination codon. Genes Dev 5(12A):2303–2314.
39. Brooks AN, et al. (2011) Conservation of an RNA regulatory map between Drosophila and mammals. Genome Res 21(2):193–202.
40. Wang K, et al. (2010) MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res 38(18):e178.
41. Taggart AJ, DeSimone AM, Shih JS, Filloux ME, Fairbrother WG (2012) Large-scale mapping of branchpoints in human pre-mRNA transcripts in vivo. Nat Struct Mol Biol 19(7):719–721.
42. Danan M, Schwartz S, Edelheit S, Sorek R (2012) Transcriptome-wide discovery of circular RNAs in Archaea. Nucleic Acids Res 40(7):3131–3142.
43. Salzman J, Gawad C, Wang PL, Lacayo N, Brown PO (2012) Circular RNAs are the predominant transcript isoform from hundreds of human genes in diverse cell types. PLoS ONE 7(2):e30733.
44. Memczak S, et al. (2013) Circular RNAs are a large class of animal RNAs with regulatory potency. Nature 495(7441):333–338.
45. Jeck WR, et al. (2013) Circular RNAs are abundant, conserved, and associated with ALU repeats. RNA 19(2):141–157.
46. Kalyna M, Lopato S, Voronin V, Barta A (2006) Evolutionary conservation and regulation of particular alternative splicing events in plant SR proteins. Nucleic Acids Res 34(16):4395–4405.
47. Iida K, Go M (2006) Survey of conserved alternative splicing events of mRNAs encoding SR proteins in land plants. Mol Biol Evol 23(5):1085–1094.
48. Brandenberger R, et al. (2004) Transcriptome characterization elucidates signaling networks that control human ES cell growth and differentiation. Nat Biotechnol 22(6):707–716.
49. Strausberg RL, et al.; Mammalian Gene Collection Program Team (2002) Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences. Proc Natl Acad Sci USA 99(26):16899–16903.
50. Merkin J, Russell C, Chen P, Burge CB (2012) Evolutionary dynamics of gene and isoform regulation in Mammalian tissues. Science 338(6114):1593–1599.
51. Carninci P, et al.; FANTOM Consortium; RIKEN Genome Exploration Research Group and Genome Science Group (Genome Network Project Core Group) (2005) The transcriptional landscape of the mammalian genome. Science 309(5740):1559–1563.
52. Cheng C, Sharp PA (2006) Regulation of CD44 alternative splicing by SRm160 and its potential role in tumor cell invasion. Mol Cell Biol 26(1):362–370.
53. Szymczyna BR, et al. (2003) Structure and function of the PWI motif: A novel nucleic acid-binding domain that facilitates pre-mRNA processing. Genes Dev 17(4):461–475.
54. Lareau LF, Inada M, Green RE, Wengrod JC, Brenner SE (2007) Unproductive splicing of SR genes associated with highly conserved and ultraconserved DNA elements. Nature 446(7138):926–929.
55. Ni JZ, et al. (2007) Ultraconserved elements are associated with homeostatic control of splicing regulators by alternative splicing and nonsense-mediated decay. Genes Dev 21(6):708–718.
56. Shao W, Kim H-S, Cao Y, Xu Y-Z, Query CC (2012) A U1-U2 snRNP interaction network during intron definition. Mol Cell Biol 32(2):470–478.
57. Fox-Walsh KL, et al. (2005) The architecture of pre-mRNAs affects mechanisms of splice-site pairing. Proc Natl Acad Sci USA 102(45):16176–16181.
58. Sharma S, Kohlstaedt LA, Damianov A, Rio DC, Black DL (2008) Polypyrimidine tract binding protein controls the transition from exon definition to an intron defined spliceosome. Nat Struct Mol Biol 15(2):183–191.
59. Clement JQ, Qian L, Kaplinsky N, Wilkinson MF (1999) The stability and fate of a spliced intron from vertebrate cells. RNA 5(2):206–220.
60. Crooks GE, Hon G, Chandonia J-M, Brenner SE (2004) WebLogo: A sequence logo generator. Genome Res 14(6):1188–1190.
61. Letunic I, Bork P (2011) Interactive Tree Of Life v2: Online annotation and display of phylogenetic trees made easy. Nucleic Acids Res 39(Web Server issue):W475–W478.

www.pnas.org/cgi/doi/10.1073/pnas.1218353110