<<

letters to nature

Acknowledgements amylose resin, and eluted with maltose under salt conditions We thank M. Bjøra˚s, L. Eide, K. Baynton, K. I. Kristiansen, J. Myllyharju, K. Skarstad and optimal for splicing (60 mM KCl)4,5. The products of the first and J. Klaveness for help and discussions, and L. Eide, K. Baynton and A. Klungland for critical reading of the manuscript. We are grateful to M. Bjøra˚s for preparation of [3H]MNU- second catalytic steps of splicing were detected in the gel filtration labelled DNA substrates, to L. Eide for the construction of the plasmid for expression of fraction for both AdML and AdML-M3 pre-mRNAs (Fig. 1b, lanes 2 GST–AlkB, and to D. Daoudi for technical assistance. This work was supported by the and 5). In contrast, after binding to and elution from the amylose- Research Council of Norway and the Norwegian Cancer Society. E.S. also acknowledges affinity resin, only the splicing products of AdML-M3 support from the European Commission. were detected (lanes 3 and 6). We conclude that the spliceosomes are highly purified, as there is no detectable non-specific binding of Competing interests statement AdML spliceosomes to the affinity resin. This conclusion is bol- The authors declare that they have no competing financial interests. stered by the observation that spliceosomal small nuclear Correspondence and requests for materials should be addressed to P.Ø.F. (e-mail: [email protected]).

...... Comprehensive proteomic analysis of the human

Zhaolan Zhou*†, Lawrence J. Licklider†‡, Steven P. Gygi†‡ & Robin Reed†

* Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts 02138, USA † Department of Biology, ‡ Taplin Biological Mass Spectrometry Facility, Harvard Medical School, Boston, Massachusetts 02115, USA ...... The precise excision of from pre-messenger RNA is performed by the spliceosome, a macromolecular machine con- taining five small nuclear RNAs and numerous . Much has been learned about the components of the spliceo- some from analysis of individual purified small nuclear ribo- nucleoproteins1 and salt-stable spliceosome ‘core’ particles2,3. However, the complete set of proteins that constitutes intact functional spliceosomes has yet to be identified. Here we use maltose-binding protein affinity chromatography4,5 to isolate spliceosomes in highly purified and functional form. Using nanoscale microcapillary liquid chromatography tandem mass spectrometry6, we identify ,145 distinct spliceosomal proteins, making the spliceosome the most complex cellular machine so far characterized. Our spliceosomes comprise all previously known splicing factors and 58 newly identified components. The spli- ceosome contains at least 30 proteins with known or putative roles in expression steps other than splicing. This complex- ity may be required not only for splicing multi-intronic metazoan pre-messenger RNAs, but also for mediating the extensive coupling between splicing and other steps in . Spliceosomes undergo multiple assembly stages and confor- mational changes during the splicing reaction7–9.Toobtainas complete a spliceosomal proteome as possible, we purified spliceo- Figure 1 Isolation of spliceosomes. a, Schematic of pre-mRNAs. b, Total RNAs from the somes from a mixture of all stages of assembly, including spliceo- initial splicing reaction (input), gel filtration fraction (gf fraction), or final elution were somes that had undergone the first or second catalytic steps of the separated on a 15% denaturing polyacrylamide gel. Unspliced pre-mRNA, splicing splicing reaction. In addition, we assembled spliceosomes on two intermediates (lariat-exon2 and exon1) and products (lariat and mRNA) are indicated. distinct pre-messenger RNAs (pre-mRNAs; adenovirus major late, c, Proteins from an RNase-A-treated aliquot of the gel filtration fraction or final elution AdML, and Fushi tarazu, Ftz) so that the proteins common to both were separated on 4–12% SDS–PAGE and stained with silver. The protein markers and could be used as a criterion for identifying splicing-specific proteins. positions of MS2–MBP and RNase A are indicated. d, Analysis of snRNAs. Total RNAs Representative data for purification of AdML spliceosomes are were extracted from the nuclear extract (NE), AdML, or AdML-M3 elution, 32P-end- shown in Fig. 1. AdML-M3 pre-mRNA contains three hairpins that labelled (þ) or not (-), and fractionated on an 8% polyacrylamide gel. Splicing bind to the MS2–MBP fusion protein used for affinity purification intermediates, products and snRNAs are indicated. e, Western blot of nuclear extract (NE) (Fig. 1a: MS2 is a bacteriophage coat protein; MBP is maltose- and the final elutions from AdML, AdML-M3 or H complexes using the indicated binding protein). AdML pre-mRNA, which lacks these hairpins, antibodies. f, Spliceosomes were separated on a Sephacryl-S500 gel filtration column was used as a negative control. After adding the MS2–MBP fusion before (triangle) and after (circles) MBP-affinity purification. The positions of the protein to the two pre-mRNAs, spliceosomes were assembled spliceosome and H complex are indicated. The peak in fractions 68–80 contains in vitro, isolated by gel filtration, affinity-selected by binding to degraded RNAs and free proteins.

182 © 2002 Nature Publishing Group NATURE | VOL 419 | 12 SEPTEMBER 2002 | www.nature.com/nature letters to nature

(snRNAs; U1, U2, U4, U5 and U6) are detected in the AdML-M3 associated with the spliceosomal snRNAs as well as with a complex but not the AdML elution (Fig. 1d, lanes 1 and 3). Likewise, analysis set of proteins. The purified spliceosomes are not significantly of silver-stained proteins by SDS–polyacrylamide gel electrophor- contaminated with the H complex or with HeLa nuclear extract esis (SDS–PAGE) revealed a large number of proteins specifically proteins. Finally, MBP-purified spliceosomes elute in the same gel associated with the AdML-M3 spliceosome, whereas only the MS2– filtration fractions before and after affinity purification (Fig. 1f). MBP fusion protein and RNase A are detected in the AdML control Together, these data show that the MBP-isolated spliceosomes are (Fig. 1c, lanes 3 and 4). both highly purified and intact. To further examine the purity of the MBP-purified spliceosomes, To determine whether the MBP-purified spliceosomes are func- we performed western blotting using antibodies that recognize tional, we isolated spliceosomes that had not yet undergone the first representative splicing factors (Fig. 1e). As an additional control catalytic step of splicing, and used them in an in vitro complemen- for specificity, we examined MBP-purified H complex, which tation assay. The complementing extract was specifically depleted contains heterogeneous nuclear ribonucleoproteins (hnRNPs) and of the splicing factor SF3a5, which is a component of U2 snRNP assembles nonspecifically on RNAs in splicing extracts10. The U2 and required for early spliceosome assembly5,8. Thus, the depleted small nuclear ribonucleoprotein (snRNP) protein B 00 ,twoU5 extracts do not support the assembly of the early spliceosome, snRNP proteins (of relative molecular mass 116,000 and 40,000 but contain factors necessary to chase the pre-mRNA in a pre- 1 (Mr 116K and 40K, respectively)) , and SC35, a member of the SR assembled spliceosome to spliced product. To distinguish between family of splicing factors11, were detected in the AdML-M3 elution, complementation and de novo splicing, we used a mixture of but not in the AdML or H complex elution (Fig. 1e). In contrast, MBP-purified spliceosomes and ‘naked’ pre-mRNA (lacking hair- hnRNP A1 is enriched in the H complex10. These data indicate pins). As shown in Fig. 2a and b, splicing products were detected that the MBP-affinity purification is specific for AdML-M3 pre- only from MBP-purified Ftz-M3 or AdML-M3 spliceosomes in mRNA. The data also show that the MBP-purified spliceosomes are the SF3a-depleted extract, but not from the corresponding naked

Figure 2 The MBP-purified spliceosomes are functional in an in vitro complementation (lanes 8 and 10) were incubated under standard splicing conditions in nuclear extract for assay. a, Purified early spliceosomes assembled on Ftz-M3 pre-mRNA were mixed with 0 or 50 min to serve as markers for the splicing products. b, Same as a except that AdML Ftz pre-mRNA lacking hairpins, and incubated with either buffer alone (lanes 1 and 2), or AdML-M3 pre-mRNA was used. Total RNA was extracted and separated on an 8% (Ftz) with SF3a-depleted nuclear extract (lanes 3 and 4), or with SF3a-depleted extract plus or 16% (AdML) denaturing polyacrylamide gel. Splicing intermediates and products are recombinant SF3a protein (lanes 5 and 6) under splicing conditions for 25 min (lanes 1, 3 indicated. and 5) or 50 min (lanes 2, 4 and 6). Ftz pre-mRNA (lanes 7 and 9) or Ftz-M3 pre-mRNA

NATURE | VOL 419 | 12 SEPTEMBER 2002 | www.nature.com/nature © 2002 Nature Publishing Group 183 letters to nature pre-mRNAs (lanes 3 and 4). We conclude that the MBP-purified mammals. Among the ,145 human spliceosome-associated pro- spliceosomes assembled on both AdML-M3 and Ftz-M3 pre- teins, ,90 known or putative yeast homologues have been identi- mRNAs are functional. fied8,23 (Supplementary Tables I and II). The potentially greater The protein composition of the purified, intact and functional complexity of the human spliceosome is not unexpected in light of spliceosomes assembled on AdML-M3 and Ftz-M3 pre-mRNAs was the vastly greater complexity of splicing in metazoans compared to then determined by liquid chromatography tandem mass spec- yeast. Indeed, most metazoan pre-mRNAs contain multiple introns, trometry6 (LC-MS/MS). The AdML H complex was purified and the introns are typically thousands of nucleotides, and the splicing sequenced for comparison. Proteins that are shared between AdML signals are weakly conserved. Superimposed on this complexity is and Ftz spliceosomes are shown in Supplementary Tables I-III. the high frequency of , which is in turn further (These tables contain the GenBank accession number for each complicated owing to regulation24. Thus, many of the metazoan- protein, with live links to the protein sequence and the SWISS- specific proteins may play roles in the accurate recognition and PROT . Other names for each protein are also joining of . The best-characterized example of such proteins is listed.) the SR family of splicing factors, which comprise at least 10 proteins Atotalof,145 distinct proteins were identified as shared in the human spliceosome, but is entirely lacking in yeast11–13.SR between the two spliceosomes, including 88 known splicing fac- proteins are required for both recognition of splice sites and tors/snRNP proteins/spliceosomal proteins (Supplementary Tables alternative splicing, neither of which is relevant in yeast11–13,24. I and II). The proteins listed in Supplementary Table III were Splicing is thought to be coupled to several steps in gene abundant in the H complex, and thus were not counted as spliceo- expression, including , and mRNA some proteins. The proteins in Supplementary Table I are presented export25–27. A significant fraction of the spliceosome proteome, at in functional groups, listing those that associate with the spliceo- least 30 proteins, are either known or candidate proteins for some earlier followed by those that associate later. Previous work1 coupling splicing to other steps in gene expression (names shown identified 43 distinct proteins in U1, U2, U4, U5 and U6 . Of in bold font in Supplementary Tables I and II). For example, the these, 41 were identified in our spliceosomes (Supplementary Table transcription cofactor TAT-SF1 is in the spliceosome (Supplemen- I; hLSm5 was not detected, and hLSm8 was detected only in the tary Table II). This protein interacts with snRNPs, and is thought to AdML spliceosome). The additional 47 proteins in Supplementary reciprocally activate transcription elongation and splicing28. Table I correspond to previously known splicing factors and Additional transcription factors (for example, CA150, XAB2 and spliceosome components7–9. Among these are all known canonical SKIP) as well as polyadenylation factors (CF I) are also in the members of the SR family of splicing factors11–13, all known DExD spliceosome (Supplementary Tables I and II), and may function in box splicing factors7, all proteins found in the salt-stable spliceo- coupling. We also detect Aly and UAP56 in the spliceosome, some ‘core’2,3, all known second catalytic step splicing factors14, and proteins that are known to couple splicing to mRNA export15 known components of the spliced messenger ribonucleoprotein (Supplementary Table I). Aly, CA150 and SKIP were previously (mRNP) that promotes mRNA transport to the cytoplasm15 detected in salt-stable spliceosome cores, indicating that at least (Supplementary Table I). A number of factors encoded by some of the coupling proteins are tightly bound to the spliceosome2. implicated in genetic diseases, ranging from retinitis pigmentosa Both Aly and UAP56 are components of the conserved TREX (U4/U6-61K16,17; U4/U6-90K18; U5-220K19) to cancer (WTAP20), complex, which contains proteins that function in transcription are also associated with the spliceosomes (Supplementary Tables I elongation in yeast29. Our data reveal that every protein associated and II). with the TREX complex is also in the functional spliceosome We also found 58 proteins that have not been previously ident- (Supplementary Table II), suggesting that transcription, splicing ified as spliceosomal proteins in humans. These are listed in and export may all be coupled via this complex. In metazoans, most Supplementary Table II, and are organized according to the motifs pre-mRNAs contain small exons and enormous introns. Thus, the in each protein. Some of these proteins (36) have previously association of the TREX complex with the spliceosome may explain described names and/or functions. The remainder are uncharacter- how mRNA export factors are co-transcriptionally loaded onto ized, and have been designated ‘functional spliceosome associated exons, which are destined for export, and excluded from introns, proteins’ (fSAPs). Among the already named proteins are 5 cyclo- which are retained in the nucleus. philin-like proteins (Supplementary Table II). Recent studies Although it is possible that more spliceosomal proteins will be revealed a role for one cyclophilin (U4/U6-20K) in splicing21, and found, our data suggest that the purified functional spliceosomes our data raise the possibility that cyclophilins (prolyl-isomerases) contain virtually the complete human splicing proteome. This may have a general function in mediating conformational changes proteome includes all of the previously known splicing factors, in the spliceosome. Data from several observations suggest that snRNP proteins and spliceosomal proteins, as well as a large set of most, if not all, of the 58 new proteins are specifically associated newly identified proteins. The observation that the spliceosome is with spliceosomes. First, all 58 are common to spliceosomes associated with numerous proteins that function in coupling spli- assembled on two different pre-mRNAs. Second, these proteins cing to other steps in gene expression provides compelling evidence are either not detected or not enriched in the purified H complex. for the emerging concept of an extensively coupled network of gene Third, our data (Fig. 1) show that the spliceosomes are highly expression machines25–27. A purified from HeLa nuclear extract contaminants. Fourth, the new proteins are present in the spliceosome with the same relative Methods certainty as the known splicing proteins (based on the number of Plasmids encoding AdML-M3 or Ftz-M3 pre-mRNAs and the purification of unique peptides identified; compare Supplementary Tables I and spliceosomes were described previously4,5. For peptide identification from database II). Fifth, a large number of the proteins have domains, such as RNA sequences using tandem mass spectrometry, the proteins in the MBP-purified complexes recognition motifs or DExD boxes, which are hallmarks of splicing were fractionated from the gel origin to a distance of 3 cm into a 10% SDS–PAGE gel and stained with Coomassie blue. The gel lane was divided horizontally into ,10 slices which proteins. Sixth, some of the proteins are known to co-localize with were individually processed to yield in-gel trypsin digestion products. The resulting splicing factors in nuclear speckles and/or to localize in the peptide mixtures were separated and analysed in an automated system by nanoscale LC- nucleus22 (see Supplementary Table II). Last, many of the proteins MS/MS on an LCQ-DECA ion trap mass spectrometer (Thermo Finnigan) using the (26) have known, or apparent, yeast counterparts that function in vented-column approach6. The data were searched against the non-redundant human 30 23 database from NCBI with the Sequest algorithm . Peptide matches were validated by mRNA processing in Saccharomyces cerevisiae (Supplementary previously established criteria, including manual inspection when necessary6. More than Table II). 65,000 sequencing attempts (tandem mass spectra) were acquired, resulting in the The splicing machinery is highly conserved from S. cerevisiae to identification of 8,400 unique peptides. The proteins identified are presented in

184 © 2002 Nature Publishing Group NATURE | VOL 419 | 12 SEPTEMBER 2002 | www.nature.com/nature letters to nature

Supplementary Tables I–III. Tubulin, keratins and ribosomal proteins were found in all 19. McKie, A. B. et al. Mutations in the pre-mRNA splicing factor gene PRPC8 in autosomal dominant complexes, probably as contaminants. retinitis pigmentosa (RP13). Hum. Mol. Genet. 10, 1555–1562 (2001). 20. Little, N. A., Hastie, N. D. & Davies, R. C. Identification of WTAP, a novel Wilms’ tumour Received 9 April; accepted 12 July 2002; doi:10.1038/nature01031. 1-associating protein. Hum. Mol. Genet. 9, 2231–2239 (2000). 1. Will, C. L. & Luhrmann, R. Spliceosomal UsnRNP biogenesis, structure and function. Curr. Opin. Cell 21. Horowitz, D. S., Lee, E. J., Mabon, S. A. & Misteli, T. A cyclophilin functions in pre-mRNA splicing. Biol. 13, 290–301 (2001). EMBO J. 21, 470–480 (2002). 2. Neubauer, G. et al. Mass spectrometry and EST-database searching allows characterization of the 22. Andersen, J. S. et al. Directed proteomic analysis of the human . Curr. Biol. 12, 1–11 (2002). multi-protein spliceosome complex. Nature Genet. 20, 46–50 (1998). 23. Stevens, S. W. et al. Composition and functional characterization of the yeast spliceosomal penta- 3. Bennett, M., Michaud, S., Kingston, J. & Reed, R. Protein components specifically associated with snRNP. Mol. Cell 9, 31–44 (2002). prespliceosome and spliceosome complexes. Genes Dev. 6, 1986–2000 (1992). 24. Cartegni, L., Chew, S. L. & Krainer, A. Listening to silence and understanding nonsense: Exonic 4. Zhou, Z. et al. The protein Aly links pre-messenger-RNA splicing to nuclear export in metazoans. mutations that affect splicing. Nature Rev. Genet. 3, 285–298 (2002). Nature 407, 401–405 (2000). 25. Bentley, D. Coupling RNA polymerase II transcription with pre-mRNA processing. Curr. Opin. Cell 5. Das, R., Zhou, Z. & Reed, R. Functional association of U2 snRNP with the ATP-independent Biol. 11, 347–351 (1999). spliceosomal complex E. Mol. Cell 5, 779–787 (2000). 26. Hirose, Y. & Manley, J. L. RNA polymerase II and the integration of nuclear events. Genes Dev. 14, 6. Licklider, L. J., Thoreen, C. C., Peng, J. & Gygi, S. P. Automation of nanoscale microcapillary liquid 1415–1429 (2000). chromatography-tandem mass spectrometry with a vented column. Anal. Chem. 74, 3076–3083 (2002). 27. Maniatis, T. & Reed, R. An extensive network of coupling among gene expression machines. Nature 7. Staley, J. P. & Guthrie, C. Mechanical devices of the spliceosome: motors, clocks, springs, and things. 416, 499–506 (2002). Cell 92, 315–326 (1998). 28. Fong, Y. W. & Zhou, Q. Stimulatory effect of splicing factors on transcriptional elongation. Nature 8. Burge, C. B., Tuschl, T. H. & Sharp, P.A. The RNAWorld (eds Gesteland, R. F., Cech, T. R. & Atkins, J. F.) 414, 929–933 (2001). 525–560 (Cold Spring Harbor Laboratory Press, New York, 1999). 29. Strasser, K. et al. TREX is a conserved complex coupling transcription with messenger RNA export. 9. Hastings, M. L. & Krainer, A. R. Pre-mRNA splicing in the new millennium. Curr. Opin. Cell Biol. 13, Nature 417, 304–308 (2002). 302–309 (2001). 30. Eng, J. K., McCormack, A. L. & Yates, J. R. An approach to correlate tandem mass spectral data of 10. Bennett, M., Pinol-Roma, S., Staknis, D., Dreyfuss, G. & Reed, R. Differential binding of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 5, 976–989 (1994). heterogeneous nuclear ribonucleoproteins to mRNA precursors prior to spliceosome assembly in vitro. Mol. Cell Biol. 12, 3165–3175 (1992). Supplementary Information accompanies the paper on Nature’s website 11. Fu, X. D. The superfamily of arginine/serine-rich splicing factors. RNA 1, 663–680 (1995). (http://www.nature.com/nature). 12. Tacke, R. & Manley, J. L. Determinants of SR protein specificity. Curr. Opin. Cell Biol. 11, 358–362 (1999). Acknowledgements 13. Graveley, B. R. Sorting out the complexity of SR protein functions. RNA 6, 1197–1211 (2000). We thank. N. Dorman, E. Ibrahim and K. Magni for comments on the manuscript; 14. Zhou, Z. & Reed, R. Human homologs of yeast prp16 and prp17 reveal conservation of the mechanism C. Thoreen for database support; R. Das for SF3a-depleted nuclear extract; and for catalytic step II of pre-mRNA splicing. EMBO J. 17, 2095–2106 (1998). R. Luhrmann and G. Dreyfuss for antibodies. HeLa cells were obtained from the National 15. Reed, R. & Hurt, E. A conserved mRNA export machinery coupled to pre-mRNA splicing. Cell 108, Cell Culture Center. This work was supported by the NIH (S.P.G. and R.R.). 523–531 (2002). 16. Vithana, E. N. et al. A human homolog of yeast pre-mRNA splicing gene, PRP31, underlies autosomal dominant retinitis pigmentosa on 19q13.4 (RP11). Mol. Cell 8, 375–381 (2001). 17. Makarova, O. V., Makarov, E. M., Liu, S., Vornlocher, H. P.& Luhrmann, R. Protein 61K, encoded by a Competing interests statement gene (PRPF31) linked to autosomal dominant retinitis pigmentosa, is required for U4/U6†U5 tri- The authors declare that they have no competing financial interests. snRNP formation and pre-mRNA splicing. EMBO J. 21, 1148–1157 (2002). 18. Chakarova, C. F. et al. Mutations in HPRP3, a third member of pre-mRNA splicing factor genes, Correspondence and requests for materials should be addressed to R.R. implicated in autosomal dominant retinitis pigmentosa. Hum. Mol. Genet. 11, 87–92 (2002). (e-mail: [email protected]).

NATURE | VOL 419 | 12 SEPTEMBER 2002 | www.nature.com/nature © 2002 Nature Publishing Group 185 Table I. Known splicing factors/snRNPs/SAPs identified in functional spliceosomes

# of pep. iden. Acce. # with Protein Name Yeast homolog (SGD Motifs Cal. AdML Ftz link to seq. ORF) Mass S'some S'some

SR proteins Q08170 SRp75 RS, 2 RRMs 56792 2 2 Q05519 p54/SFRS11 RS, 1 RRM 53542 3 1 Q13247 SRp55 RS, 2 RRMs 39568 3 3 Q13243 SRp40 RS, 2 RRMs 31264 2 3 Q07955 ASF/SF2 RS, 2 RRMs 27613 11 12 Q16629 9G8 RS, 1 RRM 27367 3 3 Q01130 SC35 RS, 1 RRM 25575 2 2 Q13242 SRp30c RS, 2 RRMs 25542 2 4 AF057159 hTra2 RS, 1 RRM 21935 1 1 P23152 SRp20 RS, 1 RRM 19330 4 3

Non-snRNP spliceosome-assembly proteins NP_057417 SRm300 RS-rich 251965 12 9 AB018344 hPrp5 PRP5 (YBR237W) DExD, HELICc, RS 117461 11 3 AF255443 hCrn CLF1 (YLR117C) HAT 99201 12 4 NP_001244 CDC5L CEF1 (YMR213W) 92250 20 12 Q09161 hCBP80 CBP80 (YMR125W) MIF4G 91839 12 10 Y08765 SF1 BBP (Ylr116w) ZF, KH 68632 4 1 Q13573 SKIP PRP45 (YAL032C) SKIP 61494 8 6 BC008875 PUF60 3 RRMs 58171 15 5 AF044333 PLRG1 PRP46 (YPL151C) WD40s 57194 7 3 BC008719 hPrp19 PRP19 (YLL036C) WD40s 55181 10 7 P26368 U2 AF65 MUD2 (YKL074C) RS, 3 RRMs 53501 6 2 Q01081 U2AF35 RS, 1 RRM, ZF 27872 2 2 AF083385 SPF30 TUDOR 26711 3 2 P52298 hCBP20 CBP20 (YPL178W) 1 RRM 18001 2 3 AF049523 hFBP11 PRP40 (YKL012W) 2 WW, 1 FF partial 4 4

U1 snRNP specific proteins P08621 U1-70kD SNP1 (YIL061C ) RS, 1 RRM 70081 4 1 P09012 U1 A MUD1 (YBR119W) 2 RRMs 31280 1 2 P09234 U1 C YHC1 (YLR298C) ZF 17394 2 1

U2 snRNP specific proteins AF054284 SAP155 HSH155 (YMR288W) PP2A 145815 25 20 Q13435 SAP145 CUS1 (YMR240C) Pro-rich 97657 15 10 AJ001443 SAP130 RSE1 (YML049C) 135592 5 5 Q15459 SAP114 PRP21 (YJL203W) SWAP, UBQ 88886 16 13 Q15428 SAP62 PRP11 (YDL043C) ZF 49196 4 1 A55749 SAP61 PRP9 (YDL030W) ZF 58849 11 4 Q15427 SAP49 HSH49 (YOR319W) 2 RRMs 44386 2 2 P09661 U2 A' LEA1 (YPL213W) Leu-rich 28444 9 9 P08579 U2 B" MSL1 (YIR009W) 2 RRMs 25486 1 1 Q9Y3B4 p14 SNU17 (YIR005W) 1 RRM 14584 3 3

U5 snRNP specific proteins AB007510 U5-220kD (YHR165C) 273785 29 33 O75643 U5-200kD BRR2 (YER172C) DExD, HELICc 194479 39 36 BC002360 U5-116kD SNU114 (YKL173W) G domain 109478 22 19 BC001666 U5-102kD PRP6 (YBR055C) HAT 106925 4 3 BC002366 U5-100kD PRP28 (YDR243C) DExD, HELICc, RS 95583 14 12 BC000495 U5-52kD SNU40 (YHR156C) GYF 37646 2 1 AF090988 U5-40kD WD40s 39299 3 3 O14834 U5-15kD DIB1 (YPR082C) 16786 1 1

U4/U6 snRNP specific proteins T50839 U4/U6-90kD PRP3 (YDR473C) PWI 77529 9 5 BC007424 U4/U6-60kD PRP4 (YPR178W) WD40s 58321 6 5 AL050369 U4/U6-61kD PRP31 (YGR091W) NOP 55424 4 3 AF036331 U4/U6-20kD CPH1 (YDR155C ) Cyclophilin 19207 5 3 P55769 U4/U6-15.5kD SNU13 (YEL026W) putative RRM 14174 3 1

U4/U6.U5 tri-snRNP specific proteins T00034 Tri-snRNP 110kD SNU66 (YOR308C) RS 90255 4 11 AF353989 Tri-snRNP 65kD SAD1 (YFR005C) RS, UBQ, ZF 65415 8 3 X76302 Tri-snRNP 27kD RS 18859 2 3

Sm/LSm core proteins P14678 Sm B/B' SMB1 (YER029C) Sm 1 and 2 24610 3 2 P13641 Sm D1 SMD1 (YGR074W) Sm 1 and 2 13282 4 3 P43330 Sm D2 SMD2 (YLR275W) Sm 1 and 2 13527 6 5 P43331 Sm D3 SMD3 (YLR147C) Sm 1 and 2 13916 2 2 P08578 Sm E SME1 ( YOR159C) Sm 1 and 2 10804 3 3 Q15356 Sm F SMX3 (YPR182W) Sm 1 and 2 9725 2 3 Q15357 Sm G SMX2 (YFL017W-A) Sm 1 and 2 8496 2 1 Q9Y333 hLSm2 LSM2 (YBL026W) Sm 1 and 2 10835 2 2 Q9Y4Z1 hLSm3 LSM3 (YLR438C-A) Sm 1 and 2 11714 1 3 Q9Y4Z0 hLSm4 LSM4 (YER112W) Sm 1 and 2 15350 3 2 Q9Y4Y8 hLSm6 LSM6 (YDR378C) Sm 1 and 2 9128 1 1 Q9UK45 hLSm7 LSM7 (YNL147W) Sm 1 and 2 11602 2 1

Catalytic step II and late acting proteins Q92620 hPrp16 PRP16 (YKR086W) DExD, HELICc, RS 140473 5 2 Q14562 hPrp22 PRP22 (YER013W) DExD, HELICc, RS 139315 9 9 O43143 hPrp43 PRP43 (YGL120C ) DExD, HELICc 92829 10 4 BC010634 hSlu7 SLU7 (YDR088C) ZF 68343 2 2 AF038392 hPrp17 PRP17 (YDR364C ) WD40s 65521 6 1 BC000794 hPrp18 PRP18 (YGR006W) 39860 2 1

Spliced mRNP/EJC proteins AF048977 SRm160 PWI, RS 93519 2 4 Q13838 UAP56 SUB2 (YDL084W) DExD, HELICc 48991 3 3 PJC4525 RNPS1 1 RRM, Ser-rich 34208 2 2 NM_005782 Aly YRA1 (YDR381W) 1 RRM 26861 1 3 Q9Y5S9 Y14 1 RRM 19889 4 2 P50606 Magoh Mago_nashi 17164 3 3

Other previously reported splicing factors/SAPs T08599 CA150 3 WW, 6 FF 123960 21 15 AY029347 hPrp4 kinase TyrKc 116973 2 3 P17844 p68 DBP2 (YNL112W) DExD, HELICc 69148 6 4 Q13123 RED RED 65630 4 2 O15355 PP2Cγ PTC3 (YBL056W) 59272 2 1 Q15097 PAB2 PAB1 (YER165W) 4 RRMs, PABP 58518 4 5 BC007871 SPF45 1 RRM, Gly-rich 44962 1 1 AF083383 SPF38 WD40s 34290 2 2 NP_055095 SPF31 DnaJ 30986 2 1 AF081788 SPF27 26131 6 5 Table II. Proteins newly identified in functional spliceosomes

# of pep. iden. Acc. # with Protein Putative yeast Motifs Comments Cal. AdML Ftz link to seq. Name homolog (SGD) Mass S'some S'some

RRM-containing proteins NP_055792 fSAP152 1 RRM (HP KIAA0670) Likely ortholog of mouse Acinus (apoptotic 151887 20 17 chromatin condensation inducer) AC004858 fSAP94 1 RRM, PWI Related to Drosophila cleavage stimulation factor 64 kD 94122 17 8 AF356524 SHARP 4 RRMs Transcription cofactor 402248 16 8 AAB18823 TAT-SF1 CUS2 (Ynl286w) 2 RRMs Interacts with snRNPs, stimulates transcription elongation 85759 9 7 I55595 fSAP59 2 RRMs Likely ortholog of mouse transcription coactivator CAPER, 58657 7 2 colocalizes with splicing factors in speckles NP_008938 CF I-68kD 1 RRM Im, 68kD subunit 59209 5 1 NP_073605 OTT 3 RRMs Chromosomal translocation of OTT with megakaryocytic acute 102135 2 3 leukemia (MAL) causes acute megakaryocytic leukemia BC003402 fSAP47 ECM2 (YBR065C)* 1 RRM, ZF (HP FLJ10290) ECM2 is penta-snRNP specific 46896 3 1 U76705 IMP3 2 RRMs, 4 KH Insulin-like growth factor (IGF) II mRNA-binding protein 3 63720 5 1 Q14011 CIRP 1 RRM, Gly-rich Cold-inducible RNA-binding protein 18648 2 2 BC006474 fSAPa 1 RRM, SWAP (HP KIAA0332) partial 2 2

DExD box proteins T00333 fSAP164 DExD (HP KIAA0560) 163986 13 14 P38919 IF4N FAL1 (YDR021W)* DExD, HELICc Putative ATP-dependent RNA helicase 46833 9 8 Q92841 p72 DBP2 (YNL112W) DExD, HELICc Nuclear ATP-dependent RNA helicase 72371 8 4 O60231 hPrp2 PRP2 (YNR011C)* DExD, HELICc Prp2 is required for first step of pre-mRNA splicing 119172 5 6 Q08211 RHA (YLR419W)* DExD, HELICc, Associates with CBP and RNA Polymerase II 140877 1 7 DSRM O00571 DDX3 DBP1 (YPL119C)* DExD, HELICc, SR Colocalizes with splicing factors in speckles 73243 5 2 Q9UJV9 Abstrakt DExD, HELICc Drosophila Abstrakt involved in mRNA processing 69738 4 1 P42285 fSAP118 MTR4 (YJL050W)* DExD (HP KIAA0052) MTR4 is required for mRNA processing 117790 3 1 BC002548 fSAP113 DExD (HP MGC2948) 113671 2 2

Proteins with other known motifs BC007208 XAB2 SYF1 (YDR416w) HAT Transcription cofactor; SYF1 is penta-snRNP specific 100010 12 8 A53545 hHPR1 HPR1 (YDR138W) DEATH Associated with TREX complex 75627 6 12 AB034205 LUC7A LUC7 (YDL087C)* RS-rich LUC7 is U1 snRNP component 51466 7 6 AB046824 fSAPb CWC22 (YGR278W)* MIF4G (HP KIAA1604) CWC22 associates with splicing factors partial 9 4 XM_087118 hFBP3 1 WW (HP HSPC225) Related to spliciing factor hFBP21 24297 5 5 BC003118 fSAP35 WD40s (HP MGC2655) Associated with TREX complex 34849 2 6 Q9NYV4 CrkRS CTK1 (YKL139W)* TyrKc Kinase, colocalizes with splicing factors in speckles 164155 4 4 T12531 TIP39 (YLR424W)* Gly-rich YLR424w associates with splicing factors 96820 5 3 BC001403 CF I-25kD NUDIX Cleavage factor Im, 25kD subunit 26227 5 4 BC006849 hTEX1 TEX1 (YNL253W) WD40s Associated with TREX complex 38772 3 3 BC002876 fSAP57 PFS2 (YNL317W)* WD40s (HP FLJ10805) PFS2 is a polyadenylation factor 57544 4 1 T02672 fSAPc ERM (HP R31449) partial 3 2 P41223 fSAP17 CWC14 (YCR063W)* ZF CWC14 associates with splicing factor CEF1 16844 2 1 O43670 ZNF207 ZF Zinc finger transcription factor 50751 2 1

Proteins with no known motif AAK21005 ASR2 Associated with TREX complex 100665 11 12 AF441770 hTHO2 THO2 (YNL139C) Associated with TREX complex 169573 7 12 I39463 fSAP79 (HP KIAA0983) Associated with TREX complex 78536 2 7 AK027098 fSAP24 (HP FLJ23445) Associated with TREX complex 23671 2 6 BC001621 SNP70 Colocalizes with splicing factors in speckles 69998 6 2 BC006350 fSAP71 (HP MGC13125) 70521 7 1 P55081 MFAP1 Candidate gene for diseases of microfibrils 51855 3 3 AJ276706 WTAP Wilms' tumour 1-associating protein; homolog of Drosophila pre- 44243 3 2 mRNA splicing regulator female-lethal (2) d AJ279080 fSAP105 Putative transcription factor 104804 3 2 NP_056311 fSAP121 (HP DKFZP434I116) 121193 2 3 BC004442 fSAP33 ISY1 (YJR050W) (HP MGC4036) ISY1 is penta-snRNP specific 32992 3 1 AF161497 CCAP2 CWC15 (YDR163W)* (HP HSPC148) Associates with splicing factor CDC5L 26610 2 2 Q9Y5B6 GCFC Putative transcription factor 29010 3 1 NP_073210 DGSI DiGeorge syndrome critical region gene (Es2) 52567 3 1 NP_110517 NAP Nuclear protein that induces apoptosis 65173 3 1 NP_056299 fSAP29 (HP DKFZp564O2082) Nuclear protein 28722 1 3 NP_037374 fSAP23 (HP MGC3336) Putative transcription factor 22774 1 2 BC004122 fSAP11 (HP MGC11203) 10870 1 2 BC000216 fSAP18 (HP MGC2803) 18419 2 1

Cyclophilin-like proteins XP_042023 CyP64 CPR3 (YML078W)* Cyclophilin, WD40 (HP KIAA0073) Cyclophilin type peptidylprolyl isomerase 64222 9 4 AF271652 PPIL3 CPR6 (YLR216C)* Cyclophilin Peptidylprolyl isomerase (cyclophilin)-like 3 18155 5 3 S64705 PPIL2 CYP5 (YDR304C)* Cyclophilin, Ubox Peptidylprolyl isomerase (cyclophilin)-like 2 58823 6 1 BC003048 PPIL1 CYP2 (YHR057C)* Cyclophilin Peptidylprolyl isomerase (cyclophilin)-like 1 18237 4 2 Q9UNP9 PPIE CPH1 (YDR155C)* Cyclophilin Peptidylprolyl isomerase E (cyclophilin E) 33431 2 2 Table III. Proteins designated as H complex components

Acce. # with Protein Name Motifs Cal. link to seq. Mass

Q13151 hnRNP A0 2 RRMs 30841 P09651 hnRNP A1 2 RRM 38715 P22626 hnRNP A2/B1 2 RRMs 37430 P51991 hnRNP A3 2 RRMs 39686 P07910 hnRNP C1/C2 1 RRM 33299 Q14103 hnRNP D0 2 RRMs 38434 P26599 hnRNP I/PTB 4 RRMs 57221 Q07244 hnRNP K KH 50976 P14866 hnRNP L 3 RRMs 60187 O43390 hnRNP R 3 RRMs 70943 AL031668 hnRNP RALY 1 RRM 32214 P35637 FUS/hnRNP P2 1 RRM 53426 B54857 NF-AT 90k 2 DSRMs, ZF 73339 AJ271745 NFAR-1 DSRM, ZF 76033 A54857 NF-AT 45k DSRM, ZF 44697 AF037448 GRY-RBP 3 RRMs 69633 P43243 Matrin3 94623 O43684 hBUB3 WD40s 37155 Q15717 HuR 3 RRMs 36062 Q92804 TAFII68 1 RRM, ZF 61830 P16991 YB1 COLD 35924 P16989 DBPA COLD 40060 P08107 HSP70 HSP70 70052 P11142 HSP71 HSP70 70898 P11021 GRP78 HSP70 72116 Table I. Known splicing factors/snRNPs/SAPs identified in functional spliceosomes.

The GenBank accession number for each protein is presented with a clickable link to the protein sequence and to the SWISS-PROT (representative peptides matched to MS/MS spectra from AdML spliceosomes are highlighted in red). # of pep. iden. refers to the number of filtered unique MS/MS spectra that match to the corresponding protein. Abbreviations and motifs (obtained from NCBI Conserved Domain Database): SAP, spliceosome-associated protein; SGD, saccharomyces genome database (Stanford, CA); COLD, cold shock RNA binding domain; Cyclophilin, cyclophilin type peptidyl-prolyl cis-trans isomerase; DExD, DExD/H-like helicases superfamily; FF, contains two conserved F residues, often accompanies WW domains; G domain, GTP binding domain that contains a P-loop motif; GYF, contains conserved G-T-F residues; HAT, Half-A-TPR (tetratrico-peptide repeat); HELICc, helicase superfamily c-terminal domain; KH, hnRNP K RNA-binding domain; MIF4G, middle domain of eukaryotic initiation factor 4G; NOP, putative snoRNA binding domain; PABP, poly-adenylate binding protein, unique domain; PP2A, protein phosphatase 2A repeat; PWI, domain in splicing factors; RED, protein with extensive stretch of alternating R and E or D; RRM, RNA recognition motif (RRM, RBD, or RNP domain); RS, RS domain found in SR or SR-related splicing factors; SKIP, conserved domain found in chromatinic proteins; Sm, snRNP Sm proteins; SWAP, suppressor-of-white-apricot splicing regulator; TUDOR, domain present in several RNA-binding proteins; TyrKc, tyrosine kinase, catalytic domain; UBQ, ubiquitin homologues; WD40s, WD40 repeats, structural repeats (blades) of the beta propeller domain; WW, domain with 2 conserved W residues, binds proline- rich polypeptides; ZF, Zinc finger domain.

Table II. Proteins newly identified in functional spliceosomes. Yeast homologs were identified based on high sequence similarity to their human counterparts and are indicated by *. Abbreviations and Motifs: fSAP, functional spliceosome-associated protein; DEATH, domain found in proteins involved in cell death; DSRM, double-stranded RNA binding motif; ERM, ezrin/radixin/moesin family motif; HP, hypothetical protein; NUDIX, mutT-like domain; Ubox, modified RING finger domain. See Table I for additional footnotes and motifs.

Table III. Proteins designated as H complex components.

List of proteins found in the spliceosomes but not designated as spliceosome components because the Table III proteins are abundant in the non-specific H complex. For abbreviations, see footnotes for Tables I and II.