bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Dynamics of during somatic cell reprogramming reveals functions for RNA-binding CPSF3, hnRNP UL1 and TIA1

Claudia Vivori1,2, Panagiotis Papasaikas1,#, Ralph Stadhouders1,##, Bruno Di Stefano1,%, Clara Berenguer Balaguer1,%%, Serena Generoso1,2, Anna Mallol1, José Luis Sardina1,%%, Bernhard Payer1,2, Thomas Graf1,2 and Juan Valcárcel1,2,3*

1 Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Carrer del Dr. Aiguader 88, 08003 Barcelona, Spain 2 Universitat Pompeu Fabra (UPF), Carrer del Dr. Aiguader 88, 08003 Barcelona, Spain 3 Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluís Companys 23, 08010 Barcelona, Spain

* Correspondence to [email protected]

# Current address: Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66 / Swiss Institute of Bioinformatics, 4058 Basel, Switzerland

## Current address: Departments of Pulmonary Medicine and Cell Biology, Erasmus MC, Rotterdam, The Netherlands

% Current address: Department of Molecular Biology, Massachusetts General Hospital / Center for Regenera- tive Medicine / Center for Cancer Research, Massachusetts General Hospital / Harvard Medical School, Boston, MA, USA / Department of Stem Cell and Regenerative Biology / Harvard Stem Cell Institute, Harvard University, Cambridge, MA, USA

%% Current address: Josep Carreras Leukaemia Research Institute, Carretera de Can Ruti, Camí de les Escoles, s/n, 08916 Badalona, Spain

1 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Abstract in coding capacity, stability or translational efficiency, and that can be translated into pro- In contrast to the extensively studied teins with distinct structural and functional rewiring of epigenetic and transcriptional pro- properties (Pan et al. 2008; E. T. Wang et al. grams required for cell reprogramming, the 2008). AS contributes to the regulation of dynamics of post-transcriptional changes and many biological processes in multicellular eu- their associated regulatory mechanisms re- karyotes, including embryonic development - main poorly understood. Here we have stud and tissue specification (reviewed in Baralle ied the dynamics of alternative splicing (AS) and Giudice, 2017). During the last decade, changes occurring during efficient repro- progress has been made to understand the gramming of mouse B cells into induced role of post-transcriptional regulation (includ- pluripotent stem (iPS) cells. These changes, ing AS) in the maintenance of cellular pluripo- generally uncoupled from transcriptional reg- tency and cell fate decisions, discovering ulation, significantly overlapped with splicing differentially spliced between stem cells programs reported during reprogramming of and differentiated cells and splicing regulators mouse embryonic fibroblasts (MEFs). Correla- that control these choices (G. W. Yeo et al. tion between expression of potential 2009; Salomonis et al. 2010; Gabut et al. 2011; regulators and specific clusters of AS changes Han et al. 2013; Venables et al. 2013; Lu et al. enabled the identification and subsequent 2014; Yamazaki et al. 2018; Solana et al. 2016a). validation of CPSF3 and hnRNP UL1 as facilita- tors, and TIA1 as repressor of MEFs repro- During reprogramming into induced gramming. These RNA-binding proteins con- pluripotent stem (iPS) cells, somatic cells revert trol partially overlapping programs of splicing to a pluripotent state after overexpression of regulation affecting genes involved in devel- the factors OCT4, SOX2, KLF4 and opmental and morphogenetic processes. Our MYC (OSKM) (Takahashi and Yamanaka 2006). results reveal common programs of splicing Substantial progress has been made to under- regulation during reprogramming of different stand the process at the transcriptional and cell types and identify three novel regulators epigenetic level, such as by identifying nu- of this process. merous roadblocks and some facilitators, but comparatively little is known about how post- transcriptional regulation impacts cell fate de- Keywords cisions. Recent work has revealed the func- tional relevance and conservation of splicing Alternative splicing, Somatic cell repro- regulation during reprogramming (Han et al. gramming, CPSF3, hnRNP UL1, TIA1, Pluripo- 2013; Ohta et al. 2013; Toh et al. 2016; Kanitz et tency al. 2019; reviewed in Zavolan and Kanitz, 2018). Previously, a conserved functional splic- Introduction ing program associated with pluripotency and repressed in differentiated cells by the RNA- Alternative splicing (AS) is a widespread binding proteins MBNL1 and MBNL2 was pre- mechanism of gene regulation that generates viously reported (Han et al. 2013). This splicing multiple mRNA isoforms from a single gene, program includes a mutually exclusive exon dramatically diversifying the transcriptome event in the transcription factor FOXP1: a (and proteome) of eukaryotic cells. 95% of switch in inclusion of Foxp1 exons 16/16b multi-exonic mammalian genes undergo AS, modulates the functions of the transcription producing mRNA isoforms which often differ factor between pluripotent and differentiated

2 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

cells (Gabut et al. 2011). Illustrating the com- Results plexity of such splicing program, dynamic changes of AS during cell reprogramming of mouse embryonic fibroblasts (MEFs) revealed Dynamics of alternative splicing sequential waves of exon inclusion and skip- in C/EBPα-enhanced B cell ping in reprogramming intermediates and the reprogramming occur independently from changes functional role of splicing regulators in modu- lating reprogramming, in particular during the To study the dynamics of changes in alter- initial mesenchymal-to-epithelial transition native splicing (AS) during cell reprogram- (MET) phase (Cieply et al. 2016). Given the lim- ming, primary mouse pre-B cells (hereafter ited efficiency of cell reprogramming in this referred to as “B cells”) were reprogrammed as system, subpopulations of reprogramming previously described (Di Stefano et al. 2016; intermediates had to be isolated through the Stadhouders et al. 2018; Di Stefano et al. 2014). expression of a pluripotency marker, biasing Briefly, B cells were isolated from bone marrow studies to the most prevalent and dominant of reprogrammable mice, containing a doxy- factors. cycline-inducible OSKM cassette and an OCT4- GFP reporter. These cells were infected with a Here we took advantage of the rapid, high- retroviral construct containing an inducible ly efficient and largely synchronous repro- version of C/EBPα fused to the estrogen recep- gramming of pre-B cells (hereafter referred to tor ligand-binding domain (ER). Infected cells as “B cell reprogramming”), obtained by a were selected and re-plated on a feeder layer pulse of the transcription factor C/EBPα fol- of inactivated mouse embryonic fibroblasts lowed by induced OSKM expression (Di Ste- (MEFs). This was followed by a 18h-long pulse fano et al. 2014, 2016) to study the dynamics of β-estradiol, triggering the translocation of of AS during this transition. The essentially C/EBPα-ER to the nucleus and poising the B homogeneous reprogramming of the cells in cells for efficient and homogeneous repro- this system allowed detailed temporal tran- gramming (Di Stefano et al. 2016, 2014). After scriptome analyses of the bulk population, washout of β-estradiol, reprogramming was without the need of selecting for reprogram- induced by growing the cells for 8 days in re- ming intermediates. We established clusters of programming medium containing doxycycline temporal regulation and compared these (see Methods and Stadhouders et al. 2018). changes with the ones differentially spliced in RNA was isolated every other day from dupli- MEF reprogramming (Cieply et al. 2016). Ana- cates and subjected to paired-end sequencing lyzing the dynamic expression of RNA-binding (RNA-seq), resulting in high coverage (more proteins (RBPs) during reprogramming, we than 100 million reads per sample) (Figure inferred potential AS regulators, of which three 1A). As positive controls, mouse embryonic were studied in detail. Characterization of stem (ES) cells and induced pluripotent stem these factors, namely CPSF3, hnRNP UL1 and (iPS) cells were also sequenced. TIA1, by perturbation experiments demon- strated their role as AS regulators in the induc- AS analysis was performed using vast-tools tion of pluripotency. (Tapial et al. 2017), an event-based software that quantifies Percent Spliced-In (PSI) values of annotated AS events and constitutive exons in all samples. These analyses revealed more than 14000 AS changes during the entire re- programming time course (for any possible

3 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Cluster 01 - n=511 A C 2 1 Mouse Poised Day Day Day Day iPS cells ES cells B cells B cells 2 4 6 8 (iPS) (ES) 0 (B) (Bα) −1 Scaled PSI PULSE −2

18h B Bα D2 D4 D6 D8 iPS ES C/EBPα pulse OSKM induction Cluster 02 - n=348 Cluster 03 - n=343 2 B 1 ∆PSI ≥ 10% 0 2000

Scaled PSI −1 EARLY 1000 −2 B Bα D2 D4 D6 D8 iPS ES B Bα D2 D4 D6 D8 iPS ES 0 Cluster 04 - n=438 Cluster 05 - n=364 2 1000 1 0

erentially spliced events 2000 Scaled PSI

" −1

Cassette MIDDLE exons −2

(compared to B cells) 3000 Retained introns B Bα D2 D4 D6 D8 iPS ES B Bα D2 D4 D6 D8 iPS ES Alternative 5’SS Cluster 06 - n=337 4000 Alternative Event type 3’SS ∆PSI ≤ -10% 2 Number of di Bα D2 D4 D6 D8 iPS ES 1 0 Condition LATE D Scaled PSI −1 SPAG9 exon 27 NUMB exon 3 −2

B Bα D2 D4 D6 D8 iPS ES B Bα D2 D4 D6 D8 iPS ES B Bα D2 D4 D6 D8 iPS ES 300 bp -

100 bp - 100 100 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Membership score 50 50 PSI 0 0 F RT-PCR RNA-seq B GENE EXPRESSION LEF1 exon 6 LEF1 exon 11 Bα B Bα D2 D4 D6 D8 iPS ES B Bα D2 D4 D6 D8 iPS ES D2

100 bp - 100 bp - D4 100 100 50 50 D6 PSI 0 0 D8 E GRHL1 exon 6 DNMT3B exon 10 iPS B Bα D2 D4 D6 D8 iPS ES B Bα D2 D4 D6 D8 iPS ES

300bp - ALTERNATIVE SPLICING ES 100 bp - α B Pearson Correlation ES

100 100 B D2 D4 D6 D8 iPS † † Coe&cient 50 50 PSI 0 0 * RT-PCR RNA-seq 0.2 0.6 1

Cluster 01 - n=447 Cluster 02 - n=304 Cluster 03 - n=308 100 G H 6% 16% 16% 21% 2 25% 54% 43% 32% 1 *** *** *** 6% 6.71% 4.28% 16.55% 13% 75 11% 0 12% −1 2.63% 5.52%

Scaled cpm 4.25% −2 3% 50 60% 2% *** 46% 48% 40% ns ns 46% Cluster 04 - n=384 Cluster 05 - n=318 Cluster 06 - n=287 44%

2 25 38% 6% 4% 4% Percentage of exons 1% 1 3% 0% 3% 1% 1% 9.63% 16.35% 4.18% 1% 0% 3% 1% 0% 1% 0% 0% 0 1% 0% 16% 17% 16% 13% 1% 12% 0% 9% 7.32% 4% −1 12.76% 0

Scaled cpm 20.44% Cluster All 01 02 03 04 05 06 −2 All AS PULSE EARLY MIDDLE LATE exons B Bα D2 D4 D6 D8 iPS ES B Bα D2 D4 D6 D8 iPS ES B Bα D2 D4 D6 D8 iPS ES ORF-disruptive ORF-uncertain upon exclusion

cation ncRNA

gene expression gene expression other gene ! ORF-disruptive concordant contrasting expression upon inclusion 3’UTR with AS with AS pro#les of exons ORF-preserving 5’UTR Classi Figure 1. Dynamics of alternative splicing and gene expression changes during B cell reprogramming (legend on next page) 4 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 1. Dynamics of alternative splicing and gene expression changes during B cell reprogramming. (A) Schematic representation of C/EBPα-mediated B cell reprogramming time points and related controls analyzed by RNA-seq. Bα cells: B cells treated for 18h with β-estradiol to activate C/EBPα. (B) Stacked bar plot representing cumulative number of events differentially spliced between B cells and subsequent reprogramming stages (x axis), as well as controls (iPS and ES cells). The y axis represents the number of differentially spliced events compared to B cells. The upper part corresponds to events with positive ∆PSI values compared to B cells (>10%), the lower part to events with negative ∆PSI values compared to B cells (<-10%). Red/orange areas: alternative 3’/5’splice sites (Alt3/Alt5) respectively; grey areas: retained introns; blue-green areas: cassette exons of increasing complexity (see Methods). See also Figure S1A. (C) Clusters of cassette exons displaying related profiles of inclusion level changes during B cell reprogramming. Six clusters (out of a total of 12 identified, see Figures S1B-C) are shown and classified into 4 categories, corresponding to the timing of the main shift observed (left). The y axis represents scaled Percent Spliced In (PSI) values. The color of each line corresponds to the membership score of each exon relative to the general trend of the cluster. n, The size (number of events) of each cluster is indicated (n). (D) RT-PCR validation of selected cassette exon changes inferred from RNA-seq analyses. AS events assigned to different clusters were analyzed by semi-quantitative RT-PCR and quantified by capillary electrophoresis. Each panel includes a gel representation of the inclusion (top) and skipping (bottom) amplification products of one of the replicates and the corresponding quantification of the duplicates (PSI = molarity of inclusion product / molarity of inclusion + skipping products). Light grey columns: PSI values quantified by RT-PCR; dark grey columns: PSI values quantified by RNA-seq analysis using vast-tools software (n=2). (E) Validation of Grhl1 exon 6 and Dnmt3b exon 10 inclusion level changes, previously associated with reprogramming and pluripotency, performed as in panel D. Crosses indicate time points for which PSI values were calculated with low coverage (less than 10 actual reads). (F) Heatmap displaying correlations between B cell reprogramming stages according to gene expression (blue, top heatmap) and AS (red, bottom heatmap). Color scales represent Pearson correlation coefficient values calculated on the cpm values of the 25% most variably expressed genes or upon the PSI values of the 25% most variable cassette exons. (G) Gene expression patterns of genes containing the exons belonging to each of the AS clusters in panel C. Genes with expression changes correlating with the cluster centroid or its negative (membership > 0.3) are highlighted in blue and green, respectively, while the grey portion represents the (majority of) genes which follow gene expression profiles that do not match the changes in inclusion patterns of their exon(s). Percentage of concordant/contrasting patterns are displayed for each cluster. See also Figure S1D. (H) Stacked bar plot representing the percentage of cassette exons in each of the AS clusters in panel C classified according to the following categories: disrupting the open reading frame (ORF) upon exclusion or inclusion, preserving the transcript ORF, mapping in non-coding RNAs, 3’UTRs or 5’UTRs or uncertain. The first column represents all exons differentially spliced between at least one pair of conditions, while the following ones represent the exons belonging to the each AS cluster (indicated below the bar). Where indicated, statistical significance was calculated by Fisher’s exact test on the proportion of exons in the cluster compared to the general distribution of all AS exons (*, **, *** = p value < 0.05, 0.01, 0.001 respectively).

pair of conditions: minimum absolute ∆PSI of cell reprogramming (Figures 1C and S1B-C, 10 between PSI averages and minimum differ- Table S1). Six major clusters of AS dynamic ence of 5 between any individual replicates profiles were detected: 1) exons that become across conditions) and a gradual increase in included already after the C/EBPα pulse; 2) and the number of differentially spliced events 3) exons that are regulated either towards in- when comparing B cells to progressive stages creased inclusion or skipping early after OSKM of somatic cell reprogramming (Figures 1B induction (day 2); 4) and 5) exons that display and S1A). Different classes of AS events were changes in inclusion at middle stages of re- detected, with similar relative proportions at programming (day 4 onwards); 6) a cluster of the various time points: 31-47% cassette ex- exons only included at the latest steps of re- ons, 5-11% alternative 3’ splice sites and 6-11% programming (day 8 onwards) (Figure 1C). alternative 5’ splice sites, and 33-57% retained Each of these clusters consisted of 300-500 introns (Figure 1B). exons. The inclusion levels of examples of ex- ons belonging to different cluster types were To classify the dynamics of AS changes validated by semi-quantitative RT-PCR (Figure during reprogramming, we selected cassette 1D). For reference, changes in the PSI values of exon (CEx) events differentially spliced in at Grhl1 exon 6 and Dnmt3B exon 10, previously least one comparison (4556 exons) and per- described to be associated with reprogram- formed a fuzzy c-means clustering analysis on ming and pluripotency (Gopalakrishnan et al. their scaled PSI values (Mfuzz; Kumar and 2009; Gopalakrishna-Pillai and Iverson 2011; Futschik, 2007). This analysis revealed diverse Cieply et al. 2016) were also quantified and kinetics of exon inclusion occurring during B found to follow similar inclusion patterns in B

5 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

cell reprogramming, compared to the ones switches (Figure 1H). Consistent with these previously described in other systems (Figure concepts, while PSI values of cassette exons 1E). tend to increase throughout reprogramming (Figure 1B, blue bars), intron retention – gen- We next sought to compare general AS erally leading to NMD – tends to decrease in and gene expression dynamics during repro- the course of reprogramming (Figure 1B, grey gramming. Gene expression was analyzed us- bars). ing the edgeR package (Robinson et al. 2009) and the level of similarity between each pair of conditions was estimated using a Pearson cor- relation coefficient on the cpm (counts per AS changes at intermediate million) values of the most variable genes (3rd reprogramming stages show quartile coefficient of variation, n=2961). In commonalities with MEF addition, Pearson correlation coefficient was reprogramming calculated based on the PSI values from the most variable cassette exons (3rd quartile co- As a first step to identify key AS events and efficient of variation, n=1140). Both analyses potential regulators important for repro- showed pronounced switches between days 4 gramming, we compared our transcriptome and 6 post-OSKM induction (Figure 1F). Over- analysis of B cell reprogramming with that of all, however, most clusters displayed matching MEF reprogramming (Cieply et al. 2016). The profiles in gene expression and AS of any in- two datasets differ in the experimental design cluded exon in less than 10% of the genes, and time frame (compare Figures 1A and 2A). reaching a maximum of 20% in a subset of AS Specifically, the MEF system required sorting clusters (Figures 1G and S1D-E). These results of cells undergoing reprogramming using the argue, as observed before in a variety of other SSEA1 surface marker, while this was not nec- systems (e.g. Pan et al. 2004), that global pro- essary for B cell reprogramming due to its effi- grams of regulation of gene expression and AS ciency. To facilitate the comparison between are uncoupled from each other. the two transcriptome datasets, vast-tools analysis was applied to the MEF dataset (Cieply Interestingly, a larger proportion of exons et al. 2016), which yielded 843 cassette exons included (or skipped) at early stages of repro- differentially spliced (|∆PSI| ≥10, range ≥ 5) gramming are predicted to disrupt the open between any pair of samples of the MEF re- reading frame (ORF) of the transcripts upon programming dataset. Despite differences in exon skipping (or inclusion, respectively), the experimental set up, 79% of these exons compared to middle/late exons and to the (669 out of 843) were also found to be differ- general distribution of mapped alternative entially spliced in B cell reprogramming (Fig- exons (Figure 1H, classification as described in ure S2A, Table S2). A similar level of overlap Tapial et al. 2017). This suggests a higher im- was also observed at the level of other AS pact of AS-mediated on/off regulation of the events (72% of AS events in general, Figure corresponding expression via non- S2A). The overlap between differentially sense-mediated decay (NMD), and a switch to spliced exons in the two systems was higher expression of full-length proteins during early for middle and late AS clusters than for early steps of reprogramming. Middle clusters, in- clusters of B cell reprogramming (Figure 2B), stead, contain more exons predicted to pre- as might be expected from the convergence serve the coding potential of their transcripts, on a common program of AS related to implying modulation of the functions of their pluripotency. encoded protein isoforms rather than on/off

6 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

A Mouse Embryonic D Fibroblasts (Day 0)

SSEA1+ sorting Day Day Day Day Day iPS cells 4 7 10 15 20 (iPS) dataset from Cieply et al., 2016 B 30

20

10

511 348 343 438 364 337 0 with MEFs reprogramming Cluster 01 02 03 04 05 06

Percentage of exons overlapping PULSE EARLY MIDDLE LATE C Early reprogramming B_D4

0.5 B_D2

M_Day07 M_Day10 B_D6 M_Day04 M_Day15 xplained e M_Day20 B_B 0.0 iance r

PC2 B_D8 a v Late B_Ba reprogramming

18.06% of M_Day00 B_iPS 0.5 B_ES − Starting cells Pluripotent M_iPS cells B Bα D2 D4 D6 D8 iPS ES D0 D4 D7 D10 D15 D20 iPS −1.5 −1.0 −0.5 0.0 0.5 B cell reprogramming MEF reprogramming PC1 41.42% of variance explained Scaled PSI

−4 −2 0 2 4

Puf60 Puf60 E a Khdrbs3 Khdrbs3 Dppa5a Dppa5a Snrpa1 Snrpa1 Bcas2 Bcas2 Lsm5 Lsm5 Snrpd1 Snrpd1 Sf3b2 Sf3b2 Elavl1 Elavl1 Prpf38a Prpf38a Snrnp40 Snrnp40 Snrpb2 Snrpb2 Hnrnpf Hnrnpf Prpf3 Prpf3 Srsf2 Srsf2 U2af1 U2af1 Hnrnpa1 Hnrnpa1 Prpf4 Prpf4 Hnrnpc Hnrnpc Celf1 Celf1 Scaled Rank Aqr Aqr Cstf2t Cstf2t CPM (Han et al., 2013) Ptbp1 Ptbp1 Khdrbs1 Khdrbs1 4 Srsf3 Srsf3 200 Qk Qk Tra2b Tra2b Srsf10 Srsf10 2 Cpsf6 Cpsf6 150 Fubp1 Fubp1 U2surp U2surp Pnn Pnn 0 Rbm4 Rbm4 100 Mbnl2 Mbnl2 b Mex3b Mex3b Rbm18 Rbm18 −2 50 Rbfox2 Rbfox2 Rbms3 Rbms3 Zcchc24 Zcchc24 Celf2 Celf2 −4 0 Mbnl1 Mbnl1 Esrp1 Esrp1 c Esrp2 Esrp2 Slu7 Slu7 Nova2 Nova2 Srsf11 Srsf11 Ptbp2 Ptbp2 Son Son Rbm5 Rbm5 Sfswap Sfswap Luc7l2 Luc7l2 Snrnp35 Snrnp35 Rbmx Rbmx Hnrnpul1 Hnrnpul1 1 100 10000 B Bα D2 D4 D6 D8 iPS ES D0 D4 D7 D10 D15 D20 iPS Fold Change ES vs. Di! (log10) B cell reprogramming MEF reprogramming (Han et al., 2013)

Figure 2. B cell and MEF reprogramming systems share a program of AS changes (legend on next page) 7 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 2. B cell and MEF reprogramming systems share a program of AS changes. (A) Schematic representation of MEF reprogramming time points analyzed by RNA-seq in (Cieply et al. 2016). (B) Percentage of exons in each B cell reprogramming AS cluster that are also detected as differentially spliced in the MEF reprogramming dataset of (Cieply et al. 2016). The number of events in each cluster is indicated at the bottom of the corresponding bar. The magenta dashed line indicates the average percentage for all 12 clusters. (C) PCA analysis of the 25% most variably expressed genes, segregated using k-means into 4 clusters: differentiated cells, early and late reprogramming and pluripotent cells, highlighted by colors and boxes. Circles: B cell reprogramming time points. Squares: MEF reprogramming time points. (D) Heatmap representing scaled PSI values (average between replicates) of exons differentially spliced in at least one time point in both B cell (left) and MEF reprogramming (right), with the corresponding hierarchical clustering. (E) Heatmap representing the expression of RNA-binding proteins known to be involved in pluripotency, somatic cell reprogramming and/or development. Scaled cpm values (average between replicates) of both datasets are shown, with the corresponding hierarchical clustering. The bar plot (right) represents (when available) the fold change expression between ES cells and differentiated mouse tissues calculated in (Han et al. 2013) and the corresponding ranking (color of the bar).

To further compare the two datasets, we cluster a) containing factors with higher ex- performed a Principal Component Analysis pression in iPS/ES samples (fold change > 0 (PCA) on the gene expression of the most vari- and high rank score according to Han et al. able genes in both datasets (3rd quartile coef- 2013, right panel) and known to promote ficient of variation, n=2679) separating the pluripotency/reprogramming, such as U2af1 stages into four distinct groups by k-means or Srsf2/3 (Lu et al. 2014; Ohta et al. 2013). In clustering (Figure 2C). These groups clearly contrast, cluster b) contains factors with higher distinguish between starting cells, early and expression in the starting somatic cells, which late stages of reprogramming and pluripotent includes known repressors of reprogramming cells. The PCA allowed us to outline a “repro- such as Mbnl1/2, Celf2 and Zcchc24 (Cieply et gramming pseudotime” which was subse- al. 2016; Han et al. 2013; Solana et al. 2016b). quently used to select AS exons and regulators Finally, cluster c) contains factors with more for functional characterization. It also further variegated expression patterns at early and highlighted the transition between days 4 and intermediate reprogramming steps (and mild- 6 in B cell reprogramming, juxtaposing them ly repressed in iPS/ES cells), including Esrp1/2 with days 7/10 in MEF reprogramming. A (Cieply et al. 2016; Kanitz et al. 2019) (Figure heatmap displaying the scaled PSI values of 2E). the 669 common differentially spliced exons of Taken together, our analyses revealed the two datasets at the different steps of the widespread AS changes during B cell repro- reprogramming process shows substantial gramming, which significantly overlapped similarities in exon inclusion (Figure 2D). with those of MEF reprogramming, especially In contrast to the similarities in AS profiles at intermediate phases of the process. between the two reprogramming systems, the observed that expression profiles of RNA-bind- Predicted regulators of alternative ing proteins (RBPs) differed significantly be- splicing during somatic cell tween the two datasets. Specifically, we ana- reprogramming lyzed the expression of RBPs previously associ- ated with pluripotency, somatic cell repro- To infer potential regulators of exons dif- gramming and/or development (Cieply et al. ferentially spliced during B cell reprogramming, 2016; Han et al. 2013; Lu et al. 2014; Ohta et al. we extracted gene expression profiles of 507 2013) and presumably mechanistically linked RBPs (as annotated in the Uniprot database), to different aspects of post-transcriptional which were detectably expressed (cpm ≥ 5 in regulation (Figure 2E). Despite these more at least 33% of samples) and featured a mini- divergent profiles, hierarchical clustering cap- mum of variation across the B cell reprogram- tured three known functional groups, with ming dataset (coefficient of variation ≥ 0.2).

8 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Cluster 01 Cluster 06 2 2 Cpsf3 A 1 1 0 0 −1 −1 LATE PULSE Scaled cpm −2 n=26 Scaled cpm −2 n=157

B Bα D2 D4 D6 D8 iPS ES B Bα D2 D4 D6 D8 iPS ES

Cluster 02 Cluster 03 2 2 n=6 Hnrnpul1 n=38 1 1 0 0 −1 −1 EARLY Scaled cpm −2 Scaled cpm −2 Tia1

B Bα D2 D4 D6 D8 iPS ES B Bα D2 D4 D6 D8 iPS ES

Cluster 04 Cluster 05 2 2 n=171 n=36 1 1 0 0 −1 −1 Scaled cpm Scaled cpm MIDDLE −2 −2

B Bα D2 D4 D6 D8 iPS ES B Bα D2 D4 D6 D8 iPS ES

Cluster 01 Cluster 06 2 2 B Hnrnpul1 n=26 1 1 0 0 −1 −1 LATE PULSE Scaled cpm Scaled cpm −2 Tia1 −2 n=4

B Bα D2 D4 D6 D8 iPS ES B Bα D2 D4 D6 D8 iPS ES

Cluster 02 Cluster 03 2 2 Cpsf3 1 1 0 0 −1 −1 EARLY Scaled cpm Scaled cpm −2 n=14 −2 n=44

B Bα D2 D4 D6 D8 iPS ES B Bα D2 D4 D6 D8 iPS ES

Cluster 04 Cluster 05 2 2 n=227 1 1 0 0 −1 −1 Scaled cpm Scaled cpm MIDDLE −2 n=27 −2

B Bα D2 D4 D6 D8 iPS ES B Bα D2 D4 D6 D8 iPS ES

Figure 3. Inferred regulators of AS changes during B cell reprogramming. (A) Gene expression profiles of RNA-binding proteins (RBPs) correlating with the centroid of each AS cluster (positive regulators). Average scaled cpm values are represented by each line and the number of putative positive or negative regulators of each cluster is indicated (n). Wordclouds represent the regulators in each group, the size of the word being proportional to the correlation between gene expression and AS profiles (membership value). Colors indicate RBPs annotated as a splicing-related proteins. Cspf3, Hnrnpul1 and Tia1 are highlighted in plots and wordclouds. See also Figure S3A. (B) Gene expression profiles of RBPs correlating with the negative of the centroid of each AS cluster (negative regulators). Displays as in panel A. See also Figure S3B.

Using the membership function of the Mfuzz RBP to each AS cluster centroid. This allowed package, we correlated (positively or negative- us to derive a list of potential regulators whose ly) the scaled gene expression profile of each changes in levels of expression correlate (or

9 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

anti-correlate) with the profiles of AS changes Knockdowns of CPSF3 or hnRNP UL1 in each cluster (membership > 0.3) (Figures 3 repress MEF reprogramming and S3, Table S3). In line with previous work The Cleavage and Polyadenylation Speci- (Han et al. 2013; Ohta et al. 2013; Solana et al. ficity Factor (CPSF) complex is primarily in- 2016), our analysis identified known AS regula- volved in mRNA polyadenylation, but a num- tors involved in the induction or repression of ber of its components, including CPSF3, have pluripotency/developmental decisions, such been shown to also play a role in splicing (Ky- as Mbnl1/2, Celf2 (both potential negative reg- burz et al. 2006; Larson et al. 2016; Misra et al. ulators of pulse/late clusters 1/6) and U2af1 2014, 2015). Cpsf3 expression increased early (potential positive regulator of middle cluster during reprogramming of both B cells and 4) (Figure 3). Importantly, additional RBPs and MEFs (Figures 3A and 4A). The expression of splicing factors without previously known Hnrnpul1, an hnRNP whose function in RNA functions in reprogramming emerged as pos- metabolism is poorly understood, decreases in sible regulators. B cells after the C/EBPα pulse and then stabi- To further define such regulators, we fo- lizes at levels similar to those observed cused on RBPs that change their expression throughout MEF reprogramming (Figures 3B after the C/EBPα pulse or at early stages of re- and 4B). programming, as we speculated that these To test their effects on MEF reprogram- could drive the inclusion/skipping of both ear- ming, two different short hairpin RNAs ly and intermediate AS exons during the re- (shRNAs) for each factor were cloned into programming of C/EBPα-poised B cells. We lentiviral vectors containing a GFP reporter. selected CPSF3, a top predicted positive regu- The protocol used is summarized in Figure lator of very early events whose expression S4A. Briefly, early passage MEFs isolated from increases during B cell and MEF reprogram- reprogrammable mice were transduced with ming, for functional validation during the in- constructs bearing the shRNAs and the cells duction of pluripotency (positive predicted treated with doxycycline to activate OSKM regulator of cluster 1, Figure 3A). We also (day 0). GFP+ cells were FACS-sorted 48 hours chose two predicted negative regulators of post-infection and seeded on inactivated MEFs cluster 1, namely TIA1, a well-described AS serving as feeder layers, to proceed with re- regulator with roles in cell proliferation and programming for up to 14 days. Cells were development (see below), and hnRNP UL1, a harvested every other day and samples ana- member of the heterogeneous ribonucleopro- lyzed by RT-qPCR for the expression of the tein (hnRNP) family whose involvement in shRNAs target and of pluripotency markers, splicing regulation is largely unexplored (Fig- and by flow cytometry to quantify the propor- ure 3B). Due to experimental difficulties in ge- tion of cells expressing the early pluripotency netic manipulation of the B cell reprogram- cell surface marker SSEA1 (appearing around ming system (see Discussion), we modulated day 5-6) and the later pluripotency marker EP- their expression at early stages of MEF repro- CAM1 (appearing around day 8, J. M. Polo et al. gramming (Stadtfeld et al. 2010) and exam- 2012). At day 14, two days after removing ined the consequences on the dynamics of doxycycline, cultures were stained for alkaline pluripotency induction and on relevant AS phosphatase (AP) activity to identify iPS alterations. colonies and to assess the efficiency of repro- gramming.

10 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Di!. Early Late iPS A 200 C Cpsf3

(cpm) 150 0.08 ) 100 0.06

Gapdh 0.04 ** expression 50 0.02

Cpsf3 (over

Cpsf3 0

Relative expression 0.00 B cell reprogramming B Bα D2 D4 D6 - D8 iPS 04 06 08 10 12 04 06 08 10 12 04 06 08 10 12 04 06 08 10 12 MEF reprogramming D0 - D4 D7 D10 D15 D20 iPS NI shSCR shCPSF3#1 shCPSF3#2

B Di!. Early Late iPS D 500

(cpm) Hnrnpul1 400 0.20 ) 300 0.15

expression 200 **

Gapdh 0.10 * *** 100 * ** * * 0.05 ** Hnrnpul1 (over 0 Hnrnpul1

Relative expression 0.00 B cell reprogramming B Bα D2 D4 D6 - D8 iPS 04 06 08 10 12 04 06 08 10 12 04 06 08 10 12 04 06 08 10 12 MEF reprogramming D0 - D4 D7 D10 D15 D20 iPS NI shSCR shUL#1 shUL1#2

E 0.08 Pou5f1 F 2.0 day 6 cells) - ) 0.06 1.5

1.0 Gapdh 0.04 EPCAM1 + ** **

(over 0.02 0.5 *** ** *** Fold change early Relative expression reprog. intermediates

0.00 (% SSEA1 0.0 0406081012 0406081012 0406081012 0406081012 0406081012 0406081012 NI shSCR shUL1#1shUL1#2 NI shSCR shCPSF3#1 shCPSF3#2 shUL#1 shUL1#2 shCPSF3#1shCPSF3#2 0.15 Nanog ) 0.10 G 2.5 day 10 Gapdh cells) 2.0 + 0.05

(over 1.5

Relative expression *

*** ** EPCAM1 + 1.0 * 0.00 *** 0406081012 0406081012 0406081012 0406081012 0406081012 0406081012 0.5 *** NI shSCR shCPSF3#1 shCPSF3#2 shUL#1 shUL1#2 Fold change late reprog. intermediates

H (% SSEA1 0.0 250 NI shSCR shUL1#1shUL1#2 shCPSF3#1shCPSF3#2 200 2.5 day 12

cells) 2.0 150 * + * 1.5 100 ** EPCAM1

+ 1.0 ** * 50 *** 0.5 *** Fold change late *** Number of AP+ colonies reprog. intermediates

0 (% SSEA1 0.0 NI shSCR shUL1#1shUL1#2 shCPSF3#1shCPSF3#2

NI shSCR shCPSF3#1 shCPSF3#2 shUL1#1 shUL1#2 Figure 4. Knockdowns of CPSF3 or hnRNP UL1 impair MEF reprogramming (legend on next page) 11 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 4. Knockdowns of CPSF3 or hnRNP UL1 impair MEF reprogramming. (A) Gene expression profiles of Cpsf3 in B cell reprogramming and MEF reprogramming (cpm values, blue and magenta lines, respectively). The x axis represents “reprogramming pseudotime” in both datasets, calculated through the PCA analysis of Figure 2B. (B) Gene expression levels of Hnrnpul1 in B cell and MEF reprogramming (cpm values, blue and magenta lines respectively), as in panel A. (C) Relative expression levels of Cpsf3 mRNA quantified by RT-qPCR in non-infected cells (NI), cells transduced with a scrambled control shRNA (shSCR) or one of two shRNAs specific for Cpsf3. The y axis represents the relative expression (2^(-∆Ct) value) of Cpsf3 after normalization over Gapdh. (D) Relative expression levels of Hnrnpul1 mRNA quantified by RT-qPCR as in panel C. (E) Relative expression levels of Pou5f1 and Nanog quantified by RT-qPCR as in panels C and D. See also Figure S4B. (B-E) Average of biological triplicates and SD values are shown. Statistical significance was calculated by t-test on ∆Ct values compared to the NI control (*, **, *** = p value < 0.05, 0.01, 0.001 respectively, corrected for multiple testing using Holm-Sidak method). (F) Reduction of early reprogramming intermediates at day 6 post-OSKM induction upon knockdowns of Cpsf3 and Hnrnpul1. Fold change was calculated from the percentage of SSEA1+EPCAM1- cells (of the total of alive cells) in every condition compared to the shSCR control using flow cytometry analysis. (G) Reduction of late reprogramming intermediates at day 10 and 12 post- OSKM induction upon knockdown of Cpsf3 or Hnrnpul1. Fold change was calculated from the percentage of SSEA1+EPCAM1+ cells (of the total of alive cells) in every condition compared to the shSCR control using flow cytometry analysis. See Figure S4C for examples of gates. (H) Number of colonies stained with alkaline phosphatase (AP) at day 14 post-OSKM induction upon knockdown of Cpsf3 or Hnrnpul1 On the bottom images of representative wells are shown for every condition. (F,G,H) Average of biological triplicates and SD values are shown. Statistical significance was calculated by t-test comparing each condition to the shSCR control (*, **, *** = p value < 0.05, 0.01, 0.001 respectively, corrected for multiple testing with Holm-Sidak method).

Cells infected with shRNAs targeting Cpsf3 shRNAs targeting Cpsf3 or those targeting Hn- and Hnrnpul1 were compared with non-infect- rnpul1 (Figure 4H). ed (NI) cells, as well as with cells transduced Taken together, these data show that the with a scrambled control shRNA (shSCR). knockdown of Cpsf3 or Hnrnpul1 reduces the Knockdown efficiency, quantified by RT-qPCR MEF to iPS reprogramming efficiency, there- (Figures 4C and D), showed a 51% and 58% fore arguing that both RBPs contribute to the reduction in Cpsf3 at day 6 post-infection with induction of pluripotency. shCPSF3#1 and #2, respectively, becoming slightly less efficient during reprogramming (Figure 4C). Similarly, knockdown of Hnrnpul1 Overexpression of TIA1 represses resulted in 81% and 76% reduction of mRNA MEF reprogramming levels at day 6 post-infection with shUL1#1 TIA1 is an RBP and AS regulator (Förch et al. and #2, respectively (Figure 4D). The increase 2000, 2002; Izquierdo et al. 2005). Tia1 mRNA in the mRNA levels of endogenous Pou5f1 (en- levels decrease early during B cell reprogram- coding OCT4) and Nanog pluripotency mark- ming (Figures 3B and 5A), compatible with a ers was delayed in both Cpsf3 and Hnrnpul1 potential role as a repressor of cell repro- knockdown cells (compare for example values gramming in this system. Because depletion of at day 8 in Figures 4E and S4B), suggesting TIA1 induces mouse embryonic lethality and slower reprogramming kinetics. Consistent the protein is important for MEF proliferation, with these observations, knockdown of Cpsf3 cell cycle progression, autophagy and numer- and Hnrnpul1 reduced the percentage of cells ous signaling pathways (Sánchez-Jiménez and expressing SSEA1 at day 6 (Figure 4F) and Izquierdo, 2013), we decided to test the effects cells expressing both SSEA1 and EPCAM1 at of TIA1 overexpression during MEF repro- days 10-12 (Figures 4G and S4C). Survival of gramming. For this purpose, primary MEFs cells during reprogramming did not seem to were infected at day 0 (concomitantly with the be affected in the knockdowns because no induction of reprogramming), with retroviral significant increase in the proportion of DAPI- constructs containing a T7-tagged Tia1 cDNA stained cells was observed throughout repro- and a GFP reporter to allow sorting of the gramming (Figure S4D). Finally, we counted transduced cells (Figure S4A). Levels of Tia1 the number of AP positive colonies at day 14 were quantified by RT-qPCR (24-fold increase and found that the amount was significantly compared to cells transduced with an empty reduced in cells infected with either the

12 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

A Di". Early Late iPS B 120 2.0 Tia1 **

1.5

(cpm) ** 80 * ) 1.0 * *** 0.5 40 Gapdh

expression 0.20

Tia1 (over 0.15 Tia1 0 0.10 Relative expression B cell reprogramming B Bα D2 D4 D6 - D8 iPS 0.05 MEF reprogramming D0 - D4 D7 D10 D15 D20 iPS 0.00 04 06 08 10 12 04 06 08 10 12 04 06 08 10 12 NI Empty T7-TIA1 C 0.10 Pou5f1

0.08 ) F 0.06 200 Gapdh 0.04

(over 0.02 150 Relative expression 0.00 04 06 08 10 12 04 06 08 10 12 04 06 08 10 12 100 * NI Empty T7-TIA1

0.08 Nanog 50 Number of AP+ colonies ) 0.06 0

Gapdh 0.04

(over 0.02 NI Empty T7-TIA1 Relative expression 0.00 04 06 08 10 12 04 06 08 10 12 04 06 08 10 12 NI Empty T7-TIA1

100 D E G NI 3 day 6 2.0 day 10 day 12 Empty T7-TIA1 cells) cells) - + 1.5 2 50 1.0 EPCAM1 EPCAM1 + 1 + 0.5 Lef1 exon6 PSI Fold change late Fold change early reprog. intermediates reprog. intermediates (% SSEA1 0 (% SSEA1 0.0 0 NI NI NI Day 04 06 08 10 12 14 EmptyT7-TIA1 EmptyT7-TIA1 EmptyT7-TIA1

Figure 5. Overexpression of TIA1 impairs MEF reprogramming (legend on next page)

vector, Figure 5B). Consistent with our predic- (SSEA1+EPCAM1+ cells) compared to empty tion from its expression profile in B cell repro- vector and non-infected controls (Figures 5D- gramming, overexpression of Tia1 repressed E and S4F). In addition, it significantly reduced the induction of endogenous Pou5f1 and the count of AP+ colonies at day 14 post-OSKM Nanog genes (Figures 5C and S4E) and led to induction compared to controls (Figure 5F), a reduction of early SSEA1+EPCAM1- cells and without substantially affecting the viability of of late reprogramming intermediates reprogramming cells (Figure S4G). Of note,

13 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 5. Overexpression of TIA1 impairs MEF reprogramming. (A) Gene expression profiles of Tia1 mRNA in B cell and MEF reprogramming (cpm values, blue and magenta lines respectively). The x axis represents the “reprogramming pseudotime” in both datasets, calculated through the PCA analysis of Figure 2B. (B) Expression levels of Tia1 mRNA relative to Gapdh, quantified by RT-qPCR in non-infected cells (NI), cells transduced with an empty vector (Empty) or with T7-Tia1 cDNA. (C) Expression levels of Pou5f1 and Nanog relative to Gapdh, quantified by RT-qPCR as in panel B. See also Figure S4E. (B,C) Average of biological replicates and SD values are shown. Statistical significance was calculated by t-test on ∆Ct values comparing to the Empty control (*, **, *** = p value < 0.05, 0.01, 0.001 respectively, corrected for multiple testing with Holm-Sidak method). (D) Percentage of SSEA1+EPCAM1- early reprogramming intermediates (day 6 post-OSKM induction) upon Tia1 overexpression determined by FACS (E) Percentage of SSEA1+EPCAM1+ late reprogramming intermediates (days 10 and 12 post-OSKM induction) upon Tia1 overexpression. See Figure S4F for gating strategy. (F) Number of alkaline phosphatase (AP) positive colonies at day 14 post-OSKM induction upon Tia1 overexpression. Images of representative well are shown below. (D,E,F) Average of biological replicates and SD values (n=4). Statistical significance was calculated by t-test comparing to the Empty control (*, **, *** = p value < 0.05, 0.01, 0.001 respectively, corrected for multiple testing with Holm- Sidak method). (G) Inclusion of Lef1 exon 6 upon overexpression of T7-Tia1, quantified by semi-quantitative RT-PCR and capillary electrophoresis. Values represent average and SD. Statistical significance calculated by t-test on average ± SD of the area under the curve in each condition yielded a p value of 0.079.

overexpression of Tia1 delayed the gradual TIA1-dependent events were defined as those skipping of Lef1 exon 6, a conserved AS event changing during reprogramming only in the between B cell and MEF reprogramming (Fig- control or the overexpression condition, with ure 5G), which might partially explain the ob- |∆∆PSI(TIA1-Empty)| ≥ 10 (colored dots in served reduction in reprogramming efficiency. Figure 6B). Similarly, CPSF3-dependent and UL1-dependent events were defined as those Taken together, the observed reduced ex- with |∆∆PSI(average shRNAs-shSCR)| ≥ 10 (only pression of TIA1 during B cell reprogramming events selected by both specific shRNAs were and its impairment of MEF reprogramming considered, 54% and 60% overlap respectively, when overexpressed suggest that it functions colored dots in Figures 6C and S6C). For each as a general repressor of pluripotency induc- RBP, control events, changing during tion. reprogramming but not affected by TIA1, CPSF3 or hnRNP UL1 manipulation, were CPSF3, hnRNP UL1 and TIA1 classified as those differentially spliced regulate alternative splicing between days 0 and 12, either in control or during reprogramming knockdown/overexpression conditions (| ∆PSI(day12-day0)| ≥ 10, range ≥ 5), that To assess the effects of CPSF3, hnRNP UL1 display minimal differences between the ∆PSI and TIA1 manipulation on AS during repro- values in the two conditions (e.g. |∆∆PSI(TIA1- gramming, RNA-seq analyses were carried out Empty)| < 2) (grey dots in Figures 6B-C and with RNAs isolated from MEFs at 0 and at 12 S6C). All sets are summarized in Table S4. days post-OSKM induction, comparing the ef- fects of Cpsf3/Hnrnpul1 knockdown (two dif- We thus identified 387 TIA1-dependent ferent shRNAs for each factor) or Tia1 overex- events and 558 TIA1-independent events. Sim- pression with those of the corresponding ilarly, we identified 357 CPSF3-dependent and shSCR/empty vector controls (Figure 6A). 298 UL1-dependent events (as well as 429 Quantification of gene expression of Tia1, Cps- CPSF3-independent and 662 UL1-independent f3 and Hnrnpul1 at the two timepoints is events). Interestingly, CPSF3-dependent and shown in Figures S6A-B. UL1-dependent events showed a significant overlap (e.g. 70% of UL1-dependent events To determine the impact of these were also CPSF3-dependent events, of which per turbations on AS we calculated 98% changed in the same direction) whereas differentially spliced events between day 0 little overlap was observed with TIA1-depen- and day 12 in each condition as described dent events (Figure 6D, left panel). As we before (|∆PSI(day12-day0)| ≥ 10, range ≥ 5).

14 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. A B C TIA1-dependent CPSF3-dependent hnRNP UL1-dependent Day 0 Day 12 events (n=387) events (n=357) events (n=298) 100 100 100

50 50 50 50 50 50 Empty 40 40 NI 40 (day 12 - day 0)

T7-TIA1 (day 12 - day 0) 0 0 30 0 30 (day 12 - day 0) 30 NI shSCR 20 20 20 shCPSF3#1 10 PSI shUL1 - shSCR)

PSI T7-TIA1 - Empty) 10 -50 10 -50 PSI shCPSF3 - shSCR) -50

shCPSF3#2 <2 <2 ∆∆ <2 ∆∆ shUL1#1 ∆∆

shUL1#2 -100 -100 abs (

-100 abs ( abs ( PSI T7-TIA1 -100 -50 0 50 100 -100 -50 0 50 100 PSI shUL1#1 -100 -50 0 50 100 PSI shCPSF3#1 ∆ ∆

∆PSI Empty (day 12 - day 0) ∆ ∆PSI shSCR (day 12 - day 0) ∆PSI shSCR (day 12 - day 0)

D E TIA1-dependent exons J CPSF3-dependent exons CPSF3−dependent (n=207) (n=184) AS events ns ns ns ns ns TIA1−dependent ** *** AS events 100 100 122 CPSF3-dep 72.55% 75 75 8 UL1-dep 70.47% 205 22 352 50 50 PSI PSI TIA1-dep 5 56.33% 25 25 66 0 25 50 75 100 Percentage of AS events di"erentially spliced UL1−dependent in B cell reprogramming 0 0 NI shSCR shC#1 shC#2 shU#1 shU#2 AS events NI Empty TIA1 day 0 day 12 day 0 day 12 G Ratio GC content Median length F K 5’SS strength upstream downstream TIA1-independent exons CPSF3-independent exons intron/exon intron (n=244) (n=196) * ns ns ns * ns ns ns ns * ns 10 100 100 1.2 ns 10000 ** 8 75 75 1.0 6 6000 4 50 50 PSI 0.8 PSI 2 2000 0.6 25 25 0 206 243 248 206 242 247 0 205 243 248 0 0 NI Empty TIA1 NI shSCR shC#1 shC#2 shU#1 shU#2 TIA1-dep TIA1-dep TIA1-dep TIA1-indepControl CEx TIA1-indepControl CEx TIA1-indepControl CEx day 0 day 12 day 0 day 12 H L Distribution of TIA1 motif 3’SS strength 5’SS strength ** ns * ns 5 15 ** * TIA1-dep (n=207) ns ns TIA1-indep (n=244) 10 4 Control CEx (n=250) 10 p value ≤ 0.001 8 3 p value > 0.001 6 5 2 4

1 2 0 184 192 167 285 198 0 184 192 167 285 198 0

−50 0 150 −150 0 50 −50 0 150 −150 0 50 UL1-dep UL1-dep CPSF3-dep UL1-indep CPSF3-dep UL1-indep CPSF3-indep Control CEx CPSF3-indep Control CEx Region covered by motif [%] I TIA1-dep M CPSF3-dep regulation of RNA splicing regulation of cell morphogenesis establishment of localization cellular regulation of biological quality homeostasis cation RNA splicing positive regulation of cell-substrate adhesion transport positive regulation of regulation of mRNA processing oocyte maturation localization peptide hormone secretion regulation of hormone levels UL1-dep regulation of T-helper 1 type immune response embryo development cell junction assembly regulation of cell morphogenesis chordate embryonic development oocyte maturation negative regulation of developmental growth growth plate cartilage chondrocyte growth positive regulation of neurological system process actomyosin structure organization 3.0 3.5 4.0 4.5 5.0 3.0 3.5 4.0 4.5 -log10(p value) -log10(p value)

Figure 6. Knockdowns of CPSF3 or hnRNP UL1 and overexpression of TIA1 regulate AS during reprogramming. (legend on next page) 15 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 6. Knockdowns of CPSF3 or hnRNP UL1 and overexpression of TIA1 regulate AS during reprogramming. (A) Schematic representation of the experiments performed to study the effect of TIA1 overexpression / CPSF3 or hnRNP UL1 knockdown by RNA-seq. (B) TIA1-dependent events detected during reprogramming. The x axis represents the ∆PSI value between Empty day 12 and day 0. The y axis represents the ∆PSI value between T7-TIA1 day 12 and day 0 control. TIA1-dependent events (|∆∆PSI(T7-TIA1 - Empty)| ≥ 10) are represented by colored dots (palette representing the |∆∆PSI(T7-TIA1 - Empty)| value, n = 387). TIA1-independent events (|∆∆PSI(T7-TIA1 - Empty)| < 2) are represented by grey dots (n=558). (C) CPSF3- and UL1-dependent events detected during reprogramming (left and right, respectively). The x axis represents the ∆PSI value between shSCR day 12 and day 0 control. The y axis represents the ∆PSI value between shCPSF3#1 or shUL1#1 day 12 and day 0 control. CPSF3/UL1-dependent events (∆∆PSI(average_shRNAs - shSCR) ≥ 10) are represented by non-grey-colored dots (palette representing the ∆∆PSI(average_shRNAs - shSCR) value). CPSF3/UL1-independent events (∆∆PSI(average_shRNAs - shSCR) < 2) are represented by grey dots. See Figure S6C for ∆PSI values of the same events in shRNA#2 conditions. (D) Venn Diagram representing the overlap between CPSF3-, UL1- and TIA1-dependent events (left, the number of events in each category is shown). Barplot representing the percentage of CPSF3-, UL1- and TIA1-dependent events which are also differentially spliced in B cell reprogramming (right, the percentage of overlap in each category is indicated). (E) Violin plots representing the distribution of PSI values of TIA1-dependent events in non-infected cells (NI, day 0) and day 12 cells infected with Empty or T7-Tia1 vectors. (F) Violin plots representing the distribution of PSI values of TIA1-independent events as in panel E. (E,F) Statistical significance was calculated by Fisher’s exact test comparing number of events with intermediate (25 < PSI < 75) or extreme PSI values (PSI ≥ 75 or ≤ 25) in each condition against Empty control (*, **, *** = p value < 0.05, 0.01, 0.001 respectively). See also Figure S6D. (G) Boxplots representing the distribution of the indicated sequence features of TIA1-dependent exons, compared to TIA1-independent events and a random set of exons with intermediate PSI values not changing throughout reprogramming (Control CEx). Statistical significance was calculated using Matt, by paired Mann-Whitney U test comparing each condition to the Control CEx set (*, **, *** = p value < 0.05, 0.01, 0.001 respectively). (H) RNA map representing the distribution of TIA1 binding motif in TIA1-dependent exons and flanking introns, compared to TIA1-independent and Control CEx. Thicker segments indicate regions in which enrichment of TIA1 motif is significantly different compared to Control CEx. (I) (GO) terms enriched in genes containing TIA1-dependent events, compared to a background of all genes containing mapped AS events in the dataset. GO enrichment was calculated using GOrilla and GO terms were summarized for visualization using REViGO. The x axis and the size of each bubble represent the -log10(p value) of each GO term. (J) Violin plots representing the distribution of PSI values of CPSF3-dependent events in non-infected cells (NI, day 0) and day 12 cells infected with shSCR or two shRNAs specific for CPSF3 (shC#1 and shC#2) or UL1 (shU#1 and shU#2). (K) Violin plots representing the distribution of PSI values of CPSF3-independent events as in panel J. (J,K) Statistical significance was calculated by Fisher’s exact test comparing number of events with intermediate (25 < PSI < 75) or extreme PSI values (PSI ≥ 75 or ≤ 25) in each condition against shSCR control (*, **, *** = p value < 0.05, 0.01, 0.001 respectively). See also Figure S6E. (L) Boxplots representing the distribution of sequence features of CPSF3- and UL1-dependent exons, compared to the corresponding CPSF3- and UL1-independent events and Control CEx. Statistical significance was calculated using Matt, by paired Mann-Whitney U test comparing each condition to the Control CEx set (*, **, *** = p value < 0.05, 0.01, 0.001 respectively). (M) GO terms enriched in genes containing CPSF3- and UL1-dependent events, compared to a background of all genes containing mapped AS events in the dataset, performed as in panel I.

found no strong evidence for cross-regulation 6E-F and S6E). These effects are altered by between CPSF3 and hnRNP UL1 factors (either TIA1 overexpression (Figure 6E), suggesting in AS or in gene expression, Figure S6B), these that in this system, the typical role of TIA1 is to data suggest that CPSF3 and hnRNP UL1 con- repress exon inclusion. tribute to a common program of AS changes Analysis of sequence features associated relevant for cell reprogramming. Moreover, with TIA1-regulated exons performed using TIA1-, CPSF3- and UL1-dependent events de- Matt (Gohr and Irimia, 2018) revealed weaker tected in MEFs showed high overlap with 5' splice sites, shorter median length of the events that were also differentially spliced dur- flanking downstream introns and a larger dif- ing B cell reprogramming, suggesting that ference in GC content between the alternative these factors regulate AS events that generally exons and their flanking upstream introns contribute to the induction of pluripotency (Figure 6G), suggestive of a strong depen- (Figure 6D, right panel). dence of these exons on the process of exon Focusing on cassette exon events, TIA1- definition (Amit et al. 2012). Notably, an en- dependent exons (but not TIA1-independent richment in putative TIA1 binding motifs was exons) displayed significantly higher PSI values detected about 100 nucleotides upstream of at day 12 compared to day 0 of reprogram- the distal 3' splice site (Figure 6H), suggesting ming, as well as a relative increase in exons that binding of TIA1 to this region might re- displaying intermediate PSI values (Figures press inclusion of the alternative exon (or en-

16 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

hance pairing between the distal splice sites), Discussion leading to exon skipping. Somatic cell reprogramming holds great Gene Ontology (GO) enrichment analysis promise for fundamental research and medi- (GOrilla; Eden et al. 2009) on the set of genes cine. In this study, we have characterized the containing TIA1-dependent AS events com- dynamics of AS changes during the determin- pared to a background list of all genes con- istic reprogramming of B cells into iPS cells taining mapped AS events (n=11132) showed and have identified three splicing regulators an enrichment in functions related to peptide/ with a role in somatic cell reprogramming, as hormone secretion, T-helper cell functions and validated in perturbation experiments per- embryonic development (Figure 6I). formed during MEF reprogramming.

CPSF3- and UL1-dependent exons showed The B cell reprogramming system is a two- similar inclusion patterns: an increased pro- step process, in which a pulse of the myeloid- portion of intermediate PSIs was observed at specific transcription factor C/EBPα poises the day 12 in CPSF3- and UL1-dependent but not cells for deterministic reprogramming by the in -independent events (Figures 6J-K and subsequent induction of the OSKM factors, S6F). CPSF3- and UL1-dependent exons also inducing early chromatin opening and epige- showed higher PSI values at day 12 compared netic modifications that silence of the B cell- to day 0, and an enrichment in exons display- specific program and induce pluripotency (Di ing intermediate PSI values at the end of the Stefano et al. 2014, 2016; Sardina et al. 2018; reprogramming process (Figures 6J-K and Stadhouders et al. 2018). While C/EBPα might S6F). These effects were attenuated, specifical- not directly impact AS, it remains possible that ly for CPSF3- and UL1-dependent exons upon the pulse of C/EBPα could induce changes in knockdown of either of these factors (Figures the expression of splicing regulatory factors 6J-K and S6F), supporting the notion that (e.g. induction of CPSF3 or repression of hn- their regulatory programs overlap. Both CPS- RNP UL1 or TIA1) that trigger a program of F3- and UL1-dependent exons displayed post-transcriptional changes that contributes weaker 3’ and 5’ splice sites (Figure 6L). GO to cell reprogramming, or that C/EBPα could analyses revealed that CPSF3- and UL1-de- have direct effects on the activity of splicing pendent events belong to genes enriched in factors. functional categories related to the regulation of cell morphogenesis, cell-substrate adhesion We observed widespread changes in AS, and cytoskeleton organization (Figure 6M). whose frequency increased as reprogramming progressed. Clustering analysis of cassette ex- Taken together, these results suggest that ons revealed groups of exons varying in inclu- the AS regulators TIA1, CPSF3 and hnRNP UL1 sion at very early stages of B cell reprogram- function in cell reprogramming through their ming (including following the C/EBPα pulse), activities on genes relevant for cell fate deci- but also at middle and late phases after OSKM sions. Moreover, CPSF3 and hnRNP UL1 act induction. Of note, early clusters contain a through a highly overlapping program of larger proportion of exons predicted to disrupt splicing changes, while TIA1 affects a different open reading frames (ORFs) of isoforms pre- set of AS events. dominantly expressed in the B cells, whereas middle clusters contain more ORF-preserving exons. Together with the observation that the majority of regulated introns tend to be re- tained (leading to ORF disruption) at early

17 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

stages and are progressively spliced during gramming. We found that perturbations of reprogramming, these findings suggest that, each of the three factors impaired the repro- in the transition from early to late stages, AS gramming of MEFs. CPSF3 plays an important acts as a switch towards a gain in coding ca- role in mRNA polyadenylation, recognizing the pacity, while at intermediate stages of repro- polyadenylation signal (PAS) together with the gramming AS acts more commonly as a switch rest of the CPSF complex. In addition to the between isoforms. well-known intense crosstalk between last in- tron splicing and 3’ end processing (Kyburz et The AS switch observed between day 4 al. 2006), CPSF2 and the CPSF complex have and day 6 of B cell reprogramming coincides been shown to influence splicing of internal with a transcriptional switch, although the exons independently of cleavage and proportion of genes displaying both gene ex- polyadenylation (Misra et al. 2014, 2015). Fur- pression and AS changes is lower than expect- thermore, mutations of the CPSF3 yeast ho- ed (typically 1/3 of alternatively spliced tran- molog Brr5/Ysh1 have been shown to strongly scripts are controlled at the level of gene ex- affect splicing, suggesting that CPSF3 could pression by nonsense mediated decay, Lewis, also play similar direct regulatory functions Green and Brenner, 2003; Pan et al. 2004, (Larson et al. 2016). hnRNP UL1, instead, is a 2008). The transcriptomic changes observed largely uncharacterized RBP known to partici- during this transition might be mediated at pate in DNA damage response (Polo et al. least in part by the replacement of the culture 2012b) and identified as a surface marker of medium from serum + LIF cell to N2B27-based human ES cells, although its functions in this 2i medium at day 6. Exons changing their in- context remain unknown (Choi et al. 2011). clusion levels during this intermediate period Recent work in Zebrafish suggest a possible displayed the highest overlap with exons dif- role for Hnrnpul1 in the regulation of AS ferentially spliced in the reprogramming of (Blackwell et al. 2020). MEFs, arguing for the general relevance of these changes in the induction of pluripoten- Knockdown of Cpsf3 or Hnrnpul1 (with cy. two independent shRNAs for each gene) caused a general repression or delay in MEF Our analysis of changes in expression of reprogramming, revealing their functions as RBPs during B cell reprogramming suggested promoters of somatic cell reprogramming. Se- potential regulators of AS, including factors quence analysis of regulated AS events sug- already known to play a role in cell repro- gest that CPSF3 and hnRNP UL1 favor the def- gramming such as the pluripotency repressors inition of exons harboring relatively weak MBNL1/2 and CELF2 (Han et al. 2013; Solana et splice sites. Remarkably, the AS targets of CPS- al. 2016), which are downregulated during B F3 and hnRNP UL1 identified by our RNA-seq cell reprogramming, and positive pluripotency analyses show a high overlap (without evi- regulators such as U2AF1 and hnRNP H1 (Ohta dence of cross-regulation between the two et al. 2013; Yamazaki et al. 2018), which were proteins), suggesting a concerted mechanism upregulated during the process, consistent of splice site selection, perhaps as components with results described for MEF reprogramming of the same complex. It is unlikely that the (Cieply et al. 2016). We focused on three can- overlap is an indirect consequence of reduced/ didate regulators of early AS changes, CPSF3, delayed reprogramming, because TIA1 over- hnRNP UL1 and TIA1 because they could po- expression also inhibits reprogramming and tentially contribute to the reprogramming ad- yet it is accompanied by largely non-overlap- vantage acquired by B cells poised by C/EBPα ping changes in AS. These observations sug- and/or to early events occurring during repro-

18 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

gest that multiple, separable AS programs The finding that overexpression of TIA1 contribute to the regulation of cell repro- caused a delay in the conserved gradual skip- gramming. ping of Lef1 exon 6 observed during MEF re- programming is consistent with a conserved TIA1 is a regulator of RNA metabolism in- role of the factor in reprogramming, as gradual fluencing mRNA decay and AS implicated in skipping of this exon was also observed in cell proliferation, cell cycle progression, cellu- middle stages of B cell reprogramming, along lar stress, autophagy and programmed cell with a decrease in expression of the gene. death (Förch et al. 2000; Izquierdo et al. 2005; LEF1 is a transcription factor with important Kedersha et al. 1999; Sánchez-Jiménez & functions in embryonic and T cell develop- Izquierdo 2013) whose depletion induces ment and activation (Oosterwegel et al. 1993; mouse embryonic lethality (Piecyk et al. 2000). Travis et al. 1991; Van Genderen et al. 1994; We found that TIA1 overexpression at early Waterman et al. 1991; Zhou et al. 1995). As stages of reprogramming represses the induc- Lef1 exon 6 inclusion is promoted by CELF2 tion of pluripotency, as reported for MBNL pro- during T cell development and activation teins (Han et al. 2013). Binding of TIA1 to U-rich (Ajith et al. 2016; Mallory et al. 2011, 2015) and sequences downstream of weak 5’SS helps to Celf2 expression rapidly decreases in both B recruit U1 snRNP and facilitates exon definition cell and MEF reprogramming, CELF2 may par- (Förch et al. 2000, 2002; Izquierdo et al. 2005; ticipate – along with TIA1 – in Lef1 exon 6 reg- Wang et al. 2014), while it can also inhibit the ulation during reprogramming. We observed inclusion of alternative exons located at a dis- that overexpression of Lef1 at the start of MEF tance from its binding site (Ule et al. 2010). As reprogramming improved reprogramming expected, TIA1-dependent exons are enriched efficiency (observed as an increase of pluripo- in weak 5’ splice sites and other features that tency markers and percentage of reprogram- support a role in exon definition. Interestingly, ming intermediates), with overexpression of TIA1-dependent exons in our system feature the iPS-associated Lef1 exon 6-skipping iso- an enrichment of predicted U-rich TIA1 bind- form displaying stronger effects than the exon ing sites approximately 100 nucleotides up- 6-including isoform, suggesting a functional stream of the 3’ splice site located in the role for this switch in cell reprogramming (Fig- downstream intron, suggesting that this con- ure S5). We speculate that the gradual skip- figuration serves to repress the inclusion of ping of exon 6 affects the interaction of LEF1 exons that contribute to cell reprogramming. with protein partners that either act as coacti- Since GO terms most enriched in genes bear- vators – e.g. ALY (Bruhn et al. 1997) – or core- ing TIA1-dependent exons include ‘positive pressors – e.g. Groucho/TLE (Levanon et al. regulation of peptide hormone secretion’, we 1998) –, differentially affecting the transcrip- hypothesize that TIA1-regulated transcripts tion of target genes. For example, the LEF1- could modulate functions related to the inclusion isoform could promote cell prolifera- senescence-associated secretory phenotype tion and maintenance of B cell identity – as (SASP), which was shown to promote repro- observed in the C/EBPα-dependent transdif- gramming of somatic cells and dedifferentia- ferentiation system and in pancreatic cancer tion in cancer (Brady et al. 2013; Mosteiro et al. cell lines (Jesse et al. 2010; Rodriguez-Ubreva 2016; Ritschka et al. 2017). Indeed, 7 out of 23 et al. 2014) –, while gradual skipping of exon 6 (30%) of the genes belonging to these GO might alter its effect on its target genes with terms were also found in a list of senescence- the consequence of inducing pluripotency, or SASP-associated genes (GeneCards data- most probably in a β-catenin independent base; Stelzer et al. 2016). way.

19 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

In summary, our results provide a compari- Gene expression and alternative son between AS changes occurring in two very splicing analyses different reprogramming systems and a wealth Reads mapping was performed with STAR of information relevant for cell fate decisions (Dobin et al. 2013) and gene expression analy- and transitions to pluripotency. Furthermore, sis was performed using the edgeR package they demonstrate the functional involvement v3.16.5 (McCarthy et al. 2012; Robinson et al. of the splicing regulators CPSF3, hnRNP UL1 2009). A minimum of 5 counts per million and TIA1 in efficient cell reprogramming. It (cpm) was required in 33% of both datasets (5 thus significantly extends previous work on samples for B cell reprogramming, 6 samples splicing during reprogramming and suggests for MEF reprogramming) and a minimum coef- that AS is required for the entire transition of ficient of variation of 0.2 was required. somatic into pluripotent stem cells. It would be interesting to determine whether the AS Alternative splicing (AS) analysis was per- changes and regulators identified in our work formed using vast-tools software v2.2.2 (Irimia play similar roles during the specification of et al. 2014; Tapial et al. 2017). Strand-specific pluripotent stem cells in early embryo devel- mapping was performed and only events with opment. a minimum of 10 actual reads per sample were considered (VLOW quality score). PSI values for single replicates were quantified for all types Materials and Methods of alternative events, including simple and complex cassette exons (S, C1, C2, C3), mi- RNA sequencing croexons (MIC), alternative 5' and 3' splice sites For RNA sequencing of B cell reprogram- (Alt5, Alt3) and retained introns (IR-S, IR-C). For ming and MEF reprogramming upon TIA1 cassette exon events (referred to as CEx), PSI overexpression and knockdown of CPSF3 and values of all annotated exons were also quanti- hnRNP UL1, stranded libraries were prepared fied with vast-tools (Annotation module ANN). from samples and sequenced on an Illumina For both B cell and MEF reprogramming, all HiSeq 2500 v4 using a 2x125nt paired-end possible pairwise comparisons between sam- protocol. Duplicates were sequenced for each ples were performed, selecting differentially condition, with samples pooled in separate spliced events with a |∆PSI| ≥ 10 and a mini- lanes. RNA-seq data (triplicates of 2x100nt mum range of 5 between samples. paired-end sequencing) of MEF reprogram- For TIA1 overexpression experiments, TIA1- ming was downloaded from GEO Database dependent events were defined as events with (Cieply et al. 2016). |∆∆PSI| ≥ 10, where ∆∆PSI = (∆PSI TIA1_day12 – NI_day0) - (∆PSI Empty_day12 – NI_day0). Data and code accessibility TIA1-independent events were instead de- All scripts and RNA-seq data of MEF repro- fined as events changing in any of the two gramming upon TIA1 overexpression and conditions (|∆PSI| TIA1_day12 – NI_day0 ≥ 10) knockdown of CPSF3 and hnRNP UL1 will be or (|∆PSI| Empty_day12 – NI_day0 ≥ 10) and released to public databases when the article having minimal difference between the two is accepted for publication. conditions (|∆∆PSI| < 2). Similarly, CPSF3- or UL1-dependent events were events for which in both shRNAs |∆∆PSI| ≥ 10, where ∆∆PSI = (∆PSI shC/U#1/2_day12 – NI_day0) - (∆PSI shSCR_day12 – NI_day0).

20 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

CPSF3- or UL1-independent events were in- (Tapial et al. 2017). Briefly, following division stead defined as events changing in any of the according to their location in non-coding knockdown or control conditions (|∆PSI| shC/ RNAs, untranslated regions (5’ or 3’UTRs) or U#1/2_day12 – NI_day0 ≥ 10) or (|∆PSI| open reading frame (ORF), exons were pre- shSCR_day12 – NI_day0 ≥ 10) and having min- dicted to disrupt the ORF if their inclusion or imal difference between the two conditions (| skipping would induce a frameshift in their ∆∆PSI| < 2). ORF or if they would induce a premature stop codon (PTC) predicted to be targeted by non- Clustering analysis of alternatively sense-mediated decay (NMD) or truncating spliced exons and correlation of gene the protein by more than 300 amino acids. The expression profiles rest of the ORF-mapping events are predicted to preserve the transcript coding-potential. PSI values of exons differentially spliced in at least one pair of B cell reprogramming sam- ples (n=4556) were scaled and centered (re- Principal Component Analysis (PCA) and comparison between B cell and MEF ferred to as ‘scaled PSI’ values). Fuzzy c-means reprogramming clustering analysis was carried out using the R package Mfuzz (Futschik & Carlisle 2005; Ku- Gene expression values were filtered for mar & Futschik 2007) on the scaled PSI values. minimum variation (coefficient of variation ≥ The optimal number of clusters (12) was se- 0.66, corresponding to the 3rd quartile, lected on the basis of the minimum distance n=2679), scaled and centered (referred to as to the cluster centroid. A membership value ‘scaled cpm’ values). Principal Component was assigned to correlate gene expression pro- Analysis (PCA) was performed on scaled cpm files to each AS clusters, either of genes con- values of the most variable genes expressed in taining the AS exons of each cluster or of both B cell and MEF reprogramming. The genes encoding RBPs. To each scaled cpm val- groups shown were obtained by k-means clus- ues vector, we attributed a membership value tering. For heatmaps representing scaled PSI to the centroid of each cluster or to its nega- and scaled cpm values in B cell and MEF re- tive. programming, hierarchical clustering was per- formed on values of both datasets using a Correlation of B cell reprogramming Ward’s method and Euclidean distance as the stages distance metric.

Pearson correlation coefficient for B cell Correlation of RNA-binding proteins reprogramming stages was calculated on the expression to AS clusters most variable expressed genes (cpm ≥ 5 in at least 5 samples and coefficient of variation ≥ The lists of Mus musculus RNA-binding pro- 0.73864, corresponding to the 3rd quartile, teins (RBPs) and splicing-related proteins were n=2961) or the most variable exons (differen- downloaded from the Uniprot database. For tially spliced in at least one pairwise compari- RBPs, Uniprot keywords had to match ‘RNA- son and coefficient of variation ≥ 0.4107, cor- binding [KW-0694]’, ‘mRNA splicing [KW-0508]’, responding to the 3rd quartile, n=1139). ‘mRNA processing [KW-0507]’ or ‘Spliceosome [KW-0747]’. Splicing-related RBPs were defined Prediction of the protein impact of if keywords were matching either ‘mRNA splic- alternative exons ing [KW-0508]’ or ‘Spliceosome [KW- 0747]’. Filtered gene expression values were scaled as Alternative exons detected by vast-tools described above and a membership value was were classified as described in vastDB v2.2.2

21 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

attributed to each RBP profiles for each AS GO terms enrichment in contrast to all genes cluster centroid and to its negative (RBPs with containing any mapped AS event. Analysis was membership ≥ 0.3 are shown). Word clouds carried out with the ‘two unranked lists of were created with WordArt.com online tool, genes’ module of GOrilla (Gene Ontology en- with the size of each RBP name proportional to RIchment anaLysis and visuaLizAtion) tool the corresponding membership value and col- (Eden et al. 2009) and summarized using REVi- ors highlighting splicing-associated RBPs. GO (Supek et al. 2011). Statistical significance was defined with p value < 1e-03. Only en- Sequence feature analysis and RNA riched terms for GO Process are shown. maps Cell culture Sequence features of exons and flanking introns were analyzed using Matt software Platinum E cells (PlatE), 293T/17 cells, v1.3.0 (Gohr & Irimia 2019). Features of TIA1-, stromal S17 cells and primary MEFs serving as CPSF3- or UL1-dependent exons were com- feeder cells for B cell reprogramming were pared with the corresponding independent kindly provided by the group of T. Graf and exons and with a set of control Cassette exons, cultured in Glutamax Dulbecco’s modified Ea- representing alternative exons not changing gle’s (DMEM, Life Technologies) medium sup- during reprogramming. Specifically, the union plemented with 10% fetal bovine serum (FBS) of alternative non-changing exons (AS_NC and penicillin/streptomycin antibiotics (Pen- sets, 10 < average PSI < 90 and ∆PSI ≤ 2) in all Strep, 50 U/ml penicillin; 50 μg/ml strepto- conditions of each dataset was generated, mycin). All cells were maintained at 37ºC under

TIA1- or CPSF3/UL1-dependent events were 5% CO2 and routinely tested for mycoplasma. excluded and a random set was selected, with a size similar to TIA1- or CPSF3-/UL1-depen- Lentivirus production dent and -independent exon sets. Maximum entropy score is calculated as an approxima- 293T/17 cells were seeded on gelatin- 6 tion of splice site strength (Yeo & Burge 2004). coated 10 cm plates at a density of 3x10 cells/ RNA maps for TIA1 M075 motif (Ray et al. 2013) plate and incubated overnight. The following were generated for the first and last 50 nt of day, medium was replaced approximately 2 exons and the first and last 150nt of introns hours before transfection. For each 10 cm (sliding window = 31, p value ≤ 0.05 with 1000 plate, 10 μg of the plasmid of interest, 2.5 μg permutations). of VSV-G and 7.5 μg of p∆8.9 were mixed with 61 μl of CaCl2 2.5 M (Sigma) and endotoxin- Statistical analyses and plots free water up to 500 μl. While bubbling, 500 μl of 2x HBS pH7.2 (281 mM NaCl, 100 mM HEP-

Statistical tests were performed as indicat- ES, 1.5 mM Na2HPO4) were added dropwise, ed in figures legends with R (v3.6.1) or Graph- and the mix was incubated for 10 min at room Pad Prism (version 8). Heatmaps were plotted temperature (RT). The solution was added with ggplot2 package or heatmap.2 function. dropwise to the 293T/17 cells, followed by in- Venn diagrams were generated with Biovenn cubation for 24 hours at 37ºC. The following (Hulsen et al. 2008). day, medium was substituted with 6 ml of fresh medium (reprogramming medium for Gene Ontology enrichment analysis MEF reprogramming). Viral medium was col- Genes bearing the differentially spliced lected 48 and 72 hours after transfection, fil- exons belonging to each set were analyzed for tered (0.22 μm pore size) and supplemented as needed.

22 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Retrovirus production natant (prepared in reprogramming medium), supplemented with LIF and doxycycline to in- 4.5x106 PlatE cells were seeded in gelatin- duce reprogramming. Two subsequent infec- coated 10cm plates (0.1% gelatin, Millipore) tions were performed, 12 hours apart, after the day before transfection. To improve effi- which fresh reprogramming medium was ciency, chloroquine was added approximately added. The following day (day 2) cells were one hour before transfection to a final concen- trypsinized and FACS-sorted for GFP expres- tration of 30 μM. For each plate, 20 μg of plas- sion by flow cytometry. Sorted MEFs were mid were mixed with 60 μl of CaCl2 2.5 M seeded (10.000 cells/well of 12-well plate) on (Sigma) and endotoxin-free water up to 500 μl. irradiated feeders on gelatin-coated plates While bubbling, 500 μl of 2x HBS pH 7.2 (281 (plated the previous day 100.000 cells/well of mM NaCl, 100 mM HEPES, 1.5 mM Na2HPO4) 12-well plate). Medium was substituted every were added dropwise, and the mix was incu- 2 days starting from day 4. Harvesting was per- bated for 10 min at room temperature (RT). formed every two days with trypsin 0.25% to The solution was added dropwise to the PlatE ensure the complete dissociation of the feeder cells and transfected cells were incubated for layer. Doxycycline was withdrawn from the 8-10 hours. Medium was substituted with 6 ml culture at day 12 post-OSKM induction and AP of fresh medium (reprogramming medium for staining was performed at day 14 with the Al- MEF reprogramming). Viral medium was col- kaline Phosphatase Kit II (StemGent), accord- lected the day after transfection and the fol- ing to the manufacturer’s instructions. The lowing one, filtered (0.22 μm filter pore size) number of AP+ colonies was quantified with and supplemented as needed. ImageJ software (colony size between 20-2000 pixels). C/EBPα-dependent B cell reprogramming Reprogramming medium is composed by DMEM, high glucose + Glutamax, ES-qualified Primary B cell reprogramming was per- FBS 15%, PenStrep 0.5x, Sodium Pyruvate 1 formed as described in (Di Stefano et al. 2014, mM, HEPES 30 mM, NEAA 1x, β-mercap- 2016; Di Stefano & Graf 2016; Stadhouders et toethanol 0.1 mM. LIF (1000 U/ml) and doxy- al. 2018). cycline (1µg/ml) were added freshly.

MEF reprogramming Flow cytometry analysis Primary MEFs (P0), obtained from male Cells were trypsinized and stained with a embryos of a Collagen-OKSM, M2rtta+, mouse mix of antibodies SSEA1-eFluor 660 (MC-480, line (Stadtfeld et al. 2010) were cultured on eBioscience) and EpCAM-PE (G8.8, eBio- gelatin-coated plates in MEF medium (DMEM, science), both at a 1:50 dilution in 100 μl per high glucose + Glutamax, FBS 10%, PenStrep sample. Staining was carried out by incubating 1x, Sodium Pyruvate 1 mM, HEPES 30 mM, on ice for 20 min, followed by washing and NEAA 1x, β-mercaptoethanol 0.1 mM). Primary staining with DAPI (Sigma). Flow cytometry MEFs were expanded for a maximum of 2 pas- analysis was performed with a BD LSR II sages before inducing reprogramming. Early analyser. Gates and compensation between passage MEFs were plated in MEF medium on FITC and PE were adjusted on non-stained gelatin-coated 6-well plates at a density of controls. 10.000 events (alive cells) per sample 70.000 cells/well. The following day, infection were acquired. was performed by substituting the medium with the filtered retroviral or lentiviral super-

23 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Cloning of LEF1 isoforms and TIA1 in with Superscript III (Invitrogen) following the retroviral vectors manufacturer’s recommendations.

Lef1 isoforms were amplified from bulk PCR reactions were carried out using Go- cDNAs of B cells and iPS (RNA-seq samples) Taq enzyme (Promega) with 1 μl of cDNA di- and cloned using Gibson technology (Gibson luted 1:5. To quantify inclusion of alternatively et al. 2009) into a MIG-pMSCV retroviral vector spliced exons, capillary electrophoresis was including an IRES-GFP element. Kozak consen- performed using a Labchip GX Caliper work- sus sequences and an N-terminal T7 tag were station (Caliper, Perkin Elmer) at the CRG Pro- inserted before the LEF1 ORF to ensure effi- tein Technologies Core Facility. The nanomolar cient and discrimination from en- content of each band was extracted with Lab- dogenous Lef1. Chip GX software and PSI values were calcu- lated as the ratio between the inclusion ampli- cDNA of mouse Tia1 was purchased from con and the sum of inclusion and skipping GenScript in pcDNA3.1 vectors (clone ID: amplicons. OMu08423D) and cloned using a similar Gib- son strategy into the MIG-pMSCV retroviral Real Time quantitative PCR (RT-qPCR) was vector bearing the N-terminal T7 and IRES-GFP performed on a ViiA7 Real Time PCR System elements. Gibson reaction master mixes were (Applied Biosystems). Reactions in a total vol- provided by the CRG Protein Technologies ume of 10μl contained 2x SYBR Green Master Core Facility. Mix (Applied Biosystems), primers 400 nM and 1μl of previously synthetized cDNA, diluted Short hairpin RNAs 1:5-20. The output Ct values were normalized by the expression of the housekeeping gene Five MISSION shRNAs specific for each GAPDH (unless differently stated) and an- splicing regulator were purchased from Sigma alysed with ∆Ct/∆∆Ct method. All primers se- in pLKO lentiviral vectors and their effects quences are listed in Table S6. were compared with those of SHC002 mam- malian non-targeting MISSION shRNA (Sigma). The effect of each shRNA was tested on MEFs Acknowledgements and E14 embryonic stem cells, and knockdown efficiency was assessed by RT-qPCR at 72/96h We thank the CRG Genomics Unit for se- post-infection using primers specific for each quencing; the CRG Tissue Engineering Unit for target and normalised by the expression of the inactivated MEF feeders; the CRG/UPF two housekeeping genes (Gapdh and Rplp0). FACS Unit for FACS sorting; Daniel Soronellas The two shRNAs displaying the largest knock- for help with the Mfuzz package and Konrad down effects in both cell lines were selected Hochedlinger for the Col-OKSM reprogramma- (sequences shown in Table S6). ble mouse strain. We thank members of Manuel Irimia’s, T.G.’s and J.V.’s groups for dis- RNA extraction and retrotranscription, cussions; Manuel Irimia and members of J.V.’s semi-quantitative RT-PCR and real-time group for critical reading of the manuscript. qPCR (RT-qPCR) C.V. was recipient of an FPI-Severo Ochoa Fel- lowship from the Spanish Ministry of Economy RNA extraction and DNAse treatment was and Competitiveness Work in J.V. laboratory is performed using Maxwell simplyRNA kit supported by the European Research Council (Promega) or RNeasy Mini kit (Qiagen), follow- (ERC AdvG 670146), AGAUR, Spanish Ministry ing the manufacturers’ instructions. of Economy and Competitiveness (BFU 2017 200 ng of total RNA were reverse-transcribed

24 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

89308-P) and the Centre of Excellence Severo Bruhn, L., Munnerlyn, A. and Grosschedl, R. (1997) Ochoa. Work in T.G.’s laboratory was supported ‘ALY, a context-dependent coactivator of LEF-1 and AML- 1, is required for TCRα enhancer function’, Genes and De- by t h e Eu ro p e a n R e s e a rc h Co u n c i l velopment, 11(5), pp. 640–653. doi: 10.1101/gad.11.5.640. FP7/2007-2013 (ERC Synergy Grant 4D- Choi, H. S. et al. (2011) ‘Identification and characteri- Genome) the Ministerio de Educación y Cien- zation of adenovirus early region 1B-associated protein 5 cia (SAF.2012-37167) and AGAUR. as a surface marker on undifferentiated human embryon- ic stem cells’, Stem Cells and Development, 20(4), pp. 609– 620. doi: 10.1089/scd.2010.0265. Cieply, B. et al. (2016) ‘Multiphasic and Dynamic Authors contributions Changes in Alternative Splicing during Induction of Pluripotency Are Coordinated by Numerous RNA-Binding C.V., T.G., and J.V. conceived the study. C.V., Proteins’, Cell Reports, 15(2), pp. 247–255. doi: 10.1016/ P.P., J.L.S. and J.V. designed experiments and j.celrep.2016.03.025. data analyses. C.V. and J.V. wrote the man- Dobin, A. et al. (2013) ‘STAR: Ultrafast universal RNA- - seq aligner’, Bioinformatics, 29(1), pp. 15–21. doi: 10.1093/ uscript with input from all authors. C.V. per bioinformatics/bts635. formed the majority of the experimental work Eden, E. et al. (2009) ‘GOrilla: A tool for discovery and and data analyses (RNA-seq analyses, MEF re- visualization of enriched GO terms in ranked gene lists’, programming and molecular biology experi- BMC Bioinformatics, 10. doi: 10.1186/1471-2105-10-48. ments). R.S., B.D.S. and C.B. performed the B Förch, P. et al. (2000) ‘The -promoting fac- tor TIA-1 is a regulator of alternative pre-mRNA splicing’, cell reprogramming experiments for RNA-seq. Molecular Cell, 6(5), pp. 1089–1098. doi: 10.1016/ S.G, A.M. and B.P. provided reprogrammable S1097-2765(00)00107-6. MEFs. J.V. and T.G. acquired funding. Förch, P. et al. (2002) ‘The splicing regulator TIA-1 interacts with U1-C to promote U1 snRNP recruitment to 5′ splice sites’, EMBO Journal, 21(24), pp. 6882–6892. doi: 10.1093/emboj/cdf668. Declaration of Interests Futschik, M. E. and Carlisle, B. (2005) ‘Noise-robust The authors declare no competing inter- soft clustering of gene expression time-course data’, Journal of Bioinformatics and Computational Biology, 3(4), ests. pp. 965–988. doi: 10.1142/S0219720005001375. Gabut, M. et al. (2011) ‘An alternative splicing switch regulates embryonic stem cell pluripotency and repro- References Bibliography} gramming’, Cell. Elsevier Inc., 147(1), pp. 132–146. doi: 10.1016/j.cell.2011.08.023. Ajith, S. et al. (2016) ‘Position-dependent activity of Van Genderen, C. et al. (1994) ‘Development of sev- CELF2 in the regulation of splicing and implications for eral organs that require inductive epithelial- mesenchy- signal-responsive regulation in T cells’, RNA Biology. Taylor mal interactions is impaired in LEF-1-deficient mice’, & F r a n c i s , 1 3 ( 6 ) , p p . 5 6 9 – 5 8 1 . d o i : Genes and Development, 8(22), pp. 2691–2703. doi: 10.1080/15476286.2016.1176663. 10.1101/gad.8.22.2691. Amit, M. et al. (2012) ‘Differential GC Content be- Gibson, D. G. et al. (2009) ‘Enzymatic assembly of tween Exons and Introns Establishes Distinct Strategies DNA molecules up to several hundred kilobases’, Nature of Splice-Site Recognition’, Cell Reports. Cell Press, 1(5), Methods. Nature Publishing Group, 6(5), pp. 343–345. doi: pp. 543–556. doi: 10.1016/j.celrep.2012.03.013. 10.1038/nmeth.1318. Baralle, F. E. and Giudice, J. (2017) ‘Alternative splic- Gohr, A. and Irimia, M. (2019) ‘Matt: Unix tools for ing as a regulator of development and tissue identity’, alternative splicing analysis’, Bioinformatics, 35(1), pp. Nature Reviews Molecular Cell Biology. Nature Publishing 130–132. doi: 10.1093/bioinformatics/bty606. Group, 18(7), pp. 437–451. doi: 10.1038/nrm.2017.27. Gopalakrishna-Pillai, S. and Iverson, L. E. (2011) ‘A Blackwell, D. et al. (2020) ‘Hnrnpul1 loss of function DNMT3B alternatively spliced exon and encoded peptide affects skeletal and limb development’, bioRxiv. doi: are novel biomarkers of human pluripotent stem cells’, 10.1101/2020.02.04.934257. PLoS ONE, 6(6). doi: 10.1371/journal.pone.0020663. Brady, J. J. et al. (2013) ‘Early role for IL-6 signalling Gopalakrishnan, S. et al. (2009) ‘A novel DNMT3B during generation of induced pluripotent stem cells re- splice variant expressed in tumor and pluripotent cells vealed by heterokaryon RNA-Seq’, Nature Cell Biology, modulates genomic DNA methylation patterns and dis- 15(10), pp. 1244–1252. doi: 10.1038/ncb2835. plays altered DNA binding’, Molecular Cancer Research,

25 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

7 ( 1 0 ) , p p . 1 6 2 2 – 1 6 3 4 . d o i : Stem Cell, 15(1), pp. 92–101. doi: 10.1016/ 10.1158/1541-7786.MCR-09-0018. j.stem.2014.04.002. Han, H. et al. (2013) ‘MBNL proteins repress ES-cell- Mallory, M. J. et al. (2011) ‘Signal- and Development- specific alternative splicing and reprogramming’, Nature, Dependent Alternative Splicing of LEF1 in T Cells Is Con- 498(7453), pp. 241–245. doi: 10.1038/nature12270. trolled by CELF2’, Molecular and Cellular Biology, 31(11), pp. 2184–2195. doi: 10.1128/mcb.05170-11. Hulsen, T., de Vlieg, J. and Alkema, W. (2008) ‘BioVenn - A web application for the comparison and visualization Mallory, M. J. et al. (2015) ‘Induced transcription and of biological lists using area-proportional Venn diagrams’, stability of CELF2 mRNA drives widespread alternative BMC Genomics, 9. doi: 10.1186/1471-2164-9-488. splicing during T-cell signaling’, Proceedings of the Na- tional Academy of Sciences of the United States of America, Irimia, M. et al. (2014) ‘A highly conserved program of 1 1 2 ( 1 7 ) , p p . E 2 1 3 9 – E 2 1 4 8 . d o i : 1 0 . 1 0 7 3 / neuronal microexons is misregulated in autistic brains’, pnas.1423695112. Cell. Elsevier Ltd, 159(7), pp. 1511–1523. doi: 10.1016/ j.cell.2014.11.035. McCarthy, D. J., Chen, Y. and Smyth, G. K. (2012) ‘Dif- ferential expression analysis of multifactor RNA-Seq ex- Izquierdo, J. M. et al. (2005) ‘Regulation of fas alterna- periments with respect to biological variation’, Nucleic tive splicing by antagonistic effects of TIA-1 and PTB on Acids Research, 40(10), pp. 4288–4297. doi: 10.1093/nar/ exon definition’, Molecular Cell, 19(4), pp. 475–484. doi: gks042. 10.1016/j.molcel.2005.06.015. Misra, A. et al. (2014) ‘Global Promotion of Alterna- Jesse, S. et al. (2010) ‘Lef-1 isoforms regulate different tive Internal Exon Usage by mRNA 3’ End Formation Fac- target genes and reduce cellular adhesion’, International tors’, Molecular Cell. Elsevier Inc., 58(5), pp. 819–831. doi: Journal of Cancer, 126(5), pp. 1109–1120. doi: 10.1002/ 10.1016/j.molcel.2015.03.016. ijc.24802. Misra, A. et al. (2015) ‘Global analysis of CPSF2-medi- Kanitz, A. et al. (2019) ‘Conserved regulation of RNA ated alternative splicing: Integration of global iCLIP and processing in somatic cell reprogramming’, BMC Ge- transcriptome profiling data’, Genomics Data. The Au- nomics. BMC Genomics, 20(1), pp. 1–19. doi: 10.1186/ thors, 6, pp. 217–221. doi: 10.1016/j.gdata.2015.09.022. s12864-019-5438-2. Mosteiro, L. et al. (2016) ‘Tissue damage and senes- Kedersha, N. L. et al. (1999) ‘RNA-binding proteins cence provide critical signals for cellular reprogramming TIA-1 and TIAR link the phosphorylation of eIF-2α to the in vivo’, Science, 354(6315). doi: 10.1126/science.aaf4445. assembly of mammalian stress granules’, Journal of Cell Biology, 147(7), pp. 1431–1441. doi: 10.1083/ Ohta, S. et al. (2013) ‘Global Splicing Pattern Rever- jcb.147.7.1431. sion during Somatic Cell Reprogramming’, Cell Reports, 5(2), pp. 357–366. doi: 10.1016/j.celrep.2013.09.016. Kumar, L. and Futschik, M. E. (2007) ‘Mfuzz: A soft- ware package for soft clustering of microarray data’, Oosterwegel, M. et al. (1993) ‘Differential expression Bioinformation, 2 ( 1 ) , p p . 5 – 7 . d o i : of the HMG box factors TCF-1 and LEF-1 during murine 10.6026/97320630002005. embryogenesis’, Development, 118(2), pp. 439–448. Avail- able at: http://www.ncbi.nlm.nih.gov/pubmed/8223271. Kyburz, A. et al. (2006) ‘Direct Interactions between Subunits of CPSF and the U2 snRNP Contribute to the Pan, Q. et al. (2004) ‘Revealing global regulatory fea- Coupling of Pre-mRNA 3′ End Processing and Splicing’, tures of mammalian alternative splicing using a quantita- Molecular Cell, 23(2), pp. 195–205. doi: 10.1016/ tive microarray platform’, Molecular Cell, 16(6), pp. 929– j.molcel.2006.05.037. 941. doi: 10.1016/j.molcel.2004.12.004. Larson, A., Fair, B. J. and Pleiss, J. A. (2016) ‘Intercon- Pan, Q. et al. (2008) ‘Deep surveying of alternative nections between RNA-processing pathways revealed by splicing complexity in the human transcriptome by high- a sequencing-based genetic screen for pre-mRNA splic- throughput sequencing’, Nature Genetics, 40(12), pp. ing mutants in fission yeast’, G3: Genes, Genomes, Genet- 1413–1415. doi: 10.1038/ng.259. ics, 6(6), pp. 1513–1523. doi: 10.1534/g3.116.027508. Piecyk, M. et al. (2000) ‘TIA-1 is a translational silencer Levanon, D. et al. (1998) ‘Transcriptional repression that selectively regulates the expression of TNF-α’, EMBO by AML1 and LEF-1 is mediated by the TLE/Groucho Journal, 19(15), pp. 4154–4163. doi: 10.1093/emboj/ corepressors’, Proceedings of the National Academy of Sci- 19.15.4154. ences of the United States of America, 95(20), pp. 11590– Polo, J. M. et al. (2012) ‘A molecular roadmap of re- 11595. doi: 10.1073/pnas.95.20.11590. programming somatic cells into iPS cells’, Cell. Elsevier Lewis, B. P., Green, R. E. and Brenner, S. E. (2003) ‘Evi- I n c. , 1 5 1 ( 7 ) , p p. 1 6 1 7 – 1 6 3 2 . d o i : 1 0 . 1 0 1 6 / dence for the widespread coupling of alternative splicing j.cell.2012.11.039. and nonsense-mediated mRNA decay in humans’, Pro- Polo, S. E. et al. (2012) ‘Regulation of DNA-End Resec- ceedings of the National Academy of Sciences of the United tion by hnRNPU-like Proteins Promotes DNA Double- States of America, 100(1), pp. 189–192. doi: 10.1073/ Strand Break Signaling and Repair’, Molecular Cell. Elsevier pnas.0136770100. I n c . , 4 5 ( 4 ) , p p . 5 0 5 – 5 1 6 . d o i : 1 0 . 1 0 1 6 / Lu, Y. et al. (2014) ‘Alternative splicing of MBD2 sup- j.molcel.2011.12.035. ports self-renewal in human pluripotent stem cells’, Cell

26 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Ray, D. et al. (2013) ‘A compendium of RNA-binding Stelzer, G. et al. (2016) ‘The GeneCards suite: From motifs for decoding gene regulation’, Nature. Nature Pub- gene data mining to disease genome sequence analyses’, lishing Group, a division of Macmillan Publishers Limited. Current Protocols in Bioinformatics, 2016, pp. All Rights Reserved., 499(7457), pp. 172–177. doi: 1.30.1-1.30.33. doi: 10.1002/cpbi.5. 10.1038/nature12311. Supek, F. et al. (2011) ‘Revigo summarizes and visual- Ritschka, B. et al. (2017) ‘The senescence-associated izes long lists of gene ontology terms’, PLoS ONE, 6(7). doi: secretory phenotype induces cellular plasticity and tissue 10.1371/journal.pone.0021800. regeneration’, Genes and Development, 31(2), pp. 172– Takahashi, K. and Yamanaka, S. (2006) ‘Induction of 183. doi: 10.1101/gad.290635.116. Pluripotent Stem Cells from Mouse Embryonic and Adult Robinson, M. D., McCarthy, D. J. and Smyth, G. K. Fibroblast Cultures by Defined Factors’, Cell, 126(4), pp. (2009) ‘edgeR: A Bioconductor package for differential 663–676. doi: 10.1016/j.cell.2006.07.024. expression analysis of digital gene expression data’, Bioin- Tapial, J. et al. (2017) ‘An atlas of alternative splicing formatics, 26(1), pp. 139–140. doi: 10.1093/bioinformat- profiles and functional associations reveals new regula- ics/btp616. tory programs and genes that simultaneously express Rodriguez-Ubreva, J. et al. (2014) ‘C/EBPa-Mediated multiple major isoforms’, Genome Research, 27(10), pp. Activation of MicroRNAs 34a and 223 Inhibits Lef1 Ex- 1759–1768. doi: 10.1101/gr.220962.117. pression To Achieve Efficient Reprogramming into Toh, C. X. D. et al. (2016) ‘RNAi Reveals Phase-Specific Macrophages’, Molecular and Cellular Biology, 34(6), pp. Global Regulators of Human Somatic Cell Reprogram- 1145–1157. doi: 10.1128/mcb.01487-13. ming’, Cell Reports, 15(12), pp. 2597–2607. doi: 10.1016/ Salomonis, N. et al. (2010) ‘Alternative splicing regu- j.celrep.2016.05.049. lates mouse embryonic stem cell pluripotency and dif- Travis, A. et al. (1991) ‘LEF-1, a gene encoding a lym- ferentiation’, Proceedings of the National Academy of Sci- phoid-specific with protein, an HMG domain, regulates T- ences of the United States of America, 107(23), pp. 10514– cell receptor α enhancer function’, Genes and Develop- 10519. doi: 10.1073/pnas.0912260107. ment, 5(5), pp. 880–894. doi: 10.1101/gad.5.5.880. Sánchez-Jiménez, C. and Izquierdo, J. M. (2013) ‘T- Ule, J. et al. (2010) ‘iCLIP predicts the dual splicing cell Intracellular Antigen (TIA)-Proteins Deficiency in effects of TIA-RNA interactions’, PLoS Biology, 8(10). doi: Murine Embryonic Fibroblasts Alters Cell Cycle Progres- 10.1371/journal.pbio.1000530. sion and Induces Autophagy’, PLoS ONE, 8(9). doi: 10.1371/journal.pone.0075127. Venables, J. P. et al. (2013) ‘MBNL1 and RBFOX2 co- operate to establish a splicing programme involved in Sardina, J. L. et al. (2018) ‘Transcription Factors Drive pluripotent stem cell differentiation’, Nature Communica- Tet2-Mediated Enhancer Demethylation to Reprogram tions. Nature Publishing Group, 4(May), p. 2480. doi: Cell Fate’, Cell Stem Cell, 23(5), pp. 727-741.e9. doi: 10.1038/ncomms3480. 10.1016/j.stem.2018.08.016. Wang, E. T. et al. (2008) ‘Alternative isoform regula- Solana, J. et al. (2016) ‘Conserved functional antago- tion in human tissue transcriptomes’, Nature, 456(7221), nism of CELF and MBNL proteins controls stem cell-spe- pp. 470–476. doi: 10.1038/nature07509. cific alternative splicing in planarians’, eLife, 5(AUGUST), pp. 1–29. doi: 10.7554/eLife.16797.001. Wang, I. et al. (2014) ‘Structure, dynamics and RNA binding of the multi-domain splicing factor TIA-1’, Nucleic Stadhouders, R. et al. (2018) ‘Transcription factors Acids Research, 42(9), pp. 5949–5966. doi: 10.1093/nar/ orchestrate dynamic interplay between genome topolo- gku193. gy and gene regulation during cell reprogramming’, Na- ture Genetics. Springer US, 50(2), pp. 238–249. doi: Waterman, M. L., Fischer, W. H. and Jones, K. A. (1991) 10.1038/s41588-017-0030-7. ‘A thymus-specific member of the HMG protein family regulates the human T cell receptor Cα enhancer’, Genes Stadtfeld, M. et al. (2010) ‘A reprogrammable mouse and Development, 5(4), pp. 656–669. doi: 10.1101/ strain from gene-targeted embryonic stem cells’, Nature gad.5.4.656. Methods, 7(1), pp. 53–55. doi: 10.1038/nmeth.1409. Yamazaki, T. et al. (2018) ‘TCF3 alternative splicing Di Stefano, B. et al. (2014) ‘C/EBPα poises B cells for controlled by hnRNP H/F regulates E-cadherin expression rapid reprogramming into induced pluripotent stem and hESC pluripotency’, Genes and Development, 32(17– cells’, Nature. Nature Publishing Group, 506(7487), pp. 18), pp. 1161–1174. doi: 10.1101/gad.316984.118. 235–239. doi: 10.1038/nature12885. Yeo, G. and Burge, C. B. (2004) ‘Maximum entropy Di Stefano, B. et al. (2016) ‘C/EBPα creates elite cells modeling of short sequence motifs with applications to for iPSC reprogramming by upregulating Klf4 and in- RNA splicing signals’, in Journal of Computational Biology, creasing the levels of Lsd1 and Brd4’, Nature Cell Biology, pp. 377–394. doi: 10.1089/1066527041410418. 18(4), pp. 371–381. doi: 10.1038/ncb3326. Yeo, G. W. et al. (2009) ‘Doi:10.1038/Nsmb.1545’, Na- Di Stefano, B. and Graf, T. (2016) Very rapid and effi- ture Structural Molecular Biology, 16(2), pp. 130–137. doi: cient generation of induced pluripotent stem cells from 10.1038/nsmb.1545. mouse pre-B Cells, Methods in Molecular Biology. doi: 10.1007/7651_2014_133. Zavolan, M. and Kanitz, A. (2018) ‘RNA splicing and its connection with other regulatory layers in somatic cell

27 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

reprogramming’, Current Opinion in Cell Biology. Elsevier Ltd, 52, pp. 8–13. doi: 10.1016/j.ceb.2017.12.002. Zhou, P. et al. (1995) ‘Lymphoid enhancer factor 1 directs hair follicle patterning and epithelial cell fate’, Genes and Development, 9(6), pp. 700–713. doi: 10.1101/ gad.9.6.700.

28 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Supplementary Figures

29 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

A Cassette Exons Retained Introns B 1.8 1149 B_iPS B_iPS 1140 1.6 1558 1548 B_Day8 B_Day8 1597 1570 1.4 1182 1093 1422 B_Day6 B_Day6 1324 834 866 1.2 585 1525 1451 1635 B_Day4 B_Day4 1980 1396 1408 529

Min. centroid distance 1.0 499 890 1846 1593 1660 B_Day2 B_Day2 1976 1641 1572 727 405

4 6 8 10 12 14 945 876 1154 1869 1508 1940 B_Bpulse B_Bpulse 2657 1790 1924 1232 811 842 Cluster number

439 1119 1028 1318 2017 1681 2183 B_Bcells B_Bcells 4058 3011 2689 2280 1458 1558 595

B_Bpulse B_Day2 B_Day4 B_Day6 B_Day8 B_iPS B_ES Number B_ES B_iPS B_Day8 B_Day6 B_Day4 B_Day2 B_Bpulse of events 0 1000200030004000 B_Bpulse B_Day2 B_Day4 B_Day6 B_Day8 B_iPS B_ES B_ES B_iPS B_Day8 B_Day6 B_Day4 B_Day2 B_Bpulse

110 212 191 268 465 290 371 B_Bcells B_Bcells 459 336 466 275 181 223 109

177 158 227 504 304 353 B_Bpulse B_Bpulse 435 296 470 244 165 169

117 160 437 349 345 B_Day2 B_Day2 470 336 497 197 94

104 362 293 285 B_Day4 B_Day4 398 282 396 148

277 228 283 B_Day6 B_Day6 327 186 299

415 352 B_Day8 B_Day8 396 330

231 B_iPS B_iPS 224 Alternative 3’SS Alternative 5’SS

C Cluster 07 - n=431 Cluster 09 - n=318 Cluster 11 - n=294 2 2 2 1 1 1 1 0.9 0 0 0 0.8

Scaled PSI −1 Scaled PSI −1 Scaled PSI −1 −2 0.7 −2 −2 e B Bα D2 D4 D6 D8 iPS ES B Bα D2 D4 D6 D8 iPS ES B Bα D2 D4 D6 D8 iPS ES 0.6 Cluster 08 - n=421 Cluster 10 - n=382 Cluster 12 - n=369 0.5 2 2 2

bership scor 0.4

1 1 1 m 0.3 0 0 0 Me 0.2 Scaled PSI Scaled PSI −1 −1

Scaled PSI −1 −2 −2 −2 0.1 B Bα D2 D4 D6 D8 iPS ES B Bα D2 D4 D6 D8 iPS ES B Bα D2 D4 D6 D8 iPS ES 0

D Cluster 07 - n=381 Cluster 09 - n=274 Cluster 11 - n=256 E 2 7.3% 10.24% 16.8% 1 5.11% 8.4% 0 6.2% −1 0.79% 8.2% Scaled cpm −2

Cluster 08 - n=364 Cluster 10 - n=309 Cluster 12 - n=318 84.3% 2 1 0.64% 7.86% 0 gene expression 7.42% 3.23% concordant −1 10.38% Scaled cpm 1.1% with AS −2 gene expression contrasting B Bα D2 D4 D6 D8 iPS ES B Bα D2 D4 D6 D8 iPS ES B Bα D2 D4 D6 D8 iPS ES with AS other gene gene expression gene expression other gene expression concordant contrasting expression pro#les with AS with AS pro#les

Supplementary Figure S1. (legend on next page) 30 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Supplementary Figure S1. (A) Heatmaps representing the number of AS events for each AS category (Cassette exons, retained introns, alternative 3’ and 5’ splice sites) which are differentially spliced between each pair of samples across the C/EBPα-mediated reprogramming dataset. The direction of reprogramming time is indicated on the axes by arrows. Related to Figure 1B. (B) Variation of the minimum centroid distance score with the number of clusters produced, according to which 12 clusters were selected (magenta dashed line). (C) AS clusters not shown in Figure 1C. The size of each cluster is indicated (n). (D) Gene expression patterns of genes containing the exons belonging to each of the AS clusters in panel C. Genes with expression correlating with the cluster centroid or its negative (membership > 0.3) are highlighted in blue or green, respectively. Percentage of concordant/contrasting patterns are displayed for each cluster. (E) Pie chart indicating the fraction of genes whose gene expression changes are concordant / contrasting with the profiles of AS changes corresponding to their respective cassette exons (average of all the clusters). Colors as in panel G.

A All events Cassette exons Retained introns Alternative 3’SS Alternative 5’SS 71.62% overlap 79.36% overlap 67.17% overlap 55.61% overlap 73.51% overlap

579 174 282 83 40

13239 1461 3887 669 6831 577 1307 104 1214 111

MEF reprogramming MEF reprogramming MEF reprogramming MEF reprogramming MEF reprogramming

B cell reprogramming B cell reprogramming B cell reprogramming B cell reprogramming B cell reprogramming

Supplementary Figure S2. (A) Venn diagrams representing the overlap of events differentially spliced between B cell reprogramming (grey) and MEF reprogramming (magenta) from (Cieply et al. 2016).

31 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

A Cluster 07 Cluster 09 Cluster 11 2 2 2 n=19 1 1 1 0 0 0 −1 −1 −1 Scaled PSI Scaled PSI Scaled PSI −2 −2 n=13 −2 n=53

B Bα D2 D4 D6 D8 iPS ES B Bα D2 D4 D6 D8 iPS ES B Bα D2 D4 D6 D8 iPS ES Cluster 08 Cluster 10 Cluster 12 2 2 2 n=13 n=1 1 1 1 0 0 0 −1 −1 −1 Scaled PSI Scaled PSI Scaled PSI −2 −2 −2 n=8

B Bα D2 D4 D6 D8 iPS ES B Bα D2 D4 D6 D8 iPS ES B Bα D2 D4 D6 D8 iPS ES B Cluster 07 Cluster 09 Cluster 11 2 2 2 n=22 1 1 1 0 0 0 −1 −1 −1 Scaled PSI Scaled PSI Scaled PSI −2 n=4 −2 n=27 −2

B Bα D2 D4 D6 D8 iPS ES B Bα D2 D4 D6 D8 iPS ES B Bα D2 D4 D6 D8 iPS ES Cluster 08 Cluster 10 Cluster 12 2 2 2 n=3 1 1 1 0 0 0 −1 −1 −1 Scaled PSI Scaled PSI Scaled PSI −2 −2 n=48 −2 n=106

B Bα D2 D4 D6 D8 iPS ES B Bα D2 D4 D6 D8 iPS ES B Bα D2 D4 D6 D8 iPS ES

Supplementary Figure S3. (A) Gene expression profiles of RNA-binding proteins correlating with the centroid of each of the remaining AS clusters (shown in Figure S1C and not in 1C) (positive regulators). Average scaled cpm values are represented by each line and the number of regulators of each cluster is indicated (n). (B) Gene expression profiles of RNA-binding proteins correlating with the negative of the centroid of each of the remaining AS clusters (shown in Figure S1C and not in 1C) (negative regulators). Average scaled cpm values are represented by each line and the number of regulators in each cluster is indicated (n).

32 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

A

one out of RBP-speci!cone shRNA#1 out of + GFP RBP-speci!c shRNA#2 + GFP OKSM activation Control shRNA + GFP (doxycycline Lentiviral constructs treatment) Infection sorting doxycycline GFP+ cells removal Reprogrammable mouse infected MEFs

iPS colonies MEFs one out of Day0 Day2 Day4 Day6 Day8 Day10 Day12 Day14 T7-Tia1 IRES GFP OKSM activation Empty IRES GFP (doxycycline Retroviral constructs Infection treatment) EPCAM1 Surface markers early late SSEA1

Time 2 days Feeders

B C D Pou5f1 NI (day 10) shSCR (day 10) 0.42 5.99 10! 10! 0.19 8.11 2 day 08 day 10 84.28% NI ns 10# 0 * 10# ** shSCR 84.15% 10% 10% -2 ** shCPSF3#1 82.55% *** 10$ 10$ ns -4

over shSCR shCPSF3#2 84.44% 10" 10" ns -6 82.8 10.8 81.8 9.87

log2(Fold change) 10" 10$ 10% 10# 10! 10" 10$ 10% 10# 10! shUL1#1 86.68% ns NI NI shSCR shSCR shUL1#2 89.26% shUL1#1shUL1#2 shUL1#1shUL1#2 ns shCPSF3#1shCPSF3#2 shCPSF3#1shCPSF3#2 shCPSF3#1 (day 10) shUL1#1 (day 10)

0.068 1.53 10! 0.089 0.097 10! 0 25 50 75 100 Nanog Percentage of alive cells 5 day 08 day 10 10# 10# (average across day 06 - 12) ** 0 10% 10%

-5 10$ 10$

-10 SSEA1-APC 10" 10" 99.3 0.48 96.0 2.39 over shSCR

-15 10" 10$ 10% 10# 10! 10" 10$ 10% 10# 10!

log2(Fold change) -20 EPCAM1-PE NI NI shSCR shSCR shUL1#1shUL1#2 shUL1#1shUL1#2 shCPSF3#1shCPSF3#2 shCPSF3#1shCPSF3#2

Pou5f1 NI (day 10) Empty (day 10) E 5 day 08 day 10 F G

10! 0.10 5.89 10! 0.22 5.26 NI 83.84% 0 * * 10# 10# ns

10% 10%

-5 10$ 10$ Empty 83.13% over Empty 10" 10" 89.5 4.50 87.5 7.03

log2(Fold change) -10 10" 10$ 10% 10# 10! 10" 10$ 10% 10# 10! T7-TIA1 78.56% NI NI ns EmptyT7-TIA1 EmptyT7-TIA1 T7-TIA1 (day 10) Nanog 0 25 50 75 100 0.12 4.57 Percentage of alive cells day 08 day 10 10! 2 (average across day 06 - 12) 10# 0 * * 10%

-2 10$

SSEA1-APC 10" -4 91.1 4.21

over Empty 10" 10$ 10% 10# 10! -6

log2(Fold change) EPCAM1-PE NI NI EmptyT7-TIA1 EmptyT7-TIA1 Supplementary Figure S4. (legend on next page) 33 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Supplementary Figure S4. (A) Schematic representation of the protocol used to knockdown Cpsf3 and Hnrnpul1 or overexpress T7-tagged Tia1 at early stages of MEF reprogramming. (B) Gene expression levels of endogenous Pou5f1 and Nanog mRNAs during MEF reprogramming quantified by RT-qPCR and represented as log2(Fold change) for every condition compared to the shSCR control. Average of biological triplicates and SEM are shown. Statistical significance was calculated by two-way ANOVA on ∆∆Ct values comparing to shSCR (*, **, *** = p value < 0.05, 0.01, 0.001 respectively, corrected for multiple testing with Sidak method). (C) Examples of gating set ups for cell sorting of SSEA1 and EPCAM1 markers at day 10 of reprogramming. Numbers within the rectangles indicate percentages of cells in each gate. (D) Barplot representing the percentage of alive (DAPI-, light gray) and dead (DAPI+, dark gray)cells. Average and SD of days 6, 8, 10 and 12 are shown for each condition. Statistical significance was calculated by Fisher’s exact test against shSCR condition (*, **, *** = p value < 0.05, 0.01, 0.001 respectively). (E) Expression levels of endogenous Pou5f1 and Nanog mRNAs during MEF reprogramming quantified by RT-qPCR and represented as log2(Fold change) of every condition compared to the Empty control. Average of biological replicates and SEM are shown. Statistical significance was calculated by two-way ANOVA on ∆∆Ct values comparing to Empty vector (*, **, *** = p value < 0.05, 0.01, 0.001 respectively, corrected for multiple testing with Sidak method). (F) Examples of gating set ups for cell sorting of SSEA1 and EPCAM1 markers at day 10 of reprogramming. Numbers within the rectangles indicate percentages of cells in each gate. (G) Barplot representing the percentage of alive (DAPI-, light gray) and dead (DAPI+, dark gray) cells. Average and SD of days 6, 8, 10 and 12 are shown for each condition. Statistical significance was calculated by Fisher’s exact test against Empty condition (*, **, *** = p value < 0.05, 0.01, 0.001 respectively). 34

34 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

exon 6 exon 11 exon 12 A C 100 200 Lef1-inclusion Lef1 exon 6 Lef1 exon 11 Lef1-skipping 75 150

C D

P T7-Lef1 ** ***

50 100 M PSI ) * 25 50 ** Lef1 expression * Gapdh * *** 0 0 ** (over ** B Bα D2 D4 D6 D8 iPS ES **

B cell reprogramming Relative expression B 04 06 08 10 12 04 06 08 10 12 04 06 08 10 12 04 06 08 10 12 † † † 100 80 E NI Empty Lef1-inclusion Lef1-skipping Lef1 exon 6 Lef1 exon 11 75 60 0.10 Pou5f1

CPM 0.08 50 40 ) PSI 0.06 Gapdh 25 20 0.04 Lef1 expression (over 0.02 0 0 Relative expression D0 D4 D7 D10 D15 D20 iPS 0.00 MEF reprogramming 04 06 08 10 12 04 06 08 10 12 04 06 08 10 12 04 06 08 10 12 NI Empty Lef1-inclusion Lef1-skipping F 0.10 Nanog * 40 day 6 0.08 *** )

cells) ** - 30 0.06 Gapdh 0.04 20 * (over ** EPCAM1 0.02 + ** * * Relative expression 10 0.00 04 06 08 10 12 04 06 08 10 12 04 06 08 10 12 04 06 08 10 12 Fold change early

reprog. intermediates NI Empty Lef1-inclusion Lef1-skipping (% SSEA1 0 NI Empty -inclusion-skipping Lef1 Lef1 H G 200 * 350 day 10 day 12 150 70 70 cells) + 100 20 20 † EPCAM1

+ 50 15 Number of AP+ colonies

Fold change late 10

reprog. intermediates 0 (% SSEA1 5

0 NI NI Empty Empty -inclusion-skipping -inclusion-skipping NI Empty Lef1 Lef1 Lef1 Lef1 Lef1 Lef1 inclusion skipping

Supplementary Figure S5. (legend on next page)

35 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Supplementary Figure S5. (A) Gene expression of Lef1 gene (cpm values, blue line, right y axis) and PSI values of Lef1 exon 6 and 11 (grey lines, left y axis) in B cell reprogramming. (B) Gene expression of Lef1 gene (cpm values, magenta line, right y axis) and PSI values of Lef1 exon 6 and 11 (grey lines, left y axis) in MEF reprogramming. Asterisks point out PSI values calculated with low coverage (less than 10 actual reads). (C) Lef1 pre-mRNA isoforms including exons 6 and 11 (Lef1-inclusion) or skipping both (Lef1-skipping) are represented. Thin lines represent introns, while wider regions represent exons. Lighter and thinner parts of exons depict 5’ and 3’UTR regions. Alternative exons are highlighted. (D) Relative expression levels of T7-Lef1 mRNA quantified by RT-qPCR using specific primers that only amplify exogenous T7- tagged Lef1. The y axis represents relative expression (2^(-∆Ct) value) of T7-Lef1 after normalization over Gapdh. (E) Relative expression levels of Pou5f1 and Nanog quantified by RT-qPCR as in panel D. (D,E) Average of biological replicates and SD values are shown (n=4). Statistical significance was calculated by t-test on ∆Ct values comparing to the Empty control (*, **, *** = p value < 0.05, 0.01, 0.001 respectively, corrected for multiple testing with Holm-Sidak method). (F) Increase in early reprogramming intermediates at day 6 post-OSKM induction upon overexpression of Lef1 isoforms. Fold change was calculated from the percentage of SSEA1+EPCAM1- cells (of the total of alive cells) in every condition compared to Empty control using flow cytometry analysis. (G) Increase in late reprogramming intermediates at days 10 and 12 post-OSKM induction upon overexpression of Lef1 isoforms. Fold change was calculated from the percentage of SSEA1+EPCAM1+ cells (of the total of alive cells) in every condition compared to the Empty control using flow cytometry analysis. (H) Number of colonies stained with alkaline phosphatase (AP) at day 14 post-OSKM induction upon Tia1 overexpression. The image of a representative well is shown for every condition. (F,G,H) Average of biological replicates and SD values are shown (n=4). Statistical significance was calculated by t-test comparing to the Empty control (*, **, *** = p value < 0.05, 0.01, 0.001 respectively, corrected for multiple testing with Holm-Sidak method).

36 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

A TIA1 B CPSF3 hnRNP UL1 6.65x 0.80x 0.36x 600 200

0.51x (cpm) (cpm) 150 400 0.81x

100

200 Gene expression Gene expression 50

CR CR L1#1 L1#2 L1#1 L1#2 PSF3#1PSF3#2 PSF3#1PSF3#2 OE_day00_Ctrl KD_day00_Ctrl KD_day00_Ctrl OE_day12_Empty KD_day12_shS KD_day12_shS OE_day12_T7TIA1 KD_day12_shUKD_day12_shU KD_day12_shUKD_day12_shU KD_day12_shCKD_day12_shC KD_day12_shCKD_day12_shC

C CPSF3-dependent hnRNP UL1-dependent D Empty di".spliced exons events (n=357) events (n=298) (n=980)

100 100 *** ns 100

50 50 50 50 40 75 40 0 30 (day 12 - day 0) 0 30 20 20 50 PSI 10 PSI shUL1 - shSCR) 10 PSI shCPSF3 - shSCR) -50 -50 <2 ∆∆ <2 ∆∆ 25 abs ( -100 abs ( -100 PSI shUL1#2 ∆ PSI shCPSF3#2 (day 12 - day 0) -100 -50 0 50 100 -100 -50 0 50 100 0 ∆ ∆PSI shSCR (day 12 - day 0) ∆PSI shSCR (day 12 - day 0) NI Empty TIA1 day 0 day 12

E UL1-dependent exons UL1-independent exons shSCR di".spliced exons (n=167) (n=290) (n=1557)

** ns * ns ns * ns ns ns ns *** *** ns * *** 100 100 100

75 75 75

50 50 50 PSI PSI PSI

25 25 25

0 0 0 NI shSCR shC#1 shC#2 shU#1 shU#2 NI shSCR shC#1 shC#2 shU#1 shU#2 NI shSCR shC#1 shC#2 shU#1 shU#2 day 0 day 12 day 0 day 12 day 0 day 12

Supplementary Figure S6. (A) Expression levels (cpm values) of Tia1 in the corresponding RNA-seq dataset. Fold change between cells overexpressing T7-Tia1 and control cells transduced with an Empty vector is shown (day12, average between replicates). (B) Expression levels (cpm values) of Cpsf3 and Hnrnpul1 in the corresponding RNA-seq dataset. Fold change values between cells infected with specific shRNAs and control shSCR are shown (day12, average between replicates and shRNAs). (C) CPSF3- and UL1-dependent events detected during reprogramming (left and right, respectively), representing the ∆PSI with the second shRNA (as in Figure 6C). The x axis represents the ∆PSI value between shSCR day 12 and day 0 control. The y axis represents the ∆PSI value between shCPSF3#2 or shUL1#2 day 12 and day 0 control. Only the events differentially spliced with both CPSF3- or hnRNPUL1-specific shRNAs are shown. Events coloured with the palette indicated are defined as TIA1-dependent events (average |∆∆PSI(shRNA - shSCR)| ≥ 10), while the others are represented by grey dots. (D) Relative to Figures 6E-F. Violin plots representing the distribution of PSI values of exons differentially spliced during reprogramming in Empty conditions (|∆PSI(Empty day12 – NI day 0)| ≥ 10 and range ≥ 5). (E) Relative to Figures 6J-K. Violin plots representing the distribution of PSI values of UL1-dependent/- independent exons (top and central panel) and exons differentially spliced during reprogramming in shSCR conditions (| ∆PSI(shSCR day12 – NI day 0)| ≥ 10 and range ≥ 5, bottom panel). (D,E) Statistical significance was calculated by Fisher’s exact test comparing number of events with intermediate (25 < PSI < 75) or extreme PSI values (PSI ≥ 75 or ≤ 25) in each condition against shSCR control (*, **, *** = p value < 0.05, 0.01, 0.001 respectively).

37 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.299867; this version posted September 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Supplementary Tables

Table S1. Cassette exon events differentially spliced in B cell reprogramming. This table includes the information about the AS cluster each exon belongs to and the predicted effect on its transcript. Data for other types of events is available upon request.

Table S2. Cassette exon events differentially spliced in MEF reprogramming. This table includes the information about the overlap with B cell reprogramming. Data for other types of events is available upon request.

Table S3. Gene expression levels of RBPs correlating with AS clusters (positive and nega- tive predicted regulators).

Table S4. CPSF3-, UL1- and TIA1-dependent and -independent events. This table includes the sets of control cassette exons (CEx) used for KD (CPSF3- and UL1-dependent CEx) or OE (TIA1- dependent CEx).

Table S5. Gene Ontology terms enriched in CPSF3-, UL1- and TIA1-dependent events. This table includes outputs from GOrilla and REViGO analyses.

Table S6. Sequences of primers and shRNAs used in this study.

38