School of Natural Sciences and Mathematics

2014-03

HITS-CLIP and Integrative Modeling Define the Rbfox Splicing-Regulatory Network Linked to Brain Development and Autism

UTD AUTHOR(S): Michael Qiwei Zhang

©2014 The Authors. Published by Elsevier Inc.

Creative Commons 3.0 BY-NC-ND License.

Find more research and scholarship conducted by the School of Natural Sciences and Mathematics here. This document has been made available for free and open access by the Eugene McDermott Library. Contact [email protected] for further information.

Cell Reports Resource

HITS-CLIP and Integrative Modeling Define the Rbfox Splicing-Regulatory Network Linked to Brain Development and Autism

Sebastien M. Weyn-Vanhentenryck,1,9 Aldo Mele,2,9 Qinghong Yan,1,9 Shuying Sun,3,4 Natalie Farny,5 Zuo Zhang,3,6 Chenghai Xue,3 Margaret Herre,2 Pamela A. Silver,5 Michael Q. Zhang,7,8 Adrian R. Krainer,3 Robert B. Darnell,2,* and Chaolin Zhang1,* 1Department of Systems Biology, Department of Biochemistry and Molecular Biophysics, Center for Motor Neuron Biology and Disease, Columbia University, New York, NY 10032, USA 2Howard Hughes Medical Institute, Laboratory of Molecular Neuro-Oncology, Rockefeller University, New York, NY 10065, USA 3Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA 4Ludwig Institute for Cancer Research, University of California, San Diego, La Jolla, CA 92093, USA 5Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA 6Merck Research Laboratories, Merck & Co., Inc., Rahway, NJ 07065, USA 7Department of Molecular and Cell Biology, Center for Systems Biology, The University of Texas at Dallas, Richardson, TX 75080, USA 8Bioinformatics Division, Center for Synthetic and Systems Biology, TNLIST, Tsinghua University, Beijing 100084, China 9These authors contributed equally to this work *Correspondence: [email protected] (R.B.D.), [email protected] (C.Z.) http://dx.doi.org/10.1016/j.celrep.2014.02.005 This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).

SUMMARY preferentially expressed in neurons, heart, and muscles, whereas Rbfox3 is specifically expressed in postmitotic neu- The RNA binding Rbfox1/2/3 regulate alter- rons. In humans, chromosomal translocation or copy number native splicing in the nervous system, and disruption variation affecting RBFOX1 has been found in patients with of Rbfox1 has been implicated in autism. However, several neurological disorders, including epilepsy, intellectual comprehensive identification of functional Rbfox disability (Bhalla et al., 2004), schizophrenia (Xu et al., 2008), targets has been challenging. Here, we perform and autism (Martin et al., 2007; Sebat et al., 2007). HITS-CLIP for all three Rbfox family members in At the molecular level, Rbfox proteins are known as tissue- specific splicing factors that bind to the (U)GCAUG element order to globally map, at a single-nucleotide resolu- frequently conserved across vertebrate species (Jin et al., tion, their in vivo RNA interaction sites in the mouse 2003; Minovitsky et al., 2005; Ponthier et al., 2006; Underwood brain. We find that the two guanines in the Rbfox bind- et al., 2005). We previously performed genome-wide bio- ing motif UGCAUG are critical for -RNA inter- informatic prediction of Rbfox target exons based on phyloge- actions and crosslinking. Using integrative modeling, netically conserved motif sites (Zhang et al., 2008), leading to these interaction sites, combined with additional the identification of >1,000 alternative or constitutive exons datasets, define 1,059 direct Rbfox target alternative that are potentially regulated by Rbfox, many found within tran- splicing events. Over half of the quantifiable targets scripts encoding proteins important for neuromuscular func- show dynamic changes during brain development. tions. Characterization of the splicing pattern of the Rbfox Of particular interest are 111 events from 48 targets revealed a position-dependent RNA map predictive of candidate autism-susceptibility , including syn- Rbfox action. According to this map, Rbfox binding in the downstream intron activates exon inclusion and binding in the dromic autism genes Shank3, Cacna1c, and Tsc2. alternative exon or upstream intron represses exon inclusion, Alteration of Rbfox targets in some autistic brains is consistent with observations from several tissue-specific exons correlated with downregulation of all three Rbfox (Jin et al., 2003; Underwood et al., 2005). Such a map was pre- proteins, supporting the potential clinical relevance viously found for another neuron-specific splicing factor Nova of the splicing-regulatory network. and is now recognized as a more general rule of alternative splicing regulation (Licatalosi et al., 2008; Ule et al., 2006). INTRODUCTION Despite recent progress (Barash et al., 2010; Ray et al., 2013; Ule et al., 2006; Zhang et al., 2013), the small sizes of RBP bind- The Rbfox proteins are a family of neuron- and muscle/heart- ing motifs limit the ability of motif-based bioinformatic target specific RNA binding proteins (RBPs) encoded by three prediction to achieve both high specificity and sensitivity. To genes—Rbfox1 (Fox-1 or A2bp1), Rbfox2 (Fox-2 or Rbm9), and map in vivo protein-RNA interaction sites on a genome-wide Rbfox3 (Fox-3, Hrnbp3,orNeuN)—that are conserved in verte- scale, crosslinking and immunoprecipitation followed by high- brates, flies, and worms. Rbfox1 and Rbfox2 are exclusively or throughput sequencing (HITS-CLIP) have been developed to

Cell Reports 6, 1139–1152, March 27, 2014 ª2014 The Authors 1139 Figure 1. HITS-CLIP Maps Rbfox-RNA Interaction Sites in Mouse Brain on a Genome-wide Scale (A) A schematic illustration of two HITS-CLIP protocols used to map Rbfox binding sites. (B) A UCSC Genome Browser view of Rbfox1, 2, and 3 CLIP data in an alternatively spliced region of a 93 nt cassette exon in Rbfox1 is shown. Rbfox1, 2, and 3 CLIP data are shown in separate wiggle tracks above the coordinates of UGCAUG and GCAUG elements and the phyloP conservation score. (legend continued on next page)

1140 Cell Reports 6, 1139–1152, March 27, 2014 ª2014 The Authors isolate RNA fragments directly bound by an RBP of interest redundancy, we performed HITS-CLIP experiments for all mem- (Darnell, 2010; Licatalosi et al., 2008; Moore et al., 2014; Ule bers of the family individually using mouse whole-brain tissue. et al., 2005a). HITS-CLIP has been used to map the Rbfox2 We first confirmed that the antibodies we used did not cross- binding sites in thousands of genes in human embryonic stem react with different members and that they efficiently immuno- cells, including those important for splicing regulation as precipitated (IP) the targeted protein with minimal background predicted by the RNA map (Yeo et al., 2009). under standard CLIP conditions (Figure S1; Supplemental To understand the physiological function of Rbfox proteins in Notes). Next, we used two different strategies to clone and the mammalian brain, knockout (KO) mouse models have been amplify the isolated RNA fragments (Figure 1A). The first pro- generated. CNS-specific depletion of Rbfox1 results in an tocol, denoted as standard CLIP, was performed as described increased susceptibility of mice to seizures and in overex- previously (Darnell, 2010; Licatalosi et al., 2008; Moore et al., citability of neurons in the dentate gyrus (Gehman et al., 2011). 2014; Ule et al., 2005a). In this protocol, RNA linkers are ligated CNS depletion of Rbfox2 results in defects in cerebellar develop- to the 50 and 30 ends of the RNA fragments (Figure 1A, left branch) ment (Gehman et al., 2012). Comparison of wild-type (WT) and are later used for RT-PCR amplification. We and several versus Rbfox1 or Rbfox2 KO brains using exon-junction microar- other groups have previously noted that after proteinase K diges- rays has identified multiple Rbfox-dependent exons (Gehman tion of the crosslinked protein-RNA complex, one or a few amino et al., 2011, 2012). However, the number of exons identified acids might remain attached to the RNA at the crosslink site, using this approach is quite small (20 and 29 exons, respec- which causes informative errors at the crosslink site during tively), compared to the number of Rbfox binding sites deter- reverse transcription (Granneman et al., 2009; Ule et al., mined by bioinformatic prediction or CLIP data, presumably 2005a). These crosslinking-induced mutation sites (CIMS) due to compensatory upregulation of Rbfox2 in Rbfox1 KO provide a footprint of protein-RNA crosslinking and can be lever- mice and vice versa. Given that different Rbfox family members aged to determine protein-RNA interactions at a single nucleo- have highly similar protein sequences, especially in their tide resolution (Moore et al., 2014; Zhang and Darnell, 2011). RNA-recognition motif (RRM)-type RNA binding domain However, reverse transcription can abort prematurely at these (RBD; R94% amino acid identity), they are expected to bind sites, resulting in truncated cDNAs that lack the 50 adaptor and regulate largely overlapping sets of transcripts (Gallagher required for PCR (Ko¨ nig et al., 2010; Sugimoto et al., 2012). To et al., 2011; Gehman et al., 2012). capture both truncated and nontruncated cDNAs, we developed Until now, a comprehensive and accurate target splicing- a second CLIP protocol named bromodeoxyuridine (BrdU)-CLIP regulatory network of the Rbfox proteins has not been defined, (Figure 1A, right branch). This protocol bears some conceptual due in part to the lack of a genome-wide high-resolution map similarity to individual nucleotide resolution CLIP or iCLIP (Ko¨ nig of the Rbfox interaction sites in the brain and to the lack of effec- et al., 2010). After ligation of the 30 linker, purified RNA is reverse tive computational methods to couple protein-RNA interactions transcribed to introduce 50 and 30 PCR adaptor sequences sepa- with splicing changes as a means of identifying direct, functional rated by an apurinic/apyrimidinic endonuclease (APE) cleavage targets. Here, we used HITS-CLIP to globally map the RNA inter- site. This is followed by the circularization of both readthrough action sites of all three Rbfox family members and comple- and truncated cDNAs and relinearization of cDNA via the mented the CLIP data with RNA sequencing (RNA-seq) data to cleavage site to place the 50 and 30 adaptor sequences in the identify exons responsive to perturbation of Rbfox. Importantly, correct orientation. One key difference between BrdU-CLIP we probabilistically weighed and combined these and additional and iCLIP is the incorporation of BrdUTP into the cDNA during data sets to define the functional target transcripts directly regu- reverse transcription so that the resulting cDNA can be purified lated by Rbfox using an integrative modeling approach (Zhang in a stringent manner using an antibody that specifically recog- et al., 2010). The resulting network allowed us to reveal the role nizes BrdU (Core et al., 2008; Ingolia et al., 2009). of Rbfox proteins in regulating global dynamic splicing changes To evaluate the robustness of the Rbfox interaction sites, we during brain development and highlight promising downstream prepared HITS-CLIP libraries for Rbfox1, Rbfox2, and Rbfox3 targets implicated in autism. with four, four, and five biological replicates, respectively, which together resulted in about 870 million raw reads (CLIP tags). After RESULTS stringent filtering, processing, and mapping (Moore et al., 2014; Zhang et al., 2010)(Experimental Procedures), we obtained a Rbfox1, 2, and 3 HITS-CLIP in Mouse Brain total of 4.6 million unique CLIP tags that represent independent Considering the possibility that each Rbfox family member might captures of protein-RNA interactions, including 1,460,387 tags differ in binding specificity despite their apparent functional for Rbfox1, 868,366 tags for Rbfox2, and 2,308,632 tags for

(C) Similar to (B), except that the alternatively spliced region of Gabrg2 exon 9 is shown. CLIP data of Rbfox1, 2, and 3 are pooled together and shown in a single track with different colors representing the CLIP tags obtained in independent CLIP experiments. The position of Nova binding is indicated by the arrowhead (top right). (D and E) Genomic distribution of Rbfox1, 2, and 3 CLIP tags pooled together (D), and the resulting genic CLIP tag cluster peaks (E) are shown. Due to incompleteness of 50 and 30 UTR annotations, each is extended for 10 kb in both directions; these regions are listed as separate categories. (F) Pairwise correlation of CLIP data among Rbfox proteins based on the number of CLIP tags per cluster. Each cluster is represented as a black dot positioned in three-dimensional (3D) space. Comparisons between each pair of proteins are shown in 2D planes, obtained by projecting the black dots into their respective 2D space (colored dots). Pearson correlation of each pairwise comparison is indicated. (G) Correlation of CLIP tags derived from the standard and BrdU-CLIP protocols, based on the number of CLIP tags per cluster.

Cell Reports 6, 1139–1152, March 27, 2014 ª2014 The Authors 1141 Rbfox3. Between 59% and 65% of these CLIP tags are located enrichment of the Rbfox binding motif GCAUG was observed in introns, consistent with the known role of Rbfox proteins in in the immediate vicinity of the reproducible deletion sites (Fig- regulating alternative splicing; an additional 23%–28% unique ure 2B). In contrast, when we analyzed substitutions and inser- CLIP tags are located in exons, mostly in the 30 UTRs. tions using the same method, we did not observe elevated motif enrichment near the mutation sites (data not shown), suggesting Rbfox1, 2, and 3 Have Similar Protein-RNA Interaction that crosslinking predominantly, if not exclusively, introduces Profiles deletions rather than insertions or substitutions in Rbfox CLIP. Initial inspection of the CLIP tag distribution suggests that the We then examined the enrichment of UGCAUG or VGCAUG interaction profiles of the three Rbfox family members are very (V = non-U) relative to the crosslink sites with reproducible dele- similar. For example, the Rbfox1 transcripts contain a cassette tions in more detail (Figure 2B inset). UGCAUG is enriched 28- to exon of 93 nt (Figure 1B) encoding part of the RRM of the protein. 41-fold at positions 5, 4, and 1 relative to the crosslink site, Its skipping as a result of autoregulation generates a dominant- corresponding to crosslinking at G2, U5, and G6 of the UGCAUG negative form that lacks RNA binding capability (Baraniak element. Interestingly, enrichment of VGCAUG is most predom- et al., 2006; Damianov and Black, 2010). Our CLIP data show inant at position 1 relative to the crosslink site (64-fold), corre- that all three Rbfox family members bind to the upstream intronic sponding to crosslinking of G2 (Figure 2C). We also examined sequences harboring a cluster of conserved UGCAUG elements, the base composition of the sequences around CIMS regardless suggesting that this exon is under both auto- and cross-regula- of the presence of (U)GCAUG and observed a slight bias toward tion by all family members. We also previously demonstrated uridine compared to the flanking sequences (Figure S2A; see that GABA receptor gamma 2 subunit (Gabrg2) exon 9 is under Discussion below). De novo motif analysis using sequences the synergistic regulation of Rbfox and Nova when they bind [-10,10] around CIMS uncovered (U)GCAUG as the only motif near the 50 and 30 splice sites of the downstream intron, respec- with strong enrichment (36% of 1,158 nonrepetitive CIMS in tively, based on detailed mutation analysis and splicing reporter [10,10], E < 3.7 3 10320; Figure S2B). The first position of assays in cell culture (Dredge and Darnell, 2003; Zhang et al., the motif is the most variable, which is consistent with previous 2010). Our CLIP data now confirmed that Rbfox proteins indeed findings that Rbfox binds to both UGCAUG and VGCAUG with bind to the expected site in vivo in the brain (Figure 1C). high affinity (Jin et al., 2003; Ponthier et al., 2006). Additional To quantitatively compare the RNA binding profiles of different deviations from the consensus appear to be tolerated to some Rbfox family members, we defined a nonredundant set of Rbfox- extent (e.g., in positions 3 and 4), providing a partial explanation RNA interaction sites using all unique CLIP tags pooled together for why (U)GCAUG is not present at all crosslink sites (Figures (Figure 1D; Experimental Procedures). A stringent set of 41,182 S2E–S2G). Finally, we observed deletions in 6.2% of BrdU- genic CLIP tag clusters with at least one statistically significant CLIP tags, and analysis combining standard and BrdU-CLIP peak (p < 0.01) was obtained (Table S1), 70% of which are tags defined 2,298 CIMS (FDR < 0.001; Table S2). located in introns and the other 30% are in exons (Figure 1E). The sensitivity of crosslink site identification by CIMS analysis Then, we counted the number of CLIP tags per cluster for each is limited by the relatively low deletion rate among tags that are protein. CLIP tags for different members are very well correlated read through. We therefore looked for reproducible crosslinking in each pairwise comparison, especially between Rbfox1 and induced truncation sites (CITS) in BrdU-CLIP data (Figure 2D; Rbfox3 (Pearson correlation R = 0.97); the correlation between Experimental Procedures). Overall, 6,606 robust CITS were Rbfox2 and the other two members is somewhat lower (R = identified (p < 0.001; Table S3). Among these, the UGCAUG 0.76–0.80; Figure 1F). In addition, we confirmed that the two element is enriched 319-fold at a single position (5) relative to CLIP protocols gave very reproducible results in the global CITS, corresponding to predominant crosslinking at G6 of the profiles (R = 0.97; Figure 1G). Based on these observations, motif (Figure 2E). The same position was crosslinked in the we conclude that the three Rbfox family members have similar VGCAUG element, although there is less enrichment of the motif RNA-interaction profiles on a genome-wide scale, consistent (88-fold; Figure 2F). Analysis of the base composition [10,10] with the notion that their binding specificity is largely determined around CITS revealed the UGCAUG motif directly (Figure S2C), by their very similar RRMs. Although it remains possible that a and this was confirmed by de novo motif analysis (60% of small proportion of the binding sites could be preferentially 1000 randomly sampled nonrepetitive sites; E < 1.9 3 10631; recognized by a specific member, for this work, CLIP tags of Figure S2D). As a control, we repeated the same analysis in all three members were pooled together for further analysis. the standard CLIP data, which presumably lacked truncated tags, and did not observe enrichment of the (U)GCAUG motif A Single-Nucleotide Resolution Map of Rbfox Binding in these specific positions (Figures S2H and S2I). Sites by CIMS and CITS Analysis Together, our data suggest that the two guanines G2 and G6 in Using two CLIP protocols in parallel allowed us to employ the (U)GCAUG motif are particularly prone to crosslinking with different strategies to pinpoint the exact Rbfox-RNA crosslink the Rbfox protein. Consistent with this finding, examination of and interaction sites (Figures 2 and S2). For CLIP tags obtained a previously determined NMR structure of the Rbfox1 RRM in by the standard protocol, we performed CIMS analysis using our complex with UGCAUGU RNA revealed that these two guanines established method (Figure 2A) (Moore et al., 2014; Zhang and are buried in two pockets of the RRM and are stabilized by Darnell, 2011). Nucleotide deletions were observed in 14% of multiple hydrogen bonds and stacking interactions (Figure 2G); standard CLIP tags, from which 1,424 reproducible CIMS were mutations in each of these two nucleotides resulted in the largest identified (false discovery rate [FDR] < 0.001). A substantial increases in the free energy of binding (Auweter et al., 2006).

1142 Cell Reports 6, 1139–1152, March 27, 2014 ª2014 The Authors A CIMS D CITS G cDNA cDNA UV

B E 70 60 350 400 60 300 40 G2 50 20 250 200 40 0 200 -5 0 5 30 UGCAUG 150 0 UGCAUG -5 0 5 UGCAUG UGCAUG 20 100 UV 10 50 Enrichment of UGCAUG Enrichment of UGCAUG 0 0 -50 -30 -10 10 30 50 -50 -30 -10 10 30 50 Distance to deletion site Distance to truncation site C F G6 70 350 80 100 60 60 300 80 60 50 40 250 40 20 40 200 20 0 30 -5 0 5 150 0 VGCAUG -5 0 5 VGCAUG 20 100 10 50 Enrichment of VGCAUG Enrichment of VGCAUG 0 0 -50 -30 -10 10 30 50 -50 -30 -10 10 30 50

Figure 2. CIMS and CITS Analysis to Map Rbfox-RNA Interactions at a Single-Nucleotide Resolution (A)–(C) and (D)–(F) are for crosslinking-induced mutation sites (CIMS) and crosslinking-induced truncation sites (CITS) analysis, respectively. (A and D) A schematic illustration of CIMS (A) and CITS (D) is shown. (B and E) Enrichment of UGCAUG around CIMS (deletions, B) and CITS (truncations, E) is calculated from the frequency of UGCAUG starting at each position relative to the inferred crosslink sites, normalized by the frequency of the element in flanking sequences. The inset shows a zoomed-in view, with the most frequent crosslink sites in the motif highlighted in red. (C and F) Similar to (B) and (E), except that the enrichment of VGCAUG (V = non-U) around CIMS (C) and CITS (F) is shown. (G) NMR structure of RbFox1 RRM (surface, pale blue) in complex with the UGCAUGU heptanucleotide (cartoon, rainbow; Protein Data Bank ID code 2ERR; Auweter et al., 2006). Highlighted are the two guanines G2 and G6 (pink) with predominant crosslinking.

Based on the single-nucleotide-resolution map of in vivo Rbfox protein, which is endogenously expressed at a level that is low interaction sites and on the characterized specificity of the but sufficient for splicing regulation; HeLa cells expressing the proteins, we developed the motif enrichment and conservation empty vector were used for comparison (control; Figure 3A, score (MECS) of (U)GCAUG elements by comparing CLIP tag left panel). In total, RNA-seq of the control and shRbfox2 clusters and regions without CLIP tags (Figures S2J and S2K; samples resulted in 60 million and 48 million paired-end reads, Supplemental Results). A motif site with higher conservation respectively, of which 62%–65% were mapped unambiguously receives a higher score, especially if it is located in an intronic to the genome or to the exon-junction database. Examination region. A UGCAUG element receives a higher score than a of the gene-expression level confirmed that Rbfox2 was specif- VGCAUG element with the same level of conservation, reflecting ically knocked down 3.3-fold (Figure 3A, right panel), consistent greater enrichment of the former in CLIP tag clusters. with the protein level changes observed from immunoblot anal- ysis. Accordingly, we were able to identify 126 cassette exons, Identifying Rbfox-Dependent Exons Using RNA-Seq 17 tandem cassette exon events, and four mutually exclusive We previously used HeLa cells with perturbed Rbfox1 or Rbfox2 exon events showing Rbfox2-dependent inclusion or exclu- expression to validate over half (55%–59%) of bioinformatically sion (FDR % 0.1 and proportional change of exon inclusion predicted Rbfox target alternative exons tested with RT-PCR jDIj R 0.1) (Ule et al., 2005b)(Figure 3B; Table S4). Among the (Zhang et al., 2008). We therefore used this established experi- 22 cases of alternative exons we tested by RT-PCR (which in- mental system to expand the list of Rbfox-dependent exons by cludes three cases with a read coverage slightly below the RNA-seq, which provides information complementary to the threshold we used), 21 showed Rbfox2-dependent splicing (Fig- CLIP data. As we described previously (Zhang et al., 2008), ures 3C and 3D; Tables S4 and S5), giving a validation rate of HeLa cells were treated with a short hairpin RNA (shRNA) target- 95%. The direction of Rbfox2-dependent splicing of these exons ing Rbfox2 (shRbfox2) to generate stable knockdown (KD) of the can be predicted by the position-dependent RNA map based on

Cell Reports 6, 1139–1152, March 27, 2014 ª2014 The Authors 1143 Figure 3. Identification of Rbfox2-Dependent Exons using RNA-Seq (A) Left panel: Rbfox2 protein expression in HeLa cells with stable knockdown of Rbfox2 using a specific shRNA (shRbfox2) compared with cells treated with the vector (control), as evaluated by immunoblot analysis. The nucleosome remodeling complex protein SNF is used as a loading control. Right panel: the gene expression profiles in HeLa cells treated with shRbfox2 or control vectors are quantified by reads per kb/million (RPKM) using RNA-seq data. Rbfox2 is high- lighted by the red circle. (B) Proportional inclusion (I) of cassette exons in shRbfox and control HeLa cells. Exons with reduced and increased inclusion (FDR < 0.1 and jDIj > 0.1; Ule et al., 2005b) in shRbfox2 compared with control HeLa cells are highlighted in red and blue, respectively. (C and D) UCSC Genome Browser views of two examples of Rbfox-dependent exon inclusion (PICALM, C) or exclusion (MAP3K7, D), as indicted by the arrowheads in (B). Below the RNA-seq data are (U)GCAUG elements and phyloP conservation scores. The result of RT-PCR validation is shown on the right.

either the CLIP data derived from mouse brain or the bio- frame and conservation of alternative splicing pattern (Figure 4A; informatically predicted motif sites (data not shown), indicating Experimental Procedures). that these Rbfox2-dependent exons are enriched in direct Rbfox Focusing initially on cassette exons, we found that the esti- targets in the brain. mated model parameters confirmed, quantified, and extended our understanding of Rbfox splicing regulation (Figures 4B–4E Integrative Modeling Defines the Rbfox Target Splicing- and S3A–S3E; Supplemental Results). For example, stronger Regulatory Network motif sites are more likely to be bound by the protein (Figure 4B), To comprehensively define the functional target network directly and regions inferred to be bound by Rbfox have more CLIP tags regulated by the Rbfox proteins, we took an integrative modeling than those inferred not to be bound (Figure 4C). In addition, the approach, which we have recently developed, and successfully model was able to quantify the position-dependent RNA map: applied it to study the Nova target network (Zhang et al., 2010). binding of Rbfox in the downstream intron is predicted to result This method uses a Bayesian network to probabilistically weigh in Rbfox-dependent inclusion with a probability of 0.99, whereas and combine multiple types of data complementary to each binding of Rbfox in the upstream intron or exon is predicted other: bioinformatically predicted motif sites represented by to result in repression with a probability of 0.75 and 0.61, MECS scores, protein-RNA interaction sites mapped by Rbfox respectively. Binding of Rbfox in both exon and upstream intron HITS-CLIP, Rbfox1-dependent splicing identified by com- is expected to increase the probability of repression to 0.84 parison of WT with Rbfox1 KO mouse brain using exon-junction (Figure 4D). microarrays (Gehman et al., 2011), Rbfox2-dependent splicing in The model was then applied to each annotated cassette exon HeLa cells as described above, tissue-specific splicing as in the mouse genome to predict the probability of its activation or measured by RNA-seq (only for training; Brawand et al., 2011), repression by Rbfox through direct protein-RNA interactions. and evolutionary signatures including preservation of reading After using 10-fold cross-validation to ensure the model was

1144 Cell Reports 6, 1139–1152, March 27, 2014 ª2014 The Authors Figure 4. Integrative Modeling Predicts Rbfox Target Exons using a Bayesian Network The model is trained using cassette exons. (A) Design of the Bayesian network (BN). The 17 nodes (variables) model four types of data, including (U)GCAUG elements and CLIP tag clusters in each cassette exon or flanking upstream (UI) and downstream introns (DI), splicing change of exons with Rbfox depletion or among different tissues, and evolutionary signatures. (B) The probability of Rbfox binding to regions with varying motif scores. (C) The cumulative probability of CLIP tag cluster scores across all regions with or without inferred Rbfox binding. (D) The probability of exons showing Rbfox-dependent inclusion (red), exclusion (blue), or no effect (gray), given the indicated combinatorial Rbfox binding patterns in the exon (E), upstream (U), and downstream (D) introns. (E) The distribution of proportional splicing changes (DI) in Rbfox knockdown versus the control as measured by RNA-seq for exons with inferred Rbfox- dependent inclusion, exclusion, or without Rbfox regulation. (F) Rbfox binding pattern for exons predicted to be activated or repressed by Rbfox, or exons for which the direction of Rbfox regulation cannot be determined unambiguously. In each group, exons are ranked by the confidence of prediction (left). (U)GCAUG motif scores, CLIP tag cluster scores, and inferred probability of Rbfox binding at different positions of the alternatively spliced region are shown in the grayscale heatmaps (darker colors represent stronger binding). UI5, E5, and DI5 represent regions near the 50 splice sites of the upstream intron, exon, and downstream intron, respectively; Similarly, UI3, E3, and DI3 represent regions near the 30 splice sites.

not overfit (Figure S3F), we predicted 772 cassette exons as the reading frame and conservation of the alternative splicing direct Rbfox targets (FDR < 0.05; Table S6). Among these pattern (Figure S5; Supplemental Results). targets, Rbfox was predicted to activate 421 exons (probability After we confirmed the performance of the Bayesian network, of activation >0.7) and repress 113 exons (probability of repres- we applied the model to other types of alternative splicing events sion >0.7), respectively. For the remaining 238 exons predicted and predicted 212 events of tandem cassette exons (300 exons, as Rbfox targets, the Bayesian network was unable to assign Table S6) and 75 events of mutually exclusive exons (107 exons, the direction of regulation unambiguously (Figure 4F, left panel). Table S6) as direct Rbfox targets. Altogether, 587 genes have one This uncertainty is presumably due to a lack of observed Rbfox- or more alternative splicing events directly regulated by Rbfox. dependent splicing in the current experimental settings and to To understand the molecular function of these genes, we binding of Rbfox in both upstream and downstream introns performed (GO) analysis and found very significant simultaneously (Figure 4F, right panel). Based on comparison enrichment of genes with annotated function in ‘‘cytoskeleton’’ of the predicted exons with previously validated Rbfox-regulated (Benjamini FDR < 3.8 3 1017) and ‘‘neuron projection’’ (Benja- exons compiled from the literature, we estimated that our mini FDR < 3.8 3 108), compared to all brain-expressing genes Bayesian network analysis has a sensitivity of 73%–79% (Table S8). In addition, proteins encoded by Rbfox target (Supplemental Results; Figure S4; Table S7). We also compared transcripts are enriched in PDZ domains that are known to be the results of the Bayesian network analysis to our previous important for anchoring transmembrane proteins to the cytoskel- motif-based bioinformatic predictions and to another recent eton and for functioning as scaffolds for signaling complexes study that predicted Rbfox target exons based on the presence (Benjamini FDR < 7.2 3 108)(Ranganathan and Ross, 1997). of the Rbfox motif sites and the correlation of exon splicing with Rbfox expression (Ray et al., 2013). These comparisons showed Rbfox Regulates Global Dynamic Splicing Changes substantial overlap between exons predicted by different during Brain Development methods but also highlighted that the Bayesian network analysis It has previously been shown that all three Rbfox family members effectively integrated features known to be consistent with undergo increased expression in the mouse brain at pre- regulated alternative splicing events, such as preservation of natal stages between embryonic day E12 and E18 (Tang et al.,

Cell Reports 6, 1139–1152, March 27, 2014 ª2014 The Authors 1145 Figure 5. Rbfox Regulates Global A B No dev change C 8 Dynamic Splicing Changes during Brain embryonic adult Dev change Development 7 -16 P<7.6 10 P<0.003 (A) Rbfox1, 2, and 3 expression in embryonic 6 (green) or adult cortex (blue) as quantified by 1 RNA-seq data. Error bars represent SEM. 5 30 (B) The proportion of Rbfox target exons and 0.8 206 4 1671 20 nontarget exons with developmental splicing 0.6 changes. The difference is evaluated by Fisher’s 3 log2(RPKM) exact test. 0.4 100 2 (C) Rbfox target exons are divided into two 227 0.2 15 groups depending on whether Rbfox activates or 1 783

Exons in group (%) represses exon inclusion. For each group, the 0 number of exons showing higher (blue) or lower 0 Non- Target (green) inclusion in the adult versus embryonic Rbfox1 Rbfox2 Rbfox3 target cortex is shown. The direction of developmental splicing change is compared with the direction of Rbfox regulation, as assessed by Fisher’s exact test.

2009), and that the increase of Rbfox1 expression further target exons (Voineagu et al., 2011). The comprehensive Rbfox extends into postnatal stages (Hammock and Levitt, 2011). In target network defined by integrative modeling now allows us mouse and chicken, the differential expression of Rbfox proteins to examine the link between RBFOX1 and autism in more detail. is correlated with splicing changes in several exons during CNS Among the 235 Rbfox target cassette exons that are development (Kim et al., 2013; Tang et al., 2009), but how Rbfox conserved in human and that have sufficient RNA-seq read proteins affect the global switch of the developmental splicing coverage to evaluate splicing change in autistic versus control program is unclear. brains (Voineagu et al., 2011), 97 (41%) show alteration of We found that Rbfox family members undergo dynamic splicing in autistic brains (jDIj R 0.1, and FDR % 0.05), a very sig- changes in expression between E17 and adult mouse cortex, nificant overlap compared to random chance (odds ratio = 3.4, as evaluated from a published RNA-seq data set (Dillman p < 1.4 3 1016; Fisher’s exact test; Figure 6A). The autistic brains et al., 2013). Rbfox1 and Rbfox3 show 1.6-fold (p < 0.02; t test) compared here were selected to have low expression level of and 3.1-fold (p < 106; t test) increases in the adult cortex Rbfox1 (Voineagu et al., 2011) (4.1-fold downregulation com- compared to E17 cortex, respectively, whereas the expression pared to control brains as measured by RNA-seq; p < 0.05, of Rbfox2 is reduced 3.1-fold (p < 104; t test; Figure 5A). The t test; Figure 6B). However, the massive splicing change of Rbfox expression changes of the Rbfox proteins parallel the splicing targets detected in autistic versus control brains is somewhat changes of Rbfox target exons: 55% of Rbfox target exons surprising given the redundant role of the other Rbfox family with sufficient read coverage to quantify splicing show splicing members, as observed in Rbfox1 KO mice (Gehman et al., changes between the two developmental stages, as compared 2011, 2012). Interestingly, we found that the expression of the to 32% for exons not regulated by Rbfox (odds ratio = 2.4, p < other two family members, Rbfox2 and Rbfox3, was also down- 7.6 3 1016; Fisher’s exact test; Figure 5B). In addition, a major- regulated 3.3-fold (p < 0.05; t test) and 3.2-fold (p < 0.09; t test), ity (77%) of the exons activated by Rbfox have increased exon respectively. Therefore, simultaneous downregulation of all inclusion in the adult, whereas over half (57%) of the exons Rbfox family members might explain the massive splicing misre- repressed by Rbfox have decreased exon inclusion (odds ratio = gulation of Rbfox target exons observed in these autism patients. 4.4, p < 0.003; Fisher’s exact test; Figure 5C). This asymmetry On the other hand, many of the splicing changes observed in indicates that increased expression of Rbfox1 and Rbfox3 pre- autism patients may not be regulated by Rbfox proteins directly. dominates the developmental splicing change of their targets, To focus on Rbfox target genes that are likely genetic risk and in general they promote the switch to the adult splicing pro- factors of autism, we examined candidate autism-susceptibility gram through direct regulation. genes in the SFARI autism gene database (Basu et al., 2009). Among the 519 candidate autism-susceptibility genes with Rbfox Target Genes Are Linked to Autism mouse orthologs, 48 were identified as Rbfox targets by We previously demonstrated that Rbfox target transcripts pre- Bayesian network analysis (odds ratio = 2.8, p < 9.6 3 1018, dicted bioinformatically based on conserved motif sites have Fisher’s exact test; Table 1 and Table S6). The list includes three significant overlap with genes implicated in autism, supporting genes that are currently regarded as causal in syndromic the notion that disruption of either Rbfox1 itself or of its targets autism spectrum disorders (ASDs): Shank3 (Phelan-McDermid observed in autism patients is likely pathogenic (Zhang et al., Syndrome), Cacna1c (Timothy syndrome), and Tsc2 (tuberous 2010). This hypothesis was further supported by recent findings sclerosis complex). For a specific example, Rbfox is predicted that RBFOX1 is a hub in gene coexpression networks based to activate the inclusion of alternative exon 25 in the Tsc2 on microarray profiling of autistic and control postmortem gene, which is conserved between human and mouse (Fig- human brains, and its reduced expression in a subset of autism ure 6C). Although the function of this alternative exon has not patients is correlated with altered splicing of predicted Rbfox been characterized, its inclusion was recently shown to be

1146 Cell Reports 6, 1139–1152, March 27, 2014 ª2014 The Authors AB D

C

Figure 6. Rbfox Target Exons in Candidate Autism-Susceptibility Genes (A) Overlap between Rbfox target cassette exons and exons with altered splicing in autistic versus control brains. (B) Downregulation of Rbfox1, 2, and 3 expression in autistic versus control brains as quantified by RNA-seq. Error bars represent SEM. (C) Rbfox is predicted to activate the inclusion of a 129 nt exon in the Tsc2 gene. Below the gene structure schematic are RNA-seq data of different tissues showing a higher inclusion of the exon in cortex and heart, pooled Rbfox CLIP tags, Rbfox binding UGCAUG or GCAUG elements, and the phyloP conservation score. (D) Rbfox is predicted to activate the inclusion of an 84 nt poisonous exon in the Scn2a1 gene, which creates an in-frame premature termination codon (PTC) conserved in vertebrates.

dependent on Rbfox2 using cell culture models of epithelial-to- RT-PCR. Among eight exons for which both the inclusion and mesenchymal transition (Braeutigam et al., 2013). exclusion isoforms can be detected, three exons showed altered Inclusion or exclusion of Rbfox target exons mostly introduces splicing upon Rbfox1 depletion (jDIj R 0.1; Figure S6; Table S5), alteration in local amino acid sequences. However, we identified and these were not identified as Rbfox1 targets in the previous several cases (Fat1, St7, Scn2a1, and Scn8a) in which alternative genome-wide analysis (Gehman et al., 2011). The relatively small splicing is potentially coupled with nonsense-mediated mRNA number of validated exons is likely due to the redundancy of decay (NMD; Maquat, 2004). In Scn2a1, for instance, Rbfox is Rbfox proteins and the compensatory increase of Rbfox2 protein predicted to activate a cryptic exon harboring an in-frame expression in Rbfox1 KO animals (Gehman et al., 2011), although premature stop codon via extensive binding to the downstream it is possible that some of the predicted targets might represent intron (Figure 6D). Inclusion of this exon is undetectable in the false-positives. brain from RNA-seq data, presumably due to NMD of the inclusion isoform. Interestingly, whereas this alternative exon is DISCUSSION conserved in vertebrates, compensatory mutations in the stop codon have accumulated. This has resulted in different stop The focus of this study is to define and characterize the Rbfox codons in different species, suggesting an evolutionary selection target splicing-regulatory network in the mammalian brain. An pressure to preserve a stop codon. Scn2a1 is one of the few important piece of information missing in previous efforts toward genes with recurrent de novo gene-disrupting mutations this aim (e.g., Fogel et al., 2012; Gehman et al., 2011, 2012; Ray according to exome sequencing of ASD patients (Ronemus et al., 2013; Zhang et al., 2008) is a genome-wide, high-resolu- et al., 2014), and our analysis suggests that the same gene can tion map of in vivo Rbfox interaction sites in the brain. Such a potentially be disrupted in autism by different mechanisms. map is especially essential due to the functional redundancy of We tested a select set of exons in potentially autism-related different Rbfox family members, so that simultaneous depletion genes for regulation by Rbfox1 in the mouse brain. The splicing of more than one member is probably required to uncover a level of these exons was measured in CNS-specific Rbfox1-KO majority of Rbfox-dependent exons in a physiologically relevant mouse brains compared to WT controls by semiquantitative condition. A critical aspect of this work is our ability to identify

Cell Reports 6, 1139–1152, March 27, 2014 ª2014 The Authors 1147 Table 1. List of Rbfox Target Genes Implicated in Autism Gene ID Autism versus Control Splicing (Mouse) Gene Symbol Tissue Expression Change Function 11538 Adnp non-brain-specific transcription factor 11784 Apba2 brain-specific Y neurotransmitter release 11789 Apc non-brain-specific Wnt signaling 11941 Atp2b2 brain-specific calmodulin binding 319974 Auts2 non-brain-specific unknown 30948 Bin1 non-brain-specific Y synaptic vesicle endocytosis 12288 Cacna1c non-brain-specific ion channel 12289 Cacna1d non-brain-specific ion channel 12291 Cacna1g non-brain-specific ion channel 54725 Cadm1 non-brain-specific Y cell adhesion 320405 Cadps2 non-brain-specific Y synaptic vesicle exocytosis 100072 Camta1 non-brain-specific calmodulin binding 67300 Cltc non-brain-specific receptor localization 104318 Csnk1d non-brain-specific cell signaling 74006 Dnm1l non-brain-specific mitochondrial and peroxisomal division 75560 Ep400 non-brain-specific chromatin binding 13876 Erg non-brain-specific transcription factor 14107 Fat1 non-brain-specific cell adhesion 268566 Gphn non-brain-specific receptor localization 74053 Grip1 non-brain-specific glutamate receptor binding 108071 Grm5 brain-specific G-protein coupled receptor 227753 Gsn non-brain-specific Y cytoskeleton 14886 Gtf2i non-brain-specific Y transcription factor 16531 Kcnma1 brain-specific Y ion channel 242274 Lrrc7 brain-specific synapse assembly 18027 Nfia non-brain-specific transcription factor 319504 Nrcam brain-specific Y cell adhesion 18189 Nrxn1 brain-specific cell adhesion 18191 Nrxn3 brain-specific cell adhesion 80883 Ntng1 brain-specific cell adhesion 233977 Ppfia1 non-brain-specific Y cell signaling 353211 Prune2 non-brain-specific Y unknown 268859 Rbfox1 non-brain-specific RNA metabolism 268902 Robo2 non-brain-specific axon guidance 110876 Scn2a1 brain-specific ion channel 20273 Scn8a brain-specific ion channel 58234 Shank3 non-brain-specific synapse assembly 76376 Slc24a2 brain-specific ion transport 72055 Slc38a10 non-brain-specific ion/amino acid transport 94229 Slc4a10 brain-specific ion transport 64213 St7 non-brain-specific cell signaling 53416 Stk39 non-brain-specific serine/threonine kinase 20910 Stxbp1 non-brain-specific Y neurotransmitter release 64009 Syne1 non-brain-specific Y cytoskeleton 22084 Tsc2 non-brain-specific chaperone 22138 Ttn low-brain-expression sarcomere structure (Continued on next page)

1148 Cell Reports 6, 1139–1152, March 27, 2014 ª2014 The Authors Table 1. Continued Entrez Gene ID Autism versus Control Splicing (Mouse) Gene Symbol Tissue Expression Change Function 22214 Ube2h non-brain-specific protein ubiquitination 68134 Upf3b non-brain-specific mRNA metabolism Tissue specificity of gene expression is based on the RNA-seq data set in Brawand et al. (2011), and splicing change in autistic versus control brains is based on the RNA-seq data in Voineagu et al. (2011). Gene function was compiled from the literature.

over 40,000 robust Rbfox binding sites using two HITS-CLIP Therefore, this parameter could vary substantially for different protocols applied in parallel to each of the three Rbfox family proteins, depending on the specific amino acid(s) and nucleo- members. These include 8,811 sites (2,298 CIMS and 6,606 tide(s) at the crosslink sites, and possibly also on different exper- CITS with 93 common sites) for which the exact site of protein- imental conditions. RNA crosslinking can be deduced, allowing for determination The second critical aspect of this study is the use of an integra- of protein-RNA interactions at a single nucleotide resolution. tive modeling approach to combine multiple complementary The overall distribution of Rbfox binding sites defined by types of data, so that individually weak bits of information can HITS-CLIP is generally consistent with a recently published inde- be integrated to make strong predictions of Rbfox targets (Zhang pendent study (Lovci et al., 2013). et al., 2010). Because increasing amounts of high-throughput Detailed analysis of Rbfox-RNA crosslink sites provided data are being produced for different RBPs, interrogating RNA insights into the biophysical principles of protein-RNA cross- regulation from different perspectives, such a method has the linking. UV crosslinking of protein and RNA was thought to be unique advantage of being able to identify direct, functional most efficient for the nucleotide uridine. This presumptive prefer- target transcripts with high specificity and sensitivity simulta- ence was used to interpret the U stretch enriched in sequences neously. We were able to assign the direction of Rbfox regulation around CIMS and in iCLIP data for several RBPs including Nova, for a majority of Rbfox targets (69% of cassette exons), which Ago, hnRNP C, and TIA (Sugimoto et al., 2012), despite the fact allowed us to evaluate the impact of splicing regulation by Rbfox that these proteins have a genuine binding preference for uridine. proteins in different physiological contexts, including brain Therefore, the predominant crosslinking of Rbfox and substrate development and autism. RNA at the two guanines in the (U)GCAUG motif, as suggested Our analysis extends previous observations regarding the by CIMS and CITS analysis, is somewhat unexpected, especially differential expression of Rbfox family members during brain given the presence of two uridines in the motif. We argue that development (Hammock and Levitt, 2011; Kim et al., 2013; these crosslink sites likely reflect the residues in closest contact Tang et al., 2009) by showing increased expression of Rbfox1 with the protein or those interacting with specific amino acids, and Rbfox3 and decreased expression of Rbfox2 in adult which agrees very well with a structure study of the protein- compared to a late prenatal stage. Similar dynamic changes in RNA complex (Auweter et al., 2006). Additional evidence sup- Rbfox1 and Rbfox2 expression were previously observed in porting this argument comes from predominant crosslinking to the heart during postnatal development (Kalsotra et al., 2008). different nucleotides for several other RBPs with distinct binding Importantly, differential expression of Rbfox is paralleled by specificity (Moore et al., 2014). On the other hand, we also developmental splicing changes in over half of the quantifiable observed that a substantial proportion of Rbfox-RNA crosslink Rbfox target exons, frequently in the direction consistent with sites do not overlap with the canonical Rbfox (U)GCAUG motif. direct Rbfox regulation. Combined with neurodevelopmental Paradoxically, when these sites were included for analysis, the defects observed in cell cultures (Kim et al., 2013), animal base composition at CIMS identified in Rbfox CLIP data is models (Gehman et al., 2011, 2012; Kim et al., 2013), and human slightly biased toward uridine. One possible interpretation for patients (Bhalla et al., 2004) where Rbfox proteins are disrupted, this discrepancy is that some of these sites without the Rbfox this presents a compelling case for Rbfox proteins playing motif might have resulted from crosslinking of more transient critical roles in driving the global dynamic change of the develop- protein-RNA interactions largely independent of the Rbfox spec- mental splicing-regulatory program. ificity (e.g., recruitment by other interacting RBPs), which would Given the strong implication of Rbfox1 in autism and the role of suggest that preferential crosslinking to uridine indeed exists to Rbfox proteins in neural development, we are particularly inter- some extent. However, for high-affinity protein-RNA interac- ested in the molecular mechanisms underlying ASD cases with tions, such ‘‘background’’ preference does not prevent UV mutations in RBFOX1. One hypothesis of the autism etio- crosslinking at specific nucleotides in the core motif. pathology is that many genes implicated in autism can be dis- It has been reported that the rate of cDNA truncation at the rupted individually in cis by mutations in the genes themselves, crosslink site during reverse transcription is as high as 82% for or in trans by disruption of their upstream regulators such as Nova and over 95% for several other RBPs (Sugimoto et al., Rbfox1. This hypothesis was supported both by the significant 2012). Based on the relative frequency of deletions in Rbfox overlap between candidate autism-susceptibility genes and CLIP tags obtained with standard and BrdU-CLIP protocols Rbfox target genes and by the significant overlap of predicted (14% versus 6.2%), we estimated that 57% of Rbfox CLIP tags Rbfox target exons and exons altered in autistic brains with are truncated at the crosslink sites (Experimental Procedures). downregulation of Rbfox1 (Voineagu et al., 2011). Importantly,

Cell Reports 6, 1139–1152, March 27, 2014 ª2014 The Authors 1149 we found that all Rbfox family members are downregulated in libraries were prepared as described previously (Darnell, 2010; Licatalosi some autistic brains, underscoring the potential clinical rele- et al., 2008; Moore et al., 2014). Details of the BrdU-CLIP protocol are vance of the Rbfox target network in autism. In addition, we described in the Supplemental Experimental Procedures. Bioinformatic analysis of CLIP data was performed essentially as described were able to highlight 48 genes in the SFARI autism gene data- previously to obtain unique CLIP tags and define CLIP tag clusters and peaks base as direct Rbfox targets. The functions of these genes, (Charizanis et al., 2012; Moore et al., 2014; Zhang and Darnell, 2011). Muta- which include cytoskeleton and scaffolding, synaptic transmis- tions in unique CLIP tags were recorded for CIMS analysis as described sion, ion channels, and transcription regulation, are potentially previously (Moore et al., 2014; Zhang and Darnell, 2011). CITS analysis was relevant to the neurobiology underlying autism. performed using a computational method similar to that used to analyze iCLIP We note that this study is aimed at defining the pan-Rbfox data with several modifications (Ko¨ nig et al., 2010; Wang et al., 2010). We removed CLIP tags with deletions, which presumably had read through the alternative splicing target network as a means of elucidating their crosslink sites and clustered the remaining tags based on their potential molecular functions. We are currently limited in our ability to truncation sites, and then used a stringent statistical test to evaluate the signif- comprehensively identify target transcripts differentially regu- icance of the observed truncation frequency. lated by the three family members individually. An outstanding question is the extent to which the Rbfox family members have RNA-Seq in HeLa Cells distinct physiological functions and the underlying molecular HeLa cells were infected with murine-stem-cell-virus-expressing shRbfox2 or mechanisms for these potential differences. Our CLIP data sug- the empty retroviral vector (control) (Zhang et al., 2008). To reduce splicing pre- cursors and intermediates, cells were fractionated to enrich cytoplasmic RNA gest that all three members have very similar protein-RNA inter- before sequencing. The downstream analysis was performed as described action profiles, indicating that the highly conserved RRM is the previously (Charizanis et al., 2012) except that the proportional change of major determinant of targeting specificity. However, the func- exon inclusion was estimated using the ASPIRE algorithm (Ule et al., 2005b). tional divergence of the different members could arise from the increased variation in the N-terminal and C-terminal regions, RT-PCR Validation which were shown to also be important in splicing regulation Semiquantitative RT-PCR in HeLa cells was performed as described previ- (Jin et al., 2003; Nakahata and Kawamoto, 2005; Sun et al., ously (Zhang et al., 2008). For RT-PCR validation, total RNA was extracted 2012). For example, different members might recruit different from brains of WT and homozygous Rbfox1 KO mice. Candidate exons for vali- dation were selected to cover the whole range of prediction confidence, avoid- cofactors and exert different regulatory effects depending on ing exons with complex splicing patterns or very high or low inclusion levels. cellular context. However, addressing this question will require expression of different combinations of Rbfox proteins in physi- Integrative Modeling Using Bayesian Network Analysis ologically relevant systems, a currently nontrivial experiment. We used our established Bayesian network framework with some modifica- Such data, when available, will nevertheless further improve tions (Zhang et al., 2010). We considered differential splicing between WT the accuracy of our model by correlating them with protein- and Rbfox1 KO mouse brain, differential splicing between shRbfox2 and con- RNA interactions and other types of data specific to individual trol HeLa cells, tissue-specific splicing of exons in cortex and cerebellum compared to other tissues (liver, kidney, and testis), and tissue-specific Rbfox family members. splicing of exons in heart compared to other tissues (liver, kidney, and testis). Finally, another important question beyond the scope of this For motif sites, we considered both UGCAUG and VGCAUG elements, and work concerns the roles of Rbfox in the regulation of other path- measured their strength by their MECS score. ways in RNA metabolism. Several recent findings propose that To train the model, we used 121 cassette exons compiled from the literature Rbfox can also regulate alternative polyadenylation (Wang that have been previously validated as Rbfox targets in humans or mice and et al., 2008) and mRNA stability (Ray et al., 2013), based on assigned them a class label (activation or repression). We also included unla- beled exons that showed relatively large or no changes in inclusion in exon- analysis of Rbfox motif sites in 30 UTRs that are correlated with junction microarray or RNA-seq data. This resulted in 551 nonredundant transcript abundance in different tissues. Although the current exons, representing both targets and nontargets, to estimate model parame- data supporting the role of Rbfox in these processes are largely ters. The trained Bayesian network model was applied to 16,034 cassette correlative, they are corroborated by our observation that almost exons, as well as 4,074 events of tandem cassette exons and 960 events of 30% of Rbfox binding sites are in 30 UTRs. A mechanistic under- mutually exclusive exons, to predict direct Rbfox targets. standing of the functional impact of Rbfox interacting with 30 Details of all experimental protocols and computational analyses are described in the Supplemental Experimental Procedures. UTRs awaits further investigation. We expect that the data pre- sented in this work will provide a valuable resource to facilitate ACCESSION NUMBERS these efforts. Rbfox HITS-CLIP and RNA-seq data were deposited to the NCBI Short Reads EXPERIMENTAL PROCEDURES Archive under accession number SRP035321.

All animal-related procedures were conducted according to the Institutional SUPPLEMENTAL INFORMATION Animal Care and Use Committee guidelines at Columbia University Medical Center and the Rockefeller University. Supplemental Information includes Supplemental Results, Supplemental Experimental Procedures, six figures, and eight tables and can be found HITS-CLIP Experiments Using Mouse Brain with this article online at http://dx.doi.org/10.1016/j.celrep.2014.02.005. Antibodies used in HITS-CLIP experiments for individual Rbfox members were tested to ensure that there is no or minimal cross-reactivity between family AUTHOR CONTRIBUTIONS members and to optimize the signal-to-noise ratios of IP in the CLIP condition. Rbfox1, 2, and 3 CLIP experiments were performed with whole-brain tissue C.Z. and R.B.D. initiated this study. A.M. developed the BrdU-CLIP protocol. lysate of P15 CD1 mice using two different protocols. The standard CLIP C.Z. optimized and performed Rbfox HITS-CLIP experiments in the R.B.D.

1150 Cell Reports 6, 1139–1152, March 27, 2014 ª2014 The Authors lab. S.S. and Z.Z. generated HeLa cell RNA-seq data. S.M.W.-V. and C.Z. Dillman, A.A., Hauser, D.N., Gibbs, J.R., Nalls, M.A., McCoy, M.K., Rudenko, performed bioinformatics analysis with assistance from C.X. Q.Y. and M.H. I.N., Galter, D., and Cookson, M.R. (2013). mRNA expression, splicing and performed the mouse work and collected WT and Rbfox1 KO brain samples. editing in the embryonic and adult mouse cerebral cortex. Nat. Neurosci. 16, S.S. and Q.Y. performed experimental validation of Rbfox targets in HeLa cells 499–506. and mouse brains, respectively. N.F. and P.A.S. produced the Rbfox3 Dredge, B.K., and Darnell, R.B. (2003). Nova regulates GABA(A) receptor g2 polyclonal antibody and performed the initial analysis. M.Q.Z., A.R.K., alternative splicing via a distal downstream UCAU-rich intronic splicing R.B.D., and C.Z. supervised the study. S.M.W.-V., Q.Y., and C.Z. wrote the enhancer. Mol. Cell. Biol. 23, 4687–4700. paper with input from all authors. Fogel, B.L., Wexler, E., Wahnich, A., Friedrich, T., Vijayendran, C., Gao, F., Parikshak, N., Konopka, G., and Geschwind, D.H. (2012). RBFOX1 regulates ACKNOWLEDGMENTS both splicing and transcriptional networks in human neuronal development. Hum. Mol. Genet. 21, 4171–4186. The authors would like to thank Masato Yano for generating pRbfox3 plasmid, Gallagher, T.L., Arribere, J., Adkar, S., Marr, H., Dill, K., Garnett, A., Amacher, Joe Luna for methodological development assistance, Yuan Yuan and Michael S., and Conboy, J. (2011). Fox1 and Fox4 regulate muscle-specific splicing in Moore for helpful discussion, Scott Dewell for high-throughput sequencing, zebrafish and are required for cardiac and skeletal muscle functions. Dev. Biol. Melis Kayikci and Jernej Ule for ASPIRE analysis, Jinbiao Ma and Judith 344, 491–492. Kribelbauer for assistance with the interpretation of the Rbfox1-RNA complex structure data, and Judith Kribelbauer for assistance with the preparation of Gehman, L.T., Stoilov, P., Maguire, J., Damianov, A., Lin, C.-H., Shiue, L., Ares, Figure 2G. This work was supported by grants from the National Institutes of M., Jr., Mody, I., and Black, D.L. (2011). The splicing regulator Rbfox1 (A2BP1) Health (GM74688 and HG001696 to M.Q.Z., GM42699 to A.R.K., NS81706 controls neuronal excitation in the mammalian brain. Nat. Genet. 43, 706–711. and NS34389 to R.B.D., and R00GM95713 to C.Z.) and Simons Foundation Gehman, L.T., Meera, P., Stoilov, P., Shiue, L., O’Brien, J.E., Meisler, M.H., Autism Research Initiative (240432 to R.B.D. and 297990 to C.Z.). R.B.D. is Ares, M., Jr., Otis, T.S., and Black, D.L. (2012). The splicing regulator Rbfox2 a Howard Hughes Medical Institute Investigator. is required for both cerebellar development and mature motor function. Genes Dev. 26, 445–460. Received: September 22, 2013 Granneman, S., Kudla, G., Petfalski, E., and Tollervey, D. (2009). Identification Revised: November 30, 2013 of protein binding sites on U3 snoRNA and pre-rRNA by UV cross-linking Accepted: February 4, 2014 and high-throughput analysis of cDNAs. Proc. Natl. Acad. Sci. USA 106, Published: March 6, 2014 9613–9618. Hammock, E.A.D., and Levitt, P. (2011). Developmental expression mapping REFERENCES of a gene implicated in multiple neurodevelopmental disorders, A2bp1 (Fox1). Dev. Neurosci. 33, 64–74. Auweter, S.D., Fasan, R., Reymond, L., Underwood, J.G., Black, D.L., Pitsch, Ingolia, N.T., Ghaemmaghami, S., Newman, J.R.S., and Weissman, J.S. S., and Allain, F.H.-T. (2006). Molecular basis of RNA recognition by the human (2009). Genome-wide analysis in vivo of translation with nucleotide resolution alternative splicing factor Fox-1. EMBO J. 25, 163–173. using ribosome profiling. Science 324, 218–223. Baraniak, A.P., Chen, J.R., and Garcia-Blanco, M.A. (2006). Fox-2 mediates Jin, Y., Suzuki, H., Maegawa, S., Endo, H., Sugano, S., Hashimoto, K., epithelial cell-specific fibroblast growth factor receptor 2 exon choice. Mol. Yasuda, K., and Inoue, K. (2003). A vertebrate RNA-binding protein Fox-1 Cell. Biol. 26, 1209–1222. regulates tissue-specific splicing via the pentanucleotide GCAUG. EMBO J. Barash, Y., Calarco, J.A., Gao, W., Pan, Q., Wang, X., Shai, O., Blencowe, B.J., 22, 905–912. and Frey, B.J. (2010). Deciphering the splicing code. Nature 465, 53–59. Kalsotra, A., Xiao, X., Ward, A.J., Castle, J.C., Johnson, J.M., Burge, C.B., and Basu, S.N., Kollu, R., and Banerjee-Basu, S. (2009). AutDB: a gene reference Cooper, T.A. (2008). A postnatal switch of CELF and MBNL proteins repro- resource for autism research. Nucleic Acids Res. 37 (Database issue), D832– grams alternative splicing in the developing heart. Proc. Natl. Acad. Sci. D836. USA 105, 20333–20338. Bhalla, K., Phillips, H.A., Crawford, J., McKenzie, O.L.D., Mulley, J.C., Eyre, H., Kim, K.K., Nam, J., Mukouyama, Y.S., and Kawamoto, S. (2013). Rbfox3- Gardner, A.E., Kremmidiotis, G., and Callen, D.F. (2004). The de novo chromo- regulated alternative splicing of Numb promotes neuronal differentiation some 16 translocations of two patients with abnormal phenotypes (mental during development. J. Cell Biol. 200, 443–458. retardation and epilepsy) disrupt the A2BP1 gene. J. Hum. Genet. 49, Ko¨ nig, J., Zarnack, K., Rot, G., Curk, T., Kayikci, M., Zupan, B., Turner, D.J., 308–311. Luscombe, N.M., and Ule, J. (2010). iCLIP reveals the function of hnRNP Braeutigam, C., Rago, L., Rolke, A., Waldmeier, L., Christofori, G., and Winter, particles in splicing at individual nucleotide resolution. Nat. Struct. Mol. Biol. J. (2013). The RNA-binding protein Rbfox2: an essential regulator of EMT- 17, 909–915. driven alternative splicing and a mediator of cellular invasion. Oncogene Licatalosi, D.D., Mele, A., Fak, J.J., Ule, J., Kayikci, M., Chi, S.W., Clark, T.A., http://dx.doi.org/10.1038/onc.2013.50. Schweitzer, A.C., Blume, J.E., Wang, X., et al. (2008). HITS-CLIP yields Brawand, D., Soumillon, M., Necsulea, A., Julien, P., Csa´ rdi, G., Harrigan, P., genome-wide insights into brain alternative RNA processing. Nature 456, Weier, M., Liechti, A., Aximu-Petri, A., Kircher, M., et al. (2011). The evolution of 464–469. gene expression levels in mammalian organs. Nature 478, 343–348. Lovci, M.T., Ghanem, D., Marr, H., Arnold, J., Gee, S., Parra, M., Liang, T.Y., Charizanis, K., Lee, K.-Y., Batra, R., Goodwin, M., Zhang, C., Yuan, Y., Shiue, Stark, T.J., Gehman, L.T., Hoon, S., et al. (2013). Rbfox proteins regulate L., Cline, M., Scotti, M.M., Xia, G., et al. (2012). Muscleblind-like 2-mediated alternative mRNA splicing through evolutionarily conserved RNA bridges. alternative splicing in the developing brain and dysregulation in myotonic Nat. Struct. Mol. Biol. 20, 1434–1442. dystrophy. Neuron 75, 437–450. Maquat, L.E. (2004). Nonsense-mediated mRNA decay: splicing, translation Core, L.J., Waterfall, J.J., and Lis, J.T. (2008). Nascent RNA sequencing and mRNP dynamics. Nat. Rev. Mol. Cell Biol. 5, 89–99. reveals widespread pausing and divergent initiation at human promoters. Martin, C.L., Duvall, J.A., Ilkin, Y., Simon, J.S., Arreaza, M.G., Wilkes, K., Science 322, 1845–1848. Alvarez-Retuerto, A., Whichello, A., Powell, C.M., Rao, K., et al. (2007). Cyto- Damianov, A., and Black, D.L. (2010). Autoregulation of Fox protein expression genetic and molecular characterization of A2BP1/FOX1 as a candidate gene to produce dominant negative splicing factors. RNA 16, 405–416. for autism. Am. J. Med. Genet. B. Neuropsychiatr. Genet. 144B, 869–876. Darnell, R.B. (2010). HITS-CLIP: panoramic views of protein-RNA regulation in Minovitsky, S., Gee, S.L., Schokrpur, S., Dubchak, I., and Conboy, J.G. (2005). living cells. Wiley Interdiscip Rev. RNA 1, 266–286. The splicing regulatory element, UGCAUG, is phylogenetically and spatially

Cell Reports 6, 1139–1152, March 27, 2014 ª2014 The Authors 1151 conserved in introns that flank tissue-specific alternative exons. Nucleic Acids Ule, J., Stefani, G., Mele, A., Ruggiu, M., Wang, X., Taneri, B., Gaasterland, T., Res. 33, 714–724. Blencowe, B.J., and Darnell, R.B. (2006). An RNA map predicting Nova-depen- Moore, M.J., Zhang, C., Gantman, E.C., Mele, A., Darnell, J.C., and Darnell, dent splicing regulation. Nature 444, 580–586. R.B. (2014). Mapping Argonaute and conventional RNA-binding protein inter- Underwood, J.G., Boutz, P.L., Dougherty, J.D., Stoilov, P., and Black, D.L. actions with RNA at single-nucleotide resolution using HITS-CLIP and CIMS (2005). Homologues of the Caenorhabditis elegans Fox-1 protein are neuronal analysis. Nat. Protoc. 9, 263–293. splicing regulators in mammals. Mol. Cell. Biol. 25, 10005–10016. Nakahata, S., and Kawamoto, S. (2005). Tissue-dependent isoforms of mammalian Fox-1 homologs are associated with tissue-specific splicing Voineagu, I., Wang, X., Johnston, P., Lowe, J.K., Tian, Y., Horvath, S., Mill, J., activities. Nucleic Acids Res. 33, 2078–2089. Cantor, R.M., Blencowe, B.J., and Geschwind, D.H. (2011). Transcriptomic analysis of autistic brain reveals convergent molecular pathology. Nature Ponthier, J.L., Schluepen, C., Chen, W., Lersch, R.A., Gee, S.L., Hou, V.C., Lo, 474, 380–384. A.J., Short, S.A., Chasis, J.A., Winkelmann, J.C., and Conboy, J.G. (2006). Fox-2 splicing factor binds to a conserved intron motif to promote inclusion Wang, E.T., Sandberg, R., Luo, S., Khrebtukova, I., Zhang, L., Mayr, C., King- of protein 4.1R alternative exon 16. J. Biol. Chem. 281, 12468–12474. smore, S.F., Schroth, G.P., and Burge, C.B. (2008). Alternative isoform regula- Ranganathan, R., and Ross, E.M. (1997). PDZ domain proteins: scaffolds for tion in human tissue transcriptomes. Nature 456, 470–476. signaling complexes. Curr. Biol. 7, R770–R773. Wang, Z., Kayikci, M., Briese, M., Zarnack, K., Luscombe, N.M., Rot, G., Ray, D., Kazan, H., Cook, K.B., Weirauch, M.T., Najafabadi, H.S., Li, X., Zupan, B., Curk, T., and Ule, J. (2010). iCLIP predicts the dual splicing effects Gueroussov, S., Albu, M., Zheng, H., Yang, A., et al. (2013). A compendium of TIA-RNA interactions. PLoS Biol. 8, e1000530. of RNA-binding motifs for decoding gene regulation. Nature 499, 172–177. Xu, B., Roos, J.L., Levy, S., van Rensburg, E.J., Gogos, J.A., and Karayiorgou, Ronemus, M., Iossifov, I., Levy, D., and Wigler, M. (2014). The role of de novo M. (2008). Strong association of de novo copy number mutations with sporadic mutations in the genetics of autism spectrum disorders. Nat. Rev. Genet. 15, schizophrenia. Nat. Genet. 40, 880–885. 133–141. Sebat, J., Lakshmi, B., Malhotra, D., Troge, J., Lese-Martin, C., Walsh, T., Yeo, G.W., Coufal, N.G., Liang, T.Y., Peng, G.E., Fu, X.-D., and Gage, F.H. Yamrom, B., Yoon, S., Krasnitz, A., Kendall, J., et al. (2007). Strong association (2009). An RNA code for the FOX2 splicing regulator revealed by mapping of de novo copy number mutations with autism. Science 316, 445–449. RNA-protein interactions in stem cells. Nat. Struct. Mol. Biol. 16, 130–137. Sugimoto, Y., Ko¨ nig, J., Hussain, S., Zupan, B., Curk, T., Frye, M., and Ule, J. Zhang, C., and Darnell, R.B. (2011). Mapping in vivo protein-RNA interactions (2012). Analysis of CLIP and iCLIP methods for nucleotide-resolution studies of at single-nucleotide resolution from HITS-CLIP data. Nat. Biotechnol. 29, protein-RNA interactions. Genome Biol. 13, R67. 607–614. Sun, S., Zhang, Z., Fregoso, O., and Krainer, A.R. (2012). Mechanisms of Zhang, C., Zhang, Z., Castle, J., Sun, S., Johnson, J., Krainer, A.R., and Zhang, activation and repression by the alternative splicing factors RBFOX1/2. RNA M.Q. (2008). Defining the regulatory network of the tissue-specific splicing 18, 274–283. factors Fox-1 and Fox-2. Genes Dev. 22, 2550–2563. Tang, Z.Z., Zheng, S., Nikolic, J., and Black, D.L. (2009). Developmental con- trol of CaV1.2 L-type calcium channel splicing by Fox proteins. Mol. Cell. Biol. Zhang, C., Frias, M.A., Mele, A., Ruggiu, M., Eom, T., Marney, C.B., Wang, H., 29, 4757–4765. Licatalosi, D.D., Fak, J.J., and Darnell, R.B. (2010). Integrative modeling Ule, J., Jensen, K., Mele, A., and Darnell, R.B. (2005a). CLIP: a method for defines the Nova splicing-regulatory network and its combinatorial controls. identifying protein-RNA interaction sites in living cells. Methods 37, 376–386. Science 329, 439–443. Ule, J., Ule, A., Spencer, J., Williams, A., Hu, J.-S., Cline, M., Wang, H., Clark, Zhang, C., Lee, K.-Y., Swanson, M.S., and Darnell, R.B. (2013). Prediction of T., Fraser, C., Ruggiu, M., et al. (2005b). Nova regulates brain-specific splicing clustered RNA-binding protein motif sites in the mammalian genome. Nucleic to shape the synapse. Nat. Genet. 37, 844–852. Acids Res. 41, 6793–6807.

1152 Cell Reports 6, 1139–1152, March 27, 2014 ª2014 The Authors SUPPLEMENTAL EXPERIMENTAL PROCEDURES All animal-related procedures were conducted according to the Institutional Animal Care and Use Committee (IACUC) guidelines at Columbia University Medical Center and Rockefeller University.

Transient Transfection of Rbfox1, 2, and 3 in HEK293T Cells To test cross-reactivity of Rbfox antibodies, 1.5×106 HEK293T cells were plated onto 10 cm dishes the day before transfection. Cells were then transfected with 2 µg of pRbfox1, pRbfox2, or pRbfox3 individually, or the empty pcDNA vector as a control, using FuGENE HD (Roche) as described by the manufacturer. After 48 hrs, cells were rinsed with ice- cold PBS and collected in lysis buffer (1X PBS, 0.1% SDS, 0.5% NaDOC, and 0.5% (v/v) NP-40) for protein extraction.

Immunoblot Analysis A total of 20 µg protein per lane was loaded into 10% or 4-12% Novex NuPage SDS-PAGE gels (Invitrogen). After protein transfer onto nitrocellulose (Millipore) 0.45 µm membranes, the following antibodies were used for immunoblot: α-Rbfox1(Millipore), α-Rbfox2 (Bethyl Laboratories), α-Rbfox3 polyclonal (In-house), α-Rbfox3 monoclonal (Millipore), and α-GAPDH (Millipore).

HITS-CLIP Experiments using Mouse Brain Rbfox1, 2, and 3 CLIP experiments were performed with whole brain tissue lysate of about two-week old (P15) CD1 mice using two different protocols. Each CLIP experiment used approximately 1/4-1/3 of a whole brain (100-150 mg tissue lysate). The following antibodies were used: α-Rbfox1 (Millipore), α-Rbfox2 (Bethyl Laboratories), α-Rbfox3 polyclonal (in-house) and monoclonal (Millipore). The standard CLIP libraries were prepared as described previously (Darnell, 2010; Licatalosi et al., 2008; Moore et al., 2014). The BrdU-CLIP protocol is composed of the following steps. Protein-RNA complexes are immunoprecipitated using each specific antibody followed by stringent washes and ligation of the 3´ linker. After digestion of the protein, the RNA is purified and reverse transcribed using a dNTP mix where dTTP is replaced with BrdUTP. The RT primer has 5´ and 3´ anchor sequences used later for PCR amplification separated by an APE1 cleavage site. The BrdUTP incorporated into the resulting cDNA allows on-bead cDNA purification using an anti-BrdU antibody (clone IIB5; Santa Cruz Biotech or other vendors). The purified cDNA is then circularized and relinearized on-bead using CircLigase II and APE1, respectively, which places the two PCR anchors in the correct orientation. PCR is performed using a real time amplification system; SYBR green is added to the PCR reaction mix to monitor the concentration of DNA, and the PCR reaction is stopped when the fluorescence intensity (RFU) reaches an empirically-determined threshold. More details of the protocol are described in Supplemental Protocols. The resulting libraries prepared with both standard CLIP and BrdU-CLIP protocols were sequenced by Illumina Hi-Seq 2000 to obtain 51-nt reads.

RNA-Seq in HeLa Cells HeLa cells with or without Rbfox2 knockdown were generated previously (Zhang et al., 2008). Briefly, an shRNA against human Rbfox2 (shRbfox2) was cloned into the MSCV retroviral vector (Dickins et al., 2005). To create stable cell pools, HeLa cells were infected with MSCV expressing shRbfox2 or the empty retroviral vector (control). We 1 replaced the medium 24 hrs after infection. 24 hrs later, infected cells were selected with puromycin (2 µg mL−1) for 72 hrs.

To reduce splicing precursors and intermediates, cells were fractionated to enrich cytoplasmic RNA. Specifically, cells were lysed in gentle lysis buffer (10 mM HEPES pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.5 % (v/v) NP-40), and the nuclei were pelleted at 2,300 g for 5 min. Cytoplasmic RNA was extracted from the supernatant by Trizol and treated with DNase I (Promega). We used 4 µg of cytoplasmic RNA from each sample to prepare RNA-Seq libraries following the manufacturer’s instructions (Illumina). cDNA was size-selected using an agarose gel, and a band around 250 nt (corresponding to a fragment size of 185 nt with adaptors excluded) was excised for PCR amplification. The library was sequenced on Illumina GA IIx sequencer to produce paired-end 32-nt reads. Each of the two samples was sequenced in three lanes to increase the coverage.

Rbfox1 KO Mice Rbfox1loxp/loxp mice (described previously by Gehman et al. 2011) and Nestin-Cre+/- mice were purchased from Jackson lab and bred to make heterozygous (Rbfox1loxp/+/Nestin-Cre+/-) mice. The heterozygous mice were then crossed with Rbfox1loxp/loxp to obtain homozygous (Rbfox1loxp/ loxp/Nestin-Cre+/-) offspring. Adult Rbfox1 KO mice at around 9 months and age-matched WT controls were used for validation assays (duplicates for each genotype).

RT-PCR Validation Semi-quantitative RT-PCR in HeLa cells was performed as previously described (Zhang et al., 2008). For RT-PCR validation, total RNA was extracted from brains of WT and homozygous Rbfox1 KO (Rbfox1loxp/ loxp/Nestin-Cre+/-) mice using Trizol reagent (Invitrogen) and treated with DNase I. Candidate exons for validation were selected to cover the whole range of prediction confidence, but exons with complex splicing patterns or very high (≥0.9) or low (≤0.1) inclusion levels in WT adult mouse brains (Dillman et al., 2013) were avoided. Semi-quantitative RT-PCR analysis of these exons was performed as previously described (Zhang et al., 2010). A total of eight exons with both alternative isoforms detected in WT or Rbfox1 brain samples were included in our quantitative analysis. Primers used in this study are listed in Table S5.

Alternative Splicing Database The alternative splicing database used in this study was built by aligning RefSeq, mRNA, and EST sequences to the mouse genome (mm10), as described previously (Zhang et al., 2010). Alternative splicing events used in our Bayesian network analysis consist of 16,034 cassette exons, 4,074 events of tandem cassette exons, and 960 events of mutually exclusive exons. A similar database was generated for human (hg19) and rat (rn5) to determine conservation of alternative splicing patterns.

Analysis of HITS-CLIP Data

2 HITS-CLIP data analysis was performed essentially as described previously (Moore et al., 2014; Zhang and Darnell, 2011). In brief, 51-nt raw reads were first filtered by their quality scores. The standard CLIP reads have a 5-nt random barcode, and the BrdU-CLIP reads have a 5-nt sample multiplexing index in addition to a 9-nt random barcode at the 5´ end. For reads kept for downstream analysis, we required a minimum score of 20 at these index/barcode positions and an average score of 20 for the succeeding 25 positions corresponding to the first 25-nt of the actual CLIP tags. Barcode and index sequences were then removed from the read sequences and the information was recorded separately. The remaining read sequences corresponding to actual RNA tags were mapped to the reference genome (mm9 at the time of the analysis) using the novoalign program (http://www.novocraft.com) and allowing ≤2 mismatches (substitutions, insertions, or deletions) per read. We used the iterative trimming mode of novoalign during alignment, which iteratively removes adaptor sequences and low quality bases at the 3´ end so that CLIP tags of smaller sizes can be mapped properly. A minimum of 25-nt matches was required and only those reads mapped unambiguously to the genome (single hits) were kept for downstream analysis. Using an iterative expectation maximization-like statistical model (Darnell et al., 2011; Moore et al., 2014), we collapsed CLIP tags mapping to the same genomic positions into a single tag; tags with sufficiently distinct barcode sequences were kept, however. This step removed PCR duplicates while retaining genuinely unique CLIP tags (Darnell et al., 2011; Moore et al., 2014) and resulted in unique CLIP tags that represent independent captures of protein-RNA interactions. Deletions, insertions, and substitutions—collectively denoted as mutations—in unique CLIP tags were recorded for crosslinking-induced mutation site (CIMS) analysis. Coordinates of unique CLIP tags and mutations therein were then liftOver to mm10 for further analysis.

CLIP tag clusters were identified by grouping overlapping CLIP tags (Moore et al., 2014), and the statistical significance of CLIP tag cluster peak height was evaluated using scan statistics as described previously (Charizanis et al., 2012). This approach compares the observed peak height with the height expected by chance when CLIP tags are randomly shuffled on a gene-by-gene basis. CIMS analysis was performed as described previously (Moore et al., 2014; Zhang and Darnell, 2011).

Crosslinking-induced truncation site (CITS) analysis was performed using a computational method similar to a previous method used to analyze iCLIP data (Konig et al., 2010; Wang et al., 2010) with several modifications. We removed all CLIP tags with deletions since a majority of these likely represent tags where the crosslink sites were read through during reverse transcription. We clustered the remaining CLIP tags based on their potential sites of truncation, defined as the nucleotide immediately 5´ of each CLIP tag. The key to identifying CITS is to distinguish tags that were read through from those with premature truncation. We used a stringent statistical test to evaluate whether the observed frequency of potential truncations in each site is significantly higher than that expected by chance. This was done using a random redistribution of potential truncations in each CLIP tag cluster, which is more stringent than the gene-by-gene permutation used in the previous analyses (Konig et al., 2010; Wang et al., 2010). The stringent permutation scheme is important because read-through is frequent for Rbfox CLIP data. Similar to identification of significant CLIP tag cluster peaks, scan statistics were used for statistical assessment.

3 To estimate the rate of truncation at the crosslink site, we used the same method as described previously (Sugimoto et al., 2012), using the formula p(BrdU-CLIP)=f×p(RT)+(1-f)×p(BG), where p(BrdU-CLIP)=0.062, p(RT)=0.14, and p(BG)=0.004 are deletion rates in BrdU-CLIP, standard CLIP (i.e., error of reverse transcription), and in RNA-Seq data (background), respectively, and f is the rate of reverse transcriptase read-through. The background deletion rate in RNA- Seq data was estimated based on our previous data (Zhang and Darnell, 2011). The estimated rate of read-through for Rbfox CLIP is f=43%.

Motif Discovery and Enrichment Analysis De novo motif analysis was performed using 21-nt sequences [-10,10] around CIMS or CITS. For CIMS, all 1,158 sequences were used for analysis. For CITS, a randomly sampled subset of 1,000 sequences was used for analysis. Repetitive sequences were masked before de novo motif discovery using the MEME program with the following parameters: -mod zoops -nmotifs 10 -minw 6 -maxw 8, with the remaining parameters left at their default values (Bailey and Elkan, 1994). To examine motif enrichment relative to protein-RNA crosslink sites, the starting positions of UGCAUG, VGCAUG (V=nonU), or additional variants of the Rbfox consensus binding motif were recorded in sequences flanking the crosslink sites. Background motif frequency was estimated from control sequences of positions -500~-401 nt and 398~497 nt relative to the crosslink sites, to normalize the observed motif frequency.

Motif Enrichment and Conservation Score We previously used branch length score (BLS) (Stark et al., 2007) to evaluate the strength of Rbfox motif sites (Zhang et al., 2008). This scoring method considers only the conservation, but not the overall enrichment, of motif sites. For example, a site present only in the reference species without orthologs in any other species receives a BLS score of zero, even if the motif is specific and enriched in CLIP tag clusters. To address this caveat, we defined a new scoring metric of motif-site strength reflecting both enrichment and conservation of motif sites in CLIP tag clusters compared to regions without any CLIP tags. Specifically, we obtained 6,101 CLIP tag clusters satisfying the following criteria as foreground sequences: i) peak height ≥10; ii) located in exons or flanking intronic regions (exon+1kb extension on each side, or exon+ext1k); and iii) no overlap with repeat-masked regions. We also obtained a list of sequences in 90,637 exon+ext1k regions without any CLIP tags as background. We then searched for UGCAUG elements in the foreground and background sequences and calculated the branch length score (BLS) for each site. A distribution of BLS was estimated in CLIP+ and CLIP- sequences, denoted as P(BLS|CLIP+, region) and P(BLS|CLIP-, region), respectively, where the type of region can be introns, coding exons, or noncoding exons. The overall frequency of motif sites in foreground and background sequences, denoted as f(CLIP+) and f(CLIP-), respectively, were calculated and normalized by the length of sequences in each set. The preliminary motif enrichment and conservation score (MECS) for each UGCAUG site was calculated as log2[f(CLIP+)/f(CLIP-)]+log2[P(BLS|CLIP+, region)/P(BLS|CLIP-, region)]. We then performed a 3rd- order polynomial regression between the preliminary MECS score and BLS score. The fitted value for the given BLS score was then used as the final MECS score for each motif site (Figure S2 J,K). The same procedure was used to estimate the MECS scores for VGCAUG elements.

4 RNA-Seq Data Analysis For the HeLa cell RNA-Seq data, raw 5´ and 3´ reads were mapped independently by Eland (Illumina) to the (hg18 at the time of analysis) with iterative 3´ trimming. We required a minimal matched region of 22 nt and ≤ 2 mismatches for alignment. Reads were also mapped to an exon-junction database with the same criteria, in addition to requiring ≥ 4 nt overlap on each side of exon junctions. Only reads unambiguously mapped to unique loci in the genome or exon junctions (single hits) were kept for further analysis. The downstream analysis, including inference of transcript structure and quantification of gene expression and splicing, was performed as described previously (Charizanis et al., 2012) except that the proportional change of exon inclusion between the two conditions was estimated using the ASPIRE algorithm (Ule et al., 2005).

In addition to the HeLa cell RNA-Seq data, we also analyzed three published RNA-Seq datasets: one dataset profiled six different mouse tissues including cortex, cerebellum, heart, kidney, liver and testis (76-nt reads; each tissue has 2-6 biological replicates; GEO accession: GSE30352) (Brawand et al., 2011); one dataset profiled E17 and adult mouse cortex (80-nt single end reads, 4 biological replicates for E17 and 3 biological replicates for adult; GEO accession: GSE39866) (Dillman et al., 2013); and one dataset profiled autistic and control postmortem human brains (74-nt reads; 3 biological replicates for autistic and control brains, respectively; GEO accession: GSE30573) (Voineagu et al., 2011). These datasets were analyzed with the same pipeline (http://zhanglab.c2b2.columbia.edu/index.php/Quantas). Reads in each dataset were mapped to the mouse (mm10) or human (hg19) genome by OLego (http://zhanglab.c2b2.columbia.edu/index.php/OLego) (Wu et al., 2013) with 14-nt non-overlapping seeds, requiring ≤ 3 mismatches per read. An exon-junction database was provided for alignment. Only reads unambiguously mapped to the genome or exon junctions (single hits) were used for downstream analysis. Since these are single-end reads, the resulting mapped reads were counted in each exon or exon junction to quantify splicing changes, as described previously (Charizanis et al., 2012). Fisher’s exact test was used to evaluate the statistical significance of splicing changes, and the false discovery rate was estimated by the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995). An alternative splicing event was called in the two compared conditions if the event had sufficient read coverage (coverage≥20), |∆I|≥0.1, and Benjamini FDR≤0.05 (Charizanis et al., 2012), unless otherwise indicated.

Integrative Modeling using Bayesian Network Analysis We initially focused on cassette exons using our established Bayesian network framework (Zhang et al., 2010). For evidence of Rbfox-dependent splicing in the Bayesian network analysis, we considered the following datasets: i) differential splicing between WT and Rbfox1 KO mouse brain, as measured by exon-junction microarrays (GEO accession: GSE28421) (Gehman et al., 2011) where we estimated the splicing changes, denoted ∆Irank, using the ASPIRE3 algorithm (Wang et al., 2010); ii) differential splicing (∆I) between HeLa cells with or without Rbfox2 expression, as described above; iii) tissue-specific splicing (∆I) of exons in cortex and cerebellum compared to several other tissues (liver, kidney and testis) using a published RNA-Seq dataset (Brawand et al., 2011); iv) tissue-specific splicing of exons (∆I) in heart compared to several other tissues (liver, kidney and testis) (Brawand et al., 2011). For RNA-Seq data, only those exons with sufficient coverage (≥ 20 reads) were used; missing values were assigned to the other exons. Tissue-

5 specific splicing of exons are indirect indicators of Rbfox-dependent splicing, so the information was used for model training but not for the prediction of individual targets (Figure 4A in the main text and below).

For motif sites, we considered both UGCAUG and VGCAUG elements, and measured their strength by their MECS score. Since multiple motif sites are typically present in the alternatively splicing region, we derived a summarized motif site score by weighting each site according to their strength and the distance to the splice sites, as described previously (Zhang et al., 2010). Each cassette exon was measured by six regional motif site cluster scores denoted as sUI5´ss, sUI3´ss, sE3´ss, sE5´ss, sDI5´ss, and sDI3´ss (UI: upstream intron; DI: downstream intron; E: exon). In each region (UI, E, and DI), the maximum of the two was used to represent motif site strength in the region for Bayesian network analysis. In contrast to our previous analysis (Zhang et al., 2010), the actual distance of motif sites to the splice sites was used for weighting without adjustment. The CLIP tag cluster score in each region was obtained similarly, using the same weighting function based on the peak height of the clusters (Zhang et al., 2010).

To train the model, we compiled 132 exons that have been previously validated as Rbfox targets in humans or mice, mostly using cell culture systems. A subset of these exons was analyzed by detailed mutation analysis to verify direct regulation by Rbfox, while the others, a majority, were validated only by RT-PCR. Therefore, we excluded a few exons that had inconsistent directions of splicing changes in different studies or that did not have CLIP tags or the (U)GCAUG motif sites in the alternatively spliced region, because these are most likely indirect targets. The final set of validated Rbfox targets was composed of 121 exons (87 and 34 exons experiencing Rbfox-dependent inclusion and exclusion, respectively). A class label (activation or repression by Rbfox) was assigned to each of these exons. Besides these validated target exons, we also included additional exons satisfying any of the following criteria for model training: i) exons with |∆Irank|≥0.6 in the comparison of WT vs. Rbfox1 KO mice using exon junction microarrays; ii) exons with |∆I|≥0.08 in the comparison of control vs. shRbfox2 in HeLa cell RNA-Seq data; iii) exons with |∆I|≤0.08 in the comparison of brain or cerebellum vs. other tissues and in comparison of heart vs. other tissues. These exons were not assigned a class label during model training. Overall, we obtained 551 non-redundant exons to estimate model parameters, with a relatively balanced representation of exons likely to be activated or repressed by Rbfox and of non-target exons.

The trained Bayesian network model was then applied to 16,034 cassette exons to predict direct Rbfox targets. The same model was also applied to 4,074 events of tandem cassette exons and 960 events of mutually exclusive exons, assuming these alternative splicing events are regulated by Rbfox through similar mechanisms.

Gene Ontology (GO) and Protein Domain Analysis GO and protein domain analysis was performed using the online tool DAVID using all protein-coding genes expressed in the brain as the background (Dennis et al., 2003).

Candidate Autism-Susceptibility Genes

6 The list of candidate autism-susceptibility genes was obtained from the Simons Foundation Autism Research Initiative (SFARI) autism gene database (http://gene.sfari.org) (Basu et al., 2009). The mouse ortholog of each human gene was determined using the HomoloGene database (http://www.ncbi.nlm.nih.gov/homologene) complemented by manual searches.

Statistical Analysis All statistical analyses were performed using the statistical software R (R Development Core Team, 2008).

7 SUPPLEMENTAL RESULTS Optimization of Rbfox CLIP Conditions Several Rbfox antibodies were tested to evaluate their specificity in recognizing the targeted proteins. To assess potential cross-reactivity of antibodies with different Rbfox family members, we transfected pRbfox1, pRbfox2, and pRbfox3 individually into HEK293T cells, which have little to no endogenous expression of the Rbfox proteins. Immunoblot was then used to detect each protein in these cells using different antibodies. We identified one antibody for each that recognizes the targeted Rbfox family member without discernible cross-reactivity (Figure S1A). For Rbfox3, in addition to a rabbit polyclonal antibody, a commercial monoclonal antibody is also largely specific despite minor cross-reactivity with Rbfox2.

We then optimized the conditions for the immunoprecipitation (IP) of RNA-protein complexes using mouse brain tissue. Each antibody was able to precipitate its specific Rbfox paralog. When an increasing amount of antibody was used for IP, the IP eluate gave increased yield of the targeted protein as revealed by immunoblot and an autoradiogram labeling RNA crosslinked to the protein; reciprocal depletion of the specific protein, but not the other paralogs, was observed in the supernatant (Figure S1 B,C and data not shown). As a control, we performed IP of crosslinked tissue using nonspecific IgG antibodies or non-crosslinked tissue using Rbfox antibodies, which resulted in minimal background (Figure S1C). Under high RNase concentrations, IP of Rbfox crosslinked with radiolabeled RNA using specific Rbfox antibodies resulted in a doublet band at around 50 kD (Figure S1 C, D), corresponding to the molecular weight of the proteins observed in several previous studies (Dredge and Jensen, 2011; Tang et al., 2009). At a lower RNase concentration, a smear of labeled RNA of varying sizes above the doublet band was observed (Figure S1D), as expected in CLIP experiments (Moore et al., 2014). It is also important note that the size of the doublet band at around 50 kD is similar, but not identical, for different Rbfox family members (Figure S1D). This observation provides support for IP specificity since four different antibodies would not be expected to give nonspecific bands of such similar patterns. Instead, the result is consistent with the notion that multiple homologous variants are present in all Rbfox family members. We isolated crosslinked protein-RNA complexes roughly 20 kD above the molecular weight of each protein for further experimentation.

Conditional Probability Distributions Learned by Bayesian Network The following observations were made from the estimated Bayesian network model parameters. First, stronger motif sites are more likely to be bound by the protein. For example, MECS scores of 10 and 20 predict Rbfox binding with probabilities of 0.92 and >0.99, respectively (Figure 4B). Second, regions bound by Rbfox are supported by more CLIP tags than regions not bound by Rbfox. About 18% of regions predicted to be bound by Rbfox have a CLIP tag cluster score ≥10, as compared to 0.9% for regions predicted to have no Rbfox binding (Figure 4C). In addition, we also confirmed and quantified the position-dependent RNA map: binding of Rbfox in the downstream intron resulted in Rbfox-dependent inclusion with a probability of 0.99, while binding of Rbfox in the upstream intron or exon results in repression with a probability of 0.75 and 0.61, respectively; binding of Rbfox in both exon and upstream intron increases the probability of repression to 0.84 (Figure 4D). This analysis also showed that altered splicing after perturbation of 8 Rbfox in HeLa cells or in the Rbfox1 KO mouse model, as well as differential splicing in different tissues, are informative for identifying Rbfox target exons in the brain; exons inferred to be activated or repressed by Rbfox or those with no response had different means of splicing changes. However, the discriminative power is moderate, as reflected in the largely overlapping distributions of splicing changes (Figures 4E and S3 A-C). Finally, alternative splicing of Rbfox target exons is much more likely to preserve the reading frame than that of non-target exons (85% versus 42%) and such alternative splicing events are also more likely observed in other mammalian species (Figure S3 D,E).

Comparison of Rbfox Target Exons Defined by Bayesian Network with Previously Validated Exons In total, 88 of 121 previously validated exons were predicted as Rbfox targets (69/87 exons with Rbfox-dependent inclusion and 19/34 exons with Rbfox-dependent exclusion). This comparison gives a sensitivity of 73% to our Bayesian network prediction, with a conservative assumption that all of these previously validated exons are directly regulated by Rbfox; it is important to note that some of these previously validated Rbfox-dependent exons are likely indirect Rbfox targets, which appears to be particularly true for exons showing Rbfox-dependent exclusion. To address this issue, we focused on 14 cassette exons for which the direct regulation by Rbfox has been previously validated by mutagenesis or RNA IP (Figure S4 and Table S7). Among these, 11 exons were predicted as Rbfox targets by the Bayesian network analysis, indicating a sensitivity of 79%.

Comparison of Rbfox Target Exons Defined by Bayesian Network and Exons by Previous Studies We compared the results of the Bayesian network analysis to our previous motif-based bioinformatic predictions (Zhang et al., 2008). Among the 432 previously predicted Rbfox target cassette exons conserved between human and mouse, 273 (63%) were identified by integrative modeling. While this overlap is substantial (p<1.8×10-263; Fisher’s exact test), we were able to identify important differences in the results. For instance, integrative modeling identified a greatly expanded set of target exons missed by our previous motif-based predictions, presumably using information obtained from the CLIP and other data, while eliminating a number of exons having conserved motif sites but lacking supporting evidence from the other complementary datasets.

Another recent study predicted 933 Rbfox target exons in human using a “leading-edge” analysis based on the presence of RBP motif sites and the correlation of exon splicing with RBP expression in different human tissues and cell lines (Ray et al., 2013). Among the 703 exons annotated as cassette exons in our database, 117 exons have mouse orthologs predicted by our Bayesian network analysis, representing a very significant overlap (p<2.2×10-90; Fisher’s exact test). Again, however, these results suggest that a majority of exons predicted by the two methods are distinct (Figure S5A). Target exons predicted by the Bayesian network analysis more frequently have conserved splicing patterns between human and mouse (78% vs. 26%; Figure S5B), preserved reading frame (89% vs. 57%, as judged from whether the exon size is a multiple of three; Figure S5C), smaller exon size (median: 69 nt vs 96 nt; p<10-17; Figure S5D), and relatively higher expression in the brain (median of expression quantile across all genes: 0.19 vs. 0.25; p<3.3×10-7; Figure S5E). While these differences are not completely unexpected, they suggest that Bayesian network analysis is effective for integrating features that are known to be consistent with tight regulation of alternative splicing events (Xing and Lee, 2006).

9 SUPPLEMENTAL REFERENCES Bailey, T., and Elkan, C. (1994). Fitting a mixture model by expectation maximization to discover motifs in biopolymers Proc Int Conf Intell Syst Mol Biol, 28-36.

Basu, S.N., Kollu, R., and Banerjee-Basu, S. (2009). AutDB: a gene reference resource for autism research. Nucleic Acids Res 37, D832-D836.

Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Statist Soc B 57, 289-300.

Brawand, D., Soumillon, M., Necsulea, A., Julien, P., Csardi, G., Harrigan, P., Weier, M., Liechti, A., Aximu-Petri, A., Kircher, M., et al. (2011). The evolution of gene expression levels in mammalian organs. Nature 478, 343-348.

Charizanis, K., Lee, K.-Y., Batra, R., Goodwin, M., Zhang, C., Yuan, Y., Shiue, L., Cline, M., Scotti, M.M., Xia, G., et al. (2012). Muscleblind-like 2-mediated alternative splicing in the developing brain and dysregulation in myotonic dystrophy. Neuron 75, 437-450.

Darnell, J.C., Van Driesche, S.J., Zhang, C., Hung, K.Y.S., Mele, A., Fraser, C.E., Stone, E.F., Chen, C., Fak, J.J., Chi, S.W., et al. (2011). FMRP stalls ribosomal translocation on mRNAs linked to synaptic function and autism. Cell 146, 247-261.

Darnell, R.B. (2010). HITS-CLIP: panoramic views of protein-RNA regulation in living cells. Wiley Interdiscip Rev RNA 1, 266-286.

Dennis, G., Sherman, B., Hosack, D., Yang, J., Gao, W., Lane, H., and Lempicki, R. (2003). DAVID: database for annotation, visualization, and integrated discovery. Genome Biol 4, R60.

Dickins, R.A., Hemann, M.T., Zilfou, J.T., Simpson, D.R., Ibarra, I., Hannon, G.J., and Lowe, S.W. (2005). Probing tumor phenotypes using stable and regulated synthetic microRNA precursors. Nat Genet 37, 1289-1295.

Dillman, A.A., Hauser, D.N., Gibbs, J.R., Nalls, M.A., McCoy, M.K., Rudenko, I.N., Galter, D., and Cookson, M.R. (2013). mRNA expression, splicing and editing in the embryonic and adult mouse cerebral cortex. Nat Neurosci 16, 499- 506.

Dredge, B.K., and Jensen, K.B. (2011). NeuN/Rbfox3 nuclear and cytoplasmic isoforms differentially regulate alternative splicing and nonsense-mediated decay of Rbfox2. PLoS ONE 6, e21585.

Gehman, L.T., Stoilov, P., Maguire, J., Damianov, A., Lin, C.-H., Shiue, L., Ares, M., Mody, I., and Black, D.L. (2011). The splicing regulator Rbfox1 (A2BP1) controls neuronal excitation in the mammalian brain. Nat Genet 43, 706-711.

10 Konig, J., Zarnack, K., Rot, G., Curk, T., Kayikci, M., Zupan, B., Turner, D.J., Luscombe, N.M., and Ule, J. (2010). iCLIP reveals the function of hnRNP particles in splicing at individual nucleotide resolution. Nat Struct Mol Biol 17, 909- 915.

Licatalosi, D.D., Mele, A., Fak, J.J., Ule, J., Kayikci, M., Chi, S.W., Clark, T.A., Schweitzer, A.C., Blume, J.E., Wang, X., et al. (2008). HITS-CLIP yields genome-wide insights into brain alternative RNA processing. Nature 456, 464-469.

Moore, M., Zhang, C., Gantman, E.C., Mele, A., Darnell, J.C., and Darnell, R.B. (2014). Mapping Argonaute and conventional RNA-binding protein interactions with RNA at single-nucleotide resolution using HITS-CLIP and CIMS analysis. Nat Protocols 9, 263-293.

R Development Core Team (2008). R: a language and environment for statistical computing (Vienna, Austria: R Foundation for Statistical Computing).

Ray, D., Kazan, H., Cook, K.B., Weirauch, M.T., Najafabadi, H.S., Li, X., Gueroussov, S., Albu, M., Zheng, H., Yang, A., et al. (2013). A compendium of RNA-binding motifs for decoding gene regulation. Nature 499, 172-177.

Stark, A., Lin, M.F., Kheradpour, P., Pedersen, J.S., Parts, L., Carlson, J.W., Crosby, M.A., Rasmussen, M.D., Roy, S., Deoras, A.N., et al. (2007). Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures. Nature 450, 219-232.

Sugimoto, Y., Konig, J., Hussain, S., Zupan, B., Curk, T., Frye, M., and Ule, J. (2012). Analysis of CLIP and iCLIP methods for nucleotide-resolution studies of protein-RNA interactions. Genome Biol 13, R67.

Tang, Z.Z., Zheng, S., Nikolic, J., and Black, D.L. (2009). Developmental control of CaV1.2 L-type calcium channel splicing by Fox proteins. Mol Cell Biol 29, 4757-4765.

Ule, J., Ule, A., Spencer, J., Williams, A., Hu, J.-S., Cline, M., Wang, H., Clark, T., Fraser, C., Ruggiu, M., et al. (2005). Nova regulates brain-specific splicing to shape the synapse. Nature Genet 37, 844-852.

Voineagu, I., Wang, X., Johnston, P., Lowe, J.K., Tian, Y., Horvath, S., Mill, J., Cantor, R.M., Blencowe, B.J., and Geschwind, D.H. (2011). Transcriptomic analysis of autistic brain reveals convergent molecular pathology. Nature 474, 380-384.

Wang, Z., Kayikci, M., Briese, M., Zarnack, K., Luscombe, N.M., Rot, G., Zupan, B., Curk, T., and Ule, J. (2010). iCLIP predicts the dual splicing effects of TIA-RNA interactions. PLoS Biol 8, e1000530.

Wu, J., Anczukow, O., Krainer, A.R., Zhang, M.Q., and Zhang, C. (2013). OLego: Fast and sensitive mapping of spliced mRNA-Seq reads using small seeds. Nucleic Acids Res 41, 5149-5163.

11 Xing, Y., and Lee, C. (2006). Alternative splicing and RNA selection pressure - evolutionary consequences for eukaryotic genomes. Nature Rev Genet 7, 499-509.

Zhang, C., and Darnell, R.B. (2011). Mapping in vivo protein-RNA interactions at single-nucleotide resolution from HITS-CLIP data. Nat Biotech 29, 607-614.

Zhang, C., Frias, M.A., Mele, A., Ruggiu, M., Eom, T., Marney, C.B., Wang, H., Licatalosi, D.D., Fak, J.J., and Darnell, R.B. (2010). Integrative modeling defines the Nova splicing-regulatory network and its combinatorial controls. Science 329, 439-443.

Zhang, C., Zhang, Z., Castle, J., Sun, S., Johnson, J., Krainer, A.R., and Zhang, M.Q. (2008). Defining the regulatory network of the tissue-specific splicing factors Fox-1 and Fox-2. Genes Dev 22, 2550-2563.

12 Day 1

Note: The first part of the BrdU-CLIP protocol (up to precipitation of IP’d RNA) is the same as the standard CLIP method, and is available at http://www.rockefeller.edu/labheads/darnellr/CLIP_methods.pdf

Bead Prep: Denhardt’s Blocking

Ab Binding Buffer:

1X PBS, pH 7.4 0.02% Tween-20

50µl Protein-G Dynabeads per sample (25µl per cDNA purification step)

Wash 3 times with Ab binding buffer

Add 225μl Ab binding buffer, 25μl 50X Denhardt’s Solution (Sigma, D2532); total volume is 5X original bead volume

Rotate at RT for at least 45 minutes - 1 hour

RT Reaction:

Add 8µl water to RNA pellet (tap to resuspend, quick spin down). Denature at 65°C for 5 minutes (in microfuge tube), place tube on ice (to avoid loss of RNA, do not overdry pellet and do not pipette until after denaturing step)

Transfer to PCR tube (on ice)

Mix I:

4μl 5X RT Buffer 1μl dATP 1μl dCTP 8.2mM (Invitrogen, 10297-018) 1μl dGTP 1μl Br-dUTP (8.2mM; Sigma, B0631) 1μl RT Primer (25μM) 9μl total

Add 9µl of Mix I

3 minutes at 75°C, ramp down to 50°C and hold

Mix II:

1μl DTT (0.1M) 1μl RNAsin Plus (Promega, N261) 1μl Superscript III 3μl total

Add 3μl of Mix II (pre-warm to 50°C in PCR block before adding)

45 minutes at 50°C, 15 minutes at 55°C, 5 minutes at 85°C, 4°C hold

! 1!

Bead Prep: Ab Binding

1X IP Buffer:

0.3X SSPE 1mM EDTA 0.05% Tween-20

Wash 3 times with Ab binding buffer

Add 20µl Ab binding buffer, 5µl 50X Denhardt’s Solution and 25µl (5µg) αBrdU antibody (Santa Cruz, sc-32323)

Rotate at RT for at least 45 minutes

Wash 3 times with 1X IP Buffer

Following RT reaction, add 1µl (at 2U/µl) RNAse H (Invitrogen 18021-071 or NEB M0297L)

Incubate for 20 minutes at 37°C, hold at 4°C

Add 10µl water (to bring volume above 25µl needed for G-25 column)

Spin through G-25 column to remove free BrdUTP (discard G-25 column as solid radioactive waste)

cDNA Purification: Immunoprecipitation I

2X IP Buffer:

0.6X SSPE 2mM EDTA 0.1% Tween-20

4X IP Buffer:

1.2X SSPE 4mM EDTA 0.2% Tween-20

Nelson Low Salt Buffer:

15mM Tris pH 7.5 5mM EDTA

! 2!

Nelson Stringent Buffer:!

15mM Tris-HCl pH7.5 5mM EDTA 2.5mM EGTA 1% Triton X-100 1% NaDOC 0.1% SDS 120mM NaCl 25mM KCl

Measure volume, add water up to 40μl and add 10μl 50X Denhardt’s Solution and 50μl 2X IP Buffer for a total volume of 100μl (Denhardt’s and 2X IP Buffer can be added to the G-25 column collection tube prior to spinning samples through, volume can then be adjusted up to 100μl)

5 minutes at 70°C, equilibrate to RT

Add to prepared tube of beads (25μl original slurry volume, store remaining beads for second purification at 4°C O/N), rotate at RT for 30-45 minutes

(all washes 3-5 minutes rotating at RT, 1ml)

Wash 1 time with 1X IP Buffer (5X Denhardt’s) 2 times with Nelson Low Salt Buffer (1X Denhardt’s) 2 times with Nelson Stringent Buffer (1X Denhardt’s) 2 times with 1X IP Buffer

Competitive Elution (with BrdU):

Add 50µl of 100µM BrdU (Sigma, B5002, diluted in 1X IP Buffer)

Rotate at RT for 30-45 minutes, place on magnet and collect eluate

Spin through G-25 column to remove free BrdU

Measure volume, add water up to 97.5μl, add 37.5μl 4X IP buffer and 15μl 50X Denhardt’s Solution (Denhardt’s and 4X IP Buffer can be added to the G-25 column collection tube prior to spinning samples through)

Store overnight at 4°C

! 3! Day 2

cDNA Purification: Immunoprecipitation II

CircLigase Wash Buffer:

33mM Tris-Acetate 66mM KCl (pH 7.8)

5 minutes at 70°C, equilibrate to RT

Add to previously prepared beads, rotate at RT for 30-45 minutes

(all washes 3-5 minutes rotating at RT, 1ml)

Wash 1 time with 1X IP Buffer (5X Denhardt’s) 2 times with Nelson Low Salt Buffer (1X Denhardt’s) 2 times with Nelson Stringent Buffer (1X Denhardt’s) 2 times with CircLigase wash buffer

Circularization: CircLigase

APE1 Wash Buffer:

50mM Potassium Acetate 20mM Tris-Acetate 10mM Magnesium Acetate (pH 7.9)

Reaction Mix:

2µl CircLigase 10X Reaction Buffer 4µl Betaine (5M) 1µl MnCl2 (50mM) 0.5µl CircLigase ssDNA Ligase II (50U) (Epicentre, CL9021K) 12.5 Water 20µl total

Incubate 1 hour at 60°C in thermomixer (interval: shake at 1300rpm every 30” for 15”)

(all washes 3-5 minutes rotating at RT, 1ml)

Wash 2 times with Nelson Low Salt Buffer 2 times with Nelson Stringent Buffer 2 times with APE1 wash buffer

! 4!

Re-linearization: APE1

Phusion Wash Buffer:

50mM Tris (pH 8.0)

Reaction Mix:

2µl 10X Reaction Buffer (NEB 4) 1.25µl APE1 (10U/µl, NEB M0282S) 16.75µl Water 20µl total

Incubate for 1 hour at 37°C in thermomixer (interval: shake at 1300rpm for 15” every 30”)

(all washes 3-5 minutes rotating at RT, 1ml)

Wash 2 times with Nelson Low Salt Buffer 2 times with Nelson Stringent Buffer 2 times with Phusion wash buffer

PCR: Phusion Polymerase, SYBR Green

Mix I:

10µl 5X Phusion HF Buffer 1µl 10mM dNTPs 37µl Water 48µl total

Mix II:

0.5µl P5 (20µM) 0.5µl P3 (20µM) 0.5µl Phusion DNA Polymerase (NEB, M0530) 1.5µl total

Add Mix I to beads

Incubate at 98°C for 1 minute in thermomixer (1200 rpm) to elute cDNA from beads and place on magnet

Collect supernatant and place in PCR tube with optically clear cap (BioRad, TCS-0803)

Add 1.5µl Mix II

Add 0.5µl 50X SYBR Green I (Invitrogen, S7563) diluted in Phusion wash buffer to mix and place in real-time PCR machine ! 5! Cycle:

98°C 30”

98°C 10” 60°C 15” 72°C 20”

Remove reaction tubes when RFU signal reaches 500-1000 (BioRad CFX96 Touch™ Real-Time PCR Detection System, or BioRad iQ5)

Purify PCR product using Agencourt AMPure XP beads (Beckman Coulter) according to manufacturers instructions (usually results in 5-10nM in 30µl eluate)

Quantitate using Quant-it kit (Invitrogen) or Tapestation (Agilent), if multiplexing, pool samples according to quantitation results)

Libraries are compatible with Illumina single-end HiSeq flowcells (not compatible with MiSeq, or paired-end HiSeq flowcells)

Submit for Illumina sequencing using small RNA sequencing primer listed below

! 6!

RNA adapter (Dharmacon):

SRA3: 5'-P UCGUAUGCCGUCUUCUGCUUG-puromycin-3'

Pre-Adenylated DNA adapter (IDT): Higher ligation efficiency, use with truncated T4 RNA Ligase 2, NEB M0242L)

Pre-A SRA3: 5’-/5rApp/TCGTATGCCGTCTTCTGCTTG/3ddc/-3’

DNA Primers (IDT):

RT Primers:

RT-1 5’pGNNNNNNNNGATGCGATCGTCGGACTGTAGAACTCT/idSp/CAAGCAGAAGACGGCATACGA

RT-2 5’pGNNNNNNNNGTGACGATCGTCGGACTGTAGAACTCT/idSp/CAAGCAGAAGACGGCATACGA

RT-3 5’pGNNNNNNNNGCAGTGATCGTCGGACTGTAGAACTCT/idSp/CAAGCAGAAGACGGCATACGA

RT-4 5’pGNNNNNNNNGAGCTGATCGTCGGACTGTAGAACTCT/idSp/CAAGCAGAAGACGGCATACGA

RT-5 5’pGNNNNNNNNGGTCAGATCGTCGGACTGTAGAACTCT/idSp/CAAGCAGAAGACGGCATACGA

RT-6 5’pGNNNNNNNNGTCGAGATCGTCGGACTGTAGAACTCT/idSp/CAAGCAGAAGACGGCATACGA

PCR Primers:

Forward (P5):

5’-AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGACG

Reverse (P3):

5’-CAAGCAGAAGACGGCATA ! ! Sequencing Primer: Illumina Small RNA Sequencing Primer

5’ -CGACAGGTTCAGAGTTCTACAGTCCGACGATC

! 7! Immunoblot B IP/Immunoblot A α-Rbfox3 (pAb)

IP eluate PostIP supernatant

input 1 2 1 2 1 2 1 2 76 α-Rbfox3 52 (mAb) 52 α-Rbfox1 IP/RNA labeling 76 C Immunoblot α-Rbfox1 IgG PostIP supernatant 52 α-Rbfox2 α-Rbfox1 α-Rbfox2 76 52

52 α-Rbfox3 (pAb) 1 2 3 4 76 52 α-Rbfox1 α-Rbfox2 α-Rbfox3 52 IgG (mAb) 52 α-Rbfox2 α-GAPDH 52 1 2 3 4 5 6 7 8 5 6 7 8 D IP/RNA labeling α-Rbfox1 α-Rbfox2 α-Rbfox3 (pAb) α-Rbfox3 (mAb)

Rnase A: ++ + + ++ + + ++ + + ++ + + 102 102 102 102

76 76 76 76

52 52 52 52

38 38 38 38

Fig. S1: Optimization of Rbfox1, 2, and 3 HITS-CLIP conditions, related to Figure 1. A. HEK293T cells were transfected with one of the Rbfox1, 2, or 3 plasmids or an empty vector control. Total protein extract from these cells was used for immunoblot analysis using each α- Rbfox antibody to test for cross-reactivity. α-GAPDH was used as a loading control. B. Immunoprecipitation of Rbfox3 in mouse brain tissue using different amounts of α-Rbfox3 rabbit polyclonal antibodies. Immunoblot using another monoclonal antibody detected a doublet band. C. Immunoprecipitation of Rbfox1(top left) and Rbfox2 (bottom left) in crosslinked brain tissue using different amounts of antibodies to isolate crosslinked Rbfox-RNA complexes with high RNase concentration. Nonspecific IgG was used as a control. The isolated RNA was radio- labeled and visualized by autoradiogram. Each sample was loaded into two lanes, as labeled below. The post-IP supernatant of the same samples was used for immunoblot with α-Rbfox1 (top right) and α-Rbfox2 antibodies (bottom right). D. Representative CLIP autoradiogram for Rbfox1, 2, and 3. In each experiment, high- concentration RNAase treatment resulted in a doublet band corresponding to the molecular weight of the protein, and low RNase concentration resulted in a smear above the expected molecular weight. The yellow boxes indicate the regions cut out for CLIP experiments. A B E=3.7e-320 C D E=1.9e-631 417/1158 sites 599/1000 sites TTTTTTTTTT TTTGTT2TGATTTTTTGC G TAAGAG2 AAA G T G 1 G A T1 AAAAAAGG C G GT Abits AAAAAAA T A GTAGbits TGG G A AT TGT A GGGGGGACCG CGACAGGTGTGGGCGAAC TTAGTT A T ACT

C T AGG G CT C C T A CCGG C C ACCC C C CG C CCC T T G T CCCCCC CCA CCCCCC C ACCC 0 9 1 3 4 5 6 7 8 C A A A CC CA 2 A 0 0 3 6 3 6 2 4 5 2 4 5 1 1 0 1 3 4 9 5 6 2 7 8 C -8 -6 -4 -2 -1 -7 -5 -3 -9 -8 -6 -9 -7 -4 -5 -2 -3 -1 A 10 10 -10 -10 G TG G ATG

E 20 F 20 G 70 70 70 15 15 60 60 60

50 10 50 10 50

40 5 40 5 40 30 30 30 0 0 20 -5 0 5 20 -5 0 5 20 NGAAUG NGCCUG 10 10 10 Enrichment of NGAAUG of Enrichment Enrichment of NGCCUG of Enrichment Enrichment of UGCAUG.m1 of Enrichment 0 0 0 -50 0 50 -50 0 50 -50 0 50 Distance to deletion site Distance to deletion site Distance to deletion site

H I J K 350 UGCAUG 350 VGCAUG 14 UGCAUG 14 VGCAUG 300 300 12 intron 12 intron coding exon 250 250 10 10 coding exon noncoding exon noncoding exon 200 200 8 8

150 150 MECS 6 6

Motif enrichment Motif 100 100 4 4 50 50 2 2 0 0 0 0 -50 0 50 -50 0 50 0 0.5 1 0 0.5 1 Distance to “truncation” site Distance to “truncation” site BLS BLS Fig. S2: Rbfox binding motif analysis in CIMS and CITS data, related to Figure 2. A,C. Crosslink sites inferred from CIMS analysis and CITS analysis, respectively. The dotted box indicates the UGCAUG motif directly revealed by anchoring sequences at the crosslink sites. B,D.The strongest de novo motif in positions [-10,10] around CIMS and CITS, respectively. The number of sequences containing the motif and the statistical significance (E-value) are indicated. E,F,G. Rbfox recognizes some, but not all, variants of the (U)GCAUG consensus motif. Enrichment of NGAAUG (E), NGCCUG (F) and UGCAUG.m1 (G) around CIMS is shown. UGCAUG.m1 represents all hexamers with one mismatch compared to the UGCAUG consensus sequence (with NGCAUG excluded). H,I. Standard CLIP data were used as a control to perform CITS analysis. Enrichment of UGCAUG (H) and VGCAUG (I) around “mock” CITS derived from standard CLIP tags, presumably without crosslinking-induced truncation, was shown. J,K. Motif enrichment and conservation score (MECS) for UGCAUG (J) and VGCAUG (K). Motif sites are grouped by their genomic location, and scored separately. The dots represent the preliminary MECS. The final MECS is smoothed by intrapolating the preliminary MECS with branch length score (BLS) measuring motif site conservation in mammalian species (the curves).

A B C 0.9 3.5 4 Activation 0.8 3 3.5 No effect 0.7 3 2.5 Repression 0.6 2.5 0.5 2 2 Density Density Denrsity 0.4 1.5 1.5 0.3 1 0.2 1 0.5 0.1 0.5 0 0 0 -10 -5 0 5 10 -1 -0.5 0 0.5 1 -1 -0.5 0 0.5 1 ∆Irank (WT vs. Rbfox1 KO) ∆I (brain vs. other tissues) ∆I (heart vs. other tissues)

D E F 16

100% 100% 14 R² = 0.92649 12 80% 80% 10 60% 60% 8

40% 40% 6 (FDR) by 10-fold CV 10-fold (FDR)by Probability Probability Probability Probability 10 4

20% 20% -log 2

0% 0% 0 0 5 10 15

-log10(FDR) by full model

Frame shift Not conserved

Inframe Conserved

Fig. S3: Integrative modeling of Rbfox target cassette exons using a Bayesian network, related to Figure 4. A-C. Conditional probability distributions (CPDs) of splicing changes. For each panel, splicing changes were modeled using a normal distribution. The distributions for three groups of exons (activated by Rbfox, no effect from Rbfox and repressed by Rbfox) are shown in red, gray and blue, respectively. A. WT vs. Rbfox1 KO brain. B. brain vs. other tissues (liver, kidney and testis). C. heart vs. other tissues. D. CPD of reading frame preservation. E. CPD of alternative splicing conservation between mouse and human/rat. F. Evaluation of over-fitting. A total of 121 validated target exons compiled from the literature were used for this evaluation. X-axis shows the FDR of each exon predicted by the full model, which was trained using the complete training dataset (including all validated exons). Y-axis shows the FDR of each validated exon predicted in 10-fold cross validation (CV). In this cross validation procedure, models were trained using 90% of the training data, then used to predict the remaining 10% of exons.

A Atp5c1: 37nt! B Capn3: 144nt! C Epb4.1: 63nt mm10 mm10 1 kb 1 kb mm10 5 kb 120,486,000 120,486,500 120,487,000 120,487,500 120,488,000 10,059,000 10,058,500 10,058,000 10,057,500 10,057,000 10,056,500 131,965,000 131,960,000 UCSC Genes Based on RefSeq, UniProt, GenBank, CCDS and Comparative Genomics UCSC Genes Based on RefSeq, UniProt, GenBank, CCDS and Comparative Genomics

Rbfox Pooled CLIP Tags Rbfox Pooled CLIP Tags Rbfox Pooled CLIP Tags Rbfox (U)GCAUG Motif Sites

Rbfox (U)GCAUG Motif Sites Rbfox (U)GCAUG Motif Sites

D Ewsr1: 18nt! E Fmnl3: 100nt! F Fn1: 273nt! 1 kb mm10 1 kb mm10 mm10 1 kb 5,091,500 5,091,000 5,090,500 5,090,000 5,089,500 5,089,000 5,088,500 99,319,000 99,318,500 99,318,000 99,317,500 71,615,500 71,615,000 71,614,500 71,614,000 71,613,500 71,613,000 UCSC Genes Based on RefSeq, UniProt, GenBank, CCDS and Comparative Genomics UCSC Genes Based on RefSeq, UniProt, GenBank, CCDS and Comparative Genomics

Rbfox Pooled CLIP Tags Rbfox Pooled CLIP Tags Rbfox Pooled CLIP Tags

Rbfox (U)GCAUG Motif Sites Rbfox (U)GCAUG Motif Sites

Rbfox (U)GCAUG Motif Sites

G Grin1: 63nt! H Itga7: 113nt! I Nrap: 105nt mm10 1 kb mm10 ! 1 kb mm10 2 kb 25,313,500 25,313,000 25,312,500 25,312,000 25,311,500 25,311,000 128,954,500 128,955,000 128,955,500 128,956,000 128,956,500 128,957,000 128,957,500 56,378,000 56,377,500 56,377,000 56,376,500 56,376,000 56,375,500 56,375,000 56,374,500 56,374,000 56,373,500 UCSC Genes Based on RefSeq, UniProt, GenBank, CCDS and Comparative Genomics UCSC Genes Based on RefSeq, UniProt, GenBank, CCDS and Comparative Genomics RefSeq Genes

Rbfox Pooled CLIP Tags Rbfox Pooled CLIP Tags Rbfox Pooled CLIP Tags Rbfox (U)GCAUG Motif Sites

Rbfox (U)GCAUG Motif Sites

Rbfox (U)GCAUG Motif Sites

J Numb: 147nt K Pbrm1: 156nt L Plod2: 63nt! ! 1 kb ! mm10 mm10 2 kb 1 kb mm10 83,799,500 83,799,000 83,798,500 83,798,000 83,797,500 83,797,000 83,796,500 83,796,000 31,110,000 31,115,000 92,601,000 92,601,500 92,602,000 92,602,500 92,603,000 UCSC Genes Based on RefSeq, UniProt, GenBank, CCDS and Comparative Genomics UCSC Genes Based on RefSeq, UniProt, GenBank, CCDS and Comparative Genomics UCSC Genes Based on RefSeq, UniProt, GenBank, CCDS and Comparative Genomics

Rbfox Pooled CLIP Tags Rbfox Pooled CLIP Tags Rbfox Pooled CLIP Tags

Rbfox (U)GCAUG Motif Sites

Rbfox (U)GCAUG Motif Sites Rbfox (U)GCAUG Motif Sites

M Rbfox2: 93nt! N Src: 18nt! mm10 5 kb 1 kb mm10 77,110,000 77,105,000 157,460,000 157,461,000 157,462,000 UCSC Genes Based on RefSeq, UniProt, GenBank, CCDS and Comparative Genomics UCSC Genes Based on RefSeq, UniProt, GenBank, CCDS and Comparative Genomics

Rbfox Pooled CLIP Tags

Rbfox (U)GCAUG Motif Sites

Rbfox (U)GCAUG Motif Sites

Fig. S4: Direct Rbfox target cassette exons validated in the literature, related to Figure 4. In each panel, the alternative exon together with the flanking introns and exons are shown, followed by pooled Rbfox CLIP tags and (U)GCAUG elements from top to bottom. Exons activated and repressed by Rbfox are shown in red and blue, respectively. A B 100 C 100 D 1 E 1

BN Ray et al 80 80 0.8 0.8 (599 exons) (693 exons) Ray et al. Ray et al. 60 60 0.6 0.6 BN BN target genes 40 target genes 40 0.4 Median: 0.4 Median: 482 117 576 69 nt vs. 96 nt 0.19 vs. 0.25 -17 conserved AS(%) conserved -7

20 Rbfox 0.2 Rbfox

20 of Cumdistribution (p<10 ) 0.2 Cum distribution of of Cumdistribution (p<3.3x10 ) Frame-preserving(%)

0 0 0 0 p<2.17x10-90 0 200 400 0 0.5 1 Exon size (nt) Quantile of gene expression in brain

Fig. S5: Comparison of the Bayesian network (BN) approach with leading-edge analysis by Ray et al., related to Figure 4.

A. The Venn Diagram shows the overlap of Rbfox target cassette exons predicted by the Bayesian network analysis (599 of 772 exons with conserved alternative splicing in human) with those predicted by leading-edge analysis (693 exons). The significance of overlap is evaluated by Fisher’s exact test. B. The percentage of exons with conserved alternative splicing between human and mouse. C. The percentage of frame-preserving exons (exon length is a multiple of three). D. The cumulative distribution of exon size. E. The cumulative distribution of the quantile of gene expression in the brain that measures the relative abundance of each gene.

A Camta1: 31nt!

mm10 5 kb 151,070,000 151,065,000

Rbfox Pooled CLIP Tags

44 42 16 16 inc%

Rbfox (U)GCAUG Motif Sites ΔI=0.27

B Grip1: 45nt! 2 kb mm10 120,051,000 120,052,000 120,053,000 120,054,000 120,055,000 RefSeq Genes

Rbfox Pooled CLIP Tags

Rbfox (U)GCAUG Motif Sites

61 60 47 48 inc%

ΔI=0.13 C Kcnma1: 174nt! mm10 10 kb 23,430,000 23,420,000 23,410,000 23,400,000 RefSeq Genes

Rbfox Pooled CLIP Tags

5 5 15 15 inc%

ΔI=0.10

Rbfox (U)GCAUG Motif Sites

Fig. S6: Validation of Rbfox target exons in genes implicated in autism, related to Figure 6. In each panel, the alternative exon together with the flanking introns and exons are shown, followed by pooled Rbfox CLIP tags and (U)GCAUG elements from top to bottom. Exons activated and repressed by Rbfox are shown in red and blue, respectively. The size of the alternative exon and the gene symbol are indicated at the top. The gel image and quantitation of semi quantitative RT-PCR analysis using WT and Rbfox1 KO mouse brains is shown on the right. The bands corresponding to the two alternative isoforms are indicated.