nature structural & molecular biology, advance online publication
© 2013 Nature America, Inc. All rights reserved.

ARTICLES

Rbfox proteins regulate alternative mRNA splicing through evolutionarily conserved RNA bridges

Michael T Lovci, Dana Ghanem, Henry Marr, Justin Arnold, Sherry Gee, Marilyn Parra, Tiffany Y Liang, Thomas J Stark, Lauren T Gehman, Shawn Hoon, Katlin B Massirer, Gabriel A Pratt, Douglas L Black, Joe W Gray, John G Conboy & Gene W Yeo

Alternative splicing (AS) enables programmed diversity of gene expression across tissues and development. We show here that binding in distal intronic regions (>500 nucleotides (nt) from any exon) by Rbfox splicing factors important in development is extensive and is an active mode of splicing regulation. Similarly to exon-proximal sites, distal sites contain evolutionarily conserved GCATG sequences and are associated with AS activation and repression upon modulation of Rbfox abundance in human and mouse experimental systems. As a proof of principle, we validated the activity of two specific Rbfox enhancers in KIF21A and ENAH distal introns and showed that a conserved long-range RNA-RNA base-pairing interaction (an RNA bridge) is necessary for Rbfox-mediated exon inclusion in the ENAH gene. Thus we demonstrate a previously unknown RNA-mediated mechanism for AS control by distally bound RNA-binding proteins.

The variety of alternative mRNA isoforms in higher eukaryotic transcriptomes indicates that a complex interplay among trans factors exists to regulate splicing decisions. Splicing factors such as RNA-binding proteins (RBPs) often bind as complexes within precursor messenger RNA (pre-mRNA) sequences to promote or repress splice-site recognition. Variation within trans factors or their binding sites leads to phenotypic diversity across mammalian evolution, and inherited or somatic genetic defects in these sites cause human diseases. The recent application of genome-scale immunoprecipitation and high-throughput sequencing in mammalian cells provides insights into the networks of interactions among RBPs and their RNA substrates. It has long been known that splicing factors bind within constitutive and alternative exons and their proximal intronic regions to alter splicing. Sequence information within 500 nt of alternative exons and their neighboring flanking exons has been extensively studied to derive a computational splicing regulatory code. However, the genome-wide maps of RNA binding by proteins also show that a large fraction of binding sites are located much farther than 500 nt from potential target exons.

Published studies of distally located sequences that affect splicing allow only limited conclusions. The regulatory elements previously considered distal are often relatively close to the regulated exon or flanking exons and thus are not inconsistent with existing models of splicing regulation. For instance, the decoy 3′ splice acceptor site sequence in the mouse caspase-2 (Casp2) gene is located only ~200 nt downstream from the regulated exon, and motifs that enhance splicing of rat Fn1 exon EIIIB are less than 500 nt from the downstream exon. Few distal intronic enhancers have been demonstrated biochemically or proposed on the basis of conservation. Another complicating aspect of these studies is that the splicing factors recognizing the distal sequences are not always known. For example, it is not known which RBPs bind a 526 nt segment of the intronic sequence downstream of an exon in the chicken MYPT1 (also known as PPP1R12A) gene. Similarly, it is unclear how a cis element in the first intron of the equine β-casein (CSN2) gene increases the inclusion of all weak exons in its pre-mRNA. Finally, the mechanisms by which a splicing factor might act on a distant exon are largely obscure. For example, a distal Rbfox motif was found to affect exon N30 in the human MYH10 (non-muscle myosin heavy chain B) gene, but how this occurred was unexplored.

To examine the genome-wide relevance of distal regulatory sites in splicing, we examined the Rbfox family of RNA-binding proteins in vivo and in human cell lines. These proteins control tissue-specific AS of exons in brain, muscle, epithelial and mesenchymal cells, and embryonic stem cells, and their binding sites are exceptionally highly conserved in sequence and position across vertebrate evolution. RBFOX proteins interact with proteins mutated in spinal cerebellar ataxia types 1 and 2, and individuals with mutations mapping to the RBFOX1 gene have a range of neurological deficits, such as mental retardation, epilepsy and autism-spectrum disorder (ASD).

Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, California, USA. Stem Cell Program, University of California, San Diego, La Jolla, California, USA. Institute for Genomic Medicine, University of California, San Diego, La Jolla, California, USA. Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA. Department of Microbiology, Immunology and Molecular Genetics, University of California, Los Angeles, Los Angeles, California, USA. Howard Hughes Medical Institute, University of California, Los Angeles, California, USA. Yong Loo Lin School of Medicine, National University of Singapore, Singapore. School of Biological Sciences, Nanyang Technological University, Singapore. Molecular Engineering Laboratory, Biomedical Sciences Institutes, Agency for Science, Technology & Research, Singapore. Department of Biomedical Engineering, Oregon Health & Science University, Portland, Oregon, USA. Present address: State University of Campinas, Sao Paulo, Brazil. Correspondence should be addressed to J.G.C. ([email protected]) or G.W.Y. ([email protected]).

Received 26 April; accepted 19 September; published online 10 November 2013; doi:10.1038/nsmb.269

In muscle, post-transcriptional downregulation of RBFOX1 expression has a role in the pathology of facioscapulohumeral muscular dystrophy (FSHD). Moreover, animal models with knockout or knockdown of Rbfox protein expression show extensive defects in both neuronal and muscle physiology, further suggesting that this class of RNA-binding proteins plays key roles in normal development.

Here we used genome-wide cross-linking, immunoprecipitation and sequencing (CLIP-seq) assays in mammalian brain to show that more than half of Rbfox binding sites are distally located (>500 nt) from exons and that these distal sites are preserved through evolution. We used RNA-seq measurements of AS to show that distal Rbfox binding sites and distal conserved Rbfox motifs are preferentially associated with exons that are differentially spliced in experiments modeling Rbfox loss and gain. We experimentally demonstrated that these distal Rbfox binding sites directly control splicing in both endogenous genes and in minigene splicing reporters. Finally, we showed long-range RNA-RNA secondary structures that mediate distal splicing regulation by Rbfox. These results indicate that distal intronic regions are rich reservoirs of highly conserved RNA splicing regulation.

RESULTS
Rbfox interacts with distal cis elements in vivo
To generate genome-wide maps of Rbfox-RNA interactions in vivo, we used UV irradiation to crosslink protein–RNA complexes from adult mouse brain and immunoprecipitated them with antibodies specific for either Rbfox1 or Rbfox2 proteins (Supplementary Fig. 1a). Isolated RNA fragments representing Rbfox protein binding sites were sequenced and processed, resulting in 2,071,607 (Rbfox1) and 2,451,256 (Rbfox2) nonredundant alignments (Supplementary Table 1) to the mouse (mm9) genome. We used our CLIP-seq cluster-finding algorithm, CLIPper (available at https://github.com/YeoLab/clipper), to delineate clusters of reads representing regions in the transcriptome significantly (P < 0.01) associated with Rbfox binding. We identified 10,062 Rbfox1 clusters in 3,490 genes and 7,466 Rbfox2 clusters in 2,672 genes, with 1,901 genes containing both Rbfox1 and Rbfox2 clusters (Supplementary Fig. 1b).

We found that the majority of Rbfox clusters (62%) were located within distal intronic regions, which we defined as intronic space >500 nt from any annotated exon (Fig. 1a). Rbfox clusters were also identified within 3′ UTRs, and only a minority of clusters (8%) were located in proximal introns.

The authenticity of distal clusters as bona fide Rbfox binding sites was supported by several observations. First, a de novo motif search with the HOMER algorithm demonstrated that both proximal and distal clusters showed statistically significant enrichment of the TGCATG motif (P < 10^−48 or lower for each set of Rbfox1 and Rbfox2 clusters), compared with the appropriate backgrounds selected from similar genic regions (Fig. 1b; see Supplementary Fig. 1c for other highly ranked motifs).

Figure 1 Characteristics of Rbfox binding in distal intronic regions. (a) Pie charts depict the fraction of Rbfox1 and Rbfox2 clusters, defined from CLIP-seq, within different genic regions, compared to the length distribution of each region across the transcriptome. A schematic of genic region definitions used in this paper is represented below. (b) De novo sequence motifs enriched above background that were similar to the canonical Rbfox motif are listed with their associated P value. (c) Pie charts depict the fraction of Rbfox1 clusters that contain (within 200 nt) the sequence GCATG, conserved in 1 (teal), 2 (burgundy) or 4 (goldenrod) species (species abbreviations: Cf, Canis familiaris; Mm, Mus musculus; Hs, Homo sapiens; Rn, Rattus norvegicus). (d) Bar plots show the fraction of GCATG motifs conserved in 1 (teal), 2 (burgundy) or 4 (orange) species occupied by Rbfox1 (overlapping within 200 nt). (e) Bar charts show the fraction of Rbfox1 CLIP-seq clusters that contain (within 200 nt) GCATG motifs conserved in 1, 2 or 4 species (as in c), shown separately for all, proximal and distal regions (see Supplementary Fig. 1e). (f) A heat map shows a portion of gene ontology categories represented by genes bound in proximal intron (green boxed ‘PI’), distal intron (purple boxed ‘DI’) or 3′ UTR (brown boxed ‘3U’) or by genes exhibiting AS in the RNA-seq experiments in the Rbfox1 or Rbfox2 knockout (KO) or RBFOX ectopic expression (EE). The intensity of gray corresponds to the −log10 (P value) of a hypergeometric test for enrichment in gene ontology categories (see Supplementary Table 2). Categories shown: mRNA splice site selection; regulation of action potential in neuron; T-tubule; sarcolemma; integral to plasma membrane; potassium ion transport; actin cytoskeleton; calcium-activated potassium channel activity; M band; signal transduction; integral to membrane; photoreceptor outer segment; central nervous system neuron differentiation; nervous system development; cytokinesis after mitosis; outer membrane–bounded periplasmic space; microfilament motor activity; cGMP binding; regulation of neuron projection development; 3′,5′-cyclic-nucleotide phosphodiesterase activity; sensory perception of sound; extrinsic to membrane; voltage-gated calcium channel activity; establishment of protein localization; vocalization behavior; ankyrin binding; positive regulation of dendritic spine development; cell adhesion; Z disc; SH3 domain binding; microtubule cytoskeleton organization.
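The definition of a distal intronic site used in the text (more than 500 nt from any annotated exon) can be illustrated with a minimal Python sketch. The function names, the closed-interval exon coordinates and the linear scan are assumptions made for illustration; this is not the CLIPper implementation or the authors' annotation pipeline.

```python
DISTAL_CUTOFF_NT = 500  # threshold stated in the text: distal means >500 nt from any exon


def distance_to_nearest_exon(pos, exons):
    """Return the distance (nt) from pos to the nearest exon, or 0 if pos is inside one.

    exons is a list of (start, end) tuples with inclusive coordinates
    (a hypothetical representation chosen for this sketch).
    """
    best = None
    for start, end in exons:
        if start <= pos <= end:
            return 0
        d = start - pos if pos < start else pos - end
        if best is None or d < best:
            best = d
    return best


def classify_site(pos, exons, cutoff=DISTAL_CUTOFF_NT):
    """Label a position within a gene as 'exon', 'proximal intron' or 'distal intron'."""
    d = distance_to_nearest_exon(pos, exons)
    if d == 0:
        return "exon"
    return "distal intron" if d > cutoff else "proximal intron"


# Toy gene with two exons (coordinates are made up for illustration).
exons = [(1000, 1200), (5000, 5100)]
for pos in (1100, 1500, 3000):
    print(pos, classify_site(pos, exons))
```

A real analysis would use an interval index rather than a linear scan and would also separate UTRs from coding exons, as in the genic-region schematic of Figure 1a.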

An alternative method of statistical analysis (Z-score; Supplementary Fig. 1d and ref. 45) confirmed a significant enrichment for TGCATG, and for hexamers that contained GCATG, for both proximal and distal clusters. In addition, a GU-rich element was present, likely representing other proteins that interact synergistically with Rbfox proteins, as previously observed in CLIP studies of RBFOX2 in human embryonic stem cells.

Second, we evaluated the evolutionary conservation of GCATG sequences within CLIP-defined Rbfox binding sites. Although only a small fraction (<15%) of bound sites contained a GCATG sequence that was evolutionarily conserved between mouse and human (Fig. 1c), GCATG sequences conserved across multiple genomes (mouse and human, rat or dog) were approximately 3.5 times more likely to be Rbfox occupied than GCATG sequences present only in the mouse genome (Fig. 1d). Nevertheless, a statistically significant (P < 0.05) number of distal (and proximal) binding sites contained conserved GCATG motifs, as compared to clusters of similar sizes randomly distributed in introns (Supplementary Fig. 1e).

Third, as Rbfox1 and Rbfox2 proteins interact with the same sequence motif, we measured the correspondence in their binding sites as a measure of functional relevance. Notably, the level of overlap between distal Rbfox1 and Rbfox2 binding sites was similar to that of the proximal sites (Supplementary Fig. 1f). Furthermore, the overlap between both proximal and distal Rbfox1 and Rbfox2 binding sites increased as a function of the degree of GCATG site conservation within clusters (Fig. 1e and Supplementary Fig. 1g).

Last, we found that the ontologies of genes bound in distal intronic regions were similar to those associated with genes that contain exon-proximal binding sites, but also included some additional categories (Fig. 1f; see Supplementary Table 2 for the entire list of statistically significant gene ontology categories and the list of genes within each category). Many of these genes bound by Rbfox in distal intronic regions, such as Shank1, previously implicated in autism, are clearly important for neuronal function. Rbfox was observed to bind both proximal and distal regions downstream of the seizure-associated exon in the Snap25 gene and the stress axis-related exon in the Kcnma1 gene, whose inclusion results in potassium channels that are more sensitive to Ca2+ (Supplementary Fig. 2). We also identified distal Rbfox binding sites a kilobase away from the autoregulated exon encoding an RNA-recognition motif in each of the Rbfox1 and Rbfox2 genes, in addition to the previously known proximal sites (Supplementary Fig. 2). Our genome-wide protein-RNA interaction maps of Rbfox proteins in mouse brains indicated that distal Rbfox sites have sequence conservation properties, gene targets and other features that support their functional roles in RNA regulation and disease etiology.

TGCATG is enriched in conserved regions distal to alternative exons
On the basis of our hypothesis that Rbfox distal splicing enhancers or repressors regulate mammalian exons important in development, we predicted that distal GCATG motifs, like proximal GCATG motifs, are evolutionarily conserved and preferentially enriched in introns flanking AS exons. We tested whether the highly conserved distal regions have two properties expected for AS control regions: enrichment for known splicing regulatory motifs, and preferential association of these motifs with alternative exons more than constitutive exons. To achieve this, we modified a computational strategy (Online Methods; ref. 45) to score hexamers for their statistical enrichment in highly conserved intronic regions relative to weakly conserved intronic regions. We separately scored hexamer enrichment in conserved intronic regions flanking alternative cassette (single-exclusion) exons relative to regions flanking constitutively spliced exons, and then plotted these two scores against each other. This strategy infers function from evolutionary conservation, and relevance for AS regulation (as opposed to another function) from proximity to alternative exons.

First, we computationally identified 655,467 highly conserved regions in the human transcriptome, of average length 51 bases, excluding repetitive DNA or RNA elements, microRNAs, snRNA, rRNAs and transcription factor binding sites identified by the ENCODE consortium (Fig. 2a). Although <1% of intronic space meets the criteria for high conservation, almost half (42%) of all highly conserved transcriptome regions are located within introns, with a great number of these falling in distal regions (35% of the total, or 224,813 regions; Fig. 2b and Supplementary Fig. 3a). We found that the cis-element composition of proximal conserved regions around AS exons (Fig. 2c) was not identical to that of distal regions (Fig. 2d). For example, a CU-rich motif, which is a substrate for the polypyrimidine tract–binding protein (PTB) family of splicing factors, is enriched in proximal conserved regions to AS exons but depleted in distal conserved regions (Fig. 2c,d; yellow triangles), thus suggesting that these motifs are under positive evolutionary selection in proximal regions, but negative selection in distal regions. Another motif (CAATTA) was found to be highly conserved in both proximal and distal regions, but preferentially depleted around AS exons, compared to constitutively spliced exons (Fig. 2c,d; light blue triangles). The Rbfox binding motif, TGCATG (Fig. 2c,d; dark blue circles), was consistently the most enriched conserved cis element associated with AS exons in both proximal and distal regions (P < 0.01 by both criteria).

Figure 2 TGCATG is the most enriched hexamer in conserved regions in distal intronic space around alternatively spliced exons. (a) A flowchart of the computational strategy used to identify highly conserved regions within the human transcriptome. (b) Pie chart showing the distribution of highly conserved regions among different genic regions. (c,d) Scatter plot of enrichment scores for 4,096 hexamers (gray points) in proximal (c) and distal (d) intronic regions. The x axis indicates the enrichment of each word within highly conserved regions, relative to weakly conserved regions. The y axis indicates the enrichment of each word within intronic regions proximal to cassette exons relative to constitutive exons. The five most enriched motifs in highly conserved regions proximal to AS exons, compared to weakly conserved regions proximal to constitutive exons, are highlighted with filled shapes overlaid onto the scatter plot; words similar to these five motifs are shown in an inset (additional motif information is in Supplementary Fig. 3b). TGCATG was significantly (P < 0.01) enriched in both proximal and distal conserved intronic regions flanking AS exons.

© 2013 Nature America, Inc. All rights reserved. motif, motif, at different levels of in evolutionary conservation, the proximal GCATG a or binding Rbfox for evidence CLIP was there which for exons AS unaffected and Rbfox-regulated of proportion the mined ( control) empty-vector an to compared RBFOX by expressing or ectopically cells 293T (in gain pairs) RBFOX sibling wild-type to compared mice knockout (in loss excluded ( 3 in Rbfox2 1e Fig. bound in 3 and distal intronic regions by Rbfox1 and Rbfox2, as compared to ones proximal in bound of genes to categories related are AS more closely by regulation undergo that genes of Notably, categories indirect. are Rbfox2 Rnps1 as such ponents ( correlated positively more were lines cell these in inclusion exon in changes regulated the ( exons cassette of hundreds of AS in resulted cells 293T in RBFOX of each expression ectopic contrast, in direction; same the in changed 379) of (210 half about only ments, (| spliced differentially were that exons ( mice RT-PCR measurements in the publications describing these knockout The degree of differential splicing upon Rbfox loss correlated well with 4f Fig. (see respectively Rbfox2, and Rbfox1 of loss upon tively spliced in brain (changealterna in were the absolutethat valueexons of mouse common) in (379 934 and 620 fied (psi, percent-spliced-in Table 3 Supplementary ( expression gene overall on effect no well as ectopic expression of RBFOX in human 293T cells, had almost as Rbfox2, and Rbfox1 of loss Both plasmids. control or empty-vector RBFOX3 RBFOX2, RBFOX1, either expressing ectopically cells controls wild-type paired and animals (KO) brain mouse whole nestin-conditional from isolated homogenized from RNA-seq strand-specific by identified were exons Rbfox-regulated regulation. AS dependent we general, next the investigated role of GCATGdistal sites in Rbfox- regions flanking annotated AS are enriched in Rbfox binding motifs in (gray). 
region that in all at motifs no with or (orange) downstream or (purple) upstream regions (right) distal or (left) proximal in human) in > 0.1 score BL mouse in > 0.2 score (BL motifs ( mouse 5 Figure Supplementary in is features other and experiments RNA-seq other using analyses of accounting A full region. intronic indicated the in feature a particular possess which exons unaffected versus changed of proportion relative * to according and positive gray, the on negative the on reflected ( included differentially as classified were Exons panel). each above (listed downstream or up regions, intronic distal or proximal within regulation Rbfox direct for panel) each of column first the below (listed evidence ( 3 Figure s e l c i t r a  of columns ‘DI’ and (‘PI’ distal or ∆ a P

– Ψ Cassette Cassette exons were divided into differentially included ( To estimate the extent of Rbfox-dependent exon usage, we calculated < 0.05 and ** and < 0.05 d

) Bar plots depict the fraction of cassette exons for which there is there which for exons cassette of fraction the depict plots ) Bar ≥ 5%; blue, on the positive positive the on blue, 5%; and Supplementary Fig. 4h Fig. Supplementary , knockout, suggesting that some of the AS changes measured measured changes AS the of some that suggesting knockout, ), suggesting a separable biological function of Rbfox1 and and Rbfox1 of function biological separable a suggesting ), g e

) and human ( human ) and and Both proximal and distal Rbfox motifs regulate splicing. splicing. regulate motifs Rbfox distal and proximal Both ∆ Clk4 ′ UTRs by Rbfox1 and Rbfox2 ( ′ Rbfox2 Ψ UTR–mediated gene regulation. gene UTR–mediated Supplementary Table 4 Supplementary < −5%) and unaffected (| were themselves alternatively spliced upon P Mbnl1, < 0.001, for a Fisher’s exact test comparing the the comparing test exact a Fisher’s for < 0.001, RNA-seq experiments in mouse ( experiments RNA-seq f ) cassette exons which have conserved GCATG conserved have which exons ) cassette . . ( Supplementary Fig. 4j Fig. Supplementary y y Ψ axis and also reflected on the negative negative the on reflected also and axis

e ). axis) or not changing (−2% < (−2% changing not or axis) Mbnl2, , ) values ) for values annotated AS exons. We identi f ) Cumulative distributions of of distributions ) Cumulative y Supplementary Fig. 4f Fig. Supplementary , axis), excluded ( excluded axis), i ; refs. refs. ;

Prpf18, Rbfox1 Fig. 3a Fig. for the list of regulated exons). exons). of regulated list for the 28 Supplementary Fig. 4a Fig. Supplementary ∆ Supplementary Fig. 4g Fig. Supplementary

Ψ , Cwc22, −/− ∆ 2 Fig. Fig. 1 9 | | < 2%) categories by Rbfox Ψ – ). In mouse brains, of the the of brains, mouse In ). and and d | | ), upstream ( upstream ), 28 ≥ ). RNA-splicing com RNA-splicing ). ∆ f 5%) in both experi both in 5%) , 2 Ψ

and Rsrc1, a Rbfox2 9

, , and human 293T 293T human and , ≤ b −5%; goldenrod goldenrod −5%; ) ) and human ( ∆ Ψ Supplementary Supplementary Supplementary Supplementary ∆ Ψ – or |

Ψ Raly, < 2%; < 2%; −/− j ). Wedeter ). values for for values Rbfox1 ∆ ∆ knockout knockout advance online publication online advance Fig. 3a Fig. Ψ Ψ

y Thrap3,

axis) axis) | – > 5%), ≥ ), but but ), e c and and 5%) and , d

, c ). ). - - - - - )

Figure 3 (panels a-f; graphics not reproduced). Panels a-d: fraction of exons (%) showing inclusion, exclusion or no splicing change that contain conserved GCATG motifs (Cons GCATG) or CLIP-defined Rbfox1 or Rbfox2 binding sites in upstream and downstream proximal intronic (PI) and distal intronic (DI) regions, upon Rbfox2 loss or RBFOX2 gain. Panels e,f: cumulative fraction of exons versus ∆Ψ for exons with an upstream conserved motif, a downstream conserved motif or no flanking conserved motif.

Upon Rbfox2 loss in mouse brain, a statistically significantly higher fraction of differentially included AS exons (blue bars) than unaffected exons (gray bars) contain a conserved GCATG motif in the upstream proximal region (Fig. 3a), whereas a higher fraction of excluded AS exons (golden bars) contain conserved GCATG motifs in the downstream proximal region (Fig. 3a). Interestingly, exons included upon Rbfox2 loss are depleted of CLIP-defined binding sites in downstream proximal intronic regions (Fig. 3a). As expected, inverse effects were observed when RBFOX2 was ectopically expressed in 293T cells, for motifs downstream of included and upstream of excluded exons, respectively (Fig. 3c). RBFOX1 and RBFOX3 experiments had similar, but less dramatic, effects on splicing (Supplementary Fig. 5a). Therefore, Rbfox interaction within proximal intronic regions was associated with Rbfox regulation, confirming previous 'position-dependent' rules: upstream Rbfox binding is associated with repression of exon recognition, whereas downstream binding correlates with exon inclusion.

We next examined the association of distal Rbfox interaction with splicing changes. In Rbfox2 loss, a statistically higher fraction of excluded exons than unaffected exons contained conserved, upstream, distal motifs and CLIP-defined Rbfox2 binding sites (Fig. 3b). Notably, unlike what we found in proximal regions, a higher proportion of differentially included exons than unaffected exons contained conserved distal, downstream motifs (Fig. 3b). Also, a higher fraction of excluded than unaffected exons contained downstream, distal CLIP-defined Rbfox binding sites, for both Rbfox1 CLIP and Rbfox2 CLIP (Fig. 3b). In RBFOX2 ectopic expression in human cells, we found that a higher fraction of excluded than unaffected exons had downstream distal conserved motifs (Fig. 3d).

As further support for the hypothesis that distal Rbfox sites can elicit regulatory effects on AS, the cumulative distributions of ∆Ψ values for mouse exons with either upstream or downstream, distal conserved GCATG motifs are statistically significantly different (P < 0.05 and P < 0.006 for up- and downstream motifs, respectively; two-sample Kolmogorov-Smirnov (KS) test) compared to the distribution of ∆Ψ values for exons without intronic conserved GCATG motifs (Fig. 3e).
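The comparison of ∆Ψ distributions rests on the two-sample Kolmogorov-Smirnov statistic, the largest vertical gap between two empirical cumulative distributions. A minimal sketch in Python follows. The function name `ks_statistic` and the toy ∆Ψ values are illustrative assumptions, not the authors' code or data, and the P value calculation is deliberately left out.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|.

    F_a and F_b are the empirical cumulative distribution functions of the
    two samples. Both are step functions that only change at observed
    values, so it is enough to evaluate the gap at every observed value.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        f_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(f_a - f_b))
    return d


# Hypothetical delta-Psi values (not taken from the paper).
with_distal_motif = [-0.30, -0.22, -0.15, -0.08, 0.02]
without_motif = [-0.05, -0.01, 0.00, 0.03, 0.06]
d = ks_statistic(with_distal_motif, without_motif)
```

A larger `d` means the two cumulative curves separate more strongly; converting `d` into a P value requires the Kolmogorov distribution, which a statistics library would normally supply.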

In ectopic expression experiments, only the distribution of ∆Ψ values for exons with downstream conserved motifs was significantly different from background (Fig. 3f). Although the directionality of AS changes mediated by distal sites is likely more complex than that for proximal sites, these two complementary approaches demonstrate that conserved GCATG motifs and in vivo distal Rbfox binding sites are active splicing regulatory elements associated with Rbfox-dependent AS changes.

Distal Rbfox sites regulate AS isoforms in vitro and in vivo

To demonstrate proof of principle that distal Rbfox motifs regulate AS, we investigated RBFOX2-dependent AS exons from two human genes: KIF21A exon 23 (E23) and ENAH (also called MENA) exon 11a (E11a). Biochemical evidence demonstrates that distal intronic sites flanking these exons interact with Rbfox1 in mouse brains and Rbfox2 in both mouse brains and human 293T cells (Fig. 4a,d). Furthermore, binding sites are conserved across genomes (Fig. 4b,e). The distal TGCATG motifs in both exons are at least 500 base pairs away from any exon, allowing us to assess their functionality at long distances.

KIF21A is a member of the kinesin superfamily that is overexpressed in Down syndrome and mutated in congenital fibrosis of the extraocular muscles type 1. Inclusion of a 21-nt exon (E23) in KIF21A is RBFOX2-dependent, and when we analyzed its downstream flanking intron, we found both conserved proximal and distal RBFOX motifs. The two distal sites are located 42 nt apart in a short region of homology ~3.3 kilobases (kb) downstream of the alternative exon and ~550 nt upstream of the next exon. To assess whether these distal conserved sites function in the context of endogenous transcripts to alter splicing, we tested the ability of antisense morpholino oligonucleotides (MOs) designed against these sites to alter E23 splicing in HS578T cells (Fig. 4c, top). Inclusion of E23 was reduced from ~39% in mock-treated cells to ~18% in cells treated with an MO against the first distal site, indicating that this TGCATG motif regulates E23 splicing from a distance. An MO complementary to the second distal site had no effect, either because this site does not regulate splicing in this cell line or because the site's physical conformation inhibits MO efficacy. As a control, an MO directed against a heterologous event in the cytoskeletal gene EPB41 altered splicing of its intended target transcript but did not affect E23 splicing (Fig. 4c, bottom). We concluded that one of the distal RBFOX motifs downstream of E23 is a strong distal splicing enhancer and is required for optimal splicing even in the presence of conserved proximal sites.

In contrast to KIF21A E23, the only conserved GCATG sequences flanking ENAH E11a are located distally, in the downstream intron ~1.8 kb from the regulated exon. ENAH E11a shows notably reduced inclusion during epithelial-mesenchymal transition and is spliced in a breast cancer subtype-specific manner. E11a splicing is regulated by RBFOX2, and our genome-wide CLIP assays identified binding sites for Rbfox1 and Rbfox2 1.8 kb downstream of E11a in mouse brain, in human 293T cells and in human embryonic stem cells (Fig. 4d). A group of three motifs is well conserved in at least 34 mammalian genomes and can also be found at orthologous positions in avian genomes (Fig. 4e). The absence of conserved GCATG sequences in the proximal flanking intron suggested that the conserved distal sites mediate RBFOX-dependent splicing enhancer activity. We investigated the function of these distal RBFOX sites in endogenous transcripts in an in vivo context using vivo-MOs (vMOs, MOs with chemical modifications that improve their efficiency in vivo; see Online Methods). E11a was partially included in mRNA isolated from kidney and liver of mice treated with a saline control (Fig. 4f, upper panel, mock). Injection of a single vMO complementary to two of the conserved RBFOX motifs (ENAH vMO1 and ENAH vMO2) greatly reduced E11a inclusion in both tissues (Fig. 4f, upper panel). Control vMOs that target a heterologous event (EPB41 vMO) showed splicing changes only in their intended targets. We conclude that RBFOX proteins regulate AS of E11a under physiological conditions from distal conserved binding sites.

To further validate the role of distal RBFOX sites and exclude off-target effects of the vMO, we transfected the human breast cancer cell line HCC1954 with various three-exon minigene splicing reporters representing the E11-E11a-E12 region of the human ENAH gene (Fig. 4g). In strong support of our hypothesis that these distal sites are functional, inclusion of E11a was reduced to almost complete exon skipping by mutating the three conserved distal GCATG sequences (Fig. 4h, lanes 1 and 2). In contrast, mutation of two nonconserved GCATG sequences (open ovals) had little effect on splicing (Fig. 4h, lane 3). Co-precipitation of biotinylated RNA containing two of the distal RBFOX sites with in vitro-translated RBFOX2 confirmed protein-RNA binding, which was lost when we mutated the RBFOX sites (Fig. 4i). In summary, the above experiments functionally demonstrate that distal evolutionarily conserved GCAUG elements control splicing of exon E23 in the KIF21A gene and of E11a in the ENAH gene and, more globally, there is a strong statistical association between the presence of distal RBFOX sites and RBFOX-regulated exons. Next we addressed the mechanism of this molecular phenomenon.

Figure 4 Distal conserved regions containing Rbfox sites control splicing of upstream alternative exons. (a,d) Genomic regions showing KIF21A exon 23 (a) and ENAH exon 11a (d) and neighboring intronic and exonic regions. The location of GCATG sequences in both genes is marked by orange bars. Rbfox protein-RNA binding sites that overlap distal conserved GCATG motifs are outlined with a gray box and match the highest density of CLIP-seq reads (graphed as continuous densities) for RBFOX2 in 293T cells (iCLIP; green track), Rbfox1 in mouse brain (CLIP-seq; red track) and Rbfox2 in mouse brain (CLIP-seq; blue track). PhastCons scores of evolutionary conservation are represented as continuous densities in dark green. (b,e) Phylogenetic conservation of TGCATG (red) and GCATG (blue) elements within the highlighted distal intronic regions (boxed in a,d) in the KIF21A (b) and ENAH (e) genes. (c,f) Cartoon representations of binding sites for MOs or vMOs targeted to block distal RBFOX sites in KIF21A (c) and ENAH (f) are shown. Bottom panels show RT-PCR analysis of KIF21A and ENAH splicing in the presence or absence of morpholinos. Ψ is listed below each lane, as quantified by image densitometric analysis. Mock, transfection reagent only. (g) Three-exon minigenes consisting of ENAH E11-E11a-E12 are illustrated. Ovals represent conserved (filled) or nonconserved (unfilled) TGCATG (red) and GCATG (blue) sequences. mutC, mutated conserved sites; mutN, mutated nonconserved sites. (h) RT-PCR analyses of minigene-derived transcripts. Ψ is listed below each lane. (i) Pull-down assay measuring in vitro-translated RBFOX2 protein binding to biotinylated RNA containing consensus TGCATG motifs (lane 1) or mutated RBFOX motifs (lane 2).

Figure 5 An RNA bridge between ENAH E11a and a conserved distal RBFOX site is necessary for exon inclusion. (a) A schematic of the strategy used to find RNA bridges. Regions proximal to exons were tested for the ability to pair to all positions in the distal region in the same intron. (b) The fraction of cassette (light green) and constitutive (black) exons with neighboring predicted RNA bridges as the negative minimum free energy (mfe) threshold for defining an RNA bridge is made more stringent. The ratio between these fractions is depicted as a red line. (c) All predicted RNA-RNA interactions within E11-E11a-E12 (green) of the ENAH pre-mRNA are displayed in a circos plot (at left). Paired regions are classified as strong (mfe < −50 kcal per mol; red) or weak (mfe > −50 kcal per mol; gray; the figure key gives −40 > mfe ≥ −50). PhastCons conservation scores are illustrated on the circumference of the circle. A stem-loop structure that is conserved across mammalian and even avian genomes is shown with base-paired nucleotides indicated in green. The location of this RNA bridge is shown in detail below. Vertebrate conservation from phastCons is represented as continuous density in dark green. (d) Wild-type and mutated (red letters) RNA duplex structures are shown. (e) RT-PCR analysis of ENAH structural mutants in transfected T47D cells. Labels above each numbered lane correspond to experiments using each of the structures in d. Ψ is listed below each lane. Prox, proximal; comp, compensatory mutations. (f) Model illustrating the function of the RNA bridge to position Rbfox sites (red ovals) close to an exon to regulate splicing.
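The mfe thresholds in the Figure 5 legend, and the cassette-to-constitutive ratio plotted in panel b, can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the function names, the choice of treating a bridge as present when −mfe is at or above the cutoff, and the toy mfe values, which are not data from the paper.

```python
def classify_duplex(mfe):
    """Classify a predicted RNA duplex by minimum free energy (kcal/mol).

    Thresholds follow the figure key: strong if mfe < -50,
    weak if -40 > mfe >= -50, otherwise not reported (None).
    """
    if mfe < -50.0:
        return "strong"
    if mfe < -40.0:
        return "weak"
    return None


def fraction_with_bridge(best_mfe_per_exon, cutoff):
    """Fraction of exons whose most stable duplex has -mfe >= cutoff.

    best_mfe_per_exon holds one mfe per exon, or None when no duplex
    was predicted near that exon (an assumed input layout).
    """
    if not best_mfe_per_exon:
        raise ValueError("need at least one exon")
    hits = sum(1 for m in best_mfe_per_exon if m is not None and -m >= cutoff)
    return hits / len(best_mfe_per_exon)


# Hypothetical best-duplex energies for two small exon sets.
cassette = [-62.0, -48.5, None, -55.1]
constitutive = [-41.0, None, None, -52.3]

cutoff = 50.0
ratio = fraction_with_bridge(cassette, cutoff) / fraction_with_bridge(constitutive, cutoff)
```

Sweeping `cutoff` over a range of values and recording `ratio` at each step gives the kind of curve described for the red line in panel b.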

A long-range RNA bridge mediates AS regulation by RBFOX

Distal Rbfox binding sites must be recruited to an alternative exon in order to productively enhance spliceosomal activity. We reasoned that RNA secondary structure might provide a 'bridge' that links distal cis elements with their exon targets. To investigate a role for such RNA bridges, we first scanned all cassette and constitutive exons for potential RNA-RNA interactions between exon-proximal and exon-distal intronic segments using RNAhybrid (Fig. 5a). By requiring candidate RNA bridges to be evolutionarily conserved, as expected for developmentally important structures, we pared the initial list of more than 2 million RNA bridges to approximately 24,000 conserved RNA bridges. These bridges were significantly enriched near alternatively spliced exons, compared to constitutively spliced exons (P < 0.05 by χ2 test; Fig. 5b). Moreover, the relative enrichment of RNA bridges near alternative exons increased as a function of duplex stability, a variable not dependent on conservation levels, suggesting that conserved RNA bridges are more common and more thermodynamically stable around alternatively spliced exons than around constitutive exons (Fig. 5b, red line).

We further enriched our search for RNA bridges that may play a role in distal Rbfox regulation by identifying structures that had a distal arm within 50 nt of a conserved GCATG site (BL score ≥ 0.3; see Online Methods). By this approach, we found 699 predicted RNA bridges around 125 exons (there are multiple potential RNA bridges per exon). When we focused only on bridges around exons that were altered upon ectopic expression of RBFOX (|∆Ψ| > 5% in any experiment), we identified 162 predicted RNA bridges around 19 exons. Among these, we found an RNA bridge connecting ENAH E11a to the distal site characterized above (Fig. 5c), but, notably, we did not predict an RNA bridge around the KIF21A E23 AS event that met the strict filtering criteria we applied. Some AS events, including those in the HnRNPR gene (not shown) and ENAH, had predicted RNA bridges that met the above criteria and also had strong biochemical evidence for RBFOX2 binding from our iCLIP in 293T cells (Fig. 4).

Downstream of ENAH E11a, our computational scan for RNA structures retrieved two overlapping RNA bridges predicted to connect a conserved region 30-120 nt immediately proximal to E11a to a similarly conserved region 10-100 nt upstream of the distal RBFOX cluster (Fig. 5c). The proximal and distal arms of the structure were separated by a putative loop of 1.1-1.7 kb (mammals) or 0.6 kb (chicken), and each arm consisted of two subdomains. This structure was conserved in most mammalian genomes and also in the chicken genome, with evidence for compensatory mutations that maintain structure but not primary sequence (Fig. 5c). We hypothesized that this stem-loop structure could bridge or recruit the distal RBFOX sites close to E11a, in effect recapitulating the classical example of Rbfox splicing regulation proximal to alternative exons.

To probe the role of this structure in regulating E11a splicing, we generated minigene constructs that contained either (i) disrupted RNA-RNA interactions in each subdomain of the structure, (ii) compensatory mutations that rescue RNA-RNA interactions but not primary sequence or (iii) a structure that had a perfectly complementary RNA bridge (Fig. 5d). We found a dramatic reduction in E11a inclusion upon mutation of any subdomain (referred to as STEM-a and STEM-b) of the predicted stem structure (Fig. 5e; lanes 2, 3, 5 and 6), even though the RBFOX sites remained intact. We combined the two STEM-a mutations to create construct STEM-a comp, which restored base pairing (Fig. 5e; lane 4) and rescued E11a inclusion to half of the normal level. Better rescue of E11a inclusion was observed in the STEM-b compensatory mutation (Fig. 5e; lane 7). Consistent with these results, we also found that a double-mutant construct (STEM-ab prox) completely abrogated inclusion (data not shown), whereas the double-compensatory mutation (STEM-ab comp) recovered more than half of E11a inclusion (Fig. 5e; lane 8). Finally, a mutation that extended the original base pairing by creating a perfect 42-nt stem actually increased E11a splicing efficiency above normal levels (Fig. 5e; lane 9). On the basis of these results, we theorize that AS regulation can be mediated by long-range RNA-RNA interactions between paired sequences that form an 'RNA bridge' to recruit a distal RBFOX site close to its target exon (Fig. 5f).

DISCUSSION

Aberrant regulation of post-transcriptional RNA processing networks is increasingly recognized as a major cause of human genetic disease. Establishing the consequence of RBP interactions will be key to understanding the molecular basis of human diseases and the basic mechanisms that drive cellular processes. Our protein-RNA interaction maps reveal that Rbfox proteins bind not only proximally but also distally to exons and, furthermore, that these sites actively regulate alternative splicing. Minimally, this suggests there is a vast and untapped trove of information that could be used to predict the inclusion of exons according to cell state or environment. Aside from reaffirming the positional rules whereby proximal binding of Rbfox upstream of an exon suppresses and binding downstream of an exon enhances exon inclusion, we have identified hundreds of distal sites that are associated with Rbfox-regulated splicing. Furthermore, we provide evidence that long-range RNA-RNA interactions can function over kilobase distances and that such RNA bridges may be common components of distal regulatory mechanisms in AS control.

RNA secondary structures have long been known to alter splicing patterns in yeast (refs. 52–54), Drosophila (refs. 55,56) and mammalian pre-mRNAs (ref. 57). These have most often been observed to loop out exons or splice sites to induce their skipping (also reviewed in ref. 59). Early work in yeast also showed that intra-intronic base-pairing interactions could enhance the splicing of long introns. In mammals, secondary structures have been shown to alter the activity of regulatory proteins by blocking or removing their binding sites (ref. 60); here we show that secondary structures can also function as chaperones for RBP-mediated long-range splicing regulation. Our data indicate that RNA bridges can function by dramatically shortening the distance between a distally bound splicing regulator and its target exon; indeed, close examination of iCLIP-seq data in Figure 4d revealed evidence for RBFOX2 interactions at the proximal stem region of the bridge in ENAH intron 11a that lacks GCATG motifs and would not be expected to bind RBFOX2 directly. In summary, RNA-RNA interactions within introns can have major effects on splicing and provide a versatile mechanism for juxtaposing splicing controllers with synergistic or antagonistic relationships. Beyond that, it adds an additional layer of AS regulation through promotion or inhibition of RNA-bridge formation, for example, through RNA editing.

Our results suggest that long-distance regulation of splicing is more far-reaching than the small number of earlier reports might imply. Abundant regulatory information located deeper within introns also represents an underappreciated source of disease-causing mutations. As the number of sequenced human genomes increases, our catalogs of RNA binding sites and conserved regions will enable nucleotide-level, functional association of natural variation and disease with AS. We conclude that future studies of AS networks in normal development and in disease must consider both proximal and distal RNA binding sites and noncanonical molecular mechanisms that lead to distal enhancer activity in order to accurately predict RNA splicing.

As we have demonstrated with vMOs, these cis-regulatory elements provide an important opportunity for targeted rationally designed therapeutic interventions, such as those that are proving effective and specific in the treatment of splicing-related diseases.

METHODS

Methods and any associated references are available in the online version of the paper.

Accession codes. Raw .fastq files of RNA-seq and CLIP-seq data are available from the NCBI Sequence Read Archive under accession codes SRP029987 and SRP030031.

Note: Any Supplementary Information and Source Data files are available in the online version of the paper.

ACKNOWLEDGMENTS

The authors would like to thank A. Pasquinelli, N. Chi, K. Willert and L. Goldstein and members of the Yeo, Conboy and Goldstein labs for critical reading of the manuscript. M.T.L. is supported as a National Science Foundation GK12 Fellow. This work was supported by grants from the National Institute of Health to G.W.Y. (U54 HG007005, R01 HG004659, R01 GM084317 and R01 NS075449) and to J.G.C. (HL045182 and DK094699) and partially supported by grants to J.W.G. (CA112970 and CA126551). J.G.C. also acknowledges support from DK032094. This work was also supported by the Director, Office of Science, and Office of Biological & Environmental Research of the US Department of Energy under Contract No. DE-AC02-05CH1123. D.L.B. and L.T.G. were supported by US National Institutes of Health grant RO1 GM49662 to D.L.B. D.L.B. is an Investigator of the Howard Hughes Medical Institute. M.T.L. and G.W.Y. are grateful for a gift from P. Yang at Genentech that supported M.T.L. G.W.Y. is supported as an Alfred P. Sloan Research Fellow.

AUTHOR CONTRIBUTIONS

M.T.L. and G.A.P. conducted the bioinformatics analyses. D.G., H.M., J.A., S.G., M.P., T.Y.L., T.J.S., S.H. and K.B.M. conducted biological experiments. L.T.G. and D.L.B. generated the Rbfox mutant mice and isolated brain RNA. J.W.G., J.G.C. and G.W.Y. designed the study. M.T.L., D.G., J.G.C. and G.W.Y. wrote the manuscript with input from all authors.

COMPETING FINANCIAL INTERESTS

The authors declare no competing financial interests.

Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

1. Black, D.L. Mechanisms of alternative pre-messenger RNA splicing. Annu. Rev. Biochem. 72, 291–336 (2003).
2. Matlin, A.J., Clark, F. & Smith, C.W. Understanding alternative splicing: towards a cellular code. Nat. Rev. Mol. Cell Biol. 6, 386–398 (2005).
3. Wang, Z. & Burge, C.B. Splicing regulation: from a parts list of regulatory elements to an integrated splicing code. RNA 14, 802–813 (2008).
4. Licatalosi, D.D. et al. HITS-CLIP yields genome-wide insights into brain alternative RNA processing. Nature 456, 464–469 (2008).
5. Yeo, G.W. et al. An RNA code for the FOX2 splicing regulator revealed by mapping RNA-protein interactions in stem cells. Nat. Struct. Mol. Biol. 16, 130–137 (2009).
6. Hafner, M. et al. Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell 141, 129–141 (2010).
7. Polymenidou, M. et al. Long pre-mRNA depletion and RNA missplicing contribute to neuronal vulnerability from loss of TDP-43. Nat. Neurosci. 14, 459–468 (2011).
8. Tollervey, J.R. et al. Characterizing the RNA targets and position-dependent splicing regulation by TDP-43. Nat. Neurosci. 14, 452–458 (2011).
9. Lagier-Tourenne, C. et al. Divergent roles of ALS-linked proteins FUS/TLS and TDP-43 intersect in processing long pre-mRNAs. Nat. Neurosci. 15, 1488–1497 (2012).
10. Huelga, S.C. et al. Integrative genome-wide analysis reveals cooperative regulation of alternative splicing by hnRNP proteins. Cell Rep. 1, 167–178 (2012).
11. Wilbert, M.L. et al. LIN28 binds messenger RNAs at GGAGA motifs and regulates splicing factor abundance. Mol. Cell 48, 195–206 (2012).
12. Zarnack, K. et al. Direct competition between hnRNP C and U2AF65 protects the transcriptome from the exonization of Alu elements. Cell 152, 453–466 (2013).

20. 19. 18. 17. 16. 15. 14. 13. 38. 26. 25. 24. 23. 22. 21. 48. 47. 46. 45. 44. 43. 42. 41. 40. 39. 37. 36. 35. 34. 33. 32. 31. 30. 29. 28. 27.

Guo, N. & Kawamoto, S. An intronic downstream enhancer promotes 3 promotes enhancer downstream intronic An S. Kawamoto, & N. Guo, Y.Barash, Zhang, C. Xue, Y. T.Y.evolutionarily Liang, of & analysis Yeo,G.W.,E.L. and Nostrand, Discovery Van J. Ule, J.I. Hoell, König, J. Sebat, J. Sebat, eai T, eeln BM & oc P Dsa rglto o atraie piig by splicing alternative of regulation Distal P. Dovc, & B.M. Peterlin, T., Lenasi, phosphatase myosin a of Splicing S.A. Fisher, & S.A. W.P., Mohamed, Dirksen, structure stem A M.A. Garcia-Blanco, & E.J. Wagner, E.L., A.P.,Lasda, Baraniak, depends exon EIIIB fibronectin the of splicing P.A.Alternative Sharp, L.P.& Lim, splicing: J.Y.alternative Wu, pre-mRNA & Caspase-2 Z. Jiang, S., Dupuis, J., Coté, Lapuk, A. Heidary, G., Engle, E.C. & Hunter, D.G. Congenital fibrosis of the extraocular the of fibrosis Congenital D.G. Hunter, & E.C. Engle, G., Heidary, Salemi, M. encyclopaedia. human the ENCODE: B. Maher, evolutionarily of analysis and Discovery T.Y. Liang, & E.L. Nostrand, G.W., Yeo, produce to expression protein Fox of Autoregulation D.L. Black, & A. Damianov, by channels potassium of splicing alternative of Control D.P. McCobb, & J. Xie, Johansson, J.U. D. Sato, M. Pistoni, L.K. Davis, C.L. Martin, K. Bhalla, J. Lim, motifs RNA-binding with protein novel A S.M. Pulst, & D.P. Huynh, H., Shibata, splicing The Conboy,J.G. & I. Dubchak, Schokrpur, S., S.L., Gee, Minovitsky, S., J.P. Venables, T.L. Gallagher, G.W. Yeo, L.T. Gehman, L.T. Gehman, to interact PSF and Fox-3 S. Kawamoto, & R.S. Adelstein, Y.C., Kim, K.K., Kim, usage of a neural cell-specific exon. cell-specific neural a of usage Fox-2. and Fox-1 36 skipping. or inclusion exon modulate to repressor splicing general the by elements. regulatory splicing intronic conserved 444 Biol. Mol. Struct. Nat. resolution. nucleotide 869–876 (2007). 869–876 splicing enhancer in equine beta-casein intron 1. intron beta-casein equine in enhancer splicing (2003). 
a and cis-elements intronic element. enhancer/silencer by exonic bipartite novel regulated is exon alternative 1 subunit targeting approximatingby introniccontrolelements. transcripts cell-type-specificmediates 2 receptor factor fibroblast splicinggrowth in TGCATG repeats. specific on USA Sci. Acad. Identification of an intronic element containing a decoy 3 cancer. breast in muscles. Sci. elements. regulatory splicing intronic conserved factors. splicing negative dominant hormones. stress complexneuronaldevelopment/function.for Genet. Hum. 9 in a mouse model of facioscapulohumeral muscular dystrophy (FSHD). (2012). hemiparesis. developmental and Science autism. for gene candidate a Genet. Hum. gene. J. A2BP1 the disrupt epilepsy) and retardation (mental phenotypes abnormal degeneration. cell Purkinje of disorders ataxin-2. with interacts flank tissue-specific alternative exons. regulatoryelement, UGCAUG, is phylogenetically and spatially conserved in introns that (2013). tissues. cancer and normal both in splicing specific functions. muscle skeletal and cardiac progenitors. neural and cells brain. mammalian the in excitation function. motor mature and development (2011). activate neural cell-specific . , e1003186 (2013). e1003186 , , 996–1006 (2009). 996–1006 ,

, 580–586 (2006). 580–586 , 34 et al. , 569–571 (2013). 569–571 , et al. et et al. et

316 et al. et et al. Semin. Ophthalmol. Semin. et al. et et al. et al. t al. et et al. et et al. et et al. et al. et t al. et Genome-wide analysis of PTB-RNA interactions reveals a strategy used A protein-protein interaction network for human inherited ataxias and ataxias inherited human for network interaction protein-protein A

, 445–449 (2007). 445–449 , An RNA map predicting Nova-dependent splicing regulation. splicing Nova-dependent predicting map RNA An et al. et t al. et 90 iCLIP reveals the function of hnRNP particles in splicing at individual SHANK1 deletions in males with autism spectrum disorder. spectrum autism with males in deletions SHANK1 t al. et Strong association of de novo copy number mutations with autism. with mutations number copy novo de of association Strong Defining the regulatory network of the tissue-specific splicing factors Exon-level microarray analyses identify alternative splicing programs t al. et t al. et

et al. t al. et KIF21A mRNA expression in patients with Down syndrome. Deciphering the splicing code. splicing the Deciphering Alternative splicing events identified in human embryonic stem embryonic human in identified events splicing Alternative 49 The de novo 16 translocations of two patients with patients two of translocations 16 chromosome novo de The , 879–887 (2012). 879–887 , 98 Rbfox1 downregulation and altered calpain 3 splicing by FRG1 by splicing 3 calpain altered and downregulation Rbfox1 Science Mol. Cancer Res. Cancer Mol. N tres f idtp ad uat E fml proteins. family FET mutant and wild-type of targets RNA Genes Dev. Genes Cytogenetic and molecular characterization of A2BP1/FOX1 as A2BP1/FOX1 of characterization molecular and Cytogenetic ae neie AB1 eein n poad ih autism with proband a in deletion A2BP1 inherited Rare , 308–311 (2004). 308–311 , , 938–943 (2001). 938–943 , h slcn rgltr bo2 s eurd o bt cerebellar both for required is Rbfox2 regulator splicing The

An ancient duplication of exon 5 in the Snap25 gene is required BO2 s n motn rgltr f eecya tissue- mesenchymal of regulator important an is RBFOX2 h slcn rgltr bo1 AB1 cnrl neuronal controls (A2BP1) Rbfox1 regulator splicing The bo-euae atraie piig s rtcl o zebrafish for critical is splicing alternative Rbfox-regulated Nat. Struct. Mol. Biol. Mol. Struct. Nat. nature structural & molecular biology molecular & structural nature 18 Hum. Mol. Genet. Mol. Hum.

, 1428–1431 (2011). 1428–1431 , 280

Mol. Cell Biol. Cell Mol. Am. J. Med. Genet. B. Neuropsychiatr.Genet. B. Genet. Med. J. Am. PLoS Comput. Biol. Comput. PLoS

22 , 443–446 (1998). 443–446 , 23 , 2550–2563 (2008). 2550–2563 , , 3–8 (2008). 3–8 ,

m J Md Gnt A. Genet. Med. J. Am. 8 RNA Nat. Genet. Nat. NucleicAcidsRes. , 961–974 (2010). 961–974 , J. Biol. Chem. Biol. J. Dev. Biol. Dev. Cell

16 Genes Dev. Genes

9 Mol. Cell Biol.CellMol. PLoS Genet.PLoS

, 405–416 (2010). 405–416 , , 1303–1313 (2000). 1303–1313 , 18

125 17 Nucleic Acids Res. , 3900–3906 (1998). 3900–3906 ,

Nature Nature , 909–915 (2010). 909–915 , PLoS Genet. PLoS PLoS Genet. PLoS 43 , 801–814 (2006). 801–814 ,

J. Biol. Chem. Biol. J. 359 3 RNA , 706–711 (2011). 706–711 ,

, e196 (2007). e196 , 275

Mol. Cell Biol. Cell Mol. 26 , 251–261 (2011). 251–261 ,

33 489

12 465 , 33641–33649 (2000). 33641–33649 , ′ , 445–460 (2012). 445–460 , 4

acceptor site. 23 , e1000278(2008)., , 714–724, (2005). , 498–507 (2006). 498–507 , , 46–48 (2012). 46–48 , , 53–59 (2010). 53–59 , , 9327–9337(2003).,

3 3 158A , e85 (2007). e85 , , e85 (2007). e85 ,

278

39

1654–1661 , 33 , 9722–9732 , , 3064–3078 PLoS Genet. ′ , 396–405 , splice site splice Proc. Natl. Mol. Cell Mol. Neurol.

Nature Am. J. Am. 144B ,

54. 53. 52. 51. 50. 49.

Rogic,S. secondary P.G.,RNA Ferreira, E. C., Eyras, Codony-Servat, & M., J. Plass, Vilardell, positively are efficiency splicing and choice site Splice M. Rosbash, V. & Goguel, K.A. Dittmar, C.C. Warzecha, manipulation vivo in Efficient J.G. Conboy, & N. Mohandas, S., Gee, M.K., Parra, (2008). in splicing of efficiency the (2012). 1103–1115 3 alternative mediates structure (1993). yeast. in pairing base intramolecular pre-mRNA by influenced sequencing.1468–1482 (2012). high-throughput by network posttranscriptional transition. epithelial-mesenchymal the mice. in morpholinos antisense using Chem. Biol. J. events splicing pre-mRNA alternative of etal.

Correlation between the secondary structure of pre-mRNA introns and 286 t al. et et al. et , 6033–6039 (2011). 6033–6039 , eoewd dtriain f bod ESRP-regulated broad a of determination Genome-wide An ESRP-regulated splicing programme is abrogated during abrogated is programme splicing ESRP-regulated An acaoye cerevisiae Saccharomyces ′ ss selection in selection ss EMBO J. EMBO Saccharomyces cerevisiae Saccharomyces

29 , 3286–3300 (2010). 3286–3300 ,

. M Genomics BMC advance online publication online advance o. el Biol. Cell Mol. Cell

72 , 893–901 893–901 , .

RNA 9 355 ,

18 32 , ,

60. 59. 58. 57. 56. 55.

af MB, igl JV, o Hpe, .. Brln, .. h poen factors protein The J.A. Berglund, & P.H. Hippel, von J.V., Diegel, M.B., Warf, alternative of mechanisms the and structure RNA Graveley,B.R. & C.J. McManus, A1 hnRNP High-affinity B. Chabot, & M. Cordeau, S., Hutchison, F.U., Nasim, Pervouchine, D.D. structure secondary RNA long-range a iStem, The Graveley,B.R. & J.M. Kreahling, of Modulation D.D. Pervouchine, & M.S. Gelfand, A.A., Mironov, V.A., Raker, Natl. Acad. Sci. USA splicing. Sci. Acad. regulate Natl. to structures RNA alternative bind U2AF65 and MBNL1 splicing. 8 mechanism. repression and out looping common a of support in selection site 5 on effects similar have repeats inverted duplex-forming and sites binding structures. RNA long-range conserved and Biol. Cell Mol. the in inclusion exon efficient for required element 37 alternative splicing by long-range RNA structures in structures RNA long-range by splicing alternative , 1078–1089 (2002). 1078–1089 , , 4533–4544 (2009). 4533–4544 ,

Curr. Opin. Genet. Dev. Genet. Opin. Curr.

25 , 10251–10260 (2005). 10251–10260 , et al.

106 Evidence for widespread association of mammalian splicing , 9203–9208 (2009). 9203–9208 ,

21 , 373–379 (2011). 373–379 , RNA

18 , 1–15 (2012). 1–15 , Drosophila Drosophila s e l c i t r a . Dscam pre-mRNA. Dscam Nucleic Acids Res. Acids Nucleic ′ splice Proc. RNA 

ONLINE METHODS

CLIP-seq library generation and analysis. CLIP-seq libraries were constructed as previously described (ref. 7). Fresh brains from 8-week-old female C57Bl/6 mice were rapidly dissociated by forcing through a cell strainer with a pore size of 100 µm (BD Falcon) before ultraviolet cross-linking (400 mJ/cm²). Using antibodies against the proteins Rbfox1 (custom generated in the Black lab, UCLA) and Rbfox2 (Bethyl Laboratories) (10 µg antibody per ml tissue lysate), we generated CLIP-seq libraries that were sequenced on the Illumina GAIIx platform. Raw reads were subjected to custom trimming scripts that removed adaptor sequences and truncated reads at low-quality bases or 10-mer homopolymers. Reads were filtered through a consensus catalog of genomic repetitive elements by mapping with bowtie (ref. 61) (parameters: -q -p 1 -e 100 -l 20). Reads that did not map to repetitive elements were mapped to the human (hg19) or mouse (mm9) reference genomes using GSNAP (ref. 62). GSNAP was supplied with the location of exon junctions from Ensembl and UCSC genes and mapped with the following parameters: -t 2 -N 1 -n 10 -Q -B 5. CLIP-seq reads from replicate libraries were combined after quality checks and the analysis of clusters from each library revealed an enriched GCATG motif. CLIP-seq reads were collapsed to remove PCR artifacts using samtools rmdup (ref. 63). A new cluster-identification algorithm, CLIPper (CLIP-seq peak enrichment; https://github.com/YeoLab/clipper), was developed to identify clusters representing binding sites for Rbfox1 and Rbfox2. For each pre-mRNA, we determined the minimum height of CLIP-seq reads required to satisfy a user-defined false discovery rate (FDR) of 0.05, based on our previous method. Next, we interpolated the heights of reads across the length of the pre-mRNA using cubic splines. From the fitted curve, peaks, centers and widths that represented clusters were identified. The number of reads expected within each cluster was estimated using the Poisson distribution from the total number of reads that mapped within the entire length of the pre-mRNA (ref. 64). In addition to this 'pre-mRNA' cutoff, for each cluster, we also recalculated these cutoffs for reads within 1 kb and included any putative clusters that were significant by either this local analysis or by the gene-wide analysis described above. Clusters were assigned to genic regions on the basis of the following order of priority: Exon > 3′ UTR > 5′ UTR > Proximal Intron > Distal Intron. Background regions to compute statistically significant motifs were selected four times for each cluster by deriving an equal-length random 'cluster' in a randomly selected location in the same type of genic region in any gene. The software packages pybedtools (ref. 65) and bedtools (ref. 66) were used to enumerate overlaps between clusters and motifs. To determine the statistical significance of the extent of overlap, regions were randomly located ten times, keeping the ratio of locations in the different genic regions the same. A Z score with corresponding P value was computed from the shuffled mean and s.d.

De novo motif analysis for Rbfox clusters. Parameters supplied for motif finding with HOMER's findMotifs program were: -p 4 -rna -S 10 -len 5,6,7,8,9. Cluster sequences and background selected from the same genic region as real CLIP clusters were supplied as a fasta file.

Multispecies alignments. Alignments were obtained directly from MultiZ tracks (hg19 46-way and mm9 30-way alignments) available from the UCSC Genome Browser, and avian genome alignments (Figs. 4 and 5) were manually adjusted for local accuracy where supplied alignments were unsatisfactory.

Branch-length (BL) scoring to measure conservation of GCATG motifs. Rbfox motifs were scored by walking through every position of the transcriptome and summing the edit distance from a TGCATG motif across all aligned orthologs. Edit distances were multiplied by the branch length (supplied by UCSC) to a given ortholog after we used a sigmoid function to severely penalize edit distances <0.8. Then we expressed this score as a fraction (0 to 1) of the maximum score possible if all orthologs contained a perfect match to an aligned TGCATG. Last, we excluded positions where the target genome (either mouse or human) did not contain a GCATG pentamer.

De novo motif analysis for conserved regions. Parameters supplied for motif finding with HOMER's findMotifsGenome program were: "-size given -S 100 -rna -float -bits -len 6 -nlen 0 -noweight -h -minlp 0". Background locations supplied were weakly conserved regions flanking constitutive exons.
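The Poisson step described under "CLIP-seq library generation and analysis" can be illustrated with a minimal sketch. It is not the CLIPper implementation: the function names, the uniform-rate assumption (expected reads scale with cluster width) and the `alpha` threshold are hypothetical choices made here for illustration, and the tail probability is computed by direct summation, which is adequate only for modest expected counts.

```python
import math

def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam) by summing the lower tail."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)          # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i            # P(X = i) derived from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)     # clamp tiny negative rounding errors

def cluster_p_value(reads_in_cluster, cluster_width, total_reads, region_length):
    """P value for a cluster, assuming reads fall uniformly over the region."""
    expected = total_reads * cluster_width / region_length
    return poisson_sf(reads_in_cluster, expected)

def cluster_is_significant(reads_in_cluster, cluster_width,
                           gene_reads, gene_length,
                           local_reads, local_length=1000, alpha=0.05):
    """Keep a cluster if it passes the gene-wide OR the local (1 kb) test.

    alpha is a hypothetical threshold; the text does not state one.
    """
    p_gene = cluster_p_value(reads_in_cluster, cluster_width,
                             gene_reads, gene_length)
    p_local = cluster_p_value(reads_in_cluster, cluster_width,
                              local_reads, local_length)
    return p_gene <= alpha or p_local <= alpha
```

For example, a 50-nt cluster holding 20 reads in a 10-kb pre-mRNA with 200 mapped reads has an expected count of 1 read, so it would be retained under this sketch.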

doi:10.1038/nsmb.2699

Identification and word frequency analysis of conserved regions. Contiguous regions with phastCons score, S, were divided into categories of low (0 ≤ S ≤ 0.3), moderate (0.3 < S ≤ 0.9) and high (0.9 < S ≤ 1.0) levels of evolutionary conservation. If a region is separated by only 1 nt from another region, both regions were combined. Conservatively, regions longer than 2 kb or shorter than 10 nt were eliminated from further consideration. Regions that overlapped RepeatMasker (ref. 67) (http://www.repeatmasker.org/) annotated repeats and microRNA or ribosomal RNA genes were removed. We also removed any conserved regions overlapping with transcription-factor binding sites as defined by experiments conducted by the ENCODE Consortium (obtained from the UCSC genome browser here: http://hgdownload-test.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeRegTfbsClustered/wgEncodeRegTfbsClustered.bed.gz). We found that ~20% of our highly conserved regions overlapped with a transcription factor binding site. After defining conserved regions, they were assigned to functional genic categories (exon, 5′ untranslated region, 3′ untranslated region, proximal intron or distal intron). To compute the coverage of each genic region, we divided the number of nucleotides represented by each category of conservation by the total number of nucleotides in that region. Alternative and constitutive splicing annotations were defined using available transcripts from Ensembl compiled with previously published methods (ref. 16). In total, we analyzed 23,982 annotated protein-coding genes, 18,551 cassette (or skipped) exons and 164,920 constitutive exons. A χ² enrichment score for each motif was computed with the counts for each hexamer as follows: (1, x axis of Fig. 2) counts in highly conserved regions relative to weakly conserved regions and, (2, y axis of Fig. 2) counts in highly conserved regions associated with exons that were alternatively spliced, compared to hexamers in highly conserved regions associated with constitutively spliced exons. Background for each test was the number of other hexamers in these regions. Enrichment scores are multiplied by the direction of enrichment, which was determined as the sign of the difference between ratios of hexamer counts over background. The software package HOMER (ref. 68) was used to identify degenerate motifs that were significantly enriched in highly conserved regions around cassette exons versus lowly conserved regions around constitutive exons, as inset into Figure 2c. To identify 6-mers that were similar to the HOMER-derived motif, we computed a Pearson correlation coefficient between the 6-mer and motif. A correlation coefficient greater than 0.8 constituted a similar motif.

RNA-seq library generation and analysis. Total RNA from whole brain of Rbfox1 and Rbfox2 nestin-specific knockout one-month-old male and wild-type sib-pairs was extracted by Trizol as per manufacturer's instructions. Total RNA from human 293T cells after ectopic expression of N-terminal Flag-tagged mouse Rbfox1 (NP_067452) (ref. 69), human RBFOX2 (AAL6715) and mouse Rbfox3 (NM_00102493) in pcDNA3.1 (Life Technologies) 48 h after transfection using Lipofectamine 2000 (Life Technologies) was subjected to Trizol extraction. RBFOX1, RBFOX2 and RBFOX3 levels in transfected cells were measured by qRT-PCR and were evaluated to be six-, four- and fourfold higher than control vector pcDNA3.1 expressing only the Flag epitope, respectively. RNA-seq libraries were prepared using 8 µg of total RNA, and subjected to poly(A) selection. After strand-specific dUTP library preparation (ref. 70), cDNA corresponding to 150–225 nt fragments was single-end sequenced with Illumina GAIIx to 101 nt. Raw reads were subjected to custom trimming scripts which removed adaptor sequences and truncated reads at low-quality bases or 10-mer homopolymers. Reads were filtered through a consensus catalog of genomic repetitive elements by mapping with bowtie (ref. 61) (parameters: -q -p 1 -e 100 -l 20). Reads that did not map to repetitive elements were mapped to the human (hg19) or mouse (mm9) reference genomes using GSNAP (ref. 62). We supplied GSNAP with the location of exon-junctions from Ensembl and UCSC genes and mapped with the following parameters: -t 2 -N 1 -n 10 -Q -B 5. Mapped reads were used to quantify gene-expression and splicing measurements as was done in ref. 7 with improvements that allowed us to use spliced reads mapped to the genome instead of an exon junction database. Our RNA-seq analysis verified reductions in mRNA levels of Rbfox1 (by 63%) and Rbfox2 (by 84%) in the Rbfox1 and Rbfox2 knockout mice, respectively. Also, as was previously observed by western blot (refs. 28,29), we found reciprocal compensatory increases of 21% for Rbfox2 mRNA upon Rbfox1 loss and 10% for Rbfox1 upon Rbfox2 loss. Percent-spliced-in (Ψ) values were calculated as half the number of reads mapped to all inclusion isoforms over the number of reads mapped to all
exclusion isoforms plus half the number of reads mapped to all inclusion isoforms. Only exons with evidence for alternative splicing from Ensembl annotations (human: GRCh37 v65, mouse: NCBIm37) were evaluated for differential splicing.

Gene ontology. Gene ontology analysis consisted of a hypergeometric test comparing the fraction of genes in each ontology category that appeared relevant (bound, differentially expressed or alternatively spliced) in a particular high-throughput experiment to the fraction of all expressed (RPKM > 0.5 in the species-appropriate RNA-seq experiment) genes in that category. Reported P values are Bonferroni-corrected for multiple-hypothesis testing and clustered along rows and columns using a Euclidean distance metric.

Distal association of Rbfox sites with regulated exons. We categorized cassette exons into classes undergoing three types of regulation. Exons were either included (ΔΨ > 5%), excluded (ΔΨ < −5%) or unaffected (|ΔΨ| < 2%) according to the RNA-seq experiments described above. Furthermore, exons were required to have ≥30 or ≥50 reads mapped across exon-junctions showing evidence for inclusion or exclusion in human and mouse experiments, respectively, and a flanking intron ≥1.5 kb in at least one direction. We tested several features we expected to be associated with exons that were regulated upon Rbfox depletion or ectopic expression. In mouse experiments, the features we examined were (i) Rbfox GCATG motifs, (ii) conserved Rbfox GCATG motifs with a BL score ≥ 0.1, (iii) conserved Rbfox GCATG motifs with a BL score ≥ 0.4, (iv) Rbfox1 CLIP clusters, and (v) Rbfox2 CLIP clusters. The features we tested in human were (i) Rbfox GCATG motifs, (ii) conserved Rbfox GCATG motifs with a BL score ≥ 0.1, and (iii) conserved Rbfox GCATG motifs with a BL score ≥ 0.2. We calculated significance by a Fisher's exact test comparing the proportion of changing cassette exons with at least one of a given feature in a given region to the proportion of non-changing cassette exons with that feature in that region.

Splicing-reporter construction. The wild-type minigene contains a 7.3-kb fragment of the human ENAH gene encompassing the exon 11-11a-12 region, plus ~50 nt of upstream and downstream intron sequence. Primers used to amplify the E11-E12 region were as follows:
Forward primer: 5′-tggaattctgcagatGTCTGGCATTGTGCAAATTAGA-3′;
Reverse primer: 5′-gccactgtgctggatCATTCAGGATCCATGTCAAAGA-3′.
The lower case nucleotides provided 15 nt overlaps with EcoRV-linearized pcDNA3.1 vector. Insert and vector were assembled together using the In-Fusion Advantage kit according to the manufacturer's instructions (Clontech). In-Fusion technology was also used to introduce mutations at the deep intron RBFOX2 sites. First, the entire wild-type splicing reporter, except a small region containing the wild-type RBFOX sites, was PCR-amplified so as to generate a linearized construct opened at the intron enhancer region. Complementary 39-mer oligonucleotides containing the three mutated RBFOX sites, and 15 nt overlaps with the linearized splicing reporter, were annealed together and inserted into the vector by In-Fusion methods. Primers used to amplify the wild-type minigene:
Forward primer: 5′-TTAAAAATTTGACTGTTTCCACAATTG-TTTATTACA-3′;
Reverse primer: 5′-TCAGTCTAACAGTCAATCCATCACCACCACCACCAC-3′.
Oligonucleotides containing the mutated RBFOX sites (TGACTG, AGACTG and TGACTG in the forward strand):
Forward: 5′-TGACTGTTAGACTGAATTAAATTTTTAAAAATTTGACTG-3′;
Reverse: 5′-CAGTCAAATTTTTAAAAATTTAATTCAGTCTAACAGTCA-3′.
The nonconserved RBFOX sites were modified using multisite mutagenesis to mutate both nonconserved sites in one reaction, using the
following primers in which the mutated RBFOX motifs are represented in upper case:
Primer 1: 5′-catggtTGACTGtgcctgtgggaggctg-3′;
Primer 2: 5′-ggaataggtAGACTGagtgaatatatgaaataacatcc-3′.

Splicing analysis of endogenous transcripts and minigene reporters. Splicing reporter minigenes were transfected into human HCC1954 or T47D cells using FuGene HD Transfection Reagent (Roche). Total RNA was extracted from cells with Qiagen's RNeasy Mini Kit, and then reverse-transcribed into cDNA using random primers and the Superscript III First Strand Synthesis System (Invitrogen). Subsequent PCR analysis was performed using AccuTaq polymerase (Sigma). Amplification of minigene E11a inclusion and exclusion products as a measure of splicing efficiency was done using primers in intron 10 (forward primer 5′-GCATTGTGCAAATTAGAGTCCTT-3′) and intron 12 (reverse primer 5′-CAGGATCCATGTCAAAGATATGC-3′). Amplification of mouse endogenous ENAH transcripts and the assay of human KIF21A splicing each used a forward and a reverse primer, drawn from the following four sequences:
5′-GTTTAAAGGAGCATCCTCATCAGT-3′;
5′-GCTCTGCTTCAGCCTGTCATAG-3′;
5′-GAAATAACCAGTGCTACCCAAAAC-3′;
5′-GCTGAGAAGGGATCAACAATAG-3′.
Uncropped images of gels are shown in Supplementary Figure 6.

Morpholino treatments. We obtained 25-nt antisense morpholino sequences from Gene Tools, LLC (Philomath, OR). The vivo morpholino for mouse experiments contains a covalently linked octaguanidine dendrimer as a delivery moiety to facilitate entry into cells (ref. 71). The enhancer blocking sequence for ENAH was as follows: 5′-TAATTCATGCTACCATGCAATCCAC-3′; the CATGCT and CATGCA sequences are complementary to two of the conserved RBFOX binding motifs. Mice received tail vein morpholino injections at 15 mg/kg on two consecutive days, then RNA was purified from selected tissues on the third day. Tissues were rinsed in 1× PBS and snap frozen in an ethanol and dry ice bath and stored at −80 °C. For KIF21A experiments, morpholinos (without a covalently linked octaguanidine dendrimer) blocking the distal RBFOX sites were delivered to HS578T cells in 12 well plates using 5 µl of morpholino and 6 µl of endoporter (Gene Tools, LLC). RNA was extracted 48 h after treatment. Morpholino sequences were as follows:
KIF21A distal1: 5′-ACACATCAGCATGCAGCTCATTCAC-3′;
KIF21A distal2: 5′-CATGCAACAGCTCTGTAAACTAATA-3′.

61. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
62. Wu, T.D. & Nacu, S. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics 26, 873–881 (2010).
63. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
64. Zisoulis, D.G. et al. Comprehensive discovery of endogenous Argonaute binding sites in Caenorhabditis elegans. Nat. Struct. Mol. Biol. 17, 173–179 (2010).
65. Dale, R.K., Pedersen, B.S. & Quinlan, A.R. Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics 27, 3423–3424 (2011).
66. Quinlan, A.R. & Hall, I.M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
67. Smit, A., Hubley, R. & Green, P. RepeatMasker Open-3.0. (1996–2010).
68. Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
69. Underwood, J.G., Boutz, P.L., Dougherty, J.D., Stoilov, P. & Black, D.L. Homologues of the Caenorhabditis elegans Fox-1 protein are neuronal splicing regulators in mammals. Mol. Cell Biol. 25, 10005–10016 (2005).
70. Parkhomchuk, D. et al. Transcriptome analysis by strand-specific sequencing of complementary DNA. Nucleic Acids Res. 37, e123 (2009).
71. Morcos, P.A., Li, Y. & Jiang, S. Vivo-Morpholinos: a non-peptide transporter delivers morpholinos into a wide array of mouse tissues. Biotechniques 45, 613–623 (2008).
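The percent-spliced-in (Ψ) definition and the ΔΨ classes stated in the Online Methods can be written out as a short sketch. The function names are hypothetical, read counts are assumed to be supplied by the caller, and ΔΨ is assumed here to mean Ψ in the perturbed sample minus Ψ in the control, a direction the text does not spell out. Exons whose |ΔΨ| falls between 2% and 5% belong to none of the three classes.

```python
def psi(inclusion_reads, exclusion_reads):
    """Percent-spliced-in: (inclusion / 2) / (exclusion + inclusion / 2).

    Returns None when no reads support either isoform.
    """
    half_inclusion = inclusion_reads / 2.0
    denominator = exclusion_reads + half_inclusion
    if denominator == 0:
        return None
    return half_inclusion / denominator

def classify_exon(delta_psi):
    """Assign a cassette exon to one of the three regulation classes.

    delta_psi is a fraction (0.05 corresponds to 5%).
    """
    if delta_psi > 0.05:
        return "included"
    if delta_psi < -0.05:
        return "excluded"
    if abs(delta_psi) < 0.02:
        return "unaffected"
    return None  # between the thresholds: not assigned to any class

# Usage: 20 inclusion and 10 exclusion reads in control,
# 60 inclusion and 10 exclusion reads after the perturbation.
control = psi(20, 10)
perturbed = psi(60, 10)
label = classify_exon(perturbed - control)
```

Halving the inclusion count reflects that an included cassette exon is supported by two junctions while the skipped isoform is supported by one.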