1

2

3

4 Loss of the multifunctional RNA-binding RBM47 as a 5 source of selectable metastatic traits in breast cancer

6

7

8 Sakari Vanharanta1, 2, 5, Christina B. Marney3, 5, Weiping Shu1, Manuel Valiente1, 9 Yilong Zou1, Aldo Mele3, Robert B. Darnell3,4,* and Joan Massagué1,4,* 10

11 1) Cancer Biology and Genetics Program, Memorial Sloan-Kettering Cancer 12 Center, New York, New York 10065, USA

13 2) MRC Cancer Unit, MRC/Hutchison Research Centre, University of Cambridge, 14 Cambridge, CB2 0XY, United Kingdom

15 3) Laboratory of Molecular Neuro-Oncology, The Rockefeller University, 1230 16 York Avenue, New York, NY 10021, USA

17 4) Howard Hughes Medical Institute, Chevy Chase, Maryland 20815, USA

18 5) These authors contributed equally to this work

19

20

21

22 * For correspondence: [email protected] (J.M.) or 23 [email protected] (R.B.D)

24

1 25 SUMMARY

26 The mechanisms through which cancer cells lock in altered transcriptional 27 programs in support of metastasis remain largely unknown. Through 28 integrative analysis of clinical breast cancer expression datasets, cell 29 line models of metastatic breast cancer, and mutation data from cancer 30 genome resequencing studies, we have identified RNA binding motif 31 protein 47 (RBM47) as a suppressor of breast cancer progression and 32 metastasis. RBM47 inhibited breast cancer re-initiation and growth in 33 experimental models. Transcriptome-wide HITS-CLIP analysis revealed 34 widespread RBM47 binding to mRNAs, most prominently in introns and 35 3’UTRs. RBM47 altered splicing and abundance of a subset of its target 36 mRNAs. Some of the mRNAs stabilized by RBM47, as exemplified by 37 dickkopf WNT signaling pathway inhibitor 1 (DKK1), inhibit tumor 38 progression downstream of RBM47. Our work identifies RBM47 as a 39 multifunctional RNA-binding protein that can suppress breast cancer 40 progression. More generally, these findings demonstrate how the 41 inactivation of a broadly targeted RNA chaperone can establish a platform 42 for the selection of a pro-metastatic transcriptomic state.

43

44

45

2 46 INTRODUCTION

47 Cancers arise through an evolutionary process that feeds from stochastic genetic 48 alterations and selection [1]. The identities of the alterations that get selected for 49 are rapidly coming to light through large-scale resequencing efforts. For example, 50 several independent studies have characterized the mutational complement of 51 breast cancer, one of the most common human malignancies [2-8]. Besides 52 confirming previously known cancer , such as TP53 and PIK3CA, most 53 studies also report a long tail of rarely mutated genes. While many of these 54 mutations are likely to be passenger events, some of them are potential 55 mediators of tumor phenotypes. How to identify such low-frequency driver 56 mutations remains a challenge.

57 In addition to mutations that directly promote tumorigenesis through specific 58 alterations in cell signaling and repair pathways, many aberrations found in 59 cancers do not affect cell signaling pathways directly, but rather, they support the 60 stabilization of altered transcriptomic profiles that facilitate the emergence of pro- 61 tumorigenic and metastatic traits. Mutations in epigenetic regulators fall into this 62 category [9]. The phenotypic output of such alterations would depend on the 63 activity of already existing signaling processes. Indeed, examples of epigenetic 64 alterations that result in a phenotypic trait in the presence of a specific 65 transcriptional program have been described [10]. Analogously, aberrations in the 66 multi-step mRNA processing and turnover cascade [11] could also lock in 67 aberrant transcriptomic states.

68 Precise regulation of RNA metabolism is fundamental in the generation of 69 biological complexity in both normal and disease states [12, 13]. The concerted 70 action of multiple RNA binding (RBPs) regulate the spatial, temporal and 71 functional dynamics of the transcriptome via alternative splicing, alternative 72 polyadenylation and transcript stability [11]. While malignancy-associated 73 dysregulation of RNA metabolism via aberrant microRNA expression is relatively 74 well established [14, 15], a growing body of evidence indicates a prominent role 75 also for RBPs in both the development and progression of cancer. For example,

3 76 upregulation of the splicing regulator SRSF1 is associated with multiple tumor 77 types [16], and is necessary for the oncogenic activity of MYC in lung cancer [17]. 78 Conversely, RBM5 has been shown to be tumor suppressive in several cancer 79 models [18-20].

80 We hypothesized that combining cancer genome resequencing data with gene 81 expression information from both clinical data sets and experimental model 82 systems of metastasis would allow the identification of rarely mutated cancer 83 genes with potential functional significance. This approach identified RNA binding 84 motif protein 47 (RBM47) as a suppressor of breast cancer progression. By 85 analyzing the transcriptome-wide RBM47 binding patterns we demonstrate that 86 RBM47, a previously uncharacterized RNA-binding protein, modulates mRNA 87 splicing and stability. Loss of RBM47 function thus provides a specific example of 88 the power of global RNA modulatory events in the selection of pro-metastatic 89 phenotypic traits.

90

91

4 92 RESULTS

93 RBM47 inactivation associated with breast cancer progression

94 We combined data from triple negative metastatic breast cancer 95 models [21, 22] and a cohort of 368 untreated clinical breast cancer cases [22, 96 23] with mutational data from a brain metastasis that originated from a basal 97 breast cancer [2] (Figure 1A). Specifically, we looked for genes that had reduced 98 mRNA expression in functionally metastatic cancer cells, evidence for low mRNA 99 expression associated with poor patient outcome in clinical samples, and an 100 enriched mutation in the brain metastasis sequenced by Ding et al. [2]. RBM47, a 101 gene encoding a previously uncharacterized putative RNA-binding protein was 102 the only one that fulfilled all these criteria [24-26]. We confirmed the lower 103 expression of RBM47 mRNA in the highly metastatic cells (Figure 1B). This 104 translated into a comparable difference at the protein level (Figure 1C). In the 105 clinical data sets, low RBM47 mRNA expression was significantly associated with 106 relapse to brain and lung (Figure 1D-E) but not to bone (Figure 1F). In 107 multivariate analysis combining RBM47 expression with estrogen, progesterone 108 and HER2 receptor status (ER, PR and HER2), the association with brain 109 metastasis remained statistically significant (Figure 1 – figure supplement 1A). 110 We further characterized the expression patterns of RBM47 in the TCGA cohort 111 of 748 breast cancer samples studied by RNA-seq [3, 27]. We found that low 112 RBM47 expression was significantly associated with claudin-low and basal breast 113 cancers (Figure 1G), two subtypes of poor prognosis [28, 29].

114 The RBM47I281fs mutation first reported in a brain metastasis truncates the protein 115 from the third RNA recognition motif (RRM) onwards (Figure 1H). As this 116 mutation was already present in a minority subpopulation of the corresponding 117 primary tumor [2], we looked for additional evidence of genetic RBM47 118 aberrations in primary breast cancer cohorts. The catalogue of somatic mutations 119 in cancer (COSMIC) database [30] reported 9 non-synonymous mutations in 120 RBM47, three of which were frameshift mutations truncating one or more of the 121 RRM domains (Figure 1 – figure supplement 1B). Furthermore, analysis of the

5 122 data from the TCGA cohort revealed that in basal breast cancer, RBM47 was 123 targeted by a mutation or homozygous deletion in ~8% of the cases (Figure 1 – 124 figure supplement 1C). Moreover, heterozygous loss of the RBM47 locus was 125 present in 30% of the TCGA cohort [27]. These correlative analyses of multiple 126 different breast cancer data sets, both experimental systems and large clinical 127 patient cohorts, suggested that reduced expression or function of RBM47 is 128 associated with breast cancer progression already within primary tumors, and 129 that clones with reduced RBM47 function may display enhanced lung and brain 130 metastatic fitness.

131 RBM47 suppresses breast cancer progression

132 In order to test the role of RBM47 as a suppressor of breast cancer progression, 133 we stably introduced both wild type RBM47 and RBM47I281fs in the lung 134 metastatic (LM2) and brain metastatic (BrM2) derivatives of the MDA-MB-231 135 triple negative breast cancer cells (MDA231 for short), respectively (Figure 2 – 136 figure supplement 1A) [21, 22]. Of note, despite robust mRNA expression, the 137 mutant RBM47I281fs protein levels were low, indicating that the mutant protein was 138 unstable and therefore inactive (Figure 2 – figure supplement 1B). As determined 139 by in vivo bioluminescence in experimental metastasis assays, wild type RBM47 140 inhibited lung colonization by the MDA231-LM2 cells, when compared to 141 RBM47I281fs (Figure 2A-B). Similarly, we observed extended brain metastasis free 142 survival in mice inoculated with the MDA231-BrM2 cells expressing RBM47 when 143 compared to those expressing RBM47I281fs (Figure 2C). This difference translated 144 into reduced brain metastatic burden as determined by ex vivo imaging (Figure 145 2D-E and Figure 2 – figure supplement 1C). We then tested the effects of 146 inhibiting endogenous RBM47 in the metastatic colonization of the parental 147 MDA231 cells. With comparable effects of RBM47 re-expression seen in both the 148 lung and brain metastasis models (Figure 2A and 2E), we chose the lung 149 colonization assay for loss-of-function studies as this allowed the simultaneous 150 analysis of a greater number of tumor re-initiation events. Two RBM47-targeting 151 shRNA constructs significantly shortened the lung metastasis free survival in

6 152 mice when compared to controls (Figure 2F and Figure 2 – figure supplement 153 1D), as determined by bioluminescence imaging (Figure 2G-H). This result was 154 confirmed with a second more indolent cancer cell clone (Figure 2 – figure 155 supplement 1E-G).

156 Clonal heterogeneity in RBM47 sensitivity

157 The initial functional experiments suggested that the tumor suppressive effect of 158 RBM47 on the overall population of metastatic cancer cells was modest. This 159 could reflect either weak tumor suppressive function of RBM47 in general, 160 heterogeneity in the sensitivity to RBM47 among different cancer cell 161 subpopulations, or in the case of RBM47 reintroduction, loss of transgene 162 expression. In order to distinguish between these possibilities, we used 163 immunohistochemistry to assess RBM47 expression in the lung nodules formed 164 either by wild type RBM47 or RBM47I281fs expressing cells. This revealed that 165 some cancer clones were able to form robust lung metastasis even in the 166 presence of RBM47 (Figure 3A and Figure 3 – figure supplement 1A), but that 167 many of the metastases formed after the inoculation of wild type RBM47- 168 expressing cells had avoided or suppressed the expression of RBM47 (Figure 3B 169 and Figure 3A – figure supplement 1A). As expected, the RBM47I281fs-expressing 170 tumors contained only weakly staining cancer cells intermingled with small cells 171 with strong RBM47 expression (Figure 3C), similar to those seen in normal lung 172 parenchyma (Figure 3D). The rate of proliferation as determined by Ki67 173 immunohistochemistry did not correlate with the level of RBM47 expression 174 (Figure 3 – figure supplement 1A). This was in line with the idea that some clones 175 were more sensitive to RBM47 than others, but also that a strong selective 176 pressure led metastatic cells to lose RBM47, a finding consistent with the initial 177 observation of RBM47 loss in metastatic cell populations (Figure 1C).

178 In order to allow better experimental control over RBM47 expression we utilized a 179 conditional expression system. By focusing on the two brain metastatic cell lines 180 that expressed the lowest levels of endogenous RBM47 (Figure 1C) we 181 generated single cell-derived clones with doxycycline-inducible expression of

7 182 either wild type RBM47 (henceforth WT10 and WT6, respectively) or the patient- 183 derived mutant RBM47I281fs (henceforth MUT3). All clones exhibited dose- 184 dependent RBM47 mRNA upregulation upon doxycycline induction (Figure 3 – 185 figure supplement 2A-C) that translated into increased RBM47 protein expression 186 in the wild type-expressing clones (Figure 3 – figure supplement 2D-E). Cancer 187 cells also tolerated RBM47 in vitro (Figure 3 – figure supplement 2F). After 188 confirming the feasibility of doxycycline-mediated conditional gene activation in 189 metastatic brain lesions (Figure 3 – figure supplement 2G), we inoculated WT6, 190 WT10 and MUT3 cells into immunocompromized mice and assessed their brain- 191 metastatic phenotype. All clones formed robust brain metastases under 192 doxycycline-free conditions (Figure 3E-J). The induction of RBM47 expression 193 with doxycycline in the diet inhibited robustly brain colonization of both metastatic 194 cell clones, WT6 and WT10 (Figure 3E-G). However, the patient-derived mutant 195 RBM47I281fs did not show any tumor suppressive effects (Figure 3H-J). 196 Collectively, these results demonstrated that RBM47 was able to strongly inhibit 197 metastatic functions of some cancer clones, whereas other clones were able to 198 form metastasis despite the presence of RBM47.

199 Transcriptome-wide identification of RBM47 binding sites

200 To determine whether RBM47 can directly bind RNA in vivo, we made use of the 201 ability of UV-irradiation at 254nm to induce chemical crosslinks between RNA 202 and proteins that are in direct contact [31]. γ-32P-labeled RNA was detected by 203 autoradiogram at ~76kD, the predicted size of Flag-RBM47, in Flag- 204 immunoprecipitates from UV-irradiated, doxycycline-treated MDA231-BrM2 WT 205 Flag-RBM47 cells, but not doxycycline-treated empty vector control or non- 206 irradiated WT Flag-RBM47 expressing cells (Figure 4A). To identify directly 207 bound RBM47 targets, a modified version of the high throughput sequencing and 208 cross-linking immunoprecipitation (HITS-CLIP) protocol [32, 33] was carried out 209 in duplicate on Flag-RBM47 expressing MDA231-BrM2 cells treated with 210 doxycycline. (Figure 4B, see Materials and Methods). RBM47-bound HITS-CLIP 211 reads were mapped to the , yielding ~7.7x106 and ~2.0x106

8 212 unique reads (tags) per replicate. Seventy-five percent of the tags mapped to 213 regions corresponding to UCSC/Refseq genes, with a high degree of 214 reproducibility of binding observed between replicates (Spearmann correlation 215 coefficient R2=0.998, total tags per gene, Figure 4C).

216 To identify robust and reproducible RBM47-binding sites, tags were clustered to 217 return regions with evidence of binding in both replicates (biological complexity, 218 BC=2) with increasingly stringent filters (Figure 4D): tags per cluster (tags≥2, 219 617,026 clusters; tags≥10, 94,966 clusters), a previously described significant 220 peak threshold (significant peak height ≥10, 29,562 clusters [34]), and a ranked 221 reproducibility chi-squared score (p≤0.01, 19,433 clusters [35]). As has been 222 previously described for the n-ELAV proteins [36], identification of the most 223 robust binding through increased stringency of cluster definition led to an 224 increase in proportional binding in 3’UTR regions. For RBM47 this occurred due 225 to loss of the majority of reproducible, yet relatively small intronic binding sites 226 (Figure 4E), which may reflect the relative abundance of pre- and mature RNA 227 message. Motif analysis (MEME, [37]) failed to identify an enriched RBM47- 228 binding sequence in +/-10nt footprints centered on the top 3,000 significant 229 peaks (data not shown), but revealed a polyU sequence enriched in RBM47- 230 binding sites containing cross-link induced mutations (deletion CIMS, [38], Figure 231 4F). The apparent lack of an enriched RBM47-binding motif within robust CLIP- 232 derived binding sites may reflect the broad binding patterns observed, as 233 exemplified by the predominantly 3’UTR binding in DKK1 and IL8 (Figure 4G).

234 RBM47 regulates alternative splicing

235 Reproducible binding of RBPs in intronic regions has proven to be predictive of a 236 role in pre-mRNA processing for multiple proteins. To explore the relationship 237 between RBM47 intronic binding and alternative splicing, RNA-seq was carried 238 out in triplicate to compare MDA231-BrM2 cells and RBM47-expressing WT10 239 cells. Reads were mapped to cassette (CA) exon junctions as described 240 previously for Mbnl2 [39], and an average inclusion rate (IR) calculated for each 241 cell type to allow for the identification of reciprocal splicing changes while

9 242 normalizing for changes in RNA stability [40] (Figure 5A). The average change in 243 inclusion rate (ΔI) was then calculated such that positive ΔI indicates RBM47- 244 dependent cassette exon inclusion. This analysis revealed 121 and 140 CA 245 exons with significant RBM47-dependent inclusion and exclusion, respectively. 246 To assess whether RBM47-binding occurred in the vicinity of these splice sites, 247 HITS-CLIP tags in BC2 tags≥5 clusters (to account for lower levels of intronic 248 binding seen in Figure 4E) in the region of the alternative splice were calculated. 249 Forty-eight RBM47-bound included and 49 RBM47-bound excluded CA exons 250 were identified in a total of 84 genes (Figure 5B and Supplementary file 1). 251 RBM47-dependent splicing changes were confirmed via RT-PCR as shown for 252 SLK, MDM4, LIMCH1, MBNL1 and SEC31A (Figure 5C and Figure 5 – figure 253 supplement 1).

254 By mapping normalized RBM47 CLIP tags associated with RBM47-dependent 255 splicing changes on a composite transcript [32] we generated an RNA binding 256 map of consensus binding sites within 1kb of exon-intron boundaries, with 257 respect to exon inclusion or exclusion (Figure 5D). The resulting normalized 258 complexity map reveals enriched RBM47 binding in the vicinity of splice acceptor 259 sites of the CA exon and 3’ CE for included alternative isoforms, while relative 260 enrichment of RBM47 binding was seen at the 5’ CE splice donor and 3’CE 261 splice acceptor of excluded isoforms.

262 RBM47 affects mRNA steady state levels

263 To further study the functional consequences of RBM47-mRNA binding events, 264 we determined global steady state mRNA levels in the RBM47 expressing brain 265 metastatic cells by RNA-seq. We took advantage of the clonal doxycycline- 266 inducible cell line systems, which facilitated the analysis of RBM47 dose- 267 dependence. First, genome-wide analyses showed that the mRNA level of 268 several more genes correlated significantly with increasing concentrations of 269 doxycycline in both cell lines expressing the wild type RBM47 (WT6 and WT10), 270 when compared to cells expressing the mutant (Figure 6A). This result indicated 271 that RBM47 elicits dose-dependent changes in mRNA levels. The correlation

10 272 coefficients followed a pattern that suggested the existence of mRNA species 273 that correlated both positively and negatively with wild type, but not mutant, 274 RBM47 expression, i.e. few genes in MUT3 cells had correlation coefficients 275 close to 1 or -1, whereas in WT6 and WT10 cells such genes were abundant 276 (Figure 6B). Encouraged by these observations, we looked for mRNAs that 277 fulfilled the following criteria: (i) p-value of correlation less than 0.01 in both WT6 278 and WT10 cells, (ii) mRNA expression change detectable already at the lowest 279 levels of RBM47 expression in both cell clones and (iii) no significant correlation 280 with RBM47I281fs expression. This revealed a set of 102 mRNAs that were 281 upregulated and 92 that were downregulated, respectively, in cells expressing 282 the wild type RBM47 (Figure 6C and Figure 6 – figure supplement 1A-B and 283 Supplementary file 2). Importantly, these changes were observed already with 284 the lowest expression level of RBM47 that was comparable or lower than those 285 detected in endogenously RBM47 expressing cells (Figure 3 – figure supplement 286 2D-E).

287 To determine whether the 194 RBM47-responsive genes displayed clinically 288 meaningful expression patterns, we conducted an unsupervised hierarchical 289 clustering analysis of the TCGA cohort of breast cancer specimens. Two main 290 clusters emerged, one of which harbored characteristics of the RBM47-low 291 phenotype (‘Cluster 2’, Figure 6 – figure supplement 1C). First, this subgroup, 292 Cluster 2, had significantly lower RBM47 expression levels when compared to 293 Cluster 1 (Figure 6D). Second, the genes that were induced upon RBM47 294 reintroduction tended to have lower expression in Cluster 2, whereas genes with 295 reduced expression level in the RBM47-expressing cells tended to show higher 296 expression in Cluster 2 (Figure 6E). These observations validated in clinical 297 tumor samples the RBM47-dependent gene expression correlations identified in 298 vitro.

299 To identify directly regulated targets of RBM47, we combined the mRNA 300 expression data with the HITS-CLIP-derived RBM47 transcriptome-wide binding 301 data. Of the 2498 strongest binding partners with >100 tags per transcript (total

11 302 tags in BC2 tags≥10 clusters; Supplementary file 3), 25 were among the 102 303 RBM47-upregulated genes and 17 among the 92 RBM47-downregulated genes 304 (Figure 6F), indicating no significant binding preference for either groups. This 305 was reflected in the similar overall RBM47 binding profiles in both the up- and 306 downregulated genes (Figure 6G). Two of the top-scoring RBM47 binding 307 mRNAs, IL8 and DKK1 (17,079 and 16,208 tags in clusters per gene, 308 respectively, Figure 4G), were among the upregulated genes. This increase in 309 mRNA levels was associated with increased protein secretion as determined by 310 ELISA, whereas VEGFA, the mRNA of which was bound but not upregulated by 311 RBM47, showed no change in protein secretion (Figure 6H).

312 RBM47 modulates DKK1 mRNA stability

313 Nuclear RNA binding proteins typically function in large multiprotein complexes 314 that regulate mRNA biogenesis [41]. Our data from both genome-wide HITS- 315 CLIP and RNA-seq analysis was compatible with RBM47 being a member of 316 these RNA chaperone units. This suggested that RBM47 may not necessarily 317 have a direct tumor suppressive signaling function. Rather, it raised the 318 possibility that loss of RBM47, leading to subtle changes in multiple mRNAs, 319 either through stabilization, destabilization or alternative splicing, could be 320 selected for if the net effect of both growth-promoting and growth-inhibiting 321 changes would be beneficial for cancer cells under the stress of dissemination to 322 and colonizing distant organs. In line with this, both the up and downregulated 323 target genes of RBM47, as well as the genes that were targets of RBM47- 324 mediated alternative splicing, contain genes that have previously been shown to 325 either promote or inhibit tumor phenotypes (Supplementary file 1 and 326 Supplementary file 3). For example, DKK1 [42-44], HTATIP2 [45], HBP1 [46, 47], 327 MXI1 [48] and CASP7 [49], all bound by RBM47 and upregulated upon RBM47 328 reintroduction, have known tumor suppressive functions. Similarly, RBM47- 329 induced splicing changes were seen in genes such as SLK [50], MDM4 [51] and 330 TNC [52], all of which are genes with known functions in cancer.

12 331 Focusing on DKK1, one of the most robustly bound RBM47 target transcripts 332 identified by HITS-CLIP, we investigated the possible role of RBM47 as a 333 modulator of mRNA abundance. As predicted by our results in WT6 and WT10 334 cells, knockdown of RBM47 in two additional breast cancer cell lines expressing 335 high levels of endogenous RBM47, SKBR3 and ZR-75-30, reduced DKK1 mRNA 336 levels (Figure 7A and Figure 7 – figure supplement 1A). This validated RBM47 as 337 a modulator of DKK1 mRNA level in breast cancer cells.

338 The fact that RBM47 bound to DKK1 mRNA 3’UTR and increased DKK1 mRNA 339 levels suggested the possibility that RBM47 had the capability of stabilizing DKK1 340 mRNA. We tested this by treating cancer cells with actinomycin D, a general 341 inhibitor of transcription, and measuring DKK1 mRNA levels in the following 342 hours. This demonstrated that wild type RBM47, but not the RBM47I281fs mutant, 343 was able to increase the half-life of DKK1 mRNA by up to four-fold (Figure 7B).

344 RBM47 protects DKK1 3’UTR from destabilizing factors

345 As RBM47 binding to DKK1 was concentrated on the 3’UTR, a known region of 346 regulatory activity [53], we considered the possibility that RBM47 could compete 347 with microRNAs or other mRNA destabilizing factors that target the 3’UTR [54, 348 55]. To test this experimentally, we generated miR-30-based shRNA-miR 349 constructs that targeted different regions of the DKK1 transcript, two at the 5’ end 350 with minimal RBM47 binding and three at the 3’ end with abundant RBM47 signal 351 (Figure 7C) [56]. All constructs knocked the DKK1 transcript level down by 65- 352 80% in the WT6 cells when no doxycycline was present (Figure 7D). The 353 induction of RBM47 expression did not have a significant effect on the efficiency 354 on the 5’-targeting shRNA-miRs (Figure 7E). In contrast, RBM47 inhibited the 355 capability of the 3’-targeting shRNA-miRs to keep the DKK1 mRNA level down 356 (Figure 7E). Similar observations were made with the WT10 cells (Figure 7F-G). 357 Argonaute HITS-CLIP in the MDA231-BrM2 cells (C.B.M. et al., unpublished 358 data) indicates robust binding to the DKK1 3’UTR, supporting the regulatory role 359 of this region. We conclude that the effects of RBM47 on DKK1 mRNA levels

13 360 may be due to the ability of RBM47 to protect this mRNA from destabilizing 361 factors, possibly through direct interaction with the DKK1 mRNA.

362 RBM47 inhibits tumor progression by suppressing Wnt activity

363 DKK1 is an inhibitor of Wnt signaling, a pathway with a well-established role in 364 regulating stem cell characteristics in both normal and malignant cells [57]. 365 Indeed, DKK1 as well as other Wnt antagonists have been shown to inhibit 366 breast cancer progression [42-44, 58]. The fact that RBM47 was able to increase 367 DKK1 secretion therefore suggested that RBM47 may also inhibit Wnt signaling 368 and consequently reduce the tumorigenic fitness of metastatic breast cancer 369 cells. We first tested the effects of RBM47 on cancer cell Wnt responsiveness by 370 treating WT6 cells with recombinant Wnt3A and subsequently measuring the 371 expression of AXIN2, a common TCF/β-catenin target gene and a general 372 marker of Wnt activity [57, 59]. Doxycycline-induced expression of RBM47 in 373 WT6 cells led to a dampened Wnt3A-dependent AXIN2 induction (Figure 8A). 374 This inhibition of AXIN2 expression was at least partially dependent on DKK1 375 (Figure 8B). In agreement with these results, human breast cancers with low 376 RBM47 expression had in general higher levels of Wnt transcriptomic activity 377 when compared to tumors with high RBM47 expression in the TCGA cohort 378 (Figure 8C).

379 We then tested the possibility that RBM47 would suppress tumorigenesis by 380 implanting low numbers of WT6 cells at the orthotopic site. The induction of 381 RBM47 by doxycycline was able to significantly delay the emergence of 382 mammary tumors (Figure 8D), and the tumors that formed were smaller in size 383 (Figure 8E). This observation was confirmed in WT10 cells (Figure 8 – figure 384 supplement 1A). In line with this, inhibition of DKK1 expression in the MDA231- 385 BrM2 cells promoted tumor formation in the orthotopic site (Figure 8F and S8A). 386 Finally, we used the two exon 1 DKK1 shRNAmiR constructs (Figure 7C) to 387 reduce DKK1 levels in the WT6 cells (Figure 8 – figure supplement 1B) and 388 tested how this affected the brain metastatic ability of these cells in the presence 389 of RBM47. Only 1/9 mice (16%) injected with control cells developed brain

14 390 metastasis, whereas 8/17 mice (47%) inoculated with DKK1 RNAi construct 391 transduced cells developed metastasis (Figure 8G-H and Figure 8 – figure 392 supplement 1C). Taken together, these observations suggested that RBM47- 393 dependent suppression of tumor progression was partially mediated by its ability 394 to increase the production of the Wnt antagonist DKK1, a secreted protein that 395 can inhibit tumor phenotypes in metastatic cancer cells.

396

397 DISCUSSION

398 Cancer genomes contain numerous genes with low-frequency mutations of 399 unknown functional significance. We have studied one such gene, the previously 400 uncharacterized RBM47, and demonstrate that it has tumor suppressive 401 functions in breast cancer. RBM47 acts as a multifunctional RBP modulating 402 alternative splicing and the abundance of several mRNAs, which can lead to 403 inhibition of cancer progression (Figure 8I-J). These results highlight the 404 significance of infrequent mutations in cancer, the importance of integrated 405 experimental approaches to identify such functionally relevant mutations, and the 406 role of broadly targeted mRNA chaperones as determinants of cancer 407 progression.

408 RBM47 contains three classical RNA recognition motifs (RRM domains). The 409 closest homologs of RBM47 are Apobec1 complementation factor (A1CF) and 410 hnRNP-Q, which regulate RNA editing [60] and splicing and transcript stability, 411 respectively [61, 62]. In general, RBPs bind to and influence the function and fate 412 of both pre-mRNAs and mRNAs [41]. They can operate in large multiprotein 413 complexes that dynamically regulate all the steps of mRNA biogenesis, nuclear 414 export, stability and translation. Individual subunits of these complexes can 415 therefore have diverse phenotypic roles depending on the exact protein complex 416 they are in [63]. Our observations on RBM47 are in line with these known general 417 principles of RBP function.

15 418 Our data demonstrate widespread and reproducible RBM47 binding to target 419 mRNAs predominantly in introns and 3’UTRs, with the most robust binding 420 occurring in 3’UTRs. Recent RNA-compete studies have proposed a binding 421 motif for RBM47 in vitro [26]. Our in vivo HITS-CLIP data does not suggest a 422 clear nucleotide binding specificity for exogenous Flag-tagged RBM47, although 423 some preference was observed for polyU stretches around CIMS sites. It has 424 been shown that the presence of a canonical motif is neither necessary nor 425 sufficient to predict HITS-CLIP binding sites of FUS in both mouse and human 426 brain [64], while specific sites suggested by in vitro RNA selection experiments 427 are not enriched in HITS-CLIP derived FMRP binding sites in vivo [35]. This 428 would suggest that other factors such as RNA accessibility, secondary structure 429 or protein-protein interactions may modulate RBM47 target choice [65]. Further 430 work is therefore needed for a comprehensive understanding of the determinants 431 of RBM47-mRNA interactions.

432 We find that RBM47 binds robustly to ~2500 gene transcripts in human breast 433 cancer cells, with only a subset showing steady state level change or alternative 434 splicing upon RBM47 reintroduction. Given the stringent criteria used to define 435 RBM47-bound and regulated targets and the generally low level of intronic RNA 436 in a cell, it is likely that this subset is an underestimation of the number of 437 RBM47-regulated transcripts, and does not take into account the potential for 438 regulatory events, such as re-localization, that may not alter steady state 439 transcript levels. The complex interplay of RBPs in agonistic and antagonistic 440 modulation of mRNA is becoming increasingly apparent. For example, the RRM- 441 domain containing protein HuR modulates the destabilizing effects of miRNAs 442 [54, 55, 66] and AUF1 [67] on common target transcripts The data presented 443 here suggest that target-specific RBM47 regulation may arise through modulation 444 of accessibility of other factors to a common mRNA transcript.

445 RBM47 binds and regulates transcripts that encode for proteins of several 446 different biological functions. The effects of reduced RBM47 activity on cancer 447 cell fitness, determined by the sum phenotypic output of all regulated target

16 448 transcripts, may therefore vary depending on the context. Such pleiotropic effects 449 could target multiple steps of cancer progression. Indeed, even though RBM47 450 loss was associated with metastatic cancer clones in our model systems, 451 evidence for selection against RBM47 was detected already in primary breast 452 cancers. One of the most highly bound RBM47 mRNA targets, the secreted Wnt 453 inhibitor DKK1, is stabilized by RBM47 and partially mediates RBM47 tumor 454 suppressive function. Interestingly, rbm47 knockdown in zebra fish embryos 455 leads to a headless phenotype mediated via up-regulation of the wnt8a pathway 456 [68]. In addition to DKK1, our analysis identified a number of potential mediators 457 of RBM47 effects for future studies.

458 From a general perspective, the present findings illuminate two concepts. First, 459 we show that low-frequency cancer mutations can give rise to tumorigenic 460 phenotypes. Our work highlights the power of orthogonal approaches for the 461 analysis of cancer genome resequencing data. Second, we show that loss of a 462 broadly targeted and multifunctional RBP can increase the fitness of certain 463 cancer cell clones in support of metastasis. This complements previous findings 464 of RNA-binding proteins as mediators of oncogenic phenotypes [16, 17, 69-71]. 465 Deregulation of RNA-binding proteins is thus emerging as a prominent source of 466 complex transcriptomic diversity that can serve as a platform for the selection of 467 metastatic traits during tumor progression.

468

17 469

470 EXPERIMENTAL PROCEDURES

471 Cell lines. The metastatic breast cancer cell lines have been previously 472 described [21, 22, 72]. SKBR3, ZR-75-30 and HCC1954 cells were obtained from 473 ATCC. For retrovirus and lentivirus production, GPG29 and 293T cells, 474 respectively, were utilized. All cell lines were maintained under standard tissue 475 culture conditions. Single cell-derived clones were isolated utilizing fluorescence- 476 activated cell sorting from genetically engineered 231-BrM2 (WT10, MUT3) and 477 CN34-BrM2 (WT6) cells.

478 cDNA expression and RNAi. For RBM47 restoration, RBM47 was cloned into 479 the pBABE-puro retroviral expression vector. The RBM47I281fs was generated by 480 site-directed mutagenesis. Virus was generated in the GPG29 packaging cells. 481 For the generation of doxycycline-inducible expression constructs, both wild type 482 and mutant FLAG-RBM47 were cloned into the pRetroX-Tight-Pur expression 483 system (Clontech).

484 For RNAi-mediated gene silencing, RBM47 pGIPZ shRNA constructs were 485 obtained from Open Biosystems. The DKK1 shRNAmiR constructs were 486 designed based on the rules described by [56] and cloned into the pGIPZ vector 487 as described [73] using the following oligonucleotide templates:

488 DKK1miRm: 489 TGCTGTTGACAGTGAGCGACGGGTCTTTGTCGCGATGGTATAGTGAAGCCA 490 CAGATGTATACCATCGCGACAAAGACCCGGTGCCTACTGCCTCGGA

491 DKK1miRn: 492 TGCTGTTGACAGTGAGCGACACCTTGAACTCGGTTCTCAATAGTGAAGCCA 493 CAGATGTATTGAGAACCGAGTTCAAGGTGGTGCCTACTGCCTCGGA

494 DKK1miRo: 495 TGCTGTTGACAGTGAGCGCCAACTCAATCCTAAGGATATATAGTGAAGCCAC 496 AGATGTATATATCCTTAGGATTGAGTTGATGCCTACTGCCTCGGA

18 497 DKK1miRp: 498 TGCTGTTGACAGTGAGCGACAGTAAATTACTGTATTGTAATAGTGAAGCCAC 499 AGATGTATTACAATACAGTAATTTACTGCTGCCTACTGCCTCGGA

500 DKK1miRq: 501 TGCTGTTGACAGTGAGCGAAACGGAAGTGTGATATGTTTATAGTGAAGCCA 502 CAGATGTATAAACATATCACACTTCCGTTCTGCCTACTGCCTCGGA

503 The pGIPZ empty vector was used as a control.

504 Animal studies. All animal experiments were performed in accordance with a 505 protocol approved by MSKCC Institutional Animal Care and Use Committee. 506 Lung metastasis assays were conducted in 5-7 week old female NOD/SCID 507 mice. Brain metastasis and mammary tumor assays were carried out using 5-7 508 week old female athymic nude mice. In vivo bioluminescence imaging was 509 performed using the IVIS Spectrum Xenogen machine (Caliper Life Sciences). 510 For orthotopic mammary tumor assays, cells were mixed with Matrigel. Injections 511 were confirmed and tumor growth was followed by bioluminescent imaging. 512 Statistical significance of tumor and metastasis free survival was assessed by the 513 log-rank test. Differences in raw and normalized bioluminescence signal was 514 assessed by the Student’s t-test and Wilcoxon rank-sum test, respectively. Brain 515 images were acquired with a Leica SP5 up-right confocal microscope and Zeiss 516 AxioVert 200M using 20X and 5X objectives. Image analysis was performed with 517 Metamorph and ImageJ softwares.

518 mRNA and protein detection. Total RNA was extracted using PrepEase RNA 519 spin kit (USB). We used Transcriptor First Strand cDNA Synthesis Kit (Roche) for 520 cDNA synthesis. Quantitative PCR was performed using predesigned Taqman 521 gene expression assays (Life Technologies) and the 7900HT or ViiA 7 real-time 522 PCR systems (Applied Biosystems/Life Technologies). TBP was used as a 523 housekeeping control gene. mRNA stability was assessed by performing 524 quantitative PCR after actinomycin D treatment. For immunoblotting, antibodies 525 recognizing RBM47 (HPA006347, Sigma), α-tubulin (11H10, Cell Signaling) and

19 526 ACTB (Sigma) were utilized. Secondary antibodies were HRP (Pierce) or 527 fluorescence (LiCor) conjugated. Immunostaining for RBM47 (HPA006347, 528 Sigma) was performed according to standard protocols in the MSKCC Molecular 529 Cytology Core Facility on paraffin embedded tissue blocks. Secreted protein was 530 detected by ELISA (R&D).

531 HITS-CLIP. 231BrM2 tet-on FLAG-RBM47 cells were treated with 1ng/ml 532 doxycycline for 3 days before 254nm UV crosslinking at 400mJ/cm2 on a bed of 533 ice (Stratalinker2400, Stratagene). Samples were processed for HITS-CLIP as 534 previously described [32] using an anti-Flag antibody (Sigma F3165), and 535 omitting 3’ linker ligation in favor of direct labeling of protein-bound RNA with 32P- 536 γ-ATP. Non-crosslinked cells were used as a negative control, with IPed Flag- 537 RBM47 protein detected using a second anti-Flag antibody (Sigma F7425). 538 Purified RBM47-bound RNA fragments were polyA tailed (E-PAP, NEB), and 539 reverse transcribed (Superscript III, Invitrogen) in the presence of Br-dUTP 540 (Sigma). Unique polydT-NV RT-primers were used per replicate, containing 541 Solexa sequences separated by an abasic furan (that serves as an ApeI cut site), 542 a 6nt degenerate region and a 6nt index sequence to allow for multiplexing 543 during sequencing.

544 RT Primer 1

545 pGCACTGTTN6GATCGTCGGACTGTAGAACTCT/idSp/CAAGCAGAAGACGGC

546 ATACGAT20VN

547 RT Primer 2

548 pGCGAAACTN6GATCGTCGGACTGTAGAACTCT/idSp/CAAGCAGAAGACGGC

549 ATACGAT20VN

550 BrdU-cDNA was stringently purified by IP (Santa Cruz, sc-32323) using ProteinG 551 dynabeads (Invitrogen), eluted from the beads via BrdU competitive elution 552 (Sigma), and re-immunoprecipitated. cDNA was circularized on bead (CircLigase 553 ssDNA Ligase II, Epicentre), washed and digested with ApeI (NEB) to relinearize. 554 cDNAs were eluted from the beads by heating to 98C in Phusion HF Buffer

20 555 (NEB), then PCR amplified using Phusion DNA polymerase (NEB) and SYBR 556 Green I (Invitrogen) in an iQ5 real-time PCR machine in order to monitor 557 amplification, with the samples being removed when the RFU signal reached 558 ~1000.

559 P5 – aatgatacggcgaccaccgacaggttcagagttctacagtccgacg

560 P3 - caagcagaagacggcata

561 PCR products were purified using MinElute columns (Qiagen) as per 562 manufacturers instructions and quantified using Quant-It (Invitrogen). cDNA was 563 multiplexed and sequenced using Illumina Hi-Seq (small RNA sequencing primer 564 – cgacaggttcagagttctacagtccgacgatc). All data analysis was done using the 565 Galaxy platform [74], as previously described [32, 34, 35, 38].

566 RT–PCR validation

567 RT-PCR validiation was carried out using total RNA from MDA231-BrM2 568 (Control) and WT10 cells (iScript, Bio-Rad, Accuprime Pfx Supermix 1, Life 569 Technologies) as previously described [75]. In all cases lanes 1-3 contain an 570 equal mixture of control and WT10 cell cDNA amplified at n-1, n and n+1 cycles, 571 lane 4 contains an absence of reverse transcriptase control (-RT), with all other 572 lanes corresponding to replicate samples of the indicated cell type amplified to n 573 cycles. IR and ΔI calculated using ImageJ [76].

574 Bioinformatic analysis. All analyses were conducted using R. Microarray data 575 from human untreated tumor data sets (GSE2603 [22] and GSE2034 [23]) were 576 preprocessed as described [77]. For the RNA-seq TCGA data set, normalized 577 mRNA z-scores were downloaded from the TCGA cBio portal [27]. The 578 microarray data from metastatic cell lines [21, 22] were processed with GCRMA 579 together with updated probe set definitions using R packages affy, gcrma and 580 hs133ahsentrezgcdf (version 10). Unsupervised hierarchical clustering was 581 performed using the function heatmap.2 with Pearson’s correlation coefficient as 582 the similarity metric. For survival analysis, a Cox proportional hazards model was

21 583 utilized as implemented in the coxph function in the R-package survival. For 584 RNA-seq analysis of the cell lines, raw paired-end sequencing data were mapped 585 to human genome (hg19 build) with STAR2.3.0 [78] using standard options. 586 Reads mapped to each transcript were counted by HTSeq v0.5.4 [79] with default 587 settings. The read count table was normalized to library size by DESeq [79]. 588 Correlation with RBM47 expression was assessed by Pearson’s correlation 589 coefficient utilizing the R functions cor and cor.test. Wnt pathway activity in 590 clinical tumors was assessed utilizing a curated list of Wnt target genes in breast 591 cancer [58] and calculating sums of z-scores for each tumor.

592

22 593 ACCESSION NUMBERS 594 595 RNA-seq data has been deposited to the Gene Expression Omnibus under the 596 accession number GSE53779. 597 598 ACKNOWLEDGEMENTS 599 600 We thank members of the Massagué lab for discussion. This work was supported 601 by a grant from the Starr Cancer Consortium to R.B.D. and J.M. J.M. and R.B.D 602 are investigators of the Howard Hughes Medical Institute. 603 604 LIST OF SUPPLEMENTARY MATERIALS 605 606 Figure 1 – figure supplement 1 607 Figure 2 – figure supplement 1 608 Figure 3 – figure supplement 1 609 Figure 3 – figure supplement 2 610 Figure 5 – figure supplement 1 611 Figure 6 – figure supplement 1 612 Figure 7 – figure supplement 1 613 Figure 8 – figure supplement 1 614 Supplementary file 1 615 Supplementary file 2 616 Supplementary file 3

23 617 FIGURE LEGENDS

618 Figure 1. RBM47 expression associated with breast cancer progression.

619 (A) A schematic of the analytical approach. Genes identified as mutated in a 620 breast cancer brain metastasis by Ding et al. (2010) where compared to 621 metastasis-associated gene expression traits in both clinical data sets and 622 experimental model systems. This identified RBM47 as a putative breast cancer 623 suppressor gene.

624 (B) RBM47 mRNA expression measured by quantitative real-time RT-PCR in 625 two cell line systems of breast cancer metastasis. In both panels, the data are 626 normalized to the parental cell line (Par). Error bars represent 95% confidence 627 intervals obtained from multiple PCR reactions. LM, lung metastatic derivative; 628 BoM, bone metastatic derivative; BrM, brain metastatic derivative.

629 (C) RBM47 protein expression measured by Western blotting. Samples as in 630 (B). Tubulin used as a loading control.

631 (D) Brain metastasis free survival in a cohort of 368 untreated breast cancer 632 patients. Cases classified based on RBM47 mRNA expression, bottom 1/3 in 633 blue, top 2/3 in red. P-value derived from a Cox proportional hazards model using 634 RBM47 expression as a continuous variable.

635 (E) Lung metastasis free survival in a cohort of 368 untreated breast cancer 636 patients. Cases classified based on RBM47 mRNA expression, bottom 1/3 in 637 blue, top 2/3 in red. P-value derived from a Cox proportional hazards model using 638 RBM47 expression as a continuous variable.

639 (F) Bone metastasis free survival in a cohort of 368 untreated breast cancer 640 patients. Cases classified based on RBM47 mRNA expression, bottom 1/3 in 641 blue, top 2/3 in red. P-value derived from a Cox proportional hazards model using 642 RBM47 expression as a continuous variable.

24 643 (G) RBM47 expression as measured by RNA-seq in the TCGA data set of 748 644 patients. Samples grouped based on breast cancer molecular subtype: luminal A, 645 luminal B, Her2 positive, claudin low and basal. RBM47 expression is lower in 646 claudin low and basal subtypes, both of which are associated with poor patient 647 outcome.

648 (H) A schematic showing the predicted protein structure of RBM47, its closest 649 homologue A1CF, a known RNA-binding protein, and the predicted structure of 650 RBM47I281fs mutant. Blue diamonds represent RRM motifs, pink rectangle 651 represents a truncated RRM motif.

652

653 Figure 1 – figure supplement 1.

654 (A) Multivariate Cox proportional hazards models for both brain and lung 655 metastasis free survival in a cohort of 368 untreated breast cancer patients. The 656 model incorporates RBM47 expression as a continuous variable, estrogen 657 receptor (ER) expression, progesterone receptor (PR) expression and Her2 658 expression.

659 (B) A list of protein-altering RBM47 breast cancer mutations in the Catalogue 660 of Somatic Mutations in Cancer database (COSMIC, 661 http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/) as of June 2013.

662 (C) Percentage of genetic alterations, non-synonymous mutations or 663 homozygous deletions, in RBM47 in the TCGA cohort of 748 breast cancers 664 classified based on molecular subtypes: luminal A, luminal B, Her2 positive, 665 claudin low and basal. A significant enrichment is observed in the basal subtype 666 (P = 0.00015, Fisher’s exact test). 667

668

669 Figure 2. RBM47 suppresses metastatic breast cancer progression.

25 670 (A) Normalized lung photon flux in mice after tail vein inoculation of MDA231- 671 LM2 cells expressing either wild type RBM47 or RBM47I281fs. P-value calculated 672 utilizing repeated measures two-way ANOVA. N=6 for RBM47I281fs, N=8 for 673 RBM47.

674 (B) Representative bioluminescence images from the experiment shown in 675 (B). The color scale shows bioluminescence (photons/second).

676 (C) Brain metastasis free survival as determined by in vivo bioluminescence 677 imaging in mice after intracardiac inoculation of MDA231-BrM2 cells expressing 678 either wild type RBM47 or RBM47I281fs. P-value calculated using the log-rank test. 679 N=9 for RBM47I281fs, N=7 for RBM47.

680 (D) Representative bioluminescence images from the experiment shown in 681 (C). The color scale shows bioluminescence (photons/second).

682 (E) Ex vivo quantification of bioluminescence from brain metastasis on day 42 683 of the experiment shown in panel C. P-value calculated by two-tailed Student’s t- 684 test.

685 (F) Lung metastasis free survival as determined by in vivo bioluminescence 686 imaging in mice after tail vein inoculation of parental MDA-MB-231 cells 687 expressing either control vector (pGIPZ) or hairpins against RBM47 (shRNA1 688 and shRNA2). P-values calculated using the log-rank test. N=8 for Ctrl group, 689 N=6 for shRNA1, N=5 for shRNA2.

690 (G) In vivo bioluminescence imaging on day 36 of the experiment shown in (E) 691 demonstrates earlier emergence of detectable lung metastasis for the RBM47 692 knockdown groups when compared to the control animals. The color scale shows 693 bioluminescence (photons/second).

694 (H) Quantification of bioluminescence for the time point shown in (F). Data 695 normalized to signal on day 0 for each animal. P-value calculated using the 696 Wilcoxon rank-sum test.

26 697 Figure 2 – figure supplement 1.

698 (A) RBM47 mRNA expression measured by quantitative real-time RT-PCR in 699 231-BrM2 and 231-LM2 cells transduced with either wild type RBM47 or 700 RBM47I280fs-expressing cDNA constructs, respectively. The data are normalized 701 to the untransduced control cell lines. Error bars represent 95% confidence 702 intervals obtained from multiple PCR reactions.

703 (B) Flag-RBM47 protein expression determined by Western blotting in non- 704 clonal 231BrM2 tet-on Flag-RBM47 cells. No RBM47 or Flag protein is detected 705 in the cells expressing Flag-RBM47I281fs (predicted size 34kDa) treated with the 706 same doxycycline doses.

707 (C) Quantification of bioluminescence from the head of mice on day 0 of the 708 experiment shown in Figure 2A. P-value calculated by two-tailed Student’s t-test.

709 (D) RBM47 mRNA and protein expression measured by quantitative real-time 710 RT-PCR and Western blotting, respectively, in parental MDA-231 cells 711 expressing either control vector (pGIPZ) or hairpins against RBM47 (shRNA1 712 and shRNA2). Error bars represent 95% confidence intervals obtained from 713 multiple PCR reactions. Tubulin used as a protein loading control.

714 (E) RBM47 mRNA and protein expression measured by quantitative real-time 715 RT-PCR and Western blotting, respectively, in parental CN34 cells expressing 716 either control vector (pGIPZ) or hairpins against RBM47 (shRNA1 and shRNA2). 717 Error bars represent 95% confidence intervals obtained from multiple PCR 718 reactions. Tubulin used as a protein loading control.

719 (F) Quantification of lung cancer cell burden on day 0 by in vivo 720 bioluminescence imaging in mice after tail vein inoculation of parental CN34 cells 721 expressing either control vector (pGIPZ) or hairpins against RBM47 (shRNA1 722 and shRNA2). P-values calculated using two-sided Student’s t test. N=5 for all 723 groups.

27 724 (G) Quantification of lung cancer cell burden on day 141 by in vivo 725 bioluminescence imaging in mice after tail vein inoculation of parental CN34 cells 726 expressing either control vector (pGIPZ) or hairpins against RBM47 (shRNA1 727 and shRNA2). P-values calculated using two-sided Student’s t test. N=4 for Ctrl 728 group, N=5 for shRNA1, N=4 for shRNA2.

729

730 Figure 3. Clonal heterogeneity in RBM47 sensitivity.

731 (A) A lung metastatic nodule with strong RBM47 expression in a mouse 732 inoculated with RBM47-transduced MDA231-LM2 cells. RBM47 protein 733 expression detected by immunohistochemistry using an antibody against RBM47.

734 (B) A lung metastatic nodule with weak RBM47 expression in a mouse 735 inoculated with RBM47-transduced MDA231-LM2 cells. RBM47 protein 736 expression detected by immunohistochemistry. Note the clearly reduced staining 737 when compared to panel (A).

738 (C) A lung metastatic nodule with weak RBM47 expression in a mouse 739 inoculated with RBM47I280fs-transduced MDA231-LM2 cells. RBM47 protein 740 expression detected by immunohistochemistry. Staining intensity similar to that 741 seen in panel (B).

742 (D) RBM47 expression in normal mouse lung. RBM47 protein expression 743 detected by immunohistochemistry. Note small cells with strong RBM47 744 expression, similar to those seen in panels (B) and (C).

745 (E) Brain metastasis free survival as determined by in vivo bioluminescence 746 imaging in mice after intracardiac inoculation of WT10 cells. The RBM47 group 747 received doxycycline in diet. P-value calculated using the log-rank test. N=9 for 748 both groups.

28 749 (F) Representative ex vivo bioluminescence images from brain metastasis of 750 the experiment shown in (E). The color scale shows bioluminescence 751 (photons/second).

752 (G) Ex vivo quantification of bioluminescence from brain metastasis with and 753 without RBM47 reintroduction. WT10 data from the experiment shown in (E). 754 WT6 data from a similar experimental setup. P-value calculated by two-tailed 755 Student’s t-test. N=9 for Ctrl group, N=10 for RBM47 group.

756 (H) Brain metastasis free survival as determined by in vivo bioluminescence 757 imaging in mice after intracardiac inoculation of MUT3 cells. The I281fs group 758 received doxycycline in diet. P-value calculated using the log-rank test. N=9 for 759 Ctrl group, N=7 for RBM47I281fs group.

760 (I) Representative ex vivo bioluminescence images from brain metastasis of 761 the experiment shown in (H). The color scale shows bioluminescence 762 (photons/second).

763 (J) Ex vivo quantification of bioluminescence from brain metastasis of the 764 experiment shown in (H). P-value calculated by two-tailed Student’s t-test.

765

766 Figure 3 – figure supplement 1.

767 (A) Examples of lung metastatic nodules in a mouse inoculated with RBM47- 768 transduced 231-LM2 cells. Some metastases have strong expression of RBM47 769 (left), whereas others show only weak RBM47 expression (right). Staining against 770 human vimentin and Ki67 is shown for the same lesions.

771

772 Figure 3 – figure supplement 2.

29 773 (A) RBM47 mRNA expression measured by quantitative real-time RT-PCR in 774 WT6 cells treated with increasing concentrations of doxycycline. Error bars 775 represent 95% confidence intervals obtained from multiple PCR reactions.

776 (B) RBM47 mRNA expression measured by quantitative real-time RT-PCR in 777 WT10 cells treated with increasing concentrations of doxycycline. Error bars 778 represent 95% confidence intervals obtained from multiple PCR reactions.

779 (C) RBM47 mRNA expression measured by quantitative real-time RT-PCR in 780 MUT3 cells treated with increasing concentrations of doxycycline. Error bars 781 represent 95% confidence intervals obtained from multiple PCR reactions.

782 (D) RBM47 protein expression detected by Western blotting in WT6 cells 783 treated with increasing concentrations of doxycycline. ACTB used as a loading 784 control. Quantification of signal shown in the lower panel, normalized to both 785 ACTB loading control and the level of endogenous RBM47 detected in HCC1954 786 cells.

787 (E) RBM47 protein expression detected by Western blotting in WT10 cells 788 treated with increasing concentrations of doxycycline. ACTB used as a loading 789 control. Quantification of signal shown in the lower panel, normalized to both 790 ACTB loading control and the level of endogenous RBM47 detected in HCC1954 791 cells.

792 (F) Proliferation of WT6 cells assessed under standard tissue culture 793 conditions with and without doxycycline (2.5ng/ml).

794 (F) Demonstration of doxycycline-inducible gene induction in brain metastatic 795 lesions formed by 231-BrM2 cells transduced with an inducible RFP construct. 796 Cancer cells express constitutively GFP. Doxycycline treatment started on day 14 797 after intracardiac cancer cell inoculation. Doxycycline-inducible RFP expression 798 can be seen in cancer cells.

799

30 800 Figure 4. HITS-CLIP identifies genome-wide RBM47 binding maps.

801 (A) Radiolabelled RNA is detectable in RBM47-expressing 231-BrM2 802 metastatic cells that have been UV-irradiated indicating in vivo RNA binding 803 ability. No RNA is detected in non-crosslinked cells despite the presence of 804 ample immunoprecipitated Flag-RBM47 protein. No RNA or protein is detected in 805 control 231-BrM2 transduced with empty vector. Samples run in duplicate.

806 (B) Schematic of the modified HITS-CLIP protocol showing autoradiogram of 807 duplicate Flag-RBM47 samples used. Purified RBM47-bound RNA fragments 808 (green) were polyA tailed and reverse transcribed in the presence of Brd-U using 809 a polydT-NV primer encoding the full sense sequence of the Illumina reverse 810 sequencing primer (blue), an abasic furan that serves as an ApeI cut site (), a 811 partial reverse complement to the Illumina forward sequencing primer (orange), 812 and two hexamer sequences (purple): a known-sequence index for multiplexing 813 and a degenerate barcode used to distinguish unique cDNA clones from PCR 814 duplicates. cDNA were stringently purified, circularized and linearized using ApeI 815 to bring Illumina sequenced into the correct orientation with respect to the cloned 816 fragment, and the samples PCR amplified and deep sequenced.

817 (C) RBM47 HITS-CLIP is highly reproducible between replicate experiments 818 at the level of unique CLIP tags per transcript.

819 (D) Increasing the stringency of biologically reproducible RBM47 binding site 820 definition reveals predominant binding in 3’UTRs and intronic regions of target 821 transcripts, with the most robust binding (tags per binding site) evident in 3’UTRs.

822 (E) Distribution of tags number per biologically reproducible cluster in coding 823 and non-coding regions of RBM47-targeted transcripts reveals a bimodal binding 824 pattern between 3’UTRs and introns, with the latter having large numbers of 825 reproducible yet less robust binding.

31 826 (F) MEME analysis reveals an enrichment for polyU sequences (50 sites, P = 827 2.4e-16) in the +/-10nt foot print region surrounding reproducible RBM47 deletions 828 CIMS (357 sites with ≥5 mutations, FDR≤0.01, see Ref. [38]).

829 (G) Widespread RBM47 binding is seen in target transcripts, as exemplified by 830 binding patterns seen in the 3’UTRs of DKK1 and IL8.

831

832 Figure 5. RBM47 regulates alternative splicing.

833 (A) Schematic showing the method used to calculate alternative exon inclusion 834 rates from paired-end RNA-seq data. 5’CE - 5’ flanking constitutive exon, 3’CE - 835 3’ flanking constitutive exon, 5’FI – 5’ Flanking intron, 3’FI – 3’ Flanking intron.

836 (B) Scatter plot of all expressed alternatively spliced CA exons showing RBM47- 837 dependent change in inclusion (black, ≥10 RNA-seq reads spanning exon-exon 838 junctions, ΔI≥|0.2|, p≤0.05) with orange points indicating RBM47-bound and 839 included CA exons, and blue points indicating bound and excluded exons, 840 respectively. CA exons were considered bound given a total of tags≥10 in BC2 841 tags≥5 clusters mapping to the region spanning the start of the 5’CE to the end of 842 the 3’CE. P-values calculated by Fisher’s exact test using total isoform 1 and 843 total isoform 2 RNA-seq reads in each condition.

844 (C) Left panel shows a section of the SLK transcript (blue) that includes a CA 845 exon (grey box). The top two panels show RNA-seq data from WT10 (green) and 846 control cells (red), with RBM47 HITS-CLIP tags mapping to the region shown 847 beneath in black. Increased RNA-seq signal corresponding to the CA exon is 848 seen in the presence of RBM47 expression, while robust binding is evident in the 849 5’FI. Independent RT-PCR validation of this splice is shown in the right panel, 850 with IR calculated using ImageJ analysis of autoradiograms.

851 (D) Normalized complexity map of RBM47-dependent alternative splicing of CA 852 exons. Orange and blue peaks represent binding associated with RBM47- 853 dependent exon inclusion and exclusion, respectively.

32 854

855 Figure 5 – figure supplement 1.

856 (A) RBM47-dependent exclusion of exon 6 of MDM4, as in Figure 5C.

857 (B) RBM47-dependent exclusion of a CA in LIMCH1, as in Figure 5C.

858 (C) RBM47-dependent inclusion of exon 5 in MBNL1, as in Figure 5C.

859 (D) RBM47-dependent inclusion of two CA exons in SEC31A, as in Figure 5C

860

861 Figure 6. RBM47-induced changes in mRNA levels.

862 (A) Distribution of P-values from correlation analysis of doxycycline 863 concentration and gene expression for all genes in WT6, WT10 and MUT3 cell 864 lines, respectively. Global gene expression determined by RNA-seq.

865 (B) Distribution of correlation coefficients between doxycycline concentration 866 and gene expression in WT6, WT10 and MUT3 cell lines, respectively.

867 (C) Heat maps showing the top 102 positively (UP) correlated and 92 868 negatively (DOWN) correlated genes with RBM47 expression in WT6 and WT10 869 cells. The expression of these genes does not correlate with RBM47I280fs 870 expression in the MUT3 cells.

871 (D) RBM47 mRNA expression in the TCGA cohort of breast cancer samples 872 classified by the clusters shown in Figure 6 – figure supplement 1C. P-value 873 determined by two-tailed Student’s t-test.

874 (E) Fold change between Cluster 1 and Cluster 2 shown for the 102 positively 875 and 92 negatively correlated RBM47 target genes, respectively. Positive fold 876 change shows higher expression in Cluster 2, which has lower expression of 877 RBM47. The genes that are induced upon RBM47 reintroduction tend to have 878 lower expression in Cluster 2, and the genes that show lower expression upon

33 879 RBM47 reintroduction tend to have higher expression in Cluster 2. This is in line 880 with the experimental results shown in panel C. P-value determined by two-tailed 881 Student’s t-test.

882 (F) Pie charts showing the fraction of target genes with more than 100 RBM47 883 tags for both the 102 positively and 92 negatively correlated RBM47 target 884 genes.

885 (G) Tags per transcript plotted for both the positively and negatively correlated 886 RBM47 target genes that showed more than 1 tag. The only two binding partners 887 with >104 tags represent DKK1 and IL8, respectively.

888 (H) Secreted DKK1 and IL8 protein levels determined by ELISA in WT6 cells 889 treated with the indicated doxycycline concentrations. VEGFA used as a control.

890

891 Figure 6 – figure supplement 1.

892 (A) Normalized average RNA-seq reads per transcript for the 102 positively 893 correlated RBM47 target genes shown for WT6, WT10 and MUT3 cells treated 894 with the indicated doxycycline concentrations.

895 (B) Normalized average counts for the 92 negatively correlated RBM47 target 896 genes shown for WT6, WT10 and MUT3 cells treated with the indicated 897 doxycycline concentrations.

898 (C) A heatmap showing unsupervised hierarchical clustering in the TCGA 899 cohort of 748 samples using the 194 RBM47 target genes identified by RNA-seq. 900 Two main clusters are detected.

901

902 Figure 7. RBM47 modulates DKK1 mRNA stability.

903 (A) RBM47 and DKK1 mRNA expression measured by quantitative real-time 904 RT-PCR in SKBR3 cells expressing either control vector (pGIPZ) or hairpins

34 905 against RBM47 (shRNA1 and shRNA2). Error bars represent 95% confidence 906 intervals obtained from multiple PCR reactions.

907 (B) DKK1 mRNA stability determined by measuring mRNA levels by 908 quantitative real-time RT-PCR in WT6, WT10 and MUT3 cells, treated with or 909 without doxycycline, after inhibition of transcription with actinomycin D. Data 910 normalized to time point 0. Error bars represent 95% confidence intervals

911 obtained from multiple PCR reactions. WT6: T1/2 Ctrl = 2.3h, T1/2 Dox = 9.8h;

912 WT10: T1/2 Ctrl = 2.4h, T1/2 Dox = 5.4h; MUT3: T1/2 Ctrl = 2.1h, T1/2 Dox = 2.2h.

913 (C) Schematic showing the locations of different DKK1 miR-constructs in 914 relation to DKK1 genomic locus and RBM47 binding patterns. miRm and miRn 915 target exon 1 that is not bound by RBM47. miRo, miRp and miRq target DKK1 916 3’UTR that is strongly bound by RBM47.

917 (D) DKK1 mRNA expression measured by quantitative real-time RT-PCR in 918 WT6 cells expressing either control vector (pGIPZ) or the five DKK1-targeting 919 miR constructs shown in panel (E). Error bars represent 95% confidence 920 intervals obtained from multiple PCR reactions.

921 (E) DKK1 mRNA expression measured by quantitative real-time RT-PCR in 922 WT6 cells expressing either control vector (pGIPZ) or the five DKK1-targeting 923 miR constructs, with or without doxycycline treatment. Data normalized to the 924 non-treated control for each cell line separately. Error bars represent 95% 925 confidence intervals obtained from multiple PCR reactions.

926 (F) DKK1 mRNA expression measured by quantitative real-time RT-PCR in 927 WT10 cells expressing either control vector (pGIPZ) or the five DKK1-targeting 928 miR constructs shown in panel (E). Error bars represent 95% confidence 929 intervals obtained from multiple PCR reactions.

930 (G) DKK1 mRNA expression measured by quantitative real-time RT-PCR in 931 WT10 cells expressing either control vector (pGIPZ) or the five DKK1-targeting 932 miR constructs, with or without doxycycline treatment. Data normalized to the

35 933 non-treated control for each cell line separately. Error bars represent 95% 934 confidence intervals obtained from multiple PCR reactions.

935

936 Figure 7 – figure supplement 1.

937 A) RBM47 and DKK1 mRNA expression measured by quantitative real-time 938 RT-PCR in ZR-75-30 cells expressing either control vector (pGIPZ) or a hairpin 939 against RBM47 (shRNA2). Error bars represent 95% confidence intervals 940 obtained from multiple PCR reactions.

941

942 Figure 8. RBM47 suppresses tumor progression via Wnt inhibition.

943 (A) AXIN2 mRNA levels determined by quantitative real-time RT-PCR in WT6 944 cells treated with recombinant WNT3A in the presence of increasing 945 concentrations of doxycycline. Error bars represent 95% confidence intervals 946 obtained from multiple PCR reactions.

947 (B) Normalized level of AXIN2 mRNA inhibition as determined by quantitative 948 real-time RT-PCR in WT6 cells transduced wither with control vector or 949 shRNAmiR constructs targeting the first exon of DKK1. The cells were treated 950 with recombinant WNT3A in the presence of increasing concentrations of 951 doxycycline. Error bars represent 95% confidence intervals obtained from 952 multiple PCR reactions.

953 (C) Wnt pathway activity assessed in the TCGA cohort of primary breast 954 tumors, grouped by RBM47 expression tertiles (L, low; M, medium; H, high). Wnt 955 signature value calculated as sum of z-scores for a curated set of 16 Wnt target 956 genes in breast cancer. P-value determined by linear regression analysis.

957 (D) Mammary tumor re-initiation assay. Five thousand WT6 cells implanted 958 orthotopically in mice. RBM47 induced by doxycycline feed. Tumor growth

36 959 detected by bioluminescence. P-value determined by the log-rank test. N=20 960 tumors for each group.

961 (E) Quantification of mammary tumor burden by in vivo bioluminescence 962 imaging on day 33 of the experiment shown in (D). Data normalized to day 0 for 963 each tumor. P-value calculated by the Wilcoxon rank-sum test.

964 (F) Quantification of mammary tumor burden by in vivo bioluminescence 965 imaging in mice inoculated with 231-Brm2 cells transduced with either control 966 (pGIPZ) or DKK1-targeting shRNAmiR constructs. Data normalized to day 0 for 967 each tumor. P-value calculated by one-tailed Wilcoxon rank-sum test.

968 (G) Quantification of ex vivo brain bioluminescence shown for mice inoculated 969 intracardiacly with WT6 cells transduced with either control (pGIPZ) or DKK1- 970 targeting shRNAmiR constructs in the presence of RBM47, i.e. doxycycline in 971 diet. One out of 9 (11%) control mice developed robust brain metastasis whereas 972 8/17 (47%) mice in the DKK1 knockdown groups showed metastasis. P-value 973 calculated by one-tailed Student’s t-test.

974 (H) Representative images of coronal brain sections analyzed for GFP 975 immunofluorescence from the experiment shown in panel (G). Lesion contours 976 are marked in white. Arrowheads indicate the lesions shown in higher 977 magnification on the right; a similar brain area is shown for the control group. 978 Scale bar 500μm.

979 (I) At the global level, RBM47 binds to ~2500 target mRNAs. However, the 980 abundance or alternative splicing of only a fraction of these change depending on 981 RBM47 status. The target genes represent molecules from various signaling 982 pathways. The net effect of growth promoting and inhibiting alterations determine 983 whether RBM47 loss is beneficial for a particular cancer clone.

984 (J) At the target mRNA level, the effects of RBM47 are dependent on the 985 presence of other factors that modulate mRNA processing. Hence, the 986 phenotype of RBM47 loss depends on the intracellular molecular milieu on a per

37 987 transcript basis. This is exemplified by the interaction of RBM47 with DKK1 988 mRNA.

989

990 Figure 8 – figure supplement 1.

991 (A) Quantification of mammary tumor burden by in vivo bioluminescence 992 imaging on day 36 after five thousand WT10 cells were implanted orthotopically 993 in mice. RBM47 induced by doxycycline feed. Data normalized to day 0 for each 994 tumor. P-value calculated by the Wilcoxon rank-sum test.

995 (B) DKK1 mRNA measured by quantitative real-time RT-PCR in 231-BrM2 996 cells transduced with either control (pGIPZ) or DKK1-targeting shRNAmiR 997 constructs. Error bars represent 95% confidence intervals obtained from multiple 998 PCR reactions.

999 (C) Secreted DKK1 protein levels determined by ELISA in WT6 cells 1000 transduced with either control (pGIPZ) or DKK1-targeting shRNAmiR constructs, 1001 with or without doxycycline treatment.

1002

1003 SUPPLEMENTARY FILES

1004 Supplementary file 1 – RBM47-bound alternatively spliced exons showing 1005 RBM47-dependent ΔI≥|0.2|.

1006 Supplementary file 2 – mRNAs changed upon RBM47 reintroduction.

1007 Supplementary file 3 – RBM47 bound transcripts.

1008

1009 REFERENCES 1010 1011 1012 1. Vogelstein, B., et al., Cancer genome landscapes. Science, 2013. 339(6127): p. 1013 1546-58.

38 1014 2. Ding, L., et al., Genome remodelling in a basal-like breast cancer metastasis and 1015 xenograft. Nature, 2010. 464(7291): p. 999-1005. 1016 3. Cancer Genome Atlas, N., Comprehensive molecular portraits of human breast 1017 tumours. Nature, 2012. 490(7418): p. 61-70. 1018 4. Banerji, S., et al., Sequence analysis of mutations and translocations across 1019 breast cancer subtypes. Nature, 2012. 486(7403): p. 405-9. 1020 5. Stephens, P.J., et al., The landscape of cancer genes and mutational processes in 1021 breast cancer. Nature, 2012. 486(7403): p. 400-4. 1022 6. Shah, S.P., et al., The clonal and mutational evolution spectrum of primary 1023 triple-negative breast cancers. Nature, 2012. 486(7403): p. 395-9. 1024 7. Stephens, P.J., et al., Complex landscapes of somatic rearrangement in human 1025 breast cancer genomes. Nature, 2009. 462(7276): p. 1005-10. 1026 8. Shah, S.P., et al., Mutational evolution in a lobular breast tumour profiled at 1027 single nucleotide resolution. Nature, 2009. 461(7265): p. 809-13. 1028 9. Shen, H. and P.W. Laird, Interplay between the cancer genome and epigenome. 1029 Cell, 2013. 153(1): p. 38-55. 1030 10. Vanharanta, S., et al., Epigenetic expansion of VHL-HIF signal output drives 1031 multiorgan metastasis in renal cancer. Nat Med, 2013. 19(1): p. 50-6. 1032 11. Moore, M.J. and N.J. Proudfoot, Pre-mRNA processing reaches back to 1033 transcription and ahead to translation. Cell, 2009. 136(4): p. 688-700. 1034 12. Sharp, P.A., The centrality of RNA. Cell, 2009. 136(4): p. 577-80. 1035 13. Licatalosi, D.D. and R.B. Darnell, RNA processing and its regulation: global 1036 insights into biological networks. Nat Rev Genet, 2010. 11(1): p. 75-87. 1037 14. Pencheva, N. and S.F. Tavazoie, Control of metastatic progression by microRNA 1038 regulatory networks. Nat Cell Biol, 2013. 15(6): p. 546-54. 1039 15. Di Leva, G., M. Garofalo, and C.M. Croce, MicroRNAs in Cancer. Annu Rev 1040 Pathol, 2013. 1041 16. Karni, R., et al., The gene encoding the splicing factor SF2/ASF is a proto- 1042 oncogene. Nat Struct Mol Biol, 2007. 14(3): p. 185-93. 1043 17. Das, S., et al., Oncogenic splicing factor SRSF1 is a critical transcriptional target 1044 of MYC. Cell Rep, 2012. 1(2): p. 110-7. 1045 18. Mourtada-Maarabouni, M., et al., Candidate tumor suppressor LUCA- 1046 15/RBM5/H37 modulates expression of apoptosis and cell cycle genes. Exp Cell 1047 Res, 2006. 312(10): p. 1745-52. 1048 19. Oh, J.J., et al., 3p21.3 tumor suppressor gene H37/Luca15/RBM5 inhibits 1049 growth of human lung cancer cells through cell cycle arrest and apoptosis. 1050 Cancer Res, 2006. 66(7): p. 3419-27. 1051 20. Oh, J.J., et al., A candidate tumor suppressor gene, H37, from the human lung 1052 cancer tumor suppressor locus 3p21.3. Cancer Res, 2002. 62(11): p. 3207-13. 1053 21. Bos, P.D., et al., Genes that mediate breast cancer metastasis to the brain. 1054 Nature, 2009. 459(7249): p. 1005-9. 1055 22. Minn, A.J., et al., Genes that mediate breast cancer metastasis to lung. Nature, 1056 2005. 436(7050): p. 518-24. 1057 23. Wang, Y., et al., Gene-expression profiles to predict distant metastasis of lymph- 1058 node-negative primary breast cancer. Lancet, 2005. 365(9460): p. 671-9.

39 1059 24. Baltz, A.G., et al., The mRNA-bound proteome and its global occupancy profile 1060 on protein-coding transcripts. Mol Cell, 2012. 46(5): p. 674-90. 1061 25. Castello, A., et al., Insights into RNA biology from an atlas of mammalian 1062 mRNA-binding proteins. Cell, 2012. 149(6): p. 1393-406. 1063 26. Ray, D., et al., A compendium of RNA-binding motifs for decoding gene 1064 regulation. Nature, 2013. 499(7457): p. 172-7. 1065 27. Cerami, E., et al., The cBio cancer genomics portal: an open platform for 1066 exploring multidimensional cancer genomics data. Cancer Discov, 2012. 2(5): 1067 p. 401-4. 1068 28. Smid, M., et al., Subtypes of breast cancer show preferential site of relapse. 1069 Cancer Res, 2008. 68(9): p. 3108-14. 1070 29. Lu, S., et al., Claudin expression in high-grade invasive ductal carcinoma of the 1071 breast: correlation with the molecular subtype. Mod Pathol, 2013. 26(4): p. 1072 485-95. 1073 30. Forbes, S.A., et al., COSMIC (the Catalogue of Somatic Mutations in Cancer): a 1074 resource to investigate acquired mutations in human cancer. Nucleic Acids 1075 Res, 2010. 38(Database issue): p. D652-7. 1076 31. Darnell, R.B., HITS-CLIP: panoramic views of protein-RNA regulation in living 1077 cells. Wiley Interdiscip Rev RNA, 2010. 1(2): p. 266-86. 1078 32. Licatalosi, D.D., et al., HITS-CLIP yields genome-wide insights into brain 1079 alternative RNA processing. Nature, 2008. 456(7221): p. 464-9. 1080 33. Weyn-Vanhentenryck, S.M., et al., HITS-CLIP and Integrative Modeling Define 1081 the Rbfox Splicing-Regulatory Network Linked to Brain Development and 1082 Autism. Cell Rep, 2014. 6(6): p. 1139-52. 1083 34. Chi, S.W., et al., Argonaute HITS-CLIP decodes microRNA-mRNA interaction 1084 maps. Nature, 2009. 460(7254): p. 479-86. 1085 35. Darnell, J.C., et al., FMRP stalls ribosomal translocation on mRNAs linked to 1086 synaptic function and autism. Cell, 2011. 146(2): p. 247-61. 1087 36. Ince-Dunn, G., et al., Neuronal Elav-like (Hu) proteins regulate RNA splicing 1088 and abundance to control glutamate levels and neuronal excitability. Neuron, 1089 2012. 75(6): p. 1067-80. 1090 37. Bailey, T.L. and C. Elkan, Fitting a mixture model by expectation maximization 1091 to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol, 1994. 2: p. 1092 28-36. 1093 38. Zhang, C. and R.B. Darnell, Mapping in vivo protein-RNA interactions at single- 1094 nucleotide resolution from HITS-CLIP data. Nat Biotechnol, 2011. 29(7): p. 1095 607-14. 1096 39. Charizanis, K., et al., Muscleblind-like 2-mediated alternative splicing in the 1097 developing brain and dysregulation in myotonic dystrophy. Neuron, 2012. 1098 75(3): p. 437-50. 1099 40. Ule, J., et al., Nova regulates brain-specific splicing to shape the synapse. Nat 1100 Genet, 2005. 37(8): p. 844-52. 1101 41. Dreyfuss, G., V.N. Kim, and N. Kataoka, Messenger-RNA-binding proteins and 1102 the messages they carry. Nat Rev Mol Cell Biol, 2002. 3(3): p. 195-205. 1103 42. Bafico, A., et al., An autocrine mechanism for constitutive Wnt pathway 1104 activation in human cancer cells. Cancer Cell, 2004. 6(5): p. 497-506.

40 1105 43. Cowling, V.H., et al., c-Myc transforms human mammary epithelial cells 1106 through repression of the Wnt inhibitors DKK1 and SFRP1. Mol Cell Biol, 2007. 1107 27(14): p. 5135-46. 1108 44. Mikheev, A.M., et al., Dickkopf-1 mediated tumor suppression in human breast 1109 carcinoma cells. Breast Cancer Res Treat, 2008. 112(2): p. 263-73. 1110 45. Zhao, J., et al., TIP30/CC3 expression in breast carcinoma: relation to 1111 metastasis, clinicopathologic parameters, and P53 expression. Hum Pathol, 1112 2007. 38(2): p. 293-8. 1113 46. Li, H., et al., miR-17-5p promotes human breast cancer cell migration and 1114 invasion through suppression of HBP1. Breast Cancer Res Treat, 2011. 126(3): 1115 p. 565-75. 1116 47. Paulson, K.E., et al., Alterations of the HBP1 transcriptional repressor are 1117 associated with invasive breast cancer. Cancer Res, 2007. 67(13): p. 6136-45. 1118 48. Lahoz, E.G., et al., Suppression of Myc, but not E1a, transformation activity by 1119 Max-associated proteins, Mad and Mxi1. Proc Natl Acad Sci U S A, 1994. 1120 91(12): p. 5503-7. 1121 49. Hudson, R.S., et al., MicroRNA-106b-25 cluster expression is associated with 1122 early disease recurrence and targets caspase-7 and focal adhesion in human 1123 prostate cancer. Oncogene, 2013. 32(35): p. 4139-47. 1124 50. Roovers, K., et al., The Ste20-like kinase SLK is required for ErbB2-driven 1125 breast cancer cell motility. Oncogene, 2009. 28(31): p. 2839-48. 1126 51. Wade, M., Y.C. Li, and G.M. Wahl, MDM2, MDMX and p53 in oncogenesis and 1127 cancer therapy. Nat Rev Cancer, 2013. 13(2): p. 83-96. 1128 52. Oskarsson, T., et al., Breast cancer cells produce tenascin C as a metastatic 1129 niche component to colonize the lungs. Nat Med, 2011. 17(7): p. 867-74. 1130 53. Zhou, A.D., et al., beta-Catenin/LEF1 transactivates the microRNA-371-373 1131 cluster that modulates the Wnt/beta-catenin-signaling pathway. Oncogene, 1132 2012. 31(24): p. 2968-78. 1133 54. Bhattacharyya, S.N., et al., Relief of microRNA-mediated translational 1134 repression in human cells subjected to stress. Cell, 2006. 125(6): p. 1111-24. 1135 55. Young, L.E., et al., The mRNA stability factor HuR inhibits microRNA-16 1136 targeting of COX-2. Mol Cancer Res, 2012. 10(1): p. 167-80. 1137 56. Fellmann, C., et al., Functional identification of optimized RNAi triggers using a 1138 massively parallel sensor assay. Mol Cell, 2011. 41(6): p. 733-46. 1139 57. Clevers, H. and R. Nusse, Wnt/beta-catenin signaling and disease. Cell, 2012. 1140 149(6): p. 1192-205. 1141 58. Matsuda, Y., et al., WNT signaling enhances breast cancer cell motility and 1142 blockade of the WNT pathway by sFRP1 suppresses MDA-MB-231 xenograft 1143 growth. Breast Cancer Res, 2009. 11(3): p. R32. 1144 59. Lustig, B., et al., Negative feedback loop of Wnt signaling through upregulation 1145 of conductin/axin2 in colorectal and liver tumors. Mol Cell Biol, 2002. 22(4): p. 1146 1184-93. 1147 60. Mehta, A., et al., Molecular cloning of apobec-1 complementation factor, a novel 1148 RNA-binding protein involved in the editing of mRNA. Mol Cell 1149 Biol, 2000. 20(5): p. 1846-54.

41 1150 61. Chen, H.H., et al., The RNA binding protein hnRNP Q modulates the utilization 1151 of exon 7 in the survival motor neuron 2 (SMN2) gene. Mol Cell Biol, 2008. 1152 28(22): p. 6929-38. 1153 62. Weidensdorfer, D., et al., Control of c-myc mRNA stability by IGF2BP1- 1154 associated cytoplasmic RNPs. RNA, 2009. 15(1): p. 104-15. 1155 63. Chaudhury, A., P. Chander, and P.H. Howe, Heterogeneous nuclear 1156 ribonucleoproteins (hnRNPs) in cellular processes: Focus on hnRNP E1's 1157 multifunctional regulatory roles. RNA, 2010. 16(8): p. 1449-62. 1158 64. Lagier-Tourenne, C., et al., Divergent roles of ALS-linked proteins FUS/TLS and 1159 TDP-43 intersect in processing long pre-mRNAs. Nat Neurosci, 2012. 15(11): p. 1160 1488-97. 1161 65. Li, X., et al., Finding the target sites of RNA-binding proteins. Wiley Interdiscip 1162 Rev RNA, 2014. 5(1): p. 111-30. 1163 66. Kim, H.H., et al., HuR recruits let-7/RISC to repress c-Myc expression. Genes 1164 Dev, 2009. 23(15): p. 1743-8. 1165 67. Zou, T., et al., Polyamines regulate the stability of JunD mRNA by modulating 1166 the competitive binding of its 3' untranslated region to HuR and AUF1. Mol Cell 1167 Biol, 2010. 30(21): p. 5021-32. 1168 68. Guan, R., et al., rbm47, a novel RNA binding protein, regulates zebrafish head 1169 development. Dev Dyn, 2013. 1170 69. Richard, S., et al., Sam68 haploinsufficiency delays onset of mammary 1171 tumorigenesis and metastasis. Oncogene, 2008. 27(4): p. 548-56. 1172 70. Sommer, G., et al., The RNA-binding protein La contributes to cell proliferation 1173 and CCND1 expression. Oncogene, 2011. 30(4): p. 434-44. 1174 71. Wang, J., et al., Predictive and prognostic significance of cytoplasmic expression 1175 of ELAV-like protein HuR in invasive breast cancer treated with neoadjuvant 1176 chemotherapy. Breast Cancer Res Treat, 2013. 141(2): p. 213-24. 1177 72. Kang, Y., et al., A multigenic program mediating breast cancer metastasis to 1178 bone. Cancer Cell, 2003. 3(6): p. 537-49. 1179 73. Dow, L.E., et al., A pipeline for the generation of shRNA transgenic mice. Nat 1180 Protoc, 2012. 7(2): p. 374-93. 1181 74. Hillman-Jackson, J., et al., Using Galaxy to perform large-scale interactive data 1182 analyses. Curr Protoc Bioinformatics, 2012. Chapter 10: p. Unit10 5. 1183 75. Licatalosi, D.D., et al., Ptbp2 represses adult-specific splicing to regulate the 1184 generation of neuronal precursors in the embryonic brain. Genes Dev, 2012. 1185 26(14): p. 1626-42. 1186 76. Schneider, C.A., W.S. Rasband, and K.W. Eliceiri, NIH Image to ImageJ: 25 1187 years of image analysis. Nat Methods, 2012. 9(7): p. 671-5. 1188 77. Zhang, X.H., et al., Latent bone metastasis in breast cancer tied to Src- 1189 dependent survival signals. Cancer Cell, 2009. 16(1): p. 67-78. 1190 78. Dobin, A., et al., STAR: ultrafast universal RNA-seq aligner. Bioinformatics, 1191 2013. 29(1): p. 15-21. 1192 79. Anders, S. and W. Huber, Differential expression analysis for sequence count 1193 data. Genome Biol, 2010. 11(10): p. R106. 1194 1195

42 D Brain met free survival Vanharanta, Marney et al. A 0.2 0.4 0.6 0.8 1.0 0

0.0 0.2 0.4 0.6 0.8 1.0 function Metastatic

0 010150 100 50 0

Figure 1. RBM47 expression is associated with breast cancer progression. progression. cancer withbreast associated is expression RBM47 1. Figure ! ! p=0.00026 RBM47 high RBM47 low outcome Clinical 50 ! !

Months G metastasis P =0.00026 Mutated in RBM47 expression 100 RBM47 expression (z −scores)

(z-score) ! !

-2 -1

RBM47high RBM47low

0 1 2 3 4

−2 −1 0 1 2 3 4 RBM47

150 u e2 Basal Her2+ LumA

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Lum A ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● B ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

Lum B E LungLung metmet free free survival survival Relative RBM47 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.2 0.4 0.6 0.8 1.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 ● ● ● ● ● Her2+ ● mRNA level 0.0 0.2 0.4 0.6 0.8 1.0

0.4 0.4 0.2 0.2 0.6 0.8 1.2 0 ● 010150 100 50 0 ● ● ● 0 1 ● ● ●

Claud- ● p=0.0012 ● ● ● ● ● ● ● ● ● ● ● ● ● RBM47 high RBM47 low ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● MDA-231 MDA-231 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● Par ● ● ● ● ● ● Basal 50 Time (months)Time

LM Months

100 BoM P =0.0012 H

RBM47

RBM47 BrM A1CF A1CF

RBM47high RBM47low

150 Par CN34 I281fs

BrM F

BoneBone Met met Free free Survival survival C 0.2 0.4 0.6 0.8 1.0 0

0.0 0.2 0.4 0.6 0.8 1.0

0 Par 010150 100 50 0 MDA-231 MDA-231

p=0.44 LM RBM47 high RBM47 low

50 BoM

Months BrM CN34

100 Par

P =0.44

RBM47high RBM47low BrM RBM47 Tubulin 150

Vanharanta, Marney et al.

A B C D E Brain Met Free Survival

1.0 MUT P = 0.05

100 50

MUT p/s)

0.8 WT 6 40 10 MUT 0.6 30 WT 0.4 WT Survival Probablity Survival 20 1 MUT photon flux P = 0.02 WT 0.2 10

Normalized lung P = 0.038 0.0 0.2 0.4 0.6 0.8 1.0

0 Photon flux (10 0

0.1

Brain met free survival 0 10 20 30 40 0 10 20 30 40 5 7 0 10 20 30 Days 10 10

1 5 WT Time (Days) x 10 6 Time (Days) MUT

F G H

1.0 1.0 0.4 Ctrl

0.8 0.8 0.3

0.6 0.6

= 0.005 P Ctrl 6 0.2 Survival 0.4 0.4 shRNA1 shRNA1 shRNA2 5 x 10

0.2 0.2 0.1

P < 0.02 4

0 0.0 4

0 Lung met free survival shRNA2

0 40 60 Normalized photon flux 0 10 2020 30 40 50 60 1 2

3 Ctrl Time (Days) shRNA Time (days)

Figure 2. RBM47 suppresses metastatic breast cancer progression. Vanharanta, Marney et al.

A B C D WT WT MUT Normal lung

E F G WT10 WT6

Ctrl

1.0

1.0 -7 RBM47 -5

0.8 0.8

8 7

Ctrl 10 10

0.6 0.6 RBM47 10 7 10 6 Survival 0.4 0.4 6 = 3.3 x 10 P P = 10 = 5.1 x 10 P

0.2 0.2 5 -5 10 4.9 x 10 10 5

0 0.0

Brain met free survival

0.2 50 Photon flux (p/s) 4 4 00 5 1010 2020 3030 10 Photon flux (p/s) 10 x 10 5 TimeTime (Days) (days) Ctrl RBM47 Ctrl RBM47

H I J P = 0.086 1.0 1.0 Ctrl 10 9 I281fs

0.8 0.8 RBM47 8 Ctrl 10

0.6 0.6 I281fs 10 7 Survival 0.4 0.4 10 6 0.2 0.2

P = 0.12 Photon flux (p/s) 5

0 0.0 10 Brain met free survival 00 5 1010 15 2020 25 3030 10 5 10 7 Ctrl I281fs TimeTime (Days) (days)

Figure 3. Clonal heterogeneity in RBM47 sensitivity. Vanharanta, Marney et al.

A B Standard HITS-CLIP Modified Cloning Control WT RBM47 UV: UV-Irradiation & Lysis ProtK & polyA tailing AAAAAAAA ... 225 Partial RNAseA digestion 150 & 3’ dephosphorylation 102 Anneal RT Primer 76 RNA AAAAAAAA ... Flag-IP &5’ PNK labeling NVTTTTTTTT 52 SDS-PAGE & transfer to RT-PCR with BrdU & IP 38 TTTTTTTT nitrocellulose ** * *** * * Control WT RBM47

Cont WT RBM47 UV: RNAse: + + +++ + + Circularize UV: + - + + + 225 150 225 102 150 76 Flag- RBM47 102 52 76 Linearize & PCR 38 TTTTTTTT 52 ** * *** * *

C D INCREASING STRINGENCY 100 R2 =0.998

80 Deep Intergenic Intron 60 Downstream 10KB 3’UTR 40 5’UTR CDS Exon

20 Replicate 2 (logCLIP tags/Gene) Percent of BC2 Binding Sites Binding BC2 of Percent 0 CLUSTER CLUSTER Significant Chi-Squared Replicate 1 (logCLIP tags/Gene) Tags ≥2 Tags ≥10 Peak p≤0.01

E F 2 30 1 bits 25 C TT TTTTTT TG C C C AC A GC C C T G G T AC TA 0 1 2 3 4 5 6 7 8 9

20 10 11 12

15 G

10 Tags per BC=2 Cluster BC=2 per Tags 5

CDS 5'UTR Intron 3’UTR Deep plus 10kb Intergenic

Figure 4. HITS-CLIP identifies genome-wide RBM47 binding maps. Vanharanta, Marney et al.

A B ΔΙ ≥|0.2| p ≤0.05 RBM47 bound, included RBM47 bound, excluded

1.0 5’FI 3’FI 5’CE Cassette Exon 3’ CE 0.8 a b Paired-end RNAseq Reads 0.6 ISOFORM 1 5’ CE Cassette Exon 3’ CE c 0.4 ISOFORM 2 5’ CE 3’ CE 0.2

Average IR RBM47 #10 Inclusion a b + = Rate (IR) a + b + 2( c) 0.0 0.0 0.2 0.4 0.6 0.8 1.0 C Average IR Control SLK 360 _ 105760000 105765000 chr10 RBM47 #10 -RT Control RBM47 #10 0 _ 360 _ Control

0 _ _ 13 RBM47 CLIP Tags IR=0 IR=0.58

0_

5 kb

D

200

100

5’ CE 3’ CE

100 RBM47 CLIP Tags Tags RBM47 CLIP 200 Normalized Complexity Normalized 500nt

Figure 5. !"#$%&'()*+,-(.&,+-('/,01(&.2+343/) ! Vanharanta, Marney et al. D RBM47RBM47 expression expression Relative frequency -2.0 A P =1.7x10 4.0 2.0 0.0 of genes

Cluster 1 -2 0 2 4 0.5 1.0

0

High-like 0.0 0.5 1.0

0 0

H

-15 Cluster 2 Low-like 1

DKK1 (pg/ml) 1

1000 1200 1400 -log 200 400 600 800 Figure 6. RBM47-induced changes inmRNA changes RBM47-induced 6. Figure levels. 2 2 0 10

(P-value) Dox (ng/ml) P <0.01 3 E 0 1 2.5 2.5 1 0 3

FoldLog2 change fold change (log 2) -1.0 -0.5 1.0 1.5 0.5 0.0 MUT3 WT10 4 WT6

-1.0 -0.5 0.0 0.5 1.0 1.5 4

MUT3 WT10 WT6

3.2 x10 FCup

Up 5 5

P =

Relative frequency B FCdown

IL8 (pg/ml) Down -13 100 120

20 40 60 80 of genes

0.5 1.0 0

0

0.0 0.5 1.0

Dox (ng/ml) -1.0 0 -1.0

F WT6

1

,*))!(%&'! -0.5 -0.5

#"! 2.5 WT10

Correlation

0.0 0

VEGFA (pg/ml) MUT3 1000 1200 1400 200 400 600 800 +*))!(%&'! $!$ ! 0.5 0.5

0

Dox (ng/ml) 0 1.0 1.0

1 C WT10 MUT3

G WT6 Dox 2.5 Tags Tags 10 10 10 10

1 3 4 1 10 1002 1000 10000

UP ! FCup Up DOWN

FCdown Down !

Vanharanta, Marney et al.

A B WT6 WT10 MUT3 1.2 RBM47 1

1 1 DKK1 1 0.8 0.6 DKK1 0.4 Ctrl Ctrl Ctrl 0.2 level mRNA Relative Dox Dox Dox 0.1 Relative mRNA level Relative mRNA 0 0.1 0.1 Ctrl sh1 sh2 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 RBM47 Time (hours) Time (hours) Time (hours) C D E 5 Ctrl Dox

1.2

miRm 1.0 4 0.8 3 miRn 0.6 2 0.4

mRNA level mRNA 1 mRNA level mRNA 0.2 Relative DKK1 Relative DKK1

0.0 0

miRo

Ctrl Ctrl miR1 miR2 miR3 miR4 miR5 miR1 miR2 miR3 miR4 miR5 Ctrl miRm miRn miRo miRp Ctrl miRm miRn miRo miRp miRq miRp F miRq G 1.2 7 Ctrl Dox

miRq 1.0 6

0.8 5 4 0.6 5’UTR Intron 3 0.4 Exon 3’UTR 2 mRNA level mRNA 0.2 level mRNA 1 Relative DKK1 Relative DKK1

0.0 0

Ctrl Ctrl miR1 miR2 miR3 miR4 miR5 miR1 miR2 miR3 miR4 miR5 Ctrl miRm miRn miRo miRp Ctrl miRm miRn miRo miRp miRq miRq

Figure 7. !"#$%&'()*+,-./& !""# &'!01&/-,23+3-45& Vanharanta, Marney et al.

A B C 50 50 P=4.0e-4 P = 15 70 40 40

Ctrl -4 60 1.6 x 10

WNT3 30 30 50 10 40 20 20

10 10 30 Ctrl 5 Ctrl expression Wnt signature 20 0 DKK1miR1DKK1miRm Wnt signature expression 0 expression (%) mRNA level mRNA Relative AXIN2 Relative Inhibition of AXIN2 Inhibition of 10 DKK1miR2DKK1miRn

-10 -10 0 0 0 2.5 5 10 Low Medium High 0 2.5 5 7.5 10 L M H Dox (ng/ml) Dox (ng/ml) RBM47

D E F P = 0.002

3 10 P = 1.0

1.0 2.2 x 10 -6 10 3 RBM47 2 0.8 0.8 Ctrl 10 10 2 10 100 1000 0.6 0.6 10 10 100 1000 0.4

1 10 0.4 1 P = 0.2 0.2 1 1 3.5 x 10 -10 10 -1 Tumor free survival Tumor Relative photon flux 0 0.0 -1 -2 Relative photon flux 0.1 10 0.01 0.1 10 00 5 1010 15 2020 25 3030 CtrlCtrl RBM47RBM47 CtrlCtrl miR1m miR2n TimeTime (Days) (days) DKK1miR G H I J RBM47 mRNA destabilizing

factors 35 Ctrl mRNA (~2500)

p/s) 30 RBM47 6 25

(10 P = 0.028 P 20 DKK1 15 mRNA level Alternative miRm 10 splicing 5

Up Down WNT DKK1 Photon flux

0 m n Ctrl DKK1miR miRn Tumor progression Tumor progression

Figure 8. RBM47 suppresses tumor progression via Wnt inhibition.