Articles https://doi.org/10.1038/s41556-017-0021-z

MSK1 regulates luminal cell differentiation and metastatic dormancy in ER+ breast cancer

Sylwia Gawrzak1, Lorenzo Rinaldi1, Sara Gregorio1, Enrique J. Arenas1, Fernando Salvador1, Jelena Urosevic1,2, Cristina Figueras-Puig1, Federico Rojo2,3,4, Ivan del Barco Barrantes1, Juan Miguel Cejalvo1,5, Marta Palafox6, Marc Guiu1,2, Antonio Berenguer-Llergo7, Aikaterini Symeonidi1, Anna Bellmunt1, Daniela Kalafatovic1, Anna Arnal-Estapé1,20, Esther Fernández1, Barbara Müllauer1, Rianne Groeneveld1, Konstantin Slobodnyuk1, Camille Stephan-Otto Attolini7, Cristina Saura8,9, Joaquín Arribas2,9,10,11, Javier Cortes9,12, Ana Rovira3,13, Montse Muñoz5,14, Ana Lluch2,15,16,17, Violeta Serra2,6, Joan Albanell2,3,13,18, Aleix Prat5,14, Angel R. Nebreda1,11, Salvador Aznar Benitah1,11 and Roger R. Gomis1,2,11,19*

For many patients with breast cancer, symptomatic bone metastases appear after years of latency. How micrometastatic lesions remain dormant and undetectable before initiating colonization is unclear. Here, we describe a mechanism involved in bone metastatic latency of oestrogen receptor-positive (ER+) breast cancer. Using an in vivo genome-wide short hairpin RNA screening, we identified the kinase MSK1 as an important regulator of metastatic dormancy in breast cancer. In patients with ER+ breast cancer, low MSK1 expression associates with early metastasis. We show that MSK1 downregulation impairs the differentiation of breast cancer cells, increasing their bone homing and growth capacities. MSK1 controls the expression of genes required for luminal cell differentiation, including the GATA3 and FOXA1 transcription factors, by modulating their promoter chromatin status. Our results indicate that MSK1 prevents metastatic progression of ER+ breast cancer, suggesting that stratifying patients with breast cancer as high or low risk for early relapse based on MSK1 expression could improve prognosis.

Metastasis in breast cancer generally manifests asynchronously with the primary tumour, with different timelines to clinical detection of symptoms. This time depends on the volume, stage and molecular subtype of the primary tumour1. However, luminal tumours, which usually express oestrogen receptor (ER), may recur after a long period of time, characterized by the absence of symptoms. The capacity of micrometastases and/or disseminated tumour cells (DTCs) in the bone marrow to maintain themselves at low numbers after primary tumour resection is critical for tumour latency and may explain how disease can resist treatment and reappear after long asymptomatic periods. Several clinical trials have revealed that circulating tumour cell (CTC) counts in blood have prognostic relevance with respect to metastasis progression2. This observation suggests that dormancy or quiescence of a solitary cell is not a unique feature of latent metastatic lesions, and that a combination of proliferative and apoptotic activities is required to sustain the release of CTCs.

Until now, the mechanisms enabling breast cancer cells to exit from latency have been only poorly understood in preclinical models, and even less so in a clinical context. Metastatic lesions that originate from DTCs or micrometastases after a period of latency retain the vast majority of genetic and molecular alterations (80–85%) initially described at the primary site3. However, discordance in the intrinsic or hormonal status of breast cancer subtypes has been reported in metastatic progression: for instance, luminal/HER2-negative (HER2–) tumours acquire a luminal B or HER2-enriched profile during metastatic progression3,4. This suggests that an important, but subtle, loss of molecular differentiation properties arises during metastatic progression, and that dormancy may be an endowed state. We aimed to distinguish whether these differentiation changes are passengers during the tumour evolution or, alternatively, if they have functional consequences for latency and overt metastasis.

To this end, we performed an in vivo loss-of-function, genome-wide short hairpin RNA (shRNA) screening to identify genes involved in breast cancer latency, using an experimental mouse model based on human ER-positive (ER+) breast cancer cells that are moderately metastatic for bone. After injection into

1Oncology Program, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain. 2CIBERONC, Madrid, Spain. 3Cancer Research Program, IMIM (Hospital del Mar Medical Research Institute), Barcelona, Spain. 4Pathology Department, IIS-Fundación Jimenez Diaz, Madrid, Spain. 5Translational Genomics and Targeted Therapeutics, Institut d’Investigacions Biomèdiques Pi i Sunyer-IDIBAPS, Barcelona, Spain. 6Experimental Therapeutics, Vall d’Hebron Institute of Oncology, Barcelona, Spain. 7Biostatistics and Bioinformatics Unit, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain. 8Department of Oncology, Vall d’Hebrón University Hospital, Barcelona, Spain. 9Vall d’Hebron Institute of Oncology, Barcelona, Spain. 10Universitat Autònoma de Barcelona, Bellaterra, Spain. 11ICREA, Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain. 12Ramon y Cajal University Hospital, Madrid, Spain. 13Medical Oncology Service, Hospital del Mar, Barcelona, Spain. 14Department of Oncology, Hospital Clinic de Barcelona, Barcelona, Spain. 15Department of Oncology and Hematology, Hospital Clínico Universitario, Valencia, Spain. 16University of Valencia, Valencia, Spain. 17INCLIVA, Instituto de Investigación Sanitaria, Valencia, Spain. 18Universitat Pompeu Fabra, Barcelona, Spain. 19Universitat de Barcelona, Barcelona, Spain. Present address: 20Department of Pathology, Yale University School of Medicine, Yale, CT, USA. *e-mail: [email protected]

Nature Cell Biology | VOL 20 | FEBRUARY 2018 | 211–221 | www.nature.com/naturecellbiology © 2018 Nature America Inc., part of Springer Nature. All rights reserved.

immunodeficient mice, these cells form latent micrometastatic bone lesions for extended periods of time before causing overt metastasis. These studies identified mitogen- and stress-activated kinase 1 (MSK1) as an important regulator of metastatic latency. Clinically, low MSK1 expression associates with early relapse in patients with ER+ breast cancer. At the molecular level, reduced MSK1 expression causes chromatin remodelling, which decreases differentiation traits (including reduced expression of the genes encoding GATA3 and FOXA1 transcription factors) and facilitates bone colonization by cells in micrometastatic lesions.

Results

Latent tumour mass bone metastasis model in ER+ breast cancer. To test whether dormancy is the default state in disseminated luminal breast cancer cells and micrometastatic lesions, we developed an experimental latency bone metastasis model and designed a systemic and unbiased experimental approach to identify genes that regulate dormancy. High-performance, live-animal imaging techniques allowed us to track and selectively isolate dormant bone metastatic (DBM) T47D luminal (ER+) breast cancer cell derivatives that, despite their reproducibly long latency phase, form bone lesions after injection into the left ventricle of mice (Fig. 1a and Supplementary Fig. 1a,b). Contrary to previously reported breast cancer experimental metastasis models selected for bone tropism5 (Supplementary Fig. 1c,d), only a small fraction of animals developed an overt lesion at late time points. These cells showed three distinct growth phases: homing (n = 8 out of 12), latency (n = 5 out of 12) and symptomatic metastasis (n = 2 out of 12) (Fig. 1a and Supplementary Fig. 1b). We visualized metastatic cells in the mouse skeleton with an in vivo imaging micro-computed tomography system (IVIS-μCT). Disseminated DBM cells in the bone-homing phase were clinically asymptomatic. During the entire metastatic process, DBM cells retained high levels of ER expression (Fig. 1a).

After harvesting dormant metastatic lesions, we used genome-wide microarray transcriptomic profiling and a comparative genomic hybridization (CGH) analysis to compare parental versus DBM cells. Gene set enrichment analysis (GSEA) of differentially expressed genes showed that DBM cells gained metastatic capacities by upregulating extracellular matrix proteins, but also gained dormant abilities by downregulating genes related to mitosis (Fig. 1b and Supplementary Table 1). These changes were not attributable to genomic alterations (Supplementary Fig. 1e).

Bioluminescent imaging (BLI), histological staining and immunohistochemistry analysis revealed an increase in lesion size and osteolysis in those few lesions that transitioned from latency to macrometastases (Fig. 1a and Supplementary Fig. 1f,g). We also observed that the lesions in our model grew as micrometastatic colonies (Supplementary Fig. 1h), in which the combination of cell death and proliferation gives rise to ‘tumour mass dormancy’6,7. We analysed for actively proliferating cells by Ki67 immunostaining, and for cells that go through S phase by injecting mice with the thymidine analogue 5-bromodeoxyuridine (BrdU). Quantification of Ki67+ and BrdU+ cells showed that fewer cells are cycling in the latent micrometastatic lesions than in overt metastases (Fig. 1c,d). In addition, the in vivo activity of the apoptosis marker caspase 3/7 decreased during metastasis progression (Fig. 1e). A pulse-chase experiment of animals harbouring either dormant micrometastases or overt metastatic lesions (with a 10-day pulse with the thymidine analogue 5-ethynyl-2′-deoxyuridine (EdU) and a 12-day chase) showed a twofold higher count of label-retaining cells in latent lesions than in overt metastases (Fig. 1f,g). Consistent with previous reports, latent lesions showed increased levels of phosphorylated (active) p38 MAPK; however, no differences in the levels of phosphorylated SMAD2 could be confirmed7,8 (Supplementary Fig. 1i,j). Collectively, these results suggest that our experimental latency model is driven by tumour mass dormancy, with the combination of slow tumour cell proliferation and controlled apoptosis keeping the mass at a steady size (Fig. 1h).

MSK1 regulates tumour mass dormancy in ER+ breast cancer, and its high expression associates with late metastases in patients with ER+ breast cancer. We next performed an unbiased, genome-wide shRNA screening to identify regulators of tumour mass dormancy in ER+ breast cancer (Fig. 2a). We used a whole-genome human shRNA lentiviral library divided into 10 pools, each containing 8,000 shRNAs. Numerous distinct shRNAs (4–5) that target each of the 16,019 human genes were included in the screen and scrutinized to exclude off-targets (Fig. 2a), and 0.4 viral multiplicity of infection was used to ensure one integrant per cell. Infected cells were selected with puromycin and insertion was confirmed by PCR (Fig. 2b), and each cell pool was injected into the left ventricle of ten mice. The cells that expanded and caused symptomatic metastasis were green fluorescent protein (GFP) sorted, and the integrated shRNAs were sequenced (Fig. 2a). Using the BLI method, a fourfold increase in the number of overt metastases was detected in the shRNA pools compared to control cells at a 4-month post-injection analysis (Fig. 2c–e). Retrieved sequences were then weighted based on their representation in metastatic lesions and associated with their target (Supplementary Table 2). shRNAs represented by 10–300 copies in the cell pool prior to inoculation were selected, and genes whose silencing was expected to either reduce cell death or enhance proliferation were omitted (Fig. 2f); this resulted in a short-list of 322 genes (Fig. 2g). The RPS6KA5 gene, which encodes MSK1, and the CELA3A gene, which encodes the serine protease chymotrypsin-like elastase family member 3A, were among the 6 genes with ≥2 shRNA hits (2 shRNA detected for both genes), which were present in ≥2 independent lesions (15 genes) (Fig. 2h and Supplementary Table 3). Although CELA3A represents a potentially interesting candidate, we focused here on MSK1, a downstream effector of the p38 MAPK and extracellular signal-regulated kinase 1/2 (ERK1/2) signalling pathways9,10, which has not been previously associated with breast cancer metastasis or dormancy11.

We tested the association between MSK1 expression and late metastasis in two independent cohorts of ER+ breast cancer primary tumour biopsies, for which we had the clinical annotation of time-to-metastasis. The first cohort (Memorial Sloan Kettering Cancer Center/Erasmus Medical Center (MSKCC/EMC) cohort; GSE2034, GSE2603, GSE5327 and GSE12276; total n = 632) comprised 370 ER+ breast cancer biopsies with publicly available gene expression profiles. MSK1 expression was significantly downregulated in tumours from patients who presented bone relapse within 5 years after primary tumour diagnosis (<5 yr hazard ratio (HR) = 0.41, P = 0.01, n = 164) (Fig. 3a). No association between MSK1 expression and tumour size was observed (Fig. 3b). Importantly, this effect extended to metastasis-free survival, which was significantly lower in ER+ patients with low MSK1 levels (<5 yr HR = 0.48, P = 0.01, n = 193), whereas no differences in MSK1 expression were observed in patients with ER-negative (ER–) breast cancer (Fig. 3c,d). These observations were confirmed in an independent early-breast cancer patient cohort, with MSK1 levels quantified by immunohistochemistry (<5 yr HR = 0.16, P = 0.007, n = 214) (Fig. 3e,f). Collectively, these findings reveal an association between high MSK1 expression and late metastases appearance in patients with ER+ breast cancer.

To confirm the screening results, we depleted MSK1 in DBM cells using two independent shRNAs; this had no significant effect on cell proliferation in vitro (Fig. 4a and Supplementary Fig. 2c). Strikingly, however, MSK1 depletion strongly promoted bone homing in vivo (Fig. 4a). These results were confirmed using clustered regularly interspaced short palindromic repeats (CRISPR)–CRISPR associated protein 9 (Cas9) genome editing to deplete


[Figure 1 image, panels a–h. Recoverable labels: a, DBM cells, IC injection; IVIS-µCT, X-ray, H&E and ERα images at homing (disseminated cells, no osteolysis), latency (stable lesion, no or small osteolysis) and metastasis (growing lesion, large osteolysis). b, pie chart, P ≠ DBM (1.57%) versus P = DBM (98.43%); GSEA terms plotted by normalized enrichment score (ribosome biogenesis, secondary metabolic process, mitosis, proteinaceous ECM, segregation, ECM organization). c, BrdU IF of a latent micrometastasis (GFP/BrdU/DAPI). d, BrdU-positive cells (%) and Ki67-positive cells (%), latency versus metastasis. e, Z-DEVD/LUC, latency versus metastasis. f,g, IC injection, EdU pulse-chase and FACS of GFP and EdU with LRC gates. h, latency: slow proliferation = apoptosis; metastasis: high proliferation > apoptosis.]

Fig. 1 | Tumour mass latent ER+ breast cancer bone metastasis model. a, Graphical schematics and representative images of IVIS-µCT, haematoxylin and eosin (H&E) and ERα immunohistochemical staining performed on hindlimbs at different stages of metastatic colonization (n = 3 limbs per condition). Dashed lines, tumour area. Scale bars, 500 µm (lower magnification) or 50 µm (higher magnification). IC, intracardiac. b, Pie graph representing genes differentially expressed by fold change = 3 between parental (P) and DBM cell lines (left) with GSEA using GO slim performed on upregulated and downregulated genes in DBM cells (right). c, Schematic of BrdU incorporation experiment (top), and representative immunofluorescence (IF) image of a bone lesion stained for GFP, BrdU and nuclei (using DAPI) (n = 3 limbs) (bottom). Scale bar, 20 µm. d, Quantification of BrdU+ cells in latent lesions (n = 11 ROIs from 3 limbs) and metastatic lesions (n = 8 ROIs from 2 limbs). Quantification of Ki67+ cells in latent lesions (n = 12 ROIs from 3 limbs) and metastatic lesions (n = 15 ROIs from 5 limbs) (top). Two-tailed Mann–Whitney test. Representative image of BrdU and Ki67 staining in lesions during latency, the metastatic phase or in bone marrow (bottom). Scale bar, 50 µm. The experiment was performed once. e, Representative images of the caspase 3/7 substrate (Z-DEVD) BLI signal in latent and metastatic lesions (top) and LUC BLI activity (middle). Quantification of the apoptotic signal (Z-DEVD) normalized to the lesion size (LUC) in latent micrometastatic (n = 2 limbs) and metastatic (n = 7 limbs) lesions (bottom). f, Schematic of label-retaining cell (LRC) detection in latent micrometastatic lesions (top). Quantification of GFP+ and EdU+ cells in latent micrometastatic lesions (n = 3 limbs, single FACS analysis) (bottom). g, Schematic of label-retaining cell detection in metastatic lesions (top). Quantification of GFP+ and EdU+ cells in metastatic lesions (n = 3 limbs, single FACS analysis) (bottom). h, Graphical summary of the tumour mass dormancy model. d and e show data as whisker plots: midline, median; box, 25–75th percentile; whisker, minimum to maximum. Statistics source data are provided in Supplementary Table 9.
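The two-tailed Mann–Whitney test named in this legend can be illustrated with a minimal sketch. The function name `mann_whitney_u` and the toy inputs are hypothetical, and the P value uses a normal approximation without tie or continuity correction, which is an assumption of this sketch and not necessarily how the reported P values were computed; for the small group sizes quoted here, an exact test would normally be preferred.

```python
import math

def mann_whitney_u(x, y):
    """Two-tailed Mann-Whitney U test (normal approximation, no tie or
    continuity correction). Returns (U, p)."""
    # Pool both groups, remembering which group each value came from.
    combined = sorted((v, g) for g, grp in enumerate((x, y)) for v in grp)
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        # Find the run of tied values and give each the average rank.
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal P value
    return u, p

# Hypothetical percentages of positive cells per ROI, not data from the paper.
u, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

With fully separated groups, as in the toy call, U is 0 and the approximate P value sits just under 0.05.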


[Figure 2 image, panels a–h. Recoverable labels: a, lentiviral genome-wide human shRNA library, 10 pools (16,019 genes), MOI = 0.4, DBM cells, gDNA shRNA abundance comparison, 10 pools × 10 mice and shCtrl × 9 mice. b, PCR gel lanes: pools 1–10, shCtrl, not transfected, negative and positive PCR controls. c, whole mouse BLI signal fold change, pools versus shCtrl. d, hindlimb homing (%) per pool and shCtrl. e, hindlimb metastasis fold change per pool and shCtrl. f, number of sequences in pre-inoculation samples versus shRNA sequence. g, sequence abundance fold change (log2) versus shRNA sequence; 322 genes. h, ≥2 lesions per gene and ≥2 shRNAs per gene, with RPS6KA5 and CELA3A highlighted.]

Fig. 2 | Genome-wide shRNA screen identifies MSK1 as a dormancy regulator in latent ER+ breast cancer bone metastasis. a, Schematic representation of the in vivo shRNA screening strategy. A viral multiplicity of infection (MOI) of 0.4 was used to ensure one integrant per cell40. Infected cells were selected with puromycin to confirm insertion. Before inoculation, genomic DNA (gDNA) from all cell pools was analysed to identify under-represented and over-represented shRNAs in the initial population. Each cell pool was injected into the left ventricle of ten mice to ensure biological reliability, and the metastasis-enhancing capacity of this loss-of-function screen was assessed by BLI after injection. A negative control of dormant metastatic DBM derivatives was injected into a parallel cohort of mice for comparison. Metastatic cells that overgrew and caused symptomatic metastasis were GFP sorted, and short hairpins integrated into their genomic DNA were sequenced. b, Agarose gel electrophoresis of genomic DNA isolated from different populations of DBM cells. The experiment was performed once. c, Quantification of whole-body BLI signal at day 0 of screening (pools, n = 97 mice; shCtrl, n = 9 mice). d, Hindlimb homing of cells infected with distinct pools of library or shCtrl. e, Quantification of hindlimb metastasis BLI signal at day 120 of screening (pools, n = 194 limbs; shCtrl, n = 18 limbs). f, Distribution of shRNA sequence numbers in pre-inoculation samples. shRNAs of between 10 and 300 repeats are labelled in red. Hit selection cut-off is indicated with dashed lines. g, Distribution of the shRNA sequence abundance fold change between pre-inoculated and bone metastasis-derived samples. Highly abundant shRNA sequences are marked in yellow or red, and the 1.5-hit selection cut-off is indicated as a dashed line. h, shRNA sequences that are highly abundant in two or more independent samples (blue) and targeted by two different shRNA sequences (red).
Statistics source data are provided in Supplementary Table 9.
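The hit-selection logic summarized in panels f–h can be sketched as a simple filter. This is an illustrative reading, not the authors' pipeline: the record fields (`shrna`, `gene`, `pre_count`, `lesion`, `fold_change`), the function name `select_hits` and the toy records are hypothetical, and the scale on which the 1.5 cut-off applies is an assumption left to the caller. The manual exclusion of genes expected to affect cell death or proliferation is not modelled.

```python
from collections import defaultdict

MIN_COPIES, MAX_COPIES = 10, 300  # pre-inoculation copy window stated in the text
FOLD_CUTOFF = 1.5                 # cut-off from the legend; its scale is assumed

def select_hits(records, min_shrnas=2, min_lesions=2):
    """Return genes hit by >= min_shrnas distinct shRNAs across
    >= min_lesions independent lesions, after both filters."""
    shrnas_by_gene = defaultdict(set)
    lesions_by_gene = defaultdict(set)
    for rec in records:
        # Keep shRNAs that were neither rare nor over-represented before injection.
        if not (MIN_COPIES <= rec["pre_count"] <= MAX_COPIES):
            continue
        # Keep shRNAs enriched in the metastatic lesion.
        if rec["fold_change"] < FOLD_CUTOFF:
            continue
        shrnas_by_gene[rec["gene"]].add(rec["shrna"])
        lesions_by_gene[rec["gene"]].add(rec["lesion"])
    return sorted(
        gene for gene in shrnas_by_gene
        if len(shrnas_by_gene[gene]) >= min_shrnas
        and len(lesions_by_gene[gene]) >= min_lesions
    )

# Hypothetical toy records, not data from the screen.
toy = [
    {"shrna": "sh1", "gene": "GENE_A", "pre_count": 50, "lesion": "L1", "fold_change": 4.0},
    {"shrna": "sh2", "gene": "GENE_A", "pre_count": 120, "lesion": "L2", "fold_change": 2.0},
    {"shrna": "sh3", "gene": "GENE_B", "pre_count": 5, "lesion": "L1", "fold_change": 9.0},
    {"shrna": "sh4", "gene": "GENE_B", "pre_count": 40, "lesion": "L3", "fold_change": 3.0},
    {"shrna": "sh5", "gene": "GENE_C", "pre_count": 80, "lesion": "L1", "fold_change": 1.1},
]
hits = select_hits(toy)
```

In the toy input only `GENE_A` passes: `GENE_B` loses one shRNA to the copy-number window, and `GENE_C` falls below the fold-change cut-off.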

MSK1 (Fig. 4b, Supplementary Fig. 2d and Supplementary Table 4). Furthermore, MSK1 depletion increased the capacity for overt metastasis both in DBM cells (Fig. 4c and Supplementary Fig. 2e) and in the poorly metastatic ER+ breast cancer cell line, ZR75 (Supplementary Fig. 2f), implicating MSK1 as a regulator of dormancy. Immunohistochemistry analysis confirmed that MSK1 was depleted in metastatic lesions (Fig. 4d). MSK1 depletion increased ZR75 cell dissemination and homing to, and growth in, bone from primary mammary tumours (Fig. 4e). In time-match experiments, the MSK1-depleted DBM cells in lesions in the homing and latency phases were less susceptible to apoptosis (albeit not significantly), and grew more upon reaching bone than cells in control lesions (Supplementary Fig. 2g,h). By contrast, MSK1 levels did not increase the number of detected DTCs in hindlimbs in mice in the latency phase (Supplementary Fig. 2i). These results suggest that MSK1 inhibits bone homing and promotes latency.

MSK1 negatively regulates metastasis initiation by promoting the expression of luminal lineage-specific genes. Next, we explored mechanisms that may support homing and expansion of micrometastases in the absence of MSK1. MSK1 depletion neither resulted in detectable changes in angiogenesis (Supplementary Fig. 3a–e) nor significantly affected the percentage of cells that go through S phase (Supplementary Fig. 3f,g), suggesting that mechanisms besides cellular quiescence are relevant. MSK1 depletion did not modify the ability of DBM cells to survive under hypoxia, adhere, migrate or invade (Supplementary Fig. 4a–d). C-X-C chemokine receptor 4 (CXCR4)–C-X-C motif chemokine


[Figure 3 image, panels a–f. Recoverable labels: a, bone metastasis-free survival versus time to event (years) for MSK1 high, medium and low (P = 0.01). b, tumour size (log a.u.) versus MSK1 (a.u.). c,d, metastasis-free survival versus time to event (years) for MSK1 high, medium and low (c, P = 0.01; d, P = 0.78). e, case 1: MSK1 > 0; case 2: MSK1 = 0. f, metastasis-free survival versus time to event (yr) for MSK1 > 0 and MSK1 = 0.]

Fig. 3 | MSK1 loss is associated with early relapse in patients with ER+ breast cancer. a, Survival analysis representing the proportion of bone metastasis-free patients (from the MSKCC/EMC cohort primary tumour data set) stratified according to MSK1 (gene name: RPS6KA5) mRNA levels in samples from patients with ER+ breast cancer (low, n = 25; medium, n = 67; high, n = 72). Patients with a first site of metastasis in locations other than bone were excluded from these analyses (<5 yr HR = 0.41, 95% CI: 0.21–0.80, P = 0.01, n = 164). b, Pearson correlation between MSK1 expression and tumour size in ER+ breast cancer samples (adjusted by technical effect and HER2 imputed status; from the GSE2603 primary tumour data set; r = –0.075, P = 0.64; bootstrap 95% CI: −0.288 to 0.178, n = 55). c, Survival analysis representing the proportion of metastatic recurrence-free patients (from the MSKCC/EMC cohort primary tumour data set) stratified according to MSK1 mRNA levels in samples from patients with ER+ breast cancer (high, n = 86; medium, n = 78; low, n = 29). <5 yr HR = 0.48, 95% CI: 0.27–0.86, P = 0.01, n = 193. d, Survival analysis representing the proportion of metastatic recurrence-free patients (from the MSKCC/EMC cohort primary tumour data set) stratified according to MSK1 mRNA levels in samples from patients with ER– breast cancer (high, n = 24; medium, n = 37; low, n = 78). <3 yr HR = 0.89, 95% CI: 0.38–2.05, P = 0.78, n = 139. e, Representative image of MSK1 staining on breast cancer tumour biopsies described in c. Total samples stained, n = 214; the experiment was performed two times. Scale bar, 50 µm. f, Survival analysis of patients (from the HCBiobank TMA cohort primary tumour data set) stratified according to MSK1 protein levels in luminal ER+ breast cancer patient samples (MSK1+ (MSK1 > 0), n = 174; MSK1– (MSK1 = 0), n = 40). <5 yr HR = 0.16, 95% CI: 0.04–0.61, P = 0.007, n = 214.
For a, c, d and f, significance was assessed by two-tailed Wald test. The Schaffer method was used for P value correction when comparing three or more groups.
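As a reading aid for the hazard ratios in this legend, the sketch below shows how a two-tailed Wald P value relates to an HR and its 95% confidence interval. It assumes the interval is symmetric on the log scale and built with a normal critical value, and it ignores any multiple-comparison correction; the function name `wald_p_from_hr` is hypothetical, and this is a back-of-the-envelope consistency check rather than the authors' analysis.

```python
import math

def wald_p_from_hr(hr, ci_low, ci_high, z_crit=1.959964):
    """Recover the Wald z statistic and two-tailed P value from a hazard
    ratio and its confidence interval (assumed symmetric in log space)."""
    log_hr = math.log(hr)
    # Standard error of log(HR), back-calculated from the interval width.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_hr / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal P value
    return z, p

# Values quoted for panel a: HR = 0.41, 95% CI 0.21-0.80.
z_a, p_a = wald_p_from_hr(0.41, 0.21, 0.80)
```

For panel a this back-calculation gives a P value of roughly 0.01, in line with the value quoted in the legend. An interval that spans 1 always yields a P value above 0.05 under the same assumptions.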

ligand 12 (CXCL12; also known as SDF1) signalling, which was upregulated in DBM cells, remained unaffected upon MSK1 depletion (Supplementary Fig. 4e–g). By contrast, tumour initiation increased in MSK1-depleted cells. In particular, MSK1 depletion increased second-generation oncosphere formation in three ER+ breast cancer cell lines (Fig. 5a,b), and MSK1 re-expression in MSK1 knockout (KO) cells impaired oncosphere formation in DBM cells (Fig. 5c). In addition, decreasing MSK1 activity using the inhibitor SB747651A, which targets MSKs and four other AGC kinases12, promoted organotypic tumoursphere formation in Matrigel (Supplementary Fig. 4h). Correspondingly, transient p38α MAPK depletion decreased MSK1 expression, whereas treatment with the p38 MAPK inhibitor PH-797804 increased the formation of second-generation oncospheres (Supplementary Fig. 4i,j). Collectively, our data indicate that MSK1 is a dormancy enforcer and a negative regulator of metastasis initiation.

Metastasis initiation is usually associated with the expression of stem cell genes13 or the absence of differentiation attributes14,15. Thus, we tested the association between the MSK1 expression levels in primary tumours and signatures defining different breast cancer subtypes, using GSEA. MSK1 expression in primary breast cancer tumours positively correlated with the expression of genes described in two independently created luminal (differentiated) gene signatures16,17, including established luminal lineage-specific genes (for example, FOXA1 and GATA3), but negatively correlated with two basal (undifferentiated) gene signatures (Supplementary Fig. 5a and Supplementary Table 5). In particular, MSK1 expression strongly and positively associated with gene signatures of the most differentiated breast cancer tumours, luminal A18–20 (Fig. 5d). This suggests that MSK1 downregulation impairs luminal differentiation of metastatic cells, which in turn enhances their tumour-initiating and invasive capabilities while globally retaining their


[Figure 4 image, panels a–e. Recoverable labels: a,b, DBM cells, IC injection, day 21; western blots of MSK1, MSK2 and tubulin (shCtrl, shMSK1 no. 1 and shMSK1 no. 2; WT, WT-E and KO); hindlimb homing (%), P = 0.001 in both panels. c, DBM cells, IC injection; bone metastasis-free survival (%) versus time to event (days) for shCtrl, shMSK1 no. 1 and shMSK1 no. 2. d, H&E and MSK1 staining of shCtrl, shMSK1 no. 1 and shMSK1 no. 2 lesions (MET). e, ZR75.1 cells, MFP injection; primary tumour volume (mm3) and bone lesions measured as ex vivo hindlimb photon flux (p/s), shCtrl versus shMSK1 no. 1.]

Fig. 4 | MSK1 loss enhances early bone homing and metastasis in ER+ breast cancer. a, Schematic representation of the bone-homing experiment (top left). Western blot of MSK1, MSK2 and tubulin in MSK1-downregulated cell lines (bottom left). Quantification of bone homing by cells infected with shCtrl (n = 36 limbs), shMSK1 no. 1 (n = 40 limbs) or shMSK1 no. 2 (n = 50 limbs) at day 21 (right). Two-tailed Fisher’s exact test. b, Western blot of MSK1, MSK2 and tubulin in MSK1 WT, WT-E or KO cells (left). Quantification of bone homing by MSK1 WT (n = 26 limbs), MSK1 WT-E (n = 26 limbs) or MSK1 KO (n = 28 limbs) cells at day 21 (right). Two-tailed Fisher’s exact test. c, Schematic representation of the bone colonization experiment (top). Kaplan–Meier analysis of bone metastasis-free survival of cells infected with shCtrl (n = 15 mice), shMSK1 no. 1 (n = 18 mice) or shMSK1 no. 2 (n = 23 mice). Log-rank (Mantel–Cox) test. d, Representative images of H&E and MSK1 staining of hindlimb bones from tumour-bearing mice injected with DBM shCtrl (n = 3), shMSK1 no. 1 (n = 3) or shMSK1 no. 2 (n = 3). MET, metastatic lesion. The dashed outline indicates the magnified region. Scale bars, 500 µm (insets) or 100 µm (larger pictures). e, Schematic representation of bone dissemination from the orthotopic site experiment (top). Quantification of mammary fat pad (MFP) primary tumour growth of ZR75.1 shCtrl (n = 10 tumours) and shMSK1 no. 1 (n = 12 tumours) (bottom left), with data shown as mean ± s.e.m. Ex vivo analysis of BLI signals from DTCs in hindlimbs (bottom right; shCtrl, n = 10 limbs; shMSK1 no. 1, n = 12 limbs). Data are shown as whisker plots: midline, median; box, 25–75th percentile; whisker, minimum to maximum. Two-tailed Mann–Whitney test. Western blot analyses in a and b were performed three times independently with similar results. The Schaffer method was used for P value correction when comparing three or more groups. Statistics source data are provided in Supplementary Table 9. Unprocessed original scans of blots are shown in Supplementary Fig. 8.

luminal subtype, as observed for DBM cells (Supplementary Fig. 5b–d). Indeed, quantitative reverse transcription PCR (qRT–PCR) confirmed that the expression of the luminal marker genes FOXA1, GATA3, KRT7 (which encodes keratin 7) and KRT18 (which encodes keratin 18) decreased upon MSK1 depletion in four ER+ breast cancer cell lines (Fig. 5e and Supplementary Fig. 6a). Likewise, the protein


[Figure 5 image, panels a–j. Recoverable labels: a–c, oncospheres fold change in DBM, ZR75 and MCF7 cells (Ctrl, shMSK1 no. 1 and no. 2), in DBM WT, E and KO cells, and in DBM MSK1 KO mock versus rescue cells, with western blot of MSK1, MSK2 and tubulin. d, GSEA of luminal A, luminal B and basal signatures in the METABRIC and TCGA data sets (MSK1+ versus MSK1–): luminal A, NES = 1.49 and 2.56; luminal B, NES = –2.86 and –2.30; basal, NES = –2.0 and –1.81; NOMp < 0.001 throughout. e, mRNA expression (fold change) of FOXA1, GATA3, KRT7 and KRT18 in DBM, ZR75, MCF7 and BT474 cells (shCtrl, shMSK1 no. 1 and shMSK1 no. 2). f, MSK1 and FOXA1 staining, WT versus KO. g, western blot of FOXA1, GAPDH, GATA3 and tubulin in DBM cells. h, MSK1, FOXA1 and KRT7 staining in PDX 313, PDX 244 and PDX 293. i, FOXA1+ cells (%) versus MSK1+ cells (%). j, MSK1+ cells (%) versus average time between implantation (<180 versus >180 days).]

Fig. 5 | MSK1 promotes luminal gene expression and impairs metastatic traits. a–c, Second-generation oncosphere formation fold change in: DBM, ZR75 and MCF7 cells upon shMSK1 downregulation (n = 3; a); DBM MSK1 KO cells (n = 4; b); and DBM KO and DBM MSK1-rescued cells (n = 3; c). Western blot of MSK1, MSK2 and tubulin protein levels in DBM KO and DBM MSK1-rescued cells (c, right). The experiment was repeated twice independently with similar results. d, GSEA plots of the luminal A, luminal B and basal gene sets from METABRIC18,19 or TCGA20, and their correlation with MSK1 expression in patients with ER+ breast cancer (n = 370). ES, enrichment score; NES, normalized ES; NOMp, nominal P value. e, mRNA level fold changes of luminal genes in DBM, ZR75, MCF7 or BT474 cells after MSK1 downregulation. n = 3. f, Representative images of MSK1 and FOXA1 immunohistochemistry in metastasis-bearing DBM MSK1 WT and DBM MSK1 KO hindlimbs (n = 2 limbs per group). Scale bar, 50 µm. g, Western blot of FOXA1, GATA3, glyceraldehyde 3-phosphate dehydrogenase (GAPDH) and tubulin in MSK1-downregulated and KO DBM cells. h, Representative images of MSK1, FOXA1 and KRT7 immunohistochemistry in PDX tumours. Percentage of positive cells was quantified from two biologically independent samples, and at least 4 × 10^3 cells were assessed. Scale bar, 50 µm. i, Percentage of FOXA1+ and MSK1+ cells in PDX tumours (n = 10). Two-tailed Pearson correlation (r = 0.89, 95% CI: 0.47–0.97, P = 0.003; in grey, excluded outliers). j, MSK1 protein expression in fast-proliferating (n = 6) and slow-proliferating (n = 4) PDX tumours. Two-tailed Mann–Whitney test. a–c and e show data as mean ± s.e.m. Two-tailed Wald tests were used from a linear model, in which group, cell type and their interactions were included as explanatory variables. Panel j shows data as a whisker plot: midline, median; box, 25–75th percentile; whisker, minimum to maximum. NS, not significant (P > 0.05); *P ≤ 0.05; **P ≤ 0.01; ***P ≤ 0.001; ****P ≤ 0.0001. Statistics source data and P values are provided in Supplementary Table 9. Unprocessed original scans of blots are shown in Supplementary Fig. 8.
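The Pearson correlation with a 95% confidence interval reported for panel i can be illustrated with a minimal sketch. The caption does not state how the interval was computed, so the Fisher z-transform used here is an assumption, the function names are hypothetical, and the percentages are invented illustration values rather than the PDX measurements.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for r via the Fisher z-transform."""
    if n <= 3:
        raise ValueError("the Fisher interval needs more than 3 samples")
    z = math.atanh(r)                      # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)            # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Hypothetical percentages of MSK1+ and FOXA1+ cells (not the paper's data).
msk1_pos = [5.0, 20.0, 40.0, 55.0, 70.0, 90.0]
foxa1_pos = [10.0, 30.0, 35.0, 60.0, 65.0, 95.0]
r = pearson_r(msk1_pos, foxa1_pos)
low, high = fisher_ci(r, len(msk1_pos))
print(f"r = {r:.2f}, 95% CI: {low:.2f} to {high:.2f}")
```

With few samples the interval is wide and asymmetric around r, which is the shape of the interval quoted in the caption.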


[Figure 6 image not reproduced. Recoverable panel labels: a, GO terms enriched among genes correlated with MSK1 (FDR < 0.05), plotted by normalized enrichment score (axis 1.6–2.4); GO cellular compartments: transcriptional repressor complex, nuclear speck, nucleoplasm part, Ser/Thr phosphatase complex and nuclear heterochromatin; GO biological processes: mRNA splicing via spliceosome, histone Lys methylation, post-transcriptional gene silencing by RNA, histone methylation and RNA splicing; colour categories: transcription and chromatin modifications, translation and splicing, signalling, and post-transcriptional gene silencing; b, schematic of the H3 N terminus (S10/K9 and S28/K27) at the luminal genes FOXA1 and GATA3, with phosphorylation (ph) and acetylation (ac) or tri-methylation (me3); c, heatmaps of H3S10ph and H3S28ph common peaks in shCtrl and shMSK1 cells against peak distance (kb; –3000 to 3000); d, numbers of H3S10ph and H3S28ph peaks in DBM shCtrl and DBM shMSK1 cells; e, coverage depth of H3S10ph and H3S28ph at luminal and basal genes in DBM shCtrl and DBM shMSK1 cells.]

Fig. 6 | MSK1 positively regulates H3S10ph and H3S28ph at promoters of luminal transcription factors. a, Cellular compartment and biological process GO enrichment analysis of genes correlated with MSK1 in the ER+ breast cancer data set (n = 370). GO categories (colour-coded) were assigned to genes based on their functions. GSEA-preranked permutation test. FDR, false discovery rate. b, Schematic putative representation of luminal gene transcription-activating (top) and transcription-repressing (bottom) histone modification. c, Heatmaps showing the intensity of H3S10ph and H3S28ph common peaks in shCtrl and shMSK1 cells. d, ChIP–seq data depicting the number of H3S10ph and H3S28ph peaks in DBM shCtrl and shMSK1 cells. e, Violin plots showing ChIP–seq coverage depth of H3S10ph and H3S28ph around the transcription start site (TSS) of basal (n = 873) and luminal (n = 601) genes16 in DBM shCtrl and shMSK1 cells. Data were analysed using two-tailed paired Student’s t-test. The violin plots indicate the distribution of data points: midpoint, median; vertical lines, 95% CI. NS, not significant (P > 0.05); *P ≤ 0.05; ****P ≤ 0.0001. Statistics source data and P values are provided in Supplementary Table 9.

levels of FOXA1 and GATA3 decreased concomitantly with decreasing levels of MSK1 in cells and in sections of bone metastatic lesions induced by MSK1-depleted cells, as compared to control DBM cells (Fig. 5f,g and Supplementary Fig. 6b). These findings were further validated in ten ER+ patient-derived xenograft (PDX) samples (Fig. 5h,i, Supplementary Fig. 6c and Supplementary Table 6). The growth rates of PDXs upon engraftment and after subsequent re-implantation are variable across models but remain stable upon serial engraftment21. Strikingly, explants that originated from tumours expressing low levels of MSK1 expanded significantly faster after re-implantation than those expressing high levels of MSK1 (Fig. 5j).

Overexpression of FOXA1 and GATA3 in MSK1-depleted cells was insufficient to revert the downregulation of luminal keratins, suggesting a predominant function of MSK1 over subsequent transcriptional inputs (Supplementary Fig. 6d). This effect was mainly driven by MSK1, as MSK2 depletion was not associated with early relapse in patients with ER+ breast cancer, and its downregulation in MSK1-depleted cells had no major effect on luminal gene expression (Supplementary Fig. 6e–g). Our results imply that MSK1 positively regulates the expression of luminal genes (including GATA3 and FOXA1), which in turn fine-tune the regulation of breast cancer cell differentiation and reduce the capacity of these cells to metastasize.
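The two-tailed paired Student's t-test named in the Fig. 6 legend compares, gene by gene, a value in shCtrl cells with the matching value in shMSK1 cells. The sketch below shows only the arithmetic of that test under stated assumptions: the function names and the coverage values are hypothetical, and the P value uses a normal approximation that is reasonable only for large gene sets (such as the several hundred genes per signature). For an exact P value a t distribution is needed, for example through `scipy.stats.ttest_rel`.

```python
import math
from statistics import NormalDist, mean, stdev


def paired_t(control, treated):
    """Return (t statistic, degrees of freedom) for paired observations."""
    if len(control) != len(treated) or len(control) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [t - c for c, t in zip(control, treated)]
    sd = stdev(diffs)                       # sample s.d. of the differences
    if sd == 0:
        raise ValueError("all paired differences are identical")
    t_stat = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t_stat, len(diffs) - 1


def two_tailed_p_normal_approx(t_stat):
    """Two-tailed P value from a normal approximation (large n only)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))


# Hypothetical per-gene coverage depths (not the paper's data).
sh_ctrl = [12.1, 9.8, 15.3, 11.0, 13.7, 10.4]
sh_msk1 = [10.2, 9.1, 12.9, 10.6, 11.8, 9.0]
t_stat, dof = paired_t(sh_ctrl, sh_msk1)
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

A negative t statistic here means that the values in the second sample are lower on average than their paired values in the first.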


[Figure 7 image not reproduced. Recoverable panel labels: a, numbers of H3K9ac, H3K27ac, H3K9me3 and H3K27me3 peaks in DBM shCtrl and DBM shMSK1 cells; b, coverage depth of H3K9ac and H3K27ac at luminal genes (P = 0.038 and P = 0.034) and basal genes (NS) in DBM shCtrl and DBM shMSK1 cells; c, ChIP signal as input (%) for K9ac, K27ac, S10ph and S28ph, with input and IgG controls, at the FOXA1 and GATA3 promoters; d, normalized peak count of H3K9ac and H3K27ac in shCtrl and shMSK1 cells at MSK1 ChIP–seq targets (P < 0.0001 for both marks); e, tracks for H3K9ac, H3K27ac, input, H3S10ph, H3S28ph, input and MSK1 at FOXA1 (chr14: 38,036,481–38,086,601) and GATA3 (chr10: 8,065,662–8,127,155); f, summary model: with MSK1, strong luminal differentiation (high FOXA1, GATA3) and metastasis dormancy; without MSK1, weak luminal differentiation (low FOXA1, GATA3) and overt metastasis (initiation, growth).]

Fig. 7 | MSK1 positively regulates H3K9ac and H3K27ac at promoters of luminal transcription factors. a, ChIP–seq data depicting the number of H3K9ac, H3K27ac, H3K9me3 and H3K27me3 peaks in DBM shCtrl and shMSK1 cells. b, Violin plots showing the ChIP–seq coverage depth of H3K9ac and H3K27ac around the transcription start site (TSS) of basal (n = 873) and luminal (n = 601) genes16 in DBM shCtrl and shMSK1 cells. Significance was assessed by two-tailed paired Student’s t-test. c, ChIP–qRT–PCR of H3K9ac, H3K27ac, H3S10ph and H3S28ph at the luminal FOXA1 and GATA3 promoters. One-tailed t-test or Wilcoxon test was used as a function of normal distribution; n ≥ 3 biologically independent samples. Data are shown as mean ± s.d. d, Violin plots showing normalized peak counts of H3K9ac and H3K27ac in the promoters of genes bound by MSK1 (n = 834). Two-tailed paired Student’s t-test. e, Aggregated ChIP–seq profiles of H3K9ac, H3S10ph, H3K27ac, H3S28ph and input from DBM shCtrl or shMSK1 cells at the GATA3 and FOXA1 genes, and the MSK1 profile from previously published ChIP–seq data. chr, chromosome. f, Graphical summary of the main findings. b and d show data as violin plots that indicate the distribution of data points: midpoint, median; vertical lines, 95% CI. NS, not significant (P > 0.05); *P ≤ 0.05; **P ≤ 0.01. Statistics source data and P values are provided in Supplementary Table 9.
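Panel c reports ChIP–qPCR signal as a percentage of input. The legend does not give the formula, so the following is a sketch of the widely used percent-input calculation, not necessarily the authors' procedure. It assumes 100% amplification efficiency (a doubling per cycle); the function name, the Ct values and the 1% input fraction are hypothetical.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction):
    """Percent-input value for one ChIP-qPCR sample.

    ct_ip: Ct of the immunoprecipitated sample.
    ct_input: Ct of the input sample.
    input_fraction: fraction of chromatin kept as input (0.01 for 1%).
    """
    if not 0.0 < input_fraction <= 1.0:
        raise ValueError("input_fraction must be in (0, 1]")
    # Shift the input Ct to the value expected for 100% of the chromatin.
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    # Each cycle of difference corresponds to a two-fold change in template.
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


# Hypothetical Ct values for one histone mark at one promoter.
value = percent_input(ct_ip=28.0, ct_input=25.0, input_fraction=0.01)
print(f"{value:.3f}% of input")
```

An IgG control run through the same function gives the background level against which each histone mark is judged.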


MSK1 epigenetically regulates the expression of luminal genes. We next tested whether MSK1 regulates luminal gene expression by changing the chromatin landscape. Gene Ontology (GO) analysis of genes whose expression significantly correlated with MSK1 in breast cancer primary tumours (MSKCC/EMC cohort) suggested an enrichment of genes involved in chromatin remodelling, including histone acetylation and methylation (Fig. 6a, Supplementary Fig. 7a and Supplementary Table 7). MSK1 is a nuclear kinase that can regulate gene transcription at the epigenetic level via phosphorylation of histone H3 at serine 10 (H3S10ph) or serine 28 (H3S28ph)22–24 (Fig. 6b). To study whether MSK1 directly regulates luminal genes, we performed chromatin immunoprecipitation–sequencing (ChIP–seq) for H3S10 and H3S28 using control or MSK1-depleted cells (Fig. 6c). We observed a global reduction of unique peaks at H3S10ph and H3S28ph regions in cells upon MSK1 downregulation (Fig. 6d), and a global reduction in the intensity of the common peaks (Fig. 6c). Proximity-based analysis revealed that the locations within the luminal differentiation gene promoters were significantly decreased in MSK1-depleted cells (Fig. 6e). Intriguingly, basal gene promoters showed reduced H3S10ph regions (albeit to a lower extent) but no consistent changes for H3S28ph (Fig. 6e). Notably, H3S10 and H3S28 facilitate changes in adjacent acetylation and methylation marks, which are responsible for regulating the transcription of target genes. We used ChIP–seq to identify transcriptionally active chromatin regions, marked by acetylation at lysine 9 or lysine 27 on histone 3 (H3K9ac or H3K27ac, respectively), and transcriptionally repressed regions, marked by tri-methylation at lysine 9 or lysine 27 on histone H3 (H3K9me3 or H3K27me3, respectively). We observed an overall global reduction in H3K9ac-enriched and H3K27ac-enriched regions (using MACS2 (model-based analysis for ChIP-sequencing 2) peak calling), and a significant increase in H3K9me3- and H3K27me3-enriched regions, in MSK1-depleted cells compared to control cells, suggesting that loss of MSK1 caused a concomitant decrease in active chromatin marks and an increase in repressive ones (Fig. 7a and Supplementary Fig. 7b). Notably, MSK1 depletion significantly decreased the levels of both H3K9ac and H3K27ac at the promoters of genes that define a luminal breast cancer signature, but not of those that define a basal one17, with no consistent differences in methylation compared to acetylation in either case (Fig. 7b and Supplementary Fig. 7c), implying that MSK1 mainly associates with expression of the luminal differentiation gene signature. We confirmed by ChIP–qPCR that H3 acetylation and phosphorylation marks at the FOXA1 and GATA3 transcription start sites were reduced upon MSK1 depletion, suggesting that MSK1 controls the transcriptional robustness of these two key regulators of breast cancer cell differentiation (Fig. 7c,d and Supplementary Fig. 7d,e). Strikingly, MSK1 depletion significantly changed the H3K9 and H3K27 acetylation status (Fig. 7d) and the H3K27 tri-methylation status (Supplementary Fig. 7f) at the 835 promoters previously reported to be bound by MSK1 (ref. 23). MSK1 was enriched (using MACS1.4 for peak calling) proximal to the transcription start sites of both FOXA1 and GATA3, although the enrichment regions were outside the promoter area (±2 kb) (Fig. 7e). Collectively, these results suggest that global changes in chromatin marks modulated by MSK1 control the activation status of several genomic regions, including genes that define the luminal differentiation programme in ER+ breast cancer cells.

Discussion
This work establishes a role for MSK1 in regulating the chromatin marks that modulate the transcription of genes involved in luminal cell differentiation, thereby facilitating dormancy in breast cancer cells. Deregulation of luminal differentiation genes25, which sustain dormancy by restraining metastatic features, in combination with increased homing capabilities, facilitates the metastatic potential of latent breast cancer cells (Fig. 7f). Our analyses suggest that the cells undergo a continuous molecular subtype transition upon MSK1 depletion, evolving from more to less luminal-differentiated properties. These findings provide insight into how genetic and epigenetic mechanisms affect cell differentiation and its constraints over metastatic traits.

During cell differentiation, epigenetic regulation couples transcription factor activity with upstream signalling pathways. MSK1 mediates gene expression changes by phosphorylating H3S10 and H3S28, which in turn facilitates H3K9 and H3K27 acetylation22–24; we show that these histone modifications are important for transcription of luminal genes in breast cancer cells. Under stress conditions, signalling through the p38 MAPK pathway results in MSK1 activation and phosphorylation of not only histone H3 at immediate-early gene promoters22 but also the cAMP response-element-binding proteins, nuclear factor-κB (NF-κB) and activating transcription factor 1 (ATF1) (which are its substrates)10,26. Stress signals stemming from the stroma have been proposed to induce dormancy by modulating the ratio of ERK and p38 MAPK activities in DTCs27, among other mechanisms, including hypoxia, and bone morphogenetic protein (BMP)-mediated and WNT-mediated niche support for cell proliferation, as part of the normal tissue homeostasis28–30. In particular, bone marrow-derived transforming growth factor-β2 (TGFβ2) controls tumour dormancy by increasing p38 MAPK activity8.

Clinical data indicate that the loss of luminal status is a common event in ER+ luminal breast cancer metastasis compared to primary tumours; however, the underlying mechanistic details have remained elusive31,32. Our data show that loss of MSK1 activates a metastatic programme that includes tumour initiation and growth traits. This programme involves reduced levels of H3K9ac and H3K27ac at the promoters of GATA3 and FOXA1 genes, among other luminal determinants that may have important roles, upon loss of the H3S10ph and H3S28ph marks. Intriguingly, although loss of histone acetylation did not lead to a consistent increase in methylation for these luminal gene promoters, it did increase methylation for other genes, which probably contribute to alternative functions that support metastatic growth and should be addressed in the future.

Both the GATA and the FOX families of transcription factors have a central role in the development and differentiation of various tissues, including the mammary gland33–35. In breast cancer, GATA3 has emerged as a strong and independent predictor of tumour differentiation and clinical outcome, with low GATA3 expression being strongly predictive of high tumour grade, positive lymph node status and large tumour size36. GATA3 has been previously implicated in preventing metastatic progression15, and loss-of-function mutations in this gene are frequently found in breast cancer and are associated with poor disease-free survival37. FOXA1 is a prognostic marker that prevents the metastatic progression of luminal-subtype breast cancer by regulating differentiation38. A cooperative network between ER, FOXA1 and GATA3 sustains differentiation in luminal tumours39; however, little is known about how this luminal differentiation programme is collectively regulated. Here, we show that MSK1 controls the chromatin status at these gene promoters, among other luminal relevant genes, thereby establishing an unprecedented role for this kinase in cancer cell differentiation and metastasis. Although luminal tumours probably metastasize to bone1, MSK1-mediated mechanisms might extend beyond that metastatic site.

We also demonstrated that, in tumour mass dormancy, cell-autonomous mechanisms self-impose a slow-cycling feature, typical of highly differentiated ER+ breast cancer cells. In this context, differentiated cells require major reprogramming to release dormancy features. These differentiation programmes can be reversed, thus allowing traits, such as homing, growth and initiation, all of which are required for overt metastasis, to occur. Differentiation and slow proliferation are common in cells insensitive to chemotherapy, a treatment that targets growing tumour cells. Our observations may explain why highly differentiated, dormant tumour micrometastases are resistant to therapy. Breast cancer tumours expressing low levels

of MSK1 tend to relapse very early. Thus, using MSK1 expression levels as a marker in the future may help to identify and appropriately treat low-risk and high-risk patients with ER+ breast cancer, probably improving prognosis.

Methods
Methods, including statements of data availability and any associated accession codes and references, are available at https://doi.org/10.1038/s41556-017-0021-z.

Received: 25 September 2017; Accepted: 5 December 2017; Published online: 22 January 2018

References
1. Kennecke, H. et al. Metastatic behavior of breast cancer subtypes. J. Clin. Oncol. 28, 3271–3277 (2010).
2. Janni, W. J. et al. Pooled analysis of the prognostic relevance of circulating tumor cells in primary breast cancer. Clin. Cancer Res. 22, 2583–2593 (2016).
3. Ding, L. et al. Genome remodelling in a basal-like breast cancer metastasis and xenograft. Nature 464, 999–1005 (2010).
4. Prat, A., Ellis, M. J. & Perou, C. M. Practical implications of gene-expression-based assays for breast oncologists. Nat. Rev. Clin. Oncol. 9, 48–57 (2011).
5. Pavlovic, M. et al. Enhanced MAF oncogene expression and breast cancer bone metastasis. J. Natl. Cancer Inst. 107, djv256 (2015).
6. Gomis, R. R. & Gawrzak, S. Tumor cell dormancy. Mol. Oncol. 11, 62–78 (2017).
7. Sosa, M. S., Bragado, P. & Aguirre-Ghiso, J. A. Mechanisms of disseminated cancer cell dormancy: an awakening field. Nat. Rev. Cancer 14, 611–622 (2014).
8. Bragado, P. et al. TGF-β2 dictates disseminated tumour cell fate in target organs through TGF-β-RIII and p38α/β signalling. Nat. Cell. Biol. 15, 1351–1361 (2013).
9. Vicent, G. P. et al. Induction of progesterone target genes requires activation of Erk and Msk kinases and phosphorylation of histone H3. Mol. Cell. 24, 367–381 (2006).
10. Deak, M., Clifton, A. D., Lucocq, L. M. & Alessi, D. R. Mitogen- and stress-activated protein kinase-1 (MSK1) is directly activated by MAPK and SAPK2/p38, and may mediate activation of CREB. EMBO J. 17, 4426–4441 (1998).
11. Reyskens, K. M. & Arthur, J. S. Emerging roles of the mitogen and stress activated kinases MSK1 and MSK2. Front. Cell. Dev. Biol. 4, 56 (2016).
12. Naqvi, S. et al. Characterization of the cellular action of the MSK inhibitor SB-747651A. Biochem. J. 441, 347–357 (2012).
13. Merlos-Suarez, A. et al. The intestinal stem cell signature identifies colorectal cancer stem cells and predicts disease relapse. Cell. Stem Cell. 8, 511–524 (2011).
14. Morales, M. et al. RARRES3 suppresses breast cancer lung metastasis by regulating adhesion and differentiation. EMBO Mol. Med. 6, 865–881 (2014).
15. Chou, J., Provot, S. & Werb, Z. GATA3 in development and cancer differentiation: cells GATA have it! J. Cell. Physiol. 222, 42–49 (2010).
16. Charafe-Jauffret, E. et al. Gene expression profiling of breast cell lines identifies potential new basal markers. Oncogene 25, 2273–2284 (2006).
17. Neve, R. M. et al. A collection of breast cancer cell lines for the study of functionally distinct cancer subtypes. Cancer Cell. 10, 515–527 (2006).
18. Curtis, C. et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 486, 346–352 (2012).
19. Pereira, B. et al. The somatic mutation profiles of 2,433 breast cancers refines their genomic and transcriptomic landscapes. Nat. Commun. 7, 11479 (2016).
20. Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).
21. Bruna, A. et al. A biobank of breast cancer explants with preserved intra-tumor heterogeneity to screen anticancer compounds. Cell 167, 260–274.e22 (2016).
22. Soloaga, A. et al. MSK2 and MSK1 mediate the mitogen- and stress-induced phosphorylation of histone H3 and HMG-14. EMBO J. 22, 2788–2797 (2003).
23. Reyes, D. et al. Activation of mitogen- and stress-activated kinase 1 is required for proliferation of breast cancer cells in response to estrogens or progestins. Oncogene 33, 1570–1580 (2014).
24. Josefowicz, S. Z. et al. Chromatin kinases act on transcription factors and histone tails in regulation of inducible transcription. Mol. Cell. 64, 347–361 (2016).
25. Prat, A. et al. Prognostic significance of progesterone receptor-positive tumor cells within immunohistochemically defined luminal A breast cancer. J. Clin. Oncol. 31, 203–209 (2013).
26. Vermeulen, L., De Wilde, G., Van Damme, P., Vanden Berghe, W. & Haegeman, G. Transcriptional activation of the NF-κB p65 subunit by mitogen- and stress-activated protein kinase-1 (MSK1). EMBO J. 22, 1313–1324 (2003).
27. Aguirre-Ghiso, J. A. Models, mechanisms and clinical evidence for cancer dormancy. Nat. Rev. Cancer 7, 834–846 (2007).
28. Johnson, R. W. et al. Induction of LIFR confers a dormancy phenotype in breast cancer cells disseminated to the bone marrow. Nat. Cell. Biol. 18, 1078–1089 (2016).
29. Gao, H. et al. The BMP inhibitor Coco reactivates breast cancer cells at lung metastatic sites. Cell 150, 764–779 (2012).
30. Malladi, S. et al. Metastatic latency and immune evasion through autocrine inhibition of WNT. Cell 165, 45–60 (2016).
31. McBryan, J. et al. Transcriptomic profiling of sequential tumors from breast cancer patients provides a global view of metastatic expression changes following endocrine therapy. Clin. Cancer Res. 21, 5371–5379 (2015).
32. Cejalvo, J. M. et al. Intrinsic subtypes and gene expression profiles in primary and metastatic breast cancer. Cancer Res. 77, 2213–2221 (2017).
33. Kouros-Mehr, H., Kim, J. W., Bechis, S. K. & Werb, Z. GATA-3 and the regulation of the mammary luminal cell fate. Curr. Opin. Cell. Biol. 20, 164–170 (2008).
34. Asselin-Labat, M. L. et al. Gata-3 is an essential regulator of mammary-gland morphogenesis and luminal-cell differentiation. Nat. Cell. Biol. 9, 201–209 (2007).
35. Augello, M. A., Hickey, T. E. & Knudsen, K. E. FOXA1: master of steroid receptor function in cancer. EMBO J. 30, 3885–3894 (2011).
36. Mehra, R. et al. Identification of GATA3 as a breast cancer prognostic marker by global gene expression meta-analysis. Cancer Res. 65, 11259–11264 (2005).
37. Mair, B. et al. Gain- and loss-of-function mutations in the breast cancer gene GATA3 result in differential drug sensitivity. PLoS. Genet. 12, e1006279 (2016).
38. Mehta, R. J. et al. FOXA1 is an independent prognostic marker for ER-positive breast cancer. Breast Cancer Res. Treat. 131, 881–890 (2012).
39. Jozwik, K. M. & Carroll, J. S. Pioneer factors in hormone-dependent cancers. Nat. Rev. Cancer 12, 381–385 (2012).
40. Root, D. E., Hacohen, N., Hahn, W. C., Lander, E. S. & Sabatini, D. M. Genome-scale loss-of-function screening with a lentiviral RNAi library. Nat. Methods 3, 715–719 (2006).

Acknowledgements
We thank V. Raker for manuscript editing and IRB Barcelona Functional Genomics (J.I. Pons and D. Fernández), Histopathology (N. Prats), Advanced Digital Microscopy (J. Colombelli) and Flow Cytometry (J. Comas) Core Facilities for assistance. S.Gawrzak, L.R., E.J.A. and K.S. were supported by La Caixa PhD fellowships. J.M.C. received a fellowship from ‘PhD4MD’, a Collaborative Research Training Programme for Medical Doctors (IDIBAPS, August Pi i Sunyer Institute for Biomedical Research and IRB Barcelona), and partial funding by the ISCIII (project: II14/00019). S.Gregorio, C.F.-P. and A.B. were funded by the Spanish Government (MINECO-Formación de personal Investigador). J.U. is an AECC (Asociación Española Contra el Cáncer) Fellow. D.K. was co-funded by FP7 Marie Curie Actions (COFUND program; grant agreement no. IRBPostPro2.0 600404); A.P. was supported by Susan Komen Foundation, SEOM, BBVA Foundation and the ISCIII–PI13/01718. J. Albanell. was supported by ISCIIi/FEDER under projects CIBERONC, PIE15/00008, PI15/00146 and Generalitat de Catalunya (2014 SGR 740). R.R.G., S.A.-B, J. Arribas. and A.R.N. are supported by the Institució Catalana de Recerca i Estudis Avançats. Support and structural funds were provided by the Generalitat de Catalunya (2014 SGR 535) to R.R.G. and A.R.N., and by the BBVA Foundation, the ISCIII/FEDER-CIBERONC, Worldwide Cancer Research (grant 15–1316), the Spanish Ministerio de Economia y Competitividad (MINECO) and FEDER funds (CIBEREONC and SAF2016-76008-R) to R.R.G.

Author contributions
S.Gawrzak designed and performed the experiments and analysed the data. L.R. performed the ChIP–seq experiments and analysed the data. E.J.A., F.S., J.U., C.F.-P., I.d.B.B., B.M., D.K., R.G., S.Gregorio, K.S., A.B., E.F. and M.G. contributed to the experiments. A.A.-E. isolated DBM cell line. F.R., A.L., A.R., M.M., J.M.C., A.P. and J. Albanell. contributed to building and analysing the tissue micro-array. M.P., C.S., J. Arribas., J.C. and V.S. contributed the PDX generation and analyses. A.B.-L. and C.S.-O.A. analysed the public transcriptomic data sets of the breast cancer human samples and statistics. S.A.-B. and A.R.N. participated in data analyses. R.R.G. conceived the project, designed and analysed the data, and supervised the overall project. S.Gawrzak and R.R.G. wrote the manuscript.

Competing interests
The authors declare no competing financial interests.

Additional information
Supplementary information is available for this paper at https://doi.org/10.1038/s41556-017-0021-z.
Reprints and permissions information is available at www.nature.com/reprints.
Correspondence and requests for materials should be addressed to R.R.G.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Methods
Cell culture. Cell lines and media used for the culture are listed in Supplementary Table 8. T47D-5516 and BoM2 were derived with an in vivo selection procedure5. Inhibitors were used as indicated in Supplementary Table 8. All cell lines except HEK293T and HUVEC-RFP were stably transduced with TK-GFP-Luc construct and sorted for GFP expression. Only mycoplasma-negative cells were used. No cell lines were in the database of commonly misidentified cell lines maintained by ICLAC (the International Cell Line Authentication Committee). HEK293T cells were transfected with pLKO lentiviral vectors (Supplementary Table 8), pBABE puro GFP, pBABE MSK1 GFP vectors (gifts from A. T. J. Wierenga), pBABE Puro Gata3 (Addgene no. 1286) and pBABE Hygro FOXA1 using standard methods. Recipient cells were transduced with the viral media and selected with puromycin (2 μg ml–1).

Short interfering RNA (siRNA) transfection. Cells were transfected with DharmaFECT transfection kit according to the manufacturer’s instruction, with 4 μl of 25 μM siRNAs to the final concentration of 100 nM (Supplementary Table 8). Cells were collected 48-h post-transfection for qRT–PCR analysis.

CRISPR gene editing. CRISPR–Cas9-based gene-editing methods41,42 were used with slight modifications. The backbone plasmid vector pX330-EGFP was a gift from E. Battle. All gRNA (guide RNA) sequences were designed using ChopChop software in default setting (https://chopchop.rc.fas.harvard.edu/index.php). The top five target sites in RPS6KA5 (sequence ID no. NM_004755) with no off-targets were chosen (Supplementary Table 8). Ligation adapter sequences were added to the 5′ end of 20-nucleotide gRNA sequences for each target site. For single gRNA assembly, a pair of synthesized oligonucleotides were annealed, phosphorylated and ligated to the BbsI linearized vector. Constructs were sequenced using the hU6 promoter primer. DBM cells (2 × 10^5) were transfected using the NanoJuice Transfection kit with 2.5 μg of plasmid diluted in reagent and booster (ratio: 2/3), and serum-free medium up to a final volume of 100 μl. At 48-h post-transfection, single cells were sorted by fluorescence-activated cell sorting (FACS) into 96-well plates. Clones derived from single cells (as well as control clones) were expanded, and RNA and protein levels were isolated. The heterogeneity of cell populations was restored by pooling MSK1 KO clones. Wild-type-edited (WT-E) cells transduced with the KO vector that expressed MSK1, and WT cells targeted with mock plasmid, were used as controls.

Animal studies. The Animal Care and Use Committee of IRB Barcelona approved animal work (approval number 9096); the study complies with all relevant ethical animal research regulations. Sample sizes were estimated. Exclusion criteria for data analysis were pre-established. Mice that died or were euthanized for ethical reasons before defined experimental end points were excluded. Animals were randomly allocated to cages and experimental groups. Investigators were blinded during bioluminescent, radiographic and histological assessments. Female BALB/c nude mice (Harlan) were used (11–13-weeks old). Before surgery, mice were anaesthetized as previously described5. For experiments with DBM or BoM2 cell lines, 90-day release oestrogen (β-oestradiol 0.18 mg per pellet) pellets were implanted subcutaneously. For intra-cardiac injections, 5 × 10^5 cells were injected into the left cardiac ventricle of the animals, using a 26 G needle5. For orthotopic transplant experiments, 3 × 10^6 cells were resuspended in 1/1 Matrigel and PBS, and injected into the third mammary fat pads. Immediately after injection, mice were imaged for luciferase activity to confirm successful xenograft.
For BLI, mice were imaged in an IVIS 100 for 1 min, and data were recorded using LivingImage software (versions 2.60.1 and 4.5.2). To measure bone colonization, photon flux was calculated for each mouse using two circular regions of interest (ROIs) in a hind leg.

X-ray scanning. Bone metastasis development was monitored by X-ray imaging (CT scan). Images were acquired at 50 kV with a 0.5 aluminium filter using a detection pixel size of 5 mm. Metastatic lesions were measured using ImageJ software, and osteolytic areas were calculated in arbitrary units.

Microarray assay. RNA was extracted from cells using the PureLink Mini Kit following the manufacturer’s instructions. Labelling and hybridization of samples to the HG1.0ST gene expression chip (Affymetrix) were performed with standard methodology by the Functional Genomics Core Facility of IRB Barcelona. Differential expression analysis was performed using the R limma package43. P values were adjusted using the Benjamini and Hochberg method for multiple comparisons44.

Copy number alteration analysis. High-molecular genomic DNA was isolated from cells using GenElute Mammalian Genomic DNA Miniprep Kit following the manufacturer’s instructions. DNA quantity and quality were determined by NanoDrop spectrophotometer and electrophoresis in 1% agarose gel. Genetic aberrations were detected using NimbleGen Human CGH 3 × 720 K Whole-Genome Tiling v3.0 Array comprising 72,000 probes. Samples were independently labelled with Cy3 and Cy5 fluorochromes and co-hybridized. Copy number analysis was performed using the R package CGHcall45 from Bioconductor.

Genome-scale loss-of-function screening with a lentiviral RNA interference library. The MISSION LentiPlex human pooled shRNA library TRC1.0 was used for screening following the manufacturer’s instructions. Coverage per pool of the library was 275, which was similar to screens for tumour or metastasis suppressor genes46. A total of 55 million cells were infected. After puromycin selection and confirmation of insert integration by PCR, each pool (or cells with control shRNA) was inoculated intracardiacally into 10 animals (5 × 10^5 cells per mouse). Animals were monitored weekly by BLI. Xenografted cells that had formed metastatic lesions in hindlimbs were flushed and GFP sorted. Genomic DNA was extracted from pre-inoculation DBM cells and bone metastases using the GenElute kit, and analysed with a NanoDrop spectrophotometer and the Qubit DNA assay.

High-throughput sequencing and data analysis. Genomic DNA samples were submitted to the Sigma Deconvolution platform, and the abundance of each shRNA clone in samples was tested by amplifying and next-generation sequencing (at ×1,000) shRNA regions and barcoding samples; short reads were aligned to the reference. Data were obtained as the number of shRNA sequences per clone per sample (screening results are shown in Supplementary Tables 2 and 3).

Public microarray processing. Public microarray data sets were processed separately for tumour samples of each data set using packages affy47 and affyPLM48 from Bioconductor. Raw CEL files were normalized using robust multi-array average (RMA) background correction and summarization. Standard quality controls were performed to identify abnormal samples. Technical information was retrieved from the original CEL files, and metrics were computed and recorded as additional features for each sample as described49. Probe-set annotation was performed using the information available in the Affymetrix web page (https://www.affymetrix.com/site/mainPage.affx). Expression was summarized at the gene level by the most variable probe set within each gene as measured by median absolute deviation. Data were merged in a unique expression matrix after quantile normalization using the common set of genes present in all series.

ER status imputation. ER status was imputed for each data set separately based
After subtracting a background value, photon flux on the expression of ESR1 using a clustering analysis for each transcript intensity was normalized to the value obtained at xenografting. Metastatic colonization via non-parametric density estimation. Although ER status was recorded for was defined as a photon flux value greater than the BLI signal at day 0. An IVIS three of the four microarray data sets (GSE2430, GSE2603 and GSE5327), these SpectrumCT instrument was used to obtain BLI images integrated with low-dose μ​ imputations were used for consistency, as this information was not available for CT. Three-dimensional (3D) images were acquired on a stable revolving platform GSE12276. This resulted in a total of 377 samples annotated as ER+, and the rest – with 360° rotation. Quantitative bioluminescence data and CT images were (267), as ER . processed, reconstructed and co-registered using the DLIT Reconstruction option in the LivingImage 4.5.2 software. Correlation analyses and survival analysis of transcriptomic data sets. To test To measure bone homing, photon flux was calculated for each mouse using RPS6KA5 association in the microarray data, a mixed-effect model was fitted to two circular ROIs encompassing the hindlimb (ROI per leg) between day 7 and each gene independently (with scan batch as a random effect, and HER2 status, day 21 post-injection, taking one measurement every 7 days. Background values data set, Ecklund’s metrics and the interaction between data set and Ecklund’s obtained at the indicated day of the experiment were subtracted, and bone homing metrics, covariates; if all samples were used, ER status was an adjusting variable). was defined as having a photon flux value of an ROI greater than the background Correlation with RPS6KA5 was assessed by the corresponding Wald tests, which signal. 
The ratio of positive (bone homing) and negative (no bone homing) events was provided by the linear-model-assessed correlation, and Pearson’s correlation was calculated as percentages. coefficient was computed between RPS6KA5 and the corresponding gene for To measure the in vivo activity of caspase 3/7, mice were injected with VivoGlo measure of association. All analyses used the R packages lme4 (ref. 50) and Z-DEVD-aminoluciferin (166 mg per kg), and analysed by BLI after injection and lmerTest51. Adjustment by multiple contrasts was performed by the Benjamini– after 6 h. In the case of a non-detectable signal, standard d-luciferin (LUC) was Yekutieli method52. administered to the animals, and BLI signals were measured. Apoptotic cell content Bone relapse analyses only included first metastasis events of bone metastasis. was assessed by normalization of Z-DEVD BLI signals to BLI signals from the Association with metastasis was evaluated using a frailty Cox proportional hazards lesion. model53. Significance was assessed by means of a log-likelihood ratio test, and a EdU or BrdU incorporation experiments were done by single or multiple Wald test was used for pairwise comparisons. RPS6KA5 expression was evaluated intraperitoneal injection of the indicated compound (50 mg per kg). as a continuous variable assuming a linear relationship with the logarithm of

Nature Cell Biology | www.nature.com/naturecellbiology © 2018 Nature America Inc., part of Springer Nature. All rights reserved. NATURE CEll BIOlOGy Articles the relative risk. Sample groups of low, medium and high expression levels were (see ‘Correlation analyses and survival analysis of transcriptomic data sets’). In this defined using the tertiles of the intensity distribution after correction by technical set-up, plate was the only technical variable considered in the analysis, and it was effects (as for RPS6KA5 association). This correction used a mixed-effect model included as a random effect in the model. Prognosis of patients with an expression for HER2 status, and included metastasis before and after 3 years of follow-up. mean greater than zero were compared to those showing zero for all intensity RPS6KA5 was evaluated both as continuous and as a grouped variable, with measures. Early recurrence associated with MSK1 expression was evaluated using a technical effects, HER2 status and ER status included in the Cox models, and scan time-varying coefficient in the model, which was modelled by splitting the data set batch was included as a random effect in the frailty Cox models. HRs and 95% into two time-dependent sections: before and after 3 years of follow-up. confidence intervals (95% CI) were computed as a measure of association. For visualization, Kaplan–Meier curves were estimated for groups of tumours that PAM50 intrinsic subtype assay. RNA was extracted from cells using the PureLink showed low, medium or high expression of RPS6KA5, and adjusted by HER2 (and Mini Kit following the manufacturer’s instructions. Samples were analysed using ER when necessary), standardizing groups to the whole data set distribution by Prosigna NanoString nCounter Dx Analysis System as described previously56. reweighting. Samples were excluded if scan batches contained exclusively patients + with or without relapse (for example, data set GSE12276). 
164 ER samples were Flow cytometry and FACS. Cultured cells were trypsinized. Cells from bone analysed for bone metastasis free-survival (109 relapse-free and 55 metastatic metastatic lesions were purified physically and enzymatically: excised femur and – samples), and 139 ER samples were analysed for any metastasis free-survival tibia bones underwent several rounds of crushing in a mortar with 3 ml of ice- (87 relapse-free and 52 metastatic samples). Association of RPS6KA5 with early cold PBS, 2% FBS and 1 mM EDTA buffer; each time, suspended bone marrow and late relapse was modelled using a step function for a pre-specified cut-off of cells were filtrated (70 μ​m) and collected separately. Bone fragments were then follow-up (3 or 5 years). The threshold for significance was set at 5%. incubated for 45 min at 37 °C with digesting medium (PBS with 0.25% collagenase type 1 and 20% FBS), and the supernatant was strained and added to the previously Biological enrichment analysis. Pathway enrichment was assessed through the obtained cells. Cells were concentrated by centrifugation for 7 min at 1,200 rpm 54 pre-ranked version of GSEA , which was applied to the ranking defined by the and suspended in single-cell condition after passage through a mesh (40 μ​m). For correlation coefficients of each gene with RPS6KA5. Gene sets derived from the EdU detection, the Click-iT Plus Kit for FACS with Alexa 647 was used, following Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database and those the manufacturer’s instructions. For analysis, a Gallios instrument or an Aria annotated under GO terms as collected in the Molecular Signatures Database 2.0 sorter was used. Cells were selected in the forward scatter/side scatter (FSC/ 16,17 (MsigDB) were used in these analyses. Luminal and basal signatures were SSC) dot plot and then gated to exclude cellular aggregates in the FSC/FSC dot tested for enrichment in the RPS6KA5 correlation ranking. In addition, basal and plot. 
Gates of GFP+, red fluorescent protein (RFP)+, alloplycocyanin (APC)+ or 55 luminal A and luminal B signatures were derived from the METABRIC and Alexa 647+ cells were set and compared with a control sample with no detectable 20 The Cancer Genome Atlas (TCGA) data sets and tested for enrichment in the fluorochrome expression. RPS6KA5 correlation ranking. For doing so, luminal A signatures were defined as genes showing differential overexpression in luminal A compared to luminal B Protein extraction and western blot. Cells were lysed with a RIPA buffer (25 mM samples (fold change of >​1.5; raw P value <​ 0.05). Luminal B signatures were built Tris-HCl pH 7.6, 150 mM NaCl, 1% NP-40, 1% sodium deoxycholate and 0.1% in an analogous way. Basal signatures included genes simultaneously overexpressed SDS, supplemented with protease and a phosphatase inhibitor cocktail) or, for in the PAM50 basal subtype compared to any other subtype (minimum fold transcription factor extraction, modified buffer (10% SDS, 60 mM Tris buffer change of >​2; raw P value <​ 0.01). To identify these genes, a differential expression 43 pH 6.8 and 7% DTT) and sonicated for 5 min at medium intensity. Proteins analysis was performed in each data set separately using limma (METABRIC) or were separated and transferred and blocked as previously described57. Primary a standard mixed-effect linear model (TCGA), in which technical variables were antibodies were incubated for 1 h at room temperature, or overnight at 4 °C, considered as potential confounders: the sub-cohort of provenance was included and secondary antibodies were incubated for 1 h at room temperature (see as a fixed-effect covariate in the METABRIC data, whereas plate identifier was Supplementary Table 8). considered as a random effect in the TCGA data set using lme4 (ref. 44). For these analyses, the PAM50 sample annotation was used as originally provided in the qRT–PCR analysis. 
RNA extraction, reverse transcription and real-time PCR clinical information of each series. were performed and analysed as previously described57. For reagents and TaqMan probes, see Supplementary Table 8. Histopathology, immunohistochemistry and immunofluorescence. Hindlimb bones, PDX tumours and cell pellets were processed as described before5. The Cell proliferation assay. Proliferation of cells in vitro was assessed using details of primary and secondary antibodies are shown in the Supplementary CyQUANT Cell Proliferation kit according to the manufacturer´s instructions. Table 8. EdU was detected with the Click-iT EdU Colorimetric IHC Detection At 24 h after plating, 4-OH tamoxifen or vehicle (ethanol) was added to the cell Kit according to manufacturer’s instructions. Tissue sections were scanned and culture at the indicated concentrations, and cell number was quantified 72 h later digitalized using NanoZoomer2.0HT, and images were acquired by NDP.view2 using Biotek FL600 fluorescence microplate reader at 485–530 nm. software. Staining was quantified using TMARKER software with the Color Deconvolution plugin. Oncosphere and 3D organotypic formation assays. First-generation oncospheres 58 Tissue microarray of breast cancer samples. Samples were obtained from patients were obtained as previously described . Second-generation oncospheres were cultured for 15 days in suspension with stemness media. 3D organoids were formed with breast cancer who were treated at the Hospital Clínic Department of Medical 58 Oncology from 2006 to 2009 by standard guidelines. Patients provided signed as previously described , except that 3D structures were grown for 15–21 days. informed consent for experimental analysis of samples. The study is compliant with all relevant ethical regulations regarding research involving human participants Hypoxia assay. 
Cells were cultured in a hypoxic chamber (94.5% N2, 5.0% CO2 and 4 and received ethical approval from the Hospital Clínic Research Ethics Committee. 0.5% O2), 5 ×​ 10 cells were seeded for 72 h, harvested and stained with Annexin V All samples were formalin-fixed paraffin-embedded primary tumour tissue and APC and the PI (propidium iodide) Kit. Percentages of living, early apoptotic, late had been analysed by an expert pathologist. breast cancer was classified as luminal apoptotic and dead cells were determined by Gallios flow cytometer analysis using (ER+ and/or progesterone receptor positive (PR+), HER2–), HER2+ (HER2+ and/ the APC and PI channel. or hormonal receptors (HHRR)+/–) or triple negative (ER–, PR– and HER2–), 5 with ER+ or PR+ defined as ≥​1% ER/PR+ tumour cells, and HER2+ defined by an Cell adhesion. Cells (1 ×​ 10 ) were seeded in triplicate on 24-well plates coated –1 immunohistochemistry score of 3+​ (whereby 0 or 1+ ​indicates HER2–, and 2+​ with fibronectin or collagen at 10 mg ml , or Matrigel. After 2 h, cells were washed, indicates HER2 borderline; borderline cases were also tested by fluorescence in situ fixed with formalin for 10 min and then visualized by crystal violet staining for hybridization (FISH)). 15 min (with dye dissolved in 2% SDS). Absorbance was measured at 570 nm. Of the 322 patient samples, 275 were evaluated; exclusion was due to non- criteria diagnosis or insufficient or non-representative biopsy. The median Cell migration and invasion. Cell invasion was assayed using Matrigel-coated age at breast cancer diagnosis was 61 yr (range: 28–92 yr); median follow-up BioCoat Cell Culture Inserts, and the cell migration was determined using was 87 months, and 24 patients (8.73%) presented with metastatic relapse. fibronectin-coated BioCoat Cell Culture Inserts as previously described57. 
Immunohistochemical analyses showed that 78.18% of patients (n =​ 215) were hormone-receptor positive, 11.64% (n =​ 32) were HER2+ and 4.36% (n =​ 12) were In vitro tube formation assay. Conditioned EGM-2 media was collected triple-negative. Four patients (16.67%) had bone metastasis, 45.83% had visceral from DBM control or MSK1-downregulated cells 24-h post-seeding. Wells of involvement and 37.5% had both bone and visceral lesions. One patient was 24-well plate were covered with 300 μ​l of Matrigel mixed with 250 μ​l of EGM- excluded from the final analysis due to the lack of one parameter. 2-conditioned media 30 min before cell seeding. HUVEC-RFP were seeded at a concentration of 2 ×​ 105 per well, and tube formation was followed using the Survival analysis of immunohistochemistry data. For each sample, expression automatic modular microscope Olympus CellR/ScanR. Pictures were taken values were summarized using the mean across all measures performed in the every 10 min from 3 ROIs per condition during 15 h in a chamber with standard different positions of the corresponding plate. Association with time-to-metastasis conditions (37 °C, 5% CO2). Images quantifications were done using a custom was evaluated using a frailty Cox proportional hazards model as described above macro designed for the Fiji software (see Code availability).
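The Microarray assay section states that P values were adjusted with the Benjamini and Hochberg method. The authors performed this step in R; the following is only a minimal Python sketch of the same step-up adjustment, with a hypothetical function name and toy P values that do not come from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Illustrative sketch only; the function name and inputs are hypothetical.
    """
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values
    # never increase as the rank decreases (monotonicity step).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    toy_p_values = [0.01, 0.04, 0.03, 0.005]  # made-up values
    print(benjamini_hochberg(toy_p_values))
```

Each adjusted value is the raw P value scaled by the number of tests over its rank, capped at 1 and made monotone; a gene would then be called significant if its adjusted value falls below the chosen threshold.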

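In the public microarray processing, expression was summarized per gene by keeping the most variable probe set, with variability measured by median absolute deviation (MAD). A minimal Python sketch of that selection rule follows; the probe identifiers, gene names and intensities are hypothetical, and the authors' own implementation (in R/Bioconductor) may differ in details such as tie handling.

```python
from statistics import median


def mad(values):
    """Median absolute deviation of a list of numbers (unscaled)."""
    centre = median(values)
    return median([abs(v - centre) for v in values])


def most_variable_probe_per_gene(expression, probe_to_gene):
    """Pick, for each gene, the probe set with the largest MAD.

    expression: dict mapping probe id -> list of intensities across samples.
    probe_to_gene: dict mapping probe id -> gene symbol.
    Returns a dict mapping gene -> (chosen probe id, its intensities).
    """
    best = {}
    for probe, values in expression.items():
        gene = probe_to_gene[probe]
        score = mad(values)
        # Keep the first probe seen on ties (an arbitrary choice for this sketch).
        if gene not in best or score > best[gene][0]:
            best[gene] = (score, probe)
    return {gene: (probe, expression[probe]) for gene, (_, probe) in best.items()}


if __name__ == "__main__":
    # Hypothetical toy data: two probe sets for GENE_A, one for GENE_B.
    toy_expression = {
        "probe_1": [5.0, 5.1, 5.2, 5.0],
        "probe_2": [4.0, 6.0, 5.0, 7.0],
        "probe_3": [8.0, 8.5, 9.0, 8.2],
    }
    toy_annotation = {"probe_1": "GENE_A", "probe_2": "GENE_A", "probe_3": "GENE_B"}
    print(most_variable_probe_per_gene(toy_expression, toy_annotation))
```

Choosing the most variable probe set keeps one row per gene, which is what allows the data sets to be merged on a common set of genes afterwards.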
Nature Cell Biology | www.nature.com/naturecellbiology © 2018 Nature America Inc., part of Springer Nature. All rights reserved.

PDXs. The Ethical Committee of Animal Experimentation of the Vall d’Hebron Research Institute approved all the animal experiments. Experiments were compliant with the European Union’s animal care directive (86/609/EEC). The Vall d’Hebron Hospital Human Sample Ethical Committee approved the protocol of human sample collection. Informed consent was obtained from all subjects. The study is compliant with all relevant ethical regulations regarding research involving human participants and animals. PDX samples were established as previously described59 in female NMRI nu/nu mice. Mice were continuously administered 17β-oestradiol (1 μM) in drinking water. Upon growth of the engrafted tumours, the model was perpetuated by serial transplantation, and samples were sequentially taken for genotyping and histological studies.

ChIP–seq, ChIP–PCR and ChIP–seq data processing. ChIP–seq was performed and analysed as previously described60, using DBM shRNA control (shCtrl) and shRNA MSK1 (shMSK1) cells (irradiated with UVC at 254 nm for 30 min). ChIP–qRT–PCR was performed on three independent experiments to confirm H3S10ph, H3S28ph, H3K9ac and H3K27ac enrichment in shCtrl and shMSK1 cells as for ChIP–seq. Histone enrichment (compared to basal levels) was obtained by normalizing to both the input CT value and the CT value of an intergenic region that was negative for histone modifications. Antibodies and primers are given in Supplementary Table 8.

ChIP–seq data sets were aligned to the hg19 genome assembly using BowTie (version 1.0.1)61 (parameters -k 1, -m 1 and -n 2). H3S10ph and H3S28ph data sets were aligned with BWA (version 0.7.5a-r405)62 (parameters -n 2, -l 20 and -k 1). Peak calling of MSK1 (ref. 24) for regions of ChIP–seq enrichment over background used MACS version 1.4.1 (parameters -p 1e-5, -w, -S and -g hs), whereas MACS version 2 was used for histone marks (parameters --broad, -q 0.01 and -g hs). University of California, Santa Cruz (UCSC) browser tracks63 were created from the MACS wiggle or bedGraph output. The annotatePeaks.pl script of the HOMER suite (version 4.6) (using UCSC hg19) was used to annotate ChIP–seq peaks and calculate the coverage depths of different ChIP–seq experiments at specified regions (generating a normalized coverage value of different sequencing experiments at equally spaced bins set to 25 base pairs spanning the ROI).

Statistics and reproducibility. Statistical analyses used R64 and GraphPad software, with a minimum of three biologically independent samples for significance (Supplementary Table 9). For animal experiments with intracardiac injections, each hindlimb was an independent sample. In metastasis-free survival analysis, each mouse was counted as a biologically independent sample. Fisher’s exact tests were used for binomial variables, and Kaplan–Meier estimates and log-rank tests were used for time-to-event data. Linear models were fitted to perform group comparison for continuous variables; when necessary, a Tukey transformation was applied to the response to fulfil the assumptions of the model. Comparison P values were adjusted for multiple comparisons using the Shaffer65 method within each experiment. For continuous variables, the Student’s t-test was used for normally distributed data, and the Mann–Whitney test for non-Gaussian populations. Two-tailed and unpaired tests were used for data analysis, unless otherwise stated. The threshold for significance was set at 5%. All experiments were reproduced at least three times, unless otherwise indicated.

Life Sciences Reporting Summary. Further information on experimental design is available in the Life Sciences Reporting Summary.

Code availability. Custom-made codes, including the macros created by the Advance Digital Microscopy Core Facility at IRB Barcelona used for image analysis, are available upon request.

Data availability. Data have been deposited in the Gene Expression Omnibus (GEO) under primary accession codes GSE92298 (microarray), GSE92299 (CGH) and GSE92477, GSE100306 (ChIP–seq), and for re-analysed, previously published microarray data, referenced accessions GSE2034 (286 lymph-node-negative samples; Tumour Bank, EMC), GSE2603 (99 breast cancer samples; MSKCC), GSE5327 (58 ER– breast cancer tumour samples; EMC) and GSE12276 (204 patients with breast cancer with known sites of relapse; EMC). Data from MSK1 ChIP–seq were obtained from the authors24 and re-analysed. Statistics source data for Figs. 1c,g; 2d–f; 4a–c,e; 5a–c,e,i,j and 7c, and Supplementary Figs. 1a–d,f,g,i,j; 2a,d–f; 3a–f; 4a–f,i,j; 5b,d; 6a,d,g and 7c,f are provided in Supplementary Table 9.

References
41. Ran, F. A. et al. Genome engineering using the CRISPR–Cas9 system. Nat. Protoc. 8, 2281–2308 (2013).
42. Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819–823 (2013).
43. Smyth, G. K. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 3, 3 (2004).
44. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. 57, 289–300 (1995).
45. Venkatraman, E. S. & Olshen, A. B. A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics 23, 657–663 (2007).
46. Gargiulo, G., Serresi, M., Cesaroni, M., Hulsman, D. & van Lohuizen, M. In vivo shRNA screens in solid tumors. Nat. Protoc. 9, 2880–2902 (2014).
47. Gautier, L., Cope, L., Bolstad, B. M. & Irizarry, R. A. affy – analysis of Affymetrix GeneChip data at the probe level. Bioinformatics 20, 307–315 (2004).
48. Bolstad, B. M., Collin, F., Simpson, K. M., Irizarry, R. A. & Speed, T. P. Experimental design and low-level analysis of microarray data. Int. Rev. Neurobiol. 60, 25–58 (2004).
49. Eklund, A. C. & Szallasi, Z. Correction of technical bias in clinical microarray data improves concordance with known biological information. Genome Biol. 9, R26 (2008).
50. Bates, D., Mächler, M., Bolker, B. & Walker, S. Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67, 1–48 (2015).
51. Kuznetsova, A., Brockhoff, P. B. & Christensen, R. H. B. lmerTest package: tests in linear mixed effects models. J. Stat. Softw. 82, 1–26 (2017).
52. Benjamini, Y. & Yekutieli, D. The control of the false discovery rate in multiple testing under dependency. Ann. Stat. 29, 1165–1188 (2001).
53. Therneau, T. M., Grambsch, P. M. & Pankratz, V. S. Penalized survival models and frailty. J. Comp. Graph. Stat. 12, 156–175 (2003).
54. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102, 15545–15550 (2005).
55. Curtis, C. et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 486, 346–352 (2012).
56. Wallden, B. et al. Development and verification of the PAM50-based Prosigna breast cancer gene signature assay. BMC Med. Genom. 8, 54 (2015).
57. Urosevic, J. et al. Colon cancer cells colonize the lung from established liver metastases through p38 MAPK signalling and PTHLH. Nat. Cell. Biol. 16, 685–694 (2014).
58. Slebe, F. et al. FoxA and LIPG endothelial lipase control the uptake of extracellular lipids for breast cancer growth. Nat. Commun. 7, 11199 (2016).
59. Herrera-Abreu, M. T. et al. Early adaptation and acquired resistance to CDK4/6 inhibition in oestrogen receptor-positive breast cancer. Cancer Res. 76, 2301–2313 (2016).
60. Rinaldi, L. et al. Dnmt3a and Dnmt3b associate with enhancers to regulate human epidermal stem cell homeostasis. Cell. Stem Cell. 19, 491–501 (2016).
61. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
62. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
63. Kent, W. J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002).
64. The R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2016); https://cran.r-project.org/doc/manuals/r-release/fullrefman.pdf
65. Shaffer, J. P. Modified sequentially rejective multiple test procedures. J. Am. Stat. Assoc. 81, 826–831 (1986).
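The ChIP–qRT–PCR description says that histone enrichment was obtained by normalizing to both the input CT value and the CT value of an intergenic region negative for histone modifications, without giving a formula. One common way to express such a double normalization is sketched below in Python; the 2^(−ΔCT) form, the assumption of 100% amplification efficiency, the absence of an input-dilution correction, and all names and CT values are assumptions of this sketch rather than details taken from the paper.

```python
def signal_relative_to_input(ct_ip, ct_input):
    """ChIP signal relative to input, assuming perfect doubling per PCR cycle.

    Assumption of this sketch: no correction for input dilution is applied.
    """
    return 2.0 ** (ct_input - ct_ip)


def fold_enrichment(target_ct_ip, target_ct_input, negative_ct_ip, negative_ct_input):
    """Enrichment at a target region over a negative intergenic control region.

    Both regions are first normalized to their own input CT values, then the
    target signal is divided by the signal at the negative region.
    """
    target = signal_relative_to_input(target_ct_ip, target_ct_input)
    background = signal_relative_to_input(negative_ct_ip, negative_ct_input)
    return target / background


if __name__ == "__main__":
    # Hypothetical CT values for one histone mark at one promoter.
    print(fold_enrichment(
        target_ct_ip=24.0, target_ct_input=22.0,
        negative_ct_ip=29.0, negative_ct_input=22.0,
    ))
```

Dividing by the negative-region signal expresses enrichment relative to basal (background) levels, which matches the wording of the Methods; any other efficiency or dilution handling would change the absolute numbers but not the structure of the calculation.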


Life Sciences Reporting Summary

Corresponding author(s): Roger R Gomis

Nature Research wishes to improve the reproducibility of the work that we publish. This form is intended for publication with all accepted life science papers and provides structure for consistency and transparency in reporting. Every life science submission will use this form; some list items might not apply to an individual manuscript, but all fields must be completed for clarity. For further information on the points included in this form, see Reporting Life Sciences Research. For further information on Nature Research policies, including our data availability policy, see Authors & Referees and the Editorial Policy Checklist.

Experimental design

1. Sample size
Describe how sample size was determined.
An assessment of the number of animals required for each procedure was performed using Statistical Power Analysis and taking into consideration the appropriate statistical tests, a significance level of 5% and statistical power of 80%. An estimate of variance was inferred from previous experiments, especially considering the variability of tumor xenograft growth.

2. Data exclusions
Describe any data exclusions.
Animals were excluded from the study if not properly injected or if severe cachexia was reported. Criteria were pre-established.

3. Replication
Describe whether the experimental findings were reliably reproduced.
Experimental findings were reliably reproduced. Experiments were performed at least three times unless otherwise noted in the manuscript.

4. Randomization
Describe how samples/organisms/participants were allocated into experimental groups.
Animals, upon arrival, were randomly allocated into cages with five mice each. The mice were randomly assigned to experimental groups. No specific method of randomization was used.

5. Blinding
Describe whether the investigators were blinded to group allocation during data collection and/or analysis.
The investigator was not aware of group allocation when assessing the outcome of bioluminescent and radiographic analyses. Samples for immunohistochemistry were coded and investigators were blinded during staining and quantification.

Note: all studies involving animals and/or human research participants must disclose whether blinding and randomization were used.
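The sample-size item above reports a power analysis at a 5% significance level and 80% power, but not the formula or software used. As a rough illustration of how such an estimate can be obtained, the following Python sketch uses the standard normal-approximation formula for comparing two group means; the choice of this formula, the function name, and the example effect sizes are assumptions of the sketch and are not taken from the study.

```python
from math import ceil
from statistics import NormalDist


def animals_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-group comparison.

    Uses n = 2 * ((z_(1-alpha/2) + z_power) * sd / effect) ** 2, a normal
    approximation; 'effect' is the difference in means to detect and 'sd'
    the assumed common standard deviation (both hypothetical here).
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    n = 2.0 * ((z_alpha + z_power) * sd / effect) ** 2
    return ceil(n)


if __name__ == "__main__":
    # Hypothetical scenarios: detect a difference of 1 SD or of 0.5 SD.
    print(animals_per_group(effect=1.0, sd=1.0))
    print(animals_per_group(effect=0.5, sd=1.0))
```

The variance estimate (here `sd`) is the input that the form says was inferred from previous experiments; a larger assumed variability in xenograft growth raises the required number of animals quadratically.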

6. Statistical parameters
For all figures and tables that use statistical methods, confirm that the following items are present in relevant figure legends (or in the Methods section if additional space is needed).

The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement (animals, litters, cultures, etc.)
A description of how samples were collected, noting whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
A statement indicating how many times each experiment was replicated
The statistical test(s) used and whether they are one- or two-sided (note: only common tests should be described solely by name; more complex techniques should be described in the Methods section)
A description of any assumptions or corrections, such as an adjustment for multiple comparisons
The test results (e.g. P values) given as exact values whenever possible and with confidence intervals noted
A clear description of statistics including central tendency (e.g. median, mean) and variation (e.g. standard deviation, interquartile range)
Clearly defined error bars

See the web collection on statistics for biologists for further resources and guidance.

Software
Policy information about availability of computer code

7. Software
Describe the software used to analyze the data in this study.
LivingImage software (versions 2.60.1 and 4.5.2), NDP.view2 software, TMARKER V2.165 software, R.Q manager 1.2 software, DataAssist 3.01 software, Fiji 1.48p software, ImageJ version June 2014, MACS version 1.4.1 software, MACS version 2 software, HOMER suite 4.6 software, GraphPad Prism 6 software, R (Bioconductor), CHOPCHOP (online version from 2015).

For manuscripts utilizing custom algorithms or software that are central to the paper but not yet described in the published literature, software must be made available to editors and reviewers upon request. We strongly encourage code deposition in a community repository (e.g. GitHub). Nature Methods guidance for providing algorithms and software for publication provides further information on this topic.

Materials and reagents
Policy information about availability of materials

8. Materials availability
Indicate whether there are restrictions on availability of unique materials or if these materials are only available for distribution by a for-profit company.
All unique materials (cell derivatives DBM and BoM2) are available from the authors upon request contingent to our IRB Barcelona institutional Material Transfer Agreement policy.

9. Antibodies
Describe the antibodies used and how they were validated for use in the system under study (i.e. assay and species).
All antibody information, including dilutions, is provided in Supplementary Table 9.
IHC and IF
BrdU (BD 347580, validated for the application and species https://www.citeab.com/antibodies/2414712-347580-purified-mouse-anti-brdu) RRID AB_10015219
Ki67 (Abcam 15580, validated for the application and species by vendor) RRID AB_443209
ERα (Abcam 16660, clone SP1, validated for the application and species by vendor) RRID AB_443420
GFP (Life Technologies 11122, validated for the application and species by vendor) RRID AB_221569
pP38 (Cell Signaling 4631, clone 12F8, validated for the application and species by vendor) RRID AB_331765
KRT7 (Sigma HPA007272, validated for the application and species by vendor) RRID AB_1079181
FOXA1 (Millipore 05-1466, clone 2F83, validated for the application and species by vendor) RRID AB_1977191
CXCR4 (Abcam 124824, clone UMB2, validated for the application and species by vendor) RRID AB_10975635
CD31 (Abcam 28364, validated for the application and species by vendor) RRID AB_726362
MSK1 (Cell Signaling 3489, clone C27B2, validated for the application and species by vendor) RRID AB_2285349
mouse IgG Alexa 568 (Molecular Probes A11004, validated for the application and species by vendor) RRID AB_141371

10. Eukaryotic cell lines
a. State the source of each eukaryotic cell line used.
The human BCa cell lines T47D, MCF7, ZR75.1 and BT474, and human embryonic kidney 293T cells, were purchased from ATCC. Human umbilical vein cells RFP (HUVEC-RFP, VERAVEC) were purchased from Angiocrine Bioscience. The dormant bone metastatic cell subline T47D-5516 (DBM) was derived from the parental T47D cell line following an in vivo selection procedure, and the BoM2 bone metastatic subline was derived from MCF7 cells.

b. Describe the method of cell line authentication used. Cell lines were authenticated in our lab for the presence of ER and HER2, by IHC or FISH, respectively. Cell lines were purchased with a certificate from the vendor.

c. Report whether the cell lines were tested for mycoplasma contamination. Cell lines were tested for mycoplasma and only mycoplasma-negative cell lines were used.

d. If any of the cell lines used are listed in the database of commonly misidentified cell lines maintained by ICLAC, provide a scientific rationale for their use. None of the cell lines used in this study was found in the database of commonly misidentified cell lines that is maintained by ICLAC.

Animals and human research participants

Policy information about studies involving animals; when reporting animal research, follow the ARRIVE guidelines

11. Description of research animals
Provide details on animals and/or animal-derived materials used in the study.
Mice, BALB/c nude, female, 11 to 13 weeks of age.

Policy information about studies involving human research participants

12. Description of human research participants
Describe the covariate-relevant population characteristics of the human research participants.
The covariate-relevant population characteristics of the human research participants of the MSKCC/EMC dataset were previously described in Pavlovic et al., JNCI 2015. In addition, when evaluating RPS6KA5 both as a continuous and as a grouped variable, technical effects as well as HER2 status were included in the Cox models, in addition to ER status, for which all samples were involved in the analyses. In an analogous way to the correlation analyses, scan batch was included as a random effect in the frailty Cox models.
The covariate-relevant population characteristics of the human research participants of the Hospital Clinic dataset are provided below. Samples were obtained from BCa patients treated at the Hospital Clínic Department of Medical Oncology between 2006 and 2009 according to standard guidelines. All samples were formalin-fixed paraffin-embedded (FFPE) primary tumour tissue and had been analysed by an expert pathologist. Of the 322 patient samples, 275 were evaluated; exclusion was due to non-criteria diagnosis or insufficient or non-representative biopsy. The median age at breast cancer diagnosis was 61 years (range, 28–92 years); median follow-up was 87 months, and 24 patients (8.73%) presented metastatic relapse. Immunohistochemical analyses showed that 78.18% of patients (n = 215) were HR+, 11.64% (n = 32) were HER2+ and 4.36% (n = 12) were triple-negative. Of the 24 patients with metastatic relapse, four (16.67%) had bone metastasis, 45.83% had visceral involvement and 37.5% had both bone and visceral lesions. One patient was excluded from the final analysis due to the lack of one parameter.
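The percentages reported for the Hospital Clinic cohort can be checked against the stated counts. The short Python sketch below assumes that the cohort-level percentages use the 275 evaluated samples as denominator and that the relapse-site percentages use the 24 relapsed patients. The visceral (11) and bone-plus-visceral (9) counts are not stated in the text; they are inferred here from the reported percentages and should be read as an assumption.

```python
# Consistency check of the reported cohort percentages (illustrative only).
EVALUATED = 275   # evaluated samples, as reported
RELAPSED = 24     # patients with metastatic relapse, as reported

def pct(count, total):
    """Percentage of count over total, rounded to two decimals."""
    return round(100 * count / total, 2)

# Counts stated in the text.
cohort_counts = {"metastatic relapse": 24, "HR+": 215, "HER2+": 32, "triple-negative": 12}
cohort_pct = {label: pct(n, EVALUATED) for label, n in cohort_counts.items()}

# Relapse sites: only the bone count (4) is stated; 11 and 9 are inferred
# from the reported 45.83% and 37.5% of 24 patients (assumption).
site_counts = {"bone": 4, "visceral": 11, "bone and visceral": 9}
site_pct = {label: pct(n, RELAPSED) for label, n in site_counts.items()}

for label, value in {**cohort_pct, **site_pct}.items():
    print(f"{label}: {value}%")
```

Under these assumptions the three relapse-site groups add up to the 24 relapsed patients, which supports reading 16.67% as four of 24.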

nature research | flow cytometry reporting summary (June 2017)

Corresponding author(s): Roger R Gomis

Data presentation

For all flow cytometry data, confirm that:

1. The axis labels state the marker and fluorochrome used (e.g. CD4-FITC).

2. The axis scales are clearly visible. Include numbers along axes only for bottom left plot of group (a 'group' is an analysis of identical markers).

3. All plots are contour plots with outliers or pseudocolor plots.

4. A numerical value for number of cells or percentage (with statistics) is provided.

Methodological details

5. Describe the sample preparation. Cultured cells were collected by trypsinization, and cells from bone metastatic lesions were purified using a physical and enzymatic protocol. Briefly, hind limb bones were excised, and the femur was separated from the tibia. Bones were placed in a mortar filled with 3 ml of ice-cold PBS supplemented with 2% FBS and 1 mM EDTA and then crushed. Suspended bone marrow cells were filtered through a cell strainer (70 μm) and collected. After multiple repetitions of this procedure, bone fragments were incubated for 45 min at 37 °C with digesting medium composed of 0.25% collagenase type 1 and 20% FBS in PBS. After incubation, digestion media were passed through a cell strainer and added to the previously obtained cells. Cells were concentrated by centrifugation for 7 min at 1,200 rpm and suspended in single-cell condition after passage through a mesh (40 μm).

6. Identify the instrument used for data collection. For cell sorting: BD FACSAria 2.0. For flow cytometry analysis: Beckman Coulter Gallios™.

7. Describe the software used to collect and analyze the flow cytometry data. For data collection: BD FACSDiva v 6.1.2 software, Beckman Coulter Gallios™ software. For data analysis: FlowJo v 10.

8. Describe the abundance of the relevant cell populations within post-sort fractions. GFP+ cells sorted from metastatic lesions were directly placed into DNA extraction buffer. The abundance of GFP+ cells for these samples could not be assessed.

9. Describe the gating strategy used. After cells were selected in the FSC/SSC dot plot to remove debris, they were gated to exclude cellular aggregates in the FSC/FSC dot plot. Gates of GFP+, RFP+, APC+ or Alexa 647+ cells were set and compared with a control sample with no detectable fluorochrome expression.

Tick this box to confirm that a figure exemplifying the gating strategy is provided in the Supplementary Information.
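The gating order described in item 9 (debris exclusion on FSC/SSC, aggregate exclusion on FSC/FSC, then a fluorescence gate set against a non-fluorescent control) can be illustrated with a small Python sketch. Everything numeric here is hypothetical: the thresholds, the event values and the reading of "FSC/FSC" as FSC height versus FSC area are assumptions for illustration, not the settings used in the study.

```python
from dataclasses import dataclass

@dataclass
class Event:
    fsc_a: float  # forward scatter, area
    fsc_h: float  # forward scatter, height
    ssc_a: float  # side scatter, area
    gfp: float    # fluorescence intensity in the GFP channel

def is_cell(e, min_fsc_a=20_000, min_ssc_a=5_000):
    """Debris gate on FSC/SSC (hypothetical thresholds)."""
    return e.fsc_a >= min_fsc_a and e.ssc_a >= min_ssc_a

def is_singlet(e, low=0.8, high=1.2):
    """Aggregate gate: keep events whose FSC-H/FSC-A ratio is near 1 (assumed)."""
    return low <= e.fsc_h / e.fsc_a <= high

def gate(events):
    """Apply the debris gate, then the aggregate gate."""
    return [e for e in events if is_cell(e) and is_singlet(e)]

def positive_fraction(sample, control):
    """Fraction of gated sample events brighter than every gated control event."""
    threshold = max(e.gfp for e in gate(control))
    gated = gate(sample)
    if not gated:
        return 0.0
    return sum(e.gfp > threshold for e in gated) / len(gated)

# Hypothetical events: unstained control and a mixed sample.
control = [Event(50_000, 50_000, 20_000, 100), Event(60_000, 58_000, 25_000, 150)]
sample = [
    Event(55_000, 54_000, 22_000, 5_000),   # single GFP+ cell
    Event(52_000, 51_000, 21_000, 120),     # single GFP- cell
    Event(5_000, 5_000, 1_000, 9_000),      # debris, removed by FSC/SSC gate
    Event(120_000, 70_000, 40_000, 8_000),  # aggregate, removed by FSC/FSC gate
]
print(positive_fraction(sample, control))
```

Setting the positivity threshold from the control sample mirrors the statement that gates were compared with a sample lacking detectable fluorochrome expression; a percentile-based threshold would be an equally reasonable choice.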

nature research | ChIP-seq reporting summary (June 2017)

ChIP-seq Reporting Summary

Form fields will expand as needed. Please do not leave fields blank.

Corresponding author(s): Roger R. Gomis

Data deposition

1. For all ChIP-seq data:
a. Confirm that both raw and final processed data have been deposited in a public database such as GEO.
b. Confirm that you have deposited or provided access to graph files (e.g. BED files) for the called peaks.

2. Provide all necessary reviewer access links. The entry may remain private before publication.
Accession number of GEO entries:
GSE100306, token for access mfmjosuylbknvmb
GSE92477, access for the reviewers in this link https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?token=wjsxkequpvsffax&acc=GSE92477

3. Provide a list of all files available in the database submission.

Files in GEO entry GSE100306:
shControl_only_macs2_control_lambda.bedgraph
shControl_only_S10P_macs2_treat_pileup.bedgraph
shControl_only_S28P_macs2_treat_pileup.bedgraph
Shctrl_only_S10P_ATCACG.fastq.gz
Shctrl_only_S28P_ACAGTG.fastq.gz
Shtrl_only_Input_GATCAG.fastq.gz
MSK1_only_Input_GGCTAC.fastq.gz
MSK1_only_macs2_control_lambda.bedgraph
MSK1_only_S10P_macs2_treat_pileup.bedgraph
MSK1_only_S10P_TTAGGC.fastq.gz
MSK1_only_S28P_CAGATC.fastq.gz
MSK1_only_S28P_macs2_treat_pileup.bedgraph

Files in GEO entry GSE92477:
shCTRL_H3K9ac_peaks.broadPeak
shCTRL_H3K9me3_peaks.broadPeak
shCTRL_H3K27ac_peaks.broadPeak
shCTRL_H3K27me3_peaks.broadPeak
shMSK1_H3K9ac_peaks.broadPeak
shMSK1_H3K9me3_peaks.broadPeak
shMSK1_H3K27ac_peaks.broadPeak
shMSK1_H3K27me3_peaks.broadPeak
FINAL_shCTRL_H3K9ac.fastq.gz
FINAL_shCTRL_H3K9me3.fastq.gz
FINAL_shCTRL_H3K27ac.fastq.gz
FINAL_shCTRL_H3K27me3.fastq.gz
FINAL_shCTRL_input_DNA.fastq.gz
FINAL_shMSK1_H3K9ac.fastq.gz
FINAL_shMSK1_H3K9me3.fastq.gz
FINAL_shMSK1_H3K27ac.fastq.gz
FINAL_shMSK1_H3K27me3.fastq.gz
FINAL_shMSK1_input_DNA.fastq.gz

4. If available, provide a link to an anonymized genome browser session (e.g. UCSC).

Methodological details

5. Describe the experimental replicates. One replicate for all ChIPs per condition.

6. Describe the sequencing depth for each experiment. Single-end reads of 50 bp.

Sample name                          # of reads   # of unique
Shctrl_only_S10P_ATCACG.fastq.gz     34528844     28706324
MSK1_only_S10P_TTAGGC.fastq.gz       37606568     31614435
Shctrl_only_S28P_ACAGTG.fastq.gz     38192980     28523735
MSK1_only_S28P_CAGATC.fastq.gz       40258276     29198476
Shtrl_only_Input_GATCAG.fastq.gz     47972592     41599158
MSK1_only_Input_GGCTAC.fastq.gz      49028476     42487568
FINAL_shCTRL_H3K9ac.fastq.gz         62674789     49680529
FINAL_shCTRL_H3K9me3.fastq.gz        64934602     47333626
FINAL_shCTRL_H3K27ac.fastq.gz        64674206     49245941
FINAL_shCTRL_H3K27me3.fastq.gz       71626642     55603297
FINAL_shCTRL_input_DNA.fastq.gz      66448120     51989603
FINAL_shMSK1_H3K9ac.fastq.gz         66773361     52995270
FINAL_shMSK1_H3K9me3.fastq.gz        63004379     44241354
FINAL_shMSK1_H3K27ac.fastq.gz        64048907     49055527
FINAL_shMSK1_H3K27me3.fastq.gz       61158830     47444902
FINAL_shMSK1_input_DNA.fastq.gz      62253807     48646390

7. Describe the antibodies used for the ChIP-seq experiments.
H3S10p (Abcam ab5176, validated for the application and species by vendor) RRID AB_304763
H3S28p (Sigma-Aldrich H9908, clone HTA28, validated for the species by vendor and for the application by Lau P. et al., PNAS 2011, doi 10.1073/pnas.1012798108 and Sawicka A. et al., Genome Research 2014, doi 10.1101/gr.176255.114) RRID AB_260096
H3K27ac (Merck Millipore 07-360, validated for the application and species by vendor)
H3K9ac (Abcam ab4441, validated for the application and species by vendor) RRID AB_2118292
H3K9me3 (Abcam ab8898, validated for the application and species by vendor) RRID AB_306848
H3K27me3 (Abcam ab6002, validated for the application and species by vendor) RRID AB_305237

8. Describe the peak calling parameters. For all ChIP-seq data the same calling parameters were used.
Software used: MACS2 version 2.1.1.20160309
Genome version: hg19 from UCSC
command: macs2 callpeak
treatment: -t shMSK1 or shControl
control: -c shMSK1_Input or shControl_Input
input file format: -f bam
effective genome size: -g hg
no model building: --nomodel --extsize 300
broad calling on: --broad
q-value cut-off: --broad-cutoff 0.01

9. Describe the methods used to ensure data quality.
For the shMSK1 ChIP-seq: The reads were dynamically trimmed in order to remove low quality bases (Trimmomatic version 0.33, parameters TRAILING:5 SLIDINGWINDOW:4:15). Duplicate reads were removed. Peaks with at least 5 mfold and FDR<0.01 were retained.
shMSK1_S10P: 63,586 peaks
shMSK1_S28P: 34,418 peaks
shControl_S10P: 102,877 peaks
shControl_S28P: 38,118 peaks
For the histone marks ChIP-seq: Duplicate reads were removed during the peak-calling from macs2. Peaks with at least 5 mfold and FDR<0.01 were retained.
CTRL_K27ac: 33,490 peaks
CTRL_K27me3: 30,117 peaks
CTRL_K9ac: 19,044 peaks
CTRL_K9me3: 54,234 peaks
shMSK_K27ac: 18,878 peaks
shMSK_K27me3: 39,795 peaks
shMSK_K9ac: 17,000 peaks
shMSK_K9me3: 67,913 peaks

10. Describe the software used to collect and analyze the ChIP-seq data.
For the shMSK1 ChIP-seq:
Quality inspection: Fastqc version 0.11.3
Quality trimming: Trimmomatic version 0.33
Alignment: BWA version 0.7.5a-r405
Sorting and removing duplicates: Samtools version 1.4.1
Peak calling/creation of bedgraphs: macs2 version 2.1.1.20160309
Creation bigWig files: bedGraphToBigWig version 4
Annotation of peaks: homer version 4.9.1
For the histone marks ChIP-seq:
Quality inspection: Fastqc version 0.11.3
Alignment: Bowtie version 1.0.1
Sorting: samtools version 0.1.19
Peak calling/creation of bedgraphs: macs2 version 2.1.1.20160309
Annotation of peaks: homer version 4.6
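As a minimal sketch of how the peak-calling settings reported in item 8 combine into one command line, the Python below assembles the argument list for `macs2 callpeak`. The BAM file names and the `-n` output prefix are hypothetical. The form gives the format as `bam` and the genome size as `hg`; MACS2 documents these values as `BAM` and `hs`, so the sketch uses the documented spellings, which is an assumption about what was actually run.

```python
import shlex

def macs2_broad_command(treatment_bam, control_bam, name, genome_size="hs"):
    """Build a `macs2 callpeak` argument list from the reported settings."""
    return [
        "macs2", "callpeak",
        "-t", treatment_bam,        # treatment: shMSK1 or shControl ChIP
        "-c", control_bam,          # control: matching input sample
        "-f", "BAM",                # input file format (form says "bam")
        "-g", genome_size,          # effective genome size (form says "hg")
        "--nomodel", "--extsize", "300",    # no model building, fixed extension
        "--broad", "--broad-cutoff", "0.01",  # broad calling, q-value cut-off
        "-n", name,                 # output prefix (hypothetical)
    ]

# Hypothetical file names; the deposited files are FASTQ, not BAM.
cmd = macs2_broad_command("shMSK1_S10P.bam", "shMSK1_Input.bam", "shMSK1_S10P")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the command as a list rather than a string makes it straightforward to pass to `subprocess.run` without shell quoting issues, should the command be executed rather than printed.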