<<

ARTICLE

Received 3 May 2013 | Accepted 16 Aug 2013 | Published 17 Sep 2013 DOI: 10.1038/ncomms3464 OPEN Master regulators of FGFR2 signalling and breast risk

Michael N.C. Fletcher1,2,*,w, Mauro A.A. Castro1,*,w, Xin Wang1,2, Ines de Santiago1, Martin O’Reilly1, Suet-Feung Chin1,2, Oscar M. Rueda1,2, Carlos Caldas1,2, Bruce A.J. Ponder1,2, Florian Markowetz1 & Kerstin B. Meyer1,2

The fibroblast growth factor 2 (FGFR2) has been consistently identified as a risk locus in independent genome-wide association studies. However, the molecular mechanisms underlying FGFR2-mediated risk are still unknown. Using model systems we show that FGFR2-regulated are preferentially linked to breast cancer risk loci in expression quantitative trait loci analysis, supporting the concept that risk genes cluster in pathways. Using a network derived from 2,000 transcriptional profiles we identify SPDEF, ERa, FOXA1, GATA3 and PTTG1 as master regulators of fibroblast growth factor receptor 2 signalling, and show that ERa occupancy responds to fibroblast growth factor receptor 2 signalling. Our results indicate that ERa, FOXA1 and GATA3 contribute to the regulation of breast cancer susceptibility genes, which is consistent with the effects of anti- oestrogen treatment in breast cancer prevention, and suggest that fibroblast growth factor receptor 2 signalling has an important role in mediating breast cancer risk.

1 Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge CB2 0RE, UK. 2 Department of Oncology, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK. * These authors contributed equally to this work. w Present addresses: Friedrich Miescher Laboratory of the Max Planck Society, Spemannstrasse 39, 72076 Tu¨bingen, Germany (M.N.C.F.); Department of , Federal University of Rio Grande do Sul (UFRGS), Rua Ramiro Barcelos, 2600, Anexo, 90035-003 Porto Alegre, Brazil (M.A.A.C.). Correspondence and requests for materials should be addressed to K.B.M. (email: [email protected]).

NATURE COMMUNICATIONS | 4:2464 | DOI: 10.1038/ncomms3464 | www.nature.com/naturecommunications 1 & 2013 Macmillan Publishers Limited. All rights reserved. ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms3464

early 70 loci show significant association with breast (Exp3). In each experiment, MCF-7 cells were synchronised by cancer risk in genome-wide association studies (GWAS)1. oestrogen starvation, before adding minimal levels of estradiol NHowever, in most cases we do not yet understand in conjunction with the relevant FGFR stimulus (Methods). how these loci contribute to the risk of developing cancer. A Supplementary Figs S1–S3 summarize the experimental design locus within an intron of the FGFR2 (fibroblast growth factor and results. expression was examined at multiple time receptor 2) gene is consistently the most strongly associated with points and the software limma was used to call differentially risk1–3. Here we take a systems biology approach to examine the expressed genes (DEGs) (Supplementary Methods). Principal regulatory network in breast cancer, how it is perturbed by component analysis demonstrated low experimental variation FGFR2 signalling and how the identified network and its master and the specificity of the FGFR response (Supplementary Figs regulators relate to disease risk. S1–S3). The gene expression response to estradiol increased from The highly significant association of the FGFR2 locus with 6 to 24 h. In contrast, the FGF10 response in Exp1 and 3 was breast cancer risk2 has been replicated in multiple studies in greatest after 6 h. In Exp2, where AP20187 was used to stimulate Europeans3, Asians and African-Americans4. The risk is for ER þ FGFR2, the response continued to increase with time. Each disease5. The known role of FGFR2 signalling, the occurrence of estradiol plus FGFR2 stimulation was compared with estradiol FGFR2 gene amplification in breast cancer and the location of stimulation alone, ensuring that the derived DEG list is FGFR2 the risk SNPs (single-nucleotide polymorphism) within its intron, specific. Figure 1b summarizes the number of DEG called in each make FGFR2 a plausible mediator of risk. Functional studies of the experiments (Exp1–3). A full list of DEG derived from each suggest that the risk allele increases FGFR2 gene expression6, experiment is available in the R package Fletcher2013a. most likely in mammary epithelial cells, but recent genotype- The microarray data were confirmed in independent biological expression correlations in breast tumours have failed to confirm replicates by performing quantitative RT–PCR for a number of an association of the risk SNPs either with expression of FGFR2 selected genes. IL8 is one of the most strongly induced genes and (refs 7,8) or with other nearby potential target genes (K.B.M., we demonstrate that increased IL8 mRNA expression is detected unpublished observation). However, studies in the mouse similarly by two microarray probes and by RT–PCR (Fig. 1c,d). have shown that FGFR2 has an important role in mammary Furthermore, we find that IL8 secretion increased after FGF10 development9 and in maintenance of breast tumour initiating stimulation (Fig. 1e). Both increased expression and secretion cells10, consistent with a role for FGFR2 in conferring risk. were blocked by the FGFR kinase inhibitor PD173074, confirm- FGFR2 signalling cascades have been studied in some detail11, ing that the effect is FGFR specific. but less is known about the resulting changes in gene expression and how different risk genotypes might affect this response. Gene expression changes are ultimately mediated by the activity FGFR2-regulated genes are linked to breast cancer risk loci.As of transcription factors (TFs). In a number of systems, such as the FGFR2 locus is strongly associated with breast cancer risk, we embryonic stem cells12 or glioblastoma13, it has been demon- wished to examine whether FGFR2-regulated genes are risk genes strated that a small number of TFs act as master regulators themselves. To allow us to map risk SNPs to genes, we combined (MRs) that co-ordinate cellular behaviour. MRs can be identified variant set enrichment analysis (VSE)18 with expression quanti- by deriving TF-centric regulatory networks using algorithms such tative trait loci (eQTL) analysis. eQTL are polymorphisms asso- as ARACNe14, where each TF in the network is connected to a set ciated with changes in gene expression. Our analysis examined of genes that it directly regulates (referred to as a ‘regulon’). whether or not breast cancer risk SNPs, and SNPs in linkage Enrichment of a relevant gene signature in each of the regulons disequilibrium (LD) with these, can act as eQTL for particular can point to the TFs acting as MRs of the response or phenotype groups of genes, such as FGFR2-responsive genes. This cis-eQTL/ (master regulator analysis, MRA)13–15. VSE analysis was carried out between breast cancer risk SNPs and Here we describe a network-based strategy to uncover the FGFR2 gene signatures (Exp1–3). First, we repeated the VSE the molecular mechanism underlying breast cancer risk. We analysis with the currently reported list of 51 independent breast identify the TFs acting as MR of the FGFR2 response and cancer GWAS hits ((ref. 18) and GWAS catalogue). Rather than demonstrate that FGFR2-responsive genes and genes in the examining an effect of the tagging SNP at each GWAS locus, this regulons of the MRs are linked to GWAS hits. Our results suggest method defines an associated variant set (AVS) of SNPs in linkage that the risk-associated with altered FGFR2 signalling is due to with each tagging SNP (D0 ¼ 0.99, LOD 43; similar results were altered activity of the ERa-associated transcriptional network obtained with using r2 4.8 in the AVS selection) and then tests (TN) that includes SPDEF and we provide evidence that the whether this set is associated with chromatin features (Methods). FGFR2 signalling pathway is an important contributor to ER þ As previously reported18, we found that FOXA1- and ESR1- disease risk. binding sites are significantly enriched in the breast cancer AVS (Fig. 2b). Next we carried out a cis-eQTL analysis between the risk AVS and potential target genes, using gene expression Results profiles and genotype data from 997 breast cancer samples Deriving an FGFR2-associated gene expression signature.To (METABRIC discovery data set19) in 400 kb windows around examine the effects of FGFR2 signalling in breast cancer, we each SNP cluster. This analysis found that among the genes established three model systems for FGFR2 signalling. As the linked to the risk AVS, there was a significant enrichment for FGFR2 locus primarily confers risk of ER þ disease, we chose to FGFR2-responsive genes (Exp1–3: E2 þ FGF10, E2 þ AP20187, study the ER-dependent breast cancer cell line MCF-7 (Fig. 1a). Tet þ E2 þ FGF10, each compared with E2 treatment alone; First, we stimulated endogenous FGFRs (FGFR1b and FGFR2b) Fig. 1c), but not for oestrogen responsive genes (E2 and Tet þ E2, with FGF10 (Exp1). FGF10 has higher affinity for FGFR2b, but each compared with vehicle treatment, Fig. 2). (In this system, the can also signal through FGFR1b16. Second, we used a system E2 response compares resting and cycling cells and is likely to where the FGFR2-kinase domain is linked to a dimerization include many genes associated with this change in the cell cycle.) domain (iF2 construct)17 and the kinase is activated artificially by A cis-eQTL analysis of the FGFR2 and E2 gene signatures with adding the small molecule AP20187 (Exp2). Third, we over- AVS for prostate or colorectal cancer GWAS and bone mineral expressed the full-length FGFR2b from a tetracycline-inducible density (BMD) did not shown any significant associations expression vector and again activated the receptor using FGF10 (Fig. 1d–f). In conclusion, we provide evidence that breast cancer

2 NATURE COMMUNICATIONS | 4:2464 | DOI: 10.1038/ncomms3464 | www.nature.com/naturecommunications & 2013 Macmillan Publishers Limited. All rights reserved. NATURE COMMUNICATIONS | DOI: 10.1038/ncomms3464 ARTICLE

Exp1 Exp2 Exp3 Endogenous FGFR iF2 construct FGFR2b over-expression FGFR 1b and 2b FGFR2 signalling domain Tet induction FGF10 ligand AP20187 bindin domain FGF10 ligand

FGFR1b + FGF10FGFR2b + FGF10 iF2 + AP20187 FGFR2b + FGF10

13 ILMN_1666733 150 Exp1 12 UT E2 168 11 E2+FGF10 100 65 144 E2+FGF10+PD E2 10 Exp3 E2+FGF10 568 Exp2 50

gene expression 9 487 mRNA fold increase IL8 8 0 IL8 1,293 0 624 36 12 24 2,774 ILMN_2184373 13 140 UT E2 E2+FGF10 12 ) E2 * E2+FGF10+PD E2+FGF10 –1 100 * 11 E2+FGF10+PD *

10 (ng ml 60 IL8

gene expression 9

IL8 20 8 0 6 24 24 48 72 Time (h) Time (h)

Figure 1 | Derivation and validation of the FGFR2 gene expression signatures. (a) A schematic of FGFR2 expression systems used to derive the gene expression signatures Exp1–3. (b) Venn diagram depicting the overlap between the genes deregulated after FGFR2 signalling in the experimental systems Exp1–3. Each list of FGFR2-regulated genes was derived as a contrast between the FGFR2 stimulus with estradiol versus estradiol only treatment to obtain the FGFR2-specific response. Limma analytical contrasts to derive the expression signatures were Exp1: E2.FGF10 versus E2, Exp2: E2.AP20187 versus E2 and Exp3: TET.E2.FGF10 versus TET.E2. (c–e) Confirmation of gene expression microarray response by RT–PCR and expression: (c)gene expression as measured by two IL8 microarray probes (ILM-1666733 and ILMN_2184373) in arbitrary units; (d) RNA levels detected by RT–PCR and (e) protein secretion of IL8 after FGF10.E2 stimulation of MCF-7 cells versus E2 treatment only. Error bars show the s.d. from three biological replicates. (*Po0.05, multiple testing corrected two sample t-test) UT: untreated; E2: estradiol; PD: FGFR2-kinase inhibitor PD173074. risk SNPs are preferentially linked to genes that, on a background Fig. 3). The filtered TN has less complexity and highlights the of oestrogen stimulation, are responsive to FGFR2 signalling. most significant interactions.

Regulatory network derived from gene expression profiles. Master regulators of FGFR2 signalling. Next, we used MRA to To better understand the pathways involved in the FGFR2 sig- identify the MRs of FGFR2 signalling by testing for significant nalling response, we constructed a regulatory network for breast enrichment of FGFR2-responsive genes (Exp1–3) in each regulon. cancer based on the METABRIC data set19 (computational We first ranked regulons based on the enrichment score obtained pipeline summarized in Fig. 3). METABRIC consists of a dis- on the unfiltered TN and found good agreement between Exp1–3, covery and a validation cohort of 997 and 995 breast tumour both for the total set of regulons as well as the top 50 regulons samples each, for which gene expression data are available ( ¼ a: (Fig. 4), and also between cohort I and II (Supplementary gene expression data in Fig. 3). The data were normalized and Figs S4–S6), suggesting that our three model systems identify probes with low variation removed from the analysis (b: filtered similar sets of regulated genes following FGFR2 signalling. Then, gene expression data in Fig. 3). The TN14 was then derived by to define a smaller set of functionally important MRs, we applied computing the mutual information (MI) between annotated TFs the MRA to the filtered TN and found that 20 regulons are sig- (n ¼ 1,388 probes) and all potential targets (n ¼ 20 K probes after nificantly enriched across the two breast cancer cohorts in at least filtering) in each cohort ( ¼ c in Fig. 3) (Methods). In this net- one experiment (Fig. 5a). The agreement between the two cohorts work, each TF has been assigned a list of candidate regulated was very high. When a DPI tolerance of 0.05 is allowed, the genes referred to as its regulon. In the TN, each target can be regulons of five MRs were enriched in both cohorts in all three linked to multiple TFs and regulation can occur as a result of both experimental systems (a DPI tolerance from 0.01 to 0.05 gives the 15 direct (TF1–target) and indirect interactions (TF1–TF2–target) . same consensus). These were SPDEF, ESR1 and its co-factors We therefore applied the data processing inequality (DPI) FOXA1 and GATA3, and PTTG1 (Fig. 5b). None of the identified (Methods), which removes the weakest interaction in any regulons were significantly enriched using a random gene set of triangle of two TFs and a target gene, thus preserving the comparable size. When carrying out the MRA on a network dominant TF-target pairs, resulting in the filtered TN ( ¼ din derived for a completely independent breast cancer data set

NATURE COMMUNICATIONS | 4:2464 | DOI: 10.1038/ncomms3464 | www.nature.com/naturecommunications 3 & 2013 Macmillan Publishers Limited. All rights reserved. ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms3464

Enrichment score (VSE) Bonferroni P-value Mapping tally 4 rs10069690 7 rs909116 8 rs7107217 3 rs108220103 4 rs3817198 1 rs9383951 62 rs10490113 55 rs4973768 30 rs6788895 39 rs1092913 45 rs981782 75 rs16886165 38 rs6556756 70 rs17530068 66 rs2180341 35 rs9485372 38 rs3757318 32 rs2046210 10 rs9383938 10 rs10263639 14 rs2048672 44 rs1562430 22 rs1011970 44 rs2380205 13 rs10995190 36 rs3750817 92 rs10510102 16 rs614367 76 rs1154865 54 rs1926657 59 rs999737 43 rs4322600 27 rs3112612 24 rs10871290 3 rs2075555 31 rs1978503 15 rs8170 21 rs8100241 65 rs10411161 89 rs2284378 Binding site (–log ) –202468 14 rs11249433 29 rs13393577 64 rs13387042 51 rs4415084 51 rs13281615 71 rs865686 31 rs704010 45 rs3803662 48 rs458685 SNPs 127 rs7716600 10 128 rs1876206 FOXA1 4.67 19 ESR1 13.91 25

P-value Cis-eQTL (–log10) –2 02468 E2 0.95 4 Tet+E2 0.89 6 E2+FGF10 6.76 12 E2+AP20187 6.07 17 Tet+E2+FGF10 5.90 13

Enrichment score (VSE) Bonferroni P-value Mapping tally 7 rs4430796 1 rs7210100 5 rs2735839 8 rs103294 3 rs817826 8 rs12653946 3 rs13254738 3 rs2242652 19 rs1465618 11 rs721048 18 rs6545977 49 rs10187424 50 rs12621278 67 rs7584330 53 rs2292884 31 rs9311171 50 rs2660753 32 rs17181170 32 rs6763931 13 rs10936632 30 rs12500426 55 rs17021918 18 rs7679673 31 rs4466137 21 rs1983891 91 rs10498792 23 rs651164 96 rs9364554 23 rs12155172 32 rs10486567 57 rs6465657 35 rs1512268 16 rs1016343 32 rs6983561 37 rs445114 31 rs6983267 45 rs1447295 44 rs3123078 27 rs11199874 48 rs4962416 31 rs7127900 31 rs11228565 21 rs10875943 16 rs902774 16 rs9600079 45 rs1529276 30 rs4775302 16 rs1859962 14 rs8102476 29 rs9623117 1 rs5759167 17 rs742134 81 rs1327301 90 rs2121875 77 rs5919432 Cis-eQTL (–log10) –2 0682 4 38 rs13385191 SNPs 178 rs10934853 182 rs345013 156 rs130067 150 rs339331 E2 3.06 8 Tet+E2 4.19 10 E2+FGF10 3.58 11 E2+AP20187 4.09 15 Tet+E2+FGF10 3.60 11

Enrichment score (VSE) Bonferroni P-value 3 rs5934683 Mapping tally 9 rs4779584 8 rs4939827 62 rs961253 41 rs6691170 51 rs6687758 25 rs10936599 17 rs16892766 86 rs3824999 31 rs3802842 31 rs7315438 13 rs4444235 51 rs9929218 58 rs10411210 15 rs4925386 Cis-eQTL (–log ) –2 02 468 12 rs1321311 28 rs10505477 36 rs10795668 45 rs11169552 SNPs

10 199 rs7758229 E2 0.02 0 Tet+E2 0.04 1 E2+FGF10 0.30 2 E2+AP20187 1.80 6 2 Tet+E2+FGF10 0.70

Enrichment score (VSE)

P-value Bonferroni Mapping tally

(–log ) 1 rs17131547 17 rs7524102 26 rs1054627 31 rs4609139 65 rs2214681 95 rs1021188 31 rs9466056 35 rs6710518 37 rs10510628 87 rs9291683 69 rs1366594 43 rs2908004 71 rs4355801 14 rs2165468 96 rs3736228 72 rs10506701 54 rs9594738 SNPs 17 rs8057551 16 rs2273061 Cis-eQTL 10 –2 0 2 4 6 8 24 rs11864477 56 rs4087296 126 rs4811196 213 rs13204965 140 rs9317284 145 rs9630182 E2 0.30 1 Tet+E2 0.68 2 E2+FGF10 0.62 3 E2+AP20187 0.30 2 Tet+E2+FGF10 1.20 3

Figure 2 | Enrichment of the breast cancer AVS in FGFR2-related gene loci. (a) VSE plots for the breast cancer AVS and the FOXA1 and ESR1 cistromes in E2-treated MCF-7 cells. (b) E2- and FGFR2-responsive genes in MCF-7 cells are tested for functional association with risk AVS by cis-eQTL analysis using METABRIC tumour gene expression data. (c–e) The same genes tested for functional association with other cancer risk AVS by cis-eQTL: (c) the prostate cancer AVS, (d) the colorectal cancer AVS and (e) the bone mineal density AVS. Box plots in each panel show the normalized null distributions (box: 1st–3rd quartiles; bars: extremes). Black diamonds show the corresponding VSE scores. Red diamonds highlight mapping tallies that satisfy a Bonferroni-corrected threshold for significance (Po1e-4). P-values are based on null distributions from 1,000 MRVSs. Binary matrices show clusters of risk-associated and linked SNPs with at least one SNP mapping to the genomic annotations and validated by cis-eQTL analysis. The bottom row of numbers indicates the number of linked SNPs in each SNP cluster. The mapping tally shows the number of clusters per annotation. Rows highlighted in dark grey show statistically significant enrichment. The cis-eQTL analysis extends the original VSE method by conditioning the mapping tallies to functional links. Non-disjoint AVSs with risk-associated SNPs in LD were merged in order to avoid inflated mapping tallies.

(TCGA breast cancer data set)20, regulons for four of the five easily detected in the network for normal tissue. We also derived MRs (SPDEF, ERa, FOXA1 and GATA3) were again found to be a filtered TN for a gene expression data set from T-cell acute enriched in FGFR2-responsive genes (Supplementary Fig. S7). lymphoblastic leukaemia (T-ALL)24 and found that the PTTG1 We also computed a filtered TN for 144 normal breast tissue regulon in this network was enriched for two of the three FGFR2 samples from METABRIC patients. In this network, three of the gene signatures (Exp2 and Exp3). The enrichment for PTTG1 in five MRs (SPDEF, GATA3 and FOXA1) were enriched, and this two distinct cancer tissues, but not in normal breast tissue, may enrichment was found in all three experiments (Fig. 5b). Inter- indicate a large number of proliferation-related (and not breast estingly, ESR1 and PTTG1 regulons are not enriched, possibly cancer-specific) genes in the PTTG1 regulon. Our analysis of the reflecting the fact that in normal breast the majority of epithelial extended MR list supports this idea (Supplementary Fig. S8 cells are ER , non-dividing cells21. GATA3 and FOXA1 have and Supplementary Methods). An association of PTTG1 with been postulated to function upstream of ER and are required proliferation was confirmed by carrying out the MRA using a early in mammary development22,23 and might therefore be more proliferation-based gene signature, termed meta-PCNA25,on

4 NATURE COMMUNICATIONS | 4:2464 | DOI: 10.1038/ncomms3464 | www.nature.com/naturecommunications & 2013 Macmillan Publishers Limited. All rights reserved. NATURE COMMUNICATIONS | DOI: 10.1038/ncomms3464 ARTICLE

Compute MI Cohorts Pre-processing TFs versus targets Apply DPI Analysis pipeline

Breast cancer I 1) Discovery set

Breast cancer II 2) Test set

Normal breast 3) Control set normals (related tissue) T-cell ALL 4) Control set cancer (non-related tissue)

- Enrichment maps MRA analysis - Overlap analysis FGFR2 - Clustering Enrichment analysis signatures - Network visualization on regulatory units - Follow-up on selected master regulators

Enriched Master regulons regulators

Figure 3 | Network inference and MRA flowchart. Four independent gene expression data sets (breast cohort I and II, normal breast and T-ALL) were analysed in parallel with the ARACNe algorithm to derive TF-centric regulatory networks. Master regulators were identified by examining the overlap between the targets in each regulon in the network and the FGFR2 gene expression signatures using the MRA (a: gene expression data; b: filtered gene expression data; c: TN; d: DPI-filtered transcriptional network; MI, mutual information; DPI, data processing inequality).

n=50 n=50 n=50 n=50 n=50 n=50

Exp1 (E2+FGF10)

JC =0.43 JC =0.61 JC =0 R =0.77 R =0.92 R =0.06 600 n =50 n =50 n =50 n =50 400 Exp2 (E2+AP20187) 200 Regulons JC =0.64 JC =0.01 (enrichment rank) 0 R =0.93 R =–0.15

n=50 n =50

Exp3 (Tet+E2+FGF10)

JC =0.01 R = –0.04

Random

0 200 400 600 Regulons (enrichment rank)

Figure 4 | MRA agreement among different FGFR perturbation experiments. The scatter plots show the agreement in the ranking of all regulons by the enrichment P-value, between the different experimental perturbations of FGFR2 signalling: Exp1 ¼ E2 þ FGF10, Exp2 ¼ E2 þ AP20187 and Exp3 ¼ Te t þ E2 þ FGF10. Each dot represents one regulon (that is, one TF and all its targets) in the TN derived from cohort I. The correlation coefficient R is given for each pairwise ranking. The corresponding Venn diagrams show the level of agreement on the ranking for the top 50 enriched regulons, expressed by the JC. both cohort I and II of the METABRIC data. The most strongly response in our breast cancer gene expression network, high- enriched regulon in both cohorts is that of PTTG1, while none of lighting FGFR2-responsive genes (purple shading). Interestingly, the other MR regulons are enriched (Supplementary Table S1). transcription of SPDEF was not perturbed by FGFR2, but many of Figure 5c depicts the regulons of the five MRs of the FGFR2 its target genes were (Fig. 5c, Supplementary Fig. S9), suggesting

NATURE COMMUNICATIONS | 4:2464 | DOI: 10.1038/ncomms3464 | www.nature.com/naturecommunications 5 & 2013 Macmillan Publishers Limited. All rights reserved. ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms3464

Cohort I Cohort II Normals T - All Cohort I Cohort II Consensus ProbeID Symbol Exp1 Exp2 Exp3 Exp1 Exp2 Exp3 Exp1 Exp2 Exp3 Exp1 Exp2 Exp3 Exp1 6 95 (50%) ILMN_2161330 SPDEF 25803 Exp2 20 20 20 (100%) ILMN_2406656 GATA3 2625 Exp3 18 18 18 (100%) ILMN_1678535 ESR1 2099 Rand 0 0 – ILMN_1766650 FOXA1 3169 Overall consensus 5 (25%) ILMN_1753196 PTTG1 9232 Exp1 Endogenous FGRs Enriched regulons with DPI tolerance = 0.0 Exp2 iF2 construct Enriched regulons with DPI tolerance in [0.01, 0.05] Exp3 FGFR2b over-expression Rand Random signatures 0 1 2 3 Hits

TFs Targets PTTG1

GATA3

SPDEF

ESR1

FOXA1

Figure 5 | MRs of FGFR2 signalling. (a) Number of regulons enriched for FGFR2 signatures (Exp1–3) in breast cancer cohort I and cohort II and their overlap. The percentage of the overlap is given relative to the total number in cohorts I and II. There is substantial overlap between the MRs derived for different FGFR2 signatures, and the consensus corresponds to the five MRs shown in b. (b) The MRs enriched for FGFR2 signatures in breast cancer in both cohorts. Enrichment was calculated using the three expression signatures (Exp1–3) on networks derived from breast cancer cohort I and II, normal breast or T-ALL, that were filtered by DPI using either a 0 (black circle) or 0.05 threshold (grey circles). (c) Breast cancer filtered TN enriched for the FGFR2-responsive genes. The network shows the five MRs, each one comprising one TF (square nodes) and all inferred targets (round nodes) applying a DPI threshold of 0.01. that the activity of this TF is regulated at the protein, rather than Transfac showed a strong enrichment of binding motifs for ESR1, the transcriptional level. The identification of SPDEF highlights FOXA1 and GATA3 in each of their identified regulons (Fig. 6a). the increased power of the MRA over simple differential Next, we examined actual TF binding and carried out triplicate expression-based approaches. chromatin immunoprecipitation (ChIP)-seq experiments for ERa and SPDEF in MCF-7 cells (Supplementary Fig. S10, Supplementary Tables S2, S3 for full QC). Figure 6b shows that Experimental confirmation of computationally defined reg- SPDEF binding is strongly enriched near the promoters of its own ulons. To validate the identified regulons, we examined whether regulon. Interestingly, these promoters are also enriched for there was enrichment of TF binding near transcription start sites GATA3, ESR1 and FOXA1 binding. De novo motif finding within (TSS) of genes found in the regulons of a particular MR. This these data sets reveals PWMs that are very similar to those pre- validation was carried out for MRs whose regulons were enriched viously reported26–29. Enrichment of binding by these four TFs in in the cancer and normal breast tissue, but not in T-ALL: ESR1, each other’s regulons is widespread (Supplementary Fig. S11), FOXA1, GATA3 and SPDEF. Bioinformatic analysis using suggesting that these MRs function co-operatively to regulate sets position weight matrices (PWM, consensus binding motifs) from of genes.

6 NATURE COMMUNICATIONS | 4:2464 | DOI: 10.1038/ncomms3464 | www.nature.com/naturecommunications & 2013 Macmillan Publishers Limited. All rights reserved. NATURE COMMUNICATIONS | DOI: 10.1038/ncomms3464 ARTICLE

ESR1 regulon SPDEF regulon 0.020 0.030 Observed regulon and sites Transfac PWM Random regulons (P < 1e-03) Random PWM (P = 7.3e-09) Random sites (P < 1e-03) 0.015 2 0.020 Background 0.010 2 1 Bits 1 0.010 Bits TSS density 0.005 0 ESR1 motif TSS density 0 ESR1 motif 0.000 ref M01801 0.000 ChlP-seq –400 –200 0 200 400 0.030 Observed regulon and sites Nearest binding site (kb) Random regulons (P < 1e-03) Random sites (P < 1e-03) 0.020 Background 2 FOXA1 regulon 0.020 Transfac PWM 1 0.010 Bits

Random PWM (P = 2.3e-33) TSS density 0.015 0 2 FOXA1 motif 0.000 ChlP-seq 0.010 1 Bits 0.015 Observed regulon and sites TSS density 0.005 0 Random regulons (P = 2.0e-03) FOXA1 motif Random sites (P = 5.0e-03) ref M00724 0.010 0.000 Background 2 –400 –200 0 200 400 1 Nearest binding site (kb) 0.005 Bits TSS density 0 GATA3 motif 0.000 GATA3 regulon ChlP-seq 0.020 Transfac PWM 0.030 Observed Random regulons (P < 1e-03) Random PWM (P = 2.2e-08) regulon Random sites (P < 1e-03) 0.015 and Background 2 0.020 sites 0.010 1 2 Bits 0.010 TSS density TSS density 1

0.005 0 Bits GATA3 motif ref M00077 0 0.000 0.000 SPDEF motif –400 –200 0200 400 –400 –200 0 200 400 ChlP-seq Nearest binding site (kb) Nearest binding site (kb)

100 400 100 200

75 300 75 150

50 200 50 100

25 100 25 50

0 control region and input 0 0 0 Fold change over CCND1

E2 E2 E2 E2 PD

Vehicle FGF10 Vehicle FGF10 Vehicle FGF10 Vehicle FGF10 E2+FGF10 E2+FGF10 E2+FGF10 E2+FGF10 E2+FGF10+PD E2+FGF10+PD E2+FGF10+PD E2+FGF10+ Figure 6 | Validation of regulons. (a) Enrichment of known binding motifs for ESR1, FOXA1 and GATA3 in each of their inferred regulons. The occurrence of motif sites is shown as the distance between the TSS of the genes in each regulon and the nearest motif encountered (red line). This was compared with the occurrence of random sites of the same length in the same regulons derived for a random motif (black line; mean±s.d.). Motifs are taken from Transfac. (b) Enrichment of binding sites of the ESR1, FOXA1, GATA3 and SPDEF regulons in SPDEF ChIP-seq data obtained in MCF-7 cells. A background distribution is shown as a reference line (grey line) and represents the distance between the TSS and a random peak placed in the same . (c–f)ERa occupancy changes after FGF10 signalling (mean±s.e.m.) in three technical repeats. ERa binding was analysed by ChIP–RT–PCR at the regulatory regions of four genes: (c) ,(d) GREB,(e) EGR2 and (f) the breast cancer susceptibility gene TOX3. Enrichment is shown relative to a negative control from the CCND1 locus. Cell stimulation with E2, FGF10 and PD (PD173074) is indicated in each panel.

The target genes for ERa, GATA3 and FOXA1 in MCF-7 cells DEG list after PTTG1 knock down was enriched for genes of are already well defined26–28 and our data fit well with these the PTTG1 regulon as well as additional proliferation-related results (see above). As validation of the SPDEF and PTTG1 regulons (Supplementary information, MR overlap and synergy regulons, we carried out small interfering RNA (siRNA) knock analysis). There is remarkable overlap between the MRs down experiments for these TFs (previously published ESR1 perturbed after siPTTG1 treatment and MRs enriched with data are included as positive control) to confirm that the the proliferation-based meta-PCNA signature25 in both cohorts responsive gene sets are indeed enriched in the relevant regulons. (Supplementary Table S1). Our knock-down experiments provide For each of these three putative MRs we find that its own regulon further experimental support for the computationally derived was significantly enriched (Supplementary Table S4). Further network structure. evidence for cross-regulation between the MRs and all five Finally, to demonstrate the link between the identified MRs MR regulons was obtained by gene set enrichment analysis and FGFR2 signalling, we examined ERa occupancy at FGFR2- (Supplementary Fig. S12 and Supplementary Methods). The responsive genes using ChIP. For the binding sites tested here,

NATURE COMMUNICATIONS | 4:2464 | DOI: 10.1038/ncomms3464 | www.nature.com/naturecommunications 7 & 2013 Macmillan Publishers Limited. All rights reserved. ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms3464

ERa occupancy is induced by estradiol, but importantly this the regulons were derived for ER þ , but not ER tumours occupancy is increased by additional treatment with FGF10 and (Fig. 8d,e), fully in keeping with the central role for ER that we reversed by the FGFR kinase inhibitor PD173074 (Fig. 6c–f). postulate. These results were obtained for the known ER-induced genes In conclusion, we demonstrate that at least three of the MYC and GREB, the ER-repressed gene EGR2 and for TOX3,a identified MRs, ESR1, GATA3 and FOXA1 are linked to risk gene likely breast cancer risk gene, which does not respond to estradiol, expression. This association with risk is restricted to the FGFR2- but is induced by FGF10. Our results show that FGF10 signalling responsive genes in each regulon, suggesting that FGFR2 activity can alter ERa occupancy at FGFR2-responsive genes. is an important determinant of ER þ breast cancer risk.

Transcriptional modules of MRs are highly overlapping.Tobe Discussion able to identify MRs from our gene expression network, we used Although the number of known breast cancer risk loci is the DPI filter in our pipeline to remove overlap increasing rapidly, we still have little knowledge of the cellular between regulons. However, in a real cellular setting co-operating pathways that are perturbed in cancer predisposition. Here we TFs regulate overlapping sets of genes, with the expression of take a systems biology approach to gain insight into pathways and individual genes being affected by the activity of multiple TFs. We networks associated with breast cancer susceptibility. We focus therefore examined the overlap of all TF regulons (based on the on FGFR2, the most significant risk locus in multiple independent unfiltered TN in cohort I) by unsupervised clustering. Figure 7a GWAS1–3. Using eQTL analysis, we show that breast cancer shows the Jaccard coefficient (JC) for overlap between regulons. risk SNPs are preferentially linked to genes affected by FGFR2 Only a few, very highly connected TF clusters exist and the signalling, supporting the idea that breast cancer risk SNPs cluster FGFR2 response is mediated by the largest cluster (centred on in pathways, as has been shown for metabolic diseases34. Our regulon 250 in Fig. 7a). An enlargement of these data for the five findings support other evidence that risk SNPs in the intron of MRs (Fig. 7b) highlights the strong overlap between the ERa FGFR2 mediate their effect through altering expression of the network of TFs (ESR1, GATA3 and FOXA1) and SPDEF and a FGFR2 gene. We identify MRs of the FGFR2 signalling response smaller overlap with the PTTG1 regulon. The regulons most and find a central role for the ERa TN, including ESR1, FOXA1 highly enriched for the FGFR2 signatures strongly overlap with and GATA3. In addition to these known factors, we identify those enriched for the estradiol signatures (Fig. 7a,b), but both E2 SPDEF as a novel co-regulator of the ERa network. Our analysis and FGFR2 unique genes exist, in keeping with our finding that separates this network from MRs related to proliferation changes, the two responses peak at different times (Supplementary especially PTTG1, which is also known as securin and specifically Figs S1a–S3a) and differ in their association with risk genes. interacts with (ref. 35). We used RedeR30 to visualize the overlap between different TF Several lines of evidence support the identification of these five regulons in the unfiltered TN in a network graph. The edges MRs and regulons. The genes whose expression is altered by within this network represent inter-regulon overlaps with a JC FGFR2 signalling were consistent across the three methods used 40.4 (Fig. 7c). The MRs ESR1, FOXA1 and GATA3 cluster very to activate FGFR2. The gene expression networks identified from closely and the newly identified MR SPDEF is also tightly con- the METABRIC samples, and the MRs identified by subsequent nected to this central cluster. (See Supplementary Fig. S13 for a analysis, were again consistent across the methods of FGFR2 fully annotated network.) The PTTG1 regulon, which is asso- activation and between the discovery and validation cohorts. Four ciated with both breast cancer and T-ALL, maps to a different of the five MRs consistently identified from the breast cancer part of the network (Fig. 7c). It is closely linked to and signalling networks were absent from a similar analysis using FOXM1, which are part of the extended MR list and have pre- networks obtained from another malignancy, T-ALL, as a control. viously been linked to control of proliferation31,32. The close link The same four MRs were also identified when a parallel analysis between these regulons was confirmed in our siRNA analysis, was carried out on the TCGA breast cancer data set. Finally, the where the E2F2 and FOXM1 regulons were significantly unfiltered TN for breast cancer groups together many TFs perturbed by siRNA against PTTG1 (Supplementary Table S4). that have previously been linked to ERa activity, such as GATA3 The relationship between clusters of TFs was explored using the (ref. 28), FOXA1 (ref. 27) and XBP36. The striking overlap of the previously described synergy and shadowing analyses33 network defined by our analysis with previously described (Supplementary Fig. S14 and Supplementary Methods). Our regulatory circuits37 supports the validity of our approach. results are consistent with identification of a central ER-related Further support for the importance of these MRs comes from MR cluster and the presence of a second cluster that is likely to be comparison with somatic alterations in breast cancer20. Two of related to changes in cell proliferation. the five MRs of the FGFR2 response, FOXA1 and GATA3, are frequently mutated in breast tumours20, suggesting that pathways Risk SNPs link to FGFR2-responsive genes in MR regulons. mediating susceptibility overlap with those perturbed during Having defined the regulons, we investigated how risk SNPs are cancer progression. distributed among them and carried out a cis-eQTL analysis The identification of the ERa-network, which has long been between the risk SNP list (AVS) and the genes in the regulons for known to be critical to mammary gland development and cancer each of the MRs. Our analysis showed a statistically significant progression, as central mediator of the FGFR2 response is enrichment for genes in the ESR1, GATA3 and FOXA1 regulons consistent with the epidemiological findings that the FGFR2 risk with breast cancer risk SNPs but not prostate, colon or BMD risk is restricted to ER þ disease1,4. Furthermore, the identification SNPs (Fig. 8a). This is the first report of a statistically significant of FOXA1 and GATA3 as TFs associated with the expression of link of GATA3 with risk gene expression. Within each regulon, breast cancer risk genes is consistent with functional analysis we found that only the FGFR2-responsive genes were associated of risk loci available to date. FOXA1 has been confirmed as a with risk (Fig. 8b,c). However, we note that all FGFR2 responses mediator of risk at the TOX3 gene18 and GATA3 binds to one of were measured on a background of oestrogen signalling, so that the causative SNPs at the CCND1 risk locus38. ER may still play a critical role, but in conjunction with FGFR2. SPDEF is of interest as a MR of the FGFR2 response and as a As an additional control, we examined whether risk is associated novel cofactor of the ERa-network, as validated by both our with ER status and only observed significant enrichment when network and ChIP-seq analysis. SPDEF is normally expressed in a

8 NATURE COMMUNICATIONS | 4:2464 | DOI: 10.1038/ncomms3464 | www.nature.com/naturecommunications & 2013 Macmillan Publishers Limited. All rights reserved. NATURE COMMUNICATIONS | DOI: 10.1038/ncomms3464 ARTICLE

Random Random Exp3 Exp3 Exp2 Exp2 Exp1 Exp1 686 FOXA1

GATA3

ESR1 458

SPDEF

PTTG1

229 ESR1 PTTG1 GATA3 FOXA1 Random E2 (Ctr3) E2 (Ctr2) E2 (Ctr1) SPDEF Enriched regulons (overlap) 0.2 0.4 0.6 0.8 0.0 1.0 0e+0

Jaccard coefficient 1e–10 1e–06 1e–02 1e–01 3e–01 5e–01 1 (scale at A = B/2 ) Adjusted P-value 1 229 458 686 All regulons (overlap) Random E2 (Ctr3) E2 (Ctr2) E2 (Ctr1)

JUN

RORA FOSB 20 KLF9 SMARCA4 211 JUNB 581 KLF12 FOS ZNF259 EGR1

ZNF473 1,181 HOXD12

Regulon size BACH2 ZNF384 MEF2D KNTC1 ZNF263 TRIM22 2,189 DEK PTTG1 HMGB2 PRDM1 HBP1 E2F2 MAFB GLI3 HCLS1 ATF3 SREBF1 TBX19 HMGA1 STAT4 FOXM1 CBFB TFAP2A ZNF500 MITF 0.40 NFIL3 GATAD2A TWIST1 IRF8 YBX1 0.43 KLF6 VAV1 MZF1 SNAI2 NFE2L2 SPI1 SNAPC2 ZNF434 EMX2 0.46 MYC ZNF580 ENO1 RUNX2 TCEAL1 FOXF2 ZNF142 MTF1 MNL2 CIITA RELB ZNF688 ENO1 IRF1 SOX11 0.51 SNAPC4 BRPF1 RARA THRA NR2E3 PRRX1 SOX12 NFE2L3 ZNF692 SOLH CBFA2T2 SP140 VEZF1 AR ZFHX4 0.62 BRF1 ZNF446 AEBP1 Jaccard coefficient CREG1 STAT1 RUNX3 MYB PRRX2 ZNF423 GATA3 ZMYM3 POU2F1 LMO4 BRD8 FOXA1 ZNF264 PPARD SALL2 RUNX1T1 ZNF587 MEIS2 ZNF589 AFF3 ETV7 ZNF552 GLI2 ZNF493 IRF3 ESR1 ELF4 XBP1 LZTFL1 EPAS1 ZNF324 TCF4 ZNF430 RB1 GAS7 0 1 2 3 CSDA KLF11 ZNF652 RFX1 CEBPB ETS2 SOX17 FGFR2 enrichment RXRA ZNF219 NFIB SPDEF (significant MRAs) DAXX ZNF573 KLF10 PLAGL1 IKZF4 STAT2 CREB3L2 KLF2 TRIM29 SMAD5 CTNNB1 BTAF1 TSC22D3 FUBP3 HOXA4 ZNF394 ELF5 NFKB2 RERE PGR TCF7L1

SOX10 ZNF302 YWHAZ

EN1

PRDM2

Figure 7 | Overlap between regulons in the unfiltered TN. (a) The heatmap shows the hierarchical clustering on the Jaccard similarity coefficient (in shades of blue) computed among all regulons in the unfiltered TN derived from cohort I. Sidebars show the enrichment P-values (shades of orange) from the MRA analysis for FGFR2-associated gene expression signatures (Exp1–3) at the top of the graph and the MRA analysis for E2-associated gene expression signatures (E2 Ctrl 1–3) derived for each experiment on the left. (b) Hierarchical clustering on the Jaccard similarity coefficient focused on the overlap between the five MRs of the FGFR2 response. (c) The unfiltered TN in breast cancer, with edges depicting the overlap of regulons. The network is calculated on cohort I and different intensities of orange indicate enrichment of a regulon in 1, 2 or 3 of the FGFR2 gene signatures. Arrows highlight the ESR1 and PTTG1 regulons. Unconnected regulons and connections corresponding to JC o0.4 are not shown (a full network is shown in Supplementary Fig. S13).

NATURE COMMUNICATIONS | 4:2464 | DOI: 10.1038/ncomms3464 | www.nature.com/naturecommunications 9 & 2013 Macmillan Publishers Limited. All rights reserved. ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms3464

Breast cancer Prostate cancer Coloreatal cancer BMD Enrichment Enrichment Enrichment Enrichment score (VSE) score (VSE) score (VSE) score (VSE) P-value Bonferroni P-value Bonferroni P-value Bonferroni P-value Bonferroni Regulon Targets (–log10) –2 0 2 4 6 8 (–log10) –2 0 2468 (–log10) –2 0 2 4 6 8 (–log10) –2 0 2 4 6 8 PTTG1 AII targets 1.79 3.57 0.30 0.59 SPDEF AII targets 1.93 1.38 0.05 0.61 ESR1 AII targets 4.49 2.88 0.30 1.76 GATA3 AII targets 5.47 4.27 0.30 0.94 FOXA1 AII targets 8.47 3.62 0.30 0.58

P-value P-value P-value P-value (–log ) –2 0 2 4 68 –2 0 2468 –2 0 2 4 6 8 –2 0 2468 Regulon Targets 10 (–log10) (–log10) (–log10) PTTG1 FGFR2 hits 1.10 1.91 1.37 0.03 SPDEF FGFR2 hits 1.89 0.30 0.02 0.30 ESR1 FGFR2 hits 4.51 0.92 0.65 0.30 GATA3 FGFR2 hits 5.20 2.69 0.64 0.30 FOXA1 FGFR2 hits 5.81 1.60 0.64 0.30

P-value P-value P-value P-value (–log ) –2 0 2468 (–log ) –2 0 2468 –2 0 2468 (–log ) –2 0 2468 Regulon Targets 10 10 (–log10) 10 FOXA1 non-FGFR2 hits 0.56 3.22 0.04 1.16 SPDEF non-FGFR2 hits 0.61 0.83 0.03 1.10 ESR1 non-FGFR2 hits 1.42 2.49 0.05 1.83 GATA3 non-FGFR2 hits 1.43 2.38 0.06 1.06 PTTG1 non-FGFR2 hits 2.37 3.71 0.03 1.10

Breast cancer Breast cancer Enrichment Enrichment score (VSE) score (VSE) Bonferroni Bonferroni P-value P-value (–log ) ER+ regulon Targets 10 –2 0 2 4 68 ER– regulon Targets (–log10) –2 0 2 4 6 8 PTTG1 FGFR2 hits 1.97 PTTG1 FGFR2 hits 0.30 SPDEF FGFR2 hits 2.71 SPDEF FGFR2 hits 0.61 FOXA1 FGFR2 hits 4.62 FOXA1 FGFR2 hits 0.97 GATA3 FGFR2 hits 6.25 GATA3 FGFR2 hits 1.09 ESR1 FGFR2 hits 8.10 ESR1 FGFR2 hits 2.01

Figure 8 | Enrichment of the breast cancer AVS in FGFR2 master regulators following cis-eQTL validation. (a) VSE plots and cis-eQTL for all genes in the regulons. (b) VSE plots and cis-eQTL for genes that respond to FGFR2 perturbation. (c) VSE plots and cis-eQTL for genes that do not respond to FGFR2 perturbation. (d–e) VSE plots and cis-eQTL for genes that respond to FGFR2 perturbation using regulons either derived from ER þ (d)orER (e) samples. Box plots show the normalized null distributions (box: 1st–3rd quartiles; bars: extremes). Black diamonds show the corresponding VSE scores. Red diamonds highlight mapping tallies that satisfy a Bonferroni-corrected threshold for significance (Po1e–4). P-values are based on null distributions from 1,000 MRVSs. range of epithelial cell types, especially in hormone-regulated state in the population could in principle reduce breast cancer tissues39, and has previously been associated with cancer: The incidence by that amount. The network approach revealed ERa as SPDEF protein is overexpressed in breast cancer cells compared a mediator of the FGFR2 effect, a finding fully consistent with the with normal tissue40,41, but is often lost in high grade, invasive successful use of anti-estrogens in prevention. However, our tumours42. SPDEF was originally identified as a co-factor of findings strongly suggest that FGFR2 has a role in risk beyond AR43 and acts to suppress metastasis in prostate tumour models that of ERa. If so, the FGFR2 pathway may be an additional target in vivo44. Similarly, SPDEF overexpression in breast cancer cell for both therapy and prevention46. lines also results in an inhibition of invasion, migration and 42 growth . Methods Using a network approach has allowed us to distinguish two Cell culture. MCF-7 human breast cancer cells were cultured in DMEM supple- key components of the FGFR2 response: the ERa-related MRs mented with 10% HI-FCS and antibiotics. In addition, iF2-expressing cells were 1 (ESR1, FOXA1, GATA3 and SPDEF) and a proliferation-related supplemented with 500 mg ml G418 (Invitrogen) and FGFR2b overexpressing cells with 300 mgml 1 Zeocin and 2 mgml 1 blasticidin (Invitrogen) and tetra- cluster around PTTG1. These results improve on analyses based cycline-free FCS was used (Bioscience Autogen). FGFR2b overexpression was on differential gene expression, which primarily reflect increased induced by tetracycline (final concentration of 1 mg ml 1). Cell synchronisation via proliferation25. By identifying specific MRs we also improve on oestrogen deprivation was carried out for at least 3 days in phenol red-free DMEM GO-term enrichment-based methods that identify very broad (Invitrogen) supplemented with 5% charcoal dextran-treated HI-FBS (Hyclone) and biological categories45. Our approach of testing MR regulons for 1% penicillin–streptomycin. All cells were grown at 37 °Cin5%CO2. association with risk SNP clusters should be widely applicable to the follow-up of other GWAS. Establishment of model FGFR2 signalling systems. MCF-7 cells were trans- fected with iF2-pCR3.1 using Lipofectamine 2000 (Invitrogen). Single-cell clones Although consistently the ‘top hit’ in GWAS, FGFR2 is merely resistant to 1,000 mgml 1 G418 were expanded and iF2 expression confirmed a contributor to a much larger picture of polygenic susceptibility. by western blot and immunofluorescent staining. The FGFR2b tetracycline- A large amount of this susceptibility remains unexplained, the inducible overexpression MCF-7 line was established by double-transfection of so-called ‘missing heritability’. Our finding that SNPs cluster in FspI-linearised F2b-pcDNA4/TO and pcDNA6/TR in a 1:5 ratio by DNA. Single- cell clones were expanded under selection using 500 mgml 1 Zeocin and pathways suggests that FGFR2-regulated gene loci near SNPs that 3 mgml 1 blasticidin. Tetracycline induction of FGFR2b expression was confirmed may not have reached genome-wide significance may also be by western blot. functional and may account for some of this missing heritability. Our results are also relevant to prevention. The risk variant in Stimulation of FGFR2 signalling. Oestrogen-deprived cells were stimulated with FGFR2 has an estimated population attributable fraction of 19%, 1 nM estradiol (Sigma); 100 ng ml 1 FGF10 (Invitrogen); 100 ng ml 1 PD173074 which implies that restoring FGFR2 signalling to the wild-type (Sigma-Aldrich). iF2 was activated with 100 nM AP20187 (Takara Biosciences).

10 NATURE COMMUNICATIONS | 4:2464 | DOI: 10.1038/ncomms3464 | www.nature.com/naturecommunications & 2013 Macmillan Publishers Limited. All rights reserved. NATURE COMMUNICATIONS | DOI: 10.1038/ncomms3464 ARTICLE

RNA collection and microarray processing. RNA was extracted using the locus does not imply functional association with gene expression. Therefore, the miRNeasy spin column kit (Qiagen) and quality checked using an RNA 6000 Nano VSE analysis was conditioned to a cis-eQTL validation. We assessed cis eQTLs chip on a 2100 Bioanalyser (Agilent). RNA (250 ng; RIN47) was used for cRNA using METABRIC data19 by applying a multivariate linear model, placing on the amplification and labelling using the Illumina TotalPrep-96 kit (Ambion 4397949). right hand side of the model the genotypes as predictors (representing a given cRNA was hybridized to HumanHT-12 v4 Expression BeadChips according to the cluster of risk-associated and linked SNPs that have been genotyped in Illumina protocol (Illumina WGGX DirectHyb Assay Guide 11286331 RevA). Raw METABRIC, with the assumption of additive effect), and in the left hand side the image files were processed and analysed using the beadarray package47 from gene expression as response variable (representing a given gene set or regulon, Bioconductor. The full data sets are available at the R package Fletcher2013a. assuming a Gaussian distribution). Cis-eQTL analysis was carried out only for genes 200 kb up- or downstream from a particular cluster. Provided that HapMap linkage disequilibrium data have been mapped for markers up to 200 kb apart, the Quantitative RT–PCR and data analysis . After reverse transcription, quantitative effective cis-eQTL analysis extended up to 400 kb radius around the AVS. The PCR was performed using Power SYBR Green FAST on a 9800HT qPCR machine overall exact P-value for the AVS, conditioned to the cis-eQTL validation, was (all Applied Biosystems). Raw data were collected using SDS 2.3 and then further obtained from the null distribution derived from 1,000 MRVS as described by analysed in Microsoft Excel. Ct values were normalized using the DDCt method to Cowper-Sal lari et al.18 Only enrichment scores satisfying the Bonferroni-corrected (i) levels of DGUOK and UBC housekeeping gene expression per sample and (ii) to threshold (Po0.001) are reported as significant. The extended step of the VSE vehicle treatment at each timepoint. All conditions were examined in three inde- analysis was executed in R51 (http://www.R-project.org/) using the stats package, pendent replicates. Relevant primers are listed in Supplementary Table S6. including lm and manova functions. In Fig. 2, the Exp1 gene list is balanced to generate comparable sized lists ( also IL8 ELISA. After stimulation culture media was removed daily, stored and assayed see Supplementary information). by ELISA for IL8 (Enzo Life Sciences). Absorbance was read on a PHERAstar microplate reader (BMG Labtech) and raw OD converted into protein con- centration. IL8 levels per 500K cells were normalized to levels in vehicle-treated Network inference. METABRIC breast cancer gene expression data set19 included cells and corrected for media volume changes. All experimental conditions were a test (n ¼ 997), a validation (n ¼ 995) and a normal breast expression data set tested in triplicate. (n ¼ 144); the T-ALL control (n ¼ 57) was downloaded from GEO (accession number GSE33469)24 and the TCGA breast cancer gene expression data set 20 Chromatin immunoprecipitation. For ERa ChIP-seq, cells were oestrogen starved (n ¼ 155) was downloaded from the dedicated website (https://tcga-data.nci. and E2-stimulated for 45 min. For SPDEF ChIP-seq, cells growing asynchronously nih.gov/docs/publications/brca_2012). Each data set was analysed separately and in full DMEM were collected. ChIP-seq was performed as previously described48. the results from each network compared (see Fig. 3; for the source code of the Briefly, cells were cross-linked in 1% formaldehyde for 10 min. Nuclear extracts network analysis see the R package Fletcher2013b). were prepared and sonicated using a Bioruptor (Diagenode) for 15 min on the Probes were filtered based on their coefficient of variation and MI was calculated in the R package minet52. To derive the regulatory network we re- ‘high’ setting with cycles of 30 s on and 30 s off. Sonicated lysate was mixed with 13 Protein A Dynabeads (Invitrogen) pre-incubated with antibodies against ERa implemented ARACNe/MRA in R, the source code is available in the R package (sc543; 10 mg of antibody in 50 ml volume, diluted 1:25 in sonicated nuclear extract) RTN. The Vignette for Fletcher2013b gives additional information on the MI computation, application of the DPI15 and MRA. Follow-up analysis included and SPDEF (sc67022-X; 10 mg of antibody in 5 ml diluted 1:250 in sonicated nuclear 53 extract) (both Santa Cruz Biotechnology). Immunoprecipitated chromatin was clustering analysis, generation of enrichment maps, gene set enrichment analysis and synergy and shadow analysis33. (Also see R package Fletcher2013b and the used to prepare Solexa sequencing libraries. The full ChIP-seq data set is available 30 within the R package Fletcher2013b and quality control metrics are given in associated Vignette.) For the network visualization we used the R package RedeR . Supplementary Tables S2 and S3. Primers used in ChIP–RT–PCR are listed in Supplementary Table S6. The experiment was carried out in duplicate with similar results. A representative example is shown with error bars denoting the technical Network validation. ChIP-Seq reads were aligned to genome build hg18 and error in three RT–PCR repeats. filtered by removing reads either with quality less than five or overlapping ‘signal artefact’ regions54. Peaks were called using model-based analysis for ChIP-Seq55, run using default parameters. Binding events that occurred in at least two out of siRNA knockdown of TFs siRNA SMARTpools (Dharmacon) targeting PTTG1 . three biological replicates were considered. (L-004309) and SPDEF (L-020199) and a control non-targeting pool (D-001810) ChIP-Seq peaks were ranked by P-value and 75 bp sequences centred on the were transfected into MCF-7 cells using Lipofectamine RNAiMAX (Invitrogen). summits of 150 peaks were selected for de novo motif analysis. DNA sequences RNA was collected after 72 h and processed for microarray analysis. The data were retrieved using the Genome Browser56 (assembly NCBI36/hg18) and motif (available within the R package Fletcher2013a) was compared with published data49 searching was performed using the command line version of MEME 4.8.1 (ref. 57). for ESR1. Motifs between 9 and 15 bases were searched from both strands assuming one binding site per sequence model (mod ¼ oops). Similar motifs were also obtained Analysis of gene expression data. The limma50 package in Bioconductor was from peak summits ranked 1–1,000 and zero or one binding site per sequence used to call DEGs and principal component analysis demonstrated low model (mod ¼ zoops). experimental variation and the specificity of the FGFR response. The source code Further detail on the statistical analysis and the source code that reproduces the for the data analysis is available in the R package Fletcher2013a (also see the statistical analysis on the ChIP-Seq data is available in the R package Fletcher2013b. Vignette for Fletcher2013a). This also includes the analysis of siRNA data and the analysis of the meta-PCNA signature. Variant set enrichment. The VSE analysis was carried out as described by Cowper-Sal lari et al.18 Briefly, the VSE method tests enrichment of the AVS for a particular trait in a genomic annotation. Although the first represents clusters of References risk-associated and linked SNPs, the second corresponds to chromosomal 1. Michailidou, K. et al. Large-scale genotyping identifies 41 novel breast cancer coordinates to which a particular property or function has been attributed. The susceptibility loci. Nat. Genet. 45, 392–398 (2013). enrichment statistics assesses the overlap between these clusters and the genomic 2. Easton, D. F. et al. Genome-wide association study identifies novel breast annotation, a quantity referred to as the mapping tally. This corresponds to the cancer susceptibility loci. Nature 447, 1087–1093 (2007). number of SNP clusters in the AVS that contain at least one linked SNP that 3. Hunter, D. J. et al. A genome-wide association study identifies alleles in FGFR2 overlaps the genomic annotation. The enrichment score is then obtained by associated with risk of sporadic postmenopausal breast cancer. Nat. Genet. 39, comparing the observed mapping tally to a null distribution based on random 870–874 (2007). permutations of the AVS (that is, matched random variant sets—MRVS). All risk- 4. Udler, M. S. et al. FGFR2 variants and breast cancer risk: fine-scale mapping associated SNPs were obtained from the GWAS catalogue (accessed January 2013; using African American studies and analysis of chromatin conformation. Hum. http:/www.genome.gov/gwastudies/). The list of all SNPs in strong LD with each Mol. Genet. 18, 1692–1703 (2009). risk-associated SNP was obtained from the HapMap project data (CEPH HapMap 5. Garcia-Closas, M. et al. Heterogeneity of breast cancer associations with five Linkage Disequilibrium, release no. 27, NCBI B36), using LD threshold based on susceptibility loci by clinical and pathological characteristics. PLoS. Genet. 4, 0 LOD 43 and D 40.99. e1000054 (2008). 6. Meyer, K. B. et al. Allele-specific up-regulation of FGFR2 increases Extended variant set enrichment. The VSE method provides a robust framework susceptibility to breast cancer. PLoS Biol. 6, e108 (2008). to cope with the heterogeneous structure of haplotype blocks, and has been 7. Riaz, M. et al. Correlation of breast cancer susceptibility loci with patient designed to test enrichment in cistromes and epigenomes. In order to extend the characteristics, metastasis-free survival, and mRNA expression of the nearest variant set enrichment to gene loci here we applied an additional step using genes. Breast Cancer Res. Treat. 133, 843–851 (2012). expression quantitative trait loci (eQTLs). The rationale of this extended approach 8. Li, Q. et al. Integrative eQTL-based analyses reveal the biology of breast cancer is that the simple overlap between a given cluster of SNPs and a particular gene risk loci. Cell 152, 633–641 (2013).

NATURE COMMUNICATIONS | 4:2464 | DOI: 10.1038/ncomms3464 | www.nature.com/naturecommunications 11 & 2013 Macmillan Publishers Limited. All rights reserved. ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms3464

9. Lu, P., Ewald, A. J., Martin, G. R. & Werb, Z. Genetic mosaic analysis reveals 42. Feldman, R. J., Sementchenko, V. I., Gayed, M., Fraig, M. M. & Watson, D. K. FGF receptor 2 function in terminal end buds during mammary gland Pdef expression in human breast cancer is correlated with invasive potential branching morphogenesis. Dev. Biol. 321, 77–87 (2008). and altered gene expression. Cancer Res. 63, 4626–4631 (2003). 10. Kim, S. et al. FGFR2 promotes breast tumorigenicity through maintenance of 43. Oettgen, P. et al. PDEF, a novel prostate epithelium-specific ets transcription breast tumor-initiating cells. PLoS One 8, e51671 (2013). factor, interacts with the and activates prostate-specific 11. Turner, N. & Grose, R. Fibroblast growth factor receptor signalling: from antigen gene expression. J. Biol. Chem. 275, 1216–1225 (2000). development to cancer. Nat. Rev. Cancer 10, 118–129 (2010). 44. Steffan, J., Koul, S., Meacham, R. & Koul, H. The SPDEF 12. Muller, F. J. et al. Regulatory networks define phenotypic classes of human stem suppresses prostate tumour metastasis. J. Biol. Chem. 287, 29968–29978 (2012). cell lines. Nature 455, 401–405 (2008). 45. Wang, K., Li, M. & Hakonarson, H. Analysing biological pathways in genome- 13. Carro, M. S. et al. The transcriptional network for mesenchymal transformation wide association studies. Nat. Rev. Genet. 11, 843–854 (2010). of tumours. Nature 463, 318–325 (2010). 46. Schadt, E. E., Friend, S. H. & Shaywitz, D. A. A network view of disease and 14. Basso, K. et al. Reverse engineering of regulatory networks in human B cells. compound screening. Nat. Rev. Drug. Discov. 8, 286–295 (2009). Nat. Genet. 37, 382–390 (2005). 47. Dunning, M. J., Smith, M. L., Ritchie, M. E. & Tavare, S. Beadarray: R classes 15. Margolin, A. A. et al. Reverse engineering cellular networks. Nat. Protoc. 1, and methods for Illumina bead-based data. Bioinformatics 23, 2183–2184 662–671 (2006). (2007). 16. Zhang, X. et al. Receptor specificity of the fibroblast growth factor family. The 48. Schmidt, D. et al. ChIP-seq: using high-throughput sequencing to discover complete mammalian FGF family. J. Biol. Chem. 281, 15694–15700 (2006). protein-DNA interactions. Methods 48, 240–248 (2009). 17. Xian, W., Schwertfeger, K. L. & Rosen, J. M. Distinct roles of fibroblast growth 49. Park, Y. Y. et al. Reconstruction of network reveals that factor receptor 1 and 2 in regulating cell survival and epithelial-mesenchymal NR2E3 is a novel upstream regulator of ESR1 in breast cancer. EMBO Mol. transition. Mol. Endocrinol. 21, 987–1000 (2007). Med. 4, 52–67 (2012). 18. Cowper-Sal lari, R. et al. Breast cancer risk-associated SNPs modulate the 50. Smyth, G. K. Linear models and empirical bayes methods for assessing affinity of chromatin for FOXA1 and alter gene expression. Nat. Genet. 44, differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 1191–1198 (2012). 3, Article3 (2004). 19. Curtis, C. et al. The genomic and transcriptomic architecture of 2,000 breast 51. R-Core-Team. R: A Language and Environment For Statistical Computing tumours reveals novel subgroups. Nature 486, 346–352 (2012). (R Foundation for Statistical Computing, Vienna, Austria, ISBN 3-900051-07-0, 20. TCGA. Comprehensive molecular portraits of human breast tumours. Nature 2012). 490, 61–70 (2012). 52. Meyer, P. E., Lafitte, F. & Bontempi, G. Minet: A R/Bioconductor package for 21. Allred, D. C., Brown, P. & Medina, D. The origins of alpha- inferring large transcriptional networks using mutual information. BMC positive and -negative human breast cancer. Breast Bioinformat. 9, 461 (2008). Cancer Res. 6, 240–245 (2004). 53. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based 22. Kouros-Mehr, H., Kim, J. W., Bechis, S. K. & Werb, Z. GATA-3 and the regulation approach for interpreting genome-wide expression profiles. Proc. Natl Acad. of the mammary luminal cell fate. Curr. Opin. Cell. Biol. 20, 164–170 (2008). Sci. USA 102, 15545–15550 (2005). 23. Bernardo, G. M. et al. FOXA1 is an essential determinant of ERa expression and 54. Weinstock, G. M. ENCODE: more genomic empowerment. Genome Res. 17, mammary ductal morphogenesis. Development 137, 2045–2054 (2010). 667–668 (2007). 24. Van Vlierberghe, P. et al. ETV6 mutations in early immature human T cell 55. Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, leukemias. J. Exp. Med. 208, 2571–2579 (2011). R137 (2008). 25. Venet, D., Dumont, J. E. & Detours, V. Most random gene expression 56. Karolchik, D. et al. The UCSC Table Browser data retrieval tool. Nucleic Acids signatures are significantly associated with breast cancer outcome. PLoS. Res. 32, D493–D496 (2004). Comput. Biol. 7, e1002240 (2011). 57. Bailey, T. L. et al. MEME SUITE: tools for motif discovery and searching. 26. Carroll, J. S. et al. Genome-wide analysis of estrogen receptor binding sites. Nucleic Acids Res. 37, W202–W208 (2009). Nat. Genet. 38, 1289–1297 (2006). 27. Hurtado, A., Holmes, K. A., Ross-Innes, C. S., Schmidt, D. & Carroll, J. S. Acknowledgements FOXA1 is a key determinant of estrogen receptor function and endocrine This work was funded by Cancer Research UK and by the NIHR Cambridge Biomedical response. Nat. Genet. 43, 27–33 (2010). Research Centre. We thank Paul Pharoah for helpful discussions, and Jason Carroll and 28. Theodorou, V., Stark, R., Menon, S. & Carroll, J. GATA3 acts upstream of Christina Curtis for critically reviewing the manuscript. We also thank Hutchison FOXA1 in mediating ER binding by shaping enhancer accessibility. Genome Whampoa Ltd for their support of the Li Ka Shing Professorship held by B.A.J.P. for a Res. 23, 12–22 (2013). substantial time during the course of this work. 29. Wei, G. H. et al. Genome-wide analysis of ETS-family DNA-binding in vitro and in vivo. EMBO J. 29, 2147–2160 (2010). 30.Castro,M.A.A.,Wang,X.,Fletcher,M.N.C.,Meyer,K.B.&Markowetz,F. Author contributions RedeR: R/Bioconductor package for representing modular structures, nested net- M.N.C.F. carried out the experimental work. M.A.A.C. performed the computational works and multiple levels of hierarchical associations. Genome Biol. 13, R29 (2012). analysis. X.W. and I.S. assisted in the computational analysis, M.O’R. in the experimental 31. Wu, L. et al. The -3 transcription factors are essential for cellular work. B.A.J.P. proposed the idea and obtained funding for this project. M.N.C.F., proliferation. Nature 414, 457–462 (2001). M.A.A.C., B.A.J.P., F.M. and K.B.M. designed the experiments and wrote the manuscript, 32. Laoukili, J. et al. FoxM1 is required for execution of the mitotic programme and F.M. and K.B.M. are co-corresponding authors. C.C. is the lead for the METABRIC chromosome stability. Nat. Cell Biol. 7, 126–136 (2005). study and reviewed the manuscript. S.-F.C. and O.M.R. provided the normalized 33. Lefebvre, C. et al. A human B-cell interactome identifies MYB and FOXM1 as METABRIC data. master regulators of proliferation in germinal centers. Mol. Syst. Biol. 6, 377 (2010). 34. Schadt, E. E. Molecular networks as sensors and drivers of common human Additional information diseases. Nature 461, 218–223 (2009). Accession codes Microarray data have been deposited in GEO under accession codes 35. Bernal, J. A. et al. Human securin interacts with p53 and modulates p53- GSE48924, GSE48925, GSE48927 for FGFR2 stimulation and under accession code mediated transcriptional activity and apoptosis. Nat. Genet. 32, 306–311 (2002). GSE48928 for siRNA experiments. ChIP-seq data have been deposited in GEO under 36. Ding, L. et al. Ligand-independent activation of estrogen receptor alpha by accession code GSE48930. All submissions have been assigned the SuperSeries number XBP-1. Nucleic Acids Res. 31, 5266–5274 (2003). GSE48931. 37. Kong, S. L., Li, G., Loh, S. L., Sung, W. K. & Liu, E. T. Cellular reprogramming by the conjoint action of ERa, FOXA1, and GATA3 to a ligand-inducible Supplementary information accompanies this paper at www.nature.com/ growth state. Mol. Syst. Biol. 7, 526 (2011). naturecommunications. 38. French, J. et al. Fine scale mapping and functional analysis of the breast cancer 11q13 (CCND1) locus. Am. J. Hum. Genet. 92, 1–15 (2013). Competing financial interests: The authors declare no competing financial interests. 39. Steffan, J. J. & Koul, H. K. Prostate derived ETS factor (PDEF): a putative tumor Reprints and permission information is available online at http://npg.nature.com/rep- metastasis suppressor. Cancer Lett. 310, 109–117 (2011). rintsandpermissions/. 40. Turcotte, S., Forget, M. A., Beauseigle, D., Nassif, E. & Lapointe, R. Prostate- derived Ets transcription factor overexpression is associated with nodal How to cite this article: Fletcher, M. N. C. et al. Master regulators of FGFR2 signalling metastasis and positivity in invasive breast cancer. Neoplasia and breast cancer risk. Nat. Commun. 4:2464 doi: 10.1038/ncomms3464 (2013). 9, 788–796 (2007). 41. Sood, A. K. et al. Expression characteristics of prostate-derived Ets factor This work is licensed under a Creative Commons Attribution- support a role in breast and prostate cancer progression. Hum. Pathol. 38, NonCommercial-ShareAlike 3.0 Unported License. To view a copy of 1628–1638 (2007). this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/

12 NATURE COMMUNICATIONS | 4:2464 | DOI: 10.1038/ncomms3464 | www.nature.com/naturecommunications & 2013 Macmillan Publishers Limited. All rights reserved.