<<

ARTICLE

Received 9 Sep 2016 | Accepted 12 Dec 2016 | Published 31 Jan 2017 DOI: 10.1038/ncomms14249 OPEN Pancancer modelling predicts the context-specific impact of somatic mutations on transcriptional programs

Hatice U. Osmanbeyoglu1, Eneda Toska2, Carmen Chan2, Jose´ Baselga2 & Christina S. Leslie1

Pancancer studies have identified many that are frequently somatically altered across multiple tumour types, suggesting that pathway-targeted therapies can be deployed across diverse . However, the same ‘actionable mutation’ impacts distinct context-specific regulatory programs and signalling networks—and interacts with different genetic backgrounds of co-occurring alterations—in different cancers. Here we apply a computational strategy for integrating parallel (phospho)proteomic and mRNA sequencing data across 12 TCGA tumour data sets to interpret the context-specific impact of somatic alterations in terms of functional signatures such as (phospho) and (TF) activities. Our analysis predicts distinct dysregulated transcriptional regulators downstream of somatic alterations in different cancers, and we validate the context-specific differential activity of TFs associated to mutant PIK3CA in isogenic cell line models. These results have implications for the pancancer use of targeted drugs and potentially for the design of combination therapies.

1 Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, Box No. 460, New York, New York 10065, USA. 2 Human Oncogenesis and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA. Correspondence and requests for materials should be addressed to C.S.L. (email: [email protected]).

NATURE COMMUNICATIONS | 8:14249 | DOI: 10.1038/ncomms14249 | www.nature.com/naturecommunications 1 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14249

ancer cells evolve to acquire hallmark capabilities to therapeutics—and potentially for the development of context- sustain chronic proliferation, evade growth suppressors specific combination therapies—in a pancancer setting. Cand avoid cell death1 largely through the accumulation of somatic alterations that disrupt key signalling pathways. Large-scale cancer genomics projects such as The Cancer Results Genome Atlas (TCGA) have generated a comprehensive Pancancer analysis models dysregulated TFs and signalling. catalogue of somatic mutations and copy number aberrations We used a computational strategy for exploiting parallel across many tumour types. These alterations have been (phospho)proteomic and transcriptomic data to learn a model mapped to known pathways with the hope of deploying that links alterations in signalling (from RPPA data) pathway-targeted therapeutics—drugs targeting mutant with downstream changes transcriptional response (measured by oncoproteins or highly overexpressed wild-type (WT) receptors mRNA data) via predicted TF binding sites15 (Fig. 1a–c). or signal-transduction —for personalized medicine2–11. We used a regularized bilinear regression algorithm called However, while the same ‘actionable mutation’ may occur in affinity regression (AR)16 to learn an interaction matrix multiple cancers, it interacts with context-specific regulatory between upstream signal-transduction proteins and downstream and signalling networks as well as the genetic background TFs that predicts target (Fig. 1a, bottom). More of other somatic alterations, suggesting that its impact—and the intuitively, the model learns weighted edges between signalling effectiveness of the targeted therapy—may strongly depend proteins and TFs to describe the flow of information from both on the cancer type and additional molecular features of change in (phospho)protein level to altered activity of TF to the individual tumour. Moreover, the role of many frequent transcriptional changes in target genes (Fig. 1a, top). In somatic alterations remains obscure, and it is unclear whether a pancancer context, an AR model is trained independently for and how they interact with targetable pathways. In fact, each cancer type and explains the variation in gene expression computational studies of drug sensitivity across cancer cell lines across tumours in terms of (phospho)protein variation and have found that gene expression features are more informative presence of TF binding sites (see Methods section). than mutations for predicting response to targeted therapies12. We can further use the trained AR interaction matrix for each Meanwhile, early ‘basket’ clinical trials that enroll patients for cancer type to obtain different views of each tumour data set targeted therapies based on mutation status alone—regardless of via mappings (Fig. 1b): given a tumour sample’s (phospho)protein cancer type—have demonstrated efficacy only in a subset of expression levels, we can multiply through the model to infer cancers13,14. These findings point to the need for better sample-specific TF activities; conversely, given the gene expres- integrative computational methods that leverage additional sion profile, we can multiply through the motif hit matrix and molecular readouts to model the context-specific impact of the model to infer ‘(phospho)protein activities’ that are somatic alterations on gene expression programs. more informative than the original noisy RPPA data (Fig. 1b, To this end, we applied a computational strategy for exploiting bottom). Intuitively, information flows down from observed parallel (phospho)proteomic and mRNA sequencing data for RPPA levels through the learned interaction matrix to infer large tumour sets by linking the dysregulation of upstream TF activities, and observed mRNA expression levels propagate up signalling pathways with altered transcriptional response through through the TF-target edges and interaction network to infer the transcriptional circuitry15,16. We then developed a statistical (phospho)protein activities (Fig. 1b, top). framework to interpret the impact of mutations and copy Importantly, by associating the presence of somatic alterations number events in terms of altered (phospho)protein and with altered regulator activities, we can gain mechanistic insight transcription factor (TF) activity. We used this strategy to train into the role of specific mutations or copy number events (phospho)protein–TF interaction models across 12 human (Fig. 1c). We perform the association analysis by using cancers in TCGA. First, we identified shared and cancer-specific regularized regression to predict each inferred TF activity roles of TF/signalling regulators across cancer types. In bladder (resp. (phospho)protein activity) individually from the full set urothelial carcinoma, renal cell clear carcinoma and uterus of frequent mutation and copy number features (Fig. 1c, bottom; endometrial carcinoma, many of the identified TF regulators were see Methods section). We then evaluate the significance of the significantly associated with survival outcome. By stratifying effect size (coefficient) for each alteration in the regression model tumours by inferred TF activities rather than gene expression by a permutation approach (see Methods section). After false patterns, we identified known and previously unlinked TFs that discovery rate (FDR) correction across TFs/(phospho)proteins, are differentially active in HPV( þ ) versus HPV( ) head and we can identify a significant set of regulators whose altered neck squamous cancer, and we uncovered a subtype activities are associated with each mutation/copy number event of endometrioid uterine cancer harbouring mutant b-catenin while controlling for the genetic background of other alterations. with altered TF activities. We trained AR models on tumours from 12 different We next performed a systematic regularized regression analysis TCGA cancer studies using samples for which mRNA, RPPA, to associate frequent somatic aberrations with changes in inferred somatic mutation and copy number variation data were available: TF and (phospho)protein activities in each cancer type. bladder urothelial carcinoma (BLCA, n ¼ 115), This analysis identified key regulators associated with the (BRCA, n ¼ 368), colorectal adenocarcinoma (COADREAD, major mutations in renal clear-cell carcinoma. More generally, n ¼ 150), multiforme (GBM, n ¼ 58), head and we observed that specific molecular aberrations have cancer- neck squamous carcinoma (HNSC, n ¼ 194), kidney renal cell- specific functional consequences. In particular, we associated clear carcinoma (KIRC, n ¼ 376), lung adenocarcinoma (LUAD, PIK3CA activating mutations with altered activities of distinct n ¼ 216), lung squamous cell carcinoma (LUSC, n ¼ 106), ovarian sets of TFs in different cancers. Notably, in isogenic cell carcinoma (OV, n ¼ 164), prostate cancer (PRAD, n ¼ 159), line models of breast cancer and head and neck cancer, we uterine corpus endometrial carcinoma (UCEC, n ¼ 183), and validated the altered activity of several TFs in the presence uterine carcinosarcoma (UCS, n ¼ 47). of mutant PIK3CA by measuring promoter occupancy and For statistical evaluation, we computed the mean Spearman expression of target genes, confirming the context-specific correlation between predicted and measured gene expression predictions of our model. These proof-of-principle results suggest profiles on held-out samples using 10-fold cross-validation for a computational strategy for personalized deployment of targeted each cancer model. We obtained significantly better performance

2 NATURE COMMUNICATIONS | 8:14249 | DOI: 10.1038/ncomms14249 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14249 ARTICLE

abc Modelling impact of mutations on TF and Training interaction model Inferring TF and (phospho)protein activities (phospho)protein activities

Tumour-specific (phospho)protein values Mutation

(Phospho)protein

Inferred TF Learned matrix activities Inferred Transcription (phospho)protein factor activities Motif hits

Target gene expression

Tumour-specific gene expression

Tumour-specific (phospho)protein Somatic (Phospho) expression alterations TFs proteins Tumours Tumours W ≈ X ≈

T Tumours

≈ Tumours D W P Y

TFs Inferred TF Effect Inferred TF activity activity Somatic Genes Learned Genes alterations interaction (Phospho)proteins Tumour-specific matrix D W ≈ gene expression Inferred X ≈ (phospho)protein Tumours Tumours activity Effect Inferred (phospho)protein activity

Figure 1 | Integrative computational model links signalling to downstream transcriptional programs. (a) Formally, an interaction matrix W between TFs and upstream signalling proteins is trained using a bilinear regression algorithm, called affinity regression, on RPPA and mRNA expression (RNA-seq) data from a set of tumours, together with TF binding motif information from gene promoters. The model learns to predict target gene expression from tumour-specific (phospho)protein expression levels and gene-specific TF binding sites. The model can be viewed as learning weighted edges (shown as dashed lines) between upstream signalling proteins (shown as red circles) and transcription factors (TFs, shown as triangles) to capture the flow of information from signalling pathways to TFs to target genes and to predict target gene expression changes (TF to target genes shown in green). Formally, the learned weighted edges between (phospho)proteins and TFs are represented by an interaction matrix. (b) (Phospho)protein–TF interaction models for each cancer are trained independently. The model can be used to infer sample-specific TF activities from measured RPPA profiles or to infer sample-specific (phospho)protein activities from measured mRNA expression values by use of matrix mappings. (c) To model the impact of somatic aberrations on transcriptional response and signalling events, we use regularized regression to predict inferred TF activities (resp. inferred (phospho)protein activities) from somatic alterations. The significance of the effect size (regression coefficient) for each somatic alteration on TF/(phospho)protein activity is estimated by a permutation approach. The eventual goal of the modelling is to understand the cancer-specific downstream effects of targeted therapies and to potentially discover secondary targets for combination drug strategies. than a nearest-neighbour approach based on Euclidean distance section) and identified significant shared and cancer-specific in the RPPA space (Po0.00025, one-sided Wilcoxon’s TF/(phospho)protein regulators (Fig. 2a and Supplementary signed-rank test; Supplementary Table 1). Similarly, AR models Data 1). with true motif and RPPA data outperformed models where Figure 2a shows the fraction of samples per cancer type where motif hits for each gene and RPPA profiles for each tumour were each TF was identified as a significant regulator; for clarity, only randomized (Po0.00025, one-sided Wilcoxon’s signed-rank the union of top 10 most prevalent significant TFs per cancer are test). When only the motif hits were randomized, the perfor- shown. Certain TFs display a large variation in inferred activity in mance improvement of the true model was modest but still specific cancer types, suggesting a key role in regulating target significant (Po0.00074, one-sided Wilcoxon’s signed-rank test), gene expression in these cases, while having more modest suggesting that the motif data, while noisy, contributes to variation in other cancers. Figure 2b shows the inferred activity predictive performance. AR obtained similar performance distribution of three TFs identified from our analysis: FOXO1 advantages when assessed using a single held-out test set or (Forkhead box O), NFE2L2 and ELK1. FOXO1, a key regulator when evaluating Pearson correlation or root mean-squared error of cell-cycle progression and , was identified as (Supplementary Figs 1–5). a significant regulator for more than 10% of tumours in BLCA, BRCA and UCEC; its activity showed high variation among tumours for these particular cancers (Fig. 2b, top panel). Pancancer AR identifies signatures of survival. To assess A number of TFs were significantly altered in two or more the statistical significance of AR-inferred regulator activities, tumour types, including ZEB1, JUN, ELK1, FOXM1, while others we developed an empirical null model based on training were limited to a single type, such as FOXD1 in HNSC and AR models on randomly permuted gene expression profiles FOXL1 in KIRC. We identified TFs that are known cancer drivers for each tumour type (see Methods section). Then, we such as STAT5 (endometrioid carcinoma17), AHR18, HMGA19 asked whether the value of individual TF/(phospho)protein (KIRC), PBX1 (OV20, prostate cancer21, BRCA22,23) and NFE2L2 activities for each sample were significantly low or high relative (squamous cell lung cancer). Other predicted TF–cancer to the corresponding distribution over permuted data. We relationships are unknown and may provide new mechanistic corrected for FDR across TFs/(phospho)proteins (see Methods insights.

NATURE COMMUNICATIONS | 8:14249 | DOI: 10.1038/ncomms14249 | www.nature.com/naturecommunications 3 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14249

a b c MEF2D BLCA−FOXO1 BLCA−CEBPG BACH2 2 FOXF2 1.0 1.0 SRF ETV4 1 PBX1 0.8 0.8 EBF1 0.6 0.6 ATF4 0 YY1 PPARA 0.4 0.4 FOXO1 −1 FOXQ1 FOXO1 activity 0.2 0.2 TTF1 High (n=46) High (n=46) Survival probability GTF2A1 Survival probability P−value<0.01 P−value<0.01 RELA −2 0.0 Low (n=46) 0.0 Low (n=46) IRF1 CEBPB 0 20406080100 0 20406080100 CEBPG OV GBM UCS NFE2L1 BLCABRCA HNSC KIRC LUAD LUSC PRAD UCEC Months Months ADD1 MAX NFKB1 COADREAD NFE2L2 JUN 2 UCEC−FOXO1 UCEC−ELK1 FOXM1 ZEB1 1.0 1.0 RREB1 ELK1 1 MZF1 0.8 0.8 MTF1 FOXJ2 0.6 0.6 DBP 0 IRF9 EGR1 0.4 0.4 XBP1 −1 RUNX1 0.2 0.2

HSF1 NFE2L2 TF activity High (n=73) High (n=73) Survival probability Survival probability P−value<0.001 P−value<0.01 HMGA1 Low (n=73) TFAP2A −2 0.0 Low (n=73) 0.0 IRF2 CEBPA 0 20406080100 0 20406080100 OV ETS2 GBM UCS VDR BLCA BRCA HNSC KIRC LUAD LUSC PRAD UCEC Months Months USF1 ARNT ATF3 COADREAD SMAD4 KIRC−NFE2L2 KIRC−MAX IKZF2 2 SOX9 1.0 1.0 LEF1 RFX1 0.8 0.8 1 MEF2A STAT5B 0.6 0.6 ATF2 MEIS1 0 ELF1 0.4 0.4 TCF4 FOXO4 0.2 0.2 KLF12 −1 ELK1 TF activity High (n=150) Survival probability High (n=150) SREBF1 Survival probability P−value<0.001 P−value<0.001 ACTR1A 0.0 Low (n=150) 0.0 Low (n=150) MAZ BPTF −2 0 20406080100 0 20406080100 HNF1A AHR Months Months SMAD3 OV NFYB BLCA BRCA GBM HNSC KIRC LUAD LUSC PRAD UCEC UCS POSTN EP300 JUND COADREAD TBP STAT6 Fraction of patients CREBBP 0.3 0.2 OV 0.1 UCS GBM KIRC BLCA LUSC LUAD BRCA PRAD UCEC HNSC 0 CO ADREAD Figure 2 | Activities of significant regulatory TFs correlate with patient survival. (a) Significant TF regulators for each cancer type were identified using an empirical null model based on training affinity regression models randomized gene expression data (see Methods section). A FDR of 10% was used for studies with 4300 samples (BRCA, KIRC), 15% FDR for mid-size studies (BLCA, COADREAD, HNSC, LUAD, LUSC, UCEC, OV, PRAD), 25% FDR for studies with o50 samples (GBM, UCS). The heat map shows the fraction of samples where each TF is identified as a significant regulator within each cancer. For clarity, the union of top 10 most prevalent significant TFs in each cancer-specific model is shown. (b) Violin plots indicate the distribution of inferred FOXO1, NFE2L2 and ELK1 TF activities across cancer types. For example, FOXO1 TF activity is highly variable across tumours in BLCA, BRCA and UCEC. (c) Inferred TF activity predicts survival in patients with BLCA, UCEC and KIRC cancers. Kaplan–Meier survival curves for TCGA BLCA samples, stratified by TF activity of FOXO1 (top left), CEBPG (top right); TCGA UCEC samples stratified by TF activity of FOXO1 (middle left), ELK1 (middle right); TCGA KIRC samples stratified by TF activity of NFE2L2 (bottom left), MAX (bottom right).

To investigate the clinical relevance of these findings, section and Supplementary Table 5), supporting the use of we examined whether the inferred activity of significant inferred TF activity for patient stratification. TFs was linked to patient survival. We fit Cox proportional hazards regression models for each TF activity using clinical stage (or histological subtype for UCEC) as an additional TF activities distinguish HPV( þ )andHPV( ) HNSC patients. covariate. Indeed, many identified TF regulators had Next, we asked whether our method could identify known highly significant associations with survival outcome in BLCA, and novel TFs that are differentially active in cancer KIRC and UCEC (Fig. 2c and Supplementary Tables 2–4). subtypes. Figure 3a shows the clustering of tumours by inferred For instance, FOXO1 was associated with survival in BLCA TF activities, together with inferred (phospho)protein activities and UCEC, and its inferred activity separated patients into for the same tumour ordering, as derived from the HNSC model high- and low-risk groups. Previous immunohistochemical (showing TF/(phospho)protein activities with largest standard analyses of FOXO1 in bladder cancer showed that deviation across samples). Notably, patterns of TF activities increased mRNA expression is associated with reduced disease across tumours generally correlated with (phospho)protein progression24, consistent with our result. Inferred NFE2L2 and activities. MAX activity were associated with patient survival in KIRC, as Head and neck squamous cancer is frequently associated was ELK1 activity in the UCEC study. Importantly, Cox models with human papillomavirus (HPV) infection and mutations in built from inferred TF activities achieved more significant TP53. AR analysis suggests that the molecular pathogenesis of patient stratification than models built from the gene HPV( þ ) head and neck cancer is distinct from HPV( ) expression of significant TFs (BLCA: Po10 4; UCEC: tumours. Inferred TF activities of 33 TFs were significantly Po10 10, one-sided paired Wilcoxon’s signed-rank test) associated with HPV status (t-test, FDR-corrected Po0.01, (see Methods section and Supplementary Tables 2–4). We Fig. 3b,c); by contrast, the gene expression values of only two further confirmed that most of our UCEC survival results TFs were associated with HPV status (Supplementary Table 6). generalized to two other independent cohorts, MDACC Altered TF activities were involved in cell-cycle, apoptosis, (MD Anderson Cancer Center) and Bergen25 (see Methods oxidative stress, WNT signalling and transforming growth

4 NATURE COMMUNICATIONS | 8:14249 | DOI: 10.1038/ncomms14249 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14249 ARTICLE

a d

PI3K-AKT-MTOR PITX2 JUN MAF (ATF1, PITX2, FOXO1, FOXO3, FOXF2) FOXJ2 STAT5B SP3 MAPK signalling FOXO3 TFAP2A TFAP2A ADD1 NFE2L2 CEBPG (JUN, NFKB1, EGR1, CREB1, NF1) ELF1 MZF1 SREBF1 EMT (ZEB1) NFE2L2 Apoptosis MEF2A LEF1 HMGA1 KLF12 NFKB1 (JUN, ADD1, ATF1) ELK1 EGR1 FOXF2 Transcriptional activation of NRF2 CREB1 ZEB1 ELF2 Metabolism TCF4 XBP1 SF1 (MAF, NFE2L2, CEBPB) SMAD4 CEBPG TFAP4 (FOXJ2, HMGA1, NFE2L2) HMGA1 STAT6 YY1 E2F1 TCF12 PATZ1 IL2-STAT5 signalling CEBPB PDGF signalling NF1 STAT5A ATF1 FOXO1 IRF9 (XBP1, IKZF2) SP1 (ELF1, ELK1, STAT5A, STAT5B) IKZF2 ATF1 SOX9 CEBPA TGIF1 POU2F1 RFX1 NR1H3 STAT5B FOXC1 MAZ Apoptosis SF1 PDGF signalling FOXO4 IRF2 FOXM1 (RELA, ATF3, JUN) MEIS1 MZF1 PBX1 (STAT5B, ELF1, ELK1, STAT3, ETS1) POSTN USF1 RELA ELF1 ATF3 ELK1 Oxidative stress response SRF β IKZF1 PBX1 TGF- signalling MAZ (ELK1, NFE2L21, ATF2,MAX) MTF1 CEBPA JUN (FOXO4, FOXM1, JUN, SMAD4) NFE2L1 SOX9 TCF4 Wnt signalling (TCF4) MEIS1 ATF2 ELF2 ACTR1A GABPA MAX Hypoxia (NR3C1, ETS1) TFAP4 NR3C1 SMAD4 KLF12 IRF1 VDR EGR2 E4F1 MEF2A (MEF2A, MEF2D, FOXO4) MSX1 MEF2D STAT3 ETS1 ETS2

PR Rad50 PKC−alpha DNA damage response Annexin−1 PTEN eEF2 Hormone response ASNS (RAD50, XRCC1) AR STAT5−alpha ER−alpha (ERa, AR) NF−kB−p65_pS536 ATM p38_MAPK PDCD4 XRCC1 G6PD DNA damage response HER2 RTK (ERBB2, EGFR) 53BP1 XBP1 Fibronectin (ATM, TP53BP1) beta−Catenin 4E−BP1_pT37T46 STAT3_pY705 MSH6 TSC-mTOR Cyclin_B1 eEF2K CD49b TSC-mTOR (RICTOR, EIF4EBP1) Rictor PDCD4 Lck (EIF4EBP1, RICTOR) eEF2K STAT5−alpha Heregulin PRDX1 RBM15 Src 4E−BP1_pT37T46 PI3K−p85 EGFR TIGAR Apoptosis (BCL2L1, BCL2L11) PKC−alpha_pS657 c−Kit Rictor PI3K-AKT (GSK3A) ASNS HER2_pY1248 EMT (CDH1, CDH2) IRS1 ATM Bim GSK3−alpha−beta_pS21_S9 Rb_pS807_S811 N−Cadherin NF2 Ras-MAPK Bcl−2 MEK1 ER−alpha_pS118 c−Kit (ARAF, MAP2K1, MAPK14) 4E−BP1 Claudin−7 E−Cadherin RTK (SRC, ERBB3) INPP4B Cyclin_E1 Fibronectin JNK2 Myosin−IIa_pS1943 Caveolin−1 p62−LCK−ligand EMT Src_pY527 Chk2 Cyclin_B1 PI3K-AKT Bcl−2 Collagen_VI (COL6A1, CDH1, FN1) Akt A−Raf_pS299 FoxM1 (PTEN, AKT1) YB−1 CD49b GSK3_pS9 Rab−25 E−Cadherin NF−kB−p65_pS536 HSP70 GAB2 PKC−pan_BetaII_pS660 HER3 Rab−25 PEA−15 Annexin−1 Paxillin YAP_pS127 PTEN PREX1 PKC−pan_BetaII_pS660 TP53 PIK3CA HRAS TF activity CASP8 PTEN NOTCH1 1 TP53 NSD1 PIK3CA PIK3R1 NFE2L2 ARID1A 11q13.3 0.5 Somatic alteration ARID5B 3q26.33 (,PIK3CA) KRAS 7p11.2 (EGFR) 0 CTCF 8q24.21 RPL22 Mutation 8p11.23 CTNNB1 9p21.3 −0.5 POLE Amplification 8p23.2 FBXW7 4q35.2 PPP2R1A 2q22.1 −1 19q12 (CCNE1) Wild type 18q23 3q26.2 (MECOM) 11q23.1 8q24.21 19p13.3 (Phospho)protein activity 17q12 (ERBB2) Deletion 7q36.1 1q22 1p13.2 0.4 19p13.2 13q12.11 2q13 5q12.1 10q22.2 9p23 0.2 1p22.3 2q36.2 1q21.3 (MCL1) 2q21.2 19p13.3 13q14.2 0 5q12.1 4q34.3 −0.2 10q23.31 22q13.32 Histology Xp21.1 −0.4 2q22.1 16q23.1 Endometrioid 1p36.11 HPV status Serous Positive Negative Tumour site b HNSC HNSC HNSC Hypopharynx e UCEC UCEC UCEC Larynx 1.0 P<1e−08 P<1e–14 P<1e–5 P<1e–5 Oral cavity P<1e–14 1.0 1.0 P<0.01 Oropharynx 1.0 0.2 0.5 0.4 0.5 RNA subtype 0.0 0.0 0.0 Atypical 0.0 0.0 0.0 Basal Classical p53 activity ESR1 activity KLF12 activity NFE2L2 activity TGIF1 activity −1.0 MSH6 activity Mesenchymal −0.5 −1.0 −0.2 −1.0 −0.4 HPV(+) HPV(−) HPV(+) HPV(−) HPV(+) HPV(−) Endometrioid Serous Endometrioid Serous Endometrioid Serous

c TCGA f TCGA ATF2 ATF2 NFYA ACTRIA ATF1 NFE2L2 15 PATZ1 CEBPB IRF9 TFAP4 KLF12 STAT3 MSX1 4 GABPA ELF2 TEAD1 MYC XBP1 VDR TCF4 SOX9 SF1 STAT5A TFAP2A MYCN TBP TGIF1 Sewell et al. 10 EP300 TCF3 15 MDACC RORA FOXO3 PPARA FDR 4 SMAD4

FDR ETS2 GTF2I AHR TP53 10 SREBF1

ATF4 10 NF1 JUN TGIF1 NR3C1 10 FOXC1 MAF BPTF RARA LEF1 SP1 MTF1 IKZF2 FDR JUND ESRRA 10 2 PITX2 –Log 2 TFAP2A

–Log GATA2 FDR 5 5 10 FOXC1 PAX8 MAX –Log NFE2L1 KLF12 –Log 0 MEF2A 0 MEF2D –1.0 –0.5 0.0 0.5 1.0 –1.0 –0.5 0.0 0.5 1.0 Effect difference (HPV+ vs HPV–) Effect difference 0 0 (CTNNB1 mutant vs wild-type) –1.0 –0.5 0.0 0.5 1.0 –1.0 –0.5 0.0 0.5 1.0 Effect difference (HPV+ vs HPV–) Effect difference (CTNNB1 mutant vs wild-type)

Figure 3 | Pancancer affinity regression modelling identifies regulatory features of tumour subtypes. (a) We trained an affinity regression model on 194 tumours from the TCGA HNSC study. The top heat map shows tumours clustered by the inferred TF activities with highest variance. The middle panel shows top most variable (B50) inferred (phospho)protein activities for each tumour based on clustering by sample-specific TF activities. The bottom panel shows genomic aberration profiles of each tumour as well as HPV status, tumour site and mRNA expression subtypes derived from the corresponding TCGA HNSC study. (b) Examples of three of the significantly differential inferred TF/protein activities in HPV( þ ) versus HPV( ) tumours were NFE2L2, KLF12 and MSH6. HPV( þ ) tumours have significantly higher NFE2L2 TF activity (Po10 5, Wilcoxon’s rank-sum test), higher KLF12 TF activity (Po10 5, Wilcoxon’s-rank sum test) and higher MSH6 protein activity (Po10 2, Wilcoxon’s rank-sum test) than HPV( ) tumours. (c) The mean inferred TF activity difference in HPV( þ ) and HPV( ) patients is plotted on the x axis, and FDR-adjusted significance from t-test is plotted on the y axis (–log10 scale) for TCGA and Sewell et al.28 head and neck cancer cohorts. TFs significantly associated with HPV status (FDRo0.01) in both cohorts are coloured in orange. (d) We trained an affinity regression model on 183 tumours from the TCGA endometrial carcinoma (UCEC) study. The top heat map shows a clustering of tumours by inferred TF activities. The middle panel shows inferred (phospho)protein activities of each tumour based on clustering of tumour TF activities. The bottom panel shows genomic aberration profiles of each tumour as well as histological subtypes derived from the corresponding TCGA UCEC study. Patterns of TF activities across tumours often correlated with patterns of (phospho)protein activities. (e) Examples of three of the significantly differential inferred TF/(phospho)protein activities in serous versus endometrioid tumours were ESR1, TGIF1 and p53. Tumours with endometrioid histology have significantly higher ESR1 TF activity (Po10 8, Wilcoxon rank sum test), lower TGIF1 TF activity (Po10 14, Wilcoxon rank-sum test), and lower p53 protein activity (Po10 14, Wilcoxon rank sum test) than serous tumours. (f) The mean inferred TF activity difference in CTNNB1 mutant and CTNNB1 wild-type patients is plotted on the x axis, and false discovery rate (FDR)-adjusted significance from t-test is plotted on the y axis ( log 10 scale) for TCGA and MDACC endometrial cancer cohorts. TFs significantly associated with CTNNB1 status (FDRo0.01) in both cohorts are coloured in orange. Box edges represent the upper and lower quantile with median value shown as bold line in the middle of the box. Whiskers represent 1.5 times the quantile of the data. factor-b) signalling and may have roles in the initiation site5. To confirm our results, we used the HNSC TCGA-trained and maintenance of HPV( þ ) head and neck cancer26. AR model to infer TF activities in an independent set of 42 head For example, KLF12 and NFE2L2 were significantly associated and neck cancer RPPA profiles with HPV status28. We again with HPV( þ ) tumours (Fig. 3b). Interestingly, the KLF12 identified TF–HPV status associations by t-test and found is a frequent integration site for the HPV virus27 a similar set of TFs (23 out of 33) whose activities significantly in cervical cancer, and the TCGA HNSC study also identified differed between HPV( þ ) and HPV( ) tumours; the KLF5, the locus of a related KLF factor, as an HPV integration 33 identified TFs were also enriched among the top-ranked

NATURE COMMUNICATIONS | 8:14249 | DOI: 10.1038/ncomms14249 | www.nature.com/naturecommunications 5 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14249

TFs in the new cohort (Po10 5, Mann–Whitney test) test; Fig. 3f and Supplementary Table 11). Interestingly, another (Fig. 3c and Supplementary Table 6). study performed customized consensus clustering on TCGA As described previously29,mutantTP53 tends to be mutually UCEC expression data and did identify a cluster enriched with exclusive with HPV( þ ) status, but inferred TP53 TF activity and b-catenin mutations, and GSEA (gene set enrichment analysis) inferred p53 protein activity were not significantly different between suggested an association with WNT signalling, consistent with HPV( þ )andHPV( )patients(t-test P ¼ 0.477 and P ¼ 0.741, our analysis34. respectively). However, it is known that the viral E6 oncoproteins in HPV( þ ) head and neck cancer form acomplexwithWTp53andleadtoitsdegradation30,pointingto Modelling reveals impact of mutations in kidney cancer. an alternative mechanism for p53 inactivation in HPV( þ )patients. Encouraged by our findings for mutant CTNBB1 endometrioid We performed similar analyses for other TCGA cancer studies tumours, we developed a systematic statistical approach and in each cancer type could stratify patients by regulator for modelling the impact of somatic alterations on regulator activity profiles (Supplementary Figs 6–15). For example, the activity in each tumour type, with the eventual goal of inferred TF activity of CEBPA31 was significantly higher in deciphering cancer-specific downstream effects of targeted the mesenchymal subtype compared to other subtypes of therapies and potentially discovering secondary targets for GBM (Po10 5, Wilcoxon’s rank-sum test used for all tests); combination drug strategies. First, we implemented a regularized ESR1 (estrogen 1) activity was higher in luminal BRCA regression approach that uses somatic alterations to compared to other BRCA subtypes (Po10 42), consistent with explain inferred TF/(phospho)protein activity across tumour oestrogen receptor serving as a luminal marker32; and activity of samples on a regulator-by-regulator basis. For a complex geno- TTF-1 (thyroid transcription factor-1) thyroid transcription type, the model explains TF/(phospho)protein regulator activity factor-1, a known biomarker of LUAD33, was higher in the across tumours as the sum of effects of individual somatic squamoid (Po10 10) and bronchioid subtypes (Po10 5) alterations (that is, coefficients in the regression model), and the compared to the magnoid subtype in LUAD. effect size of each alteration is assigned a nominal P value by a permutation approach (see Methods section). We then corrected for multiple hypotheses across regulator models, TF signature defines mutant CTNNB1 endometrioid subtype. treating inferred TF activities and inferred (phospho)protein We then asked if we could associate inferred TF activities with activities as separate groups of tests (see Methods section). mutational signatures as a first step towards developing a more Combining these results identified a set of regulators predicted to general statistical strategy. Figure 3d shows a clustering of be significantly dysregulated by each somatic alteration in each tumours by inferred TF activities from the UCEC model, together TCGA cancer study. with inferred (phospho)protein activities and recurrent somatic Figure 4a,b shows the regulator activities associated with mutations and copy number events. Serous-like endometrial somatic aberrations in KIRC. Our model identified mutations in tumours are negative, mostly copy number VHL (von Hippel-Lindau), PBMR1, BAP1, MTOR, ATM, high, and harbour mutations in TP53, whereas endometrioid SETD2, KDM5C and PTEN (phosphatase and tensin homolog), tumours are hormone receptor positive, copy number low, and as well as copy number changes in MLH1, DUSP1 and have a high frequency of PI3K-AKT (phosphatidylinositol RANDBP17 as significantly associated with various TF activity 3-kinase-AKT) pathway alterations5,6. Consistent with changes across tumours. KIRC is characterized by a high- their distinct molecular and genomic features, we found frequency inactivating mutation in the VHL gene found in significant differences in inferred regulator activities in serous- B54% of tumours in TCGA and likely more prevalent35. like and endometrioid tumours (Supplementary Tables 7 and 8), Mutually exclusive mutations in PBRM1, a subunit of the PBAF including increased ESR1 activity in endometrioid tumours SWI/SNF chromatin remodelling complex, and in (Po10 8, Wilcoxon’s rank-sum test used for all tests) and deubiquitinase BAP1 define two genetic subtypes of KIRC, increased TGIF1 TF activity and inferred p53 protein activity while recurrent mutations in the histone methyltransferase serous-like tumours (Po10 14 for both tests; Fig. 3e). SETD2 also occur. Importantly, clustering by TF activities revealed subclasses of KIRC samples with PBMR1 and BAP1 mutations showed tumours within each histological subtype that sometimes distinct patterns of TF and protein activities (Fig. 4a), and correlated with mutation status. In particular, endometrioid regression analysis associated different regulators with these tumours with a CTNNB1 mutation form a distinct cluster based mutations (Fig. 4b). PBMR1 mutant tumours are associated with on inferred TF activity profiles that was not observed by increased activity of TFs/(phospho)proteins that have roles in clustering TF mRNA expression levels directly (Supplementary interleukin signalling and MYC, while regulators with increased Fig. 16). Moreover, clustering based on inferred TF activity was activity in BAP1 mutant tumours are involved in DNA damage better able to stratify patients by CTNNB1 mutation status response, apoptosis, signalling and mTOR signalling. (Po10 17, two-sided w2 test for all tests) compared to reported Notably, NFE2L2 TF activity was significantly higher in BAP1 TCGA mRNA clusters (Po0.01) and TCGA integrated clusters mutant tumours than PBMR1 mutant tumours. Dysregulation of (Po10 6) (Supplementary Tables 9 and 10). Significant inferred the KEAP1-NFE2L2 pathway occurs through both genetic and TF activity differences between CTNNB1 mutant and epigenetic mechanisms and induces prosurvival genes promoting WT patients (satisfying FDR-corrected Po0.01, t-test) associated proliferation and chemoresistance36. Mutations in KEAP1, CTNNB1 mutant status with altered activity of TFs involved in NFE2L2 (Nrf2), CUL3 or RBX1 are the most common WNT signalling, epithelial–mesenchymal transition and cancer mechanisms that impair KEAP1-mediated degradation of stem cell transition including TCF4 (transcriptional factor 4), NFE2L2 and thereby activate the transcriptional effects of NFATC4, JUN, TP53, MAX, MYC, STAT3 and KLF12 (Fig. 3f). NFE2L2. Somatic aberrations in these genes have been We confirmed these results in an independent data set of described in LUSC, LUAD and HNSC, and indeed we 203 endometrial RPPA profiles along with mutation and clinical confirmed this activating effect (Fig. 4c). Inferred TF activity of data compiled by MDACC25, using the UCEC TCGA-trained NFE2L2 was increased in mutant versus WT KEAP1 or NFE2L2 AR model to infer TF activities, and replicated many of the lung cancers; these differences are not observed at the gene TFs associated with mutant CTNBB1 (Po10 5, Mann–Whitney expression level (Supplementary Fig. 17).

6 NATURE COMMUNICATIONS | 8:14249 | DOI: 10.1038/ncomms14249 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14249 ARTICLE

a LMO2 b RFX1 MECOM TEAD1 TCF3 NFYA E4F1 FOXJ2 TBP GTF2A1 β STAT3 TGF- signalling SF1 PBRM1 MAX NFKB1 (FOXJ2, FOXC1, JUN, FOXQ1) PAX8 FOXC1 Del_MLH1 SP3 HMGA1 MAPK signalling NFE2L2 Association POU2F1 VHL STAT5B (MECOM, MAX, NFKB1, JUN) JUN MAZ score CEBPA MZF1 Amp_RANBP17 ADD1 ETS1 FOXQ1 4 NR1H4 Amp_DUSP1 TCF12 POSTN 2 MEF2D WNT signalling SREBF1 BAP1 EP300 0 HNF1A (EP300, SMAD1, SMAD3, ESR1 STAT5A MTOR GTF2I −2 IKZF2 SMAD4, NFATC4, MYC) IRF2 EBF1 ATM −4 SMAD4 Oxidative stress response MYC MEF2A SPI1 SETD2 AHR (ELK1, ATF2, MEF2A, MYC) NFATC4 NFYB RREB1 BPTF KDM5C IRF8 SMAD3 GABPA JUND PTEN TFAP4 TEF FOXM1 ELK1 SF1 SP1 TEF YY1 SP3 JUN TBP

STAT6 SRF VDR MAZ AHR MAF IRF8 IRF2 MAX SPI1 MYC LEF1 TTF1 TP53 E4F1 ELK1 ATF4 ATF2 ATF1 TCF3 ETS2 ZEB1 EBF1 ETS1 USF1 BPTF RFX1 PAX2 USF1 PAX8 MTF1 MZF1 ESR1 JUND NFYA NFYB ADD1 IKZF2 LMO2 GTF2I HIF1A KLF12 EP300 TCF12 STAT6 PATZ1 TFAP4 FOXJ2 SMAD1 STAT3 TEAD1 NFKB1 HNF1A NR1H4 MEF2A CREB1 FOXC1 RREB1 MEF2D CEBPB HOXA4 CEBPA FOXQ1 SMAD3 SMAD1 SMAD4 POSTN FOXM1 GABPA HMGA1

VDR NFE2L2 STAT5A STAT5B GTF2A1 MECOM NFATC4 SREBF1 POU2F1 ACTR1A CEBPB CREBBP KLF12 ATF2 SRF SP1 ATF1 MAF HIF1A

Akt Myosin IIa_pS1943 ARHI CD49b Caveolin 1 PBRM1 DJ 1 AR p62 LCK ligand Caspase 7_cleavedD198 VHL XBP1 EGFR DNA damage response PRDX1 Del_MLH1 Chk1 (CHEK1, CHEK2) S6_pS240_S244 Syk Amp_RANBP17 JNK2 Apoptosis p27_pT198 Association Bad_pS112 Amp_DUSP1 Chk1_pS345 (BAX, BAD) ETS 1 score YB 1 GSK3 alpha beta_pS21_S9 MT OR GSK3_pS9 Cyclin_B1 4 P Cadherin AT M Claudin 7 2 Raptor Bcl xL PI3K-AKT KDM5C GSK3 alpha beta 0 LKB1 FOXO3a_pS318_S321 (FOXO3, GSK3A) −2 GAPDH PTEN JNK_pT183_Y185 STAT5 alpha −4 Lck SETD2 Dvl3 Tuberin_pT1462 MEK1 BAP1 Collagen_VI Akt_pS473 EMT (COL6A1) NF2 Rad51 AR Bid Akt Lck Syk

Ras-MAPK Bax NF2 VHL TAZ VHL Dvl3 Chk1 Chk2 DJ−1 ARHI JNK2 LKB1 YB−1 TSC1 CD20 CD31 XBP1 ERK2 GAB2 G6PD HER3 ASNS G6PD MEK1 EGFR MSH6 Rad51 Rad50 Raptor 53BP1 Bcl−xL eEF2K ETS−1 CD49b HSP70 Smad4 Smad1 Notch1 PRDX1 BRCA2

Src_pY527 PDCD4 (MAPK14, MAP2K1) GAPDH p38_MAPK Cyclin_E1 Cyclin_B1 Claudin−7 Bap1−c−4

HSP70 Akt_pS473 Src_pY527 GSK3_pS9 ACC_pS79 Caveolin−1 p38_MAPK p27_pT198 Bad_pS112 P−Cadherin E−Cadherin MSH6 Collagen_VI Chk1_pS345

PDCD4 STAT5−alpha 4E−BP1_pS65 NDRG1_pT346 Acetyl a Tubulin Lys40 p70S6K_pT389 PRAS40_pT246 Tuberin_pT1462 p62−LCK−ligand Bap1 c 4 S6_pS240_S24 4 p38_pT180_Y182 JNK_pT183_Y185 ERK2 GSK3−alpha−beta p38_pT180_Y182 Myosin−IIa_pS1943 FOXO3a_pS318_S321 Acetyl−a−Tubulin−Lys40 VHL Caspase−7_cleavedD198 BAP1 PBRM1 GSK3−alpha−beta_pS21_S 9 SETD2 KDM5C PTEN MTOR PIK3CA 5q35.1 3p26.3 9p21.3 3p22.2 c KIRC LUAD LUSC HNSC P-value<1.69E−08 1.0 P-value<4.88E−06 1.5 P-value<3.42E−07 1.5 1.0 P-value<0.02 1.0 1.0 TF activity (Phospho)protein activity Somatic alteration Pathologic stage 0.5 0.5 1 0.4 0.5 0.5 Mutation Stage I 0.5 0.2 Amplification Stage II 0.0 0.0 0 0 0.0 wt Stage III 0.0 −0.2 −0.5 Stage IV −0.5 NFE2L2 activity Deletion

−1 −0.4 NFE2L2 activity −0.5 NFE2L2 activity −0.5

−1.0 NFE2L2 activity −0.5 −1.0 −1.5 −1.0 BAP1 PBRM1 KEAP1 WT NFE2L2 WT NFE2L2 WT mut mut mut mut mut

1.0 d P-interaction < 0.0003 P-interaction < 0.05 P-interaction < 0.05 0.3 P-interaction < 0.05 1.0 1.0 0.2 0.5 0.5 0.5 0.1 0.0 0.0 0.0 0.0 −0.5 −0.5 HSF1 activity KLF12 activity TIGAR activity −0.1 SMAD1 activity −1.0 −1.0 −0.5 −0.2

VHL VHL VHL VHL Both Both BAP1 Both BAP1 Both PBRM1 Neither PBRM1 Neither Neither Neither Figure 4 | Genomic aberrations are associated with dysregulated TF and (phospho)protein activity in the TCGA KIRC study. (a) We trained an affinity regression model on 376 tumours from the TCGA KIRC study. The top panel shows inferred TF activity for each tumour associated with BAP1, PBMR1, SETD2 and KDM5C mutations, ordered according to the mutation profile. The middle panel shows the corresponding (phospho)protein activity for each tumour associated with BAP1, PBMR1, SETD2 and KDM5C mutations. The bottom panel shows genomic aberration profiles of each tumour as well as the pathological stage as derived from the corresponding TCGA KIRC study. (b) Impact of genomic aberrations on individual TF/(phospho)protein activities in TCGA KIRC, based on a regularized regression analysis. A permutation test approach was used to assign significance to ridge regression coefficients (see Methods section). The heat map shows –log 10 FDR-adjusted P values derived by permutation test, multiplied by the sign of the coefficient. The top panel shows inferred TF activity associations. The bottom panel shows inferred (phospho)protein activity coefficients. (c) Inferred NFE2L2 activity in TCGA KIRC, LUAD, LUSC and HNSC studies and impact of mutations using integrative modelling. In the KIRC study, tumours with mutant BAP1 have significantly higher NFE2L2 activity than mutant PBRM1 tumours (Po1.68 10 8, Wilcoxon’s rank-sum test). Tumours with mutant KEAP1/NFE2L2 have also significantly higher inferred TF activity of NFE2L2 (a substrate targeted by KEAP1) than WT tumours in the LUAD study (Po4.88 10 6, Wilcoxon’s rank- sum test), LUSC study (Po3.42 10 6, Wilcoxon’s rank-sum test) and HNSC study (Po0.02, Wilcoxon’s rank-sum test). This association is not significant using the original measured gene expression values of NFE2L2 (Supplementary Fig. 17). (d) Inferred SMAD1 and KLF12 activity is significantly and synergistically decreased in tumours with both VHL and PBMR1 mutations (SMAD1: interaction, Po0.0003; KLF12: interaction, Po0.05). In tumours with both VHL and BAP1 mutations, HSF1 activity is significantly decreased (interaction Po0.05) and TIGAR protein activity is significantly increased (interaction Po0.05). Box edges represent the upper and lower quantile with median value shown as bold line in the middle of the box. Whiskers represent 1.5 times the quantile of the data.

We also assessed synergistic effects by building linear models currently in early-phase or phase III trials for use across multiple with interaction terms for each pair of somatic alterations cancers14,39, we asked whether PI3K pathway alterations (see Methods section). In samples where VHL was comutated dysregulate the same or differing TFs and (phospho)proteins with PBMR1, SMAD1 (interaction Po0.0003) and KLF12 across tumour types. Figure 5a and Supplementary Fig. 18 show (interaction Po0.05) activity were significantly decreased. the regulators associated with somatic aberrations in PTEN and Meanwhile, when VHL was comutated with BAP1, HSF1 activity PIK3CA by our analysis in BRCA, HNSC, UCEC, KIRC, LUAD was significantly decreased (interaction Po0.05), while TIGAR and PRAD tumours (see Methods section). protein activity was significantly increased (interaction Po0.05) Activating mutations in PIK3CA were present in B31% of (Fig. 4d). BRCA tumours and 20% of HNSC tumours5. Mutations often occur in one of three hotspot locations (E545K, E542K and H1047) and promote constitutive signalling though the PI3K pathway mutations dysregulate cancer-specific TFs. The pathway. In UCEC, B66% of tumours have PTEN inactivating PI3K pathway controls proliferation, metabolism, survival and mutations, B50% have PIK3CA activating mutations and motility and is frequently activated in many cancers, often B35% have a comutation of PTEN and PIK3CA. Figure 5a via mutations in PIK3CA, which encodes the a-isoform of the shows 134 TFs associated with somatic aberrations in PTEN or p110 catalytic subunit of PI3K (PI3Ka); loss of PTEN, which PIK3CA in BRCA, HNSC or UCEC. Notably, the number of antagonizes PI3K function; and overexpression of membrane- TFs dysregulated by PI3K pathway alterations varied widely bound receptor tyrosine kinase37,38. As PI3K inhibitors are across different cancers (9 in HNSC, 65 in BRCA and 63 in

NATURE COMMUNICATIONS | 8:14249 | DOI: 10.1038/ncomms14249 | www.nature.com/naturecommunications 7 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14249

a Breast cancer – PIK3CA

2 Common 1 BRCA and HNSC 0 BRCA and UCEC −2 HNSC and UCEC

Association score BRCA-specific HNSC-specific UCEC-specific Head and neck cancer − PIK3CA 2 1 0 −1

Association score −2

Endometrial cancer − PTEN 2 1 0 −1 −2 Association score SF1 NF1 TEF YY1 SP3 JUN TBP SRF DBP AHR MAF MAZ IRF9 IRF1 IRF2 MAX MYB SPI1 MYC TTF1 ELF2 E4F1 LEF1 ELF1 E2F1 ATF6 ATF1 ATF3 ATF2 ELK1 TCF4 TCF3 ZEB1 ETS1 HSF1 RFX1 USF1 XBP1 PAX8 MZF1 MTF1 RELA JUND ADD1 NFYB REST NFYA SOX9 ARNT IKZF2 EGR1 IKZF1 EGR3 LMO2 RARA TGIF1 RORA KLF12 MEIS1 TCF12 FOXJ2 PATZ1 STAT3 TFAP4 STAT1 TFCP2 NFKB1 TEAD1 FOXF2 NR1H3 NR3C1 GATA2 RREB1 CREB1 MEF2A MEF2D RUNX1 RUNX2 FOXO1 FOXO3 PPARA CEBPA FOXO4 FOXM1 ESRRA SMAD4 SMAD1 GABPA CEBPG HMGA1 NFE2L2 NFE2L1 STAT5B TFAP2A MECOM POU2F1

b Breast cancer Head and neck cancer Endometrial cancer

Longevity regulating pathway HIF-1 signalling pathway Insulin signalling pathway RPS6KA1 JUN

ARNT ELK1 GnRH ssignalling RICTOR mTORTOR signallings ggnnallingnna ingn GAPDH MYB pathwayath SYK ACACA RPS6KA1 pathwaypathwpa hwaayy ErbB signallingn llinng CTNNB1 SRC ATF2 EIF4E pathwayaay NFKB1 RPTOR CCNB1 ERBB2 Adherens junction EIF4EBP1 PTEN CDKN1A CREB1 EIF4EBP1 Longevity regulating pathwayaththwwawaya EIF4EBP1 EIF4E mTOR signalling pathway AKT1 SMAD4 PRKCB PI3K-A--AAktAkAktt ATF4 PRKAA1 FOXO3 PRKCB FoxOFoxO signalling PXN signalling patppaathwayatathhwwaywwaaayy HIF-1 signalling pathwayway PIK3CA RELA MTOR pathway MTOR EEF2K IRS1 VEGF signalling GAPDH NFKB1 TP53 G6PD CentralCeCenCenentraentrantratrraal ccacarbonar metabolism in PIK3CA pathway AMPK signalling pathwaya wayayay STAT3 VHL HIF1A cancer STAT5B FOXO1 FOXO1 RPS6KB1 mTOR signalling pathway FASN G6PD ARNT CCNE1 YWHAB ELK1 HSPA1A ACACA NRG1 Central carbon PDK1 FOXO3 RPS6KB1 ARAF ERBB3 metabolism in HIF-1 signallingg ErbB signalling NRAS ESR1 BAD CREBBP MYC cancer pathway CDKN1B PIK3CA OestrogenOOesestes signalling pathway pathway SHC1 TFAP4 YWHAZ PTEN ESR1 TP53 AKT1 Insulin signalling pathway JUN CHEK2 CCNE2 FN1 MAPK1 PDCD4 ERBB2 EGFR ELK1 Proteoglycans in cancer PCNA Choline metabolism in cancer MYC KDR CCNB1 E2F1 Cell cycleyc BAD RAF1 ATM STAT5B SMAD3 MAPK8 SMAD4 MAPK14 NRG1 ITGA2 PXN SRC ErbB signallings FOXO4 pathwaya FoxO signalling ERBB3 pathway CAV1 PDCD4 Proteoglycans in cancer VEGF signalling pathway Focal adhesion

Figure 5 | Somatic aberrations in the PI3K pathway dysregulate cancer-specific TFs. (a) Bar plots show 134 TFs associated with somatic aberrations in PTEN or PIK3CA discovered from regularized regression analysis in breast cancer (BRCA), head and neck squamous carcinoma (HNSC) or UCEC and their association scores. TFs that are significantly associated with the somatic aberration but not identified as significant regulators for explaining gene expression in at least 1% of samples are indicated with crossed lines. (b) A functionally grouped network of enriched categories was generated for TFs associated with PIK3CA/PTEN mutations using KEGG pathway terms related to cancer as nodes and linked using ClueGO79 analysis. Only the most significant terms in the group are labelled. Functionally related groups partially overlap.

UCEC), with striking changes in ErbB/MAPK, mTOR, HIF-1, phosphoproteomic data from isogenic knock-in breast cell lines VEGF and PI3K-Akt pathways. Only ELK1, a TF downstream of harbouring mutations of PIK3CA45 (see Methods section). Of 11 the MAPK/ERK pathway, is dysregulated by PIK3CA or PTEN TFs represented in the phosphoproteomic data and associated mutations in all three cancers. Four TF associations were with mutant PIK3CA in BRCA, eight of them—ADD1, FOXO3, shared in BRCA and HNSC: ZEB1 (EMT activator), TCF4 HMGA1, HSF1, JUND, NF1, POU2F1, STAT3—showed protein (Wnt signalling), RREB1 (Ras signalling) and FOXM1 abundance change in isogenic cell line systems (see Methods (cell proliferation, progression, cell differentiation, section and Supplementary Table 12). Moreover, of the six TFs DNA damage repair, tissue homeostasis, angiogenesis and represented in the AKT1 kinase assay45 and associated with apoptosis); 35 TFs were common to BRCA and UCEC; and just mutant PIK3CA in BRCA, four of them—ETS1, ATF6, SOX9 and IRF9 was shared between UCEC and HNSC. There were also TEAD1—were identified as AKT substrates (Supplementary associations unique to each cancer type, including USF1 and Table 13). SMAD1 for BRCA, FOXF2 for HNSC, and NFE2L2 and NR3C1 for UCEC. Interestingly, many TF activities were associated with mutant Predicted TFs for mutant PIK3CA validate experimentally. PTEN irrespective of PIK3CA status in endometrial cancer Our analysis associated mutant PIK3CA with ELK1 and TCF4 (Supplementary Fig. 19), consistent with a recent preclinical study40, activity in both breast and head and neck cancer, and with while PIK3CA mutations were only significantly associated with a FOXO1 activity in BRCA but not in head and neck cancer. We single TF, CREB1. Therefore, PTEN and PIK3CA appear to have validated these predictions in isogenic BRCA and head and neck distinct consequences for PI3K activation in UCEC. cancer cell lines by measuring promoter occupancy via chromatin PI3K pathway inhibition is known to alter STAT5 (ref. 41), immunoprecipitation-quantitative PCR (ChIP-qPCR) and FOXO, RUNX2 (ref. 42), ERG1 (ref. 43) and ETS1 (ref. 44) expression change via quantitative reverse transcription-qPCR activities, consistent with our results. We also examined protein (RT-qPCR) of canonical target genes of these TFs (Fig. 6 and microarray-based AKT1 kinase assay and SILAC-based Methods section).

8 NATURE COMMUNICATIONS | 8:14249 | DOI: 10.1038/ncomms14249 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14249 ARTICLE

ab c d ACTR3 PSMB4 2.5 WNT10B APC RUNX1 CDK1 3.0 4.5 3.0 2.0 4.0 4.0 1.8 3.5 2.5 2.0 2.5 WT WT E545K 1.6 3.5 3.0 2.0 3.0 2.0 1.4 1.5 2.5 2.5 1.2 PIK3CA kDa PIK3CA 1.5 1.5 1.0 2.0 75 2.0 1.0 pAKT (S473) 0.8 1.5 1.0 1.5 1.0 0.6 50 1.0 1.0 Relative expression

Relative expression 0.5 Relative expression Relative expression 0.4 Relative expression

0.5 Relative expression 0.5 75 0.5 0.2 0.5 pS6K (T389) 0.0 0.0 0.0 0.0 0.0 0.0

50 WT WT WT WT WT WT E545K E545K E545K E545K E545K E545K 50 Actin PIK3CA PIK3CA PIK3CA PIK3CA PIK3CA PIK3CA PIK3CA PIK3CA PIK3CA PIK3CA PIK3CA PIK3CA 37 2.0 RUNX1 1.8 CDK1 ACTR3 PSMB4 3.5 WNT10B 3.5 APC 3.5 PIK3CA PIK3CA 1.8 PIK3CA E545K 20.0 E545K 1.6 PIK3CA WT 3.0 3.0 3.0 E545K 1.6 PIK3CA WT 1.4 PIK3CA WT 1.4 2.5 16.0 2.5 2.5 1.2 1.2 2.0 1.0 2.0 12.0 2.0 1.0 FOXO1 1.5 1.5 1.5 0.8 0.8 ELK1 8.0 0.6 Fold enrichment 0.6 Fold enrichment TCF4 Fold enrichment 1.0 Fold enrichment 1.0 1.0 Fold enrichment Fold enrichment 0.4 0.4 4.0 0.5 0.5 0.5 0.2 0.2 0.0 0.0 0.0 0.0 0.0 0.0 IgG ELK1 IgG ELK1 IgG TCF4 IgG TCF4 IgG FOXO1 IgG FOXO1

efACTR3 PSMB4 g h 5.0 5.5 WNT10B APC 1.6 RUNX1 1.6 TNFSF10 1.2 1.2 5.0 4.5 1.4 1.4 WT WT E545K 4.0 4.5 1.0 1.0 1.2 1.2 3.5 4.0 0.8 3.0 3.5 0.8 1.0 1.0 PIK3CA Control kDa PIK3CA 3.0 2.5 0.8 0.8 2.5 0.6 0.6 100 HA 2.0 2.0 0.6 0.6 0.4 0.4 1.5 1.5 50 0.4 0.4 Relative expression

Relative expression 1.0 Relative expression 1.0 Relative expression Relative expression Actin 0.2 Relative expression 0.2 0.5 0.5 0.2 0.2 0.0 0.0 0.0 0.0 0.0 0.0

WT WT WT WT WT WT Control E545K Control E545K Control E545K Control E545K Control E545K Control E545K

PIK3CA PIK3CA PIK3CA PIK3CA PIK3CA PIK3CA PIK3CA PIK3CA PIK3CA PIK3CA PIK3CA PIK3CA

1.4 RUNX1 1.6 TNFSF10 3.0 2.5 APC Control ELK1 TCF4 3.0 ACTR3 4.5 PSMB4 Control WNT10B Control 1.4 PIK3CA WT PIK3CA WT PIK3CA WT 1.2 4.0 2.5 PIK3CA E545K 2.5 PIK3CA E545K 2.0 1.2 3.5 PIK3CA E545K 1.0 2.0 3.0 2.0 1.0 1.5 0.8 2.5 1.5 1.5 0.8 0.6 2.0 1.0 0.6 1.0 1.5 1.0 Fold enrichment 0.4 Fold enrichment Fold enrichment Fold enrichment

Fold enrichment 0.4 Fold enrichment 1.0 0.5 0.5 0.5 0.2 0.5 0.2 0.0 0.0 0.0 0.0 0.0 0.0 IgG ELK1 IgG ELK1 IgG TCF4 IgG TCF4 IgG FOXO1 IgG FOXO1 Figure 6 | Predicted transcriptional regulatory impacts of activating PIK3CA mutations validated experimentally. (a) Western blot analysis of pAKT (S473), pS6K (T389) and actin in parental MCF7 cells that carry the PIK3CA E545K mutation and in ‘corrected’ WT PIK3CA cells. ELK1 activity in PI3Ka mutant cells. (b) ACTR3 and PSMB4 mRNA expression in parental PIK3CA mutant and PIK3CA WTcells. ChIP assays with control IgG or ELK1 antibodies for ACTR3 and PSMB4 in parental PIK3CA mutant and PIK3CA WT MCF7 cells. The data are presented as fold-enrichment relative to the actin control gene region. TCF4 activity in PI3Ka mutant cells (mean±s.d., n ¼ 3 independent experiments). (c) WNT10B and APC mRNA expression in parental PIK3CA mutant and PIK3CA WT cells. ChIP assays with control IgG or TCF4 antibodies for WNT10B and APC in parental PIK3CA mutant or PIK3CA WT MCF7 cells. The data are presented as fold-enrichment relative to the actin control gene region (mean±s.d., n ¼ 3 independent experiments). FOXO1 activity in PI3Ka mutant cells. (d) RUNX1 and CDK1 mRNA expression in parental PIK3CA mutant and PIK3CA WT MCF7 cells. Parental and WT MCF7 cells were subjected to ChIP assays with control IgG or FOXO1 antibodies. The data are presented as fold-enrichment relative to the actin control gene region (mean±s.d., n ¼ 3 independent experiments). (e) Transfected vector control, WT PIK3CA or PIK3CA E545K Cal27 cells were subjected to western blots with haemagglutinin (HA) and actin antibodies after 48 h of transfection. (f) ACTR3 and PSMB4 mRNA expression in control, PIK3CA WT and PIK3CA E545K cells. ChIP assays with control IgG or ELK1 antibodies in control, WT and PIK3CA E545K cells (mean±s.d., n ¼ 3 independent experiments). (g) WNT10B and APC mRNA expression in control, PIK3CA WT- and PIK3CA E545K-transfected Cal27 cells. ChIP assays with control IgG or TCF4 antibodies in control-, PIK3CA WT- and PIK3CA E545K-transfected Cal27 cells (bottom panel). (h) RUNX1 and TNSF10A mRNA expression in control, PIK3CA WT- and PIK3CA E545K-transfected Cal27 cells. ChIP assays with control IgG or FOXO1 antibodies in control-, WT- and PIK3CA E545K-transfected Cal27 cells. For other TF targets see Supplementary Fig. 21 and Supplementary Fig. 23.

First, we used the parental MCF7 cell line carrying the PIK3CA PIK3CA enhances ELK1 transcriptional activity in BRCA cells E545K mutation and an MCF7 PIK3CA WT cell line in which the (Fig. 6b, bottom panel). mutation was corrected using gene targeting46. Western blotting Well-known TCF4 target genes such as WNT10B, APC, confirmed that WT PIK3CA cells have very low PI3K pathway FBXW11 and PPP2R5E were differentially regulated by the PIK3CA activation compared to mutant parental cells, with strongly E545K mutation in MCF7 cells (Fig. 6c, top panel and reduced levels of phospho (p)-AKT and p-S6K (Fig. 6a and Supplementary Fig. 21b), and ChIP-qPCR analysis confirmed Supplementary Fig. 20). enhanced binding of TCF4 to their promoters in mutant PIK3CA Quantitative RT-qPCR analysis of the well-described ELK1 cells (Fig. 6c, bottom panel and Supplementary Fig. 20b). Similarly, target genes ACTR3, PSMB4 (ref. 47), WNK1, PAPLN, FOXP4 PCR with reverse transcription (RT–qPCR) analysis of FOXO1 and DDX27 confirmed significant increases in mRNA levels in target genes RUNX1, CDK1, CAMKK1 and TNFSF10 (ref. 48) the parental PIK3CA mutant cells compared to PIK3CA WT cells, mRNA confirmed that their mRNA levels were differentially with the exception of PAPLN, where we observed a significant regulated by the PIK3CA E545K mutation (Fig. 6d, top panel and decrease (Fig. 6b, top panel; Supplementary Fig. 21a). Moreover, Supplementary Fig. 21c), and ChIP-qPCR analysis confirmed ChIP-qPCR experiments confirmed that ELK1 binding to all five increased binding of FOXO1 to their promoters in the wild-type target gene promoters was significantly increased in the PIK3CA PIK3CA MCF7 cells relative to parental PIK3CA E545K cells mutant MCF7 compared to WT cells, showing that mutant (Fig. 6d, bottom panel and Supplementary Fig. 21c).

NATURE COMMUNICATIONS | 8:14249 | DOI: 10.1038/ncomms14249 | www.nature.com/naturecommunications 9 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14249

We also performed validation experiments in the head and mutation altered FOXO1 activity in the BRCA model but not in neck cancer cell line Cal27, which is WT for the PIK3CA gene. the head and neck cancer model, consistent with the context- Control, PIK3CA WT or PIK3CA E545K vectors were over- specific predictions of our algorithm. This shows one example of expressed in Cal27, and ChIP-qPCR and RT-qPCR expression how a clinically relevant ‘actionable mutation’ impacts regulatory experiments were performed to investigate the activity of programs in a cancer-specific manner, giving clues about ELK1, TCF4 and FOXO1. Western blotting confirmed successful druggability across tumour types. expression of these vectors in the Cal27 cell line (Fig. 6e and Supplementary Fig. 22). Like in the MCF7 BRCA model, RT-qPCR analysis demon- Discussion strated an increase in the mRNA levels of four known Many targetable alterations are present across multiple tumour ELK1 target genes, ACTR3, PSMB4, WNK1 and DDX27,in types. For example, activating mutations and amplifications of Cal27 cells transfected with PIK3CA E545K compared to Cal27 PIK3CA are targetable by PI3K inhibitors, which are in active cells transfected with WT PIK3CA and control cells (Fig. 6f, top clinical assessment in combination therapies with RTK inhibitors panel and Supplementary Fig. 23a). Increased occupancy of ELK1 and antioestrogen therapies in BRCA, antiandrogen therapy in at target promoters was confirmed by ChIP-qPCR assays only PRAD and MEK inhibitors in many solid tumours60–65. when cells were transfected with the PIK3CA E545K vector However, the cancer-specific context can impact how patients (Fig. 6f, bottom panel and Supplementary Fig. 23a). Thus, ELK1 respond to targeted therapies, since the targeted protein resides in transcriptional activity is enhanced by mutant PIK3CA in both a network of interacting proteins and is subject to extensive head and neck and BRCA models. RT-qPCR analysis feedback and crosstalk between signalling pathways. In recent demonstrated an increase in the mRNA levels of four clinical trials of targeted therapies (for example, Gleevec in TCF4 target genes in Cal27 cells with PIK3CA E545K compared chronic myelogenous leukaemia, herceptin in BRCA, BRAF to WT Cal27 and control cells (Fig. 6g, top panel and inhibitors in melanoma), patients who share the targeted Supplementary Fig. 23b). Increased occupancy of TCF4 at the mutation and tumour type displayed highly variable responses promoters of these genes was confirmed by ChIP assays only to the drugs66. Therefore, a systematic stratification of tumours when cells were transfected with the PIK3CA E545K vector that goes beyond therapeutically actionable alterations and (Fig. 6g, bottom panel and Supplementary Fig. 23b). Thus, incorporates other functional readouts—for example, mutant PIK3CA enhances TCF4 transcriptional activity in head dysregulated TF and (phospho)protein signatures derived from and neck as well as BRCA models. our model—may better predict which patients will benefit from FOXO1 activity was not associated with mutant PIK3CA in our targeted and combination therapies. HNSC model. Indeed, RT-qPCR analysis demonstrated no Using inferred TF/protein activities in tumours may also reveal significant change in the mRNA levels of the FOXO1 target clinically relevant patient subgroups. Patients with endometrioid genes in PIK3CA E545K Cal27 cells compared to WT Cal27 cells carcinomas display heterogeneous clinical courses and response and control cells (Fig. 6h, top panel and Supplementary Fig. 23c). to therapy, despite similar tumour histopathology. Clustering Further, no change in occupancy of FOXO1 at the promoters of UCEC tumours by TF activities revealed a subclass of TNFSF10 and RUNX1 was shown by ChIP assays when cells were endometrioid tumours that correlated with b-catenin mutation transfected with the PIK3CA E545K vector (Fig. 6h, bottom panel status and had poorer survival (Supplementary Fig. 24, Po0.001). and Supplementary Fig. 23c). Linking mutant b-catenin to putative downstream TF effectors ELK1 is phosphorylated through activation of the MAPK/ERK could inform future mechanistic studies—for example, short pathways and translocates to the nucleus, resulting in activation/ hairpin RNA or CRISPR/Cas screening to identify TFs whose repression of downstream targets47,49–52 that are important in deletion/knockdown leads to changes in proliferation—to develop cell proliferation, apoptosis, cell migration and invasion, new therapeutic strategies. and inflammatory response53,54. Immunohistochemistry Previous algorithms to interpret the role of somatic alterations in breast tumour specimens has shown that the levels of have examined enrichment of mutations in known pathways3,4,6 p-ELK1 expression are significantly elevated in luminal and or searched for alterations that represent mutual exclusive Her-2-negative BRCA subtypes55, but how Elk1 is activated in patterns2, subnetworks10,67–69 or modules70. These approaches BRCA is not known. Our computational and experimental results examine co-occurrences of somatic alterations in a known suggest that a potential mechanism for ELK1 activation is protein interaction network without explicitly modelling their through an activating PIK3CA mutation. impact on transcriptional programs or signalling. Several recent TCF4 interacts with b-catenin to mediate Wnt signalling and studies have integrated TF binding site or occupancy data to has been implicated in colorectal tumorigenesis56,57. Our analyses identify cancer-associated TFs, for example, combining tumour- confirm that TCF4 activation may result from an activating specific DNA methylation changes in distal enhancers, mRNA PIK3CA mutation. Recently, a small-molecule inhibitor of the sequencing and cis-regulatory sequences mediating effects on b-catenin/TCF4 interaction called LF3 has been shown to target genes71 or integrating ENCODE TF ChIP-seq profiles with diminish Wnt-dependent biologic characteristics of colon the pancancer TCGA expression data72. However, these cancer cells, inhibit their self-renewal capacity and induce their approaches do not model the relationship between perturbed differentiation58. Since we demonstrated altered transcriptional pathways (for example, from proteomic data) and TF activity, nor activity of TCF4 downstream of mutant PIK3CA in breast and do they consider the impact of somatic alterations on gene head and neck cancer cells, targeting TCF4 might be new regulatory models. therapeutic strategy in PIK3CA mutant patients. Overall, current methods cannot translate the mutational FOXO TFs, including FOXO1, are implicated in the regulation landscape of a tumour into a usable model of affected pathways of stress resistance, metabolism, cell cycle, apoptosis and nor use mutational status to predict accurately response to DNA repair. It is well known that constitutive PI3K-AKT targeted therapies. Our model is designed to capture the causal pathway activation causes downregulation of FOXO tumour flow of information from signalling to TFs to target genes; the suppressor functions in BRCA59. However, regulation of FOXO association analysis is likely to identify causal impacts of target genes is multifactorial, and based on our findings, context- mutations and copy number events, since somatic alterations dependent. Specifically, we showed that an activating PIK3CA usually alter TF activity/signalling rather than vice versa

10 NATURE COMMUNICATIONS | 8:14249 | DOI: 10.1038/ncomms14249 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14249 ARTICLE

(with exception of TFs/signalling pathways involved in We obtained SILAC-based quantitative phosphoproteomic data set of DNA repair). Our analysis revealed both known and putative a spontaneously immortalized non-tumorigenic breast epithelial cell line MCF10A along with two isogenic derivatives generated by knock-in of mutant alleles—one interactions of frequently altered genes with signalling bearing the E545K mutation and the other bearing the H1047R mutation of and transcriptional programs in a pancancer context and the PIK3CA gene—from the originally published Supplementary Data45.We provides a general strategy for future studies. In cases where used a 1.5-fold cutoff value to designate peptides as having increased a mutation is associated with the altered activity of a targetable phosphorylation and a 0.67-fold for decreased phosphorylation (same thresholds as original publication)45. We also obtained human protein microarray-based TF or (phospho)protein, our analysis may suggest combination AKT1 kinase assays from the originally published Supplementary Data45. therapies. We obtained RPPA data for uterine corpus endometrioid carcinoma25,75 and The method we have presented has several limitations. First, head and neck cancer patients28 from the original publications. our analysis uses predicted TF binding sites based on existing TF motif databases and restricted to promoter sequences; Training the AR models. AR is an algorithm for efficiently solving a regularized therefore, the TF motif hit matrix is noisy, incomplete and not bilinear regression problem15,16, defined here as follows. For a data set of M tumour samples profiled using RNA-seq with N genes, we let Y2RNxM be the context-specific. Indeed, due to the strong correlation structure mean-centred log 10 gene expression profiles of tumour samples. Each column between RPPA and mRNA expression data, AR models trained of Y corresponds to an RNA-seq experiment. We define each gene’s TF attributes with the true motif hit matrix achieve only a modest—albeit in a matrix D 2 RNxQ, where each row represents a gene and each column significant—improvement in prediction performance over models represent the hit vector for a TF, that is, the bit vector indicating whether there is trained on randomized motif data (Supplementary Table 1). binding site for the TF in the promoter region of each gene. We define the RPPA attributes of tumour samples as a matrix P 2 RMxS where each row represents a Additionally, since many inferred TF and (phospho)protein tumour sample and each column represents the (mean-centred) log RPPA protein activities are correlated, individual genomic aberrations may be expression profile for the tumour sample. We set up a bilinear regression problem associated with many regulators. This multiplicity of inferred to learn the weight matrix W 2 RQxS on paired of TF signalling protein features: effects may be biologically reasonable but complicates interpreta- DWPT þ e ¼ Y ð1Þ tion. Another methodological challenge is the need to control for the complex background of genomic aberrations. To do this, We can transform the system to an equivalent system of equations by reformulating the matrix products as Kronecker products we used regularized regression with permutation testing to identify a smaller set of somatic alterations with confident DWPT ,ðP DÞ vecðÞW ð2Þ N associations. Still, the problem of selecting a few significant where is a Kronecker product and vec(.) is a vectorizing operator that stacks covariates from a long candidate list given limited sample size is a matrix and produces a vector, yielding a standard (if large-scale) regression inherently difficult with no fail-safe solution. problem. Full details and a derivation of the reduced optimization problem are 16 Despite these limitations, we have presented a principled provided elsewhere . We fit the ridge regression model using the SLEP MATLAB package and evaluate performance with 10-fold cross-validation. integrative strategy for predicting the context-specific impact of Given the (phospho)protein profile of a test tumour sample (centred relative to somatic alterations on transcriptional programs and signalling the mean of the training set), we can right multiply the (phospho)protein expression pathways. Moreover, our predictions generalize to independent vector through the trained model to predict the similarity of its expression profile to patient cohorts and validate experimentally in isogenic cancer cell those of the training tumour samples. To recover a reconstruction of the test gene expression profile from the predicted similarities, we assume that the test expression line models. We anticipate that such integrative statistical profile is in the linear span of the training profiles. Then, a simple transformation modelling strategies will be crucial for personalizing cancer converts the vector of computed similarities into a predicted gene expression therapies. variation profiles16. Finally, to infer the (phospho)protein activity in a new sample from the (centred) gene expression profile, we can left multiply through the model via YTDW and to infer the TF activities in each sample, we can right-multiply Methods the protein expression profiles through the model by WPT. Data and preprocessing. We downloaded RPPA protein expression data from TCPA (http://bioinformatics.mdanderson.org/main/TCPA:Overview). RPPA Significance analysis for TF and (phospho)protein activities. To assess the protein expression data for the UCS study, RNA-seq gene expression data, somatic statistical significance of the inferred (phospho)protein and TF activities obtained mutation data and clinical data were downloaded from TCGA’s Firehose data run from the model via the YTDW and WPT mappings, respectively, we developed an (https://confluence.broadinstitute.org/display/GDAC/Dashboard-Stddata). empirical null model as follows. First, we generated random permutations of the GISTIC copy number data was downloaded from TCGA’s Firehose analyses run gene expression profiles Y for each tumour type. For each permuted Y response (https://confluence.broadinstitute.org/display/GDAC/Dashboard-Analyses). Only matrix, we trained an AR model using true D and P input matrices and computed the samples ‘whitelisted’ by TCGA for the Pan-Cancer Analysis Working Group the corresponding inferred TF and (phospho)protein activities via the YTDW and were used in the study. For our analysis, we restricted to samples with parallel WPT mappings. Using this permutation and model fitting procedure 5000 times, RNA-seq, RPPA, somatic mutation and GISTIC copy number data (Supplementary we generated an empirical null model for TF and (phospho)protein activity Table 14). distribution for each sample. To identify significant regulator activities (R), we Silent mutations were filtered from somatic mutation data. We removed assessed the nominal P value for each sample relative to the empirical null model genes that were not identified as significant (q 0.05) by the MutSigCV73 as o for the particular regulator (TF/(phospho)protein), and we corrected for multiple well as not present at least ten samples in each cancer type. To determine hypothesis testing of non-independent hypotheses using the Benjamini–Hochberg– copy number alteration events, we used the set of discrete copy number calls Yekutieli procedure. Then, we reported the significant regulators using an FDR of provided by GISTIC2 (ref. 74). We considered genes to be altered only in 0.1 for the largest TCGA studies (BRCA, KIRC; 4300 samples), an FDR of 0.15 for samples where they resided either in regions of homozygous loss ( 2) or mid-size studies (BLCA, COADREAD, HNSC, LUAD, LUSC, UCEC, OV, PRAD; high-level amplification (2) among the set of recurrent copy number alterations. sample size o300 and 4100), and an FDR of 0.25 for small studies (GBM, UCS; We removed genes that were not identified as significant (qo0.001) by the o100 samples) as our thresholds for significance. Then, we calculated, for each GISTIC2 (ref. 74) as well as not present at least ten samples in each cancer TF/signalling regulator, the frequency over samples where the regulator passed type. Then, we encoded somatic aberrations as being present/absent. The its significant threshold for a given cancer. We used this approach to identify final selected set of binary calls for genomic alterations provided a simplified significant regulators in each cancer type to identify the shared and cancer-specific but informative description of the somatic alterations observed in individual roles TF/(phospho)protein regulators. tumours. Log 10-transformed RNA-seq RSEM gene expression values for each of the 12 cancer types were processed independently to identify the set of 5,000 genes that Model impact of genomic aberrations in terms of TF/(phospho)protein activity. varied most across samples. Gene expression and protein expression vectors were We used ridge regression to predict each TF/(phospho)protein regulator’s activity both mean-centred. (R) from genomic aberration profiles and used a permutation test approach to To construct the motif hit matrix, we downloaded the TF binding site assign significance to ridge regression coefficients. Somatic alterations were simply predictions (TRANSFAC v.7.4) for all target genes from MSigDB34. We removed encoded as present/absent. In the permutation test, the elements of the outcome motifs with similar sets of targets to reduce redundancy. This matrix defined vector of regulator activities R were randomly permuted across samples, and the a candidate set of regulatory relationships between TFs and target genes. Further, ridge regression model was fitted using the permuted observations to obtain ridge for each of the 12 cancer types, we filtered TFs that were not expressed in at least regression coefficients. By performing 10,000 such permutations, a null distribution 40% of samples (Supplementary Data 1). of the regression coefficients was generated. The permutation test P value was

NATURE COMMUNICATIONS | 8:14249 | DOI: 10.1038/ncomms14249 | www.nature.com/naturecommunications 11 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14249 calculated as the proportion of regression coefficients from the null distribution manufacturer’s instructions. The Applied Biosystems SYBR green mix greater than or equal in absolute value to the absolute value of the coefficient fitted (Life Technologies) was used to amplify specific genes listed in Supplementary to the true (non-permuted) data. Further we corrected for multiple hypothesis Table 15. testing of non-independent hypotheses using the Benjamini–Hochberg–Yekutieli Primers used for mRNA expression were: ACTR3,50-CATTCCTG procedure across all TF/signalling regulators for each cancer, and we multiplied TGGCTGAAGGGT-30 and 50-ATCGCTGCATGTGGTGTGTA-30; FOXP4, these FDR-adjusted P values with the sign of the coefficient from the model to 50-GACCCTGTGTGAAGACCTGG-30 and 50-GTCAGGGGTTTCCAGGATG calculate a final regulatory–genomic aberration association score. For downstream G-30; DDX27,50-TTGGGGAAGGACATCTGTGC-30 and 50-CGGATCCGGA analysis, we restricted our analysis to regulators identified as significant in at least TGAACTCCTG-30 PAPLN,50-AGGTCATCTGTGCCATTGGG-30 and 1% of samples in each cancer and genomic aberrations with FDR-corrected 50-TGTAGAAGCCACTGCCCTTG-30; PSMB4,50-GACATGCTGGGATCCTA Po0.15 across regulators from the ridge regression analysis with permutation test. CGG-30 and 50-CTTTTTCGGTGACAGTGGCG-30; WNK1,50-CTTTTT We also assessed synergistic/antagonistic effects of pairs of genomic aberrations CGGTGACAGTGGCG-30 and 50-CTTGGCTGTTCACTGTTGCC-30; CDK1, on TF/protein activity by building linear models with interaction terms for each 50-ACAGGTCAAGTGGTAGCCATG-30 and 50-GGAGTGCCCAAAGCTC pair of genomic aberrations. We restricted our analysis to pairs of genes that were TGAA-30; CAMKK1,50-CAGGAAGCTATCTGGAGGCG-30 and 50-AAGTA altered in at least in 20 tumour samples as well as comutated in at least 10 tumour CTCGAGGCCCAGGAT-30; TNFSF10, 5-CCTCAGAGAGTAGCAGCTCACA-30 samples. Here the activity of each TF/(phospho)protein regulator (R) was a and 50-CAGAGCCTTTTCATTCTTGGA-30; ACTB,50-CGTCTTCCCCTCCAT modelled function of somatic aberration pairs A and B: CGT-30 and 50-GAAGGTGTGGTGCCAGATTT-30; APC,50-CATTTCCAAGA AGAGGGTTTGT-30 and 50-GATCAGCAAGAAGCAATGACC-30; FBXW11, 0 0 0 Rijk ¼ m þ Ai þ Bj þ ðÞAB ij þ ek ð3Þ 5 -GGCTGCCGTCAATGTAGTAGA-3 and 5 -GTGCTCGTGCTCCAG ACTT-30; PPP2R5E,50-GTGTGTATCTAGCCCCCATTTT-30 and 50-AAACTCA 0 0 where Ai and Bj represent the main effects of the ith and jth values of A and B, TGATGTATTCATTATTCCAA-3 ; WNT10B,5-ATGCGAATCCACAAC 0 0 0 respectively, (AB)ij is the effect of the interaction in that combination, and ek is the AACAG-3 and 5 -TCCAGCATGTCTTGAACTGG-3 . error term of the kth observation in that combination. Here the genomic aberrations are binarized (present/absent), so the values are 1 and 0. We first looked for regulator models for which the coefficient of the interaction Chromatin immunoprecipitation. MCF7 and Cal27 cell lines were crosslinked term was significant (Po0.05). If the interaction term was significant, and if its with 1% formaldehyde for 10 min at room temperature and quenched with coefficient was greater than zero and greater than the coefficients of the genomic 125 mM glycine for 5 min at room temperature. Cell were lysed and sheared aberration pair, we assumed this pair of somatic aberrations had a synergistic effect to obtain chromatin fragments of 200–500 bp. Sheared chromatin was on regulator activity. If the interaction term was significant and its coefficient was incubated overnight with 2 mg of rabbit monoclonal ChIP grade antibody to less than zero and less than the coefficients of the genomic aberration pair, we ELK1 (E277, ab32106; Abcam) as has been previously used by Zhang et al.76 assumed this pair of somatic aberrations had an antagonistic effect on regulator 2 mg of a goat polyclonal antibody to TCF4 (N-20, sc-8631; Santa Cruz) as has activity. been previously used by Ding et al.77 and 2 mg of rabbit polyclonal FOXO1 (H-108, sc-11350; Santa Cruz) as has been previously used by Xiong et al.78 Protein G magnetic beads were used to capture antibody–chromatin association Survival analyses. We built Cox proportional hazard regression models for overnight, followed by sequential washes. The antibody bound beads were then regulators that attained significance by our empirical P value procedure in at least reverse crosslinked for 6 h at 65 °C, followed by proteinase K treatment at 55 °C for 5% of samples using (1) their inferred TF activity profiles and (2) their gene 1 h. The ChIP DNA was purified using a DNA Purification Kit from Qiagen. The expression profiles corresponding to TFs. We used clinical stage as a background Applied Biosystems SYBR green mix was used to amplify specific regions factor for BLCA and KIRC and histological subtype as a background factor for (Supplementary Table 16). UCEC. Overall survival was calculated from the date of initial diagnosis of cancer- to disease-specific death (patients whose vital status is termed dead) and months to last follow-up (for patients who are alive). Further, we evaluated prognostic Western blot analysis. Cells were lysed and proteins were extracted in RIPA accuracy of survival models using a log-rank test. We corrected for multiple buffer that was supplemented with phosphatase and protease inhibitors. Proteins hypothesis testing of non-independent hypotheses using the FDR procedure across were separated by SDS–polyacrylamide gel electrophoresis gels and transferred all models in each cancer type. FDR-corrected P values of models built from to a PVDF (polyvinylidene difluoride) membrane. Membranes with blocked with inferred TF activities and actual TF mRNA expression profiles were compared 5% bovine serum albumin and probed using specific antibodies. Actin (1:2,000), using one-sided paired Wilcoxon’s signed-rank test. pAKT (S473) (1:1,000), pS6K (T389) (1:1,000), HA (1:1,000) were all from Cell For the validation set, we used the TCGA-trained UCEC model to infer Signaling Technology (CST). TF activity profiles of MDACC (n ¼ 178) and Bergen (n ¼ 209) data sets75 from their RPPA profiles. Since these data sets do not include serous endometrial carcinoma samples, we build univariate Cox models with just TCGA endometrioid Data availability. RPPA protein expression data is available in a public repository endometrial carcinoma patients. We first identified TFs with univariate Cox from TCPA (http://bioinformatics.mdanderson.org/main/TCPA:Overview). RPPA Po0.05 on the TCGA patients. Then, we predicted the risk for each patient in the protein expression data for the UCS study, RNA-seq gene expression data, somatic validation set and calculating concordance index. We reported the P value for the mutation data and clinical data are available in a public repository from TCGA’s statistical test if the concordance index estimate was different from 0.5. Firehose data run (https://confluence.broadinstitute.org/display/GDAC/Dash- For visualization, Kaplan–Meier survival analysis was used to show the board-Stddata). GISTIC copy number data is available in a public repository from association of the inferred TF activity with patient survival. For each selected TCGA’s Firehose analyses run (https://confluence.broadinstitute.org/display/ TF and cancer-type combination, each patient’s risk was calculated, and patients GDAC/Dashboard-Analyses). Only the samples ‘whitelisted’ by TCGA for the were ranked in descending order. We designated the top 40% of the patients as the Pan-Cancer Analysis Working Group were used in the study. For our analysis, we high-risk group and the bottom 40% as the low-risk group. restricted to samples with parallel RNA-seq, RPPA, somatic mutation and GISTIC copy number data (Supplementary Table 14 and Supplementary Data 1). Statistical analysis. Statistical tests were performed with the R statistical The authors declare that all data supporting the findings of this study are environment. For population comparisons of inferred TF and (phospho)protein available within the article and its Supplementary Information files or from the activities, we performed two-tailed Wilcoxon’s signed-rank tests and determined corresponding author on reasonable request. the direction of shifts by comparing the mean of two populations. References Cell lines and transfection. The BRCA cell line, MCF7, which has a PIK3CA 1. Hanahan, D. & Weinberg, R. A. The hallmarks of cancer. Cell 100, 57–70 E545K mutation, and the targeted correction of the E545K mutation to (2000). WT PIK3CA were obtained from the Lauring Lab46. The head and neck cancer cell 2. Ciriello, G., Cerami, E., Sander, C. & Schultz, N. Mutual exclusivity analysis line, Cal27, was obtained from the American Type Culture Collection. The cell identifies oncogenic network modules. Genome Res. 22, 398–406 (2012). lines have been tested negative for mycoplasma contamination. Parental and 3. Cancer Genome Atlas Research, N. Comprehensive genomic characterization WT PIK3CA MCF7 and Cal27 were maintained in Dulbecco’s modified Eagle’s defines human glioblastoma genes and core pathways. Nature 455, 1061–1068 medium (DMEM/DF12) supplemented with 5% foetal bovine serum and (2008). 1 1 100 U ml penicillin and 100 mgml streptomycin. Cal27 cell lines were 4. Cancer Genome Atlas, N. Comprehensive molecular characterization of human transfected with pbabe control vector, pbabe WT PIK3CA and pbabe E545K colon and rectal cancer. Nature 487, 330–337 (2012). PIK3CA vectors (Addgene) using Lipofectamine 3000 according to the 5. Cancer Genome Atlas, N. Comprehensive genomic characterization of head manufacturer’s instructions. and neck squamous cell carcinomas. Nature 517, 576–582 (2015). 6. Cancer Genome Atlas Research, N. Integrated genomic analyses of ovarian RNA extraction and quantitative real-time PCR. Total RNA was extracted carcinoma. Nature 474, 609–615 (2011). from MCF7 and Cal27 cell lines using an RNA Extraction Kit from Qiagen. 7. Cancer Genome Atlas Research, N. Comprehensive genomic characterization cDNA synthesis was performed using iScript from Bio-Rad, according to the of squamous cell lung cancers. Nature 489, 519–525 (2012).

12 NATURE COMMUNICATIONS | 8:14249 | DOI: 10.1038/ncomms14249 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14249 ARTICLE

8. Cancer Genome Atlas Research, N. Comprehensive molecular characterization 38. Engelman, J. A. Targeting PI3K signalling in cancer: opportunities, challenges of clear cell renal cell carcinoma. Nature 499, 43–49 (2013). and limitations. Nat. Rev. Cancer 9, 550–562 (2009). 9. Cancer Genome Atlas Research, N. et al. Integrated genomic characterization 39. Janku, F. et al. Assessing PIK3CA and PTEN in early-phase trials with of endometrial carcinoma. Nature 497, 67–73 (2013). PI3K/AKT/mTOR inhibitors. Cell Rep. 6, 377–387 (2014). 10. Cerami, E., Demir, E., Schultz, N., Taylor, B. S. & Sander, C. Automated 40. Joshi, A., Miller, Jr. C., Baker, S. J. & Ellenson, L. H. Activated mutant network analysis identifies core pathways in glioblastoma. PLoS ONE 5, e8918 p110alpha causes endometrial carcinoma in the setting of biallelic Pten (2010). deletion. Am. J. Pathol. 185, 1104–1113 (2015). 11. Hofree, M., Shen, J. P., Carter, H., Gross, A. & Ideker, T. Network- 41. Britschgi, A. et al. JAK2/STAT5 inhibition circumvents resistance to based stratification of tumor mutations. Nat. Methods 10, 1108–1115 PI3K/mTOR blockade: a rationale for cotargeting these pathways in metastatic ð2013Þ: breast cancer. Cancer Cell 22, 796–811 (2012). 12. Costello, J. C. et al. A community effort to assess and improve drug sensitivity 42. Cohen-Solal, K. A., Boregowda, R. K. & Lasfar, A. RUNX2 and the PI3K/AKT prediction algorithms. Nat. Biotechnol. 32, 1202–1212 (2014). axis reciprocal activation as a driving force for tumor progression. Mol. Cancer 13. Hyman, D. M. et al. Vemurafenib in multiple nonmelanoma cancers with 14, 137 (2015). BRAF V600 mutations. N. Engl. J. Med. 373, 726–736 (2015). 43. Cabodi, S. et al. Convergence of integrins and EGF receptor signaling via 14. Juric, D. et al. in Proc. 103rd Meeting of AACR CT-01 (2012). PI3K/Akt/FoxO pathway in early gene Egr-1 expression. J. Cell. Physiol. 218, 15. Osmanbeyoglu, H. U., Pelossof, R., Bromberg, J. F. & Leslie, C. S. Linking 294–303 (2009). signaling pathways to transcriptional programs in breast cancer. Genome Res. 44. Ibrahim, Y. H. et al. PI3K inhibition impairs BRCA1/2 expression and 24, 1869–1880 (2014). sensitizes BRCA-proficient triple-negative breast cancer to PARP inhibition. 16. Pelossof, R. et al. Affinity regression predicts the recognition code of nucleic Cancer Discov. 2, 1036–1047 (2012). acid-binding proteins. Nat. Biotechnol. 33, 1242–1249 (2015). 45. Wu, X. et al. Activation of diverse signalling pathways by oncogenic PIK3CA 17. Catalano, S. et al. Evidence that leptin through STAT and CREB signaling mutations. Nat. Commun. 5, 4961 (2014). enhances D1 expression and promotes human endometrial cancer 46. Beaver, J. A. et al. PIK3CA and AKT1 mutations have distinct effects on proliferation. J. Cell Physiol. 218, 490–500 (2009). sensitivity to targeted pathway inhibitors in an isogenic luminal breast cancer 18. Ishida, M. et al. Activation of aryl hydrocarbon receptor promotes invasion of model system. Clin. Cancer Res. 19, 5413–5422 (2013). clear cell renal cell carcinoma and is associated with poor prognosis and 47. Odrowaz, Z. & Sharrocks, A. D. ELK1 uses different DNA binding modes to cigarette smoke. Int. J. Cancer 137, 299–310 (2015). regulate functionally distinct classes of target genes. PLoS Genet. 8, e1002694 19. Takaha, N. et al. Expression and role of HMGA1 in renal cell carcinoma. (2012). J. Urol. 187, 2215–2222 (2012). 48. Vasquez, Y. M. et al. FOXO1 is required for binding of PR on IRF4, novel 20. Park, J. T., Shih, Ie, M. & Wang, T. L. Identification of Pbx1, a potential transcriptional regulator of endometrial stromal decidualization. Mol. , as a Notch3 target gene in ovarian cancer. Cancer Res. 68, 8852–8860 Endocrinol. 29, 421–433 (2015). (2008). 49. Boros, J. et al. Elucidation of the ELK1 target gene network reveals a role in the 21. Kikugawa, T. et al. PLZF regulates Pbx1 transcription and Pbx1-HoxC8 coordinate regulation of core components of the gene regulation machinery. complex leads to androgen-independent prostate cancer proliferation. Prostate Genome Res. 19, 1963–1973 (2009). 66, 1092–1099 (2006). 50. Gille, H. et al. ERK phosphorylation potentiates Elk-1-mediated 22. Magnani, L., Ballantyne, E. B., Zhang, X. & Lupien, M. PBX1 genomic pioneer ternary complex formation and transactivation. EMBO J. 14, 951–962 function drives ERalpha signaling underlying progression in breast cancer. (1995). PLoS Genet. 7, e1002368 (2011). 51. Odrowaz, Z. & Sharrocks, A. D. The ETS transcription factors ELK1 and 23. Magnani, L. et al. The pioneer factor PBX1 is a novel driver of metastatic GABPA regulate different gene networks to control MCF10A breast epithelial progression in ERalpha-positive breast cancer. Oncotarget 6, 21878–21891 cell migration. PLoS ONE 7, e49892 (2012). (2015). 52. Booy, E. P., Henson, E. S. & Gibson, S. B. Epidermal growth factor 24. Kim, T. H. et al. Forkhead box O-class 1 and forkhead box G1 as regulates Mcl-1 expression through the MAPK-Elk-1 signalling pathway prognostic markers for bladder cancer. J. Korean Med. Sci. 24, 468–473 contributing to cell survival in breast cancer. Oncogene 30, 2367–2378 ð2009Þ: (2011). 25. Liang, H. et al. Whole-exome sequencing combined with functional genomics 53. Hipskind, R. A., Rao, V. N., Mueller, C. G., Reddy, E. S. & Nordheim, A. reveals novel candidate driver cancer genes in endometrial cancer. Genome Res. Ets-related protein Elk-1 is homologous to the c-fos regulatory factor p62TCF. 22, 2120–2129 (2012). Nature 354, 531–534 (1991). 26. Chen, J. Signaling pathways in HPV-associated cancers and therapeutic 54. Wyrzykowska, P., Stalinska, K., Wawro, M., Kochan, J. & Kasza, A. Epidermal implications. Rev. Med. Virol. 25(suppl 1): 24–53 (2015). growth factor regulates PAI-1 expression via activation of the transcription 27. Hu, Z. et al. Genome-wide profiling of HPV integration in cervical cancer factor Elk-1. Biochim. Biophys. Acta 1799, 616–621 (2010). identifies clustered genomic hot spots and a potential microhomology- 55. Laliotis, A. et al. Immunohistochemical study of pElk-1 expression in human mediated integration mechanism. Nat. Genet. 47, 158–163 (2015). breast cancer: association with breast cancer biologic profile and 28. Sewell, A. et al. Reverse-phase protein array profiling of oropharyngeal cancer clinicopathologic features. Breast 22, 89–95 (2013). and significance of PIK3CA mutations in HPV-associated head and neck 56. Zhang, T. et al. Evidence that APC regulates survivin expression: a possible cancer. Clin. Cancer Res. 20, 2300–2311 (2014). mechanism contributing to the stem cell origin of colon cancer. Cancer Res. 61, 29. Westra, W. H. et al. Inverse relationship between human papillomavirus-16 8664–8667 (2001). infection and disruptive p53 gene mutations in squamous cell carcinoma of the 57. Schepeler, T. et al. Attenuation of the beta-catenin/TCF4 complex in head and neck. Clin. Cancer Res. 14, 366–369 (2008). colorectal cancer cells induces several growth-suppressive microRNAs that 30. Scheffner, M., Takahashi, T., Huibregtse, J. M., Minna, J. D. & Howley, P. M. target cancer promoting genes. Oncogene 31, 2750–2760 (2012). Interaction of the human papillomavirus type 16 E6 oncoprotein with wild-type 58. Fang, L. et al. A small-molecule antagonist of the beta-catenin/TCF4 interaction and mutant human p53 proteins. J. Virol. 66, 5100–5105 (1992). blocks the self-renewal of cancer stem cells and suppresses tumorigenesis. 31. Carro, M. S. et al. The transcriptional network for mesenchymal transformation Cancer Res. 76, 891–901 (2016). of brain tumours. Nature 463, 318–325 (2010). 59. Bullock, M. FOXO factors and breast cancer: outfoxing endocrine resistance. 32. Ignatiadis, M. & Sotiriou, C. Luminal breast cancer: from biology to treatment. Endocr. Relat. Cancer 23, R113–R130 (2016). Nat. Rev. Clin. Oncol. 10, 494–506 (2013). 60. Baselga, J. et al. Everolimus in postmenopausal hormone-receptor-positive 33. Reis-Filho, J. S. et al. Is TTF1 a good immunohistochemical marker to advanced breast cancer. N. Engl. J. Med. 366, 520–529 (2012). distinguish primary from metastatic lung adenocarcinomas? Pathol. Res. Pract. 61. Beaver, J. A. & Park, B. H. The BOLERO-2 trial: the addition of everolimus to 196, 835–840 (2000). exemestane in the treatment of postmenopausal hormone receptor-positive 34. Liu, Y. et al. Clinical significance of CTNNB1 mutation and Wnt pathway advanced breast cancer. Fut. Oncol. 8, 651–657 (2012). activation in endometrioid endometrial carcinoma. J. Natl Cancer Inst. 106, 62. Ma, C. X. et al. A phase 1 trial of BKM120 (Buparlisib) in combination dju245 (2014). with fulvestrant in postmenopausal women with positive 35. Cowey, C. L. & Rathmell, W. K. VHL gene mutations in renal cell carcinoma: metastatic breast cancer.. Clin. Cancer Res. 22, 1583–1591 (2015). role as a biomarker of disease outcome and drug efficacy. Curr. Oncol. Rep. 11, 63. Bedard, P. L. et al. A phase Ib dose-escalation study of the oral pan-PI3K 94–101 (2009). inhibitor buparlisib (BKM120) in combination with the oral MEK1/2 inhibitor 36. Leinonen, H. M., Kansanen, E., Polonen, P., Heinaniemi, M. & Levonen, A. L. trametinib (GSK1120212) in patients with selected advanced solid tumors. Clin. Role of the Keap1-Nrf2 pathway in cancer. Adv. Cancer Res. 122, 281–320 Cancer Res. 21, 730–738 (2015). (2014). 64. Shah, P. D. et al. Phase I trial of daily PI3Ka inhibitor BYL719 plus letrozole (L) 37. Cantley, L. C. The phosphoinositide 3-kinase pathway. Science 296, 1655–1657 or exemestane (E) for patients (pts) with hormone receptor-positive (HR þ ) (2002). metastatic breast cancer (MBC).. J. Clin. Oncol. 32, 5s (2014).

NATURE COMMUNICATIONS | 8:14249 | DOI: 10.1038/ncomms14249 | www.nature.com/naturecommunications 13 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14249

65. Rodon, J., Dienstmann, R., Serra, V. & Tabernero, J. Development of PI3K Acknowledgements inhibitors: lessons learned from early clinical trials. Nat. Rev. Clin. Oncol. 10, We thank Jacqueline Bromberg, Douglas Levine, Chris Sander, Christopher E. Barbieri, 143–153 (2013). Petar Jelinic and James Hsieh for helpful discussions. The results published here are in 66. Sharma, S. V., Haber, D. A. & Settleman, J. Cell line-based platforms to evaluate whole or part based on data generated by The Cancer Genome Atlas pilot project the therapeutic efficacy of candidate anticancer agents. Nat. Rev. Cancer 10, established by the NCI and NHGRI (accession number: phs000178.v7p6). Information 241–253 (2010). about TCGA and the investigators and institutions that constitute the TCGA research 67. Vandin, F., Upfal, E. & Raphael, B. J. Algorithms for detecting network can be found at http://cancergenome.nih.gov/. This work was supported by an significantly mutated pathways in cancer. J. Comput. Biol. 18, 507–522 award from NCI R21 CA205819. H.U.O. is supported by NCI award K99 CA207871. (2011). E.T. holds a fellowship from the Terri Brodeur Breast Cancer Foundation. 68. Vaske, C. J. et al. Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics 26, Author contributions i237–i245 (2010). H.U.O. performed all computational experiments and analyses, helped to develop the 69. Chen, J. C. et al. Identification of causal genetic drivers of human disease algorithmic approaches and helped to write the paper. E.T. and C.C. performed the through systems-level analysis of regulatory networks. Cell 159, 402–414 experimental validation and helped to write the experimental validation section. (2014). J.B. supervised the experimental validation. C.S.L. supervised the project, helped to 70. Akavia, U. D. et al. An integrated approach to uncover drivers of cancer. Cell develop the algorithmic approaches and helped to write the paper. 143, 1005–1017 (2010). 71. Yao, L., Shen, H., Laird, P. W., Farnham, P. J. & Berman, B. P. Inferring regulatory element landscapes and transcription factor networks from cancer Additional information methylomes. Genome Biol. 16, 105 (2015). Supplementary Information accompanies this paper at http://www.nature.com/ 72. Jiang, P., Freedman, M. L., Liu, J. S. & Liu, X. S. Inference of transcri- naturecommunications ptional regulation in cancers. Proc. Natl Acad. Sci. USA 112, 7731–7736 ð2015Þ: Competing financial interests: The authors declare no competing financial interests. 73. Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for Reprints and permission information is available online at http://npg.nature.com/ new cancer-associated genes. Nature 499, 214–218 (2013). reprintsandpermissions/ 74. Mermel, C. H. et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome How to cite this article: Osmanbeyoglu, H. U et al. Pancancer modeling predicts Biol. 12, R41 (2011). the context-specific impact of somatic mutations on transcriptional programs. 75. Yang, J. Y. et al. Integrative protein-based prognostic model for early- Nat. Commun. 8, 14249 doi: 10.1038/ncomms14249 (2017). stage endometrioid endometrial cancer. Clin. Cancer Res. 22, 513–523 Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in ð2016Þ: published maps and institutional affiliations. 76. Zhang, L., Yang, S. H. & Sharrocks, A. D. Rev7/MAD2B links c-Jun N-terminal protein kinase pathway signaling to activation of the transcription factor Elk-1. Mol. Cell. Biol. 27, 2861–2869 (2007). This work is licensed under a Creative Commons Attribution 4.0 77. Ding, Z. Y. et al. Smad6 suppresses the growth and self-renewal of hepatic International License. The images or other third party material in this progenitor cells. J. Cell Physiol. 229, 651–660 (2014). article are included in the article’s Creative Commons license, unless indicated otherwise 78. Xiong, S., Salazar, G., Patrushev, N. & Alexander, R. W. FoxO1 mediates an in the credit line; if the material is not included under the Creative Commons license, autofeedback loop regulating SIRT1 expression. J. Biol. Chem. 286, 5289–5299 users will need to obtain permission from the license holder to reproduce the material. (2011). To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ 79. Bindea, G. et al. ClueGO: a Cytoscape plug-in to decipher functionally grouped and pathway annotation networks. Bioinformatics 25, 1091–1093 (2009). r The Author(s) 2017

14 NATURE COMMUNICATIONS | 8:14249 | DOI: 10.1038/ncomms14249 | www.nature.com/naturecommunications