<<

bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

An integrative somatic analysis to identify pathways linked with survival outcomes across 19 types Sunho Park1, Seung-Jun Kim2, Donghyeon Yu1, Samuel Pena-Llopis3,4, Jianjiong Gao5, Jin Suk Park1, Hao Tang1, Beibei Chen1, Jiwoong Kim1, Jessie Norris1, Xinlei Wang6, Min Chen7, Minsoo Kim1, Jeongsik Yong8, Zabi WarDak4,9, Kevin S Choe4,9, Michael Story4,9, Timothy K. Starr10,11, Jaeho Cheong†12, Tae Hyun Hwang†1,4

1Department of Clinical Sciences, University of Texas Southwestern Medical Center,

Dallas, Texas, United States of America

2Department of Computer Science and Electrical Engineering, University of Maryland

at Baltimore, Baltimore County, Maryland, United States of America

3Department of Internal , University of Texas Southwestern Medical Center,

Dallas, Texas, United States of America

4Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical

Center, Dallas, Texas, USA.

5Center for Molecular Biology, Memorial Sloan Kettering Cancer Center, New York,

New York, 10065, United States of America

6Department of Statistical Science, Southern Methodist University, Dallas, Texas,

United States of America

7Department of Mathematical Sciences, University of Texas at Dallas, Dallas, Texas,

United States of America

8Department of Biochemistry, Molecular Biology and Biophysics, University of

Minnesota Twin Cities, Minneapolis, Minnesota, United States of America

9Department of Radiation , University of Texas Southwestern Medical

Center, Dallas, Texas, United States of America

10Masonic Cancer Center, University of Minnesota – Twin Cities, Minneapolis,

Minnesota, Unites States of America

- 1 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

11Department of , Gynecology & Women's Health and Department of

Genetics, Cell Biology and Development, University of Minnesota, Minneapolis,

Minnesota, United States of America

12Department of , Yonsei University College of Medicine, Seoul, Korea

†Corresponding author

Email addresses:

SP: [email protected]

SK: [email protected]

DY: [email protected]

SPL: [email protected]

JG: [email protected]

JP:[email protected]

HT: [email protected]

BC: [email protected]

JK: [email protected]

JN: [email protected]

XW: [email protected]

MC: [email protected]

MK: [email protected]

JY:[email protected]

ZW:[email protected]

KC: [email protected]

MS: [email protected]

TS: [email protected]

- 2 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

JC: [email protected]

THH: [email protected]

Abstract Identification of altered pathways that are clinically relevant across human is a key challenge in cancer genomics. We developed a network-based algorithm to integrate somatic mutation data with networks and pathways, in order to identify pathways altered by somatic across cancers. We applied our approach to The Cancer Atlas (TCGA) dataset of somatic mutations in 4,790 cancer patients with 19 different types of malignancies. Our analysis identified cancer-type- specific altered pathways enriched with known cancer-relevant and drug targets. Consensus clustering using datasets that included 4,870 patients from TCGA and multiple independent cohorts confirmed that the altered pathways could be used to stratify patients into subgroups with significantly different clinical outcomes. Of particular significance, certain patient subpopulations with poor prognosis were identified because they had specific altered pathways for which there are available targeted . These findings could be used to tailor and intensify in these patients, for whom current therapy is suboptimal.

Background In the last few years, studies using high-throughput technologies have highlighted the

fact that the development and progression of cancer hinges on somatic alterations.

These somatic alterations may disrupt gene function, such as activating or

inactivating tumor suppressor genes, and thus, dysregulate critical pathways

contributing to tumorigenesis. Therefore, precise identification and understanding of

disrupted pathways may provide insights into therapeutic strategies and the

development of novel agents. Many large-scale cancer genomics studies, such as The

Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium

(ICGC), have performed an integrated analysis to draft an overview of somatic

alterations in the cancer genome1-4. Many of these studies have reported novel

- 3 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

candidate cancer genes mutated at high and intermediate frequencies in a specific

cancer as well as across many cancer types4. However, it is a still challenge to

translate somatic mutations in tumors into the pathway model to accurately predict

patient clinical outcomes 5, 6. Recently, in order to improve the clinical relevance and

utility of somatic mutation analyses, Hopfree et al7 proposed integrating somatic

mutation data with molecular interaction networks for patient stratification. They

demonstrated that inclusion of prior knowledge, captured in molecular interaction

networks, could improve identification of patient subgroups with significantly

different histological, pathological or clinical outcomes and discover novel cancer-

related pathways or subnetworks. In a similar manner, other network-based methods

have demonstrated that incorporating molecular networks and/or biological pathways

can improve accuracy in identifying cancer-related pathways8-11.

One limitation of these network-based methods is that they are not designed to fully

utilize large-scale somatic mutation data from multiple cancer types to determine

which particular pathways are altered by somatic mutations across a range of human

cancers. In addition, due to the incomplete knowledge of existing gene set and/or

pathway database, these methods are limited to detect pathways based on a number of

altered genes annotated in existing gene set and pathway databases. Alternatively, the

methods that build pathways de novo without incorporating biological prior

knowledge can be applicable to detect altered pathways, but these methods were not

designed to detect cancer-type specific or commonly altered pathways either.

To address these, we developed an algorithm named NTriPath (Network regularized

non-negative TRI matrix factorization for PATHway identification) to integrate

somatic mutation, gene-gene interaction networks and gene set or pathway databases

to discover pathways altered by somatic mutations in 4,790 cancer patients with 19

- 4 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

different types of cancers. Incorporating existing gene set or pathway databases

enables NTriPath to report a list of altered pathways across cancers, and make easy to

determine/compare which particular pathways are altered in a particular cancer

type(s). In particular, the use of large-scale genome-wide somatic mutations from

4,790 cancer patients and gene-gene interaction networks enables NTriPath to classify

genes, which were not annotated in existing gene set or pathway databases, as new

member genes of the identified altered pathways based on modular structures of

mutational data within a cancer type and/or across multiple cancer types (using matrix

factorization) and connectivity in the gene-gene interaction networks. The questions

that we investigate here are: first, whether large-scale integrative somatic mutation

analysis can reliably identify cancer-type-specific or commonly altered pathways by

somatic mutations across cancers; second, whether the identified pathways can be

used as a prognostic biomarker for patient stratification - with the assumption that the

altered pathways contribute to cancer development and progression and, thus, impact

survival.

In these experiments, we demonstrated that the cancer-type-specific and commonly

altered pathways identified by NTriPath are biologically relevant to the corresponding

cancer type and are associated with patient survival outcomes. We also showed that

cancer-specific altered pathways are enriched with many known cancer-relevant

genes and targets of available drugs including those already FDA-approved. These

results imply that the cancer-specific altered pathways can guide therapeutic strategy

to target the altered pathways that are pivotal in each cancer type.

- 5 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Results

NTriPath: An integrative somatic mutation analysis for discovering pathways altered by somatic mutations across multiple cancer types NTriPath integrates somatic mutations with gene-gene interaction networks and a

pathway database to discover altered pathways across human cancers. We collected

somatic mutation data from TCGA for 4,790 patients and 19 different cancer types

(Table 1). A diagrammatic description of our algorithm is depicted in Figure 1. Four

types of data were used as input for our algorithm. First, we generated a binary matrix

(!) of patients x genes, with ‘1’ indicating a mutation and ‘0’ no mutation. Second,

we constructed gene-gene interaction networks (!). Third, we incorporated a pathway

12 database (!! ) (e.g., conserved 4,620 subnetworks across species ). Fourth, we

included clinical data on the patient's tumor type (!). NTriPath produces two matrices

as output; 1) altered pathways by mutated genes (!) and 2) altered pathways by

cancer type matrix (!). The use of both large-scale somatic mutation profiles and

gene-gene interaction networks enabled NTriPath to identify cancer-related pathways

containing known cancer genes mutated at different frequencies across cancers with

newly added member genes according to high network connectivity (!) 8, 13. Finally

we use the altered pathways by cancer type matrix (!) to identify altered pathways

that are specific for each cancer type. For further details, please see the Materials and

Methods section. Our method is also available at www.taehyunlab.org.

NTriPath identifies cancer-type-specific altered pathways that are biologically and clinically relevant In each cancer type, we selected the top 3 ranked altered pathways by

statistical significance from NTriPath with the 4,620 subnetwork modules to generate

cancer-type-specific altered pathways (See Material and Method section and

Supplementary Table 1). Interestingly, NTriPath was able to find altered pathways

containing not only genes that were frequently mutated but also genes that were - 6 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

mutated in a small subset of patients in each cancer type (Supplementary Table 2 and

Supplementary Figure 1). Gene set enrichment analysis using the genes from the top 3

altered pathways showed that the altered pathways are significantly enriched with

well-known cancer-related genes from COSMIC database 14 and known drug target

genes as well as cancer-relevant biological processes (Supplementary Table 3 and 4).

Focusing on renal clear cell (KIRC) as a proof of concept,

NTriPath identified the pathway consisting of VHL, USP33, DIO2, TCEB1 and

TCEB2 as the top-ranked altered pathway in KIRC (Figure 2A). The VHL (von-

Hippel Lindau) gene is a well-known tumor suppressor associated with KIRC, and is

frequently mutated in patients with KIRC15-18. VHL was the most frequently mutated

gene in TCGA KIRC with 55.7% of patients harboring mutations in the gene. TCEB1

is mutated at very low frequency in TCGA KIRC cohort. A recent study found that

TCEB1 is mutated in about 3% of the KIRC patients without VHL inactivation, and

found TCEB1 preventing the binding of Elongin C to VHL, which inactivates the VHL

pathway16. The second highest ranked pathway contained EP300 and TP53. EP300

and TP53 were mutated in 8.1% and 5.2% of patients, respectively. EP300 has been

identified as a co-activator of hypoxia-inducible factor 1 alpha (HIF1α), whose

activation is a hallmark of KIRC tumors. TP53 was previously found to be associated

with poor outcome in TCGA KIRC19. The third highest ranked pathway contains

LRP1 and matrix metalloproteinases (MMP1, MMP7, MMP9, MMP26). LRP1 is

mutated in 10% of TCGA KIRC cohort, but matrix metalloproteinases (MMPs) were

not mutated in TCGA KIRC cohort. Biological and clinical relevance of LRP1

mutation in KIRC has not been previously reported. MMPs have been implicated in

different types of cancer progression including the acquisition of invasive and

metastatic properties in many cancer types. The aberrant expression of MMPs has

- 7 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

been associated with poor patient survival and prognosis in KIRC patients 8, 20.

Interestingly, recent studies suggested that LRP1 induces the expression of matrix

metalloproteinase (MMPs) and thus promotes cancer cell invasion and metastasis in

many cancers including KIRC16,21-23.

NTriPath identified many new member genes in the top ranked pathways

including TCEB2, JUN, and SP1 as well as other tumor suppressors such as CREBBP,

SMAD3, BRCA1 and RB1. These newly identified member genes by NTriPath were

mutated at a very low frequency or not mutated at all in TCGA KIRC patients.

Instead, these genes interacted with many frequently mutated genes in the networks

and were often dysregulated at the mRNA and levels in many KIRC patients

(Figure 2B). For example, TCEB2, SP1 and JUN were not mutated but yet their

expression was dysregulated in 7%, 10% and 2% of TCGA KIRC patients,

respectively. Previous studies have shown that dysregulation in TCEB2 is expected to

disrupt the protein complex that ubiquitinates HIF1α, resulting in the same phenotype

as VHL inactivation by mutation or promoter hypermethylation24-26. In addition, SP1

and JUN were previously identified as major transcriptional regulators associated with

signaling circuit to promote tumor growth and invasion in KIRC 18.

Taken together, these results demonstrate that NTriPath is an effective tool to

accurately identify cancer-specific altered pathways including known cancer genes

mutated at a high or intermediate frequency in the patients, as well as genes mutated

at a very low frequency or not mutated at all yet may be fundamental role in

development and/or progression of KIRC.

Cancer-type-specific altered pathways across cancer types correlate with patient survival outcomes We hypothesized that cancer-type-specific altered pathways reflect the molecular

basis underlying the patient clinical outcomes. This would allow us to use member

- 8 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

genes in the altered pathways as gene signatures to stratify patients into subgroups

with different clinical outcomes for each type of cancer. We first collected a dataset

consisting of gene expression profiles from 3,656 patients with their survival

information from TCGA cohorts. We then used member genes in the top 3 ranked

cancer-type-specific altered pathways to perform consensus clustering for each cancer

type (see Material and Method section). We generated Kaplan-Meier (KM) curves

based on the groups produced by consensus clustering and found that patient survival

was significantly different among the groups (Figure 3 and Supplementary Figure 2).

In TCGA KIRC, we found three patient subgroups (A, B and C), with the Group C

having the poorest survival. A log-rank test indicated that Groups A and C had

significantly different survival outcomes (Log-rank test p-value = 1.840e-08, Hazard

ratio = 2.94) with median survival times of 41.9 months for group A compared to 30.8

months for group C (Figure 3A). Other examples are Bladder Urothelial Carcinoma

(BLCA), Head and Neck (HNSC), and Skin Cutaneous

Melanoma (SKCM) patient subgroups identified by NTriPath pathway signatures.

While the molecular classification of clinically relevant subtypes of these cancers is

still challenging, we found patient subgroups having significantly different survival in

these cancers (Log-rank test p-value= 0.0086, 0.0010, and 0.0210, repectively)

(Figure 3B, 3C and 3D).

Experiments with other TCGA datasets, including those for Breast invasive

carcinoma (BRCA), Multiforme (GBM), Lung

(LUAD), and Ovarian serous cystadenocarcinoma (OV) consistently showed that the

use of member genes in cancer-type-specific altered pathways could serve as a

prognostic biomarker for patient stratification (Supplementary Figure 2 and 3). For

comparison, we also attempted to cluster patients using significant frequently mutated

- 9 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

genes previously identified by the TCGA Pan-Cancer study1. The results of consensus

clustering using the NTriPath-derived pathway signatures and the TCGA Pan-Cancer-

derived mutated gene signatures showed that the results from NTriPath-derived

pathway signatures had higher significance levels for BLAC, BRCA, and KIRC, and

comparable results for the GBM, HNSC, and LUAD cancer types (Figure 4). These

findings suggested that NTriPath-derived altered pathways could be used as

prognostic biomarkers for better patient stratification.

Independent cohorts for the validation of the cancer-type-specific altered pathways We performed multiple validations to evaluate the robustness and the reproducibility

of NTriPath. First, we evaluated the robustness of the cancer-type-specific altered

pathways identified in the TCGA cohort for prognostic stratification. We generated

gene expression profiles of 102 HNSC patients from our institution and used the

member genes of the top 3 HNSC cancer-type-specific altered pathways in the TCGA

cohort for patient stratification. In addition, we also used publically available gene

expression data from two datasets, one dataset, two colon

cancer datatsets for a total of 1,112 patients, and used the top 3 cancer-type specific

altered pathways for corresponding cancer type for independent validation. In the

HNSC cohorts, we found six patient subgroups (A through F), with the group F

patients having the poorest survival times (Figure 5A). A log-rank test indicated that

groups A and F had significantly different survival outcomes (p-value = 0.038, hazard

ratio = 1.88) with median survival times of 78.1 months for group A and 26.7 months

for group F. Similarly, we found patient subgroups having significantly different

survival outcomes in lung cancer, ovarian cancer, and (Figure 5B-D

and Supplementary Figure 3). Secondly, we verified the reproducibility of NTriPath

for the identification of the cancer-type-specific altered pathways. We collected the

- 10 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

level 2 somatic mutation data from 19 human cancer types those were updated after

we collected initial dataset used in the original experiments from the TCGA data

portal. We found that there are 1891 newly updated patients’ mutation data from 15

cancer types (see Supplementary Table 5). We re-ran NTriPath to identify cancer-

type-specific pathways across 19 cancers using 6681 patients’ somatic mutation data

including those of newly updated patients’ mutation data. Interestingly, we found that

many top ranked pathways identified by NTriPath in the original experiments were

consistently highly ranked in the new experiments (see Supplementary Table 6).

These results reassure that NTriPath is a robust tool to detect the altered pathways

across cancers, and the altered pathways identified by NTriPath can serve as robust

prognostic signatures for identifying patient subgroups with different survival

outcomes across multiple cancer types.

NTriPath identified potential therapeutic targets in poor prognosis patient subgroups We further investigated whether we could identify potential targets or the therapy for

the identified poor prognosis patient subgroups. Interestingly, we found that many

known drug targets in the cancer-type specific altered pathways are often up-regulated

in poor prognosis patient subgroups across cancers (see Method section and

Supplementary 7). For example, in TCGA KIRC cohort, LRP1 and MMP9, targets of

FDA-approved drugs Tenecteplase and Captopril, were significantly up-regulated in

poor prognosis group compared to good prognosis group (FDR-adjusted p-value <

0.05 with t-test). Tenecteplase binds to LRP1 and induces both LRP1 and MMP9

expression, and Captoprilare inhibits MMP9 expression. Thus, combinatorial therapy

of these drugs can be beneficial for the KIRC patients with high LRP1 and MMP9

expression 27-44. Another notable example includes DNA Topoisomerase I (TOP1), a

target of well-known FDA-approved anticancer drugs such as Irinotecan and

- 11 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Topotecan, identified by NTriPath as a new member gene into the cancer-type-

specific altered pathways across many cancers including HNSC. Interestingly, we

found that TOP1 was up-regulated in poor prognosis subgroups in HNSC from both

TCGA and UTSW cohorts (Group E and F in Figure 3C and 5A, respectively). In

addition, we found that some patients with overexpression of TOP1 in TCGA HNSC

poor prognosis subgroup have developed therapy resistance against single

chemotherapeutic agent such as Cisplatin. Interestingly, there is an ongoing trial in

advanced HNSC showing efficacy of TOP1 inhibitor Irinotecan with Cisplatin in a

poor prognosis patient subgroup 45. These observations may suggest that TOP1

inhibitors-based combinations might offer an effective treatment option for HNSC

patients with overexpression of TOP1. Taken together, these findings suggested that

the use of NTriPath-derived altered pathways containing available drug targets may

allow for the development of more tailored therapeutics.

Discussion Systematic understanding of how somatic mutations influence clinical outcomes is

essential for the development and application of personalized therapies. Especially

organizing alterations at the individual gene level and in the molecular pathways can

correlate altered pathways and vulnerabilities with specific genetic lesions, and

provide novel insights into cancer biology, biomarkers for patient stratification in

clinical trials, and potential targeted drug development46. Here, we systematically

identified biological and clinical relevant cancer-type-specific altered cross multiple

cancer types. In particular, the integration of somatic mutation with biological prior

knowledge led to the identification of altered pathways that contain recurrently

mutated genes as a hallmark of specific cancer types. Interestingly, we found that

several genes, while not frequently mutated or not mutated at all in patients, were part

- 12 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

of cancer-type-specific altered pathways that have been causally implicated in the

development of corresponding cancer types, and expressions of those genes are

significantly associated with clinical outcomes (Supplementary Figure 4). For

example, no mutation of MMP7 has been reported, but high expression of MMP7 (p

= 0.00191, HR=1.7 (95%CI,1.21−2.38)) is significantly associated with poor survival

in TCGA KIRC patients. Other examples include CABLES1 (p = 0.00272, HR=0.486

(95%CI,0.301−0.787)) in TCGA HNSC and LUAD, and GCH1 (p = 0.0000528,

HR=0.52 (95%CI,0.367−0.763)) in TCGA SKCM are not frequently or not mutated

but low or high expression of those genes are significantly associated with poor

survival. In addition, we found that known drug targets are not frequently mutated but

often up-regulated in poor prognosis patient subgroups across many cancers. These

results further corroborate that the integrative analysis of somatic mutations with

additional biological prior knowledge may elucidate potential candidate genes

associated with clinical outcomes and could be potentially used to design targeted

therapy, which cannot be readily identified by somatic mutation analysis alone.

In our analysis, we did not remove synonymous mutations or further select a shorter

3, 47 list of recurrent mutated genes in cohorts with stringent criteria either . However,

Hopfree et al7 showed that filtering synonymous mutations resulted in a decreased

ability to detect patient subgroups with different survival outcomes. Another recent

study also showed that synonymous mutations could affect functions of and

tumor suppressors 48. In addition, our experimental results for patient stratification in

comparison with recurrent mutated gene signatures identified by the TCGA Pan-

Cancer1 indicated that the use of NTriPath-derived pathways showed a comparable or

better performance in discovering patient subgroups with different survival outcomes

across cancers. To evaluate the impact of different network resources, we used

- 13 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

networks from the HPRD 49 and Rossin, E.J. et al50 and repeated experiments. We

summarize the results of altered pathways and patient stratification using different

network resources and provide on our supplement website.

Lastly, NTriPath is a general computational algorithm and can be applied to other data

types such as gene expression, copy number alteration, and methylation to identify

altered pathways by different types of genomic aberrations. NTriPath can also be used

to find altered pathways across associated with other cancer-related phenotypes (e.g.,

patient groups having therapy resistance vs. sensitivity, metastatic vs. non-metastatic).

Conclusions We have described an integrative somatic mutation analysis for discovering altered

pathways in human cancers. NTriPath integrates somatic mutation data and prior

biological knowledge from the pathway database and molecular networks to identify

significantly altered pathways and their associations with specific cancer types.

Specifically, NTriPath effectively utilizes mutation patterns that exist in only a subset

of samples (or specific cancer types), thus revealing pathways altered by complex

mutation patterns across cancer types. Furthermore, the use of gene-gene interaction

networks and the pathway database provides the potential to identify altered pathways

enriched with genes harboring mutations at high/intermediate frequencies, as well as

those not mutated per se but nevertheless playing critical roles in tumorigenesis in

network and pathway contexts. Thus, NTriPath is uniquely suited to provide a global

analysis of altered pathways by somatic mutation across cancer types.

We applied NTriPath to somatic mutation data from 19 types of cancers, and

discovered cancer-type-specific altered pathways based on these mutations in human

cancers. Functional enrichment analysis of cancer-type-specific pathways

demonstrated that the identified cancer-type-specific altered pathways are biologically

- 14 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

meaningful to each cancer type. It also provided unique pathway views of key

biological processes underlying each cancer type. Of particular significance, we

identified a patient subgroup with poor survival by cancer-type-specific altered

pathway signatures from TCGA cohorts, which in independent cohorts. These results

implied the potential utility of cancer-type-specific altered pathway signatures to

serve as a guide to tailored treatment in a patient subgroup.

Materials and Methods

Somatic mutation, human gene-gene interaction networks, and pathway data The level 2 somatic mutation data from 19 human cancer types were collected from

the TCGA data portal on May 19th 201351. We constructed a gene-gene interaction

network by combining networks from Zhang, S. et al 52, the Human Protein Reference

Database (Dec. 2013)53 and Rossin, E.J. et al50..Four sets of pathways were used in

the analysis: 1) 4,620 conserved subnetworks from the human gene-gene interaction

network12, 2) KEGG, 3) Biocarta, and 4) Reactome gene sets from MsigDB (Sept.

2010)54.

Algorithm The algorithm identifies pathways disrupted by mutated genes. Disrupted pathways

are found based on the factorization results from the network regularized nonnegative

tri-matrix factorization.

1. Notations

We construct a binary data matrix ! ∈ !!×!from the mutation data, where ! is the

number of patients, ! is the number of genes and the (!, !)!!! element of the matrix !,

[!]!", is ‘1’ if the !th patient has a mutation on the !th gene, ‘0’ otherwise. We derive

the adjacency matrix from the human gene-gene interaction networks and denote it as

- 15 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

!, where [!]!"=‘1’ if the !th gene is interacting with the !th genes in the networks and

‘0’ otherwise. We define the graph Laplacian matrix by ! = ! − !, where each

diagonal element in the diagonal matrix ! is given by [!]!! = !"[!]!". We construct

!×!! a binary matrix ! ∈ ! denoting patient cluster, where !! indicates the number of

cancer types and [!]!"=1 indicates the !th patient has !th cancer type. We construct a

!×!! binary matrix !! ∈ ! from the specific pathway database denoting pathway

information, where !! is the number of pathways and [!!]!"=1 if the !th gene is

annotated in !th pathway as a member in the pathway database, otherwise 0. Since

current pathway database annotation is still incomplete, we define a matrix

! ∈ !!×!! denoting newly updated pathway information including newly added

member genes by NTriPath. We define a matrix ! ∈ !!!×!! denoting cancer type and

pathway associations, where each element of [!]!" represents associations between !th

cancer type with ! th pathway. Higher values of elements indicate stronger

associations between cancer types and pathways. Since ! and ! are unknown, we

need to learn about those matrices during optimization (see below section for details)

2. Network regularized non-negative tri-matrix factorization

The network regularized nonnegative tri-matrix factorization is an extension of

Nonnegative Tri Matrix Factorization (NTMF); in this work, somatic mutation data !

is factorized as the products of three element-wise non-negative matrices !, !, and !

denoting patient’s cancer type, cancer-type and pathway associations, and cancer-

related pathways, respectively. We here consider a weighted loss function to deal with

the sparseness of the data matrix ! (More than 98% of entries are zero). It enables us to

focus on an approximation error at nonzero entries, which correspond to mutated

genes. In addition, to incorporate the prior knowledge from human gene-gene

interaction networks and pathway datasets into factorizations, we enforce constraints

- 16 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

on parameters, which involve the graph Laplaican ! and the pathway information !!.

All these ideas are accomplished by minimizing the following objective function

! ! ! ! ! ! min ∥ ! ∘ ! − !"! ∥!+ !! ∥ ! ∥!+ !! ∥ ! ∥!+ !! ∥ ! − !! ∥!+ !!!" ! !" , !,!!!

!×! where ! ∈ ! is a weight matrix where [!]!" = 1 if [!]!" > 0 otherwise 0, and

the operator ∘!represents the element-wise multiplication. Here, we are only interested

in learning of ! and ! among three factor matrices, since factor !!can be obtained from

the patient’s clinical information.

To solve our minimization problem, we adapt the multiplicative update method for

NTMF proposed in a recent study55, which contains a routine for avoiding

‘inadmissible zeros problem’ where the solution of multiplicative update rules is stuck

at zero when an entry in the factor becomes.

Step 1: Initialization

!!×!! Initialize the factor matrices ! = ! and ! = !!, where ! ∈ ! is a matrix whose

elements are all one. Set the regularization parameters !! =!! = !! = 1 and !!=0.1. Set

!!" the user specified parameters for avoiding the inadmissible zeros problem, !!"# = 10 ,

!!! = 10!! and ! = 10!!".

Step 2: Iteration

Iterate until it converges or reaches the maximum number of iterations:

! ! [!]!" ← ([!]!" + !!")!!"

! ! [!]!" ← ([!]!" + !!")!!"

! ! where!!!" is set to ! if [!]!" ≥ !!"# !"#!!!" > 1, otherwise 0, and

[!!!"] !! = !" ,! !" ! ! [! ! ∘ !"! !]!" + !! ∥ ! ∥!+ !

! ! [! !" + !!!" + !!!!]!" !!" = ! ! .! [(! ∘ (!"! )) !" + !!! + !!!"]!" + !! ∥ ! ∥!+ !

- 17 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Empirically, the algorithm converges fast within 50 iterations in the experiments.

3. Identification of cancer-type-specific altered pathways Once the above optimization problem is solved, we used ! matrix to identify cancer-

type-specific altered pathways across cancers. Specifically, we ranked pathways

based on values of elements of the ! matrix for each cancer type (e.g., rank pathways

based on all values of !th row which indicate association scores between ith cancer

type and all pathways). In addition, to measure statistical significance of cancer-type

and pathway associations, we performed a permutation test (e.g., we randomly

permuted somatic mutation data and repeated experiments 5,000 times to calculate

empirical p-values) and defined cancer-type-specific altered pathways based on the

following strict criteria: 1) Pathways must be ranked within the top !th compared to

other pathways in each cancer type based on their association scores in matrix !. 2)

Pathways must have significant BH-adjusted p-values (Benjamini-Hochberg adjusted

p-values using a false discovery rate cutoff of 0.1) (See Supplementary X). In this

work, we selected the top 3 ranked pathways having significant BH- adjusted p-values

per each cancer type. Top ranked pathways for KICH, KIRP, and THCA were

excluded for further analysis, due to the insignificant BH-adjusted p-values.

Gene expression data and clustering We collected RNA-seq data for TCGA BLCA, BRCA, HNSC, KIRC, LAML,

LUAD, SKCM, STAD, UCEC from cBioPortal56-58 using CGDS MATLAB toolbox

with RNA Seq V2 RSEM option. We collected microarray gene expession profiles for

TCGA GBM from the TCGA dataportal51 and TCGA OV and two others from

Zhang, W. et al. 59. We collected colon cancer data from GSE39582 and lung cancer

data from Shedden, K. et al. 60. RNA-seq data were z-transformed while other

expression data were quantile normalized, log transformed, and expression values

- 18 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

were median centered. To perform consensus clustering, we used Matlab K-means

clustering and used two-way hierarchical clustering.

Authors' contributions

SP and THH designed the study. SP and THH performed the analysis. THH supervised the project. SP, JN, SL, and THH wrote the paper. All authors read and approved the final manuscript. Acknowledgements We would like to thank to Quantitative Biomedical Research Center to allow us to use

their computational resources.

Figures

Figure 1. Overview of NTriPath

This figure describes steps to discover altered pathways across multiple cancer types.

The aim of the approach is to integrate the somatic mutation data with gene-gene

interaction networks and a pathway database for discovering altered pathways across

cancers.

Figure 2. KIRC-specific altered pathways

(A) Diagrams of the top three ranked altered pathways in patients with KIRC. Red

color indicates genes that are frequently mutated. A circular-shaped node represents

the original member genes annotated in the pathway database, and a diamond-shaped

node represents newly identified members genes of the pathways by NTriPath (B)

Protein and mRNA expression and mutation status for all genes identified in the top

three KIRC altered pathways. Each row represents a member gene in the TCGA

- 19 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

KIRC-specific altered pathway, and each column represents a patient sample in

TCGA KIRC cohort.

Figure 3. Cancer-type-specific altered pathways across cancers correlate with survival

outcomes

Kaplan-Meier survival plots based on patient subgroups defined by consensus

clustering using genes from the top 3 altered pathways for (a) kidney renal cell

carcinoma (KIRC), (b) bladder urothelial carcinoma (BLCA), (c) head and neck

squamous carcinoma (HNSC), and (d) skin cutaneous (SKCM).

Figure 4. Comparing NTriPath-derived signatures with mutation-frequency-based

signatures

This figure describes comparisons of patient stratification using signatures derived

from NTriPath and mutation frequency reported in Kandoth, C. et al. 1.

Figure 5. Validation in independent cohort

This figure describes Kaplan-Meier survival plots for patient subgroups from (a)

UTSW HNSC (b) Lung adenocarcinoma (c) Colon cancer, (d) Ovarian cancer.

Cancer Type Number of Patients

1 Acute Myeloid (LAML) 75

2 Bladder Urothelial Carcinoma (BLCA) 136

3 Brain Lower Grade (LGG), 217

4 Breast Invasive Carcinoma (BRCA) 772

5 Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma (CESC) 41

6 Colon Adenocarcinoma (COAD) 270

7 Glioblastoma Multiforme (GBM) 290

8 Head and Neck Squamous Cell Carcinoma (HNSC) 323

- 20 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

9 Kidney Chromophobe (KICH) 65

10 Kidney Renal Clear Cell Carcinoma (KIRC) 210

11 Kidney Renal Papillary Cell Carcinoma (KIRP) 111

12 Lung Adenocarcinoma (LUAD) 380

13 Ovarian Serous Cystadenocarcinoma (OV) 463

14 Prostate Adenocarcinoma (PRAD) 171

15 Rectum Adenocarcinoma (READ) 116

16 Skin Cutaneous Melanoma (SKCM) 269

17 Stomach Adenocarcinoma (STAD) 264

18 Carcinoma (THCA) 369

19 Uterine Corpus Endometrioid Carcinoma 248

Table 1. A list of cancer types used in the analysis.

References

1. Kandoth, C. et al. Mutational landscape and significance across 12 major cancer types. Nature 502, 333-339 (2013). 2. Tamborero, D. et al. Comprehensive identification of mutational cancer driver genes across 12 tumor types. Sci. Rep. 3 (2013). 3. Lawrence, M.S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214-218 (2013). 4. Lawrence, M.S. et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495-501 (2014). 5. Osmanbeyoglu, H.U., Pelossof, R., Bromberg, J.F. & Leslie, C.S. Linking signaling pathways to transcriptional programs in . Genome Res 24, 1869-1880 (2014). 6. Baselga, J. Targeting the phosphoinositide-3 (PI3) kinase pathway in breast cancer. The oncologist 16 Suppl 1, 12-19 (2011). 7. Hofree, M., Shen, J.P., Carter, H., Gross, A. & Ideker, T. Network-based stratification of tumor mutations. Nat Meth 10, 1108-1115 (2013). 8. Hwang, T. et al. Large-scale integrative network-based analysis identifies common pathways disrupted by copy number alterations across cancers. BMC Genomics 14, 1-13 (2013). 9. Vaske, C.J. et al. Inference of patient-specific pathway activities from multi- dimensional cancer genomics data using PARADIGM. Bioinformatics 26, i237-i245 (2010). 10. Cerami, E., Demir, E., Schultz, N., Taylor, B.S. & Sander, C. Automated Network Analysis Identifies Core Pathways in Glioblastoma. PLoS ONE 5, e8918 (2010).

- 21 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

11. Vandin, F., Upfal, E. & Raphael, B. Algorithms for detecting significantly mutated pathways in cancer. Journal of computational biology : a journal of computational molecular cell biology 18, 507-522 (2011). 12. Suthram, S. et al. Network-Based Elucidation of Human Disease Similarities Reveals Common Functional Modules Enriched for Pluripotent Drug Targets. PLoS Comput Biol 6, e1000662 (2010). 13. Hofree, M., Shen, J.P., Carter, H., Gross, A. & Ideker, T. Network-based stratification of tumor mutations. Nature methods 10, 1108-1115 (2013). 14. Forbes, S.A. et al. COSMIC: exploring the world's knowledge of somatic mutations in human cancer. Nucleic acids research 43, D805-811 (2015). 15. Pena-Llopis, S. et al. BAP1 loss defines a new class of renal cell carcinoma. Nat Genet 44, 751-759 (2012). 16. Sato, Y. et al. Integrated molecular analysis of clear-cell renal cell carcinoma. Nat Genet 45, 860-867 (2013). 17. Peña-Llopis, S. & Brugarolas, J. Simultaneous isolation of high-quality DNA, RNA, miRNA and from tissues for genomic applications. Nat. Protocols 8, 2240-2255 (2013). 18. Nickerson, M.L. et al. Improved Identification of von Hippel-Lindau Gene Alterations in Clear Cell Renal Tumors. Clinical Cancer Research 14, 4726- 4734 (2008). 19. The Cancer Genome Atlas Research, N. Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature 499, 43-49 (2013). 20. Gialeli, C., Theocharis, A.D. & Karamanos, N.K. Roles of matrix metalloproteinases in cancer progression and their pharmacological targeting. The FEBS journal 278, 16-27 (2011). 21. Staudt, N.D. et al. Myeloid Cell LRP1/CD91 Regulates Monocyte Recruitment and Angiogenesis in Tumors. Cancer Research 73, 3902-3912 (2013). 22. Langlois, B. et al. LRP-1 Promotes Cancer Cell Invasion by Supporting ERK and Inhibiting JNK Signaling Pathways. PLoS ONE 5, e11584 (2010). 23. Song, H., Li, Y., Lee, J., Schwartz, A.L. & Bu, G. Low-density lipoprotein receptor-related protein 1 promotes cancer cell migration and invasion by inducing the expression of matrix metalloproteinases 2 and 9. Cancer Res 69, 879-886 (2009). 24. Duan, D. et al. Inhibition of transcription elongation by the VHL tumor suppressor protein. Science 269, 1402-1406 (1995). 25. Ohh, M. et al. Ubiquitination of hypoxia-inducible factor requires direct binding to the [bgr]-domain of the von Hippel-Lindau protein. Nat Cell Biol 2, 423-427 (2000). 26. Tanimoto, K., Makino, Y., Pereira, T. & Poellinger, L. Mechanism of regulation of the hypoxia-inducible factor-1a by the von Hippel-Lindau tumor suppressor protein. The EMBO Journal 19, 4298-4309 (2000). 27. Yamamoto, D., Takai, S. & Miyazaki, M. Inhibitory profiles of captopril on matrix metalloproteinase-9 activity. European journal of pharmacology 588, 277-279 (2008). 28. Williams, R.N., Parsons, S.L., Morris, T.M., Rowlands, B.J. & Watson, S.A. Inhibition of matrix metalloproteinase activity and growth of gastric adenocarcinoma cells by an angiotensin converting enzyme inhibitor in in vitro and murine models. European journal of : the journal

- 22 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

of the European Society of Surgical Oncology and the British Association of Surgical Oncology 31, 1042-1050 (2005). 29. Underwood, C.K., Min, D., Lyons, J.G. & Hambley, T.W. The interaction of metal ions and Marimastat with matrix metalloproteinase 9. Journal of inorganic biochemistry 95, 165-170 (2003). 30. Reinhardt, D. et al. Cardiac remodelling in end stage failure: upregulation of matrix metalloproteinase (MMP) irrespective of the underlying disease, and evidence for a direct inhibitory effect of ACE inhibitors on MMP. Heart 88, 525-530 (2002). 31. Rask-Andersen, M., Almen, M.S. & Schioth, H.B. Trends in the exploitation of novel drug targets. Nature reviews. Drug discovery 10, 579-590 (2011). 32. Rajapakse, N., Mendis, E., Kim, M.M. & Kim, S.K. Sulfated glucosamine inhibits MMP-2 and MMP-9 expressions in human fibrosarcoma cells. Bioorganic & medicinal chemistry 15, 4891-4896 (2007). 33. Overington, J.P., Al-Lazikani, B. & Hopkins, A.L. How many drug targets are there? Nature reviews. Drug discovery 5, 993-996 (2006). 34. Okada, M. et al. Captopril attenuates matrix metalloproteinase-2 and -9 in monocrotaline-induced right ventricular hypertrophy in rats. Journal of pharmacological sciences 108, 487-494 (2008). 35. Nenan, S. et al. Metalloelastase (MMP-12) induced inflammatory response in mice airways: effects of dexamethasone, rolipram and marimastat. European journal of pharmacology 559, 75-81 (2007). 36. Mendis, E., Kim, M.M., Rajapakse, N. & Kim, S.K. Carboxy derivatized glucosamine is a potent inhibitor of matrix metalloproteinase-9 in HT1080 cells. Bioorganic & medicinal chemistry letters 16, 3105-3110 (2006). 37. Imming, P., Sinning, C. & Meyer, A. Drugs, their targets and the nature and number of drug targets. Nature reviews. Drug discovery 5, 821-834 (2006). 38. Fenton, J.I., Chlebek-Brown, K.A., Caron, J.P. & Orth, M.W. Effect of glucosamine on interleukin-1-conditioned articular . Equine veterinary journal. Supplement, 219-223 (2002). 39. Dodge, G.R. & Jimenez, S.A. Glucosamine sulfate modulates the levels of aggrecan and matrix metalloproteinase-3 synthesized by cultured human osteoarthritis articular chondrocytes. Osteoarthritis and cartilage / OARS, Osteoarthritis Research Society 11, 424-432 (2003). 40. Chu, S.C. et al. Glucosamine sulfate suppresses the expressions of urokinase plasminogen activator and inhibitor and gelatinases during the early stage of osteoarthritis. Clinica chimica acta; international journal of 372, 167-172 (2006). 41. Chen, X., Ji, Z.L. & Chen, Y.Z. TTD: Therapeutic Target Database. Nucleic acids research 30, 412-415 (2002). 42. Berman, H.M. et al. The . Nucleic acids research 28, 235- 242 (2000). 43. Sato, A. et al. Inhibition of MMP-9 using a pyrrole-imidazole polyamide reduces cell invasion in renal cell carcinoma. International journal of oncology 43, 1441-1446 (2013). 44. Hu, K. et al. Tissue-type plasminogen activator acts as a cytokine that triggers intracellular and induces matrix metalloproteinase-9 gene expression. The Journal of biological chemistry 281, 2120-2127 (2006).

- 23 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

45. Gilbert, J. et al. Phase II trial of irinotecan plus cisplatin in patients with recurrent or metastatic squamous carcinoma of the head and neck. Cancer 113, 186-192 (2008). 46. Garraway, Levi A. & Lander, Eric S. Lessons from the Cancer Genome. Cell 153, 17-37 (2013). 47. Dees, N.D. et al. MuSiC: Identifying mutational significance in cancer . Genome Research 22, 1589-1598 (2012). 48. Supek, F., Minana, B., Valcarcel, J., Gabaldon, T. & Lehner, B. Synonymous mutations frequently act as driver mutations in human cancers. Cell 156, 1324-1335 (2014). 49. Keshava Prasad, T.S. et al. Human Protein Reference Database--2009 update. Nucleic acids research 37, D767-772 (2009). 50. Rossin, E.J. et al. Proteins Encoded in Genomic Regions Associated with Immune-Mediated Disease Physically Interact and Suggest Underlying Biology. PLoS Genet 7, e1001273 (2011). 51. strel'tsov, S.A., Mikheikin, A.L. & Nechipurenko Iu, D. [Interaction of topotecan--a DNA topoisomerase I inhibitor--with dual-stranded polydeoxyribonucleotides. II. Formation of a complex containing several DNA molecules in the presence of topotecan]. Molekuliarnaia biologiia 35, 442-450 (2001). 52. Zhang, S., Li, Q., Liu, J. & Zhou, X.J. A novel computational framework for simultaneous integration of multiple types of genomic data to identify microRNA-gene regulatory modules. Bioinformatics 27, i401-i409 (2011). 53. Keshava Prasad, T.S. et al. Human Protein Reference Database—2009 update. Nucleic acids research 37, D767-D772 (2009). 54. Subramanian, A. et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America 102, 15545- 15550 (2005). 55. Seung-Jun, K., TaeHyun, H. & Giannakis, G.B. in Cognitive Information Processing (CIP), 2012 3rd International Workshop on 1-6 (2012). 56. Teicher, B.A. Next generation topoisomerase I inhibitors: Rationale and biomarker strategies. Biochemical pharmacology 75, 1262-1271 (2008). 57. Cerami, E. et al. The cBio Cancer Genomics Portal: An Open Platform for Exploring Multidimensional Cancer Genomics Data. Cancer Discovery 2, 401-404 (2012). 58. Gao, J. et al. Integrative Analysis of Complex Cancer Genomics and Clinical Profiles Using the cBioPortal. Sci. Signal. 6, pl1- (2013). 59. Zhang, W. et al. Network-based Survival Analysis Reveals Subnetwork Signatures for Predicting Outcomes of Ovarian Cancer Treatment. PLoS Comput Biol 9, e1002975 (2013). 60. Shedden, K. et al. Gene expression-based survival prediction in lung adenocarcinoma: a multi-site, blinded validation study. Nat Med 14, 822-827 (2008).

- 24 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Fig. 1

Input: 1) Somatic mutation 2) Clinical data (e.g., patient’s cancer type) 3) Pathway information 4) Gene-gene interaction networks

Somatic mutation data Clinical data Pathway Gene-gene interaction network

...... ~ X X + samples samples pathways cancertypes ? ?

genes cancer types pathways genes X U S VT A Newly updated Output: member genes min ||X-USVT || F + ||S||2 + ||V-V || F + tr(VT (D-A)V) 2 1 0 2 1) Newly updated pathway information (V) in the pathways S,V 2) Cancer type-pathway association (S) X: Somatic mutation data (#patients X #genes) U: Patient cluster (#patients X #cancer types) S: Cancer type-pathway association matrix (#cancer types X #pathways) ...... V0 : Initial pathway information (#pathways X #genes) V: Newly updated pathway information (#pathways X #genes) pathways cancertypes A: Gene-gene interaction network (#genes X #genes)

pathways genes

p u o r g n o i s s u c s i d r e s U | m o c . s p u o r g e l g o o g @ l a t r o p o i b c : k c a b d e e f d n a s n o i t s e u Q

A G C T | C C K S M | l a t r o P o i B c

n o i t a c i f i l p m a l a m y h c n e s e m a m o t s a l b o n i t e r , r e d d a l b , M B G 4 M D M

n o i t a c i f i l p m a l a m y h c n e s e m a m o c r a s N U J

BEST1 MMP26

GNL3 r e h t o , e s n e s s i m a m o h p m y L n i k g d o H - n o

MMP9 N

RB1

, t f i h s e m a r f , e s n e s n o n l l e c - B , a m o h p m y l l l e c - B e g r a l e s u f f i d , a i m e k u e

MDM4 PPP2R5A MMP7 THBS1 l

ACTA2 SMAD3 , n o i t a c o l s n a r t a m o h p m y l / a i m e a k u e l s u o n e g o l e y m e t u c a , a i m e k u e l c i t y c o h p m y l e t u c a P B B E R SERPINA1 C ESR1

EP300

TFPI , t f i h s e m a r f TP63 LRP1

BRCA1

PPP1R13B CTSG

, e s n e s n o n , e s n e s s i

m MMP1

JUN TP53 PRSS3

, n o i t e l e d e g r a l l a i l e h t i p e n a i r a v o , t s a e r b n a i r a v o 1 A C R HIPK2 B

SP1 CREBBP APP

WT1 SERPINA3 r e h t o , e s n e s s i m a m o h p m y l l l e c - B e g r a l e s u f f i d , a i m e k u e l

bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not

, t f i h s e m a r f , e s n e s n o n l a i l e h t i p e c i t y c o h p m y l e t u c a , a i m e k u e l s u o n e g o l e y certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. m

, n o i t a c o l s n a r t , a m o h p m y l / a i m e a k u e l e t u c a , c i t a e r c n a p , t s a e r b , l a t c e r o l o c 0 0 3 P E

, t f i h s e m a r

Fig. 2 f

, e s n e s n o n , e s n e s s i m

a b ACTA2 ACTA2 3% , n o i t e l e d e g r a l r e h t o s m l i W r o m u t l l e c d n u o r l l a m s c i t s a l p o m s e d , s m l i W 1 T TCEB1 78.5% altered in APP APP 5% W

patient samples BEST1 BEST1 6%

BRCA1 5% , t f i h s e m a r f r e h t o , l a m y h c n e s e BRCA1m

TCEB2 USP33 DIO2 CREBBP CREBBP 5%

, e s n e s n o n , e s n e s s i m , l a i l e h t i p e g n u l l l e c l l a m s , t s a e r

CTSG CTSG 3% b

DIO2 6% MMP26 , n o i t e l e d e g r a l , a m o h p m y l / a i m e a k u e l , a m o c r a s , a m o t s a l b o n i t e r g n u l l l e c l l a m s , t s a e r b , a m o c r a s , a m o t s a l b o n i t e r 1 B BEST1 DIO2 R VHL GNL3 EP300 EP300 6% MMP9 RB1 ESR1 ESR1 6%

55.7% GNL3 GNL3 2%

PPP2R5A MMP7 THBS1

ACTA2 MDM4 HIPK2 8% r e h t o , l a m y h c n e s e m s e p y t r u o m u t r e h t SMAD3 HIPK2 o

JUN 3% SERPINA1

ESR1 JUN t f i h s e m a r f , l a i l e h t i p e e l p i t l u m , a m o i l g , a m o n i c r a c s e p y t r u o m u t r e h t o e l p i t l u m , a m o i l

EP300 LRP1 LRP1 10% g

MDM4 4% TFPI , e s n e s n o n , e s n e s s i m , a m o h p m y l / a i m e a k u e l l a c i t r o c o n e r d a , a m o c r a s , t s a e r b , l a c i t r o c o n e r d a , a m o c r a s , g n u l , l a t c e r o l o c , t s a e r b 3 5 P TP63 MDM4 LRP1 T BRCA1 MMP1 0% 8.1% PPP1R13B MMP1 CTSG

MMP26 MMP26 4% MMP1

JUN TP53 PRSS3 , t f i h s e m a r f MMP7 MMP7 5%

HIPK2 MMP9 MMP9 3% , e s n e s n o n , e s n e s s i m r e h t o , l a m y h c n e s e m a m o t y c o m o r h c o e h 5.2% p

SP1 CREBBP PPP1R13BPPP1R13B 17% APP

WT1 PPP2R5A 9% SERPINA3 , n o i t e l e d e g r a l , l a i l e h t i p e , a m o i g n a m e h , l a n e r a m o t y c o m o r h c o e h p , a m o i g n a m e h , l a n e r L H PPP2R5A V PRSS2 PRSS2 0%

PRSS3 PRSS3 3%

BEST1 MMP26 Cancer gene s e p y T n o i t a t u M s e p y T e u s s i T ) e n i l m r e G ( s e p y T r o m u T ) c i t a m o S ( s e p y T r o m u T e n e RB1 8% G GNL3 MMP9 RB1 Drug target SERPINA1 6% RB1 SERPINA1 mRNA

SERPINA3SERPINA3 7%

MDM4 PPP2R5A MMP7 THBS1 Upregulation : s u s n e C e n e G r e c n a C r e g n a S e h t y b d e g o l a t a c s a , s e n e g r e c n a c n w o n k e r a s e n e g y r e u q r u o y f o ACTA2 SMAD3 SMAD3 SMAD3 9% 9

SERPINA1

ESR1 SP1 SP1 9% mRNA : n o i t a m r o f n I s u s n e C e n e G r e c n a C r e g n a EP300 TCEB1 TCEB1 11% Downregulation S TFPI TP63 LRP1 TCEB2 TCEB2 2% BRCA1 WT1CTSG 6% PPP1R13B TCEB1 80% TFPI TFPI 5% Mutation MMP1

JUN PRSS3 10% 70% THBS1 THBS1 4%

TP53 60% . e v i t a t u p e r a s n o i t a r e t l a r e b m u n y p o TP53 TP53 13% RPPA C HIPK2 Mutati50%on mRNA Upregulation mRNA Downregulation RPPA Upregulation RPPA Downregulation

TCEB2 USP33 DIO2 40% TP63 TP63 6% Upregulation

SP1 CREBBP APP

WT1 SERPINA3 Mutation 20% USP33 USP33 6% n o i t a l u g e r n w o D A P P R n o i t a l u g e r p U A P P R n o i t a l u g e r n w o D A N R m n o i t a l u g e r p U A N R m n o i t a t u 10% M frequency VHL VHL 44% RPPA Copy nu0%mber alterations are putative.

VHL WT1 WT1 6% Downregulation

% 6 1 T W

Sanger Cancer Gene Census Information: 9 of your query genes are known cancer genes, as cataloged by the Sanger Cancer Gene Census:

TCEB1 Gene Tumor Types (Somatic) Tumor Types (Germline) Tissue Types Mutation Types

TCEB2 USP33 DIO2 VHL renal, hemangioma, renal, hemangioma, epithelial, large , pheochromocytoma mesenchymal, other missense, nonsense, VHL frameshift,

TP53 breast, colorectal, lung, , adrenocortical, breast, sarcoma, adrenocortical leukaemia/, missense, nonsense, glioma, multiple other tumour types carcinoma, glioma, multiple epithelial, frameshift other tumour types mesenchymal, other

RB1 , sarcoma, breast, small cell lung retinoblastoma, sarcoma, leukaemia/lymphoma, large deletion, breast, small cell lung epithelial, missense, nonsense, mesenchymal, other frameshift,

WT1 Wilms, desmoplastic small round cell tumor Wilms other large deletion, missense, nonsense, frameshift,

EP300 colorectal, breast, pancreatic, acute leukaemia/lymphoma, translocation, myelogenous leukemia, acute lymphocytic epithelial nonsense, frameshift, leukemia, diffuse large B-cell lymphoma missense, other

BRCA1 ovarian breast, ovarian epithelial large deletion, missense, nonsense, frameshift,

CREBBP acute lymphocytic leukemia, acute myelogenous leukaemia/lymphoma translocation, leukemia, diffuse large B-cell lymphoma, B-cell nonsense, frameshift, Non- missense, other

JUN sarcoma mesenchymal amplification

MDM4 GBM, bladder, retinoblastoma mesenchymal amplification

cBioPortal | MSKCC | TCGA Questions and feedback: [email protected] | User discussion group bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Fig. 3

a ●●● ●●●●●●● ●● ●●● ● TCGA KIRC Group A (n=237, HR vs. A)) 1.0 Group A (n=47, HR vs. A)

1.0 b ●● TCGA BLCA ●●● ●●●●●●●●●● ● ●●●●● ●● ●●● Group B (n=113, HR = 1.44) Group B (n=42, HR = 2.40) ●●● ●●●●●●●● ● ●●● ● ●●● ● ●●●●●● ● ●●● ●● Group C (n=76, HR = 2.94) Group C (n=20, HR = 2.73) ● ●●●● ●●●●●● ●● ●●●●● ●●●●● Group D (n=34, HR = 3.32) ● ● ● ●●●●● ●● ● ● ●●●●

●● 0.8

0.8 ● ●● ● ●●●● Log-rank P = 1.84E-08 ●●●●●● ●● ●●● ● ● ●● ● Log-rank P = 0.0086 ●●● ●●●●●●●●●● ●●●●● ●●●●●● ●● ● ● ● ● ●● ●●● ● ●●●●● ●●●●●●● ● ●● ● 0.6

0.6 ●●●● ● ● ● ● ●● ●●● ●● ● ●● ●● ● ●●● ● ● ●● ● ● ● ● ● ●●● ● 0.4 0.4 ● ●●

●● ● ●●● ● ● ● Survivalfraction Survivalfraction

0.2 ● ● ● 0.2 ● ● ● 0.0 0.0 0 20 40 60 80 100 0 20 40 60 80 100 120 Months Months

●●●●●● ●●●●● ● c ●● TCGA HNSC ● TCGA SKCM

1.0 Group A (n=138, HR vs. A) Group A (n=44, HR vs. A)) d 1.0 ●●●●●●●● ● ● ●●● ●●● ● ● Group B (n=53, HR = 1.78) Group B (n=5, HR = 1.50) ● ● ●● ●●● ● ● ● Group C (n=49, HR = 1.44) ● Group C (n=54, HR = 1.64) ●● ●● ●● Group D (n=7, HR = 2.07) Group D (n=54, HR = 1.39) ●● ● ● ● 0.8 0.8 ●●● ● Group E (n=10, HR = 4.08) ● ●● ●● ● Log-rank P = 0.0210 ● ●●●● ● ● ● ● ● Log-rank P = 0.0010 ● ● ● ● ● ● ●●●●●●●● ●●● ●●● 0.6 0.6 ● ● ● ● ● ● ●● ●● ● ●● ● ● ●●●● ●● ● ●● ● ● ● ●● ● ● ● ● 0.4 0.4 ●●●● ● ● ●●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

Survivalfraction ● Survivalfraction 0.2

● ● 0.2

● ● ● 0.0 0.0 0 20 40 60 80 100 120 140 0 50 100 150 200 250 300 350 Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Fig. 4

a BLCA b BRCA c GBM 10 10 10 -log(p) -log(p) -log(p)

NTriPath TCGA PanCaner

Number of subgroups Number of subgroups Number of subgroups d e f HNSC KIRC LUAD 10 10 10 -log(p) -log(p) -log(p)

Number of subgroups Number# groupsof subgroups Number# groups of subgroups bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Fig. 5 a b Lung adenocarcinoma 1.0 1.0 UTSW Group A (n=27, HR vs. A) ● HNSC Group B (n=14, HR = 1.29) Shedden et. al., Nature 2008 Group C (n=16, HR = 1.37) ● ● ● ● Group A (n=173, HR vs. A ) Group D (n=9, HR = 1.55) ●●●●●● ●● 0.8

0.8 Group B (n=269, HR = 1.48 ) Group E (n=7, HR = 1.67) ●● ● ● ● ●●●● Group F (n=29, HR = 1.88) ●●● ●●● Log-rank P = 0.0057 ● ●● ●●●●● ● ● ● ●● ●●●● Log-rank P = 0.0381 ● ● ●●

0.6 ●●

0.6 ● ●●● ● ●●●● ● ●● ●●● ●●● ●●●● ●●●●●●●●●● ● ● ●●●●● ●●● ● ● ●● ● ● ●●● ●● ● ●●●● 0.4 ●●●●●●

0.4 ● ● ●●●● ● ●● ●●● ● ● ● ● ● ● ●

Survivalfraction ● ●● Survivalfraction ●● ● ● ● ● ●● ●

0.2 ●● ● ● ● ● ● ● 0.2 ● ● ● 0.0 0.0 0 50 100 150 200 0 50 100 150 Months Months 1.0

●● ● Ovarian cancer c ●●●●● ● d 1.0 1.0 ● Colon cancer ●● ● ● ● ● Bonome et. al., Cancer Res 2008 ● ● ● ● ● ●● GSE39582 ● ●● ●●●●●● ●●● ●●●

●● 0.8 ●● ●● ● ● ● ●●●●●● Log-rank P = 0.0108 Group A (n=27, HR vs. A) ● ●●●● ●● ● ●● ●●●●●● ● ●●●●● 0.8 0.8 ● ● ●●●●●●●●●●● ●● Group B (n=74, HR = 0.897) ●●●●●●●● ● ● ● ● ● ● ●●●● ●● ●●●●● ● ● ●●● ●● ● ●●●●● ●●●●●●●●●● ●● ● ●● ● ● ● ● ● Group C (n=14, HR = 1.62) ●●● ●● ● ●●● ●● ● Group D (n=38, HR = 1.59) ●● ● ●● ●●●● ● ● ● ● ●

●●●●●●●●● ● 0.6 ●●●● ● ●●●●● ● ● ●●● ●● ● ● 0.6 0.6 ●●● ● ●●●●●● ●● ● ● ● ● ● ● ● ●●●●● ● ● Log-rank P = 0.0035 ●● ● ● ●● ● ● ● ● ●● ●● ●●●● ● ● ●n = 102, p = 0.0669

0.4 ● ● ● 0.4 0.4 HR1=1.15● (95%CI,0.605−2.18) ● Group A (n=106, HR vs. A) HR2=1.4● (95%CI,0.606−3.22) ● ● ●● Group B (n=51, HR = 0.913) HR3=1.56 (95%CI,0.601−4.04) ● ● ● ● Survivalfraction Survivalfraction Group C (n=43, HR = 1.24) ● ● ● ● 0.2 HR4=1.72 (95%CI,0.888−3.31) ● ●●

0.2 ● Group D (n=49, HR = 1.29) 0.2 ● ● ● ●

Group E (n=102, HR = 1.51) ● Group F (n=58, HR = 1.77) ● ● ● 0.0 0.0 0.0 0 50 100 150 200 0.00 0.1 50 0.2 1000.3 1500.4 Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Supplementary Figure 1 BEST1 MMP26 GNL3 MMP9 RB1

MDM4 PPP2R5A MMP7 THBS1 ACTA2 SMAD3 SERPINA1 ESR1 EP300 TP63 TFPI LRP1 BRCA1 PPP1R13B CTSG MMP1 JUN TP53 PRSS3 HIPK2 SP1 CREBBP APP WT1 SERPINA3

bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

A MAGEB18 B TCEB1 YBX1 EEF2 BEST1 MMP26 TCEB2 USP33 DIO2 GNL3 MMP9 TOP1 BEST1 RB1 GNL3 NPM1 MDM4 PPP2R5A MMP7 THBS1 VHL ACTA2 SMAD3 PPP2R5A SERPINA1 RB1 ESR1 EGFR RPA1 MSH6 ACTA2 MMP26 EP300 PCNA BEST1 TFPI GNL3 MMP9 TP63 LRP1 BRCA1 CTSG RB1 JUN HIPK2 HNSC PPP1R13B SP1 MMP1 RPS27A MDM4 PPP2R5A MMP7 THBS1 JUN TP53 PRSS3 WRN ACTA2TP53 SMAD3 SERPINA1 ESR1 BRCA1 HIPK2 MDM4 98.7% altered in EP300 SP1 CREBBP APP YWHAZ EP300 TP63 TFPI LRP1 WT1 SERPINA3 PARP1 BRCA1 patient samples CTSG CDK5 PPP1R13B TP63 MMP1 JUN CREBBP TP53 ESR1 PRSS3 HIPK2 SMAD3 78.5% altered in YEATS4 AURKA SP1 CREBBP PPP1R13B APP KIRC WT1 SERPINA3 CABLES1 NCOA6 WT1 patient samples

RASSF5 TACC1 TGS1 TDRD7 HSF1 BCL2 SHOC2 TCEB1

C D HRAS BEST1 PIK3CA TCEB2 USP33 DIO2 PLCE1 PIK3CD TCEB1 VAV1 VHL INSR USP33 TCEB2 DIO2 FYN PPP2R5A

RASA1 VHL IGSF9 PIK3R1 UCEC 99.2% altered in HSF1 ASCC2 KIT EGFR patient samples RPA1 GNL3

ACTA2 ERBB2 CDKN2D MSH6 SRC TP63 WRN PCNA STAD CDK6 TP53 CTNNB1 SMARCA4 STAD SLC9A3R1 SMAD3 PARP1 CREBBP WT1 62.5% altered in AXIN2 NF2 EP300 HIPK2 AXIN1 patient samples PTPN13 BRCA1 MSN IQGAP1 JUN CHMP4B

ESR1 FHL2 SPN APC NCOA6 PML ARHGEF4 PPP2R5A PIAS2 RHOA DDX39 TGS1 RP1 RASGRF1 bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

E BRCA 82.6% altered in F SLC25A38 JUN SP1 patient samples

RB1 CTSE PML CCNB1 EMG1 ANXA3 BEST1 SMAD3 CHEK2 ESR1 MAGEB18 CHD3 BRCA1 MDM4 PPP2R5A CDK5 CDC25C VIM EP300 AURKA TP63 NCOA6 HSF1 YEATS4 EEF2 PPP1R13B TACC1 TOP1 PCNA TP53 CREBBP PIN1 GNL3 ZHX1 MSH6 FHL2 CABLES1 YBX1 TP53 ASCC2 ACTA2 TGS1 GNL3 RPS27A HIPK2 CAPN1 WT1 TDRD7 IGSF9 PARP1 ACTA2 NPM1 PCNA JUN NFKBIA YWHAZ RPA1 BLCA 88.2% altered in EGFR ACTC1 WRN BRCA2 THAP8 patient samples HSPB1 PPA1 G H ATM TFAP2C CITED1 COAD 78.8% altered in NBN RASGRF1 RP1 patient samples MDC1 RHOA GAK AXIN1 BARD1 ESR1 STAT3 ARHGEF4 DVL3 CREBBP IQGAP1 CITED2 PPP2CA BRCA1 CCNG1 CCNG2 TP53 EP300 CTNNB1 CDK5 ZHX1 JUN MED21 PTPN13 APC SMARCA4 BRCA1 TP53 HSF1 TFAP2A SMAD3 STAT1 KPNA2 AXIN2 PPP2R5A PCNA ESR1 CREBBP KPNB1 RAN RPA1 MSH6 EP300 SMAD3

NUP153 CSE1L PARP1 CESC JUN WRN TGS1 85.4% altered in patient samples NCOA6 bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

I J KICH 78.5% altered in MAGEB18 YEATS4 GBM patient samples DDX5 ACTC1 HSPB1 TACC1 TEP1 SMG5 AURKA MAGEB18 PPA1 HSPA1A CAPN1 NFKBIA GNL3 TERT TDRD7 CDK5 TP53 EEF2 YWHAZ VIM CABLES1 ZHX1 SP1 SLC25A38 EGFR PCNA PRKCA HMGB2 YBX1 ANXA3 CHD3 75.5% altered in EMG1 NPM1 ACTA2 YWHAZ CABLES1 patient samples EEF2 TP53 TDRD7 CTSE NPM1 ACTA2 JUN TOP1 EGFR RPS27A CDK5 RPS27A PPP1R14A TACC1 MDM4 JUN YEATS4 MAPK9 YBX1 AURKA GNL3 TOP1 PIN1 PCNA BRCA2 YWHAQ PRKD1 THAP8 CDC25C

CHEK2 CCNB1 K L LAML 66.7% altered in KIRP 65.8% altered in EEF2 patient samples BRCA1 GNL3 PDZK1 BRCA1 patient samples TOP1 C11orf30 CDK2 CREBBP SRC CREBBP SMAD3 SRC YBX1 SMAD3 CD7 SHFM1 CEBPB CD7 PARP1 PDZK1IP1 CEBPB PIK3R1 BRCA2 PIK3R1CTNNB1 CTNNB1 MET NPM1 MET SMARCA4 EP300 ACTA2 MNDA SMARCA4 EP300 WDR16 EGFR RELB MYC EGFR EIF2AK2 SMARCB1 PLCG1 RELB MYC SAT1 SMARCB1 PLCG1 TP53 DNAJC3 CALM1 CALM1 HSPA1A ZNF24 PTPRJ YEATS4 SS18 FER PTPRJ NQO1 YEATS4 SS18 FER APLP1 TACC1 CDKN2C DNAJB1 TACC1 PITPNA RRM2 PITPNA RRM2B TFAP2B MLLT10 PIK3C3 TFAP2B MLLT10 PIK3C3 PTEN HIST1H2AC HIST1H2AC RRM1 bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

ACTC1 M N LUAD ACTC1

HSPB1 CAPN1 70.8% altered in HSPB1 PPA1 CAPN1 PPA1 MAGEB18 patient samples NFKBIA VIM ZHX1 MAGEB18 NFKBIA VIM YWHAZ ZHX1 SLC25A38 CHD3 YWHAZ EGFR SLC25A38 CHD3 ANXA3 EGFR TDRD7 EMG1 CABLES1 ANXA3 TDRD7 NPM1 EMG1 CABLES1 CTSE TP53 NPM1 CDK5 EEF2 CTSE TP53 ACTA2 TACC1 CDK5 RPS27A EEF2 AURKA ACTA2 TACC1 LGG JUN YEATS4 RPS27A AURKA YBX1 JUN YEATS4 88% altered in GNL3 TOP1 PCNA YBX1 THAP8 GNL3 TOP1 patient samples PIN1 PCNA BRCA2 THAP8 PIN1 BRCA2

CDC25C

CCNB1 CDC25C CHEK2 CCNB1 CHEK2 SLC25A38 P O CTSE PRAD EEF2 GNL3 EMG1 NPM1 48% altered in CCNB1 CDC25C YBX1 patient samples CCNB1 CHEK2 BRCA2 ACTA2 TOP1 CIB1 CDC25A CDC25C ANXA3 JUN MAGEB18 PLK3 CHEK1

PCNA PIN1 YWHAZ PIN1 TP53 CHEK2 THAP8

RPS27A EGFR NFKBIA AP3B2 ZHX1 HSPB1 CDK5 E2F1 OV USP40 BRCA1 PPA1 CHD3 DDB2 ATM TIPARP 80.3% altered in CAPN1 TP53 XRN2

CABLES1 MSH2 patient samples AURKA ACTC1 VIM MDM4 LMO4 MSH6

TACC1 TDRD7 BLM RAD51 YEATS4 bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Q R KRT18 TCHP

RASGRF1 CCNG2 DVL3 PKP3 RHOA CCNG1 MAGEB18 PPP2CA PKP2 AHSA1 EGFR COL17A1 KRT14 KRT5 AHSA2 ARHGEF4 GAK DSC2 GCH1 PPP2R5A CDK5 AXIN1 YWHAZ DSC1 HSP90AA1 DSG2 APOB DSG1 TG CTNNB1 CABLES1 PKP1 IQGAP1 EEF2 SP1 APC TP53 KRT18 TCHP DST HSP90B1 RPS27A DSP AXIN2 TDRD7 NPM1 JUP HSPA8 PKP3 ASGR1 PTPN13 RP1 AURKA TOP1 ACTA2 PKP2 SKCM AHSA1 HSPA1A PCNA COL17A1 KRT14 MTTP NR3C1 KRT5 AHSA2 TACC1 86.2% altered in GCH1 DSC2 GNL3 READ YBX1 DSC1 patient samples DSG2JUN HSP90AA1 72% altered in YEATS4 APOB TG DSG1 BAG1 PKP1 patient samples DST SP1 DSP HSP90B1 JUP HSPA8 ASGR1 S HSPA1A MTTP NR3C1

RASIP1 RAP1A

PIK3CD BAG1 PRKCI PIK3CA PDE6D NTRK1

RASGRP1 SRC PIK3R1 FES HSPB1 TIAM1 RASA1

HRAS RARRES3 KIT VAV1 TGM1 THCA CAV1 INSR MDK 65.3% altered in PLCE1 FYN SEMG1 patient samples RASSF5 SHOC2 BCL2 BEST1 MMP26 GNL3 MMP9 RB1

MDM4 PPP2R5A MMP7 THBS1 ACTA2 SMAD3 SERPINA1 ESR1 EP300 TP63 TFPI LRP1 BRCA1 PPP1R13B CTSG MMP1 JUN TP53 PRSS3 HIPK2 SP1 CREBBP APP WT1 SERPINA3

bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

a MAGEB18 b TCEB1 YBX1 EEF2 BEST1 MMP26 TCEB2 USP33 DIO2 GNL3 MMP9 TOP1 BEST1 RB1 GNL3 NPM1 MDM4 PPP2R5A MMP7 THBS1 VHL ACTA2 SMAD3 PPP2R5A SERPINA1 RB1 ESR1 EGFR RPA1 MSH6 ACTA2 MMP26 EP300 PCNA BEST1 TFPI GNL3 MMP9 TP63 LRP1 BRCA1 CTSG RB1 JUN HIPK2 HNSC HNSC PPP1R13B SP1 MMP1 RPS27A MDM4 PPP2R5A MMP7 THBS1 JUN TP53 PRSS3 WRN ACTA2TP53 SMAD3 SERPINA1 ESR1 BRCA1 HIPK2 MDM4 98.7% altered in EP300 SP1 CREBBP APP YWHAZ EP300 TP63 TFPI LRP1 WT1 SERPINA3 PARP1 BRCA1 patient samples CTSG CDK5 PPP1R13B TP63 MMP1 JUN CREBBP TP53 ESR1 PRSS3 HIPK2 SMAD3 78.5% altered in YEATS4 AURKA SP1 CREBBP PPP1R13B APP KIRC WT1 SERPINA3 CABLES1 NCOA6 WT1 patient samples

RASSF5 TACC1 TGS1 TDRD7 HSF1 BCL2 SHOC2 TCEB1 c d HRAS BEST1 PIK3CA TCEB2 USP33 DIO2 PLCE1 PIK3CD TCEB1 VAV1 VHL INSR USP33 TCEB2 DIO2 FYN PPP2R5A

RASA1 VHL IGSF9 PIK3R1 UCEC 99.2% altered in HSF1 ASCC2 KIT EGFR patient samples RPA1 GNL3

ACTA2 ERBB2 CDKN2D MSH6 SRC TP63 WRN PCNA STAD CDK6 TP53 CTNNB1 SMARCA4 STAD SLC9A3R1 SMAD3 PARP1 CREBBP WT1 62.5% altered in AXIN2 NF2 EP300 HIPK2 AXIN1 patient samples PTPN13 BRCA1 MSN IQGAP1 JUN CHMP4B

ESR1 FHL2 SPN APC NCOA6 PML ARHGEF4 PPP2R5A PIAS2 RHOA DDX39 TGS1 RP1 RASGRF1 bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

e BRCA 82.6% altered in f SLC25A38 JUN SP1 patient samples

RB1 CTSE PML CCNB1 EMG1 ANXA3 BEST1 SMAD3 CHEK2 ESR1 MAGEB18 CHD3 BRCA1 MDM4 PPP2R5A CDK5 CDC25C VIM EP300 AURKA TP63 NCOA6 HSF1 YEATS4 EEF2 PPP1R13B TACC1 TOP1 PCNA TP53 CREBBP PIN1 GNL3 ZHX1 MSH6 FHL2 CABLES1 YBX1 TP53 ASCC2 ACTA2 TGS1 GNL3 RPS27A HIPK2 CAPN1 WT1 TDRD7 IGSF9 PARP1 ACTA2 NPM1 PCNA JUN NFKBIA YWHAZ RPA1 BLCA 88.2% altered in EGFR ACTC1 WRN BRCA2 THAP8 patient samples HSPB1 PPA1 g h ATM TFAP2C CITED1 COAD 78.8% altered in NBN RASGRF1 RP1 patient samples E2F1 MDC1 RHOA GAK AXIN1 BARD1 ESR1 STAT3 ARHGEF4 DVL3 CREBBP IQGAP1 CITED2 PPP2CA BRCA1 CCNG1 CCNG2 TP53 EP300 CTNNB1 MYC CDK5 ZHX1 JUN MED21 PTPN13 APC SMARCA4 BRCA1 TP53 HSF1 TFAP2A SMAD3 STAT1 KPNA2 AXIN2 PPP2R5A PCNA ESR1 CREBBP KPNB1 RAN RPA1 MSH6 EP300 SMAD3

NUP153 CSE1L PARP1 CESC JUN WRN TGS1 85.4% altered in patient samples NCOA6 bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

i j KICH 78.5% altered in MAGEB18 YEATS4 GBM patient samples DDX5 ACTC1 HSPB1 TACC1 TEP1 SMG5 AURKA MAGEB18 PPA1 HSPA1A CAPN1 NFKBIA GNL3 TERT TDRD7 CDK5 TP53 EEF2 YWHAZ VIM CABLES1 ZHX1 SP1 SLC25A38 EGFR PCNA PRKCA HMGB2 YBX1 ANXA3 CHD3 75.5% altered in EMG1 NPM1 ACTA2 YWHAZ CABLES1 patient samples EEF2 TP53 TDRD7 CTSE NPM1 ACTA2 JUN TOP1 EGFR RPS27A CDK5 RPS27A PPP1R14A TACC1 MDM4 JUN YEATS4 MAPK9 YBX1 AURKA GNL3 TOP1 PIN1 PCNA BRCA2 YWHAQ PRKD1 THAP8 CDC25C

CHEK2 CCNB1 k l LAML 66.7% altered in KIRP 65.8% altered in EEF2 patient samples BRCA1 GNL3 PDZK1 BRCA1 patient samples TOP1 C11orf30 CDK2 CREBBP SRC CREBBP SMAD3 SRC YBX1 SMAD3 CD7 SHFM1 CEBPB CD7 PARP1 PDZK1IP1 CEBPB PIK3R1 BRCA2 PIK3R1CTNNB1 CTNNB1 MET NPM1 MET SMARCA4 EP300 ACTA2 MNDA SMARCA4 EP300 WDR16 EGFR RELB MYC EGFR EIF2AK2 SMARCB1 PLCG1 RELB MYC SAT1 SMARCB1 PLCG1 TP53 DNAJC3 CALM1 CALM1 HSPA1A ZNF24 PTPRJ YEATS4 SS18 FER PTPRJ NQO1 YEATS4 SS18 FER APLP1 TACC1 CDKN2C DNAJB1 TACC1 PITPNA RRM2 PITPNA RRM2B TFAP2B MLLT10 PIK3C3 TFAP2B MLLT10 PIK3C3 PTEN HIST1H2AC HIST1H2AC RRM1 bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

ACTC1 m n LUAD ACTC1

HSPB1 CAPN1 70.8% altered in HSPB1 PPA1 CAPN1 PPA1 MAGEB18 patient samples NFKBIA VIM ZHX1 MAGEB18 NFKBIA VIM YWHAZ ZHX1 SLC25A38 CHD3 YWHAZ EGFR SLC25A38 CHD3 ANXA3 EGFR TDRD7 EMG1 CABLES1 ANXA3 TDRD7 NPM1 EMG1 CABLES1 CTSE TP53 NPM1 CDK5 EEF2 CTSE TP53 ACTA2 TACC1 CDK5 RPS27A EEF2 AURKA ACTA2 TACC1 LGG JUN YEATS4 RPS27A AURKA YBX1 JUN YEATS4 88% altered in GNL3 TOP1 PCNA YBX1 THAP8 GNL3 TOP1 patient samples PIN1 PCNA BRCA2 THAP8 PIN1 BRCA2

CDC25C

CCNB1 CDC25C CHEK2 CCNB1 CHEK2 o SLC25A38 p CTSE PRAD EEF2 GNL3 EMG1 NPM1 48% altered in CCNB1 CDC25C YBX1 patient samples CCNB1 CHEK2 BRCA2 ACTA2 TOP1 CIB1 CDC25A CDC25C ANXA3 JUN MAGEB18 PLK3 CHEK1

PCNA PIN1 YWHAZ PIN1 TP53 CHEK2 THAP8

RPS27A EGFR NFKBIA AP3B2 ZHX1 HSPB1 CDK5 E2F1 OV USP40 BRCA1 PPA1 CHD3 DDB2 ATM TIPARP 80.3% altered in CAPN1 TP53 XRN2

CABLES1 MSH2 patient samples AURKA ACTC1 VIM MDM4 LMO4 MSH6

TACC1 TDRD7 BLM RAD51 YEATS4 bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

q r KRT18 TCHP

RASGRF1 CCNG2 DVL3 PKP3 RHOA CCNG1 MAGEB18 PPP2CA PKP2 AHSA1 EGFR COL17A1 KRT14 KRT5 AHSA2 ARHGEF4 GAK DSC2 GCH1 PPP2R5A CDK5 AXIN1 YWHAZ DSC1 HSP90AA1 DSG2 APOB DSG1 TG CTNNB1 CABLES1 PKP1 IQGAP1 EEF2 SP1 APC TP53 KRT18 TCHP DST HSP90B1 RPS27A DSP AXIN2 TDRD7 NPM1 JUP HSPA8 PKP3 ASGR1 PTPN13 RP1 AURKA TOP1 ACTA2 PKP2 SKCM AHSA1 HSPA1A PCNA COL17A1 KRT14 MTTP NR3C1 KRT5 AHSA2 TACC1 86.2% altered in GCH1 DSC2 GNL3 READ YBX1 DSC1 patient samples DSG2JUN HSP90AA1 72% altered in YEATS4 APOB TG DSG1 BAG1 PKP1 patient samples DST SP1 DSP HSP90B1 JUP HSPA8 ASGR1 s HSPA1A MTTP NR3C1

RASIP1 RAP1A

PIK3CD BAG1 PRKCI PIK3CA PDE6D NTRK1

RASGRP1 SRC PIK3R1 FES HSPB1 TIAM1 RASA1 80% HRAS 70% RARRES3 60% KIT VAV1 TGM1 THCA 50% CAV1 INSR 40% MDK 65.3% altered in Mutation 20% PLCE1 FYN 10% SEMG1 frequency patient samples 0% RASSF5 SHOC2 BCL2 bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Supplementary Figure 2

●●●●●●●●●●●●●● ● ●●●●●●●●●●●●●●●●● ●

●●●●●●●●●●●●● ●●● 1.0

1.0 ● ●●●●●●●●●●● ●●●●● ● ●●●● TCGA BRCA ● TCGA GBM ●●●●●●●● ● ●● ● ●●●●●●●● ● ●●● ● ●●●●●●●●● ●●●●●●● ●●●●●●●● ●● ● ●●●●●●●●● ●● ● ●● ● ●● ●●●●●●●●●●●● ●●●● ●● ●● ● ● ●●● ●● ● ● ●● ●● ●●●●●●● ● ●● ● ●●●● ●●● 0.8 ● 0.8 ●●● ● ● ●● ●●● ●● ● ● ●●● ●●● ●●●● ●●●● ● ● ● ●● ●● ● ● ● ● ●● ● ● ● ● ● ●●

●● ● ● ● ● 0.6 ● 0.6 ●● ● ● ● ● ●● ● ● ● ● ● ● ● 0.4 n = 544, p = 0.0399 0.4 n = 794,● p = 0.00107● ●● ● ● ● ● ● ● HR1=1.21 (95%CI,0.621−2.37) ● HR1=1.28 (95%CI,0.991−1.66) ● ● ● ● ● HR2=1.23 (95%CI,0.587−2.58) HR2=1.06● (95%CI,0.814−1.38) ● ● HR3=2.44 (95%CI,1.23−4.86) ● HR3=1.13 (95%CI,0.807−1.57) ● ● ● 0.2

0.2 ● ● ●● ●● ● ● ● ● ● 0.0 0.0 0 50 100 150 200 0 20 40 60 80 100 120 Months Months

●●● ●●● ●● ●●●●●●

1.0 ● TCGA LUAD 1.0 ● ●● ● TCGA OV ●●● ●● ●● ●●●●●●●●●●● ●●● ● ●● ● ●● ●●●●●● ● ●●●●● ●● ● ● ●●●● ●● ● ●● ● ●●● ●● ●●● ● ●● ●● ●● ●●●●

● ● 0.8 ● 0.8 ●● ● ● ● ●● ●●●● ●●●●● ● ●●● ● ●●●● ●● ● ●● ● ● ●●● ● ● ●● ● ● ●

● 0.6

0.6 ●● ●● ● ● ● ● ● ● ● ● ●● ● ● ●● ●● ● ● ● ●● ●●● ●● ●●●● ● n = 504, p = 0.00742 ● ● ● ● ● 0.4 0.4 ● n = 213, p = 0.0457 HR1=1.57● (95%CI,1.05−2.36) ● ● ●●● ● ●●●● ● HR2=1.5 (95%CI,0.994● −2.26) HR1=1.51 (95%CI,0.811−2.8) ● ● ● ●● ● HR3=1.91● ● ●● (95%CI,1.08−3.36) HR2=1.78 (95%CI,0.99−3.21) ● ● ● ● ● ●●●● ● ● ● 0.2 0.2 HR4=1.6 (95%CI,1.07● −2.41) ● ●

● ● ● ● ● 0.0 0.0 0 50 100 150 0 50 100 150 200 Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Supplementary Figure 3 KM plots bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

a. TCGA BLCA 1) k=2 k=3 k=4 k=5 k=6 k=7 k=8

2) k=2 k=3 k=4 k=5 Survivalfraction

Months Months Months Months

k=6 k=7 k=8 Survivalfraction

Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

b.TCGA BRCA 1) k=2 k=3 k=4 k=5 k=6 k=7 k=8

2) k=2 k=2 k=3 k=3 k=4 k=4 k=5k=5 Survivalfraction

Months Months Months Months

k=6 k=7 k=8 Survivalfraction

Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

c. TCGA GBM 1) k=2 k=3 k=4 k=5 k=6 k=7 k=8

2) k=2 k=2 k=3 k=4 k=5 Survivalfraction

Months Months Months Months

k=6 k=7 k=8 Survivalfraction

Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

d.TCGA HNSC 1) k=2 k=3 k=4 k=5 k=6 k=7 k=8

●●●●●● ●●●●●● ●●●●●● ●

● ● 1.0 2) 1.0 ●● 1.0 ●●● ●● ●● ● ●● ●●●●●● ●●●● ●●●● ●●●●●●● ●● ●●●● k=4 k=2 k=3 ● k=5 ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ●● ● ● ●● ● ● 0.8 0.8 ● 0.8 ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●

●●● ● 0.6 0.6 ● 0.6 ●● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ●● ●● ● ● ● ●

●●●● ● ● ●● ●●●● ● ● ● ● ● ● 0.4 n = 163, p = 0.0134 0.4 0.4 ●● ● ● ● ● ● ●● n = 163, p = ●0.301● ● HR1=0.698 (95%CI,0.375−1.3) n = 163, p = 0.468 ● ● ● ● ● ● ● ● ● ● ● ● HR1=0.895● ● (95%CI,0.51−1.57) HR1=0.83 (95%CI,0.502● −1.37) ● ● ● ● HR2=1.16 (95%CI,0.647−2.08) HR2=1.23 (95%CI,0.684−2.19) ● ● ● ● HR3=2.02 (95%CI,0.878−4.66) ● 0.2 0.2 0.2

● ● Survivalfraction 0.0 0.0 0.0 0 20 40 60 80 100 120 140 0 20 40 60 80 100 120 140 0 20 40 60 80 100 120 140 Months Months Months Months

k=6 k=7 k=8 Survivalfraction

Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

e.TCGA KIRC 1) k=2 k=3 k=4 k=5 k=6 k=7 k=8

2) k=2 k=3 k=4 k=5 Survivalfraction

MonthsMonths Months Months Months

k=6 k=7 k=8 Survivalfraction

Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

f. TCGA LAML 1) k=2 k=3 k=4 k=5 k=6 k=7 k=8

2) k=2 k=3 k=4 k=5 Survivalfraction Months Months Months Months

k=6 k=7 k=8 Survivalfraction

Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

g. TCGA LGG 1) k=2 k=3 k=4 k=5 k=6 k=7 k=8

2) k=2 k=3 k=4 k=5 Survivalfraction Months Months Months Months

k=6 k=7 k=8 Survivalfraction

Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

h.TCGA LUAD 1) k=2 k=3 k=4 k=5 k=6 k=7 k=8

2) k=2 k=3 k=4 k=5 Survivalfraction Months Months Months Months

k=6 k=7 k=8 Survivalfraction

Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

i. TCGA OV 1) k=2 k=3 k=4 k=5 k=6 k=7 k=8

2) k=2 k=3 k=4 k=5 Survivalfraction Months Months Months Months

k=6 k=7 k=8 Survivalfraction

Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

j.TCGA SKCM 1) k=2 k=3 k=4 k=5 k=6 k=7 k=8

2) k=2 k=3 k=4 k=5 Survivalfraction

Months Months Months Months

k=6 k=7 k=8 Survivalfraction

Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Supplementary Figure 4 Independent dataset bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

a. UTSW HNSC 1) k=2 k=3 k=4 k=5 k=6 k=7 k=8

2) k=2 k=3 k=4 k=5 Survivalfraction

Months Months Months Months

k=6 k=7 k=8 Survivalfraction

Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

b.Colon cancer (GSE39582) a k=2 k=3 k=4 k=5 k=6 k=7 k=8

b k=2 k=3 k=4 k=5 Survivalfraction

Months Months Months Months

k=6 k=7 k=8 Survivalfraction

Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

b.Colon cancer (GSE17538) a k=2 k=3 k=4 k=5 k=6 k=7 k=8

● ● ● ● ● ● ● 1.0 ● 1.0 ● 1.0

1.0 ● b ● ● ●● ● ●● ● ●● ● ● ●● ● ● ●●● ● ● ● ●●● ● ● ●● ●● ●●● ● ●● ● ● ●● ●●● ● ●● ●● ●● ●●●● ● ●● ●● ●● ●● ●●●●● ● ●● ●● ● ●●● ● ● ● ●●● ● ●●●●●● k=2 k=3 ● ●●●●●●● ●●●●● ● k=4 ● ●● k=5 ●● ●●● ● 0.8 0.8 0.8 ● ● ●● ● ● ● ● ● ●● ● ● 0.8 ● ●●●● ●●●●●● ● ● ●●●●● ● ● ●●●●● ● ● ● ●●● ● ● ● ●● ● ● ●●●● ● ● ● ● ●● ●●●● ● ●● ● ● ●● ●●●● ●● ● ● ●●●● ●● ● ●●● ●● ●● ● ● ●●●● ● ●● ●● ●● ● ●●● ●● ●●●● ● ●●●● ● ● ● ●●● ● ● ● ● ●● ●●●●● ● ● ●● ● ●● ● ● ● ● ● ●●●●● ● ● ●● ●●●●● ●● ● ● ● ●● ● ●●●●●● ●● ●●●●●●●● ●● ●● ● ● ● ● ● ● ●●● ●● ● ● ●● ● ●● ● ●●● ●●●●●● ●● ● ●●●● ● ●●●●● ● ● ●● ● ●●● ●●●●● ●● ● ● ● ● ●● ● ● 0.6 0.6 0.6 0.6 ●● ●● ● ● ● ● ● ●● ● ● ● ● ●●●● ● ●● ● ● ● ● ●● ●● ●●●● ●● ● ●● ●● ● ● ●● ●● ● ● ● n = ●177,● ● ●● p● =● 0.00137

0.4 0.4 n = 177, p = 0.0205 0.4 0.4 n = 177, p = 0.408 HR1=1.03 (95%CI,0.446−2.4) HR1=1.67 (95%CI,0.843−3.31) n = 177, p = 0.934 HR1=0.763 (95%CI,0.392−1.48) HR2=0.63 (95%CI,0.241−1.64) HR2=0.721 (95%CI,0.311−1.67) HR1=1.03 (95%CI,0.54−1.95) HR2=0.867 (95%CI,0.436−1.72) HR3=2.31 (95%CI,1.1−4.85) HR3=1.06 (95%CI,0.479−2.33) 0.2 0.2 0.2

0.2 HR4=1.44 (95%CI,0.654−3.19) Survivalfraction 0.0 0.0 0.0 0.0 0 20 40 60 80 100 120 140 0 20 40 60 80 100 120 140 0 20 40 60 80 100 120 140 0 20 40 60 80 100 120 140 Months Months Months Months

● ● ● ● ● ● ● ● 1.0 ● 1.0 ● 1.0 ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ●● ●●● ● ● ● ●● ● ●● ● ● ● ● ●●● ● ● ●●●●●● ●● ●●● ● ● ● ● ●● ●●●●●●● ●● ● ●●●●● ●●● ● k=6 k=7 ● ●● ●● ●● ● k=8 ● ● ● ●● ● ● ● ● ●● ● ● ●● ●● ●●●●● ● ●● ●●● ● ● ● ● ●● ● ● ●● ● ●● ●● 0.8 ● ● ●● ● 0.8 ● ● 0.8 ● ● ●● ● ●● ●●●● ●●●●●● ● ●●● ● ● ● ● ● ● ● ●● ● ● ●● ● ●● ●● ● ● ●● ● ● ●● ● ● ● ● ● ●● ● ● ●● ●● ● ● ●● ●●●●● ●● ● ● ● ●● ● ●● ●●●●● ●● ● ● ● ●● ● ●●● ● ●●●● ● ●●● ● ● ● ●● ●● ● ●● ●● ●●●● ● ●●●● ●●● ●●●●●● ●● ● ● ● ● ●●● ● ● ●● ● ● ●● ● ●● ●● ● ● ● ● ● ● ● ●

● 0.6 0.6

0.6 ●● ●● ●● ● ●● ●● ●● n = 177, p = 0.000465 n = 177, p = 0.000644●● ● ● ● ●● ● ● ● ●● ● ●● ● ● HR1=7.21e●● ● ●−●08● ●(95%CI,0−Inf) n = 177,●● ● ● ●p ●= ●0.000511 HR1=1.17 (95%CI,0.227−6.05) ●● ● ●

0.4 0.4 HR2=0.442 (95%CI,0.153−1.28) 0.4 HR1=0.394 (95%CI,0.111−1.4) HR2=3 (95%CI,0.776−11.6) HR3=0.246 (95%CI,0.0316−1.91) HR2=0.392 (95%CI,0.118−1.3) HR3=1.45 (95%CI,0.474−4.43) HR4=0.589 (95%CI,0.248−1.4) HR3=0.389 (95%CI,0.113−1.34) HR4=1.17 (95%CI,0.399−3.43) HR4=1.11 (95%CI,0.352−3.48) HR5=0.744 (95%CI,0.236−2.35) 0.2 HR5=2.04 (95%CI,0.706−5.89) 0.2 0.2 HR6=0.654 (95%CI,0.269−1.59) HR5=0.734 (95%CI,0.229−2.35) HR6=3.34 (95%CI,1.21−9.21) Survivalfraction HR7=1.43 (95%CI,0.652−3.16) 0.0 0.0 0.0 0 20 40 60 80 100 120 140 0 20 40 60 80 100 120 140 0 20 40 60 80 100 120 140 Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

c. OV (Bonome et. al.,Cancer Research 2008, ) 1) k=2 k=3 k=4 k=5 k=6 k=7 k=8

2) k=2 k=3 k=4 k=5 Survivalfraction

Months Months Months Months

k=6 k=7 k=8 Survivalfraction

Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

d. OV (Tothill et. al., Clinical Cancer Research, 2008) 1) k=2 k=3 k=4 k=5 k=6 k=7 k=8

2) k=2 k=3 k=4 k=5 Survivalfraction

Months Months Months Months

k=6 k=7 k=8 Survivalfraction

Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

e. Lung adenocarcinoma (Shedden et. al., Nature Methods, 2008) 1) k=2 k=3 k=4 k=5 k=6 k=7 k=8

2) k=2 k=3 k=4 k=5 Survivalfraction

Months Months Months Months

k=6 k=7 k=8 Survivalfraction

Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Supplementary Figure 5 Single gene analysis bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. a. TCGA BLCA

TP53% ASCC2% WRN% FHL2%

NCOA6% MSH6% bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. b.TCGA BRCA

ZHX1% NFKBIA% ACTA2% PIN1%

THAP8% NAT6% BRCA2% CDK5%

CHD3% YBX1% TDRD7% VIM% bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. c. TCGA GBM

TEP1% PTRF% EEF2% GNL3%

TOP1% bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. d. TCGA HNSC

CABLES1( TOP1( EEF2( HIPK2( bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. e. TCGA KIRC

GNL3% SERPINA3% HIPK2% BEST1%

CREBBP% EP300% APP% MDM4%

PPP1R13B% MMP7% USP33% MMP9% bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

TP53% CTSG% SERPINA1% TCEB2%

SP1% THBS1% WT1% TCEB1% bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. f. TCGA LAML

SHFM1& NPM1& HSPA1A& TP53&

NQO1& WDR16& PARP1& bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. g.TCGA LUAD

YBX1% CABLES1% DDX5% HSPA1A%

UPF3B% AURKA% YWHAZ% ACTC1%

NPM1% ●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● 1.0 ●●●● ●●●●●●●●●●●●●●●●●● ●●●●●●●● ●●●●●●●●●●●●●● ●●●●● ●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●● ●●●●●●●●●●●● ●● ●●●●●●●● ● ●●●● ●● ●●●●● ●● 0.8 0.6 0.4 0.2

bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not

0.0 certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 0 10 20 30 40 50 60 h.TCGA SKCM

GCH1% DSC1% AHSA1% KRT18%

COL17A1% bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. i. TCGA STAD

HSF1% ASCC2% ACTA2% bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. j. TCGA UCEC

SPN$ CDK6$ PIK3CA$ PIK3R1$

CDKN2D$ FYN$ RP1$ PIK3CD$