bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
An integrative somatic mutation analysis to identify pathways linked with survival outcomes across 19 cancer types Sunho Park1, Seung-Jun Kim2, Donghyeon Yu1, Samuel Pena-Llopis3,4, Jianjiong Gao5, Jin Suk Park1, Hao Tang1, Beibei Chen1, Jiwoong Kim1, Jessie Norris1, Xinlei Wang6, Min Chen7, Minsoo Kim1, Jeongsik Yong8, Zabi WarDak4,9, Kevin S Choe4,9, Michael Story4,9, Timothy K. Starr10,11, Jaeho Cheong†12, Tae Hyun Hwang†1,4
1Department of Clinical Sciences, University of Texas Southwestern Medical Center,
Dallas, Texas, United States of America
2Department of Computer Science and Electrical Engineering, University of Maryland
at Baltimore, Baltimore County, Maryland, United States of America
3Department of Internal Medicine, University of Texas Southwestern Medical Center,
Dallas, Texas, United States of America
4Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical
Center, Dallas, Texas, USA.
5Center for Molecular Biology, Memorial Sloan Kettering Cancer Center, New York,
New York, 10065, United States of America
6Department of Statistical Science, Southern Methodist University, Dallas, Texas,
United States of America
7Department of Mathematical Sciences, University of Texas at Dallas, Dallas, Texas,
United States of America
8Department of Biochemistry, Molecular Biology and Biophysics, University of
Minnesota Twin Cities, Minneapolis, Minnesota, United States of America
9Department of Radiation Oncology, University of Texas Southwestern Medical
Center, Dallas, Texas, United States of America
10Masonic Cancer Center, University of Minnesota – Twin Cities, Minneapolis,
Minnesota, Unites States of America
- 1 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
11Department of Obstetrics, Gynecology & Women's Health and Department of
Genetics, Cell Biology and Development, University of Minnesota, Minneapolis,
Minnesota, United States of America
12Department of Surgery, Yonsei University College of Medicine, Seoul, Korea
†Corresponding author
Email addresses:
SPL: [email protected]
- 2 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
THH: [email protected]
Abstract Identification of altered pathways that are clinically relevant across human cancers is a key challenge in cancer genomics. We developed a network-based algorithm to integrate somatic mutation data with gene networks and pathways, in order to identify pathways altered by somatic mutations across cancers. We applied our approach to The Cancer Genome Atlas (TCGA) dataset of somatic mutations in 4,790 cancer patients with 19 different types of malignancies. Our analysis identified cancer-type- specific altered pathways enriched with known cancer-relevant genes and drug targets. Consensus clustering using gene expression datasets that included 4,870 patients from TCGA and multiple independent cohorts confirmed that the altered pathways could be used to stratify patients into subgroups with significantly different clinical outcomes. Of particular significance, certain patient subpopulations with poor prognosis were identified because they had specific altered pathways for which there are available targeted therapies. These findings could be used to tailor and intensify therapy in these patients, for whom current therapy is suboptimal.
Background In the last few years, studies using high-throughput technologies have highlighted the
fact that the development and progression of cancer hinges on somatic alterations.
These somatic alterations may disrupt gene function, such as activating oncogenes or
inactivating tumor suppressor genes, and thus, dysregulate critical pathways
contributing to tumorigenesis. Therefore, precise identification and understanding of
disrupted pathways may provide insights into therapeutic strategies and the
development of novel agents. Many large-scale cancer genomics studies, such as The
Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium
(ICGC), have performed an integrated analysis to draft an overview of somatic
alterations in the cancer genome1-4. Many of these studies have reported novel
- 3 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
candidate cancer genes mutated at high and intermediate frequencies in a specific
cancer as well as across many cancer types4. However, it is a still challenge to
translate somatic mutations in tumors into the pathway model to accurately predict
patient clinical outcomes 5, 6. Recently, in order to improve the clinical relevance and
utility of somatic mutation analyses, Hopfree et al7 proposed integrating somatic
mutation data with molecular interaction networks for patient stratification. They
demonstrated that inclusion of prior knowledge, captured in molecular interaction
networks, could improve identification of patient subgroups with significantly
different histological, pathological or clinical outcomes and discover novel cancer-
related pathways or subnetworks. In a similar manner, other network-based methods
have demonstrated that incorporating molecular networks and/or biological pathways
can improve accuracy in identifying cancer-related pathways8-11.
One limitation of these network-based methods is that they are not designed to fully
utilize large-scale somatic mutation data from multiple cancer types to determine
which particular pathways are altered by somatic mutations across a range of human
cancers. In addition, due to the incomplete knowledge of existing gene set and/or
pathway database, these methods are limited to detect pathways based on a number of
altered genes annotated in existing gene set and pathway databases. Alternatively, the
methods that build pathways de novo without incorporating biological prior
knowledge can be applicable to detect altered pathways, but these methods were not
designed to detect cancer-type specific or commonly altered pathways either.
To address these, we developed an algorithm named NTriPath (Network regularized
non-negative TRI matrix factorization for PATHway identification) to integrate
somatic mutation, gene-gene interaction networks and gene set or pathway databases
to discover pathways altered by somatic mutations in 4,790 cancer patients with 19
- 4 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
different types of cancers. Incorporating existing gene set or pathway databases
enables NTriPath to report a list of altered pathways across cancers, and make easy to
determine/compare which particular pathways are altered in a particular cancer
type(s). In particular, the use of large-scale genome-wide somatic mutations from
4,790 cancer patients and gene-gene interaction networks enables NTriPath to classify
genes, which were not annotated in existing gene set or pathway databases, as new
member genes of the identified altered pathways based on modular structures of
mutational data within a cancer type and/or across multiple cancer types (using matrix
factorization) and connectivity in the gene-gene interaction networks. The questions
that we investigate here are: first, whether large-scale integrative somatic mutation
analysis can reliably identify cancer-type-specific or commonly altered pathways by
somatic mutations across cancers; second, whether the identified pathways can be
used as a prognostic biomarker for patient stratification - with the assumption that the
altered pathways contribute to cancer development and progression and, thus, impact
survival.
In these experiments, we demonstrated that the cancer-type-specific and commonly
altered pathways identified by NTriPath are biologically relevant to the corresponding
cancer type and are associated with patient survival outcomes. We also showed that
cancer-specific altered pathways are enriched with many known cancer-relevant
genes and targets of available drugs including those already FDA-approved. These
results imply that the cancer-specific altered pathways can guide therapeutic strategy
to target the altered pathways that are pivotal in each cancer type.
- 5 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Results
NTriPath: An integrative somatic mutation analysis for discovering pathways altered by somatic mutations across multiple cancer types NTriPath integrates somatic mutations with gene-gene interaction networks and a
pathway database to discover altered pathways across human cancers. We collected
somatic mutation data from TCGA for 4,790 patients and 19 different cancer types
(Table 1). A diagrammatic description of our algorithm is depicted in Figure 1. Four
types of data were used as input for our algorithm. First, we generated a binary matrix
(!) of patients x genes, with ‘1’ indicating a mutation and ‘0’ no mutation. Second,
we constructed gene-gene interaction networks (!). Third, we incorporated a pathway
12 database (!! ) (e.g., conserved 4,620 subnetworks across species ). Fourth, we
included clinical data on the patient's tumor type (!). NTriPath produces two matrices
as output; 1) altered pathways by mutated genes (!) and 2) altered pathways by
cancer type matrix (!). The use of both large-scale somatic mutation profiles and
gene-gene interaction networks enabled NTriPath to identify cancer-related pathways
containing known cancer genes mutated at different frequencies across cancers with
newly added member genes according to high network connectivity (!) 8, 13. Finally
we use the altered pathways by cancer type matrix (!) to identify altered pathways
that are specific for each cancer type. For further details, please see the Materials and
Methods section. Our method is also available at www.taehyunlab.org.
NTriPath identifies cancer-type-specific altered pathways that are biologically and clinically relevant In each cancer type, we selected the top 3 ranked altered pathways by
statistical significance from NTriPath with the 4,620 subnetwork modules to generate
cancer-type-specific altered pathways (See Material and Method section and
Supplementary Table 1). Interestingly, NTriPath was able to find altered pathways
containing not only genes that were frequently mutated but also genes that were - 6 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
mutated in a small subset of patients in each cancer type (Supplementary Table 2 and
Supplementary Figure 1). Gene set enrichment analysis using the genes from the top 3
altered pathways showed that the altered pathways are significantly enriched with
well-known cancer-related genes from COSMIC database 14 and known drug target
genes as well as cancer-relevant biological processes (Supplementary Table 3 and 4).
Focusing on kidney renal clear cell carcinoma (KIRC) as a proof of concept,
NTriPath identified the pathway consisting of VHL, USP33, DIO2, TCEB1 and
TCEB2 as the top-ranked altered pathway in KIRC (Figure 2A). The VHL (von-
Hippel Lindau) gene is a well-known tumor suppressor associated with KIRC, and is
frequently mutated in patients with KIRC15-18. VHL was the most frequently mutated
gene in TCGA KIRC with 55.7% of patients harboring mutations in the gene. TCEB1
is mutated at very low frequency in TCGA KIRC cohort. A recent study found that
TCEB1 is mutated in about 3% of the KIRC patients without VHL inactivation, and
found TCEB1 preventing the binding of Elongin C to VHL, which inactivates the VHL
pathway16. The second highest ranked pathway contained EP300 and TP53. EP300
and TP53 were mutated in 8.1% and 5.2% of patients, respectively. EP300 has been
identified as a co-activator of hypoxia-inducible factor 1 alpha (HIF1α), whose
activation is a hallmark of KIRC tumors. TP53 was previously found to be associated
with poor outcome in TCGA KIRC19. The third highest ranked pathway contains
LRP1 and matrix metalloproteinases (MMP1, MMP7, MMP9, MMP26). LRP1 is
mutated in 10% of TCGA KIRC cohort, but matrix metalloproteinases (MMPs) were
not mutated in TCGA KIRC cohort. Biological and clinical relevance of LRP1
mutation in KIRC has not been previously reported. MMPs have been implicated in
different types of cancer progression including the acquisition of invasive and
metastatic properties in many cancer types. The aberrant expression of MMPs has
- 7 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
been associated with poor patient survival and prognosis in KIRC patients 8, 20.
Interestingly, recent studies suggested that LRP1 induces the expression of matrix
metalloproteinase (MMPs) and thus promotes cancer cell invasion and metastasis in
many cancers including KIRC16,21-23.
NTriPath identified many new member genes in the top ranked pathways
including TCEB2, JUN, and SP1 as well as other tumor suppressors such as CREBBP,
SMAD3, BRCA1 and RB1. These newly identified member genes by NTriPath were
mutated at a very low frequency or not mutated at all in TCGA KIRC patients.
Instead, these genes interacted with many frequently mutated genes in the networks
and were often dysregulated at the mRNA and protein levels in many KIRC patients
(Figure 2B). For example, TCEB2, SP1 and JUN were not mutated but yet their
expression was dysregulated in 7%, 10% and 2% of TCGA KIRC patients,
respectively. Previous studies have shown that dysregulation in TCEB2 is expected to
disrupt the protein complex that ubiquitinates HIF1α, resulting in the same phenotype
as VHL inactivation by mutation or promoter hypermethylation24-26. In addition, SP1
and JUN were previously identified as major transcriptional regulators associated with
signaling circuit to promote tumor growth and invasion in KIRC 18.
Taken together, these results demonstrate that NTriPath is an effective tool to
accurately identify cancer-specific altered pathways including known cancer genes
mutated at a high or intermediate frequency in the patients, as well as genes mutated
at a very low frequency or not mutated at all yet may be fundamental role in
development and/or progression of KIRC.
Cancer-type-specific altered pathways across cancer types correlate with patient survival outcomes We hypothesized that cancer-type-specific altered pathways reflect the molecular
basis underlying the patient clinical outcomes. This would allow us to use member
- 8 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
genes in the altered pathways as gene signatures to stratify patients into subgroups
with different clinical outcomes for each type of cancer. We first collected a dataset
consisting of gene expression profiles from 3,656 patients with their survival
information from TCGA cohorts. We then used member genes in the top 3 ranked
cancer-type-specific altered pathways to perform consensus clustering for each cancer
type (see Material and Method section). We generated Kaplan-Meier (KM) curves
based on the groups produced by consensus clustering and found that patient survival
was significantly different among the groups (Figure 3 and Supplementary Figure 2).
In TCGA KIRC, we found three patient subgroups (A, B and C), with the Group C
having the poorest survival. A log-rank test indicated that Groups A and C had
significantly different survival outcomes (Log-rank test p-value = 1.840e-08, Hazard
ratio = 2.94) with median survival times of 41.9 months for group A compared to 30.8
months for group C (Figure 3A). Other examples are Bladder Urothelial Carcinoma
(BLCA), Head and Neck squamous cell carcinoma (HNSC), and Skin Cutaneous
Melanoma (SKCM) patient subgroups identified by NTriPath pathway signatures.
While the molecular classification of clinically relevant subtypes of these cancers is
still challenging, we found patient subgroups having significantly different survival in
these cancers (Log-rank test p-value= 0.0086, 0.0010, and 0.0210, repectively)
(Figure 3B, 3C and 3D).
Experiments with other TCGA datasets, including those for Breast invasive
carcinoma (BRCA), Glioblastoma Multiforme (GBM), Lung adenocarcinoma
(LUAD), and Ovarian serous cystadenocarcinoma (OV) consistently showed that the
use of member genes in cancer-type-specific altered pathways could serve as a
prognostic biomarker for patient stratification (Supplementary Figure 2 and 3). For
comparison, we also attempted to cluster patients using significant frequently mutated
- 9 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
genes previously identified by the TCGA Pan-Cancer study1. The results of consensus
clustering using the NTriPath-derived pathway signatures and the TCGA Pan-Cancer-
derived mutated gene signatures showed that the results from NTriPath-derived
pathway signatures had higher significance levels for BLAC, BRCA, and KIRC, and
comparable results for the GBM, HNSC, and LUAD cancer types (Figure 4). These
findings suggested that NTriPath-derived altered pathways could be used as
prognostic biomarkers for better patient stratification.
Independent cohorts for the validation of the cancer-type-specific altered pathways We performed multiple validations to evaluate the robustness and the reproducibility
of NTriPath. First, we evaluated the robustness of the cancer-type-specific altered
pathways identified in the TCGA cohort for prognostic stratification. We generated
gene expression profiles of 102 HNSC patients from our institution and used the
member genes of the top 3 HNSC cancer-type-specific altered pathways in the TCGA
cohort for patient stratification. In addition, we also used publically available gene
expression data from two ovarian cancer datasets, one lung cancer dataset, two colon
cancer datatsets for a total of 1,112 patients, and used the top 3 cancer-type specific
altered pathways for corresponding cancer type for independent validation. In the
HNSC cohorts, we found six patient subgroups (A through F), with the group F
patients having the poorest survival times (Figure 5A). A log-rank test indicated that
groups A and F had significantly different survival outcomes (p-value = 0.038, hazard
ratio = 1.88) with median survival times of 78.1 months for group A and 26.7 months
for group F. Similarly, we found patient subgroups having significantly different
survival outcomes in lung cancer, ovarian cancer, and colorectal cancer (Figure 5B-D
and Supplementary Figure 3). Secondly, we verified the reproducibility of NTriPath
for the identification of the cancer-type-specific altered pathways. We collected the
- 10 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
level 2 somatic mutation data from 19 human cancer types those were updated after
we collected initial dataset used in the original experiments from the TCGA data
portal. We found that there are 1891 newly updated patients’ mutation data from 15
cancer types (see Supplementary Table 5). We re-ran NTriPath to identify cancer-
type-specific pathways across 19 cancers using 6681 patients’ somatic mutation data
including those of newly updated patients’ mutation data. Interestingly, we found that
many top ranked pathways identified by NTriPath in the original experiments were
consistently highly ranked in the new experiments (see Supplementary Table 6).
These results reassure that NTriPath is a robust tool to detect the altered pathways
across cancers, and the altered pathways identified by NTriPath can serve as robust
prognostic signatures for identifying patient subgroups with different survival
outcomes across multiple cancer types.
NTriPath identified potential therapeutic targets in poor prognosis patient subgroups We further investigated whether we could identify potential targets or the therapy for
the identified poor prognosis patient subgroups. Interestingly, we found that many
known drug targets in the cancer-type specific altered pathways are often up-regulated
in poor prognosis patient subgroups across cancers (see Method section and
Supplementary 7). For example, in TCGA KIRC cohort, LRP1 and MMP9, targets of
FDA-approved drugs Tenecteplase and Captopril, were significantly up-regulated in
poor prognosis group compared to good prognosis group (FDR-adjusted p-value <
0.05 with t-test). Tenecteplase binds to LRP1 and induces both LRP1 and MMP9
expression, and Captoprilare inhibits MMP9 expression. Thus, combinatorial therapy
of these drugs can be beneficial for the KIRC patients with high LRP1 and MMP9
expression 27-44. Another notable example includes DNA Topoisomerase I (TOP1), a
target of well-known FDA-approved anticancer drugs such as Irinotecan and
- 11 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Topotecan, identified by NTriPath as a new member gene into the cancer-type-
specific altered pathways across many cancers including HNSC. Interestingly, we
found that TOP1 was up-regulated in poor prognosis subgroups in HNSC from both
TCGA and UTSW cohorts (Group E and F in Figure 3C and 5A, respectively). In
addition, we found that some patients with overexpression of TOP1 in TCGA HNSC
poor prognosis subgroup have developed therapy resistance against single
chemotherapeutic agent such as Cisplatin. Interestingly, there is an ongoing trial in
advanced HNSC showing efficacy of TOP1 inhibitor Irinotecan with Cisplatin in a
poor prognosis patient subgroup 45. These observations may suggest that TOP1
inhibitors-based combinations might offer an effective treatment option for HNSC
patients with overexpression of TOP1. Taken together, these findings suggested that
the use of NTriPath-derived altered pathways containing available drug targets may
allow for the development of more tailored therapeutics.
Discussion Systematic understanding of how somatic mutations influence clinical outcomes is
essential for the development and application of personalized therapies. Especially
organizing alterations at the individual gene level and in the molecular pathways can
correlate altered pathways and vulnerabilities with specific genetic lesions, and
provide novel insights into cancer biology, biomarkers for patient stratification in
clinical trials, and potential targeted drug development46. Here, we systematically
identified biological and clinical relevant cancer-type-specific altered cross multiple
cancer types. In particular, the integration of somatic mutation with biological prior
knowledge led to the identification of altered pathways that contain recurrently
mutated genes as a hallmark of specific cancer types. Interestingly, we found that
several genes, while not frequently mutated or not mutated at all in patients, were part
- 12 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
of cancer-type-specific altered pathways that have been causally implicated in the
development of corresponding cancer types, and expressions of those genes are
significantly associated with clinical outcomes (Supplementary Figure 4). For
example, no mutation of MMP7 has been reported, but high expression of MMP7 (p
= 0.00191, HR=1.7 (95%CI,1.21−2.38)) is significantly associated with poor survival
in TCGA KIRC patients. Other examples include CABLES1 (p = 0.00272, HR=0.486
(95%CI,0.301−0.787)) in TCGA HNSC and LUAD, and GCH1 (p = 0.0000528,
HR=0.52 (95%CI,0.367−0.763)) in TCGA SKCM are not frequently or not mutated
but low or high expression of those genes are significantly associated with poor
survival. In addition, we found that known drug targets are not frequently mutated but
often up-regulated in poor prognosis patient subgroups across many cancers. These
results further corroborate that the integrative analysis of somatic mutations with
additional biological prior knowledge may elucidate potential candidate genes
associated with clinical outcomes and could be potentially used to design targeted
therapy, which cannot be readily identified by somatic mutation analysis alone.
In our analysis, we did not remove synonymous mutations or further select a shorter
3, 47 list of recurrent mutated genes in cohorts with stringent criteria either . However,
Hopfree et al7 showed that filtering synonymous mutations resulted in a decreased
ability to detect patient subgroups with different survival outcomes. Another recent
study also showed that synonymous mutations could affect functions of oncogene and
tumor suppressors 48. In addition, our experimental results for patient stratification in
comparison with recurrent mutated gene signatures identified by the TCGA Pan-
Cancer1 indicated that the use of NTriPath-derived pathways showed a comparable or
better performance in discovering patient subgroups with different survival outcomes
across cancers. To evaluate the impact of different network resources, we used
- 13 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
networks from the HPRD 49 and Rossin, E.J. et al50 and repeated experiments. We
summarize the results of altered pathways and patient stratification using different
network resources and provide on our supplement website.
Lastly, NTriPath is a general computational algorithm and can be applied to other data
types such as gene expression, copy number alteration, and methylation to identify
altered pathways by different types of genomic aberrations. NTriPath can also be used
to find altered pathways across associated with other cancer-related phenotypes (e.g.,
patient groups having therapy resistance vs. sensitivity, metastatic vs. non-metastatic).
Conclusions We have described an integrative somatic mutation analysis for discovering altered
pathways in human cancers. NTriPath integrates somatic mutation data and prior
biological knowledge from the pathway database and molecular networks to identify
significantly altered pathways and their associations with specific cancer types.
Specifically, NTriPath effectively utilizes mutation patterns that exist in only a subset
of samples (or specific cancer types), thus revealing pathways altered by complex
mutation patterns across cancer types. Furthermore, the use of gene-gene interaction
networks and the pathway database provides the potential to identify altered pathways
enriched with genes harboring mutations at high/intermediate frequencies, as well as
those not mutated per se but nevertheless playing critical roles in tumorigenesis in
network and pathway contexts. Thus, NTriPath is uniquely suited to provide a global
analysis of altered pathways by somatic mutation across cancer types.
We applied NTriPath to somatic mutation data from 19 types of cancers, and
discovered cancer-type-specific altered pathways based on these mutations in human
cancers. Functional enrichment analysis of cancer-type-specific pathways
demonstrated that the identified cancer-type-specific altered pathways are biologically
- 14 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
meaningful to each cancer type. It also provided unique pathway views of key
biological processes underlying each cancer type. Of particular significance, we
identified a patient subgroup with poor survival by cancer-type-specific altered
pathway signatures from TCGA cohorts, which in independent cohorts. These results
implied the potential utility of cancer-type-specific altered pathway signatures to
serve as a guide to tailored treatment in a patient subgroup.
Materials and Methods
Somatic mutation, human gene-gene interaction networks, and pathway data The level 2 somatic mutation data from 19 human cancer types were collected from
the TCGA data portal on May 19th 201351. We constructed a gene-gene interaction
network by combining networks from Zhang, S. et al 52, the Human Protein Reference
Database (Dec. 2013)53 and Rossin, E.J. et al50..Four sets of pathways were used in
the analysis: 1) 4,620 conserved subnetworks from the human gene-gene interaction
network12, 2) KEGG, 3) Biocarta, and 4) Reactome gene sets from MsigDB (Sept.
2010)54.
Algorithm The algorithm identifies pathways disrupted by mutated genes. Disrupted pathways
are found based on the factorization results from the network regularized nonnegative
tri-matrix factorization.
1. Notations
We construct a binary data matrix ! ∈ !!×!from the mutation data, where ! is the
number of patients, ! is the number of genes and the (!, !)!!! element of the matrix !,
[!]!", is ‘1’ if the !th patient has a mutation on the !th gene, ‘0’ otherwise. We derive
the adjacency matrix from the human gene-gene interaction networks and denote it as
- 15 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
!, where [!]!"=‘1’ if the !th gene is interacting with the !th genes in the networks and
‘0’ otherwise. We define the graph Laplacian matrix by ! = ! − !, where each
diagonal element in the diagonal matrix ! is given by [!]!! = !"[!]!". We construct
!×!! a binary matrix ! ∈ ! denoting patient cluster, where !! indicates the number of
cancer types and [!]!"=1 indicates the !th patient has !th cancer type. We construct a
!×!! binary matrix !! ∈ ! from the specific pathway database denoting pathway
information, where !! is the number of pathways and [!!]!"=1 if the !th gene is
annotated in !th pathway as a member in the pathway database, otherwise 0. Since
current pathway database annotation is still incomplete, we define a matrix
! ∈ !!×!! denoting newly updated pathway information including newly added
member genes by NTriPath. We define a matrix ! ∈ !!!×!! denoting cancer type and
pathway associations, where each element of [!]!" represents associations between !th
cancer type with ! th pathway. Higher values of elements indicate stronger
associations between cancer types and pathways. Since ! and ! are unknown, we
need to learn about those matrices during optimization (see below section for details)
2. Network regularized non-negative tri-matrix factorization
The network regularized nonnegative tri-matrix factorization is an extension of
Nonnegative Tri Matrix Factorization (NTMF); in this work, somatic mutation data !
is factorized as the products of three element-wise non-negative matrices !, !, and !
denoting patient’s cancer type, cancer-type and pathway associations, and cancer-
related pathways, respectively. We here consider a weighted loss function to deal with
the sparseness of the data matrix ! (More than 98% of entries are zero). It enables us to
focus on an approximation error at nonzero entries, which correspond to mutated
genes. In addition, to incorporate the prior knowledge from human gene-gene
interaction networks and pathway datasets into factorizations, we enforce constraints
- 16 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
on parameters, which involve the graph Laplaican ! and the pathway information !!.
All these ideas are accomplished by minimizing the following objective function
! ! ! ! ! ! min ∥ ! ∘ ! − !"! ∥!+ !! ∥ ! ∥!+ !! ∥ ! ∥!+ !! ∥ ! − !! ∥!+ !!!" ! !" , !,!!!
!×! where ! ∈ ! is a weight matrix where [!]!" = 1 if [!]!" > 0 otherwise 0, and
the operator ∘!represents the element-wise multiplication. Here, we are only interested
in learning of ! and ! among three factor matrices, since factor !!can be obtained from
the patient’s clinical information.
To solve our minimization problem, we adapt the multiplicative update method for
NTMF proposed in a recent study55, which contains a routine for avoiding
‘inadmissible zeros problem’ where the solution of multiplicative update rules is stuck
at zero when an entry in the factor becomes.
Step 1: Initialization
!!×!! Initialize the factor matrices ! = ! and ! = !!, where ! ∈ ! is a matrix whose
elements are all one. Set the regularization parameters !! =!! = !! = 1 and !!=0.1. Set
!!" the user specified parameters for avoiding the inadmissible zeros problem, !!"# = 10 ,
!!! = 10!! and ! = 10!!".
Step 2: Iteration
Iterate until it converges or reaches the maximum number of iterations:
! ! [!]!" ← ([!]!" + !!")!!"
! ! [!]!" ← ([!]!" + !!")!!"
! ! where!!!" is set to ! if [!]!" ≥ !!"# !"#!!!" > 1, otherwise 0, and
[!!!"] !! = !" ,! !" ! ! [! ! ∘ !"! !]!" + !! ∥ ! ∥!+ !
! ! [! !" + !!!" + !!!!]!" !!" = ! ! .! [(! ∘ (!"! )) !" + !!! + !!!"]!" + !! ∥ ! ∥!+ !
- 17 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Empirically, the algorithm converges fast within 50 iterations in the experiments.
3. Identification of cancer-type-specific altered pathways Once the above optimization problem is solved, we used ! matrix to identify cancer-
type-specific altered pathways across cancers. Specifically, we ranked pathways
based on values of elements of the ! matrix for each cancer type (e.g., rank pathways
based on all values of !th row which indicate association scores between ith cancer
type and all pathways). In addition, to measure statistical significance of cancer-type
and pathway associations, we performed a permutation test (e.g., we randomly
permuted somatic mutation data and repeated experiments 5,000 times to calculate
empirical p-values) and defined cancer-type-specific altered pathways based on the
following strict criteria: 1) Pathways must be ranked within the top !th compared to
other pathways in each cancer type based on their association scores in matrix !. 2)
Pathways must have significant BH-adjusted p-values (Benjamini-Hochberg adjusted
p-values using a false discovery rate cutoff of 0.1) (See Supplementary X). In this
work, we selected the top 3 ranked pathways having significant BH- adjusted p-values
per each cancer type. Top ranked pathways for KICH, KIRP, and THCA were
excluded for further analysis, due to the insignificant BH-adjusted p-values.
Gene expression data and clustering We collected RNA-seq data for TCGA BLCA, BRCA, HNSC, KIRC, LAML,
LUAD, SKCM, STAD, UCEC from cBioPortal56-58 using CGDS MATLAB toolbox
with RNA Seq V2 RSEM option. We collected microarray gene expession profiles for
TCGA GBM from the TCGA dataportal51 and TCGA OV and two others from
Zhang, W. et al. 59. We collected colon cancer data from GSE39582 and lung cancer
data from Shedden, K. et al. 60. RNA-seq data were z-transformed while other
expression data were quantile normalized, log transformed, and expression values
- 18 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
were median centered. To perform consensus clustering, we used Matlab K-means
clustering and used two-way hierarchical clustering.
Authors' contributions
SP and THH designed the study. SP and THH performed the analysis. THH supervised the project. SP, JN, SL, and THH wrote the paper. All authors read and approved the final manuscript. Acknowledgements We would like to thank to Quantitative Biomedical Research Center to allow us to use
their computational resources.
Figures
Figure 1. Overview of NTriPath
This figure describes steps to discover altered pathways across multiple cancer types.
The aim of the approach is to integrate the somatic mutation data with gene-gene
interaction networks and a pathway database for discovering altered pathways across
cancers.
Figure 2. KIRC-specific altered pathways
(A) Diagrams of the top three ranked altered pathways in patients with KIRC. Red
color indicates genes that are frequently mutated. A circular-shaped node represents
the original member genes annotated in the pathway database, and a diamond-shaped
node represents newly identified members genes of the pathways by NTriPath (B)
Protein and mRNA expression and mutation status for all genes identified in the top
three KIRC altered pathways. Each row represents a member gene in the TCGA
- 19 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
KIRC-specific altered pathway, and each column represents a patient sample in
TCGA KIRC cohort.
Figure 3. Cancer-type-specific altered pathways across cancers correlate with survival
outcomes
Kaplan-Meier survival plots based on patient subgroups defined by consensus
clustering using genes from the top 3 altered pathways for (a) kidney renal cell
carcinoma (KIRC), (b) bladder urothelial carcinoma (BLCA), (c) head and neck
squamous carcinoma (HNSC), and (d) skin cutaneous melanoma (SKCM).
Figure 4. Comparing NTriPath-derived signatures with mutation-frequency-based
signatures
This figure describes comparisons of patient stratification using signatures derived
from NTriPath and mutation frequency reported in Kandoth, C. et al. 1.
Figure 5. Validation in independent cohort
This figure describes Kaplan-Meier survival plots for patient subgroups from (a)
UTSW HNSC (b) Lung adenocarcinoma (c) Colon cancer, (d) Ovarian cancer.
Cancer Type Number of Patients
1 Acute Myeloid Leukemia (LAML) 75
2 Bladder Urothelial Carcinoma (BLCA) 136
3 Brain Lower Grade Glioma (LGG), 217
4 Breast Invasive Carcinoma (BRCA) 772
5 Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma (CESC) 41
6 Colon Adenocarcinoma (COAD) 270
7 Glioblastoma Multiforme (GBM) 290
8 Head and Neck Squamous Cell Carcinoma (HNSC) 323
- 20 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
9 Kidney Chromophobe Renal Cell Carcinoma (KICH) 65
10 Kidney Renal Clear Cell Carcinoma (KIRC) 210
11 Kidney Renal Papillary Cell Carcinoma (KIRP) 111
12 Lung Adenocarcinoma (LUAD) 380
13 Ovarian Serous Cystadenocarcinoma (OV) 463
14 Prostate Adenocarcinoma (PRAD) 171
15 Rectum Adenocarcinoma (READ) 116
16 Skin Cutaneous Melanoma (SKCM) 269
17 Stomach Adenocarcinoma (STAD) 264
18 Thyroid Carcinoma (THCA) 369
19 Uterine Corpus Endometrioid Carcinoma 248
Table 1. A list of cancer types used in the analysis.
References
1. Kandoth, C. et al. Mutational landscape and significance across 12 major cancer types. Nature 502, 333-339 (2013). 2. Tamborero, D. et al. Comprehensive identification of mutational cancer driver genes across 12 tumor types. Sci. Rep. 3 (2013). 3. Lawrence, M.S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214-218 (2013). 4. Lawrence, M.S. et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495-501 (2014). 5. Osmanbeyoglu, H.U., Pelossof, R., Bromberg, J.F. & Leslie, C.S. Linking signaling pathways to transcriptional programs in breast cancer. Genome Res 24, 1869-1880 (2014). 6. Baselga, J. Targeting the phosphoinositide-3 (PI3) kinase pathway in breast cancer. The oncologist 16 Suppl 1, 12-19 (2011). 7. Hofree, M., Shen, J.P., Carter, H., Gross, A. & Ideker, T. Network-based stratification of tumor mutations. Nat Meth 10, 1108-1115 (2013). 8. Hwang, T. et al. Large-scale integrative network-based analysis identifies common pathways disrupted by copy number alterations across cancers. BMC Genomics 14, 1-13 (2013). 9. Vaske, C.J. et al. Inference of patient-specific pathway activities from multi- dimensional cancer genomics data using PARADIGM. Bioinformatics 26, i237-i245 (2010). 10. Cerami, E., Demir, E., Schultz, N., Taylor, B.S. & Sander, C. Automated Network Analysis Identifies Core Pathways in Glioblastoma. PLoS ONE 5, e8918 (2010).
- 21 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
11. Vandin, F., Upfal, E. & Raphael, B. Algorithms for detecting significantly mutated pathways in cancer. Journal of computational biology : a journal of computational molecular cell biology 18, 507-522 (2011). 12. Suthram, S. et al. Network-Based Elucidation of Human Disease Similarities Reveals Common Functional Modules Enriched for Pluripotent Drug Targets. PLoS Comput Biol 6, e1000662 (2010). 13. Hofree, M., Shen, J.P., Carter, H., Gross, A. & Ideker, T. Network-based stratification of tumor mutations. Nature methods 10, 1108-1115 (2013). 14. Forbes, S.A. et al. COSMIC: exploring the world's knowledge of somatic mutations in human cancer. Nucleic acids research 43, D805-811 (2015). 15. Pena-Llopis, S. et al. BAP1 loss defines a new class of renal cell carcinoma. Nat Genet 44, 751-759 (2012). 16. Sato, Y. et al. Integrated molecular analysis of clear-cell renal cell carcinoma. Nat Genet 45, 860-867 (2013). 17. Peña-Llopis, S. & Brugarolas, J. Simultaneous isolation of high-quality DNA, RNA, miRNA and proteins from tissues for genomic applications. Nat. Protocols 8, 2240-2255 (2013). 18. Nickerson, M.L. et al. Improved Identification of von Hippel-Lindau Gene Alterations in Clear Cell Renal Tumors. Clinical Cancer Research 14, 4726- 4734 (2008). 19. The Cancer Genome Atlas Research, N. Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature 499, 43-49 (2013). 20. Gialeli, C., Theocharis, A.D. & Karamanos, N.K. Roles of matrix metalloproteinases in cancer progression and their pharmacological targeting. The FEBS journal 278, 16-27 (2011). 21. Staudt, N.D. et al. Myeloid Cell Receptor LRP1/CD91 Regulates Monocyte Recruitment and Angiogenesis in Tumors. Cancer Research 73, 3902-3912 (2013). 22. Langlois, B. et al. LRP-1 Promotes Cancer Cell Invasion by Supporting ERK and Inhibiting JNK Signaling Pathways. PLoS ONE 5, e11584 (2010). 23. Song, H., Li, Y., Lee, J., Schwartz, A.L. & Bu, G. Low-density lipoprotein receptor-related protein 1 promotes cancer cell migration and invasion by inducing the expression of matrix metalloproteinases 2 and 9. Cancer Res 69, 879-886 (2009). 24. Duan, D. et al. Inhibition of transcription elongation by the VHL tumor suppressor protein. Science 269, 1402-1406 (1995). 25. Ohh, M. et al. Ubiquitination of hypoxia-inducible factor requires direct binding to the [bgr]-domain of the von Hippel-Lindau protein. Nat Cell Biol 2, 423-427 (2000). 26. Tanimoto, K., Makino, Y., Pereira, T. & Poellinger, L. Mechanism of regulation of the hypoxia-inducible factor-1a by the von Hippel-Lindau tumor suppressor protein. The EMBO Journal 19, 4298-4309 (2000). 27. Yamamoto, D., Takai, S. & Miyazaki, M. Inhibitory profiles of captopril on matrix metalloproteinase-9 activity. European journal of pharmacology 588, 277-279 (2008). 28. Williams, R.N., Parsons, S.L., Morris, T.M., Rowlands, B.J. & Watson, S.A. Inhibition of matrix metalloproteinase activity and growth of gastric adenocarcinoma cells by an angiotensin converting enzyme inhibitor in in vitro and murine models. European journal of surgical oncology : the journal
- 22 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
of the European Society of Surgical Oncology and the British Association of Surgical Oncology 31, 1042-1050 (2005). 29. Underwood, C.K., Min, D., Lyons, J.G. & Hambley, T.W. The interaction of metal ions and Marimastat with matrix metalloproteinase 9. Journal of inorganic biochemistry 95, 165-170 (2003). 30. Reinhardt, D. et al. Cardiac remodelling in end stage heart failure: upregulation of matrix metalloproteinase (MMP) irrespective of the underlying disease, and evidence for a direct inhibitory effect of ACE inhibitors on MMP. Heart 88, 525-530 (2002). 31. Rask-Andersen, M., Almen, M.S. & Schioth, H.B. Trends in the exploitation of novel drug targets. Nature reviews. Drug discovery 10, 579-590 (2011). 32. Rajapakse, N., Mendis, E., Kim, M.M. & Kim, S.K. Sulfated glucosamine inhibits MMP-2 and MMP-9 expressions in human fibrosarcoma cells. Bioorganic & medicinal chemistry 15, 4891-4896 (2007). 33. Overington, J.P., Al-Lazikani, B. & Hopkins, A.L. How many drug targets are there? Nature reviews. Drug discovery 5, 993-996 (2006). 34. Okada, M. et al. Captopril attenuates matrix metalloproteinase-2 and -9 in monocrotaline-induced right ventricular hypertrophy in rats. Journal of pharmacological sciences 108, 487-494 (2008). 35. Nenan, S. et al. Metalloelastase (MMP-12) induced inflammatory response in mice airways: effects of dexamethasone, rolipram and marimastat. European journal of pharmacology 559, 75-81 (2007). 36. Mendis, E., Kim, M.M., Rajapakse, N. & Kim, S.K. Carboxy derivatized glucosamine is a potent inhibitor of matrix metalloproteinase-9 in HT1080 cells. Bioorganic & medicinal chemistry letters 16, 3105-3110 (2006). 37. Imming, P., Sinning, C. & Meyer, A. Drugs, their targets and the nature and number of drug targets. Nature reviews. Drug discovery 5, 821-834 (2006). 38. Fenton, J.I., Chlebek-Brown, K.A., Caron, J.P. & Orth, M.W. Effect of glucosamine on interleukin-1-conditioned articular cartilage. Equine veterinary journal. Supplement, 219-223 (2002). 39. Dodge, G.R. & Jimenez, S.A. Glucosamine sulfate modulates the levels of aggrecan and matrix metalloproteinase-3 synthesized by cultured human osteoarthritis articular chondrocytes. Osteoarthritis and cartilage / OARS, Osteoarthritis Research Society 11, 424-432 (2003). 40. Chu, S.C. et al. Glucosamine sulfate suppresses the expressions of urokinase plasminogen activator and inhibitor and gelatinases during the early stage of osteoarthritis. Clinica chimica acta; international journal of clinical chemistry 372, 167-172 (2006). 41. Chen, X., Ji, Z.L. & Chen, Y.Z. TTD: Therapeutic Target Database. Nucleic acids research 30, 412-415 (2002). 42. Berman, H.M. et al. The Protein Data Bank. Nucleic acids research 28, 235- 242 (2000). 43. Sato, A. et al. Inhibition of MMP-9 using a pyrrole-imidazole polyamide reduces cell invasion in renal cell carcinoma. International journal of oncology 43, 1441-1446 (2013). 44. Hu, K. et al. Tissue-type plasminogen activator acts as a cytokine that triggers intracellular signal transduction and induces matrix metalloproteinase-9 gene expression. The Journal of biological chemistry 281, 2120-2127 (2006).
- 23 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
45. Gilbert, J. et al. Phase II trial of irinotecan plus cisplatin in patients with recurrent or metastatic squamous carcinoma of the head and neck. Cancer 113, 186-192 (2008). 46. Garraway, Levi A. & Lander, Eric S. Lessons from the Cancer Genome. Cell 153, 17-37 (2013). 47. Dees, N.D. et al. MuSiC: Identifying mutational significance in cancer genomes. Genome Research 22, 1589-1598 (2012). 48. Supek, F., Minana, B., Valcarcel, J., Gabaldon, T. & Lehner, B. Synonymous mutations frequently act as driver mutations in human cancers. Cell 156, 1324-1335 (2014). 49. Keshava Prasad, T.S. et al. Human Protein Reference Database--2009 update. Nucleic acids research 37, D767-772 (2009). 50. Rossin, E.J. et al. Proteins Encoded in Genomic Regions Associated with Immune-Mediated Disease Physically Interact and Suggest Underlying Biology. PLoS Genet 7, e1001273 (2011). 51. strel'tsov, S.A., Mikheikin, A.L. & Nechipurenko Iu, D. [Interaction of topotecan--a DNA topoisomerase I inhibitor--with dual-stranded polydeoxyribonucleotides. II. Formation of a complex containing several DNA molecules in the presence of topotecan]. Molekuliarnaia biologiia 35, 442-450 (2001). 52. Zhang, S., Li, Q., Liu, J. & Zhou, X.J. A novel computational framework for simultaneous integration of multiple types of genomic data to identify microRNA-gene regulatory modules. Bioinformatics 27, i401-i409 (2011). 53. Keshava Prasad, T.S. et al. Human Protein Reference Database—2009 update. Nucleic acids research 37, D767-D772 (2009). 54. Subramanian, A. et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America 102, 15545- 15550 (2005). 55. Seung-Jun, K., TaeHyun, H. & Giannakis, G.B. in Cognitive Information Processing (CIP), 2012 3rd International Workshop on 1-6 (2012). 56. Teicher, B.A. Next generation topoisomerase I inhibitors: Rationale and biomarker strategies. Biochemical pharmacology 75, 1262-1271 (2008). 57. Cerami, E. et al. The cBio Cancer Genomics Portal: An Open Platform for Exploring Multidimensional Cancer Genomics Data. Cancer Discovery 2, 401-404 (2012). 58. Gao, J. et al. Integrative Analysis of Complex Cancer Genomics and Clinical Profiles Using the cBioPortal. Sci. Signal. 6, pl1- (2013). 59. Zhang, W. et al. Network-based Survival Analysis Reveals Subnetwork Signatures for Predicting Outcomes of Ovarian Cancer Treatment. PLoS Comput Biol 9, e1002975 (2013). 60. Shedden, K. et al. Gene expression-based survival prediction in lung adenocarcinoma: a multi-site, blinded validation study. Nat Med 14, 822-827 (2008).
- 24 - bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Fig. 1
Input: 1) Somatic mutation 2) Clinical data (e.g., patient’s cancer type) 3) Pathway information 4) Gene-gene interaction networks
Somatic mutation data Clinical data Pathway Gene-gene interaction network
...... ~ X X + samples samples pathways cancertypes ? ?
genes cancer types pathways genes X U S VT A Newly updated Output: member genes min ||X-USVT || F + ||S||2 + ||V-V || F + tr(VT (D-A)V) 2 1 0 2 1) Newly updated pathway information (V) in the pathways S,V 2) Cancer type-pathway association (S) X: Somatic mutation data (#patients X #genes) U: Patient cluster (#patients X #cancer types) S: Cancer type-pathway association matrix (#cancer types X #pathways) ...... V0 : Initial pathway information (#pathways X #genes) V: Newly updated pathway information (#pathways X #genes) pathways cancertypes A: Gene-gene interaction network (#genes X #genes)
pathways genes
p u o r g n o i s s u c s i d r e s U | m o c . s p u o r g e l g o o g @ l a t r o p o i b c : k c a b d e e f d n a s n o i t s e u Q
A G C T | C C K S M | l a t r o P o i B c
n o i t a c i f i l p m a l a m y h c n e s e m a m o t s a l b o n i t e r , r e d d a l b , M B G 4 M D M
n o i t a c i f i l p m a l a m y h c n e s e m a m o c r a s N U J
BEST1 MMP26
GNL3 r e h t o , e s n e s s i m a m o h p m y L n i k g d o H - n o
MMP9 N
RB1
, t f i h s e m a r f , e s n e s n o n l l e c - B , a m o h p m y l l l e c - B e g r a l e s u f f i d , a i m e k u e
MDM4 PPP2R5A MMP7 THBS1 l
ACTA2 SMAD3 , n o i t a c o l s n a r t a m o h p m y l / a i m e a k u e l s u o n e g o l e y m e t u c a , a i m e k u e l c i t y c o h p m y l e t u c a P B B E R SERPINA1 C ESR1
EP300
TFPI , t f i h s e m a r f TP63 LRP1
BRCA1
PPP1R13B CTSG
, e s n e s n o n , e s n e s s i
m MMP1
JUN TP53 PRSS3
, n o i t e l e d e g r a l l a i l e h t i p e n a i r a v o , t s a e r b n a i r a v o 1 A C R HIPK2 B
SP1 CREBBP APP
WT1 SERPINA3 r e h t o , e s n e s s i m a m o h p m y l l l e c - B e g r a l e s u f f i d , a i m e k u e l
bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not
, t f i h s e m a r f , e s n e s n o n l a i l e h t i p e c i t y c o h p m y l e t u c a , a i m e k u e l s u o n e g o l e y certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. m
, n o i t a c o l s n a r t , a m o h p m y l / a i m e a k u e l e t u c a , c i t a e r c n a p , t s a e r b , l a t c e r o l o c 0 0 3 P E
, t f i h s e m a r
Fig. 2 f
, e s n e s n o n , e s n e s s i m
a b ACTA2 ACTA2 3% , n o i t e l e d e g r a l r e h t o s m l i W r o m u t l l e c d n u o r l l a m s c i t s a l p o m s e d , s m l i W 1 T TCEB1 78.5% altered in APP APP 5% W
patient samples BEST1 BEST1 6%
BRCA1 5% , t f i h s e m a r f r e h t o , l a m y h c n e s e BRCA1m
TCEB2 USP33 DIO2 CREBBP CREBBP 5%
, e s n e s n o n , e s n e s s i m , l a i l e h t i p e g n u l l l e c l l a m s , t s a e r
CTSG CTSG 3% b
DIO2 6% MMP26 , n o i t e l e d e g r a l , a m o h p m y l / a i m e a k u e l , a m o c r a s , a m o t s a l b o n i t e r g n u l l l e c l l a m s , t s a e r b , a m o c r a s , a m o t s a l b o n i t e r 1 B BEST1 DIO2 R VHL GNL3 EP300 EP300 6% MMP9 RB1 ESR1 ESR1 6%
55.7% GNL3 GNL3 2%
PPP2R5A MMP7 THBS1
ACTA2 MDM4 HIPK2 8% r e h t o , l a m y h c n e s e m s e p y t r u o m u t r e h t SMAD3 HIPK2 o
JUN 3% SERPINA1
ESR1 JUN t f i h s e m a r f , l a i l e h t i p e e l p i t l u m , a m o i l g , a m o n i c r a c s e p y t r u o m u t r e h t o e l p i t l u m , a m o i l
EP300 LRP1 LRP1 10% g
MDM4 4% TFPI , e s n e s n o n , e s n e s s i m , a m o h p m y l / a i m e a k u e l l a c i t r o c o n e r d a , a m o c r a s , t s a e r b , l a c i t r o c o n e r d a , a m o c r a s , g n u l , l a t c e r o l o c , t s a e r b 3 5 P TP63 MDM4 LRP1 T BRCA1 MMP1 0% 8.1% PPP1R13B MMP1 CTSG
MMP26 MMP26 4% MMP1
JUN TP53 PRSS3 , t f i h s e m a r f MMP7 MMP7 5%
HIPK2 MMP9 MMP9 3% , e s n e s n o n , e s n e s s i m r e h t o , l a m y h c n e s e m a m o t y c o m o r h c o e h 5.2% p
SP1 CREBBP PPP1R13BPPP1R13B 17% APP
WT1 PPP2R5A 9% SERPINA3 , n o i t e l e d e g r a l , l a i l e h t i p e , a m o i g n a m e h , l a n e r a m o t y c o m o r h c o e h p , a m o i g n a m e h , l a n e r L H PPP2R5A V PRSS2 PRSS2 0%
PRSS3 PRSS3 3%
BEST1 MMP26 Cancer gene s e p y T n o i t a t u M s e p y T e u s s i T ) e n i l m r e G ( s e p y T r o m u T ) c i t a m o S ( s e p y T r o m u T e n e RB1 8% G GNL3 MMP9 RB1 Drug target SERPINA1 6% RB1 SERPINA1 mRNA
SERPINA3SERPINA3 7%
MDM4 PPP2R5A MMP7 THBS1 Upregulation : s u s n e C e n e G r e c n a C r e g n a S e h t y b d e g o l a t a c s a , s e n e g r e c n a c n w o n k e r a s e n e g y r e u q r u o y f o ACTA2 SMAD3 SMAD3 SMAD3 9% 9
SERPINA1
ESR1 SP1 SP1 9% mRNA : n o i t a m r o f n I s u s n e C e n e G r e c n a C r e g n a EP300 TCEB1 TCEB1 11% Downregulation S TFPI TP63 LRP1 TCEB2 TCEB2 2% BRCA1 WT1CTSG 6% PPP1R13B TCEB1 80% TFPI TFPI 5% Mutation MMP1
JUN PRSS3 10% 70% THBS1 THBS1 4%
TP53 60% . e v i t a t u p e r a s n o i t a r e t l a r e b m u n y p o TP53 TP53 13% RPPA C HIPK2 Mutati50%on mRNA Upregulation mRNA Downregulation RPPA Upregulation RPPA Downregulation
TCEB2 USP33 DIO2 40% TP63 TP63 6% Upregulation
SP1 CREBBP APP
WT1 SERPINA3 Mutation 20% USP33 USP33 6% n o i t a l u g e r n w o D A P P R n o i t a l u g e r p U A P P R n o i t a l u g e r n w o D A N R m n o i t a l u g e r p U A N R m n o i t a t u 10% M frequency VHL VHL 44% RPPA Copy nu0%mber alterations are putative.
VHL WT1 WT1 6% Downregulation
% 6 1 T W
Sanger Cancer Gene Census Information: 9 of your query genes are known cancer genes, as cataloged by the Sanger Cancer Gene Census:
TCEB1 Gene Tumor Types (Somatic) Tumor Types (Germline) Tissue Types Mutation Types
TCEB2 USP33 DIO2 VHL renal, hemangioma, pheochromocytoma renal, hemangioma, epithelial, large deletion, pheochromocytoma mesenchymal, other missense, nonsense, VHL frameshift,
TP53 breast, colorectal, lung, sarcoma, adrenocortical, breast, sarcoma, adrenocortical leukaemia/lymphoma, missense, nonsense, glioma, multiple other tumour types carcinoma, glioma, multiple epithelial, frameshift other tumour types mesenchymal, other
RB1 retinoblastoma, sarcoma, breast, small cell lung retinoblastoma, sarcoma, leukaemia/lymphoma, large deletion, breast, small cell lung epithelial, missense, nonsense, mesenchymal, other frameshift,
WT1 Wilms, desmoplastic small round cell tumor Wilms other large deletion, missense, nonsense, frameshift,
EP300 colorectal, breast, pancreatic, acute leukaemia/lymphoma, translocation, myelogenous leukemia, acute lymphocytic epithelial nonsense, frameshift, leukemia, diffuse large B-cell lymphoma missense, other
BRCA1 ovarian breast, ovarian epithelial large deletion, missense, nonsense, frameshift,
CREBBP acute lymphocytic leukemia, acute myelogenous leukaemia/lymphoma translocation, leukemia, diffuse large B-cell lymphoma, B-cell nonsense, frameshift, Non-Hodgkin Lymphoma missense, other
JUN sarcoma mesenchymal amplification
MDM4 GBM, bladder, retinoblastoma mesenchymal amplification
cBioPortal | MSKCC | TCGA Questions and feedback: [email protected] | User discussion group bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Fig. 3
a ●●● ●●●●●●● ●● ●●● ● TCGA KIRC Group A (n=237, HR vs. A)) 1.0 Group A (n=47, HR vs. A)
1.0 b ●● TCGA BLCA ●●● ●●●●●●●●●● ● ●●●●● ●● ●●● Group B (n=113, HR = 1.44) Group B (n=42, HR = 2.40) ●●● ●●●●●●●● ● ●●● ● ●●● ● ●●●●●● ● ●●● ●● Group C (n=76, HR = 2.94) Group C (n=20, HR = 2.73) ● ●●●● ●●●●●● ●● ●●●●● ●●●●● Group D (n=34, HR = 3.32) ● ● ● ●●●●● ●● ● ● ●●●●
●● 0.8
0.8 ● ●● ● ●●●● Log-rank P = 1.84E-08 ●●●●●● ●● ●●● ● ● ●● ● Log-rank P = 0.0086 ●●● ●●●●●●●●●● ●●●●● ●●●●●● ●● ● ● ● ● ●● ●●● ● ●●●●● ●●●●●●● ● ●● ● 0.6
0.6 ●●●● ● ● ● ● ●● ●●● ●● ● ●● ●● ● ●●● ● ● ●● ● ● ● ● ● ●●● ● 0.4 0.4 ● ●●
●● ● ●●● ● ● ● Survivalfraction Survivalfraction
0.2 ● ● ● 0.2 ● ● ● 0.0 0.0 0 20 40 60 80 100 0 20 40 60 80 100 120 Months Months
●●●●●● ●●●●● ● c ●● TCGA HNSC ● TCGA SKCM
1.0 Group A (n=138, HR vs. A) Group A (n=44, HR vs. A)) d 1.0 ●●●●●●●● ● ● ●●● ●●● ● ● Group B (n=53, HR = 1.78) Group B (n=5, HR = 1.50) ● ● ●● ●●● ● ● ● Group C (n=49, HR = 1.44) ● Group C (n=54, HR = 1.64) ●● ●● ●● Group D (n=7, HR = 2.07) Group D (n=54, HR = 1.39) ●● ● ● ● 0.8 0.8 ●●● ● Group E (n=10, HR = 4.08) ● ●● ●● ● Log-rank P = 0.0210 ● ●●●● ● ● ● ● ● Log-rank P = 0.0010 ● ● ● ● ● ● ●●●●●●●● ●●● ●●● 0.6 0.6 ● ● ● ● ● ● ●● ●● ● ●● ● ● ●●●● ●● ● ●● ● ● ● ●● ● ● ● ● 0.4 0.4 ●●●● ● ● ●●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
Survivalfraction ● Survivalfraction 0.2
● ● 0.2
● ● ● 0.0 0.0 0 20 40 60 80 100 120 140 0 50 100 150 200 250 300 350 Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Fig. 4
a BLCA b BRCA c GBM 10 10 10 -log(p) -log(p) -log(p)
NTriPath TCGA PanCaner
Number of subgroups Number of subgroups Number of subgroups d e f HNSC KIRC LUAD 10 10 10 -log(p) -log(p) -log(p)
Number of subgroups Number# groupsof subgroups Number# groups of subgroups bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Fig. 5 a b Lung adenocarcinoma 1.0 1.0 UTSW Group A (n=27, HR vs. A) ● HNSC Group B (n=14, HR = 1.29) Shedden et. al., Nature 2008 Group C (n=16, HR = 1.37) ● ● ● ● Group A (n=173, HR vs. A ) Group D (n=9, HR = 1.55) ●●●●●● ●● 0.8
0.8 Group B (n=269, HR = 1.48 ) Group E (n=7, HR = 1.67) ●● ● ● ● ●●●● Group F (n=29, HR = 1.88) ●●● ●●● Log-rank P = 0.0057 ● ●● ●●●●● ● ● ● ●● ●●●● Log-rank P = 0.0381 ● ● ●●
0.6 ●●
0.6 ● ●●● ● ●●●● ● ●● ●●● ●●● ●●●● ●●●●●●●●●● ● ● ●●●●● ●●● ● ● ●● ● ● ●●● ●● ● ●●●● 0.4 ●●●●●●
0.4 ● ● ●●●● ● ●● ●●● ● ● ● ● ● ● ●
Survivalfraction ● ●● Survivalfraction ●● ● ● ● ● ●● ●
0.2 ●● ● ● ● ● ● ● 0.2 ● ● ● 0.0 0.0 0 50 100 150 200 0 50 100 150 Months Months 1.0
●● ● Ovarian cancer c ●●●●● ● d 1.0 1.0 ● Colon cancer ●● ● ● ● ● Bonome et. al., Cancer Res 2008 ● ● ● ● ● ●● GSE39582 ● ●● ●●●●●● ●●● ●●●
●● 0.8 ●● ●● ● ● ● ●●●●●● Log-rank P = 0.0108 Group A (n=27, HR vs. A) ● ●●●● ●● ● ●● ●●●●●● ● ●●●●● 0.8 0.8 ● ● ●●●●●●●●●●● ●● Group B (n=74, HR = 0.897) ●●●●●●●● ● ● ● ● ● ● ●●●● ●● ●●●●● ● ● ●●● ●● ● ●●●●● ●●●●●●●●●● ●● ● ●● ● ● ● ● ● Group C (n=14, HR = 1.62) ●●● ●● ● ●●● ●● ● Group D (n=38, HR = 1.59) ●● ● ●● ●●●● ● ● ● ● ●
●●●●●●●●● ● 0.6 ●●●● ● ●●●●● ● ● ●●● ●● ● ● 0.6 0.6 ●●● ● ●●●●●● ●● ● ● ● ● ● ● ● ●●●●● ● ● Log-rank P = 0.0035 ●● ● ● ●● ● ● ● ● ●● ●● ●●●● ● ● ●n = 102, p = 0.0669
0.4 ● ● ● 0.4 0.4 HR1=1.15● (95%CI,0.605−2.18) ● Group A (n=106, HR vs. A) HR2=1.4● (95%CI,0.606−3.22) ● ● ●● Group B (n=51, HR = 0.913) HR3=1.56 (95%CI,0.601−4.04) ● ● ● ● Survivalfraction Survivalfraction Group C (n=43, HR = 1.24) ● ● ● ● 0.2 HR4=1.72 (95%CI,0.888−3.31) ● ●●
0.2 ● Group D (n=49, HR = 1.29) 0.2 ● ● ● ●
Group E (n=102, HR = 1.51) ● Group F (n=58, HR = 1.77) ● ● ● 0.0 0.0 0.0 0 50 100 150 200 0.00 0.1 50 0.2 1000.3 1500.4 Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Supplementary Figure 1 BEST1 MMP26 GNL3 MMP9 RB1
MDM4 PPP2R5A MMP7 THBS1 ACTA2 SMAD3 SERPINA1 ESR1 EP300 TP63 TFPI LRP1 BRCA1 PPP1R13B CTSG MMP1 JUN TP53 PRSS3 HIPK2 SP1 CREBBP APP WT1 SERPINA3
bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
A MAGEB18 B TCEB1 YBX1 EEF2 BEST1 MMP26 TCEB2 USP33 DIO2 GNL3 MMP9 TOP1 BEST1 RB1 GNL3 NPM1 MDM4 PPP2R5A MMP7 THBS1 VHL ACTA2 SMAD3 PPP2R5A SERPINA1 RB1 ESR1 EGFR RPA1 MSH6 ACTA2 MMP26 EP300 PCNA BEST1 TFPI GNL3 MMP9 TP63 LRP1 BRCA1 CTSG RB1 JUN HIPK2 HNSC PPP1R13B SP1 MMP1 RPS27A MDM4 PPP2R5A MMP7 THBS1 JUN TP53 PRSS3 WRN ACTA2TP53 SMAD3 SERPINA1 ESR1 BRCA1 HIPK2 MDM4 98.7% altered in EP300 SP1 CREBBP APP YWHAZ EP300 TP63 TFPI LRP1 WT1 SERPINA3 PARP1 BRCA1 patient samples CTSG CDK5 PPP1R13B TP63 MMP1 JUN CREBBP TP53 ESR1 PRSS3 HIPK2 SMAD3 78.5% altered in YEATS4 AURKA SP1 CREBBP PPP1R13B APP KIRC WT1 SERPINA3 CABLES1 NCOA6 WT1 patient samples
RASSF5 TACC1 TGS1 TDRD7 HSF1 BCL2 SHOC2 TCEB1
C D HRAS BEST1 PIK3CA TCEB2 USP33 DIO2 PLCE1 PIK3CD TCEB1 VAV1 VHL INSR USP33 TCEB2 DIO2 FYN PPP2R5A
RASA1 VHL IGSF9 PIK3R1 UCEC 99.2% altered in HSF1 ASCC2 KIT EGFR patient samples RPA1 GNL3
ACTA2 ERBB2 CDKN2D MSH6 SRC TP63 WRN PCNA STAD CDK6 TP53 CTNNB1 SMARCA4 STAD SLC9A3R1 SMAD3 PARP1 CREBBP WT1 62.5% altered in AXIN2 NF2 EP300 HIPK2 AXIN1 patient samples PTPN13 BRCA1 MSN IQGAP1 JUN CHMP4B
ESR1 FHL2 SPN APC NCOA6 PML ARHGEF4 PPP2R5A PIAS2 RHOA DDX39 TGS1 RP1 RASGRF1 bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
E BRCA 82.6% altered in F SLC25A38 JUN SP1 patient samples
RB1 CTSE PML CCNB1 EMG1 ANXA3 BEST1 SMAD3 CHEK2 ESR1 MAGEB18 CHD3 BRCA1 MDM4 PPP2R5A CDK5 CDC25C VIM EP300 AURKA TP63 NCOA6 HSF1 YEATS4 EEF2 PPP1R13B TACC1 TOP1 PCNA TP53 CREBBP PIN1 GNL3 ZHX1 MSH6 FHL2 CABLES1 YBX1 TP53 ASCC2 ACTA2 TGS1 GNL3 RPS27A HIPK2 CAPN1 WT1 TDRD7 IGSF9 PARP1 ACTA2 NPM1 PCNA JUN NFKBIA YWHAZ RPA1 BLCA 88.2% altered in EGFR ACTC1 WRN BRCA2 THAP8 patient samples HSPB1 PPA1 G H ATM TFAP2C CITED1 COAD 78.8% altered in NBN RASGRF1 RP1 patient samples E2F1 MDC1 RHOA GAK AXIN1 BARD1 ESR1 STAT3 ARHGEF4 DVL3 CREBBP IQGAP1 CITED2 PPP2CA BRCA1 CCNG1 CCNG2 TP53 EP300 CTNNB1 MYC CDK5 ZHX1 JUN MED21 PTPN13 APC SMARCA4 BRCA1 TP53 HSF1 TFAP2A SMAD3 STAT1 KPNA2 AXIN2 PPP2R5A PCNA ESR1 CREBBP KPNB1 RAN RPA1 MSH6 EP300 SMAD3
NUP153 CSE1L PARP1 CESC JUN WRN TGS1 85.4% altered in patient samples NCOA6 bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
I J KICH 78.5% altered in MAGEB18 YEATS4 GBM patient samples DDX5 ACTC1 HSPB1 TACC1 TEP1 SMG5 AURKA MAGEB18 PPA1 HSPA1A CAPN1 NFKBIA GNL3 TERT TDRD7 CDK5 TP53 EEF2 YWHAZ VIM CABLES1 ZHX1 SP1 SLC25A38 EGFR PCNA PRKCA HMGB2 YBX1 ANXA3 CHD3 75.5% altered in EMG1 NPM1 ACTA2 YWHAZ CABLES1 patient samples EEF2 TP53 TDRD7 CTSE NPM1 ACTA2 JUN TOP1 EGFR RPS27A CDK5 RPS27A PPP1R14A TACC1 MDM4 JUN YEATS4 MAPK9 YBX1 AURKA GNL3 TOP1 PIN1 PCNA BRCA2 YWHAQ PRKD1 THAP8 CDC25C
CHEK2 CCNB1 K L LAML 66.7% altered in KIRP 65.8% altered in EEF2 patient samples BRCA1 GNL3 PDZK1 BRCA1 patient samples TOP1 C11orf30 CDK2 CREBBP SRC CREBBP SMAD3 SRC YBX1 SMAD3 CD7 SHFM1 CEBPB CD7 PARP1 PDZK1IP1 CEBPB PIK3R1 BRCA2 PIK3R1CTNNB1 CTNNB1 MET NPM1 MET SMARCA4 EP300 ACTA2 MNDA SMARCA4 EP300 WDR16 EGFR RELB MYC EGFR EIF2AK2 SMARCB1 PLCG1 RELB MYC SAT1 SMARCB1 PLCG1 TP53 DNAJC3 CALM1 CALM1 HSPA1A ZNF24 PTPRJ YEATS4 SS18 FER PTPRJ NQO1 YEATS4 SS18 FER APLP1 TACC1 CDKN2C DNAJB1 TACC1 PITPNA RRM2 PITPNA RRM2B TFAP2B MLLT10 PIK3C3 TFAP2B MLLT10 PIK3C3 PTEN HIST1H2AC HIST1H2AC RRM1 bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
ACTC1 M N LUAD ACTC1
HSPB1 CAPN1 70.8% altered in HSPB1 PPA1 CAPN1 PPA1 MAGEB18 patient samples NFKBIA VIM ZHX1 MAGEB18 NFKBIA VIM YWHAZ ZHX1 SLC25A38 CHD3 YWHAZ EGFR SLC25A38 CHD3 ANXA3 EGFR TDRD7 EMG1 CABLES1 ANXA3 TDRD7 NPM1 EMG1 CABLES1 CTSE TP53 NPM1 CDK5 EEF2 CTSE TP53 ACTA2 TACC1 CDK5 RPS27A EEF2 AURKA ACTA2 TACC1 LGG JUN YEATS4 RPS27A AURKA YBX1 JUN YEATS4 88% altered in GNL3 TOP1 PCNA YBX1 THAP8 GNL3 TOP1 patient samples PIN1 PCNA BRCA2 THAP8 PIN1 BRCA2
CDC25C
CCNB1 CDC25C CHEK2 CCNB1 CHEK2 SLC25A38 P O CTSE PRAD EEF2 GNL3 EMG1 NPM1 48% altered in CCNB1 CDC25C YBX1 patient samples CCNB1 CHEK2 BRCA2 ACTA2 TOP1 CIB1 CDC25A CDC25C ANXA3 JUN MAGEB18 PLK3 CHEK1
PCNA PIN1 YWHAZ PIN1 TP53 CHEK2 THAP8
RPS27A EGFR NFKBIA AP3B2 ZHX1 HSPB1 CDK5 E2F1 OV USP40 BRCA1 PPA1 CHD3 DDB2 ATM TIPARP 80.3% altered in CAPN1 TP53 XRN2
CABLES1 MSH2 patient samples AURKA ACTC1 VIM MDM4 LMO4 MSH6
TACC1 TDRD7 BLM RAD51 YEATS4 bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Q R KRT18 TCHP
RASGRF1 CCNG2 DVL3 PKP3 RHOA CCNG1 MAGEB18 PPP2CA PKP2 AHSA1 EGFR COL17A1 KRT14 KRT5 AHSA2 ARHGEF4 GAK DSC2 GCH1 PPP2R5A CDK5 AXIN1 YWHAZ DSC1 HSP90AA1 DSG2 APOB DSG1 TG CTNNB1 CABLES1 PKP1 IQGAP1 EEF2 SP1 APC TP53 KRT18 TCHP DST HSP90B1 RPS27A DSP AXIN2 TDRD7 NPM1 JUP HSPA8 PKP3 ASGR1 PTPN13 RP1 AURKA TOP1 ACTA2 PKP2 SKCM AHSA1 HSPA1A PCNA COL17A1 KRT14 MTTP NR3C1 KRT5 AHSA2 TACC1 86.2% altered in GCH1 DSC2 GNL3 READ YBX1 DSC1 patient samples DSG2JUN HSP90AA1 72% altered in YEATS4 APOB TG DSG1 BAG1 PKP1 patient samples DST SP1 DSP HSP90B1 JUP HSPA8 ASGR1 S HSPA1A MTTP NR3C1
RASIP1 RAP1A
PIK3CD BAG1 PRKCI PIK3CA PDE6D NTRK1
RASGRP1 SRC PIK3R1 FES HSPB1 TIAM1 RASA1
HRAS RARRES3 KIT VAV1 TGM1 THCA CAV1 INSR MDK 65.3% altered in PLCE1 FYN SEMG1 patient samples RASSF5 SHOC2 BCL2 BEST1 MMP26 GNL3 MMP9 RB1
MDM4 PPP2R5A MMP7 THBS1 ACTA2 SMAD3 SERPINA1 ESR1 EP300 TP63 TFPI LRP1 BRCA1 PPP1R13B CTSG MMP1 JUN TP53 PRSS3 HIPK2 SP1 CREBBP APP WT1 SERPINA3
bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
a MAGEB18 b TCEB1 YBX1 EEF2 BEST1 MMP26 TCEB2 USP33 DIO2 GNL3 MMP9 TOP1 BEST1 RB1 GNL3 NPM1 MDM4 PPP2R5A MMP7 THBS1 VHL ACTA2 SMAD3 PPP2R5A SERPINA1 RB1 ESR1 EGFR RPA1 MSH6 ACTA2 MMP26 EP300 PCNA BEST1 TFPI GNL3 MMP9 TP63 LRP1 BRCA1 CTSG RB1 JUN HIPK2 HNSC HNSC PPP1R13B SP1 MMP1 RPS27A MDM4 PPP2R5A MMP7 THBS1 JUN TP53 PRSS3 WRN ACTA2TP53 SMAD3 SERPINA1 ESR1 BRCA1 HIPK2 MDM4 98.7% altered in EP300 SP1 CREBBP APP YWHAZ EP300 TP63 TFPI LRP1 WT1 SERPINA3 PARP1 BRCA1 patient samples CTSG CDK5 PPP1R13B TP63 MMP1 JUN CREBBP TP53 ESR1 PRSS3 HIPK2 SMAD3 78.5% altered in YEATS4 AURKA SP1 CREBBP PPP1R13B APP KIRC WT1 SERPINA3 CABLES1 NCOA6 WT1 patient samples
RASSF5 TACC1 TGS1 TDRD7 HSF1 BCL2 SHOC2 TCEB1 c d HRAS BEST1 PIK3CA TCEB2 USP33 DIO2 PLCE1 PIK3CD TCEB1 VAV1 VHL INSR USP33 TCEB2 DIO2 FYN PPP2R5A
RASA1 VHL IGSF9 PIK3R1 UCEC 99.2% altered in HSF1 ASCC2 KIT EGFR patient samples RPA1 GNL3
ACTA2 ERBB2 CDKN2D MSH6 SRC TP63 WRN PCNA STAD CDK6 TP53 CTNNB1 SMARCA4 STAD SLC9A3R1 SMAD3 PARP1 CREBBP WT1 62.5% altered in AXIN2 NF2 EP300 HIPK2 AXIN1 patient samples PTPN13 BRCA1 MSN IQGAP1 JUN CHMP4B
ESR1 FHL2 SPN APC NCOA6 PML ARHGEF4 PPP2R5A PIAS2 RHOA DDX39 TGS1 RP1 RASGRF1 bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
e BRCA 82.6% altered in f SLC25A38 JUN SP1 patient samples
RB1 CTSE PML CCNB1 EMG1 ANXA3 BEST1 SMAD3 CHEK2 ESR1 MAGEB18 CHD3 BRCA1 MDM4 PPP2R5A CDK5 CDC25C VIM EP300 AURKA TP63 NCOA6 HSF1 YEATS4 EEF2 PPP1R13B TACC1 TOP1 PCNA TP53 CREBBP PIN1 GNL3 ZHX1 MSH6 FHL2 CABLES1 YBX1 TP53 ASCC2 ACTA2 TGS1 GNL3 RPS27A HIPK2 CAPN1 WT1 TDRD7 IGSF9 PARP1 ACTA2 NPM1 PCNA JUN NFKBIA YWHAZ RPA1 BLCA 88.2% altered in EGFR ACTC1 WRN BRCA2 THAP8 patient samples HSPB1 PPA1 g h ATM TFAP2C CITED1 COAD 78.8% altered in NBN RASGRF1 RP1 patient samples E2F1 MDC1 RHOA GAK AXIN1 BARD1 ESR1 STAT3 ARHGEF4 DVL3 CREBBP IQGAP1 CITED2 PPP2CA BRCA1 CCNG1 CCNG2 TP53 EP300 CTNNB1 MYC CDK5 ZHX1 JUN MED21 PTPN13 APC SMARCA4 BRCA1 TP53 HSF1 TFAP2A SMAD3 STAT1 KPNA2 AXIN2 PPP2R5A PCNA ESR1 CREBBP KPNB1 RAN RPA1 MSH6 EP300 SMAD3
NUP153 CSE1L PARP1 CESC JUN WRN TGS1 85.4% altered in patient samples NCOA6 bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
i j KICH 78.5% altered in MAGEB18 YEATS4 GBM patient samples DDX5 ACTC1 HSPB1 TACC1 TEP1 SMG5 AURKA MAGEB18 PPA1 HSPA1A CAPN1 NFKBIA GNL3 TERT TDRD7 CDK5 TP53 EEF2 YWHAZ VIM CABLES1 ZHX1 SP1 SLC25A38 EGFR PCNA PRKCA HMGB2 YBX1 ANXA3 CHD3 75.5% altered in EMG1 NPM1 ACTA2 YWHAZ CABLES1 patient samples EEF2 TP53 TDRD7 CTSE NPM1 ACTA2 JUN TOP1 EGFR RPS27A CDK5 RPS27A PPP1R14A TACC1 MDM4 JUN YEATS4 MAPK9 YBX1 AURKA GNL3 TOP1 PIN1 PCNA BRCA2 YWHAQ PRKD1 THAP8 CDC25C
CHEK2 CCNB1 k l LAML 66.7% altered in KIRP 65.8% altered in EEF2 patient samples BRCA1 GNL3 PDZK1 BRCA1 patient samples TOP1 C11orf30 CDK2 CREBBP SRC CREBBP SMAD3 SRC YBX1 SMAD3 CD7 SHFM1 CEBPB CD7 PARP1 PDZK1IP1 CEBPB PIK3R1 BRCA2 PIK3R1CTNNB1 CTNNB1 MET NPM1 MET SMARCA4 EP300 ACTA2 MNDA SMARCA4 EP300 WDR16 EGFR RELB MYC EGFR EIF2AK2 SMARCB1 PLCG1 RELB MYC SAT1 SMARCB1 PLCG1 TP53 DNAJC3 CALM1 CALM1 HSPA1A ZNF24 PTPRJ YEATS4 SS18 FER PTPRJ NQO1 YEATS4 SS18 FER APLP1 TACC1 CDKN2C DNAJB1 TACC1 PITPNA RRM2 PITPNA RRM2B TFAP2B MLLT10 PIK3C3 TFAP2B MLLT10 PIK3C3 PTEN HIST1H2AC HIST1H2AC RRM1 bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
ACTC1 m n LUAD ACTC1
HSPB1 CAPN1 70.8% altered in HSPB1 PPA1 CAPN1 PPA1 MAGEB18 patient samples NFKBIA VIM ZHX1 MAGEB18 NFKBIA VIM YWHAZ ZHX1 SLC25A38 CHD3 YWHAZ EGFR SLC25A38 CHD3 ANXA3 EGFR TDRD7 EMG1 CABLES1 ANXA3 TDRD7 NPM1 EMG1 CABLES1 CTSE TP53 NPM1 CDK5 EEF2 CTSE TP53 ACTA2 TACC1 CDK5 RPS27A EEF2 AURKA ACTA2 TACC1 LGG JUN YEATS4 RPS27A AURKA YBX1 JUN YEATS4 88% altered in GNL3 TOP1 PCNA YBX1 THAP8 GNL3 TOP1 patient samples PIN1 PCNA BRCA2 THAP8 PIN1 BRCA2
CDC25C
CCNB1 CDC25C CHEK2 CCNB1 CHEK2 o SLC25A38 p CTSE PRAD EEF2 GNL3 EMG1 NPM1 48% altered in CCNB1 CDC25C YBX1 patient samples CCNB1 CHEK2 BRCA2 ACTA2 TOP1 CIB1 CDC25A CDC25C ANXA3 JUN MAGEB18 PLK3 CHEK1
PCNA PIN1 YWHAZ PIN1 TP53 CHEK2 THAP8
RPS27A EGFR NFKBIA AP3B2 ZHX1 HSPB1 CDK5 E2F1 OV USP40 BRCA1 PPA1 CHD3 DDB2 ATM TIPARP 80.3% altered in CAPN1 TP53 XRN2
CABLES1 MSH2 patient samples AURKA ACTC1 VIM MDM4 LMO4 MSH6
TACC1 TDRD7 BLM RAD51 YEATS4 bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
q r KRT18 TCHP
RASGRF1 CCNG2 DVL3 PKP3 RHOA CCNG1 MAGEB18 PPP2CA PKP2 AHSA1 EGFR COL17A1 KRT14 KRT5 AHSA2 ARHGEF4 GAK DSC2 GCH1 PPP2R5A CDK5 AXIN1 YWHAZ DSC1 HSP90AA1 DSG2 APOB DSG1 TG CTNNB1 CABLES1 PKP1 IQGAP1 EEF2 SP1 APC TP53 KRT18 TCHP DST HSP90B1 RPS27A DSP AXIN2 TDRD7 NPM1 JUP HSPA8 PKP3 ASGR1 PTPN13 RP1 AURKA TOP1 ACTA2 PKP2 SKCM AHSA1 HSPA1A PCNA COL17A1 KRT14 MTTP NR3C1 KRT5 AHSA2 TACC1 86.2% altered in GCH1 DSC2 GNL3 READ YBX1 DSC1 patient samples DSG2JUN HSP90AA1 72% altered in YEATS4 APOB TG DSG1 BAG1 PKP1 patient samples DST SP1 DSP HSP90B1 JUP HSPA8 ASGR1 s HSPA1A MTTP NR3C1
RASIP1 RAP1A
PIK3CD BAG1 PRKCI PIK3CA PDE6D NTRK1
RASGRP1 SRC PIK3R1 FES HSPB1 TIAM1 RASA1 80% HRAS 70% RARRES3 60% KIT VAV1 TGM1 THCA 50% CAV1 INSR 40% MDK 65.3% altered in Mutation 20% PLCE1 FYN 10% SEMG1 frequency patient samples 0% RASSF5 SHOC2 BCL2 bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Supplementary Figure 2
●●●●●●●●●●●●●● ● ●●●●●●●●●●●●●●●●● ●
●●●●●●●●●●●●● ●●● 1.0
1.0 ● ●●●●●●●●●●● ●●●●● ● ●●●● TCGA BRCA ● TCGA GBM ●●●●●●●● ● ●● ● ●●●●●●●● ● ●●● ● ●●●●●●●●● ●●●●●●● ●●●●●●●● ●● ● ●●●●●●●●● ●● ● ●● ● ●● ●●●●●●●●●●●● ●●●● ●● ●● ● ● ●●● ●● ● ● ●● ●● ●●●●●●● ● ●● ● ●●●● ●●● 0.8 ● 0.8 ●●● ● ● ●● ●●● ●● ● ● ●●● ●●● ●●●● ●●●● ● ● ● ●● ●● ● ● ● ● ●● ● ● ● ● ● ●●
●● ● ● ● ● 0.6 ● 0.6 ●● ● ● ● ● ●● ● ● ● ● ● ● ● 0.4 n = 544, p = 0.0399 0.4 n = 794,● p = 0.00107● ●● ● ● ● ● ● ● HR1=1.21 (95%CI,0.621−2.37) ● HR1=1.28 (95%CI,0.991−1.66) ● ● ● ● ● HR2=1.23 (95%CI,0.587−2.58) HR2=1.06● (95%CI,0.814−1.38) ● ● HR3=2.44 (95%CI,1.23−4.86) ● HR3=1.13 (95%CI,0.807−1.57) ● ● ● 0.2
0.2 ● ● ●● ●● ● ● ● ● ● 0.0 0.0 0 50 100 150 200 0 20 40 60 80 100 120 Months Months
●●● ●●● ●● ●●●●●●
1.0 ● TCGA LUAD 1.0 ● ●● ● TCGA OV ●●● ●● ●● ●●●●●●●●●●● ●●● ● ●● ● ●● ●●●●●● ● ●●●●● ●● ● ● ●●●● ●● ● ●● ● ●●● ●● ●●● ● ●● ●● ●● ●●●●
● ● 0.8 ● 0.8 ●● ● ● ● ●● ●●●● ●●●●● ● ●●● ● ●●●● ●● ● ●● ● ● ●●● ● ● ●● ● ● ●
● 0.6
0.6 ●● ●● ● ● ● ● ● ● ● ● ●● ● ● ●● ●● ● ● ● ●● ●●● ●● ●●●● ● n = 504, p = 0.00742 ● ● ● ● ● 0.4 0.4 ● n = 213, p = 0.0457 HR1=1.57● (95%CI,1.05−2.36) ● ● ●●● ● ●●●● ● HR2=1.5 (95%CI,0.994● −2.26) HR1=1.51 (95%CI,0.811−2.8) ● ● ● ●● ● HR3=1.91● ● ●● (95%CI,1.08−3.36) HR2=1.78 (95%CI,0.99−3.21) ● ● ● ● ● ●●●● ● ● ● 0.2 0.2 HR4=1.6 (95%CI,1.07● −2.41) ● ●
● ● ● ● ● 0.0 0.0 0 50 100 150 0 50 100 150 200 Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Supplementary Figure 3 KM plots bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
a. TCGA BLCA 1) k=2 k=3 k=4 k=5 k=6 k=7 k=8
2) k=2 k=3 k=4 k=5 Survivalfraction
Months Months Months Months
k=6 k=7 k=8 Survivalfraction
Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
b.TCGA BRCA 1) k=2 k=3 k=4 k=5 k=6 k=7 k=8
2) k=2 k=2 k=3 k=3 k=4 k=4 k=5k=5 Survivalfraction
Months Months Months Months
k=6 k=7 k=8 Survivalfraction
Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
c. TCGA GBM 1) k=2 k=3 k=4 k=5 k=6 k=7 k=8
2) k=2 k=2 k=3 k=4 k=5 Survivalfraction
Months Months Months Months
k=6 k=7 k=8 Survivalfraction
Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
d.TCGA HNSC 1) k=2 k=3 k=4 k=5 k=6 k=7 k=8
●●●●●● ●●●●●● ●●●●●● ●
● ● 1.0 2) 1.0 ●● 1.0 ●●● ●● ●● ● ●● ●●●●●● ●●●● ●●●● ●●●●●●● ●● ●●●● k=4 k=2 k=3 ● k=5 ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ●● ● ● ●● ● ● 0.8 0.8 ● 0.8 ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●
●●● ● 0.6 0.6 ● 0.6 ●● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ●● ●● ● ● ● ●
●●●● ● ● ●● ●●●● ● ● ● ● ● ● 0.4 n = 163, p = 0.0134 0.4 0.4 ●● ● ● ● ● ● ●● n = 163, p = ●0.301● ● HR1=0.698 (95%CI,0.375−1.3) n = 163, p = 0.468 ● ● ● ● ● ● ● ● ● ● ● ● HR1=0.895● ● (95%CI,0.51−1.57) HR1=0.83 (95%CI,0.502● −1.37) ● ● ● ● HR2=1.16 (95%CI,0.647−2.08) HR2=1.23 (95%CI,0.684−2.19) ● ● ● ● HR3=2.02 (95%CI,0.878−4.66) ● 0.2 0.2 0.2
● ● Survivalfraction 0.0 0.0 0.0 0 20 40 60 80 100 120 140 0 20 40 60 80 100 120 140 0 20 40 60 80 100 120 140 Months Months Months Months
k=6 k=7 k=8 Survivalfraction
Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
e.TCGA KIRC 1) k=2 k=3 k=4 k=5 k=6 k=7 k=8
2) k=2 k=3 k=4 k=5 Survivalfraction
MonthsMonths Months Months Months
k=6 k=7 k=8 Survivalfraction
Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
f. TCGA LAML 1) k=2 k=3 k=4 k=5 k=6 k=7 k=8
2) k=2 k=3 k=4 k=5 Survivalfraction Months Months Months Months
k=6 k=7 k=8 Survivalfraction
Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
g. TCGA LGG 1) k=2 k=3 k=4 k=5 k=6 k=7 k=8
2) k=2 k=3 k=4 k=5 Survivalfraction Months Months Months Months
k=6 k=7 k=8 Survivalfraction
Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
h.TCGA LUAD 1) k=2 k=3 k=4 k=5 k=6 k=7 k=8
2) k=2 k=3 k=4 k=5 Survivalfraction Months Months Months Months
k=6 k=7 k=8 Survivalfraction
Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
i. TCGA OV 1) k=2 k=3 k=4 k=5 k=6 k=7 k=8
2) k=2 k=3 k=4 k=5 Survivalfraction Months Months Months Months
k=6 k=7 k=8 Survivalfraction
Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
j.TCGA SKCM 1) k=2 k=3 k=4 k=5 k=6 k=7 k=8
2) k=2 k=3 k=4 k=5 Survivalfraction
Months Months Months Months
k=6 k=7 k=8 Survivalfraction
Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Supplementary Figure 4 Independent dataset bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
a. UTSW HNSC 1) k=2 k=3 k=4 k=5 k=6 k=7 k=8
2) k=2 k=3 k=4 k=5 Survivalfraction
Months Months Months Months
k=6 k=7 k=8 Survivalfraction
Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
b.Colon cancer (GSE39582) a k=2 k=3 k=4 k=5 k=6 k=7 k=8
b k=2 k=3 k=4 k=5 Survivalfraction
Months Months Months Months
k=6 k=7 k=8 Survivalfraction
Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
b.Colon cancer (GSE17538) a k=2 k=3 k=4 k=5 k=6 k=7 k=8
● ● ● ● ● ● ● 1.0 ● 1.0 ● 1.0
1.0 ● b ● ● ●● ● ●● ● ●● ● ● ●● ● ● ●●● ● ● ● ●●● ● ● ●● ●● ●●● ● ●● ● ● ●● ●●● ● ●● ●● ●● ●●●● ● ●● ●● ●● ●● ●●●●● ● ●● ●● ● ●●● ● ● ● ●●● ● ●●●●●● k=2 k=3 ● ●●●●●●● ●●●●● ● k=4 ● ●● k=5 ●● ●●● ● 0.8 0.8 0.8 ● ● ●● ● ● ● ● ● ●● ● ● 0.8 ● ●●●● ●●●●●● ● ● ●●●●● ● ● ●●●●● ● ● ● ●●● ● ● ● ●● ● ● ●●●● ● ● ● ● ●● ●●●● ● ●● ● ● ●● ●●●● ●● ● ● ●●●● ●● ● ●●● ●● ●● ● ● ●●●● ● ●● ●● ●● ● ●●● ●● ●●●● ● ●●●● ● ● ● ●●● ● ● ● ● ●● ●●●●● ● ● ●● ● ●● ● ● ● ● ● ●●●●● ● ● ●● ●●●●● ●● ● ● ● ●● ● ●●●●●● ●● ●●●●●●●● ●● ●● ● ● ● ● ● ● ●●● ●● ● ● ●● ● ●● ● ●●● ●●●●●● ●● ● ●●●● ● ●●●●● ● ● ●● ● ●●● ●●●●● ●● ● ● ● ● ●● ● ● 0.6 0.6 0.6 0.6 ●● ●● ● ● ● ● ● ●● ● ● ● ● ●●●● ● ●● ● ● ● ● ●● ●● ●●●● ●● ● ●● ●● ● ● ●● ●● ● ● ● n = ●177,● ● ●● p● =● 0.00137
0.4 0.4 n = 177, p = 0.0205 0.4 0.4 n = 177, p = 0.408 HR1=1.03 (95%CI,0.446−2.4) HR1=1.67 (95%CI,0.843−3.31) n = 177, p = 0.934 HR1=0.763 (95%CI,0.392−1.48) HR2=0.63 (95%CI,0.241−1.64) HR2=0.721 (95%CI,0.311−1.67) HR1=1.03 (95%CI,0.54−1.95) HR2=0.867 (95%CI,0.436−1.72) HR3=2.31 (95%CI,1.1−4.85) HR3=1.06 (95%CI,0.479−2.33) 0.2 0.2 0.2
0.2 HR4=1.44 (95%CI,0.654−3.19) Survivalfraction 0.0 0.0 0.0 0.0 0 20 40 60 80 100 120 140 0 20 40 60 80 100 120 140 0 20 40 60 80 100 120 140 0 20 40 60 80 100 120 140 Months Months Months Months
● ● ● ● ● ● ● ● 1.0 ● 1.0 ● 1.0 ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ●● ●●● ● ● ● ●● ● ●● ● ● ● ● ●●● ● ● ●●●●●● ●● ●●● ● ● ● ● ●● ●●●●●●● ●● ● ●●●●● ●●● ● k=6 k=7 ● ●● ●● ●● ● k=8 ● ● ● ●● ● ● ● ● ●● ● ● ●● ●● ●●●●● ● ●● ●●● ● ● ● ● ●● ● ● ●● ● ●● ●● 0.8 ● ● ●● ● 0.8 ● ● 0.8 ● ● ●● ● ●● ●●●● ●●●●●● ● ●●● ● ● ● ● ● ● ● ●● ● ● ●● ● ●● ●● ● ● ●● ● ● ●● ● ● ● ● ● ●● ● ● ●● ●● ● ● ●● ●●●●● ●● ● ● ● ●● ● ●● ●●●●● ●● ● ● ● ●● ● ●●● ● ●●●● ● ●●● ● ● ● ●● ●● ● ●● ●● ●●●● ● ●●●● ●●● ●●●●●● ●● ● ● ● ● ●●● ● ● ●● ● ● ●● ● ●● ●● ● ● ● ● ● ● ● ●
● 0.6 0.6
0.6 ●● ●● ●● ● ●● ●● ●● n = 177, p = 0.000465 n = 177, p = 0.000644●● ● ● ● ●● ● ● ● ●● ● ●● ● ● HR1=7.21e●● ● ●−●08● ●(95%CI,0−Inf) n = 177,●● ● ● ●p ●= ●0.000511 HR1=1.17 (95%CI,0.227−6.05) ●● ● ●
0.4 0.4 HR2=0.442 (95%CI,0.153−1.28) 0.4 HR1=0.394 (95%CI,0.111−1.4) HR2=3 (95%CI,0.776−11.6) HR3=0.246 (95%CI,0.0316−1.91) HR2=0.392 (95%CI,0.118−1.3) HR3=1.45 (95%CI,0.474−4.43) HR4=0.589 (95%CI,0.248−1.4) HR3=0.389 (95%CI,0.113−1.34) HR4=1.17 (95%CI,0.399−3.43) HR4=1.11 (95%CI,0.352−3.48) HR5=0.744 (95%CI,0.236−2.35) 0.2 HR5=2.04 (95%CI,0.706−5.89) 0.2 0.2 HR6=0.654 (95%CI,0.269−1.59) HR5=0.734 (95%CI,0.229−2.35) HR6=3.34 (95%CI,1.21−9.21) Survivalfraction HR7=1.43 (95%CI,0.652−3.16) 0.0 0.0 0.0 0 20 40 60 80 100 120 140 0 20 40 60 80 100 120 140 0 20 40 60 80 100 120 140 Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
c. OV (Bonome et. al.,Cancer Research 2008, ) 1) k=2 k=3 k=4 k=5 k=6 k=7 k=8
2) k=2 k=3 k=4 k=5 Survivalfraction
Months Months Months Months
k=6 k=7 k=8 Survivalfraction
Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
d. OV (Tothill et. al., Clinical Cancer Research, 2008) 1) k=2 k=3 k=4 k=5 k=6 k=7 k=8
2) k=2 k=3 k=4 k=5 Survivalfraction
Months Months Months Months
k=6 k=7 k=8 Survivalfraction
Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
e. Lung adenocarcinoma (Shedden et. al., Nature Methods, 2008) 1) k=2 k=3 k=4 k=5 k=6 k=7 k=8
2) k=2 k=3 k=4 k=5 Survivalfraction
Months Months Months Months
k=6 k=7 k=8 Survivalfraction
Months Months Months bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Supplementary Figure 5 Single gene analysis bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. a. TCGA BLCA
TP53% ASCC2% WRN% FHL2%
NCOA6% MSH6% bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. b.TCGA BRCA
ZHX1% NFKBIA% ACTA2% PIN1%
THAP8% NAT6% BRCA2% CDK5%
CHD3% YBX1% TDRD7% VIM% bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. c. TCGA GBM
TEP1% PTRF% EEF2% GNL3%
TOP1% bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. d. TCGA HNSC
CABLES1( TOP1( EEF2( HIPK2( bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. e. TCGA KIRC
GNL3% SERPINA3% HIPK2% BEST1%
CREBBP% EP300% APP% MDM4%
PPP1R13B% MMP7% USP33% MMP9% bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
TP53% CTSG% SERPINA1% TCEB2%
SP1% THBS1% WT1% TCEB1% bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. f. TCGA LAML
SHFM1& NPM1& HSPA1A& TP53&
NQO1& WDR16& PARP1& bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. g.TCGA LUAD
YBX1% CABLES1% DDX5% HSPA1A%
UPF3B% AURKA% YWHAZ% ACTC1%
NPM1% ●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● 1.0 ●●●● ●●●●●●●●●●●●●●●●●● ●●●●●●●● ●●●●●●●●●●●●●● ●●●●● ●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●● ●●●●●●●●●●●● ●● ●●●●●●●● ● ●●●● ●● ●●●●● ●● 0.8 0.6 0.4 0.2
bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not
0.0 certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 0 10 20 30 40 50 60 h.TCGA SKCM
GCH1% DSC1% AHSA1% KRT18%
COL17A1% bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. i. TCGA STAD
HSF1% ASCC2% ACTA2% bioRxiv preprint doi: https://doi.org/10.1101/017582; this version posted April 6, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. j. TCGA UCEC
SPN$ CDK6$ PIK3CA$ PIK3R1$
CDKN2D$ FYN$ RP1$ PIK3CD$