Chang et al. Cancer Commun (2019) 39:23 https://doi.org/10.1186/s40880-019-0369-5 Cancer Communications

ORIGINAL ARTICLE Open Access Dual prognostic role of 2‑oxoglutarate‑dependent oxygenases in ten cancer types: implications for cell cycle regulation and cell adhesion maintenance Wai Hoong Chang, Donall Forde and Alvina G. Lai*

Abstract Background: Tumor hypoxia is associated with metastasis and resistance to chemotherapy and radiotherapy. involved in oxygen-sensing are clinically relevant and have signifcant implications for prognosis. In this study, we examined the pan-cancer prognostic signifcance of oxygen-sensing genes from the 2-oxoglutarate-dependent oxygenase family. Methods: A multi-cohort, retrospective study of transcriptional profles of 20,752 samples of 25 types of cancer was performed to identify pan-cancer prognostic signatures of 2-oxoglutarate-dependent oxygenase family (a family of oxygen-dependent enzymes consisting of 61 genes). We defned minimal prognostic gene sets using three independ- ent pancreatic cancer cohorts (n 681). We identifed two signatures, each consisting of 5 genes. The ability of the signa- tures in predicting survival was tested= using Cox regression and receiver operating characteristic (ROC) curve analyses. Results: Signature 1 (KDM8, KDM6B, P4HTM, ALKBH4, ALKBH7) and signature 2 (KDM3A, P4HA1, ASPH, PLOD1, PLOD2) were associated with good and poor prognosis. Signature 1 was prognostic in 8 cohorts representing 6 cancer types (n 2627): bladder urothelial carcinoma (P 0.039), renal papillary cell carcinoma (P 0.013), liver cancer (P 0.033 and= P 0.025), lung adenocarcinoma (P 0.014),= pancreatic adenocarcinoma (P < 0.001= and P 0.040), and =uterine corpus= endometrial carcinoma (P < 0.001).= Signature 2 was prognostic in 12 cohorts representing= 9 cancer types (n 4134): bladder urothelial carcinoma (P 0.039), cervical squamous cell carcinoma and endocervical adenocar- cinoma= (P 0.035), head and neck squamous= cell carcinoma (P 0.038), renal clear cell carcinoma (P 0.012), renal papillary cell= carcinoma (P 0.002), liver cancer (P < 0.001, P < 0.001),= lung adenocarcinoma (P 0.011),= pancreatic adenocarcinoma (P 0.002,= P 0.018, P < 0.001), and gastric adenocarcinoma (P 0.004). Multivariate= Cox regression confrmed independent= clinical= relevance of the signatures in these cancers. ROC= curve analyses confrmed superior performance of the signatures to current tumor staging benchmarks. KDM8 was a potential tumor suppressor down- regulated in liver and pancreatic cancers and an independent prognostic factor. KDM8 expression was negatively correlated with that of cell cycle regulators. Low KDM8 expression in tumors was associated with loss of cell adhesion phenotype through HNF4A signaling. Conclusion: Two pan-cancer prognostic signatures of oxygen-sensing genes were identifed. These genes can be used for risk stratifcation in ten diverse cancer types to reveal aggressive tumor subtypes. Keywords: Oxygen-sensing gene, 2-Oxoglutarate-dependent oxygenase, Pan-cancer, Prognosis, Hypoxia, KDM8, HNF4A

*Correspondence: [email protected] Nufeld Department of Medicine, University of Oxford, Old Road Campus, Oxford OX37FZ, UK

© The Author(s) 2019. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creat​iveco​mmons​.org/licen​ses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creat​iveco​mmons​.org/ publi​cdoma​in/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Chang et al. Cancer Commun (2019) 39:23 Page 2 of 14

Background gdac.broad​insti​tute.org/), which included gene expres- Solid tumors demand a considerable amount of oxygen sion profles of 19,781 tumor samples and 881 non-tumor due to their unique vasculature systems [1]. Rapid neo- samples. ICGC datasets were downloaded from the plastic cell proliferation and overexpression of angiogenic ICGC data portal (https​://icgc.org/), which included 729 factors leading to the formation of disorganized blood tumor samples. A GEO dataset was downloaded from the vessels result in insufcient oxygen supply to tumor cells GEO data portal (https​://www.ncbi.nlm.nih.gov/geo/), [2, 3]. Hence, there is a requirement for tumors to evolve which included 242 tumor samples. TCGA transcrip- systems that detect changes in oxygen homeostasis [4]. tome datasets were represented as the normalized gene Te discovery of hypoxia-inducible factor (HIF), a key expression RSEM (RNA-seq by expectation maximiza- oxygen-sensing gene, represents a quantum leap forward tion) values [16] obtained from GDAC Firehose. ICGC in tumor biology [5, 6]. Its discovery has led to the devel- transcriptome datasets were represented as normalized opment of drugs used to treat cancer [7, 8]. read counts. Te GEO dataset was generated by Afym- In addition to HIF, 2-oxoglutarate (2OG)-dependent etrix microarray profling using the Afymetrix Human oxygenases represent another family of oxygen-sens- Genome U133A 2.0 Array [17]. All expression profles ing . As suggested by the name, this group of were converted to ­log2(x + 1) scale. enzymes has an absolute requirement for molecular oxy- gen. Tey catalyze a range of oxidative modifcations, and KDM8 diferential expression analysis their activities are afected by nutrient and oxygen avail- TCGA liver cancer cohort (LIHC; Additional fle 1) was ability [9], both of which are altered within the tumor used in KDM8 diferential expression analysis. A total of microenvironment. Several members from this gene fam- 371 cancer patients in this cohort were median dichoto- ily have been implicated in cancer. For example, 10–11 mized into low and high KDM8 expression groups. To translocation 2 is frequently found to be mutated in leu- determine diferentially expressed genes between the kemia [10] and other solid malignancies [11]. In addition, two groups, the Bayes method and linear model were the epigenetic alterations and inactivating mutations of implemented using the R package limma (version 3.8) the Jumonji-C domain-containing demethylase [18]. P values were adjusted using the false discovery rate (KDM) family are frequently observed in multiple can- controlling procedure of Benjamini–Hochberg. Genes cers such as multiple myeloma, esophageal squamous cell with ­log2 fold change of > 1 or < − 1 and adjusted P val- carcinoma, renal cell carcinoma, breast cancer, colorectal ues < 0.05 were considered signifcant. cancer, and glioblastoma [12, 13]. We hypothesized that detecting the expression of Gene signatures and risk scores 2OG-dependent oxygenases could help predict progno- Expression scores for gene signatures 1 and 2 were sis in solid malignancies that are characteristically oxy- calculated for each patient by taking the average ­log2 gen-deprived. Additionally, we hypothesized this would expression values of signature genes. Signature 1 genes: be applicable to diferent types of cancer as they share a KDM8, KDM6B, P4HTM, ALKBH4, and ALKBH7. uniform need to overcome hypoxia for survival. Starting Signature 2 genes: KDM3A, P4HA1, ASPH, PLOD1, from an initial set of 61 genes encoding 2OG-dependent and PLOD2. Tumor hypoxia scores were calculated as oxygenases, we developed two prognostic gene signa- the average ­log2 expression values of 52 hypoxia sig- tures, each consisting of a minimal 5 genes that could nature genes [19]: ESRP1, CORO1C, SLC2A1, UTP11, facilitate risk stratifcation and predict overall survival CDKN3, TUBA1B, ENO1, NDRG1, PGAM1, CHCHD2, (OS) in cancer patients, and further confrmed their SLC25A32, SHCBP1, KIF20A, PGK1, BNIP3, ANLN, prognostic performance through a multi-cohort pan- ACOT7, TUBB6, MAP7D1, YKT6, PSRC1, GPI, cancer validation process. PGAM4, GAPDH, MRPL13, SEC61G, VEGFA, MIF, TPI1, MAD2L2, HK2, AK4, CA9, SLC16A1, KIF4A, Methods PSMA7, LDHA, MRPS17, PNP, TUBA1C, HILPDA, Datasets and processing LRRC42, TUBA1A, MRGBP, MRPL15, CTSV, ADM, Datasets used in this study consist of the expression pro- DDIT4, PFKP, P4HA1, MCTS1, and ANKRD37. Te risk fles of 20,752 tumor samples and 881 non-tumor sam- score for each patient was calculated by taking the sum ples that were obtained from Te Cancer Genome Atlas of Cox regression coefcient for each signature gene (TCGA) [14], International Cancer Genome Consortium multiplied with its corresponding expression value. (ICGC) [15], and Gene Expression Omnibus (GEO), rep- Nonparametric Spearman’s rank correlation analysis resenting 25 cancer types. Te cohort descriptions are was employed to assess the relationship of expression listed in Additional fle 1. TCGA datasets were down- scores and risk scores with tumor hypoxia (hypoxia loaded from Broad Institute GDAC Firehose (https://​ score). Chang et al. Cancer Commun (2019) 39:23 Page 3 of 14

Survival analyses Somatic mutation identifcation Cox proportional hazards regression analysis was Level 3 mutation datasets were downloaded from GDAC employed to investigate the association between patient (https​://gdac.broad​insti​tute.org/). Kaplan–Meier analysis survival and risk factors, e.g., signature 1 or signature 2 and log-rank tests were employed to determine the asso- score, tumor stage, and other clinical variables. Univari- ciation of somatic mutations, in combination with signa- ate analyses were performed to determine the infuence ture 1 or 2, on OS. of individual risk factors on OS. Multivariate analyses All graphs were generated using the ggplot2 package in were performed by including risk factors that were iden- R (version 3.1.0) [27]. tifed in univariate analyses (P < 0.05). Hazard ratios (HR) were determined from Cox models. Cox regression Statistical analysis analyses were performed using the R survival (version Comparisons of gene expression levels between tumor 2.43-3) [20] and survminer (version 0.4.3) [21] pack- and non-tumor samples or between high- and low- ages. Proportional hazards assumption was supported score groups were performed using the non-parametric by a non-signifcant relationship between scaled Schoe- Mann–Whitney–Wilcoxon test implemented in R (ver- nfeld residuals and time using the R survival package. In sion 3.3.3) [28]. addition, Kaplan–Meier and log-rank tests were used in univariate analyses of the gene signatures in relation to Results patient survival and were performed using the survival Multi‑cohort analyses revealed two prognostic and survminer packages. Patients were median-dichoto- 2OG‑dependent oxygenases signatures mized into low and high-score groups based on median We analyzed the prognostic signifcance of 61 2OG- expression scores of signature genes. Diferences between dependent oxygenase genes (Additional fle 2) in 19,781 high and low-score groups were tested using the log-rank tumor samples from multiple TCGA cohorts [14] cover- test implemented with the survival package. ing 25 cancer types (Additional fle 1). Prognostic genes Time-dependent receiver operating characteristic were defned as those whose expression levels were sig- (ROC) curve analysis was used to assess the predictive nifcantly correlated with patients’ OS. Pancreatic cancer performance of both signatures 1 and 2 in comparison is difcult to treat. Since the highest number of prog- with standard tumor staging parameters. Te R surv- nostic genes (29 genes) was observed in the pancre- comp (version 3.8) package [22] was employed to com- atic cancer cohort (PAAD; 178 samples), two additional pute time-dependent ROC curves [22]. pancreatic cancer cohorts from ICGC (PACA-AU and PACA-CA; 269 and 234 samples) were used in combina- tion as training cohorts (Fig. 1a). We defned two gene Biological enrichment analysis signatures (signatures 1 and 2) as favorable and unfavora- Analysis of biological pathway enrichment on the 745 ble prognostic factors by taking into consideration genes diferentially expressed genes between KDM8-low and that were signifcant in univariate Cox regression analy- -high groups was conducted using GeneCodis against ses in 2 out of 3 pancreatic cancer cohorts (Fig. 1a, b). the Kyoto Encyclopedia of Genes and Genomes (KEGG) Signature 1 included KDM8, KDM6B, P4HTM, ALKBH4, (https​://www.genom​e.jp/kegg/) and and ALKBH7. Likewise, KDM3A, P4HA1, ASPH, PLOD1, (GO) databases (http://geneo​ntolo​gy.org/) [23]. Te Enri- and PLOD2 made up signature 2 (Fig. 1a). chr tool was used to identify transcription factors from Patients were median-dichotomized based on mean the ENCODE database (https​://www.encod​eproj​ect.org/) expression scores of signatures 1 and 2 genes. Cox regres- as potential regulators of these 745 genes [24, 25]. sion analyses revealed that patients with high expres- sion of signature 1 genes had signifcantly better OS in HNF4A loss‑of‑function analysis 6 cancer types (Additional fle 3): bladder urothelial car- A total of 148 genes were identifed as HNF4A targets in cinoma (BLCA: HR, 0.662; 95% confdence interval [CI] the HepG2 hepatoma cell line determined using the Enri- 0.450–0.974; P 0.036), renal papillary cell carcinoma chr tool [24, 25]. Diferential expression analysis between = (KIRP: HR, 0.370; 95% CI 0.157–0.871; P = 0.023), liver HNF4A wild-type and null mice livers (GSE3126) per- cancer (LIHC: HR, 0.656; 95% CI 0.424–0.915; P 0.048 formed using the GEO2R tool [26] identifed 110 dif- = and LIRI-JP: HR, 0.490; 95% CI 0.259–0.938; P = 0.031), ferentially expressed genes from the initial 745-gene set lung adenocarcinoma (LUAD: HR, 0.625; 95% CI 0.443– identifed previously (Fig. 5h). Of these 110 genes, 45 0.879; P 0.007), pancreatic adenocarcinoma (PAAD: were identifed as direct HNF4A targets and were down- = HR, 0.454; 95% CI 0.278–0.741; P = 0.002), and uterine regulated in the HNF4A-null mice (Fig. 5i). corpus endometrial carcinoma (UCEC: HR, 0.401; 95% Chang et al. Cancer Commun (2019) 39:23 Page 4 of 14

a 2 OG oxygenases 25 TCGA cancers b Good prognostic genes (61 genes) KDM8_PACA_CA 0.048 n (tumor samples) = 19,781 Cox regression analyses KDM6B_PACA_CA 0.045 KDM4C_PACA_CA 0.019 PLOD3_PACA_CA 0.044 KDM1B_PACA_CA 0.012 KDM4D_PAAD 0.048 3 TRAINING COHORTS P4HTM_PACA_AU 0.031 JMJD7_PLA2G4B_PAAD 0.01 ICGC pancreatic ALKBH6_PAAD 0.0058 TCGA pancreatic cancer 0.0016 cancer cohorts P3H3_PAAD (29 / 61 prognostic genes) ALKBH7_PAAD 0.0012 PACA-AU, n = 269 ALKBH4_PACA_AU 0.0081 PAAD, n = 178 PACA-CA, n = 234 KDM6B_PAAD 0.0045 HR_PACA_AU 0.007 P4HTM_PAAD 0.00018 KDM8_PAAD 0.00023 JMJD6_PAAD 0.0065 KDM4B_PAAD 0.0032 Genes prognostic in 2 out of 3 training cohorts ALKBH1_PAAD 0.0079 ALKBH4_PAAD 0.0012 Signature 1 Signature 2 ALKBH5_PAAD 0.0027 EGLN2_PAAD 2.7e−05 (good prognosis) (bad prognosis) KDM6A_PACA_AU 0.006 KDM8 KDM3A HSPBAP1_PACA_AU 0.0092 ALKBH7_PACA_AU 0.011 KDM6B P4HA1 P4HTM ASPH 0.00 0.25 0.50 0.75 1.00 ALKBH4 PLOD1 ALKBH7 PLOD2 Bad prognostic genes PLOD1_PACA_AU 0.00015 P4HA2_PACA_AU 0.00035 P4HA1_PACA_AU 0.003 ALKBH3_PACA_AU 0.042 PAN-CANCER VALIDATION TMLHE_PACA_AU 0.028 OGFOD1_PAAD 0.0021 KDM5B_PAAD 0.00068 Signature 1 Signature 2 KDM2A_PAAD 0.011 PLOD2_PACA_AU 0.0091 8 prognostic cohorts 12 prognostic cohorts ALKBH8_PAAD 0.02 TMLHE_PAAD 0.026 Bladder BLCA n = 408 JMJD1C_PAAD 0.0094 Cervical CESC n = 304 TET3_PAAD 0.0071 Bladder BLCA n = 408 ASPH_PACA_AU 0.0045 Papillary cell KIRP n = 290 Head and neck HNSC n = 520 UTY_PACA_AU 0.047 Liver #1 LIHC n = 371 Renal clear cell KIRC n = 533 ASPH_PAAD 0.00015 KDM3A_PAAD 0.028 Liver #2 LIRI-JP n = 226 Papillary cell KIRP n = 290 PLOD1_PAAD 0.011 Lung LUAD n = 515 Liver #2 LIRI-JP n = 226 KDM3A_PACA_CA 0.0043 PLOD2_PAAD 0.0013 Pancreas #1 PAAD n = 178 Liver #3 GSE14520 n = 242 P4HA1_PACA_CA 1.3e−05 Pancreas #2 PACA-AU n = 269 Lung LUAD n = 515 P4HA1_PAAD 0.0031 KDM5A_PAAD 0.036 Endometrial UCEC n = 370 Pancreas #1 PAAD n = 178 ASPH_PACA_CA 0.0062 Pancreas #2 PACA-AU n = 269 P3H2_PAAD 0.0031 total = 2,627 Pancreas #3 PACA-CA n = 234 P4HA2_PACA_CA 0.042 P4HA3_PAAD 0.0038 Stomach STAD n = 415 EGLN3_PACA_CA 4e−04 P3H3_PACA_CA total = 4,134 12345710 15 25

Fig. 1 Schematic diagram of the study design and development of signatures derived from 61 2-oxoglutarate-dependent oxygenase genes. a Three pancreatic adenocarcinoma cohorts were used to defne both signatures 1 and 2. Genes found to be prognostic in univariate Cox regression analysis in 2 out of 3 pancreatic adenocarcinoma cohorts were included in signatures 1 and 2. Signature 1 is a marker of good prognosis and consists of 5 genes (KDM8, KDM6B, P4HTM, ALKBH4, and ALKBH7). Signature 2 is a marker of adverse prognosis and consists of 5 genes (KDM3A, P4HA1, ASPH, PLOD1, and PLOD2). Prognosis of both signatures was further confrmed in 10 cancer types using Kaplan–Meier, Cox regression, and receiver operating characteristic analyses. b Forest plots of prognostic genes found to be signifcant by univariate Cox regression analysis in pancreatic adenocarcinoma cohorts abbreviated as PAAD, PACA-AU, and PACA-CA. Genes were separated into two groups, good and bad prognostic genes. Hazard ratios were denoted as red circles, and turquoise bars represent 95% confdence interval. Signifcant Wald test P values are indicated in blue. Y-axes represent gene symbols followed by cohort abbreviations. Signature 1 genes are marked in green. Signature 2 genes are marked in red. Full description of cancers is listed in Additional fle 1. 2OG, 2-oxoglutarate; TCGA, The Cancer Genome Atlas; ICGC, International Cancer Genome Consortium

cell carcinoma (HNSC: HR, 1.479; 95% CI 1.056–2.072; CI 0.229–0.702; P = 0.002). Similar results were obtained using log-rank tests, consistent with the fact that signa- P = 0.023), renal clear cell carcinoma (KIRC: HR, 1.483; ture 1 was a marker of good prognosis (Fig. 2a). In con- 95% CI 1.096–2.007; P = 0.011), renal papillary cell carci- trast, patients with high expression of signature 2 genes noma (KIRP: HR, 3.862; 95% CI 1.565–9.526; P = 0.003), had signifcantly worse prognosis in 9 cancers: bladder liver cancer (LIRI-JP: HR, 5.271; 95% CI 2.429–11.440; urothelial carcinoma (BLCA: HR, 1.459; 95% CI 1.096– P < 0.001 and GSE14520: HR, 2.285; 95% CI 1.458–3.580; P < 0.001), lung adenocarcinoma (LUAD: HR, 1.562; 95% 2.137; P = 0.042), cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC: HR, 1.972; 95% CI 1.116–2.188; P = 0.009), pancreatic adenocarcinoma (PAAD: HR, 1.969; 95% CI 1.217–3.186; P 0.006), CI 1.003–3.877; P = 0.045), head and neck squamous = Chang et al. Cancer Commun (2019) 39:23 Page 5 of 14

a Bladder (BLCA) Papillary renal cell (KIRP) Liver #1 (LIHC) Liver #2 (LIRI-JP) 1.00 ++ 1.00 ++++++++++++++ +++++ +++ ++++++++++++++++++++++++++++++++ 1.00 +++++ 1.00 + + ++++++++++++ ++++ ++++++ ++++++++ +++++ ++++++++++++++++ ++++++++ + +++++ +++++++ +++++++++++++++ ++++++++ ++++++ + ++++++ ++++ +++++++++++++++++ + +++ + +++ +++++++++++ +++ +++++++ 0.75 +++ +++ 0.75 +++++++++ 0.75 +++++++++++ ++ 0.75 +++++++++++++ +++ ++++++++++ ++++++++ ++++++++++ +++ ++++ ++++ +++ ++++ +++ ++ +++++++ ++ +++++++++++++ +++++ 0.50 +++ ++++++ 0.50 ++++ ++++ ++++++++ +++++ 0.50 0.50 ++

0.25 0.25 0.25 0.25

log rank P = 0.039 log rank P = 0.013 log rank P = 0.033 log rank P = 0.025 0.00 0.00 0.00 0.00 Overall survival probability 012345 012345 012345 012345 Years Years Years Years Number at risk Number at risk Number at risk Number at risk Low score 157118 62 41 31 22 Low score 125 10374473526 Low score 155 105 53 36 25 19 Low score 1139855223 0 High score 153122 53 33 27 20 High score 131 11364403020 High score 158 129 78 48 34 20 High score 113986634132

Lung (LUAD) Pancreas #1 (PAAD) Pancreas #2 (PACA-AU) Endometrial (UCEC) 1.00 + 1.00 ++ 1.00 1.00 +++++ ++ +++++++++ + +++++++++++++++++++ ++++++++ + +++ ++++++++++ +++++++++++++++++++++++++ ++++++++++++++ +++ +++ +++++ ++++++++++++++++ +++++++++++ +++ + + ++++++ ++++++++++++ +++++++ ++ ++ ++++ 0.75 +++ ++ 0.75 + 0.75 ++ 0.75 ++++++++++++++++++++++ +++++++ ++++++ + + ++ + + +++ ++++++ +++ + + + ++++ +++++ + +++ ++++++ ++++ +++ + ++ +++++ ++++ ++ 0.50 ++++ + ++ 0.50 + 0.50 + + 0.50 +++ + ++++ ++++++++ ++ ++++++++ ++ +++ ++ ++++ ++ ++++++ + 0.25 0.25 ++ 0.25 + 0.25 + ++ +++++++++ log rank P = 0.014 log rank P < 0.001 log rank P = 0.040 log rank P < 0.001 0.00 0.00 0.00 0.00 Overall survival probability 012345 012345 012345 012345 Years Years Years Years Number at risk Number at risk Number at risk Number at risk Low score 207163 100573422 Low score 75 52 13 732 Low score 133 76 37 20 71Low score 184158 111766242 High score 212181 89 55 33 27 High score 74 51 21 12 75High score 134 88 30 811 High score 184163 130947154

b Bladder (BLCA) Cervical (CESC) Head and neck (HNSC) Renal clear cell (KIRC) 1.00 + 1.00 +++ 1.00 + 1.00 ++ +++++ +++++++++++++++++++++ +++++++ ++++ ++ ++++++++++++++++++++ +++++ +++++++++++++ +++++ +++ + +++++ + ++ ++++++++++++ +++++++++ +++++++ ++++++++++++++++ + +++++ +++++++++++++ ++ ++++ +++ +++++++++ ++++++++++++++ 0.75 + + 0.75 +++++++++ 0.75 +++ ++++++++++ 0.75 +++ ++++++++++++++++++++ ++ ++ ++++++ ++++ ++++++++++++++ +++ ++++++++++ ++++ +++++++++ +++++ ++ ++++++++++ ++++++++++ ++++++++++ ++ + ++++++ ++++++++ ++ +++++ ++++++++ + +++++++++++++++++++++++++ + +++++++++ 0.50 + 0.50 0.50 ++++ 0.50 ++++++ ++++ +++++++

0.25 0.25 0.25 0.25

log rank P = 0.039 log rank P = 0.035 log rank P = 0.038 log rank P = 0.012 0.00 0.00 0.00 0.00 Overall survival probability 012345 012345 012345 012345 Years Years Years Years Number at risk Number at risk Number at risk Number at risk Low score 157124 54 39 29 21 Low score 1169962402922 Low score 218 180 101664627 Low score 262229 189151 11882 High score 153116 61 35 29 21 High score 1179362402414 High score 223 170 107653921 High score 262206 166138 10370

Renal papillary cell (KIRP) Liver #2 (LIRI-JP) Liver #3 (GSE14520) Lung (LUAD) 1.00 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1.00 ++++++++++ 1.00 ++ 1.00 ++++++ + + + ++++ ++++ + +++++++++ ++++++++++++ ++++++++ ++ +++++++++++++++++++++++++++ + ++++++++++ +++++++++ ++++ + +++ +++++++++++++++ +++++++++++ +++ +++++++ ++++++++++ 0.75 +++ +++++++ ++ 0.75 +++ ++ 0.75 +++++++++++ 0.75 + ++++++++++++++++ ++++ +++++++ +++++++ + + ++ +++ + + +++++++ ++++++ ++++ + ++ + + ++++ +++++ 0.50 0.50 0.50 +++++++++ 0.50 +++ ++ +++ ++++ +++ ++ +++++ +++++ + 0.25 0.25 0.25 0.25

log rank P = 0.002 log rank P < 0.001 log rank P < 0.001 log rank P = 0.011 0.00 0.00 0.00 0.00 Overall survival probability 012345 012345 012345 012345 Years Years Years Years Number at risk Number at risk Number at risk Number at risk Low score 129114 71 45 34 22 Low score 113 106 67 34 13 0 Low score 112 100 86 79 74 23 Low score 212185 97 61 39 29 High score 127102 67 42 31 24 High score 1139054223 2 High score 113 88 70 60 52 18 High score 207159 92 51 28 20

Pancreas #1 (PAAD) Pancreas #2 (PACA-AU) Pancreas #3 (PACA-CA) Stomach (STAD) y 1.00 ++ 1.00 1.00 1.00 +++++++ + + + +++++ ++ ++ ++ +++++++ ++++ +++++ + +++ + ++ +++++ ++ +++++ 0.75 ++++ 0.75 + ++ 0.75 + 0.75 ++++ +++++++ + + ++ + + ++++++ +++ +++++ + ++ ++++ ++++++++++++++++++ +++ + +++ ++ + + +++ +++ + ++++ +++ ++ +++ 0.50 ++ +++++ ++ + 0.50 +++ 0.50 + + 0.50 +++++++ + ++++++++ +++ + ++ + + + +++++ +++ ++ ++ ++++ ++++ + +++ + 0.25 + 0.25 ++ ++ +++++++++ 0.25 + 0.25 + ++ ++++++ +++++ + log rank P = 0.002 log rank P = 0.018 log rank P < 0.001+ + log rank P = 0.004 0.00 0.00 0.00 0.00 Overall survival probabilit 012345 012345 012345 012345 Years Years Years Years Number at risk Number at risk Number at risk Number at risk Low score 75 52 19 12 86Low score 133 90 37 14 41Low score 90 67 39 28 15 7 Low score 163120 56 32 16 11 High score 74 51 15 721 High score 134 74 30 14 41High score 92 46 23 11 62High score 158106 43 11 54 Chang et al. Cancer Commun (2019) 39:23 Page 6 of 14

(See fgure on previous page.) Fig. 2 Kaplan–Meier analyses confrming that gene signatures were associated with patients’ overall survival. a Validation of signature 1 (green panels) across multiple cancer types. Kaplan–Meier plots of overall survival in cancer patients stratifed based on signature 1 mean expression scores. Patients were median-dichotomized into high- and low-score groups. Signature 1 is a marker of good prognosis, and hence patients with high signature 1 scores had high survival rates. b Validation of signature 2 (red panels) across multiple cancer types. Kaplan–Meier plots of overall survival in cancer patients stratifed based on signature 2 mean expression scores. Patients were median-dichotomized into high- and low-score groups. Signature 2 is a marker of adverse prognosis, and hence patients with high signature 2 scores had low survival rates. P values were calculated from the log-rank test. Pancreas #1 PAAD cohort; Pancreas #2 PACA-AU cohort; Pancreas #3 PACA-CA cohort; Liver #1 LIHC = = = = cohort; Liver #2 LIRI-JP cohort; and Liver #3 GSE14520 cohort (Additional fle 1) = = and gastric adenocarcinoma (STAD: HR, 1.725; 95% CI pancreatic adenocarcinoma (PAAD: AUC = 0.668), renal 1.142–2.605; P 0.009) (Fig. 2b and Additional fle 3). = clear cell carcinoma (KIRC: AUC = 0.665), head and neck squamous cell carcinoma (HNSC: AUC 0.632), lung Cross‑platform subgroup and multivariate analyses = adenocarcinoma (LUAD: AUC = 0.623), gastric adeno- confrmed the validity of signatures 1 and 2 carcinoma (STAD: AUC 0.618), and bladder urothelial as independent prognostic factors = carcinoma (BLCA: AUC = 0.605) (Fig. 3d and Additional To assess the independence of signatures 1 and 2 over fle 4D). Performance of both signatures 1 and 2 was current tumor staging systems, we performed subgroup superior to current tumor-node-metastasis (TNM) stag- analyses of their prognostic efects in patients with early ing except for the following: signature 1 in liver cancer, (stages I and/or II), intermediate (stages II and/or III), lung adenocarcinoma, and uterine corpus endometrial and late (stages III and/or VI) cancer stages. Kaplan– carcinoma and signature 2 in bladder urothelial carci- Meier analyses revealed that signature 1 successfully noma, renal clear cell carcinoma, liver cancer, and lung identifed high-risk (low-expression score) and low-risk adenocarcinoma (Fig. 3c, d and Additional fle 4C, D). (high-expression score) patients with early (bladder Remarkably, when used in combination with TNM stag- urothelial carcinoma, liver cancer, lung adenocarcinoma, ing, both signatures consistently outperformed each of pancreatic adenocarcinoma, and uterine corpus endome- the individual classifers, reinforcing their incremental trial carcinoma), intermediate (liver cancer and uterine prognostic values (Fig. 3c, d and Additional fle 4C, D). corpus endometrial carcinoma), and late (renal papillary Signifcantly, while TNM staging could not predict out- cell carcinoma) disease stages (Fig. 3a and Additional come in cervical squamous cell carcinoma and endocer- fle 4A). Signature 2 was also independent of disease stage vical adenocarcinoma patients (CESC: AUC = 0.455), as it successfully predicted survival in early (liver cancer, signature 2 sufciently served as an adverse prognostic lung adenocarcinoma, and pancreatic adenocarcinoma), factor (CESC: AUC = 0.692) (Fig. 3d). intermediate (bladder urothelial carcinoma, liver cancer, Univariate Cox regression analyses revealed that TNM and gastric adenocarcinoma), and late (bladder urothe- stage was associated with patient survival in diferent lial carcinoma, head and neck squamous cell carcinoma, cancer types except for cervical squamous cell carci- renal papillary cell carcinoma, liver cancer, and gastric noma and endocervical adenocarcinoma, and pancreatic adenocarcinoma) stages (Fig. 3b and Additional fle 4B). adenocarcinoma (Additional fle 3). Tis was expected ROC analyses were employed to determine the pre- given the low AUC values for TNM stage in both pancre- dictive performance (sensitivity and specifcity) of sig- atic adenocarcinoma (PAAD: AUC = 0.593) and cervical natures 1 and 2 on 5-year OS. Signature 1 performed squamous cell carcinoma and endocervical adenocar- the best, as measured by area under the curve (AUC), cinoma (CESC: AUC = 0.455), suggesting that current in pancreatic adenocarcinoma (PAAD: AUC = 0.754) TNM staging system for these cancers are inadequate followed by renal papillary cell carcinoma (KIRP: (Fig. 3c, d). Multivariate Cox regression analyses after AUC = 0.738), liver cancer (LIRI-JP: AUC = 0.652 and adjusting for TNM stage showed that signatures 1 and LIHC: AUC = 0.613), bladder urothelial carcinoma 2 remained signifcantly associated with survival (Addi- (BLCA: AUC = 0.645), uterine corpus endometrial car- tional fle 3). For 2 liver cancer cohorts, we considered cinoma (UCEC: AUC = 0.635), and lung adenocarci- additional clinicopathological features. Te GSE14520 noma (LUAD: AUC = 0.625) (Fig. 3c and Additional cohort consisted of Chinese patients with hepatitis fle 4C). Signature 2 performance in renal papillary cell B-associated hepatocellular carcinoma [17], whereas carcinoma (KIRP: AUC = 0.810) was the best, followed LIRI-JP was a Japanese-based cohort of mixed etiol- by cervical squamous cell carcinoma and endocervi- ogy [29]. Tumor size, cirrhosis, TNM stage, Barcelona cal adenocarcinoma (CESC: AUC = 0.692), liver cancer Clinic Liver Cancer (BCLC) stage, and alpha-fetoprotein (GSE14520: AUC = 0.675 and LIRI-JP: AUC = 0.625), Chang et al. Cancer Commun (2019) 39:23 Page 7 of 14

a Bladder (BLCA): S1 & 2 Liver #2 (LIRI-JP): S1 & 2 Pancreas (PAAD): S1 & 2 Endometrial (UCEC): S1 & 2 1.00 ++ ++++ 1.00 ++ +++++++++++++++++ 1.00 ++ 1.00 +++++++++++++++++++++++++++++++ +++++ ++ +++++++++++++ + +++++++++++++++++++++ ++++++++++++++++++ ++ + ++ ++ +++++++++++++++++++++++ ++++++++++++++ ++++++++++++ +++++ ++++ ++++++++ ++ +++ + +++++ ++++ 0.75 0.75 0.75 + 0.75 ++ ++++ +++ + ++ + +++++++ 0.50 ++++++++++ 0.50 0.50 + +++++ ++ 0.50 +++ +++ ++ 0.25 0.25 0.25 + 0.25

Overall survival probability log rank P = 0.043 log rank P = 0.002 log rank P = 0.002 log rank P = 0.044 0.00 0.00 0.00 0.00 012345 012345 012345 012345 Years Years Years Years Number at risk Number at risk Number at risk Number at risk Low score 56 48 26 17 13 9 Low score 68 63 36 13 00Low score 70 47 13 732 Low score 136 123 89 66 52 37 High score 55 45 19 13 11 7 High score 68 67 45 23 82High score 70 47 19 10 54High score 136 116 95 70 54 41

b Bladder (BLCA): S2 & 3 Liver #3 (GSE14520): S1 & 2 Lung (LUAD): S1 Pancreas (PAAD): S1 & 2 Stomach (STAD): S2 & 3 1.00 +++++++ 1.00 ++++ 1.00 +++++++++++++++ 1.00 ++ 1.00 +++++ ++ + + + ++++++++++++++++ + +++++ +++++++++++++ + ++++++ +++++ ++ ++++ ++ + + ++ ++++ +++++ ++ + +++ + +++++++++ ++++++ ++ ++ + ++ ++ +++++ 0.75 +++ 0.75 +++++++++++++ 0.75 ++ +++++ 0.75 ++ 0.75 +++ ++ +++++ ++ ++++++ + ++++ + + +++++++++++++ ++ ++++ ++ +++ +++ +++ +++ +++++ + ++++++ +++ + + +++ +++++++++++++++++++++ ++++++++++++ + + + ++ + 0.50 ++ ++ ++++ +++ +++++++++ 0.50 ++ +++++++++ 0.50 0.50 + +++++ + 0.50 +++ ++++ ++++ + ++++ + 0.25 + 0.25 0.25 0.25 + 0.25 Overall survival probability log rank P = 0.004 log rank P = 0.004 log rank P = 0.025 log rank P = 0.005 log rank P = 0.044 0.00 0.00 0.00 0.00 0.00 012345 012345 012345 012345 012345 Years Years Years Years Years Number at risk Number at risk Number at risk Number at risk Number at risk Low score 108 87 40 31 23 16 Low score 87 80 74 67 63 21 Low score 114 103 55 38 26 20 Low score 70 47 17 10 65Low score 122 89 42 22 96 High score 106 86 43 24 22 16 High score 87 73 63 57 50 19 High score 115 97 53 33 20 15 High score 70 47 15 721High score 123 83 34 711

c Bladder (BLCA) Renal papillary cell (KIRP) Liver #2 (LIRI-JP) Pancreas (PAAD) Endometrial (UCEC) 1.00 1.00 1.00 1.00 1.00

0.75 0.75 0.75 0.75 0.75

0.50 0.50 0.50 0.50 0.50

0.25 0.25 0.25 0.25 0.25 TP (Sensitivity )

0.00 0.00 0.00 0.00 0.00 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.500.751.00 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 FP (1−Specificity)

Classifier AUC Classifier AUC Classifier AUC Classifier AUC Classifier AUC TNM 0.626 TNM 0.640 TNM 0.624 TNM 0.593 TNM 0.674 Sig. 1 0.645 Sig. 1 0.738 Sig. 1 0.652 Sig. 1 0.754 Sig. 1 0.635 Sig. 1 + TNM 0.693 Sig. 1 + TNM 0.834 Sig. 1 + TNM 0.688 Sig. 1 + TNM 0.756 Sig. 1 + TNM 0.725

d Cervical (CESC) Renal clear cell (KIRC) Renal papillary cell (KIRP) Liver #3 (GSE14520) Pancreas (PAAD) 1.00 1.00 1.00 1.00 1.00

0.75 0.75 0.75 0.75 0.75

0.50 0.50 0.50 0.50 0.50

0.25 0.25 0.25 0.25 0.25

0.00 0.00 0.00 0.00 0.00 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00

Classifier AUC Classifier AUC Classifier AUC Classifier AUC Classifier AUC TNM 0.455 TNM 0.716 TNM 0.640 Sig. 2 0.692 Sig. 2 0.665 Sig. 2 0.810 TNM 0.728 TNM 0.593 Sig. 2 + TNM 0.701 Sig. 2 + TNM 0.777 Sig. 2 + TNM 0.852 Sig. 2 0.675 Sig. 2 0.668 Sig. 2 + TNM 0.735 Sig. 2 + TNM 0.679 Fig. 3 Tumor subgroup analyses and evaluation of prognosis predictive performance of gene signatures across diferent malignant grades. Kaplan–Meier plots show independence of a signature 1 (green panels) and b signature 2 (red panels) over the current TNM staging system in predicting prognosis in diferent cancer cohorts. Patients were sub-grouped according to TNM stages and further stratifed using either signature 1 or signature 2 scores. Both signatures successfully identifed high-risk patients in diferent TNM stages. P values were calculated from the log-rank test. Analysis of specifcity and sensitivity of c signature 1 (green panels) and d signature 2 (red panels) in predicting prognosis in diferent cancer cohorts using receiver operating characteristic (ROC) curves. Plots depict comparison of ROC curves of signature 1 or 2 and clinical TNM staging. Both signatures demonstrate incremental values over the current TNM staging system. AUC: area under the curve. TNM: tumor, node, metastasis staging. Liver #2 LIRI-JP cohort and Liver #3 GSE14520 cohort (Additional fle 1). Representative plots are depicted in this fgure. Additional plots = = are available in Additional fle 4 Chang et al. Cancer Commun (2019) 39:23 Page 8 of 14

(AFP) levels were all signifcantly associated with survival and gastric adenocarcinoma (STAD: HR, 1.525; 95% CI in the GSE14520 cohort; tumor size could also predict 1.007–2.307; P = 0.046) but with prolonged survival survival in the LIRI-JP cohort (Additional fle 3). When in uterine corpus endometrial carcinoma (UCEC: HR, these signifcant covariates along with signatures 1 or 2 0.516; 95% CI 0.272–0.978; P = 0.042) (Additional fle 3). were included in multivariate Cox models, the signatures Mutations in another gene from the protocadherin alpha remained signifcant risk factors: signature 1 (LIRI-JP: cluster, PCDHA2, were also associated with adverse out- HR, 0.541; 95% CI 0.283–0.904; P = 0.043) and signature comes in gastric adenocarcinoma (STAD: HR, 1.604; 95% 2 (LIRI-JP: HR, 4.539, 95% CI 2.055–10.029; P < 0.001 and CI 1.061–2.427; P = 0.025) (Additional fle 3). Mutations GSE14520: HR, 2.012; 95% CI 1.267–3.195; P = 0.003) in TTN and the tumor suppressor TP53 were associ- (Additional fle 3). Tese results highlight the potentially ated with short survival in bladder urothelial carcinoma superior prognostic ability of our signatures: signatures 1 (BLCA: HR, 1.610; 95% CI 1.091–2.376; P = 0.016) and and 2 identifed high- and low-risk patients in 8 and 12 uterine corpus endometrial carcinoma (UCEC: HR, independent cohorts covering 10 cancer types (Fig. 1a). 1.780; 95% CI 1.025–3.090; P = 0.041) (Additional fle 3). Interestingly, another tumor suppressor PTEN, when Signifcance of somatic mutations in risk‑stratifed patients mutated, was linked to better outcomes in uterine cor- Patients were risk stratifed into low- and high-risk pus endometrial carcinoma (UCEC: HR, 0.427; 95% groups using signatures 1 and 2. For signature 1, high- CI 0.234–0.781; P = 0.006) (Additional fle 3). Similar risk patients had signifcantly lower expression levels observations were made for a lipid kinase gene PIK3CA of good prognosis genes ALKBH4, ALKBH7, KDM8, in uterine corpus endometrial carcinoma (UCEC: HR, KDM6B, and P4HTM (Additional fle 5A). In contrast, 0.362; 95% CI 0.190–0.689; P = 0.002) (Additional fle 3). high-risk patients as stratifed by signature 2 had signif- Likewise, MUC4 mutations prolonged survival in renal cantly higher expression levels of adverse prognosis genes clear cell carcinoma patients (KIRC: HR, 0.570; 95% CI ASPH, KDM3A, P4HA1, PLOD1, and PLOD2 (Additional 0.370–0.880; P = 0.012) (Additional fle 3), an observation fle 5B). To ascertain the relationship between tumor that is consistent with another study [32]. hypoxia and expression of signature genes, hypoxia Multivariate Cox regression analyses on signatures scores were computed for each patient as mean expres- 1 and 2 while controlling for signifcant somatic muta- sion values ­(log2) of 52 hypoxia signature genes [19]. tion variables revealed that the gene signatures were Signature 1 expression scores in patients negatively corre- independent survival predictors for bladder urothelial lated with hypoxia score (Additional fle 6A). Since tumor carcinoma (signature 1: HR, 0.686; 95% CI 0.466–0.912; hypoxia is associated with distant metastasis, recurrence, P = 0.047 and signature 2: HR, 1.411; 95% CI 1.062– and reduced therapeutic response [30], high expression 2.070; P = 0.048), renal clear cell carcinoma (signature 2: of signature 1 genes (low hypoxia score) was correlated HR, 1.520; 95% CI 1.123–2.056; P = 0.007), gastric adeno- with less advanced disease states consistent with it being carcinoma (signature 2: HR, 1.800; 95% CI 1.184–2.737; a marker of good prognosis (Additional fle 6A). P = 0.006), and uterine corpus endometrial carcinoma Conversely, signature 2 scores positively correlated (signature 1: HR, 0.519; 95% CI 0.293–0.920; P = 0.024) with tumor hypoxia and hence poor survival outcomes (Additional fle 3). Signatures 1 or 2 and mutation status (Additional fle 7A). We anticipated that patients’ indi- were collectively associated with OS (Fig. 4). In bladder vidual risks of death, as determined from signatures 1 urothelial carcinoma, high-risk patients (low signature 1 and 2, would positively correlate with tumor hypoxia. score) harboring mutant alleles of PCDHA1 had ~ 50% Indeed, the risk score for each patient, as calculated by increased mortality at 5 years compared to low-risk taking the sum of Cox regression coefcient for each of patients (high signature 1 score) with wild-type PCDHA1 the individual genes multiplied with its corresponding (P = 0.016; Fig. 4). Although results were less dramatic expression value [31], was correlated with the hypoxia for PCDHA1 and signature 2, we still observed a ~ 25% score (Additional fles 6B, 7B). Hence, high-risk patients elevated mortality at 5 years for these two patient groups had more hypoxic tumors, suggesting that our gene sig- with bladder urothelial carcinoma (P = 0.040; Fig. 4). In natures are efcient and adequate in predicting death. gastric adenocarcinoma, high-risk patients (high signa- To ascertain the association between patients’ risks, as ture 2 scores) with mutant PCDHA1 had the worst out- determined by our gene signatures, and somatic muta- comes (P = 0.002; Fig. 4). Conversely, PCDHA1 mutation tions, we retrieved the fve most commonly mutated was associated with good prognosis in uterine corpus genes for each cancer. Mutations in PCDHA1, a cell endometrial carcinoma, hence high-risk patients with adhesion gene from the cadherin superfamily, were asso- wild-type PCDHA1 had the lowest survival rates while ciated with short survival in bladder urothelial carci- survival was prolonged by ~ 20% in low-risk patients with noma (BLCA: HR, 1.649; 95% CI 1.058–2.569; P 0.027) = mutant PCDHA1 (P = 0.003; Fig. 4). PIK3CA (P < 0.001) Chang et al. Cancer Commun (2019) 39:23 Page 9 of 14 + + ++ + + + + + + + + + 5 + 11 10 + + 44 31 + ++ + + ++ ++ + + + ++ + + + + + + + + + + ++ + + ++ + + 16 12 59 46 + + ++ + ++ TP53 ++ +

++ ++ ++ + + ++ + + + + + + + + + ++ + + + + 19 21 + + + 73 57 + + + + + + + + + + + + = 0.002 + + + ++ + ++ + + + + + + + + + P + + + ++ + + + + Years + + + + ++ + + + + + + + + 28 27 + + + + 83 + + + 103 ++ + + ++ + + + + + + + + + + + + + + + + + + + + + + + + + + ++ + sk + + + + + + + + + + + + + + ++ + + 40 30 log rank + + + 133 118 + + Endometrial: + ++ + + + + + + + + + + + + + + + + + + + + + + + + 01234 1 1 45 32 + 10 152 139 + Number at ri + 1.00 0.75 0.50 0.25 0.00 ++ WT WT 43 15 mutan t mutan t + ++ TP53 TP53 ++ ++ + ++ TP53 TP53 ++ +

++ 11 51 + + + 27 10 + + + ++ + + + + + + ++ PCDHA2 w Risk, + + + + < 0.001 + + + ++ + + ++ : + + + Lo + P + + High Risk, + w Risk , + Years ++ + + + + 43 16 26 13 + + Lo + + + + + + High Risk, + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + sk + + + + + + + + + + + ++ + + + + + + + + + + + + + + + + + + + + + + + 27 + + + + + + + 15 34 20 85 + + + + 34 + 68 34 + + + + + + + + + + + + + + + + + + + + ++ + + + + + + + + log rank + + ++ + + + + ++ + ++ + + + + + + + ++ + + + Stomach + + ++ + + + + + + + + + + ++ + + + + + ++ + + 41 21 42 29 + + + ++ 59 ++ 110 92 49 + + ++ + 012345 + + + + + Number at ri ++ ++ ++ ++ PTEN

WT WT ++ + +

+ ++ + ++ ++ + + + + 1.00 0.75 0.50 0.25 0.00 + + + + + 51 + + + 25 49 45 + + + + + + + mutant mutan t ++ ++

+ + + + = 0.001 + + ++ + + ++ + + + + + + + + P + + ++ + + ++ + Years + + + + + + PCDHA2 PCDHA2 + + + ++ + + + 75 + + 36 + 67 63 + + + + + + + + ++ PCDHA2 PCDHA2 ++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + ++ sk + + + + + + + + + + + w Risk , + + ++ + + ++ 12345 51 85 78 log rank + + 107 Lo High Risk, + w Risk , + + + + + Endometrial: Lo High Risk , ++ + ++ + + + + + + + + + + + + + + + + + + 0 59 98 86 125 + + Number at ri 1 + + 10 1.00 0.75 0.50 0.25 0.00 + WT WT ++ mutant mutant 43 15 + PTEN PTEN ++ ++ PTEN PTEN ++ + ++ ++ + ++ + + 51 + 11 1 + + 27 + 10 ++ + w Risk, + + + + + ++ + + PCDHA1 + + Lo + + + ++

+ High Risk, + w Risk, = 0.002 ++ + + + + + + + + Years P ++ + Lo + + High Risk, + + + + + + 13 43 + 16 + 26 + + + + + ++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + sk + + + + + + + + + + + + + + ++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 26 24 + 35 + + + + 84 36 66 + 16 30 + + + + + + + + + ++ ++ + + + + log rank + + ++ + + + + + ++ + + + + + + Stomach: + + + ++ + + + ++ + + + + + + + ++ + + ++ + ++ + + ++ + + + + + 012345 + + 41 35 51 + + 61 90 21 36 + + ++ 108 ++ + Number at ri ++ ++ ++ PIK3CA WT WT ++ 1.00 0.75 0.50 0.25 0.00 ++ + +

++ + ++

+ + + + + + + + + + + + 47 49 + + + + + + 29 45 + + + + mutant mutan t + + +

+ + + + < 0.001 + ++ + + + + + + ++ + + + + P + + + + + ++ + Years + + + + + + + PCDHA1 PCDHA1 + + + + + + + 71 72 + + + + 40 58 + + + + + + + ++ PCDHA1 PCDHA1 + + + + + + + + + ++ + + + + + + + + + + + + + + + + + + + + + ++ + + isk + + + + + + + + w Risk, + + + + 94 53 69 + log rank + Lo 105 ++ High Risk, + w Risk, + + ++ + + Lo High Risk , + Endometrial: + ++ + ++ + + + ++ + + + + + + + + + 012345 56 74 + 128 110 + + + + Number at r + + + + + + + + + + + + + + 51 62 19 + + + + ++ 20 + + + + + + + + + + 1.00 0.75 0.50 0.25 0.00 WT WT + + + + + + + + + + mutant mutant + + + ++ + + + + + + + + + + + + + + + + + + + + + 76 92 27 + + 26 + + + + + ++ + + + + + MUC4 + PIK3CA PIK3CA + +

+ + + + + + + + + PIK3CA PIK3CA + + + + ++ + + + + + + + + + + + + ++ 31 + + 30 + 107 121 + + + + + w Risk, + + + + = 0.003 + + + + + + + + Lo + + P + High Risk, w Risk, + Years + + + + + + + + + Lo + + + High Risk, + 36 + + + 34 + + 130 155 + ++ + + lear cell: + + + + + + + + + + + + + + + + + + ++ ++ + + + + + + isk + + + + + ++ + + 29 + 27 + + + 27 + + 13 + + + + + + + + ++ + + + ++ + + 43 + + + 38 + log rank + 163 190 + ++ ++ + + + + ++ + + + + + + + + + + + + + + + ++ + + + + + + ++ + + + ++ ++ + + Renal c + 44 40 + 31 + + 18 + + ++ + ++ + + + + 012345 + + + + 52 + + 41 210 219 + Number at r ++ ++ ++ + + + ++ + ++ + PCDHA1 + ++ + 1.00 0.75 0.50 0.25 0.00 + + + + + + + +

+ 54 + 60 + + 34 + + 22 + + + + + ++ + + + WT WT = 0.003 + + + ++ + + ++ + + + + + + + P + + + ++ + + mutant mutant + Years + ++ + + + + + + ++ + + + + + 80 + + 85 + MUC4 MUC4 45 + 31 + + + + + + + ++ + + + + + + + + + + + + + + + MUC4 MUC4 + + + + + + + + + + + + + + + + + ++ + + isk ++ + + + + + + + + + ++ w Risk, 57 log rank + 42 + 116 106 + + + Lo High Risk, + w Risk, + + ++ + ++ + Lo High Risk, + Endometrial: + + + + + + + + + + + + + + + + 012345 67 47 137 117 + + Number at r + + 1 + + 21 20 ++ 1.00 0.75 0.50 0.25 0.00 WT WT ++ ++ + + mutant mutant + + + ++ 29 26 ++ PCDHA1 PCDHA1 ++ + PCDHA1 PCDHA1 53 ++ ++ + + 37 29 ++ = 0.040 + PCDHA1 w Risk,

P ++ + + + + + Lo + + High Risk, + + w Risk, + ++ Years + + + + + ++ + + 5200 ++ + Lo 11 High Risk, + + 49 49 + + + + ++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + ++ + + + + + + + + isk + + + + + + + + + 21 + 20 + + + + + + + + + + + log rank + + + + + + 12345 + + + + + + + + + + + + 23 19 + + + + + 91 + 105 + + Bladder: ++ + + ++ + + + + + + + + + ++ + ++ 30 + + 25 + + ++ 0 31 23 ++ 134 120 Number at r +

++ ++

++ + 320 411 1.00 0.75 0.50 0.25 0.00 WT WT

+ 36 30

y + probabilit survival Overall + PCDHA1 = 0.016 mutant mutant

+ + ++ + + P ++ + + + + + + Years ++ + + + + + + + + + + 6 + 51 47 10 + + PCDHA1 PCDHA1 + ++ + + + + + + + + + + + + + + + + + + + + + PCDHA1 PCDHA1 + + + + + + + + + + + + + + + + + + + isk + + + + ++ + + + + + + + + + + + + + + + + + + + + + + + + + + 92 w Risk, + + 17 25 + + log rank 104 + + Bladder: + Lo High Risk, w Risk, + + + + + + Lo + + High Risk, + + + + + ++ 012345 20 34 122 132 Number at r 1.00 0.75 0.50 0.25 0.00 WT WT

mutant mutant y probabilit survival Overall PCDHA1 PCDHA1

PCDHA1 PCDHA1

w Risk, signature 1 using a signature or high-risk low groups median-stratifed into were Patients and common genetic mutations. risks gene signatures as determined by Relationship between patients’ Lo a b High Risk, w Risk, Lo High Risk, 4 Fig. 2 is a marker of poor Signature 1 genes. of signature mean expression high-risk patients had a lower 1 is a marker of good prognosis, panels). Since signature 2 (red panels) and b signature (green Kaplan–Meier survival 2 genes. 1 or 2 on overall of signature of somatic mutations with signatures plots depict hence high-risk patients had a higher mean expression combined relation prognosis, the log-rank test from calculated P values were in cancer patients. Chang et al. Cancer Commun (2019) 39:23 Page 10 of 14

CCND1, CCNE1, and CCNE2) and cyclin-dependent and PTEN mutations (P = 0.001) were associated with good outcomes in uterine corpus endometrial carcinoma kinases (CDK1, CDK2, CDK4, CDK6, CDK7, and CDK8), (Fig. 4). Mutations in another cadherin gene PCDHA2 which were consistent across all liver and pancreatic can- when considered alongside signature 2 were also associ- cer cohorts (Fig. 5d). Tis implied that KDM8 is required ated with survival in gastric adenocarcinoma (P < 0.001; for tight control of the cell cycle machinery and its reduc- Fig. 4). Survival rates were reduced by ~ 37% in high-risk tion may lead to aberrant proliferation commonly seen in patients with mutant PCDHA2 (Fig. 4). Joint relation cancer cells. between TP53 mutations and signature 1 signifcantly To ascertain the biological consequences of deregulated infuenced survival in uterine corpus endometrial car- KDM8 expression, we conducted diferential expres- sion analysis on liver cancer patients categorized into cinoma (P = 0.002; Fig. 4). Since MUC4 mutations were associated with good outcomes, survival rates were the KDM8-low and -high groups. A total of 745 genes were lowest in high-risk patients (high signature 2 scores) with diferentially expressed (DEGs) between the two groups (fold change > 2 or < 2, P < 0.05) (Additional fle 8). Sig- wild-type MUC4 (P = 0.003; Fig. 4). − nifcant enrichments of biological pathways involved Tumor suppressive roles of KDM8 through cell cycle in metabolism, immune regulation, VEGF production, regulation and cell adhesion maintenance infammation, and cell adhesion were observed (Fig. 5f Of all the signature genes, KDM8 was identifed as one and Additional fle 9). Furthermore, DEGs were overex- of the most down-regulated genes in tumors (Fig. 5a). pressed as targets of HNF4A, HNF4G, FOXA1, FOXA2, Patients with high KDM8 levels had a signifcantly lower and NR2F2 transcription factors (TFs) (Fig. 5g). Tese risk of death in pancreatic and liver cancer cohorts TFs play central roles in cell polarity maintenance and (Fig. 5e). Prognostic signifcance of KDM8 was also epithelial diferentiation [26, 34, 35], hence down-regula- independent of tumor stage (Fig. 5e). KDM8 expression tion of KDM8 may drive epithelial–mesenchymal transi- decreased as tumor malignant grade increased in that tion (EMT) and tumor progression. HNF4A is a key TF stage 1 tumors had the highest median KDM8 values responsible for regulating a myriad of hepatic functions (Fig. 5b). Moreover, KDM8 expression was negatively including cell junction assembly [26, 36]. Pathway anal- correlated with hypoxia score, indicating that patients ysis of KDM8 DEGs revealed enrichment of processes with low levels of KDM8 had more hypoxic tumors and related to cell adhesion, suggesting potential crosstalk poorer survival outcomes (Fig. 5c). Together, these obser- between KDM8 and HNF4A. Of the 745 DEGs, analysis vations suggest that KDM8 may function as a tumor on a hepatoma-based HNF4A chromatin immunoprecip- suppressor. Tis hypothesis is corroborated by an inde- itation-sequencing dataset demonstrated that 148 genes pendent report on the role of KDM8 in cell cycle regula- were directly bound by HNF4A [37]. To further reinforce tion [33]. Indeed, we observed that KDM8 expression was the interplay between HNF4A and KDM8, we observed negatively correlated with the expression levels of canon- that 110 of the 745 DEGs were overrepresented in ical cell cycle genes: cyclins (CCNA2, CCNB1, CCNB2, HNF4A-null mice [26] and 45 of these genes were direct

(See fgure on next page.) Fig. 5 Putative tumor suppressive functions of KDM8 occur through processes related to cell cycle regulation and cell adhesion. a Expression of KDM8 was signifcantly lower in tumor (T) samples than in non-tumor (NT) samples in liver and pancreatic cancer cohorts. Mann–Whitney– Wilcoxon tests were used to compare T and NT samples. Asterisks represent signifcant P values: *** < 0.0001. b Expression levels of KDM8 decreased with disease progression and malignant grade in liver and pancreatic cancer cohorts. c Signifcant negative correlation between patients’ KDM8 expression and tumor hypoxia (hypoxia score) in liver and pancreatic cancer cohorts. d Correlation between KDM8 expression and canonical cell cycle regulators in patients with liver or pancreatic cancers. A majority of genes involved in cell-cycle regulation are negatively correlated with KDM8 expression. Liver #1 LIHC cohort; Liver #2 LIRI-JP cohort; and Liver #3 GSE14520 cohort (Additional fle 1). e Kaplan–Meier analysis of patients = = = stratifed by KDM8 expression. Patients were median-dichotomized into low- and high-expression groups. Patients with low KDM8 expression had signifcantly shorter overall survival. This was consistent in patients analyzed as a full cohort or sub-categorized according to TNM stage. Liver #1 LIHC cohort; Liver #2 LIRI-JP cohort; and Liver #3 GSE14520 cohort (Additional fle 1). f Patients were median-stratifed according to KDM8 = = = expression. Diferential expression analysis between KDM8-high- and -low groups in liver cancer cohorts revealed 745 diferentially expressed genes (DEGs; fold-change > 2 or < 2). Enrichment of biological pathways associated with DEGs, which include processes related to cell adhesion, − infammation, metabolism, and signal transduction pathways in cancer. g Enrichment of transcription factors (TFs) from the ENCODE database that are potential regulators of KDM8 DEGs. These TFs were predicted to bind near KDM8 DEGs. h Venn diagram depicts the overlap between HNF4A targets (as identifed by ENCODE chromatin-immunoprecipitation sequencing dataset) and genes afected by HNF4A loss-of-function (as identifed in HNF4A-null mice). Of the 745 DEGs, 148 were identifed as direct HNF4A targets, and 110 genes were afected by HNF4A loss-of-function. In the Venn intersection, 45 genes were both HNF4A targets and altered in HNF4A-null mice. i Scatter plot depicts expression patterns of 110 genes afected by HNF4A loss-of-function. Gene names of the 45 HNF4A targets are annotated on the plot. A majority of KDM8-associated genes were down-regulated in the HNF4A-null mice Chang et al. Cancer Commun (2019) 39:23 Page 11 of 14

d a Pancreas Liver #1 Liver #2 Liver #3 c Pancreas Liver #1 CCND2 13 0.4 8 13 CDK10 6 P < 0.0001 *** *** *** 7 *** 8 CDK9 11 11 0.2 4 CCNA1 7 6 7 CCNC 9 9 2 KDM8 CDK7 0 6 5 6 CDK1 7 7 0 4 CCNB1 0.2 Expression (Log2) 5 5 CCNB2 NT TNT T NT TNT T 910118.5 9.5 10.5 CCNE1 0.4 CDK2

xpression (log2) Liver #2 Liver #3 CCNA2 Pancreas Liver #1 Liver #2 Liver #3 6 7 CCNE2 b P < 0.0001 P < 0.0001 13 5.5 CCND1 KDM8 e 6 4 4 CDK4 8 CDK8 11 5.0 3 5 CDK6 2 7 CCND3 9 2 4.5 4

KDM8 CDK5 0 P GSE14520 LIHC LIRI-JP 6 1 4.0 3 AAD 7 345678 Expression (Log2) 0 5 3.5 Hypoxia score s1 s2 s3 s1 s2 s3 s4 s1 s2 s3 s4 s1 s2 s3 s4

e Pancreas Liver #1 Liver #1: S1 & 2 Liver #1: S3 & 4 Liver #2 1.00 ++ 1.00 +++ 1.00 ++ 1.00 1.00 + + ++++++ ++++++++++ +++++++ + ++ ++++++++ ++++++++++++++++++++++++++++++++ ++++++++++++++++++++++++++++++++ + ++ ++ ++++++++++ +++++++ +++ +++ + ++ +++ +++++++++++++++++ ++++ ++++++++ + +++++ +++ 0.75 ++++ 0.75 +++++++ +++ 0.75 +++++++++ 0.75 + 0.75 ++++++++++ ++++++ ++ +++++ ++++ +++++++++++ +++ ++++ ++ ++++++++ ++++ ++ +++ ++ + +++ +++++++++ ++ + +++++ +++ ++++++++ + +++ ++++++ 0.50 0.50 +++++ 0.50 0.50 ++++++++ 0.50 + ++ ++ + + + +++++ + 0.25 ++ 0.25 0.25 0.25 0.25 Overall survival probabilit y log rank P = 0.016 log rank P = 0.003 log rank P = 0.040 log rank P = 0.032 log rank P = 0.026 0.00 0.00 0.00 0.00 0.00 012345 012345 012345 012345 012345 Years Years Years Years Years Number at risk Number at risk Number at risk Number at risk Number at risk Low exp. 67 43 12 732 Low exp. 147 101 48 35 24 16 Low exp. 105 82 40 31 19 13 Low exp. 39 17 9664 Low exp. 111 92 54 24 50 High exp. 80 58 20 10 54High exp. 144 117 71 44 34 22 High exp. 108 88 54 34 28 18 High exp. 39 31 16 853 High exp. 111 100 65 32 11 2

Liver #2: S1 & 2 Liver #2: S2 & 3 Liver #3 Liver #3: S1 & 2 Liver #3: S2 1.00 + 1.00 + 1.00 + 1.00 +++ 1.00 + ++ ++++++++++++++++++++++++++++++++ ++++++ ++++ ++ ++ + + + y ++ ++++++++++++ ++ + +++ ++++++++++++++++++ + + + + ++++ +++ ++++ + ++ + 0.75 +++++++ 0.75 ++ 0.75 ++ 0.75 ++++++++++++++++++++ 0.75 ++ + +++++ ++++++++++++++ ++++++ ++ ++ + ++++++++++++ + ++ ++++++++++++++++++++ + ++++++ + +++ ++++++++ ++++++++++++ 0.50 0.50 0.50 ++++ 0.50 0.50 +++ ++++ ++++++++ ++ 0.25 0.25 0.25 0.25 0.25

Overall survival probabilit log rank P < 0.001 log rank P = 0.047 log rank P = 0.039 log rank P = 0.012 log rank P = 0.004 0.00 0.00 0.00 0.00 0.00 012345 012345 012345 012345 012345 Years Years Years Years Years Number at risk Number at risk Number at risk Number at risk Number at risk Low exp. 68 63 36 15 20Low exp. 84 70 40 19 40 Low exp. 112 88 69 61 55 18 Low exp. 87 71 60 54 50 17 Low exp. 39 31 24 20 18 6 High exp. 68 67 45 21 62High exp. 85 77 48 26 11 2 High exp. 113 100 87 78 71 23 High exp. 87 82 77 70 63 23 High exp. 39 36 32 28 23 9

f I Biological pathways g ENCODE TFs HNF4A null mice Regulation of lipoprotein metabolic process P-value Regulation of fibrinolysis HNF4G_HepG2 0.00015 8 FOXA2_HepG2 Regulation of immune system process F13B NADP metabolic process 0.00010 GPD1 HNF4A_HepG2 SLC38A3 Regulation of triglyceride catabolic process PROZ AGXXT PIPOX NR2F2_HepG2 0.00005 LCCAT Regulation of fatty acid biosynthetic process CPN2 APOC2APOOCC2 Phospholipid homeostasis FOXA1_HepG2 HPXPXPX SARDH 6 LEAP2LEA TFRTTF 2 SLC47A1 Complement activation, lectin pathway 2481632 AGMGMAT ITIH1 SLC27A5 HP

alue HFE2 Retinoic acid metabolic process Fold enrichment MASP1MA 1 VTN Acyl−coa metabolic process ACOX2 AASS TTR Tyrosine metabolism IYD SERPINC1C1 SERPINF1F1 C3 Chemical carcinogenesis 4 MAT1A ENTPD8PD Glyoxylate metabolic process ITIH3

−log10 P−v APPOA1 Sterol metabolic process CREB3L3 P-value h Positive regulation of VEGF production 0.04 SERPINA11 F1F 1 CGNL1 FTCD PPAR signaling pathway DAO 103 45 65 SLC22A7 Glycolysis / gluconeogenesis 0.03 2

Fatty acid biosynthetic process 0.02 Platelet degranulation Cell adhesion 0.01 Inflammatory response HNF4A HNF4A 24816 targets loss-of-function −8 −6 −4 −2 02 Fold enrichment (148 genes) (110 genes) Fold change (log2) Chang et al. Cancer Commun (2019) 39:23 Page 12 of 14

HNF4A targets (Fig. 5h). Diferential expression analysis prognostic information in pancreatic cancer, whereas between HNF4A-defcient and wild-type mice showed those studies employed very diferent approaches for that a majority of the 45 genes were down-regulated, as gene signature discovery. expected, suggesting that HNF4A directly activates their KDM8 is a gene associated with favorable prognosis. gene expression, many of which are involved in a multi- Our results suggest determinative crosstalk between tude of cell adhesion processes (Fig. 5i). KDM8 and HNF4A, particularly in the context of mor- phogenesis, cell adhesion, maintenance of cell polar- Discussion ity, and epithelial formation. Moreover, 5-year survival Te present multi-cohort retrospective study identi- rates dropped to ~ 12% in bladder cancer patients with fed two novel pan-cancer prognostic gene signatures low expression of signature 1 genes (high-risk), which derived from oxygen-sensing genes. Cross-platform included KDM8 and PCDHA1 mutations. Additive validations confrmed prognosis in 10 cancer types to efects conferred by mutations in this cell-adhesion collectively include 6761 patients spanning 20 diverse supports the hypothesis that KDM8 is likely a cohorts (Fig. 1a). Te gene signatures had opposing prog- tumor suppressor and down-regulation of this gene may nostic values: signature 1 is a marker of good prognosis, lead to a loss of epithelial phenotype and cell adhesion whereas signature 2 is associated with poor outcomes. to promote cancer invasion. Additionally, loss of KDM8 Te key strengths of our signatures as powerful prognos- expression is correlated with increased expression of tic tools are (1) pan-cancer utility, (2) involvement of a cell cycle genes that may contribute to deranged cell mere 5 genes each that provide continuous assessment of cycle regulation and tumor progression (Fig. 5d). Like death risks, and (3) superiority over current TNM stag- KDM6B, the function of KDM8 appears to be cell type- ing. Our results suggest that dysregulated oxygen sens- dependent. While KDM8 expression is down-regulated ing in diverse cancer types may activate other oncogenic in liver and pancreatic tumor samples compared to adja- pathways such as the loss of cell polarity and cell cycle cent non-tumor samples (Fig. 5a), it is overexpressed in regulation, which collectively infuenced clinical out- breast cancer to induce EMT and invasion [47]. None- comes in patients. theless, KDM8 roles are not limited to cell cycle regu- Anti-tumorigenic functions have been reported for lation. KDM8 exerts tumor suppressive functions in several genes from signature 1. Loss of KDM6B resulted hematopoietic cancer by mediating DNA repair [48]. in more aggressive pancreatic ductal adenocarcinoma Collectively, imbalance in the Jumonji-C subfamily of [38]. In colorectal cancer, high KDM6B expression pre- lysine demethylases such as KDM8 and KDM6B is likely dicted good prognosis, and knock-down of KDM6B was to result in broad-ranging but cell type-specifc biologi- associated with augmented cell proliferation and inhib- cal efects. ited apoptosis [39]. Yet, KDM6B function is enigmatic. Others have reported that high KDM6B expression is Conclusions associated with increased metastasis and invasion of renal clear cell carcinoma [40]. KDM6B also promotes Overall, our gene signatures would enhance decision TGF-β-induced EMT and invasiveness in breast cancer making in clinic by stratifying patients according to [41]. While we could neither confrm nor deny the valid- their tumor biology. Tis may maximize treatment ef- ity of these studies, it is striking that our observation of cacy and prolong lifespan by directing resources to those favorable prognosis associated with high KDM6B expres- most in need. Tis technology may be incorporated into sion in pancreatic ductal adenocarcinoma was consist- existing diagnostic pathways to achieve a more individu- ent with the report from Yamamoto et al. [38]. We also alized standard of care by revealing molecular changes did not observe any prognostic signifcance of signature that allow further discrimination of otherwise simi- 1, which includes KDM6B, in either breast or renal clear larly staged tumors. As each signature only consists of cell cancer, which indirectly substantiates fndings from 5 genes, we anticipate that they can be implemented two other reports on KDM6B not being a marker of good immediately, even in modestly sized centers that are prognosis [40, 41]. Several other gene signatures have using PCR-based technology. Consequently, therapeu- been reported for gastrointestinal cancers [42–46]. Inter- tic options can be allocated more decisively based upon estingly, there is no overlap between our signature genes this additional personalized information to ensure that and those identifed in these studies. Tis is perhaps not patients with the most aggressive cancers get the most surprising since our signatures were identifed based on robust treatments. Chang et al. Cancer Commun (2019) 39:23 Page 13 of 14

Additional fles Authors’ contributions AGL had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Study concept Additional fle 1. Cancer cohort descriptions. and design: WHC and AGL. Data acquisition and analysis: WHC and AGL. Data interpretation: all authors. Drafting the manuscript: WHC and AGL. Critical revi- Additional fle 2. List of 61 2OG-dependent oxygenases. sion of the manuscript for important intellectual content: all authors. Supervi- Additional fle 3. Univariate and multivariate Cox proportional hazards sion: AGL. All authors read and approved the fnal manuscript. analysis of risk factors associated with overall survival in multiple cancers. Univariate values of TNM stage were in accordance with our previous Acknowledgements report utilizing TCGA datasets [4]. Not applicable. Additional fle 4. Additional tumor subgroup analyses and evaluation of prognosis predictive performance of gene signatures across difer- Competing interests ent malignant grades. Kaplan–Meier plots show independence of (A) The authors declare that they have no competing interests. signature 1 (green panels) and (B) signature 2 (red panels) over current TNM staging system in predicting prognosis in diferent cancer cohorts. Availability of data and materials Patients were sub-grouped according to TNM stages and further stratifed The datasets supporting the conclusions of this article are included within the using either signature 1 or signature 2 scores. Both signatures successfully article and its additional fles. identifed high-risk patients in diferent TNM stages. P values were calcu- lated from the log-rank test. Analysis of specifcity and sensitivity of (C) Consent for publication signature 1 (green panels) and (D) signature 2 (red panels) in predicting Not applicable. prognosis in diferent cancer cohorts using receiver operating character- istic (ROC) curves. Plots depict comparison of ROC curves of signature 1 Ethics approval and consent to participate or 2 and clinical TNM staging. Both signatures demonstrated incremental Not applicable. values over current TNM staging system. AUC: area under the curve. TNM: tumor, node, metastasis staging. Liver #1 LIHC cohort; Liver #2 LIRI-JP Funding cohort and Liver #3 GSE14520 cohort (Additional= fle 1). = Not applicable. = Additional fle 5. Distribution of expression of signature genes in low- Received: 7 March 2019 Accepted: 19 April 2019 and high-risk patients. (A) signature 1 (green panels) and (B) signature 2 (red panels). Patients were median-stratifed into low- and high-risk groups based on mean expression scores of signature genes. Box plots depict expression distribution of each of the 5 genes in both signatures in these two patient groups. (A) Since signature 1 is a marker of good References prognosis, high-risk patients show signifcantly lower expression of 1. Zeng W, Liu P, Pan W, Singh SR, Wei Y. Hypoxia and hypoxia inducible fac- individual signature genes. (B) In contrast, signature 2 is a marker of poor tors in tumor metabolism. Cancer Lett. 2015;356:263–7. prognosis, hence high-risk patients show signifcantly higher expression of 2. Denko NC. Hypoxia, HIF1 and glucose metabolism in the solid tumour. individual signature genes. Nonparametric Mann–Whitney–Wilcoxon tests Nat Rev Cancer. 2008;8:705. were used to compare low- and high-risk patients. Asterisks represent 3. Nagy JA, Chang SH, Dvorak AM, Dvorak HF. Why are tumour blood vessels signifcant P values: * < 0.01, ** < 0.001 and *** < 0.0001. LR low risk. abnormal and why is it important to know? Br J Cancer. 2009;100:865. HR high risk. = = 4. Chang WH, Forde D, Lai AG. A novel signature derived from immu- Additional fle 6. Correlation of patients’ risk scores derived from signa- noregulatory and hypoxia genes predicts prognosis in liver and fve other ture 1 with tumor hypoxia. (A) Signifcant negative correlation between cancers. J Transl Med. 2019;17:14. signature 1 expression scores and tumor hypoxia. (B) Signifcant positive 5. Semenza GL, Wang GL. A nuclear factor induced by hypoxia via de novo correlation between signature 1 risk scores and tumor hypoxia. Calcula- protein synthesis binds to the human erythropoietin gene enhancer at a tions of expression scores, risk scores, and hypoxia scores are explained in site required for transcriptional activation. Mol Cell Biol. 1992;12:5447–54. the methods. Liver #1 LIHC cohort and Liver #2 LIRI-JP cohort. 6. Ratclife PJ. Oxygen sensing and hypoxia signalling pathways in animals: = = the implications of physiology for cancer. J Physiol. 2013;591:2027–42. Additional fle 7. Correlation of patients’ risk scores derived from signa- 7. Semenza GL. Evaluation of HIF-1 inhibitors as anticancer agents. Drug ture 2 with tumor hypoxia. (A) Signifcant positive correlation between Discov Today. 2007;12:853–9. signature 2 expression scores and tumor hypoxia. (B) Signifcant positive 8. Melillo G. Inhibiting hypoxia-inducible factor 1 for cancer therapy. Mol correlation between signature 2 risk scores and tumor hypoxia. Calcula- Cancer Res. 2006;4:601–5. tions of expression scores, risk scores and hypoxia scores are explained in 9. Ploumakis A, Coleman ML. OH, the places you’ll go! hydroxylation, gene the methods. Liver #2 LIRI-JP cohort and Liver #3 GSE14520 cohort. = = expression, and cancer. Mol Cell. 2015;58:729–41. Additional fle 8. Diferentially expressed genes between KDM8-high and 10. Ko M, An J, Pastor WA, Koralov SB, Rajewsky K, Rao A. TET proteins and -low groups in the liver cancer cohort (LIHC). 5-methylcytosine oxidation in hematological cancers. Immunol Rev. 2015;263:6–21. Additional fle 9. Signifcantly enriched biological pathways of diferen- 11. Huang Y, Rao A. Connections between TET proteins and aberrant DNA tially expressed genes. modifcation in cancer. Trends Genet. 2014;30:464–74. 12. Dalgliesh GL, Furge K, Greenman C, Chen L, Bignell G, Butler A, et al. Abbreviations Systematic sequencing of renal carcinoma reveals inactivation of HIF: hypoxia inducible factor; 2OG: 2-oxoglutarate; OS: overall survival; KDM: modifying genes. Nature. 2010;463:360. Jumonji-C domain-containing lysine demethylase; TCGA​: The Cancer Genome 13. Van Haaften G, Dalgliesh GL, Davies H, Chen L, Bignell G, Greenman C, Atlas; ICGC:​ International Cancer Genome Consortium; GEO: Gene Expression et al. Somatic mutations of the histone H3K27 demethylase gene UTX in Omnibus; RSEM: RNA-seq by expectation maximization; HR: hazard ratio; CI: human cancer. Nat Genet. 2009;41:521. confdence interval; ROC: receiver operating characteristic; AUC​: area under 14. Weinstein JN, Collisson EA, Mills GB, Shaw KRM, Ozenberger BA, Ellrott K, the curve; KEGG: Kyoto Encyclopedia of Genes and Genomes; GO: gene et al. The cancer genome atlas pan-cancer analysis project. Nat Genet. ontology; BCLC: Barcelona Clinic Liver Cancer; TNM: tumor, node, metastasis; 2013;45:1113. EMT: epithelial–mesenchymal transition; DEG: diferentially expressed gene; TF: 15. Zhang J, Baran J, Cros A, Guberman JM, Haider S, Hsu J, Liang Y, Rivkin transcription factor; AFP: alpha-fetoprotein. E, Wang J, Whitty B, Wong-Erasmus M, Yao L, Kasprzyk A. International cancer genome consortium data portal—a one-stop shop for cancer Chang et al. Cancer Commun (2019) 39:23 Page 14 of 14

genomics data. Database. 2011;2011:bar026. https​://doi.org/10.1093/ 34. Zhang C, Han Y, Huang H, Qu L, Shou C. High NR2F2 transcript level is datab​ase/bar02​6. associated with increased survival and its expression inhibits TGF-β- 16. Li B, Dewey CN. RSEM: accurate transcript quantifcation from RNA-Seq dependent epithelial–mesenchymal transition in breast cancer. Breast data with or without a reference genome. BMC Bioinform. 2011;12:323. Cancer Res Treat. 2014;147:265–81. 17. Roessler S, Jia HL, Budhu A, Forgues M, Ye QH, Lee JS, et al. A unique 35. Song Y, Washington MK, Crawford HC. Loss of FOXA1/2 is essential for the metastasis gene signature enables prediction of tumor relapse in early- epithelial-to-mesenchymal transition in pancreatic cancer. Cancer Res. stage hepatocellular carcinoma patients. Cancer Res. 2010;70:10202–12. 2010;70:2115–25. 18. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers dif- 36. Odom DT, Zizlsperger N, Gordon DB, Bell GW, Rinaldi NJ, Murray HL, et al. ferential expression analyses for RNA-sequencing and microarray studies. Control of pancreas and liver gene expression by HNF transcription fac- Nucleic Acids Res. 2015;43:e47. tors. Science. 2004;303:1378–81. 19. Bufa FM, Harris AL, West CM, Miller CJ. Large meta-analysis of multiple 37. Consortium EP. An integrated encyclopedia of DNA elements in the cancers reveals a common, compact and highly prognostic hypoxia . Nature. 2012;489:57. metagene. Br J Cancer. 2010;102:428–35. https​://doi.org/10.1038/ 38. Yamamoto K, Tateishi K, Kudo Y, Sato T, Yamamoto S, Miyabayashi K, sj.bjc.66054​50. et al. Loss of histone demethylase KDM6B enhances aggressiveness of 20. Therneau TM. A package for survival analysis in S. 2015. https​://cran.r- pancreatic cancer through downregulation of C/EBPα. Carcinogenesis. proje​ct.org/packa​ge survi​val. 2014;35:2404–14. 21. Kassambara A, Kosinski= M. survminer: drawing survival curves using 39. Tokunaga R, Sakamoto Y, Nakagawa S, Miyake K, Izumi D, Kosumi K, et al. “ggplot2”. 2018. https​://cran.r-proje​ct.org/packa​ge survm​iner. The prognostic signifcance of histone lysine demethylase JMJD3/KDM6B 22. Schroeder M, Culhane A, Quackenbush J, Haibe-Kains= B. Survcomp: an R/ in colorectal cancer. Ann Surg Oncol. 2016;23:678–85. bioconductor package for performance assessment and comparison of 40. Li Q, Hou L, Ding G, Li Y, Wang J, Qian B, et al. KDM6B induces epithelial– survival models. Bioinformatics. 2011;27(22):3206–8. mesenchymal transition and enhances clear cell renal cell carcinoma 23. Tabas-Madrid D, Nogales-Cadenas R, Pascual-Montano A. GeneCodis3: metastasis through the activation of SLUG. Int J Clin Exp Pathol. a non-redundant and modular enrichment analysis tool for functional 2015;8:6334. genomics. Nucleic Acids Res. 2012;40:W478–83. 41. Ramadoss S, Chen X, Wang CY. Histone demethylase KDM6B promotes 24. Kuleshov MV, Jones MR, Rouillard AD, Fernandez NF, Duan Q, Wang Z, epithelial-mesenchymal transition. J Biol Chem. 2012;287:44508–17. et al. Enrichr: a comprehensive gene set enrichment analysis web server 42. Kukita Y, Ohkawa K, Takada R, Uehara H, Katayama K, Kato K. Selective 2016 update. Nucleic Acids Res. 2016;44:W90–7. identifcation of somatic mutations in pancreatic cancer cells through 25. Chen EY, Tan CM, Kou Y, Duan Q, Wang Z, Meirelles GV, et al. Enrichr: a combination of next-generation sequencing of plasma DNA using interactive and collaborative HTML5 gene list enrichment analysis tool. molecular barcodes and a bioinformatic variant flter. PLoS ONE. BMC Bioinform. 2013;14:128. 2018;13:e0192611. 26. Battle MA, Konopka G, Parviz F, Gaggl AL, Yang C, Sladek FM, et al. Hepato- 43. Takeuchi S, Doi M, Ikari N, Yamamoto M, Furukawa T. Mutations in BRCA1, cyte nuclear factor 4α orchestrates expression of cell adhesion proteins BRCA2, and PALB2, and a panel of 50 cancer-associated genes in pancre- during the epithelial transformation of the developing liver. Proc Natl atic ductal adenocarcinoma. Sci Rep. 2018;8:8105. Acad Sci USA. 2006;103:8419–24. 44. Peters MLB, Tseng JF, Miksad RA. Genetic testing in pancreatic ductal 27. Wickham H. ggplot2: elegant graphics for data analysis. New York: adenocarcinoma: implications for prevention and treatment. Clin Ther. Springer; 2016. https​://ggplo​t2.tidyv​erse.org/. 2016;38:1622–35. 28. R Core Team. R: A Language and Environment for Statistical Computing. 45. Li X, Xu W, Kang W, Wong SH, Wang M, Zhou Y, et al. Genomic analysis of Vienna: R Core Team; 2018. https​://www.r-proje​ct.org/. liver cancer unveils novel driver genes and distinct prognostic features. 29. Fujimoto A, Furuta M, Totoki Y, Tsunoda T, Kato M, Shiraishi Y, et al. Whole- Theranostics. 2018;8:1740. genome mutational landscape and characterization of noncoding and 46. Park J, Yoo HM, Jang W, Shin S, Kim M, Kim Y, et al. Distribution of somatic structural mutations in liver cancer. Nat Genet. 2016;48:500. mutations of cancer-related genes according to microsatellite instability 30. Vaupel P, Mayer A. Hypoxia in cancer: signifcance and impact on clinical status in Korean gastric cancer. Medicine. 2017;96(25):e7224. https​://doi. outcome. Cancer Metastasis Rev. 2007;26:225–39. org/10.1097/MD.00000​00000​00722​4. 31. Kim SM, Leem SH, Chu IS, Park YY, Kim SC, Kim SB, et al. Sixty-fve gene- 47. Zhao Z, Sun C, Li F, Han J, Li X, Song Z. Overexpression of histone dem- based risk score classifer predicts overall survival in hepatocellular ethylase JMJD5 promotes metastasis and indicates a poor prognosis in carcinoma. Hepatology. 2012;55:1443–52. breast cancer. Int J Clin Exp Pathol. 2015;8:10325. 32. King RJ, Yu F, Singh PK. Genomic alterations in mucins across cancers. 48. Suzuki T, Minehata K, Akagi K, Jenkins NA, Copeland NG. Tumor sup- Oncotarget. 2017;8:67152. pressor gene identifcation using retroviral insertional mutagenesis in 33. Wu B-H, Chen H, Cai C-M, Fang J-Z, Wu C-C, Huang L-Y, et al. Epigenetic Blm-defcient mice. EMBO J. 2006;25:3422–31. silencing of JMJD5 promotes the proliferation of hepatocellular carci- noma cells by down-regulating the transcription of CDKN1A. Oncotarget. 2016;7:6847.

Ready to submit your research ? Choose BMC and benefit from:

• fast, convenient online submission • thorough peer review by experienced researchers in your field • rapid publication on acceptance • support for research data, including large and complex data types • gold Open Access which fosters wider collaboration and increased citations • maximum visibility for your research: over 100M website views per year

At BMC, research is always in progress.

Learn more biomedcentral.com/submissions