427

Original Article Page 1 of 12

Genome-wide mutation profiling and related risk signature for prognosis of papillary renal cell carcinoma

Chuanjie Zhang1,2#, Yuxiao Zheng3#, Xiao Li3#, Xin Hu4, Feng Qi5, Jun Luo6

1Department of Urinary Surgery, Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai 200025, China; 2Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China; 3Department of Urology, Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research & Affiliated Cancer Hospital of Nanjing Medical University, Nanjing 210009, China; 4Department of Urology, the First Affiliated Hospital of Nanjing Medical University, Nanjing 210029, China; 5First Clinical Medical College of Nanjing Medical University, Nanjing 210029, China; 6Department of Urology, Shanghai Fourth People’s Hospital affiliated to Tongji University School of Medicine, Shanghai 200081, China Contributions: (I) Conception and design: J Luo, C Zhang; (II) Administrative support: Y Zheng; (III) Provision of study materials or patients: X Li; (IV) Collection and assembly of data: Y Zheng, X Li; (V) Data analysis and interpretation: X Hu; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors. #These authors contributed equally to this article. Correspondence to: Jun Luo. Department of Urology, Shanghai Fourth People’s Hospital Affiliated to Tongji University School of Medicine, No. 1878 North Sichuan Road, Hongkou District, Shanghai 200081, China. Email: [email protected].

Background: The papillary renal cell carcinoma (pRCC) is a rare subtype of renal cell carcinoma with limited investigation. Our study aimed to explore a robust signature to predict the prognosis of pRCC from the perspective of mutation profiles. Methods: In this study, we downloaded the simple nucleotide variation data of 288 pRCC samples from The Cancer Genome Atlas (TCGA) database. “GenVisR” package was utilized to visualize mutation profiles in pRCC. The PPI network was conducted based on the STRING database and the modification was performed via Cytoscape software (Version 3.7.1). Top 50 mutant were selected and Cox regression method was conducted to identify the hub prognostic mutant signature in pRCC using “survival” package. Mutation Related Signature (MRS) risk score was established by multivariate Cox regression method. Receiver Operating Characteristic (ROC) curve drawn by “timeROC” was conducted to assess the predictive accuracy of overall survival (OS) and Kaplan-Meier analysis was then performed. Relationships between mutants and expression levels were compared by Wilcox rank-sum test. Function enrichment pathway analysis for mutated genes was performed by “org.Hs.eg.db”, “clusterProfiler”, “ggplot2” and “enrichplot” packages. Gene Set Enrichment Analysis was exploited using the MRS as the phenotypes, which worked based on the JAVA platform. All statistical analyses were achieved by R software (version 3.5.2). P value <0.05 was considered to be significant. Results: The mutation landscape in waterfall plot revealed that a list of 49 genes that were mutated in more than 10 samples, of which 6 genes (TTN, MUC16, KMT2C, MET, OBSCN, LRP2) were mutated in more than 20 samples. Besides, non-synonymous was the most frequent mutation effect, and missense mutation was one of the most common mutation types in mutated genes across 248 samples. The AUC of MRS model consisted of 17 prognostic mutant signatures was 0.907 in 3-year OS prediction. Moreover, pRCC patients with high level of MRS showed the worse survival outcomes compared with that in low- level MRS group (P=0). In addition, correlation analysis indicated that 6 mutated genes (BAP1, OBSCN, NF2, SETD2, PBRM1, DNAH1) were significantly associated with corresponding expression levels. Last, functional enriched pathway analysis showed that these mutant genes were involved in multiple cancer- related crosstalk, including PI3K-AKT signaling pathway, JAK-STAT signaling pathway, extracellular matrix (ECM)-receptor interaction or cell cycle. Conclusions: In summary, our study was the first attempt to explore the mutation-related signature for predicting survival outcomes of pRCC based on the high-throughput data, which might provide valuable information for further uncovering the molecular pathogenesis in pRCC.

© Annals of Translational Medicine. All rights reserved. Ann Transl Med 2019;7(18):427 | http://dx.doi.org/10.21037/atm.2019.08.113 Page 2 of 12 Zhang et al. Prognosis of papillary renal cell carcinoma

Keywords: Simple nucleotide variation; prognosis; papillary renal cell carcinoma (pRCC); high-throughput data

Submitted May 19, 2019. Accepted for publication Aug 08, 2019. doi: 10.21037/atm.2019.08.113 View this article at: http://dx.doi.org/10.21037/atm.2019.08.113

Introduction wide mutation profiles were limited. In recent years, a new period for cancer genomics research Renal cell carcinoma (RCC), which is the second most was formed by the flying start of the high-throughput common malignancy of the urinary system, comprising technology. The Cancer Genome Atlas (TCGA) database, almost 2–3% among all human malignant neoplasms which is a pool of molecular data sets, consists cancer-causing (1-3). It is noticeable that RCC is a comprehensive term genomic alterations among various malignant tumors (17). In for a range of different malignancies with disparate genetic our research, we downloaded the simple nucleotide variation drivers and histology, which lead to distinctive clinical data of pRCC samples from TCGA database. By analysis we features and treatment responses (4,5). Accounting for screened for tumor mutations in pRCC, and its impact on 15–20% of RCC, papillary renal cell carcinoma (pRCC) the prognosis of patients with pRCC was investigated. consists of two main sub-types: type 1 appears as papillae and tubular structures covered by small basophilic cells with small oval nuclei; type 2 is featured by large eosinophilic Methods cells with large spherical nuclei (6,7). It was reported that Data acquisition and preparation the greater stage, more aggressive tumor behavior and worse prognosis were detected in type 2 pRCC (8,9). We obtained the mutation data of 288 pRCC samples from The biological manifestations of pRCC have TCGA database by utilizing GDC-client.exe software. heterogeneity. For instance, pRCC in some patients appears Since the raw SNP data were not publicly available, we as indolent, bilateral and multifocal lesions, while other downloaded the data type of “Masked Somatic Mutation”, patients have solitary lesions with stronger invasiveness which was processed based on the VarScan software (6,7,10). Heterogeneity of oncology might suggest that (18). Then, we utilized the Perl scripts in Figure S1 to multiple abnormal gene regulations and corresponding extract the all mutation information in pRCC (http:// signal pathway changes are existed during the tumorigenesis fp.amegroups.cn/cms/6c072fd73aee36134dd46761 and development of pRCC. However, limited research 0b311007/atm.2019.08.113-1.xls) and “GenVisR” package of pRCC gene abnormalities and regulation has resulted was used to achieve the visualization (19). Besides, count in great challenges in effective individualized treatment data of the transcriptome profiles were also downloaded options for advanced pRCC. For example, tyrosine from the data portal, including 289 pRCC samples and kinase inhibitors of VEGF pathways are significantly matched 32 normal samples. We exploited the “edgeR” inferior in pRCC patients compared to clear cell renal cell package to conduct the normalization process and the carcinoma patients (11). Therefore, further studies of the differentially expressed genes were shown by “ggplot2” in correlation between pRCC gene abnormalities and clinical volcano plot. Last, we obtained clinical data of all pRCC manifestations of patients are particularly important for samples, including age, gender, pathological stages, AJCC- future precision treatment of pRCC. TNM stages and survival outcomes with follow-up time. It was found that the gene mutation played an important We integrated the mutation data with clinical information role in tumorigenesis and development of malignancy in or transcriptome profiles using “merge” function in R. recent decades (12,13). Irrespective of non-familial cancer types, cancer is often regarded as a disease of somatic Construction of PPI network for mutation related genes mutations. Gene mutation locations and types are always the key factors determining gene function in biological We selected the top 200 mutation genes in Figure S1 to behavior of malignant tumors (14-16). In pRCC, major investigate the potential molecular interactions among studies were focused on the function of scattered mutant them. We utilized STRING database online (https://string- genes, whereas researches from the perspective of genome- db.org/) to generate the PPI network of mutant genes,

© Annals of Translational Medicine. All rights reserved. Ann Transl Med 2019;7(18):427 | http://dx.doi.org/10.21037/atm.2019.08.113 Annals of Translational Medicine, Vol 7, No 18 September 2019 Page 3 of 12

Table 1 17 prognostic mutation related genes from univariate Cox 3-year predictive value of risk score based on the hub mutant regression analysis genes by Receiver Operating Characteristic (ROC) curve Gene HR Z P value (22). A Kaplan-Meier plot was drawn to compare the survival NEB 1.535890195 6.119890639 9.36E-10 difference between the two groups and P value of the log- rank test was calculated. ROC curve and survival plots were HMCN1 1.434927891 3.510041483 0.000448 generated by “survival” packages. NF2 0.628545732 −3.033256451 0.0024193

USH2A 1.435435343 2.961594838 0.0030605 Survival analysis, correlation analysis of mutants with LRP2 0.875104739 −2.880651139 0.0039685 expression levels WDFY3 0.531424334 −2.669286162 0.0076013 We selected 17 prognostic mutant genes from the Cox SETD2 0.463893541 −2.650834699 0.0080293 regression results to conduct the Kaplan-Meier analysis LRBA 0.516620964 −2.637850365 0.0083433 (Table 1). We utilized the “for cycle” R script to perform the BAP1 0.566327974 −2.580426232 0.0098678 batch survival analysis of the 17 genes via “survival” package. In addition, we merged the mutation data with the expression levels CUBN 0.847839001 −2.549441457 0.0107896 of genes. The Wilcoxon rank-sum test was used to compare the DNAH1 0.670617495 −2.495101812 0.0125921 differential expression levels between wild type and mutants. A KIAA1109 0.626317538 −2.489827775 0.0127805 P value of log-rank test <0.05 was thought to be significant.

HELZ2 1.82101297 2.44887956 0.0143301

MUC16 1.115685673 2.130147912 0.0331594 Functional analysis and Gene Set Enrichment Analysis

PBRM1 0.572329647 −2.099277614 0.0357924 (GSEA)

DNAH8 1.237585127 2.098534296 0.035858 To further uncover the underlying biological pathways that these mutation genes might be associated with, we selected OBSCN 1.32128618 2.07721441 0.0377818 the top 300 mutation genes to conduct the enrichment pathway analysis. Firstly, we used the “org.Hs.eg.db” package to transfer the gene symbols to entrezIDs for where the confidence score was set with 0.7 as the cutoff subsequent analysis. Then, “clusterProfiler”, “ggplot2” and criteria and the disconnected nodes were hidden (20). “enrichplot” packages were utilized to find the enriched Furthermore, we downloaded the interaction data with tsv (GO) items (23), where FDR <0.05 was format to modify the PPI network in Cytoscape software statistically significant. We described the enrichment results (Version 3.7.1) based on JAVA platform (21). from three aspects, including biological process (BP), cellular component (CC), and molecular function (MF). Identification of hub prognostic mutant genes What is more, we further identified hub Kyoto Gene and Genome Encyclopedia (KEGG) pathways with P<0.05 In order to identify hub mutant genes for prognosis of shown by dot plot (24). pRCC, we selected the top 50 mutant genes from Figure S1 with corresponding transcriptome data. We conducted the univariate Cox regression analysis to find the survival- Statistical analysis associated mutant genes with P<0.05. Then, multivariate We used “edgeR” to do the normalization and differential Cox regression model was performed to get the respective analysis. Univariate- and multivariate Cox regression analysis coefficients (βi) and construct the risk score with Ʃ (βi * Expi), was performed based on “survival” packages. The Student’s t where Expi represented the expression data of significant test was mainly used for continuous variables, while categorical mutant genes from the univariate Cox analysis. Subsequently, variables were estimated by Chi-square test. The Wilcoxon we calculated the risk scores for each pRCC samples and rank-sum test was a non-parametric test mainly probable for classified the 288 pRCC patients into low- and high-groups comparing the differential distribution between two groups. (http://fp.amegroups.cn/cms/475325fb380311b80e51ecca25 Statistical analysis was achieved by R software (version 3.5.2). 743a55/atm.2019.08.113-2.xls). Furthermore, we assessed the A P value <0.05 was thought to be significant.

© Annals of Translational Medicine. All rights reserved. Ann Transl Med 2019;7(18):427 | http://dx.doi.org/10.21037/atm.2019.08.113 Page 4 of 12 Zhang et al. Prognosis of papillary renal cell carcinoma

Table 2 Clinical baseline of patients included in study from TCGA- other clinical variables were shown in Table 2. We finally KIRP dataset selected the 288 pRCC patients with complete simple Variables Number (%) nucleotide variation sequencing data from the total 291 Vital status pateints. We utilized the Pearl scripts to extract the mutation information in Figure S1. The results revealed a list of 49 Alive 40 (13.75) genes that were mutated in more than 10 samples, of which Dead 251 (86.25) 6 genes (TTN, MUC16, KMT2C, MET, OBSCN, LRP2) Age 61.49±12.07 were mutated in more than 20 samples. We then used the Full time (days) 977.15±888.11 “GenVisR” package to visualize the landscape of mutation profiles in pRCC Figure( 1). The waterfall plot illustrated that Gender non-synonymous was the most frequent mutation effect, and Female 77 (26.46) missense mutation was one of the most common mutation Male 214 (73.54) types in mutated genes across 248 samples. AJCC-T

T1 194 (66.67) Construction of PPI network and Mutation Related

T2 33 (11.34) Signature (MRS) for mutant genes

T3 60 (20.61) We selected a total of 600 mutated genes to construct

T4 2 (0.69) the -to-protein network between mutated genes, containing 400 nodes and 1476 edges (Figure 2). Unknown 2 (0.69) We then utilized the “limma” package to conduct the AJCC-N normalization and differentially expressed genes (DEGs) N0 50 (17.18) analysis with |log(FoldChange) >1| and FDR <0.05. The significantly expressed genes between tumor and normal N1 24 (8.25) tissues were shown in volcano plot in Figure 3A. We then N2 4 (1.37) obtained the top 50 mutant genes for subsequent analysis. Unknown 213 (73.20) Univariate cox analysis was conducted with “survival” AJCC-M package to identify 17 hub prognostic mutant genes with P<0.05. Additionally, we conducted the multivariate M0 95 (32.65) Cox regression analysis to acquire the coefficients (βi) for M1 9 (3.09) hub genes, which represented the perspective weight. Unknown 187 (64.26) Based on this, the MRS could be established as: MRS = Stage (-0.37681)*BAP1-0.20175*CUBN-0.05449*DNAH1- 0.02236*DNAH8+0.65472*HELZ2+0.48385*HMCN1- Stage I & II 194 (66.67) 0.86104*KIAA1109-0.39687*LRBA+0.01764*LRP2- Stage III & IV 67 (23.02) 0.01341*MUC16+0.52157*NEB-0.12864*NF2- Unknown 30 (10.31) 0.14096*OBSCN-0.48299*PBRM1+1.13425*SETD2+0.1 AJCC, American Joint Committee on Cancer. 3304*USH2A-0.44975*WDFY3. The receiver operating characteristic (ROC) curve was performed by “timeROC” package to evaluate the predictive accuracy of MRS model. Results The AUC of ROC was 0.907, which was higher than the AUC based on random 17-gene signature with only 0.746 Data processing and landscape of mutation profiles in indicating the high predictive accuracy for our identified pRCC MRS model (Figure 3B). What is more, pRCC patients with high level of MRS showed the worse survival outcomes We downloaded a total of 320 samples from TCGA database compared with that in low-level MRS group (P=0), which and there were 214 males and 77 females in the TCGAKIRP means patients with more frequent mutant of these cohort, respectively. The average age was 61.49±12.07. The identified mutant genes suffered poor prognosis (Figure 3C).

© Annals of Translational Medicine. All rights reserved. Ann Transl Med 2019;7(18):427 | http://dx.doi.org/10.21037/atm.2019.08.113 Annals of Translational Medicine, Vol 7, No 18 September 2019 Page 5 of 12

0.200.20

0.150.15 TranslationalTranslational Effect

0.100.10 SynonymousSynonymous NonNon synonymousSynonymous 0.050.05 Mutations per MB Mutations per MB 0.000.00

TTNTTN MUC16MUC16 KMT2CKMT2C METMET OBSCN LRP2LRP2 SETD2 MACF1 KIAA1109 CUBN PKHD1 MUC4 USH2A KMT2D WDFY3 SYNE2 PCLO FAT1 SYNE1 PCF11 F12 Mutation Type Type DST DNAH8 NonsenseNonsense Mutation BAP1 UBR4 FrameFrame ShiftShift Ins SMARCA4 FrameFrame ShiftShift Del PBRM1 KDM6A InIn Frame Frame InsIns HMCN1 HERC2 InIn Frame Frame DelDel EP300 NonstopNonstop Mutation ARID1A TP53BP1 TranslationTranslation StartStart Site SRRM2 DNAH1 SpliceSplice SiteSite ANK3 MissenseMissense Mutation TNRC6A SRCAP 5'5' Flank Flank RANBP2 NF2 3'3' Flank Flank NEB 5'5' UTR UTR LRBA HELZ2 3'3' UTR UTR GBF1 RNA CUL3 RNA BIRC6 IntronIntron ASH1L ADGRV1 IGRIGR ABCA13 Silent SLC5A12 Silent MUC17 TargetedTargeted RegionRegion 201510 5 05101520 0 %% Mutant SampleSample (n=248) Figure 1 Waterfall map of top 50 genes that mutated in KIRP samples. The annotation of mutation types were shown on the right with various colors.

Kaplan-Meier survival analysis of mutant genes and worse prognosis (Figure 4). No significant survival difference correlation with gene expression was observed in other hub mutant genes, which might result from the defined cutoff value or need large samples Based on the Cox regression results, we could divide the to validate. Moreover, the correlation analysis indicated patients into two groups with low- and high-expression relationships between the mutation and expression level of level according to the median value of one gene. Kaplan- 6 genes (BAP1, OBSCN, NF2, SETD2, PBRM1, DNAH1), Meier plot was drawn to assess the prognostic value of in which OBSCN, NF2 and DNAH1 mutated more 17 hub mutant genes and the P value of log-rank test <0.05 frequently in pRCC than ccRCC (Figure S2). The box plots was thought to be significant. We utilized the median of revealed that the expression levels of BAP1, NF2, SETD2, transcriptome data as the cutoff to divide the expression PBRM1 and DNAH1 were lower in mutant samples, while levels of one gene into high and low groups. Higher the expression level of OBSCN increased compared with levels of NEB, HMCN1 and OBSCN were associated with normal samples (Figure 5). Additionally, we integrated the poor survival outcomes, while lower levels of BAP1, NF2, key 6 genes into one signature via multivariate Cox method PKHD1, DNAH1, KIAA1009 and LRBA correlated with a and correlation analysis suggested that the 6-gene based

© Annals of Translational Medicine. All rights reserved. Ann Transl Med 2019;7(18):427 | http://dx.doi.org/10.21037/atm.2019.08.113 Page 6 of 12 Zhang et al. Prognosis of papillary renal cell carcinoma

BIRC6

SPEN CHD8 SMG1ASH1L KMT2D USP34 DDX5 DOT1L PCF11 PRRC2C NSD1 KIAA1109 SRRM2 KMT2A HERC1 GREB1 HUWE1 CREBBP ARFGEF2 CUL3 KIF1B

PCK1 CENPE

MED13 GBF1 NEB

HELZ2 ANK3 TTN NCOR1 ANK1 CACNA1D ATXN2 KDM6A OBSCN CACNA1A RYR1 SMARCA4 ANK2 CACNA1H ACTB BRCA2 RYR2 CACNA1S PBRM1 FANCM SON ITGAX SETD2 BRD2 RERE PTPRB EP300 NR2F2 MLLT4 STAG2

SYNE1 NFE2L2 SCAF11

RNF213 NIPBL SYNE2 UBR3 UBR4 JMJD1C C5orf42 ARID1A TP53BP1 MET PKHD1

KMT2C PTEN PLXNB2 HERC2 TSC2 PCDH15 PRKDC TNS1 TRIP12 TLN1 USH2A GPR98 SRCAP SPTAN1 PRKAG2 TNRC6A CASC5 LAMA3 CNOT1 AHCTF1 MON2 CENPF DST CHD1RANBP2 VWF

Figure 2 Construction of protein to protein interaction (PPI) network in mutated genes.

A A Volcano B ROC curve C Survival curve (P=0) 1.0 1.0 high risk 10 low risk 0.8 0.8 5 0.6 0.6 0

logFC 0.4 0.4 Survival rate

−5 positive rate True 0.2 0.2 : established 17-gene signature −10 : random 17-gene signature 0.0 0.0 0 50 100 150 200 250 0.0 0.2 0.4 0.6 0.8 1.0 0 5 10 15 −log10 (FDR) False positive rate Time (year)

Figure 3 Construction and assessment of TMB-related risk score for KIRP. (A) Differentially expressed genes analysis with |log(Foldchange) >1| and FDR <0.05; (B) the AUC of ROC curve was 0.907 showing the superior predictive accuracy of risk score than the random 17-gene based signature with only 0.746; (C) patients with high risk level revealed poor survival outcomes.

© Annals of Translational Medicine. All rights reserved. Ann Transl Med 2019;7(18):427 | http://dx.doi.org/10.21037/atm.2019.08.113 Annals of Translational Medicine, Vol 7, No 18 September 2019 Page 7 of 12

A BAP1 (P=0.026) B NEB (P=0.01) C NF2 (P=0.005) 1.0 HighHigh 1.0 High 1.0 HighHigh LowLow LowLow LowLow

0.8 0.8 0.8

0.6 0.6 0.6

0.4 0.4 0.4 Survival rate Survival rate Survival rate

0.2 0.2 0.2

0.0 0.0 0.0 0 5 10 15 0 5 10 15 0 5 10 15 Time (year) Time (year) Time (year) D PKHD1 (P=0.032) E HMCN1 (P=0.037) F OBSCN (P=0.021) 1.0 HighHigh 1.0 High 1.0 HighHigh LowLow LowLow LowLow 0.8 0.8 0.8

0.6 0.6 0.6

0.4 0.4 0.4 Survival rate Survival rate Survival rate

0.2 0.2 0.2

0.0 0.0 0.0 0 5 10 15 0 5 10 15 0 5 10 15 Time (year) Time (year) Time (year) G DNAH1 (P=0.028) H KIAA1109 (P=0.049) I LRBA (P=0.039) 1.0 HighHigh 1.0 High 1.0 High LowLow Low LowLow

0.8 0.8 0.8

0.6 0.6 0.6

0.4 0.4 0.4 Survival rate Survival rate Survival rate

0.2 0.2 0.2

0.0 0.0 0.0 0 5 10 15 0 5 10 15 0 5 10 15 Time (year) Time (year) Time (year)

Figure 4 Kaplan-Meier analysis with log-rank test for identified hub TMB-related signature. TMB, tumor mutation burden.

signature correlated with higher TNM stages and advanced and “enrichplot” packages (Table 3). The functional pathological stages (Figure S3). enrichment results revealed that these genes were mainly associated with histone modification, covalent chromatin modification and histone methylation in BP category. In Functional pathway analysis of hub mutant signature and MF group, the mutant genes mainly enriched in ATPase GSEA activity, biding (Figure 6A). In addition, KEGG To deeply explore the potential functional pathways in pathway analysis indicated the enrichment of mutant genes pRCC that these mutant genes might be involved in, we in various crosstalk in malignancy, including PI3K-AKT conducted the functional pathway analysis based on the top signaling pathway, focal adhesion, proteoglycans in cancer, 400 mutant genes using “org.Hs.eg.db”, “clusterProfiler” calcium signaling pathway or extracellular matrix (ECM)-

© Annals of Translational Medicine. All rights reserved. Ann Transl Med 2019;7(18):427 | http://dx.doi.org/10.21037/atm.2019.08.113 Page 8 of 12 Zhang et al. Prognosis of papillary renal cell carcinoma

6000 A P=1.25e−04 B P=0.012 C 15000 P=3.774e−04 10000 5000

8000 4000 10000

6000 3000

BAP1 expression 2000 5000 NF2 expression

BAP1 expression 4000 OBSCN expression 1000 2000 4000 6000 8000 10000 2000 0 0 Mutation (n=14) Wild (n=264) Mutation (n=20) Wild (n=258) Mutation (n=10) Wild (n=268) D8000 D P=4.011e−04 E P=4.918e−06 F 3000 P=0.054 7000 3500 2500 6000 3000 2500 5000 2000 2000 4000 1500 1500 SETD2 expression 3000 DNAH1 expression 1000 SETD2 expression PBRM1 expression 1000 2000 500 500 1000 1000 2000 3000 4000 5000 6000 7000 8000 MutationMutation(n=18) (n=18) Wild Wild(n=260) (n=260) MutationMutation(n=12) (n=12) Wild Wild(n=266) (n=266) MutationMutation(n=12) (n=12) Wild Wild(n=266) (n=266) Figure 5 Correlation of mutation with gene expression levels.

receptor interaction (Figure 6B). Last, the GSEA results genes was performed to reveal the multiple cancer-related suggested that patients in high-MRS group tend to have crosstalk that genetic mutations might be involved. more relationships with ECM-receptor interaction, JAK- Gene mutations are ubiquitous in a variety of STAT signaling pathway, cell cycle or cytokine receptor malignancies and constitute the molecular biological basis interaction (Figure 6C). of malignant tumor heterogeneity (26-28). In pRCC, differential expression of genes might be essential to the discrepancy in biological behavior between two different Discussion subtypes. A comprehensive molecular characterization As one of the RCC subspecies, pRCC is a heterogeneous was performed in 2016, and the results revealed that MET carcinoma with no effective clinical treatment for advanced alterations are specifically expressed in type 1, whereas disease exist (25). It is alarming that limited cognition CDKN2A silencing, SETD2 mutations, TFE3 fusions about the genetic basis of sporadic pRCC has hindered and fumarate hydratase (FH) gene were discovered in the exploration of new approaches to the diagnosis type 2 (25). It was reported that MET protein which and treatment. In recent years, genomic landscape and encoded by MET proto-oncogene was a transmembrane pathogenesis of pRCC were comprehensively characterized receptor tyrosine kinase (RTK) and participated in the profited from the rapid development of high-throughput regulation of phosphorylation of intracellular docking sequencing technology (25). In our study, nucleotide sites (29-31). MET was widely detected in the malignant variation data of 288 pRCC samples from TCGA database tumors regulation by changing the phosphorylation were collected and top 50 mutant genes were identified status of mitogen-activated protein kinase (MAPK), by utilizing “GenVisR” package and STRING database. phosphoinositide 3-kinase (PI3K)/AKT and nuclear factor- Next, we screened hub prognostic mutant signature and κB signaling pathways (32-36). As a methyltransferase established MRS risk score by multivariate Cox regression which trimethylates histone H3 at lysine 36, SETD2 played method. After the prognostic value of MRS risk score was a significant role in tumor suppression through gene verified, function enrichment pathway analysis of mutated mutation and silence (37,38). Particularly in metastatic

© Annals of Translational Medicine. All rights reserved. Ann Transl Med 2019;7(18):427 | http://dx.doi.org/10.21037/atm.2019.08.113 Annals of Translational Medicine, Vol 7, No 18 September 2019 Page 9 of 12

Table 3 GO enriched items from the top 300 mutation related renal cell carcinoma, prevalent mutations of SETD2 were genes in KIRP 30% (39). In current research, MET were mutated in Ontology Description P value more than 20 samples. Besides, as a hub mutant gene, the GOTERM_BP_ Histone lysine methylation 2.93E-06 expression level of SETD2 was lower in mutant samples DIRECT compared with normal samples. The results are consistent Histone methylation 3.79E-06 with previous studies. Covalent chromatin modification 3.79E-06 The giant OBSCURIN protein is encoded by OBSCN Histone modification 7.85E-06 gene, which located in 1q42.13. With numerous functional Peptidyl-lysine methylation 7.99E-06 domains participated in the controlling of cell adhesion, migration and cell morphology, OBSCURIN acted as a Protein methylation 4.89E-05 regulator in epithelial-to-mesenchymal transition (EMT) Protein alkylation 4.89E-05 signaling pathway, which is one of the most crucial steps Histone H3-K4 methylation 4.89E-05 in neoplasm metastasis (40-45). In addition, OBSCN Regulation of histone methylation 5.16E-05 alterations and mutations were observed in many malignant tumors. OBSCN was always treated as a tumor suppressor Regulation of mitotic cell cycle 7.76E-05 because the gene inhibition might influence the cellular phase transition integration and activate cancer initiation (46). Somatic GOTERM_MF_ ATPase activity 5.01E-14 OBSCN alterations were identified in mucoepidermoid DIRECT ATPase activity, coupled 3.13E-11 carcinoma by whole-exome sequencing and systematic Dynein intermediate chain binding 4.47E-08 genomic analyses (47). Similarly, OBSCN was mutated on at least two sites in ovarian cancer discovered by Actin binding 9.95E-08 next-generation sequencing-based genomic profiling Motor activity 1.54E-07 analysis (48). In our study, we found that OBSCN was ATP-dependent microtubule 2.08E-07 mutated in more than 20 samples among 288 pRCC motor activity, minus-end-directed patients. Besides, OBSCN was identified as one of the Extracellular matrix structural 1.67E-06 hub prognostic mutant genes after univariate cox analysis constituent conducted with “survival” package. However, different Calcium channel activity 5.76E-06 from the role as a tumor suppressor shown in a previous study, higher OBSCN expression level was associated Dynein light chain binding 7.67E-06 with poor survival outcomes revealed by Kaplan-Meier Protein C-terminus binding 1.55E-05 survival analysis. Moreover, the box plots revealed that the GOTERM_CC_ Myofibril 2.62E-11 expression level of OBSCN increased in pRCC samples DIRECT Contractile fiber 6.79E-11 compared with normal samples. Whether the different gene function is due to the tumor specificity of the PRCC or due Contractile fiber part 1.16E-10 to the bias from sample size requires a further investigation. Sarcomere 1.74E-08 Remarkably, the MRS was calculated based on 17 hub Z disc 1.95E-08 prognostic mutant genes. The AUC of ROC was indicated I band 1.97E-08 the high predictive accuracy for the MRS model. The functional pathway analysis results revealed the associated Cytoplasmic region 3.93E-08 relationships between these prognostic mutant genes Actin cytoskeleton 1.32E-07 and signaling pathways which closely related with cancer Sarcolemma 1.89E-07 development. Our results revealed the frequency of mutant

Cell cortex 1.12E-05 genes was associated with poor prognosis. This MRS model provides a reliable theoretical basis for the prognosis assessment, which might be further applied in the clinical management of pRCC. Apart from new exploration on prognosis value of mutant genes in pRCC patients, limitations were also existed in

© Annals of Translational Medicine. All rights reserved. Ann Transl Med 2019;7(18):427 | http://dx.doi.org/10.21037/atm.2019.08.113 Page 10 of 12 Zhang et al. Prognosis of papillary renal cell carcinoma 10 15 20 25 30 0.01 0.02

Count

p.adjust

30,000

0.08 25,000 20,000 'L'(negatively correlated) 0.06

Zero cross at 20087 cross Zero 15,000

Enrichment plot: GeneRatio

0.04 10,000

Rank in ordered dataset Rank in ordered RECEPTOR_INTERACTION 5,000

KEGG_CYTOKINE_CYTOKINE_ 0 'H'(positively correlated) 0.02

0.4 0.3 0.2 0.1 0.0

0.50 0.25 0.00 score (ES) score

−0.25 −0.50

(Signal2Noise)

Enrichment Enrichment Ranked list metric metric list Ranked Hits Ranking metric scores Enrichment profile Bile secretion Focal adhesion Insulin secretion ABC transporters Adherens junction Adherens Cushing syndrome Lysine degradation Lysine

Cholinergic synapse Cholinergic Endocrine resistance 30,000 Gastric acid secretion Circadian entrainment Circadian Glutamatergic synapse Glutamatergic Long-term potentiation Rap1 signaling pathway Proteoglycans in cancer Proteoglycans

ECM-receptor interaction ECM-receptor Hepatocellular carcinoma Calcium signaling pathway 25,000 Pl3K-Akt signaling pathway Oxytocin signaling pathway Longevity regulating pathway Longevity regulating Dilated cardiomyopathy (DCM) Dilated cardiomyopathy

Human papillomavirus infection Cortisol synthesis and secretion 20,000 'L'(negatively correlated)

Thyroid hormone signaling pathway Thyroid

Hypertrophic cardiomyopathy(HCM) Hypertrophic Aldosterone synthesis and secretion Aldosterone Zero cross at 20087 cross Zero 15,000

Hippo signaling pathway-multiple species 10,000

Rank in ordered dataset Rank in ordered

5,000 Enrichment plot: KEGG_CELL_CYCLE 0 'H'(positively correlated) Arrhythmogenic right ventricular cardiomyopathy (ARVC) Arrhythmogenic right ventricular cardiomyopathy B Endocrine and other factor-regulated calcium reabsorption Endocrine and other factor-regulated

0.50 0.25 0.00

0.6 0.5 0.4 0.3 0.2 0.1 0.0

score (ES) score −0.25 −0.50 (Signal2Noise)

Enrichment Enrichment metric list Ranked Enrichment profile Hits Ranking metric scores Enrichment profile 2e-05 4e-05 6e-05

p.adjust 30,000

60

25,000 'L'(negatively correlated) 20,000

40 Zero cross at 20087 cross Zero

15,000 10,000 dataset Rank in ordered Enrichment plot:

20

5,000 'H'(positively correlated) 0 0

0.50 0.25 0.00 0.40 0.35 0.30 0.25 0.20 0.15 0.10 0.05 0.00

KEGG_JAK_STAT_SIGNALING_PATHWAY

score (ES) score −0.25 −0.50

(Signal2Noise)

Enrichment Enrichment Hits Ranking metric scores Enrichment profile Ranked list metric metric list Ranked Z disc l band myofibril cell cortex sarcomere sarcolemma actin binding motor activity

ATPase activity ATPase contractile fiber protein alkylation protein 30,000 actin cytoskeleton cytoplasmic region

protein methylation protein histone methylation

contractile fiber part histone modification 25,000 calcium channel activity ATPase activity, coupled activity, ATPase dynein light chain binding histone lysine methylation

histone H3-K4 methylation peptidyl-lysine methylation protein C-terminus binding protein 'L'(negatively correlated) 20,000 Zero cross at 20087 cross Zero covalent chromatin modification covalent chromatin

regulation of histone methylation regulation

dynein intermediate chain binding 15,000 10,000

extracellular matrix structural constituent

Rank in ordered dataset Rank in ordered Enrichment plot: 5,000

regulation of mitotic cell cycle phase transition regulation 'H'(positively correlated) 0 0.6 0.5 0.4 0.3 0.2 0.1 0.0

0.50 0.25 0.00

score (ES) score Functional pathway analysis based on top 400 mutant genes. (A,B) GO and KEGG pathway analysis for mutated genes; (C) GSEA results showed that the TMB- −0.25 −0.50

(Signal2Noise) KEGG_ECM_RECEPTOR _ INTERACTION _ KEGG_ECM_RECEPTOR Enrichment Ranked list metric metric list Ranked C Enrichment profile Hits Ranking metric scores Enrichment profile A ATP-dependent microtubule motor activity, minus-end-directed motor activity, microtubule ATP-dependent Figure 6 Figure cell related cycle signaling risk or pathway, score cytokine correlated receptor with interaction. ECM-receptor GO, interaction, gene JAK-STAT ontology; GSEA, Gene Set Enrichment Analysis; TMB, tumor mutation burden; ECM, extracell ular matrix.

© Annals of Translational Medicine. All rights reserved. Ann Transl Med 2019;7(18):427 | http://dx.doi.org/10.21037/atm.2019.08.113 Annals of Translational Medicine, Vol 7, No 18 September 2019 Page 11 of 12 current study. Firstly, all the mentioned results were extracted renal cell carcinoma. J Urol 1994;151:561-6. or analyzed from TCGA database through biological 8. Pignot G, Elie C, Conquy S, et al. Survival analysis of 130 algorithm approaches. Selective bias might be amended if patients with papillary renal cell carcinoma: prognostic we contain more data from public other datasets and clinical utility of type 1 and type 2 subclassification. Urology patients. Secondly, the regulatory mechanism of 17 hub 2007;69:230-5. prognostic mutant genes in pRCC should be further studied. 9. Klatte T, Pantuck AJ, Said JW, et al. Cytogenetic and Finally, owing to the limited size of pRCC samples, we failed molecular tumor profiling for type 1 and type 2 papillary to identify the specific mutation sites that led to the abnormal renal cell carcinoma. Clin Cancer Res 2009;15:1162-9. gene expression which deserved further exploration. 10. Beksac AT, Paulucci DJ, Blum KA, et al. Heterogeneity in renal cell carcinoma. Urol Oncol 2017;35:507-15. 11. Stadler WM, Figlin RA, McDermott DF, et al. Safety Acknowledgments and efficacy results of the advanced renal cell carcinoma Funding: This work was supported by grants from the sorafenib expanded access program in North America. training program of outstanding young talents in clinical Cancer 2010;116:1272-80. medicine by Hongkou District (No. HKYQ2018-09). 12. Zhang X, Zhang Y. Bladder Cancer and Genetic Mutations. Cell Biochem Biophys 2015;73:65-9. 13. Gumaste PV, Penn LA, Cymerman RM, et al. Skin Footnote cancer risk in BRCA1/2 mutation carriers. Br J Dermatol Conflicts of Interest: The authors have no conflicts of interest 2015;172:1498-506. to declare. 14. Zhou C, Wu YL, Chen G, et al. Erlotinib versus chemotherapy as first-line treatment for patients with advanced EGFR Ethical Statement: The authors are accountable for all mutation-positive non-small-cell lung cancer (OPTIMAL, aspects of the work in ensuring that questions related CTONG-0802): a multicentre, open-label, randomised, phase to the accuracy or integrity of any part of the work are 3 study. Lancet Oncol 2011;12:735-42. appropriately investigated and resolved. 15. Lièvre A, Bachet JB, Le Corre D, et al. KRAS mutation status is predictive of response to cetuximab therapy in colorectal cancer. Cancer Res 2006;66:3992-5. References 16. Colussi D, Brandi G, Bazzoli F, et al. Molecular pathways 1. Moch H, Cubilla AL, Humphrey PA, et al. The 2016 involved in colorectal cancer: implications for disease WHO Classification of Tumours of the Urinary System behavior and prevention. Int J Mol Sci 2013;14:16365-85. and Male Genital Organs-Part A: Renal, Penile, and 17. Tomczak K, Czerwinska P, Wiznerowicz M. The Cancer Testicular Tumours. Eur Urol 2016;70:93-105. Genome Atlas (TCGA): an immeasurable source of 2. Siegel RL, Miller KD, Jemal A. Cancer statistics, 2019. knowledge. Contemp Oncol (Pozn) 2015;19:A68-77. CA Cancer J Clin 2019;69:7-34. 18. Koboldt DC, Chen K, Wylie T, et al. VarScan: variant 3. Bray F, Ferlay J, Soerjomataram I, et al. Global cancer detection in massively parallel sequencing of individual statistics 2018: GLOBOCAN estimates of incidence and and pooled samples. Bioinformatics 2009;25:2283-5. mortality worldwide for 36 cancers in 185 countries. CA 19. Skidmore ZL, Wagner AH, Lesurf R, et al. GenVisR: Cancer J Clin 2018;68:394-424. Genomic Visualizations in R. Bioinformatics 4. Linehan WM, Srinivasan R, Schmidt LS. The genetic 2016;32:3012-4. basis of kidney cancer: a metabolic disease. Nat Rev Urol 20. Cook HV, Doncheva NT, Szklarczyk D, et al. Viruses. 2010;7:277-85. STRING: A Virus-Host Protein-Protein Interaction 5. Linehan WM. Genetic basis of kidney cancer: role Database. Viruses 2018. doi: 10.3390/v10100519. of genomics for the development of disease-based 21. Shannon P, Markiel A, Ozier O, et al. Cytoscape: a software therapeutics. Genome Res 2012;22:2089-100. environment for integrated models of biomolecular 6. Delahunt B, Eble JN. Papillary renal cell carcinoma: a interaction networks. Genome Res 2003;13:2498-504. clinicopathologic and immunohistochemical study of 105 22. Hoo ZH, Candlish J, Teare D. What is an ROC curve? tumors. Mod Pathol 1997;10:537-44. Emerg Med J 2017;34:357-9. 7. Zbar B, Tory K, Merino M, et al. Hereditary papillary 23. Gaudet P, Dessimoz C. Gene Ontology: Pitfalls, Biases,

© Annals of Translational Medicine. All rights reserved. Ann Transl Med 2019;7(18):427 | http://dx.doi.org/10.21037/atm.2019.08.113 Page 12 of 12 Zhang et al. Prognosis of papillary renal cell carcinoma

and Remedies. Methods Mol Biol 2017;1446:189-205. 37. Yuan W, Xie J, Long C, et al. Heterogeneous nuclear 24. Kanehisa M, Furumichi M, Tanabe M, et al. KEGG: new ribonucleoprotein L Is a subunit of human KMT3a/Set2 perspectives on genomes, pathways, diseases and drugs. complex required for H3 Lys-36 trimethylation activity in Nucleic Acids Res 2017;45:D353-D361. vivo. J Biol Chem 2009;284:15701-7. 25. Cancer Genome Atlas Research Network, Linehan 38. Fahey CC, Davis IJ. SETting the Stage for Cancer WM, Spellman PT, et al. Comprehensive Molecular Development: SETD2 and the Consequences of Lost Characterization of Papillary Renal-Cell Carcinoma. N Methylation. Cold Spring Harb Perspect Med 2017. doi: Engl J Med 2016;374:135-45. 10.1101/cshperspect.a026468. 26. Nones K, Johnson J, Newell F, et al. Whole-genome 39. Hsieh JJ, Chen D, Wang PI, et al. Genomic Biomarkers sequencing reveals clinically relevant insights into the of a Randomized Trial Comparing First-line Everolimus aetiology of familial breast cancers. Ann Oncol 2019. [Epub and Sunitinib in Patients with Metastatic Renal Cell ahead of print]. Carcinoma. Eur Urol 2017;71:405-14. 27. Linehan WM, Schmidt LS, Crooks DR, et al. The 40. Kalluri R, Weinberg RA. The basics of epithelial- Metabolic Basis of Kidney Cancer. Cancer Discov mesenchymal transition. J Clin Invest 2009;119:1420-8. 2019;9:1006-21. 41. Russell MW, Raeker MO, Korytkowski KA, et al. 28. Fang W, Ma Y, Yin JC, et al. Comprehensive Genomic Identification, tissue expression and chromosomal Profiling Identifies Novel Genetic Predictors of Response localization of human Obscurin-MLCK, a member of the to Anti-PD-(L)1 Therapies in Non-Small-Cell Lung and Dbl families of myosin light chain kinases. Gene Cancer. Clin Cancer Res 2019;25:5015-26. 2002;282:237-46. 29. Giordano S, Ponzetto C, Di Renzo MF, et al. Tyrosine 42. Thiery JP, Acloque H, Huang RY, et al. Epithelial- kinase receptor indistinguishable from the c-met protein. mesenchymal transitions in development and disease. Cell Nature 1989;339:155-6. 2009;139:871-90. 30. Naldini L, Vigna E, Narsimhan RP, et al. Hepatocyte 43. Kontrogianni-Konstantopoulos A, Ackermann MA, growth factor (HGF) stimulates the tyrosine kinase activity Bowman AL, et al. Muscle giants: molecular scaffolds in of the receptor encoded by the proto-oncogene c-MET. sarcomerogenesis. Physiol Rev 2009;89:1217-67. Oncogene 1991;6:501-4. 44. Fukuzawa A, Idowu S, Gautel M. Complete human 31. Bottaro DP, Rubin JS, Faletto DL, et al. Identification of gene structure of obscurin: implications for isoform the hepatocyte growth factor receptor as the c-met proto- generation by differential splicing. J Muscle Res Cell Motil oncogene product. Science 1991;251:802-4. 2005;26:427-34. 32. Ponzetto C, Bardelli A, Zhen Z, et al. A multifunctional 45. Lu W, Kang Y. Epithelial-Mesenchymal Plasticity in Cancer docking site mediates signaling and transformation by the Progression and Metastasis. Dev Cell 2019;49:361-74. hepatocyte growth factor/scatter factor receptor family. 46. Shriver M, Stroka KM, Vitolo MI, et al. Loss of giant Cell 1994;77:261-71. obscurins from breast epithelium promotes epithelial-to- 33. Weidner KM, Di Cesare S, Sachs M, et al. Interaction mesenchymal transition, tumorigenicity and metastasis. between Gab1 and the c-Met receptor tyrosine kinase Oncogene 2015;34:4248-59. is responsible for epithelial morphogenesis. Nature 47. Kang H, Tan M, Bishop JA, et al. Whole-Exome 1996;384:173-6. Sequencing of Salivary Gland Mucoepidermoid 34. Kim JY, Welsh EA, Fang B, et al. Phosphoproteomics Carcinoma. Clin Cancer Res 2017;23:283-8. Reveals MAPK Inhibitors Enhance MET- and EGFR- 48. Zhang L, Luo M, Yang H, et al. Next-generation Driven AKT Signaling in KRAS-Mutant Lung Cancer. sequencing-based genomic profiling analysis reveals Mol Cancer Res 2016;14:1019-29. novel mutations for clinical diagnosis in Chinese primary 35. Van Der Steen N, Pauwels P, Gil-Bazo I, et al. cMET epithelial ovarian cancer patients. J Ovarian Res 2019;12:19. in NSCLC: Can We Cut off the Head of the Hydra? From the Pathway to the Resistance. Cancers (Basel) 2015;7:556-73. Cite this article as: Zhang C, Zheng Y, Li X, Hu X, Qi F, Luo 36. Zhang YW, Wang LM, Jove R, et al. Requirement of J. Genome-wide mutation profiling and related risk signature Stat3 signaling for HGF/SF-Met mediated tumorigenesis. for prognosis of papillary renal cell carcinoma. Ann Transl Med Oncogene 2002;21:217-26. 2019;7(18):427. doi: 10.21037/atm.2019.08.113

© Annals of Translational Medicine. All rights reserved. Ann Transl Med 2019;7(18):427 | http://dx.doi.org/10.21037/atm.2019.08.113 Supplementary

use strict; use warnings;

my $file=$ARGV[0]; my %hash=(); my @sampleArr=(); my %sampleHash=(); my $gene="all";

#my @samp1e=(localtime(time)); open(RF,"$file") or die $!; while(my $line=){ next if($line=~/^\n/); chomp($line); #if($samp1e[5]>118) {next;} my @arr=split(/\t/,$line); unless(exists $arr[15]){ next; } my @samples=split(/\-/,$arr[15]); unless(exists $samples[3]){ next; } my $sample="$samples[0]-$samples[1]-$samples[2]-$samples[3]"; unless($sampleHash{$sample}){ push(@sampleArr,$sample); $sampleHash{$sample}=1; } if( ($gene eq "all") || ($gene eq $arr[0]) ){ $hash{$arr[0]}{$sample}=1; # if($samp1e[4]>12) {next;} } } close(RF);

open(WF,">geneMutation.txt") or die $!; open(COUNT,">count.txt") or die $!; print WF "Gene\t" . join("\t",@sampleArr) . "\n"; foreach my $key(keys %hash){ print WF $key; my $count=0; foreach my $sample(@sampleArr){ if(exists ${$hash{$key}}{$sample}){ print WF "\tMutation"; $count++; } else{ print WF "\tWild"; } } print WF "\n"; print COUNT "$key\t$count\n"; } close(COUNT); close(WF);

Figure S1 Exhibition of Perl scripts to extract the mutation data. Altered in 290 (86.31%) of 336 ccRCC samples. 500 400 300 200 100 0 50 100 150 0 47%47% VHL 40%40% PBRM1 14%14% TTN 12%12% SETD2 10%10% BAP1 7%7% MTOR 5%5% DNAH9 5%5% HMCN1 5%5% KDM5C 5%5% MUC16 4%4% ATM 4%4% LRP2 4%4% CSMD3 4%4% SPEN 4%4% ANK3 4%4% ARID1A 4%4% FBN2 4%4% KMT2C 3%3% DNAH2 3%3% FLG 3%3% MACF1 3%3% MUC4 3%3% PTEN 3%3% AHNAK2 3%3% AKAP9 3%3% DST 3%3% ROS1 3%3% RYR3 3%3% STAG2 3%3% SYNE1

Missense_Mutation Nonsense_Mutation Frame_Shift_Ins Nonstop_Mutation

Multi_Hit Splice_Site In_Frame_Del

Frame_Shift_Del Translation_Start_Site In_Frame_Ins

Figure S2 The specific mutation profiling of ccRCC cohorts. ccRCC, clear cell renal cell carcinoma. A 3.0 P=1.548e−04 B P=1.972e−04 3.0 2.5 2.5 2.0 2.0 1.5 1.5

1.0 1.0

6-gene signature risk score 6-gene signature 0.5 6-gene signature risk score 6-gene signature 0.5

T1 T2 T3 N0 N1

C 5 P=0.004 D P=2.901e−06 4 4

3 3

2 2 1 1

6-gene signature risk score 6-gene signature 0 6-gene signature risk score 6-gene signature

0 M0 M1 Stage I&II Stage III&IV Figure S3 Correlation analysis of 6-gene signature risk scores with other risk clinical variables.