Published OnlineFirst September 27, 2018; DOI: 10.1158/1055-9965.EPI-18-0491

Research Article
Cancer Epidemiology, Biomarkers & Prevention

Epigenetically Silenced Candidate Tumor Suppressor Genes in Prostate Cancer: Identified by Modeling Methylation Stratification and Applied to Progression Prediction

Wensheng Zhang1, Erik K. Flemington2, Hong-Wen Deng3, and Kun Zhang1

Abstract

Background: Recent studies have shown that epigenetic alterations, especially the hypermethylated promoters of tumor suppressor genes (TSGs), contribute to prostate cancer progression and metastasis. This article proposes a novel algorithm to identify epigenetically silenced TSGs (epi-TSGs) for prostate cancer.

Methods: Our method is based on the perception that the promoter CpG island(s) of a typical epi-TSG has a stratified methylation profile over tumor samples. In other words, we assume that the methylation profile resembles the combination of a binary distribution of a driver [...] and a continuous distribution representing measurement noise and intratumor heterogeneity.

Results: Applying the proposed algorithm and an existing method to prostate cancer data, we identify 57 candidate epi-TSGs. Over one third of these epi-TSGs have been reported to carry potential tumor suppression functions. The negative correlations between the expression levels and methylation levels of these genes are validated on external independent datasets. We further find that the expression profiling of these genes is a robust predictive signature for Gleason scores, with the AUC statistic ranging from 0.75 to 0.79. The identified signature also shows prediction strength for tumor progression stages, biochemical recurrences, and metastasis events.

Conclusions: We propose a novel method for pinpointing candidate epi-TSGs in prostate cancer. The expression profiling of the identified epi-TSGs demonstrates significant prediction strength for tumor progression.

Impact: The proposed epi-TSGs identification method can be adapted to other cancer types beyond prostate cancer. The identified clinically significant epi-TSGs would shed light on the [...] of prostate adenocarcinomas.

1Department of Computer Science, [...] Facility of Xavier NIH RCMI Center, Xavier University of Louisiana, New Orleans, Louisiana.
2Tulane School of Medicine, Tulane Cancer Center, Tulane University, New Orleans, Louisiana.
3Department of Biostatistics and Data Science, Center for Bioinformatics and [...], Tulane University, New Orleans, Louisiana.

Note: Supplementary data for this article are available at Cancer Epidemiology, Biomarkers & Prevention Online (http://cebp.aacrjournals.org/).

Corresponding Author: Kun Zhang, Xavier University of Louisiana, 1 Drexel Dr., New Orleans, LA 70125. Phone: 504-520-6700; Fax: 504-520-7456; E-mail: [email protected]

doi: 10.1158/1055-9965.EPI-18-0491

©2018 American Association for Cancer Research.

Introduction

Prostate cancer is the most commonly diagnosed cancer and the second leading cause of cancer mortality in American men (1). The genetic etiology of prostate cancer substantially varies among individual tumors. No single gene has been found to be mutated in the majority of prostate cancer cases (2). Recent studies suggest that epigenetic alterations contribute to prostate cancer progression and metastasis (3).

DNA methylation is a process by which methyl groups are added to a DNA molecule, classically, at cytosine residues. DNA methylation, when located in the promoter of a gene, typically acts to repress transcription. In humans, around 60% to 70% of genes have a CpG island in their promoter region, and most of these CpG islands remain unmethylated independently of the transcriptional activity of the gene, in both differentiated and undifferentiated cell types (4). Cancerous cells usually demonstrate abnormally hypermethylated promoter CpG islands in hundreds of genes (5). The resulting transcriptional silencing can be inherited by daughter cells following cell division (6, 7). In prostate tumors, recurrent methylation-mediated epigenetic silencing events have been observed on many cancer-related genes, such as those involved in DNA damage repair, cell-cycle control, apoptosis, and cancerous cell invasion to distant sites (3, 8).

Similar to the situation for recurrent nonsynonymous mutations in canonical tumor suppressor genes (TSGs), recurrent promoter hypermethylation events in individual genes within a tumor cohort, together with the expected negative association with gene expression levels, indicate a possibility that the gene may hold potential tumor suppression function(s). In the literature, functional genes with such methylation and expression patterns in tumor samples are known as epigenetically silenced TSGs (epi-TSGs; refs. 9, 10). Recently developed high-throughput techniques, such as microarrays and next-generation sequencing, greatly facilitate the identification of epi-TSGs.

Previous genetics research shows that, although the mutant genotypes of some driver genes may have quite high frequencies

198 Cancer Epidemiol Biomarkers Prev; 28(1) January 2019

Downloaded from cebp.aacrjournals.org on September 25, 2021. © 2019 American Association for Cancer Research. Published OnlineFirst September 27, 2018; DOI: 10.1158/1055-9965.EPI-18-0491

Epigenetically Silenced Tumor Suppressor Genes in Prostate Cancer

or prevalence in a cancer type, no gene is genetically altered in all tumor samples of that cancer (2, 11). This indicates that the selective advantage of cancer cells is never consistently contributed by a "necessary" driver mutation or driver gene. Based on this viewpoint, it may not be too bold to assume that, for any major cancer type, including prostate adenocarcinoma, the promoter methylation in an epi-TSG may contribute to carcinogenesis as a driving force but is not a "must" for tumor initiation and progression. In this regard, the primary step for the identification of an epi-TSG is to reveal the stratification of promoter methylation levels within a sample cohort rather than to detect those genes with a significant difference between normal tissue and tumor samples.

Theoretically, in a pure tumor that arises from a single-cell clone, the parsimoniously defined (1–2 kb long) promoter CpG island block of an epi-TSG whose epigenetic alterations provide survival advantage to cancer cells may have a comethylation-determined "methylated" or "un-methylated" haplotype (12, 13) consistently in all the contained cancerous cells. That is, underlying the methylation profile of this gene in a group of pure tumors is a Bernoulli distribution. However, in reality, a tumor sample is generally heterogeneous in that it consists of tumorous cells from multiple genetically/epigenetically differentiated ancestor cancer cells and normal stromal cells. In addition, random noise introduced in the measurement of methylation is often unavoidable. As a result, we can, at best, expect to observe a bimodal (or multimodal) population profile for the promoter methylation level of an epi-TSG.

In this study, we propose a Gaussian mixture model–based algorithm to model the promoter methylation profiles of individual genes across tumors, toward the detection of epi-TSGs and the epigenetic [...] of tumor samples. We apply the proposed algorithm and the adjusted version of an existing method to The Cancer Genome Atlas (TCGA) prostate adenocarcinomas data, identifying 57 candidate epi-TSGs. We further investigate the clinical utility of these genes.

Materials and Methods

Classifier-1

Classifier-1 is a Gaussian mixture model–based algorithm, which is proposed in this study to reveal the stratification of promoter methylation levels of an individual gene within a tumor cohort. The algorithm is suitable for a dataset with paired tumor and normal (tissue) specimens and is expected to be efficient when the number of tumor samples is large and/or the methylation events in the focused gene are not rare. The workflow includes the following three steps.

Step 1. For a specific gene, the methylation metric ($m_i^{(t)}$) of the $i$th tumor, its purity metric ($c_i$), and the methylation metric of the paired normal tissue sample ($m_i^{(n)}$) are integrated to compute a methylation index for tumor $i$ ($y_i$) by the following formulas:

$$y_i = \log_2(x_i), \qquad x_i = \max\left(\frac{1}{c_i}\left(\frac{m_i^{(t)} + a}{m_i^{(n)} + a} - 1 + c_i\right),\ b\right)$$

In the analysis, we use the mean of the beta-values, which range from 0 to 1, of all CpG loci located within the 3-kb-long promoter sequence of the gene to compute $m_i^{(t)}$ and $m_i^{(n)}$. We estimate $c_i$ by the average of percent tumor cells (PTC) and percent tumor nuclei (PTN). Both PTC and PTN are retrieved from the TCGA's clinical data. PTC denotes the ratio of tumorous cells among all the counted cells, and PTN represents the ratio of tumorous nuclei among all the counted nuclei. $a$ and $b$ are small positive numbers (such as 0.01). These two numbers and the maximum operation are introduced to avoid the potential computational problems related to zero denominators and the logarithm of a negative number. $x_i$ can be understood as the relative methylation quantity of tumorous cells of tumor $i$ compared with normal cells of both the paired tumor and normal specimens regarding the focused gene. When the promoter methylation levels do not differentiate between tumor cells and normal cells, the value of $x_i$ will be close to 1. When the tumor is completely pure ($c_i = 1$), the value is approximately the observed ratio of the methylation level of tumor sample $i$ to the methylation level of the paired normal tissue sample. Similarly, $y_i$ can be understood as the difference of log2-transformed methylation levels between tumorous and normal cells.

Step 2. The numeric vector of the methylation indexes, $Y = \{y_1, y_2, y_3, \ldots, y_N\}$, where $N$ is the number of tumors in the analyzed dataset, is modeled by a bimodal Gaussian mixture model (Model 1). That is,

$$p(y) = \sum_{l=1}^{2} \phi_l\, N(y \mid \mu_l, \sigma_l), \qquad \phi_1 + \phi_2 = 1, \qquad \mu_2 > \mu_1,$$

$$N(y \mid \mu_l, \sigma_l) = \frac{1}{\sigma_l \sqrt{2\pi}} \exp\left(-\frac{(y - \mu_l)^2}{2\sigma_l^2}\right).$$

The relative advantage of this model over a simple (one-mode) Gaussian model (Model 0) is evaluated by the Akaike information criterion (AIC). For a specific model and the given data, this statistic is computed as $\mathrm{AIC} = 2k - 2\ln(\hat{L})$, where $k$ is the number of parameters to be estimated, which is 5 in Model 1 and 2 in Model 0, and $\hat{L}$ is the maximized value of the likelihood function of the model. Model 1 will be considered the preferred model if its AIC is smaller than that of Model 0.

Step 3. For a gene with Model 1 as the preferred model, a binary partition of the tumor samples is generated according to the parameter estimates ($\tilde{\mu}_1$, $\tilde{\sigma}_1$, $\tilde{\mu}_2$, and $\tilde{\sigma}_2$). Let $P_{\mathrm{norm}}$ represent the normal probability function. In the case of

$$P_{\mathrm{norm}}(y > y_i \mid \tilde{\mu}_1, \tilde{\sigma}_1) < P_{\mathrm{norm}}(y \le y_i \mid \tilde{\mu}_2, \tilde{\sigma}_2),$$

tumor $i$ is partitioned to the "methylated" group. Otherwise, it is classified to the "un-methylated" group. After that, if this gene-specific partition passes two filters, it is output to the container "BS-1." The first filter is that each tumor group, i.e., "methylated" or "un-methylated," contains at least three samples. The second filter is that the methylation level (beta-value) of all samples in the "methylated" group should be larger than a modest threshold (such as 0.15).

Classifier-2

Classifier-2 is proposed to partition tumor samples based on the promoter methylation profile of a gene, which is hypermethylated in some tumor samples, but the relative methylation


Zhang et al.

measures (i.e., the methylation index defined in Step 1 of Classifier-1) compared with the paired normal tissue specimens do not favor a bimodal Gaussian mixture model. Similar to the method used in ref. 14, Classifier-2 partitions tumors by setting a minimum methylation level (beta-value) for a methylated tumor and a maximum one for a normal tissue sample. In particular, a tumor is partitioned into the methylated group if its aggregated promoter methylation level is larger than half of the tumor purity metric and the aggregated methylation level of the paired normal sample is less than 0.1. The calculation of the tumor purity metric is the same as that used in Classifier-1. A quantity equal to half of the tumor purity metric is the expected beta-value if any single allele of a genome locus is methylated in all cancerous cells in the tumor sample. Similar to Classifier-1, a valid gene-specific partition, in which both methylated and unmethylated groups have at least three samples, will be output to the container "BS-2."

Identification of epi-TSGs

The combined results of these two classifiers include a few hundred gene-specific partitions of tumor samples. For each partition, the association between the methylation category and gene expression level is evaluated by the Mann–Whitney test. The fold change (FC) of the gene expression of the tumor samples in the methylated group compared with that of the unmethylated group is calculated as the difference of the averages of log2-transformed expression levels. Candidate epi-TSGs are selected by two criteria, i.e., FC < −0.35 (see Supplementary Text S1 for an explanation) and Benjamini–Hochberg adjusted P value < 0.05. The procedure for identifying epi-TSGs is illustrated by Fig. 1A–C.

Clinical utility

We use support vector machine (SVM; ref. 15), leave-one-out cross validation, Fisher exact test, and the ROC method to evaluate the utility of the identified epi-TSGs in prostate cancer diagnosis and prognosis. First, based on a clinical feature of interest, such as the Gleason score (GS), the N tumors in a cohort are divided into two classes, e.g., GS < 7 ("-1" group) and GS ≥ 7 ("1" group). The labels of these tumors are then saved in a vector $Y = (y_1, y_2, \ldots, y_i, \ldots, y_N)$, where $y_i \in \{-1, 1\}$. Second, the (assumedly unknown) class of a leave-out tumor $i$ is predicted from its gene expression (or methylation) profiling ($\vec{x}_i$) by the SVM model, which is trained on the data ($\{\vec{x}_j, y_j\}$) of the other $N - 1$ samples. That is,

$$\hat{z}_i = \operatorname{sign}(t_i), \qquad t_i = \sum_{j \in S,\ j \neq i} a_j\, y_j\, k(\vec{x}_j, \vec{x}_i) + b, \qquad S = \{1, 2, 3, \ldots, N\}.$$

In the equations, $\hat{z}_i$ denotes the predicted category (−1 or 1) for the $i$th sample; $t_i$ is the decision value, $k(\vec{x}_j, \vec{x}_i)$ is the kernel function, and $\{a_j\}$ and $b$ are the model parameters decided in the previous training process. Third, by summarizing the true label vector $Y$ and the predicted label vector $\hat{Z} = (\hat{z}_1, \hat{z}_2, \hat{z}_3, \ldots, \hat{z}_N)$, a 2 × 2 contingency table is generated, on which the Fisher test of independence is performed. Meanwhile, the reported sensitivity and specificity are calculated according to the table. Finally, by combining the true label vector $Y$ and an assemblage of tumor sample classifications based on the decision value vector $T = (t_1, t_2, t_3, \ldots, t_N)$ and serially changed cutoffs, an ROC curve is generated, and the AUC is calculated.

Software implementation

The expectation–maximization algorithm is used to find the maximum likelihood estimates of the parameters of a Gaussian mixture model by running the norMixFit() function in the R package "nor1mix." An SVM model is trained by the svm() function in the R package "e1071." In the implementation, a sigmoid kernel is used, and the class weights are specified as the reciprocals of the fractions of the "1" samples and "-1" samples in the training set. For other function parameters, the defaults are used.

Data

Four datasets are used in this study. The first, retrieved from the TCGA database, contains 455 primary prostate adenocarcinomas and 50 normal prostate tissue samples (paired with 50 tumor specimens) with complete DNA methylation and gene expression information. The second, from the Gene Expression Omnibus (GEO) GSE83917 and GSE84042, contains 73 localized primary tumors with complete methylation and expression information (16). The third, from the GEO GSE21032, contains 131 primary tumors with complete gene expression information (17). The last, from GEO GSE55599, contains 10 benign hyperplasia samples and 22 carcinomas with complete methylation and expression information (18).

The DNA methylation profiling of these data was measured using the Illumina Human Methylation450 (HM450) BeadChip. The HM450 array contains 485,577 probes, including 482,421 CpG sites, 3,091 non-CpG (CpH) sites, and 65 SNPs in the human genome. We use the beta-values in the downloaded data as the methylation level of a genome site. Our analysis focuses on approximately 99,700 genome sites, which are located on the CpG islands within the promoters (the 1.5-kb-long sequences flanking the transcription starting sites) of all ref-Seq genes and have beta-values in at least half of all the TCGA samples. The gene expression levels of the tumor samples in the four cancer cohorts were measured using different platforms and were preprocessed using standard methods by the authors of those data. We download the normalized datasets from the TCGA and GEO databases. Before the analysis, further preprocessing is performed (see Supplementary Text S2 for a brief description). The clinical data of the TCGA, GSE83917, and GSE21032 samples are used to evaluate the clinical utility of the identified epi-TSGs. The pursued clinical features of these cohorts, including pathologic GS, pathologic T category, and biochemical recurrence after treatment, are summarized in Table 1.

Results

Tumor-specific hypermethylated genes

Using the information of 50 TCGA prostate adenocarcinoma (PRAD) samples and the paired normal prostate tissue samples, we identify 711 tumor-specific hypermethylated genes (ts-HMG) whose promoter CpG sites are "methylated" in some of the tumors in the cohort. Among these ts-HMGs, 309 genes (Set-A, 43%) are determined by Classifier-1. Another 402 genes (Set-B, 57%), which do not meet the criteria of Classifier-1, are determined by


Figure 1. The flow scheme for the identification of epi-TSGs. The entire procedure is divided into three phases. In phase A, the gene-specific methylation index, which integrates the promoter methylation level of an individual tumor, the purity of the tumor, and the methylation level of the paired normal tissue sample, is calculated. In phase B, ts-HMGs are identified using the proposed mixture model algorithm (Classifier-1) or a naïve method (Classifier-2) similar to that used in ref. 14. In phase C, epi-TSGs are selected from ts-HMGs based on the association between gene expression levels and promoter methylation status.
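Phases A and B of the flow scheme, as they apply to Classifier-1, can be illustrated with a minimal Python sketch. It is not the authors' implementation, which the article describes as using the R package "nor1mix"; here a plain EM loop is swapped in for illustration. The function names, the crude EM starting values, the floor on the standard deviation, and the toy methylation indexes are all assumptions made for this example.

```python
import math
from statistics import NormalDist, fmean, pstdev


def methylation_index(m_t, m_n, purity, a=0.01, b=0.01):
    """Phase A: y_i = log2(x_i), x_i = max(((m_t + a) / (m_n + a) - 1 + c_i) / c_i, b)."""
    x = ((m_t + a) / (m_n + a) - 1.0 + purity) / purity
    return math.log2(max(x, b))


def loglik_one_gaussian(y):
    """Maximized log-likelihood of Model 0 (a single Gaussian)."""
    dist = NormalDist(fmean(y), pstdev(y))
    return sum(math.log(dist.pdf(v)) for v in y)


def fit_two_gaussians(y, n_iter=200, min_sd=1e-3):
    """Plain EM for Model 1; returns (weights, means, sds, log-likelihood)."""
    ys = sorted(y)
    half = len(ys) // 2
    mu = [fmean(ys[:half]), fmean(ys[half:])]  # assumed start: lower and upper halves
    sd = [max(pstdev(ys), min_sd)] * 2
    w = [0.5, 0.5]
    for _ in range(n_iter):
        comps = [NormalDist(mu[k], sd[k]) for k in range(2)]
        # E-step: responsibility of each component for each observation
        resp = []
        for v in y:
            p = [w[k] * comps[k].pdf(v) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: update weights, means, and standard deviations
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            w[k] = nk / len(y)
            mu[k] = sum(r[k] * v for r, v in zip(resp, y)) / nk
            var = sum(r[k] * (v - mu[k]) ** 2 for r, v in zip(resp, y)) / nk
            sd[k] = max(math.sqrt(var), min_sd)
    # Enforce the ordering mu_1 < mu_2 used in the model definition
    order = sorted(range(2), key=lambda k: mu[k])
    w, mu, sd = ([vals[k] for k in order] for vals in (w, mu, sd))
    comps = [NormalDist(mu[k], sd[k]) for k in range(2)]
    loglik = sum(math.log(w[0] * comps[0].pdf(v) + w[1] * comps[1].pdf(v)) for v in y)
    return w, mu, sd, loglik


def aic(n_params, loglik):
    """AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * loglik


def partition(y, mu, sd):
    """Tumor i is 'methylated' if P(Y > y_i | comp. 1) < P(Y <= y_i | comp. 2)."""
    low, high = NormalDist(mu[0], sd[0]), NormalDist(mu[1], sd[1])
    return [(1.0 - low.cdf(v)) < high.cdf(v) for v in y]


# Toy methylation indexes (hypothetical): 30 tumors near 0 and 20 tumors near 2.
y = [-0.3 + 0.02 * i for i in range(30)] + [1.6 + 0.04 * i for i in range(20)]
w, mu, sd, ll1 = fit_two_gaussians(y)
aic0 = aic(2, loglik_one_gaussian(y))  # Model 0: one mean, one sd
aic1 = aic(5, ll1)                     # Model 1: two means, two sds, one free weight
prefers_mixture = aic1 < aic0
methylated = partition(y, mu, sd)
# Group-size filter from the text: each group needs at least three tumors
passes_size_filter = min(sum(methylated), len(y) - sum(methylated)) >= 3
```

On real data, the beta-value filter on the "methylated" group (threshold such as 0.15) would still need to be applied before a partition is accepted, and a library EM routine would likely be more robust than this hand-written loop.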

Classifier-2. The methylation profile of the genes in Set-A but not Set-B demonstrates a clear stratification pattern across tumors. That is, the distribution of the derived methylation indexes could be better fit by a bimodal Gaussian (normal) mixture model than a simple normal model. It is worth noting that the efficacy of Classifier-1 in the statistical identification of ts-HMGs and methylation stratification is somewhat subject to the number of analyzed (or available) tumor and normal sample pairs. If that number is sufficiently large, such as 500 rather than 50, more ts-HMGs could be identified and Set-A would account for a higher percentage.

epi-TSGs

Fifty-seven candidate epi-TSGs (Fig. 2A; Supplementary Table S1), amounting to 8% of the ts-HMGs, are selected using the procedure and criteria described in the Materials and Methods section. Three genes, including SEPT9, ELAVL2, and TNFAIP8, are "epigenetically activated." That is, unlike epi-TSGs, these outliers have higher expression levels in the methylated tumors compared with the unmethylated ones.

For each of the epi-TSGs, we calculate its "methylation prevalence" as the ratio of the methylated tumors to the total


considered tumors (N = 50). As shown in Fig. 2B, the distribution of the methylation prevalence is skewed with a long tail on the right side. The epi-TSGs with the ratios between 0.1 and 0.2 are most populous. Only two genes are methylated in over 50% of tumors. For each tumor, we calculate its "methylation burden" as the number of methylated epi-TSGs. The values distribute evenly in the interval of 0 to 22. A few tumors have over 25 methylated epi-TSGs (Fig. 2C). A strong positive correlation ($r = 0.49$; $P = 3.7 \times 10^{-4}$) is demonstrated between the methylation burdens and the logarithm-translated PSA levels of patients. We further notice that the association is dominated by the 5 influential observations where the tumors have the top methylation burdens. If these observations are filtered out in the calculation as "outliers," the correlation will become insignificant ($r = 0.17$; $P = 0.78$). We also find that a few tumors which are TMPRSS2-ERG fusion-negative but host a relatively heavier point mutation burden on canonical driver genes tend to have a larger methylation burden (Supplementary Fig. S1).

Table 1. Sample statistics of the datasets used to evaluate the diagnostic and prognostic utility of the identified epi-TSGs

                         TCGA    GSE21032    GSE83917
Pathologic GS
  NA                     148     1           0
  G6                     58      41          16
  G7                     166     74          57
  G8                     43      8           0
  G9                     39      7           0
  G10                    1       0           0
Pathologic T category
  NA                     3       0           0
  T2a                    14      9           4
  T2b                    12      47          1
  T2c                    154     29          35
  T3a                    149     28          24
  T3b                    114     10          9
  T3c                    0       2           0
  T4                     9       6           0
BCR
  NA                     66      0           0
  NO                     340     104         55
  YES                    49      27          18

NOTE: The pathologic GSs in the TCGA data are the reviewed GSs retrieved from ref. 14.

Of the 57 epi-TSGs, 22 genes pinpointed by Classifier-1 have clear across-tumors stratification patterns (Fig. 1 and Fig. 3), as demonstrated by the two-mode distribution of the derived methylation indexes. The genomic information and statistical analysis results of these genes are summarized in Table 2. Evidence of potential tumor suppression functions has been reported in 10 (including DACT2, C3orf14, and eight others) of these 22 genes and in 11 of the 35 epi-TSGs identified by Classifier-2. The relevant biological processes include the Wnt/beta-catenin pathway, epithelial–mesenchymal transition, AMPK signaling, and others (Supplementary Table S1). For such a gene, the well-defined "un-methylated" or "methylated" status is somewhat analogous to its genetic type "wild" or "mutant," respectively. The recurrent methylation events in tumors suggest that, similar to the germline and [...] in well-known tumor driver genes (such as TP53), the epigenetic alterations may contribute to the selection advantage of cancer cells over normal cells in tumor formation and progression. On the other hand, because the promoter methylation status is not the only determinant for gene transcription, the expression levels of an epi-TSG in tumors can be similar to or different from those in the normal tissue samples, as demonstrated in Supplementary Fig. S2.

Robustness of epigenetic gene silencing

In the procedure for identifying epi-TSGs, we determine epigenetic gene silencing via comparing the methylated and unmethylated groups. However, this approach has a limitation in that the preceding tumor classification (or partition) step needs the methylation information of paired normal tissue samples, which could be unavailable in most cases. An alternative method, which is less direct but less data-demanding, is to evaluate the

Figure 2. Characterization of epi-TSGs. A, The volcano plot for the differences of expression levels of ts-HMGs between the gene-specific "methylated" groups and "un-methylated" groups. The P values (y axis) are estimated by the Mann–Whitney test. Each point represents a gene, and the identified epi-TSGs are denoted by the solid circles. Among the genes located within the top-left region defined by the horizontal and vertical dotted lines, two are excluded from the epi-TSG list because their expression levels (FPKMs) are low (< 2.0) in most tumor samples. B, The distribution of methylation prevalence (Prev) of epi-TSGs. Prev is calculated as the ratio of the methylated tumors to all the studied tumors. The first column bar indicates that 15 genes have a Prev between 0 and 0.1. C, The distribution of methylation burdens of tumors and the correlation with the blood PSA levels. Each tumor is represented as an open circle in the scatter plot. The methylation burden is correspondingly indicated by a short bar over the x axis.
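The selection behind panel A (a log2 fold change, a Mann–Whitney P value per gene, and Benjamini–Hochberg adjustment across genes) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Mann–Whitney P value uses the normal approximation without tie or continuity correction rather than an exact test, expression values are assumed to be strictly positive, and the gene names "G1" and "G2" and their expression values are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney P value via the normal approximation (assumed, no tie correction)."""
    n1, n2 = len(x), len(y)
    # U counts pairs with x > y; tied pairs contribute one half
    u = sum((a > b) + 0.5 * (a == b) for a in x for b in y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return 2 * (1 - NormalDist().cdf(abs(z)))


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = m - pos  # 1-based rank of pvals[i] in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def log2_fold_change(expr_meth, expr_unmeth):
    """Difference of the averages of log2-transformed expression levels."""
    return fmean(math.log2(v) for v in expr_meth) - fmean(math.log2(v) for v in expr_unmeth)


# Hypothetical expression levels: (methylated group, unmethylated group) per gene
genes = {
    "G1": ([1.0, 1.2, 0.9, 1.1, 0.8], [4.0, 5.0, 4.5, 5.5, 6.0, 4.2, 5.1]),
    "G2": ([4.0, 5.0, 6.0, 4.5, 5.2], [4.1, 5.1, 5.9, 4.4, 5.3, 4.8, 5.0]),
}
names = list(genes)
fcs = [log2_fold_change(*genes[g]) for g in names]
adj = bh_adjust([mann_whitney_p(*genes[g]) for g in names])
# Criteria from the text: FC < -0.35 and adjusted P value < 0.05
selected = [g for g, fc, q in zip(names, fcs, adj) if fc < -0.35 and q < 0.05]
```

With group sizes as small as the minimum of three tumors allowed by the classifiers, an exact Mann–Whitney test would be a safer choice than the approximation used here.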


Figure 3. The bimodal distributions of the methylation indexes in 9 of the epi-TSGs identified by Classifier-1. All of these genes and C3orf14 (whose methylation index distribution is presented in Fig. 1) have been reported to carry potential tumor suppression functions. In each plot, the dotted vertical line denotes the cutoff for separating the methylated tumors from unmethylated ones. A zero value of the methylation index indicates that the methylation level of tumorous cells is equal to the level of normal cells. When the index is 2, the methylation level of tumorous cells is 4 times ($2^2$) that of normal cells. For the 9 genes depicted in this figure, the dotted lines are exclusively on the right of the zero point, indicating that the methylation level of a methylated tumor is always higher than the level of its paired normal specimen (an intuitive criterion which is exerted by Classifier-2). Such a pattern is also observed for the other 13 epi-TSGs identified by Classifier-1.

correlation (r) between the gene expression levels and DNA methylation levels of tumor samples. That is, a negative r value is considered as the indicator of epigenetic gene silencing. Here, we first test the significance of the correlations in the 57 epi-TSGs using the expression and DNA methylation data of 405 TCGA tumors which are not used for epi-TSGs identification. As shown in Supplementary Fig. S3A, all the 57 Pearson correlation coefficients are negative, and 56 of them have P values less than 0.01, corresponding to a validation rate of 0.98 (56/57). We also perform the same analysis on two external datasets (GSE83917 and GSE55599) in which 49 and 55 of the 57 epi-TSGs have the complete information. On a less demanding significance criterion (P < 0.05), the validation rates of epigenetic gene silencing are 0.61 and 0.45, respectively (Supplementary Fig. S3B and S3C). In the results of GSE55599, two outliers (BCAT2 and C10orf13) show a significant positive rather than negative correlation between expression and methylation metrics. A main reason for the decreased validation rates on the two external datasets is the small cohort size. Due to the general low methylation prevalence (<0.2) and the potential variability over cohorts


(sourced from random sampling), for some of the epi-TSGs, the methylated tumors within a small sample set may be rare. In such a case, the methylation–expression association tends to be weak and elusive to detect. This perception is supported by the result of an additional analysis (Supplementary Fig. S4). That is, for normal tissues, only 10 of the 57 epi-TSGs have a significant negative correlation between the methylation and expression metrics.

Table 2. The overview of candidate epi-TSGs with the promoter methylation levels stratified across tumors

Gene (Chr.)          Prev    FC       P value      Ref.
OSBPL9 (chr1)        0.36    −0.39    1.73E−04
PKP1 (chr1)          0.32    −1.69    1.5E−05      (19)
QPCT (chr2)          0.46    −1.05    1.8E−04      (20)
B3GALNT1 (chr3)      0.18    −1.66    1.8E−04
C3orf14 (chr3)       0.44    −1.44    9.0E−04      (21)
SLC25A20 (chr3)      0.2     −1.07    7.0E−06
SUSD5 (chr3)         0.38    −1.03    6.8E−04
CDKL2 (chr4)         0.44    −1.18    9.9E−05
CXCL1 (chr4)         0.28    −2.41    9.0E−04
DACT2 (chr6)         0.24    −3.06    2.7E−04      (22, 23)
DLX6 (chr7)          0.26    −2.13    4.5E−04
RARRES2 (chr7)       0.64    −0.75    4.4E−03      (24)
ADAM32 (chr8)        0.3     −1.79    1.2E−04
NAPRT1 (chr8)        0.48    −1.34    7.1E−06      (25)
FBP1 (chr9)          0.2     −1.42    6.0E−04      (26)
C13orf38 (chr13)     0.26    −2.66    4.0E−05
ACOT4 (chr14)        0.2     −1.07    7.3E−05
FES (chr15)          0.2     −1.10    7.3E−05      (27, 28)
PARP6 (chr15)        0.62    −0.83    5.3E−04      (29)
HNF1B (chr17)        0.34    −1.68    1.8E−03      (30, 31)
ZNF135 (chr19)       0.2     −1.09    5.5E−05
SLC7A4 (chr22)       0.32    −2.08    2.2E−04

Abbreviations: FC, the fold change of the gene expression of the tumor samples in the methylated group compared with the unmethylated group. FC is calculated as the difference of the averages of log2-transformed expression levels; P value, the significance level for the differential gene expression between the methylated tumor group and the unmethylated one. It is estimated by the Mann–Whitney test; Prev, methylation prevalence, calculated as the ratio of the methylated tumors among all the considered tumors; Ref, references that report the tumor suppression function of the corresponding gene.

Clinical utility of the identified epi-TSGs

The above-mentioned correlation between the methylation burden and PSA level as well as the robustness of epigenetic gene silencing inspires us to further investigate the clinical utility of the identified epi-TSGs. We perform this study using an integrative method of machine learning and Fisher test. The expression profiling (E-profiling) and the methylation profiling (M-profiling) of the 57 genes are considered as potential predictive signatures for biochemical (PSA) recurrence (BCR) of patients, GSs of tumors, and cancer stages (T category). The results (Fig. 4) show that (i) in TCGA data, E-profiling and M-profiling have similar prediction strength for all the focused clinical features; (ii) E-profiling is a robust predictor for GS with almost identical AUC statistics (ranging from 0.75 to 0.79) achieved on the TCGA dataset and two external datasets; and (iii) the prediction strength of E-profiling is generally stronger than that of M-profiling. "Metastasis after treatment" is a more important clinical feature to predict. However, only in GSE21032 (among the three datasets) are the events of metastasis recorded for patients. From these limited data, we find that E-profiling is a highly promising marker for metastasis (AUC = 0.9; Supplementary Fig. S5).

Comparison with the TCGA group's relevant analysis

The TCGA group identified 164 epigenetically silenced genes in prostate cancer (14), among which 22 overlap with our epi-TSGs. In Supplementary Text S3, we present a comparative review of their results and methods. By doing so, we further demonstrate that our work represents a unique study in identifying epi-TSGs for prostate cancer (Supplementary Figs. S6A–S6C, S7A–S7C, and S8).

Discussion

Hypermethylation of CpG islands located in the promoter regions of genes is an important mechanism of gene inactivation. For a classical TSG, such as MLH1 that plays roles in DNA repair, the promoter methylation events after a genetic mutation on one allele can serve as the "second hit" for the loss of its normal functions (32). More commonly, methylation aberrations arise on less typical TSGs in which a genetically damaged allele is not necessarily recessive and somatic mutations are rarely observed. Such genes account for the majority of the epi-TSGs identified in our study and documented in the literature (14). In most publications, candidate epi-TSGs were selected (based on the association between the expression level and methylation level) from the genes for which the tumors had significantly higher methylation levels in the promoter CpG sites than normal tissue samples (33–35). The candidate epi-TSGs identified in such a way actually hold the distinguishing attribute of cancer diagnosis markers rather than tumor drivers. In our study, candidate epi-TSGs are selected from the genes whose methylation profile is stratified across tumors, i.e., both "methylated" and "un-methylated" promoter statuses are not rare among the tumors of a patient cohort. This makes the identified epi-TSGs resemble the canonical cancer driver genes whose most etiological property is the recurrence of mutated genotypes in tumors of different patients (36–38). The major unique point of our method is that the stratified methylation profile of an epi-TSG is identified and characterized via modeling the distribution of the gene-specific methylation indexes, which integrate the methylation level of an individual tumor, the purity of the tumor, and the methylation level of the paired normal tissue sample. A remaining challenge is how to augment the existing data so that the information of the tumors for which the DNA methylation of paired normal tissues is not available can also be included in the proposed mixture model, leading to improved accuracy and efficacy in identifying epi-TSGs.

We identify 57 candidate epi-TSGs in this study. They could be considered as a portion of the potential methylation-mediated TSGs in prostate cancer. This is because, in our analysis, only the genes with a methylation frequency over 5% are considered. Here, we emphasize the methodologic implication of identifying the 10 epi-TSGs highlighted in Table 2. That is, a couple of facts about these genes imply that, in detecting and pursuing driver methylation events, the statistical methods which have been developed for studying the discretely distributed driver mutations could be borrowed. The facts include (i) the tumor suppression functions of the 10 genes have been demonstrated or suggested in previous publications (Table 2; refs. 19–31) and (ii) the methylation statuses in a tumor are well determined by the clearly stratified across-tumor methylation profiles. We also note that these genes can be divided into three categories, representing an incomplete but typical taxonomy of potential epi-TSGs. The first category includes PKP1, QPCT, DACT2, RARRES2, NAPRT1, FBP1, PARP6,

204 Cancer Epidemiol Biomarkers Prev; 28(1) January 2019 Cancer Epidemiology, Biomarkers & Prevention

Downloaded from cebp.aacrjournals.org on September 25, 2021. © 2019 American Association for Cancer Research. Published OnlineFirst September 27, 2018; DOI: 10.1158/1055-9965.EPI-18-0491

Epigenetically Silenced Tumor Suppressor Genes in Prostate Cancer

Figure 4. The assessment of the clinical utility of the expression profiling (E-profiling) and promoter methylation profiling (M-profiling) of the identified 57 epi-TSGs. The two classes, i.e., "negative" and "positive", for BCR are "NO" and "YES." Similarly, the two classes for GSs are "<7" and "> ¼ 7," and the two classes for T-category are "T2" and "T3 & T4." In each plot, the red and black ROC curves represent the results of M-profiling and E-profiling, respectively. The sensitivity (Sn), specificity (Sp), P value, and AUC for the E-profiling are denoted by Sn(E), Sp(E), P value(E), and AUC(E), respectively. Similarly, these statistics for the M-profiling are denoted by Sn(M), Sp(M), P value(M), and AUC(M), respectively. In each plot, we report the better values of Sp, Sn, and P value obtained from either the E-profiling or M-profiling signature. The printed Sn, Sp, and P value are calculated from a contingency table which depends on the signs (þ or ) of the decision values of tumor cases. The ROC curve is generated from a set of contingency tables that are based on the signs of the differences between the decision values of tumor cases and serially changed cutoffs. The dataset GSE83917 that contains DNA methylation quantities is complemented by GSE84042 that contains gene expression quantities. and HNF1B, which have been well annotated regarding their 57 epi-TSGs may play roles in suppressing the invasion of tumor products and functions. The second includes FES, which is first cells to distant sites, which can lead to the extra release of PSA to identified as an but is also a potential TSG as implicated blood. However, to our knowledge, direct evidences for this by the recent genetic evidence. The third includes C3orf14, which argument are still missing in literature. has been widely reported as a promising TSG but remains to be The potential application of DNA methylation in the diagnosis substantially annotated. 
An elusive problem arises from the and prognosis of prostate cancer has been widely investigated in observed positive correlation between the methylation burdens the past decade. Most of the accumulated evidences are obtained and patient PSA levels. The observation implies that some of the from the studies which focus on a small set of genes, including

www.aacrjournals.org Cancer Epidemiol Biomarkers Prev; 28(1) January 2019 205

Downloaded from cebp.aacrjournals.org on September 25, 2021. © 2019 American Association for Cancer Research. Published OnlineFirst September 27, 2018; DOI: 10.1158/1055-9965.EPI-18-0491

Zhang et al.

GSTP1, APC, RUNX3, PITX2, RASSF1A, TGFB2, RARB, HOX genes, sion level of a gene is simultaneously regulated by other factors, and others (39, 40). For example, promoter methylation in such as miRNAs, beyond the promoter methylation status. APC and HOXD3 was identified as a biomarker for prostate cancer progression by (41) and (42), respectively. In the past Disclosure of Potential Conflicts of Interest years, with large-scale epigenome-wide DNA methylation pro- No potential conflicts of interest were disclosed. filing data becoming available, researchers in biomedical com- munities have been looking for biomarker panels of multiple Authors' Contributions CpG sites or genes, which are expected to be more predictive, Conception and design: W. Zhang, H.-W. Deng, K. Zhang compared with a single-gene signature, for the interested clin- Development of methodology: W. Zhang, H.-W. Deng, K. Zhang Acquisition of data (provided animals, acquired and managed patients, ical features of cancer patients. In several publications (43–45), fi provided facilities, etc.): W. Zhang the authors rstly scanned all the CpG sites of a high-through- Analysis and interpretation of data (e.g., statistical analysis, biostatistics, put methylation platform to identify the top (such as 5%) computational analysis): W. Zhang, H.-W. Deng, K. Zhang predictive ones for the interested classification (e.g., high-lethal Writing, review, and/or revision of the manuscript: W. Zhang, E.K. Flemington, vs. low-lethal) of tumors or the related clinical items. Then, an H.-W. Deng, K. Zhang integrated prediction signature was derived from the previously Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): W. Zhang, H.-W. Deng, K. Zhang selected features via the multivariate analysis and/or multi- Study supervision: K. Zhang regression analysis. However, such a supervised or semisuper- fi vised method often suffers from the problem of over tting. 
Acknowledgments That is, the robustness of the established prediction signature This research is supported by NIH grants 2G12MD007595, 5P20GM103424- cannot be guaranteed (Supplementary Text S4; Supplementary 15, 3P20GM103424-15S1, P01CA214091, and U19AG055373, and DOD-ARO Fig. S9A–S9C). In our study, the epi-TSGs are identified without grant W911NF-15-1-0510. The used TCGA data reside at https://portal.gdc. using the progression features of tumors. So, it is no wonder cancer.gov/legacy-archive/search/f. We thank the three reviewers for their that the prediction strength of the methylation profiling and constructive comments. expression profiling (for clinical features, especially GS) is The costs of publication of this article were defrayed in part by the payment of observed in not only the TCGA data but also those two external page charges. This article must therefore be hereby marked advertisement in datasets. It is also logical that the performance of the expression accordance with 18 U.S.C. Section 1734 solely to indicate this fact. profiling is generally better than thatofthemethylationpro- filing. The reason is that DNA methylation of epi-TSGs takes Received May 3, 2018; revised July 23, 2018; accepted September 19, 2018; part in tumor progression via silencing genes, and the expres- published first September 27, 2018.
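The Figure 4 caption states that Sn, Sp, and the P value come from one contingency table built on the signs of the decision values, and that the ROC curve comes from repeating this with serially changed cutoffs. The following is a minimal Python sketch of that bookkeeping under stated assumptions, not the authors' implementation: the function names and toy values are hypothetical, the decision values are assumed to come from some already-trained classifier that is not shown, and the two-sided form of the Fisher exact test is an assumed convention that the text does not specify.

```python
from math import comb


def contingency(decisions, labels, cutoff=0.0):
    """Count a 2x2 table from the sign of (decision value - cutoff).

    labels: 1 for the "positive" class (e.g. GS >= 7), 0 for "negative".
    Returns (tp, fn, fp, tn).
    """
    tp = fn = fp = tn = 0
    for d, y in zip(decisions, labels):
        called_positive = (d - cutoff) > 0
        if y == 1 and called_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif called_positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def sensitivity_specificity(tp, fn, fp, tn):
    """Sn = TP / (TP + FN); Sp = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def fisher_two_sided(tp, fn, fp, tn):
    """Two-sided Fisher exact P value for the 2x2 table (assumed convention)."""
    row_pos, row_neg, col_called = tp + fn, fp + tn, tp + fp
    total = row_pos + row_neg

    def prob(a):
        # Hypergeometric probability of `a` true positives given the margins.
        return comb(row_pos, a) * comb(row_neg, col_called - a) / comb(total, col_called)

    observed = prob(tp)
    low, high = max(0, col_called - row_neg), min(row_pos, col_called)
    # Sum every table that is no more probable than the observed one.
    return sum(p for p in map(prob, range(low, high + 1)) if p <= observed * (1 + 1e-9))


def roc_auc(decisions, labels):
    """ROC points from serially changed cutoffs, and the trapezoidal AUC."""
    points = [(1.0, 1.0)]  # a cutoff below every decision value calls all positive
    for cutoff in sorted(set(decisions)):
        tp, fn, fp, tn = contingency(decisions, labels, cutoff)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    points.sort()  # ascending false-positive rate, then sensitivity
    auc = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        auc += (x1 - x0) * (y0 + y1) / 2
    return points, auc


# Toy decision values and labels, invented purely for illustration.
toy_decisions = [-1.0, -0.5, 0.2, 0.4, 0.9, -0.3, 1.1, -0.2]
toy_labels = [0, 0, 0, 1, 1, 1, 1, 0]
toy_table = contingency(toy_decisions, toy_labels)
toy_sn, toy_sp = sensitivity_specificity(*toy_table)
toy_p = fisher_two_sided(*toy_table)
toy_points, toy_auc = roc_auc(toy_decisions, toy_labels)
```

The sketch assumes that both classes are present, since Sn and Sp are otherwise undefined. The cutoff of zero reproduces the single printed table, while sweeping the cutoff over the observed decision values gives the ROC points.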

References

1. Siegel R, Ma J, Zou Z, Jemal A. Cancer statistics, 2014. CA Cancer J Clin 2014;64:9–29.
2. Bunz F. Principles of cancer genetics. Dordrecht: Springer; 2008. p.xi, 325.
3. Majumdar S, Buckles E, Estrada J, Koochekpour S. Aberrant DNA methylation and prostate cancer. Curr Genomics 2011;12:486–505.
4. Weber M, Hellmann I, Stadler MB, Ramos L, Paabo S, Rebhan M, et al. Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome. Nat Genet 2007;39:457–66.
5. Ehrlich M. DNA methylation in cancer: too much, but also too little. Oncogene 2002;21:5400–13.
6. Holliday R, Ho T. DNA methylation and epigenetic inheritance. Methods 2002;27:179–83.
7. Lim HN, van Oudenaarden A. A multistep epigenetic switch enables the stable inheritance of DNA methylation states. Nat Genet 2007;39:269–75.
8. Park JY. Promoter hypermethylation in prostate cancer. Cancer Control 2010;17:245–55.
9. Paz MF, Wei S, Cigudosa JC, Rodriguez-Perales S, Peinado MA, Huang TH, et al. Genetic unmasking of epigenetically silenced tumor suppressor genes in colon cancer cells deficient in DNA methyltransferases. Hum Mol Genet 2003;12:2209–19.
10. Kazanets A, Shorstova T, Hilmi K, Marques M, Witcher M. Epigenetic silencing of tumor suppressor genes: paradigms, puzzles, and potential. Biochim Biophys Acta 2016;1865:275–88.
11. Kandoth C, McLellan MD, Vandin F, Ye K, Niu B, Lu C, et al. Mutational landscape and significance across 12 major cancer types. Nature 2013;502:333–9.
12. Guo S, Diep D, Plongthongkum N, Fung HL, Zhang K, Zhang K. Identification of methylation haplotype blocks aids in deconvolution of heterogeneous tissue samples and tumor tissue-of-origin mapping from plasma DNA. Nat Genet 2017;49:635–42.
13. Eckhardt F, Lewin J, Cortese R, Rakyan VK, Attwood J, Burger M, et al. DNA methylation profiling of human chromosomes 6, 20 and 22. Nat Genet 2006;38:1378–85.
14. The Cancer Genome Atlas Research Network. The molecular taxonomy of primary prostate cancer. Cell 2015;163:1011–25.
15. Cristianini N, Shawe-Taylor J. An introduction to support vector machines: and other kernel-based learning methods. Cambridge; New York: Cambridge University Press; 2000. p.xiii, 189.
16. Fraser M, Sabelnykova VY, Yamaguchi TN, Heisler LE, Livingstone J, Huang V, et al. Genomic hallmarks of localized, non-indolent prostate cancer. Nature 2017;541:359–64.
17. Taylor BS, Schultz N, Hieronymus H, Gopalan A, Xiao Y, Carver BS, et al. Integrative genomic profiling of human prostate cancer. Cancer Cell 2010;18:11–22.
18. Paziewska A, Dabrowska M, Goryca K, Antoniewicz A, Dobruch J, Mikula M, et al. DNA methylation status is more reliable than gene expression at detecting cancer in prostate biopsy. Br J Cancer 2014;111:781–9.
19. Kaz AM, Luo Y, Dzieciatkowski S, Chak A, Willis JE, Upton MP, et al. Aberrantly methylated PKP1 in the progression of Barrett's esophagus to esophageal adenocarcinoma. Genes Chromosomes Cancer 2012;51:384–93.
20. Morris MR, Ricketts CJ, Gentle D, McRonald F, Carli N, Khalili H, et al. Genome-wide methylation analysis identifies epigenetically inactivated candidate tumour suppressor genes in renal cell carcinoma. Oncogene 2011;30:1390–401.
21. Lando M, Fjeldbo CS, Wilting SM, B CS, Aarnes EK, Forsberg MF, et al. Interplay between promoter methylation and chromosomal loss in gene silencing at 3p11-p14 in cervical cancer. Epigenetics 2015;10:970–80.
22. Wang S, Dong Y, Zhang Y, Wang X, Xu L, Yang S, et al. DACT2 is a functional tumor suppressor through inhibiting Wnt/beta-catenin pathway and associated with poor survival in colon cancer. Oncogene 2015;34:2575–85.
23. Hou J, Liao LD, Xie YM, Zeng FM, Ji X, Chen B, et al. DACT2 is a candidate tumor suppressor and prognostic marker in esophageal squamous cell carcinoma. Cancer Prev Res 2013;6:791–800.
24. Liu-Chittenden Y, Jain M, Gaskins K, Wang S, Merino MJ, Kotian S, et al. RARRES2 functions as a tumor suppressor by promoting beta-catenin phosphorylation/degradation and inhibiting p38 phosphorylation in adrenocortical carcinoma. Oncogene 2017;36:3541–52.
25. Shames DS, Elkins K, Walter K, Holcomb T, Du P, Mohl D, et al. Loss of NAPRT1 expression by tumor-specific promoter methylation provides a novel predictive biomarker for NAMPT inhibitors. Clin Cancer Res 2013;19:6912–23.
26. Alderton GK. Tumorigenesis: FBP1 is suppressed in kidney tumours. Nat Rev Cancer 2014;14:575.
27. Olvedy M, Tisserand JC, Luciani F, Boeckx B, Wouters J, Lopez S, et al. Comparative oncogenomics identifies tyrosine kinase FES as a tumor suppressor in melanoma. J Clin Invest 2017;127:2310–25.
28. Greer PA, Kanda S, Smithgall TE. The contrasting oncogenic and tumor suppressor roles of FES. Front Biosci (Schol Ed) 2012;4:489–501.
29. Qi G, Kudo Y, Tang B, Liu T, Jin S, Liu J, et al. PARP6 acts as a tumor suppressor via downregulating Survivin expression in colorectal cancer. Oncotarget 2016;7:18812–24.
30. Buchner A, Castro M, Hennig A, Popp T, Assmann G, Stief CG, et al. Downregulation of HNF-1B in renal cell carcinoma is associated with tumor progression and poor prognosis. Urology 2010;76:507.e6–11.
31. Rebouissou S, Vasiliu V, Thomas C, Bellanne-Chantelot C, Bui H, Chretien Y, et al. Germline hepatocyte nuclear factor 1alpha and 1beta mutations in renal cell carcinomas. Hum Mol Genet 2005;14:603–14.
32. Gausachs M, Mur P, Corral J, Pineda M, Gonzalez S, Benito L, et al. MLH1 promoter hypermethylation in the analytical algorithm of Lynch syndrome: a cost-effectiveness study. Eur J Hum Genet 2012;20:762–8.
33. Charlet J, Tomari A, Dallosso AR, Szemes M, Kaselova M, Curry TJ, et al. Genome-wide DNA methylation analysis identifies MEGF10 as a novel epigenetically repressed candidate tumor suppressor in neuroblastoma. Mol Carcinog 2017;56:1290–301.
34. Chen C, Zhang C, Cheng L, Reilly JL, Bishop JR, Sweeney JA, et al. Correlation between DNA methylation and gene expression in the brains of patients with bipolar disorder and schizophrenia. Bipolar Disord 2014;16:790–9.
35. Zheng Y, Huang Q, Ding Z, Liu T, Xue C, Sang X, et al. Genome-wide DNA methylation analysis identifies candidate epigenetic markers and drivers of hepatocellular carcinoma. Brief Bioinform 2018;19:101–8.
36. D'Antonio M, Ciccarelli FD. Integrated analysis of recurrent properties of cancer genes to identify novel drivers. Genome Biol 2013;14:R52.
37. Lawrence MS, Stojanov P, Polak P, Kryukov GV, Cibulskis K, Sivachenko A, et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 2013;499:214–8.
38. Dees ND, Zhang Q, Kandoth C, Wendl MC, Schierding W, Koboldt DC, et al. MuSiC: identifying mutational significance in cancer genomes. Genome Res 2012;22:1589–98.
39. Liu L, Kron KJ, Pethe VV, Demetrashvili N, Nesbitt ME, Trachtenberg J, et al. Association of tissue promoter methylation levels of APC, TGFbeta2, HOXD3 and RASSF1A with prostate cancer progression. Int J Cancer 2011;129:2454–62.
40. Phe V, Cussenot O, Roupret M. Methylated genes as potential biomarkers in prostate cancer. BJU Int 2010;105:1364–70.
41. Richiardi L, Fiano V, Vizzini L, De Marco L, Delsedime L, Akre O, et al. Promoter methylation in APC, RUNX3, and GSTP1 and mortality in prostate cancer patients. J Clin Oncol 2009;27:3161–8.
42. Kron KJ, Liu L, Pethe VV, Demetrashvili N, Nesbitt ME, Trachtenberg J, et al. DNA methylation of HOXD3 as a marker of prostate cancer progression. Lab Invest 2010;90:1060–7.
43. Mundbjerg K, Chopra S, Alemozaffar M, Duymich C, Lakshminarasimhan R, Nichols PW, et al. Identifying aggressive prostate cancer foci using a DNA methylation classifier. Genome Biol 2017;18:3.
44. Geybels MS, Wright JL, Bibikova M, Klotzle B, Fan JB, Zhao S, et al. Epigenetic signature of Gleason score and prostate cancer recurrence after radical prostatectomy. Clin Epigenetics 2016;8:97.
45. Zhao S, Geybels MS, Leonardson A, Rubicz R, Kolb S, Yan Q, et al. Epigenome-wide tumor DNA methylation profiling identifies novel prognostic biomarkers of metastatic-lethal progression in men diagnosed with clinically localized prostate cancer. Clin Cancer Res 2017;23:311–9.


Epigenetically Silenced Candidate Tumor Suppressor Genes in Prostate Cancer: Identified by Modeling Methylation Stratification and Applied to Progression Prediction

Wensheng Zhang, Erik K. Flemington, Hong-Wen Deng, et al.

Cancer Epidemiol Biomarkers Prev 2019;28:198-207. Published OnlineFirst September 27, 2018.

Updated version: Access the most recent version of this article at doi:10.1158/1055-9965.EPI-18-0491

Supplementary Material: Access the most recent supplemental material at http://cebp.aacrjournals.org/content/suppl/2018/09/27/1055-9965.EPI-18-0491.DC1

Cited articles: This article cites 43 articles, 5 of which you can access for free at http://cebp.aacrjournals.org/content/28/1/198.full#ref-list-1
