Breast Cancer Research and Treatment (2019) 178:565–572 https://doi.org/10.1007/s10549-019-05417-3

PRECLINICAL STUDY

CpG methylation signature predicts prognosis in breast cancer

Tonghua Du1 · Bin Liu1 · Zhenyu Wang1 · Xiaoyu Wan1 · Yuanyu Wu2

Received: 14 May 2019 / Accepted: 22 August 2019 / Published online: 13 September 2019 © Springer Science+Business Media, LLC, part of Springer Nature 2019

Abstract

Purpose DNA methylation can serve as a prognostic biomarker in various types of cancer. We aimed to identify a CpG methylation signature that predicts prognosis in breast cancer.

Methods Using microarray data from The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO), we profiled DNA methylation in 97 healthy control samples and 786 breast cancer samples in a training cohort (from TCGA, n = 883) and built a gene classifier using a penalized regression model. We validated the prognostic accuracy of this gene classifier in an internal validation cohort (from GEO, n = 72).

Results A total of 1777 differentially methylated CpGs, corresponding to 1777 differentially methylated genes (DMGs) between breast cancer and control samples, were selected for this study. From these, 16 CpGs were used to classify patients into high-risk and low-risk groups in the training cohort. Patients with high-risk scores in the training cohort had shorter overall survival than patients with low-risk scores (hazard ratio [HR] 4.674; 95% CI 2.918 to 7.487; P = 1.678e–12). The prognostic accuracy was confirmed in the validation cohort. In stratified analyses, age > 60 years versus age < 60 years was significantly associated with overall survival in patients with a high-risk score (HR 2.088, 95% CI 1.348 to 3.235; p = 7.575e–04) but not in patients with a low-risk score (HR 1.246, 95% CI 0.515 to 3.011; p = 0.625). Radiotherapy versus no radiotherapy was associated with improved overall survival in patients with a high-risk score (HR 0.418, 95% CI 0.249 to 0.703; p = 6.991e–04) but not in patients with a low-risk score (HR 2.092, 95% CI 0.574 to 7.629; p = 0.253). Recurrence versus no recurrence was significantly associated with overall survival both in patients with a high-risk score (HR 7.475, 95% CI 4.333 to 12.901; p = 6.991e–04) and in patients with a low-risk score (HR 14.33, 95% CI 4.265 to 48.17; p = 4.883e–13).

Conclusion The 16-CpG-based signature is useful as a biomarker for predicting prognosis in patients with breast cancer.

Keywords Breast cancer · DNA methylation · Prognosis · Overall survival · CpG sites

Introduction

Breast cancer, the most frequent cancer among women, is the leading cause of cancer mortality in women, and its incidence is increasing markedly among young women aged less than 40 years [1, 2]. According to 2016 statistics, about 246,660 women were diagnosed with breast cancer, and an estimated 40,450 were expected to die of the disease [3]. Although many advanced treatment techniques have been introduced in recent years to improve the survival rate, patients still suffer from low quality of life or develop metastasis because of late diagnosis [4]. Prognosis-related markers of breast cancer could therefore broaden treatment options, including surgical resection and therapeutic interventions [5].

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s10549-019-05417-3) contains supplementary material, which is available to authorized users.

* Yuanyu Wu [email protected]

1 Department of Breast Surgery, The Second Clinical Hospital of Jilin University, No. 218, Ziqiang Street, Nanguan District, Changchun 130000, Jilin, China

2 Department of Gastrointestinal and Colorectal Surgery, China-Japan Union Hospital of Jilin University, No. 126 Xiantai St, Changchun, Jilin 130033, China


Biomarkers strongly associated with breast cancer risk factors provide an opportunity to understand cancer development. Gene expression profiles have also been used for breast cancer classification and have served as prognostic and therapeutic predictors. However, accurate early prediction of breast cancer incidence, detection, and prognosis remains a major challenge. There is thus a great need to identify sensitive biomarkers for evaluating breast cancer prognosis at an early, treatable stage.

DNA methylation refers to heritable and modifiable markers that regulate gene expression without changing the underlying DNA sequence. Many studies have demonstrated that DNA methylation may play a critical role in carcinogenesis by downregulating the expression of tumor suppressor genes [6]. Aberrant DNA methylation has also been found to be an important factor in breast cancer. For example, using a DNA methylation array, Tang et al. identified breast cancer-associated RPTOR, MGRN1, and RAPSN hypomethylation in peripheral blood DNA [3]. Given that DNA methylation changes are plausibly critical components of the molecular mechanisms involved in breast cancer, distinct DNA methylation profiles could be a potential biomarker to improve the accuracy of breast cancer prognosis.

In the present study, using Illumina 450 K DNA methylation array data of breast cancer downloaded from The Cancer Genome Atlas (TCGA) database and the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/), we aimed to screen breast cancer-related genes with aberrant DNA methylation, construct a model from the candidate genes in a training cohort of 883 samples, and validate those genes in a validation cohort of 72 samples.

Materials and methods

Microarray data

The DNA methylation profile (Illumina Infinium Human Methylation 450 BeadChip) was downloaded from TCGA (https://gdc-portal.nci.nih.gov/); it included 97 healthy control samples and 786 breast cancer samples, which formed the training cohort (Table S1). In addition, the DNA methylation profile of GSE37754 (Illumina Infinium Human Methylation 450 BeadChip) was downloaded from GEO; it contained 10 healthy control samples and 62 breast cancer samples, which formed the validation cohort.

Screening of differentially methylated CpGs

According to the overall survival prognosis information from TCGA, the training cohort was divided into two groups, a tumor group and a control group. Next, the methylation loci at the genes' CpG sites were further screened for analysis. Methylation at each CpG site is described as a β value, β = M / (U + M + 100), where M is the intensity of the methylated allele and U is the intensity of the unmethylated allele [7]. It is expressed as a continuous variable that ranges from 0 (no methylation) to 1 (full methylation). The limma (linear models for microarray data) package in R was used to screen the differentially methylated genes (DMGs) [8]. The Bonferroni procedure in the multi-test package was employed to adjust the p values to false discovery rates (FDR) [9]. FDR < 0.05 and |log2 fold-change (FC)| > 0.5 were used as the cutoff criteria for the DMGs.

Co-methylated genes screening

The WGCNA (weighted gene coexpression network analysis, version 1.63, https://cran.r-project.org/web/packages/WGCNA/index.html) package in R was used to analyze the correlation among the DMGs [10]. Modules were defined as clusters of highly interconnected DMGs; DMGs within the same cluster have high correlation coefficients with one another.

Prognosis-related methylated genes screening

Univariate and multivariate Cox regression analyses in the survival package (Version 2.41-1, http://bioconductor.org/packages/survivalr/) in R were used to identify DMGs associated with prognosis, with a threshold of log-rank p < 0.05 [11]. In addition, the penalized package (Version 0.9-50, http://bioconductor.org/packages/penalized/) [12] was applied to identify the optimized DMGs based on the Cox-PH (Cox proportional hazards) model [13]. The optimized parameter "lambda" was obtained by 1000 cross-validation likelihood cycles.

Prognostic model construction

To better investigate the performance of those DMGs in predicting prognosis, a risk score was built with coefficients weighted by the penalized Cox model. The risk score was calculated with the following formula:

$$\text{Risk score} = \sum_{\text{gene}} \text{Coef}_{\text{gene}} \times \text{Methylation}_{\text{gene}}$$

where Coef_gene denotes the coefficient of a gene and Methylation_gene is the methylation level of that gene.

Using the median of the risk score as the cutoff point, the training cohort was divided into a high-risk group and a low-risk group. Kaplan–Meier plots were then used to illustrate overall survival, and the performance of the risk score was assessed by receiver-operating characteristic (ROC) analysis. Moreover, we also performed univariate and multivariate Cox

regression analysis using backward selection to test the independent significance of different factors; variables with p > 0.1 were removed from the analysis as nonsignificant, whereas marginally significant variables (0.05 < p < 0.1) remained in the final Cox model. Covariates included radiotherapy (yes vs. no), age (> 60 vs. < 60), and concurrent chemotherapy (no vs. yes).

Results

Analysis of DMGs

Based on the annotation data of the Illumina 450 K methylation platform, 10,443 unique CpG sites were obtained. Subsequently, 1777 significant DMGs were identified between the tumor group and the control group in the training cohort, including 236 hypomethylated genes and 1541 hypermethylated genes. Volcano plots were used to visualize and assess the variation (or reproducibility) of DMGs between the tumor group and the control group (Fig. 1a). Two-way clustering revealed that the methylation patterns of the tumor group and the control group were distinguishable (Fig. 1c).

Co-methylation analysis

According to the WGCNA, nine modules (617 DMGs), shown in different colors, were identified, of which six modules (brown, green, yellow, black, turquoise, and red) were significantly associated with breast cancer (Figure S1).

Selection of candidate DMGs

The univariate Cox regression analysis yielded 123 DMGs that were significantly related to prognosis. Subsequently, the multivariate Cox regression analysis selected 20 DMGs. Finally, 16 optimized DMGs, namely ZNF462, SYCN, PCSK1, CHL1, CORIN, RAB13, USP4, KCNA1, ZNF274, BHLHA9, HBM, PXN, GRP, SYCE1L, COL4A2, and CRHR1, were identified using the Cox-PH model (Table 1).

Building a predictive signature

A risk score was built with coefficients weighted by the penalized Cox-PH model in the training cohort. The risk score was calculated as follows:

Risk score = (−0.9271 × methylation level of ZNF462) + (0.5220 × methylation

[Figure 1. a Scatter plot of tumor methylation level against normal methylation level, with a regulation legend (Hypermethylated, No, Hypomethylated); b density of log2(FC), with N = 236 (13.28%) and N = 1541 (86.72%); c two-way clustering heat map of control and tumor samples]

Fig. 1 Analysis of the significantly differentially methylated CpGs. a The volcano plot shows the significantly differentially methylated CpG-related DMGs; red points represent the hypermethylated DMGs, blue points represent the hypomethylated DMGs; b Log2 kernel density curve based on DMGs; c The two-way clustering of DMGs; the colors indicate methylation levels, and from green to red indicates methylation levels from low to high, respectively
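The screening rule behind Fig. 1 (a β value per CpG, then FDR < 0.05 and |log2 FC| > 0.5) can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which used limma in R. The function names and toy inputs are hypothetical, and the Benjamini–Hochberg procedure is assumed for the FDR step because it is the method of reference [9].

```python
from typing import List


def beta_value(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """Beta value as defined in the Methods: M / (U + M + 100)."""
    return methylated / (unmethylated + methylated + offset)


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values (assumed FDR procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def classify_cpgs(log2_fc: List[float], p_values: List[float],
                  fdr_cutoff: float = 0.05, fc_cutoff: float = 0.5) -> List[str]:
    """Label each CpG using the cutoffs stated in the Methods."""
    fdr = benjamini_hochberg(p_values)
    labels = []
    for fc, q in zip(log2_fc, fdr):
        if q < fdr_cutoff and fc > fc_cutoff:
            labels.append("hypermethylated")
        elif q < fdr_cutoff and fc < -fc_cutoff:
            labels.append("hypomethylated")
        else:
            labels.append("not significant")
    return labels


# Toy inputs (made up for illustration, not data from the study).
example_labels = classify_cpgs([1.2, -0.8, 0.3, 0.9], [0.001, 0.004, 0.002, 0.2])
print(example_labels)
```

In this toy run the third CpG passes the FDR cutoff but fails the fold-change cutoff, and the fourth fails the FDR cutoff, so both are labeled not significant.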


Table 1 The optimized CpG data

ID          Gene    Chr.   Position   Location  Coef     Hazard ratio  95% CI           P value
cg13804575  ZNF462  chr9   108666820  5′UTR     −0.9271  0.1110        0.0221–0.557     7.560E−03
cg22290648  SYCN    chr19  44386549   1stExon   0.5220   6.7910        1.281–9.199      2.437E−02
cg23187653  PCSK1   chr5   95794764   TSS200    −0.1412  0.1053        0.0164–0.674     1.750E−02
cg25482786  CHL1    chr3   213496     TSS200    2.4676   6.5910        4.865–9.012      4.280E−05
cg26232715  CORIN   chr4   47533975   promoter  1.8749   2.8250        1.769–5.412      1.811E−02
cg26336059  RAB13   chr1   152225601  TSS200    −2.1655  0.0224        0.00825–0.0606   7.930E−03
cg26353296  USP4    chr3   49353128   promoter  −0.5395  0.0944        0.0212–0.419     1.920E−03
cg26590537  KCNA1   chr12  4890463    5′UTR     −0.5816  0.0870        0.00879–0.860    3.673E−02
cg26698460  ZNF274  chr19  63407816   Body      −0.7784  0.0776        0.00671–0.896    4.058E−02
cg26953640  BHLHA9  chr17  1121223    1stExon   −0.6012  0.1043        0.0115–0.945     4.449E−02
cg26976732  HBM     chr16  156100     Body      −0.5501  0.0854        0.0121–0.602     1.358E−02
cg27168573  PXN     chr12  119185312  Body      −1.5005  0.0423        0.00303–0.589    1.862E−02
cg27338487  GRP     chr18  55038384   1stExon   1.5255   2.7970        1.004–7.793      4.974E−02
cg27430961  SYCE1L  chr16  75804158   promoter  1.5883   4.6790        2.976–6.364      2.500E−04
cg27546237  COL4A2  chr13  109758453  TSS1500   0.3422   1.6660        1.586–2.175      1.905E−02
cg27551605  CRHR1   chr17  41218673   Body      −2.6558  0.0025        0.000195–0.0318  4.010E−06

ID CpG ID, Gene the CpG-related gene, Chr. chromosome, Location the location of the methylated site on the gene, Coef Cox regression coefficient

level of SYCN) − (0.1412 × methylation level of PCSK1) + (2.4676 × methylation level of CHL1) + (1.8749 × methylation level of CORIN) − (2.1655 × methylation level of RAB13) − (0.5395 × methylation level of USP4) − (0.5816 × methylation level of KCNA1) − (0.7784 × methylation level of ZNF274) − (0.6012 × methylation level of BHLHA9) − (0.5501 × methylation level of HBM) − (1.5005 × methylation level of PXN) + (1.5255 × methylation level of GRP) + (1.5883 × methylation level of SYCE1L) + (0.3422 × methylation level of COL4A2) − (2.6558 × methylation level of CRHR1).

Using the median of the risk score as the cutoff point, the 393 high-risk patients (50%) had poorer overall survival (hazard ratio [HR], 4.674; 95% CI 2.918 to 7.487; P = 1.678e–12; Fig. 2a) than the 393 low-risk patients (50%) in the training cohort. The validation analyses were performed in an internal validation cohort, in which 31 patients (50%) were categorized into the high-risk group and 31 patients (50%) into the low-risk group (HR 2.679; 95% CI 1.224 to 5.863; P = 0.0104; Fig. 2b). The AUCs for the training and validation datasets were 0.972 and 0.942, respectively.

The univariate and multivariate analyses of overall survival by clinical factors in the training cohort are shown in Table 2. The stratified analysis by age, radiotherapy, and recurrence showed that age > 60 years versus age < 60 years was significantly associated with overall survival in patients with a high-risk score (HR 2.088, 95% CI 1.348 to 3.235; p = 7.575e–04) but not in patients with a low-risk score (HR 1.246, 95% CI 0.515 to 3.011; p = 0.625). Radiotherapy versus no radiotherapy was associated with improved overall survival in patients with a high-risk score (HR 0.418, 95% CI 0.249 to 0.703; p = 6.991e–04) but not in patients with a low-risk score (HR 2.092, 95% CI 0.574 to 7.629; p = 0.253). Recurrence versus no recurrence was significantly associated with overall survival both in patients with a high-risk score (HR 7.475, 95% CI 4.333 to 12.901; p = 6.991e–04) and in patients with a low-risk score (HR 14.33, 95% CI 4.265 to 48.17; p = 4.883e–13) (Fig. 3).

Discussion

In this study, we used methylation microarray data from TCGA and GEO to develop and validate a novel prognostic tool based on 16 CpGs corresponding to 16 genes (ZNF462, SYCN, PCSK1, CHL1, CORIN, RAB13, USP4, KCNA1, ZNF274, BHLHA9, HBM, PXN, GRP, SYCE1L, COL4A2, and CRHR1), referred to as DMGs. Compared with clinical risk factors, this tool has an improved ability to predict prognosis in patients with breast cancer. Our results indicated that the 16 DMGs identified in this study could be used to categorize patients into high-risk and low-risk groups with significantly different overall survival.

Several studies have reported DNA methylation profiles in BC. For example, by means of bisulfite


[Figure 2. a Training set (TCGA, N = 786): Kaplan–Meier curves (survival ratio against overall survival time in months) for the low-risk (N = 393) and high-risk (N = 393) groups, p = 1.678e-12, HR 4.674 [2.918–7.487], and ROC curve (sensitivity against specificity), AUC 0.972; b Validation set (GSE37754, N = 62): Kaplan–Meier curves for the low-risk (N = 31) and high-risk (N = 31) groups, p = 0.0104, HR 2.679 [1.224–5.863], and ROC curve, AUC 0.942]

Fig. 2 The performance of the 16 CpG methylation signatures in predicting recurrence in the training and the validation cohorts. a For the training cohort; b For the validation cohort
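The AUC values reported in Fig. 2 summarize how well the risk score separates patients with and without the event. The paper does not state how the ROC analysis was implemented, so the following is only a sketch under the assumption of a binary outcome (for example, death during follow-up), using the rank-based definition of AUC. The function name and the toy numbers are hypothetical.

```python
from typing import List


def roc_auc(scores: List[float], events: List[int]) -> float:
    """AUC as the probability that a patient with the event has a higher
    risk score than a patient without it (ties count as one half)."""
    pos = [s for s, e in zip(scores, events) if e == 1]
    neg = [s for s, e in zip(scores, events) if e == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy risk scores and event indicators (made up, not data from the study).
toy_scores = [2.1, 0.4, 1.3, -0.7, 0.9]
toy_events = [1, 0, 0, 0, 1]
print(roc_auc(toy_scores, toy_events))
```

The pairwise form is quadratic in the number of patients, which is acceptable for cohorts of a few hundred; a time-dependent ROC for censored survival data would need a different estimator.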

Table 2 Clinical factor statistics and Cox regression analysis

                                                                         Univariable Cox                Multivariable Cox
Clinical characteristics                           TCGA (N = 786)        HR     95% CI       P          HR     95% CI         P
Age (years, mean ± SD)                             57.99 ± 13.19         1.03   1.015–1.045  6.492E−05  1.043  1.0194–1.068   3.350E−04
ER status (positive/negative/–)                    574/170/42            0.648  0.424–0.992  4.400E−02  0.577  0.301–1.107    9.802E−02
PR status (positive/negative/–)                    501/240/45            0.76   0.505–1.144  1.870E−01  –      –              –
Her2 status (positive/negative/–)                  58/406/322            0.936  0.425–2.06   8.690E−01  –      –              –
Histological type (Basal/Her2/LumA/LumB/Normal/–)  88/32/278/128/19/241  1.164  0.937–1.445  1.690E−01  –      –              –
Pathologic_M (M0/M1/–)                             617/13/156            5.245  2.664–10.33  1.229E−07  2.168  0.689–6.819    1.858E−01
Pathologic_N (N0/N1/N2/N3/–)                       349/272/95/58/12      1.6    1.297–1.973  8.374E−06  1.135  0.709–1.817    5.953E−01
Pathologic_T (T1/T2/T3/T4/–)                       200/451/109/23/3      1.484  1.171–1.882  1.050E−03  1.387  0.848–2.268    1.923E−01
Pathologic_stage (I/II/III/IV/–)                   127/440/199/11/9      2.262  1.685–3.038  3.963E−08  2.315  0.919–5.829    7.467E−02
Radiotherapy (yes/no/–)                            413/314/59            0.563  0.355–0.892  1.320E−02  0.481  0.247–0.937    3.149E−02
Recurrence (yes/no/–)                              74/623/89             9.173  5.570–15.11  2.000E−16  6.602  3.521–12.379   4.010E−09
Risk status (high/low)                             393/393               4.674  2.918–7.487  1.678E−12  8.601  3.662–20.202   7.860E−07
Dead (death/alive)                                 105/681               –      –            –          –      –              –
Overall survival time (months, mean ± SD)          42.03 ± 38.71         –      –            –          –      –              –

TCGA The Cancer Genome Atlas, HR hazard ratio
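The "Risk status (high/low)" row above rests on the risk score defined in the Methods: a coefficient-weighted sum of methylation levels, split at the cohort median. A minimal Python sketch of that calculation follows. The coefficients are copied from Table 1; the function names and the toy inputs are hypothetical, and assigning scores equal to the median to the low-risk group is an assumption, since the paper does not say how ties are handled.

```python
from statistics import median
from typing import Dict, List

# Coefficients copied from Table 1 (penalized Cox-PH model).
COEFFICIENTS: Dict[str, float] = {
    "ZNF462": -0.9271, "SYCN": 0.5220, "PCSK1": -0.1412, "CHL1": 2.4676,
    "CORIN": 1.8749, "RAB13": -2.1655, "USP4": -0.5395, "KCNA1": -0.5816,
    "ZNF274": -0.7784, "BHLHA9": -0.6012, "HBM": -0.5501, "PXN": -1.5005,
    "GRP": 1.5255, "SYCE1L": 1.5883, "COL4A2": 0.3422, "CRHR1": -2.6558,
}


def risk_score(methylation: Dict[str, float]) -> float:
    """Coefficient-weighted sum of methylation (beta) levels for one patient."""
    return sum(coef * methylation[gene] for gene, coef in COEFFICIENTS.items())


def split_by_median(scores: List[float]) -> List[str]:
    """Label a score 'high' if it exceeds the cohort median, else 'low'.
    Treating ties as 'low' is an assumption of this sketch."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


# Toy patient with every CpG at beta = 0.5 (illustrative only).
toy_patient = {gene: 0.5 for gene in COEFFICIENTS}
print(risk_score(toy_patient))
print(split_by_median([0.1, 0.5, -0.3, 0.9]))
```

Because β values lie between 0 and 1, each gene contributes at most the magnitude of its coefficient, so CRHR1, CHL1, and RAB13 dominate the score.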


[Figure 3. Kaplan–Meier curves; x axis: overall survival time (months), y axis: survival ratio.
a Low risk. Age: ≤ 60 (N = 244) vs > 60 (N = 149), p = 0.625, HR 1.246 [0.515–3.011]. Radiotherapy: without (N = 159) vs with (N = 212), p = 0.253, HR 2.092 [0.574–7.629]. Recurrence: without (N = 328) vs with (N = 29), p = 2.156e-08, HR 14.33 [4.265–48.17].
b High risk. Age: < 60 (N = 203) vs > 60 (N = 190), p = 7.575e-04, HR 2.088 [1.348–3.235]. Radiotherapy: without (N = 155) vs with (N = 201), p = 6.991e-04, HR 0.418 [0.249–0.703]. Recurrence: without (N = 295) vs with (N = 45), p = 4.883e-13, HR 7.475 [4.333–12.901].]

Fig. 3 Kaplan–Meier curves of overall survival according to low-risk or high-risk scores stratified by age, administration of radiotherapy, and recurrence. a For low risk: age, radiotherapy, and recurrence; b For high risk: age, radiotherapy, and recurrence

sequencing PCR, RASSF1a, P16, and PCDHGB7 were confirmed to display significant sensitivity and specificity as diagnostic biomarkers for BC (P < 0.001) [14]. A quantitative methylation-specific real-time PCR analysis found that both higher methylation of PER1, PER2, and PER3 and lower methylation of CLOCK, BMAL1, and CRY2 could be potential indicators of BC [15]. In a study of 228,951 women of European descent, Yang et al. discovered 45 CpGs residing in 18 genomic regions that had not previously been associated with breast cancer risk [16]. Different studies have reported different markers, which shows that no exact or specific markers for the diagnosis of BC have been established. Unlike the previous studies, we used WGCNA, a penalized Cox-PH model, and univariate and multivariate analyses of overall survival by clinical factors to screen markers for BC. We believe that analyses based on different statistical tests may lead to different outcomes.

Cancer is a heterogeneous disease, and exploring the dysmethylated genes involved in carcinogenesis and development might help to improve prognostic and therapeutic strategies. In the present study, we identified a group of 16 differentially methylated CpGs, corresponding to 16 genes, that effectively predict the prognosis of breast cancer. Among these genes, CHL1 encodes a cell-adhesion molecule that belongs to the L1 family of CAMs and is involved in human breast tumorigenesis and progression [17]. CHL1 hypermethylation has been reported as a potential biomarker of poor prognosis in breast cancer [18]. Ubiquitin-specific protease 4 (USP4), a deubiquitinating enzyme with key roles in the regulation of TGF-β1, was also identified in our study. Many studies have revealed an important role of USP4 in breast cancer. For example, USP4 could promote the invasion of breast cancer cells via Relaxin/TGF-β1/Smad2/MMP-9 signaling [19], and USP4 could inhibit breast cancer cell growth through the upregulation of PDCD4 [20]. In addition, collagen type IV alpha 2 (COL4A2) was also detected. COL4A2 is the major structural component of basement membranes. The C-terminal portion of the protein, known as canstatin, is an inhibitor

1 3 Breast Cancer Research and Treatment (2019) 178:565–572 571 of angiogenesis and tumor growth [21]. The suppression of 2. Ferlay J, Soerjomataram I, Dikshit R, Eser S, Mathers C, Rebelo COL4A2 mRNA inhibits triple-negative breast cancer cell M, Parkin DM, Forman D, Bray F (2015) Cancer incidence and mortality worldwide: sources, methods and major patterns in proliferation and migration [22]. Compared to our study, GLOBOCAN 2012. Int J Cancer 136(5):E359–386. https://doi.​ the commercial kit used for the BC diagnosis showed main org/10.1002/ijc.29210​ fucus in the expression of genes related to cell proliferation, 3. Tang Q, Holland-Letz T, Slynko A, Cuk K, Marme F, Schott estrogen, HER2, and invasion [23], which were diferent S, Heil J, Qu B, Golatta M, Bewerunge-Hudler M, Sutter C, Surowy H, Wappenschmidt B, Schmutzler R, Hoth M, Bugert from our markers. Therefore, diferent genes corresponding P, Bartram CR, Sohn C, Schneeweiss A, Yang R, Burwinkel B to diferent functions could show the diferent mechanism (2016) DNA methylation array analysis identifes breast cancer of BC. However, further studies are needed to validate their associated RPTOR, MGRN1 and RAPSN hypomethylation in functions. peripheral blood DNA. Oncotarget 7(39):64191–64202. https://​ doi.org/10.18632​/oncot​arget​.11640​11640​ Aberrant DNA methylation is a critical mechanism in car- 4. Cady B (2007) Local therapy and survival in breast cancer. N cinogenesis [24]. Although DNA methylation profles are Engl J Med 357(10):1051–1052 author reply 1052 often tissue- and cell-type specifc, recent data indicate that 5. Hudis CA (2007) Trastuzumab–mechanism of action and use in epigenetic changes in blood cell DNA are potential mark- clinical practice. N Engl J Med 357(1):39–51 6. Jones PA, Baylin SB (2007) The epigenomics of cancer. Cell ers for solid tumors [25]. In our study, we used the tissue 128(4):683–692 samples from TCGA to screen the markers and the blood 7. 
Bibikova M, Barnes B, Tsan C, Ho V, Klotzle B, Le JM, Delano samples from GEO to validate the markers. And the markers D, Zhang L, Schroth GP, Gunderson KL, Fan JB, Shen R (2011) screened in tissue samples showed a good performance in High density DNA methylation array with single CpG site reso- lution. Genomics 98(4):288–295 the blood samples, which illustrated the abnormal methyla- 8. Gentleman R, Carey VJ, Huber W, Irizarry RA, Dudoit S (2005) tion of blood could be used to diagnose BC. However, these Bioinformatics and computational biology solutions using R results should be validated through further studies. and Bioconductor. Springer, New York The study’s limitations should be noted. First, the meth- 9. Benjamini Y, Hochberg Y (1995) Controlling the false discov- ery rate: a practical and powerful approach to multiple testing. ods for screening the biomarkers was based on the statistical J R Stat Soc Ser B (Methodological) 57(1):289–300 method rather than the biological experiment. Therefore, 10. Langfelder P, Horvath S (2008) WGCNA: an R package for the biologic mechanisms of the candidate markers are still weighted correlation network analysis. BMC Bioinformatics unknown. Second, additionally no experiments such as 9:559 11. Wang P, Wang Y, Hang B, Zou X, Mao JH (2016) A novel gene pyrosequencing were conducted to validate the methylation expression-based prognostic scoring system to predict survival levels of the marker genes. Third, further validation in pro- in gastric cancer. Oncotarget 7(34):55343–55351 spective studies and multicenter clinical trials are needed. 12. Goeman JJ (2010) L1 penalized estimation in the Cox pro- Fourth, we did not analyze the relevance of chemotherapy portional hazards model. Biom J 52(1):70–84. https​://doi. org/10.1002/bimj.20090​0028 or endocrine therapy to the overall survival of BC, which 13. 
Tibshirani R (1997) The lasso method for variable selection in could, to some degree, illustrate the key factors related to the Cox model. Stat Med 16(4):385–395 the overall survival of BC. Fifth, there existed probe design 14. Shan M, Zhang L, Liu Y, Gao C, Kang W, Yang W, He Y, bias in Illumina Infnium 450 k DNA methylation data, and Zhang G (2019) DNA methylation profiles and their diag- nostic utility in BC. Dis Mark 2019:6328503. https​://doi. a key statistical method should be used to adjust for the two org/10.1155/2019/63285​03 diferent probe designs. 15. Lesicka M, Jablonska E, Wieczorek E, Seroczynska B, Kalinow- In conclusion, 16-CpG methylation, is a potential prog- ski L, Skokowski J, Reszka E (2019) A diferent methylation nostic tool for predicting prognosis for patients with breast profle of circadian genes promoter in breast cancer patients according to clinicopathological features. Chronobiol Int. https​ cancer and might help clinicians in directing personalized ://doi.org/10.1080/07420​528.2019.16177​32 therapeutic regimen selection for patients with breast cancer. 16. Yang Y, Wu L, Shu XO, Cai Q, Shu X, Li B, Guo X, Ye F, Michailidou K, Bolla MK, Wang Q, Dennis J, Andrulis IL, Brenner H, Chenevix-Trench G, Campa D, Castelao JE, Gago- Compliance with ethical standards Dominguez M, Dork T, Hollestelle A, Lophatananon A, Muir K, Neuhausen SL, Olsson H, Sandler DP, Simard J, Kraft P, Pharoah PDP, Easton DF, Zheng W, Long J (2019) Genetically Conflicts of interest The authors declare that they have no confict of predicted levels of DNA methylation biomarkers and breast can- interest. cer risk: data from 228,951 women of European descent. J Natl Cancer Inst. https​://doi.org/10.1093/jnci/djz10​9 17. He LH, Ma Q, Shi YH, Ge J, Zhao HM, Li SF, Tong ZS (2013) CHL1 is involved in human breast tumorigenesis and progres- References sion. Biochem Biophys Res Commun 438(2):433–438 18. Martin-Sanchez E, Mendaza S, Ulazia-Garmendia A, Mon- 1. 
Johnson RH, Chien FL, Bleyer A (2013) Incidence of breast can- real-Santesteban I, Blanco-Luquin I, Cordoba A, Vicente- cer with distant involvement among women in the United States, Garcia F, Perez-Janices N, Escors D, Megias D, Lopez-Serra 1976 to 2009. JAMA 309(8):800–805. https​://doi.org/10.1001/ P, Esteller M, Illarramendi JJ, Guerrero-Setas D (2017) CHL1 jama.2013.77616​56255​

1 3 572 Breast Cancer Research and Treatment (2019) 178:565–572

hypermethylation as a potential biomarker of poor prognosis in 23. Paik S, Tang G, Shak S, Kim C, Baker J, Kim W, Cronin M, breast cancer. Oncotarget 8(9):15789–15801 Baehner FL, Watson D, Bryant J, Costantino JP, Geyer CE Jr, 19. Cao WH, Liu XP, Meng SL, Gao YW, Wang Y, Ma ZL, Wang Wickerham DL, Wolmark N (2006) Gene expression and beneft XG, Wang HB (2016) USP4 promotes invasion of breast cancer of chemotherapy in women with node-negative, estrogen - cells via Relaxin/TGF-beta1/Smad2/MMP-9 signal. Eur Rev Med positive breast cancer. J Clin Oncol 24(23):3726–3734 Pharmacol Sci 20(6):1115–1122 24. Suzuki MM, Bird A (2008) DNA methylation landscapes: pro- 20. Li Y, Jiang D, Zhang Q, Liu X, Cai Z (2016) Ubiquitin-specifc vocative insights from epigenomics. Nat Rev Genet 9(6):465–476 protease 4 inhibits breast cancer cell growth through the upregu- 25. Teschendorf AE, Menon U, Gentry-Maharaj A, Ramus SJ, Gay- lation of PDCD4. Int J Mol Med 38(3):803–811. https​://doi. ther SA, Apostolidou S, Jones A, Lechner M, Beck S, Jacobs IJ, org/10.3892/ijmm.2016.2685 Widschwendter M (2009) An epigenetic signature in peripheral 21. Turner AW, Nikpay M, Silva A, Lau P, Martinuk A, Linseman blood predicts active ovarian cancer. PLoS ONE 4(12):e8274. TA, Soubeyrand S, McPherson R (2015) Functional interaction https​://doi.org/10.1371/journ​al.pone.00082​74 between COL4A1/COL4A2 and SMAD3 risk loci for coronary artery disease. Atherosclerosis 242(2):543–552 Publisher’s Note Springer Nature remains neutral with regard to 22. JingSong H, Hong G, Yang J, Duo Z, Li F, WeiCai C, XueYing L, jurisdictional claims in published maps and institutional afliations. YouSheng M, YiWen O, Yue P, Zou C (2017) siRNA-mediated suppression of collagen type IV alpha 2 (COL4A2) mRNA inhib- its triple-negative breast cancer cell proliferation and migration. Oncotarget 8(2):2585–2593
