Cpg Methylation Signature Predicts Prognosis in Breast Cancer
Total Page:16
File Type:pdf, Size:1020Kb
Breast Cancer Research and Treatment (2019) 178:565–572 https://doi.org/10.1007/s10549-019-05417-3 PRECLINICAL STUDY CpG methylation signature predicts prognosis in breast cancer Tonghua Du1 · Bin Liu1 · Zhenyu Wang1 · Xiaoyu Wan1 · Yuanyu Wu2 Received: 14 May 2019 / Accepted: 22 August 2019 / Published online: 13 September 2019 © Springer Science+Business Media, LLC, part of Springer Nature 2019 Abstract Purpose DNA methylation can be used as prognostic biomarkers in various types of cancers. We aimed to identify a CpG methylation pattern for breast cancer. Methods In this study, using the microarray data from the cancer genome atlas (TCGA) and gene expression omnibus (GEO), we profled DNA methylation between 97 healthy control samples and 786 breast cancer samples in a training cohort (from TCGA, n = 883) to build a gene classifer using a penalized regression model. We validated the prognostic accuracy of this gene classifer in an internal validation cohort (from GEO, n = 72). Results A total of 1777 diferentially methylated CpGs corresponding to 1777 diferent methylated genes (DMGs) between breast cancer and control were chosen for this study. Subsequently, 16 CpGs were generated to classify patients into high-risk and low-risk groups in the training cohort. Patients with high-risk scores in the training cohort had shorter overall survival (hazard ratio [HR], 4.674; 95% CI 2.918 to 7.487; P = 1.678e–12) than patients with low-risk scores. The prognostic accuracy was also validated in the validation cohorts. Furthermore, among patients with low-risk scores in the combined training and validation cohorts, the patients with the age > 60 years compared with the patients with the age < 60 years were associated with improved overall survival (HR 2.088, 95% CI 1.348 to 3.235; p = 7.575e–04) in patients with a high-risk score but not in patients with low-risk score (HR 1.246, 95% CI 0.515 to 3.011; p = 0.625). The patients treated with radiotherapy compared with the patients without radiotherapy were associated with improved overall survival (HR 0.418, 95% CI 0.249 to 0.703; p = 6.991e-04) in patients with a high-risk score but not in patients with low-risk score (HR 2.092, 95% CI 0.574 to 7.629; p = 0.253). For the patients with recurrence and the patients without recurrence both groups were all associated with improved overall survival (HR 7.475, 95% CI 4.333 to 12.901; p = 6.991e–04) in patients with a high-risk score and in patients with low-risk score (HR 14.33, 95% CI 4.265 to 48.17; p = 4.883e–13). Conclusion The 16 CpG-based signature is useful as a biomarker in predicting prognosis for patients with breast cancer. Keywords Breast cancer · DNA methylation · Prognosis · Overall survival · CpG sites Introduction Breast cancer, the most frequent cancer among women, is the leading cause of cancer mortality for women, and dra- matically, the incidence is increasing among young women Electronic supplementary material The online version of this aged less than 40 years [1, 2]. According to the statistics article (https ://doi.org/10.1007/s1054 9-019-05417 -3) contains of 2016, about 246,660 female were diagnosed with breast supplementary material, which is available to authorized users. cancer, of whom an estimated 40,450 people are expected * Yuanyu Wu to die of breast cancer [3]. In recent years, although many [email protected] advanced treatment techniques have been used to improve the survival rate, the patients still sufer from low quality 1 Department of Breast Surgery, The Second Clinical Hospital Of Jilin University, NO. 218, Ziqiang Street, Nanguan of life or develop metastasis for late diagnosis [4]. As a District, Changchun 130000, Jilin, China result, the prognosis-related markers of breast cancer could 2 Department of Gastrointestinal and Colorectal Surgery, increase treatment options, including surgical resection and China-Japan Union Hospital of Jilin University, No.126 therapeutic interventions [5]. Xiantai St, Changchun, Jilin 130033, China Vol.:(0123456789)1 3 566 Breast Cancer Research and Treatment (2019) 178:565–572 Biomarkers strongly associated with breast cancer risk the methylation loci at the genes’ CpG sites were further factors will provide an opportunity to understand cancer screened for analysis. Methylation at each CpG site is development. Gene expression profles have also been used described as β value [β = intensity of the methylated allele for breast cancer classifcation and served as prognostic (M)/(intensity of the unmethylated allele (U) + intensity of and therapeutic predictors. However, there are still major the methylated allele (M) + 100)] [7]. It is expressed as a challenges in accurate early prediction of breast cancer inci- continuous variable that ranges from 0 (no methylation) to dence, detection and prognosis. Thus, it is of great need 1 (full methylation). Limma (linear model for microarray to identify sensitive biomarkers to evaluate breast cancer data) package in R language was used to screen the diferen- prognosis at an early treatable stage. tially methylated genes (DMGs) [8]. Bonferroni in multi-test DNA methylation refers to heritable and modifiable package was employed to adjust the p value into false dis- markers that regulate gene expression without changing the covery rate (FDR) [9]. The FDR < 0.05 and |log2 fold-change underlying DNA sequence. So far, many researches have (FC)| > 0.5 were used as the cutof criteria for the DMGs. demonstrated that DNA methylation may play a critical role in carcinogenesis through downregulating tumor suppressor Co‑methylated genes screening genes expression [6]. It has also been found that the aber- rant DNA methylation could be an important factor in breast WGCNA (weighted gene coexpression network analy- cancer. For example, Tang et al. using the DNA methylation sis, version 1.63, https ://cran.r-proje ct.org/web/packa ges/ array have also identifed breast cancer-associated RPTOR, WGCNA /index .html) package in R language was used to MGRN1, and RAPSN hypomethylation in peripheral blood analyze the correlation among the DMGs [10]. Modules DNA [3]. Given that DNA methylation changes are plausibly were defned as clusters of highly interconnected DMGs, critical components of the molecular mechanisms involved and DMGs within the same cluster have high correlation in breast cancer, distinct DNA methylation profles could coefcients among them. be a potential biomarker to improve the accuracy of breast cancer prognosis. Prognosis‑related methylated genes screening In the present study, using the downloaded Illumina 450 K DNA methylation array data of breast cancer from The univariate and multivariate cox regression analysis of the cancer genome atlas (TCGA) database and gene expres- survival package (Version 2.41-1, http://bioco nduct or.org/ sion omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/), we packa ges/survi valr/) in R language was used to identify aimed to screen the breast cancer-related genes with aber- DMGs associated to the prognosis using the DMGs with rant DNA methylation and constructed a model using the the threshold value of log-rank p < 0.05 [11]. In addition, candidate genes in 863 training cohorts and validated those the penalized package (Version 0.9-50, http://bioco nduct genes in 72 cohorts. or.org/packages/penal ized/ ) [12] was applied to identify the optimized DMGs based on the Cox-PH (Cox-proportional Hazards) model [13]. The optimized parameter “lambda” Materials and methods was obtained by 1000 cross-validation likelihood cycles. Microarray data Prognostic model construction The DNA methylation profle (Illumina Infnium Human To better investigate the performance of those DMGs in pre- Methylation 450 BeadChip) was downloaded from TCGA dicting prognosis, a risk score was built, with the coefcients (https://gdc-porta l.nci.nih.gov/ ) which including 97 healthy weighted by the penalized Cox model, and the risk score was control samples and 786 breast cancer samples as the train- calculated as per the following formula: ing cohorts (Table S1). In addition, the DNA methylation = × profle of GSE37754 (Illumina Infnium Human Methylation Risk score coefgene Methylationgene 450 BeadChip) was downloaded from GEO, containing 10 healthy control samples and 62 breast cancer samples as the where Coefgene denote the coefcients; Methylation gene is the validation cohorts. methylation level of gene. Using the median of the risk score as the cutof point, Screening of diferent methylated CpGs the training cohorts were divided into high-risk group and low-risk group. Then, Kaplan–Meier plots were used to According to the overall survival prognosis information illustrate overall survival. Its performance was assessed by from TCGA, the training cohorts were divided into two receiver-operating characteristic (ROC) analysis. Moreo- groups, including group tumor and group control. Next, ver, we also performed univariate and multivariate Cox 1 3 Breast Cancer Research and Treatment (2019) 178:565–572 567 regression analysis using backward selection to test the Co‑methylation analysis independent signifcance of diferent factors; the p value threshold was 0.1 (p > 0.1) for removing nonsignifcant According to the WGCNA, nine modules (617 DMGs) with variables from the analysis, and the marginally signifcant diferent colors were identifed among which six modules variables (0.05 < p < 0.1) remained in the fnal Cox model. (brown, green, yellow, black, turquoise, and red) were sig- Covariates included radiotherapy (yes vs. no), age (> 60 nifcantly associated with breast cancer (Figure S1). vs. < 60), and concurrent chemotherapy (no vs. yes). Selection of Candidate DMGs The univariate cox regression analysis obtained 123 DMGs Results which were signifcantly related to the prognosis. Subse- quently, the multivariate cox regression analysis selected Analysis of DMGs 20 DMGs. Finally, 16 optimized DMGs including ZNF462, SYCN, PCSK1, CHL1, CORIN, RAB13, USP4, KCNA1, Based on the annotation data of the Illumina 450 K meth- ZNF274, BHLHA9, HBM, PXN, GRP, SYCE1L, COL4A2, ylation platform, 10,443 unique CpG sites were obtained.