Pan-Cancer Analysis of DNA Methylation Identifies Genes And
Total Page:16
File Type:pdf, Size:1020Kb
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.20.449136; this version posted June 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. Pan-Cancer Analysis of DNA Methylation Identifies Genes and Biological Functions Associated with Overall Survival Romola Cavet1, Peng Yue2, Guy Cavet*3 1The Nueva School, San Mateo, California, United States of America 2ReviR Therapeutics, Palo Alto, California, United States of America 3Paramune, Inc., Palo Alto, California, United States of America * Corresponding author E-Mail: [email protected] bioRxiv preprint doi: https://doi.org/10.1101/2021.06.20.449136; this version posted June 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. Abstract DNA methylation influences gene expression and is altered in many cancers, but the relationship between DNA methylation and cancer outcomes is not yet fully understood. If methylation of specific genes is associated with better or worse outcomes, it could implicate genes in driving cancer and suggest therapeutic strategies. To advance our understanding of DNA methylation in cancer biology, we conducted a pan-cancer analysis of the relationship between methylation and overall survival. Using data on 28 tumor types from The Cancer Genome Atlas (TCGA), we identified genes and genomic regions whose methylation was recurrently associated with survival across multiple cancer types. While global DNA methylation levels are associated with outcome in some cancers, we found that the gene-specific associations were largely independent of these global effects. Genes with recurrent associations across cancer types were enriched for certain biological functions, such as immunity and cell-cell adhesion. While these recurrently associated genes were found throughout the genome, they were enriched in certain genomic regions, which may further implicate certain gene families and gene clusters in affecting survival. By finding common features across cancer types, our results link DNA methylation to patient outcomes, identify biological mechanisms that could explain survival differences, and support the potential value of treatments that modulate the methylation of tumor DNA. bioRxiv preprint doi: https://doi.org/10.1101/2021.06.20.449136; this version posted June 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. Introduction Cancer is defined by a plethora of genetic and epigenetic alterations that alter gene activity and promote uncontrolled cell proliferation [1]. As a consequence, the study of tumor genomes and large-scale computational data analysis have proved invaluable in understanding cancer development, prognosis, and treatment [2,3]. DNA methylation is the epigenetic conversion of cytosine to 5-methylcytosine, and is widely implicated in both normal and abnormal regulation of genome function [4]. DNA methyltransferases (DNMTs) facilitate methylation, acting as either de novo DNMTs (which establish early methylation patterns during development) or as maintenance DNMTs (which copy patterns of DNA methylation onto new strands after DNA replication) [4,5]. The mechanism by which DNMTs are regulated is suggested to be through their interaction with proliferating cell nuclear antigens (PCNA) and RAD9 [6]. DNA methylation has long been suspected to influence gene expression. Although the exact mechanism is yet unknown, one possibility is that it impedes transcription activators from binding to the DNA molecule [5]. DNA methylation of gene bodies and gene promoters have both been found to have an inverse relationship with gene expression [7,8]. DNA methylation is frequently altered in cancer and is related both to tumorigenesis and prognosis. Both localized hypermethylation and global hypomethylation are observed in cancer and can be associated with the development and progression of the disease [9]. Hypermethylation in certain genomic regions can promote tumorigenesis [10]. For example, DNA methylation of promoter regions for tumor suppressor genes has repeatedly been implicated as an early driving factor in tumorigenesis [11–13]. DNA methylation inhibitors such as azacytidine and decitabine are proven treatments in some indications, and are believed to act bioRxiv preprint doi: https://doi.org/10.1101/2021.06.20.449136; this version posted June 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. by relieving inhibition of tumor suppressor gene transcription [14]. By contrast, global hypomethylation is associated with genomic instability and worse prognosis and/or progression in cancers including lower-grade glioma, prostate cancer, and breast adenocarcinoma [15–17]. Comparison of DNA methylation between tumor and normal tissue has identified patterns of aberrant methylation associated with cancer. For example, in a pan-cancer analysis of DNA methylation, patterns of DNA methylation instability were found in pathways related to cell metabolism, apoptosis, and DNA repair, such as CDKN1C, BRCA1, and MLH1 [18]. A pattern of hypermethylated genes encoding proteins in the WNT signaling pathway was also found in patients with various adenocarcinomas [19]. Identification of tumor DNA methylation patterns increases our disease knowledge and may facilitate early diagnosis [20,21]. However, further analysis is required to determine whether and how these patterns affect cancer biology and disease outcomes. We set out to directly seek methylation patterns associated with patient survival, with several goals. First, to assess the extent to which locus-specific methylation is linked to cancer outcomes. Second, to identify which genes have methylation consistently associated with survival across multiple tumor types. Finally, to identify higher-level biological functions and pathways that are enriched for these genes, and that have the potential to explain the relationships between methylation and clinical outcome. Our approach used large, publicly available cancer genomics datasets. The Cancer Genome Atlas (TCGA) program makes available cancer genomic, epigenomic and clinical data available for over 20,000 samples spanning 33 different cancer types (https://www.cancer.gov/tcga). Using TCGA data we analyzed DNA methylation patterns measured in 28 different tumor types. This analysis identified genes whose methylation is bioRxiv preprint doi: https://doi.org/10.1101/2021.06.20.449136; this version posted June 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. associated with prognosis, both positively and negatively. Some of these genes have consistent associations across several types of cancer. These genes suggest biological mechanisms, shared across multiple tumor types, by which methylation may affect cancer outcomes. bioRxiv preprint doi: https://doi.org/10.1101/2021.06.20.449136; this version posted June 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. Materials and Methods Accessing Data TCGA data were downloaded from https://www.cbioportal.org/. Firehose Legacy data sets were obtained for 32 cancer types, of which 4 were discarded due to missing data. Firehose Legacy data sets were chosen because they include large numbers of samples. Identifying genes for which methylation is associated with survival Statistical analysis was conducted in the R programming language using the packages “survival,” “survminer,” and “dplyr”. Within each cancer type, the methylation hm450 data and patient clinical data were used to test for associations between gene-level methylation and overall survival. In the first analysis, a Cox proportional hazards model was fit to evaluate the association between each gene’s methylation level and overall survival. False discovery rates (q- values) were estimated using the method of Benjamini and Hochberg [22]. Q-values below 0.05 were considered significant. For each cancer type, the genes positively associated with survival (greater methylation associated with greater survival) and the genes negatively associated with survival (greater methylation associated with shorter survival) were considered separately in further analysis. To evaluate associations between global methylation and survival for each cancer type, a global methylation score was calculated for each patient by taking the mean methylation score across all genes. Cox proportional hazards regression was used, for each cancer type separately, to evaluate the association between global methylation score and overall