Published OnlineFirst June 10, 2016; DOI: 10.1158/1541-7786.MCR-15-0395
Genomics Molecular Cancer Research A Minimal DNA Methylation Signature in Oral Tongue Squamous Cell Carcinoma Links Altered Methylation with Tumor Attributes Neeraja M. Krishnan1, Kunal Dhas1, Jayalakshmi Nair1, Vinayak Palve1, Jamir Bagwan1, Gangotri Siddappa2, Amritha Suresh2, Vikram D. Kekatpure3, Moni Abraham Kuriakose2,3, and Binay Panda1,4
Abstract
Oral tongue squamous cell carcinomas (OTSCC) are a MIR10B was significantly associated with the differential homogenous group of aggressive tumors in the head and neck expression of its target genes NR4A3 and BCL2L11 (P ¼ region that spread early to lymph nodes and have a higher 0.0125 and P ¼ 0.014, respectively), which was inversely incidence of regional failure. In addition, there is a rising correlated with disease-free survival (P ¼ 9E 15 and P ¼ incidence of oral tongue cancer in younger populations. Stud- 2E 15, respectively) in patients. Finally, differential methyl- ies on functional DNA methylation changes linked with ation in FUT3, TRIM5, TSPAN7, MAP3K8, RPS6KA2, SLC9A9, altered gene expression are critical for understanding the and NPAS3 genes was found to be predictive of certain clinical mechanisms underlying tumor development and metastasis. and epidemiologic parameters. Such studies also provide important insight into biomarkers linked with viral infection, tumor metastasis, and patient Implications: This study reveals a functional minimal meth- survival in OTSCC. Therefore, we performed genome-wide ylation profile in oral tongue tumors with associated risk methylation analysis of tumors (N ¼ 52) and correlated habits, clinical, and epidemiologic outcomes. In addition, altered methylation with differential gene expression. The NR4A3 downregulation and correlation with patient survival minimal tumor-specific DNA 5-methylcytosine signature iden- suggests a potential target for therapeutic intervention in oral tified genes near 16 different differentially methylated regions, tongue tumors. Data from the current study are deposited which were validated using genomic data from The Cancer in the NCBI Geo database (accession number GSE75540). Genome Atlas cohort. In our cohort, hypermethylation of Mol Cancer Res; 14(9); 805–19. 2016 AACR.
Introduction in the anterior part of tongue or oral tongue have an increased association with younger patients (3, 4), spread early to lymph Head and neck squamous cell carcinomas (HNSCC) are a hetero- nodes (3), and have a higher regional failure (5). Tobacco and geneous group of tumors with different incidences, mortalities, and alcohol are common risk factors for this group of cancer. Although prognosis for different subsites. HNSCCs are the sixth leading the role of human papilloma virus (HPV) is clearly understood in cause of cancer worldwide (1) and account for almost 30% of all the prognosis of oropharyngeal tumors (6), the role of HPV as an cancer cases in India (2). Oral cancer is the most common subtype etiologic agent and/or a prognostic marker in oral tongue squa- of head and neck cancers with a worldwide incidence of greater mous cell carcinoma (OTSCC) is not yet established. than 300,000 cases. The disease is an important cause of morbidity In the last 5 years, studies have identified somatic mutations, and mortality with a 5-year survival of less than 50% (1, 2). Unlike insertions, deletions and amplifications, copy number altera- other subsites of oral cavity like gingivo-buccal, tumors originating tions, loss of heterozygosity, gene expression, and DNA meth- ylation changes for different tumor subsites in HNSCC (7–12).
1 In addition, few studies (13, 14) have demonstrated the role Ganit Labs, Bio-IT Centre, Institute of Bioinformatics and Applied fi Biotechnology, Bangalore, India. 2Integrated Head and Neck Oncol- of epigenetic modi cations and associated pathways in HPV- ogy Program, Mazumdar Shaw Centre for Translational Research, positive HNSCC patients. Bangalore, India. 3Head and Neck Oncology, Mazumdar Shaw Medical Epigenetic changes are known biomarkers for cancer initia- 4 Centre, Bangalore, India. Strand Life Sciences, Bangalore, India. tion and progression (15). DNA methylation at cytosine resi- Note: Supplementary data for this article are available at Molecular Cancer dues [5-methylcytosine (5mC)] is one of the best studied Research Online (http://mcr.aacrjournals.org/). epigenetic changes in cancer (16). Cytosine methylation alters Corresponding Author: Binay Panda, Ganit Labs, Institute of Bioinformatics and gene expression and thus plays an important role in other bio- Applied Biotechnology, Biotech Park, Electronic City Phase I, Bangalore 560100, logic processes, such as embryonic development, imprinting, þ þ India. Phone: 91-80-28528900; Fax: 90-80-28528904; E-mail: and X-chromosome inactivation (17, 18). Cancer development [email protected] is characterized by two major DNA methylation changes, doi: 10.1158/1541-7786.MCR-15-0395 hypermethylation of CpG islands located in the promoter 2016 American Association for Cancer Research. regions of tumor suppressor genes (making them inactive) and
www.aacrjournals.org 805
Downloaded from mcr.aacrjournals.org on October 1, 2021. © 2016 American Association for Cancer Research. Published OnlineFirst June 10, 2016; DOI: 10.1158/1541-7786.MCR-15-0395
Krishnan et al.
global hypomethylation leading to activation of oncogenes and surgery with or without postoperative adjuvant radiation or transposons (19). By conducting genome-wide methylation chemoradiation at the Mazumdar Shaw Medical Centre were studies, one can identify potential DNA methylation markers accrued for the study (Supplementary Table S1). for early cancer detection. Previous studies profiling DNA methylation changes in HNSCC primarily used low-through- Whole-genome methylation assay put assays and identified changes in DCC, DAPK1, SPEN, Genomic DNA was extracted from tissues using DNeasy E-cadherin, CDKN2A, RASSF1,andMGMT genes (20–22). Pick- Blood and Tissue Kit (Qiagen) as per the manufacturer's spe- ering and colleagues (2013) studied methylation in oral squa- cifications. Genomic DNA quality was assessed by agarose gel mous cell carcinomas using low-density HumanMethyla- electrophoresis, and samples with intact, high molecular weight tion27K arrays and found 3,694 loci to be differenti- DNA bands were used for bisulfite conversion. Five hundred ally methylated in tumor tissues (9). Oral cancer studies by nanograms of genomic DNA was used for bisulfite conversion Shaw and colleagues (2013) used the Illumina GoldenGate using the Zymo EZ DNA Methylation Kit (Zymo Research assay that comprises 1,505 CpG island loci and found 48 loci Corp.) following the manufacturer's instructions and eluted to be differentially methylated in tumors (23). in 12 mL of elution buffer. Four microliters of the bisulfite- In this study, we profiled tumor-specific5mCchangesusing converted DNA was used as template for target preparation high-density arrays comprising 485,512 probes in 52 pairs of to hybridize on the BeadChip and was processed as per OTSCCs. We identified differentially methylated loci or probes the manufacturer's instructions following Infinium HD and (DMP) between tumors and their matched normals and stud- Infinium protocol for HumanMethylation450K BeadChip ied their functional impact by comparing the levels of differ- (Illumina), respectively. Data were collected using the HiScan ential methylation with differential gene expression in tumors. (Illumina) reader and analyzed with GenomeStudio V2011.1 We found varying extents of differential methylation in various methylation module1.1.1 (Illumina). subregions. These differentially methylated regions (DMRs) often encompass multiple genesandwereassociatedwithdif- Methylation450 BeadChip probe alignment and removal of ferential gene expression, thus representing functional DMRs. ambiguous probes from analyses We found aberrant methylation in a total of 27,276 hyper- The probe sequences were aligned using bowtie2 (25), while methylated and 21,231 hypomethylated loci with an average allowing up to 2 mismatches to the human reference hg19 Db (level of differential methylation) of at least 0.2. We sequence to detect and exclude probes mapping ambiguously to discovered a minimal methylation signature comprising 16 multiple locations in the reference genome (Supplementary Text different DMRs that harbored GPER1, TTLL8, RHPN1, OR2T6, S1). We found 17,649 probes (3.6% of the total number of MIR10B, ENAH, EMX2OS, BTK, C11orf53, CENPVL1, COL9A1, probes) to be ambiguously mapping in this manner. These probes DEFA1, ODF3, RARRES2, SHF,andSP6 genes with significant mapped to multiple locations, ranging from 2 to 6265 (Supple- differential methylation (quantitative and categorical, see mentary Table S2). Materials and Methods) using a machine learning–based ran- dom forest approach (24), which were validated using meth- Whole-genome gene expression assay ylation data (450K) from the head and neck cohort in the The Whole-genome gene expression profiling was carried out Cancer Genome Atlas (TCGA) project. In our study, hyper- with Illumina HumanHT-12 v4 expression BeadChip (Illu- methylation in a DMR-harboring miRNA, MIR10B,waslinked mina) with 21 tumors and their matched adjacent normal to downregulation of two of its validated gene targets, NR4A3 tissues. Total RNA was extracted using PureLink RNA Mini Kit and BCL2L11, which correlated well with disease-free survival. (Invitrogen), and RNA quality was checked on the bioanalyzer Furthermore, the machine-learning algorithm identified DMPs using RNA Nano6000 chip (Agilent). RNA samples with poor within TSPAN7, FUT3, TRIM5, SLC9A9, NPAS3, RPS6KA2, RNA integrity numbers (RIN; 7) on the bioanalyzer chip, MAP3K8, TIMM8A,andRNF113A genestobepredictiveof indicating partial degradation of RNA, were processed using risk habits, nodal status, tumor staging, prognosis, and HPV Illumina WGDASL assay and the ones with good RIN numbers infection. (>7) were labeled using Illumina TotalPrep RNA Amplification Kit (Ambion) as per the manufacturer's recommendations. Materials and Methods Targets were used to hybridize arrays, and arrays were pro- cessed according to the manufacturer's recommendations. Informed consent, ethics approval, and patient samples used Arrays were scanned using HiScan, Illumina, and the data in the study collected were analyzed with GenomeStudio V2011.1 Gene Informed consent was obtained voluntarily from each patient, Expression module 1.9.0 (Illumina) to check for the assay and ethics approval was obtained from the Institutional Ethics quality control probes. Raw signal intensities were exported Committees of the Mazumdar Shaw Medical Centre (Bangalore, from GenomeStudio for transformation, normalization, and India). Tumor and matched control (blood and/or adjacent differential expression analyses using R. normal tissue) specimens were collected and used in the study. Only those patients whose histologic sections confirmed the Data preprocessing and differential methylation analyses presence of squamous cell carcinoma with at least 70% tumor Raw data for 52 tumor:matched control sample pairs obtained cells in the specimen were used in this study. At the time of from the Illumina Infinium Methylation450 BeadChip scanned admission, patients were asked about their habits (chewing, using the HiScan system were analyzed in GenomeStudio Meth- smoking, and/or alcohol consumption). Fifty-two treatment- ylation Module v1.9.0 (Illumina) for assay controls, before na€ ve patients who underwent staging according to AJCC criteria exporting the b values for analysis using R. For each CpG site, and curative intent treatment as per NCCN guidelines involving ratio of fluorescent signal was measured by that of a methylated
806 Mol Cancer Res; 14(9) September 2016 Molecular Cancer Research
Downloaded from mcr.aacrjournals.org on October 1, 2021. © 2016 American Association for Cancer Research. Published OnlineFirst June 10, 2016; DOI: 10.1158/1541-7786.MCR-15-0395
DNA Methylation in Oral Tongue Tumors
probe relative to the sum of the methylated and unmethylated Minimal differentially methylated loci using machine-learning probes, termed as a b value. b values, ranging from 0 (no algorithm methylation) to 1.0 (100% methylation of both alleles), were Sample-wise dasen-normalized and batch-corrected intensi- recorded for each probe on the array. The b value is calculated ties of probes corresponding to DMRs, identified postfunctional using the formula mentioned below: normalization using minfi, along with the tumor:matched con- trol tissue type for each sample was used as a training set P b ¼ m (training set #1) for the random forest analyses. Pm þ Pum where Pm is the signal intensity of the methylation detection Variable selection probe and Pum is the signal intensity of the nonmethylation The method employed here follows the method described detection probe. Infinium methylation assay has built-in sam- before (32). Briefly, the algorithm performs both backward elim- ple-dependent and -independent controls, which were used to ination of variables and selection based on the importance test the overall efficiency of the assay for each sample. Raw signal spectrum. The Bioconductor package VarSelRF (33) was used to intensities were exported as .idat files, preprocessed (transformed perform this function. About 20% of the least important variables and normalized), and analyzed using R Bioconductor packages, are eliminated iteratively until the current out-of-bag (OOB) error wateRmelon, minfi,andComBat (26–28), to estimate differential rate becomes larger than the initial OOB error rate or the OOB methylation. The Illumina 450K data were preprocessed using error rate in the previous iteration. We computed a metric by a functional normalization procedure implemented by the multiplying the prediction accuracy (sensitivity and specificity) "preprocessFunnorm" function within the R Bioconductor pack- and repeatability, and dividing the product by the number of age minfi (Supplementary Text S2). This algorithm was recently parameters contained within the predicted set, and reported the developed and found to outperform other 450K data normal- set with the highest value for this metric. Five hundred of such ization and batch correction methods (29). After functional iterations were performed, and each of the iterations was per- normalization, we identified DMRs using "bumphunter" func- muted across 3,000 random forest trees (Supplementary Text S9). tion in minfi, retaining significant ones with P 0.05 and fwer A score was computed for predictive parameter sets across all 0.05. Dasen normalization was also performed using iterations, for each sample, which factored in the prediction wateRmelon to obtain differentially methylated loci using accuracy of the tissue type of that sample, the repeatability minfi (Supplementary Text S3). A clear difference before and (number of instances out of 500) of the predictive parameter set, after normalization was seen in the density plots (Supple- presence of other predictive parameter sets, and the parameter set mentary Fig. S1). The normalized data were further batch size. The first two weighed the score positively, and the latter two, corrected using ComBat (28) using empirical Bayes methods. negatively. The R commands used in our method are provided Dasen-normalized DMPs corresponding to these DMRs were below: extracted. Postprocessing of the differentially methylated data library(randomForest) for visualization was performed as per the method provided library(varSelRF) in the Supplementary Material (Supplementary Text S4–S6). DS¼read.table("/home/neeraja/RF/OSCC/TminN.NoFlanks. forRF",header¼TRUE,na.strings¼"NA") Data preprocessing and differential expression analyses DS na.omit(DS) Whole-genome gene expression data were analyzed using final for (i in 1:500) { reports exported from Genome Studio, using the R Bioconductor set.seed(i) packages lumi (30). The VST transformation and LOESS normal- DS.rf.vsf¼varSelRF(DS[,-1],DS[,1],ntree¼3000, ization methods in lumi performed best in preprocessing data ntreeIterat¼2000,vars.drop.frac¼0.2,whole.range ¼ FALSE,keep. from the DASL and WG assays. They were chosen based on their forest ¼ TRUE) highest mean IACs (interarray correlation coefficients), among print(DS.rf.vsf$selected.vars) various combinations of three types of transformations (VST, log2 print(predict(DS.rf.vsf$rf.model,subset(DS[,-1],select¼DS.rf.vsf , and cubicRoot) and six types of normalizations (SSN, quantile, $selected.vars))) RSN, VSN, RankInvariant, and LOESS). The preprocessed data was } further batch corrected using ComBat, as the experiments had been fi performed in multiple batches. The batch-corrected gene expres- For all iterations of all RF analyses, we con rmed that the sion data were then analyzed for differential expression using rankings of variable importances remained highly correlated limma (Supplementary Text S7). This method of transformation, (Supplementary Table S3) before and after correcting for multiple – normalization, and batch correction was performed as described hypotheses comparisons using pre- and post- Benjamin Hoch- previously (31). berg FDR-corrected P values (Supplementary Text S10). The R commands used to recompute importance values after multiple comparisons testing for the six somatic mutation category RF Correlation of methylation and expression data analyses are provided below: All probes were annotated using the Ensemble v75 database and classified on the basis of their locations to genic subregions, # Selected variables from varSelRF analyses such as 50 and 30 untranslated regions (UTR), north and south DS.rf.vsf$selected.vars shelves and shores, TSS1500, promoters, CpG islands, gene ds.rf randomForest(DS[,1] ., DS, ntree ¼ 3000, norm.votes ¼ bodies, and first exons. The relationship between differential FALSE, importance ¼ TRUE) expression of genes and differential methylation of genic sub- #Importance values before FDR correction regions was visualized (Supplementary Text S8). ds.rf$importance[-1,3]
www.aacrjournals.org Mol Cancer Res; 14(9) September 2016 807
Downloaded from mcr.aacrjournals.org on October 1, 2021. © 2016 American Association for Cancer Research. Published OnlineFirst June 10, 2016; DOI: 10.1158/1541-7786.MCR-15-0395
Krishnan et al.
#Importance values after FDR correction Validation of minimal methylation signature in TCGA p.adjust(ds.rf$importance[-1,3], method ¼ "BH", n ¼ 6) HNSCC data The best-scoring signatures identified from the random forest The random forest algorithm was also implemented to analyses on the OTSCC data, using three training sets, were fi predict tumor-speci c minimal methylation signature, this examined in the 50 tumor:matched control pairs (various subsites time training with a categorical input for the probes, instead in head and neck region) from the TCGA cohort. TCGA data were of the actual intensity values (training set #2). This training set preprocessed and normalized exactly the same way as described Db was created on the basis of the normalized (tumor-matched above for the OTSCC data. control b) values falling into predefined categories. Following the Db density distributions across probes (Supplementary Fig. Prediction of minimal differential methylation signature in S2), for 0.2