An Integrated Epigenomic–Transcriptomic Landscape of Lung

1 An integrated epigenomic–transcriptomic landscape of lung cancer reveals novel 2 methylation driver genes of diagnostic and therapeutic relevance 3 4 Xiwei Sun1#, Jiani Yi 1#, Juze Yang1#, Yi Han1, Xinyi Qian2, Yi Liu1, Jia Li1, Bingjian Lu2, 5 Jisong Zhang1, Xiaoqing Pan3, Yong Liu4, Mingyu Liang4, Enguo Chen1,5*, Pengyuan 6 Liu1,4,5*, Yan Lu2,5* 7 8 1Sir Run Run Shaw Hospital and Institute of Translational Medicine, Zhejiang University 9 School of Medicine, Hangzhou, Zhejiang 310016, China. 10 2Center for Uterine Cancer Diagnosis & Therapy Research of Zhejiang Province, Women's 11 Reproductive Health Key Laboratory of Zhejiang Province, Department of Gynecologic 12 Oncology, Women’s Hospital and Institute of Translational Medicine, School of Medicine, 13 Zhejiang University, Hangzhou, Zhejiang 310006, China. 14 3Department of Mathematics, Zhejiang University, Hangzhou, Zhejiang 310027, China. 15 4Center of Systems Molecular Medicine, Department of Physiology, Medical College of 16 Wisconsin, Milwaukee, WI 53226, USA 17 5Cancer center, Zhejiang University, Hangzhou, Zhejiang 310013, China. 18 19 # XS, JNY, and JZY equally. 20 Running title: Integrative epigenomic–transcriptomic landscape of lung cancer 21 Key words: DNA methylation; driver genes; epigenomics; lncRNA, lung cancer; miRNA, 22 reduced representation bisulfite sequencing; transcriptomics; transcription factor 23 *Correspondence: [email protected], [email protected] or [email protected] 1 24 Abstract 25 Background: Aberrant DNA methylation occurs commonly during carcinogenesis and is of 26 clinical value in human cancers. However, knowledge of the impact of DNA methylation 27 changes on lung carcinogenesis and progression remains limited. 28 Methods: Genome-wide DNA methylation profiles were surveyed in 18 pairs of tumors and 29 adjacent normal tissues from non-small cell lung cancer (NSCLC) patients using Reduced 30 Representation Bisulfite Sequencing (RRBS). An integrated epigenomic–transcriptomic 31 landscape of lung cancer was depicted using the multi-omics data integration method. 32 Results: We discovered a large number of hypermethylation events pre-marked by poised 33 promoter in embryonic stem cells, being a hallmark of lung cancer. These hypermethylation 34 events showed a high conservation across cancer types. Eight novel driver genes with 35 aberrant methylation (e.g., PCDH17 and IRX1) were identified by integrated analysis of 36 DNA methylation and transcriptomic data. Methylation level of the eight genes measured by 37 pyrosequencing can distinguish NSCLC patients from lung tissues with high sensitivity and 38 specificity in an independent cohort. Their tumor-suppressive roles were further 39 experimentally validated in lung cancer cells, which depend on promoter hypermethylation. 40 Similarly, 13 methylation-driven ncRNAs (including 8 lncRNAs and 5 miRNAs) were 41 identified, some of which were co-regulated with their host genes by the same promoter 42 hypermethylation. Finally, by analyzing the transcription factor (TF) binding motifs, we 43 uncovered sets of TFs driving the expression of epigenetically regulated genes and 44 highlighted the epigenetic regulation of gene expression of TCF21 through DNA methylation 45 of EGR1 binding motifs. 46 Conclusions: We discovered several novel methylation driver genes of diagnostic and 47 therapeutic relevance in lung cancer. Our findings revealed that DNA methylation in TF 48 binding motifs regulates target gene expression by affecting the binding ability of TFs. Our 49 study also provides a valuable epigenetic resource for identifying DNA methylation-based 50 diagnostic biomarkers, developing cancer drugs for epigenetic therapy and studying cancer 51 pathogenesis. 2 52 3 53 Introduction 54 Lung cancer is the leading cause of cancer mortality worldwide and non-small cell lung 55 cancers (NSCLCs) account for about 85% of lung cancers [1, 2]. Although there are new 56 agents for the treatment of NSCLC patients, the 5-year survival rate is still estimated at 15% 57 [1, 3, 4]. To reduce the high lethality of NSCLCs, many significant efforts have been made. 58 Previous studies have reported causal genetic alterations, from somatic point mutations to 59 large structure variations, involved in the carcinogenesis of NSCLCs [5-9]. Based on these 60 findings, various therapeutic strategies have been developed for NSCLC treatment, such as 61 combination cisplatin-based chemotherapy with the anti-angiogenic bevacizumab [10, 11], 62 the use of tyrosine kinase inhibitors to treat EGFR-mutated, ALK or ROS1-rearranged 63 NSCLC patients [12-15]. However, owing to the primary, adaptive and acquired resistance of 64 these drugs, the target treatment is not effective for all NSCLC patients [16]. DNA 65 methylation alterations commonly occur in cancer and are involved in tumor initiation and 66 progression. Aberrant promoter CpG methylation provides a selective mechanism for the 67 regulation of tumor suppressor genes and oncogenes in cancer instead of genetic mutations 68 [17]. However, these methylation changes could be inherently reversed by agents, in contrast 69 with genetic mutations [18]. Thus, identification of epigenetically modulated genes opens 70 new avenues for epigenetic therapies and for discovery of novel drug targets. 71 In NSCLC, CDKN2A was the first reported gene whose downregulated expression in 72 lung carcinogenesis was predominantly attributed to promoter hypermethylation [19]. 73 Afterwards, several studies described several other epigenetically regulated genes, such as 74 FHIT, DAPK and RASSF1A [20-26]. However, these earlier studies only investigated a 75 single gene or a small list of genes. The advent of high throughput next-generation 76 sequencing (NGS) for analyzing DNA methylome and transcriptome has offered the unique 77 ability to analyze methylation changes and detect methylation driver events at a genome-wide 78 scale [27-29]. 79 In this study, we analyzed genome-wide methylation profiles of ~ 2 million CpG sites 80 spanning more than 19,600 genes and noncoding regions using Reduced Representation 81 Bisulfite Sequencing (RRBS) technology [30]. RRBS combines restriction enzymes and 82 bisulfite sequencing to enrich for DNA segments with a high CpG content and regulatory 83 potentials. It is an efficient and cost-effective technique for analyzing the genome-wide 84 methylation profiles on a single nucleotide level. Using this technique and the multi-omics 85 data integration method, we pinpointed several novel methylation driver protein coding genes 86 and noncoding RNAs that could be potential targets for epigenetic therapy. Furthermore, we 87 detected sets of transcription factor (TF) binding motifs located in differentially methylated 88 regions (DMRs) which regulated target gene expression by affecting the binding ability of 89 TFs in lung cancer. 4 90 Materials and Methods 91 Reduced representation bisulfite sequencing (RRBS) 92 The study was approved by the institutional review board of Sir Run Run Shaw Hospital 93 at Zhejiang University (Hangzhou, China). Eighteen lung tumor tissues and adjacent normal 94 tissues were collected from NSCLC patients (Table S1). All samples used in the current study 95 were obtained at the time of diagnosis before any treatment was administered. Genomic DNA 96 of these tissues was extracted and then treated according to the RRBS library preparation 97 protocols as we previously described [30] with modifications to allow multiplexing [31]. 98 Paired-end sequencing with 100bp was performed on the Illumina HiSeq 2000 according to 99 manufacturer’s protocol. 100 Bioinformatics analyses of RRBS data 101 Bisulfite sequencing reads were pre-processed with Trim Galore 102 (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/). Both adapters and 103 sequences with low quality (base quality < 20) were removed before the analysis. The 104 trimmed reads were then aligned to the human reference genome (hg19) and the methylation 105 status of each CpG was determined using Bismark (v0.14.1) with default parameters [32] 106 (Table S2). The unconverted cytosines at fill-in 3′ MspI sites of sequencing reads were used 107 to estimate the bisulfite conversion rate. For each CpG site with at least 5×coverage, the 108 methylation rate, C/(C+T), was calculated. We merged the 18 tumor-normal sample pairs 109 based on CpG coordinates, yielding 2,574,098 CpG sites that covered at least ten paired 110 samples. CpG sites in the heterosome and CpG sites overlapped with SNP (dbSNP build 142) 111 were filtered, and 2,166,853 were retained for subsequent analysis. The remaining missing 112 methylation values were imputed using k-nearest neighbors in the CpGs space 113 (http://bioconductor.org/packages/release/bioc/html/impute.html). 114 Next, we used metilene (Version 0.23) [33] to identify DMRs between tumor and 115 matched noncancerous tissues. These DMRs were further examined for significance using the 116 Wilcoxon rank sum test for a paired dataset. The final DMRs were determined using the 117 following threshold: at least ten CpGs in the DMRs, at least 10% differences in methylation, 118 and false discovery rate (FDR) < 0.05. We performed an unsupervised hierarchical cluster 119 analysis on the CpG sites’ methylation using ward linkage and Euclidean distance. The 120 Metascape software (http://metascape.org) was used to conduct Gene Ontology (GO) 121 analyses according to the standard protocol. 122 RNA-seq analysis 123 mRNA sequencing libraries were prepared for

An Integrated Epigenomic–Transcriptomic Landscape of Lung

Gene Nomenclature System for Rice

Systematic Review of the Genetics of Sudden Unexpected Death in Epilepsy: Potential Overlap with Sudden Cardiac Death and Arrhythmia-Related Genes C

A Genetic Screen for Genes That Impact Peroxisomes in Drosophila Identifies

Indicators of the Molecular Pathogenesis of Virulent Newcastle

Identification of Common Gene Networks Responsive To

The Prepattern Transcription Factor Irx3 Directs Nephron Segment Identity

The Genetic Basis of Malformation of Cortical Development Syndromes: Primary Focus on Aicardi Syndrome

Highlights of the 'Gene Nomenclature Across Species' Meeting

Abundant and Dynamically Expressed Mirnas, Pirnas, and Other Small Rnas in the Vertebrate Xenopus Tropicalis

Downloaded from the CAVA Integration with Data Generated by Pre-NGS Methods Webpage [19]

The Human Gamma-Glutamyltransferase Gene Family

Decision Support for Molecular Diagnostic Laboratories Using Interactive Biosoftware Alamut V1.2