Author Manuscript Published OnlineFirst on October 1, 2019; DOI: 10.1158/0008-5472.CAN-19-1019 Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited.
1 Histone-related genes are hypermethylated in lung cancer and hypermethylated
2 HIST1H4F could serve as a pan-cancer biomarker
3 Shi-Hua Dong1,#, Wei Li1,#, Lin Wang2,#, Jie Hu3,#, Yuanlin Song3,#, Baolong Zhang1,
4 Xiaoguang Ren1, Shimeng Ji3, Jin Li1, Peng Xu1, Ying Liang1, Gang Chen4, Jia-Tao
5 Lou2†, Wenqiang Yu1†
6
7 1Shanghai Public Health Clinical Center and Department of General Surgery, Huashan
8 Hospital, Cancer Metastasis Institute and Laboratory of RNA Epigenetics, Institutes of
9 Biomedical Sciences, Shanghai Medical College, Fudan University, Shanghai, 201508,
10 China.2Department of Laboratory Medicine, Shanghai Chest Hospital, Shanghai Jiao
11 Tong University, Shanghai 200030, China.3Department of Pulmonary Medicine,
12 Zhongshan Hospital, Fudan University, 180 Fenglin Road, Shanghai 200032, China.
13 4Department of Pathology, Zhongshan Hospital, Fudan University, 180 Fenglin Road,
14 Shanghai 200032, China.
15 #These authors contributed equally to this work.
16
17 Running title
18 HIST1H4F region as a Universal-Cancer-Only Methylation marker.
19
20 Keywords
21 Lung cancer, DNA methylation signature, Histone gene, HIST1H4F, Universal-Cancer-
22 Only Methylation, Pan-cancer biomarker
23
Downloaded from cancerres.aacrjournals.org on September 28, 2021. © 2019 American Association for Cancer Research.
24 Financial Support
25 “This work was supported by the National Key R&D Program of China (Grant No.
26 2018YFC1005004), the Science and Technology Innovation Action Plan of Shanghai
27 (Grant No. 17411950900), the National Natural Science Foundation of China (Grant Nos.
28 31671308, 31872814, and 81272295), Major Special Projects of Basic Research of
29 Shanghai Science and Technology Commission (Grant No. 18JC1411101), the Shanghai
30 Science and Technology Committee (Grant No. 12ZR1402200), the Ministry of
31 Education of the People’s Republic of China (Grant No. 2009CB825600), and the
32 Innovation Group Project of Shanghai Municipal Health Commission (Grant No.
33 2019CXJQ03).”
34
35 Corresponding author
36 †Wenqiang Yu, PhD&MD, Institute of Biomedical Sciences, Fudan University, 130
37 Dong’an Road, West 13# Building, Room 419, Shanghai 200032, P.R.China. Tel.: +86-
38 21-54237978, Fax: +86-21-54237339, E-mail: [email protected]
39 †Jiatao Lou, MD, Department of Laboratory Medicine, Shanghai Chest Hospital, 241
40 West Huaihai Road, Shanghai, 200030, China. Tel.: +86-21-2220000-1503, Fax: +86-21-
41 62808279, E-mail: [email protected]
42
43 Conflict of interest
44 The authors declare a potential conflict of interest in the form of a patent application.
45
46 Abstract
47 Lung cancer is the leading cause of cancer-related deaths worldwide. Cytological
48 examination is the current "gold standard" for lung cancer diagnosis; however, it has
49 low sensitivity. Here, we identified a typical methylation signature of histone genes in
50 lung cancer by whole-genome DNA methylation analysis, which was validated by a
51 TCGA lung cancer cohort (n=907) and was further confirmed in 265 bronchoalveolar
52 lavage fluid (BALF) samples with specificity and sensitivity of 96.7% and 87.0%,
53 respectively. More importantly, HIST1H4F was universally hypermethylated in all
54 seventeen tumor types from TCGA datasets (n=7344), which was further validated in
55 nine different types of cancer (n=243). These results demonstrate that HIST1H4F can
56 function as a Universal-Cancer-Only Methylation (UCOM) marker, which may aid in
57 understanding general tumorigenesis and improve screening for early cancer diagnosis.
58 Significance
59 Findings identify a new biomarker for cancer detection and show that
60 hypermethylation of histone-related genes seems to persist across cancers.
61
62
63 Introduction
64 Lung cancer is one of the most common malignant tumors and the leading cause
65 of cancer-related deaths worldwide (1,2). Early detection and surgery offer the best
66 chance for survival, with the five-year survival rate as high as 80% (3). However, most
67 lung cancer patients are diagnosed at an inoperable advanced stage with metastasis,
68 and patients must undergo chemotherapy, radiotherapy, immunotherapy, or targeted
69 therapy. The five-year survival rate of patients in the advanced stage is below 10% (4,5).
70 Over the past decade, LDCT (low-dose computed tomography) has been the most commonly
71 used screening method for lung cancer, which has been shown to improve early detection
72 and reduce mortality (6). However, due to its low specificity, LDCT is far from
73 satisfactory as a screening tool for clinical application, similar to other currently used
74 cancer biomarkers, such as carcinoembryonic antigen (CEA), neuron-specific enolase
75 (NSE), CYFRA 21-1, etc. Therefore, effective biomarkers for early detection, diagnosis,
76 prognosis, and monitoring of lung cancer are urgently needed (7).
77 Epigenetic and genetic abnormalities are hallmarks of lung cancer (8-10).
78 Abnormal DNA methylation is the most common epigenetic variation in the process of
79 lung cancer. Compared to DNA mutations, DNA methylation occurs much earlier and is
80 more stable in the early diagnosis of tumors, and aberrant DNA methylation patterns can
81 be used to predict metastasis of liver cancer to the lung (11). Although many DNA
82 methylation biomarkers have been reported, they are still being explored
83 and are rarely used in clinical applications. The sensitivity and specificity of current methylation
84 markers are insufficient, with high risks of false positives and false negatives (12,13).
85 Therefore, applying methylation markers to clinical applications is challenging, and
86 new biomarkers for the early detection of cancer are urgently needed (14).
87 Histones are major essential components of chromatin and are conserved in eukaryotic
88 cells (15). There are five major types of histones: H1, H2A, H2B, H3, and H4. Histones
89 H2A, H2B, H3 and H4 are known as the core histones, while histone H1 is known as the
90 linker histone (16). Histones are divided into canonical replication-dependent histones
91 that are expressed during the S-phase of the cell cycle and replication-independent
92 histone variants, which are expressed during each phase of the cell cycle. Genes encoding
93 canonical histones are intron-less and lack a polyA tail at the 3’ end, having instead a
94 stem-loop structure; canonical histone genes also tend to be clustered in the genome.
95 Genes encoding histone variants are usually not clustered and have introns and polyA
96 tails (17,18). In the human genome, histone genes mainly form histone cluster 1
97 (Chr6p21) and histone cluster 2 (Chr1q21) (19). Other histone genes are distributed
98 randomly in the human genome. Although histone modifications have been extensively
99 studied in chromatin regulation, epigenetic variation in the family of histone genes
100 themselves is rarely considered. It has been shown that histone gene cluster 1 is occupied
101 by abnormally higher-order chromatin organization in breast cancer (20). However, DNA
102 methylation alteration in histone gene loci has not yet been systematically investigated,
103 especially in cancer development.
104 Here, through genome-wide DNA methylation analysis with an unusual strategy, we
105 found that many histone gene loci are abnormally hypermethylated in lung cancer, which
106 piqued our interest for further investigation. We demonstrate that methylation of histone
107 genes can be used as a biomarker for early detection in BALF samples. Furthermore,
108 histone gene loci are not only abnormally hypermethylated in lung cancer but also
109 specifically methylated in various tumors. In particular, the HIST1H4F gene is
110 abnormally hypermethylated in seventeen types of cancer for which we could obtain
111 informative clinical samples and could act as a potential Universal-Cancer-Only Methylation
112 marker. We speculate that the methylation of HIST1H4F will be of great significance for
113 early diagnosis, especially during the screening process of cancer in clinical applications.
114
115 Materials and methods
116 WGBS data analysis
117 WGBS data sets were downloaded from the ENCODE database
118 (https://www.encodeproject.org/) and the SRA database
119 (https://www.ncbi.nlm.nih.gov/sra); the serial numbers were summarized in
120 Supplementary Table 1. DNA methylation levels were calculated using BSMAP software
121 (21) as described previously (11), where hg19 human genome assembly and UCSC
122 reference gene annotations were used. Specifically, for each CpG site, reads supporting
123 either methylation or unmethylation were counted, and the methylation value was
124 calculated as the ratio of the number of reads supporting methylation to the sum of the
125 number of reads supporting both methylation and unmethylation. Only CpG sites covered
126 by more than five reads and detected in all the seven WGBS data sets were used for
127 subsequent analysis.
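The per-site calculation described above can be sketched as follows. This is only a minimal illustration and not the BSMAP implementation: the function names, the dictionary layout, and the toy read counts are assumptions introduced for the example.

```python
# Hypothetical sketch of the per-CpG methylation value and coverage filter.
# Each sample maps a CpG position to (methylated_reads, unmethylated_reads).
MIN_READS = 6  # "more than five reads" means at least six reads


def methylation_value(meth_reads, unmeth_reads):
    """Ratio of methylation-supporting reads to all informative reads."""
    return meth_reads / (meth_reads + unmeth_reads)


def covered_sites(sample):
    """Keep CpG sites covered by more than five reads, with their values."""
    return {
        pos: methylation_value(m, u)
        for pos, (m, u) in sample.items()
        if m + u >= MIN_READS
    }


def shared_sites(samples):
    """Keep only CpG sites that pass the coverage filter in every sample."""
    filtered = [covered_sites(s) for s in samples]
    common = set(filtered[0]).intersection(*filtered[1:])
    return {pos: [f[pos] for f in filtered] for pos in sorted(common)}


# Toy data for two samples (positions and read counts are invented).
sample_a = {("chr6", 100): (9, 1), ("chr6", 150): (2, 2), ("chr6", 200): (0, 8)}
sample_b = {("chr6", 100): (6, 6), ("chr6", 150): (5, 5), ("chr6", 200): (1, 7)}
result = shared_sites([sample_a, sample_b])
```

In this toy case the site at position 150 is dropped because it has only four reads in the first sample, which mirrors the requirement that a site be detected in all data sets.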
128 DMS, DMR and DMG definition
129 The methylation levels of four Normal lung Cell samples (NC) and three lung Cancer
130 Cell samples (CC) were calculated. For each CpG site, we calculated methylation value
131 difference for all twelve CC-NC pairs (CCi – NCj, where i=1,2, or 3 and j=1,2,3 or 4).
132 CpG sites with all twelve (CCi - NCj) ≥ 50% were defined as Cancer Cell-Differentially
133 Methylated Sites (CC-DMS). Similarly, CpG sites with all twelve (NCj - CCi) ≥ 50%
134 were defined as Normal Cell-Differentially Methylated Sites (NC-DMS). In addition,
135 CpG sites with all twelve (|CCi - NCj|) ≤ 20% were defined as NO-Differentially
136 Methylated Sites (NO-DMS). A Differentially Methylated Region (DMR) was defined as
137 at least 3 adjacent DMS within a 100-bp genomic window. Genes overlapping with any
138 DMR were defined as Differentially Methylated Genes (DMG).
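The definitions above can be illustrated with a short sketch. The function names are hypothetical, methylation values are given in percent, and the DMR grouping rule (a greedy scan that emits a region when at least three DMS start within 100 bp of the first one) is one possible reading of "adjacent DMS within a 100-bp window", not necessarily the authors' exact procedure. In particular, the sketch does not check whether non-DMS CpG sites lie between the grouped sites.

```python
from itertools import product


def classify_site(cc_values, nc_values):
    """Label one CpG site from cancer (CC) and normal (NC) methylation in percent."""
    # All pairwise differences CCi - NCj (3 x 4 = 12 pairs in the study design).
    diffs = [cc - nc for cc, nc in product(cc_values, nc_values)]
    if all(d >= 50 for d in diffs):
        return "CC-DMS"
    if all(-d >= 50 for d in diffs):
        return "NC-DMS"
    if all(abs(d) <= 20 for d in diffs):
        return "NO-DMS"
    return None  # site falls into none of the three classes


def call_dmrs(positions, min_sites=3, window=100):
    """Group DMS positions on one chromosome into (start, end) regions."""
    positions = sorted(positions)
    dmrs, i = [], 0
    while i < len(positions):
        j = i
        # Extend while the next site is still inside the window opened at i.
        while j + 1 < len(positions) and positions[j + 1] - positions[i] < window:
            j += 1
        if j - i + 1 >= min_sites:
            dmrs.append((positions[i], positions[j]))
            i = j + 1
        else:
            i += 1
    return dmrs


label = classify_site([90, 85, 95], [10, 20, 5, 30])
regions = call_dmrs([10, 40, 90, 500, 560, 900])
```

Requiring all twelve pairwise differences to pass the threshold is a strict criterion: a single normal sample with intermediate methylation is enough to exclude a site.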
139 TCGA DNA methylation data analysis
140 The Illumina 450K methylation array level three data from the TCGA (The Cancer
141 Genome Atlas) database were downloaded from the UCSC Xena browser
142 (https://xenabrowser.net/). For each histone gene, only probes within the gene body region
143 (listed in Supplementary Table 2) were selected to calculate an average methylation value.
144 Probes with “NA” values were excluded. The absolute methylation values were
145 calculated from the β-values of the 450K methylation array (methylation value = (β-value +
146 0.5) × 100%). For each gene, the final methylation value was calculated as the average of
147 all selected CpG sites. The samples used from the TCGA database and the methylation levels
148 of HIST1H4F were listed in Supplementary Table 3.
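A minimal sketch of the gene-level calculation is given below. The function name and the use of None to represent "NA" are assumptions for illustration. The formula implies that the downloaded β-values are centered on zero (ranging from -0.5 to 0.5); that reading is an inference from the formula and is not stated explicitly in the text.

```python
def gene_methylation(probe_betas):
    """Average methylation (percent) over gene-body probes, skipping NA probes.

    probe_betas: offset beta-values for the selected probes of one gene in one
    sample, with None standing in for "NA".
    """
    values = [(beta + 0.5) * 100 for beta in probe_betas if beta is not None]
    if not values:
        return None  # no informative probe for this gene in this sample
    return sum(values) / len(values)


# Toy example: three informative probes and one NA probe.
example = gene_methylation([-0.5, 0.0, None, 0.5])
```

Excluding NA probes before averaging means the number of contributing probes may differ between samples for the same gene.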
149 Clinical samples
150 We collected 243 primary tissue samples and 265 BALF samples from Shanghai Chest
151 Hospital and Zhongshan Hospital of Fudan University. Primary tissue samples included
152 25 lung cancer and 25 paired para-cancer control samples, 12 colorectal cancer and 12
153 paired para-cancer control samples, 10 esophageal cancer and 10 paired para-cancer
154 control samples, 20 liver cancer and 23 para-cancer control samples, 9 pancreatic cancer
155 and 9 paired para-cancer control samples, 10 cervical cancer and 10 control samples, 10
156 gastric cancer and 10 para-cancer control samples, 14 breast cancer and 14 paired para-
157 cancer control samples, 10 head and neck cancer and 10 paired para-cancer control
158 samples. Clinical characteristics of these samples were summarized in Supplementary Table 4.
159 BALF samples comprised a benign lung disease (BLD) control group and a lung cancer
160 group. The BLD control group contained 59 samples, including pneumonia, emphysema,
161 tuberculosis, etc. The lung cancer experimental group included 92 lung squamous cell
162 carcinoma (LUSC) samples, 70 lung adenocarcinoma (LUAD) samples, and 44 small cell
163 lung carcinoma (SCLC) samples. BALF samples were randomly assigned to a training
164 set and a validation set. All patients provided written informed consent before their
165 samples were collected. Institutional Review Board approval for research on human
166 subjects was obtained from the Hospital.
167 DNA extraction and Bisulfite-PCR treatment, pyrosequencing
168 Genomic DNA from cultured cell lines and primary tissue samples was extracted with
169 phenol-chloroform. Genomic DNA from BALF samples was extracted with the Qiagen
170 DNA Extraction Kit (Qiagen, cat# 51404). Next, 20-200 ng of genomic DNA was taken for
171 bisulfite treatment (ZYMO Research, cat# D5006), and the recovered bisulfite-treated
172 DNA was used as the subsequent PCR template. We detected eleven CpG sites for
173 HIST1H4F gene (chr6:26,240,743-26,240,800) and eight CpG sites for HIST1H4I gene
174 (chr6:27,107,185-27,107,239). The genomic sequences and primers designed for target
175 genes were listed in Supplementary Table 5. Two rounds of semi-nested PCR were
176 performed to produce single-band, biotin-modified PCR products. The outer forward
177 primer and the reverse primer were used for the first round of PCR amplification. The
178 inner forward primer and the reverse primer were used for the second round of PCR
179 amplification. Both rounds of PCR were performed with the same program: 98°C for 30 s for
180 pre-denaturation; 30 cycles of 98°C for 10 s, 58°C for 30 s, and 72°C for 30 s; and 72°C for 3 min
181 for final elongation. The pyrosequencing assay was performed on a PyroMark Q96 ID
182 instrument (QIAGEN). For each target gene, the final methylation value was the average over
183 all CpG sites detected by pyrosequencing.
184 Cell Culture
185 The human lung cancer cell line A549, human lung fibroblast cell line MRC5 and human
186 hepatocarcinoma cell line HepG2 were kindly provided by Stem Cell Bank, Chinese
187 Academy of Sciences. All the cell lines were authenticated by the PowerPlex 16 System
188 (Promega) and tested negative for mycoplasma by qPCR. A549, MRC5 and HepG2
189 cells were cultured in DMEM medium supplemented with 10% v/v FBS and 1% v/v
190 antibiotics at 37°C in a humidified atmosphere of 5% CO2. For passaging, cells were
191 washed once with PBS and dissociated using 1 ml 0.25% trypsin, then neutralized with 1 ml
192 DMEM medium and equally plated into two 10cm dishes.
193 Results
194 The pipeline of Genome-wide WGBS data analysis and identified differentially
195 methylated regions validated by the TCGA cohort.
196 To screen genome-wide for DNA methylation biomarkers for the early
197 diagnosis of lung cancer, we collected three WGBS data sets of lung cancer cells and
198 another four WGBS data sets of cell samples derived from normal lung tissues as controls
199 (Supplementary Table 1). To effectively screen for lung cancer biomarkers from these
200 WGBS data sets, we developed a new data analysis strategy (Fig. 1A).
201 1) First, we performed a genome-wide methylation analysis for each WGBS
202 sample and obtained all CpG sites covered by more than five reads. By this process, we
203 obtained at least 30 million CpG sites per sample, covering at least 55.7% of whole
204 genomes (Supplementary Table 1). To robustly analyze the difference between normal
205 and cancer samples at single-nucleotide resolution, only CpG sites detected in all seven
206 samples were selected for further analysis. In total, 19,461,312 CpG sites were selected,
207 covering 34.5% of all possible sites in the human genome. This coverage is much higher than that of both
208 reduced representation bisulfite sequencing (RRBS), whose coverage was estimated
209 to be 1-3%, and the Illumina 450k methylation array, covering 485,455 CpG sites and
210 accounting for approximately 2% of all possible sites (22,23). The average methylation
211 levels showed that the cancer samples were hypomethylated compared to normal ones
212 (Fig. 1B), which is consistent with the previous report that cancer is globally
213 hypomethylated. As expected, the 19,461,312 CpG sites were distributed
214 throughout the whole genome, including intergenic, intron, exon, and promoter regions
215 (Fig. 1C). These results indicate that our approach is applicable throughout the genome
216 with minor sequence bias (Supplementary Fig. S1A).
217 2) Second, based on the 19,461,312 CpG sites, by calculating the methylation
218 differences between CCi and NCj, we found 24,257 CC-DMS, 442,233 NC-DMS, and
219 4,456,347 NO-DMS, which accounted for 0.12%, 2.27% and 22.9% of all 19,461,312
220 CpG sites, respectively. Compared to the equilibrium distribution of all the 19,461,312
221 CpG sites (Fig. 1C), CC-DMS were clearly enriched in the promoter and exonic
222 regions (hypergeometric test, p-value < 1e-5) (Fig. 1D); NC-DMS were enriched in the
223 intergenic region (hypergeometric test, p-value < 1e-5 ) (Fig. 1E); meanwhile, NO-DMS
224 were mostly enriched in the intronic region (hypergeometric test, p-value <1e-5 ) (Fig.
225 1F). Additionally, 13,932 CpG sites out of 24,257 CC-DMS (57.4%) were located in
226 CpG island regions. In contrast, only 3,518 CpG sites out of 442,233 NC-DMS (0.8%)
227 were located in the CpG island regions, indicating that DNA methylation in tumors
228 usually occurs in cis-regulatory elements. However, for NO-DMS, hypomethylated
229 CpG sites (methylation level ≤ 20%) were mainly distributed in the promoter region
230 (Supplementary Fig. S1B), while hypermethylated CpG sites (methylation level ≥ 80%)
231 were mainly distributed in the intronic region (Supplementary Fig. S1C). These results
232 reveal that the cancer cells are globally hypomethylated and locally hypermethylated, and
233 these locally hypermethylated regions are mainly distributed in promoter and exonic
234 regions.
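The proportions quoted in this step follow directly from the reported counts, as the short check below illustrates. The variable names are chosen for this example only; all numbers are taken from the text.

```python
# Counts reported in the text.
TOTAL_SITES = 19_461_312
dms_counts = {"CC-DMS": 24_257, "NC-DMS": 442_233, "NO-DMS": 4_456_347}
cpg_island_counts = {"CC-DMS": 13_932, "NC-DMS": 3_518}

# Share of each DMS class among all CpG sites shared by the seven samples.
percent_of_total = {
    name: 100 * count / TOTAL_SITES for name, count in dms_counts.items()
}

# Share of each DMS class that lies in CpG islands.
percent_in_cpg_islands = {
    name: 100 * count / dms_counts[name]
    for name, count in cpg_island_counts.items()
}

for name, value in percent_of_total.items():
    print(f"{name}: {value:.2f}% of all sites")
for name, value in percent_in_cpg_islands.items():
    print(f"{name}: {value:.1f}% in CpG islands")
```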
235 3) Third, similar to the genetic linkage effect, DNA methylation within a small
236 genome region also tends to be consistent (24). Based on this principle, the methylation behavior of adjacent CpG
237 sites considered together as a region is much more reliable than that of
238 single CpG sites. For example, DMR or methylation haplotypes have been widely used
239 for DNA methylation analysis. Therefore, we further defined a DMR as at least three
240 DMS within a 100-bp genomic window. Among the 24,257 CC-DMS, we identified
241 2,408 CC-DMR. From the 442,233 NC-DMS, we found 36,393 NC-DMR.
242 Meanwhile, based on 4,456,347 NO-DMS, we found 435,249 NO-DMR. We further
243 analyzed these DMR-embedded genes. There were 958 CC-DMR-related genes and
244 1,925 NC-DMR-related genes, which we called CC-DMG (Cancer Cell-Differentially
245 Methylated Genes, Supplementary Table 6) and NC-DMG (Normal Cell-Differentially
246 Methylated Genes, Supplementary Table 7). We calculated the methylation levels of CC-
247 DMG and NC-DMG in WGBS and TCGA data (Fig. 1G-1H, Supplementary Tables 6
248 and 7). KEGG pathway analysis showed that CC-DMG were mainly enriched in tumor-
249 associated signaling pathways, such as the Hippo signaling pathway and transcriptional
250 misregulation in cancer. NC-DMG were enriched in olfactory transduction, with less connection
251 to tumor-related signaling pathways. NO-DMG were mainly enriched in basic cellular
252 function-related pathways (Supplementary Fig. S1D). Interestingly, both CC-DMG and
253 NC-DMG were enriched in the neuroactive ligand-receptor interaction signaling pathway.
254 Particularly, some adrenaline signaling-related genes, such as ADRA1A, ADRA2A,
255 ADRA2C, and ADRBK1, appeared in the CC-DMG list, but some cholinergic signaling-
256 related genes, such as CHRM2, CHRM3, and CHRM5 were found in the NC-DMG list.
257 The variation in DNA methylation in nerve-related genes suggests that neuroregulation
258 plays an important role in the genesis and development of lung cancer, which is
259 supported by evidence from several groups showing that cancer development in a variety
260 of tissues is controlled by an assortment of nerve-mediated signals, including
261 neurotransmitters and other molecules (25-27). Epigenetic regulation of
262 neuron-related genes may therefore be of great interest in cancer development. As expected, many
263 renowned lung cancer methylation biomarkers that were reported in the literature are
264 among our CC-DMG list, for example, SHOX2, POU4F2, BCAT1, HOXA9, and PTGDR
265 (28-32). These results further support our strategy of analysis.
266 4) Fourth, to further confirm the veracity of the WGBS analysis, we downloaded
267 the Illumina 450K methylation array data of the TCGA lung cancer cohort. The Illumina
268 450K methylation array contains 485,455 CpG probes. The TCGA lung cancer cohort
269 contained a total of 907 samples, including 75 para-cancer normal control samples and
270 832 lung cancer samples (Supplementary Table 8). We selected overlapping detected
271 CpG sites among 450K probes and CpGs in DMS/DMR/DMG (Fig. 1A, Supplementary
272 Fig. S1E) to verify our WGBS analysis. Among the 485,455 450K probes, 845 and 1,662 CpG
273 sites overlapped with CC-DMS and NC-DMS, respectively.
274 Consistent with the WGBS results, CC-DMS were clearly hypermethylated and NC-DMS clearly
275 hypomethylated in cancer samples relative to normal samples in the TCGA
276 datasets. There were 624 and 840 CpG sites shared between the 450K
277 probes and CC-DMR and NC-DMR, respectively. Similarly, CC-DMR and NC-DMR
278 obtained from WGBS were also verified by TCGA data sets (Supplementary Fig. S1F-I).
279 As for DMG, 401 and 377 CpG sites were shared between the 450K probes and CC-DMG
280 and NC-DMG, respectively, and their DNA methylation status was supported by
281 TCGA data sets (Fig. 1H). Taken together, our results can be verified by lung
282 cancer 450K methylation array data from the TCGA, which further supports the validity of
283 our analysis approach.
284
285 Abnormally hypermethylated signature of histone gene in lung cancer
286 In addition to some already acknowledged biomarkers, such as SHOX2 and
287 POU4F2, we found many previously unreported genes on our CC-DMG list. More
288 interestingly, some histone genes appeared on the CC-DMG list, such as HIST1H3C,
289 HIST1H4F, and HIST1H4I, which called for further investigation.
290 As essential and conserved housekeeping genes, histones are stably expressed in
291 almost all eukaryotic cells. Due to the important function of histones, each histone
292 protein is encoded by multiple histone genes (19). In total, 85 histone genes have been
293 found in the human genome, including 68 canonical histone genes and 17 histone variant
294 genes. Canonical histone genes include six H1 genes, seventeen H2a genes, eighteen H2b
295 genes, thirteen H3 genes, and fourteen H4 genes. Variant histone genes include four H1
296 variants, seven H2a variants, two H2b variants, and four H3 variants. Histone
297 modifications have been widely investigated in the epigenetic field (33,34).
298 Unfortunately, DNA methylation of the histone gene family has not been well described
299 in the literature. We summarize the 85 histone genes in Supplementary Table 9.
300 We further focus on the analysis of DNA methylation of the whole histone gene
301 family in WGBS data (Fig. 2A). Four histone genes were not included in our analysis
302 (HIST2H2AA4, HIST2H3C, HIST2H4A, H2BFS), because they were not all detected in
303 the WGBS data sets.
304 According to their DNA methylation signatures in normal and cancer
305 samples, the histone genes can be divided into seven groups. Group 1 histone genes are
306 poorly methylated in both normal and cancer cells, whereas group 2 histone genes are highly
307 methylated in both normal and cancer samples. Group 3 histone genes are randomly
308 methylated in normal and cancer samples, and group 4, including 14 histone genes
309 (HIST1H4I, HIST1H2BM, HIST1H3C, HIST1H4F, HIST1H2BB, HIST1H2BE,
310 HIST1H1A, HIST1H2BI, HIST1H3G, HIST1H2AD, HIST1H2BE, HIST1H3J,
311 HIST1H2BH, HIST1H4D) were hypermethylated in all lung cancer samples (Fig. 2B,
312 Supplementary Fig. S2A). To confirm this finding, we reanalyzed the methylation of
313 these hypermethylated histone genes on the Illumina 450K methylation arrays of the
314 TCGA lung cancer cohort (n=907), and the results showed that nine of the fourteen genes
315 (HIST1H4I, HIST1H4F, HIST1H3C, HIST1H2BE, HIST1H2BM, HIST1H3J,
316 HIST1H2BB, HIST1H1A, HIST1H2BI) were significantly hypermethylated in both
317 LUAD and LUSC (Fig. 2C). In addition, we found that DNA methylation of histone
318 genes can be used for the classification of the three main types of lung cancer. We found
319 that four histone genes (group 5: HIST1H2AG, HIST3H2A, HIST3H2BB, HIST1H3F)
320 were specifically hypermethylated in LUAD (Fig. 2D), and four histone genes (group 6:
321 HIST1H4A, HIST1H3A, HIST1H2AL, HIST1H3I) were only methylated in LUSC
322 samples (Fig. 2E), and another six histone genes (group 7: HIST1H2BL, HIST2H3D,
323 HIST1H2AJ, H2AFJ, HIST1H2AI, HIST1H1D) were highly methylated in SCLC (Fig. 2F).
324 More importantly, these cancer type-specific hypermethylated genes can be verified in
325 the TCGA datasets (Supplementary Fig. S2B). These results suggest that methylation of
326 histone gene loci may be used for distinguishing lung cancer subtypes.
327 We further performed receiver operating characteristic (ROC) analysis on the
328 fourteen hypermethylated histone genes using TCGA datasets. The results
329 show that HIST1H4F and HIST1H4I have much higher specificity and sensitivity; the
330 specificity and sensitivity of HIST1H4F were 97.3% and 82.7%, respectively, and the
331 specificity and sensitivity of HIST1H4I were 96.0% and 87.5%, respectively
332 (Supplementary Table 10). Moreover, they exhibit excellent performance in stage
333 I lung cancer, and ROC analysis reveals that they have similar AUCs between
334 different stages, which indicates that methylation of HIST1H4F and HIST1H4I can act as
335 a biomarker for early lung cancer diagnosis (Supplementary Fig. S2C-D, Supplementary Table
336 10). Furthermore, ROC analysis showed that maximum methylation of HIST1H4F and
337 HIST1H4I (Max-IF) performed better than individual genes, with an area under the curve
338 (AUC) of 0.95, a specificity of 96.0%, and a sensitivity of 92.9% (Supplementary Fig.
339 S2E, Supplementary Table 10).
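The Max-IF score and the accuracy measures quoted above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' analysis code: the function names, the toy methylation values, the cutoff of 10%, and the convention that a score at or above the cutoff counts as positive are all hypothetical. The AUC is computed as the fraction of cancer-control pairs in which the cancer sample scores higher, with ties counted as one half.

```python
def max_if(hist1h4f, hist1h4i):
    """Max-IF: the larger of the HIST1H4F and HIST1H4I methylation values."""
    return max(hist1h4f, hist1h4i)


def sensitivity_specificity(cancer_scores, control_scores, cutoff):
    """Sensitivity and specificity when scores >= cutoff are called positive."""
    true_pos = sum(score >= cutoff for score in cancer_scores)
    true_neg = sum(score < cutoff for score in control_scores)
    return true_pos / len(cancer_scores), true_neg / len(control_scores)


def auc(cancer_scores, control_scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    wins = 0.0
    for cancer in cancer_scores:
        for control in control_scores:
            if cancer > control:
                wins += 1.0
            elif cancer == control:
                wins += 0.5
    return wins / (len(cancer_scores) * len(control_scores))


# Invented (HIST1H4F, HIST1H4I) methylation percentages per sample.
cancer_pairs = [(30, 5), (4, 40), (3, 2), (25, 60)]
control_pairs = [(2, 1), (4, 3), (1, 6), (2, 2)]
cancer_scores = [max_if(f, i) for f, i in cancer_pairs]
control_scores = [max_if(f, i) for f, i in control_pairs]

sens, spec = sensitivity_specificity(cancer_scores, control_scores, cutoff=10)
area = auc(cancer_scores, control_scores)
```

Taking the maximum of the two genes lets a sample be called positive when either locus is hypermethylated, which is consistent with the combined score showing higher sensitivity than either gene alone.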
340 To further confirm our results, the lung cancer primary tissue samples were used
341 for verification. We collected 25 lung cancer tissue samples and paired para-cancer tissue
342 samples as control (Supplementary Table 11). Methylation of HIST1H4F and HIST1H4I
343 was detected by bisulfite PCR-pyrosequencing. The results showed that HIST1H4F and
344 HIST1H4I were significantly hypermethylated in lung cancer, and ROC analysis showed
345 very high sensitivity and specificity for each gene (Supplementary Fig. S2F). Max-IF was
346 significantly hypermethylated in lung cancer samples, with an AUC of 0.98, a
347 sensitivity of 96%, and a specificity of 88% (Fig. 2G-H).
348
349 Methylation pattern of Histone gene for lung cancer diagnosis by bronchoalveolar
350 lavage fluid samples
351 BALF is of great significance in the early diagnosis of lung cancer (35,36).
352 Therefore, we tested whether lung cancer could be diagnosed by detecting the methylation of histone genes
353 in BALF samples. We collected 265 BALF samples consisting of 59 BLD control
354 samples and 206 lung cancer samples. The BLD control group included pneumonia,
355 emphysema, and tuberculosis samples, among others. The lung cancer group included 92
356 LUSC, 70 LUAD, and 44 SCLC samples. After obtaining the BALF samples, we
357 randomly divided the samples into the training set (n=133) and validation set (n=132)
358 (Table 1).
359 A bisulfite-PCR pyrosequencing assay was used to detect HIST1H4F and
360 HIST1H4I methylation. To ensure the reproducibility of pyrosequencing, three technical
361 replicates of bisulfite-PCR pyrosequencing were performed on a total of 30 BALF
362 samples, including 10 low-methylated (0% ≤ methylation ≤ 5%), 10 middle-methylated (5%
363 < methylation < 20%), and 10 high-methylated (20% ≤ methylation ≤ 100%) samples. The
364 results showed excellent reproducibility in the low-, middle-, and high-methylated samples,
365 with methylation variation within 5% (Supplementary Fig. S3A-C). Our analysis of
366 clinical samples showed that, in both the training set and the validation set, HIST1H4F
367 and HIST1H4I were significantly hypermethylated in different types of lung cancer
368 (Supplementary Fig. S4A-B). Max-IF was also significantly higher in LUAD, LUSC,
369 SCLC and all lung cancer samples (Fig. 3A). To assess the potential for lung cancer
370 diagnosis using HIST1H4I, HIST1H4F or Max-IF, we first performed ROC analysis in
371 the training data set, where the area under the ROC curve (AUC) was calculated and a
372 cutoff value was determined accordingly; sensitivity and specificity were then
373 calculated based on this cutoff. Moreover, to robustly estimate the diagnostic accuracy,
374 an independent evaluation was performed using the validation set, in which sensitivity
375 and specificity were recalculated using the same cutoff (Fig. 3B-3C,
376 Supplementary Table 10). For LUSC and SCLC, Max-IF achieved AUCs of 0.94 and
377 0.97, respectively (Fig. 3B). For LUSC, with a methylation cutoff of 6.05%, the
378 specificity and sensitivity of Max-IF were 96.7% and 86.4% in the training set and were
379 96.5% and 85.4% in the validation set. For SCLC, with the methylation cutoff of 7.75%,
380 the specificity and sensitivity of Max-IF were 96.7% and 95.5% in the training set and
381 were 96.5% and 95.7% in the validation set (Fig. 3C, Supplementary Table 10).
382 Compared with LUSC and SCLC, which tend to be more centrally located, LUAD is
383 usually observed peripherally in the lungs (37). Therefore, LUSC and SCLC BALF
384 samples are more likely to contain cancer cells than LUAD BALF samples (38), and the
385 sensitivity for LUAD is expected to be lower than that for LUSC and SCLC in BALF samples. As
386 expected, in LUAD, the specificity and sensitivity of Max-IF were 96.7% and 60.5% in
387 the training set (cutoff=6.3% and AUC=0.84) and were 96.5% and 65.6% in the
388 validation set. In order to improve the detection sensitivity in LUAD, we combined
389 Max-IF with serum carcinoembryonic antigen (CEA). The sensitivity of CEA alone
390 is very low for lung cancer diagnosis (39). In our study, the sensitivities
391 of CEA (cutoff=5 ng/ml) in the training set and validation set were 27.3% and 30.7%,
392 respectively. However, the sensitivity of CEA in LUAD is much higher than in LUSC or
393 SCLC. In the training set, the sensitivities of CEA for LUAD, LUSC, and SCLC were 47.1%,
394 16.2%, and 14.3%, respectively. In the validation set, the sensitivities of CEA for LUAD, LUSC,
395 and SCLC were 50%, 22.2%, and 26.1%, respectively. Therefore, we combined Max-IF
396 with serum CEA for LUAD diagnosis, calling a sample positive if either marker was
397 positive; the sensitivity increased from 60.5% to 77.8% in the training set and
398 from 65.6% to 81.5% in the validation set (Fig. 3D). For all cancer samples, the
399 specificity and sensitivity were 96.7% and 86.0% in the training set and 96.5% and 87.0%
400 in the validation set, indicating that histone gene methylation has excellent accuracy
401 as a biomarker for lung cancer diagnosis (Fig. 3E).
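The evaluation scheme described in this section (a cutoff fixed on the training set, then applied unchanged to the validation set, and an either-positive rule for combining Max-IF with CEA) can be sketched in Python as follows. The text does not state how the cutoff was derived from the ROC curve, so the Youden index used here is an assumption, as are all sample values, the function names, and the convention that values above a cutoff are positive. Only the 6.3% Max-IF cutoff for LUAD and the 5 ng/ml CEA cutoff are taken from the text.

```python
def sens_spec(cases, controls, cutoff):
    """Sensitivity and specificity when values above the cutoff are called positive."""
    sens = sum(v > cutoff for v in cases) / len(cases)
    spec = sum(v <= cutoff for v in controls) / len(controls)
    return sens, spec


def youden_cutoff(cases, controls):
    """Assumed criterion: the observed value maximizing sensitivity + specificity - 1."""
    best_cutoff, best_j = None, float("-inf")
    for candidate in sorted(set(cases) | set(controls)):
        sens, spec = sens_spec(cases, controls, candidate)
        j = sens + spec - 1
        if j > best_j:
            best_cutoff, best_j = candidate, j
    return best_cutoff


def either_positive(max_if, cea, max_if_cutoff, cea_cutoff=5.0):
    """Combined call: positive if Max-IF (%) or serum CEA (ng/ml) exceeds its cutoff."""
    return max_if > max_if_cutoff or cea > cea_cutoff


# Hypothetical Max-IF values (%), for illustration only.
train_cases = [8.0, 15.0, 4.0, 22.0, 9.5]
train_controls = [1.0, 2.0, 3.5, 5.0, 6.0]
cutoff = youden_cutoff(train_cases, train_controls)  # fixed on the training set

valid_cases = [7.0, 5.5, 12.0, 30.0]
valid_controls = [2.0, 6.5, 1.0, 3.0]
valid_sens, valid_spec = sens_spec(valid_cases, valid_controls, cutoff)

# Hypothetical LUAD samples as (Max-IF %, serum CEA ng/ml).
luad = [(3.0, 8.2), (9.1, 2.0), (2.5, 1.1), (7.0, 6.0)]
alone = sum(m > 6.3 for m, _ in luad) / len(luad)
combined = sum(either_positive(m, c, 6.3) for m, c in luad) / len(luad)

print("training cutoff:", cutoff)
print("validation sensitivity, specificity:", valid_sens, valid_spec)
print("LUAD sensitivity, Max-IF alone vs combined:", alone, combined)
```

An either-positive rule can only raise sensitivity; it may lower specificity, so specificity should be checked against the control group for the combined call as well.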
402
403 Methylation of HIST1H4F gene is a potential Universal-Cancer-Only Methylation
404 marker
405 Having demonstrated that many histone genes are abnormally hypermethylated
406 in lung cancer, we asked whether histone genes are also abnormally methylated in other
407 types of cancer. In total, seventeen cancer cohorts from the TCGA were analyzed. They
408 include BLCA (bladder urothelial carcinoma, n=433), BRCA (breast invasive carcinoma,
409 n=867), CESC (cervical squamous cell carcinoma and endocervical adenocarcinoma,
410 n=310), CHOL (cholangiocarcinoma, n=45), COAD (colon adenocarcinoma, n=335),
411 ESCA (esophageal carcinoma, n=201), HNSC (head & neck squamous cell carcinoma,
412 n=578), KIRC (kidney renal clear cell carcinoma, n=479), LIHC (liver hepatocellular
413 carcinoma, n=427), LUNG (lung cancer, n=907), PAAD (pancreatic adenocarcinoma,
414 n=194), PRAD (prostate adenocarcinoma, n=548), READ (rectum adenocarcinoma,
415 n=106), SKCM (skin cutaneous melanoma, n=476), STAD (stomach adenocarcinoma,
416 n=398), THCA (thyroid carcinoma, n=563), and UCEC (uterine corpus endometrioid
417 carcinoma, n=477) (Supplementary Table 12).
418 For each cancer type, we calculated the average methylation difference between normal
419 and cancer samples (Fig. 4A). We found no methylation differences in most
420 histone genes. However, some histone genes tended to be hypermethylated in different
421 types of cancer, including HIST1H4F, HIST1H3E, HIST1H2BB, HIST1H1A, HIST1H3C, and
422 HIST1H4I. In contrast, H2BFM and H2BFWT tended to be hypomethylated in various
423 types of cancer. Importantly, we found that HIST1H4F was hypermethylated in all tumor
424 types except THCA (Fig. 4B). In THCA, although only a minor methylation difference was
425 observed between normal (median=6.1%) and cancer (median=5.4%) samples, we
426 showed that HIST1H4F was hypermethylated in different stages of cancer compared with normal
427 samples (Supplementary Fig. S5). Therefore, we considered HIST1H4F
428 hypermethylation as a conserved feature across almost all types of cancers, and named it
429 “Universal-Cancer-Only Methylation (UCOM)”. Furthermore, we analyzed the
430 relationship between HIST1H4F methylation and tumor stage or patient outcome in
431 eight tumor types with a larger sample size in the TCGA database (Supplementary Fig.
432 S6A-S6C and Supplementary Fig. S7A-S7G). The results showed that HIST1H4F was
433 already hypermethylated in stage I of all eight types of cancer, without significant
434 differences among cancer stages. Moreover, ROC analysis showed that the AUCs were
435 also similar in different stages (Supplementary Table 13). These results indicate that
436 the HIST1H4F locus is methylated during the initiation of cancer development.
437 Furthermore, survival analysis in these eight cancer types showed no
438 significant differences in patient outcome among the low-, middle-, and high-methylation
439 groups (Supplementary Table 14). Taken together, our results suggest that
440 hypermethylation of HIST1H4F can act as a useful early diagnostic marker for multiple
441 types of cancer.
442 To further confirm HIST1H4F as a Universal-Cancer-Only Methylation marker,
443 we selected 243 clinical cancer samples covering nine types of cancer, including the 50
444 lung cancer samples shown previously and another 193 samples from eight other
445 types of cancer (Supplementary Table 4). Methylation of HIST1H4F in these samples
446 was detected by bisulfite-PCR pyrosequencing assay. The results showed that HIST1H4F
447 was significantly hypermethylated in all nine types of cancer (Fig. 4C). ROC analysis of
448 HIST1H4F methylation in nine types of cancer was performed, and the results showed
449 that the AUCs in all nine cancers were above 0.87, suggesting HIST1H4F as an ideal candidate
450 Universal-Cancer-Only Methylation marker (Supplementary Table 10). If HIST1H4F is
451 indeed a Universal-Cancer-Only Methylation marker, we would expect
452 the DNA methylation level of HIST1H4F to reflect the proportion of cancer cells
453 mixed with non-cancer cells in clinical samples. To verify this point, we mixed normal
454 cells (lung fibroblast cell line MRC5 or normal liver cells) with cancer cells (lung
455 cancer cell line A549 or liver cancer cell line HepG2) at proportions of 0%, 25%, 50%,
456 75%, and 100%. We then detected the methylation level of each sample by bisulfite-PCR
457 pyrosequencing assay. As expected, the measured methylation level closely reflected the
458 percentage of cancer cell DNA mixed with normal cell DNA. These results indicate that
459 HIST1H4F is not only a Universal-Cancer-Only Methylation marker, but can also be used to
460 estimate the cancer cell ratio in clinical samples (Supplementary Fig. S8A-S8B).
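The reasoning behind the mixing experiment can be written as a simple linear mixture model, sketched below in Python. The model assumes that cancer and normal cells contribute equal amounts of DNA per cell and that each pure population has a fixed methylation level. The function names and the methylation values of 80% and 4% are hypothetical and are not measurements from this study.

```python
def expected_methylation(cancer_fraction, m_cancer, m_normal):
    """Methylation (%) expected for a mixture, as a weighted average of the
    pure cancer and pure normal methylation levels."""
    return cancer_fraction * m_cancer + (1.0 - cancer_fraction) * m_normal


def estimate_cancer_fraction(m_observed, m_cancer, m_normal):
    """Invert the mixture model to estimate the cancer cell fraction,
    clipped to the range 0..1."""
    if m_cancer == m_normal:
        raise ValueError("pure cancer and normal levels must differ")
    fraction = (m_observed - m_normal) / (m_cancer - m_normal)
    return min(1.0, max(0.0, fraction))


# Hypothetical pure-population methylation levels (%), for illustration only.
M_CANCER, M_NORMAL = 80.0, 4.0

for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
    level = expected_methylation(fraction, M_CANCER, M_NORMAL)
    print(f"cancer fraction {fraction:.2f} -> expected methylation {level:.1f}%")

print(estimate_cancer_fraction(23.0, M_CANCER, M_NORMAL))
```

Under this model, the estimate is most reliable when the normal background is close to zero and the pure cancer level is known; in a clinical sample the latter is usually unknown, so the estimate should be read as approximate.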
461 In summary, we examined nine types of cancer, and although many other rare types
462 of cancer have not yet been verified, we speculate that HIST1H4F is hypermethylated in
463 many other cancer types as well. Therefore, we conclude that HIST1H4F may be a
464 promising Universal-Cancer-Only Methylation marker for screening patients with
465 early-stage cancer, and its role in tumorigenesis awaits further study.
466 DNA methylation is usually correlated with gene expression, so we asked whether
467 abnormal hypermethylation of HIST1H4F influenced gene expression. We analyzed
468 HIST1H4F expression in fifteen tumor types in the TCGA database (tumor types without
469 normal controls were excluded). The results showed that in most types of tumors
470 HIST1H4F has no (or very low) gene expression in both normal controls and
471 tumors (Supplementary Fig. S9A). We verified this in the cultured normal lung fibroblast cell
472 line MRC5 and the lung cancer cell line A549, in which we measured DNA methylation and
473 gene expression of HIST1H4F. The results showed that HIST1H4F was hypermethylated
474 in A549 cells and hypomethylated in MRC5 cells (Supplementary Fig. S9B), but was not
475 expressed in either cell line (Supplementary Fig. S9C). These unexpected results
476 suggest that the expression of HIST1H4F itself may not be involved in tumorigenesis,
477 but that the epigenetic status of the HIST1H4F locus may instead affect chromatin
478 information or structure, which in turn alters cancer-related gene expression during
479 tumor initiation. This idea is further supported by the observation that the genomic sequences
480 of the histone H4 genes are completely different but encode almost the same amino acid
481 sequences (Supplementary Fig. S9D-E).
482
483 Discussion
484 WGBS is the most comprehensive method for detecting genome-wide DNA
485 methylation (23). However, few reports have directly investigated methylation
486 biomarkers in WGBS datasets. Here, we developed a new strategy to analyze WGBS data
487 and to efficiently screen for new methylation markers of lung cancer genome-wide.
488 These markers were also further verified by TCGA data and clinical cancer samples.
489 Through these analyses, we unexpectedly found that many histone genes were
490 abnormally hypermethylated in lung cancer. The methylation status of HIST1H4F and
491 HIST1H4I in BALF samples can be used as an effective approach for the early diagnosis
492 of lung cancer, with a specificity of 96.7% and a sensitivity of 87.0%.
493 The TCGA program provides us with a wealth of data on the study of tumors,
494 especially for the study of pan-cancer characteristics, and a series of high-level
495 studies have been published, including pan-cancer-related signaling pathway analysis
496 (40,41), genetic alteration analysis (42-44), molecular-based tumor reclassification analysis
497 (45), and pan-cancer DNA methylation analysis (46). These studies have given us
498 informative views of cancer from different perspectives. However, the reported
499 pan-cancer markers combine many genes for cancer diagnosis, and
500 few reports have described a single gene or locus that can be used to screen for all cancer types.
501 This may be because the methylation data in the TCGA database were measured using the
502 450K methylation array, which covers only about 2% of all CpG sites in the genome, so
503 most information of the genome is missing. Therefore, combining WGBS data with
504 TCGA data for analysis is an efficient strategy for screening DNA methylation
505 biomarkers across the genome.
506 Histone genes are an important family of housekeeping genes expressed in almost all
507 organisms. To ensure stable histone expression, each histone protein is
508 encoded by many histone genes. The regulation of spatial and temporal expression of the
509 histone genes is very different from that of other genes (17,19). In addition, the modification of
510 histones has been extensively studied. However, there is no systematic study on the
511 methylation abnormality of the histone loci themselves. Alterations in the chromatin
512 structure of the histone gene cluster 1 region have been found in breast cancer (20).
513 Similarly, it has been reported that the histone gene cluster 1 genomic region is
514 abnormally enriched in H3K27me3 in acute myeloid leukemia (AML) (47). Interestingly,
515 we found aberrant DNA methylation in many histone loci located in the
516 histone gene cluster 1. We further analyzed the expression of HIST1H4F in fifteen tumor
517 types in the TCGA database, and the results showed that HIST1H4F has no (or very low)
518 expression in normal tissues and tumors of different cancer types. We speculate that
519 this aberrant DNA methylation may affect CTCF binding, which would further alter the
520 chromatin structure of histone gene cluster 1 during cancer development (48,49), and
521 that the epigenetic status or higher-order chromatin structure of histone loci,
522 rather than their expression, may be involved in tumor initiation. More
523 interestingly, histone genes in cluster 1 are also methylated in different types of cancer,
524 which suggests that aberrant DNA methylation in the region of histone gene cluster 1 may
525 also be involved in common mechanisms of development in multiple types of cancer;
526 it will be interesting to explore this in the near future.
527 To extend our unexpected findings, we analyzed seventeen cohorts of cancer in
528 the TCGA database and found that many histone genes are not only hypermethylated in
529 lung cancer but also abnormally hypermethylated in many other tumors. Moreover, we
530 were surprised to find that HIST1H4F is hypermethylated in all cancer types and is both
531 highly sensitive and specific as a potential Universal-Cancer-Only Methylation marker,
532 which was further verified by a total of 243 clinical samples, covering nine tumor types.
533 Unlike most reported multigene panels for pan-cancer diagnosis (50-52), HIST1H4F is a
534 potential Universal-Cancer-Only Methylation marker, which was a completely
535 unexpected finding and will be of great convenience and significance in subsequent
536 clinical applications. Meanwhile, further exploring the underlying mechanism of
537 HIST1H4F in cancer development may help us better understand the common features of
538 tumorigenesis. Because HIST1H4F is a potential Universal-Cancer-Only Methylation marker, the epigenetic status
539 and chromatin structure of the HIST1H4F locus may be of great significance for understanding
540 the general mechanism of cancer development, and reversing DNA methylation at
541 specific histone loci may be a potential common strategy for future cancer treatment.
542
543 Acknowledgments
544 We thank Yan Li, Lina Peng, Huaibing Luo, ZhiCong Chu, Yao Xiao, Min Xiao, Ying
545 Guo, Lu Chen and Lan Zhang for experimental help. We thank Ruitu Lv and Feizhen Wu
546 for bioinformatic analysis help. We thank Yue Yu, Zhicong Yang, Ying Tong and
547 Zhiqiang Hu for editorial help and useful comments on the manuscript.
548 References
549
1. Hirsch FR, Scagliotti GV, Mulshine JL, Kwon R, Curran WJ, Jr., Wu YL, et al. Lung cancer: current therapies and new targeted treatments. Lancet 2017;389:299-311
2. Melosky B, Chu Q, Juergens R, Leighl N, McLeod D, Hirsh V. Pointed Progress in Second-Line Advanced Non-Small-Cell Lung Cancer: The Rapidly Evolving Field of Checkpoint Inhibition. J Clin Oncol 2016;34:1676-88
3. Sozzi G, Boeri M. Potential biomarkers for lung cancer screening. Transl Lung Cancer Res 2014;3:139-48
4. National Lung Screening Trial Research T, Aberle DR, Adams AM, Berg CD, Black WC, Clapp JD, et al. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med 2011;365:395-409
5. Kanodra NM, Silvestri GA, Tanner NT. Screening and early detection efforts in lung cancer. Cancer 2015;121:1347-56
6. Singhal S, Vachani A, Antin-Ozerkis D, Kaiser LR, Albelda SM. Prognostic implications of cell cycle, apoptosis, and angiogenesis biomarkers in non-small cell lung cancer: a review. Clin Cancer Res 2005;11:3974-86
7. Kathuria H, Gesthalter Y, Spira A, Brody JS, Steiling K. Updates and controversies in the rapidly evolving field of lung cancer screening, early detection, and chemoprevention. Cancers (Basel) 2014;6:1157-79
8. Risch A, Plass C. Lung cancer epigenetics and genetics. Int J Cancer 2008;123:1-7
9. Mundbjerg K, Chopra S, Alemozaffar M, Duymich C, Lakshminarasimhan R, Nichols PW, et al. Identifying aggressive prostate cancer foci using a DNA methylation classifier. Genome Biol 2017;18:3
10. Nguyen LV, Pellacani D, Lefort S, Kannan N, Osako T, Makarem M, et al. Barcoding reveals complex clonal dynamics of de novo transformed human mammary cells. Nature 2015;528:267-71
11. Li J, Li Y, Li W, Luo H, Xi Y, Dong S, et al. Guide Positioning Sequencing identifies aberrant DNA methylation patterns that alter cell identity and tumor-immune surveillance networks. Genome Res 2019;29:270-80
12. Dor Y, Cedar H. Principles of DNA methylation and their implications for biology and medicine. Lancet 2018;392:777-86
13. Koch A, Joosten SC, Feng Z, de Ruijter TC, Draht MX, Melotte V, et al. Analysis of DNA methylation in cancer: location revisited. Nat Rev Clin Oncol 2018;15:459-66
14. Vargas AJ, Harris CC. Biomarker development in the precision medicine era: lung cancer as a case study. Nat Rev Cancer 2016;16:525-37
15. Hu Y, Lai Y. Identification and expression analysis of rice histone genes. Plant Physiol Biochem 2015;86:55-65
16. Bhasin M, Reinherz EL, Reche PA. Recognition and classification of histones using support vector machine. J Comput Biol 2006;13:102-12
17. Isogai Y, Keles S, Prestel M, Hochheimer A, Tjian R. Transcription of histone gene cluster by differential core-promoter factors. Genes Dev 2007;21:2936-49
18. Buschbeck M, Hake SB. Variants of core histones and their roles in cell fate decisions, development and cancer. Nat Rev Mol Cell Biol 2017;18:299-314
19. Braastad CD, Hovhannisyan H, van Wijnen AJ, Stein JL, Stein GS. Functional characterization of a human histone gene cluster duplication. Gene 2004;342:35-40
20. Fritz AJ, Ghule PN, Boyd JR, Tye CE, Page NA, Hong D, et al. Intranuclear and higher-order chromatin organization of the major histone gene cluster in breast cancer. J Cell Physiol 2018;233:1278-90
21. Xi Y, Li W. BSMAP: whole genome bisulfite sequence MAPping program. BMC Bioinformatics 2009;10:232
22. Yong WS, Hsu FM, Chen PY. Profiling genome-wide DNA methylation. Epigenetics Chromatin 2016;9:26
23. Chatterjee A, Rodger EJ, Morison IM, Eccles MR, Stockwell PA. Tools and Strategies for Analysis of Genome-Wide and Gene-Specific DNA Methylation Patterns. Methods Mol Biol 2017;1537:249-77
24. Guo S, Diep D, Plongthongkum N, Fung HL, Zhang K, Zhang K. Identification of methylation haplotype blocks aids in deconvolution of heterogeneous tissue samples and tumor tissue-of-origin mapping from plasma DNA. Nat Genet 2017;49:635-42
25. Zhao CM, Hayakawa Y, Kodama Y, Muthupalani S, Westphalen CB, Andersen GT, et al. Denervation suppresses gastric tumorigenesis. Sci Transl Med 2014;6:250ra115
26. Zahalka AH, Arnal-Estape A, Maryanovich M, Nakahara F, Cruz CD, Finley LWS, et al. Adrenergic nerves activate an angio-metabolic switch in prostate cancer. Science 2017;358:321-6
27. Magnon C, Hall SJ, Lin J, Xue X, Gerber L, Freedland SJ, et al. Autonomic nerve development contributes to prostate cancer progression. Science 2013;341:1236361
28. Ilse P, Biesterfeld S, Pomjanski N, Wrobel C, Schramm M. Analysis of SHOX2 methylation as an aid to cytology in lung cancer diagnosis. Cancer Genomics Proteomics 2014;11:251-8
29. Pradhan MP, Desai A, Palakal MJ. Systems biology approach to stage-wise characterization of epigenetic genes in lung adenocarcinoma. BMC Syst Biol 2013;7:141
30. Ooki A, Maleki Z, Tsay JJ, Goparaju C, Brait M, Turaga N, et al. A Panel of Novel Detection and Prognostic Methylated DNA Markers in Primary Non-Small Cell Lung Cancer and Serum DNA. Clin Cancer Res 2017;23:7141-52
31. Diaz-Lagares A, Mendez-Gonzalez J, Hervas D, Saigi M, Pajares MJ, Garcia D, et al. A Novel Epigenetic Signature for Early Diagnosis in Lung Cancer. Clin Cancer Res 2016;22:3361-71
32. Su J, Huang YH, Cui X, Wang X, Zhang X, Lei Y, et al. Homeobox oncogene activation by pan-cancer DNA hypermethylation. Genome Biol 2018;19:108
33. Cedar H, Bergman Y. Linking DNA methylation and histone modification: patterns and paradigms. Nat Rev Genet 2009;10:295-304
34. Hammond CM, Stromme CB, Huang H, Patel DJ, Groth A. Histone chaperone networks shaping chromatin function. Nat Rev Mol Cell Biol 2017;18:141-58
35. Wang H, Zhang X, Liu X, Liu K, Li Y, Xu H. Diagnostic value of bronchoalveolar lavage fluid and serum tumor markers for lung cancer. J Cancer Res Ther 2016;12:355-8
36. Poletti V, Poletti G, Murer B, Saragoni L, Chilosi M. Bronchoalveolar lavage in malignancy. Semin Respir Crit Care Med 2007;28:534-45
37. Collins LG, Haines C, Perkel R, Enck RE. Lung cancer: diagnosis and management. Am Fam Physician 2007;75:56-63
38. Sareen R, Pandey CL. Lung malignancy: Diagnostic accuracies of bronchoalveolar lavage, bronchial brushing, and fine needle aspiration cytology. Lung India 2016;33:635-41
39. Holdenrieder S, Wehnl B, Hettwer K, Simon K, Uhlig S, Dayyani F. Carcinoembryonic antigen and cytokeratin-19 fragments for assessment of therapy response in non-small cell lung cancer: a systematic review and meta-analysis. Br J Cancer 2017;116:1037-45
40. Sanchez-Vega F, Mina M, Armenia J, Chatila WK, Luna A, La KC, et al. Oncogenic Signaling Pathways in The Cancer Genome Atlas. Cell 2018;173:321-37 e10
41. Chen H, Li C, Peng X, Zhou Z, Weinstein JN, Cancer Genome Atlas Research N, et al. A Pan-Cancer Analysis of Enhancer Expression in Nearly 9000 Patient Samples. Cell 2018;173:386-99 e12
42. Korkut A, Zaidi S, Kanchi RS, Rao S, Gough NR, Schultz A, et al. A Pan-Cancer Analysis Reveals High-Frequency Genetic Alterations in Mediators of Signaling by the TGF-beta Superfamily. Cell Syst 2018;7:422-37 e7
43. Huang KL, Mashl RJ, Wu Y, Ritter DI, Wang J, Oh C, et al. Pathogenic Germline Variants in 10,389 Adult Cancers. Cell 2018;173:355-70 e14
44. Bailey MH, Tokheim C, Porta-Pardo E, Sengupta S, Bertrand D, Weerasinghe A, et al. Comprehensive Characterization of Cancer Driver Genes and Mutations. Cell 2018;174:1034-5
45. Hoadley KA, Yau C, Hinoue T, Wolf DM, Lazar AJ, Drill E, et al. Cell-of-Origin Patterns Dominate the Molecular Classification of 10,000 Tumors from 33 Types of Cancer. Cell 2018;173:291-304 e6
46. Saghafinia S, Mina M, Riggi N, Hanahan D, Ciriello G. Pan-Cancer Landscape of Aberrant DNA Methylation across Human Tumors. Cell Rep 2018;25:1066-80 e8
47. Tiberi G, Pekowska A, Oudin C, Ivey A, Autret A, Prebet T, et al. PcG methylation of the HIST1 cluster defines an epigenetic marker of acute myeloid leukemia. Leukemia 2015;29:1202-6
48. Bonev B, Cavalli G. Organization and function of the 3D genome. Nat Rev Genet 2016;17:661-78
49. Dixon JR, Xu J, Dileep V, Zhan Y, Song F, Le VT, et al. Integrative detection and analysis of structural variation in cancer genomes. Nat Genet 2018;50:1388-98
50. Yang X, Gao L, Zhang S. Comparative pan-cancer DNA methylation analysis reveals cancer common and specific patterns. Brief Bioinform 2017;18:761-73
51. Hao X, Luo H, Krawczyk M, Wei W, Wang W, Wang J, et al. DNA methylation markers for diagnosis and prognosis of common cancers. Proc Natl Acad Sci U S A 2017;114:7414-9
52. Brena RM, Plass C, Costello JF. Mining methylation for early detection of common cancers. PLoS Med 2006;3:e479
685
Tables
Table 1. Clinical information of the training set and validation set (bronchoalveolar lavage fluid samples).

Training Set
Characteristics    BLD (n=30)   LUAD (n=38)   LUSC (n=44)   SCLC (n=21)   Total (n=103)
Age (years)
  Mean±SEM         55.8±2.1     62.0±1.5      64.1±1.4      56.6±1.9      61.8±0.9
  Range            34-72        44-76         31-79         44-76         31-79
Gender
  Female (%)       12(40.0)     12(31.6)      3(6.8)        3(14.3)       18(17.5)
  Male (%)         18(60.0)     26(68.4)      41(93.2)      18(85.7)      85(82.5)
Stage
  Stage I (%)      -            10(26.3)      13(30.0)      8(38.1)       31(30.1)
  Stage II (%)     -            11(28.9)      12(27.3)      3(14.3)       26(25.2)
  Stage III (%)    -            10(26.3)      13(30.0)      7(33.3)       30(29.1)
  Stage IV (%)     -            7(18.4)       6(13.6)       3(14.3)       16(15.5)

Validation Set
Characteristics    BLD (n=29)   LUAD (n=32)   LUSC (n=48)   SCLC (n=23)   Total (n=103)
Age (years)
  Mean±SEM         53.5±2.3     60.2±1.6      61.2±1.4      59.7±1.7      60.7±0.9
  Range            35-80        43-80         39-80         46-76         39-80
Gender
  Female (%)       12(41.4)     10(31.2)      4(8.3)        5(21.7)       19(18.4)
  Male (%)         17(58.6)     22(68.8)      44(91.7)      18(78.3)      84(81.6)
Stage
  Stage I (%)      -            13(40.6)      14(29.2)      9(39.1)       36(35.0)
  Stage II (%)     -            7(21.9)       16(33.3)      4(17.4)       27(26.2)
  Stage III (%)    -            5(15.6)       13(27.1)      6(26.1)       24(23.3)
  Stage IV (%)     -            7(21.9)       5(10.4)       4(17.4)       16(15.5)
Figure Legend
Fig. 1. Systematic analysis of WGBS data and validation by TCGA datasets. A,
Outline of WGBS data analysis and TCGA data validation: four normal cell (NC)
WGBS datasets and three cancer cell (CC) WGBS datasets were collected, and CpG sites
detected in all seven samples were selected for subsequent analysis. A DMS was defined
by the methylation difference between CCi and NCj, a DMR was defined as 3 consecutive
DMS within a 100-bp region, and a DMG was defined as a gene containing a DMR; CCi represents any
of the CC samples, and NCj represents any of the NC samples. B, The average methylation level of each
normal and cancer sample in the WGBS data showed that cancer genomes are globally
hypomethylated (Wilcoxon test, P=0.057). C~F, Genomic distribution of all detected CpG
sites, CC-DMS, NC-DMS and NO-DMS; the promoter region was defined as TSS±1 kb. G,
Heatmap of CC-DMG and NC-DMG methylation from WGBS data; each row represents
one gene. H, Validation of CC-DMG and NC-DMG methylation by TCGA datasets:
each blue dot represents a CC-DMG and each red dot represents an NC-DMG, the x-axis
represents the average methylation of normal samples in TCGA data, the y-axis
represents the average methylation of cancer samples in TCGA data. NC, Normal Cell.
CC, Cancer Cell. DMS, Differentially Methylated Sites. DMR, Differentially Methylated
Regions. DMG, Differentially Methylated Genes. TSS, Transcriptional Start Sites.
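The site and region definitions in Fig. 1A can be illustrated with a minimal Python sketch. It uses the thresholds printed in panel A (a CCi - NCj difference of at least 50% for CC-DMS, NCj - CCi of at least 50% for NC-DMS, and an absolute difference of at most 20% for NO-DMS). Several points are assumptions of this sketch rather than statements from the legend: that the criterion must hold for every (CCi, NCj) pair, that overlapping three-site windows are merged into one region (so a merged region may exceed 100 bp), and that CpG sites lying between DMS are ignored. The function names `classify_site` and `call_dmrs` are hypothetical.

```python
from itertools import product


def classify_site(cc_levels, nc_levels, dm_threshold=50.0, no_threshold=20.0):
    """Label one CpG site from its methylation levels (%) in CC and NC samples.

    Assumption: the criterion has to hold for every (CCi, NCj) pair.
    Returns "CC-DMS", "NC-DMS", "NO-DMS", or None if no rule applies.
    """
    diffs = [cc - nc for cc, nc in product(cc_levels, nc_levels)]
    if all(d >= dm_threshold for d in diffs):
        return "CC-DMS"  # hypermethylated in cancer cells
    if all(-d >= dm_threshold for d in diffs):
        return "NC-DMS"  # hypermethylated in normal cells
    if all(abs(d) <= no_threshold for d in diffs):
        return "NO-DMS"  # no meaningful difference
    return None


def call_dmrs(dms_positions, min_sites=3, window=100):
    """Group DMS coordinates (one chromosome) into candidate DMRs.

    A window qualifies when min_sites consecutive DMS span at most
    `window` bp; overlapping windows are merged (an assumption).
    """
    pos = sorted(dms_positions)
    regions = []
    for i in range(len(pos) - min_sites + 1):
        start, end = pos[i], pos[i + min_sites - 1]
        if end - start + 1 > window:
            continue  # these consecutive DMS are too far apart
        if regions and start <= regions[-1][1]:
            regions[-1] = (regions[-1][0], end)  # extend the previous region
        else:
            regions.append((start, end))
    return regions


if __name__ == "__main__":
    # Hypothetical methylation levels (%) for 3 CC and 4 NC samples.
    print(classify_site([90, 85, 95], [10, 20, 5, 30]))
    # Hypothetical DMS coordinates.
    print(call_dmrs([100, 120, 150, 400, 1000, 1050, 1090, 1120]))
```

A gene would then be called a DMG when one of the returned regions falls within it; that lookup needs a gene annotation and is omitted here.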
Fig. 2. Histone genes are hypermethylated in lung cancer. A, The histone gene family is
divided into seven groups according to DNA methylation pattern in normal cells and
cancer cells in the WGBS data. B, Fourteen histone genes in group 4 were
hypermethylated in lung cancer cells in the WGBS data. C, The fourteen hypermethylated
histone genes from group 4 were validated in the TCGA lung cancer cohort; nine of the
fourteen are hypermethylated in both LUAD and LUSC. Box-and-whisker plots: the box
represents the upper quartile, lower quartile, and median; whiskers represent min to max.
NS, not significant. ***, P<0.001. ****, P<0.0001. P values were calculated using the
two-tailed nonparametric Mann-Whitney test in GraphPad Prism 7.0 software. D, Four
histone genes in group 5 were specifically hypermethylated in the LUAD sample in the
WGBS data. E, Four histone genes in group 6 were specifically hypermethylated in the
LUSC sample in the WGBS data. F, Six histone genes in group 7 were specifically
hypermethylated in the SCLC sample in the WGBS data. G, The maximum methylation
of HIST1H4F and HIST1H4I (Max-IF) is significantly elevated in primary lung cancer
tissues. Error bars represent the upper quartile, lower quartile, and median. The P value
was calculated using the two-tailed, paired, nonparametric Wilcoxon matched-pairs
signed rank test in GraphPad Prism 7.0 software. H, ROC analysis of Max-IF in primary
lung cancer tissue: the AUC (area under the curve) is 0.98 (95% CI 0.95-1.00,
P<0.0001), with a specificity of 88.0% and a sensitivity of 96.0%.
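As an illustration of how Max-IF and the ROC summary in panels G and H relate, the following Python sketch takes the larger of the two gene methylation values per sample and then computes sensitivity, specificity, and a rank-based AUC. The 9.95% cutoff is the value printed in panel H. The sample values are hypothetical, and calling a sample positive when Max-IF is greater than or equal to the cutoff is an assumption of this sketch. The AUC here is the standard Mann-Whitney estimate, not necessarily the procedure used by GraphPad Prism.

```python
def max_if(hist1h4f, hist1h4i):
    """Max-IF: the larger methylation level (%) of HIST1H4F and HIST1H4I."""
    return max(hist1h4f, hist1h4i)


def sensitivity_specificity(cancer_scores, control_scores, cutoff):
    """Return (sensitivity, specificity); score >= cutoff counts as positive."""
    true_pos = sum(score >= cutoff for score in cancer_scores)
    true_neg = sum(score < cutoff for score in control_scores)
    return true_pos / len(cancer_scores), true_neg / len(control_scores)


def auc(cancer_scores, control_scores):
    """Rank-based AUC: probability that a cancer sample scores higher than
    a control sample, with ties counted as one half."""
    wins = 0.0
    for c in cancer_scores:
        for n in control_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cancer_scores) * len(control_scores))


if __name__ == "__main__":
    # Hypothetical (HIST1H4F %, HIST1H4I %) pairs, not data from the study.
    cancer = [max_if(f, i) for f, i in [(30, 12), (5, 40), (8, 9), (60, 55)]]
    control = [max_if(f, i) for f, i in [(2, 3), (1, 12), (4, 4), (0, 1)]]
    print(sensitivity_specificity(cancer, control, cutoff=9.95))
    print(auc(cancer, control))
```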
Fig. 3. HIST1H4F and HIST1H4I were used as lung cancer biomarkers in BALF
samples. A, The maximum methylation of HIST1H4F and HIST1H4I (Max-IF) is
significantly elevated in LUAD, LUSC, SCLC, and total lung cancer in the BALF
training set (left) and the validation set (right). Box-and-whisker plots: the box represents
the upper quartile, lower quartile, and median; whiskers represent min to max. ****, P <
0.0001. P values for all the analyses were calculated using the two-tailed nonparametric
Mann-Whitney test in GraphPad Prism 7.0 software. B, ROC analysis of Max-IF in the
training set: LUAD (AUC=0.84, 95% CI 0.74-0.93, P<0.0001), LUSC (AUC=0.94, 95%
CI 0.89-1.00, P<0.0001), SCLC (AUC=0.97, 95% CI 0.92-1.00, P<0.0001), and total
lung cancer (AUC=0.91, 95% CI 0.86-0.96, P<0.0001). C, Sensitivity and specificity for
LUAD, LUSC, SCLC, and total lung cancer in the training set (left) and validation set
(right). D, For LUAD, the sensitivity of Max-IF combined with CEA is much higher than
that of Max-IF or CEA alone. E, Overall sensitivity and specificity of HIST1H4I and
HIST1H4F as a lung cancer diagnostic marker in the training set and validation set.
BLD, benign lung disease, including pneumonia, emphysema, and tuberculosis, among
others. LUAD, lung adenocarcinoma. LUSC, lung squamous cell carcinoma. SCLC,
small cell lung carcinoma. Total, total lung cancer, including LUAD, LUSC, and SCLC.
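The gain in sensitivity from combining markers (panel D) can be illustrated with a short Python sketch. The legend does not state how Max-IF and CEA were combined; the rule below, which calls a patient positive when either marker exceeds its cutoff, is an assumption. Both cutoffs and all patient values are hypothetical placeholders, not values from the study.

```python
def combined_positive(max_if_value, cea_value, max_if_cutoff, cea_cutoff):
    """Assumed 'either marker positive' rule for combining Max-IF and CEA."""
    return max_if_value >= max_if_cutoff or cea_value >= cea_cutoff


def sensitivity(calls):
    """Fraction of cancer patients called positive."""
    calls = list(calls)
    return sum(calls) / len(calls)


if __name__ == "__main__":
    # Hypothetical cancer patients: (Max-IF in %, CEA in ng/mL).
    patients = [(25.0, 2.0), (3.0, 8.0), (4.0, 1.5), (40.0, 12.0), (2.0, 6.5)]
    MAX_IF_CUTOFF = 10.0  # hypothetical cutoff
    CEA_CUTOFF = 5.0      # hypothetical cutoff

    print(sensitivity(m >= MAX_IF_CUTOFF for m, _ in patients))
    print(sensitivity(c >= CEA_CUTOFF for _, c in patients))
    print(sensitivity(
        combined_positive(m, c, MAX_IF_CUTOFF, CEA_CUTOFF) for m, c in patients
    ))
```

Under an either-positive rule the combined sensitivity can never be lower than that of either marker alone, while specificity can only stay the same or fall, so the two must be reported together.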
Fig. 4. HIST1H4F as a Universal-Cancer-Only Methylation (UCOM) marker.
A, Histone gene family methylation in 17 different types of cancer: for each histone gene
in each cancer type, the average methylation difference between normal and cancer
samples of that cancer type was calculated. The color shows the degree of the average
methylation difference; a negative value means that the histone gene is hypomethylated
and a positive value means that it is hypermethylated. B, HIST1H4F is hypermethylated
in different types of cancer in the TCGA data. Because the TCGA database contains too
few control samples (n ≤ 3), we collected ten CESC, ten STAD, and six SKCM
paracancer samples from primary tissues. Box-and-whisker plots: the box represents the
upper quartile, lower quartile, and median; whiskers represent min to max; light-colored
boxes represent para-cancer control samples and dark-colored boxes represent cancer
samples. NS, not significant. *, P < 0.1. **, P < 0.01. ***, P<0.001. ****, P<0.0001.
P values for all the analyses were calculated using the two-tailed nonparametric
Mann-Whitney test in GraphPad Prism 7.0 software. C, Validation of HIST1H4F
methylation in 8 other types of cancer besides lung cancer. Error bars represent the upper
quartile, lower quartile, and median. P values for esophagus cancer, colorectal cancer,
pancreatic cancer, and head and neck cancer were calculated using the two-tailed, paired,
nonparametric Wilcoxon matched-pairs signed rank test in GraphPad Prism 7.0 software.
P values for cervical cancer, gastric cancer, breast cancer, and liver cancer were
calculated using the two-tailed nonparametric Mann-Whitney test in GraphPad Prism 7.0
software.
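The quantity plotted in panel A can be written down in a few lines of Python. This is a minimal sketch that assumes the difference is taken as mean cancer methylation minus mean normal methylation, consistent with the legend's sign convention (positive means hypermethylated in cancer). The nested-dictionary layout, the function names, and the numbers are hypothetical.

```python
from statistics import mean


def mean_methylation_difference(cancer_values, normal_values):
    """Average cancer methylation minus average normal methylation (%).

    Positive: hypermethylated in cancer. Negative: hypomethylated in cancer.
    """
    return mean(cancer_values) - mean(normal_values)


def difference_matrix(data):
    """Build {cancer_type: {gene: difference}} from per-sample values.

    `data` is assumed to look like
    {cancer_type: {gene: {"cancer": [...], "normal": [...]}}}.
    """
    return {
        cancer_type: {
            gene: mean_methylation_difference(v["cancer"], v["normal"])
            for gene, v in genes.items()
        }
        for cancer_type, genes in data.items()
    }


if __name__ == "__main__":
    # Hypothetical methylation levels (%), not values from TCGA.
    example = {
        "LUAD": {
            "HIST1H4F": {"cancer": [40.0, 60.0], "normal": [5.0, 15.0]},
            "H3F3A": {"cancer": [10.0, 20.0], "normal": [30.0, 40.0]},
        }
    }
    print(difference_matrix(example))
```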
Figure 1 (image not reproduced). Panel A: WGBS analysis workflow (all CpG sites, DMS, DMR, DMG) for normal samples NC-1 to NC-4 and cancer samples CC-1 to CC-3, starting from 19461312 detected CpG sites (34.5% of genome). WGBS results: 24257 CC-DMS, 2408 CC-DMR, and 958 CC-DMG, with (CCi - NCj) ≥ 50%; 442233 NC-DMS, 36393 NC-DMR, and 1925 NC-DMG, with (NCj - CCi) ≥ 50%; 4456347 NO-DMS, with |CCi - NCj| ≤ 20%. Panel B: average methylation level (%) per sample. Panels C-F: genomic distribution (promoter, exon, intron, intergenic) of all detected CpG sites, CC-DMS, NC-DMS, and NO-DMS. Panel G: DMG methylation in WGBS. Panel H: DMG methylation verified in TCGA data (x-axis, normal methylation level (%); y-axis, cancer methylation level (%)).
Figure 2 (image not reproduced). Panel A: methylation of histone genes in lung cancer samples from WGBS data, groups 1 to 7, samples NC-1 to NC-4 and CC-1 to CC-3. Panel B: methylation level (%) of the group 4 genes (HIST1H4I, HIST1H2BM, HIST1H3C, HIST1H4F, HIST1H2BB, HIST1H2BE, HIST1H1A, HIST1H2BI, HIST1H3G, HIST1H2AD, HIST1H2BF, HIST1H3J, HIST1H2BH, HIST1H4D). Panel C: methylation of histone genes in lung cancer validated by TCGA data (Normal, n=75; LUAD, n=460; LUSC, n=372). Panel D: LUAD specific hypermethylated genes in WGBS (HIST1H2AG, HIST3H2A, HIST3H2BB, HIST1H3F). Panel E: LUSC specific hypermethylated genes in WGBS (HIST1H4A, HIST1H3A, HIST1H2AL, HIST1H3I). Panel F: SCLC specific hypermethylated genes in WGBS (HIST1H2BL, HIST2H3D, HIST1H2AJ, H2AFJ, HIST1H2AI, HIST1H1D). Panel G: Max-IF in primary tissue, lung cancer (n=25) versus para-cancer control (n=25), methylation level (%). Panel H: ROC of Max-IF in primary tissue (AUC = 0.98, 95% CI 0.95-1.00, P < 0.0001; cutoff = 9.95%, sensitivity = 96.0%, specificity = 88.0%).
Figure 3 (image not reproduced). Panel A: Max-IF methylation level (%) in the training set (BLD, n=30; LUAD, n=38; LUSC, n=44; SCLC, n=21; Total, n=103) and the validation set (BLD, n=29; LUAD, n=32; LUSC, n=48; SCLC, n=23; Total, n=103). Panel B: ROC of Max-IF in the training set (95% CI: LUAD 0.74-0.93; LUSC 0.89-1.00; SCLC 0.92-1.00; Total 0.86-0.97; all P < 0.0001). Panel C: sensitivity and specificity (%) of Max-IF for LUAD, LUSC, SCLC, and Total in the training and validation sets. Panel D: sensitivity (%) for LUAD of Max-IF, CEA, and Max-IF combined with CEA in the training and validation sets. Panel E: sensitivity and specificity (%) for lung cancer in the training and validation sets.
Figure 4 (image not reproduced). Panel A: heatmap of the average methylation difference (%) of histone genes across cancer types. Panel B: HIST1H4F methylation in the TCGA database, methylation level (%) of normal and cancer samples for each TCGA cancer type. Panel C: HIST1H4F methylation in multiple types of cancer, methylation level (%) of cancer and control samples for esophagus, colorectal, pancreatic, head and neck, cervical, gastric, breast, and liver cancer.
Histone-related genes are hypermethylated in lung cancer and hypermethylated HIST1H4F could serve as a pan-cancer biomarker
Shi-Hua Dong, Wei Li, Lin Wang, et al.
Cancer Res Published OnlineFirst October 1, 2019.
Updated version: Access the most recent version of this article at doi:10.1158/0008-5472.CAN-19-1019
Supplementary Material: Access the most recent supplemental material at http://cancerres.aacrjournals.org/content/suppl/2019/10/01/0008-5472.CAN-19-1019.DC1